Posted by amarble 23 hours ago
Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.
Yes but why does that matter? If I am happy with its capabilities now, I will continue being happy with its capabilities in the future.
Yes, it cannot do the newest magic shit, but why does that matter? It can still do everything that existed up until that point, which is _a lot_.
Eventually, you might also need something new, but it's not like the world shifts over all problems that exist from <old> to <new> and any tech for <old> problems suddenly becomes obsolete?
If the model prefers a version of Ruby or node with an RCE, I guess you can burn tokens to teach the model how to avoid the introducing the vulnerability into your code?
That feels quite tedious and token inefficient..
Yes. Yes. The only way one can write secure software is by always using the latest SOTA model. Anything else is inefficient and vulnerable.
I hate this platform
Maybe you missed this article, but vercel found it quite annoying to teach AI about the latest updates in the React Framework.
I think you’re confusing my point. I’m not saying that only SOTA models can write secure software, I’m saying that the models produced today will write software that’s considered insecure by 2034 standards, thus you would require to burn more tokens in AGENTS.md or burn more of your time to hand write code.
For example, you’re more than welcome to run Windows ME if it does everything you need it to, but that doesn’t mean Windows ME is a secure environment.
The API rates are very affordable once you start to optimize for the fact that prepaid tokens seem to massively outperform other kinds of tokens.
I can often do with 1 million tokens what my peers have failed to do with 100 million. For me to spend more than $200/m in prepaid API tokens I'd have to pull a 007 work schedule.
Because my 500m tokens so far this month would cost me about $500. My subscription is 100$/month.
An open weight inference provider only needs to pay for GPUs, or discounted APIs from 3rd party vendors. Same basic financial model but they didn't spend a trillion dollars so their loss isn't as high so they can afford to do more inference for less money, and their demand isn't as high so there's more than enough compute.
I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.
There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...
TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.
The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.
If you can bring the load to run the model on close to optimal hardware 24/7 with multiple concurrent requests, and have reasonably cheap power and AC, you would break even in a reasonable timespan. Which won't happen unless you are self-hosting for a medium-sized company. I guess you could sell your spare capacity to get better utilization ... and we've reinvented hosted inference
The appeal to me is that we can run that, but we can also run smaller models on your laptop _and it’s functional!_ I can run DeepSeek v4 flash and a qwen 3.6 on my laptop! Thats crazy good.
Don’t get me wrong. I wish I could run a local model and be happy about it. At the moment, I’m not.
uh.. no?
The whole thing is that it cannot be enshittified, because there's not just a single party having control over it.
As it has happened, is happening and will happen.
With open weights, you cannot easily be rugpulled or locked out or any of that stuff. If the corp attempts that, someone else with an server farm will gladly take you as a customer with absolutely 0 changes to your workflow other than swapping out the API URL + Key.
You'll be talking to the same model with the same personality and same knowledge.
Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development
Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.
Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.
$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.
It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.
The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.
The advantage of OpenRouter compared to OpenCode Go is that the price for DeepSeek-V4-Pro and MiMo-V2.5-Pro is better on OpenRouter.
For example, DeepSeek costs $0.435/0.87/0.003625 for 1M in/out/cached tokens (https://openrouter.ai/deepseek/deepseek-v4-pro), compared to an equivalent of $1.74/3.48/0.0145 under the OpenCode Go plan (https://opencode.ai/docs/go/#usage-limits), almost exactly 4x.
But since you get a monthly usage limit of $60 with the OpenCode Go plan for $10 (i.e. 6x), you might still come out ahead if you use it a lot (or use other models, where the pricing difference is smaller or non-existent).
“The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.”
Opencode Go gives you a choice between “the best” open weight models and you’re not tied down to just GLM or MiniMax and Zen gives you an even longer list of providers including Claude and GPT?
Is it that Openrouter gives you access to like… every model and provider?
So while it is not complicated and certainly something that can be solved, it is not plug and play.
That being said, we switch to open weight models earlier this month and the results has been more than positive so far. The cost savings are also hard to dismiss.
1. Evals that can quickly tell you how much downside there is to switching 2. Something like OpenRouter that can help you run those evals quickly
Now #2 is starting to become popular, and I think we'll soon see more people adopting a model-agnostic approach. Of course, there will still be high-intelligence use cases where nothing comes close to Claude or GPT.
Whether you're using SDK or harness based agents, having evals means you're able to modify any part of your agent and still know what satisfies your "good enough".
It's great for designing products that are easy to change as well.
Whatever reason people have to run those (cheaper? backwards compatibility once you get something running) surely applies to the open models too, maybe even more so.
Right now, due to profound shortfalls in both data and hardware compared to the US labs, the OSS models are IMO basically technology demonstrators that in practise are even more jagged than the US labs' efforts. The high points of the jaggedness are close - but number of happy paths is many times fewer, and their behaviour inside the harness is far less refined. Barring some incredible breakthrough I don't think that is changing without a much higher level of resources - which seems impossible given the current hardware environment.
I have no reason to think that Anthropic or OpenAI are in possession of some secret sauce that the Chinese labs can't duplicate given the right resources, but the fact remains that absent those resources they'll remain behind. Barring some incredible bombshell reveal from Huawei I don't think this asymmetry resolves in a year. In three years it may well be a different story.
But the question was about whether the Chinese labs will have fable-equivalence in 1 year. I am by no means some kind of insider, but knowing the vaguest outlines of what went into Mythos, they just can't do it. The compute is not there. The Chinese engineers are incredible, but they're not literal magicians.
Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.
And FWIW - again, no disrespect at all to the Chinese engineers but I don't rate GLM5.2 as being even close to opus 4.6. It can hit a few benchmarks, sure, that's the top edge of the "jag". But filling in the rest of the capabilities - again, it takes compute and data the OSS labs just don't have, that anyone knows about at least.