There is minimal downside to switching to open models

Posted by amarble 23 hours ago

There is minimal downside to switching to open models(www.marble.onl)

357 points | 295 commentspage 2

pkulak 17 hours ago|

Sure. But OpenAI is the same price. Why would I pay $18/month for z.ai when OpenAI is $20/month?

CJefferson 17 hours ago||

One big advantage I’ve found — people get attached to models (including me). With open models if you find one that works perfectly for you but the next version doesn’t, you can run the old one forever (or someone will for you)

itake 16 hours ago|||

But… the models will fall behind. As libraries and languages and tool calling updates or the world knowledge changes, the models decay.

Personally, I don’t like the change, but it’s just how technology works so I’d rather move with the flow than try to stick my foot down and freeze time.

hypfer 10 hours ago|||

> But… the models will fall behind.

Yes but why does that matter? If I am happy with its capabilities now, I will continue being happy with its capabilities in the future.

Yes, it cannot do the newest magic shit, but why does that matter? It can still do everything that existed up until that point, which is _a lot_.

Eventually, you might also need something new, but it's not like the world shifts over all problems that exist from <old> to <new> and any tech for <old> problems suddenly becomes obsolete?

itake 9 hours ago||

ideally, the software produced should include the latest security patches.

If the model prefers a version of Ruby or node with an RCE, I guess you can burn tokens to teach the model how to avoid the introducing the vulnerability into your code?

That feels quite tedious and token inefficient..

hypfer 9 hours ago||

I'm sorry, but.. are you being serious?

Yes. Yes. The only way one can write secure software is by always using the latest SOTA model. Anything else is inefficient and vulnerable.

I hate this platform

itake 9 hours ago||

https://news.ycombinator.com/item?id=46809708

Maybe you missed this article, but vercel found it quite annoying to teach AI about the latest updates in the React Framework.

I think you’re confusing my point. I’m not saying that only SOTA models can write secure software, I’m saying that the models produced today will write software that’s considered insecure by 2034 standards, thus you would require to burn more tokens in AGENTS.md or burn more of your time to hand write code.

For example, you’re more than welcome to run Windows ME if it does everything you need it to, but that doesn’t mean Windows ME is a secure environment.

0xbadcafebee 5 hours ago||

Another solution might also be to stop reinventing the wheel every few years. New languages aren't producing better software. But people keep churning new languages out, and they become popular because humans have emotional attachment to inanimate things. If humans weren't so emotionally involved with the code, AI could happily produce C/C++ software indefinitely. (And if we could kick our dependence on the fucking browser for an application platform, we wouldn't need the horror that is the JavaScript ecosystem)

OtomotO 14 hours ago|||

No problem, "AI" will just write its own frameworks and libs then!

taytus 16 hours ago|||

This is a good point I never thought of. I appreciate it.

bob1029 7 hours ago|||

Why pay a monthly fee when you can pay for exactly the # of tokens you actually consume?

The API rates are very affordable once you start to optimize for the fact that prepaid tokens seem to massively outperform other kinds of tokens.

I can often do with 1 million tokens what my peers have failed to do with 100 million. For me to spend more than $200/m in prepaid API tokens I'd have to pull a 007 work schedule.

baby_souffle 6 hours ago||

> Why pay a monthly fee when you can pay for exactly the # of tokens you actually consume?

Because my 500m tokens so far this month would cost me about $500. My subscription is 100$/month.

slopinthebag 1 hour ago||

That’s insane. 500m tokens costs me $12 on Deepseek.

0xbadcafebee 13 hours ago|||

One reason might be request limits. OpenAI's ChatGPT Plus w/Codex ($20/month) provides a worst-case 5-hour-request-limit of 15 for GPT-5.5, 20 for GPT-5.4, 60 for GPT-5.4-Mini. Whereas Z.ai Lite ($18/month) provides a worst-case of ~80 for GLM 5.2 (off-peak; on-peak is 2am-6am New York time). So Z.ai can provide higher limits for a cheaper price. (https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...)

pbgcp2026 11 hours ago||

Subscriptions are done. By the end of 2026 everyone will be paying for actual mils of tokens consumed, via API calls.

0xbadcafebee 5 hours ago||

I don't see any indicator of that happening. And actually token count pricing is frequently being replaced with "credits pricing", and subscriptions with obscure variable limits

fulafel 16 hours ago|||

https://news.ycombinator.com/item?id=48618455

pkulak 15 hours ago||

I pay month to month.

notatoad 14 hours ago|||

the pricing page doesn't seem to call it out anymore, but the claim on z.ai coding plan used to be 3x the usage of the equivalent-price claude plan. whether that's accurate i don't know, but just based on api pricing GLM is way cheaper.

flexagoon 10 hours ago||

OpenCode Go is $10/month and the limits are much more generous than those or Codex

aitchnyu 6 hours ago||

After all the articles calculating OpenAI and Anthropic giving heavily subsidizing their subscriptions, how does OpenCode Go manage to be even cheaper?

0xbadcafebee 5 hours ago|||

OpenAI and Anthropic are trying to pay off a half trillion dollars of investment. They also have the most demand right now, to the point that Anthropic sometimes doesn't have enough compute and that means more limits. They can't stop taking new customers, though, because the market would hate it.

An open weight inference provider only needs to pay for GPUs, or discounted APIs from 3rd party vendors. Same basic financial model but they didn't spend a trillion dollars so their loss isn't as high so they can afford to do more inference for less money, and their demand isn't as high so there's more than enough compute.

flexagoon 6 hours ago|||

Because it offers cheap open source models, not GPT and Claude. I mentioned it as an alternative to Z.ai's subscription in OP's comment, not to Codex.

bnj 16 hours ago||

I’ve been wanting to get better acquainted with local inference but I don’t have the hardware, which has made me think about something I haven’t seen discussed, which is local collaboratives. The economics makes it seem like a group of people joining together to run good hardware and an open model might make sense, but I haven’t seen anything like this mentioned. Have I been missing it?

I think it would be pretty neat to launch a service helping people who wanted to participate in something like that locate one another.

Aurornis 15 hours ago||

The reason you don't see more of this is because everyone does the math, realizes it's not a good deal, and then gives up on the idea.

There's a post at the top of /r/localllama about this exact math right now: https://www.reddit.com/r/LocalLLaMA/comments/1ubrcwj/tokenom...

TL;DR: Running GLM 5.2 is going to cost about $20K minimum, and that's going to be painfully slow compared to the cloud hosted versions. Even the estimates where the server is computing tokens 24/7 you can't break even for several years.

The only reason to run locally is if complete data privacy is your top concern. You pay a high premium for that.

wongarsu 5 hours ago|||

If you invest the minimum to run the model, obviously that's more expensive per-token than investing the optimum to get the best price/performance tradeoff (which for GLM 5.2 is at least five times that figure)

If you can bring the load to run the model on close to optimal hardware 24/7 with multiple concurrent requests, and have reasonably cheap power and AC, you would break even in a reasonable timespan. Which won't happen unless you are self-hosting for a medium-sized company. I guess you could sell your spare capacity to get better utilization ... and we've reinvented hosted inference

FridgeSeal 12 hours ago||||

I mean sure, I’d you’re attempting to run the biggest possible models, it’s going to require a stupid amount of compute? I thought we all knew this?

The appeal to me is that we can run that, but we can also run smaller models on your laptop _and it’s functional!_ I can run DeepSeek v4 flash and a qwen 3.6 on my laptop! Thats crazy good.

pjc50 7 hours ago|||

.. conversely, all the cloud LLMs are being subsidized by their investors in addition to massive economies of scale.

Aurornis 6 hours ago|||

It is false to say that all cloud LLMs are subsidized. The open weights models are hosted through numerous third party providers on OpenRouter that are operating as hosting businesses. They aren’t spending investor money to provide tokens for you at below-cost rates. They’re operating as hosting businesses.

wongarsu 5 hours ago|||

economies of scale are enough to explain the entire price difference. Running 8 concurrent requests at 100 token/s on $100k hardware is a lot cheaper than running one concurrent request at 20 token/s on $20k hardware

uberex 12 hours ago|||

https://news.ycombinator.com/item?id=48524387

markerz 16 hours ago|||

There are plenty of providers of open models that offer very affordable rates. Generally, I recommend looking at OpenRouter since they track various metrics for the various providers.

blackoil 16 hours ago||

Open models hosted in Cloud???

pbgcp2026 11 hours ago||

AWS Bedrock hosts Gemma 4 31B and this is The Best Deal – hands down. Try it. Vertex also has Gemma 4 MoE version. Not "lobotomised" by quants. There are also GLM (latest) and Qwen / DS (but these two are not latest versions)

reacharavindh 10 hours ago||

It was easy to be a rebel and use Linux when it was clearly competent, but needed hacks and extra elbow grease to get it polished for use. IME, the open models are “not there yet” in terms of capability or operational needs. Sure, GLM5.2 looks competent, but I will only be able to get it to run that competent if I had a huge cluster of GPUs.. if I am accessing an open model via hosted API, I might as well run a closed model via hosted API. The incentives fall apart in comparison to using Linux 15 years ago.

Don’t get me wrong. I wish I could run a local model and be happy about it. At the moment, I’m not.

hypfer 10 hours ago|

> if I am accessing an open model via hosted API, I might as well run a closed model via hosted API.

uh.. no?

The whole thing is that it cannot be enshittified, because there's not just a single party having control over it.

As it has happened, is happening and will happen.

With open weights, you cannot easily be rugpulled or locked out or any of that stuff. If the corp attempts that, someone else with an server farm will gladly take you as a customer with absolutely 0 changes to your workflow other than swapping out the API URL + Key.

You'll be talking to the same model with the same personality and same knowledge.

mdale 19 hours ago||

I think the frontier will command premium for sometime just as slight better software developers were 10x's vs their peers as their architecture & development strategies and code approach compounded quickly. One less error per block of work compounds quickly.

Sure, there may be some cases and reasons for local models and industry is so large they will continue to make progress and gather economic value and users for specific use case; but frontier will command vast majority of the economic value distinct from Linux and open source where the model created better than proriatary economic incentives around development

byzantinegene 17 hours ago||

10x developers were not slightly better than their peers, they were vastly superior and faster. OTOH, the lead of frontier llms is diminishing as training is getting diminishing returns.

Also, on that note. Not every company needs 10x developers, just as not every task needs frontier llms. Ultimately, operating costs will be the largest contributing factor.

4fffs 18 hours ago||

Youre clutching at straws.

Ultimately its a financial game. Open source is far cheaper so it already has an upper-hand. Frontier models have to justify financially why they are worth the additional spend.

radhitya 18 hours ago||

Have you read about Opencode Go? They are great provider for open model, like GLM 5.2, Deepseek v4 Pro, Kimi 2.7 Code. You should give it shot to them :-)

2muchtime 13 hours ago|

The amount the HN community, at least from what I’ve seen, is sleeping on OpenCode Go (and zen) is kind of amazing.

$10 a month gets you generous usage with the best open weight models and they claim to have zero retention and not to train on your usage.

It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

johndough 12 hours ago||

> It’s unclear to me what the advantages of openrouter are but it seems to be a default I see many people talking about here.

The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.

The advantage of OpenRouter compared to OpenCode Go is that the price for DeepSeek-V4-Pro and MiMo-V2.5-Pro is better on OpenRouter.

For example, DeepSeek costs $0.435/0.87/0.003625 for 1M in/out/cached tokens (https://openrouter.ai/deepseek/deepseek-v4-pro), compared to an equivalent of $1.74/3.48/0.0145 under the OpenCode Go plan (https://opencode.ai/docs/go/#usage-limits), almost exactly 4x.

But since you get a monthly usage limit of $60 with the OpenCode Go plan for $10 (i.e. 6x), you might still come out ahead if you use it a lot (or use other models, where the pricing difference is smaller or non-existent).

2muchtime 1 hour ago||

So the cost makes sense I was unaware but

“The advantage of OpenRouter compared to using API providers directly is that you can switch between API providers without binding your money to a single provider.”

Opencode Go gives you a choice between “the best” open weight models and you’re not tied down to just GLM or MiniMax and Zen gives you an even longer list of providers including Claude and GPT?

Is it that Openrouter gives you access to like… every model and provider?

_pdp_ 10 hours ago||

There are downsides depending on how good is your harness. Switching the model is easy enough. Ensuring that the harness continues working the way it did is a completely different thing. This is not just about the prompts but also general behaviour around the model and its infrastructure.

So while it is not complicated and certainly something that can be solved, it is not plug and play.

That being said, we switch to open weight models earlier this month and the results has been more than positive so far. The cost savings are also hard to dismiss.

c-b 6 hours ago||

What's confusing to me is that there is no discussion about the actual downside experienced it's just theoretical.

shever73 5 hours ago|

There seemed to be no real discussion about anything! I was expecting more of a conclusion, but the article did not support the proposition in the headline.

arttaboi 13 hours ago||

I guess this will happen soon. There are two catalysts needed for this to happen:

1. Evals that can quickly tell you how much downside there is to switching 2. Something like OpenRouter that can help you run those evals quickly

Now #2 is starting to become popular, and I think we'll soon see more people adopting a model-agnostic approach. Of course, there will still be high-intelligence use cases where nothing comes close to Claude or GPT.

alexhans 12 hours ago|

Exactly. I'm very happy the discourse has moved on from "but X model is the best" to "you can use open models".

Whether you're using SDK or harness based agents, having evals means you're able to modify any part of your agent and still know what satisfies your "good enough".

It's great for designing products that are easy to change as well.

ZeroGravitas 11 hours ago||

It seems the best self-hosted and the worst models served by big providers has some considerable overlap in quality.

Whatever reason people have to run those (cheaper? backwards compatibility once you get something running) surely applies to the open models too, maybe even more so.

linzhangrun 18 hours ago|

Open source models are still not good enough for now, but with the current speed of one new SOTA every two months, by this time next year we will definitely have cheap open source models at least as good as Fable :)

sho 13 hours ago|

I don't think we will. The open model labs are too resource constrained to approach Fable or even Opus on the general case and I don't see that changing within a year.

Right now, due to profound shortfalls in both data and hardware compared to the US labs, the OSS models are IMO basically technology demonstrators that in practise are even more jagged than the US labs' efforts. The high points of the jaggedness are close - but number of happy paths is many times fewer, and their behaviour inside the harness is far less refined. Barring some incredible breakthrough I don't think that is changing without a much higher level of resources - which seems impossible given the current hardware environment.

I have no reason to think that Anthropic or OpenAI are in possession of some secret sauce that the Chinese labs can't duplicate given the right resources, but the fact remains that absent those resources they'll remain behind. Barring some incredible bombshell reveal from Huawei I don't think this asymmetry resolves in a year. In three years it may well be a different story.

linzhangrun 12 hours ago||

deepseek-v4-pro, probably the representative cheap opensouce LLM, was released in 2026.4 One year before, what OAI had in hand was gpt-4.1 and gpt-o3. I think it is not very controversial to say that deepseek is stronger than them, at most you can point to some post-training problems, basically the instability you mentioned. Also I am not sure if it is because the people who are best at using AI -- the people making AI -- get more development speed as the models get smarter, but my feeling is model progress is getting faster and faster. GPT-3.5 and GPT-4 were almost one year apart. The disadvantage from hardware limits and compute shortage is visible from the size of chinese models. glm-5.2, which is claimed to be around opus-4.6 level in coding, is only 744B. But Chinese engineers are obviously, how to put it, getting very effective results on "performance at the same size". And that is not even talking about the advantages from China's electricity, manpower, or even "national will" to compete against America. So saying it may take three years to catch up with a gap that is now only several months looks too pessimistic. ChatGPT itself was released only three and a half years ago, and today is already a completely different world.

sho 12 hours ago||

You may be right, and I certainly hope so!

But the question was about whether the Chinese labs will have fable-equivalence in 1 year. I am by no means some kind of insider, but knowing the vaguest outlines of what went into Mythos, they just can't do it. The compute is not there. The Chinese engineers are incredible, but they're not literal magicians.

Of course there could be something incredible to come out of left field and overturn the apple cart yet again, but that's speculation. It would be awesome, sure! But I wouldn't bet too heavily on it.

And FWIW - again, no disrespect at all to the Chinese engineers but I don't rate GLM5.2 as being even close to opus 4.6. It can hit a few benchmarks, sure, that's the top edge of the "jag". But filling in the rest of the capabilities - again, it takes compute and data the OSS labs just don't have, that anyone knows about at least.

More comments...