
Posted by meetpateltech 8 hours ago

GLM-5: From Vibe Coding to Agentic Engineering (z.ai)
348 points | 189 comments
woeirua 7 hours ago||
It might be impressive on benchmarks, but there's just no way for them to break through the noise from the frontier models. At these prices they're just hemorrhaging money. I can't see a path forward for the smaller companies in this space.
lukev 5 hours ago||
I expect that the reason for their existence is political rather than financial (though I have no idea how that's structured.)

It's a big deal that open-source capability is less than a year behind frontier models.

And I'm very, very glad it is. A world in which LLM technology is exclusive and proprietary to three companies from the same country is not a good world.

syntaxing 7 hours ago|||
Tim Dettmers had an interesting take on this [1]. Fundamentally, the philosophy is different.

>China’s philosophy is different. They believe model capabilities do not matter as much as application. What matters is how you use AI.

https://timdettmers.com/2025/12/10/why-agi-will-not-happen/

woeirua 7 hours ago|||
Sorry, but that's an exceptionally unimpressive article. The crux of his thesis is:

>The main flaw is that this idea treats intelligence as purely abstract and not grounded in physical reality. To improve any system, you need resources. And even if a superintelligence uses these resources more effectively than humans to improve itself, it is still bound by the scaling of improvements I mentioned before — linear improvements need exponential resources. Diminishing returns can be avoided by switching to more independent problems – like adding one-off features to GPUs – but these quickly hit their own diminishing returns.

Literally everyone already knows the problems with scaling compute and data. This is not a deep insight. His assertion that we can't keep scaling GPUs is apparently not being taken seriously by _anyone_ else.

syntaxing 6 hours ago|||
I was mentioning the article more for its take on the economics of China vs. the US in terms of AI.

While I do understand your sentiment, it might be worth noting that the author wrote bitsandbytes, one of the first libraries with quantization methods built in, and it was(?) one of the most widely used. I'm pretty sure HF's transformers still uses it as the Python-to-CUDA framework for quantization.

qprofyeh 6 hours ago|||
There are startups in this space getting funded as we speak: https://olix.com/blog/compute-manifesto
re-thc 6 hours ago|||
When you have export restrictions what do you expect them to say?

> They believe model capabilities do not matter as much as application.

Tell me what their tone is when their hardware can match up.

It doesn't matter because they can't make it matter (yet).

riku_iki 6 hours ago||
Maybe being in China gives them an advantage in electricity cost, which could be a big chunk of the bill.
riku_iki 3 hours ago||
Also, LLM prices include all the other capital expenditures: building/maintaining datacenters, SWE salaries, fees to financial middlemen (investment transactions), all of which could be much cheaper in China.
goldenarm 6 hours ago|||
If you're tired of cross-referencing the cherry-picked benchmarks, here's the geometric mean of SWE-bench Verified & HLE-tools:

Claude Opus 4.6: 65.5%

GLM-5: 62.6%

GPT-5.2: 60.3%

Gemini 3 Pro: 59.1%
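For what it's worth, the combined figure is just the square root of the product of the two benchmark scores. A minimal sketch; the per-benchmark scores below are illustrative placeholders, not actual leaderboard numbers (only the combined 65.5% comes from the comment above):

```python
from math import sqrt

def geo_mean(a: float, b: float) -> float:
    """Geometric mean of two benchmark scores (percentages)."""
    return sqrt(a * b)

# Hypothetical per-benchmark scores, chosen only so the combined
# figure matches the 65.5% reported for Claude Opus 4.6:
swe_bench_verified = 77.0  # %
hle_tools = 55.7           # %

print(round(geo_mean(swe_bench_verified, hle_tools), 1))  # 65.5
```

The geometric mean penalizes a model that is strong on one benchmark but weak on the other, which is presumably why it was chosen over a plain average.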

CDieumegard 2 hours ago|||
Interesting timing: GLM-4.7 was already impressive for local use on 24GB+ setups. Curious to see when the distilled/quantized versions of GLM-5 drop. The gap between what you can run via API vs. locally keeps shrinking. I've been tracking which models actually run well at each RAM tier, and the Chinese models (Qwen, DeepSeek, GLM) are dominating the local inference space right now.
algorithm314 7 hours ago|||
Here is the pricing per M tokens. https://docs.z.ai/guides/overview/pricing

Why is GLM 5 more expensive than GLM 4.7 even when using sparse attention?

There is also a GLM 5-code model.

logicprog 7 hours ago||
I think it's likely more expensive because they have more activated parameters, which kind of outweighs the benefits of DSA?
l5870uoo9y 7 hours ago||
It's roughly three times cheaper than GPT-5.2-codex, which in turn reflects the difference in energy cost between US and China.
anthonypasq 7 hours ago|||
1. Electricity costs are at most 25% of inference costs, so even if electricity is 3x cheaper in China, that would only be a ~17% cost reduction.

2. Cost is only one input into price determination, and we really have absolutely zero idea what the margins on inference even are, so assuming the current pricing is actually connected to costs is suspect.
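The arithmetic in point 1 is easy to check. A quick sketch using the comment's own assumptions (25% electricity share, 3x cheaper power):

```python
electricity_share = 0.25  # assumed upper bound on electricity's share of inference cost
cheaper_factor = 3.0      # "3x cheaper" electricity in China

# Replace the electricity slice with one a third as expensive;
# the rest of the cost stack is unchanged.
new_total = (1 - electricity_share) + electricity_share / cheaper_factor
reduction = 1 - new_total

print(f"{reduction:.1%}")  # prints 16.7%
```

So even under the most generous reading, cheaper power alone only moves total inference cost by about a sixth.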

re-thc 7 hours ago|||
It reflects the Nvidia tax overhead too.
bigyabai 4 hours ago||
Not really, Western AI companies can set their margins at whatever they want.
beAroundHere 8 hours ago|||
I'd say that they're super confident about the GLM-5 release, since they're directly comparing it with Opus 4.5 and don't mention Sonnet 4.5 at all.

I'm still waiting to see if they'll launch a GLM-5 Air series, which would run on consumer hardware.

revolvingthrow 7 hours ago||
Qwen and GLM both promise the stars in the sky every single release and the results are always firmly in the "whatever" range
CuriouslyC 3 hours ago||
Qwen famously benchmaxxes. GLM is more robust, I'd say it's comparable to DeepSeek in that regard.
esafak 8 hours ago||
I place GLM 4.7 behind Sonnet.
2001zhaozhao 4 hours ago|||
GLM-4.7-Flash was the first local coding model that I felt was intelligent enough to be useful. It feels something like Claude 4.5 Haiku, at a parameter size where other coding models are still getting into loops and making bewilderingly stupid tool calls. It also has very clear reasoning traces that feel like Claude, which makes it possible to inspect its reasoning and figure out why it made certain decisions.

So far I haven't managed to get comparably good results out of any other local model including Devstral 2 Small and the more recent Qwen-Coder-Next.

khimaros 4 hours ago|
minimax-m.2 is close
Aeroi 2 hours ago|||
The benchmarks and pricing made me realize how good Kimi 2.5 is. I'm an Opus 4.6 person, but wow, it's almost 5x cheaper.
pu_pe 7 hours ago|||
Really impressive benchmarks. It was commonly stated that open source models were lagging 6 months behind state of the art, but they are likely even closer now.
jnd0 7 hours ago||
Probably related: https://news.ycombinator.com/item?id=46974853
tomhow 1 hour ago||
Comments moved thither. Thanks!
cmrdporcupine 7 hours ago||
yes, plenty of good convo over there, the two should probably be merged
mnicky 6 hours ago||
What I haven't seen discussed anywhere so far is how big a lead Anthropic seems to have in intelligence per output token, e.g. if you look at [1].

We already know that intelligence scales with the log of tokens used for reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors.

I read somewhere that they have a policy of not advancing capabilities too much, so could it be that they are sandbagging and releasing models with artificially capped reasoning to be at a similar level to their competitors?

How do you read this?

[1] https://imgur.com/a/EwW9H6q

phamilton 6 hours ago|
Intelligence per token doesn't seem quite right to me.

Intelligence per <consumable> feels closer. Per dollar, or per second, or per watt.

mnicky 5 hours ago||
It is possible to think of tokens as some proxy for thinking space. At least reasoning tokens work like this.

Dollar and watt figures are not public, and time has confounders like hardware.
