Posted by ritzaco 12 hours ago

GLM 5.2 vs. Opus (techstackups.com)
392 points | 276 comments | page 2
coreyburnsdev 4 hours ago|
People are looking for ways not to burn through their premium subs when in many cases all you have to do is move down to 5.4-mini codex and it will probably solve your issue while barely touching your 5 hour or weekly limits.
xg15 10 hours ago||
So GLM emits fewer tokens and does fewer tool calls, but still takes over twice as long to complete.

Can someone explain to me where that time usage is coming from if not from the model operation itself?

Are the individual tool calls more complex and take more time to complete? Or is the rate of tok/s lower because the model does more compute per token?
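
One rough way to frame the question is to split wall-clock time into generation time and tool time. This is only a sketch: the function name and every number below are hypothetical, not measurements from the article.

```python
def wall_clock_seconds(output_tokens, tokens_per_second,
                       tool_calls, seconds_per_tool_call,
                       queue_seconds=0.0):
    """Estimate end-to-end time as queueing + token generation + tool execution."""
    generation = output_tokens / tokens_per_second
    tools = tool_calls * seconds_per_tool_call
    return queue_seconds + generation + tools


# Hypothetical numbers: model B emits fewer tokens and makes fewer tool calls,
# but decodes at a much lower tok/s rate.
model_a = wall_clock_seconds(100_000, 80.0, 50, 2.0)
model_b = wall_clock_seconds(80_000, 25.0, 40, 2.0)

print(f"A: {model_a:.0f}s, B: {model_b:.0f}s, ratio: {model_b / model_a:.2f}")
```

Under these made-up inputs, the lower decode rate alone is enough to make the run take more than twice as long, even with fewer tokens and fewer tool calls.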

iagooar 10 hours ago||
I have noticed that Opus and GPT 5.5 are very good at adjusting their thinking / reasoning intensity depending on the task at hand, something the open weights models are still not as good at.

In addition to that, some of the open weights models like GLM 5.2 or DeepSeek v4 Pro tend to be MUCH slower when generating tokens, which contributes to the perceived slowness. That said, I wouldn't call models like GLM 5.2 slow by any means: it is currently one of the fastest models inside Notion.

twobitshifter 8 hours ago|||
Probably the data center where the model is running more than anything. Another option is if Opus is using anything like a Mixture of Experts approach, in which case the amount of the model loaded in memory at one time could be smaller than GLM.
radu_floricica 10 hours ago||
Could just be infra. I'm betting Anthropic is much better prepared.
js4ever 9 hours ago||
"GLM-5.2 hit a problem here, because it can't read images. It isn't multimodal. So instead of looking at a screenshot, it fell back on a hacky workaround: it wrote scripts to read the raw pixel data and check whether the colors came out roughly as expected."
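
For reference, the kind of pixel check described in the quote might look roughly like this. It is a guess at the approach, not the script GLM actually wrote; the function names, the tolerance, and the sample pixels are all hypothetical.

```python
def mean_color(pixels):
    """Average a list of (r, g, b) tuples channel by channel."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def roughly_matches(pixels, expected, tolerance=30):
    """True if the mean color is within `tolerance` of `expected` on every channel."""
    mean = mean_color(pixels)
    return all(abs(m - e) <= tolerance for m, e in zip(mean, expected))


# Hypothetical pixels sampled from a region that should render red.
sampled = [(250, 10, 10), (240, 20, 5)]
print(roughly_matches(sampled, (255, 0, 0)))   # close to red
print(roughly_matches(sampled, (0, 0, 255)))   # not close to blue
```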

A better way would be to use https://github.com/openbmb/MiniCPM-V

twobitshifter 8 hours ago|
Right, just give the text llm access to a vision specific agent and that problem can be solved. Or if you really want let it even call Opus with an image - seems like you’d still save money
InsideOutSanta 3 hours ago||
One nice thing about GLM is that it has never refused a task. I'm working on a website that renders countries right now, and Anthropic's models regularly give me the old "This request triggered safety guardrails."

I'm not sure what exactly triggers it, but it seems to happen when it has to look at lists of countries. I suspect there must be at least one country name that triggers the safety guardrail.

You'd expect GLM to balk at something like Taiwan, but so far, it hasn't.

johnnyApplePRNG 1 hour ago|
The number of times I have had to spend tokens to attempt (in futility) to convince a proprietary model that the request I asked it to perform on code that I wrote is safe/legal/moral is insane.

Part of me wants to believe they really do care about protecting the world from... something... I don't know quite what exactly tbh... but it must be costing them a small fortune to scan each input and output against N guardrails and they are a for-profit corporation who could easily turn a blind eye to all of this and simply say "what you do with this model is on you" like I would expect most corporations to.

Strange times.

XCSme 4 hours ago||
Check out my comparison too. It works between any two models and includes some not-really-benchmarks (an SVG generation test and a CSS animation test):

https://aibenchy.com/compare/anthropic-claude-opus-4-8-mediu...

mellosouls 3 hours ago||
GLM-5.2 cost a fraction as much. Opus finished in half the time and shipped a cleaner game

This implies Opus was potentially much (?) better value.

GLM cost a quarter but Opus was twice as fast. So we are already at GLM actually costing half when you compare on time, without even considering the extra effort and time it would take to get Opus-par results.
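
The arithmetic above, written out as a sketch. Treating cost multiplied by wall-clock time as the measure is an assumption about what "compare on time" means, and the ratios are the rough ones from this comment, not measured values.

```python
def cost_time_product(relative_cost, relative_time):
    """Weight relative cost by relative wall-clock time (lower is better)."""
    return relative_cost * relative_time


opus = cost_time_product(1.0, 1.0)    # baseline
glm = cost_time_product(0.25, 2.0)    # a quarter of the cost, twice the time

print(glm / opus)
```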

It's good to have cheaper options and very impressive to see the Chinese continue to set open standards in this field, but the article is maybe a little over-generous.

InsideOutSanta 3 hours ago|
For me, time doesn't matter for LLMs. I can start a bunch of tasks, and I'll review the PRs when they're done. Faster is nicer, but if the task gets done correctly, I'm good.
mellosouls 13 minutes ago||
Me too, I just think the comparison was a bit simplistic, at least in the expression of it.
wiremine 6 hours ago||
I've been using GLM 5.2 extensively for the last few days. It is slower, and the lack of multimodality is a bummer.

But, it produces solid results for a fraction of the price. Worth checking out if you have the time.

One of my go-to "tests" of a new frontier model is having it rebuild a programming language from scratch. For GLM 5.2 I had it rebuild the old Rebol language in Rust:

https://github.com/mhs/rebol-clone-glm-5.2

It did a fairly good job roughing in the language for a low token cost.

david_shi 11 hours ago||
> GLM-5.2 cost a fraction as much. Opus finished in half the time and shipped a cleaner game.

Off topic, but does anyone else instantly pick up on LLMisms like this? It seems like all the models have converged on this style of writing, and improvements aren't really changing it.

yard2010 1 hour ago||
I cannot unsee it.

There was this dude here not long ago who bought like $70k worth of GPUs for research, and if I'm not mistaken his research was something related to making LLMs sound less LLM-y. I wonder how it's going for him.

speedgoose 11 hours ago|||
I think a bunch of real humans have started to adopt the LLM writing style.
himata4113 11 hours ago|||
Yep, as I reread my own sentences I notice these LLMisms and have to rewrite them quite often. Reading so much LLM output definitely impacts your writing style.
lelele 10 hours ago|||
Indeed. I'm trying to develop a similar style. The phrasing in the quoted passage is really tight.
jameswhitford 10 hours ago|||
This is excellent feedback, thank you! These LLMisms in writing are a challenge I am living with currently and trying to improve on. The technical writing industry is taking a huge knock right now, with companies demanding more work in less time and with a big drop in quality. Day to day, I get less and less time to work on the quality of the prose in my work. We are working at the frontier of this right now, so we are the most heavily affected, but we also get to experiment with the changes first, which can be both stimulating and very frustrating.
VulgarExigency 10 hours ago||
Yes, and it's really grating. It's like half of all new writing is done in the same "voice" now.
pietz 10 hours ago||
GLM 5.2 has one big issue that will limit its meaningful success and that's the value of their coding subscription.

Yes, in terms of API pricing, GLM 5.2 outperforms the competition. But the only people that use API billing for their coding work are large corporations, where these highly subsidized subscriptions are being phased out.

At the same time, none of these companies will use a Chinese API for their employees.

For individuals and smaller teams, Z.ai's coding subscription is outperformed by Anthropic and OpenAI. You probably get around the same usage with Claude, but Codex definitely offers more usage for the amount you pay.

We can have a debate about how much Z.ai closed the gap to GPT 5.5 and Opus 4.8, but if I can freely decide between them in a world where they all cost the same, I simply wouldn't choose GLM.

So the important question becomes: How good will the offering from Z.ai get with GLM 5.3 or 6, and how much will OpenAI and Anthropic cripple their current offering in the near future?

Certhas 10 hours ago||
My impression is that individual subscriptions are the loss leading hook. The money is made on Enterprise token contracts.

Employees and students used to coding with thousands of dollars worth of tokens (on a 20/100 dollar plan) will push enterprise to spend.

Having a Chinese model that is competitive won't displace this enterprise spend. But an open model hosted in the US/EU might.

The existence of GLM 5.2 puts a ceiling on how much OpenAI/Anthropic can charge for API Access.

LUmBULtERA 9 hours ago|||
> My impression is that individual subscriptions are the loss leading hook

Except there is no evidence of this at all, just people comparing API and subscription pricing. The leaked financial info for OpenAI shows inference is profitable right now, though it does not show a distinction between subscription and API revenue... but if subscription revenue was so lossy, it would be hard for total inference to still be profitable.

CuriouslyC 9 hours ago|||
Anthropic has indicated in the past that API gross margins are ~60%. This might have improved since then, though competition from OAI puts a ceiling on that.
LUmBULtERA 8 hours ago||
Subscription inference can also be cheaper than the cost of API inference if the provider wants it to -- providers can do flexible scheduling for subscription inference for example, around API inference, to lower its cost and get better utilization of the hardware.
Certhas 9 hours ago|||
I did clearly say "my impression is". And you have no evidence to the contrary. We don't even reliably know how many subscribers vs. enterprise customers they have. And the OpenAI leak doesn't even cleanly say that inference is profitable from what I can tell... The better evidence that it probably is profitable is the prices charged by open weight model providers.
LUmBULtERA 8 hours ago||
Fair enough, there is not strong specific evidence to the contrary except about overall inference being profitable for OpenAI (as well as the open weight model providers hosted throughout the world).
fbnszb 8 hours ago||||
> The existence of GLM 5.2 puts a ceiling on how much OpenAI/Anthropic can charge for API Access.

I believe this is the reason why we can even have this debate. Without this kind of competition we would not have these subsidies.

pietz 9 hours ago|||
To be clear, I agree with this and they have my unlimited support pushing for relevance of open source models. GLM 5.2 is amazing and I couldn't be more excited.

I just think that as of today, most people will not find a good reason to switch to GLM.

twobitshifter 8 hours ago|||
Taking a view from outside the USA, European companies just had Fable taken away due to US export controls, and before that Anthropic announced it is holding their data for 30 days. There is immediate value to these firms to build their infrastructure around an AI that won’t be pulled away from them. And outside of Europe, other countries are more price sensitive and don’t have the same fear of building relationships with Chinese companies.
WarmWash 6 hours ago|||
There is no such thing as a relationship with "chinese companies". In China there is just the State, and that is it.

If the world needs any more evidence of Europe's short-sightedness, it would be them running to China to spite the US (instead of creating fertile grounds for their own tech).

metobehonest 5 hours ago||
No one is running to China to "spite the US". Recent geopolitical developments have shown the US to be a violent, unpredictable and unreliable partner.
SubiculumCode 6 hours ago|||
And you have that guarantee from Xi?
bornfreddy 3 hours ago||
With open weights? Yes. It might hallucinate a backdoor somewhere (not that you can trust any model about that), but it will still work.
edg5000 8 hours ago|||
This is an important point. I suspect API pricing will eventually disappear just like how paying for an MMS disappeared. It's an antiquated model. The bulk of the work is being done on "coding plans" is my wild guess.

It's annoying that the plans are so restrictive beyond usage limits. Understandable maybe, but annoying. In practice, only Anthropic (and maybe Google) are really restrictive though. They really scared me away with their policy of charging API rates after the fact if they consider your usage not TOS-aligned. This might be an ungrounded fear that I have, but I feel this is something they'd do so they scared me away.

HarHarVeryFunny 7 hours ago|||
> But the only people that use API billing for their coding work are large corporations

As well as people using 3rd party harnesses like OpenCode.

> At the same time, none of these companies will use a Chinese API for their employees

So who are Amazon Bedrock (who serve GLM) targeting?

Individuals are presumably going with one of the cheaper US providers such as DeepInfra ($0.18/M cached input for GLM vs $0.50 for Opus) or Fireworks AI.
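
To put those two cached-input prices side by side, here is a small sketch. The per-million prices are the ones quoted above; the monthly token volume is a made-up figure, and cached input is only one component of a real bill (uncached input and output tokens are priced separately).

```python
def cached_input_cost(tokens, usd_per_million):
    """Cost in USD of `tokens` cached input tokens at a per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


# Hypothetical volume: 200M cached input tokens in a month of agentic coding.
monthly_cached_tokens = 200_000_000

glm_usd = cached_input_cost(monthly_cached_tokens, 0.18)
opus_usd = cached_input_cost(monthly_cached_tokens, 0.50)

print(f"GLM: ${glm_usd:.2f}, Opus: ${opus_usd:.2f}")
```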

veber-alex 6 hours ago|||
The value of these models is that you can run them on your own hardware.

A company can buy an NVIDIA B300 and serve its developers in-house with unlimited tokens.

tw1984 6 hours ago|||
> At the same time, none of these companies will use a Chinese API for their employees.

nice try but you intentionally ignored the entire Chinese market & Chinese big corporates. there are 130 Chinese companies in the fortune 500 list, with an average revenue of 80 billion USD each. do you think they are going to sign up for Claude, Codex or GLM? now consider South East Asia, Africa, Middle East, Middle Asia and South America, tell me why their large corporates won't be using GLM API billings?

your western centric view of the world is totally out of date, like it or not, 2026 is vastly different from 1996, the US no longer controls high tech whatsoever.

tpm 8 hours ago|||
Also, I was testing GLM 5.2 through OpenRouter, because that's where I have an account with some money. When I then wanted to subscribe at z.ai for a better deal, their infra was clearly overloaded, to the point that 5.2 was timing out on 100% of chat requests. So perhaps I will try again later, when the infrastructure catches up with the model capability. Only then can I tell whether their subscription is worth it.
jauntywundrkind 8 hours ago||
I'm on glm pro subscription and I get so so so much more usage than Claude or Codex! I hammer on glm all day. It's a more expensive plan, but I would need a much much much bigger plan for codex or Claude to do what I do.
elliotbnvl 4 hours ago|
It is insane that we are comparing locally-hostable models to leading cloud providers, it is wild to me that this article even exists.

We have come a long way, and very clearly have a long way yet to go.

nijave 4 hours ago|
Calling GLM-5.2 locally hostable is a bit of a stretch. It's 1.5 TiB of weights at bf16. FP8 requires >800 GiB of VRAM, which is well into data center multi-GPU territory.
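
A back-of-the-envelope sketch of where numbers like that come from. It assumes only the 1.5 TiB bf16 figure from this comment and the standard widths of 2 bytes per bf16 value and 1 byte per FP8 value; the parameter count it derives is an estimate, not an official spec.

```python
GIB = 2**30
TIB = 2**40

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Starting point: weight size at bf16, taken from the comment above.
bf16_weight_bytes = 1.5 * TIB

# Implied parameter count (derived, not an official figure).
n_params = bf16_weight_bytes / BYTES_PER_PARAM["bf16"]

# Weights alone at FP8; KV cache and activations come on top of this.
fp8_weight_gib = n_params * BYTES_PER_PARAM["fp8"] / GIB

print(f"~{n_params / 1e9:.0f}B params, {fp8_weight_gib:.0f} GiB of weights at FP8")
```

The weights alone come to 768 GiB at FP8, so a requirement above 800 GiB is plausible once runtime overhead is included.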
elliotbnvl 4 hours ago||
It's more about the trajectory.