Can someone explain to me where that time usage is coming from if not from the model operation itself?
Are the individual tool calls more complex, so they take longer to complete? Or is the tok/s rate lower because the model does more compute per token?
In addition to that, some of the open weights models like GLM 5.2 or DeepSeek v4 Pro tend to be MUCH slower when generating tokens, which contributes to the perceived slowness. Although I wouldn't call models like GLM 5.2 slow by any means, e.g. it is currently one of the fastest models inside Notion today.
A better way would be to use https://github.com/openbmb/MiniCPM-V
I'm not sure what exactly triggers it, but it seems to happen when it has to look at lists of countries. I suspect there must be at least one country name that triggers the safety guardrail.
You'd expect GLM to balk at something like Taiwan, but so far, it hasn't.
Part of me wants to believe they really do care about protecting the world from... something... I don't know quite what exactly tbh... but it must be costing them a small fortune to scan each input and output against N guardrails and they are a for-profit corporation who could easily turn a blind eye to all of this and simply say "what you do with this model is on you" like I would expect most corporations to.
Strange times.
https://aibenchy.com/compare/anthropic-claude-opus-4-8-mediu...
This implies Opus was potentially much (?) better value.
GLM cost a quarter as much, but Opus was twice as fast. So once you factor in time, GLM is already effectively costing half as much rather than a quarter, and that's without even considering the extra effort and time it would take to get Opus-par results.
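To spell out the arithmetic above, here is a rough sketch. It assumes wall-clock time is weighted linearly (twice as slow means twice the effective cost), which is my simplification and not something the benchmark states; the function name is made up for illustration.

```python
def time_adjusted_cost_ratio(price_ratio: float, time_ratio: float) -> float:
    """Relative cost of model A vs. model B once wall-clock time is priced in.

    price_ratio: A's cost divided by B's cost for the same task.
    time_ratio:  A's wall-clock time divided by B's for the same task.
    Assumption: time is weighted linearly, so 2x slower doubles the effective cost.
    """
    return price_ratio * time_ratio


# GLM cost a quarter of Opus but took twice as long.
glm_vs_opus = time_adjusted_cost_ratio(price_ratio=0.25, time_ratio=2.0)
print(glm_vs_opus)  # 0.5
```

How much your time is actually worth relative to token spend will move that number a lot in either direction.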
It's good to have cheaper options and very impressive to see the Chinese continue to set open standards in this field, but the article is maybe a little over-generous.
But, it produces solid results for a fraction of the price. Worth checking out if you have the time.
One of my go-to "tests" of a new frontier model is having it rebuild a programming language from scratch. For GLM 5.2 I had it rebuild the old Rebol language in Rust:
https://github.com/mhs/rebol-clone-glm-5.2
It did a fairly good job roughing in the language for a low token cost.
Off topic, but does anyone else instantly pick up on LLMisms like this? It seems like all the models have converged on this style of writing, and improvements aren't really changing it.
There was this dude here not long ago who bought like $70k worth of GPUs for research, and if I'm not mistaken his research was related to making LLMs sound less LLM-y. I wonder how it's going for him.
Yes, in terms of API pricing, GLM 5.2 outperforms the competition. But the only people that use API billing for their coding work are large corporations, where these highly subsidized subscriptions are being phased out.
At the same time, none of these companies will use a Chinese API for their employees.
For individuals and smaller teams, Z.ai's coding subscription is outperformed by Anthropic and OpenAI. You probably get around the same usage with Claude, but Codex definitely offers more usage for the amount you pay.
We can debate how much Z.ai has closed the gap to GPT5.5 and Opus 4.8, but if I can freely decide between them in a world where they all cost the same, I simply wouldn't choose GLM.
So the important question becomes: how good will the offering from Z.ai get with GLM 5.3 or 6, and how much will OpenAI and Anthropic cripple their current offerings in the near future?
Employees and students used to coding with thousands of dollars worth of tokens (on a 20/100 dollar plan) will push enterprise to spend.
Having a Chinese model that is competitive won't displace this enterprise spend. But an open model hosted in the US/EU might.
The existence of GLM 5.2 puts a ceiling on how much OpenAI/Anthropic can charge for API access.
Except there is no evidence of this at all, just people comparing API and subscription pricing. The leaked financial info for OpenAI shows inference is profitable right now, though it does not show a distinction between subscription and API revenue... but if subscription revenue were so lossy, it would be hard for total inference to still be profitable.
I believe this is the reason why we can even have this debate. Without this kind of competition we would not have these subsidies.
I just think that as of today, most people will not find a good reason to switch to GLM.
If the world needs any more evidence of Europe's short-sightedness, it would be them running to China to spite the US (instead of creating fertile grounds for their own tech).
It's annoying that the plans are so restrictive beyond usage limits. Understandable maybe, but annoying. In practice, only Anthropic (and maybe Google) are really restrictive though. Their policy of charging API rates after the fact if they consider your usage not TOS-aligned really scared me away. This might be an ungrounded fear, but it feels like something they'd actually do.
As well as people using 3rd party harnesses like OpenCode.
> At the same time, none of these companies will use a Chinese API for their employees
So who is Amazon Bedrock (which serves GLM) targeting?
Individuals are presumably going with one of the cheaper US providers such as DeepInfra ($0.18/M cached input for GLM vs $0.50 for Opus) or Fireworks AI.
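For a sense of scale, here's a quick sketch using the two cached-input prices quoted above. The 200M-token monthly volume is a number I made up for illustration, and this ignores uncached input and output tokens entirely.

```python
def cached_input_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a given number of cached input tokens."""
    return tokens / 1_000_000 * usd_per_million


GLM_CACHED_USD_PER_M = 0.18   # DeepInfra price quoted above
OPUS_CACHED_USD_PER_M = 0.50  # Opus price quoted above

tokens = 200_000_000  # hypothetical monthly cached-input volume

glm_cost = cached_input_cost(tokens, GLM_CACHED_USD_PER_M)
opus_cost = cached_input_cost(tokens, OPUS_CACHED_USD_PER_M)
print(f"GLM: ${glm_cost:.2f}  Opus: ${opus_cost:.2f}  ratio: {glm_cost / opus_cost:.2f}")
```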
A company can buy an NVIDIA B300 and serve its developers in-house with unlimited tokens.
Nice try, but you intentionally ignored the entire Chinese market and the big Chinese corporations. There are 130 Chinese companies on the Fortune 500 list, with an average revenue of 80 billion USD each. Do you think they are going to sign up for Claude, Codex or GLM? Now consider South East Asia, Africa, the Middle East, Central Asia and South America, and tell me why their large corporations won't be using GLM API billing.
Your Western-centric view of the world is totally out of date. Like it or not, 2026 is vastly different from 1996; the US no longer controls high tech whatsoever.
We have come a long way, and very clearly have a long way yet to go.