Posted by meetpateltech 8 hours ago
It's a big deal that open-source capability is less than a year behind frontier models.
And I'm very, very glad it is. A world in which LLM technology is exclusive and proprietary to three companies from the same country is not a good world.
>China’s philosophy is different. They believe model capabilities do not matter as much as application. What matters is how you use AI.
>The main flaw is that this idea treats intelligence as purely abstract and not grounded in physical reality. To improve any system, you need resources. And even if a superintelligence uses these resources more effectively than humans to improve itself, it is still bound by the scaling of improvements I mentioned before — linear improvements need exponential resources. Diminishing returns can be avoided by switching to more independent problems – like adding one-off features to GPUs – but these quickly hit their own diminishing returns.
Literally everyone already knows the problems with scaling compute and data. This is not a deep insight. His assertion that we can't keep scaling GPUs is apparently not being taken seriously by _anyone_ else.
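The quoted scaling argument can be sketched with a toy model (hypothetical constants, not the article's numbers): if capability grows with the log of resources, then inverting that relationship means each fixed capability gain requires multiplying the resource bill by a constant factor.

```python
# Toy illustration (made-up constants): if capability ~ log10(resources),
# then the resources needed for a given capability level are exponential in it.
def resources_for(capability, base=10.0):
    """Resources needed to reach a given capability level (hypothetical)."""
    return base ** capability

# Each +1 capability step multiplies the resource cost by the same factor:
costs = [resources_for(k) for k in range(5)]
ratios = [costs[i + 1] / costs[i] for i in range(4)]
```

This is just the "linear improvements need exponential resources" claim restated as code; the constant ratio between successive costs is the diminishing-returns regime the article describes.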
While I do understand your sentiment, it might be worth noting that the author wrote bitsandbytes, which was one of the first libraries with built-in quantization methods and was(?) one of the most used inference engines. I'm pretty sure Hugging Face's transformers still uses it as the Python-to-CUDA bridge.
> They believe model capabilities do not matter as much as application.
Let's see what their tone is when their hardware can match up.
It doesn't matter because they can't make it matter (yet).
Claude Opus 4.6: 65.5%
GLM-5: 62.6%
GPT-5.2: 60.3%
Gemini 3 Pro: 59.1%
Why is GLM 5 more expensive than GLM 4.7 even when using sparse attention?
There is also a GLM 5-code model.
2. Cost is only one input into pricing, and we really have zero idea what the margins on inference are, so assuming current pricing is actually connected to costs is suspect.
I'm still waiting to see if they'll launch a GLM-5 Air series, which would run on consumer hardware.
So far I haven't managed to get comparably good results out of any other local model including Devstral 2 Small and the more recent Qwen-Coder-Next.
We already know that intelligence scales with the log of tokens used for reasoning, but Anthropic seems to have much more powerful non-reasoning models than its competitors.
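That log relationship can be illustrated with a toy model (made-up constants, not real benchmark data): if score grows with log2 of reasoning tokens, every doubling of tokens buys the same fixed number of points, no matter where you start.

```python
import math

def score(tokens, base_score=20.0, gain_per_doubling=5.0):
    """Toy model: benchmark score grows with log2 of reasoning tokens.
    Both constants are made up purely for illustration."""
    return base_score + gain_per_doubling * math.log2(tokens)

# Doubling tokens from 1k to 2k buys the same gain as doubling 32k to 64k:
gains = [score(2 * t) - score(t) for t in (1_000, 32_000)]
```

The constant per-doubling gain is what makes extra reasoning tokens a diminishing-returns lever rather than a free capability multiplier.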
I read somewhere that they have a policy of not advancing capabilities too much, so could it be that they are sandbagging and releasing models with artificially capped reasoning to be at a similar level to their competitors?
How do you read this?
Intelligence per <consumable> feels closer. Per dollar, or per second, or per watt.
Dollar and watt figures aren't public, and wall-clock time has confounders like hardware.