Posted by zixuanlimit 5 hours ago
> "build a Linux-style desktop environment as a web application"
They claim "50 applications from scratch", but "Browser" and a bunch of the other apps are likely all <iframe> elements. We all know that building a spec-compliant browser alone is a herculean task.
Would it succeed? Probably not, but it would be way more interesting, even if it didn't work.
I find things like Claude's C compiler (CCC) way more interesting: even though CCC is objectively bad (the code is messy, it generates very bad unoptimized output, etc.), it's at least something cool, and it shows that with some human guidance it could generate something even better.
For short-term bugfixing and tweaks though, it does about what I'd expect from Sonnet for a pretty low price.
https://github.com/Opencode-DCP/opencode-dynamic-context-pru...
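The linked project's name suggests dynamic context pruning. A minimal sketch of the general idea, assuming a message-list transcript and a crude 4-chars-per-token estimate (all names and the heuristic are my own illustration, not the plugin's actual implementation):

```python
def estimate_tokens(text):
    # Crude heuristic: roughly 4 characters per token for English/code.
    return max(1, len(text) // 4)

def prune_context(messages, budget_tokens):
    """messages: list of {"role": ..., "content": ...} dicts, oldest first.

    Returns a copy whose estimated size fits the budget, replacing the
    oldest tool outputs with a short placeholder first and never touching
    the system prompt or user/assistant turns.
    """
    total = sum(estimate_tokens(m["content"]) for m in messages)
    pruned = [dict(m) for m in messages]  # shallow copy, leave input intact
    for m in pruned:
        if total <= budget_tokens:
            break
        if m["role"] == "tool":
            total -= estimate_tokens(m["content"])
            m["content"] = "[tool output pruned]"
            total += estimate_tokens(m["content"])
    return pruned
```

The design choice is to sacrifice stale tool results (usually the bulkiest and least-referenced part of a coding transcript) before anything conversational, which is one way to stay under whatever coherency ceiling the serving side imposes.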
Since the entire purpose, focus and motivation of this model seems to have been "coherency over longer contexts", doesn't that issue make it not an OK model? It's bad at the thing it's supposed to be good at, no?
It does devolve into gibberish at long context (~120k+ tokens by my estimation but I haven't properly measured), but this is still by far the best bang-for-buck value model I have used for coding.
It's a fine model
As Kimi did a huge amount of Claude distillation, it seems to be somewhat based in data.
https://www.anthropic.com/news/detecting-and-preventing-dist...
I'm curious how the bang for buck ratio works in comparison. My initial tests for coding tasks have been positive and I can run it at home. Bigger models I assume are still better on harder tasks.
So I need them to not only not devolve into gibberish, but remain smart enough to be useful at contexts several times longer than that.
I suspect that this isn't the model, but something that z.ai is doing with hosting it. At launch I was elated to find glm-5.1 was stable even as the context window filled all the way up (~200k). Whereas glm-5, while it could still talk and think, had forgotten the finer points of tool use to the point where it was making grievous errors as it went (burning gobs of tokens to fix duplicate-code problems).
However, some real brutal changes happened sometime in the last two or three months: the parent's problem emerged, and emerged hard, out of nowhere. Worse, for me it seemed to hit around a 60k context window, which was heinous: I was honestly a bit despondent that my z.ai subscription had become so effectively useless, that I could only work on small problems.
Thankfully the coherency barrier rose significantly around three weeks ago. It now seems to lose its mind and emit chaotic non-sentence gibberish around 100k for me. GLM-5 was already getting pretty shaky at this point, so I feel like I at least have some kind of parity. But at least glm-5 was speaking and thinking in real sentences, and I could keep conversing with it somewhat, whereas glm-5.1 seems to go from perfectly level-headed and working fine to total breakdown all of a sudden, a hard switch, at such a predictable context window size.
It seems so, so probable to me that it isn't the model making this happen: it's the hosting. There's some KV-cache issue, or they are trying to expand the context window in some way, or switching from a small-context serving pool to a big-context one, or something infrastructure-wise that falls flat and collapses. Seeing the window so clearly change from 200k to 60k to 100k is both hope and misery.
I've been leaving some breadcrumbs on Bluesky as I go. It's been brutal to see. Especially having tasted a working glm-5.1. I don't super want to pay API rates to someone else, but I fully expect this situation to not reproduce on other hosting, and may well spend the money to try and see. https://bsky.app/profile/jauntywk.bsky.social/post/3mhxep7ek...
All such a shame because, aside from totally going mad and speaking unpunctuated gibberish, glm-5.1 is clearly very, very good and I trust it enormously.
GLM5 also had this issue. When it was free on Openrouter / Kilo the model was rock solid, though it did degrade gracefully after 100k tokens. Same at launch with Zai, aside from regular timeouts.
Somewhere around early-to-mid March zai did something significant to GLM5 - like KV-cache quantization or model quantization, or both.
After that it's been Russian roulette. Sometimes it works flawlessly, but very often (1/4 or 1/5 of the time) thinking tokens spill into the main context, and if you don't spot it happening it can do real damage - heavily corrupting files, deleting whole directories.
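When this spill does happen, a cheap guard in the harness can catch it before any edits land. A minimal sketch, assuming the reasoning trace is delimited by `<think>` tags (the tag name is an assumption; adjust it per model):

```python
import re

# Hypothetical guard: flag model output that still contains
# reasoning-trace markers, so a harness can discard the turn
# instead of applying file edits from a corrupted response.
THINK_TAG = re.compile(r"</?think>", re.IGNORECASE)

def leaked_reasoning(text: str) -> bool:
    """True if reasoning delimiters leaked into the final answer."""
    return bool(THINK_TAG.search(text))
```

In an agent loop you'd run this check before executing any tool calls the response contains, and retry or truncate the turn if it fires.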
You can see the pain by visiting the zai discord - filled with reports of the issue, yet radio silence from zai.
Tellingly, despite the model being open source, not a single provider will sell you access to it at anything approaching the plans zai offers. The numbers just don't work, so your choice is either pay significantly more per token and get reliability, or put up with the bait and switch.
The bar is very low :(
But I used 70m tokens yesterday on glm-5.1 (thanks glm for having good observability of your token usage, unlike openai; dunno about anthropic). And got incredible, beautiful results that I super trust. It's done amazing work.
This limitation feels very shady and artificial to me, and I don't love it, but I also feel like I'm working somewhat effectively within the constraints. This does put a huge damper on people running more autonomous agentic systems, unless they have Pi or other systems that can more adaptively improve the harness on their own.
[[you guys, please don't post like this to HN - it will just irritate the community and get you flamed]]
Interesting.
Hopefully these aren't bots created by Z.AI because GLM doesn't need fake engagement.
Thanks for watching out for the quality of HN...
These are different from the submitter-passed-a-link-to-friends kind of upvoting and booster comments, which feel quaint by comparison. In this case people usually don't know they are breaking HN's rules, which is why they don't try to hide it.
There are YC members in the current batch who are spamming us right now [2]. They are all obvious engagement-bait questions which are conveniently answered with references to the SaaS.
[0]: https://www.reddit.com/r/DoneDirtCheap/comments/1n5gubz/get_...
[1]: https://www.reddit.com/r/AIJobs/comments/1oxjfjs/hiring_paid...
[2]: https://www.reddit.com/r/androiddev/comments/1sdyijs/no_code...