Posted by pretext 12/22/2025

GLM-4.7: Advancing the Coding Capability (z.ai)
437 points | 235 comments
gigatexal 12/22/2025|
Even if this is one or two iterations behind the big models from Claude, OpenAI, or Gemini, it's showing large gains. Here's hoping it keeps getting better and better, and that I can run it locally without it melting my PC.
Imustaskforhelp 12/22/2025|
One would hope we can run it locally (I hope so too, but with the rise in RAM prices I doubt it; I feel it becomes possible around 2027-2028). Even if we can't in the meantime, I'm sure that competition in general (on places like OpenRouter and elsewhere) will give a meaningful way to cheapen prices overall, even further below the monopolistic pricing of, say, Claude.

It does feel like these models are only six months behind, though, as many like to say. For some things it's 100% reasonable to use them, and for others not so much.

gigatexal 12/23/2025||
I've got 128GB of memory in my laptop, but running models with LM Studio turns the fans up to 100% and isn't as effective as the hosted models. So I'm not worried about RAM; I'm hoping for a revolution, or whatever comes after LLMs, to see if local gets better.
cmrdporcupine 12/22/2025||
Running it in Crush right now and so far fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT 5.2.
alok-g 12/23/2025|
For others like me who did not know about Crush:

https://github.com/charmbracelet/crush

https://news.ycombinator.com/item?id=44736176

philipkiely 12/23/2025||
GLM 4.6 has been very popular from my perspective as an inference provider, with a surprising number of people using it as a daily driver for coding. Excited to see the improvements 4.7 delivers; this model has great PMF, so to speak.
sumedh 12/23/2025||
When I click Subscribe on any of the plans, nothing happens. I see this error in Dev Tools:

    page-3f0b51d55efc183b.js:1 Uncaught TypeError: Cannot read properties of undefined (reading 'toString')
        at page-3f0b51d55efc183b.js:1:16525
        at Object.onClick (page-3f0b51d55efc183b.js:1:17354)
        at 4677-95d3b905dc8dee28.js:1:24494
        at i8 (aa09bbc3-6ec66205233465ec.js:1:135367)
        at aa09bbc3-6ec66205233465ec.js:1:141453
        at nz (aa09bbc3-6ec66205233465ec.js:1:19201)
        at sn (aa09bbc3-6ec66205233465ec.js:1:136600)
        at cc (aa09bbc3-6ec66205233465ec.js:1:163602)
        at ci (aa09bbc3-6ec66205233465ec.js:1:163424)

A bit weird for an AI coding model company not to have a seamless buying experience.
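
That TypeError usually means the click handler calls .toString() on a value that is undefined at that moment. A minimal sketch of the failure mode and a defensive fix; every name here is hypothetical, guessed from the minified trace rather than taken from z.ai's actual code:

    // Hypothetical reconstruction; the minified bundle hides the real names.
    interface Plan { id?: number }

    function onSubscribeClick(plan?: Plan): string {
      // plan.id.toString() throws "Cannot read properties of undefined
      // (reading 'toString')" whenever plan or plan.id is missing, e.g.
      // before the user has logged in. Optional chaining fails soft:
      return plan?.id?.toString() ?? "unknown-plan";
    }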

Bayaz 12/23/2025|
Subscribe didn’t do anything for me until I created an account.
LoveMortuus 12/23/2025||
I tried the web chat with their model, I asked only one thing: "version check". It replied with the following: "I am Claude, made by Anthropic. My current model version is Claude 3.5 Sonnet."
bonoboTP 12/23/2025||
I cannot reproduce this. It says it's GLM by Z.ai.
gessha 12/23/2025||
I got 4o. When I edited the prompt several times, it started questioning my intentions, and toward the end it responded with GLM 4.7.
esafak 12/22/2025||
The Terminal-Bench scores look weak, but otherwise it looks nice. I hope that once the benchmarks are saturated, companies can focus on shrinking the models. Until then, let the games continue.
anonzzzies 12/22/2025||
Shrinking and speed; speed is a major thing. Claude Code is just too slow: very good, but it has no reasonable way to handle simple requests because of the overhead, so everything should just be faster. If I were Anthropic, I would've bought Groq or Cerebras by now. Not sure if they (or the other big ones) are working on similar inference hardware to provide 2000 tok/s or more.
pqtyw 12/23/2025||
Z.ai (at least the mid/top-end subscription; not sure about the API) is pretty slow too, especially during some periods. Cerebras of course is probably a different story (if it's not quantized).
bigyabai 12/22/2025|||
It's a good model for what it is. Z.ai's big business proposition is that you can get Claude Code with their GLM models at much lower prices than what Anthropic charges. This model is going to be great for that agentic coding application.
maxdo 12/22/2025||
… and wake up every night because you saved a few dollars, and there are bugs, and they're due to this decision?
bigyabai 12/22/2025|||
I pay for both Claude and Z.ai right now, and GLM-4.7 is more than capable for what I need. Opus 4.5 is nice but not worth the quota cost for most tasks.
csomar 12/23/2025||||
Yeah, because Claude never writes bugs?
Imustaskforhelp 12/22/2025|||
Well, I feel like all models are converging. Maybe Claude is good, but only time will tell, as Gemini Flash and GLM put pressure on Claude/Anthropic models.

People here are definitely comparing it to Sonnet, so if you take this stance about saving a few dollars, then by the same logic everyone should be on Opus and nobody should use Sonnet either.

Personally, I'm interested in open-source models because they're what will have genuine value, and sustain competition, after the bubble bursts.

theshrike79 12/22/2025|||
z.ai models are crazy cheap. The one-year Lite plan is like 30€ (on sale, though).

Complete no-brainer to get it as a backup with Crush. I've been using it for read-only analysis and for implementing already-planned tasks, with pretty good results. It has a slight habit of expanding scope without being asked; sometimes that's a good thing, sometimes it does useless work or messes things up a bit.

maxdo 12/22/2025|||
I tried several times. In my personal experience it's no match for the Claude models. From my point of view there's almost no room for a second spot. When you're doing things for work, each bug is hours of work, a potentially lost customer, etc. Why would you trust your money … just to have a backup?
ewoodrich 12/23/2025|||
It's a perfectly serviceable fallback for when Claude Code kicks me off in the middle of an edit on the Pro plan (which happens constantly to me now) and I just want to finish tweaking some CSS styles or whatever to wrap up. If you have a legitimate concern about losing customers, then yes, you're probably in the wrong target market for a $3/mo plan...
maxdo 12/23/2025|||
You can have a $20 USD/mo Cursor plan with cutting-edge models and pay per token for extra use when you need it; most of the time you'll be fine within the basic Cursor plans, and you don't need to stick with one vendor. Today Claude is good? Awesome. Tomorrow Google is good? Great.

I sometimes even ask several models to see whose suggestion is best, or even mix two. Especially during bugfixes.

skippyboxedhero 12/23/2025|||
OpenRouter with OpenCode.
ewoodrich 12/23/2025||
I've already gone down that route with Roo/Kilo Code and then OpenCode; these days it's OpenCode with the z.ai backend and/or Claude Code with z.ai's Anthropic-compatible endpoint, and I've been moving to OpenCode in general more and more over time.

GLM 4.6 on the Z.ai plan (haven't tried 4.7 yet) has worked well enough for straightforward changes, with a relatively large quota (more generous than CC, which only gets more frustrating on the Pro plan over time) and predictable billing, which is a big pro for me. I just got tired of having to police my OpenRouter usage to avoid burning through my credits.

But yes, OpenCode is awesome, particularly as it supports all the subscriptions I have access to via personal or work accounts (GitHub Copilot/CC/z.ai). And as model churn/competition slows down over time, I can stick with whichever ends up having the best value/performance, with sufficient quota for my personal projects, without fear of lock-in and enshittification.

skippyboxedhero 12/23/2025||
There is a free tier for GLM 4.7 with OpenCode Zen. I think the cost is pretty reasonable for everything apart from Anthropic.
theshrike79 12/22/2025||||
I'm using it for my own stuff and I'm definitely not dropping however much it costs for the Claude Max plans.

That's why I usually use Claude for planning, feed the issues to beads or a markdown file, and then have Codex or Crush+GLM implement them.

For exploratory stuff I'm "pair-programming" with Claude.

At work we have all the toys, but I'm not putting my own code through them =)

maxdo 12/23/2025||
It's beyond me why you'd need Max plans. I use Opus/Sonnet/Gemini/GPT 5.2 every day in Cursor, and I'm not paying for Claude Max.
theshrike79 12/23/2025||
I'm mostly just coding at night after the family goes to bed, and even I can hit Claude Pro limits. I started AI-assisted programming when we didn't have monthly plans and I had to pay for every token out of my own pocket.

I learned to be pretty efficient with token use after the first bill dropped :D

sumedh 12/23/2025||||
> I tried several times

Did you try the new GLM 4.7 or the older models?

pqtyw 12/23/2025|||
GLM 4.6 was kind of meh, especially in Claude Code, since thinking was seemingly entirely broken. This week I've been playing with 4.7 and it seems like a massive improvement; subjectively it's pretty much at Sonnet level (while still using a lot fewer thinking tokens, though).
sh3rl0ck 12/22/2025||||
I shifted from Crush to OpenCode this week because Crush doesn't seem to be evolving in its utility; a plan mode, subagents, etc. don't seem to be things they're working on at the mo.

I'd love to hear your insight though, because maybe I just configured things wrong haha

theshrike79 12/23/2025||
I can't understand why every CLI tool doesn't have a Plan mode already. It should be table stakes to be able to just ask questions or have a model do code reviews without worrying about it rushing headlong into implementation.

Looking at you, Gemini CLI.

allovertheworld 12/22/2025|||
this doesn’t mean much if you hit daily limits quickly anyway. So the API pricing matters more
theshrike79 12/23/2025||
TBH when I hit the Claude daily limit I just take that as a sign to go outside (or go to bed, depending on the time).

If the project management is on point, it really doesn't matter. Unfinished tasks stay as they are; if something is unfinished in the context, I leave the terminal open, come back some time later, type "continue", hit enter, and go away.

CuriouslyC 12/22/2025||
We're not gonna see significant model shrinkage until the money tap dries up. Between now and then, we'll see new benchmarks/evals that probe the holes in model capabilities, in cycles, as each round gets saturated.
lanthissa 12/22/2025|||
Isn't Gemini 3 Flash already model shrinkage that does well at coding?
skippyboxedhero 12/23/2025|||
Xiaomi, Nvidia Nemotron, MiniMax, and lots of other smaller ones too. There are massive economic incentives to shrink models, because they can be served faster and at lower cost.

I think that even with the money going in, there has to be some revenue supporting that development somewhere, and users are now looking at cost. I've been using Anthropic Max for most of this year, and after checking out some of these other models it's clearly overpriced (I'd also say their Claude Code moat has been breached). And Anthropic's API pricing is completely crazy when you use some of the paradigms they suggest (agents/commands/etc.); i.e., token usage is going up, so efficient models are driving growth.

hedgehog 12/22/2025||||
Smaller open-weights models are also improving noticeably (like Qwen3 Coder 30B); the improvements are happening at all sizes.
cmrdporcupine 12/22/2025||
Devstral Small 24B looks promising as something I want to try fine-tuning on DSLs, etc., and then embedding in tooling.
hedgehog 12/22/2025||
I haven't tried it yet, but yes. Qwen3 Next 80B works decently in my testing, and fast. I had mixed results with the new Nemotron, but it and the new Qwen models are both very fast to run.
mark_l_watson 12/23/2025||
Same experience: on my old M2 Mac with just 32GB of memory, both Qwen3 30B and the new Nemotron models are very useful for coding if I prepare a one-shot prompt with directions and relevant code. I don't like them in agentic coding tools. I have mentioned this elsewhere: it is deeply satisfying to mix local model use with commercial APIs and services.
Imustaskforhelp 12/22/2025|||
How many billion parameters is Gemini 3 Flash? I can't seem to find info about it online.
naasking 12/23/2025|||
> We're not gonna see significant model shrinkage until the money tap dries up.

I'm not sure about that. Microsoft has been doing great work on "1-bit" LLMs, and dropping the memory requirements would significantly cut down on operating costs for the frontier players.
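
For context, Microsoft's "1-bit" work (BitNet b1.58) actually uses ternary weights. A minimal sketch of the absmean quantization step as that paper describes it; the function name and plain-array framing are mine, not Microsoft's code:

    // Scale by the mean |w|, then round each weight into {-1, 0, +1}.
    // Three states need ~1.58 bits per weight instead of 16.
    function ternaryQuantize(weights: number[]): { q: number[]; scale: number } {
      const scale =
        weights.reduce((s, w) => s + Math.abs(w), 0) / weights.length || 1e-8;
      const q = weights.map(w => Math.max(-1, Math.min(1, Math.round(w / scale))));
      return { q, scale };
    }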

mark_l_watson 12/23/2025||
The open models are sometimes competitive with the foundation models. The costs of Z.ai's monthly plans just increased a bit, but they're still inexpensive compared to Google/Anthropic/OpenAI.

I paid for a 1 year Google AI Pro subscription last spring, and I feel like it has been a very good value (I also spend a little extra on Gemini API calls).

That said, I would like to stop paying for monthly subscriptions and just pay API costs as I need them. Google supports using gemini-cli with a paid API key: good for them for supporting flexible use of their products.

I usually buy $5 of AI API credits for newly released Chinese and French (Mistral) open models, largely to support alternative vendors.

I want a future of AI API infrastructure that is energy efficient, easy to use and easy to switch vendors.

One thing that is missing from too many vendors is being able to use their tool-enabled web apps at a metered API cost.

OpenAI and Anthropic lost my business in the last year because they seem to just crank up inference compute spend, forming what I personally doubt are long-term business models, and don't do enough to drive down compute requirements to build sustainable businesses.

mrbonner 12/23/2025||
I tried this in the OpenRouter chat interface to write a few documents. Quick thoughts: its writing has less of an AI vibe due to the lack of em-dashes! I primarily use Kimi K2 Thinking for personal usage. Kimi's writing is also very good, on par with frontier models like Sonnet or Gemini. But, just like them, Kimi K2 also feels AI. I can't quantify or explain why, though.

For work, it is Claude Code and Anthropic exclusively.

Tiberium 12/22/2025||
The frontend examples, especially the first one, look uncannily similar to what Gemini 3 Pro usually produces. Make of that what you will :)

EDIT: Also checked the chats they shared, and the thinking process is very similar to the raw (not the summarized) Gemini 3 CoT: all the bold sections, numbered lists. It's a distinctive CoT style that only Gemini 3 had before today :)

reissbaker 12/22/2025||
I don't mind if they're distilling frontier models to make them cheaper, and open-sourcing the weights!
Imustaskforhelp 12/22/2025||
Same, although Gemini 3 Flash already gives it a run for its money on the cheap end. Part of me really wants open source too, because that way, if I really want to some day, I can have privacy or get my own hardware to run it.

I genuinely hope Gemini 3 Flash gets open-sourced, but I feel like that could actually crash the AI bubble if it happened. Although there are still some issues with how I vibe with the overall model itself, I find it very competent and fast; there may be some placebo effect at this point, but the model feels really solid.

Like, most Western AI companies wouldn't really have a point, or incentives, to compete if someone open-sourced a model like that, because then the competition would shift to providers and their speeds (like how Groq and Cerebras have insane speed).

I had heard Google would let institutions like universities self-host Gemini models or similar, so there's some chance the AI bubble pops if Gemini or other top-tier models get leaked or something. But I genuinely doubt that happens, and there are many other ways the AI bubble could pop.

scotty79 12/23/2025||
Models being open-weights lets infrastructure providers compete on delivering models as a service, fastest and cheapest.

At some point companies should be forced to release the weights after a reasonable time has passed since they first sold the service. Maybe after 3 years or so.

It would be great for competition and security research.

orbital-decay 12/23/2025|||
Yeah, I think it sometimes even repeats Gemini's injected platform instructions. It's pretty curious, because a) Gemini uses something closer to "chain of draft" and never naturally repeats them in full, only the relevant part, and b) these instructions don't seem to have any effect in GLM: it repeats them in the CoT but never follows them. That is a real problem with any CoT trained through RL (the meaning diverges from the natural language due to reward hacking). Is it possible they used it in the initial SFT pass to improve the CoT readability?
ImprobableTruth 12/22/2025||
How is the raw Gemini 3 CoT accessed? Isn't it hidden?
Tiberium 12/22/2025|||
There are tricks on the API to get access to the raw Gemini 3 CoT; it's extremely easy compared to getting the CoT of GPT-5 (very, very hard).
ceroxylon 12/23/2025||
What are you referring to? I see the 'reasoning' in OpenRouter for GPT-5.2; I was under the impression that that is the CoT.
Tiberium 12/23/2025||
Yes, that's exactly what I'm referring to. When you're using the direct Gemini API (AI Studio/Vertex), with specific tricks you can get the raw reasoning/CoT output of the model, not the summary.
bwat49 12/23/2025|||
In Antigravity, Gemini sometimes inserts its CoT directly into code comments, lol.
polyrand 12/22/2025|
A few comments mention distillation. If you use claude-code with the z.ai coding plan, I think it quickly becomes obvious they trained on other models. Even the "you're absolutely right" was there. But that's OK: the price/performance ratio is unmatched.
hashbig 12/23/2025||
I had Gemini 3 Flash hit me this morning with "you're absolutely right" when I corrected a mistake it made. It's not conclusive of anything.
polyrand 12/23/2025|||
That's interesting, thanks for sharing!

It's a pattern I saw more often with Claude Code, at least in terms of how frequently it says it (much improved now). But it's true that this pattern alone is not enough to infer the training methods.

theptip 12/23/2025|||
Or it’s conclusive of an even broader trend!
ljosifov 12/23/2025|||
I imagine (and sure hope) that everyone trains on everyone else. As for distillation: if one has a bigger or other model providing true posterior token probabilities in the (0,1) interval (a number between 0 and 1), rather than 1-hot targets that are 0 for 200K-sans-this-token and 1 for the desired output token, one should use the former instead of the latter. It's amazing that such a simple, straightforward idea faced so much resistance (the paper was rejected), and from academia, supposedly the most open-minded and devoted to knowing, and on the wrong grounds ("will have no impact on industry"; in fact it's had tremendous impact on industry; a better rejection would have been "duh, it's obvious"). We are not trying to torture the model and the GPU cluster into learning from zero when the knowledge is already available. :-)
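
To make the 1-hot vs. soft-target contrast concrete, here's a toy sketch (hypothetical names, plain arrays standing in for a vocabulary-sized distribution): cross-entropy against the teacher's full posterior gives the student signal on every token, while the 1-hot loss only ever sees the single gold token.

    // Soft-target (distillation) loss: every vocabulary entry contributes.
    // H(teacher, student) = -sum_i p_teacher(i) * log p_student(i)
    function softTargetLoss(teacherProbs: number[], studentLogProbs: number[]): number {
      return -teacherProbs.reduce((sum, p, i) => sum + p * studentLogProbs[i], 0);
    }

    // 1-hot loss: only the gold token's log-probability matters; the other
    // ~200K vocabulary entries contribute nothing for this example.
    function oneHotLoss(goldIndex: number, studentLogProbs: number[]): number {
      return -studentLogProbs[goldIndex];
    }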
Havoc 12/23/2025||
>Even the "you're absolutely right" was there.

I don't think that's particularly conclusive for training on other models. It seems plausible to me that the internet data corpus simply converges on this phrase, hence multiple models using it.

...or not...hard to tell either way.
