Posted by pretext 14 hours ago

GLM-4.7: Advancing the Coding Capability (z.ai)
302 points | 142 comments
polyrand 10 hours ago|
A few comments mentioning distillation. If you use claude-code with the z.ai coding plan, I think it quickly becomes obvious they did train on other models. Even the "you're absolutely right" was there. But that's ok. The price/performance ratio is unmatched.
hashbig 4 hours ago||
I had Gemini 3 Flash hit me this morning with "you're absolutely right" when I corrected a mistake it made. It's not conclusive of anything.
polyrand 2 hours ago|||
That's interesting, thanks for sharing!

It's a pattern I saw more often with Claude Code, at least in terms of how frequently it says it (much improved now). But it's true that this pattern alone is not enough to infer the training methods.

theptip 2 hours ago|||
Or it’s conclusive of an even broader trend!
Havoc 7 hours ago||
>Even the "you're absolutely right" was there.

I don't think that's particularly conclusive for training on other models. It seems plausible to me that the internet data corpus simply converges on this, hence multiple models doing it.

...or not...hard to tell either way.

Tiberium 12 hours ago||
The frontend examples, especially the first one, look uncannily similar to what Gemini 3 Pro usually produces. Make of that what you will :)

EDIT: Also checked the chats they shared, and the thinking process is very similar to the raw (not the summarized) Gemini 3 CoT. All the bold sections, numbered lists. It's a very unique CoT style that only Gemini 3 had before today :)

orbital-decay 1 hour ago||
Yeah, I think it sometimes even repeats Gemini's injected platform instructions. It's pretty curious, because a) Gemini uses something closer to the "chain of draft" and never repeats them in full naturally, only the relevant part, and b) these instructions don't seem to have any effect in GLM: it repeats them in the CoT but never follows them. Which is a real problem with any CoT trained through RL (the meaning diverges from the natural language due to reward hacking). Is it possible they used it in the initial SFT pass to improve the CoT readability?
reissbaker 12 hours ago|||
I don't mind if they're distilling frontier models to make them cheaper, and open-sourcing the weights!
Imustaskforhelp 11 hours ago||
Same, although Gemini 3 Flash already gives it a run for its money on price. But a part of me really wants the open weights too, because that way, if I ever want to, I can have privacy or get my own hardware to run it.

I genuinely hope that Gemini 3 Flash gets open-sourced, but I feel like that could actually crash the AI bubble if it happened. Even though there are still some issues with vibing with the model itself, I find it very competent and fast overall; there might be some placebo effect at this point, but the model really does feel solid.

Like, most Western countries wouldn't really have a point or an incentive to compete if someone open-sourced the model, because the competition would shift to the providers and their speeds (like how Groq and Cerebras have insane speed).

I had heard that Google would allow institutions like universities to self-host Gemini models or similar, so there's a chance the AI bubble pops if Gemini or other top-tier models accidentally leak. But I genuinely doubt that happens, and there are many other ways the AI bubble could pop.

ImprobableTruth 10 hours ago||
How is the raw Gemini 3 CoT accessed? Isn't it hidden?
Tiberium 10 hours ago||
There are tricks on the API to get access to the raw Gemini 3 CoT; it's extremely easy compared to getting the CoT of GPT-5 (very, very hard).
ceroxylon 7 hours ago||
What are you referring to? I see the 'reasoning' in OpenRouter for GPT-5.2; I was under the impression that was the CoT.
Tiberium 7 hours ago||
Yes, that's exactly what I'm referring to. When you're using the direct Gemini API (AI Studio/Vertex), with specific tricks you can get the raw reasoning/CoT output of the model, not the summary.
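
For reference, a hedged curl sketch of pulling that reasoning field out of OpenRouter; the model slug is just an example, and whether a given model/provider actually exposes reasoning (or needs OpenRouter's reasoning request option turned on) is an assumption, not something stated here.

    # Request a completion and print the message's "reasoning" field,
    # which is only populated when the provider returns reasoning tokens.
    curl -s https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "z-ai/glm-4.7",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}]
      }' | jq -r '.choices[0].message.reasoning'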
mrbonner 8 hours ago||
I tried this on the OpenRouter chat interface to write a few documents. Quick thoughts: its writing has less of an AI vibe, due to the lack of em-dashes! I primarily use Kimi2 Thinking for personal usage. Kimi's writing is also very good, on par with frontier models like Sonnet or Gemini. But, just like them, Kimi2 still feels like AI. I can't quantify or explain why, though.

For work, it is Claude Code and Anthropic exclusively.

swyx 5 hours ago||
> Preserved Thinking: In coding agent scenarios, GLM-4.7 automatically retains all thinking blocks across multi-turn conversations, reusing the existing reasoning instead of re-deriving from scratch. This reduces information loss and inconsistencies, and is well-suited for long-horizon, complex tasks.

does it NOT already do this? the image doesn't show any before/after, so i don't see the difference
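
For what it's worth, a hedged sketch of what "retaining thinking blocks" means at the request level, using Anthropic-style content blocks: the assistant's earlier thinking is sent back verbatim in later turns instead of being stripped. The block shape follows the Anthropic Messages API convention (real responses also attach a signature to each thinking block); whether z.ai's endpoint matches it exactly is an assumption.

    # Turn 2: the prior assistant turn still carries its "thinking" block,
    # so the model can reuse that reasoning instead of re-deriving it.
    curl -s "$BASE_URL/v1/messages" \
      -H "x-api-key: $API_KEY" \
      -H "content-type: application/json" \
      -d '{
        "model": "glm-4.7",
        "max_tokens": 1024,
        "messages": [
          {"role": "user", "content": "Refactor the parser module."},
          {"role": "assistant", "content": [
            {"type": "thinking", "thinking": "Lexing and AST construction are coupled; split them first..."},
            {"type": "text", "text": "Splitting lexing out of the AST builder now."}
          ]},
          {"role": "user", "content": "Now add error recovery to the lexer."}
        ]
      }'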

desireco42 10 hours ago||
I've been using the Z.Ai coding plan for the last few months; generally a very pleasant experience. I think GLM-4.6 had some issues, which this release corrects.

Overall a solid offering; they have an MCP you plug into Claude Code or OpenCode and it just works.

jbm 8 hours ago|
I'm surprised by this; I have it too and was running it through OpenCode, but I gave up and moved back to Claude Code. I was not able to get it to generate any useful code for me.

How did you manage to use it? I am wondering if maybe I was using it incorrectly, or needed to include different context to get something useful out of it.

csomar 3 hours ago|||
I've been using it for the last couple of months. In many cases, it was superior to Gemini 3 Pro. One thing about Claude Code: it delegates certain tasks to glm-4.5-air, and that drops performance a ton. What I did was set all the default models to 4.6 (now 4.7):

Be careful: this makes you run through your quota very fast (the smaller models have much higher quotas).

    ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.7
    ANTHROPIC_DEFAULT_MODEL=glm-4.7
    ANTHROPIC_DEFAULT_OPUS_MODEL=glm-4.7
    ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
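
A minimal sketch of applying those overrides for a single shell session and then launching Claude Code; it assumes the z.ai endpoint and credentials are already configured separately (per their docs), which the snippet above doesn't cover.

    # Route every Claude Code model tier (including the Haiku-class helper
    # tasks) to glm-4.7 for this session, then start the CLI.
    export ANTHROPIC_DEFAULT_HAIKU_MODEL=glm-4.7
    export ANTHROPIC_DEFAULT_MODEL=glm-4.7
    export ANTHROPIC_DEFAULT_OPUS_MODEL=glm-4.7
    export ANTHROPIC_DEFAULT_SONNET_MODEL=glm-4.7
    claude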
big_man_ting 6 hours ago|||
i'm in the same boat as you. i really wanted to like OpenCode but it doesn't seem to work properly for me. i keep going back to CC.
XCSme 12 hours ago||
Funny how they didn't include Gemini 3.0 Pro in the bar chart comparison, considering that it seems to do the best in the table view.
jychang 12 hours ago||
Also, funny how they included GPT-5.0 and 5.1 but not 5.2... I'm pretty sure they ran the benchmarks for 5.0, then 5.1 came out, so they ran the benchmarks for 5.1... and then 5.2 came out and they threw their hands up in the air and said "fuck it".
rynn 9 hours ago|||
gpt-5.2 codex isn't available in the API yet.

If you want to be picky, they could've compared it against gpt-5 pro, gpt-5.2, gpt-5.1, gpt-5.1-codex-max, or gpt-5.2 pro,

all depending on when they ran benchmarks (unless, of course, they are simply copying OAI's marketing).

At some point it's enough to give OAI a fair shot and let OAI come out with their own PR, which they doubtlessly will.

XCSme 12 hours ago||||
I didn't even notice that; I assumed it was the latest GPT version.
amelius 10 hours ago|||
after or before running the benchmarks?
guluarte 11 hours ago||
Gemini is garbage and does its own thing most of the time, ignoring the instructions.
gigatexal 12 hours ago||
Even if this is one or two iterations behind the big models (Claude, OpenAI, or Gemini), it's showing large gains. Here's hoping this gets even better and better, and that I can run it locally without it melting my PC.
Imustaskforhelp 11 hours ago|
One would hope to be able to run it locally (I hope so too, but with the increase in RAM prices I doubt it; I feel like it's possible around 2027-2028). But even if we can't in the meantime, I am sure that competition in general (on places like OpenRouter and others) will be a meaningful way to push prices down even further than the monopolistic ways of, let's say, Claude.

It does feel like these models are only about 6 months behind though, as many like to say; for some things it's 100% reasonable to use them, and for some others not so much.

cmrdporcupine 12 hours ago||
Running it in Crush right now and so far fairly impressed. It seems roughly in the same zone as Sonnet, but not as good as Opus or GPT 5.2.
alok-g 6 hours ago|
For others like me who did not know about Crush:

https://github.com/charmbracelet/crush

https://news.ycombinator.com/item?id=44736176

tonyhart7 10 hours ago|
Less than 30 bucks for the entire year, insanely cheap.

(I know you pay for it in privacy), but for just playing around it's still worth it imo.

sumedh 3 hours ago|
Are you saying the reason they're offering it so cheap is that they're training on user data?