Posted by pretext 16 hours ago

GLM-4.7: Advancing the Coding Capability(z.ai)
326 points | 152 comments | page 3
larodi 13 hours ago||
From my limited exposure to these models, they seem very very very promising.
maxdo 13 hours ago||
Funny enough they excluded 4.5 opus :)
zaiguru 6 hours ago||
I'm completely blown away by ZAI GLM 4.7.

Great performance for coding after I snatched a pretty good deal 50%+20%+10%(with bonus link) off.

60x Claude Code Pro Performance for Max Plan for almost the same price. Unbelievable

If anyone cares to subscribe, here is a link:

You’ve been invited to join the GLM Coding Plan! Enjoy full support for Claude Code, Cline, and 10+ top coding tools — starting at just $3/month. Subscribe now and grab the limited-time deal! Link:

https://z.ai/subscribe?ic=OUCO7ISEDB

zaiguru 6 hours ago||
I'm completely blown away by ZAI GLM 4.7.

Great performance for coding after I snatched a pretty good deal 50%+20%+10%(with bonus link) off.

60x Claude Code Pro Performance for Max Plan for almost the same price. Unbelievable

If anyone cares to subscribe, here is a link:

https://z.ai/subscribe?ic=OUCO7ISEDB

emp17344 6 hours ago|
This guy keeps spamming the same comment. Pretty sure this is a bot.
observationist 12 hours ago|
Grok 4 Heavy wasn't considered in the comparisons. Grok meets or exceeds the same benchmarks that Gemini 3 excels at, saturating MMLU and scoring highest on many of the coding-specific benchmarks. Overall better than Claude 4.5, in my experience, not just with the benchmarks.

Benchmarks aren't everything, but if you're going to contrast performance against a selection of top models, then pick the top models? I've seen a handful of companies do this, including big labs, where they conveniently leave out significant competitors, and it comes across as insecure and petty.

Claude has better tooling and UX. xAI isn't nearly as focused on the app and the ecosystem of tools around it and so on, so a lot of things end up more or less an afterthought, with nearly all the focus going toward the AI development.

$300/month is a lot, and it's not as fast as other models, so it should be easy to sell GLM as almost as good as the very expensive, slow Grok Heavy, and so on.

GLM has a 128k context window, Grok 4 Heavy 256k, etc.

Nitpicking aside, the fact that they've got an open model that is just a smidge less capable than the multibillion dollar state of the art models is fantastic. Should hopefully see GLM 4.7 showing up on the private hosting platforms before long. We're still a year or two from consumer gear starting to get enough memory and power to handle the big models. Prosumer Mac rigs can get up there, quantized, but quantized performance is rickety at best, and at that point you look at the costs of self-hosting vs private hosts vs $200/$300 a month (+ continual upgrades).

Frontier labs only have a few years left where they can continue to charge a pile for the flagship heavyweight models; I don't think most people will be willing to pay $300 for a 5 or 10% boost over what they can run locally.

nl 9 hours ago||
It seems like someone at X.ai likes maxing benchmarks but real world usage shows it significantly behind frontier models.

I do appreciate their desire to be the most popular coding model on OpenRouter and offer Grok4-Fast for free. That's a notable step down from frontier models but fine for lots of bug fixing. I've put hundreds of millions of tokens through it.

Alifatisk 12 hours ago|||
In my experience, Grok 4 expert performs way worse than what the benchmarks say.

I’ve tried it with coding, writing and instruction following. The only thing it currently excels at is searching for things across the web + Twitter.

Otherwise, I would never use it for anything else. At coding, it always includes an error; when it patches it, it introduces another one. When writing creative text while following instructions, it hallucinates a lot.

Based on my experience, I suspect xAI of bench-maxing on Artificial Analysis, because there is no way Grok 4 expert performs close to GPT-5.2, Claude Sonnet 4.5 and Gemini 3 Pro.

Alifatisk 51 minutes ago||
Excuse my grammar error, I wrote this shortly before falling asleep
lame-robot-hoax 12 hours ago|||
Grok, in my experience, is extremely prone to hallucinations when not used for coding. It will readily claim to have access to internal Slack channels at companies, it will hallucinate scientific papers that do not exist, etc. to back its claims.

I don’t know if the hallucinations extend to code, but it makes me unwilling to consider using it.

observationist 12 hours ago|||
Fair - it's gotten significantly better over the last 4 months or so, and hallucinations aren't nearly as bad as they once were. When I was using Heavy, it was excellent at ensuring grounding and factual statements, but it's not worth $100 more than ChatGPT Pro in capabilities or utility. In general, it's about the same as ChatGPT Pro - once every so often I'll have to call out the model making something up, but for the most part they're good at using search tools and ensuring claims get grounding and confirmation.

I do expect them to pull ahead, given the resources and the allocation of developers at xAI, so maybe at some point it'll be clearly worth paying $300 a month compared to the prices of other flagships. For now, private hosts and ChatGPT Pro are the best bang for your buck.

F7F7F7 8 hours ago||
What are you doing with GPT Pro? I've compared it directly with Claude Max x20 and Google's premium offer. I just don't see myself ever leaving Claude Code as my daily driver. Codex is slow and opaque, albeit accurate. And Gemini is just super clumsy inside of its CLI (and in OpenRouter), often confusing BASH and plans with actual output.
ls612 5 hours ago|||
I had Grok write me a 150-line shell script which it nearly one-shot, except for the fact it made a one-character typo in some file path handling code that took me an hour to diagnose. On one hand it’s so close to being really really good for coding, but on the other, with this sort of error (unlike other frontier models which have easily diagnosable error modes) it can be super frustrating. I’m hopeful we will see good things from Grok 5 in the coming months.
kristianp 12 hours ago|||
Perhaps people are steering clear of grok due to its extremist political training.
observationist 12 hours ago||
This is a silly meme.
knowsuchagency 12 hours ago||
Mecha hitler
observationist 12 hours ago||
Yes, an adventure in public facing bots that can pull from trending feeds, self referential system prompts, minimal guardrails, and that poor fellow Will Stancil.

The absence of guard rails is a good thing - what happened with mechahitler was a series of feature rollouts that combined with Pliny trending, resulting in his latest grok jailbreak ending up in the prompt, followed by the trending mechahitler tweets, and so on. They did a whole lot of new things all at once with the public facing bot, and didn't consider unintended consequences.

I'd rather a company that has a mechahitler incident and laughs it off than a company that pre-emptively clutches pearls on behalf of their customers, or smugly insists that we should just trust them, and that their vision of "safety" is best for everyone.

zamalek 10 hours ago|||
Unfortunately grok doesn't even meet that bar anymore. There was the very recent incident where it claimed Musk was the best at everything, so xAI are clearly not beyond baking in intentional bias/clutching pearls.

https://techcrunch.com/2025/11/20/grok-says-elon-musk-is-bet...

bigyabai 7 hours ago|||
> The absence of guard rails is a good thing

It's really not. I have no axe to grind with Elon, but X and its reputation for "oops we made a mistake" critical failures is a no-go. I don't feel safe signing up to try whatever their free model is when their public image is nonstop obvious mistakes. There is no world where I'm bringing those models to work, and explaining to HR why my web traffic included a Mechahitler response (or worse).

Anthropic and OpenAI are Silicon Valley circuses in a relative sense, but they take this stuff seriously and make genuine advancements. XAI could disappear tomorrow and the human race would not lose any irreplaceable research. It's a dedicated fart-huffing division on the best of days; I hope you're not personally invested in their success.

guluarte 8 hours ago|||
Opus > Codex > Gemini in my opinion, grok is not even close
Madmallard 7 hours ago|||
"Grok 4 Heavy wasn't considered in comparisons. Grok meets or exceeds the same benchmarks that Gemini 3 excels at, saturating mmlu, scoring highest on many of the coding specific benchmarks. Overall better than Claude 4.5, in my experience, not just with the benchmarks."

I think these types of comments should just be forbidden from Hacker News.

It's all feelycraft and impossible to distinguish from motivated speech.

claudiug 12 hours ago||
Every time I use Grok I get bad results. From its point of view everything is 1000% perfect, but then you review the code... "bollocks": methods that don't exist, or just one line of code, or a method created with a nice comment: //#TODO implement