GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2

Posted by oshrimpton 4 days ago

GPT-5.5 hallucinates 3x more than MIT-licensed GLM-5.2 (arrowtsx.dev)

577 points | 292 comments | page 4

zuzuen_1 4 days ago|

I think we need a better classification and taxonomy of erroneous LLM behaviors than the catch-all term "hallucinate".

metalspot 4 days ago||

Hallucination is good for tasks that have an external oracle, like computer programming.

dgellow 3 days ago||

Could you explain what you mean? That feels like a waste of processing to me. Yes, the model will correct itself once it eventually runs a compiler/linter. But that's still wasted time and compute.

sometimelurker 3 days ago||

Ehh, you're right, but there's a lot of nuance here. If you have a system that doesn't hallucinate a ton and is still very "creative", that's great, and probably much better than a hallucinating system regardless of its creativity. I'm reminded of theorem-proving LLMs working in Lean producing millions of slop proofs until one works, but if you have something like that, simple RLVR should fix it (the external oracle can be the judge for the RL).
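To make the "external oracle" idea concrete, here is a minimal sketch of a propose-and-verify loop. Everything in it is a hypothetical stand-in: `toy_propose` plays the role of a model that guesses freely, and `toy_verify` plays the role of a compiler or proof checker that accepts or rejects each guess mechanically. It is not how any particular theorem-proving system works, just the general shape of the loop.

```python
import random
from typing import Callable, Optional, Tuple


def propose_until_verified(
    propose: Callable[[random.Random], str],
    verify: Callable[[str], bool],
    max_attempts: int = 1000,
    seed: int = 0,
) -> Tuple[Optional[str], int]:
    """Sample candidates until the oracle accepts one or the budget runs out.

    Returns (accepted_candidate_or_None, number_of_attempts_used).
    """
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = propose(rng)
        if verify(candidate):
            return candidate, attempt
    return None, max_attempts


# Hypothetical "model": guesses a sum of two digits with no regard for truth.
def toy_propose(rng: random.Random) -> str:
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    return f"{a} + {b}"


# Hypothetical "oracle": mechanically checks whether the guess adds up to 10.
def toy_verify(expr: str) -> bool:
    a, b = (int(x) for x in expr.split(" + "))
    return a + b == 10


if __name__ == "__main__":
    result, attempts = propose_until_verified(toy_propose, toy_verify)
    print(f"accepted {result!r} after {attempts} attempt(s)")
```

The attempt count is the "wasted compute" raised earlier in the thread: the oracle guarantees that whatever is returned is correct, but it does nothing to make the rejected guesses cheaper. In an RLVR-style setup the same accept/reject signal would be used as a training reward rather than only as a filter.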

anArbitraryOne 4 days ago||

It's fine if it hallucinates, as long as it sounds overconfident

brown_munda 4 days ago||

GLM 5.2 is really impressive at design as well. Overall loving it.

gitaarik 3 days ago||

This reminds me of the Missing Dollar Riddle [1], where the listener is deliberately put on a wrong thinking path in order to fool them.

With your own logical thinking you might never arrive at this confusion, but if you've never heard the riddle before, you might be tricked by it.

But as we grow in life, and get experience, we learn about these riddles and aren't fooled as easily anymore.

Maybe it'll work like that for LLMs too?

[1]: https://en.wikipedia.org/wiki/Missing_dollar_riddle
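For anyone who hasn't seen it, the arithmetic can be checked in a few lines. This is a sketch using the commonly told version of the riddle (three guests pay $30, the room turns out to cost $25, the bellhop pockets $2 and hands back $3); the variable names are just for illustration.

```python
# Commonly told version of the riddle (amounts in dollars).
paid_initially = 30      # three guests pay 10 each
room_price = 25          # what the hotel actually keeps
bellhop_keeps = 2        # pocketed by the bellhop
refund_to_guests = 3     # 1 returned to each guest

# What the guests are out of pocket after the refund: 9 each.
net_paid = paid_initially - refund_to_guests

# Correct accounting: the 27 paid is split between hotel and bellhop.
assert net_paid == room_price + bellhop_keeps

# Correct accounting: the 27 paid plus the 3 refunded is the original 30.
assert net_paid + refund_to_guests == paid_initially

# The riddle's trick: adding the bellhop's 2 to the 27, which already
# contains it, and then comparing the result to 30.
misleading_sum = net_paid + bellhop_keeps
print(misleading_sum)
```

The "missing" dollar only appears because the riddle adds a quantity that should be subtracted; no money is unaccounted for.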

ecommerceguy 3 days ago||

It's very much looking like OpenAI will be bailed out, along with all the other Capex'ers. I say this because the Trump admin (I feel partially at fault because I voted for him) has indicated they will be bailing out the entire AI Stargate, from Intel and AMD to Amazon and Anthropic. I know a lot of everyday folks that absolutely hate - passionately HATE - anything and everything tech bro. Downvote all you want, that's the reality. They see Palantir et al as evil and demonic.

dgellow 4 days ago||

> One of the biggest models in the world was banned because a single jailbreak was too much of a risk.

We really don't know what the actual reason is, given the politics at play. I would bet more on the Trump administration looking for any excuse to punish Anthropic.

remix2000 4 days ago||

Calling LLM slop "hallucinating" is so counter-productive imo. After all, LLMs are just a variant of Markov chains, and as such this technology isn't able to discern falsehoods from truths. It's like trying to use a barometer to tell the time.

hit8run 4 days ago|

You are also just a variant of Markov chains wired in your brain. So what are you complaining about?

remix2000 4 days ago|||

Well, the difference here is that you're overly simplifying complex biology and many other factors, whereas LLMs are in fact actually simple mathematical models. As always, the devil lies in the details. Dismissing intricacies is a useful tool for daydreamers, not so much for engineers.

sometimelurker 3 days ago||

LLMs actually aren't simple Markov chains tho, you're also simplifying. And LLMs trained with RLVR aren't just optimized over the space of functions (like GPT-2 was), they're optimized over the space of programs (programs under some length). You find the ideal algorithm that can do the task you need it to.

qwery2 3 days ago|||

RLVR is a process which updates the Markov chain.

__natty__ 4 days ago|||

And often it’s not perfect either. Just because one is true doesn’t mean it dismisses the other.

Naveja 4 days ago||

Loving GLM 5.2 personally.

metalman 3 days ago|

To paraphrase the title: "in the land of the insane, those who are merely delusional will rule".
