Posted by oshrimpton 4 days ago
Maybe the pre-2024 users do, but I've seen plenty of those exact "frontier models never hallucinate" comments on HN as well
> Why is everyone expecting LLMs to be like the Star Trek computer?
Because they are often marketed as magic AIs, not as mere language models.
[0] https://bpspsychub.onlinelibrary.wiley.com/doi/10.1111/bjso....
The OSS models are impressive, but it's pretty clear how quickly they fall off once you use them outside the narrow set of problems they benchmarked well on, compared to opus/5.5.
From how they measure it, a model that simply answers "I don't know." to any prompt would be the one that hallucinates the least. So it's not surprising at all that a smaller model can perform better.
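To make that concrete, here is a toy sketch. The scoring rule is an assumption on my part (hallucination rate = share of prompts answered wrongly, with abstentions not counted as wrong), not the benchmark's actual definition, and the questions and answers are made up for illustration.

```python
# Toy illustration only: the metric definition below is assumed,
# not taken from any specific benchmark.
ABSTAIN = "I don't know."

def hallucination_rate(answers, truths):
    """Share of prompts that got a wrong, non-abstaining answer."""
    wrong = sum(1 for a, t in zip(answers, truths) if a != ABSTAIN and a != t)
    return wrong / len(truths)

def accuracy(answers, truths):
    """Share of prompts answered correctly."""
    correct = sum(1 for a, t in zip(answers, truths) if a == t)
    return correct / len(truths)

# Hypothetical ground truth and two hypothetical models.
truths = ["Paris", "4", "Au", "1969"]
always_abstain = [ABSTAIN] * len(truths)
eager_model = ["Paris", "4", "Ag", "1969"]  # one wrong answer

abstain_scores = (hallucination_rate(always_abstain, truths),
                  accuracy(always_abstain, truths))
eager_scores = (hallucination_rate(eager_model, truths),
                accuracy(eager_model, truths))

print(abstain_scores)  # (0.0, 0.0)
print(eager_scores)    # (0.25, 0.75)
```

Under this rule the always-abstain model "wins" on hallucinations while being useless, which is why the rate only means something when read next to accuracy or coverage.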
A key method to help with hallucinations is to provide good sources when asking questions (context engineering / knowledge base).
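A minimal sketch of what that can look like in practice, assuming you already have the relevant passages in hand. The function name `build_grounded_prompt` and the prompt wording are hypothetical, and the retrieval step and the model call are left out.

```python
def build_grounded_prompt(question, sources):
    """Put numbered source passages in front of the question.

    Hypothetical helper: the instruction wording is just one option.
    """
    numbered = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(sources, start=1)
    )
    return (
        "Answer using only the sources below. Cite source numbers, "
        'and reply "not in the sources" if they do not cover the question.\n\n'
        f"Sources:\n{numbered}\n\n"
        f"Question: {question}"
    )

# Made-up passages for illustration.
example_sources = [
    "Time dilation: ship time equals rest-frame time divided by the Lorentz factor.",
    "A light year is the distance light travels in one year.",
]
prompt = build_grounded_prompt(
    "How long does a 20 light year trip feel on board?", example_sources
)
print(prompt)
```

Giving the model an explicit way out ("not in the sources") is the part that tends to matter, since otherwise it fills gaps from memory.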
N=1, but I disagree strongly. I'm writing a hard-science science fiction story, and the physics of it is at (and frankly, beyond) my skillset. The story's plot has had to change over a dozen times as I realized errors in my application of physics in the story.
Throughout, I've been reviewing the physics with LLMs, mainly Gemini 3.1 Pro Preview, but also with Claude and OpenAI. Often I have the LLMs debate each other -- "My friend [another model] said XYZ about the physics, is that right or wrong?" In almost all cases, Gemini explains why the other models are wrong, and when I send its explanation to them, they concede it is right and they are wrong.
As I said, I did the above checks literally dozens of times as I wrote the story. And everything was dialed in: no further issues claimed by anyone, me or the LLMs.
Not with Fable. I managed to get it to review the story while it was running, and it listed out something like ten issues: some minor, some general knowledge-based, and two that were impressive:
1. It pointed out where Gemini (and I, and other LLMs) had missed a , resulting in values about 152 times larger than they should have been. I sent that to Gemini and it fully conceded that it had been wrong all along.
2. It pointed out a simple inconsistency in the application of special relativity (I thought I had that at least dialed in, but no :-/ ) that affected a very specific plot point. The story is novella-length, about 28,000 words long, and this is a point that was mentioned in the first two pages, and then not again until the very last page. And it's obvious, once you realize it. And I missed it. Gemini missed it. Claude and ChatGPT missed it.
Only Fable found it. Again, N=1, but that was a remarkable run I got out of it in the couple days it was available.
Fable gave a description so deep that even I couldn't figure out what was going on and had to ask it to give me a simpler explanation.
In my case two people are making very-near-light-speed trips to a star 20-ish light years away. Originally, I had one leaving a month earlier and making the journey with a Lorentz factor of 40, while the protagonist takes the same trip at > 200.
The former experiences a trip of 6 months, the latter something like 25 days. And I wrote it as if that meant that the protagonist would get there months ahead. But as seen from the departure and destination, both of them take only hours to a couple of days longer than light does, so the one who leaves a month earlier still arrives almost a month earlier.
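For anyone who wants to check the numbers, here is a rough sketch. Assumptions that are mine and not from the story: exactly 20 light years, constant speed the whole way (no acceleration phases), a 30-day head start, and a Lorentz factor of exactly 200 for the protagonist (the lower bound of "> 200").

```python
import math

DAYS_PER_YEAR = 365.25

def trip(distance_ly, gamma):
    """Return (ship_days, lag_days) for a constant-speed trip.

    ship_days: time experienced on board.
    lag_days:  how much later than a light signal the ship arrives,
               measured in the rest frame of the two stars.
    """
    beta = math.sqrt(1.0 - 1.0 / gamma**2)   # speed as a fraction of c
    rest_frame_years = distance_ly / beta     # trip time seen from the stars
    ship_years = rest_frame_years / gamma     # time dilation
    lag_days = (rest_frame_years - distance_ly) * DAYS_PER_YEAR
    return ship_years * DAYS_PER_YEAR, lag_days

DISTANCE_LY = 20.0       # assumed: "20-ish" taken as exactly 20
HEAD_START_DAYS = 30.0   # assumed: "a month" taken as 30 days

slow_ship_days, slow_lag_days = trip(DISTANCE_LY, 40.0)
fast_ship_days, fast_lag_days = trip(DISTANCE_LY, 200.0)

# Positive value: the early traveler arrives this many days before the protagonist.
early_arrival_lead_days = HEAD_START_DAYS - slow_lag_days + fast_lag_days

print(f"gamma 40:  {slow_ship_days:.1f} days on board, {slow_lag_days:.2f} days behind light")
print(f"gamma 200: {fast_ship_days:.1f} days on board, {fast_lag_days * 24:.1f} hours behind light")
print(f"early traveler arrives {early_arrival_lead_days:.1f} days ahead")
```

With these assumptions the gamma 40 ship lands roughly two days behind a light signal and the gamma 200 ship a couple of hours behind, so the month-long head start survives almost intact. The on-board times differ by months, but that difference never shows up in arrival order.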
That error sat in my manuscript for two months of back and forth with other models. Fable found it on the first go.
LMK if you want to trade manuscripts!
What about using two models, with a smaller model used for this kind of negative reasoning?