Posted by serjester 3/31/2025
How many years do we have to keep hearing this line? ChatGPT is two years old and still can't be relied on.
As far as I know, LLM hallucinations are inherent to them and will never be completely eliminated. If I book a flight, I want 100.0% reliability, not 99% (which we are still far from today).
People have got to take LLMs for what they are: good bullshitters, awesome at translating text or reformulating words, but not designed to think or act as a replacement secretary. Merely a secretary's tool.
Ofc I'm not defending all the hype, and I look forward to more advanced models that get it right more often.
But I do laugh at the tech people and managers who expect ML, which is based on an analog process, to be sterile and clean like a digital environment.
All of reality is probabilistic. Expecting that to map deterministically onto solving open-ended, complex problems is absurd. It's vectors all the way down.
"It works when it works" is fine when the stakes are low and a human is in the loop, like artwork for a blog post. And so in a way, I agree with you. AI doesn't belong in intermediate computer-to-computer interactions unless the stakes are low. What scares me is that the AI optimists are desperately looking to apply LLMs to domains and tasks where the cost of mistakes is high.
That's what we thought about AI in general, too, and we spent decades toiling on rules-based systems. Then interpretability was thrown out the window, we started letting deep learning algorithms run wild with endless compute, and we looked at the actual results. This will be very similar.
TCP creates an abstraction layer with more reliability than what it's built on. If you can detect failure, you can create a retry loop, assuming you can understand the rules of the environment you're operating in.
Indeed, this is what makes autonomous, agentic, tool-using systems robust as well. Those retry loops become ad hoc where needed, and the agent can self-correct based on error responses, whereas a defined workflow would get stuck in said loop if it couldn't figure things out, or just error out the whole process.
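The retry-loop idea above can be sketched in a few lines. This is a minimal illustration, not anyone's actual implementation: the `make_flaky` helper and its failure model are made up for the example, standing in for any operation whose failure you can detect.

```python
def make_flaky(fail_times):
    """Hypothetical unreliable operation: raises ConnectionError
    on the first `fail_times` calls, then succeeds."""
    calls = {"n": 0}

    def op():
        calls["n"] += 1
        if calls["n"] <= fail_times:
            raise ConnectionError("transient failure")
        return "ok"

    return op


def with_retries(op, attempts=5):
    """Detect failure (the exception) and retry, building a more
    reliable call on top of an unreliable one -- the same trick TCP
    uses with acknowledgements and retransmission."""
    last_err = None
    for _ in range(attempts):
        try:
            return op()
        except ConnectionError as err:
            last_err = err
    raise last_err  # give up after exhausting the retry budget


print(with_retries(make_flaky(2)))  # succeeds on the third attempt
```

The key precondition from the comment holds here too: this only works because failure is detectable (the exception) and the rules of the environment are understood (retrying is safe).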