
Posted by serjester 3/31/2025

AI agents: Less capability, more reliability, please(www.sergey.fyi)
423 points | 253 comments
bobosha 4/1/2025|
I think this agents-vs-workflows framing is a false dichotomy. A workflow - at least as I understand it - is the atomic unit of an agent, i.e. an agent stitches workflows together.
segh 3/31/2025||
Lots of people are building on the edge of current AI capabilities, where things don't quite work, because in 6 months when the AI labs release a more capable model, you will just be able to plug it in and have it work consistently.
cube00 3/31/2025||
> because in 6 months when the AI labs release a more capable model

How many years do we have to keep hearing this line? ChatGPT is two years old and still can't be relied on.

postexitus 3/31/2025|||
And where is that product that was developed on the edge of current AI capabilities and, with the latest AI model plugged in, is now suddenly working consistently? All I'm seeing is models getting better and better at generating videos of spaghetti-eating movie stars.
alltoowell 3/31/2025|||
They're coming. I've seen the observability tools try to do this, but I still have to tweak it; it's just time-consuming. Empromptu.ai is the closest to solving this problem. They're the only ones with a library that you install to do system optimization and evals for accuracy in real time.
segh 3/31/2025|||
For me, they have come from the AI labs themselves. I have been impressed with Claude Code and OpenAI's Deep Research.
vslira 3/31/2025||
While I'm bullish on AI capabilities, that is not a very optimistic observation for developers building on top of it.
techpineapple 3/31/2025||
In 6 months, when FSD is completed and we get robots in every home? I suspect we keep adding features because reliability is hard. I don't know what heuristic you'd be looking at to conclude that this problem will eventually be solved by current AI paradigms.
thornewolf 3/31/2025||
The GP's comment is describing what has already happened "every 6 months", multiple times over.
techblaze3 4/1/2025||
Appreciate the effort in writing this.
daxfohl 3/31/2025||
We can barely make deterministic distributed services reliable. And microservices now have a bad reputation for being expensive distributed spaghetti. I'm not holding my breath for distributed AI agents to be a thing.
asdev 3/31/2025||
Want reliability? Build automation instead of using non-deterministic models to complete tasks.
anishpalakurT 3/31/2025||
Check out BAML at boundaryml.com
alltoowell 3/31/2025|
BAML isn't great. You need to write it in their format, and it still doesn't really solve the accuracy problem from humans interacting with your system that we're talking about here.
revskill 4/1/2025||
AI can understand its output.
aucisson_masque 4/1/2025||
Can you actually make LLMs more reliable, though?

As far as I know, LLM hallucinations are inherent to them and will never be completely removed. If I book a flight, I want 100.0% reliability, not 99% (which we are still far away from today).

People have got to take LLMs for what they are: good bullshitters, awesome at translating text or reformulating words, but not designed to think or to be a replacement secretary. Merely a secretary's tool.
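The gap between 99% and 100.0% matters more than it looks, because errors compound across chained steps. A quick back-of-envelope (my numbers, not from the thread): an agent chaining many 99%-reliable calls is far less than 99% reliable end to end.

```python
# End-to-end reliability of a chain of independent 99%-reliable steps.
per_step = 0.99
for steps in (1, 5, 20, 50):
    print(f"{steps} steps -> {per_step ** steps:.3f} end-to-end")
# 20 steps already drop below 82%, and 50 steps below 61%.
```

This is why "pretty reliable per call" and "reliable enough to book flights" are different claims.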

fennecbutt 4/1/2025||
Lmao, training models off what is essentially a process directly inspired by imperfect, "good enough" biological processes and expecting them to be calculators.

Ofc I'm not defending all the hype, and I look forward to more advanced models that get it right more often.

But I do laugh at the tech people and managers who expect ML based on an analog process to be sterile and clean like a digital environment.

ramesh31 3/31/2025|
More capability, less reliability please. I want something that can achieve superhuman results 1 out of 10 times, not something that gives mediocre human results 9 out of 10 times.

All of reality is probabilistic. Expecting that to map deterministically to solving open ended complex problems is absurd. It's vectors all the way down.

klabb3 3/31/2025||
Reality is probabilistic yes but it’s not black box. We can improve our systems by understanding and addressing the flaws in our engineering. Do you want probabilistic black-box banking? Flight controls? Insurance?

”It works when it works” is fine when stakes are low and human is in the loop, like artwork for a blog post. And so in a way, I agree with you. AI doesn’t belong in intermediate computer-to-computer interactions, unless the stakes are low. What scares me is that the AI optimists are desperately looking to apply LLMs to domains and tasks where the cost of mistakes are high.

soulofmischief 3/31/2025|||
Stability is the bedrock of the evolution of stable systems. LLMs will not democratize software until an average person can get consistently decent and useful results without needing to be a senior engineer capable of a thorough audit.
ramesh31 3/31/2025||
>Stability is the bedrock of the evolution of stable systems.

So we also thought with AI in general, and spent decades toiling on rules based systems. Until interpretability was thrown out the window and we just started letting deep learning algorithms run wild with endless compute, and looked at the actual results. This will be very similar.

klabb3 3/31/2025|||
This can be explained easily – there are simply some domains that were hard to model, and those are the ones where AI is outperforming humans. Natural language is the canonical example of this. Just because we focus on those domains now due to the recent advancements, doesn’t mean that AI will be better at every domain, especially the ones we understand exceptionally well. In fact, all evidence suggests that AI excels at some tasks and struggles with others. The null hypothesis should be that it continues to be the case, even as capability improves. Not all computation is the same.
skydhash 3/31/2025||||
Rules based systems are quite useful, not for interacting with an untrained human, but for getting things done. Deep learning can be good at exploring the edges of a problem space, but when a solution is found, we can actually get to the doing part.
soulofmischief 3/31/2025|||
Stability and probability are orthogonal concepts. You can have stable probabilistic systems. Look no further than our own universe, where everything is ultimately probabilistic and not "rules-based".
recursive 3/31/2025|||
> Expecting that to map deterministically to solving open ended complex problems is absurd.

TCP creates an abstraction layer with more reliability than what it's built on. If you can detect failure, you can create a retry loop, assuming you can understand the rules of the environment you're operating in.
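That detect-and-retry idea can be sketched in a few lines: wrap an unreliable call in a retry loop guarded by a deterministic validity check. A minimal illustration only; `flaky` is a hypothetical stand-in for a model call, not any real API.

```python
def call_with_retry(fn, is_valid, max_attempts=3):
    """Wrap an unreliable call in a retry loop guarded by a
    deterministic validity check -- the TCP idea in miniature."""
    for _ in range(max_attempts):
        result = fn()
        if is_valid(result):
            return result
    raise RuntimeError(f"no valid result in {max_attempts} attempts")

# Demo: a stand-in that fails twice, then succeeds.
attempts = []
def flaky():
    attempts.append(1)
    return "ok" if len(attempts) >= 3 else ""

result = call_with_retry(flaky, is_valid=lambda s: s == "ok", max_attempts=5)
```

The catch is exactly the caveat above: this only works when `is_valid` can actually detect failure in the environment you're operating in.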

ramesh31 3/31/2025||
>If you can detect failure, you can create a retry loop, assuming you can understand the rules of the environment you're operating in

Indeed, this is what makes autonomous, agentic tool-using systems robust as well. Those retry loops become ad hoc where needed, and the agent can self-correct based on error responses, whereas a defined workflow would get stuck in said loop if it couldn't figure things out, or just error out the whole process.
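A minimal sketch of that ad-hoc self-correction loop, with stub functions standing in for the model and the tool (all names here are hypothetical, not from any real agent framework):

```python
def agentic_retry(call_model, run_tool, task, max_attempts=3):
    """Feed the tool's error back into the model's context so the
    next attempt can self-correct, instead of a fixed workflow
    erroring out the whole process on the first bad attempt."""
    feedback = ""
    for _ in range(max_attempts):
        action = call_model(task, feedback)
        try:
            return run_tool(action)
        except Exception as err:
            feedback = f"previous attempt failed: {err}"
    raise RuntimeError("agent could not recover")

# Stub model/tool pair: the "model" fixes its action once it sees feedback.
def call_model(task, feedback):
    return "fixed-action" if feedback else "broken-action"

def run_tool(action):
    if action == "broken-action":
        raise ValueError("tool rejected action")
    return "done"

result = agentic_retry(call_model, run_tool, "book a flight")
```

The difference from a fixed workflow is that the retry isn't a predefined branch: the error message itself becomes input to the next attempt.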

Jianghong94 3/31/2025|||
Superhuman results 1 out of 10 times are, in fact, a very strong reliability guarantee (maybe not up to the many-nines standard we're accustomed to today, but probably much higher than any agent in a real-world workflow).
deprave 3/31/2025||
What would be a superhuman result for booking a flight?
mjmsmith 3/31/2025||
10% of the time the seat on either side of you is empty, 90% of the time you land in the wrong country.