Posted by serjester 3/31/2025

AI agents: Less capability, more reliability, please (www.sergey.fyi)
423 points | 253 comments
qoez 3/31/2025|
You get more reliability from better capability though. More capability means being better at not misclassifying subtle tasks, which is what causes reliability issues.
andreash 3/31/2025||
We are building this with https://lets.dev. We believe there will be great demand for less capable but much more deterministic agents. I also recommend that everyone read "What is an agent?" by Harrison Chase. https://blog.langchain.dev/what-is-an-agent/
mdaniel 4/1/2025|
That's some pretty big chutzpah, putting up a 3l33t page and a "subscribe to our mailing list" input box to draw in potential customers

But hey, this whole thread is about the warring factions about whether anything matters anymore, so don't listen to me

genevra 3/31/2025||
I agree up until the coding example. If someone doesn't know about version control, I don't think that's any fault of the company trying to stretch the technology to its limits and let people experiment. Cursor is a really cool step in a new direction, and it's weird to say we should clamp what it's doing because people might not be competent enough to fix its mistakes.
cadamsdotcom 3/31/2025||
Models aren’t great at deciding whether an action is irreversible - and thus whether to stop and ask for input/advice/approval. Hence agentic systems are usually given a policy to follow.

Perhaps the question “is this irreversible?” should be delegated to a separate model invocation.

There could be a future in which agentic systems are a tree of model and tool invocations, maybe with a shared scratchpad.
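The idea of delegating the reversibility question to a separate invocation could be sketched as follows. Everything here is illustrative: `classify_reversibility` stands in for a second model call, and the keyword heuristic is just a placeholder for it.

```python
# Sketch: gate tool actions behind a separate "is this irreversible?" check,
# as the comment above suggests. classify_reversibility is a stand-in for a
# second model invocation; the keyword heuristic is purely illustrative.

IRREVERSIBLE_HINTS = ("delete", "drop", "rm -rf", "send", "pay")

def classify_reversibility(action: str) -> bool:
    """Return True if the action looks irreversible (stand-in for a model call)."""
    return any(hint in action.lower() for hint in IRREVERSIBLE_HINTS)

def run_action(action: str, approved: bool = False) -> str:
    """Execute an action, but stop for approval when it looks irreversible."""
    if classify_reversibility(action) and not approved:
        return f"BLOCKED: '{action}' needs human approval"
    return f"EXECUTED: '{action}'"
```

In a real system the classifier would be its own model call with its own prompt and context, and the "shared scratchpad" mentioned above would be extra state threaded through every invocation in the tree.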

YetAnotherNick 3/31/2025||
I think the author is making an apples-to-oranges comparison. If you have AI acting agentically, capability is likely positively correlated with reliability. If you don't use AI agents at all, the workflow is more reliable.

AI agents are not there yet, and even Cursor doesn't select agent mode by default. I have seen Cursor's agent perform quite a bit worse than the raw model with human-selected context.

jappwilson 3/31/2025||
Can't wait for this to be a plot point in a murder mystery: someone games the AI agent to create a planned "accident"
nottorp 3/31/2025||
But but...

People don't get promoted for reliability. They get promoted for new capabilities. Everyone thinks they're the next Google.

prng2021 3/31/2025||
I think the best shot we have at solving this problem is an explosion of specialized agents. That will limit how off the rails each one can go at interpreting or performing some type of task. The end user still just needs to interact with one agent though, as long as it can delegate properly to subagents.
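The delegation pattern described above can be pictured as a single front-end agent routing to narrow subagents by task type. The registry and handler names below are hypothetical stand-ins for model-backed agents:

```python
# Sketch: one user-facing agent dispatching to specialized subagents.
# The subagent functions are hypothetical placeholders for model-backed agents.

def calendar_agent(task: str) -> str:
    return f"calendar: scheduled '{task}'"

def email_agent(task: str) -> str:
    return f"email: drafted '{task}'"

# Each subagent handles one narrow task type, limiting how far it can go off the rails.
SUBAGENTS = {"calendar": calendar_agent, "email": email_agent}

def delegate(task_type: str, task: str) -> str:
    """Route a task to the matching specialized subagent, or refuse."""
    handler = SUBAGENTS.get(task_type)
    if handler is None:
        return f"unsupported task type: {task_type}"
    return handler(task)
```

The refusal branch is the point: a router that declines unknown task types is one way to bound how badly a misinterpreted request can go wrong.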
piokoch 3/31/2025|
Funny note about Cursor: a commercial, rather expensive product that can't figure out it would be good to use, say, version control so as not to break somebody's work. That's why I prefer Aider (free), which simply commits whatever it does, so any change can easily be reverted.
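The safety net credited to Aider here - commit after every change so any single change can be reverted - can be approximated with plain git. A minimal sketch of the pattern, not Aider's actual implementation (requires git on PATH; the identity values are placeholders):

```python
# Sketch of an auto-commit safety net: commit after every edit so
# `git revert` can undo any single change. Not Aider's real code.
import subprocess

def git(repo: str, *args: str) -> str:
    """Run a git command in `repo`, with identity flags so commits work anywhere."""
    cmd = ["git", "-C", repo,
           "-c", "user.email=agent@example.com",
           "-c", "user.name=agent"] + list(args)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def commit_change(repo: str, message: str) -> None:
    """Stage everything and record a commit after each tool edit."""
    git(repo, "add", "-A")
    git(repo, "commit", "-m", message)

def undo_last_change(repo: str) -> None:
    """Revert only the most recent commit, keeping history intact."""
    git(repo, "revert", "--no-edit", "HEAD")
```

Using `git revert` rather than `reset --hard` keeps the bad change in history, so the undo itself is also reversible.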