AI agents: Less capability, more reliability, please

Posted by serjester 3/31/2025

AI agents: Less capability, more reliability, please (www.sergey.fyi)

423 points | 253 comments | page 5

rglover 3/31/2025|

> Given the intensifying competition within AI, teams face a difficult balance: move fast and risk breaking things, or prioritize reliability and risk being left behind.

Can we please retire this dichotomy? Part of why teams do this in the first place is because there's this language of "being left behind."

We badly need to retreat to a world in which rigorous engineering is applauded and expected—not treated as a nice to have or "old world thinking."

mentalgear 3/31/2025||

Capability demos (like the Rabbit R1 vaporware) will keep going up as long as the market is hot and investors (like lemmings) foolishly run after the companies that are best at hype.

shireboy 3/31/2025||

" It’s easy to blame the user's missing grasp of basic version control, but that misses the deeper point."

Uhh, no, that's pretty much the point. A developer without basic understanding of version control is like a pilot without a basic understanding of landing. A ton of problems with AI (or any other tool, including your own brain) get fixed by iterating on small commits and branching. Throw away the commit or branch if it really goes sideways. I can't fathom working on something for 4 months without realizing a problem or having any way to roll back.

That said, the one argument I could see is if Cursor (or Copilot, etc.) had a built-in check that suggests "this project isn't in source control, we should probably do that before getting too far ahead of ourselves," then helps the user set up source control, a repo, commits, etc. The topic _is_ tricky, and I do remember not totally grasping git, branching, etc.
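As a rough illustration of the check suggested above, here is a minimal Python sketch that looks for a `.git` entry in the project directory or any of its ancestors and produces a hint when none is found. The function names `find_repo_root` and `source_control_hint` are hypothetical; nothing here reflects how Cursor or Copilot actually behave.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that holds a .git entry.

    Hypothetical helper for illustration only; returns None when no
    ancestor looks like a git working tree.
    """
    current = start.resolve()
    for candidate in [current, *current.parents]:
        # .git is a directory in a normal clone and a file in worktrees/submodules.
        if (candidate / ".git").exists():
            return candidate
    return None


def source_control_hint(project_dir: Path) -> Optional[str]:
    """Return a suggestion string if the project is not under git, else None."""
    if find_repo_root(project_dir) is None:
        return (
            "This project isn't in source control. "
            "Consider running `git init` and making a first commit "
            "before getting too far ahead."
        )
    return None
```

A tool could call `source_control_hint(Path.cwd())` once at session start and show the message only when it is not `None`.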

highmastdon 3/31/2025|

The nice thing is that adding this to the base prompt Cursor uses would help all those users at once and do away with this problem directly, only for them to discover the next one. Still, all these little things add up to a very powerful prompt, with which the LLM will make it ever easier for anyone to build real stuff that looks very good on the surface.

vivzkestrel 3/31/2025||

Does anyone remember the 2016 chatbots anymore? Sounds like the same thing all over again, except this time we've got hallucinations and unpredictability.

xg15 4/1/2025||

> If your task can be expressed as a workflow, build a workflow.

And miss out on the sweet, sweet VC millions? Naah.

fullstackwife 3/31/2025||

Are we reinventing software engineering? What happened to the "write code for error" principle?

marban 3/31/2025||

Giving up accuracy for a bit of convenience—if any at all—almost never pays off. Looking at you, Alexa.

danielbln 3/31/2025|

Image compression, eventual consistency, fuzzy search. There are many more examples I'm sure.

skydhash 3/31/2025||

> Image compression, eventual consistency, fuzzy search. There are many more examples I'm sure.

Aren't all of these very deterministic? You can predict what's going to be discarded by the compression algorithm. Eventual consistency is only eventual because of the generation of events. Once that stops, you will have a consistent system and the whole thing can be replayed based on the history of events. Even with fuzzy search you can intuit how to get reliable results and ordering without even looking at the algorithms.
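To make the replay point concrete, here is a small Python sketch, assuming a toy event log of additive counter updates. The `Event` type and `replay` function are hypothetical, and the order-independence shown only holds because additions commute; real systems need more care.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Event:
    """One recorded update: add `delta` to the counter named `key`."""
    key: str
    delta: int


def replay(events: Iterable[Event]) -> Dict[str, int]:
    """Rebuild state from scratch by applying every event in the log."""
    state: Dict[str, int] = {}
    for event in events:
        state[event.key] = state.get(event.key, 0) + event.delta
    return state


log = [Event("a", 1), Event("b", 5), Event("a", 2)]

# Replaying the same history always yields the same state.
first = replay(log)
second = replay(log)

# A replica that saw the events in another order converges to the same
# state once no new events arrive (assumption: updates are commutative).
other_replica = replay(reversed(log))
```

Here `first`, `second`, and `other_replica` all end up equal, which is the deterministic behavior the comment describes.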

An LLM-based agent is the least efficient method for most of the cases they're marketing it for. Sometimes all you need is a rule-based engine. Then you can add bounded fuzziness where it's actually helpful.
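One way to read "rule-based engine plus bounded fuzziness" is sketched below in Python: exact rules are tried first, and a fuzzy match from the standard library's `difflib` is used only as a fallback above a fixed similarity cutoff. The rule table, handler names, and the `route` function are made up for illustration.

```python
import difflib
from typing import Dict, Optional

# Hypothetical rule table: keyword -> name of the handler to run.
RULES: Dict[str, str] = {
    "fees": "show_fees_page",
    "refund": "show_refund_policy",
    "cancel": "show_cancellation_form",
}


def route(query: str, cutoff: float = 0.8) -> Optional[str]:
    """Pick a handler for `query`, or None if nothing matches well enough."""
    word = query.strip().lower()

    # 1. Deterministic path: an exact rule always wins.
    if word in RULES:
        return RULES[word]

    # 2. Bounded fuzziness: accept at most one near miss, and only when
    #    its similarity ratio is at least `cutoff`.
    close = difflib.get_close_matches(word, list(RULES), n=1, cutoff=cutoff)
    if close:
        return RULES[close[0]]

    # 3. No rule applies: say so instead of guessing.
    return None
```

With this table, a typo such as `"refnd"` still reaches the refund handler, while an unrelated word like `"weather"` returns `None` rather than a guess. The cutoff is the single knob that bounds how fuzzy the engine is allowed to be.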

cnst 4/2/2025||

This is my biggest complaint about AI.

Instead of creating easy-to-navigate help sections of the website, and explaining the product and everything clearly, the flashy vendors simply put everything behind an opaque model as if that's somehow better.

Then you have to guess what to type to get the most basic info about fees, terms and procedures of a service.

You want to see how the Pros are doing it? Well, they're not using any AI! Tesla, for example, still has a regular PDF and a regular section-based manual (in HTML) where you can read the details about your car.

$TSLA is priced as the most innovative auto manufacturer, and they're clearly proficient with AI (Autopilot/FSD), yet when it comes to the user's manual, they're clearly following the same process the legacy automakers always have (besides not hiding the PDF behind a parts paywall, and having an open-access HTML version of the manual, too, of course). Why? Because that actually works!

amogul 3/31/2025||

Reliability, consistency and accuracy are the next frontier that we all have to tackle, and it sucks. A friend of mine is building Empromptu.ai to tackle exactly this. From what she told me, they built a model that lets you define accuracy based on your use case, and their models optimize your whole system towards it.

donfotto 3/31/2025|

> choosing a small number of tasks to execute exceptionally well

And that is the Unix philosophy
