Posted by petesergeant 4/1/2025

Don’t let an LLM make decisions or execute business logic (sgnt.ai)
325 points | 169 comments | page 2
renewiltord 4/1/2025|
A more general application of this is why we have LLM tool use. I don’t have the LLM figure out how to integrate with my blog; I write an MCP and expose it to the LLM as a tool. Likewise, when I want to interpret free text, I don’t push all the state into the LLM and ask it to do so. I just interpret it into bits and use those.

It’s just a tool that does well with language. You have to be smart about using it for that. And most people are. That’s why tools, MCPs, etc. are so big nowadays.
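
To make that concrete, here is roughly what exposing a blog as a tool can look like, assuming the official MCP Python SDK's FastMCP helper; the tool names are illustrative and the "backend" is faked with a list:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("blog")
    _posts: list[dict] = []  # stand-in for the real blog backend

    @mcp.tool()
    def publish_post(title: str, body: str) -> str:
        """Publish a post and return its URL."""
        _posts.append({"title": title, "body": body})
        return f"https://example.com/posts/{len(_posts)}"

    @mcp.tool()
    def list_posts() -> list[str]:
        """Return the titles of all published posts."""
        return [p["title"] for p in _posts]

    if __name__ == "__main__":
        mcp.run()  # serve over stdio; the LLM calls the tools, never the blog directly

The model never has to reason about HTTP calls or auth; it just picks a tool and fills in arguments.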

egypturnash 4/1/2025||
But feel free to let it try to summarize the thrust of your article with an AI-generated image that makes half your audience wonder if the text beneath it isn’t also AI spew.
petesergeant 4/1/2025|
> if the text beneath it isn’t also AI spew

About 25% of the sentences are rewrites from Claude for clarity and accuracy. Claude was also heavily involved in how the article is laid out, and challenged me to add several transitional pieces I wouldn’t have added otherwise. In all, I found it very helpful for writing this article, and strongly recommend using it for improving articles.

anal_reactor 4/1/2025||
The entire post feels like "cars will never become popular because they're not nearly as reliable as horses". It's incredible that we're all tech people, yet we're blind not only to the idea that tech will improve, but also to the speed at which it is currently improving. People who don't like AI simply keep moving goalposts. If you told a person 10 years ago that a computer would be able to write a logically structured essay on any topic in any language without major errors, they'd be blown away. We are not, though, because AI cannot write complete applications yet. And once it does, we'll be disappointed it cannot run an entire company on its own. And once it does, we'll be disappointed it cannot replace the government. And once it does, we'll find another reason to be disappointed.

Is there some website where I can read more on what AI can do, instead of what it cannot do?

clemens3 4/1/2025|
New LLM releases, market trends, interviews etc.

http://techinvest.li/ai/

tdiff 4/1/2025||
I believe many of the "vibe coders" won't be able to follow that advice (as they are not trained to actually design systems), and they will form a market of "sometimes working" programs.

It's unlikely that they would change their approach, so the world and LLM creators will have to adapt.

soco 4/1/2025|
At least in today's world with citizen programmers, a few low/no-code systems live much longer than expected and get used much more widely than expected, so they hit walls nobody bothered to think about beforehand. Getting those programs past that bump is... no expletive is strong enough for it. Now how would we dream of fixing a vibe-programmed app? More vibe programming? Does anybody you know save their chats so the next viber has any trace of context?
tdiff 4/1/2025||
Chat history will be stored in git /s
DiscourseFan 4/1/2025||
Anyone who's done adversarial work with the models can tell you there are actually things that LLMs get consistently wrong, regardless of compute power. What those things are has not yet been fully codified, but we are now arriving at a general understanding of the limits and capabilities of these machines, and soon they will be employed for far more directly useful purposes than the wasteful, energy-sink tasks they are called on for now, like "creative" work or writing shitty code. Then there will be a reasonable market adjustment and the technology will enter into the stream of things used for everyday commerce.
gsf_emergency_2 4/1/2025||
Not quite God of the Gaps, but "god of the not-yet-on-AI-blamed"

https://phys.org/news/2025-03-atheists-secular-countries-int...

>The "Knobe effect" is the phenomenon where people tend to judge that a bad side effect is brought about intentionally, whereas a good side effect is judged not to be brought about intentionally.

jolt42 4/1/2025|
Didn't Kurt Gödel prove there will always be gaps?
gsf_emergency_2 4/1/2025||
Wrt the collection of all axiom systems, the gaps would be almost imperceptible, akin to those between the rationals?

(Note that DeepSeek got "good enough" with "only" FP8)

BeetleB 4/1/2025||
All his reasons for not using an LLM make sense only if you're a tech guy who has programming skills.

Have a conversation with a nontech person who achieves quite a bit with LLMs. Why would they give it up and spend a huge amount of time to learn programming so they can do it the "right" way, when they have a good enough solution now?

nkmnz 4/1/2025||
The example of chess is really bad. The LLM doesn’t need to know chess to beat every single human on earth most of the time. It needs to know how to interface with Stockfish, and that is a solved problem by now, either via MCP or vision.
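
A sketch of what such a tool can look like, assuming python-chess and a local Stockfish binary on the PATH (the function name and time budget are arbitrary):

    import chess
    import chess.engine

    def best_move(fen: str, think_time: float = 0.5) -> str:
        """Return Stockfish's best move (in SAN) for the given position."""
        board = chess.Board(fen)
        with chess.engine.SimpleEngine.popen_uci("stockfish") as engine:
            result = engine.play(board, chess.engine.Limit(time=think_time))
        return board.san(result.move)

    if __name__ == "__main__":
        print(best_move(chess.STARTING_FEN))  # e.g. "e4"

Expose that as a tool and the LLM only has to decide to call it and relay the answer.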
etempleton 4/1/2025||
I think a lot of people are going to be surprised at where LLMs stop progressing.
webprofusion 4/1/2025|
The tone of the article is that getting AI agents to do anything is fundamentally wrong because they'll make mistakes and it's expensive to run them.

So:

- Humans make mistakes all the time and we happily pay for those by the hour as long as the mistakes stay within an acceptable threshold.

- Models/agents will get cheaper as diminishing returns in quality of results become more common. Hardware to run them will get cheaper and less power-hungry as it becomes more commoditized.

- In all cases, It Depends.

If I ask a human tester to test the UI and API of my app (which will take them hours), the documented tests and expected results are the same as if I asked an AI to do it. The cost may be the same or less for an AI to do it, but I can ask the AI to do it again for every change, or every week, etc. I have genuinely started to test this way.

petesergeant 4/1/2025||
It depends what you mean by agent, first of all, but I’m going to assume you mean what I’ve called “narrow agency” here[0]: “[an LLM] that can _plan and execute_ tasks that happen outside the chat window“.

That humans make mistakes all the time is the reason we encode business logic in code and automate systems. An “if” statement is always going to be faster, more reliable, and have better observability than a human or LLM-based reasoning agent.

0: https://sgnt.ai/p/agentic-ai-bad-definitions/
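
To sketch the split I'm arguing for (classify_intent stands in for whatever model call you use; the intents and rules are invented for illustration): the LLM only maps free text to a known label, and plain if-statements own the business logic.

    from dataclasses import dataclass

    VALID_INTENTS = {"cancel_subscription", "update_card", "ask_question"}

    @dataclass
    class Account:
        id: str
        balance_owed: float

    def classify_intent(message: str) -> str:
        """Placeholder for the only LLM call: free text in, one known label out."""
        text = message.lower()
        if "cancel" in text:
            return "cancel_subscription"
        if "card" in text:
            return "update_card"
        return "ask_question"

    def handle(message: str, account: Account) -> str:
        intent = classify_intent(message)
        if intent not in VALID_INTENTS:  # reject anything unexpected
            return "escalate_to_human"
        # Business logic stays in plain code: fast, testable, observable.
        if intent == "cancel_subscription":
            if account.balance_owed > 0:
                return "block_cancellation_until_paid"
            return "cancel"
        if intent == "update_card":
            return "send_card_update_link"
        return "route_to_support"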

bigstrat2003 4/1/2025||
> Humans make mistakes all the time and we happily pay for those by the hour as long as the mistakes stay within an acceptable threshold.

We don't, however, continue to pay for the same person who keeps making the same mistakes and doesn't learn from them. Which is what happens with LLMs.

imtringued 4/1/2025||
This is why easy "out of the box" continual learning is absolutely essential in practice. It's not that the LLM is incapable of solving tasks; it simply wasn't trained for your specific one. There are optimizers like DSPy that let you validate against a test dataset to increase reliability at the expense of generality.
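
Setting DSPy's specifics aside, the underlying move is just scoring the LLM-backed step against a labeled test set before trusting it. A generic sketch (this is not DSPy's API; naive_classify stands in for the prompted model call being evaluated):

    from typing import Callable

    TEST_SET = [
        ("I want my money back", "refund"),
        ("How do I change my password?", "support"),
        ("Love the product!", "praise"),
    ]

    def accuracy(classify: Callable[[str], str]) -> float:
        """Fraction of labeled examples the classifier gets right."""
        hits = sum(1 for text, label in TEST_SET if classify(text) == label)
        return hits / len(TEST_SET)

    def naive_classify(text: str) -> str:
        """Placeholder for the prompted LLM call under evaluation."""
        return "refund" if "money back" in text.lower() else "support"

    if __name__ == "__main__":
        score = accuracy(naive_classify)
        print(f"accuracy: {score:.0%}")
        assert score >= 0.66, "don't ship a prompt that fails the test set"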