Posted by bsuh 19 hours ago
Here's a pretty specific example of what I mean, but maybe food for thought:
Podcast (20 minute digest): https://pub-6333550e348d4a5abe6f40ae47d2925c.r2.dev/EP008.ht...
The second it works, bake the workflow into the harness. Yesterday I did exactly that, and the whole agent loop disappeared: with careful context construction, the process condensed into a single one-shot request (plus one MorphLLM fast apply). (It was an Autoresearcher.)
Making an unreliable, nondeterministic system give reliable results for a bounded task with well-understood parameters is... like half of engineering, no?
There's a huge difference between "generate this code, here's a vague feature description" and "here's a list of criteria, assign this input to one of these buckets". The latter is obviously still subject to prompt engineering, hallucination, etc. -- but so is a human pipeline!
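To make the "bounded task" point concrete, here's a minimal sketch of criteria-based bucketing with a validation gate. `call_llm` is a hypothetical stand-in for whatever model API you actually use; the bucket names are made up. The key move is refusing any answer outside the allowed set rather than trusting free-form text.

```python
# Sketch: constrain an LLM to a fixed set of buckets and validate its answer.
# `call_llm` is a hypothetical callable (prompt -> str); swap in your own client.

BUCKETS = ["billing", "bug_report", "feature_request", "other"]

def build_prompt(text: str) -> str:
    criteria = "\n".join(f"- {b}" for b in BUCKETS)
    return (
        "Assign the message to exactly one bucket.\n"
        f"Valid buckets:\n{criteria}\n"
        f"Message: {text}\n"
        "Reply with the bucket name only."
    )

def classify(text: str, call_llm) -> str:
    answer = call_llm(build_prompt(text)).strip().lower()
    # Reject anything outside the allowed set instead of trusting free text.
    if answer not in BUCKETS:
        raise ValueError(f"model returned invalid bucket: {answer!r}")
    return answer
```

The validation step is what turns "nondeterministic text generator" into something a pipeline can depend on: an out-of-set answer becomes a retryable error, not silent garbage downstream.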
...which is why we write deterministic code to take the human out of the pipeline. One of the early uses of computers was calculating firing tables for artillery, to replace teams of humans that were doing the calculations by hand (and usually with multiple humans performing each calculation to catch errors). If early computers had a 99% chance of hallucinating the wrong answer to an artillery firing table, the response from the governments and militaries that used them would not be to keep using computers to calculate them. It would be to go back to having humans do it with lots of manual verification steps and duplicated work to be sure of the results.
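The "multiple humans per calculation" pattern translates directly to unreliable automated systems: run the computation several times and accept an answer only when a clear majority agrees. A minimal sketch (the `runs`/`quorum` numbers are illustrative, not tuned):

```python
# Sketch: redundancy + majority vote over a nondeterministic computation,
# echoing the duplicated human calculations used for firing tables.
from collections import Counter

def majority_result(compute, runs: int = 5, quorum: int = 3):
    """Call `compute` several times; return the modal answer if it
    reaches quorum, otherwise refuse rather than guess."""
    counts = Counter(compute() for _ in range(runs))
    answer, votes = counts.most_common(1)[0]
    if votes < quorum:
        raise RuntimeError(f"no consensus: {dict(counts)}")
    return answer
```

Note this only buys you anything when errors are uncorrelated; if the system hallucinates the *same* wrong answer consistently, voting happily confirms it.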
If you're trying to make LLMs (a vague simulacrum of humans) with their inherent and unsolvable[1] hallucination problems replace deterministic systems, people are going to eventually decide to return to the tried and true deterministic systems.
But if you're trying to tell me that every time you list criteria you get them all perfectly matched, you're clearly gifted.
Chaotic, though, is I believe the better word here: a single-letter change may produce wildly different results.
We just choose to use more random inference rules, because they have better results.
Somewhere in between, I guess, are the varying levels of intelligence, each more or less likely to make the "right" decision for anything you throw at it.
Determinism is a different matter. Scripts and hooks are really the main levers you can pull there, but yeah: a decent script and a cron job will handle certain things much better (and for a fraction of the cost).
0 - https://stripe.dev/blog/minions-stripes-one-shot-end-to-end-...
My first thought was: agents seem nice, but AI workflows are a better bet. However, I didn't really understand AI or agents in depth, and I felt like I was just "doing things the old way" and that removing flexibility from agents was a ridiculous idea.
After some research I got the impression that I was right. A well-defined workflow and scope are just what's needed for AI: cheaper and more consistent. It probably even makes the whole thing run well with non-SOTA models.
https://github.com/yieldthought/flow
Happily, 5.5 is good at writing and using it.
Deterministic workflows that use AI to perform the steps not requiring human input have been an area of interest for me for some time. It's particularly interesting how you use the AI to determine what a step has achieved and what the next step's action should be.
Combining it with workflow elements that do handle human steps, together with a notification/routing/task system, would make for a helpful system for a lot of people.
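The mixed human/AI workflow described above can be sketched as a fixed step list where each step is either delegated to an AI or routed to a person. Everything here is illustrative: `Step`, `run_workflow`, and the callables are hypothetical names, not any real framework's API. The step *order* stays deterministic; only the AI-filled steps are nondeterministic.

```python
# Sketch: a deterministic step sequence where AI performs some steps and
# human steps are routed out via a notification callback.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Step:
    name: str
    ai_fn: Optional[Callable[[dict], dict]] = None  # None => human step

def run_workflow(steps, state: dict, notify_human) -> dict:
    for step in steps:
        if step.ai_fn is not None:
            state = step.ai_fn(state)                # AI performs the step
        else:
            state = notify_human(step.name, state)   # route to a person
    return state
```

In a real system `notify_human` would enqueue a task and block or suspend until someone responds; here it's just a callback so the control flow is visible.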
This is the only way to guarantee AI usage doesn't burn you. Any automation beyond this is just theater, no matter how much that hurts to hear/undermines your business model.
A bird sings, a duck quacks. You don't expect the duck to start singing now, do you?
If a business can get away with some margin of error being acceptable, more power to them. But if not (or if accepting it would cause additional problems, which I'd imagine is true for a non-trivial number of orgs), it's wise to consider the nature of the tool so many people are calling mandatory when you depend on consistent, predictable results.
Presuming you meant burns you out though.
It will make a mistake and you will get burned, so you have to babysit it.