
Posted by bsuh 23 hours ago

Agents need control flow, not more prompts (bsuh.bearblog.dev)
526 points | 258 comments | page 5
alasano 14 hours ago|
I'm building a robust runtime for this.

It's externally orchestrated and managed, not run by an agent in the loop.

The goal is to force LLMs to produce exactly what you want every time.

I will be open sourcing soon. You can use whatever harness or tools you already use, you just delegate the actual implementation to the engine.

https://engine.build

sbinnee 12 hours ago||
I have been telling my team that 1000 lines of instructions are doomed to fail, no matter how good a model's instruction-following capability is. I have been reviewing hundreds of lines of changes daily for about a month. I couldn't help but start praying.
astra_omnia 15 hours ago||
I think this also points to what needs to exist after the control-flow layer. Once an agent executes a bounded workflow, teams still need a reviewable object showing what authority/scope it had, what artifacts it touched, what validation ran, what evidence was retained, and what limitations remain. Logs are useful, but they are not the same thing as an action receipt.
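One possible shape for such a reviewable object, sketched in Python. Every field name here is illustrative; this is not an existing schema, just the categories the comment lists (authority, artifacts, validation, evidence, limitations) made concrete:

```python
from dataclasses import dataclass, field, asdict
import json
import time

@dataclass
class ActionReceipt:
    """A hypothetical 'action receipt' for one bounded agent run."""
    agent_id: str
    scope: list        # authority the agent was granted
    artifacts: list    # files/resources it actually touched
    validations: list  # checks that ran, with outcomes
    evidence: list     # retained logs, diffs, hashes
    limitations: str   # what was NOT verified
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

receipt = ActionReceipt(
    agent_id="refactor-bot",
    scope=["read:src/", "write:src/utils.py"],
    artifacts=["src/utils.py"],
    validations=[{"check": "pytest", "passed": True}],
    evidence=["diff-sha256:..."],
    limitations="integration tests not run",
)
print(receipt.to_json())
```

The point of a structured record over raw logs is that a reviewer (or a policy check) can query it field by field instead of grepping transcripts.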
arian_ 21 hours ago||
Control flow tells the agent what it's allowed to do. It doesn't tell you what the agent actually did. Both matter. Everyone is building the permission layer. Almost nobody is building the verification layer.
allynjalford 12 hours ago|
I am...
niyikiza 16 hours ago||
My analogy[1] has been that we need a valet key: capped speed, geofenced, short ttl, can't open trunk/glovebox, etc. That way we don't have to say pretty please to the valet and hope that they won't get ideas.

[1] https://niyikiza.com/posts/capability-delegation/
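The valet-key idea can be sketched as a capability object whose limits are enforced in code rather than requested in a prompt. The class name, actions, paths, and TTL below are all hypothetical, not from the linked post:

```python
import time

class ValetKey:
    """A delegated capability with hard limits checked in code,
    so the holder never has to be asked to behave."""
    def __init__(self, allowed_actions, allowed_paths, ttl_seconds):
        self.allowed_actions = set(allowed_actions)
        self.allowed_paths = tuple(allowed_paths)
        self.expires_at = time.time() + ttl_seconds  # short TTL

    def check(self, action: str, path: str) -> bool:
        if time.time() > self.expires_at:
            raise PermissionError("key expired")
        if action not in self.allowed_actions:
            raise PermissionError(f"action {action!r} not granted")
        if not any(path.startswith(p) for p in self.allowed_paths):
            raise PermissionError(f"path {path!r} out of scope")
        return True

# Geofenced to one directory, read-only, five-minute lifetime.
key = ValetKey(allowed_actions={"read"},
               allowed_paths=["/tmp/sandbox/"],
               ttl_seconds=300)
key.check("read", "/tmp/sandbox/data.txt")     # allowed
# key.check("write", "/tmp/sandbox/data.txt")  # raises: trunk stays locked
```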

onion2k 21 hours ago||
Agents are probabilistic systems. A common mechanism to get a reliable answer from systems that can have variable output is to run them several times (ideally in separate, isolated instances) and then have something vote on the best result or use the most common result. This happens in things like rockets and aviation where you have multiple systems giving an answer and an orchestrator picking the result.

I've tried doing something similar with AI by running a prompt several times and then having an agent pick the best response. It works fairly well, but it burns a lot of tokens.
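A minimal sketch of the run-several-times-and-vote step, with a stubbed-out model standing in for the real LLM calls (the `ask` callable and the flaky stub are hypothetical):

```python
import itertools
from collections import Counter

def majority_vote(ask, prompt: str, n: int = 5) -> str:
    """Run the same prompt through n independent calls to `ask`
    (a stand-in for an LLM API call) and return the most common
    answer, in the spirit of triple-modular redundancy."""
    answers = [ask(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical flaky model: answers "42" four times out of five.
replies = itertools.cycle(["42", "42", "42", "42", "41"])
flaky_model = lambda prompt: next(replies)

print(majority_vote(flaky_model, "What is 6 * 7?", n=5))  # → 42
```

Voting on exact string matches only works when outputs are constrained; free-form answers would need normalization or a judge step, which is where the token cost multiplies.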

Yokohiii 19 hours ago||
An LLM's "wrong" decision is either systemic or biased. They learn "common sense" from human input (i.e. shared datasets, reinforcement learning). If a decision is flat-out wrong for you, asking 10 LLMs is unlikely to help.
suprfnk 21 hours ago||
But then, if an agent picks the best response, how would you know that that is reliable?
onion2k 20 hours ago|||
You could get the agents to output something structured and then use a deterministic test if you're worried about that.
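For example, a deterministic gate over structured output might look like this (the schema and function name are made up for illustration):

```python
import json

def validate_plan(raw: str) -> dict:
    """Deterministic check: parse the agent's structured output and
    reject it unless it satisfies hard requirements in code."""
    plan = json.loads(raw)  # must be valid JSON at all
    assert isinstance(plan.get("steps"), list) and plan["steps"], "no steps"
    assert all(s.get("command") for s in plan["steps"]), "step missing command"
    assert plan.get("risk") in ("low", "medium", "high"), "bad risk label"
    return plan

good = '{"steps": [{"command": "pytest"}], "risk": "low"}'
print(validate_plan(good)["risk"])  # → low
```

The test itself never consults a model, so its verdict is as reliable as any other unit test.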
xienze 21 hours ago|||
Obviously you have multiple agents justify why they picked a certain response and then create another agent that picks the solution with the best justification.
kkyr 21 hours ago||
touché
pron 16 hours ago||
How do you have "aggressive error detection" when some of the most common and pernicious mistakes agents make are architectural? The behaviour is fine, but the code is overly defensive, hiding possible bugs and invariant violations, leading to ever more layers of complexity until nothing can be changed without breaking something.
zby 19 hours ago||
I concur - it does not make sense to do in LLM prompts what can be done in code. Code is cheaper, faster, and deterministic, and we have lots of experience working with it.

Especially all bookkeeping logic should move into the symbolic layer: https://zby.github.io/commonplace/notes/scheduler-llm-separa...
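A toy example of that split: retries, attempt counting, and validation live in ordinary code, and only `draft` (a hypothetical stand-in for the LLM call) is probabilistic:

```python
def draft(task: str, attempt: int) -> str:
    # Hypothetical stand-in for an LLM call; here it "succeeds"
    # only from the second attempt onward, to exercise the loop.
    return f"patch for {task!r}" if attempt >= 2 else "INVALID"

def run_task(task: str, max_retries: int = 3) -> str:
    """All bookkeeping in the symbolic layer: the loop, the
    validity check, and the failure path are deterministic code."""
    for attempt in range(1, max_retries + 1):
        out = draft(task, attempt)
        if out != "INVALID":  # validation happens in code, not in a prompt
            return out
    raise RuntimeError(f"{task!r} failed after {max_retries} attempts")

print(run_task("fix flaky test"))
```

Nothing about "try three times, then give up" needs to be in a prompt, and keeping it out of one makes the behaviour testable.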

briga 21 hours ago||
Sometimes it feels like agents are just reinventing microservices, except in the most inefficient way possible. It is certainly a good way for the LLM companies to sell more tokens.
mnalley95 19 hours ago|
Own your control flow! A key point from 12 factor agents.

"One thing that I have seen in the wild quite a bit is taking the agent pattern and sprinkling it into a broader more deterministic DAG." - https://github.com/humanlayer/12-factor-agents/blob/main/REA...
