Posted by bsuh 22 hours ago
That’s why you see “base” vs “instruct” models for example — base is just that, the basic language model that models language, but doesn’t follow instructions yet.
The open-weights models especially have lots of variants, e.g. tuned for math, tuned for code, tuned for deep thinking, etc.
But it's definitely a post-training thing, usually done by generating synthetic data with other models.
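A minimal sketch of that synthetic-data step, assuming a stand-in `teacher_model` function (hypothetical here; in practice it would be a call to a stronger, already instruction-tuned model):

```python
# Sketch: building (instruction, response) pairs for instruction tuning.
# teacher_model is a placeholder; a real pipeline would call a stronger,
# already instruction-tuned model and filter its outputs for quality.

def teacher_model(prompt: str) -> str:
    # Placeholder response generator standing in for a model call.
    return f"(teacher answer to: {prompt})"

def build_sft_dataset(seed_instructions: list[str]) -> list[dict]:
    """Turn raw seed instructions into supervised fine-tuning examples."""
    return [
        {"instruction": inst, "response": teacher_model(inst)}
        for inst in seed_instructions
    ]

dataset = build_sft_dataset(["Explain recursion.", "Summarize this diff."])
```

The resulting pairs are then used for supervised fine-tuning of the base model.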
Markdown files are a good reference but they are a weak enforcement tool and go stale easily.
Avoid burying yourself in more skills docs that you're not even writing yourself and will probably never read. Put that effort into deterministic tooling instead. (Not that skills or prompts are bad; I agree a meta skill that tells an agent which subagents to run, and in what order, is useful.)
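A sketch of what "deterministic tooling" means here, using a made-up convention ("no TODO comments in source") as the rule. Instead of documenting the rule in a markdown file and hoping the agent reads it, a check script enforces it mechanically:

```python
# Sketch: enforcing a convention with a deterministic check rather than a
# markdown doc. The rule ("no TODO comments") is an arbitrary example.
import re

def check_no_todos(text: str) -> list[int]:
    """Return the 1-based line numbers that violate the rule."""
    return [
        i for i, line in enumerate(text.splitlines(), 1)
        if re.search(r"\bTODO\b", line)
    ]

violations = check_no_todos("x = 1\n# TODO: fix this\ny = 2\n")
```

Run a script like this in CI or as a hook after every agent edit; the rule never goes stale and never gets ignored.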
My personal opinion is that AI and agents are being misrepresented… The amount of setup, guidance and testing required to create a smarter version of a form is insane.
At the moment my small test is:
- Compressed instructions (to fit within the 8k limit)
- 9 different types of policies to guide the agent (JSON)
- 3 actual documents outlining domain knowledge (JSON)
- 8 topics (hint harvesting, guide rails, and the pieces of information prepared as adaptive cards for the user)
- 3 tools (to allow for connectors)
The whole thing is as robust as I can make it but it still feels like a house of cards and I expect some random hiccup will cause a failure.
We need to define agents in code, and drive them through semi-deterministic workflows. Kick subtasks off to agents where appropriate, but do things like gather context and deal with agent output deterministically.
This is a massive boost in accuracy, cost efficiency, AND speed. Stop using tokens to do the deterministic parts of the task!
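A minimal sketch of that split, with `run_agent` as a stand-in for whatever model call you use. Only the fuzzy subtask spends tokens; context gathering and output handling are plain code:

```python
# Sketch of a semi-deterministic workflow: context gathering and output
# validation are deterministic code; only the fuzzy subtask goes to an
# agent. run_agent is a placeholder for a real LLM call.
import json

def run_agent(task: str, context: dict) -> str:
    # Placeholder for an LLM call; returns a JSON string.
    return json.dumps({"summary": f"did {task}", "files": list(context)})

def workflow(files: dict[str, str]) -> dict:
    # 1. Deterministic: gather context (no tokens spent).
    context = {name: body for name, body in files.items() if body.strip()}
    # 2. Probabilistic: delegate the fuzzy subtask to the agent.
    raw = run_agent("summarize changes", context)
    # 3. Deterministic: parse and validate the agent's output.
    result = json.loads(raw)
    assert set(result) >= {"summary", "files"}, "agent output missing keys"
    return result

out = workflow({"a.py": "print(1)", "b.py": ""})
```

The deterministic steps are testable, cheap, and fail loudly; the agent only sees the part of the task that actually needs judgment.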
why tf would i ever need this
It's externally orchestrated and managed, not by an agent running the loop.
The goal is to force LLMs to produce exactly what you want every time.
I will be open-sourcing it soon. You can use whatever harness or tools you already use; you just delegate the actual implementation to the engine.
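One common way such an external loop is built (a sketch under my own assumptions, not the commenter's actual engine): the orchestrator re-prompts until the output passes a strict validator, so the model either produces exactly the required shape or the run fails explicitly. `call_llm` is a stand-in that simulates one bad attempt before complying:

```python
# Sketch of external orchestration: the retry loop lives outside the model.
# call_llm is a placeholder that pretends to fail once, then comply.
import json

def call_llm(prompt: str, attempt: int) -> str:
    if attempt == 0:
        return 'Sure! Here is the JSON: {"name": "x"}'  # invalid: extra prose
    return json.dumps({"name": "x", "version": 1})

def validated(prompt: str, required: set[str], max_tries: int = 3) -> dict:
    for attempt in range(max_tries):
        raw = call_llm(prompt, attempt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # re-prompt: not even valid JSON
        if required <= set(data):
            return data  # exactly the shape we asked for
    raise RuntimeError("model never produced valid output")

result = validated("emit package metadata as JSON", {"name", "version"})
```

The key property is that nothing downstream ever sees unvalidated model output.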
Both designs (Lightroom, game engines) have worked successfully.
There's probably nothing that prevents mixing both approaches in the same "app".
I have yet to see a universal treatment that tackles this well.
I see this as the most robust way to build a predictable system that runs in a controlled way, taking advantage of probabilistic AIs while reducing the impact of their hallucinations.
LLMs simply can't be trusted to follow instructions in the general case, no matter how much you constrain them. The power of very large probabilistic models is that they basically solved the _frame problem_ of classic AI: logical reasoning didn't work for general tasks because you can't encode all common-sense knowledge as axioms, and inference engines lost their way trying to solve large problems.
LLMs fix those handicaps, as they contain huge amounts of real world knowledge and they're capable of finding facts relevant to the problem at hand in an efficient way. Any autonomous system using them should exploit this benefit.