Agents need control flow, not more prompts

Posted by bsuh 1 day ago

Agents need control flow, not more prompts(bsuh.bearblog.dev)

564 points | 281 commentspage 11

yogthos 1 day ago|

This was basically my realization as well. We are trying to get LLMs to write software the way humans do it, but they have a different set of strength and weaknesses. Structuring tooling around what LLMs actually do well seems like an obvious thing to do. I wrote about this in some detail here:

https://yogthos.net/posts/2026-02-25-ai-at-scale.html

flowgrammer 1 day ago|

My experimentation with Verblets also concluded plain functions are the most logical harness for LLMs.

encoderer 1 day ago||

You can get a lot done with agentic programming without going "all in" on a gastown-like system, but I think there is a minimum viable setup:

1. an adversarial agent harness that uses one agent to create a plan and implement it, and another to review the plan and code-review each step.

2. an agentic validation suite -- a more flexible take on e2e testing.

3. some custom skills that explain how to use both of those flows.

With this in place you can formulate ideas in a chat session, produce planning artifacts, then use the adversarial system to implement the plans and the validation layer to get everything working e2e for human review.

There are a lot of tools you can use for these things but I chose to just build the tooling in the repo as I go.

Schiendelman 1 day ago|

Claude already creates multiple agents for some projects just to keep context windows smaller. I don't think it'll be long before they offer a testing agent along with their planning agent.

encoderer 1 day ago||

I prefer having codex author plans and implement, and claude play reviwer. I do swap them from time to time and i have a lot of respect for claude 4.6 and 4.7 but for my domain I think codex does a better job with the authoring.

Schiendelman 1 day ago||

That's a cool idea! Plus I bet you can stay in lower tiers with both?

encoderer 1 day ago||

You're definitely burning more tokens with the back/forth and multi-step approach but assuming you swap who does the authoring from time to time you can definitely get the max out of each plan. Review doesn't use as many tokens.

ares623 13 hours ago||

Guys, c'mon, what are we even doing...

moron4hire 22 hours ago||

I've been building this at work. It's... shockingly not hard. People have been telling me, "get into agentic coding now or you'll get left behind" and the things they are saying need training and taste and expertise to figure out how to cajole the AI into doing a job are things that I can just write a program to do.

There's this guy at work who is kind of precious about Claude Code. When Hegseth banned Anthropic, this guy freaked out. He spent many pages ranting about how terrible Gemini and Codex are and basically nuked his project. He insisted only Claude could do his project.

Meanwhile, I managed to redo his work with GPT 4o in a weekend. No AI generated code anywhere, just being capable of writing a for-loop over a directory of files my own self. The AI part is only really necessary because folks can't be bothered to author documents with proper hierarchies.

People talk about "AI is going to eliminate boilerplate and accelerate development and we'll do new jobs that were too costly before". Yet this guy spent weeks coaxing Claude to do something that took me a few hours because "boilerplate" is really not that big of a deal. If this is the kind of job we're going to be able to do because the value-to-effort ratio was less than 1, it kind of indicates to me that there isn't a lot of value to gain at any level of effort. Yeah, it's not really worth your time to bend over and pick up a penny, but even if I had a magical penny snagging magnet, I'm still going to ignore the pennies because that's just how valueless pennies are.

If AI lets me never have to open a PowerPoint from a client to read the chart values from the piechart they screenshot and pasted into PowerPoint, that's wonderful. What more would I ever need? The rest of the work just isn't that hard. But if you think AI is going to replace people like me because it can do "boilerplate", the AI is not anywhere near as fast or cheap at getting to a reliable, consistent, repeatable process as a human for that.

empath75 1 day ago||

I have heard this sort of thing a lot from people working with agents, and I just think it's fundamentally misguided as a way to think of them, and if you work with them this way, you are probably setting money on fire for no reason because the tasks you are able to perform this way _don't need agents to begin with_.

You might use an LLM api call here as a translation or summary step in a deterministic workflow, but they are not acting as agents, because they lack _agency_.

The value of using an agent harness is precisely that they are _not deterministic_. You provide agents a goal, tools and constraints and they do the task they were asked to perform as best as they can figure out how to do it. You may provide them deterministic workflows as tools they can call, but those workflows, outside of the agent harness itself, should not constrain what the agent does. You are paying a lot of money for agent reasoning, not to act as an expensive data transformation pipeline.

It may be the case that a lot of agentic workflows are more properly done with fully deterministic workflows, but the goal there should be to _remove the agents entirely_ and spend those tokens on non deterministic tasks that require agentic decision making.

I do think there are fundamental limits to what agents are capable of doing unsupervised and there does need to be a lot more human guidance, observability and control over what they are doing, but that's sort of the opposite of embedding them in deterministic workflows, that is more of team integration/communication problem to solve.

AIorNot 1 day ago||

I mean we have Langgraph, BAML etc

MagicMoonlight 19 hours ago||

This is slop generated right?

naturalintell 1 hour ago||

[dead]

hiroto_lemon 5 hours ago|

[flagged]

More comments...