Posted by bsuh 20 hours ago

Agents need control flow, not more prompts (bsuh.bearblog.dev)
497 points | 245 comments
Nizoss 16 hours ago|
If you’re interested in such deterministic scaffolding/control flow, check out Probity.

I created it to address this exact issue. It is a vendor-neutral ESLint-style policy engine and currently supports Claude Code, Codex, and Copilot.

It uses the agents' hook payloads and session history to enforce the policies. It can be set up to block commits if a file has been modified since the checks were last run, disallow content or commands using string or regex matching, and enforce TDD without any extra reporter setup, and it works with any language.

Feedback welcome: https://github.com/nizos/probity
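
To make the idea concrete, here is a toy sketch of a hook-style deny rule, the kind of regex-based command blocking described above. This is not Probity's actual API or config format; the pattern list and function names are invented for illustration.

```python
import re

# Hypothetical deny rules an agent hook might enforce before running a
# shell command. These patterns are examples, not Probity's real config.
DENY_PATTERNS = [r"\brm\s+-rf\b", r"\bgit\s+push\s+--force\b"]

def check_command(cmd):
    """Return ('block', pattern) if cmd matches a deny rule, else ('allow', None)."""
    for pat in DENY_PATTERNS:
        if re.search(pat, cmd):
            return ("block", pat)
    return ("allow", None)

print(check_command("rm -rf /tmp/build"))  # matches a deny rule
print(check_command("git status"))         # passes through
```

The point of pushing this into a hook rather than a prompt is that the regex either matches or it doesn't; the model never gets a vote.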

Imanari 3 hours ago||
As with so many things, aider.chat was ahead of its time with its ability to create deterministic scripts.
andai 9 hours ago||
Yeah, you could also see this in 2023 with Auto-GPT. People were letting GPT "drive" when what they actually needed, in most cases, was like ten lines of Python (and maybe a few calls to a llm() function).

The alternative is running your ten lines of Python in the most expensive, slowest, least reliable way possible. (Sure is popular though)

For example, most people were using the agents for internet research. It would spin for hours, get distracted or forget what it was supposed to be doing.

Meanwhile, with `import duckduckgo` and `import llm`, you can write ten lines that do the same thing in 20 seconds, actually run deterministically, and cost 50x less.

The current models are much better -- good enough that Auto-GPT is real now! -- but running poorly specified control flow in the most expensive way possible is still a bad idea.
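
The "ten lines of Python" version might look like the sketch below: plain code owns the control flow (search, cap, format) and the LLM is called exactly once, to summarize. The `search` and `ask_llm` functions are hypothetical stand-ins for whatever search and LLM clients you actually use.

```python
# Deterministic research pipeline: code decides what happens and in what
# order; the LLM only fills in the summarization step at the end.

def search(query):
    # Stand-in for a real web-search client; returns (title, snippet) pairs.
    return [("Result A", f"snippet about {query}"),
            ("Result B", f"more on {query}")]

def ask_llm(prompt):
    # Stand-in for a real LLM call; here it echoes deterministically.
    return f"SUMMARY({len(prompt)} chars)"

def research(topic, max_results=5):
    hits = search(topic)[:max_results]  # fixed cap: the agent can't "spin for hours"
    context = "\n".join(f"{title}: {snippet}" for title, snippet in hits)
    return ask_llm(f"Summarize for '{topic}':\n{context}")

print(research("agent control flow"))
```

Because the loop structure lives in code, a failure is a stack trace you can read, not an agent wandering off-task.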

kenjackson 18 hours ago||
I feel like people forget that they're still allowed to program. You're still allowed to create workflows tying together LLMs and agents if you want. Almost all the tools and technology that existed before LLMs are still available to be used.
nickstinemates 9 hours ago||
This is why we built swamp[1].

Swamp teaches your Agent to build and execute repeatable workflows, makes all the data it produces searchable, and enables your team to collaborate.

We also build swamp and swamp club using swamp. You can see that process in the lab[2]. This combines the creativity of the LLM for the parts that matter with deterministic outcomes for the parts that need to be deterministic.

1: https://swamp.club

2: https://swamp.club/lab

socketcluster 14 hours ago||
That's why I built https://saasufy.com/ as an agent tool for building data-driven realtime apps.

I started working on it piece by piece about 14 years ago. It was originally targeted at junior developers, to provide them the necessary security and scalability guardrails while maintaining as much flexibility as possible. It's very flexible; most of Saasufy is itself built using Saasufy. Only the user service and orchestration are custom backend code.

Also, I designed it in a way that helps users fast-track their learning of important concepts like authentication, access control, and schema validation.

It turns out that all of these things that junior devs need are exactly what LLMs need as well.

I tested it with Claude Code originally and got consistently great results. More recently, I tested with https://pi.dev with GPT 5.5 and it seemed to be on par.

trolleski 4 hours ago||
Maybe we could devise a language which would be like a natural language but have some pretty neat formal properties... Wait...
sudosteph 17 hours ago||
This is a good discussion topic. A lot of people really seem to believe that if you word a prompt just so and throw a high-powered model at it, it will work consistently how you want. And maybe as models progress that will be the case. But right now, that's not how I've seen real life work out.

Even skills are not a catch-all, because besides the supply chain risk from using skills you pull from someone else, a lot of tasks require an assortment of skills.

I've accommodated this with my agent team (mostly sonnets fwiw) by developing what we call "operational reflexes". Basically common tasks that require multiple domains of expertise are given a lockfile defining which of the skills are most relevant (even which fragment of a skill) and how in-depth / verbose each element needs to be to accomplish the same task the same way, with minimal hallucinations or external sources.

A coordinator agent assigns the tasks and selects the relevant lockfile and sends it along or passes it along to another agent with a different specified lockfile geared towards reviewing.

It's a bit involved, but this workflow dramatically increased the quality of technical output I get from my agents, and I don't really need to write many prompts myself.
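
A minimal sketch of the lockfile idea described above, with all names invented: each common task type maps to the skill fragments to load, a verbosity level, and a reviewer, so the coordinator only forwards what's relevant.

```python
# Hypothetical "operational reflex" lockfiles: task type -> which skill
# fragments to send, how verbose to be, and who reviews. All entries are
# made up for illustration; the real scheme is the commenter's own.
LOCKFILES = {
    "api-review": {"skills": ["http-design#errors", "auth#tokens"],
                   "verbosity": "terse", "reviewer": "security-agent"},
    "db-migration": {"skills": ["sql#ddl", "backup#restore"],
                     "verbosity": "detailed", "reviewer": "data-agent"},
}

def assign(task_type):
    """Coordinator step: pick the lockfile and build the skill header to send."""
    lock = LOCKFILES[task_type]
    header = f"Use skills: {', '.join(lock['skills'])} ({lock['verbosity']})"
    return lock["reviewer"], header

print(assign("api-review"))
```

The selection is a plain dictionary lookup, so the same task type always gets the same skills and the same reviewer, which is the whole point.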

shivnathtathe 4 hours ago|
Observability is the missing piece here. I built opensmith for exactly this reason: tracing agent control flow locally.