Posted by bsuh 20 hours ago
I created it to address this exact issue. It's a vendor-neutral, ESLint-style policy engine that currently supports Claude Code, Codex, and Copilot.
It uses the agent's hook payloads and session history to enforce policies. That lets you block commits when a file has been modified since the checks were last run, disallow content or commands via string or regex matching, and enforce TDD without any extra reporter setup. It works with any language.
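To make the string/regex idea concrete, here is a minimal sketch of the kind of deny-list check a policy engine can run against a hook payload. This is illustrative only, not Probity's actual config format or API; the pattern list and function names are invented:

```python
import re

# Hypothetical deny-list: patterns a policy engine might refuse to let an
# agent execute. These regexes are examples, not a shipped ruleset.
DENY_PATTERNS = [
    r"\brm\s+-rf\b",            # destructive recursive deletes
    r"\bgit\s+push\s+--force",  # force pushes
]

def check_command(command: str):
    """Return (allowed, reason) for a shell command taken from a hook payload."""
    for pattern in DENY_PATTERNS:
        if re.search(pattern, command):
            return False, f"blocked by policy: {pattern}"
    return True, "ok"
```

The same shape works for content checks: run the patterns over a file diff instead of a command string.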
Feedback welcome: https://github.com/nizos/probity
The alternative is running your ten lines of Python in the most expensive, slowest, least reliable way possible. (Sure is popular though)
For example, most people were using the agents for internet research. They would spin for hours, get distracted, or forget what they were supposed to be doing.
Meanwhile, with `import duckduckgo` and `import llm`, you can write ten lines that do the same thing in 20 seconds, actually run deterministically, and cost 50x less.
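A sketch of what those ten lines can look like. The comment only names the imports, so the concrete wiring here is an assumption: the `duckduckgo_search` and `llm` PyPI packages, and a placeholder model id. The search and summarize calls are injected so the control flow itself stays deterministic:

```python
def research(question, search, summarize):
    """Search the web, then have a model condense the hits.
    `search` and `summarize` are injected callables, so the pipeline's
    control flow is fixed even though the model output varies."""
    hits = search(question)
    notes = "\n".join(f"- {h['title']}: {h['body']}" for h in hits)
    return summarize(f"Summarize these search results about {question!r}:\n{notes}")

def research_live(question):
    # Assumed wiring (not exercised here): duckduckgo_search's DDGS.text()
    # and the `llm` library's get_model/prompt API; the model id is a placeholder.
    from duckduckgo_search import DDGS
    import llm
    model = llm.get_model("gpt-4o-mini")
    return research(
        question,
        search=lambda q: DDGS().text(q, max_results=5),
        summarize=lambda prompt: model.prompt(prompt).text(),
    )
```

Swapping either callable for a stub makes the whole thing trivially testable, which an agent loop never is.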
The current models are much better -- good enough that Auto-GPT is real now! -- but running poorly specified control flow in the most expensive way possible is still a bad idea.
Swamp teaches your agent to build and execute repeatable workflows, makes all the data it produces searchable, and enables your team to collaborate.
We also build Swamp and Swamp Club using Swamp; you can see that process in the lab[2]. This combines the creativity of the LLM for the parts that matter with deterministic outcomes for the parts you need to be deterministic.
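The general pattern behind that split can be sketched in a few lines. This is not Swamp's actual internals, just an illustration of the idea: the workflow's steps and ordering are fixed data, and the model only fills in the free slots, so reruns are structurally identical:

```python
def run_workflow(steps, fill_slot):
    """Execute fixed steps in order. `fill_slot` (e.g. an LLM call) supplies
    only the creative values; it never decides the control flow."""
    results = []
    for step in steps:
        if "slot" in step:
            value = fill_slot(step["slot"])   # creative part: model fills it
        else:
            value = step["value"]             # deterministic part: fixed data
        results.append((step["name"], value))
    return results
```

However the model answers, the same steps run in the same order every time.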
I started working on it piece by piece about 14 years ago. It was originally targeted at junior developers, giving them the necessary security and scalability guardrails while maintaining as much flexibility as possible. It's very flexible; most of Saasufy is itself built using Saasufy. Only the actual user service and orchestration is custom backend code.
Also, I designed it in a way that helps users fast-track their learning of important concepts like authentication, access control, and schema validation.
It turns out that all of these things that junior devs need are exactly what LLMs need as well.
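A minimal sketch of the kind of guardrail meant here, using schema validation as the example. This is not Saasufy's actual API; the schema and function are invented for illustration. The point is that a declarative check rejects malformed writes deterministically, whether the author is a junior dev or an LLM:

```python
# Hypothetical declarative schema: field name -> required Python type.
SCHEMA = {"email": str, "age": int}

def validate(record, schema=SCHEMA):
    """Raise ValueError unless `record` has exactly the declared fields
    with the declared types; return the record unchanged if it passes."""
    unknown = set(record) - set(schema)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for field, expected in schema.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    return record
```

The validator doesn't care who produced the record, which is exactly why the same guardrail serves both audiences.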
I tested it with Claude Code originally and got consistently great results. More recently, I tested with https://pi.dev with GPT 5.5 and it seemed to be on par.
Even skills aren't a catch-all: besides the supply-chain risk of skills you pull from someone else, a lot of tasks require an assortment of skills.
I've accommodated this with my agent team (mostly Sonnets, fwiw) by developing what we call "operational reflexes": common tasks that span multiple domains of expertise get a lockfile defining which skills are most relevant (down to which fragment of a skill) and how in-depth or verbose each element needs to be, so the same task gets accomplished the same way, with minimal hallucinations or external sources.
A coordinator agent assigns the tasks, selects the relevant lockfile, and sends it along, or passes the work to another agent with a different lockfile geared toward reviewing.
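A hypothetical shape for such a lockfile, with the task names, skill names, and depth levels all invented for illustration: each common task pins the skill fragments to load and how verbose each should be, and the coordinator just looks the task up:

```python
# Invented example data: one lockfile per recurring task, pinning skill
# fragments and their depth so every run assembles the same context.
LOCKFILES = {
    "review-pr": {
        "skills": [
            {"skill": "code-review", "fragment": "security-checklist", "depth": "full"},
            {"skill": "style-guide", "fragment": "python", "depth": "summary"},
        ],
    },
}

def assemble_context(task, lockfiles=LOCKFILES):
    """Return the pinned (skill, fragment, depth) triples for a task,
    in the order the lockfile declares them."""
    entry = lockfiles.get(task)
    if entry is None:
        raise KeyError(f"no lockfile for task: {task}")
    return [(s["skill"], s["fragment"], s["depth"]) for s in entry["skills"]]
```

Because the lockfile is plain data, the reviewing agent can get a different one for the same task without any prompt rewriting.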
It's a bit involved, but this workflow has dramatically increased the quality of the technical work I get from my agents, and I don't really need to write many prompts myself.