Posted by bsuh 1 day ago
so, every stage outputs a source of truth for that stage, which can be used by later stages for verification, alone or together with other artifacts. if you want to read more, here's the recursive-mode development workflow I built: https://recursive-mode.dev/introduction
I decided to build my agentic environment differently. Local only, sandboxed, enforced with Go specific requirement definitions that different agent roles cannot break as a contract.
That alone is far better than any hyped markdown-storage-sold-as-memory project I've seen in the last weeks.
Currently I am experimenting with skills tailored to other languages, because agentskills actually are kinda useless because they're not enforced nor can any of their metadata be used to predictably verify their behaviors.
My recommendation to others is: Treat LLM output as malware. Analyse its behavior, not its code. Never let LLMs work outside your sandbox. Force them to not being able to escape sandboxes. And that includes removing the Bash tool, for example, because that's not a reproducible sandbox.
Also, choose a language that comes with a strong unit testing methodology. I chose Go because it allows me to write unit tests for my tools, and even agents to agents communication down the line (with some limitations due to TestMain, but at least it's possible).
If you write your agent environment or harness in Typescript, you already failed before you started. Compiled code isn't typesafe because the compiler doesn't generate type checks in the resulting JS code.
Anyways, my two cents from the purpleteaming perspective that tries to make LLMs as deterministic as possible.