If the LLM fails, either you didn't describe your outcome sufficiently, or it misinterpreted what you said, or it couldn't do it (rare).
Common errors should be encoded as context for future similar tasks; don't bloat skills with stuff that isn't shown to be necessary.
This is not true for anything complex. They’re instruction followers, of which task completion is just one facet.
They’re also extremely eager to complete tasks without enough information, and to get them wrong. If you just describe the task to be completed, then despite your best efforts there are always some oversights or things you didn’t even realize were underspecified.
So it helps a lot to add some process around it, eg “look up relevant project conventions and information. think through how to complete the task. ask me clarifying questions to resolve ambiguities. blah blah”. This type of prompt will also help with the new Opus 4.7 adaptive thinking to ensure it thinks through the task properly.
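For concreteness, here's a minimal sketch of what that kind of process preamble can look like when prepended to a raw task before it goes to the model. The wording and the `build_prompt` helper are purely illustrative, not tied to any particular model or API:

```python
# A reusable "process" preamble prepended to the raw task description.
# Wording is illustrative; adapt it to your project's conventions.
PROCESS_PREAMBLE = """\
Before writing any code:
1. Look up the relevant project conventions and existing patterns.
2. Think through how you would complete the task end to end.
3. Ask me clarifying questions about anything ambiguous or underspecified.
Only start implementing once the above is resolved.
"""

def build_prompt(task: str) -> str:
    """Wrap a raw task with the process preamble (hypothetical helper)."""
    return f"{PROCESS_PREAMBLE}\nTask:\n{task}"

if __name__ == "__main__":
    print(build_prompt("Add pagination to the /orders endpoint."))
```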
Yes, not everything I use LLMs for is going to have the same level of ambiguity or complex requirements. Optimizing by choosing to skip over parts of the process is exactly what Addy is talking about in this article.
Prompting is just the first part. To get the outcome, you need other systems to steer the agent as it gets things wrong. Proper deterministic tests work. But there is also stuff that needs to happen during the LLM execution, like cycle detection (sketched below). All of this adds up.
You cannot just prompt an LLM and hope for a good outcome. It might work in small isolated scenarios, but it just does not work consistently enough to call it reliable.
Without further guardrails enforced by the process or the harness, LLMs do not have sufficient capability to complete a task to a consistent standard.
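Here's a minimal sketch of what such guardrails can look like in a hand-rolled harness, assuming you already have some `call_agent` step that proposes the next action; the cycle detection just flags repeated identical actions, and the pytest run stands in for whatever deterministic checks you use. All the names here are made up for illustration:

```python
import subprocess
from typing import Callable

def run_agent_with_guardrails(
    call_agent: Callable[[list[str]], str],  # your agent step: history -> next action
    max_steps: int = 30,
    repeat_limit: int = 3,
) -> bool:
    """Drive an agent loop with a step budget, simple cycle detection,
    and a deterministic test gate. Hypothetical harness glue."""
    history: list[str] = []
    for _ in range(max_steps):
        action = call_agent(history)
        # Cycle detection: abort if the agent keeps proposing the same action.
        if history[-repeat_limit:].count(action) >= repeat_limit:
            print("Aborting: agent is looping on the same action.")
            return False
        history.append(action)
        if action == "DONE":
            break
    else:
        print("Aborting: step budget exhausted.")
        return False
    # Deterministic gate: the outcome only counts if the test suite passes.
    result = subprocess.run(["pytest", "-q"], capture_output=True)
    return result.returncode == 0
```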
I prefer the start-small-and-iterate approach to arrive at a result.
Then I ask it to summarize. Sometimes after that I ask it to generalize.
Good test cases.
Clear and concise documentation.
CI/CD.
Best practices and onboarding docs.
Managing LLMs is becoming more and more similar to managing teams of people.
Yep: benchmarks, with/without comparisons, samples of generated code with and without. This kind of stuff matters, and without real analysis you may be making your agent stupider or getting worse results (a rough sketch of such a comparison is below).
Also this prose reads like the author has drunk the Google kool-aid and not much else.
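Along those lines, here's a minimal sketch of a with/without comparison, assuming you have some `run_task(prompt, skill)` function that runs your agent and a deterministic `check` per task. Everything named here is hypothetical glue, not a real benchmarking API:

```python
from typing import Callable, NamedTuple

class Task(NamedTuple):
    prompt: str
    check: Callable[[str], bool]  # deterministic pass/fail check on the output

def compare_with_without(
    run_task: Callable[[str, str | None], str],  # (prompt, optional skill text) -> output
    tasks: list[Task],
    skill: str,
) -> None:
    """Run the same tasks with and without the skill and print pass rates."""
    for label, extra in (("without skill", None), ("with skill", skill)):
        passed = sum(task.check(run_task(task.prompt, extra)) for task in tasks)
        print(f"{label}: {passed}/{len(tasks)} passed")
```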
This (SDLC == working backwards & bar raiser) is so horribly wrong that I hope it was an LLM hallucination.
In general, I'm starting to see these agent scaffolding systems as an anti-pattern: people obsess over systems for guiding agents, construct elaborate Rube Goldberg machines, and then others cargo-cult them wholesale, all in an effort to optimize and control a random process and minimize human involvement.
But I don't expect anyone to ever use my stuff. It's complicated as hell. But it's for me, and it works without me having to remotely think about the complexity.
I love that.
Agents do read that. And actually remember it. Because it's tiny compared with the other things you are cramming into their context.
Very grateful for this repository and everyone who contributed to it!