Levels of Agentic Engineering

Posted by bombastic311 1 day ago

Levels of Agentic Engineering (www.bassimeledath.com)

219 points | 102 comments | page 3

CuriouslyC 17 hours ago

The thing blocking level 8 isn't the difficulty of orchestration, it's the cost of validation. The quality of your software is a function of the amount of time you've spent validating it, and if you produce 100x more code in a given time frame, that code is going to get 1/100th as much validation, and your product will be lower quality as a result.

Spec-driven development can reduce the amount of re-implementation required due to requirements errors, but we need faster validation cycles. I wrote a rant about this topic: https://sibylline.dev/articles/2026-01-27-stop-orchestrating...
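
To make the 100x arithmetic in the comment above concrete, here is a minimal sketch. It assumes validation time stays fixed while code output grows; the function name and the sample numbers (40 hours, 1,000 vs. 100,000 units of code) are hypothetical and only illustrate the ratio.

```python
def validation_per_unit(validation_hours: float, code_units: float) -> float:
    """Validation effort available per unit of code (a hypothetical metric)."""
    if code_units <= 0:
        raise ValueError("code_units must be positive")
    return validation_hours / code_units


# Same validation budget, 100x more code produced in the same time frame.
baseline = validation_per_unit(validation_hours=40, code_units=1_000)
scaled = validation_per_unit(validation_hours=40, code_units=100_000)

# Each unit of code now gets roughly 1/100th of the validation it used to.
ratio = scaled / baseline
print(f"validation per unit shrinks to about {ratio:.2%} of baseline")
```

Under this assumption, the only ways to restore the ratio are to spend more validation time or to make each validation cycle cheaper.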

kantselovich 15 hours ago

I’m at level 6 according to this article. I have a solid harness, but I still need to review the code so I can understand how to plan for the next set of changes.

Also, I’m struggling to take it to the multi-agent level, mostly because things in the project depend on each other. Most changes cut across the UI, the protocol and the server side, so it's not clear how agents would merge incompatible versions.

Verification is a tricky part as well. All tests could be passing, including end-to-end integration and visual tests, but my own verification still catches things like data not being persisted or crypto signatures not being verified.
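
As a rough illustration of the "data is not persisted" case above, here is a sketch of a round-trip check that re-reads from storage through a fresh connection instead of trusting in-process state. The `notes` table and the `save_note`, `load_note` and `check_persisted` helpers are hypothetical stand-ins, not anything from the commenter's project.

```python
import os
import sqlite3
import tempfile
from typing import Optional


def save_note(db_path: str, note_id: int, text: str) -> None:
    """Hypothetical write path under test."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO notes (id, body) VALUES (?, ?)",
                (note_id, text),
            )
    finally:
        conn.close()


def load_note(db_path: str, note_id: int) -> Optional[str]:
    """Read back through a brand-new connection, so no cached state can hide a missing write."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT body FROM notes WHERE id = ?", (note_id,)
        ).fetchone()
    except sqlite3.OperationalError:
        row = None  # table missing: nothing was ever persisted
    finally:
        conn.close()
    return row[0] if row else None


def check_persisted(db_path: str, note_id: int, expected: str) -> bool:
    """True only if the expected value is actually on disk."""
    return load_note(db_path, note_id) == expected


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "app.db")
    save_note(db_path, 1, "hello")
    persisted = check_persisted(db_path, 1, "hello")
    missing = check_persisted(db_path, 2, "never written")
```

The design choice is to verify against the storage layer itself rather than the object the application hands back, which is the kind of gap a passing test suite can miss.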

sjkoelle 20 hours ago

Oceania has always been context engineering. It's been interesting to see this prioritized in the zeitgeist over the last 6 months, taking over from the "long context" zeitgeist.

mkoubaa 3 hours ago

There's this unstated assumption that higher levels are better, which hasn't been proven empirically yet.

osigurdson 15 hours ago

"Level 8" isn't really a level; it is more like a problem type: language translation. Perhaps it can be extended to something a bit broader, but the prerequisite is that you have a working reference implementation and a high-quality test suite.

jackby03 19 hours ago

Good taxonomy. One thing missing from most discussions at these levels is how agents discover project context — most tools still rely on vendor-specific files (CLAUDE.md, .cursorrules). Would love to see standardization at that layer too.

politelemon 20 hours ago

These are levels of gatekeeping. The items are barely related to each other. Lists like these will only promote toxicity. You should be using the tools and techniques that solve your problems and fit your comfort levels.

jakejmnz 15 hours ago

This idea of harness engineering is being thrown around more and more often nowadays. I believe I'm using things at that level, but I still need to review the code to understand the architecture. Flaky tests are still a massive issue.

ramoz 16 hours ago

Level 4 is most interesting to me right now. And I would say we as an industry are still figuring out the right ergonomics and UX around these four things.

I spend a great deal of my time planning and assessing/reviewing through various mechanisms. I suppose I do codify, in a way, whenever I create a skill for a repeated assessment or planning task.

> To be clear, planning as a general practice isn't going away. It's just changing shape. For newer practitioners, plan mode remains the right entry point (as described in Levels 1 and 2). But for complex features at Level 7, "planning" looks less like writing a step-by-step outline and more like exploration: probing the codebase, prototyping options in worktrees, mapping the solution space. And increasingly, background agents are doing that exploration for you.

I mean, it's worth noting that a lot of plan modes are shaped to do the Socratic discovery before creating plans. For any user level. Advanced users probably put a great deal of effort (or thought) into guiding that process themselves.

> ralph loops (later on)

Ralph loops have been nothing but a dramatic mess for me, honestly. They disrupt the assessment process where humans are needed. Otherwise, don't expect them to craft an extensive PRD without massive issues that are hard to review.

  - It would seem that this is a harness problem, in terms of how the harness keeps an agent working and focused on specific tasks (in relation to model capability), and maybe not something a user should initiate on their own.

C0ldSmi1e 18 hours ago

One of the best articles I've read recently.