Levels of Agentic Engineering

Posted by bombastic311 1 day ago

Levels of Agentic Engineering (www.bassimeledath.com)

206 points | 93 comments | page 2

priowise 4 hours ago|

One interesting side effect of agents becoming more capable is that the bottleneck slowly moves from “how to build the system” to “how to decide what the system should actually do.”

In other words, engineering gets faster, but prioritization and decision frameworks start to matter more.

mattlondon 5 minutes ago||

> prioritization and decision frameworks start to matter more.

This is the thing though, prioritization doesn't matter in the same way it used to.

We only needed to prioritize before because engineering was a relatively slow and precious resource, so we had to pick and choose what to work on first because it took time.

But now we effectively have a limitless supply of SWEs, so why not do everything on the backlog?

I think the question now is more about sequencing than prioritization. What do we need to do first, before we can do these other things?

But yes, generally requirements are still very important: which features do we need, etc.
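The sequencing question above ("what do we need to do first?") is essentially a dependency ordering problem. A minimal sketch, assuming a made-up backlog where each item lists the items that must land before it; the item names and the `sequence_in_waves` helper are hypothetical, and only the standard-library `graphlib` module (Python 3.9+) is used:

```python
from graphlib import TopologicalSorter

# Hypothetical backlog: each item maps to the items that must land before it.
backlog = {
    "billing page": {"auth", "payments api"},
    "payments api": {"auth"},
    "auth": set(),
    "dark mode": set(),
}


def sequence_in_waves(deps):
    """Group items into waves; everything in one wave can be built in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # items whose prerequisites are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock its dependents
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(sequence_in_waves(backlog), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With a large enough pool of agents, the number of waves (not the number of items) is what bounds how quickly the backlog can be cleared.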

bensyverson 24 minutes ago||

Yes, the more you delegate, the more you need to define the ultimate business outcomes you want, your taste, your brand and your technology preferences.

This is why building a "dark factory" is hard; at a certain point, you need to either externalize all that information into a "digital twin" of yourself, or you have to stop caring what gets built.

ftkftk 18 hours ago||

I prefer Dan Shapiro's 5 level analogy (based on car autonomy levels) because it makes for a cleaner maturity model when discussing with people who are not as deeply immersed in the current state of the art. But there are some good overall insights in this piece, and there are enough breadcrumbs to lead to further exploration, which I appreciate. I think levels 3 and 4 should be collapsed, and the real magic starts to happen after combining 5 and 6; maybe they should be merged as well.

bensyverson 32 minutes ago||

Agreed; here's the link for anyone looking for it:

https://www.danshapiro.com/blog/2026/01/the-five-levels-from...

maxdo 5 hours ago||

Car autonomy levels are fake. Everything up to and including Level 3 is not real autonomy; it is hard rules plus some reaction to the world. Everything above 3 is an attempt at real autonomy with just slight human security guardrails.

At this point, where a human just sits there until the error rate has been verified to enough nines after the decimal point, the entire level conversation is dead. It's almost a binary state: autonomous or not.

Something similar happened with software levels. Even Level 2 was sci-fi 2 years ago; 1 year from now, anything below level 5 will be a joke, except for heavily regulated software or systems at billion-user scale.

orbital-decay 4 hours ago||

>You don't hear as much about context engineering these days. The scale has tipped in favor of models that forgive noisier context and reason through messier terrain (larger context windows help too).

Newer models are only marginally better at ignoring distractors; very little has actually changed, and managing the context matters just as much as it did a year ago. People building agents largely ignore that inefficiency and concentrate on higher abstraction levels, compensating for it with token waste (which the article also discusses).

Arainach 15 hours ago||

> If your repo requires a colleague's approval before merge, and that colleague is on level 2, still manually reviewing PRs, that stifles your throughput. So it is in your best interest to pull your team up.

Until you build an AI oncaller to handle customer issues in the middle of the night (and, depending on your product, an AI who can be fired if customer data is corrupted or lost), no team should be willing to remove the "human reviews code" step.

For a real product with real users, stability is vastly more important than individual IC velocity. Stability is what enables TEAM velocity and user trust.

tkiolp4 13 hours ago||

I want to move on to the next phase of AI programming. All these SKILLS, agentic programming and whatnot remind me of the time of servlets, RMI, Flash… all of that is obsolete; we have better tools now. I hope we can soon reach the “json over http” version of AI: simple but powerful.

Imagine if you could go back in time to when servlets and applets were the big new thing. You wouldn’t want to spend your time learning those technologies, but your boss would constantly be telling you that they are the future. So boring.

hansonkd 13 hours ago|

skills obviously are a temporary thing. same with teams. the models will just train on all published skills and ai teams are more or less context engineering. all of it can be replaced by a better model

braebo 10 hours ago||

My use of skills is more like prompt templates for steering as opposed to the traditional sense of the word skill

efsavage 18 hours ago||

Yegge's list resonated a little more closely with my progression to a clumsy L8.

I think eventually 4-8 will be collapsed behind a more capable layer that can handle this stuff on its own. Maybe I'll tinker with MCP settings and granular control to min-max the process, but for the most part I shouldn't have to worry about it any more than I worry about how many threads my compiler is using.

mattlondon 2 minutes ago||

Yep I was also surprised to see MCP & Skills as not only a distinct "level", but so high up.

In my mind, MCP & Skills are an inseparable part of chat interfaces for LLMs, not a distinct level.

lherron 18 hours ago|||

I was surprised the author didn’t mention Yegge’s list (or maybe I missed it in my skim).

taude 12 hours ago|||

Agreed a bit. I'm probably too paranoid for MCP, but also don't mind rolling my own CLI tools that do the exact minimum I need them to do. Will see where we're at in a year or so....

ramesh31 17 hours ago||

>"Yegge's list resonated a little more closely with my progression to a clumsy L8."

I thought level 8 was a joke until Claude Code agent teams. Now I can't even imagine being limited to working with a single agent. We will be coordinating teams of hundreds by year's end.

captainkrtek 12 hours ago||

There seems to be so much value in planning, but in my organization there is no artifact of the plan aside from the code produced and whatever change summary exists in the PR description. That makes it incredibly difficult to assess the change in isolation from its plan/process.

The idea that Claude/Cursor are the new high-level programming language for us to work in introduces a problem: we're not actually committing code in this "natural language", we're committing the "compiled" output of our prompting. That leaves us reviewing the "compiled code" without seeing the inputs (e.g., the plan, prompt history, rules, etc.).

skybrian 9 hours ago||

I have a design doc subdirectory and instead of "plan mode" I ask the agent to write another design doc, based on a template. It seems to work? I can't say we've looked at completed design docs very often, though.

braebo 10 hours ago|||

If branches are tied to Linear IDs, then the gh CLI and the Linear MCP are enough for any model to get most of the "why" context from any commit.

fragmede 4 hours ago||

Have you considered having it write a plan.md file and saving it to git?
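One way the plan.md suggestion could look in practice: a minimal sketch that renders a plan and writes it into the repository so it can be committed alongside the code. The `plans/` directory, the file layout, and the `render_plan` / `write_plan` helpers are all assumptions for illustration, not something any tool mandates; committing the file is left to the usual `git add` and `git commit`.

```python
from pathlib import Path


def render_plan(goal, steps, prompts):
    """Render a plan as plain Markdown text (hypothetical layout)."""
    lines = [f"# Plan: {goal}", "", "## Steps"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "## Prompts and rules used"]
    lines += [f"- {prompt}" for prompt in prompts]
    return "\n".join(lines) + "\n"


def write_plan(repo_root, branch, goal, steps, prompts):
    """Write plans/<branch>.md under repo_root and return the path."""
    safe_name = branch.replace("/", "-")  # branch names may contain slashes
    path = Path(repo_root) / "plans" / f"{safe_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_plan(goal, steps, prompts), encoding="utf-8")
    return path
```

Keeping one plan file per branch means the reviewer can open the plan next to the diff in the same PR.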

eikenberry 18 hours ago||

In my opinion there are 2 levels: human writes the code with AI assist, or AI writes the code with human assist; centaur or reverse-centaur. But this article tries to focus on the evolution of the ideas and mistakenly terms them levels (indicating a skill ladder, as other commenters have noted) when they are more like stages that the AI ecosystem has evolved through. The article reads better if you think of it that way.

dist-epoch 17 hours ago|

There is another level - AI writes the code with AI assist.

eikenberry 17 hours ago||

That is just another level of reverse centaur and will eventually have a human ass attached to it.

Aperocky 15 hours ago||

The steps are small at the front and huge at the bottom, and the last 2 steps carry a lot of opinions (step 7 specifically).

That's a smell for where the author, and maybe even the industry, is.

Agents don't have any purpose or drive like humans do; they are probabilistic machines, so eventually they are limited by the finite amount of information they carry. Maybe that's what's blocking level 8, or blocking it from working like a large human organization.

philipp-gayret 14 hours ago|

Floating between what you call levels 6, 7 and 8. I have a strong harness, but I manually kick off the background agents, which pick up tasks I queue while I'm off my machine.

I've experimented with agent teams. However, the current implementation (in Claude Code) burns tokens. I used 1 prompt to spin up a team of 9+ agents: Claude Code used up about 1M output tokens. Granted, it was a long, very long-horizon task (it kept itself busy for almost an hour uninterrupted). But 1M+ output tokens is excessive. What I also find is that for parallel agents, the UI is not good enough yet when you run it in the foreground. My permission management is done in such a way that I almost never get interrupted, but it took a lot of investment to get it that way. Most users will likely run agent teams in an unsafe fashion. From my point of view, the devex for agent teams does not really exist yet.
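For a sense of scale, the figures quoted above can be turned into rough per-agent and per-minute numbers. This is back-of-the-envelope arithmetic only: it assumes exactly 9 agents, exactly 1M output tokens and a full 60 minutes, whereas the comment says "9+" agents and "almost an hour", so the per-agent figure is an upper bound and the per-minute figure a lower bound. The `token_budget` helper is hypothetical.

```python
def token_budget(total_tokens, agents, minutes):
    """Return (tokens per agent, tokens per minute across the whole team)."""
    per_agent = total_tokens / agents
    per_minute = total_tokens / minutes
    return per_agent, per_minute


# Assumed round numbers taken from the comment above.
per_agent, per_minute = token_budget(1_000_000, 9, 60)
print(f"{per_agent:,.0f} output tokens per agent")  # 111,111
print(f"{per_minute:,.0f} output tokens per minute across the team")  # 16,667
```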
