Posted by bombastic311 1 day ago
In other words, engineering gets faster, but prioritization and decision frameworks start to matter more.
This is the thing, though: prioritization doesn't matter in the same way it used to.
We only needed to prioritize before because engineering was a relatively slow and precious resource, so we had to pick and choose what to work on first because everything took time.
But now we effectively have a limitless supply of SWEs, so why not do everything on the backlog?
I think the question now is more about sequencing than prioritization. What do we need to do first, before we can do these other things?
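To make the sequencing-vs-prioritization distinction concrete: if capacity is effectively unlimited, the only constraint left is dependencies, and ordering a backlog by dependencies is a topological sort. A minimal sketch, assuming a hypothetical backlog where each item lists the items that must land before it; it uses Python's standard-library `graphlib` (3.9+) to group items into "waves" that could all be worked on in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical backlog: each item maps to the set of items that must be done first.
backlog = {
    "auth": set(),
    "user_profiles": {"auth"},
    "billing": {"auth"},
    "invoices": {"billing", "user_profiles"},
}

def waves(deps):
    """Group backlog items into waves; every item in a wave has its prerequisites done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all items whose prerequisites are finished
        result.append(ready)
        sorter.done(*ready)  # mark the whole wave as completed
    return result

for i, wave in enumerate(waves(backlog), start=1):
    print(f"wave {i}: {wave}")
```

With unlimited parallel workers, the number of waves (not the length of the backlog) is what bounds delivery time, which is the "what do we need to do first" question.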
But yes, requirements are generally still very important: which features do we need, etc.
This is why building a "dark factory" is hard; at a certain point, you need to either externalize all that information into a "digital twin" of yourself, or you have to stop caring what gets built.
https://www.danshapiro.com/blog/2026/01/the-five-levels-from...
At this point, where we have humans who just sit there verifying until the error rate has enough nines after the decimal point, the entire levels conversation is dead. It's almost a binary state: autonomous or not.
Something similar happened with software levels. Even Level 2 was sci-fi two years ago; a year from now, anything below Level 5 will be a joke, except for heavily regulated software or systems at billion-user scale.
Newer models are only marginally better at ignoring distractors; very little has actually changed, and managing the context matters just as much as it did a year ago. People building agents largely ignore that inefficiency and concentrate on higher abstraction levels, compensating for it with token waste (which the article also discusses).
Until you build an AI on-caller to handle customer issues in the middle of the night (and, depending on your product, an AI that can be fired if customer data is corrupted or lost), no team should be willing to remove the "human reviews code" step.
For a real product with real users, stability is vastly more important than individual IC velocity. Stability is what enables TEAM velocity and user trust.
Imagine going back in time to when servlets and applets were the big new thing. You wouldn't want to spend your time learning those technologies, but your boss would constantly be telling you they were the future. So boring.
I think levels 4-8 will eventually be collapsed behind a more capable layer that can handle this stuff on its own. Maybe I'll tinker with MCP settings and granular control to min-max the process, but for the most part I shouldn't have to worry about it any more than I worry about how many threads my compiler is using.
In my mind, MCP and Skills are an inseparable part of chat interfaces for LLMs, not a distinct level.
I thought level 8 was a joke until Claude Code agent teams. Now I can't even imagine being limited to working with a single agent. We will be coordinating teams of hundreds by year's end.
The idea that Claude/Cursor are the new high-level programming language for us to work in introduces a problem: we're not actually committing code in this "natural language", we're committing the "compiled" output of our prompting. That leaves us reviewing the "compiled code" without seeing the inputs (e.g., the plan, prompt history, rules, etc.).
That's a smell for where the author and maybe even the industry is.
Agents don't have any purpose or drive like humans do; they are probabilistic machines, so they are ultimately limited by the finite information they carry. Maybe that's what's blocking level 8, or keeping it from working like a large human organization.
I've experimented with agent teams. However, the current implementation (in Claude Code) burns tokens. I used one prompt to spin up a team of 9+ agents, and Claude Code used about 1M output tokens. Granted, it was a very long-horizon task (it kept itself busy for almost an hour uninterrupted), but 1M+ output tokens is excessive. I also find that for parallel agents, the UI is not good enough yet when you run it in the foreground. My permission management is set up so that I almost never get interrupted, but getting it there took a lot of investment. Most users will likely run agent teams in an unsafe fashion. From my point of view, the devex for agent teams does not really exist yet.