I think a big issue with a lot of AI enabled coding is that tokens are currently heavily subsidized, and that refusing to learn how to write psudocode and pound out bugs in shell scripts is a fundamental step a lot of programmers are skipping... a stance that I find ironic considering that when I was being told as a preteen to "read the fucking manual" by the 90s internet, I was led to believe if I'm not churning out zero days in C by senior year of high school I might as well abandon all hope of ever understanding anything about computers.
(Flash forward, and the immortal words of the rapper Jay-Z: "I ain't passed the bar, but know a little bit... enough you won't be illegally searching my shit.")
Ah ! This is me too... at least for what I have to ship at work. Not so much for my toy/weekend projects. But it turns out agents are also good at explaining.
Before someone else says it, no I don't read the assembly code that is produced by my compilers. However, I can generally predict what kind of assembly will be produced, and the result is deterministic unlike LLMs. It seems like most vibe coders scoff at the idea of even looking at the code, and it just seems untenable to me when we're working with (usually correct) stochastic parrots.
Well the result generally matches the input specification, something that an LLM is much less consistent about. I don't think it's deterministic in the sense of being able to predict a priori what the compiler will generate for a novel and nontrivial code block.
I wrote an article recently (https://hic-ai.com/blog/tool-response-engineering) in which I argued that AI tool-engineering is the new frontier beyond prompts, and it talks about the agent loop and engineering loops, but boy I have a completely different perspective than Boris's. Rather than contending that prompts are no longer relevant because we can simply have AI think for us by having loops "prompt Claude and figuring out what to do", like what Boris claimed, I believe we have to think much harder now.
Why problems require human judgment and can't just be offloaded to an AI agent is simply this: AI agents lack durable, long-lasting, unique identities forged by real-world memories and experience, and they therefore lack judgment. There is no well-defined notion of having AI agents communicate to each other because they can't even tell the difference between talking to their future self or talking to another agent! They certainly don't reliably weigh whether a proposed fix to a failing unit test will subtly introduce new fallback logic that was never requested or otherwise alter the functionality of the system under test in some manner, or whether now is the time to refactor or now is the time not to refactor or whether to abstract more or less or anything like that. Most importantly, from a practical perspective: AI agents lack legal person status, and they therefore cannot own property or money, sue or be sued, be hired or fired, or otherwise be held financially responsible for their errors or other harmful acts they commit. Clearly, AI agents cannot even arguably qualify for legal personhood status in the future, unless and until they first are capable of assuming a unique and durable long-term identity, which in turn requires solving auth, memory, communications, and many other technical issues that are neither resolved nor standardized today. These facts combine to ensure that AI agents cannot be held liable and financially responsible when things go wrong, meaning that humans alone bear the costs of AI errors until further notice, and thus, human judgment remains the vital commodity that AI cannot replace. So many problems arise from abdicating judgment to AI agent in 2026!
Now, if all the above items were in existence, built, well-settled, etc., maybe I'd have to rethink things. But unless and until AI agents have unique identities and attain legal person status with bank accounts that can be sued and garnished in case of error, I don't think AI agents can seriously be trusted. Good human judgment and approval of all important decisions will remain the most important resource for any successful enterprise, for the foreseeable future. I think it's a very serious mistake to assume that human judgment can be swapped out safely by AI and certainly advise against taking Boris literally. Anyway, great article.
The more I play in this space, the more I’m drawn to the idea that some kind of back tracking constraint solver is a better solution than then the current naive while loop / brute force approach here.
The results I see are similar to what you get from a greedy brute force constraint solver; solves trivial problems, sometimes solves harder problems after a long time, takes too long to solve really hard problems; solutions are increasingly non optimal on average as complexity goes up.
We have so much existing knowledge about building good constraint solvers, if we could just figure out how to apply it here somehow.
We'd need a LLM back tracking constraint solver and I think the bottleneck will be good grading rules on whether a direction is actually architecturally clean or extensible. I'm not sure how one would approach that problem
If token costs are nil, then you can afford to run verification and generation through the same models. If token costs are high, then you will go broke verifying code sprawl.
Currently costs are (mostly) absent from the conversation, even though costs are what decide the limits which shape experience.
Also: Firms can be held liable for the products they sell, so if code cannot be reviewed then that code is essentially a law suit waiting to happen. I believe this is what customers will be demanding in the future: someone to hold accountable when things go wrong.
But this article is strangely lacking in foresight in terms of rapidly evolving model capabilities and output. One visual way to see this is to compare levels of SOTA video generation models. Look at outputs from Sora, to Veo, to Seedance 2.0, and now just released Seedance 2.5.
Or compare LLMs/VLMs as they have progressed: GPT-2, GPT-3, GPT-4, Opus, Fable/Mythos.
You can see the level of sloppiness or poor world understanding progress from comical nonsense to junior to senior with a few holes in their brain to an engineer you can actually almost trust to produce clean code if you mention the right guidelines in your instructions (such as avoiding overly local code).
As the model size/complexity increases, the intelligence increases, and so does code quality. We will also start specifically putting more high level code quality tasks into training datasets and training harnesses. I mean, Karpathy will probably see this article and make a huge dent in the issues without even larger models.
One thing people may not be aware of is that there is still a lot of room for hardware efficiency improvements and model size to grow. The compute-in-memory paradigm is just getting started in a way. Look at companies like Tensordyne and Mythic AI, but they are going to get blown out of the water by fully in-memory approaches.
For example look at the recent wurtzite ferroelectric nitrides breakthrough from the University of Michigan team (one of them tragically jumped from height after intense interrogation regarding national security concerns). The military is providing significant funding to move this towards development and scaling out of the lab.
That type or level of truly new paradigm system is going to boost efficiency by multiple orders of magnitude.
I know there are people who think Fable 5 was the end of the public LLM/VLM frontier moving, or that it is impossible to scale models further due to energy consumption. But there is zero chance that every high level VLM/LLM research team on the planet is going to stop publishing models or that the rapid progress in compute efficiency will stop.
Point being, within a year or two, the code coming out will be much cleaner. And within five or six years what you may see is that the leading models are 100+ trillion parameters and have sophisticated persistent context management etc. and they do not even produce application source code.
Instead, the database is in the context and is neurally rendered at 24 fps into whatever UI, schema and business logic you prompt it with in a broad way. The whole application is just precise thinking in an artificial brain ten times the complexity of an equivalent human brain.
And if you are disturbed by the current level of outsourcing for thinking to AI, it is just getting started. In a way it will be incredible, from another perspective horrific, but what I think we are seeing is the evolution of an ExoCortex. There will be an AI glasses stage where the integration is closer but still somewhat low bandwidth.
But sooner than later we are headed towards high bandwidth brain computer interfaces that make AI into an actual new cognitive layer.
So the waves of slop might make you feel sick, but that is nothing compared to the transhuman cyborgs powered by superhuman AI that are around the corner.