Posted by todsacerdoti 19 hours ago
If a piece of code is produced by an agent loop (prompt -> tool calls -> edits -> tests), the real artifact isn’t just the final code but the trace/pipeline that produced it.
In that sense verification might look closer to: checking constraints on the generator (tests/specs/contracts), verifying the toolchain used by the agent, and replaying generation under controlled inputs.
That feels closer to build reproducibility or supply-chain verification than traditional program proofs.
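One way to make that concrete is to treat the trace itself as the thing you attest to: hash the recorded pipeline, then check that replaying generation under the pinned inputs reproduces the same artifact. A rough sketch (the trace schema and `replay_fn` are hypothetical, not any real tool's format):

```python
import hashlib
import json

def trace_digest(trace: list[dict]) -> str:
    """Hash the full generation trace (prompt, tool calls, edits, test runs)
    so the final artifact can be tied to the pipeline that produced it."""
    canonical = json.dumps(trace, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def verify_replay(trace: list[dict], replay_fn) -> bool:
    """Re-run generation under the recorded inputs and check that it yields
    the same final artifact -- analogous to a reproducible-build check."""
    recorded = trace[-1]["artifact_hash"]
    replayed = replay_fn(trace[0]["prompt"], trace[0]["pinned_inputs"])
    return hashlib.sha256(replayed.encode()).hexdigest() == recorded
```

The catch, of course, is that `replay_fn` is only deterministic if the model, sampling parameters, and tool versions are all pinned, which is exactly the supply-chain flavor of the problem.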
One beautiful thing about current AI is that this process can handle fuzzy constraints. You don't have to describe the requirements (constraints) exactly; it can work with fuzzy sets and constraints (I am using "fuzzy" in a quite broad sense), such as "user can move snake head in 4 directions".
Now, because of this fuzzy reasoning, it can sometimes fail. So the wrong point (source code) can get picked from the fuzzy set that represents "snake game". For example, it can be something buggy or something less like a canonical snake game.
In the case of such a failure, you can either sample another data point ("write another snake game"), or you can add additional constraints.
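Those two recovery strategies can be sketched as a single loop. Here `generate` and `check` are hypothetical stand-ins for the model and for our (still fuzzy) acceptance review; each observed failure is fed back in as an explicit constraint:

```python
def refine(generate, check, constraints, max_tries=5):
    """Sample candidates from the fuzzy set; on failure, either resample
    or tighten the constraint list and sample again."""
    for _ in range(max_tries):
        candidate = generate(constraints)
        violations = check(candidate, constraints)
        if not violations:
            return candidate, constraints
        # Turn each observed failure into an extra explicit constraint,
        # shrinking the fuzzy set we sample from next time.
        constraints = constraints + violations
    raise RuntimeError("no candidate satisfied the constraints")
```

"Adjust the prompt" and "regenerate" are just the two branches of this loop: resampling keeps `constraints` fixed, while adding constraints narrows the set before the next sample.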
Now, the article argues in favor of formal verification, which essentially means somehow converting all these fuzzy constraints into hard constraints, so that when we get our data point (source code of the snake game), we can verify that it indeed belongs to the (now exact) set of all snake games.
So, while it can help with the sampling problem, the alignment problem still remains: how can we tell that the AI's (fuzzy) definition of a functional "snake game" is in line with our fuzzy definition? That is something we don't know how to handle other than iteratively throwing AIs at many problems and slowly getting these definitions aligned with humans.
And I think the latter problem (alignment with humans on definitions) is the real elephant in the room, and so the article is IMHO focusing on the wrong problem by thinking the fuzzy nature of the constraints is the main issue.
Although I think it would definitely be useful if we had a better theoretical grasp on how AI handles fuzzy reasoning. As AI stands now, practicality has beaten theory. (You can formalize fuzzy logic in Lean, so in theory nothing prevents us from specifying fuzzy constraints in a formal way and then solving the resulting constraint problem formally, it just might be quite difficult, like solving an equation symbolically vs numerically.)
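As a toy illustration of that last point, the basic objects take very little machinery to state. A minimal Lean 4 sketch (using `Float` in place of a proper unit interval; a real development would carry the [0, 1] bound as a proof, e.g. via Mathlib):

```lean
-- A fuzzy set over α: each element gets a membership degree,
-- intended (not enforced here) to lie in [0, 1].
def FuzzySet (α : Type) : Type := α → Float

-- Fuzzy "and": combine two constraints with the min t-norm.
def FuzzySet.and {α : Type} (A B : FuzzySet α) : FuzzySet α :=
  fun a => min (A a) (B a)

-- Hard constraints are the crisp special case: degrees only 0 or 1.
def FuzzySet.ofPred {α : Type} (p : α → Bool) : FuzzySet α :=
  fun a => if p a then 1.0 else 0.0
```

Stating the definitions is the easy part, though; the hard part is the constraint solving over them, which is the symbolic-vs-numeric gap mentioned above.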
I think it's closer to 100%. I don't know anyone who isn't doing 100% AI-generated code. And we don't even code review it, because why bother? If there's an error, we just regenerate the code or adjust the prompt.
1. Agent writes a minimal test against a spec. 2. Another agent writes a minimal implementation that does nothing more than make the test pass.
Repeat.
This is ping-pong pair programming.
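A minimal sketch of that loop, with hypothetical `tester_agent` and `coder_agent` callables standing in for the two agents:

```python
def ping_pong(tester_agent, coder_agent, spec, run_tests, rounds=10):
    """Alternate roles: one agent adds a minimal failing test for the spec,
    the other writes just enough implementation to make the suite pass."""
    tests, impl = [], ""
    for _ in range(rounds):
        new_test = tester_agent(spec, tests, impl)
        if new_test is None:  # tester finds nothing left in the spec to pin down
            break
        tests.append(new_test)
        impl = coder_agent(tests, impl)
        assert run_tests(tests, impl), "coder failed to make the suite pass"
    return tests, impl
```

The interesting design choice is the stopping condition: the loop terminates when the tester agent can no longer derive a new failing test from the spec, which quietly shifts the whole correctness question onto how good that spec is.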
Someone needs to be held accountable when things go wrong. Someone needs to be able to explain to the CEO why this or that is impossible.
If you want to have AI generate all the code for your business critical software, fine, but you better make sure you understand it well. Sometimes the fastest path to deep understanding is just coding things out yourself - so be it.
This is why the truly critical software doesn’t get developed much faster when AI tools are introduced. The bottleneck isn’t how fast the code can be created, it’s how fast humans can construct their understanding before they put their careers on the line by deploying it.
Ofc, this doesn’t apply to prototypes, hackathons, POCs, etc. For those “low stakes” projects, vibe code away, if you wish.
I personally find the “move fast and break things” ethos morally abhorrent, but that doesn’t make it go away.