
Posted by todsacerdoti 18 hours ago

When AI writes the software, who verifies it? (leodemoura.github.io)
229 points | 227 comments | page 3
dataviz1000 12 hours ago|
100% of my innovation for the past month has gone into getting the coding agent to iterate with an OODA loop (I make it validate after the act step), trying to figure out how to get it not to stop iterating.

For example, I have discovered there is a big difference between prompting 'there is a look-ahead bias' and 'there is a [T+1] look-ahead bias', where the latter will cause it to keep going until it finds the [T+1] look-ahead bias. It will start to write scripts that `.shift(1)` all values and do statistical analysis on the result set, trying to find the look-ahead bias.

Now, I know there isn't a look-ahead bias, but the point is that I was able to get it to iterate automatically, trying different approaches to solve the problem.
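A minimal sketch of the kind of check the agent converges on, in plain Python (the data and signal names are made up for illustration; `shift` mimics the pandas `Series.shift` the agent reaches for):

```python
def shift(xs, n):
    """Like pandas Series.shift: positive n lags the series, negative n leads it."""
    if n >= 0:
        return [None] * n + xs[: len(xs) - n]
    return xs[-n:] + [None] * (-n)

prices = [100.0, 101.0, 99.0, 102.0]
# return over (t-1, t], known only at time t
returns = [None] + [(b - a) / a for a, b in zip(prices, prices[1:])]

# [T+1] look-ahead bias: a "signal" at t that peeks at the return realized at t+1
biased_signal = shift(returns, -1)

# the fix the agent keeps applying: lag one step so only information
# available at t is used at t
lagged_signal = shift(returns, 1)
```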

The software is going to verify itself eventually, sooner rather than later.

macrolet 10 hours ago|
Your post is not clear to me. What did you innovate? Your example is unclear.
boznz 14 hours ago||
This is the biggest problem going forward. I have written about it many times on my blog, in talks, and as a premise in my sci-fi novels.

Sitting in your cubicle with your perfect set of test suites, code-verification rules, SOPs, and code reviews, you won't want to hear this, but other companies will be gunning for your market, writing almost identical software to yours from a series of prompts that generate the code they want: fast, cheap, functionally identical, and quite possibly untested.

As AIs get more proficient and are given more autonomy (OpenClaw++), they will also generate directly executable binaries, completely replacing the compiler and making the output unreadable to a normal human; they may even do this without prompts.

The scenario is terrifying to professional software developers, but other people will do this regardless of what you think, and run it in production. I expect we are months, or at most a few years, away from this.

The source code of the future will be the complete series of prompts used to generate the software, another AI to verify it, and an extensive test suite.

Aldipower 12 hours ago||
If you ever need to interact with platform.openai.com, you know it is not months away; it is here already. I had to go through forms and flows there that were buggy, untested, and simply broken. They really eat their own dog food. Interacting with support resulted in literally weeks of ping-pong between me and AI-smoothed email replies to get their bugs fixed. Terrible.
skydhash 12 hours ago||
How do you get an extensive test suite?
boznz 7 hours ago||
I wish I had a crystal ball to know how this will play out. Will a different AI create the test suite from the SOP? How will humans fit in?

For any new release of my winery production software, I re-ran every job ever put into my clients' production systems, from job #1 onward: about 200,000 jobs. Before going into production we checked that all the balances, product compositions, and inventory from the new software matched what the old system said. It takes about an hour to re-run fifteen years of production, and even a millilitre or milligram of difference was enough to trigger a stop/check. We also had an extensive set of input data I could feed in, to ensure mistakes were caught there too.

I expect other people do it their own way, but as a business this is the low bar of testing I would expect to be done.
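That low bar can be sketched as a simple diff over per-job balances with a tolerance; a hypothetical illustration in Python (the job IDs, values, and tolerance are invented, not the actual winery system):

```python
def check_release(old_results, new_results, tol=1e-3):
    """Compare per-job balances (e.g. litres or grams) produced by the old and
    new systems; any job drifting beyond tol triggers a stop/check."""
    drifted = []
    for job_id, old_val in sorted(old_results.items()):
        new_val = new_results.get(job_id)
        if new_val is None or abs(new_val - old_val) > tol:
            drifted.append(job_id)
    return drifted

# re-run history through the new release, then compare against the old system
old = {1: 750.0, 2: 12.5, 3: 1000.0}
new = {1: 750.0, 2: 12.5, 3: 1000.002}   # 2 mL off on job 3 -> stop/check
```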

signa11 4 hours ago||
> When AI writes the software, who verifies it?

Oh, that's quite simple: the dude/dudette who gets blamed is the one who verifies it.

ljlolel 11 hours ago||
The users verify and fix it on the fly with the Claude VM JIT https://jperla.com/blog/claude-is-a-jit
pnathan 8 hours ago||
I am experimenting, at a very early stage, with using Verus in Rust to generate provably correct Rust. I let the AI bang on the proof and trust the proof assistant to confirm it.

There is another route via Lean, where the Rust generates Lean and the proof is done there, but I haven't chased that down fully.

I think formal verification is a big win in the LLM era.
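A toy illustration of the division of labor: the AI can bang on the proof script while the checker confirms it. This is a hypothetical minimal spec-and-proof pair in Lean 4 (as a stand-in for the Verus route; `mymax` and the theorem names are made up, and `omega` is assumed to close the linear-arithmetic goals):

```lean
-- Spec: mymax returns one of its arguments and is at least as large as both.
def mymax (a b : Nat) : Nat := if a ≥ b then a else b

theorem mymax_ge_left (a b : Nat) : a ≤ mymax a b := by
  unfold mymax
  split <;> omega

theorem mymax_ge_right (a b : Nat) : b ≤ mymax a b := by
  unfold mymax
  split <;> omega
```

The point is the trust model: a human (or an LLM) only has to trust the small proof checker, not the much larger process that produced the proof.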

milind-soni 2 hours ago||
At this point, it's the users.
js8 5 hours ago||
I think intelligence in general means solving (and manipulating) constraint problems. So when you ask an AI to, say, write a "snake game", it figures out what this means in terms of constraints on all the possible source codes that could be written (the program is a game, so it has a score; there is a simple game world; user input is connected to the display of that world; and so on), and then it refines these constraints further until it picks a point from the space of all possible programs that satisfies them, more or less.

One beautiful thing about current AI is that this process can handle fuzzy constraints. You don't have to describe the requirements (constraints) exactly; it can work with fuzzy sets and constraints (I am using "fuzzy" in a quite broad sense), such as "the user can move the snake's head in 4 directions".
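A toy sketch of picking a point under fuzzy constraints (every name and membership degree here is invented): each candidate program gets a degree in [0, 1] per constraint, and a fuzzy conjunction, here the min, scores the whole candidate:

```python
def fuzzy_and(degrees):
    """Min-conjunction of fuzzy memberships: the weakest constraint dominates."""
    return min(degrees, default=1.0)

# hypothetical membership degrees for two candidate "snake game" programs
candidates = {
    "canonical_snake": {"has_score": 1.0, "four_direction_input": 1.0, "renders_world": 0.9},
    "buggy_snake":     {"has_score": 1.0, "four_direction_input": 0.4, "renders_world": 0.8},
}

best = max(candidates, key=lambda name: fuzzy_and(candidates[name].values()))
```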

Now, because of this fuzzy reasoning, it can sometimes fail. So the wrong point (source code) can get picked from the fuzzy set that represents "snake game". For example, it can be something buggy or something less like a canonical snake game.

In the case of failure, you can either sample another data point ("write another snake game") or add additional constraints.

Now, the article argues in favor of formal verification, which essentially means somehow converting all these fuzzy constraints into hard constraints, so that when we get our data point (the source code of the snake game), we can verify that it indeed belongs to the (now exact) set of all snake games.

So, while that can help with the sampling problem, the alignment problem remains: how can we tell that the AI's (fuzzy) definition of a functional "snake game" is in line with our fuzzy definition? We don't know how to handle that other than by iteratively throwing AIs at many problems and slowly getting their definitions aligned with ours.

And I think the latter problem (alignment with humans on definitions) is the real elephant in the room, and so the article is IMHO focusing on the wrong problem by thinking the fuzzy nature of the constraints is the main issue.

Although I think it would definitely be useful if we had a better theoretical grasp of how AI handles fuzzy reasoning. As AI stands now, practicality has beaten theory. (You can formalize fuzzy logic in Lean, so in principle nothing prevents us from specifying fuzzy constraints formally and then solving the resulting constraint problem formally; it just might be quite difficult, like solving an equation symbolically vs. numerically.)

shubhamintech 10 hours ago||
Formal verification gets you to deploy with confidence but it's still a snapshot. What happens when real-world inputs drift from what you tested against? The subtler problem is runtime behavioral drift: an agent that's technically correct but consistently misunderstands a whole class of user queries is invisible to any pre-deploy check. Pre-deploy and post-deploy verification are genuinely different problems.
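One way to make that post-deploy drift visible is a simple distribution check over query categories; a hedged sketch (the categories and counts are invented, and total-variation distance is just one choice of metric):

```python
from collections import Counter

def drift_score(baseline, live):
    """Total-variation distance between two query-category distributions.
    0.0 means identical mixes; 1.0 means completely disjoint."""
    cats = set(baseline) | set(live)
    b_total = sum(baseline.values()) or 1
    l_total = sum(live.values()) or 1
    return 0.5 * sum(
        abs(baseline.get(c, 0) / b_total - live.get(c, 0) / l_total) for c in cats
    )

baseline = Counter(refund=50, shipping=30, other=20)  # what you tested against
live     = Counter(refund=10, shipping=20, other=70)  # a mishandled class surges
```

A high score flags that production traffic no longer looks like the pre-deploy test mix, even though every individual response may still be "correct".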
hackyhacky 10 hours ago|
> What happens when real-world inputs drift from what you tested against?

The whole point of formal verification is that you don't test. You prove the program correct mathematically for any input.

> an agent that's technically correct but consistently misunderstands a whole class of user queries is invisible to any pre-deploy check

The agent isn't verifying the program. The agent is writing the code that proves the program correct. If the agent misunderstands, it fails to verify the program.

50lo 15 hours ago||
One thing that seems under-discussed in this space is the shift from verifying programs to verifying generation processes.

If a piece of code is produced by an agent loop (prompt -> tool calls -> edits -> tests), the real artifact isn’t just the final code but the trace/pipeline that produced it.

In that sense verification might look closer to: checking constraints on the generator (tests/specs/contracts), verifying the toolchain used by the agent, and replaying generation under controlled inputs.

That feels closer to build reproducibility or supply-chain verification than traditional program proofs.
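A minimal sketch of what "replaying generation under controlled inputs" could check: canonicalize the trace and compare digests, much like a reproducible build. All the field names here are hypothetical:

```python
import hashlib
import json

def trace_digest(trace):
    """Canonical SHA-256 digest of an agent's generation trace
    (prompt, tool calls, edits, test outcomes)."""
    canonical = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

trace = {
    "prompt": "add a retry to fetch()",
    "tool_calls": [{"tool": "edit", "file": "client.py"}],
    "tests_passed": True,
}

# a replay of the generator under the same controlled inputs should
# reproduce the digest; a mismatch means the pipeline is not reproducible
replayed = json.loads(json.dumps(trace))
```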

reenorap 4 hours ago|
> Google and Microsoft both report that 25–30% of their new code is AI-generated.

I think it's closer to 100%. I don't know anyone who isn't writing 100% AI-generated code. And we don't even code-review it, because why bother? If there's an error, we just regenerate the code or adjust the prompt.
