
Posted by evakhoury 7 days ago

AI will make formal verification go mainstream (martin.kleppmann.com)
825 points | 432 comments
teiferer 7 days ago|
> rather than having humans review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is correct. If it can do that, I’ll take AI-generated code over handcrafted code (with all its artisanal bugs) any day!

I wouldn't. An unreadable mess that has been formally verified is worse than a clear, easy-to-understand piece of code that has not.

Code is rarely written from scratch. As long as you want humans to maintain code, readability is crucial. Code is changed orders of magnitude more often than it is written in the first place.

Of course, if you don't want humans to maintain the code then this point is moot. Though he gets to the catch later on: then we need to write (and maintain and debug and reason about) the specification instead. We will just have kicked the can down the road.

crowdyriver 7 days ago|
You already have this problem whenever you are using a library in any programming language. Unless you are extremely strict, vendor it and read line by line what the library does, you are just trusting that the code that you are using works.

And nothing is stopping the AI from making the unreadable mess more readable in later iterations. It can make it pass the spec first and make it cleaner later. Just like we do!

teiferer 6 days ago||
I think you missed the point. This was about whether proven-correct-but-unmaintainable-by-a-human is preferable over maintainable-but-not-proven-correct. I argued that no, it is not. If you change the two options at hand, then of course the outcome can be different.
mrkeen 7 days ago||
Not a chance. Apart from type-checking, all formal verification needs a human to invest time and thought.

Fans of LLMs brag about speed and productivity.

swayson 7 days ago||
Really nice book on the subject -- Algorithms for Validation. The online version is actually freely accessible as a PDF.

https://algorithmsbook.com/validation/

misir 7 days ago||
I think as long as we don't integrate formal verification into the programs themselves, it's not going to become mainstream. Especially since you now have two different artifacts you need to maintain and keep in sync (whether using LLMs or not).

Strong type systems already provide partial validation, which helps quite a lot IMO. The better we can model the state and the more constraints we can encode in that model, the closer we get to writing "self-proven" code. I would assume formal proofs do far more than just ensure the validity of the model, but I believe similar approaches can be integrated into mainstream programs as well.

jon-wood 7 days ago||
Counterpoint: No it won’t. People are using LLMs because they don’t want to think deeply about the code they’re writing, why in hell would they instead start thinking deeply about the code being written to verify the code the LLM is writing?
blurbleblurble 7 days ago||
Stuffing this awesome headline through the Curry-Howard isomorphism: LLMs produce better code when the type checker gives more useful feedback.
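
A minimal Lean illustration of the isomorphism being invoked (a toy example, not from the article): the definition below is simultaneously a program and a proof, because its type is the proposition it proves. If the type checker accepts it, the theorem holds.

```lean
-- Curry–Howard in miniature: the type is the proposition
-- "A and B implies B and A"; the function body is its proof.
theorem and_swap (A B : Prop) : A ∧ B → B ∧ A :=
  fun h => ⟨h.right, h.left⟩
```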
nrhrjrjrjtntbt 7 days ago||
I love HN because HN comments have talked about this a fair bit already. I think on the recent Erdos problem submission.

I like the idea that even languages like Rust and Haskell may become more accessible. Learn them properly, of course, but an LLM can steer you out of getting stuck.

QuercusMax 7 days ago||
Or it will lead you to unsafe workarounds when you get stuck.
ben_w 7 days ago||
Always a risk.

Some students' work I had to fix (pre-AI) was crashing all over the place, due to !'s where there should have been ?'s followed by guard let … {} and if let … {}

QuercusMax 7 days ago||
Forums used to be full of people using ON ERROR RESUME NEXT in VB.
iwontberude 7 days ago||
The idea that LLMs are steering anything correctly with Rust reference management is hilarious to me, but only due to my experiences.
nrhrjrjrjtntbt 7 days ago||
I'd definitely use LLMs to troubleshoot, but not to blindly vibe-code.
Validark 7 days ago||
"we wouldn’t even need to bother looking at the AI-generated code any more, just like we don’t bother looking at the machine code generated by a compiler."

2020: I don't care how it performs

2030: I don't care why it performs

2040: I don't care what it performs

newpavlov 6 days ago||
For all my skepticism towards using LLMs in programming (I think the current trajectory leads to a slow degradation of the IT industry and a massive loss of expertise in the following decades), advanced LLM-based proof assistants are the one bright spot for which I have high hopes.

Generation of proofs requires a lot of complex pattern matching, so it's a very good fit for LLMs (assuming sufficiently big datasets are available). And we can automatically verify LLM output, so hallucinations are not a problem. You still need proper engineers to construct and understand specifications (with or without LLM help), but it could significantly reduce the development cost of high-assurance software. LLMs could also help with explaining why a proof cannot be generated.
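
A Lean sketch of the spec/proof split described here (a toy example, assumed workflow): an engineer writes the specification as a theorem statement; whoever or whatever supplies the proof, the Lean kernel re-checks it, so a hallucinated proof simply fails to compile rather than being trusted.

```lean
-- The human-authored part: a definition and its specification.
def double (n : Nat) : Nat := n + n

-- Specification: doubling equals multiplying by two. The proof term
-- below could come from an LLM; the kernel verifies it either way.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```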

But I think it would require a Rust-like breakthrough, not in the sense of developing the fundamental technology (after all, Rust is fairly conservative from the PLT point of view), but in the sense of making it accessible for a wider audience of programmers.

I also hope that we will get LLM-guided compilers which generate equivalence proofs as part of the compilation process. Personally, I find it surprising that the industry is able to function as well as it does on top of software like LLVM, which feels like a giant with feet of clay given its numerous miscompilation bugs and human-written optimization heuristics applied to a somewhat vague abstract machine model. Just look how long it took to fix the god damn noalias attribute! If not for Rust, it probably would still be a bug-ridden mess.

agentultra 7 days ago|
You’re still going to need people to guide and understand the proofs.

Might as well just learn Agda or Lean. There are good books out there. It’s not as hard as the author suggests. Hard, yes, but there’s no Royal road.
