Posted by evakhoury 12/16/2025

AI will make formal verification go mainstream (martin.kleppmann.com)
827 points | 434 comments | page 6
bob1029 12/16/2025
I don't know if it's "formal", but I think combining Roslyn analyzers with an LLM and a human in the loop could be game-changing. The analyzers can enforce intent over time without regressions. The LLM can write and modify analyzers, and analyzers can constrain the source code of other analyzers. Hypothetically there is no limit to the number of things that could be encoded this way. I am currently working on a few prototypes in the vicinity of this.

https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/t...

Analemma_ 12/16/2025
I don't really buy it. IMO, the biggest reason formal verification isn't used much in software development right now is that it needs requirements to be fixed in stone, and in the real world requirements change constantly: from customer requests, from changes in libraries or external systems, and from competitive market pressure. AI and vibe coding will probably accelerate this trend: once people know you can vibe-code something, they will feel entitled to demand even faster changes (and your upstream libraries and external systems will change faster too), so formal verification will be less useful than ever.
lawrpaulson 12/17/2025
Formal verification is progressing inexorably and will, over time, transform the software development experience. There is a long way to go, and in particular, you can't even get started until basic interfaces have been specified formally. This will be a bottom-up process where larger and larger components gradually get specified and verified. These will mostly be algorithmic or infrastructure building blocks, as opposed to application software subject to changing and vague requirements. I don't think LLMs will be contributing much to the verification process soon: there is nowhere near enough material available.
iamwil 12/17/2025
I wrote a post on how I dipped my toes into Lean, using LLMs to guide me.

https://interjectedfuture.com/the-best-way-to-learn-might-be...

You might find it useful. I also caveat the experience and recount some of the pitfalls, which you might enjoy as a skeptic.

Martin Kleppmann seemed to like it too. https://bsky.app/profile/martin.kleppmann.com/post/3m7ugznx4...

runeks 12/17/2025
> As the verification process itself becomes automated, the challenge will move to correctly defining the specification: that is, how do you know that the properties that were proved are actually the properties that you cared about? Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress.

How big is the effort of writing a specification for an application versus implementing the application in the traditional way? Can someone with more knowledge chime in here please?

ookdatnog 12/17/2025
It depends on what you're working on. If you're doing real algorithmic work, often the algorithm is a lot more complex than its spec because it needs to be fast.

Take sorting a list for example. The spec is quite short.

- for all xs: xs is a permutation of sort(xs)

- for all xs: sorted(sort(xs))

Where we can define "xs is a permutation of ys" as "for each x in xs: occurrences(x, xs) = occurrences(x, ys)"

And "sorted(l)" as "forall xs, x, y, ys: (l = xs ++ [x, y] ++ ys) => x <= y" (with <= rather than <, so that lists containing duplicate elements can satisfy the spec).

A straightforward bubble or insertion sort would arguably be as simple as or simpler than this spec. But the sorting algorithms in, say, standard libraries tend to be significantly more complex than it.
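To make the gap between spec and implementation concrete, the two properties above can be written as executable predicates. A minimal Python sketch (the names `is_permutation`, `is_sorted`, and `satisfies_sort_spec` are my own, not from the comment; `<` is relaxed to `<=` so duplicates are allowed):

```python
from collections import Counter

def is_permutation(xs, ys):
    # "for each x: occurrences(x, xs) = occurrences(x, ys)"
    return Counter(xs) == Counter(ys)

def is_sorted(l):
    # every adjacent pair x, y satisfies x <= y
    return all(x <= y for x, y in zip(l, l[1:]))

def satisfies_sort_spec(sort, xs):
    out = sort(list(xs))
    return is_permutation(xs, out) and is_sorted(out)

# Spot-check the built-in sort against the spec
for xs in [[], [3, 1, 2], [5, 5, 1], list(range(10, 0, -1))]:
    assert satisfies_sort_spec(sorted, xs)
```

The spec is a dozen lines regardless of whether the implementation behind `sort` is a ten-line insertion sort or a thousand-line Timsort, which is the point being made.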

woopsn 12/17/2025
Will or should? It's plausible; the same argument was made in an article the other day. But basic type checking and static analysis tools are cheap with enormous payoff, and even those methods aren't ubiquitous.
inimino 12/17/2025
I both agree and disagree. Yes, AI will democratize access to formal methods and will probably increase their adoption in areas where they make sense (e.g. safety-critical systems), but no, it won't expand the set of areas where formal methods are appropriate (probably < 1% of software).

What will happen instead is a more general application of AI systems to verifying software correctness, which should lead to more reliable software. The bottleneck in software quality is in specifying what the behavior needs to be, not in validating conformance to a known specification.

metalrain 12/17/2025
I think we will use more tools to check our programs in the future.

However, I still don't believe in vibe-coding full programs. There are too many layers in software systems; even when the program core is fully verified, the programmer must understand the other layers.

If you are an Android app developer, you need to know what phones people commonly use, what kind of performance they have, how apps are deployed through the Google Play Store, how to manage a wide variety of app versions, and how to handle issues when storage is low, the network is offline, the battery is low, or the CPU is in a low-power state.

71bw 12/17/2025
LLMs can handle a lot of these issues already, without the user having to think about them.

The problem is that whether these issues get resolved (in one way or another) or left unresolved (since the user only tests the app on his own device, and that particular LLM "roll" won't include optimizations for the broad range of other devices), the user is still left pretty much clueless about what has actually happened.

Models can, in theory, tell you what they did and why (albeit largely in blanket terms and/or phrases unintelligible to the average 'vibe coder'), but I feel most people ignore that completely, and those who don't wouldn't need an LLM to code an entire app in the first place.

Still, for the very simple projects I use at work, just chucking something into Gemini and letting it work is often faster and more productive than doing it manually. Plus, if the user is interested, it can serve as a relatively good learning tool.

BobSonOfBob 12/17/2025
Formal verification at the "unit test" level seems feasible. At the system level of a modern application, the combinatorial explosion of possible states would need a quantum computer to finish testing in this lifetime.
bgwalter 12/16/2025
No it won't. People who aren't interested in writing specifications now won't be interested later either. They want to hit the randomize button on their favorite "AI" jukebox.

If anyone does write a specification, the "AI" won't even get past the termination proof of a moderately complex function, which is the first step to getting the function accepted by the proof environment, before the actual proof can even begin.
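For readers unfamiliar with the termination obligation meant here, a minimal Lean 4 sketch (names `sumTo` and `myGcd` are illustrative; exact tactic syntax varies by Lean version): structural recursion is accepted automatically, but a function whose recursive call is on a computed value, like Euclid's algorithm recursing on `a % b`, makes the checker demand a decreasing measure and a proof, before any correctness proof about the function can even start.

```lean
-- Accepted automatically: recursion is structural on the argument.
def sumTo : Nat → Nat
  | 0 => 0
  | n + 1 => (n + 1) + sumTo n

-- Not structural: the recursive call is on (a % b), so Lean asks for
-- a termination measure and a proof that it decreases on each call.
def myGcd (a b : Nat) : Nat :=
  if h : b = 0 then a
  else myGcd b (a % b)
termination_by b
decreasing_by
  -- goal: a % b < b, which holds because b ≠ 0
  exact Nat.mod_lt _ (Nat.pos_of_ne_zero h)
```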

This article is pretty light on evidence; perhaps it is really about attracting funding by talking about "AI".

More comments...