
Posted by evakhoury 7 days ago

AI will make formal verification go mainstream (martin.kleppmann.com)
825 points | 432 comments | page 4
joegibbs 7 days ago|
I think the problem is that people don't know exactly what it is that they want. You could easily make a formally verified application that is nevertheless completely buggy and doesn't make any sense. As for his point about the challenge moving to defining the specification: I don't think that would help, because there are fewer people who understand formal verification, and who could read a spec and make sense of it, than there are people who could write the code.
NackerHughes 7 days ago||
It absolutely will not make formal verification go mainstream.

What it will make go mainstream, and in fact has already started to, is “ChatGPT verified it so it must be OK.”

mdnahas 6 days ago||
I’m the author of a tutorial on Rocq.

I’m surprised at the negativity on HN. We all want bug-free code and this is a way to not just reduce bugs, but eliminate whole classes of bugs.

Moreover, with proven code, we can have rock-solid libraries and code reuse.

Also, proven specifications allow LLMs to do divide-and-conquer when generating code. You can ask it to generate part A that does X, assuming part B will do Y. And then ask it to generate part B in parallel. And know that, when merged, the code will work.
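A minimal sketch of that assume/guarantee split, in Lean with toy functions (not from the comment): part A is proved against part B's stated spec alone, so the two can be generated and checked independently and B's body can later be regenerated without re-proving A.

```lean
-- Part B's spec: it always returns a positive number.
def partB (n : Nat) : Nat := n + 1

theorem partB_pos (n : Nat) : 0 < partB n := Nat.succ_pos n

-- Part A is written and proved assuming only partB's spec,
-- never looking at partB's body.
def partA (n : Nat) : Nat := 2 * partB n

theorem partA_pos (n : Nat) : 0 < partA n := by
  have h := partB_pos n
  omega
```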

And, provable code is good for LLMs because it will let LLM creators study hallucinations. The proof checker can identify what is a hallucination and what is not. This means LLMs may learn what they don’t know!

I’m not saying LLMs with proven code is nirvana. There are parts of systems where it doesn’t apply. Specifications are often complex and difficult to understand. Some important details are hard to write a spec for. And specs can miss things. But code proven correct by LLMs has potential to do real good.

the_duke 7 days ago||
I've been preaching similar thoughts for the last half year.

Most popular programming languages are optimized for human convenience, not for correctness! Even most of the popular typed languages (Java/Kotlin/Go/...) have a wide surface area for misuse that is not caught at compile time.

Case in point: in my experience, LLMs produce correct code far more regularly for Rust than for JS/TS/Python. Rust has a very strict type system. Both the standard library and the wider library ecosystem lean towards strict APIs that enforce correctness, prevent invalid operations, and push towards handling, or at least propagating, errors.

The AIs will often write code that won't compile initially, but after a few iterations with the compiler the result is often correct. Strong typing also makes it much easier to validate the output when reviewing.
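As a small illustration (a hypothetical `parse_port` helper, not from the comment): Rust's `Result` puts the failure path into the function's signature, so generated code that forgets it fails to compile instead of hiding a latent runtime bug.

```rust
use std::num::ParseIntError;

// The error case is in the type: callers cannot reach the u16
// without first deciding what to do when parsing fails.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    // "99999" or "abc" comes back as an Err, not a silent NaN or exception
    s.parse::<u16>()
}

fn main() {
    // An exhaustive match: deleting either arm is a compile error.
    match parse_port("8080") {
        Ok(p) => println!("listening on port {p}"),
        Err(e) => eprintln!("invalid port: {e}"),
    }
}
```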

With AIs able to do more and more of the implementation, the "feel-good" factor of languages will become much less relevant. Iteration speed is not so important when parallel AI agents do the grunt work. I'd much rather wait 10 minutes for solid output than 2 minutes for something fragile.

We can finally move the industry away from wild-west languages like Python/JS and towards more rigorous standards.

Rust is probably the sweet spot at the moment, thanks to being semi-popular with a reasonably active ecosystem. Sadly, I don't think the truly right language exists yet.

What we really want is a language with a very strict, comprehensive type system with dependent types, maybe linear types, structured concurrency, and a built-in formal proof system.

Something like Ada/SPARK, but more modern.
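For a taste of what dependent types in such a language would buy (a toy sketch in Lean, which is the closest existing thing to this wishlist): the index bound lives in the type, so an out-of-bounds access is rejected at compile time rather than caught at runtime.

```lean
-- A dependently typed lookup: the caller must supply a proof
-- that the index is in range, so out-of-bounds is impossible.
def safeGet (a : Array Nat) (i : Nat) (h : i < a.size) : Nat := a[i]

#eval safeGet #[10, 20, 30] 1 (by decide)  -- 20
```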

baalimago 7 days ago||
Well, then the formal verification will be vibe-coded as well, killing the point.

More likely is the rise of test driven development, or spec driven development.

monkeydust 7 days ago||
What paradigms are people using to get AI to help generate better specs and then convert those specs into code and test cases? I felt the Kiro IDE from Amazon was a step toward applying AI across the entire SDLC.
barishnamazov 7 days ago||
Then should we apply formal verification to the vibe coded formal verification software?
GTP 7 days ago||
I doubt some of this article's claims.

> At present, a human with specialist expertise still has to guide the process, but it’s not hard to extrapolate and imagine that process becoming fully automated in the next few years.

We've already had software engineers here on HN explain that they don't make heavy use of LLMs, because the hard part of their job isn't actually writing the code but understanding the requirements behind it. And formal verification is all about requirements.

> Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress.

Writing the spec is easier once you are confident you have fully understood the requirements, and here we get back to the issue above. Plus, it is already the case that you don't write the proof by hand; that is what the prover either assists you with or does entirely.

> I find it exciting to think that we could just specify in a high-level, declarative way the properties that we want some piece of code to have, and then to vibe code the implementation along with a proof that it satisfies the specification.

And here is where I think problems will arise: moving from the high-level specification to the formal one, which is the one that actually gets formally verified.

Of course, this would still be better than having no verification at all. But it is important to keep in mind that, with these additional levels of abstraction, you will likely end up with a weaker form of formal verification, so to speak. Maybe it is still worth verifying some high-assurance software "the old way" and leaving this approach for cases where additional verification is nice to have but not a matter of life or death.
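A toy illustration of that gap (hypothetical, in Lean with Mathlib's `List.Sorted`): a spec that demands only sortedness is formally satisfied by a "sort" that throws all the data away, because it omits the permutation requirement that the high-level intent took for granted.

```lean
import Mathlib.Data.List.Sort

-- Fully "verified" against an underconstrained spec...
def badSort (_ : List Nat) : List Nat := []

-- ...because the spec never says the output is a
-- permutation of the input, only that it is sorted.
theorem badSort_sorted (xs : List Nat) :
    (badSort xs).Sorted (· ≤ ·) :=
  List.sorted_nil
```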

momojo 7 days ago||
@simonw's successful port of JustHTML from Python to JavaScript showed that agent iteration plus an exhaustive test suite is a powerful combo [0].

I don't know if TLA+ is going to suddenly appear as "the next language I want to learn" in Stack Overflow's 2026 Developer Survey, but I bet we're going to see a rise in testing frameworks/languages. Anything that makes it easier for an agent to spit out tokens or write smaller tests for itself.

Not a perfect piece of evidence, but I'm really interested to see how successful Reflex[1] is in this upcoming space.

[0] https://simonwillison.net/2025/Dec/15/porting-justhtml/
[1] https://github.com/reflex-dev/reflex
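In that spirit, a minimal hand-rolled property check (a hypothetical hex codec, not from the post): a round-trip invariant like this gives an agent a machine-checkable target to iterate against, with no test framework needed.

```rust
// A hypothetical encoder/decoder pair to exercise a round-trip property.
fn encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

fn decode(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 {
        return None; // odd-length input cannot be valid hex pairs
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

fn main() {
    // Property: decode(encode(v)) == v for every generated input.
    for n in 0..=255u16 {
        let v: Vec<u8> = (0..n).map(|i| (i * 31 % 256) as u8).collect();
        assert_eq!(decode(&encode(&v)), Some(v));
    }
    println!("round-trip property held for all generated inputs");
}
```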

positron26 7 days ago||
The natural can create the formal. An extremely intuitive proof: humans walked to Greece and created new formalisms out of pre-historic cultures that did not have them.

Gödel's incompleteness theorems are a formal argument that only the natural can create the formal (because no formal system can create all others).

Tarski's undefinability theorem gives us the idea that we need different languages for formalization and the formalisms themselves.

The Curry-Howard correspondence concludes that the formalisms that pop out are indistinguishable from programs.

Altogether we can basically synthesize a proof that AGI means automatic formalization, which absolutely requires strong natural systems to create new formal systems.

I ended up staying with some family who were watching The Voice. XG performed Glam, and now that I have spit many other truths, may you discover the truth that motivates my work on swapchain resizing. I wish the world would not waste my time and their own, but bootstrapping is about using the merely sufficient to make the good.

igornotarobot 7 days ago||
It probably will, but not the way we all imagine. What we see now is an attempt to recycle the interactive provers that took decades to develop. Writing code, experimenting with new ideas, and getting feedback has always been a very slow process in academia. Getting accepted at a top peer-reviewed conference takes months or even years. The essential knowledge is hidden inside big corps that only promote their "products" and rarely give the knowledge back.

LLMs enable faster bootstrapping and experimentation, not only for vibe coders but also for researchers, many of whom are not really good coders, btw. It may well be that we will see wild new verification tools soon as a result of quick iteration with LLMs.

For example, I recently wrote an experimental distributed bug finder for TLA+ with Claude in about three weeks. A couple of years ago that effort would have required three months and a team of three people.

8note 7 days ago|
i could see formal verification becoming a key part of "the prompt is the code", so that as versions bump and so on, you can have an llm completely regenerate the code from scratch-ish and be sure that the spec is still followed

but i don't think people will suddenly gravitate towards using them just because they're cheaper to write - bugs of the form "we had no idea this should be considered" are way more common than "we wrote code that didn't do what we wanted it to"

an alternative guess for LLMs and formal verification: systems where formal verification is a natural fit - code going into places that are hard to update and have well-known conditions - will move faster.

i could also see agent tools embedding formal methods proofs into their tooling, so they write both the code and the spec at the same time, with the spec acting as memory. that kinda ties into the recent post about "why not have the LLM write machine code?"
