Posted by RebelPotato 3 days ago
And there are verification tools for verification tools! Isabelle has a verified proof checker that can check whether a given proof is sound: https://www21.in.tum.de/~rosskops/papers/metalogic_pre.pdf The only caveat is that the proof checker itself is verified in Isabelle...
The linked blog post says "Unfortunately, a student once submitted work containing this error; it was almost entirely incorrect and he had no idea." I guess the student probably was aware that not every proof had gone through, and that the that he saw like "99 out of 100 proofs succeeded" and assumed that meant that he had mostly completed the task, not realizing that a false theorem would be used to the give incorrect proofs for the rest of the file.
[1] https://www.msoos.org/2013/04/gcc-4-5-2-at-sat-competition-2...
E.g., Metamath is designed to be as theoretically simple as possible, to the point that it's widely considered a toy in comparison to 'serious' proof systems: a verifier is mainly just responsible for pushing around symbols and strings. In spite of this simplicity, I was able to find soundness bugs in a couple major verifiers, simply because few people use the project to begin with, and even fewer take the time to pore over the implementations.
So I'd be hesitant to start saying that one approach is inherently more or less bug-prone than another, except to be slightly warier in general of larger or less accessible kernels.
LCF-style provers have a much smaller trusted computing base than Curry/Howard based provers like Coq, Agda and Lean.
One may wonder if there is a correlation between size of TCB and error rate in widely used provers?
I'm not sure that this is correct. The TCB in a CH based prover is just the implementation of the actual kernel. In LCF, you also have to trust that any tactics are implemented in a programming language that doesn't allow you to perform unsound operations. That's a vast expansion in your TCB. (You can implement LCF-like "tactics" in a CH-based prover via so-called reflection that delegates the proof to a runtime computation, but you do have to prove that your computation yields a correct decision for the problem.)
Tactics generate thms by calling functions of the Thm module in some fashion.
Of course there's no free lunch, as you say, in making sure that high-level proofs are lowered into the trusted part correctly, but it's certainly a piece that should ideally be as simple as possible.
[0] https://lists.cam.ac.uk/sympa/arc/cl-isabelle-users/2025-02/...
[1] https://lists.cam.ac.uk/sympa/arc/cl-isabelle-users/2025-02/...
There is no free lunch.
ā https://github.com/jrh13/hol-light/blob/master/fusion.ml
A mere 676 LoCs. This miniscule size of the TCB is where the comparatively lesser bug count (despite intense use) comes from.