It appears many of the proof assistants/verification systems can generate OCaml. Or perhaps Ada/SPARK?
Regardless of how the software engineering discipline changes in the age of gen AI, we must aim to produce higher, not lower, quality software than we have today, and formal verification will definitely help.
Also, I am a novice when it comes to audio programming, and today I have been messing around with a simple limiter. ChatGPT knows way more than I do about what I am doing. It has taught me a ton. And as magical and wonderful as it is, it is incredibly tedious to try to work with it to come up with real specifications of interesting properties.
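For context, the kind of thing I have been poking at is roughly this: a naive peak limiter, just a sketch, with all the names and parameters made up.

    import numpy as np

    def naive_limiter(x, ceiling=0.9, release=0.999):
        """Toy peak limiter: clamp the gain instantly when a sample would
        exceed the ceiling, then let the gain recover slowly toward unity."""
        x = np.asarray(x, dtype=np.float64)
        y = np.empty_like(x)
        gain = 1.0
        for i, s in enumerate(x):
            gain = min(1.0, gain / release)   # slow release back toward 1.0
            if abs(s) * gain > ceiling:       # hard "attack": clamp immediately
                gain = ceiling / abs(s)
            y[i] = s * gain
        return y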
Instead of banging my head against a theorem prover that won't say QED, I get a confident-sounding stream of words that I often don't even understand. I often don't even have the language to tell it what I am imagining, and when I do understand, it takes a lot of typing to explain my understanding. And so often, as a teacher, it simply fails to communicate to me why I am wrong.
At the end of all of this, I think specification is really hard, intellectually creative and challenging work. An LLM cannot do the work for you. Even to be guided down the right path, you will need perseverance and motivation.
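To make that concrete: about the only property I managed to pin down is "no output sample ever exceeds the ceiling", and even that needed a floating-point tolerance. A rough sketch with Hypothesis, using the toy limiter from above:

    from hypothesis import given, strategies as st
    import numpy as np

    # Assumes the naive_limiter sketch from earlier is defined/importable.
    @given(st.lists(st.floats(min_value=-10.0, max_value=10.0,
                              allow_nan=False, allow_infinity=False),
                    min_size=1, max_size=512))
    def test_limiter_respects_ceiling(samples):
        ceiling = 0.9
        out = naive_limiter(np.array(samples), ceiling=ceiling)
        # The spec: no output sample exceeds the ceiling (plus float slop).
        assert np.all(np.abs(out) <= ceiling + 1e-9)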
Server-class CPUs and GPUs are littered with side channels that are very difficult to “close”, even in hardened cloud VMs.
We haven't verified “frontier performance” hardware down to the logic gate in quite some time. Prof. Margaret Martonosi's lab and her students have spent years on this challenge, and I am excited to see better, safer memory models out in the wild.
A lot of the same big ideas used in hardware are making their way into the software layer too; see https://faultlore.com/blah/tower-of-weakenings/
While I agree formal verification has its own problems, I think the argument has merit: soon AI-generated code will surpass all human-generated code, and when that happens we will at least need a way to prove that the code is free of security issues and adheres to compliance/policy requirements.
If we're looking to use LLMs to make code absolutely rock-solid, I would say advanced testing practices are a good candidate! Property-based testing, fuzzing, and contract testing (for example https://github.com/griffinbank/test.contract) are all fun but extremely tedious to write and maintain. I think that makes them a perfect candidate for LLMs. These kinds of tests are also more easily understandable by regular ol' software developers, and I think we'll have to be auditing LLM output for quite a while.
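For the contract-testing flavor, you don't even need a library to get the idea across. A hand-rolled Python sketch (all names made up):

    import functools

    def contract(pre=None, post=None):
        """Tiny design-by-contract decorator: check a precondition on the
        arguments and a postcondition on the result."""
        def decorate(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                if pre is not None:
                    assert pre(*args, **kwargs), f"precondition failed for {fn.__name__}"
                result = fn(*args, **kwargs)
                if post is not None:
                    assert post(result, *args, **kwargs), f"postcondition failed for {fn.__name__}"
                return result
            return wrapper
        return decorate

    @contract(pre=lambda items: all(x >= 0 for x in items),
              post=lambda result, items: result >= max(items, default=0))
    def total(items):
        return sum(items)

Writing and maintaining dozens of these pre/postconditions is exactly the tedious-but-auditable work I'd happily hand to an LLM.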
With a domain-specific language you can put extra limitations on the kinds of outputs the model can produce. This can also make formal verification faster.
Maybe like React components: you limit the format. You could limit which libraries can be imported, which hooks can be used, and how expressive the code can be.
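The same idea works outside React too: reject any generated snippet that imports something off an allow-list. A rough Python sketch (the allow-list is hypothetical):

    import ast

    ALLOWED_IMPORTS = {"math", "statistics", "json"}  # hypothetical allow-list

    def check_imports(source: str) -> list[str]:
        """Return the top-level modules imported outside the allow-list."""
        violations = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]
            else:
                continue
            for name in names:
                top_level = name.split(".")[0]
                if top_level not in ALLOWED_IMPORTS:
                    violations.append(top_level)
        return violations

    print(check_imports("import os\nfrom math import sqrt"))  # -> ['os']

Run a check like this before accepting generated code and the model has a narrow, mechanically enforced target to hit.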
It's both. A colorblind person will admit their shortcomings and, if compelled to be helpful like an LLM is, will reason their way to a solution that works around their limitations.
But as LLMs lack a way to reason, you get nonsense instead.
This assumes the colorblind person both believes they are colorblind (in a world where that can be verified) and possesses the tools to overcome the limitation.
You had to be much more clever to 'see' an atom before the invention of the microscope; if the tool doesn't exist, most of the time you are SOL.
This makes me wonder if LLMs work better in Chinese.
It's an example of a simple task. How often are you relying on LLMs to complete simple tasks?
https://chatgpt.com/share/6941df90-789c-8005-8783-6e1c76cdfc...
If you really need an answer and you really need the LLM to give it to you, then ask it to write a (Python?) script to do the calculation you need, execute it, and give you the answer.
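For instance, if the task were exact decimal arithmetic (a made-up example, since I don't know what the original prompt was), the script is trivial and the execution is deterministic:

    from decimal import Decimal

    # Made-up example: exact decimal arithmetic instead of trusting the
    # model's in-head calculation.
    values = ["0.1", "0.2", "0.7"]
    total = sum(Decimal(v) for v in values)
    print(total)  # -> 1.0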
https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/t...