Posted by programLyrique 21 hours ago
I've found that you can get wildly different quality results from these sorts of models due to seemingly insignificant differences in prompt construction. It would be much easier to guess at what it wants if I could just see some RL transcripts -- and so the model author is in a much better position to provide initial advice.
Identify bugs in [datrs/varinteger](https://github.com/datrs/varinteger) . Do NOT look at the GitHub issues, just inspect the source
It also found the bug that Leanstral 1.5 found and the authors highlighted. I think this bug wasn't especially tricky; it's just a case of too few eyeballs on this repo.Congrats on the release regardless! Excited for the direction Lean + automated AI proofs are headed.
Disclosure: I work at OpenAI.
This is a little bit like someone pointing the moon and you look at the finger.
The formal proof domain goes way beyond just finding bugs.
It has tons of usages in term of functional safety, protocol validation, cryptography, etc...
The fact Mistral tackle this kind of problem is both smart and not so surprising.
Smart because it is niche enough that they do not front face the big competitors (yet).
No so surprising because the French labs have a well known and long time expertise with formal proof tools (Coq and all its Ocaml associated tools). It has been historically mainly pushed by the aerospace and train industries (Airbus, Dassault, Alsthom).
Honestly: Think twice before dragging your firm into what you say.
Disclaimer: I speak for myself. Not any firm I am associated with.
this sounds like a great tool to add to the toolbelt, as part of the "how do we handle all the code output from LLMs" problem
The best and the brightest from Europe have no incentive to build in Europe when they can do it in America and be compensated and treated far better
That said, if (or when) the progress of the LLMs flatten out, then I think even Europe can catch up in a few years. If they don’t, and that seems unlikely to me, if the required compute needs to increase at the rate it does today, then I am not sure any of us can predict where society ends up.
For example, Mercedes autonomous driving team is moving ahead at glacial speed, but the system they have so far is excellent and reliable. I'd prefer that over the sad joke Tesla is promoting any day.
Life's not all about the number on your bank account. Once you're past a certain level (which a competent engineer can easily reach in Europe as well), the marginal utility of money diminished quite quickly.