Posted by Poudlardo 1 day ago

Leanstral: Open-source agent for trustworthy coding and formal proof engineering (mistral.ai)
Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37
734 points | 179 comments
miacycle 1 day ago|
The TDD foundation! We might need one of those. :)
igravious 22 hours ago||
"and continues to scale linearly"

It clearly and demonstrably does not. In fact, eyeballing their chart, Qwen, Kimi, and GLM scale linearly whereas Leanstral does not. But this is not surprising: Alibaba, Moonshot, and Zhipu each have hundreds of employees and hundreds of millions of dollars of investment.

ucsandman 10 hours ago||
Love the open-source push for agents, the fleet grows!
westurner 18 hours ago||
From https://mistral.ai/news/leanstral :

  Model        Cost ($) Score
  ..
  Claude Opus     1,650 39.6
  ..
  Leanstral pass@8  145 31.0
  Leanstral pass@16 290 31.9
kittikitti 1 day ago||
This is great, congratulations to the Mistral team! I'm looking forward to the code arena benchmark results. Thanks for sharing.
jiehong 21 hours ago||
Congratulations on the launch!

Mistral seems to focus on a different market than the others. Their best model is meh; their best local ASR model is either rather slow compared to Parakeet on the languages both cover, or not as good on the others (compared to Qwen ASR).

Side note: Lean seems quite unreadable, with tons of single-letter variable names. Part of that is me being unaccustomed to it, but still.
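(To be fair, much of that is library convention rather than the language itself. A made-up one-off illustration of the same trivial lemma, once with terse names and once with descriptive ones:)

```lean
-- Illustrative only: terse names vs. descriptive names for the same statement.
theorem t (l : List Nat) (a : Nat) : a ∈ a :: l :=
  List.mem_cons_self a l

theorem head_mem_cons (rest : List Nat) (x : Nat) : x ∈ x :: rest :=
  List.mem_cons_self x rest
```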

aimanbenbaha 20 hours ago|
Mistral seems to focus on niche LLM tooling that is very much needed in certain cases. Can't forget their OCR and multimodal embedding models!
xpe 8 hours ago||
Public service announcement to hopefully reduce unnecessary knife fights*:

There are two compatible and important (but different) questions in play:

1. Is a program correct relative to a formal specification?

2. Is the formal specification what we mean/want?

*: Worth asking: “Is that other person necessarily wrong? Or are they perhaps discussing a different aspect or framing?” AKA: “be curious and charitable.” I’m not going to link to the specific threads, but they have happened / are happening. Le sigh.
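The distinction shows up naturally in Lean itself. A hedged sketch (the names `spec` and `bad_sort_meets_spec` are made up for illustration, and it assumes Mathlib): a program can be provably correct relative to a spec (question 1) while the spec fails to capture intent (question 2), here because the spec forgot to require the output be a permutation of the input.

```lean
import Mathlib.Data.List.Sort

-- Question 1's spec: "the output is sorted" -- and nothing else.
def spec (f : List Nat → List Nat) : Prop :=
  ∀ xs, List.Sorted (· ≤ ·) (f xs)

-- Formally verified (question 1: yes), yet useless (question 2: no):
-- the spec never demanded the output be a permutation of the input,
-- so the function that throws everything away satisfies it.
theorem bad_sort_meets_spec : spec (fun _ => []) :=
  fun _ => List.sorted_nil
```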

myylogic 5 hours ago||
The verification angle makes sense, especially for high-stakes domains.

But I wonder how this scales in practice outside of formal environments.

In most ML/LLM systems, the bottleneck isn’t just correctness of individual steps, but the interaction between components (data → tokenizer → model → inference). A lot of failures come from subtle mismatches across the pipeline rather than strictly invalid logic.

Formal specs are great when the system is well-defined, but many real-world pipelines are still exploratory and data-dependent.

It feels like there’s a gap between:

• formally verified components

• emergent behavior in end-to-end systems

Curious how you see this approach handling those system-level uncertainties.
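One cheap mitigation for those handoff mismatches is a contract check at each pipeline boundary. A minimal sketch in Python, using a hypothetical tokenizer→model handoff; all names and numbers are illustrative assumptions, not a real API:

```python
# Sketch: each stage can be individually "correct" while the handoff is wrong.
def check_handoff(vocab_size: int, embedding_rows: int, max_token_id: int) -> list[str]:
    """Flag mismatches between a tokenizer and the model consuming its output."""
    problems = []
    if max_token_id >= embedding_rows:
        problems.append("tokenizer can emit ids outside the model's embedding table")
    if vocab_size != embedding_rows:
        problems.append("tokenizer vocab size and embedding table size disagree")
    return problems
```

Each component passes its own tests; only the pairing is broken, which is exactly the class of bug a per-component formal spec won't catch.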

whazor 5 hours ago|
There are also software model checkers that can model distributed processes. You have to simplify the state a bit, otherwise you get a state space explosion.

I tried it out myself: I let AI add action transitions throughout the code, like `// A -> B: some description`. Then I validate via a test that every action transition defined in my model is also commented somewhere in the code, and the other way around, that every such comment exists in the model.

Finally, I let AI write model check queries on particular properties. If I notice a particular bug, then I ask AI to analyze the model and the model check queries on why it could happen, and ask to strengthen it.

It sounds like a lot of effort, but I got it working in half an hour.
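The code↔model cross-check described above fits in a few lines. The `// A -> B: some description` comment format is taken from the comment; everything else here is an illustrative assumption:

```python
import re

# Matches transition comments of the form "// A -> B: some description".
TRANSITION_RE = re.compile(r"//\s*(\w+)\s*->\s*(\w+)")

def transitions_in_code(source: str) -> set[tuple[str, str]]:
    """Collect (from_state, to_state) pairs from transition comments."""
    return {(a, b) for a, b in TRANSITION_RE.findall(source)}

def cross_check(model: set[tuple[str, str]], code: set[tuple[str, str]]) -> list[str]:
    """Report transitions present on one side but missing from the other."""
    return (
        [f"{a} -> {b}: in model but not commented in code" for a, b in sorted(model - code)]
        + [f"{a} -> {b}: commented in code but missing from model" for a, b in sorted(code - model)]
    )
```

Running `cross_check` in a unit test keeps the model and the code comments from drifting apart silently.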
