Posted by Anon84 9/4/2025
Same price, but dramatically better results, way more reliable, and 10x faster. The only downside is that when it does fail, it seems to fail much harder. Where gpt-5-mini would disregard the formatting in the prompt 70% of the time, mistral-medium follows it 99% of the time, but the other 1% of the time it inserts random characters (for whatever reason, normally backticks... which then causes its own formatting issues).
Still, very happy with Mistral so far!
e.g. DOMINO https://arxiv.org/html/2403.06988v1
I think they are saying that if the highest-probability phrase fails the regex, the LLM is able to substitute the next most likely candidate.
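As a rough illustration of that fallback idea (not DOMINO's actual algorithm, which constrains decoding token by token; the candidates here are just an n-best list):

```python
import re

def pick_first_matching(candidates, pattern):
    """Return the most likely candidate that satisfies the regex.

    `candidates` is assumed to be ordered from most to least probable,
    e.g. an n-best list of completions from the model.
    """
    compiled = re.compile(pattern)
    for text in candidates:
        if compiled.fullmatch(text):
            return text
    return None  # nothing matched; caller can retry or relax the constraint

# The top candidate breaks the expected "word: translation" shape,
# so the next most likely one is substituted instead.
candidates = ["bonjour -- hello", "bonjour: hello"]
print(pick_first_matching(candidates, r"\w+: \w+"))  # -> bonjour: hello
```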
In Slack/Teams I do it with anything someone might copy and paste, to ensure that the chat client doesn't do something horrendous like replace my ASCII double quotes with the fancy Unicode ones that cause syntax errors (a quick cleanup sketch is below).
In README files, any example path, code, YAML, or JSON is wrapped in code quotes.
In my personal (text file) notes I also use ``` {} ``` to denote a code block I'd like to remember, just out of habit from the other two above.
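For the Slack/Teams case above, a tiny sketch of undoing that substitution after the fact, in case something does get pasted unfenced (the mapping is illustrative, not exhaustive):

```python
# Map the "smart" punctuation chat clients like to substitute back to ASCII.
SMART_TO_ASCII = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2013": "-",                  # en dash
}

def normalize_quotes(text):
    return text.translate({ord(k): v for k, v in SMART_TO_ASCII.items()})

print(normalize_quotes("print(\u201chello\u201d)"))  # -> print("hello")
```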
Would you like to elaborate further on how the experience was with it? What was your approach for using it? How did you generate synthetic data? How did it perform?
It also helps to have a field of the JSON be a confidence score, or a similar pattern, to act as a cutoff for which responses are accepted.
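A minimal sketch of that pattern, assuming the prompt asks for a top-level `confidence` field in [0, 1] (the field name and threshold are illustrative):

```python
import json

CONFIDENCE_CUTOFF = 0.8  # arbitrary threshold; tune per task

def accept_response(raw):
    """Parse the model's JSON reply and keep it only if it's confident enough."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None  # malformed output is rejected too
    if payload.get("confidence", 0.0) < CONFIDENCE_CUTOFF:
        return None
    return payload

print(accept_response('{"answer": "bonjour", "confidence": 0.93}'))  # accepted
print(accept_response('{"answer": "???", "confidence": 0.41}'))      # -> None
```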
I got this format from writing markdown files, it’s a nice way to share examples and also specify which format it is.
This is maybe more, maybe less insidious. It will literally just insert a random character into the middle of a word.
I work with an app that supports 120+ languages though. I give the LLM translations, transliterations, grammar features, etc., and ask it to explain them in plain English. So it's constantly switching between multiple real, and sometimes fake (transliterations), languages. I don't think most users would experience this.
Was looking to both decrease costs and experiment outside of OpenAI's offerings, and ended up using Mistral Small for summarization and Large for the final analysis step. I'm super happy.
They also have a very generous free tier, which helps in creating PoCs and demos.
Is there an example you can show that tended to fail?
I’m curious how token constraint could have strayed so far from your desired format.
Yes I use(d) structured output. I gave it very specific instructions and data for every paragraph, and asked it to generate paragraphs for each one using this specific format. For the formatting, I have a large portion of the system prompt detailing it exactly, with dozens of examples.
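Roughly, the request looks like this (a heavily simplified sketch: the prompt, JSON keys, and model alias are illustrative, assuming Mistral's JSON mode on the chat completions endpoint):

```python
import os, requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-medium-latest",            # illustrative alias
        "response_format": {"type": "json_object"},  # ask for structured output
        "messages": [
            # The real system prompt spells out the prose format in detail,
            # with dozens of examples; this is a stand-in.
            {"role": "system", "content": (
                "Explain the word below in plain English prose. "
                "Return JSON with keys 'word', 'translation', 'paragraph'. "
                "Bold the translation and italicize grammar terms, "
                "exactly as in the formatting examples."
            )},
            {"role": "user", "content": "word: bonjour, sentence: Bonjour, ça va ?"},
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```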
gpt-5-mini would normally use this formatting maybe once, and then just kinda do whatever it wanted for the rest of the time. It also would freestyle and put all sorts of things in the various bold and italic sections (using the language name instead of the translation was one of its favorites) that I’ve never seen mistral do in the thousands of paragraphs I’ve read. It also would fail in some other truly spectacular ways, but to go into all of them would just be bashing on gpt-5-mini.
Switched it over to mistral, and with a bit of tweaking, it’s nearly perfect (as perfect as I would expect from an LLM, which is only really 90% sufficient XD)
Would you be willing to share an example prompt? I'm curious to see what it's responding well to.
Mistral medium is ranked #8 on lmsys arena IIRC, so it’s probably just not your style?
I’m also comparing this to gpt-5-mini, not the big boy
For example, I ask: what are the most common targets of removal in Magic: The Gathering? Mistral's answer is so-so, including a slew of cards you would prioritize removing, but also several you typically wouldn't, including things like Mox Amber, a 0-cost mana rock. Gemini Flash gave far fewer examples, one for each major card type, but all of them are definitely priority targets that often defined an entire metagame, like Tarmogoyf.
It’s pretty rare though. Really solid model, just a few quirks
I’m making an app to learn multiple languages. This portion of the pipeline is about explaining everything I can determine about a word in a sentence, in specifically formatted prose.
Example: https://x.com/barrelltech/status/1963684443006066772?s=46&t=...
I'd love to try Phrasing, but there is no way that I'm going to give my credit card information if I've never seen how it works. I'm not asking for special treatment. I think the 4 Euro 14 day trial should be free. Not because I don't want to pay, but because I don't want my credit card details shared with every service I ever try.
Yes, I understand that your credit card provider is safe and that is not my worry. It's just that handing over that information is a very large hurdle, and one that users get over only after you've built trust with them. If we can never see what we're getting, we have no reason to get over that hurdle and develop that trust.
My main task this week (besides QAing the new setup flow) is updating examples on the home page and in the blog. It’s all a one person show, so things move a little slowly, but I’m working as fast as I can :)
The paid trial gets you credits at cost — Phrasing is bootstrapped by me, and there’s no way I can afford a $4 loss leader for everyone who signs up. I would have to 100x the subscription prices (or more) in order to afford that.
There is a way to use the new setup process to “try” the application without spending money, but that’s not an avenue I want to intentionally funnel users through, so you won’t see any copy around it. Feel free to get in touch via the app though if you need more details.
I have plans for a free trial eventually, but it's blocked by a few key product developments, so it will be at least a year before that's an option!
I have no affiliation with Mistral; I just have recent experience with them and wanted to share.
Also @mistral hmu if you want to arrange something!
I don’t write much about Elixir (or Clojure) as I’m terrible at technical writing, but I am a die-hard fan of both languages.
Apart from that, Mistral appears to remain the only really relevant new player with European ties in the Gen AI space. Aleph Alpha isn't heard of anymore and is essentially steered by the Schwarz Group now, so at best there's a chance of an acquihire, I guess.
I think AI in Europe is doable in general.
I don't think that was unclear to anyone - again, I'm sure some EU entities might want EU-related AI companies more than they care about any other features, just as some Turkmenistani entities would prefer Turkmenistani AI. I hope the point about why that advantage is banausic here is clearer now.
Besides those EU entities, do these companies offer any advantages compared to American or Chinese AI companies for the entire rest of the world? Licensing, rankings in specific benchmarks, etc?
Our engineer used Lovable for about a day, then just cloned the repo and used Cursor, since it was much more productive.
One PM I know uses it for designing prototypes, then handing them off to the engineering team, who continue them in Claude Code etc.
So it's sort of competing with Wix, Squarespace, WordPress, and also prototyping tools like Figma.
1. A general purpose LLM chat interface with high reasoning capacity (GPT-5 thinking on web is my go to for now)
2. An agent that has unrestricted token consumption running on my machine (Claude Code with Opus and Amp are my go to for now).
3. A fine-tuned, single purpose LLM like v0 that is really good at one thing, in this case at generating a specific UI component with good design aesthetics from a wireframe in a sandbox.
Everything else seems like getting the worst of all worlds.
I could never get anything useful out of Lovable and was frustrated with the long editing and waiting process.
I'd prefer a site builder template with dropdowns. Lovable feels like that type of product, just with an LLM facade.
I don't hate AI, I just wasn't getting into the groove with Lovable.
I use Lumo a lot and usually results are good enough. To be clear though, I do fall back on gemini-cli and OpenAI’s codex systems for coding a few times a week.
I live in the US, but if I were a European, I would be all in on supporting Mistral. Strengthen your own country and region.
That's a bit of a double edged sword. My support goes as far as giving local offerings a try when I might not have done otherwise. But at that point they need to be able to compete on merit.
> Lumo is powered by open-source large language models (LLMs) which have been optimized by Proton to give you the best answer based on the model most capable of dealing with your request. The models we’re using currently are Nemo, OpenHands 32B, OLMO 2 32B, and Mistral Small 3. These run exclusively on servers Proton controls so your data is never stored on a third-party platform.
The problem is that if it's actually successful it'll just be bought by one of the big US based competitors.
https://www.bloomberg.com/news/articles/2025-09-03/mistral-s...
What does secure mean in this context? I didn't see it explained here.
Perhaps they mean this?
> Admin users can confidently control which connectors are available to whom in their organization, with on-behalf authentication, ensuring users only access data they’re permitted to.
Cool!