Top
Best
New

Posted by Anon84 9/4/2025

Le Chat: Custom MCP Connectors, Memories (mistral.ai)
397 points | 163 comments
barrell 9/4/2025|
I recently upgraded a large portion of my pipeline from gpt-4.1-mini to gpt-5-mini. The performance was horrible - after some research I decided to move everything to mistral-medium-0525.

Same price, but dramatically better results, way more reliable, and 10x faster. The only downside is when it does fail, it seems to fail much harder. Where gpt-5-mini would disregard the formatting in the prompt 70% of the time, mistral-medium follows it 99% of the time, but the other 1% of the time inserts random characters (for whatever reason, normally backticks... which then causes its own formatting issues).

Still, very happy with Mistral so far!

mark_l_watson 9/4/2025||
It is such a common pattern for LLMs to surround generated JSON with ```json … ``` that I check for this at the application level and fix it. Ten years ago I would do the same sort of sanity checks on formatting when I used LSTMs to generate synthetic data.
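The application-level fix the comment above describes can be sketched in a few lines. This is only an illustrative helper (the function name and regex are mine, not from any particular library): strip a markdown fence if one is present, then parse.

```python
import json
import re

def parse_llm_json(text: str):
    """Parse JSON from an LLM reply, tolerating ```json ... ``` fences."""
    # If the reply is wrapped in a markdown code fence, keep only the body.
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if match:
        text = match.group(1)
    return json.loads(text)

print(parse_llm_json('```json\n{"ok": true}\n```'))  # {'ok': True}
```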
mpartel 9/4/2025|||
Some LLM APIs let you give a schema or regex for the answer. I think it works because LLMs give a probability for every possible next token, and you can filter that list by what the schema/regex allows next.
hansvm 9/4/2025||
Interestingly, that gives a different response distribution from simply regenerating while the output doesn't match the schema.
Rudybega 9/4/2025|||
This is true, but there are methods to greatly reduce the effect of this and generate results that match or even improve overall output accuracy:

e.g. DOMINO https://arxiv.org/html/2403.06988v1

joshred 9/4/2025|||
It sounds like they are describing a regex filter being applied to the model's beam search. LLMs generate the most probable words, but they are frequently tracking several candidate phrases at a time and revising their combined probability. It lets them self correct if a high probability word leads to a low probability phrase.

I think they are saying that if the highest-probability phrase fails the regex, the LLM is able to substitute the next most likely candidate.

stavros 9/4/2025||
You're actually applying a grammar to the token stream. If you're outputting, for example, JSON, you know which characters are valid next (because of the grammar), so you just filter out the tokens that don't fit the grammar.
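A toy sketch of that filtering idea: mask out every token the grammar forbids, renormalize what's left, then sample. Everything here is made up for illustration (the vocabulary, the uniform probabilities, and a trivially linear "grammar"); real implementations such as llama.cpp grammars or Outlines do the same masking on the model's actual logits.

```python
import random

# Tiny stand-in vocabulary and "grammar": a JSON object must be emitted
# in the fixed order {, "key", :, "value", }.
VOCAB = ['{', '}', '"key"', ':', '"value"', 'hello']
ORDER = ['{', '"key"', ':', '"value"', '}']

def allowed(prefix: list[str]) -> set[str]:
    """Return the set of tokens the grammar permits next."""
    return {ORDER[len(prefix)]} if len(prefix) < len(ORDER) else set()

def sample_constrained(probs: dict[str, float], prefix: list[str]) -> str:
    ok = allowed(prefix)
    # Mask: drop every token the grammar forbids, then renormalize.
    masked = {t: p for t, p in probs.items() if t in ok}
    total = sum(masked.values())
    r, acc = random.random() * total, 0.0
    for t, p in masked.items():
        acc += p
        if r <= acc:
            return t
    return next(iter(ok))

prefix = []
while allowed(prefix):
    probs = {t: 1.0 for t in VOCAB}  # pretend uniform model distribution
    prefix.append(sample_constrained(probs, prefix))
print(''.join(prefix))  # {"key":"value"}
```

Note this is exactly why the distribution differs from rejection sampling: masking renormalizes locally at each step, rather than conditioning the whole sequence on eventual validity.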
viridian 9/4/2025||||
I'm sure the reason is the plethora of markdown data it was trained on. I personally use ``` stuff.txt ``` extremely frequently, in a variety of places.

In slack/teams I do it with anything someone might copy and paste to ensure that the chat client doesn't do something horrendous like replace my ascii double quotes with the fancy unicode ones that cause syntax errors.

In readme files any example path, code, yaml, or json is wrapped in code quotes.

In my personal (text file) notes I also use ``` {} ``` to denote a code block I'd like to remember, just out of habit from the other two above.

accrual 9/4/2025||
Same. For me it's almost like a symbiotic thing. After using LLMs for a couple of years I noticed I use code blocks/backticks a lot more often. It's helpful for me as an inline signal like "this is a function name or hostname or special keyword" but it's also helpful for other people/Teams/Slack and LLMs alike.
OJFord 9/4/2025||
I'm the opposite, always been pretty good about doing that in Slack etc. (or even here where it doesn't affect the rendering) but I sometimes don't bother in LLM chat.
fumeux_fume 9/4/2025||||
Very common struggle, but a great way to prevent that is prefilling the assistant response with "{", or with as much of the JSON output as you know ahead of time, like '{"response": ['
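A sketch of that prefill trick. The message shape follows Anthropic-style chat APIs, where a trailing assistant message is treated as text the model must continue from; `call_model` here is a stand-in, not a real client, and the API returns only the continuation, so you stitch the prefill back on yourself.

```python
# Seed the assistant turn with the start of the JSON you expect, so the
# model can only continue from there.
PREFILL = '{"response": ['

def build_messages(user_prompt: str) -> list[dict]:
    return [
        {"role": "user", "content": user_prompt},
        # Final assistant message = text the model continues from.
        {"role": "assistant", "content": PREFILL},
    ]

def complete_json(user_prompt: str, call_model) -> str:
    continuation = call_model(build_messages(user_prompt))
    # The API returns only the continuation; reattach the prefill.
    return PREFILL + continuation

# Fake model for illustration: closes the array and the object.
fake = lambda msgs: '"ok"]}'
print(complete_json("List results as JSON", fake))  # {"response": ["ok"]}
```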
XenophileJKO 9/4/2025|||
Just to be clear for anyone reading this, the optimal way to do this is schema enforced inference. You can only get a parsable response. There are failure modes, but you don't have to mess with parsing at all.
psadri 9/4/2025|||
Haven’t tried this. Does it mix well with tool calls? Or does it force a response where you might have expected a tool call?
fumeux_fume 9/4/2025||
It'll force a response that begins with an open bracket. So if you might need a response with a tool call that doesn't start with "{", then it might not fit your workflow.
Alifatisk 9/4/2025||||
I think this is the first time I've stumbled upon someone who actually mentions LSTMs in a practical way instead of just theory. Cool!

Would you like to elaborate further on how the experience was with it? What was your approach for using it? How did you generate synthetic data? How did it perform?

p1esk 9/4/2025||
10 years ago I used LSTMs for music generation. Worked pretty well for short MIDI snippets (30-60 seconds).
freehorse 9/4/2025||||
I had similar issues with local models, ended up actually requesting the backticks because it was easier that way, and parsed the output accordingly. I cached a prompt with explicit examples of how to structure the data, and reused it over and over. I have found that without examples in the prompts some llms are very unreliable, but with caching some example prompts this becomes a non-issue.
mejutoco 9/4/2025||||
Funny, I do the same. Additionally, one can define a json schema for the output and try to load the response as json or retry for a number of times. If it is not valid json or the schema is not followed we discard it and retry.

It also helps with having a field of the json be the confidence or a similar pattern to act as a cut for what response is accepted.
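The parse-validate-retry loop described above can be sketched like this. `call_model` is a hypothetical stand-in for any LLM call returning a string, and the `answer`/`confidence` shape is just the convention from the comment, not a standard API feature.

```python
import json

def generate_validated(call_model, prompt, max_retries=3, min_confidence=0.7):
    """Retry until the model returns valid JSON with acceptable confidence."""
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed JSON: discard and retry
        if not isinstance(data, dict) or "answer" not in data:
            continue  # schema not followed: discard and retry
        # Confidence cutoff, per the pattern described above.
        if data.get("confidence", 0.0) >= min_confidence:
            return data
    return None

# Flaky fake model: fails twice, then returns an acceptable answer.
replies = iter(['not json', '{"answer": "x", "confidence": 0.2}',
                '{"answer": "Paris", "confidence": 0.9}'])
print(generate_validated(lambda p: next(replies), "Capital of France?"))
# prints {'answer': 'Paris', 'confidence': 0.9}
```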

tosh 9/4/2025||||
I think most mainstream APIs by now have a way for you to conform the generated answer to a schema.
Alifatisk 9/4/2025||||
I do use backticks a lot when sharing examples in different format when using LLMs and I have instructed them to do likewise, I also upvote whenever they respond in that matter.

I got this format from writing markdown files, it’s a nice way to share examples and also specify which format it is.

barrell 9/4/2025|||
Yeah, that’s infuriating. They’re getting better now with structured data, but it’s going to be a never ending battle getting reliable data structures from an LLM.

This is maybe more, maybe less insidious. It will literally just insert a random character into the middle of a word.

I work with an app that supports 120+ languages though. I give the LLM translations, transliterations, grammar features etc and ask it to explain it in plain English. So it’s constantly switching between multiple real, and sometimes fake (transliterations) languages. I don’t think most users would experience this

epolanski 9/4/2025|||
I had a similar experience on my pipeline.

Was looking to both decrease costs and experiment out of OpenAI offering and ended up using Mistral Small on summarization and Large for the final analysis step and I'm super happy.

They have also a very generous free tier which helps in creating PoCs and demos.

siva7 9/5/2025|||
I thought I was the only one experiencing this slowness. I can't comprehend why something called gpt mini is actually slower than its non-mini counterpart.
barrell 7 days ago||
Nooo you are definitely not alone. gpt-5-nano is even the slowest model I’ve used since like 2023, second only to gpt-5-mini
fkyoureadthedoc 9/4/2025|||
Same, my project has a step that selects between many options when a user is trying to do some task. The test set for the workflow that supports this has a success rate about 7% higher on gpt-4.1-mini than on gpt-5 and gpt-5-mini (with minimal thinking)
WhitneyLand 9/4/2025|||
Were you using structured output with gpt-5 mini?

Is there an example you can show that tended to fail?

I’m curious how token constraint could have strayed so far from your desired format.

barrell 9/4/2025||
Here is an example of the formatting I desired: https://x.com/barrelltech/status/1963684443006066772?s=46&t=...

Yes I use(d) structured output. I gave it very specific instructions and data for every paragraph, and asked it to generate paragraphs for each one using this specific format. For the formatting, I have a large portion of the system prompt detailing it exactly, with dozens of examples.

gpt-5-mini would normally use this formatting maybe once, and then just kinda do whatever it wanted for the rest of the time. It also would freestyle and put all sorts of things in the various bold and italic sections (using the language name instead of the translation was one of its favorites) that I’ve never seen mistral do in the thousands of paragraphs I’ve read. It also would fail in some other truly spectacular ways, but to go into all of them would just be bashing on gpt-5-mini.

Switched it over to mistral, and with a bit of tweaking, it’s nearly perfect (as perfect as I would expect from an LLM, which is only really 90% sufficient XD)

viridian 9/4/2025|||
I'm curious what your prompts look like, as this is the opposite of my experience. I use lmarena for many of the random one shot questions I have, and I've noticed that mistral-medium is almost always the worse of the two after I blind vote. Feels like it consistently takes losses from qwen, llama, gemini, gpt, you name it. I find it overwhelmingly the most likely to produce factually untrue information to an inquiry.

Would you be willing to share an example prompt? I'm curious to see what it's responding well to.

barrell 9/4/2025||
I provide it with data and ask it to convert it to prose in specific formats.

Mistral medium is ranked #8 on lmsys arena IIRC, so it’s probably just not your style?

I’m also comparing this to gpt-5-mini, not the big boy

viridian 9/4/2025||
I think input strategy probably accounts for the difference. Usually I'm just asking a short question with no additional context, and usually it's not the sort of thing that has one well defined answer. I'm really asking it to summarize the wisdom of the crowd, so to speak.

For example, I ask, what are the most common targets of removal in magic: the gathering? Mistral's answer is so-so, including a slew of cards you would prioritize removing, but also several you typically wouldn't, including things like mox amber, a 0 cost mana rock. Gemini flash gave far fewer examples, one for each major card type, but all of them are definitely priority targets that often defined an entire metagame, like Tarmogoyf.

barrell 9/4/2025||
Ah yeah. I’m only grading it on its prose, formatting, ability to interpret data, and instruction following. I do not use it as a store of knowledge
thijsverreck 9/4/2025|||
Any chance at fixing it with regex parsing or redoing inference when the results are below a certain threshold?
barrell 9/4/2025||
It’s user facing, so will just have an option for users to regenerate the explanation. It happens rarely enough that it’s not a huge issue, and normally doesn’t affect content (I think once I saw it go a little wonky and end the sentence with a few random words). Just sometimes switches to mono space font in the middle of a paragraph, or it “spells” a word wrong (spell is in quotes because it will spell `chien` as `chi§en`).

It’s pretty rare though. Really solid model, just a few quirks

brcmthrowaway 9/4/2025|||
What are you actually making
barrell 9/4/2025||
https://phrasing.app

I’m making an app to learn multiple languages. This portion of the pipeline is about explaining everything I can determine about a word in a sentence in specifically formatted prose.

Example: https://x.com/barrelltech/status/1963684443006066772?s=46&t=...

dotancohen 6 days ago||
This looks great. I invest over an hour a day on Anki, so I'm probably your target audience.

I'd love to try Phrasing, but there is no way that I'm going to give my credit card information if I've never seen how it works. I'm not asking for special treatment. I think the 4 Euro 14 day trial should be free. Not because I don't want to pay, but because I don't want my credit card details shared with every service I ever try.

Yes, I understand that your credit card provider is safe and that is not my worry. It's just that handing over that information is a very large hurdle, and one that users get over only after you've built trust with them. If we can never see what we're getting, we have no reason to get over that hurdle and develop that trust.

barrell 4 days ago||
The onboarding flow is already updated with better examples. It’s been my main task this past week — it still needs quite a bit of QA, but it’s been better than the previous flow, so I decided to ship it early. Any bugs will get ironed out in the next day or two. There’s now pretty clear examples of everything

My main task this week (besides QAing the new setup flow) is updating examples on the home page and in the blog. It’s all a one person show, so things move a little slowly, but I’m working as fast as I can :)

The paid trial gets you credits at cost — Phrasing is bootstrapped by me, and there’s no way I can afford a $4 loss leader for everyone who signs up. I would have to 100x the subscription prices (or more) in order to afford that.

There is a way to use the new setup process to “try” the application without spending money, but that’s not an avenue I want to intentionally funnel users through, so you won’t see any copy around it. Feel free to get in touch via the app though if you need more details.

I have plans for a free trial eventually, but its blocked by a few key product developments, so it will be a year at least before that’s an option!

FranklinMaillot 9/4/2025|||
You may be aware of that, but they released mistral-medium-2508 a few days ago.
barrell 9/4/2025||
I did not! It’s not on azure yet and I’ve still got some credits to burn. That’s exciting though, hopefully it will iron out this weird ghost character issue.
noreplydev 9/4/2025|||
mistral speed is amazing
VeryNosy 9/4/2025||
[flagged]
neya 9/4/2025||
I'm sorry, I don't see anywhere that they work for Mistral? Is there something I'm missing here?
barrell 9/4/2025||
I’m curious too. I’m building https://phrasing.app. A quick glance at my profile should be enough to disprove the claim that I work for mistral.

I have no affiliation with mistral, I just have recent experience with them and wanted to share

Also @mistral hmu if you want to arrange something!

neya 9/4/2025||
Nice to meet a fellow Elixir dev, all the best for your app, looks super cool :) Are you on Reddit? (I'm usually more active there).
barrell 9/4/2025||
Thanks for the kind words! You can find me at https://reddit.com/u/phrasingapp and https://x.com/barrelltech

I don’t write much about elixir (or clojure) as I’m terrible at technical writing, but I am a die hard fan of both languages

mickael-kerjean 9/4/2025||
If someone from mistral comes around, is there a way for third-party MCP implementations to be in the connector directory? I built an MCP connector allowing people to connect to every possible file transfer protocol, from S3 to FTP(S), SFTP, SMB, NFS, Gdrive, Dropbox, azure blob, onedrive, sharepoint, etc. It has a couple of layers to delegate authentication, enforce authorisation, support RBAC and create chroots so the LLM can't go haywire, plus tools to visualise and/or edit hundreds of file formats. Would be awesome to get it listed, and it's open source: https://github.com/mickael-kerjean/filestash
beernet 9/4/2025||
Mistral recently being valued at $14 billion in its latest funding round appears like a steal to me, especially compared to the Anthropic and OAI valuations. It would be interesting to compare revenues and growth rates as well to put these valuations into better perspective.

Apart from that, Mistral appears to remain the only really relevant new player with European ties in the Gen AI space. Aleph Alpha is not heard of anymore and is essentially steered by the Schwarz Group now, so at best an acquihire remains, I guess.

riedel 9/4/2025||
Without creating so much buzz there is also still DeepL . They just announced an agent framework: https://www.heise.de/en/news/DeepL-presents-its-own-AI-agent...

I think AI in Europe is doable in general.

FinnLobsien 9/4/2025||
What's their unique value? How are they differentiated vs. OAI/Anthropic/etc. who have way more money/distribution/etc.?
meesles 9/4/2025|||
Not being based in the US is quite a differentiator for a lot of the world
DetroitThrow 9/4/2025||
I think the obvious question is whether they provide any differentiation beyond merely their HQ jurisdiction, since I'm sure we can all agree Turkmenistan AI would be very important for Turkmenistani government agencies too..
saubeidl 9/4/2025||
The difference with Turkmenistan is that the EU is the world's second largest economy. Having a near-monopoly on that is better than fighting over the largest economy.
DetroitThrow 9/5/2025||
>The difference with Turkmenistan is that the EU is the world's second largest economy

I don't think that was unclear to anyone - again, I'm sure some EU entities might want EU related AI companies more than they care about any other features, just as some Turkmenistani entities would prefer Turkmenistan AI. I hope the point about why that advantage is banausic here is more clear, now.

Besides those EU entities, do these companies offer any advantages compared to American or Chinese AI companies for the entire rest of the world? Licensing, rankings in specific benchmarks, etc?

riedel 9/4/2025|||
They get translation in many languages right (which is important in Europe). They do not offer general purpose GenAI yet. But as they provide models for translation and text editing they have gained the trust of many companies. If they now move towards agentic AI for administrative tasks, they for sure have chances in procurement.
ljlolel 9/4/2025|||
Also Lovable
hansonkd 9/4/2025|||
Lovable has the worst moat I have ever seen for a company.

Our engineer used Lovable for about a day, then just cloned the repo and used Cursor since it was much more productive.

saberience 9/5/2025||
Engineers aren't the target audience for Lovable. I see it being used by designers and product managers, and also solopreneur types, or non-technical folks wanting to build websites or start companies.

One PM I know uses it for designing prototypes then handing them off to the engineering team who then continue them in Claude Code etc.

So it's sort of competing with Wix, Squarespace, Wordpress, and also prototyping tools like Figma.

hashbig 9/4/2025||||
I just couldn't love it, and frankly I don't get the hype around it. I recently found that all my use cases can be served by either:

1. A general purpose LLM chat interface with high reasoning capacity (GPT-5 thinking on web is my go to for now)

2. An agent that has unrestricted token consumption running on my machine (Claude Code with Opus and Amp are my go to for now).

3. A fine-tuned, single purpose LLM like v0 that is really good at one thing, in this case at generating a specific UI component with good design aesthetics from a wireframe in a sandbox.

Everything else seems like getting the worst of all worlds.

echelon 9/4/2025|||
Aren't there a billion Lovable clones now that do the exact same thing?

I could never get anything useful out of Lovable and was frustrated with the long editing and waiting process.

I'd prefer a site builder template with dropdowns. Lovable feels like that type of product, just with an LLM facade.

I don't hate AI, I just wasn't getting into the groove with Lovable.

brulard 9/4/2025||
Yeah for me Lovable was not really lovable.
hugedickfounder 9/4/2025||
[dead]
mark_l_watson 9/4/2025||
I pay to use ProtonMail’s privacy preserving Lumo LLM Chat with good web_search tooling. Lumo is powered by Mistral models.

I use Lumo a lot and usually results are good enough. To be clear though, I do fall back on gemini-cli and OpenAI’s codex systems for coding a few times a week.

I live in the US, but if I were a European, I would be all in on supporting Mistral. Strengthen your own country and region.

g-mork 9/4/2025||
I wonder what ProtonMail are doing internally? Mistral's public API endpoints route via CloudFlare, just like apparently every other hosted LLM out there, even any of the Chinese models I've checked
fauigerzigerk 9/4/2025|||
>I live in the US, but if I were a European, I would be all in on supporting Mistral. Strengthen your own country and region

That's a bit of a double edged sword. My support goes as far as giving local offerings a try when I might not have done otherwise. But at that point they need to be able to compete on merit.

TranquilMarmot 9/4/2025||||
https://proton.me/support/lumo-privacy

> Lumo is powered by open-source large language models (LLMs) which have been optimized by Proton to give you the best answer based on the model most capable of dealing with your request. The models we’re using currently are Nemo, OpenHands 32B, OLMO 2 32B, and Mistral Small 3. These run exclusively on servers Proton controls so your data is never stored on a third-party platform.

ac29 9/4/2025|||
Mistral small and large are open weight, so they are likely self hosting?
coolspot 9/4/2025|||
Note that Proton is sketchy about their code being open-source and available for anyone to review: https://news.ycombinator.com/item?id=44665398
basisword 9/4/2025||
>> I live in the US, but if I were a European, I would be all in on supporting Mistral. Strengthen your own country and region.

The problem is that if it's actually successful it'll just be bought by one of the big US based competitors.

saubeidl 9/4/2025||
I don't think France would allow that to happen - they would block it on national interest grounds.
cramsession 9/4/2025||
I've never used their models, but I love that design. Kudos to the Mistral design team, the modern pixel art look with orange colors is very cool.
bobbylarrybobby 9/5/2025|
And they even use Arial as their site-wide font?? Now that's certainly... a decision.
kjgkjhfkjf 9/4/2025||
Why would I want to use Mistral's MCP services instead of official MCP services from Notion, Stripe, etc.? It seems to me that the official MCP services would be strictly better, e.g. because I don't have to grant access to my resources to Mistral.
signatoremo 9/4/2025||
Related, Mistral is closing in on a funding round at a $14 billion valuation

https://www.bloomberg.com/news/articles/2025-09-03/mistral-s...

SilverElfin 9/4/2025|
Doesn’t seem like much. Anthropic raised nearly as much the other day in funding as what Mistral is being valued at. Can they really survive?
saubeidl 9/4/2025||
I find that American AI companies are being incredibly wasteful - they've famously been shown up by frugal Deepseek, but generally smarter architectures outweigh more raw resources.
santiagobasulto 9/4/2025||
I think the "frugality" has been challenged by several sources, right? Nobody can prove they were as inexpensive as they claimed.
aargh_aargh 9/4/2025||
> Directory of 20+ secure connectors

What does secure mean in this context? I didn't see it explained here.

Perhaps they mean this?

> Admin users can confidently control which connectors are available to whom in their organization, with on-behalf authentication, ensuring users only access data they’re permitted to.

oezi 9/4/2025|
Yeah, and what kind of features do MCPs by Stripe and PayPal offer? Currency conversion? Fees? API docs?
ffsm8 9/4/2025|||
Maybe also transaction search (helpful for customer support), just the currency conversion ratio or balances, helpful for accounting etc. lots of read only usecases around
amelius 9/4/2025||
> Everything available on the Free plan

Cool!

samuel 9/4/2025|
Custom connectors are cool and a good selling point, but they have to be remote (afaik there is no Le Chat Desktop), so using them with local resources is not impossible, but hard to set up and not very practical (you need Tailscale funnel or equivalent).
More comments...