Mistral OCR 3 - Hacker News

Posted by pember 12/18/2025

694 points | 130 comments | page 2

jesuslop 12/19/2025|

I am testing it as a replacement for MathPix; the first few tests look rather decent. In Python for Windows: https://pastebin.com/uyiFHKdJ (alpha version prototype). It launches the Windows snipping tool, waits for a clipboard image, calls Mistral, retrieves the markdown, and puts it as text on the clipboard, ready to be pasted into Typora, Obsidian, or another markdown editor.
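
For readers who want the shape of that workflow without opening the pastebin, here is a minimal sketch of two of its steps: waiting for an image to land on the clipboard, and encoding it for an OCR request. This is not the linked script. `wait_for_image`, `to_data_url`, and the `grab` callable are hypothetical names; in a real script `grab` would wrap whatever clipboard reader you use and the data URL would be handed to your OCR client.

```python
import base64
import time
from typing import Callable, Optional


def wait_for_image(grab: Callable[[], Optional[bytes]],
                   timeout: float = 30.0,
                   poll: float = 0.2) -> bytes:
    """Poll `grab` until it returns image bytes or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        data = grab()  # hypothetical clipboard reader, returns None if empty
        if data:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("no image appeared on the clipboard")
        time.sleep(poll)


def to_data_url(png_bytes: bytes) -> str:
    """Encode PNG bytes as a base64 data URL for an image-accepting API."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"
```

Passing the clipboard reader in as a callable keeps the polling loop independent of any particular clipboard library, so it can be exercised with a fake reader.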

speff 12/20/2025||

This might be a good place to check the options available for OCR in-place translations. I took a look at OCR3, but it doesn't seem to support my use-case. It looks more tailored towards data extraction for further processing.

I've got some foreign artbooks that I would like to get translated. The translations would need to be in place since the placement of the text relative to the pictures around it is fairly important. I took a look at some paid options online, but they seemed to choke - mostly because of the non-standard text placements and all.

The best solution I could come up with is using Google Lens to overlay a translation while I go through the books, but holding a camera/tablet up to my screen isn't very comfortable. Chrome has Lens built in, but (IIRC) I still need to manually select sections for it to translate - it's not as easy to use as just holding my phone up.

Anyone know of any progress towards in-place OCR/translations?

claar 12/20/2025||

If you don't mind a paid solution, try DeepL. I also use Word's built-in document translation to good effect.

speff 12/20/2025||

I don't mind paying for one, though I do remember trying DeepL without much success. Can't remember the problem offhand, but one of the services I tried just gave me a generic error when I uploaded the PDF. My view at the time was that it had a conniption and just gave up.

Wonder if Word uses the same system Edge has. I remember Edge was also good, but like Chrome's Lens, I'd need to highlight sections for it to get translated. Edge also OCR'd everything very well - just didn't do the translation part automatically.

haraldooo 12/20/2025||

I’m fairly confident this is solvable quite well with “just two API calls”. Are examples of those books available online?

speff 12/20/2025||

Sure - there are some good examples in the product pictures for this book: https://www.amazon.com/hands-Takami-Kagami-teaches-power/dp/...

ethin 12/20/2025||

So I tried this on the NVMe specification (I have a huge library of PDFs) and it worked decently, though the output had some oddities:

- Parts of the table of contents were headings

- I didn't like how tables were links to separate markdown files.

In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation, and I wasn't even sure how to go about that given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check that it's correct compared to what pdftotext or Acrobat could create, so there's that too.
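
If the table links follow a regular shape, the recombining step may not need a full Markdown parser. A rough sketch, assuming each table appears in the main file as an ordinary link such as `[tbl-0.md](tbl-0.md)` (that link shape and the `inline_linked_tables` name are assumptions, not something confirmed from the actual output):

```python
import re
from typing import Mapping

# Assumed link shape: [label](name.md); image links (![...](...)) are skipped.
_MD_LINK = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+\.md)\)")


def inline_linked_tables(main_md: str, parts: Mapping[str, str]) -> str:
    """Replace each link to a known .md file with that file's content."""
    def substitute(match: re.Match) -> str:
        target = match.group(1)
        if target not in parts:
            return match.group(0)  # leave links to unknown files untouched
        # Blank lines around the table keep it a separate Markdown block.
        return "\n\n" + parts[target].strip() + "\n\n"

    return _MD_LINK.sub(substitute, main_md)
```

Here `parts` maps each linked filename to its text, for example built by reading every `.md` file in the output directory. Links whose targets are not in `parts` are left as they are, so anything unexpected stays visible rather than being silently dropped.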

film42 12/19/2025||

Is OpenRouter still sending all OCR jobs to Mistral? I wonder if they're trying to keep that spot. Seems like Mistral and Google are the best at OCR right now, with Google leading Mistral by a fair bit.

numlocked 12/19/2025|

(I work at OpenRouter) If you send a PDF to our API we will:

1. Use native PDF parsing if the model supports it

2. Use this Mistral OCR model (we updated to this version yesterday)

3. UNLESS you override the "engine" param to use an alternate. We support a JS-based (non-LLM) parser as well [0]

So yes, in practice a lot of OCR jobs go to Mistral, but not all of them.

Would love to hear requests for other parsers if folks have them!

[0] https://openrouter.ai/docs/guides/overview/multimodal/pdfs#p...
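
The order described above can be written out as a small decision function. This is only an illustration of the comment, not OpenRouter's code: `choose_pdf_engine` and the returned labels are hypothetical, and it assumes an explicit "engine" override takes precedence over native parsing too.

```python
from typing import Optional


def choose_pdf_engine(model_supports_native_pdf: bool,
                      engine_override: Optional[str] = None) -> str:
    """Return which parser handles a PDF, following the order in the comment."""
    if engine_override:              # 3. an explicit "engine" param wins (assumed)
        return engine_override
    if model_supports_native_pdf:    # 1. native PDF parsing when the model has it
        return "native"              # placeholder label
    return "mistral-ocr"             # 2. otherwise fall back to Mistral OCR
```

For example, `choose_pdf_engine(False)` falls through to the Mistral OCR label, which matches the observation that many, but not all, OCR jobs end up there.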

vikp 12/20/2025|||

Hey, I'm the founder of Datalab (we released Chandra OCR). I see someone requested it below - happy to help you all get set up. I'm vik@datalab.to

siquick 12/20/2025||||

That link gives an error, and so does https://openrouter.ai/docs/guides/overview/multimodal/pdfs

dimitri-vs 12/19/2025|||

Chandra

singularity2001 12/19/2025||

No one mentioning what is possibly the most beautiful CSS effect on the Internet??

jbk 12/20/2025|

How so?

i_am_not_groot 12/20/2025||

Finally a way to read doctor's prescriptions

7thpower 12/19/2025||

My main beef with Mistral is that they don’t bother to respond to customer inquiries for products they hide behind “reach out for pricing” terms, so even if they were better than SoTA it wouldn’t really matter.

650REDHAIR 12/20/2025|

I absolutely loathe dealing with sales people.

I will pay a premium for an inferior product or service if it means I don't have to deal with sales people.

7thpower 12/20/2025||

Agreed. In this case the offering just fit neatly into a non-core stack we had designed and displaced a bunch of stuff we didn’t want to build ourselves.

I also hate dealing with sales people and am not going to reach out to them via another avenue as they will try and posture as if they’re doing us a huge favor (in contrast to me begging gdb for gpt4 api access).

constantinum 12/20/2025||

Where data accuracy is of paramount importance, I think a hybrid route of non-LLM OCR for data parsing and LLMs for structured data extraction is the safe path. I've seen better results with LLMWhisperer (OCR) [1] and the latest Gemini.

[1] - https://pg.llmwhisperer.unstract.com/

Western0 12/20/2025||

I need Solresol in any language. It was constructed for discussion and negotiation in war.

singularity2001 12/19/2025|

Not OS / free weights right?
