I've got some foreign artbooks that I would like to get translated. The translations would need to be done in place, since the position of the text relative to the pictures around it is fairly important. I took a look at some paid options online, but they seemed to choke, mostly on the non-standard text placement.
The best solution I could come up with is using Google Lens to overlay a translation while I go through the books, but holding a camera/tablet up to my screen isn't very comfortable. Chrome has Lens built in, but (IIRC) I still need to manually select sections for it to translate - it's not as easy to use as just holding my phone up.
Anyone know of any progress towards in-place OCR/translations?
Wonder if Word uses the same system Edge has. I remember Edge was also good, but like Chrome's Lens, I'd need to highlight sections for it to get translated. Edge also OCR'd everything very well - just didn't do the translation part automatically.
- Parts of the table of contents were headings
- I didn't like how tables were links to separate markdown files.
In theory, I could recombine everything into one document, but that would require complicated Markdown parsing and manipulation, and I wasn't even sure how to go about that given how free-form the resulting text was. I also haven't gone through the entire document (it's 784 pages) to check that it's correct compared to what pdftotext or Acrobat could create, so there's that too.
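For what it's worth, the recombining step might be less painful than full Markdown parsing if the table links are regular. A rough sketch, assuming each table reference sits alone on its own line as a link to a local .md file (something like `[tbl-0.md](tbl-0.md)`; I haven't confirmed that's the exact shape of the output, and `inline_linked_tables` is just a name I made up):

```python
import re
from pathlib import Path

# Assumption: a table reference is a Markdown link alone on its line,
# pointing at a local .md file, e.g. "[tbl-0.md](tbl-0.md)".
LINK_RE = re.compile(r"^\s*\[[^\]]*\]\(([^)\s]+\.md)\)\s*$")


def inline_linked_tables(main_text: str, base_dir: Path) -> str:
    """Replace each standalone link to a local .md file with that file's contents."""
    out = []
    for line in main_text.splitlines():
        match = LINK_RE.match(line)
        target = base_dir / match.group(1) if match else None
        if target is not None and target.is_file():
            # Splice the linked table in where the link used to be.
            out.append(target.read_text(encoding="utf-8").rstrip("\n"))
        else:
            # Not a table link (or the file is missing): keep the line untouched.
            out.append(line)
    return "\n".join(out)
```

It deliberately leaves anything it doesn't recognize alone, so inline links inside a sentence and links to missing files pass through unchanged. It does nothing about the table-of-contents lines that came out as headings.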
1. Use native PDF parsing if the model supports it
2. Use this Mistral OCR model (we updated to this version yesterday)
3. UNLESS you override the "engine" param to use an alternate. We support a JS-based (non-LLM) parser as well [0]
So yes, in practice a lot of OCR jobs go to Mistral, but not all of them.
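To make the precedence concrete, here's a toy sketch of the selection order described above. This is not our actual routing code; the function name and the engine labels are made up for illustration.

```python
from typing import Optional

# Hypothetical labels for illustration only, not necessarily the real "engine" values.
NATIVE = "native"
MISTRAL_OCR = "mistral-ocr"


def pick_pdf_engine(model_supports_native_pdf: bool,
                    engine_override: Optional[str] = None) -> str:
    """Return the parser a PDF would go to under the order described above."""
    if engine_override:
        # An explicit "engine" override always wins.
        return engine_override
    if model_supports_native_pdf:
        # Otherwise prefer the model's own PDF parsing when it has it.
        return NATIVE
    # Everything else falls through to OCR.
    return MISTRAL_OCR
```

So `pick_pdf_engine(False)` lands on the OCR path, while passing any override skips it regardless of what the model supports.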
Would love to hear requests for other parsers if folks have them!
[0] https://openrouter.ai/docs/guides/overview/multimodal/pdfs#p...
I will pay a premium for an inferior product or service if it means I don't have to deal with sales people.
I also hate dealing with sales people and am not going to reach out to them via another avenue, as they will try to posture as if they’re doing us a huge favor (in contrast to me begging gdb for GPT-4 API access).