Posted by tmaly 1 day ago
Ask HN: How are you doing RAG locally?
Are you using a vector database, some type of semantic search, a knowledge graph, a hypergraph?
To answer the question more directly, I've spent the last couple of years with a few different quant models mostly running on llama.cpp and ollama, depending. The results are way slower than the paid token api versions, but they are completely free of external influence and cost.
However the models I've tests generally turn out to be pretty dumb at the quant level I'm running to be relatively fast. And their code generation capabilities are just a mess not to be dealt with.
Works well, but I didn't tested on larger scale
The problems with datasheets is tables which span multiple pages, embedded images for diagrams and plots, they're generally PDFs, and only sometimes are they 2-column layout.
Converting from PDF to markdown while retaining tables correctly seems to work well for me with Mistral's latest OCR model, but this isn't an open model. Using docling with different models has produced much worse results.
I’ve optimized https://markdownconverter.pro/pdf-to-markdown to handle complex PDFs, including those tricky tables that span multiple pages and 2-column formats that usually trip up tools like Docling. It also extracts embedded diagrams/images and links them properly in the output.
Full disclosure: I'm the developer behind it. I’d love to see if it handles your specific datasheets better than the models you've tried. Feel free to give it a spin!