Posted by tmaly 1 day ago
Ask HN: How are you doing RAG locally?
Are you using a vector database, some type of semantic search, a knowledge graph, a hypergraph?
Not sure how useful it is for what you need specifically: https://blog.yakkomajuri.com/blog/local-rag
If the total size of your data isn't too large...?
Data being a plural gets me.
You might have small datums but a lot of kilobytes!
save_memory, recall_memory, search
Save memory vectorizes a session, summarizes it, and stores it in SQLite. Recall memory takes a vector or a previous tool run ID and loads the full text output. Search takes a vector array or string array and searches through the graph using fuzzy matching and vector dot products.
It’s not fancy, but it works really well. gpt-oss
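For anyone who wants to try the three-tool idea above, here is a minimal sketch of how it could look, not the commenter's actual code. Everything in it is an assumption: the table layout, the `open_store` helper, the toy `embed` function (a hashed bag of words standing in for a real embedding model), and the truncation that stands in for an LLM summary. It also skips the graph and fuzzy-matching parts and ranks by plain dot product only, and `recall_memory` here looks up by ID only.

```python
import hashlib
import json
import sqlite3

DIM = 256  # assumed vector size for the toy embedding


def embed(text):
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = sum(x * x for x in vec) ** 0.5 or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def open_store(path=":memory:"):
    """Open (or create) the SQLite file that holds the memories."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        "id INTEGER PRIMARY KEY, summary TEXT, full_text TEXT, vector TEXT)"
    )
    return db


def save_memory(db, session_text):
    """Vectorize a session, 'summarize' it, and store both; returns the row ID."""
    summary = session_text[:80]  # placeholder: a real setup would ask an LLM
    cur = db.execute(
        "INSERT INTO memory (summary, full_text, vector) VALUES (?, ?, ?)",
        (summary, session_text, json.dumps(embed(session_text))),
    )
    db.commit()
    return cur.lastrowid


def recall_memory(db, memory_id):
    """Load the full text for a previously saved ID, or None if missing."""
    row = db.execute(
        "SELECT full_text FROM memory WHERE id = ?", (memory_id,)
    ).fetchone()
    return row[0] if row else None


def search(db, query, top_k=3):
    """Rank stored memories by dot product against the query vector."""
    q = embed(query)
    rows = db.execute("SELECT id, summary, vector FROM memory").fetchall()
    scored = [(dot(q, json.loads(v)), i, s) for i, s, v in rows]
    scored.sort(reverse=True)
    return [(i, s, score) for score, i, s in scored[:top_k]]
```

Scanning every row is fine for a few thousand memories; past that you would probably want a real vector index instead of a brute-force loop.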
For local deployments, Qdrant supports storing embeddings in memory as well as in a local directory (similar to SQLite). For larger deployments, Qdrant can run as a standalone service or sidecar and be made available over the network.
The newer “agent” search approach can just query a file system or API. It’s slightly slower but easier to set up and maintain, since it needs no extra infrastructure.
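To make the file-system variant concrete, here is a rough sketch of the kind of tool an agent could call instead of a vector store. The function name `search_files`, its parameters, and the term-count scoring are all made up for illustration; a real agent might just shell out to grep or ripgrep.

```python
from pathlib import Path


def search_files(root, query, exts=(".md", ".txt"), max_hits=5):
    """Return the lines under root that mention the most query terms.

    Each hit is (score, path, line_number, line_text), where score is the
    number of distinct query terms found in that line (case-insensitive).
    """
    terms = [t for t in query.lower().split() if t]
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at regular files with one of the allowed extensions.
        if not path.is_file() or path.suffix not in exts:
            continue
        try:
            lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        except OSError:
            continue  # unreadable file: skip it rather than fail the search
        for lineno, line in enumerate(lines, start=1):
            lowered = line.lower()
            score = sum(term in lowered for term in terms)
            if score:
                hits.append((score, str(path), lineno, line.strip()))
    # Best score first, then stable order by path and line number.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits[:max_hits]
```

The model gets the matching lines back as tool output and can ask for more context on a specific file if it needs it. There is no index to build or keep in sync, which is where the "no extra infrastructure" part comes from; the cost is a full scan on every call.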