Posted by tmaly 1/14/2026
Ask HN: How are you doing RAG locally?
Are you using a vector database, some type of semantic search, a knowledge graph, a hypergraph?
If the total size of your data isn't too large...?
Data being a plural gets me.
You might have small datums but a lot of kilobytes!
On the retrieval side, I built a custom search/indexing layer (Node) specifically for service traceability and discovery. It uses a hybrid approach — embeddings + full-text search + IVF-HNSW — to index and cross-reference our APIs, services, proxies and orchestration repos. The RAG pipelines sit on top of this layer, which gives us reasonable recall and predictable latency.
Compliance and observability are still a problem. Every year new vendors show up promising audits, data lineage and observability, but none of them really handle the informational sprawl of ~600 distributed systems. The entropy keeps increasing.
Lately I’ve been experimenting with a more semantic/logical KAG approach on top of knowledge graphs to map business rules scattered across those systems. The goal is to answer higher-level questions about how things actually work — Palantir-like outcomes, but with explicit logic instead of magic.
Curious if others are moving beyond “pure RAG” toward graph-based or hybrid reasoning setups.
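For anyone wondering what the hybrid layer described above (embeddings + full-text search) might look like in miniature, here is a rough Python sketch. The comment does not say how the two result lists are merged, so reciprocal rank fusion (RRF) is my own assumption, not the poster's method. The corpus, the service names, and the 3-d "embeddings" are all made up for illustration, and the vector side is brute-force cosine similarity rather than an IVF-HNSW index.

```python
import math

# Hypothetical toy corpus: id -> (text, hand-made 3-d "embedding").
DOCS = {
    "billing-api": ("billing api issues invoices", [0.9, 0.1, 0.0]),
    "auth-proxy": ("auth proxy validates tokens", [0.1, 0.9, 0.0]),
    "invoice-worker": ("worker retries failed invoices", [0.7, 0.2, 0.1]),
}

def keyword_rank(query):
    """Rank docs by number of shared terms (a stand-in for real full-text search)."""
    terms = set(query.lower().split())
    scores = {d: len(terms & set(text.split())) for d, (text, _) in DOCS.items()}
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [d for d, s in ranked if s > 0]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_rank(query_vec):
    """Brute-force nearest neighbours; a real index (e.g. HNSW) would replace this."""
    return sorted(DOCS, key=lambda d: -cosine(query_vec, DOCS[d][1]))

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    fused = {}
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + pos)
    return sorted(fused, key=lambda d: -fused[d])

# The query vector is also invented; in practice it comes from an embedding model.
results = rrf([keyword_rank("failed invoices"), vector_rank([0.6, 0.2, 0.2])])
print(results)
```

RRF only looks at ranks, so the keyword and vector scores never need to be put on a common scale, which is the main reason I reached for it here.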
I'm positively surprised by how well it works, especially if you also connect it to an LLM.
https://aws.amazon.com/blogs/machine-learning/use-language-e...
The code for it is here: https://github.com/aws-samples/rss-aggregator-using-cohere-e...
The example link no longer works, as I no longer work at AWS.
For local deployments, Qdrant can store embeddings in memory or in a local directory (similar to SQLite). For larger deployments, it can run as a standalone service or sidecar and be reached over the network.