Posted by tmaly 1/14/2026

Ask HN: How are you doing RAG locally?

I am curious: how are people doing RAG locally, with minimal dependencies, for internal code or complex documents?

Are you using a vector database, some type of semantic search, a knowledge graph, a hypergraph?

377 points | 147 comments | page 2
scosman 1/15/2026|
Kiln wraps up all the parts in one app. Just drag and drop in files. You can easily compare different configs on your dataset: extraction methods, embedding model, search method (BM25, hybrid, vector), etc.

It uses LanceDB and has dozens of different extraction/embedding models to choose from. It even has evals for checking retrieval accuracy, including automatically generating the eval dataset.

You can use its UI, or call the RAG via MCP.

https://github.com/kiln-ai/kiln

https://docs.kiln.tech/docs/documents-and-search-rag

juanre 1/15/2026||
I built https://github.com/juanre/llmemory and I use it both locally and as part of company apps. Quite happy with the performance.

It uses PostgreSQL with pgvector, hybrid BM25, multi-query expansion, and reranking.

(It's the first time I've shared it publicly, so I'm sure there'll be quirks.)
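
The multi-query expansion part is conceptually simple; here's a minimal sketch, where rewrite_query and search are hypothetical placeholders standing in for an LLM paraphrase call and the hybrid BM25 + pgvector query:

    def rewrite_query(question):
        # Hypothetical placeholder: in practice an LLM generates paraphrases.
        return [question, f"how do I {question}", f"{question} example"]

    def search(query, k=10):
        # Hypothetical placeholder for the hybrid BM25 + pgvector retrieval.
        return []

    def expanded_search(question):
        seen, merged = set(), []
        for variant in rewrite_query(question):
            for doc_id in search(variant):
                if doc_id not in seen:        # dedupe hits across variants
                    seen.add(doc_id)
                    merged.append(doc_id)
        return merged                          # candidates then go to the reranker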

threecheese 1/16/2026||
For my personal PKM slash “learn this crap”, I have a fully local hybrid search on my MacBook using MLX and SQLite.

I store file content blobs in SQLite, and use FTS5 (bm25) to maintain a fulltext index plus sqlite-vec for storing embeddings. Search uses both of these, and then reciprocal rank fusion gets the best results and pipes those to a local transformers model to judge. It’s all Python with mlx-lm and mlx-embeddings libraries, the models are grabbed from huggingface. It’s not the fastest, but it’s local and easy to understand (and for Claude to write, mostly).
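
The fusion step is only a few lines. Here's a minimal sketch of reciprocal rank fusion over two ranked id lists; the ids and the k constant are illustrative, and in practice the lists come from the FTS5 (bm25) query and the sqlite-vec nearest-neighbour query:

    from collections import defaultdict

    def rrf(rankings, k=60):
        """Fuse ranked lists of doc ids; higher fused score = better."""
        scores = defaultdict(float)
        for ranking in rankings:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    # Illustrative ids; best-first results from the keyword and vector queries.
    keyword_hits = ["note_12", "note_7", "note_3"]
    vector_hits = ["note_7", "note_42", "note_12"]

    for doc_id, score in rrf([keyword_hits, vector_hits])[:3]:
        print(doc_id, round(score, 4))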

marwamc 1/15/2026||
BM25 has been sufficient for my needs. I typically need to refer to codebases of existing tools as referential sources (istio, envoy, oauth2-proxy, tantivy index etc) so I just clone those repos, index them and search away. Built a cli and mcp tool for this workflow.

https://github.com/rhobimd-oss/shebe

One area where BM25 particularly shines is the refactoring workflow: let's say you want to upgrade your istio installation from 1.28 to 1.29, and maybe in 1.29 the authorizationpolicy crd has a breaking change in one of its properties. BM25 allows you to efficiently enumerate all code locations in your codebase that need to change, and then you can set the CLI coders off using this list. Grep and LSP can still perform this enumeration but they have shortcomings. Wrote about it here https://github.com/rhobimd-oss/shebe/blob/main/WHY_SHEBE.md#...
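
If you just want the rough shape of the clone-index-search workflow without shebe, here's a minimal sketch using the rank_bm25 package; the repo path, file globs, and query are placeholders, and shebe's own indexing and tokenization are more involved:

    from pathlib import Path
    from rank_bm25 import BM25Okapi

    repo = Path("my-istio-config")                     # a cloned repo to index
    paths = [p for p in repo.rglob("*") if p.suffix in {".go", ".yaml", ".yml"}]
    corpus = [p.read_text(errors="ignore").lower().split() for p in paths]

    bm25 = BM25Okapi(corpus)
    query = "authorizationpolicy provider".split()
    scores = bm25.get_scores(query)

    # Highest-scoring files are candidate locations for the upgrade refactor.
    for i in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:10]:
        print(f"{scores[i]:8.2f}  {paths[i]}")
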

tubs 1/15/2026|
The download links for binaries 404 for me.
marwamc 1/15/2026||
Will fix the links. Meanwhile here is the releases page. I develop on gitlab and mirror to github. Need to make that clear as well.

https://gitlab.com/rhobimd-oss/shebe/-/releases

tubs 1/15/2026||
Ah, I tried GitLab too and the tarballs 404 for me there as well; sorry, I should have been more specific in the original post!

fwiw this does look interesting.

marwamc 1/19/2026|||
Got around to sorting the 404. Releases now work.

https://gitlab.com/rhobimd-oss/shebe/-/releases/v0.5.6-rc2

marwamc 1/15/2026|||
I see what's happening. I never validated those build artifacts... Thanks for the catch. Will rebuild and notify you here.
lmeyerov 1/15/2026||
Claude Code / Codex, which internally use ripgrep (I'm unsure if it's running in parallel mode), plus project-specific static analyzers.

Studies generally show that agentic retrieval with text search already gets you pretty far. Adding vector retrieval and graph RAG, i.e. the typical parallel multi-retrieval followed by reranking, gives a bit of a speedup and a quality lift. That lines up with my local experience: the gain is enough that I want it in $$$$ consumer/prosumer tools, but not easy enough to DIY that I want to invest in it locally. For those of us who struggle with tools like Spotlight indexing when it shouldn't, that kind of background machinery turns me off on the cost/benefit side.

For code, I experiment with unsound tools (semgrep, ...) vs sound flow analyzers, carefully set up for the project. Basically, AI coders love to use grep/sed for global-replace refactors and other global needs, but keep getting tripped up on sound flow analysis. Similar to lint and type checking, that needs to be set up per project and taught as a skill. I'm not happy with any of my experiments here yet, however :(

autogn0me 1/15/2026||
https://github.com/ggozad/haiku.rag/ - the embedded LanceDB is convenient and it has benchmarks; it uses Docling for extraction. I run qwen3-embedding:4b (2560 dimensions) with gpt-oss:20b.
miohtama 1/15/2026|
+1 for Haiku! It's very simple to get up and running.
raghavankl 1/15/2026||
I have some Python tooling to do indexing and relevance ranking offline using Ollama.

https://github.com/raghavan/pdfgptindexer-offline

spqw 1/15/2026||
I am surprised to see so few setups leveraging LSP (Language Server Protocol) support. It was added to Claude Code last month. Most setups rely on naive grep.
d4rkp4ttern 1/15/2026||
LSP is currently broken in CC:

https://github.com/anthropics/claude-code/issues/15168

woggy 1/15/2026|||
I've written a few terminal tools on top of Roslyn to assist Claude in code analysis for C# code. Obviously the tools are also written with the help of Claude. Worked quite well.
aqula 1/15/2026||
LSP is not great for non-editor use cases. Everything is cursor position oriented.
HarHarVeryFunny 1/15/2026|||
Yes, something like TreeSitter would seem to be of more value - able to look up symbols by name, and find the spans of source code where they are defined and used.
alchemist1e9 1/15/2026||
https://github.com/ast-grep/ast-grep
HarHarVeryFunny 1/16/2026||
I don't see ast-grep as being very useful to an agent.

What a coding agent needs is to be able to locate portions of source code relevant to what it has been tasked with, and preferably in more context-efficient fashion than just grepping and loading entire source files into context. One way to do this is something like Cursor's vector index of code chunks, and another would be something like TreeSitter (or other identifier-based tools) that knows where identifiers (variables, functions) are defined and used.

Language servers (LSP) are not useful for this task since they can't tell the agent "where is function foo() defined?" (but TreeSitter can): as someone else noted, language servers are based on location (line number), not content (symbols). Language servers are designed to help editors.

It's possible that ast-grep might be of some use to a coding agent, but looking for syntax/AST patterns rather than just identifier definitions and usages seems a much more niche facility.
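
For concreteness, here's a minimal sketch of that identifier-oriented lookup using Python's stdlib ast module for Python sources only; it's a stand-in for TreeSitter, which generalizes the same idea to other languages, and the "src" path and "foo" lookup are just examples:

    import ast
    from pathlib import Path

    def index_definitions(root):
        """Map function/class names to (file, line) pairs where they are defined."""
        index = {}
        for path in Path(root).rglob("*.py"):
            try:
                tree = ast.parse(path.read_text(errors="ignore"))
            except SyntaxError:
                continue
            for node in ast.walk(tree):
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                    index.setdefault(node.name, []).append((str(path), node.lineno))
        return index

    defs = index_definitions("src")                  # hypothetical source tree
    print(defs.get("foo", "foo is not defined under src/"))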

WilcoKruijer 1/15/2026|||
There are actions that don't require cursor position, like document/workspace symbols, that could be useful.
yokuze 1/15/2026||
I made, and use this: https://github.com/libragen/libragen

It’s a CLI tool and MCP server for creating discrete, versioned “libraries” of RAG-able content.

Under the hood, it uses an embedding model locally. It chunks your content and stores embeddings in SQLite. The search functionality uses vector + keyword search + a re-ranking model.

You can also point it at any GitHub repo and it will create a RAG DB out of it.

You can also use the MCP server to create and query the libraries.

Site: https://www.libragen.dev/
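
The chunking step is just fixed-size windows with overlap. Here's a minimal sketch; the sizes and the README.md input are illustrative, and libragen's actual chunker and schema may differ:

    def chunk(text, size=800, overlap=200):
        """Split text into fixed-size character windows that overlap."""
        if overlap >= size:
            raise ValueError("overlap must be smaller than size")
        chunks, start = [], 0
        while start < len(text):
            chunks.append(text[start:start + size])
            start += size - overlap
        return chunks

    doc = open("README.md").read()   # any file you want to make RAG-able
    for i, piece in enumerate(chunk(doc)):
        print(i, len(piece))         # each piece would then be embedded and stored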

bradfa 1/15/2026|
Your README references a file named LICENSE which doesn't seem to exist on the main branch.
folli 1/15/2026|
I was just working on a RAG implementation for >500k news articles, completely local, using postgres as a vector database: https://github.com/r-follador/TeletextSignals

I'm positively surprised by how well it works, especially if you also connect it to an LLM.
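
The Postgres side is small. Here's a minimal pgvector sketch of the vector-database part; the DSN, table, and 3-dimensional vectors are placeholders (real article embeddings would be several hundred dimensions):

    import psycopg

    with psycopg.connect("dbname=news") as conn, conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
        cur.execute("""
            CREATE TABLE IF NOT EXISTS articles (
                id bigserial PRIMARY KEY,
                title text,
                embedding vector(3)
            )
        """)
        cur.execute(
            "INSERT INTO articles (title, embedding) VALUES (%s, %s::vector)",
            ("example headline", "[0.1, 0.2, 0.3]"),
        )
        # Cosine-distance nearest neighbours to a query embedding.
        cur.execute(
            "SELECT title FROM articles ORDER BY embedding <=> %s::vector LIMIT 5",
            ("[0.1, 0.2, 0.25]",),
        )
        print(cur.fetchall())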

More comments...