Posted by tmaly 1/14/2026

Ask HN: How are you doing RAG locally?

I'm curious: how are people doing RAG locally, with minimal dependencies, for internal code or complex documents?

Are you using a vector database, some type of semantic search, a knowledge graph, a hypergraph?

413 points | 157 comments
amscotti 1/15/2026|
More of a proof of concept to test out ideas, but here's my approach for local RAG, https://github.com/amscotti/local-LLM-with-RAG

Using Ollama for the embeddings with “nomic-embed-text”, and LanceDB for the vector database. Recently updated it to use “agentic” RAG, though that's probably not needed for such a small project.
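
For anyone who wants the shape of that pipeline without reading the repo, here's a minimal sketch (not the project's actual code; just the ollama and lancedb Python packages, assuming an Ollama server is running with nomic-embed-text pulled):

    import ollama
    import lancedb

    db = lancedb.connect("./rag-db")

    # Embed each chunk with nomic-embed-text and store text + vector together
    chunks = ["LanceDB is an embedded vector database.", "Ollama serves local models."]
    rows = []
    for text in chunks:
        vec = ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]
        rows.append({"text": text, "vector": vec})
    table = db.create_table("docs", data=rows)

    # Embed the query the same way and do a nearest-neighbor search
    q = ollama.embeddings(model="nomic-embed-text", prompt="What is LanceDB?")["embedding"]
    print(table.search(q).limit(3).to_list())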

someguyiguess 1/15/2026||
Woah. I am doing something very similar also using lancedb https://github.com/nicholaspsmith/lance-context

Mine is much more basic than yours and I just started it a couple of weeks ago.

threecheese 1/16/2026||
There are so many of us doing the same, just had a similar conversation at $work. It’s pretty exciting. I feel like I’m having to shove another 20 years of development experience into my brain with all these new concepts and abstractions, but the dots have been connecting!
vaylian 1/15/2026||
Thank you for being the kind of person who explains what the abbreviation RAG stands for. I have been very confused reading this thread.
someguyiguess 1/15/2026||
I feel this pain! It feels like in the world of LLMs there is a new acronym to learn every day!

For the curious: RAG = Retrieval-Augmented Generation. From Wikipedia: RAG enables large language models (LLMs) to retrieve and incorporate new information from external data sources.
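
In code, the whole idea fits in a few lines. A toy sketch, where retrieve() and generate() are placeholders for any search backend and any local LLM:

    def answer(question, retrieve, generate, top_k=3):
        # 1. Retrieval: pull relevant passages from the external data source
        passages = retrieve(question, top_k)
        # 2. Augmentation: stuff them into the prompt as context
        context = "\n\n".join(passages)
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        # 3. Generation: let the LLM answer grounded in that context
        return generate(prompt)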

autogn0me 1/15/2026||
https://github.com/ggozad/haiku.rag/ - the embedded LanceDB is convenient and it has benchmarks; uses Docling. I run qwen3-embedding:4b (2560-dim) with gpt-oss:20b.
miohtama 1/15/2026|
+1 for Haiku! It's very simple to get up and running.
juanre 1/15/2026||
I built https://github.com/juanre/llmemory and I use it both locally and as part of company apps. Quite happy with the performance.

It uses PostgreSQL with pgvector, hybrid BM25, multi-query expansion, and reranking.

(It's the first time I've shared it publicly, so I'm sure there'll be quirks.)
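
For a flavor of what a hybrid query like that can look like (a hypothetical schema, not llmemory's actual internals; Postgres full-text rank standing in for BM25):

    import psycopg

    SQL = """
    SELECT id, text,
           0.5 * (1 - (embedding <=> %(vec)s::vector))
         + 0.5 * ts_rank_cd(tsv, plainto_tsquery('english', %(q)s)) AS score
    FROM chunks
    ORDER BY score DESC
    LIMIT 10;
    """

    query_vec = str([0.1] * 768)  # replace with a real query embedding
    with psycopg.connect("dbname=rag") as conn:
        rows = conn.execute(SQL, {"vec": query_vec, "q": "reranking"}).fetchall()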

marwamc 1/15/2026||
BM25 has been sufficient for my needs. I typically need to refer to codebases of existing tools as reference sources (istio, envoy, oauth2-proxy, tantivy index, etc.), so I just clone those repos, index them, and search away. Built a CLI and MCP tool for this workflow.

https://github.com/rhobimd-oss/shebe

One area where BM25 particularly shines is the refactoring workflow: say you want to upgrade your istio installation from 1.28 to 1.29, and in 1.29 the authorizationpolicy CRD has a breaking change in one of its properties. BM25 lets you efficiently enumerate all code locations in your codebase that need to change, and then you can set the CLI coders off using this list. Grep and LSP can still perform this enumeration, but they have shortcomings. Wrote about it here https://github.com/rhobimd-oss/shebe/blob/main/WHY_SHEBE.md#...
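
As a generic illustration of that enumeration step (not shebe's implementation; this uses the rank_bm25 package over a cloned repo):

    from pathlib import Path
    from rank_bm25 import BM25Okapi

    # Index every Go file in the cloned repo
    paths = list(Path("istio").rglob("*.go"))
    docs = [p.read_text(errors="ignore").lower().split() for p in paths]
    bm25 = BM25Okapi(docs)

    # Rank files by relevance to the breaking change you're chasing
    query = "authorizationpolicy crd".split()
    ranked = sorted(zip(bm25.get_scores(query), paths), key=lambda t: t[0], reverse=True)
    for score, path in ranked[:10]:
        print(f"{score:7.2f}  {path}")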

tubs 1/15/2026|
The download links for binaries 404 for me.
marwamc 1/15/2026||
Will fix the links. Meanwhile, here is the releases page. I develop on GitLab and mirror to GitHub; need to make that clear as well.

https://gitlab.com/rhobimd-oss/shebe/-/releases

tubs 1/15/2026||
Ah, I tried GitLab too and the tarballs 404 for me there as well; sorry, I should have been more specific in the original post!

fwiw this does look interesting.

marwamc 1/19/2026|||
Got around to sorting the 404. Releases now work.

https://gitlab.com/rhobimd-oss/shebe/-/releases/v0.5.6-rc2

marwamc 1/15/2026|||
I see what's happening. I never validated those build artifacts... Thanks for the catch. Will rebuild and notify you here.
lmeyerov 1/15/2026||
Claude Code / Codex, which internally use ripgrep (I'm unsure if it's running in parallel mode), plus project-specific static analyzers.

Studies generally show that agentic retrieval with text search is pretty good on its own. Adding vector retrieval and graph RAG, i.e. the typical parallel multi-retrieval followed by reranking, gives a bit of a speed and quality lift. That lines up with my local-workflow experience: the gain is big enough that I want it in $$$$ consumer/prosumer tools, but not easy enough to DIY that I want to invest in it locally. For anyone who has fought tools like Spotlight indexing when it shouldn't, that kind of maintenance burden turns me off on the cost/benefit side.
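
The "parallel multi-retrieval then rerank" pattern is simple enough to sketch; here the retrievers (text search, vector, graph) are just placeholder callables:

    from concurrent.futures import ThreadPoolExecutor

    def multi_retrieve(query, retrievers, top_k=20):
        # Fan out to every retriever in parallel
        with ThreadPoolExecutor() as pool:
            batches = list(pool.map(lambda r: r(query, top_k), retrievers))
        # Dedupe, keeping first-seen order; hand the merged pool to a reranker
        seen, merged = set(), []
        for hit in (h for batch in batches for h in batch):
            if hit not in seen:
                seen.add(hit)
                merged.append(hit)
        return merged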

For code, I experiment with unsound tools (semgrep, ...) vs sound flow analyzers, carefully set up for the project. Basically, AI coders love to use grep/sed for global-replace refactors and other global needs, but keep getting tripped up on sound flow analysis. As with lint and type checking, that needs to be set up per project and taught as a skill. I'm not happy with any of my experiments here yet, however :(

mmargenot 1/15/2026|
Cursor uses a vector index, some details here: https://cursor.com/docs/context/semantic-search
lmeyerov 1/15/2026||
Thanks!

Their discussion is super relevant to exactly what I wrote --

* They note speed benefits.

* The quality benefit they note is synonym search... which agentic text search can also do: agents can guess synonyms in the first shot for you, e.g. `navigation` -> `nav|header|footer`, and they'll be iterating anyway.

To truly do better, and not make the infra experience stink, is real work. We do it in our product (louie.ai) and our service engagements, but there are real costs/benefits.

spqw 1/15/2026||
I am surprised to see very few setups leveraging LSP (Language Server Protocol) support. It was added to Claude Code last month. Most setups rely on naive grep.
d4rkp4ttern 1/15/2026||
LSP is currently broken in CC:

https://github.com/anthropics/claude-code/issues/15168

woggy 1/15/2026|||
I've written a few terminal tools on top of Roslyn to assist Claude in code analysis for C# code. Obviously the tools are also written with the help of Claude. Worked quite well.
aqula 1/15/2026||
LSP is not great for non-editor use cases. Everything is cursor-position oriented.
HarHarVeryFunny 1/15/2026|||
Yes, something like TreeSitter would seem to be of more value - able to look up symbols by name, and find the spans of source code where they are defined and used.
alchemist1e9 1/15/2026||
https://github.com/ast-grep/ast-grep
HarHarVeryFunny 1/16/2026||
I don't see ast-grep as being very useful to an agent.

What a coding agent needs is to be able to locate the portions of source code relevant to its task, preferably in a more context-efficient fashion than just grepping and loading entire source files into context. One way to do this is something like Cursor's vector index of code chunks; another would be something like TreeSitter (or other identifier-based tools) that knows where identifiers (variables, functions) are defined and used.

Language servers (LSP) are not that useful for this task: as someone else noted, they are based on location (line number), not content (symbols), so they can't tell the agent where function foo() is defined (but TreeSitter can). Language servers are designed to help editors.

It's possible that ast-grep might be of some use to a coding agent, but looking for syntax/AST patterns rather than just identifier definitions and usages seems a much more niche facility.
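
A minimal sketch of that kind of identifier lookup, assuming the py-tree-sitter and tree-sitter-python packages (API details vary a bit across versions):

    import tree_sitter_python as tspython
    from tree_sitter import Language, Parser

    parser = Parser(Language(tspython.language()))
    source = b"def foo():\n    return 1\n\ndef bar():\n    return foo()\n"
    tree = parser.parse(source)

    # Walk the syntax tree and report where the function named "foo" is defined
    def walk(node):
        if node.type == "function_definition":
            name = node.child_by_field_name("name")
            if name is not None and name.text == b"foo":
                print("foo defined at line", name.start_point[0] + 1)
        for child in node.children:
            walk(child)

    walk(tree.root_node)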

WilcoKruijer 1/15/2026|||
There are actions that don't require cursor position, like document/workspace symbols, that could be useful.
yokuze 1/15/2026||
I made, and use this: https://github.com/libragen/libragen

It’s a CLI tool and MCP server for creating discrete, versioned “libraries” of RAG-able content.

Under the hood, it uses an embedding model locally. It chunks your content and stores embeddings in SQLite. The search functionality uses vector + keyword search + a re-ranking model.
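
The re-ranking step in that kind of stack typically looks something like this (not libragen's actual code; a local cross-encoder via sentence-transformers):

    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

    query = "how do I rotate API keys?"
    candidates = [
        "Keys can be rotated from the admin CLI.",
        "API keys expire after 90 days by default.",
    ]

    # Score each (query, passage) pair jointly, then sort best-first
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda t: t[0], reverse=True)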

You can also point it at any GitHub repo and it will create a RAG DB out of it.

You can also use the MCP server to create and query the libraries.

Site: https://www.libragen.dev/

bradfa 1/15/2026|
Your README references a file named LICENSE which doesn't seem to exist on the main branch.
yokuze 1/17/2026||
Fixed. Thank you!
raghavankl 1/15/2026||
I have Python tooling to do indexing and relevance scoring offline using Ollama.

https://github.com/raghavan/pdfgptindexer-offline

threecheese 1/16/2026||
For my personal PKM slash “learn this crap”, I have a fully local hybrid search on my MacBook using MLX and SQLite.

I store file content blobs in SQLite, and use FTS5 (bm25) to maintain a fulltext index plus sqlite-vec for storing embeddings. Search uses both of these, and then reciprocal rank fusion gets the best results and pipes those to a local transformers model to judge. It’s all Python with mlx-lm and mlx-embeddings libraries, the models are grabbed from huggingface. It’s not the fastest, but it’s local and easy to understand (and for Claude to write, mostly).
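
Reciprocal rank fusion is only a few lines; a minimal version over the two id lists (FTS5/bm25 hits and sqlite-vec hits):

    def rrf(result_lists, k=60):
        # Each list holds doc ids, best first; score = sum of 1/(k + rank)
        scores = {}
        for results in result_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    fts_hits = ["a", "b", "c"]   # from the FTS5 bm25 query
    vec_hits = ["b", "d", "a"]   # from the sqlite-vec query
    print(rrf([fts_hits, vec_hits]))  # -> ['b', 'a', 'd', 'c']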

gaganyatri 1/15/2026|
Built discovery using:

- Qwen-3-VL-8B for document OCR + prompts + tool calls
- ChromaDB for vector storage
- BM25 + embedding model for hybrid RAG
- Backend: FastAPI + Python
- Frontend: React + TypeScript
- vLLM + Docker for model deployment on an L40 GPU

Demo: https://app.dwani.ai

GitHub: https://github.com/dwani-ai/discovery

Now working on adding agentic features via continuous analysis of documents with generated prompts.
