Posted by tmaly 1/14/2026

Ask HN: How are you doing RAG locally?

I am curious: how are people doing RAG locally, with minimal dependencies, for internal code or complex documents?

Are you using a vector database, some type of semantic search, a knowledge graph, a hypergraph?

413 points | 157 comments
prakashn27 1/15/2026|
I feel a local RAG system slows down my computer (I've got an M1 Pro, 32 GB).

So I use a hosted one to prevent this. My business uses a vector DB, so I created a new DB to vectorize and host my knowledge base.

1. All my knowledge base is markdown files, so I split them by header tags.

2. Each split is hashed and the hash value is stored in SQLite.

3. Each split is then vectorized and pushed to the cloud DB.

4. Whenever I make changes, I run a script which splits and checks the hashes; if a hash has changed I upsert the document, and if not I don't do anything.

This helps me keep the store up to date.

For search, I have a CLI query which searches and fetches from the vector store.
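Roughly, the sync script looks like this (a sketch; embed() and upsert_to_cloud_db() are just stand-ins for whatever embedding model and cloud vector DB client you use):

  import hashlib, re, sqlite3

  db = sqlite3.connect("chunks.db")
  db.execute("CREATE TABLE IF NOT EXISTS chunks (chunk_id TEXT PRIMARY KEY, hash TEXT)")

  def split_by_headers(markdown: str) -> list[str]:
      # Split on markdown header lines, keeping each header with its section body.
      parts = re.split(r"(?m)^(?=#{1,6} )", markdown)
      return [p.strip() for p in parts if p.strip()]

  def sync(doc_path: str, markdown: str) -> None:
      for i, chunk in enumerate(split_by_headers(markdown)):
          chunk_id = f"{doc_path}#{i}"
          digest = hashlib.sha256(chunk.encode()).hexdigest()
          row = db.execute("SELECT hash FROM chunks WHERE chunk_id = ?", (chunk_id,)).fetchone()
          if row and row[0] == digest:
              continue  # unchanged chunk: skip embedding and upsert
          vector = embed(chunk)                          # placeholder: your embedding model
          upsert_to_cloud_db(chunk_id, vector, chunk)    # placeholder: your hosted vector DB client
          db.execute("INSERT OR REPLACE INTO chunks (chunk_id, hash) VALUES (?, ?)", (chunk_id, digest))
      db.commit()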

IXCoach 1/19/2026||
I have production agents which run vector search via FAISS locally (in their own env, not third-party environments), and for which I am creating embeddings for specific domains:

1 - agent memory (it's an AI coach, so it's the unique training methods that allow for instant adoption of new skills and distilling best-fit skills for context)

2 - user memory (the AI coach's memory of a user)

3 - session memory (for long conversations, instead of compaction or truncation)

Then separately I have coding agents which I give semantic search, using the same FAISS system:

- On command they create new memories from lessons (consumes tokens*)

- They vector search FAISS when needing more context (2x better agent alignment/outcomes this way)

And finally, I forked OpenAI's Codex terminal agent code to add inbuilt vector search and injection.

So I say "Find any uncovered TDD opportunity matching intent to actuality for auth on these 3 repos, write TDD coverage, and bring failures to my attention"

- They set my message to {$query}

- vector search on {$query}

- embed the results in their context window programmatically, so no token consumption (what a freaking dream)
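Roughly, the injection step looks like this (a sketch; all-MiniLM and an in-memory FAISS index stand in for the fork's actual embedding setup):

  import faiss
  import numpy as np
  from sentence_transformers import SentenceTransformer

  # Stand-in embedding model and index over memory/code chunks.
  model = SentenceTransformer("all-MiniLM-L6-v2")
  chunks = ["auth middleware lesson ...", "TDD pattern for token refresh ..."]  # made-up memories
  index = faiss.IndexFlatIP(384)
  index.add(np.asarray(model.encode(chunks, normalize_embeddings=True), dtype="float32"))

  def inject_context(user_message: str, top_k: int = 5) -> str:
      # The lookup happens outside the model, so the search itself costs no tokens;
      # only the injected snippets take up context.
      q = np.asarray(model.encode([user_message], normalize_embeddings=True), dtype="float32")
      _, ids = index.search(q, min(top_k, len(chunks)))
      retrieved = "\n".join(chunks[i] for i in ids[0] if i != -1)
      return f"<retrieved-context>\n{retrieved}\n</retrieved-context>\n\n{user_message}"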

That's open source if helpful. It's here:

https://github.com/Next-AI-Labs-Inc/codex/tree/nextailabs

I'm trying to determine where something like this fits in:

https://huggingface.co/MongoDB/mdbr-leaf-ir

My gaps right now are ...

I am not training the agents yet, as in fine-tuning the underlying models.

Would love the simplest approach to test this, because at least with the Codex clone I could easily swap in local models, but I somehow doubt they will be able to match the performance of the hosted models.

Especially because Claude Code just jumped ahead of Codex in quality in the last week or so, and it's closed source. I'm seeing clear swarm agentic coding internally, which is a dream for context-window efficiency (in Claude Code as of today).

jackfranklyn 1/15/2026||
For document processing in a side project, I've been using a local all-MiniLM model with FAISS. Works well enough for semantic matching against ~50k transaction descriptions.

The real challenge wasn't model quality - it was the chunking strategy. Financial data is weirdly structured and breaking it into sensible chunks that preserve context took more iteration than expected. Eventually settled on treating each complete record as a chunk rather than doing sliding windows over raw text. The "obvious" approaches from tutorials didn't work well at all for structured tabular-ish data.
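The whole pipeline is only a few lines (a sketch; the sample records are made up):

  import faiss
  import numpy as np
  from sentence_transformers import SentenceTransformer

  # One chunk per complete record, rather than sliding windows over raw text.
  records = [
      "2026-01-03 | ACME CORP PAYROLL | -4,200.00 | checking",
      "2026-01-05 | COFFEE ROASTERS #12 | -6.50 | credit",
  ]

  model = SentenceTransformer("all-MiniLM-L6-v2")
  embeddings = model.encode(records, normalize_embeddings=True)

  index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product == cosine on normalized vectors
  index.add(np.asarray(embeddings, dtype="float32"))

  query = model.encode(["recurring salary payments"], normalize_embeddings=True)
  scores, ids = index.search(np.asarray(query, dtype="float32"), 2)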

metawake 1/15/2026||
I am using a vector DB via a Docker image. And for debugging and benchmarking local RAG retrieval, I've been building a CLI tool that shows what's actually being retrieved:

  ragtune explain "your query" --collection prod
It shows scores, sources, and diagnostics, and helps catch when your chunking or embeddings are silently failing, or when you need numbers to base your judgements on.

Open source: https://github.com/metawake/ragtune

mmargenot 1/15/2026||
I made an Obsidian extension that does semantic and hybrid (RRF with FTS) search with local models. I have done some knowledge graph and ontology experimentation around this, but nothing that I'd like to include yet.

This is specifically a "remembrance agent": it surfaces atoms related to what you're writing rather than doing anything generative.

Extension: https://github.com/mmargenot/tezcat

Also available in community plugins.
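The hybrid fusion step is just reciprocal rank fusion over the FTS and vector result lists, something like this (a Python sketch rather than the plugin's actual code; k=60 is the usual constant):

  # Reciprocal Rank Fusion: merge two rankings of note ids, best-first.
  def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
      scores: dict[str, float] = {}
      for ranking in rankings:
          for rank, doc_id in enumerate(ranking, start=1):
              scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
      return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

  fts_hits = ["note_a", "note_b", "note_c"]   # from full-text search
  vec_hits = ["note_b", "note_d", "note_a"]   # from embedding similarity
  fused = rrf([fts_hits, vec_hits])           # note_b and note_a rise to the top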

init0 1/15/2026||
  from piragi import Ragi

  kb = Ragi(["./docs", "s3://bucket/data/*/*.pdf", "https://api.example.com/docs"])
  answer = kb.ask("How do I deploy this?")

That's it! With https://pypi.org/project/piragi/

yakkomajuri 1/15/2026||
I've written about this (and the post was even here on HN), mostly from the perspective of running RAG on your own infra as an organization, but I cover the general components and alternatives to cloud services.

Not sure how useful it is for what you need specifically: https://blog.yakkomajuri.com/blog/local-rag

pj4533 1/15/2026||
Well, this isn't code, but I've been working on a memory system for Claude Code. This portion provides semantic search over the session files in .claude/projects. It uses OpenAI for embeddings, so it's not completely local (that would be easy to modify), with storage in ChromaDB.

https://github.com/pj4533/seance
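The core pattern is roughly this (a sketch, not the project's actual code; text-embedding-3-small, the path, and the collection name are assumptions):

  import chromadb
  from openai import OpenAI

  openai_client = OpenAI()
  chroma = chromadb.PersistentClient(path=".seance_db")          # hypothetical path
  sessions = chroma.get_or_create_collection("claude_sessions")  # hypothetical name

  def embed(texts: list[str]) -> list[list[float]]:
      resp = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
      return [d.embedding for d in resp.data]

  # Index a few session chunks (ids and text are made up).
  chunks = ["Discussed refactoring the parser", "Debugged the flaky CI job"]
  sessions.add(ids=["s1-c1", "s1-c2"], documents=chunks, embeddings=embed(chunks))

  # Semantic search over past sessions.
  hits = sessions.query(query_embeddings=embed(["what did we decide about CI?"]), n_results=2)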

reactordev 1/15/2026||
I have three tools dedicated to this.

save_memory, recall_memory, search

Save memory vectorizes a session, summarizes it, and stores it in SQLite. Recall memory takes a vector or a previous tool run id and loads the full-text output. Search takes a vector array or a string array and searches through the graph using fuzzy matching and vector dot products.

It's not fancy, but it works really well (using gpt-oss).
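The storage and scoring side is small (a sketch; the table layout and the all-MiniLM embedder are stand-ins, not my actual schema):

  import json, sqlite3
  import numpy as np
  from sentence_transformers import SentenceTransformer

  # Any local embedding model works; all-MiniLM is just a stand-in here.
  model = SentenceTransformer("all-MiniLM-L6-v2")
  db = sqlite3.connect("memories.db")
  db.execute("CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, summary TEXT, embedding TEXT)")

  def save_memory(summary: str) -> None:
      vec = model.encode(summary, normalize_embeddings=True)
      db.execute("INSERT INTO memories (summary, embedding) VALUES (?, ?)",
                 (summary, json.dumps(vec.tolist())))
      db.commit()

  def search(query: str, top_k: int = 3) -> list[tuple[float, str]]:
      # Dot product of normalized vectors == cosine similarity.
      q = model.encode(query, normalize_embeddings=True)
      rows = db.execute("SELECT summary, embedding FROM memories").fetchall()
      scored = [(float(np.dot(q, np.array(json.loads(e)))), s) for s, e in rows]
      return sorted(scored, reverse=True)[:top_k]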

ktyptorio 1/18/2026|
I've just released a casual personal project for Ephemeral GraphRAG. It's still experimental and open source: https://github.com/gibram-io/gibram