Posted by denssumesh 1 day ago
I'm not familiar with how most out-of-the-box RAG systems categorize data, but with a database you can index content in literally any way you want. You could do it like a filesystem with a hierarchy, you could do it with tags, or any other design you can dream up.
The search can be keyword, like grep, or vector, like RAG, or use the ranking algorithms that traditional text search uses (tf-idf, BM25), or a combination of them. You don't have to use just the top X ranked documents. You could, just like grep, evaluate all results past whatever matching threshold you have.
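To make the "ranking plus threshold instead of top X" idea concrete, here is a minimal sketch in plain Python. It uses a common BM25 variant (the smoothed idf that stays positive) with whitespace tokenization. The function names, the k1 and b defaults, and the threshold value are assumptions for illustration, not anything a particular search engine ships.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also stem and strip punctuation.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document, in the same order as docs."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def search(query, docs, threshold=0.0):
    """Return every (score, doc) pair scoring above threshold, best first."""
    scored = zip(bm25_scores(query, docs), docs)
    return sorted((pair for pair in scored if pair[0] > threshold), reverse=True)
```

Calling search("keyword", docs) returns all matching documents rather than a fixed top X, so the caller (or an agent) can decide how far down the list to read.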
Search is an extremely rich field with a ton of very good established ways of doing things. Going back to grep and a file system is going back to ... I don't know, the 60s level of search tech?
Empirically, agents (especially the coding CLIs) seem to be doing so much better with files, even if the tooling around them is less than ideal. With other custom tools they instantly lose 50 IQ points, if they even bother using the tools in the first place.
We’re rediscovering forms of search we’ve known about for decades. And it turns out they’re more interpretable to agents.
https://softwaredoug.com/blog/2026/01/08/semantic-search-wit...
We started with LLMs when everyone in search was building question answering systems. Those architectures look like the vector DB + chunking we associate with RAG.
Agents' ability to call tools, using any retrieval backend, calls that into question.
We really shouldn’t start RAG with the assumption we need that. I’ll be speaking about the subject in a few weeks
https://maven.com/p/7105dc/rag-is-the-what-agentic-search-is...
Now, the pendulum on that general concept seems to be swinging the opposite direction where a lot of those people just figured out that you don't need embeddings. That's true, but I'd suggest that people don't overindex on thinking that means embeddings are not actually useful or valuable. Embeddings can be downright magical in what you can build with them, they're just one more tool at your disposal.
You can mix and match these things, too! Indexing your documents into semantically nested folders for agents to peruse? Try chunking and/or summarizing each one, and putting the vectors in sidecar files, or even YAML frontmatter. Disks are fast these days, so you can rip through a lot of files indexed like that before you come close to needing something more sophisticated.
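A rough sketch of the sidecar idea, assuming you already have a vector for each document from whatever embedding model you use. The ".vec.json" suffix and both function names are made up for this example. It only covers storing vectors next to the files and loading them back by walking the folder tree.

```python
import json
from pathlib import Path

SIDECAR_SUFFIX = ".vec.json"  # hypothetical naming convention, not a standard


def write_sidecar(doc_path, vector):
    """Store a document's vector in a JSON file right next to the document."""
    doc_path = Path(doc_path)
    sidecar = doc_path.with_name(doc_path.name + SIDECAR_SUFFIX)
    payload = {"source": doc_path.name, "vector": list(vector)}
    sidecar.write_text(json.dumps(payload))
    return sidecar


def load_sidecars(root):
    """Walk the folder tree and return {document path: vector}."""
    index = {}
    for sidecar in sorted(Path(root).rglob("*" + SIDECAR_SUFFIX)):
        data = json.loads(sidecar.read_text())
        index[sidecar.with_name(data["source"])] = data["vector"]
    return index
```

The folders stay browsable with ls and grep, and the vectors are there when you want a similarity pass over the same tree.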
I am active in fandoms and want to create a search where someone can ask "what was that fanfic where XYZ happened?" and get an answer back in the form of links to fanfiction that are responsive.
This is a RAG system, right? I understand I need an actual model (that's something like ollama), the thing that trawls the fanfiction archive and inserts whatever it's supposed to insert into one of these vector DBs, and I need a front-facing thing I write, that takes a user query, sends it to ollama, which can then search the vector DB and return results.
Or something like that.
Is it a RAG system that solves my use case? And if so, what software might I go about using to provide this service to me and my friends? I'm assuming it's pretty low in resource usage since it's just text indexing (maybe indexing new stuff once a week).
The goal is self-hosting. I don't wanna be making monthly payments indefinitely for some silly little thing I'm doing for me and my friends.
I am just a stay-at-home dad these days and don't have anyone to ask. I've been totally out of the tech game for a few years now. I hope that you could respond (or someone else could), and maybe it will help other people.
There's just so many moving parts these days that I can't even hope to keep up. (It's been rather annoying to be totally unable to ride this tech wave the way I've done in the past; watching it all blow by me is disheartening).
The important part here is that you now don’t have to compare strings anymore (like looking for occurrences of the word "fanfiction" in the title and content), but instead you can perform arbitrary mathematical operations to compare query embeddings to stored embeddings: 1 is closer to 3 than 7, and in the same way, fanfiction is closer to romance than it is to biography. Now, if you rank documents by that proximity and take the top 10 or so, you end up with the documents most similar to your query, and thus the most relevant.
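As a small illustration of ranking by proximity, here is a sketch using cosine similarity, one common way to compare embeddings. The three-dimensional vectors and document names are made up for the example. Real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=10):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings.
stored = {
    "romance_fic": [0.9, 0.1, 0.0],
    "adventure_fic": [0.6, 0.4, 0.1],
    "biography": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
print(top_k(query, stored, k=2))
```

A vector database does essentially this, just with indexes that avoid comparing the query against every stored vector.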
That is the R in RAG; the A as in Augmentation happens when, before forwarding the search query to an LLM, you also add all results that came back from your vector database with a prefix like "the following records may be relevant to answer the users request", and that brings us to G like Generation, since the LLM now responds to the question aided by a limited set of relevant entries from a database, which should allow it to yield very relevant responses.
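The augmentation step is mostly string assembly. A minimal sketch, assuming each retrieved record is a dict with title, url and summary keys (that shape is an assumption for this example, not something any particular vector database returns):

```python
def build_augmented_prompt(question, records):
    """Prepend retrieved records to the user's question before sending it to an LLM."""
    lines = ["The following records may be relevant to answer the user's request:"]
    for number, record in enumerate(records, start=1):
        lines.append(f"[{number}] {record['title']} ({record['url']})")
        lines.append(record["summary"])
    lines.append("")
    lines.append(f"User request: {question}")
    return "\n".join(lines)


# Placeholder record; the URL is deliberately not a real address.
records = [
    {
        "title": "The Long Way Home",
        "url": "https://archive.example.invalid/works/1",
        "summary": "Two rivals get stranded on a road trip.",
    },
]
prompt = build_augmented_prompt("What was that fic with the road trip?", records)
print(prompt)
```

The resulting string is what goes to the model, which then handles the G part.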
I hope this helps :-)
The "ai" npm package includes a root-level docs folder containing .mdx versions of the docs from their site, specific to the version of the package. Their intended AI-assisted developer experience is that people discover and install their ai-sdk skill (via their npx skills tool, which supports discovery and install of skills from most any provider, not just Vercel). The SKILL.md instructs the agent to explicitly ignore all knowledge that may have been trained into its model, and to first use grep to look for docs in node_modules/ai/docs/ before searching the website.
https://github.com/vercel/ai/blob/main/skills/use-ai-sdk/SKI...
I'm working on a related challenge which is mounting a virtual filesystem with FUSE that mirrors my Mac's actual filesystem (over a subtree like ~/source), so I can constrain the agents within that filesystem, and block destructive changes outside their repo.
I have it so every repo has its own long-lived agent. They do get excited and start changing other repos, which messes up memory.
I didn't want to create a system user per repo because that's obnoxious, so I created a single claude system user, and I am using the virtual file system to manage permissions. My gmail repo's agent can for instance change the gmail repo and the google_auth repo, but it can't change the rag repo.
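To illustrate the kind of rule described here (not how bashguard or the FUSE layer actually implements it), a per-agent allow-list check could look roughly like this. The /source root, the WRITABLE table and can_write are hypothetical names, and the check only normalizes path strings, so symlinks are not handled.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical policy: which repos each repo's agent may modify.
WRITABLE = {
    "gmail": {"gmail", "google_auth"},
    "rag": {"rag"},
}


def can_write(agent, target, root="/source"):
    """Return True if the agent may write to target, a path under root."""
    # Collapse ".." segments first so "gmail/../rag" cannot slip through.
    normalized = PurePosixPath(posixpath.normpath(posixpath.join(root, target)))
    try:
        relative = normalized.relative_to(root)
    except ValueError:
        return False  # the path escaped the mirrored subtree entirely
    if not relative.parts:
        return False  # the root itself is not writable
    return relative.parts[0] in WRITABLE.get(agent, set())
```

With this table, can_write("gmail", "google_auth/token.py") is allowed, while can_write("gmail", "rag/index.py") is refused.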
Edit: I'm publishing it here. It's still under development. https://github.com/sunir/bashguard
I also vibed a brainstorming note with my knowledge base system. The initial prompt: """when I read "We replaced RAG with a virtual filesystem for our AI documentation assistant (mintlify.com)" title on HackerNews - the discussion is about RAG, filesystems, databases, graphs - but maybe there is something more fundamental in how we structure the systems so that the LLM can find the information needed to answer a question. Maybe there is nothing new - people had elaborate systems in libraries even before computers - but maybe there is something. Semantic search sounds useful - but knowing which page to return might be nearly as difficult as answering the question itself - and what about questions that require synthesis from many pages? Then we have distillation - an table of content is a kind of distillation targeting the task of search. """ Then I added a few more comments and the llm linked the note with the other pages in my kb. I am documenting that - because there were many voices against posting LLM generated content and that a prompt will be enough. IMHO the prompt is not enough - because the thought was also grounded in the whole theory I gathered in the KB. And that is also kind of on topic here. Anyway - here is the vibed note: https://zby.github.io/commonplace/notes/charting-the-knowled...
The filesystem metaphor works because it preserves hierarchy. Documents have sections, sections have relationships, and those relationships carry meaning that gets lost when you flatten everything into embeddings.
Curious how this handles versioning though. Docs change constantly and stale context fed to an LLM is arguably worse than no context at all.
But in the end, I would expect that you could add a skill / instructions on how to use chromadb directly.
To be honest, I have no idea what chromadb is or how it works. But building an overlay FS seems like quite a lot of work.
Putting Chroma behind a FUSE adapter was my initial thought when I was implementing this but it was way too slow.
I think we would also need to optimize grep even if we had a FUSE mount.
This was easier in our case because we didn’t need 100% POSIX compatibility for our read-only docs use case: the agent only used a subset of bash commands to traverse the docs anyway. This also avoids any extra infra overhead or maintenance of EC2 nodes/sandboxes that the agent would have to use.
But the idea of spinning up a whole VM to use unix IO primitives is way overkill. Makes way more sense to let the agent spit out unix-like tool calls and then use whatever your prod stack uses to do IO.
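A sketch of what "unix-like tool calls without a VM" can look like: the agent emits a command string, and a small dispatcher answers it from wherever the docs actually live. Here a plain dict stands in for the production store, and only read-only ls, cat and grep are supported. The function name, the dict layout and the error strings are assumptions for illustration.

```python
import re
import shlex


def run_tool_call(command, docs):
    """Answer a read-only, unix-like command from a {path: content} mapping."""
    args = shlex.split(command)
    if not args:
        return "error: empty command"
    name, rest = args[0], args[1:]
    if name == "ls":
        target = rest[0].strip("/") if rest else ""
        prefix = target + "/" if target else ""
        # Immediate children only, like ls on a directory.
        children = {p[len(prefix):].split("/")[0] for p in docs if p.startswith(prefix)}
        return "\n".join(sorted(children))
    if name == "cat" and len(rest) == 1:
        return docs.get(rest[0].lstrip("/"), f"cat: {rest[0]}: No such file")
    if name == "grep" and rest:
        pattern = re.compile(rest[0])
        prefix = rest[1].strip("/") if len(rest) > 1 else ""
        hits = []
        for path in sorted(docs):
            if not path.startswith(prefix):
                continue
            for number, line in enumerate(docs[path].splitlines(), start=1):
                if pattern.search(line):
                    hits.append(f"{path}:{number}:{line}")
        return "\n".join(hits)
    # Anything else, including writes and deletes, is refused.
    return f"error: unsupported command {name!r}"


docs = {
    "guides/quickstart.md": "Install the CLI\nRun the dev server",
    "guides/deploy.md": "Deploy with one command",
    "api/auth.md": "Use an API key",
}
print(run_tool_call("grep Deploy guides", docs))
```

Because nothing touches a real shell, an agent that emits something destructive just gets an error string back.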