
Posted by andros 2 days ago

From zero to a RAG system: successes and failures(en.andros.dev)
236 points | 72 comments
abd7894 7 hours ago|
What ended up being the main bottleneck in your pipeline—embedding throughput, cost, or something else? Did you explore parallelizing vectorization (e.g., multiple workers) or did that not help much in practice?
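A minimal sketch of the parallel vectorization the question raises, assuming the embedding call is network-bound (so threads overlap the waiting). The `embed` stub here is a stand-in for a real embedding API, not the author's pipeline:

```python
from concurrent.futures import ThreadPoolExecutor

def embed(batch):
    # Stand-in for a real embedding API call (e.g. OpenAI, Cohere).
    # Returns one toy vector per input text.
    return [[float(len(text))] for text in batch]

def embed_parallel(texts, batch_size=8, workers=4):
    # Split into batches, embed them concurrently, preserve order.
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed, batches)  # map() keeps batch order
    return [vec for batch in results for vec in batch]

vectors = embed_parallel([f"doc {i}" for i in range(20)])
```

Because `pool.map` yields results in submission order, the output vectors line up with the input texts even though batches finish out of order.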
trgn 7 hours ago||
Odd to me that Elasticsearch isn't finding a second breath in these new ecosystems. It basically is that now, a RAG engine with model integration.
mickeyp 4 hours ago||
The old joke Zawinski made about picking regex "and now you have two problems" applies here.

If you pick Elasticsearch, useful as it is, you now have more than two problems. You have Elastic the company; Elasticsearch the tool; and also the clay-footed colossus, Java, to contend with.

sailfast 7 hours ago|||
It’s definitely a use case for this and would’ve saved a lot of pain IMO but also seems like it would have added confusing technology to what was a VERY Python-heavy stack that would’ve benefitted from other elements.

Hardest part is always figuring out your company’s knowledge management has been dogsh!t for years so now you need to either throw most of it away or stick to the authoritative stuff somehow.

Elastic plus an agent with MCP may have worked as a prototype very quickly here, but hosting costs for 500GB worth of indexes sounds too expensive for this person’s use case if $185 is a lot.

trgn 7 hours ago||
ah got it! thanks for the color
mrits 7 hours ago||
The people that survived it aren't willing to give it any more of the breath they have left.
trgn 7 hours ago||
haha! it's been ok for me, but a lot of song and dance is required. the saas-version is a black box (in a bad way).
civeng 7 hours ago||
Great write-up. Thank you! I’m contemplating a similar RAG architecture for my engineering firm, but we’re dealing with roughly 20x the data volume (estimating around 9TB of project files, specs, and PDFs). I've been reading about Google's new STATIC framework (sparse matrix constrained decoding) and am really curious about the shift toward generative retrieval for massive speedups well beyond this approach. For those who have scaled RAG into the multi-terabyte range: is it actually worth exploring generative retrieval approaches like STATIC to bypass standard dense vector search, or is a traditional sharded vector DB (Milvus, Pinecone, etc.) still the most practical path at this scale?

I would guess the ingestion pain is still the same.

This new world is astounding.

lukewarm707 5 hours ago||
9TB should be fine for a vector DB, for sure. Google Search is many petabytes of index with vector + semantic search, and that uses ScaNN.

You could probably use the hybrid search in LlamaIndex, or Elasticsearch. There is an off-the-shelf Discovery Engine API on GCP, and Vertex AI RAG Engine is end-to-end for building your own. GCP is too expensive though. Alibaba Cloud has a similar solution.
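What "hybrid search" combines can be shown with a toy sketch (this is not the LlamaIndex or Elasticsearch API): a keyword ranking and a dense ranking over the same corpus, fused with reciprocal rank fusion. The term-overlap and bag-of-words scorers below stand in for BM25 and real embeddings:

```python
import math
from collections import Counter

# Toy corpus; in practice these come from the document index.
docs = ["elasticsearch hybrid retrieval", "vector search with scann",
        "rag pipeline ingestion", "hybrid vector and keyword search"]

def keyword_scores(query):
    # Crude term-overlap score standing in for BM25.
    terms = set(query.split())
    return [len(terms & set(d.split())) for d in docs]

def dense_scores(query):
    # Crude bag-of-words cosine similarity standing in for embeddings.
    q = Counter(query.split())
    scores = []
    for d in docs:
        c = Counter(d.split())
        dot = sum(q[t] * c[t] for t in q)
        norm = (math.sqrt(sum(v * v for v in q.values()))
                * math.sqrt(sum(v * v for v in c.values())))
        scores.append(dot / norm if norm else 0.0)
    return scores

def hybrid_rank(query, k=60):
    # Reciprocal rank fusion: combine the two rankings without tuned weights.
    kw = sorted(range(len(docs)), key=lambda i: -keyword_scores(query)[i])
    dn = sorted(range(len(docs)), key=lambda i: -dense_scores(query)[i])
    fused = {i: 1 / (k + kw.index(i) + 1) + 1 / (k + dn.index(i) + 1)
             for i in range(len(docs))}
    return sorted(fused, key=fused.get, reverse=True)

ranking = hybrid_rank("hybrid vector search")
```

Rank fusion is why hybrid setups are forgiving: a document only has to score well on one of the two rankings to surface near the top.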

physicsguy 7 hours ago|||
We did it in an engineering setting and had very mixed results. Big 800 page machine manuals are hard to contextualise.
te_chris 6 hours ago||
There’s turbopuffer
lucfranken 7 hours ago||
Cool work! I'd be so interested in what would happen if you put the data and the plan/features you wanted into a Claude Code instance and let it go. You did careful thinking, but those models now also go really far and deep. I'd be really interested to see what it comes up with. For that kind of data, getting something like a Mac mini or whatever (no, not with OpenClaw) would make it damn interesting to see how fast and far you can go.
tom1337 7 hours ago|
But where is the fun with that?
lucfranken 45 minutes ago||
Being curious is always fun, right?
Horatius77 2 days ago||
Great writeup but ... pretty sure ChromaDB is open source and not "Google's database"?
nalinidash 9 hours ago||
ChromaDB is open source with Apache-2.0 license.

https://github.com/chroma-core/chroma

threatofrain 10 hours ago||
I'm afraid this hits the credibility of the article for me, that's a pretty weird mistake to make. It's like paying for a Model 3 while thinking it comes from Ford.
andros 9 hours ago||
Thank you for your feedback!
alansaber 7 hours ago||
Think that's the first time I've seen someone write about checkpointing; definitely worth doing for similar projects.
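A minimal sketch of checkpointed ingestion, so a crashed or interrupted run resumes instead of re-embedding everything. The file name and `process` callback are illustrative, not from the article:

```python
import json
import os

CHECKPOINT = "ingest_checkpoint.json"  # hypothetical path

def load_done():
    # Resume: read the set of already-processed document ids, if any.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def save_done(done):
    # Write to a temp file and rename, so a crash mid-write
    # can't leave a corrupt checkpoint behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, CHECKPOINT)

def ingest(doc_ids, process):
    done = load_done()
    for doc_id in doc_ids:
        if doc_id in done:
            continue  # skip work a previous run already finished
        process(doc_id)
        done.add(doc_id)
        save_done(done)  # checkpoint after every document
    return done
```

Checkpointing after every document is the simplest safe choice; for large corpora you would batch the saves, trading a little repeated work on restart for less I/O.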
supermooka 6 hours ago||
Thanks for an interesting read! Are you monitoring usage, and what kind of user feedback have you received? Always curious whether these projects end up getting used, because even with perfect tech, if the data is low quality nobody is going to bother.
aledevv 9 hours ago||
I made something similar in my project. My most difficult task has been choosing the right approach to chunking long documents. I used both structural and semantic chunking; the semantic approach helped store better vectors in the vector DB. I used Qdrant and an OpenAI embedding model.
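The structural half of that can be sketched simply: split on paragraph boundaries, then pack consecutive paragraphs into chunks under a size budget (the budget here is illustrative). Semantic chunking would instead split where embedding similarity between adjacent passages drops, which needs a model and is omitted here:

```python
def structural_chunks(text, max_chars=400):
    # Structural chunking: split on blank-line paragraph boundaries,
    # then pack consecutive paragraphs into chunks up to a size budget.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        # +2 accounts for the "\n\n" separator we re-insert between paragraphs.
        if current and len(current) + len(p) + 2 > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks
```

Packing by paragraph keeps chunk boundaries on natural breaks, which is usually the main win over fixed-width character splits.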
brcmthrowaway 4 hours ago|
What was the system prompt?