civeng 5 hours ago

Great write-up. Thank you! I’m contemplating a similar RAG architecture for my engineering firm, but we’re dealing with roughly 20x the data volume (estimating around 9TB of project files, specs, and PDFs). I've been reading about Google's new STATIC framework (sparse matrix constrained decoding) and am curious whether the shift toward generative retrieval really delivers speedups well beyond this approach. For those who have scaled RAG into the multi-terabyte range: is it actually worth exploring generative retrieval approaches like STATIC to bypass standard dense vector search, or is a traditional sharded vector DB (Milvus, Pinecone, etc.) still the most practical path at this scale?

I would guess the ingestion pain is still the same.

This new world is astounding.

lukewarm707 3 hours ago | parent | next [-]

9TB should be fine for a vector DB, for sure. Google Search serves many petabytes of index with vector + semantic search, using ScaNN.
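The scatter/gather pattern a sharded vector DB uses at that scale is simple to sketch. This is a minimal illustration with brute-force cosine similarity and made-up data, not any particular product's implementation; the shard count, dimensions, and names are all assumptions:

```python
import numpy as np

# Illustrative sketch: partition vectors into shards, query each shard
# for its local top-k, then merge the partial results globally.
rng = np.random.default_rng(0)
dim, n_per_shard, n_shards, k = 64, 1000, 4, 5

shards = [rng.standard_normal((n_per_shard, dim)).astype(np.float32)
          for _ in range(n_shards)]
query = rng.standard_normal(dim).astype(np.float32)

def shard_topk(vectors, query, k):
    # cosine similarity against one shard; returns (score, local_id) pairs
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1)
                              * np.linalg.norm(query))
    idx = np.argpartition(-sims, k)[:k]  # unordered top-k, O(n)
    return [(float(sims[i]), int(i)) for i in idx]

# scatter the query to every shard, then merge the partial top-k lists
candidates = [(score, shard_id, local_id)
              for shard_id, vs in enumerate(shards)
              for score, local_id in shard_topk(vs, query, k)]
top = sorted(candidates, reverse=True)[:k]
print(top)
```

Real systems replace the brute-force inner loop with an ANN index (ScaNN, HNSW, IVF-PQ) per shard, but the fan-out-and-merge shape stays the same.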

You could probably use the hybrid search in LlamaIndex, or Elasticsearch. There's an off-the-shelf Discovery Engine API on GCP, and Vertex AI RAG Engine is end-to-end for building your own, though GCP is expensive. Alibaba Cloud has a similar offering.
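The core of hybrid search is fusing a lexical (BM25-style) ranking with a dense-vector ranking. One common fusion method, supported by both Elasticsearch and LlamaIndex, is reciprocal rank fusion (RRF); here is a minimal sketch with invented doc IDs and rankings:

```python
# Reciprocal rank fusion: each result list contributes 1/(k + rank) to a
# document's combined score, so agreement between rankers dominates.
def rrf(rankings, k=60):
    # each ranking is a list of doc ids, best first
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# hypothetical rankings from a keyword index and a vector index
keyword_hits = ["spec_041", "manual_7", "drawing_12"]
vector_hits  = ["manual_7", "report_3", "spec_041"]
fused = rrf([keyword_hits, vector_hits])
print(fused)
```

Documents that appear high in both lists (here `manual_7`) rise to the top even though neither ranker alone put the same document first.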

physicsguy 5 hours ago | parent | prev | next [-]

We did it in an engineering setting and had very mixed results. Big 800-page machine manuals are hard to contextualise.
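One common mitigation (not claimed to be what the commenter tried) is to prepend each chunk's section breadcrumb before embedding it, so a retrieved fragment carries some of the manual's structure with it. A minimal sketch, with an invented manual layout:

```python
# Hypothetical manual structure: chapter -> section -> paragraphs.
manual = {
    "3 Hydraulics": {
        "3.2 Pump maintenance": [
            "Relieve system pressure before removing the pump housing.",
            "Torque the housing bolts to 45 Nm in a cross pattern.",
        ],
    },
}

def contextual_chunks(manual, title="Machine manual"):
    # Prefix every paragraph with its breadcrumb path so the chunk is
    # interpretable (and embeddable) out of context.
    chunks = []
    for chapter, sections in manual.items():
        for section, paragraphs in sections.items():
            for text in paragraphs:
                breadcrumb = f"{title} > {chapter} > {section}"
                chunks.append(f"[{breadcrumb}] {text}")
    return chunks

for c in contextual_chunks(manual):
    print(c)
```

This doesn't solve cross-references or diagrams, which is a large part of why big manuals stay hard.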

te_chris 4 hours ago | parent | prev [-]

There’s turbopuffer