m00dy 3 hours ago

RAG is broken when you have too much data.

plingamp 2 hours ago | parent | next [-]

Specifically, once the document count reaches around 10k+, a phenomenon called "Semantic Collapse" occurs.

https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Halluc...

thunky 3 hours ago | parent | prev | next [-]

Gemini with Google search is RAG using all public data, and it isn't broken.

fhd2 2 hours ago | parent [-]

It's not tool use with natural language search queries? That's what I'd expect.

kaicianflone 2 hours ago | parent [-]

It is tool use with natural language search queries, but one layer down those queries are run against a vector DB, which is very similar to RAG. Essentially, Google RankBrain is a very distant ancestor of RAG, from before compute and scaling.

PlatoIsADisease 2 hours ago | parent | prev [-]

Can't you make thresholds higher?

Hmm... I guess not, you might want all that data.

Super interesting topic. Learning a lot.
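
For anyone following along, the threshold question above can be made concrete with a rough sketch. This is not any particular vector DB's API; the `retrieve` function, the toy 2-D vectors, and the threshold values are all made up for illustration. It only shows the trade-off: a higher similarity threshold returns fewer, closer matches, at the cost of dropping data you might have wanted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, docs, threshold, top_k):
    """Hypothetical retriever: keep docs scoring at or above `threshold`,
    then return the ids of the `top_k` best, highest score first."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in docs.items()]
    kept = [(score, doc_id) for score, doc_id in scored if score >= threshold]
    kept.sort(reverse=True)
    return [doc_id for _, doc_id in kept[:top_k]]

# Toy "embeddings" (assumed values, not from a real model).
docs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}
query = [1.0, 0.0]

loose = retrieve(query, docs, threshold=0.5, top_k=10)   # keeps "a" and "b"
strict = retrieve(query, docs, threshold=0.9, top_k=10)  # keeps only "a"
print(loose, strict)
```

With the stricter threshold, "b" is dropped even though it is partly relevant, which is the "you might want all that data" problem.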