_the_inflator 2 days ago

I have implemented many RAGs and feel sorry for anyone proclaiming "RAG is dead". These folks have never implemented one; maybe they followed a tutorial and installed a "Hello World!" project, but that's it.

I don't want to go into detail, but I 100% agree with the author's conclusion: data is key. Data ingestion, to be precise. Simply using docling to transform PDFs to markdown and having a vector database do the rest is ridiculous.

For example, for a high-precision RAG that had to be 100% accurate on the pricing information it provided, it took me a week to build an ETL pipeline for a 20-page PDF document, splitting the information between SQL and a graph database.
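A minimal sketch of that kind of split, with all table, column, and SKU names invented for illustration: exact facts such as prices go into SQL, where lookups are deterministic, while relationships between items go into a graph structure.

```python
import sqlite3

# Hypothetical rows extracted from the PDF's pricing tables (all names invented).
parsed_rows = [
    {"sku": "A-100", "name": "Base plan", "price_eur": 49.0, "includes": []},
    {"sku": "A-200", "name": "Pro plan", "price_eur": 99.0, "includes": ["A-100"]},
]

# Exact facts (prices) go to SQL: a lookup either finds the right number or fails loudly.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pricing (sku TEXT PRIMARY KEY, name TEXT, price_eur REAL)")
db.executemany(
    "INSERT INTO pricing VALUES (?, ?, ?)",
    [(r["sku"], r["name"], r["price_eur"]) for r in parsed_rows],
)

# Relationships ("Pro includes Base") go to a graph; a real system might use a
# graph database, this is just an adjacency list standing in for one.
graph = {r["sku"]: r["includes"] for r in parsed_rows}

price = db.execute(
    "SELECT price_eur FROM pricing WHERE sku = ?", ("A-200",)
).fetchone()[0]
```

The point of the split is that a price query never touches similarity search at all.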

And this was a small step, with all the tweaking that lay ahead to ensure exceptional results.

Which search algorithm, or how many? Embeddings, of which quality? Semantics, how and which exactly?

Believe me, RAG is the finest kind of technical masterpiece there is. I have so much respect for the folks at OpenAI and Anthropic for the ingestion processes and tools they use, because they operate on a level I will never touch with my RAG implementations.

RAG is really something you should try for yourself if you love solving tricky fundamental problems that, in the end, can provide a lot of value to you or your customers.

Simply don't believe the hype and ignore all "install and embed" solutions. They are crap, sorry to say so.

RansomStark 2 days ago | parent | next [-]

I have proclaimed RAG is dead many times, and I stand by it.

RAG is Dead! Long Live Agentic RAG! || Long Live putting stuff in databases where it damn well belongs!

I think you agree with the people saying RAG is dead (or at least you agree with me, and I say RAG is dead) when you say "Simply using docling and transforming PDFs to markdown and have a vector database doing the rest is ridiculous."

I fully agree, but that was the promise of RAG: chunk your documents into little bits, find the bit that is closest to the user's query, and add it to the context, maybe leaving a little overlap on the chunks. That is how RAG was initially presented, and how many vendors implement it; I'm looking at tools like Amazon Bedrock Knowledge Bases here.
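That naive pipeline fits in a few lines, which is exactly why it was so seductive. A hedged sketch, where the bag-of-words `embed` is a stand-in for a real embedding model and the document text and chunk sizes are invented:

```python
from collections import Counter
from math import sqrt

def chunk(text, size=50, overlap=10):
    """Naive fixed-size character chunks with a little overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = ("The base plan costs 49 euros per month. Support is available "
       "on weekdays. Cancellation requires 30 days notice.")
chunks = chunk(doc)

# "Find the bit closest to the user's query and add it to the context."
query = embed("plan costs euros")
best = max(chunks, key=lambda c: cosine(query, embed(c)))
```

Everything hard (chunk boundaries cutting sentences, stale duplicates, numbers that look alike) is hidden inside that one `max` call.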

When I want to know the latest <important financial number>, I want it pulled from the source of truth for that data, not to hope that some document chunk contains the latest figure and not last year's.

So, when people say RAG is dead, or at least when I say it, it's shorthand for: this is really damn complex, and vector search doesn't replace decades of information theory, storage, and retrieval patterns.

Hell, I've worked with teams trying to extract everything from databases to push it into vector stores so the LLM can use the data. First, it often failed: chunks contained multiple rows of data, and the LLM got confused about which row actually mattered; the teams hadn't realized that the full chunk would be returned, not just the row they were interested in. Second, the use cases these teams were working on were usually well defined, that is, the required data could be determined deterministically before going to the LLM and pulled from a database using a simple script, no similarity required. But that's not the cool way to do it.
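For a well-defined question like "latest quarterly revenue", the deterministic version really is just a query against the source of truth; table and column names below are invented for illustration.

```python
import sqlite3

# A stand-in for the real source of truth (schema and figures invented).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE revenue (quarter TEXT PRIMARY KEY, amount_musd REAL)")
db.executemany(
    "INSERT INTO revenue VALUES (?, ?)",
    [("2023-Q4", 1.2), ("2024-Q1", 1.5), ("2024-Q2", 1.7)],
)

# Deterministic: always the latest row, never "hopefully the right chunk".
# The result can then be injected into the LLM's context verbatim.
latest = db.execute(
    "SELECT quarter, amount_musd FROM revenue ORDER BY quarter DESC LIMIT 1"
).fetchone()
```

No embeddings, no similarity threshold to tune, and the answer is wrong only if the database is wrong.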

joefourier 2 days ago | parent | next [-]

I agree with you that simple vector search + context stuffing is dead as a method, but I think it's ridiculous to reserve the term "RAG" for just the earliest, most basic implementation. The definition of Retrieval Augmented Generation is any method that dynamically gives the LLM relevant data, as opposed to relying purely on memorised training data, or on giving it everything it could possibly need and relying on long context windows.

The RAG system you mentioned is just RAG done badly, but doing it properly doesn't require a fundamentally different technique.

hbrn 16 hours ago | parent [-]

> it's ridiculous to reserve the term "RAG" for just the earliest most basic implementation

Whether we like it or not, dumb semantic search became the colloquial definition of RAG.

And when you hear someone saying "we use RAG here" 95% of the time this is exactly what they mean.

When you inject the user's name into the system prompt, technically you're doing RAG - but nobody thinks about it that way. I think it's one of those cases where the colloquial definition is actually more useful than the formal one.
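For the record, that degenerate case looks like this (names invented); calling it RAG is technically defensible but tells you nothing.

```python
# "Retrieval": looking the user up in a session store (invented stand-in).
user = {"name": "Ada"}

# "Augmented generation": injecting the retrieved fact into the prompt.
system_prompt = f"You are a helpful assistant. The user's name is {user['name']}."
```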

> doing it properly doesn't require a fundamentally different technique

But agentic RAG is fundamentally different.

joefourier 15 hours ago | parent [-]

Then what do you call RAG done well? You need a term for it.

> And when you hear someone saying "we use RAG here" 95% of the time this is exactly what they mean.

That's just Sturgeon's law in action: 95% of all implementations are crap. Back in the 90s, you might have heard "we use OOP here" and come to a similar conclusion, but that doesn't mean you need to invent a new word for doing OOP properly.

> But agentic RAG is fundamentally different.

From an implementation POV, absolutely not.

I've personally gradually converted a dumb semantic search to a more fully featured agentic RAG in small steps like these:

- Have a separate LLM call write the query instead of just using the user's message.

- Make the RAG search a synthetic injected tool call, instead of appending it to the system prompt.

- Improve the search endpoint by using an LLM to pre-process the data into structured chunks with hierarchical categories, tags, and possible search queries, embedding the search queries separately from the desired information (versus originally just having a raw blob).

- Have the LLM be able to search both with a semantic sentence and a list of tags.

- Have the LLM view and navigate the hierarchy in a tree-like manner.

- Make the original LLM able to call the search on its own instead of it being automatically injected via a separate query-rewriting call, letting it search in multiple rounds and refine its own queries.

When did the system go from RAG to "not RAG"? Because fundamentally, all you need to do to make an agentic RAG is to let the LLM write/rewrite its own search queries (possibly in multiple passes) as opposed to just passing the user's message(s) directly.
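The multi-round loop from the last step above can be sketched with a stubbed model; `fake_llm` stands in for a real LLM call, and the corpus and queries are invented. The model decides when to search and refines its own query between rounds.

```python
def search(query, corpus):
    """Toy search endpoint: substring match standing in for real retrieval."""
    return [doc for doc in corpus if query.lower() in doc.lower()]

def fake_llm(user_msg, results):
    """Stand-in for a real model. It would decide freely; here the
    refinement is hard-coded so the loop is reproducible."""
    if results is None:                     # no search performed yet
        return {"action": "search", "query": "pricing"}
    if not results:                         # empty results: refine the query
        return {"action": "search", "query": "price"}
    return {"action": "answer", "text": results[0]}

corpus = ["The Pro plan price is 99 EUR/month.", "Support hours: 9-17 CET."]

results, answer = None, None
for _ in range(3):                          # bound the number of search rounds
    step = fake_llm("How much is Pro?", results)
    if step["action"] == "answer":
        answer = step["text"]
        break
    results = search(step["query"], corpus)
```

Here the first query ("pricing") finds nothing, the model rewrites it to "price", and only then answers - the whole difference between "dumb" and "agentic" in a dozen lines of control flow.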

ozim 2 hours ago | parent | next [-]

I like the audacity of the parent poster, who equates the 95% of implementations he has seen with 95% of all there are, when it could easily be 0.01% of all there are. The world is much bigger than we think :)

hbrn 9 hours ago | parent | prev [-]

>all you need to do to make an agentic RAG is to have the LLM be able to write/rewrite its own search queries (possibly in multiple passes)

I think this is a huge oversimplification, the term "search query" is doing a lot of heavy lifting here.

When Claude Code calls something like

  find . -type d -maxdepth 3 -not -path '*/node_modules/*'
to understand the project hierarchy before doing any of the grep calls, I don't think it's fair to call it just a "search query"; it's more like an "analyze query". Just because text goes in and out in both cases doesn't mean that it's all the same.

When you give the agent the ability to query the nature of the data (e.g. the hierarchy), and not just the data itself, you need to design your product around it. Agentic RAG has an entirely different implementation, different product implications, cost, latency and, primarily, outcomes. I don't think it's useful to pretend that it's just a different flavor of the same thing simply because, at the end of the day, it's just some text flying over the network.
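A sketch of what "querying the nature of the data" can look like when exposed as a tool: the function below (invented for illustration) returns the directory hierarchy up to a depth, pruning `node_modules`, much like the `find` call above; the agent sees the shape of the project before fetching any file contents.

```python
import os
import tempfile

def list_tree(root, max_depth=3):
    """Return directory paths under root up to max_depth, skipping node_modules."""
    found = []
    for path, subdirs, _files in os.walk(root):
        depth = path[len(root):].count(os.sep)
        if depth >= max_depth:
            subdirs.clear()                 # prune: don't descend further
            continue
        subdirs[:] = [d for d in subdirs if d != "node_modules"]  # prune in place
        found.extend(os.path.join(path, d) for d in subdirs)
    return found

# Demo on a throwaway tree.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "src", "lib"))
os.makedirs(os.path.join(root, "node_modules", "react"))
tree = list_tree(root)
```

The tool returns structure ("there is a src/lib"), not content, which is exactly the "analyze query" distinction.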

whakim 2 days ago | parent | prev [-]

I don't think we should undersell transformers and semantic search: they are really powerful information retrieval tools, extremely potent for solving search problems. That being said, I think I agree with you that RAG is fundamentally just search, and the hype (like any hype) elides the fact that you still have to solve all of the normal, difficult search problems.

maCDzP 2 days ago | parent | prev | next [-]

Do you have any good resources for what you are describing?

financltravsty a day ago | parent | prev | next [-]

I simply have no idea what you're babbling about. I'm not trying to be rude, but I really cannot parse what you're saying.

Simple RAG is fine for very simple workflows, but semantic similarity vector search has a lot of edge cases and isn't the best tool out there. RIG or even recursive LLMs work better in the general case.

Whatever you're saying, it does not really mesh with my experience.

Mrngl1991 a day ago | parent | prev [-]

[dead]