AmazingTurtle 10 hours ago
At everfind.ai, we've found a middle ground that leverages both structured and unstructured data in retrieval. We keep a flat OpenSearch index for chunked content, but complement it by capturing structured metadata during ingestion—either via integrations or through schema extraction with LLMs. That metadata lets us take full advantage of OpenSearch's field-type capabilities. At retrieval time we start with a broad "prefetching" step: quickly identify the most relevant schemas, run targeted vector searches within those schemas, and then rerank the top results with the LLM before agentic reasoning and execution. The LLM is given carefully pre-selected tools and fields, so it can dive deeper into prefetched results or dynamically explore alternate queries. This significantly boosts RAG pipeline performance on both speed and relevance. We also limit visibility of the "agentic execution context" to the current operation span and collapse it in subsequent interactions, which keeps context sizes manageable and further improves responsiveness and scalability.
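A minimal sketch of the prefetch → schema-scoped vector search → rerank flow described above. The schema names, toy 3-dim embeddings, and the `rerank_with_llm` stub are all illustrative assumptions, not everfind.ai's actual implementation:

```python
# Hypothetical sketch of: prefetch relevant schemas cheaply, then run a
# vector search restricted to those schemas, then hand the top hits to an
# LLM reranker. All data and function names here are made up for illustration.
from math import sqrt

# Toy corpus: documents routed to a schema at ingestion time, each carrying
# structured metadata alongside a (fake, 3-dim) embedding.
INDEX = {
    "invoice": [
        {"id": "inv-1", "vec": (0.9, 0.1, 0.0), "total": 1200, "text": "Q3 hosting invoice"},
        {"id": "inv-2", "vec": (0.8, 0.2, 0.1), "total": 340,  "text": "Office supplies invoice"},
    ],
    "ticket": [
        {"id": "tck-1", "vec": (0.1, 0.9, 0.2), "text": "Login page returns 500"},
    ],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def prefetch_schemas(query_vec, top_n=1):
    """Cheaply score each schema by its best-matching doc; keep the top few."""
    scored = [
        (max(cosine(query_vec, d["vec"]) for d in docs), name)
        for name, docs in INDEX.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)[:top_n]]

def scoped_search(query_vec, schemas, k=3):
    """Vector search restricted to the prefetched schemas only."""
    hits = [d for s in schemas for d in INDEX[s]]
    return sorted(hits, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

def rerank_with_llm(query, hits):
    """Stand-in for the LLM rerank step; here it just keeps the given order."""
    return hits

query_vec = (1.0, 0.0, 0.0)              # pretend-embedding of "hosting costs"
schemas = prefetch_schemas(query_vec)    # narrows the search to ["invoice"]
results = rerank_with_llm("hosting costs", scoped_search(query_vec, schemas))
print([d["id"] for d in results])        # -> ['inv-1', 'inv-2']
```

The point of the prefetch step is that the expensive per-document work (vector scoring, LLM reranking) only ever runs inside the few schemas that plausibly match.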
supo 3 hours ago
This article focuses on making "pre-fetching" more accurate, which reduces or eliminates the need for reranking. That improves latency and cost, and sometimes quality too: if you use a text cross-encoder to rerank structured objects, you'll find those rerankers don't actually understand much about numbers, locations, and other data like that.
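A crude illustration of that failure mode, assuming a made-up query and documents: a purely lexical scorer (a stand-in for surface-level text matching, since a real cross-encoder has no notion of numeric comparison either) cannot tell "$999" from "$1500" relative to "over $1000", while a filter on the captured structured field can:

```python
# Illustrative only: token overlap as a proxy for surface-level text matching.
# A query with a numeric constraint ties both documents, because ">$1000" is
# not a lexical relationship; the structured metadata resolves it exactly.
docs = [
    {"text": "Invoice for consulting, total $999",  "total": 999},
    {"text": "Invoice for consulting, total $1500", "total": 1500},
]
query = "invoice over $1000"

def lexical_score(query, text):
    """Token-overlap score between query and document text."""
    q = set(query.lower().replace("$", " ").split())
    t = set(text.lower().replace("$", " ").replace(",", " ").split())
    return len(q & t)

# Both documents look equally relevant to a text-only matcher...
scores = [lexical_score(query, d["text"]) for d in docs]
print(scores)  # both documents tie

# ...but a structured filter over the ingested metadata answers precisely.
matches = [d["total"] for d in docs if d["total"] > 1000]
print(matches)  # -> [1500]
```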