fareesh 5 days ago
Short answer: in RAG systems the documents are chunked into some predefined size (you pick a size based on your use case), and each chunk is converted into a vector embedding (e.g. via the OpenAI embeddings API) and stored in a vector database like Chroma, Pinecone, or pgvector in Postgres. Your query is then converted into an embedding too, and the top N chunks are returned via similarity search (cosine, dot product, or some other metric) - this has advantages over BM25, which is purely lexical. Then you can do some post-processing, or just hand all the chunks over as context, saying "here are some documents, use them to answer this question" + your query, to the LLM.
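Rough sketch of that pipeline in code, just to make the steps concrete. The embed() function here is a hypothetical placeholder standing in for a real embedding API (e.g. OpenAI's), and the chunking and similarity logic is deliberately naive:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding API here.
    # This just returns a deterministic random vector so the sketch runs.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384)

def chunk(document: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real systems often split on sentences or
    # paragraphs and keep some overlap between chunks.
    return [document[i:i + size] for i in range(0, len(document), size)]

def top_n_chunks(query: str, chunks: list[str], n: int = 3) -> list[str]:
    # Cosine similarity between the query embedding and each chunk embedding.
    q = embed(query)
    scores = []
    for c in chunks:
        v = embed(c)
        scores.append(float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))))
    ranked = sorted(zip(scores, chunks), key=lambda s: s[0], reverse=True)
    return [c for _, c in ranked[:n]]

def build_prompt(query: str, retrieved: list[str]) -> str:
    # The retrieved chunks are just pasted into the prompt as context.
    context = "\n\n".join(retrieved)
    return (
        "Here are some documents, use them to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
```

In a real setup the similarity search happens inside the vector database rather than in a Python loop, but the shape of the pipeline is the same: chunk, embed, retrieve top N, paste into the prompt.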
potato-peeler 5 days ago
> Then you can do some post-processing, or just hand all the chunks over as context, saying "here are some documents, use them to answer this question" + your query, to the LLM.

This part is what I want to understand. How does the LLM "frame" an answer?