osigurdson 11 hours ago

Are the documents individually large or fairly small, like a page or two each? If they are small, then since you already have Postgres you can just add the pgvector extension, decide which embeddings you want to use, and try it out without committing to too much. Maybe add a hash column first so that you can avoid paying to compute the embeddings again if you decide to use a different approach. They are all basically doing the same math to find things, so you aren't going to get magically better results by switching to something else. If the docs are larger, then you have to do chunking anyway.
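To make the hash idea concrete, here is a minimal sketch of caching embeddings by a hash of the document text. Everything named here is hypothetical: `EmbeddingCache`, `content_hash`, and the `embed` callable stand in for whatever embedding API and storage you actually use, and the in-memory dict stands in for the hash column in your table.

```python
import hashlib
from typing import Callable, Dict, List


def content_hash(text: str) -> str:
    """Stable SHA-256 hex digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class EmbeddingCache:
    """Remembers embeddings by content hash so unchanged text is never re-embedded.

    Assumption: `embed` is your (paid) embedding call. The dict here plays the
    role of a hash column plus stored vector in a real table.
    """

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._by_hash: Dict[str, List[float]] = {}

    def get(self, text: str) -> List[float]:
        key = content_hash(text)
        if key not in self._by_hash:
            # Only pay for the embedding when this exact text is new.
            self._by_hash[key] = self._embed(text)
        return self._by_hash[key]
```

If you later move the vectors somewhere else, the hash lets you match stored embeddings back to documents without recomputing them. Note that a different embedding model still means new embeddings, so in practice you would probably key on the model name as well.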

Would the 10M documents be searched with a single vector search, or would they be pre-filtered by other columns in your table first? If some pre-filtering is happening, it naturally makes things faster. You will likely want to use regular text / tsvector-based search as well, and potentially feed those results to the LLM too, since vector search isn't perfect.
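One common way to combine the two result lists is reciprocal rank fusion. That is my suggestion rather than something you have to do, and the sketch below assumes you already have document ids back from each search, best match first. The function name and the `k = 60` constant are illustrative defaults, not anything specific to pgvector or Postgres.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Merge several ranked id lists into one, best first.

    Each list contributes 1 / (k + rank) for every id it contains, so an id
    that ranks well in both the vector and the text search rises to the top.
    """
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical ids returned by the two searches.
vector_hits = ["a", "b", "c"]
text_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, text_hits])
```

This only needs the rank positions, so you don't have to make cosine distances and text search scores comparable with each other.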

You would then decide whether to do re-ranking before handing the results to the final LLM context window. These days models are pretty good, so they will do their own re-ranking to some extent, but it depends a bit on cost, latency, and the quality of result you are looking for.
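As a rough sketch of that last step, re-ranking can be an optional hook in front of whatever fills the context window. The names here (`build_context`, `rerank`, `max_chars`) are made up for illustration, and the character budget is just a crude stand-in for a real token count.

```python
from typing import Callable, List, Optional


def build_context(
    query: str,
    candidates: List[str],
    rerank: Optional[Callable[[str, str], float]] = None,
    max_chars: int = 2000,
) -> List[str]:
    """Pick passages for the LLM prompt, optionally re-ranking them first.

    Assumption: `rerank(query, passage)` returns a relevance score where
    higher is better. With no re-ranker, the retrieval order is kept.
    """
    ordered = list(candidates)
    if rerank is not None:
        ordered.sort(key=lambda passage: rerank(query, passage), reverse=True)

    selected: List[str] = []
    used = 0
    for passage in ordered:
        # Stop at the first passage that would overflow the budget.
        if used + len(passage) > max_chars:
            break
        selected.append(passage)
        used += len(passage)
    return selected
```

Leaving `rerank` as `None` is the "let the model sort it out" option. Plugging in a scorer costs an extra call per candidate, which is where the cost and latency trade-off shows up.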