CuriouslyC 7 days ago
You can, but my tool actually handles the raw chat context. So you can have millions of tokens in context, and the actual message that gets produced for the LLM is an optimized distillate, re-ordered to take into account LLM memory patterns. RAG tools are mostly optimized for Q&A anyhow, which has dubious carryover to coding tasks.
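Roughly, a sketch of the re-ordering idea, exploiting the "lost in the middle" effect (illustrative only; the scoring and names here are made up, not my actual implementation):

```python
# Hypothetical sketch: place the highest-relevance context chunks at
# the edges of the prompt, where models recall them best, and bury
# the weakest chunks in the middle ("lost in the middle" effect).

def reorder_for_recall(chunks: list[tuple[float, str]]) -> list[str]:
    """chunks: (relevance_score, text) pairs, e.g. from a retriever."""
    ranked = sorted(chunks, key=lambda c: c[0], reverse=True)
    front, back = [], []
    # Alternate the best chunks between the start and the end of the
    # context; lower-ranked chunks accumulate toward the middle.
    for i, (_, text) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]

context = reorder_for_recall([
    (0.9, "def handler(request): ..."),
    (0.2, "changelog entry"),
    (0.7, "related test case"),
])
prompt = "\n\n".join(context)  # best chunk first, second-best last
```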
olejorgenb 6 days ago | parent
> ... re-ordered to take into account LLM memory patterns.

If I understand you correctly, doesn't this break prefix KV caching?
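As I understand it, the cache can only reuse KV states for an exact shared token prefix, so re-ordering anything near the front of the prompt forces a full recompute. Toy illustration (my assumption of how it works; real engines like vLLM cache per token block, not per string):

```python
# Why per-request re-ordering defeats prefix KV caching: only the
# leading tokens that match the cached prompt exactly can be reused.

def reusable_prefix_len(cached: list[int], new: list[int]) -> int:
    """Number of leading tokens whose KV states can be reused."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

cached_prompt = [1, 2, 3, 4, 5]   # token ids from the previous turn
reordered     = [3, 1, 2, 4, 5]   # same content, shuffled order
print(reusable_prefix_len(cached_prompt, reordered))  # 0 -> full recompute
```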