olejorgenb 6 days ago
> ... re-ordered to take into account LLM memory patterns. If I understand you correctly, doesn't this break prefix KV caching?
CuriouslyC 6 days ago | parent
It's done immediately before the LLM call, transforming the message history for the API request. This does reduce the context-cache hit rate a bit, but I'm cache-aware, so I try to avoid repacking the early parts of the history when I can. The tradeoff is 100% worth it, though.
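A minimal sketch of what cache-aware repacking might look like. All names here (`repack_messages`, `pinned_count`, the `priority` key) are hypothetical illustrations, not the commenter's actual implementation: the idea is simply to reorder only the tail of the history while leaving an early prefix byte-identical, so the provider's prefix KV cache can still hit.

```python
# Hypothetical sketch, not the actual implementation: reorder only the
# tail of the message history so the cached prefix stays byte-identical.

def repack_messages(messages, pinned_count=4, priority=None):
    """Repack a chat history just before the API call.

    The first `pinned_count` messages are left untouched (cache-stable
    prefix); the rest are stably sorted by `priority`.
    """
    if priority is None:
        priority = lambda m: 0  # no-op ordering by default
    prefix = messages[:pinned_count]   # untouched -> prefix cache still hits
    tail = messages[pinned_count:]
    # stable sort preserves relative order of equally-ranked messages
    return prefix + sorted(tail, key=priority)

history = [
    {"role": "system", "content": "sys prompt"},
    {"role": "user", "content": "a"},
    {"role": "assistant", "content": "b"},
    {"role": "user", "content": "c"},
    {"role": "tool", "content": "bulky tool log"},
    {"role": "user", "content": "d"},
]

# e.g. push bulky tool output toward the end of the repackable tail
out = repack_messages(history, pinned_count=3,
                      priority=lambda m: 1 if m["role"] == "tool" else 0)
assert out[:3] == history[:3]        # prefix untouched
assert out[-1]["role"] == "tool"     # tool log moved last
```

The stable sort matters: messages with equal priority keep their original relative order, so conversational flow in the tail is disturbed as little as possible.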