Remix.run Logo
olejorgenb 6 days ago

> ... re-ordered to take into account LLM memory patterns.

If I understand you correctly, doesn't this break prefix KV caching?

CuriouslyC 6 days ago | parent [-]

It is done at immediately before the LLM call, transforming the message history for the API call.

This does reduce the context cache hit rate a bit, but I'm cache aware so I try to avoid repacking the early parts if I can help it. The tradeoff is 100% worth it though.

psadri 5 days ago | parent [-]

I’m curious about this project (I’m working on something similar). Anyway to get in contact with you?

CuriouslyC 5 days ago | parent [-]

you can click my spam protected email links on https://sibylline.dev, those should be working now. Any CTA will get me.