vorticalbox 2 hours ago

You want to compact early, though. If you keep sending the whole chat, you can end up with a lot of tokens that aren't in the cache, which (1) costs far more and (2) slows the request down, because all of those tokens have to be processed again.
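
As a rough illustration of the cost point, here is a minimal Python sketch. The price per million tokens and the cache-read multiplier are made-up placeholder numbers, not any provider's real pricing, and `input_cost` is a hypothetical helper, not part of any SDK.

```python
# Placeholder numbers for illustration only; check your provider's pricing.
PRICE_PER_MTOK = 3.00         # assumed price per million uncached input tokens
CACHE_READ_MULTIPLIER = 0.10  # assumed: cached tokens billed at 10% of that


def input_cost(cached_tokens: int, uncached_tokens: int) -> float:
    """Estimate the input cost of one request.

    cached_tokens:   prefix tokens served from the prompt cache
    uncached_tokens: tokens that must be processed from scratch
    """
    per_token = PRICE_PER_MTOK / 1_000_000
    cached_part = cached_tokens * per_token * CACHE_READ_MULTIPLIER
    uncached_part = uncached_tokens * per_token
    return cached_part + uncached_part


# A 500k-token chat where the whole prefix is still cached.
cache_hit = input_cost(cached_tokens=500_000, uncached_tokens=0)

# The same chat after the cache has expired: everything is reprocessed.
cache_miss = input_cost(cached_tokens=0, uncached_tokens=500_000)

# The chat compacted to 50k tokens, also with no cache hit.
compacted_miss = input_cost(cached_tokens=0, uncached_tokens=50_000)

print(f"hit: {cache_hit:.2f}  miss: {cache_miss:.2f}  compacted: {compacted_miss:.2f}")
```

Under these assumed numbers, a single cache miss on the full chat costs ten times as much as a hit, while the compacted chat costs about the same as the hit even when nothing is cached.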

SyneRyder an hour ago | parent [-]

I agree that when I'm using the API rather than the subscription, this would be very costly. I'm not sure why the tokens wouldn't be in the cache, though. It seems everything should be cached as long as I'm within the 1-hour caching window? If I'm wrong about how token caching works, I'm eager to learn!

My other concern is that it isn't really a 1 million token context window if we can only use the first 500k, right? But now that I've found that I can re-enable it, I'm happy.

I've previously had sessions go to 700k tokens and still be okay, though it does start drifting at that 700k point. I'm regularly at 300k with no problem.