We cut our agent's API costs by 10x with prompt caching(kern-ai.com)
2 points by obilgic 10 hours ago | 2 comments
tatrions 3 hours ago | parent | next [-]

Snapping the trim point to segment boundaries instead of a naive sliding window is the real insight here. Most caching setups I've seen break down because the prefix shifts by a few tokens every turn and you lose the whole cache.
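A minimal sketch of what boundary-snapped trimming might look like. All names here (Message, count_tokens, the 4-chars-per-token heuristic) are illustrative, not from the article; the point is just that whole messages get dropped from the front, so the surviving prefix is byte-identical to what the previous call sent:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

def count_tokens(msg: Message) -> int:
    # Stand-in tokenizer: roughly 1 token per 4 characters.
    return max(1, len(msg.content) // 4)

def snap_trim(messages: list[Message], budget: int) -> list[Message]:
    """Drop whole oldest messages until the rest fits the budget.

    A naive sliding window would slice mid-message, shifting the prefix
    by a few tokens each turn and invalidating the provider-side cache.
    Dropping at message boundaries keeps the remaining prefix stable.
    """
    total = sum(count_tokens(m) for m in messages)
    start = 0
    while total > budget and start < len(messages) - 1:
        total -= count_tokens(messages[start])
        start += 1
    return messages[start:]
```

Between trims the prefix doesn't move at all, and when a trim does fire, every later call still shares an exact prefix with its predecessors.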

The multi-step turn savings are what make this really add up though. A single user message triggering 5-6 tool calls means 5-6 API calls where everything before the last tool result is cached. That's where you actually get close to the 10x number.
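Rough arithmetic for that multi-step effect. The prices and token counts below are placeholders I picked for illustration (cache reads at ~10% of the base input rate is typical of Anthropic-style caching, but nothing here is from the article); the model assumes the turn's starting prefix is already cached from the previous turn, which is exactly what the boundary-snapped trimming buys you:

```python
def uncached_cost(prefix, user_msg, steps, growth, rate):
    """Input cost when every API call re-bills the full context."""
    return sum((prefix + user_msg + i * growth) * rate for i in range(steps))

def cached_cost(prefix, user_msg, steps, growth, rate, cache_rate):
    """Input cost when the prefix from the previous turn is already a
    cache hit, and each later call caches everything before the newest
    tool result."""
    cost = prefix * cache_rate + user_msg * rate          # call 0
    for i in range(1, steps):
        hit = prefix + user_msg + (i - 1) * growth        # cached prefix
        cost += hit * cache_rate + growth * rate          # only new tokens at full price
    return cost

# Placeholder numbers: 20k-token history, 200-token user message,
# 6 sequential calls, 500 tokens of tool output per step.
full = uncached_cost(20_000, 200, 6, 500, rate=1.0)
hit = cached_cost(20_000, 200, 6, 500, rate=1.0, cache_rate=0.1)
```

With these made-up inputs the ratio lands around 8x, and it climbs toward the cache discount's ceiling as the history grows relative to the per-step additions.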

One thing I'd add: this pairs well with routing simpler turns to cheaper models entirely. Caching saves you on input tokens, but if the turn is straightforward enough that Sonnet or gpt-4.1-mini can handle it, you save on both input and output. The two approaches are complementary.
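The routing idea can be as dumb as a two-branch gate. Model names and the threshold below are placeholders, not a recommendation from the thread; the point is only that the decision is cheap to make up front:

```python
CHEAP_MODEL = "gpt-4.1-mini"    # placeholder for the inexpensive tier
STRONG_MODEL = "claude-opus-4"  # placeholder for the expensive tier

def pick_model(needs_tools: bool, context_tokens: int) -> str:
    """Route simple, short turns to the cheap model; everything else
    goes to the strong one. Caching then trims input cost on the
    strong-model turns that remain."""
    if not needs_tools and context_tokens < 4_000:
        return CHEAP_MODEL
    return STRONG_MODEL
```

In practice the gate can be any cheap signal: turn length, whether tools are requested, a classifier score. Caching and routing compose because they attack different terms of the bill.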
