We cut our agent's API costs by 10x with prompt caching(kern-ai.com)
2 points by obilgic 10 hours ago | 2 comments
tatrions 3 hours ago | parent | next [-]

Snapping the trim point to segment boundaries instead of a naive sliding window is the real insight here. Most caching setups I've seen break down because the prefix shifts by a few tokens every turn and you lose the whole cache.
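A minimal sketch of what boundary-snapped trimming might look like. All names here (Message, count_tokens, the 4-chars-per-token heuristic) are illustrative, not from the article; the point is just that whole messages get dropped from the front, so the surviving prefix is byte-identical to what the previous call sent:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

def count_tokens(msg: Message) -> int:
    # Stand-in tokenizer: roughly 1 token per 4 characters.
    return max(1, len(msg.content) // 4)

def snap_trim(messages: list[Message], budget: int) -> list[Message]:
    """Drop whole oldest messages until the rest fits the budget.

    A naive sliding window would slice mid-message, shifting the prefix
    by a few tokens each turn and invalidating the provider-side cache.
    Dropping at message boundaries keeps the remaining prefix stable.
    """
    total = sum(count_tokens(m) for m in messages)
    start = 0
    while total > budget and start < len(messages) - 1:
        total -= count_tokens(messages[start])
        start += 1
    return messages[start:]
```

Between trims the prefix doesn't move at all, and when a trim does fire, every later call still shares an exact prefix with its predecessors.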

The multi-step turn savings are what make this really add up though. A single user message triggering 5-6 tool calls means 5-6 API calls where everything before the last tool result is cached. That's where you actually get close to the 10x number.
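Rough arithmetic for that multi-step effect. The prices and token counts below are placeholders I picked for illustration (cache reads at ~10% of the base input rate is typical of Anthropic-style caching, but nothing here is from the article); the model assumes the turn's starting prefix is already cached from the previous turn, which is exactly what the boundary-snapped trimming buys you:

```python
def uncached_cost(prefix, user_msg, steps, growth, rate):
    """Input cost when every API call re-bills the full context."""
    return sum((prefix + user_msg + i * growth) * rate for i in range(steps))

def cached_cost(prefix, user_msg, steps, growth, rate, cache_rate):
    """Input cost when the prefix from the previous turn is already a
    cache hit, and each later call caches everything before the newest
    tool result."""
    cost = prefix * cache_rate + user_msg * rate          # call 0
    for i in range(1, steps):
        hit = prefix + user_msg + (i - 1) * growth        # cached prefix
        cost += hit * cache_rate + growth * rate          # only new tokens at full price
    return cost

# Placeholder numbers: 20k-token history, 200-token user message,
# 6 sequential calls, 500 tokens of tool output per step.
full = uncached_cost(20_000, 200, 6, 500, rate=1.0)
hit = cached_cost(20_000, 200, 6, 500, rate=1.0, cache_rate=0.1)
```

With these made-up inputs the ratio lands around 8x, and it climbs toward the cache discount's ceiling as the history grows relative to the per-step additions.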

One thing I'd add: this pairs well with routing simpler turns to cheaper models entirely. Caching saves you on input tokens, but if the turn is straightforward enough that Sonnet or gpt-4.1-mini can handle it, you save on both input and output. The two approaches are complementary.
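The routing idea can be as dumb as a two-branch gate. Model names and the threshold below are placeholders, not a recommendation from the thread; the point is only that the decision is cheap to make up front:

```python
CHEAP_MODEL = "gpt-4.1-mini"    # placeholder for the inexpensive tier
STRONG_MODEL = "claude-opus-4"  # placeholder for the expensive tier

def pick_model(needs_tools: bool, context_tokens: int) -> str:
    """Route simple, short turns to the cheap model; everything else
    goes to the strong one. Caching then trims input cost on the
    strong-model turns that remain."""
    if not needs_tools and context_tokens < 4_000:
        return CHEAP_MODEL
    return STRONG_MODEL
```

In practice the gate can be any cheap signal: turn length, whether tools are requested, a classifier score. Caching and routing compose because they attack different terms of the bill.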
