naaqq 2 hours ago

DeepSeek’s official API reports a cache hit rate of over 99% if you use it continuously in the same codebase over long sessions, so it’s much cheaper than frontier models. For example, I have a 200M-token session in Claude Code.

halfwhey an hour ago | parent [-]

Might be a dumb question but do you have to read the files in the same order in new sessions to ensure the correct prefix for the cache?

WatchDog 30 minutes ago | parent | next [-]

Yes, you have to use the same session. I guess you could load up a bunch of context and then fork the session into a few different tasks, although I haven't tried it.
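The forking idea above can be sketched like this: build one shared context prefix, then branch it into independent task threads. All the names here (the message shapes, the `fork` helper, the task strings) are hypothetical illustrations, not any real client API; the point is only that every branch starts with byte-identical leading messages, which is exactly what a server-side prefix cache can reuse.

```python
# Hedged sketch of "forking" a session for prefix caching.
# Each branch copies the same shared prefix and appends its own task,
# so a branch's first request is mostly a prefix-cache hit.
import copy

# Shared context loaded once (hypothetical placeholder content).
shared_context = [
    {"role": "user", "content": "Here is the codebase: <files...>"},
    {"role": "assistant", "content": "Understood, I have read the files."},
]

def fork(context, task):
    """Copy the shared prefix and append a task-specific user prompt."""
    branch = copy.deepcopy(context)
    branch.append({"role": "user", "content": task})
    return branch

fix_bug = fork(shared_context, "Fix the off-by-one in parser.py")
add_docs = fork(shared_context, "Write docstrings for utils.py")

# The two branches diverge only after the shared, cacheable prefix.
assert fix_bug[:2] == add_docs[:2]
assert fix_bug[2] != add_docs[2]
```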

weiliddat 34 minutes ago | parent | prev | next [-]

Also curious. With tool calls reading and searching different files, and possibly compaction when working through a large codebase or long threads, I can't imagine how you hit a 99% cache rate.

naaqq 33 minutes ago | parent | prev | next [-]

Sorry, I was wrong here: I meant a single long session. And there's no compaction; the 1M context is only half used.

johndough 31 minutes ago | parent | prev [-]

> do you have to read the files in the same order in new sessions to ensure the correct prefix for the cache?

No, you do not need to re-read files. The entire conversation history is sent to the server with every prompt, so it already contains a copy of the files.
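The mechanism described above can be made concrete with a toy model of a prefix cache. This is a hedged sketch, not any provider's actual implementation: token counting is faked with whitespace splitting, and the cache is modeled as the longest shared token prefix between the previous request and the current one. Because each request resends the full history and only appends to it, nearly all tokens land in the cached prefix.

```python
# Toy model of a server-side prefix cache over chat requests.
# Each request carries the FULL conversation history; only the newly
# appended suffix misses the cache.

def tokens(messages):
    """Flatten a message list into a crude token stream (illustrative only)."""
    return [t for m in messages for t in f"{m['role']}:{m['content']}".split()]

def cache_hit_rate(previous, current):
    """Fraction of the current request's tokens shared with the cached prefix."""
    prev, cur = tokens(previous), tokens(current)
    shared = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        shared += 1
    return shared / len(cur) if cur else 0.0

history = [{"role": "user", "content": "read main.py and summarize it"}]
history.append({"role": "assistant", "content": "main.py defines the entry point"})
first_request = list(history)

# The next turn only appends, so the old history is an exact prefix.
history.append({"role": "user", "content": "now refactor the entry point"})
print(f"hit rate: {cache_hit_rate(first_request, history):.0%}")
```

As the session grows, the appended suffix becomes a smaller and smaller share of each request, which is how very high hit rates are possible in one long session.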