brookst 17 hours ago
Claude Code gets >98% KV cache hits. It’s not reprocessing anything unless you let the cache go cold (after 5 minutes, which is annoyingly short).
killerstorm 16 hours ago
I meant caching at a higher level. If you're an organization with 100 developers each doing 10 sessions a day, you're paying for the tokens of a frequently used document 1,000 times over, even if you had 100% KV cache hits within each session. Apparently that's too costly even for companies with a trillion-dollar market cap... Normally the KV cache works only if the context prefix is identical, but there are papers demonstrating that documents can be cached across different contexts.
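To make the prefix point concrete, here is a minimal sketch, not any provider's actual cache implementation. It assumes a cache that can only reuse the leading tokens two contexts share; the token lists, the function names, and the 20,000-token document size are all made up for illustration.

```python
from typing import Sequence


def shared_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the leading tokens that two contexts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def daily_document_tokens(developers: int, sessions_per_dev: int, doc_tokens: int) -> int:
    """Tokens spent ingesting one document if every session processes it once."""
    return developers * sessions_per_dev * doc_tokens


# Hypothetical contexts: the same document preceded by different per-user text.
doc = ["<doc>", "shared", "design", "notes", "</doc>"]
session_a = ["alice", "preamble"] + doc
session_b = ["bob", "preamble"] + doc

# The document is identical, but the contexts diverge at the first token,
# so a strictly prefix-based cache has nothing to reuse between them.
reusable = shared_prefix_len(session_a, session_b)

# 100 developers x 10 sessions = 1,000 sessions, each ingesting the document.
total = daily_document_tokens(100, 10, 20_000)
```

Under these assumptions `reusable` is 0 and `total` is 20,000,000 tokens per day for a single shared document, which is the gap that cross-context document caching would aim to close.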
beoberha 16 hours ago
I believe OP is talking about new sessions, or sessions after compaction. He’s getting at the fact that LLMs are stateless and have to rediscover your codebase in every new session.
dgellow 6 hours ago
Are you sure that hitting the cache means you’re not paying for those tokens?