acepl 3 hours ago
What is the probability that two customers will have exactly the same tokens in cache? Wouldn't that require using the exact same CLAUDE.md, skills, MCPs and context? Beyond that, it gets even less likely given the nondeterminism of both LLMs and humans.
27183 3 hours ago
I suspect what GP is getting at is that there will be a strong incentive to implement some structural sharing across tenants, to avoid redundantly storing the same tokens over and over. At least, I'd be tempted to do this if I were working with a very precious, constrained resource (e.g. VRAM). Doing this correctly seems... very difficult.

[edit] To answer your question directly: the probability that the entire cache is identical between two different users is very low, but the probability that identical chunks of cache exist between two different users is very high. Exploiting those commonalities successfully would significantly compress the data.
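A minimal sketch of the kind of cross-tenant sharing described above, assuming a KV cache split into fixed-size token blocks (similar in spirit to the prefix caching in servers like vLLM). One caveat worth making explicit: a block's KV entries depend on every token before it, so a chunk is only reusable when the whole prefix up to it matches, not merely the chunk's own tokens. Chaining each block's hash to its parent's hash enforces that. `BLOCK_SIZE`, `block_keys`, and `SharedBlockStore` are hypothetical names for illustration, and the stored values are placeholders rather than real tensors.

```python
from hashlib import sha256

BLOCK_SIZE = 4  # hypothetical; kept tiny so the example is easy to follow


def block_keys(tokens, block_size=BLOCK_SIZE):
    """Return one key per full block, each chained to the key of the block before it."""
    keys = []
    parent = b""
    full_len = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full_len, block_size):
        block = tuple(tokens[start:start + block_size])
        # Mixing in the parent key means equal blocks after different prefixes get different keys.
        key = sha256(parent + repr(block).encode()).digest()
        keys.append(key)
        parent = key
    return keys


class SharedBlockStore:
    """Hypothetical store where requests with a common prefix share its blocks."""

    def __init__(self):
        self.blocks = {}    # key -> placeholder for that block's KV data
        self.refcount = {}  # key -> number of requests using the block (for a future eviction policy)

    def admit(self, tokens):
        """Register a request's blocks, reusing any already stored; return how many were reused."""
        reused = 0
        for key in block_keys(tokens):
            if key in self.blocks:
                reused += 1
            else:
                self.blocks[key] = object()  # stand-in for freshly computed KV data
            self.refcount[key] = self.refcount.get(key, 0) + 1
        return reused
```

With an 8-token "system prompt" shared by two requests, the second request reuses both of its blocks and then diverges, so the store holds four blocks instead of six: the whole caches differ, but the leading chunks are shared.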
dezgeg 3 hours ago
The system prompt for something like Claude Code should be identical, no?
cmrdporcupine 33 minutes ago
Could just be a bug in the radix tree for the KV cache, with deeper (wrong) levels of the trie being returned for the same initial prefix match.
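To make the hypothesized failure concrete, here is a minimal sketch of prefix lookup in a token trie, assuming a simplified per-token trie rather than a true radix tree (which would compress single-child chains into one edge). `PrefixCache`, `insert`, `match_prefix`, and the `(owner, position)` handles are hypothetical stand-ins for real KV memory. The correct behavior is that matching stops at the first token that diverges; a bug of the kind speculated about here would instead return nodes deeper than the actual match, handing one user's cached entries to another.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # token id -> TrieNode
        self.kv_handle = None  # hypothetical handle to the cached KV entry at this position


class PrefixCache:
    """Token-level trie; a real radix tree would merge single-child chains into one edge."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens, owner):
        node = self.root
        for pos, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                child = node.children[tok] = TrieNode()
            if child.kv_handle is None:
                child.kv_handle = (owner, pos)  # stand-in for real KV memory
            node = child

    def match_prefix(self, tokens):
        """Return KV handles for the longest cached prefix of `tokens`, and nothing past it."""
        node, handles = self.root, []
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break  # divergence: anything deeper belongs to some other sequence
            node = child
            handles.append(node.kv_handle)
        return handles
```

If user A's sequence `[1, 2, 3, 4, 5]` is cached and user B asks for `[1, 2, 3, 9, 9]`, B should get A's handles for the three shared positions only; returning the entries for `4, 5` would be exactly the "deeper, wrong levels" failure.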