BUILD AI has a post about this, in particular about sharding the KV cache across GPUs and how the network is the new memory hierarchy:
https://buildai.substack.com/p/kv-cache-sharding-and-distrib...