himata4113 2 hours ago
KV cache residency requirements and the low latency needed for good throughput both favor tight locality, but you're right that it could be split across different parts of a single datacenter. As I mentioned before, though, the weakest link comes before the model is ever loaded onto the GPUs. As for reverse engineering, I doubt it's something state-sponsored actors would struggle with for too long.