veunes | 9 hours ago
Bingo. Even if some magic drops tomorrow that compresses the KV cache down to literally zero bits, the saved VRAM will instantly be swallowed up by bumping the batch size or pushing the context window to 10 million tokens. There is no such thing as "excess memory" in ML, only under-trained models.
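The arithmetic behind that claim: KV cache memory scales linearly with both batch size and context length, so any savings are exactly one multiplier away from being spent again. A rough back-of-envelope sketch (model dimensions assumed, roughly Llama-2-7B-like, fp16):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # Factor of 2 covers keys AND values; one entry per layer, head, and position.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Assumed dims: 32 layers, 32 KV heads, head_dim 128, 4k context, fp16 (2 bytes).
base = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1)   # 2 GiB
big = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=32)   # 64 GiB

print(base / 2**30, "GiB at batch 1")
print(big / 2**30, "GiB at batch 32")
```

So a 10x compression of the cache buys exactly one order of magnitude of batch or context, and then you are memory-bound again.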