That math (250k context, Q4 model, 24 GB VRAM) only checks out if the K/V cache is also quantized to q4, which is probably not the best idea: a 4-bit K/V cache tends to cost more output quality than q8 or f16.
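
For anyone who wants to check the arithmetic themselves, here is a rough sketch of the usual K/V cache size estimate. The model shape below (48 layers, 8 KV heads, head dim 128) is a made-up placeholder, not the model from this thread, and the bit widths are nominal, so plug in your own numbers.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_elem):
    """Approximate K/V cache size in bytes.

    Counts one K and one V vector per layer, per KV head, per token.
    Ignores per-block scale overhead that real quantized formats add.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bits_per_elem // 8


GIB = 1024 ** 3

# Hypothetical model shape, for illustration only (assumption, not from the thread).
HYPOTHETICAL_SHAPE = dict(n_layers=48, n_kv_heads=8, head_dim=128)

for name, bits in [("f16", 16), ("q8", 8), ("q4", 4)]:
    size = kv_cache_bytes(context_len=250_000, bits_per_elem=bits, **HYPOTHETICAL_SHAPE)
    print(f"{name} cache at 250k context: {size / GIB:.1f} GiB")
```

With this assumed shape the cache alone comes to roughly 46 GiB at f16, 23 GiB at q8, and 11 GiB at q4, so only the q4 cache leaves room for Q4 weights on a 24 GB card. Real quantized formats store per-block scales, so actual sizes run slightly higher.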