hypfer 5 hours ago

That math (250k context, Q4 model, 24GB VRAM) only checks out if the K/V cache is also quantized to q4, which is probably not the best idea.
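A rough sketch of why the cache precision matters so much at this context length. The architecture numbers (48 layers, 4 KV heads, head_dim 128) are hypothetical placeholders, not any specific model. The q8_0/q4_0 sizes assume llama.cpp-style blocks of 32 values plus one fp16 scale:

```python
# Rough KV-cache size estimate. The architecture numbers used below are
# HYPOTHETICAL placeholders, not taken from any specific model.

# Bits per cached element. q8_0 / q4_0 assume llama.cpp-style blocks
# (32 quantized values plus one fp16 scale per block).
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 8.5,   # (32 * 8 + 16) / 32
    "q4_0": 4.5,   # (32 * 4 + 16) / 32
}


def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, cache_type):
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BITS_PER_ELEMENT[cache_type] / 8


if __name__ == "__main__":
    # Hypothetical GQA model: 48 layers, 4 KV heads, head_dim 128.
    for cache_type in BITS_PER_ELEMENT:
        gib = kv_cache_bytes(250_000, 48, 4, 128, cache_type) / 2**30
        print(f"{cache_type:>5}: {gib:5.1f} GiB")
```

With those made-up numbers, the f16 cache alone comes to about 22.9 GiB, q8_0 about 12.2 GiB, and q4_0 about 6.4 GiB. That leaves room for the Q4 weights on a 24GB card only in the q4 case.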