Remix clone Hacker News


	hypfer 5 hours ago
		That math (250k context, Q4 model, 24GB VRAM) only checks out if the K/V cache is also quantized to q4, which is probably not the best idea.
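For anyone who wants to check this kind of claim themselves, here is a rough sketch of the usual K/V cache size estimate. The thread does not say which model is being discussed, so the layer count, KV head count, and head dimension below are made-up placeholder values, not the real model's. The estimate also ignores the per-block scale overhead that real quantized cache formats add, so actual numbers would be somewhat higher.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_element):
    """Approximate K/V cache size in bytes.

    Factor of 2 covers the separate K and V tensors per layer.
    Quantization block overhead (scales etc.) is ignored.
    """
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elements * bits_per_element / 8


# Hypothetical model shape, chosen only for illustration (assumption).
N_LAYERS = 48
N_KV_HEADS = 8
HEAD_DIM = 128
CONTEXT_LEN = 250_000  # the 250k context from the comment

for name, bits in [("f16", 16), ("q8", 8), ("q4", 4)]:
    size_gb = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, CONTEXT_LEN, bits) / 1e9
    print(f"{name}: {size_gb:.1f} GB")
```

With these placeholder dimensions, the f16 cache alone is about twice the 24GB budget and the q8 cache already fills it, so only the q4 cache leaves room for model weights. Swap in the real model's dimensions to see whether the same holds there.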