	▲	Tepix 3 days ago
		In general, if all you do is inference with a model that fits entirely in VRAM, you’re right. OTOH, it’s simply a matter of picking the right mainboard. If you have one of those sweet new MoE models that won’t completely fit in your VRAM, part of it gets offloaded to system RAM, and then you want PCIe bandwidth, because the link becomes the bottleneck. Swapping between LLMs will also be faster.
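		To put rough numbers on that, here is a back-of-the-envelope sketch. The per-lane figures are the approximate theoretical per-direction rates for each PCIe generation. The 40 GB model size and the `transfer_seconds` helper are made up for illustration, and it assumes the weights already sit in system RAM, so disk speed is ignored. Real throughput will be lower.

```python
# Approximate theoretical PCIe bandwidth per lane, per direction, in GB/s.
# Real-world throughput is lower because of protocol and driver overhead.
PCIE_GBPS_PER_LANE = {3: 0.985, 4: 1.969, 5: 3.938}


def transfer_seconds(size_gb: float, gen: int, lanes: int) -> float:
    """Best-case time to move size_gb gigabytes over a PCIe link."""
    link_gbps = PCIE_GBPS_PER_LANE[gen] * lanes
    return size_gb / link_gbps


if __name__ == "__main__":
    model_gb = 40.0  # hypothetical model size, not from the thread
    for gen, lanes in [(3, 4), (3, 16), (4, 16), (5, 16)]:
        secs = transfer_seconds(model_gb, gen, lanes)
        print(f"PCIe {gen}.0 x{lanes}: {secs:.2f} s to move {model_gb:.0f} GB")
```

		On those assumptions, a slot wired as PCIe 3.0 x4 needs roughly 10 s to move the whole model, while a 5.0 x16 slot is well under 1 s. With offloading, that gap is paid again whenever offloaded weights have to cross the bus, not just once at load time.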