Well, their huge GPU clusters have "insane VRAM". Once you can actually fit the whole model in GPU memory without offloading to CPU RAM or disk, inference isn't all that computationally expensive for the most part.
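As a rough illustration (a minimal back-of-envelope sketch, not a precise sizing), here's the arithmetic for just holding the weights in VRAM, ignoring KV cache, activations, and framework overhead:

```python
# Rough estimate: VRAM needed just to hold the model weights.
# Ignores KV cache, activations, and framework overhead.
def weight_vram_gb(n_params_billion: float, bytes_per_param: float) -> float:
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

for name, n_b in [("7B", 7), ("70B", 70), ("405B", 405)]:
    fp16 = weight_vram_gb(n_b, 2.0)   # 16-bit weights
    q4 = weight_vram_gb(n_b, 0.5)     # ~4-bit quantized
    print(f"{name}: ~{fp16:.0f} GB at fp16, ~{q4:.0f} GB at 4-bit")
```

A 70B model at fp16 needs on the order of 130 GB for weights alone, which a single 24 GB consumer card can only reach by offloading, while a multi-GPU server node with 80 GB cards holds it entirely in VRAM and serves tokens without that bottleneck.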