Remix clone Hacker News


sosodev 2 days ago
What models are you testing? A 120B model with hybrid attention should fit within 80 GB of VRAM at a 4-bit quant. Also, well-done 4-bit quants are generally fine; they certainly don’t make the model unusable.
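The "fits in 80 GB" claim follows from back-of-the-envelope arithmetic: weight memory is roughly parameters × bits per weight ÷ 8. Below is a minimal sketch of that estimate. It assumes decimal gigabytes (1 GB = 1e9 bytes) and treats the group-wise overhead (one 16-bit scale per 32 weights) as a hypothetical example of quantization metadata, not a specific format. It does not model the KV cache, which is what hybrid attention mainly shrinks; that cache has to fit in whatever headroom is left.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 120e9  # "120b" parameters, as in the comment
VRAM_GB = 80.0    # available VRAM, as in the comment

# Plain 4-bit weights, ignoring any quantization metadata.
plain = weight_memory_gb(N_PARAMS, 4.0)

# Assumption (illustrative only): group-wise quantization with one 16-bit
# scale per 32 weights adds 16 / 32 = 0.5 bits of overhead per weight.
grouped = weight_memory_gb(N_PARAMS, 4.0 + 16 / 32)

for label, gb in [("plain 4-bit", plain), ("4-bit + group scales", grouped)]:
    headroom = VRAM_GB - gb
    print(f"{label}: {gb:.1f} GB weights, "
          f"{headroom:.1f} GB left for KV cache and activations")
```

Under these assumptions the weights take about 60 GB (plain) or 67.5 GB (with scales), leaving roughly 12.5 to 20 GB for everything else. That margin is tight but workable when the attention layers keep the KV cache small.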