	vibe42 7 hours ago
		This is orthogonal to quantisation. It could have a big impact on smaller models in the 4B-14B range, where people often try specific quants and context sizes to fit into the VRAM of a laptop or desktop GPU.
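
		For anyone wanting to sanity-check that kind of fit, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the model shape (8B parameters, 32 layers, 8 KV heads, head dimension 128) is hypothetical, the KV cache is assumed to be fp16, and the 1 GiB overhead is a guess. Real runtimes will differ.

```python
GIB = 2**30


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantised weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB (factor 2 = keys plus values)."""
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / GIB


def fits(vram_gib: float, n_params: float, bits_per_weight: float,
         n_layers: int, n_kv_heads: int, head_dim: int,
         context_len: int, overhead_gib: float = 1.0):
    """Return (fits?, estimated total GiB). overhead_gib is a guessed allowance."""
    total = (weights_gib(n_params, bits_per_weight)
             + kv_cache_gib(n_layers, n_kv_heads, head_dim, context_len)
             + overhead_gib)
    return total <= vram_gib, total


if __name__ == "__main__":
    # Hypothetical 8B model at 4 bits per weight on an 8 GiB GPU.
    for ctx in (8192, 32768):
        ok, total = fits(8, 8e9, 4, 32, 8, 128, ctx)
        print(f"context={ctx}: ~{total:.2f} GiB, fits={ok}")
```

		With these made-up numbers the weights stay fixed while the KV cache grows linearly with context, which is why quant choice and context size end up being traded against each other on small GPUs.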