Remix clone Hacker News

	arcanemachiner 3 hours ago
		The easiest way would be to quantize the model and serve different quants based on current demand. Higher volume == more aggressive quant == more customers served per GPU, at the cost of output quality.
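		To make the idea concrete, here is a rough sketch of what demand-based quant selection could look like. Everything in it is an assumption for illustration: the tier names (`fp16`, `int8`, `int4`), the load thresholds, and the `pick_quant` helper are hypothetical, not anything a real provider is known to use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantTier:
    name: str        # hypothetical label for a quantized build of the model
    max_load: float  # serve this tier while load <= max_load


# Hypothetical tiers, ordered from best quality to most aggressive quantization.
# The thresholds are made up for illustration.
TIERS = [
    QuantTier("fp16", 0.50),
    QuantTier("int8", 0.80),
    QuantTier("int4", float("inf")),
]


def pick_quant(active_requests: int, gpu_capacity: int) -> str:
    """Return the name of the first tier whose load threshold is not exceeded."""
    if gpu_capacity <= 0:
        raise ValueError("gpu_capacity must be positive")
    load = active_requests / gpu_capacity
    for tier in TIERS:
        if load <= tier.max_load:
            return tier.name
    return TIERS[-1].name  # unreachable with an infinite last threshold; kept as a guard
```

		With these made-up thresholds, light traffic gets the full-precision build and the server falls back to smaller quants only as load climbs, which is the "more customers per GPU" trade-off described above.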