singhrac 3 days ago

I think we might just disagree about how much of the GPU spend goes to small vs. large models (inference or training). My sense is that something like 99.9% of the spending interest is in models that don't fit into 128 GB (remember, the KV cache matters too). Happy to be proven wrong! A rough back-of-the-envelope estimate is sketched below.
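
As a hedged illustration of why 128 GB is a meaningful cutoff, here is a rough memory estimate for a hypothetical 70B-parameter model served in fp16 with a grouped-query-attention layout. All of the configuration numbers (layer count, KV heads, head dim, context length) are assumptions for the sketch, not a claim about any specific model:

```python
# Illustrative memory estimate: weights + KV cache for a hypothetical
# 70B-parameter model in fp16. All configuration numbers are assumptions.

def weight_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Weight memory in GB for fp16/bf16 parameters."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem / 1e9

# Hypothetical 70B-class configuration with grouped-query attention.
weights = weight_memory_gb(70)                        # ~140 GB of weights alone
kv = kv_cache_gb(n_layers=80, n_kv_heads=8,
                 head_dim=128, seq_len=32_000, batch=1)  # ~10 GB for one 32k-token sequence
print(f"weights ~{weights:.0f} GB, KV cache ~{kv:.0f} GB, total ~{weights + kv:.0f} GB")
```

Under these assumptions the weights alone already exceed 128 GB before any KV cache or batching, which is the point about large-model spend not fitting on a single 128 GB device.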