Remix clone Hacker News

	▲	Ritewut 6 hours ago
		Tokens per second. In practical usage, the difference between 8B and something like 16B is not as big as you might think, and 8B is a lot faster and more interactive. There are still certain tasks where it is useful to farm the work out to the larger model.
	▲	Natalia724 5 hours ago
		Agree. For local coding help, latency often matters more than raw benchmark quality. A slightly weaker model that answers immediately changes how often you reach for it.