	zozbot234 40 minutes ago
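		Speculative decoding is not that useful at scale; it's mostly about making local single-user inference faster. Its speedup comes from verifying several draft tokens in a single forward pass, so the model weights are read once for many tokens. When you're batching multiple inferences together, each pass already spreads that weight read over many tokens, so it's already as fast as the verification you have to perform with speculative decoding.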
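
		To put rough numbers on that, here is a toy roofline-style sketch. It is not a benchmark. Everything in it is an assumption for illustration: the hardware figures, the model size, the i.i.d. per-token acceptance rate, and the choice to ignore draft-model cost and KV-cache traffic. The function names are hypothetical.

```python
# Toy cost model: a forward pass takes as long as the slower of
# (a) streaming the weights from memory once, or
# (b) doing the matmul work for every token in the pass.
# All constants are illustrative assumptions, not measurements.

WEIGHT_BYTES = 14e9      # assumed: ~7B parameters in fp16
MEM_BW = 1e12            # assumed: 1 TB/s memory bandwidth
FLOPS_PER_TOKEN = 14e9   # assumed: ~2 FLOPs per parameter per token
PEAK_FLOPS = 2e14        # assumed: 200 TFLOP/s usable compute


def step_time(tokens_in_pass: int) -> float:
    """Seconds for one forward pass over `tokens_in_pass` tokens."""
    memory_time = WEIGHT_BYTES / MEM_BW
    compute_time = tokens_in_pass * FLOPS_PER_TOKEN / PEAK_FLOPS
    return max(memory_time, compute_time)


def expected_tokens(k: int, accept: float) -> float:
    """Expected tokens emitted per verification pass with k draft tokens,
    assuming each draft token is accepted independently with prob `accept`."""
    if accept >= 1.0:
        return float(k + 1)
    return (1.0 - accept ** (k + 1)) / (1.0 - accept)


def speedup(batch: int, k: int, accept: float) -> float:
    """Throughput of speculative decoding relative to plain batched decoding.
    Draft-model cost and KV-cache reads are ignored (optimistic for spec)."""
    plain = batch / step_time(batch)
    spec = batch * expected_tokens(k, accept) / step_time(batch * (k + 1))
    return spec / plain


if __name__ == "__main__":
    for batch in (1, 8, 64, 256):
        print(f"batch={batch:4d}  speedup={speedup(batch, k=4, accept=0.8):.2f}x")
```

		Under these assumptions the single-user case is memory-bound, so verifying 5 tokens costs about the same as decoding 1 and the speedup is large. Once the batch is big enough to be compute-bound, the extra verified tokens are paid for in full and the advantage shrinks or goes negative.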