Remix clone Hacker News

	ac29 3 hours ago
		> I'm curious what the downside for this speed is here

		"DiffusionGemma's speedup is designed for local and low-concurrency inference. In high-QPS cloud serving, autoregressive models can be deployed to saturate compute efficiently, so DiffusionGemma's parallel decoding offers diminishing returns and can result in higher serving costs"