	▲	GaggiX 2 hours ago
		Well, with a standard autoregressive model you can generate, for example, 256 tokens in one forward pass if you have 256 users (one token each). With this approach you can generate 256 tokens for a single user, but you need several forward steps to do it. So the diffusion process takes more GFLOPs for the same number of tokens, and if you have enough users, batching already lets you balance memory and compute.
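		Rough back-of-the-envelope sketch of that comparison. Everything here is an assumption for illustration: the 7B parameter count, the 8 denoising steps, and the rule of thumb of about 2 * params FLOPs per token position per forward pass (which ignores attention cost). The function names are made up too.

```python
# Hypothetical numbers, chosen only to illustrate the argument.
PARAMS = 7_000_000_000   # assumed model size
USERS = 256              # concurrent users in the autoregressive batch
BLOCK = 256              # tokens generated at once for one diffusion user
STEPS = 8                # assumed number of denoising forward steps


def autoregressive(params, users, tokens_per_user):
    """Batched decoding with a KV cache: each pass emits one token per user."""
    passes = tokens_per_user
    tokens = users * tokens_per_user
    flops = 2 * params * tokens  # rule-of-thumb cost, attention ignored
    return {"passes": passes, "tokens": tokens, "flops": flops,
            "tokens_per_pass": tokens / passes}


def diffusion(params, block_tokens, steps):
    """Single user: every denoising step reprocesses the whole block."""
    passes = steps
    tokens = block_tokens
    flops = 2 * params * block_tokens * steps
    return {"passes": passes, "tokens": tokens, "flops": flops,
            "tokens_per_pass": tokens / passes}


ar = autoregressive(PARAMS, USERS, tokens_per_user=1)
dif = diffusion(PARAMS, BLOCK, STEPS)

# Each pass reads the weights once, so tokens per pass is a rough
# proxy for how well the memory traffic is amortized.
print("AR tokens per pass:       ", ar["tokens_per_pass"])
print("Diffusion tokens per pass:", dif["tokens_per_pass"])
print("Diffusion / AR FLOPs:     ", dif["flops"] / ar["flops"])
```

		Under these assumptions both cases produce 256 tokens, but the diffusion case spends STEPS times the compute and reads the weights STEPS times instead of once. Compared with a single unbatched autoregressive user (256 passes for 256 tokens) it still reads the weights far less often, which is the trade being discussed.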
	▲	minimaxir 2 hours ago | parent
		Batching is a fair counterpoint.