Remix clone Hacker News


	embedding-shape 3 hours ago
		> They're likely to not be feasibly scalable far beyond the 26B size of DiffusionGemma itself

		I think people used to say the same about the 8B text-diffusion models, like LLaDA, when they came out. LLaDA2.0 seemingly claims a 100B total / 6.1B active MoE diffusion model (DiffusionGemma is also MoE). I'm not saying you're wrong about the current consensus, but consensus has a way of changing over time. It might be a bit early to claim it's infeasible to scale them, especially considering that the final artifact is much more suitable for local usage.