Remix clone Hacker News


shawntan a month ago
I'm curious how the speed is achieved if this is the technique used. I generally expected this "masked language model" technique to be far slower, since the full vocab projection needs to be computed at every iteration. I always thought the eventual technique would be some form of diffusion in continuous space, followed by decoding into discrete tokens. Also, I'm guessing this is a "best guess" of how Gemini Diffusion is done?
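To make the cost trade-off concrete, here is a toy sketch of masked-diffusion decoding. It assumes a MaskGIT-style schedule that commits the most confident predictions at each step, and the "model" is a hypothetical random stand-in (`toy_logits`). The names `MASK`, `toy_logits`, and `masked_diffusion_decode`, and the fixed tokens-per-step schedule, are all illustrative assumptions, not a description of how Gemini Diffusion actually works. The sketch counts how many per-position vocabulary logit rows get computed.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a still-masked position


def toy_logits(tokens, vocab_size, rng):
    # Hypothetical stand-in for a bidirectional transformer: it returns one
    # row of vocab_size logits for EVERY position, so in this sketch the full
    # vocab projection is computed over the whole sequence on each call.
    return [[rng.gauss(0.0, 1.0) for _ in range(vocab_size)] for _ in tokens]


def top1_with_prob(row):
    # Return (argmax token id, its softmax probability) for one logit row.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    best = max(range(len(row)), key=lambda i: row[i])
    return best, exps[best] / sum(exps)


def masked_diffusion_decode(length, vocab_size, steps, seed=0):
    # Start fully masked and unmask a fixed number of positions per step
    # (an assumed schedule; real systems may use other schedules).
    rng = random.Random(seed)
    tokens = [MASK] * length
    rows_computed = 0  # total (position x vocab) logit rows produced
    per_step = math.ceil(length / steps)
    for _ in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        logits = toy_logits(tokens, vocab_size, rng)  # one forward pass
        rows_computed += len(logits)
        # Score every still-masked position by its top-1 confidence.
        scored = []
        for i in masked:
            tok, p = top1_with_prob(logits[i])
            scored.append((p, i, tok))
        scored.sort(reverse=True)
        # Commit only the most confident predictions; the rest stay masked.
        for _, i, tok in scored[:per_step]:
            tokens[i] = tok
    return tokens, rows_computed
```

With `masked_diffusion_decode(32, 50, 8)` the loop makes 8 sequential forward passes but computes 8 x 32 = 256 vocabulary rows, whereas a KV-cached autoregressive decoder would compute roughly 32 rows over 32 sequential passes. In this sketch, any speedup comes from needing fewer sequential passes, which parallel hardware can exploit, rather than from doing less total work.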