shawntan a day ago

I'm curious how the speed is achieved if this is the technique used. I would have expected this "masked language model" technique to be far slower, since the full vocab projection needs to be computed every iteration.
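To make the concern concrete, here's a rough back-of-envelope sketch of just the output-projection cost. It assumes every position is projected to the full vocab at every denoising iteration (an implementation could well project only the still-masked positions), and all function names and sizes are hypothetical, not taken from any real model.

```python
def ar_projection_macs(seq_len: int, d_model: int, vocab: int) -> int:
    """Multiply-accumulates for the vocab projection in autoregressive decoding.

    Assumes one position is projected per step, for seq_len steps.
    """
    return seq_len * d_model * vocab


def masked_projection_macs(seq_len: int, d_model: int, vocab: int, steps: int) -> int:
    """Multiply-accumulates for the vocab projection in iterative masked decoding.

    Assumes ALL seq_len positions are projected at each of `steps` iterations.
    """
    return steps * seq_len * d_model * vocab


# Hypothetical sizes, for illustration only.
seq_len, d_model, vocab, steps = 1024, 4096, 256_000, 64

ar = ar_projection_macs(seq_len, d_model, vocab)
masked = masked_projection_macs(seq_len, d_model, vocab, steps)

# Under these assumptions the ratio is exactly the number of iterations.
ratio = masked // ar
print(f"projection cost ratio (masked / autoregressive): {ratio}x")
```

This only counts the projection and ignores attention, the KV cache, and the fact that each masked iteration is one parallel pass, so it says nothing definitive about wall-clock speed.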

I always thought the eventual technique would be some form of diffusion in continuous space, then decoding into the discrete tokens.

Also I'm guessing this is a "best guess" of how Gemini Diffusion is done?