quantadev 7 months ago
Anyone able to summarize the current 'hold up' with diffusion models? I know exactly how Transformers work, but I'm not a diffusion expert. From what I know, diffusion is so much more powerful that it seems like it should already be beating Transformers. Why isn't it?
boroboro4 7 months ago
Diffusion is about what goes into the model and what comes out (in this case, denoising of the content), as opposed to autoregressive models (where the process is to predict a continuation from a prefix). It's orthogonal to the model architecture, which can be a transformer or (for example) Mamba; I'm pretty sure Gemini Diffusion is a transformer too. Diffusion brings a different set of trade-offs: as you can see it improves speed, but I would expect it to increase the compute required for generation. That's hard to say for sure without knowing their exact sampling process, though. Interestingly, we see the opposite direction with GPT-4o: OpenAI made an autoregressive image generation model, and it seems to work great.
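To make the contrast concrete, here's a minimal sketch of the two sampling loops. Everything here is illustrative: `toy_model` is a dummy stand-in for the trained network (which could be a transformer, Mamba, etc.), and the masked-denoising schedule is one simple choice, not Gemini's actual sampling process. The point is that autoregression spends one forward pass per token, while diffusion spends one forward pass per denoising step but predicts every position each time.

```python
import random

VOCAB = list(range(100))  # toy vocabulary of token ids
MASK = -1                 # hypothetical mask token for discrete diffusion

def toy_model(tokens):
    """Stand-in for a trained network. Returns a 'predicted' token id
    for every position; here it's just random."""
    return [random.choice(VOCAB) for _ in tokens]

def autoregressive_sample(prefix, n_new):
    """Left-to-right: one forward pass per generated token."""
    seq = list(prefix)
    for _ in range(n_new):
        preds = toy_model(seq)
        seq.append(preds[-1])  # take the prediction for the next position
    return seq

def diffusion_sample(length, n_steps):
    """Masked-denoising style: start fully masked, then commit a chunk of
    positions per step, so all positions are predicted in parallel."""
    seq = [MASK] * length
    masked = list(range(length))
    per_step = max(1, length // n_steps)
    while masked:
        preds = toy_model(seq)  # one pass denoises every position at once
        chosen, masked = masked[:per_step], masked[per_step:]
        for i in chosen:
            seq[i] = preds[i]   # commit only a few positions per step
    return seq

print(autoregressive_sample([1, 2, 3], n_new=8))  # 8 forward passes
print(diffusion_sample(length=8, n_steps=4))      # ~4 forward passes
```

Fewer sequential steps is where the speed comes from, but each diffusion pass computes predictions for the whole sequence, which is why total compute per generation can end up higher depending on the step count.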