▲ | boroboro4 2 days ago
Diffusion is about what goes into the model and what comes out (in this case, denoising the content), as opposed to autoregressive models (where the process is to predict a continuation from a prefix). It's orthogonal to model architecture, which can be a transformer or (for example) Mamba; I'm pretty sure Gemini Diffusion is a transformer too. Diffusion brings a different set of trade-offs: as you can see it improves speed, but I would expect it to increase the compute required for generation. That's hard to say for sure without knowing their exact sampling process. Interestingly, we see the opposite direction in the case of GPT-4o: OpenAI made an autoregressive image generation model, and it seems to work great.
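A minimal sketch of that structural difference, with hypothetical stub functions standing in for the actual forward passes (predict_next and denoise_step are not any real API):

    import random

    VOCAB = list(range(100))

    def predict_next(prefix):        # stub for one autoregressive forward pass
        return random.choice(VOCAB)

    def denoise_step(tokens, step):  # stub for one diffusion refinement pass
        return [random.choice(VOCAB) for _ in tokens]

    # Autoregressive: one forward pass per generated token, strictly sequential.
    def generate_ar(n_tokens):
        out = []
        for _ in range(n_tokens):
            out.append(predict_next(out))
        return out

    # Diffusion: start from "noise" and run a fixed number of passes,
    # each over the whole sequence at once.
    def generate_diffusion(n_tokens, n_steps=8):
        tokens = [random.choice(VOCAB) for _ in range(n_tokens)]
        for step in range(n_steps):
            tokens = denoise_step(tokens, step)
        return tokens

Either loop could sit on top of a transformer or a Mamba-style backbone; only the outer sampling process differs.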
▲ | atq2119 2 days ago
Diffusion could potentially be more efficient for local inference. With autoregressive models, generation is basically one token at a time, so it's not compute-intensive at all; it's bandwidth bound. With diffusion, you always run the model on a decently sized batch of tokens, so you should be (close to) compute bound even for local inference. If the output quality per unit of compute is roughly the same for diffusion and autoregression (is it? I have no idea...), then diffusion will be much more efficient for local inference, because the same amount of compute can be packed into a much shorter time period.
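A rough back-of-envelope sketch of that argument; all the numbers below (8B params at fp16, ~1 TB/s of memory bandwidth, ~100 TFLOP/s of usable compute, the block size and step count) are assumptions, not measurements:

    # Illustrative assumptions, not measurements.
    params      = 8e9        # assumed model size (parameters)
    bytes_per_p = 2          # fp16 weights
    bandwidth   = 1e12       # assumed ~1 TB/s memory bandwidth (consumer GPU)
    flops_peak  = 100e12     # assumed ~100 TFLOP/s of usable compute

    # Autoregressive decoding: each token reads all weights once -> bandwidth bound.
    ar_tok_per_s = bandwidth / (params * bytes_per_p)        # ~62 tok/s

    # Diffusion: each denoising pass runs over the whole block, so total cost is
    # roughly (FLOPs per token) * block * steps, and the chip can stay compute bound.
    flops_per_token = 2 * params                             # ~2 FLOPs per param per token
    block, steps    = 1024, 32                                # assumed sampler settings
    diff_time       = flops_per_token * block * steps / flops_peak
    diff_tok_per_s  = block / diff_time                       # ~195 tok/s

    print(round(ar_tok_per_s), round(diff_tok_per_s))

With these assumed numbers the autoregressive path tops out around 60 tok/s while the diffusion path approaches 200 tok/s, but the crossover depends entirely on how many denoising steps the sampler actually needs.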