angusturner | 2 days ago
One underappreciated / misunderstood aspect of these models is that they use more total compute than an equivalently sized autoregressive model. The difference is that for N tokens, the autoregressive model has to make N sequential steps, whereas diffusion does K × N token evaluations, with the N done in parallel and K << N. This makes me wonder how well they will scale to many users, since batching requests would presumably saturate the accelerators much faster. Although I guess it depends on the exact usage patterns. Anyway, very cool demo nonetheless.
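To make the arithmetic concrete, here is a back-of-envelope sketch with made-up numbers (N and K are illustrative assumptions, not figures from the demo):

```python
# Hypothetical numbers: generating N tokens with an autoregressive (AR)
# model vs. a diffusion model that runs K denoising steps over the sequence.
N = 1024  # tokens to generate (assumed)
K = 32    # diffusion denoising steps, with K << N (assumed)

# AR: one forward pass per token -> N sequential steps.
# With KV caching, each pass evaluates roughly one new token position.
ar_sequential_steps = N
ar_token_evals = N

# Diffusion: K full-sequence passes, each touching all N tokens in parallel.
diff_sequential_steps = K
diff_token_evals = K * N

print(f"sequential steps:  AR={ar_sequential_steps}, diffusion={diff_sequential_steps}")
print(f"token evaluations: AR={ar_token_evals}, diffusion={diff_token_evals} "
      f"({diff_token_evals // ar_token_evals}x)")
```

So diffusion wins on latency (32 vs 1024 sequential steps here) but spends K times the per-token compute, which is exactly why heavy batching could saturate accelerators sooner.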