| ▲ | GaggiX 2 hours ago | |
Well, with a standard autoregressive model you can generate, for example, 256 tokens at once if you have 256 users. With this approach you can generate 256 tokens for a single user, but you need several forward steps. So the diffusion process takes more GFLOPs; if you have enough users, you can already balance memory and compute through batching. | ||
| ▲ | minimaxir 2 hours ago | |
Batching is a fair counterpoint. | ||
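
To put rough numbers on the trade-off GaggiX describes, here is a back-of-the-envelope sketch. It assumes every position costs the same per forward pass (ignoring attention over the context and KV caching), and the step count of 8 is a hypothetical value chosen for illustration, not a figure from the thread. The function name is also made up for this example.

```python
def positions_per_token(tokens_out: int, positions_per_pass: int, passes: int) -> float:
    """Positions pushed through the model per generated token (a crude compute proxy)."""
    return positions_per_pass * passes / tokens_out


# Autoregressive, 256 users batched: one forward pass yields one new token per user.
ar_cost = positions_per_token(tokens_out=256, positions_per_pass=256, passes=1)

# Diffusion, single user: the same 256 tokens are refined over several forward passes.
DENOISE_STEPS = 8  # hypothetical step count, an assumption for this sketch
diffusion_cost = positions_per_token(
    tokens_out=256, positions_per_pass=256, passes=DENOISE_STEPS
)

print(f"autoregressive (batched): {ar_cost} positions per token")
print(f"diffusion (single user):  {diffusion_cost} positions per token")
```

Under these assumptions both setups load the weights once per pass and fill the same 256 positions, but the diffusion run spends roughly `DENOISE_STEPS` times the compute per output token. Its advantage is that all 256 tokens go to one user instead of being spread across 256 users.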