▲ | benob 2 days ago | |
I guess autoregressive LLMs can be fine-tuned (or continually pretrained) to do inference using diffusion. We've seen a recent paper (which I don't remember) that trained one from scratch, but that seems like overkill. Does Google say how they did it? Also, does diffusion have the potential to speed up CPU-only inference? |