macawfish 2 days ago
I'm probably not understanding your point, but did you look at the paper? It explicitly does diffusion in an autoencoded latent space of the autoregressive prediction itself. The starting point is that prediction, but depending on how much noise is used, the diffusion model itself contributes directly to the prediction process to a greater or lesser degree. It should be trivial to make an encoder that has some memory of at least part of the prompt (say, the trailing part) and do a diffusion step there too.
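
To make the noise-level point concrete, here is a toy sketch. It is not the paper's method: the linear autoencoder (`encode`, `decode`), the stand-in `denoise` function, the `refine` helper, and all dimensions are hypothetical and chosen only for illustration. It shows that with no noise the output is just the reconstructed autoregressive prediction, while at full noise the output is determined entirely by the denoiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "autoencoder": W has orthonormal columns (8-dim -> 4-dim latent).
W = np.linalg.qr(rng.normal(size=(8, 4)))[0]

def encode(x):
    """Map an 8-dim prediction into the 4-dim latent space."""
    return x @ W

def decode(z):
    """Map a 4-dim latent back to the 8-dim prediction space."""
    return z @ W.T

def denoise(z_noisy, noise_level):
    """Stand-in for a trained diffusion denoiser (assumption, not a real model).

    It blends the noisy latent with a fixed 'prior' latent; the more noise
    was injected, the more the prior dominates.
    """
    prior = np.ones(4)
    return (1.0 - noise_level) * z_noisy + noise_level * prior

def refine(ar_prediction, noise_level):
    """Noise the latent of the autoregressive prediction, then denoise it."""
    z0 = encode(ar_prediction)
    eps = rng.normal(size=z0.shape)
    # Variance-preserving mix of the clean latent and Gaussian noise.
    z_noisy = np.sqrt(1.0 - noise_level) * z0 + np.sqrt(noise_level) * eps
    return decode(denoise(z_noisy, noise_level))
```

Calling `refine(x, 0.0)` returns the autoencoder's reconstruction of `x` unchanged, and `refine(x, 1.0)` ignores `x` altogether. Intermediate noise levels mix the two, which is the "to a greater or lesser degree" part.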