It's also in the sense that the initial latent vector is Gaussian noise. The transformer loop is denoising the latent space; they just happen to be doing the equivalent of predicting x_0 directly rather than predicting the noise.
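
To make the "predicting x_0 directly" part concrete, here's a rough toy sketch of a deterministic (DDIM-style) denoising loop where the network outputs an estimate of the clean latent instead of the noise. This is not the actual model being discussed: `predict_x0`, `alpha_bars`, and `sample` are hypothetical names I made up, and it assumes the standard forward process x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.

```python
import math
import random


def eps_from_x0(x_t, x0_pred, alpha_bar):
    """Recover the implied noise from an x_0 estimate.

    Assumes x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps,
    so the two parameterizations carry the same information.
    """
    s_a = math.sqrt(alpha_bar)
    s_1 = math.sqrt(1.0 - alpha_bar)
    return [(xt - s_a * x0) / s_1 for xt, x0 in zip(x_t, x0_pred)]


def sample(predict_x0, dim, alpha_bars, seed=0):
    """Toy denoising loop with an x_0-predicting model (hypothetical API).

    alpha_bars must be increasing and strictly between 0 and 1:
    small values mean "mostly noise", values near 1 mean "mostly signal".
    """
    rng = random.Random(seed)
    # The initial latent is pure Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for i, a_t in enumerate(alpha_bars):
        # After the last step we land on the fully denoised latent.
        a_next = alpha_bars[i + 1] if i + 1 < len(alpha_bars) else 1.0
        x0_pred = predict_x0(x, a_t)        # model predicts x_0 directly
        eps = eps_from_x0(x, x0_pred, a_t)  # equivalent noise estimate
        # Re-noise the x_0 estimate to the next, lower noise level.
        x = [
            math.sqrt(a_next) * x0 + math.sqrt(1.0 - a_next) * e
            for x0, e in zip(x0_pred, eps)
        ]
    return x
```

With a stand-in predictor that always returns the same target, the loop starts from noise and ends exactly on that target, which is the sense in which it's "denoising" even though no noise is ever predicted explicitly.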