pertymcpert 2 days ago
I have the exact same questions as you. I can barely understand how diffusion works for images; for sequential data like text, it makes no sense to me.
janalsncm 2 days ago | parent
Let’s suppose we have 10k possible tokens in the vocabulary. Then the text would be an image 10k pixels tall and N pixels wide, where N is the length of the text. In each column, exactly one pixel is white (corresponding to the token at that position) and the rest are black. Then the diffusion process is the same: repeatedly denoising.
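To make the analogy concrete, here's a minimal Python/NumPy sketch. The vocabulary size, sequence length, and the dummy denoiser are all placeholders I made up for illustration; a real model would be a trained network predicting the less-noisy image at each step.

    import numpy as np

    VOCAB_SIZE = 10_000  # number of possible tokens (toy value)
    SEQ_LEN = 16         # N, the length of the text (toy value)

    def tokens_to_image(token_ids: np.ndarray) -> np.ndarray:
        """One-hot encode a token sequence as a (VOCAB_SIZE, SEQ_LEN) 'image':
        each column has exactly one white pixel at the token's row."""
        img = np.zeros((VOCAB_SIZE, len(token_ids)), dtype=np.float32)
        img[token_ids, np.arange(len(token_ids))] = 1.0
        return img

    def image_to_tokens(img: np.ndarray) -> np.ndarray:
        """Decode by taking the brightest pixel in each column."""
        return img.argmax(axis=0)

    def sample(denoise_step, num_steps: int = 50) -> np.ndarray:
        """Reverse (denoising) process: start from pure noise and repeatedly
        apply a denoising step, exactly as in image diffusion."""
        x = np.random.randn(VOCAB_SIZE, SEQ_LEN).astype(np.float32)
        for t in reversed(range(num_steps)):
            x = denoise_step(x, t)
        return image_to_tokens(x)

    def dummy_denoise_step(x: np.ndarray, t: int) -> np.ndarray:
        # Stand-in so the sketch runs end to end; not a real denoiser.
        return x * 0.9

    print(sample(dummy_denoise_step)[:5])  # first few decoded token ids

In practice, text diffusion models usually operate on token embeddings or use discrete/masked corruption rather than literal Gaussian pixel noise, but the one-hot-image picture above captures the intuition.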