skydhash 2 days ago
But what about the dependency graph between symbols in the program? All those symbols have strong constraints around them, and those constraints are the program design. The same issue shows up in image diffusion: you ask for a portrait and some details come out wrong. That's because the face has constraints (which you learn about as an artist). Patterns and probability won't help you.
angusturner 2 days ago
You assume that for small steps (i.e. taking some noisy code and slightly denoising it) you can make an independence assumption: all tokens are conditionally independent given the current state. Once you chain many steps you get a very flexible distribution that can model all the interdependencies. A stats person could probably provide more nuance, although two interesting connections I've seen: There is some sense in which diffusion generalises autoregression, because you don't have to pick an ordering when you factor the dependency graph. (Or put otherwise, for some definitions of diffusion you can show autoregression to be a special case.)
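
To make the chaining argument concrete, here is a minimal toy sketch, not how any real diffusion model is implemented. It assumes a made-up target distribution over two binary tokens that must always agree, and the names `TARGET`, `conditional`, `one_step` and `two_steps` are hypothetical. It computes exact probabilities rather than sampling. Each step treats the tokens it fills in as independent given the current state, yet two chained steps recover the constraint that a single step cannot.

```python
from itertools import product

# Hypothetical target: two binary tokens that must always agree.
TARGET = {(0, 0): 0.5, (1, 1): 0.5}
MASK = None  # marker for a position that has not been denoised yet


def conditional(pos, state):
    """P(token at pos = v | tokens already revealed in state), derived from TARGET."""
    weights = {0: 0.0, 1: 0.0}
    for seq, p in TARGET.items():
        # Keep only sequences consistent with the revealed tokens.
        if all(s is MASK or s == t for s, t in zip(state, seq)):
            weights[seq[pos]] += p
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def one_step():
    """Fill both positions at once, independently given the all-masked state."""
    state = (MASK, MASK)
    p0, p1 = conditional(0, state), conditional(1, state)
    return {(a, b): p0[a] * p1[b] for a, b in product((0, 1), repeat=2)}


def two_steps():
    """Fill position 0 first, then position 1 given the updated state."""
    out = {}
    for a, pa in conditional(0, (MASK, MASK)).items():
        for b, pb in conditional(1, (a, MASK)).items():
            out[(a, b)] = pa * pb
    return out
```

In this toy, `one_step()` spreads probability evenly over all four pairs, including the invalid `(0, 1)` and `(1, 0)`, while `two_steps()` puts all of its mass on `(0, 0)` and `(1, 1)`. With this fixed left-to-right order the chain is exactly the autoregressive factorisation P(x0) P(x1 | x0), which is one way to read the "special case" remark; filling position 1 first would work just as well here.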