angusturner 2 days ago:
You assume that for small steps (i.e. taking some noisy code and slightly denoising it) you can make an independence assumption: all tokens are conditionally independent, given the current state. Once you chain many such steps, you get a very flexible distribution that can model all the interdependencies.

A stats person could probably provide more nuance, but one interesting connection I've seen: there is a sense in which diffusion generalises autoregression, because you don't have to pick an ordering when you factor the dependency graph. (Put otherwise, for some definitions of diffusion you can show autoregression to be a special case.)
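For the flavour of it, here is a minimal sketch of that factorized reverse step for discrete (token) diffusion. Everything here is illustrative: `denoiser`, the shapes, and the step count are assumptions, not any particular paper's method.

```python
import torch

# Minimal sketch of one factorized reverse step in discrete diffusion.
# `denoiser` is a hypothetical model mapping the current noisy token
# sequence (plus the step index) to per-position logits over the vocab.

def reverse_step(denoiser, x_t: torch.Tensor, t: int) -> torch.Tensor:
    logits = denoiser(x_t, t)               # shape: (seq_len, vocab_size)
    probs = torch.softmax(logits, dim=-1)
    # The independence assumption: each position is sampled separately,
    # conditioned only on the *whole* current state x_t.
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

def sample(denoiser, seq_len: int, vocab_size: int, num_steps: int) -> torch.Tensor:
    # Start from pure noise and chain many small factorized steps;
    # the composition can express dependencies no single step can.
    x = torch.randint(vocab_size, (seq_len,))
    for t in reversed(range(num_steps)):
        x = reverse_step(denoiser, x, t)
    return x
```

No single step can model correlations between positions, but the chain as a whole can, which is the point of the argument above.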
skydhash a day ago:
There’s a reason formal verification is the highest guarantee we have for software. To have complete assurance of what a program can and cannot do, the semantics of each of its components must be known, recursively. A minor change in one token can change the meaning of the whole program.

Programming is just trying to enforce semantics on instructions (how well that is done is software engineering’s realm). An algorithm like merge sort is just a set of semantic constraints, which is why most books use their own notation: the code itself does not really matter.

At most, LLMs and diffusion models can be regarded as fancy searches. But what you actually want is semantics, and that’s why you can design lots of things on paper. We do it in the code editor instead because feedback is nice, and because libraries’ documentation (when it exists) lies about their semantics. And we read code because nothing is more complete about semantics than the code itself.
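To make the merge sort point concrete, here is a toy sketch (my own illustration, not from any book) of what "just semantic constraints" means: the meaning of sorting stated as checkable properties, with no reference to how the output was produced.

```python
from collections import Counter
from typing import List

# The semantics of "sort" as two constraints, independent of whether
# merge sort, quicksort, or anything else produced the output.

def satisfies_sort_spec(inp: List[int], out: List[int]) -> bool:
    is_ordered = all(a <= b for a, b in zip(out, out[1:]))
    is_permutation = Counter(inp) == Counter(out)  # same elements, same counts
    return is_ordered and is_permutation

assert satisfies_sort_spec([3, 1, 2], [1, 2, 3])
assert not satisfies_sort_spec([3, 1, 2], [1, 2])  # dropped an element
```

Any implementation that satisfies those constraints *is* a correct sort; the particular code is incidental, which is exactly why the books don't care about notation.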