| ▲ | LoganDark 6 hours ago | |||||||
The way I imagine greedy sampling for autoregressive language models is guaranteeing a deterministic result at each position individually. The way I'd imagine it for diffusion language models is guaranteeing a deterministic result for the entire response as a whole. I see diffusion models potentially being more promising because the unit of determinism would be larger, preserving expressivity within that unit. Additionally, diffusion language models iterate multiple times over their full response, whereas autoregressive language models get one shot at each token, and before there's even any picture of the full response. We'll have to see what impact this has in practice; I'm only cautiously optimistic. | ||||||||
| ▲ | michaelscott 6 hours ago | parent [-] | |||||||
I guess it depends on the definition of deterministic, but I think you're right and there's strong reason to expect this will happen as they develop. I think the next 5 - 10 years will be interesting! | ||||||||
| ||||||||