michaelscott 6 hours ago
Nothing you've said about reasoning here is exclusive to LLMs. Human reasoning is never guaranteed to be deterministic either, except for most problems that have a correct solution. As OP says, they may not be reasoning under the hood, but if the effect is the same when they're used as a tool, does it matter? I'm not sure I'm up to date on the latest diffusion work, but I'm genuinely curious how you see diffusion models potentially making LLMs more deterministic. These models usually work by sampling too, and it seems like the transformer architecture is better suited to long-context problems than diffusion.
LoganDark 6 hours ago
The way I imagine greedy sampling for autoregressive language models is as guaranteeing a deterministic result at each position individually. The way I'd imagine it for diffusion language models is as guaranteeing a deterministic result for the entire response as a whole. I see diffusion models as potentially more promising because the unit of determinism would be larger, preserving expressivity within that unit. Additionally, diffusion language models iterate multiple times over their full response, whereas autoregressive language models get one shot at each token, before there's any picture of the full response. We'll have to see what impact this has in practice; I'm only cautiously optimistic.
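To make the per-position point concrete, here's a minimal sketch of the autoregressive half only. It assumes a toy stand-in for the model: `toy_logits`, `greedy_decode`, and `sample_decode` are hypothetical names, not any real library's API, and the scores are made up. Greedy decoding takes the argmax at every position, so it has no randomness to vary between runs; sampling draws from the distribution, so the output depends on the RNG state. It doesn't attempt to model diffusion at all.

```python
import math
import random

VOCAB_SIZE = 6  # tokens are just the integers 0..5 in this toy setup


def toy_logits(prefix):
    """Hypothetical stand-in for a model: fixed scores computed from the prefix."""
    offset = sum(prefix) + len(prefix)
    return [math.sin(1.7 * (tok + 1) + offset) for tok in range(VOCAB_SIZE)]


def greedy_decode(steps):
    """Pick the highest-scoring token at each position (no randomness)."""
    prefix = []
    for _ in range(steps):
        logits = toy_logits(prefix)
        prefix.append(max(range(VOCAB_SIZE), key=logits.__getitem__))
    return prefix


def sample_decode(steps, rng, temperature=1.0):
    """Draw each token from the softmax distribution; output depends on rng."""
    prefix = []
    for _ in range(steps):
        logits = toy_logits(prefix)
        weights = [math.exp(score / temperature) for score in logits]
        prefix.append(rng.choices(range(VOCAB_SIZE), weights=weights)[0])
    return prefix


if __name__ == "__main__":
    print("greedy:", greedy_decode(8))
    print("greedy again:", greedy_decode(8))  # same tokens as the line above
    print("sampled, seed 0:", sample_decode(8, random.Random(0)))
    print("sampled, seed 1:", sample_decode(8, random.Random(1)))
```

Note that each greedy choice is locked in before any later token exists, which is the "one shot at each token" part.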