lossolo 7 hours ago

They use bidirectional attention between modalities, not within the same modality. This doesn't change much in the context you're referring to (coding). How do you think "thinking" works in current SOTA models like GPT-5-Thinking/Pro? When generating code, the model's "thinking" already attends to the code, and both influence each other during generation. "Reasoning" models modify the code as they generate it: they delete it, revise it, and adjust their internal reasoning based on the new tokens they produce during the "thinking" process. Dozens of denoising models have been built for text; they are not good at it, and parallel sampling between modalities will not change that.
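
A minimal sketch of what I mean (pure Python; the interleaved stream and role labels are invented for illustration):

    # Under ordinary causal attention, "thinking" tokens and code tokens
    # share ONE sequence, so each new token attends to everything emitted
    # so far in both roles. Hypothetical interleaved stream:
    roles = ["think", "think", "code", "code", "think", "code"]

    for i, role in enumerate(roles):
        visible = roles[: i + 1]  # causal mask: positions 0..i are visible
        print(f"pos {i} ({role:>5}) attends to {visible}")

    # The "think" token at pos 4 sees the code at pos 2-3, and the "code"
    # token at pos 5 sees that revised thinking -- mutual influence with
    # no bidirectional attention needed.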

ricardobeat 4 hours ago

They cannot "edit" the code in place, though, the way you can with diffusion. They must either re-emit all the tokens or emit a patch/diff that is not directly connected to the previous stream of tokens.
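
Roughly the contrast, as a toy sketch (the denoiser and continuation callbacks are hypothetical stand-ins for real models, not any actual API):

    def diffusion_style_edit(tokens, positions, denoiser):
        # Diffusion-style edit: re-mask the chosen positions, then fill
        # them IN PLACE; a real denoiser conditions on both sides at once.
        out = list(tokens)
        for p in positions:
            out[p] = "[MASK]"
        for p in positions:
            out[p] = denoiser(out, p)
        return out

    def autoregressive_style_edit(tokens, keep_upto, continue_fn):
        # Autoregressive "edit": no in-place operation exists; truncate
        # and re-emit everything after the edit point, left to right
        # (or emit a separate patch/diff instead).
        out = list(tokens[:keep_upto])
        while len(out) < len(tokens):
            out.append(continue_fn(out))
        return out

    code = ["def", "add1", "(", "x", ")", ":", "return", "x", "-", "1"]
    fix = {8: "+"}  # the bug: "-" should be "+"

    print(diffusion_style_edit(code, [8], lambda seq, p: fix[p]))
    print(autoregressive_style_edit(code, 8,
                                    lambda seq: fix.get(len(seq), code[len(seq)])))

With diffusion the edit is an operation on positions; with an autoregressive model it is a new generation that happens to overlap the old one.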

lossolo 2 hours ago

LLMs can "edit" code, but as you say, they do it differently from diffusion models: they operate directly on long text sequences and use much more context, which is one reason they currently work better for coding. Diffusion models for code aren't a new idea; people have tried various designs, but so far they tend to underperform autoregressive LLMs, probably because denoising over discrete tokens is harder to make work than plain next-token prediction.
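
The objective gap in a nutshell, as a toy sketch (PyTorch; random tensors stand in for a real model, so purely illustrative):

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    V, T = 100, 16                      # toy vocab size, sequence length
    tokens = torch.randint(0, V, (T,))  # a "code" sequence
    logits = torch.randn(T, V)          # stand-in for model outputs

    # 1) Autoregressive: predict token t+1 from tokens 0..t.
    ar_loss = F.cross_entropy(logits[:-1], tokens[1:])

    # 2) Masked denoising (discrete-diffusion style): corrupt a random
    #    subset of positions, recover the originals at those positions
    #    given the rest of the (bidirectionally visible) sequence.
    mask = torch.rand(T) < 0.3
    mask[0] = True  # ensure at least one corrupted position
    denoise_loss = F.cross_entropy(logits[mask], tokens[mask])

    print(ar_loss.item(), denoise_loss.item())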