NitpickLawyer 9 hours ago
Do you have any examples / papers where they do the parallel thing proposed here? I've tried Google's diffusion coding model, but AFAICT it doesn't do parallel thinking & coding. It seems to just take a prompt and output code. What's cool about thinking & generation in parallel is that each can attend to the other. So you're not limited to the prompt influencing the code: the prompt influences both the thinking and the code, the code can influence the thinking, and the thinking can influence the code.
lossolo 7 hours ago | parent
They use bidirectional attention between modalities, not within the same modality. This doesn't change much in the context you're referring to (coding). How do you think "thinking" works in current SOTA models like GPT-5-Thinking/Pro? When generating code, the model's "thinking" already attends to the code, and both influence each other during generation. "Reasoning" models modify the code as they generate it: they delete it, revise it, and adjust their internal reasoning based on the new tokens produced during the "thinking" process. There are dozens of denoising models built for text; they are not good at it, and parallel sampling between modalities will not change that.
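
For anyone wondering what "bidirectional between modalities, causal within a modality" means concretely, here's a minimal numpy sketch of that attention mask. This is an illustration of the pattern being discussed, not code from any paper or model mentioned here; the function name and the 0/1 modality encoding are made up for the example.

    import numpy as np

    def cross_modal_mask(modality_ids: np.ndarray) -> np.ndarray:
        """Boolean mask: entry [i, j] is True if position i may attend to j.

        Within a modality: causal (i attends to j only if j <= i).
        Across modalities: bidirectional (i attends to every j in the
        other modality, regardless of order).
        """
        n = len(modality_ids)
        i = np.arange(n)[:, None]
        j = np.arange(n)[None, :]
        same_modality = modality_ids[:, None] == modality_ids[None, :]
        causal = j <= i
        return np.where(same_modality, causal, True)

    # Example: 0 = "thinking" tokens, 1 = "code" tokens, interleaved.
    ids = np.array([0, 0, 1, 1, 0, 1])
    print(cross_modal_mask(ids).astype(int))

Printed as 0/1, you can see each "thinking" token attends to all "code" tokens (and vice versa), while staying causal among its own kind, which is exactly why this buys little for coding: the interaction across streams already exists in autoregressive reasoning models, just mediated by generation order.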