Remix.run Logo
bicsi 5 days ago

What if I told you that one can model bidirectional attention just by recurring over causal attention, and it’s still fast enough? Hint: It’s called chain of thought.

I strongly believe it’s time to discontinue diffusion models, solely on the fact that iterated auto-regression is faster, more parallelizable, and just as potent with proper prompting techniques (of course, unless you consider CoT as a form of diffusion, which it essentially is).

fancyfredbot 5 days ago | parent | next [-]

I'd respectfully suggest that it's perhaps not time to "discontinue diffusion models". Minsky and Papert set AI back by decades by suggesting neural networks were a dead end which couldn't learn XOR. There's not a chance of an HN comment having the same effect of course but my point is that it's easy to dismiss things prematurely.

smus 5 days ago | parent | prev | next [-]

Can you explain how CoT is a form of diffusion or models bidirectional attn?

SalmoShalazar 5 days ago | parent | prev [-]

Chain of thought is not a form of diffusion. Diffusion models clearly have characteristics that are useful and worthy of further research and should not be “discontinued“