
This is a super interesting claim - can you point to these benchmarks?

cubefox 2 days ago | parent | next [-]

> Gemini Diffusion’s external benchmark performance is comparable to much larger models, whilst also being faster.

That doesn't necessarily mean that they scale as well as autoregressive models.

	▲	jimmyl02 2 days ago | parent [-]
		I think there is no way to tell yet; we will only find out with more research and time. One nuance that might not be clear is that the transformer was a huge part of what made traditional LLMs scale. With the diffusion transformer and newer architectures, it may now be possible to apply transformers to diffusion as well. Diffusion also has the benefit of being able to "think" through the number of diffusion steps, instead of having to output tokens and then reason about them. It's hard to tell exactly where we are headed, but it's an interesting research direction, especially now that it's somewhat more validated by Google.

mdp2021 2 days ago | parent | prev | next [-]

Try this one:

# d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning

	▲	mdp2021 2 days ago | parent [-]
		I.e.: https://arxiv.org/html/2410.14157v3 # Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning