jimmyl02 | 2 days ago
I think there's no way to tell; we can only see with more research and time. One nuance that might not be clear is that the transformer was a huge part of what made traditional LLMs scale. With the diffusion transformer and newer architectures, transformers can now be applied to diffusion as well. Diffusion also has the benefit of being able to "think" by varying the number of denoising steps, instead of having to output tokens and then reason over them. It's hard to tell exactly where we're headed, but it's an interesting research direction, especially now that it's been somewhat validated by Google.
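To make the "thinking with diffusion steps" point concrete, here's a toy sketch (not any real diffusion model; `refine` and its update rule are purely illustrative) of iterative refinement where the step count acts as a tunable compute budget, the way denoising steps do:

```python
# Toy iterative refinement: each step nudges the estimate toward the
# target, so more steps -> a closer answer. This mirrors how a diffusion
# model can spend more denoising steps to "think" harder, without
# emitting intermediate tokens the way autoregressive reasoning does.
def refine(x: float, target: float, steps: int) -> float:
    for _ in range(steps):
        x = x + 0.5 * (target - x)  # move halfway toward the target each step
    return x

coarse = refine(0.0, 1.0, steps=2)   # -> 0.75
fine = refine(0.0, 1.0, steps=10)    # -> ~0.999, closer for more compute
```

The knob here is purely at inference time: the same "model" (update rule) gives a better result when you pay for more steps, which is the property the comment is pointing at.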