nextos 16 hours ago

I am not sure I agree that we've yet to see any other architecture that competes with a large transformer. For example, in long-range tasks such as those related to genome prediction, state-space models (e.g., Mamba) exhibit SOTA performance. I also think it's hard to separate architectural advantages from maturity, given that transformers have received much more attention.