nextos 8 days ago
The xLSTM could become a good alternative to transformers: https://arxiv.org/abs/2405.04517. On very long contexts, such as those arising in DNA models, these models perform really well. There's a big state-space model comeback initiated by the S4-Mamba saga. RWKV, which is a hybrid between classical RNNs and transformers, is also worth mentioning.
bob1029 8 days ago | parent
I was just about to post this. There was an MLST podcast about it a few days ago: https://www.youtube.com/watch?v=8u2pW2zZLCs Lots of related papers referenced in the description.