nextos 8 days ago

The xLSTM could become a good alternative to transformers: https://arxiv.org/abs/2405.04517. On very long contexts, such as those arising in DNA models, these models perform really well.
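
If you want the gist of what is new under the hood: the mLSTM variant replaces the scalar LSTM cell with a matrix memory updated by an outer product, plus exponential gating stabilized in log space. Here is a rough numpy sketch of one recurrent step as I read the paper's equations (single head, output gate and input/output projections left out, so treat it as an illustration, not the reference implementation):

    import numpy as np

    def mlstm_step(C, n, m, q, k, v, i_pre, f_pre):
        # One step of a single mLSTM head (matrix memory C, normalizer n,
        # log-space stabilizer m); output gate omitted for brevity.
        d = k.shape[0]
        m_new = max(f_pre + m, i_pre)                  # stabilizer update
        i_gate = np.exp(i_pre - m_new)                 # exponential input gate
        f_gate = np.exp(f_pre + m - m_new)             # exponential forget gate
        k = k / np.sqrt(d)                             # scaled key
        C_new = f_gate * C + i_gate * np.outer(v, k)   # matrix memory update
        n_new = f_gate * n + i_gate * k                # normalizer update
        h = (C_new @ q) / max(abs(n_new @ q), 1.0)     # stabilized read-out
        return C_new, n_new, m_new, h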

There's a big state-space model comeback, initiated by the S4-to-Mamba saga. RWKV, a hybrid of classical RNNs and transformers, is also worth mentioning.
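
The family resemblance across all of these is a fixed-size state updated once per token, so generation costs O(1) memory per step instead of an ever-growing attention window. Toy sketch of the generic linear-SSM recurrence they build on (diagonal A for simplicity; names and shapes are illustrative, not any particular model's parametrization):

    import numpy as np

    def ssm_scan(A_diag, B, C, xs):
        # h_t = A h_{t-1} + B x_t ; y_t = C h_t, with a constant-size state h.
        h = np.zeros_like(A_diag)
        ys = []
        for x_t in xs:
            h = A_diag * h + B @ x_t   # state update
            ys.append(C @ h)           # read-out
        return np.stack(ys)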

bob1029 8 days ago

I was just about to post this. There was an MLST podcast about it a few days ago:

https://www.youtube.com/watch?v=8u2pW2zZLCs

Lots of related papers referenced in the description.

RossBencina 7 days ago

One claim from that podcast was that the xLSTM's attention-equivalent (the mLSTM memory update) is, in practical implementations, more efficient than transformer flash attention, and therefore promises to significantly reduce the time/cost of test-time compute.
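
The intuition behind that claim (whatever the measured numbers turn out to be) is that a recurrent design carries a fixed-size state at generation time instead of a KV cache that grows with context. Back-of-the-envelope with made-up 7B-ish shapes, not measured from either implementation:

    n_layers, d_model, n_heads = 32, 4096, 32
    d_head = d_model // n_heads
    ctx = 100_000                                   # tokens of context

    kv_cache    = 2 * n_layers * ctx * d_model      # keys + values, elements
    mlstm_state = n_layers * n_heads * d_head**2    # one matrix memory per head

    print(f"KV cache:    {kv_cache/1e9:.1f}B elements (grows with context)")
    print(f"mLSTM state: {mlstm_state/1e6:.1f}M elements (constant in context)")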

korbip 7 days ago

Test it out here:

https://github.com/NX-AI/mlstm_kernels

https://huggingface.co/NX-AI/xLSTM-7b
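
Untested sketch of poking at the 7B checkpoint, assuming it loads through the standard transformers AutoModelForCausalLM path (check the model card for the exact requirements, e.g. transformers version and the NX-AI kernels):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")
    model = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")

    inputs = tok("The xLSTM architecture", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))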