dnautics 3 hours ago

wait, do SOTA models use Mamba-like SSMs? this is the first I'm hearing of this

nl 2 hours ago

Qwen 3.5 and above use a hybrid design that alternates standard attention layers with Gated DeltaNet layers, an SSM-like linear-attention variant:

https://sebastianraschka.com/llms-from-scratch/ch04/08_delta...
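For anyone wondering what a Gated DeltaNet layer actually computes, here's a rough sketch. It assumes the gated delta rule as written in the Gated DeltaNet paper, S_t = alpha_t * (S_{t-1} - beta_t * (S_{t-1} k_t) k_t^T) + beta_t * v_t k_t^T with output o_t = S_t q_t. It uses a naive per-token loop for a single head and leaves out the chunked kernels, short convolutions and output gating that real implementations add. The names `gated_delta_step` and `hybrid_layer_pattern`, and the 3:1 DeltaNet-to-attention ratio, are hypothetical and not taken from Qwen's code or config.

```python
import math


def l2_normalize(x):
    # Queries and keys are L2-normalized before the update (assumption, following the paper).
    n = math.sqrt(sum(xi * xi for xi in x)) or 1.0
    return [xi / n for xi in x]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of the gated delta rule for a single head.

    S is a d_v x d_k state matrix (list of rows). alpha in [0, 1] decays the
    whole state (the "gate"), beta in [0, 1] controls how strongly the value
    stored under key k is replaced by v (the "delta rule").
    Returns (new_state, output) with output = new_state @ q.
    """
    k = l2_normalize(k)
    q = l2_normalize(q)
    d_v, d_k = len(S), len(k)
    # Value currently associated with key k: S k
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T
    new_S = [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]
    out = [sum(new_S[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return new_S, out


def hybrid_layer_pattern(n_layers, deltanet_per_attention=3):
    """Hypothetical layout: every (deltanet_per_attention + 1)-th layer is full attention."""
    period = deltanet_per_attention + 1
    return ["attention" if (i + 1) % period == 0 else "gated_deltanet" for i in range(n_layers)]


if __name__ == "__main__":
    S = [[0.0, 0.0], [0.0, 0.0]]
    key = [1.0, 0.0]
    S, _ = gated_delta_step(S, key, key, [3.0, 4.0], alpha=1.0, beta=1.0)
    S, out = gated_delta_step(S, key, key, [5.0, 6.0], alpha=1.0, beta=1.0)
    print(out)  # [5.0, 6.0]: the second write replaced the first under the same key
    print(hybrid_layer_pattern(8))
```

The point of the delta rule shows up in the usage: writing a new value under the same key overwrites the old one instead of piling on top of it. Plain additive linear attention would read back [8.0, 10.0] there. Unlike attention's KV cache, the state S stays a fixed d_v x d_k size no matter how long the sequence gets. That's why these layers get lumped in with Mamba-style SSMs, and why the full attention layers are kept around for precise retrieval.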