dnautics 3 hours ago
Wait, do SOTA models use Mamba-like SSMs? This is the first I'm hearing of this.
nl 2 hours ago
Qwen 3.5 and above use a hybrid design that alternates regular attention layers with Gated DeltaNet layers (a linear-attention, SSM-like block): https://sebastianraschka.com/llms-from-scratch/ch04/08_delta...
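To make "alternating" concrete, here is a rough sketch of how a hybrid stack might lay out its layers. The function name `hybrid_layer_schedule`, the `attention_every` parameter, and the one-in-four ratio are all illustrative assumptions of mine, not confirmed Qwen settings.

```python
def hybrid_layer_schedule(num_layers: int, attention_every: int = 4) -> list[str]:
    """Return the block type for each layer of a hypothetical hybrid stack.

    Every `attention_every`-th layer is a full-attention block; all other
    layers are SSM-like (linear-attention) blocks. The default ratio is an
    assumption for illustration only.
    """
    if num_layers < 0 or attention_every < 1:
        raise ValueError("num_layers must be >= 0 and attention_every >= 1")
    return [
        "attention" if (i + 1) % attention_every == 0 else "ssm"
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    # Print one line per layer for an 8-layer toy stack.
    for index, kind in enumerate(hybrid_layer_schedule(8)):
        print(index, kind)
```

The idea is that most layers keep a fixed-size recurrent state, so cost grows linearly with sequence length, while the occasional full-attention layer can still look at any earlier token directly.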