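Qwen 3.5 and above use a hybrid architecture that interleaves Gated DeltaNet layers (a linear-attention, SSM-like layer with a fixed-size recurrent state) with regular full-attention layers: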
https://sebastianraschka.com/llms-from-scratch/ch04/08_delta...
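
For anyone who wants the gist without clicking through, here is a rough sketch of the idea in plain Python. It is not Qwen's actual code: the function names, the "every 4th layer is full attention" layout, and the scalar gates are assumptions for illustration. The update follows the gated delta rule as I understand it from the Gated DeltaNet paper, S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T, with output o_t = S_t q_t.

```python
def layer_pattern(n_layers, full_attn_every=4):
    """Hypothetical layout: every `full_attn_every`-th layer is full attention,
    the rest are Gated DeltaNet layers (the ratio is an assumption)."""
    return [
        "full_attention" if (i + 1) % full_attn_every == 0 else "gated_deltanet"
        for i in range(n_layers)
    ]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One recurrent step of the gated delta rule.

    S     : d_v x d_k state matrix (list of rows), fixed size regardless of sequence length
    q, k  : query and key vectors of length d_k (k is assumed to be unit-norm)
    v     : value vector of length d_v
    alpha : decay gate in [0, 1], how much old memory is kept
    beta  : write strength in [0, 1], how strongly the old value for k is replaced
    """
    d_v, d_k = len(S), len(k)
    S_new = []
    for i in range(d_v):
        row = S[i]
        # (S k)_i: what the memory currently returns for key k
        sk = sum(row[j] * k[j] for j in range(d_k))
        # alpha * (S - beta * (S k) k^T) + beta * v k^T, written element-wise
        S_new.append(
            [alpha * (row[j] - beta * sk * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        )
    # Read out with the query: o = S_new q
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, o


if __name__ == "__main__":
    print(layer_pattern(8))
    S = [[0.0, 0.0], [0.0, 0.0]]
    k = [1.0, 0.0]
    S, o = gated_delta_step(S, q=k, k=k, v=[2.0, 3.0], alpha=1.0, beta=1.0)
    print(o)  # value just written under key k
    S, o = gated_delta_step(S, q=k, k=k, v=[5.0, 7.0], alpha=1.0, beta=1.0)
    print(o)  # old value for k is replaced rather than accumulated
```

The point of the delta rule is the second write: with beta = 1 the old value stored under k is erased before the new one is added, instead of piling up as in plain linear attention. The alpha gate additionally lets the layer decay the whole state. Because S has a fixed size, per-token cost does not grow with context length, which is why these layers are mixed with a smaller number of full-attention layers.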