bhadass 13 hours ago

Some of NVIDIA's models also tend to have interesting architectures. For example, they use the Mamba architecture rather than a purely transformer-based design: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...

nextos 12 hours ago | parent [-]

Deep SSMs, including the entire S4-to-Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale to long context windows than transformers.
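
The long-context advantage comes from the SSM's linear recurrence, which processes a sequence in O(T) time with constant per-step state, versus a transformer's O(T^2) attention. A minimal sketch of the basic (non-selective) recurrence underlying S4/Mamba-style layers, with hypothetical toy matrices for illustration:

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal linear state-space recurrence (illustrative only, not the
    full Mamba selective scan):
        h_t = A @ h_{t-1} + B * x_t     (state update)
        y_t = C @ h_t                   (readout)
    One step per token, so runtime is linear in sequence length T and
    memory per step is constant in T.
    """
    d_state = A.shape[0]
    h = np.zeros(d_state)
    ys = []
    for x_t in x:                 # sequential scan over the sequence
        h = A @ h + B * x_t       # fold the new input into the state
        ys.append(C @ h)          # project the state to an output
    return np.array(ys)

# Toy example: a stable diagonal A, so old inputs decay over time.
A = np.diag([0.9, 0.5])          # hypothetical 2-dim state transition
B = np.array([1.0, 1.0])         # input projection
C = np.array([0.5, 0.5])         # output projection
y = ssm_scan(A, B, C, np.ones(8))
```

Mamba's addition is making B, C, and the step size input-dependent ("selective"), which is what lets the state decide what to remember, while a parallel scan keeps training efficient.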