T-A 10 hours ago
From your link: DeepSeek-V3.2 Release 2025/12/01

From Zebra-Llama's arXiv page: Submitted on 22 May 2025
jychang 4 hours ago | parent
That's still behind the times. Even the ancient dinosaur IBM had released a Mamba model [1] before this paper was even put out.

> Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with a softmax attention for enhanced expressiveness, with no positional encoding for better length generalization.

Release Date: May 2nd, 2025

I mean, good for them for shipping, I guess. But seriously, I expect any postgrad student to be able to train a similar model with some rented GPUs. They literally teach MLA to undergrads in the basic LLM class at Stanford [2], so this isn't exactly some obscure, never-heard-of concept.

[1] https://huggingface.co/ibm-granite/granite-4.0-tiny-base-pre...
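(For intuition, here's a rough, unofficial PyTorch sketch of the kind of layer interleaving that model card describes: mostly SSM-style mixers, an occasional softmax-attention layer, and no positional encoding anywhere. The conv-based mixer below is just a stand-in for a real Mamba-2 kernel, and all names and hyperparameters are made up for illustration; nothing here is taken from Granite's actual code.)

  # Toy sketch of a hybrid "mostly-SSM, occasional attention" decoder stack.
  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class CausalConvMixer(nn.Module):
      """Stand-in for a Mamba-2 block: gated causal depthwise conv.
      A real implementation would use an actual Mamba-2 SSM kernel."""
      def __init__(self, d_model: int, kernel_size: int = 4):
          super().__init__()
          self.in_proj = nn.Linear(d_model, 2 * d_model)
          self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                                groups=d_model, padding=kernel_size - 1)
          self.out_proj = nn.Linear(d_model, d_model)

      def forward(self, x):  # x: (B, T, D)
          u, gate = self.in_proj(x).chunk(2, dim=-1)
          # Causal depthwise conv: pad, then trim the trailing positions.
          u = self.conv(u.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
          return self.out_proj(F.silu(gate) * u)

  class CausalAttention(nn.Module):
      """Plain softmax attention with a causal mask; no positional encoding."""
      def __init__(self, d_model: int, n_heads: int = 8):
          super().__init__()
          self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

      def forward(self, x):
          T = x.size(1)
          mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
          out, _ = self.attn(x, x, x, attn_mask=mask, need_weights=False)
          return out

  class HybridBlock(nn.Module):
      def __init__(self, d_model: int, use_attention: bool):
          super().__init__()
          self.norm = nn.LayerNorm(d_model)
          self.mixer = (CausalAttention(d_model) if use_attention
                        else CausalConvMixer(d_model))

      def forward(self, x):
          return x + self.mixer(self.norm(x))  # pre-norm residual

  class HybridLM(nn.Module):
      """Most layers are SSM-style; every `attn_every`-th layer is attention.
      Note: no positional embedding is added anywhere (NoPE)."""
      def __init__(self, vocab=32000, d_model=512, n_layers=12, attn_every=4):
          super().__init__()
          self.embed = nn.Embedding(vocab, d_model)
          self.blocks = nn.ModuleList(
              HybridBlock(d_model, i % attn_every == attn_every - 1)
              for i in range(n_layers)
          )
          self.head = nn.Linear(d_model, vocab, bias=False)

      def forward(self, tokens):  # tokens: (B, T) int64
          x = self.embed(tokens)
          for blk in self.blocks:
              x = blk(x)
          return self.head(x)

  if __name__ == "__main__":
      model = HybridLM()
      logits = model(torch.randint(0, 32000, (2, 16)))
      print(logits.shape)  # torch.Size([2, 16, 32000])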