T-A 10 hours ago

From your link: DeepSeek-V3.2 Release 2025/12/01

From Zebra-Llama's arXiv page: Submitted on 22 May 2025

jychang 4 hours ago

That's still behind the times. Even the ancient dinosaur IBM had released a Mamba model [1] before this paper was even put out.

> Granite-4.0-Tiny-Base-Preview is a 7B-parameter hybrid mixture-of-experts (MoE) language model featuring a 128k token context window. The architecture leverages Mamba-2, superimposed with a softmax attention for enhanced expressiveness, with no positional encoding for better length generalization. Release Date: May 2nd, 2025
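Roughly, that layout is just a stack of SSM blocks with an occasional plain softmax-attention layer mixed in, and no positional encoding anywhere. A minimal sketch of the idea (my own, not IBM's code; the layer ratio and sizes are made up):

    import torch.nn as nn

    class SSMBlockStub(nn.Module):
        # Stand-in for a Mamba-2 SSM block (the real thing lives in the mamba_ssm
        # package); here it's just a gated MLP so the sketch stays self-contained.
        def __init__(self, d_model):
            super().__init__()
            self.mix = nn.Sequential(
                nn.Linear(d_model, 2 * d_model), nn.SiLU(), nn.Linear(2 * d_model, d_model)
            )

        def forward(self, x):
            return x + self.mix(x)

    def build_hybrid_stack(n_layers=24, attn_every=6, d_model=1024, n_heads=16):
        # Mostly SSM blocks, with one softmax-attention layer every `attn_every` layers.
        # Note what's absent: no rotary or absolute positional encoding anywhere;
        # the SSM recurrence carries token order (the "NoPE" part of the quote above).
        layers = []
        for i in range(n_layers):
            if (i + 1) % attn_every == 0:
                layers.append(nn.MultiheadAttention(d_model, n_heads, batch_first=True))
            else:
                layers.append(SSMBlockStub(d_model))
        return nn.ModuleList(layers)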

I mean, good for them for shipping, I guess. But seriously, I expect any postgrad student to be able to train a similar model with some rented GPUs. They literally teach MLA to undergrads in the basic LLM class at Stanford [2], so it isn't exactly some obscure, never-heard-of concept.
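For anyone who hasn't watched the lecture: the core MLA trick is low-rank KV compression. You cache one small latent vector per token instead of full per-head K/V, and re-expand it at attention time. A toy sketch (my own approximation, dimensions invented; the real thing also has a decoupled RoPE branch that I'm omitting):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LatentKVAttention(nn.Module):
        def __init__(self, d_model=512, n_heads=8, d_latent=64):
            super().__init__()
            self.n_heads, self.d_head = n_heads, d_model // n_heads
            self.q_proj = nn.Linear(d_model, d_model)
            self.kv_down = nn.Linear(d_model, d_latent)  # compress; this latent is what gets cached
            self.k_up = nn.Linear(d_latent, d_model)     # expand latent -> per-head keys
            self.v_up = nn.Linear(d_latent, d_model)     # expand latent -> per-head values
            self.out_proj = nn.Linear(d_model, d_model)

        def forward(self, x, latent_cache=None):
            B, T, _ = x.shape
            latent = self.kv_down(x)                     # (B, T, d_latent)
            if latent_cache is not None:                 # decoding: append to the cached latents
                latent = torch.cat([latent_cache, latent], dim=1)
            q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
            k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
            v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
            o = F.scaled_dot_product_attention(q, k, v, is_causal=latent_cache is None)
            o = o.transpose(1, 2).reshape(B, T, -1)
            return self.out_proj(o), latent              # cache `latent`, not the full K/V

The point is that the KV cache shrinks from 2 * n_heads * d_head per token to d_latent per token, which is the whole reason it matters at 128k context.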

[1] https://huggingface.co/ibm-granite/granite-4.0-tiny-base-pre...

[2] https://youtu.be/Q5baLehv5So?t=6075