jychang 11 hours ago

Mamba-based LLMs aren't even close to novel, though. IBM has been doing this for a long time [1].
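
For context on what "Mamba-based" means: the core is a selective state-space scan, where the recurrence parameters are computed from the input at each step, so the model keeps a fixed-size state instead of a KV cache that grows with context. Here's a rough PyTorch sketch of that recurrence (names, shapes, and the unfused Python loop are all illustrative assumptions; real implementations fuse this into a kernel and wrap it in conv/gating layers):

    import torch

    def selective_scan(x, A, B_proj, C_proj, dt_proj):
        # x: (batch, seq, d). A: (d, n) with negative entries for stability.
        batch, seq, d = x.shape
        n = A.shape[-1]
        h = x.new_zeros(batch, d, n)          # recurrent state: fixed size
        ys = []
        for t in range(seq):
            xt = x[:, t]                                       # (batch, d)
            # "Selective": step size and B/C depend on the current input
            dt = torch.nn.functional.softplus(xt @ dt_proj)    # (batch, d)
            Bt, Ct = xt @ B_proj, xt @ C_proj                  # (batch, n)
            dA = torch.exp(dt.unsqueeze(-1) * A)               # discretized decay
            h = dA * h + (dt * xt).unsqueeze(-1) * Bt.unsqueeze(1)
            ys.append((h * Ct.unsqueeze(1)).sum(-1))           # readout (batch, d)
        return torch.stack(ys, dim=1)

    d, n = 8, 4
    y = selective_scan(torch.randn(2, 16, d), -torch.rand(d, n),
                       torch.randn(d, n), torch.randn(d, n), torch.randn(d, d))
    print(y.shape)  # torch.Size([2, 16, 8])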

Also, you're off on DeepSeek V3.2's parameter count: the full model is 685B once you include the MTP layer (the commonly cited 671B figure excludes it).
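
On the MTP layer, for anyone unfamiliar: it's an extra lightweight head trained to predict an additional token ahead, reusing the trunk's hidden states, which is why it adds parameters on top of the base count. A minimal sketch of the idea (module names and shapes are my assumptions, not DeepSeek's actual code):

    import torch
    import torch.nn as nn

    class MTPHead(nn.Module):
        # Predict token t+2 from the trunk state at t plus the embedding of t+1.
        def __init__(self, d_model, vocab, nhead=4):
            super().__init__()
            self.merge = nn.Linear(2 * d_model, d_model)
            self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.lm_head = nn.Linear(d_model, vocab)

        def forward(self, trunk_hidden, next_tok_emb):
            # Both inputs: (batch, seq, d_model) -> logits (batch, seq, vocab)
            h = self.merge(torch.cat([trunk_hidden, next_tok_emb], dim=-1))
            return self.lm_head(self.block(h))

    head = MTPHead(d_model=64, vocab=1000)
    logits = head(torch.randn(2, 10, 64), torch.randn(2, 10, 64))
    print(logits.shape)  # torch.Size([2, 10, 1000])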

I don't think there's anything interesting here beyond "I guess AMD put out a research paper," and it's hardly cutting-edge when DeepSeek, or even IBM, is running laps around them.

[1] Here's a news article from April, though IBM had been working on this well before then: https://research.ibm.com/blog/bamba-ssm-transformer-model
