jychang 11 hours ago:
Mamba-based LLMs aren't even close to novel, though; IBM has been doing this for ages [1]. Also, you're off on DeepSeek V3.2's parameter count: the full model is 685B with the MTP layer included. I don't see anything interesting here beyond "I guess AMD put out a research paper," and it's hardly cutting edge when DeepSeek, or even IBM, is running laps around them.

[1] A news article from April, though IBM had been at it long before that: https://research.ibm.com/blog/bamba-ssm-transformer-model