jychang 11 hours ago

Mamba-based LLMs aren't even close to novel, though. IBM has been doing this for a long time [1].
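
For context on what "Mamba-based" means: the core is a selective state-space scan, where the recurrence parameters are computed from the input at each step, so the model keeps a fixed-size state instead of a KV cache that grows with context. Here's a rough PyTorch sketch of that recurrence (names, shapes, and the unfused Python loop are all illustrative assumptions; real implementations fuse this into a kernel and wrap it in conv/gating layers):

    import torch

    def selective_scan(x, A, B_proj, C_proj, dt_proj):
        # x: (batch, seq, d). A: (d, n) with negative entries for stability.
        batch, seq, d = x.shape
        n = A.shape[-1]
        h = x.new_zeros(batch, d, n)          # recurrent state: fixed size
        ys = []
        for t in range(seq):
            xt = x[:, t]                                       # (batch, d)
            # "Selective": step size and B/C depend on the current input
            dt = torch.nn.functional.softplus(xt @ dt_proj)    # (batch, d)
            Bt, Ct = xt @ B_proj, xt @ C_proj                  # (batch, n)
            dA = torch.exp(dt.unsqueeze(-1) * A)               # discretized decay
            h = dA * h + (dt * xt).unsqueeze(-1) * Bt.unsqueeze(1)
            ys.append((h * Ct.unsqueeze(1)).sum(-1))           # readout (batch, d)
        return torch.stack(ys, dim=1)

    d, n = 8, 4
    y = selective_scan(torch.randn(2, 16, d), -torch.rand(d, n),
                       torch.randn(d, n), torch.randn(d, n), torch.randn(d, d))
    print(y.shape)  # torch.Size([2, 16, 8])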

Also, you're off on DeepSeek V3.2's parameter count: the full model is 685B once you include the MTP layer (the commonly cited 671B figure excludes it).
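
On the MTP layer, for anyone unfamiliar: it's an extra lightweight head trained to predict an additional token ahead, reusing the trunk's hidden states, which is why it adds parameters on top of the base count. A minimal sketch of the idea (module names and shapes are my assumptions, not DeepSeek's actual code):

    import torch
    import torch.nn as nn

    class MTPHead(nn.Module):
        # Predict token t+2 from the trunk state at t plus the embedding of t+1.
        def __init__(self, d_model, vocab, nhead=4):
            super().__init__()
            self.merge = nn.Linear(2 * d_model, d_model)
            self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.lm_head = nn.Linear(d_model, vocab)

        def forward(self, trunk_hidden, next_tok_emb):
            # Both inputs: (batch, seq, d_model) -> logits (batch, seq, vocab)
            h = self.merge(torch.cat([trunk_hidden, next_tok_emb], dim=-1))
            return self.lm_head(self.block(h))

    head = MTPHead(d_model=64, vocab=1000)
    logits = head(torch.randn(2, 10, 64), torch.randn(2, 10, 64))
    print(logits.shape)  # torch.Size([2, 10, 1000])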

I don't think there's anything interesting here beyond "I guess AMD put out a research paper," and it's hardly cutting-edge when DeepSeek, or even IBM, is running laps around them.

[1] Here's a news article from April, though IBM had been working on this well before then: https://research.ibm.com/blog/bamba-ssm-transformer-model
