jychang 7 hours ago
They didn't do something stupid like Llama 4's "one active expert", but 4 of 256 is still very sparse. It's not going to get close to DeepSeek or GLM-level performance unless they trained on the benchmarks. I don't think that was a good move; no other model uses a routing ratio this sparse.
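For scale, here's a quick back-of-envelope sketch in Python (the routed-expert counts for DeepSeek-V3, GLM-4.5, and Llama 4 Maverick are as I recall them from their published configs, so treat the exact numbers as assumptions):

  # Fraction of routed experts active per token for a few MoE models.
  # (active routed experts, total routed experts); shared experts excluded.
  configs = {
      "this model":       (4, 256),  # 4 routed of 256
      "DeepSeek-V3":      (8, 256),  # 8 routed of 256, +1 shared
      "GLM-4.5":          (8, 160),  # 8 routed of 160, +1 shared
      "Llama 4 Maverick": (1, 128),  # 1 routed of 128, +1 shared
  }
  for name, (active, total) in configs.items():
      print(f"{name:>16}: {active}/{total} = {active / total:.2%} active")

That puts 4/256 at about 1.6% of routed experts active per token, half of DeepSeek-V3's ratio and roughly a third of GLM-4.5's.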