Remix clone Hacker News

	▲	swalsh a day ago
		It's coding to coding. I couldn't care less how the model is architected; I only care how it performs in a real-world scenario.
	▲	petu a day ago
		If you don't care how it's architected, why do you care about size? Compare it to Q3.5 397B-A17B. Just as smaller models are a speed/cost optimization, so is MoE. G4 26B-A4B runs at 150 t/s on a 4090/5090 and 80 t/s on an M5 Max. Q3.5 35B-A3B is comparably fast. These are flash-lite/nano-class models. G4 31B, despite a small increase in total parameter count, is over 5 times slower. Q3.5 27B is comparably slow. These approximate flash/mini-class models (I believe the proprietary models in this class are closer in size to Q3.5 122B-A10B or Llama 4 Scout 109B-A17B).
	▲	daemonologist a day ago
		The implication is that there is (or should be) a major speed difference - naively you'd expect the MoE to be 10x faster and cheaper, which can be pretty relevant on real-world tasks.
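
For anyone who wants to check the "naive" expectation: a rough sketch, assuming per-token decode cost scales with the number of active parameters and nothing else. That ignores routing overhead, attention and KV cache cost, quantization, and memory limits, so treat the result as an upper bound rather than a benchmark. The helper names `parse_size` and `naive_speedup` are made up for this example, and the size tags are the ones quoted in the thread.

```python
import re


def parse_size(tag: str) -> tuple[float, float]:
    """Return (total_B, active_B) for a tag like '26B-A4B' or '31B'.

    A dense model has no '-A..B' suffix, so all parameters count as active.
    """
    m = re.fullmatch(r"(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?", tag)
    if m is None:
        raise ValueError(f"unrecognized size tag: {tag!r}")
    total = float(m.group(1))
    active = float(m.group(2)) if m.group(2) else total
    return total, active


def naive_speedup(moe_tag: str, dense_tag: str) -> float:
    """Naive ratio of decode speed, MoE vs dense.

    Assumption (not a measurement): time per token is proportional
    to active parameters only.
    """
    _, moe_active = parse_size(moe_tag)
    _, dense_active = parse_size(dense_tag)
    return dense_active / moe_active


if __name__ == "__main__":
    for moe, dense in [("26B-A4B", "31B"), ("35B-A3B", "27B")]:
        print(f"{moe} vs {dense}: ~{naive_speedup(moe, dense):.2f}x naive speedup")
```

Under that assumption the two pairs come out at roughly 8x and 9x, which is in the same ballpark as the "10x" naive guess and above the "over 5 times" observed figure quoted upthread.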