zkmon 10 hours ago
I'm guessing 3.5-27B would beat 3.6-35B. MoE is a bad idea: for the same VRAM, the 27B model leaves a lot more room for context, and the quality of the output depends heavily on context size, not just the "B" number.
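A rough back-of-envelope sketch of that VRAM argument: with a fixed memory budget, whatever the weights don't consume is what's left for KV cache (context). The quantization overhead and layer/head geometry below are illustrative assumptions, not the configs of any real 27B or 35B model.

  # VRAM split between weights and KV cache, all numbers assumed for illustration.
  VRAM_GIB = 24                              # hypothetical single 24 GB GPU
  BYTES_PER_PARAM = 0.55                     # ~4-bit quant incl. overhead
  LAYERS, KV_HEADS, HEAD_DIM = 46, 16, 128   # assumed identical geometry for both

  def weights_gib(params_b):
      return params_b * 1e9 * BYTES_PER_PARAM / 2**30

  def kv_gib_per_token():
      # K and V per layer: kv_heads * head_dim fp16 values each
      return 2 * LAYERS * KV_HEADS * HEAD_DIM * 2 / 2**30

  for params_b in (27, 35):
      free = VRAM_GIB - weights_gib(params_b)
      print(f"{params_b}B: weights ~{weights_gib(params_b):.1f} GiB, "
            f"room for ~{free / kv_gib_per_token():,.0f} tokens of context")

Under these assumptions the 27B leaves roughly 10 GiB free (tens of thousands of tokens of KV cache) versus about 6 GiB for the 35B.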
zozbot234 9 hours ago | parent | next
MoE is not a bad idea for local inference if you have fast storage to offload to, and this is quickly becoming feasible with PCIe 5.0 interconnect.
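A rough bound on what storage offload costs per token, assuming only the active experts are streamed from disk. The active-parameter count, quantization overhead, and drive bandwidth are all illustrative assumptions.

  # Worst-case decode speed when active expert weights are read from NVMe.
  ACTIVE_PARAMS_B = 3.0              # hypothetical MoE, ~3B params used per token
  BYTES_PER_PARAM = 0.55             # ~4-bit quant incl. overhead
  PCIE5_NVME_GBPS = 14.0             # ~PCIe 5.0 x4 sequential read

  bytes_per_token = ACTIVE_PARAMS_B * 1e9 * BYTES_PER_PARAM
  tokens_per_sec = PCIE5_NVME_GBPS * 1e9 / bytes_per_token
  print(f"~{tokens_per_sec:.1f} tok/s if every expert read misses the cache")
  # Expert reuse across tokens keeps much of this cached in RAM,
  # so real throughput sits well above this worst case.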
perbu 8 hours ago | parent | prev
MoE is excellent for unified-memory inference hardware like the DGX Spark, Mac Studio, etc. The large memory pool means you can fit quite a few B's, and the smaller active experts keep those tokens flowing fast.
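The intuition here is that decode is roughly memory-bandwidth-bound, and with MoE only the active slice of the weights is read per token. A sketch with assumed bandwidth and model figures (not measurements of any specific machine or model):

  # Upper-bound tokens/sec when decode is limited by memory bandwidth.
  MEM_BW_GBPS = 800                  # ~Mac Studio class unified memory (assumed)
  BYTES_PER_PARAM = 0.55             # ~4-bit quant incl. overhead

  def peak_tok_per_s(active_params_b):
      bytes_per_token = active_params_b * 1e9 * BYTES_PER_PARAM
      return MEM_BW_GBPS * 1e9 / bytes_per_token

  print(f"dense 70B:      ~{peak_tok_per_s(70):.0f} tok/s upper bound")
  print(f"MoE, 3B active: ~{peak_tok_per_s(3):.0f} tok/s upper bound")
  # Total parameters only need to fit in the (large) unified memory;
  # per-token speed tracks the small active slice.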