ryandrake 3 hours ago

Yeah, I'm also kind of jealous of Apple folks with their unified RAM. On a traditional homelab setup with gobs of system RAM and a GPU with relatively little VRAM, all that system RAM sits there unused when running LLMs.

zozbot234 3 hours ago | parent | next [-]

That "traditional" setup is actually the recommended one for running large MoE models: keep the shared and routing layers on the GPU to the extent feasible, with the expert weights in system RAM. You can even go larger than system RAM via mmap, though at a non-trivial cost in throughput.
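A back-of-envelope sketch of why this split works: in a MoE model the expert FFN weights dominate total size, but only a few experts per layer are touched for any given token, so the hot shared/routing weights fit in VRAM while the bulky expert pool lives in system RAM. All of the numbers below (layer count, expert count, parameter sizes) are illustrative assumptions, not any real model's specs.

```python
# Back-of-envelope memory split for a hypothetical MoE model (fp16 weights).
# All sizes are illustrative assumptions, not a specific model's specs.

BYTES_PER_PARAM = 2          # fp16/bf16
GIB = 1024 ** 3

n_layers = 48
shared_per_layer = 0.1e9     # attention + router + shared params per layer
expert_params = 0.1e9        # params in one expert's FFN
n_experts = 16               # experts per layer
n_active = 4                 # experts routed to per token

shared_total = n_layers * shared_per_layer
experts_total = n_layers * n_experts * expert_params

gpu_gib = shared_total * BYTES_PER_PARAM / GIB
ram_gib = experts_total * BYTES_PER_PARAM / GIB
# Per token, only n_active of n_experts experts per layer are read:
active_gib = n_layers * n_active * expert_params * BYTES_PER_PARAM / GIB

print(f"shared layers on GPU : {gpu_gib:6.1f} GiB")
print(f"experts in system RAM: {ram_gib:6.1f} GiB")
print(f"touched per token    : {active_gib:6.1f} GiB of the expert pool")
```

With these made-up numbers the shared weights need under 10 GiB of VRAM while the expert pool needs ~143 GiB of system RAM, yet each token only reads a quarter of that pool, which is why CPU-side expert offload (or mmap paging past RAM) stays workable.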

khimaros 2 hours ago | parent | prev [-]

Strix Halo, AMD's APU line with unified memory shared between CPU and GPU, is another option.