Kirby64 | 3 hours ago
> Would "lots of GPUs" even help for huge models? Maybe this is exposing my lack of knowledge, but don't you need to keep the whole model and context in a single GPU's VRAM?

How do you think the large providers do inference? No single GPU has 1 TB+ of memory on board. It's a cluster of many GPUs, with the model sharded across them.
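To make the sharding idea concrete, here's a toy NumPy sketch of tensor parallelism, one common way to split a model across GPUs. Plain arrays stand in for devices here; real systems (e.g. with NCCL collectives) follow the same math: each device holds only a slice of a layer's weights, computes a partial result, and the results are combined.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.asarray(rng.standard_normal((1, 8)))   # activation, replicated on every device
W = np.asarray(rng.standard_normal((8, 16)))  # full weight matrix, "too big" for one GPU

n_devices = 4
# Column-wise shard: each "device" stores an 8x4 slice instead of the full 8x16.
shards = np.split(W, n_devices, axis=1)

# Each device computes its partial output independently...
partials = [x @ w for w in shards]
# ...and an all-gather step concatenates them into the full output.
y_parallel = np.concatenate(partials, axis=1)

# Matches what a single GPU holding the whole matrix would compute.
y_reference = x @ W
assert np.allclose(y_parallel, y_reference)
```

So no single device ever needs the whole model in VRAM; the cost is the inter-GPU communication to gather the partial results, which is why providers run these clusters on fast interconnects.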