Jabrov 3 hours ago

Yes, multiple GPUs absolutely help with inference, even for a single model instance. Some models are simply too big to fit on even the largest available GPU, so their weights have to be sharded across devices.

Check out tensor parallelism: it splits each layer's weight matrices across GPUs so every device holds only a shard, computes a partial result, and the shards are combined with a collective op.
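A minimal sketch of the idea, using NumPy to stand in for per-device tensors (real systems use something like torch.distributed or a serving engine's tensor-parallel setting; the two-way split here is just illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # activations, replicated on every device
W = rng.standard_normal((8, 16))   # full weight matrix (pretend it won't fit on one GPU)

# Column-parallel split: "device 0" holds columns 0..7, "device 1" holds 8..15.
shards = np.split(W, 2, axis=1)

# Each device computes its partial output independently...
partials = [x @ w for w in shards]

# ...then an all-gather concatenates the partials into the full output.
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W

assert np.allclose(y_parallel, y_full)
```

The same trick works row-wise (split along the input dimension, then all-reduce the partial sums), which is how transformer MLP and attention blocks are typically parallelized.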