Dylan16807 13 hours ago

No, no, nothing like that.

Every layer of an LLM runs separately and sequentially, and there isn't much data transfer between layers. If you wanted to, you could put each layer on a separate GPU with no real penalty. A single request will only run on one GPU at a time, so it won't go faster than a single GPU with a big RAM upgrade, but it won't go slower either.
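For the curious, here's roughly what that layer-per-GPU split looks like in practice. This is a minimal PyTorch sketch, not anyone's actual serving code; the device names, layer count, and dimensions are placeholders. The point it illustrates is that only the activation tensor crosses the GPU boundary between layers, which is why per-layer placement adds so little overhead.

    import torch
    import torch.nn as nn

    class NaivePipelinedStack(nn.Module):
        # Hypothetical example: two transformer blocks, each pinned to its own GPU.
        def __init__(self, d_model=512, n_layers=2, devices=("cuda:0", "cuda:1")):
            super().__init__()
            self.devices = [devices[i % len(devices)] for i in range(n_layers)]
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True).to(dev)
                for dev in self.devices
            )

        def forward(self, x):
            # Layers run one after another; only the activations move between GPUs.
            for layer, dev in zip(self.layers, self.devices):
                x = layer(x.to(dev))
            return x

    if __name__ == "__main__":
        model = NaivePipelinedStack()
        tokens = torch.randn(1, 16, 512)  # (batch, seq, d_model)
        out = model(tokens)
        print(out.shape, out.device)

At any given moment only one of the two GPUs is doing work for this request, which matches the point above: no speedup over a single GPU with enough RAM, but no slowdown either.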

oblio 5 hours ago

Interesting, thank you for the feedback! It's definitely worth looking into.