littlestymaar 3 hours ago

It doesn't need to: during inference there's very little data exchange between one chip and the next (just a single hidden-state vector per token at each pipeline boundary).

It's completely different during training, where the backward pass and weight updates put a lot of strain on inter-chip communication; but for inference, even a PCIe 4.0 x4 link is enough to connect GPUs together without losing speed.
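A quick back-of-envelope check of the claim, as a sketch. The model size, hidden dimension, and generation speed below are illustrative assumptions (a Llama-70B-class model with hidden size 8192, fp16 activations, 50 tokens/s), not figures from the comment:

```python
# Assumptions (not from the comment): hidden size 8192, fp16, 50 tokens/s.
HIDDEN_DIM = 8192          # assumed model hidden dimension
BYTES_PER_VALUE = 2        # fp16 activation
TOKENS_PER_SECOND = 50     # assumed generation speed

# In pipeline-parallel inference, each stage forwards one hidden-state
# vector per token to the next GPU.
bytes_per_token = HIDDEN_DIM * BYTES_PER_VALUE            # 16 KiB
required_bw = bytes_per_token * TOKENS_PER_SECOND         # bytes/s

# PCIe 4.0: 16 GT/s per lane with 128b/130b encoding; x4 = 4 lanes.
pcie4_x4_bw = 16e9 / 8 * (128 / 130) * 4                  # ~7.9 GB/s

print(f"needed: {required_bw / 1e6:.2f} MB/s")
print(f"available: {pcie4_x4_bw / 1e9:.2f} GB/s")
print(f"link utilization: {required_bw / pcie4_x4_bw:.6f}")
```

Under these assumptions the per-token traffic is well under a megabyte per second, a tiny fraction of what PCIe 4.0 x4 provides, which is why the link doesn't bottleneck inference the way it would training.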