	dist-epoch a day ago
		After you load the weights into the GPU and keep the KV cache there too, you don't need any other significant traffic.
	numpad0 a day ago
		Even in tensor-parallel modes? I thought it could only work if you're fine with stalling all but n GPUs for n users at any given moment.
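
A rough back-of-the-envelope sketch of the tensor-parallel question, not a measurement. It assumes Megatron-style sharding (two all-reduces of the hidden state per transformer layer in the forward pass) and a ring all-reduce. The model shape (hidden size 8192, 80 layers, fp16 activations) and the function names are hypothetical, chosen only for illustration.

```python
def tp_allreduce_bytes_per_token(hidden_size, num_layers,
                                 bytes_per_elem=2, allreduces_per_layer=2):
    """Activation payload all-reduced per generated token (decode step).

    Assumes Megatron-style tensor parallelism: one all-reduce after the
    attention block and one after the MLP block in each layer.
    """
    return hidden_size * bytes_per_elem * allreduces_per_layer * num_layers


def ring_bytes_sent_per_gpu(payload_bytes, num_gpus):
    """Bytes each GPU sends for a ring all-reduce of payload_bytes."""
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes


# Hypothetical 70B-class shape, 4-way tensor parallel.
payload = tp_allreduce_bytes_per_token(hidden_size=8192, num_layers=80)
per_gpu = ring_bytes_sent_per_gpu(payload, num_gpus=4)
print(f"payload per token: {payload / 1e6:.2f} MB")
print(f"sent per GPU per token: {per_gpu / 1e6:.2f} MB")
```

Under these assumptions the inter-GPU traffic is a few megabytes per token per sequence, so in tensor-parallel mode it is not zero, but each transfer is small. Whether the 160 small all-reduces per token stall the GPUs depends mostly on interconnect latency rather than bandwidth, which this sketch does not model.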