Remix clone Hacker News

	▲	varispeed 5 days ago
		So would 40x RPi 5 get 130 tokens/s?
	▲	SillyUsername 5 days ago
		I imagine it would be limited by the number of layers, and at some point you'd also hit diminishing returns from network latency.
	▲	reilly3000 5 days ago
		The node count has to be a power of two (2^n), and it's capped at one node per attention head in the model.
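		To make that constraint concrete, here is a rough sketch that takes the rule exactly as stated in this comment (power-of-two node counts, at most one node per attention head). The function names and the 32-head figure are made up for illustration and do not come from any particular model or tool.

```python
def valid_node_counts(num_attention_heads: int) -> list[int]:
    """Power-of-two node counts that do not exceed the number of attention heads."""
    counts = []
    n = 1
    while n <= num_attention_heads:
        counts.append(n)
        n *= 2
    return counts


def usable_nodes(available: int, num_attention_heads: int) -> int:
    """Largest valid node count that fits within the boards on hand (0 if none)."""
    fitting = [n for n in valid_node_counts(num_attention_heads) if n <= available]
    return fitting[-1] if fitting else 0


# Hypothetical example: 40 boards, a model assumed to have 32 attention heads.
print(valid_node_counts(32))  # [1, 2, 4, 8, 16, 32]
print(usable_nodes(40, 32))   # 32
```

		Under those assumptions, 8 of the 40 boards would sit idle, so the cluster could not scale linearly to 40 nodes even before latency comes into play.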
	▲	VHRanger 5 days ago
		Most likely not, because of NUMA bottlenecks.