

	▲	kosolam 5 days ago
		How is this technically done? How does it split the query and aggregate the results?
	▲	magicalhippo 5 days ago
		From the readme:
		> More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet.
		> The maximum number of nodes is equal to the number of KV heads in the model #70.
		I found this article[1] a nice overview of the parallelism modes.
		[1]: https://medium.com/@chenhao511132/parallelism-in-llm-inferen...
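To make the "split and aggregate" question concrete, here is a toy sketch of tensor parallelism simulated in a single process. It is not the project's code: the function names (`split_rows`, `parallel_matvec`, `kv_heads_per_node`) are hypothetical, the "nodes" are just list slices, and the even-division check is an assumption on my part. The readme only states that the node count cannot exceed the number of KV heads.

```python
# Toy illustration of tensor parallelism, simulated in one process.
# All names are hypothetical; this is not the project's implementation.

def matvec(weight, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def split_rows(weight, n_nodes):
    """Give each node a contiguous, equal slice of the output rows."""
    if len(weight) % n_nodes != 0:
        raise ValueError("output rows must divide evenly across nodes")
    step = len(weight) // n_nodes
    return [weight[i * step:(i + 1) * step] for i in range(n_nodes)]


def parallel_matvec(weight, x, n_nodes):
    """Every node sees the full input but holds only its weight shard.

    Each node computes its slice of the output; the slices are then
    concatenated. On real hardware that concatenation is the network
    synchronization step (an all-gather).
    """
    shards = split_rows(weight, n_nodes)
    partials = [matvec(shard, x) for shard in shards]  # one result per node
    return [value for part in partials for value in part]


def kv_heads_per_node(n_kv_heads, n_nodes):
    """Assign whole KV heads to nodes.

    A head is not split further here, so there can be at most one node
    per KV head. Requiring an even split is an assumption of this sketch.
    """
    if n_nodes > n_kv_heads:
        raise ValueError("more nodes than KV heads")
    if n_kv_heads % n_nodes != 0:
        raise ValueError("KV heads must divide evenly across nodes")
    return n_kv_heads // n_nodes


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [1, 1]
    # The sharded result matches the single-node result.
    print(parallel_matvec(W, x, 2) == matvec(W, x))
    print(kv_heads_per_node(8, 4))
```

The point of the sketch is that the query itself is not split: each node gets the same input and a different slice of the weights, and the only traffic between nodes is the exchange of partial outputs after each sharded layer.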