▲ | kosolam 5 days ago | |
How is this done technically? How does it split the query and aggregate the results?
▲ | magicalhippo 5 days ago | |
From the README: "More devices mean faster performance, leveraging tensor parallelism and high-speed synchronization over Ethernet. The maximum number of nodes is equal to the number of KV heads in the model #70."

I found this article[1] a nice overview of the parallelism modes.

[1]: https://medium.com/@chenhao511132/parallelism-in-llm-inferen...
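To make the "split and aggregate" part concrete, here is a minimal sketch of the general idea behind tensor parallelism, not the project's actual code. It assumes the simplest variant: a weight matrix is split by output columns, each node multiplies the same input by its own slice, and the partial outputs are concatenated. The function names (`split_columns`, `tensor_parallel_matvec`) are made up for illustration, and the "nodes" are just loop iterations rather than real devices talking over Ethernet.

```python
def matvec(x, w):
    """Multiply row vector x (length d_in) by matrix w (d_in x d_out)."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def split_columns(w, n_nodes):
    """Give each hypothetical node a contiguous block of output columns."""
    d_out = len(w[0])
    if d_out % n_nodes != 0:
        raise ValueError("output columns must divide evenly across nodes")
    step = d_out // n_nodes
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(n_nodes)]


def tensor_parallel_matvec(x, w, n_nodes):
    """Compute x @ w by sharding w across n_nodes and gathering the pieces."""
    shards = split_columns(w, n_nodes)
    # In a real cluster each of these products would run on a separate device.
    partials = [matvec(x, shard) for shard in shards]
    # Aggregation step: concatenate the partial outputs in node order.
    return [value for part in partials for value in part]


x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

# The sharded result matches the single-node result.
assert tensor_parallel_matvec(x, w, 2) == matvec(x, w)
```

In this sketch the split only works when the columns divide evenly across nodes. That is at least consistent with the README's limit: if attention is sharded by whole KV heads, you cannot use more nodes than there are heads to hand out.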