zackangelo 2 days ago
You’re right, I misunderstood. I’m not sure it would be of much use, though, because this would presumably be for tensor-parallel workloads. In that case you want the ranks in your cluster to be uniform, or else everything is forced to run at the speed of the slowest rank. You could run pipeline parallel, but I’m not sure it’d be much better than what we already have.
storus 2 days ago
It was about this use case: