Tensor Parallel test with RDMA from last week: https://x.com/anemll/status/1996349871260107102
Note the fast sync workaround.