bleepblap 2 days ago

I think you might be confusing RDMA with RoCE. RDMA can happen entirely within a single node, for example between an NVMe drive and a GPU.

wmf 2 days ago

Within a single node it's just called DMA. RDMA is DMA over a network and RoCE is RDMA over Ethernet.

bleepblap 2 days ago

Sorry, but it certainly isn't:

https://docs.nvidia.com/cuda/gpudirect-rdma/index.html

The "R" in RDMA means there are multiple DMA controllers that can "transparently" share address spaces. You can certainly share address spaces across nodes with RoCE or InfiniBand, but that's a layer on top.

wtallis 2 days ago

I don't know why that NVIDIA document is wrong, but the established term for doing DMA from, e.g., an NVMe SSD to a GPU within a single system without the CPU initiating the transfer is peer-to-peer DMA. RDMA is when your data leaves the local machine's PCIe fabric.

wmf 2 days ago

I'm going to agree to disagree with Nvidia here.