jauntywundrkind 2 days ago

Lot of hype, but man does Voltron Data keep blowing me away with what they bring out. Mad respect.

> There’s a strong argument to be made that RAPIDS cuDF/RAPIDS libcudf drives NVIDIA’s CUDA-X Data Processing stack, from ETL (NVTabular) and SQL (BlazingSQL) to MLOps/security (Morpheus) and Spark acceleration (cuDF-Java).

Yeah this seems like the core indeed, libcudf.

The focus here is on TCP & GPUDirect (Nvidia's PCIe peer-to-peer, which lets, for example, RDMA happen without CPU involvement across a full GPU -> NIC -> switch -> NIC -> GPU path).

Personally, it feels super dangerous to just trust Nvidia on all of this and buy whatever solution is available. PyTorch, nicely, sees this to some degree: it adopted & took over Facebook/Meta's gloo project, which wraps a lot of the RDMA efforts. But man, Theseus is so many steps ahead here, figuring out & planning what to do with these capabilities and these ultra-efficient links, and figuring out how to avoid needing them at all where possible! The coordination problems in computing keep growing. I think of RISC-V's vector-length-agnostic alternative to conventional x86 SIMD: instead of a specific instruction for each particular operation at a fixed width, instructions are parameterized over different data lengths & types. https://github.com/pytorch/gloo
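To make the RISC-V comparison concrete, here is a rough Python model of the two loop styles. It is not real RVV code: `setvl` is a hypothetical stand-in for the `vsetvli` instruction, and it simplifies by always granting min(remaining, vlmax) elements (real hardware has some latitude in the `vl` it returns). The point is that the vector-length-agnostic loop has no tail case and runs unchanged for any `vlmax`, while the fixed-width SIMD version bakes in its width and needs a scalar cleanup loop.

```python
from typing import List


def setvl(remaining: int, vlmax: int) -> int:
    """Hypothetical stand-in for RVV's vsetvli (simplified: grants min(remaining, vlmax))."""
    return min(remaining, vlmax)


def vla_add(a: List[float], b: List[float], vlmax: int) -> List[float]:
    """Vector-length-agnostic add: one loop, no tail case, works for any vlmax >= 1."""
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        vl = setvl(n - i, vlmax)  # the "hardware" picks how many elements this pass covers
        # one modeled vector instruction over vl elements
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    return out


def fixed_simd_add(a: List[float], b: List[float], width: int = 4) -> List[float]:
    """Fixed-width SIMD style: the width is baked in and leftovers need a scalar tail loop."""
    n = len(a)
    out = [0.0] * n
    main = n - n % width
    for i in range(0, main, width):
        out[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
    for i in range(main, n):  # scalar tail for the remaining n % width elements
        out[i] = a[i] + b[i]
    return out


if __name__ == "__main__":
    a = [float(i) for i in range(11)]
    b = [1.0] * 11
    expected = [x + y for x, y in zip(a, b)]
    # The same VLA code gives the same answer whatever the hardware vector length is
    for vlmax in (1, 4, 8, 128):
        assert vla_add(a, b, vlmax) == expected
    assert fixed_simd_add(a, b) == expected
    print("ok")
```

The same `vla_add` binary would, in this model, use wider hardware automatically, which is exactly the kind of "parameterized over lengths" design the comment is pointing at.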

I'd really like to see a concerted effort around Ultra Ethernet emerge, fast. Hardware isn't really available yet, and it's going to start out absurdly expensive. But Ultra Ethernet looks like a lovely mix of lossless, credit-based InfiniBand RDMA and Ethernet, with lots of other niceties (transport security). Deployments are just starting (AMD Pensando Pollara 400 at Oracle). I worry that without broader availability & interest, without mass saturation, AI is going to stay stuck on libcudf forever. Getting hardware out there & getting software stacks using it is a chicken & egg problem, and big players need to spend real effort accelerating UET, or else. https://www.tomshardware.com/networking/amd-deploys-its-firs...
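For anyone unfamiliar with the credit-based part: the receiver advertises how many buffer slots it has, the sender spends one credit per packet and stalls at zero, and credits flow back as the receiver drains. Congestion then shows up as back-pressure instead of dropped packets. Below is a toy Python simulation of that general idea (the style of link-level flow control InfiniBand uses); it is not a model of Ultra Ethernet's actual transport, and all class and function names are hypothetical.

```python
from collections import deque
from typing import Tuple


class Receiver:
    """Toy receiver with a fixed number of buffer slots (hypothetical, for illustration)."""

    def __init__(self, slots: int) -> None:
        self.capacity = slots
        self.buffer: deque = deque()

    def accept(self, pkt: int) -> None:
        # Credits guarantee free space, so this should never fire
        assert len(self.buffer) < self.capacity, "buffer overflow: packet would be dropped"
        self.buffer.append(pkt)

    def drain(self, n: int) -> int:
        """Consume up to n packets and return how many slots were freed (credits to hand back)."""
        freed = 0
        while self.buffer and freed < n:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    """Toy sender that spends one credit per packet and stalls at zero credits."""

    def __init__(self, credits: int) -> None:
        self.credits = credits

    def try_send(self, pkt: int, rx: Receiver) -> bool:
        if self.credits == 0:
            return False  # back-pressure: wait instead of dropping
        self.credits -= 1
        rx.accept(pkt)
        return True


def simulate(n_packets: int, slots: int, drain_per_tick: int) -> Tuple[int, int]:
    """Return (packets delivered, ticks on which the sender stalled). Needs drain_per_tick >= 1."""
    rx = Receiver(slots)
    tx = Sender(credits=slots)  # initial credits = advertised buffer space
    pending = deque(range(n_packets))
    delivered = stalls = 0
    while pending or rx.buffer:
        # Send as many packets as the current credits allow
        while pending and tx.try_send(pending[0], rx):
            pending.popleft()
        if pending:
            stalls += 1
        freed = rx.drain(drain_per_tick)
        delivered += freed
        tx.credits += freed  # receiver returns credits for the slots it freed
    return delivered, stalls


if __name__ == "__main__":
    print(simulate(100, 8, 3))
```

With a slow receiver, e.g. `simulate(100, 8, 3)`, all 100 packets still arrive; the sender just stalls on some ticks rather than losing data.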

latchkey 2 days ago

Our MI300x boxes have had 8x400G Thor2 RDMA working great for a year now.