michalsustr 2 days ago

I'm working on a Python library that moves data onto the GPU as fast as your network/storage hardware allows. No more clunky DataLoaders that are too slow and have inconsistent perf. I'm getting stable throughput above 8 GiB/s (saturating my PC's NVMe SSD), and I'm going to test networked scenarios soon. It works with any Python ML library (torch, tensorflow, jax, etc.), for both training and inference, with single or multiple GPUs on one node (shared/separate queues for DDP), multiple data sources, seamless transitions between training and validation phases (no significant perf drop), arbitrary tensor shapes, and lots of data configurations.
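For context, a minimal sketch of the generic technique this kind of loader tends to rely on (not this library's actual API): stage a batch in pinned, page-locked host memory and issue an asynchronous host-to-device copy on a dedicated CUDA stream, so reading the next batch and the PCIe transfer overlap with GPU compute. The `read_next_batch` helper below is a hypothetical placeholder.

    import torch

    device = torch.device("cuda")
    copy_stream = torch.cuda.Stream()

    def read_next_batch():
        # Stand-in for reading a batch from NVMe / the network into a CPU tensor.
        return torch.randn(64, 3, 224, 224)

    # Pinned (page-locked) memory enables true asynchronous DMA transfers.
    cpu_batch = read_next_batch().pin_memory()
    with torch.cuda.stream(copy_stream):
        gpu_batch = cpu_batch.to(device, non_blocking=True)

    # ... read / preprocess the next batch here while the copy is in flight ...

    # Before compute consumes gpu_batch, make the default stream wait for the copy.
    torch.cuda.current_stream().wait_stream(copy_stream)
    out = gpu_batch.mean()  # placeholder for the model's forward pass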

Planning to add (de)compression and on-the-fly data augmentation, then release it to the public.