didibus 3 days ago

I think I understood it as: the database basically stores data in a binary format that can be fed into the GPU directly, and is also optimized for streaming/batching large chunks of data at once.

So it's "optimized for machines to consume" meaning the GPU.

Their use case was training ML models where you need to feed the GPU massive datasets as part of training.

They seem to claim that training is now bottlenecked by how quickly you can feed the GPU; otherwise the GPU is basically "waiting on IO" most of the time instead of actually computing, because the time goes into grabbing the next piece of data, transforming it for GPU consumption, and then feeding it in.
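To make that concrete, here's a minimal sketch (my own illustration, not from the article) of the general idea using NumPy's `memmap`: samples are written once in the exact binary layout the GPU expects, then batches are read as zero-copy slices with no per-record parsing. All names, sizes, and dtypes here are made up for the example.

```python
import os
import tempfile
import numpy as np

# Hypothetical dataset: 1000 samples of 64 float32 features each.
n_samples, dim = 1000, 64
path = os.path.join(tempfile.mkdtemp(), "train.bin")

# Write the data once in the dtype/layout the GPU will consume directly,
# so loading requires no deserialization or transformation step.
rng = np.random.default_rng(0)
rng.standard_normal((n_samples, dim)).astype(np.float32).tofile(path)

# Memory-map it back: the OS streams pages on demand as batches are read.
data = np.memmap(path, dtype=np.float32, mode="r", shape=(n_samples, dim))

def batches(arr, batch_size):
    # Yield contiguous slices; a real loader would prefetch the next slice
    # (e.g. pinned memory + async copy to the GPU) while the current batch
    # is still computing, to keep the GPU from "waiting on IO".
    for start in range(0, len(arr), batch_size):
        yield arr[start:start + batch_size]

for batch in batches(data, 256):
    pass  # each batch is a zero-copy view into the mapped file
```

A framework-specific loader (e.g. with GPU prefetch queues) would replace the plain loop, but the storage-side idea is the same: keep the on-disk layout identical to the in-memory layout the accelerator needs.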

But I'm not an expert; this is just my take from the article.