Remix clone Hacker News

new | show | ask | jobs Github

	▲	otterley 3 days ago
		The idea is that in a pipeline of work, throughput is limited by the slowest component. H100 GPUs have a lot of memory bandwidth. The question then becomes how to eliminate any bottlenecks between the data store and the GPU's memory. First is the storage bottleneck. Network-attached storage is usually a bottleneck for uncached data. Then there is CPU work decoding data. Spiral claims that their table format is ready to load by the GPU so they can bypass various CPU-bound decoding stages. Once you eliminate storage and CPU bottlenecks, the remaining bottleneck is usually the PCI bus that sits between the host memory and the GPU, and they can't solve that themselves. (And no amount of parallelization can help when the bus is saturated.) What they can do is use the network, the host bus, and the GPU more efficiently by compressing and packing data with greater mechanical sympathy. They've left unanswered how they're going to commercialize it, but my guess is that they're going to use a proprietary fork of Vortex that provides extra performance or features, or perhaps they'll offer commercial services or integrations that make it easier to use. The open-source release gives its customers a Reason to Believe, in marketing parlance.