torginus 7 days ago

No, it's not slow. A single NAND chip in an SSD offers >1GB/s of bandwidth. Inside the chip there are 100+ wafers actually holding the data, but in SSDs only one of them is active at a time when reading or writing.

You could probably make special NAND chips where all of them can be active at the same time, which means you could get 100GB/s+ of bandwidth out of a single chip.

This would be useless for data storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.

slickytail 7 days ago | parent [-]

The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.

torginus 7 days ago | parent | next [-]

That bandwidth is for the whole GPU, which has 6 memory chips. But anyway, what I'm proposing isn't for the high end and training, but for making inference cheap.

And I was somewhat conservative with the numbers: a modern budget SSD with a single NAND chip can do more than 5GB/s read speed.
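
To put the numbers in this thread side by side, here is a rough back-of-envelope sketch. It assumes decoding is purely memory-bandwidth-bound, so every generated token requires reading all model weights once (dense model, batch size 1). The 70 GB model size is a hypothetical value chosen only for illustration; the 3TB/s, 6-chip, and 100GB/s figures are the ones quoted above.

```python
# Back-of-envelope comparison using the figures quoted in this thread.
# Assumption: token generation is bandwidth-bound, i.e. each token needs
# one full pass over the weights, so tokens/s <= bandwidth / model size.

H100_TOTAL_GBS = 3000.0   # ~3 TB/s for the whole GPU (figure from the thread)
H100_CHIPS = 6            # memory chips on the GPU (figure from the thread)
NAND_CHIP_GBS = 100.0     # proposed parallel-read NAND chip (from the thread)
MODEL_SIZE_GB = 70.0      # hypothetical model size, for illustration only


def per_chip_bandwidth_gbs(total_gbs: float, chips: int) -> float:
    """Split an aggregate bandwidth figure evenly across memory chips."""
    return total_gbs / chips


def max_tokens_per_second(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when every token reads all weights once."""
    return bandwidth_gbs / model_size_gb


h100_per_chip = per_chip_bandwidth_gbs(H100_TOTAL_GBS, H100_CHIPS)

print(f"H100 per memory chip: {h100_per_chip:.0f} GB/s")
print(f"Proposed NAND chip:   {NAND_CHIP_GBS:.0f} GB/s")
print(f"H100 upper bound:     {max_tokens_per_second(H100_TOTAL_GBS, MODEL_SIZE_GB):.1f} tokens/s")
print(f"One NAND chip bound:  {max_tokens_per_second(NAND_CHIP_GBS, MODEL_SIZE_GB):.1f} tokens/s")
```

Under these assumptions the per-chip gap is much smaller than the whole-GPU gap, and several such NAND chips read in parallel would scale the bound linearly. Real throughput would depend on batching, caching, and the model architecture, none of which this sketch models.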
