slickytail | 7 days ago
The memory bandwidth on an H100 is ~3 TB/s, for reference. That number is the limiting factor on the size of modern LLMs. 100 GB/s isn't even in the realm of viability.
torginus | 7 days ago | parent
That bandwidth is for the whole GPU, which has 6 memory chips. But anyway, what I'm proposing isn't for the high end or for training, but for making inference cheap. And I was somewhat conservative with the numbers: a modern budget SSD with a single NAND chip can do more than 5 GB/s of read speed.
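A quick back-of-the-envelope sketch of why bandwidth caps decode speed: in memory-bound autoregressive decoding, every weight is read once per generated token, so tokens/sec is roughly bandwidth divided by model size. The model size and bandwidth figures below are illustrative assumptions, not measurements.

```python
# Rough upper bound on decode rate when weight reads dominate:
# tokens/sec ~= memory bandwidth / model size (weights read once per token).

def tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound ceiling on tokens generated per second."""
    return bandwidth_gb_s / model_size_gb

# Assumed example: a 70B-parameter model in fp16 (2 bytes/param) ~= 140 GB.
model_gb = 140.0

hbm = tokens_per_sec(3000.0, model_gb)  # H100-class HBM, ~3 TB/s
ssd = tokens_per_sec(100.0, model_gb)   # hypothetical 100 GB/s NAND setup

print(f"HBM-bound:  {hbm:.1f} tok/s")   # ~21.4 tok/s
print(f"NAND-bound: {ssd:.2f} tok/s")   # ~0.71 tok/s
```

Under these assumptions the gap is about 30x, which is the crux of the disagreement: usable for cheap, slow inference, but nowhere near HBM-class serving.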