Remix clone Hacker News

new | show | ask | jobs Github

	▲	atq2119 2 days ago
		It's misleading to compare a desktop GPU against a data center GPU on these metrics. Blackwell data center tenor cores are different from Blackwell consumer tensor cores, and same for the AMD side. Also, the size of the native / atomic matrix fragment size isn't relevant for memory bandwidth because you can always build larger matrices out of multiple fragments in the register file. A single matrix fragment is read from memory once and used in multiple matmul instructions, which has the same effect on memory bandwidth as using a single larger matmul instruction.