vlovich123 2 days ago
I disagree that HW blocks for lower precision take up that much die space. Data center GPUs are useless for gaming because they're tuned that way. The H100 still has 24 render output units (the 4050 has 32) and 456 texture mapping units (the 4090 has 512). That's because there's only so much they can tune the HW architecture to one use case or the other without breaking some fundamental architectural assumptions. And consumer cards still come with tensor units and support for lower precision. This is because the HW costs and unit economics heavily favor a unified architecture that scales to different workloads over discrete implementations specific to a given market segment. They've also not bothered investing in the SW work to add the H100 to their consumer drivers so that it works well in games. That doesn't mean it's impossible, and none of that takes away from the fact that the H100 and consumer GPUs are much more similar than they are different and could theoretically be made to run the same workloads at comparable performance.