gdiamos 4 hours ago

I'm not sure why the article dismisses cost.

Let's say X=10% of the GPU area (~75mm^2) is dedicated to FP32 SIMD units. Assume FP64 units are ~2-4x bigger. That would be 150-300mm^2, a huge amount of area that would increase the price per GPU. You may not agree with these assumptions. Feel free to change them. It is an overhead that is replicated per core. Why would gamers want to pay for any features they don't use?
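Here is a tiny back-of-envelope script that plugs in the numbers above. It is a sketch, not a die model. The ~750mm^2 total die area is inferred from 10% being ~75mm^2, and every factor is an assumption you can change. It compares the 2-4x assumption here with the 10-50% overhead that replies in this thread argue for.

```python
# Hypothetical back-of-envelope model; every input is an assumption taken from the thread.
DIE_AREA_MM2 = 750.0   # inferred: 10% of the die is ~75mm^2
FP32_FRACTION = 0.10   # share of the die spent on FP32 SIMD units


def extra_area_mm2(size_factor: float) -> float:
    """Extra area if FP64 support costs `size_factor` times the FP32 SIMD area."""
    return DIE_AREA_MM2 * FP32_FRACTION * size_factor


cases = [
    ("separate FP64 units, 2x", 2.0),
    ("separate FP64 units, 4x", 4.0),
    ("shared units, 10% overhead", 0.1),
    ("shared units, 50% overhead", 0.5),
]
for label, factor in cases:
    extra = extra_area_mm2(factor)
    print(f"{label}: +{extra:.1f} mm^2 ({extra / DIE_AREA_MM2:.0%} of the die)")
```

The whole disagreement in the replies is really about `size_factor`: the same 75mm^2 base gives anywhere from a few mm^2 to a few hundred.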

Not to say there isn't market segmentation going on, but FP64 costs more on massively parallel processors than it did in the days of high-frequency single-core CPUs.

thesz 15 minutes ago

> Assume FP64 units are ~2-4x bigger.

This is a wrong assumption. FP64 usually uses the same circuitry as two FP32 units, adding not that much on top ((de)normalization, mostly).

Off the top of my head, the overhead is around 10% or so.

> Why would gamers want to pay for any features they don't use?

https://www.youtube.com/watch?v=lEBQveBCtKY

Apparently FP80, which is even wider than FP64, is beneficial for pathfinding algorithms in games.

Pathfinding for hundreds of units is a task worth putting on the GPU.

wtallis 3 hours ago

> Assume FP64 units are ~2-4x bigger.

I'm pretty sure that's not a remotely fair assumption to make. We've seen architectures that can, e.g., do two FP32 operations or one FP64 operation with the same unit, with relatively low overhead compared to a pure FP32 architecture. That's pretty much how all integer math units work, and it's not hard to pull off for floating point. FP64 units don't have to be, and seldom have been, implemented as massive single-purpose blocks of otherwise-dark silicon.

When the real hardware design choice is between having a reasonable 2:1 or 4:1 FP32:FP64 ratio vs having no FP64 whatsoever and designing a completely different core layout for consumer vs pro, the small overhead of having some FP64 capability has clearly been deemed worthwhile by the GPU makers for many generations. It's only now that NVIDIA is so massive that we're seeing them do five different physical implementations of "Blackwell" architecture variants.

jcranmer 3 hours ago

> Assume FP64 units are ~2-4x bigger.

I'm not a hardware guy, but an explanation I've seen from someone who is says that it's not much extra hardware to add to a 2×f32 FMA unit the capability to do 1×f64. You already have all of the per-bit logic, you mostly just need to add an extra control line to make a few carries propagate. So the size overhead of adding FP64 to the SIMD units is more like 10-50%, not 100-300%.
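To make the "extra control line" idea concrete, here is a minimal sketch that uses a plain integer adder rather than an FP FMA unit. This is a simplification: a real FP unit also needs exponent and (de)normalization logic. The hypothetical `add64` either adds two 64-bit words or, with the control flag set, blocks the carry at bit 32 so that the same adder acts as two independent 32-bit lanes.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add64(a: int, b: int, split_lanes: bool) -> int:
    """Hypothetical 64-bit adder that can also act as two independent 32-bit lanes."""
    a &= MASK64
    b &= MASK64
    # Low half: a 32-bit add that may carry out of bit 31.
    lo = (a & MASK32) + (b & MASK32)
    carry = lo >> 32
    # The "control line": in split mode the carry may not cross into the high lane.
    if split_lanes:
        carry = 0
    hi = (a >> 32) + (b >> 32) + carry
    # Each half wraps at 32 bits; together they form the 64-bit result.
    return ((hi & MASK32) << 32) | (lo & MASK32)


def pack(hi: int, lo: int) -> int:
    """Pack two 32-bit lanes into one 64-bit word."""
    return ((hi & MASK32) << 32) | (lo & MASK32)


a = pack(1, 0xFFFFFFFF)
b = pack(2, 1)
print(hex(add64(a, b, split_lanes=False)))  # 0x400000000: carry ripples into the high half
print(hex(add64(a, b, split_lanes=True)))   # 0x300000000: low lane wraps to 0 on its own
```

The only difference between the two modes is whether that one carry crosses the lane boundary.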

wmf 4 hours ago

> Why would gamers want to pay for any features they don't use?

Obviously they don't want to. Now flip it around and ask why HPC people would want to force gamers to pay for something that benefits the HPC people... Suddenly the blog post makes perfect sense.

rustyhancock 3 hours ago

Similar to when Nvidia released LHR GPUs that nerfed performance for Ethereum mining.

The NVIDIA GeForce RTX 3060 LHR, for example, tried to hinder mining at the BIOS level.

The point wasn't to make the average person lose out by preventing them from mining on their gaming GPU; it was to make miners less inclined to buy gaming GPUs. NVIDIA also released a series of crypto mining GPUs around the same time.

So, fairly typical market segmentation.

https://videocardz.com/newz/nvidia-geforce-rtx-3060-anti-min...