PathOfEclipse 6 days ago

I think it was always a mistake to pretend hyperthreading doubles your core count. I always assumed it was just laziness: the operating system treats a hyperthreaded core as two "virtual cores" and schedules them as if they were two real ones, so every other piece of tooling sees double the number of actual cores. There's no good reason I know of that a CPU utilization tool shouldn't use real cores when calculating percentages. But maybe that's hard to do given how the OS implements hyperthreading.
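
For what it's worth, the two counts are easy to tell apart from user code. A minimal Rust sketch, assuming the num_cpus crate (the crate choice is mine, for illustration only):

    // Query logical vs. physical core counts via the `num_cpus` crate.
    fn main() {
        let logical = num_cpus::get();            // what the scheduler exposes (SMT threads)
        let physical = num_cpus::get_physical();  // actual cores, SMT ignored
        println!("logical: {logical}, physical: {physical}");
        // A utilization tool could divide busy time by `physical` rather than
        // `logical` to report percentages against real cores.
    }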

fluoridation 6 days ago | parent [-]

>There's no good reason I know of that a CPU utilization tool shouldn't use real cores when calculating percentages

On AMD, threads may as well be cores. If you take a Ryzen and disable SMT, you're basically halving its parallelism, at least for some tasks. On Intel you're just turning off an extra 10-20%.

PathOfEclipse 6 days ago | parent [-]

Can you provide some links for this? A quick web search turns this up near the top, from 2024:

https://www.techpowerup.com/review/amd-ryzen-9-9700x-perform...

The benchmarks show a 10% drop in "application" performance when SMT is disabled, but an overall 1-3% increase in performance for games.

From a hardware perspective, I can't imagine how it could be physically possible to double performance by enabling SMT.

fluoridation 5 days ago | parent [-]

I don't. It's based on my own testing, not by disabling SMT but by running either <core_count> or <thread_count> parallel threads. It was my own code, so it's possible code that uses SIMD more heavily will see a less significant speed-up. It's also possible I just measured wrong. For what it's worth, running Cargo on a directory with -j16 and -j32 takes 58 and 48 seconds respectively.
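
Roughly, the test looked like this Rust sketch (the busy loop here is a stand-in workload, not the code I actually timed, and the 8/16 split assumes an 8-core/16-thread part):

    use std::thread;
    use std::time::Instant;

    // Stand-in CPU-bound work.
    fn busy_work(iters: u64) -> u64 {
        let mut x: u64 = 0;
        for i in 0..iters {
            x = x.wrapping_mul(6364136223846793005).wrapping_add(i);
        }
        x
    }

    // Time the same total amount of work split across `n_threads` threads.
    fn run_with(n_threads: usize, total_iters: u64) {
        let per_thread = total_iters / n_threads as u64;
        let start = Instant::now();
        let handles: Vec<_> = (0..n_threads)
            .map(|_| thread::spawn(move || busy_work(per_thread)))
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        println!("{n_threads} threads: {:?}", start.elapsed());
    }

    fn main() {
        let total: u64 = 2_000_000_000;
        run_with(8, total);  // <core_count>: physical cores
        run_with(16, total); // <thread_count>: logical SMT threads
    }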

>From a hardware perspective, I can't imagine how it could be physically possible to double performance by enabling SMT.

It depends on which parts of the processor your code uses. SMT works by duplicating some, but not all, of the components of each core, so a single core can work on multiple independent uops simultaneously. I don't know the specifics, but I can imagine ALU-type code (jumps, calls, movs, etc.) benefits more from SMT than very math-heavy code. That would explain why rustc saw a greater speedup than Cinebench: compiler code is very twisty, with not a lot of math.
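
As a rough illustration of the two extremes (both kernels are stand-ins I made up, not anything from the benchmarks above): a pointer-chasing loop spends most of its time stalled on dependent loads, the kind of twisty work a compiler does, so the sibling SMT thread has idle execution resources to soak up; a loop of independent multiply-adds keeps the FP units close to saturated, leaving little for a second hardware thread on the same core.

    // Latency-bound / "twisty": each load depends on the previous one, so the
    // core stalls often and the sibling SMT thread has slack to exploit.
    fn chase(order: &[usize], steps: usize) -> usize {
        let mut i = 0;
        for _ in 0..steps {
            i = order[i];
        }
        i
    }

    // Math-heavy: four independent accumulators keep the multiply-add units
    // mostly occupied, approximating SIMD-style number crunching.
    fn fma_heavy(steps: usize) -> f64 {
        let (mut a, mut b, mut c, mut d) = (1.0f64, 1.0f64, 1.0f64, 1.0f64);
        for i in 0..steps {
            let x = i as f64 * 1e-12;
            a = a.mul_add(1.0000001, x);
            b = b.mul_add(1.0000001, x);
            c = c.mul_add(1.0000001, x);
            d = d.mul_add(1.0000001, x);
        }
        a + b + c + d
    }

    fn main() {
        // Tiny sanity run; a real comparison would time each kernel at
        // <core_count> vs <thread_count> threads, as described above.
        let order: Vec<usize> = (0..4096usize).map(|i| (i * 7 + 1) % 4096).collect();
        println!("{} {}", chase(&order, 1_000_000), fma_heavy(1_000_000));
    }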