bhouston 5 hours ago

Haven't there been issues with AVX2 putting such a heavy load on the CPU that frequency scaling would kick in, in a lot of cases slowing down the whole CPU?

https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Dow...

My experience is that trying to get benefits from the vector extensions is incredibly hard and the use cases are very narrow. Having them in a standard BLAS implementation, sure, but outside of that I think they are not worth the effort.

jsheard 5 hours ago | parent | next [-]

Throttling was mainly an issue with AVX512, which is twice the width of AVX2, and only really on the early Skylake (2015) implementation. From your own source, Ice Lake (2019) barely flinches and Rocket Lake (2021) doesn't proactively downclock at all. AMD's implementation came later but was solid right out of the gate.

kbolino 4 hours ago | parent | prev | next [-]

This is a bit short-sighted. Yes, it is kinda tricky to get right, and a number of programming languages are quite behind on good SIMD support (though many are catching up).

SIMD is not limited to mathy linear algebra things anymore. Did you know that lookup tables can be accelerated with AVX2? A lot of branchy code can be vectorized nowadays using scatter/gather/shuffle/blend/etc. instructions. The benefits vary, but can be significant. I think a view of SIMD as just being a faster/wider ALU is out of date.

kccqzy 4 hours ago | parent | prev | next [-]

That’s only on very old CPUs. Getting benefits from vector extensions is incredibly easy if you do any kind of data crunching. A lot of integer operations not covered by BLAS can benefit, including modern hash tables.

vintagedave 2 hours ago | parent | prev | next [-]

Re hard to get benefits: a lot depends on the compiler. In Elements (the toolchain this article was tested with) we made a bunch of modifications to LLVM passes to prioritise vectorisation in situations where LLVM could vectorise but, by default, did not.

I've heard anecdotally that the old pre-LLVM Intel C++ Compiler also focused heavily on vectorisation and made some specific tradeoffs to achieve it. I think they use LLVM now too, and for all I know they've made modifications similar to ours. But we see a decent number of code patterns that can be, and now are, optimised.

adgjlsfhk1 an hour ago | parent | prev [-]

The modern approach is much more fine-grained throttling, so by the time it throttles you're already coming out ahead.