jandrewrogers a day ago
I also prefer fixed width. At least in C++, all of the padding, alignment, etc. is automagically codegen-ed for the register type in my use cases, so the overhead is approximately zero. All the complexity and cost is in specializing for the capabilities of the underlying SIMD ISA, not the width.

The benefit of fixed width is that optimal data structure and algorithm design on various microarchitectures is dependent on explicitly knowing the register width. SIMD widths aren’t perfectly substitutable in practice; there is more at play than stride size. You can also do things like explicitly combine separate logic streams in a single SIMD instruction based on knowing the word layout. Compilers don’t do this work in 2025.

The argument for vector width agnostic code seems predicated on the proverbial “sufficiently advanced compiler”. I will likely retire from the industry before such a compiler actually exists. Like fusion power, it has been ten years away my entire life.
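To make the "combine separate logic streams based on the word layout" point concrete, here is a minimal sketch, assuming GCC or Clang and their `__attribute__((vector_size))` vector extension. The function `sum_two` and the 4+4 lane split are hypothetical illustrations, not the commenter's code: because the register is known to hold exactly eight 32-bit lanes, two unrelated reductions can share one accumulator, with each vector add advancing both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-width vector: exactly eight 32-bit lanes (256 bits) via the
// GCC/Clang vector_size extension. The width is a compile-time fact.
typedef std::uint32_t u32x8 __attribute__((vector_size(32)));

// Hypothetical illustration: two independent reductions ("logic streams")
// fused into one vector. Lanes 0-3 accumulate array a, lanes 4-7 array b.
// Assumes n is a multiple of 4 to keep the sketch short.
inline void sum_two(const std::uint32_t* a, const std::uint32_t* b,
                    std::size_t n, std::uint32_t& sum_a, std::uint32_t& sum_b) {
    assert(n % 4 == 0);
    u32x8 acc = {0, 0, 0, 0, 0, 0, 0, 0};
    for (std::size_t i = 0; i < n; i += 4) {
        // Pack four elements of each stream into their own half of the register.
        u32x8 v = {a[i], a[i + 1], a[i + 2], a[i + 3],
                   b[i], b[i + 1], b[i + 2], b[i + 3]};
        acc += v;  // a single vector add serves both streams
    }
    // Reduce each half separately; the split at lane 4 is only meaningful
    // because the register width is fixed and known.
    sum_a = acc[0] + acc[1] + acc[2] + acc[3];
    sum_b = acc[4] + acc[5] + acc[6] + acc[7];
}
```

This is a layout illustration rather than a tuned kernel; real code would choose how to load and interleave the two streams for the target ISA.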
camel-cdr a day ago
> The argument for vector width agnostic code seems predicated on the proverbial “sufficiently advanced compiler”.

A SIMD ISA having a fixed size or not is orthogonal to autovectorization. E.g. I've seen a bunch of cases where things get autovectorized for RVV but not for AVX512. The reason isn't fixed vs. variable, but rather the supported instructions themselves.

There are two things I'd like from a “sufficiently advanced compiler”: sizeless struct support and redundant predicated load/store elimination. Those don't fundamentally add new capabilities, but they make working with and integrating into existing APIs easier.

> All the complexity and cost is in specializing for the capabilities of the underlying SIMD ISA, not the width.

Wow, it almost sounds like you could take basically the same code and run it with different vector lengths.

> The benefit of fixed width is that optimal data structure and algorithm design on various microarchitectures is dependent on explicitly knowing the register width

Optimal to what degree? Sure, fixed-width SIMD can always turn your pointer increments from a register add into an immediate add, so it's always more "optimal", but that sort of thing doesn't matter. The only difference you usually encounter when writing variable-size instead of fixed-size code is that you have to synthesize your shuffles outside the loop. This usually takes just a few instructions, but loading a constant is certainly easier.
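A minimal sketch of two claims above (the same code running at different vector lengths, and shuffles synthesized outside the loop), written as a scalar C++ emulation rather than real RVV intrinsics. The names `vlmax`, `make_pair_swap_indices`, and `swap_pairs` are hypothetical: `vlmax` stands in for the hardware vector length known only at run time, and the `std::min` call plays the role of RVV's `vsetvli`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shuffle indices synthesized once, before the loop, for whatever vlmax is:
// swap adjacent lane pairs (0<->1, 2<->3, ...). With fixed-width SIMD this
// would simply be a constant loaded from memory.
inline std::vector<std::size_t> make_pair_swap_indices(std::size_t vlmax) {
    std::vector<std::size_t> idx(vlmax);
    for (std::size_t i = 0; i < vlmax; ++i) idx[i] = i ^ 1;  // iota, then xor 1
    return idx;
}

// Swaps adjacent elements of `data` in place using a strip-mined,
// length-agnostic loop. Assumes vlmax and data.size() are even, so no
// pair straddles a strip boundary.
inline void swap_pairs(std::vector<std::uint32_t>& data, std::size_t vlmax) {
    assert(vlmax >= 2 && vlmax % 2 == 0 && data.size() % 2 == 0);
    const std::vector<std::size_t> idx = make_pair_swap_indices(vlmax);
    std::vector<std::uint32_t> reg(vlmax);  // emulated vector register
    for (std::size_t i = 0; i < data.size();) {
        std::size_t vl = std::min(vlmax, data.size() - i);  // like vsetvli
        for (std::size_t l = 0; l < vl; ++l) reg[l] = data[i + idx[l]];  // gather
        for (std::size_t l = 0; l < vl; ++l) data[i + l] = reg[l];       // store
        // Advance by the run-time vl (a register add); fixed-width code
        // could add an immediate here instead.
        i += vl;
    }
}
```

Calling `swap_pairs` with a `vlmax` of 4 or 8 produces the same result; only the index setup before the loop depends on the length.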