Remix.run Logo
jandrewrogers a day ago

I also prefer fixed width. At least in C++, all of the padding, alignment, etc is automagically codegen-ed for the register type in my use cases, so the overhead is approximately zero. All the complexity and cost is in specializing for the capabilities of the underlying SIMD ISA, not the width.

The benefit of fixed width is that optimal data structure and algorithm design on various microarchitectures is dependent on explicitly knowing the register width. SIMD widths aren’t not perfectly substitutable in practice, there is more at play than stride size. You can also do things like explicitly combine separate logic streams in a single SIMD instruction based on knowing the word layout. Compilers don’t do this work in 2025.

The argument for vector width agnostic code seems predicated on the proverbial “sufficiently advanced compiler”. I will likely retire from the industry before such a compiler actually exists. Like fusion power, it has been ten years away my entire life.

camel-cdr a day ago | parent [-]

> The argument for vector width agnostic code is seems predicated on the proverbial “sufficiently advanced compiler”.

A SIMD ISA having a fixed size or not is orthogonal to autovectorization. E.g. I've seen a bunch of cases where things get autovectorized for RVV but not for AVX512. The reason isn't fixed vs variable, but rather the supported instructions themselves.

There are two things I'd like from a "sufficiently advanced compiler”, which are sizeless struct support and redundant predicated load/store elimination. Those don't fundamentally add new capabilities, but makes working with/integrating into existing APIs easier.

> All the complexity and cost is in specializing for the capabilities of the underlying SIMD ISA, not the width.

Wow, it almost sounds like you could take basically the same code and run it with different vector lengths.

> The benefit of fixed width is that optimal data structure and algorithm design on various microarchitectures is dependent on explicitly knowing the register width

Optimal to what degree? Like sure, fixed-width SIMD can always turn your pointer increments from a register add to an immediate add, so it's always more "optimal", but that sort of thing doesn't matter.

The only difference you usually encounter when writing variable instead of fixed size code is that you have to synthesize your shuffles outside the loop. This usually just takes a few instructions, but loading a constant is certainly easier.

jandrewrogers 19 hours ago | parent [-]

The interplay of SIMD width and microarchitecture is more important for performance engineering than you seem to be assuming. Those codegen decisions are made at layer above anything being talked about here and they operate on explicit awareness of things like register size.

It isn’t “same instruction but wider or narrower” or anything that can be trivially autovectorized, it is “different algorithm design”. Compilers are not yet rewriting data structures and algorithms based on microarchitecture.

I write a lot of SIMD code, mostly for database engines, little of which is trivial “processing a vector of data types” style code. AVX512 in particular is strong enough of an ISA that it is used in all kinds of contexts that we traditionally wouldn’t think of as a good for SIMD. You can build all kinds of neat quasi-scalar idioms with it and people do.

synthos 11 hours ago | parent [-]

> Compilers are not yet rewriting data structures and algorithms based on microarchitecture.

Afaik mojo allows for this with the autotuning capability and metaprogramming