cjbgkagh a day ago

I have similar thoughts.

I don't understand the push for variable width SIMD. Possibly that's due to ignorance, but I see it as an abstraction that gets specialized for different hardware, so tradeoffs similar to those between low level and high level languages apply. Since I already have to be aware of hardware-level details, such as 256-bit shuffles not working across 128-bit lanes and the same instructions having very different performance characteristics on different CPUs, I'm already knee deep in hardware specifics. While in general I like abstractions, I've largely given up waiting for a 'sufficiently advanced compiler' that properly auto-vectorizes my code; I think AGI is more likely to arrive first. At a guess, the appeal is that such SIMD code could also run on GPUs, but GPUs have very different memory access costs, so the code there would end up completely different anyway.

So my view is: either create a much better high-level SIMD abstraction model with a sufficiently advanced compiler that knows all the tricks, or let me work directly at the hardware level.

As an outsider who doesn't really know what is going on, it does worry me a bit that WASM appears to be pushing for variable width SIMD instead of supporting the fixed-width ISAs CPUs generally provide. I guess it's a portability vs performance tradeoff. I worry that it may be difficult to make variable width as performant as fixed width, and I would prefer to handle portability with alternative branches at the code level.

  >> Finally, any software that wants to use the new instruction set needs to be rewritten (or at least recompiled). What is worse, software developers often have to target several SIMD generations, and add mechanisms to their programs that dynamically select the optimal code paths depending on which SIMD generation is supported.

Why not marry the two and have variable width SIMD as one of the ISA options? If variable width SIMD becomes more performant in the future, it would just be another branch to dynamically select.
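
For reference, the per-generation dispatch the quoted text describes usually looks something like the sketch below (C with GCC/Clang extensions; the kernel names are made up for illustration):

  #include <stddef.h>

  /* Same scalar loop twice; the target attribute tells the compiler which
     instruction set it may auto-vectorize each copy for. */
  __attribute__((target("avx2")))
  static void add_f32_avx2(float *dst, const float *a, const float *b, size_t n)
  {
      for (size_t i = 0; i < n; i++) dst[i] = a[i] + b[i];
  }

  __attribute__((target("sse2")))
  static void add_f32_sse2(float *dst, const float *a, const float *b, size_t n)
  {
      for (size_t i = 0; i < n; i++) dst[i] = a[i] + b[i];
  }

  /* Dispatch based on what the CPU actually reports at runtime.
     __builtin_cpu_supports is a GCC/Clang builtin for x86 targets. */
  void add_f32(float *dst, const float *a, const float *b, size_t n)
  {
      if (__builtin_cpu_supports("avx2"))
          add_f32_avx2(dst, a, b, n);
      else
          add_f32_sse2(dst, a, b, n);
  }

GCC (and newer Clang) can also generate this boilerplate automatically via the target_clones function attribute, but the principle is the same: one branch per SIMD generation, chosen at runtime.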
kevingadd 21 hours ago | parent

Part of the motive behind variable width SIMD in WASM is that there's (intentionally-ish) no mechanism for runtime feature detection in WASM. The whole module has to be valid on your target; you can't include a handful of invalid functions and execute them conditionally if the target supports 256-wide or 512-wide SIMD. If you want to adapt, you have to ship an entire module for each set of supported feature flags and select the correct one at startup, after probing what the target supports.

So variable width SIMD solves this by making any module using it valid regardless of whether the target supports 512-bit vectors, and the VM 'just' has to solve the problem of generating good code.

Personally I think this is a terrible way to do things and there should have just been a feature detection system, but the horse fled the barn on that one like a decade ago.

__abadams__ 18 hours ago | parent

It would be very easy to support 512-bit vectors everywhere, and just emulate them on most systems with a small number of smaller vectors. It's easy for a compiler to generate good code for this. Clang does it well if you use its built-in vector types (which can be any length). Variable-length vectors, on the other hand, are a very challenging problem for compiler devs. You tend to get worse code out than if you just statically picked a size, even if it's not the native size.
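
For what it's worth, this is roughly what "built-in vector types" means in practice (a minimal sketch using the Clang/GCC vector_size extension; the typedef name is made up):

  #include <stddef.h>

  /* A 512-bit (64-byte) vector of 16 floats, independent of what the
     hardware offers. The compiler lowers arithmetic on it to however many
     native vectors the target has, e.g. 4 x 128-bit on SSE or NEON. */
  typedef float f32x16 __attribute__((vector_size(64)));

  void add_f32x16(f32x16 *dst, const f32x16 *a, const f32x16 *b, size_t n)
  {
      for (size_t i = 0; i < n; i++)
          dst[i] = a[i] + b[i];   /* elementwise, no intrinsics needed */
  }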

Someone 4 hours ago | parent | next

> It would be very easy to support 512-bit vectors everywhere, and just emulate them on most systems with a small number of smaller vectors. It's easy for a compiler to generate good code for this

Wouldn’t that be suboptimal if/when CPUs that support 1024-bit vectors come along?

> Variable-length vectors, on the other hand, are a very challenging problem for compiler devs. You tend to get worse code out than if you just statically picked a size, even if it's not the native size.

Why would it be challenging? You could statically pick a size on a system with variable-length vectors, too. How would that be worse code?

jandrewrogers 18 hours ago | parent | prev | next

The risk of 512-bit vectors everywhere is that many algorithms will spill the registers pretty badly if implemented in e.g. 128-bit vectors under the hood. In such cases you may be better off with a completely different algorithm implementation.
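
To make the spilling concern concrete, here's a sketch using the same generic 512-bit vector type as above; the eight independent accumulators are just an illustrative number:

  #include <stddef.h>

  typedef float f32x16 __attribute__((vector_size(64)));

  /* A reduction that keeps 8 independent 512-bit accumulators live (a common
     trick to hide latency). On AVX-512 that is 8 of 32 zmm registers; lowered
     to 128-bit vectors it becomes 32 xmm-sized values, which already exceeds
     the 16 xmm registers of pre-AVX-512 x86-64 before counting the loads, so
     the compiler has to spill to the stack. (Tail handling omitted.) */
  float sum_f32(const f32x16 *src, size_t n)
  {
      f32x16 acc[8] = {{0}};
      for (size_t i = 0; i + 8 <= n; i += 8)
          for (size_t j = 0; j < 8; j++)
              acc[j] += src[i + j];

      float total = 0.0f;
      for (size_t j = 0; j < 8; j++)
          for (size_t k = 0; k < 16; k++)
              total += acc[j][k];
      return total;
  }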

codedokode 10 hours ago | parent | prev

Variable length vectors seem to be made for closed-source, manually written assembly: nobody wants to unroll loops by hand, and nobody is going to rewrite that code for each new register width.
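
For context, this is what a vector-length-agnostic loop looks like when written by hand against ARM SVE intrinsics (a rough sketch; assumes arm_sve.h and a compiler flag along the lines of -march=armv8-a+sve). The register width never appears in the source, which is exactly the property being argued about:

  #include <arm_sve.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Adds two float arrays without naming a vector width. svcntw() returns
     however many 32-bit lanes this CPU's registers hold, and the whilelt
     predicate masks off the tail on the final iteration, so there is no
     manual unrolling and no rewrite when registers get wider. */
  void add_f32(float *dst, const float *a, const float *b, size_t n)
  {
      for (size_t i = 0; i < n; i += svcntw()) {
          svbool_t    pg = svwhilelt_b32((uint64_t)i, (uint64_t)n);
          svfloat32_t va = svld1_f32(pg, &a[i]);
          svfloat32_t vb = svld1_f32(pg, &b[i]);
          svst1_f32(pg, &dst[i], svadd_f32_x(pg, va, vb));
      }
  }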