| ▲ | kevingadd 3 months ago |
| Part of the motive behind variable width SIMD in WASM is that there's intentionally-ish no mechanism to do feature detection at runtime in WASM. The whole module has to be valid on your target, you can't include a handful of invalid functions and conditionally execute them if the target supports 256-wide or 512-wide SIMD. If you want to adapt you have to ship entire modules for each set of supported feature flags and select the correct module at startup after probing what the target supports. So variable width SIMD solves this by making any module using it valid regardless of whether the target supports 512-bit vectors, and the VM 'just' has to solve the problem of generating good code. Personally I think this is a terrible way to do things and there should have just been a feature detection system, but the horse fled the barn on that one like a decade ago. |
|
| ▲ | __abadams__ 2 months ago | parent | next [-] |
| It would be very easy to support 512-bit vectors everywhere, and just emulate them on most systems with a small number of smaller vectors. It's easy for a compiler to generate good code for this. Clang does it well if you use its built-in vector types (which can be any length). Variable-length vectors, on the other hand, are a very challenging problem for compiler devs. You tend to get worse code out than if you just statically picked a size, even if it's not the native size. |
| |
| ▲ | jandrewrogers 2 months ago | parent | next [-] |
| The risk of 512-bit vectors everywhere is that many algorithms will spill the registers pretty badly if emulated under the hood with e.g. 128-bit vectors. In such cases you may be better off with a completely different algorithm implementation. |
| ▲ | Someone 2 months ago | parent | prev | next [-] |
| > It would be very easy to support 512-bit vectors everywhere, and just emulate them on most systems with a small number of smaller vectors. It's easy for a compiler to generate good code for this |
|
| Wouldn’t that be suboptimal if/when CPUs that support 1024-bit vectors come along? |
|
| > Variable-length vectors, on the other hand, are a very challenging problem for compiler devs. You tend to get worse code out than if you just statically picked a size, even if it's not the native size. |
|
| Why would it be challenging? You could statically pick a size on a system with variable-length vectors, too. How would that be worse code? |
| ▲ | kevingadd 2 months ago | parent | next [-] |
| Optimal performance in a vector algorithm typically requires optimizing around things like the number of available registers, whether the registers in use are volatile (mandating stack spills when calling other functions like a comparer), and sizes of sequences. If you know you're engineering for 16-byte vectors you can 'just' align all your data to 16 bytes. And if you know you have 8 vector registers where 4 of them are non-volatile you can design around that too. But without information like that you have to be defensive, like aligning all your data to 128 bytes instead Just In Case (heaven forbid native vectors get bigger than that), minimizing the number of registers you use to try and avoid stack spills, etc. (I mention this because WASM also doesn't expose any of this information.) |
|
| It's true that you could just design for a static size on a system with variable-length vectors. I suspect you'd see a lot of people do that, and potentially under-utilize the hardware's capabilities. Better than nothing, at least! |
| ▲ | dataflow 2 months ago | parent | prev [-] |
| > Wouldn’t that be suboptimal if/when CPUs that support 1024-bit vectors come along? |
|
| Is that likely or on anyone's roadmap? It makes a little less sense than 512 bits, at least for Intel, since their cache lines are 64 bytes, i.e. 512 bits. Any more than that and they'd have to mess with multiple cache lines all the time, not just on unaligned accesses. And they'd have to support crossing more than 2 cache lines on unaligned accesses. They could increase the cache line size too, but that seems terrible for compatibility since a lot of programs assume it's a compile-time constant (and it'd have performance overhead to make it a run-time value). Somehow it feels like this isn't the way to go, but hey, I'm not a CPU architect. |
| |
| ▲ | codedokode 2 months ago | parent | prev [-] |
| Variable length vectors seem to be made for closed-source manually-written assembly (nobody wants to unroll the loop manually and nobody will rewrite it for new register width). |
|
|
| ▲ | immibis 2 months ago | parent | prev [-] |
| You can write code that runs on many processors or code that takes advantage of the capabilities of one specific processor - not both. Is portability (write once, run anywhere) no longer a goal of WASM? Or will every SIMD instruction be slowly emulated when run on "wrong" processors? What if the interpreter is too old to support the instruction at all? |