codedokode 10 hours ago

I think that packed SIMD is better in almost every aspect and Vector SIMD is worse.

With vector SIMD you don't know the register size beforehand, so you have to maintain and increment counters. That adds extra instructions and reduces total performance. With packed SIMD you can issue several loads immediately without dependencies, and if you look at the code examples, you can see that the x86 code is denser: it uses a sequence of unrolled SIMD instructions with no extra instructions, which is more efficient. RISC-V, by contrast, has 4 SIMD instructions and 4 instructions dealing with counters per loop iteration, i.e. it wastes 50% of its instruction issue bandwidth, and you cannot load the next block until you increment the counter.
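To make the two loop shapes concrete, here is a rough sketch in plain Python rather than real SIMD code. The function names `add_packed` and `add_vector` and the `hw_lanes` parameter are made up for illustration, and `min(n - i, hw_lanes)` is only a simplified stand-in for a `vsetvli`-style length request. It shows the structure of each loop, not its performance.

```python
def add_packed(a, b, width=4):
    """Fixed-width ("packed") style: the lane count is known up front."""
    n = len(a)
    out = [0] * n
    i = 0
    # Main loop: full blocks of `width` lanes, no per-iteration length query.
    while i + width <= n:
        out[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
        i += width
    # Scalar tail for the leftover elements that do not fill a block.
    while i < n:
        out[i] = a[i] + b[i]
        i += 1
    return out


def add_vector(a, b, hw_lanes):
    """Vector-length-agnostic style: each iteration asks how many lanes it gets."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        # Hypothetical, simplified model of requesting a vector length.
        vl = min(n - i, hw_lanes)
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl  # the next block's start depends on this update
    return out
```

In this sketch the packed version needs a separate tail loop and is tied to one `width`, while the vector version runs unchanged for any `hw_lanes` but recomputes `vl` and the index on every iteration, which is the bookkeeping being argued about here.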

The article mentions that you have to recompile packed SIMD code when a new architecture comes out. Is that really a problem? Open-source software is recompiled every week anyway. You should just describe your operations in a high-level language that gets compiled to assembly for all supported architectures.

So as a conclusion, it seems that vector SIMD is optimized for manually written assembly and closed-source software, while packed SIMD is made for open-source software and compilers and is more efficient. I don't understand why the RISC-V community prefers the vector architecture.

IshKebab 5 hours ago | parent | next [-]

Those 4 counter instructions have no dependencies though so they'll likely all be issued and executed in parallel in 1 cycle, surely? Probably the branch as well in fact.

codedokode 4 hours ago | parent [-]

The load instruction has a dependency on the counter increment, while with packed SIMD one can issue several loads without waiting. Also, the extra counter instructions still waste CPU resources (unless there is some dedicated hardware for this purpose).

LoganDark 9 hours ago | parent | prev [-]

This comment sort of reminds me of how Transmeta CPUs relied on the compiler to precompute everything like pipelining. It wasn't done by the hardware.

codedokode 9 hours ago | parent [-]

Makes sense - writing or updating software is easier than designing or updating hardware. To illustrate: anyone can write software, but not everyone has access to chip manufacturing fabs.

LoganDark 3 hours ago | parent [-]

Atomic Semi may be looking to change that (...eventually)