xphos 14 hours ago

Personally, I think a load that also increments the address register in a single instruction is extremely valuable here. It's not quite the RISC model, but I think it is actually pretty significant in avoiding a von Neumann bottleneck with SIMD (the irony in this statement).

I found that a lot of the custom SIMD cores I've written for simply cannot issue instructions fast enough on RISC-V. Or when they can, it's in quick bursts, followed by increments and loop control that leave the engine idling for longer than you'd like.

Better dual issue helps, but when you have a separate vector queue you are sending things to, it's not that much to add increments into vloads and vstores.
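
To put rough numbers on the idling argument, here is a back-of-the-envelope sketch in C. The `loop_shape` struct, both function names, and the example counts are hypothetical and not taken from any real core; the model simply assumes every instruction takes one slot on a single-issue front end.

```c
/* Hypothetical per-iteration instruction mix of a vector loop body. */
struct loop_shape {
    int vector_ops;     /* vloads, vstores, vector arithmetic */
    int pointer_bumps;  /* separate scalar address increments */
    int loop_control;   /* counter update plus branch */
};

/* Scalar instructions issued per iteration that do no vector work.
 * If increments are fused into vloads/vstores, the pointer bumps vanish. */
static int scalar_overhead(struct loop_shape s, int fused_increment) {
    return (fused_increment ? 0 : s.pointer_bumps) + s.loop_control;
}

/* Total issue slots per iteration on a single-issue front end (an assumption). */
static int total_issued(struct loop_shape s, int fused_increment) {
    return s.vector_ops + scalar_overhead(s, fused_increment);
}
```

With an assumed body of 4 vector instructions, 3 pointer bumps, and 2 loop-control instructions, that is 9 issue slots per iteration without fused increments and 6 with them, so under this model the vector queue sees work in 4 of 9 slots versus 4 of 6.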

codedokode 9 hours ago

> load and increment address register in a single instruction is extremely valuable here

I think this is not important anymore because modern architectures allow you to add an offset to a register value, so you can write something like this, using the weird RISC-V syntax for addition:

    ld r2, 0(r1)
    ld r3, 4(r1)
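    ld r4, 8(r1)

These loads all read the same unchanged base register, so they are independent and can be executed in parallel. With auto-incrementing, each load needs the address register updated by the previous one, so you cannot do that.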
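
The same contrast can be sketched in C, purely as an illustration. The function names are made up, and an optimizing compiler may well emit identical code for both, so this only makes the source-level dependency visible rather than showing what any particular hardware does.

```c
#include <stdint.h>

/* Auto-increment style: every load takes its address from the pointer
 * value left behind by the previous load, forming a chain through p. */
static int32_t sum3_autoinc(const int32_t *p) {
    int32_t a = *p++;
    int32_t b = *p++;
    int32_t c = *p++;
    return a + b + c;
}

/* Base + offset style: all three addresses are computed from the same
 * unchanged base (byte offsets 0, 4, 8), so no load waits on another. */
static int32_t sum3_offset(const int32_t *p) {
    int32_t a = p[0];
    int32_t b = p[1];
    int32_t c = p[2];
    return a + b + c;
}
```

Both functions return the same sum; the only difference is whether the address of each load depends on the one before it.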