cogman10 7 hours ago

> AND the software with no architecture-specific optimisations

The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V. I do not believe this is a lack of software optimization issue.

We are well past the days when hand-written assembly gave much benefit, and modern compilers like gcc and llvm do nearly identical work right up until it comes to instruction emission (including determining where SIMD instructions could be placed).

Unless these chips have very very weird performance characteristics (like the weirdness around x86's lea instruction being used for arithmetic) there's just not going to be a lot of missed heuristics.

hrmtst93837 6 hours ago

One thing compilers still struggle with is exploiting weird microarchitectural quirks or timing behaviors that aren't obvious from the ISA spec, especially with memory, cache and pipeline tuning. If a new RISC-V core doesn't expose the same prefetching tricks or has odd branch prediction, you won't get parity just by porting the same backend. If you want peak numbers, sometimes you do still need to tune libraries or even sprinkle in a bit of inline asm, despite all the "let the compiler handle it" dogma.

cogman10 6 hours ago

While true, it's typically not going to be impactful on system performance.

There's a reason, for example, why the linux distros all target a generic x86 architecture rather than a specific architecture.

spockz 6 hours ago

Not all. CachyOS has specific builds for v3, v4, and AMD Zen4/5: https://wiki.cachyos.org/features/optimized_repos/

thesuperbigfrog 4 hours ago

Ubuntu recently added a more specific target for AMD64v3:

https://discourse.ubuntu.com/t/introducing-architecture-vari...

adrian_b 6 hours ago

Some applications may target a generic x86 architecture without any impact on performance.

However, other applications which must do cryptographic operations, audio/video processing, scientific/technical/engineering computing, etc. may have wildly different performances when compiled for different x86-64 ISA versions, for which dedicated assembly-language functions exist.

cogman10 5 hours ago

Granted, these applications do exist. They are simply becoming more and more rare. I'd also say that there's been a pretty steady, dedicated effort toward abstracting the assembly. It's still pretty low level, in that you care about the specific instructions being used, but in both C++ and Rust it's not quite assembly either.

Java, interestingly enough, is somewhat leading the way here with their Vector API. I think they actually have one of the better setups for allowing someone to write fast code that is platform independent.

C++ is also diving into this realm. C++26 just merged in a new SIMD library.

That is the bulk of the benefit of diving down into assembly.

https://en.cppreference.com/w/cpp/numeric/simd.html
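
As a rough illustration of the "low level but not quite assembly" style described above, here is a minimal sketch in C. It uses the GCC/Clang `vector_size` extension rather than the Java Vector API or C++26 `std::simd`, so it is an analogy and not either of those APIs. The names `f32x4` and `saxpy` are made up for the example.

```c
#include <stddef.h>
#include <string.h>

/* Four packed floats. vector_size is a GCC/Clang extension, not ISO C.
   f32x4 is a hypothetical name chosen for this sketch. */
typedef float f32x4 __attribute__((vector_size(16)));

/* Computes y[i] += a * x[i] for i in [0, n). */
void saxpy(float a, const float *x, float *y, size_t n) {
    f32x4 va = {a, a, a, a};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        f32x4 vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* load without alignment assumptions */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                  /* element-wise multiply and add */
        memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; i++) {
        y[i] += a * x[i];               /* scalar tail for leftover elements */
    }
}
```

The same source should compile for x86-64, AArch64 and RISC-V, with the compiler choosing whatever vector instructions the target offers (or scalar code if there are none).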

adrian_b 5 hours ago

I would not say that such applications are becoming more and more rare.

Most of the applications whose performance matters to me, because I must wait a non-negligible time for them to do their job, depend on assembly implementations of certain functions invoked inside critical loops. I do not see any sign of replacements for them. On the contrary, Intel, AMD and Arm continue to introduce special instructions that are useful in certain niche applications, and taking advantage of them will require more assembly-language functions, not fewer.

For me, there is only one application that I use and which consumes non-negligible computer time and which does not depend on SIMD optimizations, which is the compilation of software projects.

CyberDildonics 2 hours ago

> audio/video processing, scientific/technical/engineering computing, etc. may have wildly different performances when compiled for different x86-64 ISA versions

This is pretty vague and makes it sound like there are big differences in instruction sets.

In actuality it comes down to memory access first, which has nothing to do with instructions.

After that it comes down to simple SIMD/AVX instructions and not some exotic entirely different instruction set.

CyberDildonics 2 hours ago

The things you are talking about are taken care of by out-of-order execution and the CPU itself being smart about how it executes. Putting in prefetch instructions rarely beats the actual prefetcher itself. Compilers didn't end up generating perfect Pentium asm either. OOO execution is what changed the game in not needing perfect compiler output any more.

bobmcnamara 6 hours ago

> The optimizations that'd be applied to ARM and MIPS would be equally applicable to RISC-V.

There's no carry bit, and no widening multiply (or MAC).
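
To make the carry-bit point concrete, here is a minimal sketch of multi-word addition written without any carry flag. The carry out of the low word is recovered with an unsigned comparison, which a RISC-V compiler would typically lower to an `sltu`. The type `u128_limbs` and the function `add128` are hypothetical names for this example.

```c
#include <stdint.h>

/* A 128-bit value stored as two 64-bit limbs (illustrative type). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_limbs;

/* 128-bit addition with no carry flag available. */
u128_limbs add128(u128_limbs a, u128_limbs b) {
    u128_limbs r;
    r.lo = a.lo + b.lo;              /* wraps modulo 2^64 */
    uint64_t carry = (r.lo < a.lo);  /* 1 exactly when the low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

On an ISA with a carry flag the same operation is usually an add followed by an add-with-carry. Here it costs one extra compare per limb, which matters mostly in bignum and crypto code.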

Findecanor 4 hours ago

RISC-V splits widening multiply out into two instructions: one for the high bits and one for the low. Just like 64-bit ARM does.
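
A minimal sketch of what that split looks like from C, assuming a 64-bit target and a compiler that supports `unsigned __int128` (a GCC/Clang extension, not ISO C). The function names are made up for the example. The low half would typically become a `mul`, and the high half a `mulhu` on RV64 or a `umulh` on AArch64.

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit product. */
uint64_t mul_lo(uint64_t a, uint64_t b) {
    return a * b;
}

/* High 64 bits of the 128-bit product.
   unsigned __int128 is a compiler extension assumed to be available. */
uint64_t mul_hi(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
```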

Integer MAC doesn't exist, and is also hindered by a design decision not to require more than two source operands, so as to allow simple implementations to stay simple. The same reason also prevents RISC-V from having a true conditional move instruction: there is one but the second operand is hard-coded zero.
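
Assuming the conditional-zero instructions meant here are `czero.eqz` and `czero.nez` from the Zicond extension, a full select can be built from two of them plus an `or`. The C functions below only model the semantics of those instructions. They are not intrinsics, and the names are chosen for this sketch.

```c
#include <stdint.h>

/* Models czero.eqz: result is 0 if cond == 0, otherwise value. */
uint64_t czero_eqz(uint64_t value, uint64_t cond) {
    return cond == 0 ? 0 : value;
}

/* Models czero.nez: result is 0 if cond != 0, otherwise value. */
uint64_t czero_nez(uint64_t value, uint64_t cond) {
    return cond != 0 ? 0 : value;
}

/* cond ? a : b, built from the two conditional-zero operations.
   Exactly one of the two terms is zeroed, so OR merges them. */
uint64_t select_u64(uint64_t cond, uint64_t a, uint64_t b) {
    return czero_eqz(a, cond) | czero_nez(b, cond);
}
```

That is three two-source instructions where an ISA with a true conditional move needs one three-source instruction.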

FMAC exists, but only because it is in the IEEE 754 spec ... and it requires significant op-code space.