pornel 8 hours ago

The point of updating the instructions isn't to get optimal behavior in all cases, or to reconfigure programs for wildly different hardware. It's to let software easily target contemporary hardware, without waiting for the oldest hardware to die out before a less outdated baseline can be assumed without conditional dispatch.

Users are much more forgiving about software that runs a bit slower than software that doesn't run at all. ~95% of x86_64 CPUs have AVX2 support, but compiling binaries to unconditionally rely on it makes the remaining users complain. If it was merely slower on potato hardware, it'd be an easier tradeoff to make.
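For illustration, conditional dispatch in C looks roughly like this with the GCC/Clang builtins (a minimal sketch: the function names are made up for the example, and a real library would cache the dispatch result in a function pointer instead of re-checking on every call):

```c
/* Sketch of the conditional-dispatch approach the baseline problem forces on
 * you today: detect AVX2 at runtime and fall back to scalar code, instead of
 * compiling the whole binary with -mavx2 and faulting on older CPUs.
 * sum_avx2/sum_scalar are illustrative names. Assumes GCC or Clang on x86_64. */
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

__attribute__((target("avx2")))
static int64_t sum_avx2(const int32_t *v, size_t n) {
    __m256i acc = _mm256_setzero_si256();   /* 8 x 32-bit accumulators */
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(v + i)));
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int64_t s = 0;
    for (int k = 0; k < 8; k++) s += lanes[k];  /* horizontal reduction */
    for (; i < n; i++) s += v[i];               /* scalar tail */
    return s;
}

static int64_t sum_scalar(const int32_t *v, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

int64_t sum(const int32_t *v, size_t n) {
    /* Runtime check: the binary still loads and runs on pre-AVX2 CPUs. */
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(v, n);
    return sum_scalar(v, n);
}
```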

This is the norm on GPUs thanks to shader recompilation (the results are far from optimal for every GPU, but the code at least gets to use the instruction set of the hardware it's running on, instead of being limited to the lowest common denominator). On CPUs it happens only in limited cases: Zen 4 added AVX-512 by executing each 512-bit operation as two 256-bit halves, and plenty of less critical instructions are emulated in microcode, but that's done by the hardware, because our software isn't set up for it.

Compilers already need to make assumptions about pipeline widths and instruction latencies, so the code is tuned for specific CPU vendors/generations anyway, and that tuning doesn't get updated. Less explicitly, optimized code also makes assumptions about cache sizes and compute vs memory trade-offs. Code may need an L1 cache of a certain size to work best, but it still runs on CPUs with a smaller L1 cache, just slower. Imagine how annoying it would be if your code couldn't take advantage of a larger L1 cache without crashing on older CPUs. That's where CPUs are with SIMD.
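As a rough sketch of that kind of cache-size assumption, here's a blocked matrix transpose whose tile size is an illustrative guess at a typical 32 KiB L1D (the constant is arbitrary, not a universal rule). On a CPU with a smaller L1 it still gives the right answer, just with more misses; an instruction-set assumption would simply fault:

```c
/* Blocked transpose tuned for an assumed L1D size. Two 8 KiB tiles (src and
 * dst) comfortably fit a typical 32 KiB L1D cache; on a smaller cache the
 * code is merely slower, never incorrect. TILE is an illustrative number. */
#include <stddef.h>

#define TILE 32  /* 32*32 doubles = 8 KiB per tile */

void transpose(double *dst, const double *src, size_t n) {
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE)
            /* Work on one TILE x TILE block at a time to keep it cache-resident. */
            for (size_t i = ii; i < ii + TILE && i < n; i++)
                for (size_t j = jj; j < jj + TILE && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```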

almostgotcaught 7 hours ago

i have no idea what you're saying - i'm well aware that compilers do lots of things but this sentence in your original comment

> compiled machine code exactly as-is, and had a instruction-updating pass

implies there should be silicon that implements the instruction-updating - what else would be "executing" compiled machine code other than the machine itself...........

pornel 4 hours ago

I was talking about a software pass. Currently, the machine code stored in executables (such as ELF or PE) is only slightly patched by the dynamic linker, and then expected to be directly executable by the CPU. The code in the file has to be already compatible with the target CPU, otherwise you hit illegal instructions. This is a simplistic approach, dating back to when running executables was just a matter of loading them into RAM and jumping to their start (old a.out or DOS COM).

What I'm suggesting is adding a translation/fixup step after loading a binary, before the code is executed, to make it more tolerant of hardware changes. It doesn't have to be a full recompilation from an abstract portable bytecode, nor even as involved as PTX to SASS, but more like a peephole optimizer for the same OS on the same general CPU architecture. For example, on a pre-AVX2 x86_64 CPU, the OS could scan for AVX2 instructions and patch them to do equivalent work using SSE or scalar instructions. There are implementation and compatibility issues that make it tricky, but fundamentally it should be possible. Wilder things like x86_64-to-aarch64 translation have been done, so let's do it for x86_64-v4 to x86_64-v1 too.
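To make the shape of that concrete, here's a very rough sketch of such a fixup pass. The decoder, CPUID query, and patching helpers are hypothetical placeholders, and real problems (relocations, instruction-length changes, patching mapped code atomically) are glossed over:

```c
/* Conceptual load-time fixup pass, not a working rewriter. decode_insn(),
 * patch_to_emulation_call(), and cpu_has_avx2() are hypothetical helpers;
 * a real implementation would sit on top of a disassembler and deal with
 * relocation and code-size issues this sketch ignores. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t length;      /* bytes consumed by this instruction */
    bool   needs_avx2;  /* uses an extension the host CPU may lack */
} insn_info;

insn_info decode_insn(const uint8_t *p);                  /* hypothetical */
void patch_to_emulation_call(uint8_t *p, size_t len);     /* hypothetical */
bool cpu_has_avx2(void);                                  /* hypothetical */

/* Run once by the loader over each code segment, after mapping it writable
 * and before marking it executable. */
void fixup_code_segment(uint8_t *code, size_t size) {
    if (cpu_has_avx2())
        return;  /* nothing to patch on a capable CPU */

    for (size_t off = 0; off < size; ) {
        insn_info in = decode_insn(code + off);
        if (in.needs_avx2)
            /* Redirect to a routine doing the same work with SSE/scalar code. */
            patch_to_emulation_call(code + off, in.length);
        off += in.length;
    }
}
```

In practice the patch would likely be a jump to a thunk rather than an in-place rewrite, since an emulated sequence rarely fits in the original instruction's footprint.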

almostgotcaught 43 minutes ago

that's certainly more reasonable, so i'm sorry for being so flippant. but even for this idea i wager the juice is not worth the squeeze outside of stuff like Rosetta, as you alluded to, where the value was extremely high (retaining x86 customers).