▲ | jcranmer 4 days ago | |
> Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

Not really. There are a couple of reasons to reach for handwritten assembly, and in every case, IR is just not the right choice:

If your goal is to ensure vector code, your first choice is to try slapping explicit vectorize-me pragmas onto the loop. If that fails, your next effort is either to use generic or arch-specific vector intrinsics (or jump to something like ISPC, a language for writing SIMT-like vector code). You don't really gain anything in this use case from jumping to IR, since the intrinsics already cover what you need.

If your goal is to work around compiler suboptimality in register allocation or instruction selection... well, writing it in IR gives the compiler a very high likelihood of simply recanonicalizing the exact sequence you wrote into the same sequence the original code would have produced, so the generated code ends up no different.

Compiler IR doesn't add anything to the code; it just creates an extra layer that uses an unstable and harder-to-use interface for writing code. To produce the best handwritten version of assembly in these cases, you have to go straight to writing the assembly you wanted anyways.
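For concreteness, here is a rough sketch of the first two rungs of that ladder (pragma first, then intrinsics) on a toy saturating byte add. The kernel and the names `add_sat_u8_pragma` and `add_sat_u8_intrin` are made up for illustration and are not taken from any real codebase. It assumes GCC or Clang for the pragmas and uses SSE2 only when `__SSE2__` is defined, falling back to scalar code otherwise.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics, x86 only */
#endif

/* Rung 1: a plain loop plus a compiler-specific vectorization pragma.
 * Whether the loop is actually vectorized is still up to the compiler.
 * "restrict" promises that dst does not overlap a or b. */
void add_sat_u8_pragma(uint8_t *restrict dst, const uint8_t *restrict a,
                       const uint8_t *restrict b, size_t n)
{
#if defined(__clang__)
#pragma clang loop vectorize(enable)
#elif defined(__GNUC__)
#pragma GCC ivdep /* asserts there are no loop-carried dependencies */
#endif
    for (size_t i = 0; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}

/* Rung 2: arch-specific intrinsics. _mm_adds_epu8 is an unsigned
 * saturating add on 16 bytes at a time; the tail is handled in scalar code. */
void add_sat_u8_intrin(uint8_t *dst, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Both functions should produce identical output for non-overlapping buffers. The intrinsics version pins down the instruction choice on x86, while the pragma version only asks the compiler to vectorize.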
▲ | astrange 4 days ago | |
Loop vectorization doesn't work for ffmpeg's needs because the kernels are too small and specialized. It works better for scientific/numeric computing. You could invent a DSL for writing the kernels in… but they did, it's x86inc.asm. I agree ispc is close to something that could work.