duped 4 days ago

Normally you spin up a tool like vtune or uprof to analyze your benchmark hotspots at the ISA level. No idea about tools like that for ARM.

> Would it ever make sense to write handwritten compiler intermediate representation like LLVM IR instead of architecture-specific assembly?

IME, not really. I've done a fair bit of hand-written assembly, and it comes up exclusively when dealing with architecture-specific problems. For everything else you can just write C (unless you hit one of the edge cases where C's semantics don't let you express what you want, but those are rare).

For example: C and C++ compilers are really, really good at generating optimized code in general. Where they tend to be worse is vectorized code, which requires you to redesign algorithms so they can use fast vector instructions. Even then, you have to resort to compiler intrinsics to use those instructions at all, and intrinsics can still lead to some bad codegen. So your code winds up being non-portable, looks like assembly, and has some overhead just because of what the compiler emits (and can't optimize). So you wind up just writing it in asm anyway, and get smarter about the things the compiler normally worries about, like register allocation and instruction ordering.
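To make the intrinsics point concrete, here is a rough sketch of a float sum written both ways. The function names (sum_scalar, sum_simd) are made up for illustration, and the SIMD path assumes an x86 target with SSE; on anything else it just falls back to the scalar loop. Note that the algorithm had to change: instead of one running total there are four independent partial sums, which also means the floating-point additions happen in a different order than in the scalar version.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics; x86 only */
#endif

/* Plain C version: one accumulator, additions in strict left-to-right order. */
float sum_scalar(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hypothetical intrinsics version: four partial sums held in one 128-bit
 * register, combined at the end. Non-portable and reads a lot like asm. */
float sum_simd(const float *x, size_t n) {
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));  /* unaligned load of 4 floats */

    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s += x[i];
    return s;
#else
    return sum_scalar(x, n);  /* no SSE available: scalar fallback */
#endif
}
```

Whether the second version is actually faster than what the compiler does with the first depends on the compiler, flags, and target, which is exactly the measurement problem.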

But the real problem once you get into this domain is that you simply cannot tell at a glance whether hand-written assembly is "better" (insert your metric for "better" here) than what the compiler emits. You must measure and benchmark, and those benchmarks have to be meaningful.
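As a minimal sketch of what "measure" can look like, assuming a POSIX system with clock_gettime: run the candidate several times after a warm-up call and keep the fastest run, writing the result through a volatile field so the compiler cannot throw the work away. All names here (bench_min_ns, work_sum, struct work) are hypothetical, and taking the minimum is just one common choice of statistic, not the only defensible one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Calls fn(arg) once as a warm-up, then reps more times, and returns the
 * fastest single call in nanoseconds (INT64_MAX if reps <= 0). */
int64_t bench_min_ns(void (*fn)(void *), void *arg, int reps) {
    fn(arg);  /* warm-up: caches, page faults, lazy binding */
    int64_t best = INT64_MAX;
    for (int r = 0; r < reps; r++) {
        int64_t t0 = now_ns();
        fn(arg);
        int64_t dt = now_ns() - t0;
        if (dt < best)
            best = dt;
    }
    return best;
}

/* Example workload: sum an array and store the result through a volatile
 * field so the loop is not optimized out. */
struct work {
    const float *x;
    size_t n;
    volatile float out;
};

void work_sum(void *p) {
    struct work *w = p;
    float s = 0.0f;
    for (size_t i = 0; i < w->n; i++)
        s += w->x[i];
    w->out = s;
}
```

For the numbers to mean anything, the input should be large enough that one call takes far longer than the timer's resolution, and both candidates should be timed on the same data with the same compiler flags.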

Sesse__ 4 days ago | parent [-]

> Normally you spin up a tool like vtune or uprof to analyze your benchmark hotspots at the ISA level. No idea about tools like that for ARM.

perf is included with the Linux kernel, and works with a fair number of architectures (including Arm).

godelski 4 days ago | parent | next [-]

You may still need to install linux-tools to get the perf command.

Sesse__ 4 days ago | parent [-]

It's included with the kernel as distributed by upstream. Your distribution may choose to split out parts of it into other binary packages.

godelski 4 days ago | parent [-]

I'm not disagreeing, I just wanted to add that so others know why they might not be able to just run the command.

duped 4 days ago | parent | prev [-]

perf doesn't give you instruction-level profiling, does it? I thought the traces were mostly at the symbol level.

Sesse__ 4 days ago | parent [-]

Hit enter on the symbol, and you get instruction-level profiles. Or use perf annotate explicitly. (The profiles are inherently instruction-level, but the default perf report view aggregates them into function-level for ease of viewing.)