▲ | astrange 4 days ago | ||||||||||||||||||||||
So the main issues here are not what people think they are. They generally aren't "suboptimal assembly", at least not what you can reasonably expect out of a C compiler. The factors are something like: - specialization: there's already a decent plain-C implementation of the loop, asm/SIMD versions are added on for specific hardware platforms. And different platforms have different SIMD features, so it's hard to generalize them. - predictability: users have different compiler versions, so even if there is a good one out there not everyone is going to use it. - optimization difficulties: C's memory model specifically makes optimization difficult here because video is `char *` and `char *` aliases everything. Also, the two kinds of features compilers add for this (intrinsics and autovectorization) can fight each other and make things worse than nothing. - taste: you could imagine a better portable language for writing SIMD in, but C isn't it. And on Intel C with intrinsics definitely isn't it, because their stuff was invented by Microsoft, who were famous for having absolutely no aesthetic taste in anything. The assembly is /more/ readable than C would be because it'd all be function calls with names like `_mm_movemask_epi8`. | |||||||||||||||||||||||
▲ | derf_ 3 days ago | parent | next [-] | ||||||||||||||||||||||
One time I spent a week carefully rewriting all of the SIMD asm in libtheora, really pulling out all of the stops to go after every last cycle [0], and managed to squeeze out 1% faster total decoder performance. Then I spent a day reorganizing some structs in the C code and got 7%. I think about that a lot when I decide what optimizations to go after. [0] https://gitlab.xiph.org/xiph/theora/-/blob/main/lib/x86/mmxl... is an example of what we are talking about here. | |||||||||||||||||||||||
| |||||||||||||||||||||||
▲ | ack_complete 3 days ago | parent | prev [-] | ||||||||||||||||||||||
> And on Intel C with intrinsics definitely isn't it, because their stuff was invented by Microsoft, who were famous for having absolutely no aesthetic taste in anything. Wouldn't Intel be the one defining the intrinsics? They're referenced from the ISA manuals, and the Intel Intrinsics Guide regularly references intrinsics like _allow_cpu_features() that are only supported by the Intel compiler and aren't implemented in MSVC. | |||||||||||||||||||||||
|