Remix.run Logo
ack_complete 2 days ago

That's with a simple data operation and using a recent x86 vector ISA (AVX-512) that is only available on some systems, notably excluding any current Intel desktop CPU.

The real killer isn't the data operations, though, it's if the overflow checks interfere with converting the loop logic or data addressing to vectorizable form. Indexing with 32-bit signed int vs. unsigned int on a 64-bit platform in C is a classic case -- with unsigned the compiler cannot assume that addressing offsets don't wrap, which then prevents coalescing data accesses into vector loads and stores.