perihelions 3 days ago

I don't know ARM, but an alternate approach, if it's available, is to store the query constants as bitmasks in SIMD registers, and use the input bytes as indices into those constants, using a shuffle instruction. Two levels, to pull out a bit from a 256-bit mask: part of an input byte is used to index a byte (SIMD shuffle), and another part indexes a bit within the byte (bit shifts).

The idea being that the cost is constant in the size of the query set.
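
To make the two-level lookup concrete, here is a minimal scalar sketch in C. It is only a model of the idea, not a SIMD implementation: the array index stands in for what a byte shuffle would do per lane, and a real version would have to split the 32-byte mask across registers. The names `byte_set`, `byte_set_add`, `byte_set_from` and `byte_set_contains` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a 256-bit membership mask, one bit per possible byte value.
   Bit (c & 7) of bytes[c >> 3] is set when byte value c is in the set. */
typedef struct {
    uint8_t bytes[32];
} byte_set;

/* Add one byte value to the set. */
static void byte_set_add(byte_set *s, uint8_t c) {
    s->bytes[c >> 3] |= (uint8_t)(1u << (c & 7));
}

/* Build a set from n byte values. */
static byte_set byte_set_from(const char *chars, size_t n) {
    byte_set s = {{0}};
    for (size_t i = 0; i < n; i++) {
        byte_set_add(&s, (uint8_t)chars[i]);
    }
    return s;
}

/* Two-level lookup. The cost does not depend on how many values are in the set. */
static int byte_set_contains(const byte_set *s, uint8_t c) {
    /* Level 1: the high 5 bits pick a mask byte (the step a shuffle would do). */
    uint8_t selected = s->bytes[c >> 3];
    /* Level 2: the low 3 bits pick a bit inside that byte (the shift step). */
    return (selected >> (c & 7)) & 1;
}
```

Usage would look like `byte_set ws = byte_set_from(" \t\n\v\f\r", 6);` followed by `byte_set_contains(&ws, c)` for each input byte.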

ncruces 3 days ago | parent [-]

But that's slower for small query sizes.

This describes a few algorithms: http://0x80.pl/notesen/2018-10-18-simd-byte-lookup.html

Both the alternative version by Geoff Langdale, and the special case for small sets, are substantially similar to the algorithms used in Hyperscan (truffle and shufti). https://github.com/intel/hyperscan

Having something hard-coded for whitespace can be much faster, especially since 5 of the 6 characters form a contiguous range: a wrap-around subtraction and an unsigned less-than handle those 5; an equality compare handles the other.
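
A scalar sketch of that check in C, assuming the six characters are the usual C `isspace` set (`'\t'` through `'\r'`, i.e. 9 to 13, plus `' '`). The function name `is_space` is made up here; a SIMD version would do the same subtract, compare and OR per lane.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 for ' ', '\t', '\n', '\v', '\f', '\r', else 0. */
static int is_space(uint8_t c) {
    /* '\t'..'\r' is 9..13. Subtracting 9 in 8-bit unsigned arithmetic wraps
       anything below 9 around to a large value, so a single unsigned
       less-than tests both ends of the range. */
    int in_range = (uint8_t)(c - '\t') < 5;
    /* The one character outside the range gets its own equality compare. */
    int is_blank = (c == ' ');
    return in_range | is_blank;
}
```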