perihelions 3 days ago
I don't know ARM, but an alternate approach, if it's available, is to store the query constants as bitmasks in SIMD registers, and use the input bytes as indices into those constants via a shuffle instruction. It takes two levels to pull one bit out of a 256-bit mask: part of the input byte indexes a byte of the mask (SIMD shuffle), and the other part indexes a bit within that byte (bit shifts). The idea being, the cost is constant in the size of the query set.
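For concreteness, here is a scalar sketch of that two-level lookup. It is only a model of the idea, not SIMD code: the names `byte_set`, `byte_set_init`, and `byte_set_contains` are made up for illustration, and the split of the input byte (upper 5 bits pick one of 32 mask bytes, lower 3 bits pick the bit) is an assumption about how the 256-bit mask would be laid out. In a real SIMD version the array load would be replaced by a shuffle or table-lookup instruction applied to every lane at once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit membership mask: one bit per possible byte value. */
typedef struct {
    uint8_t bits[32];
} byte_set;

/* Build the mask from a list of member bytes. */
static void byte_set_init(byte_set *s, const unsigned char *members, size_t n)
{
    memset(s->bits, 0, sizeof s->bits);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = members[i];
        s->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
    }
}

/* Two-level lookup: the work is the same no matter how many members
 * the set holds. */
static int byte_set_contains(const byte_set *s, unsigned char c)
{
    uint8_t row = s->bits[c >> 3]; /* level 1: byte index (the shuffle step) */
    return (row >> (c & 7)) & 1;   /* level 2: bit index (the shift step) */
}
```

Calling `byte_set_init(&s, (const unsigned char *)" \t\n\v\f\r", 6)` builds a whitespace set, after which `byte_set_contains(&s, c)` is one load, one shift, and one mask per byte.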
ncruces 3 days ago
But that's slower for small query sizes.

This article describes a few algorithms: http://0x80.pl/notesen/2018-10-18-simd-byte-lookup.html

Both the alternative version by Geoff Langdale and the special case for small sets are substantially similar to the algorithms used in Hyperscan (truffle and shufti): https://github.com/intel/hyperscan

Having something hard-coded for spaces can be much faster, especially since 5 of the 6 characters form a contiguous range: a wrap-around subtraction and an unsigned less-than handle the first 5; an equality compare handles the sixth.
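A scalar sketch of that hard-coded check, assuming the six characters are the ASCII whitespace set (tab through carriage return, 0x09 to 0x0D, plus space, 0x20). The function names `is_space_byte` and `count_spaces` are hypothetical. A SIMD version would do the same subtract, unsigned compare, and equality compare on a whole vector of bytes at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 for 0x09..0x0D and 0x20, else 0.
 * Assumes the "6 characters" are the ASCII whitespace set. */
static int is_space_byte(uint8_t c)
{
    /* Wrap-around subtraction: bytes below 0x09 wrap to large values,
     * so a single unsigned less-than covers the whole 5-byte range. */
    uint8_t d = (uint8_t)(c - 0x09);
    return (d < 5) | (c == 0x20);
}

/* Branch-free loop over a buffer, counting whitespace bytes. */
static size_t count_spaces(const uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (size_t)is_space_byte(buf[i]);
    return n;
}
```

The range test costs two operations regardless of the range width, which is why a set that is mostly contiguous can beat a general table lookup.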