benreesman 6 days ago
Nice work and good writeup. I think most of that is very sound practice. The codegen switch with the offsets is in everything: the first time I saw it was in the Rhino JS bytecode compiler in maybe 2006, and I've written it a dozen times since. Still clever that you worked it out from first principles. There are some modern C++ libraries that do frightening things with SIMD that might give your bytestring stuff a lift on modern, stupid-wide, high-mispredict-penalty hardware. Anything by Lemire, StringZilla, and take a look at zpp_bits for inspiration about theoretical-minimum data structure pack/unpack. But I think you got damn close to what can be done, niiicccee work.
Sesse__ 6 days ago
FWIW, this is basically an implementation of perfect hashing, and there's a myriad of different strategies. Sometimes “switch on length + well-chosen characters” is good, sometimes you can do better (e.g. just looking up in a table instead of a long if chain). The “value speculation” thing looks completely weird to me, especially with the “volatile” that doesn't do anything at all (volatile is generally a pointer qualifier in C++). If it works, I'm not really convinced it works for the reason the author thinks it does (especially since it refers to an article talking about a CPU from the relative stone age).
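
To make the “switch on length + well-chosen characters” vs. table lookup contrast concrete, here is a rough C++17 sketch of both. Everything in it is made up for illustration and is not taken from the article: the five-keyword set, the `Keyword` enum, the names `lookup_switch` / `lookup_table`, and the `% 11` slot function (which just happens to be collision-free for these five words).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical keyword set, chosen only for illustration.
enum class Keyword { None, If, In, For, Else, While };

struct Entry {
    std::string_view text{};
    Keyword kw = Keyword::None;
};

constexpr std::array<Entry, 5> kKeywords{{
    {"if", Keyword::If},
    {"in", Keyword::In},
    {"for", Keyword::For},
    {"else", Keyword::Else},
    {"while", Keyword::While},
}};

// Strategy 1: switch on length, then on one distinguishing character,
// then confirm the candidate with a full comparison.
constexpr Keyword lookup_switch(std::string_view s) {
    switch (s.size()) {
    case 2:
        // "if" and "in" share a length and differ at index 1.
        switch (s[1]) {
        case 'f': return s == "if" ? Keyword::If : Keyword::None;
        case 'n': return s == "in" ? Keyword::In : Keyword::None;
        }
        break;
    case 3: return s == "for" ? Keyword::For : Keyword::None;
    case 4: return s == "else" ? Keyword::Else : Keyword::None;
    case 5: return s == "while" ? Keyword::While : Keyword::None;
    }
    return Keyword::None;
}

// Strategy 2: one table lookup. The slot function was picked by hand so
// that the second character modulo 11 is unique across the keyword set.
constexpr std::size_t kSlots = 11;

constexpr std::size_t slot(std::string_view s) {
    assert(s.size() >= 2);  // precondition: caller checks the length
    return static_cast<unsigned char>(s[1]) % kSlots;
}

constexpr std::array<Entry, kSlots> make_table() {
    std::array<Entry, kSlots> table{};  // unused slots keep an empty text
    for (const Entry& e : kKeywords) table[slot(e.text)] = e;
    return table;
}

constexpr auto kTable = make_table();

// Compile-time check that no two keywords landed in the same slot.
constexpr bool table_is_perfect() {
    for (const Entry& e : kKeywords)
        if (kTable[slot(e.text)].text != e.text) return false;
    return true;
}
static_assert(table_is_perfect(), "slot() collides on the keyword set");

constexpr Keyword lookup_table(std::string_view s) {
    if (s.size() < 2) return Keyword::None;
    const Entry& e = kTable[slot(s)];
    // A single full comparison rejects non-keywords that share a slot.
    return e.text == s ? e.kw : Keyword::None;
}
```

Both versions finish with one full string comparison; the difference is whether the candidate is chosen by a chain of branches or by a single indexed load. Which one is faster on a given input mix and CPU is something to measure rather than assume.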