| ▲ | kuzivaai 2 days ago |
The point about hash tables using top bits instead of bottom bits is the kind of thing that feels obvious once someone says it, and yet here we are. Genuine question: have you seen any real-world hash table implementations that actually do this, or is it purely "this is what we should have done 40 years ago"?
| ▲ | adrian_b 2 days ago | parent | next |
Which bits are best depends entirely on the particular hash function.

If the last operation when computing the hash was an integer multiplication where the result has the same width as the operands, then the top bits are best, because they depend on more of the input bits than the bottom bits do.

On the other hand, if the last operation was the kind of integer multiplication where the result is as wide as the sum of the operand widths (i.e. where the high bits of the result are not discarded), then the best bits are neither the top bits nor the bottom bits but the middle bits, for the same reason as before: they depend on more of the input bits than either end of the result. (This becomes obvious if you do a long multiplication with pen and paper. After you compute the partial products and start to add them, you can see that the top and bottom digits are sums of far fewer partial-product digits than the middle digits, so the middle digits are more thoroughly mixed functions of the input digits.)

When the last operation is an addition, the carry bits propagate from the bottom bits towards the top bits, so the top bits are best: the higher a bit is in the result, the more input bits it depends on.
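To illustrate the first case (my sketch, not the commenter's code): with a same-width 64-bit multiply, a table with 2^k buckets gets better-mixed indices from the top k bits of the product than from the bottom k bits. The constant `K` below is a common golden-ratio multiplier; any odd 64-bit constant would do.

```python
# Multiply-shift sketch: x * K mod 2^64, then index 2^k buckets.
# Top bits of the product mix more input bits than the bottom bits.

K = 0x9E3779B97F4A7C15  # odd 64-bit constant (golden-ratio based)
MASK64 = 0xFFFFFFFFFFFFFFFF

def bucket_top_bits(x: int, k: int) -> int:
    """Take the top k bits of the same-width product (a shift)."""
    h = (x * K) & MASK64
    return h >> (64 - k)

def bucket_bottom_bits(x: int, k: int) -> int:
    """Take the bottom k bits of the same-width product (a mask)."""
    h = (x * K) & MASK64
    return h & ((1 << k) - 1)
```

One visible consequence: keys that are multiples of a large power of two all collapse into bucket 0 under bottom-bit indexing (the multiply leaves their low product bits zero), while the top bits still spread them.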
| ▲ | SkiFire13 2 days ago | parent | prev | next |
I think older processors had slower shift instructions, which made this approach less attractive. Nowadays Swiss tables and other similar hash tables use the top bits, plus SIMD/SWAR techniques to quickly filter out collisions after determining the starting bucket.
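The SWAR half of that filtering can be sketched in a few lines (a hypothetical Python model of the idea, not hashbrown's actual code): pack one tag byte per slot into a 64-bit word, then use the classic zero-byte detector to flag every slot whose tag matches in a handful of ALU operations.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def pack(tags):
    """Pack eight one-byte tags into a 64-bit control word (byte i = slot i)."""
    g = 0
    for i, t in enumerate(tags):
        g |= t << (8 * i)
    return g

def swar_match(group: int, tag: int) -> int:
    """Return a mask whose high bit is set in each byte of `group` equal to `tag`."""
    lo = 0x0101010101010101
    hi = 0x8080808080808080
    x = (group ^ (lo * tag)) & MASK64      # matching bytes become 0x00
    return ((x - lo) & ~x & hi) & MASK64   # classic zero-byte detector
```

The real implementations do the same comparison with SSE2/NEON byte-compare instructions instead of the subtract-and-mask trick, but the effect is identical: one probe examines a whole group of slots at once.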
| ▲ | purplesyringa 2 days ago | parent | prev | next |
I think the reason real-world implementations don't do this is to speed up access when the key is a small integer. Say, if your IDs are spread uniformly between 1 and 1000, taking the bottom 7 bits is a great hash, while the top 7 bits would just be zeros. So it's optimizing for a trivial hash rather than a general-purpose fast hash. And since most languages require each data type to provide its own hash function, you kind of have to assume that the hash is half-assed and bottom bits are better. I think only Rust could make decisions differently here, since it's parametric over hashers, but I haven't seen that done.
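A quick way to see this (my toy example, not the commenter's): many languages hash a small integer to itself, so for IDs 1..1000 the bottom 7 bits index 128 buckets evenly, while the top 7 bits of the 64-bit hash are all zero.

```python
# Identity hash of small integers: bottom bits spread keys, top bits don't.
MASK64 = 0xFFFFFFFFFFFFFFFF

def bottom_bits(h: int, k: int) -> int:
    return h & ((1 << k) - 1)

def top_bits(h: int, k: int) -> int:
    return (h & MASK64) >> (64 - k)

ids = range(1, 1001)
low = {bottom_bits(i, 7) for i in ids}   # all 128 buckets used
high = {top_bits(i, 7) for i in ids}     # everything collides in bucket 0
```

With a mixing hash in front, the top bits would be fine; the bottom-bit convention is essentially a safety net for trivial hashes like this one.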
| ▲ | teo_zero 2 days ago | parent | prev |
Sorry, I've read and reread TFA but the concept still evades me. Is it that, since it's easier for a hash function to have higher entropy in the higher bits than in the lower ones, it would be more logical for hash tables to discard the lower bits and keep the higher ones?