Remix.run Logo
teo_zero 2 days ago

Sorry, I've read and reread TFA but the concept still evades me. Is it that, since it's easier for a hash function to have higher entropy in the higher bits than in the lower ones, it would be more logical for hash tables to discard the lower bits and keep the higher ones?

adrian_b 2 days ago | parent | next [-]

Higher entropy in the higher bits is a property of addition and multiplication when the result has the same size as the operands (i.e. the result is taken modulo 2^N, which folds back the values exceeding the size of the result).

When other operations are used, there may be other bits with higher entropy. For example, when full-length multiplication is used (i.e. where the length of the result is the sum of the lengths of the operands), the middle bits have the highest entropy, not the top bits. On CPUs like the Intel/AMD x86-64 CPUs, where fast long multiplication instructions are available, this can be exploited in more performant hash functions and PRNGs.

In hash functions, additions and multiplications are frequently used together with rotations, in order to redistribute the entropy from the top bits to the bottom bits, during the following operations.

purplesyringa 2 days ago | parent [-]

Honorary mention: byte swapping instructions (originally added to CPUs for endianness conversion) can also be used to redistribute entropy, but they're slightly slower than rotations on Intel, which is why I think they aren't utilized much.

purplesyringa 2 days ago | parent | prev [-]

Yes, that's my point. It's not true that all hash functions have this characteristic, but most fast ones do. (And if you're using a slow-and-high-quality hash function, the distinction doesn't matter, so might as well use top bits.)