▲ | Fibonacci Hashing: The Optimization That the World Forgot(probablydance.com) | ||||||||||||||||||||||||||||
140 points by juancampa 6 days ago | 29 comments | |||||||||||||||||||||||||||||
▲ | 317070 4 days ago | parent | next [-] | ||||||||||||||||||||||||||||
Why the golden ratio? Because the continued fraction of the golden ratio is all 1's [0]. So it is uniquely hard to approximate with a rational number. The golden ratio is the bound on Hurwitz's theorem [1]. And avoiding a rational number is what you want for good hashing, because multiplying with a rational number doesn't mix your digits well. [0] https://codegolf.stackexchange.com/questions/48589/generate-... [1] https://en.m.wikipedia.org/wiki/Hurwitz%27s_theorem_(number_... | |||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||
▲ | jlebar 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
I think the author has misunderstood things here. This technique is orthogonal to integer mod. Indeed the author multiplies by their magic constant and then does an integer mod to map into their hashtable's buckets. This technique is actually just applying a fast integer hash on the input keys to the hashtable before mapping the keys to buckets. You can then map to buckets however you want. The additional hash is useful if and only if the input hash function for your table's keys doesn't appear to be a random function, i.e. it doesn't mix its bits for whatever reason. If your input hash functions are indeed random then this is a (small but perhaps measurable) waste of time. Using prime-numbered table sizes is another way to accomplish basically the same thing. Dividing the input hash key by a prime forces you to look at all the bits of the input. In practice these are written as division by a constant, so they use multiplies and shifts. It's basically a hash function. (Though I'd use multiply by a magic number over divide by a prime, mul alone should be faster.) Relatedly see this post by Daniel Lemire about an alternative to integer mod, https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-... which is interesting if your number of buckets is not a power of 2 for some reason. | |||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||
▲ | beeforpork 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
Oh -- this has a name? And it is not commonly known to be better than modulo? Forgotten? I feel old. The reason multiply+use of high bits is better than modulo is not the golden ratio. In fact, any number using a lot of bits is good (well, not all are the same, like 0 is extremely bad, and 1 and all powers of two are quite bad, because they just shift. But OTOH, the golden ratio is not outstandingly good either -- many values are good.). This is because in the result of a multiplication, higher bits depend on all lower-bits of the operands, because the carry propagates from low bits to higher bits through the resulting value. In contrast, modulo essentially cuts off the upper bits, and so the upper bits are lost for the distribution entropy. This means that if you take the high bits of a multiplication result to distribute into a small set of integers, then you get a better distribution than using only the lower bits, because more bits of the input value are significant for the output result. I thought this was common knowledge. The man page on rand() used to tell you that you should always multiply into the target range instead of modulo into the target range. It is surprising to see that all big hash libraries seem to miss this. | |||||||||||||||||||||||||||||
▲ | thot_experiment 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
I love this property of phi, I used this in a project a while back to order a an initial keyframe stream where it was important to quickly allow the user to arbitrarily jump to any given point in the timeline. It's super easy to implement and you guaranteed that you have something near the query point to show the user right away. | |||||||||||||||||||||||||||||
▲ | senderista 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
This article is both confused and confusing. What he's really talking about is not a hash function but a pseudorandom permutation (functionally the same as a block cipher), i.e. a pseudorandom bijective function on an integer domain. In the context of hash functions this is often called a "mixer" or "finalizer" (since it's applied as the final step before determining the bucket). Multiplication by the golden ratio is about the simplest such permutation, but it's far inferior to others (several of which I catalogued here, along with their inverses: https://github.com/senderista/hashtable-benchmarks/tree/mast...). The important thing to remember is that multiplication only mixes low bits into high bits, but you should also mix high bits into low bits (e.g. with XOR), as I do here with the "golden ratio" multiplier: https://github.com/senderista/hashtable-benchmarks/blob/mast.... One useful property of permutations is that they're reversible, so you can reconstruct a key from its hash code, which is useful for hash tables with integer keys. I used this technique here to avoid storing keys separately from hash codes: https://github.com/senderista/hashtable-benchmarks/blob/mast.... Applying a separate "mixer" function to the result of a user-supplied hash function makes sense only if the quality of the user's hash function is suspect. If you control the hash function yourself then there's no reason to use one (as I mentioned, a high-quality hash function will include a finalizer anyway). And if you do use a mixer, you should use a better one than this. | |||||||||||||||||||||||||||||
▲ | mkj 2 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
The Linux kernel integer hash uses something similar, so not totally forgotten.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin... | |||||||||||||||||||||||||||||
▲ | Genbox 4 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
I opened this post in a tab 3 days ago, but now it says "5 hours ago". Someone is playing around. | |||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||
▲ | peterlada 6 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
To summarize: multiplication hashes are inferior but when used with the golden ratio derived integer, they are actually superior | |||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||
▲ | bmn__ 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||
▲ | infoaddicted 6 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
The youtube video is marked private? | |||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||
▲ | shaggie76 3 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
While reading this article it occurred to me to wonder if the CPU CRC32C instruction would be good for hash functions; I think the latency is about the same as an integer multiply. | |||||||||||||||||||||||||||||
▲ | up2isomorphism 2 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
This information density in the article is extremely low. | |||||||||||||||||||||||||||||
▲ | hyperbrainer 4 days ago | parent | prev | next [-] | ||||||||||||||||||||||||||||
Needs (2018) | |||||||||||||||||||||||||||||
▲ | igtztorrero 3 days ago | parent | prev [-] | ||||||||||||||||||||||||||||
The Fibonacci series, as fascinating as its name, the origin of the spiral in nature, and the Debian logo, ever-present in computing. The algorithm is surely better as the author says, but most of the time we use what's available. Perhaps the Python, Golang, and JS core programmers could review it. |