| ▲ | susam 2 days ago | |
I just noticed the m = 10007 value in my comment above and I thought I should clarify it. The number of bits per bloom filter does not need to be a prime number if the hash functions have uniform distribution. Murmur2 hash functions do have uniform distribution, so m was not chosen to be prime in order to reduce collisions in the Bloom filter's bit positions. The reason for using a prime value was more mundane. This was a fairly large project, with roughly 3 million lines of C and C++ code which had numerous constants with special values defined throughout the code. So instead of using a less interesting number like 10000, I chose 10007 so that if we ever came across the value 10007 (decimal) or 0x2717 (hexadecimal) while inspecting a core dump in a debugger, we would immediately recognise it as the constant defining the number of bits per Bloom filter. | ||
| ▲ | anonymars 2 days ago | parent | next [-] | |
Ha, collision avoidance all the way down | ||
| ▲ | gkfasdfasdf 2 days ago | parent | prev | next [-] | |
Interesting trick about the constant value, and thank you for the detailed write up! | ||
| ▲ | xandrius 2 days ago | parent | prev [-] | |
[flagged] | ||