xnorswap 9 hours ago
> slow, particularly for short keys such as integers

An interesting thing about the CLR is that the default hash for integers is:

Which, as well as being collision-free, also avoids the trap of a slow default hash.
tialaramex 4 hours ago
The identity function. Several C++ implementations choose this too. It is very cheap but has obvious problems which may make you wish you'd paid more up front.

If you want this in Rust you can use: https://crates.io/crates/integer-hasher -- being able to swap out the hasher is something C++ folks have kinda wanted to do for like 15-20 years but they have never got it over the line.

I have some benchmarks I've been noodling with for a while, measuring different ways to do a hashtable. In this code, IntHashMap is the one that just does this operation but otherwise uses the ordinary Rust Swiss tables.

For some operations, and especially at small sizes, IntHashMap is significantly better. But for other operations, and especially at large sizes, it's worse.

For example, suppose we have 10 K->V pairs in our hash table. When we're looking for one of the ten K values, we're much faster in IntHashMap. However, if it's not there, IntHashMap is slightly slower. Further, if we have the first ten thousand numbers instead of ten thousand random numbers, like if we'd made a hash table of serial numbers for something, we're ten times worse in IntHashMap, and that's because our hashing function, though fast, is very bad at its job.
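For readers who want to see what "just do this operation" can look like, here is a minimal sketch of an identity hasher plugged into the standard library's `HashMap`. It is not the benchmark code and not the `integer-hasher` crate's implementation: `IdentityHasher`, `identity_hash` and the `IntHashMap` alias below are hypothetical names defined only for this illustration.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical identity hasher: the hash of an integer key is the key itself.
#[derive(Default, Clone, Copy)]
struct IdentityHasher(u64);

impl Hasher for IdentityHasher {
    fn finish(&self) -> u64 {
        self.0
    }

    // Fallback for keys that are not plain integers: fold the bytes in,
    // so the hasher still works (badly) for anything that feeds raw bytes.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 = (self.0 << 8) | u64::from(b);
        }
    }

    // Integer keys call these directly, so no mixing happens at all.
    fn write_u32(&mut self, n: u32) {
        self.0 = u64::from(n);
    }

    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
}

/// Assumed definition: an ordinary std HashMap with the hasher swapped out.
type IntHashMap<K, V> = HashMap<K, V, BuildHasherDefault<IdentityHasher>>;

/// Helper that exposes the raw hash value for a u64 key.
fn identity_hash(n: u64) -> u64 {
    let mut h = IdentityHasher::default();
    h.write_u64(n);
    h.finish()
}

fn main() {
    let mut serials: IntHashMap<u64, &str> = IntHashMap::default();
    serials.insert(1, "first");
    serials.insert(2, "second");

    assert_eq!(serials.get(&2), Some(&"second"));
    // The hash really is the key: nothing is mixed.
    assert_eq!(identity_hash(42), 42);
}
```

The only cost per lookup is copying the integer, which is why this can win at small sizes. The table still has to derive bucket positions from those unmixed bits, so structured keys such as serial numbers can cluster badly.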
mrec 6 hours ago
Do you know how it then maps hashes to buckets? I'd expect the natural occurrence of integer values to be very non-uniform and heavily biased toward the small end.
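To make the concern concrete, here is a small sketch comparing two generic ways a table might turn an unmixed (identity) hash into a bucket index. It does not describe what the CLR actually does; `bucket_by_modulo` and `bucket_by_top_bits` are hypothetical helpers, and the bucket counts are arbitrary.

```rust
/// Low-bit style mapping: bucket = hash mod bucket_count.
fn bucket_by_modulo(hash: u64, bucket_count: u64) -> u64 {
    hash % bucket_count
}

/// High-bit style mapping: bucket = top `bits` bits of the hash.
/// `bits` must be between 1 and 63 for the shift to be valid.
fn bucket_by_top_bits(hash: u64, bits: u32) -> u64 {
    hash >> (64 - bits)
}

fn main() {
    // Small, sequential keys hashed with the identity function.
    let keys: Vec<u64> = (0..16).collect();

    let by_mod: Vec<u64> = keys.iter().map(|&k| bucket_by_modulo(k, 16)).collect();
    let by_top: Vec<u64> = keys.iter().map(|&k| bucket_by_top_bits(k, 4)).collect();

    // Modulo spreads small sequential keys across all 16 buckets...
    assert_eq!(by_mod, keys);
    // ...while anything relying on the high bits sees the same value every time.
    assert!(by_top.iter().all(|&b| b == 0));
}
```

So whether small-biased integers are a problem depends on which bits of the hash the table consumes: values clustered near zero carry no information in their high bits.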
Tuna-Fish 7 hours ago
But if you know the hash table implementation, it makes forcing collisions trivial for user-generated input, leading to easy denial of service attacks. The first requirement for safe hashtable implementations is a secret key, which makes it impossible for an external observer to know the hash value. (Or even to know the relative hash value between any two inputs.)
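As an illustration of the secret-key idea, Rust's standard library builds its default hashers from `std::collections::hash_map::RandomState`, which carries randomly generated keys. The sketch below uses only that type; `keyed_hash` is a hypothetical helper written for this example.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hash a u64 under the secret keys held by `state`.
fn keyed_hash(state: &RandomState, n: u64) -> u64 {
    let mut h = state.build_hasher();
    h.write_u64(n);
    h.finish()
}

fn main() {
    // Each RandomState stands in for the secret of one hash table.
    let table_a = RandomState::new();
    let table_b = RandomState::new();

    // Within one table the hash is stable, so lookups work.
    assert_eq!(keyed_hash(&table_a, 42), keyed_hash(&table_a, 42));

    // Across tables the values are expected (not guaranteed) to differ,
    // so an outside observer cannot precompute colliding inputs.
    println!("table_a: {:#018x}", keyed_hash(&table_a, 42));
    println!("table_b: {:#018x}", keyed_hash(&table_b, 42));
}
```

This is the trade-off in the thread: the keyed hash costs more per lookup than the identity function, but an attacker who controls the keys cannot predict which of them will collide.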