▲ | antirez 2 days ago | |
About one month ago I applied XOR in a similar (but a bit more complicated way) to Redis Vector Sets implementation, in the context of sanity check of loading a vset value from the RDB file. I believe the way it works is quite interesting and kinda extends the applicability of the trick in the post. The problem is that in vector sets, the HNSW graph has the invariant that each node has bidirectional links to a set of N nodes. If A links to B, then B links to A. This is unlike most other HNSW implementations. In mine, it is required that links are reciprocal, otherwise you get a crash. Now, combine this with another fact: for speed concerns, Redis vector sets are not serialized as
And then reloaded and added back to the HNSW. This would be slow. Instead, what I do, is to serialize the graph itself. Each node with its unique ID and all the links. But when I load the graph back, I must be sure it is "sane" and will not crash my systems. And reciprocal links are one of the things to check. Checking that all the links are reciprocal could be done with an hash table (as in the post problem), but that would be slower and memory consuming, so how do we use XOR instead? Each time I see a link A -> B, I normalize it swapping A and B in case A>B. So if links are reciprocal I'll see A->B A->B two times, if I use a register to accumulate the two IDs and XOR them, at the end, if the register is NOT null I got issues: some link may not be reciprocal.However, in this specific case, there is a problem: collisions. The register may be 0 even if there are non reciprocal links in case they are fancy, that is, the non-reciprocal links are a few and they happen to XOR to 0. So, to fix this part, I use a strong (and large) hash function that will make the collision extremely unlikely. It is nice now to see this post, since I was not aware of this algorithm when I used it a few weeks ago. Sure, at this point I'm old enough that never pretend I invented something, so I was sure this was already used in the past, but well, in case it was not used for reciprocal links testing, this is a new interview questions you may want to use for advanced candidates. | ||
▲ | hundredwatt a day ago | parent [-] | |
A neat trick to make the accumulator both collision-resistant and self-diagnosing.
If acc is zero, all links are reciprocal (same guarantee as before).If acc is non-zero, split it back into (x', h'): * Re-compute h(x'). * If it equals h', exactly one link is unpaired and x' tells you which one (or an astronomically unlikely collision). Otherwise there are >= 2 problems. This has collision-resistance like the parent comment and adds the ability to pinpoint a single offending link without a second pass or a hash table. |