That's pretty nice ;)

I once wrote this humdinger, that's still on my mostly dead personal website from 2010... one of my proudest bits of code besides my poker hand evaluator ;)

The question was, how do you generate a unique number for any two positive integers, where x!=y, such that f(x,y) = f(y,x) but the resulting combined id would not be generated by any other pair of integers. What I came up with was a way to generate a unique key from any set of positive integers which is valid no matter the order, but which doesn't key to any other set.

My idea was to take the radius of a circle that intersected the integer pair in cartesian space. That alone doesn't guarantee the circle won't intersect any other integer pairs... so I had to add to it the phase multiple of sine and cosine which is the same at those two points on the arc. That works out to:

(x^2+y^2)+(sin(atan(x/y))*cos(atan(x/y)))

And means that it doesn't matter which order you feed x and y in, it will generate a unique float for the pair. It reduces to:

x^2+y^2+( (x/y) / (x^2+y^2) )

To add another dimension, just add it to the process and key it to one of the first...

x^2+y^2+z^2+( (x/y) / (x^2+y^2) )+( (x/z) / (x^2+z^2) )

▲

bazzargh 4 days ago | parent [-]

It looks like you have typos? (x^2+y^2)+(sin(atan(x/y))*cos(atan(x/y))) reduces to x^2+y^2+( (x/y) / (x^2/y^2 + 1) ) - not the equation given? Tho it's easier to see that this would be symmetrical if you rearrange it to: x^2+y^2+( (xy) / (x^2+y^2) )

Also, if f(x,y) = x^2+y^2+( (x/y) / (x^2+y^2) ) then f(2,1) is 5.2 and f(1,2) is 5.1? - this is how I noticed the mistake. (the other reduction gives the same answer, 5.4, for both, by symmetry, as you suggest)

There's a simpler solution which produces integer ids (though they are large): 2^x & 2^y. Another solution is to multiply the xth and yth primes.

I only looked because I was curious how you proved it unique!

▲

noduerme 4 days ago | parent [-]

Hhhhmm. Ok. So I invented this solution in 2009 at what you might call a "peak mental moment", by a pool in Palm Springs, CA, after about 6 hours of writing on napkins. I'm not a mathematician. I don't think I'm even a great programmer, since there are probably much better ways of solving the thing I was trying to solve. And also, I'm not sure how I even came up with the reduction; I probably was wrong or made a typo (missing the +1?), and I'm not even certain how I could come up with it again.

2^x & 2^y ...is the & a bitwise operator...???? That would produce a unique ID? That would be very interesting, is that provable?

Primes take too much time.

The thing I was trying to solve was: I had written a bitcoin poker site from scratch, and I wanted to determine whether any players were colluding with each other. There were too many combinations of players on tables to analyze all their hands versus each other rapidly, so I needed to write a nightly cron job that collated their betting patterns 1 vs 1, 1 vs 2, 1 vs 3... any time 2 or 3 or 4 players were at the same table, I wanted to have a unique signature for that combination of players, regardless of which order they sat in at the table or which order they played their hands in. All the data for each player's action was in a SQL table of hand histories, indexed by playerID and tableID, with all the other playerIDs in the hand in a separate table. At the time, at least, I needed a faster way to query that data so that I could get a unique id from a set of playerIDs that would pull just the data from this massive table where all the same players were in a hand, without having to check the primary playerID column for each one. That was the motivation behind it.

It did work. I'm glad you were curious. I think I kept it as the original algorithm, not the reduced version. But I was much smarter 15 years ago... I haven't had an epiphany like that in awhile (mostly have not needed to, unfortunately).

▲

bazzargh 4 days ago | parent | next [-]

The typo is most likely the extra /, in (x/y)/(x^2+y^2) instead of (xy)/(x^2+y^2).

`2^x & 2^y ...is the & a bitwise operator...???? That would produce a unique ID? That would be very interesting, is that provable?`

Yes, & is bitwise and. It's just treating your players as a bit vector. It's not so much provable as a tautology, it is exactly the property that players x and y are present. It's not _useful_ tho because the field size you'd need to hold the bit vector is enormous.

As for the problem...it sounds bloom-filter adjacent (a bloom filter of players in a hand would give a single id with a low probability of collision for a set of players; you'd use this to accelerate exact checks), but also like an indexed many-to-many table might have done the job, but all depends on what the actual queries you needed to run were, I'm just idly speculating.

▲

noduerme 4 days ago | parent [-]

At the time, at least, there was no way to index it for all 8 players involved in a hand. Each action taken would be indexed to the player that took it, and I'd need to sweep up adjacent actions for other players in each hand, but only the players who were consistently in lots of hands with that player. I've heard of bloom filters (now, not in 2012)... makes some sense. But the idea was to find some vector that made any set of players unique when running through a linear table, regardless of the order they presented in.

To that extent, I submit my solution as possibly being the best one.

I'm still a bit perplexed by why you say 2^x & 2^y is tautologically sound as a unique way to map f(x,y)==f(y,x), where x and y are nonequal integers. Throwing in the bitwise & makes it seem less safe to me. Why is that provably never replicable between any two pairs of integers?

▲

bazzargh 3 days ago | parent [-]

I'm saying it's a tautology because it's just a binary representation of the set. Suppose we have 8 players, with x and y being 2 and 4: set the 2nd and 4th bits (ie 2^2 & 2^4) and you have 00001010.

But to lay it out: every positive integer is a sum of powers of 2. (this is obvious, since every number is a sum of 1s, ie 2^0). But also every number is a sum of _distinct_ powers of 2: if there are 2 identical powers 2^a+2^a in the sum, then they are replaced by 2^(a+1), this happens recursively until there are no more duplicated powers of 2.

It remains to show that each number has a unique binary representation, ie that there are no two numbers x=2^x1+2^x2+... and y=2^y1+2^y2+... that have the same sum, x=y, but from different powers. Suppose we have a smallest such number, and x1 y1 are the largest powers in each set. Then x1 != y1 because then we can subtract it from both numbers and get an _even smaller_ number that has distinct representations, a contradiction. Then either x1 < y1 or y1 < x1. Suppose without loss of generality that it's the first (we can just swap labels). then x<=2^(x1+1)-1 (just summing all powers of 2 from 1..x1) but y>=2^y1>=2^(x1+1)>x, a contradiction.

or, tl;dr just dealing with the case of 2 powers: we want to disprove that there exists a,b,c,d such that

2^a + 2^b = 2^c + 2^d, a>b, c>d, and (a,b) != (c,d).

Suppose a = c, then subtract 2^a from both sides and we have 2^b = 2^d, so b=d, a contradiction.

Suppose a>c; then a >= c+1.

2^c + 2^d < 2^c + 2^c = 2^(c+1).

2^c + 2^d <= 2^(c+1) - 1 < 2^(c+1) + 2^b <= 2^a + 2^b

a contradiction.

▲

noduerme 3 days ago | parent [-]

Thanks for the great response. Honestly, TIL that 2^0 = 1. That was a new one for me and I'm not sure I understand it. I failed pre-Calculus, twice.

Visually I think I can understand the bitwise version now, from reading this. But it wouldn't work for 3 integers, would it?

	▲	bazzargh 3 days ago \| parent [-]
		it works for any number of integers. The first proof above (before tl;dr) is showing that every positive integer has a unique representation as a sum of distinct powers of 2, ie binary, and that no two integers have the same representation. You can watch a lecture about the representation of sets in binary here https://www.youtube.com/watch?v=Iw21xgyN9To (google representing sets with bits for way more like this) But again it's not useful in practice for very sparse sets: if you have say a million players, with at most 10 at the same poker table, setting 10 bits of a million-bit binary number is super wasteful. Even representing the players as fixed size 20-bit numbers (1 million in binary is 20 bits long), and appending the 10 sorted numbers, means you don't need more than 200 bits to represent this set. And you can go much smaller if all you want is to label a _bucket_ that includes this particular set; just hash the 10 numbers to get a short id. Then to query faster for a specific combination of players you construct the hash of that group, query to get everything in that bucket (which may include false positives), then filter this much smaller set of answers.

▲

bazzargh 4 days ago | parent | prev [-]

BTW, yet another way to do it (more compact than the bitwise and prime options) is the Cantor pairing function https://en.wikipedia.org/wiki/Pairing_function

... z = (x+y+1)(x+y)/2 + y - but you have to sort x,y first to get the order independence you wanted. This function is famously used in the argument that the set of integers and the set of rationals have the same cardinality.

▲

noduerme 4 days ago | parent [-]

mm. I did see this when I was figuring it out. The sorting first was the specific thing I wanted to avoid, because it would've been by far the most expensive part of the operation when looking at a million poker hands and trying to target several players for potential collusion.

▲

bazzargh 4 days ago | parent [-]

you're only sorting players within a single hand. so a list of under 10 items? thats trivial

	▲	noduerme 3 days ago \| parent [-]
		So the goal was to generate signatures for 2, 3 or more players and then be able to reference anything in the history table that had that combination of players without doing a full scan and cross-joining the same table multiple times. Specifically to avoid having ten index columns in the history table for each seat's player. This was also prior to JSON querying in mysql. I needed a way to either bake in the combinations at write time, or to generate a unique id at read time in a way that wouldn't require me to query whether playerIDs were [1201,1803,2903] or [1803,1201,2903] etc. Just a one-shot unique signature for that combination of players that could always evaluate the same regardless of the order. If that makes sense. There were other considerations and this was not exactly how it worked, since only certain players were flagged and I was looking for patterns when those particular players were on the same table. It wasn't like every combination of players had a unique id, just a few combinations where I needed to be able to search over a large space to find when they were in the same room together, but disregarding the order they were listed in.