One can generalize this to k missing numbers the same way as we typically do for the addition case by using finite fields:

XOR is equivalent to addition over the finite field F_2^m. So, in this field, we're calculating the sum. If we have two numbers missing, we calculate the sum and sum of squares, so we know:

x + y

x^2 + y^2

From which we can solve for x and y. (Note all the multiplications are Galois Field multiplications, not integer!)

Similarly for k numbers we calculate sums of higher powers and get a higher order polynomial equation that gives our answer. Of course, the same solution works over the integers and I'd imagine modular arithmetic as well (I haven't checked though).
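
For the integer case, a minimal sketch of the two-missing-numbers version might look like the following. It assumes the full list is 1..n; the function name `two_missing` and the sample data are made up for illustration.

```python
import math

def two_missing(present, n):
    """Recover the two numbers missing from 1..n using the sum and sum of squares."""
    # s1 = x + y: expected total minus observed total
    s1 = n * (n + 1) // 2 - sum(present)
    # s2 = x^2 + y^2: expected sum of squares minus observed sum of squares
    s2 = n * (n + 1) * (2 * n + 1) // 6 - sum(v * v for v in present)
    # (x + y)^2 = x^2 + y^2 + 2xy, so xy = (s1^2 - s2) / 2
    xy = (s1 * s1 - s2) // 2
    # x and y are the roots of z^2 - s1*z + xy; the discriminant is (x - y)^2
    d = math.isqrt(s1 * s1 - 4 * xy)
    return (s1 - d) // 2, (s1 + d) // 2

print(two_missing([1, 2, 4, 6, 7, 8], 8))  # (3, 5)
```

Note the division by 2 when computing `xy`: that step is fine over the integers, but it is exactly the step that needs care in other fields.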

▲ less_less a day ago | parent | next [-]

This will depend on the field, and for F_2^m you want odd powers: sum(x), sum(x^3), sum(x^5) etc. Using sum(x^2) won't help because squaring over F_2^m is a field homomorphism, meaning that sum(x^2) = sum(x)^2.
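
To see the homomorphism concretely, here is a small check in GF(2^8). The choice of field, the reduction polynomial 0x11B (the one AES uses), and the helper names are my own assumptions for illustration; any F_2^m would show the same thing.

```python
def gf_mul(a, b, poly=0x11B, m=8):
    """Multiply a and b in GF(2^m), reducing by an irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a >> m:           # degree reached m, so reduce
            a ^= poly
    return result

def square(a):
    return gf_mul(a, a)

def cube(a):
    return gf_mul(a, square(a))

pairs = [(a, b) for a in range(256) for b in range(256)]
# Squaring distributes over addition (XOR) for every pair, so sum(x^2) = sum(x)^2.
squaring_is_additive = all(square(a ^ b) == square(a) ^ square(b) for a, b in pairs)
# Cubing does not, which is why sum(x^3) carries new information.
cubing_is_additive = all(cube(a ^ b) == cube(a) ^ cube(b) for a, b in pairs)
print(squaring_is_additive, cubing_is_additive)  # True False
```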

This is also how BCH error-correction codes work (see https://en.wikipedia.org/wiki/BCH_code): a valid BCH codeword has sum(x^i where bit x is set in the codeword) = 0 for t odd powers i=1,3,5, ... Then if some bits get flipped, you will get a "syndrome" s_i := sum(x^i where bit x was flipped) for those odd powers. Solving from the syndrome to get the indices of the flipped bits is the same problem as here.

The general decoding algorithm is a bit involved, as you can see in the Wikipedia article, but it's not horribly difficult:

  • First, extend the syndrome: it gives sum(x^i) for odd i, but you can compute the even powers as s_{2i} = (s_i)^2.

  • The syndrome is a sequence of field values s_i, but we can imagine it as a "syndrome polynomial" S(z) := sum(s_i z^i).  This is only a conceptual step, not a computational one.

  • We will find a polynomial L(z) which is zero at all errors z=x and nowhere else.  This L is called a "locator" polynomial.  It turns out (can be checked with some algebra) that L(z) satisfies a "key equation" where certain terms of L(z) * S(z) are zero.  The key equation is (almost) linear: solve it with linear algebra (takes cubic time in the number of errors), or solve it faster with the Berlekamp-Massey algorithm (quadratic time instead, maybe subquadratic if you're fancy).

  • Find the roots of L(z).  There are tricks for this if its degree is low.  If the degree is high then you usually just iterate over the field.  This takes O(#errors * size of domain) time.  It can be sped up by a constant factor using Chien's search algorithm, or by a logarithmic factor using an FFT or AFFT.
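
For just two missing numbers the steps above collapse to something you can do by hand, so here is a rough sketch of that special case. It is not a general BCH decoder: there is no Berlekamp-Massey step, because with two unknowns the locator coefficients follow directly from s_1 and s_3. The field GF(2^8) with reduction polynomial 0x11B and all function names are assumptions picked for illustration.

```python
def gf_mul(a, b, poly=0x11B, m=8):
    """Multiply a and b in GF(2^m), reducing by an irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return result

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def gf_inv(a):
    """Inverse of a nonzero element: a^(2^8 - 2), since a^255 = 1."""
    return gf_pow(a, 254)

def find_two_missing(full, present):
    # Syndromes: s1 = x + y and s3 = x^3 + y^3, all arithmetic in GF(2^8).
    s1 = s3 = 0
    for v in list(full) + list(present):
        s1 ^= v
        s3 ^= gf_pow(v, 3)
    # In characteristic 2: s3 = s1 * (s1^2 + x*y), so x*y = s3/s1 + s1^2.
    # s1 is nonzero because x != y.
    xy = gf_mul(s3, gf_inv(s1)) ^ gf_mul(s1, s1)
    # Locator L(z) = z^2 + s1*z + x*y; find its roots by trying the whole field.
    return sorted(z for z in range(256)
                  if (gf_mul(z, z) ^ gf_mul(s1, z) ^ xy) == 0)

full = range(1, 256)
present = [v for v in full if v not in (0x53, 0xCA)]
print(find_two_missing(full, present))  # [83, 202]
```

The root search here is the plain "iterate over the field" option; for a larger field you would want one of the faster methods mentioned above.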

You can of course use a different error-correcting code if you prefer (e.g. binary Goppa codes).

Edit: bullets are hard.

Further edit just to note: the "^" in the above text refers to powers over the finite field, not the xor operator.

▲ Straw a day ago | parent | next [-]

Good catch, thank you!

▲ nullc 21 hours ago | parent | prev [-]

In another comment in this thread yesterday, I linked to an implementation I helped create (with complexity quadratic in the number of errors).

> constant factor using Chien's search algorithm

Chien's search is only really reasonable for small field sizes... which I think doesn't really make sense in this application, where the list is long and the missing elements are relatively few.

Fortunately, in characteristic 2 it's quite straightforward and fast to just factor the polynomial using the Berlekamp trace algorithm.

	▲ less_less 12 hours ago | parent [-]
		Oh yeah, factoring the polynomial is also a good idea. For a long enough list that ought to be better than AFFT too.

▲ noman-land a day ago | parent | prev [-]

Can you explain a bit about how and why the higher powers work?

	▲ less_less a day ago | parent [-]
		If you imagine a polynomial L(z) that's zero at all the missing numbers, you can expand the coefficients out. For example, with 2 missing numbers (x,y), you have:

		`L(z) = z^2 - (x+y)z + xy`

		You already have x+y, but what's xy? You can compute it as ((x+y)^2 - (x^2 + y^2))/2. This technique generalizes to higher powers, though I forget the exact details: basically you can generate the coefficients of L from the sums of powers with a recurrence. Then you solve for the roots of L, either using your finite field's variant of the quadratic formula, or e.g. just by trying everything in the field.

		*But wait, this doesn't actually work!*

		Over fields of small characteristic, such as F_2^m, you need to modify the approach and use different powers. For example, in the equations above, I divided by 2. But over F_2^m in the example shown above, you cannot divide by 2, since 2=0. In fact, you cannot solve for (x,y) at all with only x+y and x^2 + y^2, because

		`(x+y)^2 = x^2 + y^2 + 2xy = x^2 + y^2 + 0xy (since 2=0) = x^2 + y^2`

		So having that second power sum gives you no new information. Instead you need to use other powers such as cubes (a BCH code), or some other technique (e.g. a Goppa code). My sibling comment to yours describes the BCH case.
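
I believe the recurrence in question is Newton's identities, which over the integers read k·e_k = sum over i=1..k of (-1)^(i-1)·e_(k-i)·p_i, where p_i is the i-th power sum and e_k the k-th coefficient of L up to sign. A rough sketch for the integer case, assuming the full list is 1..n (the function names are made up for illustration):

```python
from fractions import Fraction

def elementary_from_power_sums(p):
    """Newton's identities: p[i-1] is the i-th power sum; returns [e_0, ..., e_k]."""
    e = [Fraction(1)]
    for j in range(1, len(p) + 1):
        acc = sum((-1) ** (i - 1) * e[j - i] * p[i - 1] for i in range(1, j + 1))
        # Dividing by j is the step that breaks when j = 0 in the field,
        # e.g. j = 2 over F_2^m.
        e.append(acc / j)
    return e

def missing_numbers(present, n, k):
    """Find the k numbers missing from 1..n from their first k power sums."""
    p = [sum(v ** i for v in range(1, n + 1)) - sum(v ** i for v in present)
         for i in range(1, k + 1)]
    e = elementary_from_power_sums(p)

    def locator(z):
        # L(z) = z^k - e_1 z^(k-1) + e_2 z^(k-2) - ...
        return sum((-1) ** j * e[j] * z ** (k - j) for j in range(k + 1))

    # Root finding by trying every candidate.
    return [z for z in range(1, n + 1) if locator(z) == 0]

print(missing_numbers([1, 3, 4, 6, 7, 8, 10], 10, 3))  # [2, 5, 9]
```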