Remix clone Hacker News

new | show | ask | jobs Github

	▲	less_less a day ago
		Adding to some other comments in the thread: finding missing or extra numbers is closely related to error-correcting codes, especially binary linear codes. In an error-correcting code, you have a string of bits or symbols, with symbol x_i appearing at position i. You choose the code so that valid sequences have a certain mathematical property, and then if one or a few symbols are corrupted, then you can use that property to correct the errors. The property is typically that a certain linear function called the "syndrome" is zero, meaning that sum(x_i * G_i) = 0 where each G_i is some strategically chosen vector, particular to the code. The math for how to correct is particular to the chosen G_i, and it's a really interesting field of study. In a typical error-correcting code usage, you have an encoder which takes your message, and adds some extra symbols at the end which are calculated so that the syndrome is zero. Then when receiving your message, the receiver calculates the syndrome and if it's not zero, they know that at least one error has occurred. By using the code's decoding algorithm, they can figure out the fewest (and thus hopefully most likely) number of changes which would result in that error syndrome, and use this information to (hopefully) correct the transmission error. For the missing numbers problem, you can set x_i to "how many times does the number i appear?". Then since the syndrome is sum(x_i * G_i), you can compute the syndrome on an unordered list of the i's. You are expecting the syndrome to be the same as the syndrome of full set 1...n, so when it is not, you can figure out which few x_i's are wrong that would lead to the syndrome you observed. You have an advantage because you know how many numbers are missing, but it's only a slight one. The author's solution is called the Hamming code: you set F(i) = i, and you do the additions by xoring. Using error-correcting codes generalize to more missing numbers as well, including using xor, but the math becomes more complicated: you would want to use a fancier code such as a BCH or Goppa code. These also use xor, but in more complicated ways.