| ▲ | eru 4 days ago |
| > People rant about having to learn algorithmic questions for interviews. I get it — interview system is broken, but you ought to learn binary search at least. Well, the example of git bisect tells you that you should know of the concept of binary search, but it's not a good argument for having to learn how to implement binary search. Also just about any language worth using has binary search in the standard library (or as a third party library) these days. That's saner than writing your own, because getting all the corner cases right is tricky (and writing tests so they stay right, even when people make small changes to the code over time). |
|
| ▲ | Arcuru 4 days ago | parent | next [-] |
| Unfortunately I can't find the reference now, but I remember reading that even though binary search was first described in the 1940's, the first bug-free implementation wasn't published until the 1960s. The most problematic line that most people seem to miss is in the calculation of the midpoint index. Using `mid = (low + high) / 2` has an overflow bug if you're not using infinite precision, but there are several other potential problems even in the simplest algorithm. |
| |
| ▲ | kragen 4 days ago | parent | next [-] | | The overflow bug wasn't fixed until the 21st century; the comment you remember reading dates from before it was discovered. To be fair, in most computing environments, either indices don't overflow (Smalltalk, most Lisps) or arrays can never be big enough for the addition of two valid array indices to overflow, unless they are arrays of characters, which it would be sort of stupid to binary search. It only became a significant problem with LP64 and 64-bit Java. | | |
| ▲ | eru 4 days ago | parent [-] | | Agreed. Your comment is mostly true, when you do binary search in something like an array, yes. But you can also do binary search in any monotonically increasing function. | | |
| ▲ | kragen a day ago | parent [-] | | For arbitrary functions you usually want floating point, which solves the overflow problem in a different way. |
|
| |
| ▲ | eru 4 days ago | parent | prev [-] | | Agreed. > Using `mid = (low + high) / 2` has an overflow bug if you're not using infinite precision, but there are several other potential problems even in the simplest algorithm. Well, if you are doing binary search on eg items you actually hold in memory (or even disk) storage somewhere, like items in a sorted array (or git commits), then these days with 64 bit integers the overflow isn't a problem: there's just not enough storage to get anywhere close to overflow territory. A back of the envelope calculation estimates that we as humanity have produced enough memory and disk storage in total that we'd need around 75 bits to address each byte independently. But for a single calculation on a single computer 63 bits are probably enough for the foreseeable future. (I didn't go all the way to 64 bits, because you need a bit of headroom, so you don't run into the overflow issues.) |
|
|
| ▲ | runeblaze 4 days ago | parent | prev [-] |
| My personal mantra (that I myself cannot uphold 100%) is that every dev should at least do the exercise of implementing binary search from scratch in a language with arbitrary-precision integers (e.g., Python) once in a while. It is the best exercise in invariant-based thinking, useful for software correctness at large |
| |