| ▲ | kazinator 3 days ago | |
Sometimes, the difficulty begins only when you've identified the bad commit. You may have to further bisect the space to find the culprit. The following situation can occur in a compiler: something is changed (e.g. optimizer). The commit breaks as a result. But you have no idea where. Obviously, something is wrong with the optimization change, but from looking at the code, it's not obvious. The compiler is self-hosting, so it compiles itself as well as the library of the language, and then tests are run with everything. The optimization change miscompiled something in the compiler. That something then broke the compiler in a way that is miscompiled something in a run-time function, which then broke something else. To find the first miscompile, you can do the following binary search. Have the compiler hash the names of all top-level definitions that it is compiling across the system, say to a 64 bit value. Then in the code where the change was introduce, put an if/else switch: if the low N bits of the hash code are equal to a certain value, then optimize with the new logic. Otherwiwse use the old logic. We start with N=1 (1 bit) and the value 0. All definitions whose hash code ends in a 0 bit are subject to the broken new logic; those with 1 are subject to the old logic. Say that the bug reproduces. Then we keep the 0, and raise N to 2 for a two-bit mask. We try 00: all hashes ending with 00 use new optimization. Those ending in 10, use old. This time the problem doesn't reproduce. So we stick with 10. (And can validate that the problem now reproduces). We raise N to 3, and follow the same procedure. By doing this we reveal enough bits of the hash code to narrow it down to one definition (e.g. function): when that one specific function is subject to the compiler change, all hell breaks loose, otherwise not. From there we can analyze it: see how that definition is compiled before and after the mod, and what makes it break. From there, it is still hard work, but much easier to deduce why the optimization change is wrong and possibly how to fix it. I've successfully employed this technique several times on the TXR Lisp project to debug compiler issues. | ||