Posted by _spaceatom 11/2/2025
I've debugged things like this: a bug was introduced, but with no manifestation. Eventually, many commits later, some unrelated change triggers it. But this comes and goes. Some changes make the manifestation go away, and some changes make it reappear.
Git bisect is predicated on the bug not existing at the "good" end point and making a single appearance between that and the "bad" end of the range. It has allowance for commits not being testable; you can skip those. If the bad commit is one of the skipped ones, I think it tells you.
Try not to have any other kind of bug. :)
Well, the git bisect example tells you that you should know the concept of binary search, but it's not a good argument for having to learn how to implement one.
Also, just about any language worth using has binary search in the standard library (or as a third-party library) these days. That's saner than writing your own, because getting all the corner cases right is tricky, as is writing tests that keep them right even as people make small changes to the code over time.
The most problematic line that most people seem to miss is in the calculation of the midpoint index. Using `mid = (low + high) / 2` has an overflow bug if you're not using infinite precision, but there are several other potential problems even in the simplest algorithm.
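As a sketch, here is the classic search written with the overflow-safe midpoint. Python's integers are arbitrary-precision, so the overflow can't actually bite here; the `low + (high - low) // 2` form is the pattern that matters in fixed-width languages like C or Java:

```python
def binary_search(arr, target):
    """Return an index of target in sorted arr, or -1 if absent."""
    low, high = 0, len(arr) - 1
    while low <= high:
        # Equivalent to (low + high) // 2, but in a fixed-width integer
        # language this form cannot overflow, since high - low fits.
        mid = low + (high - low) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1
```

Other classic corner cases live in the loop condition (`<=` vs `<`) and the `mid + 1` / `mid - 1` updates, which is exactly why off-the-shelf implementations are preferable.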
To be fair, in most computing environments, either indices don't overflow (Smalltalk, most Lisps) or arrays can never be big enough for the addition of two valid array indices to overflow, unless they are arrays of characters, which it would be sort of stupid to binary search. It only became a significant problem with LP64 and 64-bit Java.
Your comment is mostly true when you're doing binary search on something like an array, yes.
But you can also do binary search over any monotonically increasing function.
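A minimal sketch of that generalization: find the smallest non-negative integer x with f(x) >= target, for a monotone f (the function name and bounds here are just for illustration):

```python
def lower_bound(f, target, low, high):
    """Smallest x in [low, high) with f(x) >= target, assuming f is
    monotonically non-decreasing. Returns high if no such x exists."""
    while low < high:
        mid = low + (high - low) // 2
        if f(mid) >= target:
            high = mid    # answer is mid or earlier
        else:
            low = mid + 1  # answer is strictly after mid
    return low

# e.g. integer square root: the smallest x with x*x >= 10 is 4
```

No array is materialized; the "indices" can range over anything you can evaluate f at, which is also how bisecting a commit range works conceptually.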
> Using `mid = (low + high) / 2` has an overflow bug if you're not using infinite precision, but there are several other potential problems even in the simplest algorithm.
Well, if you are doing binary search on eg items you actually hold in memory (or even disk) storage somewhere, like items in a sorted array (or git commits), then these days with 64 bit integers the overflow isn't a problem: there's just not enough storage to get anywhere close to overflow territory.
A back of the envelope calculation estimates that we as humanity have produced enough memory and disk storage in total that we'd need around 75 bits to address each byte independently. But for a single calculation on a single computer 63 bits are probably enough for the foreseeable future. (I didn't go all the way to 64 bits, because you need a bit of headroom, so you don't run into the overflow issues.)
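As a quick check of that arithmetic (the total-storage figure is a rough assumption, not a sourced number):

```python
import math

# Assume on the order of a few times 10**22 bytes of storage
# ever manufactured -- a rough back-of-the-envelope estimate.
total_bytes = 3e22
bits_needed = math.ceil(math.log2(total_bytes))  # bits to address each byte
```

With that assumption, `bits_needed` comes out to 75, so 64-bit indices are comfortably beyond any single machine's reach.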
Property based testing is really useful for finding corner cases in your binary search. See eg https://fsharpforfunandprofit.com/series/property-based-test... for one introduction.
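As a sketch of the idea in Python, here is a hand-rolled property check against a naive oracle, using only the standard library (dedicated tools like Hypothesis additionally generate, shrink, and replay failing cases for you; the `binary_search` under test is a stand-in built on `bisect`):

```python
import random
from bisect import bisect_left

def binary_search(arr, target):
    # Stand-in implementation under test: index of target, or -1.
    i = bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1

def check_property(trials=1000):
    for _ in range(trials):
        xs = sorted(random.choices(range(-50, 50), k=random.randrange(0, 20)))
        target = random.randrange(-60, 60)
        i = binary_search(xs, target)
        # Property: the result agrees with a naive linear search.
        assert (i == -1 and target not in xs) or xs[i] == target

check_property()
```

Random small lists with duplicates and out-of-range targets exercise exactly the corner cases (empty input, first/last element, absent element) that hand-written example tests tend to miss.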
You may have to further bisect the space to find the culprit.
The following situation can occur in a compiler: something is changed (e.g. in the optimizer), and the build breaks as a result, but you have no idea where. Obviously something is wrong with the optimization change, but it's not obvious from looking at the code. The compiler is self-hosting, so it compiles itself as well as the language's library, and then tests are run with everything. The optimization change miscompiled something in the compiler. That something then broke the compiler in a way that miscompiled something in a run-time function, which then broke something else.
To find the first miscompile, you can do the following binary search.
Have the compiler hash the names of all top-level definitions that it is compiling across the system, say to a 64 bit value.
Then, in the code where the change was introduced, put an if/else switch: if the low N bits of the hash code are equal to a certain value, optimize with the new logic; otherwise use the old logic.
We start with N=1 (1 bit) and the value 0. All definitions whose hash code ends in a 0 bit are subject to the broken new logic; those with 1 are subject to the old logic.
Say the bug reproduces. Then we keep the 0 and raise N to 2 for a two-bit mask. We try 00: all hashes ending in 00 use the new optimization; those ending in 10 use the old. This time the problem doesn't reproduce, so we settle on 10 (and can validate that the problem now reproduces). We raise N to 3 and follow the same procedure.
By doing this we reveal enough bits of the hash code to narrow it down to one definition (e.g. function): when that one specific function is subject to the compiler change, all hell breaks loose, otherwise not.
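The dispatch described above can be sketched like this (the names and hash choice are illustrative; in a real compiler the guard would sit wherever the new optimization is applied):

```python
import hashlib

def name_hash(name: str) -> int:
    """Stable 64-bit hash of a top-level definition's name."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest()[:8], "big")

# Bisection state: widen MASK_BITS by one each round, keeping VALUE
# set to whichever half still reproduces the bug.
MASK_BITS = 2
VALUE = 0b10

def use_new_optimization(defn_name: str) -> bool:
    mask = (1 << MASK_BITS) - 1
    return (name_hash(defn_name) & mask) == VALUE
```

Each round halves the set of definitions compiled with the new logic, so the number of rounds is only about log2 of the number of top-level definitions in the system.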
From there we can analyze it: see how that definition is compiled before and after the mod, and what makes it break. From there, it is still hard work, but much easier to deduce why the optimization change is wrong and possibly how to fix it.
I've successfully employed this technique several times on the TXR Lisp project to debug compiler issues.
I also think that, typically, if you have to resort to bisect, you are probably in a bad place. You should have found the bug earlier, so if you don't even know where the bug came from:
- your test coverage isn't sufficient
- your tests are probably not actually testing what you believe they do
- your architecture is too complex for you
To be clear though I do include myself in this abstract "you".
But few of us have the privilege of working on such codebases, and with people who have that kind of discipline and quality standards.
In reality, most codebases have statement coverage that rarely exceeds 50%, if coverage is tracked at all; tests are brittle, flaky, difficult to maintain, and likely have bugs themselves; and architecture is an afterthought for a system that grew organically under deadline pressure, where refactors are seen as a waste of time.
So given that, bisect can be very useful. Yet in practice it likely won't be, since usually the same teams that would benefit from it don't have the discipline to maintain a clean history with atomic commits, which is crucial for bisect to work. If the result is a 2000-line commit, you still have to dig through the code to find the root cause.
It’s still very satisfying to watch run though, especially if you write a script that it can run automatically (based on the existing code) to determine if it’s a good or bad commit.
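Such a script only needs to honor the exit-code contract of `git bisect run` (0 = good, 1-124 = bad, 125 = skip this commit). A sketch of that mapping in Python; the build and test commands it would wrap are project-specific placeholders:

```python
def classify(build_ok: bool, tests_ok: bool) -> int:
    """Map a commit's build/test outcome to the exit codes that
    `git bisect run` understands:
      0   -> this commit is good
      1   -> this commit is bad
      125 -> this commit can't be tested, skip it
    """
    if not build_ok:
        return 125  # e.g. the commit doesn't even compile
    return 0 if tests_ok else 1

# In the real script (check.py, a hypothetical name) you'd do roughly:
#   import subprocess, sys
#   build_ok = subprocess.run(["make"]).returncode == 0
#   tests_ok = build_ok and subprocess.run(["./run_tests.sh"]).returncode == 0
#   sys.exit(classify(build_ok, tests_ok))
# and drive it with:
#   git bisect start <bad> <good> && git bisect run python check.py
```

Returning 125 for unbuildable commits is what lets bisect skip them instead of mislabeling them as bad.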
1. Start with working code
2. Introduce bug
3. Identify bug
4. Write a regression test
5. Bisect with new test
In many cases you can skip the bisect because the description of the bug makes it clear where the issue is, but not always.
Tests can't prove the absence of bugs, only their presence. Bugs or regressions will slip through.
Bisect is for when that happens and the cause is not obvious.
You can go back and efficiently run a new test across old commits.
Yet, once you've identified a hole, you can write a script to test for it, run `git bisect` to identify which commit introduced the hole, and then triage the possible fallout.
To sum up: bisect and tests are not on opposing sides; they complement each other.
"What if the unit tests have bugs?"
I get that author learned a new-to-him technique and is excited to share with the world. But to this dev, with a rapidly greying beard, the article has the vibe of "Hey bro! You're not gonna believe this. But I just learned the Pope is catholic."
Binary search is one of the first things you learn in algorithms, and in a well-managed branch the commit history is already a sorted straight line, so it's just obvious as hell, whether you use your VCS to run the bisect or do it by hand yourself.
"Hey guys, check it out! Water is wet!"
I mean, do you really not know this XKCD?