Posted by _spaceatom 3 days ago
For more subtle bugs, where there's no hard error but something just isn't doing the right thing, yes, bisect can be more helpful, especially if there is a known old version where the thing works and it broke somewhere between that and the current version.
Tick tock. You need competence in depth when you have SLIs.
In general, this takes human-interactive time. Maybe not much, but generally more interactive time than it takes to write the bisect test script and invoke `git bisect run ...`.
The fact that it's noninteractive means you can do other work in the meantime. Once it's done, you may well have more information than if you had spent the same time interactively narrowing down the scope of the bug by hand.
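For reference, `git bisect run` just needs a command whose exit status classifies the checked-out commit: 0 means good, 125 means skip, and anything else from 1 to 127 means bad. A minimal sketch of such a probe script in Python; the build command and the reproduction step are hypothetical stand-ins for whatever your project needs:

```python
#!/usr/bin/env python3
# test_bug.py -- hypothetical bisect probe, invoked as:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run python3 test_bug.py
import subprocess
import sys

# If the commit doesn't even build, exit 125 so bisect skips it.
build = subprocess.run(["make", "-j"], capture_output=True)
if build.returncode != 0:
    sys.exit(125)

# Hypothetical reproduction step: run the program and inspect its output.
result = subprocess.run(["./app", "--selftest"], capture_output=True, text=True)
sys.exit(0 if "OK" in result.stdout else 1)  # 0 = good commit, 1 = bad
```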
Before you find even that, your fire-drill strategy is very important. Is there enough detail in the incident channel and your CD system for coworkers to put their dev sandboxes in the same state as production? Is there enough of a clue about what is happening for them to run speculative tests in parallel? Is the data architecture clean enough that your experiments don't change the outcome of mine? Onboarding docs and deployment-process docs, if they are tight, reduce the Amdahl's Law effect as it applies to figuring out what the bug is and where it is, which in this context is also Brooks's Law.
Well, the example of git bisect tells you that you should know of the concept of binary search, but it's not a good argument for having to learn how to implement binary search.
Also, just about any language worth using has binary search in the standard library (or as a third-party library) these days. That's saner than writing your own, because getting all the corner cases right is tricky (as is writing tests so they stay right even when people make small changes to the code over time).
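In Python, for example, that's the standard library's `bisect` module:

```python
import bisect

xs = [1, 3, 5, 7, 9]  # must already be sorted
i = bisect.bisect_left(xs, 5)         # leftmost position where 5 could go
print(i, i < len(xs) and xs[i] == 5)  # 2 True -- index and membership
```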
The most problematic line that most people seem to miss is in the calculation of the midpoint index. Using `mid = (low + high) / 2` has an overflow bug if you're not using infinite precision, but there are several other potential problems even in the simplest algorithm.
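A sketch of the standard fix, as a hand-rolled search in Python (where integers don't overflow, so the comments point at what goes wrong in fixed-width languages):

```python
def binary_search(items, target):
    """Return an index of target in the sorted list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        # In fixed-width arithmetic, (low + high) // 2 can overflow once
        # low + high exceeds the integer maximum; low + (high - low) // 2
        # never leaves the [low, high] range.
        mid = low + (high - low) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1   # the +1/-1 updates are where off-by-one bugs hide
        else:
            high = mid - 1
    return -1
```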
To be fair, in most computing environments, either indices don't overflow (Smalltalk, most Lisps) or arrays can never be big enough for the addition of two valid array indices to overflow, unless they are arrays of characters, which it would be sort of stupid to binary search. It only became a significant problem with LP64 and 64-bit Java.
Your comment is mostly true when you do binary search in something like an array, yes.
But you can also do binary search over any monotonically increasing function.
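A sketch of that in Python: find the smallest integer at which a monotone predicate flips from False to True. `git bisect` is this exact search, with "commit is bad" as the predicate:

```python
def first_true(lo, hi, pred):
    """Smallest x in [lo, hi) with pred(x) True, given that pred is
    monotone (False, ..., False, True, ..., True); returns hi if none."""
    while lo < hi:
        mid = lo + (hi - lo) // 2
        if pred(mid):
            hi = mid       # mid satisfies pred, so the answer is <= mid
        else:
            lo = mid + 1   # mid fails, so the answer is > mid
    return lo

# Example: smallest x with x * x >= 1_000_000 (prints 1000).
print(first_true(0, 10**7, lambda x: x * x >= 1_000_000))
```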
> Using `mid = (low + high) / 2` has an overflow bug if you're not using infinite precision, but there are several other potential problems even in the simplest algorithm.
Well, if you are doing binary search on, e.g., items you actually hold in memory (or even on disk) somewhere, like items in a sorted array (or git commits), then these days with 64-bit integers the overflow isn't a problem: there's just not enough storage to get anywhere close to overflow territory.
A back-of-the-envelope calculation suggests that humanity has produced enough memory and disk storage in total that we'd need around 75 bits to address each byte independently (2^75 bytes is roughly 38 zettabytes). But for a single calculation on a single computer, 63 bits are probably enough for the foreseeable future. (I didn't go all the way to 64 bits because you need a bit of headroom so you don't run into overflow issues.)
Property-based testing is really useful for finding corner cases in your binary search. See, e.g., https://fsharpforfunandprofit.com/series/property-based-test... for one introduction.
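A minimal sketch with Python's Hypothesis library; in practice you'd point it at your own implementation rather than the stdlib's:

```python
import bisect
from hypothesis import given, strategies as st

@given(st.lists(st.integers()), st.integers())
def test_bisect_left_invariants(xs, target):
    xs.sort()
    i = bisect.bisect_left(xs, target)
    # Defining property of the leftmost insertion point: everything to
    # the left is strictly smaller, everything from i on is >= target.
    assert all(x < target for x in xs[:i])
    assert all(x >= target for x in xs[i:])

test_bisect_left_invariants()  # Hypothesis generates many random cases
```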
I also think that, typically, if you have to resort to bisect, you are probably in the wrong place. You should have found the bug earlier, so if you do not even know where the bug came from:
- your test coverage isn't sufficient
- your tests are probably not actually testing what you believe they do
- your architecture is complex, too complex for you
To be clear though I do include myself in this abstract "you".
But few of us have the privilege of working on such codebases, and with people who have that kind of discipline and quality standards.
In reality, most codebases have statement coverage that rarely exceeds 50%, if coverage is tracked at all; tests are brittle, flaky, difficult to maintain, and likely have bugs themselves; and architecture is an afterthought for a system that grew organically under deadline pressure, where refactors are seen as a waste of time.
So given that, bisect can be very useful. Yet in practice it likely won't be, since usually the same teams that would benefit from it don't have the discipline to maintain a clean history with atomic commits, which is crucial for bisect to work. If bisect points at a 2000-line commit, you still have to dig through the code to find the root cause.
It’s still very satisfying to watch it run, though, especially if you write a script that it can run automatically (based on the existing code) to determine whether each commit is good or bad.
1. Start with working code
2. Introduce bug
3. Identify bug
4. Write a regression test
5. Bisect with new test
In many cases you can skip the bisect because the description of the bug makes it clear where the issue is, but not always.
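When you can't skip it, step 5 is mechanical. One caveat: since the regression test is new, keep it untracked (or outside the repo) so it survives the checkouts of old commits. Paths and test names here are hypothetical:

```sh
git bisect start <bad-commit> <good-commit>
git bisect run python3 -m pytest tests/test_regression.py::test_new_bug
git bisect reset
```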
Tests can't prove the absence of bugs, only their presence. Bugs or regressions will slip through.
Bisect is for when that happens and the cause is not obvious.
You can go back and efficiently run a new test across old commits.
Yet, once you've identified a hole, you can write a script to test for it, run `git bisect` to identify which commit introduced the hole, and then triage the possible fallout.
To sum up: bisect and tests are not on opposing sides; they complement each other.
"What if the unit tests have bugs?"
I get that the author learned a new-to-him technique and is excited to share it with the world. But to this dev, with a rapidly greying beard, the article has the vibe of "Hey bro! You're not gonna believe this. But I just learned the Pope is Catholic."
Binary search is one of the first things you learn in algorithms, and in a well-managed branch the commit history is already a sorted straight line, so it's just obvious as hell, whether you use your VCS to run the bisect or do it by hand yourself.
"Hey guys, check it out! Water is wet!"
I mean, do you really not know this XKCD?