Posted by vok 3 days ago

You can beat the binary search (lemire.me)
282 points | 129 comments | page 2
jstanley 16 hours ago|
As a teenager I spent a weekend thinking: if binary search is good because it cuts the search space in half at every step, then wouldn't a ternary search be better? Because we'd cut it into thirds at every step.

So instead of just comparing the middle value, we'd compare the one at the 1/3 point, and if that turns out to be too low then we compare the value at the 2/3 point.

Unfortunately although we cut the search space to 2/3 of what it was for binary search at each step (1/3 vs 1/2), we do 3/2 as many comparisons at each step (one comparison 50% of the time, two comparisons the other 50%), so it averages out to equivalence.

EDIT: See zamadatix reply, it's actually 5/3 as many comparisons because 2/3 of the time you have to do 2.

zamadatix 15 hours ago||
This ternary approach doesn't actually average 3/2 comparisons per level:

- First third: 1 comparison

- Second third: 2 comparisons

- Third third: 2 comparisons

(1+2+2)/3 = 5/3 average comparisons. I think the confusion starts with assuming it's 50% of the time, because it feels like "either you do 1 comparison or 2", but really there's a 1/3 chance it's found on the 1st comparison and a 2/3 chance it takes 2 comparisons.

This lets us show ternary is worse in total average comparisons, just barely: 5/3*Log_3[n] = 1.052... * Log_2[n].

In other words, you end up with fewer levels but doing more comparisons (on average) to get to the end. This is true for all searches of this type (w/ a few general assumptions like the values being searched for are evenly distributed and the cost of the operations is idealized - which is where the main article comes in) where the number of splits is > 2.
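
To make the arithmetic concrete, here's a quick empirical check (a rough sketch; the function names and the assumption of uniformly distributed, always-present keys are mine): it runs a ternary search over a sorted array of 3^12 elements and prints the average comparisons per lookup, which should land near (5/3)*log_3(n).

    #include <stdio.h>

    static long comparisons;

    /* ternary search: key assumed present in a[lo..hi) */
    static int ternary_search(const int *a, int lo, int hi, int key) {
        while (hi - lo > 1) {
            int m1 = lo + (hi - lo) / 3;
            int m2 = lo + 2 * (hi - lo) / 3;
            comparisons++;
            if (key < a[m1]) { hi = m1; continue; }
            comparisons++;
            if (key < a[m2]) { lo = m1; hi = m2; }
            else             { lo = m2; }
        }
        return lo;
    }

    int main(void) {
        enum { N = 531441 };                      /* 3^12 */
        static int a[N];
        for (int i = 0; i < N; i++) a[i] = i;
        for (int i = 0; i < N; i++) ternary_search(a, 0, N, i);
        printf("avg comparisons per lookup: %.3f\n", (double)comparisons / N);
        return 0;
    }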

jstanley 15 hours ago||
Oh yeah!
krackers 9 hours ago||
If you sort of squint this idea does work in cases where the cost of comparison is dominated by the cost of going down a level. And that leads you to things like b-trees where fetching a page from disk is expensive but doing the comparisons within that page is basically free.
GuB-42 14 hours ago|||
It turns out that teenager you had something.

Not for the ternary version of the binary search algorithm, because what you had is just a skewed binary search, not an actual ternary search. Because comparisons are binary by nature, any search algorithm involving comparisons is a type of binary search, and any choice other than the middle element is less efficient in terms of algorithmic complexity, though in some conditions it may be better on real hardware. For an actual ternary search, you need a 3-way comparison as an elementary operation.

Where it gets interesting is when you consider "radix efficiency" [1], for which the best choice is 3, the natural number closest to e. And it is relevant to tree search, that is, a ternary tree may be better than a binary tree.

[1] https://en.wikipedia.org/wiki/Optimal_radix_choice
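
For reference, the radix-economy calculation behind that claim (a summary of the idea in [1], not part of the original argument): the "cost" of representing n in radix b is taken as the radix times the number of digits, and that cost is minimized at b = e, with 3 the closest integer.

    E(b, n) = b \cdot \log_b n = (b / \ln b) \cdot \ln n
    \frac{d}{db} \left( \frac{b}{\ln b} \right) = \frac{\ln b - 1}{(\ln b)^2} = 0 \implies b = e \approx 2.718
    \frac{E(3, n)}{E(2, n)} = \frac{3 \ln 2}{2 \ln 3} \approx 0.946

So by this measure radix 3 is about 5% "cheaper" than radix 2, which is the sense in which a ternary tree can beat a binary one.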

eggprices 15 hours ago|||
When you can't seek quickly, e.g. on a disk, you can use a B-tree with, say, 128-way search. Fetching 128 keys doesn't cost much more than fetching 1, but one such fetch does the work of log2(128) = 7 binary-search probes.
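
Roughly what that looks like (a hedged sketch; the node layout and names are mine, not any real library's): a 128-way node packs up to 127 keys, so one node fetch covers seven levels' worth of binary decisions, and the in-node scan is cheap once the node is in memory.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct btree_node {
        int                nkeys;        /* number of keys stored, up to 127 */
        int32_t            key[127];
        struct btree_node *child[128];   /* all NULL in a leaf */
    } btree_node;

    /* Walk from the root to the leaf that could contain x. */
    static const btree_node *descend_to_leaf(const btree_node *n, int32_t x) {
        while (n->child[0] != NULL) {              /* internal node */
            int i = 0;
            while (i < n->nkeys && n->key[i] <= x) /* cheap in-memory scan */
                i++;
            n = n->child[i];
        }
        return n;
    }
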
bryanlarsen 16 hours ago|||
Did you continue by fantasizing about CPU's that contain ternary comparators?
ack_complete 15 hours ago|||
Note that CPUs have also gotten dramatically wider in both execution width and vector capability since you were a teenager. The increased throughput shifts the balance more toward being able to burn operations to reduce dependency chains. It's possible for your idea to have been both non-viable on the CPUs at the time and more viable on CPUs now.
nkurz 15 hours ago|||
> Unfortunately although we cut the search space to 2/3 of what it was for binary search at each step (1/3 vs 1/2), we do 3/2 as many comparisons at each step (one comparison 50% of the time, two comparisons the other 50%), so it averages out to equivalence.

True, but is there some particular reason that you want to minimize the number of comparisons rather than have a faster run time? Daniel doesn't overly emphasize it, but as he mentions in the article: "The net result might generate a few more instructions but the number of instructions is likely not the limiting factor."

The main thing this article shows is that (at least sometimes on some processors) a quad search is faster than a binary search _despite_ the fact that it performs theoretically unnecessary comparisons. While some computer scientists might scoff, I'd bet heavily that an optimized ternary search could also frequently outperform.

jstanley 15 hours ago||
You normally measure runtime of a sorting algorithm in terms of the number of comparisons it has to do.

Obviously real-world performance depends on other things as well.

Someone 15 hours ago||
Not “normally”, but “in computer science” and even then, mostly “in the past” and even then, only “typically” (there are sorting algorithms that make zero comparisons. See for example https://pages.cs.wisc.edu/~paton/readings/Old/fall01/LINEAR-...)

All other people live in the real world, and care about real-world performance, and modern computer scientists know that.

alexfoo 14 hours ago||
Those algorithms may not be doing any pairwise comparisons (e.g. between elements being sorted) but they still do plenty of comparisons.

And some of the algorithms, as described, still end up doing pairwise comparisons in all-but-optimal cases.

(Bucket sort requires items that end up in the same bucket to be sorted. This doesn't happen automatically via the algorithm as stated. Radix sort requires the items at each "level" to be sorted. Neither algorithm specifies how this should be done without pairwise comparisons.)

Counting Sort does work without pairwise comparisons, but is only efficient for small ranges of values, and if that's the case then it's obvious you don't need to apply a traditional sort if the number of elements greatly outnumbers the number of possible values.

Also, the algorithms still require some form of comparisons, just not pairwise comparisons.
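
For illustration, a minimal counting-sort sketch (the textbook algorithm, not code from any of the linked pages): no element is ever compared against another element; the only comparisons left are loop-bound checks against the value range.

    #include <stdlib.h>

    /* Sort n values, each in [0, k), with no pairwise comparisons. */
    void counting_sort(unsigned *a, size_t n, unsigned k) {
        size_t *count = calloc(k, sizeof *count);
        if (!count) return;
        for (size_t i = 0; i < n; i++) count[a[i]]++;   /* histogram */
        size_t out = 0;
        for (unsigned v = 0; v < k; v++)                /* emit values in order */
            for (size_t c = 0; c < count[v]; c++)
                a[out++] = v;
        free(count);
    }

As noted above, this only pays off when the value range k is small relative to n.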

> All other people live in the real world, and care about real-world performance, and modern computer scientists know that.

Yes, completely agree with that, but traditional "Comp Sci" is built on small building blocks of counting "comparisons" or "memory accesses". It's not designed to analyse prospective performance given modern processors with L1/L2/L3 caches, branch prediction, SIMD instructions, etc.

compiler-guy 14 hours ago|||
This idea is closely related to the famous "Stooge Sort", which is basically quicksort with the pivot at 1/3 rather than 1/2. Naively, one might think it is faster than Quicksort, but of course it isn't.

For years--maybe still?--analyzing its running time was a staple of the first or second problem set in a college-level "Introduction to Algorithms" course.

https://en.wikipedia.org/wiki/Stooge_sort

thaumasiotes 5 hours ago||
> the famous "Stooge Sort", which is basically quicksort with the pivot at 1/3 rather than 1/2. Naively, one might think it is faster than Quicksort, but of course it isn't.

The pivot in quicksort isn't located anywhere fixed. You can do quicksort by always choosing the pivot to be the first element in the (sub)list before you do the swapping. If it happens that the pivot you choose always belongs 1/3 of the way through the sorted sublist... that won't even affect the asymptotic running time; it will still be O(n log n).

Stoogesort is nothing like that. It's worse than quadratic.

In quicksort, though, once you've done your thing with the pivot, you break the list into two non-overlapping sublists. If it happened that the pivot broke the list into 1/3 and 2/3, you'd then recur with one sublist of length n/3 and one of length 2n/3.

Stoogesort has a different recursion pattern: you recur on the first 2/3, then you recur on the second 2/3, then you recur on the first 2/3 again.

The processing step isn't similar to quicksort either - in quicksort, you get a sublist, choose an arbitrary pivot, and then process the list such that every element of the list is correctly positioned relative to the pivot, which takes O(n) time. In stoogesort, you process the list by swapping the beginning with the end if those two elements are out of order, which takes O(1) time.
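
For concreteness, here is that recursion in code (a sketch of the standard definition from the Wikipedia page linked above): swap the two ends if they're out of order, then recurse on the first 2/3, the last 2/3, and the first 2/3 again, which solves to roughly O(n^2.71).

    /* Sorts a[lo..hi] inclusive. */
    static void stooge_sort(int *a, int lo, int hi) {
        if (a[lo] > a[hi]) {                         /* swap the two ends */
            int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        }
        if (hi - lo + 1 >= 3) {
            int third = (hi - lo + 1) / 3;
            stooge_sort(a, lo, hi - third);          /* first 2/3 */
            stooge_sort(a, lo + third, hi);          /* last 2/3 */
            stooge_sort(a, lo, hi - third);          /* first 2/3 again */
        }
    }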

madcaptenor 15 hours ago|||
Isn't it a bit better on average, although not as much as you'd hoped? For example 19 steps of binary search get you down to 1/524288 of the original search space with 19 comparisons. 12 steps of ternary search get you down to 1/3^12 = 1/531441 of the original search space with, on average, 12 * 3/2 = 18 comparisons.
jstanley 15 hours ago||
Maybe! But you can see the other comment that points out I was wrong and it is actually 5/3 comparisons so it still works out worse.
bena 15 hours ago|||
Imagine if you split the search space N times, no middles. Then you could just compare the value.
gobdovan 15 hours ago||
I thought this would be about how you can beat binary search in the 'Guess Who?' game. There's a cool math paper about it [0] and an approachable video by the author. [1]

[0] https://arxiv.org/abs/1509.03327

[1] https://www.youtube.com/watch?v=_3RNB8eOSx0

thaumasiotes 13 hours ago|
You can't beat binary search in Guess Who. From the abstract:

>> Instead, the optimal strategy for the player who trails is to make certain bold plays in an attempt to catch up.

The reason that's optimal, if you're losing, is that you assume that your opponent, who isn't losing, is going to use binary search. They're going to use binary search because it's the optimal way to find the secret.

Since you're behind, if you also use binary search, both players will progress toward the goal at the same rate, and you'll lose.

Trying to get lucky means that you intentionally play badly in order to get more victories. You're redistributing guesses taken between games in a negative-sum manner - you take more total guesses (because your search strategy is inferior to binary search), but they are unevenly distributed across your games, and in the relatively few games where you perform well above expectation, you can score a victory.

gobdovan 11 hours ago||
You're mixing two different objectives the paper presents. You can't beat binary search when the objective is to minimise the expected number of turns in a single player setting.

However, in a two player setting, using the strategies presented in the paper, you will beat an adversary that uses binary search in more than 50% of the games played.

Here's another visual demonstration: https://www.youtube.com/watch?v=zmvn4dnq82U

thaumasiotes 10 hours ago||
What do you think you're saying that I didn't already say?

> in a two player setting, using the strategies presented in the paper, you will beat an adversary that uses binary search in more than 50% of the games played.

This is technically true. But 50 percentage points of your "more than 50%" of games played are games where you exclusively use binary search. For the remainder, you're redistributing luck around between potential games in a way that is negative-sum, exactly like I just said.

gobdovan 9 hours ago||
> But 50 percentage points of your "more than 50%" of games played are games where you exclusively use binary search

Although I think I get your point, saying 'You can't beat binary search in Guess Who' is misleading, considering you yourself would probably describe the optimal strategy as 'play binary search when ahead; when behind, don't'.

> Trying to get lucky means that you intentionally play badly in order to get more victories

That's quite an uncommon definition of good and bad.

bediger4000 3 days ago||
The (AI generated?) image on this article is absolutely not helpful, and I think it's wrong based on how I read the article. Better not to have an image at all.
crazygringo 14 hours ago||
Seriously. It makes it seem like this is going to be a blog post either intended for elementary school students, or more likely for teachers on how to better explain some arithmetic concept to elementary school students.

It's absolutely bizarre. Images communicate meaning. Much better to have no image than to have an image that is completely misleading about the target audience or level of technical sophistication.

iosovi 16 hours ago||
Agreed, it threw me off at first but the rest of the article was quite nice.
aidenn0 16 hours ago||
If you are storing 16-bit integers, wouldn't an 8kB bitmap be even faster?
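
Something like this (a trivial sketch, membership-only): 2^16 bits = 8 kB, one bit per possible 16-bit value.

    #include <stdint.h>

    static uint64_t bitmap[65536 / 64];              /* 8 kB total */

    static inline void insert(uint16_t x)   { bitmap[x >> 6] |= 1ULL << (x & 63); }
    static inline int  contains(uint16_t x) { return (bitmap[x >> 6] >> (x & 63)) & 1; }
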
loeg 14 hours ago||
The library the author is talking about selects between bitmap and array dynamically depending on density.

https://roaringbitmap.org/

Findecanor 15 hours ago||
The range is 1..4096, so 4096 bits = 512 byte bitmap would suffice.

That is, if you're only ever going to test for membership in the set. If you need per-element metadata, you could store it in a packed array and use the population count of the bit-vector below the looked-up bit as the index into it. For each word of bits, store the accumulated population count of the preceding words to speed up the lookup. Modern CPUs are memory-bound, so I don't think SIMD would help much over using 64-bit words. For 4096 bits / 64, that would be 64 additional prefix counts.
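
A sketch of that rank trick (names and the 16-bit counter width are mine): the index into the packed metadata array is the precomputed count of set bits in earlier words, plus a popcount of the bits below the queried position within its own word.

    #include <stdint.h>

    typedef struct {
        uint64_t bits[64];     /* 4096-bit membership vector */
        uint16_t prefix[64];   /* set bits in all words before this one */
    } bitset4096;

    /* Returns the element's index into the packed metadata array, or -1. */
    static int member_rank(const bitset4096 *s, unsigned key) {  /* key in 0..4095 */
        uint64_t word = s->bits[key >> 6];
        if (!((word >> (key & 63)) & 1))
            return -1;                                           /* not a member */
        uint64_t below = word & ((1ULL << (key & 63)) - 1);      /* bits before key */
        return s->prefix[key >> 6] + __builtin_popcountll(below);
    }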

aidenn0 8 hours ago||
The size is 1..4096, the range is implied to be the full 16-bit integer range.
BeetleB 15 hours ago||
Some of the plots would have been much more helpful if, instead of absolute values in seconds, the y-axis were the multiplier w.r.t. binary search (and eyeballing suggests a relatively constant multiplier).

Obviously this isn't changing the big-O complexity, but in the "real world" it's still nice to see a 2-4x speedup.

quirino 14 hours ago||
On optimizing binary search: https://en.algorithmica.org/hpc/data-structures/binary-searc...
garaetjjte 13 hours ago|
I once needed binary search in memory-mapped files and experimented with the Eytzinger layout (which I learned from https://bannalia.blogspot.com/2015/06/cache-friendly-binary-...). It turned out to be slower than plain binary search, I think because the keys I was looking up were often clumped together, so plain binary search played quite well with the cache anyway.
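
For reference, the layout looks roughly like this (a minimal sketch of the general idea, not the code I used): keys live in e[1..n] in breadth-first order of an implicit binary tree, so each search step goes to index 2k or 2k+1 rather than jumping halfway across the array.

    #include <stddef.h>

    /* Fill e[1..n] from the sorted array a[0..n-1] by in-order traversal. */
    static size_t eytzinger_build(const int *a, int *e, size_t i, size_t k, size_t n) {
        if (k <= n) {
            i = eytzinger_build(a, e, i, 2 * k, n);
            e[k] = a[i++];
            i = eytzinger_build(a, e, i, 2 * k + 1, n);
        }
        return i;
    }

    /* Index (in e) of the first key >= x, or 0 if there is none. */
    static size_t eytzinger_lower_bound(const int *e, size_t n, int x) {
        size_t k = 1;
        while (k <= n)
            k = 2 * k + (e[k] < x);                        /* go left or right */
        k >>= __builtin_ctzll(~(unsigned long long)k) + 1; /* undo trailing right turns */
        return k;
    }
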
layer8 12 hours ago||
…for 16-bit integers, and it’s still a binary search with the same asymptotic complexity, just a constant-factor speedup.
senfiaj 13 hours ago||
The title is slightly misleading. I mean, yes, naive binary search might have a larger constant, but the algorithm is still O(log(n)). This is still a "divide and conquer" style algorithm, just with a bunch of CPU-specific optimizations. Also, this works well with simple data types, like integers; with more complex objects (custom comparators) it matters less.
pfortuny 12 hours ago||
The complexity of binary search in terms of "search" (comparison) operations is exactly log_2(n)+1, not just O(n). This algorithm just uses modern and current processor architecture artifacts to "improve" it on arrays of up to 4096 elements.

So not exactly "n" as in O(n).

Also: only for 16-bit integers.

senfiaj 12 hours ago||
> The complexity of binary search in terms of "search" (comparison) operations is exactly log_2(n)+1, not just O(n)

> So not exactly "n" as in O(n).

For large enough inputs the algorithm with better Big O complexity will eventually win (at least in the worst cases). Yes, sometimes that never happens in practice because the constants are too large. But say 100 * n will eventually beat 5 * n * log(n) for large enough n. Some advanced algorithms switch to algorithms with worse Big O complexity but smaller constants for small enough sub-problems to improve performance. But that's more of an optimization detail than a completely different algorithm.

> This algorithm just uses modern and current processor architecture artifacts to "improve" it on arrays of up to 4096

Yes, that's my point. It's basically "I made binary search for integers X times faster on some specific CPUs". "Beating binary search" is somewhat misleading; it's more like "micro-optimizing binary search".

cubefox 13 hours ago||
> The title is slightly misleading, I mean yes, naive binary search might have larger constant but the algorithm is still O(log(n)).

I think the title is not misleading since the Big O notation is only supposed to give a rough estimate of the performance of an algorithm.

(I agree though that binary search is already extremely fast, so making something twice as fast won't move the needle for the vast majority of applications where the speed bottleneck is elsewhere. Even infinite speed, i.e. instant sorted search, would likely not be noticeable for most software.)

senfiaj 12 hours ago||
For me it's slightly misleading because it's almost like saying "I wrote a faster quicksort implementation, so it beats quicksort!". In this case the fundamental "divide and conquer" idea of binary search is still there; the article just does micro-optimizations (which seem to be not very portable and are less relevant/applicable to more complex data structures) in order to reduce the constant part.

Yes, algorithmic complexity is theoretical and often ignores the real-world constants, but it is usually useful when comparing algorithms on larger inputs, unless we are talking about "galactic algorithms" with insanely large constants.

wood_spirit 16 hours ago||
A beautiful algorithm.

Would there be any value in using SIMD to check the whole cache line that you fetch for exact matches during the narrowing phase, for an early out?
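
Something like this is what I have in mind (a rough sketch, assuming SSE2 and the article's 16-bit keys; the helper name is made up): one compare covers 8 keys, so four of these would cover a 64-byte cache line.

    #include <immintrin.h>
    #include <stdint.h>

    /* Returns the lane index of key within a block of 8 contiguous uint16_t, or -1. */
    static int block_contains(const uint16_t *block, uint16_t key) {
        __m128i keys   = _mm_loadu_si128((const __m128i *)block);   /* 8 x u16 */
        __m128i needle = _mm_set1_epi16((short)key);
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi16(keys, needle));
        return mask ? __builtin_ctz(mask) / 2 : -1;                 /* 2 mask bits per lane */
    }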

gobdovan 15 hours ago|
I remember I had a pedagogy class at uni taught by psychology faculty, and I was messing with them by proposing a mock syllabus where we'd teach students binary search, then the advanced ones ternary search, and the very advanced ones Quaternary, with a big Q, as in the geological period. Joke's on me now, I suppose.