Posted by Bogdanp 1 day ago
In particular, this means that in going from BB(5) to BB(6), you have already crossed the line where the actual busy beaver TM can no longer be simulated step by step in the lifetime of our universe (or a googol lifetimes of our universe for that matter).
It really is mind bending how fast this function grows.
Either way, the only BB axiom you can add without blowing up ZFC is the correct one.
A similarly fast-growing function is the functional busy beaver [3]. Among all 77519927606 closed lambda terms of size <= 49 bits, there is one whose normal form size exceeds the vastly larger Graham's Number.
Several beaver fans believe that BB(7) might exceed Graham's Number as well, which struck me as unlikely enough to offer a $1k bet against it, the outcome of which will be known in under a decade.
While the BB function is obviously a well-defined function over the integers, I find it helpful to think of it as a function over qualitatively heterogeneous items—such as stones, bread toasters, mechanical watches, and computers. The key idea is to view the underlying computing devices not as “a little more powerful” than the previous ones, but as fundamentally different kinds of entities.
Well, is there a best current UPPER bound, or at least a "probvious" one?
I find the question about a probvious upper bound more interesting. There is also not a probvious upper bound on BB(6), as this would require at least some understanding of the high-level behavior of all remaining holdout machines.

However, there may soon be a 'probvious' upper bound on the value of BB(3,3) (BB for 3-state, 3-symbol machines). Up to equivalence, there are four remaining machines to decide in order to find the value of BB(3,3). One is a 'probviously halting' machine which will be the new champion if it halts, and for which probabilistic models of its behavior predict an exact halting time with high probability. One is a 'probviously nonhalting' machine. The other two machines are not well-understood enough to say whether they have any probvious long-term behavior, but some suspect that they both 'probviously nonhalt'. If that is true, it could be said that a 'probvious' upper bound exists for BB(3,3).
EDIT: For fun I converted it to Rust and expected to see it spew a few million numbers into my terminal, but no, this equivalent loop actually terminates after 15 steps, which is fascinating given that the Turing machine takes 47 million steps:
let mut x = 0;
loop {
    x = match x % 3 {
        0 => 5 * x + 18,
        1 => 5 * x + 22,
        _ => break,
    } / 3;
}
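For what it's worth, instrumenting the same loop with a step counter (my own sketch, not part of the original comment) makes the count easy to confirm:

```rust
// Same recurrence as above, but counting loop iterations; the final pass
// that hits x % 3 == 2 and breaks is counted as the 15th step.
fn steps_to_terminate() -> u32 {
    let mut x: u64 = 0;
    let mut steps = 0;
    loop {
        steps += 1;
        x = match x % 3 {
            0 => 5 * x + 18,
            1 => 5 * x + 22,
            _ => break,
        } / 3;
    }
    steps
}

fn main() {
    println!("terminates after {} steps", steps_to_terminate()); // -> 15
}
```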
OEIS link: https://oeis.org/A386909

EDIT 2: Of course the article mentions this in the next paragraph, which is what I get for being immediately nerd-sniped.
By not using any math functions except for increment by one?
- “Avoid the Collatz Conjecture at All Costs!” (Math Kook) https://www.youtube.com/watch?v=TxBRcwkRjmc
- “Experienced mathematicians warn up-and-comers to stay away from the Collatz conjecture. It’s a siren song, they say: Fall under its trance and you may never do meaningful work again.” https://www.quantamagazine.org/mathematician-proves-huge-res...
- "Mathematics is not yet ready for such problems [as Collatz].” (Paul Erdős)
Now, suppose we knew the Nth Busy Beaver number, which we’ll call BB(N). Then we could decide whether any Turing machine with N rules halts on a blank tape. We’d just have to run the machine: if it halts, fine; but if it doesn’t halt within BB(N) steps, then we know it never will halt, since BB(N) is the maximum number of steps it could make before halting. Similarly, if you knew that all mortals died before age 200, then if Sally lived to be 200, you could conclude that Sally was immortal. So no Turing machine can list the Busy Beaver numbers—for if it could, it could solve the Halting Problem, which we already know is impossible.
But here’s a curious fact. Suppose we could name a number greater than the Nth Busy Beaver number BB(N). Call this number D for dam, since like a beaver dam, it’s a roof for the Busy Beaver below. With D in hand, computing BB(N) itself becomes easy: we just need to simulate all the Turing machines with N rules. The ones that haven’t halted within D steps—the ones that bash through the dam’s roof—never will halt. So we can list exactly which machines halt, and among these, the maximum number of steps that any machine takes before it halts is BB(N).
Conclusion? The sequence of Busy Beaver numbers, BB(1), BB(2), and so on, grows faster than any computable sequence. Faster than exponentials, stacked exponentials, the Ackermann sequence, you name it. Because if a Turing machine could compute a sequence that grows faster than Busy Beaver, then it could use that sequence to obtain the D‘s—the beaver dams. And with those D’s, it could list the Busy Beaver numbers, which (sound familiar?) we already know is impossible. The Busy Beaver sequence is non-computable, solely because it grows stupendously fast—too fast for any computer to keep up with it, even in principle.
In short, my read is that the argument does not rule out that there is a computable function that grows faster than BB(N), but rather it shows that it is impossible to prove or “decide” whether a given computable function grows faster than BB(N).
Maybe this is equivalent to the conclusion stated? Am I missing something obvious? (That seems likely; Scott Aaronson is much better at this than me.)
Edited for clarity
The halting problem is uncomputable (by diagonalization): no computable function can solve the halting problem. That is stronger than the weaker statement that we merely cannot assert that "this" particular function solves it. As a result, if somebody asserts a function that purports to solve the halting problem, we can definitively say that either it isn't computable, or the thing it solves isn't the halting problem.
For the Busy Beaver function, it's obvious that the resulting program (if it existed) would solve the halting problem, so clearly it's not a computable function, and analyzing what part of it isn't computable leads you to the Busy Beaver as the only option.
For the D(N) function... well, since we assume D(N) ≥ BB(N), D(N) is still an upper bound on the number of steps a halting TM could run, so the resulting program using D(N) in lieu of BB(N) would still solve the halting problem, which forces us to conclude that it's not computable.
A different argument that may make more sense is this:
Consider the program "if machine M has not halted after f(sizeof M) steps, print not halt, else print halt." If f is a computable function, then the program is clearly computable. But since no computable program can solve the halting problem, we know that this program cannot either. Therefore, for every computable function f, there must exist some machine M such that M halts only after more than f(sizeof M) steps. In other words, f cannot be an upper bound on the Busy Beaver function.
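To make the shape of that argument concrete, here's a hedged Rust sketch (the names and the toy step-function machine model are mine, not from the thread): if `f` really were a computable upper bound on how long any halting machine of a given size can run, then `halts_within(step, start, f(size))` would decide halting for that machine.

```rust
// A toy "machine" is just a step function on a state; `None` means halt.
// If `bound` were >= the true maximum running time for machines of this
// size (i.e. an upper bound on BB), this loop would decide halting.
fn halts_within(step: impl Fn(u64) -> Option<u64>, start: u64, bound: u64) -> bool {
    let mut state = start;
    for _ in 0..bound {
        match step(state) {
            None => return true,        // halted inside the bound
            Some(next) => state = next, // keep simulating
        }
    }
    false // did not halt within `bound` steps; with a true bound, never will
}

fn main() {
    // The 5x+18 / 5x+22 loop from elsewhere in the thread, as a step function.
    let step = |x: u64| match x % 3 {
        0 => Some((5 * x + 18) / 3),
        1 => Some((5 * x + 22) / 3),
        _ => None,
    };
    assert!(halts_within(step, 0, 100)); // halts in 15 steps, well under 100
    println!("decided: halts");
}
```

Since no such decider can exist, no computable `f` can bound BB, which is exactly the contrapositive the comment describes.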
My concern is that the argument leaves open the possibility of a larger computable function, even if it would be impossible to demonstrate that it is in fact larger for all N.
I’m sure that this possibility is somehow foreclosed (that is, I’m not trying to say that the claim is wrong, just that I think there is a case that isn’t covered by the argument). But I don’t quite see it.
The key is that we can't prove that your function f grows faster than BB. That makes all the difference.
[0] https://web.archive.org/web/20251027173129/https://benbrubak...
13122 -> 50/18, 55/42, 539/2, 297/275, 2/55, 2/11
comp h^2 > comp H^2
comp h c > done H
comp > done c^2
done H^2 > done h^3
done H > comp
done > comp
Accumulator: comp h^8
The first two rules do the odd/even comparison; the h register is moved into the H register during the comparison. Then done applies H += H>>1 (two H become three h, i.e. h ← ⌊3h/2⌋), and the cycle repeats. The sequence is (truly/fairly) random in its distribution of residues mod 2.
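For reference, the iteration being described can be sketched directly. This assumes the commonly cited formulation of the Antihydra: h ← h + ⌊h/2⌋ starting from 8, with a balance counter that gains 2 on even h and loses 1 on odd h, halting if the balance ever goes negative.

```rust
// Returns true if the machine is still running after `steps` iterations.
// u128 keeps h exact for a couple hundred iterations (h grows ~1.5x per step).
fn antihydra_survives(steps: u32) -> bool {
    let mut h: u128 = 8;
    let mut balance: i64 = 0;
    for _ in 0..steps {
        if h % 2 == 0 { balance += 2; } else { balance -= 1; }
        if balance < 0 {
            return false; // the halting condition fired
        }
        h += h / 2; // h <- floor(3h/2)
    }
    true
}

fn main() {
    assert!(antihydra_survives(150));
    println!("no halt in the first 150 steps");
}
```

Empirically the balance drifts upward at about +0.5 per step, which is exactly the biased-random-walk picture discussed below.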
Even fair coins flipped infinitely would, on occasion, produce arbitrarily long runs of heads or tails.
So the question becomes, is the anti-hydra sequence 'sufficiently' random?
In other words, an unbiased random walk should almost surely return to the origin, but a biased random walk will fail to return to the origin with nonzero probability. This can be considered a biased random walk [0], since the halting condition linearly moves further and further away from the expected value of the 50/50 walk.
That said, empirically and in all current analyses, the Antihydra's parity behaves as if it were roughly fair over long spans (neither a proven odd nor even bias), and the short-range statistics look pseudo-random. Non-halting is overwhelmingly plausible... but a concrete proof seems out of reach.
Consider a simpler version, where you flip a coin three times, then four times, then five times, etc., and you stop if you ever get the same side for every flip in a given turn. The probability that you'll stop is at most 1/4 + 1/8 + 1/16 + ..., which is 50% (that sum is a union bound; the exact stopping probability is somewhat lower, since only the first all-same run ends the game). If you do this forever then you'll eventually see a run of ten trillion heads or tails, but you probably won't see that run before your ten trillionth turn.
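Putting numbers on that (my own sketch, not from the thread): the series 1/4 + 1/8 + 1/16 + ... sums to 1/2, and since the turns are independent, the exact stopping probability is 1 − ∏(1 − 2^(1−n)) over n = 3, 4, ..., which comes out around 0.42:

```rust
// Sum the per-turn all-same probabilities (union bound) and compute the
// exact stopping probability as 1 minus the chance that no turn is all-same.
fn stop_probability() -> (f64, f64) {
    let mut union_bound = 0.0_f64;
    let mut p_no_stop = 1.0_f64;
    for n in 3..200 {
        let p_all_same = (2.0_f64).powi(1 - n); // P(n fair flips all match) = 2/2^n
        union_bound += p_all_same;
        p_no_stop *= 1.0 - p_all_same;
    }
    (union_bound, 1.0 - p_no_stop)
}

fn main() {
    let (bound, exact) = stop_probability();
    println!("union bound: {bound:.6}"); // -> 0.500000
    println!("exact stop:  {exact:.6}"); // about 0.42, strictly below 1/2
}
```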
So I think the question is, does the anti-hydra sequence ever diverge sufficiently from randomness?
This is true.
But it would still halt. Infinity is weird like that. To be clear, I mean the sequence of coin flips where the total value of heads/tails is 2:1.
The probability of having a 2:1 ratio of heads/tails - at some point - in an infinite sequence of fair flips is 1, is it not?
The anti-hydra may have a bias, and only if that bias is against the halt condition do we have a case where we can conclude that the anti-hydra does not halt.
For example, randomly picking the number 0.5 out of the interval of real numbers [0,1] has probability 0, and yet it might happen. The probability of picking an irrational number instead was 1 (because almost all real numbers are irrational), but that didn't happen.
Even if you consider a countably infinite number of events, as with the coinflip example, it might just happen that the coin flips to one side forever.
Since the machines under consideration just represent one specific sequence of events, probabilistic arguments may be misleading.
Relevant xkcd: https://xkcd.com/221/
This problem is equivalent to a one-dimensional random walk where the terminating condition is reaching a value equal to the number of steps you've taken divided by 3. I'm not quite sure how to calculate the probability of that.
Intuitively, I'd expect this to have a probability strictly between 0 and 1. The typical displacement grows like sqrt(n) (it's the standard deviation, not the variance, that scales as sqrt(n)), which falls arbitrarily far short of n/3.
Looking at it another way, this should be very similar to the gambler's ruin problem where the gambler is playing against an infinitely rich house and their probability of winning a dollar is p = 2/3. If the gambler starts with $1 then the probability of ever reaching zero is (q/p)^1 = (1/3)/(2/3) = 50%. Reference for that formula: https://www.columbia.edu/~ks20/FE-Notes/4700-07-Notes-GR.pdf
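One way to sanity-check that 50% figure without the closed form (my own sketch, using standard first-step analysis rather than anything from the linked notes): starting from $1, the ruin probability r satisfies r = q + p·r², because you either lose the first dollar outright (prob q) or climb to $2 and must then lose two dollars, each descent happening with probability r. Iterating that recursion from 0 converges to the smallest root:

```rust
// First-step analysis for gambler's ruin against an infinitely rich house:
// from $1, ruin happens immediately with prob q, or we reach $2 (prob p)
// and must then lose two dollars, each with independent probability r.
fn ruin_probability(p: f64) -> f64 {
    let q = 1.0 - p;
    let mut r = 0.0_f64;
    for _ in 0..1000 {
        r = q + p * r * r; // converges to the smallest fixed point
    }
    r
}

fn main() {
    let r = ruin_probability(2.0 / 3.0);
    println!("ruin probability from $1: {r:.6}"); // -> 0.500000
}
```

With p = 2/3 the two roots of p·r² − r + q = 0 are 1/2 and 1, and the iteration settles on 1/2, matching (q/p)^1.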
What are you trying to say?
> The probability of having a 2:1 ratio of heads/tails - at some point - in an infinite sequence of fair flips is 1, is it not?
Yes, but "probability = 1" absolutely does not mean "will happen eventually" in pure mathematics. Infinity is weird like that.
In the old days we used to just chop wood, and burn it to keep warm. Then sit down and watch the sportsball game on TV to waste time.
People who use this word should be banned from the internet for life.