Posted by baruchel 4 days ago
For nearly 50 years theorists believed that if solving a problem takes t steps, it should also need roughly t bits of memory: 100 steps, roughly 100 bits. To be exact, t/log(t).
Ryan Williams found that any problem solvable in time t needs only about sqrt(t) bits of memory: a 100-step computation could be compressed and solved with something on the order of 10 bits.
For nearly 50 years theorists believed that there exist problems taking t steps that also need roughly t/log(t) bits of memory.
(The Quanta magazine article https://www.quantamagazine.org/for-algorithms-a-little-memor... gives some context: for a long time, the best result was t/log(t) as proved in "On Time Versus Space" https://dl.acm.org/doi/10.1145/322003.322015 by Hopcroft, Paul and Valiant in 1975.)
* High space requirement: we need a bucket per step to track the state
* Low space requirement: we only need a single bucket, if we can mix/unmix the various components of the state
High-space requirement algorithms will be those with a large amount of "unmixable" complex state. Low-space reqs will be those with either a small state, or an easily mixed/unmixed state.
Example of mixing: suppose we need to store an odd number together with a sign (+,-). Then we can store a single odd/even number: an odd value n means +n, and an even value n means -(n-1).
So 7 = +7, 8 = -7
I don't know if this example is really that good, but I believe the very basics of the intuition are correct -- it's to do with how we can trade "interpretation" of data (computation) for the amount of data, in such a way that the data can kind of be mixed/unmixed in the interpretation.
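Here is a minimal Python sketch of that packing, just to make the 7 = +7, 8 = -7 mapping concrete (my own toy encoding, nothing from the paper):

def pack(sign, odd):
    # Odd stored value  -> +odd       (7 means +7)
    # Even stored value -> -(value-1) (8 means -7)
    assert odd % 2 == 1
    return odd if sign > 0 else odd + 1

def unpack(stored):
    # Recover (sign, odd) from the single stored integer.
    if stored % 2 == 1:
        return +1, stored
    return -1, stored - 1

assert pack(+1, 7) == 7 and pack(-1, 7) == 8
assert unpack(7) == (+1, 7) and unpack(8) == (-1, 7)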
* The old assumption was that if you require t steps to complete, you need t/log(t) units of computation
* This new proof shows that if you require t steps to complete, you need sqrt(t) units of computation
Surely this doesn't make sense? Using any definition of "unit of computation" I would intuitively assume, computing t steps requires something proportional to t units of computation...
> Our simulation reduces the problem of simulating time-bounded multitape Turing machines to a series of implicitly-defined Tree Evaluation instances with nice parameters, leveraging the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024].
The "time-bounded multitape Turing machines" with bounded fan-in means that that particular abstract model has access to the bits of those tapes current head position.
Mapping the quirks of the various abstract models can be tricky, but remember having access to the symbol currently under the tape head is 'free' in TMs.
It is a useful abstraction but doesn't directly map to physically realizable machine.
It is still an interesting result for tree evaluation on physically realizable machines.
Does it require one to first have solved the problem in uncompressed space?
So, to answer your question more directly: no, you don't have to have solved a specific instance of the problem, but you have to already have an algorithm that solves it in general.
At least or at most or on average?
I mean, for all I know, the result may have been proved in the model of decision problems where the output is always one bit. But I'd guess that it generalizes just fine to the case where you have to write longer outputs.
However, your question does make me wonder if the result still applies in the RAM model, where you can move to any spot on the tape in constant time. My theory knowledge is getting rusty...
This seems to wildly contradict the idea that the amount of memory needed is roughly proportional to the number of steps.
That's precisely the issue: of course for many problems there are obvious ways to trade off space and time. Smart people who have spent their lives thinking about this are aware of this. :-) The issue here is: can this always be done? For an arbitrary computation that takes t time, can it always be simulated in much less space (using more time of course), no matter what the computation is? Lots of people have tried, and the best result until now was t/log t in the 1970s.
Edit: A comment on a previous discussion of this result by the author of the Quanta article, which substantiates and links to researchers' opinion that it's the universality (works for every problem) that's the surprising part: https://news.ycombinator.com/item?id=44075191
You can write some trivial programs that need this much memory. But most of them can be optimised to use less space (and still compute the same answer).
The big question was: can you always optimise the space usage down from the ridiculous maximum and to what extent?
In addition you can look at how much extra time you actually need: infinite time is a vast overestimate. The new proof gives much more reasonable time bounds.
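As a toy illustration of the "obvious" kind of tradeoff being talked about here (nothing to do with Williams' technique, just a standard example in Python): you can spend memory on a precomputed table so each primality query is instant, or spend time redoing trial division for every query and use almost no memory.

import math

def sieve_is_prime(N):
    # Spend memory to save time: an O(N)-entry table, then O(1) per query.
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(N) + 1):
        if is_prime[p]:
            for q in range(p * p, N + 1, p):
                is_prime[q] = False
    return is_prime

def trial_division_is_prime(n):
    # Spend time to save memory: O(1) extra space, ~sqrt(n) work per query.
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

table = sieve_is_prime(1000)
assert all(table[n] == trial_division_is_prime(n) for n in range(1001))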
log to what base? 2 or e or 10 or...
Why do programmers have to be so sloppy?
You can convert between log bases by multiplying by a constant factor. But any real-world program will also have a constant factor associated with it, depending on the specific work it is doing and the hardware it is running on. So it is usually pointless to consider constant factors when theorising about computer programs in general.
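Concretely, the change of base is log_b(x) = log_a(x) / log_a(b), so switching bases only rescales by a constant, which big-O swallows. A quick sanity check with Python's math module:

import math

x = 1_000_000.0
# Base-2 and base-10 logs are just the natural log rescaled by the
# constants 1/ln(2) and 1/ln(10).
assert math.isclose(math.log2(x), math.log(x) / math.log(2))
assert math.isclose(math.log10(x), math.log(x) / math.log(10))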
In programming, for example, base e is more common: log is base e in C/C++, JavaScript, Java, Python, Perl, Mathematica, Fortran, and Rust.
Another example is number theory. I just checked a few number theory books from my library and most used base e: Hardy & Wright's "An Introduction to the Theory of Numbers", Apostol's "Introduction to Analytic Number Theory", Baker's "A Comprehensive Course in Number Theory", Ingham's "The Distribution of Prime Numbers", Baker's "Transcendental Number Theory", Ribenboim's "The Book of Prime Number Records", Kumanduri & Romero's "Number Theory with Computer Applications", and Niven's "Irrational Numbers".
The only number theory book I found using ln rather than log was Gelfond's "Transcendental & Algebraic Numbers".
I'm now a bit curious what a table of the most common log notation by country and scientific field would look like, as it seems there is indeed a lot of variance...
https://mathworld.wolfram.com/Ln.html
> The United States Department of Commerce recommends that the notation ln x be used in this way to refer to the natural logarithm (Taylor 1995, p. 33).
> Unfortunately, mathematicians in the United States commonly use the symbol log x to refer to the natural logarithm, as does TraditionalForm typesetting in the Wolfram Language.
Except in mathematics and physics, where it usually is base e.
Except sometimes in computer science, where it can be base 2.
But there are more of those weirdnesses: "ld" can be "log decimal", so base 10; or "logarithmus dualis", so base 2. Base 2 is also sometimes written as "lb" (log binary). You really need to know the conventions of the field/subfield to know which is which.
I usually see either base e or base 2 as the default. In which applications do people use logarithms with base 10 as the default? I mean, especially these days. In the olden days of the slide rule, base 10 was probably more common.
Engineering. For example dB is defined in base 10.
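For instance, a power ratio of 1000 works out to 10 * log10(1000) = 30 dB, and halving the power is about 10 * log10(0.5) ≈ -3 dB.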
Highly recommend
We can’t predict the future, but we don't have free will in the sense most people think we do, imho. Many of those split-brain cases seem to confirm this stance.
In general I'm quite sceptical of taking physical theories as guidance for life, even more so speculative interpretations of physical theories. Physics can tell us, probabilistically, how relatively simple systems will evolve in time, but most events in our life are way beyond what it can predict, so that should caution us against extrapolating its concepts to these more complicated phenomena. We don't know what conceptual innovations might be required to have a fully coherent and precise account of them, or if that is even possible. The best insight we can get on our life is to actually live it and reflect on those experiences.
Other considerations are that we don't currently have a fundamental physical theory, since general relativity and the standard model don't work together. Even when they do apply, the equations can't be solved exactly for systems of more than two particles. Everything else involves approximations and simplifications, and in fact even the concept of particle is only valid in a non-relativistic approximation. That suggests to me that physical theories are best thought of at the moment as tools that have a limited domain of effectiveness, though of course within that domain they're extremely effective.
Yes, we have enough accurate theories now that we can predict parts of the future with incredible accuracy in ways that weren't possible 100+ years ago, but we don't have a bulletproof theory of everything, much less a bulletproof theory of everything about humans.
Championing superdeterminism is like being the smart alec who says they can predict anything if you give them enough initial context. Sure, now go make risky investments that will only go up.
The Heisenberg uncertainty principle itself shows that it is not worth fretting too much about superdeterminism.
We will never be able to replace every theory that uses probabilities with ones that don't.
In any case, I don't think determinism and free will have much to do with each other. Eg a classically random fair coin is non-deterministic, but can hardly be said to have free will.
In the other direction, a Turing machine is completely deterministic, but in general you can't predict what it does (without running it).
If you want to bring up quantum mechanics, I think the no-cloning theorem is more important: no one can learn everything about you in order to 100% simulate and predict you. (Which is what you'd need for the Turing machine. And humans are at least as complicated as Turing machines.)
Superdeterminism requires crazy 'conspiracy theories' to work.
Btw, I think determinism and free will are compatible. At least for some definitions of free will.
For algorithms, a little memory outweighs a lot of time. 343 points, 139 comments, 39 days ago. https://news.ycombinator.com/item?id=44055347
You need much less memory than time. 126 points, 11 comments, 22 days ago. https://news.ycombinator.com/item?id=44212855
Can any current algorithms actually be improved by this? Or is that only the case if the hardware itself is improved?
I am baffled
Memoization is likely the best you can do.
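For concreteness, memoization is the opposite trade from the one in the paper: it spends memory to buy time. A minimal sketch with a toy function (my own example, not anything from the article):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # The cache turns an exponential-time recursion into linear time,
    # at the cost of storing O(n) intermediate results.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(90))  # fast with the cache; hopeless without it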
> In this paper, we make another step towards separating P from PSPACE, by showing that there are problems solvable in O(n) space that cannot be solved in n^2/log^c n time for some constant c > 0. [0]
Of course, this is all specifically in the context of multitape Turing machines, which have very different complexity characteristics than the RAM machines we're more used to.
Link to paper
(Perhaps more AI coaching could help?)
Yet, the general population isn't behaving like scouts most of the time, so I suppose it's only fair that developers also aren't.
"Leave the world a better place than you found it" (s/world/code/g)
Don’t have to because they usually have a full machine, and unless they saturate it, the problem doesn’t exist.
Don’t care, because many people think it’s alright if things are slow, or inefficient (the user can wait, or will live with 100 ms instead of 1 ms).
Don’t know what to do - this one is interesting : many people don’t realize it’s actually possible to get the same result , but faster.
It's not that.
I've encountered code where someone wrote complicated "scale out" code instead of learning how to use a database and fix the n+1 problem; or another case where someone wrote complicated caching code instead of learning how to avoid cartesian explosion.
The result doesn't actually apply to _all_ algorithms, only oblivious ones. Oblivious algorithms are ones where step i does not depend on decisions made in steps j < i. Almost all useful algorithms are non-oblivious, with some notable exceptions [1]. This work is a step forward towards proving the result for the general case, with little practical usage (and that's perfectly okay).
def intpow(n, r): return int(n**r)

N = 10000
a = [0] * N
a[1] = 1
for m in range(2, N):
    # Each entry depends on the immediately preceding entry and on entries
    # far back in the array (at indices m^0.6, m^0.3 and m^0.1).
    a[m] = a[m-1] + a[intpow(m, 0.6)] + a[intpow(m, 0.3)] + a[intpow(m, 0.1)]
print(a[N-1])
It populates an array of size N by, for each index m, accessing array elements at smaller values m-1, m^0.6, m^0.3 and m^0.1 (just arbitrary values I made up). So this is a program that runs in time about N, and as it can depend on array elements that go very far back, it may not be immediately obvious how to make it run with space significantly less than N^0.9 (say), let alone √N. But the result shows that it can be done (disclaimer: I haven't actually verified the details of how the proof carries over to this program, but it looks like it should be possible), and for any program no matter how weird.

Before this result, there were believed to be at least some problems where you need (almost) as much space as time, but Williams shows that, theoretically at least, that's never necessarily the case.
So it's more like a north star. Maybe not directly practical, but it will lead tons of people in the right direction.
An analogy: thermodynamics says how efficient a heat engine _could_ be. If your engine is way below that, you know how much of an improvement there _could_ be, and that it's not an impossible problem. That will get people to build better stuff.
> Does it have any import on real algorithms?
It depends on the constants, but if the constants are good, you could, for example, have VMs that reduce memory usage.
As-in algorithms you'll care about if you're a programmer and not overly interested in theory? Not really, no.