Posted by baruchel 4 days ago
For nearly 50 years theorists believed that if solving a problem takes t steps, it should also need roughly t bits of memory: 100 steps, roughly 100 bits. To be exact, t/log(t).
Ryan Williams found that any problem solvable in time t needs only about sqrt(t) bits of memory: a 100-step computation could be compressed and solved with something on the order of 10 bits.
For nearly 50 years theorists believed that there exist problems taking t steps that also need roughly t/log(t) bits of memory.
(The Quanta magazine article https://www.quantamagazine.org/for-algorithms-a-little-memor... gives some context: for a long time, the best result was t/log(t) as proved in "On Time Versus Space" https://dl.acm.org/doi/10.1145/322003.322015 by Hopcroft, Paul and Valiant in 1975.)
* High space requirement: we need a bucket per step to track the state
* Low space requirement: we only need a single bucket, if we can mix/unmix the various components of the state
High-space requirement algorithms will be those with a large amount of "unmixable" complex state. Low-space reqs will be those with either a small state, or an easily mixed/unmixed state.
Example of mixing: suppose we need to store an odd number together with a sign (+,-). Then we can store a single odd/even number: an odd value n means +n, and an even value n means -(n-1).
So 7 = +7, 8 = -7
I don't know if this example is really that good, but I believe the very basics of the intuition are correct -- it's to do with how we can trade "interpretation" of data (computation) for the amount of data, in such a way that the data can kind of be mixed/unmixed in the interpretation.
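Here is a minimal Python sketch of that packing, just to make the 7 = +7, 8 = -7 mapping concrete (my own toy encoding, nothing from the paper):

def pack(sign, odd):
    # Odd stored value  -> +odd       (7 means +7)
    # Even stored value -> -(value-1) (8 means -7)
    assert odd % 2 == 1
    return odd if sign > 0 else odd + 1

def unpack(stored):
    # Recover (sign, odd) from the single stored integer.
    if stored % 2 == 1:
        return +1, stored
    return -1, stored - 1

assert pack(+1, 7) == 7 and pack(-1, 7) == 8
assert unpack(7) == (+1, 7) and unpack(8) == (-1, 7)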
* The old assumption was that if you require t steps to complete, you need t/log(t) units of computation
* This new proof shows that if you require t steps to complete, you need sqrt(t) units of computation
Surely this doesn't make sense? Using any definition of "unit of computation" I would intuitively assume, computing t steps requires something proportional to t units of computation...
> Our simulation reduces the problem of simulating time-bounded multitape Turing machines to a series of implicitly-defined Tree Evaluation instances with nice parameters, leveraging the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024].
The "time-bounded multitape Turing machines" with bounded fan-in means that that particular abstract model has access to the bits of those tapes current head position.
Mapping the quirks of the various abstract models can be tricky, but remember having access to the symbol currently under the tape head is 'free' in TMs.
It is a useful abstraction but doesn't directly map to physically realizable machine.
It is still an interesting result for tree evaluation on physically realizable machines.
Does it require one to first have solved the problem in uncompressed space?
So, to answer your question more directly: no, you don't have to have solved a specific instance of the problem, but you have to already have an algorithm that solves it in general.
At least or at most or on average?
I mean, for all I know, the result may have been proved in the model of decision problems where the output is always one bit. But I'd guess that it generalizes just fine to the case where you have to write longer outputs.
However, your question does make me wonder if the result still applies in the RAM model, where you can move to any spot on the tape in constant time. My theory knowledge is getting rusty...
This seems to wildly contradict the idea that the amount of memory needed is roughly proportional to the number of steps.
That's precisely the issue: of course for many problems there are obvious ways to trade off space and time. Smart people who have spent their lives thinking about this are aware of this. :-) The issue here is: can this always be done? For an arbitrary computation that takes t time, can it always be simulated in much less space (using more time of course), no matter what the computation is? Lots of people have tried, and the best result until now was t/log t in the 1970s.
Edit: A comment on a previous discussion of this result by the author of the Quanta article, which substantiates and links to researchers' opinion that it's the universality (works for every problem) that's the surprising part: https://news.ycombinator.com/item?id=44075191
You can write some trivial programs that need this much memory. But most of them can be optimised to use less space (and still compute the same answer).
The big question was: can you always optimise the space usage down from the ridiculous maximum and to what extent?
In addition you can look at how much extra time you actually need: infinite time is a vast overestimate. The new proof gives much more reasonable time bounds.
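As a toy illustration of the "obvious" kind of tradeoff being talked about here (nothing to do with Williams' technique, just a standard example in Python): you can spend memory on a precomputed table so each primality query is instant, or spend time redoing trial division for every query and use almost no memory.

import math

def sieve_is_prime(N):
    # Spend memory to save time: an O(N)-entry table, then O(1) per query.
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, math.isqrt(N) + 1):
        if is_prime[p]:
            for q in range(p * p, N + 1, p):
                is_prime[q] = False
    return is_prime

def trial_division_is_prime(n):
    # Spend time to save memory: O(1) extra space, ~sqrt(n) work per query.
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

table = sieve_is_prime(1000)
assert all(table[n] == trial_division_is_prime(n) for n in range(1001))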
log to what base? 2 or e or 10 or...
Why do programmers have to be so sloppy?
You can convert between log bases by multiplying by a constant factor. But any real-world program will also have a constant factor associated with it, depending on the specific work it is doing and the hardware it is running on. So it is usually pointless to consider constant factors when theorising about computer programs in general.
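Concretely, the change of base is log_b(x) = log_a(x) / log_a(b), so switching bases only rescales by a constant, which big-O swallows. A quick sanity check with Python's math module:

import math

x = 1_000_000.0
# Base-2 and base-10 logs are just the natural log rescaled by the
# constants 1/ln(2) and 1/ln(10).
assert math.isclose(math.log2(x), math.log(x) / math.log(2))
assert math.isclose(math.log10(x), math.log(x) / math.log(10))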
In programming, for example, base e is more common: log is base e in C/C++, JavaScript, Java, Python, Perl, Mathematica, Fortran, and Rust.
Another example is number theory. I just checked a few number theory books from my library and most used base e: Hardy & Wright's "An Introduction to the Theory of Numbers", Apostol's "Introduction to Analytic Number Theory", Baker's "A Comprehensive Course in Number Theory", Ingham's "The Distribution of Prime Numbers", Baker's "Transcendental Number Theory", Ribenboim's "The Book of Prime Number Records", Kumanduri & Romero's "Number Theory with Computer Applications", and Niven's "Irrational Numbers".
The only number theory book I found using ln rather than log was Gelfond's "Transcendental & Algebraic Numbers".
I'm now a bit curious what a table of the most common log notation by country and scientific field would look like, as it seems there is indeed a lot of variance...
https://mathworld.wolfram.com/Ln.html
> The United States Department of Commerce recommends that the notation ln x be used in this way to refer to the natural logarithm (Taylor 1995, p. 33).
> Unfortunately, mathematicians in the United States commonly use the symbol log x to refer to the natural logarithm, as does TraditionalForm typesetting in the Wolfram Language.
Except in mathematics and physics, where it usually is base e.
Except sometimes in computer science, where it can be base 2.
But there are more of those weirdnesses: "ld" can be "log decimal", so base 10; or "logarithmus dualis", so base 2. Base 2 is also sometimes written as "lb" (log binary). You really need to know the conventions of the field/subfield to know which is which.
I usually see either base e or base 2 as the default. In which applications do people use logarithms with base 10 as the default? I mean, especially these days. In the olden days of the slide rule, base 10 was probably more common.
Engineering. For example dB is defined in base 10.
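For instance, a power ratio of 1000 works out to 10 * log10(1000) = 30 dB, and halving the power is about 10 * log10(0.5) ≈ -3 dB.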
Highly recommend
We can’t predict the future, but we don't have free will in the sense most people think we do, imho. Many of those split-brain cases seem to confirm this stance.
In general I'm quite sceptical of taking physical theories as guidance for life, even more so speculative interpretations of physical theories. Physics can tell us, probabilistically, how relatively simple systems will evolve in time, but most events in our life are way beyond what it can predict, so that should caution us against extrapolating its concepts to these more complicated phenomena. We don't know what conceptual innovations might be required to have a fully coherent and precise account of them, or if that is even possible. The best insight we can get on our life is to actually live it and reflect on those experiences.
Other considerations are that we don't currently have a fundamental physical theory, since general relativity and the standard model don't work together. Even when they do apply, the equations can't be solved exactly for systems of more than two particles. Everything else involves approximations and simplifications, and in fact even the concept of particle is only valid in a non-relativistic approximation. That suggests to me that physical theories are best thought of at the moment as tools that have a limited domain of effectiveness, though of course within that domain they're extremely effective.
Yes, we have enough accurate theories now that we can predict parts of the future with incredible accuracy in ways that weren't possible 100+ years ago, but we don't have a bulletproof theory of everything, much less a bulletproof theory of everything about humans.
Championing superdeterminism is like being the smart alec who says they can predict anything if you give them enough initial context. Sure, now go make risky investments that will only go up.
The Heisenberg uncertainty principle itself shows that it is not worth fretting too much about superdeterminism.
We will never be able to replace every theory that uses probabilities with ones that don't.
In any case, I don't think determinism and free will have much to do with each other. Eg a classically random fair coin is non-deterministic, but can hardly be said to have free will.
In the other direction, a Turing machine is completely deterministic, but in general you can't predict what it does (without running it).
If you want to bring up quantum mechanics, I think the no-cloning theorem is more important: no one can learn everything about you in order to 100% simulate and predict you. (Which is what you'd need for the Turing machine. And humans are at least as complicated as Turing machines.)
Superdeterminism requires crazy 'conspiracy theories' to work.
Btw, I think determinism and free will are compatible. At least for some definitions of free will.
For algorithms, a little memory outweighs a lot of time. 343 points, 139 comments, 39 days ago. https://news.ycombinator.com/item?id=44055347
You need much less memory than time. 126 points, 11 comments, 22 days ago. https://news.ycombinator.com/item?id=44212855
Can any current algorithms actually be improved by this? Or is that only the case if the hardware itself is improved?
I am baffled
Memoization is likely the best you can do.
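For concreteness, memoization is the opposite trade from the one in the paper: it spends memory to buy time. A minimal sketch with a toy function (my own example, not anything from the article):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # The cache turns an exponential-time recursion into linear time,
    # at the cost of storing O(n) intermediate results.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(90))  # fast with the cache; hopeless without it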
> In this paper, we make another step towards separating P from PSPACE, by showing that there are problems solvable in O(n) space that cannot be solved in n^2/log^c n time for some constant c > 0. [0]
Of course, this is all specifically in the context of multitape Turing machines, which have very different complexity characteristics than the RAM machines we're more used to.
Link to paper
(Perhaps more AI coaching could help?)
Yet, the general population isn't behaving like scouts most of the time, so I suppose it's only fair that developers also aren't.
"Leave the world a better place than you found it" (s/world/code/g)
Don’t have to because they usually have a full machine, and unless they saturate it, the problem doesn’t exist.
Don’t care, because many people think it’s alright if things are slow, or inefficient (the user can wait, or will live with 100 ms instead of 1 ms).
Don’t know what to do - this one is interesting : many people don’t realize it’s actually possible to get the same result , but faster.
It's not that.
I've encountered code where someone wrote complicated "scale out" code instead of learning how to use a database and fix the n+1 problem; or another case where someone wrote complicated caching code instead of learning how to avoid cartesian explosion.
The result doesn't actually apply to _all_ algorithms, only oblivious ones. Oblivious algorithms are ones where step i does not depend on decisions made in steps j < i. Almost all useful algorithms are non-oblivious, with some notable exceptions [1]. This work is a step forward towards proving the result for the general case, with little practical usage (and that's perfectly okay).
def intpow(n, r): return int(n**r)

N = 10000
a = [0] * N
a[1] = 1
for m in range(2, N):
    # Each entry depends on the immediately preceding entry and on entries
    # far back in the array (at indices m^0.6, m^0.3 and m^0.1).
    a[m] = a[m-1] + a[intpow(m, 0.6)] + a[intpow(m, 0.3)] + a[intpow(m, 0.1)]
print(a[N-1])
It populates an array of size N by, for each index m, accessing array elements at smaller values m-1, m^0.6, m^0.3 and m^0.1 (just arbitrary values I made up). So this is a program that runs in time about N, and as it can depend on array elements that go very far back, it may not be immediately obvious how to make it run with space significantly less than N^0.9 (say), let alone √N. But the result shows that it can be done (disclaimer: I haven't actually verified the details of how the proof carries over to this program, but it looks like it should be possible), and for any program no matter how weird.

Before this result, there were believed to be at least some problems where you need (almost) as much space as time, but Williams shows that, theoretically at least, that's never necessarily the case.
So it's more like a north star. Maybe not directly practical, but it will lead tons of people in the right direction.
An analogy: thermodynamics says how efficient a heat engine _could_ be. If your engine is way below that, you know how much of an improvement there _could_ be, and that it's not an impossible problem. That will get people to build better stuff.
> Does it have any import on real algorithms?
It depends on the constants, but if the constants are good, you could, for example, have VMs that reduce memory usage.
As-in algorithms you'll care about if you're a programmer and not overly interested in theory? Not really, no.