
Posted by signa11 6/26/2025

Revisiting Knuth's “Premature Optimization” Paper (probablydance.com)
191 points | 139 comments | page 2
monkeyelite 5 days ago|
- Knuth puts sentinels at the end of an array to avoid having to bounds check in the search.
- Knuth uses the register keyword.
- Knuth custom-writes each data structure for each application.
hinkley 5 days ago||
When I was young someone pointed out to me how each hype cycle we cherry-pick a couple interesting bits out of the AI space, name them something else respectable, and then dump the rest of AI on the side of the road.

When I got a bit older I realized people were doing this with performance as well. We just call this part architecture, and that part Best Practices.

sitkack 4 days ago||
I like to call anything I don't like "anti-patterns".
bluGill 4 days ago||
When Knuth wrote his paper, compilers/optimizers were not nearly as good as they are today, and CPUs were much more deterministic (only disk access showed the kind of issues we now see with cache misses). Most of his optimizations are things that the compiler will do better than you can (most, not all!) and so are not even worth thinking about. Meanwhile, the compiler will almost never turn an O(n^2) algorithm into an O(n log n) one, even though that gives a much bigger speedup than anything the compiler can do.
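
For concreteness, here is the kind of algorithmic rewrite a compiler won't do for you. This is an illustrative C sketch with made-up function names, not code from the thread: a duplicate check done with nested loops versus sort-then-scan.

    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    /* O(n^2): compare every pair. */
    bool has_duplicate_quadratic(const int *a, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = i + 1; j < n; j++)
                if (a[i] == a[j])
                    return true;
        return false;
    }

    static int cmp_int(const void *p, const void *q)
    {
        int x = *(const int *)p, y = *(const int *)q;
        return (x > y) - (x < y);
    }

    /* O(n log n): sort a copy, then scan adjacent elements once. */
    bool has_duplicate_sorted(const int *a, size_t n)
    {
        int *copy = malloc(n * sizeof *copy);
        if (!copy)
            return has_duplicate_quadratic(a, n);  /* fall back if allocation fails */
        memcpy(copy, a, n * sizeof *copy);
        qsort(copy, n, sizeof *copy, cmp_int);
        bool dup = false;
        for (size_t i = 1; i < n && !dup; i++)
            dup = (copy[i] == copy[i - 1]);
        free(copy);
        return dup;
    }

No optimizer will derive the second version from the first; that choice has to be made by the programmer.
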
monkeyelite 4 days ago||
Knuth does this today.

And a compiler will never make sentinels for you. Most of these tricks still work and make a measurable difference.
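
A minimal sketch of the sentinel trick in C (illustrative only, not Knuth's own code), assuming the array is allocated with one spare slot at the end:

    #include <stdio.h>

    /* Sentinel linear search: the key itself is stored just past the last
       element, so the hot loop needs only one comparison and no separate
       bounds check. */
    static int sentinel_search(int *a, int n, int key)
    {
        a[n] = key;                 /* sentinel guarantees the loop terminates */
        int i = 0;
        while (a[i] != key)
            i++;
        return i < n ? i : -1;      /* -1: only the sentinel matched */
    }

    int main(void)
    {
        int a[8] = {4, 9, 2, 7, 1, 8, 5};            /* 7 elements + 1 spare slot */
        printf("%d\n", sentinel_search(a, 7, 7));    /* prints 3 */
        printf("%d\n", sentinel_search(a, 7, 42));   /* prints -1 */
        return 0;
    }

The point is that the inner loop does one test per element instead of two.
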

osigurdson 5 days ago||
The real root of all evil is reasoning by unexamined phrases.
hinkley 5 days ago|
"A clever saying proves nothing." - Voltaire
lo0dot0 3 days ago||
It's also good practice to choose an appropriate language for the problem. It's not premature optimisation to use a compiled language instead of Python when you already know, because of the application, that the code will run thousands of times in a loop and use a lot of electrical energy in the process.
subharmonicon 5 days ago||
Love this paper and read it several times, most recently around 10 years ago when thinking about whether there were looping constructs missing from popular programming languages.

I have made the same point several times online and in person that the famous quote is misunderstood and often suggest people take the time to go back to the source and read it since it’s a wonderful read.

p0w3n3d 4 days ago||
I understand Knuth's _Premature Optimization_ saying as: (1) find the hottest code (ideally with a full heat chart from a profiler), (2) optimize there. That's all of it, and it really works. E.g. if you have some code that prepares data for an O(n^2) loop and you work hard on that data preparation, profile it, etc., it will not give you any visible improvement, because it is one-time code. What's inside the O(n^2) loop is more important: optimize there, and perhaps even prepare the data differently so the loop can run more efficiently.
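
As a toy illustration of that point (hypothetical C, not from the comment above): effort inside the O(n^2) loop pays off, e.g. hoisting a per-row computation out of the inner loop, while micro-tuning the one-time preparation step would not show up at all.

    #include <math.h>
    #include <stddef.h>

    /* The hot path: an O(n^2) pass over an n-by-n matrix. */
    void scale_rows_naive(double *m, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++)
                m[i * n + j] /= sqrt((double)(i + 1));  /* recomputed n times per row */
    }

    void scale_rows_hoisted(double *m, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            double s = 1.0 / sqrt((double)(i + 1));     /* computed once per row */
            for (size_t j = 0; j < n; j++)
                m[i * n + j] *= s;
        }
    }

(A good optimizer may well hoist this particular computation by itself; the sketch is only about where the time goes, not about outsmarting the compiler.)
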
osigurdson 4 days ago|
Suggest not over-indexing on this approach. The profiler will not always identify an easily fixable hotspot, and performance should not be entirely post hoc; some upfront investment pays off. Sure, not every algorithm matters, but some basic upfront care (like not doing O(N^2) when O(N) is possible, or not hitting the DB for every item in a loop) will pay off. The alternative can be weeks or months of profiling and debugging, and the fixed code often introduces other bugs, because code built on top of it has made various assumptions.

I'm speaking from experience in a large code base where people liked to use the "root of all evil" argument quite a bit at one time.

p0w3n3d 2 days ago||
You're right, we should keep our development skills in shape and hold on to sanity when writing code. Especially N+1 queries and other kinds of calls to another system can cause problems that are invisible to a profiler (or at least to some kinds of profilers).

That's why, when converting data from one list to another, I used to always initialize the latter with the known size (now I mostly use Java streams).

On the other hand, there was me doing JMH microbenchmarking on code that turned out to parse configuration files only once at startup, which was the moment I learnt the said quote from Donald Knuth from the colleagues reviewing my code.
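
The known-size habit translates outside Java collections too. A minimal C sketch of the same idea (hypothetical names, just for illustration): when the output size is known up front, allocate it once instead of growing the destination incrementally.

    #include <stdlib.h>

    /* Convert one array into another of the same, known length. */
    int *squares_of(const int *in, size_t n)
    {
        int *out = malloc(n * sizeof *out);   /* one allocation of the known size */
        if (out == NULL)
            return NULL;
        for (size_t i = 0; i < n; i++)
            out[i] = in[i] * in[i];
        return out;
    }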

kragen 5 days ago||
I happen to have also reread the paper last week. The last time I read it was 30 years ago, closer to when it was written than to the present, and I had forgotten a lot of it. One of the few bits that stuck with me was the Shanley Design Criterion. Another was "n and a half loops".

The bit about sequential search and optimization, the topic of this whole blog post, is kind of a minor detail in the paper, despite being so eloquently phrased that it's the part everyone quotes—sometimes without even knowing what “optimization” is. There's a table of contents on its second page, which is 33 lines long, of which "A Searching Example" and "Efficiency" are lines 4 and 5. They are on pages 266–269, 3 pages of a 41-page paper. (But efficiency is an issue he considers throughout.)

Mostly the paper is about control structures, and in particular how we can structure our (imperative!) programs to permit formal proofs of correctness. C either didn't exist yet or was only known to less than five people, so its break/continue control structures were not yet the default. Knuth talks about a number of other options that didn't end up being popular.

It was a really interesting reminder of how different things looked 51 years ago. Profilers had just been invented (by Dan Ingalls, apparently?) and were still not widely available. Compilers usually didn't do register allocation. Dynamically typed languages and functional programming existed, in the form of Lisp and APL, but were far outside the mainstream because they were so inefficient. You could reliably estimate a program's speed by counting machine instructions. People were sincerely advocating using loops that didn't allow "break", and subroutines without early "return", in the interest of building up their control flow algebraically. Knuth considered a recursive search solution to the N-queens problem to be interesting enough to mention it in CACM; similarly, he explains the tail-recursion optimization as if it's not novel but at least a bit recondite, requiring careful explanation.

He mentions COBOL, BCPL, BLISS, Algol W, Algol 60, Algol 68, other Algols, PL/I (in fact including some example code), Fortran, macro assemblers, "structured assemblers", "Wirth's Pascal language [97]", Lisp, a PDP-10 Algol compiler called SAIL (?), META-II, MIXAL, PL360, and something called XPL, but not Smalltalk, CLU, APL, FORTH, or BASIC.

He points out that it would be great for languages to bundle together the set of subroutines for operating on a particular data type, as Smalltalk and CLU did, but he doesn't mention CLU; it had only been introduced in the previous year. But he's clearly thinking along those lines (p.295):

> (...) it turns out that a given level of abstraction often involves several related routines and data definitions; for example, when we decide to represent a table in a certain way, we also want to specify the routines for storing and fetching data from that table. The next generation of languages will probably take into account such related routines.

Often when I read old papers, or old software documentation like the TENEX EXEC manual, it's obvious why the paths not taken were not taken; what we ended up doing is just obviously better. This paper is not like that. Most of the alternatives mentioned seem like they might have turned out just as well as what we ended up with.

another_twist 4 days ago||
I mean, basically what he's saying is: check the impact of your optimization. Every time there's an optimization, there's a complexity and brittleness cost. However, sometimes unoptimized code is actually more difficult to read than the optimized version. I mean, it's quite logical tbf.
globular-toast 4 days ago||
Wait... people understood "premature optimisation" to mean "small optimisations are not worth it"? I've always understood it to mean exactly what it's supposed to mean, namely don't optimise something until you've shown that it's actually needed. I honestly don't know how it could be interpreted any other way.
bluGill 4 days ago|
It is sad how often people cite premature optimization in the real world even when it isn't premature. Using a good algorithm is never premature optimization, particularly when the good algorithm is likely built into your language as well. Likewise if you have run the profiler it is not premature optimization, and running the profiler to find those places is not premature optimization.
yubblegum 5 days ago||
Ironically, all pdfs of the famous paper have atrocious typesetting and are a pain to read.
svat 5 days ago||
The word “typesetting” does not refer to all aspects of visual appearance, but merely to the crucial decision of where each character goes: what glyph (from the fonts available) is placed at what position on the page. This paper was first published in the ACM Computing Surveys journal, and the typesetting is fine (as you'll find if you pick up a physical copy of the journal) — I think what you're complaining about is that all PDF versions that have been put up online so far (including the official one available via https://doi.org/10.1145/356635.356640) are from the same poor scan of the journal: it's not the typesetting that is atrocious, but the scanning quality (converting the printed page into digital images).

You may also want to look at the version published in Knuth's collection called Literate Programming (with corresponding errata: https://cs.stanford.edu/~knuth/lp.html#:~:text=Errata,correc... ), and I've just uploaded a scan here: https://shreevatsa.net/tmp/2025-06/DEK-P67-Structured.progra...

yubblegum 5 days ago||
My eyes thank you!
aitchnyu 4 days ago||
The typesetting expert's papers are hard on the eyes. The algorithm expert's programs are not executed.
SleepyMyroslav 4 days ago|
(meta) I am probably wasting my time commenting on linked article here. Nobody does that /s.

I don't think the measurements support the conclusion that well.

What I want to have when I see those measurements:

I want the language's abstract machine and the compiler not to get in the way of how I want code on certain platforms to perform. That is currently not what I get at all; the language actively works against me. For example, I cannot read a cache line from an address because my object may not span that far. At best, the compiler goes off in its own direction. This means there is no way to specify things, and all I can do is test the compiler output after each minor update; in a multi-year project such updates can happen dozens of times! The ergonomics of trying to specify things are actually getting worse. The example with assembly matches my other experiences: the compiler now ignores even intrinsics. If it wants to optimize, it does.

I can't run to someone else for magic library solutions every time I need to write code; I need to be able to get things done in a reasonable amount of time. It is my organization's development process that should decide whether the solution I used should become part of some library. It usually means that efforts covering only some platforms and only some libraries are not as universally applicable to my platforms and my libraries as folks at language conferences think /s.

Disclaimer. I work in gamedev.
