
Posted by ingve 10/23/2024

Optimizers need a rethink (typesanitizer.com)
149 points | 167 comments | page 2
andyferris 10/27/2024|
If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right? I've even seen interpreters for C... so while I have 100% felt the pain in this article, I don't know where that leaves us? These are more like compilation assertions/strategies than language features. (Not that these shouldn't be written inline with the code, e.g. it can be nice to write unit tests in the same files as the code in some languages.)

In the case of SQL, I'd love access to a flavor where I do the joins on indices explicitly, the query is executed as written, and each join (or filter) can be annotated with a strategy (btree lookup, etc.). (The most difficult part about using indices by hand is correctly writing all the synchronous triggers on updates, not the queries, IMO.)

typesanitizer 10/28/2024||
> If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right?

Technically, yes. :)

But I think this should perhaps be treated as a bug in how we define/design languages, rather than as an immutable truth.

- We already have time-based versioning for languages.

- We also have "tiers" of support for different platforms in language implementations (e.g. rarer architectures might have Tier 2 or Tier 3 support, where the debugging tooling might not quite work).

One idea would be to introduce "tiers" into a language's definition. A smaller implementation could implement the language at Tier 1 (perhaps this would even be within reach for a university course project). An industrial-strength implementation could implement the language at Tier 3.

(Yes, this would also introduce more complications, such as making sure that the dynamic semantics at different tiers are equivalent. At that point, it becomes a matter of tradeoffs -- does introducing tiers help reduce complexity overall?)

Archit3ch 10/28/2024||
> If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right?

Not when "correct" needs optimizations to meet real-time guarantees. It's hard to argue that a program which doesn't run is "correct".

pornel 10/27/2024||
Optimizations in compilers like LLVM are done by many individual code transformation passes, one applied to the result of the previous.

This layering makes the order of the passes important and very sensitive. The passes usually don't have a grand plan, they just keep shuffling code around in different ways. A pass may only be applicable to code in a specific form created by a previous simplification pass. One pass may undo optimizations of a previous pass, or optimize-out a detail required by a later pass.

Separation into passes makes it easier to reason about correctness of each transformation in isolation, but the combined result is kinda slow and complicated.
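A minimal sketch of one such ordering dependency (hypothetical C++ example): constant folding cannot simplify a call it can't see into, so inlining has to run first.

    // Phase ordering in miniature: run constant folding alone and f()
    // stays an opaque call; run inlining first and the same folding
    // pass reduces the whole body to "return 9;".
    static int square(int x) { return x * x; }

    int f() {
        return square(3);  // inline -> 3 * 3 -> constant-fold -> 9
    }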

jonstewart 10/27/2024||
At least databases have Explain. I'd love to get feedback from clang or gcc about why particular optimizations were not applied.
einpoklum 10/27/2024||
Explain doesn't give you that information in many (most?) DBMSes. It's a bit like seeing the compiler IR code of your program. It lets you understand some things, while others remain a mystery.
typesanitizer 10/28/2024||
I'm guessing you've tried these flags mentioned in the blog post but haven't had luck with them?

> LLVM supports an interesting feature called Optimization Remarks – these remarks track whether an optimization was performed or missed. Clang support recording remarks using -fsave-optimization-record and Rustc supports -Zremark-dir=<blah>. There are also some tools (opt-viewer.py, optview2) to help view and understand the output.
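For anyone who wants to try it, a minimal setup (the function is a made-up example; the flags and opt-viewer.py are as documented by LLVM):

    // example.cpp -- build with:
    //   clang++ -O2 -fsave-optimization-record -c example.cpp
    // This writes example.opt.yaml next to the object file; opt-viewer.py
    // (in llvm/tools/opt-viewer) renders the remarks over the source,
    // including "missed" remarks explaining why a transform didn't fire.
    void scale(float* dst, const float* src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * 2.0f;  // expect a loop-vectorize remark here
    }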

kragen 10/28/2024||
Daniel Bernstein has given an interesting talk about "the death of optimizing compilers" (PDF slides at https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf) claiming that less and less of our code is in that middle ground where it's performance-critical enough to risk running it through an optimizer but not so performance-critical that it needs to be rewritten in assembly.

If we think of the high-level source code (whether in SQL, JS, or Python) as being a functional spec that the low-level implementation should provably follow, maybe we should check them both in, and at build time, run a proof assistant to verify that the implementation still complies with the spec? Rather than running an optimizing compiler every time. I think this is what nanolith is suggesting in https://news.ycombinator.com/item?id=41965701.

Maybe you run the optimizing compiler once when you first write the code, and again when the proof assistant tells you an optimization is no longer valid, but you check the output before you check it in. Or maybe you have the compiler running all the time on a GPU cluster looking for new optimizations.

082349872349872 10/28/2024|
> maybe you have the compiler running all the time ... looking for new optimizations

Anyone know if there was ever a retrospective on https://github.com/webyrd/Barliman ?

tmtvl 10/28/2024|||
Will is on HN*, though I don't know whether he'll see this topic.

* https://news.ycombinator.com/user?id=will_byrd

gradschoolfail 10/29/2024||||
Sufficiently “advanced”* program synthesis is indistinguishable from compiler optimization?

*(= “correct”?)

[sufficiently constrained madness is indistinguishable from ikigai?]

082349872349872 10/31/2024||
I haven't reached 70, so maybe we should see if 画狂老人, who did (although not 110) ever mentioned ikigai?
gradschoolfail 11/2/2024||
Pardon my ongoing lack of attention— pray tell what number base?
082349872349872 11/2/2024||
"in qua talibus Indorum fruimur bis quinque figuris" ("in which we make use of twice five such figures of the Indians"): https://pubmed.ncbi.nlm.nih.gov/1363084/

(our own convention; the fox has Hox, and the lion too, but they'd probably count in octal. For those keeping score, compare The House of Asterion: "The original says fourteen, but there is ample reason to infer that, as used by Asterion, this numeral stands for infinite")

Lagniappe: https://www.youtube.com/watch?v=TiXINuf5nbI#t=120s

EDIT: while we're at it, maybe I should specify units as orbits of Sol III?

kragen 10/28/2024|||
I haven't seen one. He's apparently morphing into a YouTuber, though, so maybe we should grep his .vtts.
skybrian 10/27/2024||
> The optimizer’s behavior is in some contexts load-bearing and has little margin for error (e.g. missing an optimization).

Well, sure, sometimes, to an extent. But if it's load-bearing, maybe that's a bug? You might have written non-portable code that won't last, because it depends on an implementation detail that isn't standardized.

There are plenty of applications where performance isn't critical. For example, any time you make a network request. Web pages shouldn't break because the network is slow today, if you can possibly avoid it.

The web provides no performance guarantees, but it tries pretty hard to provide compatibility guarantees. Your code should, usually, work on new browser versions and on devices that haven't been released yet. New browsers will have different JIT compilers with different performance cliffs. And yet, there are websites written many years ago that still work.

When standardizing things, we need to be precise about what's standardized and what isn't, or protocols "rust shut" and can't be changed without breaking lots of stuff. Not standardizing performance is often a win. (With some exceptions like video game consoles where the hardware is standardized.)

Hyrum's law suggests that all compiler optimizations will eventually be load-bearing for someone, but we should usually try to avoid it. To make sure that the code is robust, perhaps performance should vary. Maybe it would be useful to have something like a chaos monkey for programs, where optimizations vary based on a random seed?

bsder 10/27/2024|
> But if it's load-bearing, maybe that's a bug?

If your "debug optimization" code is so slow as to be unusable (see: Rust), then your optimizations qualify as load-bearing.

The problem is that "optimization level" needs a mindset change. The optimization levels should be "release", "small" and "experimental".

"Release level" needs to be perfectly usable for debugging as well as in production--"debug level" should be "release level". Compilation time should be reasonable and run time should be functional.

After that, for embedded, you should have "small level"--checks should get turned off and any optimizations that make code significantly bigger should get turned off (loop unrolling, for example). You might enable some optimizations that make compile time brutally slow.

Finally, there should be an "experimental level" which tests out optimizations before they go into release.

And there should be no "fast" optimization level. If your optimization is that situation specific, it should stay stuck in "experimental".

And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on. This is actually where compilers really break down nowadays. The compiler needs to emit enough information and context that a debugger can unwind what is going on, rather than being an afterthought.

We need the equivalent of an LSP (Language Server Protocol) for debuggers.

dzaima 10/28/2024|||
DWARF does contain a Turing-complete VM for unwinding, so theoretically it should already be possible to make a compiler that gives precise debug info everywhere, no new protocol necessary.

But requiring optimizations to be completely reversible anywhere rules out a bunch of things, including, but not limited to: dead code elimination (more generally, forgetting some fraction of state from somewhere), identical code merging, and instruction reordering (esp. vectorization, which essentially reorders across multiple loop iterations). That severely limits what your optimizer can even do.
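The dead-code case in miniature (a made-up function; exact debugger output varies):

    // The compiler may delete computations whose results are never used.
    // After that, asking a debugger to "print unused_scratch" typically
    // yields <optimized out>: the value never exists at runtime, and no
    // debug-info scheme can recover a value the program never computes.
    int g(int x) {
        int unused_scratch = x * x + 42;  // no uses: deleted at -O2
        return x + 1;
    }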

astrange 10/28/2024||||
> And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on.

You can't do this because some optimizations can't be seen through; the most obvious one is identical code merging. If you're in a merged function then you don't know which of the original source lines you're looking at.
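A sketch of how that happens (hypothetical types; --icf=all is lld's flag, /OPT:ICF is the MSVC linker's):

    // Two unrelated getters that compile to byte-identical machine code:
    // load the first int member, return it. An ICF-capable linker
    // (e.g. lld with --icf=all, or MSVC link with /OPT:ICF) may fold
    // them to a single address, so a return address inside the shared
    // body symbolizes as either function.
    struct Point { int x, y; int getX() const { return x; } };
    struct Size  { int w, h; int getW() const { return w; } };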

I'd like to see a system which only had optimized compilation and used an interpreter for debugging.

int_19h 10/29/2024|||
You can do that by having several different trampolines for the merged code that stash away a unique cookie on the stack to distinguish which function the code is currently executing. This is not zero-overhead, of course, but one can argue that the slight overhead is worth it given better error reporting and increased debuggability.
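A minimal sketch of the trampoline idea (hypothetical names; a thread-local stands in for the stack slot described above):

    static thread_local int g_which;  // the "cookie" a debugger can read

    // One shared body for two getters whose code was merged.
    static int merged_getter_body(const int* field) { return *field; }

    // Cheap per-function trampolines preserve the caller's identity.
    int get_width(const int* f)  { g_which = 1; return merged_getter_body(f); }
    int get_height(const int* f) { g_which = 2; return merged_getter_body(f); }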
bsder 10/29/2024||||
> If you're in a merged function then you don't know which of the original source lines you're looking at.

Then don't do that optimization.

How much is gained versus the grief that is caused?

Even when I did a lot of embedded device coding, I don't think that optimization ever saved me more than a couple dozen bytes of code.

And, as you point out, the optimization is stupidly difficult for debuggers to unwind.

This is one of the disconnects of modern optimization. There is no good backpressure from downstream tools to say "Oy. That optimization is going to make our lives difficult. If you can't demonstrate a significant improvement, don't do it."

astrange 10/29/2024||
Indeed, you can just not do it for important things - usually the error paths leading to a crash or exit.

It's important for embedded software though. But that was a bad example; I should've said tail calls, which are much more common and more important, because skipping the optimization blows out the stack for recursive code.
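The classic shape of the problem (a made-up example):

    #include <cstdint>

    // With tail-call optimization the recursive call becomes a jump and
    // this runs in constant stack space; without it, each call pushes a
    // frame and a large n overflows the stack. The flip side for this
    // thread: once the frames are gone, so is the backtrace through them.
    uint64_t sum_to(uint64_t n, uint64_t acc) {
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);  // tail position: nothing follows
    }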

FridgeSeal 10/28/2024||||
How would this model handle code that needs to be as fast as possible, at the cost of debugging? E.g. all hot-loop code, all high performance numerical code, all latency sensitive and instruction sensitive code?

> If your optimization is that situation specific, it should stay stuck in "experimental".

Yeah, I have a feeling there are low-key lots of applications for which many of the compiler optimisations you're branding "experimental" are in fact run-of-the-mill; they'd enable "experimental" mode so frequently it'd just be synonymous with "actual release mode".

> And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on.

This is a pretty cool idea. Semi-related: you should check out some of the work around e-graphs, which attempt to enable the same level of optimisation while preserving more of the original context. It'd be neat to have the equivalent of source maps for highly optimised code, but I'm not sure how we'd manage them without them becoming enormous. After things like inlining, constant folding, desugaring, rearrangement operations, etc. all take place, I suspect you'd be left with code that bears only the merest passing resemblance to its original form.

skybrian 10/28/2024|||
For the web, there are debuggers, but no apparent "debug mode."
froh 10/28/2024||
in the "closing thoughts" [1] of the OP article there's a pointer to "Proebstrings law" [2] which got me to a benchmark [3] how LLVM has improved over 10 years from 2.7 to 11

which leaves me with the strong intuition that indeed what mit benefit more from a rethink is not optimizers but focus on programmer productivity

> Perhaps this means Programming Language Research should be concentrating on something other than optimizations. Perhaps programmer productivity is a more fruitful arena.

[1] https://typesanitizer.com/blog/rethink-optimizers.html#closi...

[2] https://proebsting.cs.arizona.edu/law.html

[3] https://zeux.io/2022/01/08/on-proebstings-law/

russellsprouts 10/28/2024||
This reminds me of Halide (https://halide-lang.org/), which is a really interesting design in this direction.

In Halide, you specify the algorithm at a high level, then specify the "schedule" (execution order, vectorization, storage) separately. Then, it verifies that the two match. For tight loops like image kernels, it's often counter-intuitive which execution plan will give the best performance, so being able to experiment with the guarantee that you are still implementing the algorithm correctly is valuable.
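A taste of what that looks like, adapted from Halide's well-known blur example (details from memory of their published tutorials, so treat as a sketch):

    #include "Halide.h"
    using namespace Halide;

    Func blur_3x3(Func input) {
        Var x("x"), y("y"), xi("xi"), yi("yi");
        Func blur_x("blur_x"), blur_y("blur_y");

        // The algorithm: what to compute (a separable 3x3 box blur).
        blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

        // The schedule: how to compute it. Change this freely; Halide
        // guarantees the result still matches the algorithm above.
        blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        blur_x.compute_at(blur_y, x).vectorize(x, 8);
        return blur_y;
    }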

For a database, it would be as if you could send a high level SQL query and an execution plan together.

Halide is very specific, but I could imagine a general-purpose language with a Python-like high level representation for code behavior, with the option to specify a lower-level implementation with a lot of performance annotations.

mcfig 10/27/2024||
Great article. One real case I encountered that I find thought-provoking: a bunch of distinct test failures were lumped into the same bucket because link-time code generation had noticed that several C++ getter functions had the same output code and combined them all. Stack traces became confusing because the address-to-symbol mapping was more complicated than the logic we had in place was prepared for.

i.e. optimization had violated a rule we were implicitly relying on (that each non-inlined function should start at a distinct address, so that address-to-symbol mapping can be done easily). But that's not an explicit guarantee, and optimizers don't seem to think about it much. (Inlining seems to have had some thought put into it, and it still sucks, but anyway this case doesn't fit the inlining pattern.)

I find it hard to say anyone is dead wrong in this case… but I would turn off that LTCG optimization any time I could, except where proven necessary.

taeric 10/28/2024||
I think this is misplacing the need for a rethink. Compilers are amazing at optimizing small programs. What they are less amazing at is optimizing large programs.

The query planner for SQL databases is a good example of this. For small queries, the odds of the planner making a bad choice are rather low. Large queries, on the other hand, are comparatively easy to lead down a bad path. This is why the "no sql" family of databases see more predictable results for a lot of uses. If you can't do complicated queries, then you don't have that peril to worry about.

Worth trying if you have programs to play with: run them with and without -O3 and see how the compiler does for you.

sealeck 10/28/2024|
> This is why the "no sql" family of databases see more predictable results for a lot of uses. If you can't do complicated queries, then you don't have that peril to worry about.

Can't you also just, well, not use complicated SQL queries?

taeric 10/28/2024||
Certainly, and that is my point? Throwing out query planners feels like the wrong thing to consider. Instead, focus on keeping the things being planned smaller.
account42 10/28/2024|
I'm not sure if this is the right way to look at it? Do you really care if some optimization was performed or not or do you only care about the performance characteristics of the resulting program? Maybe a more sensible solution would be to have better tests for performance regressions.

Still not an easy task (performance is often highly dependent on the input, and regressions can hide in very specific cases), but it makes more sense to me than writing high-level code and then worrying about the details of how the compiler translated that high-level code.
