
Posted by ingve 10/23/2024

Optimizers need a rethink (typesanitizer.com)
149 points | 167 comments | page 2
andyferris 10/27/2024|
If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right? I've even seen interpreters for C... so while I have 100% felt the pain in this article, I don't know where that leaves us? These are more like compilation assertions/strategies than language features. (Not that these shouldn't be written inline with the code, e.g. it can be nice to write unit tests in the same files as the code in some languages.)

In the case of SQL, I'd love access to a flavor where I do the joins on indices explicitly, the query is executed as written, and each join (or filter) can be annotated with a strategy (btree lookup, etc.). (The most difficult part about using indices by hand is correctly writing all the synchronous triggers on updates, not the queries, IMO.)

typesanitizer 10/28/2024||
> If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right?

Technically, yes. :)

But I think this should perhaps be treated as a bug in how we define/design languages, rather than as an immutable truth.

- We already have time-based versioning for languages.

- We also have "tiers" of support for different platforms in language implementations (e.g. rarer architectures might have Tier 2 or Tier 3 support, where the debugging tooling might not quite work).

One idea would be to introduce "tiers" into a language's definition. A smaller implementation could implement the language at Tier 1 (perhaps this would even be within reach for a university course project). An industrial-strength implementation could implement the language at Tier 3.

(Yes, this would also introduce more complications, such as making sure that the dynamic semantics at different tiers are equivalent. At that point, it becomes a matter of tradeoffs -- does introducing tiers help reduce complexity overall?)

Archit3ch 10/28/2024||
> If you have any language where it is "semantically correct" to execute it with a simple interpreter, then all optimizations in that language are not semantically important by definition, right?

Not when "correct" needs optimizations to meet real-time guarantees. It's hard to argue that a program which doesn't run is "correct".

pornel 10/27/2024||
Optimizations in compilers like LLVM are done by many individual code transformation passes, one applied to the result of the previous.

This layering makes the order of the passes important and very sensitive. The passes usually don't have a grand plan, they just keep shuffling code around in different ways. A pass may only be applicable to code in a specific form created by a previous simplification pass. One pass may undo optimizations of a previous pass, or optimize-out a detail required by a later pass.

Separation into passes makes it easier to reason about correctness of each transformation in isolation, but the combined result is kinda slow and complicated.
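A minimal sketch of one such ordering dependency (hypothetical C++ example): constant folding cannot simplify a call it can't see into, so inlining has to run first.

    // Phase ordering in miniature: run constant folding alone and f()
    // stays an opaque call; run inlining first and the same folding
    // pass reduces the whole body to "return 9;".
    static int square(int x) { return x * x; }

    int f() {
        return square(3);  // inline -> 3 * 3 -> constant-fold -> 9
    }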

jonstewart 10/27/2024||
At least databases have Explain. I'd love to get feedback from clang or gcc about why particular optimizations were not applied.
einpoklum 10/27/2024||
Explain doesn't give you that information in many (most?) DBMSes. It's a bit like seeing the compiler IR code of your program. It lets you understand some things, while others remain a mystery.
typesanitizer 10/28/2024||
I'm guessing you've tried these flags mentioned in the blog post but haven't had luck with them?

> LLVM supports an interesting feature called Optimization Remarks – these remarks track whether an optimization was performed or missed. Clang support recording remarks using -fsave-optimization-record and Rustc supports -Zremark-dir=<blah>. There are also some tools (opt-viewer.py, optview2) to help view and understand the output.
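For anyone who wants to try it, a minimal setup (the function is a made-up example; the flags and opt-viewer.py are as documented by LLVM):

    // example.cpp -- build with:
    //   clang++ -O2 -fsave-optimization-record -c example.cpp
    // This writes example.opt.yaml next to the object file; opt-viewer.py
    // (in llvm/tools/opt-viewer) renders the remarks over the source,
    // including "missed" remarks explaining why a transform didn't fire.
    void scale(float* dst, const float* src, int n) {
        for (int i = 0; i < n; ++i)
            dst[i] = src[i] * 2.0f;  // expect a loop-vectorize remark here
    }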

kragen 10/28/2024||
Daniel Bernstein has given an interesting talk about "the death of optimizing compilers" (PDF slides at https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf) claiming that less and less of our code is in that middle ground where it's performance-critical enough to risk running it through an optimizer but not so performance-critical that it needs to be rewritten in assembly.

If we think of the high-level source code (whether in SQL, JS, or Python) as being a functional spec that the low-level implementation should provably follow, maybe we should check them both in, and at build time, run a proof assistant to verify that the implementation still complies with the spec? Rather than running an optimizing compiler every time. I think this is what nanolith is suggesting in https://news.ycombinator.com/item?id=41965701.

Maybe you run the optimizing compiler once when you first write the code, and again when the proof assistant tells you an optimization is no longer valid, but you check the output before you check it in. Or maybe you have the compiler running all the time on a GPU cluster looking for new optimizations.

082349872349872 10/28/2024|
> maybe you have the compiler running all the time ... looking for new optimizations

Anyone know if there was ever a retrospective on https://github.com/webyrd/Barliman ?

tmtvl 10/28/2024|||
Will is on HN*, though I don't know whether he'll see this topic.

* https://news.ycombinator.com/user?id=will_byrd

gradschoolfail 10/29/2024||||
Sufficiently “advanced”* program synthesis is indistinguishable from compiler optimization?

*(= “correct”?)

[sufficiently constrained madness is indistinguishable from ikigai?]

082349872349872 10/31/2024||
I haven't reached 70, so maybe we should see if 画狂老人, who did (although not 110) ever mentioned ikigai?
gradschoolfail 11/2/2024||
Pardon my ongoing lack of attention— pray tell what number base?
082349872349872 11/2/2024||
"in qua talibus Indorum fruimur bis quinque figuris" ("in which we make use of twice five such figures of the Indians"): https://pubmed.ncbi.nlm.nih.gov/1363084/

(our own convention; the fox has Hox, and the lion too, but they'd probably count in octal. For those keeping score, compare The House of Asterion: "The original says fourteen, but there is ample reason to infer that, as used by Asterion, this numeral stands for infinite")

Lagniappe: https://www.youtube.com/watch?v=TiXINuf5nbI#t=120s

EDIT: while we're at it, maybe I should specify units as orbits of Sol III?

kragen 10/28/2024|||
I haven't seen one. He's apparently morphing into a YouTuber, though, so maybe we should grep his .vtts.
skybrian 10/27/2024||
> The optimizer’s behavior is in some contexts load-bearing and has little margin for error (e.g. missing an optimization).

Well, sure, sometimes, to an extent. But if it's load-bearing, maybe that's a bug? You might have written non-portable code that won't last, because it depends on an implementation detail that isn't standardized.

There are plenty of applications where performance isn't critical. For example, any time you make a network request. Web pages shouldn't break because the network is slow today, if you can possibly avoid it.

The web provides no performance guarantees, but it tries pretty hard to provide compatibility guarantees. Your code should, usually, work on new browser versions and on devices that haven't been released yet. New browsers will have different JIT compilers with different performance cliffs. And yet, there are websites written many years ago that still work.

When standardizing things, we need to be precise about what's standardized and what isn't, or protocols "rust shut" and can't be changed without breaking lots of stuff. Not standardizing performance is often a win. (With some exceptions like video game consoles where the hardware is standardized.)

Hyrum's law suggests that all compiler optimizations will eventually be load-bearing for someone, but we should usually try to avoid it. To make sure that the code is robust, perhaps performance should vary. Maybe it would be useful to have something like a chaos monkey for programs, where optimizations vary based on a random seed?

bsder 10/27/2024|
> But if it's load-bearing, maybe that's a bug?

If your "debug optimization" code is so slow as to be unusable (see: Rust), then your optimizations qualify as load-bearing.

The problem is that "optimization level" needs a mindset change. The optimization levels should be "release", "small" and "experimental".

"Release level" needs to be perfectly usable for debugging as well as in production--"debug level" should be "release level". Compilation time should be reasonable and run time should be functional.

After that, for embedded, you should have "small level"--checks should get turned off and any optimizations that make code significantly bigger should get turned off (loop unrolling, for example). You might enable some optimizations that make compile time brutally slow.

Finally, there should be an "experimental level" which tests out optimizations before they go into release.

And there should be no "fast" optimization level. If your optimization is that situation specific, it should stay stuck in "experimental".

And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on. This is actually where compilers really break down nowadays. The compiler needs to emit enough information and context that a debugger can unwind what is going on, rather than being an afterthought.

We need the equivalent of an LSP (Language Server Protocol) for debuggers.

dzaima 10/28/2024|||
DWARF does contain a Turing-complete VM for unwinding, so theoretically it should already be possible to make a compiler that gives precise debug info everywhere, no new protocol necessary.

But requiring optimizations to be completely reversible anywhere rules out a bunch of things, including, but not limited to: dead code elimination (more generally, forgetting some fraction of state from somewhere), identical code merging, and instruction reordering (esp. vectorization, which essentially reorders across multiple loop iterations). That severely limits what your optimizer can even do.
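The dead-code case in miniature (a made-up function; exact debugger output varies):

    // The compiler may delete computations whose results are never used.
    // After that, asking a debugger to "print unused_scratch" typically
    // yields <optimized out>: the value never exists at runtime, and no
    // debug-info scheme can recover a value the program never computes.
    int g(int x) {
        int unused_scratch = x * x + 42;  // no uses: deleted at -O2
        return x + 1;
    }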

astrange 10/28/2024||||
> And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on.

You can't do this because some optimizations can't be seen through; the most obvious one is identical code merging. If you're in a merged function then you don't know which of the original source lines you're looking at.
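A sketch of how that happens (hypothetical types; --icf=all is lld's flag, /OPT:ICF is the MSVC linker's):

    // Two unrelated getters that compile to byte-identical machine code:
    // load the first int member, return it. An ICF-capable linker
    // (e.g. lld with --icf=all, or MSVC link with /OPT:ICF) may fold
    // them to a single address, so a return address inside the shared
    // body symbolizes as either function.
    struct Point { int x, y; int getX() const { return x; } };
    struct Size  { int w, h; int getW() const { return w; } };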

I'd like to see a system which only had optimized compilation and used an interpreter for debugging.

int_19h 10/29/2024|||
You can do that by having several different trampolines for the merged code that stash away a unique cookie on the stack to distinguish which function the code is currently executing. This is not zero-overhead, of course, but one can argue that the slight overhead is worth it given better error reporting and increased debuggability.
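A minimal sketch of the trampoline idea (hypothetical names; a thread-local stands in for the stack slot described above):

    static thread_local int g_which;  // the "cookie" a debugger can read

    // One shared body for two getters whose code was merged.
    static int merged_getter_body(const int* field) { return *field; }

    // Cheap per-function trampolines preserve the caller's identity.
    int get_width(const int* f)  { g_which = 1; return merged_getter_body(f); }
    int get_height(const int* f) { g_which = 2; return merged_getter_body(f); }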
bsder 10/29/2024||||
> If you're in a merged function then you don't know which of the original source lines you're looking at.

Then don't do that optimization.

How much is gained versus the grief that is caused?

Even when I did a lot of embedded device coding, I don't think that optimization ever saved me more than a couple dozen bytes of code.

And, as you point out, the optimization is stupidly difficult for debuggers to unwind.

This is one of the disconnects of modern optimization. There is no good backpressure from downstream tools to say "Oy. That optimization is going to make our lives difficult. If you can't demonstrate a significant improvement, don't do it."

astrange 10/29/2024||
Indeed, you can just not do it for important things - usually the error paths leading to a crash or exit.

It's important for embedded software though. But that was a bad example; I should've said tail calls, which are much more common and more important, because skipping the optimization blows out the stack for recursive code.
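The classic shape of the problem (a made-up example):

    #include <cstdint>

    // With tail-call optimization the recursive call becomes a jump and
    // this runs in constant stack space; without it, each call pushes a
    // frame and a large n overflows the stack. The flip side for this
    // thread: once the frames are gone, so is the backtrace through them.
    uint64_t sum_to(uint64_t n, uint64_t acc) {
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);  // tail position: nothing follows
    }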

FridgeSeal 10/28/2024||||
How would this model handle code that needs to be as fast as possible, at the cost of debugging? E.g. all hot-loop code, all high performance numerical code, all latency sensitive and instruction sensitive code?

> If your optimization is that situation specific, it should stay stuck in "experimental".

Yeah, I have a feeling there are low-key lots of applications for which many of the compiler optimisations you're branding "experimental" are in fact run-of-the-mill; they'd enable "experimental" mode so frequently it'd just be synonymous with "actual release mode".

> And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on.

This is a pretty cool idea. Semi-related: you should check out some of the work around e-graphs, which attempt to enable the same level of optimisation while preserving more of the original context. It'd be neat to have the equivalent of source maps for highly optimised code, but I'm not sure how we'd manage them without them becoming enormous. After things like inlining, constant folding, desugaring, rearrangement operations, etc. all take place, I suspect you'd be left with code that bears only the merest passing resemblance to its original form.

skybrian 10/28/2024|||
For the web, there are debuggers, but no apparent "debug mode."
froh 10/28/2024||
in the "closing thoughts" [1] of the OP article there's a pointer to "Proebstrings law" [2] which got me to a benchmark [3] how LLVM has improved over 10 years from 2.7 to 11

which leaves me with the strong intuition that indeed what mit benefit more from a rethink is not optimizers but focus on programmer productivity

> Perhaps this means Programming Language Research should be concentrating on something other than optimizations. Perhaps programmer productivity is a more fruitful arena.

[1] https://typesanitizer.com/blog/rethink-optimizers.html#closi...

[2] https://proebsting.cs.arizona.edu/law.html

[3] https://zeux.io/2022/01/08/on-proebstings-law/

russellsprouts 10/28/2024||
This reminds me of Halide (https://halide-lang.org/), which is a really interesting design in this direction.

In Halide, you specify the algorithm at a high level, then specify the "schedule" (execution order, vectorization, storage) separately. Then, it verifies that the two match. For tight loops like image kernels, it's often counter-intuitive which execution plan will give the best performance, so being able to experiment with the guarantee that you are still implementing the algorithm correctly is valuable.
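A taste of what that looks like, adapted from Halide's well-known blur example (details from memory of their published tutorials, so treat as a sketch):

    #include "Halide.h"
    using namespace Halide;

    Func blur_3x3(Func input) {
        Var x("x"), y("y"), xi("xi"), yi("yi");
        Func blur_x("blur_x"), blur_y("blur_y");

        // The algorithm: what to compute (a separable 3x3 box blur).
        blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

        // The schedule: how to compute it. Change this freely; Halide
        // guarantees the result still matches the algorithm above.
        blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        blur_x.compute_at(blur_y, x).vectorize(x, 8);
        return blur_y;
    }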

For a database, it would be as if you could send a high level SQL query and an execution plan together.

Halide is very specific, but I could imagine a general-purpose language with a Python-like high level representation for code behavior, with the option to specify a lower-level implementation with a lot of performance annotations.

mcfig 10/27/2024||
Great article. One real case I encountered that I find thought-provoking: a bunch of distinct test failures were lumped into the same bucket because link-time code generation had noticed that several C++ getter functions had the same output code and combined them all. Stack traces became confusing because the address-to-symbol mapping was more complicated than the logic we had in place was prepared for.

i.e. optimization had violated a rule we were implicitly relying on (that each non-inlined function should start at a distinct address, so that address-to-symbol mapping can be done easily). But that's not an explicit guarantee, and optimizers don't seem to think about it much. (Inlining seems to have had some thought put into it, and it still sucks, but anyway this case doesn't fit the inlining pattern.)

I find it hard to say anyone is dead wrong in this case… but I would turn off that LTCG optimization any time I could, except where proven necessary.

taeric 10/28/2024||
I think this is misplacing the need for a rethink. Compilers are amazing at optimizing small programs. What they are less amazing at is optimizing large programs.

The query planner for SQL databases is a good example of this. For small queries, the odds of the planner making a bad choice are rather low. Large queries, on the other hand, are comparatively easy to lead down a bad path. This is why the "no sql" family of databases see more predictable results for a lot of uses. If you can't do complicated queries, then you don't have that peril to worry about.

Worth trying if you have programs to play with: run them with and without -O3 and see how the compiler does for you.

sealeck 10/28/2024|
> This is why the "no sql" family of databases see more predictable results for a lot of uses. If you can't do complicated queries, then you don't have that peril to worry about.

Can't you also just, well, not use complicated SQL queries?

taeric 10/28/2024||
Certainly, and that is my point? Throwing out query planners feels like the wrong thing to consider. Instead, focus on keeping the things being planned smaller.
account42 10/28/2024|
I'm not sure if this is the right way to look at it? Do you really care if some optimization was performed or not or do you only care about the performance characteristics of the resulting program? Maybe a more sensible solution would be to have better tests for performance regressions.

Still not an easy task (performance is often highly dependent on the input, and regressions can hide in very specific cases), but it makes more sense to me than writing high-level code and then worrying about the details of how the compiler translated that high-level code.
