Posted by ingve 10/23/2024
In the case of SQL, I'd love access to a flavor where I do the joins on indices explicitly, the query is executed as written, and each join (or filter) can be annotated with a strategy (B-tree lookup, etc.). (The most difficult part about using indices by hand is correctly writing all the synchronous triggers on updates, not the queries, IMO.)
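Just to make the idea concrete, here's roughly what "doing the join on an index by hand" looks like, sketched in C++ rather than any real SQL dialect (the table, index, and names are all made up for illustration). The lookup part is easy; the annoying part is that every write path has to play the role of the synchronous trigger that keeps the index in sync:

    #include <cstddef>
    #include <map>
    #include <vector>

    // Hypothetical example: an orders "table" with a hand-maintained index.
    struct Order { int id; int customer_id; double total; };

    std::vector<Order> orders;                      // the table
    std::multimap<int, std::size_t> by_customer;    // the index (an ordered tree)

    // The "synchronous trigger": every insert must also update the index.
    void insert_order(const Order &o) {
        orders.push_back(o);
        by_customer.emplace(o.customer_id, orders.size() - 1);
    }

    // The join, written explicitly as one tree lookup per outer key.
    double total_for_customer(int customer_id) {
        double sum = 0.0;
        auto range = by_customer.equal_range(customer_id);
        for (auto it = range.first; it != range.second; ++it)
            sum += orders[it->second].total;
        return sum;
    }

(std::multimap is an ordered tree rather than a real B-tree, but the shape of the problem is the same.)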
Technically, yes. :)
But I think this should perhaps be treated as a bug in how we define/design languages, rather than as an immutable truth.
- We already have time-based versioning for languages.
- We also have "tiers" of support for different platforms in language implementations (e.g. rarer architectures might have Tier 2 or Tier 3 support where the debugging tooling might not quite work).
One idea would be to introduce "tiers" into a language's definition. A smaller implementation could implement the language at Tier 1 (perhaps this would even be within reach for a university course project). An industrial-strength implementation could implement the language at Tier 3.
(Yes, this would also introduce more complications, such as making sure that the dynamic semantics at different tiers are equivalent. At that point, it becomes a matter of tradeoffs -- does introducing tiers help reduce complexity overall?)
Not when "correct" needs optimizations to meet real-time guarantees. It's hard to argue that a program which doesn't run is "correct".
This layering makes the order of the passes important and very sensitive. The passes usually don't have a grand plan; they just keep shuffling code around in different ways. A pass may only be applicable to code in a specific form created by a previous simplification pass. One pass may undo the optimizations of a previous pass, or optimize out a detail required by a later pass.
Separation into passes makes it easier to reason about correctness of each transformation in isolation, but the combined result is kinda slow and complicated.
> LLVM supports an interesting feature called Optimization Remarks – these remarks track whether an optimization was performed or missed. Clang supports recording remarks using -fsave-optimization-record and Rustc supports -Zremark-dir=<blah>. There are also some tools (opt-viewer.py, optview2) to help view and understand the output.
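If you want to see what those remarks look like, here's a small file to try them on (exact flag spellings and output file names may vary by Clang version):

    // Compile with something like:
    //   clang++ -O3 -c -fsave-optimization-record -Rpass-missed=loop-vectorize sum.cpp
    // Clang writes a sum.opt.yaml remarks file (and prints remarks on stderr).
    // Without -ffast-math, the float reduction below is usually *not* vectorized,
    // because the additions can't be reordered, and the missed remark says so.
    // opt-viewer.py / optview2 render the YAML next to the source.
    #include <cstddef>

    float sum(const float *a, std::size_t n) {
        float s = 0.0f;
        for (std::size_t i = 0; i < n; ++i)
            s += a[i];   // floating-point reduction: a common "missed" remark
        return s;
    }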
If we think of the high-level source code (whether in SQL, JS, or Python) as being a functional spec that the low-level implementation should provably follow, maybe we should check them both in and, at build time, run a proof assistant to verify that the implementation still complies with the spec, rather than running an optimizing compiler every time? I think this is what nanolith is suggesting in https://news.ycombinator.com/item?id=41965701.
Maybe you run the optimizing compiler once when you first write the code, and again when the proof assistant tells you an optimization is no longer valid, but you check the output before you check it in. Or maybe you have the compiler running all the time on a GPU cluster looking for new optimizations.
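As a toy stand-in for that workflow (nowhere near a real proof assistant, and the popcount functions are just invented examples): check in both the readable "spec" and the hand-optimized version, and have the build refuse to proceed if they ever diverge, here via an exhaustive constexpr check over a small input domain.

    #include <cstdint>

    // The "spec": the obvious, readable definition.
    constexpr int popcount_spec(std::uint8_t x) {
        int n = 0;
        for (; x; x >>= 1) n += x & 1;
        return n;
    }

    // The "implementation": a hand-optimized bit-twiddling version.
    constexpr int popcount_fast(std::uint8_t v) {
        int x = v;
        x = x - ((x >> 1) & 0x55);
        x = (x & 0x33) + ((x >> 2) & 0x33);
        return (x + (x >> 4)) & 0x0f;
    }

    // Build-time verification: compilation fails if they ever disagree.
    constexpr bool matches_spec() {
        for (int i = 0; i <= 0xff; ++i)
            if (popcount_spec(static_cast<std::uint8_t>(i)) !=
                popcount_fast(static_cast<std::uint8_t>(i)))
                return false;
        return true;
    }
    static_assert(matches_spec(), "optimized popcount diverged from the spec");

A real proof assistant would cover all inputs by deduction rather than enumeration, but the build-time contract has the same shape.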
Anyone know if there was ever a retrospective on https://github.com/webyrd/Barliman ?
*(= “correct”?)
[sufficiently constrained madness is indistinguishable from ikigai?]
(our own convention; the fox has Hox, and the lion too, but they'd probably count in octal. For those keeping score, compare The House of Asterion: "The original says fourteen, but there is ample reason to infer that, as used by Asterion, this numeral stands for infinite")
Lagniappe: https://www.youtube.com/watch?v=TiXINuf5nbI#t=120s
EDIT: while we're at it, maybe I should specify units as orbits of Sol III?
Well, sure, sometimes, to an extent. But if it's load-bearing, maybe that's a bug? You might have written non-portable code that won't last, because it depends on an implementation detail that isn't standardized.
There are plenty of common situations where performance isn't critical. For example, any time you do a network request. Web pages shouldn't break because the network is slow today, if you can possibly avoid it.
The web provides no performance guarantees, but it tries pretty hard to provide compatibility guarantees. Your code should, usually, work on new browser versions and on devices that haven't been released yet. New browsers will have different JIT compilers with different performance cliffs. And yet, there are websites written many years ago that still work.
When standardizing things, we need to be precise about what's standardized and what isn't, or protocols "rust shut" and can't be changed without breaking lots of stuff. Not standardizing performance is often a win. (With some exceptions like video game consoles where the hardware is standardized.)
Hyrum's law suggests that all compiler optimizations will eventually be load-bearing for someone, but we should usually try to avoid relying on them. To make sure the code is robust, perhaps performance should vary. Maybe it would be useful to have something like a chaos monkey for programs, where optimizations vary based on a random seed?
If your unoptimized "debug" build is so slow as to be unusable (see: Rust), then your optimizations qualify as load-bearing.
The problem is that "optimization level" needs a mindset change. The optimization levels should be "release", "small" and "experimental".
"Release level" needs to be perfectly usable for debugging as well as in production--"debug level" should be "release level". Compilation time should be reasonable and run time should be functional.
After that, for embedded, you should have "small level"--checks should get turned off and any optimizations that make code significantly bigger should get turned off (loop unrolling, for example). You might enable some optimizations that make compile time brutally slow.
Finally, there should be an "experimental level" which tests out optimizations before they go into release.
And there should be no "fast" optimization level. If your optimization is that situation specific, it should stay stuck in "experimental".
And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on. This is actually where compilers really break down nowadays. The compiler needs to emit enough information and context that a debugger can reconstruct what is going on, rather than being an afterthought.
We need the equivalent of an LSP (Language Server Protocol) for debuggers.
But requiring optimizations to be completely reversible anywhere is rather limiting. It disallows a bunch of things, including, but not limited to, dead code elimination (more generally, forgetting some fraction of state from somewhere), identical code merging, and instruction reordering (especially vectorization, which essentially reorders across multiple loop iterations), which severely limits what your optimizer can even do.
You can't do this because some optimizations can't be seen through; the most obvious one is identical code merging. If you're in a merged function, then you don't know which of the original source lines you're looking at.
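To make that concrete, here's a hedged sketch (the function names are invented, and the exact flags depend on your toolchain). The two functions below are distinct in the source but byte-identical after codegen, so a link with identical code folding enabled, e.g. clang++ -O2 -ffunction-sections -fuse-ld=lld -Wl,--icf=all, may keep a single copy; a crash inside the shared body can then symbolize as either function, no matter which one the caller actually used.

    #include <string>

    // Semantically different to the programmer, byte-for-byte identical to
    // the linker. After folding there is only one body, so a debugger stopped
    // inside it can't tell you which of these "source functions" you're in.
    bool validate_billing_address(const std::string &s) {
        return !s.empty() && s.size() < 256;
    }

    bool validate_shipping_address(const std::string &s) {
        return !s.empty() && s.size() < 256;
    }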
I'd like to see a system which only had optimized compilation and used an interpreter for debugging.
Then don't do that optimization.
How much is gained versus the grief that is caused?
Even when I did a lot of embedded device coding, I don't think that optimization ever saved me more than a couple dozen bytes of code.
And, as you point out, the optimization is stupidly difficult for debuggers to unwind.
This is one of the disconnects of modern optimization. There is no good backpressure from downstream tools to say "Oy. That optimization is going to make our lives difficult. If you can't demonstrate a significant improvement, don't do it."
It's important for embedded software, though. But that was a bad example; I should've said tail calls, which are much more common and more important, because not optimizing them blows out the stack for recursive code.
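A minimal illustration of why that one ends up load-bearing (the depth is picked arbitrarily): at -O0 the recursion below needs millions of stack frames and typically crashes, while at -O2 most compilers turn the tail call into a loop. Neither C nor C++ guarantees this; Clang's musttail attribute is one way to insist on it.

    #include <cstdint>

    // Tail-recursive sum: nothing happens after the recursive call, so the
    // compiler *may* reuse the current stack frame -- but only as an optimization.
    std::uint64_t sum_to(std::uint64_t n, std::uint64_t acc) {
        if (n == 0) return acc;
        return sum_to(n - 1, acc + n);   // tail call
    }

    int main() {
        // Deep enough that an unoptimized build usually overflows the stack.
        return sum_to(10000000, 0) == 50000005000000ULL ? 0 : 1;
    }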
> If your optimization is that situation specific, it should stay stuck in "experimental".
Yeah, I have a feeling that there are low-key lots of applications for which many of the compiler optimisations you're branding "experimental" are in fact run-of-the-mill; they would enable "experimental" mode so frequently it'd just be synonymous with "actually release mode".
> And through all of this, the compiler should also emit files that carry enough information to allow debuggers to make sense of the compiled code, unwind the optimizations when the user asks, and present a coherent version of what is going on.
This is a pretty cool idea. Semi-related: you should check out some of the work around e-graphs, which attempt to enable the same level of optimisations while preserving more of the original context. It'd be neat to have the equivalent of source maps for highly optimised code, but I'm not sure how we'd manage them without them becoming enormous. After things like inlining, constant folding, desugaring, rearrangement operations, etc. all take place, I suspect you'd be left with code that bears only the merest passing resemblance to its original form.
Which leaves me with the strong intuition that indeed what might benefit more from a rethink is not optimizers but a focus on programmer productivity.
> Perhaps this means Programming Language Research should be concentrating on something other than optimizations. Perhaps programmer productivity is a more fruitful arena.
[1] https://typesanitizer.com/blog/rethink-optimizers.html#closi...
In Halide, you specify the algorithm at a high level, then specify the "schedule" (execution order, vectorization, storage) separately. Then, it verifies that the two match. For tight loops like image kernels, it's often counter-intuitive which execution plan will give the best performance, so being able to experiment with the guarantee that you are still implementing the algorithm correctly is valuable.
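Roughly, that split looks like this, based on the well-known Halide blur example (details may not match the current API exactly):

    #include "Halide.h"
    using namespace Halide;

    Func blur_3x3(Func input) {
        Func blur_x("blur_x"), blur_y("blur_y");
        Var x("x"), y("y"), xi("xi"), yi("yi");

        // The algorithm: *what* is computed, with no execution order or storage.
        blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

        // The schedule: *how* it runs (tiling, vectorization, parallelism).
        // Changing these lines can change performance, but not the result.
        blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        blur_x.compute_at(blur_y, x).vectorize(x, 8);

        return blur_y;
    }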
For a database, it would be as if you could send a high level SQL query and an execution plan together.
Halide is very specific, but I could imagine a general-purpose language with a Python-like high level representation for code behavior, with the option to specify a lower-level implementation with a lot of performance annotations.
I.e., the optimization had violated a rule we were implicitly relying on (that each non-inlined function should start at a distinct address, so that address-to-symbol mapping could be done easily). But that's not an explicit guarantee, and optimizers don't seem to think about it much. (Inlining seems to have had some thought put into it, and it still sucks, but anyway this case doesn't fit the pattern of inlining.)
I find it hard to say anyone is dead wrong in this case… but I would turn off that LTCG optimization any time I could, except where proven necessary.
The query planner for SQL databases is a good example of this. For small queries, the odds of the planner making a bad choice are rather low. Large queries, on the other hand, are comparatively easy to lead down a bad path. This is why the "NoSQL" family of databases sees more predictable results for a lot of uses: if you can't write complicated queries, then you don't have that peril to worry about.
Worth trying if you have the programs to play with. Run it with and without -O3 and see how the compiler does for you.
Can't you also just, well, not use complicated SQL queries?
Still not an easy task (performance is often highly dependent on the input, and regressions can hide in very specific cases), but it makes more sense to me than writing high-level code and then worrying about the details of how the compiler translated that high-level code.