Posted by HeliumHydride 12/3/2025
Wouldn't it be nice if we could transform back from the canonical form into the most readable code? In the example, that would convert all the functions into x + y.
https://www.embeddedrelated.com/thread/4749/when-and-how-to-...
It's super cool to see this in practice, and for me it helps build trust that the compiler does the right thing, rather than me trying to micro-optimize my code and peppering inline qualifiers everywhere.
In the OP's examples, instead of optimization, what I would prefer is a separate analysis tool that reports what optimizations are possible, together with a compiler that makes it easy to write both high-level and machine code as necessary. Then, instead of the compiler opaquely rewriting your code for you, it helps guide you into writing optimal code at the source level. This, for me, leads to a better equilibrium where you can express your intent at a high level and then, as needed, perform lower-level optimizations in a transparent and deterministic way.
For me, the big value of existing optimizing compilers is that I can use them to figure out which instructions might be optimal for my use case, and then directly write those instructions where the highest performance is needed. That way I don't need to subject myself to slow compilation times (a cost that compounds as the compiler reoptimizes the same function thousands of times over the course of development, repeated on every single compilation of the file), nor to the possibility that the optimizer breaks my code in an opaque way that I won't notice until something bad and inscrutable happens at runtime.
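A minimal sketch of that workflow, assuming GCC or Clang on x86-64 (add_pinned is a hypothetical name): once Compiler Explorer has shown that the hot function reduces to a single add, you can pin that instruction down by hand with extended inline asm.

unsigned add_pinned(unsigned x, unsigned y) {
    unsigned res = x;
    asm("addl %1, %0"   // the one instruction the optimizer was emitting anyway
        : "+r"(res)     // res is read and written in place
        : "r"(y));
    return res;
}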
#include <vector>

unsigned add(unsigned x, unsigned y) {
    std::vector vx{x};   // CTAD (C++17) deduces std::vector<unsigned>
    std::vector vy{y};
    auto res = vx[0] + vy[0];
    return res;
}
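For what it's worth, when the optimizer manages to elide the heap allocations (Clang at -O2 is often able to here), Compiler Explorer shows the whole function collapsing to something like:

add(unsigned int, unsigned int):
        lea     eax, [rdi + rsi]
        ret

i.e. the same single add as the plain x + y version.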
If you look at the GAS manual https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/a... almost every other architecture has architecture-specific syntax notes, in many cases for something as trivial as comments. If they couldn't even decide on a single symbol for comments, there is no hope for everything else.
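For instance, going by the per-target notes in the GAS manual, the line-comment character alone varies across targets:

# x86:    '#' starts a comment
@ ARM:    '@' starts a comment (32-bit ARM)
! SPARC:  '!' starts a comment
| m68k:   '|' starts a comment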
ARM isn't the only architecture where GAS uses the same syntax as the developers of the corresponding CPU architecture. It just doesn't do the same for x86, due to historical choices inherited from the Unix software ecosystem and thus AT&T. If you play around on Godbolt with compilers for different architectures, it seems like x86's use of AT&T syntax is the exception; there are a few others which use a similar syntax, but they're a minority.
Why not use the same syntax for all architectures? I don't really know all the historical reasoning, but I have a few guesses, and each arch probably has its own historic baggage. Being consistent with the manufacturer's docs and the rest of the ecosystem has obvious benefits for the people who need to read it. Assembly is architecture-specific by definition, so being consistent across different architectures has little value. GAS is consistent with GCC output. Did GCC add support for some architectures early, with the help of the manufacturer's assembler, and only later in GAS?

There are also a lot of custom syntax quirks which don't easily fit into the Intel/AT&T model and are related to the various addressing modes used by different architectures. For example, ARM has register post-increment/pre-increment and zero-cost shifts, but doesn't have sub-register access like x86 (RAX/EAX/AX/AH/AL), and non-word access is more or less limited to load/store instructions, unlike x86 where it can show up in more places (see the sketch below). You would need to invent quite a few extensions to AT&T syntax for it to cover all the non-x86 architectures, or you could just use the syntax made by the developer of the architecture.
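A sketch of those ARM-specific forms in GAS syntax (assuming 32-bit ARM), none of which have a natural spelling in the AT&T model:

ldr   r0, [r1], #4          @ post-increment: load from [r1], then r1 += 4
ldr   r0, [r1, #4]!         @ pre-increment: bump r1 by 4 first, then load
add   r2, r3, r4, lsl #2    @ shift folded into the add: r2 = r3 + (r4 << 2)
ldrb  r5, [r6]              @ byte access goes through a load, not a sub-register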
My question is more: why even try to use the same syntax for all architectures? I thought that was GAS's approach: that they took AT&T syntax, which historically was a unified syntax for several PDPs (and some other ISAs, I believe? VAX?), and made it fit every other ISA they supported. Except apparently no, they didn't: they adopted the vendors' syntaxes for other ISAs, but not for Intel's x86? Why? It just boggles my mind.
For amd64 there isn't really much of an excuse: Intel syntax was already supported in GAS.
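For reference, GAS accepts both spellings on x86 via a directive; a minimal sketch (x86-64):

    movl    $1, 8(%rsp)             # AT&T: source first, % sigils, disp(base)
    .intel_syntax noprefix
    mov     DWORD PTR [rsp+8], 1    # Intel: destination first, [base+disp]
    .att_syntax prefix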
The extent to which you can "fool the optimizer" is highly dependent on the language and the code you're talking about. Python is a great example of a language that is devilishly hard to optimize, precisely because of the language semantics. C and C++ are entirely different examples with entirely different optimization issues, which usually have to do with pointers and references and what the compiler is allowed to infer.
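One sketch of the pointer problem (hypothetical function names; __restrict__ is a GCC/Clang extension, not standard C++):

// The compiler must reload *n on every iteration, because writes
// through p might change *n: the two pointers could alias.
void add_n(int *p, const int *n, int len) {
    for (int i = 0; i < len; ++i)
        p[i] += *n;
}

// Promising no aliasing lets the compiler hoist the load of *n out of the loop.
void add_n_noalias(int *__restrict__ p, const int *__restrict__ n, int len) {
    for (int i = 0; i < len; ++i)
        p[i] += *n;
}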
The point? Don't just assume your compiler will magically make all your performance issues go away and produce optimal code. Maybe it will, maybe it won't.
As always, the main performance lessons are: 1) don't prematurely optimize, and 2) if you do see perf issues, run profilers to try to definitively nail down where they are.
The most common issues are in function calls involving array operations and pointers, but a lot of it has to do with the C/C++ header and linker setup as well. C and C++ authors should not blithely assume the compiler is doing an awesome job, and in my experience, it often isn't.
Agree. And I'm sure the author agrees as well. That's why compiler-explorer exists in the first place.