Thoughts on Generating C

Posted by ingve 8 hours ago

182 points | 52 comments

20k 5 hours ago|

Static inline functions can sometimes serve as an optimisation barrier to compilers. Its very annoying. I've run into a lot of cases when targeting C as a compilation target where swapping something out into an always-inline function results in worse code generation, because compilers have bugs sadly

There's also the issue in that the following two things don't have the same semantics in C:

    float v = a * b + c;

    static_inline float get_thing(float a, float b) {
        return a*b;
    }

    float v = get_thing(a, b) + c;

This is just a C-ism (floating point contraction) that can make extracting things into always inlined functions still be a big net performance negative. The C spec mandates it sadly!

uintptr_t's don't actually have the same semantics as pointers either. Eg if you write:

    void my_func(strong_type1* a, strong_type2* b);

a =/= b, and we can pull the underlying type out. However, if you write:

    void my_func(some_type_that_has_a_uintptr_t1 ap, some_type_that_has_a_uintptr_t2 bp) {
        float* a = get(ap);
        float* b = get(bp);
    }

a could equal b. Semantically the uintptr_t version doesn't provide any aliasing semantics. Which may or may not be what you want depending on your higher level language semantics, but its worth keeping the distinction in mind because the compiler won't be able to optimise as well

kazinator 4 hours ago||

The inline function receives the operands as arguments, and so whatever they are, they get converted to float. Thus the inline code is effectively like this:

  float v = (float) ((float) a) * ((float) b) + c;

Since v is float, the cast representing the return conversion can be omitted:

  float v = ((float) a) * ((float) b) + c;

Now, if a and b are already float, then it's equivalent. Otherwise not; if they are double or int, we get double or int multiplication in the original open code.

jcranmer 3 hours ago||

> Now, if a and b are already float, then it's equivalent.

Not necessarily! Floating-point contraction is allowable essentially within statements but not across them. By assigning the result of a * b into a value, you prohibit contraction from being able to contract with the addition into an FMA.

In practice, every compiler has fast-math flags which says stuff it and allows all of these optimizations to occur across statements and even across inline boundaries.

(Then there's also the issue of FLT_EVAL_METHOD, another area where what the standard says and what compilers actually do are fairly diametrically opposed.)

kazinator 2 hours ago|||

The first mention of contraction in the standard (I'm looking at N3220 draft that I have handy) is:

A floating expression may be contracted, that is, evaluated as though it were a single opera- tion, thereby omitting rounding errors implied by the source code and the expression evalua- tion method.86) The FP_CONTRACT pragma in <math.h> provides a way to disallow contracted expressions. Otherwise, whether and how expressions are contracted is implementation-defined.

If you're making a language that generates C, it's probably a good idea to pin down which C compilers are supported, and control the options passed to them. Then you can more or less maintain the upper hand on issues like this.

garaetjjte 3 hours ago|||

It seems to me that either you want to allow for contraction everywhere, or not all. Allowing it only sometimes is worst of both worlds.

jcranmer 2 hours ago||

If you allow contraction after inlining, whether or not an FMA will get contracted becomes subject to the vicissitudes of inlining and other compiler decisions that can be hard-to-predict. It turns out to be a lot harder of a problem to solve than it appears at first glance.

quotemstr 4 hours ago||

Compiler bugs and standards warts suck, but you know what sucks more? Workarounds for compiler bugs and edge cases that become pessimizing folk wisdom that we can dispell only after decades, if ever. It took about that long to convince the old guards of various projects that we could have inline functions instead of macros. I don't want to spook them into renewed skepticism.

thomasahle 1 hour ago||

Maybe they just checked with a compiler and got the same code?

titzer 6 hours ago||

I think I may end up coming full circle on Virgil. Circa 2005 Virgil I compiled to C and then with avr-gcc to AVR. I did that because who the heck wants to write an AVR backend? Circa 2009 I wrote a whole new compiler for Virgil III and since then it has JVM, x86, x86-64, wasm, wasm-gc and (incomplete) arm64.

I like compiler backends, but truth be told, I grow weary of compiler backends.

I have considered generating LLVM IR but it's too quirky and unstable. Given the Virgil wasm backend already has a shadow stack, it should now be possible for me to go back to square one and generate C code, but manage roots on the stack for a precise GC.

Hmm....

aseipp 3 hours ago|

FWIW I think the LLVM bitcode format has stronger compatibility guarantees than the text IR. But I agree it's a bit of a pain either way; plus, if you forgo linking to the library and just rely on whatever 'llc' the user has installed, figuring out bugs is not a fun time...

whizzter 7 hours ago||

Having done this for a dozen of experiments/toys I fully agree with most of the post, would be nice if the the addition of must_tail attribute could be reliable across the big 3 compilers, but it's not something that can be relied on (luckily Clang seems to be fairly reliable on Windows these days).

2 additional points,

1: The article mentions DWARF, even without it you can use #line directives to give line-numbers in your generated code (and this goes a very long way when debugging), the other part is local variables and their contents.

For variables one can get a good distance by using a C++ subset(a subset that doesn't affect compile time, so avoid any std:: namespaced includes) instead and f.ex. "root/gc/smart" ptr's,etc (depending on language semantics), since the variables will show up in a debugger when you have your #line directives (so "sane" name mangling of output variables is needed).

2: The real sore point of C as a backend is GC, the best GC's are intertwined with the regular stack-frame so normal stack-walking routines also gives everything needed for accuracte GC (required for any moving GC designs, even if more naive generation collectors are possible without it).

Now if you want accurate somewhat fast portable stack-scanning the most sane way currently is to maintain a shadow-stack, where you pass prev-frame ptrs in calls and the prev-frame ptr is a ptr to the end of a flat array that is pre-pended by a magic ptr and the previous prev-frame ptr (forming a linked list with the cost of a few writes, one extra argument with no cleanup cost).

Sadly, the performant linked shadow-stack will obfuscate all your pointers for debugging since they need to be clumped into one array instead of multiple named variables (and restricts you from on-stack complex objects).

Hopefully, one can use the new C++ reflection support for shadow-stacks without breaking compile times, but that's another story.

Findecanor 5 hours ago||

> ... [pointers] need to be clumped into one array ...

You could put each stack frame into a struct, and have the first field be a pointer to a const static stack-map data structure or function that enumerates the pointers within the frame.

BTW, the passed pointer to this struct could also be used to implement access to the calling function's variables, for when you have nested functions and closures.

ufo 7 hours ago||

Related to shadow stacks, I've had trouble convincing the C optimizer that no one else is aliasing my heap-allocated helper stacks. Supposedly there ought to be a way to tell it using restrict annotations, but those are quite fiddly: only work for function parameters, and can be dusmissed for many reasons. Does anyone know of a compiler that successfully used restrict pointers in their generated code? I'd love to be pointed towards something that works.

jaen 3 hours ago||

Note that declaring no aliasing is probably unsafe for concurrent or moving garbage collectors, as then the C compiler can conveniently "forget" to either store or load values to the shadow stack at some points...

(though it is fine if GC can only happen inside a function call and the call takes the shadow stack as an argument)

spankalee 27 minutes ago||

I'm not a C programmer - having coded in high-level languages only for the past 20 years - but I've been doing a lot of WASM recently, and eagerly looking forward to the stack switching proposal so I don't have to implement an asincify-type transform for an async/await feature.

If it's true that a C program doesn't have control of the stack, what does that mean for supporting the stack switching in Wastrel? Can you not reify the stack and replace it with another from a suspended async function? Do you need some kind of userland stack for all stacks once you support WASM stack switching?

kazinator 4 hours ago||

Generators don't have to put out portable code. You document what compilers are required for the output and that's something you can change with any given release of your generator. Then the generated code uses whatever works with those compilers. If you use the output with some other compiler, then that's undefined behavior w.r.t. the documentation of the generator; you are on your own. "Whatever works" could be something undocumented that works de facto.

Joker_vD 7 hours ago||

> And finally, source-level debugging is gnarly. You would like to be able to embed DWARF information corresponding to the code you residualize; I don’t know how to do that when generating C.

I think emitting something like

    #line 12 "source.wasm"

for each line of your source before the generated code for that line does something that GDB recognizes well enough.

gopalv 6 hours ago||

If you have ever used something like yacc/bison, debugging it is relatively sane with gdb.

You can find all the possible tricks in making it debuggable by reading the y.tab.c

Including all the corner cases for odd compilers.

Re2c is a bit more modern if you don't need all the history of yacc.

kazinator 4 hours ago||

Debugging Yacc is completely insane with gdb, for other reasons, like that grammar rules aren't functions you can just put a breakpoint on, and see their backtrace, etc, as you can with a recursive descent parser.

But yes, you can put a line-oriented breakpoint on your action code and step through it.

Silphendio 5 hours ago||

Nim compiles to C, and it has a compiler iotion that does this.

kccqzy 6 hours ago||

I’ve done something similar during my intern days as well. We had a Haskell-based C AST library that supports the subset of C we generate, and an accompanying pretty printing library for generating C code that has good formatting by default. It really was a reasonable approach for good high-level abstraction power and good optimizations.

jbreckmckye 1 hour ago||

I wonder what Zig would be like as an ILR. Easy cross compilation, plus, you can compile with runtime checks to help debug your compiler output. Might be fun for a sideproject

sph 6 hours ago|

Has anyone defined a strict subset of C to be used as target for compilers? Or ideally a more regular and simpler language, as writing a C compiler itself is fraught with pitfalls.

rwmj 6 hours ago||

Not precisely, but C-- (hard to search for!) was a C-like (or C subset?) intermediate language for compilers to generate.

I found this Reddit thread that gives a bit more detail:

https://www.reddit.com/r/haskell/comments/1pbbon/c_as_a_proj...

and the project link:

https://www.cs.tufts.edu/~nr/c--/

stephenbennyhat 6 hours ago|||

https://en.wikipedia.org/wiki/C-- for example?

manwe150 4 hours ago|||

Sounds like why LLVM was created? (and derivatives like MLIR and NaCL) Its IR is intended be be C-like, except that everything is well-defined and substantially more expressive than C.

nickpsecurity 1 hour ago|||

I think one could also use a subset compatible with a formal semantics of C. Maybe the C semantics in K Framework, CompCert C, or C0 from Verisoft. Alternatively, whatever is supported in open-source, verification tooling.

Then, we have both a precise semantics and tools to help produce robust output.

nxobject 4 hours ago||

For portability, hopefully C89 as well?

More comments...