Thoughts on Generating C

Posted by ingve 10 hours ago

182 points | 52 comments

WalterBright 6 hours ago|

I've thought of doing that, but it's too much fun writing an optimizer and code generator!

(My experience with "compile to C" is with cfront, the original C++ implementation that compiled to C. The generated code was just terrible to read.)

uecker 4 hours ago||

What language features would make C better as a target language for compilers?

girvo 2 hours ago||

By piggybacking off GCC et al you gain very easy portability/access to a bunch of platforms that most languages would never attempt to support.

We used this in production with Nim for embedded firmware engineering at my previous job doing industrial IoT systems, which let us write a much nicer language than C (and much faster, much safer), with code-sharing between the network server (and its comms protocol) and the firmware code itself.

All can be done with C itself of course, but this let us achieve it faster and in a much nicer fashion

cozzyd 4 hours ago||

Not strictly for compilers (which probably don't need to use macros much), but for normal macro-codegen it would be very useful to have some way to add line breaks in macro-generated code so that it's easier to inspect with gcc -E.
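
A small sketch of the problem, assuming a made-up DEFINE_ADDER macro (the macro and function names are purely illustrative). The backslash-newlines are spliced away before expansion, so in the gcc -E output each DEFINE_ADDER use comes out as one long line, however nicely the macro body is laid out in the source:

```c
/* Hypothetical function-generating macro; the names are illustrative only. */
#define DEFINE_ADDER(name, n) \
    static int name(int x)    \
    {                         \
        return x + (n);       \
    }

/* Each use below expands to a complete function definition,
   emitted on a single line in the preprocessor output. */
DEFINE_ADDER(add_one, 1)
DEFINE_ADDER(add_ten, 10)
```

Running the preprocessed output through a formatter is one workaround, but nothing in the macro itself can ask for a line break.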

rirze 9 hours ago||

Love how he put a paragraph for someone asking, "why not generate Rust?". Beautiful.

pjc50 8 hours ago||

The lifetimes argument is extremely sound: this is information which you need from the developer, and not something that is easy to get when generating from a language which does not itself have lifetimes. It's an especially bad fit for the GC case he describes.

Findecanor 7 hours ago|||

> not something that is easy to get when generating from a language which does not itself have lifetimes

Not easy, but there are compilers that do it.

Lobster [0] started out with automatic reference counting. It has inferred static typing, specialising functions based on type, reminiscent of how JavaScript JIT compilers do it. Then the type inference engine was expanded to also specialise functions based on the ownership/borrowing type of their arguments. RC is still done for variables that don't fit into the ownership system, but the number of executed ops overall got greatly reduced. The trade-off is increased code size.

I have read a few older papers about eliding reference counting ops which seem to be resulting in similar elisions, except that those had not been expressed in terms of ownership/borrowing.

I think newer versions of the Swift compiler also infer lifetimes to some extent.

When emitting Rust you could now also use reference counting smart pointers, even with cycle detection [1]. Personally I'm interested in how ownership information could be used to optimise tracing GC.

[0]:https://aardappel.github.io/lobster/memory_management.html

[1]:https://www.semanticscholar.org/paper/Breadth-first-Cycle-Co...

warangal 6 hours ago||

I was also reading through Lobster's memory management docs. It currently implements (I think) "borrow first" semantics to do away with a lot of run-time reference counting logic, which I think is a very practical approach. I also doubt that reference counting overhead ever becomes so large that a language should rule out RC entirely.

Tangentially, I was experimenting with a runtime library to expose such "borrow-first" semantics. Such "lents" can be easily copied onto a new thread's stack to access shared memory, and are not involved in RC. Race-condition detection helps to share memory without any explicit move to a new thread. It seems to work well for simpler data structures like sequences/vectors/strings/dictionaries, but I have not figured out a proper way to handle recursive/dynamic data structures!

jcranmer 5 hours ago||||

If I were targeting Rust for compilation, I wouldn't do lifetimes, instead everything would be unsafe Rust using raw pointers.

I'd have to do an actual project to see how annoying it is to lower semantics to unsafe Rust to know for sure, but my guess is you'd be slightly better off because you don't have to work around implicit conversions in C, the more gratuitous UBs in C, and I think I'd prefer the slightly more complete intrinsic support in Rust over C.
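
One concrete instance of the implicit-conversion problem, as a hedged sketch (the function name is made up for illustration). Multiplying two uint16_t values promotes both to int, so on a platform with 32-bit int the product 65535 * 65535 overflows a signed type, which is UB. Generated C has to route the arithmetic through an unsigned type to get the wrapping result the source language probably wanted:

```c
#include <stdint.h>

/* Hypothetical helper a code generator might emit for a wrapping
   16-bit multiply. The casts force the multiplication to happen in
   unsigned int, which wraps instead of overflowing a signed int. */
uint16_t mul_u16_wrapping(uint16_t a, uint16_t b)
{
    return (uint16_t)((unsigned)a * (unsigned)b);
}
```

Writing `a * b` directly would compile without complaint, which is what makes this kind of thing easy to get wrong in an emitter.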

bux93 7 hours ago|||

I mean, the argument boils down to "the language I'm compiling FROM doesn't have the same safeguards as rust". So obviously, the fault lies there. If he'd just compile FROM rust, he could then compile TO rust without running into those limitations. A rust-to-rust compiler (written in rust) would surely be ideal.

manwe150 6 hours ago||

I'd be willing to sell you a rust to rust compiler. In fact, I'll even generalize it to do all sorts of other languages too at no extra charge. I just need a good name...maybe rsync?

Snark aside, the output targets of compilers typically need to be unsafe languages. The point of a high-level compiler in general is to verify difficult proofs, then emit constructs that are consistent with those proof results but simplified to the point where they can no longer be verified; they can run fast because those proofs aren't needed at runtime anymore. (Incidentally, this is both a strength and a weakness of C. It gives the compiler very little ability to do proofs, so the output is generally close to the input. Other languages typically have much more useful compilers because they do much more proof work at compile time to make runtime faster, while C just makes the programmer specify exactly what must be done and leaves the proof of correctness up to the programmer.)
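
As a rough illustration of "emit constructs consistent with the proof, minus the proof", here is a sketch with hypothetical function names. The first function is what the source language's semantics require in general; the second is what a front end might emit once it has proved every index is in range. Nothing in the emitted C records that proof:

```c
#include <stddef.h>

/* Checked access: returns 0 on success, -1 if i is out of bounds. */
int get_checked(const int *a, size_t len, size_t i, int *out)
{
    if (i >= len)
        return -1;
    *out = a[i];
    return 0;
}

/* Emitted after the front end has proved 0 <= i < len for every
   access in the loop: no bounds check survives into the output. */
int sum_all(const int *a, size_t len)
{
    int total = 0;
    for (size_t i = 0; i < len; i++)
        total += a[i];
    return total;
}
```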

cozzyd 4 hours ago||

Compilers named after animals are the most popular, so I might suggest cat?

Findecanor 11 minutes ago||

But ... then it would clash with the command for listing the files in a directory on RISC OS, FLEX and TRSDOS.

keybored 23 minutes ago|||

“Use Rust as a compilation target” is a new bugbear now? Never even heard that suggestion before.

artemonster 8 hours ago||

<something something about having a hammer and seeing nails everywhere> :)

themafia 3 hours ago||

> it could be that you end up compiling a function with, like 30 arguments, or 30 return values; I don’t trust a C compiler to reliably shuffle between different stack argument needs at tail calls to or from such a function.

Yet you trust it to generate the frame for this leviathan in the first place. Sometimes C is about writing quality code, apparently, sometimes it's about spending all day trying to outsmart the compiler rather than take advantage of it.

FpUser 8 hours ago||

This is weird. As soon as I thought about the subject the relevant article showed up on HN.

I was thinking about how to embed a custom high-level language into my backend application written in C++. Each individual script would compile to a native shared lib loadable on demand so that the performance stays high. For this I was contemplating exactly this approach: compile this high-level custom language, with a very limited feature set, to plain C and then have the compiler that comes with Linux finish the job.
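
A minimal sketch of the "emit plain C" step, assuming a toy language whose only construct is "multiply the argument by a constant". The emit_scale_function helper and the shape of its output are made up for illustration:

```c
#include <stdio.h>

/* Hypothetical emitter: writes the C source of
       int <name>(int x) { return x * <factor>; }
   into buf. Like snprintf, it returns the number of characters the
   full text needs, so the caller can detect truncation. */
int emit_scale_function(char *buf, size_t size, const char *name, int factor)
{
    return snprintf(buf, size,
                    "int %s(int x)\n{\n    return x * %d;\n}\n",
                    name, factor);
}
```

The resulting text would then be written to a file, built with something like cc -shared -fPIC, and loaded with dlopen/dlsym; that part depends on the host setup and is left out here.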

drivebyhooting 6 hours ago|

What’s your use case for this?

I’ve done a few such things for compiling ML models, but in the end I always regretted it.

bjourne 3 hours ago||

Last I checked static inline was merely a hint that compilers need not take. They all do, but by definition it's not a zero cost abstraction.

yearolinuxdsktp 5 hours ago||

Java JIT compilers perform function inlining across virtual function boundaries… this is why JIT’d Java can outperform the same C or C++ code. Couple that with escape analysis, which lets short-lived allocations be stack-allocated (avoiding GC).

Virtual functions are often implemented in C via function pointers to provide an interface (such as filesystem code in the Linux kernel). Just like C++ vtable lookups, these cannot be inlined at compile time.
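
A sketch of that pattern, with hypothetical names loosely modelled on a kernel-style ops table. The call in shape_area goes through a pointer whose target is only known at run time, so unless the compiler can prove which function it points to, there is nothing to inline:

```c
struct shape;

/* Hypothetical "vtable": one function pointer per operation. */
struct shape_ops {
    int (*area)(const struct shape *self);
};

struct shape {
    const struct shape_ops *ops;
    int w, h;
};

static int rect_area(const struct shape *s) { return s->w * s->h; }
static int tri_area(const struct shape *s)  { return s->w * s->h / 2; }

static const struct shape_ops rect_ops = { .area = rect_area };
static const struct shape_ops tri_ops  = { .area = tri_area };

/* Indirect call: dispatches through whatever ops table s carries. */
int shape_area(const struct shape *s)
{
    return s->ops->area(s);
}
```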

What I wonder is whether code generated in C can be JIT-optimized by WASM runtimes with similar automatic inlining.

yxhuvud 6 hours ago|

"static inline", the best way of getting people doing bindings in other languages to dislike your library (macros are just as bad, FWIW).

I really wish someone on the C language/compiler/linker level took a real look at the problem and actually tried to solve it in a way that isn't a pain to deal with for people that integrate with the code.

wahern 1 hour ago||

> I really wish someone on the C language/compiler/linker level took a real look at the problem and actually tried to solve it in a way that isn't a pain to deal with for people that integrate with the code.

It exists as "inline" and "extern inline".[1] Few people make use of them, though, partly because the semantics standardized by C99 were the complete opposite of the GCC extensions at the time, so for years people were warned to avoid them; and partly because linkage matters are a kind of black magic people avoid whenever possible--"static inline" neatly avoids needing to think about it.

[1] See C23 6.7.5 at https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf#p...
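
A compressed sketch of the C99-style arrangement, shown in one file for brevity (the add1 name and the header/source split in the comments are illustrative assumptions):

```c
/* Would normally live in a header: an inline definition. By itself it
   does not provide an external definition of add1. */
inline int add1(int x)
{
    return x + 1;
}

/* Would normally live in exactly one .c file that includes the header:
   this declaration makes that translation unit emit the single external
   definition, which callers fall back to when a call is not inlined. */
extern inline int add1(int x);
```

Bindings can then link against the one external add1 symbol, which is exactly what "static inline" never provides.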

LtWorf 6 hours ago||

If it's not in the .h file it's supposed to be a private function.

zabzonk 5 hours ago||

you can access it using extern from anywhere:

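    // a.c
    int f(int x) {
        return x + 1;
    }

    // b.c
    extern int f(int x);   // declaration only; the definition lives in a.c

    int main(void) {
        int y = f(41);     // resolved by the linker
        return y == 42 ? 0 : 1;
    }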

but if f() had been defined as static, you couldn't do this.

bigfishrunning 5 hours ago|||

"private function" doesn't mean "you can't know about this", it means "you shouldn't rely on this as a stable interface to my code".

Just because you can use the information you have to call a given function, doesn't mean you aren't violating an interface.

zabzonk 5 hours ago||

my point was that if f() had been defined static then you can't access it from outside the translation unit it is defined in - in other words, it is "private". i'm afraid i'm unclear what your point is.

jlarocco 2 hours ago|||

I don't see what you're getting at with respect to writing bindings.

The whole point of using "static" in that way is to prevent people from using it outside of the file.

If you need to call a static function (inline or otherwise) from outside of the compilation unit to use the API, then it's a bug in the API, not a problem with static.

I agree with you about pre-processor macros, though.

zabzonk 55 minutes ago||

i think you replied to the wrong comment