I am also not entirely sure whether "man or boy" is a good benchmark, but for estimating the function call overhead it should be ok. For the access-to-variables part, other considerations are far more important IMHO. So I am not sure I would put too much weight on the accessing-"k" part of the post.
N3654 contains a criticism of the lambda approach from a language-design perspective. The issue is that it does not seem to be a good fit for C. The claim that copying values is always cheap obviously holds in toy examples, but not necessarily for interesting data structures. In C++ you could then capture a pointer by value, but then you haven't avoided an indirection either. In C++ this is generally fine, as smart pointers deal with the memory management; but in C, having a captured value in a lambda that you no longer have explicit access to does not make much sense.
The "unfortunately, is that unlike C++ there are no templates in C" is also interesting. I fled from C++ back to C exactly because of templates. In a performance context, the fallacy is that you can always create super-optimized code using compile-time techniques that absolutely shines in microbenchmarks (such as this one) but cause a lot of bloat on larger scale (and long compilations times). If you want this, I think you should stick to C++.
(https://news.ycombinator.com/item?id=46243298)
Which rather suggests to me that such a scheme, but generated by the compiler, should have similar performance to said "Normal Functions" and hence to his preferred lambda form.
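For clarity, by "such a scheme" I mean the usual hand-written function-pointer-plus-context pattern. A minimal sketch (the names are my own invention, not taken from his benchmark):

    #include <stdio.h>

    /* The state a closure would capture, spelled out as an ordinary struct. */
    struct counter_env {
        int step;
        int total;
    };

    /* A "normal function": the captured state arrives as an explicit argument. */
    static void bump(void *ctx)
    {
        struct counter_env *env = ctx;
        env->total += env->step;
    }

    /* The caller only ever sees a plain function pointer plus an opaque
       context pointer, i.e. a "wide pointer" representation of a closure. */
    static void repeat(int n, void (*fn)(void *), void *ctx)
    {
        for (int i = 0; i < n; i++)
            fn(ctx);
    }

    int main(void)
    {
        struct counter_env env = { .step = 3, .total = 0 };
        repeat(10, bump, &env);
        printf("%d\n", env.total); /* 30 */
        return 0;
    }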
Since his benchmark environment is so unwieldy, I may have a go at extracting those two code sets to a standalone environment, and measure them to see...
xgcc (GCC) 16.0.0 20260103 (experimental)
1.50 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack
1.11 gcc -ftrampoline-impl=stack -Wl,-no-warn-execstack -DREFARG
7.21 gcc -ftrampoline-impl=heap
7.34 gcc -ftrampoline-impl=heap -DREFARG
0.93 gcc -DWIDEPTR
1.38 gcc -DWIDEPTR -DREFARG
1.40 gcc -DDIRECT
1.05 gcc -xc++ -std=c++26 -DFUNCREF -DDEDUCING
19.68 gcc -xc++ -std=c++26 -DDEDUCING
20.73 gcc -xc++ -std=c++26
6.31 gcc -xc++ -std=c++26 -DDEDUCING -DREFARG
6.31 gcc -xc++ -std=c++26 -DREFARG
Debian clang version 16.0.6 (15~deb12u1)
21.11 clang -xc++
6.16 clang -xc++ -DREFARG
1.66 clang -fblocks
1.70 clang -fblocks -DREFARG

Obviously I sympathize as someone who spent the front half of their career writing C and has no interest in writing C++ beyond toy examples. Nevertheless, IMO you're cutting off a choice here, forcing yourself to make potentially sub-optimal decisions.
Sometimes templates would have been the better option, and writing X-macros is just worse, ergonomically at least. Perhaps C++ people use templates more often than they should, but I'm sure "never" is not the correct amount.
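For anyone who hasn't met X-macros: a minimal example of the technique, roughly the kind of repetition templates would otherwise handle (the names here are made up):

    #include <stdio.h>

    /* The single source of truth: each X(...) row describes one entry. */
    #define COLORS \
        X(RED,   0xff0000) \
        X(GREEN, 0x00ff00) \
        X(BLUE,  0x0000ff)

    /* Expand it once into an enum... */
    enum color {
    #define X(name, rgb) COLOR_##name,
        COLORS
    #undef X
    };

    /* ...and once more into a matching string table. */
    static const char *color_names[] = {
    #define X(name, rgb) #name,
        COLORS
    #undef X
    };

    int main(void)
    {
        printf("%s\n", color_names[COLOR_GREEN]); /* prints GREEN */
        return 0;
    }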
Still, if C gets fat pointers that's at least a step in the right direction. So I must wish you the best with that part.
I also recently removed some code written with C++ templates from one of my projects, because this single file caused compilation times for a ca. 450-file C project to increase by 50%, from 20s to ca. 30s.
AFAIK the C standard doesn't prevent an implementation from using fat pointers; this is one of the reasons why the conversion between pointers and integers only works in one direction. This is actually necessary for segmented memory. The compiler is also allowed to optimize based on which allocation a pointer comes from (including the allocation's boundaries, i.e. its size), even if the pointers would be bytewise equal, so you could argue that the C abstract machine already has fat pointers.
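The usual demonstration is essentially the example from the pointer-provenance papers; whether the store is even defined is exactly what that discussion is about:

    #include <stdio.h>
    #include <string.h>

    int x = 1, y = 2;

    int main(void)
    {
        int *p = &x + 1; /* one past the end of x: a valid pointer value */
        int *q = &y;

        /* If the implementation happens to place y directly after x,
           p and q have identical byte representations... */
        if (memcmp(&p, &q, sizeof p) == 0) {
            *p = 11; /* ...but p still carries the provenance of x, not of y */
            /* GCC has been observed to print y=2 here even when the branch
               is taken, because it assumes a store through p cannot reach y. */
            printf("y=%d *p=%d\n", y, *p);
        }
        return 0;
    }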
I am really not sure if all these observations mentioned in the article are 100% correct, though.
First: the code seems to be compiled with clang. On Linux with gcc, the native function one is way faster than the clang one.
Second: the author runs the code on ARM64/macOS.
At least on my Ryzen CPU on Linux with gcc, the "normal C code" is way faster than anything else. Not that we do not need to think about "closure"-type functionality, but one should be careful about extrapolating from one compiler on one platform to the rest of the pack.
Regarding N3654, I am not sure how to benchmark it here. C could potentially use __builtin_call_with_static_chain, but I am not sure how to write the called function so that it uses the chain for accessing the variables.
I tried to estimate N3654 by using "tinygo", which AFAIK uses the usual C calling ABI, but it was a factor of two slower than clang. Even "go" with its very specific ABI is still much slower. Then I discovered this isn't representative, since the runtime calling costs were totally shadowed by allocation costs.
Even the Rust example I usually use, http://www.reddit.com/r/rust/comments/2t80mw/the_man_or_boy_... , is much slower than anything else, presumably because of the "Cell" that is needed.
TLDR: This micro benchmark might be misleading
A trick one can do is to let it create the trampoline and then read off the two pointers from the positions in the code where they are stored. Not portable, and you still have the overhead of creating the trampoline, but you do not need the executable stack anymore.
So we need to switch to logarithmic graphs to get a better picture.
I wish more people would know about decibels.
> I wish more people would know about decibels.
Huh? Is there any difference? https://en.wikipedia.org/wiki/Decibel:
“The decibel […] expresses the ratio of two values of a power or root-power quantity on a logarithmic scale”
That whole 'zero cost abstraction' idea is not unique to C++ since all the important work to make the 'zero cost thing' happen is performed down in the optimizer passes, and those are identical between C and C++.
- Ada? Many cool features, but that's also the problem - it's damn complicated.
- C++ combines simplicity of Ada with safety of C.
- Rust and Zig are only ~10 years old and haven't really stabilized yet. They also start to suffer from npm-like syndrome, which is much more problematic for a systems language.
- ATS? F#? Not all low-level stuff needs (or can afford) this level of assurance.
- Idris? Much nicer than ATS, but it's even younger than Rust and still a running target (and I'm not sure if zero-runtime support is there yet).
I mean, yes, C is missing tons of potentially useful features (syntactic macros, for one thing), but closures are not one of them.
Even if not perfect, "Typescript for C" came out in 1983.
Which functions are required for a C program to run, aside from those it calls explicitly? memcpy? It's basically a missing CPU instruction. malloc will never get called unless you call it yourself.
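memcpy is the classic case: no function is called in the source below, yet gcc and clang will typically lower the struct assignment to a memcpy call (or an inline equivalent).

    struct big {
        char data[4096];
    };

    /* No explicit call here, but compilers usually emit "call memcpy"
       for a copy this large. */
    void copy(struct big *dst, const struct big *src)
    {
        *dst = *src;
    }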
> followed by history of systems programming languages, starting with JOVIAL in 1958.
All systems languages designed as such before C (e.g. BLISS, PL/S, NEWT) were vendor-specific. Some of them had nice things missing from C, but none propagated beyond their original platform. And today we have no option but C.
> "Typescript for C" came out in 1983.
C++ is not just "not perfect", it is far worse in every way. Let's let people overload everything. And let's overload the shift operator to do IO. And make every error message no less than 1000 lines, with the actual error somewhere in the middle. Let's break union and struct type punning, because screw you, that's why. You say C macros are unreadable? Behold, template metaprogramming! C is not perfect, but it has the justification of being minimal. C++ is a huge garbage dumpster on fire.
C was also vendor specific to Bell Labs.
C++ is definitely Typescript for C.
Compare e. g. https://github.com/rust-lang/rust/pull/102750/ (I'm not following Rust development closely, picked up the first one). Yes, developers do rely on non-guaranteed behavior, that's their nature. C would likely standardize old behavior. Basically all of the C standard is a promoted vendor-specific extension - for better or worse.
Here is another good one: https://internals.rust-lang.org/t/type-inference-breakage-in...
C23, on the other hand, landed a bunch of hard changes, and "it didn't break my code" is just the same reality most people observe with Rust too. The changes you mention didn't break my Rust either.
But some trivial C did break because C23 is a new language
https://c.godbolt.org/z/n9vhMGYW5
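I can't tell which example sits behind that link, but the kind of thing that is accepted as C17 and rejected as C23 includes, for instance:

    /* Accepted as C17, rejected by a C23 compiler: */

    int true = 0;   /* 'bool', 'true' and 'false' are keywords in C23 */

    int add(a, b)   /* K&R-style function definitions were removed in C23 */
        int a;
        int b;
    {
        return a + b;
    }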
In fact those Rust changes aren't language changes, unlike C23. The first one you linked is a choice by the compiler to improve a layout it never guaranteed; anybody relying on that layout (and there were a few) was never promised it wouldn't change with the next version, next week, or indeed the next time they compiled the same code. You can even ask Rust's compiler to light a fire under you here, by arbitrarily changing some layouts between builds, so that code depending on unpromised layout breaks earlier, reminding you to actually tell Rust what you need: repr(C) ("I need the same layout guarantees for my type T as I'd get in C") or repr(transparent) ("I need my wrapper type T to have the same layout as the type it wraps").
The second isn't a language change at all, it's a change to a library feature, so now we're down to "C isn't stable until the C library is forever unchanging" which hopefully you recognise as entirely silly. Needless to say nobody does that.
I'm trying to drag one program at $employer up to C99 (plus C11 _Generic), so I can then subsequently drag it to the bits of C23 which GCC 13 supports.
This all takes time, and means having to convince colleagues during code reviews.
What C23 has done is turn some of the extensions which GCC has had for some time into legitimate, standard things (typeof, etc.).
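typeof is the obvious example: C23 standardises what GCC spelled __typeof__ for decades, so things like this become plain standard C (minimal sketch):

    #include <stdio.h>

    /* C23 typeof; pre-C23 GCC needed __typeof__ or a -std=gnu* mode. */
    #define SWAP(a, b) do {       \
            typeof(a) tmp_ = (a); \
            (a) = (b);            \
            (b) = tmp_;           \
        } while (0)

    int main(void)
    {
        int x = 1, y = 2;
        SWAP(x, y);
        printf("%d %d\n", x, y); /* 2 1 */
        return 0;
    }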
However the ability to adopt is also limited by what third party linters in use at $employer may also support.
How so? In C++ a lambda is just a regular type that does not allocate any memory by itself. You have in fact precise control over how/where a lambda is allocated.
Any C implementation of capturing lambdas has the same problem, of course; that's why the whole idea doesn't really fit into the C language IMHO.
Basically, with each capturing lambda, a context type _Ctxof(fn) is created that users can then declare on stack or on heap themselves.
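A sketch of what I mean, writing the desugaring out by hand in today's C (the lambda and _Ctxof are hypothetical, of course):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hand-written desugaring of a hypothetical capturing lambda
           auto add_k = [k](int x) { return x + k; };
       The struct below is what "_Ctxof(add_k)" would name. */
    struct add_k_ctx { int k; };

    static int add_k_call(const struct add_k_ctx *ctx, int x)
    {
        return x + ctx->k;
    }

    int main(void)
    {
        int k = 42;

        /* The user decides where the context lives: */
        struct add_k_ctx on_stack = { .k = k };

        struct add_k_ctx *on_heap = malloc(sizeof *on_heap);
        if (!on_heap)
            return 1;
        *on_heap = (struct add_k_ctx){ .k = k };

        printf("%d %d\n", add_k_call(&on_stack, 1), add_k_call(on_heap, 2));
        free(on_heap);
        return 0;
    }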
Using C++ doesn't mean having to use the whole standard.
Now if C type safety actually was like Modula-2, Object Pascal or Zig, that would not be as bad.
For example, I'm maintaining some 20 year old C code, which the employer adopted around 10 years ago. It will likely stay in use at least until the current product is replaced, whenever that may be.
https://github.com/sbcl/sbcl/commit/021414445224e692ebb9c48c... https://github.com/sbcl/sbcl/commit/2c56c5901df1a5ea98ae19e9...
C does not have closures. You can simulate closures, but it is neither robust nor automatic compared to languages that truly support them.