Posted by ingve 6/27/2025
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3531.txt
Add a namespacing macro and you have a whole generics system, unlike that in TFA.
So, it might add more value to have the C std add an `#include "file.c" name1=val1 name2=val2` preprocessor syntax where name1, name2 would be on a "stack" and be popped after processing the file. This would let you do types/functions/whatever "generic modules" with manual instantiation which kind of fits with C (manual management of memory, bounds checking, etc.) but preprocessor-assisted "macro scoping" for nested generics. Perhaps an idea to play with in your slimcc fork?
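For comparison, a rough sketch of the manual instantiation dance as it exists today, against a hypothetical generic module "map.c" parameterized by KEY/VAL/NAME macros; the proposed syntax would automate exactly this define/undef bracketing:

#define KEY char *
#define VAL int
#define NAME str_int_map  /* instantiation-specific name prefix */
#include "map.c"
#undef NAME               /* today the "pop" must be spelled out by hand; */
#undef VAL                /* stack semantics on the #include line would   */
#undef KEY                /* do this automatically                        */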
That's an interesting idea! I think D or Zig's C header importer had similar syntax. I'm definitely gonna do it.
#include "file.c" _(_x)=myNamePrefix ## _x `\
KEY=charPtr VAL=int `\
....
The idea being that inside any generic module your private/protected names are all spelled _(_add)(..). By doing that kind of namespacing, you can write a generic module which gives client code's manual instantiators a lot of control, to select "verb noun" instead of "noun verb" kinds of schemes like "add_word" instead of "word_add", and potentially even change snake_case to camelCase with some _capwords.h file that does `#define _get Get`-like moves, though of course name collisions can bite. That bst/ thing I linked to does not have full examples of all the optionality. E.g., to support my "stack popping" of macro defs, without that, just with ANSI C 89, you might do something like this instead to get "namespace nesting":
#ifndef CT_ENVIRON_H
#define CT_ENVIRON_H
/* This file establishes a macro environment suitable for instantiation of
any of the Map/Set/Seq/Pool or other sorts of generic collections. */
#ifndef _
/* set up a macro-chain using token pasting *inside* macro argument lists. */
#define _9(x) x /* an identity macro must terminate the chain. */
#define _8(x) _9(x)
#define _7(x) _8(x) /* This whole chain can really be as long as */
#define _6(x) _7(x) /* you want. At some extreme point (maybe */
#define _5(x) _6(x) /* dozens of levels) expanding _(_) will start */
#define _4(x) _5(x) /* to slow-down the Cpp phase. */
#define _3(x) _4(x) /* Also, definition order doesn't matter, but */
#define _2(x) _3(x) /* I like how top->bottom matches left->right */
#define _1(x) _2(x) /* in the prefixing-expansions. */
#define _0(x) _1(x)
#define _(x) _0(x) /* _(_) must start the expansion chain */
#endif
#ifndef CT_LNK
# define CT_LNK static
#endif
#endif /* CT_ENVIRON_H */
and then with a setup like that in place you can do:

#define _8(x) _9(i_ ## x) /* some external client decides "i_" */
_(_foo)                   /* #include "I" -> i_foo at nesting-level 8 */
#define _6(x) _7(e_ ## x) /* impl of i_ decides "e_" */
_(_foo)                   /* #include "E" -> i_e_foo at level 6 */
#define _3(x) _4(c_ ## x) /* impl of e_ decides "c_" */
_(_foo)                   /* #include "C" -> i_e_c_foo at level 3 */
#define _0(x) _1(l_ ## x) /* impl of c_ decides "l_" */
_(_t)
_(_foo)                   /* #include "L" -> i_e_c_l_foo at level 0 */
#define _0(x) _1(x)       /* c impl uses _(l_foo) to define _(bars) */
_(_foo)                   /* i_e_c_foo at nesting level 3 again */
#define _3(x) _4(x)       /* e impl uses _(c_foo) to define _(bars) */
_(_foo)                   /* i_e_foo at nesting level 6 again */
#define _6(x) _7(x)       /* i impl now uses _(e_foo) to define _(bars) */
_(_foo)                   /* i_foo at nesting level 8 again */
Yes, yes. All pretty hopelessly manual (as is C in so many aspects!). But the smarter macro-definition semantics across parameterized includes I mentioned above could go a long way toward a quality-of-life improvement "for client code" with good "library code" file organization. I doubt it will ever be easy enough to displace C++ much, though.

Personally, I started doing this kind of thing in the mid-1990s, as soon as I saw people shipping "code in headers" in C++ template libraries and open source taking off. These days I think of it as an example of how much you can achieve with very simple mechanisms, and of the trade-offs of automating instantiation at all. But people sure seem to like to "just refer" to instances of generic types.
I guess ctags-type tools would need updating for the new possible definition location. Mostly, someone needs to decide on a separation syntax for stuff like `name1(..)=expansion1 name2(..)=expansion2` in "in-line" cases. Compilers have had `cc -Dname(..)=expansion` or equivalents since the dawn of the language, but they get argument separation for free from the OS/argv convention (or the Windows command-line APIs, etc.).
Anyway, it might make sense to first get experience with a slimcc/tinycc/gcc/clang cpp++ extension. ;-) Personally, these days I mostly just use Nim as a better C.
Anyway, as is so often the case, it's about the whole ecosystem not just of tooling but the ecosystem of assumptions about & around tooling.
As I mentioned in my other comment, if you want you can always cc -E and re-format the code somehow, although the main times you want to do that are for line-by-line stepping in debuggers or maybe for other cases of "lines as source coordinates" like line-by-line profilers.
Of course, a more elegant solution might be for debuggers to have more adjustable step sizes/source coordinates, like a single ';'-statement or maybe a single sequence point, rather than just "line orientation". This is, in fact, so natural an idea that it seems a virtual certainty some C debugger has an "expressional step/next", especially if written by a fan more of Lisp than assembly. Of course, at some point a library is just debugged/trusted, but if there are "user hooks" those can be buggy. And if it's performance-critical, better profile reports are never unwelcome.
While addr2line has been a thing forever, I've never heard of an addr2expr - probably because "how would you label it?" So, pros & cons, but easy for debugger/profilers is one reason I think the parameterized file way is lower friction.
https://github.com/facebookresearch/CParser#multiline-macros
Some debuggers make use of it when displaying the current program state, but the major debuggers do not allow you to step into a specific sub-call on a line (e.g. skip the function arguments and go straight to the outermost function call). This is purely a UI issue; they have enough information. I believe the nnd debugger has implemented selecting the call to step into.
Addr2line could be amended. I am working on my own debugger and I keep re-implementing existing command line tools as part of my testing strategy. A finer-grained addr2line sounds like a good exercise.
So, a column number would not be very meaningful to a programmer (relative to some ';'- or '{}'-delimited expressional label leveraging the language's own syntax/bracketing, which would definitely still be a bit to muck about with). As per my Lisp mention, it really is a >1-dimensional idea, and there are various ways to flatten/marshal that parse tree. "next/over" and "step/into" are enough to build up that 2-D navigation incrementally/dynamically/interactively, but they are harder to work with "cumulatively", and with grammars more complex than Lisp's. Maybe most concretely: how "subexpression numbers" (in addr2x or other senses) are enumerated might still be something programmers need to "learn" from their "debugger UI".
Another option might be to "reverse preprocess it", or maintain forward metadata, to go from the "virtual (line, column)" back to the "true source (line, column)".
I don't mean to discourage you, but just explain more what problem I meant to refer to by "how to label it" and highlight limitations of your new test. { But many are probably limited somehow! :-) }
For every single symbol you need to check whether there is a splice (backslash + newline) in it. For a single-pass compiler this contributes to a very slow lexing phase, since a splice can appear anywhere in C/C++ code.
Additionally, the example isn't even possible; at least make ridiculous examples that compile.
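To illustrate the splice point: splices are removed in translation phase 2, before tokenization, so they can land in the middle of any token. A deliberately silly example of my own:

#include <stdio.h>

int ma\
in(void) /* the splice above is deleted before tokenizing, so this still defines "main" */
{
    int answ\
er = 42;
    printf("%d\n", answer); /* prints 42 */
    return 0;
}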
Note that the newer versions are basically a "C++ without classes" kind of thing.
Second to that I'd say the appeal is just watching something you've known for a long time grow slowly and steadily.
If you mean templates, that's a kind of solved problem since C++17 with static_assert and enable_if, and more so in C++20 with concepts.
It offers us safety features for arrays and strings, that apparently WG14 will never add to C.
They didn't do so in 40 years, and it still remains to be seen what will be done given the current trend of cybersecurity laws.
Then there is the whole basic stuff like proper namespaces instead of the ridiculous prefix convention.
This is from the point of view of the C++ ARM de facto standard back in the 1990s, not even considering anything else.
I see more possibilities for people to hurt themselves using C than C++, since 1993 when I added C++ to my toolbox.
I debugged enough problematic C++ code to know that people can hurt themselves badly with it.
Even if non-standard, all major C++ compiler vendors have provided similar features in their standard libraries, and this is now officially supported in C++26.
I have debugged enough C memory corruption issues with strings and arrays that I would have thought by now WG14 would actually care to fix the root cause, 40 years in.
Also note that said extension only exists because Apple did the work WG14 did not bother to do for the last 40 years, and as a way to improve interop with safe Swift.
At least WG21 eventually did the correct thing and placed those extensions into the standard, even if due to governmental pressure.
Also while enabling bounds checking has been a common configuration option in all C++ compilers, clang and GCC aren't all C compilers.
This kind of discussion is also quite telling that nothing will change on WG14; maybe eventually, who knows, C2y might finally get fat pointers if voted in, and then we will see over the following decades whatever fruits that will bear.
When we will have a standard for bounds checking arrays and pointers remains to be seen, but this does not stop anyone from using the non-standard tools available today.
And if you're targeting PC, you might be better off using Python to begin with (if perf is not a concern)
- "Most" C projects do basic OOP, many C projects even do inheritance via composition and a fair few of these do virtual dispatch too
- Templates (esp. since C++20), lambda functions, overloads and more recently coroutines (which are fancy FSM in their impl), etc. reduce boilerplate a lot
- Containers (whether std:: or one's own) are far easier to work with in C++, a lot less boilerplate overall (GDB leveraged this during their migration iirc)
- string_view makes non-destructive substring manipulation a lot easier; chrono literals (in application code) make working with durations a lot more readable too
In the past decade or two, major projects like GCC and GDB have migrated from C to C++.
Obviously, C retains advantages over C++, but they are fairly limited: faster build times, not having to worry about exposing "extern C" interface in libraries, not having to worry about controversial features like exceptions and (contextually) magic statics and so on...
One other key thing is encapsulation provided via various C++ syntax which is missing in C (where only file scope is possible).
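For reference, the encapsulation C does offer is the opaque-pointer pattern at file scope; a rough sketch (names illustrative):

/* point.c: everything "static" is private to this translation unit; a
   header would only forward-declare "struct point" as an opaque type. */
#include <stdlib.h>

struct point { int x, y; };

static int clamp(int v) { return v < 0 ? 0 : v; } /* private helper */

struct point *point_new(int x, int y)
{
    struct point *p = malloc(sizeof *p);
    if (p) { p->x = clamp(x); p->y = clamp(y); }
    return p;
}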
That of course doesn't help you with the switch away from C. The question is why they keep updating the language. The only ones with valid reasons to not upgrade to some more sane language can't take advantage of the new features.
However, for some niche OS-specific things, and for existing legacy products where oversight is involved, simply rolling out a C++ port on the next release is, well, not a reality, and often not worth the bureaucratic investment.
While I have no commentary on the post, because I'm not really a C programmer, I think a lot of comments forget that some projects have requirements, and sometimes those requirements become obsolete, but you're stuck with what you've got until gen2, or lazy-loading standardization across teams.
Templates are the main thing C++ has over C. It's trivial to circumvent or escape the things you don't 'like' about C++, like new and delete (a personal obstacle), and write good, nice, modern C++ with templates.
C's _Generic can help, but ultimately, in my opinion, the need for templating is a good reason to go from C to C++.
That said…I agree that there is a lot of syntactic sugar that could be added for free to C.
Assume you successfully allocate an array "arr" with "sz" elements, where "sz" is of type "size_t". Then "arr + sz" is a valid expression (meaning the same as "&arr[sz]"), because it's OK to compute a pointer one past the last element of an array (but not to dereference it). Next you might be tempted to write "arr + sz - arr" (meaning the same as "&arr[sz] - &arr[0]"), and expect it to produce "sz", because it is valid to compute the element offset difference between two "pointers into an array or one past it". However, that difference is always signed, and if "sz" does not fit into "ptrdiff_t", you get UB from the pointer subtraction.
Given that the C standard (or even POSIX, AIUI) doesn't relate ptrdiff_t and size_t to each other, we need to restrict array element counts, before allocation, with two limits:
- nelem <= (size_t)-1 / sizeof(element_type)
- nelem <= PTRDIFF_MAX
(I forget which standard header #defines PTRDIFF_MAX; surprisingly, it is not <limits.h>.)
In general, neither condition implies the other. However, once you have enforced both, you can store the element count as either "size_t" or "ptrdiff_t".
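A minimal sketch of an allocation helper enforcing both limits; the interface is made up (and PTRDIFF_MAX turns out to live in <stdint.h>):

#include <stddef.h>
#include <stdint.h> /* PTRDIFF_MAX */
#include <stdlib.h>

void *alloc_array(size_t nelem, size_t elem_size)
{
    if (elem_size == 0 || nelem > (size_t)-1 / elem_size)
        return NULL; /* byte count would overflow size_t */
    if (nelem > PTRDIFF_MAX)
        return NULL; /* later pointer subtraction could be UB */
    return malloc(nelem * elem_size);
}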
Never mix unsigned and signed operands. Prefer signed. If you need to convert an operand, see (2).
https://nullprogram.com/blog/2024/05/24/

You cannot even check the signedness of a signed size to detect an overflow, because signed overflow is undefined!
The remaining argument from what I can tell is that comparisons between signed and unsigned sizes are bug-prone. There is however, a dedicated warning to resolve this instantly.
It makes sense that you should be able to assign a pointer to a size. If the size is signed, this cannot be done due to its smaller capacity.
Given this, I can't understand the justification. I'm currently using unsigned sizes. If you have anything contradicting, please comment :^)
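Two throwaway sketches of those points (mine, for illustration):

#include <stddef.h>

/* 1. With signed sizes, a post-hoc overflow check is itself UB, so the
      compiler may delete the branch: */
ptrdiff_t grow(ptrdiff_t size, ptrdiff_t extra)
{
    ptrdiff_t n = size + extra; /* UB if this overflows */
    return n < 0 ? -1 : n;      /* may be optimized to just "return n" */
}

/* 2. The bug-prone mixed comparison, which -Wsign-compare (enabled by
      -Wextra in GCC) flags immediately: */
int find(const int *a, size_t len, int key)
{
    for (int i = 0; i < len; i++) /* signed/unsigned comparison warned here */
        if (a[i] == key)
            return i;
    return -1;
}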
IMO, this is a better approach than using signed types for indexing, but AFAIK, it's not included in GCC/glibc or gnulib. It's an optional extension and you're supposed to define `__STDC_WANT_LIB_EXT1__` to use it.
I don't know if any compiler actually supports it. It came from Microsoft and was submitted for standardization, but ISO made some changes from Microsoft's own implementation.
https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1173.pdf#p...
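The intended usage looks roughly like this, assuming an implementation that actually ships Annex K:

#define __STDC_WANT_LIB_EXT1__ 1 /* must come before the #includes */
#include <stddef.h>
#include <string.h>

/* rsize_t is just size_t, but values above RSIZE_MAX (typically far below
   SIZE_MAX) are treated as runtime constraint violations, which catches
   negative sizes accidentally converted to huge unsigned values. */
void copy_buf(void *dst, rsize_t dstsz, const void *src, rsize_t n)
{
    memcpy_s(dst, dstsz, src, n); /* fails cleanly if n > RSIZE_MAX */
}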
You can, since the number of bits is the same. The mapping of pointer bits to signed integer bits will mean that you can't then do arithmetic on the resulting integers and get meaningful results, but the behavior of such shenanigans is already unspecified with no guarantees other than you can get an integer out of a pointer and then convert it back later.
But also, semantically, what does it even mean to convert a single pointer to a size? A size of an object is naturally defined as the count of chars between two pointers, one pointing at the beginning of the object, the other at its end. Which is to say, a size is a subset of pointer difference that just happens to always be non-negative. So long as the implementation guarantees that, for any object, that non-negative difference will fit in a signed integer of the appropriate size, it seems reasonable to reflect this in the types.
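In code, the round-trip guarantee looks like this:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int x = 42;
    intptr_t bits = (intptr_t)&x; /* implementation-defined value; doing
                                     arithmetic on it means nothing */
    int *p = (int *)bits;         /* but converting back is fine */
    printf("%d\n", *p);           /* p compares equal to &x */
    return 0;
}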
int somearray[10];
new_ptr = somearray + signed_value;
or
element = somearray[signed_value];
This seems almost criminal to how my brain does logic/C code.
The only thing I could think of is this:
somearray += 11; somearray[-1] // index set to somearray[10] ??
If I'd see my CPU execute that I'd want it to please stop. I'd want my compiler to shout at me like a little child, and be mean until I do better.
-Wall -Wextra -Wpedantic <-- that should flag, I think, any of these weird practices.
As you stated though, I'd be keen to learn why I am wrong!
Arrays aren't the best example, since they are inherently about linear, scalar offsets, but you might see a negative offset from the start of a (decayed) array in the implementation of an allocator with clobber canaries before and after the data.
1. Being certain your added value is negative.
2. Checking for underflows after computation, which you shouldn't do.
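And a minimal sketch of the allocator-canary case mentioned above (layout and names made up):

#include <stddef.h>
#include <stdlib.h>

#define CANARY 0xDEADBEEFu

/* The header is a full max_align_t so the user data stays aligned. */
typedef union { unsigned canary; max_align_t align; } hdr;

void *canary_alloc(size_t n)
{
    hdr *h = malloc(sizeof(hdr) + n);
    if (!h) return NULL;
    h->canary = CANARY;
    return h + 1; /* user data begins after the header */
}

int canary_ok(const void *p)
{
    const hdr *h = (const hdr *)p - 1; /* negative offset back into the header */
    return h->canary == CANARY;
}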
The article was interesting.
Why?
By the definition of ptrdiff_t, ISTM the size of any object allocated by malloc cannot be out of bounds of ptrdiff_t, so I'm not sure how you can have a useful size_t that uses the sign bit?
Unsigned types in C have modular arithmetic; I think they should be used exclusively when this is needed, or maybe if you absolutely need the full range.
Two's complement encodes -x as ~x + 1 = 2^n - x = -x (mod 2^n) and can therefore be mixed with unsigned for (+, -, *, &, |, ^, ~, <<).
> I think they should be used exclusively when this is needed
The opposite: signed type usage should be kept to a minimum because signed type (and pointer) overflow is UB and will get optimized as such.
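The classic instance of "optimized as such":

#include <limits.h>

/* Because signed overflow is UB, the compiler may assume "i" never wraps
   past INT_MAX and can, e.g., compute the trip count of this loop directly.
   An unsigned counter would be required to wrap, blocking that reasoning. */
long count_through(int n)
{
    long c = 0;
    for (int i = 0; i <= n; i++)
        c++;
    return c;
}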
C is weakly typed; the basic types are really not there to maintain invariants or detect their violation.
#include <stdlib.h>
#include <stdio.h>

#define vec(T) struct { T* val; int size; int cap; }

/* do/while(0) so the macro behaves as a single statement under if/else */
#define vec_push(self, x) do { \
    if((self).size == (self).cap) { \
        (self).cap = (self).cap == 0 ? 1 : 2 * (self).cap; \
        (self).val = realloc((self).val, sizeof(*(self).val) * (self).cap); \
    } \
    (self).val[(self).size++] = (x); \
} while(0)

#define vec_for(self, at, ...) \
    for(int i = 0; i < (self).size; i++) { \
        auto at = &(self).val[i]; \
        __VA_ARGS__ \
    }

typedef vec(char) string;

void string_push(string* self, char* chars)
{
    if(self->size > 0)
    {
        self->size -= 1; /* drop the previous NUL terminator before appending */
    }
    while(*chars)
    {
        vec_push(*self, *chars++);
    }
    vec_push(*self, '\0');
}

int main()
{
    vec(int) a = {};
    vec_push(a, 1);
    vec_push(a, 2);
    vec_push(a, 3);
    vec_for(a, at, {
        printf("%d\n", *at);
    });

    vec(double) b = {};
    vec_push(b, 1.0);
    vec_push(b, 2.0);
    vec_push(b, 3.0);
    vec_for(b, at, {
        printf("%f\n", *at);
    });

    string c = {};
    string_push(&c, "this is a test");
    string_push(&c, " ");
    string_push(&c, "for c23");
    printf("%s\n", c.val);
}
typedef struct { ... } foo_t;
typedef struct { ... } bar_t;
foo_t foo = (bar_t){ ... };
i.e. these are meant to be named types and thus should remain nominal even though it's technically a typedef. And ditto for similarly defined pointer types etc. But this is a pattern regular enough that it can just be special-cased while still allowing proper structural typing for cases where that's obviously what is intended (i.e. basically everywhere else).

_Generic(x, int i: i + 1, float f: f + 1.);

where the i and f then have the correct type, so you do not need to refer to 'x' in those expressions.
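For contrast, in standard C today the controlling expression has to be repeated in every association, since _Generic merely selects one of several already-written expressions:

#include <stdio.h>

/* Standard C form: there is no way to bind "x" to a typed name, so each
   branch spells the expression out again. */
#define INCR(x) _Generic((x), int: (x) + 1, float: (x) + 1.0f)

int main(void)
{
    int i = 41;
    float f = 2.5f;
    printf("%d %f\n", INCR(i), (double)INCR(f));
    return 0;
}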