Posted by HexDecOctBin 6/30/2025

The provenance memory model for C (gustedt.wordpress.com)
225 points | 143 comments | page 2
eqvinox 6/30/2025|
Using the "register" storage class feels really alien for C code written in 2025…
flohofwoe 6/30/2025|
It has a slightly different meaning now: instead of hinting to the compiler that the variable should be placed in a register, it means that it is illegal to take the address of the variable (i.e. you cannot create a pointer to it):

https://www.godbolt.org/z/eEYf5c59f

Might be useful in some situations although I currently can't think of any :)
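
For readers who don't want to follow the link, here is a minimal sketch of that rule. The function name `sum_values` is made up for illustration; the only point is that the commented-out line would be rejected by a conforming C compiler, because applying `&` to a `register` object is a constraint violation.

```c
#include <assert.h>

/* Hypothetical helper: sums an array using a register-qualified accumulator. */
int sum_values(const int *values, int count) {
    assert(values != 0 || count == 0);

    register int total = 0;   /* the address of total may not be taken */
    for (int i = 0; i < count; i++) {
        total += values[i];
    }

    /* int *p = &total; */    /* would not compile: & applied to a register object */
    return total;
}
```

Since no pointer to `total` can exist, nothing outside this function can alias it, which is the property the keyword now documents.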

eqvinox 7/1/2025||
I mean, yeah, but that function is really only an aid for the programmer in self-enforcing that rule; the compiler already knows whether the address of the variable is taken anywhere, and can behave as is useful if it isn't taken anywhere…

Doesn't feel particularly valuable to have that "help" from the compiler against "accidentally" taking the address of a variable… I mean, how do you even accidentally do that?

flohofwoe 7/1/2025||
I guess you're not a fan of 'const' either? ;)
eqvinox 7/1/2025||
I am a fan of "const", because it is useful in expressing API constraints and behavior. Contrast against putting "register" on a function parameter being useless because either it's passed by value, then it's a copy anyway and register is meaningless to the caller, or it's a pointer, in which case it again does nothing because you already have a pointer (and something somewhere is very confused.)
dsp_person 6/30/2025||

    if ((Π⁻ < Π) && (Π < Π⁺)) {
I spent way too long trying to figure this out as C code instead of

    if ((Π⁻ < Π) && (Π < Π⁺)) {
RossBencina 6/30/2025||
After reading the fine article I'm left wondering: what if you implement your own heterogeneous allocation scheme on top of malloc (e.g. TLSF)? In this case all of your objects will belong to the same malloced storage region, and you will compute object offsets using raw pointers, but I'd expect provenance to potentially treat each returned object as if it were allocated from separate, disjoint storage.

I guess my question is: does this provenance model allow for recursive nesting of allocators with a separate notion of "storage" at each level?

f33d5173 6/30/2025|
The compiler knows about malloc, and hence knows that the pointer returned by malloc won't alias any other pointer. Your compiler might support some attribute to mark a function as behaving like malloc in this respect. Otherwise the compiler will be forced to assume the return value could alias any other pointer.
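
A rough sketch of what such an attribute looks like in practice, assuming GCC or Clang, which both accept `__attribute__((malloc))`. The `pool` type and the `pool_*` functions are hypothetical names invented for this example, and the annotation is only honest if the program never touches a returned block through the pool's own `base` pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: GCC/Clang attribute syntax; other compilers get an empty macro. */
#if defined(__GNUC__) || defined(__clang__)
#define POOL_MALLOC_LIKE __attribute__((malloc))
#else
#define POOL_MALLOC_LIKE
#endif

/* Hypothetical bump allocator carved out of a single malloc'ed region. */
struct pool {
    unsigned char *base;
    size_t size;
    size_t used;
};

int pool_init(struct pool *p, size_t size) {
    p->base = malloc(size);
    p->size = size;
    p->used = 0;
    return p->base != NULL;
}

/* Tells the optimizer that the result does not alias any other live pointer. */
POOL_MALLOC_LIKE
void *pool_alloc(struct pool *p, size_t n) {
    assert(p != NULL && p->base != NULL);

    const size_t align = _Alignof(max_align_t);
    size_t start = (p->used + align - 1) / align * align;  /* round up */
    if (start > p->size || n > p->size - start) {
        return NULL;                                       /* pool exhausted */
    }
    p->used = start + n;
    return p->base + start;
}

void pool_destroy(struct pool *p) {
    free(p->base);
    p->base = NULL;
    p->size = p->used = 0;
}
```

This only covers the aliasing hint to the optimizer; it does not by itself answer whether the standard's model treats each block as its own storage instance.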
cryptonector 7/1/2025||
IMO there should be attributes for declaring allocators. Or builtin functions that have the effect of marking their callers with such attributes (e.g., an `__allocated()` function to say a pointer is indeed now to be considered a pointer to a new storage allocation, with a given size and optional type, and a `__freed()` function to say that a pointer is indeed now to be considered a dangling pointer to a deallocated object).
nixpulvis 7/1/2025||
As a bit of an aside, the XOR doubly linked list example given here is super cool.
Joker_vD 6/30/2025||
> Here the term "same representation and alignment" covers for example the possibility to look at [...] one would be a structure and the other would be another structure that sits at the beginning of the first.

Does it? It is quite simple for a struct A that has struct B as its first member to have radically different alignment:

    struct B { char x; };

    struct A { struct B b; long long y; };
Also, accidentally coinciding pointers are nothing "rare" because all objects are allowed to be treated as 1-element arrays: so any pointer to an e.g. struct field is also a pointer one-past the previous field of this struct; also, malloc() allocations easily may produce "touching" objects. So thanks for allowing implementations to not have padding between almost every two objects, I guess.
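
Both observations can be checked with a small sketch. `struct pair` and the two function names are invented for illustration, and the results in the comments are what typical ABIs give rather than something the standard guarantees.

```c
#include <assert.h>
#include <stddef.h>

struct B { char x; };
struct A { struct B b; long long y; };

/* Hypothetical struct with two adjacent members of the same type. */
struct pair { int first; int second; };

/* Returns 1 when struct A is more strictly aligned than its first member. */
int a_is_stricter_than_b(void) {
    return _Alignof(struct A) > _Alignof(struct B);
}

/* Returns 1 when the one-past pointer of `first` equals the address of `second`. */
int one_past_first_is_second(void) {
    struct pair p = {1, 2};
    assert(p.first == 1 && p.second == 2);
    return (&p.first + 1) == &p.second;   /* equality comparison only, no dereference */
}
```

On common 64-bit targets both functions return 1: `struct B` has alignment 1 while `struct A` inherits the alignment of `long long`, and there is no padding between two `int` members.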
layer8 6/30/2025|
This is about the representation and alignment of the pointer object, not about the object being pointed to. And C requires struct pointer types to all have the same representation and alignment. This is generally necessary due to the possibility of having pointers to opaque struct declarations in a translation unit.

Regarding your second point, if I understand the model correctly, there is only an ambiguity in pointer provenance if the adjacent objects are independent "storage instances", i.e. separately malloc'ed objects or separate variables on the stack — not between fields of the same struct.
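
A small sketch of the first point, reusing the structs from the parent comment. `struct opaque` and `first_member` are names made up for the example. The static assertions are about the pointer objects themselves, which is why they hold even though `struct A` and `struct B` are aligned differently and `struct opaque` is never completed.

```c
#include <assert.h>

struct opaque;                       /* incomplete: layout unknown in this file */
struct B { char x; };
struct A { struct B b; long long y; };

/* All pointers to structure types share one representation and alignment. */
_Static_assert(sizeof(struct opaque *) == sizeof(struct A *),
               "struct pointers have the same size");
_Static_assert(_Alignof(struct opaque *) == _Alignof(struct B *),
               "struct pointers have the same alignment");

/* A pointer to a struct, suitably converted, points to its initial member. */
struct B *first_member(struct A *a) {
    assert(a != 0);
    return (struct B *)a;
}
```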

jaisio 6/30/2025||
The root cause of all this is that C programs are not much more than glorified assembly programs. Any effort to retrofit higher level reasoning will always be defeated by somebody doing some dirty pointer tricks. This can only be solved by more abstract ways to express programs which necessarily restricts the bare metal dirty things one can do. But what you gain is that the compiler will easily be able to do lots of things which a C compiler can't do or only with a lot of headache. The kind of stuff this article is about is really trying to solve the wrong problem IMO.
briandw 6/30/2025||
The code blocks are very difficult to read on this page. I had ChatGPT O3 rewrite this in a more accessible format. https://chatgpt.com/share/68629096-0624-8005-846f-7c0d655061...
cenobyte 6/30/2025|
So much better. Thank you!
cryptonector 7/1/2025||
:thank you:

This is great. I wonder what u/pizlonator thinks of it.

b0a04gl 6/30/2025|
provenance model basically turns memory back into a typed value. finally malloc won't just be a dumb number generator, it'll act more like a capability issuer. and access is not "is this address in range" anymore, but "does this pointer have valid provenance". way more deterministic, decouples gcc -wall
HexDecOctBin 6/30/2025|
Will this create more nasal demons? I always disable strict aliasing, and it's not clear to me after reading the whole article whether provenance is about making sane code illegal, or making previously illegal sane code legal.
jcranmer 6/30/2025|||
All C compilers have some notion of pointer provenance embedded in them, and this is true going back decades.

The problem is that the documented definitions of pointer provenance (which generally amount to "you must somehow have a data dependency from the original object definition (e.g., malloc)") aren't really upheld by the optimizer, and the effective definition of the optimizer is generally internally inconsistent because people don't think about side effects of pointer-to-integer conversion. The one-past-the-end pointer being equal (but of different provenance) to a different object is a particularly vexatious case.

The definition given in TS6010 is generally the closest you'll get to a formal description of the behavior that optimizers are already generally following, except for cases that are clearly agreed to be bugs. The biggest problem is that it makes pointer-to-int an operation with side effects that need to be preserved, and compilers today generally fail to preserve those side effects (especially when pointer-to-int conversion happens more as an implicit operation).

The practical effect of provenance--that you can't magic a pointer to an object out of thin air--has always been true. This is largely trying to clarify what it means to actually magic a pointer out of thin air; it's not a perfect answer, but it's the best answer anyone's come up with to date.
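
To make the two cases concrete, here is a hedged sketch; the function names are invented and the comments reflect my reading of the model, not wording from the TS. The first function only compares integer values and never dereferences the one-past pointer. The second shows the pointer-to-integer cast acting as the "exposing" side effect that makes the later integer-to-pointer cast usable.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the one-past-the-end address of x happens to equal the address
   of y. The result depends on how the compiler lays out the two variables. */
int addresses_coincide(void) {
    int x = 1, y = 2;
    uintptr_t end_of_x = (uintptr_t)(&x + 1);  /* valid pointer, never dereferenced */
    uintptr_t start_of_y = (uintptr_t)&y;
    assert(x == 1 && y == 2);
    return end_of_x == start_of_y;
}

/* Round-trips a pointer through an integer and writes through the result. */
int write_through_round_trip(void) {
    int y = 2;
    uintptr_t exposed = (uintptr_t)&y;  /* the cast exposes y's storage instance */
    int *back = (int *)exposed;         /* picks up the provenance of exposed storage */
    *back = 42;
    return y;
}
```

Even when `addresses_coincide()` returns 1, writing to `y` through `&x + 1` is not allowed, because that pointer's provenance is `x`.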

Diggsey 6/30/2025||||
It's standardizing the contract between the programmer and the compiler.

Previously a lot of C code was non-portable because it relied on behaviour that wasn't defined as part of the standard. If you compiled it with the wrong compiler or the wrong flags you might get miscompilations.

The provenance memory model draws a line in the sand and says "all C code on this side of the line should behave in this well defined way". Any optimizations implemented by compiler authors which would miscompile code on that side of the line would need to be disabled.

Assuming the authors of the model have done a good job, the impact on compiler optimizations should be minimized whilst making as much existing C code fall on the "right" side of the line as possible.

For new C code it provides programmers a way to write useful code that is also portable, since we now have a line that we can all hopefully agree on.

layer8 6/30/2025|||
This is basically a formalization of the general understanding one already had when reading the C standard thoroughly 25 years ago. At least I was nodding along throughout the article. It cleans up the parts where the standard was too imprecise and handwavy.