Posted by pjmlp 2 days ago
True, but you don't solely rely on the declaration, do you? Lots of the power comes from static analysis.
From P3465: "why this is a scalable compile-time solution, because it requires only function-local analysis"
Profiles use local analysis, as does borrow checking. Whole-program analysis is something you don't want to mess with.
* template specialisations
* function overloading
* I believe const generics are still not fully there in Rust, or they're necessarily more restricted.
In general metaprogramming facilities are more expressive in C++, with different other tradeoffs to Rust. But the tradeoffs don't include memory safety.
Template specializations are not in Rust because they have some surprisingly tricky interactions with lifetimes. It's not clear lifetimes can be added to C++ without having the same issue causing safety holes with templates. At least I think this might be an issue if you want to compile a function instance like `void foo<std::string_view>()` only once, instead of once for each different string data lifetime.
Function overloading serves two purposes in C++, and I think Rust chooses a better option for both:
First, when there are similar features with different parameters, overloading lets you pretend the feature set is smaller by merging them into a single function. So e.g. C++ offers a single sort function, but Rust distinguishes sort, sort_by and sort_by_key.
Obviously all three have the same underlying implementation, but I feel the distinct names help us understand, when reading code, what's important. If they were all named sort, you might not notice that one of these calls is actually quite different.
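To illustrate, a minimal Rust sketch of the three variants (the example values are made up):

```rust
fn main() {
    let mut v = vec![3, 1, 2];
    v.sort(); // natural (Ord) order
    assert_eq!(v, [1, 2, 3]);

    let mut v = vec![3, 1, 2];
    v.sort_by(|a, b| b.cmp(a)); // custom comparator: descending
    assert_eq!(v, [3, 2, 1]);

    let mut words = vec!["hello", "hi", "hey"];
    words.sort_by_key(|w| w.len()); // sort by a derived key
    assert_eq!(words, ["hi", "hey", "hello"]);
}
```

The distinct names make it obvious at the call site which of the three behaviors you're getting.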
Secondly, overloading provides a type of polymorphism, "ad hoc polymorphism". For example, if we ask whether name.contains('A') in C++, the contains function is overloaded to accept both char (a single byte 'A', the number 65) and several of C++'s ways to represent strings.
In Rust name.contains('A') still works, but for a different reason. 'A' is still a char (this time that's a Unicode Scalar Value), but the reason it works here is that char implements the Pattern trait, a trait for things which can be matched against part of a string. So name.contains(char::is_uppercase) works, name.contains(|ch| { /* arbitrary predicate for the character */ }) works, name.contains("Alex") works, and a third party crate could make it work for regular expressions or anything else.
I believe this more extensible alternative is strictly superior while also granting improved semantic value.
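A small sketch of that extensibility (the `name` value here is made up):

```rust
fn main() {
    let name = "Alexandra";
    // All of these go through the same Pattern trait:
    assert!(name.contains('A'));                  // a char
    assert!(name.contains("Alex"));               // a string slice
    assert!(name.contains(char::is_uppercase));   // a fn(char) -> bool
    assert!(name.contains(|ch: char| ch == 'x')); // an arbitrary closure
}
```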
Just to clarify, does this mean NTTP in C++ is unsound as-is, or that trying to port C++ NTTP as-is to Rust would result in something unsound?
The Rust equivalent would need to be checked by the compiler, and I think this only really delivers value if it's a feature in the safe Rust subset. So the compiler must check that what you wrote is sound; if it can't tell, it must reject what you wrote. And that's why they decided to do the integer types first: that's definitely sound, and it's a lot of value delivered.
As a whole concept you could probably say that the C++ NTTP is "unsound as-is" but that's so all-encompassing as to not be very useful, like saying C++ integer arithmetic is unsound. It's such a big thing that even though the problem is also big, it sort of drowns out the problem.
Noticing that std::abs is unsound has more impact because hey, that's a tiny function, why isn't it just properly defined for all inputs? But for the entire NTTP feature or arithmetic or ranges or something it's not a useful way to think about it IMO.
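For context, a minimal sketch of the integer const generics Rust did ship (the `dot` function is made up for illustration):

```rust
// A non-type parameter N of integer type: the compiler can always
// decide equality of two usize values, so this is known to be sound.
fn dot<const N: usize>(a: [i64; N], b: [i64; N]) -> i64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    // N = 3 is inferred from the arrays; mismatched lengths won't compile.
    assert_eq!(dot([1, 2, 3], [4, 5, 6]), 32);
}
```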
It explains its own motivation.
They wouldn't. The point is, if you were serious about making a memory-safe C++, this is what you'd need to do.
Simply making C++ compilers compatible with one another is a constant struggle. Making Rust work well with existing C++ code is even more difficult. Thus, it is far easier to make something like Clang understand and compile C++-specific annotations alongside legacy C++ code than making rustc understand C++ types. Moreover, teams of C++ programmers will have an easier time writing annotated C++ than they would learning an entirely new language. And it's important to recognize how deeply entrenched C++ is in many areas, especially when you consider things like OpenMP, OpenACC, CUDA, HIP/ROCm, Kokkos, etc etc etc.
On paper it does sound quite promising, but looking at the GitHub repo I can't shake the feeling that it is more talk than action. It seems to have basically been abandoned before it ever saw any adoption?
Most of the profiles being argued for C++ only exist in PDF form, there isn't a single C++ compiler where you can validate those ideas, but trust the authors, the ideas will certainly work once implemented.
I don’t know about real world adoption, nor do I think it’s really about what I’m saying. The proposal was made public a few weeks ago. There hasn’t been much time for real world projects to adopt it, if that were even the goal. The point is that it does exist today, and you can try it out. It’s more than just a design on paper.
He single-handedly wrote his own C++ front-end and then proceeded to implement a boatload of extensions which other members of the committee pooh-poohed as being too hard to implement in a compiler.
Every couple of weeks he implements something new. He's a real joy to follow.
Now that virtual address space is cheap, it's possible to recompile C (or presumably C++) with a fully-safe runtime (requiring annotation only around nasty things like `union sigval`), but this is an ABI break and has nontrivial overhead (note that AddressSanitizer has ~2x overhead and only catches some optimistic cases) unless you mandate additional annotation.
I’ve got some programs where the ASan overhead is 10× or more. Admittedly, they are somewhat peculiar—one’s an interpreter for a low-level bytecode, the other’s largely a do-nothing benchmark for measuring the overhead of a heartbeat scheduler. The point is, the overhead can vary a lot depending on e.g. how many mallocs your code does.
This does not contradict your point in any way, to be clear. I was just very surprised when I first hit that behaviour expecting ASan’s usual overhead of “not bad, definitely not Valgrind”, so I wanted to share it.
Every set of constraints prevents many useful programs. If those useful programs can still be specified in slightly different ways but it prevents many more broken programs, those constraints may be a net improvement on the status quo.
Does it? You didn't list any. It certainly prevents writing a tremendous number of programs which are nonsense.
Controlled mutable aliasing is fine. Uncontrolled is dangerous.
Except that's a really harsh and unproductive way to phrase it. The existence of "unsafe" is not a "confession" that safe code is impossible, so why even try.
Progress in safety is still made when an unsafe capability is removed from the normal, common, suggested way to do things and moved to something you need to ask for and take responsibility for, hemmed in by other safe constructs around it.
There is movement towards adopting the same approach for C++ by some vendors. The biggest problem is that there's a lot of crap C++ out there, buried in erroneous custom comparators, so if you change how sorting works even very slightly you blow up lots of fragile nonsense written in C++, and from its authors' point of view you broke their software, even though what actually happened is they wrote nonsense. The next biggest problem is that C++ named its unstable sort "sort", so sometimes people needed a stable sort but got an unstable one, yet in their test suites everything checked out by chance; with a new algorithm the tests now blow up, because it's still an unstable sort...
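Rust makes the stability property part of the name, which sidesteps the second problem; a small sketch:

```rust
fn main() {
    let mut pairs = vec![(1, "b"), (0, "x"), (1, "a")];
    // sort_by_key is stable: entries with equal keys keep their original
    // relative order, so (1, "b") stays ahead of (1, "a").
    pairs.sort_by_key(|&(k, _)| k);
    assert_eq!(pairs, [(0, "x"), (1, "b"), (1, "a")]);
    // The unstable variant is spelled out: sort_unstable_by_key.
}
```

If you relied on stability by accident, the name in your code already told you that you never asked for it.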
Yes, it prevents many useful programs.
I think it also prevents many many many more useless broken incorrect programs from wreaking havoc or being used as exploit delivery vehicles.
An implementation of `sort` would start with: `assert (begin.allocation_base == end.allocation_base)`. Most likely, this would be implicit when `end - begin` or `begin < end` is called (but not `!=`, which is well-defined between unrelated pointers).
If we ignore the uninitialized data (which is not the same kind of UB, and usually not interesting), the `strlen` loop would assert when, after not encountering a NUL, `s.allocation_offset` exceeds `100` (which is known in the allocation metadata).
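A hypothetical Rust sketch of those checks (the field names mirror the notation above; they are illustrative, not any real runtime's API):

```rust
// Fat pointer carrying allocation metadata alongside the address.
#[derive(Clone, Copy)]
struct FatPtr {
    allocation_base: usize,   // identifies the allocation
    allocation_len: usize,    // size of the allocation
    allocation_offset: usize, // position within the allocation
}

impl FatPtr {
    // `begin < end` is only meaningful within a single allocation.
    fn lt(self, other: FatPtr) -> bool {
        assert_eq!(
            self.allocation_base, other.allocation_base,
            "ordering pointers from different allocations"
        );
        self.allocation_offset < other.allocation_offset
    }

    // `!=` stays well-defined even between unrelated pointers.
    fn ne(self, other: FatPtr) -> bool {
        self.allocation_base != other.allocation_base
            || self.allocation_offset != other.allocation_offset
    }

    // A strlen-style step asserts before walking past the end.
    fn advance(mut self) -> FatPtr {
        assert!(
            self.allocation_offset < self.allocation_len,
            "read past the end of the allocation"
        );
        self.allocation_offset += 1;
        self
    }
}

fn main() {
    let begin = FatPtr { allocation_base: 0x1000, allocation_len: 100, allocation_offset: 0 };
    let end = FatPtr { allocation_base: 0x1000, allocation_len: 100, allocation_offset: 100 };
    assert!(begin.lt(end));
    assert!(begin.ne(end));
    assert_eq!(begin.advance().allocation_offset, 1);
}
```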
struct S {
    int arr1[100];
    std::string s;
    int arr2[100];
};

void foo(S& s) {
    // arr1 and arr2 are members of the same complete object,
    // so they are in the same allocation
    std::sort(std::begin(s.arr1), std::end(s.arr2));
}
To make it sound you really need to track arrays specifically (including implicit single-element ones), not just allocations. It's surely somewhat feasible, as the constexpr interpreters in compilers do track this, but something like that would probably be far too inefficient at runtime.
That's really an aliasing problem, not specific to arrays. The tricky part about tagging memory with a full type is that a lot of code, even in standards, relies on aliasing that's supposed to be forbidden.
Still, even if the `sort` isn't detected (due to the user insisting on permissiveness rather than adding annotations), it would still detect any attempt to use `s` afterward.
As for a limited array-specific solution ... I can't think of one that would handle all variants of `sort((int *)&s[0], (int *)&s[1])`. And do we want to forbid `sort(&s.arr[0], (int *)&s.s)`?
I'd bet that there are corners of valid, UB-free programs that would be sensitive to pointers no longer being simple numbers, but maybe that's wrong, or at least could be worked around.
I would add that to make this memory safe for multi-threaded programs, you also need to implicitly synchronize on fat pointer accesses, to prevent corrupted pointers from forming.
The strlen could be made safe by having malloc return an address 100 bytes before the end of a page. If the loop did run past the end of the 100 bytes, it would fall over safely (the result would be nonsense, of course). On the other hand, if you read bytes below the returned pointer, you would have 4k-100 bytes of slack before it fails on a bad page.
This assumption doesn't hold for real world systems -- most allocations are small, so heap allocators typically support a minimum allocation size of 16 bytes. Requiring each allocation to have its own page would be a 256x memory overhead for these small allocations (assuming 4k pages, and even worse as systems are moving to 16k or 64k page size). Not to mention destroying your TLB.
Also, guard pages only solve the problem if the program tries to access the entire array (like when sorting or reading a string). For other types of out-of-bounds array access where an attacker can control the index accessed, they can just pass in an index high enough to jump over the guard page. You can "fix" this with probing, but at that point just bounds-checking is much simpler and more performant.
It certainly helps, but is not a full solution.
The hard part is figuring out which words of memory should be treated as pointers, so that you know when to alter the reference counts.
Most C programs don't rely on all the weird guarantees that C mandates (relying on asm, which is also problematic, is probably more common), but for the ones that do it is quite problematic.
Static allocations only need full heap compatibility if `dlclose` isn't a nop.
And TLS is the forgotten step-child, but at the lowest level it's ultimately just implemented on normal allocations.
I sure hope you don't use any stack frames while writing the stack frame allocator.
I wrote a compiler extension just for this issue since there wasn't any.
Works great with a single thread.
There's overhead (depending on how much you're willing to annotate it and how much you can infer), but the only "hard" thing is the race between accessing a field and changing the refcount of the object it points to, and [even ignoring alternative CAS approaches] that's easy enough if you control the allocator (do not return memory to the OS until all running threads have checked in).
Note that, in contrast to the common refcount approach, it's probably better to introduce a "this is in use; crash on free" flag to significantly reduce the overhead.
If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.
It comes down to the blast radius for a mistake. A mistake involving UB can potentially result in completely arbitrary behavior. A mistake in the safe subset of a language is still a mistake, but the universe of possible consequences is smaller. How much smaller depends on the language in question.
> I'm also concerned with the differing functional behavior between debug and release
IIRC this was a compromise. In an ideal world Rust would always panic on overflow, but the performance consequences were considered to be severe enough to potentially hinder adoption. In addition, overflow checking was not considered memory safety-critical as mandatory bounds checks in safe code would prevent overflow errors from causing memory safety issues in safe Rust.
I believe at the time it was stated that if the cost of overflow checking ever got low enough checking may be (re)enabled on release builds. I'm not sure whether that's still in the cards.
It's not ideal and can lead to problems as you point out when unsafe code is involved (also e.g., CVE-2018-1000810 [0]), but that's the nature of compromises, for better or worse.
[0]: https://groups.google.com/g/rustlang-security-announcements/...
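For reference, Rust's explicit arithmetic methods let code pick a behavior that's the same in debug and release; a small sketch:

```rust
fn main() {
    let x: u8 = 250;
    // Debug builds panic on `x + 10`; release builds wrap unless
    // overflow-checks is enabled. The explicit methods are unambiguous:
    assert_eq!(x.wrapping_add(10), 4);         // wraps mod 256
    assert_eq!(x.checked_add(10), None);       // overflow surfaced as None
    assert_eq!(x.saturating_add(10), u8::MAX); // clamps at 255
}
```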
BTW, I found one of the rust rfc documents helpful for understanding the borrow checker. Do you know if there is a similar rust RFC document for the upcoming polonius borrowchecker, even if it's just a working copy? I'm having trouble finding anything beyond some blog posts.
If anything as simple as the following needs lifetime annotations then your proposed solution will not be used by anyone:
const int& f4(std::map<int, int>& map, const int& key) { return map[key]; }
Your function would look like this in Rust:
fn f4<'a>(map: &'a Map<i32, i32>, key: &i32) -> &'a i32 { ... }
You don't need much more than a superficial understanding of Rust's lifetime syntax to understand what's going on here, and you have much more information about the function.

The search term for those who'd like to follow up is "superoptimization", which is one of the perennial ideas programmers get that will Change the World if it is "just" implemented, and "why hasn't anyone else done it, I guess maybe they're just stupid", except it turns out not to work in practice. In a nutshell, the complexity classes involved just get too high.
(An interesting question I have is whether a language could be designed from the get-go to work with some useful subset of superoptimizations, but unfortunately, it's really hard to answer such questions when the bare minimum to have a good chance of success is 30 years of fairly specific experience before one even really stands a chance, and by then that's very unlikely to be what that person wants to work on.)
Some amount of non-local inference might also be possible for templated C++ code, which already lacks a proper separate compilation story.
The reason not to do this isn't impossibility, but other factors. It's computationally expensive: if you think compile times are already bad, get ready for them to get way worse. Also, changing a function body can break code in completely different parts of your program, as changing the body changes the inferred signature and can now invalidate callers.
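A small Rust sketch of why the explicit signature is a feature here (the `pick` function is made up):

```rust
// With explicit lifetimes, the signature is a contract: callers only need
// the declaration, and editing the body can't silently change it.
fn pick<'a>(a: &'a str, _b: &str) -> &'a str {
    a // returning `_b` here instead would be a compile error in this
      // function, not a signature change propagated to every caller
}

fn main() {
    let long_lived = String::from("left");
    let result;
    {
        let short_lived = String::from("right");
        // ok: the signature says `result` borrows only from `long_lived`
        result = pick(&long_lived, &short_lived);
    }
    assert_eq!(result, "left");
}
```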