Posted by circuit 3 days ago
This is great, and a bit of a buried lede. Some of the economics of mercenary spyware depend on chains with interchangeable parts, and countermeasures targeting that property directly are interesting.
Well, Apple already routinely forces developers to recompile their applications, so if Apple wants to introduce something needing a compiler / toolchain update, they can do that easily. They also control the entire SoC from start to finish and, unlike pretty much everyone else, hold an ARM Architecture License, so they can change whatever they want on the hardware side as well.
Not to mention the dynamic linker.
They also imply a very different system architecture.
Why would you need MTE if you have CHERI?
I think it's two halves of the same coin and Apple chose the second half of the coin.
The two systems are largely orthogonal; I think if Apple chose to go from one to the other it will be a generational change rather than an incremental one. The advantage of MTE/MIE is you can do it incrementally by just changing the high bits the allocator supplies; CHERI requires a fundamental paradigm shift. Apple love paradigm shifts but there's no indication they're going to do one here; if they do, it will be a separate effort.
That’s strictly better, in theory.
(Not sure it’s practically better. You could make an argument that it’s not.)
There is a section in the technical reports that talks about garbage collection.
I don't think CHERI is currently being used with different privileged threads in the same address space.
With CHERI, there is nothing to guess. You either have a capability or you don't.
That's because the capability (tagged pointer) itself is what gives you the right to access memory. So you have to find all the capabilities pointing to a segment of memory and invalidate them. Remember, capabilities are meant to be copied.
Early work on CHERI (CHERIvoke) proposed a stop-the-world barrier to revoke capabilities by doing a full scan of the program's memory (ouch!) to find and invalidate any stale capabilities. Because that is so expensive, the scan is only performed after a certain threshold amount of memory has been freed. That threshold introduces a security / battery life trade-off.
That was followed by "Cornucopia", which proposed a concurrent in-kernel scan (with some per-page flags to reduce the number of pages scanned) followed by a shorter stop-the-world. In 2024 (just last year), "Reloaded" was proposed, which adds still more MMU hardware to nearly eliminate pauses, at the cost of 10% more memory traffic.
Unfortunately, the time between free and revocation introduces a short-but-not-zero window for UAF bugs/attacks. This time gap is even explicitly acknowledged in the Reloaded paper! Moreover, the Reloaded revocation algo requires blocking all threads of an application to ensure no dead capabilities are hidden in registers.
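To make the free-to-revocation window concrete, here is a toy model of the threshold-triggered sweep described above. It is only a sketch: `SweepingAllocator`, `Capability` and the `live_caps` list (standing in for "everything a full memory scan would find") are hypothetical names, and it assumes freed memory sits in a quarantine and is not reused until the sweep runs.

```python
class Capability:
    """Toy capability: bounds plus a validity (tag) bit."""
    def __init__(self, base, length, valid=True):
        self.base = base
        self.length = length
        self.valid = valid


class SweepingAllocator:
    """Hypothetical allocator that revokes only after enough bytes are freed."""
    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes
        self.quarantine = []        # (base, length) freed but not yet revoked
        self.quarantined_bytes = 0
        self.live_caps = []         # stands in for all memory and registers
        self.next_base = 0
        self.sweeps = 0

    def malloc(self, length):
        cap = Capability(self.next_base, length)
        self.next_base += length
        self.live_caps.append(cap)
        return cap

    def copy(self, cap):
        # Capabilities are meant to be copied, so stale duplicates can exist.
        dup = Capability(cap.base, cap.length, cap.valid)
        self.live_caps.append(dup)
        return dup

    def free(self, cap):
        self.quarantine.append((cap.base, cap.length))
        self.quarantined_bytes += cap.length
        if self.quarantined_bytes >= self.threshold:
            self.sweep()

    def sweep(self):
        # The expensive part: visit every capability and clear stale ones.
        freed_bases = {base for base, _ in self.quarantine}
        for cap in self.live_caps:
            if cap.base in freed_bases:
                cap.valid = False
        self.live_caps = [c for c in self.live_caps if c.valid]
        self.quarantine.clear()
        self.quarantined_bytes = 0
        self.sweeps += 1


alloc = SweepingAllocator(threshold_bytes=100)
a = alloc.malloc(40)
stale = alloc.copy(a)
alloc.free(a)
print(stale.valid)   # True: freed, but below the threshold, so not yet revoked
b = alloc.malloc(60)
alloc.free(b)        # 40 + 60 reaches the threshold and triggers a sweep
print(stale.valid)   # False
```

A lower threshold shrinks the window but sweeps more often, which is the security / battery trade-off in this model.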
In contrast, with MTE, you just change the memory's tag on free, which immediately causes all formerly-valid pointers to the memory granule to become invalid. That's why you would want both: They're complementary.
* MTE gives truly instantaneous invalidation with zero battery impact, but only probabilistic spatial protections from attackers.
* CHERI gives deterministic spatial protection with eventually-consistent temporal invalidation semantics.
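For comparison, a minimal sketch of the retag-on-free idea. This is a toy model, not how any real allocator is written: `TaggedHeap` is a hypothetical name, a pointer is modelled as a `(tag, address)` pair, there is one tag per allocation rather than per 16-byte granule, and reserving tag 0 is an assumption.

```python
import random

TAG_BITS = 4        # MTE tags are 4 bits wide
RESERVED_TAG = 0    # assumption: tag 0 is never handed out


class TaggedHeap:
    """Toy heap where a pointer is (tag, address) and memory carries a tag."""
    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._memory_tag = {}     # base address -> tag currently on the memory
        self._next_addr = 0x1000

    def _pick_tag(self, exclude=None):
        candidates = [t for t in range(2 ** TAG_BITS)
                      if t != RESERVED_TAG and t != exclude]
        return self._rng.choice(candidates)

    def malloc(self, size):
        addr = self._next_addr
        self._next_addr += max(16, (size + 15) // 16 * 16)
        tag = self._pick_tag()
        self._memory_tag[addr] = tag
        return (tag, addr)

    def is_valid(self, ptr):
        # Models the hardware check: pointer tag must equal memory tag.
        tag, addr = ptr
        return self._memory_tag.get(addr) == tag

    def free(self, ptr):
        if not self.is_valid(ptr):
            raise MemoryError("tag mismatch on free")
        tag, addr = ptr
        # Retag the memory; every outstanding copy of ptr now mismatches.
        self._memory_tag[addr] = self._pick_tag(exclude=tag)


heap = TaggedHeap(seed=1)
p = heap.malloc(32)
alias = p                    # a copy of the pointer held elsewhere
heap.free(p)
print(heap.is_valid(alias))  # False
```

No scan is needed because nothing has to find the copies. The probabilistic part shows up later: if an allocator reuses the address and picks a fresh random tag, a stale pointer could match again by chance.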
I also think this argument is compelling because one already ships in millions of consumer devices, with more to come (MTE -> MIE), and the other does not.
Maybe you've been confused by a description of how it works inside a processor. In early CHERI designs, capabilities were in different architectural processor registers from integers.
In recent CHERI designs, the same register numbers are used for capabilities and other registers. A micro-architecture could be designed to have either all registers be capability registers with the tag bit, or use register renaming to separate integer and capability registers.
I suppose a CHERI MCU for embedded systems with small memory could theoretically have tag pages in separate SRAM instead of caching main memory, but I have not seen that.
But here’s a reason to do both: CHERI’s UAF story isn’t great. Adding MTE means you get a probabilistic story at least
Overall my _personal_ opinion is that CHERI is a huge win at a huge cost, while MTE is a huge win at a low cost. But, there are definitely vulnerability classes that each system excels at.
And CHERI fixes it only optionally, if you accept having to change a lot more code
When I say that this optional feature would force you to change a lot more code, I’m comparing CHERI without intra-object overflow protection to CHERI with intra-object overflow protection.
Finally, 6 million lines of code is not that impressive. Real OSes are measured in billions
Sorry, I misinterpreted what you were saying. No, that's not with subobject bounds. If you want that then yes there is more incompatibility, because C does not have a good subobject memory model. That's not really because there's anything wrong with CHERI, it's just because the language itself is at odds in places with doing that kind of enforcement with any technology. But, if you're willing to incur that additional friction (as we do for our pure-capability kernel in CheriBSD), you can enable it, and it can protect against additional vulnerabilities that other security technologies fundamentally cannot. We even provide a sliding scale of subobject bounds enforcement, where each of the three levels restricts bounds in more cases at the expense of compatibility. The architecture gives you the flexibility to decide what software model you want to enforce with it.
> Finally, 6 million lines of code is not that impressive.
We have far more than that ported, that was just one case study done in a few months by one developer. FreeBSD alone is, by my very rough estimation cloc that excludes LLVM, about 14 million lines of C and C++ (yes, I'm not distinguishing architecture-specific code and all kinds of other considerations, but it's close enough and gives an order of magnitude for the purposes of this conversation), and we have FreeBSD ported. Not to mention our work on, say, Chromium and V8 (Chromium being another set of 10s of millions of lines of code, again tractable with the engineering effort of just a few members of our research group).
> Real OSes are measured in billions
Citation needed. The Linux kernel is only a bit over 40 million lines of code these days. Real systems may well approach the billions of lines of code running once you factor in all the libraries, daemons and applications running on top of it, but that is not all low-level OS code that needs the kind of porting an OS or runtime does. Even if it were a billion lines of code, though, extrapolating at 0.026% that would be 260 kLoC changed, which isn't that scary a number.
Even for V8, which is about the worst case you could possibly have (highly-stylised code written in a way that uses types in CHERI-unfriendly ways; a language runtime full of pointers; many (about 6?) different highly-optimised just-in-time compilers that embed deep knowledge of the ISAs and ABIs they are targeting and like to play games with pointers in the name of performance), we see (last I checked) ~0.8% LoC changed, or about 16k out of 2 million. The porting cost is real, but the numbers have never suggested to us it's at all intractable for industry.
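The arithmetic behind those two figures, spelled out as a quick check. Only the numbers quoted above are used; `changed_lines` is a hypothetical helper name, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def changed_lines(total_loc, percent_changed):
    """Lines touched if percent_changed percent of total_loc needs porting."""
    return total_loc * Fraction(percent_changed) / 100


# Extrapolating the 0.026% case-study rate to a hypothetical billion lines.
billion_case = changed_lines(1_000_000_000, "0.026")
print(billion_case)          # 260000

# V8: about 16k changed lines out of roughly 2 million.
v8_percent = Fraction(16_000, 2_000_000) * 100
print(float(v8_percent))     # 0.8
```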
> We have used CHERI’s ISA facilities as a foundation to build a software object-capability model supporting orders of magnitude greater compartmentalization performance, and hence granularity, than current designs. We use capabilities to build a hardware-software domain-transition mechanism and programming model suitable for safe communication between mutually distrusting software
and https://github.com/CTSRD-CHERI/cheripedia/wiki/Colocation-Tu...
> Processes are Unix' natural compartments, and a lot of existing software makes use of that model. The problem is, they are heavy-weight; communication and context switching overhead make using them for fine-grained compartmentalisation impractical. Cocalls, being fast (order of magnitude slower than a function call, order of magnitude faster than a cheapest syscall), aim to fix that problem.
This functionality revolves around two functions: cocall(2) for the caller (client) side, and coaccept(2) for the callee (service) side. Underneath they are implemented using CHERI magic in the form of CInvoke / LDPBR CPU instruction to switch protection domains without the need to enter the kernel, but from the API user point of view they mostly look like ordinary system calls and follow the same conventions, errno et al.
There's a decent chance that we get back whatever performance we pay for CHERI with interest as new systems architecture possibilities open up.
MTE helps us secure existing architectures. CHERI makes new architectures possible.
That's Apple and here is Google (who have been at memory safety since the early Chrome/Android days):
Google folks were responsible for pushing on Hardware MTE ... It originally came from the folks who also did work on ASAN, syzkaller, etc ... with the help and support of folks in Android ... ARM/etc as well.
I was the director for the teams that created/pushed on it ... So I'm very familiar with the tradeoffs.
...
Put another way - the goal was to make it possible to have the equivalent of ASAN be flipped on and off when you want it.
Keeping it on all the time as a security mitigation was a secondary possibility, and has issues besides memory overhead.
For example, you will suddenly cause tons of user-visible crashes. But not even consistently. You will crash on phones with MTE, but not without it (which is most of them).
This is probably not the experience you want for a user.
For a developer, you would now have to force everyone to test on MTE enabled phones when there are ~1mn of them. This is not likely to make developers happy.
Are there security exploits it will mitigate? Yes, they will crash instead of be exploitable. Are there harmless bugs it will catch? Yes.
...
As an aside - It's also not obvious it's the best choice for run-time mitigation.
https://news.ycombinator.com/item?id=39671337
Google Security (ex: TAG & Project Zero) do so much to tackle CVEs, but with MTE the mothership dropped the ball so hard.
AOSP's security posture is frustrating (as Google seemingly solely decides what's good and what's bad and imposes that decision on each of their 3bn users & ~1m developers, despite some in the security community, like Daniel Micay, urging them to reconsider). The steps Apple has been taking (in both empowering the developers and locking down its own OS) in response to Celebgate and Pegasus hacks has been commendable.
I do agree it is a pain not seeing this becoming widely adopted.
As for disabling JIT, it would have the same effect as early Androids, lagging behind Symbian devices, with applications that were wrappers around NDK code.
DVM tried to mitigate the slowness with JIT+SSA, but ART mixed in JIT+SSA alongside AOT+PGO (that is, a no JITing ART means a full AOT ART, unlike in DVM where the Interp takes over when in vmSafeMode). Even if the runtime will continue to lag in terms of power/performance efficiency wrt ObjC/Swift, Google should at least let the developers decide if they want to disallow JIT from creating executable memory regions inside their app's sandbox, like Apple does: https://developer.apple.com/documentation/security/hardened-...
Okay a bit drastic, I don’t really know if this will affect them.
FWIW, I presume this is "from experience"--rather than, from first principles, which is how it comes off--as this is NOT how their early kernel memory protections worked ;P. In 2015, with iOS 9, Apple released Kernel Patch Protection (KPP), which would verify that the kernel hadn't been modified asynchronously--and not even all that often, as I presume it was an expensive check--and panic if it detected corruption.
https://raw.githubusercontent.com/jakeajames/rootlessJB/mast...
> First let’s consider our worst enemy since iOS 9: KPP (Kernel Patch Protection). KPP keeps checking the kernel for changes every few minutes, when device isn’t busy.
> That “check every now and then” thing doesn’t sound too good for a security measure, and in fact a full bypass was released by Luca Todesco and it involves a design flaw. KPP does not prevent kernel patching; it just keeps checking for it and if one is caught, panics the kernel. However, since we can still patch, that opens up an opportunity for race conditions. If we do things fast enough and then revert, KPP won’t know anything ;)
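The weakness quoted above is a check-versus-prevent gap, and a tiny timing model shows why. This is only an illustration under a simplifying assumption: checks are modelled as strictly periodic (at t = 0, interval, 2 * interval, ...), whereas the quote says KPP ran "every few minutes, when device isn't busy". `is_detected` is a hypothetical helper, not anything from Apple's implementation.

```python
def is_detected(check_interval, patch_start, patch_duration):
    """True if a periodic check lands inside [patch_start, patch_start + patch_duration)."""
    # First check time at or after the patch is applied (ceiling division).
    first_check = -(-patch_start // check_interval) * check_interval
    return first_check < patch_start + patch_duration


# Checks every 300 s; a modification applied at t=10 s and reverted 5 s later.
print(is_detected(300, 10, 5))     # False: no check falls in the window
# The same 20 s modification straddling the check at t=300 s is caught.
print(is_detected(300, 290, 20))   # True
```

A mechanism that blocks the write itself has no such window, which is the distinction the quoted write-up is drawing.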
I interpreted that as what they came up with when first looking at/starting to implement MTE, not their plan since $longTimeAgo.
Apple has certainly gotten better about security, and I suspect things like what you listed are a big part of why. They were clearly forced to learn a lot by jailbreakers.
Correct me if I'm wrong, but the spyware that has been developed certainly could be applied at scale at the push of a button with basic modification. They just have chosen not to at this time. I feel like this paragraph is drawing a bigger distinction than actually exists.
If that's the case, then many of their public statements about this are extraordinarily dishonest. There are widespread exploits targeting Safari, Chrome, iOS and Android. These are not only rare attacks targeting people heavily sought out by governments, etc. They do not have nearly as much visibility into it as they make it seem.
I wonder when the first person will be turned away from a US border for having an iPhone Air that CBP's phone extraction tool doesn't work on?
Personally I didn’t read it as a swipe against Android. If it was I don’t personally know what attack(s) it’s referring to outside of the possibility of malware installed by the vendor.
But if it’s installed by the vendor, they can really do anything can’t they. That’s not really a security breach. Just trust.
It’s my understanding that this won’t protect you in the case where the attacker has a chance to try multiple times.
The approach would be something like: go out of bounds far enough to skip the directly adjacent object, or do a use after free with a lot of grooming, so that you get a chance of getting a matching tag. The probability of getting a matching tag is 1/16.
But this post doesn’t provide enough details for me to be super confident about what I’m saying. Time will tell! If this is successful then the remaining exploit chains will have to rely on logic bugs, which would be super painful for the bad guys
The main weakness is that MTE is only 4 bits... and it's not even 1/16 but typically 1/15 chance of bypassing it since a tag is usually reserved for metadata, free data, etc. The Linux kernel's standard implementation for in-kernel usage unnecessarily reserves more than 1 to make debugging easier. MTE clears the way for a more serious security focused memory tagging implementation with far more bits and other features. It provides a clear path to providing very strong protection against the main classes of vulnerabilities used in exploits, especially remote/proximity ones. It's a great feature but it's more what it leads to that's very impressive than the current 4 bit MTE. Getting rid of some known side channels doesn't make it into a memory safety implementation.
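As a rough illustration of how tag width and reserved tags move that number, assuming tags are drawn uniformly from the usable set (`guess_probability` is a hypothetical helper, and the 8-bit row is only a "what if", not a real design):

```python
from fractions import Fraction


def guess_probability(tag_bits, reserved_tags=0):
    """Chance that one blind guess matches a uniformly chosen usable tag."""
    usable = 2 ** tag_bits - reserved_tags
    if usable < 1:
        raise ValueError("no usable tags left")
    return Fraction(1, usable)


print(guess_probability(4))                    # 1/16
print(guess_probability(4, reserved_tags=1))   # 1/15
print(guess_probability(8, reserved_tags=1))   # 1/255
```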
Others are aware of where MTE needs improvement and have been working on it for years. Cortex shipped MTE with a side channel issue which is better than not shipping it and it will get addressed. Apple has plenty of their own side channel vulnerabilities for their CPUs. Deterministic protections provided via MTE aren't negatively impacted by the side channel and also avoid depending on only 4 bits of entropy. The obvious way to use MTE is not the only way to use it.
GrapheneOS began using MTE in production right after the Pixel 8 provided a production quality implementation, which was significantly later than it could have been made available since Pixels aren't early adopters of new Cortex cores. On those cores, asynchronous MTE is near free and asymmetric is comparable to something like -fstack-protector-strong. Synchronous is relatively expensive, so making that perform better than the early Cortex cores providing MTE seems to be where Apple made a significant improvement. Apple has higher end, larger cores than the current line of Cortex cores. Qualcomm's MTE implementation will be available soon and will be an interesting comparison. We expect Android to heavily adopt it and therefore it will be made faster out of necessity. The security advantage of synchronous over asymmetric for userspace is questionable. It's clearer within the kernel, where little CPU time is spent on an end user device. We use synchronous in the kernel and asymmetric in userspace. We haven't offered full synchronous as an option mainly because we don't have any example of it making a difference. System calls act as a synchronization point in addition to reads. io_uring isn't available beyond a few core processes, etc.
I just want to address this part. Why shouldn't Apple advertise or market its achievements here? If they're effectively mitigating and/or frustrating real-world attacks and seem to eliminate a class of security bugs, why shouldn't they boast about it? It shows that security R&D is at the forefront of the products they build, which is an effective strategy for selling more product to the security-conscious consumer.
Not a shill, but a shareholder, and I invest in Apple because they're at the forefront of a lot of tech.
Unsure about iOS, but back then, Webkit published their initial mitigations (like: Index masking, Pointer poisoning): https://webkit.org/blog/8048/what-spectre-and-meltdown-mean-...
In practice, it is 15/16 chance of detection of the exploit attempt. Which is an extraordinarily high rate of detection, which will lead to a fix by Apple.
Net net, huge win. But I agree they come across as overstating the prevention aspect.
But what if the only thing available to purchase is 1/16 or 1/256? Then maybe it’s not so miserable
That makes the probability work against the attacker really well. But it’s not a guarantee
What we're essentially saying is that evading detection is now 14/15 of the battle, from the attacker's perspective. Those people are very clever
There have been multiple full-chain attacks since the introduction of PAC. It hasn’t been a meaningful attack deterrent because attackers keep finding PAC bypasses. This should give you pause as to how secure EMTE actually is.
Sure, the whole sentence is a bit of a weird mess. Paraphrased: it made exploits more complex, so we concluded that we needed a combined SW/HW approach. What I read into that is that they're admitting PAC didn't work, so they needed to come up with a new approach and part of that approach was to accept that they couldn't do it using either SW or HW alone.
Then again... I don't know much about PAC, but to me it seems like it's a HW feature that requires SW changes to make use of it, so it's kind of HW+SW already. But that's a pointless quibble; EMTE employs a lot more coordination and covers a lot more surface, iiuc.
Correction: it forces attackers to find PAC bypasses. They are not infinite.
Xbox One, 2013? Never hacked.
Nintendo Switch 2, 2025? According to reverse engineers... flawlessly secure microkernel and secure monitor built over the Switch 1 generation. Meanwhile NVIDIA's boot code is formally verified this time, written in the same language (ADA SPARK) used for nuclear reactors and airplanes, on a custom RISC-V chip.
iPhone? iOS 17 and 18 have never been jailbroken; now we introduce MIE.
Apple are definitely doing the best job that any firm ever has when it comes to mitigation, by a wide margin. Yet, we still see CVEs drop that are marked as used in the wild in exploit chains, so we know someone is still at it and still succeeding.
When it comes to the Xbox One, it’s an admirable job, in no small part because many of the brightest exploit developers from the Xbox 360 scene were employed to design and build the Xbox One security model. But even still, it’s still got little rips at the seams even in public: https://xboxoneresearch.github.io/games/2024/05/15/xbox-dump...
For example, I might know of an unrelated exploit I'm sitting on because I don't want it fixed and so far it hasn't been.
I think the climate has become one of those "don't correct your adversary when they make mistakes" types of things versus an older culture of release clout.
There are still plenty of other flaws besides memory unsafety to exploit. I doubt that we'll see like a formally proven mainstream OS for a long time.
So far as you know. There's a reason they call them zero-day vulnerabilities.
Not publicly :)
This point could use more explanation. The fundamental problem here is the low entropy of the tags (only 4 bits). An attacker who randomly guesses the tags has 1/16 chance of success. That is not fixed by reseeding the PRNG. So I am not sure what they mean.
It isn't great. Most users won't assume malice when an app crashes. And if they reopen it a few times your chance of succeeding goes up quickly. But this is also assuming that you need a single pointer tag to exploit something. If you need more you need to get even luckier.
So it definitely isn't perfect protection. But it isn't trivial to bypass.
> If you need more you need to get even luckier.
This is a good point. I'm not an expert, but I'm guessing one is rarely enough, which would exponentially decrease your chances of success by brute force, e.g. needing 2 tags would be 1/256, etc.
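Both effects from this subthread (more tags needed per attempt, more attempts allowed) can be put in one small sketch. It assumes every guess is independent and uniform over 16 tags, which real exploits and real allocators may not satisfy; the function names are hypothetical.

```python
from fractions import Fraction

P_TAG = Fraction(1, 16)   # one blind guess against a 4-bit tag


def chain_success(p_single, tags_needed):
    """One attempt succeeds only if every required tag guess is right."""
    return p_single ** tags_needed


def success_within(p_attempt, attempts):
    """Chance that at least one of several independent attempts succeeds."""
    return 1 - (1 - p_attempt) ** attempts


print(chain_success(P_TAG, 2))    # 1/256
print(success_within(P_TAG, 3))   # 721/4096, about 17.6%
```

In this model each extra required tag divides the per-attempt odds by 16, while each retry only adds a little, and every failed retry is a crash that can be noticed.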
Is the implication here that making phones more secure is... bad? Because it makes jailbreaks harder to develop?
*: or whatever else people use jailbreaks for these days
Just like any weapon, "security" is only good if it's in your control. When the noose is around your neck, you'd better hope it easily breaks.
It will be very hard to buy something that won't exist in the near future. This rhetoric should've died a decade ago.
This doesn't change the fact that you're being gradually locked up, though.
GrapheneOS makes similar security improvements, but it doesn't lock the escape hatches or stifle our freedoms. I could still root my device if I wanted to (although this is not recommended) and I can turn exploit protections off and customize the level of enforcement in detail, per-app, if I want/need to.
The rhetoric of blaming consumers for buying the wrong product when they complain about hostile features on Apple's side of the duopoly, and then blaming them again when they switch to Android and complain about hostile features on that side.
The rhetoric of blaming the consumers for simply "not demanding" what we want with enough conviction. It's an asinine thing to suggest because freedom to install and customize has been the headline feature of Android since day 1, but they're killing it anyway because the duopoly doesn't give a shit about what we want. They know that they can make more money and they know that we don't have a choice.
> Recommending buying stuff that supports your wishes seems like pretty reasonable advice.
No, not when the market is a well-known abusive duopoly. That's either ignorant of the reality or just gaslighting.
But in this specific case I think it does still seem strange to raise a concern that one of the most notorious locked down vendors is shipping a security improvement because it also makes it harder to get full device access.
Maybe a better way of phrasing my point is that the problem isn't that these devices are secure, that is a good feature. The problem is that Apple doesn't let you control the device. I would focus my complaints on the latter, not complain about every security improvement because it also happens to contribute to the real problem.
...all the way back to pen and paper