Both approaches revealed the same conclusion: Memory Integrity Enforcement vastly reduces the exploitation strategies available to attackers. Though memory corruption bugs are usually interchangeable, MIE cut off so many exploit steps at a fundamental level that it was not possible to restore the chains by swapping in new bugs. Even with substantial effort, we could not rebuild any of these chains to work around MIE. The few memory corruption effects that remained are unreliable and don’t give attackers sufficient momentum to successfully exploit these bugs.

This is great, and a bit of a buried lede. Some of the economics of mercenary spyware depend on chains with interchangeable parts, and countermeasures targeting that property directly are interesting.

▲ leoc 3 days ago | parent | next [-]

In terms of Apple Kremlinology, should this be seen a step towards full capability-based memory safety like CHERI ( https://en.wikipedia.org/wiki/Capability_Hardware_Enhanced_R... ) or more as Apple signaling that it thinks it can get by without something like CHERI?

▲

bri3d 3 days ago | parent | next [-]

IMO it's the latter; CHERI requires a lot of heavy lifting at the compile-and-link layer that restricts application code behaviors, and an enormous change to the microarchitecture. On the other hand, heap-cookies / tag secrets can be delegated to the allocator at runtime in something like MIE / MTE, and existing component-level building blocks like the SPTM can provide some of the guarantees without needing a whole parallel memory architecture for capabilities like CHERI demands.

▲

mschuster91 3 days ago | parent | next [-]

> CHERI requires a lot of heavy lifting at the compile-and-link layer that restricts application code behaviors, and an enormous change to the microarchitecture.

Well, Apple already routinely forces developers to recompile their applications so if Apple wants to introduce something needing a compiler / toolchain update they can do that easily. And they also control the entire SoC from start to finish and unlike pretty much everyone else also hold an ARM Architecture License so they can go and change whatever they want in the hardware side as well.

▲

jrtc27 3 days ago | parent | prev | next [-]

To reiterate what I've said elsewhere, CHERI does not need a whole parallel memory architecture, there is just one that gets a slight extension over a non-CHERI/MTE system to include tags. But that is the same story as MTE, which also needs to propagate the tags in the memory system (and in fact, more tags, since we just need one bit per 16 bytes, whereas MTE needs 4 bits per 16 bytes in the common scheme).

▲

checker659 3 days ago | parent | prev [-]

> compile-and-link layer

Not to mention the dynamic linker.

	▲	jrtc27 3 days ago \| parent [-]
		Yeah you need a compiler, linker and OS. That's true of any security technology. CHERI may be more significant in that regard because it's a bigger rethink than just stuffing some extra metadata into the existing types, but it's not at all intractable. We, a research group, maintain CheriBSD, a "full-fat" port of FreeBSD to CHERI (Morello and CHERI-RISC-V), so to a big tech organisation it's a small investment. The cost to tech companies is not making it work, it's often much more boring business factors.

▲

pizlonator 3 days ago | parent | prev [-]

MTE and CHERI are so different that it’s hard and maybe not even possible to do both at the same time (you might not have enough spare bits in a CHERI 128 bit ptr for the MTE tag)

They also imply a very different system architecture.

▲

jrtc27 3 days ago | parent | next [-]

We actually have ideas for how to combine the two; see section C.5 of https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-987.pdf

▲

leoc 3 days ago | parent | prev | next [-]

Sure, I'm not suggesting that Apple might actually do both at the same time. They could however implement the less burdensome one now while intending to replace it with the the all-singing-all-dancing alternative down the line.

	▲	pizlonator 3 days ago \| parent [-]
		Gotcha. My point about different systems architectures makes me think it’s unlikely that you’d want to do that

▲

quotemstr 3 days ago | parent | prev [-]

> MTE and CHERI are so different that it’s hard and maybe not even possible to do both at the same time (you might not have enough spare bits in a CHERI 128 bit ptr for the MTE tag)

Why would you need MTE if you have CHERI?

▲

bri3d 3 days ago | parent | next [-]

Why would you need CHERI if you have working mitigations that don't demand a second bus?

I think it's two halves of the same coin and Apple chose the second half of the coin.

The two systems are largely orthogonal; I think if Apple chose to go from one to the other it will be a generational change rather than an incremental one. The advantage of MTE/MIE is you can do it incrementally by just changing the high bits the allocator supplies; CHERI requires a fundamental paradigm shift. Apple love paradigm shifts but there's no indication they're going to do one here; if they do, it will be a separate effort.

▲

pizlonator 3 days ago | parent | next [-]

CHERI is deterministic.

That’s strictly better, in theory.

(Not sure it’s practically better. You could make an argument that it’s not.)

▲

VogonPoetry 3 days ago | parent | next [-]

This is on the verge of pedantry - CHERI determinism isn't strictly true, garbage collecting abandoned descriptors is currently done asynchronously. Malicious code could attempt to reuse an abandoned descriptor before it is "disappeared". I think it might be possible to construct a synthetic situation where two threads operating with perhaps different privilege in the same address space (something CHERI can support!) have an IPC channel might be affected by the timing.

There is a section in the technical reports that talks about garbage collection.

I don't think CHERI is currently being used with different privileged threads in the same address space.

▲

Findecanor 3 days ago | parent [-]

I suspect that the parent poster was referring to MTE's memory protection being probabilistic. There are only 16 tag values for an attacker to guess. You can combine MTE and PAC, but PAC is also only probabilistic.

With CHERI, there is nothing to guess. You either have a capability or you don't.

	▲	labcomputer 2 hours ago \| parent [-]
		Right, but the problem with CHERI is that you may (probabilistically) continue to have that capability even after you shouldn't. That's the problem. That's because the capability (tagged pointer) itself is what gives you the right to access memory. So you have to find all the capabilities pointing to a segment of memory and invalidate them. Remember, capabilities are meant to be copied. Early work on CHERI (CHERIvoke) proposed a stop-the-world barrier to revoke capabilities by doing a full scan of the program's memory (ouch!) to find and invalidate any stale capabilities. Because that is so expensive, the scan is only performed after a certain threshold amount of memory has been freed. That threshold introduces a security / battery life trade-off. That was followed by "Cornucopia", which proposed a concurrent in-kernel scan (with some per-page flags to reduce the number of pages scanned) followed by a shorter stop-the-world. In 2024 (just last year), "Reloaded" was proposed, which add still more MMU hardware to nearly eliminate pauses, at the cost of 10% more memory traffic. Unfortunately, the time between free and revocation introduces a short-but-not-zero window for UAF bugs/attacks. This time gap is even explicitly acknowledged in the Reloaded paper! Moreover, the Reloaded revocation algo requires blocking all threads of an application to ensure no dead capabilities are hidden in registers. In contrast, with MTE, you just change the memory's tag on free, which immediately causes all formerly-valid pointers to the memory granule to become invalid. That's why you would want both: They're complementary. * MTE gives truly instantaneous invalidation with zero battery impact, but only probabilistic spatial protections from attackers. * CHERI gives deterministic spatial protection with eventually-consistent temporal invalidation semantics.

▲

bri3d 3 days ago | parent | prev [-]

FWIW (I am a nobody compared to you; I didn't make FIL-C :) ) - I think that MIE/MTE are practically superior to CHERI.

I also think this argument is compelling because one exists in millions of consumer drives, to-be-more (MTE -> MIE) and one does not.

▲

als0 3 days ago | parent | prev [-]

Second bus?

▲

bri3d 3 days ago | parent [-]

CHERI fundamentally relies on capabilities living in memory that is architecturally separate from program memory. You could do so using a bus firewall, but then you're at the same place as MIE with the SPTM.

	▲	jrtc27 3 days ago \| parent \| next [-]
		That's not true. Capabilities are in main memory as much as any other data. The tags are in separate memory (whether a wider SRAM, DRAM ECC bits, or a separate table off on the side in a fraction of memory that's managed by the memory controller; all three schemes have been implemented and have trade-offs). But this is also true of MTE; you do not want those tags in normal software-visible main memory either, they need to be protected.
	▲	Findecanor 3 days ago \| parent \| prev \| next [-]
		A CHERI capability is stored in main memory but with the tag bit for that location set. The tags are stored in separate memory pages, also in main memory in current designs. Maybe you've been confused by a description of how it works inside a processor. In early CHERI designs, capabilities were in different architectural processor registers from integers. In recent CHERI designs, the same register numbers are used for capabilities and other registers. A micro-architecture could be designed to have either all registers be capability registers with the tag bit, or use register renaming to separate integer and capability registers. I suppose a CHERI MCU for embedded systems with small memory could theoretically have tag pages in separate SRAM instead of caching main memory, but I have not seen that.
	▲	MBCook 3 days ago \| parent \| prev [-]
		So something like having built in RAM for the pagetables that aren’t part of the normal pool? That way no matter what kind of attack you come up with user space cannot pass a pointer to it?

▲

pizlonator 3 days ago | parent | prev [-]

Not saying you’d want both. Just answering why MTE isn’t a path to CHERI

But here’s a reason to do both: CHERI’s UAF story isn’t great. Adding MTE means you get a probabilistic story at least

▲

bri3d 3 days ago | parent | next [-]

True! On the flip side, MTE sucks at intra-object corruption: if I get access to a heap object with pointers, MTE doesn't affect me, I can go ahead and write to that object because I own the tag.

Overall my _personal_ opinion is that CHERI is a huge win at a huge cost, while MTE is a huge win at a low cost. But, there are definitely vulnerability classes that each system excels at.

▲

pizlonator 3 days ago | parent [-]

I think the intra object issue might be niche enough to not matter.

And CHERI fixes it only optionally, if you accept having to change a lot more code

▲

jrtc27 3 days ago | parent | next [-]

Where studies suggest "a lot" is sub-0.1%. For example, https://www.capabilitieslimited.co.uk/_files/ugd/f4d681_e0f2... was a study into porting 6 million lines of C and C++ to run a KDE+X11 desktop stack on CHERI, and saw 0.026% LoC change, or ~1.5k LoC out of ~6 million LoC, all done in just 3 months by one person. That's even an overestimate, because it includes many changes to build systems just to be able to cross-compile the projects. It's not nothing, but it's the kind of thing where a single engineer can feasibly port large bodies of code. Yes, certain systems code will be worse (like JITs), but the vast majority of cases are not that, and even those are still feasible (e.g. we have people working with Chromium and V8).

▲

pizlonator 2 days ago | parent [-]

Does that study include enabling intra object overflow protection, or not?

When I say that this optional feature would force you to change a lot more code I’m comparing CHERI without intra object overflow protection to CHERI with intra object object overflow protection.

Finally, 6 million lines of code is not that impressive. Real OSes are measured in billions

	▲	jrtc27 2 days ago \| parent [-]
		> Does that study include enabling intra object overflow protection, or not? > > When I say that this optional feature would force you to change a lot more code I’m comparing CHERI without intra object overflow protection to CHERI with intra object object overflow protection. Sorry, I misinterpreted what you were saying. No, that's not with subobject bounds. If you want that then yes there is more incompatibility, because C does not have a good subobject memory model. That's not really because there's anything wrong with CHERI, it's just because the language itself is at odds in places with doing that kind of enforcement with any technology. But, if you're willing to incur that additional friction (as we do for our pure-capability kernel in CheriBSD), you can enable it, and it can protect against additional vulnerabilities that other security technologies fundamentally cannot. We even provide a sliding scale of subobject bounds enforcement, where each of the three levels restricts bounds in more cases at the expense of compatibility. The architecture gives you the flexibility to decide what software model you want to enforce with it. > Finally, 6 million lines of code is not that impressive. We have far more than that ported, that was just one case study done in a few months by one developer. FreeBSD alone is, by my very rough estimation cloc that excludes LLVM, about 14 million lines of C and C++ (yes, I'm not distinguishing architecture-specific code and all kinds of other considerations, but it's close enough and gives an order of magnitude for the purposes of this conversation), and we have FreeBSD ported. Not to mention our work on, say, Chromium and V8 (Chromium being another set of 10s of millions of lines of code, again tractable with the engineering effort of just a few members of our research group). > Real OSes are measured in billions Citation needed. The Linux kernel is only a bit over 40 million lines of code these days. Real systems may well approach the billions of lines of code running once you factor in all the libraries, daemons and applications running on top of it, but that is not all low-level OS code that needs the kind of porting an OS or runtime does. Even if it were a billion lines of code, though, extrapolating at 0.026% that would be 260 kLoC changed, which isn't that scary a number. Even V8, which is about the worse case you could possibly have (highly-stylised code written in a way that uses types in CHERI-unfriendly ways; a language runtime full of pointers; many (about 6?) different highly-optimised just-in-time compilers that embed deep knowledge of the ISAs and ABIs they are targeting and like to play games with pointers in the name of performance) we see (last I checked) ~0.8% LoC changed, or about 16k out of 2 million. The porting cost is real, but the numbers have never suggested to us it's at all intractable for industry.

▲

bri3d 3 days ago | parent | prev [-]

I think I broadly agree with you. IMO tagging is practically much, much more valuable than capabilities systems modeled like CHERI.

▲

quotemstr 3 days ago | parent [-]

Yes, but CHERI opens whole new system design possibilities, including things like ultra-cheap intra-address-space security boundaries. See https://www.cl.cam.ac.uk/research/security/ctsrd/pdfs/201607...

> We have used CHERI’s ISA facilities as a foundation to build a software object-capability model supporting orders of magnitude greater compartmentalization performance, and hence granularity, than current designs. We use capabilities to build a hardware-software domain-transition mechanism and programming model suitable for safe communication between mutually distrusting software

and https://github.com/CTSRD-CHERI/cheripedia/wiki/Colocation-Tu...

> Processes are Unix' natural compartments, and a lot of existing software makes use of that model. The problem is, they are heavy-weight; communication and context switching overhead make using them for fine-grained compartmentalisation impractical. Cocalls, being fast (order of magnitude slower than a function call, order of magnitude faster than a cheapest syscall), aim to fix that problem.

This functionality revolves around two functions: cocall(2) for the caller (client) side, and coaccept(2) for the callee (service) side. Underneath they are implemented using CHERI magic in the form of CInvoke / LDPBR CPU instruction to switch protection domains without the need to enter the kernel, but from the API user point of view they mostly look like ordinary system calls and follow the same conventions, errno et al.

There's a decent chance that we get back whatever performance we pay for CHERI with interest as new systems architecture possibilities open up.

MTE helps us secure existing architectures. CHERI makes new architectures possible.

▲

saagarjha 3 days ago | parent [-]

Yes, but this breaks mirror mappings.

▲

jrtc27 3 days ago | parent [-]

Can you elaborate on what you perceive as broken?

▲

saagarjha 3 days ago | parent [-]

mremap?

	▲	jrtc27 2 days ago \| parent \| next [-]
		You may wish to read what the current pure-capability CHERI Linux user ABI specifies for mremap(), because we (primarily Arm, in conjunction with us) have thought about this, and the conclusion is not "the existence of mremap() makes CHERI undeployable". See https://git.morello-project.org/morello/kernel/linux/-/wikis...
	▲	quotemstr 2 days ago \| parent \| prev [-]
		Add a a sliding window aliasing mode to the hardware? You'd set a page table bit saying "check capabilities not against my VA, but those VAs over there"

▲

quotemstr 3 days ago | parent | prev [-]

Some progress on UAF though! https://dl.acm.org/doi/10.1145/3703595.3705878

▲ ignoramous 3 days ago | parent | prev | next [-]

> This is great ...

That's Apple and here is Google (who have been at memory safety since the early Chrome/Android days):

  Google folks were responsible for pushing on Hardware MTE ... It originally came from the folks who also did work on ASAN, syzkaller, etc ... with the help and support of folks in Android ... ARM/etc as well.

  I was the director for the teams that created/pushed on it ... So I'm very familiar with the tradeoffs.
  
  ...

  Put another way - the goal was to make it possible to use have the equivalent of ASAN be flipped on and off when you want it.

  Keeping it on all the time as a security mitigation was a secondary possibility, and has issues besides memory overhead.

  For example, you will suddenly cause tons of user-visible crashes. But not even consistently. You will crash on phones with MTE, but not without it (which is most of them).

  This is probably not the experience you want for a user.

  For a developer, you would now have to force everyone to test on MTE enabled phones when there are ~1mn of them. This is not likely to make developers happy.

  Are there security exploits it will mitigate? Yes, they will crash instead of be exploitable. Are there harmless bugs it will catch? Yes.

  ...

  As an aside - It's also not obvious it's the best choice for run-time mitigation.

https://news.ycombinator.com/item?id=39671337

Google Security (ex: TAG & Project Zero) do so much to tackle CSVs but with MTE the mothership dropped the ball so hard.

▲

tptacek 3 days ago | parent [-]

This is a Daniel Berlin post explaining why Google didn't originally enable MTE full-time on Android. It explicitly acknowledges that keeping MTE enforcement enabled for everyone would block vulnerabilities.

▲

ignoramous 3 days ago | parent [-]

Unfortunate Daniel Berlin did not push Google to invest in MTE for security specifically, like Apple has done now with EMTE (MTE v4?). I mean, AOSP is investing heavily in rewriting core components like Binder IPC in Rust for memory safety instead... They also haven't resurrected the per-app toggle to disable JIT in ART for Java/Kotlin apps (like DVM's android:vmSafeMode)... especially after having delivered on-device "Isolated compilation" but (from what I can tell) only for OS (Java/Kotlin) components.

AOSP's security posture is frustrating (as Google seemingly solely decides what's good and what's bad and imposes that decision on each of their 3bn users & ~1m developers, despite some in the security community, like Daniel Micay, urging them to reconsider). The steps Apple has been taking (in both empowering the developers and locking down its own OS) in response to Celebgate and Pegasus hacks has been commendable.

▲

saagarjha 3 days ago | parent | next [-]

Google did invest in MTE. In fact you linked to some of their investments that ended up trickling down to Android. The problem is actually shipping this is hard and Google was not able to do it. No, "some in the security community" being loud does not mean it is ready to ship. Google identified several problems that they were not able to solve and thus did not ship it generally.

▲

pjmlp 3 days ago | parent | prev [-]

Meanwhile Oracle has been doing it since 2015 with SPARC ADI on Solaris.

I do agree it is a pain not seeing this becoming widely adopted.

As for disabling JIT, it would have the same effect as early Androids, lagging behind Symbian devices, with applications that were wrappers around NDK code.

	▲	ignoramous 2 days ago \| parent [-]
		> As for disabling JIT, it would have the same effect as early Androids DVM tried to mitigate the slowness with JIT+SSA, but ART mixed in JIT+SSA alongside AOT+PGO (that is, a no JITing ART means a full AOT ART, unlike in DVM where the Interp takes over when in vmSafeMode). Even if the runtime will continue to lag in terms of power/performance efficiency wrt ObjC/Swift, Google should at least let the developers decide if they want to disallow JIT from creating executable memory regions inside their app's sandbox, like Apple does: https://developer.apple.com/documentation/security/hardened-...

▲ commandersaki 3 days ago | parent | prev [-]

RIP Vigilant Labs

Okay a bit drastic, I don’t really know if this will affect them.

	▲	tptacek 3 days ago \| parent [-]
		I think they're going to print money hats, but we'll see. Remember: there isn't a realistic ceiling on what NATO-friendly intelligence and law enforcement agencies will pay for this technology; it competes with human intelligence, which is nosebleed expensive.