| ▲ | jandrewrogers 2 days ago |
| As someone that uses pointer tagging, I must point out that this article is insufficiently defensive. I've done my own exploration of what I can get away with across 64-bit x86 and ARM in this regard. It has been a while but the maximum number of bits that are reliably taggable across all environments and use cases that I have been able to determine is six. Can you get away with more? Probably yes, but there are identifiable environments where it will explode if you do so. That may not apply to your use case. Reliable pointer tagging is not trivial. |
|
| ▲ | dzaima a day ago | parent | next [-] |
| At least on Linux, you're guaranteed to get the top 16 bits free on at least x86[1] and ARM[2]. Maybe other OSes are less nice, but generally an OS doesn't really have a reason to force down the full 57-or-whatever-bit range before the program actually requests 256 terabytes of virtual address space. It is generally kinda sad though that there's not a way to request from mmap or equivalent that the result is in a specific range of memory (in (0; 1<<48) here). Would be useful for JIT-compiling code that needs to call into precompiled functions too. [1]: https://www.kernel.org/doc/html/v5.8/x86/x86_64/5level-pagin... [2]: https://www.kernel.org/doc/html/v5.8/arm64/memory.html#bit-u... |
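A minimal sketch of the defensive check this implies, assuming a 64-bit POSIX system where `MAP_ANONYMOUS` is available. The helper names `top16_clear` and `map_low48` are hypothetical, not part of any library. It maps memory without an address hint and rejects the result if the top 16 bits are not zero:

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: true if the top 16 bits of p are all zero.
 * Assumes uintptr_t is 64 bits wide. */
static bool top16_clear(const void *p) {
    return ((uintptr_t)p >> 48) == 0;
}

/* Hypothetical helper: map anonymous memory with no address hint and
 * reject the mapping if it lands outside the low 48-bit range. */
static void *map_low48(size_t len) {
    assert(len > 0);  /* mmap fails on a zero length */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return NULL;
    }
    if (!top16_clear(p)) {
        munmap(p, len);   /* unusable for 16-bit high tagging */
        return NULL;
    }
    return p;
}
```

A caller would treat a `NULL` return as "cannot tag pointers from this mapping" and fall back to an untagged representation.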
| |
| ▲ | bigstrat2003 a day ago | parent [-] | | > Maybe other OSes are less nice, but generally an OS doesn't really have a reason to force down the full 57-or-whatever-bit range before the program actually requests 256 terabytes of virtual address space. The fact that Linux does this isn't nice, it's a huge mistake. It means that the kernel can't automatically use 5-level page tables on processors that support it, because backwards compatibility guarantees mean the programs must be able to use those bits in a pointer. AMD was smart enough to throw an exception if programs use those bits in a pointer (thus guaranteeing forward compatibility), so why Linux didn't follow suit is puzzling. | | |
| ▲ | dzaima a day ago | parent [-] | | Indeed most architectures error if the top bits are non-zero, but that's trivially worked around by just a manual `x & 0xffffffffffff` or `x << 16 >> 16` or similar, and software already utilizes that (and indeed must do that to not immediately break on x86). It is somewhat unfortunate to just force the larger address space to specific mmap usage, but it's hard for me to imagine many programs actually needing more than 256TB of virtual memory that aren't doing so in a very specialized way. Certainly much less frequent than the already-infrequent (but very much existing, and significant! incl. both Firefox/SpiderMonkey and WebKit/JavaScriptCore) cases of programs relying on the top 16 bits being zero. Then there's the option of mmap returning ranges from the low 2^48 while possible, and using larger addresses only when that completely runs out; should mean existing software works fine before it needs more than 256TB of RAM, and, if the software checks that the top bits of mmap's result are zero, it's not negatively affected anyway. Really the proper solution is to go back in time and make mmap have separate lower- and upper-bound addresses though. |
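The two untagging forms mentioned above can be sketched as follows. This is an illustration only, assuming 64-bit user-space pointers whose top 16 bits are zero; `ADDR_BITS`, `tag_ptr`, `ptr_tag`, `untag_mask` and `untag_shift` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 48u                               /* assumed address width */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)  /* 0x0000ffffffffffff */

/* Pack a 16-bit tag into the top bits of a user-space pointer. */
static uint64_t tag_ptr(void *p, uint16_t tag) {
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & ~ADDR_MASK) == 0);  /* top bits must start out zero */
    return bits | ((uint64_t)tag << ADDR_BITS);
}

/* Read the tag back out of the top 16 bits. */
static uint16_t ptr_tag(uint64_t tagged) {
    return (uint16_t)(tagged >> ADDR_BITS);
}

/* Strip the tag with a mask: the `x & 0xffffffffffff` form. */
static void *untag_mask(uint64_t tagged) {
    return (void *)(uintptr_t)(tagged & ADDR_MASK);
}

/* Strip the tag with a shift pair: the `x << 16 >> 16` form.
 * On an unsigned value the right shift fills with zeroes. */
static void *untag_shift(uint64_t tagged) {
    return (void *)(uintptr_t)(tagged << 16 >> 16);
}
```

Both forms give the same result for user-space addresses. With a signed type, the shift pair would sign-extend bit 47 instead, which is what a canonical kernel-half address would need.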
|
|
|
| ▲ | orlp a day ago | parent | prev | next [-] |
| > Reliable pointer tagging is not trivial. It is if you use alignment bits. Not always possible if you don't control the data though. |
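A minimal sketch of tagging in the alignment bits, assuming you control the pointee type and that the integer form of a pointer exposes its alignment in the low bits (true on mainstream platforms, but implementation-defined in C). `Node`, `low_tag`, `low_tag_of` and `low_untag` are hypothetical names:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define LOW_TAG_BITS 3u
#define LOW_TAG_MASK ((uintptr_t)((1u << LOW_TAG_BITS) - 1))

/* Hypothetical node type: 8-byte alignment leaves three zero low bits. */
typedef struct {
    alignas(8) int payload;
} Node;

/* Store a small tag in the low bits of an aligned pointer. */
static uintptr_t low_tag(Node *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & LOW_TAG_MASK) == 0);  /* pointer must be 8-byte aligned */
    assert(tag <= LOW_TAG_MASK);         /* tag must fit in three bits */
    return bits | tag;
}

/* Read the tag from the low bits. */
static unsigned low_tag_of(uintptr_t tagged) {
    return (unsigned)(tagged & LOW_TAG_MASK);
}

/* Clear the tag bits to recover the original pointer. */
static Node *low_untag(uintptr_t tagged) {
    return (Node *)(tagged & ~LOW_TAG_MASK);
}
```

This does not depend on the width of the virtual address space, which is why it sidesteps the high-bit portability question. It does depend on every tagged pointer really having the assumed alignment.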
|
| ▲ | forrestthewoods 2 days ago | parent | prev [-] |
| Can you share details? What modern platforms/environments does this not work on? Are you saying the intersection of available bits on all platforms is just 6? Or are there platforms that actually use 58 bits? Would be great to hear some actionable details. |
| |
| ▲ | jandrewrogers 2 days ago | parent | next [-] | | It wasn’t anything clever. A couple years ago I did a dive into x86 and ARM literature to determine what bits of a pointer were in use in various environments or were on a roadmap to be used in the future. To be honest, it was more bits than I was expecting. Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa. Both x86 and ARM have proposed standards for pointer tagging in the high bits. However, those bits don’t perfectly overlap. Also, some platforms don’t fully conform to this reservation of high bits for pointer tagging, so there is a backward compatibility issue. Across all of that, I found six high bits that were guaranteed to be safe for all current and future platforms. In practice you can probably use more but there is a portability risk. | | |
| ▲ | loeg 2 days ago | parent | next [-] | | > Note also that this is the intersection of bits that are available on both ARM and x86. If you want it to be portable, you need both architectures. Just because ARM64 doesn’t use a bit doesn’t mean that x86 doesn’t and vice versa. Your mask/tag doesn't need to use the same bits on x86 and ARM to be portable, though. | | |
| ▲ | jandrewrogers 2 days ago | parent [-] | | It depends on the application, those bits may be materialized across architectures. The objective was maximizing safety in all contexts. My perspective is biased by the requirements of high-assurance systems. | | |
| ▲ | rhdjebejdbd 2 days ago | parent | next [-] | | It doesn't depend on the application unless the application shares the same pointers between x86 and ARM, which doesn't make any sense to me. Otherwise they're right: it's not the intersection that matters but just the total bits available. | | |
| ▲ | jandrewrogers 2 days ago | parent [-] | | Eh? The values you can store in the tags are absolutely dependent on the number of available bits. That’s a simple type safety problem. This requirement is architecture independent. You can’t cram 8 bits of tag in 7 bits if the latter is all the architecture has available. Hence why you have to design for the smallest reliable target. | | |
| |
| ▲ | forrestthewoods 2 days ago | parent | prev [-] | | If one platform uses the upper 56 bits and another uses the lower 56 bits that doesn’t mean you have 0 bits available for tagging. It means you have 8 bits and have to go through a conversion when moving from one platform to another. This is perhaps annoying but perfectly fine. Kinda weird to materialize pointers across architectures rather than indices. But in any case surely the relevant consideration is “fewest number of free pointer bits on any single platform”. And not “intersection of free bits across all platforms”. Right? | | |
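The argument above can be sketched with two made-up layouts. Neither `LAYOUT_A` nor `LAYOUT_B` describes a real platform, and all names here are hypothetical. Each layout keeps its 8 tag bits in a different place, and the tag value is carried across by extracting it under one layout and re-inserting it under the other:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where a platform leaves free bits.
 * Assumes tag_bits is less than 64. */
typedef struct {
    unsigned tag_shift;  /* position of the lowest free bit */
    unsigned tag_bits;   /* number of contiguous free bits */
} TagLayout;

static const TagLayout LAYOUT_A = { 56, 8 };  /* address in the low 56 bits */
static const TagLayout LAYOUT_B = { 0, 8 };   /* address in the high 56 bits */

/* Mask covering the free bits of a layout. */
static uint64_t tag_mask(TagLayout l) {
    return ((UINT64_C(1) << l.tag_bits) - 1) << l.tag_shift;
}

/* Write a tag into the free bits of a word. */
static uint64_t set_tag(TagLayout l, uint64_t word, uint64_t tag) {
    assert(tag < (UINT64_C(1) << l.tag_bits));
    return (word & ~tag_mask(l)) | (tag << l.tag_shift);
}

/* Read the tag from the free bits of a word. */
static uint64_t get_tag(TagLayout l, uint64_t word) {
    return (word & tag_mask(l)) >> l.tag_shift;
}

/* The portable tag width is the smaller count, not the bit overlap. */
static unsigned portable_tag_bits(TagLayout a, TagLayout b) {
    return a.tag_bits < b.tag_bits ? a.tag_bits : b.tag_bits;
}
```

The two layouts share no free bit positions, yet `portable_tag_bits` reports 8, because only the tag value has to move between platforms, not its position.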
|
| |
| ▲ | menaerus 2 days ago | parent | prev [-] | | How about taking advantage of max_align_t pointer alignment guarantees, which on x86-64 Linux (glibc) is 16 bytes? This would leave you with the 4 lowest bits to be used. | | |
| ▲ | jandrewrogers 2 days ago | parent [-] | | Unfortunately, low bits vary considerably from a large number to zero. If you need a minimum number of bits to be reliably available then you have to look at the high bits of a pointer. Naturally, implementations use the low bits if alignment makes them available. | | |
| ▲ | menaerus a day ago | parent [-] | | Hm, what I have seen so far is that pointers returned by system malloc are usually aligned either to an 8-byte boundary (Windows) or a 16-byte boundary (Linux). I think jemalloc interprets the C standard guarantees a bit differently and will return an 8-byte aligned pointer for allocations whose size is <= 8. But even this still leaves us with 3 LSBs to use. |
|
|
| |
| ▲ | jandrewrogers 2 days ago | parent | prev | next [-] | | Page tables can optionally consume a very large number of bits on x86 (57?). Not every platform enables it but your code may run on a platform that uses it. There are a bunch of proposals from Intel, AMD, ARM, et al about officially recognizing some set of high bits in 64-bit pointers as tags in user space, with implementations. Unfortunately, these “standards” don’t agree on which high bits can be safely reserved for tagging. IIRC, the 6 high bits I mentioned was the intersection of every tag reservation implementation and/or proposal. In other words, it was the set of bits that Intel, AMD, and ARM agreed would be safe for tagging for the foreseeable future. Fewer bits than I would like and can probably exploit, but nonetheless the number I can reasonably rely on. If a consistent standard is ever agreed upon, the number of bits may increase. | |
| ▲ | joz1-k 2 days ago | parent | prev | next [-] | | > are there platforms that actually use 58 bits? The original article already contains a note that "Some more recent x64 CPUs use 5-level paging, which would increase this number to 57 bits [0]" Apparently server-level "Sunny Cove" Intel CPUs implement this extension [1]. [0]: https://en.wikipedia.org/wiki/Intel_5-level_paging [1]: <https://en.wikipedia.org/wiki/Sunny_Cove_(microarchitecture)> | |
| ▲ | nervoir 2 days ago | parent | prev [-] | | If you include ARM then PAC and MTE will consume a few of those precious bits. Don’t think any platforms use PAC for pointers to allocated objects though unless they’re determined to be exceptionally important like creds structure pointers in task structures in the kernel. |
|