Remix.run Logo
plagiarist 12 hours ago

The idea in the paper is really cool.

People who enjoyed this might also like to read how Apple used tagged pointers for short strings in Objective-C [0]. I think that's when I first learned about tagged pointers. NaN-boxing was mindblowing for me. I love this kind of stuff.

[0] https://mikeash.com/pyblog/friday-qa-2015-07-31-tagged-point...

wging 10 hours ago | parent [-]

Another cool thing that seems related: exploiting alignment to free up N bits in a 'pointer' representation, because your values have to be aligned. The JVM does this to expand the set of possible addresses representable in 32 bits: https://shipilev.net/jvm/anatomy-quarks/23-compressed-refere...

So, for example, with 3 bits of alignment required, the first valid address for a pointer to point to after 0x0 is 0x8, and after that is 0x10, but you represent those as 0x1 and 0x2 respectively, and use a shift to get back the actual address (0x1 << 3 = 0x8, actual address). I think this is gestured at in section 1.1 of the paper, sort of, except they envision using the space thus freed for tags, rather than additional bits. (Which only makes sense if your address is 32 bits anyway, rather than 64 as in the paper: no one has 67-bit addresses. So saving 3 bits doesn't buy you anything. I think.)

> Aligning all heap-allocated values to 64-bit machine words conveniently frees the low bits of pointers to store a 3-bit tag.

plagiarist 10 hours ago | parent [-]

It's interesting which runtimes exploit the extra space for what reasons! Definitely makes more sense to have the extra address space on 32 bits compared to 64. I wonder if the extra addresses are specific to JVM / not something that works well in the C family?

lmm 8 hours ago | parent [-]

Well in C you have non-aligned pointers, because you can have pointers to things that aren't objects and might not be aligned (e.g. individual chars or shorts). In Java everything is at least 8-byte-aligned, you can't store a loose char/short/int on the heap (it has to go in a boxed object that's 8-byte-aligned, though the compiler will do this semi-automatically) and you can't take a pointer to an individual element of an array.

If you applied the C standard strictly, you could use a JVM-style representation for pointers to longs, pointers, and structs that start with longs and pointers, so you could theoretically have an implementation where those pointers were shorter. But you'd have to convert back and forth when casting to and from void* (and char*), and in practice C people expect to be able to cast a long* to int, cast that to void*, and get the same result as casting long* to void*, even though doing that and using it is undefined behaviour according to the standard.