▲ | Float Self-Tagging(arxiv.org) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
130 points by ndjdjddjsjj 16 hours ago | 22 comments | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | pavpanchekha 12 hours ago | parent | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
I loved this clever, weird, awesome paper, so a short summary. In many dynamic languages some values are stored on the heap ("boxed") and represented as a pointer, while others are represented as an immediate value ("unboxed" or "immediate"). Pointer tagging is a common way to do that: the low bit of the value tells you the value's type, and some types are immediate while others are boxed. Naturally, the tag bits have a fixed value, so can't be used to store data. So for example your language might offer 61-bit integer immediates instead of 64-bit integers; the other three bits are used for tags. Possibly, larger integers are stored on the heap and treated as a different type (for example Python 2.X had separate int and long types for these cases). However, it's hard to use this strategy for floats, because floats need all 64 bits (or 32 bits for single-precision, same difference). There's a trick called "NaN boxing" which makes use of the large number of NaNs in the float representation, but read the paper if you want more on that. The authors' insight is that, suppose you have a three-bit tag and 011 is the tag for floats. By totally random chance, _some_ floats will end in 011; you can represent those as immediates with those tag bits. Obviously, that's unlikely, though you can raise the chances by using, like, 010, 011, 100, and 101 all as float tags. Still, the low bits are a bad choice. But what about high bits? Most floats have one of a couple high bit patterns, because most floats are either 0 or between, say, 1e-100 and 1e100. Floats outside that range can be boxed but since they're really rare it's not a big cost to box them. So basically, we use high bits as our tag bits and map all the common float prefixes to float tags. This allows unboxing the vast majority of floats, which leads to big speedups on float-heavy benchmarks. A personal note: I've been working in numerics and floating-point for a decade now and have had to deal with float boxing both from a research point of view (lots of runtime analysis systems for floats), from a user point of view (using unboxed float vectors for significant speedup in my own software), and from a teaching point of view (discussing boxing in my compilers class, using NaN-boxing as an example of cleverness). This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | vanderZwan an hour ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Already linked yesterday, btw, with a comment by one of the authors: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | comex 6 hours ago | parent | prev | next [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> For instance, NaN-tagging prevents (or largely complicates) optimizations relying on stack allocations. The stack uses high memory addresses that do not fit in 48 bits unless encoded relative to the location of the stack segment. Er, what? The paper says they tested on a Xeon CPU, so x86-64, running Linux. On traditional x86-64, all pointers fit in 48 bits, period. Stack memory is no exception. More recently the architecture was extended to allow 56-bit pointers, but my impression is that Linux (like other OSes) keeps them disabled by default in userspace. According to the documentation [1]: > Not all user space is ready to handle wide addresses. [..] To mitigate this, we are not going to allocate virtual address space above 47-bit by default. So how would the stack end up above 47 bits? Is the documentation out of date? [1] https://docs.kernel.org/arch/x86/x86_64/5level-paging.html | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
▲ | plagiarist 10 hours ago | parent | prev [-] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The idea in the paper is really cool. People who enjoyed this might also like to read how Apple used tagged pointers for short strings in Objective-C [0]. I think that's when I first learned about tagged pointers. NaN-boxing was mindblowing for me. I love this kind of stuff. [0] https://mikeash.com/pyblog/friday-qa-2015-07-31-tagged-point... | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|