pavpanchekha | 15 hours ago
I loved this clever, weird, awesome paper, so here's a short summary.

In many dynamic languages some values are stored on the heap ("boxed") and represented as a pointer, while others are represented as an immediate value ("unboxed" or "immediate"). Pointer tagging is a common way to do that: the low bits of the value tell you the value's type, and some types are immediate while others are boxed. Naturally, the tag bits have a fixed value, so they can't be used to store data. For example, your language might offer 61-bit integer immediates instead of 64-bit integers; the other three bits are used for tags. Possibly, larger integers are stored on the heap and treated as a different type (for example, Python 2.x had separate int and long types for these cases).

However, it's hard to use this strategy for floats, because floats need all 64 bits (or 32 bits for single precision, same difference). There's a trick called "NaN boxing" which makes use of the large number of NaNs in the float representation, but read the paper if you want more on that.

The authors' insight: suppose you have a three-bit tag and 011 is the tag for floats. By totally random chance, _some_ floats will end in 011; you can represent those as immediates with those tag bits. Obviously, that's unlikely for any given float, though you can raise the chances by using, say, 010, 011, 100, and 101 all as float tags. Still, the low bits are a bad choice. But what about the high bits? Most floats have one of a few high-bit patterns, because most floats are either 0 or between, say, 1e-100 and 1e100. Floats outside that range can be boxed, but since they're really rare, boxing them isn't a big cost. So basically, we use the high bits as our tag bits and map all the common float prefixes to float tags. This allows unboxing the vast majority of floats, which leads to big speedups on float-heavy benchmarks.
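To make the high-bit trick concrete, here is a minimal sketch (my own illustration, not the paper's exact encoding): rotate a double's 64-bit pattern left by 3 so its high bits land in the low tag position, and reserve the four 3-bit prefixes that doubles of moderate magnitude actually have as float tags. Everything else (huge, tiny, and, in this simplified sketch, zero) falls back to boxing.

```python
import struct

# Illustrative 3-bit tag assignment, not the paper's exact scheme.
# Positive doubles of moderate magnitude (roughly 1e-150 to 1e150) start
# with high bits 001 or 010; their negations start with 101 or 110.  We
# reserve those four tags for immediate floats; other tags are free for
# pointers, small integers, etc.  A real runtime would also special-case
# 0.0 (its pattern starts with 000); this sketch just boxes it.
FLOAT_TAGS = {0b001, 0b010, 0b101, 0b110}
MASK64 = 0xFFFFFFFFFFFFFFFF

def double_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def rotl3(v: int) -> int:
    return ((v << 3) | (v >> 61)) & MASK64

def rotr3(v: int) -> int:
    return ((v >> 3) | (v << 61)) & MASK64

def try_tag(x: float):
    """Return the tagged immediate word, or None if x must be boxed."""
    w = rotl3(double_bits(x))
    return w if (w & 0b111) in FLOAT_TAGS else None

def untag(w: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", rotr3(w)))[0]
```

Ordinary values like 1.0 or -3.14 round-trip through `try_tag`/`untag` without any allocation, while an extreme value like 1e300 fails the tag check and would be boxed.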
A personal note: I've been working in numerics and floating-point for a decade now and have had to deal with float boxing both from a research point of view (lots of runtime analysis systems for floats), from a user point of view (using unboxed float vectors for significant speedup in my own software), and from a teaching point of view (discussing boxing in my compilers class, using NaN-boxing as an example of cleverness). This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors.
daanx | 13 hours ago
> This idea is so simple, so crazy, so stupid, and works so well, but I never thought of it. Bravo to the authors.

Thanks for the nice summary -- looking forward to reading the paper! The same idea of self-tagging is actually also used in the Koka language [1] runtime system, where by default the Koka compiler only heap-allocates float64's when their absolute value is outside the range [2e-511,2e512) and not 0, infinity, or NaN (see [2]). This indeed saves many (many!) heap allocations for float-intensive programs. Since Koka only uses 1 bit to distinguish pointers from values, another slightly faster option is to box only negative float64's, but of course negative numbers are still quite common, so that saves fewer allocations in general.

[1] https://koka-lang.github.io/koka/doc/book.html#sec-value-typ...

[2] https://github.com/koka-lang/koka/blob/dev/kklib/src/box.c#L...

ps. If you enjoy reading about tagging, I recently wrote a note on efficiently supporting seamless large integer arithmetic (as used in Koka as well) and discuss how certain hardware instructions could really help to speed this up [3]:

[3] https://www.microsoft.com/en-us/research/uploads/prod/2022/0... (ML workshop 2022)
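A rough sketch of the range test described above, reading the comment's "2e-511"/"2e512" as the powers of two 2^-511 and 2^512 (an assumption on my part; the real constants live in kklib/src/box.c). Extracting the binary exponent avoids computing 2^512, which overflows a double:

```python
import math

def unboxed_ok(x: float) -> bool:
    # Koka-style rule per the comment: a float64 stays unboxed when it is
    # 0, infinity, NaN, or its magnitude lies in [2**-511, 2**512).
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return True
    _, e = math.frexp(abs(x))   # |x| = m * 2**e with 0.5 <= m < 1
    return -510 <= e <= 512     # equivalent to 2**-511 <= |x| < 2**512
```

Under this rule only extreme magnitudes (near-overflow values like 1e308, or subnormals) pay for a heap allocation.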
gus_massa | 5 hours ago
Nice explanation, but it took me a while to understand the trick. They are hiding the tag in the "exponent" of the float, not in the "mantissa"!
rtpg | 14 hours ago
Do all float operations need to reconfirm those bits afterwards, though? I suppose if you have some sort of JIT you can end up with a bunch of unboxed floats and would only pay the cost at the boundaries.
pansa2 | 14 hours ago
> This allows unboxing the vast majority of floats, which leads to big speedups on float-heavy benchmarks.

NaN-boxing allows all floats to be unboxed, though. The main benefit of the self-tagging approach seems to be that by boxing some floats, we can make space for 64-bit pointers, which are too large for NaN-boxing. The surprising part of the paper is that "some floats" turns out to be only a small minority of values - not, say, 50% of them.
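For contrast, here is a minimal sketch of classic NaN-boxing (my own illustration, not any particular VM's layout) showing the pointer-width limit: a quiet NaN pins the top bits of the word, leaving only 48 payload bits, so 48-bit addresses fit but full 64-bit pointers do not.

```python
import struct

# Quiet-NaN prefix: sign 0, exponent all ones, quiet bit set.  The low
# 48 bits are free payload.  (A real implementation must also reserve
# one payload value for the "genuine" NaN produced by arithmetic.)
QNAN_PREFIX  = 0x7FF8000000000000
PAYLOAD_MASK = (1 << 48) - 1

def double_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def box_ptr(addr: int) -> int:
    if addr & ~PAYLOAD_MASK:
        raise ValueError("pointer too wide for a 48-bit NaN payload")
    return QNAN_PREFIX | addr

def is_boxed(word: int) -> bool:
    # No ordinary (non-NaN) double has this exact prefix.
    return (word & ~PAYLOAD_MASK) == QNAN_PREFIX

def unbox_ptr(word: int) -> int:
    return word & PAYLOAD_MASK
```

Every real double passes through untouched, which is why NaN-boxing unboxes all floats; the price is that pointers must fit in the payload, which is exactly the constraint self-tagging relaxes.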
skybrian | 10 hours ago
It’s clever, but not random chance. That would be too much of a coincidence. They rotate the floats to make it happen the way they want. It’s hardly random that only 8 bits of exponent are needed for many calculations.
jonnycomputer | 15 hours ago
Thank you for the clear explanation!