Remix.run Logo
adampunk 7 hours ago

We love to leave faster functions languishing in library code. The basis for Q3A’s fast inverse square root had been sitting in fdlibm since 1986, on the net since 1993: https://www.netlib.org/fdlibm/e_sqrt.c

def-pri-pub 4 hours ago | parent [-]

Funny enough that fdlimb implementation of asin() did come up in my research. I believe it might have been more performant in the past. But taking a quick scan of `e_asin.c`, I see it doing something similar to the Cg asin() implementation (but with more terms and more multiplications, which my guess is that it's slower). I think I see it also taking more branches (which could also lead to more of a slowdown).

adampunk 2 hours ago | parent [-]

Yeah Ng’s work in fdlibm is cool and really clever in parts but a lot of branching. Some of the ways they reach correct rounding are…so cool.