| ▲ | jcalvinowens 5 hours ago | |
Does your benchmark use sequential or randomly ordered inputs? That would make a substantial difference with an LUT, I would think. But I'm guessing. Maybe 32K is so small it doesn't matter (if almost all of the LUT sits in the cache and is never displaced). > if you use smaller data types than double or float. Even dropping interpolation worked fine, That's kinda tautological isn't it? Of course reduced precision is acceptable where reduced precision is acceptable... I guess I'm assuming double precision was used for a good reason, it often isn't :) | ||
| ▲ | drsopp 4 hours ago | parent [-] | |
I didnt inspect the rest of the code but I guess the table is fetched from L2 on every call? I think the L1 data cache is flooded by other stuff going on all the time. About dropping the interpolation: Yes you are right of course. I was thinking about the speed. No noticable speed improvement by dropping interpolation. The asin calls are only a small fraction of everything. | ||