Floor and Ceil versus Denormals on CPU and GPU (asawicki.info)
27 points by ibobev 4 days ago | 3 comments
yosefk 10 minutes ago
Flush denormals to zero. Even their inventor had trouble writing correct code in their presence - see the Appendix to that "what every programmer should know..." paper.
kevmo314 37 minutes ago
> This is not the first time we can see Nvidia taking shortcuts to achieve maximum performance of their GPUs

Why is implementing it correctly not performant? For context I have no idea how rounding is typically implemented anyways.
crote 2 hours ago
Another thing to keep in mind is that CPU processing of denormals tends to be extremely slow - I vaguely recall running into something like a 10x slowdown a decade ago. For a lot of applications the difference between a denormal and zero is small enough to be irrelevant, so if you expect near-zero values to be common, enabling a denormals-to-zero compiler flag might give you a pretty nice performance boost for free.
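
To make the "difference between a denormal and zero" point concrete, here is a rough Python sketch of why floor and ceil are exactly the operations where flushing is visible. It assumes IEEE 754 doubles, and the helpers `is_denormal` and `flush_to_zero` are hypothetical names that only emulate flushing in software; they are not the hardware mode or the compiler flag mentioned above.

```python
import math
import sys

# Assumption: Python floats are IEEE 754 doubles (true on mainstream platforms).
SMALLEST_NORMAL = sys.float_info.min   # smallest positive normal double
SMALLEST_DENORMAL = 5e-324             # smallest positive denormal double


def is_denormal(x: float) -> bool:
    """True for nonzero values whose magnitude is below the smallest normal."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x: float) -> float:
    """Software emulation of flushing: denormals become a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal(x) else x


tiny = SMALLEST_DENORMAL

# Without flushing, a positive denormal is still greater than zero,
# so ceil rounds it up to 1 and floor of its negation rounds down to -1.
print(math.ceil(tiny), math.floor(-tiny))

# With flushing, both inputs are treated as zero, so both results are 0.
print(math.ceil(flush_to_zero(tiny)), math.floor(flush_to_zero(-tiny)))
```

For most arithmetic the flushed and unflushed values differ by a negligible amount, but floor and ceil turn that tiny difference into a whole unit, which is why the choice shows up there.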