▲ | sgeisenh 5 hours ago | |
This is an oversimplification. When distributed, the nondeterministic order of additions during reductions can produce nondeterministic results due to floating point error. It’s nitpicking for sure, but it causes real challenges for reproducibility, especially during model training. |