| ▲ | altcognito 2 hours ago | |||||||
Explain this though. The code is deterministic, even if it relies on pseudorandom number generation. It doesn't just happen; someone has to make a conscious decision to force a different code path (or model) if the system is loaded. | ||||||||
| ▲ | minimaltom an hour ago | parent | next [-] | |||||||
It's not deterministic. Any individual floating-point mul/add is deterministic, but on a GPU these all happen in parallel and the accumulation is in whatever order they happen to complete. When you add A then B then C, you can get a different answer than C then A then B, because each floating-point add rounds its result (approximation error, subnormals, etc.), so the final sum depends on the order. | ||||||||
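A minimal sketch of the ordering effect described above, in plain Python on the CPU rather than on a GPU. The values and the helper name sum_in_order are made up for illustration; the point is only that the same three numbers, accumulated in different orders, do not always give the same double-precision sum.

```python
import itertools

def sum_in_order(xs):
    """Accumulate left to right, rounding after every add (as hardware does)."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Illustrative values: 1.0 is below the rounding granularity of 1e16,
# so it is lost whenever it is added to +/-1e16 before the big terms cancel.
values = [1e16, 1.0, -1e16]

# Try every possible completion order of the three adds.
results = {perm: sum_in_order(perm) for perm in itertools.permutations(values)}
distinct = set(results.values())

for perm, total in results.items():
    print(perm, "->", total)

# The classic small example of the same effect.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)
```

If the order of accumulation is decided by which thread finishes first, any of these orders can occur on a given run, which is the nondeterminism being described.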
| ▲ | chrisjj an hour ago | parent | prev | next [-] | |||||||
Not deterministic. https://thinkingmachines.ai/blog/defeating-nondeterminism-in... | ||||||||
| ▲ | FL33TW00D an hour ago | parent | prev | next [-] | |||||||
It takes a different code path for efficiency, e.g. if batch_size > 1024: kernel_x else: kernel_y | ||||||||
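To make that concrete, here is a rough sketch, assuming the two kernels differ only in how they order a reduction. kernel_x and kernel_y are the names from the comment above, but their bodies, the reduce_batch helper, and the 1024 threshold default are hypothetical stand-ins, not any real library's kernels.

```python
def kernel_x(row):
    """Hypothetical large-batch kernel: sequential left-to-right accumulation."""
    total = 0.0
    for v in row:
        total += v
    return total

def kernel_y(row):
    """Hypothetical small-batch kernel: pairwise (tree) reduction."""
    if len(row) <= 1:
        return row[0] if row else 0.0
    mid = len(row) // 2
    return kernel_y(row[:mid]) + kernel_y(row[mid:])

def reduce_batch(batch, threshold=1024):
    """Pick a kernel from the batch size alone, then reduce every row with it."""
    kernel = kernel_x if len(batch) > threshold else kernel_y
    return [kernel(row) for row in batch]

# The same request row, served alone and served inside a large batch.
row = [1e16, 1.0, -1e16, 1.0]
alone = reduce_batch([row])[0]            # batch of 1 uses kernel_y
crowded = reduce_batch([row] * 2000)[0]   # batch of 2000 uses kernel_x

print(alone, crowded)
```

Nobody has to decide to degrade anything under load: both kernels are valid ways to compute the sum, and the result for one request changes only because other requests happened to share its batch.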
| ▲ | pertymcpert an hour ago | parent | prev [-] | |||||||
Floating point math isn't associative for operations that are associative in normal math. | ||||||||
| ||||||||