HappMacDonald 10 hours ago
> considering that model training process is non-deterministic

Why would it have to be? Just use a PRNG with published seeds, and then anyone can reproduce it.
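As a rough illustration of the published-seed idea, here is a minimal sketch using Python's standard `random` module. The names `sample_stream` and `PUBLISHED_SEED` are made up for this example, and it only covers the random draws themselves, not an actual training run.

```python
import random

# Hypothetical seed that would be published alongside the results.
PUBLISHED_SEED = 1234


def sample_stream(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.random() for _ in range(n)]


# Two independent "runs" that use the same seed draw identical numbers.
run_a = sample_stream(PUBLISHED_SEED)
run_b = sample_stream(PUBLISHED_SEED)
print(run_a == run_b)
```

Using a private `random.Random` instance rather than the module-level functions keeps the stream from being disturbed by other code that also draws random numbers.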
dataflow 8 hours ago
I have zero actual experience in training models, but in general, when parallelizing work, there can be fundamental nondeterminism (e.g., some race conditions) that is tolerated because it doesn't affect correctness. Recording that nondeterminism so it can be reproduced later can be prohibitively expensive performance-wise.
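One common example of this kind of tolerated nondeterminism, offered here as an assumption rather than something stated above, is a parallel sum whose partial results arrive in whatever order the workers finish. Floating-point addition is not associative, so the arrival order can change the last bits of the total even when every input is identical. The sketch below simulates the different arrival orders by trying every permutation; `sum_in_order` and `partials` are illustrative names.

```python
from itertools import permutations


def sum_in_order(values):
    """Accumulate values left to right, as a reducer would in arrival order."""
    total = 0.0
    for v in values:
        total += v
    return total


# Pretend these are partial results produced by three parallel workers.
partials = [0.1, 0.2, 0.3]

# Each permutation stands in for one possible order of worker completion.
results = {sum_in_order(p) for p in permutations(partials)}

# More than one distinct total appears, although the inputs never change.
print(sorted(results))
```

Making such a reduction reproducible means either forcing a fixed order, which costs parallelism, or logging the order that actually occurred, which is the recording overhead mentioned above.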