Remix.run Logo
HappMacDonald 10 hours ago

> considering that model training process is non-deterministic

Why would it have to be? Just use PRNG with published seeds and then anyone can reproduce it.

dataflow 8 hours ago | parent [-]

I have zero actual experience in training models, but in general, when parallelizing work: there can be fundamental nondeterminism (e.g., some race conditions) that is tolerated, whose recording/reproduction can be prohibitive performance-wise.