thaumasiotes 19 hours ago
I don't actually understand why you'd want reproducibility in a statistical simulation. If you fix the output, what are you learning? The point of the simulation is to produce different random numbers so you can see what the outcomes are like... right? Let's say I write a paper that says "in this statistical model, with random seed 1495268404, I get Important Result Y", and you criticize me on the grounds that when you run the model with random seed 282086400, Important Result Y does not hold. Doesn't this entire argument fail to be conceptually coherent?
hansvm 9 hours ago
- It eliminates one way of forging results. Without stating the seed, people could consistently fail to reproduce your results and you could always have "oops, guess I had a bad seed" as an excuse. You still have to worry about people re-running the simulation till the results look good, but that's handled via other mechanisms.

- Many algorithms are vastly easier to implement stochastically than deterministically. If you want to replay the system (e.g., to locally debug some production issue), you need those "stochastic" behaviors to nonetheless be deterministic.

- If you're a little careful with how you implement deterministic randomness, you can start to ask counterfactual questions -- how would the system have behaved had I made this change -- and actually compare apples to apples when examining an experimental run (see the sketch below).

Even in your counterexample, having the random seed reproducible and published still matters. With the seed and source published, anyone can cheaply verify the issue with the simulation, you can debug it, you can investigate the proportion of "bad" seeds and suss out the error bounds of the simulation, etc.
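A minimal Python sketch of the deterministic-randomness pattern described above. The toy simulation, the run_simulation name, and the parameters are illustrative assumptions, not anything from the thread; the point is only that a run is fully determined by its inputs, so the same seed supports both replication and apples-to-apples counterfactual comparisons.

    import random

    def run_simulation(seed, treatment_effect=0.0, n=10_000):
        # One dedicated generator per run: the output is fully determined
        # by (seed, treatment_effect, n), never by global or hidden state.
        rng = random.Random(seed)
        successes = 0
        for _ in range(n):
            # Toy model: each trial succeeds with probability 0.5 + treatment_effect.
            if rng.random() < 0.5 + treatment_effect:
                successes += 1
        return successes / n

    seed = 1495268404
    baseline = run_simulation(seed)
    changed = run_simulation(seed, treatment_effect=0.01)
    # Same seed, same sequence of draws: the difference is attributable to the
    # change, not to run-to-run sampling noise.
    print(baseline, changed, changed - baseline)

    # Investigating the proportion of "bad" seeds is just a loop over many seeds:
    estimates = [run_simulation(s) for s in range(100)]

The same idea (fixing the random draws while varying one input, sometimes called common random numbers) is what makes the counterfactual comparison meaningful rather than drowned in sampling noise.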
AlotOfReading 17 hours ago
The point is that if you run the simulation with seed X, you'll get the same results I do. This means that all you need to reproduce my results is the code and any inputs like the seed, rather than the entire execution history. If you want to provide different inputs and get the same result, that's another matter entirely (where numerical stability will be much more important).
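A minimal illustration of that point, assuming a toy Python estimator (the simulate function and the seeds here are made up for the example): with the code and the seed, anyone gets the same number; a different seed is a different input, not a failed replication.

    import random

    def simulate(seed, n=100_000):
        rng = random.Random(seed)
        return sum(rng.random() for _ in range(n)) / n

    # Same code + same seed -> the same estimate every time it is run:
    assert simulate(42) == simulate(42)

    # A different seed is a different input, i.e. a different experiment:
    print(simulate(42), simulate(43))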