craigacp 20 hours ago
The same operations in the same order is a tough constraint in an environment where core count is increasing and clock speeds/IPC are not. It's hard to rewrite some of these algorithms to use a parallel decomposition that's the same as the serial one. I've done a lot of work on reproducibility in machine learning systems, and it's really, really hard. Even the JVM got me by changing some functions in `java.lang.Math` between versions & platforms (while keeping to their documented 2 ulp error bounds).
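
A rough sketch of one mitigation on the JVM (just the general idea, not anyone's production code): `java.lang.Math` only promises to stay within its documented ulp bounds, so its results can legally differ between platforms and releases, while `java.lang.StrictMath` is specified against fdlibm and returns the same bits everywhere, at some cost in speed. Routing transcendentals through one wrapper makes the fast/reproducible trade-off a one-line change:

    // Sketch: pinning transcendental results across JVM versions and platforms.
    // Math.* only has to stay within its documented ulp bounds, so results may
    // legally vary; StrictMath.* is defined against fdlibm and is bit-reproducible.
    public class ReproMath {
        // Hypothetical wrapper so a codebase can flip between fast and
        // reproducible in one place.
        static double exp(double x)  { return StrictMath.exp(x); }
        static double log(double x)  { return StrictMath.log(x); }
        static double tanh(double x) { return StrictMath.tanh(x); }

        public static void main(String[] args) {
            double x = 0.7390851332151607;
            // Print exact bit patterns; Math.tanh may differ by platform or
            // release, StrictMath.tanh will not.
            System.out.println("Math.tanh:       " + Double.toHexString(Math.tanh(x)));
            System.out.println("StrictMath.tanh: " + Double.toHexString(StrictMath.tanh(x)));
        }
    }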
Dylan16807 13 hours ago
Most people aren't spreading single calculations across multiple cores, and the ones that do are already deep enough into the technical weeds to handle deterministic chunking and combining. A "parallel decomposition that's the same as the serial one" would be difficult in many ways, but it's only needed when you can't afford a one-time change in results.
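
A minimal Java sketch of what "deterministic chunking and combining" can look like (the class name and chunk size are made up for illustration): the chunk boundaries and the combine order depend only on the data length, never on thread scheduling, so the sum is bit-identical from run to run (though not bit-identical to a plain left-to-right serial loop, since floating-point addition isn't associative).

    import java.util.stream.IntStream;

    public class DeterministicSum {
        static final int CHUNK = 1 << 16;  // fixed chunk size, part of the contract

        static double sum(double[] a) {
            int nChunks = (a.length + CHUNK - 1) / CHUNK;
            double[] partial = new double[nChunks];
            // Partial sums may finish in any order on any number of threads...
            IntStream.range(0, nChunks).parallel().forEach(c -> {
                int lo = c * CHUNK, hi = Math.min(lo + CHUNK, a.length);
                double s = 0.0;
                for (int i = lo; i < hi; i++) s += a[i];  // left-to-right within the chunk
                partial[c] = s;
            });
            // ...but they are always combined in chunk-index order.
            double total = 0.0;
            for (double p : partial) total += p;
            return total;
        }

        public static void main(String[] args) {
            double[] data = new double[1_000_000];
            for (int i = 0; i < data.length; i++) data[i] = 1.0 / (i + 1);
            System.out.println(Double.toHexString(sum(data)));  // same bits every run
        }
    }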
AlotOfReading 16 hours ago
Many systems I've worked on are either serial or can be written as embarrassingly parallel with synchronized checkpoints for a relatively small cost, but yeah, the middle ground is hard. That's the nature of the beast though. Imagine trying to rearrange the instructions in your program the same way: you're going to get all sorts of wacky behavior unless you invest an enormous amount of effort into avoiding problems the way superscalar CPUs do. Relatively few programs truly need to operate that way, though, and they're usually written by people who know the cost of their choices.
saagarjha 13 hours ago
You can definitely do operations in a reproducible way (assuming you do the reductions in a defined order), but, yeah, you might lose some performance. Unfortunately the ML people seem to pick better performance over correctness basically every time :(
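
As a hedged illustration of "reductions in a defined order" (a generic sketch, not how any particular ML framework does it): if the split points of a fork/join reduction depend only on the indices, the addition tree, and therefore the rounding, is the same whether the pool has one worker or sixty-four. The cost is a fixed tree shape you can't tune to the hardware on the fly.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Sketch of a reduction with a defined order: the split points depend only
    // on the indices, so the addition tree (and therefore the rounding) is the
    // same no matter how many workers the pool has or how they're scheduled.
    public class FixedTreeSum extends RecursiveTask<Double> {
        static final int THRESHOLD = 1 << 15;  // fixed, not tuned per run
        final double[] a; final int lo, hi;

        FixedTreeSum(double[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

        @Override protected Double compute() {
            if (hi - lo <= THRESHOLD) {
                double s = 0.0;
                for (int i = lo; i < hi; i++) s += a[i];  // serial base case
                return s;
            }
            int mid = lo + (hi - lo) / 2;                 // split depends only on indices
            FixedTreeSum left = new FixedTreeSum(a, lo, mid);
            FixedTreeSum right = new FixedTreeSum(a, mid, hi);
            left.fork();
            double r = right.compute();
            double l = left.join();
            return l + r;                                 // always (left + right), in that order
        }

        public static double sum(double[] a) {
            return ForkJoinPool.commonPool().invoke(new FixedTreeSum(a, 0, a.length));
        }
    }

Calling FixedTreeSum.sum(data) from a pool of any size gives the same bits, because scheduling only changes who does each addition, not which additions happen.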