craigacp 20 hours ago
The same operations in the same order is a tough constraint in an environment where core count is increasing and clock speeds/IPC are not. It's hard to rewrite some of these algorithms to use a parallel decomposition that's the same as the serial one. I've done a lot of work on reproducibility in machine learning systems, and it's really, really hard. Even the JVM got me by changing some functions in `java.lang.Math` between versions & platforms (while keeping to their documented 2 ulp error bounds).
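
A rough sketch of one mitigation on the JVM (just the general idea, not anyone's production code): `java.lang.Math` only promises to stay within its documented ulp bounds, so its results can legally differ between platforms and releases, while `java.lang.StrictMath` is specified against fdlibm and returns the same bits everywhere, at some cost in speed. Routing transcendentals through one wrapper makes the fast/reproducible trade-off a one-line change:

    // Sketch: pinning transcendental results across JVM versions and platforms.
    // Math.* only has to stay within its documented ulp bounds, so results may
    // legally vary; StrictMath.* is defined against fdlibm and is bit-reproducible.
    public class ReproMath {
        // Hypothetical wrapper so a codebase can flip between fast and
        // reproducible in one place.
        static double exp(double x)  { return StrictMath.exp(x); }
        static double log(double x)  { return StrictMath.log(x); }
        static double tanh(double x) { return StrictMath.tanh(x); }

        public static void main(String[] args) {
            double x = 0.7390851332151607;
            // Print exact bit patterns; Math.tanh may differ by platform or
            // release, StrictMath.tanh will not.
            System.out.println("Math.tanh:       " + Double.toHexString(Math.tanh(x)));
            System.out.println("StrictMath.tanh: " + Double.toHexString(StrictMath.tanh(x)));
        }
    }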
Dylan16807 13 hours ago
Most people aren't spreading single calculations across multiple cores, and the ones that do are already deep enough into the technical weeds to handle deterministic chunking and combining. A "parallel decomposition that's the same as the serial one" would be difficult in many ways, but it's only needed when you can't afford a one-time change in results.
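
A minimal Java sketch of what "deterministic chunking and combining" can look like (the class name and chunk size are made up for illustration): the chunk boundaries and the combine order depend only on the data length, never on thread scheduling, so the sum is bit-identical from run to run (though not bit-identical to a plain left-to-right serial loop, since floating-point addition isn't associative).

    import java.util.stream.IntStream;

    public class DeterministicSum {
        static final int CHUNK = 1 << 16;  // fixed chunk size, part of the contract

        static double sum(double[] a) {
            int nChunks = (a.length + CHUNK - 1) / CHUNK;
            double[] partial = new double[nChunks];
            // Partial sums may finish in any order on any number of threads...
            IntStream.range(0, nChunks).parallel().forEach(c -> {
                int lo = c * CHUNK, hi = Math.min(lo + CHUNK, a.length);
                double s = 0.0;
                for (int i = lo; i < hi; i++) s += a[i];  // left-to-right within the chunk
                partial[c] = s;
            });
            // ...but they are always combined in chunk-index order.
            double total = 0.0;
            for (double p : partial) total += p;
            return total;
        }

        public static void main(String[] args) {
            double[] data = new double[1_000_000];
            for (int i = 0; i < data.length; i++) data[i] = 1.0 / (i + 1);
            System.out.println(Double.toHexString(sum(data)));  // same bits every run
        }
    }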
AlotOfReading 16 hours ago
Many systems I've worked on are either serial or can be written as embarrassingly parallel with synchronized checkpoints for a relatively small cost, but yeah, the middle ground is hard. That's the nature of the beast though. Imagine trying to rearrange the instructions in your program the same way: you're going to get all sorts of wacky behavior unless you invest an enormous amount of effort into avoiding problems the way superscalar CPUs do. Relatively few programs truly need to operate that way, though, and they're usually written by people who know the cost of their choices.
saagarjha 13 hours ago
You can definitely do operations in a reproducible way (assuming you do the reductions in a defined order), but, yeah, you might lose some performance. Unfortunately the ML people seem to pick better performance over correctness basically every time :(
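
As a hedged illustration of "reductions in a defined order" (a generic sketch, not how any particular ML framework does it): if the split points of a fork/join reduction depend only on the indices, the addition tree, and therefore the rounding, is the same whether the pool has one worker or sixty-four. The cost is a fixed tree shape you can't tune to the hardware on the fly.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Sketch of a reduction with a defined order: the split points depend only
    // on the indices, so the addition tree (and therefore the rounding) is the
    // same no matter how many workers the pool has or how they're scheduled.
    public class FixedTreeSum extends RecursiveTask<Double> {
        static final int THRESHOLD = 1 << 15;  // fixed, not tuned per run
        final double[] a; final int lo, hi;

        FixedTreeSum(double[] a, int lo, int hi) { this.a = a; this.lo = lo; this.hi = hi; }

        @Override protected Double compute() {
            if (hi - lo <= THRESHOLD) {
                double s = 0.0;
                for (int i = lo; i < hi; i++) s += a[i];  // serial base case
                return s;
            }
            int mid = lo + (hi - lo) / 2;                 // split depends only on indices
            FixedTreeSum left = new FixedTreeSum(a, lo, mid);
            FixedTreeSum right = new FixedTreeSum(a, mid, hi);
            left.fork();
            double r = right.compute();
            double l = left.join();
            return l + r;                                 // always (left + right), in that order
        }

        public static double sum(double[] a) {
            return ForkJoinPool.commonPool().invoke(new FixedTreeSum(a, 0, a.length));
        }
    }

Calling FixedTreeSum.sum(data) from a pool of any size gives the same bits, because scheduling only changes who does each addition, not which additions happen.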