Remix.run Logo
TimorousBestie 3 days ago

The single-thread performance of the parallel prefix sum that they use is O(N log N), so the improvement from that to O(log N) on N threads is not as surprising.

The way the headline is written, it sounds like Amdahl’s law was violated. It wasn’t, of course.

casta 3 days ago | parent [-]

How's the prefix sum on a single thread O(N log(N))? Isn't it trivially O(N)? It's just a for loop.

TimorousBestie 3 days ago | parent | next [-]

Yes, but for loop comes with all those data dependencies that prevent it from being parallelized trivially.

The algorithm with fewer data dependencies is O(N log N).

This is covered in more detail in the article.

gyrovagueGeist 3 days ago | parent | prev [-]

It's from the depth of the computation, not the work