Yes, but the for loop comes with data dependencies that prevent it from being trivially parallelized.
The algorithm with fewer data dependencies is O(N log N).
This is covered in more detail in the article.
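
To make the trade-off concrete, here is a rough sketch using prefix sum as a stand-in. That choice is my assumption, since the comment above does not name the algorithm, and the function names `scan_sequential` and `scan_hillis_steele` are made up for illustration. The loop version does O(N) work, but each iteration needs the previous one's result. The second version (a Hillis-Steele style scan) does O(N log N) work, but within each pass every element depends only on the previous pass, so the elements could be computed in parallel.

```python
def scan_sequential(xs):
    """Inclusive prefix sum with a plain loop: O(N) work.

    Each iteration reads `total` written by the previous iteration,
    so the iterations cannot run independently.
    """
    out = []
    total = 0
    for x in xs:
        total += x  # loop-carried dependency
        out.append(total)
    return out


def scan_hillis_steele(xs):
    """Inclusive prefix sum in ceil(log2(N)) passes: O(N log N) work.

    Within one pass, every output element reads only from `cur`
    (the previous pass), so the elements of a pass are independent.
    This sketch still runs them one by one; it only shows the
    dependency structure, not an actual parallel implementation.
    """
    cur = list(xs)
    n = len(cur)
    offset = 1
    while offset < n:
        # Build a fresh list so no element reads a value updated in this pass.
        cur = [
            cur[i] + cur[i - offset] if i >= offset else cur[i]
            for i in range(n)
        ]
        offset *= 2
    return cur
```

Both functions return the same result for the same input; the difference is only in total work and in which operations depend on each other.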