NoahZuniga 2 days ago

O(2n) doesn't exist. The whole point of big O is that you ignore "trivial" things like the constant factor in front of the n.
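By definition, f is O(g) if f(n) <= c*g(n) for some constant c and all sufficiently large n; taking c = 2 shows that 2n is O(n), so "O(2n)" and "O(n)" describe exactly the same set of functions.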

moron4hire 2 days ago | parent [-]

Did I not say that?

haskellshill 2 days ago | parent | next [-]

>we end up performing two loops instead of one, all because sticking three operations in one statement is "yucky"

You seem to believe that "O(2n)"

  result = 0
  for value in range(1, n + 1):
    result ^= value
  for value in A:
    result ^= value
is slower than "O(n·2)"

  result = 0
  for value in range(1, n + 1):
    result ^= value
    result ^= A[value-1]
simply because the latter has one fewer "for loop". Am I misunderstanding you? If not, why would this matter for speed?
a_t48 2 days ago | parent | next [-]

Unless both loops get unrolled, it's ever so slightly slower due to having to check for the end value twice. Plus potentially a cache miss at the start of the second loop.
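If you care, it's cheap to measure rather than argue from first principles. A minimal timeit sketch (numbers will vary by interpreter version and hardware; n and the inputs here are arbitrary):

    import timeit

    def two_loops(n, A):
        result = 0
        for value in range(1, n + 1):
            result ^= value
        for value in A:
            result ^= value
        return result

    def one_loop(n, A):
        result = 0
        for value in range(1, n + 1):
            result ^= value
            result ^= A[value - 1]
        return result

    n = 10_000
    A = list(range(1, n + 1))
    print("two loops:", timeit.timeit(lambda: two_loops(n, A), number=1_000))
    print("one loop: ", timeit.timeit(lambda: one_loop(n, A), number=1_000))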

codebje 2 days ago | parent | prev | next [-]

None of this is as straightforward as it seems.

A "for" loop in Python isn't particularly cheap. It compiles to some static overhead to set up the iterator, then each loop iteration compiles to the "FOR_ITER" opcode, a "STORE_FAST" opcode to assign the iteration value to a variable, and the body of the loop.

"FOR_ITER" calls the "__next__()" method of the iterator (which is on top of the interpreter object stack), catches the StopIteration exception to know when to terminate the loop (by jumping past the loop body), and stores the iterator value to the top of the stack. What "__next__()" does is totally opaque - we don't know what kind of object A is - but since we've added the overhead of a function call already it wouldn't matter if it was a super tight bit of machine code, we're already paying a (relatively) hefty runtime cost.

A particularly bad implementation of "__next__()" for some custom iterable collection might be so stupid as to walk through the collection until it reaches the current index's item and return that, so "for value in A" could in fact be O(n^2).
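As a sketch of that worst case, here's a hypothetical iterable whose iterator re-walks the collection from the front on every call, so a plain "for" loop over it is quadratic:

    class BadIterable:
        def __init__(self, items):
            self._items = list(items)

        def __iter__(self):
            return BadIterator(self._items)

    class BadIterator:
        def __init__(self, items):
            self._items = items
            self._pos = 0

        def __iter__(self):
            return self

        def __next__(self):
            # Deliberately stupid: walk from the front to the current
            # position on every call, making each step O(pos).
            walker = iter(self._items)
            for _ in range(self._pos):
                next(walker)
            item = next(walker)   # raises StopIteration at the end
            self._pos += 1
            return item

Iterating it looks linear at the call site ("for value in BadIterable(A)") but does O(n^2) work overall.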

Plus, "result ^= A[value-1]" is substantially more work than "result ^= value", so even just on the loop bodies the two examples aren't very similar at all. Evaluating "A[value-1]" may wind up calling a "__getitem__()" method on A.

If A is, say, a linked list or binary tree, iterating it is very cheap but indexing it is O(n), so the second loop might be O(n^2) where the first is O(n).
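A toy singly linked list (a sketch, with hypothetical names) makes the asymmetry concrete: the iterator follows one pointer per step, while "__getitem__()" has to walk from the head on every call:

    class Node:
        def __init__(self, value, nxt=None):
            self.value = value
            self.next = nxt

    class LinkedList:
        def __init__(self, values):
            self.head = None
            for v in reversed(list(values)):
                self.head = Node(v, self.head)

        def __iter__(self):
            # O(1) per step: just chase the next pointer.
            node = self.head
            while node is not None:
                yield node.value
                node = node.next

        def __getitem__(self, index):
            # O(n) per call: walk from the head each time.
            node = self.head
            for _ in range(index):
                if node is None:
                    raise IndexError(index)
                node = node.next
            if node is None:
                raise IndexError(index)
            return node.value

So the first loop stays O(n) over a LinkedList, while the second does an O(n) walk per element: O(n^2) in total.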

So maybe we should be a bit more Pythonic, and do:

    for i, value in enumerate(A, start=1):  # start=1 to match range(1, n + 1)
        result ^= i
        result ^= value
One loop, no indexing of A! But we've not actually saved anything: the __next__() method of enumerate's iterator will increment its index and then call the __next__() method of A's iterator - (approximately) the same work as if we'd done two FOR_ITERs, one for the index and one for A.
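Roughly what enumerate's iterator does under the hood (a pure-Python sketch; the real implementation is in C):

    def my_enumerate(iterable, start=0):
        # Each step does one extra next() hop on top of the underlying
        # iterator - about the same work as running two iterators.
        i = start
        for value in iterable:
            yield i, value
            i += 1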

Why would this matter for speed? I don't know. Unless 'n' is pretty big, a human won't even notice the execution time of any of this code.

moron4hire 2 days ago | parent | prev [-]

Even assuming Python's foreach loop in these cases gets optimized down to a very bare for loop, the operations being performed are dominated by the looping logic itself, because the loop body is so simple.

Each iteration of a for loop performs one index update and one termination comparison. For a simple body that's just an XOR, that's the difference between performing 5 operations per element (update, exit check, read array, XOR with value, XOR with index) in the one-loop case versus 7 (update, exit, read array, XOR with value, then update, exit, XOR with index) in the two-loop case. So we're looking at a roughly 29% savings in operations.
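Spelled out as bare index loops (a sketch of the operation counting, not of anything a compiler actually emits):

    A = [3, 1, 4, 1, 5]       # any list; the XOR bodies are stand-ins
    n = len(A)

    # One fused loop: ~5 ops per element.
    result = 0
    i = 1
    while i <= n:             # exit check
        result ^= i           # XOR with index
        result ^= A[i - 1]    # array read + XOR with value
        i += 1                # index update

    # Two loops: ~7 ops per element across both.
    result = 0
    i = 1
    while i <= n:             # exit check
        result ^= i           # XOR with index
        i += 1                # index update
    i = 1
    while i <= n:             # exit check
        result ^= A[i - 1]    # array read + XOR with value
        i += 1                # index update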

It gets worse if the looping structure does not optimize down to a raw, most basic for loop and instead constructs some kind of lazy iterator generalized over all the kinds of collections it might traverse.

The smaller the loop body, the higher the gains from optimizing the looping construct itself.

codebje 2 days ago | parent | prev | next [-]

You did, but mentioning asymptotic complexity might not be an effective way to support your argument that one linear implementation is faster than another.

Whether it's a win in Python to use one or two loops isn't so clear, as a lot is hidden behind complex opcodes and opaque iterator implementations. Empirical testing might help, but a new interpreter version could change your results.

In any case, if we want to nitpick over performance we should be insisting on a parallel implementation to take advantage of the gobs of cores CPUs now have, but now we're on a micro-optimisation crusade and are ignoring the whole point of the article.

iainmerrick a day ago | parent | prev [-]

You said the code from the article is O(2n) when it could be O(n), but those are the same thing.