moron4hire 2 days ago

No, again, that's not my point. The code from the article is O(2n) when it could be O(n). I know we're not supposed to care about constant factors, but I've lived in a world where not micro optimizing the ever loving shit out of my software could potentially make people throw up, so this sort of stuff kind of stands out to me.

repiret 2 days ago | parent | next [-]

The code in the article is written in Python, whose only for loop is for-each. It is 2N XOR operations, regardless of whether you use one or two loops.

I probably would have written it with a single loop, using the `enumerate` iterator adapter. But in Python, two loops is almost certainly more efficient.

Dylan16807 2 days ago | parent | next [-]

You can loop over the range and then do result ^= i ^ A[i]. If adapters are slow you don't need them here.

Having only one loop gives you a better memory access pattern, because it's 2 XOR operations in between each memory access. Two loops is the same number of instructions, but it spends one loop ignoring memory and then another loop doing rapid-fire memory accesses. In Python there's enough overhead that it's unlikely to matter. But in a faster language running on a busy machine that could make a real difference.
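
Concretely, a sketch of that single loop, assuming (as in the article) that A holds the numbers 1..n with exactly one value missing; the function name is just for illustration:

    def find_missing(A):
        # assumes A holds 1..n with exactly one value missing (the article's setup)
        n = len(A) + 1
        result = 0
        for i in range(len(A)):
            result ^= (i + 1) ^ A[i]   # index and element folded into one pass
        result ^= n                    # the loop's indices stop at n-1, so XOR in n separately
        return result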

sfn42 a day ago | parent | prev [-]

    for i in range(n - 1):

Is pretty much a standard for loop. Between that and

    for n in numbers:

You can do pretty much the same things as a more conventional language.

You could also solve it pretty simply like this:

    expected_sum = n * (n + 1) // 2
    missing_num = expected_sum - sum(numbers)

This only iterates the list once. This would probably be my solution if I was given this task in an interview.
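
Put together, a rough sketch of that (the function name, and taking n as len(numbers) + 1 per the article's setup, are just for illustration):

    def find_missing_by_sum(numbers):
        n = len(numbers) + 1                 # numbers holds 1..n with one value missing
        expected_sum = n * (n + 1) // 2      # integer division keeps everything an int
        return expected_sum - sum(numbers)   # single pass over the list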

perfmode 2 days ago | parent | prev | next [-]

Real-world performance will depend on how much of that N fits in cache, and in which cache it fits (L1, L2, L3). Once loaded, you may not pay much cost to access each value a second time.

MindSpunk 2 days ago | parent [-]

Doing two loops over the data means you make a full pass over it twice. If your N doesn't fit in L3 then you're going to load N twice instead of once. Loading twice, even out of L1, is still slower than loading once.

moron4hire 2 days ago | parent [-]

Exactly. And there's also the fact that sometimes the data stream we're processing is unbounded and ephemeral. For example, reading values from a physical sensor. It may not match up to this specific example, but the point remains that a single "loop" over a data set might be all you get, so pack as much into that loop as you can.
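
As an illustrative sketch of that idea (the names and the particular statistics are made up), when the input is a one-shot iterator you fold everything you need into the single pass you get:

    def summarize_stream(readings):
        # readings may be a one-shot iterator (e.g. live sensor samples),
        # so compute every statistic we care about in the same loop
        count, total = 0, 0.0
        lo, hi = float("inf"), float("-inf")
        for r in readings:
            count += 1
            total += r
            lo = min(lo, r)
            hi = max(hi, r)
        return count, total, lo, hi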

NoahZuniga 2 days ago | parent | prev [-]

O(2n) doesn't exist. The whole point of big O is that you ignore such "trivial" things as what factor comes before the n.

moron4hire 2 days ago | parent [-]

Did I not say that?

haskellshill 2 days ago | parent | next [-]

>we end up performing two loops instead of one, all because sticking three operations in one statement is "yucky"

You seem to believe that "O(2n)"

  for value in range(1, n + 1):
    result ^= value
  for value in A:
    result ^= value
is slower than "O(n2)"

  for value in range(1, n + 1):
    result ^= value
    result ^= A[value-1]
simply because the latter has one "for loop" less. Am I misunderstanding you, or if not, why would this matter for speed?
a_t48 2 days ago | parent | next [-]

Unless both loops get unrolled it's ever so slightly slower due to having to check for the end value twice. Plus potentially a cache miss at the start of the second loop.

codebje 2 days ago | parent | prev | next [-]

None of this is as straightforward as it seems.

A "for" loop in Python isn't particularly cheap. It compiles to some static overhead to set up the iterator, then each loop iteration compiles to the "FOR_ITER" opcode, a "STORE_FAST" opcode to assign the iteration value to a variable, and the body of the loop.

"FOR_ITER" calls the "__next__()" method of the iterator (which is on top of the interpreter object stack), catches the StopIteration exception to know when to terminate the loop (by jumping past the loop body), and stores the iterator value to the top of the stack. What "__next__()" does is totally opaque - we don't know what kind of object A is - but since we've added the overhead of a function call already it wouldn't matter if it was a super tight bit of machine code, we're already paying a (relatively) hefty runtime cost.

A particularly bad implementation of "__next__()" for some custom iterable collection might be so stupid as to walk through the collection until it reaches the current index's item and returns that, so "for value in A" could in fact be O(n^2).

Plus, "result ^= A[value-1]" is substantially more work than "result ^= value", so even just on the loop bodies the two examples aren't very similar at all. Evaluating "A[value-1]" may wind up calling a "__getitem__()" method on A.

If A is, say, a linked list or binary tree, iterating it is very cheap but indexing it is O(n), so the second loop might be O(n^2) where the first is O(n).
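
As a toy example (purely illustrative), a container whose iteration is cheap but whose __getitem__ walks from the head on every call, so an indexed loop over it is quadratic:

    class ToyLinkedList:
        def __init__(self, values):
            self._head = None
            for v in reversed(values):     # build a chain of (value, next) pairs
                self._head = (v, self._head)

        def __iter__(self):                # O(1) per element
            node = self._head
            while node is not None:
                value, node = node
                yield value

        def __getitem__(self, index):      # O(index): restarts from the head every call
            for i, value in enumerate(self):
                if i == index:
                    return value
            raise IndexError(index)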

So maybe we should be a bit more Pythonic, and do:

    for i, value in enumerate(A):
        result ^= i
        result ^= value
One loop, no indexing of A! But we've not actually saved anything: the __next__() method of enumerate's iterator will increment its index then call the __next__() method of A's iterator, (approximately) the same work as if we'd done two FOR_ITER, one for an index and one for A.

Why would this matter for speed? I don't know. Unless 'n' is pretty big a human won't even notice the execution time of any of this code.

moron4hire 2 days ago | parent | prev [-]

Even assuming Python's foreach loop in these cases gets optimized down to a very bare for loop, the operations being performed are dominated by the looping logic itself, because the loop body is so simple.

Each iteration of a for loop performs one index update and one termination comparison. For a body as simple as an XOR, that's the difference between 5 operations per element (update, exit check, read array, XOR with value, XOR with index) in the one-loop case versus 7 operations per element (update, exit, read array, XOR with value, then update, exit, XOR with index) in the two-loop case. So we're looking at a roughly 29% savings in operations.

It gets worse if the looping structure does not optimize to a raw, most basic for loop and instead constructs some kind of lazy collection iterator generalized for all kinds of collections it could iterate over.

The smaller the loop body, the higher the gains from optimizing the looping construct itself.

codebje 2 days ago | parent | prev | next [-]

You did, but it might not be an effective strategy to mention asymptotic complexity to help forward your argument that one linear implementation is faster than another.

Whether it's a win in Python to use one or two loops isn't so clear, as a lot is hidden behind complex opcodes and opaque iterator implementations. Empirical testing might help, but a new interpreter version could change your results.
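
For instance, a rough sketch of such a measurement with the standard `timeit` module (the size is arbitrary, and the numbers will shift between interpreter versions):

    import timeit

    n = 10**6
    A = list(range(1, n + 1))

    def two_loops():
        result = 0
        for value in range(1, n + 1):
            result ^= value
        for value in A:
            result ^= value
        return result

    def one_loop():
        result = 0
        for i, value in enumerate(A):
            result ^= (i + 1) ^ value
        return result

    print("two loops:", timeit.timeit(two_loops, number=10))
    print("one loop: ", timeit.timeit(one_loop, number=10))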

In any case, if we want to nitpick over performance we should be insisting on a parallel implementation to take advantage of the gobs of cores CPUs now have, but now we're on a micro-optimisation crusade and are ignoring the whole point of the article.

iainmerrick a day ago | parent | prev [-]

You said the code from the article is O(2n) when it could be O(n), but those are the same thing.