Prove it. You can put your stack data structure on the stack anyway. A balanced tree isn't going to have more depth than your memory address bit length. Why would copying a single value be slower than pushing an entire call frame to the stack? Locality is what matters and there is no truth to what you're saying.

More important is the debugability. If you have a normal data structure you can see the full stack of values. If you use recursion you have to unwind through multiple call frames and look at each one individually.

Recursion is for people who want to show a neat clever trick, it isn't the best way to program.

▲

fooker a day ago | parent [-]

Here you go : https://godbolt.org/z/haroWGa3c

Recursive DFS took 4 ms Iterative DFS took 8 ms

I'm sure you could optimize the explicit stack based one a bit more to reach parity with a significantly more complex program.

But might as well let ~75 years of hardware, OS, and compiler advancements do that for you when possible.

> Why would copying a single value be slower than pushing an entire call frame to the stack

Because that's not what happens. The stack arithmetic is handled in hardware increasing IPC significantly, and the 'frame' you are talking about it almost the same same size as a single value in the happy path when all the relevant optimizations work out.

> More important is the debugability

Debugging recursive programs is pretty neat with most debuggers. No, you don't unwind through anything manually, just generate a backtrace.

▲

CyberDildonics 15 hours ago | parent [-]

One reason it's slower is because your stack doesn't reserve any memory and grows to 40441 at its max size then shrinks back down again. Stack uses a dequeue by default which stores elements in chunks which likely causes lots of memory allocations (and deallocations) which don't happen in the recursive version. Also at n=80,000 your recursive version blows the stack.

https://en.cppreference.com/w/cpp/container/deque.html

The stack arithmetic is handled in hardware increasing IPC significantly, and the 'frame' you are talking about it almost the same same size as a single value in the happy path when all the relevant optimizations work out.

The program stack isn't magically special, it isn't going to beat writing a single value to memory, especially if that memory is some constant sized array already on the stack.

Debugging recursive programs is pretty neat with most debuggers. No, you don't unwind through anything manually, just generate a backtrace.

No matter what kind of debugger it is you're still going to be looking at a lot of information that contains the values you're looking for instead of just looking at the values directly in an array.

Recursion gets used because it's quick, dirty and clever, not because it's the best way to do it.

▲

fooker 14 hours ago | parent [-]

Go on, 'Prove it'. Write a version that's faster.

I know it's doable, because I have done it.

You don't seem to understand yet how complex it will be. My guess is ~10x the number of lines of code. It'll be significantly less readable, let alone debuggable.

(btw changing from stack to vector and reserving memory outright for the allocations has virtually no change in performance.)

> The program stack isn't magically special

This is what you're missing. Yes, it is magical because the hardware optimizes for that path. That's why it's faster than what you'd think from first principles.

> it isn't going to beat writing a single value to memory

If you examine the kernel trace from this, you'll find that it has the exact same memory usage and bandwidth (and about twice the ipc). Magical, yes.

	▲	CyberDildonics 14 hours ago \| parent [-]
		Go on, 'Prove it'. Write a version that's faster. You're trying to say that pushing a function call frame is faster than writing a single value to memory and incrementing an index, and when asked to prove it you made something that allocates and deallocates chunks of memory to a linked list where every access takes multiple pointer dereferences.