One reason it's slower is because your stack doesn't reserve any memory and grows to 40441 at its max size then shrinks back down again. Stack uses a dequeue by default which stores elements in chunks which likely causes lots of memory allocations (and deallocations) which don't happen in the recursive version. Also at n=80,000 your recursive version blows the stack.
https://en.cppreference.com/w/cpp/container/deque.html
The stack arithmetic is handled in hardware increasing IPC significantly, and the 'frame' you are talking about it almost the same same size as a single value in the happy path when all the relevant optimizations work out.
The program stack isn't magically special, it isn't going to beat writing a single value to memory, especially if that memory is some constant sized array already on the stack.
Debugging recursive programs is pretty neat with most debuggers. No, you don't unwind through anything manually, just generate a backtrace.
No matter what kind of debugger it is you're still going to be looking at a lot of information that contains the values you're looking for instead of just looking at the values directly in an array.
Recursion gets used because it's quick, dirty and clever, not because it's the best way to do it.