CyberDildonics 15 hours ago
One reason it's slower is that your stack doesn't reserve any memory: it grows to a maximum size of 40441, then shrinks back down again. std::stack uses a deque by default, which stores elements in chunks. That likely causes lots of memory allocations (and deallocations) that don't happen in the recursive version. Also, at n=80,000 your recursive version blows the stack.

https://en.cppreference.com/w/cpp/container/deque.html

> The stack arithmetic is handled in hardware, increasing IPC significantly, and the 'frame' you are talking about is almost the same size as a single value in the happy path when all the relevant optimizations work out.

The program stack isn't magically special. It isn't going to beat writing a single value to memory, especially if that memory is some constant-sized array already on the stack.

> Debugging recursive programs is pretty neat with most debuggers. No, you don't unwind through anything manually, just generate a backtrace.

No matter what kind of debugger it is, you're still going to be looking at a lot of information that contains the values you're looking for, instead of just looking at the values directly in an array.

Recursion gets used because it's quick, dirty and clever, not because it's the best way to do it.
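
To make the allocation point concrete, here is a minimal sketch of a stack that reserves its memory up front. It is not the code from this thread: the function names and the `capacity` parameter are made up for illustration, and the 40441 figure is just the maximum size mentioned above.

```cpp
#include <cstddef>
#include <stack>
#include <utility>
#include <vector>

// Hypothetical helper: a std::stack adapter backed by a std::vector
// (instead of the default std::deque) with its storage reserved up front.
std::stack<int, std::vector<int>> make_reserved_stack(std::size_t capacity) {
    std::vector<int> storage;
    storage.reserve(capacity);  // single allocation happens here
    return std::stack<int, std::vector<int>>(std::move(storage));
}

// Hypothetical check: push `depth` values onto a pre-reserved vector used
// as a stack, pop them all, and report whether the capacity ever changed.
bool push_pop_without_realloc(std::size_t depth) {
    std::vector<int> stack;
    stack.reserve(depth);
    const std::size_t reserved = stack.capacity();

    for (std::size_t i = 0; i < depth; ++i) {
        stack.push_back(static_cast<int>(i));
    }
    const bool stable_while_growing = (stack.capacity() == reserved);

    while (!stack.empty()) {
        stack.pop_back();  // pop_back never releases vector storage
    }
    return stable_while_growing && stack.capacity() == reserved;
}
```

Calling `push_pop_without_realloc(40441)` should return true, since a vector does not reallocate while its size stays within the reserved capacity. Whether this changes the measured run time is a separate question that only a benchmark of the actual code can answer.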
fooker 14 hours ago
Go on, 'Prove it'. Write a version that's faster.

I know it's doable, because I have done it. You don't seem to understand yet how complex it will be. My guess is ~10x the number of lines of code. It'll be significantly less readable, let alone debuggable.

(By the way, switching from stack to vector and reserving memory up front for the allocations makes virtually no difference in performance.)

> The program stack isn't magically special

This is what you're missing. Yes, it is magical, because the hardware optimizes for that path. That's why it's faster than what you'd think from first principles.

> it isn't going to beat writing a single value to memory

If you examine the kernel trace from this, you'll find that it has the exact same memory usage and bandwidth (and about twice the IPC). Magical, yes.
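
For readers following along, a minimal sketch of the two styles being compared, using a made-up example (summing the nodes of a binary tree) rather than the code discussed in this thread. `Node`, `make_node`, `sum_recursive` and `sum_iterative` are all hypothetical names. This is the easy case, where nothing has to be combined after the children return; algorithms that do work after the recursive call need extra bookkeeping in the explicit-stack form.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tree node used only for this illustration.
struct Node {
    std::int64_t value = 0;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

// Hypothetical helper to build small trees.
std::unique_ptr<Node> make_node(std::int64_t value,
                                std::unique_ptr<Node> left,
                                std::unique_ptr<Node> right) {
    auto node = std::make_unique<Node>();
    node->value = value;
    node->left = std::move(left);
    node->right = std::move(right);
    return node;
}

// Recursive version: the program stack holds the pending work.
std::int64_t sum_recursive(const Node* node) {
    if (node == nullptr) return 0;
    return node->value + sum_recursive(node->left.get()) +
           sum_recursive(node->right.get());
}

// Iterative version: an explicit vector holds the pending nodes.
std::int64_t sum_iterative(const Node* root) {
    std::vector<const Node*> pending;
    pending.reserve(64);  // arbitrary initial capacity, an assumption
    if (root != nullptr) pending.push_back(root);

    std::int64_t total = 0;
    while (!pending.empty()) {
        const Node* node = pending.back();
        pending.pop_back();
        total += node->value;
        if (node->left) pending.push_back(node->left.get());
        if (node->right) pending.push_back(node->right.get());
    }
    return total;
}
```

Both functions return the same sum for the same tree. Which one is faster depends on the compiler, the hardware and the shape of the input, so this sketch does not settle the disagreement above.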