Remix.run Logo
pizlonator 5 days ago

Would your modern game have a stack that is so deep that it brushes up against the stack height limit?

Probably not. Your game would be inches of stack away from crashing

debugnik 5 days ago | parent [-]

You're missing the point, they're giving an example of an entire workload that fits into your technique's worst-case overhead. It's could be the right trade-off and rarely be hit, but that worst-case does sound bad.

pizlonator 5 days ago | parent | next [-]

Stacks tend to be small enough that the cost of scanning them is minuscule.

(I’m not trying to BS my way here - I’m explaining the reason why on the fly GC optimization almost never involves doing stuff about the time it takes to scan stack. It’s just not worth it. If it was, we’d be seeing a lot of stacklet type optimizations.)

torginus 5 days ago | parent | prev | next [-]

From what he describes, he uses stack maps to tell which stack values are pointers. He can skip over everything that's not a pointer.

On x86_64 you need about 10k function deep stack, all of them with the 14 GPs filled with pointers -to have an 1MB stack.

pizlonator 5 days ago | parent [-]

To play devil's advocate, the suckiest part about stack scanning is that it's a linked list walk. It's not a linear scan. So it's all pointer chasing. And it's very likely to find previously unmarked pointers, which involves CAS and other work.

(It would be a linear scan if I was just conservatively scanning, but then I'd have other problems.)

This is one of the most counterintuitive parts of GC perf! You'd think that the stack scan had to be a bottleneck, and it might even be one in some corner cases. But it's just not the longest pole in the tent most of the time, because you're so unlikely to actually have a 1MB stack, and programs that do have a 1MB stack tend to also have ginormous heaps (like many gigabytes), which then means that even if the stack scan is a problem it's not the problem.

kragen 4 days ago | parent [-]

You're writing the compiler, though, so you can define the stack layout. If the stack-scanning linked-list walk were the long pole, it wouldn't be hard to eliminate the pointer chasing: your procedure prologue could add a pointer to each newly pushed stack frame to something like a std::deque, then pop it off in the epilogue.

I don't know, maybe the fact that I'm disagreeing with someone who knows a lot more than I do about the issue should be a warning sign that I'm probably wrong?

adastra22 5 days ago | parent | prev | next [-]

^ this was the intent of the example.

kristofferc 5 days ago | parent | prev [-]

Actually, it sounds quite ok.