new | show | ask | jobs Github

adastra22 5 days ago

That's a very long time. Milliseconds of work is an entire frame update-render cycle in a modern game.

▲

pizlonator 5 days ago | parent | next [-]

Would your modern game have a stack that is so deep that it brushes up against the stack height limit?

Probably not. Your game would be inches of stack away from crashing

▲

debugnik 5 days ago | parent [-]

You're missing the point, they're giving an example of an entire workload that fits into your technique's worst-case overhead. It's could be the right trade-off and rarely be hit, but that worst-case does sound bad.

▲

pizlonator 5 days ago | parent | next [-]

Stacks tend to be small enough that the cost of scanning them is minuscule.

(I’m not trying to BS my way here - I’m explaining the reason why on the fly GC optimization almost never involves doing stuff about the time it takes to scan stack. It’s just not worth it. If it was, we’d be seeing a lot of stacklet type optimizations.)

▲

torginus 5 days ago | parent | prev | next [-]

From what he describes, he uses stack maps to tell which stack values are pointers. He can skip over everything that's not a pointer.

On x86_64 you need about 10k function deep stack, all of them with the 14 GPs filled with pointers -to have an 1MB stack.

▲

pizlonator 5 days ago | parent [-]

To play devil's advocate, the suckiest part about stack scanning is that it's a linked list walk. It's not a linear scan. So it's all pointer chasing. And it's very likely to find previously unmarked pointers, which involves CAS and other work.

(It would be a linear scan if I was just conservatively scanning, but then I'd have other problems.)

This is one of the most counterintuitive parts of GC perf! You'd think that the stack scan had to be a bottleneck, and it might even be one in some corner cases. But it's just not the longest pole in the tent most of the time, because you're so unlikely to actually have a 1MB stack, and programs that do have a 1MB stack tend to also have ginormous heaps (like many gigabytes), which then means that even if the stack scan is a problem it's not the problem.

	▲	kragen 4 days ago \| parent [-]
		You're writing the compiler, though, so you can define the stack layout. If the stack-scanning linked-list walk were the long pole, it wouldn't be hard to eliminate the pointer chasing: your procedure prologue could add a pointer to each newly pushed stack frame to something like a std::deque, then pop it off in the epilogue. I don't know, maybe the fact that I'm disagreeing with someone who knows a lot more than I do about the issue should be a warning sign that I'm probably wrong?

▲

adastra22 5 days ago | parent | prev | next [-]

^ this was the intent of the example.

▲

kristofferc 5 days ago | parent | prev [-]

Actually, it sounds quite ok.

▲

munificent 5 days ago | parent | prev | next [-]

Games don't tend to have very deep callstacks. And if a game cared about performance also wanted to use GC, it would probably try to run the GC at the end of a frame when there is little on the stack.

	▲	pizlonator 5 days ago \| parent \| next [-]
		Yeah UE GC safepoints at end of tick where there is no stack. That’s a common trick in systems that have both GC and ticking. To be fair, FUGC doesn’t currently let you do that. The GC runs in a separate thread and soft handshakes at various points, which cause your game thread to react at poll checks and exits that might not be at end of tick. But I could add a feature that lets you to force handshake responses to be at end of tick! That sounds like a good idea
	▲	adastra22 5 days ago \| parent \| prev [-]
		FUGC runs the GC in a separate thread and you don’t have a lot of control over when it interrupts.

▲

kragen 5 days ago | parent | prev [-]

Latency-sensitive programs like games are usually careful to avoid deep recursion.