DarkNova6 8 months ago

But stack allocated objects are not part of the heap and therefore not even part of Garbage Collection? And afaik stack allocation is already done for objects which don't escape a method.

masklinn 8 months ago | parent [-]

Yes, but that’s the point: objects which don’t escape are pretty much all young objects. So by this process the stack captures a significant fraction of the young generation; that part of the young generation never reaches the heap and thus is never under consideration by the GC.

Essentially the stack is a form of younggen. It is not as complete (as there are things which must be heap allocated) but because it exists, it reduces the benefits of a generational GC… without having much impact on its costs and complexity.

Depending on workload, that competition can be sufficient to make a generational GC net negative.

pron 8 months ago | parent | next [-]

1. How significant that portion is may be lower than you think.

2. Stack allocation adds complexity that actually can adversely affect performance. The main problem is that for stack objects to behave more like heap objects, you need to be able to reference them. References into the stack make user-mode threads less flexible. For example, Go takes a significant hit for Go-native interop in goroutines, whereas Java doesn't pay that cost.

3. Why do you, as a user, care if the GC is more complex?

masklinn 8 months ago | parent [-]

> How significant that portion is may be lower than you think.

And it’s probably higher than you think.

> References into the stack make user-mode threads less flexible. For example, Go takes a significant hit for Go-native interop in goroutines, whereas Java doesn't pay that cost.

This has nothing to do with on-stack structure, it has to do with Go not using C-compatible stacks. If this was an actual issue C would take a hit when calling C.

> Why do you, as a user, care if the GC is more complex?

GC complexity impacts its cost and performance. Generational GCs have more overhead than non-generational ones. That cost needs to be reclaimed by avoiding full collections or the generational GC is a net loss.

And that is what the Go team observed on many workloads when they experimented with making Go’s GC generational.

pron 8 months ago | parent [-]

> And it’s probably higher than you think.

Possibly, but the question is does it negate the need for a generational GC? Judging by Go's poor GC performance compared to Java -- it doesn't.

> This has nothing to do with on-stack structure, it has to do with Go not using C-compatible stacks

Sure, but any user-mode thread implementation will not be using C-compatible stacks (if it wants to be efficient). Java doesn't use C-compatible stacks yet it doesn't take the hit Go does.

> GC complexity impacts its cost and performance. Generational GCs have more overhead than non-generational ones.

But in this case (ZGC) it doesn't come at a cost.

> And that is what the Go team observed on many workloads when they experimented with making Go’s GC generational.

Go's GC is several generations behind the GCs in Java. It also isn't compacting. Java had such a GC -- CMS, which served the JDK well for many years -- until the next generation (G1) and the next-next generation (ZGC) were developed, at which point it made little sense to keep a non-moving collector.

DarkNova6 8 months ago | parent | prev | next [-]

Thanks for the answer. But is this actual behaviour for the GCs of the JDK? I was certain that at the very least HotSpot makes use of stack allocation as much as possible.

But perhaps the JDK GCs don't care so much about the stack because that is already dealt with by the JVM a step prior? In any case, there will likely still be young objects allocated in the heap and this new algorithm might prove useful.

But you can tell I am far from an expert here.

masklinn 8 months ago | parent [-]

> Thanks for the answer. But is this actual behaviour for the GCs of the JDK? I was certain that at the very least HotSpot makes use of stack allocation as much as possible.

Not really. Java has some escape analysis, but it is very limited in its ability to stack allocate because it can't put entire structures (objects) on the stack. It only works if the compiler manages to scalar-replace the object (https://shipilev.net/jvm/anatomy-quarks/18-scalar-replacemen...), and that has somewhat restricted applicability (https://pkolaczk.github.io/overhead-of-optional/). The behaviour I'm talking about is mostly that of Go, which is much more capable of stack allocating and specifically has a non-generational GC because, in testing, they found generational GCs had a very variable impact depending on workload (rather than a universally or near-universally positive impact).

jerven 8 months ago | parent | next [-]

There was a Microsoft prototype for more stack allocation in OpenJDK (https://archive.fosdem.org/2020/schedule/event/reducing_gc_t...). I recall that being put on hold because of how it would interact with Project Loom's fast stack copying. But I don't know the current status.

Go has a non-moving GC, and I understand that the cost of introducing a safe moving GC is considered high. If one has a moving GC, which the serious Java ones are, read/write barriers are already required, especially if they are concurrent like ZGC, C4 or Shenandoah. ZGC, C4 and Shenandoah all started out as non-generational GC implementations and gained generations later, because in most cases generations do increase performance/reduce overhead.

Valhalla makes objects denser and reduces the overhead of identity, which is great. It reduces the difference in memory layout between Java objects and nested Go structs.

Go with arenas reduces the GC deallocation costs, something that the ZGC team is looking at in relation to Loom/virtual threads (but I can't find the reference for that right now).

DarkNova6 8 months ago | parent | prev [-]

Thank you for these excellent sources!

pfdietz 8 months ago | parent | prev [-]

I wonder if architectural support could be added to reduce the cost of recording modification information.
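
For readers wondering what "recording modification information" costs in software, here is a toy sketch of card marking, one classic way a generational collector tracks old-to-young references. It is not taken from any specific collector: `toyHeap`, `cardSize`, `store` and `youngRoots` are all hypothetical names, and the heap is just an array of slot indices. The point is that every reference store pays for an extra check and possibly a card write, which is the overhead discussed upthread.

```go
package main

import "fmt"

// cardSize is the number of slots per card (an arbitrary value for this toy).
const cardSize = 4

// toyHeap models a heap as an array of reference slots.
// Slots below youngLimit are "young"; the rest are "old".
type toyHeap struct {
	refs       []int  // refs[i] is the slot that slot i points to, or -1 for nil
	youngLimit int    // first old slot
	dirty      []bool // one flag per card
}

func newToyHeap(size, youngLimit int) *toyHeap {
	refs := make([]int, size)
	for i := range refs {
		refs[i] = -1
	}
	cards := (size + cardSize - 1) / cardSize
	return &toyHeap{refs: refs, youngLimit: youngLimit, dirty: make([]bool, cards)}
}

// store performs a reference write through the write barrier:
// an old slot that starts pointing at a young slot dirties its card.
func (h *toyHeap) store(slot, target int) {
	h.refs[slot] = target
	if slot >= h.youngLimit && target >= 0 && target < h.youngLimit {
		h.dirty[slot/cardSize] = true
	}
}

// youngRoots returns the old slots a minor collection has to scan.
// Only slots inside dirty cards are examined, not the whole old generation.
func (h *toyHeap) youngRoots() []int {
	var roots []int
	for c, isDirty := range h.dirty {
		if !isDirty {
			continue
		}
		for s := c * cardSize; s < (c+1)*cardSize && s < len(h.refs); s++ {
			if s >= h.youngLimit && h.refs[s] >= 0 && h.refs[s] < h.youngLimit {
				roots = append(roots, s)
			}
		}
	}
	return roots
}

func main() {
	h := newToyHeap(16, 4)
	h.store(10, 2)  // old -> young: recorded
	h.store(12, 13) // old -> old: not recorded
	h.store(1, 3)   // young -> young: not recorded
	fmt.Println(h.dirty, h.youngRoots())
}
```

Hardware support of the kind asked about would aim to make the check inside `store` cheaper or free; in this sketch it is an explicit branch on every write.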