foldr 4 days ago

Go won’t put large allocations on the stack even if escape analysis would permit it, so generally speaking this should only be a concern if you have very deep recursion (in which case you might have to worry about stack overflows anyway).

masklinn 3 days ago | parent | next [-]

> Go won’t put large allocations on the stack even if escape analysis would permit it

Depends what you mean by “large”. As of 1.24, Go will put slices of several KB in the stack frame:

    make([]byte, 65536)
Goes on the stack if it does not escape (you can see Go request a large stack frame)

    make([]byte, 65537)
goes on the heap (Go calls runtime.makeslice).

Interestingly arrays have a different limit: they respect MaxStackVarSize, which was lowered from 10MB to 128 KB in 1.24.

If you use indexed slice literals gc does not even check and you can create megabyte-sized slices on the stack.
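
One way to check the 65536 vs 65537 boundary without reading assembly is to count heap allocations. This is only a rough sketch: the function names are made up for illustration, and the counts it prints depend on the Go version and compiler flags, so treat the output as something to observe rather than a guarantee.

```go
package main

import (
	"fmt"
	"testing"
)

// sum64K fills a non-escaping slice of constant size 65536 and sums it.
// (Hypothetical helper, named only for this sketch.)
func sum64K() int {
	buf := make([]byte, 65536)
	for i := range buf {
		buf[i] = 1
	}
	total := 0
	for _, b := range buf {
		total += int(b)
	}
	return total
}

// sum64KPlus1 does the same with one more byte, the size reported above
// as going through runtime.makeslice.
func sum64KPlus1() int {
	buf := make([]byte, 65537)
	for i := range buf {
		buf[i] = 1
	}
	total := 0
	for _, b := range buf {
		total += int(b)
	}
	return total
}

func main() {
	// AllocsPerRun reports the average number of heap allocations per call.
	fmt.Println("65536:", testing.AllocsPerRun(100, func() { _ = sum64K() }))
	fmt.Println("65537:", testing.AllocsPerRun(100, func() { _ = sum64KPlus1() }))
}
```

Building the same file with `go build -gcflags=-m` should also print the compiler's escape decisions for each `make` call.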

Yokohiii 3 days ago | parent [-]

There is an option, -smallframes, that seems to be intended for conservative use cases. Below are the related configs and a test of the point at which values escape (+1).

  // -smallframes
  // ir.MaxStackVarSize = 64 * 1024
  // ir.MaxImplicitStackVarSize = 16 * 1024
  a := [64 * 1024 +1]byte{}
  b := make([]byte, 0, 16 * 1024 +1)
  // default
  // MaxStackVarSize = int64(128 * 1024)
  // MaxImplicitStackVarSize = int64(64 * 1024)
  c := [128 * 1024 +1]byte{}
  d := make([]byte, 0, 64 * 1024 +1)
Not sure how to verify this, but the assumption that you can allocate megabytes on the stack seems wrong. The escape analysis output for arrays is different than for the make statement:

  test/test.go:36:2: moved to heap: c
Maybe an oversight because it is a bit sneaky?
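
For anyone who wants to reproduce the test, here is a self-contained sketch of the default-limit cases. The function name `touch` is hypothetical, and passing `-smallframes` through `-gcflags` is my assumption about how the option mentioned above is supplied. The values are written and read so the compiler has a reason to keep them.

```go
package main

import "fmt"

// touch declares values one byte past the sizes quoted above and uses each.
// Build with:  go build -gcflags=-m .
// Assumption:  go build -gcflags='-m -smallframes' .  to compare the lower limits.
func touch() int {
	a := [64*1024 + 1]byte{}         // one past the quoted -smallframes MaxStackVarSize
	b := make([]byte, 0, 16*1024+1)  // one past the quoted -smallframes MaxImplicitStackVarSize
	c := [128*1024 + 1]byte{}        // one past the quoted default MaxStackVarSize
	d := make([]byte, 0, 64*1024+1)  // one past the quoted default MaxImplicitStackVarSize

	a[len(a)-1], c[len(c)-1] = 1, 2
	b = append(b, 3) // fits in the existing capacity, so no reallocation
	d = append(d, 4)
	return int(a[len(a)-1]) + int(c[len(c)-1]) + int(b[0]) + int(d[0])
}

func main() {
	fmt.Println(touch())
}
```

The interesting part is the compiler's `-m` diagnostics for each declaration, not the program's output.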
masklinn 2 days ago | parent [-]

> Not sure how to verify this, but the assumption you can allocate megabytes on the stack seems wrong.

    []byte{N: 0}
Yokohiii a day ago | parent | next [-]

doesn't make sense.

masklinn a day ago | parent [-]

And yet it does: https://godbolt.org/z/h9GW5v3YK

And creates an on-stack slice whose size is only limited by Go's 1GB limit on individual stack frames: https://godbolt.org/z/rKzo8jre6 https://godbolt.org/z/don99e9cn
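
For readers unfamiliar with the syntax: in a composite literal, `N: 0` sets the element at index N, so the literal has length N+1. A minimal sketch of that rule follows, with `n` and the function names chosen only for illustration. Whether the backing storage ends up on the stack is for the compiler output in the links above to show; this only demonstrates the lengths.

```go
package main

import "fmt"

// n is a hypothetical index, picked to make the literal about 1 MiB.
const n = 1 << 20

// sliceLen builds a slice with an indexed literal and returns its length.
func sliceLen() int {
	s := []byte{n: 0}
	s[n] = 7 // the last valid index is n
	return len(s)
}

// arrayLen does the same with the [...] array form.
func arrayLen() int {
	a := [...]byte{n: 0}
	a[n] = 7
	return len(a)
}

func main() {
	fmt.Println(sliceLen() == n+1, arrayLen() == n+1)
}
```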

Yokohiii 16 hours ago | parent [-]

Yea with more context it suddenly makes sense :p

Interesting, [...] syntax works here as expected. So escape analysis simply doesn't look at the element list.

2 days ago | parent | prev [-]
[deleted]
Yokohiii 4 days ago | parent | prev [-]

Escape analysis accounts for size, so it wouldn't even permit it.

The initial stack size seems to be 2 KB, a bit more on a few systems. As far as I understand, you can allocate a large local, e.g. 8 KB, that doesn't escape and grows the stack immediately. (Of course that adds up if you have a chain of calls with smaller allocations.) So recursion is certainly not the only concern.

foldr 4 days ago | parent [-]

For that to be a problem you either have to have one function that allocates an enormous number of non-escaping objects below the size limit (if the Go compiler doesn't take the total size of all a function's non-escaping allocations into account – I don't know), or a very long series of nested function calls, which in practice is only likely to arise if there are recursive calls.

Yokohiii 4 days ago | parent [-]

I think we mix things up here. But be aware of my newbie knowledge.

I am pretty sure the escape analysis doesn't affect the initial stack size. Escape analysis does determine where an allocation lives. So if your allocation is smaller than what escape analysis sends to the heap and bigger than the initial stack size, the stack needs to grow.

What I am certain about is that runtime.newstack calls account for +20% of my benchmark times (go testing). My code is quite shallow (3-4 calls deep), anything of size should be on the heap (global/preallocated), and the code has zero allocations. I don't use goroutines either. It might be that I'm still making a mistake, or it's overhead from the testing benchmark. But this obviously doesn't seem to be anything super unusual.

foldr 4 days ago | parent [-]

I don't know about your code, but in general, goroutine stacks are designed to start small and grow. There is nothing concerning about this. A call to runtime.newstack triggered by a large stack-allocated value would generally be cheaper than the corresponding heap allocation.

Yokohiii 4 days ago | parent [-]

I found my issue: I was creating a 256-item fixed array of a 2*uint8 struct in my code. That was enough to cause newstack calls. It now went down from a varying 10% to roughly 1%. Oddly enough, it didn't change the ns/op a bit. I guess it's some mix of workload-related irrelevancy and inaccurate reporting, or another oversight on my side.
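
The size arithmetic for that array can be checked directly. In this sketch the type name `pair` and the function `tableSize` are hypothetical stand-ins for the struct described above: 256 elements of two uint8 fields come to 512 bytes, a quarter of the roughly 2 KB initial stack mentioned earlier in the thread. Whether that alone triggers runtime.newstack depends on everything else in the call chain.

```go
package main

import (
	"fmt"
	"unsafe"
)

// pair stands in for the 2*uint8 struct (hypothetical name).
type pair struct{ a, b uint8 }

// tableSize returns the size in bytes of a 256-item array of pair.
func tableSize() uintptr {
	var table [256]pair
	return unsafe.Sizeof(table)
}

func main() {
	fmt.Println(tableSize()) // 256 * 2 = 512
}
```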