AndyKelley 3 days ago

Note that not having runtime-known stack allocations is a key piece of the puzzle in Zig's upcoming async I/O strategy because it allows the compiler to calculate upper bound stack usage for a given function call.

At a fundamental level, runtime-known stack allocation harms code reusability.

Edit: commenters identified 2 more puzzle pieces below, but there's still one that didn't get asked about yet :P

do_not_redeem 3 days ago | parent | next [-]

A comptime_int-bounded alloca would achieve those goals, plus would be more space-efficient on average than the current strategy of always pessimistically allocating for the worst case scenario.

  @alloca(T: type, count: usize, upper_bound_count: comptime_int)
with the added bonus that if `count` is small, you avoid splitting the stack around a big chunk of unused bytes. Don't underestimate the importance of memory locality on modern CPUs.
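For illustration, a hypothetical call site (to be clear: neither @alloca nor this exact signature exists in Zig today, and the slice return type is my assumption):

    // `count` is runtime-known, but the compiler can still budget
    // 256 * @sizeOf(u32) bytes of stack for this frame. At runtime only
    // `count` elements actually sit on the stack, so the next frame starts
    // right after the live data instead of after the worst case.
    fn sumDoubled(values: []const u32, count: usize) u64 {
        const scratch: []u32 = @alloca(u32, count, 256); // safety-checked: count <= 256
        for (scratch, 0..) |*s, i| s.* = values[i] * 2;
        var total: u64 = 0;
        for (scratch) |s| total += s;
        return total;
    }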
AndyKelley 3 days ago | parent [-]

2017: https://github.com/ziglang/zig/issues/225

When I had only been thinking about Zig for 2 years, I thought the same.

do_not_redeem 3 days ago | parent [-]

I'd be curious to hear you expand on your reasoning; your comments in that thread never explained it for me.

> It's too tempting to use incorrectly.

A compile-time-determined upper bound would solve this.

> The stack is allocated based on a compile-time-determined upper bound.

A compile-time-determined upper bound would solve this too.

Shouldn't a performance-oriented language give the programmer tools to improve memory locality? And what's wrong with spexguy's idea?

throwawaymaths 3 days ago | parent [-]

Instead of being namby-pamby with the stack, it's simply better to take the entire desired maximum up front. In Zig, there's even a way to wrap it in an allocator so you can pretend it's on the heap!
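Something like this, roughly (a minimal sketch, assuming the current std.heap.FixedBufferAllocator API):

    const std = @import("std");

    pub fn main() !void {
        // take the whole desired maximum up front, as a plain stack array...
        var buf: [16 * 1024]u8 = undefined;
        // ...then wrap it so it can be used through the standard Allocator interface
        var fba = std.heap.FixedBufferAllocator.init(&buf);
        const allocator = fba.allocator();

        const items = try allocator.alloc(u32, 100);
        defer allocator.free(items);
        for (items, 0..) |*item, i| item.* = @intCast(i);
        std.debug.print("{d}\n", .{items[99]});
    }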

do_not_redeem 3 days ago | parent | next [-]

This conversation would benefit from more rigorous technical terminology than "namby pamby". There is nothing namby-pamby about allocating precisely the amount of space you need and keeping your app's memory footprint small. That's called engineering.

throwawaymaths 3 days ago | parent [-]

Real question: what are you going to do with the rest of the stack? Are you in a situation where the stack and the heap might collide because you're that tight on resources? And let's say you make a function call that is about to overflow the stack. What should happen? An error? A panic? Return null? Silent failure?

There are no good choices in the case where you really need the thing you claim to need. Recognizing that fact and picking a different strategy is good engineering.

uecker 3 days ago | parent | next [-]

The first stack/heap collisions were caused not by VLAs but by fixed-size arrays on the stack. Nowadays compilers do stack probing, which solves this problem for VLAs as well. Yes, you get a segfault on stack overflow, but that has little to do with VLAs and everything to do with putting too much on the stack. The thing is, VLAs allow you to reduce stack usage by putting the right amount of data on the stack rather than a worst case. The only downside is that they make it harder to control stack usage, but not a lot harder. So no, I do not think avoiding VLAs is good engineering.

do_not_redeem 3 days ago | parent | prev [-]

This whole post is a strawman. I never said my reason was being tight on resources. Please reread the thread. Also don't forget that on modern architectures, the stack and heap can't "collide", because of guard pages.

> what are you going to do with the rest of the stack?

I'll leave it for the rest of the system. My app will use less memory, and since memory locality is improved, there will be fewer cache misses, meaning it runs faster too.

> let's say you take a function call that is about to overflow the stack

Stack overflows are impossible thanks to the comptime upper_bound parameter. That's the entire premise of this thread.

CJefferson 3 days ago | parent | prev [-]

Yes, let's be "namby pamby" with the cache lines storing the hot part of the stack; that sounds like an awesome idea!

I thought Zig was all about maximum performance. Sometimes I just want a little bit of stack memory, which will often already be in L1 cache.

achierius 2 days ago | parent [-]

Zig does allow this; that's what GP is saying. You don't actually need to relocate your stack: you can declare a portion of it (i.e. what would otherwise be the next N frames) to be the buffer you'll use for recursion, and thereafter recurse into that.

ManDeJan 3 days ago | parent | prev | next [-]

How does this work with interrupts, say in an embedded context, that execute on the current stack and may in some cases be interrupted themselves? Do you add up the maximum stack depth of all interrupt routines that could go off at the same time?

travisgriggs 3 days ago | parent | prev | next [-]

> Note that not having runtime-known stack allocations is a key piece of the puzzle in Zig's upcoming async I/O strategy because it allows the compiler to calculate upper bound stack usage for a given function call.

Sigh. So I have to trade away something I think might be useful for something that too many languages have already soiled themselves with. I hope Zig has a better solution, but I'm not optimistic.

Our stack compels me to work in Swift, Kotlin, Elixir, and Python. I use the async feature of Swift and Kotlin when some library forces me to. I actually preferred working with GCD before Swift had to join the async crowd. Elixir of course just has this problem solved already.

I frequently ask others who work in these languages how often they themselves reach for their language's async abilities, and the best I ever get from the more adventurous types is "I did a play thing to experiment with what I could do with it".

dnautics 3 days ago | parent [-]

Re: Elixir, I have a feeling that Zig's I/O strategy will enable me to bring back the zig-async-dependent yielding NIFs in zigler. I'm really hopeful the Io interface will have a yield() function; that would be even better!

https://www.youtube.com/watch?v=lDfjdGva3NE&t=1819s

arthurcolle 3 days ago | parent [-]

Love the excitement

omnicognate 3 days ago | parent | prev | next [-]

> there's still one that didn't get asked about yet :P

C libraries?

NobodyNada 3 days ago | parent [-]

Or function pointers (especially given that Zig's been moving towards encouraging vtables over static dispatch)?

AndyKelley 3 days ago | parent [-]

Bingo. That completes the picture: https://github.com/ziglang/zig/issues/23367

AshamedCaptain 3 days ago | parent | prev | next [-]

How does this work given... recursion?

Even in languages without VLAs, one can implement a simulacrum of them with recursion.

AndyKelley 3 days ago | parent | next [-]

All Zig code is in one compilation unit, so the compiler has access to the entire function call graph. Cycles in the graph (recursion) cause an error. To break cycles in the graph, one must use a language builtin to call a function using a different stack (probably obtained via heap allocation).

dev-ns8 3 days ago | parent | next [-]

Does this mean it's impossible in Zig to do strictly stack-based recursion, and that the mere inclusion of a recursive function implicitly gets you heap allocations alongside?

AndyKelley 3 days ago | parent [-]

You can put a big buffer on the stack, and use this buffer to break your cycles. At some point you'll run out of this buffer and be forced to handle failure, rather than triggering a stack overflow segfault.

So it will be the same thing but with more (error handling) steps.

This annoyance can be avoided by avoiding recursion. Where recursion is useful, it can still be done; you just have to handle failure properly, and then you get safety against stack overflow.
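Roughly this shape, e.g. for a tree walk (a sketch only; the buffer size and names are illustrative, and converting the recursion into an explicit work list is just one way to break the cycle):

    const Node = struct {
        children: []const *const Node = &.{},
    };

    // The recursion is replaced by an explicit work list in a fixed-size stack
    // buffer, so exhausting it surfaces as a catchable error instead of a
    // stack-overflow segfault.
    fn countNodes(root: *const Node) error{WorkListFull}!usize {
        var work: [1024]*const Node = undefined; // the big buffer on the stack
        var top: usize = 0;

        work[top] = root;
        top += 1;

        var count: usize = 0;
        while (top > 0) {
            top -= 1;
            const node = work[top];
            count += 1;
            for (node.children) |child| {
                if (top == work.len) return error.WorkListFull; // handle failure here
                work[top] = child;
                top += 1;
            }
        }
        return count;
    }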

CJefferson 3 days ago | parent | next [-]

Wait, so how do I write mutually recursive functions, say for a parser? Do I have to manually do the recursion myself, and stick everything in one big uber-function?

eru 3 days ago | parent | prev [-]

Does Zig offer (guaranteed) tail call optimisation?

> Where recursion is useful, [...]

Recursion is so useful that most imperative languages even have special syntax constructs for very specific special cases of recursion, which they call 'loops'.

messe 3 days ago | parent [-]

> Does Zig offer (guaranteed) tail call optimisation?

Yes[1]. You can use the @call builtin with the .always_tail modifier.

    @call(.always_tail, foo, .{ arg1, arg2, ... });
[1]: https://ziglang.org/documentation/master/#call
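For example, a minimal sketch (the countdown function is just for illustration):

    const std = @import("std");

    // With .always_tail, the recursive call must be the returned expression;
    // the current frame is reused, so recursion depth does not grow the stack.
    fn countdown(n: u64, acc: u64) u64 {
        if (n == 0) return acc;
        return @call(.always_tail, countdown, .{ n - 1, acc + n });
    }

    pub fn main() void {
        std.debug.print("{d}\n", .{countdown(1_000_000, 0)});
    }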
ants_everywhere 3 days ago | parent | prev | next [-]

> All Zig code is in one compilation unit

How do incremental compilation and distributed compilation work?

pyrolistical 3 days ago | parent | next [-]

Subtrees should be cacheable and parallelizable?

wavemode 3 days ago | parent | prev [-]

Single compilation unit does not imply that the results of the compilation of the different parts of that unit cannot be cached.

ants_everywhere 3 days ago | parent [-]

I see. So in some sense the actual unit of compilation is smaller and units can be combined or "linked" even if compiled at different times on different machines?

atmikemikeb 3 days ago | parent | prev [-]

what about extern functions?

AndyKelley 3 days ago | parent [-]

Zig's linker will calculate this information automatically in most cases when statically linking (via analysis of machine code disassembly). Otherwise, there is a default upper bound stack value, overridable via user annotation.

omnicognate 3 days ago | parent | prev [-]

https://github.com/ziglang/zig/issues/1006

mananaysiempre 3 days ago | parent | prev | next [-]

> Note that not having runtime-known stack allocations is a key piece of the puzzle in Zig's upcoming async I/O strategy because it allows the compiler to calculate upper bound stack usage for a given function call.

That’s a genuinely interesting point. I don’t think known sizes for locals are a hard requirement here, though threading this needle in a lower-level fashion than Swift would need some subtle language design.

Fundamentally, what you want to do is construct an (inevitably) runtime-sized type (the coroutine) out of (by problem statement) runtime-sized pieces (the activation frames, each itself composed of individual, possibly runtime-sized locals). It’s true that you can’t then allow the activations to perform arbitrary allocas. You can, however, allow them to do allocas whose sizes (and alignments) are known at the time the coroutine is constructed, with some bookkeeping burden morally equivalent to maintaining a frame pointer, which seems fair. (In Swift terms, you can construct a generic type if you know what type arguments are passed to it.) And that’s enough to have a local of a type of unknown size pulled in from a dynamic library, for example.

Again, I’m not sure how a language could express this constraint on allocas without being Swift (and hiding the whole thing from the user completely) or C (and forcing the user to maintain the frames by hand), so thank you for drawing my attention to this question. But I’m not ready to give up on it just yet.

> At a fundamental level, runtime-known stack allocation harms code reusability.

This is an assertion, not an argument, so it doesn’t really have any points I could respond to. I guess my view is this: there are programs that can be written with alloca and can’t be written without (unless you introduce a fully general allocator, which brings fragmentation problems, or a parallel stack, which is silly but was in fact used to implement alloca historically). One other example I can give in addition to locals of dynamically-linked types is a bytecode interpreter that allocates virtual frames on the host stack. So I guess that’s the other side of being opinionated—those whose opinions don’t match are turned away.

Frankly, I don’t even know why I’m defending alloca this hard. I’m not actually happy with the status quo of just yoloing a hopefully maybe sufficiently large stack. I guess the sticking point is that you seem to think alloca is obviously the wrong thing, when it’s not even close to obvious to me what the right thing is.

bobthebuilders 3 days ago | parent | next [-]

Alloca is a fundamentally insecure way of doing allocations. Languages that promote alloca will find themselves stuck in a morass of security messes and buffer overflows. If Zig were to adopt alloca, it would repeat the catastrophic mistake that plagued C for decades and introduce permanently unfixable security issues into another generation of programming languages.

johnisgood 3 days ago | parent | next [-]

Any thoughts on the use of strdupa()? I do not use it, but I wonder if that is dangerous too, considering it uses alloca().

mananaysiempre 3 days ago | parent [-]

I’ve been defending alloca() here, but no, strdupa() (not to be confused with shlwapi!StrDupA on Windows) is a bad idea. In cases that I think are acceptable, the size of the allocation is reasonably small and does not come from outside the program. Here you’re duplicating a string that you probably got somewhere else and don’t really control. That means you don’t really know if and when you’re going to overflow the stack, which is not a good position to be in.

(Once upon a time, MSLU, a Microsoft-provided Unicode compatibility layer for Windows 9x, used stack-allocated buffers to convert strings from WTF-16 to the current 8-bit encoding. That was also a bad idea.)

johnisgood 2 days ago | parent [-]

I don't have anything against alloca(), but then again, I don't use it at all. I stick to malloc() / free(), and in case of strings, asprintf().

rurban 3 days ago | parent | prev | next [-]

Didn't stop Rust from using it internally.

kibwen 3 days ago | parent | next [-]

I don't think Rust uses alloca internally for anything. You may be thinking of Swift, which I think uses alloca for ABI shenanigans.

surajrmal 3 days ago | parent | prev [-]

How does it do that?

steveklabnik 3 days ago | parent | prev [-]

I don’t know why you’re downvoted, alloca is a mistake.

Yoric 3 days ago | parent | prev [-]

Out of curiosity, why is knowing the size of locals required, exactly? Because it avoids dynamic allocations for each fiber?

Conscat 3 days ago | parent | prev [-]

Does anything stop a user from doing this with inline assembly?

AndyKelley 3 days ago | parent [-]

Wisdom