> Why is reserving a megabyte of stack space "expensive"?

Because if you use one thread for each of your 10,000 idle sockets you will use 10GB to do nothing.

So you'll want to use a better architecture such as a thread pool.

And if you want your better architecture to be generic and ergonomic, you'll end up with async or green threads.

▲ lelanthran 3 hours ago | parent | next [-]

> Because if you use one thread for each of your 10,000 idle sockets you will use 10GB to do nothing.

1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed.

2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.
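
To make the address-space-versus-RAM distinction concrete, here is a minimal Linux sketch. It is an illustration only: the helper names `resident_pages` and `reserve_and_touch` are made up for this example, and an anonymous `mmap` region stands in for a thread stack. It reserves a region, touches only the first few bytes, and asks `mincore` how many pages are actually resident.

```c
#define _GNU_SOURCE /* exposes mincore() and MAP_ANONYMOUS; must precede the includes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count the pages of [addr, addr + len) that are resident in RAM.
   Returns -1 on error. Hypothetical helper for this sketch. */
static long resident_pages(void *addr, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t n = (len + page - 1) / page;
    unsigned char *vec = malloc(n);
    if (vec == NULL) return -1;
    long count = -1;
    if (mincore(addr, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < n; i++) count += vec[i] & 1;
    }
    free(vec);
    return count;
}

/* Reserve `len` bytes of anonymous memory (a stand-in for a thread stack),
   write to the first `touch` bytes one page at a time, and return the
   number of resident pages afterwards, or -1 on error. */
long reserve_and_touch(size_t len, size_t touch) {
    assert(touch <= len);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t off = 0; off < touch; off += page) {
        ((volatile char *)p)[off] = 1; /* first write faults the page in */
    }
    long n = resident_pages(p, len);
    munmap(p, len);
    return n;
}
```

On a typical Linux machine, `reserve_and_touch(64 << 20, 0)` should report no resident pages, while touching four pages should report at least four (more if transparent hugepages back the region). Exact counts depend on the kernel and its configuration.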

▲ n_e 3 hours ago | parent | next [-]

> 1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed.

My example (and the c10k problem) is 10k concurrent connections, not 10k concurrent requests.

> 2. It's not 10GB of RAM anyway, it's 10GB of address space. It still only gets faulted into real RAM when it gets used.

Yes, and that's both memory and CPU usage that isn't needed when using a better concurrency model. That's why no high-performance server software uses a huge number of threads, and many use the reactor pattern.
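
For reference, the reactor pattern boils down to one thread waiting on many descriptors and dispatching only the ones that are ready. Below is a minimal sketch using POSIX `poll`, with pipes standing in for idle sockets. The names `reactor_poll_once`, `drain_once`, `demo_idle_connections` and `MAX_CONNS` are hypothetical and exist only for this illustration; real servers typically use epoll or kqueue and loop forever rather than making a single pass.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_CONNS 64 /* arbitrary cap for this sketch */

/* Callback invoked for each descriptor that became readable. */
typedef long (*on_readable_fn)(int fd);

/* Example handler: read whatever is available and discard it. */
static long drain_once(int fd) {
    char buf[256];
    return (long)read(fd, buf, sizeof buf);
}

/* One pass of a single-threaded reactor: wait up to timeout_ms for any of
   the n descriptors to become readable, then dispatch each ready one.
   Returns the number of handlers dispatched, 0 on timeout, -1 on error. */
int reactor_poll_once(const int *fds, int n, on_readable_fn handler,
                      int timeout_ms) {
    assert(n >= 0 && n <= MAX_CONNS);
    struct pollfd pfds[MAX_CONNS];
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    int ready = poll(pfds, (nfds_t)n, timeout_ms);
    if (ready <= 0) return ready;
    int dispatched = 0;
    for (int i = 0; i < n; i++) {
        if (pfds[i].revents & POLLIN) {
            handler(pfds[i].fd);
            dispatched++;
        }
    }
    return dispatched;
}

/* Open n pipes, make the first `active` of them readable, run one reactor
   pass over all n read ends, and return how many handlers ran. */
int demo_idle_connections(int n, int active) {
    assert(active >= 0 && active <= n && n <= MAX_CONNS);
    int rd[MAX_CONNS], wr[MAX_CONNS];
    int opened = 0, result = -1;
    for (; opened < n; opened++) {
        int p[2];
        if (pipe(p) != 0) break;
        rd[opened] = p[0];
        wr[opened] = p[1];
    }
    if (opened == n) {
        int ok = 1;
        for (int i = 0; i < active; i++) {
            if (write(wr[i], "x", 1) != 1) ok = 0;
        }
        if (ok) result = reactor_poll_once(rd, n, drain_once, 0);
    }
    for (int i = 0; i < opened; i++) {
        close(rd[i]);
        close(wr[i]);
    }
    return result;
}
```

The idle descriptors cost one `struct pollfd` entry each and no thread; only the descriptors with pending data reach a handler.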

▲ cmrdporcupine 3 hours ago | parent [-]

> Yes, and that's both memory and cpu usage that isn't needed

No, it literally is not. The "memory" is just entries in a page table in the kernel and MMU. It shouldn't worry you at all.

Nor is the CPU used by the kernel to manage those threads necessarily going to be less efficient than someone's hand-rolled async runtime. In fact, given that it gets more eyes, it is likely more efficient.

The sole argument I can see is avoiding a handful of syscalls and crossing the kernel<->userspace blood-brain barrier too often.

▲ com2kid 3 hours ago | parent | prev [-]

> 1. On a system that is handling 10k concurrent requests, the 10GB of RAM is going to be a fraction of what is installed

I've written massively concurrent systems where each connection only handled maybe a few kilobytes of data.

Async I/O is a massive win in those situations.

This describes many REST endpoints. Fetch a few rows from a DB, return some JSON.

▲ wmf 4 hours ago | parent | prev | next [-]

On a 64-bit system, 10 GB of address space is nothing.

▲ matheusmoreira 2 hours ago | parent [-]

10 GB of RAM is certainly something though. Especially in current times.

▲ monocasa 2 hours ago | parent [-]

Except if those threads are actually faulting in all of that memory and making it resident, they'd be doing the same thing, just on the heap, for a classic async coroutine style application.

▲ asdfasgasdgasdg an hour ago | parent [-]

If you have hugepages enabled, all of those threads are probably faulting in a fair amount of memory.

▲ duped 4 hours ago | parent | prev [-]

> you will use 10GB to do nothing.

You don't pay for stack space you don't use unless you disable overcommit. And if you disable overcommit on modern Linux the machine will very quickly stop functioning.

▲ simonask 3 hours ago | parent [-]

The amount of stack you pay for on a thread is proportional to the maximum depth that the stack ever reached on the thread. Operating systems can grow the amount of real memory allocated to a thread, but never shrink it.

It’s a programming model that has some really risky drawbacks.

▲ matheusmoreira 2 hours ago | parent [-]

> Operating systems can grow the amount of real memory allocated to a thread, but never shrink it.

Operating systems can shrink the memory usage of a stack.

`madvise(page, size, MADV_DONTNEED);`

Leaves the memory mapping intact but the kernel frees underlying resources. Subsequent accesses get either new zero pages or the original file's pages.

Linux also supports mremap, which is essentially a kernel version of realloc. Supports growing and shrinking memory mappings.

`stack = mremap(stack, old_size, old_size / 2, MREMAP_MAYMOVE, 0);`

Whether existing systems make use of this is another matter entirely. My language uses mremap for growth and shrinkage of stacks. C programs can't do it because pointers to stack allocated objects may exist.
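
The two calls above can be tried on a plain anonymous mapping. This is a minimal Linux-only sketch, not how any particular runtime manages its stacks: `stack_alloc`, `stack_release_pages` and `stack_shrink_half` are hypothetical helper names, and the mapping does not model stack growth direction or guard pages.

```c
#define _GNU_SOURCE /* for mremap(); must precede the includes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of private anonymous memory as a stand-in for a stack.
   Returns NULL on failure. */
void *stack_alloc(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand the physical pages back to the kernel while keeping the mapping.
   For private anonymous memory, the next access sees zero-filled pages.
   Returns 0 on success, -1 on error. */
int stack_release_pages(void *stack, size_t len) {
    return madvise(stack, len, MADV_DONTNEED);
}

/* Shrink the mapping to half its size, letting the kernel move it if it
   has to. The first half of the contents is preserved.
   Returns the (possibly new) base address, or NULL on failure. */
void *stack_shrink_half(void *stack, size_t old_size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(old_size % (2 * page) == 0); /* keep the new size page-aligned */
    void *p = mremap(stack, old_size, old_size / 2, MREMAP_MAYMOVE);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would write to the region, call `stack_release_pages`, and observe zeros on the next read; after `stack_shrink_half`, only the first `old_size / 2` bytes remain mapped, and any pointer into the old region has to be treated as stale if the base moved.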