mrkeen an hour ago

> fork() is a relatively expensive system call; it must copy the entire process state (including memory) for the child process. Many optimizations have been made over the years, but a fork is still a fundamentally costly operation. To make things worse, a fork() call is often immediately followed by an exec(), which will discard all of that memory that was so carefully copied for the child.

It's weird to leave out a mention of copy-on-write - the optimisation that means that you don't copy over all the memory.
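For anyone who hasn't run into copy-on-write before, here is a minimal sketch of the semantics it preserves. It only shows the observable behaviour (a write in the child stays private to the child), not the page sharing itself, which the kernel handles invisibly. The function name `child_write_is_private` is made up for illustration, and the code assumes a Unix-like system where `os.fork()` is available.

```python
import os

def child_write_is_private():
    """Fork, let the child overwrite a buffer, and return (parent_view, child_view)."""
    data = bytearray(b"parent")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the write below lands on the child's own copy of the page.
        os.close(r)
        data[:] = b"child!"
        os.write(w, bytes(data))
        os._exit(0)
    # Parent: read what the child saw, then compare with our own buffer.
    os.close(w)
    child_view = os.read(r, 16)
    os.close(r)
    os.waitpid(pid, 0)
    return bytes(data), child_view

if __name__ == "__main__":
    print(child_write_is_private())
```

Until the child writes, both processes can share the same physical page; the copy happens only at the write.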

tux3 an hour ago | parent | next [-]

This was left implicit in the article, but what they mean by copying the process state here is the memory management structures. That's mainly the page tables and the VMAs.

That means you have to allocate new pages to hold a copy of all these structures, even if the actual memory they point to is shared. And walking all those structures to make a copy is still costly.

cls59 44 minutes ago | parent | prev | next [-]

Even with copy-on-write, fork() still has to pay the setup cost for COW. If the parent process has a lot of busy threads (e.g. Java), you can end up doing a lot of unnecessary COW before exec() fires.

epcoa 34 minutes ago | parent | prev | next [-]

> It's weird to leave out a mention of copy-on-write

For the intended audience of such a paper this is base knowledge.

FooBarWidget an hour ago | parent | prev [-]

It says state. Copy on write still means it's O(number of page table entries) even if you don't copy the contents. It's a well-known issue that forking a program with a large virtual memory size is slow.
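If you want to see this for yourself, a rough sketch along these lines should do it. It touches a buffer of a given size so the pages are actually mapped, then times `os.fork()` in the parent while the child exits immediately without writing anything. The helper name `time_fork` and the sizes are my own choices, it assumes a Unix-like system, and the numbers will vary a lot with kernel, huge page settings and allocator behaviour, so treat it as an illustration rather than a benchmark.

```python
import os
import time

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")

def time_fork(touched_bytes):
    """Return seconds the parent spends in os.fork() with `touched_bytes` of touched memory."""
    buf = bytearray(touched_bytes)
    # Write one byte per page so every page is really backed and has a page table entry.
    for offset in range(0, touched_bytes, PAGE_SIZE):
        buf[offset] = 1
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child: exit straight away, never touching the buffer (no page contents copied).
        os._exit(0)
    elapsed = time.perf_counter() - start
    os.waitpid(pid, 0)
    return elapsed

if __name__ == "__main__":
    for mib in (1, 64, 256):
        ms = time_fork(mib << 20) * 1e3
        print(f"{mib:4d} MiB touched: fork took {ms:.3f} ms")
```

Since the child never writes, any growth in fork time with buffer size comes from duplicating the memory management structures rather than the page contents.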

mort96 27 minutes ago | parent [-]

It says "(including memory)". It's pretty natural to read this as "(including the contents of allocated pages)".