> ABSTRACT

> The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.

> As the designers and implementers of operating systems, we should acknowledge that fork’s continued existence as a first-class OS primitive holds back systems research, and deprecate it. As educators, we should teach fork as a historical artifact, and not the first process creation mechanism students encounter.

▲

anarazel an hour ago | parent | next [-]

It is somewhat interesting that the most widely used "big" OS that doesn't use fork, i.e. Windows, has dog slow process creation...

I agree that there should be non-fork primitives, I'm just not that sure that performance is the best argument.

▲

mort96 28 minutes ago | parent | next [-]

The problem with fork isn't really that it's slow. The problem is that if you want it to be not-slow, it locks you into a bunch of OS design decisions: you more or less need a memory subsystem where all writable pages are refcounted and copy-on-write when the refcount is bigger than 1, and you need overcommit.

Now these decisions aren't objectively bad, but they have significant trade-offs and it's probably not a good idea that they're forced simply because we use fork()+exec() for process creation.
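
To make the comparison concrete, here is a minimal Python sketch (not from the paper) that launches the same command both ways. The helper names `spawn_with_fork_exec` and `spawn_with_posix_spawn` are made up for illustration, and it assumes a POSIX system with Python 3.9+ (`os.posix_spawnp` and `os.waitstatus_to_exitcode`).

```python
import os
import sys


def spawn_with_fork_exec(argv):
    """Clone the calling process, then replace the clone with a new program."""
    pid = os.fork()
    if pid == 0:
        # Child: starts as a copy of the parent, which is where the
        # copy-on-write machinery earns its keep.
        try:
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


def spawn_with_posix_spawn(argv):
    """Ask for a new process running argv without forking in our own code."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "raise SystemExit(7)"]
    print(spawn_with_fork_exec(cmd), spawn_with_posix_spawn(cmd))
```

Both helpers return the child's exit code; the difference is only in what the caller has to express, not a claim about how any particular kernel implements either call.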

▲

theK a minute ago | parent [-]

Didn't he just say that fork turns out to be comparatively faster than the non-fork examples we have? I.e., Linux forks faster than Microsoft's kernels?

▲

pjmlp 43 minutes ago | parent | prev | next [-]

Because that OS's best practice is to use threads.

Traditionally Windows applications that create processes all the time come from UNIX heritage.

Unlike UNIX, Windows NT was designed with a threads-first mentality from the get-go.

On UNIX they were added after the fact, and to this day there are gotchas when mixing POSIX threads with signals, fork and exec.

▲

zozbot234 28 minutes ago | parent [-]

Windows was designed with a threads-first mentality because on pre-386 machines you don't have viable process memory protection, so your tasks share memory by necessity. This is not a great argument.

▲

epcoa 2 minutes ago | parent | next [-]

This is not true. NT never had fork, was always based on the assumption of an MMU, and Dave Cutler was a well-known fork hater in the 80s, even before this paper came out. Even when Windows 95 was coming out, the contemporary baseline was systems with an MMU. CreateThread was initially designed for NT in 1993 though.

▲

JdeBP 7 minutes ago | parent | prev [-]

Windows NT was never designed with pre-386 machines in mind. That was the territory of the old DOS+Windows. Windows NT from the get-go was for machines with page-based virtual memory.

* https://computernewb.com/~lily/files/Documents/NTDesignWorkb...

▲

aseipp 31 minutes ago | parent | prev | next [-]

I suspect it's a long tail sort of thing; it mostly doesn't matter except when it really matters. It's interesting that the stated motivation for the patch is in the context of agentic tools spawning subcommands. There's some related prior art in this area where the payoffs could be much greater, like fuzzing: https://gts3.org/assets/papers/2017/xu:os-fuzz.pdf is an example. It would be very interesting to see this patch applied to e.g. AFL++

▲

nvme0n1p1 40 minutes ago | parent | prev | next [-]

That's not the reason for the performance difference. Windows does have a fork primitive (ZwCreateProcess) and it's still slower than Linux's equivalent.

▲

aseipp an hour ago | parent | prev | next [-]

This paper is great and I also really like one of its references [29] as it goes into some more subtle parts of scalable interfaces, including fork. It's a gem IMO: The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors https://people.csail.mit.edu/nickolai/papers/clements-sc.pdf

▲

omoikane an hour ago | parent | prev | next [-]

Discussion at the time:

https://news.ycombinator.com/item?id=19621799 - A fork() in the road (2019-04-10, 178 comments)

▲

pizlonator an hour ago | parent | prev [-]

Fork is marvelous for the zygote pattern

Hard to come up with an optimization that is equally efficient and elegant

▲

toast0 35 minutes ago | parent | next [-]

The zygote pattern[1] is a great optimization to deal with the cost of forking, but IMHO, being able to inexpensively spawn a carefully tailored process regardless of the size and scope of the current process would be better.

I would guess it would be a small difference in measurable performance between zygote and a direct clean spawn, but it's one less trick an application needs to do, and it would be very helpful for libraries that spawn things. Spawning inside a library isn't always a great thing to do, but some things would really benefit from process level isolation.

[1] In case one isn't aware, the zygote pattern involves forking a 'zygote' process during application startup, and having that process do any forks that need to happen during application runtime. This reduces the cost of forking in large applications, because the zygote will have few fds open and use little memory. This lets your large application spawn new processes without delaying the application or the startup of the new processes. Some applications will spawn many zygotes to allow parallelism for spawning at runtime.
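
A rough Python sketch of the spawner-style zygote described in [1], under stated assumptions: `start_spawner` and `spawn_via` are hypothetical names, the one-JSON-line-per-request protocol is invented for this example, and it only runs on a POSIX system. A real implementation would also need error handling and a way to pass fds or output back.

```python
import json
import os
import socket
import subprocess
import sys


def start_spawner():
    """Fork a small helper early, before the application opens many fds
    or allocates much memory. Returns (socket to helper, helper pid)."""
    app_sock, helper_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        try:
            app_sock.close()
            with helper_sock.makefile("r") as requests, \
                    helper_sock.makefile("w") as replies:
                # One JSON-encoded argv per line; reply with the exit code.
                for line in requests:
                    code = subprocess.call(json.loads(line))
                    replies.write(f"{code}\n")
                    replies.flush()
        finally:
            os._exit(0)  # never fall back into the application's code
    helper_sock.close()
    return app_sock, pid


def spawn_via(sock, argv):
    """Ask the helper to run argv and wait for its exit code."""
    sock.sendall((json.dumps(argv) + "\n").encode())
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(64)
        if not chunk:
            raise RuntimeError("spawner exited unexpectedly")
        reply += chunk
    return int(reply.decode())


if __name__ == "__main__":
    sock, helper = start_spawner()
    # ... the application can now grow as large as it likes ...
    print(spawn_via(sock, [sys.executable, "-c", "raise SystemExit(3)"]))
    sock.close()           # EOF tells the helper to exit
    os.waitpid(helper, 0)
```

The point of the design is that the expensive fork happens in the small helper, not in the large application.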

▲

pizlonator 23 minutes ago | parent [-]

You're referring to something else, and maybe I'm using the term "zygote" incorrectly.

In all uses of zygotes that I have seen, here's what's really happening:

- `fork` is being used to reduce the cost of starting a process that has a high start-up cost. So, you start one process, run it through the expensive initialization, and then fork it from there to start new processes.

- To make this even faster, you have a pool of pre-forked processes sit around.

- Having pre-forked processes sitting around ready to be used is not expensive because of the CoW property and the fact that a process that forks and then immediately pauses will not have triggered any significant CoW yet.

So, the zygote optimization you speak of is in practice only meaningful on top of systems that are using an optimization uniquely enabled by `fork` (avoiding process initialization costs by cloning a process), and that zygote optimization is further optimized by another property of `fork` (memory sharing of forked processes that haven't done anything else yet).
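
The first bullet above can be sketched in a few lines of Python. This is only an illustration: `expensive_init`, `fork_worker` and `collect` are made-up names, the dictionary stands in for real start-up work, and it assumes a POSIX system. In CPython, reference-count updates write to shared pages, so the memory sharing is weaker than it would be in C.

```python
import os


def expensive_init():
    """Stand-in for slow start-up work that should happen only once."""
    return {n: n * n for n in range(200_000)}


def fork_worker(state, key):
    """Fork a child that reuses the already-initialized state.
    Returns (child pid, read end of a pipe carrying its answer)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(read_fd)
            # The child sees `state` without rebuilding it; pages are only
            # copied if one side writes to them.
            os.write(write_fd, str(state[key]).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    return pid, read_fd


def collect(pid, read_fd):
    """Read the child's answer and reap it."""
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data.decode())


if __name__ == "__main__":
    state = expensive_init()                      # paid once, in the zygote
    workers = [fork_worker(state, k) for k in (3, 12, 100)]
    print([collect(pid, fd) for pid, fd in workers])  # prints [9, 144, 10000]
```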

▲

toast0 8 minutes ago | parent [-]

Oh I see. I guess your zygotes have developed more than mine. I think Google may have coined or at least popularized the term zygote for this in Chrome and Android. The Chrome documentation [1] says:

> A zygote process is one that listens for spawn requests from a main process and forks itself in response. Generally they are used because forking a process after some expensive setup has been performed can save time and share extra memory pages.

I think reading the first sentence and stopping covers my zygote, but adding the second sentence covers yours. So I think we're both right!

I think both paths are useful. If your children need time to start up and become ready, spawn one that does the startup work, and then it (pre)forks at the ready state to have processes ready to handle requests (your zygote). But if forking is expensive at runtime because you have a million FDs open and a whole lot of memory allocations, spawn spawners before you start doing work (my zygote).

Of course, you can also use my zygotes to spawn your zygotes. Zygoteception.

[1] https://chromium.googlesource.com/chromium/src/+/HEAD/docs/l...

▲

vlovich123 23 minutes ago | parent | prev [-]

The paper explicitly covers this: various memory CoW/snapshot mechanisms are probably faster and safer than the zygote pattern. As it stands, getting the zygote pattern correct and safe is something you have to plan for upfront. You can’t retrofit it, which is why the paper mentions it has poor composability. Also, the advantages of the zygote pattern can be overstated: the memory-sharing benefit is minimal because the fork has to happen so early, and modern OSes already transparently CoW duplicate pages in the background.