> ABSTRACT
> The received wisdom suggests that Unix’s unusual combination of fork() and exec() for process creation was an inspired design. In this paper, we argue that fork was a clever hack for machines and programs of the 1970s that has long outlived its usefulness and is now a liability. We catalog the ways in which fork is a terrible abstraction for the modern programmer to use, describe how it compromises OS implementations, and propose alternatives.
> As the designers and implementers of operating systems, we should acknowledge that fork’s continued existence as a first-class OS primitive holds back systems research, and deprecate it. As educators, we should teach fork as a historical artifact, and not the first process creation mechanism students encounter.
I agree that there should be non-fork primitives, I'm just not that sure that performance is the best argument.
Now these decisions aren't objectively bad, but they have significant trade-offs and it's probably not a good idea that they're forced simply because we use fork()+exec() for process creation.
At least on systems with virtual addressing. If you want to go into physical addressing, then yes, maybe it's a problem. But Linux will never touch anything with physical addressing, so I don't see what people are complaining about.
(Windows's fork is called ZwCreateProcess)
Traditionally Windows applications that create processes all the time come from UNIX heritage.
Contrary to UNIX, Windows NT was designed with threads first mentality, from the get go.
While on UNIX they were added after fact, and to this day there are gotchas mixing posix threads with signals, fork and exec.
Both systems are implemented using threads as the execution context, but in Unix, the history means that that you fork+exec most of the time, resulting in a two tasks that do not share memory any more. By contrast, on Windows (NT onward) the common case when creating a new execution context is to create a thread that shares memory with others in its process.
Both systems allow the easy use of the other's core abstraction. On Unix, you can either code like its 1986 and use fork without exec, or use clone(3) or any of its higher level abstractions like pthreads.
You're right that POSIX semantics get tangled when using threads.
* https://computernewb.com/~lily/files/Documents/NTDesignWorkb...
* https://computernewb.com/~lily/files/Documents/NTDesignWorkb...
Misread on purpose to make a point?
https://news.ycombinator.com/item?id=19621799 - A fork() in the road (2019-04-10, 178 comments)
Hard to come up with an optimization that is equally efficient and elegant
I would guess it would be a small difference in measurable performance between zygote and a direct clean spawn, but it's one less trick an application needs to do, and it would be very helpful for libraries that spawn things. Spawning inside a library isn't always a great thing to do, but some things would really benefit from process level isolation.
[1] In case one isn't aware, the zygote pattern involves forking a 'zygote' process during application startup, and having that process do any forks that need to happen during application runtime. This reduces the cost of forking in large applications, because the zygote will have few fds open and use little memory. This lets your large application spawn new processes without delaying the application or the startup of the new processes. Some applications will spawn many zygotes to allow parallelism for spawning at runtime.
In all uses of zygotes that I have seen, here's what's really happening:
- `fork` is being used to reduce the cost of starting a process that has a high start-up cost. So, you start one process, run it through the expensive initialization, and then fork it from there to start new processes.
- To make this even faster, you have a pool of pre-forked processes sit around.
- Having pre-forked processes sitting around ready to be used is not expensive because of the CoW property and the fact that a process that forks and then immediately pauses will not have triggered any significant CoW yet.
So, the zygote optimization you speak of is in practice only meaningful on top of systems that are using an optimization uniquely enabled by `fork` (avoiding process initialization costs by cloning a process), and that zygote optimization is further optimized by another property of `fork` (memory sharing of forked processes that haven't done anything else yet).
> A zygote process is one that listens for spawn requests from a main process and forks itself in response. Generally they are used because forking a process after some expensive setup has been performed can save time and share extra memory pages.
I think reading the first sentance and stopping covers my zygote, but adding the second sentance covers yours. So I think we're both right!
I think both paths are useful. If your children need time to startup and become ready, spawn one that does start up work, and then it (pre)forks at the ready state to have processes ready to handle requests (your zygote). This does require a traditional fork() to avoid duplication of work.
But if forking is expensive at runtime because you have a million FDs open and a whole lot of memory allocations, spawn spawners before you start doing work (my zygote). This could be unnecessary with a inexpensive way to spawn a new process from an process that has lots of resources in use.
Of course, you can also use my zygotes to spawn your zygotes. Zygoteception.
[1] https://chromium.googlesource.com/chromium/src/+/HEAD/docs/l...
I do use threaded code. It's significantly harder to write and reason about. (45 years in to a CS career, ageing out)
It's weird to leave out a mention of copy-on-write - the optimisation that means that you don't copy over all the memory.
That means you have to allocate new pages to hold a copy of all these structures, even if the actual memory pointed by the pages is shared. And walking all those structures to make a copy is still costly.
For the intended audience of such a paper this is base knowledge.
It shares way too much, and have huge use cases where it is really, really bad.
This is just an example of I don't even know how many things a modern-day process will share from its parent.
By "complicated" I do not even remotely mean "unsolvable". I just mean that if you really dig down into what it means to "share nothing" in a modern operating system, it's a lot richer than it was back when fork+exec was a practical solution. There's a lot of fuzzy things that could go either way when you say "shares nothing".
Isn't that what posix_spawn is for?
Windows, for all its many, many faults, did not use fork+exec and instead mostly has options for how one creates a process. It wasn’t done elegantly, but it was the right decision.
Any kind of replacement should aim for the same conceptual simplicity and power. Sadly, I fear that people driving development nowadays are more interested in building unbreakable walled gardens for advertisement or app stores, or trying to squeeze down the some small gain when used on the cloud. I am more interested in general computing on the user side.
* https://jdebp.uk/FGA/bernstein-on-ttys/cttys.html
Interestingly, on MS/PC/DR-DOS file descriptor 3 was stdaux. and file descriptor 4 was stdprn.
The Windows approach may be correct, but it suffers in performance from the POSIX perspective.
I have heard that WSL1 iimproves this.
Windows does not historically depend on fork(), so there was no native fork(), so Cygwin kludged it up.
If you want to greenfield re-engineer the world with all new system calls and a totally different execution model, feel free to go right ahead.
― George Bernard Shaw, probably.
I am curious about what the best way to handle the example in the article of one process spawning many git subprocesses is. Surely it just doesn't make sense to repeatedly start git from scratch in the course of a long-running parent operation. What's the low cost abstraction for the same result, though?
Yes, it's copy on write... but there is a linear relationship between the size of the process and the number of page table entries required to represent it.