Remix.run Logo
scottlamb 2 hours ago

I don't think the author intended "code simplicity" as an end unto itself but a way to reduce cache pressure. He popped into the 2016 discussion [1] to say:

> Another benefit of this design overlooked is that individual cores may not ever need to read memory -- the entire task can run in L1 or L2. If a single worker becomes too complicated this benefit is lost, and memory is much much slower than cache.

I think this is wrong or at least overstated: if you're passing off fds and their associated (kernel- and/or user-side) buffers between cores, you can't run entirely in L1 or L2. And in general, I'd expect data to be responsible for much more cache pressure than code, so I'm skeptical of localizing the code at the expense of the data.

But anyway, if the goal is to organize which cores are doing the work, splitting a single core's work from a single thread (pinnned to it) to several threads (still pinned to it) doesn't help. It just introduces more context switching.

[1] https://news.ycombinator.com/item?id=10874616