| ▲ | How to escape the Linux networking stack(blog.cloudflare.com) |
| 107 points by meysamazad 12 hours ago | 26 comments |
| |
|
| ▲ | marginalia_nu 5 hours ago | parent | next [-] |
| This is extremely tangential, but I was recently setting up some network namespaces by hand, basically reproducing what Docker does in order to fix some of its faulty assumptions (containers with multiple IPs and a single name cause all sorts of jank). I had to freshen up on a lot of Linux virtual networking concepts (namespaces, veths, bridge networks, macvlans and various other interfaces), and made a ton of fairly informal notes to get familiar enough with it all to set it up. Would anyone be interested if I polished them up and maybe added a refresher on the layer 2 networking needed to reason about it? It's a fair bit of work and a niche topic, so I'm polling a bit to see if the juice is worth the squeeze. |
| |
| ▲ | HumanOstrich 3 hours ago | parent | next [-] | | I was actually going down rabbitholes today trying to figure out how to do a sane Docker setup where all the containers couldn't connect to each other. Your notes would be valuable at most any level of polish. | |
| ▲ | msbhvn 4 hours ago | parent | prev | next [-] | | Please do it. I'm very biased, but I think there would be lots of interest in seeing all that explained in one place in a coherent fashion (you will likely sharpen your own understanding in the process and have the perfect resource for when you next need to revisit these topics). | |
| ▲ | ambicapter 2 hours ago | parent | prev | next [-] | | I would absolutely be interested. | |
| ▲ | MrResearcher 3 hours ago | parent | prev | next [-] | | Don't forget to post the link here! | |
| ▲ | manuelangel99 4 hours ago | parent | prev | next [-] | | I would def. be interested! | |
| ▲ | globalnode 2 hours ago | parent | prev [-] | | i await your write up! |
|
|
| ▲ | seabrookmx 9 hours ago | parent | prev | next [-] |
| I had to read their article on "soft-unicast" before I could really grok this one: https://blog.cloudflare.com/cloudflare-servers-dont-own-ips-... |
|
| ▲ | notepad0x90 8 hours ago | parent | prev | next [-] |
| I'm slightly surprised cloudflare isn't using a userspace tcp/ip stack already (faster - less context switches and copies). It's the type of company I'd expect to actually need one. |
| |
| ▲ | Droobfest 7 hours ago | parent | next [-] | | From 2016: https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp... | | |
| ▲ | notepad0x90 7 hours ago | parent [-] | | Nice, they know better. But it also makes me wonder, because they're saying "but what if you need to run another app". For things like load balancers, for example, I'd expect you'd only run one app per server on the data plane, with the user space stack handling that, while the OS/services use a separate control plane NIC with the kernel stack so that boxes stay reachable even under link saturation, DDoS, etc. It also makes me wonder: why is tcp/ip special? The kernel should expose a raw network device. I get physical or layer 2 configuration happening in the kernel, but if it is supposed to do IP, then why stop there, why not TLS as well? Why run a complex network protocol stack in the kernel when you can just expose a configured layer 2 device to a user space process? It sounds like a "that's just the way it's always been done" type of scenario. | | |
| ▲ | wmf 7 hours ago | parent | next [-] | | AFAIK Cloudflare runs their whole stack on every machine. I guess that gives them flexibility and maybe better load balancing. They also seem to use only one NIC. "why is tcp/ip special? The kernel should expose a raw network device. ... Why run a complex network protocol stack in the kernel when you can just expose a configured layer 2 device to a user space process?" Check out the MIT Exokernel project and Solarflare OpenOnload that used this approach. It never really caught on because the old school way is good enough for almost everyone. "why stop there, why not TLS as well?" kTLS is a thing now (mostly used by Netflix). Back in the day we also had kernel-mode Web servers to save every cycle. | | | |
| ▲ | hansvm 3 hours ago | parent | prev | next [-] | | TCP/IP is, in theory (AFAIK all experiments related to this fizzled out a decade or two ago), a global resource when you start factoring in congestion control. TLS is less obviously something you would want kernel involvement from, give or take the idea of outsourcing crypto to the kernel or some small efficiency gains for some workloads by skipping userspace handoffs, with more gains possible with NIC support. | | |
| ▲ | notepad0x90 3 minutes ago | parent | next [-] | | why can't it be global and user space? DNS resolution for example is done by user space, and it is global. | |
| ▲ | Veserv 3 hours ago | parent | prev [-] | | You do want to offload crypto to dedicated hardware otherwise your transport will get stuck at a paltry 40-50 Gb/s per core. However, you do not need more than block decryption; you can leave all of the crypto protocol management in userspace with no material performance impact. |
| |
| ▲ | rcxdude 5 hours ago | parent | prev [-] | | You can do that if you want, but I think part of why tcp/ip is a useful layer of abstraction is that it allows more robust boundaries between applications that may be running on the same machine. If you're just at layer 2 you are basically acting on behalf of the whole box. |
|
| |
| ▲ | nomel 5 hours ago | parent | prev [-] | | > faster - less context switches and copies Isn't it the case that neither is required these days, with the "async"-like and zero-copy interfaces that are now available (like io_uring, where it's still handled by the kernel), along with the near non-existence of single-core processors in modern times? |
|
|
| ▲ | alecco 6 hours ago | parent | prev | next [-] |
| With them being a networking company, I always wondered why they picked Linux over FreeBSD. |
| |
|
| ▲ | snvzz 4 hours ago | parent | prev | next [-] |
| Tangentially related, seL4's LionsOS can now act as a router/firewall[0]. [0] https://news.ycombinator.com/item?id=45959952 |
|
| ▲ | lazyeye 9 hours ago | parent | prev [-] |
| SLATFATF - "So long and thanks for all the fish" is a Douglas Adams quote https://en.wikipedia.org/wiki/So_Long,_and_Thanks_for_All_th... |
| |
| ▲ | cestith 7 hours ago | parent [-] | | A few things in the article are Douglas Adams quotes, and more specifically from the Hitchhiker’s Guide series. Creating the universe being regarded as a mistake and making many unhappy is from those books. So is the idea that whenever someone figures out the universe it gets replaced with something stranger, and that there is evidence this has happened repeatedly. The Restaurant at the End of the Universe is referenced in the article. I’m a bit surprised nothing in the article was mentioned as being “mostly harmless”. | | |
| ▲ | gishh 3 hours ago | parent [-] | | One of these days I’ll figure out how to throw myself at the ground and miss. |
|
|