| ▲ | adsharma 6 hours ago |
| > We plan to deliver improvements to [..] purging mechanisms. During my time at Facebook, I maintained a bunch of kernel patches to improve jemalloc's purging mechanisms. They weren't popular with the kernel or security communities, but they were certainly more efficient on benchmarks. Many programs run multiple threads, allocating in one and freeing in another. Jemalloc's primary mechanism used to be: madvise the page back to the kernel, then have the kernel hand it out to another thread's pool. One problem: this involves zeroing the memory, which hurts cache locality and overall app performance. It's completely unnecessary if the page is being recirculated within the same security domain. The problem was getting everyone to agree on what that security domain is, even though the mechanism was opt-in. https://marc.info/?l=linux-kernel&m=132691299630179&w=2 |
|
| ▲ | jcalvinowens 2 hours ago | parent | next [-] |
I'm really surprised to see you still hawking this. We did extensive benchmarking of HHVM with and without your patches, and they made no statistically significant difference in high-level metrics. So we dropped them from the kernel, and they never went back in. I don't doubt for a second you can come up with specific counterexamples and microbenchmarks which show benefit. But you were unable to show an advantage at the system level when challenged on it, and that's what matters. |
| |
| ▲ | adsharma 2 hours ago | parent [-] | | You probably weren't there when servers were running for many days at a time. By the time you joined and benchmarked these systems, the continuous rolling deployment had taken over. If you're restarting the server every few hours, of course the memory fragmentation isn't much of an issue. > But you were unable to show an advantage at the system level when challenged on it, and that's what matters. You mean 5 years after I stopped working on the kernel and the underlying system had changed? I don't recall ever talking to you on the matter. | | |
| ▲ | jcalvinowens 2 hours ago | parent [-] | | > By the time you joined and benchmarked these systems, the continuous rolling deployment had taken over Nope, I started in 2014. > I don't recall ever talking to you on the matter. I recall. You refused to believe the benchmark results and made me repeat the test, then stopped replying after I did :) | | |
| ▲ | adsharma an hour ago | parent | next [-] | | The patches were written in 2011 and published in 2012. They did what they were supposed to do at the time. For the peanut gallery: this is a manifestation of an internal eng culture at fb that I wasn't particularly fond of. Celebrating that "I killed X" and partying about it. You didn't reply to the main point: did you benchmark a server that was running for several days at a time? Reasonable people can disagree about whether this is a good deployment strategy or not. I tend to believe that there are many places which want to deploy servers and run them for days, if not months. | | |
| ▲ | alexgartrell an hour ago | parent [-] | | More for the peanut gallery: I worked with both of these guys at Meta on this. The "servers are only on for a few hours" thing was basically never true, so I have no idea where that claim is coming from. The web performance test took more than a few hours to run on its own, and we had way more aggressive soaks for other workloads. My recollection was that "write zeroes" just became a cheaper operation between '12 and '14. A fun fact to distract from the awkwardness: a lot of the kernel work done in the early days was exceedingly scrappy. The port mapping stuff for memcached UDP before SO_REUSEPORT, for example. FB binaries couldn't even run on vanilla Linux a lot of the time. Over the next several years we put a TON of effort into getting as close to mainline as possible, and now Meta is one of the biggest drivers of Linux development. | | |
| ▲ | adsharma an hour ago | parent [-] | | [ Edit: "servers" in this context meant the HHVM server processes, not the physical servers, which of course had longer uptimes ] People got promoted for continuous deployment https://engineering.fb.com/2017/08/31/web/rapid-release-at-m... I think it's fair to say the hardware changed, the deployment strategy changed, and the patches were no longer relevant, so we stopped applying them. When I showed up, there were 100+ patches on top of a 2009 kernel tree. I reduced that to about 10 or so critical patches and rebased them on a six-month cadence over 2-3 years. Upstreamed a few. I didn't go around saying those old patches were bad ideas and that I got rid of them. How you say it matters. | | |
| ▲ | alexgartrell 32 minutes ago | parent [-] | | The linked article says they decided to do CD in 2016 fwiw so that's not inconsistent with what I said. You reduced the number of patches a lot and also pushed very hard to get us to 3.0 after we sat on 2.6.38 ~forever. Which was very appreciated, btw. We built the whole plan going forward based on this work. I'm not arguing that anyone should be nice to anyone or not (it's a waste of breath when it comes to Linux). I'm just saying that the benchmarking was thorough and that contemporary 2014 hardware could zero pages fast. |
|
|
| |
| ▲ | nullpoint420 an hour ago | parent | prev [-] | | This is why I love hacker news. I learn so much from these moments. | | |
|
|
|
|
| ▲ | asveikau 2 hours ago | parent | prev [-] |
| Maybe I'm misreading, but considering it OK to leak memory contents across a process boundary because it's within a cgroup sounds wild. |
| |
| ▲ | adsharma 2 hours ago | parent [-] | | It wasn't just any cgroup. If you put two mutually untrusting processes in a memory cgroup, there is a lot that can go wrong. If you don't like the idea of memory cgroups as a security domain, you could tighten it to be a process. But kernel developers have been opposed to tracking pages on a per-address-space basis for a long time. Memory cgroup tracking, on the other hand, happens by construction. |
|