Remix.run Logo
jeltz 9 hours ago

Turns out the amd machine had huge tables enabled and after disabling those the regression was there on and too. So arm vs amd was a red herring.

Of course not a nice regression but you should not run PostgreSQL on large servers without huge pages enabled so thud regression will only hurt people who have a bad configuration. That said I think these bad configurations are common out there, especially in containerized environments where the one running PostgreSQL may not have the ability to enable huge pages.

whizzter 9 hours ago | parent | next [-]

Still that huge a regression that affects multiple platforms doesn't sound too neat, did they narrow down the root cause?

db48x an hour ago | parent [-]

That should be obvious to anyone who read the initial message. The regression was caused by a configuration change that changed the default from PREEMPT_NONE to PREEMT_LAZY. If you don’t know what those options do, use the source. (<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...>)

db48x 8 hours ago | parent | prev [-]

Yes, I had a good laugh at that. It might technically be a regression, but not one that most people will see in practice. Pretty weird that someone at Amazon is bothering to run those tests without hugepages.

scottlamb 3 hours ago | parent [-]

I doubt they explicitly said "I'll run without huge pages, which is an important AWS configuration". They probably just forgot a step. And "someone at Amazon" describes a lot of people; multiply your mental probability tables accordingly.

db48x an hour ago | parent [-]

The number of people at Amazon is pretty much irrelevant; the org is going to ensure that someone is keeping an eye on kernel performance, but also that the work isn’t duplicative.

Surely they would be testing the configuration(s) that they use in production? They’re not running RDS without hugepages turned on, right?