shadowgovt 2 hours ago

In addition, it looks like this system wasn't on any kind of 1%/10%/50%/100% rollout gating. Such a rollout would trivially have shown the poison input killing tasks.
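For concreteness, here is a minimal sketch of that kind of gating. It is only an illustration, not how the system in question works: `staged_rollout`, `apply_config`, and `is_healthy` are hypothetical names, and the health check stands in for whatever signal (task crashes, error rates) a real deployment would watch.

```python
from typing import Callable, Sequence, Tuple

# The 1%/10%/50%/100% steps, kept as integer percentages to avoid float rounding.
STAGES = (1, 10, 50, 100)


def staged_rollout(
    hosts: Sequence[str],
    apply_config: Callable[[str], object],  # hypothetical: pushes the new config to one host
    is_healthy: Callable[[str], bool],      # hypothetical: True if the host is still serving
    stages: Sequence[int] = STAGES,
) -> Tuple[bool, int]:
    """Push a config to growing fractions of hosts, halting at the first bad stage.

    Returns (completed, hosts_updated).
    """
    updated = 0
    for percent in stages:
        # Always touch at least one host, never more than exist.
        target = min(len(hosts), max(1, len(hosts) * percent // 100))
        for host in hosts[updated:target]:
            apply_config(host)
        updated = target
        # Check every host updated so far before widening the blast radius.
        if not all(is_healthy(h) for h in hosts[:updated]):
            return False, updated
    return True, updated
```

With 100 hosts and an input that kills every task it reaches, this stops after the first stage with one host affected, instead of all 100.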

penteract an hour ago | parent | next [-]

To me it reads like there was a gradual rollout of the faulty software responsible for generating the config files, but those files are generated on approximately one machine, then propagated across the whole network every 5 minutes.

> Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network.

helloericsf an hour ago | parent | prev [-]

Not a DBA, how do you do DB permission rollout gating?