| ▲ | shadowgovt 2 hours ago | |
In addition, it looks like this system wasn't on any kind of 1%/10%/50%/100% rollout gating. Such a rollout would trivially have shown the poison input killing tasks. | ||
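For anyone unfamiliar with the term, here is a minimal sketch of what 1%/10%/50%/100% gating could look like. This is not how the system under discussion works; `staged_rollout`, `bucket`, and the `apply_and_check` callback are hypothetical names made up for illustration. The idea is only that each stage widens the set of hosts, and the rollout halts at the first stage where a host fails its health check.

```python
import hashlib

STAGES = [1, 10, 50, 100]  # percent of hosts per stage, as in the comment above


def bucket(host: str) -> int:
    """Map a host name to a stable bucket in 0..99 (hypothetical helper)."""
    digest = hashlib.sha256(host.encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def staged_rollout(hosts, apply_and_check):
    """Apply a change stage by stage; stop at the first stage with a failure.

    apply_and_check(host) -> bool is a hypothetical callback that pushes the
    change to one host and reports whether the host stayed healthy.
    Returns (completed, percent_reached, failed_hosts).
    """
    done = set()
    for percent in STAGES:
        # Hosts whose bucket falls under the current percentage and which
        # have not already received the change in an earlier stage.
        targets = [h for h in hosts if bucket(h) < percent and h not in done]
        failed = [h for h in targets if not apply_and_check(h)]
        done.update(targets)
        if failed:
            return False, percent, failed
    return True, 100, []
```

With a change that kills every task it touches, the loop stops at the first stage that contains any host at all, so only that stage's slice of the fleet is affected rather than all of it.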
| ▲ | penteract an hour ago | parent | next [-] | |
To me it reads like there was a gradual rollout of the faulty software responsible for generating the config files, but those files are generated on approximately one machine, then propagated across the whole network every 5 minutes.
> Bad data was only generated if the query ran on a part of the cluster which had been updated. As a result, every five minutes there was a chance of either a good or a bad set of configuration files being generated and rapidly propagated across the network. | ||
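A toy model of that reading, as a sketch only: assume each five-minute cycle runs the query on a uniformly random node, and the generated file is bad exactly when that node has been updated. The function name, the `updated_fraction` parameter, and the uniform-node assumption are mine, not taken from the postmortem.

```python
import random


def simulate_cycles(updated_fraction, cycles, seed=0):
    """Return a list of 'good'/'bad' outcomes, one per five-minute cycle.

    updated_fraction: assumed share of the cluster already running the
    new software (0.0 to 1.0). Each cycle the whole network receives
    whichever file was generated, so the outcome is all-or-nothing.
    """
    rng = random.Random(seed)
    return [
        "bad" if rng.random() < updated_fraction else "good"
        for _ in range(cycles)
    ]
```

Under these assumptions, the gradual rollout only changes how often a bad file is produced, not how far it spreads: a cluster that is 10% updated still pushes a bad file to 100% of the network in roughly one cycle out of ten.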
| ▲ | helloericsf an hour ago | parent | prev [-] | |
Not a DBA, how do you do DB permission rollout gating? | ||