wparad 10 hours ago
If the check can't be done, then everything stays stable, so I'm guessing the question is, "What happens if Route 53 runs the check and reports the wrong result?" In that case, no matter what we use, there is going to be a critical issue. The best I could suggest at that point would be to have records in your zone that round-robin across different cloud providers, but that comes with its own challenges. I believe there are some articles around on how AWS plans for failure, and on how the fallback mechanism actually reduces load on the system rather than making it worse. It would take an in-depth investigation of the expected failover mode to give a good answer there. To make it more concrete: what sort of failure mode are you expecting from the Route 53 health check? Depending on that, there could be different recommendations.
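To illustrate the distinction between "the check can't be done" and "the check reports the wrong result", here is a toy sketch. It is not how Route 53 is implemented; `CheckResult`, `choose_target`, and the hostnames are hypothetical names made up for the example.

```python
from enum import Enum


class CheckResult(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    UNKNOWN = "unknown"  # the check could not be performed at all


def choose_target(result: CheckResult, current: str, primary: str, secondary: str) -> str:
    """Pick which endpoint to serve, failing static when the check is unavailable."""
    if result is CheckResult.UNKNOWN:
        # No information: keep serving whatever is already being served.
        return current
    if result is CheckResult.HEALTHY:
        return primary
    # The check ran and reported unhealthy: fail over, even if the report is wrong.
    return secondary


# Hypothetical endpoints, for illustration only.
PRIMARY = "primary.example.com"
SECONDARY = "secondary.example.com"

stays_put = choose_target(CheckResult.UNKNOWN, PRIMARY, PRIMARY, SECONDARY)
fails_over = choose_target(CheckResult.UNHEALTHY, PRIMARY, PRIMARY, SECONDARY)
```

In this model a missing check leaves the current answer in place, while a wrong `UNHEALTHY` report still triggers a failover, which is the case that stays a critical issue regardless of provider.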
indigodaddy 9 hours ago
Have you considered the scenario where everything in AWS is so dead that the check doesn't happen, and the backends are dead too (assuming the backend services live in AWS as well)? I'd guess in that case you'd know quickly enough from supplementary alerting (you guys don't seem the type to lack some sort of awesome monitoring), and you'd have a different, worse DR problem on your hands. As for the OP's point, I'm going to assume that the health checks need to stay within/from AWS, because third-party health checks could taint or dilute the point of using the in-house AWS health check service to begin with.