Our main cluster is on azure west us but we have another cluster on amazon east and route53 on top of that. When the main clusters fails, route53 switch to secondary, so we where not affected at all this time.
The only manual step was to delay the switch back until our vms where working fine and had all resources. We do this changing route53 health check to one that is always failing.
We had also to purge our crashed mongo nodes because the journal was broken.
The only manual step was to delay the switch back until our vms where working fine and had all resources. We do this changing route53 health check to one that is always failing.
We had also to purge our crashed mongo nodes because the journal was broken.
https://auth0.com/availability-trust/img/auth0-infrastructur...