DNS-based failover between AWS Availability Zones and Split DNS
Working in the AWS public cloud, one has to adapt to a world of guaranteed failure at unpredictable times. Combining LTM for HA within a single availability zone with GTM across availability zones and regions provides an architecture that can survive the chaos monkeys.
The following is part of a demo environment that I built for AWS. It highlights a few useful features of LTM/GTM, including:
- Monitoring LTM services from GTM (see the sketch after this list)
- Using GTM for outbound DNS resolution
- Creating split DNS records for EIP vs. internal VPC IP
- Creating topology LB records to avoid cross-AZ communication for active-active services
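To make the first bullet concrete, here's a rough tmsh sketch of how the GTM side could be wired up. The datacenter, server, and device names and the self IP addresses below are all placeholders for a demo like this one, not the exact objects from my environment:

```
# Rough sketch only -- all names and IPs are placeholders for this demo.
# One GTM datacenter per availability zone:
tmsh create gtm datacenter US-EAST-1D
tmsh create gtm datacenter US-EAST-1E

# Register each LTM as a BIG-IP server object. The built-in "bigip" monitor
# (iQuery) is what lets GTM track the health of the LTM virtual servers,
# and virtual-server discovery pulls them in automatically.
tmsh create gtm server ltm-east-1d datacenter US-EAST-1D \
    devices add { ltm-east-1d.f5demo.com { addresses add { 10.1.1.10 } } } \
    monitor bigip virtual-server-discovery enabled
tmsh create gtm server ltm-east-1e datacenter US-EAST-1E \
    devices add { ltm-east-1e.f5demo.com { addresses add { 10.2.1.10 } } } \
    monitor bigip virtual-server-discovery enabled
```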
Demo
Here’s what the overall architecture looks like:
We create a few wide IP records to show different failure scenarios (these DNS records are isolated to my demo environment); a configuration sketch follows the list:
- prefer-d.f5demo.com
  - Active/Standby from D to E
- prefer-e.f5demo.com
  - Active/Standby from E to D
- active-active.f5demo.com
  - Round robin between D and E (unless internal request)
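A hedged sketch of what the wide IP and pool objects behind these records might look like in tmsh. The pool names, virtual server paths (vs-web-d / vs-web-e), and member-order details are assumptions and may need tweaking for your TMOS version:

```
# Sketch only -- pool names, virtual server paths, and ordering details are
# assumptions and may need adjusting for your TMOS version.

# global-availability = use members in order, i.e. active/standby per AZ:
tmsh create gtm pool a pool-d-then-e load-balancing-mode global-availability \
    members add { ltm-east-1d:/Common/vs-web-d { member-order 0 } \
                  ltm-east-1e:/Common/vs-web-e { member-order 1 } }
tmsh create gtm pool a pool-e-then-d load-balancing-mode global-availability \
    members add { ltm-east-1e:/Common/vs-web-e { member-order 0 } \
                  ltm-east-1d:/Common/vs-web-d { member-order 1 } }

# round-robin across both AZs for the active-active record:
tmsh create gtm pool a pool-active-active load-balancing-mode round-robin \
    members add { ltm-east-1d:/Common/vs-web-d ltm-east-1e:/Common/vs-web-e }

# Wide IPs tie the DNS names to the pools:
tmsh create gtm wideip a prefer-d.f5demo.com pools add { pool-d-then-e }
tmsh create gtm wideip a prefer-e.f5demo.com pools add { pool-e-then-d }
tmsh create gtm wideip a active-active.f5demo.com pools add { pool-active-active }
```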
Here’s what things look like from an external user’s perspective when everything is healthy:
Or the same view from the command line with curl:
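Something along these lines; the EIPs and output below are illustrative placeholders rather than captures from the demo:

```
# Illustrative output only -- the EIPs shown are placeholders.
$ dig +short active-active.f5demo.com
52.0.1.10        # US-EAST-1D EIP
$ dig +short active-active.f5demo.com
52.0.2.10        # US-EAST-1E EIP on the next query (round robin)

$ curl -s -o /dev/null -w '%{remote_ip}\n' http://active-active.f5demo.com/
52.0.1.10
```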
Taking a look from an internal user, we can see that the behavior is slightly different. In this case a client in the US-EAST-1D AZ is always directed to the same AZ when it talks to the active-active service, and it reaches the service via the internal VPC IP address rather than the EIP. This provides some cost savings for very data-heavy services by avoiding cross-AZ data transfer charges.
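An illustrative internal-client view, again with placeholder addresses for the demo VPC:

```
# From an instance in the US-EAST-1D subnet, resolving against the GTM DNS
# cache listener (addresses are placeholders for the demo VPC):
$ dig +short active-active.f5demo.com
10.1.1.50        # internal VPC IP of the US-EAST-1D virtual server
$ dig +short active-active.f5demo.com
10.1.1.50        # topology LB keeps the answer in the local AZ
```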
Performing a failure of the D services (stopping the web server), we can see that connections initially fail while the client is still trying to reach the service using the US-EAST-1D IP address:
Once the client refreshes its DNS record (default 30 second TTL), we can see that it is now only communicating with US-EAST-1E services:
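Roughly like this, with a placeholder EIP:

```
# Placeholder EIP again -- once the cached record expires, only the
# healthy US-EAST-1E member is handed out:
$ dig active-active.f5demo.com +noall +answer
active-active.f5demo.com. 30 IN A 52.0.2.10
```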
Setting it up
To create this demo you’ll need:
- LTM/GTM devices
- Two AZs
- Some backend services
Once you have these you can build out a standard LTM/GTM environment in AWS. Create a DNS cache (required for topology LB). Point your AWS instances at your DNS cache listeners (make sure these are only accessible to your internal clients!). Then build up your split DNS / topology records.
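Here's a rough tmsh outline of those steps, assuming 10.1.0.0/16 for US-EAST-1D and 10.2.0.0/16 for US-EAST-1E; every object name and address is a placeholder, and the exact listener/profile options may vary by TMOS version. The split DNS piece (EIP vs. internal VPC IP) depends on how your virtual servers and translation addresses are defined, so it isn't shown here:

```
# Rough outline of those steps -- every name, address, and subnet below is a
# placeholder, and listener/profile options may differ by TMOS version.

# 1. DNS cache (a resolving cache also covers outbound DNS resolution):
tmsh create ltm dns cache resolver demo_dns_cache

# 2. DNS profile that uses the cache, plus an internal-only listener that the
#    AWS instances point their resolv.conf at:
tmsh create ltm profile dns demo_dns_profile enable-cache yes cache demo_dns_cache
tmsh create gtm listener demo_listener address 10.1.1.53 ip-protocol udp \
    profiles add { udp_gtm_dns demo_dns_profile }

# 3. Topology records keep internal clients in their own AZ
#    (assuming 10.1.0.0/16 = US-EAST-1D, 10.2.0.0/16 = US-EAST-1E):
tmsh create gtm topology ldns: subnet 10.1.0.0/16 server: datacenter /Common/US-EAST-1D score 100
tmsh create gtm topology ldns: subnet 10.2.0.0/16 server: datacenter /Common/US-EAST-1E score 100

# 4. Prefer topology for the active-active pool, falling back to round robin
#    for clients (e.g. external LDNS) that match no topology record:
tmsh modify gtm pool a pool-active-active load-balancing-mode topology alternate-mode round-robin
```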
You can extend this example to go cross-Region as well (with the limitation that your internal IP space would not be accessible across VPCs)!
More Chaos
The example above illustrates a single failure scenario (loss of the US-EAST-1D web services). There are several other scenarios that could cause a failure, including but not limited to:
- Loss of US-EAST-1E services
- Loss of a single AZ
- Loss of a single LTM/GTM device
The demo environment is built to survive these and keep on running despite the best efforts of any chaos monkeys.
Eric_Chen (Employee):
Also take a look at https://devcentral.f5.com/s/articles/aws-advanced-ha-iapp for another way to perform cross-AZ failover.