When working in the AWS public cloud, you have to adapt to a world where failure is guaranteed and arrives at unpredictable times. Combining LTM for high availability (HA) within a single Availability Zone (AZ) with GTM across AZs and regions gives you an architecture that can survive the chaos monkeys.
The following is part of a demo environment that I built for AWS. It highlights a few useful LTM/GTM features, including:
Monitoring LTM services from GTM
Using GTM for outbound DNS resolution
Creating split DNS records that return the Elastic IP (EIP) to external clients and the internal VPC IP to internal clients
Creating topology load-balancing records to avoid cross-AZ communication for active-active services
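To make the split DNS idea concrete, here is a minimal Python sketch of the decision it encodes: a query from inside the VPC gets the private address, and any other query gets the EIP. This is not how GTM implements it internally, just a model of the behavior. The VPC CIDR, the EIPs (taken from the 203.0.113.0/24 documentation range), the private IPs, and the `split_dns_answer` function are all hypothetical values chosen for illustration.

```python
import ipaddress

# Hypothetical addressing for illustration only; not the demo's real values.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")

# Each service has a public Elastic IP and a private VPC IP (hypothetical).
SERVICE_ADDRESSES = {
    "us-east-1d": {"eip": "203.0.113.10", "private": "10.0.1.10"},
    "us-east-1e": {"eip": "203.0.113.20", "private": "10.0.2.10"},
}


def split_dns_answer(client_ip: str, az: str) -> str:
    """Return the private IP to clients inside the VPC, else the EIP."""
    addrs = SERVICE_ADDRESSES[az]
    if ipaddress.ip_address(client_ip) in VPC_CIDR:
        return addrs["private"]
    return addrs["eip"]


print(split_dns_answer("10.0.1.55", "us-east-1d"))     # 10.0.1.10
print(split_dns_answer("198.51.100.7", "us-east-1d"))  # 203.0.113.10
```

Keying the decision on the querying client's source address is what lets a single DNS name resolve differently for internal and external users.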
Here’s what the overall architecture looks like:
We create a few wide IP records to demonstrate different failure scenarios (these DNS records exist only in my demo environment):
Active/standby, failing over from D to E
Active/standby, failing over from E to D
Round robin between D and E (except for internal requests)
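The three wide IP behaviors above can be modeled roughly as follows. This is a simplified Python sketch, not GTM's actual load-balancing code: the pool names "D" and "E" stand in for the two AZ pools, and `active_standby` and `RoundRobin` are hypothetical helpers. The "except for internal requests" case is not modeled here.

```python
def active_standby(primary, standby, healthy):
    """Answer with the primary pool while it is healthy, else the standby."""
    if primary in healthy:
        return primary
    if standby in healthy:
        return standby
    return None  # nothing healthy to hand out


class RoundRobin:
    """Rotate through pools, skipping any that are currently unhealthy."""

    def __init__(self, pools):
        self._pools = list(pools)
        self._next = 0

    def pick(self, healthy):
        # Try each pool at most once per request, starting where we left off.
        for _ in range(len(self._pools)):
            pool = self._pools[self._next]
            self._next = (self._next + 1) % len(self._pools)
            if pool in healthy:
                return pool
        return None


both = {"D", "E"}
print(active_standby("D", "E", both))   # D
print(active_standby("D", "E", {"E"}))  # E (D failed, standby takes over)
print(active_standby("E", "D", both))   # E

rr = RoundRobin(["D", "E"])
print([rr.pick(both) for _ in range(4)])  # ['D', 'E', 'D', 'E']
print(rr.pick({"E"}))                     # E (D skipped while down)
```

The key difference is that active/standby always prefers one side, so the standby only receives traffic during a failure, while round robin spreads requests across both sides during normal operation.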
Here’s what an external user sees when everything is healthy:
And here’s the same view from the command line, using curl:
From an internal user’s perspective, the behavior is slightly different. A client in the US-EAST-1D AZ is always directed to the same AZ when it connects to the active-active service, and it reaches the service through its internal IP address. If you have data-heavy services, this can save money by avoiding cross-AZ data transfer charges.
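Conceptually, the topology records map a client’s subnet to an AZ and prefer a pool member in that same AZ. Below is a minimal Python sketch of that preference, assuming a hypothetical subnet-to-AZ map and hypothetical internal member addresses. Real GTM topology records are scored rules; this sketch collapses them to a single "same AZ first, otherwise any healthy AZ" rule.

```python
import ipaddress

# Hypothetical subnet-to-AZ map, standing in for GTM topology records.
SUBNET_TO_AZ = {
    ipaddress.ip_network("10.0.1.0/24"): "us-east-1d",
    ipaddress.ip_network("10.0.2.0/24"): "us-east-1e",
}

# Hypothetical internal addresses of the active-active service in each AZ.
MEMBERS = {"us-east-1d": "10.0.1.10", "us-east-1e": "10.0.2.10"}


def client_az(client_ip):
    """Return the AZ whose subnet contains client_ip, or None."""
    ip = ipaddress.ip_address(client_ip)
    for net, az in SUBNET_TO_AZ.items():
        if ip in net:
            return az
    return None


def topology_answer(client_ip, healthy_azs):
    """Prefer the client's own AZ; otherwise fall back to any healthy AZ."""
    az = client_az(client_ip)
    if az in healthy_azs:
        return MEMBERS[az]  # same-AZ answer avoids cross-AZ data charges
    fallback = sorted(healthy_azs)  # sorted only for deterministic output
    return MEMBERS[fallback[0]] if fallback else None


both = {"us-east-1d", "us-east-1e"}
print(topology_answer("10.0.1.55", both))            # 10.0.1.10
print(topology_answer("10.0.2.55", both))            # 10.0.2.10
print(topology_answer("10.0.1.55", {"us-east-1e"}))  # 10.0.2.10
```

The fallback branch matters: keeping traffic in-AZ is an optimization, so when the local side is down the client should still get a working answer from the other AZ.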
When we simulate a failure of the D services (by stopping the web server), connections initially fail because the client is still using the cached US-EAST-1D IP address:
Once the client refreshes its DNS record (the default TTL is 30 seconds), it communicates only with the US-EAST-1E services:
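The window of failed connections comes from client-side DNS caching: until the TTL expires, the client keeps using the answer it already has, even though GTM would now return a different one. The Python sketch below models that with a simplified caching resolver and an explicit clock so the timeline is deterministic. `CachingResolver` and `gtm_answer` are hypothetical names, and the model ignores the time GTM’s monitors take to mark D down, which adds to the outage in practice.

```python
class CachingResolver:
    """Caches a DNS answer until its TTL expires (simplified client model)."""

    def __init__(self, authoritative, ttl=30.0):
        self._lookup = authoritative  # callable returning the current answer
        self._ttl = ttl
        self._answer = None
        self._expires = float("-inf")  # force a lookup on first use

    def resolve(self, now):
        # Only query upstream once the cached answer has expired.
        if now >= self._expires:
            self._answer = self._lookup()
            self._expires = now + self._ttl
        return self._answer


healthy = {"us-east-1d", "us-east-1e"}


def gtm_answer():
    # Hypothetical stand-in for GTM: prefer D while it is up.
    return "us-east-1d" if "us-east-1d" in healthy else "us-east-1e"


resolver = CachingResolver(gtm_answer, ttl=30.0)
print(resolver.resolve(now=0))   # us-east-1d
healthy.discard("us-east-1d")    # the web server in D is stopped
print(resolver.resolve(now=10))  # us-east-1d (stale cache, connections fail)
print(resolver.resolve(now=30))  # us-east-1e (TTL expired, re-resolved)
```

A shorter TTL narrows this window at the cost of more DNS queries, which is the usual trade-off when tuning a wide IP’s TTL.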