The CloudFormation templates that are provided and supported by F5 are an excellent resource for customers to deploy BIG-IP VE in AWS. Along with these templates, documentation guiding your F5 deployment in AWS is an excellent resource.
And of course, DevCentral articles are helpful. I recommend reading about HA topologies in AWS to start. I hope my article today can shed more light on an architecture that will suit a specific set of requirements: No Elastic IP's (EIP’s), High Availability (HA) across AZ’s, and multiple VPC’s.
I recently had a requirement to deploy BIG-IP appliances in AWS across AZ’s. I had read the official deployment guide, but I wasn’t clear on how to achieve failover without EIP’s. I was given 3 requirements:
This is the easiest of the requirements to meet because F5 has already provided templates to deploy these in AWS. I personally used a 2-nic device, with a BYOL license, deployed to an existing VPC, so that meant my template was this one. After this deployment is complete, you should have a pair of devices that will sync their configuration.
The supported F5 templates will deploy with the Advanced HA iApp. It is important that you configure this iApp after you have completed your AWS deployments. The iApp uses IAM permissions deployed with the template to make API calls to AWS at the time of failover. The API calls will update the route tables that you specify within the iApp wizard. Because this iApp is installed on both devices, either device can update the route in your route tables to point to its own interface.
This article was first written Feb 2019, and in Dec 2019 F5 released the Cloud Failover Extension (CFE), which is a cloud-agnostic, declarative way to configure failover in multiple public clouds. You can use the CFE instead of the Advanced HA iApp to achieve high availability between BIG-IP devices in cloud.
Your API calls will typically be sent to the public Internet endpoints for AWS EC2 API calls. Optionally, you can use AWS VPC endpoints to keep your API calls out to AWS EC2 from traversing the public Internet. My colleague Arnulfo Hernandez has written an article explaining how to do this.
I’m recycling another DevCentral solution here. You will need to choose an IP range for your VIP network that does not fall within the CIDR that is assigned to your VPC. Let’s call it an “alien range” because it “doesn’t belong” in your VPC and you couldn’t assign IP addresses from this range to your AWS ENI’s. Despite that, now create a route table within AWS that points this “alien range” to your Active BIG-IP device’s ENI (if you’re using a 2+ nic device, point it to the correct data plane NIC, not the Mgmt. interface). Don’t forget to associate the route table with specific subnets, per your design. Alternatively, you could add this route to the default VPC route table.
Now create a VIP on your active device and configure the IP address as an IP within your alien range. Ensure the config replicates to your standby device. Ensure that source/destination checking is disabled on the ENI’s that your AWS routes are pointing to (on both Standby and Active devices). You should now have a VIP that you can target from other hosts in your VPC, provided that the route you created above is applied to the traffic destined to the VIP.
For extra credit, we’ll set up a Transit Gateway. This will allow other VPCs to route traffic to this “alien range” also, provided that the appropriate routes exist in the remote VPC’s, and also are applied to the appropriate Transit Gateway Route Table. Again, I’m recycling ideas that have already been laid out in other DevCentral posts.
I won’t re-hash how to set up a transit gateway in AWS, because you can follow the linked post above. Sufficed to say, this is what you will need to set up if you want to route between multiple VPC’s using a transit gateway:
I will point out that you need to add a route for your “alien range” in your transit gateway route table, and in your remote VPC’s. That way, hosts in your remote VPC’s will forward traffic destined to your alien range (VIP network) to the transit gateway, and the transit gateway will forward it to your VPC, and the route you created in Step A will forward that traffic to your active BIG-IP device.
After the above configuration, you should have an environment that looks like the diagram below:
Internet access for deployments: When you deploy your BIG-IP devices, they will need Internet access to pull down some resources, including the iApp. So if you are deploying devices into your existing VPC, make sure you have a reachable Internet Gateway in AWS so that the devices have Internet access through both their management interface, and their data plane interface(s).
Internet access for failover: Remember that an API call to AWS will still use an outbound request to the Internet. Make sure you allow the BIG-IP devices to make outbound connections over HTTPS to the Internet. If this is not available, you will find that your route tables are not updated at time of failover. (If you have a hard requirement that your devices should not have outbound access to internet, you can follow Arnulfo's guide linked to above and use VPC endpoints to keep this traffic on your local VPC)
iApp logs: you can enable this in the iApp settings. I have used the logs (in /var/ltm/log) to troubleshoot issues that I have created myself. That’s how I learned not to accidentally cut off Internet access for your devices!
Don’t forget about return routes if SNAT is disabled: Just like on-prem environments, if you disable SNAT, the pool member will need to route return traffic back to the BIG-IP device. You will commonly set up a default route (0.0.0.0/0) in AWS, point it at the ENI of the active BIG-IP device, and associate this route table with the subnet containing the pool members. If the pool members are in a remote VPC, you will need to create this route on the transit gw route table also.
Don’t accidentally cut off internet access: When you configure the default route of 0.0.0.0/0 to point to eth1 of the BIG-IP device, don’t apply this route to every subnet in your Security VPC. It may be easy to do so accidentally, but remember that it could cause the API calls to update route tables to fail when the Standby device becomes Active.
Don’t forget to disable source/dest check on your ENI’s. This is configured by the template, but if you have other devices that require it, remember to check this setting.