Configuring AWS HA Failover Across AZs Without EIPs Using F5 Cloud Failover Extension (CFE)

I recently had a requirement to deploy BIG-IP appliances in AWS across AZs without Elastic IPs (EIPs). F5 has plenty of documentation on achieving HA in AWS, but all of it relies on EIPs.

Here are some of these articles:

F5 CloudFormation Templates (CFTs) hosted on our GitHub site, and the Terraform deployments by my colleague Jeff Giroux (https://github.com/JeffGiroux/f5_terraform), can deploy HA across AZs, but they all require an EIP.

For my customer I was given 2 requirements:

HA across AZs. In this architecture, we required a pair of BIG-IP devices in Active/Standby, where each device was in a different AZ. I needed to be able to fail over between devices.
No EIPs. This requirement existed because all applications hosted in their AWS environment were internal to the organization; no users reached the applications from the public internet. The applications would therefore use private, non-routed IP address space, and there was no reason to associate an EIP (a public IP address) with the BIG-IP interface.

Assumptions:

Two BIG-IPs are running in an HA pair in AWS in 2 different AZs in the same region
CFE has been installed on each BIG-IP; if not, see this article to install it: https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/installation.html
If you used a CFT or Terraform to build the HA pair, then appropriate labeling of the S3 bucket should be in place
If you used a CFT or Terraform to build the HA pair, then the appropriate IAM Role should be created with permissions

Note: Even if your environment will not have internet access by way of EIPs or a NAT gateway, I would recommend using a CFT to build the BIG-IPs. In addition to building the BIG-IPs and all of their AWS components, it will create an S3 bucket (tagged), IAM Roles (assigned to instances), and ENI tagging.

Configure an “alien IP range”

You will need to choose an IP range for your VIP network that does not fall within the CIDR assigned to your VPC; in my case I'll use 192.168.0.0/24. Let’s call it an “alien range” because it “doesn’t belong” in your VPC (it falls outside of the CIDR block of your VPC), so you cannot assign IP addresses from this range to your AWS ENIs directly. Even so, you can create a route table within AWS with a route that points this “alien range” to your Active BIG-IP device’s Public/External ENI. Don’t forget to associate the route table with specific subnets, per your design. Alternatively, you could add this route to the default VPC route table.
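Before settling on a range, it can help to confirm that it really is "alien" to the VPC. The following is a minimal sketch using only Python's standard `ipaddress` module. The VPC CIDR `10.0.0.0/16` and the function name `is_alien_range` are assumptions for illustration; substitute the CIDR blocks actually assigned to your VPC.

```python
import ipaddress


def is_alien_range(vpc_cidrs, candidate):
    """Return True if `candidate` overlaps none of the VPC's CIDR blocks."""
    cand = ipaddress.ip_network(candidate)
    return not any(ipaddress.ip_network(c).overlaps(cand) for c in vpc_cidrs)


# Hypothetical VPC CIDR; 192.168.0.0/24 is the alien range used in this article.
vpc_cidrs = ["10.0.0.0/16"]
print(is_alien_range(vpc_cidrs, "192.168.0.0/24"))  # True: safe to use as the VIP range
print(is_alien_range(vpc_cidrs, "10.0.5.0/24"))     # False: falls inside the VPC
```

A VPC can have more than one CIDR block associated with it, which is why the helper takes a list.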

CFE Declaration

Next, send a CFE declaration to your BIG-IP with a REST API client such as Postman. If CFE was installed as part of the initial onboarding, each BIG-IP will already have a CFE declaration. Take care not to overwrite settings from that existing declaration that you still need; compare it against the declaration below before posting. The declaration is synced to the other unit in the pair, so sending it to the second unit may not be necessary, but because it is exactly the same declaration, you can simply send it to both.

{
        "class": "Cloud_Failover",
        "controls": {
            "class": "Controls",
            "logLevel": "silly"
        },
        "environment": "aws",
        "externalStorage": {
            "scopingTags": {
                "f5_cloud_failover_label": "mydeployment"
            }
        },
        "failoverRoutes": {
            "enabled": true,
            "routeGroupDefinitions": [
                {
                    "scopingTags": {
                        "f5_cloud_failover_label": "mydeployment"
                    },
                    "scopingAddressRanges": [
                        {
                            "range": "ALIEN_RANGE_IN_CIDR"
                        }
                    ],
                    "defaultNextHopAddresses": {
                        "discoveryType": "static",
                        "items": [
                            "BIGIP-A_EXTERNAL_SELF",
                            "BIGIP-B_EXTERNAL_SELF"
                        ]
                    }
                }
            ]
        }
    }
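If you prefer a script to Postman, the declaration above can be sent with Python's standard library. This is a sketch under assumptions: the management address `192.0.2.10`, the credentials, and the function name `build_declare_request` are placeholders, and the path `/mgmt/shared/cloud-failover/declare` is the CFE declare endpoint as I understand it from the CFE documentation, so verify it against your installed version. The example only builds the request; the send step is left as a comment because it depends on how you handle the BIG-IP's management certificate.

```python
import base64
import json
import urllib.request

# CFE declare endpoint (verify against the CFE docs for your version).
CFE_DECLARE_PATH = "/mgmt/shared/cloud-failover/declare"


def build_declare_request(mgmt_host, user, password, declaration):
    """Build (but do not send) a POST carrying the CFE declaration."""
    body = json.dumps(declaration).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{mgmt_host}{CFE_DECLARE_PATH}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Abbreviated declaration; use the full one from this article in practice.
declaration = {"class": "Cloud_Failover", "environment": "aws"}
req = build_declare_request("192.0.2.10", "admin", "PLACEHOLDER", declaration)
print(req.get_method(), req.full_url)
# To send: urllib.request.urlopen(req, context=your_ssl_context)
```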

Note: This configuration utilizes static identification of the defaultNextHopAddresses: IP addresses of the BIG-IP's External Self-IPs:

                    "defaultNextHopAddresses": {
                        "discoveryType": "static",
                        "items": [
                            "BIGIP-A_EXTERNAL_SELF",
                            "BIGIP-B_EXTERNAL_SELF"
                        ]
                    }

Alternatively, you can use "discoveryType":"routeTag". You will need to add another tag to the route table in your cloud environment with the reserved key f5_self_ips. For example, "f5_self_ips":"IP_ADDRESS_OF_BIGIP-A_EXTERNAL_SELF, IP_ADDRESS_OF_BIGIP-B_EXTERNAL_SELF"

The declaration would look like this:

{
        "class": "Cloud_Failover",
        "controls": {
            "class": "Controls",
            "logLevel": "silly"
        },
        "environment": "aws",
        "externalStorage": {
            "scopingTags": {
                "f5_cloud_failover_label": "mydeployment"
            }
        },
        "failoverRoutes": {
            "enabled": true,
            "routeGroupDefinitions": [
                {
                    "scopingTags": {
                        "f5_cloud_failover_label": "mydeployment"
                    },
                    "scopingAddressRanges": [
                        {
                            "range": "ALIEN_RANGE_IN_CIDR"
                        }
                    ],
                    "defaultNextHopAddresses": {
                        "discoveryType": "routeTag"
                    }
                }
            ]
        }
    }

Required tags on the Route table:

"f5_cloud_failover_label": "mydeployment"
"f5_self_ips": "BIGIP-A_EXTERNAL_SELF,BIGIP-B_EXTERNAL_SELF"

Note: the “f5_cloud_failover_label: mydeployment” in this example is a key-value pair that corresponds to the key-value pair in the scopingTags sections of the CFE declaration. These should match what is used in your original CFE deployment, as they must correspond to the label on your S3 bucket as well as the IAM Role configuration already deployed.
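Because mismatched tags are easy to miss, a small offline check can compare the route table's tags with the declaration before you test failover. This is a sketch, not part of CFE: the function name `check_route_table_tags` is hypothetical, the tags are passed in as a plain dict that you would fill in from the AWS console or CLI, and the self-IP values are example addresses.

```python
import ipaddress


def check_route_table_tags(declaration, route_table_tags):
    """Return a list of problems; an empty list means the tags look consistent."""
    problems = []
    for group in declaration["failoverRoutes"]["routeGroupDefinitions"]:
        # Every scoping tag in the declaration must be present on the route table.
        for key, value in group.get("scopingTags", {}).items():
            if route_table_tags.get(key) != value:
                problems.append(f"tag {key!r} should be {value!r}")
        # With routeTag discovery, the self IPs come from the f5_self_ips tag.
        next_hops = group.get("defaultNextHopAddresses", {})
        if next_hops.get("discoveryType") == "routeTag":
            raw = route_table_tags.get("f5_self_ips", "")
            ips = [part.strip() for part in raw.split(",") if part.strip()]
            if not ips:
                problems.append("tag 'f5_self_ips' is missing or empty")
            for ip in ips:
                try:
                    ipaddress.ip_address(ip)
                except ValueError:
                    problems.append(f"f5_self_ips entry {ip!r} is not an IP address")
    return problems


declaration = {
    "failoverRoutes": {
        "routeGroupDefinitions": [
            {
                "scopingTags": {"f5_cloud_failover_label": "mydeployment"},
                "defaultNextHopAddresses": {"discoveryType": "routeTag"},
            }
        ]
    }
}
# Example tag values; the self IPs here are made up.
tags = {"f5_cloud_failover_label": "mydeployment", "f5_self_ips": "10.0.1.11,10.0.2.11"}
print(check_route_table_tags(declaration, tags))
```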

Virtual Server Creation

Now, all you have to do is create your virtual servers using the 192.168.0.0/24 subnet. Ensure that source/destination checking is disabled on the ENIs that your AWS routes point to (on both the Standby and Active devices). When a failover occurs, the CFE on the new Active unit will send a POST to the AWS EC2 API so that the previously recorded route now points to the ENI of the new Active unit. You can watch this happen as you perform a failover by looking at the route table and seeing the “Target” of your 192.168.0.0/24 Destination change ENIs.

Test access to your VS at the 192.168.0.x address. Verify that traffic traverses the Active BIG-IP. Then perform a failover and test access to the VS again, verifying that traffic traverses the newly Active BIG-IP.
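If you would rather confirm the route change from a script than from the console, you can inspect the JSON that `aws ec2 describe-route-tables --output json` returns before and after the failover. The sketch below only parses that output; the helper name `route_target_eni` and the ENI and route table IDs in the sample are made up for illustration.

```python
def route_target_eni(describe_output, cidr):
    """Return the ENI ID that the route for `cidr` points to, or None if absent."""
    for table in describe_output.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") == cidr:
                return route.get("NetworkInterfaceId")
    return None


# Trimmed sample in the shape of describe-route-tables output (IDs are fictional).
sample = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-0123456789abcdef0",
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {
                    "DestinationCidrBlock": "192.168.0.0/24",
                    "NetworkInterfaceId": "eni-0aaaaaaaaaaaaaaaa",
                },
            ],
        }
    ]
}
print(route_target_eni(sample, "192.168.0.0/24"))
```

Run it once before and once after the failover; the returned ENI should switch from the old Active unit's external ENI to the new one's.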

Completed design:

After the above configuration, you should have an environment that looks like the diagram below:

Common issues

In my experience, customers who are having difficulty with failover in AWS usually run into one of these problems, ordered from most to least common:

Outbound internet access. The BIG-IP that is becoming Active during a failover event will send API calls to AWS, specifically to the EC2 endpoint and the S3 endpoint. These are available on the public Internet, which is why outbound Internet access is required for the CFE to successfully perform updates at failover.
- There is a documented solution for environments where no outbound internet access is allowed: use AWS private endpoints for AWS API calls. My colleague Arnulfo wrote an article on this.
Tags. Tags are easy to misconfigure. In this scenario, the route table must be tagged with at least one tag, and two if you are using tags to identify target ENIs. The ENIs need to be tagged, and the S3 bucket must be tagged.
CFE configuration declaration.
1. The values from the tags must match what is in the CFE config file. The keys should not be changed; i.e., "f5_cloud_failover_label" and "f5_self_ips" are tag keys and must stay as they are. The values are what you should configure.
2. CFE config file is in JSON format. It's easy to accidentally overlook a section, so double-check this.
Permissions. Each BIG-IP must have permissions to update AWS resources like ENIs and route tables. This is defined in the IAM Role, which is assigned to the BIG-IP EC2 instances. This is sometimes overlooked, edited, or otherwise misconfigured.
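For the JSON-related issue above, a quick lint of the declaration file before posting it can catch a malformed file or a dropped section. This is only a sketch: `lint_declaration` is a hypothetical helper, and the list of expected sections is simply what the declarations in this article contain, not the official CFE schema.

```python
import json

# Top-level sections used by the declarations in this article
# (an assumption for this check, not an exhaustive schema).
EXPECTED_SECTIONS = ("class", "environment", "externalStorage", "failoverRoutes")


def lint_declaration(text):
    """Return a list of problems found in a CFE declaration given as JSON text."""
    try:
        decl = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(decl, dict):
        return ["declaration must be a JSON object"]
    problems = [f"missing section: {name}" for name in EXPECTED_SECTIONS if name not in decl]
    if decl.get("class") not in (None, "Cloud_Failover"):
        problems.append('"class" should be "Cloud_Failover"')
    return problems


# A declaration where two sections were accidentally left out.
print(lint_declaration('{"class": "Cloud_Failover", "environment": "aws"}'))
```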

Troubleshooting tips

The CFE documentation does provide some troubleshooting tips. Here are my favorite lessons learned:

The log file will usually tell you exactly what is going on. We want to look at this file on the unit that was standby and is becoming active. Either check out this file after a failover or tail the file live during a test failover scenario:
```
 tail -f /var/log/restnoded/restnoded.log
```
1. Example: if there is no outbound internet access allowed to the device, you will see attempts to make API calls in the logs, but those attempts will be rejected with a message like "Connection refused".
2. Example: if you see API calls being made but the response is "Permission Denied", you likely have an issue with an IAM Role that has insufficient privileges, or an IAM Role that is not correctly assigned to the BIG-IP.
Don't forget to configure your CFE log level to silly while troubleshooting.
Tcpdump is the network admin's friend. From the Standby unit going Active:

 tcpdump -nni EXTERNAL_VLAN port 443 and host EXTERNAL_SELF-IP

There are several EC2 and S3 API endpoints your BIG-IP will connect to over HTTPS port 443. Tcpdump may show you a connection attempt that is not completing so you can troubleshoot connectivity further.

tcpdump -nn port 53

This shows all of the DNS requests that the BIG-IP makes while it is going Active, so you can see which EC2 and S3 API endpoints it needs to connect to, which helps with troubleshooting.

Note: Don’t be surprised to see the names of S3 buckets that are associated with your AWS account but are not related to your deployment. The CFE has retrieved a list of all of the S3 buckets in your account. It must then determine which S3 bucket has a tag that matches the tag in your CFE declaration defined here:

"externalStorage": {
            "scopingTags": {
                "f5_cloud_failover_label": "mydeployment"
            }
        }

Due to an IAM Role assigned to your BIG-IP, it will only be able to access the bucket deployed with your BIG-IPs. You will see log messages in restnoded.log that look like this; this message can safely be ignored:

[f5-cloud-failover] Unable to get SOME_OTHER_BUCKETNAME region info. AccessDenied: Access Denied.
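When restnoded.log is busy, a small filter can separate the messages discussed above from the noise. The sketch below classifies lines by the strings mentioned in this article ("Connection refused", "Permission Denied", and the harmless "Unable to get ... region info" message). The function names and the sample log lines are illustrative assumptions; real log lines will carry more detail.

```python
import re
from collections import Counter

# The harmless message about unrelated S3 buckets described above.
BENIGN = re.compile(r"Unable to get \S+ region info")


def triage(line):
    """Classify one log line as 'ignore', 'connectivity', 'iam', or 'other'."""
    if BENIGN.search(line):  # check first: this line also mentions Access Denied
        return "ignore"
    lowered = line.lower()
    if "connection refused" in lowered:
        return "connectivity"  # likely no outbound path to the AWS API endpoints
    if "permission denied" in lowered:
        return "iam"  # likely an IAM Role problem
    return "other"


def summarize(lines):
    """Count how many lines fall into each category."""
    return Counter(triage(line) for line in lines)


# Made-up sample lines for illustration only.
sample = [
    "[f5-cloud-failover] Unable to get SOME_OTHER_BUCKETNAME region info. AccessDenied: Access Denied.",
    "[f5-cloud-failover] request failed: Connection refused",
    "[f5-cloud-failover] update route failed: Permission Denied",
]
print(dict(summarize(sample)))
```

To run it against a real log, pass the open file object for /var/log/restnoded/restnoded.log to `summarize`.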

Conclusion

You have now configured AWS and your F5 BIG-IPs so that you can run a pair of VEs in an HA failover configuration across Availability Zones in AWS without using Elastic IPs (EIPs). Due to the extensibility of the BIG-IP platform, the CFE can accommodate all types of configurations that meet your secure cloud computing requirements.

Updated Jun 06, 2023

Version 2.0