HA Cluster behavior on AWS
I've managed to run two F5 instances in different Availability Zones (in the same VPC) from scratch, and they can sync the configuration objects I created. The CFE configuration took some time to figure out, but it is working now. I have a couple of questions about HA clustering on AWS. Can you help me understand?
- While testing this setup, I saw that if the active device goes offline for any reason, the peer device does nothing. The CMI logs on the standby unit even say that the peer device is unreachable. Is that normal? Shouldn't the standby device take action and go active once it realizes the peer is unreachable?
- The devices can sync the objects I added, including pools, iRules, nodes, virtual servers, etc. Also, the status indicator in the top left corner says the devices are "In Sync". However, when I look at "Device Management > Devices", each device sees the other device as offline. Why?
- Although there is a Sync-Failover device group configured on both devices, each device reports itself as "Active". Does this also happen in deployments built with the AWS CloudFormation or Terraform templates?
- When I use the GUI to fail over, it takes around 1 minute and 20 seconds. But if I trigger failover with a "curl" command, it only takes 5 seconds. Is that normal?
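For reference, the curl failover I mean is the iControl REST equivalent of tmsh's `run sys failover standby`; roughly something like this (the management hostname, password, and traffic-group name are placeholders for my setup):

```shell
# Force the current active unit to standby via iControl REST
# (<BIGIP_MGMT>, the password, and the traffic-group name are placeholders)
curl -sk -u admin:'<password>' \
  -H "Content-Type: application/json" \
  -X POST "https://<BIGIP_MGMT>/mgmt/tm/sys/failover" \
  -d '{"command": "run", "standby": true, "trafficGroup": "traffic-group-1"}'
```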
Let's break the problem down into two items: the upgrade and CFE.
When you upgrade an existing deployment, the software is installed into a new slot on the HDD and the configuration is imported. After the system(s) are rebooted, the HA iApp may need to be reinstalled and the iApp configuration applied to BOTH systems. A telltale sign that the HA iApp configuration has not been applied to both systems is that, when the secondary device attempts to go active, you do not see anything in /var/log/ltm listing tg_active. If you do see the tg_active scripts firing on the system attempting to move from standby to active, but the mapped configuration objects do not move, then either the instance does not have the IAM permissions, it does not have access to the EC2 API (normally via eth0 - the exact interface will be exposed by the route command and looking at the route metrics), or the secondary Elastic IPs (public IPs) were not allowed to be remapped when the system was deployed.
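A quick way to check those three points from the BIG-IP bash shell is something like the following (standard Linux tools; the region in the EC2 endpoint is a placeholder):

```shell
# Did the failover scripts fire during the last failover attempt?
grep tg_active /var/log/ltm

# Which interface carries the route to the EC2 API? Look at the
# destinations and metrics; the API is normally reached via eth0.
route -n

# Can the instance reach the EC2 API endpoint at all?
# (replace <region> with your region, e.g. us-east-1)
curl -sk --max-time 5 "https://ec2.<region>.amazonaws.com" \
  -o /dev/null -w '%{http_code}\n'
```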
Migrating to CFE
Migrating from the HA iApp to CFE requires you to remove the HA iApp and then install CFE. The migration to CFE depends on proper IAM roles, access to the EC2 API, and the S3 API endpoint (you can use VPC endpoints for these if necessary). A peer of mine who works with the integration lays out the steps as follows:
- Gather the EIPs as defined in the HA iApp into the static Elastic IP definitions: https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/aws.html#define-the-failover-addresses-in-aws
- Gather the routes as defined in the HA iApp into the static route definitions: https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/aws.html#define-the-routes-in-aws
- Uninstall the HA iApp.
- Start a fresh installation of CFE: https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/installation.html
- Configure CFE by running through https://clouddocs.f5.com/products/extensions/f5-cloud-failover/latest/userguide/aws.html.
- i.e. the example there is even the static config (with EIP mappings), so it should be really close to the old iApp.
- Just remember to tag the network interfaces; that's the only thing that needs to be tagged with the static config.
- The hardest part might be migrating/getting the IAM role right, as there is a much more granular role example now, an S3 bucket has been added, etc.
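To make the static-config step concrete, a skeleton CFE declaration for AWS looks roughly like this (the tag value, next-hop addresses, hostname, and password are placeholders; treat the clouddocs AWS example linked above as the authoritative schema). The same `f5_cloud_failover_label` tag value also goes on the network interfaces and the S3 bucket:

```shell
# POST a minimal static CFE declaration to the CFE endpoint
# (all values are placeholders; see the clouddocs AWS example for the schema)
curl -sk -u admin:'<password>' \
  -H "Content-Type: application/json" \
  -X POST "https://<BIGIP_MGMT>/mgmt/shared/cloud-failover/declare" \
  -d '{
    "class": "Cloud_Failover",
    "environment": "aws",
    "externalStorage": {
      "scopingTags": { "f5_cloud_failover_label": "mydeployment" }
    },
    "failoverAddresses": {
      "scopingTags": { "f5_cloud_failover_label": "mydeployment" }
    },
    "failoverRoutes": {
      "scopingTags": { "f5_cloud_failover_label": "mydeployment" },
      "scopingAddressRanges": [ { "range": "0.0.0.0/0" } ],
      "defaultNextHopAddresses": {
        "discoveryType": "static",
        "items": [ "10.0.1.10", "10.0.2.10" ]
      }
    }
  }'
```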
Upgrade or Deploy New
I cannot answer that for you, as there are many nuances to the architecture and each option carries some level of work with it.
A parallel deployment allows you to build out the new stack and its operational aspects, and then cut over the DNS records for the virtual IP addresses. It carries with it the work of having to migrate the virtual server configurations.
An upgrade without the installation of CFE is just a standard upgrade followed by reapplying the HA iApp config.
An upgrade after the installation of CFE will be similar to the HA iApp process (install the package post-upgrade, then apply the config).
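For the "install the package post-upgrade" step, the CFE RPM can be re-installed through the package-management REST endpoint (the file name and version are placeholders; the iApps package management page in the GUI does the same thing). This assumes the RPM has already been uploaded to /var/config/rest/downloads/:

```shell
# Install a previously uploaded CFE RPM after the OS upgrade
# (file name/version is a placeholder)
curl -sk -u admin:'<password>' \
  -H "Content-Type: application/json" \
  -X POST "https://<BIGIP_MGMT>/mgmt/shared/iapp/package-management-tasks" \
  -d '{"operation": "INSTALL",
       "packageFilePath": "/var/config/rest/downloads/f5-cloud-failover-<version>.rpm"}'
```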
I would separate the migration from the HA iApp to CFE from the OS upgrade if you are not performing a parallel deployment. Why? Failing over in the cloud requires access to the proper roles and API endpoints. Trying to upgrade the OS and the failover tooling at the same time can lead to a large amount of work. The cloud failover tooling has aspects that vary by cloud provider, so if you have to troubleshoot both an upgrade AND the iApp migration, a change window can become small very quickly.
In any upgrade scenario you should always take a backup (UCS archive) of each BIG-IP. Additionally, you have the option to take a snapshot of the VM disk while it is powered off.
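Taking the archive is a one-liner from tmsh on each unit (the file name is just an example); copy it off-box before the change window:

```shell
# Save a full configuration archive on each unit
tmsh save /sys ucs /var/local/ucs/pre-upgrade-backup.ucs

# Copy it off-box, e.g. with scp from your workstation
scp admin@<BIGIP_MGMT>:/var/local/ucs/pre-upgrade-backup.ucs .
```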