High Availability for F5 NGINX Instance Manager in AWS

Introduction

F5 NGINX Instance Manager gives you a centralized way to manage NGINX Open Source and NGINX Plus instances across your environment. It’s ideal for disconnected or air-gapped deployments, with no need for internet access or external cloud services.

NGINX Instance Manager continues to evolve. It now includes a broad set of configuration management features, such as NGINX config versioning and templating, F5 WAF policy and signature management, monitoring of NGINX metrics and security events, and a rich API to support external automation.

As NGINX Instance Manager becomes increasingly important in the management of disconnected NGINX fleets, so does the need for high availability. This article explores how Linux clustering can provide high availability for NGINX Instance Manager across two Availability Zones in AWS.

 

Core Technologies

Core technologies used in this HA architecture design include:

Amazon Elastic Compute Cloud (EC2) instances - virtual machines rented inside AWS that can be used to host applications such as NGINX Instance Manager.

Pacemaker - an open-source high availability resource manager used in Linux clusters since 2004. Pacemaker is generally deployed with the Corosync Cluster Engine, which provides cluster node communication, membership tracking and cluster quorum.

Amazon Elastic File System (EFS) - a serverless, fully managed, elastic file system that multiple servers can mount over NFS to share file data simultaneously.

Amazon Network Load Balancer (NLB) - a layer 4 TCP/UDP load balancer that forwards traffic to targets like EC2 instances, containers or IP addresses.  NLB can send periodic health checks to registered targets to ensure that traffic is only forwarded to healthy targets.

 

Architecture Overview

In this highly available architecture, we will install NGINX Instance Manager (NIM) on two EC2 instances in different AWS Availability Zones (AZs). Four EFS file systems will be created to share key stateful information between the two NIM instances, and Pacemaker/Corosync will orchestrate the cluster: only one NIM instance is active at any time, which Pacemaker enforces by starting and stopping the NIM systemd services. Finally, an Amazon NLB will provide network failover between the two NIM instances, using an HTTP health check to determine the active cluster node.

 

Deployment Steps

1. Create AWS EFS file systems

First, we are going to create four EFS volumes to hold important NIM configuration and state information that will be shared between nodes.  These file systems will be mounted onto: /etc/nms, /var/lib/clickhouse, /var/lib/nms and /usr/share/nms inside the NIM node.

Take note of the File System IDs of the newly created file systems.

Edit the properties of each EFS file system and create a mount target in each AZ you intend to deploy a NIM node in, then restrict network access to only the NIM nodes by setting up an AWS Security Group.  You may also consider more advanced authentication methods, but these aren't covered in this article.
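If you prefer the CLI to the console, the file systems and mount targets can be created along these lines. The file system, subnet and security group IDs below are placeholders for your own values:

```shell
# Create one encrypted EFS file system per mount path (repeat for each
# of the four paths, changing the Name tag).
aws efs create-file-system \
  --encrypted \
  --performance-mode generalPurpose \
  --tags Key=Name,Value=nim-etc-nms

# Create a mount target for that file system in each AZ that will host
# a NIM node, restricted by the NIM security group.
aws efs create-mount-target \
  --file-system-id fs-0aaa111122223333a \
  --subnet-id subnet-0123456789abcdef0 \
  --security-groups sg-0123456789abcdef0
```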

2. Deploy two EC2 instances for NGINX Instance Manager

Deploy two EC2 instances with specifications suitable for the number of data plane instances that you plan to manage (sizing guidelines are available in the NGINX documentation) and connect one to each of the AZs/subnets that you configured EFS mount targets in above.

In this example, I will deploy two t2.medium instances running Ubuntu 24.04, connect one to us-east-1a and the other to us-east-1c, and create a security group allowing only traffic from their locally assigned subnets.
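A CLI sketch of that deployment might look like the following (run once per subnet, changing the Name tag; the AMI, key pair, subnet and security group IDs are placeholders - use an Ubuntu 24.04 AMI for your region):

```shell
# Launch one NIM node per AZ. All IDs below are placeholders.
aws ec2 run-instances \
  --image-id ami-0123456789abcdef0 \
  --instance-type t2.medium \
  --key-name my-keypair \
  --subnet-id subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0 \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=nim-node-1}]'
```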

 

3. Mount the EFS file systems on NGINX Instance Manager Node 1

Now that the EC2 instances are deployed, we can log on to Node 1 and mount the EFS volumes by executing the following steps:

1.  SSH onto Node 1

2. Install the efs-utils package if it is not installed already

3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory

4. Execute mount -a to mount the file systems

5. Execute df to ensure that the paths are mounted correctly
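Put together, steps 3-5 look something like this. The efs mount type is provided by the efs-utils mount helper (on Ubuntu this is typically built from the aws/efs-utils GitHub project), and the fs-... IDs are placeholders for the File System IDs noted in step 1:

```shell
# Generate the four /etc/fstab entries. The fs-... IDs are placeholders
# for the File System IDs you recorded when creating the file systems.
cat > efs-fstab.snippet <<'EOF'
fs-0aaa111122223333a:/ /etc/nms            efs _netdev,tls 0 0
fs-0bbb111122223333b:/ /var/lib/clickhouse efs _netdev,tls 0 0
fs-0ccc111122223333c:/ /var/lib/nms        efs _netdev,tls 0 0
fs-0ddd111122223333d:/ /usr/share/nms      efs _netdev,tls 0 0
EOF

# Review the snippet, append it to /etc/fstab, then mount and verify.
sudo sh -c 'cat efs-fstab.snippet >> /etc/fstab'
sudo mount -a
df -h /etc/nms /var/lib/clickhouse /var/lib/nms /usr/share/nms
```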

 

4. Install NGINX Instance Manager on Node 1

With the EFS file systems now mounted, it's time to run through the NGINX Instance Manager installation on Node 1.

1.  Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh

2.  Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/ 

3.  Run bash install-nim-bundle.sh -d ubuntu24.04 (matching the Ubuntu release installed on the nodes)

4.  Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node:

systemctl stop nms; systemctl disable nms
systemctl stop nginx; systemctl disable nginx
systemctl stop clickhouse-server; systemctl disable clickhouse-server

 

5. Install NGINX Instance Manager on Node 2

This time we are going to install NGINX Instance Manager on Node 2, but without attaching the EFS file systems.  On Node 2:

1.  Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh

2.  Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/ 

3.  Run bash install-nim-bundle.sh -d ubuntu24.04 (matching the Ubuntu release installed on the nodes)

4.  Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node:

systemctl stop nms; systemctl disable nms
systemctl stop nginx; systemctl disable nginx
systemctl stop clickhouse-server; systemctl disable clickhouse-server

 

6. Mount EFS file systems on NGINX Instance Manager Node 2

Now that we have the NGINX Instance Manager binaries installed on each node, let's mount the EFS file systems on Node 2:

1.  SSH onto Node 2

2.  Install the efs-utils package if it is not installed already

3.  Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory

4.  Execute mount -a to mount the file systems

5.  Execute df to ensure that the paths are mounted correctly

7. Install and configure Pacemaker/Corosync

With NGINX Instance Manager now installed on both nodes, it's time to get Pacemaker and Corosync installed:

1.  Install Pacemaker, Corosync and other important agents

sudo apt update
sudo apt install pacemaker pcs corosync fence-agents-aws resource-agents-base

2.  To allow Pacemaker and Corosync to communicate between nodes, we need to add rules for cluster traffic between the nodes to the Security Group for the NIM nodes.
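Assuming the default ports (pcsd listens on TCP 2224; Corosync communicates on UDP 5404-5405), the security group rules look roughly like this - sg-... is a placeholder for the NIM security group, and using the group itself as the source keeps the rules limited to cluster members:

```shell
# TCP 2224: pcsd (cluster management).
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 2224 \
  --source-group sg-0123456789abcdef0

# UDP 5404-5405: Corosync cluster communication.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol udp --port 5404-5405 \
  --source-group sg-0123456789abcdef0
```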

3.  Once we have the connectivity in place, we have to set a common password for the hacluster user on both nodes - we can do this by running the following command on both nodes:

sudo passwd hacluster
password:  IloveF5 (don't use this!)

4.  Now we start the Pacemaker services by running the following commands on both nodes:

systemctl start pcsd.service
systemctl enable pcsd.service
systemctl status pcsd.service
systemctl start pacemaker
systemctl enable pacemaker

5.  And finally, we authenticate the nodes with each other (using the hacluster username, password and node hostnames), create the cluster, and check the cluster status:

pcs host auth ip-172-17-1-89 ip-172-17-2-160
pcs cluster setup nimcluster --force ip-172-17-1-89 ip-172-17-2-160
pcs status

 

8. Configure Cluster Fencing

Fencing is the ability to make a node unable to run resources, even when that node is unresponsive to cluster commands - you can think of fencing as cutting the power to the node. Fencing protects against data corruption due to concurrent access to shared resources, commonly known as a "split-brain" scenario.  In this architecture, we use the fence_aws agent, which uses the boto3 library to connect to AWS and stop the EC2 instances of failing nodes.

Let's install and configure the fence_aws agent:

1.  Create an AWS Access Key and Secret Access key for fence_aws to use

2.  Install the AWS CLI on both NIM nodes

3.  Take note of the Instance IDs for the NIM instances

4.  Configure the fence_aws agent as a Pacemaker STONITH device.  Run the pcs stonith command, inserting your access key, secret key, region, and mappings of Instance ID to Linux hostname.

pcs stonith create hacluster-stonith fence_aws access_key=(your access key) secret_key=(your secret key) region=us-east-1 pcmk_host_map="ip-172-31-34-95:i-0a46181368524dab6;ip-172-31-27-134:i-032d0b400b5689f68" power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4
 

5.  Run pcs status and make sure that the stonith device is started
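The access key used by fence_aws only needs enough EC2 permissions to inspect and power-cycle the cluster nodes. A minimal policy sketch might look like the following (the file and policy names are mine; in production, scope the Resource down to the two NIM instance ARNs):

```shell
# Minimal IAM policy for the fencing credentials: describe, stop,
# start and reboot instances.
cat > fence-aws-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:StartInstances",
        "ec2:StopInstances",
        "ec2:RebootInstances"
      ],
      "Resource": "*"
    }
  ]
}
EOF
aws iam create-policy --policy-name fence-aws-policy \
  --policy-document file://fence-aws-policy.json
```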

 

9. Configure Pacemaker resources, colocations and constraints

OK - we are almost there! It's time to configure the Pacemaker resources, colocations and constraints.  We want the clickhouse-server, nms and nginx systemd services to come up together on the same node, and in that order.  We can do that using Pacemaker colocations and ordering constraints.

1.  Configure a pacemaker resource for each systemd service

pcs resource create clickhouse systemd:clickhouse-server
pcs resource create nms systemd:nms.service
pcs resource create nginx systemd:nginx.service

🔥HOT TIP🔥  check out pcs resource command options (op monitor interval etc.) to optimize failover time.

2.  Create two colocations to make sure they all start on the same node

pcs constraint colocation add clickhouse with nms
pcs constraint colocation add nms with nginx

3.  Create two ordering constraints to define the startup order:  ClickHouse -> NMS -> NGINX

pcs constraint order start clickhouse then nms
pcs constraint order start nms then nginx

4.  Enable and start the pcs cluster

pcs cluster enable --all
pcs cluster start --all
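Following the earlier tip about monitor intervals, failure detection can be tightened by shortening each resource's monitor operation. The values below are illustrative - balance faster detection against the load of more frequent checks:

```shell
# Shorten the monitor interval on each resource so Pacemaker notices a
# failed service sooner (illustrative values).
pcs resource update clickhouse op monitor interval=10s timeout=30s
pcs resource update nms op monitor interval=10s timeout=30s
pcs resource update nginx op monitor interval=10s timeout=30s

# Confirm that all three resources are started on the same node.
pcs status resources
```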

 

10. Provision AWS NLB Load Balancer

Finally - we are going to set up the AWS Network Load Balancer (NLB) to facilitate the failover.

1.  Create a Security Group entry to allow HTTPS traffic to enter the EC2 instances from the local subnet

2.  Create a Load Balancer target group, targeting instances, with Protocol TCP on port 443

⚠️NOTE⚠️   If you are load balancing with the TCP protocol and terminating the TLS connection on the NIM node (EC2 instance), you must create a security group entry to allow TCP 443 from the connecting clients directly to the EC2 instance IP address.  If you have trusted SSL/TLS server certificates, you may want to investigate a TLS listener on the load balancer instead.

 

3.  Ensure that an HTTPS health check is in place to facilitate the failover

🔥HOT TIP🔥 you can speed up failure detection and failover using Advanced health check settings.

4.  Add our two NIM instances as pending targets and save the target group

5.  Now let's create the network load balancer (NLB) listening on TCP port 443 and forwarding to the target group created above. 

6.  Once the load balancer is created, check the target group and you will find that one of the targets is healthy - that's the active node in the Pacemaker cluster!

7.  With the load balancing now in place, you can access the NIM console using the FQDN of your load balancer and log in with the password set during the installation on Node 1.

8.  Once you have logged in, we need to install a license before we proceed any further:

    • Click on Settings
    • Click on Licenses
    • Click Get Started
    • Click Browse
    • Upload your license
    • Click Add

 

9.  With the license now installed, we have access to the full console
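The target group, health check and NLB from the steps above can also be created from the CLI. Names and the VPC/subnet IDs below are placeholders, and the `<target-group-arn>`/`<load-balancer-arn>` values come from the output of the earlier commands:

```shell
# Target group: TCP 443, with an HTTPS health check and tightened
# thresholds for faster failure detection.
aws elbv2 create-target-group \
  --name nim-tg --protocol TCP --port 443 \
  --vpc-id vpc-0123456789abcdef0 \
  --health-check-protocol HTTPS --health-check-path / \
  --health-check-interval-seconds 10 \
  --healthy-threshold-count 2 --unhealthy-threshold-count 2

# Register both NIM instances with the target group.
aws elbv2 register-targets --target-group-arn <target-group-arn> \
  --targets Id=i-0a46181368524dab6 Id=i-032d0b400b5689f68

# Create the NLB across both AZ subnets, with a TCP:443 listener
# forwarding to the target group.
aws elbv2 create-load-balancer --name nim-nlb --type network \
  --subnets subnet-0123456789abcdef0 subnet-0123456789abcdef1
aws elbv2 create-listener --load-balancer-arn <load-balancer-arn> \
  --protocol TCP --port 443 \
  --default-actions Type=forward,TargetGroupArn=<target-group-arn>
```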

 

11. Test failover

The easiest way to test failover is to just shut down the active node in the cluster.  Pacemaker will detect the node is no longer available and start the services on the remaining node.

1.  Stop the active NIM node/instance

2.  Monitor the Target Group and watch the failover - depending on the settings you have configured, this may take a few minutes
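While the failover runs, the target group state can also be watched from the CLI (the ARN is a placeholder for your target group):

```shell
# Poll the target group every 5 seconds and watch the healthy target
# move from one instance to the other as Pacemaker fails over.
watch -n 5 "aws elbv2 describe-target-health \
  --target-group-arn <target-group-arn> \
  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State]' \
  --output table"
```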

 

12. How to upgrade NGINX Instance Manager on the cluster

To upgrade NGINX Instance Manager in a Pacemaker cluster, perform the following tasks:

1.  Stop the Pacemaker Cluster services on Node 2 - forcing Node 1 to take over.

pcs cluster stop ip-172-17-2-160

2.  Disconnect the NFS mounts on Node 2

umount /usr/share/nms
umount /etc/nms
umount /var/lib/nms
umount /var/lib/clickhouse

3.  Upgrade NGINX Instance Manager on Node 1

sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx

4.  Upgrade NGINX Instance Manager on Node 2 (with the NFS mounts disconnected)

sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx

5.  Re-mount all the NFS mounts on Node 2

mount -a

6.  Start the Pacemaker Cluster services on Node 2 - adding it back into the cluster

pcs cluster start ip-172-17-2-160

 

13. Reference Documents

Some good references on Pacemaker/Corosync clustering can be found here:

Published Mar 06, 2026
Version 1.0