Post-Quantum Cryptography: Building Resilience Against Tomorrow’s Threats
Modern cryptographic systems such as RSA, ECC (Elliptic Curve Cryptography), and DH (Diffie-Hellman) rely heavily on the mathematical difficulty of certain problems, like factoring large integers or computing discrete logarithms. With the rise of quantum computing, however, algorithms like Shor's and Grover's threaten to break these systems, rendering them insecure. Quantum computers are not yet at the scale required to break these encryption methods in practice, but their rapid development has pushed the cryptographic community to act now. This is where Post-Quantum Cryptography (PQC) comes in: a new wave of algorithms designed to remain secure against both classical and quantum attacks.

Figure 1: Cryptography evolution

Why PQC Matters

Quantum computers exploit quantum-mechanical principles like superposition and entanglement to perform calculations that would take classical computers millennia. This threatens:

- Public-key cryptography: Algorithms like RSA rely on factoring large primes or solving discrete logarithms, problems quantum computers could crack using Shor's algorithm.
- Long-term data security: Attackers may already be harvesting encrypted data to decrypt later ("harvest now, decrypt later") once quantum computers mature.

How PQC Works

The National Institute of Standards and Technology (NIST) has led a multi-year standardization effort. Here are the main algorithm families and notable examples.

Lattice-Based Cryptography

Lattice problems are believed to be hard for quantum computers, and most of the leading candidates come from this category:

- CRYSTALS-Kyber (Key Encapsulation Mechanism)
- CRYSTALS-Dilithium (Digital Signatures)

These schemes use complex geometric structures (lattices) where finding the shortest vector is computationally hard, even for quantum computers. Example: ML-KEM (formerly Kyber) establishes encryption keys using lattices but requires more data transfer (2,272 bytes vs. 64 bytes for elliptic curves).

The figure below illustrates how lattice-based cryptography works. Imagine solving a maze with two maps, one public (twisted paths) and one private (shortest route). Only the private map holder can navigate efficiently.

Code-Based Cryptography

Based on the difficulty of decoding random linear codes, and relying on error-correcting codes.

- Classic McEliece: Resistant to quantum attacks for decades. Pros: very well studied and conservative. Cons: very large public key sizes.

The Classic McEliece scheme hides messages by adding intentional errors that only the recipient can fix. How it works:

- Key generation: Create a parity-check matrix (public key) and a secret decoder (private key).
- Encryption: Encode a message with random errors.
- Decryption: Use the private key to correct the errors and recover the message.

Figure 3: Code-Based Cryptography Illustration

Multivariate & Hash-Based Cryptography

- Multivariate: Based on solving systems of multivariate quadratic equations over finite fields, a problem believed to be quantum-resistant.
- Hash-Based: Uses hash functions to construct secure digital signatures. SPHINCS+ is stateless and hash-based, a good fit for long-term digital signature security.

Challenges and Adoption

- Integration: PQC must work within existing TLS, VPN, and hardware stacks.
- Key sizes: PQC algorithms often require larger keys. For example, Classic McEliece public keys can exceed 1 MB.
- Hybrid schemes: Combining classical and post-quantum methods allows gradual adoption.
- Performance: Lattice-based methods are fast but increase bandwidth usage.
- Standardization: NIST has finalized three PQC standards (e.g., ML-KEM) and is testing others.

Organizations must start migrating now, as transitions can take decades.
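To make the hash-based family concrete, here is a minimal Python sketch of a Lamport one-time signature, a classical precursor to schemes like SPHINCS+. This is purely illustrative: real hash-based standards add Merkle trees, many-time use, and other engineering details, and each Lamport key pair must only ever sign one message.

```python
import hashlib
import secrets

def keygen():
    # Private key: 256 pairs of random 32-byte strings (one pair per message bit).
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(256)]
    # Public key: the SHA-256 hash of each private value.
    pk = [[hashlib.sha256(s).digest() for s in pair] for pair in sk]
    return sk, pk

def bits(msg):
    # The 256 bits of the message digest select which private values to reveal.
    digest = hashlib.sha256(msg).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, msg):
    # Reveal one private value per digest bit.
    return [sk[i][b] for i, b in enumerate(bits(msg))]

def verify(pk, msg, sig):
    # Hash each revealed value and compare against the public key.
    return all(hashlib.sha256(sig[i]).digest() == pk[i][b]
               for i, b in enumerate(bits(msg)))

sk, pk = keygen()
msg = b"post-quantum ready"
sig = sign(sk, msg)
ok = verify(pk, msg, sig)
```

Security rests only on the preimage resistance of the hash function, which Grover's algorithm weakens but does not break, which is exactly why hash-based signatures are considered conservative post-quantum choices.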
Adopting PQC with BIG-IP

As of F5 BIG-IP 17.5, BIG-IP supports the widely implemented ML-KEM cipher group for both client-side and server-side TLS negotiation. Other cipher groups and capabilities will become available in subsequent releases.

Cipher walkthrough

Let's take the cipher supported in v17.5.0, the hybrid X25519_Kyber768, as an example and walk through it.

- X25519: a classical elliptic-curve Diffie-Hellman (ECDH) algorithm
- Kyber768: a post-quantum Key Encapsulation Mechanism (KEM)

The goal is to securely establish a shared secret key between the two parties using both classical and quantum-resistant cryptography.

Key Exchange

- X25519 exchange: Alice and Bob exchange X25519 public keys. Each computes a shared secret using their own private key and the other's public key.
- Kyber768 exchange: Alice uses Bob's Kyber768 public key to encapsulate a secret, producing a ciphertext and a shared secret. Bob uses his Kyber768 private key to decapsulate the ciphertext and recover the same shared secret.

Both parties now have a classical shared secret and a post-quantum shared secret, which they combine using a KDF (Key Derivation Function).

Why the hybrid approach is used:

- If quantum computers are not practical yet, X25519 provides strong classical security.
- If a quantum computer arrives, Kyber768 keeps communications secure.
- It helps organizations migrate gradually from classical to post-quantum systems.

Implementation guide

F5 introduced new enhancements in 17.5.1 (see New Features in BIG-IP Version 17.5.1): BIG-IP now supports the X25519MLKEM768 hybrid key exchange in TLS 1.3 on both the client side and the server side. This mechanism combines the widely used X25519 elliptic-curve key exchange with MLKEM768, providing enhanced protection by ensuring the confidentiality of communications even against future quantum threats.
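The "combine with a KDF" step from the cipher walkthrough above can be sketched in Python. The HKDF construction and the simple concatenation order shown here are illustrative assumptions (the actual TLS 1.3 hybrid scheme feeds the concatenated shared secrets into its own key schedule), and the random byte strings stand in for the negotiated X25519 and Kyber768 secrets:

```python
import hashlib
import hmac
import secrets

def hkdf_extract(salt, ikm):
    # HKDF-Extract (RFC 5869): condense input keying material into a PRK.
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length=32):
    # HKDF-Expand (RFC 5869): stretch the PRK into the requested key length.
    okm, t, counter = b"", b"", 1
    while len(okm) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        okm += t
        counter += 1
    return okm[:length]

# Stand-ins for the two negotiated secrets.
classical = secrets.token_bytes(32)   # X25519 ECDH shared secret
pq = secrets.token_bytes(32)          # Kyber768 decapsulated secret

# Both secrets feed the derivation, so the session key stays safe as long as
# EITHER algorithm remains unbroken.
session_key = hkdf_expand(hkdf_extract(b"\x00" * 32, classical + pq),
                          b"hybrid tls key", 32)
```

The point of the construction is that an attacker must recover both inputs to reconstruct the session key, which is the entire rationale for the hybrid approach.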
This enhancement strengthens the application's cryptographic flexibility and positions it for secure communication in classical and post-quantum environments. The change does not affect existing configurations but provides an additional option for enhanced security where supported. F5 provides an implementation KB: K000149577: Enabling Post-Quantum Cryptography in F5 BIG-IP TMOS.

NGINX Support for PQC

We are pleased to announce support for Post-Quantum Cryptography (PQC) starting with NGINX Plus R33. NGINX provides PQC support using the Open Quantum Safe provider library for OpenSSL 3.x (oqs-provider), available from the Open Quantum Safe (OQS) project. The oqs-provider library adds support for all post-quantum algorithms supported by the OQS project to network protocols like TLS in OpenSSL 3-reliant applications. All ciphers/algorithms provided by oqs-provider are supported by NGINX.

To configure NGINX with PQC support using oqs-provider, follow these steps:

1. Install the necessary dependencies:

sudo apt update
sudo apt install -y build-essential git cmake ninja-build libssl-dev pkg-config

2. Download and install liboqs:

git clone --branch main https://github.com/open-quantum-safe/liboqs.git
cd liboqs
mkdir build && cd build
cmake -GNinja -DCMAKE_INSTALL_PREFIX=/usr/local -DOQS_DIST_BUILD=ON ..
ninja
sudo ninja install

3. Download and install oqs-provider:

git clone --branch main https://github.com/open-quantum-safe/oqs-provider.git
cd oqs-provider
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local -DOPENSSL_ROOT_DIR=/usr/local/ssl ..
make -j$(nproc)
sudo make install

4. Download and install OpenSSL with oqs-provider support:

git clone https://github.com/openssl/openssl.git
cd openssl
./Configure --prefix=/usr/local/ssl --openssldir=/usr/local/ssl linux-x86_64
make -j$(nproc)
sudo make install_sw

5. Configure OpenSSL for oqs-provider in /usr/local/ssl/openssl.cnf:

openssl_conf = openssl_init

[openssl_init]
providers = provider_sect

[provider_sect]
default = default_sect
oqsprovider = oqsprovider_sect

[default_sect]
activate = 1

[oqsprovider_sect]
activate = 1

6. Generate post-quantum certificates:

export OPENSSL_CONF=/usr/local/ssl/openssl.cnf

# Generate CA key and certificate
/usr/local/ssl/bin/openssl req -x509 -new -newkey dilithium3 -keyout ca.key -out ca.crt -nodes -subj "/CN=Post-Quantum CA" -days 365

# Generate server key and certificate signing request (CSR)
/usr/local/ssl/bin/openssl req -new -newkey dilithium3 -keyout server.key -out server.csr -nodes -subj "/CN=your.domain.com"

# Sign the server certificate with the CA
/usr/local/ssl/bin/openssl x509 -req -in server.csr -out server.crt -CA ca.crt -CAkey ca.key -CAcreateserial -days 365

7. Download and install NGINX Plus.

8. Configure NGINX to use the post-quantum certificates:

server {
    listen 0.0.0.0:443 ssl;
    ssl_certificate /path/to/server.crt;
    ssl_certificate_key /path/to/server.key;
    ssl_protocols TLSv1.3;
    ssl_ecdh_curve kyber768;
    location / {
        return 200 "$ssl_curve $ssl_curves";
    }
}

Conclusion

By adopting PQC, we can future-proof encryption against quantum threats while balancing security and practicality. While technical hurdles remain, collaborative efforts between researchers, engineers, and policymakers are accelerating the transition.

Related Content

- K000149577: Enabling Post-Quantum Cryptography in F5 BIG-IP TMOS
- F5 NGINX Plus R33 Release Now Available | DevCentral
- New Features in BIG-IP Version 17.5.1
- The State of Post-Quantum Cryptography (PQC) on the Web
VMware VKS integration with F5 BIG-IP and CIS
Introduction

vSphere Kubernetes Service (VKS) is the Kubernetes runtime built directly into VMware Cloud Foundation (VCF). With CNCF-certified Kubernetes, VKS enables platform engineers to deploy and manage Kubernetes clusters while leveraging a comprehensive set of cloud services in VCF. Cloud admins benefit from support for N-2 Kubernetes versions, enterprise-grade security, and simplified lifecycle management for modern apps adoption.

As with other Kubernetes platforms, the integration with BIG-IP is done through the Container Ingress Services (CIS) component, which is hosted in the Kubernetes platform and configures the BIG-IP using the Kubernetes API. Under the hood, it uses the F5 AS3 declarative API. Note from the picture that the BIG-IP integration with VKS is not limited to BIG-IP's load balancing capabilities; most BIG-IP features can be configured using this integration, including:

- Advanced TLS encryption, including safe key storage with Hardware Security Module (HSM) or Network & Cloud HSM support.
- Advanced WAF, L7 bot, and API protection.
- L3-L4 high-performance firewall with IPS for protocol conformance.
- Behavioral DDoS protection with cloud scrubbing support.
- Visibility into TLS traffic for inspection with third-party solutions.
- Identity-aware ingress with federated SSO and integration with leading MFAs.
- AI inference and agentic support thanks to JSON and MCP protocol support.

Planning the deployment of CIS for VMware VKS

The installation of CIS in VMware VKS is performed through the standard Helm charts facility. The platform owner needs to determine beforehand:

- Whether the deployment is hosted on a vSphere (VDS) network or an NSX network. Keep in mind that on an NSX network, VKS doesn't currently allow placing the load balancers in the same segment as the VKS cluster. No special considerations apply when hosting BIG-IP in a vSphere (VDS) network.
- Whether this is a single-cluster or a multi-cluster deployment. When using the multi-cluster option in clusterIP mode (only possible with Calico in VKS), keep in mind that the POD networks of the clusters cannot have overlapping prefixes.
- Which Kubernetes networking (CNI) is to be used. CIS supports both VKS-supported CNIs: Antrea (the default) and Calico. From the CIS point of view, the CNI is only relevant when sending traffic directly to the PODs. See next.
- Which integration between the BIG-IP and the VKS CNI is desired:

NodePort mode. Applications are made discoverable using Services of type NodePort. From the BIG-IP, traffic is sent to the nodes' IPs, where it is redistributed to the PODs depending on the TrafficPolicies of the Service. This is CNI agnostic: any CNI can be used.

Direct-to-POD mode. Applications are made discoverable using Services of type ClusterIP. Note that the CIS integration with Antrea uses Antrea's NodePortLocal mechanism, which requires an additional annotation in the Service declaration. See the CIS VKS page in F5 CloudDocs for details. The NodePortLocal mechanism allows sending traffic directly to the POD without actually using the POD IP address. This is especially relevant for NSX because it allows reaching the PODs without redistributing the POD IPs across the NSX network, which is not allowed. When using vSphere (VDS) networking, either Antrea's NodePortLocal or clusterIP with Calico can be used.

Another, less frequent, option is hostNetwork POD networking; it is uncommon because it requires privileges for the application PODs or ingress controllers. Network-wise, this behaves similarly to NodePortLocal, but without the automatic allocation of ports.

- Whether the deployment is a single-tier or a two-tier deployment. A single-tier deployment is one where the BIG-IP sends the traffic directly to the application PODs.
This has a simpler traffic flow and easier persistence and end-to-end monitoring. A two-tier deployment sends the traffic to an ingress controller POD instead of the application PODs. This ingress controller could be Contour, NGINX Gateway Fabric, Istio, or an API gateway. This type of deployment offers the ultimate scalability and provides additional segregation between the BIG-IPs (typically owned by NetOps) and the Kubernetes cluster (typically owned by DevOps).

Once CIS is deployed, applications can be published using either the Kubernetes standard Ingress resource or F5's Custom Resources. The latter is the recommended way because it exposes most of the BIG-IP's capabilities. Details on the Ingress resource and F5 custom annotations can be found here. Details on the F5 CRDs can be found here. Please note that at the time of this writing, Antrea NodePortLocal doesn't support the TransportServer CRD; please consult your F5 representative for its availability. Detailed instructions on how to deploy CIS for VKS can be found on the CIS VKS page in F5 CloudDocs.

Application-aware MultiCluster support

MultiCluster allows exposing applications that are hosted in multiple VKS clusters and publishing them on a single VIP. BIG-IP and CIS are in charge of:

- Discovering where the PODs of the applications are hosted. Note that a given application doesn't need to be available in all clusters.
- Deciding, upon receiving a request for a given application, to which cluster and node/POD the request has to be sent. This decision is based on the weight of each cluster, the application availability, and the load balancing algorithm being applied.

Single-tier and two-tier architectures are possible, as are NodePort and ClusterIP modes. Note that at the time of this writing, Antrea in ClusterIP mode (NodePortLocal) is not supported; please consult your F5 representative for the availability of this feature.
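Tying back to the Direct-to-POD discussion above, the extra Service annotation that Antrea's NodePortLocal mechanism requires might resemble the following sketch. The annotation key comes from Antrea's NodePortLocal feature; the service name, selector, and ports are hypothetical, and the CIS VKS page in F5 CloudDocs remains the authoritative reference for the exact declaration CIS expects.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app            # hypothetical service name
  annotations:
    # Ask Antrea to allocate NodePortLocal ports for the backing PODs
    nodeportlocal.antrea.io/enabled: "true"
spec:
  type: ClusterIP
  selector:
    app: my-app           # hypothetical POD selector
  ports:
    - port: 8080
      targetPort: 8080
```

With this in place, Antrea publishes the allocated node ports in a POD annotation, which is what lets CIS steer BIG-IP traffic to individual PODs without distributing POD IPs through NSX.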
Considerations for NSX

Load balancers cannot be placed in the same VPC segment as the VMware VKS cluster. They can be placed in a separate VPC segment on the same VPC gateway, as shown in the next diagram. In this arrangement, the BIG-IP can be configured either in 1NIC mode or as a regular deployment, in which case the MGMT interface is typically configured through an infrastructure VLAN instead of an NSX segment.

The data segment is only required to have enough prefixes to host the self-IPs of the BIG-IP units. The prefixes of the VIPs do not need to belong to the data segment's subnet; these additional prefixes have to be configured as static routes in the VPC gateway, and route redistribution for them must be enabled.

Given that the load balancers are not in line with the traffic flow towards the VKS cluster, it is required to use SNAT. When using SNAT pools, their prefixes can optionally be configured as additional prefixes of the data segment, like the VIPs.

Specifically for Calico, clusterIP mode cannot be used in NSX because it would require the BIG-IP to be in the same VPC segment as VMware VKS. Note also that BGP multi-hop is not feasible either, because it would require the POD cluster network prefixes to be redistributed through NSX, which is not possible.

Conclusion and final remarks

F5 BIG-IP provides unmatched deployment options and features for VMware VKS, including:

- Support for all VKS CNIs, which allows sending traffic directly to the PODs instead of using hostNetwork (which implies a security risk) or the common NodePort, which can incur an additional kube-proxy indirection.
- Both 1-tier and 2-tier arrangements (or both types simultaneously).
- The ability, through F5's Container Ingress Services, to handle multiple VMware VKS clusters with application-aware VIPs. This is a unique feature in the industry.
- Securing applications with the wide range of L3 to L7 security features provided by BIG-IP, including Advanced WAF and Application Access.
- To complete the circle, IP address management (IPAM), which gives DevOps teams great flexibility.

All of this is available regardless of the form factor of the BIG-IP: Virtual Edition, appliance, or chassis, allowing great scalability and multi-tenancy options. In NSX deployments, the recommended form factor is Virtual Edition in order to connect to the NSX segments.

We look forward to hearing your experience and feedback on this article.

High Availability for F5 NGINX Instance Manager in AWS
Introduction

F5 NGINX Instance Manager gives you a centralized way to manage NGINX Open Source and NGINX Plus instances across your environment. It's ideal for disconnected or air-gapped deployments, with no need for internet access or external cloud services. NGINX Instance Manager's feature set continues to grow; it now includes many configuration management features, such as NGINX config versioning and templating, F5 WAF for NGINX policy and signature management, monitoring of NGINX metrics and security events, and a rich API to support external automation. As the role of NGINX Instance Manager becomes increasingly important in the management of disconnected NGINX fleets, the need for high availability increases. This article explores how we can use Linux clustering to provide high availability for NGINX Instance Manager across two availability zones in AWS.

Core Technologies

Core technologies used in this HA architecture design include:

- Amazon Elastic Compute Cloud (EC2) instances: virtual machines rented inside AWS that can host applications like NGINX Instance Manager.
- Pacemaker: an open-source high-availability resource manager used in Linux clusters since 2004. Pacemaker is generally deployed with the Corosync Cluster Engine, which provides cluster node communication, membership tracking, and cluster quorum.
- Amazon Elastic File System (EFS): a serverless, fully managed, elastic Network File System (NFS) that allows servers to share file data simultaneously between systems.
- Amazon Network Load Balancer (NLB): a layer 4 TCP/UDP load balancer that forwards traffic to targets like EC2 instances, containers, or IP addresses. NLB can send periodic health checks to registered targets to ensure that traffic is only forwarded to healthy targets.

Architecture Overview

In this highly available architecture, we will install NGINX Instance Manager (NIM) on two EC2 instances in different AWS Availability Zones (AZs).
Four EFS file systems will be created to share key stateful information between the two NIM instances, and Pacemaker/Corosync will be used to orchestrate the cluster: only one NIM instance is active at any time, and Pacemaker facilitates this by starting/stopping the NIM systemd services. Finally, an Amazon NLB will provide network failover between the two NIM instances, using an HTTP health check to determine the active cluster node.

Deployment Steps

1. Create AWS EFS file systems

First, we are going to create four EFS volumes to hold important NIM configuration and state information that will be shared between nodes. These file systems will be mounted onto /etc/nms, /var/lib/clickhouse, /var/lib/nms, and /usr/share/nms inside the NIM node. Take note of the File System IDs of the newly created file systems. Edit the properties of each EFS file system and create a mount target in each AZ you intend to deploy a NIM node in, then restrict network access to only the NIM nodes by setting up an AWS Security Group. You may also consider more advanced authentication methods, but these aren't covered in this article.

2. Deploy two EC2 instances for NGINX Instance Manager

Deploy two EC2 instances with suitable specifications to support the number of data plane instances that you plan to manage (you can find the sizing specifications here) and connect one to each of the AZs/subnets you configured EFS mount targets in above. In this example, I will deploy two t2.medium instances running Ubuntu 24.04, connect one to us-east-1a and the other to us-east-1c, and create a security group allowing only traffic from each instance's locally assigned subnet.

3. Mount the EFS file systems on NGINX Instance Manager Node 1

Now that we have the EC2 instances deployed, we can log on to Node 1 and mount the EFS volumes onto this node by executing the following steps:

1. SSH onto Node 1.
2. Install the efs-utils package if it is not installed already.
3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory.
4. Execute mount -a to mount the file systems.
5. Execute df to ensure that the paths are mounted correctly.

4. Install NGINX Instance Manager on Node 1

With the EFS file systems now mounted, it's time to run through the NGINX Instance Manager installation on Node 1:

1. Navigate to the "Install the latest NGINX Instance Manager with a script" page in the NGINX documentation and download install-nim-bundle.sh.
2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/.
3. Run bash install-nim-bundle.sh -d ubuntu22.04.
4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of the NIM services on this node:

systemctl stop nms; systemctl disable nms
systemctl stop nginx; systemctl disable nginx
systemctl stop clickhouse-server; systemctl disable clickhouse-server

5. Install NGINX Instance Manager on Node 2

This time we are going to install NGINX Instance Manager on Node 2, but without attaching the EFS file systems. On Node 2:

1. Navigate to the "Install the latest NGINX Instance Manager with a script" page in the NGINX documentation and download install-nim-bundle.sh.
2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/.
3. Run bash install-nim-bundle.sh -d ubuntu22.04.
4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of the NIM services on this node:

systemctl stop nms; systemctl disable nms
systemctl stop nginx; systemctl disable nginx
systemctl stop clickhouse-server; systemctl disable clickhouse-server

6. Mount EFS file systems on NGINX Instance Manager Node 2

Now that we have the NGINX Instance Manager binaries installed on each node, let's mount the EFS file systems on Node 2:

1. SSH onto Node 2.
2. Install the efs-utils package if it is not installed already.
3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory.
4. Execute mount -a to mount the file systems.
5. Execute df to ensure that the paths are mounted correctly.

7. Install and configure Pacemaker/Corosync

With NGINX Instance Manager now installed on both nodes, it's time to get Pacemaker and Corosync installed:

1. Install Pacemaker, Corosync, and other important agents:

sudo apt update
sudo apt install pacemaker pcs corosync fence-agents-aws resource-agents-base

2. To allow Pacemaker to communicate between nodes, add TCP communication between the nodes to the Security Group for the NIM nodes.

3. Once the connectivity is in place, set a common password for the hacluster user on both nodes by running the following command on each node:

sudo passwd hacluster
password: IloveF5 (don't use this!)

4. Start the Pacemaker services by running the following commands on both nodes:

systemctl start pcsd.service
systemctl enable pcsd.service
systemctl status pcsd.service
systemctl start pacemaker
systemctl enable pacemaker

5. Finally, authenticate the nodes with each other (using the hacluster username, password, and node hostnames) and check the cluster status:

pcs host auth ip-172-17-1-89 ip-172-17-2-160
pcs cluster setup nimcluster --force ip-172-17-1-89
pcs status

8. Configure Cluster Fencing

Fencing is the ability to make a node unable to run resources, even when that node is unresponsive to cluster commands; you can think of fencing as cutting the power to the node. Fencing protects against corruption of data due to concurrent access to shared resources, commonly known as a "split brain" scenario. In this architecture, we use the fence_aws agent, which uses the boto3 library to connect to AWS and stop the EC2 instances of failing nodes. Let's install and configure the fence_aws agent:

1.
Create an AWS Access Key and Secret Access Key for fence_aws to use.

2. Install the AWS CLI on both NIM nodes.

3. Take note of the Instance IDs of the NIM instances.

4. Configure the fence_aws agent as a Pacemaker STONITH device. Run the pcs stonith command, inserting your access key, secret key, region, and mappings of Instance ID to Linux hostname:

pcs stonith create hacluster-stonith fence_aws access_key=(your access key) secret_key=(your secret key) region=us-east-1 pcmk_host_map="ip-172-31-34-95:i-0a46181368524dab6;ip-172-31-27-134:i-032d0b400b5689f68" power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4

5. Run pcs status and make sure that the STONITH device is started.

9. Configure Pacemaker resources, colocations and constraints

OK, we are almost there! It's time to configure the Pacemaker resources, colocations, and constraints. We want to make sure that the clickhouse-server, nms, and nginx systemd services all come up on the same node together, and in that order. We can do that using Pacemaker colocations and constraints.

1. Configure a Pacemaker resource for each systemd service:

pcs resource create clickhouse systemd:clickhouse-server
pcs resource create nms systemd:nms.service
pcs resource create nginx systemd:nginx.service

🔥HOT TIP🔥 Check out the pcs resource command options (op monitor interval, etc.) to optimize failover time.

2. Create two colocations to make sure they all start on the same node:

pcs constraint colocation add clickhouse with nms
pcs constraint colocation add nms with nginx

3. Create ordering constraints to define the startup order (ClickHouse -> NMS -> NGINX):

pcs constraint order start clickhouse then nms
pcs constraint order start nms then nginx

4. Enable and start the pcs cluster:

pcs cluster enable --all
pcs cluster start --all

10. Provision the AWS NLB Load Balancer

Finally, we are going to set up the AWS Network Load Balancer (NLB) to facilitate the failover.
1. Create a Security Group entry to allow HTTPS traffic to enter the EC2 instances from the local subnet.

2. Create a Load Balancer target group, targeting instances, with protocol TCP on port 443.

⚠️NOTE⚠️ If you are load balancing with the TCP protocol and terminating the TLS connection on the NIM node (EC2 instance), you must create a security group entry to allow TCP 443 from the connecting clients directly to the EC2 instance IP address. If you have trusted SSL/TLS server certificates, you may want to investigate a load balancer for the TLS protocol.

3. Ensure that an HTTPS health check is in place to facilitate the failover.

🔥HOT TIP🔥 You can speed up failure detection and failover using the Advanced health check settings.

4. Include our two NIM instances as pending targets and save the target group.

5. Now create the Network Load Balancer (NLB), listening on TCP port 443 and forwarding to the target group created above.

6. Once the load balancer is created, check the target group and you will find that one of the targets is healthy: that's the active node in the Pacemaker cluster!

7. With the load balancing now in place, you can access the NIM console using the FQDN of your load balancer and log in with the password set during the install on Node 1.

8. Once you have logged in, we need to install a license before proceeding any further: click Settings, then Licenses, click Get Started, click Browse, upload your license, and click Add.

9. With the license now installed, we have access to the full console.

11. Test failover

The easiest way to test failover is to simply shut down the active node in the cluster. Pacemaker will detect that the node is no longer available and start the services on the remaining node.

1. Stop the active node/instance of the NIM cluster.
2. Monitor the Target Group and watch it fail over; depending on the settings you have configured, this may take a few minutes.

12. How to upgrade NGINX Instance Manager on the cluster

To upgrade NGINX Instance Manager in a Pacemaker cluster, perform the following tasks:

1. Stop the Pacemaker cluster services on Node 2, forcing Node 1 to take over:

pcs cluster stop ip-172-17-2-160

2. Disconnect the NFS mounts on Node 2:

umount /usr/share/nms
umount /etc/nms
umount /var/lib/nms
umount /var/lib/clickhouse

3. Upgrade NGINX Instance Manager on Node 1. Download the update from the MyF5 Customer Portal, then:

sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx

4. Upgrade NGINX Instance Manager on Node 2 (with the NFS mounts disconnected). Download the update from the MyF5 Customer Portal, then:

sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx

5. Re-mount all the NFS mounts on Node 2:

mount -a

6. Start the Pacemaker cluster services on Node 2, adding it back into the cluster:

pcs cluster start ip-172-17-2-160

13. Reference Documents

Some good references on Pacemaker/Corosync clustering can be found here:

- Configuring a Red Hat High Availability cluster on AWS
- Implement a High-Availability Cluster with Pacemaker and Corosync
- ClusterLabs Pacemaker website
- Corosync Cluster Engine website

Explore TCP and TLS Profiles for Optimal S3 with MinIO Clusters
A lab-based investigation was conducted to observe measurable performance differences when using different profiles, including TCP and TLS settings, in both local and simulated wide-area implementations. Testing at high traffic scale was not carried out; that may be something to explore in the future. Rather, simply observing the nuances of TCP and TLS in support of modern S3 flows, such as AI data delivery for model training exercises, led to interesting strategic findings. Top of mind throughout the exercise were the newly available configurations of both TCP and TLS BIG-IP profiles, with the express interest of improving the speed and resilience of S3 data flows. Suggested tweaks to the rich set of TCP and TLS parameters will be touched upon.

Lab Setup

A fully software-based configuration was created using virtualization and a modern hypervisor, where S3 clients were routed to tunable BIG-IP virtual servers, which proxied S3 traffic to a virtualized AIStor single-node, single-drive object solution. The routing of client traffic was chosen so as to simulate first a high-speed local area network (LAN) experience, followed by a wide area network (WAN) examination. The router offered various network impairments: simulated cross-North America latency, traffic shaping to a reasonable but constrained maximum bandwidth, and the introduction of packet loss throughout the S3 activities. The investigation centered on changes in performance for a representative S3 transaction, such as a 10-megabyte object retrieval by a client, as different BIG-IP profiles were exercised.

LAN Baseline for S3 Traffic and BIG-IP

To establish what our lab setup is capable of delivering, some standard S3 transactions were carried out. This started without a simulated WAN. S3 was put through its paces by both downloading and uploading objects, using both Microsoft Edge and S3Browser clients.
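This kind of baseline can also be reproduced programmatically rather than through browser developer tools. The Python sketch below times a 10 MB HTTP download and converts it to a transfer rate; a local in-process HTTP server stands in for the S3 endpoint here, so the code only illustrates the measurement method, not the lab's actual numbers:

```python
import http.server
import threading
import time
import urllib.request

PAYLOAD = bytes(10 * 1024 * 1024)  # 10 MB object, standing in for the S3 test object

class ObjectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Serve the fixed payload for any path, like a public S3 bucket object.
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep output quiet

# Stand-in endpoint on an ephemeral loopback port.
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), ObjectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/bucket/object"

# Time the full object retrieval and derive megabits per second.
start = time.perf_counter()
body = urllib.request.urlopen(url).read()
elapsed = time.perf_counter() - start
mbps = len(body) * 8 / elapsed / 1e6
server.shutdown()
```

Against a real deployment, the URL would point at the BIG-IP virtual server fronting the S3 port, and repeating the measurement would expose the same warm-up effects (TCP handshake, TLS negotiation) noted below.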
The LAN setup, instantiated on an ESXi host, is designed to be highly responsive. As seen by quick measurements of ICMP latency between a client and a BIG-IP VE virtual server, the round-trip time is sub-1 millisecond. The first ping slightly exceeds 1 millisecond due to a client-side ARP request/response between client and router; subsequent responses settle to approximately 400 microseconds. The measurements were taken with Wireshark on the Windows S3 client. The command-prompt results do not have the fidelity of Wireshark and simply report sub-1-millisecond latencies for the bulk of response times. With capture performance protected by limiting Wireshark to the first 128 bytes of each packet, the highlighted Wireshark delta times confirm the low round-trip time (RTT) latency more accurately.

To support local on-premises S3 clients, the virtual server at 10.150.92.202 was configured to use one of the LAN-oriented TCP profiles available with all fresh BIG-IP installs, "tcp-lan-optimized". As pointed out in the diagram, the S3 traffic also benefits from SSL/TLS security between the client and the BIG-IP.

The "tcp-lan-optimized" profile exhibits settings beneficial for low-latency TCP sessions that only traverse LANs, where intuitively low packet loss and high throughput are both very likely. Its characteristics include aggressive ("high speed") congestion control, smaller initial congestion windows (suited to lower RTT), and disabling of Nagle's algorithm so that data is sent immediately, even if it does not fill out a TCP maximum segment size (MSS). A breadth of other LAN profiles are also available with their own tweaks, such as "f5-tcp-lan"; "tcp-lan-optimized" was simply chosen as a starting point.
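The last of those characteristics, disabling Nagle's algorithm, is something any TCP client can also do for itself. A minimal sketch using only the Python standard library (the socket here is illustrative and never connected; no BIG-IP involvement):

```python
import socket

# Nagle's algorithm coalesces small writes into full-size segments,
# trading latency for fewer packets. The tcp-lan-optimized profile
# turns it off on the BIG-IP side; a client can match that behavior
# with the TCP_NODELAY socket option.
def disable_nagle(sock: socket.socket) -> bool:
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back to confirm Nagle is now off
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
nagle_off = disable_nagle(client)   # True once the option is set
client.close()
```

Verifying the option with a packet capture, as done throughout this exercise, remains the authoritative check.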
With MinIO AIStor configured with buckets allowing public access, meaning no S3 access key is necessary, simple browsing of buckets with something like Microsoft Edge and its Developer Tools can give an estimate of S3 retrieval times for a sample 10-megabyte object.

Determining Throughput for LAN-Delivered S3 Objects

We noted that Edge Developer Tools, with caching disabled, suggested our 10-megabyte object required between 155 and 259 milliseconds to download. Some variance in time can be expected, as an early test might require a full TCP three-way handshake on port 9000 (the MinIO S3 port used) and a full TLS negotiation, while later downloads can benefit from features like TLS session resumption.

To drill deeper, and to estimate the actual traffic rate achieved in the LAN-only setup in bits per second, one can again turn to Wireshark. It is important to restrict capture to only the bytes necessary to decode TCP: it is the study of progressive TCP sequence numbers, and of issues like TCP retransmits, that leads to an estimate of the S3 transfer rate, and trying to capture full payloads on a consumer Windows virtual machine will leave packets missing from the capture. It is also good practice to filter out extraneous traffic, such as Remote Desktop Protocol (RDP), which typically operates on TCP and UDP port 3389.

Three successive downloads from the BIG-IP, serving the MinIO AIStor solution in the backend, appear as follows when using the TCP Stream Graphs, specifically the Sequence Number (Stevens) plot, where one clearly sees the rapid rise in sequence numbers as the three downloads complete at high speed.

Interestingly, zooming in on any one of the downloads, one notes there is room for improvement, which is to say we could have driven S3 faster with client adjustments. The Windows client periodically sends TCP "zero window" advertisements to BIG-IP, essentially halting S3 delivery for some number of milliseconds while buffers on the client are serviced.
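A rough sense of what such stalls cost can be had with simple arithmetic. The sketch below uses illustrative numbers (a 10-megabyte object with assumed transfer and stall durations), not precise lab measurements:

```python
def effective_mbps(object_bytes: int, transfer_s: float, stall_s: float = 0.0) -> float:
    """Throughput in megabits per second, optionally inflating the
    elapsed time with receive-window (zero window) stall time."""
    return object_bytes * 8 / (transfer_s + stall_s) / 1e6

# A 10 MB object moved in 0.9 s, versus the same transfer that also
# suffered five assumed 15.5 ms zero-window pauses on the client.
clean = effective_mbps(10_000_000, 0.9)
stalled = effective_mbps(10_000_000, 0.9, stall_s=5 * 0.0155)
```

Even a handful of short pauses shaves several megabits per second off the measured rate, which is why a more performant client would report faster downloads from the identical BIG-IP configuration.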
A quick filter on the client's address and zero-window events can show the activity on the wire during these moments. We see that the Windows S3 client periodically shuts down the delivery of TCP segments by reducing its TCP receive (Rx) window size to zero, after which it can be seen to take 15.5 milliseconds to re-open the window. This is not critical, but it means our measured bit rate for a simple set of basic transactions, even in a virtualized environment, could easily be increased with more performant clients in use. The objective here is not benchmarking but rather a comparison of TCP profiles.

As seen above, the typical 10-megabyte S3 object, even with a client periodically throttling delivery, was still retrieved in the 85 Mbps range. This was with the tcp-lan-optimized profile in use.

The last baseline measurement was to push S3 data (technically an HTTP PUT) from the client to the MinIO object storage, via the BIG-IP virtual server. The value here was to measure the TCP response times observed by the client as it moved TCP segments to BIG-IP. To this end, S3Browser, available here, was used, as it fully supports S3 user access keys and their corresponding secrets, and gives the user a graphical "File Explorer" experience for manipulating S3 objects. Uploading a 20-kilobyte object from our local client, we see the following response times in Wireshark.

Note that response times with something like ICMP Echo Request/Reply are simple: each transaction provides an estimate of round-trip time. With TCP and data in flight, most TCP stacks use a delayed ACK approach, waiting for two full segments (often meaning two times 1,460 bytes), or a short timeout in case no additional data arrives, before the data is acknowledged.
With large bursts of traffic, such as the upload of a sizable object with S3, the response time is unlikely to deviate much, as delayed ACKs add negligible time. We see in the chart that the data moved from client to BIG-IP, over the LAN, with typical TCP acknowledgment times in the 400 to 800 microsecond range.

Baseline S3 Performance Across WAN Using Different BIG-IP TCP Profiles

The router used in the lab, OPNsense, can emulate network impairments, including those consistent with the realities of traffic traversing long distances. Objectives for testing include:

- Base round-trip delays of 70 milliseconds, in line with best-case optical delays for round trips between New York City and Los Angeles.
- Shaping of traffic to emulate an achievable bandwidth of 10 Mbps, simulating normal limiting factors such as IP hops, queuing delays, and head-of-line blocking on oversubscribed intermediate network gear. The expectation is that such a bandwidth cap will add variance to the overall round-trip delays of packets, and occasional drops, forcing TCP into phases such as slow start and congestion avoidance.
- Experimenting with packet loss, as anything below 1 percent is generally considered unremarkable. A value of 0.5 percent packet loss will be set for both directions.

As with LAN TCP profiles, BIG-IP has a number of WAN options. The one selected to contrast with the existing LAN profile was "tcp-wan-optimized". The objectives of this profile include maximizing throughput and recovering efficiently from packet loss over less performant WAN links, where a higher-latency, lower-bandwidth end-to-end network experience is expected. Note that on the inside of BIG-IP, where MinIO AIStor continues to be co-located at LAN speeds, the server side will continue to use the tcp-lan-optimized profile.
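Before looking at results, it helps to quantify what this emulated WAN implies for TCP windowing. A sketch of the bandwidth-delay product for the 10 Mbps / 70 ms path (plain arithmetic, no product-specific assumptions):

```python
import math

def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight,
    unacknowledged, to keep the pipe full."""
    return int(bandwidth_bps * rtt_s / 8)

def required_window_scale(bdp: int, base_window: int = 65535) -> int:
    """Smallest TCP window-scale shift whose scaled window covers the BDP."""
    if bdp <= base_window:
        return 0
    return math.ceil(math.log2(bdp / base_window))

bdp = bdp_bytes(10_000_000, 0.070)   # the lab's 10 Mbps, 70 ms WAN
shift = required_window_scale(bdp)   # exceeds 65,535 bytes, so scaling helps
```

At 87,500 bytes, the pipe already exceeds the classic 65,535-byte receive window, so even this modest WAN benefits from window scaling (a shift of 1 doubles the advertised window).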
A rule of thumb is that packet loss in a WAN environment, even as little as 0.5 percent, will impede network quality of service more than latency will. As such, we started with no packet loss: simply 70 milliseconds of round-trip latency and policing of bandwidth to a 10 Mbps maximum. Continuing to use the tcp-lan-optimized profile on BIG-IP still saw decent results. As expected, the throughput now falls under 10 Mbps, with the measured value, over three downloads, hovering just under 7 Mbps. Edge's developer tools show that, even with a profile optimized for LAN traffic, total download times are fairly consistent, averaging just under 13 seconds.

The BIG-IP virtual server was then updated to use the tcp-wan-optimized profile, just for traffic on the external, client side. The TCP profile for the internal side, with its co-located MinIO server, was left with a LAN profile. Using S3 from the client to retrieve the very same 10-megabyte objects, the results were, to a degree, better. The object download times were in the same order of magnitude as with the LAN profile; however, drilling into the actual bit rates achieved, one can see a marginal increase in overall S3 performance.

The next step was to introduce another real-world component: packet loss, applied equally in both directions. Since both directions were subjected to loss, beyond the need to retransmit TCP segments in the direction of the client, TCP acknowledgments from the client to the BIG-IP can also be dropped. The resulting behavior exercises the TCP congestion control mechanisms. With even low packet loss in our simulated WAN environment, the outcome with the tcp-wan-optimized profile on BIG-IP was markedly better. As seen in the tables above, this is far from a scientific, in-depth analysis, as three 10-megabyte S3 retrievals is not a rigorous baseline.
However, simply using these numbers to guide us, we come to these findings:

- Average S3 download with WAN profile: 33.7 seconds
- Average S3 download with LAN profile: 40.2 seconds
- Percentage reduction in S3 transaction time using the WAN profile: 16 percent

Comparing one sample S3 transaction using each of the two profiles, visually, we see modest differences that help explain the increased quality of service of the WAN profile. For the purpose of a quick investigation, the WAN profile has been seen to offer benefits in a lossy, higher-latency environment such as the lab emulation. Two specific TCP features worth calling out, and worth enabling in such environments, are:

Selective ACKs (SACK option)

Without SACK, the normal way for a client to signal back to a sender that a TCP segment appears to have been lost in flight is to send simple duplicate ACKs. This means that for every newly received segment after the missing data, an ACK is sent acknowledging only the last contiguously received data; there is no positive indication of the subsequent data received. Simply receiving duplicate ACKs lets the sender infer there is a gap in the delivered data stream. With SACK, there is no inferring: should both ends (the client and BIG-IP in our case) support this TCP option, agreed upon during the three-way TCP handshake, the edges of the discontinuity in data are sent explicitly by the client. Most TCP profiles have SACK enabled, but it is worth confirming it has not been disabled and is active with your key S3 clients, as seen in the following screenshot.

TCP Receive Window Exponential Scaling

Original implementations of TCP only allowed approximately 65,535 bytes to be in flight without the sender having received acknowledgment of reception. This number can be a bottleneck when highly performant devices exchange TCP over higher-latency WAN pipes: networks with so-called high bandwidth-delay products (BDP).
The workaround is for each end to advertise an exponent, for instance 2, in which case the peer device understands that advertised receive windows are to be interpreted as four times (2^2) the encoded value. An exponent of 4 indicates a 16-times multiplier (2^4), and so on. To enable this on a BIG-IP stack, per this KB article, we simply adjust the buffer size to the desired value in the profile setup. Without this adjustment, no window scaling will be used.

Introducing the New S3 TCP Profile for BIG-IP

The rise in S3 traffic has implications specific to networking and traffic in flight. Some of the relevant characteristics include:

- There can be vast differences in network load, transaction by transaction. Consider an IoT device generating a 40-kilobyte sensor reading, followed immediately by a 500-megabyte high-definition medical ultrasound. Both are valid, representative payloads in S3 delivery today.
- S3, being transported over an HTTPS conduit, employs extensive parallelism for many types of larger transactions, for instance multi-part uploads or HTTP range requests for large object downloads. Essentially, a large transaction such as a 60-megabyte upload becomes, as an example, 12 smaller 5-megabyte writes. This parallelism is particularly advantageous with clustered HDD nodes: spinning media is still estimated to provide 60 percent of all storage, yet HDD technology frequently peaks at only 100 to 150 input/output operations per second (IOPS). As such, there is value in using many smaller, parallel transactions.
- Solutions supporting S3 storage are focused upon strong read-after-write consistency, meaning the correct version of an object must be served immediately after writing, frequently from any number of storage sites. As such, the immediate replication of asynchronously connected sites through the S3 protocol is something that must happen very quickly for an effective solution.
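The multi-part arithmetic above is straightforward to sketch. The 5-megabyte part size mirrors the example in the text; real clients choose their own sizes, and this sketch models only the split, not the upload itself:

```python
def plan_multipart(object_bytes: int, part_bytes: int = 5_000_000) -> list[int]:
    """Split an object into S3 multipart-style part sizes;
    the final part carries any remainder."""
    parts = [part_bytes] * (object_bytes // part_bytes)
    if object_bytes % part_bytes:
        parts.append(object_bytes % part_bytes)
    return parts

# The 60-megabyte upload from the text becomes twelve 5-megabyte
# writes that can be issued in parallel across cluster nodes.
parts = plan_multipart(60_000_000)
```

Each part becomes an independent HTTPS request, which is exactly what lets slower spinning media serve many spindles at once.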
With version 21 and later of TMOS, the BIG-IP LTM module provides a starting point for an S3 TCP profile: a profile with logical settings aligned with S3 network delivery. The s3-tcp profile is based upon the parent "tcp" profile of BIG-IP, with a number of tweaks now to be described.

With the s3-tcp profile, management of the receive window and send buffer is turned over to the system, which monitors network behavior to auto-adjust. One striking note is that maximum values, for items like the largest possible receive window to advertise, are much bigger, moving from 65,535 bytes into the millions of bytes. Similarly, send buffers are much larger, although the client side will ultimately control throttling of the amount of sent traffic in flight.

The other major difference is that two different congestion control algorithms are in use: CUBIC for the s3-tcp profile and high-speed for the standard tcp profile. Congestion control is the art of operating a TCP connection with the highest possible congestion window (CWND) over time, while minimizing segment loss due to saturated networking, or peer-perceived loss due to fluctuations in delivery latency. TCP is designed to back off how many segments may be in flight, meaning the CWND, when loss is detected. The two algorithms that dictate the ensuing behavior are slow start and congestion avoidance.

TCP slow start aggressively opens the congestion window when starting a TCP connection, or mid-connection when recovering from a segment loss. Don't be fooled by the term slow; it's actually aggressive and fast off the mark. The "slow" is historical convention, so named because it is slow only in comparison to the original, 1980s TCP behavior of immediately transmitting a full window of data at wire speed.
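Idealized slow-start growth, doubling the window each round trip, can be sketched in a few lines (a simplification: real stacks grow per ACK and differ in detail):

```python
def slow_start_rtts(initial_cwnd: int, target_cwnd: int) -> int:
    """Round trips of idealized slow start: the congestion window
    doubles every RTT until it reaches the target."""
    cwnd, rtts = initial_cwnd, 0
    while cwnd < target_cwnd:
        cwnd *= 2
        rtts += 1
    return rtts

# From a single 1,460-byte segment to enough window to fill a
# 10 Mbps, 70 ms pipe (roughly 87,500 bytes of data in flight).
rtts = slow_start_rtts(1460, 87_500)
```

Six round trips is all it takes, which is why "slow" start is anything but slow.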
Meanwhile, congestion avoidance will normally try to slowly, continually, and usually linearly re-open the congestion window, once arriving at a fraction, perhaps half, of the last fleetingly reached CWND before loss was detected. From there the congestion avoidance algorithm inches upwards at a controlled rate, trying to achieve the optimal end-to-end bandwidth before segment loss sets in once again. Think of it as a constant fine-tuning exercise, maximizing throughput while staying wary of the network's saturation point.

CUBIC stems from Linux implementations where high bandwidth, coupled with substantive network latency, is prevalent. CUBIC, as the name suggests, uses a cubic function to calculate congestion window growth based on the time elapsed since the last packet loss.

A bonus of BIG-IP is that adjustments to TCP profiles are quite easy to achieve. Simply create a new profile, often based upon an existing "parent" profile, and make "before" and "after" observations. A good candidate for experimentation is the Westwood+ profile. Westwood+ congestion control is a sender-side-only implementation that studies the return of TCP ACKs to infer optimal transmission rates. A more basic TCP approach is to halve the congestion window, a full 50 percent reduction of unacknowledged bytes in flight, when three duplicate ACKs are received, the presumption being that three duplicates are supporting evidence that a TCP segment has been lost over the network. Westwood+ takes a more advanced approach to studying the ACKs to arrive at a less coarse value for the new congestion window.

New S3 TLS Profile for BIG-IP

Similar to the new s3-tcp profile, there is now an S3 TLS profile. The normal practice is to make a copy of this profile, to allow the setting of specific certificates, keys, and certificate chains for the application in question. One of the most important aspects of the s3 profile is that it supports TLS 1.3 by default.
There are a number of aspects of TLS 1.3 that make it a good option. For one, it removes some of the antiquated ciphering and hashing algorithms of older TLS revisions. For another, TLS 1.3 is a mandatory baseline for supporting post-quantum cryptography (PQC) shared key establishment between peers. BIG-IP, using TLS 1.3 as a building block, supports ML-KEM key encapsulation per the NIST FIPS 203 standard found here.

An immediate S3 win with TLS 1.3 is the ability for entirely new TLS sessions, not resumptions of previously negotiated sessions, to start delivering application data after one round-trip time (RTT). This is due to a simplification in the protocol exchange between peers in TLS 1.3. Among other things, in non-mutual TLS sessions, only the server is required to provide a certificate, which is already encrypted when delivered. In other words, no later TLS 1.2-style validation step is required; all tasks required of the server, including confirming the client can successfully decipher the provided certificate, are achieved in one logical step.

Take the following example of an S3 object retrieval through an encrypted HTTP GET. As seen in the diagram, there are two round-trip times, and with the latency of the wide-area network emulator used earlier, 300 milliseconds elapsed from the TLS Client Hello to the first transmission of encrypted data carrying an S3 transaction.

Contrast this with the same lab setup, now using a copy of the BIG-IP S3 TLS profile, which advertises TLS 1.3 out of the box, something other TLS profiles typically required a custom adjustment to achieve in prior BIG-IP releases. As with this overall exploration, the results are suggestive of a notable performance increase, but much higher-load testing should be considered for more decisive evidence.
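The idealized round-trip arithmetic behind that 1-RTT benefit can be sketched as follows (pure RTT counting; real handshakes add processing time and network variance, so observed savings will differ):

```python
def handshake_ms(rtt_ms: float, handshake_round_trips: int) -> float:
    """Minimum time from ClientHello until the client can send
    application data, counting only handshake round trips."""
    return rtt_ms * handshake_round_trips

tls12 = handshake_ms(70, 2)   # full TLS 1.2-style exchange: two round trips
tls13 = handshake_ms(70, 1)   # TLS 1.3 full handshake: one round trip
saved = tls12 - tls13         # one whole WAN round trip saved per new session
```

On a 70 ms path, every brand-new session theoretically gets one full round trip back before any S3 bytes move.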
What one can say is that in this case of single encrypted S3 transactions, a 50 ms, or 17 percent, savings in latency was observed with respect to getting a transaction on the wire. Extrapolated over thousands of TLS sessions, the savings, let alone the security-posture improvement of TLS 1.3, make it appear a logical configuration choice.

Summary

The amount of attention now being paid to S3 data delivery, including delivery for AI initiatives like model training, is heightened across the industry. With network considerations top of mind, like the effect of both WAN latency and non-zero packet loss ratios, a simple lab-oriented exploration of S3 performance between clients and MinIO AIStor was carried out. New configurations of both TCP and TLS profiles, with the express interest of improving the speed and resilience of S3 data flows, were investigated.

The impact of using existing LAN and WAN TCP profiles on BIG-IP was measured in small-scale, sampled lab tests. The outcomes suggested that although a LAN profile performed well with significant latency applied, the WAN-oriented profile was demonstrably better in environments with just 0.5 percent packet loss. TLS 1.3 configurations for S3 traffic were also tested, with a noticeably quicker transition to the data plane being set up when the 1-RTT handshake of TLS 1.3 was in use. Extrapolated to enterprise loads, the win should be significant, beyond just preparing the foundations for PQC-infused TLS in the future.

The recommendation from this exercise is to investigate, using tools with network visibility, which options are already actively in use by both TCP and TLS in support of your S3 applications today. Awareness of TCP window scaling, selective acknowledgments, and the use of TLS 1.3 wherever possible are all likely to add up to the most robust S3 performance possible.

F5 NGINX Gateway Fabric - the One Gateway for AI-Powered Applications
NGINX Gateway Fabric (NGF) simplifies the deployment of LLM-powered applications on Kubernetes by acting as a single, unified gateway across every layer of the stack - reverse proxy, API gateway, and LLM inference gateway. This article walks through a reference architecture demonstrating how NGF handles frontend traffic, protects backend APIs, and delivers model-aware inference routing using the Kubernetes Gateway API and the Gateway API Inference Extension.

Accelerate Application Deployment on Google Cloud with F5 NGINXaaS
Introduction

In the push for cloud-native agility, infrastructure teams often face a crossroads: settle for basic, "good enough" load balancing, or take on the heavy lifting of manually managing complex, high-performance proxies. For those building on Google Cloud (GCP), this compromise is no longer necessary.

F5 NGINXaaS for Google Cloud represents a shift in how we approach application delivery. It isn't just NGINX running in the cloud; it is a co-engineered, fully managed, on-demand service that lives natively within the GCP ecosystem. This integration allows you to combine the advanced traffic control and programmability NGINX is known for with the effortless scaling and consumption model of a SaaS offering, in a platform-first way. By offloading the "toil" of lifecycle management, like patching, tuning, and infrastructure provisioning, to F5, teams can redirect their energy toward modernizing application logic and accelerating release cycles.

In this article, we'll dive into how this synergy between F5 and Google Cloud simplifies your architecture, from securing traffic with integrated secret management to gaining deep operational insights through native monitoring tools.

Getting Started with NGINXaaS for Google Cloud

The transition to a managed service begins with a seamless onboarding experience through the Google Cloud Marketplace. By leveraging this integrated path, teams can bypass the manual "toil" of traditional infrastructure setup, such as patching and individual instance maintenance. The deployment process involves:

- Marketplace Subscription: Directly subscribe to the service to ensure unified billing and support.
- Network Connectivity: Set up the essential VPC and Network Attachments that allow NGINXaaS to communicate securely with your backend resources.
- Provisioning: Launch a dedicated deployment that provides enterprise-grade reliability while maintaining a cloud-native feel.
Secure and Manage SSL/TLS in F5 NGINXaaS for Google Cloud

Security is a foundational pillar of this co-engineered service, particularly regarding traffic encryption. NGINXaaS simplifies the lifecycle of SSL/TLS certificates by providing a centralized way to manage credentials. Key security features include:

- Integrated Secrets Management: Working natively with Google Cloud services to handle sensitive data like private keys and certificates securely.
- Proxy Configuration: Demonstrating how to set up a Google Cloud proxy Network Load Balancer to handle incoming client traffic.
- Credential Deployment: Uploading and managing certificates directly within the NGINX console to ensure all application endpoints are protected by robust encryption.

Enhancing Visibility in Google Cloud with F5 NGINXaaS

Visibility is no longer an afterthought but a native component of the deployment, providing high-fidelity telemetry without separate agents.

- Native Telemetry Export: By linking your Google Cloud Project ID and configuring Workload Identity Federation (WIF), metrics and logs are pushed directly to Google Cloud Monitoring.
- Real-Time Dashboards: The observability demo walks through using the Metrics Explorer to visualize critical performance data, such as active HTTP connection counts and response rates.
- Actionable Logging: Integrated Log Analytics allows you to use the Logs Explorer to isolate events and troubleshoot application issues within a single toolset, streamlining your operational workflow.

Whether you are just beginning your transition to the cloud or fine-tuning a sophisticated microservices architecture, F5 NGINXaaS provides the advanced availability, scalability, security, and visibility capabilities necessary for success in the Google Cloud environment.
Conclusion

The integration of F5 NGINXaaS for Google Cloud represents a significant advantage for organizations looking to modernize their application delivery without the traditional overhead of infrastructure management. By shifting to this co-engineered, managed service, teams can bring together advanced NGINX performance and the native agility of the Google Cloud ecosystem. Through the demonstrations provided in this article, we've highlighted how you can:

- Accelerate Onboarding: Move from Marketplace subscription to a live deployment in minutes using Network Attachments.
- Fortify Security: Centralize SSL/TLS management within the NGINX console while leveraging Google Cloud's robust networking layer.
- Maximize Operational Intelligence: Harness deep, real-time observability by piping telemetry directly into Google Cloud Monitoring and Logging.

Resources

- Accelerating app transformation with F5 NGINXaaS for Google Cloud
- F5 NGINXaaS for Google Cloud: Delivering resilient, scalable applications

AI Inference for VLLM models with F5 BIG-IP & Red Hat OpenShift
This article shows how to perform intelligent load balancing for AI workloads using the new features of BIG-IP v21 and Red Hat OpenShift. Intelligent load balancing is done based on business-logic rules, without iRule programming, and on state metrics of the vLLM inference servers gathered from OpenShift's Prometheus.

App Migration and Portability with Equinix Fabric and F5 Distributed Cloud CE
Enterprises face growing pressure to modernize legacy applications, adopt hybrid multi-cloud strategies, and meet rising compliance and performance demands. Migration and portability are now essential to enable agility, optimize costs, and accelerate innovation. Organizations need a secure, high-performance way to move and connect applications across environments without re-architecting. This solution brings together Equinix and F5 to deliver a unified, cloud-adjacent application delivery and security platform.

Crafting a Cloud-Based Antifragile Cyber Resiliency Strategy
Application availability is a continually evolving concept. With the proliferation of hybrid multicloud applications and environments, fronted by third-party services like AWS and Azure, what used to be a back-end challenge now encompasses even the delivery path. Even if an application is healthy, a failing CDN, edge location, or cloud control plane can make it unreachable, bringing business to a halt.

Teams that support applications need to plan for this. They need their networks to reduce the blast radius of external failures. They need them to adapt as the failure unfolds. They need them to incorporate the lessons of disruption into future behavior. And they need them to be a systemic and strategic advantage rather than an operational obligation.

Why this matters now

Late-2025 outages exposed how single-vendor or single-region bets amplify risk. A Cloudflare edge configuration defect propagated globally, causing widespread 5xx responses despite customers' origins being sound. Weeks earlier, a control-plane issue impacting AWS's US-East-1 DNS stranded workloads, even impairing provider-native failover actions. The organizations that fared best ran multi-path delivery with independent steering at the DNS layer, allowing them to circumvent the gaps these outages created in their delivery networks.

F5's ADSP portfolio has the tools to help create these resiliency paths: Distributed Cloud (XC) DNS and DNS Load Balancer (DNSLB) enable independent global traffic steering; XC Customer Edge (CE) enables private, controlled failover paths; XC Synthetic Monitoring uses continuous probes from regional edge locations to detect performance degradation or outages; and XC WAAP (Web Application & API Protection) enables consistent security during failover and graceful degradation. Together, they give teams the tools to get ahead of outages and keep applications online, rather than simply react after the fact.
The reference architecture at a glance

The goal of any modern, antifragile resiliency strategy is to keep full control over how traffic is routed, in all conditions and scenarios. This way, the network becomes an intelligent part of the application delivery and availability strategy. Using built-in traffic routing policies, it automatically switches to available paths when a problem arises, keeping necessary security protections fully intact and applications online.

- Global steering (Authoritative DNS + GSLB): For many organizations, DNS remains anchored to a single provider, creating an unnecessary dependency and a larger operational blast radius when that provider experiences issues. Delegating critical zones to XC DNS distributes authoritative DNS duties across F5's global anycast footprint, reducing reliance on any one hyperscaler or DNS control plane. Running alongside this, the DNS Load Balancer applies Global Server Load Balancing policies that evaluate both origin and edge health, returning DNS responses that reflect real service conditions rather than static routing assumptions, even when a provider's own control plane is impaired. XC runs on F5's independent backbone and can steer traffic to any public DNS name or IP (including a primary CDN CNAME), making it ideal for multi-vendor delivery.

- Delivery planes (Primary + Alternate): Use your primary CDN/edge as usual, but maintain an alternate ingress on F5 XC WAAP. GSLB returns the primary CDN CNAME under normal conditions and automatically returns the XC VIP/CNAME when health checks degrade. This duality breaks monoculture risk at the edge.

- Secure origin connectivity (Customer Edge): Place F5 Customer Edge nodes in your VPCs/data centers. When traffic shifts to XC, it traverses F5's encrypted backbone to the CE, keeping your origins private and avoiding an emergency "open to the internet" posture during failover.
- Consistent protection (WAAP + Bot Defense): Enforce WAAP policies and (optionally) Bot Defense connectors, so the same security logic applies regardless of which delivery path is active. This closes the "security arbitrage" gap attackers exploit when a secondary path is weaker.

- Observability & learning (synthetics + SIEM): Run synthetic monitors that test real HTTP journeys, not just TCP liveness; this is one of the only reliable ways to spot "grey failures" like mass 503s. The resultant logs can be streamed to your SIEM for further analysis. Use Game Days to rehearse provider cutovers and capture evidence.

How the pieces fit - step by step

1) Make XC your steering layer (Authoritative DNS + GSLB)

Delegate your service zone(s) to XC DNS. Keep a short TTL (e.g., 30-60s) on critical records to accelerate adaptation. In DNS Load Balancer, create two Origin Pools:

- Primary: your CDN CNAME (e.g., www.example.com.cdn.cloudflare.net).
- Secondary: the XC HTTP LB VIP or CNAME (vip.example.xc.f5.com).

Attach health checks and steering policies (priority-based, latency-based, or SLO-aware) so the LB returns the primary CNAME when healthy and the XC endpoint otherwise.

Why DNS/GSLB first? When a provider's control plane stalls (e.g., Route 53 updates freeze), you still retain an external control plane to redirect users, without waiting on the impacted provider to recover.

2) Detect real problems with synthetic monitoring

Create Synthetic Monitors in XC that fetch a known-good URL (/healthz) through the primary CDN path and validate status code, content, and latency. Don't rely on ping/TCP, as grey failures may look healthy to L4 checks. By using multiple vantage points, you can tune thresholds to trigger "pool down" events quickly but safely.

Failover logic example (conceptual):

3) Keep origins private with CE

Deploy CE nodes into each origin environment.
XC Regional Edge terminates client traffic, applies WAAP controls, then forwards over the F5 backbone to the CE, which in turn reaches your service on private subnets. During a CDN failure, you don't have to expose an emergency public listener or widen security groups.

4) Avoid "security arbitrage" during failover

Use a policy-as-code approach to keep WAAP intent portable and synchronized between providers (e.g., a positive security model for APIs, shared bot policies). Where applicable, the F5 Bot Defense Connector can evaluate traffic headed through Cloudflare, so your bot verdicts are identical across both paths.

5) Close the loop: observability, drills, and policy evolution

To maintain continuous situational awareness, stream DNS/GSLB state changes, WAAP events, and synthetic monitoring results into your SIEM (such as Splunk or Datadog) to build a global traffic-health view rather than a narrow "Provider-X-only" dashboard. Reinforce this operational picture with quarterly Game Days that deliberately disable the primary pool in XC to validate time-to-detect and time-to-shift, and to confirm that the secondary path meets expected performance and SLO targets.

What happens in real outages?

Scenario A: A CDN logic push breaks the edge. Your synthetics see 503s from multiple vantage points through the CDN CNAME. The primary pool flips to Down, and XC DNS starts returning the XC VIP. As client caches expire (short TTLs help), users seamlessly land on the XC path (WAAP intact), traverse the F5 backbone, and reach your private origins through the CE. This bypasses the global bad push entirely.

Scenario B: Cloud control-plane failure (e.g., US-East-1). Because XC runs on an independent control plane, its monitors and DNS orchestration continue operating. GSLB marks the impacted origin/region down and returns a DR pool (another region or on-prem behind CE), even if provider-native tools are unresponsive.

Why F5 for the DNS layer?
A hybrid DNS/GSLB toolset like the one found in F5's Distributed Cloud is built for multicloud:

- F5 XC DNS Load Balancer steers to any origin, IPs or CNAMEs, so you can keep your preferred CDN while retaining an independent steering layer.
- This toolset is further complemented by Distributed Cloud Synthetic Monitoring. Designed for grey-failure detection, it simulates user traffic against endpoints to quantify health and performance in real time, catching issues that liveness checks miss.
- All of these functions run on F5's independent backbone and dedicated Customer Edge nodes. These private, encrypted paths to origins reduce exposure and keep performance stable when the public internet is noisy.

Tying back to the five antifragile practices

- Blast Radius Control: Isolate critical names and flows in distinct DNS/GSLB policies so one provider's trouble doesn't cascade.
- Dependency Diversification: Maintain two delivery planes (primary CDN + XC WAAP) with independent health and failover.
- Policy-Driven Adaptation: Encode SLOs and health criteria in XC GSLB so failover is autonomous and fast.
- Incremental Adaptation: Use WAAP to prioritize and shape traffic under stress (rate limits, bot controls, feature flags), keeping core transactions hot.
- Observation-Informed Governance: Synthetics + SIEM + Game Days create the learning loop and harden policy over time.

Quick-start checklist (copy/paste into your runbook)

- Delegate critical zones to XC DNS; set low TTLs for fast convergence.
- Create GSLB pools: primary = CDN CNAME; secondary = XC VIP/CNAME; select priority/latency steering.
- Add synthetics: validate status, content, and p95 latency from ≥5 vantage points; wire to pool health.
- Deploy CE into each origin VPC/DC; lock origins to CE ingress only.
- Unify WAAP/bot policies across paths (policy-as-code; connectors where applicable).
- Instrument & drill: stream GSLB/WAAP logs to SIEM and run quarterly Game Days.

F5 Container Ingress Services (CIS) deployment using Cilium CNI and static routes
F5 Container Ingress Services (CIS) supports static route configuration to enable direct routing from F5 BIG-IP to Kubernetes/OpenShift Pods as an alternative to VXLAN tunnels. Static routes are enabled in the F5 CIS CLI/Helm yaml manifest using the argument --static-routing-mode=true. In this article, we will use Cilium as the Container Network Interface (CNI) and configure static routes for an NGINX deployment.

For initial configuration of the BIG-IP, including AS3 installation, please see https://clouddocs.f5.com/products/extensions/f5-appsvcs-extension/latest/userguide/installation.html and https://clouddocs.f5.com/containers/latest/userguide/kubernetes/#cis-installation

The first step is to install the Cilium CLI and CNI using the steps below on the Linux host:

```
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
cilium install --version 1.18.5
cilium status
cilium status --wait
```

```
root@ciliumk8s-ubuntu-server:~# cilium status --wait
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       disabled
    \__/       ClusterMesh:        disabled

DaemonSet              cilium             Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet              cilium-envoy       Desired: 1, Ready: 1/1, Available: 1/1
Deployment             cilium-operator    Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium             Running: 1
                       cilium-envoy       Running: 1
                       cilium-operator    Running: 1
                       clustermesh-apiserver
                       hubble-relay
Cluster Pods:          6/6 managed by Cilium
Helm chart version:    1.18.3
Image versions         cilium             quay.io/cilium/cilium:v1.18.3@sha256:5649db451c88d928ea585514746d50d91e6210801b300c897283ea319d68de15: 1
                       cilium-envoy       quay.io/cilium/cilium-envoy:v1.34.10-1761014632-c360e8557eb41011dfb5210f8fb53fed6c0b3222@sha256:ca76eb4e9812d114c7f43215a742c00b8bf41200992af0d21b5561d46156fd15: 1
                       cilium-operator    quay.io/cilium/operator-generic:v1.18.3@sha256:b5a0138e1a38e4437c5215257ff4e35373619501f4877dbaf92c89ecfad81797: 1
```

```
root@ciliumk8s-ubuntu-server:~# cilium connectivity test
ℹ️  Monitor aggregation detected, will skip some flow validation steps
✨ [default] Creating namespace cilium-test-1 for connectivity check...
✨ [default] Deploying echo-same-node service...
✨ [default] Deploying DNS test server configmap...
✨ [default] Deploying same-node deployment...
✨ [default] Deploying client deployment...
✨ [default] Deploying client2 deployment...
✨ [default] Deploying ccnp deployment...
⌛ [default] Waiting for deployment cilium-test-1/client to become ready...
⌛ [default] Waiting for deployment cilium-test-1/client2 to become ready...
⌛ [default] Waiting for deployment cilium-test-1/echo-same-node to become ready...
⌛ [default] Waiting for deployment cilium-test-ccnp1/client-ccnp to become ready...
⌛ [default] Waiting for deployment cilium-test-ccnp2/client-ccnp to become ready...
⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod...
⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod...
⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach default/kubernetes service...
⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach default/kubernetes service...
⌛ [default] Waiting for Service cilium-test-1/echo-same-node to become ready...
⌛ [default] Waiting for Service cilium-test-1/echo-same-node to be synchronized by Cilium pod kube-system/cilium-lxjxf
⌛ [default] Waiting for NodePort 10.69.12.2:32046 (cilium-test-1/echo-same-node) to become ready...
🔭 Enabling Hubble telescope...
⚠️  Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4245: connect: connection refused"
ℹ️  Expose Relay locally with:
    cilium hubble enable
    cilium hubble port-forward&
ℹ️  Cilium version: 1.18.3
🏃[cilium-test-1] Running 126 tests ...
[=] [cilium-test-1] Test [no-policies] [1/126]
....................
[=] [cilium-test-1] Skipping test [no-policies-from-outside] [2/126] (skipped by condition)
[=] [cilium-test-1] Test [no-policies-extra] [3/126]
<- snip ->
```

For this article, we will install k3s with Cilium CNI:

```
root@ciliumk8s-ubuntu-server:~# curl -sfL https://get.k3s.io | sh -s - --flannel-backend=none --disable-kube-proxy --disable servicelb --disable-network-policy --disable traefik --cluster-init --node-ip=10.69.12.2 --cluster-cidr=10.42.0.0/16
root@ciliumk8s-ubuntu-server:~# mkdir -p $HOME/.kube
root@ciliumk8s-ubuntu-server:~# sudo cp -i /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
root@ciliumk8s-ubuntu-server:~# sudo chown $(id -u):$(id -g) $HOME/.kube/config
root@ciliumk8s-ubuntu-server:~# echo "export KUBECONFIG=$HOME/.kube/config" >> $HOME/.bashrc
root@ciliumk8s-ubuntu-server:~# source $HOME/.bashrc

API_SERVER_IP=10.69.12.2
API_SERVER_PORT=6443
CLUSTER_ID=1
CLUSTER_NAME=`hostname`
POD_CIDR="10.42.0.0/16"

root@ciliumk8s-ubuntu-server:~# cilium install --set cluster.id=${CLUSTER_ID} --set cluster.name=${CLUSTER_NAME} --set k8sServiceHost=${API_SERVER_IP} --set k8sServicePort=${API_SERVER_PORT} --set ipam.operator.clusterPoolIPv4PodCIDRList=$POD_CIDR --set kubeProxyReplacement=true --helm-set=operator.replicas=1
root@ciliumk8s-ubuntu-server:~# cilium config view | grep cluster
bpf-lb-external-clusterip                         false
cluster-id                                        1
cluster-name                                      ciliumk8s-ubuntu-server
cluster-pool-ipv4-cidr                            10.42.0.0/16
cluster-pool-ipv4-mask-size                       24
clustermesh-enable-endpoint-sync                  false
clustermesh-enable-mcs-api                        false
ipam                                              cluster-pool
max-connected-clusters                            255
policy-default-local-cluster                      false
root@ciliumk8s-ubuntu-server:~# cilium status --wait
```

F5 CIS is deployed with a Helm values yaml manifest. Note that these arguments are required for CIS to leverage static routes:

1. static-routing-mode: true
2. orchestration-cni: cilium-k8s

We will also be installing custom resources, so this argument is also required:

3. custom-resource-mode: true

Values yaml manifest for Helm deployment:

```
bigip_login_secret: f5-bigip-ctlr-login
bigip_secret:
  create: false
  username:
  password:
rbac:
  create: true
serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name: k8s-bigip-ctlr
# This namespace is where the Controller lives;
namespace: kube-system
ingressClass:
  create: true
  ingressClassName: f5
  isDefaultIngressController: true
args:
  # See https://clouddocs.f5.com/containers/latest/userguide/config-parameters.html
  # NOTE: helm has difficulty with values using `-`; `_` are used for naming
  # and are replaced with `-` during rendering.
  # REQUIRED Params
  bigip_url: X.X.X.X
  bigip_partition: <BIG-IP_PARTITION>
  # OPTIONAL PARAMS -- uncomment and provide values for those you wish to use.
  static-routing-mode: true
  orchestration-cni: cilium-k8s
  # verify_interval:
  # node-poll_interval:
  # log_level: DEBUG
  # python_basedir: ~
  # VXLAN
  # openshift_sdn_name:
  # flannel_name: cilium-vxlan
  # KUBERNETES
  # default_ingress_ip:
  # kubeconfig:
  # namespaces: ["foo", "bar"]
  # namespace_label:
  # node_label_selector:
  pool_member_type: cluster
  # resolve_ingress_names:
  # running_in_cluster:
  # use_node_internal:
  # use_secrets:
  insecure: true
  custom-resource-mode: true
  log-as3-response: true
  as3-validation: true
  # gtm-bigip-password
  # gtm-bigip-url
  # gtm-bigip-username
  # ipam : true
image:
  # Use the tag to target a specific version of the Controller
  user: f5networks
  repo: k8s-bigip-ctlr
  pullPolicy: Always
  version: latest
# affinity:
#   nodeAffinity:
#     requiredDuringSchedulingIgnoredDuringExecution:
#       nodeSelectorTerms:
#       - matchExpressions:
#         - key: kubernetes.io/arch
#           operator: Exists
# securityContext:
#   runAsUser: 1000
#   runAsGroup: 3000
#   fsGroup: 2000
# If you want to specify resources, uncomment the following
# limits_cpu: 100m
# limits_memory: 512Mi
# requests_cpu: 100m
# requests_memory: 512Mi
# Set podSecurityContext for Pod Security Admission and Pod Security Standards
# podSecurityContext:
#   runAsUser: 1000
#   runAsGroup: 1000
#   privileged: true
```

Installation steps for deploying F5 CIS using helm can be found in this link: https://clouddocs.f5.com/containers/latest/userguide/kubernetes/

Once F5 CIS is validated to be up and running, we can now deploy the following application example:

```
root@ciliumk8s-ubuntu-server:~# cat application.yaml
apiVersion: cis.f5.com/v1
kind: VirtualServer
metadata:
  labels:
    f5cr: "true"
  name: goblin-virtual-server
  namespace: nsgoblin
spec:
  host: goblin.com
  pools:
    - path: /green
      service: svc-nodeport
      servicePort: 80
    - path: /harry
      service: svc-nodeport
      servicePort: 80
  virtualServerAddress: X.X.X.X
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: goblin-backend
  namespace: nsgoblin
spec:
  replicas: 2
  selector:
    matchLabels:
      app: goblin-backend
  template:
    metadata:
      labels:
        app: goblin-backend
    spec:
      containers:
        - name: goblin-backend
          image: nginx:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: svc-nodeport
  namespace: nsgoblin
spec:
  selector:
    app: goblin-backend
  ports:
    - port: 80
      targetPort: 80
  type: ClusterIP
```

```
k apply -f application.yaml
```

We can now verify the k8s pods are created. Then we will create a sample html page to test access to the backend NGINX pods:

```
root@ciliumk8s-ubuntu-server:~# k -n nsgoblin get po -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE                      NOMINATED NODE   READINESS GATES
goblin-backend-7485b6dcdf-d5t48   1/1     Running   0          6d2h   10.42.0.70   ciliumk8s-ubuntu-server   <none>           <none>
goblin-backend-7485b6dcdf-pt7hx   1/1     Running   0          6d2h   10.42.0.97   ciliumk8s-ubuntu-server   <none>           <none>
```

```
root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it po/goblin-backend-7485b6dcdf-pt7hx -- /bin/sh
# cat > green <<'EOF'
<!DOCTYPE html>
<html>
<head>
<title>Green Goblin</title>
<style>
body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; }
h1 { font-size: 3em; }
</style>
</head>
<body>
<h1>I am the green goblin!</h1>
<p>Access me at /green</p>
</body>
</html>
EOF
```

The same page is created in the second pod (goblin-backend-7485b6dcdf-d5t48) with an identical `k -n nsgoblin exec` command.

We can now validate the pools are created on the F5 BIG-IP:

```
root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm pool all
ltm pool svc_nodeport_80_nsgoblin_goblin_com_green {
    description "crd_10_69_12_40_80 loadbalances this pool"
    members {
        /kubernetes/10.42.0.70:http {
            address 10.42.0.70
        }
        /kubernetes/10.42.0.97:http {
            address 10.42.0.97
        }
    }
    min-active-members 1
    partition kubernetes
}
ltm pool svc_nodeport_80_nsgoblin_goblin_com_harry {
    description "crd_10_69_12_40_80 loadbalances this pool"
    members {
        /kubernetes/10.42.0.70:http {
            address 10.42.0.70
        }
        /kubernetes/10.42.0.97:http {
            address 10.42.0.97
        }
    }
    min-active-members 1
    partition kubernetes
}
root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm virtual crd_10_69_12_40_80
ltm virtual crd_10_69_12_40_80 {
    creation-time 2025-12-22:10:10:37
    description Shared
    destination /kubernetes/10.69.12.40:http
    ip-protocol tcp
    last-modified-time 2025-12-22:10:10:37
    mask 255.255.255.255
    partition kubernetes
    persist {
        /Common/cookie {
            default yes
        }
    }
    policies {
        crd_10_69_12_40_80_goblin_com_policy { }
    }
    profiles {
        /Common/f5-tcp-progressive { }
        /Common/http { }
    }
    serverssl-use-sni disabled
    source 0.0.0.0/0
    source-address-translation {
        type automap
    }
    translate-address enabled
    translate-port enabled
    vs-index 2
}
```

CIS log output:

```
2025/12/22 18:10:25 [INFO] [Request: 1] cluster local requested CREATE in VIRTUALSERVER nsgoblin/goblin-virtual-server
2025/12/22 18:10:25 [INFO] [Request: 1][AS3] creating a new AS3 manifest
2025/12/22 18:10:25 [INFO] [Request: 1][AS3][BigIP] posting request to https://10.69.12.1 for tenants
2025/12/22 18:10:26 [INFO] [Request: 2] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport
2025/12/22 18:10:26 [INFO] [Request: 3] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport
2025/12/22 18:10:43 [INFO] [Request: 1][AS3][BigIP] post resulted in SUCCESS
2025/12/22 18:10:43 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success
2025/12/22 18:10:43 [INFO] [Request: 3][AS3] Processing request
2025/12/22 18:10:43 [INFO] [Request: 3][AS3] creating a new AS3 manifest
2025/12/22 18:10:43 [INFO] [Request: 3][AS3][BigIP] posting request to https://10.69.12.1 for tenants
2025/12/22 18:10:43 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster
W1222 18:10:49.238444       1 warnings.go:70] v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
2025/12/22 18:10:52 [INFO] [Request: 3][AS3][BigIP] post resulted in SUCCESS
2025/12/22 18:10:52 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success
2025/12/22 18:10:52 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster
```

Troubleshooting:

1. If static routes are not added, the first step is to inspect CIS logs for entries similar to these:

```
2025/12/22 17:44:45 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
2025/12/22 17:46:41 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
2025/12/22 17:46:42 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
2025/12/22 17:46:43 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
```

2. These are resolved by adding annotations to the node using the reference: https://clouddocs.f5.com/containers/latest/userguide/static-route-support.html

```
root@ciliumk8s-ubuntu-server:~# k annotate node ciliumk8s-ubuntu-server io.cilium.network.ipv4-pod-cidr=10.42.0.0/16
root@ciliumk8s-ubuntu-server:~# k describe node | grep -E "Annotations:|PodCIDR:|^\s+.*pod-cidr"
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.69.12.2
                    io.cilium.network.ipv4-pod-cidr: 10.42.0.0/16
PodCIDR:            10.42.0.0/24
```

3. Verify a static route has been created and test connectivity to the k8s pods:

```
root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes)(tmos)# list net route
net route k8s-ciliumk8s-ubuntu-server-10.69.12.2 {
    description 10.69.12.1
    gw 10.69.12.2
    network 10.42.0.0/16
    partition kubernetes
}
```

Using pup (a command-line HTML parser, see https://commandmasters.com/commands/pup-common/):

```
root@ciliumk8s-ubuntu-server:~# curl -s http://goblin.com/green | pup 'body text{}'
I am the green goblin!
Access me at /green
```

A packet capture of the same request shows the client-side flow to the virtual server (10.69.12.40) and the server-side flow from the BIG-IP (10.69.12.1) to the pod (10.42.0.97) over the static route:

```
 1   0.000000   10.69.12.34 -> 10.69.12.40   TCP 78   34294 -> 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=2984295232 TSecr=0 WS=128
 2   0.000045   10.69.12.40 -> 10.69.12.34   TCP 78   80 -> 34294 [SYN, ACK] Seq=0 Ack=1 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316303 TSecr=2984295232
 3   0.001134   10.69.12.34 -> 10.69.12.40   TCP 70   34294 -> 80 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=2984295234 TSecr=1809316303
 4   0.001151   10.69.12.34 -> 10.69.12.40   HTTP 149 GET /green HTTP/1.1
 5   0.001343   10.69.12.40 -> 10.69.12.34   TCP 70   80 -> 34294 [ACK] Seq=1 Ack=80 Win=23040 Len=0 TSval=1809316304 TSecr=2984295234
 6   0.002497   10.69.12.1  -> 10.42.0.97    TCP 78   33707 -> 80 [SYN] Seq=0 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316304 TSecr=0
 7   0.003614   10.42.0.97  -> 10.69.12.1    TCP 78   80 -> 33707 [SYN, ACK] Seq=0 Ack=1 Win=64308 Len=0 MSS=1410 SACK_PERM TSval=1012609408 TSecr=1809316304 WS=128
 8   0.003636   10.69.12.1  -> 10.42.0.97    TCP 70   33707 -> 80 [ACK] Seq=1 Ack=1 Win=23040 Len=0 TSval=1809316307 TSecr=1012609408
 9   0.003680   10.69.12.1  -> 10.42.0.97    HTTP 149 GET /green HTTP/1.1
10   0.004774   10.42.0.97  -> 10.69.12.1    TCP 70   80 -> 33707 [ACK] Seq=1 Ack=80 Win=64256 Len=0 TSval=1012609409 TSecr=1809316307
11   0.004790   10.42.0.97  -> 10.69.12.1    TCP 323  HTTP/1.1 200 OK [TCP segment of a reassembled PDU]
12   0.004796   10.42.0.97  -> 10.69.12.1    HTTP 384 HTTP/1.1 200 OK
13   0.004820   10.69.12.40 -> 10.69.12.34   TCP 448  HTTP/1.1 200 OK [TCP segment of a reassembled PDU]
14   0.004838   10.69.12.1  -> 10.42.0.97    TCP 70   33707 -> 80 [ACK] Seq=80 Ack=254 Win=23552 Len=0 TSval=1809316308 TSecr=1012609410
15   0.004854   10.69.12.40 -> 10.69.12.34   HTTP 384 HTTP/1.1 200 OK
```

Summary:

There we have it. We have successfully deployed an NGINX application on a Kubernetes cluster managed by F5 CIS, using static routes to forward traffic to the Kubernetes pods.
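To make the troubleshooting steps concrete: CIS reads the node's io.cilium.network.ipv4-pod-cidr annotation and turns it into a BIG-IP static route that points the pod CIDR at the node IP, which is why the missing annotation produces the WARNING log entries. The sketch below is a simplified illustration of that derivation, not CIS source code; the helper name and returned fields are hypothetical, with the route-naming convention taken from the `list net route` output above:

```python
# Simplified illustration of how CIS derives a BIG-IP static route from a
# node's Cilium annotation (hypothetical helper, not actual CIS source code).
import ipaddress

CILIUM_CIDR_ANNOTATION = "io.cilium.network.ipv4-pod-cidr"


def derive_static_route(node_name, node_ip, annotations):
    """Return the static route that would be created for this node, or raise
    if the Cilium pod-CIDR annotation is missing (the WARNING case above)."""
    cidr = annotations.get(CILIUM_CIDR_ANNOTATION)
    if cidr is None:
        raise ValueError(
            f"Cilium node podCIDR annotation not found on node {node_name}")
    network = ipaddress.ip_network(cidr)  # validates e.g. "10.42.0.0/16"
    return {
        "name": f"k8s-{node_name}-{node_ip}",  # matches the tmsh route name
        "network": str(network),               # route destination (pod CIDR)
        "gw": node_ip,                         # next hop: the node itself
    }


route = derive_static_route(
    "ciliumk8s-ubuntu-server",
    "10.69.12.2",
    {CILIUM_CIDR_ANNOTATION: "10.42.0.0/16"},
)
print(route)
# {'name': 'k8s-ciliumk8s-ubuntu-server-10.69.12.2',
#  'network': '10.42.0.0/16', 'gw': '10.69.12.2'}
```

Running this with the values from the walkthrough reproduces the `net route k8s-ciliumk8s-ubuntu-server-10.69.12.2 { gw 10.69.12.2 network 10.42.0.0/16 }` entry observed on the BIG-IP once the annotation is in place.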