cloud
3965 TopicsHigh Availability for F5 NGINX Instance Manager in AWS
Introduction F5 NGINX Instance Manager gives you a centralized way to manage NGINX Open Source and NGINX Plus instances across your environment. It’s ideal for disconnected or air-gapped deployments, with no need for internet access or external cloud services. The NGINX Instance Manager features keep changing. They now include many features for managing configurations, like NGINX config versioning and templating, F5 WAF for NGINX policy and signature management, monitoring of NGINX metrics and security events, and a rich API to help external automation. As the role of NGINX Instance Manager becomes increasingly important in the management of disconnected NGINX fleets, the need for high availability increases. This article explores how we can use Linux clustering to provide high availability for NGINX Instance Manager across two availability zones in AWS. Core Technologies Core technologies used in this HA architecture design include: Amazon Elastic Compute instances (EC2) - virtual machines rented inside AWS that can be used to host applications, like NGINX Instance Manager. Pacemaker - an open-source high availability resource manager software used in Linux clusters since 2004. Pacemaker is generally deployed with the Corosync Cluster Engine, which provides the cluster node communication, membership tracking and cluster quorum. Amazon Elastic File System (EFS) - a serverless, fully managed, elastic Network File System (NFS) that allows servers to share file data simultaneously between systems. Amazon Network Load Balancer (NLB) - a layer 4 TCP/UDP load balancer that forwards traffic to targets like EC2 instances, containers or IP addresses. NLB can send periodic health checks to registered targets to ensure that traffic is only forwarded to healthy targets. Architecture Overview In this highly available architecture, we will install NGINX Instance Manager (NIM) on two EC2 instances in different AWS Availability Zones (AZ). Four EFS file systems will be created to share key stateful information between the two NIM instances, and Pacemaker/Corosync will be used to orchestrate the cluster - only one NIM instance is active at any time and Pacemaker will facilitate this by starting/stopping the NIM systemd services. Finally, an Amazon NLB will be used to provide network failover between the two NIM instances, using an HTTP health check to determine the active cluster node. Deployment Steps 1. Create AWS EFS file systems First, we are going to create four EFS volumes to hold important NIM configuration and state information that will be shared between nodes. These file systems will be mounted onto: /etc/nms, /var/lib/clickhouse, /var/lib/nms and /usr/share/nms inside the NIM node. Take note of the File System IDs of the newly created file systems. Edit the properties of each EFS file system and create a mount target in each AZ you intend to deploy a NIM node in, then restrict network access to only the NIM nodes by setting up an AWS Security Group. You may also consider more advanced authentication methods, but these aren't covered in this article. 2. Deploy two EC2 instances for NGINX Instance Manager Deploy two EC2 instances with suitable specifications to support the number of data plane instances that you plan to manage (you can find the sizing specifications here) and connect one to each of the AZ/subnet that you configured EFS mount targets in above. In this example, I will deploy two t2.medium instances running Ubuntu 24.04, connect one to us-east-1a and the other to us-east-1c, and create a security group allowing only traffic from its local assigned subnet. 3. Mount the EFS file systems on NGINX Instance Manager Node 1 Now we have the EC2 instances deployed, we can log on to Node 1 and mount the EFS volumes onto this node by executing the following steps: 1. SSH onto Node 1 2. Install efs-utils package if is not installed already 3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory 4. Execute mount -a to mount the file systems 5. Execute df to ensure that the paths are mounted correctly 4. Install NGINX Instance Manager on Node 1 With the EFS file systems now mounted, it's time to run through the NGINX Instance Manager installation on Node 1. 1. Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh 2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/ 3. Run bash install-nim-bundle.sh -d ubuntu22.04 4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node: systemctl stop nms; systemctl disable nms systemctl stop nginx; systemctl disable nginx systemctl stop clickhouse-server; systemctl disable clickhouse-server 5. Install NGINX Instance Manager on Node 2 This time we are going to install NGINX Instance Manager on Node two but without attaching the EFS file systems. On Node 2: 1. Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh 2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/ 3. Run bash install-nim-bundle.sh -d ubuntu22.04 4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node: systemctl stop nms; systemctl disable nms systemctl stop nginx; systemctl disable nginx systemctl stop clickhouse-server; systemctl disable clickhouse-server 6. Mount EFS file systems on NGINX Instance Manager Node 2 Now we have the NGINX Instance Manager binaries installed on each node, let's mount the EFS file systems on Node 2: 1. SSH onto Node 2 2. Install efs-utils package if is not installed already 3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory 4. Execute mount -a to mount the file systems 5. Execute df to ensure that the paths are mounted correctly 7. Install and configure Pacemaker/Corosync With NGINX Instance Manager now installed on both nodes, it's now time to get Pacemaker and Corosync installed: 1. Install Pacemaker, Corosync and other important agents sudo apt update sudo apt install pacemaker pcs corosync fence-agents-aws resource-agents-base 2. To allow Pacemaker to communicate between nodes, we need to add TCP communication between nodes to the Security Group for the NIM nodes. 3. Once we have the connectivity in place, we have to set a common password for the hacluster user on both nodes - we can do this by running the following command on both nodes: sudo passwd hacluster password: IloveF5 (don't use this!) 4. Now we start the Pacemaker services by running the following commands on both nodes: systemctl start pcsd.service systemctl enable pcsd.service systemctl status pcsd.service systemctl start pacemaker systemctl enable pacemaker 5. And finally, we authenticate the nodes with each other (using hacluster username, password and node hostname) and check the cluster status: pcs host auth ip-172-17-1-89 ip-172-17-2-160 pcs cluster setup nimcluster --force ip-172-17-1-89 pcs status 8. Configure Cluster Fencing Fencing is the ability to make a node unable to run resources, even when that node is unresponsive to cluster commands - you can think of fencing as cutting the power to the node. Fencing protects against corruption of data due to concurrent access to shared resources, commonly known as "split brain" scenario. In this architecture, we use the fence_aws agent, which uses boto3 library to connect to AWS and stop the EC2 instances of failing nodes. Let's install and configure the fence_aws agent: 1. Create an AWS Access Key and Secret Access key for fence_aws to use 2. Install the AWS CLI on both NIM nodes 3. Take note of the Instance IDs for the NIM instances 4. Configure the fence_aws agent as a Pacemaker STONITH device. Run the psc stonith command inserting your access key, secret key, region, and mappings of Instance ID to Linux hostname. pcs stonith create hacluster-stonith fence_aws access_key=(your access key) secret_key=(your secret key) region=us-east-1 pcmk_host_map="ip-172-31-34-95:i-0a46181368524dab6;ip-172-31-27-134:i-032d0b400b5689f68" power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4 5. Run pcs status and make sure that the stonith device is started 9. Configure Pacemaker resources, colocations and contraints Ok - we are almost there! It's time to configure the Pacemaker resources, colocations and constraints. We want to make sure that the clickhouse-server, nms and nginx systemd services all come up on the same node together, and in that order. We can do that using Pacemaker colocations and constraints. 1. Configure a pacemaker resource for each systemd service pcs resource create clickhouse systemd:clickhouse-server pcs resource create nms systemd:nms.service pcs resource create nginx systemd:nginx.service 🔥HOT TIP🔥 check out pcs resource command options (op monitor interval etc.) to optimize failover time. 2. Create two colocations to make sure they all start on the same node pcs constraint colocation add clickhouse with nms pcs constraint colocation add nms with nginx 3. Create three constraints to define the startup order: Clickhouse -> NMS -> NGINX pcs constraint order start clickhouse then nms pcs constraint order start nms then nginx 4. Enable and start the pcs cluster pcs cluster enable --all pcs cluster start --all 10. Provision AWS NLB Load Balancer Finally - we are going to set up the AWS Network Load Balancer (NLB) to facilitate the failover. Create a Security Group entry to allow HTTPs traffic to enter the EC2 instance from the local subnet 2. Create a Load Balancer target group, targeting instances, with Protocol TCP on port 443 ⚠️NOTE ⚠️ if you are using Load balancing with TCP Protocol and terminating the TLS connection on the NIM node (EC2 instance), you must create a security group entry to allow TCP 443 from the connecting clients directly to the EC2 instance IP address. If you have trusted SSL/TLS server certificates, you may want to investigate a load balancer for TLS protocol. 3. Ensure that a HTTPS health check is in place to facilitate the failover 🔥HOT TIP🔥 you can speed up failure detection and failover using Advanced health check settings. 4. Include our two NIM instances as pending and save the target group 5. Now let's create the network load balancer (NLB) listening on TCP port 443 and forwarding to the target group created above. 6. Once the load balancer is created, check the target group and you will find that one of the targets is healthy - that's the active node in the pacemaker cluster! 7. With the load balancing now in place, you can access the NIM console using the FQDN for your load balancer and login with the password set in the install of Node 1. 8. Once you have logged in, we need to install a license before we proceed any further: Click on Settings Click on Licenses Click Get Started Click Browse Upload your license Click Add 9. With the license now installed, we have access to the full console 11. Test failover The easiest way to test failover is to just shut down the active node in the cluster. Pacemaker will detect the node is no longer available and start the services on the remaining node. Stop the active node/instance of the NIM 2. Monitor the Target Group and watch it fail over - depending on the settings you have set up, this may take a few minutes 12. How to upgrade NGINX Instance Manager on the cluster To upgrade NGINX Instance Manager in a Pacemaker cluster, perform the following tasks: 1. Stop the Pacemaker Cluster services on Node 2 - forcing Node 1 to take over. pcs cluster stop ip-172-17-2-160 2. Disconnect the NFS mounts on Node2 umount /usr/share/nms umount /etc/nms umount /var/lib/nms umount /var/lib/clickhouse 3. Upgrade NGINX Instance Manager on Node 1 Download the update from the MyF5 Customer Portal sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb sudo systemctl restart nms sudo systemctl restart nginx 4. Upgrade NGINX Instance Manager on Node 2 (with the NFS mounts disconnected) Download the update from the MyF5 Customer Portal sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb sudo systemctl restart nms sudo systemctl restart nginx 5. Re-mount all the NFS mounts on Node 2 mount -a 6. Start the Pacemaker Cluster services on Node 2 - adding it back into the cluster pcs cluster start ip-172-17-2-160 13. Reference Documents Some good references on Pacemaker/Corosync clustering can be found here: Configuring a Red Hat High Availability cluster on AWS Implement a High-Availability Cluster with Pacemaker and Corosync ClusterLabs Pacemaker website Corosync Cluster Engine website188Views0likes0CommentsAPM checking for URI
I have created an APM policy that checks to is it the URI contains a specific URI. If the URI is anything else then the fallback is to send the traffic. example: https://www.fubar.com/admin - the APM is looking for /admin and if present the traffic will then go to the next step is certificate prompt if the URI contains anything else use the fallback to continue. For example https://www.fubar.com/documents/invenioHealth the APM would use the fallback and just let the traffic pass https://www.fubar.com/documents/invenioHealth in this case the F5 is sending a 302, instead of just sending the traffic through and then sends a FIN/ACK back to the source.32Views0likes1CommentAccelerate Application Deployment on Google Cloud with F5 NGINXaaS
Introduction In the push for cloud-native agility, infrastructure teams often face a crossroads: settle for basic, "good enough" load balancing, or take on the heavy lifting of manually managing complex, high-performance proxies. For those building on Google Cloud (GCP), this compromise is no longer necessary. F5 NGINXaaS for Google Cloud represents a shift in how we approach application delivery. It isn’t just NGINX running in the cloud; it is a co-engineered, fully managed on-demand service that lives natively within the GCP ecosystem. This integration allows you to combine the advanced traffic control and programmability NGINX is known for with the effortless scaling and consumption model of an SaaS offering in a platform-first way. By offloading the "toil" of lifecycle management—like patching, tuning, and infrastructure provisioning—to F5, teams can redirect their energy toward modernizing application logic and accelerating release cycles. In this article, we’ll dive into how this synergy between F5 and Google Cloud simplifies your architecture, from securing traffic with integrated secret management to gaining deep operational insights through native monitoring tools. Getting Started with NGINXaaS for Google Cloud The transition to a managed service begins with a seamless onboarding experience through the Google Cloud Marketplace. By leveraging this integrated path, teams can bypass the manual "toil" of traditional infrastructure setup, such as patching and individual instance maintenance. The deployment process involves: Marketplace Subscription: Directly subscribe to the service to ensure unified billing and support. Network Connectivity: Setting up essential VPC and Network Attachments to allow NGINXaaS to communicate securely with your backend resources. Provisioning: Launching a dedicated deployment that provides enterprise-grade reliability while maintaining a cloud-native feel. Secure and Manage SSL/TLS in F5 NGINXaaS for Google Cloud Security is a foundational pillar of this co-engineered service, particularly regarding traffic encryption. NGINXaaS simplifies the lifecycle of SSL/TLS certificates by providing a centralized way to manage credentials. Key security features include: Integrated Secrets Management: Working natively with Google Cloud services to handle sensitive data like private keys and certificates securely. Proxy Configuration: Demonstrating how to set up a Google Cloud proxy network load balancer to handle incoming client traffic. Credential Deployment: Uploading and managing certificates directly within the NGINX console to ensure all application endpoints are protected by robust encryption. Enhancing Visibility in Google Cloud with F5 NGINXaaS Visibility is no longer an afterthought but a native component of the deployment, providing high-fidelity telemetry without separate agents. Native Telemetry Export: By linking your Google Cloud Project ID and configuring Workload Identity Federation (WIF), metrics and logs are pushed directly to Google Cloud Monitoring. Real-Time Dashboards: The observability demo walks through using the Metrics Explorer to visualize critical performance data, such as active HTTP connection counts and response rates. Actionable Logging: Integrated Log Analytics allow you to use the Logs Explorer to isolate events and troubleshoot application issues within a single toolset, streamlining your operational workflow. Whether you are just beginning your transition to the cloud or fine-tuning a sophisticated microservices architecture, F5 NGINXaaS provides the advanced availability, scalability, security, and visibility capabilities necessary for success in the Google Cloud environment. Conclusion The integration of F5 NGINXaaS for Google Cloud represents a significant advantage for organizations looking to modernize their application delivery without the traditional overhead of infrastructure management. By shifting to this co-engineered, managed service, teams can bridge together advanced NGINX performance and the native agility of the Google Cloud ecosystem. Through the demonstrations provided in this article, we’ve highlighted how you can: Accelerate Onboarding: Move from Marketplace subscription to a live deployment in minutes using Network Attachments. Fortify Security: Centralize SSL/TLS management within the NGINX console while leveraging Google Cloud's robust networking layer. Maximize Operational Intelligence: Harness deep, real-time observability by piping telemetry directly into Google Cloud Monitoring and Logging. Resources Accelerating app transformation with F5 NGINXaaS for Google Cloud F5 NGINXaaS for Google Cloud: Delivering resilient, scalable applications98Views2likes2CommentsAI Inference for VLLM models with F5 BIG-IP & Red Hat OpenShift
This article shows how to perform Intelligent Load Balancing for AI workloads using the new features of BIG-IP v21 and Red Hat OpenShift. Intelligent Load Balancing is done based on business logic rules without iRule programming and state metrics of the VLLM inference servers gathered from OpenShift´s Prometheus.367Views1like5CommentsF5 in AZ
We are building F5 BIG-IP in Azure. Our long term intention is Active-Active or Active-Standby HA, but to kick start we are deploying a single standalone instance first. The F5 is not exposed to the internet directly. We have a Palo Alto firewall performing DNAT to convert the public IP to a private IP, and that private IP is the F5 VIP. We are using Azure basic Load Balancer to send traffic to F5. Our example external subnet is 10.1.1.0/24 and the IPs are configured as follows on the Azure NIC and F5. The Primary Self IP is 10.1.1.10, the first Secondary IP is 10.1.1.11 which is VIP for App1, and the second Secondary IP is 10.1.1.12 which is VIP for App2 and follows. My questions are as follows. First, in the ALB backend pool, should we use the Primary Self IP 10.1.1.10 or the Secondary VIP IPs 10.1.1.11 and 10.1.1.12? If we use Secondary IPs, do we need a separate ALB for each VIP? We have seen some older videos suggesting Secondary IPs should be used in the backend pool but we want to confirm the correct approach. Second, when we expand to HA in the future by adding a second F5 device, can both devices be configured with the same VIP IPs such as 10.1.1.11 and 10.1.1.12? And since Azure does not support floating IPs moving between VMs, we understand ALB health probes handle failover, so in that case should the ALB backend pool contain the Primary Self IPs of both devices? Please advise on the correct design for both standalone and HA scenarios.94Views0likes2CommentsCrafting a Cloud-Based Antifragile Cyber Resiliency Strategy
Application availability is a continually evolving concept. With the proliferation of hybrid multicloud applications and environments, fronted by third-party services like AWS and Azure, what used to be a back‑end challenge now encompasses even the delivery‑path. Even if an application is healthy, a failing CDN, edge location, or cloud control plane can make it unreachable, bringing business to a halt. Teams that support applications need to plan for this. They need their networks to reduce the blast radius of external failures. They need them to adapt as the failure unfolds. They need them to incorporate the lessons of disruption into future behavior. And they need them to be a systemic and strategic advantage rather than an operational obligation. Why this matters now Late‑2025 outages exposed how single‑vendor or single‑region bets amplify risk. A Cloudflare edge configuration defect propagated globally, causing widespread 5xx responses, despite customers’ origins being sound. Weeks earlier, a control-plane issue impacting AWS’s US‑East‑1 DNS stranded workloads, even impairing provider‑native failover actions. The organizations that fared best ran multi‑path delivery with independent steering at the DNS layer, allowing them to circumvent the gaps in their delivery networks that these outages created. F5’s ADSP portfolio has the tools to help create these resiliency paths: Distributed Cloud (XC) DNS & DNS Load Balancer (DNSLB) enable independent global traffic steering; XC Customer Edge (CE) enables private, controlled failover paths; XC Synthetic Monitoring uses continuous probes from regional edge locations to detect performance degradation or outages; and XC WAAP (Web Application & API Protection) enables consistent security during failover and graceful degradation. Together, they can provide teams with the necessary tools to get ahead of outages and keep applications online, not simply respond reactively after the fact of an outage. The reference architecture at a glance The goal of any modern, antifragile resiliency strategy is to keep full control over how traffic is routed, in all conditions and scenarios. This way, the network becomes an intelligent part of the application delivery and availability strategy. Using built‑in traffic routing policies, it automatically switches to available paths when a problem arises, keeping necessary security protections fully intact and applications online Global steering (Authoritative DNS + GSLB): For many organizations, DNS remains anchored to a single provider, creating an unnecessary dependency and a larger operational blast radius when that provider experiences issues. Delegating critical zones to XC DNS distributes authoritative DNS duties across F5’s global anycast footprint, reducing reliance on any one hyperscaler or DNS control plane. Running alongside this, the DNS Load Balancer applies Global Server Load Balancing policies that evaluate both origin and edge health, returning DNS responses that reflect real service conditions rather than static routing assumptions. Even when a provider’s own control plane is impaired. XC runs on F5’s independent backbone and can steer traffic to any public DNS name or IP (including a primary CDN CNAME), making it ideal for multi‑vendor delivery. Delivery planes (Primary + Alternate): Use your primary CDN/edge as usual, but maintain an alternate ingress on F5 XC WAAP. GSLB returns the primary CDN CNAME under normal conditions and automatically returns the XC VIP/CNAME when health checks degrade. This duality breaks monoculture risk at the edge. Secure origin connectivity (Customer Edge): Place F5 Customer Edge nodes in your VPCs/data centers. When traffic shifts to XC, it traverses F5’s encrypted backbone to the CE, keeping your origins private and avoiding an emergency “open to the internet” posture during failover. Consistent protection (WAAP + Bot Defense): Enforce WAAP policies and (optionally) Bot Defense connectors, so the same security logic applies regardless of which delivery path is active. This closes the “security arbitrage” gap attackers exploit when a secondary path is weaker. Observability & learning (synthetics + SIEM): Run synthetic monitors that test real HTTP journeys (not just TCP liveness) This is one of the only reliable ways to spot “grey failures” like mass 503s. The resultant logs can be streamed to your SIEM for further analysis. Use Game Days to rehearse provider cutovers and capture evidence. How the pieces fit - step by step 1) Make XC your steering layer (Authoritative DNS + GSLB) Delegate your service zone(s) to XC DNS. Keep a short TTL (e.g., 30–60s) on critical records to accelerate adaptation. In DNS Load Balancer, create two Origin Pools: Primary: your CDN CNAME (e.g., www.example.com.cdn.cloudflare.net). Secondary: the XC HTTP LB VIP or CNAME (vip.example.xc.f5.com). Attach health checks and steering policies (priority‑based, latency‑based, or SLO‑aware) so the LB returns the primary CNAME when healthy and the XC endpoint otherwise. Why DNS/GSLB first? When a provider’s control plane stalls (e.g., Route 53 updates freeze), you still retain an external control plane to redirect users, without waiting on the impacted provider to recover. 2) Detect real problems with synthetic monitoring Create Synthetic Monitors in XC that fetch a known‑good URL (/healthz) through the primary CDN path and validate status code, content, and latency. Don’t rely on ping/TCP, as grey failures may look healthy to L4 checks. By using multiple vantage points, you can tune thresholds to trigger “pool down” events quickly but safely. Failover logic example (conceptual): 3) Keep origins private with CE Deploy CE nodes into each origin environment. XC Regional Edge terminates client traffic, applies WAAP controls, then forwards over the F5 backbone to CE, which in turn reaches your service on private subnets. During a CDN failure, you don’t have to expose an emergency public listener or widen security groups. 4) Avoid “security arbitrage” during failover Use a policy‑as‑code approach to keep WAAP intent portable and synchronized between providers (e.g., a positive security model for APIs, shared bot policies). Where applicable, F5 Bot Defense Connector can evaluate traffic headed through Cloudflare so your bot verdicts are identical across both paths. 5) Close the loop: observability, drills, and policy evolution To maintain continuous situational awareness, stream DNS/GSLB state changes, WAAP events, and synthetic monitoring results into your SIEM, such as Splunk or Datadog, to build a global traffic‑health view rather than a narrow “Provider‑X‑only” dashboard. Reinforce this operational picture with quarterly Game Days that deliberately disable the primary pool in XC to validate time‑to‑detect, time‑to‑shift, and confirm that the secondary path meets expected performance and SLO targets. What happens in real outages? Scenario A: A CDN logical push breaks the edge: Your synthetics see 503s from multiple vantage points through the CDN CNAME. The primary pool flips to Down, and XC DNS starts returning the XC VIP. As client caches expire (short TTLs help), users seamlessly land on the XC path (WAAP intact), traverse the F5 backbone, and reach your private origins through CE. This bypasses the global bad push entirely. Scenario B: Cloud control plane failure (e.g., US‑East‑1): Because XC runs on an independent control plane, its monitors and DNS orchestration continue operating. GSLB marks the impacted origin/region down and returns a DR pool (another region or on‑prem behind CE), even if provider‑native tools are unresponsive. Why F5 for the DNS layer? A hybrid DNS/GSLB toolset like the one found in F5’s Distributed Cloud is built for multicloud: F5 XC DNS Load Balancer steers to any origin, IPs or CNAMEs, so you can keep your preferred CDN while retaining an independent steering layer. This toolset is further complemented by Distributed Cloud Synthetic Monitoring. Designed for grey‑failure detection, it simulates user traffic against endpoints to quantify health and performance in real-time, to catch issues that liveness checks miss. All of these functions run on F5’s independent backbone and dedicated Customer Edge nodes. These private, encrypted paths to origins reduce exposure and keep performance stable when the public internet is noisy. Tying back to the five antifragile practices Blast Radius Control: Isolate critical names and flows in distinct DNS/GSLB policies so one provider’s trouble doesn’t cascade. Dependency Diversification: Maintain two delivery planes (Primary CDN + XC WAAP) with independent health and failover. Policy‑Driven Adaptation: Encode SLOs and health criteria in XC GSLB so failover is autonomous and fast. Incremental Adaptation: Use WAAP to prioritize/shape traffic under stress (rate limits, bot controls, feature flags), keeping core transactions hot. Observation‑Informed Governance: Synthetics + SIEM + Game Days create the learning loop and harden policy over time. Quick-start checklist (copy/paste into your runbook) Delegate critical zones to XC DNS; set low TTLs for fast convergence. Create GSLB pools: Primary = CDN CNAME; Secondary = XC VIP/CNAME; select priority/latency steering. Add synthetics: validate status, content, and p95 latency from ≥5 vantage points; wire to pool health. Deploy CE into each origin VPC/DC; lock origins to CE ingress only. Unify WAAP/bot policies across paths (policy‑as‑code; connectors where applicable). Instrument & drill: stream GSLB/WAAP logs to SIEM and run quarterly Game Days77Views1like0CommentsF5 Container Ingress Services (CIS) deployment using Cilium CNI and static routes
F5 Container Ingress Services (CIS) supports static route configuration to enable direct routing from F5 BIG-IP to Kubernetes/OpenShift Pods as an alternative to VXLAN tunnels. Static routes are enabled in the F5 CIS CLI/Helm yaml manifest using the argument --static-routing-mode=true. In this article, we will use Cilium as the Container Network Interface (CNI) and configure static routes for an NGINX deployment For initial configuration of the BIG-IP, including AS3 installation, please see https://clouddocs.f5.com/products/extensions/f5-appsvcs-extension/latest/userguide/installation.html and https://clouddocs.f5.com/containers/latest/userguide/kubernetes/#cis-installation The first step is to install Cilium CNI using the steps below on Linux host: CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt) CLI_ARCH=amd64 if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum} sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum} cilium install --version 1.18.5 cilium status cilium status --wait root@ciliumk8s-ubuntu-server:~# cilium status --wait /¯¯\ /¯¯\__/¯¯\ Cilium: OK \__/¯¯\__/ Operator: OK /¯¯\__/¯¯\ Envoy DaemonSet: OK \__/¯¯\__/ Hubble Relay: disabled \__/ ClusterMesh: disabled DaemonSet cilium Desired: 1, Ready: 1/1, Available: 1/1 DaemonSet cilium-envoy Desired: 1, Ready: 1/1, Available: 1/1 Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1 Containers: cilium Running: 1 cilium-envoy Running: 1 cilium-operator Running: 1 clustermesh-apiserver hubble-relay Cluster Pods: 6/6 managed by Cilium Helm chart version: 1.18.3 Image versions cilium quay.io/cilium/cilium:v1.18.3@sha256:5649db451c88d928ea585514746d50d91e6210801b300c897283ea319d68de15: 1 cilium-envoy quay.io/cilium/cilium-envoy:v1.34.10-1761014632-c360e8557eb41011dfb5210f8fb53fed6c0b3222@sha256:ca76eb4e9812d114c7f43215a742c00b8bf41200992af0d21b5561d46156fd15: 1 cilium-operator quay.io/cilium/operator-generic:v1.18.3@sha256:b5a0138e1a38e4437c5215257ff4e35373619501f4877dbaf92c89ecfad81797: 1 cilium connectivity test root@ciliumk8s-ubuntu-server:~# cilium connectivity test ℹ️ Monitor aggregation detected, will skip some flow validation steps ✨ [default] Creating namespace cilium-test-1 for connectivity check... ✨ [default] Deploying echo-same-node service... ✨ [default] Deploying DNS test server configmap... ✨ [default] Deploying same-node deployment... ✨ [default] Deploying client deployment... ✨ [default] Deploying client2 deployment... ✨ [default] Deploying ccnp deployment... ⌛ [default] Waiting for deployment cilium-test-1/client to become ready... ⌛ [default] Waiting for deployment cilium-test-1/client2 to become ready... ⌛ [default] Waiting for deployment cilium-test-1/echo-same-node to become ready... ⌛ [default] Waiting for deployment cilium-test-ccnp1/client-ccnp to become ready... ⌛ [default] Waiting for deployment cilium-test-ccnp2/client-ccnp to become ready... ⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod... ⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod... ⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach default/kubernetes service... ⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach default/kubernetes service... ⌛ [default] Waiting for Service cilium-test-1/echo-same-node to become ready... ⌛ [default] Waiting for Service cilium-test-1/echo-same-node to be synchronized by Cilium pod kube-system/cilium-lxjxf ⌛ [default] Waiting for NodePort 10.69.12.2:32046 (cilium-test-1/echo-same-node) to become ready... 🔭 Enabling Hubble telescope... ⚠️ Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4245: connect: connection refused" ℹ️ Expose Relay locally with: cilium hubble enable cilium hubble port-forward& ℹ️ Cilium version: 1.18.3 🏃[cilium-test-1] Running 126 tests ... [=] [cilium-test-1] Test [no-policies] [1/126] .................... [=] [cilium-test-1] Skipping test [no-policies-from-outside] [2/126] (skipped by condition) [=] [cilium-test-1] Test [no-policies-extra] [3/126] <- snip -> For this article, we will install k3s with Cilium CNI root@ciliumk8s-ubuntu-server:~# curl -sfL https://get.k3s.io | sh -s - --flannel-backend=none --disable-kube-proxy --disable servicelb --disable-network-policy --disable traefik --cluster-init --node-ip=10.69.12.2 --cluster-cidr=10.42.0.0/16 root@ciliumk8s-ubuntu-server:~# mkdir -p $HOME/.kube root@ciliumk8s-ubuntu-server:~# sudo cp -i /etc/rancher/k3s/k3s.yaml $HOME/.kube/config root@ciliumk8s-ubuntu-server:~# sudo chown $(id -u):$(id -g) $HOME/.kube/config root@ciliumk8s-ubuntu-server:~# echo "export KUBECONFIG=$HOME/.kube/config" >> $HOME/.bashrc root@ciliumk8s-ubuntu-server:~# source $HOME/.bashrc API_SERVER_IP=10.69.12.2 API_SERVER_PORT=6443 CLUSTER_ID=1 CLUSTER_NAME=`hostname` POD_CIDR="10.42.0.0/16" root@ciliumk8s-ubuntu-server:~# cilium install --set cluster.id=${CLUSTER_ID} --set cluster.name=${CLUSTER_NAME} --set k8sServiceHost=${API_SERVER_IP} --set k8sServicePort=${API_SERVER_PORT} --set ipam.operator.clusterPoolIPv4PodCIDRList=$POD_CIDR --set kubeProxyReplacement=true --helm-set=operator.replicas=1 root@ciliumk8s-ubuntu-server:~# cilium config view | grep cluster bpf-lb-external-clusterip false cluster-id 1 cluster-name ciliumk8s-ubuntu-server cluster-pool-ipv4-cidr 10.42.0.0/16 cluster-pool-ipv4-mask-size 24 clustermesh-enable-endpoint-sync false clustermesh-enable-mcs-api false ipam cluster-pool max-connected-clusters 255 policy-default-local-cluster false root@ciliumk8s-ubuntu-server:~# cilium status --wait The F5 CIS yaml manifest for deployment using Helm Note that these arguments are required for CIS to leverage static routes static-routing-mode: true orchestration-cni: cilium-k8s We will also be installing custom resources, so this argument is also required 3. custom-resource-mode: true Values yaml manifest for Helm deployment bigip_login_secret: f5-bigip-ctlr-login bigip_secret: create: false username: password: rbac: create: true serviceAccount: # Specifies whether a service account should be created create: true # The name of the service account to use. # If not set and create is true, a name is generated using the fullname template name: k8s-bigip-ctlr # This namespace is where the Controller lives; namespace: kube-system ingressClass: create: true ingressClassName: f5 isDefaultIngressController: true args: # See https://clouddocs.f5.com/containers/latest/userguide/config-parameters.html # NOTE: helm has difficulty with values using `-`; `_` are used for naming # and are replaced with `-` during rendering. # REQUIRED Params bigip_url: X.X.X.S bigip_partition: <BIG-IP_PARTITION> # OPTIONAL PARAMS -- uncomment and provide values for those you wish to use. static-routing-mode: true orchestration-cni: cilium-k8s # verify_interval: # node-poll_interval: # log_level: DEBUG # python_basedir: ~ # VXLAN # openshift_sdn_name: # flannel_name: cilium-vxlan # KUBERNETES # default_ingress_ip: # kubeconfig: # namespaces: ["foo", "bar"] # namespace_label: # node_label_selector: pool_member_type: cluster # resolve_ingress_names: # running_in_cluster: # use_node_internal: # use_secrets: insecure: true custom-resource-mode: true log-as3-response: true as3-validation: true # gtm-bigip-password # gtm-bigip-url # gtm-bigip-username # ipam : true image: # Use the tag to target a specific version of the Controller user: f5networks repo: k8s-bigip-ctlr pullPolicy: Always version: latest # affinity: # nodeAffinity: # requiredDuringSchedulingIgnoredDuringExecution: # nodeSelectorTerms: # - matchExpressions: # - key: kubernetes.io/arch # operator: Exists # securityContext: # runAsUser: 1000 # runAsGroup: 3000 # fsGroup: 2000 # If you want to specify resources, uncomment the following # limits_cpu: 100m # limits_memory: 512Mi # requests_cpu: 100m # requests_memory: 512Mi # Set podSecurityContext for Pod Security Admission and Pod Security Standards # podSecurityContext: # runAsUser: 1000 # runAsGroup: 1000 # privileged: true Installation steps for deploying F5 CIS using helm can be found in this link https://clouddocs.f5.com/containers/latest/userguide/kubernetes/ Once F5 CIS is validated to be up and running, we can now deploy the following application example root@ciliumk8s-ubuntu-server:~# cat application.yaml apiVersion: cis.f5.com/v1 kind: VirtualServer metadata: labels: f5cr: "true" name: goblin-virtual-server namespace: nsgoblin spec: host: goblin.com pools: - path: /green service: svc-nodeport servicePort: 80 - path: /harry service: svc-nodeport servicePort: 80 virtualServerAddress: X.X.X.X --- apiVersion: apps/v1 kind: Deployment metadata: name: goblin-backend namespace: nsgoblin spec: replicas: 2 selector: matchLabels: app: goblin-backend template: metadata: labels: app: goblin-backend spec: containers: - name: goblin-backend image: nginx:latest ports: - containerPort: 80 --- apiVersion: v1 kind: Service metadata: name: svc-nodeport namespace: nsgoblin spec: selector: app: goblin-backend ports: - port: 80 targetPort: 80 type: ClusterIP k apply -f application.yaml We can now verify the k8s pods are created. Then we will create a sample html page to test access to the backend NGINX pod root@ciliumk8s-ubuntu-server:~# k -n nsgoblin get po -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES goblin-backend-7485b6dcdf-d5t48 1/1 Running 0 6d2h 10.42.0.70 ciliumk8s-ubuntu-server <none> <none> goblin-backend-7485b6dcdf-pt7hx 1/1 Running 0 6d2h 10.42.0.97 ciliumk8s-ubuntu-server <none> <none> root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it po/goblin-backend-7485b6dcdf-pt7hx -- /bin/sh # cat > green <<'EOF' <!DOCTYPE html> > > <html> > <head> <title>Green Goblin</title> <style> body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; } h1 { font-size: 3em; } > > > > > </style> </head> <body> <h1>I am the green goblin!</h1> <p>Access me at /green</p> </body> </html> > > > > > > > EOF root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it goblin-backend-7485b6dcdf-d5t48 -- /bin/sh # cat > green <<'EOF' > <!DOCTYPE html> <html> <head> <title>Green Goblin</title> <style> body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; } h1 { font-size: 3em; } </style> > </head> <body> <h1>I am the green goblin!</h1> <p>Access me at /green</p> </body> </html> EOF> > > > > > > > > > > > > We can now validate the pools are created on the F5 BIG-IP root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm pool all ltm pool svc_nodeport_80_nsgoblin_goblin_com_green { description "crd_10_69_12_40_80 loadbalances this pool" members { /kubernetes/10.42.0.70:http { address 10.42.0.70 } /kubernetes/10.42.0.97:http { address 10.42.0.97 } } min-active-members 1 partition kubernetes } ltm pool svc_nodeport_80_nsgoblin_goblin_com_harry { description "crd_10_69_12_40_80 loadbalances this pool" members { /kubernetes/10.42.0.70:http { address 10.42.0.70 } /kubernetes/10.42.0.97:http { address 10.42.0.97 } } min-active-members 1 partition kubernetes } root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm virtual crd_10_69_12_40_80 ltm virtual crd_10_69_12_40_80 { creation-time 2025-12-22:10:10:37 description Shared destination /kubernetes/10.69.12.40:http ip-protocol tcp last-modified-time 2025-12-22:10:10:37 mask 255.255.255.255 partition kubernetes persist { /Common/cookie { default yes } } policies { crd_10_69_12_40_80_goblin_com_policy { } } profiles { /Common/f5-tcp-progressive { } /Common/http { } } serverssl-use-sni disabled source 0.0.0.0/0 source-address-translation { type automap } translate-address enabled translate-port enabled vs-index 2 } CIS log output 2025/12/22 18:10:25 [INFO] [Request: 1] cluster local requested CREATE in VIRTUALSERVER nsgoblin/goblin-virtual-server 2025/12/22 18:10:25 [INFO] [Request: 1][AS3] creating a new AS3 manifest 2025/12/22 18:10:25 [INFO] [Request: 1][AS3][BigIP] posting request to https://10.69.12.1 for tenants 2025/12/22 18:10:26 [INFO] [Request: 2] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport 2025/12/22 18:10:26 [INFO] [Request: 3] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport 2025/12/22 18:10:43 [INFO] [Request: 1][AS3][BigIP] post resulted in SUCCESS 2025/12/22 18:10:43 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success 2025/12/22 18:10:43 [INFO] [Request: 3][AS3] Processing request 2025/12/22 18:10:43 [INFO] [Request: 3][AS3] creating a new AS3 manifest 2025/12/22 18:10:43 [INFO] [Request: 3][AS3][BigIP] posting request to https://10.69.12.1 for tenants 2025/12/22 18:10:43 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster W1222 18:10:49.238444 1 warnings.go:70] v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice 2025/12/22 18:10:52 [INFO] [Request: 3][AS3][BigIP] post resulted in SUCCESS 2025/12/22 18:10:52 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success 2025/12/22 18:10:52 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster Troubleshooting: 1. If static routes are not added, the first step is to inspect CIS logs for entries similar to these: Cilium annotation warning logs 2025/12/22 17:44:45 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2025/12/22 17:46:41 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2025/12/22 17:46:42 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2025/12/22 17:46:43 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2. These are resolved by adding annotations to the node using the reference: https://clouddocs.f5.com/containers/latest/userguide/static-route-support.html Cilium annotation for node root@ciliumk8s-ubuntu-server:~# k annotate node ciliumk8s-ubuntu-server io.cilium.network.ipv4-pod-cidr=10.42.0.0/16 root@ciliumk8s-ubuntu-server:~# k describe node | grep -E "Annotations:|PodCIDR:|^\s+.*pod-cidr" Annotations: alpha.kubernetes.io/provided-node-ip: 10.69.12.2 io.cilium.network.ipv4-pod-cidr: 10.42.0.0/16 PodCIDR: 10.42.0.0/24 3. Verify a static route has been created and test connectivity to k8s pods root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes)(tmos)# list net route net route k8s-ciliumk8s-ubuntu-server-10.69.12.2 { description 10.69.12.1 gw 10.69.12.2 network 10.42.0.0/16 partition kubernetes } Using pup (command line HTML parser) -> https://commandmasters.com/commands/pup-common/ root@ciliumk8s-ubuntu-server:~# curl -s http://goblin.com/green | pup 'body text{}' I am the green goblin! Access me at /green 1 0.000000 10.69.12.34 ? 10.69.12.40 TCP 78 34294 ? 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=2984295232 TSecr=0 WS=128 2 0.000045 10.69.12.40 ? 10.69.12.34 TCP 78 80 ? 34294 [SYN, ACK] Seq=0 Ack=1 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316303 TSecr=2984295232 3 0.001134 10.69.12.34 ? 10.69.12.40 TCP 70 34294 ? 80 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=2984295234 TSecr=1809316303 4 0.001151 10.69.12.34 ? 10.69.12.40 HTTP 149 GET /green HTTP/1.1 5 0.001343 10.69.12.40 ? 10.69.12.34 TCP 70 80 ? 34294 [ACK] Seq=1 Ack=80 Win=23040 Len=0 TSval=1809316304 TSecr=2984295234 6 0.002497 10.69.12.1 ? 10.42.0.97 TCP 78 33707 ? 80 [SYN] Seq=0 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316304 TSecr=0 7 0.003614 10.42.0.97 ? 10.69.12.1 TCP 78 80 ? 33707 [SYN, ACK] Seq=0 Ack=1 Win=64308 Len=0 MSS=1410 SACK_PERM TSval=1012609408 TSecr=1809316304 WS=128 8 0.003636 10.69.12.1 ? 10.42.0.97 TCP 70 33707 ? 80 [ACK] Seq=1 Ack=1 Win=23040 Len=0 TSval=1809316307 TSecr=1012609408 9 0.003680 10.69.12.1 ? 10.42.0.97 HTTP 149 GET /green HTTP/1.1 10 0.004774 10.42.0.97 ? 10.69.12.1 TCP 70 80 ? 33707 [ACK] Seq=1 Ack=80 Win=64256 Len=0 TSval=1012609409 TSecr=1809316307 11 0.004790 10.42.0.97 ? 10.69.12.1 TCP 323 HTTP/1.1 200 OK [TCP segment of a reassembled PDU] 12 0.004796 10.42.0.97 ? 10.69.12.1 HTTP 384 HTTP/1.1 200 OK 13 0.004820 10.69.12.40 ? 10.69.12.34 TCP 448 HTTP/1.1 200 OK [TCP segment of a reassembled PDU] 14 0.004838 10.69.12.1 ? 10.42.0.97 TCP 70 33707 ? 80 [ACK] Seq=80 Ack=254 Win=23552 Len=0 TSval=1809316308 TSecr=1012609410 15 0.004854 10.69.12.40 ? 10.69.12.34 HTTP 384 HTTP/1.1 200 OK Summary: There we have it, we have successfully deployed an NGINX application on a Kubernetes cluster managed by F5 CIS using static routes to forward traffic to the kubernetes pods408Views3likes2CommentsAWS Transit Gateway Connect: GRE + BGP = ?
GRE and BGP are technologies that are... mature. In this article we'll take a look at how you can use AWS Transit Gateway Connect to do some unique networking and application delivery in the cloud. In December 2020 AWS released a new feature of Transit Gateway (TGW) that enables a device to peer with TGW via a GRE/BGP tunnel. The intent was to be used with SD-WAN devices, but we can also use it for things like load balancing many internal private addresses, NAT gateway, etc... In this article we'll look at my experience of setting up TGW Connect in a lab environment based on F5's documentation for setting up GRE and BGP. Challenges with TGW For folks that are not familiar with TGW it is an AWS service that allows you to stitch together multiple physical and virtual networks via AWS internal networking (VPC peering) or via network protocols (VPN, Direct Connect (private L2)). Using TGW you can steer traffic to a specific network device by creating a route table entry within a VPC that points to the device's ENI (network interface). This is useful for a case where you want to send all traffic for a specific CIDR (192.0.2.0/24) to traverse that device. In the scenario where a device is responsible for a CIDR it is also responsible for updating the route table for HA. This could be done via Lambda function, our Cloud Failover Extension, manual updates, etc.... The other downside is that this limits you to a single device per Availability Zone to receive traffic for that CIDR. TGW Connect provides a mechanism for the device to use a GRE tunnel/BGP to establish a connection to TGW and use dynamic routing protocols (BGP) to advertise the health of the device. This allows you to establish up to 4 devices to peer with TGW with up to 5 Gbps of traffic per connection (for comparison you can burst up to 50 Gbps with a VPC connection). Topology of TGW Connect When using TGW Connect it re-uses existing TGW connections. In practice this means that you are likely using an existing Direct Connect or VPC connection (I guess you could also use a VPN connection, but that would be weird). See also: https://aws.amazon.com/blogs/networking-and-content-delivery/simplify-sd-wan-connectivity-with-aws-transit-gateway-connect/ Configuring a BIG-IP for TGW Connect To use a BIG-IP with TGW Connect you will need a device that is licensed for BGP (also called Advanced Routing, part of Better/Best). Follow the steps for setting up TGW Connect and be sure to specify a different peer ASN than your TGW (you will need to use eBGP). The "Peer Address" will be the self-ip of the BIG-IP on the AWS VPC (when using a VPC). Configuring target VPC When setting up TGW Connect you will peer with an existing VPC. In the subnet that you want to use with TGW Connect (Peer Address of GRE tunnel) you will need to have a route that points to the TGW peer address; for example if you specify a CIDR of 10.254.254.0/24 for TGW and the peer address is 10.254.254.11 you will need to create a route that includes 10.254.254.0/24 on the subnet for BIG-IP peer address. Also make sure to open up Security Groups to allow GRE traffic to traverse to/from the interface that will be used for the GRE tunnel. The rule should allow the IP of the peer address (i.e. 10.254.254.11). Route to TGW from 10.1.7.0/24 GRE Tunnel On the BIG-IP under Network / Tunnels you will need to create a GRE Tunnel. You can use the default "gre" tunnel profile. Specify the same "Peer Address" that you used when setting up TGW Connect. You will also want to specify the Remote Address that is the TGW address. BGP Peer Next you will need to configure the BIG-IP to act as a BGP peer to TGW connecting over the GRE tunnel. TGW requires that you use an IP in the 169.254.0.0 range. This will require modify a db variable to allow that address to be used as a self-ip. The tmsh command to use is. modify sys db config.allow.rfc3927 { value "enable" } You can then create your BGP peer address to match the value that you used in TGW Connect. The BGP peer address will need to be configured to allow BGP updates (port 179). Since the traffic is occurring over the GRE tunnel there is no need to update AWS Security Groups (invisible to the ENI). Setting up BGP Peering TGW Connect requires eBGP to be used. The following is an example of a working config. This also assumes you go through the pre-req of setting up BGP/RHI. Be careful to only advertise the routes that you want, when you use "redistribute kernel" it will also advertise 0.0.0.0/0! Please also see: https://support.f5.com/csp/article/K15923612 ! no service password-encryption ! router bgp 65520 bgp graceful-restart restart-time 120 aggregate-address 10.0.0.0/16 summary-only aggregate-address 10.1.0.0/16 summary-only aggregate-address 10.2.0.0/16 summary-only aggregate-address 10.3.0.0/16 summary-only redistribute kernel neighbor 169.254.10.2 remote-as 64512 neighbor 169.254.10.2 ebgp-multihop 2 neighbor 169.254.10.2 soft-reconfiguration inbound neighbor 169.254.10.2 prefix-list tenlist out neighbor 169.254.10.3 remote-as 64512 neighbor 169.254.10.3 ebgp-multihop 2 neighbor 169.254.10.3 soft-reconfiguration inbound neighbor 169.254.10.3 prefix-list tenlist out ! ip prefix-list tenlist seq 5 deny 10.254.254.0/24 ip prefix-list tenlist seq 10 permit 10.0.0.0/8 ge 16 ! line con 0 login line vty 0 39 login ! end This example was created with help from a BGP expert Brandon Frelich. In the example above we only limiting routes to 3 CIDRs and configuring ECMP. At this point the BIG-IP could allocate VIPs on the CIDR, act as an AFM firewall, and if we used 0.0.0.0/0 it could act as an outbound gateway. Verifying the Setup You should be able to see your BGP connection go green in the AWS console and also see the status by running "show ip bgp neighbors" from imish. AWS Console ip-10-1-1-112.ec2.internal[0]>show ip bgp summary BGP router identifier 169.254.10.1, local AS number 65520 BGP table version is 4 2 BGP AS-PATH entries 0 BGP community entries Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd 169.254.10.2 4 64512 293 293 4 0 0 00:47:08 3 169.254.10.3 4 64512 293 292 4 0 0 00:47:08 3 Total number of neighbors 2 Output from imish (tmos)# list /ltm virtual-address one-line ltm virtual-address 10.0.0.0 { address 10.0.0.0 arp disabled floating disabled icmp-echo disabled mask 255.255.0.0 route-advertisement selective traffic-group none unit 0 } ltm virtual-address 10.1.0.0 { address 10.1.0.0 arp disabled floating disabled icmp-echo disabled mask 255.255.0.0 route-advertisement selective traffic-group none unit 0 } ltm virtual-address 10.2.0.0 { address 10.2.0.0 arp disabled floating disabled icmp-echo disabled mask 255.255.0.0 route-advertisement selective traffic-group none unit 0 } (tmos)# list /net route tgw net route tgw { interface /Common/tgw-connect network 10.0.0.0/8 } Output from TMSH. Note route-advertisement is enabled on the virtual-addresses. We are using a static route to steer traffic to the GRE tunnel. You should also see any routes advertised as well. ECMP Considerations When you deploy multiple BIG-IP devices TGW can use ECMP to spray traffic across multiple devices (by enabling traffic group None or multiple standalone devices). Be aware that if you need to statefully inspect traffic you may want to enable SNAT to have the return traffic go to the same device or use traffic-group-1 to run in Active/Standby via Route Health Injection. Otherwise you will need to follow the guidance on setting up a forwarding virtual server to ignore the system connection table. Routing Connections One of the issues that a customer discovered when exploring this solution is that the BIG-IP will initially send health checks across the GRE tunnel using an IP from the 169.254.x.x address range, this follows the address selection criteria that a BIG-IP uses. One method of dealing with this is to assign an IP address in a range that you would like to advertise across the tunnel like 198.168.254.0/24. Creating a self-ip of 198.168.254.253 that you assign to the tunnel. To send traffic for a different range (i.e. 10.0.0.0/16) you can then create a static route on the BIG-IP that points to 198.168.254.1. Since the BIG-IP sees the address is on the tunnel it will correctly forward the traffic through the tunnel. Another question that arose was whether it was possible to have asymmetric traffic flows of utilizing both the GRE TGW Connect tunnel and the TGW VPC Connection of the VPC itself. I discovered that YES this is possible by following the guidance on enabling asymmetrically routed traffic. It hurts your brain a bit, but here's the results of a flow that is using both a Connect and VPC connection. You can do some crazy things, but with great power... Request traffic over VPC Connection Response traffic over TGW Connect (GRE) Other Options You could also achieve a similar result by using default VPC peering and making use of Cloud Failover Extension for updating the route table. The benefit of that approach is that you don't have to deal with GRE/BGP! It does limit you to a single device per-AZ vs. being able to get up to 4 devices running across a connection.7.4KViews2likes1CommentPractical Mapping Guide - F5 BIG-IP TMOS Modules to Feature-Scoped CNFs
Introduction BIG-IP TMOS and BIG-IP CNFs solve similar problems with very different deployment and configuration models. In TMOS you deploy whole modules (LTM, AFM, DNS, etc.), while in CNFs you deploy only the specific features and data plane functions you need as cloud native components. Modules vs feature scoped services In TMOS, enabling LTM gives you a broad set of capabilities in one module: virtual servers, pools, profiles, iRules, SNAT, persistence, etc., all living in the same configuration namespace. In CNFs, you deploy only the features you actually need, expressed as discrete custom resources: for example, a routing CNF for more precise TMM routing, a firewall CNF for policies, or specific CRs for profiles and NAT, rather than bringing the entire LTM or AFM module along “just in case”. Configuration objects vs custom resources TMOS configuration is organized as objects under modules (for example: LTM virtual, LTM pool, security, firewall policy, net vlan), managed via tmsh, GUI, or iControl REST. CNFs expose those same capabilities through Kubernetes Custom Resources (CRs): routes, policies, profiles, NAT, VLANs, and so on are expressed as YAML and applied with GitOps-style workflows, making individual features independently deployable with a version history. Coarse-grained vs precise feature deployment On TMOS, deploying a use case often means standing up a full BIG-IP instance with multiple modules enabled, even if the application only needs basic load balancing with a single HTTP profile and a couple of firewall rules. With CNFs, you can carve out exactly what you need: for example, only a precise TMM routing function plus the specific TCP/HTTP/SSL profiles and security policies required for a given application or edge segment, reducing blast radius and resource footprint. BIG-IP Next CNF Custom Resources (CRs) extend Kubernetes APIs to configure the Traffic Management Microkernel (TMM) and Advanced Firewall Manager (AFM), providing declarative equivalents to BIG-IP TMOS configuration objects. This mapping ensures functional parity for L4-L7 services, security, and networking while enabling cloud-native scalability. Focus here covers core mappings with examples for iRules and Profiles. What are Kubernetes Custom Resources Think of Kubernetes API as a menu of objects you can create: Pods (containers), Services (networking), Deployments (replicas). CRs add new menu items for your unique needs. You first define a CustomResourceDefinition (CRD) (the blueprint), then create CR instances (actual objects using that blueprint). How CRs Work (Step-by-Step) Create CRD - Define new object type (e.g., "F5BigFwPolicy") Apply CRD - Kubernetes API server registers it, adding new REST endpoints like /apis/f5net.com/v1/f5bigfwpolicies Create CR - Write YAML instances of your new type and apply with kubectl Controllers Watch - Custom operators/controllers react to CR changes, creating real resources (Pods, ConfigMaps, etc.) CR examples Networking and NAT Mappings Networking CRs handle interfaces and routing, mirroring, TMOS VLANs, Self-IPs and NAT mapping. CNF CR TMOS Object(s) Purpose F5BigNetVlan VLANs, Self-IPs, MTU Interface config F5BigCneSnatpool SNAT Automap Source NAT s Example: F5BigNetVlan CR sets VLAN tags and IPs, applied via kubectl apply, propagating to TMM like tmsh create net vlan. Security and Protection Mappings Protection CRs map to AFM and DoS modules for threat mitigation. CNF CR CR Purpose TMOS Object(s) Purpose F5BigFwPolicy applies industry-standard firewall rules to TMM AFM Policies Stateful ACL filtering F5BigFwRulelist Consists of an array of ACL rules AFM Rule list Stateful ACL filtering F5BigSvcPolicy Allows creation of Timer Policies and attaching them to the Firewall Rules AFM Policies Stateful ACL filtering F5BigIpsPolicy Allows you to filter inspected traffic (matched events) by various properties such as, the inspection profile’s host (virtual server or firewall policy), traffic properties, inspection action, or inspection service. IPS AFM policies Enforce IPS F5BigDdosGlobal Configures the TMM Proxy Pod to protect applications and the TMM Pod from Denial of Service / Distributed Denial of Service (Dos/DDoS) attacks. Global DoS/DDoS Device DoS protection F5BigDdosProfile The Percontext DDoS CRD configures the TMM Proxy Pod to protect applications from Denial of Service / Distributed Denial of Service (Dos/DDoS) attacks. DoS/DDoS per profile Per Context DoS profile protection F5BigIpiPolicy Each policy contains a list of categories and actions that can be customized, with IPRep database being common. Feedlist based policies can also customize the IP addresses configured. IP Intelligence Policies Reputation-based blocking Example: F5BigFwPolicy references rule-lists and zones, equivalent to tmsh creating security firewall policy. Traffic Management Mappings Traffic CRs proxy TCP/UDP and support ALGs, akin to LTM Virtual Servers and Pools. CNF CR TMOS Object(s) Purpose Part of the Apps CRs Virtual Server (Ingress/HTTPRoute/GRPCRoute) LTM Virtual Servers Load balancing Part of the Apps CRs Pool (Endpoints) LTM Pools Node groups F5BigAlgFtp FTP Profile/ALG FTP gateway F5BigDnsCache DNS Cache Resolution/caching Example: GRPCRoute CR defines backend endpoints and profiles, mapping to tmsh create ltm virtual. Profiles Mappings with Examples Profiles CRs customize traffic handling, referenced by Traffic Management CRs, directly paralleling TMOS profiles. CNF CR TMOS Profile(s) Example Configuration Snippet F5BigTcpSetting TCP Profile spec: { defaultsFrom: "tcp-lan-optimized" } tunes congestion like tmsh create LTM profile TCP optimized F5BigUdpSetting UDP Profile spec: { idleTimeout: 300 } sets timeouts F5BigUdpSetting HTTP Profile spec: { http2Profile: "http2" } enables HTTP/2 F5BigClientSslSetting ClientSSL Profile spec: { certKeyChain: { name: "default" } } for TLS termination F5PersistenceProfile Persistence Profile Enables source/dest persistence Profiles attach to Virtual Servers/Ingresses declaratively, e.g., spec.profiles: [{ name: "my-tcp", context: "tcp" }]. iRules Support and Examples CNF fully supports iRules for custom logic, integrated into Traffic Management CRs like Virtual Servers or FTP/RTSP. iRules execute in TMM, preserving TMOS scripting. Example CR Snippet: apiVersion: k8s.f5net.com/v1 kind: F5BigCneIrule metadata: name: cnfs-dns-irule namespace: cnf-gateway spec: iRule: > when DNS_REQUEST { if { [IP::addr [IP::remote_addr] equals 10.10.1.0/24] } { cname cname.siterequest.com } else { host 10.20.20.20 } } This mapping facilitates TMOS-to-CNF migrations using tools like F5Big-Ip Controller for CR generation from UCS configs. For full CR specs, refer to official docs. The updated full list of CNF CRs per release can be found over here, https://clouddocs.f5.com/cnfs/robin/latest/cnf-custom-resources.html Conclusion In this article, we examined how BIG-IP Next CNF redefines the TMOS module model into a feature-scoped, cloud-native service architecture. By mapping TMOS objects such as virtual servers, profiles, and security policies to their corresponding CNF Custom Resources, we illustrated how familiar traffic management and security constructs translate into declarative Kubernetes workflows. Related content F5 Cloud-Native Network Functions configurations guide BIG-IP Next Cloud-Native Network Functions (CNFs) CNF DNS Express BIG-IP Next for Kubernetes CNFs - DNS walkthrough BIG-IP Next for Kubernetes CNFs deployment walkthrough | DevCentral BIG-IP Next Edge Firewall CNF for Edge workloads | DevCentral Modern Applications-Demystifying Ingress solutions flavors | DevCentral87Views1like0Comments
