cloud
1790 TopicsVMware VKS integration with F5 BIG-IP and CIS
Introduction vSphere Kubernetes Service (VKS) is the Kubernetes runtime built directly into VMware Cloud Foundation (VCF). With CNCF certified Kubernetes, VKS enables platform engineers to deploy and manage Kubernetes clusters while leveraging a comprehensive set of cloud services in VCF. Cloud admins benefit from the support for N-2 Kubernetes versions, enterprise-grade security, and simplified lifecycle management for modern apps adoption. Alike with other Kubernetes platforms, the integration with BIG-IP is done through the use of the Container Ingress Services (CIS) component, which is hosted in the Kubernetes platform and allows to configure the BIG-IP using the Kubernetes API. Under the hood, it uses the F5 AS3 declarative API. Note from the picture that BIG-IP integration with VKS is not limited to BIG-IP´s load balancing capabilities and that most BIG-IP features can be configured using this integration. These features include: Advanced TLS encryption, including safe key storage with Hardware Security Module (HSM) or Network & Cloud HSM support. Advanced WAF, L7 bot and API protection. L3-L4 High-performance firewall with IPS for protocol conformance. Behavioral DDoS protection with cloud scrubbing support. Visibility into TLS traffic for inspection with 3 rd party solutions. Identity-aware ingress with Federated SSO and integration with leading MFAs. AI inference and agentic support thanks to JSON and MCP protocol support. Planning the deployment of CIS for VMware VKS The installation of CIS in VMware VKS is performed through the standard Helm charts facility. The platform owner needs to determine beforehand: Whether the deployment is hosted on a vSphere (VDS) network or an NSX network. It has to be taken into account that on an NSX network, VKS doesn´t currently allow to place the load balancers in the same segment as the VKS cluster. No special considerations have to be taken when hosting BIG-IP in a vSphere (VDS) network. Whether this is a single-cluster or a multi-cluster deployment. When using the multi-cluster option and clusterIP mode (only possible with Calico in VKS), it has to be taken into account that the POD networks of the clusters cannot have overlapping prefixes. What Kubernetes networking (CNI) is desired to be used. CIS supports both VKS supported CNIs: Antrea (default) and Calico. From the CIS point of view, the CNI is only relevant when sending traffic directly to the PODs. See next. What integration with the CNI is desired between the BIG-IP and VKS NodePort mode This is done by making applications discoverable using Services of type NodePort. From the BIG-IP, the traffic is sent to the Node´s IPs where it is redistributed to the POD depending on the TrafficPolicies of the Service. This is CNI agnostic. Any CNI can be used. Direct-to-POD mode This is done by making applications discoverable using the Services of type ClusterIP. Note that the CIS integration with Antrea uses Antrea´s nodePortLocal mechanism, which requires an additional annotation in the Service declaration. See the CIS VKS page in F5 CloudDocs for details. This Antrea nodePortLocal mechanism allows to send the traffic directly to the POD without actually using the POD IP address. This is especially relevant for NSX because it allows to access the PODs without actually re-distributing the PODs IPs across the NSX network, which is not allowed. When using vSphere (VDS) networking, either Antrea’s nodePortLocal or clusterIP with Calico can be used. Another way (but not frequent) is the use of hostNetwork POD networking because it requires privileges for the application PODs or ingress controllers. Network-wise, this would have a similar behavior to nodePortLocal, but without the automatic allocation of ports. Whether the deployment is a single-tier or a two-tier deployment. A single-tier deployment is a deployment where the BIG-IP sends the traffic directly to the application PODs. This has a simpler traffic flow and easier persistence and end-to-end monitoring. A two-tier deployment sends the traffic to an ingress controller POD instead of the application PODs. This ingress controller could be Contour, NGINX Gateway Fabric, Istio or an API gateway. This type of deployment offers the ultimate scalability and provides additional segregation between the BIG-IPs (typically owned by NetOps) and the Kubernetes cluster (typically owned by DevOps). Once CIS is deployed, applications can be published either using the Kubernetes standard Ingress resource or F5’s Custom Resources. This latter is the recommended way because it allows to expose most of the BIG-IPs capabilities. Details on the Ingress resource and F5 custom annotations can be found here. Details on the F5 CRDs can be found here. Please note that at time of this writing Antrea nodePortLocal doesn´t support the TransportServer CRD. Please consult your F5 representative for its availability. Detailed instructions on how to deploy CIS for VKS can be found on this CIS VKS page in F5 CloudDocs. Application-aware MultiCluster support MultiCluster allows to expose applications that are hosted in multiple VKS clusters and publish them in a single VIP. BIG-IP & CIS are in charge of: Discover where the PODs of the applications are hosted. Note that a given application doesn´t need to be available in all clusters. Upon receiving the request for a given application, decide to which cluster and Node/Pod the request has to be sent. This decision is based on the weight of each cluster, the application availability and the load balancing algorithm being applied. Single-tier or Two-tier architectures are possible. NodePort and ClusterIP modes are possible as well. Note that at the time of this writing, Antrea in ClusterIP mode (nodePortLocal) is not supported currently. Please consult your F5 representative for availability of this feature. Considerations for NSX Load Balancers cannot be placed in the same VPC segment where the VMware VKS cluster is. These can be placed in a separate VPC segment of the same VPC gateway as shown in the next diagram. In this arrangement the BIG-IP can be configured as either 1NIC mode or as a regular deployment, in which case the MGMT interface is typically configured through an infrastructure VLAN instead of an NSX segment. The data segment is only required to have enough prefixes to host the self-IPs of the BIG-IP units. The prefixes of the VIPs might not belong to the Data Segment´s subnet. These additional prefixes have to be configured as static routes in the VPC Gateway and Route Redistribution for these must be enabled. Given that the Load Balancers are not in line with the traffic flow towards the VKS Cluster, it is required to use SNAT. When using SNAT pools, the prefixes of these can optionally be configured as additional prefixes of the Data Segment, like the VIPs. Specifically for Calico, clusterIP mode cannot be used in NSX because this would require the BIG-IP to be in the same VPC segment as VMware VKS. Note also that BGP multi-hop is not feasible either because it would require the POD cluster network prefixes to be redistributed through NSX, which is not possible either. Conclusion and final remarks F5 BIG-IPs provides unmatched deployment options and features for VMware VKS; these include: Support for all VKS CNIs, which allows sending the traffic directly instead of using hostNetwork (which implies a security risk) or using the common NodePort, which can incur an additional kube-proxy indirection. Both 1-tier or 2-tier arrangements (or both types simultaneously) are possible. F5´s Container Ingress Services provides the ability to handle multiple VMware VKS clusters with application-aware VIPs. This is a unique feature in the industry. Securing applications with the wide range of L3 to L7 security features provided by BIG-IP, including Advanced WAF and Application Access. To complete the circle, this integration also provides IP address management (IPAM) which provides great flexibility to DevOps teams. All these are available regardless of the form factor of the BIG-IP: Virtual Edition, appliance or chassis, allowing great scalability and multi-tenancy options. In NSX deployments, the recommended form-factor is Virtual Edition in order to connect to the NSX segments. We look forward to hearing your experience and feedback on this article.21Views1like0CommentsHigh Availability for F5 NGINX Instance Manager in AWS
Introduction F5 NGINX Instance Manager gives you a centralized way to manage NGINX Open Source and NGINX Plus instances across your environment. It’s ideal for disconnected or air-gapped deployments, with no need for internet access or external cloud services. The NGINX Instance Manager features keep changing. They now include many features for managing configurations, like NGINX config versioning and templating, F5 WAF for NGINX policy and signature management, monitoring of NGINX metrics and security events, and a rich API to help external automation. As the role of NGINX Instance Manager becomes increasingly important in the management of disconnected NGINX fleets, the need for high availability increases. This article explores how we can use Linux clustering to provide high availability for NGINX Instance Manager across two availability zones in AWS. Core Technologies Core technologies used in this HA architecture design include: Amazon Elastic Compute instances (EC2) - virtual machines rented inside AWS that can be used to host applications, like NGINX Instance Manager. Pacemaker - an open-source high availability resource manager software used in Linux clusters since 2004. Pacemaker is generally deployed with the Corosync Cluster Engine, which provides the cluster node communication, membership tracking and cluster quorum. Amazon Elastic File System (EFS) - a serverless, fully managed, elastic Network File System (NFS) that allows servers to share file data simultaneously between systems. Amazon Network Load Balancer (NLB) - a layer 4 TCP/UDP load balancer that forwards traffic to targets like EC2 instances, containers or IP addresses. NLB can send periodic health checks to registered targets to ensure that traffic is only forwarded to healthy targets. Architecture Overview In this highly available architecture, we will install NGINX Instance Manager (NIM) on two EC2 instances in different AWS Availability Zones (AZ). Four EFS file systems will be created to share key stateful information between the two NIM instances, and Pacemaker/Corosync will be used to orchestrate the cluster - only one NIM instance is active at any time and Pacemaker will facilitate this by starting/stopping the NIM systemd services. Finally, an Amazon NLB will be used to provide network failover between the two NIM instances, using an HTTP health check to determine the active cluster node. Deployment Steps 1. Create AWS EFS file systems First, we are going to create four EFS volumes to hold important NIM configuration and state information that will be shared between nodes. These file systems will be mounted onto: /etc/nms, /var/lib/clickhouse, /var/lib/nms and /usr/share/nms inside the NIM node. Take note of the File System IDs of the newly created file systems. Edit the properties of each EFS file system and create a mount target in each AZ you intend to deploy a NIM node in, then restrict network access to only the NIM nodes by setting up an AWS Security Group. You may also consider more advanced authentication methods, but these aren't covered in this article. 2. Deploy two EC2 instances for NGINX Instance Manager Deploy two EC2 instances with suitable specifications to support the number of data plane instances that you plan to manage (you can find the sizing specifications here) and connect one to each of the AZ/subnet that you configured EFS mount targets in above. In this example, I will deploy two t2.medium instances running Ubuntu 24.04, connect one to us-east-1a and the other to us-east-1c, and create a security group allowing only traffic from its local assigned subnet. 3. Mount the EFS file systems on NGINX Instance Manager Node 1 Now we have the EC2 instances deployed, we can log on to Node 1 and mount the EFS volumes onto this node by executing the following steps: 1. SSH onto Node 1 2. Install efs-utils package if is not installed already 3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory 4. Execute mount -a to mount the file systems 5. Execute df to ensure that the paths are mounted correctly 4. Install NGINX Instance Manager on Node 1 With the EFS file systems now mounted, it's time to run through the NGINX Instance Manager installation on Node 1. 1. Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh 2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/ 3. Run bash install-nim-bundle.sh -d ubuntu22.04 4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node: systemctl stop nms; systemctl disable nms systemctl stop nginx; systemctl disable nginx systemctl stop clickhouse-server; systemctl disable clickhouse-server 5. Install NGINX Instance Manager on Node 2 This time we are going to install NGINX Instance Manager on Node two but without attaching the EFS file systems. On Node 2: 1. Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh 2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/ 3. Run bash install-nim-bundle.sh -d ubuntu22.04 4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node: systemctl stop nms; systemctl disable nms systemctl stop nginx; systemctl disable nginx systemctl stop clickhouse-server; systemctl disable clickhouse-server 6. Mount EFS file systems on NGINX Instance Manager Node 2 Now we have the NGINX Instance Manager binaries installed on each node, let's mount the EFS file systems on Node 2: 1. SSH onto Node 2 2. Install efs-utils package if is not installed already 3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory 4. Execute mount -a to mount the file systems 5. Execute df to ensure that the paths are mounted correctly 7. Install and configure Pacemaker/Corosync With NGINX Instance Manager now installed on both nodes, it's now time to get Pacemaker and Corosync installed: 1. Install Pacemaker, Corosync and other important agents sudo apt update sudo apt install pacemaker pcs corosync fence-agents-aws resource-agents-base 2. To allow Pacemaker to communicate between nodes, we need to add TCP communication between nodes to the Security Group for the NIM nodes. 3. Once we have the connectivity in place, we have to set a common password for the hacluster user on both nodes - we can do this by running the following command on both nodes: sudo passwd hacluster password: IloveF5 (don't use this!) 4. Now we start the Pacemaker services by running the following commands on both nodes: systemctl start pcsd.service systemctl enable pcsd.service systemctl status pcsd.service systemctl start pacemaker systemctl enable pacemaker 5. And finally, we authenticate the nodes with each other (using hacluster username, password and node hostname) and check the cluster status: pcs host auth ip-172-17-1-89 ip-172-17-2-160 pcs cluster setup nimcluster --force ip-172-17-1-89 pcs status 8. Configure Cluster Fencing Fencing is the ability to make a node unable to run resources, even when that node is unresponsive to cluster commands - you can think of fencing as cutting the power to the node. Fencing protects against corruption of data due to concurrent access to shared resources, commonly known as "split brain" scenario. In this architecture, we use the fence_aws agent, which uses boto3 library to connect to AWS and stop the EC2 instances of failing nodes. Let's install and configure the fence_aws agent: 1. Create an AWS Access Key and Secret Access key for fence_aws to use 2. Install the AWS CLI on both NIM nodes 3. Take note of the Instance IDs for the NIM instances 4. Configure the fence_aws agent as a Pacemaker STONITH device. Run the psc stonith command inserting your access key, secret key, region, and mappings of Instance ID to Linux hostname. pcs stonith create hacluster-stonith fence_aws access_key=(your access key) secret_key=(your secret key) region=us-east-1 pcmk_host_map="ip-172-31-34-95:i-0a46181368524dab6;ip-172-31-27-134:i-032d0b400b5689f68" power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4 5. Run pcs status and make sure that the stonith device is started 9. Configure Pacemaker resources, colocations and contraints Ok - we are almost there! It's time to configure the Pacemaker resources, colocations and constraints. We want to make sure that the clickhouse-server, nms and nginx systemd services all come up on the same node together, and in that order. We can do that using Pacemaker colocations and constraints. 1. Configure a pacemaker resource for each systemd service pcs resource create clickhouse systemd:clickhouse-server pcs resource create nms systemd:nms.service pcs resource create nginx systemd:nginx.service 🔥HOT TIP🔥 check out pcs resource command options (op monitor interval etc.) to optimize failover time. 2. Create two colocations to make sure they all start on the same node pcs constraint colocation add clickhouse with nms pcs constraint colocation add nms with nginx 3. Create three constraints to define the startup order: Clickhouse -> NMS -> NGINX pcs constraint order start clickhouse then nms pcs constraint order start nms then nginx 4. Enable and start the pcs cluster pcs cluster enable --all pcs cluster start --all 10. Provision AWS NLB Load Balancer Finally - we are going to set up the AWS Network Load Balancer (NLB) to facilitate the failover. Create a Security Group entry to allow HTTPs traffic to enter the EC2 instance from the local subnet 2. Create a Load Balancer target group, targeting instances, with Protocol TCP on port 443 ⚠️NOTE ⚠️ if you are using Load balancing with TCP Protocol and terminating the TLS connection on the NIM node (EC2 instance), you must create a security group entry to allow TCP 443 from the connecting clients directly to the EC2 instance IP address. If you have trusted SSL/TLS server certificates, you may want to investigate a load balancer for TLS protocol. 3. Ensure that a HTTPS health check is in place to facilitate the failover 🔥HOT TIP🔥 you can speed up failure detection and failover using Advanced health check settings. 4. Include our two NIM instances as pending and save the target group 5. Now let's create the network load balancer (NLB) listening on TCP port 443 and forwarding to the target group created above. 6. Once the load balancer is created, check the target group and you will find that one of the targets is healthy - that's the active node in the pacemaker cluster! 7. With the load balancing now in place, you can access the NIM console using the FQDN for your load balancer and login with the password set in the install of Node 1. 8. Once you have logged in, we need to install a license before we proceed any further: Click on Settings Click on Licenses Click Get Started Click Browse Upload your license Click Add 9. With the license now installed, we have access to the full console 11. Test failover The easiest way to test failover is to just shut down the active node in the cluster. Pacemaker will detect the node is no longer available and start the services on the remaining node. Stop the active node/instance of the NIM 2. Monitor the Target Group and watch it fail over - depending on the settings you have set up, this may take a few minutes 12. How to upgrade NGINX Instance Manager on the cluster To upgrade NGINX Instance Manager in a Pacemaker cluster, perform the following tasks: 1. Stop the Pacemaker Cluster services on Node 2 - forcing Node 1 to take over. pcs cluster stop ip-172-17-2-160 2. Disconnect the NFS mounts on Node2 umount /usr/share/nms umount /etc/nms umount /var/lib/nms umount /var/lib/clickhouse 3. Upgrade NGINX Instance Manager on Node 1 Download the update from the MyF5 Customer Portal sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb sudo systemctl restart nms sudo systemctl restart nginx 4. Upgrade NGINX Instance Manager on Node 2 (with the NFS mounts disconnected) Download the update from the MyF5 Customer Portal sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb sudo systemctl restart nms sudo systemctl restart nginx 5. Re-mount all the NFS mounts on Node 2 mount -a 6. Start the Pacemaker Cluster services on Node 2 - adding it back into the cluster pcs cluster start ip-172-17-2-160 13. Reference Documents Some good references on Pacemaker/Corosync clustering can be found here: Configuring a Red Hat High Availability cluster on AWS Implement a High-Availability Cluster with Pacemaker and Corosync ClusterLabs Pacemaker website Corosync Cluster Engine website188Views0likes0CommentsAccelerate Application Deployment on Google Cloud with F5 NGINXaaS
Introduction In the push for cloud-native agility, infrastructure teams often face a crossroads: settle for basic, "good enough" load balancing, or take on the heavy lifting of manually managing complex, high-performance proxies. For those building on Google Cloud (GCP), this compromise is no longer necessary. F5 NGINXaaS for Google Cloud represents a shift in how we approach application delivery. It isn’t just NGINX running in the cloud; it is a co-engineered, fully managed on-demand service that lives natively within the GCP ecosystem. This integration allows you to combine the advanced traffic control and programmability NGINX is known for with the effortless scaling and consumption model of an SaaS offering in a platform-first way. By offloading the "toil" of lifecycle management—like patching, tuning, and infrastructure provisioning—to F5, teams can redirect their energy toward modernizing application logic and accelerating release cycles. In this article, we’ll dive into how this synergy between F5 and Google Cloud simplifies your architecture, from securing traffic with integrated secret management to gaining deep operational insights through native monitoring tools. Getting Started with NGINXaaS for Google Cloud The transition to a managed service begins with a seamless onboarding experience through the Google Cloud Marketplace. By leveraging this integrated path, teams can bypass the manual "toil" of traditional infrastructure setup, such as patching and individual instance maintenance. The deployment process involves: Marketplace Subscription: Directly subscribe to the service to ensure unified billing and support. Network Connectivity: Setting up essential VPC and Network Attachments to allow NGINXaaS to communicate securely with your backend resources. Provisioning: Launching a dedicated deployment that provides enterprise-grade reliability while maintaining a cloud-native feel. Secure and Manage SSL/TLS in F5 NGINXaaS for Google Cloud Security is a foundational pillar of this co-engineered service, particularly regarding traffic encryption. NGINXaaS simplifies the lifecycle of SSL/TLS certificates by providing a centralized way to manage credentials. Key security features include: Integrated Secrets Management: Working natively with Google Cloud services to handle sensitive data like private keys and certificates securely. Proxy Configuration: Demonstrating how to set up a Google Cloud proxy network load balancer to handle incoming client traffic. Credential Deployment: Uploading and managing certificates directly within the NGINX console to ensure all application endpoints are protected by robust encryption. Enhancing Visibility in Google Cloud with F5 NGINXaaS Visibility is no longer an afterthought but a native component of the deployment, providing high-fidelity telemetry without separate agents. Native Telemetry Export: By linking your Google Cloud Project ID and configuring Workload Identity Federation (WIF), metrics and logs are pushed directly to Google Cloud Monitoring. Real-Time Dashboards: The observability demo walks through using the Metrics Explorer to visualize critical performance data, such as active HTTP connection counts and response rates. Actionable Logging: Integrated Log Analytics allow you to use the Logs Explorer to isolate events and troubleshoot application issues within a single toolset, streamlining your operational workflow. Whether you are just beginning your transition to the cloud or fine-tuning a sophisticated microservices architecture, F5 NGINXaaS provides the advanced availability, scalability, security, and visibility capabilities necessary for success in the Google Cloud environment. Conclusion The integration of F5 NGINXaaS for Google Cloud represents a significant advantage for organizations looking to modernize their application delivery without the traditional overhead of infrastructure management. By shifting to this co-engineered, managed service, teams can bridge together advanced NGINX performance and the native agility of the Google Cloud ecosystem. Through the demonstrations provided in this article, we’ve highlighted how you can: Accelerate Onboarding: Move from Marketplace subscription to a live deployment in minutes using Network Attachments. Fortify Security: Centralize SSL/TLS management within the NGINX console while leveraging Google Cloud's robust networking layer. Maximize Operational Intelligence: Harness deep, real-time observability by piping telemetry directly into Google Cloud Monitoring and Logging. Resources Accelerating app transformation with F5 NGINXaaS for Google Cloud F5 NGINXaaS for Google Cloud: Delivering resilient, scalable applications98Views2likes2CommentsAI Inference for VLLM models with F5 BIG-IP & Red Hat OpenShift
This article shows how to perform Intelligent Load Balancing for AI workloads using the new features of BIG-IP v21 and Red Hat OpenShift. Intelligent Load Balancing is done based on business logic rules without iRule programming and state metrics of the VLLM inference servers gathered from OpenShift´s Prometheus.368Views1like5CommentsCrafting a Cloud-Based Antifragile Cyber Resiliency Strategy
Application availability is a continually evolving concept. With the proliferation of hybrid multicloud applications and environments, fronted by third-party services like AWS and Azure, what used to be a back‑end challenge now encompasses even the delivery‑path. Even if an application is healthy, a failing CDN, edge location, or cloud control plane can make it unreachable, bringing business to a halt. Teams that support applications need to plan for this. They need their networks to reduce the blast radius of external failures. They need them to adapt as the failure unfolds. They need them to incorporate the lessons of disruption into future behavior. And they need them to be a systemic and strategic advantage rather than an operational obligation. Why this matters now Late‑2025 outages exposed how single‑vendor or single‑region bets amplify risk. A Cloudflare edge configuration defect propagated globally, causing widespread 5xx responses, despite customers’ origins being sound. Weeks earlier, a control-plane issue impacting AWS’s US‑East‑1 DNS stranded workloads, even impairing provider‑native failover actions. The organizations that fared best ran multi‑path delivery with independent steering at the DNS layer, allowing them to circumvent the gaps in their delivery networks that these outages created. F5’s ADSP portfolio has the tools to help create these resiliency paths: Distributed Cloud (XC) DNS & DNS Load Balancer (DNSLB) enable independent global traffic steering; XC Customer Edge (CE) enables private, controlled failover paths; XC Synthetic Monitoring uses continuous probes from regional edge locations to detect performance degradation or outages; and XC WAAP (Web Application & API Protection) enables consistent security during failover and graceful degradation. Together, they can provide teams with the necessary tools to get ahead of outages and keep applications online, not simply respond reactively after the fact of an outage. The reference architecture at a glance The goal of any modern, antifragile resiliency strategy is to keep full control over how traffic is routed, in all conditions and scenarios. This way, the network becomes an intelligent part of the application delivery and availability strategy. Using built‑in traffic routing policies, it automatically switches to available paths when a problem arises, keeping necessary security protections fully intact and applications online Global steering (Authoritative DNS + GSLB): For many organizations, DNS remains anchored to a single provider, creating an unnecessary dependency and a larger operational blast radius when that provider experiences issues. Delegating critical zones to XC DNS distributes authoritative DNS duties across F5’s global anycast footprint, reducing reliance on any one hyperscaler or DNS control plane. Running alongside this, the DNS Load Balancer applies Global Server Load Balancing policies that evaluate both origin and edge health, returning DNS responses that reflect real service conditions rather than static routing assumptions. Even when a provider’s own control plane is impaired. XC runs on F5’s independent backbone and can steer traffic to any public DNS name or IP (including a primary CDN CNAME), making it ideal for multi‑vendor delivery. Delivery planes (Primary + Alternate): Use your primary CDN/edge as usual, but maintain an alternate ingress on F5 XC WAAP. GSLB returns the primary CDN CNAME under normal conditions and automatically returns the XC VIP/CNAME when health checks degrade. This duality breaks monoculture risk at the edge. Secure origin connectivity (Customer Edge): Place F5 Customer Edge nodes in your VPCs/data centers. When traffic shifts to XC, it traverses F5’s encrypted backbone to the CE, keeping your origins private and avoiding an emergency “open to the internet” posture during failover. Consistent protection (WAAP + Bot Defense): Enforce WAAP policies and (optionally) Bot Defense connectors, so the same security logic applies regardless of which delivery path is active. This closes the “security arbitrage” gap attackers exploit when a secondary path is weaker. Observability & learning (synthetics + SIEM): Run synthetic monitors that test real HTTP journeys (not just TCP liveness) This is one of the only reliable ways to spot “grey failures” like mass 503s. The resultant logs can be streamed to your SIEM for further analysis. Use Game Days to rehearse provider cutovers and capture evidence. How the pieces fit - step by step 1) Make XC your steering layer (Authoritative DNS + GSLB) Delegate your service zone(s) to XC DNS. Keep a short TTL (e.g., 30–60s) on critical records to accelerate adaptation. In DNS Load Balancer, create two Origin Pools: Primary: your CDN CNAME (e.g., www.example.com.cdn.cloudflare.net). Secondary: the XC HTTP LB VIP or CNAME (vip.example.xc.f5.com). Attach health checks and steering policies (priority‑based, latency‑based, or SLO‑aware) so the LB returns the primary CNAME when healthy and the XC endpoint otherwise. Why DNS/GSLB first? When a provider’s control plane stalls (e.g., Route 53 updates freeze), you still retain an external control plane to redirect users, without waiting on the impacted provider to recover. 2) Detect real problems with synthetic monitoring Create Synthetic Monitors in XC that fetch a known‑good URL (/healthz) through the primary CDN path and validate status code, content, and latency. Don’t rely on ping/TCP, as grey failures may look healthy to L4 checks. By using multiple vantage points, you can tune thresholds to trigger “pool down” events quickly but safely. Failover logic example (conceptual): 3) Keep origins private with CE Deploy CE nodes into each origin environment. XC Regional Edge terminates client traffic, applies WAAP controls, then forwards over the F5 backbone to CE, which in turn reaches your service on private subnets. During a CDN failure, you don’t have to expose an emergency public listener or widen security groups. 4) Avoid “security arbitrage” during failover Use a policy‑as‑code approach to keep WAAP intent portable and synchronized between providers (e.g., a positive security model for APIs, shared bot policies). Where applicable, F5 Bot Defense Connector can evaluate traffic headed through Cloudflare so your bot verdicts are identical across both paths. 5) Close the loop: observability, drills, and policy evolution To maintain continuous situational awareness, stream DNS/GSLB state changes, WAAP events, and synthetic monitoring results into your SIEM, such as Splunk or Datadog, to build a global traffic‑health view rather than a narrow “Provider‑X‑only” dashboard. Reinforce this operational picture with quarterly Game Days that deliberately disable the primary pool in XC to validate time‑to‑detect, time‑to‑shift, and confirm that the secondary path meets expected performance and SLO targets. What happens in real outages? Scenario A: A CDN logical push breaks the edge: Your synthetics see 503s from multiple vantage points through the CDN CNAME. The primary pool flips to Down, and XC DNS starts returning the XC VIP. As client caches expire (short TTLs help), users seamlessly land on the XC path (WAAP intact), traverse the F5 backbone, and reach your private origins through CE. This bypasses the global bad push entirely. Scenario B: Cloud control plane failure (e.g., US‑East‑1): Because XC runs on an independent control plane, its monitors and DNS orchestration continue operating. GSLB marks the impacted origin/region down and returns a DR pool (another region or on‑prem behind CE), even if provider‑native tools are unresponsive. Why F5 for the DNS layer? A hybrid DNS/GSLB toolset like the one found in F5’s Distributed Cloud is built for multicloud: F5 XC DNS Load Balancer steers to any origin, IPs or CNAMEs, so you can keep your preferred CDN while retaining an independent steering layer. This toolset is further complemented by Distributed Cloud Synthetic Monitoring. Designed for grey‑failure detection, it simulates user traffic against endpoints to quantify health and performance in real-time, to catch issues that liveness checks miss. All of these functions run on F5’s independent backbone and dedicated Customer Edge nodes. These private, encrypted paths to origins reduce exposure and keep performance stable when the public internet is noisy. Tying back to the five antifragile practices Blast Radius Control: Isolate critical names and flows in distinct DNS/GSLB policies so one provider’s trouble doesn’t cascade. Dependency Diversification: Maintain two delivery planes (Primary CDN + XC WAAP) with independent health and failover. Policy‑Driven Adaptation: Encode SLOs and health criteria in XC GSLB so failover is autonomous and fast. Incremental Adaptation: Use WAAP to prioritize/shape traffic under stress (rate limits, bot controls, feature flags), keeping core transactions hot. Observation‑Informed Governance: Synthetics + SIEM + Game Days create the learning loop and harden policy over time. Quick-start checklist (copy/paste into your runbook) Delegate critical zones to XC DNS; set low TTLs for fast convergence. Create GSLB pools: Primary = CDN CNAME; Secondary = XC VIP/CNAME; select priority/latency steering. Add synthetics: validate status, content, and p95 latency from ≥5 vantage points; wire to pool health. Deploy CE into each origin VPC/DC; lock origins to CE ingress only. Unify WAAP/bot policies across paths (policy‑as‑code; connectors where applicable). Instrument & drill: stream GSLB/WAAP logs to SIEM and run quarterly Game Days80Views1like0CommentsF5 Container Ingress Services (CIS) deployment using Cilium CNI and static routes
F5 Container Ingress Services (CIS) supports static route configuration to enable direct routing from F5 BIG-IP to Kubernetes/OpenShift Pods as an alternative to VXLAN tunnels. Static routes are enabled in the F5 CIS CLI/Helm yaml manifest using the argument --static-routing-mode=true. In this article, we will use Cilium as the Container Network Interface (CNI) and configure static routes for an NGINX deployment For initial configuration of the BIG-IP, including AS3 installation, please see https://clouddocs.f5.com/products/extensions/f5-appsvcs-extension/latest/userguide/installation.html and https://clouddocs.f5.com/containers/latest/userguide/kubernetes/#cis-installation The first step is to install Cilium CNI using the steps below on Linux host: CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt) CLI_ARCH=amd64 if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum} sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum} cilium install --version 1.18.5 cilium status cilium status --wait root@ciliumk8s-ubuntu-server:~# cilium status --wait /¯¯\ /¯¯\__/¯¯\ Cilium: OK \__/¯¯\__/ Operator: OK /¯¯\__/¯¯\ Envoy DaemonSet: OK \__/¯¯\__/ Hubble Relay: disabled \__/ ClusterMesh: disabled DaemonSet cilium Desired: 1, Ready: 1/1, Available: 1/1 DaemonSet cilium-envoy Desired: 1, Ready: 1/1, Available: 1/1 Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1 Containers: cilium Running: 1 cilium-envoy Running: 1 cilium-operator Running: 1 clustermesh-apiserver hubble-relay Cluster Pods: 6/6 managed by Cilium Helm chart version: 1.18.3 Image versions cilium quay.io/cilium/cilium:v1.18.3@sha256:5649db451c88d928ea585514746d50d91e6210801b300c897283ea319d68de15: 1 cilium-envoy quay.io/cilium/cilium-envoy:v1.34.10-1761014632-c360e8557eb41011dfb5210f8fb53fed6c0b3222@sha256:ca76eb4e9812d114c7f43215a742c00b8bf41200992af0d21b5561d46156fd15: 1 cilium-operator quay.io/cilium/operator-generic:v1.18.3@sha256:b5a0138e1a38e4437c5215257ff4e35373619501f4877dbaf92c89ecfad81797: 1 cilium connectivity test root@ciliumk8s-ubuntu-server:~# cilium connectivity test ℹ️ Monitor aggregation detected, will skip some flow validation steps ✨ [default] Creating namespace cilium-test-1 for connectivity check... ✨ [default] Deploying echo-same-node service... ✨ [default] Deploying DNS test server configmap... ✨ [default] Deploying same-node deployment... ✨ [default] Deploying client deployment... ✨ [default] Deploying client2 deployment... ✨ [default] Deploying ccnp deployment... ⌛ [default] Waiting for deployment cilium-test-1/client to become ready... ⌛ [default] Waiting for deployment cilium-test-1/client2 to become ready... ⌛ [default] Waiting for deployment cilium-test-1/echo-same-node to become ready... ⌛ [default] Waiting for deployment cilium-test-ccnp1/client-ccnp to become ready... ⌛ [default] Waiting for deployment cilium-test-ccnp2/client-ccnp to become ready... ⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod... ⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod... ⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach default/kubernetes service... ⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach default/kubernetes service... ⌛ [default] Waiting for Service cilium-test-1/echo-same-node to become ready... ⌛ [default] Waiting for Service cilium-test-1/echo-same-node to be synchronized by Cilium pod kube-system/cilium-lxjxf ⌛ [default] Waiting for NodePort 10.69.12.2:32046 (cilium-test-1/echo-same-node) to become ready... 🔭 Enabling Hubble telescope... ⚠️ Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4245: connect: connection refused" ℹ️ Expose Relay locally with: cilium hubble enable cilium hubble port-forward& ℹ️ Cilium version: 1.18.3 🏃[cilium-test-1] Running 126 tests ... [=] [cilium-test-1] Test [no-policies] [1/126] .................... [=] [cilium-test-1] Skipping test [no-policies-from-outside] [2/126] (skipped by condition) [=] [cilium-test-1] Test [no-policies-extra] [3/126] <- snip -> For this article, we will install k3s with Cilium CNI root@ciliumk8s-ubuntu-server:~# curl -sfL https://get.k3s.io | sh -s - --flannel-backend=none --disable-kube-proxy --disable servicelb --disable-network-policy --disable traefik --cluster-init --node-ip=10.69.12.2 --cluster-cidr=10.42.0.0/16 root@ciliumk8s-ubuntu-server:~# mkdir -p $HOME/.kube root@ciliumk8s-ubuntu-server:~# sudo cp -i /etc/rancher/k3s/k3s.yaml $HOME/.kube/config root@ciliumk8s-ubuntu-server:~# sudo chown $(id -u):$(id -g) $HOME/.kube/config root@ciliumk8s-ubuntu-server:~# echo "export KUBECONFIG=$HOME/.kube/config" >> $HOME/.bashrc root@ciliumk8s-ubuntu-server:~# source $HOME/.bashrc API_SERVER_IP=10.69.12.2 API_SERVER_PORT=6443 CLUSTER_ID=1 CLUSTER_NAME=`hostname` POD_CIDR="10.42.0.0/16" root@ciliumk8s-ubuntu-server:~# cilium install --set cluster.id=${CLUSTER_ID} --set cluster.name=${CLUSTER_NAME} --set k8sServiceHost=${API_SERVER_IP} --set k8sServicePort=${API_SERVER_PORT} --set ipam.operator.clusterPoolIPv4PodCIDRList=$POD_CIDR --set kubeProxyReplacement=true --helm-set=operator.replicas=1 root@ciliumk8s-ubuntu-server:~# cilium config view | grep cluster bpf-lb-external-clusterip false cluster-id 1 cluster-name ciliumk8s-ubuntu-server cluster-pool-ipv4-cidr 10.42.0.0/16 cluster-pool-ipv4-mask-size 24 clustermesh-enable-endpoint-sync false clustermesh-enable-mcs-api false ipam cluster-pool max-connected-clusters 255 policy-default-local-cluster false root@ciliumk8s-ubuntu-server:~# cilium status --wait The F5 CIS yaml manifest for deployment using Helm Note that these arguments are required for CIS to leverage static routes static-routing-mode: true orchestration-cni: cilium-k8s We will also be installing custom resources, so this argument is also required 3. custom-resource-mode: true Values yaml manifest for Helm deployment bigip_login_secret: f5-bigip-ctlr-login bigip_secret: create: false username: password: rbac: create: true serviceAccount: # Specifies whether a service account should be created create: true # The name of the service account to use. # If not set and create is true, a name is generated using the fullname template name: k8s-bigip-ctlr # This namespace is where the Controller lives; namespace: kube-system ingressClass: create: true ingressClassName: f5 isDefaultIngressController: true args: # See https://clouddocs.f5.com/containers/latest/userguide/config-parameters.html # NOTE: helm has difficulty with values using `-`; `_` are used for naming # and are replaced with `-` during rendering. # REQUIRED Params bigip_url: X.X.X.S bigip_partition: <BIG-IP_PARTITION> # OPTIONAL PARAMS -- uncomment and provide values for those you wish to use. static-routing-mode: true orchestration-cni: cilium-k8s # verify_interval: # node-poll_interval: # log_level: DEBUG # python_basedir: ~ # VXLAN # openshift_sdn_name: # flannel_name: cilium-vxlan # KUBERNETES # default_ingress_ip: # kubeconfig: # namespaces: ["foo", "bar"] # namespace_label: # node_label_selector: pool_member_type: cluster # resolve_ingress_names: # running_in_cluster: # use_node_internal: # use_secrets: insecure: true custom-resource-mode: true log-as3-response: true as3-validation: true # gtm-bigip-password # gtm-bigip-url # gtm-bigip-username # ipam : true image: # Use the tag to target a specific version of the Controller user: f5networks repo: k8s-bigip-ctlr pullPolicy: Always version: latest # affinity: # nodeAffinity: # requiredDuringSchedulingIgnoredDuringExecution: # nodeSelectorTerms: # - matchExpressions: # - key: kubernetes.io/arch # operator: Exists # securityContext: # runAsUser: 1000 # runAsGroup: 3000 # fsGroup: 2000 # If you want to specify resources, uncomment the following # limits_cpu: 100m # limits_memory: 512Mi # requests_cpu: 100m # requests_memory: 512Mi # Set podSecurityContext for Pod Security Admission and Pod Security Standards # podSecurityContext: # runAsUser: 1000 # runAsGroup: 1000 # privileged: true Installation steps for deploying F5 CIS using helm can be found in this link https://clouddocs.f5.com/containers/latest/userguide/kubernetes/ Once F5 CIS is validated to be up and running, we can now deploy the following application example root@ciliumk8s-ubuntu-server:~# cat application.yaml apiVersion: cis.f5.com/v1 kind: VirtualServer metadata: labels: f5cr: "true" name: goblin-virtual-server namespace: nsgoblin spec: host: goblin.com pools: - path: /green service: svc-nodeport servicePort: 80 - path: /harry service: svc-nodeport servicePort: 80 virtualServerAddress: X.X.X.X --- apiVersion: apps/v1 kind: Deployment metadata: name: goblin-backend namespace: nsgoblin spec: replicas: 2 selector: matchLabels: app: goblin-backend template: metadata: labels: app: goblin-backend spec: containers: - name: goblin-backend image: nginx:latest ports: - containerPort: 80 --- apiVersion: v1 kind: Service metadata: name: svc-nodeport namespace: nsgoblin spec: selector: app: goblin-backend ports: - port: 80 targetPort: 80 type: ClusterIP k apply -f application.yaml We can now verify the k8s pods are created. Then we will create a sample html page to test access to the backend NGINX pod root@ciliumk8s-ubuntu-server:~# k -n nsgoblin get po -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES goblin-backend-7485b6dcdf-d5t48 1/1 Running 0 6d2h 10.42.0.70 ciliumk8s-ubuntu-server <none> <none> goblin-backend-7485b6dcdf-pt7hx 1/1 Running 0 6d2h 10.42.0.97 ciliumk8s-ubuntu-server <none> <none> root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it po/goblin-backend-7485b6dcdf-pt7hx -- /bin/sh # cat > green <<'EOF' <!DOCTYPE html> > > <html> > <head> <title>Green Goblin</title> <style> body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; } h1 { font-size: 3em; } > > > > > </style> </head> <body> <h1>I am the green goblin!</h1> <p>Access me at /green</p> </body> </html> > > > > > > > EOF root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it goblin-backend-7485b6dcdf-d5t48 -- /bin/sh # cat > green <<'EOF' > <!DOCTYPE html> <html> <head> <title>Green Goblin</title> <style> body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; } h1 { font-size: 3em; } </style> > </head> <body> <h1>I am the green goblin!</h1> <p>Access me at /green</p> </body> </html> EOF> > > > > > > > > > > > > We can now validate the pools are created on the F5 BIG-IP root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm pool all ltm pool svc_nodeport_80_nsgoblin_goblin_com_green { description "crd_10_69_12_40_80 loadbalances this pool" members { /kubernetes/10.42.0.70:http { address 10.42.0.70 } /kubernetes/10.42.0.97:http { address 10.42.0.97 } } min-active-members 1 partition kubernetes } ltm pool svc_nodeport_80_nsgoblin_goblin_com_harry { description "crd_10_69_12_40_80 loadbalances this pool" members { /kubernetes/10.42.0.70:http { address 10.42.0.70 } /kubernetes/10.42.0.97:http { address 10.42.0.97 } } min-active-members 1 partition kubernetes } root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm virtual crd_10_69_12_40_80 ltm virtual crd_10_69_12_40_80 { creation-time 2025-12-22:10:10:37 description Shared destination /kubernetes/10.69.12.40:http ip-protocol tcp last-modified-time 2025-12-22:10:10:37 mask 255.255.255.255 partition kubernetes persist { /Common/cookie { default yes } } policies { crd_10_69_12_40_80_goblin_com_policy { } } profiles { /Common/f5-tcp-progressive { } /Common/http { } } serverssl-use-sni disabled source 0.0.0.0/0 source-address-translation { type automap } translate-address enabled translate-port enabled vs-index 2 } CIS log output 2025/12/22 18:10:25 [INFO] [Request: 1] cluster local requested CREATE in VIRTUALSERVER nsgoblin/goblin-virtual-server 2025/12/22 18:10:25 [INFO] [Request: 1][AS3] creating a new AS3 manifest 2025/12/22 18:10:25 [INFO] [Request: 1][AS3][BigIP] posting request to https://10.69.12.1 for tenants 2025/12/22 18:10:26 [INFO] [Request: 2] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport 2025/12/22 18:10:26 [INFO] [Request: 3] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport 2025/12/22 18:10:43 [INFO] [Request: 1][AS3][BigIP] post resulted in SUCCESS 2025/12/22 18:10:43 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success 2025/12/22 18:10:43 [INFO] [Request: 3][AS3] Processing request 2025/12/22 18:10:43 [INFO] [Request: 3][AS3] creating a new AS3 manifest 2025/12/22 18:10:43 [INFO] [Request: 3][AS3][BigIP] posting request to https://10.69.12.1 for tenants 2025/12/22 18:10:43 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster W1222 18:10:49.238444 1 warnings.go:70] v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice 2025/12/22 18:10:52 [INFO] [Request: 3][AS3][BigIP] post resulted in SUCCESS 2025/12/22 18:10:52 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success 2025/12/22 18:10:52 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster Troubleshooting: 1. If static routes are not added, the first step is to inspect CIS logs for entries similar to these: Cilium annotation warning logs 2025/12/22 17:44:45 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2025/12/22 17:46:41 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2025/12/22 17:46:42 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2025/12/22 17:46:43 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ? 2. These are resolved by adding annotations to the node using the reference: https://clouddocs.f5.com/containers/latest/userguide/static-route-support.html Cilium annotation for node root@ciliumk8s-ubuntu-server:~# k annotate node ciliumk8s-ubuntu-server io.cilium.network.ipv4-pod-cidr=10.42.0.0/16 root@ciliumk8s-ubuntu-server:~# k describe node | grep -E "Annotations:|PodCIDR:|^\s+.*pod-cidr" Annotations: alpha.kubernetes.io/provided-node-ip: 10.69.12.2 io.cilium.network.ipv4-pod-cidr: 10.42.0.0/16 PodCIDR: 10.42.0.0/24 3. Verify a static route has been created and test connectivity to k8s pods root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes)(tmos)# list net route net route k8s-ciliumk8s-ubuntu-server-10.69.12.2 { description 10.69.12.1 gw 10.69.12.2 network 10.42.0.0/16 partition kubernetes } Using pup (command line HTML parser) -> https://commandmasters.com/commands/pup-common/ root@ciliumk8s-ubuntu-server:~# curl -s http://goblin.com/green | pup 'body text{}' I am the green goblin! Access me at /green 1 0.000000 10.69.12.34 ? 10.69.12.40 TCP 78 34294 ? 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=2984295232 TSecr=0 WS=128 2 0.000045 10.69.12.40 ? 10.69.12.34 TCP 78 80 ? 34294 [SYN, ACK] Seq=0 Ack=1 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316303 TSecr=2984295232 3 0.001134 10.69.12.34 ? 10.69.12.40 TCP 70 34294 ? 80 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=2984295234 TSecr=1809316303 4 0.001151 10.69.12.34 ? 10.69.12.40 HTTP 149 GET /green HTTP/1.1 5 0.001343 10.69.12.40 ? 10.69.12.34 TCP 70 80 ? 34294 [ACK] Seq=1 Ack=80 Win=23040 Len=0 TSval=1809316304 TSecr=2984295234 6 0.002497 10.69.12.1 ? 10.42.0.97 TCP 78 33707 ? 80 [SYN] Seq=0 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316304 TSecr=0 7 0.003614 10.42.0.97 ? 10.69.12.1 TCP 78 80 ? 33707 [SYN, ACK] Seq=0 Ack=1 Win=64308 Len=0 MSS=1410 SACK_PERM TSval=1012609408 TSecr=1809316304 WS=128 8 0.003636 10.69.12.1 ? 10.42.0.97 TCP 70 33707 ? 80 [ACK] Seq=1 Ack=1 Win=23040 Len=0 TSval=1809316307 TSecr=1012609408 9 0.003680 10.69.12.1 ? 10.42.0.97 HTTP 149 GET /green HTTP/1.1 10 0.004774 10.42.0.97 ? 10.69.12.1 TCP 70 80 ? 33707 [ACK] Seq=1 Ack=80 Win=64256 Len=0 TSval=1012609409 TSecr=1809316307 11 0.004790 10.42.0.97 ? 10.69.12.1 TCP 323 HTTP/1.1 200 OK [TCP segment of a reassembled PDU] 12 0.004796 10.42.0.97 ? 10.69.12.1 HTTP 384 HTTP/1.1 200 OK 13 0.004820 10.69.12.40 ? 10.69.12.34 TCP 448 HTTP/1.1 200 OK [TCP segment of a reassembled PDU] 14 0.004838 10.69.12.1 ? 10.42.0.97 TCP 70 33707 ? 80 [ACK] Seq=80 Ack=254 Win=23552 Len=0 TSval=1809316308 TSecr=1012609410 15 0.004854 10.69.12.40 ? 10.69.12.34 HTTP 384 HTTP/1.1 200 OK Summary: There we have it, we have successfully deployed an NGINX application on a Kubernetes cluster managed by F5 CIS using static routes to forward traffic to the kubernetes pods409Views3likes2CommentsAWS Transit Gateway Connect: GRE + BGP = ?
GRE and BGP are technologies that are... mature. In this article we'll take a look at how you can use AWS Transit Gateway Connect to do some unique networking and application delivery in the cloud. In December 2020 AWS released a new feature of Transit Gateway (TGW) that enables a device to peer with TGW via a GRE/BGP tunnel. The intent was to be used with SD-WAN devices, but we can also use it for things like load balancing many internal private addresses, NAT gateway, etc... In this article we'll look at my experience of setting up TGW Connect in a lab environment based on F5's documentation for setting up GRE and BGP. Challenges with TGW For folks that are not familiar with TGW it is an AWS service that allows you to stitch together multiple physical and virtual networks via AWS internal networking (VPC peering) or via network protocols (VPN, Direct Connect (private L2)). Using TGW you can steer traffic to a specific network device by creating a route table entry within a VPC that points to the device's ENI (network interface). This is useful for a case where you want to send all traffic for a specific CIDR (192.0.2.0/24) to traverse that device. In the scenario where a device is responsible for a CIDR it is also responsible for updating the route table for HA. This could be done via Lambda function, our Cloud Failover Extension, manual updates, etc.... The other downside is that this limits you to a single device per Availability Zone to receive traffic for that CIDR. TGW Connect provides a mechanism for the device to use a GRE tunnel/BGP to establish a connection to TGW and use dynamic routing protocols (BGP) to advertise the health of the device. This allows you to establish up to 4 devices to peer with TGW with up to 5 Gbps of traffic per connection (for comparison you can burst up to 50 Gbps with a VPC connection). Topology of TGW Connect When using TGW Connect it re-uses existing TGW connections. In practice this means that you are likely using an existing Direct Connect or VPC connection (I guess you could also use a VPN connection, but that would be weird). See also: https://aws.amazon.com/blogs/networking-and-content-delivery/simplify-sd-wan-connectivity-with-aws-transit-gateway-connect/ Configuring a BIG-IP for TGW Connect To use a BIG-IP with TGW Connect you will need a device that is licensed for BGP (also called Advanced Routing, part of Better/Best). Follow the steps for setting up TGW Connect and be sure to specify a different peer ASN than your TGW (you will need to use eBGP). The "Peer Address" will be the self-ip of the BIG-IP on the AWS VPC (when using a VPC). Configuring target VPC When setting up TGW Connect you will peer with an existing VPC. In the subnet that you want to use with TGW Connect (Peer Address of GRE tunnel) you will need to have a route that points to the TGW peer address; for example if you specify a CIDR of 10.254.254.0/24 for TGW and the peer address is 10.254.254.11 you will need to create a route that includes 10.254.254.0/24 on the subnet for BIG-IP peer address. Also make sure to open up Security Groups to allow GRE traffic to traverse to/from the interface that will be used for the GRE tunnel. The rule should allow the IP of the peer address (i.e. 10.254.254.11). Route to TGW from 10.1.7.0/24 GRE Tunnel On the BIG-IP under Network / Tunnels you will need to create a GRE Tunnel. You can use the default "gre" tunnel profile. Specify the same "Peer Address" that you used when setting up TGW Connect. You will also want to specify the Remote Address that is the TGW address. BGP Peer Next you will need to configure the BIG-IP to act as a BGP peer to TGW connecting over the GRE tunnel. TGW requires that you use an IP in the 169.254.0.0 range. This will require modify a db variable to allow that address to be used as a self-ip. The tmsh command to use is. modify sys db config.allow.rfc3927 { value "enable" } You can then create your BGP peer address to match the value that you used in TGW Connect. The BGP peer address will need to be configured to allow BGP updates (port 179). Since the traffic is occurring over the GRE tunnel there is no need to update AWS Security Groups (invisible to the ENI). Setting up BGP Peering TGW Connect requires eBGP to be used. The following is an example of a working config. This also assumes you go through the pre-req of setting up BGP/RHI. Be careful to only advertise the routes that you want, when you use "redistribute kernel" it will also advertise 0.0.0.0/0! Please also see: https://support.f5.com/csp/article/K15923612 ! no service password-encryption ! router bgp 65520 bgp graceful-restart restart-time 120 aggregate-address 10.0.0.0/16 summary-only aggregate-address 10.1.0.0/16 summary-only aggregate-address 10.2.0.0/16 summary-only aggregate-address 10.3.0.0/16 summary-only redistribute kernel neighbor 169.254.10.2 remote-as 64512 neighbor 169.254.10.2 ebgp-multihop 2 neighbor 169.254.10.2 soft-reconfiguration inbound neighbor 169.254.10.2 prefix-list tenlist out neighbor 169.254.10.3 remote-as 64512 neighbor 169.254.10.3 ebgp-multihop 2 neighbor 169.254.10.3 soft-reconfiguration inbound neighbor 169.254.10.3 prefix-list tenlist out ! ip prefix-list tenlist seq 5 deny 10.254.254.0/24 ip prefix-list tenlist seq 10 permit 10.0.0.0/8 ge 16 ! line con 0 login line vty 0 39 login ! end This example was created with help from a BGP expert Brandon Frelich. In the example above we only limiting routes to 3 CIDRs and configuring ECMP. At this point the BIG-IP could allocate VIPs on the CIDR, act as an AFM firewall, and if we used 0.0.0.0/0 it could act as an outbound gateway. Verifying the Setup You should be able to see your BGP connection go green in the AWS console and also see the status by running "show ip bgp neighbors" from imish. AWS Console ip-10-1-1-112.ec2.internal[0]>show ip bgp summary BGP router identifier 169.254.10.1, local AS number 65520 BGP table version is 4 2 BGP AS-PATH entries 0 BGP community entries Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd 169.254.10.2 4 64512 293 293 4 0 0 00:47:08 3 169.254.10.3 4 64512 293 292 4 0 0 00:47:08 3 Total number of neighbors 2 Output from imish (tmos)# list /ltm virtual-address one-line ltm virtual-address 10.0.0.0 { address 10.0.0.0 arp disabled floating disabled icmp-echo disabled mask 255.255.0.0 route-advertisement selective traffic-group none unit 0 } ltm virtual-address 10.1.0.0 { address 10.1.0.0 arp disabled floating disabled icmp-echo disabled mask 255.255.0.0 route-advertisement selective traffic-group none unit 0 } ltm virtual-address 10.2.0.0 { address 10.2.0.0 arp disabled floating disabled icmp-echo disabled mask 255.255.0.0 route-advertisement selective traffic-group none unit 0 } (tmos)# list /net route tgw net route tgw { interface /Common/tgw-connect network 10.0.0.0/8 } Output from TMSH. Note route-advertisement is enabled on the virtual-addresses. We are using a static route to steer traffic to the GRE tunnel. You should also see any routes advertised as well. ECMP Considerations When you deploy multiple BIG-IP devices TGW can use ECMP to spray traffic across multiple devices (by enabling traffic group None or multiple standalone devices). Be aware that if you need to statefully inspect traffic you may want to enable SNAT to have the return traffic go to the same device or use traffic-group-1 to run in Active/Standby via Route Health Injection. Otherwise you will need to follow the guidance on setting up a forwarding virtual server to ignore the system connection table. Routing Connections One of the issues that a customer discovered when exploring this solution is that the BIG-IP will initially send health checks across the GRE tunnel using an IP from the 169.254.x.x address range, this follows the address selection criteria that a BIG-IP uses. One method of dealing with this is to assign an IP address in a range that you would like to advertise across the tunnel like 198.168.254.0/24. Creating a self-ip of 198.168.254.253 that you assign to the tunnel. To send traffic for a different range (i.e. 10.0.0.0/16) you can then create a static route on the BIG-IP that points to 198.168.254.1. Since the BIG-IP sees the address is on the tunnel it will correctly forward the traffic through the tunnel. Another question that arose was whether it was possible to have asymmetric traffic flows of utilizing both the GRE TGW Connect tunnel and the TGW VPC Connection of the VPC itself. I discovered that YES this is possible by following the guidance on enabling asymmetrically routed traffic. It hurts your brain a bit, but here's the results of a flow that is using both a Connect and VPC connection. You can do some crazy things, but with great power... Request traffic over VPC Connection Response traffic over TGW Connect (GRE) Other Options You could also achieve a similar result by using default VPC peering and making use of Cloud Failover Extension for updating the route table. The benefit of that approach is that you don't have to deal with GRE/BGP! It does limit you to a single device per-AZ vs. being able to get up to 4 devices running across a connection.7.4KViews2likes1CommentPractical Mapping Guide - F5 BIG-IP TMOS Modules to Feature-Scoped CNFs
Introduction BIG-IP TMOS and BIG-IP CNFs solve similar problems with very different deployment and configuration models. In TMOS you deploy whole modules (LTM, AFM, DNS, etc.), while in CNFs you deploy only the specific features and data plane functions you need as cloud native components. Modules vs feature scoped services In TMOS, enabling LTM gives you a broad set of capabilities in one module: virtual servers, pools, profiles, iRules, SNAT, persistence, etc., all living in the same configuration namespace. In CNFs, you deploy only the features you actually need, expressed as discrete custom resources: for example, a routing CNF for more precise TMM routing, a firewall CNF for policies, or specific CRs for profiles and NAT, rather than bringing the entire LTM or AFM module along “just in case”. Configuration objects vs custom resources TMOS configuration is organized as objects under modules (for example: LTM virtual, LTM pool, security, firewall policy, net vlan), managed via tmsh, GUI, or iControl REST. CNFs expose those same capabilities through Kubernetes Custom Resources (CRs): routes, policies, profiles, NAT, VLANs, and so on are expressed as YAML and applied with GitOps-style workflows, making individual features independently deployable with a version history. Coarse-grained vs precise feature deployment On TMOS, deploying a use case often means standing up a full BIG-IP instance with multiple modules enabled, even if the application only needs basic load balancing with a single HTTP profile and a couple of firewall rules. With CNFs, you can carve out exactly what you need: for example, only a precise TMM routing function plus the specific TCP/HTTP/SSL profiles and security policies required for a given application or edge segment, reducing blast radius and resource footprint. BIG-IP Next CNF Custom Resources (CRs) extend Kubernetes APIs to configure the Traffic Management Microkernel (TMM) and Advanced Firewall Manager (AFM), providing declarative equivalents to BIG-IP TMOS configuration objects. This mapping ensures functional parity for L4-L7 services, security, and networking while enabling cloud-native scalability. Focus here covers core mappings with examples for iRules and Profiles. What are Kubernetes Custom Resources Think of Kubernetes API as a menu of objects you can create: Pods (containers), Services (networking), Deployments (replicas). CRs add new menu items for your unique needs. You first define a CustomResourceDefinition (CRD) (the blueprint), then create CR instances (actual objects using that blueprint). How CRs Work (Step-by-Step) Create CRD - Define new object type (e.g., "F5BigFwPolicy") Apply CRD - Kubernetes API server registers it, adding new REST endpoints like /apis/f5net.com/v1/f5bigfwpolicies Create CR - Write YAML instances of your new type and apply with kubectl Controllers Watch - Custom operators/controllers react to CR changes, creating real resources (Pods, ConfigMaps, etc.) CR examples Networking and NAT Mappings Networking CRs handle interfaces and routing, mirroring, TMOS VLANs, Self-IPs and NAT mapping. CNF CR TMOS Object(s) Purpose F5BigNetVlan VLANs, Self-IPs, MTU Interface config F5BigCneSnatpool SNAT Automap Source NAT s Example: F5BigNetVlan CR sets VLAN tags and IPs, applied via kubectl apply, propagating to TMM like tmsh create net vlan. Security and Protection Mappings Protection CRs map to AFM and DoS modules for threat mitigation. CNF CR CR Purpose TMOS Object(s) Purpose F5BigFwPolicy applies industry-standard firewall rules to TMM AFM Policies Stateful ACL filtering F5BigFwRulelist Consists of an array of ACL rules AFM Rule list Stateful ACL filtering F5BigSvcPolicy Allows creation of Timer Policies and attaching them to the Firewall Rules AFM Policies Stateful ACL filtering F5BigIpsPolicy Allows you to filter inspected traffic (matched events) by various properties such as, the inspection profile’s host (virtual server or firewall policy), traffic properties, inspection action, or inspection service. IPS AFM policies Enforce IPS F5BigDdosGlobal Configures the TMM Proxy Pod to protect applications and the TMM Pod from Denial of Service / Distributed Denial of Service (Dos/DDoS) attacks. Global DoS/DDoS Device DoS protection F5BigDdosProfile The Percontext DDoS CRD configures the TMM Proxy Pod to protect applications from Denial of Service / Distributed Denial of Service (Dos/DDoS) attacks. DoS/DDoS per profile Per Context DoS profile protection F5BigIpiPolicy Each policy contains a list of categories and actions that can be customized, with IPRep database being common. Feedlist based policies can also customize the IP addresses configured. IP Intelligence Policies Reputation-based blocking Example: F5BigFwPolicy references rule-lists and zones, equivalent to tmsh creating security firewall policy. Traffic Management Mappings Traffic CRs proxy TCP/UDP and support ALGs, akin to LTM Virtual Servers and Pools. CNF CR TMOS Object(s) Purpose Part of the Apps CRs Virtual Server (Ingress/HTTPRoute/GRPCRoute) LTM Virtual Servers Load balancing Part of the Apps CRs Pool (Endpoints) LTM Pools Node groups F5BigAlgFtp FTP Profile/ALG FTP gateway F5BigDnsCache DNS Cache Resolution/caching Example: GRPCRoute CR defines backend endpoints and profiles, mapping to tmsh create ltm virtual. Profiles Mappings with Examples Profiles CRs customize traffic handling, referenced by Traffic Management CRs, directly paralleling TMOS profiles. CNF CR TMOS Profile(s) Example Configuration Snippet F5BigTcpSetting TCP Profile spec: { defaultsFrom: "tcp-lan-optimized" } tunes congestion like tmsh create LTM profile TCP optimized F5BigUdpSetting UDP Profile spec: { idleTimeout: 300 } sets timeouts F5BigUdpSetting HTTP Profile spec: { http2Profile: "http2" } enables HTTP/2 F5BigClientSslSetting ClientSSL Profile spec: { certKeyChain: { name: "default" } } for TLS termination F5PersistenceProfile Persistence Profile Enables source/dest persistence Profiles attach to Virtual Servers/Ingresses declaratively, e.g., spec.profiles: [{ name: "my-tcp", context: "tcp" }]. iRules Support and Examples CNF fully supports iRules for custom logic, integrated into Traffic Management CRs like Virtual Servers or FTP/RTSP. iRules execute in TMM, preserving TMOS scripting. Example CR Snippet: apiVersion: k8s.f5net.com/v1 kind: F5BigCneIrule metadata: name: cnfs-dns-irule namespace: cnf-gateway spec: iRule: > when DNS_REQUEST { if { [IP::addr [IP::remote_addr] equals 10.10.1.0/24] } { cname cname.siterequest.com } else { host 10.20.20.20 } } This mapping facilitates TMOS-to-CNF migrations using tools like F5Big-Ip Controller for CR generation from UCS configs. For full CR specs, refer to official docs. The updated full list of CNF CRs per release can be found over here, https://clouddocs.f5.com/cnfs/robin/latest/cnf-custom-resources.html Conclusion In this article, we examined how BIG-IP Next CNF redefines the TMOS module model into a feature-scoped, cloud-native service architecture. By mapping TMOS objects such as virtual servers, profiles, and security policies to their corresponding CNF Custom Resources, we illustrated how familiar traffic management and security constructs translate into declarative Kubernetes workflows. Related content F5 Cloud-Native Network Functions configurations guide BIG-IP Next Cloud-Native Network Functions (CNFs) CNF DNS Express BIG-IP Next for Kubernetes CNFs - DNS walkthrough BIG-IP Next for Kubernetes CNFs deployment walkthrough | DevCentral BIG-IP Next Edge Firewall CNF for Edge workloads | DevCentral Modern Applications-Demystifying Ingress solutions flavors | DevCentral87Views1like0CommentsBIG-IP for Scalable App Delivery & Security in Hybrid Environments
Scope As enterprises deploy multiple instances of the same applications across diverse infrastructure platforms such as VMware, OpenShift, Nutanix, and public cloud environments and across geographically distributed locations to support redundancy and facilitate seamless migration, they face increasing challenges in ensuring consistent performance, centralized security, and operational visibility. The complexity of managing distributed application traffic, enforcing uniform security policies, and maintaining high availability across hybrid environments introduces significant operational overhead and risk, hindering agility and scalability. F5 BIG-IP Application Delivery and Security address this challenge by providing a unified, policy-driven approach to manage secure workloads across hybrid multi-cloud environments. It can be used to scale up application services on existing infrastructure or with new business models. Introduction This article highlights how F5 BIG-IP deploys identical application workloads across multiple environments. This ensure high availability, seamless traffic management, and consistent performance. By supporting smooth workload transitions and zero-downtime deployments, F5 helps organizations maintain reliable, secure, and scalable applications. From a business perspective, it enhances operational agility, supports growing traffic demands, reduces risk during updates, and ultimately delivers a reliable, secure, and high-performance application experience that meets customer expectations and drives growth. This use case covers a typical enterprise setup with the following environments: VMware (On-Premises) Nutanix (On-Premises) OCP (On-Premises) Google Cloud Platform (GCP) Solution Overview The following video shows how F5 BIG-IP VE running on different virtualized platforms and environments can be configured to scale, secure, and deliver applications equally, even when located on-prem and in cloud environments. By providing a uniform interface and security policies organizations can focus on other priorities and changing business needs. Architecture Overview As illustrated in the diagram, when new application workloads are provisioned across environments such as AWS, GCP, VMware (on-prem), Nutanix (on-prem & VMware) BIG-IP ensures seamless integration with existing services. Platforms Supported Environments VMware On-Prem, GCP, Azure Nutanix On-Prem, AWS, Azure OCP On-Prem, AWS, Azure This article outlines the deployment in VMware platform. For deployment in other platforms like Nutanix and GCP, refer the detailed guide below. F5 Scalable Enterprise Workload Deployments Complete Guide Scalable Enterprise Workload Deployment Across Hybrid Environments Enterprise applications are deployed smoothly across multiple environments to address diverse customer needs. With F5’s advanced Application Delivery and Security features, organizations can ensure consistent performance, high availability, and robust protection across all deployment platforms. F5 provides a unified and secure application experience across cloud, on-premises, and virtualized environments. Workload Distribution Across Environments Workloads are distributed across the following environments: VMware: App A & App B OpenShift: App B Nutanix: App B & App C → VMware: Add App C → OpenShift: Add App A & App C → Nutanix: Add App A Applications being used: A → Juice Shop (Vulnerable web app for security testing) B → DVWA (Damn Vulnerable Web Application) C → Mutillidae Initial Infrastructure & B, Nutanix: App B &C, GCP: App B. VMware In the VMware on-premises environment, Applications A and B are deployed and connected to two separate load balancers. This forms the existing infrastructure. These applications are actively serving user traffic with delivery and security managed by BIG-IP. Web Application Firewall (WAF) is enabled, which will prevent any malicious threats. The corresponding logs can be found under BIG-IP > Security > Event Logs Note: This initial deployment infrastructure has also been implemented on Nutanix and GCP. For the full details, please consult the complete guide here Adding additional workloads To demonstrate BIG-IP’s ability to support evolving enterprise demands, we will introduce new workloads across all environments. This will validate its seamless integration, consistent security enforcement, and support for continuous delivery across hybrid infrastructures. VMware Let us add additional application-3 (mutillidae) to the VMware on-premises environment. Try to access the application through BIG-IP virtual server. Apply the WAF policy to the newly created virtual server, then verify the same by simulating malicious attacks. Nutanix The use case described for VMware is equally applicable and supported when deploying BIG-IP on Nutanix Bare Metal as well as Nutanix on VMware. For demonstration purposes, the Nutanix Community Edition hypervisor is booted as a virtual machine within VMware. Inside this hypervisor, a new virtual machine is created and provisioned using the BIG-IP image downloaded from the F5 Downloads portal. Once the BIG-IP instance is online, an additional VM hosting the application workload is deployed. This application VM is then associated with a BIG-IP virtual server, ensuring that the application remains isolated and protected from direct external exposure. OCP The use case described for VMware is equally applicable and fully supported when deploying BIG-IP with Red Hat OpenShift Container Platform (OCP) including Nutanix and VMware-based infrastructures. For demonstration, OCP is deployed on a virtualized cluster, while BIG-IP is provisioned externally using an image from the F5 Downloads portal. BIG-IP consumes the OpenShift configuration and dynamically creates the required virtual servers, pools, and health monitors. Traffic to the application is routed through BIG-IP, ensuring that the application remains isolated from direct external exposure while benefiting from enterprise-grade traffic management, security enforcement, and observability. GCP (Google Cloud Platform) The use case discussed above for VMware is also applicable and supported when deploying BIG-IP on public cloud platforms such as Azure, AWS, and GCP. For demonstration purposes, GCP is selected as the cloud environment for deploying BIG-IP. Within the same project where the BIG-IP instance is provisioned, an additional virtual machine hosting application workloads is deployed and associated with the BIG-IP virtual server. This setup ensures that the application workloads remain protected behind BIG-IP, preventing direct external exposure. Key Resources: Please refer to the detailed guide below, which outlines the deployment of Nutanix on VMware and GCP, and demonstrates how BIG-IP delivers consistent security, traffic management, and application delivery across hybrid environments. F5 Scalable Enterprise Workload Deployments Complete Guide Conclusion This demonstration clearly illustrates that BIG-IP’s Application Delivery and Security capabilities offer a robust, scalable, and consistent solution across both multi-cloud and on-premises environments. By deploying BIG-IP across diverse platforms, organizations can achieve uniform application security, while maintaining reliable connectivity, strong encryption, and comprehensive protection for both modern and legacy workloads. This unified approach allows businesses to seamlessly scale infrastructure and address evolving user demands without sacrificing performance, availability, or security. With BIG-IP, enterprises can confidently deliver applications with resilience and speed, while maintaining centralized control and policy enforcement across heterogeneous environments. Ultimately, BIG-IP empowers organizations to simplify operations, standardize security, and accelerate digital transformation across any environment. References F5 Application Delivery and Security Platform BIG-IP Data Sheet F5 Hybrid Security Architectures: One WAF Engine, Total Flexibility Distributed Cloud (XC) Github Repo BIG-IP Github Repo
503Views2likes0Comments
