Where SASE Ends and ADSP Begins: The Dual-Plane Zero Trust Model
Introduction

Zero Trust Architecture (ZTA) mandates “never trust, always verify”: explicit policy enforcement across every user, device, network, application, and data flow, regardless of location. The challenge is that ZTA isn’t a single product. It’s a model that requires enforcement at multiple planes. Two converged platforms cover those planes: SASE at the access edge, and F5 ADSP at the application edge. This article explains what each platform does, where the boundary sits, and why both are necessary.

Two Planes, One Architecture

SASE and F5 ADSP are both converged networking and security platforms. Both deploy across hardware, software, and SaaS. Both serve NetOps, SecOps, and PlatformOps through unified consoles. But they enforce ZTA at different layers, and at different scales.

SASE secures the user/access plane: it governs who reaches the network and under what conditions, using ZTNA (Zero Trust Network Access), SWG, CASB, and DLP.

F5 ADSP secures the application plane: it governs what authenticated sessions can actually do once traffic arrives, using WAAP, bot management, API security, and ZTAA (Zero Trust Application Access).

The NIST SP 800-207 distinction is useful here: SASE houses the Policy Decision Point for network access; ADSP houses the Policy Enforcement Point at the application layer. Neither alone satisfies the full ZTA model.

The Forward/Reverse Proxy Split

The architectural difference comes down to proxy direction.

SASE is a forward proxy. Employee traffic terminates at an SSE PoP, where identity and device posture are checked before content is retrieved on the user’s behalf. SD-WAN steers traffic intelligently across MPLS, broadband, 5G, or satellite based on real-time path quality. SSE enforces CASB, RBI, and DLP policies before delivery.

F5 ADSP is a reverse proxy. Traffic destined for an application terminates at ADSP first, where L4–7 inspection, load balancing, and policy enforcement happen before the request reaches the backend.
ADSP understands application protocols, session behavior, and traffic patterns, enabling health monitoring, TLS termination, connection multiplexing, and granular authorization across BIG-IP (hardware, virtual, cloud), NGINX, BIG-IP Next for Kubernetes (BNK), and BIG-IP CNE.

The scale difference matters: ADSP handles consumer-facing traffic at orders of magnitude higher volume than SASE handles employee access. This is why full platform convergence only makes sense at SMB scale; enterprise organizations operate them as distinct, specialized systems owned by different teams.

ZTA Principles Mapped to Each Platform

ZTA requires continuous policy evaluation, not just at initial authentication but throughout every session. The table below maps NIST SP 800-207 principles to how each platform implements them.

| ZTA Principle | SASE | F5 ADSP |
|---|---|---|
| Verify explicitly | Identity + device posture evaluated per session at SSE PoP | L7 authz per request: token validation, API key checks, behavioral scoring |
| Least privilege | ZTNA grants per-application, per-session access, no implicit lateral movement | API gateway enforces method/endpoint/scope, no over-permissive routes |
| Assume breach | CASB + DLP monitors post-access behavior, continuous posture re-evaluation | WAF + bot mitigation inspects every payload; micro-segmentation at service boundaries |
| Continuous validation | Real-time endpoint compliance; access revoked on posture drift | ML behavioral baselines detect anomalous request patterns mid-session |

Use Case Breakdown

Secure Remote Access

SASE enforces ZTNA, validating identity, MFA, and endpoint compliance before granting access. F5 ADSP picks up from there, enforcing L7 authorization continuity: token inspection, API gateway policy, and traffic steering to protected backends. A compromised identity that passes ZTNA still faces ADSP’s per-request behavioral inspection.
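To make the "per request, method/endpoint/scope" idea concrete, here is a minimal Python sketch of a default-deny authorization check. It is an illustration only, not how F5 ADSP implements policy: the route table, the scope names, and the `Request` type are all hypothetical, and the token is assumed to have been validated earlier in the pipeline.

```python
from dataclasses import dataclass

# Hypothetical policy table: (HTTP method, path prefix) -> scope the token must carry.
ROUTE_SCOPES = {
    ("GET", "/api/orders"): "orders:read",
    ("POST", "/api/orders"): "orders:write",
}


@dataclass(frozen=True)
class Request:
    method: str
    path: str
    token_scopes: frozenset  # scopes taken from an already-validated token (assumption)


def authorize(req: Request) -> bool:
    """Return True only if the route is known and the token carries its scope."""
    for (method, prefix), scope in ROUTE_SCOPES.items():
        path_matches = req.path == prefix or req.path.startswith(prefix + "/")
        if req.method == method and path_matches:
            return scope in req.token_scopes
    # Default deny: a route that is not listed is never reachable.
    return False


if __name__ == "__main__":
    reader = frozenset({"orders:read"})
    print(authorize(Request("GET", "/api/orders/42", reader)))
    print(authorize(Request("POST", "/api/orders", reader)))
```

The check runs on every request rather than once per session, which is the property the table above attributes to the application plane.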
Web Application and API Protection (WAAP)

SASE pre-filters known malicious IPs and provides initial TLS inspection, reducing volumetric noise. F5 ADSP delivers full-spectrum WAAP in-path, running signature, ML, and behavioral WAF models simultaneously, where application context is fully visible. SASE cannot inspect REST API schemas, GraphQL mutation intent, or session-layer business logic. ADSP can.

Bot Management

SASE blocks bot C2 communications and applies rate limits at the network edge. F5 ADSP handles what gets through: JavaScript telemetry challenges, ML-based device fingerprinting, and human-behavior scoring that distinguishes legitimate automation (CI/CD, partner APIs) from credential stuffing and scraping, regardless of source IP reputation.

AI Security

SASE applies CASB and DLP policies to block sensitive data uploads to external AI services and discover shadow AI usage across the workforce. F5 ADSP protects custom AI inference endpoints: prompt injection filtering, per-model rate limiting, request schema validation, and encrypted traffic inspection.

The Handoff Gap, and How to Close It

The most common zero trust failure in hybrid architectures isn’t within either platform. It’s the handoff between them. ZTNA grants access, but session context (identity claims, device posture score, risk level) doesn’t automatically propagate to the application plane.

The fix is explicit context propagation: SASE injects headers carrying identity and posture signals; ADSP policy engines consume them for L7 authorization decisions. This closes the gap between “who is allowed to connect” and “what that specific session is permitted to do.”

Conclusion

SASE and F5 ADSP are not competing platforms. They are complementary enforcement planes. SASE answers: can this user reach the application? ADSP answers: what can this session do once it arrives? Organizations that deploy only one leave systematic gaps.
Together, with explicit context propagation at the handoff, they deliver the end-to-end zero trust coverage that NIST SP 800-207 actually requires.

Related Content

Why SASE and ADSP are complementary platform

Leveraging BGP and ECMP for F5 Distributed Cloud Customer Edge, Part Two
Introduction

This is the second part of our series on leveraging BGP and ECMP for F5 Distributed Cloud Customer Edge deployments. In Part One, we explored the high-level concepts, architecture decisions, and design principles that make BGP and ECMP such a powerful combination for Customer Edge high availability and maintenance operations.

This article provides step-by-step implementation guidance, including:

- High-level and low-level architecture diagrams
- Complete BGP peering and routing policy configuration in F5 Distributed Cloud Console
- Practical configuration examples for Fortinet FortiGate and Palo Alto Networks firewalls

By the end of this article, you'll have everything you need to implement BGP-based high availability for your Customer Edge deployment.

Architecture Overview

Before diving into configuration, let’s establish a clear picture of the architecture we’re implementing. We’ll examine this from two perspectives: a high-level logical view and a detailed low-level view showing specific IP addressing and AS numbers.

High-Level Architecture

The high-level architecture illustrates the fundamental traffic flow and BGP relationships in our deployment:

Key Components:

| Component | Role |
|---|---|
| Internet | External connectivity to the network |
| Next-Generation Firewall | Acts as the BGP peer and performs ECMP distribution to Customer Edge nodes |
| Customer Edge Virtual Site | Two or more CE nodes advertising identical VIP prefixes via BGP |

The architecture follows a straightforward principle: the upstream firewall establishes BGP peering with each CE node. Each CE advertises its VIP addresses as /32 routes. The firewall, seeing multiple equal-cost paths to the same destination, distributes incoming traffic across all available CE nodes using ECMP.
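The ECMP behavior described above can be illustrated with a small Python sketch. This is a simplified model, not the hashing algorithm of any particular firewall: real devices use vendor-specific hash inputs and functions, and the function name `pick_next_hop` is hypothetical. It only shows the principle that a flow is mapped deterministically onto one of the equal-cost next hops, and that removing a next hop shrinks the set of candidates.

```python
import hashlib
import ipaddress


def pick_next_hop(src_ip: str, dst_ip: str, next_hops: list) -> str:
    """Map a (source, destination) pair onto one of the equal-cost next hops."""
    if not next_hops:
        raise ValueError("no next hop available for this destination")
    # Hash the packed addresses so the same flow always lands on the same hop.
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    # Sort so the result does not depend on the order routes were learned in.
    return sorted(next_hops)[index]


if __name__ == "__main__":
    ce_nodes = ["10.154.4.160", "10.154.4.33"]  # CE1 and CE2 from this article
    vip = "192.168.100.10"
    for client in ("203.0.113.7", "203.0.113.8", "198.51.100.20"):
        print(client, "->", pick_next_hop(client, vip, ce_nodes))
    # After one CE withdraws its route, every flow uses the remaining node.
    print(pick_next_hop("203.0.113.7", vip, ["10.154.4.160"]))
```

The client addresses are documentation-range examples chosen for the sketch. Which CE a given client lands on depends on the hash, so no particular distribution should be read into the output.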
Low-Level Architecture with IP Addressing

The low-level diagram provides the specific details needed for implementation, including IP addresses and AS numbers:

Network Details:

| Component | IP Address | Role |
|---|---|---|
| Firewall (Inside) | 10.154.4.119/24 | BGP Peer, ECMP Router |
| CE1 (Outside) | 10.154.4.160/24 | Customer Edge Node 1 |
| CE2 (Outside) | 10.154.4.33/24 | Customer Edge Node 2 |
| Global VIP | 192.168.100.10/32 | Load Balancer VIP |

BGP Configuration:

| Parameter | Firewall | Customer Edge |
|---|---|---|
| AS Number | 65001 | 65002 |
| Router ID | 10.154.4.119 | Auto-assigned based on interface IP |
| Advertised Prefix | None | 192.168.100.0/24 le 32 |

This configuration uses eBGP (External BGP) between the firewall and CE nodes, with different AS numbers for each. The CE nodes share the same AS number (65002), which is the standard approach for multi-node CE deployments advertising the same VIP prefixes.

Configuring BGP in F5 Distributed Cloud Console

The F5 Distributed Cloud Console provides a centralized interface for configuring BGP peering and routing policies on your Customer Edge nodes. This section walks you through the complete configuration process.

Step 1: Configure the BGP peering

Go to: Multi-Cloud Network Connect --> Manage --> Networking --> External Connectivity --> BGP Peers & Policies

Click on Add BGP Peer, then add the following information:

- Object name
- Site where to apply this BGP configuration
- ASN
- Router ID

Here is an example of the required parameters.

Then click on Peers --> Add Item and fill in the relevant fields, adapting the parameters to your requirements.

Step 2: Configure the BGP routing policies

Go to: Multi-Cloud Network Connect --> Manage --> Networking --> External Connectivity --> BGP Peers & Policies --> BGP Routing Policies

Click on Add BGP Routing Policy.

Add a name for your BGP routing policy object and click on Configure to add the rules.

Click on Add Item to add a rule. Here we are going to allow the /32 prefixes from our VIP subnet (192.168.100.0/24).
Save the BGP Routing Policy.

Repeat the action to create another BGP routing policy with the exact same parameters except the Action Type, which should be of type Deny.

Now we have two BGP routing policies:

- One to allow the VIP prefixes (for normal operations)
- One to deny the VIP prefixes (for maintenance mode)

We still need to add a third and final BGP routing policy, in order to deny any prefixes on the CE. For that, create a third BGP routing policy with this match.

Step 3: Apply the BGP routing policies

To apply the BGP routing policies in your BGP peer object, edit the Peer and:

- Enable the BGP routing policy
- Apply the BGP routing policy objects created before for Inbound and Outbound

Fortinet FortiGate Configuration

FortiGate firewalls are widely deployed as network security appliances and support robust BGP capabilities. This section provides the minimum configuration for establishing BGP peering with Customer Edge nodes and enabling ECMP load distribution.

Step 1: Configure the Router ID and AS Number

Configure the basic BGP settings:

    config router bgp
        set as 65001
        set router-id 10.154.4.119
        set ebgp-multipath enable

Step 2: Configure BGP Neighbors

Add each CE node as a BGP neighbor:

        config neighbor
            edit "10.154.4.160"
                set remote-as 65002
                set route-map-in "ACCEPT-CE-VIPS"
                set route-map-out "DENY-ALL"
                set soft-reconfiguration enable
            next
            edit "10.154.4.33"
                set remote-as 65002
                set route-map-in "ACCEPT-CE-VIPS"
                set route-map-out "DENY-ALL"
                set soft-reconfiguration enable
            next
        end
    end

Step 3: Create Prefix List for VIP Range

Define the prefix list that matches the CE VIP range:

    config router prefix-list
        edit "CE-VIP-PREFIXES"
            config rule
                edit 1
                    set prefix 192.168.100.0 255.255.255.0
                    set ge 32
                    set le 32
                next
            end
        next
    end

Important: The ge 32 and le 32 parameters ensure we only match /32 prefixes within the 192.168.100.0/24 range, which is exactly what CE nodes advertise for their VIPs.
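The ge/le matching described in the note above can be sketched in a few lines of Python using the standard `ipaddress` module. This is only a model of the matching semantics (the route must fall inside the base prefix and its length must lie between ge and le); the function name `prefix_list_match` is hypothetical and not part of any firewall API.

```python
import ipaddress


def prefix_list_match(route: str, base: str, ge: int, le: int) -> bool:
    """Return True if `route` lies inside `base` and ge <= prefix length <= le."""
    net = ipaddress.ip_network(route)
    base_net = ipaddress.ip_network(base)
    return (
        net.version == base_net.version       # never compare IPv4 with IPv6
        and net.subnet_of(base_net)            # route must be within the base prefix
        and ge <= net.prefixlen <= le          # prefix length window
    )


if __name__ == "__main__":
    base = "192.168.100.0/24"
    for route in ("192.168.100.10/32", "192.168.100.0/24", "192.168.101.10/32"):
        print(route, prefix_list_match(route, base, ge=32, le=32))
```

With ge and le both set to 32, only host routes inside 192.168.100.0/24 match; the /24 itself and host routes from other subnets are rejected.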
Step 4: Create Route Maps

Configure route maps to implement the filtering policies:

Inbound Route Map (Accept VIP prefixes):

    config router route-map
        edit "ACCEPT-CE-VIPS"
            config rule
                edit 1
                    set match-ip-address "CE-VIP-PREFIXES"
                next
            end
        next
    end

Outbound Route Map (Deny all advertisements):

    config router route-map
        edit "DENY-ALL"
            config rule
                edit 1
                    set action deny
                next
            end
        next
    end

Step 5: Verify BGP Configuration

After applying the configuration, verify the BGP sessions and routes:

Check BGP neighbor status:

    get router info bgp summary

    VRF 0 BGP router identifier 10.154.4.119, local AS number 65001
    BGP table version is 4
    1 BGP AS-PATH entries
    0 BGP community entries

    Neighbor      V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    10.154.4.33   4  65002  2092     2365     0       0    0     00:05:33  1
    10.154.4.160  4  65002  2074     2346     0       0    0     00:14:14  1

    Total number of neighbors 2

Verify ECMP routes:

    get router info routing-table bgp

    Routing table for VRF=0
    B 192.168.100.10/32 [20/255] via 10.154.4.160 (recursive is directly connected, port2), 00:00:11, [1/0]
                        [20/255] via 10.154.4.33 (recursive is directly connected, port2), 00:00:11, [1/0]

Palo Alto Networks Configuration

Palo Alto Networks firewalls provide enterprise-grade security with comprehensive routing capabilities. This section covers the minimum BGP configuration for peering with Customer Edge nodes.

Note: This part assumes that the Palo Alto firewall is configured in the new "Advanced Routing Engine" mode. We will use the logical router named "default".
Step 1: Configure ECMP parameters

    set network logical-router default vrf default ecmp enable yes
    set network logical-router default vrf default ecmp max-path 4
    set network logical-router default vrf default ecmp algorithm ip-hash

Step 2: Configure objects IPs and firewall rules for BGP peering

    set address CE1 ip-netmask 10.154.4.160/32
    set address CE2 ip-netmask 10.154.4.33/32
    set address-group BGP_PEERS static [ CE1 CE2 ]
    set address LOCAL_BGP_IP ip-netmask 10.154.4.119/32
    set rulebase security rules ALLOW_BGP from service
    set rulebase security rules ALLOW_BGP to service
    set rulebase security rules ALLOW_BGP source LOCAL_BGP_IP
    set rulebase security rules ALLOW_BGP destination BGP_PEERS
    set rulebase security rules ALLOW_BGP application bgp
    set rulebase security rules ALLOW_BGP service application-default
    set rulebase security rules ALLOW_BGP action allow

Step 3: Palo Alto Configuration Summary (CLI Format)

    set network routing-profile filters prefix-list ALLOWED_PREFIXES type ipv4 ipv4-entry 1 prefix entry network 192.168.100.0/24
    set network routing-profile filters prefix-list ALLOWED_PREFIXES type ipv4 ipv4-entry 1 prefix entry greater-than-or-equal 32
    set network routing-profile filters prefix-list ALLOWED_PREFIXES type ipv4 ipv4-entry 1 prefix entry less-than-or-equal 32
    set network routing-profile filters prefix-list ALLOWED_PREFIXES type ipv4 ipv4-entry 1 action permit
    set network routing-profile filters prefix-list ALLOWED_PREFIXES description "Allow only m32 inside 192.168.100.0m24"
    set network routing-profile filters prefix-list DENY_ALL type ipv4 ipv4-entry 1 prefix entry network 0.0.0.0/0
    set network routing-profile filters prefix-list DENY_ALL type ipv4 ipv4-entry 1 prefix entry greater-than-or-equal 0
    set network routing-profile filters prefix-list DENY_ALL type ipv4 ipv4-entry 1 prefix entry less-than-or-equal 32
    set network routing-profile filters prefix-list DENY_ALL type ipv4 ipv4-entry 1 action deny
    set network routing-profile filters prefix-list DENY_ALL description "Deny all prefixes"
    set network routing-profile bgp filtering-profile FILTER_INBOUND ipv4 unicast inbound-network-filters prefix-list ALLOWED_PREFIXES
    set network routing-profile bgp filtering-profile FILTER_OUTBOUND ipv4 unicast inbound-network-filters prefix-list DENY_ALL
    set network logical-router default vrf default bgp router-id 10.154.4.119
    set network logical-router default vrf default bgp local-as 65001
    set network logical-router default vrf default bgp install-route yes
    set network logical-router default vrf default bgp enable yes
    set network logical-router default vrf default bgp peer-group BGP_PEERS type ebgp
    set network logical-router default vrf default bgp peer-group BGP_PEERS address-family ipv4 ipv4-unicast-default
    set network logical-router default vrf default bgp peer-group BGP_PEERS filtering-profile ipv4 FILTER_INBOUND
    set network logical-router default vrf default bgp peer-group BGP_PEERS filtering-profile ipv4 FILTER_OUTBOUND
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE1 peer-as 65002
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE1 local-address interface ethernet1/2
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE1 local-address ip svc-intf-ip
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE1 peer-address ip 10.154.4.160
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE2 peer-as 65002
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE2 local-address interface ethernet1/2
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE2 local-address ip svc-intf-ip
    set network logical-router default vrf default bgp peer-group BGP_PEERS peer CE2 peer-address ip 10.154.4.33

Step 4: Verify BGP Configuration

After committing the configuration, verify the BGP sessions and routes:

Check BGP neighbor status:

    run show advanced-routing bgp peer status logical-router default

    Logical Router: default
    ==============
    Peer Name: CE2
    BGP State: Established, up for 00:01:55
    Peer Name: CE1
    BGP State: Established, up for 00:00:44

Verify ECMP routes:

    run show advanced-routing route logical-router default

    Logical Router: default
    ==========================
    flags: A:active, E:ecmp, R:recursive, Oi:ospf intra-area, Oo:ospf inter-area, O1:ospf ext 1, O2:ospf ext 2

    destination         protocol   nexthop       distance  metric  flag  tag  age       interface
    0.0.0.0/0           static     10.154.1.1    10        10      A          01:47:33  ethernet1/1
    10.154.1.0/24       connected                0         0       A          01:47:37  ethernet1/1
    10.154.1.99/32      local                    0         0       A          01:47:37  ethernet1/1
    10.154.4.0/24       connected                0         0       A          01:47:37  ethernet1/2
    10.154.4.119/32     local                    0         0       A          01:47:37  ethernet1/2
    192.168.100.10/32   bgp        10.154.4.33   20        255     A E        00:01:03  ethernet1/2
    192.168.100.10/32   bgp        10.154.4.160  20        255     A E        00:01:03  ethernet1/2

    total route shown: 7

Implementing CE Isolation for Maintenance

As discussed in Part One, one of the key advantages of BGP-based deployments is the ability to gracefully isolate CE nodes for maintenance. Here’s how to implement this in practice.

Isolation via F5 Distributed Cloud Console

To isolate a CE node from receiving traffic, in your BGP peer object, edit the Peer and:

- Change the Outbound BGP routing policy from the one that is allowing the VIP prefixes to the one that is denying the VIP prefixes

The CE will stop advertising its VIP routes, and within seconds (based on BGP timers), the upstream firewall will remove this CE from its ECMP paths.
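One way to confirm the isolation from the firewall side is to check the per-neighbor prefix count in the FortiGate `get router info bgp summary` output. The following is a minimal Python sketch, assuming the ten-column neighbor row format used in this article; the function name `prefixes_received` and the embedded sample text are illustrative, and the parser has not been validated against other FortiOS versions or output variants.

```python
import ipaddress
from typing import Dict, Optional

# Sample neighbor rows in the summary format used in this article,
# with CE2 (10.154.4.33) isolated and sending no prefixes.
SAMPLE_SUMMARY = """\
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.154.4.33 4 65002 2070 2345 0 0 0 00:04:05 0
10.154.4.160 4 65002 2057 2326 0 0 0 00:12:46 1
Total number of neighbors 2
"""


def prefixes_received(summary: str) -> Dict[str, Optional[int]]:
    """Map each neighbor IP to its received prefix count (None if not established)."""
    result: Dict[str, Optional[int]] = {}
    for line in summary.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # not a neighbor row
        try:
            ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # header row or other text
        last = fields[-1]
        # The last column is a prefix count when established, a state name otherwise.
        result[fields[0]] = int(last) if last.isdigit() else None
    return result


if __name__ == "__main__":
    counts = prefixes_received(SAMPLE_SUMMARY)
    isolated = [peer for peer, count in counts.items() if count == 0]
    print("prefix counts:", counts)
    print("isolated peers:", isolated)
```

A neighbor that is established but reports zero received prefixes is isolated from the data path while its BGP session stays up, which is the state the maintenance procedure aims for.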
Verification During Maintenance

On your firewall, verify the route withdrawal (in this case we are using a FortiGate firewall):

    get router info bgp summary

    VRF 0 BGP router identifier 10.154.4.119, local AS number 65001
    BGP table version is 4
    1 BGP AS-PATH entries
    0 BGP community entries

    Neighbor      V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
    10.154.4.33   4  65002  2070     2345     0       0    0     00:04:05  0
    10.154.4.160  4  65002  2057     2326     0       0    0     00:12:46  1

    Total number of neighbors 2

We are no longer receiving any prefixes from the 10.154.4.33 peer.

    get router info routing-table bgp

    Routing table for VRF=0
    B 192.168.100.10/32 [20/255] via 10.154.4.160 (recursive is directly connected, port2), 00:06:34, [1/0]

And we now have only one path.

Restoring the CE in the data path

After maintenance is complete:

- Return to the BGP Peer configuration in the F5XC Console
- Restore the original export policy (permit VIP prefixes)
- Save the configuration
- On the upstream firewall, confirm that CE prefixes are received again and that ECMP paths are restored

Conclusion

This article has provided the complete implementation details for deploying BGP and ECMP with F5 Distributed Cloud Customer Edge nodes. You now have:

- A clear understanding of the architecture at both high and low levels
- Step-by-step instructions for configuring BGP in F5 Distributed Cloud Console
- Ready-to-use configurations for both Fortinet FortiGate and Palo Alto Networks firewalls
- Practical guidance for implementing graceful CE isolation for maintenance

By combining the concepts from the first article with the practical configurations in this article, you can build a robust, highly available application delivery infrastructure that maximizes resource utilization, provides automatic failover, and enables zero-downtime maintenance operations.
The BGP-based approach transforms your Customer Edge deployment from a traditional Active/Standby model into a full active topology where every node contributes to handling traffic, and any node can be gracefully removed for maintenance without impacting your users.

Cloud Apps Protection
Hello everyone,

I hope you're well. I am deploying an F5 BIG-IP and have two questions:

- Can the on-premises BIG-IP solution protect external web applications hosted on AWS and Azure?
- Can the WAF module on an on-premises BIG-IP protect mobile applications?

Would this be possible in an on-premises scenario, or do I need to opt for a Distributed Cloud or hybrid solution?

Using ExternalDNS with F5 CIS to Automate DNS on Non-F5 DNS Servers
Overview

F5 Container Ingress Services (CIS) is a powerful way to manage BIG-IP configuration directly from Kubernetes. Using CIS Custom Resource Definitions (CRDs) like VirtualServer and TransportServer, you can define rich traffic management policies in native Kubernetes manifests and have CIS automatically create and update Virtual IPs (VIPs) on BIG-IP.

One common question that comes up: “What if I want DNS records created automatically when a VirtualServer comes up, but I’m not using F5 DNS?”

This article answers exactly that question. We’ll walk through how to combine CIS VirtualServer resources with the community project ExternalDNS to automatically register DNS records on external DNS providers like AWS Route 53, Infoblox, CoreDNS, Azure DNS, and others, all without touching a zone file by hand.

Background: How DNS Automation Typically Works in Kubernetes

Before diving into the solution, it’s worth grounding ourselves in how DNS automation normally works in Kubernetes.

The Standard Pattern: Services of Type LoadBalancer

The most common pattern is:

1. Create a Service of type LoadBalancer.
2. A cloud controller (or a bare-metal equivalent like MetalLB) assigns an external IP and updates the status.loadBalancer.ingress field of the Service object.
3. ExternalDNS watches for Services of type LoadBalancer with specific annotations, reads the IP from the status field, and creates a DNS A record on your external DNS server.

This is clean, well-understood, and widely supported. ExternalDNS can also watch Ingress objects or Services of type ClusterIP and NodePort, but the LoadBalancer pattern is the most common integration point.

Where F5 CIS Fits In

CIS supports creating VIPs on BIG-IP in multiple ways:

- VirtualServer / TransportServer CRDs: Most customers prefer to use VS or TS CRDs because they expose rich BIG-IP capabilities: iRules, custom persistence profiles, health monitors, TLS termination policies, and more. This is where the DNS automation story gets more nuanced and is the focus of this article.
- Service of type LoadBalancer: CIS watches for Services of type LoadBalancer. Typically an IPAM controller or a custom annotation will be used to configure an IP address. CIS allocates a VIP on BIG-IP. This is not the focus of this article.
- Other: CIS can also use Ingress or ConfigMap resources, but these are more historical approaches, not recommended for new deployments, and out of scope for this article.

The Gap: F5 CRDs and Non-F5 DNS

CIS does include its own ExternalDNS CRD (not to be confused with the community project of the same name). However, F5’s built-in ExternalDNS CRD only supports F5 DNS (BIG-IP DNS / GTM). If you’re using Route 53, Infoblox, PowerDNS, or any other DNS provider, you need a different approach. That’s where the community ExternalDNS project comes in.

The Solution: VirtualServer + Service of Type LoadBalancer + ExternalDNS

The trick is straightforward once you see it: CIS can manage a VIP on BIG-IP via a VirtualServer CRD while simultaneously updating the status field of a Service of type LoadBalancer. ExternalDNS then reads that status field and creates DNS records. Let’s walk through the manifests.

Step-by-Step Walkthrough

Step 1: Deploy Your Application

A standard Deployment, nothing special here.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
      namespace: my-namespace
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: my-app
              image: my-app:latest
              ports:
                - containerPort: 8080

Step 2: Create the Service of Type LoadBalancer

This Service is the linchpin of the whole solution. It serves three purposes:

- It acts as a target for the CIS VirtualServer pool (either via NodePort or directly to pod IPs in cluster mode).
- CIS updates its status.loadBalancer.ingress field with the BIG-IP VIP address.
- ExternalDNS reads its status and annotations to create a DNS record.
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app-svc
      namespace: my-namespace
      annotations:
        # ExternalDNS annotation: tells ExternalDNS what hostname to register
        external-dns.alpha.kubernetes.io/hostname: myapp.example.com
        # Optional: set a custom TTL
        external-dns.alpha.kubernetes.io/ttl: "60"
    spec:
      selector:
        app: my-app
      ports:
        - port: 80
          targetPort: 8080
          protocol: TCP
      type: LoadBalancer
      # Prevent other LB controllers from acting on this Service
      loadBalancerClass: f5.com/bigip
      # Do not allocate NodePort endpoints (more on this below)
      allocateLoadBalancerNodePorts: false

Two fields here deserve extra explanation.

loadBalancerClass: f5.com/bigip

In a typical cluster, multiple controllers may be watching for Services of type LoadBalancer: MetalLB, the cloud provider controller, etc. If you’re using CIS VirtualServer CRDs to manage the VIP (rather than having CIS act directly as a LoadBalancer controller for this Service), you likely don’t want any of those other controllers touching this Service.

Setting loadBalancerClass to a value that no other running controller claims means this Service will be ignored by all LB controllers except the one that explicitly handles that class. In this pattern, you want CIS to "see" this Service, but not other controllers. So use the CIS argument --load-balancer-class=f5.com/bigip here.

Note: The value of loadBalancerClass in your Service should match the value of load-balancer-class in your CIS deployment. The goal is to prevent unintended controllers from assigning IPs or creating cloud load balancers for this Service.

allocateLoadBalancerNodePorts: false

By default, LoadBalancer Services in Kubernetes allocate NodePort endpoints. This means traffic could reach your pods directly via NodeIP:NodePort, bypassing BIG-IP, security policies, and your iRules. Setting allocateLoadBalancerNodePorts: false prevents this.
The Service effectively behaves like a ClusterIP service in terms of access: the only way to reach it from outside the cluster is via the BIG-IP VIP. This is the right posture when:

- Your CIS deployment uses --pool-member-type=cluster, sending traffic directly to pod IPs.
- You want BIG-IP to be the sole external entry point for policy enforcement.

Step 3: Create the VirtualServer CRD

Now we define the VirtualServer. Note how it references the Service by name in the pool configuration:

    apiVersion: cis.f5.com/v1
    kind: VirtualServer
    metadata:
      name: my-app-vs
      namespace: my-namespace
      labels:
        f5cr: "true"
    spec:
      host: myapp.example.com
      ipamLabel: prod  # Optional: use F5 IPAM Controller for IP allocation
      # virtualServerAddress: "10.1.10.50"  # Or specify IP directly
      pools:
        - path: /
          service: my-app-svc
          servicePort: 80
          monitor:
            type: http
            send: "GET / HTTP/1.1\r\nHost: myapp.example.com\r\n\r\n"
            recv: ""
            interval: 10
            timeout: 10

When CIS processes this VirtualServer, it:

1. Creates a VIP on BIG-IP.
2. Configures the BIG-IP pool with the backends from my-app-svc.
3. Writes the VIP IP address back into my-app-svc’s status.loadBalancer.ingress field.

That last step is what makes the whole chain work.

IP Address: Specify Directly or Use F5 IPAM Controller

You have two options for IP allocation:

Option A: Specify the IP directly in the VirtualServer manifest:

    spec:
      virtualServerAddress: "10.1.10.50"

This is simple and predictable. Good for static, well-planned deployments.

Option B: Use the F5 IPAM Controller:

    spec:
      ipamLabel: prod

The F5 IPAM Controller watches for CIS resources with ipamLabel annotations and allocates IPs from a configured range. CIS then picks up the allocated IP automatically. This is ideal when you want full automation without managing IP addresses in YAML files.
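To illustrate what "allocates IPs from a configured range" per label means, here is a toy Python allocator. It is a conceptual sketch only and does not reflect how the F5 IPAM Controller is implemented or configured: the class name `LabelIpam`, the range for the `prod` label, and the resource keys are all hypothetical.

```python
import ipaddress
from typing import Dict, Tuple


class LabelIpam:
    """Toy label-based allocator: each label owns an inclusive IP range."""

    def __init__(self, ranges: Dict[str, Tuple[str, str]]):
        self._ranges = {
            label: (ipaddress.ip_address(first), ipaddress.ip_address(last))
            for label, (first, last) in ranges.items()
        }
        # (label, resource key) -> allocated address
        self._allocated: Dict[Tuple[str, str], ipaddress.IPv4Address] = {}

    def allocate(self, label: str, key: str) -> str:
        """Return the IP for `key`, allocating the first free one in the label's range."""
        if (label, key) in self._allocated:
            return str(self._allocated[(label, key)])  # stable across repeated calls
        first, last = self._ranges[label]
        used = {ip for (lbl, _), ip in self._allocated.items() if lbl == label}
        candidate = first
        while candidate <= last:
            if candidate not in used:
                self._allocated[(label, key)] = candidate
                return str(candidate)
            candidate += 1
        raise RuntimeError(f"range for label {label!r} is exhausted")


if __name__ == "__main__":
    ipam = LabelIpam({"prod": ("10.1.10.50", "10.1.10.52")})  # hypothetical range
    print(ipam.allocate("prod", "my-namespace/my-app-vs"))
    print(ipam.allocate("prod", "my-namespace/other-vs"))
```

The point of the sketch is the division of responsibility: whoever owns the allocator owns the ranges, while a resource only names a label and receives a stable address.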
Step 4: Verify CIS Updates the Service Status

After CIS processes the VirtualServer, check the Service:

    kubectl get svc my-app-svc -n my-namespace -o jsonpath='{.status.loadBalancer.ingress}'

You should see output like:

    [{"ip":"10.1.10.50"}]

This is the IP that ExternalDNS will use to create the DNS record.

Step 5: ExternalDNS Does Its Job

With ExternalDNS deployed and configured for your DNS provider (Route 53, Infoblox, etc.), it will:

1. Discover my-app-svc because it’s of type LoadBalancer with an external-dns.alpha.kubernetes.io/hostname annotation.
2. Read 10.1.10.50 from status.loadBalancer.ingress.
3. Create an A record: myapp.example.com → 10.1.10.50.

ExternalDNS handles the rest automatically, including updates if the IP changes.

A minimal ExternalDNS deployment for Route 53 would look like:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: external-dns
      namespace: external-dns
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: external-dns
      template:
        metadata:
          labels:
            app: external-dns
        spec:
          serviceAccountName: external-dns
          containers:
            - name: external-dns
              image: registry.k8s.io/external-dns/external-dns:v0.14.0
              args:
                - --source=service
                - --domain-filter=example.com
                - --provider=aws
                - --aws-zone-type=public
                - --registry=txt
                - --txt-owner-id=my-cluster

Refer to the ExternalDNS documentation for provider-specific configuration (IAM roles for Route 53, credentials for Infoblox, etc.).

Putting It All Together: Summary of the Architecture

Key Considerations and Design Choices

When to Use This Pattern vs. CIS as a LoadBalancer Controller

CIS can act directly as a LoadBalancer controller, watching Services of type LoadBalancer and creating VIPs on BIG-IP without any VirtualServer CRD involvement. If that’s sufficient for your needs, it’s simpler. ExternalDNS works with that mode natively, since CIS updates status.loadBalancer.ingress in both cases.
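Because ExternalDNS only depends on the Service's annotation and status, the record derivation described in Step 5 can be modeled in a few lines of Python. This is a simplified illustration of that logic, not ExternalDNS source code: it ignores TTLs, ownership TXT records, domain filters, and hostname-based ingress entries, and the function name `desired_a_records` is hypothetical.

```python
from typing import List, Tuple

HOSTNAME_ANNOTATION = "external-dns.alpha.kubernetes.io/hostname"


def desired_a_records(service: dict) -> List[Tuple[str, str]]:
    """Derive (hostname, ip) pairs from a Service object given as a plain dict."""
    if service.get("spec", {}).get("type") != "LoadBalancer":
        return []
    annotations = service.get("metadata", {}).get("annotations") or {}
    # The annotation may carry several comma-separated hostnames.
    hostnames = [
        name.strip()
        for name in annotations.get(HOSTNAME_ANNOTATION, "").split(",")
        if name.strip()
    ]
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    ips = [entry["ip"] for entry in ingress if "ip" in entry]
    return [(name, ip) for name in hostnames for ip in ips]


if __name__ == "__main__":
    svc = {
        "metadata": {"annotations": {HOSTNAME_ANNOTATION: "myapp.example.com"}},
        "spec": {"type": "LoadBalancer"},
        "status": {"loadBalancer": {"ingress": [{"ip": "10.1.10.50"}]}},
    }
    print(desired_a_records(svc))
```

Until the status field is populated the function returns an empty list, which mirrors why no record appears before the VIP address has been written back to the Service.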
Use the VirtualServer CRD approach when you need:

- Custom iRules or iApps on the VIP
- Advanced persistence profiles
- Fine-grained TLS termination control
- Traffic splitting or A/B routing policies
- Any BIG-IP capability that doesn’t map directly to Kubernetes Service semantics

allocateLoadBalancerNodePorts: false, and When It Applies

This setting is appropriate when your CIS deployment uses --pool-member-type=cluster. In cluster mode, BIG-IP sends traffic directly to pod IPs, not through NodePort endpoints. Disabling NodePort allocation:

- Prevents back-door access to your application via NodeIP:NodePort
- Reduces iptables rule sprawl on your nodes
- Aligns with a clean security boundary where BIG-IP is the sole ingress

If your CIS deployment uses --pool-member-type=nodeport, you should not set allocateLoadBalancerNodePorts: false, as CIS will need those NodePorts to forward traffic.

F5 IPAM Controller Integration

The F5 IPAM Controller pairs particularly well with this pattern. Rather than managing VIP IP addresses in your VirtualServer manifests, IPAM handles allocation from a configured pool. This means:

- Platform teams manage IP ranges in the IPAM controller config.
- Application teams simply specify an ipamLabel in their VirtualServer manifest.
- CIS picks up the IPAM-assigned IP and writes it to the Service status automatically.

The ExternalDNS chain remains identical regardless of whether the IP comes from IPAM or is statically assigned.

Frequently Asked Questions

Q: Can I use this pattern with TransportServer CRDs instead of VirtualServer?

Yes. CIS similarly updates the status of a referenced Service when using TransportServer. The same approach applies.

Q: What if I want ExternalDNS to create a CNAME instead of an A record?

Use the external-dns.alpha.kubernetes.io/target annotation on the Service to override the IP with a hostname, causing ExternalDNS to create a CNAME. Refer to ExternalDNS documentation for specifics.

Q: Can I use multiple hostnames for the same VirtualServer?
List the hostnames as comma-separated values in a single external-dns.alpha.kubernetes.io/hostname annotation (annotation keys are unique per object, and ExternalDNS supports comma-separated values), or create additional Services pointing to the same pods.

Conclusion

Combining F5 CIS VirtualServer CRDs with the community ExternalDNS project gives you the best of both worlds: rich BIG-IP traffic management via CIS, and flexible, provider-agnostic DNS automation via ExternalDNS.

The core insight is simple: CIS writes the BIG-IP VIP IP address back into the Kubernetes Service status field, and ExternalDNS reads from that same field. By using loadBalancerClass and allocateLoadBalancerNodePorts: false, you ensure the Service is a clean "status carrier" that doesn't accidentally expose your application through unintended paths.

Whether you assign VIP IPs statically in your manifests or use the F5 IPAM Controller for full automation, this pattern integrates naturally into any Kubernetes-native GitOps workflow.

Additional Resources

- F5 CIS Documentation
- F5 CIS VirtualServer CRD Reference
- F5 IPAM Controller on GitHub
- ExternalDNS on GitHub
- ExternalDNS: Service Source Documentation
- Kubernetes: LoadBalancer Service specification

Beyond Five Nines: SRE Practices for BIG-IP Cloud-Native Network Functions
Introduction

Five nines (99.999%) availability gets the headline. But any SRE who has been on-call for a telecom user-plane incident knows that uptime percentages don't capture the full picture. A NAT pool exhausted at 99.98% availability can still affect millions of subscribers. A DNS cache miss storm at 99.99% uptime can still degrade application performance across an entire region.

This article explores how SRE principles (specifically SLIs, SLOs, error budgets, and toil reduction) apply to cloud-native network functions (CNFs) deployed with F5 BIG-IP Cloud-Native Edition. The goal is practical: give SRE teams and platform engineers the vocabulary and patterns to instrument, operate, and evolve these functions the same way they operate any other Kubernetes workload.

Why subscriber-centric SLIs beat infrastructure metrics

Traditional network operations relies on infrastructure health metrics: CPU utilisation, interface counters, and process uptime. These metrics are necessary, but they answer the wrong question. They tell you the system's perspective, not the subscriber's.

SRE flips this. An SLI is a direct quantitative measurement of user-visible service behavior. For a CNF in the 5G user plane, subscriber-centric SLIs look like:

- GTP-U flow forwarding success rate (not just firewall process uptime)
- NAT session establishment latency at P95 (not just CPU idle)
- DNS query response rate and cache hit ratio (not just resolver process health)
- Packet drop rate at the N6/Gi-LAN boundary (not just interface RX errors)

BIG-IP CNE exposes these metrics natively through Prometheus-compatible endpoints on each CNF pod, meaning your existing Kubernetes observability stack, whether that is Prometheus + Grafana, Datadog, or a vendor-managed observability platform, can consume them without custom instrumentation.

Speaking as a consultant: if your monitoring today alerts on CNF pod restarts before it alerts on subscriber-impacting packet drops, your SLI hierarchy is inverted.
Fix the SLI definition first, then tune your alerting.

SLIs and SLOs: the measurement-to-promise pipeline

The distinction between SLIs and SLOs is operational, not semantic. An SLI is what you observe; an SLO is what you commit to. Together, they create an error budget (your explicit allowance for controlled unreliability). Table 1 summarizes how the SLI and SLO relate and why each aspect matters to SREs.

Table 1: SLI vs SLO, what each term means operationally

Aspect | SLI (Measurement) | SLO (Target) | Why it matters to SREs
Purpose | Reports reality | Sets reliability goal | Drives team alignment
Example | "99.92% queries succeeded" | "≥99.99% over 30d" | Error budget = 0.01%
Burn rate | Changes minute-by-minute | Calculated over window | Feeds alerting cadence
Action | Feeds dashboards/alerts | Gates releases | Halts or accelerates rollouts

The error budget is the unreliability your SLO permits (100% minus the target); the gap between your SLI (what you measure) and your SLO (what you target) shows how much of that budget remains. For a DNS CNF with an SLO of 99.99% of queries answered within 20ms over 30 days, the error budget is 0.01% of the window, or about 4.32 minutes of allowable degradation per 30 days. That budget governs rollout velocity: when the budget is healthy, teams can ship faster; when it burns through, all changes halt until the system stabilizes.

Example: Set your SLO as "99.99% of GTP-U flows processed within 2ms." Your error budget is 0.01% of flows, or roughly 52 minutes of allowable impact per year. A CNF upgrade whose rollout drops 0.005% of the year's flows consumes half your annual budget. That is the signal your CI/CD pipeline should be gating on, not deployment success.

Golden signals mapped to BIG-IP CNE metrics

The SRE golden signals (latency, traffic, errors, saturation) map directly to BIG-IP CNE telemetry. Table 2 gives practical SLI examples, SLO targets, and the operator actions each signal should trigger.
Table 2: Golden signals as operational SLIs for BIG-IP CNE

Golden Signal | BIG-IP CNE SLI Example | SLO Target | Operator Action
Latency | P95 GTP-U at Edge Firewall CNF | ≤ 2ms for 99.99% flows | Scale pods / tune policy
Traffic | Packets/sec per CNF pod | Autoscale to 4M+ pps | HPA trigger or pre-scale
Errors | NAT session failure rate | < 0.01% over 30 days | Halt rollout, root-cause
Saturation | Port/CPU threshold breach | Proactive alert at 80% | Drain + horizontal scale

These SLIs flow into the same Prometheus/Grafana stack your Kubernetes platform team already operates. A single dashboard can surface both pod-level Kubernetes metrics and CNF user-plane metrics, creating a shared view of reliability that eliminates the classic "my side is green" response to incidents.

Observability implementation: metrics, logs, and traces

BIG-IP CNE exports telemetry natively into Kubernetes observability pipelines. Here is what that looks like in practice for each pillar of observability:

Pillar | Description
Metrics | Each CNF pod exposes metrics endpoints compatible with Prometheus scraping. Key metric families include flow_processing_latency_seconds (histogram), nat_session_failures_total (counter), dns_cache_hit_ratio (gauge), and pod_packet_drop_total (counter). These feed directly into your SLI calculations.
Logs | CNF logs emit structured JSON to stdout, consumable by Fluentd, Fluent Bit, or any log aggregator in your cluster. Event chains like NAT pool exhaustion produce correlated log sequences that enable root-cause analysis without SSH access to the CNF pod.
Traces | For distributed request tracing (for example, following a DNS query from UE through the DNS CNF to upstream resolvers) BIG-IP CNE supports OpenTelemetry trace propagation. This is particularly useful when debugging latency spikes in multi-CNF traffic chains where the delay source is ambiguous.
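To make the link between counters and SLOs concrete, here is a minimal sketch of how two samples of a failure counter could be turned into an SLI, a burn rate, and a budget expressed in minutes. It assumes a companion counter of total attempts (called nat_session_attempts_total here, a hypothetical name that does not appear in the metric list above) and uses the common definition of burn rate as the observed failure ratio divided by the ratio the SLO allows. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """One scrape of two monotonically increasing counters."""
    failures: float  # e.g. nat_session_failures_total
    attempts: float  # hypothetical nat_session_attempts_total (assumed to exist)


def failure_ratio(start: CounterSample, end: CounterSample) -> float:
    """Fraction of attempts that failed between two scrapes."""
    attempts = end.attempts - start.attempts
    if attempts <= 0:
        return 0.0  # no traffic in the interval, so nothing failed
    return (end.failures - start.failures) / attempts


def burn_rate(observed_failure_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.9999")
    return observed_failure_ratio / (1.0 - slo_target)


def budget_minutes(slo_target: float, window_days: float) -> float:
    """Minutes of full unavailability the SLO tolerates over the window."""
    return (1.0 - slo_target) * window_days * 24 * 60


start = CounterSample(failures=100, attempts=1_000_000)
end = CounterSample(failures=300, attempts=2_000_000)

ratio = failure_ratio(start, end)
print(f"failure ratio: {ratio:.6f}")
print(f"burn rate vs 99.99% SLO: {burn_rate(ratio, 0.9999):.2f}x")
print(f"30-day budget: {budget_minutes(0.9999, 30):.2f} minutes")
```

In a real deployment the same arithmetic would more likely live in Prometheus recording rules than in application code; the Python form is only meant to show the calculation step by step.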
Config note: To wire CNF metrics into an existing Prometheus stack, annotate the CNF pod spec with prometheus.io/scrape: "true" and prometheus.io/port matching the CNF metrics port (this assumes your Prometheus scrape configuration honors these annotations). No additional expertise required.

Error budgets as a deployment gate

SRE uses error budgets to make deployment velocity a function of reliability, not a function of the change calendar. Here is how this applies to CNF operations with BIG-IP CNE:

- Healthy budget (burn rate < 1x): Teams can accelerate CNF feature delivery. New CRD configurations, Helm chart upgrades, and policy changes proceed with normal review cycles.
- Elevated burn (burn rate 1–5x): All non-emergency CNF changes require additional review. Automated rollback thresholds tighten.
- Budget exhausted: CNF changes halt. The SRE team shifts 100% focus to reliability work until the budget recovers. This is a policy decision, not a technical one.

In practice, BIG-IP CNE supports this through Kubernetes-native mechanisms:

- Helm-managed upgrades can be gated by pre-upgrade hooks that query current SLI state.
- CRD-based configuration changes can be rolled out with canary patterns using standard Kubernetes deployment strategies.
- HPA (Horizontal Pod Autoscaler) rules can be tied directly to CNF-emitted metrics rather than generic CPU thresholds.

Toil reduction: from runbooks to controllers

SRE defines toil as manual, repetitive, automatable operational work that scales with traffic volume but produces no enduring value. In telecom CNF operations, toil accumulates fast:

- Manual NAT pool expansion during traffic peaks
- SSH-based policy pushes for firewall rule updates
- Ticket-driven DNS configuration changes
- Manual health checks before and after maintenance windows

BIG-IP CNE addresses this through Kubernetes-native control loops. Configuration is declarative: CNF policies are expressed as custom resources, defined by Custom Resource Definitions (CRDs), and applied via kubectl or GitOps pipelines.
Kubernetes controllers reconcile the actual CNF state to the desired state defined in Git, eliminating configuration drift and manual intervention.

Example: Instead of a runbook step that says "SSH to the CGNAT CNF and add 1000 ports to poolX," your GitOps pipeline applies a CRD update that the CNF controller reconciles automatically. The audit trail is a Git commit, not a change ticket.

SRE teams typically target a 50/50 split between operational work and engineering work. CNF operations that rely on manual runbooks push this ratio toward 70–80% operations. Declarative CNF management via CRDs and Helm shifts it back, freeing SRE capacity for SLO definition, observability improvement, and automation engineering.

Dissolving the platform/network operations boundary

Figure 1: SRE bridges the Kubernetes platform team and telecom network operations team through shared SLIs and a unified observability stack.

The most persistent operational problem in cloud-native telecom is not technical; it is organizational. Kubernetes platform teams and telecom network operations teams measure different things, escalate through different processes, and use different tooling. When a GTP-U latency spike occurs, Kubernetes teams check pod health and cluster metrics; telecom teams check interface counters and policy logs. Neither has the full picture.

SRE resolves this by requiring both teams to operate against the same SLIs. When CNF and cluster metrics flow into the same observability stack:

- A single SLI can span pods, nodes, and network functions
- Rollouts, autoscaling, and maintenance windows are gated by shared error budgets rather than siloed change calendars
- Kubernetes engineers declare CNF configurations as code; telecom teams define SLOs that consume those functions as building blocks

The result is that when an SLI burns through an error budget (for example, a 0.02% GTP-U drop rate) both teams respond to the same signal.
Kubernetes teams scale pods; telecom teams tune policies. No finger-pointing. Shared accountability for the packet-level truth that subscribers experience.

5G N6/Gi-LAN consolidation: a concrete SRE use case

Figure 2: BIG-IP CNE consolidating SGi-LAN/N6 functions (Edge Firewall, CGNAT, DNS) as Kubernetes-native CNFs alongside the 5G core.

A common deployment pattern for BIG-IP CNE is N6/Gi-LAN consolidation, where edge firewalling, CGNAT, DNS, and DDoS protection are deployed as CNFs alongside the 5G core rather than as discrete physical or virtual appliances. From an SRE perspective, this architecture enables composite SLOs that span multiple CNFs in a single traffic chain:

- Edge Firewall CNF: SLI = packet drop rate at N6 boundary. SLO = <0.001% drops over 30 days.
- CGNAT CNF: SLI = NAT session establishment success rate. SLO = 99.99% sessions established within 5ms.
- DNS CNF: SLI = query response latency at P95. SLO = P95 < 20ms with >80% cache hit ratio.

Composite SLOs then drive autoscaling and routing decisions based on real service behavior rather than static capacity plans. When the DNS cache hit ratio drops below threshold, the autoscaler adds DNS CNF replicas driven by the CNF-emitted metric, not a manual capacity review.

Conclusion: Path to AI-native 6G

The 6G architecture direction (disaggregated, software-defined network functions dynamically placed across distributed edge locations) requires SRE disciplines at the foundation, not bolted on later. Networks that must adapt in near-real time cannot be operated by humans with runbooks.

BIG-IP CNE was designed with this trajectory in mind. The same Kubernetes-native architecture that enables SRE practices for 5G today (declarative configuration, horizontal scaling, native observability) is the foundation for AI-driven traffic steering, dynamic policy enforcement, and intent-based networking in 6G environments.
For platform teams making architecture decisions now: investing in SLO definition and observability instrumentation for current CNF deployments is not just operational hygiene. It is building the data infrastructure that AI-native operations will require.

Key takeaways

- Define SLIs at the subscriber boundary, not the infrastructure boundary
- Use error budgets to gate CNF rollout velocity. Make it a CI/CD policy, not a manual decision
- Consume CNF Prometheus metrics in your existing Kubernetes observability stack, no separate tooling required
- Declarative CRD-based CNF management via GitOps is the primary toil-reduction lever
- Shared SLIs between Kubernetes platform and telecom operations teams eliminate the organizational boundary that causes most major incidents

Related content

- BIG-IP Next for Kubernetes CNFs - DNS walkthrough
- BIG-IP Next for Kubernetes CNFs deployment walkthrough
- From virtual to cloud-native, infrastructure evolution
- Visibility for Modern Telco and Cloud‑Native Networks
- BIG-IP Next Cloud-Native Network Functions (CNFs)

APM checking for URI
I have created an APM policy that checks whether the URI contains a specific path. If the URI is anything else, the fallback branch should simply pass the traffic through.

Example: for https://www.fubar.com/admin, APM looks for /admin and, if it is present, the traffic goes to the next step, which is a certificate prompt. If the URI contains anything else, the fallback branch should be used to continue. For example, for https://www.fubar.com/documents/invenioHealth, APM should take the fallback and just let the traffic pass.

The problem: for https://www.fubar.com/documents/invenioHealth, the F5 sends a 302 instead of just passing the traffic through, and then sends a FIN/ACK back to the source.

VMware VKS integration with F5 BIG-IP and CIS
Introduction

vSphere Kubernetes Service (VKS) is the Kubernetes runtime built directly into VMware Cloud Foundation (VCF). With CNCF-certified Kubernetes, VKS enables platform engineers to deploy and manage Kubernetes clusters while leveraging a comprehensive set of cloud services in VCF. Cloud admins benefit from support for N-2 Kubernetes versions, enterprise-grade security, and simplified lifecycle management for modern app adoption.

As with other Kubernetes platforms, the integration with BIG-IP is done through the Container Ingress Services (CIS) component, which is hosted in the Kubernetes platform and allows configuring the BIG-IP through the Kubernetes API. Under the hood, it uses the F5 AS3 declarative API.

Note from the picture that BIG-IP integration with VKS is not limited to BIG-IP's load balancing capabilities and that most BIG-IP features can be configured using this integration. These features include:

- Advanced TLS encryption, including safe key storage with Hardware Security Module (HSM) or Network & Cloud HSM support.
- Advanced WAF, L7 bot and API protection.
- L3-L4 high-performance firewall with IPS for protocol conformance.
- Behavioral DDoS protection with cloud scrubbing support.
- Visibility into TLS traffic for inspection with third-party solutions.
- Identity-aware ingress with Federated SSO and integration with leading MFAs.
- AI inference and agentic support thanks to JSON and MCP protocol support.

Planning the deployment of CIS for VMware VKS

The installation of CIS in VMware VKS is performed through the standard Helm charts facility. The platform owner needs to determine beforehand:

- Whether the deployment is hosted on a vSphere (VDS) network or an NSX network. Keep in mind that on an NSX network, VKS does not currently allow placing the load balancers in the same segment as the VKS cluster. No special considerations apply when hosting BIG-IP in a vSphere (VDS) network.
- Whether this is a single-cluster or a multi-cluster deployment. When using the multi-cluster option and clusterIP mode (only possible with Calico in VKS), keep in mind that the POD networks of the clusters cannot have overlapping prefixes.
- Which Kubernetes networking (CNI) to use. CIS supports both CNIs that VKS supports: Antrea (default) and Calico. From the CIS point of view, the CNI is only relevant when sending traffic directly to the PODs. See the next item.
- Which integration with the CNI is desired between the BIG-IP and VKS:

  NodePort mode
  Applications are made discoverable using Services of type NodePort. The BIG-IP sends the traffic to the Nodes' IPs, where it is redistributed to the PODs depending on the traffic policies of the Service. This is CNI agnostic; any CNI can be used.

  Direct-to-POD mode
  Applications are made discoverable using Services of type ClusterIP. Note that the CIS integration with Antrea uses Antrea's nodePortLocal mechanism, which requires an additional annotation in the Service declaration. See the CIS VKS page in F5 CloudDocs for details. Antrea's nodePortLocal mechanism allows sending traffic directly to the POD without actually using the POD IP address. This is especially relevant for NSX because it allows reaching the PODs without redistributing the POD IPs across the NSX network, which is not allowed. When using vSphere (VDS) networking, either Antrea's nodePortLocal or clusterIP with Calico can be used. A less common alternative is hostNetwork POD networking; it is infrequent because it requires privileges for the application PODs or ingress controllers. Network-wise, it behaves similarly to nodePortLocal, but without the automatic allocation of ports.

- Whether the deployment is a single-tier or a two-tier deployment. A single-tier deployment is one where the BIG-IP sends the traffic directly to the application PODs.
This has a simpler traffic flow and easier persistence and end-to-end monitoring. A two-tier deployment sends the traffic to an ingress controller POD instead of the application PODs. This ingress controller could be Contour, NGINX Gateway Fabric, Istio or an API gateway. This type of deployment offers the ultimate scalability and provides additional segregation between the BIG-IPs (typically owned by NetOps) and the Kubernetes cluster (typically owned by DevOps).

Once CIS is deployed, applications can be published using either the standard Kubernetes Ingress resource or F5's Custom Resources. The latter is the recommended way because it exposes most of the BIG-IP's capabilities. Details on the Ingress resource and F5 custom annotations can be found here. Details on the F5 CRDs can be found here. Please note that at the time of this writing Antrea nodePortLocal doesn't support the TransportServer CRD. Please consult your F5 representative for its availability.

Detailed instructions on how to deploy CIS for VKS can be found on this CIS VKS page in F5 CloudDocs.

Application-aware MultiCluster support

MultiCluster allows exposing applications that are hosted in multiple VKS clusters and publishing them in a single VIP. BIG-IP and CIS are in charge of:

- Discovering where the PODs of the applications are hosted. Note that a given application doesn't need to be available in all clusters.
- Deciding, upon receiving a request for a given application, to which cluster and Node/POD the request has to be sent. This decision is based on the weight of each cluster, the application availability and the load balancing algorithm being applied.

Single-tier or two-tier architectures are possible. NodePort and ClusterIP modes are possible as well. Note that at the time of this writing, Antrea in ClusterIP mode (nodePortLocal) is not supported. Please consult your F5 representative for availability of this feature.
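As a rough illustration of how cluster weight and application availability can interact, the sketch below computes the share of requests each cluster would receive if traffic were split purely in proportion to weight among the clusters where the application is available. This is an assumption made for the example, not a description of CIS internals: the real decision also depends on the configured load balancing algorithm, and the cluster names and the function cluster_traffic_shares are hypothetical.

```python
def cluster_traffic_shares(weights: dict[str, int], available: set[str]) -> dict[str, float]:
    """Split traffic by weight across clusters that currently host the application.

    Illustrative model only: clusters with no available PODs, or with a
    non-positive weight, receive no traffic.
    """
    eligible = {name: w for name, w in weights.items() if name in available and w > 0}
    total = sum(eligible.values())
    if total == 0:
        return {}  # the application is not reachable in any weighted cluster
    return {name: w / total for name, w in eligible.items()}


# Hypothetical clusters: the application runs in vks-a and vks-b, but not in vks-c.
weights = {"vks-a": 3, "vks-b": 1, "vks-c": 1}
shares = cluster_traffic_shares(weights, available={"vks-a", "vks-b"})
print(shares)
```

The point of the model is that a cluster where the application is absent simply drops out of the calculation, which is why an application does not need to be deployed in every cluster.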
Considerations for NSX

Load balancers cannot be placed in the same VPC segment as the VMware VKS cluster. They can be placed in a separate VPC segment of the same VPC gateway, as shown in the next diagram.

In this arrangement the BIG-IP can be configured either in 1NIC mode or as a regular deployment, in which case the MGMT interface is typically configured through an infrastructure VLAN instead of an NSX segment. The data segment only needs enough prefixes to host the self-IPs of the BIG-IP units.

The prefixes of the VIPs might not belong to the data segment's subnet. These additional prefixes have to be configured as static routes in the VPC gateway, and route redistribution must be enabled for them.

Because the load balancers are not in line with the traffic flow towards the VKS cluster, SNAT is required. When using SNAT pools, their prefixes can optionally be configured as additional prefixes of the data segment, like the VIPs.

Specifically for Calico, clusterIP mode cannot be used in NSX because it would require the BIG-IP to be in the same VPC segment as VMware VKS. Note also that BGP multi-hop is not feasible either, because it would require the POD cluster network prefixes to be redistributed through NSX, which is not possible.

Conclusion and final remarks

F5 BIG-IP provides unmatched deployment options and features for VMware VKS; these include:

- Support for all VKS CNIs, which allows sending the traffic directly to the PODs instead of using hostNetwork (which implies a security risk) or the common NodePort, which can incur an additional kube-proxy indirection.
- Both 1-tier and 2-tier arrangements (or both types simultaneously) are possible.
- F5's Container Ingress Services provides the ability to handle multiple VMware VKS clusters with application-aware VIPs. This is a unique feature in the industry.
- Securing applications with the wide range of L3 to L7 security features provided by BIG-IP, including Advanced WAF and Application Access.
- To complete the circle, this integration also provides IP address management (IPAM), which gives DevOps teams great flexibility.

All of these are available regardless of the form factor of the BIG-IP (Virtual Edition, appliance or chassis), allowing great scalability and multi-tenancy options. In NSX deployments, the recommended form factor is Virtual Edition in order to connect to the NSX segments.

We look forward to hearing your experience and feedback on this article.

High Availability for F5 NGINX Instance Manager in AWS
Introduction

F5 NGINX Instance Manager gives you a centralized way to manage NGINX Open Source and NGINX Plus instances across your environment. It's ideal for disconnected or air-gapped deployments, with no need for internet access or external cloud services.

NGINX Instance Manager keeps gaining features. It now includes many capabilities for managing configurations, such as NGINX config versioning and templating, F5 WAF for NGINX policy and signature management, monitoring of NGINX metrics and security events, and a rich API to support external automation.

As NGINX Instance Manager becomes increasingly important in the management of disconnected NGINX fleets, the need for high availability increases. This article explores how we can use Linux clustering to provide high availability for NGINX Instance Manager across two availability zones in AWS.

Core Technologies

Core technologies used in this HA architecture design include:

- Amazon Elastic Compute Cloud (EC2) instances - virtual machines rented inside AWS that can be used to host applications, like NGINX Instance Manager.
- Pacemaker - an open-source high availability resource manager used in Linux clusters since 2004. Pacemaker is generally deployed with the Corosync Cluster Engine, which provides cluster node communication, membership tracking and cluster quorum.
- Amazon Elastic File System (EFS) - a serverless, fully managed, elastic Network File System (NFS) that allows multiple servers to share file data simultaneously.
- Amazon Network Load Balancer (NLB) - a layer 4 TCP/UDP load balancer that forwards traffic to targets like EC2 instances, containers or IP addresses. NLB can send periodic health checks to registered targets to ensure that traffic is only forwarded to healthy targets.

Architecture Overview

In this highly available architecture, we will install NGINX Instance Manager (NIM) on two EC2 instances in different AWS Availability Zones (AZ).
Four EFS file systems will be created to share key stateful information between the two NIM instances, and Pacemaker/Corosync will be used to orchestrate the cluster. Only one NIM instance is active at any time, and Pacemaker enforces this by starting and stopping the NIM systemd services. Finally, an Amazon NLB will be used to provide network failover between the two NIM instances, using an HTTPS health check to determine the active cluster node.

Deployment Steps

1. Create AWS EFS file systems

First, we are going to create four EFS volumes to hold important NIM configuration and state information that will be shared between nodes. These file systems will be mounted onto /etc/nms, /var/lib/clickhouse, /var/lib/nms and /usr/share/nms inside the NIM node.

Take note of the File System IDs of the newly created file systems.

Edit the properties of each EFS file system and create a mount target in each AZ you intend to deploy a NIM node in, then restrict network access to only the NIM nodes by setting up an AWS Security Group. You may also consider more advanced authentication methods, but these aren't covered in this article.

2. Deploy two EC2 instances for NGINX Instance Manager

Deploy two EC2 instances with suitable specifications to support the number of data plane instances that you plan to manage (you can find the sizing specifications here) and connect one to each of the AZs/subnets that you configured EFS mount targets in above.

In this example, I will deploy two t2.medium instances running Ubuntu 24.04, connect one to us-east-1a and the other to us-east-1c, and create a security group allowing only traffic from its local assigned subnet.

3. Mount the EFS file systems on NGINX Instance Manager Node 1

Now that we have the EC2 instances deployed, we can log on to Node 1 and mount the EFS volumes onto this node by executing the following steps:

  1. SSH onto Node 1
  2. Install the efs-utils package if it is not already installed
  3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory
  4. Execute mount -a to mount the file systems
  5. Execute df to ensure that the paths are mounted correctly

4. Install NGINX Instance Manager on Node 1

With the EFS file systems now mounted, it's time to run through the NGINX Instance Manager installation on Node 1.

  1. Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh
  2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/
  3. Run bash install-nim-bundle.sh -d ubuntu22.04
  4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node:

      systemctl stop nms; systemctl disable nms
      systemctl stop nginx; systemctl disable nginx
      systemctl stop clickhouse-server; systemctl disable clickhouse-server

5. Install NGINX Instance Manager on Node 2

This time we are going to install NGINX Instance Manager on Node 2, but without attaching the EFS file systems. On Node 2:

  1. Navigate to the Install the latest NGINX Instance Manager with a script page in the NGINX documentation and download install-nim-bundle.sh
  2. Install your NGINX licenses (nginx-repo.crt and nginx-repo.key) into /etc/ssl/nginx/
  3. Run bash install-nim-bundle.sh -d ubuntu22.04
  4. Wait for the installation to complete, take note of the password that was generated during the installation, then stop and disable autostart of NIM services on this node:

      systemctl stop nms; systemctl disable nms
      systemctl stop nginx; systemctl disable nginx
      systemctl stop clickhouse-server; systemctl disable clickhouse-server

6. Mount EFS file systems on NGINX Instance Manager Node 2

Now that we have the NGINX Instance Manager binaries installed on each node, let's mount the EFS file systems on Node 2:

  1. SSH onto Node 2
  2. Install the efs-utils package if it is not already installed
  3. Edit /etc/fstab and create an entry for each EFS File System ID and its associated mount directory
  4. Execute mount -a to mount the file systems
  5. Execute df to ensure that the paths are mounted correctly

7. Install and configure Pacemaker/Corosync

With NGINX Instance Manager now installed on both nodes, it's time to get Pacemaker and Corosync installed:

  1. Install Pacemaker, Corosync and other important agents:

      sudo apt update
      sudo apt install pacemaker pcs corosync fence-agents-aws resource-agents-base

  2. To allow Pacemaker to communicate between nodes, we need to add TCP communication between nodes to the Security Group for the NIM nodes.
  3. Once we have the connectivity in place, we have to set a common password for the hacluster user on both nodes. We can do this by running the following command on both nodes:

      sudo passwd hacluster
      password: IloveF5 (don't use this!)

  4. Now we start the Pacemaker services by running the following commands on both nodes:

      systemctl start pcsd.service
      systemctl enable pcsd.service
      systemctl status pcsd.service
      systemctl start pacemaker
      systemctl enable pacemaker

  5. And finally, we authenticate the nodes with each other (using the hacluster username, password and node hostnames), set up the cluster with both nodes, and check the cluster status:

      pcs host auth ip-172-17-1-89 ip-172-17-2-160
      pcs cluster setup nimcluster --force ip-172-17-1-89 ip-172-17-2-160
      pcs status

8. Configure Cluster Fencing

Fencing is the ability to make a node unable to run resources, even when that node is unresponsive to cluster commands. You can think of fencing as cutting the power to the node. Fencing protects against corruption of data due to concurrent access to shared resources, commonly known as a "split brain" scenario.

In this architecture, we use the fence_aws agent, which uses the boto3 library to connect to AWS and stop the EC2 instances of failing nodes. Let's install and configure the fence_aws agent:

  1. Create an AWS Access Key and Secret Access Key for fence_aws to use
  2. Install the AWS CLI on both NIM nodes
  3. Take note of the Instance IDs for the NIM instances
  4. Configure the fence_aws agent as a Pacemaker STONITH device. Run the pcs stonith command, inserting your access key, secret key, region, and mappings of Linux hostname to Instance ID:

      pcs stonith create hacluster-stonith fence_aws \
        access_key=(your access key) secret_key=(your secret key) region=us-east-1 \
        pcmk_host_map="ip-172-31-34-95:i-0a46181368524dab6;ip-172-31-27-134:i-032d0b400b5689f68" \
        power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4

  5. Run pcs status and make sure that the stonith device is started

9. Configure Pacemaker resources, colocations and constraints

Ok - we are almost there! It's time to configure the Pacemaker resources, colocations and constraints. We want to make sure that the clickhouse-server, nms and nginx systemd services all come up on the same node together, and in that order. We can do that using Pacemaker colocations and constraints.

  1. Configure a Pacemaker resource for each systemd service:

      pcs resource create clickhouse systemd:clickhouse-server
      pcs resource create nms systemd:nms.service
      pcs resource create nginx systemd:nginx.service

     🔥HOT TIP🔥 Check out the pcs resource command options (op monitor interval etc.) to optimize failover time.

  2. Create two colocations to make sure they all start on the same node:

      pcs constraint colocation add clickhouse with nms
      pcs constraint colocation add nms with nginx

  3. Create two order constraints to define the startup order, Clickhouse -> NMS -> NGINX:

      pcs constraint order start clickhouse then nms
      pcs constraint order start nms then nginx

  4. Enable and start the pcs cluster:

      pcs cluster enable --all
      pcs cluster start --all

10. Provision AWS NLB Load Balancer

Finally - we are going to set up the AWS Network Load Balancer (NLB) to facilitate the failover.
1. Create a Security Group entry to allow HTTPS traffic to enter the EC2 instance from the local subnet
2. Create a Load Balancer target group, targeting instances, with Protocol TCP on port 443
⚠️NOTE ⚠️ if you are using load balancing with the TCP protocol and terminating the TLS connection on the NIM node (EC2 instance), you must create a security group entry to allow TCP 443 from the connecting clients directly to the EC2 instance IP address. If you have trusted SSL/TLS server certificates, you may want to investigate a load balancer using the TLS protocol.
3. Ensure that an HTTPS health check is in place to facilitate the failover
🔥HOT TIP🔥 you can speed up failure detection and failover using the Advanced health check settings.
4. Include our two NIM instances as pending and save the target group
5. Now let's create the network load balancer (NLB) listening on TCP port 443 and forwarding to the target group created above.
6. Once the load balancer is created, check the target group and you will find that one of the targets is healthy - that's the active node in the Pacemaker cluster!
7. With the load balancing now in place, you can access the NIM console using the FQDN of your load balancer and log in with the password set during the install of Node 1.
8. Once you have logged in, we need to install a license before we proceed any further:
   Click on Settings
   Click on Licenses
   Click Get Started
   Click Browse
   Upload your license
   Click Add
9. With the license now installed, we have access to the full console

11. Test failover

The easiest way to test failover is to just shut down the active node in the cluster. Pacemaker will detect that the node is no longer available and start the services on the remaining node.

1. Stop the active node/instance of the NIM
2. Monitor the Target Group and watch it fail over - depending on the settings you have set up, this may take a few minutes

12. 
How to upgrade NGINX Instance Manager on the cluster

To upgrade NGINX Instance Manager in a Pacemaker cluster, perform the following tasks:

1. Stop the Pacemaker Cluster services on Node 2 - forcing Node 1 to take over.
pcs cluster stop ip-172-17-2-160
2. Disconnect the NFS mounts on Node 2
umount /usr/share/nms
umount /etc/nms
umount /var/lib/nms
umount /var/lib/clickhouse
3. Upgrade NGINX Instance Manager on Node 1
Download the update from the MyF5 Customer Portal
sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx
4. Upgrade NGINX Instance Manager on Node 2 (with the NFS mounts disconnected)
Download the update from the MyF5 Customer Portal
sudo apt-get -y install -f /home/user/nms-instance-manager_<version>_amd64.deb
sudo systemctl restart nms
sudo systemctl restart nginx
5. Re-mount all the NFS mounts on Node 2
mount -a
6. Start the Pacemaker Cluster services on Node 2 - adding it back into the cluster
pcs cluster start ip-172-17-2-160

13. Reference Documents

Some good references on Pacemaker/Corosync clustering can be found here:
Configuring a Red Hat High Availability cluster on AWS
Implement a High-Availability Cluster with Pacemaker and Corosync
ClusterLabs Pacemaker website
Corosync Cluster Engine website

Accelerate Application Deployment on Google Cloud with F5 NGINXaaS
Introduction

In the push for cloud-native agility, infrastructure teams often face a crossroads: settle for basic, "good enough" load balancing, or take on the heavy lifting of manually managing complex, high-performance proxies. For those building on Google Cloud (GCP), this compromise is no longer necessary.

F5 NGINXaaS for Google Cloud represents a shift in how we approach application delivery. It isn't just NGINX running in the cloud; it is a co-engineered, fully managed, on-demand service that lives natively within the GCP ecosystem. This integration allows you to combine the advanced traffic control and programmability NGINX is known for with the scaling and consumption model of a SaaS offering.

By offloading the "toil" of lifecycle management (patching, tuning, and infrastructure provisioning) to F5, teams can redirect their energy toward modernizing application logic and accelerating release cycles. In this article, we'll dive into how this integration between F5 and Google Cloud simplifies your architecture, from securing traffic with integrated secret management to gaining operational insight through native monitoring tools.

Getting Started with NGINXaaS for Google Cloud

The transition to a managed service begins with onboarding through the Google Cloud Marketplace. This integrated path lets teams bypass the manual work of traditional infrastructure setup, such as patching and individual instance maintenance. The deployment process involves:

Marketplace Subscription: Subscribe directly to the service to ensure unified billing and support.
Network Connectivity: Set up the essential VPC and Network Attachments to allow NGINXaaS to communicate securely with your backend resources.
Provisioning: Launch a dedicated deployment that provides enterprise-grade reliability while maintaining a cloud-native feel.

Secure and Manage SSL/TLS in F5 NGINXaaS for Google Cloud

Security is a foundational pillar of this co-engineered service, particularly regarding traffic encryption. NGINXaaS simplifies the lifecycle of SSL/TLS certificates by providing a centralized way to manage credentials. Key security features include:

Integrated Secrets Management: Working natively with Google Cloud services to handle sensitive data like private keys and certificates securely.
Proxy Configuration: Demonstrating how to set up a Google Cloud proxy network load balancer to handle incoming client traffic.
Credential Deployment: Uploading and managing certificates directly within the NGINX console to ensure all application endpoints are protected by robust encryption.

Enhancing Visibility in Google Cloud with F5 NGINXaaS

Visibility is no longer an afterthought but a native component of the deployment, providing high-fidelity telemetry without separate agents.

Native Telemetry Export: By linking your Google Cloud Project ID and configuring Workload Identity Federation (WIF), metrics and logs are pushed directly to Google Cloud Monitoring.
Real-Time Dashboards: The observability demo walks through using the Metrics Explorer to visualize critical performance data, such as active HTTP connection counts and response rates.
Actionable Logging: Integrated Log Analytics allows you to use the Logs Explorer to isolate events and troubleshoot application issues within a single toolset, streamlining your operational workflow.

Whether you are just beginning your transition to the cloud or fine-tuning a sophisticated microservices architecture, F5 NGINXaaS provides the availability, scalability, security, and visibility capabilities needed in the Google Cloud environment.

Conclusion

F5 NGINXaaS for Google Cloud is a significant advantage for organizations looking to modernize their application delivery without the traditional overhead of infrastructure management. By shifting to this co-engineered, managed service, teams can bring together advanced NGINX performance and the native agility of the Google Cloud ecosystem. Through the demonstrations provided in this article, we've highlighted how you can:

Accelerate Onboarding: Move from Marketplace subscription to a live deployment in minutes using Network Attachments.
Fortify Security: Centralize SSL/TLS management within the NGINX console while leveraging Google Cloud's robust networking layer.
Maximize Operational Intelligence: Gain real-time observability by piping telemetry directly into Google Cloud Monitoring and Logging.

Resources

Accelerating app transformation with F5 NGINXaaS for Google Cloud
F5 NGINXaaS for Google Cloud: Delivering resilient, scalable applications

AI Inference for VLLM models with F5 BIG-IP & Red Hat OpenShift
This article shows how to perform Intelligent Load Balancing for AI workloads using the new features of BIG-IP v21 and Red Hat OpenShift. Load-balancing decisions are based on business logic rules, configured without iRule programming, and on state metrics of the VLLM inference servers gathered from OpenShift's Prometheus.