Design for resiliency and protect against cloud outages with F5 DNS and application monitoring
How to reduce DNS recovery time and know when a provider, region, or control plane is having a bad day.

Why DNS resiliency matters

Major outages happen more often than many architectures assume. The most painful part is frequently not the incident itself, but the operational loss of control that comes from tightly coupling critical functions (like DNS) to a single platform or provider. When that platform is impaired, workarounds become limited and recovery slows.

Design principle: fail safely and recover fast

A useful way to frame resiliency is that failures will occur, so the architecture should prioritize rapid, low-risk recovery. That typically means eliminating single points of dependency, automating failover where practical, and ensuring you can change traffic direction even when one control plane is degraded.

The DNS failure mode: what breaks and how long it takes

When authoritative DNS is hosted with a single vendor, a DNS incident can translate into recovery times on the order of 30 minutes to 3 hours (depending on the failure domain, TTLs, and operational procedures). With an automated, multi-provider design, recovery can be reduced dramatically, down to ~60 seconds in some scenarios.

Solution overview

This article describes an end-to-end resiliency pattern that combines (1) multi-provider authoritative DNS, using F5 BIG-IP DNS (commonly deployed on-prem or in IaaS) with F5 Distributed Cloud DNS as an additional authoritative provider, and (2) application assurance via F5 Distributed Cloud Synthetic Monitoring. The DNS design helps keep applications reachable during cloud-service impairments or regional failures by enabling automated failover and preserving the ability to shift control when a dependency is degraded. Synthetic DNS/HTTP checks then continuously validate external reachability and performance, so you can detect issues early and triage faster when incidents occur.
What you get from multi-provider authoritative DNS

- Higher availability: a second authoritative provider reduces the blast radius of a single-vendor outage.
- Lower query latency: globally distributed anycast networks can shorten resolver-to-authoritative RTT for many users.
- Built-in DDoS resistance: distributed networks can absorb and disperse volumetric attacks more effectively than a small on-prem footprint.
- Elastic capacity: the service can scale during traffic spikes without pre-provisioning appliances for peak usage.
- Better visibility: per-query metrics and synthetic checks help validate reachability from multiple regions.

Example: improving availability and latency for Acme Bank

Acme Bank, whose name has been changed for the purposes of this article, struggled with higher DNS latency and periodic downtime when their on-prem DNS appliances failed. They also had to plan for peak capacity in advance to handle traffic spikes, an approach that can be expensive and still leave gaps when demand exceeds forecasts. By adding Distributed Cloud DNS as an additional authoritative DNS provider alongside BIG-IP DNS, Acme Bank extended DNS serving closer to end users on a globally distributed network. This improved DNS availability and reduced query latency, while providing a platform that can scale to meet demand.

Reference architecture (high-level)

At a minimum, you are operating two authoritative DNS providers for the same zone:

- Primary authoritative: BIG-IP DNS serving the zone (often integrated with existing on-prem or cloud-adjacent infrastructure).
- Secondary/additional authoritative: Distributed Cloud DNS hosting the same zone data (via zone transfer and/or secondary zone configuration).
- Delegation: Your registrar/parent zone publishes NS records so recursive resolvers can reach either provider.
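One way to sanity-check the delegation item above is to confirm that the NS set published at the parent zone includes name servers from both providers. The following is a minimal sketch, not an F5 tool: the function names, the provider labels, and the hostname suffixes (placeholders under example.com and example.net) are assumptions made for illustration, and fetching the NS records from the parent zone is left out.

```python
from typing import Dict, Iterable, List, Mapping


def delegation_coverage(
    ns_records: Iterable[str],
    provider_suffixes: Mapping[str, str],
) -> Dict[str, List[str]]:
    """Group delegated name servers by the provider whose suffix they match.

    provider_suffixes maps a provider label to a hostname suffix. Both the
    labels and the suffixes are hypothetical placeholders for this sketch.
    """
    coverage: Dict[str, List[str]] = {name: [] for name in provider_suffixes}
    for ns in ns_records:
        host = ns.rstrip(".").lower()  # normalise trailing dot and case
        for provider, suffix in provider_suffixes.items():
            suffix = suffix.strip(".").lower()
            if host == suffix or host.endswith("." + suffix):
                coverage[provider].append(host)
    return coverage


def missing_providers(coverage: Mapping[str, List[str]]) -> List[str]:
    """Return providers that have no name server in the delegation."""
    return sorted(name for name, hosts in coverage.items() if not hosts)


if __name__ == "__main__":
    # Placeholder NS set as it might be published at the parent zone.
    delegated = ["ns1.bigip.example.com.", "ns1.xc.example.net."]
    providers = {"big-ip-dns": "bigip.example.com", "xc-dns": "xc.example.net"}
    gaps = missing_providers(delegation_coverage(delegated, providers))
    print("delegation gaps:", gaps or "none")
```

A check like this could run after any registrar change, so that a delegation that silently drops one provider is caught before an outage exposes it.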
Configuration walkthrough

Step 1: Enable zone transfers from BIG-IP DNS

Configure BIG-IP DNS to allow zone transfers (AXFR/IXFR) to the Distributed Cloud DNS name servers for the zones you want to protect. Validate transfers and ensure TSIG and IP-based allowlists (as applicable) are in place to prevent unauthorized replication.

Step 2: Add the zone as secondary in Distributed Cloud DNS

Add your domain as a secondary DNS zone in Distributed Cloud DNS and point it to BIG-IP DNS for transfers. Once the initial transfer completes, verify the zone is online and that records (including SOA/NS) match expectations. Use the console to inspect zone content and confirm refresh/retry timers align with your operational goals.

Step 3: Update delegation at the registrar (planned cutover)

Update the domain delegation at your DNS registrar/parent zone to publish the desired authoritative name servers (for example, shifting primary delegation from BIG-IP DNS to Distributed Cloud DNS, or publishing both sets depending on your strategy). Plan for propagation by lowering TTLs ahead of time when feasible, and document a rollback procedure (e.g., reverting NS to the previous set) before making changes.

Monitoring and app assurance with synthetic checks

Once secondary DNS is active, use DNS and HTTP synthetic monitoring from multiple geographies to validate end-to-end reachability. Track query success rate, response codes, and latency, and alert on anomalies that indicate partial outages (e.g., a single region failing, increased NXDOMAIN/SERVFAIL rates, or unexpected record changes).

Application assurance (synthetic monitoring)

Even with resilient DNS, application incidents still happen, and the worst-case operational pattern is learning about them from users first. Synthetic monitoring helps you detect externally visible failures early (often before customer reports), so response starts with evidence rather than guesswork.
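The partial-outage signal mentioned above (a single region failing while others stay healthy) can be sketched in a few lines of Python. This is an illustration only, not the Distributed Cloud Synthetic Monitoring API: the RegionResult type, the classify_outage function, the region names, and the 50% failure threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class RegionResult:
    """Aggregated synthetic-check results for one vantage region (hypothetical shape)."""
    region: str
    checks: int
    failures: int


def classify_outage(
    results: Iterable[RegionResult],
    failure_threshold: float = 0.5,  # assumed cut-off for "this region is failing"
) -> Tuple[str, List[str]]:
    """Classify results as 'no-data', 'healthy', 'partial', or 'global'.

    Returns the classification and the sorted list of failing regions.
    """
    measured = [r for r in results if r.checks > 0]
    if not measured:
        return "no-data", []
    failing = sorted(
        r.region for r in measured if r.failures / r.checks >= failure_threshold
    )
    if not failing:
        return "healthy", []
    if len(failing) == len(measured):
        return "global", failing
    return "partial", failing


if __name__ == "__main__":
    window = [
        RegionResult("west-us", checks=20, failures=18),
        RegionResult("east-us", checks=20, failures=0),
        RegionResult("eu-west", checks=20, failures=1),
    ]
    print(classify_outage(window))
```

Separating "partial" from "global" is the point of running checks from several geographies: a partial result points at a region or path, while a global result points at the zone, the delegation, or the origin itself.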
F5 Distributed Cloud Synthetic Monitoring continuously simulates DNS lookups and HTTP requests to validate the external health and performance of your applications. Over time, you can establish a baseline for availability and latency, i.e., “what normal looks like,” which makes deviations easier to detect and triage.

- Global vantage points: run checks from multiple regions to avoid a single-location “false negative.”
- Multiple providers: compare results across providers to separate internet-path issues from app/origin issues.
- Actionable alerts: alert on latency spikes, elevated error rates (e.g., HTTP 5xx), and DNS resolution failures.
- Fast drill-down: pivot from an alert to region-level breakdowns, timelines, and event tables to isolate where the failure is occurring.

Example triage workflow: an alert flags a critical payroll application. In the console, you can correlate a single-region degradation (for example, West US) with a sharp increase in HTTP latency and a burst of HTTP 500 responses. A regional timing breakdown can further indicate whether time is being spent in network connect, TLS negotiation, or server processing, helping you route the incident to the correct owning team (e.g., origin/app servers for that region) without hours of cross-team war-room triage. The practical outcome is reduced mean time to detect (MTTD) and faster “mean time to innocence” by quickly narrowing down which component is failing and which team should engage.

Video Demonstration

The following video reviews each of the challenges described in this article and how F5 addresses them by providing cloud resiliency with DNS services and app assurance with synthetic monitoring.

Conclusion

DNS is a critical dependency, and a common amplification point during outages, so a multi-provider authoritative DNS design (BIG-IP DNS plus Distributed Cloud DNS) helps preserve reachability and control when a vendor, region, or control plane is degraded. But resiliency is strongest when DNS failover is paired with application assurance: synthetic DNS/HTTP checks provide early, external detection and rapid triage signals that shorten both MTTD and time to mitigation. Together, DNS resiliency and app assurance form an end-to-end resiliency solution, keeping users routed to healthy endpoints while simultaneously proving what is (and isn’t) failing, so teams can respond faster with less guesswork.

Next, validate your zone-transfer security model, define failover/runbook procedures, instrument synthetic checks and alert thresholds, and test delegation changes in a lower environment before production cutover.

Additional Resources

- F5 DNS Products
- Distributed Cloud Synthetic Monitoring

Related Technical Articles

- Accelerate Your Initiatives: Secure & Scale Hybrid Cloud Apps on F5 BIG-IP & Distributed Cloud DNS
- The Power of &: F5 Hybrid DNS solution
- Use F5 Distributed Cloud to control Primary and Secondary DNS
- Using F5 Distributed Cloud DNS Load Balancer health checks and DNS observability
- Demo Guide: F5 Distributed Cloud DNS (SaaS Console)
Centralized Application Control for Distributed AI with Equinix and F5 Distributed Cloud
As AI adoption accelerates, I’ve been seeing a common architectural pattern emerge: centralized AI factories handling model training, with inference workloads pushed out to remote departments like public safety, healthcare, or logistics. While the execution is distributed, the operational requirements (security, performance, and policy consistency) remain very much centralized. The challenge isn’t running inference at the edge; it’s delivering centralized AI services to distributed consumers without introducing complex routing, fragmented security controls, or inconsistent performance between locations. This article outlines how you can address that problem using F5 Distributed Cloud (XC) Customer Edge deployed on Equinix Network Edge, with private connectivity provided by Equinix Fabric.

The Problem to Solve

From an infrastructure perspective, these environments tend to stress three things simultaneously:

- Scalability, as data volumes and inference demand grow rapidly
- Security, to protect models, APIs, and sensitive inference data
- Reliability, so performance remains consistent regardless of where requests originate

Traditional approaches often force tradeoffs: centralize everything and accept latency, or decentralize enforcement and deal with policy sprawl. What we need is centralized control with distributed execution.

Architectural Approach

Rather than building bespoke connectivity for each inference location, we’ll focus on creating a repeatable edge pattern that could be deployed globally while still being governed centrally. The architecture breaks down into four core components:

Central AI Factory (Training Hub)

This is where model training and lifecycle management live. It connects to S3-compatible object storage for large-scale data ingestion and model artifacts. Importantly, it doesn’t need direct exposure to every inference a consumer makes.
Equinix Fabric

Equinix Fabric provides private, low-latency connectivity between the AI factory and distributed inference locations. In this design, it effectively acts as a segment extender across regions, keeping AI traffic off the public internet while preserving predictable performance.

F5 Distributed Cloud (XC) Customer Edge

F5 XC Customer Edge (CE) instances are deployed close to inference consumers. These handle traffic management, API security, segmentation, and observability, while remaining under centralized policy control. This is where enforcement happens: consistently, everywhere.

Equinix Network Edge Marketplace

Equinix Network Edge enables rapid deployment of Customer Edge instances in new regions without waiting on physical infrastructure, which is critical when inference demand expands faster than traditional provisioning cycles.

How It Works

Inference requests are processed locally through CEs at each location. When access to centralized resources is required, such as model updates or validation, traffic traverses Equinix Fabric back to the AI factory. The key detail is that policy is defined centrally but enforced at the edge. Security controls, API protections, and segmentation rules are created once and applied uniformly, regardless of geography. That eliminates the need for custom routing logic or per-site security tuning.

Design Principles That Matter

A few principles guided the implementation:

- Centralized control, distributed execution: inference stays close to data; governance stays centralized
- Zero Trust by default: all AI data flows are explicitly authenticated and authorized
- Elastic expansion: new regions can be brought online quickly through the Marketplace
- Integrated observability: traffic, performance, and security posture are visible across all endpoints
- Compliance-ready: isolation and segmentation support regulatory requirements like GDPR and HIPAA

When This Pattern Fits

This approach works well for organizations that need to scale AI inference across multiple regions or departments while maintaining tight operational control. It’s particularly effective when inference demand grows incrementally and predictability, security, and governance matter more than ad-hoc edge autonomy. If the goal is centralized governance with distributed execution, this pattern provides a clean and repeatable way to get there.

Additional Links

- F5 Distributed Cloud Services
- F5 Distributed Cloud (XC) Customer Edge
- Equinix Fabric
- Equinix Network Edge Marketplace

Future-Proofing Kubernetes Routing: From Standard Ingress to Role-Based CRDs
Standard Kubernetes Ingress is a "lowest common denominator" model that enforces rigid, monolithic configurations, limiting agility. F5 NGINX Ingress Controller breaks this mold with two role-oriented Custom Resource Definitions (CRDs), VirtualServer and VirtualServerRoute, which enable delegated authority in sharp contrast to a single monolithic Ingress configuration. This approach gives platform teams central security control while granting application teams the freedom to manage their own routes without risk.

Where SASE Ends and ADSP Begins, The Dual-Plane Zero Trust Model
Introduction

Zero Trust Architecture (ZTA) mandates “never trust, always verify”: explicit policy enforcement across every user, device, network, application, and data flow, regardless of location. The challenge is that ZTA isn’t a single product. It’s a model that requires enforcement at multiple planes. Two converged platforms cover those planes: SASE at the access edge, and F5 ADSP at the application edge. This article explains what each platform does, where the boundary sits, and why both are necessary.

Two Planes, One Architecture

SASE and F5 ADSP are both converged networking and security platforms. Both deploy across hardware, software, and SaaS. Both serve NetOps, SecOps, and PlatformOps through unified consoles. But they enforce ZTA at different layers, and at different scales.

SASE secures the user/access plane: it governs who reaches the network and under what conditions, using ZTNA (Zero Trust Network Access), SWG, CASB, and DLP. F5 ADSP secures the application plane: it governs what authenticated sessions can actually do once traffic arrives, using WAAP, bot management, API security, and ZTAA (Zero Trust Application Access).

The NIST SP 800-207 distinction is useful here: SASE houses the Policy Decision Point for network access; ADSP houses the Policy Enforcement Point at the application layer. Neither alone satisfies the full ZTA model.

The Forward/Reverse Proxy Split

The architectural difference comes down to proxy direction.

SASE is a forward proxy. Employee traffic terminates at an SSE PoP, where identity and device posture are checked before content is retrieved on the user’s behalf. SD-WAN steers traffic intelligently across MPLS, broadband, 5G, or satellite based on real-time path quality. SSE enforces CASB, RBI, and DLP policies before delivery.

F5 ADSP is a reverse proxy. Traffic destined for an application terminates at ADSP first, where L4–7 inspection, load balancing, and policy enforcement happen before the request reaches the backend.
ADSP understands application protocols, session behavior, and traffic patterns, enabling health monitoring, TLS termination, connection multiplexing, and granular authorization across BIG-IP (hardware, virtual, cloud), NGINX, BIG-IP Next for Kubernetes (BNK), and BIG-IP CNE.

The scale difference matters: ADSP handles consumer-facing traffic at orders of magnitude higher volume than SASE handles employee access. This is why full platform convergence only makes sense at the SMB scale; enterprise organizations operate them as distinct, specialized systems owned by different teams.

ZTA Principles Mapped to Each Platform

ZTA requires continuous policy evaluation, not just at initial authentication, but throughout every session. The table below maps NIST SP 800-207 principles to how each platform implements them.

| ZTA Principle | SASE | F5 ADSP |
|---|---|---|
| Verify explicitly | Identity + device posture evaluated per session at SSE PoP | L7 authz per request: token validation, API key checks, behavioral scoring |
| Least privilege | ZTNA grants per-application, per-session access, no implicit lateral movement | API gateway enforces method/endpoint/scope, no over-permissive routes |
| Assume breach | CASB + DLP monitors post-access behavior, continuous posture re-evaluation | WAF + bot mitigation inspects every payload; micro-segmentation at service boundaries |
| Continuous validation | Real-time endpoint compliance; access revoked on posture drift | ML behavioral baselines detect anomalous request patterns mid-session |

Use Case Breakdown

Secure Remote Access

SASE enforces ZTNA, validating identity, MFA, and endpoint compliance before granting access. F5 ADSP picks up from there, enforcing L7 authorization continuity: token inspection, API gateway policy, and traffic steering to protected backends. A compromised identity that passes ZTNA still faces ADSP’s per-request behavioral inspection.
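To make the per-request L7 authorization idea concrete, here is a minimal Python sketch of a default-deny check on method, endpoint, and token scope. It is not ADSP, BIG-IP, or NGINX configuration: the RouteRule type, the authorize function, and the scope names are hypothetical and exist only to illustrate the least-privilege principle described above.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class RouteRule:
    """One allowed method/endpoint pair and the token scope it requires (hypothetical)."""
    method: str
    path_prefix: str
    required_scope: str


def _path_matches(path: str, prefix: str) -> bool:
    """Match on whole path segments so '/accounts' does not match '/accountsX'."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def authorize(
    method: str,
    path: str,
    token_scopes: Set[str],
    rules: Iterable[RouteRule],
) -> bool:
    """Evaluate one request: the first matching rule decides, anything else is denied."""
    for rule in rules:
        if method.upper() == rule.method.upper() and _path_matches(path, rule.path_prefix):
            return rule.required_scope in token_scopes
    return False  # default deny: no rule means no access


if __name__ == "__main__":
    rules = [
        RouteRule("GET", "/accounts", "accounts:read"),
        RouteRule("POST", "/payments", "payments:write"),
    ]
    scopes = {"accounts:read"}  # scopes carried by an already-validated token
    print(authorize("GET", "/accounts/42", scopes, rules))
    print(authorize("POST", "/payments", scopes, rules))
```

The design choice worth noting is the default deny: a session that passed network-level access checks still gets nothing at the application plane unless a rule explicitly allows that method, endpoint, and scope.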
Web Application and API Protection (WAAP)

SASE pre-filters known malicious IPs and provides initial TLS inspection, reducing volumetric noise. F5 ADSP delivers full-spectrum WAAP in-path, running signature, ML, and behavioral WAF models simultaneously, where application context is fully visible. SASE cannot inspect REST API schemas, GraphQL mutation intent, or session-layer business logic. ADSP can.

Bot Management

SASE blocks bot C2 communications and applies rate limits at the network edge. F5 ADSP handles what gets through: JavaScript telemetry challenges, ML-based device fingerprinting, and human-behavior scoring that distinguishes legitimate automation (CI/CD, partner APIs) from credential stuffing and scraping, regardless of source IP reputation.

AI Security

SASE applies CASB and DLP policies to block sensitive data uploads to external AI services and discover shadow AI usage across the workforce. F5 ADSP protects custom AI inference endpoints: prompt injection filtering, per-model rate limiting, request schema validation, and encrypted traffic inspection.

The Handoff Gap, and How to Close It

The most common zero trust failure in hybrid architectures isn’t within either platform. It’s the handoff between them. ZTNA grants access, but session context (identity claims, device posture score, risk level) doesn’t automatically propagate to the application plane. The fix is explicit context propagation: SASE injects headers carrying identity and posture signals; ADSP policy engines consume them for L7 authorization decisions. This closes the gap between “who is allowed to connect” and “what that specific session is permitted to do.”

Conclusion

SASE and F5 ADSP are not competing platforms. They are complementary enforcement planes. SASE answers: can this user reach the application? ADSP answers: what can this session do once it arrives? Organizations that deploy only one leave systematic gaps. Together, with explicit context propagation at the handoff, they deliver the end-to-end zero trust coverage that NIST SP 800-207 actually requires.

Related Content

- Why SASE and ADSP are complementary platform

Beyond Five Nines: SRE Practices for BIG-IP Cloud-Native Network Functions
Introduction

Five nines (99.999%) availability gets the headline. But any SRE who has been on-call for a telecom user-plane incident knows that uptime percentages don’t capture the full picture. A NAT pool exhausted at 99.98% availability can still affect millions of subscribers. A DNS cache miss storm at 99.99% uptime can still degrade application performance across an entire region.

This article explores how SRE principles (specifically SLIs, SLOs, error budgets, and toil reduction) apply to cloud-native network functions (CNFs) deployed with F5 BIG-IP Cloud-Native Edition. The goal is practical: give SRE teams and platform engineers the vocabulary and patterns to instrument, operate, and evolve these functions the same way they operate any other Kubernetes workload.

Why subscriber-centric SLIs beat infrastructure metrics

Traditional network operations relies on infrastructure health metrics: CPU utilisation, interface counters, and process uptime. These metrics are necessary, but they answer the wrong question. They tell you the system’s perspective, not the subscriber’s. SRE flips this. An SLI is a direct quantitative measurement of user-visible service behavior. For a CNF in the 5G user plane, subscriber-centric SLIs look like:

- GTP-U flow forwarding success rate (not just firewall process uptime)
- NAT session establishment latency at P95 (not just CPU idle)
- DNS query response rate and cache hit ratio (not just resolver process health)
- Packet drop rate at the N6/Gi-LAN boundary (not just interface RX errors)

BIG-IP CNE exposes these metrics natively through Prometheus-compatible endpoints on each CNF pod, meaning your existing Kubernetes observability stack, whether that is Prometheus + Grafana, Datadog, or a vendor-managed observability platform, can consume them without custom instrumentation.

As a consultant, if your monitoring today alerts on CNF pod restarts before it alerts on subscriber-impacting packet drops, your SLI hierarchy is inverted.
Fix the SLI definition first, then tune your alerting.

SLIs and SLOs: the measurement-to-promise pipeline

The distinction between SLIs and SLOs is operational, not semantic. An SLI is what you observe; an SLO is what you commit to. Together, they create an error budget (your explicit allowance for controlled unreliability). Table 1 summarizes how SLIs and SLOs relate and why each matters to SREs.

Table 1: SLI vs SLO — what each term means operationally

| Aspect | SLI (Measurement) | SLO (Target) | Why it matters to SREs |
|---|---|---|---|
| Purpose | Reports reality | Sets reliability goal | Drives team alignment |
| Example | "99.92% queries succeeded" | "≥99.99% over 30d" | Error budget = 0.01% |
| Burn rate | Changes minute-by-minute | Calculated over window | Feeds alerting cadence |
| Action | Feeds dashboards/alerts | Gates releases | Halts or accelerates rollouts |

The error budget is the unreliability your SLO permits (100% minus the target); the gap between the SLI you measure and the SLO you target shows how much of that budget remains. For a DNS CNF with an SLO of 99.99% queries answered within 20ms over 30 days, the error budget is 0.01% of the window, or about 4.32 minutes of allowable degradation (43,200 minutes × 0.0001). That budget governs rollout velocity: when the budget is healthy, teams can ship faster; when it burns through, all changes halt until the system stabilizes.

Example: Set your SLO as "99.99% of GTP-U flows processed within 2ms." Your error budget is 0.01% of flows, or roughly 52 minutes of allowable impact per year. A CNF upgrade whose rollout drops 0.005% of the year's flows consumes half your annual budget. That’s the signal your CI/CD pipeline should be gating on, not deployment success.

Golden signals mapped to BIG-IP CNE metrics

The SRE golden signals (latency, traffic, errors, saturation) map directly to BIG-IP CNE telemetry. Table 2 gives practical SLI examples, SLO targets, and the operator actions each signal should trigger.
Table 2: Golden signals as operational SLIs for BIG-IP CNE

| Golden Signal | BIG-IP CNE SLI Example | SLO Target | Operator Action |
|---|---|---|---|
| Latency | P95 GTP-U at Edge Firewall CNF | ≤ 2ms for 99.99% flows | Scale pods / tune policy |
| Traffic | Packets/sec per CNF pod | Autoscale to 4M+ pps | HPA trigger or pre-scale |
| Errors | NAT session failure rate | < 0.01% over 30 days | Halt rollout, root-cause |
| Saturation | Port/CPU threshold breach | Proactive alert at 80% | Drain + horizontal scale |

These SLIs flow into the same Prometheus/Grafana stack your Kubernetes platform team already operates. A single dashboard can surface both pod-level Kubernetes metrics and CNF user-plane metrics, creating a shared view of reliability that eliminates the classic “my side is green” response to incidents.

Observability implementation: metrics, logs, and traces

BIG-IP CNE exports telemetry natively into Kubernetes observability pipelines. Here is what that looks like in practice for each pillar of observability:

- Metrics: Each CNF pod exposes metrics endpoints compatible with Prometheus scraping. Key metric families include flow_processing_latency_seconds (histogram), nat_session_failures_total (counter), dns_cache_hit_ratio (gauge), and pod_packet_drop_total (counter). These feed directly into your SLI calculations.
- Logs: CNF logs emit structured JSON to stdout, consumable by Fluentd, Fluent Bit, or any log aggregator in your cluster. Event chains like NAT pool exhaustion produce correlated log sequences that enable root-cause analysis without SSH access to the CNF pod.
- Traces: For distributed request tracing (for example, following a DNS query from UE through the DNS CNF to upstream resolvers), BIG-IP CNE supports OpenTelemetry trace propagation. This is particularly useful when debugging latency spikes in multi-CNF traffic chains where the delay source is ambiguous.
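Since counters such as nat_session_failures_total feed the SLI calculations, it may help to see the arithmetic written out. The sketch below is a plain-Python illustration under stated assumptions: the function names are invented for this example, the event counts are made-up inputs (in practice they would come from counter deltas over the measurement window), and nothing here is a BIG-IP CNE or Prometheus API.

```python
def success_ratio(total_events: int, failed_events: int) -> float:
    """SLI: fraction of events that succeeded in the measurement window."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 <= failed_events <= total_events:
        raise ValueError("failed_events must be between 0 and total_events")
    return 1.0 - failed_events / total_events


def error_budget_minutes(slo_target: float, window_days: float) -> float:
    """Time-based error budget: the share of the window the SLO allows to be bad."""
    return (1.0 - slo_target) * window_days * 24 * 60


def burn_rate(sli: float, slo_target: float) -> float:
    """How fast the budget is being spent: 1.0 means exactly on budget."""
    allowed_failure = 1.0 - slo_target
    if allowed_failure <= 0:
        raise ValueError("slo_target must be below 1.0")
    return (1.0 - sli) / allowed_failure


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 NAT sessions attempted, 800 failed.
    sli = success_ratio(total_events=1_000_000, failed_events=800)
    print(f"SLI: {sli:.4%}")
    print(f"30-day budget at 99.99%: {error_budget_minutes(0.9999, 30):.2f} min")
    print(f"Burn rate vs 99.99% SLO: {burn_rate(sli, 0.9999):.1f}x")
```

With a 99.99% target, a 30-day window contains 43,200 minutes, so the budget is about 4.32 minutes; over 365 days it is about 52.56 minutes. An observed 99.92% success ratio against that target corresponds to a burn rate of roughly 8x.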
Config note: To wire CNF metrics into an existing Prometheus stack, annotate the CNF pods with prometheus.io/scrape: "true" and set prometheus.io/port to the CNF metrics port. This assumes your Prometheus scrape configuration honors these conventional annotations. No additional expertise required.

Error budgets as a deployment gate

SRE uses error budgets to make deployment velocity a function of reliability, not a function of the change calendar. Here is how this applies to CNF operations with BIG-IP CNE:

- Healthy budget (burn rate < 1x): Teams can accelerate CNF feature delivery. New CRD configurations, Helm chart upgrades, and policy changes proceed with normal review cycles.
- Elevated burn (burn rate 1–5x): All non-emergency CNF changes require additional review. Automated rollback thresholds tighten.
- Budget exhausted: CNF changes halt. The SRE team shifts 100% focus to reliability work until the budget recovers. This is a policy decision, not a technical one.

In practice, BIG-IP CNE supports this through Kubernetes-native mechanisms: Helm-managed upgrades can be gated by pre-upgrade hooks that query current SLI state; CRD-based configuration changes can be rolled out with canary patterns using standard Kubernetes deployment strategies; HPA (Horizontal Pod Autoscaler) rules can be tied directly to CNF-emitted metrics rather than generic CPU thresholds.

Toil reduction: from runbooks to controllers

SRE defines toil as manual, repetitive, automatable operational work that scales with traffic volume but produces no enduring value. In telecom CNF operations, toil accumulates fast:

- Manual NAT pool expansion during traffic peaks
- SSH-based policy pushes for firewall rule updates
- Ticket-driven DNS configuration changes
- Manual health checks before and after maintenance windows

BIG-IP CNE addresses this through Kubernetes-native control loops. Configuration is declarative: CNF policies are expressed as Custom Resource Definitions (CRDs) applied via kubectl or GitOps pipelines.
Kubernetes controllers reconcile the actual CNF state to the desired state defined in Git, eliminating configuration drift and manual intervention.

Example: Instead of a runbook step that says “SSH to the CGNAT CNF and add 1000 ports to poolX,” your GitOps pipeline applies a CRD update that the CNF controller reconciles automatically. The audit trail is a Git commit, not a change ticket.

SRE teams typically target a 50/50 split between operational work and engineering work. CNF operations that rely on manual runbooks push this ratio toward 70–80% operations. Declarative CNF management via CRDs and Helm shifts it back, freeing SRE capacity for SLO definition, observability improvement, and automation engineering.

Dissolving the platform/network operations boundary

Figure 1: SRE bridges the Kubernetes platform team and telecom network operations team through shared SLIs and a unified observability stack.

The most persistent operational problem in cloud-native telecom is not technical; it is organizational. Kubernetes platform teams and telecom network operations teams measure different things, escalate through different processes, and use different tooling. When a GTP-U latency spike occurs, Kubernetes teams check pod health and cluster metrics; telecom teams check interface counters and policy logs. Neither has the full picture.

SRE resolves this by requiring both teams to operate against the same SLIs. When CNF and cluster metrics flow into the same observability stack:

- A single SLI can span pods, nodes, and network functions
- Rollouts, autoscaling, and maintenance windows are gated by shared error budgets rather than siloed change calendars
- Kubernetes engineers declare CNF configurations as code; telecom teams define SLOs that consume those functions as building blocks

The result is that when an SLI burns through an error budget (for example, a 0.02% GTP-U drop rate), both teams respond to the same signal.
Kubernetes teams scale pods; telecom teams tune policies. No finger-pointing. Shared accountability for the packet-level truth that subscribers experience.

5G N6/Gi-LAN consolidation: a concrete SRE use case

Figure 2: BIG-IP CNE consolidating SGi-LAN/N6 functions (Edge Firewall, CGNAT, DNS) as Kubernetes-native CNFs alongside the 5G core.

A common deployment pattern for BIG-IP CNE is N6/Gi-LAN consolidation, where edge firewalling, CGNAT, DNS, and DDoS protection are deployed as CNFs alongside the 5G core rather than as discrete physical or virtual appliances. From an SRE perspective, this architecture enables composite SLOs that span multiple CNFs in a single traffic chain:

- Edge Firewall CNF: SLI = packet drop rate at N6 boundary. SLO = <0.001% drops over 30 days.
- CGNAT CNF: SLI = NAT session establishment success rate. SLO = 99.99% sessions established within 5ms.
- DNS CNF: SLI = query response latency at P95. SLO = P95 < 20ms with >80% cache hit ratio.

Composite SLOs then drive autoscaling and routing decisions based on real service behavior rather than static capacity plans. When the DNS cache hit ratio drops below threshold, the autoscaler adds DNS CNF replicas driven by the CNF-emitted metric, not a manual capacity review.

Conclusion: Path to AI-native 6G

The 6G architecture direction (disaggregated, software-defined network functions dynamically placed across distributed edge locations) requires SRE disciplines at the foundation, not bolted on later. Networks that must adapt in near-real time cannot be operated by humans with runbooks.

BIG-IP CNE was designed with this trajectory in mind. The same Kubernetes-native architecture that enables SRE practices for 5G today (declarative configuration, horizontal scaling, native observability) is the foundation for AI-driven traffic steering, dynamic policy enforcement, and intent-based networking in 6G environments.
For platform teams making architecture decisions now: investing in SLO definition and observability instrumentation for current CNF deployments is not just operational hygiene. It is building the data infrastructure that AI-native operations will require.

Key takeaways

- Define SLIs at the subscriber boundary, not the infrastructure boundary
- Use error budgets to gate CNF rollout velocity. Make it a CI/CD policy, not a manual decision
- Consume CNF Prometheus metrics in your existing Kubernetes observability stack, no separate tooling required
- Declarative CRD-based CNF management via GitOps is the primary toil-reduction lever
- Shared SLIs between Kubernetes platform and telecom operations teams eliminate the organizational boundary that causes most major incidents

Related content

- BIG-IP Next for Kubernetes CNFs - DNS walkthrough
- BIG-IP Next for Kubernetes CNFs deployment walkthrough
- From virtual to cloud-native, infrastructure evolution
- Visibility for Modern Telco and Cloud-Native Networks
- BIG-IP Next Cloud-Native Network Functions (CNFs)

Accelerate Application Deployment on Google Cloud with F5 NGINXaaS
Introduction

In the push for cloud-native agility, infrastructure teams often face a crossroads: settle for basic, "good enough" load balancing, or take on the heavy lifting of manually managing complex, high-performance proxies. For those building on Google Cloud (GCP), this compromise is no longer necessary.

F5 NGINXaaS for Google Cloud represents a shift in how we approach application delivery. It isn’t just NGINX running in the cloud; it is a co-engineered, fully managed, on-demand service that lives natively within the GCP ecosystem. This integration combines the advanced traffic control and programmability NGINX is known for with the scaling and consumption model of a SaaS offering. By offloading the toil of lifecycle management (patching, tuning, and infrastructure provisioning) to F5, teams can redirect their energy toward modernizing application logic and accelerating release cycles.

In this article, we’ll dive into how this integration between F5 and Google Cloud simplifies your architecture, from securing traffic with integrated secret management to gaining deep operational insights through native monitoring tools.

Getting Started with NGINXaaS for Google Cloud

The transition to a managed service begins with onboarding through the Google Cloud Marketplace. This integrated path lets teams bypass the manual work of traditional infrastructure setup, such as patching and individual instance maintenance. The deployment process involves:

Marketplace Subscription: Subscribe to the service directly to get unified billing and support.
Network Connectivity: Set up the VPC and Network Attachments that allow NGINXaaS to communicate securely with your backend resources.
Provisioning: Launch a dedicated deployment that provides enterprise-grade reliability while maintaining a cloud-native feel.
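The Network Connectivity step above can be sketched with the gcloud CLI. This is a minimal illustration rather than the official onboarding procedure: the function name, attachment name, region, and subnet are placeholders, and the connection preference shown is an assumption, so check the NGINXaaS for Google Cloud documentation for the values your deployment requires.

```shell
# Sketch only: wraps the gcloud command that creates a Network Attachment.
# The connection preference below is an assumption, not a documented
# NGINXaaS requirement.
create_network_attachment() {
  # $1 = attachment name, $2 = region, $3 = existing subnet name
  gcloud compute network-attachments create "$1" \
    --region="$2" \
    --subnets="$3" \
    --connection-preference=ACCEPT_AUTOMATIC
}

# Example (hypothetical names):
#   create_network_attachment nginxaas-attachment us-east1 my-backend-subnet
```

The attachment is what the managed NGINXaaS deployment uses to reach backends inside your VPC, so it should be created in the same region as the subnet hosting those backends.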
Secure and Manage SSL/TLS in F5 NGINXaaS for Google Cloud

Security is a foundational pillar of this co-engineered service, particularly regarding traffic encryption. NGINXaaS simplifies the lifecycle of SSL/TLS certificates by providing a centralized way to manage credentials. Key security features include:

Integrated Secrets Management: Working natively with Google Cloud services to handle sensitive data like private keys and certificates securely.
Proxy Configuration: Setting up a Google Cloud proxy network load balancer to handle incoming client traffic.
Credential Deployment: Uploading and managing certificates directly within the NGINX console so that all application endpoints are protected by encryption.

Enhancing Visibility in Google Cloud with F5 NGINXaaS

Visibility is no longer an afterthought but a native component of the deployment, providing high-fidelity telemetry without separate agents.

Native Telemetry Export: By linking your Google Cloud Project ID and configuring Workload Identity Federation (WIF), metrics and logs are pushed directly to Google Cloud Monitoring.
Real-Time Dashboards: The observability demo walks through using the Metrics Explorer to visualize critical performance data, such as active HTTP connection counts and response rates.
Actionable Logging: Integrated Log Analytics allows you to use the Logs Explorer to isolate events and troubleshoot application issues within a single toolset, streamlining your operational workflow.

Whether you are just beginning your transition to the cloud or fine-tuning a sophisticated microservices architecture, F5 NGINXaaS provides the availability, scalability, security, and visibility capabilities needed in the Google Cloud environment.
Conclusion

F5 NGINXaaS for Google Cloud offers a significant advantage for organizations looking to modernize their application delivery without the traditional overhead of infrastructure management. By shifting to this co-engineered, managed service, teams can combine advanced NGINX performance with the native agility of the Google Cloud ecosystem. Through the demonstrations provided in this article, we’ve highlighted how you can:

Accelerate Onboarding: Move from Marketplace subscription to a live deployment in minutes using Network Attachments.
Fortify Security: Centralize SSL/TLS management within the NGINX console while leveraging Google Cloud's networking layer.
Maximize Operational Intelligence: Get real-time observability by piping telemetry directly into Google Cloud Monitoring and Logging.

Resources

Accelerating app transformation with F5 NGINXaaS for Google Cloud
F5 NGINXaaS for Google Cloud: Delivering resilient, scalable applications

Deploying the F5 AI Security Certified OpenShift Operator: A Validated Playbook
Introduction

As enterprises race to deploy Large Language Models (LLMs) in production, securing AI workloads has become as critical as securing traditional applications. The F5 AI Security Operator installs two products on your cluster, F5 AI Guardrails and F5 AI Red Team, both powered by CalypsoAI. Together they provide inline prompt/response scanning, policy enforcement, and adversarial red-team testing, all running natively on your own OpenShift cluster.

This article is a validated deployment runbook for F5 AI Security on OpenShift (version 4.20.14) with NVIDIA GPU nodes. It is based on the official Red Hat Operator installation baseline and was exercised in a real lab deployment on a 3×A40 GPU cluster. If you follow these steps in order, you will end up with a fully functional AI Security stack, avoiding the most common pitfalls along the way.

What Gets Deployed

F5 AI Security consists of four main components, each running in its own OpenShift namespace:

Component                 Namespace       Role
Moderator + PostgreSQL    cai-moderator   Web UI, API gateway, policy management, and backing database
Prefect Server + Worker   prefect         Workflow orchestration for scans and red-team runs
AI Guardrails Scanner     cai-scanner     Inline scanning against your OpenAI-compatible LLM endpoint
AI Red Team Worker        cai-redteam     GPU-backed adversarial testing; reports results to Moderator via Prefect

The Moderator is CPU-only. The Scanner and Red Team Worker can leverage GPUs depending on the policies and models you configure.
Infrastructure Requirements

Before you begin, verify your cluster meets these minimums:

CPU / Control Node
16 vCPUs, 32 GiB RAM, x86_64, 100 GiB persistent storage

Worker Nodes (per GPU-enabled component)
4 vCPUs, 16 GiB RAM (32 GiB recommended for Red Team), 100 GiB storage

GPU Nodes
AI Guardrails: CUDA-compatible GPU, minimum 24 GB VRAM, 100 GiB storage
AI Red Team: CUDA-compatible GPU, minimum 48 GB VRAM, 200 GiB storage
GPU must NOT be shared with other workloads

Verify your cluster:

# Check nodes
oc get nodes -o wide

# Check GPU allocatable resources
oc get node -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'

# Check available storage classes
oc get storageclass
NAME                 PROVISIONER   RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
lvms-vg1 (default)   topolvm.io    Delete          WaitForFirstConsumer   true                   15d

Step 1 — Install Prerequisites

1.1 Node Feature Discovery (NFD) Operator

NFD labels your nodes with hardware capabilities, which the NVIDIA GPU Operator relies on to target the right nodes.

OpenShift Console → Ecosystem → Software Catalog → Search Node Feature Discovery Operator → Install

After installation: Installed Operators → Node Feature Discovery → Create NodeFeatureDiscovery → Accept defaults

Verify:

oc get pods -n openshift-nfd
oc get node --show-labels | grep feature.node.kubernetes.io || true

1.2 NVIDIA GPU Operator

OpenShift Console → Ecosystem → Software Catalog → Search GPU Operator → Install

After installation: Installed Operators → NVIDIA GPU Operator → Create ClusterPolicy → Accept defaults

Verify:

oc get pods -n nvidia-gpu-operator
oc describe node <gpu-node> | grep -i nvidia
nvidia-smi

Step 2 — Install F5 AI Security Operator

Prerequisites: You will need registry credentials and a valid license from the F5 AI Security team before proceeding.
Contact F5 Sales: https://www.f5.com/products/get-f5

2.1 Create the Namespace and Pull Secret

export DOCKER_USERNAME='<registry-username>'
export DOCKER_PASSWORD='<registry-password>'
export DOCKER_EMAIL='<your-email>'

oc new-project f5-ai-sec

# Quote the variables so credentials containing special characters survive word splitting
oc create secret docker-registry regcred \
  -n f5-ai-sec \
  --docker-username="$DOCKER_USERNAME" \
  --docker-password="$DOCKER_PASSWORD" \
  --docker-email="$DOCKER_EMAIL"

2.2 Install from OperatorHub

OpenShift Console → Ecosystem → Software Catalog → Search F5 AI Security Operator → Install into namespace f5-ai-sec

Verify your F5 AI Security Operator:

# Verify the controller-manager pod is Running
oc -n f5-ai-sec get pods
# NAME                                  READY   STATUS    RESTARTS   AGE
# controller-manager-6f784bd96d-z6sbh   1/1     Running   1          43s

# Verify the CSV reached Succeeded phase
oc -n f5-ai-sec get csv
# NAME                             DISPLAY                   VERSION   PHASE
# f5-ai-security-operator.v0.4.3   F5 Ai Security Operator   0.4.3     Succeeded

# Verify the CRD is registered
oc -n f5-ai-sec get crd | grep ai.security.f5.com
# securityoperators.ai.security.f5.com

2.3 Deploy the SecurityOperator Custom Resource

After installation: Installed Operators → F5 AI Security Operator → Create SecurityOperator

Choose the YAML view and paste in the Custom Resource template below, changing the values noted to match your installation.
apiVersion: ai.security.f5.com/v1alpha1
kind: SecurityOperator
metadata:
  name: security-operator-demo
  namespace: f5-ai-sec
spec:
  registryAuth:
    existingSecret: "regcred"
  # Internal PostgreSQL — convenient for labs, not recommended for production
  postgresql:
    enabled: true
    values:
      postgresql:
        auth:
          password: "pass"
  jobManager:
    enabled: true
  moderator:
    enabled: true
    values:
      env:
        CAI_MODERATOR_BASE_URL: https://<your-hostname>
      secrets:
        CAI_MODERATOR_DB_ADMIN_PASSWORD: "pass"
        CAI_MODERATOR_DEFAULT_LICENSE: "<valid_license_from_f5>"
  scanner:
    enabled: true
  redTeam:
    enabled: true

Key values to customize:

Field                             What to set
CAI_MODERATOR_BASE_URL            Your cluster's public hostname for the UI (e.g., https://aisec.apps.mycluster.example.com)
CAI_MODERATOR_DEFAULT_LICENSE     License string provided by F5
CAI_MODERATOR_DB_ADMIN_PASSWORD   DB password; must match the value set in the PostgreSQL block

For external PostgreSQL (recommended for production), replace the postgresql block with:

moderator:
  values:
    env:
      CAI_MODERATOR_DB_HOST: <my-external-db-hostname>
    secrets:
      CAI_MODERATOR_DB_ADMIN_PASSWORD: <my-external-db-password>

Verify the SecurityOperator resource:

oc -n f5-ai-sec get securityoperator
oc -n f5-ai-sec get securityoperator security-operator-demo -o yaml | sed -n '/status:/,$p'

Step 3 — Required OpenShift Configuration

This is where most deployments hit problems. OpenShift's default restricted Security Context Constraint (SCC) blocks these containers from running. You must explicitly grant anyuid to each service account.
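To see which SCC the pods in each component namespace were actually admitted under, a loop such as the following can help confirm whether the anyuid grant has taken effect. It is a sketch that assumes the four namespaces listed earlier and relies on the openshift.io/scc annotation that OpenShift sets on admitted pods; the function name is illustrative.

```shell
# Sketch: print each pod and the SCC it was admitted under, per namespace.
# Namespaces are taken from the component table; adjust if yours differ.
report_pod_scc() {
  for ns in cai-moderator prefect cai-scanner cai-redteam; do
    echo "== $ns =="
    oc -n "$ns" get pods \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.openshift\.io/scc}{"\n"}{end}'
  done
}

# Usage: report_pod_scc
```

An empty namespace section means no pods were admitted at all; in that case the namespace's Warning events (oc -n <namespace> get events) usually name the constraint that rejected them.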
3.1 Apply SCC Policies

oc adm policy add-scc-to-user anyuid -z cai-moderator-sa -n cai-moderator
oc adm policy add-scc-to-user anyuid -z default -n cai-moderator
oc adm policy add-scc-to-user anyuid -z default -n prefect
oc adm policy add-scc-to-user anyuid -z prefect-server -n prefect
oc adm policy add-scc-to-user anyuid -z prefect-worker -n prefect
oc adm policy add-scc-to-user anyuid -z cai-scanner -n cai-scanner
oc adm policy add-scc-to-user anyuid -z cai-redteam-worker -n cai-redteam

3.2 Force PostgreSQL to Restart (if Stuck at 0/1)

If PostgreSQL was stuck before the SCC was applied, bounce it manually:

oc -n cai-moderator scale sts/cai-moderator-postgres-cai-postgresql --replicas=0
oc -n cai-moderator scale sts/cai-moderator-postgres-cai-postgresql --replicas=1

3.3 Restart All Components

oc -n cai-moderator rollout restart deploy
oc -n prefect rollout restart deploy
oc -n cai-scanner rollout restart deploy
oc -n cai-redteam rollout restart deploy

3.4 Verify

➜ oc -n cai-moderator get statefulset
NAME                                    READY   AGE
cai-moderator-postgres-cai-postgresql   1/1     3d4h

➜ oc -n cai-moderator get pods | grep postgres
cai-moderator-postgres-cai-postgresql-0   1/1   Running   0   3d4h

➜ oc -n cai-moderator get pods | grep cai-moderator
cai-moderator-75c47fc9db-sl8t2            1/1   Running   0   3d4h
cai-moderator-postgres-cai-postgresql-0   1/1   Running   0   3d4h

➜ oc -n cai-moderator get svc
NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
cai-moderator                       ClusterIP   172.30.123.197   <none>        5500/TCP,8080/TCP   3d4h
cai-moderator-headless              ClusterIP   None             <none>        8080/TCP            3d4h
cai-moderator-postgres-postgresql   ClusterIP   None             <none>        5432/TCP            3d4h

➜ oc -n cai-moderator get endpoints
Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
NAME                                ENDPOINTS                             AGE
cai-moderator                       10.130.0.139:8080,10.130.0.139:5500   3d4h
cai-moderator-headless              10.130.0.139:8080                     3d4h
cai-moderator-postgres-postgresql   10.128.0.177:5432                     3d4h

Step 4 — Create OpenShift Routes (Required for UI Access)

The Moderator exposes two ports that must be routed separately: port 5500 for the UI and port 8080 for the /auth path. Skipping the auth route is the most common cause of the blank/black page issue.

# UI route
oc -n cai-moderator create route edge cai-moderator-ui \
  --service=cai-moderator \
  --port=5500 \
  --hostname=<your-hostname> \
  --path=/

# Auth route — required, or the UI will render blank
oc -n cai-moderator create route edge cai-moderator-auth \
  --service=cai-moderator \
  --port=8080 \
  --hostname=<your-hostname> \
  --path=/auth

Verify all pods are running:

oc get pods -n cai-moderator
oc get pods -n cai-scanner
oc get pods -n cai-redteam
oc get pods -n prefect

Access the UI

Open https://<your-hostname> in a browser. Log in with the default credentials: admin / pass

You should be able to log in successfully and see the Guardrails dashboard. Update the admin email address immediately after logging in.

Step 5 — Grant Prefect Worker Cluster-scope RBAC

The Prefect worker watches Kubernetes Pods and Jobs at cluster scope to monitor scan and red-team workflow execution. Without this RBAC, prefect-worker fills its logs with 403 Forbidden errors. The Guardrails UI still loads, but scheduled workflows and Red Team runs will fail silently.
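Because these failures are silent, it can help to test the service account's permissions directly rather than waiting for errors in the logs. The sketch below uses oc auth can-i with impersonation; the function name is illustrative, the service account and namespace are the ones used in this step, and running it assumes your own user is allowed to impersonate service accounts (cluster-admin normally is).

```shell
# Sketch: ask the API server whether prefect-worker may get/list/watch
# pods, jobs, and events across all namespaces. Prints one line per check.
check_prefect_rbac() {
  sa="system:serviceaccount:prefect:prefect-worker"
  for res in pods jobs.batch events; do
    for verb in get list watch; do
      printf '%s %s: ' "$verb" "$res"
      # can-i exits non-zero on "no"; keep going so every check is reported
      oc auth can-i "$verb" "$res" --all-namespaces --as="$sa" || true
    done
  done
}

# Usage: check_prefect_rbac
```

Any "no" in the output means the worker lacks that cluster-scope permission; once the RBAC for this step is in place, every line should read "yes".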
# ClusterRole: allow prefect-worker to list/watch pods, jobs, and events cluster-wide
oc apply -f - <<'YAML'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prefect-worker-watch-cluster
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get","list","watch"]
- apiGroups: [""]
  resources: ["pods","pods/log","events"]
  verbs: ["get","list","watch"]
YAML

# ClusterRoleBinding: bind to the prefect-worker ServiceAccount
oc apply -f - <<'YAML'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prefect-worker-watch-cluster
subjects:
- kind: ServiceAccount
  name: prefect-worker
  namespace: prefect
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prefect-worker-watch-cluster
YAML

# Restart to pick up the new permissions
oc -n prefect rollout restart deploy/prefect-worker

Verify RBAC errors are gone:

# grep -E replaces the deprecated egrep
oc -n prefect logs deploy/prefect-worker --tail=200 \
  | grep -Ei 'forbidden|rbac|permission|denied' \
  || echo "OK: no RBAC errors detected"
oc get clusterrolebinding prefect-worker-watch-cluster

LlamaStack Integration

F5 AI Security works alongside any OpenAI-compatible LLM inference endpoint. In our lab we pair it with LlamaStack running a quantized Llama 3.2 model on the same OpenShift cluster. F5 AI Guardrails then scans every prompt and response inline before it reaches your application. A dedicated follow-up post will walk through the full LlamaStack deployment and end-to-end integration in detail. Stay tuned.

Summary

Deploying F5 AI Security on OpenShift is straightforward once you know the OpenShift-specific friction points: SCC policies, the dual-route requirement, and the Prefect cluster-scope RBAC. Following this runbook in sequence (prerequisites, operator install, SCC grants, routes, Prefect RBAC) gets you to a fully operational AI guardrailing stack in a single pass. If you run into anything not covered here, drop a comment below.
Tested on: OpenShift 4.20.14 · F5 AI Security Operator v0.4.3 · NVIDIA A40 GPUs · LlamaStack with Llama-3.2-1B-Instruct-quantized.w8a8

Additional Resources

F5 AI Security Operator — Red Hat Catalog

AI Inference for VLLM models with F5 BIG-IP & Red Hat OpenShift
This article shows how to perform Intelligent Load Balancing for AI workloads using the new features of BIG-IP v21 and Red Hat OpenShift. Load-balancing decisions are based on business-logic rules, without iRule programming, and on state metrics of the VLLM inference servers gathered from OpenShift's Prometheus.

App Migration and Portability with Equinix Fabric and F5 Distributed Cloud CE
Enterprises face growing pressure to modernize legacy applications, adopt hybrid multi-cloud strategies, and meet rising compliance and performance demands. Migration and portability are now essential to enable agility, optimize costs, and accelerate innovation. Organizations need a secure, high‑performance way to move and connect applications across environments without re‑architecting. This solution brings together Equinix and F5 to deliver a unified, cloud‑adjacent application delivery and security platform.

F5 Container Ingress Services (CIS) deployment using Cilium CNI and static routes
F5 Container Ingress Services (CIS) supports static route configuration to enable direct routing from F5 BIG-IP to Kubernetes/OpenShift Pods as an alternative to VXLAN tunnels. Static routes are enabled in the F5 CIS CLI/Helm yaml manifest using the argument --static-routing-mode=true. In this article, we will use Cilium as the Container Network Interface (CNI) and configure static routes for an NGINX deployment.

For initial configuration of the BIG-IP, including AS3 installation, please see https://clouddocs.f5.com/products/extensions/f5-appsvcs-extension/latest/userguide/installation.html and https://clouddocs.f5.com/containers/latest/userguide/kubernetes/#cis-installation

The first step is to install the Cilium CLI and Cilium CNI on a Linux host using the steps below:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}

cilium install --version 1.18.5
cilium status
cilium status --wait

root@ciliumk8s-ubuntu-server:~# cilium status --wait
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       disabled
    \__/       ClusterMesh:        disabled

DaemonSet              cilium                   Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet              cilium-envoy             Desired: 1, Ready: 1/1, Available: 1/1
Deployment             cilium-operator          Desired: 1, Ready: 1/1, Available: 1/1
Containers:            cilium                   Running: 1
                       cilium-envoy             Running: 1
                       cilium-operator          Running: 1
                       clustermesh-apiserver
                       hubble-relay
Cluster Pods:          6/6 managed by Cilium
Helm chart version:    1.18.3
Image versions         cilium             quay.io/cilium/cilium:v1.18.3@sha256:5649db451c88d928ea585514746d50d91e6210801b300c897283ea319d68de15: 1
                       cilium-envoy       quay.io/cilium/cilium-envoy:v1.34.10-1761014632-c360e8557eb41011dfb5210f8fb53fed6c0b3222@sha256:ca76eb4e9812d114c7f43215a742c00b8bf41200992af0d21b5561d46156fd15: 1
                       cilium-operator    quay.io/cilium/operator-generic:v1.18.3@sha256:b5a0138e1a38e4437c5215257ff4e35373619501f4877dbaf92c89ecfad81797: 1

cilium connectivity test

root@ciliumk8s-ubuntu-server:~# cilium connectivity test
ℹ️ Monitor aggregation detected, will skip some flow validation steps
✨ [default] Creating namespace cilium-test-1 for connectivity check...
✨ [default] Deploying echo-same-node service...
✨ [default] Deploying DNS test server configmap...
✨ [default] Deploying same-node deployment...
✨ [default] Deploying client deployment...
✨ [default] Deploying client2 deployment...
✨ [default] Deploying ccnp deployment...
⌛ [default] Waiting for deployment cilium-test-1/client to become ready...
⌛ [default] Waiting for deployment cilium-test-1/client2 to become ready...
⌛ [default] Waiting for deployment cilium-test-1/echo-same-node to become ready...
⌛ [default] Waiting for deployment cilium-test-ccnp1/client-ccnp to become ready...
⌛ [default] Waiting for deployment cilium-test-ccnp2/client-ccnp to become ready...
⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod...
⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach DNS server on cilium-test-1/echo-same-node-f5b8d454c-qkgq9 pod...
⌛ [default] Waiting for pod cilium-test-1/client-645b68dcf7-s5mdb to reach default/kubernetes service...
⌛ [default] Waiting for pod cilium-test-1/client2-66475877c6-cw7f5 to reach default/kubernetes service...
⌛ [default] Waiting for Service cilium-test-1/echo-same-node to become ready...
⌛ [default] Waiting for Service cilium-test-1/echo-same-node to be synchronized by Cilium pod kube-system/cilium-lxjxf
⌛ [default] Waiting for NodePort 10.69.12.2:32046 (cilium-test-1/echo-same-node) to become ready...
🔭 Enabling Hubble telescope...
⚠️ Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4245: connect: connection refused"
ℹ️ Expose Relay locally with:
   cilium hubble enable
   cilium hubble port-forward&
ℹ️ Cilium version: 1.18.3
🏃[cilium-test-1] Running 126 tests ...
[=] [cilium-test-1] Test [no-policies] [1/126]
....................
[=] [cilium-test-1] Skipping test [no-policies-from-outside] [2/126] (skipped by condition)
[=] [cilium-test-1] Test [no-policies-extra] [3/126]
<- snip ->

For this article, we will install k3s with Cilium CNI:

root@ciliumk8s-ubuntu-server:~# curl -sfL https://get.k3s.io | sh -s - --flannel-backend=none --disable-kube-proxy --disable servicelb --disable-network-policy --disable traefik --cluster-init --node-ip=10.69.12.2 --cluster-cidr=10.42.0.0/16
root@ciliumk8s-ubuntu-server:~# mkdir -p $HOME/.kube
root@ciliumk8s-ubuntu-server:~# sudo cp -i /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
root@ciliumk8s-ubuntu-server:~# sudo chown $(id -u):$(id -g) $HOME/.kube/config
root@ciliumk8s-ubuntu-server:~# echo "export KUBECONFIG=$HOME/.kube/config" >> $HOME/.bashrc
root@ciliumk8s-ubuntu-server:~# source $HOME/.bashrc

API_SERVER_IP=10.69.12.2
API_SERVER_PORT=6443
CLUSTER_ID=1
CLUSTER_NAME=`hostname`
POD_CIDR="10.42.0.0/16"

root@ciliumk8s-ubuntu-server:~# cilium install --set cluster.id=${CLUSTER_ID} --set cluster.name=${CLUSTER_NAME} --set k8sServiceHost=${API_SERVER_IP} --set k8sServicePort=${API_SERVER_PORT} --set ipam.operator.clusterPoolIPv4PodCIDRList=$POD_CIDR --set kubeProxyReplacement=true --helm-set=operator.replicas=1
root@ciliumk8s-ubuntu-server:~# cilium config view | grep cluster
bpf-lb-external-clusterip false cluster-id 1 cluster-name ciliumk8s-ubuntu-server cluster-pool-ipv4-cidr 10.42.0.0/16 cluster-pool-ipv4-mask-size 24 clustermesh-enable-endpoint-sync false clustermesh-enable-mcs-api false ipam cluster-pool max-connected-clusters 255 policy-default-local-cluster false root@ciliumk8s-ubuntu-server:~# cilium status --wait The F5 CIS yaml manifest for deployment using Helm Note that these arguments are required for CIS to leverage static routes static-routing-mode: true orchestration-cni: cilium-k8s We will also be installing custom resources, so this argument is also required 3. custom-resource-mode: true Values yaml manifest for Helm deployment bigip_login_secret: f5-bigip-ctlr-login bigip_secret: create: false username: password: rbac: create: true serviceAccount: # Specifies whether a service account should be created create: true # The name of the service account to use. # If not set and create is true, a name is generated using the fullname template name: k8s-bigip-ctlr # This namespace is where the Controller lives; namespace: kube-system ingressClass: create: true ingressClassName: f5 isDefaultIngressController: true args: # See https://clouddocs.f5.com/containers/latest/userguide/config-parameters.html # NOTE: helm has difficulty with values using `-`; `_` are used for naming # and are replaced with `-` during rendering. # REQUIRED Params bigip_url: X.X.X.S bigip_partition: <BIG-IP_PARTITION> # OPTIONAL PARAMS -- uncomment and provide values for those you wish to use. 
static-routing-mode: true orchestration-cni: cilium-k8s # verify_interval: # node-poll_interval: # log_level: DEBUG # python_basedir: ~ # VXLAN # openshift_sdn_name: # flannel_name: cilium-vxlan # KUBERNETES # default_ingress_ip: # kubeconfig: # namespaces: ["foo", "bar"] # namespace_label: # node_label_selector: pool_member_type: cluster # resolve_ingress_names: # running_in_cluster: # use_node_internal: # use_secrets: insecure: true custom-resource-mode: true log-as3-response: true as3-validation: true # gtm-bigip-password # gtm-bigip-url # gtm-bigip-username # ipam : true image: # Use the tag to target a specific version of the Controller user: f5networks repo: k8s-bigip-ctlr pullPolicy: Always version: latest # affinity: # nodeAffinity: # requiredDuringSchedulingIgnoredDuringExecution: # nodeSelectorTerms: # - matchExpressions: # - key: kubernetes.io/arch # operator: Exists # securityContext: # runAsUser: 1000 # runAsGroup: 3000 # fsGroup: 2000 # If you want to specify resources, uncomment the following # limits_cpu: 100m # limits_memory: 512Mi # requests_cpu: 100m # requests_memory: 512Mi # Set podSecurityContext for Pod Security Admission and Pod Security Standards # podSecurityContext: # runAsUser: 1000 # runAsGroup: 1000 # privileged: true Installation steps for deploying F5 CIS using helm can be found in this link https://clouddocs.f5.com/containers/latest/userguide/kubernetes/ Once F5 CIS is validated to be up and running, we can now deploy the following application example root@ciliumk8s-ubuntu-server:~# cat application.yaml apiVersion: cis.f5.com/v1 kind: VirtualServer metadata: labels: f5cr: "true" name: goblin-virtual-server namespace: nsgoblin spec: host: goblin.com pools: - path: /green service: svc-nodeport servicePort: 80 - path: /harry service: svc-nodeport servicePort: 80 virtualServerAddress: X.X.X.X --- apiVersion: apps/v1 kind: Deployment metadata: name: goblin-backend namespace: nsgoblin spec: replicas: 2 selector: matchLabels: app: 
goblin-backend template: metadata: labels: app: goblin-backend spec: containers: - name: goblin-backend image: nginx:latest ports: - containerPort: 80 --- apiVersion: v1 kind: Service metadata: name: svc-nodeport namespace: nsgoblin spec: selector: app: goblin-backend ports: - port: 80 targetPort: 80 type: ClusterIP k apply -f application.yaml We can now verify the k8s pods are created. Then we will create a sample html page to test access to the backend NGINX pod root@ciliumk8s-ubuntu-server:~# k -n nsgoblin get po -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES goblin-backend-7485b6dcdf-d5t48 1/1 Running 0 6d2h 10.42.0.70 ciliumk8s-ubuntu-server <none> <none> goblin-backend-7485b6dcdf-pt7hx 1/1 Running 0 6d2h 10.42.0.97 ciliumk8s-ubuntu-server <none> <none> root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it po/goblin-backend-7485b6dcdf-pt7hx -- /bin/sh # cat > green <<'EOF' <!DOCTYPE html> > > <html> > <head> <title>Green Goblin</title> <style> body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; } h1 { font-size: 3em; } > > > > > </style> </head> <body> <h1>I am the green goblin!</h1> <p>Access me at /green</p> </body> </html> > > > > > > > EOF root@ciliumk8s-ubuntu-server:~# k -n nsgoblin exec -it goblin-backend-7485b6dcdf-d5t48 -- /bin/sh # cat > green <<'EOF' > <!DOCTYPE html> <html> <head> <title>Green Goblin</title> <style> body { background-color: #4CAF50; color: white; text-align: center; padding: 50px; } h1 { font-size: 3em; } </style> > </head> <body> <h1>I am the green goblin!</h1> <p>Access me at /green</p> </body> </html> EOF> > > > > > > > > > > > > We can now validate the pools are created on the F5 BIG-IP root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm pool all ltm pool svc_nodeport_80_nsgoblin_goblin_com_green { description "crd_10_69_12_40_80 loadbalances this pool" members { /kubernetes/10.42.0.70:http { address 10.42.0.70 } 
/kubernetes/10.42.0.97:http { address 10.42.0.97 } } min-active-members 1 partition kubernetes }
ltm pool svc_nodeport_80_nsgoblin_goblin_com_harry {
    description "crd_10_69_12_40_80 loadbalances this pool"
    members {
        /kubernetes/10.42.0.70:http {
            address 10.42.0.70
        }
        /kubernetes/10.42.0.97:http {
            address 10.42.0.97
        }
    }
    min-active-members 1
    partition kubernetes
}

root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes/Shared)(tmos)# list ltm virtual crd_10_69_12_40_80
ltm virtual crd_10_69_12_40_80 {
    creation-time 2025-12-22:10:10:37
    description Shared
    destination /kubernetes/10.69.12.40:http
    ip-protocol tcp
    last-modified-time 2025-12-22:10:10:37
    mask 255.255.255.255
    partition kubernetes
    persist {
        /Common/cookie {
            default yes
        }
    }
    policies {
        crd_10_69_12_40_80_goblin_com_policy { }
    }
    profiles {
        /Common/f5-tcp-progressive { }
        /Common/http { }
    }
    serverssl-use-sni disabled
    source 0.0.0.0/0
    source-address-translation {
        type automap
    }
    translate-address enabled
    translate-port enabled
    vs-index 2
}

CIS log output

2025/12/22 18:10:25 [INFO] [Request: 1] cluster local requested CREATE in VIRTUALSERVER nsgoblin/goblin-virtual-server
2025/12/22 18:10:25 [INFO] [Request: 1][AS3] creating a new AS3 manifest
2025/12/22 18:10:25 [INFO] [Request: 1][AS3][BigIP] posting request to https://10.69.12.1 for tenants
2025/12/22 18:10:26 [INFO] [Request: 2] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport
2025/12/22 18:10:26 [INFO] [Request: 3] cluster local requested UPDATE in ENDPOINTS nsgoblin/svc-nodeport
2025/12/22 18:10:43 [INFO] [Request: 1][AS3][BigIP] post resulted in SUCCESS
2025/12/22 18:10:43 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success
2025/12/22 18:10:43 [INFO] [Request: 3][AS3] Processing request
2025/12/22 18:10:43 [INFO] [Request: 3][AS3] creating a new AS3 manifest
2025/12/22 18:10:43 [INFO] [Request: 3][AS3][BigIP] posting request to https://10.69.12.1 for tenants
2025/12/22 18:10:43 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster
W1222 18:10:49.238444 1 warnings.go:70] v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
2025/12/22 18:10:52 [INFO] [Request: 3][AS3][BigIP] post resulted in SUCCESS
2025/12/22 18:10:52 [INFO] [AS3][POST] SUCCESS: code: 200 --- tenant:kubernetes --- message: success
2025/12/22 18:10:52 [INFO] Successfully updated status of VirtualServer:nsgoblin/goblin-virtual-server in Cluster

Troubleshooting:

1. If static routes are not added, the first step is to inspect CIS logs for entries similar to these:

Cilium annotation warning logs

2025/12/22 17:44:45 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
2025/12/22 17:46:41 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
2025/12/22 17:46:42 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?
2025/12/22 17:46:43 [WARNING] Cilium node podCIDR annotation not found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?

2. These are resolved by adding annotations to the node using the reference: https://clouddocs.f5.com/containers/latest/userguide/static-route-support.html

Cilium annotation for node

root@ciliumk8s-ubuntu-server:~# k annotate node ciliumk8s-ubuntu-server io.cilium.network.ipv4-pod-cidr=10.42.0.0/16
root@ciliumk8s-ubuntu-server:~# k describe node | grep -E "Annotations:|PodCIDR:|^\s+.*pod-cidr"
Annotations:  alpha.kubernetes.io/provided-node-ip: 10.69.12.2
              io.cilium.network.ipv4-pod-cidr: 10.42.0.0/16
PodCIDR:      10.42.0.0/24

3. Verify a static route has been created and test connectivity to k8s pods

root@(ciliumk8s-bigip)(cfg-sync Standalone)(Active)(/kubernetes)(tmos)# list net route
net route k8s-ciliumk8s-ubuntu-server-10.69.12.2 {
    description 10.69.12.1
    gw 10.69.12.2
    network 10.42.0.0/16
    partition kubernetes
}

Using pup (command line HTML parser) -> https://commandmasters.com/commands/pup-common/

root@ciliumk8s-ubuntu-server:~# curl -s http://goblin.com/green | pup 'body text{}'
I am the green goblin! Access me at /green

 1 0.000000 10.69.12.34 → 10.69.12.40 TCP 78 34294 → 80 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM TSval=2984295232 TSecr=0 WS=128
 2 0.000045 10.69.12.40 → 10.69.12.34 TCP 78 80 → 34294 [SYN, ACK] Seq=0 Ack=1 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316303 TSecr=2984295232
 3 0.001134 10.69.12.34 → 10.69.12.40 TCP 70 34294 → 80 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=2984295234 TSecr=1809316303
 4 0.001151 10.69.12.34 → 10.69.12.40 HTTP 149 GET /green HTTP/1.1
 5 0.001343 10.69.12.40 → 10.69.12.34 TCP 70 80 → 34294 [ACK] Seq=1 Ack=80 Win=23040 Len=0 TSval=1809316304 TSecr=2984295234
 6 0.002497 10.69.12.1 → 10.42.0.97 TCP 78 33707 → 80 [SYN] Seq=0 Win=23360 Len=0 MSS=1460 WS=512 SACK_PERM TSval=1809316304 TSecr=0
 7 0.003614 10.42.0.97 → 10.69.12.1 TCP 78 80 → 33707 [SYN, ACK] Seq=0 Ack=1 Win=64308 Len=0 MSS=1410 SACK_PERM TSval=1012609408 TSecr=1809316304 WS=128
 8 0.003636 10.69.12.1 → 10.42.0.97 TCP 70 33707 → 80 [ACK] Seq=1 Ack=1 Win=23040 Len=0 TSval=1809316307 TSecr=1012609408
 9 0.003680 10.69.12.1 → 10.42.0.97 HTTP 149 GET /green HTTP/1.1
10 0.004774 10.42.0.97 → 10.69.12.1 TCP 70 80 → 33707 [ACK] Seq=1 Ack=80 Win=64256 Len=0 TSval=1012609409 TSecr=1809316307
11 0.004790 10.42.0.97 → 10.69.12.1 TCP 323 HTTP/1.1 200 OK [TCP segment of a reassembled PDU]
12 0.004796 10.42.0.97 → 10.69.12.1 HTTP 384 HTTP/1.1 200 OK
13 0.004820 10.69.12.40 → 10.69.12.34 TCP 448 HTTP/1.1 200 OK [TCP segment of a reassembled PDU]
14 0.004838 10.69.12.1 → 10.42.0.97 TCP 70 33707 → 80 [ACK] Seq=80 Ack=254 Win=23552 Len=0 TSval=1809316308 TSecr=1012609410
15 0.004854 10.69.12.40 → 10.69.12.34 HTTP 384 HTTP/1.1 200 OK

Summary:

There we have it: we have successfully deployed an NGINX application on a Kubernetes cluster managed by F5 CIS, using static routes to forward traffic to the Kubernetes pods.
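The troubleshooting checks (spotting the Cilium annotation warning in the CIS logs, then confirming the static route covers the pool members) could also be scripted. The following is a minimal sketch, not part of CIS or any F5 tooling: the function names `nodes_missing_annotation` and `members_outside_route` are hypothetical, the regular expression is an assumption based only on the warning text shown in the logs here, and the addresses are the ones from this walkthrough.

```python
import ipaddress
import re

# Assumed pattern, derived from the CIS warning text quoted in this post.
WARNING_RE = re.compile(
    r"\[WARNING\] Cilium node podCIDR annotation not found on node ([^\s,]+),"
)


def nodes_missing_annotation(log_text):
    """Return the sorted, de-duplicated node names named in the warnings."""
    return sorted(set(WARNING_RE.findall(log_text)))


def members_outside_route(route_network, member_addresses):
    """Return the member addresses that the route network does not cover."""
    network = ipaddress.ip_network(route_network)
    return [
        addr for addr in member_addresses
        if ipaddress.ip_address(addr) not in network
    ]


if __name__ == "__main__":
    # Two log lines copied from the output above: one warning, one info.
    sample_log = (
        "2025/12/22 17:44:45 [WARNING] Cilium node podCIDR annotation not "
        "found on node ciliumk8s-ubuntu-server, node has spec.podCIDR ?\n"
        "2025/12/22 18:10:43 [INFO] [Request: 3][AS3] Processing request\n"
    )
    print(nodes_missing_annotation(sample_log))
    # → ['ciliumk8s-ubuntu-server']

    # Route network and pool members as listed on the BIG-IP above.
    print(members_outside_route("10.42.0.0/16", ["10.42.0.70", "10.42.0.97"]))
    # → []
```

An empty list from the second check means every pool member is reachable through the static route's network. In practice the log text would come from wherever you normally collect the CIS pod logs, and the member list from the pool definition.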