APIs First: Why AI Systems Are Still API Systems
AI and APIs

Over the past several years, the industry has seen an explosion of interest in large language models and AI-driven applications. Much of the discussion has focused on the models themselves: their size, their capabilities, and their apparent ability to reason, summarize, and generate content. In the process, it is easy to overlook a more fundamental reality. Modern AI systems are still API systems.

Despite new abstractions and new terminology, the underlying mechanics of AI applications remain familiar. Requests are sent, responses are returned. Identities are authenticated, authorization decisions are made, data is retrieved, and actions are executed. These interactions happen over APIs, and the reliability, security, and scalability of AI systems are constrained by the same architectural principles that have always governed distributed systems.

What is new is not the presence of APIs, but the nature of the consumer calling them. In traditional systems, API consumers are deterministic. They are code written by engineers who read the documentation and invoke endpoints in predictable ways. In AI systems, the consumer is increasingly a model, a probabilistic component that infers behavior from schemas, chains calls dynamically, and produces traffic patterns that were not explicitly programmed. That single shift is what makes every downstream concern in this series, including MCP design, token budgets, authorization, and operations, behave differently than in traditional API platforms. Understanding this relationship is critical, not only for building AI systems, but for operating and securing them in production.

AI Applications as API Orchestration Platforms

At a high level, an AI application is best understood not as a single model invocation, but as an orchestration layer that coordinates multiple API interactions. A typical request may involve:

- A client calling an application API
- Authentication and authorization checks
- Retrieval of contextual data from internal or external services
- One or more calls to a model inference endpoint
- Follow-on tool or service calls triggered by the model's output
- Aggregation and formatting of the final response

From an architectural perspective, this is not fundamentally different from any other multi-service application. Routing, observability, traffic management, and trust boundaries remain as relevant here as in any traditional platform. What has changed is that the decision logic, meaning when to call which service and with what parameters, is increasingly driven by model output rather than static application code. That shift does not eliminate APIs. It increases their importance.

AI Application as an Orchestration Platform

Models as API Endpoints, Not Black Boxes

In production environments, models are consumed almost exclusively through APIs. Whether hosted by a third party or deployed internally, a model is exposed as an endpoint that accepts structured input and returns structured output. Treating models as API endpoints clarifies several important points. A model does not "see" your system. It receives a request payload, processes it, and returns a response. Everything the model knows about your environment arrives through an API boundary.

What distinguishes model endpoints from conventional APIs is not their interface, but their operational profile:

- Responses are frequently streamed rather than returned as a single payload, which changes how load balancers, proxies, and timeouts behave.
- Payload sizes are highly variable, with both requests and responses ranging from a few hundred bytes to many megabytes depending on context and output length.
- Rate limits are often expressed in tokens per minute rather than requests per second, which complicates capacity planning and quota enforcement.
- Self-hosted models introduce additional concerns around GPU scheduling, cold start latency, and memory pressure that do not exist for traditional stateless services.

These characteristics do not change the fundamental nature of a model as an API endpoint. They do mean that the operational assumptions built into the existing API infrastructure may not hold without adjustment.

Tools, Retrieval, and Data Access Are Still APIs

As AI systems evolve beyond simple prompt-and-response interactions, they increasingly rely on tools: databases, search systems, ticketing platforms, code repositories, and internal business services. These tools are almost always accessed through APIs. Retrieval-augmented generation, for example, is often described as a novel AI pattern. In practice, it is a sequence of API calls:

- An embedding service is called to encode a query
- A vector database is queried for relevant results
- A document store is accessed to retrieve source material
- The retrieved data is passed to the model as context

Each step carries the usual concerns: latency, authorization, data exposure, and error handling. The model may influence when these calls occur, but it does not change their fundamental nature.

Why API Design Matters More in AI Systems

If AI systems are built on APIs, why do they feel harder to manage? The answer lies in amplification. Model-driven systems tend to:

- Chain API calls dynamically
- Surface data in ways developers did not explicitly anticipate
- Expand the blast radius of a misconfigured authorization
- Increase sensitivity to payload size and response shape

A poorly designed API that returns excessive data may be tolerable in a traditional application. In an AI system, that same response can overflow context limits, leak sensitive information into prompts, or cascade into additional unintended tool calls. This amplification rarely stays within a single domain. A schema decision that looks like an application concern becomes a traffic and routing concern when responses grow unpredictably, and an authorization concern when a model uses that response to drive the next call. Design choices that were once contained within one team's scope now propagate across the stack. In this sense, AI does not introduce entirely new architectural risks. It magnifies existing ones.

Introducing MCP as an API Coordination Layer

As models gain the ability to invoke tools directly, the need for consistent, structured access to APIs becomes more pressing. This is where Model Context Protocol (MCP) enters the picture. At a conceptual level, MCP does not replace APIs. It standardizes how AI systems discover, describe, and invoke API-backed tools. MCP servers typically sit in front of existing services, exposing them in a model-friendly way while relying on the same underlying API infrastructure.

Seen through this lens, MCP is not a departure from established architecture patterns. It is an adaptation, one that acknowledges models as active participants in API-driven systems rather than passive consumers of text. But it is also the introduction of a new coordination layer, a tool plane, with its own operational, network, and security properties that do not map cleanly onto the API layer beneath it.
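To ground the point that AI systems are still API systems, consider what a single model invocation with a tool attached looks like on the wire. The sketch below is illustrative only: the hostname, model name, and lookup_order tool are assumptions, and the request uses the widely adopted OpenAI-style chat completions format rather than anything specific to this series. Everything the model knows about the tool arrives in this one payload, and anything it decides to do with it comes back as structured output that the orchestration layer must translate into a further API call.

# A model invocation is just an HTTP request. The endpoint, model name, and
# tool definition below are placeholders for illustration.
curl -s https://inference.example.internal/v1/chat/completions \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "example-model",
        "messages": [
          {"role": "user", "content": "Where is order 1234?"}
        ],
        "tools": [
          {
            "type": "function",
            "function": {
              "name": "lookup_order",
              "description": "Fetch order status from the internal order API",
              "parameters": {
                "type": "object",
                "properties": { "order_id": { "type": "string" } },
                "required": ["order_id"]
              }
            }
          }
        ]
      }'
# If the model elects to use lookup_order, its response carries a structured
# tool call; the orchestrator, not the model, then issues the corresponding
# request to the order API, with all the usual authentication, authorization,
# and error handling concerns.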
The rest of this series examines what that means for the systems you build, run, and secure.

Looking Ahead

If AI systems are still API systems, then the familiar disciplines of API architecture, security, and operations remain essential. What changes is where decisions are made, how data flows, and how quickly small design flaws can propagate. The next article looks more closely at MCP itself, examining how it standardizes tool access on top of APIs and why treating it as a tool plane helps clarify both its power and its risks. From there, the series turns to tokens as a first-class design constraint that shapes tool schemas, response shaping, and traffic behavior. The fourth article addresses authorization and the security implications of letting models invoke tools directly, including identity, delegation, and the expanded blast radius MCP introduces. The series closes with a look at operating MCP-enabled systems in production, where reliability, cost, and safety have to be enforced rather than assumed.

Resources:

Article Series:
- MCP, APIs, and Tokens: Building and Securing the Tool Plane of AI Systems (Intro)
- MCP, APIs, and Tokens (Part 1 - APIs First: Why AI Systems Are Still API Systems)
- MCP, APIs, and Tokens (Part 2 - MCP as the Tool Plane: Standardizing Access Across APIs)
- MCP, APIs, and Tokens (Part 3 - Tokens as a Design Constraint for MCP and APIs)
- MCP, APIs, and Tokens (Part 4 - Securing the Tool Plane: MCP, APIs, and Authorization)
- MCP, APIs, and Tokens (Part 5 - Designing for the Inference Track: Safe, Scalable MCP Systems)

How to Optimize AI Inference with F5 NGINX Gateway Fabric
If you're managing Kubernetes clusters right now, you already know the drill: standard Layer 7 load balancing works flawlessly for web APIs where requests resolve in milliseconds. But the moment you start hosting Large Language Models (LLMs), that traditional routing logic falls apart. AI inference workloads are a completely different beast. You have to account for GPU memory, active inference queues, and KV-caches. If you rely on basic proxying, you usually end up with incredibly expensive GPUs tied up handling lightweight tasks, while your developers are forced to write custom middleware just to orchestrate traffic.

We needed a way to bring intent-driven networking directly to the AI edge. That's exactly what the Gateway API Inference Extension does. By pairing this with F5 NGINX Gateway Fabric, we can transform a standard Kubernetes Gateway into a dedicated Inference Gateway. Let's look at how this changes the game for platform teams.

The Two-Stage Architecture

To make intelligent routing decisions, we use a two-stage architecture that separates high-level routing intent from real-time endpoint selection.

- Stage 1 (Intent): We use standard Kubernetes HTTPRoutes to define exactly where traffic should go based on paths, headers, or weights.
- Stage 2 (Real-Time Selection): Instead of routing blindly to backend pods, we target an InferencePool CRD. This pool uses an Endpoint Picker to evaluate real-time node telemetry (like queue depth) and picks the absolute best pod for the job.

To prove this is running under the hood, we can describe our GPU and CPU InferencePools. Notice how each pool has a dedicated Endpoint Picker attached and ready to route traffic based on real-time node health.

GPU Pool Endpoint Picker:

kubectl describe inferencepool ollama-inferencepool -n ollama | grep -A 10 "Endpoint Picker"

Endpoint Picker Ref:
  Failure Mode:  FailClose
  Group:
  Kind:          Service
  Name:          ollama-inferencepool-epp
  Port:
    Number:      9002
Selector:
  Match Labels:
    App:  ollama   --> GPU
Target Ports:

CPU Pool Endpoint Picker:

kubectl describe inferencepool ollama-cpu-pool -n ollama | grep -A 10 "Endpoint Picker"

Endpoint Picker Ref:
  Failure Mode:  FailOpen
  Group:
  Kind:          Service
  Name:          ollama-cpu-pool-epp
  Port:
    Number:      9002
Selector:
  Match Labels:
    App:  ollama-cpu   --> CPU
Target Ports:

This separation of routing intent from real-time endpoint selection allows platform engineers to solve three critical AI infrastructure challenges without requiring developers to write custom middleware.

A quick note on resilience: Notice the Failure Mode in those outputs? This defines what happens if the Endpoint Picker itself goes offline. For our expensive GPU pool, we set it to FailClose (rejecting traffic so we don't overwhelm premium hardware blindly). For our CPU efficiency tier, we set it to FailOpen (falling back to standard round-robin load balancing to keep the application alive).

Let's look at three practical ways to use this setup to optimize your hardware.

1. Model-Aware Routing: Stop Using GPUs for Everything

Treating all AI traffic equally is the fastest way to exhaust a hardware budget. High-performance GPUs should be strictly reserved for complex generation tasks, while efficiency tiers (such as CPUs) should handle lightweight tasks, such as audio transcription or basic summarization.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: model-aware-httproute
  namespace: ollama
spec:
  parentRefs:
  - name: inference-gateway
    namespace: nginx-gateway
    sectionName: http
  rules:
  # --- UI Preservation Rules ---
  - matches:
    - path: { type: PathPrefix, value: /ui }
    filters:
    - type: URLRewrite
      urlRewrite:
        path: { type: ReplacePrefixMatch, replacePrefixMatch: / }
    backendRefs:
    - name: chatbot
      port: 8501
  - matches:
    - path: { type: PathPrefix, value: /static }
    - path: { type: Exact, value: /favicon.png }
    backendRefs:
    - name: chatbot
      port: 8501
  # --- AI Routing Rules ---
  - matches:
    - path: { type: PathPrefix, value: /v1/audio }
    backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: ollama-cpu-pool   # CPU Pool Fallback
      port: 11434
  - matches:
    - path: { type: PathPrefix, value: /v1/chat }
    backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: ollama-inferencepool   # GPU Pool Fallback
      port: 11434

Using path-based routing, the Gateway acts as an intelligent traffic cop. In the configuration above, we map the /v1/audio path directly to a CPU InferencePool. If a request hits this endpoint, the Gateway seamlessly offloads it to the efficiency tier, protecting our premium GPUs from trivial workloads.

2. Canary Deployments

In MLOps, rolling out a new LLM is inherently risky. Performance regressions, latency spikes, and hallucinations are real threats to the user experience. You cannot simply cut over all production traffic to a new model overnight. Native traffic splitting provides a safety net for model validation. By configuring a deterministic weight at the Gateway level, 90% of users continue to be served by the stable production GPU pool, while 10% of traffic is routed to the efficiency tier for real-time validation.

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: canary-httproute
  namespace: ollama
spec:
  parentRefs:
  - name: inference-gateway
    namespace: nginx-gateway
    sectionName: http
  rules:
  # --- UI Preservation Rules ---
  - matches:
    - path: { type: PathPrefix, value: /ui }
    filters:
    - type: URLRewrite
      urlRewrite:
        path: { type: ReplacePrefixMatch, replacePrefixMatch: / }
    backendRefs:
    - name: chatbot
      port: 8501
  - matches:
    - path: { type: PathPrefix, value: /static }
    - path: { type: Exact, value: /favicon.png }
    backendRefs:
    - name: chatbot
      port: 8501
  # --- Use Case 2: 90/10 Canary Split ---
  - matches:
    - path: { type: PathPrefix, value: /v1/chat }
    backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: ollama-inferencepool   # GPU Pool Fallback
      weight: 90
      port: 11434
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: ollama-cpu-pool   # CPU Pool Fallback
      weight: 10
      port: 11434

Because this is handled purely via declarative YAML, platform teams can execute risk-free canary tests without altering any application code.

3. Cost Optimization (Header-Based)

When an AI cluster is under maximum pressure, standard load balancers process requests on a first-in, first-out basis. In an enterprise environment, this is unacceptable. Mission-critical workflows and premium users must have guaranteed access to the best compute resources. By utilizing custom HTTP headers, client applications can signal their importance to the Gateway. The NGINX semantic engine reads these headers in real-time. If the x-query-complexity: high header is present, the request is immediately fast-tracked to the premium GPU pool. Every other request falls back to the CPU tier.
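From the client's perspective, priority signaling is nothing more than an extra header on the request. As a rough sketch (the gateway hostname, model name, and request bodies are illustrative assumptions, not values from this walk-through), a premium call and a standard call might look like this; the HTTPRoute that implements the steering follows below.

# Mission-critical request: the header steers it to the GPU pool
curl -s http://inference-gateway.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-query-complexity: high" \
  -d '{"model": "example-model", "messages": [{"role": "user", "content": "Summarize this incident report."}]}'

# Standard request (no header): falls back to the CPU efficiency tier
curl -s http://inference-gateway.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "example-model", "messages": [{"role": "user", "content": "What are your support hours?"}]}'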
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: header-based-httproute
  namespace: ollama
spec:
  parentRefs:
  - name: inference-gateway
    namespace: nginx-gateway
    sectionName: http
  rules:
  # --- UI Preservation Rules ---
  - matches:
    - path: { type: PathPrefix, value: /ui }
    filters:
    - type: URLRewrite
      urlRewrite:
        path: { type: ReplacePrefixMatch, replacePrefixMatch: / }
    backendRefs:
    - name: chatbot
      port: 8501
  - matches:
    - path: { type: PathPrefix, value: /static }
    - path: { type: Exact, value: /favicon.png }
    backendRefs:
    - name: chatbot
      port: 8501
  # --- Use Case 3: Priority Steering ---
  - matches:
    - headers:
      - type: Exact
        name: x-query-complexity
        value: high
      path: { type: PathPrefix, value: /v1/chat }
    backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: ollama-inferencepool   # GPU Pool
      port: 11434
  - matches:
    - path: { type: PathPrefix, value: /v1/chat }
    backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: ollama-cpu-pool   # CPU Pool Fallback
      port: 11434

This enforces strict SLAs regardless of overall cluster load, so your most valuable transactions stay running.

See the Architecture in Action

Managing complex AI traffic across heterogeneous hardware doesn't have to be a headache. By utilizing F5 NGINX Gateway Fabric, you can optimize compute, validate models safely, and prioritize critical traffic—all through declarative, intent-driven configurations. To see exactly how these routing rules are executed, check out my full technical walk-through below:

Resources:
- [NGINX Community Blog] Read the official announcement and dive deeper into how NGF supports this new extension.
- [NGINX Gateway Fabric Documentation] The official documentation for deploying and configuring NGINX Gateway Fabric in your Kubernetes environment.
- [NGINX Gateway Fabric (GitHub)] Explore the upstream development, CRD definitions, and architecture of the Inference Extension project.

F5 NGINX Plus 37.0 Release now available
We're thrilled to announce the general availability of F5 NGINX Plus Release 37.0. This release delivers an exceptionally broad set of new features, including the ability to monitor agentic traffic, control configurations remotely via a new API, format error logs as JSON with custom variables, monitor upstream latency with high-fidelity metrics, and much more. Additionally, to assist with lifecycle planning, NGINX Plus 37.0 will be the first release supported under our new long-term support (LTS) policy. Wondering why this is the 37.0 and not R37 release? With the introduction of the LTS policy, we're changing how NGINX Plus is versioned.

As always, NGINX Plus inherits all the latest capabilities from NGINX Open Source, the only all-in-one proven and trusted software web server, load balancer, reverse proxy, content cache, and API gateway. We highly recommend upgrading to the most recent NGINX Plus release to take advantage of all the latest features, fixes and security patches in NGINX Open Source and NGINX Plus.

Here's a summary of the most important updates in 37.0:

Agentic Observability: Real-time MCP Traffic Monitoring
NGINX Plus is ready for the agentic AI era with new observability capabilities that help teams trace and monitor AI agent activity, spot overly chatty agents, identify error-prone MCP tools, detect MCP server latency, and troubleshoot emerging AI application patterns.

Control API
Reconfiguring NGINX Plus just got much easier. With the new control API, teams can apply updates through a REST API call without the need to watch error logs to verify that reloads succeeded. The result is simpler config management, cleaner integrations, and CI/CD pipelines that are easier to automate.

JSON Error Logs with Custom Variables
Error logs are now easier to integrate, analyze, and act on. By exporting error logs in JSON format, teams can reduce the need for regex-heavy parsing or custom scripts, simplifying connections to logging pipelines. New customization options also enable richer debugging workflows, including correlation between individual error and access log entries.

Enhanced Upstream Latency Metrics
Understanding upstream behavior is essential to delivering fast, reliable applications. With latency histograms, teams can now spot user experience issues, API performance problems, and upstream bottlenecks faster than ever.

HTTP/2 Support for Upstream Connectivity
Applications can communicate with NGINX over HTTP/2 on the upstream side. This gives teams more flexibility to support modern application architectures without requiring backend services to rely on legacy HTTP protocols.

Basic Authentication for HTTP CONNECT Forward Proxy
Building on the forward proxy support introduced in NGINX Plus R36, 37.0 makes client authentication easier to implement with new Basic authentication support and a streamlined setup experience for JWT-based authentication.

Additional Features and Updates inherited from NGINX Open Source:
- Encrypted Client Hello
- Multipath TCP
- ACME Module: Renewal Information Support
- Upstream connectivity now defaults to HTTP/1.1 with keepalives enabled

New Features in Detail

Agentic Observability: Real-time MCP Traffic Monitoring

Agentic workloads introduce a new kind of traffic pattern: highly dynamic, non-deterministic tool calls that can fan out across multiple MCP servers and shift behavior over time.
With the new Agentic Observability module, NGINX can inspect MCP traffic in real time and report throughput, latency, errors, and traces so you can quickly understand which agents are generating traffic, which MCP tools and servers are bottlenecks, and where failures are originating.

To get started with Agentic Observability, configure NGINX Plus to use the mcp.js module from the nginx-mcp-js GitHub repository with the NGINX JavaScript module, the nginx-otel module, and an OpenTelemetry backend. See the repository for setup instructions and a sample configuration. The repository also includes an easy-to-deploy demo environment built with Docker Compose, OpenTelemetry, Prometheus, and Grafana.

NGINX Control API

NGINX Plus 37.0 introduces a new native Control API that provides programmatic access to runtime control and introspection. The API is capable of displaying the status of worker processes, identifying configuration files currently in memory, and updating NGINX configurations.

Historically, NGINX configuration updates were initiated via UNIX signals or the nginx -s reload command-line parameter. While syntax issues in the config are caught with this method, error log inspection is still necessary to determine whether reloads succeeded or failed. For example, binding a listening socket to a privileged port would pass a config syntax test, but ultimately fail if NGINX is not running with root privileges. The new Control API provides real-time status on reloads, including downstream errors, as JSON in response to a configuration reload API call.

Enabling the Control API

Enable the Control API with care. Although it can be configured to listen for API requests on an external socket, this configuration is strongly discouraged. IMPORTANT: For security reasons, do not expose the API on public ports, the public internet, or any broad network.

Enable the control listener when launching NGINX by using the -l command-line argument. For example, to expose the API on the Unix socket /tmp/nginx.sock, run the following command:

nginx -l unix:/tmp/nginx.sock

Configuring the Control API to listen on a Unix socket is currently the best available method for controlling authorization to make requests, because access can be managed through file permissions. The created Unix socket file is accessible only to the user running NGINX.

Accessing the API

The API is versioned and exposed under the /1 URI. You can use any HTTP client to access the API. The following examples use curl. To access the API exposed on a Unix socket, run the following command:

curl --unix-socket /tmp/nginx.sock http://localhost/1/

Response:

[
  "control",
  "nginx"
]

Inspecting the Running Configuration

The following command returns the configuration currently loaded into memory. This is useful for identifying differences between the configuration NGINX is currently using and the configuration on disk. Any differences may indicate that the configuration on disk was changed but not successfully loaded into memory.

curl --unix-socket /tmp/nginx.sock http://localhost/1/control/config

Sample response:

[
  {
    "name": "nginx.conf",
    "content": " (...) "
  },
  {
    "name": "stub.conf",
    "content": " (...) "
  }
]

Triggering a Reload with Structured Feedback

The following command causes NGINX to load the configuration from disk into memory. This is equivalent to running nginx -s reload, with the added benefit that the API can return errors beyond configuration syntax issues.
curl --unix-socket /tmp/nginx.sock -X PATCH http://localhost/1/control/config

Sample response:

{
  "logs": [
    " .... ",
    " .... "
  ]
}

JSON Formatted Error Logs

NGINX error logs have always provided valuable operational insight, but their traditional string-based format could make them difficult to parse at scale. Without structured fields or consistent delimiters, teams often had to rely on regexes, custom scripts, or complex heuristics to extract the data they needed. JSON is widely used across modern infrastructure for representing structured data, and support for parsing it is nearly universal across logging, monitoring, and automation tools. With JSON-formatted error logs in NGINX Plus, operators and system integrators can eliminate custom parsers and more easily connect NGINX to CI/CD pipelines, log aggregators, and application monitoring platforms.

To enable JSON format in error logs, modify the error_log directive by adding the json parameter.

error_log /var/log/nginx/error.json error json;

Here's an example of an error log entry formatted in JSON:

{
  "level": "error",
  "timestamp": "2026-05-04T10:30:15.042+00:00",
  "pid": 12345,
  "tid": 12345,
  "cnum": 3,
  "msg": "connect() failed",
  "client": "192.168.1.10",
  "server": "example.com",
  "request": "GET /api HTTP/1.1",
  "upstream": "http://127.0.0.1:8080/api",
  "errno": 111,
  "errtext": "Connection refused"
}

Custom Error Log Variables

NGINX Plus access logs are commonly used to inspect traffic and troubleshoot infrastructure integrations, but correlating access log entries with related error log events has historically been difficult. In many cases, teams had to rely on timestamp comparisons, which became increasingly unreliable on high-volume systems with many concurrent requests.

NGINX Plus now makes correlation easier with custom error log variables. Using the new error_log_tag directive, operators can add custom context to error log entries, including request identifiers, host information, static strings, and values derived from incoming headers. This makes it easier to connect an error to the request, client, tenant, service, or workflow that produced it. NGINX Plus also adds the $time_iso8601_ms variable, which can be used in access logs to improve timestamp-based correlation with error logs.

The error_log_tag directive accepts text-based NGINX variables and complex expressions, and outputs custom data in both JSON and text-formatted error log entries. In the following example, the directive adds a correlation identifier to each error log entry:

server {
    error_log_tag request_id $request_id;
    error_log_tag x_request_id $http_x_request_id;

    location /api/ {
        proxy_pass http://backend;
    }
}

Enhanced Upstream Latency Metrics

Low-latency application delivery is critical to delivering fast, reliable user experiences. Previously, NGINX Plus reported per-peer upstream latency in milliseconds as a moving average across the full request lifecycle, including the TCP handshake, TLS handshake, request send, and response read. While useful, averages can obscure important details, such as latency spikes, long-tail behavior, or patterns affecting only a subset of requests.

With NGINX Plus 37.0, teams can now view latency histograms for each upstream peer. This provides a more detailed view of upstream performance, making it easier to understand latency distributions, analyze percentiles, identify outliers, and correlate upstream behavior with other NGINX Plus metrics and insights.
Latency histogram data is available through the NGINX Plus API, can be exported to external observability systems using the Prometheus-njs exporter module, and can be viewed directly in the NGINX Plus dashboard for quick, at-a-glance insight, as shown below.

NGINX Plus dashboard showing new upstream latency data

For deeper diagnostics, teams can export latency histogram data to Prometheus and visualize it in Grafana. The example below shows request density by latency bucket alongside p50, p95, and p99 response times, request rates, and upstream errors. By comparing latency distributions with throughput and error rates, operators can more quickly determine whether performance issues are driven by traffic spikes, upstream behavior, or HTTP errors.

The following configuration example shows how to enable upstream latency histograms. To collect histogram data, define a shared memory zone in the upstream block. This enables upstream-level telemetry in NGINX Plus, including the new response_time_hist field.

http {
    upstream my_backend {
        zone my_backend 64k;
        server backend1.example.com:8080;
        server backend2.example.com:8080;
    }

    server {
        listen 443 ssl;

        location / {
            proxy_pass http://my_backend;
        }

        # Expose the NGINX Plus API for metrics retrieval
        location /api/ {
            api write=on;
            allow 127.0.0.1;   # restrict to localhost only
            deny all;
        }
    }
}

To access upstream latency histograms through the NGINX Plus API, use curl to issue the following REST call and pipe the response to jq for easier formatting:

curl -s http://localhost/api/9/http/upstreams/my_backend | jq

HTTP/2 Support for Upstream Connectivity

Previously, NGINX Plus supported upstream HTTP connections using HTTP/1.0 or HTTP/1.1. HTTP/1.1 provided important performance benefits such as keepalive connections, but upstream applications still needed to communicate with NGINX using legacy HTTP protocols. NGINX Plus 37.0 introduces support for proxying and load balancing HTTP/2 traffic directly to upstream servers, allowing teams to extend HTTP/2 deeper into the application delivery path.

Use the following example configuration to enable HTTP/2 connectivity to upstreams:

upstream http_backend {
    server 127.0.0.1:8080;
    keepalive 16;
}

server {
    ...
    location /http/ {
        proxy_pass http://http_backend;
        proxy_http_version 2;
        proxy_set_header Connection "";
        ...
    }
}

Note: The current implementation of HTTP/2 connectivity to upstreams does not support request multiplexing.

Basic Authentication for HTTP CONNECT Forward Proxy

NGINX Plus R36 introduced support for configuring NGINX Plus as a forward proxy using the HTTP CONNECT protocol. In 37.0, we're building on that capability with support for Basic authentication for HTTP CONNECT requests. This gives teams another straightforward option for authenticating clients before allowing them to establish proxy tunnels through NGINX Plus.

Use the following example configuration to enable Basic authentication for the HTTP CONNECT forward proxy:

server {
    listen 3128;
    auth_basic "my proxy";
    auth_basic_user_file conf/htpasswd;
    tunnel_pass;
}

Long-term Support (LTS)

Starting with NGINX Plus 37.0, select releases will be certified as Long-Term Support (LTS) releases. For more details, read about the LTS program and what it means for support policies across LTS and non-LTS releases. The introduction of LTS releases also changes how NGINX Plus is installed, versioned, and updated. Customers should follow the NGINX Plus installation guide carefully and note the updated versioning scheme.
Going forward, non-LTS releases will no longer increment the main NGINX Plus release number. For example, the non-LTS release following 37.0 will be versioned as 37.1, while the next LTS release will be versioned as R38.0. Module and package versions will follow the same scheme. For example, NGINX Plus R36 used a package version such as nginx-plus 36-1~noble; starting with 37.0, the package version format changes to nginx-plus 37.0.0-1~noble.

NGINX Plus package repository paths are also changing. To pin repositories to the 37.0 release, use /plus/R37.0/... in the repository URI instead of the previous /plus/R36/... format. In some contexts, releases may be referenced by their full version names. For example, the official designator for the 37.0 release is PLS.37.0.0.1, with the final number representing the initial package release. More information about the meaning of the version number can be found in the LTS program blog post.

Important: If you intend to stay on the LTS upgrade path, we recommend pinning to the /plus/LTS URI. This helps ensure that future upgrades remain on LTS releases and do not unintentionally move your deployment to a non-LTS release.

Additional Enhancements Available in NGINX Plus 37.0

NGINX Plus 37.0 is based on the NGINX 1.29.8 mainline release and inherits all functional changes, features, and bug fixes made since NGINX Plus R36 was released (which was based on the 1.29.3 mainline release). For the full list of new changes, features, bug fixes, and workarounds inherited from recent releases, see the NGINX changelog.

Changes to Platform Support

Added Platforms

Support for the following platforms has been added:

- Alpine Linux 3.23
- FreeBSD 15
- Ubuntu 26.04

Note: Ubuntu is ending support for older variants of amd64 HW on Ubuntu 26.04. See more: https://documentation.ubuntu.com/project/how-ubuntu-is-made/concepts/supported-architectures/#architecture-variants
NGINX Plus will be supported on amd64v3. Support on older hardware variants can be achieved by running older Ubuntu LTS releases (e.g. Ubuntu 24.04 LTS).

Removed Platforms

Support for the following platforms has been removed:

- Alpine Linux 3.20 – Reached End of Support on April 1st, 2026

Deprecated Platforms

Support for the following platforms will be removed in a future release:

- Alpine Linux 3.21
- Amazon Linux 2
- FreeBSD 13

F5 NGINX in F5's Application Delivery & Security Platform

NGINX One is part of F5's Application Delivery & Security Platform. It helps organizations deliver, improve, and secure new applications and APIs. This platform is a unified solution designed to ensure reliable performance, robust security, and seamless scalability for applications deployed across cloud, hybrid, and edge architectures. NGINX One is the all-in-one, subscription-based package that unifies all of NGINX's capabilities. NGINX One brings together the features of NGINX Plus, F5 NGINX App Protect, and NGINX Kubernetes and management solutions into a single, easy-to-consume package. NGINX Plus, a key component of NGINX One, adds features to open-source NGINX that are designed for enterprise-grade performance, scalability, and security.

Follow this guide for more information on installing and deploying NGINX Plus 37.0 or NGINX Open Source.

BIG-IP Cloud-Native Network Functions 2.3: What's New in CNF and BNK
Introduction

F5 BIG-IP continues to advance BIG-IP Next for Kubernetes (BNK) and Cloud-Native Network Functions (CNFs) to meet the growing demands of service providers and modern application environments. F5 provides the full stack required to make cloud-native networking work in a service provider environment. CNFs alone are not enough; you need functions, control, infrastructure, and observability, working together as one system.

What is new in BIG-IP Cloud-Native Edition 2.3 for BNK and CNF?

Release 2.3 adds MPLS provider edge support (early access), native UDP/TCP load balancing, DPU-accelerated data plane offload on NVIDIA BlueField, and subscriber-aware Policy Enforcement Manager (PEM) with Gx interface integration. It also introduces VRF-aware AFM policies, GSLB with Sync Groups for multi-region deployments, BBRv2 congestion control, and crash diagnostics that operate without host-level Kubernetes access. This release targets service providers and telecom operators who need cloud-native networking without sacrificing the protocol support and policy control of traditional infrastructure.

Cloud-Native Network Functions (CNFs): What Changed in 2.3?

In release 2.3, CNF capabilities focus on strengthening the underlying network functions required for service provider deployments.

How does BIG-IP CNF 2.3 handle crash diagnostics in restricted Kubernetes clusters?

Operating CNFs in production environments requires strong observability, even in restricted clusters. Release 2.3 introduces improvements to the crash agent that allow core files to be collected directly from pods without requiring host-level access. This enables deployments in more secure Kubernetes environments and simplifies troubleshooting when issues occur.

How does BIG-IP CNF 2.3 operate in multi-tenant and multi-VRF environments?

Multi-tenant environments demand precise control over traffic behavior. In release 2.3, Advanced Firewall Manager (AFM) introduces VRF-aware ACL and NAT policies, allowing operators to apply firewall and translation rules within specific routing contexts. This enables better segmentation and supports overlapping address spaces while maintaining consistent policy enforcement. It aligns CNFs more closely with how service provider networks are designed and operated.

Can BIG-IP CNF 2.3 operate at the edge?

One of the most significant additions to this release is MPLS support within CNFs. This is currently an early-access feature and is expected to reach general availability in a future release. CNFs can now operate as provider edge nodes, supporting label-based forwarding and applying policies based on MPLS labels. This allows service providers to extend existing MPLS architectures into Kubernetes environments without requiring major redesigns. It also provides a path for replacing legacy systems with cloud-native alternatives while maintaining familiar networking constructs.

UDP and TCP Application Load Balancing

Release 2.3 introduces UDP and TCP application load balancing, expanding support beyond HTTP-based traffic management. This capability enables CNFs to handle a broader range of applications and telco protocols, including workloads that rely on Layer 4 traffic patterns. Traffic can be balanced across services both inside and outside the Kubernetes cluster, which is critical for hybrid deployments and incremental modernization efforts. This enhancement is especially important for service providers and large enterprises that operate in mixed environments.
These capabilities allow existing applications to continue functioning while new cloud-native components are introduced, without requiring immediate architectural changes.

Subscriber-Aware Policy Enforcement

Subscriber awareness remains a core requirement for service providers. Release 2.3 enhances Policy Enforcement Manager (PEM) with Gx interface integration, enabling real-time policy enforcement based on subscriber data. This allows traffic to be classified and controlled dynamically, supporting use cases such as QoS enforcement, traffic shaping, and content filtering. It also enables compliance with regulatory requirements and opens new opportunities for service differentiation.

Improved Observability and Aggregated Insights

As CNFs scale, visibility becomes more complex. Earlier approaches relied on per-pod metrics, which made it difficult to build a unified view of the system. Release 2.3 enhances PEM by introducing aggregation through TODA, allowing statistics and session data to be collected and presented as a single entity. Enhancements to MRFDB and PEM reporting further improve visibility into subscriber sessions and traffic behavior, giving operators a more complete and centralized view of network activity.

Building on this foundation, release 2.3 expands PEM capabilities with subscriber-aware policy enforcement. By integrating external policy systems and classification services, CNFs can now correlate traffic with subscriber identity and apply policies dynamically. This provides deeper insight into how individual subscribers and applications behave on the network, enabling more precise control and improved operational awareness.

Additional DNS visibility enhancements, such as adding dig support to netkvest, further strengthen troubleshooting capabilities. By enabling more detailed DNS query inspection and response analysis, operators can quickly diagnose resolution issues and better understand traffic patterns tied to application behavior.

Together, these enhancements move CNFs beyond basic monitoring. They provide a richer, more contextual understanding of traffic, subscribers, and services, which simplifies operations and enables faster troubleshooting in large-scale environments.

DNS and Traffic Behavior Enhancements

Release 2.3 includes improvements that address real-world network behavior, particularly in how DNS and transport protocols operate at scale. One example is the handling of DNS requests during certain scenarios. Instead of silently dropping traffic, CNFs can now return NXDOMAIN responses, preventing upstream systems from interpreting the lack of response as a service failure. This improves reliability and ensures better interoperability with external DNS resolvers in distributed environments.

In addition, support for BBRv2 congestion control improves TCP performance in challenging conditions. It provides better fairness across flows and adapts more effectively to latency and packet loss, improving overall user experience in mobile and distributed networks.

Extending to Multi-Region Traffic Management

Release 2.3 continues to expand DNS capabilities with early access support for Global Server Load Balancing, enabling traffic distribution across multiple locations such as data centers and cloud environments. This represents an important step toward multi-region and hybrid architectures, where applications are no longer tied to a single cluster or deployment location.
Building on this, the introduction of GSLB Sync Groups improves how configurations are managed across distributed deployments. Within a sync group, one instance is designated as the sync agent and is responsible for propagating configuration changes to other members. This approach ensures consistency across environments while preventing conflicting updates and reducing the risk of synchronization issues.

Release 2.3 also begins to introduce more intelligent traffic steering with topology-based load balancing. This capability allows traffic to be directed based on user-defined parameters such as location or network proximity. As a result, operators can optimize application delivery by sending users to the most appropriate endpoint, improving latency and overall service quality.

Together, these enhancements move CNFs closer to providing a fully cloud-native, globally distributed traffic management solution that aligns with modern application deployment patterns.

BNK on NVIDIA BlueField DPU: What Performance Does Hardware Offload Deliver?

As Kubernetes environments scale, the limitations of CPU-based packet processing become more visible. Networking workloads compete directly with applications for resources, which can impact both performance and cost. Release 2.3 continues the expansion of BNK on NVIDIA BlueField DPUs, allowing key data plane functions to be offloaded from the host CPU. This change improves throughput and reduces latency while freeing compute resources for applications that generate business value.

The benefit is not just raw performance. It also brings predictability. With networking and security processing handled on the DPU, operators can achieve more consistent performance across distributed environments. This is especially important for AI infrastructure and high-throughput telco deployments, where even small inefficiencies can scale into significant costs.

From an operational perspective, this also simplifies infrastructure design. Separating application workloads from networking functions reduces contention and allows for more efficient scaling strategies. CNFs begin to behave less like shared software components and more like purpose-built networking systems, while still retaining the flexibility of Kubernetes.

BNK for Telco and Modern Applications

Modern environments rarely consist of purely cloud-native applications. Most organizations are running a mix of legacy protocols, telco workloads, and newer microservices. Release 2.3 addresses this reality directly.

Can BNK 2.3 load balance non-HTTP protocols?

One of the most important additions to this release is TCP and UDP load balancing. This extends BNK beyond HTTP-based traffic management and enables support for telco protocols and other non-HTTP workloads. It also allows traffic to be balanced both inside and outside the Kubernetes cluster, which is critical for hybrid architectures and phased migrations.

This capability reflects a broader shift in BNK. It is no longer just an ingress layer. It is evolving into a unified traffic management platform that can handle diverse protocols and application types without forcing architectural changes. For service providers, this means they can modernize incrementally. Existing applications can continue to operate while new components are introduced in Kubernetes. For enterprise environments, it provides a consistent way to manage traffic across distributed services without introducing additional tools or complexity.
Frequently asked questions

These questions represent the most common queries architects and operators ask when evaluating BIG-IP Cloud-Native Edition 2.3.

Q: What is new in BIG-IP Cloud-Native Edition 2.3?
A: BIG-IP CNF 2.3 adds MPLS provider edge support (early access), native UDP/TCP load balancing, DPU-accelerated data plane offload on NVIDIA BlueField, subscriber-aware PEM with Gx integration, VRF-aware AFM policies, GSLB with Sync Groups, congestion control, and crash diagnostics that operate without host-level Kubernetes access.

Q: What CPU savings does BNK on NVIDIA BlueField DPU deliver?
A: Validated testing (Tolly Report #226104, February 2026) showed approximately 80% host CPU reduction, 40% more output tokens per second versus HAProxy on Llama 70B, and 61% faster time to first token (TTFT). These results reflect BNK offloading data plane processing from the host CPU to the BlueField DPU, freeing the host compute for application workloads.

Q: Does BIG-IP CNF 2.3 support protocols beyond HTTP and HTTPS?
A: Yes. Release 2.3 adds native UDP and TCP load balancing to both CNFs and BNK, extending traffic management beyond HTTP. This supports telco protocols such as GTP-U, Diameter, and RADIUS, with the ability to balance traffic across services inside and outside the Kubernetes cluster.

Enhancing AI Data Pipelines with BIG-IP v21: Discover S3 Integration
F5 BIG-IP v21 revolutionizes AI data pipelines with advanced support for S3-compatible object storage, enabling enterprises to optimize, secure, and scale AI and analytics workflows seamlessly. By introducing S3-tuned traffic profiles, intelligent load balancing, and robust health monitoring, BIG-IP ensures predictable performance, resiliency, and protection against protocol-specific threats. This transformative delivery layer empowers businesses to handle complex workloads efficiently, making AI-driven innovation faster, smoother, and more reliable than ever.
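Although this overview stays at a high level, the consumption pattern it describes is familiar: clients speak the S3 API to a single virtual server address on BIG-IP, which then distributes requests across the object storage nodes behind it. The rough sketch below uses the standard AWS CLI; the endpoint URL, bucket name, and credential profile are illustrative assumptions rather than values from the article.

# Point any S3-compatible client at the BIG-IP virtual server fronting the
# object store (endpoint, bucket, and profile below are placeholders).
aws s3 ls --endpoint-url https://s3-vip.example.internal --profile object-store

# Upload a training data shard through the same virtual server
aws s3 cp ./shard-0001.parquet s3://ai-training-data/shard-0001.parquet \
  --endpoint-url https://s3-vip.example.internal --profile object-store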
Use SFTP and FTP to Join Critical IT Systems to Modern Object Storage with F5 BIG-IP and MinIO AIStor
Around the world, many critical IT systems require moving data repeatedly but pre-date the rise of object storage solutions. These newer solutions largely harness the S3-compliant API. IT applications at risk of being left behind frequently use well-established file management protocols including FTP and SFTP. The cost and talent required to retrofit are daunting, and attempts to integrate these apps into the modern, low-cost world of object storage may not be palatable. Until now, external gateway appliances might have been one strategy. However, this adds hardware costs, latency, and failure points. Separate authentication systems for SFTP and S3 create fragmented security.

The solution described in this article joins traditional clients to MinIO's AIStor, which provides native FTP and SFTP control planes and not just S3 object access. Traffic robustness is accentuated by F5 BIG-IP, which allows loose coupling between IT client systems and the back-end MinIO storage nodes.

File Management Protocols – Not Going Anywhere

The File Transfer Protocol (FTP) was first codified in RFC 114 in April of 1971, and it's still very much in use today. As security awareness in the industry rose, the TLS-based companion protocol File Transfer Protocol Secure (FTPS) gained prominence. Both continue to be used today; one contentious issue is the use of multiple TCP ports during sessions, as well as the required discipline to maintain valid X.509 certificates for authentication in FTPS conversations. Meanwhile, the Secure Shell File Transfer Protocol (SFTP) arose, and benefits from being a simpler, single TCP port solution with authentication frequently relying on easier, pre-created key exchanges. One essential item to keep in mind from the start: SFTP transfers its data over Secure Shell (SSH) version 2, making it distinct from TLS-carried protocols such as HTTPS, SMTPS, DNS over TLS (DoT) and the aforementioned FTPS.

To support the vast investment in these traditional file moving protocols, MinIO has developed a server-side offering for them. When traditional BIG-IP load balancing is introduced, such as in this KB article and companion how-to video that discusses load balancing SFTP, we achieve the desirable decoupling of clients from individual AIStor nodes. By interacting with a BIG-IP virtual server, traffic can be load balanced and the failure or taking off-line of one node will not stop the upload or download of files. If one MinIO node becomes a hot spot of activity, a new load can proportionally task other less-utilized nodes.

Lab Validation with BIG-IP and AIStor

The following diagram depicts the environment used for investigating this union of traditional file transfer protocols and modern object storage.

Of the possible legacy file management protocols, why was SFTP double-clicked upon? A number of reasons, including the fact that SFTP is downright young compared to FTP, with an IETF specification dating back to only 1997. More importantly, although numbers may be hard to come by, all indications are that SFTP usage will remain steady and vital for years to come. The principal reasons for SFTP to be used in IT to this day include:

- Compliance requirements: SFTP is essential for meeting regulatory frameworks like GDPR and HIPAA, in conjunction with providing a reliable audit trail.
- SFTP is heavily used for automated, scheduled batch workflows, including importing/exporting of data to partners in B2B data exchanges.
- The growth of big data has pushed the value added by external Extract, Transform, Load (ETL) vendors, with nightly data movements often being SFTP-based.
- The lack of firewall complexity, with a single well-known TCP port, such as port 22, often being the only "allow" rule required.

The ETL space in particular is significant, with some estimates placing the dollar value around this technology at over US $10 billion in 2026, with a doubling predicted by 2031.

Configure AIStor and BIG-IP for SFTP Traffic

An existing AIStor node cluster is easily adjusted to support protocols such as SFTP, FTP, and FTPS. Generally, AIStor nodes are automatically started with Linux's systemctl to run the MinIO offering at each startup. For quick lab testing, though, one may simply start AIStor interactively from the command line. In the case of adding SFTP support, we merely add the highlighted flags to the startup.

#minio server /data/disk1/minio --console-address ":9001" --sftp="address=:8022" --sftp="ssh-private-key=./ca_user_key" --sftp="trusted-user-ca-key=./ca_user_key.pub"

The initial command portions are standard fare; in this simple lab case of single-drive nodes, we point to the disk at /data/disk1/minio and, per common practice, run the AIStor GUI on TCP port 9001. By default, S3 API calls will utilize port 9000. The SFTP additions, presented in yellow above, tell AIStor to accept SFTP control plane commands, things like "get", "put", "ls" and "cd", on TCP port 8022.

The only new ground for some may be the SSH key referenced; however, MinIO has documented an easy-to-follow guide on creating these towards the latter part of this linked page in the standard documentation. My first thought would be the unpleasant possibility of an administrative workload here; frequently, SSH key-based authentication means loading each potential user's public key into an "authorized_keys" file on each server node. In reality, the delivered solution is more elegant and much simpler to maintain.

Three keys will be created:

1. Public key file for the trusted certificate authority (you create this certificate authority with one single run of #ssh-keygen; a minimal example of this step follows below).
2. Public key file for the AIStor server, minted and signed by the trusted certificate authority.
3. Public key file for the user, minted and signed by the trusted certificate authority for the client connecting by SFTP and located in the user's .ssh folder (or equivalent for their operating system).

In my lab setup, which uses 2 AIStor nodes to allow for load balancing, I started by creating a user in the AIStor GUI. The user was simply named "miniouser123". As such, signing the miniouser123.pub key for step 3 would look like the following:

ssh-keygen -s ~/.ssh/ca_user_key -I miniouser123 -n miniouser123 -V +90d -z 1 miniouser123.pub

The net result is a CA-signed public key, or in other words, an SSH certificate, that allows AIStor nodes to trust the miniouser123 public key when provided upon SFTP connection. The -V flag indicates the public key will be trusted for 90 days and the -z option sets the serial number to 1. This signing of the user's public key has a series of security benefits, such as (i) the enforcement of an expiration timeframe, (ii) the ability to enact a KRL (Key Revocation List, analogous to the use of a CRL with X.509 certificates) and finally (iii) the fact that principals, including the username, can be embedded in the public key.
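For completeness, here is a minimal sketch of the certificate authority step referenced in item 1 above, plus a quick way to verify the signed user key. The file names follow the article's convention (ca_user_key, miniouser123); the key type and comment are assumptions, and OpenSSH writes the signed certificate as miniouser123-cert.pub alongside the public key.

# One-time step: create the user CA key pair (produces ca_user_key and ca_user_key.pub)
ssh-keygen -t ed25519 -f ca_user_key -C "AIStor SFTP user CA"

# After signing miniouser123.pub as shown above, inspect the resulting certificate,
# including its validity window, serial number, and embedded principal
ssh-keygen -L -f miniouser123-cert.pub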
Once a lab, including integration with BIG-IP, is completed, it is likely better to move from invoking AIStor from the command line (e.g., #minio server /data/disk1 plus your flags) to an automatic startup with Linux systemctl options. In this case, the approach is to embed the flags specifically needed for file management protocols like SFTP or FTP into the /etc/default/minio file. Here is a sample for a two-node (10.150.91.190 and .191), single-drive lab setup:

MINIO_VOLUMES="http://10.150.91.{190...191}:9000/data/disk1/minio"
MINIO_LICENSE="/opt/minio/minio.license"

## Use if you want to run MinIO on a custom port.
## add --address and --console-address to MINIO_OPTS:
# MINIO_OPTS="--address :9000 --console-address :9001 [OTHER_PARAMS]"

MINIO_OPTS=' --sftp="address=:8022" --sftp="ssh-private-key=/sshkeys/ca_user_key" --sftp="trusted-user-ca-key=/sshkeys/ca_user_key.pub" '

Now, to ensure startup with every reboot and to also start right now, we simply issue the two commands:

#systemctl enable minio
#systemctl start minio

BIG-IP SFTP Load Balancing Setup

Following the guidance of the F5 KB articles referenced earlier, the first step would be to create an SFTP health monitor. In production, the more advanced monitor, which aims to successfully connect to each AIStor node with SFTP commands every 15 seconds, might be best practice. In a lab setup, the monitor that establishes a half-open TCP connection on the desired TCP port 8022 is sufficient.

We now simply add our AIStor cluster members, in our case on port 8022 for SFTP. Concurrently, the BIG-IP can support other protocols including FTP and, of course, S3 access too. From the BIG-IP GUI, simply select Local Traffic -> Pools -> Pool List and the "Create" button. The only settings are to tie the pool to your SFTP monitor and select the pool AIStor members, as shown in the next image. Note the load balancing default method will be "Least Connections" to even out individual SFTP active loads on each AIStor node. We will see in the virtual server setup that good practice is normally to allow persistence based upon source IP addresses. As such, when new transactions arrive from a previously serviced client, the solution will prefer to engage the same storage node, if healthy.

The virtual server setup for SFTP is largely just like a web-oriented virtual server, although we would not gain the same insights from using a "standard" mode virtual server and prefer to use a "performance" mode instance. This is due to the fact that web technologies over TLS, like HTTPS browsing or S3-compatible API commands which harness HTTPS, allow for TLS interception at the proxy. This opens up use cases like iRules HTTP header rewrites or content scanning, to name just two. Since SFTP uses SSH rather than TLS for encryption, the produced traffic is not aligned with in-flight interception for decryption and re-encryption.

The first key benefits of BIG-IP will be in hot spot avoidance, where a busy AIStor node can be shielded by spreading traffic to less busy nodes, and the ability to loosely couple clients to the service. This is to say, IT systems using SFTP (or FTP/FTPS) can be configured to use the virtual server IP or FQDN as an endpoint, and an AIStor node may be taken offline, such as during maintenance windows, completely unbeknownst to clients. Other significant benefits of BIG-IP lie with performance.
The key settings for the virtual server are its type, “Performance (Layer 4)”, along with the virtual server IP address and TCP port. The Protocol Profile has been set to “fastL4”, one of F5’s most performant profiles. The following KB article details the characteristics of the fastL4 profile, all generally steered towards peak data delivery rates. One of the principal features applies to BIG-IP hardware platforms that contain the ePVA chip: the system makes flow acceleration decisions in software and then offloads eligible flows to the ePVA chip for acceleration. For platforms that do not contain ePVA chips, the system performs acceleration actions in software. Finally, we request client source IP address persistence. A given client’s traffic will be directed to the same backend node if it has been active in the past. If the node is out of service, due to a fault or perhaps maintenance for upgrades, another node will be used. The first time a client is seen, the pool’s load balancing algorithm will come into play; in this case “Least Connections” will guide the initial node selection. Lab Testing of SFTP Load Balancing to AIStor Storage Servers Popular operating systems like Ubuntu or Windows 11 offer an sftp client directly from the command line. Alternatives include simple applications like WinSCP (Windows), CyberDuck (Mac/Windows) and FileZilla (cross platform). Of course, in enterprise networks, the key driver for SFTP support will be existing IT systems that use SFTP through automation to move files, completely removed from human involvement. Using Ubuntu, a test of the AIStor SFTP solution through BIG-IP, including interactive perusal of the objects, was conducted:

#sftp -i ./miniouser123 -oPort=8022 miniouser123@10.150.92.189

Although in S3 parlance the AIStor system is made up of buckets and objects, buckets will appear as the traditional and very familiar “folder” to interactive SFTP users, and objects are seen as files to be retrieved or uploaded. Nothing really changes; familiar commands like ls, cd and get are fully supported. Here is an example of a simple login and retrieve sequence. Notice how a password-based login is not required since our CA-signed public key is provided by the user. Easy stuff for us humans.

# sftp -i ./miniouser123 -oPort=8022 miniouser123@10.150.92.189
Connected to 10.150.92.189.
sftp> ls
bucket001
sftp> cd bucket001
sftp> ls
file001.txt   file002.txt   file003.txt   file004.txt   fileap15.txt
sftp> get file001.txt
Fetching /bucket001/file001.txt to file001.txt
/bucket001/file001.txt                       100%  299KB   5.5MB/s   00:00
sftp>

The BIG-IP pool statistics demonstrate that, upon first connecting to the cluster with SFTP, the client instantiates a backend TCP connection to one of the AIStor pool members; a second “current” connection reflects that another client is also active. The small amount of traffic reflects low-bit-rate, background keepalive-type exchanges. Upon retrieving the approximately 300 kilobyte file, an e-book, the counters are updated as expected. The outbound traffic, from the perspective of the AIStor node, is noted to be 2.4 million bits, or, dividing by eight, 300 kilobytes. We never said there would be no math. To force the BIG-IP to seamlessly switch usage from the currently active back-end node to the AIStor .191 node, we can use the “Force Offline” feature.
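If you are driving the lab from the command line rather than the GUI, the equivalent of Force Offline can be done with tmsh; the pool and member names below follow the earlier sketch and are otherwise my own naming.

# Force the .190 member offline (the GUI's "Force Offline" state)
tmsh modify ltm pool aistor_sftp_pool members modify \
    { 10.150.91.190:8022 { state user-down session user-disabled } }

# Return the member to service once the test is complete
tmsh modify ltm pool aistor_sftp_pool members modify \
    { 10.150.91.190:8022 { state user-up session user-enabled } }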
In highly consumptive TCP-based protocols, such as web browser traffic, where a single page display might drive 8 to 12 short-lived TCP connections to a given origin server, the Force Offline feature will allow established connections to finish but will preclude new connections being set up to the node. In the case of SFTP, where an interactive, human-driven session may see one connection stay up for hours or days until closed, even the forced-offline node will continue to fully service those established connections. To expedite our lab test, we can simply close our active SFTP client sessions and then reengage with the BIG-IP SFTP virtual server. We note that the BIG-IP has switched our SFTP client to the other AIStor node. Downloading the 300 kilobyte e-book file, we see the counters agree with the first test run; the load balancer has simply ensured we are serviced by the in-service AIStor node. Summary IT infrastructure and the protocols these solutions use do not arise overnight; many critical systems continue to use file management protocols like FTP, SFTP and FTPS that have permeated networking for decades. The ability to retroactively adjust applications to use object-first protocols, like S3-compliant API calls, is not always going to be trivial. Outside factors, such as data movement governance, may also lead enterprises to stay with perceived tried-and-true protocols. With MinIO’s introduction of AIStor support for the classic file moving protocols, there is now a path to tie into very large object stores, where the economies of scale of larger, multi-protocol storage clusters merge with highly advanced data robustness features like erasure coding. More data in a more resilient offering makes sense - this helps play a role in solidifying and modernizing your information lifecycle management story. Through BIG-IP, traffic like SFTP was seen to make use of highly performant data delivery, including fastL4 mode. Decoupling SFTP clients from individual storage nodes, by pointing them instead at a BIG-IP virtual server, allows for vigorous health checking of nodes; traffic will get delivered in either direction even when any one node is offline for something as mundane as a routine software upgrade. Through load balancing algorithms like “Least Connections”, the overall load on the MinIO cluster will be optimized to transparently avoid troublesome hot spots.

Automating F5 Application Delivery and Security Platform Deployments
The F5 ADSP Architecture Automation Project The F5 ADSP reduces the complexity of modern applications by integrating operations, traffic management, performance optimization, and security controls into a single platform with multiple deployment options. This series outlines practical steps anyone can take to put these ideas into practice using the F5 ADSP Architectures GitHub repo. Each article highlights different deployment examples, which can be run locally or integrated into CI/CD pipelines following DevSecOps practices. The repository is community-supported and provides reference code that can be used for demos, workshops, or as a stepping stone for your own F5 ADSP deployments. If you find any bugs or have any enhancement requests, open an issue, or better yet, contribute. The F5 Application Delivery and Security Platform (F5 ADSP) The F5 ADSP addresses four core areas: how you operate day to day, how you deploy at scale, how you secure against evolving threats, and how you deliver reliably across environments. Each comes with its own challenges, but together they define the foundation for keeping systems fast, stable, and safe. Each architecture deployment example is designed to cover at least two of the four core areas: xOps, Deployment, Delivery, and Security. This ensures the examples demonstrate how multiple components of the platform work together in practice. DevSecOps: Integrating security into the software delivery lifecycle is a necessary part of building and maintaining secure applications. This project incorporates DevSecOps practices by using supported APIs and tooling, with each use case including a GitHub repository containing IaC code, CI/CD integration examples, and telemetry options. Demo: Use-Case 1: F5 Distributed Cloud WAF and BIG-IP Advanced WAF Resources: F5 Application Delivery and Security Platform GitHub Repo and Automation Guide ADSP Architecture Article Series: Automating F5 Application Delivery and Security Platform Deployments (Intro) F5 Hybrid Security Architectures (Part 1 - F5's Distributed Cloud WAF and BIG-IP Advanced WAF) F5 Hybrid Security Architectures (Part 2 - F5's Distributed Cloud WAF and NGINX App Protect WAF) F5 Hybrid Security Architectures (Part 3 - F5 XC API Protection and NGINX Ingress Controller) F5 Hybrid Security Architectures (Part 4 - F5 XC BOT and DDoS Defense and BIG-IP Advanced WAF) F5 Hybrid Security Architectures (Part 5 - F5 XC, BIG-IP APM, CIS, and NGINX Ingress Controller) Minimizing Security Complexity: Managing Distributed WAF Policies
You Don't Have to Have Played to Understand the Game
Andy Reid barely played football. He was a community college tackle who transferred to BYU and then rode the bench for most of his time in Provo. Teammates remember him as the guy in the film room, not the guy on the field. He spent his Saturdays watching, taking notes, and pestering head coach LaVell Edwards with so many questions about strategy that Edwards eventually told him: Kid, you should coach. That’s the origin story of three Super Bowl wins for my local Kansas City Chiefs, six appearances across the Chiefs and Eagles, and one of the winningest coaches in NFL history. Not a stud player who worked his way down to the sideline, but a guy who asked a lot of questions, kept asking them, and turned that into a career. Richard Williams had never picked up a tennis racket in his life when he saw Virginia Ruzici win a tournament on TV in 1978 and decided his daughters were going to be world champions. He taught himself the sport from books and instructional videos and then wrote and implemented a 78-page plan for coaching Venus and Serena on the public courts in Compton when they were very young. Thirty Grand Slam singles titles between them later and the “you have to have played at the highest level to coach at the highest level” theory was looking pretty thin. Nobody looks at Reid's three Super Bowl rings and says “yeah, but did you really understand it without playing in the league?” Nobody tells Richard Williams his daughters’ Grand Slam titles have an asterisk because he learned the game from a VHS tape. We accept, in sports, that there’s more than one way to know a thing. Somehow that grace evaporates the second AI enters the conversation. Doing isn't understanding There’s a flavor of pushback on AI use that goes something like: “you have to do it manually first to really understand it.” Sometimes that’s gatekeeping in a wise-elder’s costume. Sometimes it’s a genuine concern. An experienced person who built their intuition the hard way, watching newer folks skip the grind, and worrying (not unreasonably) that the intuition won’t form. But “doing it manually” and “understanding it” aren’t the same thing. They overlap, but they’re not the same thing. You can grind through a problem manually for years and still not understand the system around it. And you can understand a system deeply without having implemented every piece of it yourself, if you’re willing to ask enough questions. The questioning is the work Here’s the part I think people miss when they’re worried about AI making us dumber: A lot of what an expert does for you, when you’re lucky enough to have one, is answer questions patiently. Over and over. Sometimes the same question is phrased three different ways because you didn’t quite get it the first time. Sometimes a dumb question that you’d be embarrassed to ask on a Slack channel. Good mentors don’t get tired of this. But there are very few good mentors. They’re busy, and you only get so many of them in a career. I’ve been at this for thirty years now and I can count the great mentors I’ve had on one hand. LLMs don’t get tired. They don’t sigh. They don’t make you feel stupid for asking why something works the way it does for the fourth time. And the act of formulating the question, asking "what exactly am I confused about?" and "what do I need to know to clear the fog?” That’s a huge chunk of where understanding actually comes from. The model is a sparring partner for your own thinking, if you let it be one. 
Use it as a vending machine and you’ll get exactly that: answers, not understanding. The tragic version of LLM use is the one where someone pastes the problem, takes the answer, ships it, and walks away no smarter than they started. Then does it again the next day. And the next. Building a career out of outputs they couldn’t reproduce or defend if you took the tool away. That's the version the skeptics are right about. It just isn’t the only version. Andy Reid didn’t need to have been a pro-bowl tackle to understand offensive football. He needed to watch carefully, ask the right questions, and think rigorously about what he was seeing. Richard Williams didn’t need to have been on tour. He needed books, tapes, and the willingness to do the homework. Playing at the highest level is one path to understanding. But for systemic thinking, tactical thinking, architectural thinking, it might not always be the best one. Two things I learned this week First: I’m working on a side project where the FastAPI Cloud backend runs as a two-instance replica deployment. I started on SQLite, which worked fine until I realized writes were landing in whichever instance happened to handle the request, leaving me with two file-based databases with immediate data drift. I moved to a serverless Postgres database (Neon) to give both instances a single source of truth, and once I was there, realized I could just point dev and prod at the same data. Yes, in a real production system this is an anti-pattern and I’d never recommend it. But for a small project where I’m iterating fast and the bottleneck is my own understanding of the problem, not having to migrate data back and forth every time I want to test a frontend change or hunt down a bug? Game changer. I got there by talking the tradeoffs through with my good friend Claude. What breaks, when it breaks, what the actual risk surface looks like at my scale. Nobody handed me a "here's when to break the rule" tutorial. I asked questions until I understood the rule well enough to break it on purpose. Second: I'm building an on-box tool for BIG-IP (article coming soon), and I hit the HA problem. How do I keep state synced across boxes? My first instinct was file-based storage on the host, which, it turns out, is exactly where AS3 and SSL Orchestrator started. SSLO went a step further and built a dedicated sync layer called gossip to keep those files coordinated across the cluster. Over time, both products converged on a different approach: data-groups for metadata and iFiles for larger payloads, both of which ride along with standard config sync. That's a much smaller surface area to maintain, and it leans on infrastructure the platform already guarantees. So I'm following the same path: metadata in data-groups, data blobs in iFiles. I figured this out by interrogating Claude about how those products were architected, why they made the choices they did, and what the failure modes were. I could have read the source, and I could have tried to track down the developers and architects (and I should have over dinner to get the inside scoop). But the speed of “ask, get an answer, ask the next question, get an answer” let me sketch the whole design space in an afternoon. That's not skipping the understanding. That’s building it. Get off whose lawn? I get the resistance. Some of it is "get off my lawn." Some of it is genuine expertise feeling devalued. Some of it is real fear about what this technology means for the people who come up behind us. 
None of those concerns are stupid. The people who built their understanding the hard way, by tinkering, by breaking things, by reading source code under duress at 2am because there was no other way to get the system back online? They are not wrong about the value of that path. They earned something real in all that trial by fire. Some of them are the best engineers I know. The intuition that comes from years of manual struggle is a kind of literacy that doesn’t have a shortcut, and the people who have it are the ones I most want in the room when something goes sideways. But I’d push back on the specific claim that you must do every step manually to understand the thing. You don’t. Engage with it seriously. Ask real questions and chase the answers until they hold together. Be willing to be wrong. Notice when you’re wrong, and update accordingly. Used well, an LLM doesn’t dull that loop. It tightens it. The design decision, the tradeoff, the bet, I’m getting to that part of the problem sooner than I would have otherwise. Reid had Edwards. Williams had the library. The skeptics aren’t wrong that some understanding only comes from doing. They’re wrong that this is one of them.

F5 Distributed Cloud – Why You Should Never Block Regional Edge IPs on Your Firewall
Introduction A common mistake when onboarding a public-facing application onto F5 Distributed Cloud (XC) is to restrict which source IP addresses can reach the origin server. Network and security teams, following a traditional “deny all / allow what you need” approach, sometimes allow only a handful of F5 XC Regional Edge IPs through their firewall — or worse, block RE IPs entirely because they see unfamiliar traffic hitting the origin from IP ranges they don’t recognize. This article explains why this is fundamentally incompatible with how F5 Distributed Cloud works, and what the consequences are. Understanding Distributed Architecture When you expose an application through F5 Distributed Cloud, the platform advertises your application’s FQDN via an Anycast IP address across all Regional Edges worldwide. As of the latest updates, this means your application is reachable through multiple REs across the Americas, Europe, and Asia-Pacific. Each RE acts as an independent proxy and point of presence. End users are routed to the closest RE based on BGP peering and network proximity. This is the core of F5 XC’s distributed model — there is no single centralized proxy. How Health Checks Work: Each RE Monitors Independently This is the critical point that is often misunderstood. When you configure a Health Check and an Origin Pool with your application’s public IP, every Regional Edge independently performs its own health check against your origin server. Each RE uses its own local internet breakout to reach your application — health check traffic does not traverse the F5 Global Network. This means: If you have an origin server with a public IP, and your Origin Pool is configured with “Public IP” (the default), then all REs will send health-check probes to your origin. Each RE maintains its own independent view of your origin’s health status. On the F5 XC console, you will see the same origin IP listed multiple times — once per RE — each with its own health status. The source IPs of these health checks come from the RE subnet ranges published in the official F5 documentation: F5 Distributed Cloud IP Address and Domain Reference. What Happens When You Block Some RE IPs Suppose you allow only a few RE IP ranges (for example, only European REs) but block the rest. Here is what happens: REs whose IPs are allowed will successfully complete health checks, and your origin will appear as UP from those locations. REs whose IPs are blocked will see health check failures, and your origin will be marked as DOWN from those locations. The immediate and most visible consequence is on the F5 XC console itself. Because a majority of REs report the origin as DOWN, the console will display a degraded application health status — showing poor availability and performance metrics. This gives a misleading picture of your application’s actual state: your origin is perfectly healthy, but the console reflects a largely unhealthy deployment simply because most REs cannot reach it through the firewall. This can trigger unnecessary troubleshooting, false alerts, and erode confidence in the platform’s monitoring data. Now, when an end user connects through a blocked RE (for example, a user in Asia hitting a Singapore RE), the platform behavior depends on the Endpoint Selection policy configured in your Origin Pool:

Endpoint Selection Policy | Behavior When Local RE Shows Origin as DOWN
Local Endpoints Only | Traffic is dropped. The user gets an error. No fallback.
Local Endpoints Preferred (default) | Traffic is forwarded via the F5 Global Network to an RE that has the origin marked as UP. This adds some latency.
All Endpoints | Same as Local Preferred — traffic is rerouted to a healthy RE over the Global Network. This can add major latency if the responding RE is far away from the origin.

In the Local Endpoints Only case, users connecting through blocked REs will experience a complete outage for your application — even though the origin is healthy and reachable. In the Local Preferred or All Endpoints cases, the platform will attempt to reroute traffic through the F5 Global Network to an RE that has a healthy view of the origin. While the application will still be reachable, this introduces several problems: Increased latency: Traffic must travel from the ingress RE to a remote egress RE over the internal F5 XC fabric before reaching your origin, instead of egressing locally to the internet. Suboptimal routing: A user in Tokyo may end up having their traffic routed through Paris because only European REs can reach the origin — defeating the purpose of a globally distributed edge. Reduced resilience: You’ve effectively reduced the number of egress points that can serve traffic, creating bottlenecks and potential single points of failure. The Correct Default Approach: Allowlist All RE IP Ranges The F5 official documentation is clear on this point: you should allowlist all F5 Distributed Cloud RE subnet ranges on your origin firewall. The published IP ranges are organized by region (Americas, Europe, Asia) and are available on the official F5 Distributed Cloud documentation page. Ideally, your origin firewall should be configured to only allow the F5 Distributed Cloud subnets for your application’s listening port. This ensures that: All RE health checks succeed, giving the platform an accurate and complete view of your origin’s health. Traffic egresses locally from the closest RE, providing the lowest latency path to your users. Only traffic routed through F5 XC can reach your origin, preventing attackers from bypassing the F5 XC security stack (WAAP, DDoS, Bot Protection, etc.) by hitting the origin directly. What If You Want to Limit Which REs Perform Health Checks? If you have a legitimate reason to reduce the number of REs performing health checks (for example, to reduce health check traffic on the origin or because your application is regionally scoped), F5 XC provides a built-in mechanism for this. Instead of using “Public IP” in the Origin Pool member configuration, select “IP Address of Origin Server on Given Sites” and then assign a Virtual Site that includes only the REs you want. For example, you could create a Virtual Site that includes only European REs, reducing your health check sources from all worldwide REs down to just the ones in that region. Conclusion F5 Distributed Cloud is architected as a fully distributed system. Health monitoring is not performed from a central location — it is performed independently by every Regional Edge. This design is what enables the platform to provide low-latency, resilient application delivery worldwide. Blocking RE IPs on your origin firewall fundamentally breaks this distributed health monitoring model. It causes health checks to fail, triggers suboptimal traffic routing, and potentially increases latency.
The correct and recommended approach is to allowlist all F5 Distributed Cloud RE IP ranges on your origin firewall, and use the platform’s built-in Virtual Site mechanism if you need to control which REs perform health checks.
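To make that recommendation concrete, here is a minimal sketch of what such an allowlist could look like on a Linux origin host running nftables. This assumes nftables is in use and the application listens on TCP 443; critically, the two subnets shown are documentation placeholders only, not real F5 RE ranges, and should be replaced with the ranges published in the F5 Distributed Cloud IP Address and Domain Reference mentioned above.

table inet origin_filter {
    # Placeholder ranges (RFC 5737 documentation space). Replace with the RE
    # subnets from the F5 Distributed Cloud IP Address and Domain Reference.
    set f5_xc_re_ranges {
        type ipv4_addr
        flags interval
        elements = { 192.0.2.0/24, 198.51.100.0/24 }
    }
    chain inbound {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        iif "lo" accept
        # Only F5 XC Regional Edges may reach the application port, so all
        # client traffic must pass through the XC security stack first.
        ip saddr @f5_xc_re_ranges tcp dport 443 accept
        # (Management access, ICMP, and IPv6 ranges would be handled separately.)
    }
}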
Single-click CDN Experience for F5 Distributed Cloud Load Balancers

Fundamentals The modern CDN has evolved well beyond cache and serve. Today’s platforms are intelligent edge fabrics that combine performance optimization, layered security, multicloud routing, and even workload execution at the edge. Few products embody this evolution more completely than F5 Distributed Cloud CDN, and this post explores both why CDNs matter and what sets F5’s newest approach apart. At its core, a CDN is a globally distributed system of edge servers, called PoPs or Regional Edges (RE), that cache content and handle user requests on behalf of the origin server. When a user requests a resource, DNS resolution routes them to the nearest PoP. If the resource is cached there (a “cache hit”), it’s returned immediately. If not (a “cache miss”), the PoP fetches it from the origin, stores it, and returns it to the user. The speed improvement isn’t just perceptual. Reduced Round-Trip Time (RTT) correlates directly with business outcomes: every bit of page load time shaved off makes a difference, because search rankings, checkout completion, and ad viewability all improve with lower latency. CDNs don’t just make things faster; they make digital businesses more competitive. To put the difference in concrete terms, here’s how a typical 200KB page might deliver across different scenarios. Platform deep dive Traditional CDNs optimize for one thing: getting cached bytes to users fast. Distributed Cloud CDN starts there but doesn’t stop; it’s engineered as a unified platform where content delivery, application security, multicloud connectivity, and edge compute converge under a single operational surface. F5’s approach is architecturally distinct Most CDNs are standalone services that organizations integrate with separate security tools, load balancers, and observability stacks. The operational overhead of stitching these together and keeping policies consistent across them is substantial. F5 takes a different approach: CDN is one capability within the broader Distributed Cloud Platform, meaning it inherits the platform’s DNS, load balancing, WAF, observability, and multicloud networking services. The practical result, noted by enterprise users, is that WAF rules, DDoS policies, and CDN configurations all live in the same console. There’s no context switching between vendors, no policy drift between your security tool and your delivery tool, and no blind spots at the handoff between them. In the newest product update, anyone already using a Distributed Cloud Load Balancer can enable CDN acceleration with a single click: no rearchitecting, no new deployments. Built-in cacheability insights estimate performance improvement and cost savings before activation, so teams can make informed decisions without guesswork. Target use cases: Where F5 Distributed Cloud CDN fits best There are three primary use-case families for enabling an integrated CDN: Secure apps everywhere (WAAP + CDN): Organizations that need comprehensive web app and API protection with WAF, DDoS, and bot defense, plus unified content delivery under a single policy plane and management console. Modern digital experiences: Dynamic, personalized applications spanning multiple public clouds, edge locations, and on-premises infrastructure that need consistent delivery regardless of where origin workloads live. Multicloud & edge initiatives: Enterprises migrating workloads across cloud providers or deploying edge compute who need a platform that bridges delivery, security, and service mesh without re-platforming for each environment.
Visibility & Control: You can’t optimize what you can’t see F5’s Distributed Cloud Platform ships with unified observability that spans delivery performance and security posture. Real-time dashboards expose traffic patterns, cache efficiency metrics, origin health, and security event timelines, all from the same interface used to configure policies. Cache efficiency isn’t a static attribute either. Distributed Cloud CDN provides granular control over cache keys, TTL values, and path or header-based caching rules, enabling teams to optimize hit rates for specific content types and access patterns. Cacheability insights indicate which web apps are candidates for acceleration. For security operations, the edge generates rich telemetry: request rates, blocked attack types, geographic traffic distribution, and bot classification outcomes. This feeds into the same observability layer as performance data, giving teams a single pane of glass rather than separate dashboards for CDN and security. The recently announced F5 Insight capability extends this further, bringing OpenTelemetry-powered observability across BIG-IP, NGINX, and Distributed Cloud Services, consolidating performance and security intelligence across an organization’s entire F5 footprint into actionable, unified visibility. Demo Walkthrough Final thoughts A CDN is no longer an optimization. It’s table stakes for any organization serving digital experiences to a geographically distributed audience. The question isn’t whether to deploy one, but which platform best aligns with the complexity of your architecture and the ambition of your security posture. For organizations operating at the intersection of multicloud delivery, API-driven applications, and enterprise security requirements, Distributed Cloud CDN represents a compelling architectural choice: a platform that treats performance and security not as separate concerns to be stitched together, but as integrated properties of the same edge fabric. The bytes will always need to get from somewhere to your users. F5 makes that journey faster, safer, and smarter. Additional Resources Product information: https://www.f5.com/products/distributed-cloud-services/cdn Technical documentation: https://docs.cloud.f5.com/docs-v2/content-delivery-network/how-to/cdn-mgmt/conf-cache-lb Feature announcement blog: https://www.f5.com/company/blog/f5-distributed-cloud-cdn-faster-apps-one-click-enablement-lower-costs