One Quick Step to Make your website AI-Agent/MCP Ready with an iRule
The Problem Nobody Warned You About

Here's the thing about the AI agent explosion: GPTBot, ClaudeBot, PerplexityBot, and a dozen other crawlers are hitting your web applications today. And they're getting back the same bloated HTML that your browser gets, complete with navigation bars, cookie banners, SVG icons, inline JavaScript, and CSS that means absolutely nothing to an LLM, other than a hit to your token usage.

These agents don't need your <nav> with 47 links. They don't need your cookie consent modal. They definitely don't need 200+ lines of minified CSS and JS. They need the content: the headings, the paragraphs, the links, the data. If you, or anyone, are using an agent to access and utilize the data on the page, it's burning through a massive number of tokens, generally ~2k per GET.

But what if your BIG-IP could intercept these requests, see that the client is an AI agent, and transform that HTML response into clean markdown before it ever leaves your network? By the way, there is plenty of room for improvement here, and a small disclaimer at the end!

The Approach

The iRule works in three phases across three HTTP events. Here's the flow:

1. Client request => HTTP_REQUEST (detect agent, strip Accept-Encoding)
2. Origin response => HTTP_RESPONSE (check HTML, collect body)
3. Body received => HTTP_RESPONSE_DATA (convert HTML => Markdown, replace body)

The client then receives clean markdown with Content-Type: text/plain.

Detection: Who's an AI Agent?

This example is set up to detect agents three ways, because different agents announce themselves differently, and we want to give humans a way to trigger it too (mostly I used this for testing; notes on that later).
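Before diving into the Tcl, the three detection checks are easy to prototype off-box. Here's a minimal Python model of the same logic (for experimentation only; the user-agent token list mirrors the iRule below, and nothing here runs on the BIG-IP itself):

```python
# Off-box model of the iRule's three detection methods.
AI_UA_TOKENS = (
    "gptbot", "chatgpt-user", "claudebot", "claude-web", "perplexitybot",
    "cohere-ai", "google-extended", "applebot-extended", "bytespider",
    "ccbot", "amazonbot",
)

def is_ai_agent(headers: dict) -> bool:
    ua = headers.get("User-Agent", "").lower()
    if any(token in ua for token in AI_UA_TOKENS):
        return True  # method 1: known crawler User-Agent
    if headers.get("X-Request-Format") == "markdown":
        return True  # method 2: explicit opt-in header
    if "text/markdown" in headers.get("Accept", ""):
        return True  # method 3: HTTP content negotiation
    return False
```

Feeding it a captured request's headers is a quick way to confirm which of the three methods would fire for a given client.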
    when HTTP_REQUEST {
        set is_ai_agent 0
        set ua [string tolower [HTTP::header "User-Agent"]]

        # The usual suspects
        if { $ua contains "gptbot" || $ua contains "chatgpt-user" ||
             $ua contains "claudebot" || $ua contains "claude-web" ||
             $ua contains "perplexitybot" || $ua contains "cohere-ai" ||
             $ua contains "google-extended" || $ua contains "applebot-extended" ||
             $ua contains "bytespider" || $ua contains "ccbot" ||
             $ua contains "amazonbot" } {
            set is_ai_agent 1
        }

        # Explicit opt-in via header
        if { [HTTP::header "X-Request-Format"] eq "markdown" } {
            set is_ai_agent 1
        }

        # Content negotiation (the standards-correct way)
        if { [HTTP::header "Accept"] contains "text/markdown" } {
            set is_ai_agent 1
        }

Why three methods? User-Agent detection handles the common crawlers automatically. The X-Request-Format header lets any client explicitly request markdown. And Accept: text/markdown is proper HTTP content negotiation, the way it should work once the ecosystem matures.

The Demo Path: /md/ Prefix

I added one more trigger that's purely for demos:

    set orig_uri [HTTP::uri]
    if { $orig_uri starts_with "/md/" } {
        set is_ai_agent 1
        set new_uri [string range $orig_uri 3 end]
        if { $new_uri eq "" } { set new_uri "/" }
        HTTP::uri $new_uri
    }

Visit /md/ in your browser and you get the markdown version of the upstream site. This is great for showing the capability to someone without having to modify your User-Agent string or install curl.

Preventing Compressed Responses

This one bit me during testing, and believe it or not, Kunal Anand is the one who gave me the tip that led to the resolution. If the origin returns gzip-compressed HTML, HTTP::payload gives you binary garbage. The fix:

    if { $is_ai_agent } {
        HTTP::header replace "Accept-Encoding" "identity"
    }

We just need to strip the Accept-Encoding header on the request side so the origin sends us uncompressed HTML.
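If the `[string range $orig_uri 3 end]` arithmetic in the demo-path rewrite looks suspicious, it can be sanity-checked off-box. Here's a Python model of the same rewrite (illustrative only, not the iRule itself):

```python
def rewrite_md_prefix(uri: str) -> str:
    """Model of the iRule's /md/ demo-path handling."""
    if uri.startswith("/md/"):
        # Tcl's [string range $uri 3 end] == Python's uri[3:]:
        # "/md/foo" -> "/foo", and "/md/" -> "/" (the 4th char survives).
        new_uri = uri[3:]
        return new_uri if new_uri else "/"
    if uri == "/md":  # the bare /md case, handled by an elseif in the full iRule
        return "/"
    return uri
```

The index 3 (not 4) is what preserves the leading slash on the rewritten URI.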
And I added a safety net in HTTP_RESPONSE:

    when HTTP_RESPONSE {
        if { $is_ai_agent } {
            if { [HTTP::header "Content-Type"] contains "text/html" } {
                set ce [HTTP::header "Content-Encoding"]
                if { $ce ne "" } {
                    if { $ce ne "identity" } {
                        set is_ai_agent 0
                        HTTP::header insert "X-Markdown-Skipped" "compressed-response"
                        return
                    }
                }
            }
        }
    }

If the upstream ignores our Accept-Encoding override and sends gzip anyway, we bail gracefully instead of serving corrupted content. Defense in depth!

The Conversion: Where the Magic Happens

This is HTTP_RESPONSE_DATA: the body has been collected and we have the raw HTML. Now we convert it to markdown through a series of regex passes.

Phase 1: The Multiline Problem

Tcl's . in regex doesn't match newlines. Every <script>, <style>, and <nav> block in real HTML spans multiple lines. So this won't work:

    # This silently fails on multiline <script> blocks!
    regsub -all -nocase {<script[^>]*>.*?</script>} $html_body "" html_body

The fix, again courtesy of another hint from Kunal: collapse all newlines to a sentinel character before stripping block elements, then restore them after:

    set NL_MARK "\x01"
    set html_body [string map [list "\r\n" $NL_MARK "\r" $NL_MARK "\n" $NL_MARK] $html_body]

    # NOW these work, everything is one "line"
    regsub -all -nocase "<script\[^>\]*>.*?</script>" $html_body "" html_body
    regsub -all -nocase "<style\[^>\]*>.*?</style>" $html_body "" html_body
    regsub -all -nocase "<nav\[^>\]*>.*?</nav>" $html_body "" html_body
    # ... strip footer, header, noscript, svg, comments, forms, cookie banners

    # Restore newlines
    set html_body [string map [list $NL_MARK "\n"] $html_body]

This is the single biggest quality improvement. Without it, you get raw JavaScript and CSS bleeding into your markdown output.

Phase 2: Converting Structure

With the junk stripped and newlines restored, we convert HTML elements to markdown syntax. Here's the key insight that took a few iterations: use [^<]* instead of .*? for tag content.

    # BAD: .*? crosses newlines in Tcl and matches across multiple tags
    regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>(.*?)</a>} ...

    # GOOD: [^<]* stops at the next tag boundary
    regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} ...

This matters when you have two <a> tags on adjacent lines. The .*? version matches from the first <a> opening all the way to the second </a> closing: one giant broken link. The [^<]* version correctly matches each link individually.

Here's the conversion order (it matters):

    # 1. Headings
    regsub -all -nocase {<h2[^>]*>([^<]*)</h2>} $html_body "\n## \\1\n\n" html_body

    # 2. Emphasis BEFORE links (so **bold** inside links works)
    regsub -all -nocase {<strong[^>]*>([^<]*)</strong>} $html_body {**\1**} html_body

    # 3. Links with relative URL resolution
    regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
        $html_body "\[\\2\](https://${http_request_host}\\1)" html_body

    # 4. Tables, code, lists, paragraphs, blockquotes, images...

    # 5. Strip ALL remaining tags
    regsub -all {<[^>]+>} $html_body "" html_body

    # 6. Decode HTML entities
    regsub -all {&ldquo;} $html_body {"} html_body
    regsub -all {&rsquo;} $html_body {'} html_body
    # ... 20+ entity decodings

Emphasis before links is important. If you have <a href="/pricing"><strong>$149,900</strong></a>, converting emphasis first gives you <a href="/pricing">**$149,900**</a>, which then converts to [**$149,900**](/pricing). Do it the other way, and the bold markers end up orphaned.

URL Resolution

AI agents need absolute URLs. A relative link like /properties is useless to a bot that doesn't know what host it's talking to. We capture $http_request_host in HTTP_REQUEST and use it during link conversion:

    # Relative to absolute
    regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
        $html_body "\[\\2\](https://${http_request_host}\\1)" html_body

    # Absolute stays absolute
    regsub -all -nocase {<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>} \
        $html_body "\[\\2\](\\1)" html_body

Same treatment for images.
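Here's a quick way to convince yourself that the [^<]* capture handles adjacent links correctly, using Python's re module as an off-box stand-in for Tcl regsub (the host name is a placeholder for the captured request host):

```python
import re

HOST = "www.example.com"  # placeholder for the captured $http_request_host

def links_to_markdown(html: str, host: str = HOST) -> str:
    # Relative links become absolute, using the captured request host.
    html = re.sub(
        r'<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>',
        lambda m: "[{}](https://{}{})".format(m.group(2), host, m.group(1)),
        html, flags=re.IGNORECASE)
    # Absolute links stay absolute.
    html = re.sub(
        r'<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>',
        r'[\2](\1)',
        html, flags=re.IGNORECASE)
    return html
```

Run it against two adjacent anchors and each one converts independently; swap the content capture back to .*? and you get one giant broken link spanning both.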
Dynamic Table Separators

(Yet another place Kunal offered some tips.) This one is tricky to solve because of common HTML table structure conventions. Markdown tables need a separator row between the header and body:

    | Name | Price | Status |
    |------|-------|--------|
    | Unit A | $500k | Available |

The separator needs the right number of columns. We count <th> tags in the <thead> and build it dynamically (but what if there is no thead? I try to account for that, too):

    set col_count 0
    set thead_check $html_body
    if { [regsub -nocase {<thead[^>]*>(.*?)</thead>} $thead_check "\\1" first_thead] } {
        set col_count [regsub -all -nocase {<th[^>]*>} $first_thead "" _discard]
    }
    if { $col_count > 0 } {
        set sep "\n|"
        for { set c 0 } { $c < $col_count } { incr c } {
            append sep "---|"
        }
        append sep "\n"
    }

If we can't count the columns in the thead, we default to 6 columns, which could still use some work. But at least we avoid a hardcoded 2-column separator breaking our 5-column tables.

Performance Considerations

This iRule runs in TMM. Every CPU cycle it uses is a cycle not spent processing other connections. So I built in several guardrails (which could still be improved):

Size limit: Pages over 512KB skip conversion entirely. The regex chain gets expensive on large documents, and the output quality degrades anyway.

    if { $content_length > 524288 } {
        set is_ai_agent 0
        HTTP::header insert "X-Markdown-Skipped" "body-too-large"
    }

Targeted Accept-Encoding: By stripping Accept-Encoding only for AI agent requests, normal browser traffic still gets compressed responses. No performance impact on human users.

Logging: Every conversion logs the byte reduction to /var/log/ltm so you can monitor the overhead:

    markdown: converted 15526 bytes -> 4200 bytes (73% reduction)

What This Doesn't Do (And I Think That's OK)

I will be honest about the limitations:

No DOM parsing. This is regex-based conversion. Complex nested structures (a <strong> that wraps three <div>s) won't convert perfectly.
You need a real DOM parser for that, and iRules doesn't have one. (I avoided using iRulesLX for this project entirely.)

Multiline tags within content blocks. The newline-collapse trick handles <script> and <style>, but a <p> tag with inline markup that spans lines will only partially match. The [^<]* pattern helps, but it can't capture text that contains child tags.

Tables without <thead>. The iRule detects column count from <th> tags. Tables that use plain <tr><td> with no header get a fallback separator.

For 80% of web pages, the output is surprisingly good. For the other 20%, consider iRulesLX (a Node.js sidecar with a real DOM parser) or a sideband approach with compiled-language HTML parsing.

The Complete iRule

Here it is; attach it to your virtual server and you're done:

    when HTTP_REQUEST {
        set is_ai_agent 0
        set ua [string tolower [HTTP::header "User-Agent"]]

        if { $ua contains "gptbot" || $ua contains "chatgpt-user" ||
             $ua contains "claudebot" || $ua contains "claude-web" ||
             $ua contains "perplexitybot" || $ua contains "cohere-ai" ||
             $ua contains "google-extended" || $ua contains "applebot-extended" ||
             $ua contains "bytespider" || $ua contains "ccbot" ||
             $ua contains "amazonbot" } {
            set is_ai_agent 1
        }

        if { [HTTP::header "X-Request-Format"] eq "markdown" } { set is_ai_agent 1 }
        if { [HTTP::header "Accept"] contains "text/markdown" } { set is_ai_agent 1 }

        set orig_uri [HTTP::uri]
        if { $orig_uri starts_with "/md/" } {
            set is_ai_agent 1
            set new_uri [string range $orig_uri 3 end]
            if { $new_uri eq "" } { set new_uri "/" }
            HTTP::uri $new_uri
        } elseif { $orig_uri eq "/md" } {
            set is_ai_agent 1
            HTTP::uri "/"
        }

        set http_request_host [HTTP::host]

        if { $is_ai_agent } {
            HTTP::header replace "Accept-Encoding" "identity"
        }
    }

    when HTTP_RESPONSE {
        if { $is_ai_agent } {
            if { [HTTP::header "Content-Type"] contains "text/html" } {
                set ce [HTTP::header "Content-Encoding"]
                if { $ce ne "" } {
                    if { $ce ne "identity" } {
                        set is_ai_agent 0
                        HTTP::header insert "X-Markdown-Skipped" "compressed-response"
                        return
                    }
                }
                set content_length [HTTP::header "Content-Length"]
                set do_collect 1
                if { $content_length ne "" } {
                    if { $content_length > 524288 } {
                        set is_ai_agent 0
                        set do_collect 0
                        HTTP::header insert "X-Markdown-Skipped" "body-too-large"
                    }
                }
                if { $do_collect } {
                    if { $content_length ne "" } {
                        if { $content_length > 0 } {
                            HTTP::collect $content_length
                        }
                    } else {
                        HTTP::collect 524288
                    }
                }
            }
        }
    }

    when HTTP_RESPONSE_DATA {
        if { $is_ai_agent } {
            set html_body [HTTP::payload]
            set orig_size [string length $html_body]

            # Phase 1: Collapse newlines for multiline tag stripping
            set NL_MARK "\x01"
            set html_body [string map [list "\r\n" $NL_MARK "\r" $NL_MARK "\n" $NL_MARK] $html_body]
            regsub -all -nocase "<script\[^>\]*>.*?</script>" $html_body "" html_body
            regsub -all -nocase "<style\[^>\]*>.*?</style>" $html_body "" html_body
            regsub -all -nocase "<nav\[^>\]*>.*?</nav>" $html_body "" html_body
            regsub -all -nocase "<footer\[^>\]*>.*?</footer>" $html_body "" html_body
            regsub -all -nocase "<header\[^>\]*>.*?</header>" $html_body "" html_body
            regsub -all -nocase "<noscript\[^>\]*>.*?</noscript>" $html_body "" html_body
            regsub -all -nocase "<svg\[^>\]*>.*?</svg>" $html_body "" html_body
            regsub -all "<!--.*?-->" $html_body "" html_body
            regsub -all -nocase "<form\[^>\]*>.*?</form>" $html_body "" html_body

            # Phase 2: Restore newlines, convert structure
            set html_body [string map [list $NL_MARK "\n"] $html_body]
            regsub -all -nocase {<h1[^>]*>([^<]*)</h1>} $html_body "# \\1\n\n" html_body
            regsub -all -nocase {<h2[^>]*>([^<]*)</h2>} $html_body "\n## \\1\n\n" html_body
            regsub -all -nocase {<h3[^>]*>([^<]*)</h3>} $html_body "\n### \\1\n\n" html_body
            regsub -all -nocase {<h4[^>]*>([^<]*)</h4>} $html_body "\n#### \\1\n\n" html_body
            regsub -all -nocase {<strong[^>]*>([^<]*)</strong>} $html_body {**\1**} html_body
            regsub -all -nocase {<b[^>]*>([^<]*)</b>} $html_body {**\1**} html_body
            regsub -all -nocase {<em>([^<]*)</em>} $html_body {*\1*} html_body
            regsub -all -nocase {<i>([^<]*)</i>} $html_body {*\1*} html_body
            regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} $html_body "\[\\2\](https://${http_request_host}\\1)" html_body
            regsub -all -nocase {<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>} $html_body "\[\\2\](\\1)" html_body
            regsub -all -nocase {<a[^>]*>([^<]*)</a>} $html_body {\1} html_body
            regsub -all -nocase {<th[^>]*>([^<]*)</th>} $html_body "| \\1 " html_body
            regsub -all -nocase {<td[^>]*>([^<]*)</td>} $html_body "| \\1 " html_body
            regsub -all -nocase {</tr>} $html_body "|\n" html_body
            regsub -all -nocase {<code>([^<]*)</code>} $html_body {`\1`} html_body
            regsub -all -nocase {<li[^>]*>([^<]*)</li>} $html_body "- \\1\n" html_body
            regsub -all -nocase {</?[uo]l[^>]*>} $html_body "\n" html_body
            regsub -all -nocase {<p[^>]*>([^<]*)</p>} $html_body "\\1\n\n" html_body
            regsub -all -nocase {<br\s*/?>} $html_body "\n" html_body
            regsub -all -nocase {<hr\s*/?>} $html_body "\n---\n\n" html_body
            regsub -all -nocase {<blockquote[^>]*>} $html_body "> " html_body
            regsub -all -nocase {</blockquote>} $html_body "\n\n" html_body
            regsub -all -nocase {<cite>([^<]*)</cite>} $html_body "-- *\\1*\n" html_body
            regsub -all {<[^>]+>} $html_body "" html_body

            # Decode HTML entities
            regsub -all {&amp;} $html_body {\&} html_body
            regsub -all {&lt;} $html_body {<} html_body
            regsub -all {&gt;} $html_body {>} html_body
            regsub -all {&quot;} $html_body {"} html_body
            regsub -all {&nbsp;} $html_body { } html_body
            regsub -all {&ldquo;} $html_body {"} html_body
            regsub -all {&rdquo;} $html_body {"} html_body
            regsub -all {&lsquo;} $html_body {'} html_body
            regsub -all {&rsquo;} $html_body {'} html_body
            regsub -all {&mdash;} $html_body {--} html_body
            regsub -all {&ndash;} $html_body {-} html_body
            regsub -all {&hellip;} $html_body {...} html_body
            regsub -all {&#[0-9]+;} $html_body {} html_body

            # Whitespace cleanup
            regsub -all {\n +} $html_body "\n" html_body
            regsub -all {\n{3,}} $html_body "\n\n" html_body
            regsub -all {([^\n])\n\n([^#\n\[>*-])} $html_body "\\1\n\\2" html_body
            set html_body [string trim $html_body]

            HTTP::payload replace 0 [HTTP::payload length] $html_body
            HTTP::header replace "Content-Type" "text/plain; charset=utf-8"
            HTTP::header replace "Content-Length" [string length $html_body]
            HTTP::header insert "X-Markdown-Source" "bigip-irule"
        }
    }

Testing / Demoing It

    # Normal browser request, HTML as usual
    curl https://your-site.example.com/

    # AI agent simulation
    curl -H "User-Agent: GPTBot/1.0" https://your-site.example.com/

    # Explicit markdown request
    curl -H "X-Request-Format: markdown" https://your-site.example.com/

    # Browser-friendly demo
    curl https://your-site.example.com/md/
    # (or just visit it in your browser)

What's Next

This is a solid starting point for making your existing sites AI-agent ready without touching application code. A few directions to take it:

- Agent discovery files: serve /llms.txt and /.well-known/ai-plugin.json so agents can programmatically discover your markdown capability.
- iRulesLX upgrade path: when regex-based conversion isn't enough, move the HTML parsing to a Node.js sidecar with a real DOM parser (cheerio, jsdom). Same detection logic, better conversion quality.

The AI agent wave isn't coming. It's here. Your BIG-IP already sees every request. Might as well make those responses useful.

Disclaimer!

The iRule in this article was developed as part of a proof-of-concept for edge-layer HTML-to-Markdown conversion. It's been tested on BIG-IP 17.5.1+. Your mileage may vary on complex single-page applications, but for content-heavy sites, it works remarkably well for something that's "just regex."

Scality RING and F5 BIG-IP: High-Performance S3 Object Storage
The load balancing of F5 BIG-IP, both locally within a site and for global traffic steering to an optimal site across large geographies, works effectively with Scality RING, a modern and massively scalable object storage solution. The RING architecture takes an innovative "bring-your-own-Linux" approach to turning highly performant servers, equipped with ample disks, into a resilient, durable storage solution. BIG-IP can scale in lockstep with offered S3 access loads, for use cases like AI data delivery for model training, keeping any single RING node from becoming a hot spot through load balancing algorithms like "Least Connections" or "Fastest", to name just a couple. From a global server load balancing perspective, BIG-IP DNS can apply similarly advanced logic, for instance steering S3 traffic to the optimal RING site based on the geographic locale of the traffic source, or on ongoing latency measurements from those source sites.

Scality RING – High Capacity and Durability for Today's Object Storage

The Scality solution is well known for its ability to grow an enterprise's storage capacity with agility: simply license the usable storage needed today and upgrade on an as-needed basis as business warrants. RING supports both object and file storage; however, the focus of this investigation is object. Industry drivers of object storage growth include its prevalence in AI model training, specifically for content accrual, which will in turn feed GPUs, as well as data lakehouse implementations. There is an extremely long-tailed distribution of other use cases, such as video clip retention in the media and entertainment industry, medical imaging repositories, updates to traditional uses like NAS offload to S3, and the evolution of enterprise storage backups.

At the very minimum, a 3-node site with 200 TB of storage serves as a starting point for a RING implementation.
The underlying servers typically run RHEL 9 or Rocky Linux on x86 hardware from Intel or AMD, and a representative server offers disk bays, front or back, with loaded disks totaling anywhere from 10 to dozens of units. Generally, S3 objects are stored on spinning hard disk drives (HDD), while the corresponding metadata warrants the inclusion of a subset of flash drives in a typical Scality deployment. A representative diagram of BIG-IP in support of a single RING site would be as follows.

One of the known attributes of a well-engineered RING solution is 100 percent data availability. In industry terms, this is an RPO (recovery point objective) of zero, meaning that no data is lost between the moment a failure occurs and the moment the system is restored to its last known good state. This is achieved through means like multiple nodes, multiple disks, and often multiple sites, combined with replication for small objects (such as retaining 2 or 3 copies of objects smaller than 60 kilobytes) and erasure coding (EC) for larger objects. Erasure coding is a nuanced topic within the storage industry; Scality uses a sophisticated take on erasure coding known as ARC (Advanced Resiliency Coding).

In alignment with availability is the durability of data that can be achieved through RING. This is to say, how "intact" can I believe my data at rest is? The Scality solution offers fourteen 9's of durability, exceeding most other advertised values, including that of AWS. What 9's correspond to in terms of downtime in a single year can be found here, although it is telling that Wikipedia, as of early 2026, does not even provide calculations beyond twelve 9's.

Finally, in keeping with sound information lifecycle management (ILM), the Scality site may offer an additional server running XDM (eXtended Data Management) to act as a bridge between on-premises RING and public clouds such as AWS and Azure.
This allows a tiering approach, where older, "cold" data is moved off-site. Archive-to-tape solutions are also available options.

Scality – Quick Overview of Data at Rest Protection

The two principal approaches to protecting data in large single-site or multi-site RING deployments are used in combination: replication and erasure coding. Replication is simple to understand: for smaller objects, an operator simply chooses the number of replicas desired. If two replicas are chosen, indicated by class of service (COS) 2, two copies are spread across nodes; for COS 3, three copies are spread across nodes. A frequent rule of thumb is the three percent rule, this being the fraction of files across a full object storage environment that are 60 kilobytes or less and therefore replicated; the replicas remain available in cases of hardware disruptions on a given node.

Erasure coding is an adjustable technique where larger objects are divided into data chunks, sometimes called data shards or data blocks, and spread (or "striped") across many nodes. To add resilience in the case of one or more hardware issues with nodes, or disks within nodes, additional parity chunks are mathematically derived. This way, cleverly and by design, only a subset of the data chunks and parity chunks is required when the solution is under duress, and the original object is still easily provided upon an S3 request. In smaller deployments, it is possible to consider a single RING server as two entities by dividing its storage into two "disk groups." However, for an ideal, larger RING site, the approach depicted is preferred. The erasure coding depicted, normally referred to with the nomenclature EC(9,3), leads into a deeper design consideration where storage overhead is traded off against data resiliency. In the diagram, as many as 3 nodes holding portions of the data could become unreachable and still the erasure-coded object would be available.
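The data-versus-parity arithmetic behind these trade-offs is easy to sketch. Here's an illustrative Python helper (an assumption-level model of EC(k, m), where k data chunks plus m parity chunks tolerate any m losses, not Scality's internal implementation):

```python
def ec_profile(data_chunks: int, parity_chunks: int) -> dict:
    """EC(k, m): k data chunks + m parity chunks; any m chunk losses survivable."""
    return {
        "chunks_stored": data_chunks + parity_chunks,
        "failures_tolerated": parity_chunks,
        # Storage overhead relative to the raw object size is m/k.
        "overhead_pct": round(100 * parity_chunks / data_chunks, 1),
    }
```

For example, ec_profile(9, 3) stores 12 chunks, tolerates 3 failures, and costs about 33% overhead, while ec_profile(8, 4) tolerates 4 failures at 50% overhead.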
The overhead can be considered 33 percent, as 3 additional parity chunks were created and stored beyond the 9 data chunks. For more risk-averse operators, an EC of, say, EC(8,4) would allow even more: four points of failure. The trade-off would be, in this case, a 50 percent overhead to achieve that increased resiliency. The overhead is still much less than replication, which can incur hundreds of percent in overhead, hence the logical choice to use replication only for small objects. Together, replication and EC lead to an overall storage efficiency number. Considering an environment with 3 percent small objects, EC(9,3) plus COS 3 replication might lead to a palatable long-term data protection posture, all for a total cost of 41 percent additional storage overhead. The ability to scale out and protect the S3 data in flight is the domain of BIG-IP, which we will review next.

BIG-IP – Bring Scale and Traffic Control to Scality RING

A starting point for any discussion around BIG-IP is its rich set of load balancing algorithms and its ability to drop unhealthy nodes from an origin pool, transparent to users who only interact with the configured virtual server. Load balancing for S3 involves avoiding "hot spots", where a single RING node might otherwise be overly tasked by users communicating with it directly while other nodes remain vastly underutilized. By steering DNS resolution of S3 services to BIG-IP and its configured virtual servers, traffic can be spread across all healthy nodes in accordance with interesting algorithms. Popular ones for S3 include:

Least Connections – RING nodes with fewer established TCP connections receive proportionally more of the new S3 transactions, towards a goal of balanced load in the server cluster.

Ratio (member) – Although sound practice would be for all RING members to have similar compute and storage makeup, in some cases perhaps two vintages of server exist.
Ratio allows proportionally more traffic to target the newer, more performant class of Scality nodes.

Fastest (Application) – The number of "in progress" transactions any one server in a pool is handling is considered. If traffic steered to all members is generally similar over time, a member with the fewest transactions actively in progress is considered a faster member of the pool, and new transactions can favor such low-latency servers.

The RING nodes are contacted through Scality "S3 Connectors"; in an all-object deployment, the connector resides on the storage node itself. For some configurations, perhaps one with file-based protocols like NFS running concurrently, the S3 Connectors can also be installed on VMs or 1U appliances.

Of course, an unhealthy node should be precluded from an origin pool, and low-impact HTTP-based health monitors, such as an HTTP HEAD request to see whether an endpoint is responsive, are frequently used. With BIG-IP Extended Application Validation (EAV), one can move towards even more sophisticated health checks: an S3 access and secret token pair installed on BIG-IP can be harnessed to perpetually upload and download small objects to each pool member, assuring the BIG-IP administrator that S3 is unequivocally healthy on each pool member.

BIG-IP – Control-Plane and Data-Plane Safeguards

A popular topic in a Scality software-defined distributed storage solution is that of the noisy neighbor when multiple tenants are considered. Perhaps one tenant has an S3 application which consumes disproportionate amounts of shared resources (CPU, network, or disk I/O), degrading performance for other tenants; controls are needed to counter this. With BIG-IP, a simple control-plane threshold can be invoked with a straightforward iRule, a programmatic rule which can, for example, limit a source to no more than 25 S3 requests over 10 seconds. An iRule is a powerful but normally short, event-driven script.
A sample is provided below. Most modern generative AI solutions are well-versed in F5 iRules and can summarize even the most advanced scripts in digestible terms. This iRule examines an application ("client_addr") that connects to a BIG-IP virtual server and starts a counter; after 10 transactions within 6 seconds, further S3 commands are rejected. The approach is that of a leaky bucket, and the application is replenished with credits for future transactions over time.

Whereas iRules frequently target layer 7, HTTP-layer activity, a wealth of layer 3 and layer 4 controls exist to limit excessive data-plane consumption. Take, for example, the static bandwidth controller concept: simply create a profile such as the following 10 Mbps example. This bandwidth controller can then be applied against a virtual server, including one supporting, say, lower-priority S3 application traffic.

Focusing on layer 4, the TCP layer, a number of BIG-IP safeguards exist, among which are those that can defend against orphaned S3 connections, including connections intentionally set up and left open by a bad actor to try to deplete RING resources. Another safeguard is the ability to re-map DiffServ code points or Type of Service (TOS) precedence bits. In this manner, a source that exceeds ideal traffic rates can be passed without intervention; however, by remapping heavy upstream traffic, BIG-IP enables the network infrastructure adjacent to Scality RING nodes to police or discard such traffic if required.

Evolving Modern S3 Traffic with Fresh Takes on TLS

TLS underwent a major improvement with the first release of TLS 1.3 in 2018. It removed a number of antiquated security components from official support, such as RSA-style key agreements, SHA-1 hashes, and DES encryption. From a performance point of view, however, the upgrade to TLS 1.3 is equally significant.
When establishing a TLS 1.2 session, perhaps towards the goal of an S3 transaction with RING, an application can expect 2 round-trip times, once the TCP connection is established, to complete the TLS negotiation phase and move forward with encrypted communications. TLS 1.3 cuts the round trips in half: a new TLS 1.3 session can proceed to encrypted data exchange after a single round-trip time. In fact, when resuming a previously established TLS 1.3 session, 0-RTT is possible, meaning the first resumption message from the client can itself carry encrypted data. The following packet trace demonstrates 1-RTT TLS 1.3 establishment (double-click to enlarge image). To turn on this feature, simply use a client-facing TLS profile on BIG-IP and remove the "No TLS1.3" option.

Another advancement in TLS, which requires TLS 1.3 to be enabled in the first place, is quantum-computing resistance in the shared key agreement algorithms of TLS. This is a foundational building block of post-quantum cryptography (PQC), and the most well-known of these techniques is NIST FIPS-203 ML-KEM. The concern with not supporting PQC today is that traffic in flight, which may be surreptitiously siphoned off and stored long term, will be readable in the future with quantum computers, perhaps as early as 2030. This risk stems from thought leadership like Shor's algorithm, which indicates that public key (asymmetric) cryptography, foundational to shared key establishment between parties in TLS, is at risk: large-scale, fault-tolerant quantum computers could potentially crack elliptic curve cryptography (ECC) and Diffie-Hellman (DH) algorithms. This risk, the so-called Harvest Now, Decrypt Later threat, means sensitive data like tax records, medical information, and anything with long-term retention value requires protection today. It cannot safely be put off; action needs to be taken now.
FIPS-203 ML-KEM suggests a hybrid approach to shared key derivation, after which TLS parties can safely continue to use symmetric encryption algorithms like AES, which are thought to be far less susceptible to quantum attacks. Updating our initial one-site topology, we can consider the following improvements.

A key understanding is that a hybrid key agreement scheme is used in FIPS-203. Essentially, a parallel set of crypto operations using a traditional key agreement like the X25519 ECDH key exchange is performed simultaneously with the new MLKEM768 quantum-resistant key encapsulation approach. The net result is that a significant amount of crypto is carried out, two sets of calculations plus the final combining of outcomes into an agreed-upon shared key. The conclusion is that this load is likely best suited for only a subset of S3 flows: those with objects housing PII of high long-term potential value. A method to achieve this balance, the trade-off between security and performance, is to use multiple BIG-IP virtual servers: a regular set of S3 endpoints with classical TLS support, and higher-security S3 endpoints for selective use. The latter would support the PQC provisions of modern TLS. A full article on configuring BIG-IP for PQC, including a video demonstration of the click-through to add support to a virtual server, can be found here.

Multi-site Global Server Load Balancing with BIG-IP and Scality RING

An illustrative diagram showing two RING sites, asynchronously connected and offering S3 ingestion and object retrieval, is shown below. Note that BIG-IP DNS, although frequently deployed independently from BIG-IP LTM appliances, can also operate on the same, existing LTM appliances. In this example, an S3 application physically situated in Phoenix, Arizona, in the American southwest, will use its configured local DNS resolver (frequently shortened to LDNS) to resolve S3 targets to IP addresses.
Think finance.s3.acme.com or humanresources.s3.acme.com. In F5 terms, these example domain names are referred to as “Wide IPs”. An organization such as the fictitious acme.com will delegate the relevant sub-domains to F5 DNS, such as s3.acme.com in our example, meaning the F5 appliances in San Francisco and Boston hold the DNS nameserver (NS) resource records for the S3 domain in question, and can answer the client’s DNS resolver authoritatively. The DNS A queries required by the S3 application will land on either BIG-IP DNS platform, San Francisco or Boston. The pair serve redundancy purposes, and both can provide an enterprise-controlled answer. In other words, should the S3 application target be resolved to Los Angeles or New York City?

The F5 solution allows for a multitude of considerations when providing the answer to the above question. Interesting options and their impact on our topology diagram:

Global Availability – A common disaster recovery approach. The BIG-IP DNS appliance distributes DNS name resolution requests to the first available virtual server in a pool list the administrator configures. BIG-IP DNS starts at the top of the list of virtual servers and sends requests to the first available virtual server in the list. Only when that virtual server becomes unavailable does BIG-IP DNS send requests to the next virtual server in the list. If we want S3 traffic generally to travel to Los Angeles, and only utilize New York when application availability problems arise, this would be a good approach.

Ratio – In a case where we would like, say, an 80/20 split between S3 traffic landing in Los Angeles versus New York, this would be a sound method. Perhaps market reasons make the cost of ingesting traffic in New York more expensive.

Round Robin – The logical choice where we would like to see both data centers receive, generally, over time, the same amount of S3 transactions.
Topology – BIG-IP DNS distributes DNS name resolution requests using proximity-based load balancing. BIG-IP DNS determines the proximity of the resource by comparing location information derived from the DNS message to the topology records in a topology statement. A great choice if data centers are of similar capacity and S3 transactions are best serviced by the closest physical data center. Note that the source IP address of the application’s DNS resolver is analyzed; if a centralized DNS service is used, perhaps it is not in Phoenix at all. There are techniques like EDNS0 client subnet to better place the actual locality of the application.

Round Trip Time – An advanced algorithm that is dynamic, not static. BIG-IP DNS distributes DNS name resolution requests to the virtual server with the fastest measured round-trip time between that data center and a client’s LDNS. This is achieved by having sites send low-impact probes, from “prober pools”, to each application’s DNS resolver over time. Therefore, for new DNS resolution requests, the BIG-IP DNS can tap into real-world latency knowledge to direct S3 traffic to the site demonstrably known to offer the lowest latency. This again works best when the application and DNS resolver are in the same location.

The BIG-IP DNS, when selecting between virtual servers, such as those in Los Angeles and New York City in our simple example, can have a primary algorithm, a secondary algorithm and a fall-back, hard-coded IP. For instance, consider that the first two algorithms are, in order, dynamic approaches: prober pools measuring round-trip time and, as a second approach, the measurement of active hop counts between sites and the application’s LDNS. Should both methods fail to provide results, an IP address of last resort, perhaps in our case Los Angeles, will be provided through the configured fall-back IP.
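The selection methods above are easy to reason about in miniature. A hedged Python sketch (site names, health flags, and weights are illustrative, not F5 APIs) showing how Global Availability differs from Ratio-style selection:

```python
import random

SITES = ["los-angeles", "new-york"]  # ordered preference list

def global_availability(health):
    # Walk the ordered list; return the first healthy site.
    for site in SITES:
        if health.get(site):
            return site
    return None  # no site available

def ratio(health, weights, rng):
    # Weighted pick among healthy sites, e.g. {"los-angeles": 80, "new-york": 20}.
    healthy = [s for s in SITES if health.get(s)]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[weights[s] for s in healthy])[0]

health = {"los-angeles": False, "new-york": True}
print(global_availability(health))  # new-york
```

Round Robin would simply rotate through the healthy list; the real BIG-IP DNS decision also folds in the health-monitor state described later in the article.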
Key takeaway: what is being provided by F5 and Scality is “intelligent” DNS; traffic is not directed to Los Angeles or New York based merely upon basic network reachability. In reality, the solution looks behind the local load balancing tier and is aware of the health of each Scality RING member. Thus, traffic is steered in accordance with back-end application health monitoring, something a regular DNS solution would not offer.

Multi-site Solutions for Global Deployments and Geo-Awareness

One potentially interesting use case for F5 BIG-IP DNS and Scality RING sites would be to tier all data centers into pools, based upon wider geographies. Consider a use case such as the following, with Scality RING sites spread across both North America and Europe. The BIG-IP DNS solution can handle this higher layer of abstraction: the first layer involves choosing between a pool of sites, before delving down one more layer into the pool of virtual servers spread across the sites within the optimal region. Policy drives the response to a DNS query for S3 services all the way through these two layers. To explore all load balancing methods is an interesting exercise but beyond the scope of this article. The manual here drills into the possible options. To direct traffic at the country or even continent level, one can follow the “Topology” algorithm for first selecting the correct site pool. Persistence can be enabled, allowing future requests from the same LDNS resolver to follow prior outcomes. First, it is good practice to ensure the geo-IP database of BIG-IP is up to date. A brief video here steps a user through the update. The next thing to create is regions. In this diagram the user has created an “Americas” and a “Europe” region. In fact, in this particular setup, the Europe region is seen to match all traffic with DNS queries originating outside of North and South America, per the list of member continents.
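The region-matching idea can be sketched with Python's ipaddress module, a simplified stand-in for BIG-IP's geo-IP database. The prefixes below are documentation ranges chosen purely for illustration:

```python
import ipaddress

# Hypothetical prefixes standing in for a geo-IP database lookup
AMERICAS = [ipaddress.ip_network("198.51.100.0/24"),
            ipaddress.ip_network("203.0.113.0/24")]

def region_for(ldns_ip):
    # Match the resolver's source IP against the Americas list;
    # everything else falls through to Europe, mirroring the
    # "all other continents" region described above.
    addr = ipaddress.ip_address(ldns_ip)
    if any(addr in net for net in AMERICAS):
        return "Americas"
    return "Europe"

print(region_for("198.51.100.25"))  # Americas
print(region_for("192.0.2.10"))     # Europe
```

The real geo-IP database maps prefixes to countries and continents rather than a hand-built list, but the decision flow, source IP in, region out, is the same.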
With regions defined, one now creates simple topology records to control DNS responses for S3 services based upon the source IP of DNS queries made on behalf of S3 applications. The net result is a worldwide set of controls with regard to which Scality site S3 transactions will land upon. The decision, based upon enterprise objectives, can fully consider geographies like continents or individual countries. In our example, once a source region has been decided upon for an inbound DNS request, any of the previous algorithms can kick in. This would include options like global availability for DR within the selected region, or perhaps measured latency to steer traffic to the most performant site in the region.

Summary

Scality RING is a software-defined object and file solution that supports data resiliency at levels expected by risk-averse storage groups, all with contemporary Linux-friendly hardware platforms selected by the enterprise. The F5 BIG-IP application delivery controller complements S3 object traffic involving Scality through massive scale-out of nodes coupled with innovative algorithms for agile spreading of the traffic. Health of RING nodes is perpetually monitored so as to seamlessly bypass any troubled system. When moving to multi-site RING deployments, within a country or even across continents, BIG-IP DNS is harnessed to steer traffic to the optimal site, potentially including geo-IP rules, proximity between user and data center, and established baseline latencies offered by each site to the S3 application’s home location.

YouTube RSS Newsletter in n8n Root Cause: Why the Ollama Node Broke My Agent
Hey community—Aubrey here. I want to talk about a failure I ran into while building an n8n workflow, because this one cost me some real time and I think it’s going to save you an afternoon if you’re headed down the same road. The short version: I had a workflow working great with OpenAI, and I wanted to swap in Ollama so I could run the LLM locally. Same prompt, same data, same structured output requirements. In my head, that should’ve been a clean plug-and-play change. It wasn’t. It broke in a way that looked like “the model isn’t returning valid JSON,” but the real root cause was something else entirely—and it’s actually documented.

What broke (and where it broke)

The failure always showed up in the Structured Output Parser. n8n would run the flow, then the parser would throw: “Model output doesn’t fit required format.” That’s a super reasonable error if your model is rambling, adding commentary, wrapping JSON in markdown, returning tool traces, whatever. So that’s where my head went first: “Okay, I need to tighten the prompt. Maybe the schema is too strict. Maybe Ollama’s being weird.” But here’s the thing: this wasn’t one of those “LLM didn’t obey” moments. This was repeatable, consistent, and it didn’t really matter how I tuned the prompt. The OpenAI version worked; the Ollama version failed, and the parser was just the first place it showed up.

The first big clue: the 5-minute wall

As I dug in, I started seeing a pattern: a hard failure at exactly five minutes. Not “about five minutes,” not “sometimes,” but right on the dot. That error often surfaced as: “fetch failed.” So now we’re not talking about a formatting issue anymore—we’re talking about the request itself failing. That matters, because if the model call dies mid-stream, the structured parser downstream is going to be handed something incomplete or empty or error-shaped, and it’s going to complain that it doesn’t match the schema. That’s not the parser being wrong. That’s the parser doing its job.
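To make that failure mode concrete: when the upstream call dies mid-stream, the parser receives truncated text, and strict JSON parsing rejects it exactly as described. A quick illustrative sketch (the payloads are made up):

```python
import json

complete = '{"title": "Video", "description": "..."}'
truncated = '{"title": "Video", "descri'   # stream cut off mid-response

print(json.loads(complete)["title"])       # Video

try:
    json.loads(truncated)
except json.JSONDecodeError as exc:
    # This is the moment a structured output parser reports that the
    # model output doesn't fit the required format. The model didn't
    # disobey; the payload never fully arrived.
    print("parse failed:", exc.msg)
```

The same schema that happily validates the complete payload has no chance against the truncated one.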
This 5-minute behavior is also what other folks reported in n8n issue #13655—the Ollama chat model node timing out after 5 minutes even when people tried to change the “keep alive” setting.

Reproducing behavior with logs (and why it matters)

One of the most useful things I found in that issue thread was simple: Ollama’s own logs clearly show the request dying at 5 minutes when driven by n8n’s AI nodes. You’ll see entries like:

failure case: 500 | 5m0s | POST "/api/chat"

Then an n8n community member swapped the same payload into a manual HTTP Request node in n8n (which does let you set a timeout), and suddenly the same call works:

success case: 200 | 7m0s | POST "/api/chat"

That’s a huge diagnostic move. Because it tells you the model isn’t “incapable” or “too slow”—it tells you the client behavior is the problem (timeout / abort / cancellation), not your prompt, not your JSON schema, not the content. And that lined up perfectly with what I was seeing on my side.

Getting serious: tcpdump, FIN/ACK, and “context canceled”

At some point I wanted proof of what was actually happening on the wire, so I ran a tcpdump against the Ollama port. And yeah—this is where it got real. What I saw was:

n8n connects to Ollama fine
Data flows for a while (so we’re not talking about “can’t reach host”)
At the ~5 minute mark, n8n sends a TCP FIN/ACK (client closes connection)
Then an HTTP 500 follows, containing an error like "context canceled"

In the issue thread, you can literally see an example of that pattern: client FIN from n8n → Ollama, then 'HTTP/1.1 500 Internal Server Error' with a body indicating: 'context canceled.' So when I originally said “the structured output parser fails because Ollama’s tool call output isn’t close to what’s expected,” I wasn’t totally wrong about the symptom. But the deeper “why” is: the request is being canceled and what comes back is not a valid structured model output.
The parser is just where it becomes obvious when you force the node to return data back to the agent.

The root cause (and the part I want everyone to notice)

Now here’s the punchline, and this is the part I want to underline, bold, highlight, put on a billboard: The n8n Ollama model node does not work with LLM Tools implementations. That’s not a rumor. That’s in the n8n docs! After a quick recap and discussion, JRahm pointed me to the documentation for the Ollama model integration, and it straight-up says the Ollama node does not support tools, and recommends using Basic LLM Chain instead:

What I’m doing next (and what you should do)

I’m not done with Ollama—I’m just done trying to use it the wrong way, and this is going to spawn two follow-up efforts for me:

1. Attempt to rebuild the same idea using Basic LLM Chain with Ollama, the way the docs recommend.
2. Write a deeper explainer on LLM Tools—what they are, why agents use them, and how that’s different than RAG (because those concepts get mashed together constantly).

So if you’re out there wiring up an Agent with structured output and you’re thinking “I’ll just switch the model to Ollama,” don’t do what I did. Read that doc line first. If you need tools, pick a model/node combo that supports tools. If you’re using Ollama, design for the Basic LLM Chain path and you’ll save yourself the five-minute timeout rabbit hole and the structured-parser blame game.

Using the Model Context Protocol with Open WebUI
This year we started building out a series of hands-on labs you can do on your own in our AI Step-by-Step repo on GitHub. In my latest lab, I walk you through setting up a Model Context Protocol (MCP) server and the mcpo proxy to allow you to use MCP tools in a locally-hosted Open WebUI + Ollama environment. The steps are well-covered there, but I wanted to highlight what you learn in the lab.

What is MCP and why does it matter?

MCP is a JSON-based open standard from Anthropic that (shockingly!) is only about 13 months old now. It allows AI assistants to securely connect to external data sources and tools through a unified interface. The key delivery that led to its rapid adoption is that it solves the fragmentation problem in AI integrations—instead of every AI system needing custom code to connect to each tool or database, MCP provides a single protocol that works across different AI models and data sources.

MCP in the local lab

My first exposure to MCP was using Claude and Docker tools to replicate a video Sebastian_Maniak released showing how to configure a BIG-IP application service. I wanted to see how F5-agnostic I could be in my prompt and still get a successful result, and it turned out that the only domain-specific language I needed, after it came up with a solution and deployed it, was to specify the load balancing algorithm. Everything else was correct. Kinda blew my mind. I spoke about this experience throughout the year at F5 Academy events and at a solutions day event in Toronto, but more so, I wanted to see how far I could take this in a local setting away from the pay-to-play tooling offered at that time. This was the genesis for this lab.
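Under the hood, MCP messages ride on JSON-RPC 2.0. A minimal sketch of the shape a client sends to invoke a tool (the tool name and arguments here are hypothetical, not from the lab's server):

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    # Shape of an MCP "tools/call" request in its JSON-RPC 2.0 framing
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

msg = mcp_tool_call(1, "get_weather", {"city": "Seattle"})
print(json.dumps(msg, indent=2))
```

The mcpo proxy's job in the lab is essentially to translate this style of call into plain HTTP endpoints that Open WebUI can consume.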
Tools

In this lab, you'll use the following tools:

- Ollama
- Open WebUI
- mcpo
- custom MCP server

Ollama and Open WebUI are assumed to already be installed; those labs are also in the AI Step-by-Step repo:

- Installing Ollama
- Installing Open WebUI

Once those are in place, you can clone the repo and deploy in docker or podman; just make sure the containers for Open WebUI are in the same network as the repo you're deploying.

Results

The success of getting your Open WebUI inference through the mcpo proxy and the MCP servers (mine is very basic, just for test purposes; there are more that you can test or build yourself) depends greatly on your prompting skills and the abilities of the local models you choose. I had varying success with llama3.2:3b. But the goal here isn't production-ready tooling; it's to build and discover and get comfortable in this new world of AI assistants and leveraging them where it makes sense to augment our toolbox. Drop a comment below if you build this lab and share your successes and failures. Community is the best learning environment.
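If you are curious what wiring mcpo in front of an MCP server looks like, it boils down to a one-liner. The port and server command below are placeholders; check the mcpo README for current flags before relying on this:

```shell
# Expose a (hypothetical) local MCP server as HTTP endpoints
# that Open WebUI can reach as a tool backend
uvx mcpo --port 8000 -- python my_mcp_server.py
```

Everything after the `--` is the command that launches your MCP server; mcpo proxies it for Open WebUI.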
The Fast Path to Safer Labs: CycloneDX SBOMs for F5 Practitioners
Quick note up front about my intent with this lab... I built it to quickly help F5 practitioners keep their lab environments safe from hidden threats. Fast, approachable, and useful on day one. We used the bundled Dependency-Track container because it's trivial to stand up in a lab. In production, please deploy Dependency-Track backed by a production-grade database and tune it for scale and durability. Lab-first, but think ahead to enterprise-ready. Now, let's talk about why I chose CycloneDX for the SBOM we generated with Trivy, and why it's the accepted standard I recommend for modern, AI-heavy workloads.

At a high level, an SBOM is your ingredient list for software. Containers that host LLM apps are layered: base OS, GPU drivers and CUDA, language runtimes, Python packages, app binaries, plus external services you call (hosted inference, embeddings, vector databases). If you don't know what's in that stack, you can't manage risk when new CVEs land. CycloneDX gives you that visibility and does it with a security-first design.

Here's why CycloneDX is such a good fit:

- Security-first schema. CycloneDX was born into the AppSec world at OWASP. It bakes in identifiers that vulnerability tooling actually uses—package URLs (purls), CPEs, hashes—and a proper dependency graph. That graph matters when the vulnerable thing isn't your top-level app but the library three layers deep.
- Broad component coverage, including services. Real LLM apps don't stop at "libraries." CycloneDX can represent applications, libraries, containers, operating systems, files, and services. That service support is huge: if you depend on an external inference API, a hosted vector DB, or a third-party embedding service, CycloneDX can document that right in your SBOM. Your risk picture is no longer just what's "in the image," but what the image calls.
- VEX support to cut noise.
CycloneDX supports VEX (Vulnerability Exploitability eXchange), which lets you annotate "not affected" or "mitigated" when a CVE shows up in your base image but is not exploitable in your specific deployment. That's how you keep the signal high and the noise low.

- Toolchain adoption. The path we used in the lab—Trivy generates CycloneDX JSON in a single command, Dependency-Track ingests it cleanly—is exactly what you want. Fewer conversions, fewer surprises, more time looking at risk with a project-centric view.

So how does that map to LLM app security, specifically?

- Containers and drivers: CycloneDX captures the full container context—OS packages, runtime layers, GPU driver stacks—so when you rebuild to pick up a CUDA or base image update, your SBOM reflects the change and your risk dashboard stays current.
- Python ecosystems: For model-serving and data pipelines, CycloneDX tracks the Python libraries and their transitive dependencies, so when a popular package pushes a patch for a nasty CVE, you'll see the impact across your projects.
- Model artifacts and files: CycloneDX can represent file components with hashes. If you pin or verify model files, that checksum data helps you detect drift or tampering.
- External services: Many LLM apps rely on hosted endpoints. CycloneDX's service component type lets you document those dependencies, so governance isn't blind to the parts of your "system" that live outside your containers.

Now, let's compare CycloneDX to other SBOM standards you'll hear about.

SPDX (Software Package Data Exchange)

- Strengths: It's a Linux Foundation standard with deep traction, especially for license compliance. Legal and compliance teams love it for moving license information through CI/CD.
- Tradeoffs for AppSec: SPDX can represent dependencies and has added security-relevant fields, but its heritage is compliance rather than vulnerability analysis.
Modeling external services is less natural, and a lot of AppSec tooling (like the Trivy -> Dependency-Track workflow we used) is tuned for CycloneDX. If your primary goal is security visibility and CVE triage for containerized AI apps, CycloneDX tends to be the smoother path.

SWID tags (ISO/IEC 19770-2)

- Strengths: Vendor-provided software identification for asset management—who installed what, what version, and how it's licensed.
- Tradeoffs: Limited open tooling, and not a great fit for layered containers or fast-moving dependency graphs. You won't get the rich, developer-centric view you need for daily AppSec in LLM environments.

And a quick reality check: package manifests and lockfiles (pip freeze, requirements.txt, package-lock.json) are useful, but they're not SBOMs. They miss OS packages, drivers, and container layers. CycloneDX gives you the whole picture.

Practically speaking, here's the loop we ran—and why CycloneDX makes it painless:

- Generate: Use Trivy to scan your AI container and spit out CycloneDX JSON. It's trivial—one line, usually under a minute.
- Ingest: Push that SBOM into Dependency-Track via the API. You get components, licenses, vulnerability scores, dependency graphs, and a clean project/version history.
- Act: Watch for new CVEs. Use VEX to mark what's not exploitable in your context. Rebuild, rescan, repeat. Automate it in CI so your SBOM stays fresh without manual babysitting.

Production note again, because it matters: the bundled Dependency-Track container is perfect for labs and demos. In production, deploy Dependency-Track with a production-grade database, persistent storage, backups, and access controls that match your enterprise standards.

Bottom line: SPDX and CycloneDX are both legitimate, widely accepted SBOM standards. If your priority is license compliance, SPDX is an excellent fit.
If your priority is application security for modern, service-heavy, containerized LLM apps, CycloneDX gives you security-first modeling, service coverage, VEX, and an ecosystem that lets you move fast without sacrificing visibility. Voila—grab Trivy, generate CycloneDX, feed Dependency-Track, and start getting signals instead of noise. Fresh installs often look green on day one, but when something changes tomorrow, you'll see it. That's the whole game: make hidden threats visible, then make them go away. If you'd like to try the lab, it's located here. If you want to check out the video of the lab instead, try this one:
Introducing F5 AI Red Team
F5 AI Red Team simulates adversarial attacks such as prompt injection and jailbreaks at unprecedented speed and scale, allowing for continuous assessment throughout the application lifecycle. It provides insights into threats and integrates with F5 AI Guardrails to convert those insights into security policies.
Key Steps to Securely Scale and Optimize Production-Ready AI for Banking and Financial Services
This article outlines three key actions banks and financial firms can take to more securely scale, connect, and optimize their AI workflows, demonstrated through a scenario of a bank taking a new AI application to production.

I Tried to Beat OpenAI with Ollama in n8n—Here’s Why It Failed (and the Bug I’m Filing)
Hey, community. I wanted to share a story about how I built the n8n Labs workflow. It watches a YouTube channel, summarizes the latest videos with AI agents, and sends a clean HTML newsletter via Gmail. In the video, I show it working flawlessly with OpenAI. But before I got there, I spent a lot of time trying to copy the same flow using open source models through Ollama with the n8n Ollama node. My results were all over the map. I really wanted this to be a great “open source first” build. I tried many local models via Ollama, tuned prompts, adjusted parameters, and re‑ran tests. The outputs were always unpredictable: sometimes I’d get partial JSON, sometimes extra text around the JSON. Sometimes fields would be missing. Sometimes it would just refuse to stick to the structure I asked for. After enough iterations, I started to doubt whether my understanding of the agent setup was off. So, I built a quick proof inside the n8n Code node. If the AI Agent step is supposed to take the XML→JSON feed and reshape it into a structured list—title, description, content URL, thumbnail URL—then I should be able to do that deterministically in JavaScript and compare. I wrote a tiny snippet that reads the entries array, grabs the media fields, and formats a minimal output. And guess what? Voila. It worked on the first try and my HTML generator lit up exactly the way I wanted. That told me two things: one, my upstream data (HTTP Request + XML→JSON) was solid; and two, my desired output structure was clear and achievable without any trickery. With that proof in hand, I turned to OpenAI. I wired the same agent prompt, the same structured output parser, and the same workflow wiring—but swapped the Ollama node for an OpenAI chat model. It worked immediately. Fast, cheap, predictable. The agent returned a perfectly clean JSON with the fields I requested. My code node transformed it into HTML. The preview looked right, and Gmail sent the newsletter just like in the demo. 
So at that point, I felt confident the approach was sound and the workflow you saw in the video was repeatable—at least with OpenAI in the loop. Where does that leave Ollama and open source models? I’m not throwing shade—I love open source, and I want this path to be great. My current belief is the failure is somewhere inside the n8n Ollama node code path. I don’t think it’s the models themselves in isolation; I think the node may be mishandling one or more of these details: how messages are composed (system vs. user); whether “JSON mode” or a grammar/format hint is being passed; token/length defaults that cause truncation; stop settings that let extra text leak into the output; or the way the structured output parser constraints are communicated. If you’ve worked with local models, you know they can follow structure very well when you give them a strict format or grammar. If the node isn’t exposing that (or is dropping it on the floor), you get variability. To make sure this gets eyes from the right folks, my intent is to file a bug with n8n for the Ollama node. I’ll include a minimal, reproducible workflow: the same RSS fetch, the same XML→JSON conversion, the same agent prompt and required output shape, and a comparison run where OpenAI succeeds and Ollama does not. I’ll share versions, logs, model names, and settings so the team can trace exactly where the behavior diverges. If there’s a missing parameter (like format: json) or a message-role mix‑up, great—let’s fix it. If it needs a small enhancement to pass a grammar or schema to the model, even better. The net‑net is simple: for AI agents inside n8n to feel predictable with Ollama, we need the node to reliably enforce structured outputs the same way the OpenAI path does. That unlocks a ton of practical automation for folks who prefer local models. In the meantime, if you’re following the lab and want a rock‑solid fallback, you can use the Code node to do the exact transformation the agent would do.
Here’s the JavaScript I wrote and tested in the workflow:

const entries = $input.first().json.feed?.entry ?? [];

function truncate(str, max) {
  if (!str) return '';
  const s = String(str).trim();
  return s.length > max ? s.slice(0, max) + '…' : s;
  // If you want total length (including …) to be max, use:
  // return s.length > max ? s.slice(0, Math.max(0, max - 1)) + '…' : s;
}

const output = entries.map(entry => {
  const g = entry['media:group'] ?? {};
  return {
    title: g['media:title'] ?? '',
    description: truncate(g['media:description'], 60),
    contentUrl: g['media:content']?.url ?? '',
    thumbnailUrl: g['media:thumbnail']?.url ?? ''
  };
});

return [{ json: { output } }];

That snippet proves the data is there and your HTML builder is fine. If OpenAI reproduces the same structured JSON as the code, and Ollama doesn’t, the issue is likely in the node’s request/response handling rather than your workflow logic. I’ll keep pushing on the bug report so we can make agents with Ollama as predictable as they need to be. Until then, if you want speed and consistency to get the job done, OpenAI works great. If you’re experimenting with open source, try enforcing stricter formats and shorter outputs—and keep an eye on what the node actually sends to the model. I’ll share updates, because I love sharing knowledge—and I want the open-source path to shine right alongside the rest of our AI, agents, n8n, Gmail, and OpenAI workflows. As always, community, if you have a resolution and can pull it off, please share!
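One way to test the "missing format: json" theory yourself is to call Ollama's chat endpoint directly with a generous client timeout, bypassing the AI node entirely. A hedged sketch (the model name and prompt are placeholders; adjust the host and port to your Ollama instance):

```shell
# Call Ollama directly with a 10-minute client-side timeout and
# JSON mode enabled, so you can compare against the node's behavior
curl --max-time 600 http://localhost:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama3.2:3b",
        "stream": false,
        "format": "json",
        "messages": [
          {"role": "user", "content": "Return {\"ok\": true} as JSON only."}
        ]
      }'
```

If this returns clean JSON while the node's output does not, that is strong evidence the node, not the model, is the weak link.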
How I did it.....again "High-Performance S3 Load Balancing with F5 BIG-IP"
Introduction

Welcome back to the "How I did it" series! In the previous installment, we explored high-performance S3 load balancing of Dell ObjectScale with F5 BIG-IP. This follow-up builds on that foundation with BIG-IP v21.x's S3-focused profiles and how to apply them in the wild. We'll also put the external monitor to work, validating health with real PUT/GET/DELETE checks so your S3-compatible backends aren't just "up," they're truly dependable.

New S3 Profiles for the BIG-IP…..well kind of

A big part of why F5 BIG-IP excels is its advanced traffic profiles, like TCP and SSL/TLS. These profiles let you fine-tune connection behavior—optimizing throughput, reducing latency, and managing congestion—while enforcing strong encryption and protocol settings for secure, efficient data flow. Available with version 21.x, the BIG-IP now includes new S3-specific profiles (s3-tcp and s3-default-clientssl). These profiles are based on existing default parent profiles (tcp and clientssl, respectively) that have been customized or "tuned" to optimize S3 traffic. Let's take a closer look.

Anatomy of a TCP Profile

The BIG-IP includes a number of pre-defined TCP profiles that define how the system manages TCP traffic for virtual servers, controlling aspects like connection setup, data transfer, congestion control, and buffer tuning. These profiles allow administrators to optimize performance for different network conditions by adjusting parameters such as initial congestion window, retransmission timeout, and algorithms like Nagle's or Delayed ACK. The s3-tcp profile (see below) has been tweaked with respect to data transfer and congestion window sizes as well as memory management to optimize typical S3 traffic patterns (high-throughput data transfer, varying request sizes, large payloads, etc.).

Tweaking the Client SSL Profile for S3

Client SSL profiles on BIG-IP define how the system terminates and manages SSL/TLS sessions from clients at the virtual server.
They specify critical parameters such as certificates, private keys, cipher suites, and supported protocol versions, enabling secure decryption for advanced traffic handling like HTTP optimization, security policies, and iRules. The s3-default-clientssl profile has been modified (see below) from the default client SSL profile to optimize SSL/TLS settings for high-throughput object storage traffic, ensuring better performance and compatibility with S3-specific requirements.

Advanced S3-compatible health checking with EAV

Has anyone ever told you how cool BIG-IP Extended Application Verification (EAV), aka external monitors, are? Okay, I suppose "coolness" is subjective, but EAVs are objectively cool. Let me prove it to you. Health monitoring of backend S3-compatible servers typically involves making an HTTP GET request to either the exposed S3 ingest/egress API endpoint or a liveness probe. Get a 200 and all's good. Wouldn't it be cool if you could verify a backend server's health by verifying it can actually perform the operations as intended? Fortunately, we can do just that using an EAV monitor. Therefore, based on the transitive property, EAVs are cool. —mic drop

The bash script located at the bottom of the page performs health checks on S3-compatible storage by executing PUT, GET, and DELETE operations on a test object. The health check creates a temporary health check file with a timestamp, retrieves the file to verify read access, and removes the test file to clean up. If all three operations return the expected HTTP status code, the node is marked up; otherwise, the node is marked down.

Installing and using the EAV health check

Import the monitor script

Save the bash script (.sh extension), located at the bottom of this page, locally and import the file onto the BIG-IP. Log in to the BIG-IP Configuration Utility and navigate to System > File Management > External Monitor Program File List > Import. Use the file selector to navigate to and select the newly created
bash file, provide a name for the file and select 'Import'. Create a new external monitor Navigate to Local Traffic > Monitors > Create Provide a name for the monitor. Select 'External' for the type, and select the previously uploaded file for the 'External Program'. The 'Interval' and 'Timeout' settings can be modified or left at the default as desired. In addition to the backend host and port, the monitor must pass three (3) additional variables to the backend: bucket - The name of an existing bucket where the monitor can place a small text file. During the health check, the monitor will create a file, request the file and delete the file. access_key - S3-compatible access key with permissions to perform the above operations on the specified bucket. secret_key - corresponding S3-compatible secret key. Select 'Finished' to create the monitor. Associate the monitor with the pool Navigate to Local Traffic > Pools > Pool List and select the relevant backend S3 pool. Under 'Health Monitors' select the newly created monitor and move from 'Available' to the 'Active'. Select 'Update' to save the configuration. Additional Links How I did it - "High-Performance S3 Load Balancing of Dell ObjectScale with F5 BIG-IP" F5 BIG-IP v21.0 brings enhanced AI data delivery and ingestion for S3 workflows Overview of BIG-IP EAV external monitors EAV Bash Script #!/bin/bash ################################################################################ # S3 Health Check Monitor for F5 BIG-IP (External Monitor - EAV) ################################################################################ # # Description: # This script performs health checks on S3-compatible storage by # executing PUT, GET, and DELETE operations on a test object. It uses AWS # Signature Version 4 for authentication and is designed to run as a BIG-IP # External Application Verification (EAV) monitor. # # Usage: # This script is intended to be configured as an external monitor in BIG-IP. 
#   BIG-IP automatically provides the first two arguments:
#     $1 - Pool member IP address (may be IPv6-mapped format: ::ffff:x.x.x.x)
#     $2 - Pool member port number
#
#   Additional arguments must be configured in the monitor's "Variables" field:
#     bucket     - S3 bucket name
#     access_key - Access key for authentication
#     secret_key - Secret key for authentication
#
# BIG-IP Monitor Configuration:
#   Type: External
#   External Program: /path/to/this/script.sh
#   Variables:
#     bucket="your-bucket-name"
#     access_key="your-access-key"
#     secret_key="your-secret-key"
#
# Health Check Logic:
#   1. PUT    - Creates a temporary health check file with timestamp
#   2. GET    - Retrieves the file to verify read access
#   3. DELETE - Removes the test file to clean up
#   Success: All three operations return expected HTTP status codes
#   Failure: Any operation fails or times out
#
# Exit Behavior:
#   - Prints "UP" to stdout if all checks pass (BIG-IP marks pool member up)
#   - Silent exit if any check fails (BIG-IP marks pool member down)
#
# Requirements:
#   - openssl (for SHA256 hashing and HMAC signing)
#   - curl (for HTTP requests)
#   - xxd (for hex encoding)
#   - Standard bash utilities (date, cut, sed, awk)
#
# Notes:
#   - Handles IPv6-mapped IPv4 addresses from BIG-IP (::ffff:x.x.x.x)
#   - Uses AWS Signature Version 4 authentication
#   - Logs activity to syslog (local0.notice)
#   - Creates temporary files that are automatically cleaned up
#
# Author: [Gregory Coward/F5]
# Version: 1.0
# Last Modified: 12/2025
#
################################################################################

# ===== PARAMETER CONFIGURATION =====
# BIG-IP automatically provides these
HOST="$1"                       # Pool member IP (may include ::ffff: prefix for IPv4)
PORT="$2"                       # Pool member port
BUCKET="${bucket}"              # S3 bucket name
ACCESS_KEY="${access_key}"      # S3 access key
SECRET_KEY="${secret_key}"      # S3 secret key
OBJECT="${6:-healthcheck.txt}"  # Test object name (default: healthcheck.txt)

# Strip IPv6-mapped IPv4 prefix if present (::ffff:10.1.1.1 -> 10.1.1.1)
# BIG-IP may pass IPv4 addresses in IPv6-mapped format
if [[ "$HOST" =~ ^::ffff: ]]; then
    HOST="${HOST#::ffff:}"
fi

# ===== S3/AWS CONFIGURATION =====
ENDPOINT="http://$HOST:$PORT"   # S3 endpoint URL
SERVICE="s3"                    # AWS service identifier for signature
REGION=""                       # AWS region (leave empty for S3-compatible such as MinIO/Dell)

# ===== TEMPORARY FILE SETUP =====
# Create temporary file for health check upload
TMP_FILE=$(mktemp)
printf "Health check at %s\n" "$(date)" > "$TMP_FILE"

# Ensure temp file is deleted on script exit (success or failure)
trap "rm -f $TMP_FILE" EXIT

# ===== CRYPTOGRAPHIC HELPER FUNCTIONS =====

# Calculate SHA256 hash and return as hex string
# Input: stdin
# Output: hex-encoded SHA256 hash
hex_of_sha256() {
    openssl dgst -sha256 -hex | sed 's/^.* //'
}

# Sign data using HMAC-SHA256 and return hex signature
# Args: $1=hex-encoded key, $2=data to sign
# Output: hex-encoded signature
sign_hmac_sha256_hex() {
    local key_hex="$1"
    local data="$2"
    printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" | awk '{print $2}'
}

# Sign data using HMAC-SHA256 and return binary as hex
# Args: $1=hex-encoded key, $2=data to sign
# Output: hex-encoded binary signature (for key derivation chain)
sign_hmac_sha256_binary() {
    local key_hex="$1"
    local data="$2"
    printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" -binary | xxd -p -c 256
}

# ===== AWS SIGNATURE VERSION 4 IMPLEMENTATION =====

# Generate AWS Signature Version 4 for S3 requests
# Args:
#   $1 - HTTP method (PUT, GET, DELETE, etc.)
#   $2 - URI path (e.g., /bucket/object)
#   $3 - Payload hash (SHA256 of request body, or empty hash for GET/DELETE)
#   $4 - Content-Type header value (empty string if not applicable)
# Output: pipe-delimited string "Authorization|Timestamp|Host"
aws_sig_v4() {
    local method="$1"
    local uri="$2"
    local payload_hash="$3"
    local content_type="$4"

    # Generate timestamp in AWS format (YYYYMMDDTHHMMSSZ)
    local timestamp=$(date -u +"%Y%m%dT%H%M%SZ" 2>/dev/null || gdate -u +"%Y%m%dT%H%M%SZ")
    local datestamp=$(date -u +"%Y%m%d")

    # Build host header (include port if non-standard)
    local host_header="$HOST"
    if [ "$PORT" != "80" ] && [ "$PORT" != "443" ]; then
        host_header="$HOST:$PORT"
    fi

    # Build canonical headers and signed headers list
    local canonical_headers=""
    local signed_headers=""

    # Include Content-Type if provided (for PUT requests)
    if [ -n "$content_type" ]; then
        canonical_headers="content-type:${content_type}"$'\n'
        signed_headers="content-type;"
    fi

    # Add required headers (must be in alphabetical order)
    canonical_headers="${canonical_headers}host:${host_header}"$'\n'
    canonical_headers="${canonical_headers}x-amz-content-sha256:${payload_hash}"$'\n'
    canonical_headers="${canonical_headers}x-amz-date:${timestamp}"
    signed_headers="${signed_headers}host;x-amz-content-sha256;x-amz-date"

    # Build canonical request (AWS Signature V4 format)
    # Format: METHOD\nURI\nQUERY_STRING\nHEADERS\n\nSIGNED_HEADERS\nPAYLOAD_HASH
    local canonical_request="${method}"$'\n'
    canonical_request+="${uri}"$'\n\n'   # Empty query string (double newline)
    canonical_request+="${canonical_headers}"$'\n\n'
    canonical_request+="${signed_headers}"$'\n'
    canonical_request+="${payload_hash}"

    # Hash the canonical request
    local canonical_hash
    canonical_hash=$(printf "%s" "$canonical_request" | hex_of_sha256)

    # Build string to sign
    local algorithm="AWS4-HMAC-SHA256"
    local credential_scope="$datestamp/$REGION/$SERVICE/aws4_request"
    local string_to_sign="${algorithm}"$'\n'
    string_to_sign+="${timestamp}"$'\n'
    string_to_sign+="${credential_scope}"$'\n'
    string_to_sign+="${canonical_hash}"

    # Derive signing key using HMAC-SHA256 key derivation chain
    #   kSecret  = HMAC("AWS4" + secret_key, datestamp)
    #   kRegion  = HMAC(kSecret, region)
    #   kService = HMAC(kRegion, service)
    #   kSigning = HMAC(kService, "aws4_request")
    local k_secret
    k_secret=$(printf "AWS4%s" "$SECRET_KEY" | xxd -p -c 256)
    local k_date
    k_date=$(sign_hmac_sha256_binary "$k_secret" "$datestamp")
    local k_region
    k_region=$(sign_hmac_sha256_binary "$k_date" "$REGION")
    local k_service
    k_service=$(sign_hmac_sha256_binary "$k_region" "$SERVICE")
    local k_signing
    k_signing=$(sign_hmac_sha256_binary "$k_service" "aws4_request")

    # Calculate final signature
    local signature
    signature=$(sign_hmac_sha256_hex "$k_signing" "$string_to_sign")

    # Return authorization header, timestamp, and host header (pipe-delimited)
    printf "%s|%s|%s" \
        "${algorithm} Credential=${ACCESS_KEY}/${credential_scope}, SignedHeaders=${signed_headers}, Signature=${signature}" \
        "$timestamp" \
        "$host_header"
}

# ===== HTTP REQUEST FUNCTION =====

# Execute HTTP request using curl with AWS Signature V4 authentication
# Args:
#   $1 - HTTP method (PUT, GET, DELETE)
#   $2 - Full URL
#   $3 - Authorization header value
#   $4 - Timestamp (x-amz-date header)
#   $5 - Host header value
#   $6 - Payload hash (x-amz-content-sha256 header)
#   $7 - Content-Type (optional, empty for GET/DELETE)
#   $8 - Data file path (optional, for PUT with body)
# Output: HTTP status code (e.g., 200, 404, 500)
do_request() {
    local method="$1"
    local url="$2"
    local auth="$3"
    local timestamp="$4"
    local host_header="$5"
    local payload_hash="$6"
    local content_type="$7"
    local data_file="$8"

    # Build curl command with required headers
    local cmd="curl -s -o /dev/null --connect-timeout 5 --write-out %{http_code} \"$url\""
    cmd="$cmd -X $method"
    cmd="$cmd -H \"Host: $host_header\""
    cmd="$cmd -H \"x-amz-date: $timestamp\""
    cmd="$cmd -H \"x-amz-content-sha256: $payload_hash\""

    # Add optional headers
    [ -n "$content_type" ] && cmd="$cmd -H \"Content-Type: $content_type\""
    cmd="$cmd -H \"Authorization: $auth\""
    [ -n "$data_file" ] && cmd="$cmd --data-binary @\"$data_file\""

    # Execute request and return HTTP status code
    eval "$cmd"
}

# ===== MAIN HEALTH CHECK LOGIC =====

# ===== STEP 1: PUT (Upload Test Object) =====
# Calculate SHA256 hash of the temp file content
UPLOAD_HASH=$(openssl dgst -sha256 -binary "$TMP_FILE" | xxd -p -c 256)
CONTENT_TYPE="application/octet-stream"

# Generate AWS Signature V4 for PUT request
SIGN_OUTPUT=$(aws_sig_v4 "PUT" "/$BUCKET/$OBJECT" "$UPLOAD_HASH" "$CONTENT_TYPE")
AUTH_PUT=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT")
DATE_PUT=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT")
HOST_PUT=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT")

# Execute PUT request (expect 200 OK)
PUT_STATUS=$(do_request "PUT" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_PUT" "$DATE_PUT" "$HOST_PUT" "$UPLOAD_HASH" "$CONTENT_TYPE" "$TMP_FILE")

# ===== STEP 2: GET (Download Test Object) =====
# SHA256 hash of empty body (for GET requests with no payload)
EMPTY_HASH="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Generate AWS Signature V4 for GET request
SIGN_OUTPUT=$(aws_sig_v4 "GET" "/$BUCKET/$OBJECT" "$EMPTY_HASH" "")
AUTH_GET=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT")
DATE_GET=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT")
HOST_GET=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT")

# Execute GET request (expect 200 OK)
GET_STATUS=$(do_request "GET" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_GET" "$DATE_GET" "$HOST_GET" "$EMPTY_HASH" "" "")

# ===== STEP 3: DELETE (Remove Test Object) =====
# Generate AWS Signature V4 for DELETE request
SIGN_OUTPUT=$(aws_sig_v4 "DELETE" "/$BUCKET/$OBJECT" "$EMPTY_HASH" "")
AUTH_DEL=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT")
DATE_DEL=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT")
HOST_DEL=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT")

# Execute DELETE request (expect 204 No Content)
DEL_STATUS=$(do_request "DELETE" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_DEL" "$DATE_DEL" "$HOST_DEL" "$EMPTY_HASH" "" "")

# ===== LOG RESULTS =====
# Log all operation results for troubleshooting
#logger -p local0.notice "S3 Monitor: PUT=$PUT_STATUS GET=$GET_STATUS DEL=$DEL_STATUS"

# ===== EVALUATE HEALTH CHECK RESULT =====
# BIG-IP considers the pool member "UP" only if this script prints "UP" to stdout
# Check if all operations returned expected status codes:
#   PUT:    200 (OK)
#   GET:    200 (OK)
#   DELETE: 204 (No Content)
if [ "$PUT_STATUS" -eq 200 ] && [ "$GET_STATUS" -eq 200 ] && [ "$DEL_STATUS" -eq 204 ]; then
    #logger -p local0.notice "S3 Monitor: UP"
    echo "UP"
fi

# If any check fails, script exits silently (no "UP" output)
# BIG-IP will mark the pool member as DOWN
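Before deploying, you can sanity-check the script's crypto helpers on any box with openssl installed, no BIG-IP or S3 endpoint required. The snippet below is a standalone sketch: the two functions are copied verbatim from the script, and the expected digests are well-known SHA-256 and HMAC-SHA-256 test values (the empty-payload hash is the same constant the script uses for EMPTY_HASH).

```shell
# Standalone sanity check for the monitor's crypto helpers.
# Functions copied verbatim from the monitor script above.
hex_of_sha256() {
    openssl dgst -sha256 -hex | sed 's/^.* //'
}

sign_hmac_sha256_hex() {
    local key_hex="$1"
    local data="$2"
    printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" | awk '{print $2}'
}

# SHA-256 of an empty payload: must match the script's EMPTY_HASH constant
printf "" | hex_of_sha256
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# HMAC-SHA256 with key "key" (hex 6b6579) over the classic test sentence
sign_hmac_sha256_hex "6b6579" "The quick brown fox jumps over the lazy dog"
# f7bc83f430538424b13298e6aa6fb143ef4d59a14946175997479dbc2d1a3cd8
```

If either value differs, your openssl build is producing unexpected output formatting and the monitor's signatures will be rejected, so fix that before pointing the monitor at a pool.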