APIs First: Why AI Systems Are Still API Systems
AI and APIs

Over the past several years, the industry has seen an explosion of interest in large language models and AI-driven applications. Much of the discussion has focused on the models themselves: their size, their capabilities, and their apparent ability to reason, summarize, and generate content. In the process, it is easy to overlook a more fundamental reality: modern AI systems are still API systems.

Despite new abstractions and new terminology, the underlying mechanics of AI applications remain familiar. Requests are sent, responses are returned. Identities are authenticated, authorization decisions are made, data is retrieved, and actions are executed. These interactions happen over APIs, and the reliability, security, and scalability of AI systems are constrained by the same architectural principles that have always governed distributed systems.

What is new is not the presence of APIs, but the nature of the consumer calling them. In traditional systems, API consumers are deterministic. They are code written by engineers who read the documentation and invoke endpoints in predictable ways. In AI systems, the consumer is increasingly a model, a probabilistic component that infers behavior from schemas, chains calls dynamically, and produces traffic patterns that were not explicitly programmed. That single shift is what makes every downstream concern in this series, including MCP design, token budgets, authorization, and operations, behave differently than in traditional API platforms. Understanding this relationship is critical, not only for building AI systems, but for operating and securing them in production.

AI Applications as API Orchestration Platforms

At a high level, an AI application is best understood not as a single model invocation, but as an orchestration layer that coordinates multiple API interactions. A typical request may involve:

- A client calling an application API
- Authentication and authorization checks
- Retrieval of contextual data from internal or external services
- One or more calls to a model inference endpoint
- Follow-on tool or service calls triggered by the model's output
- Aggregation and formatting of the final response

From an architectural perspective, this is not fundamentally different from any other multi-service application. Routing, observability, traffic management, and trust boundaries remain as relevant here as in any traditional platform. What has changed is that the decision logic, meaning when to call which service and with what parameters, is increasingly driven by model output rather than static application code. That shift does not eliminate APIs. It increases their importance.

(Figure: AI application as an orchestration platform)

Models as API Endpoints, Not Black Boxes

In production environments, models are consumed almost exclusively through APIs. Whether hosted by a third party or deployed internally, a model is exposed as an endpoint that accepts structured input and returns structured output. Treating models as API endpoints clarifies several important points. A model does not "see" your system. It receives a request payload, processes it, and returns a response. Everything the model knows about your environment arrives through an API boundary.

What distinguishes model endpoints from conventional APIs is not their interface, but their operational profile:

- Responses are frequently streamed rather than returned as a single payload, which changes how load balancers, proxies, and timeouts behave.
- Payload sizes are highly variable, with both requests and responses ranging from a few hundred bytes to many megabytes depending on context and output length.
- Rate limits are often expressed in tokens per minute rather than requests per second, which complicates capacity planning and quota enforcement.
- Self-hosted models introduce additional concerns around GPU scheduling, cold start latency, and memory pressure that do not exist for traditional stateless services.

These characteristics do not change the fundamental nature of a model as an API endpoint. They do mean that the operational assumptions built into the existing API infrastructure may not hold without adjustment.

Tools, Retrieval, and Data Access Are Still APIs

As AI systems evolve beyond simple prompt-and-response interactions, they increasingly rely on tools: databases, search systems, ticketing platforms, code repositories, and internal business services. These tools are almost always accessed through APIs.

Retrieval-augmented generation, for example, is often described as a novel AI pattern. In practice, it is a sequence of API calls:

- An embedding service is called to encode a query
- A vector database is queried for relevant results
- A document store is accessed to retrieve source material
- The retrieved data is passed to the model as context

Each step carries the usual concerns: latency, authorization, data exposure, and error handling. The model may influence when these calls occur, but it does not change their fundamental nature.
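That sequence is short enough to sketch end to end. The following Python sketch is purely illustrative: the endpoint URLs, field names, and response shapes are placeholders rather than any particular vendor's API, but every step is an ordinary HTTP request with ordinary latency, authorization, and error-handling concerns.

    import requests

    def answer(query: str) -> str:
        # 1. Embedding service encodes the query
        emb = requests.post("https://embed.internal/v1/embeddings",
                            json={"input": query}, timeout=10).json()["embedding"]
        # 2. Vector database is queried for relevant results
        hits = requests.post("https://vectors.internal/v1/query",
                             json={"vector": emb, "top_k": 3}, timeout=10).json()["ids"]
        # 3. Document store supplies the source material
        docs = [requests.get(f"https://docs.internal/v1/documents/{doc_id}",
                             timeout=10).json()["text"] for doc_id in hits]
        # 4. Retrieved data rides along as context in the model inference call
        resp = requests.post("https://models.internal/v1/chat",
                             json={"prompt": query, "context": docs}, timeout=60)
        resp.raise_for_status()
        return resp.json()["output"]

Strip away the AI terminology and this is a four-hop service composition, which is exactly why the rest of the series keeps returning to API fundamentals.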
Why API Design Matters More in AI Systems

If AI systems are built on APIs, why do they feel harder to manage? The answer lies in amplification. Model-driven systems tend to:

- Chain API calls dynamically
- Surface data in ways developers did not explicitly anticipate
- Expand the blast radius of a misconfigured authorization
- Increase sensitivity to payload size and response shape

A poorly designed API that returns excessive data may be tolerable in a traditional application. In an AI system, that same response can overflow context limits, leak sensitive information into prompts, or cascade into additional unintended tool calls. This amplification rarely stays within a single domain. A schema decision that looks like an application concern becomes a traffic and routing concern when responses grow unpredictably, and an authorization concern when a model uses that response to drive the next call. Design choices that were once contained within one team's scope now propagate across the stack. In this sense, AI does not introduce entirely new architectural risks. It magnifies existing ones.

Introducing MCP as an API Coordination Layer

As models gain the ability to invoke tools directly, the need for consistent, structured access to APIs becomes more pressing. This is where Model Context Protocol (MCP) enters the picture. At a conceptual level, MCP does not replace APIs. It standardizes how AI systems discover, describe, and invoke API-backed tools. MCP servers typically sit in front of existing services, exposing them in a model-friendly way while relying on the same underlying API infrastructure.

Seen through this lens, MCP is not a departure from established architecture patterns. It is an adaptation, one that acknowledges models as active participants in API-driven systems rather than passive consumers of text. But it is also the introduction of a new coordination layer, a tool plane, with its own operational, network, and security properties that do not map cleanly onto the API layer beneath it.
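To make "an MCP server in front of an existing service" concrete, here is a minimal sketch using the Python MCP SDK's FastMCP helper. The ticketing endpoint and tool name are invented for illustration; the point is that the tool body is still a plain API call.

    from mcp.server.fastmcp import FastMCP
    import requests

    mcp = FastMCP("ticket-tools")

    @mcp.tool()
    def get_ticket(ticket_id: str) -> str:
        """Fetch a single ticket from the internal ticketing API."""
        # MCP contributes discovery and a typed schema; the work is an HTTP request.
        r = requests.get(f"https://tickets.internal/api/v2/tickets/{ticket_id}",
                         timeout=10)
        r.raise_for_status()
        return r.text

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default

Everything the model can do through this server is bounded by what the underlying API allows, which is the theme the security articles later in the series build on.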
The rest of this series examines what that means for the systems you build, run, and secure.

Looking Ahead

If AI systems are still API systems, then the familiar disciplines of API architecture, security, and operations remain essential. What changes is where decisions are made, how data flows, and how quickly small design flaws can propagate. The next article looks more closely at MCP itself, examining how it standardizes tool access on top of APIs and why treating it as a tool plane helps clarify both its power and its risks. From there, the series turns to tokens as a first-class design constraint that shapes tool schemas, response shaping, and traffic behavior. The fourth article addresses authorization and the security implications of letting models invoke tools directly, including identity, delegation, and the expanded blast radius MCP introduces. The series closes with a look at operating MCP-enabled systems in production, where reliability, cost, and safety have to be enforced rather than assumed.

Resources: Article Series

- MCP, APIs, and Tokens: Building and Securing the Tool Plane of AI Systems (Intro)
- MCP, APIs, and Tokens (Part 1 - APIs First: Why AI Systems Are Still API Systems)
- MCP, APIs, and Tokens (Part 2 - MCP as the Tool Plane: Standardizing Access Across APIs)
- MCP, APIs, and Tokens (Part 3 - Tokens as a Design Constraint for MCP and APIs)
- MCP, APIs, and Tokens (Part 4 - Securing the Tool Plane: MCP, APIs, and Authorization)
- MCP, APIs, and Tokens (Part 5 - Designing for the Inference Track: Safe, Scalable MCP Systems)

From Chat to Config: Building an AI-Native MCP Server for F5 Distributed Cloud
The Problem: F5 Distributed Cloud is Powerful but Verbose

Anyone who has worked with F5 Distributed Cloud (XC) knows the platform is incredibly capable. HTTP load balancers, WAF policies, API security, origin pools, namespaces, service policies—the feature set is deep. But with depth comes complexity. A single POST to create an HTTP load balancer with WAF, HTTPS auto-cert, and an origin pool involves carefully crafting nested JSON across three or four separate API calls, each with its own spec structure. For experienced engineers, this is manageable. But what if you could just say:

"Create an HTTPS load balancer for test-namespace, attach a WAF policy in blocking mode, origin server at 10.10.10.10 port 80 with an HTTP health check, auto-cert on port 443 with HTTP redirect"

…and have all of that happen automatically, correctly, with dry-run safety by default? That's exactly what I built. This article walks through the F5 XC MCP Server, an open-source Model Context Protocol server that translates natural language commands from Claude Code or GitHub Copilot directly into F5 XC API calls.

What is MCP?

Model Context Protocol (MCP) is an open standard introduced by Anthropic that lets AI assistants (like Claude) call external tools and services through a structured interface. Think of it as a plugin system for AI: instead of the AI just generating text, it can actually do things — query APIs, read files, run commands, interact with platforms. An MCP server exposes a set of tools, typed functions with names, descriptions, and input schemas. When you ask Claude Code something like "list all my namespaces in F5 XC," it finds the right tool (xc_list_namespaces), calls it with the right parameters, and shows you the result. No copy-pasting API tokens into curl commands. No hunting through docs for the right endpoint path.

The MCP client could be a popular AI coding tool or any custom-built MCP client, such as:

- Claude Code (via VS Code extension—the one I primarily used)
- GitHub Copilot (via VS Code extension)
- Any MCP-compatible client

MCP servers can run locally (via stdio) or remotely (via HTTP/HTTPS).

Architecture

The server is built in TypeScript using the @modelcontextprotocol/sdk, with axios for F5 XC API calls and zod for input validation. The structure is intentionally simple. Key design decisions:

- Dry-run by default. F5_XC_DRY_RUN=true is the default. Every mutating call returns a preview of what would be sent rather than actually calling the API. This makes it safe to explore and prototype without fear. Set F5_XC_DRY_RUN=false when you're ready to go live.
- Dual auth. Supports both API token (Authorization: APIToken …) and mTLS certificate auth (https.Agent with PEM cert + key). The certificate extracted from the F5 XC .p12 credential file works directly.
- Dual transport. stdio for local use with Claude Code/Copilot; streamable HTTP/HTTPS for team-shared remote deployment.
- Terraform as fallback. When the REST API doesn't support an operation (more on this below), tools automatically generate ready-to-apply Terraform HCL using the volterraedge/volterra provider.
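The dry-run gate is the design decision most worth copying. The server itself is TypeScript, but the pattern is small enough to sketch in a few lines of Python; the tenant URL and function name here are illustrative, not the project's actual code.

    import os
    import requests

    # Safe by default: only a literal "false" turns real mutations on.
    DRY_RUN = os.environ.get("F5_XC_DRY_RUN", "true").lower() != "false"

    def xc_mutate(method: str, path: str, body: dict, token: str) -> dict:
        if DRY_RUN:
            # Preview exactly what would be sent; send nothing.
            return {"dry_run": True, "method": method, "path": path, "body": body}
        r = requests.request(
            method, f"https://your-tenant.console.ves.volterra.io{path}",
            json=body, headers={"Authorization": f"APIToken {token}"}, timeout=30)
        r.raise_for_status()
        return r.json()

A misread instruction in dry-run mode costs nothing; the model (and the human) get to inspect the payload before anything touches the tenant.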
The Four Use Cases

The server covers four areas matching common F5 XC workflows:

- UC-1 Identity: namespace CRUD, API credentials. Example operations: create namespace, list users/groups, audit credentials.
- UC-2 Load Balancer: origin pool CRUD, HTTP LB CRUD. Example operations: create HTTPS LB with auto-cert, add origin pool.
- UC-3 Security: App Firewall (WAF) CRUD, Service Policy CRUD. Example operations: create WAF in blocking mode, attach to LB.
- UC-4 API Security: API definition CRUD, App API group CRUD, raw request. Example operations: import OpenAPI spec, create API group.

In total the server exposes 39 tools—enough to cover the majority of day-to-day F5 XC operations from a conversation.

A Live Demo Walkthrough

Here's a real session—every one of these was a natural language instruction to Claude Code, which called the appropriate MCP tool automatically.

Step 1: Create a namespace

"Create a new namespace called test-namespace"

Step 2: Create an HTTP load balancer, then upgrade it

"Create an HTTP load balancer named test-http-lb in test-namespace"

The tool calls xc_create_http_lb with a minimal spec. State returns as VIRTUAL_HOST_READY in seconds.

"Change protocol from HTTP to HTTPS port 443, use automatic certificate management from XC, add origin server 10.10.10.10 port 80 with default HTTP health check"

This triggers three tool calls automatically:

1. xc_raw_request creates an HTTP healthcheck object
2. xc_create_origin_pool creates an origin pool with 10.10.10.10:80 that references the healthcheck
3. Delete old LB + xc_create_http_lb recreates it as HTTPS with https_auto_cert, HTTP-to-HTTPS redirect, and the pool attached

Note: F5 XC does not allow changing the LB type (HTTP to HTTPS) via PUT. The MCP server detected this and handled the delete-recreate flow automatically.

Step 3: Create and attach a WAF policy

"Create a WAF policy named test-waf-policy"

xc_create_app_firewall: blocking mode, default OWASP detection, default bot settings. Done in one call.

test-waf-policy created:
- Name: test-waf-policy
- Namespace: test-namespace
- Mode: Blocking (blocks detected attacks)
- Detection: default OWASP ruleset
- Bot protection: default bot settings
- UID: 8d350b72-5c8c-4e7c-b827-248b03067ed3

The WAF policy is ready. Would you like to attach it to test-http-lb?

"Enable this WAF policy on test-http-lb"

xc_update_http_lb removes disable_waf and adds an app_firewall reference, then verifies with a GET to confirm disable_waf is gone and app_firewall.name is set.

WAF policy enabled on test-http-lb:
- WAF policy: test-waf-policy (blocking mode)
- disable_waf: removed
- LB state: VIRTUAL_HOST_PENDING_A_RECORD

Total time from zero to a WAF-protected HTTPS load balancer: under 2 minutes, all from natural language.

The API Limitation Discovery — and the Terraform Fallback

One of the most interesting findings during development: F5 XC's public REST API does not expose user/group write operations. Every path I tried returned either 404 or 501 Not Implemented. This is intentional by design—F5 XC routes user management through its Console UI. The Terraform volterraedge/volterra provider also didn't help for user and group management.

Rather than leaving the user with a dead end, I built a Terraform fallback: when a user group write fails, the tool's response automatically includes the equivalent, ready-to-apply Terraform HCL. The AI can then call xc_tf_apply directly to execute it—or the user can copy the HCL and apply it themselves. The Terraform runner operates in isolated temp directories, cleans up after itself, and respects the global dryRun flag (plan instead of apply when dry-run is active).
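The control flow behind that fallback is worth showing, because it generalizes to any platform whose API lags its feature set. This Python sketch is a simplification of the idea, not the project's TypeScript source; the endpoint path and HCL resource name are placeholders.

    import requests

    API_TOKEN = "redacted"  # illustrative
    BASE = "https://your-tenant.console.ves.volterra.io"

    def create_user_group(namespace: str, name: str) -> dict:
        """REST first; generated Terraform when the API says no."""
        r = requests.post(f"{BASE}/api/web/namespaces/{namespace}/user_groups",
                          json={"name": name},
                          headers={"Authorization": f"APIToken {API_TOKEN}"},
                          timeout=30)
        if r.status_code not in (404, 501):
            r.raise_for_status()
            return r.json()
        # API gap detected: hand back ready-to-apply HCL instead of an error.
        # Resource type and attributes are placeholders, not a provider schema.
        hcl = (f'resource "volterra_user_group" "{name}" {{\n'
               f'  name = "{name}"\n'
               f'}}')
        return {"fallback": "terraform", "hcl": hcl}

Graceful degradation keeps the conversation moving: the model gets something actionable to apply (or show the user) instead of a dead-end status code.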
This pattern—REST first, Terraform as fallback—turned out to be a very useful architectural choice. It gracefully handles the gap between what the API exposes and what the platform can actually do.

Deploying to Production: HTTPS with Automatic Certificates

For a shared team tool, local stdio mode isn't enough. The server needs to be always-on, accessible over HTTPS, and fronted by a real TLS certificate. The deployment stack on an Azure Ubuntu VM:

- Node.js 20 (via nvm) running the MCP server on port 3000 as a systemd service
- Caddy as a TLS-terminating reverse proxy—one config file, automatic Let's Encrypt
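The entire Caddy config fits in a handful of lines. The original snippet isn't reproduced here, so treat this as a minimal sketch matching the description: the hostname is a placeholder, and the MCP server is assumed to listen on localhost port 3000.

    mcp.example.com {
        reverse_proxy localhost:3000 {
            transport http {
                response_header_timeout 90s
            }
        }
    }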
Caddy handles the ACME HTTP-01 challenge automatically. The Let's Encrypt certificate was issued in under 10 seconds after DNS propagated. Auto-renewal is built in—no cron jobs, no certbot timers. One gotcha worth noting: the default Caddy proxy timeout (30s) is shorter than some F5 XC API calls (namespace creation can take ~45s). The response_header_timeout 90s setting above is necessary. With this setup, the MCP endpoint is https://your-domain/mcp — usable from any MCP client without VPN or local server setup.

Connecting Claude Code to the Remote Server

Add the remote endpoint to your Claude Code MCP configuration (~/.claude.json, or .claude/settings.json in your project) as a server entry pointing at https://your-domain/mcp. That's it. After a /mcp reload in Claude Code, all 39 tools are available. You can verify with:

"Show me the F5 XC server status"

Which calls xc_server_status and returns tenant, auth method, dry-run state, and Terraform auth status.

Lessons Learned

- The F5 XC REST API is comprehensive for data plane operations, limited for identity management. Load balancers, WAF policies, origin pools, API definitions — all fully CRUD-able via REST. User and group management is not. Plan accordingly if your use case involves IAM automation.
- Dry-run mode is not optional — it's essential. Without it, a misunderstood instruction could delete a production load balancer. Making dry-run the default (and requiring explicit override per-call or globally) is the right design for any AI-driven ops tool.
- Tool descriptions matter more than you think. The quality of an MCP tool's description directly affects how accurately the AI uses it. Spending time writing precise, example-rich descriptions — including what fields are required, what values are valid, and what the return looks like — significantly improves the AI's ability to compose multi-step operations correctly.
- Graceful degradation beats hard failures. The Terraform fallback pattern is a good example. Rather than returning a cryptic API error and stopping, surfacing the equivalent HCL and offering to apply it keeps the workflow moving. Users get an answer even when the API says no.
- LB type changes require delete+recreate. The F5 XC API rejects PUT requests that change the load balancer type (e.g., HTTP to HTTPS). The MCP server handles this automatically by detecting the error and orchestrating the delete-recreate sequence — a good example of where the AI layer can absorb platform-specific quirks.

What's Next

This is v1.0: functional, deployed, and covering the core use cases. Areas I'm exploring for future versions:

- API security scanning integration: trigger XC's web application scanning from the MCP server and return findings
- Multi-tenant support: switch tenants within a session without restarting the server
- Policy-as-code export: serialize existing LBs and WAF configs to Terraform HCL for IaC migration
- Audit/diff mode: compare current live config against a desired state and report drift

Try It Yourself

The server is open source on GitHub: https://github.com/gavinw2006/F5_XC_MCP_Server

Prerequisites: Node.js 18+, an F5 XC tenant with an API token, and Claude Code or any MCP-compatible client. The first thing to try once connected:

"Show me the F5 XC server status, then list all namespaces"

Happy to hear feedback, questions, and PRs from the DevCentral community. If you build something on top of this—a new tool module, a different transport, integration with another F5 product—I'd love to know about it.

Enhancing AI Data Pipelines with BIG-IP v21: Discover S3 Integration
F5 BIG-IP v21 revolutionizes AI data pipelines with advanced support for S3-compatible object storage, enabling enterprises to optimize, secure, and scale AI and analytics workflows seamlessly. By introducing S3-tuned traffic profiles, intelligent load balancing, and robust health monitoring, BIG-IP ensures predictable performance, resiliency, and protection against protocol-specific threats. This transformative delivery layer empowers businesses to handle complex workloads efficiently, making AI-driven innovation faster, smoother, and more reliable than ever.
Use SFTP and FTP to Join Critical IT Systems to Modern Object Storage with F5 BIG-IP and MinIO AIStor
Around the world, many critical IT systems require moving data repeatedly but pre-date the rise of object storage solutions. These newer solutions largely harness the S3-compliant API, while the IT applications at risk of being left behind frequently use well-established file management protocols, including FTP and SFTP. The cost and talent required to retrofit is daunting; attempts to integrate these apps into the modern, low-cost world of object storage may not be palatable. Until now, external gateway appliances might have been one strategy. However, this adds hardware costs, latency, and failure points, and separate authentication systems for SFTP and S3 create fragmented security. The solution described in this article joins traditional clients to MinIO's AIStor, which provides native FTP and SFTP control planes and not just S3 object access. Traffic robustness is accentuated by F5 BIG-IP, which allows loose coupling between IT client systems and the back-end MinIO storage nodes.

File Management Protocols – Not Going Anywhere

The File Transfer Protocol (FTP) was first codified in RFC 114 in April of 1971, and it's still very much in use today. As security awareness in the industry rose, the TLS-based companion protocol File Transfer Protocol Secure (FTPS) gained prominence. Both continue to be used today; one contentious issue is the use of multiple TCP ports during sessions, as well as the required discipline to maintain valid X.509 certificates for authentication in FTPS conversations. Meanwhile, the Secure Shell File Transfer Protocol (SFTP) concurrently arose, and benefits from being a simpler, single TCP port solution with authentication frequently relying on easier, pre-created key exchanges. One essential item to keep in mind from the start: SFTP transfers its data over Secure Shell (SSH) version 2, making it distinct from TLS-carried protocols such as HTTPS, SMTPS, DNS over TLS (DoT) and the aforementioned FTPS.

To support the vast investment in these traditional file moving protocols, MinIO has developed a server-side offering for them. When traditional BIG-IP load balancing is introduced, such as in this KB article and companion how-to video that discusses load balancing SFTP, we achieve the desirable decoupling of clients from individual AIStor nodes. By interacting with a BIG-IP virtual server, traffic can be load balanced, and the failure or taking off-line of one node will not stop the upload or download of files. If one MinIO node becomes a hot spot of activity, new load can proportionally task other, less-utilized nodes.

Lab Validation with BIG-IP and AIStor

The following diagram depicts the environment used for investigating this union of traditional file transfer protocols and modern object storage. Of the possible legacy file management protocols, why was SFTP double-clicked upon? A number of reasons, including the fact that SFTP is downright young compared to FTP, with an IETF specification dating back to only 1997. More importantly, although numbers may be hard to come by, all indications are that SFTP usage will remain steady and vital for years to come. The principal reasons SFTP is used in IT to this day include:

- Compliance requirements: SFTP is essential for meeting regulatory frameworks like GDPR and HIPAA, in conjunction with providing a reliable audit trail.
- SFTP is heavily used for automated, scheduled batch workflows; this includes importing/exporting data to partners in B2B data exchanges.
- The growth of big data has pushed the value added by external Extract, Transform, Load (ETL) vendors, with nightly data movements often being SFTP-based.
- The lack of firewall complexity, with a single well-known TCP port, such as port 22, often being the only "allow" rule required.

The ETL space in particular is significant, with some estimates placing the dollar value around this technology at over US $10 billion in 2026, with a doubling predicted by 2031.

Configure AIStor and BIG-IP for SFTP Traffic

An existing AIStor node cluster is easily adjusted to support protocols such as SFTP, FTP, and FTPS. Generally, AIStor nodes are automatically started with Linux's systemctl to run the MinIO offering at each startup. For quick lab testing, though, one may simply start AIStor interactively from the command line. In the case of adding SFTP support, we merely add the SFTP flags to the startup:

    #minio server /data/disk1/minio --console-address ":9001" --sftp="address=:8022" --sftp="ssh-private-key=./ca_user_key" --sftp="trusted-user-ca-key=./ca_user_key.pub"

The initial command portions are standard fare; in this simple lab case of single-drive nodes, we point to the disk at /data/disk1/minio and, per common practice, run the AIStor GUI on TCP port 9001. By default, S3 API calls will utilize port 9000. The SFTP additions tell AIStor to accept SFTP control plane commands, things like "get", "put", "ls" and "cd", on TCP port 8022. The only new ground for some may be the SSH key referenced; however, MinIO has documented an easy-to-follow guide on creating these towards the latter part of this linked page in the standard documentation.

My first thought would be the unpleasant possibility of an administrative workload here; frequently, SSH key-based authentication means loading each potential user's public key into an "authorized_keys" file on each server node. In reality, the delivered solution is more elegant and much simpler to maintain. Three keys will be created:

1. Public key file for the trusted certificate authority (you create this certificate authority with one single run of ssh-keygen).
2. Public key file for the AIStor server, minted and signed by the trusted certificate authority.
3. Public key file for the user, minted and signed by the trusted certificate authority for the client connecting by SFTP, and located in the user's .ssh folder (or equivalent for their operating system).

In my lab setup, which uses two AIStor nodes to allow for load balancing, I started by creating a user in the AIStor GUI. The user was simply named "miniouser123". As such, the signing of the miniouser123.pub key for step 3 would look like the following:

    ssh-keygen -s ~/.ssh/ca_user_key -I miniouser123 -n miniouser123 -V +90d -z 1 miniouser123.pub

The net result is a CA-signed public key, or in other words, an SSH certificate, that allows AIStor nodes to trust the miniouser123 public key when provided upon SFTP connection. The -V flag indicates the public key will be trusted for 90 days, and the -z option sets the serial number to 1. This signing of the user's public key has a series of security benefits, such as (i) the enforcement of an expiration timeframe, (ii) the ability to enact a KRL (Key Revocation List, analogous to the use of CRLs with X.509 certificates) and finally (iii) the fact that principals, including the username, can be embedded in the public key.
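For reference, the keys consumed by that signing step can be produced with two short commands; this is a sketch assuming ed25519 keys and the file names used above.

    # One time: create the trusted certificate authority keypair (step 1)
    ssh-keygen -t ed25519 -f ca_user_key -C "AIStor user CA"

    # Per user: create the keypair that the CA will sign (feeds step 3)
    ssh-keygen -t ed25519 -f miniouser123 -C "miniouser123"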
Once a lab, including integration with BIG-IP, is completed, it is likely better to move from invoking AIStor from the command line (e.g., #minio server /data/disk1 plus your flags) to an automatic startup with Linux systemctl options. In this case, the approach is to embed the flags specifically needed for file management protocols like SFTP or FTP into the /etc/default/minio file. Here is a sample for a two-node (10.150.91.190 and .191), single-drive lab setup:

    MINIO_VOLUMES="http://10.150.91.{190...191}:9000/data/disk1/minio"
    MINIO_LICENSE="/opt/minio/minio.license"
    ## Use if you want to run MinIO on a custom port.
    ## add --address and --console-address to MINIO_OPTS:
    # MINIO_OPTS="--address :9000 --console-address :9001 [OTHER_PARAMS]"
    MINIO_OPTS=' --sftp="address=:8022" --sftp="ssh-private-key=/sshkeys/ca_user_key" --sftp="trusted-user-ca-key=/sshkeys/ca_user_key.pub" '

Now, to ensure startup with every reboot and to also start right now, we simply issue two commands:

    #systemctl enable minio
    #systemctl start minio

BIG-IP SFTP Load Balancing Setup

Following the guidance of the F5 KB articles referenced earlier, the first step would be to create an SFTP health monitor. In production, the more advanced monitor, which aims to successfully connect to each AIStor node with SFTP commands every 15 seconds, might be best practice. In a lab setup, a monitor that establishes a half-open TCP connection on the desired TCP port 8022 is sufficient.

We now simply add our AIStor cluster members, in our case on port 8022 for SFTP. Concurrently, the BIG-IP can support other protocols, including FTP and, of course, S3 access too. From the BIG-IP GUI, simply select Local Traffic -> Pools -> Pool List and the "Create" button. The only settings are to tie the pool to your SFTP monitor and select the pool's AIStor members. Note the load balancing default method will be "Least Connections" to even out individual SFTP active loads on each AIStor node. We will see in the virtual server setup that good practice is normally to allow persistence based upon source IP addresses. As such, when new transactions arrive from a previously serviced client, the solution will prefer to engage the same storage node, if healthy.

The virtual server setup for SFTP is largely just like a web-oriented virtual server, although we would not gain the same insights from using a "standard" mode virtual server and prefer to use a "performance" mode instance. This is due to the fact that web technologies over TLS, like HTTPS browsing or S3-compatible API commands which harness HTTPS, allow for TLS interception at the proxy. This opens up use cases like iRules HTTP header rewrites or content scanning, to name just two. Since SFTP uses SSH, not TLS, for encryption, the produced traffic is not aligned with in-flight interception for decryption and re-encryption. The first key benefits of BIG-IP will be in hot spot avoidance, where a busy AIStor node can be shielded by spreading traffic to less busy nodes, and the ability to loosely couple clients to the service. This is to say, IT systems using SFTP (or FTP/FTPS) can be configured to use the virtual server IP or FQDN as an endpoint, and an AIStor node may be taken offline, such as during maintenance windows, completely unbeknownst to clients. Other significant benefits of BIG-IP lie with performance.
The key settings for the virtual server are the type, Performance (Layer 4), the virtual server IP address and TCP port, and the Protocol Profile, which has been set to "fastL4", one of F5's most performant profiles. The following KB article details the characteristics of the fastL4 profile, all generally steered towards peak data delivery rates. One of the principal features for BIG-IP hardware platforms that contain the ePVA chip: the systems make flow acceleration decisions in software and then offload eligible flows to the ePVA chip for acceleration. For platforms that do not contain ePVA chips, the systems perform acceleration actions in software.

Finally, we request client source IP address persistence. A given client's traffic will be directed to the same backend node if it has been active in the past. If the node is out of service, due to a fault or perhaps maintenance for upgrades, another node will be used. The first time a client is seen, the pool's load balancing algorithm will come into play; in this case, "Least Connections" will guide the initial node selected.

Lab Testing of SFTP Load Balancing to AIStor Storage Servers

Popular operating systems like Ubuntu or Windows 11 offer an sftp client directly from the command line. Alternatives include simple applications like WinSCP (Windows), Cyberduck (Mac/Windows) and FileZilla (cross-platform). Of course, in enterprise networks, the key driver for SFTP support will be existing IT systems that use SFTP through automation to move files, completely removed from human involvement.

Using Ubuntu, a test of the AIStor SFTP solution through BIG-IP, including interactive perusal of the objects, was conducted:

    #sftp -i ./miniouser123 -oPort=8022 miniouser123@10.150.92.189

Although in S3 parlance the AIStor system is made up of buckets and objects, buckets will appear as the traditional and very familiar "folder" to interactive SFTP users, and objects are seen as files to be retrieved or uploaded. Nothing really changes; familiar commands like ls, cd and get, as examples, are fully supported. Here is an example of a simple login and retrieve sequence. Notice how a password-based login is not required, since our CA-signed public key is provided by the user. Easy stuff for we humans.

    # sftp -i ./miniouser123 -oPort=8022 miniouser123@10.150.92.189
    Connected to 10.150.92.189.
    sftp> ls
    bucket001
    sftp> cd bucket001
    sftp> ls
    file001.txt file002.txt file003.txt file004.txt fileap15.txt
    sftp> get file001.txt
    Fetching /bucket001/file001.txt to file001.txt
    /bucket001/file001.txt 100% 299KB 5.5MB/s 00:00
    sftp>

The following demonstrates that, upon first connecting to the cluster with SFTP, the client instantiates a backend TCP connection to one of the AIStor pool members; the second "current" connection reflects that another client is also active. The small amount of traffic reflects low bit rate, background keep alive-type exchanges. Upon retrieving the approximately 300 kilobyte file, an e-book, the counters are updated as expected. The outbound traffic, from the perspective of the AIStor node, is noted to be 2.4 million bits, or, dividing by eight, 300 kilobytes. We never said there would be no math.
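For the automated, headless systems mentioned above, the same flow is a few lines in any SSH library. Here is a minimal Python sketch with paramiko, reusing the lab's virtual server address, port, and user; the key class and file paths are assumptions (adjust the key class to your key type, and note that CA-signed certificate auth may require paramiko's certificate support or an ssh-agent).

    import paramiko

    # Connect to the BIG-IP virtual server, never an individual AIStor node.
    key = paramiko.Ed25519Key.from_private_key_file("/home/etl/.ssh/miniouser123")
    transport = paramiko.Transport(("10.150.92.189", 8022))
    transport.connect(username="miniouser123", pkey=key)
    sftp = paramiko.SFTPClient.from_transport(transport)

    sftp.get("/bucket001/file001.txt", "/tmp/file001.txt")  # download an object
    sftp.put("/tmp/report.csv", "/bucket001/report.csv")    # upload one back
    sftp.close()
    transport.close()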
To simulate forcing the BIG-IP to seamlessly switch usage from the currently active back-end node to the AIStor .191 node, we can use the "Force Offline" feature. In highly consumptive TCP-based protocols, such as web browser traffic, where a single page display might drive 8 to 12 short-lived TCP connections to a given origin server, the force offline feature will allow established connections to finish but will preclude new connections being set up to the node. In the case of SFTP, which for interactive, human-driven sessions may see one connection stay up for hours or days until closed, even the offline node will maintain full service for those existing sessions. To expedite our lab test, we can simply close our active SFTP client sessions and then reengage with the BIG-IP SFTP virtual server. We note that the BIG-IP has switched our SFTP client to the other AIStor node. Downloading the 300 kilobyte e-book file, we see the counters agree with the first test run, just that the load balancer has ensured we are serviced by the in-service AIStor node.

Summary

IT infrastructure and the protocols these solutions use do not arise overnight; many critical systems continue to use file management protocols like FTP, SFTP and FTPS that have permeated networking for decades. The ability to retroactively adjust applications to use object-first protocols, like S3-compliant API calls, is not always going to be trivial. Outside factors, such as data movement governance, may also lead enterprises to stay with perceived tried-and-true protocols. With MinIO's introduction of AIStor support for the classic file moving protocols, there is a path now to tie into very large object stores where the economies of scale of larger, multi-protocol storage clusters and highly advanced data robustness features like erasure coding can merge. More data in a more resilient offering makes sense; this helps play a role in solidifying and modernizing your information lifecycle management story.

Through BIG-IP, traffic like SFTP was seen to make use of highly performant data delivery, including FastL4 mode. The decoupling of SFTP clients from individual storage nodes to, instead, point at a BIG-IP virtual server allows for vigorous health checking of nodes; traffic will be delivered in either direction even when any one node is off-line for something as mundane as a routine software upgrade. Through load balancing algorithms like "Least Connections", the overall load on the MinIO cluster will be optimized to transparently avoid troublesome hot spots.

Implementing Risk-Based Actions with AI-Powered WAF: Customer Policy Paths
Why Custom policy is where risk-based actions matter most

The default policy is straightforward: it applies a broad mix of signatures, threat campaigns, and violations; "Enhance with AI" is an optional add-on. Custom policies are where customers can accidentally recreate the same problems Risk Scoring is designed to solve—usually by combining:

- Overly broad/noisy signature selection (especially low-accuracy signatures)
- Aggressive enforcement (blocking Medium too early)
- Disabling/excluding key signatures and unintentionally reducing ML invocation

So the rest of this blog is a tight, configuration-oriented walkthrough of the Custom path.

Custom policy: configuration walkthrough (decision points → operational outcomes)

Baseline: navigate to the Custom controls.

1. LB Config → Web Application Firewall
2. Create/edit the WAF object (Metadata `Name`, etc.)
3. Set Security Policy = Custom
4. Choose Signature Selection by Accuracy
5. Optionally enable Enhance with AI (Risk Scoring)
6. If enabled, optionally configure Action by Risk Score (risk-based enforcement)

Step 1: Signature Selection by Accuracy (choose your baseline level)

Accuracy indicates susceptibility to false positives:

- Low: high likelihood of false positives
- Medium: some likelihood of false positives
- High: low likelihood of false positives

Note: this setting is foundational. It determines which signatures are active, and therefore the quality and volume of detection signals that feed into downstream risk evaluation. Operationally, high accuracy tends to support faster, safer enforcement, while medium/low accuracy can expand coverage but increases the chance you'll need exceptions, investigations, or staged rollout discipline.

Step 2: Enhance with AI (turn on Risk Scoring)

Enhance with AI = On enables AI-powered risk scoring and assigns each request a High/Medium/Low risk score using layered signals. One implementation detail worth making explicit, because it affects expectations: ML invocation depends on enabled signatures firing in the specified injection/execution categories. If teams disable/exclude those signatures, they may reduce when the model runs, changing the practical behavior of risk evaluation.

Step 3: Action by Risk Score (map risk levels to enforcement)

When Action by Risk Score is enabled:

- By default, high-risk requests are blocked
- Users can choose whether Medium-risk requests are blocked (via dropdown)

This is the primary knob that determines how quickly a user decides to move from "safe enforcement" to "broad enforcement."
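Conceptually, Action by Risk Score reduces to a small decision table. The Python sketch below is an illustration of the behavior, not product code; the single flag mirrors the Medium-blocking dropdown.

    BLOCK_MEDIUM = False  # Day 0 posture: block High only; flip later

    def action_for(risk_score: str) -> str:
        """Map the AI-assigned risk score to an enforcement action."""
        if risk_score == "High":
            return "block"  # blocked by default
        if risk_score == "Medium":
            return "block" if BLOCK_MEDIUM else "allow-and-log"
        return "allow"      # Low-risk requests pass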
Recommended rollout path: Day 0 → Day 7 → Steady state

This is the most common and safest operational progression for customers.

Day 0 (safe enforcement baseline)

- Custom → Signature Selection by Accuracy = High (or High + Medium if you need broader coverage immediately)
- Enhance with AI = On
- Action by Risk Score = High

Outcome: gets to blocking quickly while minimizing availability risk. High is blocked. This is the "prove safety while stopping obvious bad" posture.

Day 7 (controlled expansion)

- Keep Custom + Enhance with AI + Action by Risk Score
- Optionally widen Signature Selection from High → High + Medium if coverage is insufficient
- Action by Risk Score = High + Medium

Outcome: expands detection inputs without immediately expanding enforcement. Teams focus on what's landing in Medium and whether exclusions/disabled signatures are reducing ML invocation in key categories.

Steady state (mature enforcement)

- Custom → widen Signature Selection from High + Medium → High + Medium + Low (the broadest set)
- Enhance with AI = On
- Action by Risk Score = High + Medium

Outcome: risk outcomes become the enforcement interface. Broad, consistent blocking across apps/APIs with reduced per-app tuning and fewer signature-level decisions.

Common pitfalls to avoid:

- Avoid Block Medium on Day 0 when including low-accuracy signatures—this is the fastest way to recreate false-positive outages.
- If you disable/exclude signatures in the key injection/execution categories, you can reduce ML invocation and change risk evaluation behavior.

Summary

Custom policies traditionally scale poorly because every app ends up with bespoke signature decisions and exception handling. Risk Scoring is designed to invert that: keep signatures as key signals but standardize enforcement via risk outcomes. If you implement Custom with the Day 0 → Day 7 → Steady state progression above, you get a predictable path from "block safely now" to "enforce broadly later" without returning to signature-by-signature tuning as your primary operating model.

You Don't Have to Have Played to Understand the Game
Andy Reid barely played football. He was a community college tackle who transferred to BYU and then rode the bench for most of his time in Provo. Teammates remember him as the guy in the film room, not the guy on the field. He spent his Saturdays watching, taking notes, and pestering head coach LaVell Edwards with so many questions about strategy that Edwards eventually told him: kid, you should coach.

That's the origin story of three Super Bowl wins for my local Kansas City Chiefs, six appearances across the Chiefs and Eagles, and one of the winningest coaches in NFL history. Not a stud player who worked his way down to the sideline, but a guy who asked a lot of questions, kept asking them, and turned that into a career.

Richard Williams had never picked up a tennis racket in his life when he saw Virginia Ruzici win a tournament on TV in 1978 and decided his daughters were going to be world champions. He taught himself the sport from books and instructional videos and then wrote and implemented a 78-page plan for coaching Venus and Serena on the public courts in Compton when they were very young. Thirty Grand Slam singles titles between them later, and the "you have to have played at the highest level to coach at the highest level" theory was looking pretty thin.

Nobody looks at Reid's three Super Bowl rings and says "yeah, but did you really understand it without playing in the league?" Nobody tells Richard Williams his daughters' Grand Slam titles have an asterisk because he learned the game from a VHS tape. We accept, in sports, that there's more than one way to know a thing. Somehow that grace evaporates the second AI enters the conversation.

Doing isn't understanding

There's a flavor of pushback on AI use that goes something like: "you have to do it manually first to really understand it." Sometimes that's gatekeeping in a wise-elder's costume. Sometimes it's a genuine concern. An experienced person who built their intuition the hard way, watching newer folks skip the grind, and worrying (not unreasonably) that the intuition won't form.

But "doing it manually" and "understanding it" aren't the same thing. They overlap, but they're not the same thing. You can grind through a problem manually for years and still not understand the system around it. And you can understand a system deeply without having implemented every piece of it yourself, if you're willing to ask enough questions.

The questioning is the work

Here's the part I think people miss when they're worried about AI making us dumber: a lot of what an expert does for you, when you're lucky enough to have one, is answer questions patiently. Over and over. Sometimes the same question is phrased three different ways because you didn't quite get it the first time. Sometimes a dumb question that you'd be embarrassed to ask on a Slack channel. Good mentors don't get tired of this. But there are very few good mentors. They're busy, and you only get so many of them in a career. I've been at this for thirty years now and I can count the great mentors I've had on one hand.

LLMs don't get tired. They don't sigh. They don't make you feel stupid for asking why something works the way it does for the fourth time. And the act of formulating the question, asking "what exactly am I confused about?" and "what do I need to know to clear the fog?" That's a huge chunk of where understanding actually comes from. The model is a sparring partner for your own thinking, if you let it be one.
Use it as a vending machine and you'll get exactly that: answers, not understanding.

The tragic version of LLM use is the one where someone pastes the problem, takes the answer, ships it, and walks away no smarter than they started. Then does it again the next day. And the next. Building a career out of outputs they couldn't reproduce or defend if you took the tool away. That's the version the skeptics are right about. It just isn't the only version.

Andy Reid didn't need to have been a pro-bowl tackle to understand offensive football. He needed to watch carefully, ask the right questions, and think rigorously about what he was seeing. Richard Williams didn't need to have been on tour. He needed books, tapes, and the willingness to do the homework. Playing at the highest level is one path to understanding. But for systemic thinking, tactical thinking, architectural thinking, it might not always be the best one.

Two things I learned this week

First: I'm working on a side project where the FastAPI Cloud backend runs as a two-instance replica deployment. I started on SQLite, which worked fine until I realized writes were landing in whichever instance happened to handle the request, leaving me with two file-based databases with immediate data drift. I moved to a serverless Postgres database (Neon) to give both instances a single source of truth, and once I was there, realized I could just point dev and prod at the same data. Yes, in a real production system this is an anti-pattern and I'd never recommend it. But for a small project where I'm iterating fast and the bottleneck is my own understanding of the problem, not having to migrate data back and forth every time I want to test a frontend change or hunt down a bug? Game changer.

I got there by talking the tradeoffs through with my good friend Claude. What breaks, when it breaks, what the actual risk surface looks like at my scale. Nobody handed me a "here's when to break the rule" tutorial. I asked questions until I understood the rule well enough to break it on purpose.

Second: I'm building an on-box tool for BIG-IP (article coming soon), and I hit the HA problem. How do I keep state synced across boxes? My first instinct was file-based storage on the host, which, it turns out, is exactly where AS3 and SSL Orchestrator started. SSLO went a step further and built a dedicated sync layer called gossip to keep those files coordinated across the cluster. Over time, both products converged on a different approach: data-groups for metadata and iFiles for larger payloads, both of which ride along with standard config sync. That's a much smaller surface area to maintain, and it leans on infrastructure the platform already guarantees. So I'm following the same path: metadata in data-groups, data blobs in iFiles.

I figured this out by interrogating Claude about how those products were architected, why they made the choices they did, and what the failure modes were. I could have read the source, and I could have tried to track down the developers and architects (and I should have over dinner to get the inside scoop). But the speed of "ask, get an answer, ask the next question, get an answer" let me sketch the whole design space in an afternoon. That's not skipping the understanding. That's building it.

Get off whose lawn?

I get the resistance. Some of it is "get off my lawn." Some of it is genuine expertise feeling devalued. Some of it is real fear about what this technology means for the people who come up behind us.
None of those concerns are stupid. The people who built their understanding the hard way, by tinkering, by breaking things, by reading source code under duress at 2am because there was no other way to get the system back online? They are not wrong about the value of that path. They earned something real in all that trial by fire. Some of them are the best engineers I know. The intuition that comes from years of manual struggle is a kind of literacy that doesn't have a shortcut, and the people who have it are the ones I most want in the room when something goes sideways.

But I'd push back on the specific claim that you must do every step manually to understand the thing. You don't. Engage with it seriously. Ask real questions and chase the answers until they hold together. Be willing to be wrong. Notice when you're wrong, and update accordingly. Used well, an LLM doesn't dull that loop. It tightens it. The design decision, the tradeoff, the bet: I'm getting to that part of the problem sooner than I would have otherwise.

Reid had Edwards. Williams had the library. The skeptics aren't wrong that some understanding only comes from doing. They're wrong that this is one of them.

F5 Insight for ADSP – Initial Setup in VMware
Demo Video

Initial VMware Configuration

1. Download the ova file from myf5.com.
2. In VMware, choose the Create/Register VM option and choose Deploy a virtual machine from an OVF or OVA file.
3. Continue through the install wizard, which will upload the ova file to your VMware server. Uncheck the option to Power on automatically so you can edit the VM properties prior to boot.
4. Edit the Virtual Hardware options and set the hardware settings. Note: a 600 GB disk formatted to Thick Provision Lazy Zeroed is recommended for performance.
5. Switch to the VM Options tab and expand Advanced. Scroll down and click Edit Configuration.
6. Click Add parameter and add the following: guestinfo.userdata.encoding = base64
7. Create a local cloud-config.yml file to set the administrative username and password. Be sure to change the admin password and make a note of it. Then base64 encode the file (for example, base64 -w 0 cloud-config.yml).
8. Return to the VMware Configuration Parameters screen and add another parameter named "guestinfo.userdata", then paste the base64 encoded text in the Value. Click OK when done.

After saving the VM settings, you are ready to power on your VM for the first time!

Note: Refer to the F5 Insight on VMware Deployment Guide for further details on this procedure.

Post Boot VM Settings

Open the VM Console and log in to F5 Insight with the credentials specified in the cloud-config.yml file. Configure the F5 Insight network settings using the console's network configuration commands. After hitting Enter, the current settings are displayed; if no changes are needed, enter "y" to confirm, and the resulting output confirms the configuration.

Note: Refer to the F5 Insight User Guide for further details on this procedure.

Accessing the User Interface

The initial configuration is complete and you can now log into the UI.

1. You will see the Welcome screen. Click Next.
2. Paste the text of the JWT Token and click Validate. If the license is activated, click Next.
3. Enable the LLM Provider. Select your LLM Provider, Anthropic in this example. Enter your API Token/Key and the Enterprise API URL. Note that I am skipping TLS verification. Click Test Connection, then click Next if the test is successful.
4. On the next screen, select your preferred Setup Method. I'm using Start Fresh.
5. Click Add Device. Enter the Endpoint, Username and Password. You can optionally configure a Certificate Authority and Data Center. Select the Modules that are active and that you want to monitor. Click Add Device, then click Next.

The configuration is complete. You can view the Home Page or the Device Settings.

Conclusion

F5 Insight for ADSP offers customizable visualizations and dashboards to help teams surface actionable metrics and KPIs tailored to your organization. It provides access to useful telemetry data for a deeper understanding of your environment, application behaviors, and complex BIG-IP deployments, all centralized in a single location.

- Identification of root causes during outages/tickets.
- Solves issues and struggles with Day 2 analysis of your BIG-IP fleet and the applications therein.
- Mitigates the problem of a lack of detailed visual information on your BIG-IP fleet.
- Sets a foundation for the utilization of open-source tools and their benefits.

Related Content

- Introducing F5 Insight for ADSP
- F5 Insight for ADSP - A Closer Look
- F5 Insight for ADSP Documentation
- F5 Insight Product Page
- F5 Insight Release Blog
F5 BIG-IP OneConnect and MinIO S3 – Finetune TCP
The BIG-IP OneConnect feature may in some cases increase network throughput by efficiently managing connections created between the BIG-IP system and back-end pool members. The OneConnect feature works with HTTP Keep-Alives to allow the BIG-IP system to minimize the number of server-side TCP connections, such as those carrying S3 to MinIO AIStor servers, by making existing connections available for reuse by other clients.

Why bother, you might say; surely setting up rapid TCP connections is something BIG-IP and modern Linux-based server storage pools have well in hand? The reality is there can be tangible benefits from orchestrating the readiness of bulk TCP connections in the backend; advantages can be had by using pooled and immediately "on" TCP connections, potentially squeezing more performance from your S3 cluster. The goal of the following lab tests is to allow a reader to determine if further investigation is needed, to see if the "juice might be worth the squeeze".

Getting TCP Primed and Ready for S3 Flows

A chief reason why a BIG-IP results in a scalable, performant S3 cluster is its role as a force multiplier. As opposed to S3 clients addressing individual nodes per request, we aim to abstract the full pool of nodes with a BIG-IP virtual server, a sort of "super" server which can incorporate all back-end resources. This means transactions can automatically be fanned out logically and land upon optimal origin servers in the pool. For S3 clients, a single request target, the virtual server, keeps complexity "out of sight, out of mind". A broad set of algorithms is built into BIG-IP as the front end for an AIStor cluster, to distribute load and keep performance "hot spots" from inadvertently degrading service.

The BIG-IP, though, is more than a load balancer. It's a purpose-designed communications appliance, in both physical and software forms. One aspect of the appliance is the ability to sculpt protocol behavior to enhance the application experience. Drilling down, why would the back-end, from BIG-IP to a server cluster, actually benefit from controlled characteristics of something as ubiquitous as TCP, a foundational protocol that traces back to the 1970s?

For one thing, the standard TCP control plane setup for a connection, SYN-SYN/ACK-ACK, is not completely without latency. In the context of wide-area networks it is more significant than between collocated BIG-IPs and MinIO servers; however, it is not completely negligible. A method to consistently and frequently bypass this three-message setup phase is of interest.

A second consideration is the memory consumption for each TCP connection; estimates vary, but a value of 3 to 4 kilobytes is a common assumption on Linux platforms. The reality of large ebbs and flows in TCP connections, as traffic rises and wanes, means many connections have passed through the useful data transport "established" phase and, per the protocol specification, are in the TIME_WAIT state, frequently referred to as 2MSL (two times the maximum segment lifetime). With 2MSL, connections cannot be removed entirely from memory, as the legacy of TCP was to be wary of segments arriving late from peers, even long after the receiver had sent a TCP FIN message and received a TCP FIN ACK. Legacy networks exhibited more acute latency and packet-by-packet routing differences, meaning a safeguard TIME_WAIT was required. To this day, modern Linux systems often hold older, unused TCP sessions in this state for 60 seconds, and some Windows platforms reach 3 to 4 minutes.
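This effect is easy to observe directly on a Linux AIStor node; under churning load, the TIME_WAIT count can dwarf the established count. For example:

    # Count back-end connections parked in TIME_WAIT vs. actively established
    # (each command's output includes one header line)
    ss -tan state time-wait | wc -l
    ss -tan state established | wc -l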
The net result: the true resource consumption on servers, due to the number of TCP connections in all of their various states, is often much more significant than the momentary live "data-on-the-wire" picture would suggest.

A third of the many reasons why constraining the number of new TCP sessions requiring setup is valuable is a byproduct of the number of round-trip times (RTT) required to maximize the TCP congestion window. In other words, the round trips needed to fully capitalize on the data carrying capacity of a newly set up connection. The amount of data in flight, unacknowledged, is traditionally only grown when positive feedback from across the network appears: the arrival of TCP ACK messages. To avoid overwhelming network capacity, algorithms like TCP Slow Start with Congestion Avoidance only expand the carrying capacity of the connection after a surprisingly high number of round trips. Best case, newer congestion control algorithms like CUBIC and BBR can take a few round trips; older stacks may take a dozen or more RTTs to move "peak data".

All of these reasons contribute to the notion that having an established pool of ready and waiting TCP connections, primed to resume moving data from the front side of BIG-IP to the backend AIStor server pool members, could be beneficial.

How OneConnect Actually Works

Through use of HTTP Keep-Alive directives, BIG-IP can instruct origin servers behind it, such as AIStor nodes, to keep TCP connections established even after transactional data flow has completed. This is a good fit for S3 API compatible data flows, which harness the HTTP protocol, normally within an HTTPS encrypted flow, and as such are excellent candidates for the use of Keep-Alives. OneConnect adds fine-grained controls for Netops to govern things like the maximum number of such connections to maintain, the amount of re-use in terms of total data before tearing down, and the ability to cap the amount of time idle connections are held.

One aspect of the HTTP-centric behavior we are trying to leverage is that the BIG-IP must be working in a mode where both full-header inspection and manipulation are supported. This means a "standard" virtual server profile, with its ability to perform full TLS interception, is a good fit. An http family profile is also recommended. A pure layer-four proxy, such as the FastL4 server profile where TLS is not decrypted on the box, while very fast, is not a fit.

A Lab Validation of OneConnect and MinIO AIStor

To understand the out-of-the-box expected result of enabling a OneConnect profile, the following 4-node MinIO setup was used in a lab. To drive S3 load, and to have fine-grained control of that traffic, the Linux command-line "warp" utility was installed on an Ubuntu client machine. Warp is an open-source S3 traffic generator that can be downloaded here. Warp offers full "mixed" benchmarking options, mixed in the sense that it can easily combine uploading, downloading, and deleting objects from an appropriately named "warp-benchmark-bucket" bucket on AIStor. The nice aspect is that an array of object sizes will automatically be included in each test run and cleaned up for you at the end. In the testing done, a fixed-size object, an e-book of 870 kilobytes, was used as the test object, and warp was instructed to download it repeatedly.

The --duration and --concurrent flags allow for control of how long tests run and the number of concurrent TCP connections to the BIG-IP virtual server that will be harnessed for S3 transactions. A simple S3 test syntax would be as follows, which runs a 1-minute test with transactions spread across 400 TCP connections:

    sudo ./warp get --host 10.1.40.163:9000 --list-existing --access-key 0t7hrh3nxhJPzEgBSEY8 --secret-key dfksjBkS3Bf5qs0R6rqkzijPrvp9clrn7IXq9UYX --duration 1m --concurrent 400

For TLS support, including the ability to accept certificates without authenticating them, simply add the flags --tls and --insecure to the above example.
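Warp drives the load in this lab, but any S3 SDK pointed at the virtual server exercises the same path. A minimal boto3 sketch, reusing the virtual server address, credentials, and bucket from the warp command above:

    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url="http://10.1.40.163:9000",  # BIG-IP virtual server, not a node
        aws_access_key_id="0t7hrh3nxhJPzEgBSEY8",
        aws_secret_access_key="dfksjBkS3Bf5qs0R6rqkzijPrvp9clrn7IXq9UYX",
        config=Config(max_pool_connections=50),  # client-side connection reuse
    )

    # Repeatedly GET one existing benchmark object, as warp does
    key = s3.list_objects_v2(Bucket="warp-benchmark-bucket",
                             MaxKeys=1)["Contents"][0]["Key"]
    for _ in range(100):
        s3.get_object(Bucket="warp-benchmark-bucket", Key=key)["Body"].read()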
The --duration and --concurrent flags control how long tests run and the number of concurrent TCP connections to the BIG-IP virtual server that will be harnessed for S3 transactions. A simple S3 test, which runs for 1 minute with transactions spread across 400 TCP connections, would be:

sudo ./warp get --host 10.1.40.163:9000 --list-existing --access-key 0t7hrh3nxhJPzEgBSEY8 --secret-key dfksjBkS3Bf5qs0R6rqkzijPrvp9clrn7IXq9UYX --duration 1m --concurrent 400

For TLS support, including the ability to accept certificates without authenticating them, simply add the --tls and --insecure flags to the above example.

Behavior Without the OneConnect Profile

The lab S3 validation testing with MinIO AIStor started with a BIG-IP virtual server of type "Standard", ensuring the proxy had control and visibility of layer 7 features; in lock step, the client-side HTTP profile was set to the standard "http" profile. This baseline lets us see the impact upon MinIO S3 traffic of subsequent adjustments, such as enabling OneConnect.

While warp conducts a one-minute test, one practical screen to watch is the pool member connections and requests screen. To recap, in this warp test all S3 transactions were GETs repeatedly pulling an 800 kB electronic document from the cluster. As seen in the diagram, at this early stage of the one-minute run, 400 backend TCP connections have been distributed with relatively equal weighting across the four backend AIStor nodes. To this point, as seen in the screenshot, 1.9K S3 transactions have been completed.

At the end of the test run, warp provides a succinct summary of the results. In this one-minute run, 9,485 object downloads completed with no errors (double-click to enlarge image). This corresponds to the pool statistics provided by BIG-IP. The key observation is that all backend TCP connections have now been torn down; none remain for this pool. Note also that the 9.5K requests displayed line up with the warp summary.

Configure and Understand Settings of the OneConnect Profile

To create a OneConnect profile for a lab evaluation, follow the path displayed below and click the (+) button to create a new instance. As with many profiles shipped with BIG-IP, it is good practice not to modify the default; instead, create a new instance based upon the default and customize that. The following shows a new "OneConnect_Eval" profile, based upon the default, with some of the key parameters called out.

Once a virtual server tied to this profile, in our case for S3 traffic, has satisfied the transactions received, the BIG-IP will keep any resultant idle TCP connections to the backend AIStor nodes established. When new S3 requests subsequently arrive, OneConnect will use these idle connections, when available, to promptly communicate with the backend AIStor servers. The highlighted fields focus on the maximums, for instance the highest number of idle connections that will be maintained across server pools. A small initial burst of S3 traffic adds only the connections actually required to the OneConnect connection pool; connections are not proactively established in advance of an actual need.
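For readers who prefer the command line, a minimal tmsh sketch of the same evaluation profile; the virtual server and values shown are illustrative (the numbers simply restate the shipped defaults):

# Create a OneConnect profile derived from the default
tmsh create ltm profile one-connect OneConnect_Eval defaults-from oneconnect source-mask 0.0.0.0 max-size 10000 max-age 86400 max-reuse 1000

# Attach it to the S3 virtual server (a hypothetical name; adjust to yours)
tmsh modify ltm virtual vs_s3_aistor profiles add { OneConnect_Eval }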
Other highlighted values place limits on the maximum age of the backend connections, although origin servers remain free to close connections at their discretion. The maximum number of times an idle connection may be reused is also configurable, a level of fidelity common to most profiles within BIG-IP.

A last note on two other fields. The option to "share pools" exists; enabling it allows similar virtual servers, perhaps variants on front doors into MinIO clusters, to share back-end connections. Also, connections on the back-end, internal side of a BIG-IP normally do not use the client source IP address observed by the virtual server on the front end. Instead, the source address is normally pulled from "automap", meaning an available self-IP defined on the origin pool side of BIG-IP, or an address from a configured SNAT pool. The source mask, if set to none, means transactions can utilize an idle OneConnect TCP connection with one source address even if SNAT had suggested the use of another source address. As mentioned above, BIG-IP tends to allow the highest degree of flexibility possible, so a full 32-bit mask could instead restrict reuse to only an idle connection whose SNAT source IP address fully matches after the load-balancing decision has been made.

Evaluate OneConnect Under Lab Conditions

With an S3 BIG-IP virtual server fronting traffic for the four-node AIStor cluster, warp was again used to apply load with high concurrency on the front end. There is a real-time OneConnect display, found in the following spot in the BIG-IP TMUI interface.

While the warp run is underway, we see that OneConnect is already in on the action. In the screenshot below, OneConnect has seen 484 backend TCP connections created, and 3.8K times a TCP connection has already been reused to carry an S3 transaction. At this snapshot in time, 94 connections are idle and candidates to carry new S3 requests, with the maximum idle count so far observed being 132.

A very quick shorthand for knowing whether OneConnect is taking effect, beyond the specific counters above, is simply to watch the cluster pool statistics between warp test iterations. If no S3 traffic load is actively being presented to the front end of BIG-IP, yet "current" backend TCP connections are present in bulk, OneConnect is in effect. Remember to refresh statistics frequently, or enable auto-refresh, when doing analysis. Note that by design, any TCP-based health monitor traffic to pools that you may have configured is not included in the pool statistics, nicely avoiding any confusion.

After two full warp runs with high concurrency, the following were the OneConnect metrics. Of the roughly 19,000 S3 transactions unleashed by warp over the two test runs, spaced a few minutes apart, 18,500 are noted to have used existing, retained TCP connections. How cool is that?

The value of OneConnect is compounded with intermittent S3 traffic toward the MinIO backend AIStor servers. By default, AIStor closes idle connections that remain dormant for five minutes; traffic bursts within this window simply leverage the pool of waiting connections. The following full test run demonstrates that not one new TCP connection was required to complete another high-rate warp session.
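If the TMUI screens are not handy, the same counters can be sampled from the shell; a sketch, assuming the hypothetical lab names used above and that your build exposes these show commands:

# OneConnect profile statistics (reuses, current and maximum idle counts)
tmsh show ltm profile one-connect OneConnect_Eval

# Current connections and total requests per AIStor pool member
tmsh show ltm pool aistor_pool members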
Addressing Non-Concurrent Traffic Loads Targeting AIStor

An interesting feature of MinIO warp for lab testing is the ability to send high S3 traffic loads over a tightly controlled number of concurrent TCP connections to the virtual server on the front side of BIG-IP. By default, warp runs with 20 concurrent connections; experiments can drive S3 traffic over as few as one single connection, and all the way up to hundreds of concurrent connections.

In real-world scenarios, especially with S3 traffic often generated by applications and automation, the number of new connections per second a single traffic source may impose is worth noting. Take, for example, a trivial bash shell script issuing a repeated curl command for a MinIO-housed object, in this case with TLS enabled on both sides of BIG-IP (a sketch of such a loop appears at the end of this section). The net result is a new S3 retrieval, over a new connection, for each and every request of the object. As such, in a non-OneConnect setup, the BIG-IP counters display 100 connections to achieve 100 S3 retrievals. The amount of TCP setup alone, coupled with the fact that this is a full TLS interception scenario, meaning TLS must be established or resumed for each backend S3 request, should be top of mind.

With a standard OneConnect profile applied to the virtual server, the exact same test has a very different result on the wire, as seen below. We observe, in this simple lab setup at least, a 92 percent reduction in the required TCP connections, dropping from 100 to only 8. Nice.
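As promised, a minimal sketch of the low-persistence curl loop; the object path is hypothetical, and --insecure simply mirrors the lab's self-signed certificates:

#!/usr/bin/env bash
# 100 requests, each opening a fresh TCP connection and TLS session
for i in $(seq 1 100); do
  curl --silent --insecure --output /dev/null \
    https://10.1.40.163:9000/warp-benchmark-bucket/ebook.pdf
done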
A Last Interesting Impact of OneConnect – Per S3 Transaction Load Balancing

OneConnect is best known for its ability to constrain TCP sprawl between the back end of BIG-IP and origin servers. There is, however, another interesting byproduct: when a persistent TCP connection is used by a client to conduct serial transactions, such as S3 requests, the later requests follow the same load-balancing decision made for the first request. This can be seen visually in the following BIG-IP screen after running a warp test with --concurrent set to only one, meaning all S3 GETs, without a OneConnect profile attached at the virtual server, flow over a single TCP connection in the back end:

sudo ./warp get --host 10.1.40.163:9000 --list-existing --access-key 0t7hrh3nxhJPzEgBSEY8 --secret-key dfksjBkS3Bf5qs0R6rqkzijPrvp9clrn7IXq9UYX --duration 1m --concurrent 1

The above is not necessarily negative behavior; it is simply an observation about the solution. In some cases, there are protocols that are served best by persisting one user's traffic to the same backend node. This exercise is also quite trivial, based upon a single S3 load source; in a real-world deployment, there might be hundreds or thousands of sources, which would serve to even out overall backend AIStor node utilization.

However, in some cases, having the 4.3K requests in the previous image individually analyzed and load balanced in isolation makes sense. Take one example: a transaction may carry a cookie header, assigned previously by the load balancer, intended to persist traffic presenting that specific cookie to a specific node. The cookie can normally override an individual load-balancing decision and deliver the transaction to a specific backend server. BIG-IP fully supports cookie persistence; it is described here. In such a case, how can one achieve per-request load-balancing decisions, even on persistent front-end connections with serialized requests?

Yes, OneConnect can do this. Note the difference when the exact same warp load, over a single TCP connection, lands upon a virtual server now using OneConnect. The per-transaction load balancing, independent of the incoming traffic arriving on a single TCP connection, becomes evident once OneConnect is enacted for the virtual server. For a deeper dive into the characteristics of OneConnect, Knowledge Base article 7208 is provided.

Summary

Using a lab setup involving BIG-IP and MinIO AIStor S3 servers, the nuances of tying a OneConnect profile to S3 virtual servers were investigated. The ability to maintain large numbers of established, primed TCP connections between bursts of traffic was demonstrated, largely using MinIO Warp to drive S3 traffic. The optimal results came from intermittent traffic, with minute-by-minute ebbs and flows; as noted, AIStor's default behavior is to close idle TCP connections after five minutes.

S3 client traffic that sets up many short TCP connections, thereby driving up TCP load, was also investigated. This was achieved with repeated curl commands pulling public bucket contents from AIStor. The potential gain of OneConnect appears even higher in such low-persistence S3 environments, as intermittent traffic results in continual reuse of the OneConnect back-end connection pool. In other words, the ratio of front-end TCP connection count to back-end TCP connection count is driven even higher, suggesting more value is attainable.

A final interesting use case of OneConnect was observed with long persistent connections carrying many serialized S3 requests. OneConnect allows each carried transaction to be load balanced by BIG-IP on an individual, request-by-request basis. This may introduce additional decision-making load on BIG-IP, but in some cases it is useful for deeper examination of items such as HTTP cookie headers, as opposed to making a single load-balancing choice for all carried requests.

One Quick Step to Make your website AI-Agent/MCP Ready with an iRule
The Problem Nobody Warned You About

Here's the thing about the AI agent explosion: GPTBot, ClaudeBot, PerplexityBot, and a dozen other crawlers are hitting your web applications today. And they're getting back the same bloated HTML that your browser gets, complete with navigation bars, cookie banners, SVG icons, inline JavaScript, and CSS that means absolutely nothing to an LLM, other than a hit to your token usage.

These agents don't need your <nav> with 47 links. They don't need your cookie consent modal. They definitely don't need 200+ lines of minified CSS & JS. They need the content: the headings, the paragraphs, the links, the data. If you, or anyone, are using an agent to access and utilize the data on the page, it's burning through a massive amount of tokens, generally ~2k per GET.

But what if your BIG-IP could intercept these requests, see that the client is an AI agent, and transform that HTML response into clean markdown before it ever leaves your network? BTW, there is plenty of room for improvement here, and a small disclaimer at the end!

The Approach

The iRule works in three phases across three HTTP events. Here's the flow:

Client Request => HTTP_REQUEST (detect agent, strip Accept-Encoding)
Origin Response => HTTP_RESPONSE (check HTML, collect body)
Body Received => HTTP_RESPONSE_DATA (convert HTML => Markdown, replace body)

The client then receives clean markdown with Content-Type: text/plain.

Detection: Who's an AI Agent?

This example detects agents three ways, because different agents announce themselves differently, and we want to give humans a way to trigger it too (mostly I used this for testing; notes on that later).

when HTTP_REQUEST {
    set is_ai_agent 0
    set ua [string tolower [HTTP::header "User-Agent"]]

    # The usual suspects
    if { $ua contains "gptbot" || $ua contains "chatgpt-user" ||
         $ua contains "claudebot" || $ua contains "claude-web" ||
         $ua contains "perplexitybot" || $ua contains "cohere-ai" ||
         $ua contains "google-extended" || $ua contains "applebot-extended" ||
         $ua contains "bytespider" || $ua contains "ccbot" ||
         $ua contains "amazonbot" } {
        set is_ai_agent 1
    }

    # Explicit opt-in via header
    if { [HTTP::header "X-Request-Format"] eq "markdown" } {
        set is_ai_agent 1
    }

    # Content negotiation (the standards-correct way)
    if { [HTTP::header "Accept"] contains "text/markdown" } {
        set is_ai_agent 1
    }

Why three methods? User-Agent detection handles the common crawlers automatically. The X-Request-Format header lets any client explicitly request markdown. And Accept: text/markdown is proper HTTP content negotiation, the way it should work once the ecosystem matures.

The Demo Path: /md/ Prefix

I added one more trigger that's purely for demos:

    set orig_uri [HTTP::uri]
    if { $orig_uri starts_with "/md/" } {
        set is_ai_agent 1
        set new_uri [string range $orig_uri 3 end]
        if { $new_uri eq "" } { set new_uri "/" }
        HTTP::uri $new_uri
    }

Visit /md/ in your browser and you get the markdown version of the upstream site. This is great for showing the capability to someone without having to modify your User-Agent string or install curl.

Preventing Compressed Responses

This one bit me during testing. And if you can believe it, Kunal Anand is the one who gave me a tip to actually find the resolution. If the origin returns gzip-compressed HTML, HTTP::payload gives you binary garbage. The fix:

    if { $is_ai_agent } {
        HTTP::header replace "Accept-Encoding" "identity"
    }

We just need to override the Accept-Encoding header on the request side so the origin sends us uncompressed HTML.
And I added a safety net in HTTP_RESPONSE:

when HTTP_RESPONSE {
    if { $is_ai_agent } {
        if { [HTTP::header "Content-Type"] contains "text/html" } {
            set ce [HTTP::header "Content-Encoding"]
            if { $ce ne "" } {
                if { $ce ne "identity" } {
                    set is_ai_agent 0
                    HTTP::header insert "X-Markdown-Skipped" "compressed-response"
                    return
                }
            }
        }
    }
}

If the upstream ignores our Accept-Encoding override and sends gzip anyway, we bail gracefully instead of serving corrupted content. Defense in depth!

The Conversion: Where the Magic Happens

This is HTTP_RESPONSE_DATA: the body has been collected and we have the raw HTML. Now we convert it to markdown through a series of regex passes.

Phase 1: The Multiline Problem

Tcl's . in regex doesn't match newlines. Every <script>, <style>, and <nav> block in real HTML spans multiple lines. So this won't work:

# This silently fails on multiline <script> blocks!
regsub -all -nocase {<script[^>]*>.*?</script>} $html_body "" html_body

The fix, again from a hint by Kunal: collapse all newlines to a sentinel character before stripping block elements, then restore them after:

set NL_MARK "\x01"
set html_body [string map [list "\r\n" $NL_MARK "\r" $NL_MARK "\n" $NL_MARK] $html_body]

# NOW these work, everything is one "line"
regsub -all -nocase "<script\[^>\]*>.*?</script>" $html_body "" html_body
regsub -all -nocase "<style\[^>\]*>.*?</style>" $html_body "" html_body
regsub -all -nocase "<nav\[^>\]*>.*?</nav>" $html_body "" html_body
# ... strip footer, header, noscript, svg, comments, forms, cookie banners

# Restore newlines
set html_body [string map [list $NL_MARK "\n"] $html_body]

This is the single biggest quality improvement. Without it, you get raw JavaScript and CSS bleeding into your markdown output.

Phase 2: Converting Structure

With the junk stripped and newlines restored, we convert HTML elements to markdown syntax. Here's the key insight that took a few iterations: use [^<]* instead of .*? for tag content.

# BAD: .*? crosses newlines in Tcl and matches across multiple tags
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>(.*?)</a>} ...

# GOOD: [^<]* stops at the next tag boundary
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} ...

This matters when you have two <a> tags on adjacent lines. The .*? version matches from the first <a> opening all the way to the second </a> closing: one giant broken link. The [^<]* version correctly matches each link individually.

Here's the conversion order (it matters):

# 1. Headings
regsub -all -nocase {<h2[^>]*>([^<]*)</h2>} $html_body "\n## \\1\n\n" html_body

# 2. Emphasis BEFORE links (so **bold** inside links works)
regsub -all -nocase {<strong[^>]*>([^<]*)</strong>} $html_body {**\1**} html_body

# 3. Links with relative URL resolution
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
    $html_body "\[\\2\](https://${http_request_host}\\1)" html_body

# 4. Tables, code, lists, paragraphs, blockquotes, images...

# 5. Strip ALL remaining tags
regsub -all {<[^>]+>} $html_body "" html_body

# 6. Decode HTML entities
regsub -all {&ldquo;} $html_body {"} html_body
regsub -all {&rsquo;} $html_body {'} html_body
# ... 20+ entity decodings

Emphasis before links is important. If you have <a href="/pricing"><strong>$149,900</strong></a>, converting emphasis first gives you <a href="/pricing">**$149,900**</a>, which then converts to [**$149,900**](/pricing). Do it the other way, and the bold markers end up orphaned.

URL Resolution

AI agents need absolute URLs.
A relative link like /properties is useless to a bot that doesn't know what host it's talking to. We capture $http_request_host in HTTP_REQUEST and use it during link conversion:

# Relative to absolute
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
    $html_body "\[\\2\](https://${http_request_host}\\1)" html_body

# Absolute stays absolute
regsub -all -nocase {<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>} \
    $html_body "\[\\2\](\\1)" html_body

Same treatment for images.

Dynamic Table Separators

(Yet another place Kunal offered some tips.) This one is tricky to solve because of common HTML table structure standards. Markdown tables need a separator row between the header and body:

| Name | Price | Status |
|------|-------|--------|
| Unit A | $500k | Available |

The separator needs the right number of columns. We count <th> tags in the <thead> and build it dynamically (but what if there is no thead? I try to account for that, too):

set col_count 0
set thead_check $html_body
if { [regsub -nocase {<thead[^>]*>(.*?)</thead>} $thead_check "\\1" first_thead] } {
    set col_count [regsub -all -nocase {<th[^>]*>} $first_thead "" _discard]
}
if { $col_count > 0 } {
    set sep "\n|"
    for { set c 0 } { $c < $col_count } { incr c } {
        append sep "---|"
    }
    append sep "\n"
}

If we can't count the columns in a thead, we default to 6 columns, which could still use some work, but at least we avoid a hardcoded 2-column separator breaking our 5-column tables.

Performance Considerations

This iRule runs in TMM. Every CPU cycle it uses is a cycle not spent processing other connections. So I built in several guardrails (which could still be improved):

Size limit: Pages over 512KB skip conversion entirely. The regex chain gets expensive on large documents, and the output quality degrades anyway.

if { $content_length > 524288 } {
    set is_ai_agent 0
    HTTP::header insert "X-Markdown-Skipped" "body-too-large"
}

Targeted Accept-Encoding: By stripping Accept-Encoding only for AI agent requests, normal browser traffic still gets compressed responses. No performance impact on human users.

Logging: Every conversion logs the byte reduction to /var/log/ltm so you can monitor the overhead:

markdown: converted 15526 bytes -> 4200 bytes (73% reduction)

What This Doesn't Do (And I Think That's OK)

I will be honest about the limitations:

No DOM parsing. This is regex-based conversion. Complex nested structures (a <strong> that wraps three <div>s) won't convert perfectly. You need a real DOM parser for that, and iRules doesn't have one. I avoided using iRulesLX for this project entirely.

Multiline tags within content blocks. The newline-collapse trick handles <script> and <style>, but a <p> tag with inline markup that spans lines will only partially match. The [^<]* pattern helps, but it can't capture text that contains child tags.

Tables without <thead>. Column count is detected from <th> tags. Tables that use plain <tr><td> with no header get a fallback separator.

For 80% of web pages, the output is surprisingly good. For the other 20%, consider iRulesLX (a Node.js sidecar with a real DOM parser) or a sideband approach with compiled-language HTML parsing.
The Complete iRule

Here it is; attach it to your virtual server and you're done:

when HTTP_REQUEST {
    set is_ai_agent 0
    set ua [string tolower [HTTP::header "User-Agent"]]

    if { $ua contains "gptbot" || $ua contains "chatgpt-user" ||
         $ua contains "claudebot" || $ua contains "claude-web" ||
         $ua contains "perplexitybot" || $ua contains "cohere-ai" ||
         $ua contains "google-extended" || $ua contains "applebot-extended" ||
         $ua contains "bytespider" || $ua contains "ccbot" ||
         $ua contains "amazonbot" } {
        set is_ai_agent 1
    }

    if { [HTTP::header "X-Request-Format"] eq "markdown" } {
        set is_ai_agent 1
    }

    if { [HTTP::header "Accept"] contains "text/markdown" } {
        set is_ai_agent 1
    }

    set orig_uri [HTTP::uri]
    if { $orig_uri starts_with "/md/" } {
        set is_ai_agent 1
        set new_uri [string range $orig_uri 3 end]
        if { $new_uri eq "" } { set new_uri "/" }
        HTTP::uri $new_uri
    } elseif { $orig_uri eq "/md" } {
        set is_ai_agent 1
        HTTP::uri "/"
    }

    set http_request_host [HTTP::host]

    if { $is_ai_agent } {
        HTTP::header replace "Accept-Encoding" "identity"
    }
}

when HTTP_RESPONSE {
    if { $is_ai_agent } {
        if { [HTTP::header "Content-Type"] contains "text/html" } {
            set ce [HTTP::header "Content-Encoding"]
            if { $ce ne "" } {
                if { $ce ne "identity" } {
                    set is_ai_agent 0
                    HTTP::header insert "X-Markdown-Skipped" "compressed-response"
                    return
                }
            }

            set content_length [HTTP::header "Content-Length"]
            set do_collect 1
            if { $content_length ne "" } {
                if { $content_length > 524288 } {
                    set is_ai_agent 0
                    set do_collect 0
                    HTTP::header insert "X-Markdown-Skipped" "body-too-large"
                }
            }
            if { $do_collect } {
                if { $content_length ne "" } {
                    if { $content_length > 0 } {
                        HTTP::collect $content_length
                    }
                } else {
                    HTTP::collect 524288
                }
            }
        }
    }
}

when HTTP_RESPONSE_DATA {
    if { $is_ai_agent } {
        set html_body [HTTP::payload]
        set orig_size [string length $html_body]

        # Phase 1: Collapse newlines for multiline tag stripping
        set NL_MARK "\x01"
        set html_body [string map [list "\r\n" $NL_MARK "\r" $NL_MARK "\n" $NL_MARK] $html_body]

        regsub -all -nocase "<script\[^>\]*>.*?</script>" $html_body "" html_body
        regsub -all -nocase "<style\[^>\]*>.*?</style>" $html_body "" html_body
        regsub -all -nocase "<nav\[^>\]*>.*?</nav>" $html_body "" html_body
        regsub -all -nocase "<footer\[^>\]*>.*?</footer>" $html_body "" html_body
        regsub -all -nocase "<header\[^>\]*>.*?</header>" $html_body "" html_body
        regsub -all -nocase "<noscript\[^>\]*>.*?</noscript>" $html_body "" html_body
        regsub -all -nocase "<svg\[^>\]*>.*?</svg>" $html_body "" html_body
        regsub -all "<!--.*?-->" $html_body "" html_body
        regsub -all -nocase "<form\[^>\]*>.*?</form>" $html_body "" html_body

        # Phase 2: Restore newlines, convert structure
        set html_body [string map [list $NL_MARK "\n"] $html_body]

        regsub -all -nocase {<h1[^>]*>([^<]*)</h1>} $html_body "# \\1\n\n" html_body
        regsub -all -nocase {<h2[^>]*>([^<]*)</h2>} $html_body "\n## \\1\n\n" html_body
        regsub -all -nocase {<h3[^>]*>([^<]*)</h3>} $html_body "\n### \\1\n\n" html_body
        regsub -all -nocase {<h4[^>]*>([^<]*)</h4>} $html_body "\n#### \\1\n\n" html_body
        regsub -all -nocase {<strong[^>]*>([^<]*)</strong>} $html_body {**\1**} html_body
        regsub -all -nocase {<b[^>]*>([^<]*)</b>} $html_body {**\1**} html_body
        regsub -all -nocase {<em>([^<]*)</em>} $html_body {*\1*} html_body
        regsub -all -nocase {<i>([^<]*)</i>} $html_body {*\1*} html_body
        regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} $html_body "\[\\2\](https://${http_request_host}\\1)" html_body
        regsub -all -nocase {<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>} $html_body "\[\\2\](\\1)" html_body
        regsub -all -nocase {<a[^>]*>([^<]*)</a>} $html_body {\1} html_body
        regsub -all -nocase {<th[^>]*>([^<]*)</th>} $html_body "| \\1 " html_body
        regsub -all -nocase {<td[^>]*>([^<]*)</td>} $html_body "| \\1 " html_body
        regsub -all -nocase {</tr>} $html_body "|\n" html_body
        regsub -all -nocase {<code>([^<]*)</code>} $html_body {`\1`} html_body
        regsub -all -nocase {<li[^>]*>([^<]*)</li>} $html_body "- \\1\n" html_body
        regsub -all -nocase {</?[uo]l[^>]*>} $html_body "\n" html_body
        regsub -all -nocase {<p[^>]*>([^<]*)</p>} $html_body "\\1\n\n" html_body
        regsub -all -nocase {<br\s*/?>} $html_body "\n" html_body
        regsub -all -nocase {<hr\s*/?>} $html_body "\n---\n\n" html_body
        regsub -all -nocase {<blockquote[^>]*>} $html_body "> " html_body
        regsub -all -nocase {</blockquote>} $html_body "\n\n" html_body
        regsub -all -nocase {<cite>([^<]*)</cite>} $html_body "-- *\\1*\n" html_body
        regsub -all {<[^>]+>} $html_body "" html_body

        # Decode common HTML entities
        regsub -all {&amp;} $html_body {\&} html_body
        regsub -all {&lt;} $html_body {<} html_body
        regsub -all {&gt;} $html_body {>} html_body
        regsub -all {&quot;} $html_body {"} html_body
        regsub -all {&nbsp;} $html_body { } html_body
        regsub -all {&ldquo;} $html_body {"} html_body
        regsub -all {&rdquo;} $html_body {"} html_body
        regsub -all {&lsquo;} $html_body {'} html_body
        regsub -all {&rsquo;} $html_body {'} html_body
        regsub -all {&mdash;} $html_body {--} html_body
        regsub -all {&ndash;} $html_body {-} html_body
        regsub -all {&hellip;} $html_body {...} html_body
        regsub -all {&#[0-9]+;} $html_body {} html_body

        # Whitespace cleanup
        regsub -all {\n +} $html_body "\n" html_body
        regsub -all {\n{3,}} $html_body "\n\n" html_body
        regsub -all {([^\n])\n\n([^#\n\[>*-])} $html_body "\\1\n\\2" html_body
        set html_body [string trim $html_body]

        HTTP::payload replace 0 [HTTP::payload length] $html_body
        HTTP::header replace "Content-Type" "text/plain; charset=utf-8"
        HTTP::header replace "Content-Length" [string length $html_body]
        HTTP::header insert "X-Markdown-Source" "bigip-irule"
    }
}

Testing / Demoing It

# Normal browser request, HTML as usual
curl https://your-site.example.com/

# AI agent simulation
curl -H "User-Agent: GPTBot/1.0" https://your-site.example.com/

# Explicit markdown request
curl -H "X-Request-Format: markdown" https://your-site.example.com/

# Browser-friendly demo
curl https://your-site.example.com/md/
# (or just visit it in your browser)

What's Next

This is a solid starting point for making your existing sites AI-agent ready without touching application code. A few directions to take it:

Agent discovery files: serve /llms.txt and /.well-known/ai-plugin.json so agents can programmatically discover your markdown capability (a small sketch follows below).

iRulesLX upgrade path: when regex-based conversion isn't enough, move the HTML parsing to a Node.js sidecar with a real DOM parser (cheerio, jsdom). Same detection logic, better conversion quality.

The AI agent wave isn't coming. It's here. Your BIG-IP already sees every request. Might as well make those responses useful.
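For the discovery-file idea above, here is a minimal sketch of serving /llms.txt straight from the same iRule; the site description is a placeholder:

when HTTP_REQUEST {
    # Answer agent discovery requests before any other processing
    if { [HTTP::path] eq "/llms.txt" } {
        HTTP::respond 200 content "# Example Site\n\nMarkdown is available: send Accept: text/markdown or prefix any path with /md/.\n" "Content-Type" "text/plain; charset=utf-8"
        return
    }
}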
Disclaimer! The iRule in this article was developed as part of a proof-of-concept for edge-layer HTML-to-Markdown conversion. It's been tested on BIG-IP 17.5.1+. Your mileage may vary on complex single-page applications, but for content-heavy sites, it works remarkably well for something that's "just regex."

Centralized Application Control for Distributed AI with Equinix and F5 Distributed Cloud

As AI adoption accelerates, I've been seeing a common architectural pattern emerge: centralized AI factories handling model training, with inference workloads pushed out to remote departments like public safety, healthcare, or logistics. While the execution is distributed, the operational requirements—security, performance, and policy consistency—remain very much centralized.

The challenge isn't running inference at the edge; it's delivering centralized AI services to distributed consumers without introducing complex routing, fragmented security controls, or inconsistent performance between locations. This article outlines how you can address that problem using F5 Distributed Cloud (XC) Customer Edge deployed on Equinix Network Edge, with private connectivity provided by Equinix Fabric.

The Problem to Solve

From an infrastructure perspective, these environments tend to stress three things simultaneously:

Scalability, as data volumes and inference demand grow rapidly
Security, to protect models, APIs, and sensitive inference data
Reliability, so performance remains consistent regardless of where requests originate

Traditional approaches often force tradeoffs—centralize everything and accept latency, or decentralize enforcement and deal with policy sprawl. What we need is centralized control with distributed execution.

Architectural Approach

Rather than building bespoke connectivity for each inference location, we'll focus on creating a repeatable edge pattern that can be deployed globally while still being governed centrally. The architecture breaks down into four core components:

Central AI Factory (Training Hub)
This is where model training and lifecycle management live. It connects to S3-compatible object storage for large-scale data ingestion and model artifacts. Importantly, it doesn't need direct exposure to every inference a consumer makes.

Equinix Fabric
Equinix Fabric provides private, low-latency connectivity between the AI factory and distributed inference locations. In this design, it effectively acts as a segment extender across regions, keeping AI traffic off the public internet while preserving predictable performance.

F5 Distributed Cloud (XC) Customer Edge
F5 XC Customer Edge (CE) instances are deployed close to inference consumers. These handle traffic management, API security, segmentation, and observability, while remaining under centralized policy control. This is where enforcement happens—consistently, everywhere.

Equinix Network Edge Marketplace
Equinix Network Edge enables rapid deployment of Customer Edge instances in new regions without waiting on physical infrastructure, which is critical when inference demand expands faster than traditional provisioning cycles.

How It Works

Inference requests are processed locally through CEs at each location. When access to centralized resources is required—such as model updates or validation—traffic traverses Equinix Fabric back to the AI factory. The key detail is that policy is defined centrally but enforced at the edge. Security controls, API protections, and segmentation rules are created once and applied uniformly, regardless of geography. That eliminates the need for custom routing logic or per-site security tuning.

Design Principles That Matter

A few principles guided the implementation:

Centralized control, distributed execution — inference stays close to data; governance stays centralized
Zero Trust by default — all AI data flows are explicitly authenticated and authorized
Elastic expansion — new regions can be brought online quickly through the Marketplace
Integrated observability — traffic, performance, and security posture are visible across all endpoints
Compliance-ready — isolation and segmentation support regulatory requirements like GDPR and HIPAA

When This Pattern Fits

This approach works well for organizations that need to scale AI inference across multiple regions or departments while maintaining tight operational control. It's particularly effective when inference demand grows incrementally and predictability, security, and governance matter more than ad-hoc edge autonomy. If the goal is centralized governance with distributed execution, this pattern provides a clean and repeatable way to get there.

Additional Links

F5 Distributed Cloud Services
F5 Distributed Cloud (XC) Customer Edge
Equinix Fabric
Equinix Network Edge Marketplace