For more information regarding the security incident at F5, the actions we are taking to address it, and our ongoing efforts to protect our customers, click here.

ai

80 Topics

Key Steps to Securely Scale and Optimize Production-Ready AI for Banking and Financial Services
This article outlines three key actions banks and financial firms can take to better securely scale, connect, and optimize their AI workflows, which will be demonstrated through a scenario of a bank taking a new AI application to production.
Chad_Davis
Jan 08, 2026 Place Technical Articles
41Views
3likes
0Comments
I Tried to Beat OpenAI with Ollama in n8n—Here’s Why It Failed (and the Bug I’m Filing)
Hey, community. I wanted to share a story about how I built the n8n Labs workflow. It watches a YouTube channel, summarizes the latest videos with AI agents, and sends a clean HTML newsletter via Gmail. In the video, I show it working flawlessly with OpenAI. But before I got there, I spent a lot of time trying to copy the same flow using open source models through Ollama with the n8n Ollama node. My results were all over the map. I really wanted this to be a great “open source first” build. I tried many local models via Ollama, tuned prompts, adjusted parameters, and re‑ran tests. The outputs were always unpredictable: sometimes I’d get partial JSON, sometimes extra text around the JSON. Sometimes fields would be missing. Sometimes it would just refuse to stick to the structure I asked for. After enough iterations, I started to doubt whether my understanding of the agent setup was off. So, I built a quick proof inside the n8n Code node. If the AI Agent step is supposed to take the XML→JSON feed and reshape it into a structured list—title, description, content URL, thumbnail URL—then I should be able to do that deterministically in JavaScript and compare. I wrote a tiny snippet that reads the entries array, grabs the media fields, and formats a minimal output. And guess what? Voila. It worked on the first try and my HTML generator lit up exactly the way I wanted. That told me two things: one, my upstream data (HTTP Request + XML→JSON) was solid; and two, my desired output structure was clear and achievable without any trickery. With that proof in hand, I turned to OpenAI. I wired the same agent prompt, the same structured output parser, and the same workflow wiring—but swapped the Ollama node for an OpenAI chat model. It worked immediately. Fast, cheap, predictable. The agent returned a perfectly clean JSON with the fields I requested. My code node transformed it into HTML. The preview looked right, and Gmail sent the newsletter just like in the demo. So at that point, I felt confident the approach was sound and the transcript you saw in the video was repeatable—at least with OpenAI in the loop. Where does that leave Ollama and open source models? I’m not throwing shade—I love open source, and I want this path to be great. My current belief is the failure is somewhere inside the n8n Ollama node code path. I don’t think it’s the models themselves in isolation; I think the node may be mishandling one or more of these details: how messages are composed (system vs user). Whether “JSON mode” or a grammar/format hint is being passed, token/length defaults that cause truncation, stop settings that let extra text leak into the output; or the way the structured output parser constraints are communicated. If you’ve worked with local models, you know they can follow structure very well when you give them a strict format or grammar. If the node isn’t exposing that (or is dropping it on the floor), you get variability. To make sure this gets eyes from the right folks, my intent is to file a bug with n8n for the Ollama node. I’ll include a minimal, reproducible workflow: the same RSS fetch, the same XML→JSON conversion, the same agent prompt and required output shape, and a comparison run where OpenAI succeeds and Ollama does not. I’ll share versions, logs, model names, and settings so the team can trace exactly where the behavior diverges. If there’s a missing parameter (like format: json) or a message-role mix‑up, great—let’s fix it. If it needs a small enhancement to pass a grammar or schema to the model, even better. The net‑net is simple: for AI agents inside n8n to feel predictable with Ollama, we need the node to enforce reliably structured outputs the same way the OpenAI path does. That unlocks a ton of practical automation for folks who prefer local models. In the meantime, if you’re following the lab and want a rock‑solid fallback, you can use the Code node to do the exact transformation the agent would do. Here’s the JavaScript I wrote and tested in the workflow: const entries = $input.first().json.feed?.entry ?? []; function truncate(str, max) { if (!str) return ''; const s = String(str).trim(); return s.length > max ? s.slice(0, max) + '…' : s; // If you want total length (including …) to be max, use: // return s.length > max ? s.slice(0, Math.max(0, max - 1)) + '…' : s; } const output = entries.map(entry => { const g = entry['media:group'] ?? {}; return { title: g['media:title'] ?? '', description: truncate(g['media:description'], 60), contentUrl: g['media:content']?.url ?? '', thumbnailUrl: g['media:thumbnail']?.url ?? '' }; }); return [{ json: { output } }]; That snippet proves the data is there and your HTML builder is fine. If OpenAI reproduces the same structured JSON as the code, and Ollama doesn’t, the issue is likely in the node’s request/response handling rather than your workflow logic. I’ll keep pushing on the bug report so we can make agents with Ollama as predictable as they need to be. Until then, if you want speed and consistency to get the job done, OpenAI works great. If you’re experimenting with open source, try enforcing stricter formats and shorter outputs—and keep an eye on what the node actually sends to the model. As always, I’ll share updates, because I love sharing knowledge—and I want the open-source path to shine right alongside the rest of our AI, agents, n8n, Gmail, and OpenAI workflows. As always, community, if you have a resolution and can pull it off, please share!
AubreyKingF5
Jan 07, 2026 Place Technical Articles
293Views
2likes
1Comment
Using the Model Context Protocol with Open WebUI
This year we started building out a series of hands-on labs you can do on your own in our AI Step-by-Step repo on GitHub. In my latest lab, I walk you through setting up a Model Context Protocol (MCP) server and the mcpo proxy to allow you to use MCP tools in a locally-hosted Open WebUI + Ollama environment. The steps are well-covered there, but I wanted to highlight what you learn in the lab. What is MCP and why does it matter? MCP is a JSON-based open standard from Anthropic that (shockingly!) is only about 13 months old now. It allows AI assistants to securely connect to external data sources and tools through a unified interface. The key delivery that led to it's rapid adoption is that it solves the fragmentation problem in AI integrations—instead of every AI system needing custom code to connect to each tool or database, MCP provides a single protocol that works across different AI models and data sources. MCP in the local lab My first exposure to MCP was using Claude and Docker tools to replicate a video Sebastian_Maniak released showing how to configure a BIG-IP application service. I wanted to see how F5-agnostic I could be in my prompt and still get a successful result, and it turned out that the only domain-specific language I needed, after it came up with a solution and deployed it, was to specify the load balancing algorithm. Everything else was correct. Kinda blew my mind. I spoke about this experience throughout the year at F5 Academy events and at a solutions days event in Toronto, but more-so, I wanted to see how far I could take this in a local setting away from the pay-to-play tooling offered at that time. This was the genesis for this lab. Tools In this lab, you'll use the following tools: Ollama - Open WebUI mcpo custom mcp server Ollama and Open WebUI are assumed to already be installed, those labs are also in the AI Step-by-Step repo: Installing Ollama Installing Open WebUI Once those are in place, you can clone the repo and deploy in docker or podman, just make sure the containers for open WebUI are in the same network as the repo you're deploying. Results The success for getting your Open WebUI inference through the mcpo proxy and the MCP servers (mine is very basic just for test purposes, there are more that you can test or build yourself) depends greatly on your prompting skills and the abilities of the local models you choose. I had varying success with llama3.2:3b. But the goal here isn't production-ready tooling, it's to build and discover and get comfortable in this new world of AI assistants and leveraging them where it makes sense to augment our toolbox. Drop a comment below if you build this lab and share your successes and failures. Community is the best learning environment.
JRahm
Dec 31, 2025 Place Technical Articles
133Views
2likes
0Comments
How I did it.....again "High-Performance S3 Load Balancing with F5 BIG-IP"
Introduction Welcome back to the "How I did it" series! In the previous installment, we explored the high‑performance S3 load balancing of Dell ObjectScale with F5 BIG‑IP. This follow‑up builds on that foundation with BIG‑IP v21.x’s S3‑focused profiles and how to apply them in the wild. We’ll also put the external monitor to work, validating health with real PUT/GET/DELETE checks so your S3-compatible backends aren’t just “up,” they’re truly dependable. New S3 Profiles for the BIG-IP…..well kind of A big part of why F5 BIG-IP excels is because of its advanced traffic profiles, like TCP and SSL/TLS. These profiles let you fine-tune connection behavior—optimizing throughput, reducing latency, and managing congestion—while enforcing strong encryption and protocol settings for secure, efficient data flow. Available with version 21.x the BIG-IP now includes new S3-specific profiles, (s3-tcp and s3-default-clientssl). These profiles are based off existing default parent profiles, (tcp and clientssl respectively) that have been customized or “tuned” to optimize s3 traffic. Let’s take a closer look. Anatomy of a TCP Profile The BIG-IP includes a number of pre-defined TCP profiles that define how the system manages TCP traffic for virtual servers, controlling aspects like connection setup, data transfer, congestion control, and buffer tuning. These profiles allow administrators to optimize performance for different network conditions by adjusting parameters such as initial congestion window, retransmission timeout, and algorithms like Nagle’s or Delayed ACK. The s3-tcp, (see below) has been tweaked with respect to data transfer and congestion window sizes as well as memory management to optimize typical S3 traffic patterns (i.e. high-throughput data transfer, varying request sizes, large payloads, etc.). Tweaking the Client SSL Profile for S3 Client SSL profiles on BIG-IP define how the system terminates and manages SSL/TLS sessions from clients at the virtual server. They specify critical parameters such as certificates, private keys, cipher suites, and supported protocol versions, enabling secure decryption for advanced traffic handling like HTTP optimization, security policies, and iRules. The s3-default-clientssl has been modified, (see below) from the default client ssl profile to optimize SSL/TLS settings for high-throughput object storage traffic, ensuring better performance and compatibility with S3-specific requirements. Advanced S3-compatible health checking with EAV Has anyone ever told you how cool BIG-IP Extended Application Verification (EAV) aka external monitors are? Okay, I suppose “coolness” is subjective, but EAVs are objectively cool. Let me prove it to you. Health monitoring of backend S3-compatible servers typically involves making an HTTP GET request to either the exposed S3 ingest/egress API endpoint or a liveness probe. Get a 200 and all's good. Wouldn’t it be cool if you could verify a backend server's health by verifying it can actually perform the operations as intended? Fortunately, we can do just that using an EAV monitor. Therefore, based on the transitive property, EAVs are cool. —mic drop The bash script located at the bottom of the page performs health checks on S3-compatible storage by executing PUT, GET, and DELETE operations on a test object. The health check creates a temporary health check file with timestamp, retrieves the file to verify read access, and removes the test file to clean up. If all three operations return the expected HTTP status code, the node is marked up otherwise the node is marked down. Installing and using the EAV health check Import the monitor script Save the bash script, (.sh) extension, (located at the bottom of this page) locally and import the file onto the BIG-IP. Log in to the BIG-IP Configuration Utility and navigate to System > File Management > External Monitor Program File List > Import. Use the file selector to navigate to and select the newly created. bash file, provide a name for the file and select 'Import'. Create a new external monitor Navigate to Local Traffic > Monitors > Create Provide a name for the monitor. Select 'External' for the type, and select the previously uploaded file for the 'External Program'. The 'Interval' and 'Timeout' settings can be modified or left at the default as desired. In addition to the backend host and port, the monitor must pass three (3) additional variables to the backend: bucket - The name of an existing bucket where the monitor can place a small text file. During the health check, the monitor will create a file, request the file and delete the file. access_key - S3-compatible access key with permissions to perform the above operations on the specified bucket. secret_key - corresponding S3-compatible secret key. Select 'Finished' to create the monitor. Associate the monitor with the pool Navigate to Local Traffic > Pools > Pool List and select the relevant backend S3 pool. Under 'Health Monitors' select the newly created monitor and move from 'Available' to the 'Active'. Select 'Update' to save the configuration. Additional Links How I did it - "High-Performance S3 Load Balancing of Dell ObjectScale with F5 BIG-IP" F5 BIG-IP v21.0 brings enhanced AI data delivery and ingestion for S3 workflows Overview of BIG-IP EAV external monitors EAV Bash Script #!/bin/bash ################################################################################ # S3 Health Check Monitor for F5 BIG-IP (External Monitor - EAV) ################################################################################ # # Description: # This script performs health checks on S3-compatible storage by # executing PUT, GET, and DELETE operations on a test object. It uses AWS # Signature Version 4 for authentication and is designed to run as a BIG-IP # External Application Verification (EAV) monitor. # # Usage: # This script is intended to be configured as an external monitor in BIG-IP. # BIG-IP automatically provides the first two arguments: # $1 - Pool member IP address (may be IPv6-mapped format: ::ffff:x.x.x.x) # $2 - Pool member port number # # Additional arguments must be configured in the monitor's "Variables" field: # bucket - S3 bucket name # access_key - Access key for authentication # secret_key - Secret key for authentication # # BIG-IP Monitor Configuration: # Type: External # External Program: /path/to/this/script.sh # Variables: # bucket="your-bucket-name" # access_key="your-access-key" # secret_key="your-secret-key" # # Health Check Logic: # 1. PUT - Creates a temporary health check file with timestamp # 2. GET - Retrieves the file to verify read access # 3. DELETE - Removes the test file to clean up # Success: All three operations return expected HTTP status codes # Failure: Any operation fails or times out # # Exit Behavior: # - Prints "UP" to stdout if all checks pass (BIG-IP marks pool member up) # - Silent exit if any check fails (BIG-IP marks pool member down) # # Requirements: # - openssl (for SHA256 hashing and HMAC signing) # - curl (for HTTP requests) # - xxd (for hex encoding) # - Standard bash utilities (date, cut, sed, awk) # # Notes: # - Handles IPv6-mapped IPv4 addresses from BIG-IP (::ffff:x.x.x.x) # - Uses AWS Signature Version 4 authentication # - Logs activity to syslog (local0.notice) # - Creates temporary files that are automatically cleaned up # # Author: [Gregory Coward/F5] # Version: 1.0 # Last Modified: 12/2025 # ################################################################################ # ===== PARAMETER CONFIGURATION ===== # BIG-IP automatically provides these HOST="$1" # Pool member IP (may include ::ffff: prefix for IPv4) PORT="$2" # Pool member port BUCKET="${bucket}" # S3 bucket name ACCESS_KEY="${access_key}" # S3 access key SECRET_KEY="${secret_key}" # S3 secret key OBJECT="${6:-healthcheck.txt}" # Test object name (default: healthcheck.txt) # Strip IPv6-mapped IPv4 prefix if present (::ffff:10.1.1.1 -> 10.1.1.1) # BIG-IP may pass IPv4 addresses in IPv6-mapped format if [[ "$HOST" =~ ^::ffff: ]]; then HOST="${HOST#::ffff:}" fi # ===== S3/AWS CONFIGURATION ===== ENDPOINT="http://$HOST:$PORT" # S3 endpoint URL SERVICE="s3" # AWS service identifier for signature REGION="" # AWS region (leave empty for S3 compatible such as MinIO/Dell) # ===== TEMPORARY FILE SETUP ===== # Create temporary file for health check upload TMP_FILE=$(mktemp) printf "Health check at %s\n" "$(date)" > "$TMP_FILE" # Ensure temp file is deleted on script exit (success or failure) trap "rm -f $TMP_FILE" EXIT # ===== CRYPTOGRAPHIC HELPER FUNCTIONS ===== # Calculate SHA256 hash and return as hex string # Input: stdin # Output: hex-encoded SHA256 hash hex_of_sha256() { openssl dgst -sha256 -hex | sed 's/^.* //' } # Sign data using HMAC-SHA256 and return hex signature # Args: $1=hex-encoded key, $2=data to sign # Output: hex-encoded signature sign_hmac_sha256_hex() { local key_hex="$1" local data="$2" printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" | awk '{print $2}' } # Sign data using HMAC-SHA256 and return binary as hex # Args: $1=hex-encoded key, $2=data to sign # Output: hex-encoded binary signature (for key derivation chain) sign_hmac_sha256_binary() { local key_hex="$1" local data="$2" printf "%s" "$data" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$key_hex" -binary | xxd -p -c 256 } # ===== AWS SIGNATURE VERSION 4 IMPLEMENTATION ===== # Generate AWS Signature Version 4 for S3 requests # Args: # $1 - HTTP method (PUT, GET, DELETE, etc.) # $2 - URI path (e.g., /bucket/object) # $3 - Payload hash (SHA256 of request body, or empty hash for GET/DELETE) # $4 - Content-Type header value (empty string if not applicable) # Output: pipe-delimited string "Authorization|Timestamp|Host" aws_sig_v4() { local method="$1" local uri="$2" local payload_hash="$3" local content_type="$4" # Generate timestamp in AWS format (YYYYMMDDTHHMMSSZ) local timestamp=$(date -u +"%Y%m%dT%H%M%SZ" 2>/dev/null || gdate -u +"%Y%m%dT%H%M%SZ") local datestamp=$(date -u +"%Y%m%d") # Build host header (include port if non-standard) local host_header="$HOST" if [ "$PORT" != "80" ] && [ "$PORT" != "443" ]; then host_header="$HOST:$PORT" fi # Build canonical headers and signed headers list local canonical_headers="" local signed_headers="" # Include Content-Type if provided (for PUT requests) if [ -n "$content_type" ]; then canonical_headers="content-type:${content_type}"$'\n' signed_headers="content-type;" fi # Add required headers (must be in alphabetical order) canonical_headers="${canonical_headers}host:${host_header}"$'\n' canonical_headers="${canonical_headers}x-amz-content-sha256:${payload_hash}"$'\n' canonical_headers="${canonical_headers}x-amz-date:${timestamp}" signed_headers="${signed_headers}host;x-amz-content-sha256;x-amz-date" # Build canonical request (AWS Signature V4 format) # Format: METHOD\nURI\nQUERY_STRING\nHEADERS\n\nSIGNED_HEADERS\nPAYLOAD_HASH local canonical_request="${method}"$'\n' canonical_request+="${uri}"$'\n\n' # Empty query string (double newline) canonical_request+="${canonical_headers}"$'\n\n' canonical_request+="${signed_headers}"$'\n' canonical_request+="${payload_hash}" # Hash the canonical request local canonical_hash canonical_hash=$(printf "%s" "$canonical_request" | hex_of_sha256) # Build string to sign local algorithm="AWS4-HMAC-SHA256" local credential_scope="$datestamp/$REGION/$SERVICE/aws4_request" local string_to_sign="${algorithm}"$'\n' string_to_sign+="${timestamp}"$'\n' string_to_sign+="${credential_scope}"$'\n' string_to_sign+="${canonical_hash}" # Derive signing key using HMAC-SHA256 key derivation chain # kSecret = HMAC("AWS4" + secret_key, datestamp) # kRegion = HMAC(kSecret, region) # kService = HMAC(kRegion, service) # kSigning = HMAC(kService, "aws4_request") local k_secret k_secret=$(printf "AWS4%s" "$SECRET_KEY" | xxd -p -c 256) local k_date k_date=$(sign_hmac_sha256_binary "$k_secret" "$datestamp") local k_region k_region=$(sign_hmac_sha256_binary "$k_date" "$REGION") local k_service k_service=$(sign_hmac_sha256_binary "$k_region" "$SERVICE") local k_signing k_signing=$(sign_hmac_sha256_binary "$k_service" "aws4_request") # Calculate final signature local signature signature=$(sign_hmac_sha256_hex "$k_signing" "$string_to_sign") # Return authorization header, timestamp, and host header (pipe-delimited) printf "%s|%s|%s" \ "${algorithm} Credential=${ACCESS_KEY}/${credential_scope}, SignedHeaders=${signed_headers}, Signature=${signature}" \ "$timestamp" \ "$host_header" } # ===== HTTP REQUEST FUNCTION ===== # Execute HTTP request using curl with AWS Signature V4 authentication # Args: # $1 - HTTP method (PUT, GET, DELETE) # $2 - Full URL # $3 - Authorization header value # $4 - Timestamp (x-amz-date header) # $5 - Host header value # $6 - Payload hash (x-amz-content-sha256 header) # $7 - Content-Type (optional, empty for GET/DELETE) # $8 - Data file path (optional, for PUT with body) # Output: HTTP status code (e.g., 200, 404, 500) do_request() { local method="$1" local url="$2" local auth="$3" local timestamp="$4" local host_header="$5" local payload_hash="$6" local content_type="$7" local data_file="$8" # Build curl command with required headers local cmd="curl -s -o /dev/null --connect-timeout 5 --write-out %{http_code} \"$url\"" cmd="$cmd -X $method" cmd="$cmd -H \"Host: $host_header\"" cmd="$cmd -H \"x-amz-date: $timestamp\"" cmd="$cmd -H \"x-amz-content-sha256: $payload_hash\"" # Add optional headers [ -n "$content_type" ] && cmd="$cmd -H \"Content-Type: $content_type\"" cmd="$cmd -H \"Authorization: $auth\"" [ -n "$data_file" ] && cmd="$cmd --data-binary @\"$data_file\"" # Execute request and return HTTP status code eval "$cmd" } # ===== MAIN HEALTH CHECK LOGIC ===== # ===== STEP 1: PUT (Upload Test Object) ===== # Calculate SHA256 hash of the temp file content UPLOAD_HASH=$(openssl dgst -sha256 -binary "$TMP_FILE" | xxd -p -c 256) CONTENT_TYPE="application/octet-stream" # Generate AWS Signature V4 for PUT request SIGN_OUTPUT=$(aws_sig_v4 "PUT" "/$BUCKET/$OBJECT" "$UPLOAD_HASH" "$CONTENT_TYPE") AUTH_PUT=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT") DATE_PUT=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT") HOST_PUT=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT") # Execute PUT request (expect 200 OK) PUT_STATUS=$(do_request "PUT" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_PUT" "$DATE_PUT" "$HOST_PUT" "$UPLOAD_HASH" "$CONTENT_TYPE" "$TMP_FILE") # ===== STEP 2: GET (Download Test Object) ===== # SHA256 hash of empty body (for GET requests with no payload) EMPTY_HASH="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" # Generate AWS Signature V4 for GET request SIGN_OUTPUT=$(aws_sig_v4 "GET" "/$BUCKET/$OBJECT" "$EMPTY_HASH" "") AUTH_GET=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT") DATE_GET=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT") HOST_GET=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT") # Execute GET request (expect 200 OK) GET_STATUS=$(do_request "GET" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_GET" "$DATE_GET" "$HOST_GET" "$EMPTY_HASH" "" "") # ===== STEP 3: DELETE (Remove Test Object) ===== # Generate AWS Signature V4 for DELETE request SIGN_OUTPUT=$(aws_sig_v4 "DELETE" "/$BUCKET/$OBJECT" "$EMPTY_HASH" "") AUTH_DEL=$(cut -d'|' -f1 <<< "$SIGN_OUTPUT") DATE_DEL=$(cut -d'|' -f2 <<< "$SIGN_OUTPUT") HOST_DEL=$(cut -d'|' -f3 <<< "$SIGN_OUTPUT") # Execute DELETE request (expect 204 No Content) DEL_STATUS=$(do_request "DELETE" "$ENDPOINT/$BUCKET/$OBJECT" "$AUTH_DEL" "$DATE_DEL" "$HOST_DEL" "$EMPTY_HASH" "" "") # ===== LOG RESULTS ===== # Log all operation results for troubleshooting #logger -p local0.notice "S3 Monitor: PUT=$PUT_STATUS GET=$GET_STATUS DEL=$DEL_STATUS" # ===== EVALUATE HEALTH CHECK RESULT ===== # BIG-IP considers the pool member "UP" only if this script prints "UP" to stdout # Check if all operations returned expected status codes: # PUT: 200 (OK) # GET: 200 (OK) # DELETE: 204 (No Content) if [ "$PUT_STATUS" -eq 200 ] && [ "$GET_STATUS" -eq 200 ] && [ "$DEL_STATUS" -eq 204 ]; then #logger -p local0.notice "S3 Monitor: UP" echo "UP" fi # If any check fails, script exits silently (no "UP" output) # BIG-IP will mark the pool member as DOWN
Greg_Coward
Dec 29, 2025 Place Technical Articles
182Views
3likes
0Comments
F5 BIG-IP and NetApp StorageGRID - Providing Fast and Scalable S3 API for AI apps
F5 BIG-IP, an industry-leading ADC solution, can provide load balancing services for HTTPS servers, with full security applied in-flight and performance levels to meet any enterprise’s capacity targets. Specific to the S3 API, the object storage and retrieval protocol that rides upon HTTPS, an aligned partnering solution exists from NetApp, which allows a large-scale set of S3 API targets to ingest and provide objects. Automatic backend synchronization allows any node to be offered up as a target by a server load balancer like BIG-IP. This allows overall storage node utilization to be optimized across the node set, and scaled performance to reach the highest S3 API bandwidth levels, all while offering high availability to S3 API consumers. If one node fails or is undergoing maintenance, the overall service continues. S3 compatible storage is becoming popular for AI applications due to its superior performance over traditional protocols such as NFS or CIFS, as well as enabling repatriation of data from the cloud to on-prem. These are scenarios where the amount of data faced is large, this drives the requirement for new levels of scalability and performance; S3 compatible object storages such as NetApp StorageGRID are purpose-built to reach such levels. Sample BIG-IP and StorageGRID Configuration This document is based upon tests and measurements using the following lab configuration. All devices in the lab were virtual machine-based offerings. The S3 service to be projected to the outside world, depicted in the above diagram and delivered to the client via the external network, will use a BIG-IP virtual server (VS) which is tied to an origin pool of three large-capacity StorageGRID nodes. The BIG-IP maintains the integrity of the NetApp nodes by frequent HTTP-based health checks. Should an unhealthy node be detected, it will be dropped from the list of active pool members. When content is written via the S3 protocol to any node in the pool, the other members are synchronized to serve up content should they be selected by BIG-IP for future read requests. The key recommendations and observations in building the lab include: Setup a local certificate authority such that all nodes can be trusted by the BIG-IP. Typically the local CA-signed certificate will incorporate every node’s FQDN and IP address within the listed subject alternate names (SAN) to make the backend solution streamlined with one single certificate. Different F5 profiles, such as FastL4 or FastHTTP, can be selected to reach the right tradeoff between the absolute capacity of stateful traffic load-balanced versus rich layer 7 functions like iRules or authentication. Modern techniques such as multi-part uploads or using HTTP Ranges for downloads can take large objects, and concurrently move smaller pieces across the load balancer, lowering total transaction times, and spreading work over more CPU cores. The S3 protocol, at its core, is a set of REST API calls. To facilitate testing, the widely used S3Browser (www.s3browser.com) was used to quickly and intuitively create S3 buckets on the NetApp offering and send/retrieve objects (files) through the BIG-IP load balancer. Setup the BIG-IP and StorageGrid Systems The StorageGrid solution is an array of storage nodes, provisioned with the help of an administrative host, the “Grid Manager”. For interactive users, no thick client is required as on-board web services allow a streamlined experience all through an Internet browser. The following is an example of Grid Manager, taken from a Chrome browser; one sees the three Storage Nodes setup have been successfully added. The load balancer, in our case the BIG-IP, is set up with a virtual server to support HTTPS traffic and distributed that traffic, which is S3 object storage traffic, to the three StorageGRID nodes. The following screenshot demonstrates that the BIG-IP is setup in a standard HA (active-passive pair) configuration and the three pool members are healthy (green, health checks are fine) and receiving/sending S3 traffic, as the byte counts are seen in the image to be non-zero. On the internal side of the BIG-IP, TCP port 18082 is being used for S3 traffic. To do testing of the solution, including features such as multi-part uploads and downloads, a popular S3 tool, S3Browser, was downloaded and used. The following shows the entirety of the S3Browser setup. Simply create an account (StorageGRID-Account-01 in our example) and point the REST API endpoint at the BIG-IP Virtual Server that is acting as the secure front door for our pool of NetApp nodes. The S3 Access Key ID and Secret values are generated at turn-up time of the NetApp appliances. All S3 traffic will, of course, be SSL/TLS encrypted. BIG-IP will intercept the SSL traffic (high-speed decrypt) and then re-encrypt when proxying the traffic to a selected origin pool member. Other valid load balancer setups exist; one might include an “off load” approach to SSL, whereby the S3 nodes safely co-located in a data center may prefer to receive non-SSL HTTP S3 traffic. This may see an overall performance improvement in terms of peak bandwidth per storage node, but this comes at the tradeoff of security considerations. Experimenting with S3 Protocol and Load Balancing With all the elements in place to start understanding the behavior of S3 and spreading traffic across NetApp nodes, a quick test involved creating a S3 bucket and placing some objects in that new bucket. Buckets are logical collections of objects, conceptually not that different from folders or directories in file systems. In fact, a S3 bucket could even be mounted as a folder in an operating system such as Linux. In their simplest form, most commonly, buckets can simply serve as high-capacity, performant storage and retrieval targets for similarly themed structured or unstructured data. In the first test, we created a new bucket (“audio-clip-bucket”) and uploaded four sample files to the new bucket using S3Browser. We then zeroed the statistics for each pool member on the BIG-IP, to see if even this small upload would spread S3 traffic across more than a single NetApp device. Immediately after the upload, the counters reflect that two StorageGRID nodes were selected to receive S3 transactions. Richly detailed, per-transaction visibility can be obtained by leveraging the F5 SSL Orchestrator (SSLO) feature on the BIG-IP, whereby copies of the bi-directional S3 traffic decrypted within the load balancer can be sent to packet loggers, analytics tools, or even protocol analyzers like Wireshark. The BIG-IP also has an onboard analytics tool, Application Visibility and Reporting (AVR) which can provide some details on the nuances of the S3 traffic being proxied. AVR demonstrates the following characteristics of the above traffic, a simple bucket creation and upload of 4 objects. With AVR, one can see the URL values used by S3, which include the bucket name itself as well as transactions incorporating the object names as URLs. Also, the HTTP methods used included both GETS and PUTS. The use of HTTP PUT is expected when creating a new bucket. S3 is not governed by a typical standards body document, such as an IETF Request for Comment (RFC), but rather has evolved out of AWS and their use of S3 since 2006. For details around S3 API characteristics and nomenclature, this site can be referenced. For example, the expected syntax for creating a bucket is provided, including the fact that it should be an HTTP PUT to the root (/) URL target, with the bucket configuration parameters including name provided within the HTTP transaction body. Achieving High Performance S3 with BIG-IP and StorageGRID A common concern with protocols, such as HTTP, is head-of-line blocking, where one large, lengthy transaction blocks subsequent desired, and now queued transactions. This is one of the reasons for parallelism in HTTP 1.X, where loading 30 or more objects to paint a web page will often utilize two, four, or even more concurrent TCP sessions. Another performance issue when dealing with very large transactions is, without parallelism, even those most performant networks will see an established TCP session reach a maximum congestion window (CWND) where no more segments may be in put inflight until new TCP ACKs arrive back. Advanced TCP options like TCP exponential windowing or TCP SACK can help, but regardless of this, the achievable bandwidth of any one TCP session is bounded and may also frequently task only one core in multi-core CPUs. With the BIG-IP serving as the intermediary, large S3 transactions may default to “multi-part” uploads and downloads. The larger objects become a series of smaller objects that conveniently can be load-balanced by BIG-IP across the entire cluster of NetApp nodes. As displayed in the following diagram, we are asking for multi-part uploads to kick in for objects larger than 5 megabytes. After uploading a 20-megabyte file (technically, 20,000,000 bytes) the BIG-IP shows the traffic distributed across multiple NetApp nodes to the tune of 160.9 million bits. The incoming bits, incoming from the perspective of the origin pool members, confirm the delivery of the object with a small amount of protocol overhead (bits divided by eight to reach bytes). The value of load balancing manageable chunks of very large objects will pay dividends over time with faster overall transaction completion times due to the spreading of traffic across NetApp nodes, more TCP sessions reaching high congestion window values, and no single-core bottle necks in multicore equipment. Tuning BIG-IP for High Performance S3 Service Delivery The F5 BIG-IP offers a set of different profiles it can run its Local Traffic Manager (LTM) module in accordance with; LTM is the heart of the server load balancing function. The most performant profile in terms of attainable traffic load is the “FastL4” profile. This, and other profiles such as “OneConnect” or “FastHTTP”, can be tied to a virtual server, and details around each profile can be found here within the BIG-IP GUI: The FastL4 profile can increase virtual server performance and throughput for supported platforms by using the embedded Packet Velocity Acceleration (ePVA) chip to accelerate traffic. The ePVA chip is a hardware acceleration field programmable gate array (FPGA) that delivers high-performance L4 throughput by offloading traffic processing to the hardware acceleration chip. The BIG-IP makes flow acceleration decisions in software and then offloads eligible flows to the ePVA chip for that acceleration. For platforms that do not contain the ePVA chip, the system performs acceleration actions in software. Software-only solutions can increase performance in direct relationship to the hardware offered by the underlying host. As examples of BIG-IP virtual edition (VE) software running on mid-grade hardware platforms, results with Dell can be found here and similar experiences with HPE Proliant platforms are here. One thing to note about FastL4 as the profile to underpin a performance mode BIG-IP virtual server is that it is layer 4 oriented. For certain features that involve layer 7 HTTP related fields, such as using iRules to swap HTTP headers or perform HTTP authentication, a different profile might be more suitable. A bonus of FastL4 are some interesting specific performance features catering to it. In the BIG-IP version 17 release train, there is a feature to quickly tear down, with no delay, TCP sessions no longer required. Most TCP stacks implement TCP “2MSL” rules, where upon receiving and sending TCP FIN messages, the socket enters a lengthy TCP “TIME_WAIT” state, often minutes long. This stems back to historically bad packet loss environments of the very early Internet. A concern was high latency and packet loss might see incoming packets arrive at a target very late, and the TCP state machine would be confused if no record of the socket still existed. As such, the lengthy TIME_WAIT period was adopted even though this is consuming on-board resources to maintain the state. With FastL4, the “fast” close with TCP reset option now exists, such that any incoming TCP FIN message observed by BIG-IP will result in TCP RESETS being sent to both endpoints, normally bypassing TIME_WAIT penalties. OneConnect and FastHTTP Profiles As mentioned, other traffic profiles on BIG-IP are directed towards Layer 7 and HTTP features. One interesting profile is F5’s “OneConnect”. The OneConnect feature set works with HTTP Keep-Alives, which allows the BIG-IP system to minimize the number of server-side TCP connections by making existing connections available for reuse by other clients. This reduces, among other things, excessive TCP 3-way handshakes (Syn, Syn-Ack, Ack) and mitigates the small TCP congestion windows that new TCP sessions start with and only increases with successful traffic delivery. Persistent server-side TCP connections ameliorate this. When a new connection is initiated to the virtual server, if an existing server-side flow to the pool member is idle, the BIG-IP system applies the OneConnect source mask to the IP address in the request to determine whether it is eligible to reuse the existing idle connection. If it is eligible, the BIG-IP system marks the connection as non-idle and sends a client request over it. If the request is not eligible for reuse, or an idle server-side flow is not found, the BIG-IP system creates a new server-side TCP connection and sends client requests over it. The last profile considered is the “Fast HTTP” profile. The Fast HTTP profile is designed to speed up certain types of HTTP connections and again strives to reduce the number of connections opened to the back-end HTTP servers. This is accomplished by combining features from the TCP, HTTP, and OneConnect profiles into a single profile that is optimized for network performance. A resulting high performance HTTP virtual server processes connections on a packet-by-packet basis and buffers only enough data to parse packet headers. The performance HTTP virtual server TCP behavior operates as follows: the BIG-IP system establishes server-side flows by opening TCP connections to pool members. When a client makes a connection to the performance HTTP virtual server, if an existing server-side flow to the pool member is idle, the BIG-IP LTM system marks the connection as non-idle and sends a client request over the connection. Summary The NetApp StorageGRID multi-node S3 compatible object storage solution fits well with a high-performance server load balancer, thus making the F5 BIG-IP a good fit. S3 protocol can itself be adjusted to improve transaction response times, such as through the use of multi-part uploads and downloads, amplifying the default load balancing to now spread even more traffic chunks over many NetApp nodes. BIG-IP has numerous approaches to configuring virtual servers, from highest performance L4-focused profiles to similar offerings that retain L7 HTTP awareness. Lab testing was accomplished using the S3Browser utility and results of traffic flows were confirmed with both the standard BIG-IP GUI and the additional AVR analytics module, which provides additional protocol insight.
Steve_Gorman
Dec 15, 2025 Place Technical Articles
1.4KViews
3likes
0Comments
Using n8n To Orchestrate Multiple Agents
I’ve been heads-down building a series of AI step-by-step labs, and this one might be my favorite so far: a practical, cost-savvy “mixture of experts” architectural pattern you can run with n8n and self-hosted models on Ollama. The idea is simple. Not every prompt needs a heavyweight reasoning model. In fact, most don’t. So we put a small, fast model in front to classify the user’s request—coding, reasoning, or something else—and then hand that prompt to the right expert. That way, you keep your spend and latency down, and only bring out the big guns when you really need them. Architecture at a glance: Two hosts: one for your models (Ollama) and one for your n8n app. Keeping these separate helps n8n stay snappy while the model server does the heavy lifting. Docker everywhere, with persistent volumes for both Ollama and n8n so nothing gets lost across restarts. Optional but recommended: NVIDIA GPU on the model host, configured with the NVIDIA Container Toolkit to get the most out of inference. On the model server, we spin up Ollama and pull a small set of targeted models: deepseek-r1:1.5b for the classifier and general chit-chat deepseek-r1:7b for the reasoning agent (this is your “brains-on” model) codellama:latest for coding tasks (Python, JSON, Node.js, iRules, etc.) llama3.2:3b as an alternative generalist On the app server, we run n8n. Inside n8n, the flow starts with the “On Chat Message” trigger. I like to immediately send a test prompt so there’s data available in the node inspector as I build. It makes mapping inputs easier and speeds up debugging. Next up is the Text Classifier node. The trick here is a tight system, prompt and clear categories: Categories: Reasoning and Coding Options: When no clear match → Send to an “Other” branch Optional: You can allow multiple matches if you want the same prompt to hit more than one expert. I’ve tried both approaches. For certain, ambiguous asks, allowing multiple can yield surprisingly strong results. I attach deepseek-r1:1.5b to the classifier. It’s inexpensive and fast, which is exactly what you want for routing. In the System Prompt Template, I tell it: If a prompt explicitly asks for coding help, classify it as Coding If it explicitly asks for reasoning help, classify it as Reasoning Otherwise, pass the original chat input to a Generalist From there, each classifier output connects to its own AI Agent node: Reasoning Agent → deepseek-r1:7b Coding Agent → codellama:latest Generalist Agent (the “Other” branch) → deepseek-r1:1.5b or llama3.2:3b I enable “Retry on Fail” on the classifier and each agent. In my environment (cloud and long-lived connections), a few retries smooth out transient hiccups. It’s not a silver bullet, but it prevents a lot of unnecessary red Xs while you’re iterating. Does this actually save money? If you’re paying per token on hosted models, absolutely. You’re deferring the expensive reasoning calls until a small model decides they’re justified. Even self-hosted, you’ll feel the difference in throughput and latency. CodeLlama crushes most code-related queries without dragging a reasoning model into it. And for general questions—“How do I make this sandwich?”—A small generalist is plenty. A few practical notes from the build: Good inputs help. If you know you’re asking for code, say so. Your classifier and downstream agent will have an easier time. Tuning beats guessing. Spend time on the classifier’s system prompt. Small changes go a long way. Non-determinism is real. You’ll see variance run-to-run. Between retries, better prompts, and a firm “When no clear match” path, you can keep that variance sane. Bigger models, better answers. If you have the budget or hardware, plugging in something like Claude, GPT, or a higher-parameter DeepSeek will lift quality. The routing pattern stays the same. Where to take it next: Wire this to Slack so an engineering channel can drop prompts and get routed answers in place. Add more “experts” (e.g., a data-analysis agent or an internal knowledge agent) and expand your classifier categories. Log token counts/latency per branch so you can actually measure savings and adjust thresholds/models over time. This is a lab, not a production, but the pattern is production-worthy with the right guardrails. Start small, measure, tune, and only scale up the heavy models where you’re seeing real business value. Let me know what you build—especially if you try multi-class routing and send prompts to more than one expert. Some of the combined answers I’ve seen are pretty great. Here's the lab in our git, if you'd like to try it out for yourself. If video is more your thing, try this: Thanks for building along, and I’ll see you in the next lab.
AubreyKingF5
Nov 26, 2025 Place Technical Articles
181Views
3likes
0Comments
Managing Model Context Protocol in iRules - Part 3
In part 2 of this series, we took a look at a couple iRules use cases that do not require the json or sse profiles and don't capitalize on the new JSON commands and events introduced in the v21 release. That changes now! In this article, we'll take a look at two use cases for logging MCP activity and removing MCP tools from a servers tool list. Event logging This iRule logs various HTTP, SSE, and JSON-related events for debugging and monitoring purposes. It provides clear visibility into request/response flow and detects anomalies or errors. How it works HTTP_REQUEST Logs each HTTP request with its URI and client IP. Example: "HTTP request received: URI /example from 192.168.1.1" SSE_RESPONSE Logs when a Server-Sent Event (SSE) response is identified. Example: "SSE response detected successfully." JSON_REQUEST and JSON_RESPONSE Logs when valid JSON requests or responses are detected Examples: "JSON Request detected successfully" JSON Response detected successfully" JSON_REQUEST_MISSING and JSON_RESPONSE_MISSING Logs if JSON payloads are missing from requests or responses. Examples: "JSON Request missing." "JSON Response missing." JSON_REQUEST_ERROR and JSON_RESPONSE_ERROR Logs when there are errors in parsing JSON during requests or responses. Examples: "Error processing JSON request. Rejecting request." "Error processing JSON response." iRule: Event Logging when HTTP_REQUEST { # Log the event (for debugging) log local0. "HTTP request received: URI [HTTP::uri] from [IP::client_addr]" when SSE_RESPONSE { # Triggered when a Server-Sent Event response is detected log local0. "SSE response detected successfully." } when JSON_REQUEST { # Triggered when the JSON request is detected log local0. "JSON Request detected successfully." } when JSON_RESPONSE { # Triggered when a Server-Sent Event response is detected log local0. "JSON response detected successfully." } when JSON_RESPONSE_MISSING { # Triggered when the JSON payload is missing from the server response log local0. "JSON Response missing." } when JSON_REQUEST_MISSING { # Triggered when the JSON is missing or can't be parsed in the request log local0. "JSON Request missing." } when JSON_RESPONSE_ERROR { # Triggered when there's an error in the JSON response processing log local0. "Error processing JSON response." #HTTP::respond 500 content "Invalid JSON response from server." } when JSON_REQUEST_ERROR { # Triggered when an error occurs (e.g., malformed JSON) during JSON processing log local0. "Error processing JSON request. Rejecting request." #HTTP::respond 400 content "Malformed JSON payload. Please check your input." } MCP tool removal This iRule modifies server JSON responses by removing disallowed tools from the result.tools array while logging detailed debugging information. How it works JSON parsing and logging print procedure - recursively traverses and logs the JSON structure, including arrays, objects, strings, and other types. jpath procedure - extracts values or JSON elements based on a provided path, allowing targeted retrieval of nested properties. JSON response handling When JSON_RESPONSE is triggered: Logs the root JSON object and parses it using JSON::root. Extracts the tools array from result.tools. Tool removal logic Iterates over the tools array and retrieves the name of each tool. If the tool name matches start-notification-stream: Removes it from the array using JSON::array remove. Logs that the tool is not allowed. If the tool does not match: Logs that the tool is allowed and moves to the next one. Logging information Logs all JSON structures and actions: Full JSON structure. Extracted tools array. Tools allowed or removed. Input JSON Response { "result": { "tools": [ {"name": "start-notification-stream"}, {"name": "allowed-tool"} ] } } Modified Response { "result": { "tools": [ {"name": "allowed-tool"} ] } } iRule: Remove tool list # Code to check JSON and print in logs proc print { e } { set t [JSON::type $e] set v [JSON::get $e] set p0 [string repeat " " [expr {2 * ([info level] - 1)}]] set p [string repeat " " [expr {2 * [info level]}]] switch $t { array { log local0. "$p0\[" set size [JSON::array size $v] for {set i 0} {$i < $size} {incr i} { set e2 [JSON::array get $v $i] call print $e2 } log local0. "$p0\]" } object { log local0. "$p0{" set keys [JSON::object keys $v] foreach k $keys { set e2 [JSON::object get $v $k] log local0. "$p${k}:" call print $e2 } log local0. "$p0}" } string - literal { set v2 [JSON::get $e $t] log local0. "$p\"$v2\"" } default { set v2 [JSON::get $e $t] log local0. "$p$v2" } } } proc jpath { e path {d .} } { if { [catch {set v [call jpath2 $e $path $d]} err] } { return "" } return $v } proc jpath2 { e path {d .} } { set parray [split $path $d] set plen [llength $parray] set i 0 for {} {$i < [expr {$plen }]} {incr i} { set p [lindex $parray $i] set t [JSON::type $e] set v [JSON::get $e] if { $t eq "array" } { # array set e [JSON::array get $v $p] } else { # object set e [JSON::object get $v $p] } } set t [JSON::type $e] set v [JSON::get $e $t] return $v } # Modify in response when JSON_RESPONSE { log local0. "JSON::root" set root [JSON::root] call print $root set tools [call jpath $root result.tools] log local0. "root = $root tools= $tools" if { $tools ne "" } { log local0. "TOOLS not empty" set i 0 set block_tool "start-notification-stream" while { $i < 100 } { set name [call jpath $root result.tools.${i}.name] if { $name eq "" } { break } if { $name eq $block_tool } { log local0. "tool $name is not alowed" JSON::array remove $tools $i } else { log local0. "tool $name is alowed" incr i } } } else { log local0. "no tools" } } Conclusion This not only concludes the article, but also this introductory series on managing MCP in iRules. Note that all these commands handle all things JSON, so you are not limited to MCP contexts. We look forward to what the community will build (and hopefully share back) with this new functionality! NOTE: This series is ghostwritten. Awaiting permission from original author to credit.
JRahm
Nov 20, 2025 Place Technical Articles
231Views
2likes
0Comments
Managing Model Context Protocol in iRules - Part 2
In the first article in this series, we took a look at what Model Context Protocol (MCP) is, and how to get the F5 BIG-IP set up to manage it with iRules. In this article, we'll take a look at the first couple of use cases with session persistence and routing. Note that the use cases in this article do not require the json or sse profiles to work. That will change in part 3. Session persistence and routing This iRule ensures session persistence and traffic routing for three endpoints: /sse, /messages, and /mcp. It injects routing information (f5Session) via query parameters or headers, processes them for routing to specific pool members, and transparently forwards requests to the server. How it works Client sends HTTP GET request to SSE endpoint of server (typically /sse): GET /sse HTTP/1.1 Server responds 200 OK with an SSE event stream. It includes an SSE message with an "event" field of "endpoint", which provides the client with a URI where all its future HTTP requests must be sent. This is where servers might include a session ID: event: endpoint data: /messages?sessionId=abcd1234efgh5678 NOTE: the MCP spec does not specify how a session ID can be encoded in the endpoint here. While we have only seen use of a sessionId query parameter, theoretically a server could implement its session Ids with any arbitrary query parameter name, or even as part of the path like this: event: endpoint data: /messages/abcd1234efgh5678 Our iRule can take advantage of this mechanism by injecting a query parameter into this path that tells us which server we should persist future requests to. So when we forward the SSE message to the client, it looks something like this: event: endpoint data: /messages?f5Session=some_pool_name,10.10.10.5:8080&sessionId=abcd1234efgh5678 or event: endpoint data: /messages/abcd1234efgh5678?f5Session=some_pool_name,10.10.10.5:8080 When the client sends a subsequent HTTP request, it will use this endpoint. Thus, when processing HTTP requests, we can read the f5Session secret we inserted earlier, route to that pool member, and then remove our secret before forwarding the request to the server using the original endpoint/sessionId it provided. Load Balancing when HTTP_REQUEST { set is_req_to_sse_endpoint false # Handle requests to `/sse` (Server-Sent Event endpoint) if { [HTTP::path] eq "/sse" } { set is_req_to_sse_endpoint true return } # Handle `/messages` endpoint persistence query processing if { [HTTP::path] eq "/messages" } { set query_string [HTTP::query] set f5_sess_found false set new_query_string "" set query_separator "" set queries [split $query_string "&"] ;# Split query string into individual key-value pairs foreach query $queries { if { $f5_sess_found } { append new_query_string "${query_separator}${query}" set query_separator "&" } elseif { [string match "f5Session=*" $query] } { # Parse `f5Session` for persistence routing set pmbr_info [string range $query 10 end] set pmbr_parts [split $pmbr_info ","] if { [llength $pmbr_parts] == 2 } { set pmbr_tuple [split [lindex $pmbr_parts 1] ":"] if { [llength $pmbr_tuple] == 2 } { pool [lindex $pmbr_parts 0] member [lindex $pmbr_parts 1] set f5_sess_found true } else { HTTP::respond 404 noserver return } } else { HTTP::respond 404 noserver return } } else { append new_query_string "${query_separator}${query}" set query_separator "&" } } if { $f5_sess_found } { HTTP::query $new_query_string } else { HTTP::respond 404 noserver } return } # Handle `/mcp` endpoint persistence via session header if { [HTTP::path] eq "/mcp" } { if { [HTTP::header exists "Mcp-Session-Id"] } { set header_value [HTTP::header "Mcp-Session-Id"] set header_parts [split $header_value ","] if { [llength $header_parts] == 3 } { set pmbr_tuple [split [lindex $header_parts 1] ":"] if { [llength $pmbr_tuple] == 2 } { pool [lindex $header_parts 0] member [lindex $header_parts 1] HTTP::header replace "Mcp-Session-Id" [lindex $header_parts 2] } else { HTTP::respond 404 noserver } } else { HTTP::respond 404 noserver } } } } when HTTP_RESPONSE { # Persist session for MCP responses if { [HTTP::header exists "Mcp-Session-Id"] } { set pool_member [LB::server pool],[IP::remote_addr]:[TCP::remote_port] set header_value [HTTP::header "Mcp-Session-Id"] set new_header_value "$pool_member,$header_value" HTTP::header replace "Mcp-Session-Id" $new_header_value } # Inject persistence information into response payloads for Server-Sent Events if { $is_req_to_sse_endpoint } { set sse_data [HTTP::payload] ;# Get the SSE payload # Extract existing query params from the SSE response set old_queries [URI::query $sse_data] if { [string length $old_queries] == 0 } { set query_separator "" } else { set query_separator "&" } # Insert `f5Session` persistence information into query set new_query "f5Session=[URI::encode [LB::server pool],[IP::remote_addr]:[TCP::remote_port]]" set new_payload "?${new_query}${query_separator}${old_queries}" # Replace the payload in the SSE response HTTP::payload replace 0 [string length $sse_data] $new_payload } } Persistence when CLIENT_ACCEPTED { # Log when a new TCP connection arrives (useful for debugging) log local0. "New TCP connection accepted from [IP::client_addr]:[TCP::client_port]" } when HTTP_REQUEST { # Check if this might be an SSE request by examining the Accept header if {[HTTP::header exists "Accept"] && [HTTP::header "Accept"] contains "text/event-stream"} { log local0. "SSE Request detected from [IP::client_addr] to [HTTP::uri]" # Insert a custom persistence key (optional) set sse_persistence_key "[IP::client_addr]:[HTTP::uri]" persist uie $sse_persistence_key } } when HTTP_RESPONSE { # Ensure this is an SSE connection by checking the Content-Type if {[HTTP::header exists "Content-Type"] && [HTTP::header "Content-Type"] equals "text/event-stream"} { log local0. "SSE Response detected for [IP::client_addr]. Enabling persistence." # Use the same persistence key for the response persist add uie $sse_persistence_key } } Conclusion Thank you for your patience! Now is the time to continue on to part 3 where we'll finally get into the new JSON commands and events added in version 21! NOTE: This series is ghostwritten. Awaiting permission from original author to credit.
JRahm
Nov 19, 2025 Place Technical Articles
127Views
3likes
0Comments
Managing Model Context Protocol in iRules - Part 1
The Model Context Protocol (MCP) was introduced by Anthropic in November of 2024, and has taken the industry by storm since. MCP provides a standardized way for AI applications to connect with external data sources and tools through a single protocol, eliminating the need for custom integrations for each service and enabling AI systems to dynamically discover and use available capabilities. It's gained rapid industry adoption because major model providers and numerous IDE and tool makers have embraced it as an open standard, with tens of thousands of MCP servers built and widespread recognition that it mostly solves the fragmented integration challenge that previously plagued AI development. In this article, we'll take a look at the MCP components, how MCP works, and how you can use the JSON iRules events and commands introduced in version 21 to control the messsaging between MCP clients and servers. MCP components Host The host is the AI application where the LLM logic resides, such as Claude Desktop, AI-powered IDEs like Cursor, Open WebUI with the mcpo proxy like in our AI Step-by-Step labs, or via custom agentic systems that receive user requests and orchestrate the overall interaction. Client The client exists within the host application and maintains a one-to-one connection with each MCP server, converting user requests into the structured format that the protocol can process and managing session details like timeouts and reconnects. Server Servers are lightweight programs that expose data and functionality from external systems, whether internal databases or external APIs, allowing connections to both local and remote resources. Multiple clients can exist within a host, but each client has a dedicated (or perceived in the case of using a proxy) 1:1 relationship with an MCP server. MCP servers expose three main types of capabilities: Resources - information retrieval without executing actions Tools - performing side effects like calculations or API requests Prompts - reusable templates and workflows for LLM-server communication Message format (JSON-RPC) The transport layer between clients and servers uses JSON-RPC format for two-way message conversion, allowing the transport of various data structures and their processing rules. This enforces a consistent request/response format across all tools, so applications don't have to handle different response types for different services. Transport options MCP supports three standard transport mechanisms: stdio (standard input/output for local connections), Server-Sent Events (SSE for remote connections with separate endpoints for requests and responses), and Streamable HTTP (a newer method introduced in March 2025 that uses a single HTTP endpoint for bidirectional messaging). NOTE: SSE transport has been deprecated as of protocol version 2024-11-05 and replaced by Streamable HTTP, which addresses limitations like lack of resumable streams and the need to maintain long-lived connections, though SSE is still supported for backward compatibility. MCP workflow Pictures tell a compelling story. First, the diagram. The steps in the diagram above are as follows: The MCP client requests capabilities from the MCP server The MCP server provides a list of available tools and services the MCP client sends the question and the retrieved MCP server tools and services to the LLM The LLM specifies which tools and services to use. The MCP client calls the specific tool or service The MCP server returns the result/context to the MCP client The MCP client passes the result/context to the LLM The LLM uses the result/context to prepare the answer iRules MCP-based use cases There are a bunch of use cases for MCP handling, such as: Load-balancing of MCP traffic across MCP Servers High availability of the MCP Servers MCP message validation on behalf of MCP servers MCP protocol inspection and payload modification Monitoring the MCP Servers' health and their transport protocol status. In case of any error in MCP request and response, BIG-IP should be able to detect and report to the user Optimization Profiles Support Use OneConnect Profile Use Compression Profile Security support for MCP servers. There are no native features for this yet, but you can build your own secure business logic into the iRules logic for now. LTM profiles Configuring MCP involves creating two profiles - an SSE profile and a JSON profile - and then attaching them to a virtual server. The SSE profile is for backwards compatibility should you need it in your MCP client/server environment. The defaults for these profiles are shown below. [root@ltm21a:Active:Standalone] config # tmsh list ltm profile sse all-properties ltm profile sse sse { app-service none defaults-from none description none max-buffered-msg-bytes 65536 max-field-name-size 1024 partition Common } [root@ltm21a:Active:Standalone] config # tmsh list ltm profile json all-properties ltm profile json json { app-service none defaults-from none description none maximum-bytes 65536 maximum-entries 2048 maximum-non-json-bytes 32768 partition Common } These can be tuned down from these maximums by creating custom profiles that will meet the needs of your environment, for example (without all properties like above): [root@ltm21a:Active:Standalone] config # tmsh create ltm profile sse sse_test_env max-buffered-msg-bytes 1000 max-field-name-size 500 [root@ltm21a:Active:Standalone] config # tmsh create ltm profile json json_test_env maximum-bytes 3000 maximum-entries 1000 maximum-non-json-bytes 2000 [root@ltm21a:Active:Standalone] config # tmsh list ltm profile sse sse_test_env ltm profile sse sse_test_env { app-service none max-buffered-msg-bytes 1000 max-field-name-size 500 } [root@ltm21a:Active:Standalone] config # tmsh list ltm profile json json_test_env ltm profile json json_test_env { app-service none maximum-bytes 3000 maximum-entries 1000 maximum-non-json-bytes 2000 } NOTE: Both profiles have database keys that can be temporarily enabled for troubleshooting purposes. The keys are log.sse.level and log.json.level. You can set the value for one or both to debug. Do not leave them in debug mode! Conclusion Now that we have the laid the foundation, continue on to part 2 where we'll look at the first two use cases. NOTE: This series is ghostwritten. Awaiting permission from original author to credit.
JRahm
Nov 18, 2025 Place Technical Articles
262Views
3likes
1Comment
Accelerating AI Data Delivery with F5 BIG-IP
Introduction AI continues to rely heavily on efficient data delivery infrastructures to innovate across industries. S3 is the protocol that AL/ML engineers rely on for data delivery. As AI workloads grow in complexity, ensuring seamless and resilient data ingestion and delivery becomes critical. This will support massive datasets, robust training workflows, and production-grade outputs. S3 is HTTP-based, so F5 is commonly used to provide advanced capabilities for managing S3-compatible storage pipelines, enforcing policies, and preventing delivery failures. This enables businesses to maintain operational excellence in AI environments. This article explores three key functions of F5 BIG-IP within AI data delivery through embedded demo videos. From optimizing S3 data pipelines and enforcing granular policies to monitoring traffic health in real time, F5 presents core functions for developers and organizations striving for agility in their AI operations. The diagram shows a scalable, resilient, and secure AI architecture facilitated by F5 BIG-IP. End-user traffic is directed to the front-end application through F5, ensuring secure and load-balanced access via the "Web and API front door." This traffic interacts with the AI Factory, comprising components like AI agents, inference, and model training, also secured and scaled through F5. Data is ingested into enterprise events and data stores, which are securely delivered back to the AI Factory's model training through F5 to support optimized resource utilization. Additionally, the architecture includes Retrieval-Augmented Generation (RAG), securely backed by AI object storage and connected through F5 for AI APIs. Whether from the front-end applications or the AI Factory, traffic to downstream services like AI agents, databases, websites, or queues is routed via F5 to ensure consistency, security, and high availability across the ecosystem. This comprehensive deployment highlights F5's critical role in enabling secure, efficient AI-powered operations. 1. Ensure Resilient AI Data and S3 Delivery Pipelines with F5 BIG-IP Modern AI workflows often rely on S3-compatible storage for high-throughput data delivery. However, a common problem is inefficient resource utilization in clusters due to uneven traffic distribution across storage nodes, causing bottlenecks, delays, and reliability concerns. If you manage your own storage environment, or have spoken to a storage administrator, you’ll know that “hot spots” are something to avoid when dealing with disk arrays. In this demo, F5 BIG-IP demonstrates how a loose-coupling architecture solves these issues. By intelligently distributing traffic across all cluster nodes via a virtual server, BIG-IP ensures balanced load distribution, eliminates bottlenecks, and provides high-performance bandwidth for AI workloads. The demo uses Warp, a S3 benchmarking too, to highlight how F5 BIG-IP can take incoming S3 traffic and route it efficiently to storage clusters. We use the least-connection load balancing algorithm to minimize latency across the nodes while maximizing resource utilization. We also add new nodes to the load balancing pool, ensuring smooth, scalable, and resilient storage pipelines. 2.Enforce Policy-Driven AI Data Delivery with F5 BIG-IP AI workloads are susceptible to traffic spikes that can destabilize storage clusters and impact concurrent data workflows. The video demonstrates using iRules to cap connections and stabilize clusters under high request-per-second spikes. Additionally, we use local traffic policies to redirect specific buckets while preserving other ongoing requests. For operational clarity, the study tool visualizes real-time cluster metrics, offering deep insights into how policies influence traffic. 3.Prevent AI Data Delivery Failures with F5 BIG-IP AI operations depend on high efficiency and reliable data delivery to maintain optimal training and model fine-tuning workflows. The video demonstrates how F5 BIG-IP uses real-time health monitors to ensure storage clusters remain operational during failure scenarios. By dynamically detecting node health and write quorum thresholds, BIG-IP intelligently routes traffic to backup pools or read quorum clusters without disrupting endpoints. The health monitors also detect partial node failures, which is important to avoid risk of partial writes when working with S3 storage.. Conclusion Once again, with AI so reliant on HTTP-based S3 storage, F5 administrators find themselves as a critical part of the latest technologies. By enabling loose coupling, enforcing granular policies, and monitoring traffic health in real time, F5 optimizes data delivery for improved AI model accuracy, faster innovation, and future-proof architectures. Whether facing unpredictable traffic surges or handling partial failures in clusters, BIG-IP ensures your applications remain resilient and ready to meet business demands with ease. Related Resources AI Data Delivery Use Case AI Reference Architecture Enterprise AI delivery and security
sridharm
Nov 18, 2025 Place Technical Articles
181Views
3likes
0Comments