One Quick Step to Make your website AI-Agent/MCP Ready with an iRule
The Problem Nobody Warned You About

Here’s the thing about the AI agent explosion: GPTBot, ClaudeBot, PerplexityBot, and a dozen other crawlers are hitting your web applications today. And they’re getting back the same bloated HTML that your browser gets, complete with navigation bars, cookie banners, SVG icons, inline JavaScript, and CSS that means absolutely nothing to an LLM, other than a hit to your token usage.

These agents don’t need your <nav> with 47 links. They don’t need your cookie consent modal. They definitely don’t need 200+ lines of minified CSS and JS. They need the content: the headings, the paragraphs, the links, the data. If you, or anyone, are using an agent to access and utilize the data on the page, it’s burning through a massive number of tokens, generally ~2k per GET.

But what if your BIG-IP could intercept these requests, see that the client is an AI agent, and transform that HTML response into clean markdown before it ever leaves your network? (There is plenty of room for improvement here, and a small disclaimer at the end!)

The Approach

The iRule works in three phases across three HTTP events. Here’s the flow:

Client Request => HTTP_REQUEST (detect agent, strip Accept-Encoding)
Origin Response => HTTP_RESPONSE (check HTML, collect body)
Body Received => HTTP_RESPONSE_DATA (convert HTML => Markdown, replace body)
Client receives clean markdown with Content-Type: text/plain

Detection: Who’s an AI Agent?

This example detects agents three ways, because different agents announce themselves differently, and we want to give humans a way to trigger it too (mostly I used this for testing; notes on that later).
when HTTP_REQUEST {
    set is_ai_agent 0
    set ua [string tolower [HTTP::header "User-Agent"]]

    # The usual suspects
    if { $ua contains "gptbot" || $ua contains "chatgpt-user" ||
         $ua contains "claudebot" || $ua contains "claude-web" ||
         $ua contains "perplexitybot" || $ua contains "cohere-ai" ||
         $ua contains "google-extended" || $ua contains "applebot-extended" ||
         $ua contains "bytespider" || $ua contains "ccbot" ||
         $ua contains "amazonbot" } {
        set is_ai_agent 1
    }

    # Explicit opt-in via header
    if { [HTTP::header "X-Request-Format"] eq "markdown" } {
        set is_ai_agent 1
    }

    # Content negotiation (the standards-correct way)
    if { [HTTP::header "Accept"] contains "text/markdown" } {
        set is_ai_agent 1
    }

Why three methods? User-Agent detection handles the common crawlers automatically. The X-Request-Format header lets any client explicitly request markdown. And Accept: text/markdown is proper HTTP content negotiation, the way it should work once the ecosystem matures.

The Demo Path: /md/ Prefix

I added one more trigger that’s purely for demos:

set orig_uri [HTTP::uri]
if { $orig_uri starts_with "/md/" } {
    set is_ai_agent 1
    set new_uri [string range $orig_uri 3 end]
    if { $new_uri eq "" } { set new_uri "/" }
    HTTP::uri $new_uri
}

Visit /md/ in your browser and you get the markdown version of the upstream site. This is great for showing the capability to someone without having to modify your User-Agent string or install curl.

Preventing Compressed Responses

This one bit me during testing. And if you can believe it, Kunal Anand is the one who gave me the tip that led to the fix. If the origin returns gzip-compressed HTML, HTTP::payload gives you binary garbage. The fix:

if { $is_ai_agent } {
    HTTP::header replace "Accept-Encoding" "identity"
}

We just need to override the Accept-Encoding header on the request side so the origin sends us uncompressed HTML.
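For readers who want to experiment with the detection logic outside TMOS, the same three triggers can be simulated in plain Python. This is an illustrative sketch, not part of the iRule; the header names mirror the iRule above, but the helper function itself is hypothetical:

```python
# Illustrative Python simulation of the iRule's three detection triggers.
# `headers` is assumed to be a simple dict of request headers.
AI_UA_TOKENS = (
    "gptbot", "chatgpt-user", "claudebot", "claude-web", "perplexitybot",
    "cohere-ai", "google-extended", "applebot-extended", "bytespider",
    "ccbot", "amazonbot",
)

def is_ai_agent(headers: dict) -> bool:
    ua = headers.get("User-Agent", "").lower()
    if any(token in ua for token in AI_UA_TOKENS):
        return True                                   # known crawler UA
    if headers.get("X-Request-Format") == "markdown":
        return True                                   # explicit opt-in header
    if "text/markdown" in headers.get("Accept", ""):
        return True                                   # content negotiation
    return False
```

Running it against a few synthetic requests shows all three paths firing independently, just as the iRule's three if-blocks do.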
And I added a safety net in HTTP_RESPONSE:

when HTTP_RESPONSE {
    if { $is_ai_agent } {
        if { [HTTP::header "Content-Type"] contains "text/html" } {
            set ce [HTTP::header "Content-Encoding"]
            if { $ce ne "" } {
                if { $ce ne "identity" } {
                    set is_ai_agent 0
                    HTTP::header insert "X-Markdown-Skipped" "compressed-response"
                    return
                }
            }
        }
    }
}

If the upstream ignores our Accept-Encoding override and sends gzip anyway, we bail gracefully instead of serving corrupted content. Defense in depth!

The Conversion: Where the Magic Happens

This is HTTP_RESPONSE_DATA: the body has been collected and we have the raw HTML. Now we convert it to markdown through a series of regex passes.

Phase 1: The Multiline Problem

Tcl's . in regex doesn't match newlines. Every <script>, <style>, and <nav> block in real HTML spans multiple lines. So this won’t work:

# This silently fails on multiline <script> blocks!
regsub -all -nocase {<script[^>]*>.*?</script>} $html_body "" html_body

The fix, again another hint from Kunal: collapse all newlines to a sentinel character before stripping block elements, then restore them after:

set NL_MARK "\x01"
set html_body [string map [list "\r\n" $NL_MARK "\r" $NL_MARK "\n" $NL_MARK] $html_body]

# NOW these work, everything is one "line"
regsub -all -nocase "<script\[^>\]*>.*?</script>" $html_body "" html_body
regsub -all -nocase "<style\[^>\]*>.*?</style>" $html_body "" html_body
regsub -all -nocase "<nav\[^>\]*>.*?</nav>" $html_body "" html_body
# ... strip footer, header, noscript, svg, comments, forms, cookie banners

# Restore newlines
set html_body [string map [list $NL_MARK "\n"] $html_body]

This is the single biggest quality improvement. Without it, you get raw JavaScript and CSS bleeding into your markdown output.

Phase 2: Converting Structure

With the junk stripped and newlines restored, we convert HTML elements to markdown syntax. Here’s the key insight that took a few iterations: use [^<]* instead of .*? for tag content.

# BAD: .*? crosses newlines in Tcl and matches across multiple tags
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>(.*?)</a>} ...

# GOOD: [^<]* stops at the next tag boundary
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} ...

This matters when you have two <a> tags on adjacent lines. The .*? version matches from the first <a> opening all the way to the second </a> closing: one giant broken link. The [^<]* version correctly matches each link individually.

Here’s the conversion order (it matters):

# 1. Headings
regsub -all -nocase {<h2[^>]*>([^<]*)</h2>} $html_body "\n## \\1\n\n" html_body

# 2. Emphasis BEFORE links (so **bold** inside links works)
regsub -all -nocase {<strong[^>]*>([^<]*)</strong>} $html_body {**\1**} html_body

# 3. Links with relative URL resolution
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
    $html_body "\[\\2\](https://${http_request_host}\\1)" html_body

# 4. Tables, code, lists, paragraphs, blockquotes, images...

# 5. Strip ALL remaining tags
regsub -all {<[^>]+>} $html_body "" html_body

# 6. Decode HTML entities
regsub -all {&ldquo;} $html_body {"} html_body
regsub -all {&rsquo;} $html_body {'} html_body
# ... 20+ entity decodings

Emphasis before links is important. If you have <a href="/pricing"><strong>$149,900</strong></a>, converting emphasis first gives you <a href="/pricing">**$149,900**</a>, which then converts to [**$149,900**](/pricing). Do it the other way, and the bold markers end up orphaned.

URL Resolution

AI agents need absolute URLs. A relative link like /properties is useless to a bot that doesn’t know what host it’s talking to. We capture $http_request_host in HTTP_REQUEST and use it during link conversion:

# Relative to absolute
regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
    $html_body "\[\\2\](https://${http_request_host}\\1)" html_body

# Absolute stays absolute
regsub -all -nocase {<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>} \
    $html_body "\[\\2\](\\1)" html_body

Same treatment for images.
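To see the two-phase approach in isolation, here is a deliberately tiny Python re-creation of the pipeline. This is an illustrative sketch, not the iRule itself: the `host` parameter stands in for $http_request_host, and only a handful of the conversion passes are shown:

```python
import re

# Phase 1: collapse newlines to a sentinel so multiline <script>/<style>/<nav>
# blocks can be stripped with simple patterns; Phase 2: restore newlines and
# convert structure using [^<]*-style captures that stop at tag boundaries.
NL_MARK = "\x01"

def html_to_md(html: str, host: str = "example.com") -> str:
    s = html.replace("\r\n", NL_MARK).replace("\r", NL_MARK).replace("\n", NL_MARK)
    for block in ("script", "style", "nav"):
        s = re.sub(rf"<{block}[^>]*>.*?</{block}>", "", s, flags=re.I)
    s = s.replace(NL_MARK, "\n")                       # restore newlines
    s = re.sub(r"<h2[^>]*>([^<]*)</h2>", r"\n## \1\n\n", s, flags=re.I)
    s = re.sub(r"<strong[^>]*>([^<]*)</strong>", r"**\1**", s, flags=re.I)
    s = re.sub(r'<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>',
               rf"[\2](https://{host}\1)", s, flags=re.I)   # relative -> absolute
    s = re.sub(r"<[^>]+>", "", s)                      # strip remaining tags
    return s.strip()
```

Feeding it a fragment with a multiline script block, a heading, and a relative link produces clean markdown with the script gone and the link made absolute.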
Dynamic Table Separators

(Yet another place Kunal offered some tips.) This one is tricky to solve because of common HTML table structure standards. Markdown tables need a separator row between the header and body:

| Name | Price | Status |
|------|-------|--------|
| Unit A | $500k | Available |

The separator needs the right number of columns. We count <th> tags in the <thead> and build it dynamically (but what if there is no thead? I try to account for that, too):

set col_count 0
set thead_check $html_body
if { [regsub -nocase {<thead[^>]*>(.*?)</thead>} $thead_check "\\1" first_thead] } {
    set col_count [regsub -all -nocase {<th[^>]*>} $first_thead "" _discard]
}
if { $col_count > 0 } {
    set sep "\n|"
    for { set c 0 } { $c < $col_count } { incr c } {
        append sep "---|"
    }
    append sep "\n"
}

If we can’t count the columns in the thead, we default to 6 columns, which could still use some work. But at least we don’t end up with a hardcoded 2-column separator breaking our 5-column tables.

Performance Considerations

This iRule runs in TMM. Every CPU cycle it uses is a cycle not spent processing other connections. So I built in several guardrails (which could still be better):

Size limit: Pages over 512KB skip conversion entirely. The regex chain gets expensive on large documents, and the output quality degrades anyway.

if { $content_length > 524288 } {
    set is_ai_agent 0
    HTTP::header insert "X-Markdown-Skipped" "body-too-large"
}

Targeted Accept-Encoding: By stripping Accept-Encoding only for AI agent requests, normal browser traffic still gets compressed responses. No performance impact on human users.

Logging: Every conversion logs the byte reduction to /var/log/ltm so you can monitor the overhead:

markdown: converted 15526 bytes -> 4200 bytes (73% reduction)

What This Doesn’t Do (And I Think That’s OK)

I will be honest about the limitations:

No DOM parsing. This is regex-based conversion. Complex nested structures (a <strong> that wraps three <div>s) won't convert perfectly.
You need a real DOM parser for that, and iRules doesn't have one. I avoided using iRulesLX for this project entirely.

Multiline tags within content blocks. The newline collapse trick handles <script> and <style>, but a <p> tag with inline markup that spans lines will partially match. The [^<]* pattern helps, but it can't capture text that contains child tags.

Tables without <thead>. It detects column count from <th> tags. Tables that use plain <tr><td> with no header get a fallback separator.

For 80% of web pages, the output is surprisingly good. For the other 20%, consider iRulesLX (a Node.js sidecar with a real DOM parser) or a sideband approach with compiled-language HTML parsing.

The Complete iRule

Here it is. Attach it to your virtual server and you're done:

when HTTP_REQUEST {
    set is_ai_agent 0
    set ua [string tolower [HTTP::header "User-Agent"]]

    if { $ua contains "gptbot" || $ua contains "chatgpt-user" ||
         $ua contains "claudebot" || $ua contains "claude-web" ||
         $ua contains "perplexitybot" || $ua contains "cohere-ai" ||
         $ua contains "google-extended" || $ua contains "applebot-extended" ||
         $ua contains "bytespider" || $ua contains "ccbot" ||
         $ua contains "amazonbot" } {
        set is_ai_agent 1
    }

    if { [HTTP::header "X-Request-Format"] eq "markdown" } {
        set is_ai_agent 1
    }
    if { [HTTP::header "Accept"] contains "text/markdown" } {
        set is_ai_agent 1
    }

    set orig_uri [HTTP::uri]
    if { $orig_uri starts_with "/md/" } {
        set is_ai_agent 1
        set new_uri [string range $orig_uri 3 end]
        if { $new_uri eq "" } { set new_uri "/" }
        HTTP::uri $new_uri
    } elseif { $orig_uri eq "/md" } {
        set is_ai_agent 1
        HTTP::uri "/"
    }

    set http_request_host [HTTP::host]

    if { $is_ai_agent } {
        HTTP::header replace "Accept-Encoding" "identity"
    }
}

when HTTP_RESPONSE {
    if { $is_ai_agent } {
        if { [HTTP::header "Content-Type"] contains "text/html" } {
            set ce [HTTP::header "Content-Encoding"]
            if { $ce ne "" } {
                if { $ce ne "identity" } {
                    set is_ai_agent 0
                    HTTP::header insert "X-Markdown-Skipped" "compressed-response"
                    return
                }
            }

            set content_length [HTTP::header "Content-Length"]
            set do_collect 1
            if { $content_length ne "" } {
                if { $content_length > 524288 } {
                    set is_ai_agent 0
                    set do_collect 0
                    HTTP::header insert "X-Markdown-Skipped" "body-too-large"
                }
            }

            if { $do_collect } {
                if { $content_length ne "" } {
                    if { $content_length > 0 } {
                        HTTP::collect $content_length
                    }
                } else {
                    HTTP::collect 524288
                }
            }
        }
    }
}

when HTTP_RESPONSE_DATA {
    if { $is_ai_agent } {
        set html_body [HTTP::payload]
        set orig_size [string length $html_body]

        # Phase 1: Collapse newlines for multiline tag stripping
        set NL_MARK "\x01"
        set html_body [string map [list "\r\n" $NL_MARK "\r" $NL_MARK "\n" $NL_MARK] $html_body]
        regsub -all -nocase "<script\[^>\]*>.*?</script>" $html_body "" html_body
        regsub -all -nocase "<style\[^>\]*>.*?</style>" $html_body "" html_body
        regsub -all -nocase "<nav\[^>\]*>.*?</nav>" $html_body "" html_body
        regsub -all -nocase "<footer\[^>\]*>.*?</footer>" $html_body "" html_body
        regsub -all -nocase "<header\[^>\]*>.*?</header>" $html_body "" html_body
        regsub -all -nocase "<noscript\[^>\]*>.*?</noscript>" $html_body "" html_body
        regsub -all -nocase "<svg\[^>\]*>.*?</svg>" $html_body "" html_body
        regsub -all "<!--.*?-->" $html_body "" html_body
        regsub -all -nocase "<form\[^>\]*>.*?</form>" $html_body "" html_body

        # Phase 2: Restore newlines, convert structure
        set html_body [string map [list $NL_MARK "\n"] $html_body]
        regsub -all -nocase {<h1[^>]*>([^<]*)</h1>} $html_body "# \\1\n\n" html_body
        regsub -all -nocase {<h2[^>]*>([^<]*)</h2>} $html_body "\n## \\1\n\n" html_body
        regsub -all -nocase {<h3[^>]*>([^<]*)</h3>} $html_body "\n### \\1\n\n" html_body
        regsub -all -nocase {<h4[^>]*>([^<]*)</h4>} $html_body "\n#### \\1\n\n" html_body
        regsub -all -nocase {<strong[^>]*>([^<]*)</strong>} $html_body {**\1**} html_body
        regsub -all -nocase {<b[^>]*>([^<]*)</b>} $html_body {**\1**} html_body
        regsub -all -nocase {<em>([^<]*)</em>} $html_body {*\1*} html_body
        regsub -all -nocase {<i>([^<]*)</i>} $html_body {*\1*} html_body
        regsub -all -nocase {<a[^>]*href="(/[^"]*)"[^>]*>([^<]*)</a>} \
            $html_body "\[\\2\](https://${http_request_host}\\1)" html_body
        regsub -all -nocase {<a[^>]*href="(https?://[^"]*)"[^>]*>([^<]*)</a>} \
            $html_body "\[\\2\](\\1)" html_body
        regsub -all -nocase {<a[^>]*>([^<]*)</a>} $html_body {\1} html_body
        regsub -all -nocase {<th[^>]*>([^<]*)</th>} $html_body "| \\1 " html_body
        regsub -all -nocase {<td[^>]*>([^<]*)</td>} $html_body "| \\1 " html_body
        regsub -all -nocase {</tr>} $html_body "|\n" html_body
        regsub -all -nocase {<code>([^<]*)</code>} $html_body {`\1`} html_body
        regsub -all -nocase {<li[^>]*>([^<]*)</li>} $html_body "- \\1\n" html_body
        regsub -all -nocase {</?[uo]l[^>]*>} $html_body "\n" html_body
        regsub -all -nocase {<p[^>]*>([^<]*)</p>} $html_body "\\1\n\n" html_body
        regsub -all -nocase {<br\s*/?>} $html_body "\n" html_body
        regsub -all -nocase {<hr\s*/?>} $html_body "\n---\n\n" html_body
        regsub -all -nocase {<blockquote[^>]*>} $html_body "> " html_body
        regsub -all -nocase {</blockquote>} $html_body "\n\n" html_body
        regsub -all -nocase {<cite>([^<]*)</cite>} $html_body "-- *\\1*\n" html_body
        regsub -all {<[^>]+>} $html_body "" html_body

        # Decode HTML entities (decode &amp; last so double-escaped
        # entities aren't over-decoded)
        regsub -all {&lt;} $html_body {<} html_body
        regsub -all {&gt;} $html_body {>} html_body
        regsub -all {&quot;} $html_body {"} html_body
        regsub -all {&nbsp;} $html_body { } html_body
        regsub -all {&ldquo;} $html_body {"} html_body
        regsub -all {&rdquo;} $html_body {"} html_body
        regsub -all {&lsquo;} $html_body {'} html_body
        regsub -all {&rsquo;} $html_body {'} html_body
        regsub -all {&mdash;} $html_body {--} html_body
        regsub -all {&ndash;} $html_body {-} html_body
        regsub -all {&hellip;} $html_body {...} html_body
        regsub -all {&#[0-9]+;} $html_body {} html_body
        regsub -all {&amp;} $html_body {\&} html_body

        # Whitespace cleanup
        regsub -all {\n +} $html_body "\n" html_body
        regsub -all {\n{3,}} $html_body "\n\n" html_body
        regsub -all {([^\n])\n\n([^#\n\[>*-])} $html_body "\\1\n\\2" html_body
        set html_body [string trim $html_body]

        # Log the byte reduction described above
        log local0. "markdown: converted $orig_size bytes -> [string length $html_body] bytes"

        HTTP::payload replace 0 [HTTP::payload length] $html_body
        HTTP::header replace "Content-Type" "text/plain; charset=utf-8"
        HTTP::header replace "Content-Length" [string length $html_body]
        HTTP::header insert "X-Markdown-Source" "bigip-irule"
    }
}

Testing / Demoing It

# Normal browser request, HTML as usual
curl https://your-site.example.com/

# AI agent simulation
curl -H "User-Agent: GPTBot/1.0" https://your-site.example.com/

# Explicit markdown request
curl -H "X-Request-Format: markdown" https://your-site.example.com/

# Browser-friendly demo (or just visit it in your browser)
curl https://your-site.example.com/md/

What's Next

This is a solid starting point for making your existing sites AI-agent ready without touching application code. A few directions to take it:

Agent discovery files: serve /llms.txt and /.well-known/ai-plugin.json so agents can programmatically discover your markdown capability.

iRulesLX upgrade path: when regex-based conversion isn't enough, move the HTML parsing to a Node.js sidecar with a real DOM parser (cheerio, jsdom). Same detection logic, better conversion quality.

The AI agent wave isn't coming. It’s here. Your BIG-IP already sees every request. Might as well make those responses useful.

Disclaimer!

The iRule in this article was developed as part of a proof-of-concept for edge-layer HTML-to-Markdown conversion. It's been tested on BIG-IP 17.5.1+. Your mileage may vary on complex single-page applications, but for content-heavy sites, it works remarkably well for something that's "just regex."

DNS: The Protocol Nobody Sees... Until It Becomes the Biggest Problem in Your Architecture
In day-to-day operations, DNS is rarely at the center of the conversation. It's not usually the star of roadmaps, nor the first thing reviewed in a new design. It's often assumed to "already be there," working, resolving names without much friction. And that is precisely what makes it dangerous.

After years working with enterprise and Service Provider architectures, one constant is hard to ignore: DNS is one of the most critical components of modern infrastructure and, at the same time, one of the most underestimated. When everything works, nobody asks about it. When something fails, nothing else matters. I've seen applications "up," stable links, firewalls in the green... and users unable to access a service. The problem wasn't in the application or the network, but in that first step we all take for granted: resolving a name.

DNS in Real Life: When Theory Meets Operations

On paper, DNS looks simple, but in production it isn't. In real architectures, DNS actively participates in critical decisions such as:

- Which site or region a user is directed to
- Which backend receives a request
- How load is distributed across multiple data centers
- How quickly a service recovers from degradation or an outage

In Service Providers and large organizations, this becomes evident during unexpected events. An earthquake, high-impact news, a global sporting event, an urgent application update, an app that goes viral, a government announcement, or a health emergency can spike digital traffic within minutes. In these scenarios, the first significant growth happens not in HTTP or TLS, but in DNS queries. Millions of users open applications, refresh portals, and generate simultaneous resolutions to the same domains.
If DNS isn't prepared to absorb that spike:

- The application never receives traffic
- The load balancers never get a chance to participate
- The user experience degrades immediately

This is where solutions like F5 BIG-IP DNS stop being a "name service" and become an active traffic-control component, capable of making decisions based on:

- Real service state
- Backend health
- Availability and continuity policies

The AI Boom and Its Direct Impact on DNS

The massive arrival of Artificial Intelligence hasn't just changed how applications are consumed; it has radically changed traffic patterns. Modern AI architectures introduce:

- More decoupled microservices
- More internal and external APIs
- Ephemeral endpoints that appear and disappear
- Dynamic, frequent resolutions

Every call to a model, every distributed inference, and every backend that scales automatically begins with a DNS query. But the impact goes far beyond theory. The 2025 data confirms it emphatically: according to Humansecurity, traffic generated by artificial intelligence systems nearly tripled in a single year, growing 187% between January and December. And within that universe, the fastest-growing segment was autonomous AI agents: bots that no longer just read the web, but interact with it. They browse, query APIs, and complete transactions. That type of traffic grew more than 7,800% year over year.

What does this mean for DNS? Every time an AI agent resolves a query, looks up an endpoint, or scales a service, there is a DNS resolution behind it. And these are not predictable patterns. AI traffic behaves differently from human traffic: it is burstier, less tied to predictable schedules, and growing at a rate eight times higher than human-generated traffic.
Add to this that AI crawlers, the systems that scan content for training, search, and real-time actions, have become one of the largest sources of automated traffic on the internet. During 2025, Google's crawler alone generated roughly 4.5% of all HTML requests to sites protected by the major security platforms. And "user-action" crawling (when a user asks a chatbot for something and it goes out to fetch information from the web) grew more than 15x over the same period.

From a technical perspective, the implications are clear:

- Sustained, accelerating growth in DNS query volume
- Greater sensitivity to latency and jitter in resolution
- Greater impact from partial or intermittent failures
- Traffic patterns that are less predictable and harder to plan for

DNS stops being "static" infrastructure and becomes a dynamic component of the data plane, intimately tied to the performance of AI models and services. And anyone not sizing their DNS infrastructure with this variable in mind will feel the impact sooner rather than later.

DNS as a Critical Point in Global Events

Over the years, multiple events have demonstrated DNS's impact at global scale. From massive digital service outages to amplification attacks, the pattern is consistent: when DNS is affected, the impact multiplies. And you don't have to look far for examples. In October 2025 alone, two incidents made very clear what happens when DNS fails in real production environments.

In the first, an error in the automated DNS management system of one of the major public cloud platforms left critical services unreachable in the largest region of its infrastructure. The servers were operational, the databases intact... but DNS made them invisible.
The failure cascaded to more than a hundred services, affected thousands of companies in more than 60 countries, and generated millions of outage reports within hours. Messaging, gaming, streaming, financial services, and even government applications went offline. It wasn't an attack. It was a DNS automation error.

Days later, a European telecommunications provider with millions of fixed-line customers experienced a DNS resolution failure that left much of its user base without internet access for several hours. Observed traffic dropped more than 75%. And the root cause was, once again, a DNS resolution problem: not a fiber cut, not a DDoS attack, not a power outage.

F5 Labs has extensively documented how:

- Volumetric attacks use DNS as a primary vector
- DNS amplification remains an effective technique
- Global events generate anomalous traffic patterns that expose design weaknesses

These scenarios don't discriminate by industry or size. They affect service providers, digital platforms, critical infrastructure, and enterprise applications alike. The difference isn't whether they happen, but how prepared the architecture is to absorb them without collapsing.

Observability First: Understand Before You React

For a long time, DNS was a black box. It worked... or it didn't. Today, the conversation changes completely when we bring real observability into the equation. F5 Insight, a key component of the F5 ADSP platform, makes it possible to:

- Analyze real query patterns
- Identify atypical spikes and anomalous behavior
- Correlate external events with DNS traffic
- Understand the why behind the system's behavior

But F5 Insight goes beyond dashboards and metrics.
It incorporates artificial intelligence to offer proactive guidance, anomaly detection, and something that changes how you operate: the ability to interact with your operational data in natural language, through LLM integration and the Model Context Protocol (MCP). In practice, this means going from scanning dashboards for what happened to receiving operational narratives that tell you what is happening, why, and what you should prioritize.

For environments with a high density of DNS queries, where patterns change in minutes and spikes can be unpredictable, especially with the growth of AI traffic, that leap from reactive to proactive isn't a luxury; it's an operational necessity. In high-QPS scenarios, the problem isn't always capacity but how the traffic behaves. Seeing DNS as data, not just as a service, makes an enormous difference in operations and planning.

Platform and Architecture: When DNS Is No Longer "Basic"

When DNS becomes critical, the platform matters. Architectures based on VELOS make it possible to decouple capacity, resilience, and growth from the application lifecycle, supporting massive DNS loads with isolation, high availability, and orderly scaling. rSeries, for its part, offers a more compact and efficient model, ideal for deployments where per-unit performance, latency, and operational efficiency are key. And when the application no longer lives in a single data center, F5 Distributed Cloud Services extends DNS capabilities to hybrid and multicloud architectures, maintaining:

- Consistent policies
- Unified visibility
- Control from the edge to the backend

The point isn't where DNS runs, but that it follows the application without losing control, resilience, or observability.

Security Integrated from the First Packet

DNS is also an attack surface. We have confirmed this time and again during global security events.
And the landscape isn't getting simpler: during 2025, more than 100 million new domains were identified, a quarter of which were classified as malicious or suspicious. Attackers are registering unprecedented volumes of domains, using automation to mount massive campaigns that evade traditional defenses. Ignoring DNS from a security standpoint means leaving a door open. Integrating protection into the same delivery platform makes it possible to:

- Mitigate attacks without introducing additional latency
- Apply policies consistently
- Maintain unified visibility of traffic

Security, performance, and availability are not separate layers; they are parts of the same flow.

Sometimes We Forget

DNS rarely fails dramatically. It fails silently... and that's what makes it so dangerous. Investing in its design, observability, and security isn't over-engineering. It's understanding that everything starts there. In a world where AI is tripling automated traffic volume year over year, where autonomous agents generate DNS resolutions at a pace that didn't exist 18 months ago, and where a single DNS automation error can knock thousands of companies offline in more than 60 countries... DNS reclaims the place it always had in the architecture, even if many have forgotten it.

The question isn't whether DNS is critical. The question is whether your architecture is ready to treat it that way.

Context Cloak: Hiding PII from LLMs with F5 BIG-IP
The Story

As I dove deeper into the world of AI -- MCP servers, LLM orchestration, tool-calling models, agentic workflows -- one question kept nagging me: how do you use the power of LLMs to process sensitive data without actually exposing that data to the model?

Banks, healthcare providers, government agencies -- they all want to leverage AI for report generation, customer analysis, and workflow automation. But the data they need to process is full of PII: Social Security Numbers, account numbers, names, phone numbers. Sending that to an LLM (whether cloud-hosted or self-hosted) creates a security and compliance risk that most organizations can't accept.

I've spent years working with F5 technology, and when I learned that BIG-IP TMOS v21 added native support for the MCP protocol, the lightbulb went on. BIG-IP already sits in the data path between clients and servers. It already inspects, transforms, and enforces policy on HTTP traffic. What if it could transparently cloak PII before it reaches the LLM, and de-cloak it on the way back? That's Context Cloak.

The Problem

An analyst asks an LLM: "Generate a financial report for John Doe, SSN 078-05-1120, account 4532-1189-0042." The LLM now has real PII. Whether it's logged, cached, fine-tuned on, or exfiltrated -- that data is exposed. Traditional approaches fall short:

| Approach | What Happens | The Issue |
|---|---|---|
| Masking (****) | LLM can't see the data | Can't reason about what it can't see |
| Tokenization (<<SSN:001>>) | LLM sees placeholders | Works with larger models (14B+); smaller models may hallucinate |
| Do nothing | LLM sees real PII | Security and compliance violation |

The Solution: Value Substitution

Context Cloak takes a different approach -- substitute real PII with realistic fake values:

John Doe --> Maria Garcia
078-05-1120 --> 523-50-6675
4532-1189-0042 --> 7865-4412-3375

The LLM sees what looks like real data and reasons about it naturally. It generates a perfect financial report for "Maria Garcia."
On the way back, BIG-IP swaps the fakes back to the real values. The user sees a report about John Doe. The LLM never knew John Doe existed. This is conceptually a substitution cipher -- every real value maps to a consistent fake within the session, and the mapping is reversed transparently.

When I was thinking about this concept, my mind kept coming back to James Veitch's TED talk about messing with email scammers. Veitch tells the scammer they need to use a code for security:

Lawyer --> Gummy Bear
Bank --> Cream Egg
Documents --> Jelly Beans
Western Union --> A Giant Gummy Lizard

The scammer actually uses the code. He writes back: "I am trying to raise the balance for the Gummy Bear so he can submit all the needed Fizzy Cola Bottle Jelly Beans to the Creme Egg... Send 1,500 pounds via a Giant Gummy Lizard." The real transaction details -- the amounts, the urgency, the process -- all stayed intact. Only the sensitive terms were swapped. The scammer didn't even question it.

That idea stuck with me -- what if we could do the same thing to protect PII from LLMs? But rotate the candy -- so it's not a static code book, but a fresh set of substitutions every session.

Watch the talk: https://www.ted.com/talks/james_veitch_this_is_what_happens_when_you_reply_to_spam_email?t=280

Why BIG-IP?
F5 BIG-IP was the natural candidate:

- Already in the data path -- BIG-IP is a reverse proxy that organizations already deploy
- MCP protocol support -- TMOS v21 added native MCP awareness via iRules
- iRules -- Tcl-based traffic manipulation for real-time HTTP payload inspection and rewriting
- Subtables -- in-memory key-value storage, perfect for session-scoped cloaking maps
- iAppLX -- deployable application packages with REST APIs and web UIs
- Trust boundary -- BIG-IP is already the enforcement point for SSL, WAF, and access control

How Context Cloak Works

1. An analyst asks a question in Open WebUI
2. Open WebUI calls MCP tools through the BIG-IP MCP Virtual Server
3. The MCP server queries Postgres and returns real customer data (name, SSN, accounts, transactions)
4. BIG-IP's MCP iRule scans the structured JSON response, extracts PII from known field names, generates deterministic fakes, and stores bidirectional mappings in a session-keyed subtable. The response passes through unmodified so tool chaining works.
5. Open WebUI receives real data and composes a prompt
6. When the prompt goes to the LLM through the BIG-IP Inference VS, the iRule uses [string map] to swap every real PII value with its fake counterpart
7. The LLM generates its response using fake data
8. BIG-IP intercepts the response and swaps fakes back to reals. The analyst sees a report about John Doe with his real SSN and account numbers.

Two Cloaking Modes

Context Cloak supports two modes, configurable per PII field:

Substitute Mode

Replaces PII with realistic fake values. Names come from a deterministic pool, numbers are digit-shifted, emails are derived. The LLM reasons about the data naturally because it looks real.

John Doe --> Maria Garcia (name pool)
078-05-1120 --> 523-50-6675 (digit shift +5)
4532-1189-0042 --> 7865-4412-3375 (digit shift +3)
john@email.com --> maria.g@example.net (derived)

Best for: fields the LLM needs to reason about naturally -- names in reports, account numbers in summaries.
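As a thought experiment, substitute mode can be simulated in a few lines of Python. This is an illustrative sketch, not the shipped iRule: a per-session dict stands in for the BIG-IP subtable, and the name pool and shift amount are assumptions (the +5 shift happens to reproduce the SSN example above):

```python
# Minimal sketch of substitute-mode cloaking. The bidirectional dicts play
# the role of the session-keyed subtable; cloak/decloak mimic the
# [string map] swap on the request and response paths.
NAME_POOL = ["Maria Garcia", "Maria Thompson", "Alex Chen"]  # assumed pool

def digit_shift(value: str, shift: int) -> str:
    """Shift every digit by a fixed amount (mod 10), keeping separators."""
    return "".join(str((int(c) + shift) % 10) if c.isdigit() else c
                   for c in value)

class SessionCloak:
    def __init__(self):
        self.real_to_fake = {}
        self.fake_to_real = {}
        self.names_used = 0

    def register(self, real: str, kind: str) -> str:
        """Assign (once per session) a consistent fake for a real value."""
        if real not in self.real_to_fake:
            if kind == "name":
                fake = NAME_POOL[self.names_used % len(NAME_POOL)]
                self.names_used += 1
            else:                       # numeric PII: digit shift
                fake = digit_shift(real, 5)
            self.real_to_fake[real] = fake
            self.fake_to_real[fake] = real
        return self.real_to_fake[real]

    def cloak(self, text: str) -> str:
        for real, fake in self.real_to_fake.items():
            text = text.replace(real, fake)
        return text

    def decloak(self, text: str) -> str:
        for fake, real in self.fake_to_real.items():
            text = text.replace(fake, real)
        return text
```

Registering "John Doe" and his SSN, cloaking a prompt, and de-cloaking the result round-trips cleanly: the LLM-facing text contains only fakes, and the user-facing text contains only reals.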
Tokenize Mode

Replaces PII with structured placeholders:

078-05-1120 --> <<SSN:32.192.169.232:001>>
John Doe --> <<name:32.192.169.232:001>>
4532-1189-0042 --> <<digit_shift:32.192.169.232:001>>

A guidance prompt is automatically injected into the LLM request, instructing it to reproduce the tokens exactly as-is. Larger models (14B+ parameters) handle this reliably; smaller models (7B) may struggle.

Best for: defense-in-depth with F5 AI Guardrails. The tokens are intentionally distinctive -- if one leaks through de-cloaking, a guardrails policy can catch it.

Both modes can be mixed per-field in the same request.

The iAppLX Package

Context Cloak is packaged as an iAppLX extension -- a deployable application on BIG-IP with a REST API and web-based configuration UI. When deployed, it creates all required BIG-IP objects: data groups, iRules, HTTP profiles, SSL profiles, pools, monitors, and virtual servers.

The PII Field Configuration is the core of Context Cloak. The admin selects which JSON fields in MCP responses contain PII and chooses the cloaking mode per field:

| Field          | Aliases       | Mode       | Type / Label |
|----------------|---------------|------------|--------------|
| full_name      | customer_name | Substitute | Name Pool    |
| ssn            |               | Tokenize   | SSN          |
| account_number |               | Substitute | Digit Shift  |
| phone          |               | Substitute | Phone        |
| email          |               | Substitute | Email        |

The iRules are data-group-driven -- no PII field names are hardcoded. Change the data group via the GUI, and the cloaking behavior changes instantly. This means Context Cloak works with any MCP server, not just the financial demo.

Live Demo

Enough theory -- here's what it looks like in practice.
Step 1: Install the RPM

Installing Context Cloak via BIG-IP Package Management LX

Step 2: Configure and Deploy

Context Cloak GUI -- MCP server, LLM endpoint, PII fields, one-click deploy
Deployment output showing session config and saved configuration

Step 3: Verify Virtual Servers

BIG-IP Local Traffic showing MCP VS and Inference VS created by Context Cloak

Step 4: Baseline -- No Cloaking

Without Context Cloak: real PII flows directly to the LLM in cleartext

This is the "before" picture. The LLM sees everything: real names, real SSNs, real account numbers.

Demo 1: Substitute Mode -- SSN Lookup

Prompt: "Show me the SSN number for John Doe. Just display the number."

Substitute mode -- Open WebUI + Context Cloak GUI showing all fields as Substitute

Result: User sees real SSN 078-05-1120. LLM saw a digit-shifted fake.

Demo 2: Substitute Mode -- Account Lookup

Prompt: "What accounts are associated to John Doe?"

Left: Open WebUI with real data. Right: vLLM logs showing "Maria Garcia" with fake account numbers

What the LLM saw:

"customer_name": "Maria Garcia"
"account_number": "7865-4412-3375" (checking)
"account_number": "7865-4412-3322" (investment)
"account_number": "7865-4412-3376" (savings)

What the user saw:

Customer: John Doe
Checking: 4532-1189-0042 -- $45,230.18
Investment: 4532-1189-0099 -- $312,500.00
Savings: 4532-1189-0043 -- $128,750.00

Switching to Tokenize Mode

Changing PII fields from Substitute to Tokenize in the GUI

Demo 3: Mixed Mode -- Tokenized SSN

SSN set to Tokenize, name set to Substitute. Prompt: "Show me the SSN number for Jane Smith. Just display the number."

Mixed mode -- real SSN de-cloaked on left, <<SSN:...>> token visible in vLLM logs on right

What the LLM saw:

"customer_name": "Maria Thompson"
"ssn": "<<SSN:32.192.169.232:001>>"

What the user saw: Jane Smith, SSN 219-09-9999

Both modes operating on the same customer record, in the same request.

Demo 4: Full Tokenize -- The Punchline

ALL fields set to Tokenize mode.
Prompt: "Show me the SSN and account information for Carlos Rivera. Display all the numbers."

Full tokenize -- every PII field as a token, all de-cloaked on return

What the LLM saw -- every PII field was a token:

"full_name": "<<name:32.192.169.232:001>>"
"ssn": "<<SSN:32.192.169.232:002>>"
"phone": "<<phone:32.192.169.232:002>>"
"email": "<<email:32.192.169.232:001>>"
"account_number": "<<digit_shift:32.192.169.232:002>>" (checking)
"account_number": "<<digit_shift:32.192.169.232:003>>" (investment)
"account_number": "<<digit_shift:32.192.169.232:004>>" (savings)

What the user saw -- all real data restored:

Name: Carlos Rivera
SSN: 323-45-6789
Checking: 6789-3345-0022 -- $89,120.45
Investment: 6789-3345-0024 -- $890,000.00
Savings: 6789-3345-0023 -- $245,000.00

And here's the best part. Qwen's last line in the response: "Please note that the actual numerical values for the SSN and account numbers are masked due to privacy concerns."

The LLM genuinely believed it showed the user masked data. It apologized for the "privacy masking" -- not knowing that BIG-IP had already de-cloaked every token back to the real values. The user saw the full, real, unmasked report.

What's Next: F5 AI Guardrails Integration

Context Cloak's tokenize mode is designed to complement F5 AI Guardrails. The <<TYPE:ID:SEQ>> format is intentionally distinctive -- if any token leaks through de-cloaking, a guardrails policy can catch it as a pattern match violation.

The vision: Context Cloak as the first layer of defense (PII never reaches the LLM), AI Guardrails as the safety net (catches anything that slips through). Defense in depth for AI data protection.
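As a thought experiment, the tokenize/de-cloak round trip and a guardrails-style pattern match can be sketched together. The token format follows the <<TYPE:ID:SEQ>> examples in the demos; the regex is only a hypothetical illustration of what a pattern-match policy might look for, not an actual AI Guardrails rule:

```python
import re
from itertools import count

# Sketch only: generate <<TYPE:ID:SEQ>> tokens for a session,
# reverse them on the response, and flag any token that leaks.
def make_tokenizer(session_id):
    seq = count(1)
    mapping = {}  # real value -> token
    def tokenize(pii_type, value):
        if value not in mapping:
            mapping[value] = f"<<{pii_type}:{session_id}:{next(seq):03d}>>"
        return mapping[value]
    return tokenize, mapping

def decloak(text, mapping):
    for real, token in mapping.items():
        text = text.replace(token, real)
    return text

# The distinctive token shape makes leaks easy to catch downstream
TOKEN_RE = re.compile(r"<<\w+:[\d.]+:\d{3}>>")

tokenize, mapping = make_tokenizer("32.192.169.232")
masked = tokenize("SSN", "078-05-1120")
print(masked)                    # <<SSN:32.192.169.232:001>>
print(decloak(masked, mapping))  # 078-05-1120
print(TOKEN_RE.findall("leaked: " + masked))
```

The last line is the safety-net idea in miniature: if de-cloaking ever misses a token, the regex still finds it in the final response.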
Other areas I'm exploring:

- Hostname-based LLM routing -- BIG-IP as a model gateway with per-route cloaking policies
- JSON profile integration -- native BIG-IP JSON DOM parsing instead of regex
- Auto-discovery of MCP tool schemas for PII field detection
- Centralized cloaking policy management across multiple BIG-IP instances

Try It Yourself

The complete project is open source: https://github.com/j2rsolutions/f5_mcp_context_cloak

The repository includes Terraform for AWS infrastructure, Kubernetes manifests, the iAppLX package (RPM available in Releases), iRules, sample financial data, a test script, comprehensive documentation, and a full demo walkthrough with GIFs (see docs/demo-evidence.md).

A Note on Production Readiness

I want to be clear: this is a lab proof-of-concept. I have not tested this in a production environment. The cloaking subtable stores PII in BIG-IP memory, the fake name pool is small (100 combinations), the SSL certificates are self-signed, and there's no authentication on the MCP server. There are edge cases around streaming responses, subtable TTL expiry, and LLM-derived values that need more work.

But the core concept is proven: BIG-IP can transparently cloak PII in LLM workflows using value substitution and tokenization, and the iAppLX packaging makes it deployable and configurable without touching iRule code.

I'd love to hear what the community thinks. Is this approach viable for your use cases? What PII types would you need to support? How would you handle the edge cases? What would it take to make this production-ready for your environment? Let me know in the comments -- and if you want to contribute, PRs are welcome!
Demo Environment

- F5 BIG-IP VE v21.0.0.1 on AWS (m5.xlarge)
- Qwen 2.5 14B Instruct AWQ on vLLM 0.8.5 (NVIDIA L4, 24GB VRAM)
- MCP Server: FastMCP 1.26 + PostgreSQL 16 on Kubernetes (RKE2)
- Open WebUI v0.8.10
- Context Cloak iAppLX v0.2.0

References

- Managing MCP in iRules -- Part 1
- Managing MCP in iRules -- Part 2
- Managing MCP in iRules -- Part 3
- Model Context Protocol Specification
- James Veitch: This is what happens when you reply to spam email (TED, skip to 4:40)

Scality RING and F5 BIG-IP: High-Performance S3 Object Storage
The load balancing of F5 BIG-IP, both locally within a site as well as for global traffic steering to an optimal site around large geographies, works effectively with Scality RING, a modern and massively scalable object storage solution. The RING architecture takes an innovative "bring-your-own Linux" approach to turning highly performant servers, equipped with ample disks, into a resilient, durable storage solution. The BIG-IP can scale in lock step with offered S3 access loads, for use cases like AI data delivery for model training, keeping any single RING node from becoming a hot spot with pioneering load balancing algorithms like "Least Connections" or "Fastest", to name just a couple.

From a global server load balancing perspective, BIG-IP DNS can apply similar advanced logic, for instance, steering S3 traffic to the optimal RING site, taking into consideration the geographic locale of the traffic source or leveraging on-going latency measurements from these traffic source sites.

Scality RING – High Capacity and Durability for Today's Object Storage

The Scality solution is well known for the ability to grow the capacity of an enterprise's storage needs with agility; simply license the usable storage needed today and upgrade on an as-needed basis as business warrants. RING supports both object and file storage; however, the focus of this investigation is object. Industry drivers of object storage growth include its prevalence in AI model training, specifically for content accrual, which will in-turn feed GPUs, as well as data lakehouse implementations. There is an extremely long-tailed distribution of other use cases, such as video clip retention in the media and entertainment industry, medical imaging repositories, updates to traditional uses like NAS offload to S3, and the evolution of enterprise storage backups.

At the very minimum, a 3-node site, with 200 TB of storage, serves as a starting point for a RING implementation.
The underlying servers typically run RHEL 9 or Rocky Linux on x86 or AMD architectures, and a representative server offers disk bays, front or back, loaded with anywhere from 10 to dozens of disk units. Generally, S3 objects are stored on spinning hard disk drives (HDD), while the corresponding metadata warrants inclusion of a subset of flash drives in a typical Scality deployment. A representative diagram of BIG-IP in support of a single RING site would be as follows.

One of the known attributes of a well-engineered RING solution is 100 percent data availability. In industry terms, this is an RPO (recovery point objective) of zero, meaning that no data is lost between the moment a failure occurs and the moment the system is restored to its last known good state. This is achieved through means like multiple nodes, multiple disks, and often multiple sites, combining replication for small objects -- such as retaining 2 or 3 copies of objects smaller than 60 kilobytes -- with erasure coding (EC) for larger objects.

Erasure coding is a nuanced topic within the storage industry. Scality uses a sophisticated take on erasure coding known as ARC (Advanced Resiliency Coding). In alignment with availability is the durability of data that can be achieved through RING. This is to say, how "intact" can I believe my data at rest is? The Scality solution is a fourteen-9's solution, exceeding most other advertised values, including that of AWS. What the 9's correspond to in terms of downtime in a single year can be found here, although it is telling that Wikipedia, as of early 2026, does not even provide calculations beyond twelve 9's.

Finally, in keeping with sound information lifecycle management (ILM), the Scality site may offer an additional server running XDM (eXtended Data Management) to act as a bridge between on-premises RING and public clouds such as AWS and Azure.
This allows a tiering approach, where older, "cold" data is moved off-site. Archive-to-tape solutions are also available options.

Scality – Quick Overview of Data at Rest Protection

The two principal approaches to protecting data in large single- or multi-site RING deployments are replication and erasure coding, used in combination. Replication is simple to understand: for smaller objects, an operator simply chooses the number of replicas desired. If two replicas are chosen, indicated by class of service (COS) 2, two copies are spread across nodes. For COS 3, three copies are spread across nodes. A frequent rule of thumb is a three percent rule, this being the fraction of files, frequently those of 60 kilobytes or less, across a full object storage environment that are to be replicated; replicas are available in cases of hardware disruptions with a given node.

Erasure coding is an adjustable technique where larger objects are divided into data chunks, sometimes called data shards or data blocks, and spread (or "striped") across many nodes. To add resilience, in the case of one or even more hardware issues with nodes or disks within nodes, additional parity chunks are mathematically derived. This way, cleverly and by design, only a subset of the data chunks and parity chunks are required in a solution under duress, and the original object is still easily provided upon an S3 request.

In smaller node deployments, it is possible to consider a single RING server as two entities by dividing storage into two "disk groups." However, for an ideal, larger RING site, the approach depicted is preferred. The erasure coding depicted, normally referred to with the nomenclature EC(9,3), leads into a deeper design consideration where storage overhead is traded off against data resiliency. In the diagram, as many as 3 nodes holding portions of the data could become unreachable and still the erasure-coded object would be available.
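The storage-overhead arithmetic behind these schemes is simple to sanity-check: for EC(k, m), the extra storage is m/k of the original data, while n-copy replication costs n - 1 extra copies. A quick illustrative check:

```python
# Storage-overhead sanity check for the protection schemes above.
def ec_overhead(data_chunks, parity_chunks):
    # EC(k, m): parity adds m/k on top of the original data
    return parity_chunks / data_chunks

def replication_overhead(copies):
    # COS n keeps n copies: n - 1 copies' worth of overhead
    return copies - 1

print(round(ec_overhead(9, 3) * 100))   # EC(9,3): 33 percent
print(round(ec_overhead(8, 4) * 100))   # EC(8,4): 50 percent
print(replication_overhead(3) * 100)    # COS 3: 200 percent
```

This makes the trade-off concrete: replication's hundreds of percent of overhead is tolerable only because it applies to the small-object fraction, while erasure coding keeps the overhead on the bulk of the data in the tens of percent.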
The overhead can be considered 33 percent, as 3 additional parity chunks were created and stored beyond the 9 data chunks. For more risk-averse operators, an EC of, say, EC(8,4) would allow even more, four points of failure. The trade-off would be, in this case, a 50 percent overhead to achieve that increased resiliency. The overhead is still much less than replication, which can see hundreds of percent in overhead, thus the logical choice to use replication for only small objects.

Together, replication and EC lead to an overall storage efficiency number. Considering a 3 percent small-objects environment, an EC(9,3) and COS3 for replication might tactically lead to a long-term palatable data protection posture, all for only a total cost of 41 percent additional storage overhead. The ability to scale out and protect the S3 data in flight is the domain of BIG-IP and what we will review next.

BIG-IP – Bring Scale and Traffic Control to Scality RING

A starting point for any discussion around BIG-IP are the rich load balancing algorithms and the ability to drop unhealthy nodes from an origin pool, transparent to users who only interact with the configured virtual server. Load balancing for S3 involves avoiding "hot spots", where a single RING node might otherwise be overly tasked by users directly communicating with it, all while other nodes remain vastly underutilized. By steering DNS resolution of S3 services to BIG-IP, and configured virtual servers, traffic can be spread across all healthy nodes in accordance with interesting algorithms. Popular ones for S3 include:

- Least Connections – RING nodes with fewer established TCP connections will receive proportionally more of the new S3 transactions, towards a goal of balanced load in the server cluster.
- Ratio (member) – Although sound practice would be all RING members having similar compute and storage makeup, in some cases, perhaps two vintages of server exist.
Ratio will allow proportionally more traffic to target newer, more performant classes of Scality nodes.
- Fastest (Application) – The number of "in progress" transactions any one server in a pool is handling is considered. If traffic steered to all members is generally similar over time, a member with the least number of transactions actively in progress will be considered a faster member in the pool, and new transactions can favor such low-latency servers.

The RING nodes are contacted through Scality "S3 Connectors"; in an all-object deployment the connector resides on the storage node itself. For some configurations, perhaps one with file-based protocols like NFS concurrently running, the S3 Connectors can also be installed on VMs or 1U appliances.

Of course, an unhealthy node should be precluded from an origin pool, and low-impact HTTP-based health monitors, like the HTTP HEAD method to see if an endpoint is responsive, are frequently used. With BIG-IP Extended Application Validation (EAV) one can move towards even more sophisticated health checks. An S3 access and secret token pair installed on BIG-IP can be harnessed to perpetually upload and download small objects to each pool member, assuring the BIG-IP administrator that S3 is unequivocally healthy with each pool member.

BIG-IP – Control-Plane and Data-Plane Safeguards

A popular topic in a Scality software-defined distributed storage solution is that of a noisy neighbor when multiple tenants are considered. Perhaps one tenant has an S3 application which consumes disproportionate amounts of shared resources (CPU, network, or disk I/O), degrading performance for other tenants; controls are needed to counter this. With BIG-IP, a simple control-plane threshold can be invoked with a straightforward iRule, a programmatic rule which can limit the source from producing more than, say, 25 S3 requests over 10 seconds. An iRule is a powerful but normally short, event-driven script.
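The throttling logic of such a per-client limit -- roughly 25 S3 requests per 10 seconds per source -- can be sketched as a token bucket. This is illustrative Python, not the Tcl that actually runs on BIG-IP:

```python
import time

# Illustrative token bucket: each client address gets a budget of
# requests that refills over time; requests beyond it are rejected.
class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Replenish credits in proportion to elapsed time, up to capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over budget: reject the S3 request

# ~25 requests per 10 seconds for one client address
bucket = TokenBucket(capacity=25, refill_per_sec=2.5)
results = [bucket.allow() for _ in range(30)]
print(results.count(True))  # 25 -- the burst is allowed
print(results[25])          # False -- request 26 is rejected
```

A client that pauses is replenished with credits for future transactions, which is exactly the leaky-bucket behavior the iRule approach described here provides.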
A sample is provided below. Most modern generative AI solutions are well-versed in F5 iRules and can summarize even the most advanced scripts into digestible terms. This iRule examines an application ("client_addr") that connects to a BIG-IP virtual server and starts a counter; after 10 transactions within 6 seconds the S3 commands will be rejected. The approach is that of a leaky bucket, and the application will be replenished with credits for future transactions over time.

Whereas iRules frequently target layer-7, HTTP-layer activity, a wealth of layer-3 and layer-4 controls to limit excessive data-plane consumption exist. Take for example the static bandwidth controller concept. Simply create a profile such as the following 10 Mbps example. This bandwidth controller can then be applied against a virtual server, including a virtual server supporting, say, lower-priority S3 application traffic.

Focusing on layer-4, the TCP layer, a number of BIG-IP safeguards exist, amongst which are those that can defend against orphaned S3 connections, including those intentionally set up and left open by a bad actor to try to deplete RING resources. Another safeguard is the ability to re-map DiffServ code points or Type of Service (TOS) precedence bits. In this manner, a source that exceeds ideal traffic rates can be passed without intervention; however, by remapping heavy upstream traffic, BIG-IP enables network infrastructure adjacent to Scality RING nodes to police or discard such traffic if required.

Evolving Modern S3 Traffic with Fresh Takes on TLS

TLS underwent a major improvement with the first release of TLS 1.3 in 2018. It removed a number of antiquated security components from official support, things like RSA-style key agreements, SHA-1 hashes, and DES encryption. However, from a performance point of view, the upgrade to TLS 1.3 is equally significant.
When establishing a TLS 1.2 session, perhaps towards the goal of an S3 transaction with RING, with a TCP connection established, an application can expect 2 round-trip times to successfully pass the TLS negotiation phase and move forward with encrypted communications. TLS 1.3 cuts round trips in half; a new TLS 1.3 session can proceed to encrypted data exchange with a single round-trip time. In fact, when resuming a previously established TLS 1.3 session, 0-RTT is possible, meaning the first resumption message from the client can itself carry encrypted data. The following packet trace demonstrates 1-RTT TLS 1.3 establishment (double-click to enlarge image).

To turn on this feature, simply use a client-facing TLS profile on BIG-IP and remove the "No TLS1.3" option.

Another advancement in TLS, which requires TLS 1.3 to begin with, is quantum-computing resistance in the shared key agreement algorithms of TLS. This is a foundational building block of post-quantum cryptography (PQC), and the most well-known of these techniques is NIST FIPS-203 ML-KEM. The concern with not supporting PQC today is that traffic in flight, which may be surreptitiously siphoned off and stored long term, will be readable in the future with quantum computers, perhaps as early as 2030. This risk stems from thought leadership like Shor's algorithm, which indicates that public key (asymmetric) cryptography, foundational to shared key establishment between parties in TLS, is at risk: large-scale, fault-tolerant quantum computers could potentially crack elliptic curve cryptography (ECC) and Diffie-Hellman (DH) algorithms. This risk, the so-called Harvest Now, Decrypt Later threat, means sensitive data like tax records, medical information, and anything with longer-term retention value requires protection today. It cannot be put off safely; action needs to be taken now.
FIPS-203 ML-KEM suggests a hybrid approach to shared key derivation, after which TLS parties today can safely continue to use symmetric encryption algorithms like AES, which are thought to be far less susceptible to quantum attacks. Updating our initial one-site topology, we can consider the following improvements.

A key understanding is that a hybrid key agreement scheme is used in FIPS-203. Essentially, a parallel set of crypto operations using a traditional key agreement like the X25519 ECDH key exchange protocol is performed simultaneously with the new MLKEM768 quantum-resistant key encapsulation approach. The net result is that a significant amount of crypto is carried out, with two sets of calculations, and a final combining of outcomes to arrive at an agreed-upon shared key. The conclusion is that this load is likely best suited for only a subset of S3 flows, those with objects housing PII of high long-term potential value.

A method to achieve this balance, the trade-off between security and performance, is to use multiple BIG-IP virtual servers: a regular set of S3 endpoints with classical TLS support, and higher-security S3 endpoints for selective use. The latter would support the PQC provisions of modern TLS. A full article on configuring BIG-IP for PQC, including a video demonstration of the click-through to add support to a virtual server, can be found here.

Multi-site Global Server Load Balancing with BIG-IP and Scality RING

An illustrative diagram showing two RING sites, asynchronously connected and offering S3 ingestion and object retrieval, is shown below. Note that the BIG-IP DNS, although frequently deployed independently from BIG-IP LTM appliances, can operate on the same, existing LTM appliances as well.

In this example, an S3 application physically situated in Phoenix, Arizona, in the American southwest, will use its configured local DNS resolver (frequently shortened to LDNS) to resolve S3 targets to IP addresses.
Think finance.s3.acme.com or humanresources.s3.acme.com. In F5 terms, these example domain names are referred to as "Wide IPs". An organization such as the fictitious acme.com will delegate the relevant sub-domains to F5 DNS, such as s3.acme.com in our example, meaning the F5 appliances in San Francisco and Boston hold the DNS nameservice (NS) resource records for the S3 domain in question, and can answer the client's DNS resolver authoritatively.

The DNS A queries required by the S3 application will land on either BIG-IP DNS platform, San Francisco or Boston. The pair serve for redundancy purposes, and both can provide an enterprise-controlled answer. In other words, should the S3 application target be resolved to Los Angeles or New York City?

The F5 solution allows for a multitude of considerations when providing the answer to the above question. Interesting options and their impact on our topology diagram:

- Global Availability – A common disaster recovery approach. The BIG-IP DNS appliance distributes DNS name resolution requests to the first available virtual server in a pool list the administrator configures. BIG-IP DNS starts at the top of the list of virtual servers and sends requests to the first available virtual server in the list. Only when the virtual server becomes unavailable does BIG-IP DNS send requests to the next virtual server in the list. If we want S3 traffic generally to travel to Los Angeles, and only utilize New York when application availability problems arise, this would be a good approach.
- Ratio – In a case where we would like a, say, 80/20 split between S3 traffic landing in Los Angeles versus New York, this would be a sound method. Perhaps market reasons make the cost of ingesting traffic in New York more expensive.
- Round Robin – The logical choice where we would like to see both data centers receive, generally, over time, the same amount of S3 transactions.
- Topology – BIG-IP DNS distributes DNS name resolution requests using proximity-based load balancing. BIG-IP DNS determines the proximity of the resource by comparing location information derived from the DNS message to the topology records in a topology statement. A great choice if data centers are of similar capacity and S3 transactions are best serviced by the closest physical data center. Note, the source IP address of the application's DNS resolver is analyzed; if a centralized DNS service is used, perhaps it is not in Phoenix at all. There are techniques like EDNS0 to try to place the actual locality of the application.
- Round Trip Time – An advanced algorithm that is dynamic, not static. BIG-IP DNS distributes DNS name resolution requests to the virtual server with the fastest measured round-trip time between that data center and a client's LDNS. This is achieved by having sites send low-impact probes, from "prober pools", to each application's DNS resolver over time. Therefore, for new DNS resolution requests, the BIG-IP DNS can tap into real-world latency knowledge to direct S3 traffic to the site which is demonstrably known to offer the lowest latency. This again works best when the application and DNS resolver are in the same location.

The BIG-IP DNS, when selecting between virtual servers, such as in Los Angeles and New York City in our simple example, can have a primary algorithm, a secondary algorithm, and a fall-back, hard-coded IP. For instance, consider that the first two algorithms are, in order, dynamic approaches, such as prober pools measuring round-trip time and, as a second approach, the measurement of active hop counts between sites and application LDNS. Should both methods fail to provide results, an IP address of last resort, perhaps in our case, Los Angeles, will be provided through the configured fall-back IP.
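To make the contrast between these selection methods concrete, here is a small, hypothetical simulation of two of them -- Global Availability (first healthy virtual server in an ordered list wins) and Ratio (weighted split of answers) -- using made-up site names, not actual BIG-IP DNS behavior:

```python
import random

# Global Availability: walk the ordered list, return the first healthy VS.
def global_availability(sites, health):
    for site in sites:
        if health[site]:
            return site
    return None  # nothing available

# Ratio: weighted split of DNS answers across sites.
def ratio_pick(sites, weights, rng):
    return rng.choices(sites, weights=weights, k=1)[0]

health = {"LA": True, "NYC": True}
print(global_availability(["LA", "NYC"], health))  # LA
health["LA"] = False
print(global_availability(["LA", "NYC"], health))  # NYC -- failover

rng = random.Random(0)
answers = [ratio_pick(["LA", "NYC"], [80, 20], rng) for _ in range(1000)]
print(answers.count("LA"))  # roughly 800 of 1000 answers
```

The simulation shows why Global Availability suits disaster recovery (traffic only moves when the primary fails) while Ratio suits cost- or capacity-driven splits.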
Key takeaway: what is being provided by F5 and Scality is "intelligent" DNS; traffic is directed to sites not based merely upon basic network reachability to Los Angeles or New York. In reality, the solution looks behind the local load balancing tier and is aware of the health of each Scality RING member. Thus, traffic is steered in accordance with back-end application health monitoring, something a regular DNS solution would not offer.

Multi-site Solutions for Global Deployments and Geo-Awareness

One potentially interesting use case for F5 BIG-IP DNS and Scality RING sites would be to tier all data centers into pools based upon wider geographies. Consider a use case such as the following, with Scality RING sites spread across both North America and Europe. The BIG-IP DNS solution can handle this higher layer of abstraction: the first layer involves choosing between a pool of sites, before delving down one more layer into the pool of virtual servers spread across the sites within the optimal region. Policy is driving the response to a DNS query for S3 services all the way through these two layers.

To explore all load balancing methods is an interesting exercise but beyond the scope of this article. The manual here drills into the possible options. To direct traffic at the country or even continent level, one can follow the "Topology" algorithm for first selecting the correct site pool. Persistence can be enabled, allowing future requests from the same LDNS resolver to follow prior outcomes.

First, it is good practice to ensure the geo-IP database of BIG-IP is up to date. A brief video here steps a user through the update. The next thing to create is regions. In this diagram the user has created an "Americas" and a "Europe" region. In fact, in this particular setup, the Europe region is seen to match all traffic with DNS queries originating outside of North and South America, per the list of member continents.
With regions defined, one now creates simple topology records to control DNS responses for S3 services based upon the source IP of DNS queries made on behalf of S3 applications. The net result is a worldwide set of controls with regard to which Scality site S3 transactions will land upon. The decision, based upon enterprise objectives, can fully consider geographies like continents or individual countries. In our example, once a source region has been decided upon for an inbound DNS request, any of the previous algorithms can kick in. This would include options like global availability for DR within the selected regions, or perhaps measured latency to steer traffic to the most performant site in the region.

Summary

Scality RING is a software-defined object and file solution that supports data resiliency at levels expected by risk-averse storage groups, all with contemporary Linux-friendly hardware platforms selected by the enterprise. The F5 BIG-IP application delivery controller complements S3 object traffic involving Scality, through massive scale-out of nodes coupled with innovative algorithms for agile spreading of the traffic. Health of RING nodes is perpetually monitored so as to seamlessly bypass any troubled system. When moving to multi-site RING deployments, within a country or even across continents, BIG-IP DNS is harnessed to steer traffic to the optimal site, potentially including geo-IP rules, proximity between user and data center, and established baseline latencies offered by each site to the S3 application's home location.

YouTube RSS Newsletter in n8n Root Cause: Why the Ollama Node Broke My Agent
Hey community -- Aubrey here. I want to talk about a failure I ran into while building an n8n workflow, because this one cost me some real time and I think it's going to save you an afternoon if you're headed down the same road.

The short version: I had a workflow working great with OpenAI, and I wanted to swap in Ollama so I could run the LLM locally. Same prompt, same data, same structured output requirements. In my head, that should've been a clean plug-and-play change. It wasn't. It broke in a way that looked like "the model isn't returning valid JSON," but the real root cause was something else entirely -- and it's actually documented.

What broke (and where it broke)

The failure always showed up in the Structured Output Parser. n8n would run the flow, then the parser would throw:

"Model output doesn't fit required format"

Which is a super reasonable error if your model is rambling, adding commentary, wrapping JSON in markdown, returning tool traces, whatever. So that's where my head went first: "Okay, I need to tighten the prompt. Maybe the schema is too strict. Maybe Ollama's being weird."

But here's the thing: this wasn't one of those "LLM didn't obey" moments. This was repeatable, consistent, and it didn't really matter how I tuned the prompt. The OpenAI version worked; the Ollama version failed, and the parser was just the first place it showed up.

The first big clue: the 5-minute wall

As I dug in, I started seeing a pattern: a hard failure at exactly five minutes. Not "about five minutes," not "sometimes," but right on the dot. That error often surfaced as:

"fetch failed"

So now we're not talking about a formatting issue anymore -- we're talking about the request itself failing. That matters, because if the model call dies mid-stream, the structured parser downstream is going to be handed something incomplete or empty or error-shaped, and it's going to complain that it doesn't match the schema. That's not the parser being wrong. That's the parser doing its job.
This 5-minute behavior is also what other folks reported in n8n issue #13655—the Ollama chat model node timing out after 5 minutes even when people tried to change the “keep alive” setting.

Reproducing behavior with logs (and why it matters)

One of the most useful things I found in that issue thread was simple: Ollama’s own logs clearly show the request dying at 5 minutes when driven by n8n’s AI nodes. You’ll see entries like:

failure case: 500 | 5m0s | POST "/api/chat"

Then an n8n community member swapped the same payload into a manual HTTP Request node in n8n (which does let you set a timeout), and suddenly the same call works:

success case: 200 | 7m0s | POST "/api/chat"

That’s a huge diagnostic move. Because it tells you the model isn’t “incapable” or “too slow”—it tells you the client behavior is the problem (timeout / abort / cancellation), not your prompt, not your JSON schema, not the content. And that lined up perfectly with what I was seeing on my side.

Getting serious: tcpdump, FIN/ACK, and “context canceled”

At some point I wanted proof of what was actually happening on the wire, so I ran a tcpdump against the Ollama port. And yeah—this is where it got real. What I saw was:

- n8n connects to Ollama fine
- Data flows for a while (so we’re not talking about “can’t reach host”)
- At the ~5 minute mark, n8n sends a TCP FIN/ACK (the client closes the connection)
- Ollama then returns an HTTP 500 containing an error like "context canceled"

In the issue thread, you can literally see an example of that pattern: client FIN from n8n → Ollama, then 'HTTP/1.1 500 Internal Server Error' with a body indicating: 'context canceled.'

So when I originally said “the structured output parser fails because Ollama’s tool call output isn’t close to what’s expected,” I wasn’t totally wrong about the symptom. But the deeper “why” is: the request is being canceled, and what comes back is not a valid structured model output.
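The same experiment is easy to replicate outside n8n. Here’s a rough sketch, stdlib Python only, of driving Ollama’s /api/chat endpoint with an explicit client timeout—the knob the manual HTTP Request node exposed. The model name and the 10-minute timeout are just example values:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama port

def build_chat_request(model, prompt):
    """Build the same POST /api/chat call n8n makes, as a plain request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one complete JSON response, easier to inspect
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(model, prompt, timeout_s=600):
    """Send the request with a client timeout longer than a slow
    generation -- the setting the built-in node doesn't let you change."""
    req = build_chat_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())

# With a running Ollama instance:
# chat("llama3.2:3b", "Return a JSON object with keys a and b.")
```

If a call like this succeeds past the five-minute mark while the Ollama model node dies at exactly 5m0s, you’ve reproduced the issue-thread diagnosis: the client, not the model, is the bottleneck.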
The parser is just where it becomes obvious when you force the node to return data back to the agent.

The root cause (and the part I want everyone to notice)

Now here’s the punchline, and this is the part I want to underline, bold, highlight, put on a billboard: The n8n Ollama model node does not work with LLM Tools implementations. That’s not a rumor. That’s in the n8n docs! After a quick recap and discussion, JRahm pointed me to the documentation for the Ollama model integration, and it straight-up says the Ollama node does not support tools and recommends using Basic LLM Chain instead.

What I’m doing next (and what you should do)

I’m not done with Ollama—I’m just done trying to use it the wrong way, and this is going to spawn two follow-up efforts for me:

1. Attempt to rebuild the same idea using Basic LLM Chain with Ollama, the way the docs recommend.
2. Write a deeper explainer on LLM Tools—what they are, why agents use them, and how that’s different from RAG (because those concepts get mashed together constantly).

So if you’re out there wiring up an Agent with structured output and you’re thinking “I’ll just switch the model to Ollama,” don’t do what I did. Read that doc line first. If you need tools, pick a model/node combo that supports tools. If you’re using Ollama, design for the Basic LLM Chain path and you’ll save yourself the five-minute timeout rabbit hole and the structured-parser blame game.

Using the Model Context Protocol with Open WebUI
This year we started building out a series of hands-on labs you can do on your own in our AI Step-by-Step repo on GitHub. In my latest lab, I walk you through setting up a Model Context Protocol (MCP) server and the mcpo proxy to allow you to use MCP tools in a locally-hosted Open WebUI + Ollama environment. The steps are well-covered there, but I wanted to highlight what you learn in the lab.

What is MCP and why does it matter?

MCP is a JSON-based open standard from Anthropic that (shockingly!) is only about 13 months old now. It allows AI assistants to securely connect to external data sources and tools through a unified interface. The key to its rapid adoption is that it solves the fragmentation problem in AI integrations—instead of every AI system needing custom code to connect to each tool or database, MCP provides a single protocol that works across different AI models and data sources.

MCP in the local lab

My first exposure to MCP was using Claude and Docker tools to replicate a video Sebastian_Maniak released showing how to configure a BIG-IP application service. I wanted to see how F5-agnostic I could be in my prompt and still get a successful result, and it turned out that the only domain-specific language I needed, after it came up with a solution and deployed it, was to specify the load balancing algorithm. Everything else was correct. Kinda blew my mind.

I spoke about this experience throughout the year at F5 Academy events and at a solutions days event in Toronto, but more than that, I wanted to see how far I could take this in a local setting, away from the pay-to-play tooling offered at that time. This was the genesis for this lab.
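Since MCP rides on JSON-RPC 2.0, the wire format is easy to picture. Here’s a minimal sketch of the two requests a client sends to discover and invoke a server’s tools; the get_weather tool and its arguments are hypothetical, and real clients also perform an initialize handshake first:

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry a unique id

def mcp_message(method, params=None):
    """Frame an MCP request: MCP messages are JSON-RPC 2.0."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Discover what the server offers, then invoke one of its tools.
list_req = mcp_message("tools/list")
call_req = mcp_message("tools/call", {
    "name": "get_weather",              # hypothetical tool name
    "arguments": {"city": "Seattle"},
})
print(list_req)
print(call_req)
```

This is the "single protocol" part in miniature: every tool, regardless of vendor, is discovered via tools/list and invoked via tools/call with the same envelope.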
Tools

In this lab, you'll use the following tools:

- Ollama
- Open WebUI
- mcpo
- custom MCP server

Ollama and Open WebUI are assumed to already be installed; those labs are also in the AI Step-by-Step repo:

- Installing Ollama
- Installing Open WebUI

Once those are in place, you can clone the repo and deploy in docker or podman; just make sure the Open WebUI containers are on the same container network as the services you deploy from the repo.

Results

How much success you have getting your Open WebUI inference through the mcpo proxy and the MCP servers (mine is very basic, just for test purposes; there are more that you can test or build yourself) depends greatly on your prompting skills and the abilities of the local models you choose. I had varying success with llama3.2:3b. But the goal here isn't production-ready tooling; it's to build and discover and get comfortable in this new world of AI assistants, leveraging them where it makes sense to augment our toolbox.

Drop a comment below if you build this lab and share your successes and failures. Community is the best learning environment.
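One thing that helps when debugging a setup like this is poking the mcpo proxy directly, outside Open WebUI, since mcpo exposes each MCP tool as a plain REST endpoint. A rough sketch follows; the port and the get_time tool are assumptions from a lab setup, so check the OpenAPI docs your mcpo instance serves for the real endpoint names:

```python
import json
import urllib.request

def build_mcpo_call(base_url, tool_name, arguments):
    """mcpo turns MCP tools into REST endpoints, so a tool call is
    just an HTTP POST with the tool's arguments as a JSON body."""
    url = f"{base_url.rstrip('/')}/{tool_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(arguments).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_mcpo_call("http://localhost:8000", "get_time", {"timezone": "UTC"})
print(req.full_url)   # http://localhost:8000/get_time
# With mcpo running: urllib.request.urlopen(req)
```

If a direct call like this works but the Open WebUI flow doesn’t, the problem is on the UI/model side (prompting, tool selection), not the proxy.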
The Fast Path to Safer Labs: CycloneDX SBOMs for F5 Practitioners
Quick note up front about my intent with this lab... I built it to quickly help F5 practitioners keep their lab environments safe from hidden threats. Fast, approachable, and useful on day one. We used the bundled Dependency-Track container because it’s trivial to stand up in a lab. In production, please deploy Dependency-Track backed by a production‑grade database and tune it for scale and durability. Lab-first, but think ahead to enterprise‑ready.

Now, let’s talk about why I chose CycloneDX for the SBOM we generated with Trivy, and why it’s the accepted standard I recommend for modern, AI‑heavy workloads.

At a high level, an SBOM is your ingredient list for software. Containers that host LLM apps are layered: base OS, GPU drivers and CUDA, language runtimes, Python packages, app binaries, plus external services you call (hosted inference, embeddings, vector databases). If you don’t know what’s in that stack, you can’t manage risk when new CVEs land. CycloneDX gives you that visibility and does it with a security-first design.

Here’s why CycloneDX is such a good fit:

- Security-first schema. CycloneDX was born into the AppSec world at OWASP. It bakes in identifiers that vulnerability tooling actually uses—package URLs (purls), CPEs, hashes—and a proper dependency graph. That graph matters when the vulnerable thing isn’t your top-level app but the library three layers deep.
- Broad component coverage, including services. Real LLM apps don’t stop at “libraries.” CycloneDX can represent applications, libraries, containers, operating systems, files, and services. That service support is huge: if you depend on an external inference API, a hosted vector DB, or a third-party embedding service, CycloneDX can document that right in your SBOM. Your risk picture is no longer just what’s “in the image,” but what the image calls.
- VEX support to cut noise.
CycloneDX supports VEX (Vulnerability Exploitability eXchange), which lets you annotate “not affected” or “mitigated” when a CVE shows up in your base image but is not exploitable in your specific deployment. That’s how you keep the signal high and the noise low.
- Toolchain adoption. The path we used in the lab—Trivy generates CycloneDX JSON in a single command, Dependency-Track ingests it cleanly—is exactly what you want. Fewer conversions, fewer surprises, more time looking at risk with a project-centric view.

So how does that map to LLM app security, specifically?

- Containers and drivers: CycloneDX captures the full container context—OS packages, runtime layers, GPU driver stacks—so when you rebuild to pick up a CUDA or base image update, your SBOM reflects the change and your risk dashboard stays current.
- Python ecosystems: For model-serving and data pipelines, CycloneDX tracks the Python libraries and their transitive dependencies, so when a popular package pushes a patch for a nasty CVE, you’ll see the impact across your projects.
- Model artifacts and files: CycloneDX can represent file components with hashes. If you pin or verify model files, that checksum data helps you detect drift or tampering.
- External services: Many LLM apps rely on hosted endpoints. CycloneDX’s service component type lets you document those dependencies, so governance isn’t blind to the parts of your “system” that live outside your containers.

Now, let’s compare CycloneDX to other SBOM standards you’ll hear about.

SPDX (Software Package Data Exchange)

- Strengths: It’s a Linux Foundation standard with deep traction, especially for license compliance. Legal and compliance teams love it for moving license information through CI/CD.
- Tradeoffs for AppSec: SPDX can represent dependencies and has added security-relevant fields, but its heritage is compliance rather than vulnerability analysis.
Modeling external services is less natural, and a lot of AppSec tooling (like the Trivy -> Dependency-Track workflow we used) is tuned for CycloneDX. If your primary goal is security visibility and CVE triage for containerized AI apps, CycloneDX tends to be the smoother path.

SWID tags (ISO/IEC 19770-2)

- Strengths: Vendor-provided software identification for asset management—who installed what, what version, and how it’s licensed.
- Tradeoffs: Limited open tooling, and not a great fit for layered containers or fast-moving dependency graphs. You won’t get the rich, developer-centric view you need for daily AppSec in LLM environments.

And a quick reality check: package manifests and lockfiles (pip freeze, requirements.txt, package-lock.json) are useful, but they’re not SBOMs. They miss OS packages, drivers, and container layers. CycloneDX gives you the whole picture.

Practically speaking, here’s the loop we ran—and why CycloneDX makes it painless:

- Generate: Use Trivy to scan your AI container and spit out CycloneDX JSON. It’s trivial—one line, usually under a minute.
- Ingest: Push that SBOM into Dependency-Track via the API. You get components, licenses, vulnerability scores, dependency graphs, and a clean project/version history.
- Act: Watch for new CVEs. Use VEX to mark what’s not exploitable in your context. Rebuild, rescan, repeat. Automate it in CI so your SBOM stays fresh without manual babysitting.

Production note again, because it matters: the bundled Dependency-Track container is perfect for labs and demos. In production, deploy Dependency-Track with a production-grade database, persistent storage, backups, and access controls that match your enterprise standards.

Bottom line: SPDX and CycloneDX are both legitimate, widely accepted SBOM standards. If your priority is license compliance, SPDX is an excellent fit.
If your priority is application security for modern, service-heavy, containerized LLM apps, CycloneDX gives you security-first modeling, service coverage, VEX, and an ecosystem that lets you move fast without sacrificing visibility.

Voila—grab Trivy, generate CycloneDX, feed Dependency-Track, and start getting signals instead of noise. Fresh installs often look green on day one, but when something changes tomorrow, you’ll see it. That’s the whole game: make hidden threats visible, then make them go away.

If you’d like to try the lab, it’s located here. If you’d rather watch the video of the lab, try this one:
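To make the generate-and-ingest loop above concrete, here’s a rough sketch of what a minimal CycloneDX JSON document looks like and how it gets wrapped for Dependency-Track’s BOM upload API. The project name, the service entry, and the versions are invented for illustration; in the lab, Trivy generates the real BOM for you:

```python
import base64
import json

# A minimal CycloneDX JSON BOM: one library plus one external
# service -- the part SPDX has no natural home for.
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "version": 1,
    "components": [
        {
            "type": "library",
            "name": "requests",
            "version": "2.31.0",
            "purl": "pkg:pypi/requests@2.31.0",
        }
    ],
    "services": [
        {
            "name": "hosted-inference-api",  # hypothetical external dependency
            "endpoints": ["https://api.example.com/v1/chat"],
        }
    ],
}

# Dependency-Track ingests a BOM via PUT /api/v1/bom (API key in the
# X-Api-Key header), with the BOM base64-encoded in the JSON body.
upload_body = json.dumps({
    "projectName": "llm-lab",
    "projectVersion": "1.0",
    "autoCreate": True,
    "bom": base64.b64encode(json.dumps(bom).encode()).decode(),
})
print(upload_body[:80])
```

Note how the purl (pkg:pypi/requests@2.31.0) is the identifier vulnerability tooling matches against, and how the service entry documents a dependency that never appears inside the container image at all.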
Introducing F5 AI Red Team
F5 AI Red Team simulates adversarial attacks such as prompt injection and jailbreaks at unprecedented speed and scale, allowing for continuous assessment throughout the application lifecycle. It provides insights into threats and integrates with F5 AI Guardrails to convert those insights into security policies.
Key Steps to Securely Scale and Optimize Production-Ready AI for Banking and Financial Services
This article outlines three key actions banks and financial firms can take to more securely scale, connect, and optimize their AI workflows, demonstrated through a scenario of a bank taking a new AI application to production.