application delivery
DNS: The protocol nobody sees... until it becomes the architecture's biggest problem
In day-to-day operations, DNS is rarely at the center of the conversation. It is not usually the protagonist of roadmaps, nor the first thing reviewed in a new design. It is often assumed to "just be there," working, resolving names without much friction. And that is precisely what makes it dangerous.

After years of working with enterprise and Service Provider architectures, one constant is hard to ignore: DNS is one of the most critical components of modern infrastructure and, at the same time, one of the most underestimated. When everything works, nobody asks about it. When something fails, nothing else matters. I have seen applications "up," stable links, firewalls in green... and users unable to reach a service. The problem was not in the application or the network, but in that first step we all take for granted: resolving a name.

DNS in real life: where theory meets operations

On paper, DNS looks simple; in production, it is not. In real architectures, DNS actively participates in critical decisions such as:

Which site or region a user is directed to
Which backend receives a request
How load is distributed across multiple data centers
How quickly a service recovers from degradation or an outage

In Service Providers and large organizations, this becomes obvious during unexpected events. An earthquake, high-impact news, a global sporting event, an urgent application update, an app that suddenly trends, a government announcement, or a health emergency can spike digital traffic within minutes. In these scenarios, the first significant growth does not happen in HTTP or TLS, but in DNS queries. Millions of users open applications, refresh portals, and generate simultaneous resolutions toward the same domains. If DNS is not prepared to absorb that peak:

The application never receives traffic
The load balancers never get a chance to participate
The user experience degrades immediately

This is where solutions like F5 BIG-IP DNS stop being a "name service" and become an active traffic-control component, capable of making decisions based on:

The real state of the services
The health of the backends
Availability and continuity policies

The artificial intelligence boom and its direct impact on DNS

The massive arrival of artificial intelligence has not only changed how applications are consumed; it has radically changed traffic patterns. Modern AI architectures introduce:

More decoupled microservices
More internal and external APIs
Ephemeral endpoints that appear and disappear
Dynamic, frequent resolutions

Every call to a model, every distributed inference, and every backend that scales automatically begins with a DNS query. But the impact goes well beyond theory. The 2025 data confirms it emphatically: according to Humansecurity, traffic generated by artificial intelligence systems nearly tripled in a single year, with 187% monthly growth between January and December. And within that universe, the fastest-growing segment was autonomous AI agents: bots that no longer just read the web but interact with it, browsing, calling APIs, and completing transactions. That type of traffic grew more than 7,800% year over year. What does this mean for DNS?
Every time an AI agent resolves a query, looks up an endpoint, or scales a service, there is a DNS resolution behind it. And we are not talking about predictable patterns. AI traffic behaves differently from human traffic: it is more aggressive in bursts, less predictable in its timing, and grows at a rate eight times faster than traffic generated by people. On top of that, AI crawlers, the systems that crawl content for training, search, and real-time actions, have become one of the largest sources of automated traffic on the internet. During 2025, Google's crawler alone generated roughly 4.5% of all HTML requests to sites protected by the major security platforms. And "user-action" crawling (when a user asks a chatbot for something and it goes out to fetch information from the web) grew more than 15-fold in the same period.

From a technical perspective, the implications are clear:

Sustained, accelerating growth in DNS query volume
Greater sensitivity to resolution latency and jitter
Greater impact from partial or intermittent failures
Traffic patterns that are less predictable and harder to plan for

DNS stops being "static" infrastructure and becomes a dynamic component of the data plane, tightly coupled to the performance of AI models and services. And anyone who is not sizing their DNS infrastructure with this variable in mind will feel the impact sooner rather than later.

DNS as a critical point in global events

Over the years, multiple events have demonstrated the impact of DNS at global scale. From massive outages of digital services to amplification attacks, the pattern is consistent: when DNS is affected, the impact multiplies. And you do not have to look far for examples. In October 2025 alone, two incidents made very clear what happens when DNS fails in real production environments.

In the first, an error in the automated DNS management system of one of the main public cloud platforms left critical services unreachable in the largest region of its infrastructure. The servers were up, the databases intact... but DNS made them invisible. The failure cascaded across more than one hundred services, affected thousands of companies in more than 60 countries, and generated millions of outage reports within hours. Messaging, gaming, and streaming platforms, financial services, and even government applications went offline. It was not an attack. It was a DNS automation error.

Days later, a European telecommunications provider with millions of fixed-line customers experienced a DNS resolution failure that left a large part of its user base without internet access for several hours. Observed traffic dropped by more than 75%. And the root cause was, once again, a DNS resolution problem: not a fiber cut, not a DDoS attack, not a power outage.

F5 Labs has documented extensively how:

Volumetric attacks use DNS as a primary vector
DNS amplification remains an effective technique
Global events generate anomalous traffic patterns that expose design weaknesses

These scenarios do not discriminate by industry or size. They affect service providers, digital platforms, critical infrastructure, and enterprise applications alike.
The difference is not whether they will happen, but how prepared the architecture is to absorb them without collapsing.

Observability first: understand before you react

For a long time, DNS was a black box. It worked... or it did not. Today, the conversation changes completely when real observability enters the equation. F5 Insight, a key component of the F5 ADSP platform, makes it possible to:

Analyze real query patterns
Identify atypical spikes and anomalous behavior
Correlate external events with DNS traffic
Understand the why behind the system's behavior

But F5 Insight goes beyond dashboards and metrics. It incorporates artificial intelligence to provide proactive guidance, anomaly detection, and something that changes how you operate: the ability to interact with your operational data in natural language, through LLM integration and the Model Context Protocol (MCP). In practice, this means moving from digging through dashboards to find out what happened, to receiving operational narratives that tell you what is happening, why, and what you should prioritize. For environments with a high density of DNS queries, where patterns change in minutes and spikes can be unpredictable, especially with the growth of AI traffic, that jump from reactive to proactive is not a luxury; it is an operational necessity. In high-QPS scenarios, the problem is not always capacity, but how the traffic behaves. Seeing DNS as data, not just as a service, makes an enormous difference in operations and planning.

Platform and architecture: when DNS is no longer "basic"

When DNS becomes critical, the platform matters. Architectures based on VELOS allow capacity, resilience, and growth to be decoupled from the application lifecycle, supporting massive DNS loads with isolation, high availability, and orderly scaling. rSeries, in turn, offers a more compact and efficient model, ideal for deployments where per-unit performance, latency, and operational efficiency are key. And when the application no longer lives in a single data center, F5 Distributed Cloud Services extends DNS capabilities to hybrid and multicloud architectures, maintaining:

Consistent policies
Unified visibility
Control from the edge to the backend

The point is not where DNS runs, but that it follows the application without losing control, resilience, or observability.

Security integrated from the first packet

DNS is also an attack surface. We have seen it proven again and again during global security events. And the landscape is not getting any simpler: during 2025, more than 100 million new domains were identified, a quarter of which were classified as malicious or suspicious. Attackers are registering unprecedented volumes of domains, using automation to run massive campaigns that evade traditional defenses. Ignoring DNS from a security standpoint means leaving a door open. Integrating protection into the same delivery platform makes it possible to:

Mitigate attacks without introducing additional latency
Apply policies consistently
Maintain unified visibility of the traffic

Security, performance, and availability are not separate layers; they are parts of the same flow. We sometimes forget that DNS does not usually fail dramatically. It fails silently... and that is exactly what makes it so dangerous.
Investing in its design, observability, and security is not over-engineering. It is understanding that everything starts there. In a world where AI is tripling automated traffic volumes year after year, where autonomous agents generate DNS resolutions at a rate that did not exist 18 months ago, and where a single DNS automation error can take thousands of companies in more than 60 countries offline... DNS is back in the place it always held in the architecture, even if many had forgotten it. The question is not whether DNS is critical. The question is whether your architecture is ready to treat it that way.

Context Cloak: Hiding PII from LLMs with F5 BIG-IP
The Story

As I dove deeper into the world of AI -- MCP servers, LLM orchestration, tool-calling models, agentic workflows -- one question kept nagging me: how do you use the power of LLMs to process sensitive data without actually exposing that data to the model? Banks, healthcare providers, government agencies -- they all want to leverage AI for report generation, customer analysis, and workflow automation. But the data they need to process is full of PII: Social Security Numbers, account numbers, names, phone numbers. Sending that to an LLM (whether cloud-hosted or self-hosted) creates a security and compliance risk that most organizations can't accept.

I've spent years working with F5 technology, and when I learned that BIG-IP TMOS v21 added native support for the MCP protocol, the lightbulb went on. BIG-IP already sits in the data path between clients and servers. It already inspects, transforms, and enforces policy on HTTP traffic. What if it could transparently cloak PII before it reaches the LLM, and de-cloak it on the way back? That's Context Cloak.

The Problem

An analyst asks an LLM: "Generate a financial report for John Doe, SSN 078-05-1120, account 4532-1189-0042." The LLM now has real PII. Whether it's logged, cached, fine-tuned on, or exfiltrated -- that data is exposed. Traditional approaches fall short:

Masking (****): the LLM can't see the data, so it can't reason about what it can't see.
Tokenization (<<SSN:001>>): the LLM sees placeholders; works with larger models (14B+), smaller models may hallucinate.
Do nothing: the LLM sees real PII, a security and compliance violation.

The Solution: Value Substitution

Context Cloak takes a different approach -- substitute real PII with realistic fake values:

John Doe --> Maria Garcia
078-05-1120 --> 523-50-6675
4532-1189-0042 --> 7865-4412-3375

The LLM sees what looks like real data and reasons about it naturally. It generates a perfect financial report for "Maria Garcia." On the way back, BIG-IP swaps the fakes back to the real values. The user sees a report about John Doe. The LLM never knew John Doe existed. This is conceptually a substitution cipher -- every real value maps to a consistent fake within the session, and the mapping is reversed transparently.

When I was thinking about this concept, my mind kept coming back to James Veitch's TED talk about messing with email scammers. Veitch tells the scammer they need to use a code for security:

Lawyer --> Gummy Bear
Bank --> Cream Egg
Documents --> Jelly Beans
Western Union --> A Giant Gummy Lizard

The scammer actually uses the code. He writes back: "I am trying to raise the balance for the Gummy Bear so he can submit all the needed Fizzy Cola Bottle Jelly Beans to the Creme Egg... Send 1,500 pounds via a Giant Gummy Lizard." The real transaction details -- the amounts, the urgency, the process -- all stayed intact. Only the sensitive terms were swapped. The scammer didn't even question it. That idea stuck with me -- what if we could do the same thing to protect PII from LLMs? But rotate the candy -- so it's not a static code book, but a fresh set of substitutions every session.

Watch the talk: https://www.ted.com/talks/james_veitch_this_is_what_happens_when_you_reply_to_spam_email?t=280
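Before getting into why BIG-IP fits this role, here is a deliberately minimal iRule sketch of that substitution mechanic: remember a real/fake pair in a session-scoped subtable and swap the values with [string map]. This is not the actual Context Cloak iRule from the project (the real one is data-group-driven, handles many fields, JSON parsing, token generation, and larger payloads); the hard-coded values, subtable name, and timeout below are illustrative assumptions only, and it assumes small, non-chunked bodies.

# Minimal illustration only -- not the Context Cloak iRule from the project.
when HTTP_REQUEST {
    if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] > 0 } {
        HTTP::collect [HTTP::header "Content-Length"]
    }
}
when HTTP_REQUEST_DATA {
    set real "078-05-1120"
    set fake "523-50-6675"
    # Remember the pair in a session-scoped subtable so the response can be de-cloaked later
    table set -subtable "cloak_[IP::client_addr]" $fake $real 300
    # Swap real -> fake before the payload reaches the LLM
    set body [string map [list $real $fake] [HTTP::payload]]
    HTTP::payload replace 0 [HTTP::payload length] $body
    HTTP::release
}
when HTTP_RESPONSE {
    if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] > 0 } {
        HTTP::collect [HTTP::header "Content-Length"]
    }
}
when HTTP_RESPONSE_DATA {
    # Swap fake -> real on the way back so the user sees the original value
    set fake "523-50-6675"
    set real [table lookup -subtable "cloak_[IP::client_addr]" $fake]
    if { $real ne "" } {
        set body [string map [list $fake $real] [HTTP::payload]]
        HTTP::payload replace 0 [HTTP::payload length] $body
    }
    HTTP::release
}

In the actual project the mappings are generated per field from MCP JSON responses, the swap happens on the MCP and Inference virtual servers, and the same idea is applied with structured tokens instead of fake values in tokenize mode.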
Why BIG-IP?

F5 BIG-IP was the natural candidate:

Already in the data path -- BIG-IP is a reverse proxy that organizations already deploy
MCP protocol support -- TMOS v21 added native MCP awareness via iRules
iRules -- Tcl-based traffic manipulation for real-time HTTP payload inspection and rewriting
Subtables -- in-memory key-value storage perfect for session-scoped cloaking maps
iAppLX -- deployable application packages with REST APIs and web UIs
Trust boundary -- BIG-IP is already the enforcement point for SSL, WAF, and access control

How Context Cloak Works

1. An analyst asks a question in Open WebUI
2. Open WebUI calls MCP tools through the BIG-IP MCP Virtual Server
3. The MCP server queries Postgres and returns real customer data (name, SSN, accounts, transactions)
4. BIG-IP's MCP iRule scans the structured JSON response, extracts PII from known field names, generates deterministic fakes, and stores bidirectional mappings in a session-keyed subtable. The response passes through unmodified so tool chaining works.
5. Open WebUI receives real data and composes a prompt
6. When the prompt goes to the LLM through the BIG-IP Inference VS, the iRule uses [string map] to swap every real PII value with its fake counterpart
7. The LLM generates its response using fake data
8. BIG-IP intercepts the response and swaps fakes back to reals. The analyst sees a report about John Doe with his real SSN and account numbers.

Two Cloaking Modes

Context Cloak supports two modes, configurable per PII field:

Substitute Mode

Replaces PII with realistic fake values. Names come from a deterministic pool, numbers are digit-shifted, emails are derived. The LLM reasons about the data naturally because it looks real.

John Doe --> Maria Garcia (name pool)
078-05-1120 --> 523-50-6675 (digit shift +5)
4532-1189-0042 --> 7865-4412-3375 (digit shift +3)
john@email.com --> maria.g@example.net (derived)

Best for: fields the LLM needs to reason about naturally -- names in reports, account numbers in summaries.

Tokenize Mode

Replaces PII with structured placeholders:

078-05-1120 --> <<SSN:32.192.169.232:001>>
John Doe --> <<name:32.192.169.232:001>>
4532-1189-0042 --> <<digit_shift:32.192.169.232:001>>

A guidance prompt is automatically injected into the LLM request, instructing it to reproduce the tokens exactly as-is. Larger models (14B+ parameters) handle this reliably; smaller models (7B) may struggle.

Best for: defense-in-depth with F5 AI Guardrails. The tokens are intentionally distinctive -- if one leaks through de-cloaking, a guardrails policy can catch it.

Both modes can be mixed per-field in the same request.

The iAppLX Package

Context Cloak is packaged as an iAppLX extension -- a deployable application on BIG-IP with a REST API and web-based configuration UI. When deployed, it creates all required BIG-IP objects: data groups, iRules, HTTP profiles, SSL profiles, pools, monitors, and virtual servers.

The PII Field Configuration is the core of Context Cloak. The admin selects which JSON fields in MCP responses contain PII and chooses the cloaking mode per field:

Field / Aliases / Mode / Type or Label
full_name / customer_name / Substitute / Name Pool
ssn / - / Tokenize / SSN
account_number / - / Substitute / Digit Shift
phone / - / Substitute / Phone
email / - / Substitute / Email

The iRules are data-group-driven -- no PII field names are hardcoded. Change the data group via the GUI, and the cloaking behavior changes instantly. This means Context Cloak works with any MCP server, not just the financial demo.
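To make the "data-group-driven" point concrete, a configuration object like the one below could drive that behavior on the BIG-IP side. The data-group name and the record value format here are assumptions for illustration only; the real object names and schema are created by the iAppLX package, so check the project repository rather than relying on these.

# Hypothetical object name and value format -- not the project's actual schema.
# Keys are JSON field names from MCP responses, values are the per-field cloaking mode.
tmsh create ltm data-group internal context_cloak_pii_fields_dg type string \
    records add { full_name { data "substitute" } ssn { data "tokenize" } account_number { data "substitute" } }

# Switching a field's mode later is a data-group change, not an iRule change
tmsh modify ltm data-group internal context_cloak_pii_fields_dg \
    records modify { ssn { data "substitute" } }

An iRule can then read the configured mode for a field with something like [class match -value -- $field equals context_cloak_pii_fields_dg] and branch between substitution and tokenization without any field names hardcoded in the code.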
Live Demo

Enough theory -- here's what it looks like in practice.

Step 1: Install the RPM
Installing Context Cloak via BIG-IP Package Management LX

Step 2: Configure and Deploy
Context Cloak GUI -- MCP server, LLM endpoint, PII fields, one-click deploy
Deployment output showing session config and saved configuration

Step 3: Verify Virtual Servers
BIG-IP Local Traffic showing MCP VS and Inference VS created by Context Cloak

Step 4: Baseline -- No Cloaking
Without Context Cloak: real PII flows directly to the LLM in cleartext
This is the "before" picture. The LLM sees everything: real names, real SSNs, real account numbers.

Demo 1: Substitute Mode -- SSN Lookup
Prompt: "Show me the SSN number for John Doe. Just display the number."
Substitute mode -- Open WebUI + Context Cloak GUI showing all fields as Substitute
Result: User sees real SSN 078-05-1120. LLM saw a digit-shifted fake.

Demo 2: Substitute Mode -- Account Lookup
Prompt: "What accounts are associated to John Doe?"
Left: Open WebUI with real data. Right: vLLM logs showing "Maria Garcia" with fake account numbers

What the LLM saw:
"customer_name": "Maria Garcia"
"account_number": "7865-4412-3375" (checking)
"account_number": "7865-4412-3322" (investment)
"account_number": "7865-4412-3376" (savings)

What the user saw:
Customer: John Doe
Checking: 4532-1189-0042 -- $45,230.18
Investment: 4532-1189-0099 -- $312,500.00
Savings: 4532-1189-0043 -- $128,750.00

Switching to Tokenize Mode
Changing PII fields from Substitute to Tokenize in the GUI

Demo 3: Mixed Mode -- Tokenized SSN
SSN set to Tokenize, name set to Substitute.
Prompt: "Show me the SSN number for Jane Smith. Just display the number."
Mixed mode -- real SSN de-cloaked on left, <<SSN:...>> token visible in vLLM logs on right

What the LLM saw:
"customer_name": "Maria Thompson"
"ssn": "<<SSN:32.192.169.232:001>>"

What the user saw: Jane Smith, SSN 219-09-9999

Both modes operating on the same customer record, in the same request.

Demo 4: Full Tokenize -- The Punchline
ALL fields set to Tokenize mode.
Prompt: "Show me the SSN and account information for Carlos Rivera. Display all the numbers."
Full tokenize -- every PII field as a token, all de-cloaked on return

What the LLM saw -- every PII field was a token:
"full_name": "<<name:32.192.169.232:001>>"
"ssn": "<<SSN:32.192.169.232:002>>"
"phone": "<<phone:32.192.169.232:002>>"
"email": "<<email:32.192.169.232:001>>"
"account_number": "<<digit_shift:32.192.169.232:002>>" (checking)
"account_number": "<<digit_shift:32.192.169.232:003>>" (investment)
"account_number": "<<digit_shift:32.192.169.232:004>>" (savings)

What the user saw -- all real data restored:
Name: Carlos Rivera
SSN: 323-45-6789
Checking: 6789-3345-0022 -- $89,120.45
Investment: 6789-3345-0024 -- $890,000.00
Savings: 6789-3345-0023 -- $245,000.00

And here's the best part. Qwen's last line in the response: "Please note that the actual numerical values for the SSN and account numbers are masked due to privacy concerns." The LLM genuinely believed it showed the user masked data. It apologized for the "privacy masking" -- not knowing that BIG-IP had already de-cloaked every token back to the real values. The user saw the full, real, unmasked report.

What's Next: F5 AI Guardrails Integration

Context Cloak's tokenize mode is designed to complement F5 AI Guardrails. The <<TYPE:ID:SEQ>> format is intentionally distinctive -- if any token leaks through de-cloaking, a guardrails policy can catch it as a pattern match violation.
The vision: Context Cloak as the first layer of defense (PII never reaches the LLM), AI Guardrails as the safety net (catches anything that slips through). Defense in depth for AI data protection.

Other areas I'm exploring:

Hostname-based LLM routing -- BIG-IP as a model gateway with per-route cloaking policies
JSON profile integration -- native BIG-IP JSON DOM parsing instead of regex
Auto-discovery of MCP tool schemas for PII field detection
Centralized cloaking policy management across multiple BIG-IP instances

Try It Yourself

The complete project is open source: https://github.com/j2rsolutions/f5_mcp_context_cloak
The repository includes Terraform for AWS infrastructure, Kubernetes manifests, the iAppLX package (RPM available in Releases), iRules, sample financial data, a test script, comprehensive documentation, and a full demo walkthrough with GIFs (see docs/demo-evidence.md).

A Note on Production Readiness

I want to be clear: this is a lab proof-of-concept. I have not tested this in a production environment. The cloaking subtable stores PII in BIG-IP memory, the fake name pool is small (100 combinations), the SSL certificates are self-signed, and there's no authentication on the MCP server. There are edge cases around streaming responses, subtable TTL expiry, and LLM-derived values that need more work. But the core concept is proven: BIG-IP can transparently cloak PII in LLM workflows using value substitution and tokenization, and the iAppLX packaging makes it deployable and configurable without touching iRule code.

I'd love to hear what the community thinks. Is this approach viable for your use cases? What PII types would you need to support? How would you handle the edge cases? What would it take to make this production-ready for your environment? Let me know in the comments -- and if you want to contribute, PRs are welcome!

Demo Environment

F5 BIG-IP VE v21.0.0.1 on AWS (m5.xlarge)
Qwen 2.5 14B Instruct AWQ on vLLM 0.8.5 (NVIDIA L4, 24GB VRAM)
MCP Server: FastMCP 1.26 + PostgreSQL 16 on Kubernetes (RKE2)
Open WebUI v0.8.10
Context Cloak iAppLX v0.2.0

References

Managing MCP in iRules -- Part 1
Managing MCP in iRules -- Part 2
Managing MCP in iRules -- Part 3
Model Context Protocol Specification
James Veitch: This is what happens when you reply to spam email (TED, skip to 4:40)

Embedding APM Protected Resources into Third-Party Sites
This is the story of a fight with CORS (Cross-Origin Resource Sharing) and the process of understanding how it actually works. APM is a powerful tool that provides solutions for both simple and complex use cases. However, I have historically struggled with one specific scenario: embedding an APM-protected application into a third-party site. Unfortunately, APM doesn't provide any mechanism to make this easy. Everything works when accessing the APM application directly. But when the same application was embedded in another site, it failed silently -- nothing rendered, and the browser console filled with errors. Since I never had the opportunity to fully investigate the issue, alternative solutions were usually chosen.

Recently, I worked with a customer who needed protection for a web application, and APM seemed like the right fit. The application was divided into public and private components, where only the private resources required authentication. This appeared straightforward, and I initially assumed that some iRule logic would be sufficient. But unbeknownst to me, the site was not always the entry point to the application, which meant I now had the same old problem: serving content to a third-party site. Now I was forced to look into CORS for real and work out how to handle this beast. I spent a fair amount of time getting a better understanding, and to my surprise it wasn't really all that difficult to fix; it looked like I just needed to inject a couple of headers in the response and then it would all be dandy. This is where the complexity began -- specifically around iRules on a virtual server with APM enabled; I just couldn't get it to insert the headers the way I wanted.

While there are multiple ways to solve this, I chose a layered virtual server design to keep the logic clean and predictable. This weird construct has been my go-to solution for some time now and has proven quite reliable. With a layered VS you have two virtual servers that are glued together via the iRule command "virtual". This gives you the option to run modules, iRules, policies, etc. independently of each other. One of the VSs is the entry point, and based on the logic you then forward traffic to the second one.

In this setup the frontend virtual server -- the one exposed to the users -- has a single iRule responsible for:

Handling CORS preflight requests by returning the appropriate headers
Injecting CORS headers into responses when the origin is allowed
Forwarding all non-preflight requests to the APM virtual server

This separation ensures full control over HTTP processing without being constrained by APM behavior. On the inner APM virtual server, an access profile is attached which handles authentication. I usually set a non-reachable IP address on the inner VS to ensure that it can only be reached via the frontend VS.

Initially, I planned to use a per-request policy on the inner APM VS with a URL branch agent to determine whether a request should be authenticated or not. While this looked correct on paper, it failed in practice. The third-party site was requesting embedded resources (such as images), and its logic could not handle APM redirects (e.g., '/my.policy'). One possible workaround was to use "clientless mode" by inserting the special header:

HTTP::header insert "clientless-mode" 1

However, this introduces additional logic without providing real benefits, and I wasn't sure if the application would handle the session cookie correctly or just create a million new APM sessions.
Instead, I implemented an iRule that performs a datagroup lookup. If a match is found, APM is bypassed entirely for that request. This approach is simpler to maintain and reduces load by not utilizing the APM module.

Below is a diagram illustrating the request flow and decision logic:

A user browses the third-party site, which has links to our site. This is the Origin site.
The browser retrieves the resources on our site.
Should the browser decide to make a preflight request, the iRule will verify the Origin header against a datagroup and, if allowed, return the CORS headers.
Forward the traffic to the inner APM VS.
The iRule on the inner APM VS performs a datagroup lookup, finds no match, and the authentication process is executed.
The iRule on the inner APM VS performs a datagroup lookup, finds a match, and disables APM.
On the response from the backend we check if the origin was on the approved list, and if so we inject the CORS headers, which will allow the browser to show the content.
A user browses the site directly, which will bypass any CORS logic.

Here is the iRule for the frontend VS:

# =============================================================================
# iRule: Outer virtual for redirect + CORS before APM
# -----------------------------------------------------------------------------
# Purpose:
#   - Redirect "/" on portal.example.com to /portal/home/
#   - Handle CORS preflight locally before APM
#   - Forward all other traffic to the inner APM virtual
#   - Inject CORS headers into normal responses
#
# Dependencies:
#   - Attached to outer/public virtual server
#   - Inner virtual server exists and is called via "virtual"
#   - Internal string datagroup for allowed CORS origin hostnames
#
# Datagroups:
#   - portal_example_com_allowed_cors_origins_dg
#
# Notes:
#   - Datagroup must contain lowercase hostnames only
#   - Empty datagroup = allow all origins
# =============================================================================

when RULE_INIT {
    set static::portal_example_com_outer_cors_debug_enabled 1
}

when HTTP_REQUEST {

    # Initialize request-scoped state explicitly.
    set cors_origin ""
    set cors_origin_host ""
    set cors_is_allowed 0

    if { $static::portal_example_com_outer_cors_debug_enabled } {
        log local0. "debug: HTTP_REQUEST start - host=[HTTP::host] uri=[HTTP::uri] method=[HTTP::method]"
    }

    # -------------------------------------------------------------------------
    # Root redirect
    # -------------------------------------------------------------------------
    if { ([string tolower [getfield [HTTP::host] ":" 1]] eq "portal.example.com") && ([HTTP::uri] eq "/") } {
        if { $static::portal_example_com_outer_cors_debug_enabled } {
            log local0. "debug: root redirect matched"
        }
        HTTP::redirect "https://portal.example.com/portal/home/"
        return
    }

    # -------------------------------------------------------------------------
    # Origin evaluation (empty DG = allow all)
    # -------------------------------------------------------------------------
    if { [HTTP::header exists "Origin"] } {
        set cors_origin [HTTP::header "Origin"]
        set cors_origin_host [string tolower [URI::host $cors_origin]]

        if { $static::portal_example_com_outer_cors_debug_enabled } {
            log local0. "debug: origin='$cors_origin' parsed_host='$cors_origin_host'"
        }

        if { ([class size portal_example_com_allowed_cors_origins_dg] == 0) || ($cors_origin_host ne "" && [class match -- $cors_origin_host equals portal_example_com_allowed_cors_origins_dg]) } {
            set cors_is_allowed 1
            if { $static::portal_example_com_outer_cors_debug_enabled } {
                log local0. "debug: origin allowed"
            }
        } else {
            if { $static::portal_example_com_outer_cors_debug_enabled } {
                log local0. "debug: origin NOT allowed"
            }
        }
    } else {
        if { $static::portal_example_com_outer_cors_debug_enabled } {
            log local0. "debug: no Origin header"
        }
    }

    # -------------------------------------------------------------------------
    # Preflight handling
    # -------------------------------------------------------------------------
    if { $cors_is_allowed } {
        if { ([HTTP::method] eq "OPTIONS") && ([HTTP::header exists "Access-Control-Request-Method"]) } {
            if { $static::portal_example_com_outer_cors_debug_enabled } {
                log local0. "debug: handling preflight locally"
            }
            HTTP::respond 200 noserver \
                "Access-Control-Allow-Origin" $cors_origin \
                "Access-Control-Allow-Methods" "GET, POST, OPTIONS" \
                "Access-Control-Allow-Headers" [HTTP::header "Access-Control-Request-Headers"] \
                "Access-Control-Max-Age" "86400" \
                "Vary" "Origin"
            return
        }
    }

    if { $static::portal_example_com_outer_cors_debug_enabled } {
        log local0. "debug: forwarding to inner virtual"
    }

    # -------------------------------------------------------------------------
    # Forward to inner APM virtual
    # -------------------------------------------------------------------------
    virtual portal.example.com_https_vs
}

when HTTP_RESPONSE {
    if { $cors_is_allowed && $cors_origin ne "" } {
        if { $static::portal_example_com_outer_cors_debug_enabled } {
            log local0. "debug: injecting CORS headers"
        }
        HTTP::header replace "Access-Control-Allow-Origin" $cors_origin
        HTTP::header replace "Vary" "Origin"
    }
}

Here is the iRule for the inner APM VS:

# =============================================================================
# iRule: Selective APM bypass for portal.example.com using datagroups
# -----------------------------------------------------------------------------
# Purpose:
#   Disable APM only for explicitly public endpoint prefixes while preserving
#   APM protection for everything else.
#
# Dependencies:
#   - BIG-IP LTM
#   - BIG-IP APM
#   - Internal string datagroup for public path prefixes
#
# Datagroups:
#   - gisportal_public_path_prefixes_dg
#
# Notes:
#   - Matching is done against the raw request path derived from HTTP::uri.
#   - Only the query string is stripped.
#   - Datagroup entries are matched using starts_with behavior.
#   - APM is disabled only when a request matches an explicit public prefix.
#   - All other requests remain protected by APM.
#   - Debug logging can be enabled in RULE_INIT by setting:
#       static::portal_example_com_apm_bypass_debug 1
# =============================================================================

when RULE_INIT {
    set static::portal_example_com_apm_bypass_debug 1
}

when HTTP_REQUEST {

    set normalized_host [string tolower [getfield [HTTP::host] ":" 1]]
    set normalized_uri [string tolower [HTTP::uri]]

    # -------------------------------------------------------------------------
    # Extract request path from raw URI
    # -------------------------------------------------------------------------
    set query_delimiter_index [string first "?" $normalized_uri]

    if { $query_delimiter_index >= 0 } {
        set normalized_path [string range $normalized_uri 0 [expr {$query_delimiter_index - 1}]]
    } else {
        set normalized_path $normalized_uri
    }

    if { $static::portal_example_com_apm_bypass_debug } {
        log local0. "debug: host='$normalized_host' uri='$normalized_uri' path='$normalized_path'"
    }

    if { $normalized_host ne "portal.example.com" } {
        if { $static::portal_example_com_apm_bypass_debug } {
            log local0. "debug: host did not match target host, skipping rule"
        }
        return
    }

    # -------------------------------------------------------------------------
    # PUBLIC
    # -------------------------------------------------------------------------
    set matched_public_prefix [class match -name -- $normalized_path starts_with gisportal_public_path_prefixes_dg]

    if { $static::portal_example_com_apm_bypass_debug } {
        if { $matched_public_prefix ne "" } {
            log local0. "debug: matched PUBLIC prefix '$matched_public_prefix' -> disabling APM"
        } else {
            log local0. "debug: no public prefix matched -> APM remains enabled"
        }
    }

    if { $matched_public_prefix ne "" } {
        ACCESS::disable
        return
    }
}

I can recommend spending time on YouTube to get a better feeling for what CORS is about and why it is being used. You might run into another problem with APM if you try to embed the entire application, with its APM authentication logic, into a frame in a third-party app. I have not addressed it in this example, but you could extend the iRule logic to inject CSP headers to make this possible (see the sketch at the end of this article). I might address this in another article. I hope this example can help you fast-track past all the headaches I struggled with. If you have any feedback just let me know. I don't know everything about CORS, and I'm sure there are areas that require special attention that I haven't addressed. Tell me so we can enrich the solution by sharing knowledge.
The State Of HTTP/2 Full Proxy With F5 LTM

In this article, I will attempt to summarize the known challenges of an HTTP/2 full proxy setup, point out possible solutions, and document known bugs and incompatibilities. Most major browsers had added HTTP/2 support by the end of 2015. However, I hardly ever see F5 LTM setups with HTTP/2 full proxy configured.

F5 Container Ingress Services (CIS) and using k8s traffic policies to send traffic directly to pods
This article will take a look at how you can use health monitors on the BIG-IP to solve the issue with constant AS3 REST-API pool member changes, or when there is a sidecar service mesh like Istio (F5 has a version of the Istio mesh called Aspen Mesh) or Linkerd. I have also described some possible enhancements for CIS/AS3, NGINX Ingress Controller, and Gateway Fabric that would be nice to have in the future.

1. Intro
2. Install Nginx Ingress Open source and CIS
3. F5 CIS without Ingress/Gateway
4. F5 CIS with Ingress
5. F5 CIS with Gateway fabric
6. Summary

1. Intro

F5 CIS allows integration between F5 and Kubernetes or OpenShift clusters. F5 CIS has two modes, NodePort and ClusterIP, and this is well documented at https://clouddocs.f5.com/containers/latest/userguide/config-options.html . There is also a mode called auto that I prefer, as based on the k8s service type (NodePort or ClusterIP) it knows how to configure the pool members. CIS in ClusterIP mode is generally much better, as you bypass kube-proxy and send traffic directly to pods, but there can be issues if k8s pods are constantly being scaled up or down, because CIS uses the AS3 REST API to talk to and configure the F5 BIG-IP. I have also seen issues where a bug or a config error that is not well validated can bring the entire CIS-to-BIG-IP control channel down, in which case you see 422 errors in the F5 logs and in the CIS logs. By using NodePort with "externalTrafficPolicy: Local" (and, if there is an ingress, also "internalTrafficPolicy: Local") you can also bypass the Kubernetes proxy and send traffic directly to the pods, and BIG-IP health monitoring will mark the nodes that don't have pods as down, because the traffic policies prevent nodes that do not have the web application pods from forwarding the traffic to other nodes.

2. Install Nginx Ingress Open source and CIS

As I already have the k8s version of NGINX and F5 CIS, I need 3 different ingress classes. The k8s NGINX ingress is end of life ( https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/ ), so my example also shows how you can run the two NGINX versions, the k8s NGINX and the F5 NGINX, in parallel. There is a new option to use the Operator Lifecycle Manager (OLM) that, when installed, will install the components, and this is an even better way than helm (you can install OLM with helm, and this is an even newer way to manage the NGINX ingress!), but I found it still in an early stage for k8s, while for OpenShift it is much more advanced. I have installed NGINX as a DaemonSet, not a Deployment (I will mention why later on), and I have added a listener config for the F5 TransportServer, even though it will be seen later why it is not usable at the moment.
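One prerequisite worth calling out before the install commands below: the CIS values.yaml further down references bigip_login_secret: f5-bigip-ctlr-login, and that Kubernetes secret has to exist before the controller can authenticate to the BIG-IP. A typical way to create it is sketched here; the key names follow the CIS documentation, but the namespace must match wherever your CIS controller pod actually runs, so verify against the official CIS install guide before using this.

# Create the BIG-IP credentials secret referenced by bigip_login_secret in values.yaml.
# Run it in the namespace where the CIS controller is deployed (adjust with -n accordingly).
kubectl create secret generic f5-bigip-ctlr-login \
  --from-literal=username=admin \
  --from-literal=password='<big-ip-admin-password>'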
helm install -f values.yaml ginx-ingress oci://ghcr.io/nginx/charts/nginx-ingress \
  --version 2.4.1 \
  --namespace f5-nginx \
  --set controller.kind=daemonset \
  --set controller.image.tag=5.3.1 \
  --set controller.ingressClass.name=nginx-nginxinc \
  --set controller.ingressClass.create=true \
  --set controller.ingressClass.setAsDefaultIngress=false

cat values.yaml
controller:
  enableCustomResources: true
  globalConfiguration:
    create: true
    spec:
      listeners:
      - name: nginx-tcp
        port: 88
        protocol: TCP

kubectl get ingressclasses
NAME             CONTROLLER                     PARAMETERS   AGE
f5               f5.com/cntr-ingress-svcs       <none>       8d
nginx            k8s.io/ingress-nginx           <none>       40d
nginx-nginxinc   nginx.org/ingress-controller   <none>       32s

niki@master-1:~$ kubectl get pods -o wide -n f5-nginx
NAME                             READY   STATUS    RESTARTS   AGE   IP              NODE       NOMINATED NODE   READINESS GATES
nginx-ingress-controller-2zbdr   1/1     Running   0          62s   10.10.133.234   worker-2   <none>           <none>
nginx-ingress-controller-rrrc9   1/1     Running   0          62s   10.10.226.87    worker-1   <none>           <none>
niki@master-1:~$

The CIS config is shown below. I have used "pool_member_type" auto, as this allows ClusterIP or NodePort services to be used at the same time.

helm install -f values.yaml f5-cis f5-stable/f5-bigip-ctlr

cat values.yaml
bigip_login_secret: f5-bigip-ctlr-login
rbac:
  create: true
serviceAccount:
  create: true
  name:
namespace: f5-cis
args:
  bigip_url: X.X.X.X
  bigip_partition: kubernetes
  log_level: DEBUG
  pool_member_type: auto
  insecure: true
  as3_validation: true
  custom_resource_mode: true
  log-as3-response: true
  load-balancer-class: f5
  manage-load-balancer-class-only: true
  namespaces: [default, test, linkerd-viz, ingress-nginx, f5-nginx]
  # verify-interval: 35
image:
  user: f5networks
  repo: k8s-bigip-ctlr
  pullPolicy: Always
nodeSelector: {}
tolerations: []
livenessProbe: {}
readinessProbe: {}
resources: {}
version: latest

3. F5 CIS without Ingress/Gateway

Without an Ingress, the F5 configuration is actually much simpler, as you just need to create a NodePort service and the VirtualServer CR. As you can see below, the health monitor marks the control node and the worker node that do not have a pod from "hello-world-app-new-node" as down, as shown in the F5 picture below. Sending traffic without Ingresses or Gateways removes one extra hop and the sub-optimal traffic patterns that appear when the Ingress or Gateway runs as a Deployment: there could be 20 nodes and only 2 ingress/gateway pods on 1 node each, so traffic will need to go to only those 2 nodes to enter the cluster.

apiVersion: v1
kind: Service
metadata:
  name: hello-world-app-new-node
  labels:
    app: hello-world-app-new-node
spec:
  externalTrafficPolicy: Local
  ports:
  - name: http
    protocol: TCP
    port: 8080
    targetPort: 8080
  selector:
    app: hello-world-app-new
  type: NodePort
---
apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  name: vs-hello-new
  namespace: default
  labels:
    f5cr: "true"
spec:
  virtualServerAddress: "192.168.1.71"
  virtualServerHTTPPort: 80
  host: www.example.com
  hostGroup: "new"
  snat: auto
  pools:
  - monitor:
      interval: 10
      recv: ""
      send: "GET /"
      timeout: 31
      type: http
    path: /
    service: hello-world-app-new-node
    servicePort: 8080

For Istio and Linkerd integration, an iRule could be needed to send custom ALPN extensions to the backend pods that now have a sidecar. I suggest seeing my article on Medium for more information: https://medium.com/@nikoolayy1/connecting-kubernetes-k8s-cluster-to-external-router-using-bgp-with-calico-cni-and-nginx-ingress-2c45ebe493a1

Keep in mind that for the new options with Ambient mesh (sidecarless), CIS without an Ingress will not work, as F5 does not speak the HBONE (HTTP-Based Overlay Network Environment) protocol that is sent in the HTTP CONNECT tunnel to inform the zTunnel (a layer 3/4 proxy that starts or terminates the mTLS) about the real source identity (SPIFFE and SPIRE), which may not be the same as the one in the CN/SAN of the client SSL cert. Maybe in the future there could be an option, based on a CRD, to provide the IP address of an external device like F5 and have the zTunnel proxy terminate the TLS/SSL (the waypoint layer 7 proxy, usually Envoy, is not needed in this case as F5 will do the HTTP processing) and send traffic to the pod, but for now I see no way to make F5 work directly with Ambient mesh. If the zTunnel takes the identity from the client cert CN/SAN, F5 will not even have to speak HBONE.

4. F5 CIS with Ingress

Why might we need an ingress just as a gateway into the k8s cluster, you may ask? Nowadays a service mesh like Linkerd, Istio, or F5 Aspen Mesh is often used and the pods talk to each other with mTLS handled by the sidecars, and an Ingress, as shown in https://linkerd.io/2-edge/tasks/using-ingress/ , is an easy way for the client side to be HTTPS while the server side is the service mesh mTLS. Even ambient mesh works with Ingresses, as it captures traffic after them. From my tests it is possible for F5 to talk to Linkerd-injected pods, for example, but it is hard! I have described this in more detail at https://medium.com/@nikoolayy1/connecting-kubernetes-k8s-cluster-to-external-router-using-bgp-with-calico-cni-and-nginx-ingress-2c45ebe493a1

Unfortunately, when there is an ingress, things are much more complex! F5 has an integration called "IngressLink", but as I recently found out, it is for when BIG-IP does only Layer 3/4 load balancing and the Nginx Ingress Controller actually does the decryption, with AppProtect WAF on the Nginx as well (see F5 CIS IngressLink attaching WAF policy on the big-ip through the CRD ? | DevCentral). I wish F5 would make an integration like "IngressLink" but in reverse, where each node has an nginx ingress (this can be done with a DaemonSet and not a Deployment on k8s) and Nginx Ingress is the layer 3/4 tier, as the Nginx VirtualServer CRD supports this, and just allow F5 into the k8s cluster. Below is how this can currently be done. I have created a TransportServer but it is not used, as it does not at the moment support the option "use-cluster-ip" set to true, which would make Nginx not bypass the service and go directly to the endpoints; that would cause nodes that have an nginx ingress pod but no application pod to send the traffic to other nodes, and we do not want that, as it adds one more layer of load balancing latency and performance impact. The gateway is shared, as you can have a different gateway per namespace or a shared one, like the Ingress.

apiVersion: v1
kind: Service
metadata:
  name: hello-world-app-new-cluster
  labels:
    app: hello-world-app-new-cluster
spec:
  internalTrafficPolicy: Local
  ports:
  - name: http
    protocol: TCP
    port: 8080
    targetPort: 8080
  selector:
    app: hello-world-app-new
  type: ClusterIP
---
apiVersion: k8s.nginx.org/v1
kind: TransportServer
metadata:
  name: nginx-tcp
  annotations:
    nginx.org/use-cluster-ip: "true"
spec:
  listener:
    name: nginx-tcp
    protocol: TCP
  upstreams:
  - name: nginx-tcp
    service: hello-world-app-new-cluster
    port: 8080
  action:
    pass: nginx-tcp
---
apiVersion: k8s.nginx.org/v1
kind: VirtualServer
metadata:
  name: nginx-http
spec:
  host: "app.example.com"
  upstreams:
  - name: webapp
    service: hello-world-app-new-cluster
    port: 8080
    use-cluster-ip: true
  routes:
  - path: /
    action:
      pass: webapp

The second part of the configuration is to expose the Ingress to BIG-IP using CIS.

---
apiVersion: v1
kind: Service
metadata:
  name: f5-nginx-ingress-controller
  namespace: f5-nginx
  labels:
    app.kubernetes.io/name: nginx-ingress
spec:
  externalTrafficPolicy: Local
  type: NodePort
  selector:
    app.kubernetes.io/name: nginx-ingress
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: http
---
apiVersion: "cis.f5.com/v1"
kind: VirtualServer
metadata:
  name: vs-hello-ingress
  namespace: f5-nginx
  labels:
    f5cr: "true"
spec:
  virtualServerAddress: "192.168.1.81"
  virtualServerHTTPPort: 80
  snat: auto
  pools:
  - monitor:
      interval: 10
      recv: "200"
      send: "GET / HTTP/1.1\r\nHost:app.example.com\r\nConnection: close\r\n\r\n"
      timeout: 31
      type: http
    path: /
    service: f5-nginx-ingress-controller
    servicePort: 80

Only the nodes that have a pod will answer the health monitor. Hopefully F5 can make some integration and CRD that makes this configuration simpler, like "IngressLink", and add the option "use-cluster-ip" to the TransportServer, as Nginx does not need to see the HTTP traffic at all. This is on my wish list for this year 😁 Also, if AS3 could reference an existing group of nodes just with different ports, this could help: CIS would need to push the AS3 declaration of nodes just one time, and then the different VirtualServers could reference it with different ports, which would make the AS3 REST-API traffic much smaller.

5. F5 CIS with Gateway fabric

This does not work at the moment, as gateway-fabric unfortunately does not support the "use-cluster-ip" option. The idea is to deploy the gateway fabric as a DaemonSet and to inject it with a sidecar, or even without one this will work with ambient meshes. As the k8s world is moving away from the Ingress, this will be a good option. Gateway fabric natively supports TCP and UDP traffic and even TLS traffic that is not HTTPS, and by exposing the gateway fabric with a ClusterIP or NodePort service, then with different hostnames the Gateway fabric will select the correct route to send the traffic to!

helm install ngf oci://ghcr.io/nginx/charts/nginx-gateway-fabric --create-namespace -n nginx-gateway -f values-gateway.yaml

cat values-gateway.yaml
nginx:
  # Run the data plane per-node
  kind: daemonSet
  # How the data plane gets exposed when you create a Gateway
  service:
    type: NodePort # or NodePort

# (optional) if you're using Gateway API experimental channel features:
nginxGateway:
  gwAPIExperimentalFeatures:
    enable: true

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gw
  namespace: nginx-gateway
spec:
  gatewayClassName: nginx
  listeners:
  - name: https
    port: 443
    protocol: HTTPS
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: wildcard-tls
    allowedRoutes:
      namespaces:
        from: All
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: app-route
  namespace: app
spec:
  parentRefs:
  - name: shared-gw
    namespace: nginx-gateway
  hostnames:
  - app.example.com
  rules:
  - backendRefs:
    - name: app-svc
      port: 8080

F5 Nginx Fabric mesh is evolving really fast from what I see, so hopefully we see the features I mentioned soon, and you can always open a GitHub case. The documentation is at https://docs.nginx.com/nginx-gateway-fabric and, as this uses k8s CRDs, the full options can be seen at TLS - Kubernetes Gateway API

6. Summary

With the release of TMOS 21, F5 now supports many more health monitors and pool members, so this way of deploying CIS with NodePort services may offer benefits with TMOS 21.1, which will be the stable version, as shown in https://techdocs.f5.com/en-us/bigip-21-0-0/big-ip-release-notes/big-ip-new-features.html With auto mode, some services can still be directly exposed to BIG-IP, as the CIS config changes are usually faster at removing a pool member pod than BIG-IP health monitors are at marking a node as down. The new version of CIS, which will be CIS Advanced, may take care of the concerns about hitting a bug or a not-well-validated configuration that could bring the control channel down, and TMOS 21.1 may also handle AS3 config changes better with fewer CPU/memory issues, so in the future there may be no need for traffic policies, NodePort mode, and k8s services of this type. For ambient mesh, my example with Ingress and Gateway seems to be the only option for direct communication at the moment. We will see what the future holds!

F5 Distributed Cloud (XC) Custom Routes: Capabilities, Limitations, and Key Design Considerations
This article explores how Custom Routes work in F5 Distributed Cloud (XC), why they differ architecturally from standard Load Balancer routes, and what to watch out for in real-world deployments, covering backend abstraction, Endpoint/Cluster dependencies, and critical TLS trust and Root CA requirements.

XC Distributed Cloud and how to keep the Source IP from changing with customer edges(CE)!
Code is community submitted, community supported, and recognized as 'Use At Your Own Risk'.

Old applications sometimes do not accept a different IP address being used by the client during the session/connection. How can we make certain the IP stays the same for a client? The best option will always be for the application to stop tracking users based on something as primitive as an IP address. Sometimes the issue is in the Load Balancer or ADC behind the XC RE; if the persistence there is based on source IP address, it should be changed -- in the case of BIG-IP, to Cookie or Universal persistence, or SSL-session-based persistence if the Load Balancer is doing no decryption and is working purely at the TCP/UDP layer. As an XC Regional Edge (RE) has many IP addresses it can use to connect to the origin servers, adding a CE for the legacy apps is a good option to keep the source IP from changing for the same client's HTTP requests during the session/transaction.

Before going through this article I recommend reading the links below:

F5 Distributed Cloud – CE High Availability Options: A Comparative Exploration | DevCentral
F5 Distributed Cloud - Customer Edge | F5 Distributed Cloud Technical Knowledge
Create Two Node HA Infrastructure for Load Balancing Using Virtual Sites with Customer Edges | F5 Distributed Cloud Technical Knowledge

RE to CE cluster of 3 nodes

The new SNAT prefix option under the origin pool allows the same IP address to be seen by the origin no matter which CE connects to the origin pool. Be careful: if you have more than a single IP with /32, then again the client may get a different IP address each time. This may cause "inet port exhaustion" (that is what it is called on F5 BIG-IP) if there are too many connections to the origin server, so be careful, as the SNAT option was added primarily for that use case. There was an older option called "LB source IP persistence", but it is better not to use it, as it was not as optimized and clean as this one.

RE to 2 CE nodes in a virtual site

The same SNAT pool option is not allowed for a virtual site made of 2 standalone CEs. For this we can use the ring hash algorithm. Why does this work? Well, as Kayvan explained to me, the hashing of the origin takes the CE name into account, so the same origin under 2 different CEs will get the same ring hash, and the same source IP address will be sent to the same CE to access the Origin Server. This will not work for a single 3-node CE cluster, as all 3 nodes have the same name. I have seen 503 errors when ring hash is enabled under the HTTP LB, so enable it only under the XC route object and the origin pool attached to it!

CE hosted HTTP LB with Advertise policy

In XC with CE you can do HA with a 3-node CE cluster that can be Layer 2 HA based on VRRP and ARP, or Layer 3 persistence based on BGP, which can work with a 3-node CE cluster or 2 CEs in a virtual site and its control options like weight, AS prepend, or local preference at the router level. For Layer 2 I will just mention that you need to allow 224.0.0.8 for the VRRP if you are migrating from BIG-IP HA, and that XC selects 1 CE to hold the active IP, which is seen in the XC logs; at the moment that selection, for some reason, can't be controlled. If a CE can't reach the origin servers in the origin pool, it should stop advertising the HTTP LB IP address through BGP.
For those options, Deploying F5 Distributed Cloud (XC) Services in Cisco ACI - Layer Three Attached Deployment is a great example, as it shows ECMP BGP; with the BGP attributes you can easily select one CE to be active and processing connections, so that just one IP address is seen by the origin server. When a CE gets traffic, by default it prefers to send it to the origin itself, as "Local Preferred" is enabled under the origin pool by default. In clouds like AWS/Azure, a cloud-native LB is simply added in front of the 3-CE cluster, and the solution there is as simple as modifying that LB to have persistence. Public Clouds do not support ARP, so forget about Layer 2 and play with the native LB that load balances between the CEs 😉

CE on Public Cloud (AWS/Azure/GCP)

When deploying on a public cloud, the CE can be deployed in two ways. One is through the XC GUI by adding the AWS credentials, but this way you do not have much freedom, to be honest, as you can't deploy 2 CEs, make a virtual site out of them, and add a cloud LB in front of them; it will always be a 3-CE cluster with a preconfigured cloud LB that uses all 3 CEs! Using the newer "clickops" method is much better https://docs.cloud.f5.com/docs-v2/multi-cloud-network-connect/how-to/site-management/deploy-site-aws-clickops or using Terraform, but with manual mode and aws as the provider (not XC/volterra, as that is the same as the XC GUI deployment) https://docs.cloud.f5.com/docs-v2/multi-cloud-network-connect/how-to/site-management/deploy-aws-site-terraform This way you can make the cloud LB use just one CE or have some client persistence, or, if traffic comes from RE to CE, implement the 2-CE-node virtual site! There is no Layer 2 ARP support, as I mentioned, in public cloud with a 3-node cluster, but there is a NAT policy https://docs.cloud.f5.com/docs-v2/multi-cloud-network-connect/how-tos/networking/nat-policies though I haven't tried it myself to comment on it. Hope you enjoyed this article!

F5 XC Distributed Cloud HTTP Header/Cookie manipulations and using the client ip/user headers
1 . F5 XC distributed cloud HTTP Header manipulations In the F5 XC Distributed Cloud some client information is saved to variables that can be inserted in HTTP headers similar to how F5 Big-IP saves some data that can after that be used in a iRule or Local Traffic Policy. By default XC will insert XFF header with the client IP address but what if the end servers want an HTTP header with another name to contain the real client IP. Under the HTTP load balancer under "Other Options" under "More Options" the "Header Options" can be found. Then the the predefined variables can be used for this job like in the example below the $[client_address] is used. A list of the predefined variables for F5 XC: https://docs.cloud.f5.com/docs/how-to/advanced-security/configure-http-header-processing There is $[user] variable and maybe in the future if F5 XC does the authentication of the users this option will be insert the user in a proxy chaining scenario but for now I think that this just manipulates data in the XAU (X-Authenticated-User) HTTP header. 2. Matching of the real client ip HTTP headers You can also match a XFF header if it is inserted by a proxy device before the F5 XC nodes for security bypass/blocking or for logging in the F5 XC. For User logging from the XFF Under "Common Security Controls" create a "User Identification Policy". You can also match a regex that matches the ip address and this is in case there are multiple IP addresses in the XFF header as there could have been many Proxy devices in the data path and we want see if just one is present. For Security bypass or blocking based based on XFF Under "Common Security Controls" create a "Trusted Client Rules" or "Client Blocking Rules". Also if you have "User Identification Policy" then you can just use the "User Identifier" but it can't use regex in this case. I have made separate article about User-Identification F5 XC Session tracking and logging with User Identification Policy | DevCentral To match a regex value in the header that is just a single IP address, even when the header has many ip addresses, use the regex (1\.1\.1\.1) as an example to mach address 1.1.1.1. To use the client IP address as a source Ip address to the backend Origin Servers in the TCP packet after going through the F5 XC (similar to removing the SNAT pool or Automap in F5 Big-IP) use the option below: The same way the XAU (X-Authenticated-User) HTTP header can be used in a proxy chaining topology, when there is a proxy before the F5 XC that has added this header. Edit: Keep in mind that in some cases in the XC Regex for example (1\.1\.1\.1) should be written without () as 1\.1\.1\.1 , so test it as this could be something new and I have seen it in service policy regex matches, when making a new custom signature that was not in WAAP WAF XC policy. I could make a seperate article for this 🙂 XC can even send the client certificate attributes to the backend server if Client Side mTLS is enabled but it is configured at the cert tab. 3. F5 XC distributed cloud HTTP Cookie manipulations. Now you can overwrite the XC cookie by keeping the value but modifying the tags and this is big thing as before this was not possible. When combined with cookies this becomes very powerful thing as you can match on User-Agent header and for Mozilla for example to change the flags as if there is bug with the browser etc. 
The feature also changes cookies returned in the Set-Cookie response header from the origin server, as it should.
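Going back to the regex matching from section 2, the pattern logic can be sanity-checked offline with grep before putting it into the XC policy; the XFF value below is made up, and keep in mind that the XC regex engine may still treat the parentheses differently, as noted in the Edit above:

# XFF value with several proxy hops; we only care whether 1.1.1.1 is one of them
echo "X-Forwarded-For: 203.0.113.7, 1.1.1.1, 10.0.0.5" | grep -E '1\.1\.1\.1'

# Same check with the capturing-parentheses variant of the pattern
echo "X-Forwarded-For: 203.0.113.7, 1.1.1.1, 10.0.0.5" | grep -E '(1\.1\.1\.1)'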
Knowledge sharing: Velos and rSeries (F5OS) basic troubleshooting, logs and commands

This is another part of my knowledge sharing articles, where I take a deeper look into VELOS and rSeries issue investigation, logs and commands.

1. Velos HA controller and blade issues.

As the VELOS system is the one with two controllers in active/standby mode, only with VELOS may it be needed to check whether there is an issue with the controllers' HA. As the HA order can be different for the system and for the different partitions, to check the HA for the system use the /var/log_controller/cc-confd file, and for a partition HA issue look at the partition velos log at /var/F5/partition<ID>/log/velos.log . You can also enable HA debug for the controllers with " system dbvars config debug confd ha-state-machine true ".

Overview of HA: https://support.f5.com/csp/article/K19204400
Controller HA: https://support.f5.com/csp/article/K21130014
Partition HA: https://support.f5.com/csp/article/K58515297

List of Velos/rSeries services:
Overview of F5 VELOS chassis controller services
Overview of F5 VELOS partition services
Overview of F5 rSeries system services

2. Entering into F5OS objects.

The rSeries and VELOS tenants are like vCMP guests on VIPRION, and sometimes, if there are access issues with them, it may be needed to open their console. For this the "virtctl" command can be used, for example " /usr/share/omd/kubevirt/virtctl console <tenant_name>-<tenant_instance_ID> ". Also, as VELOS uses blades and partitions, it may be needed to ssh to a blade with " ssh slot<number> " or to enter a partition with " docker exec -it partition<ID>_cli su admin ", as sometimes, for example, entering the GUI container for the partition is needed to see the GUI logs; F5 support will ask for this in most cases, and maybe this will also be the way to enter the BIG-IP NEXT CLI.

Overview of VELOS system architecture: https://support.f5.com/csp/article/K73364432
Overview of rSeries system architecture: https://support.f5.com/csp/article/K49918625
rSeries tenant access: https://support.f5.com/csp/article/K33373310
Velos blade and tenant access: https://support.f5.com/csp/article/K65442484
Velos partition access: https://support.f5.com/csp/article/K11206563

3. Useful commands and logs.

For VELOS/rSeries, as this is a clustered system, the "show cluster" command is useful to see any issues (look for "cluster is NOT ready."). The velos.log for the controller and partitions is a great place to start, and debug level can be enabled for it under " SYSTEM SETTINGS Log Settings ", which is also the place to set rSeries logging to debug. The /var/log/openshift.log is also good to check on VELOS if there are cluster issues, or k3s.log on rSeries. The confd logs are like the mcpd logs, so they are really useful for VELOS or rSeries. Other nice commands are docker ps, oc get pod --all-namespaces -o wide and kubectl get pod --all-namespaces -o wide, but support will ask for them in most cases. A first-pass log sweep is sketched right after this section.

Velos cluster status: https://support.f5.com/csp/article/K27427444
Velos debug: https://support.f5.com/csp/article/K51486849
Velos openshift example issue: https://support.f5.com/csp/article/K01030619
Monitoring Velos: https://clouddocs.f5.com/training/community/velos-training/html/monitoring_velos.html
Monitoring rSeries: https://clouddocs.f5.com/training/community/rseries-training/html/monitoring_rseries.html
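As mentioned in section 3, a first-pass log sweep from the controller root bash could look like the sketch below; partition ID 1 is just an assumption, adjust it to your system, and the grep patterns are only a starting point:

# Controller HA daemon log
tail -n 50 /var/log_controller/cc-confd

# Partition-level events and HA messages (partition ID 1 assumed)
grep -iE 'error|fail' /var/F5/partition1/log/velos.log | tail -n 50

# Cluster-level issues on VELOS (check k3s.log on rSeries instead)
grep -i 'error' /var/log/openshift.log | tail -n 50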
4. Velos and rSeries tcpdump packet captures, file utility and qkview files.

For VELOS, qkviews can be created for the controller or for a partition, as they are separate qkviews. Tcpdumps for client traffic are done with the tcpdump utility from the F5OS (su - admin), while a tcpdump in the Linux kernel only covers the management IP addresses of the appliance, controller (floating or local), partition or tenant. The file utility allows file transfers to remote servers, or even downloading any log from the VELOS/rSeries to your computer, which was not possible before with iSeries or VIPRION. The file utility also starts outbound sessions to the remote servers, which adds extra security as no inbound sessions need to be allowed on the firewall/web proxy, and it can even be triggered by an API call; I may make a codeshare article for this.

Velos tcpdump utility: https://support.f5.com/csp/article/K12313135
rSeries tcpdump utility: https://support.f5.com/csp/article/K80685750
Qkview Velos: https://support.f5.com/csp/article/K02521182
Qkview Velos CLI location: https://support.f5.com/csp/article/K79603072
Qkview rSeries: https://support.f5.com/csp/article/K04756153
SCP: https://support.f5.com/csp/article/K34776373

For rSeries 2000/4000 tcpdump is different, as SR-IOV, not FPGA (rSeries Networking (f5.com)), is used to attach interfaces directly to the tenant VM: Article Detail (f5.com)

5. A final fast check could be to use ''kubectl get pods -o wide --all-namespaces'' (with VELOS ''oc get pods -o wide --all-namespaces'' should also work) to see that all pods are OK and running. Also ''docker ps'' or '' docker ps --format 'table {{.Names}}\t{{.RunningFor}}\t{{.Status}}' '' are useful to spot a container that keeps going down and up, and this can be correlated with the issues seen in the "show cluster" command output. A combined sweep is sketched at the end of this article.

6. The new F5OS has much better hardware diagnostics than the old devices, so there is no longer a need to run EUD tests, as all system hardware components and their health can be viewed from the GUI or CLI, and this is also shown in F5 iHealth! https://techdocs.f5.com/en-us/velos-1-5-0/velos-systems-administration-configuration/title-system-settings.html

7. For VELOS and rSeries always keep the software up to date. For example, with VELOS 1.5.1 the cluster rebuild (needed because the OpenShift SSL certificate is valid for 1 year) is much simpler; there are also the F5 rSeries and Cisco Nexus issues, and the corrupt qkview generation when the GUI, not the CLI, is used (the VELOS cluster rebuild with touch /var/omd/CLUSTER_REINSTALL can solve many issues, but it will cause some downtime):
http://cdn.f5.com/product/bugtracker/ID1135853.html
https://my.f5.com/manage/s/article/K000092905
https://support.f5.com/csp/article/K79603072
In the future the ''docker'' commands may no longer be available; in that case just use "crictl", as it replaces the docker CLI for the Kubernetes container runtime.

8. F5OS 1.8 has added several cool features that I will discuss.

system rollback initiate proceed - F5OS config option to roll back to the previous config.
system diagnostics os-utils docker restart node platform service xxx - config option to restart a docker service from F5OS without the use of root bash. It can also be scheduled through the API!
f5sh - to enter F5OS from root bash, like the tmsh command.
system diagnostics core-files list - to see the core files and which process created them, so you know where to focus.
system diagnostics net-utils xxx - to run ping, dig and traceroute from F5OS without bash access.
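As a closing example, the final checks from section 5 can be chained into one quick sweep from root bash on the controller or appliance (on newer releases swap docker for crictl, as noted above); this is only a convenience sketch of the commands already listed:

# Any pod that is not Running or Completed deserves a closer look
kubectl get pods --all-namespaces -o wide | grep -vE 'Running|Completed'

# Containers that keep going down and up usually show a very short uptime here
docker ps --format 'table {{.Names}}\t{{.RunningFor}}\t{{.Status}}' | grep -iE 'second|minute|restarting'

# Then correlate with the F5OS view by running "show cluster" from the F5OS CLI (or f5sh)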