F5 Distributed Cloud – CE High Availability Options: A Comparative Exploration
This article explores an alternative approach to achieving HA across single CE nodes, catering to use cases that require higher performance and granular control over redundancy and failover management.

Introduction

F5 Distributed Cloud offers different techniques to achieve High Availability (HA) for Customer Edge (CE) nodes in an active-active configuration, providing redundancy, on-demand scaling, and simplified management. By default, F5 Distributed Cloud clusters CE nodes: the CEs keep track of peers by sending heartbeats and facilitate traffic exchange among themselves. This method also handles the automatic transfer of traffic, virtual IPs, and services between CE peers, which is excellent for simplified deployment and for running App Stack sites hosting Kubernetes workloads.

However, if CE nodes are deployed mainly to manage L3/L7 traffic and application security, this default model might lack the flexibility needed for certain scenarios. Many of our customers tell us that achieving high availability is not so straightforward with the current clustering model. These customers often have extensive experience managing redundancy and high availability across traditional network devices. They like to manage everything themselves: from scheduling when to switch over to a redundant pair (planned failover), to choosing how many network paths (tunnels) to use between CEs and REs (Regional Edges) or other CEs. They also want to handle any issues device by device, decide the number of CE nodes in a redundancy group, and be able to direct traffic to different CEs when one is being updated. Their feedback inspired us to write this article, in which we explore a different approach to achieving high availability across CEs.

The default clustering model is explained in this document: https://docs.cloud.f5.com/docs/ves-concepts/site#cluster-of-nodes

Throughout this article, we will dive into several key areas:
- An overview of the default CE clustering model, highlighting its inherent challenges and advantages.
- An introduction to an alternative clustering strategy, Single Node Clustering, including:
  - An analysis of its challenges and benefits.
  - Identification of scenarios where this approach is most applicable.
  - A guide to the configuration steps necessary to implement this model.
  - An exploration of failover behavior within this framework.
- A comparison table showing how this new method differs from the default clustering method.

By the end of this article, readers will understand both clustering approaches, enabling informed decisions on the optimal strategy for their specific needs.

Default CE Clustering Overview

In a standard CE clustering setup, a cluster must have at least three Master nodes, with subsequent additions acting as Worker nodes. A CE cluster is configured as a "Site," centralizing operations like pool configuration and software upgrades to simplify management. In this clustering method, frequent communication is required between the control plane components of the nodes over a low-latency network. When a failover happens, the VIPs and services, including customers' compute workloads, transition to the other active nodes.

As shown in the picture above, a CE cluster is treated as a single site, regardless of the number of nodes it contains. In a Mesh Group scenario, each mesh link is associated with one single tunnel connected to the cluster.
These tunnels are distributed among the master nodes in the cluster, optimizing the total number of tunnels required for a large-scale Mesh Group. It also means that the site connects to the REs via only two tunnels, one to each RE.

Design Considerations for the Default CE Clustering Model

Best suited for:
1- App Stack sites: Running Kubernetes workloads necessitates the default clustering method for container orchestration across nodes.
2- Large-scale Site Mesh Groups (SMG).
3- Cluster-wide upgrade preference: Customers who favour managing nodes collectively will find cluster-wide upgrades more convenient, albeit without control over the upgrade sequence of individual nodes.

Challenges:
o Network bottleneck for ingress traffic: A cluster connected to two Regional Edge (RE) sites via only two tunnels means that only two nodes process external (ingress) traffic, limiting the additional nodes to processing internal traffic only.
o Three-master-node requirement: Some customers are accustomed to dual-node HA models and may find the requirement for three master nodes resource-intensive.
o Hitless upgrades: Some customers prefer controlled, phased upgrades so they can test before widespread deployment, which is challenging with cluster-wide upgrades.
o Cross-site deployments: High network latency between remote data centers can impact cluster performance due to the latency sensitivity of the etcd daemon, the backbone of cluster state management. If the network connection across the nodes is lost, all nodes will most likely stop operating due to the quorum requirements of etcd. Therefore, F5 recommends deploying separate clusters for different physical sites.
o Service fault sprawl and limited node fault tolerance: Default clusters can sometimes experience a cascading effect where a fault in one node spreads throughout the cluster. Additionally, a standard three-node cluster can generally only tolerate the failure of a single node: if the cluster was originally configured with three nodes, functionality may be lost once it is reduced to a single active node. These limitations stem from the underlying clustering design and its dependency on etcd for maintaining cluster state.

The Alternative Solution: HA Between Multiple Single Nodes

The good news is that we can achieve the key objectives of clustering (streamlined management and high availability) without depending on the control plane clustering mechanisms.

Streamlined management using "Virtual Site": F5 Distributed Cloud provides a mechanism called "Virtual Site" to perform operations on a group of sites (site = node or cluster of nodes), reducing the need to repeat the same set of operations for each site. The Virtual Site acts as an abstraction layer, grouping nodes tagged with a unique label and allowing these nodes to be addressed collectively as a single entity. Configuration of origin pools and load balancers can reference Virtual Sites instead of individual sites/nodes, facilitating cluster-like management for two or more nodes and enabling controlled day-2 operations. When a node is disassociated from a Virtual Site by removing the label, it is no longer eligible for new connections, and its listeners are simultaneously deactivated. Upgrading nodes is streamlined: simply remove the node's label to exclude it from the Virtual Site, perform the upgrade, and then reapply the label once the node is operational again, as the sketch below illustrates.
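For illustration, here is a minimal shell sketch of that label toggle using the Distributed Cloud API. The tenant name, token handling, and the exact site-object payload shape are assumptions for this example, not a verified API recipe; in practice the label can simply be edited in the Console UI.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: detach a CE from its Virtual Site before an upgrade,
# then reattach it afterwards, by toggling the selector label.
# TENANT, TOKEN, and the payload shape are assumptions for this example.
TENANT="example"           # your XC tenant name (assumption)
TOKEN="..."                # an API token created in the Console (assumption)
SITE="k1-azure-ce2"        # CE site name from this article's example
API="https://${TENANT}.console.ves.volterra.io/api/config/namespaces/system/sites/${SITE}"

# 1. Remove the Virtual Site label: the node stops taking new connections
#    and its listeners are withdrawn.
curl -s -X PUT "${API}" \
  -H "Authorization: APIToken ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"metadata": {"name": "'"${SITE}"'", "labels": {}}}'

# 2. Upgrade the node and verify its health, then reapply the label to
#    return it to service.
curl -s -X PUT "${API}" \
  -H "Authorization: APIToken ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"metadata": {"name": "'"${SITE}"'", "labels": {"my-vsite": "Azure-AustraliaEast-vSite"}}}'
```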
This procedure offers a controlled failover process, ensuring minimal disruption and enhanced manageability by minimizing the blast radius and limiting the scope of downtime. Because traffic is rerouted to the other CEs, services are not impacted if something goes wrong during the upgrade of a CE node.

HA/redundancy across multiple nodes: Each single node in a Virtual Site connects to dual REs through IPSec or SSL/TLS tunnels, ensuring even load distribution and true active-active redundancy.

External (ingress) traffic: In the Virtual Site model, the Regional Edges (REs) distribute external traffic evenly across all nodes. This contrasts with the default clustering approach, where only two CE nodes are actively connected to the REs. The main Virtual Site advantage lies in its true active/active configuration for CEs, increasing the total ingress traffic capacity. If a node becomes unavailable, the REs automatically reroute new connections to another operational node within the Virtual Site, and the services (connections to origin pools) remain uninterrupted.

Internal (east-west) traffic: For managing internal traffic within a single CE node in a Virtual Site (for example, when LB objects are configured to be advertised within the local site), all network techniques applicable to the default clustering model can be employed in this model as well, except for the Layer 2 attachment (VRRP) method.

Preferred load distribution method for internal traffic across CEs: Our preferred methods for load balancing across CE nodes are either DNS-based load balancing or Equal-Cost Multi-Path (ECMP) routing utilizing BGP for redundancy.

DNS load balancer behavior: If a node is detached from a Virtual Site, its associated listeners and Virtual IPs (VIPs) are automatically withdrawn. Consequently, the DNS load balancer's health checks mark those VIPs as down and prevent them from receiving internal network traffic.

Current limitation for custom VIP and BGP: When using BGP, please note a current limitation that prevents configuring a custom VIP address on the Virtual Site. As a workaround, custom VIPs should be advertised on individual sites instead. The F5 product team is actively working to address this gap.

For a detailed exploration of traffic routing options to CEs, please refer to the following article: https://community.f5.com/kb/technicalarticles/f5-distributed-cloud---customer-edge-site---deployment--routing-options/319435

Design Considerations for the Single Node HA Model

Best suited for:
1- Customers with high throughput requirements: This clustering model ensures that all Customer Edge (CE) nodes are engaged in managing ingress traffic from the Regional Edges (REs), which allows for scalable expansion by adding CEs as required. In contrast, the default clustering model limits ingress traffic processing to only two CE nodes per cluster (more precisely, to a single node from each RE), regardless of the number of worker nodes in the cluster. Consequently, this model is more advantageous for customers who have high throughput demands.
2- Customers who prefer controlled failover and software upgrades: This clustering model enables a sequential upgrade process, in which nodes are updated individually to ensure each node upgrades successfully before moving on to the next. The process involves detaching the node from the cluster by removing its site label, which redirects traffic to the remaining nodes during the upgrade.
Once upgraded, the label is reapplied, and the process is repeated for each node in turn. This is the upgrade model customers have known for 20+ years, with the small added wrinkle of the label.
3- Customers who prefer to distribute the load across remote sites: Nodes are deployed independently and do not require inter-node heartbeat communication, unlike the default clustering method. This independence allows for their deployment across various data centers and availability zones while still being managed as a single entity. They are compatible with both Layer 2 (L2) spanned and Layer 3 (L3) spanned data centers, where nodes in different L3 networks utilize distinct gateways. As long as the nodes can access the origin pools, they can be integrated into the same Virtual Site. This flexibility caters to customers' traditional preferences, such as deploying two CE nodes per location, which is fully supported by this clustering model.

Challenges:
o Lack of VRRP support: The primary limitation of this clustering method is the absence of VRRP support for internal VIPs. However, there are alternative methods to distribute internal traffic across CE nodes. These include DNS-based routing, BGP with Equal-Cost Multi-Path (ECMP) routing, or placing the CEs behind another Layer 4 (L4) load balancer capable of distributing traffic without source address alteration, such as F5 BIG-IPs or the standard load balancers provided by Azure or AWS.
o Limitation on custom VIP IP support: Currently, the F5 Distributed Cloud Console has a restriction preventing the configuration of custom virtual IPs for load balancer advertisements on Virtual Sites. We anticipate this limitation will be addressed in future updates to the F5 Distributed Cloud platform. As a temporary solution, you can advertise the LB on multiple individual sites within the Virtual Site; this approach enables the configuration of custom VIPs on those sites.
o Extra steps for upgrading nodes: Unlike the default clustering model, where upgrades can be performed collectively on a group of nodes, this clustering model requires upgrading nodes individually. This may introduce more steps, especially in larger clusters, but it remains significantly simpler than traditional network device upgrades.
o Large-scale Mesh Groups: In F5 Distributed Cloud, the "Mesh Group" feature allows direct connections between sites (whether individual CE sites or clusters of CEs) and other selected sites through IPSec tunnels. For CE clusters, tunnels are established on a per-cluster basis. For single-node sites, however, each node creates its own tunnels to connect with remote CEs. This setup can increase the number of tunnels needed to establish the mesh. For example, in a network of 10 sites configured as dual-CE Virtual Sites, each CE must establish 18 IPSec tunnels to connect with the other sites, or 19 for a full mesh configuration. Comparatively, a 10-site network using the default clustering method (with a minimum of 3 CEs per site) would only need up to 9 tunnels from each CE for full connectivity. Opting for Virtual Sites with dual CEs, a common choice, effectively doubles the number of required tunnels from each CE compared to the default clustering setup. Despite this increase in tunnels, however, a mesh configuration with single-node clusters can offer advantages in terms of performance and load distribution; the sketch below restates the tunnel arithmetic.
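To make the tunnel arithmetic easy to check for other topologies, here is a small shell sketch of the per-CE tunnel counts in a full Site Mesh Group; the formulas simply restate the 10-site examples from the text above.

```bash
#!/usr/bin/env bash
# Per-CE tunnel counts in a full Site Mesh Group, restating the examples above.

sites=10          # number of sites in the mesh
ces_per_vsite=2   # single-node CEs grouped per Virtual Site

# Single-node model: every CE tunnels to every other CE individually.
# Remote CEs: (sites - 1) * ces_per_vsite; one more to its local peer
# for a full mesh.
remote_tunnels=$(( (sites - 1) * ces_per_vsite ))
full_mesh_tunnels=$(( remote_tunnels + (ces_per_vsite - 1) ))
echo "Single-node model: ${remote_tunnels} tunnels per CE to remote sites (${full_mesh_tunnels} for full mesh)"

# Default clustering model: tunnels are per cluster, so each CE needs
# at most one tunnel per remote site.
cluster_tunnels=$(( sites - 1 ))
echo "Default clustering: up to ${cluster_tunnels} tunnels per CE"
```

With the article's numbers (10 sites, dual-CE Virtual Sites), this prints 18, 19, and 9, matching the comparison above.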
Note: Use DC Cluster Groups as an alternative to Secure Mesh Groups for CE connectivity. For customers with existing private connectivity between their CE nodes, running a Site Mesh Group (SMG) with numerous IPsec tunnels can be less optimal. As a more scalable alternative for these customers, we recommend using DC Cluster Group (DCG). This method utilizes IP-in-IP tunnels over the existing private network, eliminating the need for individual encrypted IPsec tunnels between each node and streamlining communication between CE nodes via IP-in-IP encapsulation.

Configuration Steps

The configuration for creating single-node clusters involves the following steps:
1. Create a label.
2. Create a Virtual Site.
3. Apply the label to the CE nodes (sites).
4. Review and validate the configuration.

The detailed configuration guide for the above steps can be found here: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site

Example configuration: In this example, you can create a label called "my-vsite" to group CE nodes that belong to the same Virtual Site. Within this label, you can then define different values to represent different environments or clusters, such as a specific Azure region or an on-premise data center. Then a Virtual Site of "CE" type can be created to represent the CE cluster in "Azure-AustraliaEast-vSite" and tied to any CE that is tagged with the label "my-vsite=Azure-AustraliaEast-vSite". Now, any CE node that should join the cluster (Virtual Site) should get this label. (A hedged API sketch of this configuration appears at the end of this section.)

Verification

To confirm the Virtual Site configuration is functioning as intended, we joined two CEs (k1-azure-ce2 and k1-azure-ce03) to the Virtual Site and evaluated the routing and load balancing behavior.

Test 1: Public load balancer (Virtual Site referenced in the pool). The diagram shows a public load balancer advertised on the RE, referencing a pool that uses the newly created Virtual Site to access the private application. As shown below, the pool member was configured to be accessed through the Virtual Site. Analysis of the request logs in the Performance dashboard confirmed that all requests to the public website were evenly distributed across both CEs.

Test 2: Internal load balancer (LB advertised on the Virtual Site). We deployed an internal load balancer and advertised it on the newly created Virtual Site, utilizing the pool that also references the same Virtual Site (k1-azure-ce2 and k1-azure-ce03). As shown below, the load balancer was configured to be advertised on the Virtual Site. Note: Here we couldn't use a "shared" custom VIP across the Virtual Site due to a current platform constraint. If a custom VIP is required, we should use "site" as opposed to "Virtual Site" and advertise the load balancer on all sites, as in the picture below. Request logs revealed that when traffic reached either CE node within the Virtual Site, the request was processed and forwarded locally to the pool member. In the example below:
- src_site: the CE (k1-azure-ce2) that processed the request.
- src_ip: the client's source IP address (192.168.1.68).
- dst_site: the CE (k1-azure-ce2) from which the pool member is accessed.
- dst_ip: the IP address of the pool member (192.168.1.6).

Resilience testing: To assess the Virtual Site's resilience, we intentionally blocked network access from the k1-azure-ce2 CE to the pool member (192.168.1.6). The CE automatically rerouted traffic to the pool member via the other CE (k1-azure-ce03) in the Virtual Site.
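To round out the configuration steps above, here is a hedged sketch of creating the same Virtual Site via the XC API rather than the Console. The tenant name, token, and exact payload shape are assumptions for this example; the label key and value match the article's example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: create the Virtual Site from the example above via
# the XC API. TENANT, TOKEN, and the payload shape are assumptions.
TENANT="example"; TOKEN="..."   # assumptions for this sketch

curl -s -X POST \
  "https://${TENANT}.console.ves.volterra.io/api/config/namespaces/shared/virtual_sites" \
  -H "Authorization: APIToken ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "metadata": { "name": "azure-australiaeast-vsite", "namespace": "shared" },
        "spec": {
          "site_type": "CUSTOMER_EDGE",
          "site_selector": {
            "expressions": ["my-vsite in (Azure-AustraliaEast-vSite)"]
          }
        }
      }'
```

Any CE site carrying the label my-vsite=Azure-AustraliaEast-vSite is then selected into the Virtual Site, which is what the label-toggle upgrade workflow earlier in this article relies on.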
Note: By default, CEs communicate with each other via the F5 Global Network. This can be customized to use direct connectivity through tunnels if the CEs are members of the same DC Cluster Group (IP-in-IP tunneling) or Secure Mesh Group (IPSec tunneling). The first picture below shows the traffic flow via the F5 Global Network; the second shows the traffic flow via the IP-in-IP tunnel when a DC Cluster Group (DCG) is configured across the CE nodes.

Failover Behaviour

When a CE node is tied to a Virtual Site, all internal load balancers (VIPs) advertised on that Virtual Site are deployed on the CE. Additionally, the Regional Edge (RE) begins to use this node as one of the potential next hops for connections to the origin pool. Should the CE become unavailable, or should it lack the necessary network access to the origin server, the RE almost seamlessly reroutes connections through the other operational CEs in the Virtual Site.

Uncontrolled failover: During instances of uncontrolled failover, such as when a node is unexpectedly shut down from the hypervisor, we have observed a handful of new connections experiencing timeouts. However, these issues were resolved by implementing health checks within the origin pool, which prevented any subsequent connection drops.

Note: Irrespective of the clustering model in use, it is always recommended to configure health checks for the origin pool. This practice enhances failover responsiveness and mitigates the additional latency incurred during traffic rerouting.
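As a concrete illustration of that recommendation, here is a hedged sketch of creating an HTTP health check for an origin pool via the XC API. The tenant, token, namespace, and payload field names are assumptions for this example; the same object is normally created in a few clicks in the Console.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: create an HTTP health check to attach to an origin
# pool. TENANT, TOKEN, NS, and the payload shape are assumptions.
TENANT="example"; TOKEN="..."; NS="my-namespace"   # assumptions
BASE="https://${TENANT}.console.ves.volterra.io/api/config/namespaces/${NS}"

# A simple HTTP health check probing "/" every 15 seconds; an origin pool
# referencing this object will eject unhealthy members before the RE
# sends them new connections.
curl -s -X POST "${BASE}/healthchecks" \
  -H "Authorization: APIToken ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "metadata": { "name": "origin-http-check" },
        "spec": {
          "http_health_check": { "path": "/" },
          "interval": 15, "timeout": 3,
          "healthy_threshold": 2, "unhealthy_threshold": 1
        }
      }'
```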
Controlled failover: The moment a CE node is disassociated from the Virtual Site (by the removal of its label), the RE stops using that node to connect to origin pools. At the same time, all load balancer listeners associated with that Virtual Site are withdrawn from the node. This effectively halts traffic processing for those applications, preventing the node from receiving related traffic. During controlled failover scenarios, we have observed seamless service continuity for externally advertised services (to REs).

On-Demand Scaling

F5 Distributed Cloud provides a flexible solution that enables customers to scale the number of active CE nodes according to demand. This allows you to easily add more powerful CE nodes during peak periods (such as promotional events) and then remove them when demand subsides. With the Virtual Sites method, you can even mix and match node sizes within your cluster (Virtual Site), providing granular control over resources. It is advisable to monitor CE node performance and implement node-related alerts. These alerts notify you when nodes are operating at high capacity, allowing for the timely addition of extra nodes as needed. Moreover, you can monitor node health in the dashboard: CPU, memory, and disk utilization of the nodes are good indicators of whether more nodes are needed. Furthermore, the use of Virtual Sites makes managing this process even easier, thanks to labels.

Node-Based Alerts

Node-based alerts are essential for maintaining efficient CE operations.

Accessing the alerts in the Console: To view alerts, go to Multi-Cloud Network Connect > Notifications > Alerts. Here, you can see both "Active Alerts" and "All Alerts." Alerts related to node health fall under the "infrastructure" alert group. The following screenshot shows alerts indicating high loads on the nodes.

Configuring alert policies: Alert policies determine the notification process for raised alerts. To set up an alert policy, navigate to Multi-Cloud Network Connect > Alerts Management > Alert Policies. An alert policy consists of two main elements: the alert receiver configuration and the policy rules.

Configuring the alert receiver: The configuration allows for integration with platforms like Slack and PagerDuty, among others, facilitating notifications through commonly used channels.

Configuring alert rules: For alert selection, we recommend configuring notifications for alerts with a severity of "Major" or "Critical" at a minimum. Alternatively, the "infrastructure" group, which includes the node-based alerts, can be selected.

Comparison Table

| Criteria | Default Cluster | Single Node HA |
| --- | --- | --- |
| Minimum number of nodes in HA | 3 | 2 |
| Upgrade operations | Per cluster | Per node |
| Network redundancy and client-side routing for east-west traffic | VRRP, BGP, DNS, L4/7 LB | DNS, L4/7 LB, BGP* |
| Tunnels to REs | 2 tunnels per cluster | 2 tunnels per node |
| Tunnels to other CEs (SMG or DCG) | 1 tunnel from each cluster | 1 tunnel from each node |
| External traffic processing | Limited to 2 nodes | All nodes active |
| Internal traffic processing | All nodes can be active | All nodes can be active |
| Scale management in public cloud sites | Straightforward, by configuring ingress interfaces in Azure/AWS/GCP sites | Straightforward, by adding or removing labels |
| Scale management in secure mesh sites | Requires reconfiguring the cluster (secure mesh site); may cause interruption | Straightforward, by adding or removing labels |
| Custom VIP IP | Available | Not available (planned for future releases); workaround available |
| Node sizes | All nodes must be the same size; upgrading node size in a cluster is a disruptive operation | Nodes of any size can join the Virtual Site |

* When using BGP, please note a current limitation that prevents configuring a custom VIP address on the Virtual Site.

Conclusion

F5 Distributed Cloud offers a flexible approach to High Availability (HA) across CE nodes, allowing customers to select the redundancy model that best fits their specific use cases and requirements. While we continue to advocate for the default clustering approach due to its operational simplicity and the benefits of a shared VRRP VIP and unified network configuration, especially for routine tasks like upgrades, the Virtual Site and single-node HA model presents some great use cases. It not only addresses the limitations and challenges of the default clustering model, but also introduces a solution that is both scalable and adaptable. While Virtual Sites offer their own benefits, we recognize they also present trade-offs. The overall benefits, particularly for scenarios demanding high ingress (RE to CE) throughput and controlled failover capabilities, cater to specific customer demands. The F5 product and development team remains committed to addressing the limitations of both default clustering and Virtual Sites discussed throughout this article. Their focus is on continuous improvement and finding the solutions that best serve our customers' needs.
References and Additional Links:
- Default clustering model: https://docs.cloud.f5.com/docs/ves-concepts/site#cluster-of-nodes
- Configuration guide for Virtual Sites: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site
- Routing options for CEs: https://community.f5.com/kb/technicalarticles/f5-distributed-cloud---customer-edge-site---deployment--routing-options/319435
- Configuration guide for DC Cluster Group: https://docs.cloud.f5.com/docs/how-to/advanced-networking/configure-dc-cluster-group

Secure AI RAG using F5 Distributed Cloud in Red Hat OpenShift AI and NetApp ONTAP Environment
Introduction

Retrieval Augmented Generation (RAG) is a powerful technique that allows Large Language Models (LLMs) to access information beyond their training data. The "R" in RAG refers to the data retrieval process, where the system retrieves relevant information from an external knowledge base based on the input query. The "A" in RAG represents augmentation, or context enrichment: the system combines the retrieved information with the input query to create a more comprehensive prompt for the LLM. Lastly, the "G" in RAG stands for response generation, where the LLM produces a more contextually accurate output based on the augmented prompt.

RAG is becoming increasingly popular in enterprise AI applications due to its ability to provide more accurate and contextually relevant responses to a wide range of queries. However, deploying RAG can introduce complexity because its components live in different environments. For instance, the datastore or corpus, which is a collection of data, is typically kept on-premises for enhanced control over data access and management, driven by data security, governance, and regulatory compliance requirements within the enterprise. Meanwhile, inference services are often deployed in the cloud for their scalability and cost-effectiveness.

In this article, we will discuss how F5 Distributed Cloud can simplify this complexity and securely connect all RAG components seamlessly for enterprise RAG-enabled AI application deployments. Specifically, we will focus on Network Connect, App Connect, and Web App & API Protection. We will demonstrate how these F5 Distributed Cloud features can be leveraged to secure RAG in collaboration with Red Hat OpenShift AI and NetApp ONTAP.

Example Topology

F5 Distributed Cloud Network Connect

F5 Distributed Cloud Network Connect enables seamless and secure network connectivity across hybrid and multicloud environments. By deploying an F5 Distributed Cloud Customer Edge (CE) at each site, we can easily establish encrypted site-to-site connectivity across on-premises, multi-cloud, and edge environments. Jensen Huang, CEO of NVIDIA, has said that "Nearly half of the files in the world are stored on-prem on NetApp."

In our example, the enterprise data stores are deployed on NetApp ONTAP in a data center in Seattle managed by organization B (Segment-B: s-gorman-production-segment), while the RAG services, including the embedding Large Language Model (LLM) and vector database, are deployed on-premises on a Red Hat OpenShift cluster in a data center in California managed by organization A (Segment-A: jy-ocp). By leveraging F5 Distributed Cloud Network Connect, we can quickly and easily establish a secure connection for seamless and efficient data transfer from the enterprise data stores to the RAG services between these two segments only.

F5 Distributed Cloud CE can be deployed as a virtual machine (VM) or as a pod on a Red Hat OpenShift cluster. In California, we deploy the CE as a VM using Red Hat OpenShift Virtualization; click here to find out more on Deploying F5 Distributed Cloud Customer Edge in Red Hat OpenShift Virtualization. With Segment-A: jy-ocp on the CE in California and Segment-B: s-gorman-production-segment on the CE in Seattle, we simply and securely connect Segment-A and Segment-B only, using a Segment Connector.

NetApp ONTAP in Seattle has a LUN named "tbd-RAG", which serves as the enterprise data store in our demo setup and contains a collection of data.
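To ground the R-A-G steps from the introduction, here is a minimal shell sketch of one retrieval-augmented query. The vector-DB endpoint and its response shape are purely illustrative assumptions; only the Ollama /api/generate call (one of the inference options shown later in this article) reflects a real interface.

```bash
#!/usr/bin/env bash
# Minimal RAG flow sketch. The vector-DB endpoint and payload are
# illustrative assumptions; the Ollama API call is a real interface.
QUERY="What is MTV?"

# 1. Retrieve: fetch the most relevant chunks for the query from a
#    vector DB (hypothetical endpoint).
CONTEXT=$(jq -n --arg text "$QUERY" '{text: $text, top_k: 3}' \
  | curl -s -H "Content-Type: application/json" -d @- \
      http://vector-db.internal:8000/query \
  | jq -r '.chunks[].text')

# 2. Augment: combine the retrieved context and the user query into
#    one enriched prompt.
PROMPT="Use the following context to answer.
Context:
${CONTEXT}
Question: ${QUERY}"

# 3. Generate: send the augmented prompt to the LLM (Ollama /api/generate).
jq -n --arg model "llama3" --arg prompt "$PROMPT" \
      '{model: $model, prompt: $prompt, stream: false}' \
  | curl -s -H "Content-Type: application/json" -d @- \
      http://ollama.internal:11434/api/generate \
  | jq -r '.response'
```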
After these two data centers are connected using F5 XC Network Connect, a secure, encrypted end-to-end connection is established between them. In our example, "test-ai-tbd" is in the data center in California, where it hosts the RAG services, including the embedding Large Language Model (LLM) and vector database, and it can now successfully connect to the enterprise data stores on NetApp ONTAP in the data center in Seattle.

F5 Distributed Cloud App Connect

F5 Distributed Cloud App Connect securely connects and delivers distributed applications and services across hybrid and multicloud environments. By utilizing F5 Distributed Cloud App Connect, we can direct the inference traffic through F5 Distributed Cloud's security layers to safeguard our inference endpoints.

Red Hat OpenShift on Amazon Web Services (ROSA) is a fully managed service that allows users to develop, run, and scale applications in a native AWS environment. We can host our inference service on ROSA to leverage the scalability, cost-effectiveness, and numerous benefits of AWS's managed infrastructure services. For instance, we can host our inference service on ROSA by deploying Ollama with multiple AI/ML models. Or, we can enable Model Serving on Red Hat OpenShift AI (RHOAI). Red Hat OpenShift AI (RHOAI) is a flexible and scalable AI/ML platform that builds on the capabilities of Red Hat OpenShift and facilitates collaboration among data scientists, engineers, and app developers. The platform allows them to serve, build, train, deploy, test, and monitor AI/ML models and applications either on-premises or in the cloud, fostering efficient innovation within organizations. In our example, we use Red Hat OpenShift AI (RHOAI) Model Serving on ROSA for our inference service.

Once the inference service is deployed on ROSA, we can utilize F5 Distributed Cloud to secure our inference endpoint by steering the inference traffic through F5 Distributed Cloud's security layers, which offer an extensive suite of features designed specifically for the security of modern AI/ML inference endpoints. This setup allows us to scrutinize requests, implement policies for detected threats, and protect sensitive datasets before they reach the inference service hosted within ROSA. In our example, we set up an F5 Distributed Cloud HTTP Load Balancer (rhoai-llm-serving.f5-demo.com) and advertise it to the CE in the data center in California only. We now reach our Red Hat OpenShift AI (RHOAI) inference endpoint through F5 Distributed Cloud.

F5 Distributed Cloud Web App & API Protection

F5 Distributed Cloud Web App & API Protection provides comprehensive sets of security features, and uniform observability and policy enforcement, to protect apps and APIs across hybrid and multicloud environments. We utilize F5 Distributed Cloud App Connect to steer the inference traffic through F5 Distributed Cloud to secure our inference endpoint. In our example, we protect our Red Hat OpenShift AI (RHOAI) inference endpoint by rate-limiting access, so that no single client can exhaust the inference service. A "Too Many Requests" response is received when a single client repeatedly requests access to the inference service at a rate higher than the configured threshold; the sketch below shows a simple way to reproduce this. This is just one of the many security features available to protect our inference service. Click here to find out more on Securing Model Serving in Red Hat OpenShift AI (on ROSA) with F5 Distributed Cloud API Security.
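A loop like the following will start returning HTTP 429 once the configured per-client threshold is exceeded. This is a hedged sketch: the inference path and request body are assumptions, while the hostname comes from the load balancer configured in this example.

```bash
#!/usr/bin/env bash
# Fire 20 rapid requests at the inference endpoint behind the XC HTTP
# Load Balancer and print only the status codes. The path and body are
# assumptions; the hostname matches this article's example.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST "https://rhoai-llm-serving.f5-demo.com/v2/models/example/infer" \
    -H "Content-Type: application/json" \
    -d '{"inputs": []}'
done
# Expected: 200s at first, then 429 (Too Many Requests) once the
# rate-limit threshold is crossed.
```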
Demonstration

In a real-world scenario, the front-end application could be hosted in the cloud, at the edge, or served through F5 Distributed Cloud, offering flexible alternatives for efficient application delivery based on user preferences and specific needs. To illustrate how all the discussed components work seamlessly together, we simplify our example by deploying Open WebUI as the front-end application on the Red Hat OpenShift cluster in the data center in California, which also hosts the RAG services. While a DPU or GPU could be used for improved performance, our setup utilizes a CPU for inferencing tasks.

Using F5 Distributed Cloud Network Connect, we connect our app to our enterprise data stores deployed on NetApp ONTAP in the data center in Seattle, where we have a copy of "Chapter 1. About the Migration Toolkit for Virtualization" from Red Hat. These documents are processed and saved to the vector DB. Our embedding Large Language Model (LLM) is Sentence-Transformers/all-MiniLM-L6-v2, and our RAG template is shown above.

Instead of connecting to the inference endpoint on Red Hat OpenShift AI (RHOAI) on ROSA directly, we connect to the F5 Distributed Cloud HTTP Load Balancer (rhoai-llm-serving.f5-demo.com) from F5 Distributed Cloud App Connect. Previously, we asked, "What is MTV?" and never received a response related to the Red Hat Migration Toolkit for Virtualization. Now, let's try asking the same question again with RAG services enabled: we finally receive the response we had anticipated.

Next, we use F5 Distributed Cloud Web App & API Protection to safeguard our Red Hat OpenShift AI (RHOAI) inference endpoint on ROSA by rate-limiting access, thus preventing a single client from exhausting the inference service. As expected, we received "Too Many Requests" in the response on our app upon requesting the inference service at a rate greater than the set threshold. With F5 Distributed Cloud's real-time observability and security analytics in the F5 Distributed Cloud Console, we can proactively monitor for potential threats. For example, if necessary, we can block a client from accessing the inference service by adding it to the Blocked Clients List. As expected, that specific client is then unable to access the inference service.

Summary

Deploying and securing RAG for enterprise RAG-enabled AI applications in a multi-vendor, hybrid, and multi-cloud environment can present complex challenges. In collaboration with Red Hat OpenShift AI (RHOAI) and NetApp ONTAP, F5 Distributed Cloud provides an effortless solution that secures RAG components seamlessly for enterprise RAG-enabled AI applications.

Enhancing BIG-IP with F5 Distributed Cloud: Automated Service Discovery for Scalable Application Delivery and Security
The F5 Distributed Cloud Services (XC) feature called BIG-IP Service Discovery makes it easier to deliver and protect distributed applications on BIG-IP virtual servers by automatically discovering them in an existing BIG-IP TMOS setup. Augmenting BIG-IP with F5 Distributed Cloud streamlines operations and maximizes efficiency: it reduces manual network reconfiguration and simplifies global traffic management without the burden of managing hardware across regions. It also ensures application uptime with real-time health monitoring and automated service registration for seamless handling of ephemeral applications, and it accelerates deployment in new environments with high-speed discovery and one-click policy deployment. Simplify, scale, and secure your applications effortlessly with F5 Distributed Cloud.

Value delivered to BIG-IP deployments

Service discovery unlocks the full potential of your BIG-IP deployments by extending them with F5 Distributed Cloud's SaaS services. Customers gain centralized observability across multiple BIG-IP instances via the F5 Distributed Cloud Console, ensuring seamless visibility and control. It strengthens application security with advanced services like API Discovery and XC WAF, while shifting the security perimeter to the F5 Global Network for superior defense against large-scale attacks. It also enables secure partner access with ease and simplifies application migration to public clouds to optimize BIG-IP resources.

Technical details

The feature requires the deployment of an F5 Distributed Cloud CE with reachability to the BIG-IP management and data interfaces. In the case of F5 rSeries, the CE and BIG-IP can be deployed on the same hardware; see the reference architecture for details. For other BIG-IP hardware and virtual deployments, the CE can be deployed on any supported platform such as VMware, KVM, or bare-metal servers. The diagram below provides an overview of the solution in action.

With the XC CE site, you can securely access internal resources without exposing them to the internet, providing enhanced control and security. Once the XC site is set up, configuring BIG-IP Service Discovery is straightforward.

Before starting to configure Service Discovery, decide where the configuration will live. If the BIG-IP is a dedicated resource managed by a single team, configure the Service Discovery object within the specific App Connect namespace to ensure all resources are discovered in one namespace; this keeps the deployment isolated for use by a single team. Alternatively, for shared BIG-IP resources managed by different teams, configure the Service Discovery object in the Shared Configurations workspace.

To begin, create a new BIG-IP Service Discovery object from the XC Cloud portal. Then enter the BIG-IP management IP and username, and click Configure to add the admin password. This establishes communication between F5 XC Cloud and the BIG-IP deployment.

In the Virtual Server Filter, you can fine-tune the discovery process by filtering virtual servers based on name, description, or port range. For instance, in this example:
- Name: Apply a regex filter using ^*app* to identify virtual servers containing the word "app" in their names.
- Port Range: Set the range to 8080-8090 to include only virtual servers operating within that specific port range.
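For a sense of what the discovery sees on the BIG-IP side, here is a hedged sketch using BIG-IP's iControl REST API. The management address and credentials are placeholders, and the jq filtering only approximates the Console filter described above.

```bash
#!/usr/bin/env bash
# List BIG-IP virtual servers via iControl REST and approximate the filter
# above: names containing "app" with service ports in 8080-8090.
# Management IP and credentials are placeholders for this sketch.
BIGIP="192.0.2.10"; USER="admin"; PASS="..."

curl -sku "${USER}:${PASS}" \
  "https://${BIGIP}/mgmt/tm/ltm/virtual?\$select=name,destination" \
  | jq -r '.items[]
      | select(.name | test("app"))
      | . as $v
      | ($v.destination | capture(":(?<port>[0-9]+)$").port | tonumber) as $p
      | select($p >= 8080 and $p <= 8090)
      | "\($v.name) -> \($v.destination)"'
```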
This flexible filtering mechanism allows you to target specific services for discovery, streamlining the load balancer configuration process. After applying the configuration, the discovered virtual servers will appear in the interface. Keep in mind that it may take a few minutes for the system to load and display the virtual servers. Once they are listed, you can click on any of the discovered services to view detailed information.

After the virtual servers are discovered, an HTTP Load Balancer can be created in just a few clicks. Simply provide a name, domain name, and SSL details, and the HTTP Load Balancer will be created and configured automatically. While the initial setup is quick and straightforward, you can customize it further later by adding advanced features such as enhanced security, high availability (HA), or a DMZ configuration to meet specific operational requirements. With HA, you will need to deploy an additional rSeries device with the same configuration to ensure redundancy and continuous availability. For a DMZ setup, a second data center is required to segregate external and internal traffic for added security. Once these components are in place, you can update the origin pool of the HTTP Load Balancer to include the new resources, ensuring a robust and scalable load balancing solution. The diagram below illustrates this configuration, showing how HA and DMZ work together with the HTTP Load Balancer to enhance reliability and security.

Conclusion

In this article, we walked through configuring BIG-IP Service Discovery to automatically discover virtual servers and create an HTTP Load Balancer to expose applications to the internet. Beyond the basic setup, we also implemented high availability by adding a second rSeries device and introduced a DMZ deployment by including a second data center, ensuring a more resilient and secure architecture. More details on this feature and its configuration options are available in this technical documentation, or you can view a demonstration of the feature and related use cases in this Teachable course. With F5's rSeries devices, you get the performance and scalability required to handle modern multi-cloud environments, while F5 Distributed Cloud simplifies management by providing centralized visibility and control. Elevate security, streamline operations, and future-proof your BIG-IP applications with F5 Distributed Cloud.

F5 Distributed Cloud and Transfer Encoding: Chunking
My team recently came across an unusual request from an F5 Distributed Cloud customer: how do we support HTTP/API clients that can only send transfer-encoding chunked requests? What even is chunking?

What is Transfer Encoding?

The key word is "encoding": HTTP uses a header to communicate which scheme encodes the data in a message body. Encodings can serve functional purposes as well as communication optimization. Transfer-Encoding is most commonly leveraged for chunking, which takes a large piece of data and breaks it up into smaller pieces that are sent between two nodes along a path, transparently to the application sending or receiving messages. These nodes are not necessarily the source and destination of an HTTP conversation, so proxies in between may transparently reassemble the chunks for different parts of the path. A chunked message does not use a Content-Length header. This contrasts with Content-Encoding, which is more commonly used for compression of message bodies (although compression can be done with transfer encoding too) and requires the length to be defined. Proxies along the path are expected not to change these values, but this is not always the case.

In our customer scenario, the request was exactly for the proxy (in this case Distributed Cloud) to support chunked requests from the client to an HTTP/2 server (HTTP/2 does away with chunking completely). With Distributed Cloud, we fulfill this with three simple config elements:

1. The HTTP Load Balancer object is configured as an HTTP/1.1 virtual server.
2. The origin is configured to use HTTP/2 (which defines Distributed Cloud's behavior as an HTTP client).
3. Back in the HTTP Load Balancer dialog, in the Other Settings section, a Buffer Policy is configured under Miscellaneous Options.

A value configured in that dialog (it is the only property aside from an enable checkbox) limits the request size to the specified value in bytes, but it has the added benefit of allowing the Distributed Cloud proxy to buffer the chunked requests, convert them into content-length-framed requests, and send them to the server over an HTTP/2 connection.

To test this connection, a simple cURL command with the header "Transfer-Encoding: chunked" and the -v flag can validate your config, e.g.:

curl -v --location 'https://[HOST]:[PORT]/[PATH]' --header 'Transfer-Encoding: chunked' --data ''

In the ensuing response, the -v (verbose) flag will include the following:

* using HTTP/1.x
> POST [PATH] HTTP/1.1
> Host: [URL]
> User-Agent: curl/8.7.1
…
> Transfer-Encoding: chunked
…

Note the Transfer-Encoding: chunked line, which shows that chunking was used on the client-side connection. You can validate the server-side connection in the request logs in the Distributed Cloud dashboard by looking at the headers recorded in the event JSON:

"rsp_headers": "{\":status\":\"200\",\"connection\":\"close\",\"content-length\":\"26930\", [TRUNCATED]

This is a transfer-encoded chunked client-side request being converted to a content-length-framed request on the server side. Special shoutout to fellow F5er Gowry Bhaagavathula for collaborating with me on getting this figured out!
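To see the chunk framing described above on the wire, you can hand-craft a chunked request with printf and nc. This is a sketch against a hypothetical plain-HTTP endpoint: each chunk is preceded by its size in hexadecimal, and a zero-length chunk terminates the body.

```bash
#!/usr/bin/env bash
# Hand-crafted chunked POST against a hypothetical plain-HTTP endpoint,
# showing the framing: hex size, CRLF, chunk data, CRLF, then a 0-size chunk.
{
  printf 'POST /upload HTTP/1.1\r\n'
  printf 'Host: example.internal\r\n'
  printf 'Transfer-Encoding: chunked\r\n'
  printf 'Connection: close\r\n\r\n'
  printf '5\r\nhello\r\n'        # first chunk: 5 bytes
  printf '6\r\n world\r\n'       # second chunk: 6 bytes (leading space)
  printf '0\r\n\r\n'             # zero-length chunk terminates the body
} | nc example.internal 80
```

Note there is no Content-Length header anywhere in the request; the receiver knows the body is complete only when it reads the zero-length chunk.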
How I did it - "Delivering Kasm Workspaces three ways"

Securing modern, containerized platforms like Kasm Workspaces requires a robust and multi-faceted approach to ensure performance, reliability, and data protection. In this edition of "How I did it," we'll see how F5 technologies can enhance the security and scalability of Kasm Workspaces deployments.

F5 Distributed Cloud - Mitigation for Cross Tenant Origin Exposure (CTOE)
F5 Distributed Cloud (XC) offers a suite of powerful features designed to simplify the lives of administrators and engineers. A key aspect of this ease of use comes from shared objects, such as the Regional Edge proxies, which utilize well-known public IP addresses. However, while this shared infrastructure enhances scalability and efficiency, it can also present risks if leveraged by attackers: in this case, cross tenant origin exposure (CTOE). For instance:

- Customer(x) has tenant(x) in XC with a load balancer pointing to their public-IP origin servers. These may sit behind a perimeter firewall NAT (as diagrammed below) or be actual public IPs on the servers.
- The customer's perimeter firewall is configured to deny all inbound traffic to the public IP for site1.example.com.
- The perimeter firewall is configured to allow inbound traffic to the public IP for site1.example.com from the XC IPs, which are a well-known, public, shared IP range (see the XC Proxy IPs reference doc).

This setup is generally considered a minimum best practice because it restricts traffic to only those sources originating from XC. However, depending on the organization's risk appetite, this level of security may be insufficient.

The Risk

Another account/tenant(y) within Distributed Cloud could create a load balancer and point it at the public IP or DNS name of the origin pools for tenant(x). The attacker must know or learn the actual origin server's IP or network segment to perform this attack, but this discovery is fairly trivial and there are many approaches. In addition, what if the origin pool in tenant(x) points to a DNS name that resolves to public IPs? This is common with SaaS API gateways, such as those of AWS and Azure to name a few, and these gateways all use the same DNS name for the gateway in their respective clouds. Same DNS = same IPs = easy-to-learn or guessable origin IPs.

For instance, a common flow where a customer uses XC for WAF/WAAP and a third-party SaaS solution for an API gateway may be Client -> XC (LB-WAAP) -> API GW (public IP) -> API. In this default configuration, an attacker could learn the customer's public NAT IP and add it to their own origin pool. They can now launch attacks from their tenant(y) that are sourced from the XC IPs and therefore allowed by the customer(x) perimeter firewall.

Mitigation

There are at least four ways to mitigate this risk (a test sketch for the first appears at the end of this article).

1. L7 header: If the origin servers (on-premises or SaaS) have something in front of them that is "L7 aware," or can themselves be configured to validate headers, a custom HTTP request header can be injected into the flow by the load balancer in tenant(x). Tenant(y) would not know or be able to see this header. Of course, traffic not containing this header would still make it all the way to the L7-aware service before being dropped. So while this would suffice against an L7 DoS or other L7-type attack, it would not help with an L3/4-type attack, which could still make its way through the infrastructure.

2. MTLS: A unique differentiator for F5 XC is our ability to use server-side mTLS. If a customer has the capability on the web server/service, or on something in front of it (similar to the previous L7 header example), we can add an additional layer of source validation by using mutual certificate authentication (mTLS). Even a self-signed cert would add a lot of value here. No cert = no Layer 7 access to the app or service. This does not prevent an L3/4 attack but will prevent unwanted application access.
3. Customer Edge (CE) proxies: CEs are deployable software that creates a private mesh back to our Application Delivery Network (ADN). They come with additional cost and need to be deployed at each location, creating a private mesh or overlay network that is unavailable outside of the tenant. In this scenario, attacker traffic could potentially reach the public IP of (or in front of) the CE and be dropped, thus protecting the application itself but still potentially allowing bad L3/4 traffic through.

4. Private Link: Private Link is a paid add-on to XC that enables connectivity between XC, clients, and resources. It offers many advantages, particularly when addressing regulatory and other security compliance requirements. Perimeter firewall rules can be simplified to allow traffic exclusively from Private Links, which are accessible only from the designated tenancy. Private Links can mitigate L3-L7 attacks because the link is entirely private by design. See the XC Private Link overview for details.

A Word on L3/4 DDoS

L3/4 attacks were brought up several times above when discussing the technicalities of each mitigation method. While an L3/4 attack is not always distributed by nature, most are. One very important concept to keep in mind is that XC natively provides L3/4 DDoS mitigation at our Regional Edges. Even in the examples above where "attack" traffic could make it all the way to the app, or at least to the perimeter, a true DDoS would be picked up by our Regional Edges and automatically mitigated.

Conclusion

In today's interconnected cloud ecosystems, mitigating CTOE attacks is crucial to maintaining service availability and performance. By understanding the vulnerabilities that stem from cross-cloud communications and applying best practices, organizations can safeguard their systems from exploitation. As we continue to expand our cloud footprints, proactive security measures are not only necessary but must evolve alongside the complexity of the environments we manage. Effective CTOE prevention is an essential part of ensuring a resilient, high-performing network in this cloud-driven world.

Like this article? Please drop a like or line below!
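As promised above, here is a hedged sketch of verifying the L7-header mitigation from the origin's perspective. The header name, value, and hostname are illustrative assumptions: the XC load balancer in tenant(x) would be configured to inject the header, and the origin (or a proxy in front of it) to require it.

```bash
#!/usr/bin/env bash
# Verify that the origin only serves requests carrying the custom header
# injected by the tenant's XC load balancer. Header name/value and the
# hostname are illustrative assumptions for this sketch.
ORIGIN="https://origin.example.internal"

# Simulated cross-tenant request (no secret header): should be rejected.
curl -s -o /dev/null -w "no header  -> %{http_code}\n" "${ORIGIN}/"

# Legitimate request, as it would arrive via tenant(x)'s load balancer
# with the injected header: should be served.
curl -s -o /dev/null -w "with header -> %{http_code}\n" \
  -H "X-Tenant-Auth: 9f8e7d6c" "${ORIGIN}/"
```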