Customer Edge Site High Availability for Application Delivery - Reference Architecture
Purpose

This guide describes the reference architecture for deploying a highly available F5 Distributed Cloud (F5XC) Customer Edge (CE) site. It explains the networking options available to deploy a highly available multi-node CE site in an on-premises data center, a branch location, or the public cloud when deployed manually.

Audience

This guide is for technical readers, including NetOps and Solution Architect teams, who want to better understand the various options for deploying a highly available F5 Distributed Cloud Customer Edge (CE) site. The guide assumes the reader is familiar with basic networking concepts such as routing protocols, DNS, and data center network architecture. The reader should also be familiar with F5XC concepts such as Load Balancing, BGP configuration, Sites and Virtual Sites, and Site Local Inside (SLI) and Site Local Outside (SLO) interfaces.

Introduction

To create a resilient network architecture, all components on the network must be deployed in a redundant topology to handle device and connectivity failures. A CE acts as an L7 gateway and sits in the path of the network traffic, so it needs a redundant architecture. For a production setup, it is recommended to deploy the site as a three-node cluster. These three nodes are the control nodes. Additional worker nodes can be added for higher L7 and security performance.

Clustering on a CE Site

A CE can be deployed as a multi-node site for redundancy and performance scaling. A CE runs Kubernetes on its nodes and inherits the Kubernetes HA architecture of having either one or three control nodes and optional worker nodes. Production deployments are recommended to have three control nodes for redundancy and additional worker nodes to meet the performance requirements of the site. A multi-node site can tolerate one control node failure, as it needs at least two nodes to form the quorum for HA. It is important to ensure that multiple control nodes do not fail simultaneously in a site. Worker node failures do not cause the whole site to fail; they only reduce the total throughput the site can handle.

Note: The control nodes may also be referred to as master nodes in legacy documentation. Although they are called control nodes, they run both control plane and data plane functions.

Figure: CE Clustering

In a multi-node setup, two CE control nodes form tunnels to the two closest REs. If one of the control nodes with a tunnel fails, the tunnel is reassigned to a remaining control node. In a single-node site, the same node forms tunnels to two different REs. Worker nodes are not supported for sites with a single control node.

Figure: RE – CE connectivity

CE Site HA Options

In a regular deployment, a multi-node CE site is used to achieve redundancy. A load balancer configured on the CE site uses the IP address of the SLI, SLO, or both interfaces as its VIP by default. This means the load balancer domain/hostname needs to resolve to multiple IP addresses across the nodes of the CE. To simplify this, F5XC also allows users to specify a custom IPv4 address as the VIP for each load balancer. An alternative topology is to use multiple single-node sites deployed across different availability zones in the data center or public cloud. In this case, the sites can be grouped into a Virtual Site, and a load balancer can be configured with a custom VIP advertised to this Virtual Site. Both of these options are explained in detail in the sections below.
High Availability Options for a Single CE Site

This section describes the deployment options available to direct traffic across the CE nodes and lists the pros and cons of each option. The feasibility of these options may vary by environment (on-premises or public cloud) and the networking tools available. These nuances are also explained for each option. For L4 and L7 load balancer VIPs on the CE site, all nodes (control and worker nodes) can actively receive traffic. The site bandwidth scales linearly with the number of nodes, so multiple worker nodes can be deployed based on the performance requirements of the site. For public load balancers, the VIP is on the RE, and the bandwidth is limited to the bandwidth of the tunnels connecting the two CE control nodes to the REs.

Layer 3 Redundancy Using Static Routing With ECMP

This is the simplest way to configure redundancy for a load balancer VIP on the CE cluster. The application admin can configure the LB with a user-specified VIP, and the network admin can configure equal-cost static routes for this VIP address with the SLI/SLO IP addresses of the CE nodes as the next hops. The router uses Equal-Cost Multi-Path (ECMP) to spread the traffic across the CE nodes. It is recommended to use a consistent-hashing ECMP configuration on the router to ensure an active session to a CE node is not rehashed if another node fails.

Figure: Static Routing

Pros:
- The VIP can be from any valid subnet; it is not restricted to the SLO or SLI subnet where it is advertised.
- Simple L3 routing configuration.
- Can scale with worker nodes with minimal route configuration change.
- All active nodes can receive traffic.

Cons:
- Needs routing configuration changes external to F5XC every time an LB VIP is created or deleted.
- Traffic is blackholed when a CE node fails, until the node's route is removed from the route configuration or the node is restored.

When To Use:
- When the NetOps team does not have access to routing devices with dynamic routing protocol capabilities like BGP.
- In use cases where the number of load balancers on the site is small and does not change often, so the operational overhead of configuring and managing the routes is low.

Layer 3 Redundancy Using BGP Routing With ECMP

BGP peering can be configured between the F5XC CE and the router. This configuration requires LBs to be created with user-specified VIPs. The CE advertises equal-cost /32 routes to the VIP with the SLO/SLI as the next hop. The router uses Equal-Cost Multi-Path (ECMP) to spread the traffic across the CE nodes. It is recommended to use a sticky/persistent ECMP configuration on the router to ensure an active session to a CE node is not rehashed to a different node in case of a node failure.

Note: Separate BGP peers must be configured for VIPs on SLO and SLI. Users can select the peer interface on the CE while configuring the peers. For more information, check the BGP documentation.

Figure: BGP Routing

Pros:
- The VIP can be from any valid subnet; it is not restricted to the SLO or SLI subnet where it is advertised.
- Can automatically scale with worker nodes.
- Automatically revokes the route for a failed CE node.
- Faster failover than any other method.
- All active nodes can receive traffic.

Cons:
- Needs advanced network configuration on the router.
- The router must support BGP.

When To Use:
- The site has a large number of load balancers configured.
- Load balancers are frequently created and deleted.
- The application requires fast failover and minimal disruption in case of node failure.
- With network overlay technologies like Cisco ACI.
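As a concrete illustration of the static-route option above, the sketch below shows equal-cost multipath routes on a Linux-based router. The VIP (10.10.10.100) and the CE SLI addresses (192.168.20.11-13) are made-up examples, and dedicated routers would use their own equivalent static-route and ECMP hashing commands.

# Equal-cost static routes for the LB VIP, one next hop per CE node
ip route add 10.10.10.100/32 \
    nexthop via 192.168.20.11 dev eth1 weight 1 \
    nexthop via 192.168.20.12 dev eth1 weight 1 \
    nexthop via 192.168.20.13 dev eth1 weight 1

# Hash on the 5-tuple so packets of a TCP session stick to the same next hop
sysctl -w net.ipv4.fib_multipath_hash_policy=1

With static routes, a failed node's next hop must be removed manually (or the node restored), which is the blackholing caveat listed above; the BGP option described above avoids this step because the failed node's /32 route is withdrawn automatically.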
Layer 2 Redundancy Using VRRP/GARP

A user can enable VRRP on the CE site. This configuration requires LBs to be created with user-specified VIPs. Only control nodes participate in the VRRP redundancy group, and one of them is elected as the leader for a VIP. VIPs are placed on different nodes at random. Only the leader node for a VIP sends out Gratuitous ARP (GARP) broadcasts for that VIP. If the leader node fails, a new leader is elected and the VIP is placed on it.

Figure: VRRP/GARP

Pros:
- No network configuration is required external to F5XC.
- Automatic failover of the VIP when the VRRP leader node fails.

Cons:
- Only control nodes can receive traffic.
- Only one node is actively receiving traffic at a given time.
- VIPs are placed on control nodes randomly; equal distribution of VIPs across control nodes is not guaranteed.
- Failover can be slow depending on the ARP resolution time on the network.

When To Use:
- The application team does not have access to routers, DNS servers, or load balancers on the network (see the other deployment options for details).
- The application does not require high throughput.
- Some traffic loss can be tolerated in case of node failure (e.g., for non-critical applications).
- Does not work in public cloud deployments, as cloud networking blocks GARP requests.

External Proxy Load Balancing

The network admin can configure an external load balancer (LB), with the CE SLO/SLI IP addresses in its origin pool, to spread the traffic across CE nodes. This can be a TCP or HTTP load balancer. For an external TCP LB, the client IP is lost because the LB SNATs the request before forwarding it to the CE nodes. F5XC does not support proxy protocol on the client side, so it cannot be used to convey the client IP to the load balancer on the CE. For an external HTTP LB, the traffic still gets SNAT-ed, but the client IP can be preserved to the CE nodes if the external LB can add the X-Forwarded-For header to the request. If the LB on the CE site is an HTTPS LB or a TCP LB with TLS enabled, the external LB has to host the TLS certificate because it terminates the client TLS sessions. A wildcard certificate can be used to simplify this, but that may not always be a viable option for the applications. In public cloud deployments, where L2 ARP and routing protocols may not work, users also have the option to use a Network Load Balancer on AWS and Google Cloud, a Standard Load Balancer on Azure, or a similar feature on other public clouds that does not SNAT the traffic but simply forwards it to the CE nodes, just like a router running ECMP.

Note: For multi-node public cloud sites created using the F5XC console, the console automatically creates the required cloud-native LB. Manual CE deployment in the cloud is also supported, in which case the user has to create the LB.

Figure: External LB

Pros:
- All active nodes can receive traffic.
- Health probes can be configured to track F5XC LB health and avoid traffic blackholing.
- Can scale with worker nodes.
- Works for public cloud deployments.

Cons:
- Managing certificates on the external LB can be operationally challenging for TLS traffic.
- No source IP retention in the case of a TCP LB.
- Adds an additional proxy hop.
- The external LB can become a performance bottleneck even if the CE is scaled out using worker nodes.

When To Use:
- There is an existing load balancer (usually in the DMZ) in the traffic path, but the CE is used for additional services like WAAP, DDoS, etc.
- In public clouds where cloud LBs can be used to load balance to the nodes.
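For the public cloud case above, the following sketch shows one way to front a manually deployed CE site with an AWS Network Load Balancer using the AWS CLI; the VPC ID, subnet IDs, and CE node IPs are placeholders, and the equivalent Azure or Google Cloud constructs would follow the same pattern.

# Target group containing the CE nodes' SLO/SLI addresses
aws elbv2 create-target-group --name ce-nodes --protocol TCP --port 443 \
    --vpc-id vpc-0123456789abcdef0 --target-type ip --health-check-protocol TCP

aws elbv2 register-targets --target-group-arn <target-group-arn> \
    --targets Id=10.0.1.10 Id=10.0.2.10 Id=10.0.3.10

# Internal NLB with a TCP listener forwarding to the CE nodes
aws elbv2 create-load-balancer --name ce-nlb --type network --scheme internal \
    --subnets subnet-aaaa1111 subnet-bbbb2222

aws elbv2 create-listener --load-balancer-arn <load-balancer-arn> \
    --protocol TCP --port 443 \
    --default-actions Type=forward,TargetGroupArn=<target-group-arn>

Because the listener is plain TCP, TLS is still terminated on the CE-hosted load balancer, which avoids the certificate-management drawback called out for proxying external LBs.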
DNS Load Balancing

Network admins can use DNS to resolve application hostnames to the SLI/SLO IP addresses of the CE nodes. The DNS can be configured to respond with one IP at a time in a round-robin manner. Alternatively, a private DNS LB or Global Server Load Balancer (GSLB) can be used, which can make load-based intelligent decisions to distribute the traffic more evenly. User-specified VIPs must not be used in this case, because the hostname must resolve to the individual node's SLI/SLO IP address for the traffic to be routed to that node.

Figure: DNS LB

Pros:
- Can be configured using existing DNS servers.
- Can scale with worker nodes.
- Works for public cloud deployments (not recommended, as better options are available).
- Does not add an L4/L7 hop to the traffic path.

Cons:
- Needs DNS configuration changes external to F5XC every time an LB VIP is created or deleted, or a new worker node is added.
- Traffic is blackholed when a CE node fails, until the node's IP is removed from the DNS configuration or the node is restored.
- Intelligent distribution of traffic requires a GSLB, which can be expensive.
- Subject to DNS caching and TTL, which can cause clients to resolve to a down CE node.

When To Use:
- The application team only has access to a GSLB or DNS server and does not want to limit the traffic to only one node at a time, as in the VRRP/GARP option.
- High performance is not a requirement, as multiple clients may resolve to the same node even if the site has multiple nodes.
- Can be used in the public cloud if the user does not want to create an external LB.

High Availability Using Multiple Single-Node Sites Across Availability Zones

Instead of deploying a single multi-node site, customers can opt to deploy two (or more) single-node sites and use them together (as individual sites or grouped into a Virtual Site) to advertise a VIP. This can be useful if the data center has two AZs, where it is more logical to deploy a CE in each AZ than to deploy a three-node CE with one node in one AZ and two nodes in the other. By upgrading one site at a time, it is guaranteed that at least one site will always be online to serve the traffic, providing resiliency against upgrade failures. This is very useful for critical applications demanding zero downtime. All the deployment options above, other than the VRRP/GARP method, can be used in this case. It is recommended to use a consistent-hash configuration for ECMP on the router to ensure all packets in a TCP session from a client are always routed to the same site. In this deployment, each CE has two tunnels to the nearest REs. Hence, this method is also beneficial when you want to publish an app to the internet using F5XC Regional Edges (REs), as you can scale throughput by adding CEs and hence more tunnels.

Note: This is a big advantage of this topology over a multi-node site, as the latter is limited to only two tunnels.

For public load balancers, the VIP is on the Regional Edges (REs) on the F5XC global network. The load balancing happens on the REs, and the CEs provide secure connectivity with auto SNAT between the REs and private origins. So, to get the most out of the available compute, the CEs in this case can be configured for Enhanced L3 performance mode, as all the L7 processing happens on the RE.
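Building on the earlier static-route sketch, the following hypothetical example shows how the consistent-hash recommendation above can be approximated on a Linux-based router using resilient next-hop groups (available in recent kernels), so that a site failure only remaps the flows that were hashed to the failed site. The addresses are again placeholders, and hardware routers expose similar "resilient" or "consistent" ECMP hashing options.

# One next-hop object per single-node site
ip nexthop add id 11 via 192.168.20.11 dev eth1
ip nexthop add id 12 via 192.168.30.11 dev eth2

# Resilient (consistent-hash) group, then the VIP route pointing at the group
ip nexthop add id 100 group 11/12 type resilient buckets 512
ip route replace 10.10.10.100/32 nhid 100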
Figure: RE-CE tunnels for multiple single-node site deployment

Conclusion

This guide should help the reader learn about the various HA options available in F5XC and make an informed decision on which method to choose based on their requirements and the networking tools available. For a more detailed explanation of the above options with config examples, also see: F5 Distributed Cloud – CE High Availability Options: A Comparative Exploration

Related Articles
- F5XC Load Balancing and Distributed Proxy Concepts
- F5XC Virtual Network Concepts
- F5XC Site BGP Configurations on F5XC
- F5 Distributed Cloud - Customer Edge Site - Deployment & Routing Options
- F5 Distributed Cloud - Listener Logic

Simplify Network Segmentation for Hybrid Cloud
Introduction

Enterprises have always had the need to maintain separate development and production environments. Operational efficiency, reduction of blast radius, security, and compliance are generally the common objectives behind separating these environments. By dividing networks into smaller, isolated segments, organizations can enhance security, optimize performance, and ensure regulatory compliance. This article demonstrates a practical strategy for implementing network segmentation in modern multicloud environments that also connect on-prem infrastructure. It uses F5 Distributed Cloud (F5 XC) services to connect and secure network segments in cloud environments like Amazon Web Services (AWS) and on-prem datacenters.

Need for Segmentation

Network segmentation is critical for managing complex enterprise environments. Traditional methods like Virtual Routing and Forwarding (VRF) and Multiprotocol Label Switching (MPLS) have long been used to create isolated network segments in on-prem setups. F5 XC ensures segmentation in environments like AWS, and it can extend the same segmentation to on-prem environments. These techniques separate traffic, enhance security, and improve network management by preventing unauthorized access and minimizing the attack surface.

Scenario Overview

Our scenario depicts an enterprise with three different environments (prod, dev, and shared services) extended between on-prem and cloud. A third-party entity requires access to a subset of the enterprise's services. This article covers the following two network segmentation use cases:
- Hybrid Cloud Transit
- Extranet (servicing external third-party partners/customers)

Hybrid Cloud Transit

Consider an enterprise with three distinct environments: Production (Prod), Development (Dev), and Shared Services. Each environment requires strict isolation to ensure security and performance. Using F5 XC Cloud Connect, we can assign each VPC a network segment, effectively isolating the VPCs. Segments in multiple locations (or VPCs) can traverse F5 XC to reach distant locations, whether in another cloud environment or on-prem. Network segments are isolated by default; for example, our Prod segment cannot access Shared. A segment connector is needed to allow traffic between Prod and Shared. The following diagram shows the VPC segments, ensuring complete "ships in the night" isolation between environments. In this setup, the Prod, Dev, and Shared Services environments operate independently and are completely isolated from one another at the control plane level. This ensures that any issues or attacks in one environment do not affect the others.

Customer Requirement: Shared Services Access

Many enterprises deploy common services across their organization to support internal workloads and applications. Some examples include DHCP, DNS, NTP, and NFS: services that need to be accessible to both the Prod and Dev environments while keeping Prod and Dev separate from each other. Segment Connectors is a method to allow communication between two isolated segments by leaking the routes between the source and destination segments. It is important to note that a segment connector can be of type Direct or SNAT. Direct allows bidirectional communication between segments, whereas the SNAT option allows unidirectional communication from the source to the destination.

Extending Segmentation to On-Premises

Enterprises already use segmented networks within their on-premises infrastructure.
Extending this segmentation to AWS involves creating similar isolated segments in the cloud and establishing secure communication channels. F5 XC allows you to easily extend this segmentation from on-prem to the cloud regardless of the underlay technology. In this scenario, communication between the on-premises Prod segment and its cloud counterpart is seamless, and the same also applies to the Dev segment. Meanwhile, Dev and Prod stay separate, ensuring that existing security and isolation are preserved across the hybrid environment.

Extranet

In this scenario an external entity (customer/partner) needs access to a few applications within our Prod segment. There are two different ways to enable this access: Network-centric and App-centric. Let's refer to the external entity as Company B. In order to connect Company B we generally need appropriate cloud credentials, but Company B will not share their cloud credentials with us. To solve this problem, F5 XC recommends using the AWS STS AssumeRole functionality, whereby Company B creates an AWS IAM Role that trusts F5 XC with the minimum privileges necessary to configure Transit Gateway (TGW) attachments and TGW route table entries to extend access to the F5 XC network or network segments.

Section 1 – Network-centric Extranet

Many times, partners and customers need to access a unique subset of your enterprise's applications. This can be achieved with F5 XC's dedicated network segments and segment connectors. With a segment connector for the external and Prod network segments, we can give Company B access to the required HTTP service without granting broader access to other non-Prod segments.

Locking Down with Firewall Policies

We can implement a Zero Trust firewall policy to lock down access from the external segment. By refining these policies, we ensure that third-party consumers can only access the services they are authorized to use. Our firewall policy on the CE only allows access from the external segment to the intended application on TCP/80 in Prod.

[ec2-user@ip-10-150-10-146 ~]$ curl --head 10.1.10.100
HTTP/1.1 200 OK
Server: nginx/1.24.0 (Ubuntu)
Date: Thu, 30 May 2024 20:50:30 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Wed, 22 May 2024 21:35:11 GMT
Connection: keep-alive
ETag: "664e650f-267"
Accept-Ranges: bytes

[ec2-user@ip-10-150-10-146 ~]$ ping -O 10.1.10.100
PING 10.1.10.100 (10.1.10.100) 56(84) bytes of data.
no answer yet for icmp_seq=1
no answer yet for icmp_seq=2
no answer yet for icmp_seq=3
^C
--- 10.1.10.100 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 3153ms

After applying the new policies, we confirm that the third-party access is restricted to the intended services only, enhancing security and compliance. This demonstrates how F5 Distributed Cloud services enable network segmentation across on-prem and cloud environments, with granular control over security policies applied between the segments.

Section 2 - App-centric Extranet

In the scenario above, Company B can directly access one or more services in Prod with a segment connector, and we've locked it down with a firewall policy. For the App-centric method, we'll only publish the intended services that live in Prod to the external segment. App-centric connectivity is made possible without a segment connector by using load balancers within App Connect that target the application within the Prod segment and advertise its VIP address to the external segment.
The following illustration shows how to configure each component in the load balancer.

Visualization of Traffic Flows

The visualization flow analysis tool in the F5 XC Console shows traffic flows between the connected environments. By analyzing these flows, particularly between third-party consumers and the Prod environment, we can identify any unintended access or overreach.
The following diagram is for a Network-centric connection flow:
The following diagram shows an App-centric connection flow using the load balancer:

Conclusion

Effective network segmentation is a cornerstone of secure and efficient cloud environments. We've discussed how F5 XC enables hybrid cloud transit and extranet communication. Extranet can be done with either a Network-centric or an App-centric deployment. F5 XC is a platform that manages and orchestrates end-to-end segmentation and security in hybrid-cloud environments. Enterprises can achieve comprehensive segmentation, ensuring isolation, secure access, and compliance. The strategies and examples provided demonstrate how to implement and manage segmentation across hybrid environments, catering to diverse requirements and enhancing overall network security.

Additional Resources

More features and guidance are provided in the comprehensive guide below, which shows exactly how you can use the power and flexibility of F5 Distributed Cloud and Cloud Connect to deliver a Network-centric approach with a firewall and an App-centric approach with a load balancer. Create and manage segmented networks in your own cloud and on-prem environments, and achieve the following benefits:
- Ability to isolate environments within AWS
- Ability to extend segmentation to on-prem environments
- Ability to connect external partners or customers to a specific segment
- Use Enhanced Firewall Policies to limit access and reduce the blast radius
- Enhance compliance with regulatory requirements by isolating sensitive data and systems
- Visualize and monitor the traffic flows and policies across segments and network domains

Workflow Guide - Secure Network Fabric (Multi-Cloud Networking)
DevCentral: Secure Multicloud Networking Article Series
GitHub: S-MCN Use-case Playbooks (Console, Automation) for F5 Distributed Cloud Customers
F5.com: Product Information
Product Documentation: Network Segmentation, Cloud Connect, Network Segment Connectors, App Security, App Networking, CE Site Management
This article explores an alternative approach to achieve HA across single CE nodes, catering for use cases requiring higher performance and granular control over redundancy and failover management. Introduction F5 Distributed Cloud offers different techniques to achieve High Availability (HA) for Customer Edge (CE) nodes in an active-active configuration to provide redundancy, scaling on-demand and simplify management. By default, F5 Distributed Cloud uses a method for clustering CE nodes, in which CEs keep track of peers by sending heartbeats and facilitating traffic exchange among themselves. This method also handles the automatic transfer of traffic, virtual IPs, and services between CE peers —excellent for simplified deployment and running App Stack sites hosting Kubernetes workloads. However, if CE nodes are deployed mainly to manage L3/L7 traffic and application security, this default model might lack the flexibility needed for certain scenarios. Many of our customers tell us that achieving high availability is not so straightforward with the current clustering model. These customers often have a lot of experience in managing redundancy and high availability across traditional network devices. They like to manage everything themselves—from scheduling when to switch over to a redundant pair (planned failover), to choosing how many network paths (tunnels) to use between CEs to REs (Regional Edges) or other CEs. They also want to handle any issues device by device, decide the number of CE nodes in a redundancy group, and be able to direct traffic to different CEs when one is being updated. Their feedback inspired us to write this article, where we explore a different approach to achieve high availability across CEs. The default clustering model is explained in this document: https://docs.cloud.f5.com/docs/ves-concepts/site#cluster-of-nodes Throughout this article, we will dive into several key areas: An overview of the default CE clustering model, highlighting its inherent challenges and advantages. Introduction to an alternative clustering strategy: Single Node Clustering, including: An analysis of its challenges and benefits. Identification of scenarios where this approach is most applicable. A guide to the configuration steps necessary to implement this model. An exploration of failover behavior within this framework. A comparison table showing how this new method differs from the default clustering method. By the end of this article, readers will gain an understanding of both clustering approaches, enabling informed decisions on the optimal strategy for their specific needs. Default CE Clustering Overview In a standard CE clustering setup, a cluster must have at least three Master nodes, with subsequent additions acting as Worker nodes. A CE cluster is configured as a "Site," centralizing operations like pool configuration and software upgrades to simplify management. In this clustering method, frequent communication is required between control plane components of the nodes on a low latency network. When a failover happens, the VIPs and services - including customer’s compute workloads - will transition to the other active nodes. As shown in the picture above, a CE cluster is treated as a single site, regardless of the number of nodes it contains. In a Mesh Group scenario, each mesh link is associated with one single tunnel connected to the cluster. 
These tunnels are distributed among the master nodes in the cluster, optimizing the total number of tunnels required for a large-scale Mesh Group. It also means that the site is connected to the REs via only two tunnels, one to each RE.

Design Considerations for the Default CE Clustering Model

Best suited for:
1- App Stack Sites: Running Kubernetes workloads necessitates the default clustering method for container orchestration across nodes.
2- Large-scale Site Mesh Groups (SMG).
3- Cluster-wide upgrade preference: Customers who favour managing nodes collectively will find cluster-wide upgrades more convenient, however without control over the upgrade sequence of individual nodes.

Challenges:
- Network bottleneck for ingress traffic: A cluster connected to two Regional Edge (RE) sites via only two tunnels means only two nodes process external (ingress) traffic, limiting the additional nodes to processing internal traffic only.
- Three-master-node requirement: Some customers are accustomed to dual-node HA models and may find the requirement for three master nodes resource-intensive.
- Hitless upgrades: Controlled, phased upgrades are preferred by some customers for testing before widespread deployment, which is challenging with cluster-wide upgrades.
- Cross-site deployments: High network latency between remote data centers can impact cluster performance due to the latency sensitivity of the etcd daemon, the backbone of cluster state management. If the network connection across the nodes gets disconnected, all nodes will most likely stop operating due to the quorum requirements of etcd. Therefore, F5 recommends deploying separate clusters for different physical sites.
- Service fault sprawl and limited node fault tolerance: Default clusters can sometimes experience a cascading effect where a fault in a node spreads throughout the cluster. Additionally, a standard 3-node cluster can generally tolerate only one node failure; functionality may be lost once the cluster is reduced to a single active node. These limitations stem from the underlying clustering design and its dependency on etcd for maintaining cluster state.

The Alternative Solution: HA Between Multiple Single Nodes

The good news is that we can achieve the key objectives of clustering, which are streamlined management and high availability, without the dependency on the control plane clustering mechanisms.

Streamlined management using "Virtual Site":

F5 Distributed Cloud provides a mechanism called "Virtual Site" to perform operations on a group of sites (site = node or cluster of nodes), reducing the need to repeat the same set of operations for each site. The "Virtual Site" acts as an abstraction layer, grouping nodes tagged with a unique label and allowing these nodes to be addressed collectively as a single entity. Configuration of origin pools and load balancers can reference Virtual Sites instead of individual sites/nodes, facilitating cluster-like management for two or more nodes and enabling controlled day-2 operations. When a node is disassociated from the Virtual Site by removing the label, it is no longer eligible for new connections, and its listeners are simultaneously deactivated. Upgrading nodes is streamlined: simply remove the node's label to exclude it from the Virtual Site, perform the upgrade, and then reapply the label once the node is operational again.
This procedure offers you a controlled failover process, ensuring minimal disruption and enhanced manageability by minimizing the blast radius and limiting the scope of downtime. As traffic is rerouted to other CEs, if something goes wrong with an upgrade of a CE node, the services will not be impacted.

HA/Redundancy across multiple nodes:

Each single node in a Virtual Site connects to dual REs through IPSec or SSL/TLS tunnels, ensuring even load distribution and true active-active redundancy.

External (Ingress) Traffic: In the Virtual Site model, the Regional Edges (REs) distribute external traffic evenly across all nodes. This contrasts with the default clustering approach, where only two CE nodes are actively connected to the REs. The main Virtual Site advantage lies in its true active/active configuration for CEs, increasing the total ingress traffic capacity. If a node becomes unavailable, the REs automatically reroute new connections to another operational node within the Virtual Site, and the services (connections to origin pools) remain uninterrupted.

Internal (East-West) Traffic: For managing internal traffic within a single CE node in a Virtual Site (for example, when LB objects are configured to be advertised within the local site), all network techniques applicable to the default clustering model can be employed in this model as well, except for the Layer 2 attachment (VRRP) method.

Preferred load distribution method for internal traffic across CEs: Our preferred methods for load balancing across CE nodes are either DNS-based load balancing or Equal-Cost Multi-Path (ECMP) routing utilizing BGP for redundancy.

DNS Load Balancer Behavior: If a node is detached from a Virtual Site, its associated listeners and Virtual IPs (VIPs) are automatically withdrawn. Consequently, the DNS load balancer's health checks will mark those VIPs as down and prevent them from receiving internal network traffic.

Current limitation for custom VIP and BGP: When using BGP, please note a current limitation that prevents configuring a custom VIP address on the Virtual Site. As a workaround, custom VIPs should be advertised on individual sites instead. The F5 product team is actively working to address this gap.

For a detailed exploration of traffic routing options to CEs, please refer to the following article: https://community.f5.com/kb/technicalarticles/f5-distributed-cloud---customer-edge-site---deployment--routing-options/319435

Design Considerations for the Single Node HA Model

Best suited for:
1- Customers with high throughput requirements: This clustering model ensures that all Customer Edge (CE) nodes are engaged in managing ingress traffic from Regional Edges (REs), which allows for scalable expansion by adding additional CEs as required. In contrast, the default clustering model limits ingress traffic processing to only two CE nodes per cluster, and more precisely, to a single node from each RE, regardless of the number of worker nodes in the cluster. Consequently, this model is more advantageous for customers who have high throughput demands.
2- Customers who prefer controlled failover and software upgrades: This clustering model enables a sequential upgrade process, where nodes are updated individually to ensure each node upgrades successfully before moving on to the other nodes. The process involves detaching the node from the cluster by removing its site label, which redirects traffic to the remaining nodes during the upgrade.
Once upgraded, the label is reapplied, and this process is repeated for each node in turn. This is a model that customers have known for 20+ years of upgrade procedures, with a small wrinkle: the label.
3- Customers who prefer to distribute the load across remote sites: Nodes are deployed independently and do not require inter-node heartbeat communication, unlike the default clustering method. This independence allows for their deployment across various data centers and availability zones while being managed as a single entity. They are compatible with both Layer 2 (L2) spanned and Layer 3 (L3) spanned data centers, where nodes in different L3 networks utilize distinct gateways. As long as the nodes can access the origin pools, they can be integrated into the same "Virtual Site". This flexibility caters to customers' traditional preferences, such as deploying two CE nodes per location, which is fully supported by this clustering model.

Challenges:
- Lack of VRRP support: The primary limitation of this clustering method is the absence of VRRP support for internal VIPs. However, there are alternative methods to distribute internal traffic across CE nodes. These include DNS-based routing, BGP with Equal-Cost Multi-Path (ECMP) routing, or placing the CEs behind another Layer 4 (L4) load balancer capable of traffic distribution without source address alteration, such as F5 BIG-IPs or the standard load balancers provided by Azure or AWS.
- Limitation on custom VIP support: Currently, the F5 Distributed Cloud Console has a restriction preventing the configuration of custom virtual IPs for load balancer advertisements on Virtual Sites. We anticipate this limitation will be addressed in future updates to the F5 Distributed Cloud platform. As a temporary solution, you can advertise the LB across multiple individual sites within the Virtual Site. This approach enables the configuration of custom VIPs on those sites.
- Requires extra steps for upgrading nodes: Unlike the default clustering model, where upgrades can be performed collectively on a group of nodes, this clustering model requires upgrading nodes on an individual basis. This may introduce more steps, especially in larger clusters, but it remains significantly simpler than traditional network device upgrades.
- Large-scale Mesh Group: In F5 Distributed Cloud, the "Mesh Group" feature allows for direct connections between sites (whether individual CE sites or clusters of CEs) and other selected sites through IPSec tunnels. For CE clusters, tunnels are established on a per-cluster basis. However, for single-node sites, each node creates its own tunnels to connect with remote CEs. This setup can lead to an increased number of tunnels needed to establish the mesh. For example, in a network of 10 sites configured with dual-CE Virtual Sites, each CE is required to establish 18 IPSec tunnels to connect with other sites, or 19 for a full mesh configuration. Comparatively, a 10-site network using the default clustering method, with a minimum of 3 CEs per site, would only need up to 9 tunnels from each CE for full connectivity. Opting for Virtual Sites with dual CEs, a common choice, effectively doubles the number of required tunnels from each CE when compared to the default clustering setup. However, despite this increase in tunnels, opting for a Mesh configuration with single-node clusters can offer advantages in terms of performance and load distribution.
Note: Use DC Groups as an alternative solution to Secure Mesh Group for CE connectivity: For customers with existing private connectivity between their CE nodes, running a Site Mesh Group (SMG) with numerous IPsec tunnels can be less optimal. As a more scalable alternative for these customers, we recommend using a DC Cluster Group (DCG). This method utilizes IP-in-IP tunnels over the existing private network, eliminating the need for individual encrypted IPsec tunnels between each node and streamlining communication between CE nodes via IP-in-IP encapsulation.

Configuration Steps

The configuration for creating single-node clusters involves the following steps:
1. Creating a Label
2. Creating a Virtual Site
3. Applying the label to the CE nodes (sites)
4. Reviewing and validating the configuration

The detailed configuration guide for the above steps can be found here: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site

Example Configuration:

In this example, you can create a label called "my-vsite" to group CE nodes that belong to the same Virtual Site. Within this label, you can then define different values to represent different environments or clusters, such as a specific Azure region or an on-premises data center. Then a Virtual Site of "CE" type can be created to represent the CE cluster in "Azure-AustraliaEast-vSite" and tied to any CE that is tagged with the label "my-vsite=Azure-AustraliaEast-vSite": Now, any CE node that should join the cluster (Virtual Site) should get this label:

Verification:

To confirm the Virtual Site configuration is functioning as intended, we joined two CEs (k1-azure-ce2 and k1-azure-ce03) into the Virtual Site and evaluated the routing and load balancing behavior.

Test 1: Public Load Balancer (Virtual Site referenced in the pool)

The diagram shows a public "Load Balancer" advertised on the RE referencing a pool that uses the newly created Virtual Site to access the private application: As shown below, the pool member was configured to be accessed through the Virtual Site: Analysis of the request logs in the Performance dashboard confirmed that all requests to the public website were evenly distributed across both CEs.

Test 2: Internal Load Balancer (LB advertised on the Virtual Site)

We deployed an internal Load Balancer and advertised it on the newly created Virtual Site, utilizing the pool that also references the same Virtual Site (k1-azure-ce2 and k1-azure-ce03). As shown below, the Load Balancer was configured to be advertised on the Virtual Site.

Note: Here we couldn't use a "shared" custom VIP across the Virtual Site due to a current platform constraint. If a custom VIP is required, we should use "site" as opposed to "Virtual Site" and advertise the Load Balancer on all sites, as in the picture below:

Request logs revealed that when traffic reached either CE node within the Virtual Site, the request was processed and forwarded locally to the pool member. In the example below:
- src_site: Indicates the CE (k1-azure-ce2) that processed the request.
- src_ip: Represents the client's source IP address (192.168.1.68).
- dst_site: Indicates the CE (k1-azure-ce2) from which the pool member is accessed.
- dst_ip: Represents the IP address of the pool member (192.168.1.6).

Resilience Testing:

To assess the Virtual Site's resilience, we intentionally blocked network access from the k1-azure-ce2 CE to the pool member (192.168.1.6). The CE automatically rerouted traffic to the pool member via the other CE (k1-azure-ce03) in the Virtual Site.
Note: By default, CEs can communicate with each other via the F5 Global Network. This can be customized to use direct connectivity through tunnels if the CEs are members of the same DC Cluster Group (IP-in-IP tunneling) or Secure Mesh Group (IPSec tunneling). The following picture shows the traffic flow via the F5 Global Network. The following picture shows the traffic flow via the IP-in-IP tunnel when a DC Clustering Group (DCG) is configured across the CE nodes.

Failover Behaviour

When a CE node is tied to a Virtual Site, all internal Load Balancers (VIPs) advertised on that Virtual Site are deployed on the CE. Additionally, the Regional Edge (RE) begins to use this node as one of the potential next hops for connections to the origin pool. Should the CE become unavailable, or if it lacks the necessary network access to the origin server, the RE will almost seamlessly reroute connections through the other operational CEs in the Virtual Site.

Uncontrolled Failover: During instances of uncontrolled failover, such as when a node is unexpectedly shut down from the hypervisor, we have observed a handful of new connections experiencing timeouts. However, these issues were resolved by implementing health checks within the origin pool, which prevented any subsequent connection drops.

Note: Irrespective of the clustering model in use, it is always recommended to configure health checks for the origin pool. This practice enhances failover responsiveness and mitigates any additional latency incurred during traffic rerouting.

Controlled Failover: The moment a CE node is disassociated from the Virtual Site by the removal of its label, the CE node is no longer used by the RE to connect to origin pools. At the same time, all Load Balancer listeners associated with that Virtual Site are withdrawn from the node. This effectively halts traffic processing for those applications, preventing the node from receiving related traffic. During controlled failover scenarios, we have observed seamless service continuity on externally advertised services (to REs).

On-Demand Scaling: F5 Distributed Cloud provides a flexible solution that enables customers to scale the number of active CE nodes according to demand. This allows you to easily add more powerful CE nodes during peak periods (such as promotional events) and then remove them when demand subsides. With the Virtual Sites method, you can even mix and match node sizes within your cluster (Virtual Site), providing granular control over resources. It is advisable to monitor CE node performance and implement node-related alerts. These alerts notify you when nodes are operating at high capacity, allowing for timely addition of extra nodes as needed. Moreover, you can monitor node health in the dashboard. CPU, memory, and disk utilization of nodes can be good factors in determining whether more nodes are needed. Furthermore, the use of Virtual Sites makes managing this process even easier, thanks to labels.

Node-Based Alerts

Node-based alerts are essential for maintaining efficient CE operations.

Accessing the alerts in the Console: To view alerts, go to Multi-Cloud Network Connect > Notifications > Alerts. Here, you can see both "Active Alerts" and "All Alerts." Alerts related to node health fall under the "infrastructure" alert group. The following screenshot shows alerts indicating high loads on the nodes.

Configuring Alert Policies: Alert policies determine the notification process for raised alerts.
To set up an alert policy, navigate to Multi-Cloud Network Connect > Alerts Management > Alert Policies. An alert policy consists of two main elements: the alert receiver configuration and the policy rules.

Configuring Alert Receiver: The configuration allows for integration with platforms like Slack and PagerDuty, among others, facilitating notifications through commonly used channels.

Configuring Alert Rules: For alert selection, we recommend configuring notifications for alerts with a severity of "Major" or "Critical" at a minimum. Alternatively, the "infrastructure" group, which includes node-based alerts, can be selected.

Comparison Table

Criteria | Default Cluster | Single Node HA
Minimum number of nodes in HA | 3 | 2
Upgrade operations | Per cluster | Per node
Network redundancy and client-side routing for east-west traffic | VRRP, BGP, DNS, L4/7 LB | DNS, L4/7 LB, BGP*
Tunnels to RE | 2 tunnels per cluster | 2 tunnels per node
Tunnels to other CEs (SMG or DCG) | 1 tunnel from each cluster | 1 tunnel from each node
External traffic processing | Limited to 2 nodes | All nodes will be active
Internal traffic processing | All nodes can be active | All nodes can be active
Scale management in Public Cloud Sites | Straightforward, by configuring ingress interfaces in Azure/AWS/GCP sites | Straightforward, by adding or removing the labels
Scale management in Secure Mesh Sites | Requires reconfiguring the cluster (secure mesh site) - may cause interruption | Straightforward, by adding or removing the labels
Custom VIP IP | Available | Not available (planned for future releases); workaround available
Node sizes | All nodes should be the same size; upgrading node size in a cluster is a disruptive operation | Any node sizes or clusters can join the Virtual Site

* When using BGP, please note a current limitation that prevents configuring a custom VIP address on the Virtual Site.

Conclusion

F5 Distributed Cloud offers a flexible approach to High Availability (HA) across CE nodes, allowing customers to select the redundancy model that best fits their specific use cases and requirements. While we continue to advocate for the default clustering approach due to its operational simplicity, shared VRRP VIP, and unified network configuration benefits, especially for routine tasks like upgrades, the Virtual Site and single-node HA model presents some great use cases. It not only addresses the limitations and challenges of the default clustering model, but also introduces a solution that is both scalable and adaptable. While Virtual Sites offer their own benefits, we recognize they also present trade-offs. The overall benefits, particularly for scenarios demanding high ingress (RE-to-CE) throughput and controlled failover capabilities, cater to specific customer demands. The F5 product and development team remains committed to addressing the limitations of both default clustering and Virtual Sites discussed throughout this article. Their focus is on continuous improvement and finding the solutions that best serve our customers' needs.
References and Additional Links:
- Default Clustering model: https://docs.cloud.f5.com/docs/ves-concepts/site#cluster-of-nodes
- Configuration guide for Virtual Sites: https://docs.cloud.f5.com/docs/how-to/fleets-vsites/create-virtual-site
- Routing Options for CEs: https://community.f5.com/kb/technicalarticles/f5-distributed-cloud---customer-edge-site---deployment--routing-options/319435
- Configuration guide for DC Clustering Group: https://docs.cloud.f5.com/docs/how-to/advanced-networking/configure-dc-cluster-group

The Power of &: F5 Hybrid DNS solution
While some organizations prioritize the advantages of a SaaS solution, like scalability, others value the benefits of an on-premises solution, such as data control and migration flexibility. This is why having the option to deploy a hybrid model can be beneficial, not just for redundancy, but also for allowing organizations to blend the best of both worlds.

Understanding the Architecture's Components

F5 BIG-IP DNS (formerly BIG-IP GTM) is a well-known on-premises solution for delivering high-performance DNS services such as DNS Express and DNS Caching. It is also recognized for offering intelligent DNS responses that are based on various factors such as the LDNS' geolocation (GSLB) and the health status of applications.

F5 Distributed Cloud DNS (F5 XC DNS) is F5's SaaS-based DNS solution, which is built on a global data plane, ensuring automatic scalability to meet high-volume demand. Additionally, it also provides GSLB and security features such as DNS DoS protection.

In the diagram above, BIG-IP DNS will be the hidden primary DNS, acting as the source of truth for DNS records. This setup ensures centralized control and adds an extra layer of security by reducing exposure to potential attacks. F5XC DNS will function as the secondary DNS server, receiving DNS records from BIG-IP via zone transfer. It will be responsible for handling public DNS queries and providing domain name resolution services to clients. In the first part of this article, we will show you how to set up and configure BIG-IP DNS as the hidden primary and F5XC DNS as the authoritative secondary DNS server. For some, this setup is sufficient for their requirements, but for others, there may be additional requirements to consider in this hybrid design. In the later part of this article, we will demonstrate how we can address these challenges by leveraging F5's platform features and capabilities!

Steps on Implementing the F5 Hybrid DNS Solution

Step 1: Configure BIG-IP DNS

First, we need to configure BIG-IP DNS to be able to perform a zone transfer to F5XC DNS. For more details on the configuration, you can check this link: https://community.f5.com/kb/technicalarticles/configuring-big-ip-for-zone-transfer-and-dnssec/330359

Step 2: Configure F5XC DNS

Now that you've configured BIG-IP DNS, we need to configure F5 XC DNS to be a secondary DNS server. For more details, check the steps below:
1. Log into the XC Console, select the DNS Management option, and click Add Zone.
2. In the Domain Name field, enter the domain/subdomain. In our example, it will be f5sg.com.
3. Zone Type: Secondary DNS Configuration.
4. Under the Secondary DNS Configuration field, click Configure.
5. In the DNS primary server IP field, enter the public IP address of the primary DNS. In our example, it will be the public IP of BIG-IP DNS.
6. For the TSIG key, enter the name we used to generate the TSIG key earlier in BIG-IP. In our example, we used example.
7. In the TSIG Key algorithm field, select an algorithm from the drop-down. Select hmac-sha256.
8. Click Configure in the TSIG key value in base 64 format section. In the Secret Type field, select Clear Secret and paste the secret in the Secret field. Use the same secret we generated earlier in BIG-IP DNS.
9. Click Apply.

You should see the DNS records transferred from BIG-IP DNS to F5XC DNS.
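If you want to double-check the transfer path outside of the two consoles, a quick test can be run from any host with the BIND utilities installed; the key name and secret below are the ones created for this example, and the primary's public IP is a placeholder.

# (Optional) generate a TSIG key in the same format used above
tsig-keygen -a hmac-sha256 example

# Request a zone transfer from the hidden primary, signed with the TSIG key
dig @203.0.113.50 f5sg.com AXFR -y hmac-sha256:example:<base64-secret>

A successful response lists every record in the zone, which is the same data F5XC pulls in as the secondary.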
Step 3: Configure Domain Registrar

In this example, the domain registrar I'm using is Namecheap. I'll configure it so that the authoritative name server for the domain f5sg.com is set to F5XC (ns1.f5clouddns.com and ns2.f5clouddns.com). The steps will vary depending on which domain registrar you are using; refer to the documentation of your registrar. See the screenshots below for how I configured it in Namecheap.

F5 XC DNS should now be able to answer DNS queries since it is set to be the authoritative DNS. Now, let's do some testing! On my local machine, I will perform a dig on the f5sg.com domain. See below: You can see that in the dig result, the NS for f5sg.com is set to ns1.f5clouddns.com and ns2.f5clouddns.com! I can also resolve sales.f5sg.com! We have successfully implemented BIG-IP as the hidden primary and F5XC as the authoritative secondary DNS!

Challenges and Considerations

Now let's discuss the additional requirements or challenges that we might encounter with this hybrid setup:
- Security: We need to meet security compliance requirements. Nowadays, there are laws requiring the implementation of DNSSEC (DNS Security Extensions). We need to consider this in the design and implement it without adding complexity.
- Resiliency: Although the F5XC DNS infrastructure is built to be resilient, we still want a backup plan to fail over to the BIG-IP primary DNS in case of unforeseen events. This process will be manual, as we need to change the NS records at the registrar to promote the hidden BIG-IP primary DNS as the authoritative NS for the domain once F5XC is unavailable.
- Synchronization: BIG-IP will not be able to synchronize the GSLB functionality with F5XC because Wide-IP records are non-standard and cannot be transferred as part of zone transfers.

Solution to Challenges

Now comes the fun part: tackling the challenges we've laid out! Fortunately, F5 Distributed Cloud is an API-first platform that enables us to automate configuration. At the same time, we have the power of the BIG-IP platform, where we can run custom scripts that enable us to integrate it with F5XC through APIs.

Solution to Challenge #1: This is easy. DNSSEC records like RRSIG, DNSKEY, DS, NSEC, and NSEC3 are standardized and can be synchronized as part of a zone transfer. Since BIG-IP DNS is our primary DNS and supports DNSSEC, we can enable it. The records will synchronize to F5XC DNS, which will still respond with signed records, maintaining the integrity and security of our DNS infrastructure. How do you enable it? Check the last part of the technical article below: https://community.f5.com/kb/technicalarticles/configuring-big-ip-for-zone-transfer-and-dnssec/330359

Solution to Challenge #2: We need to automate failover! But when automating tasks, you need two things: a trigger and an action. In our scenario, the trigger should be the availability of F5XC DNS to resolve DNS queries, and the action should be to change our nameservers to BIG-IP at the domain registrar. If you can create and run a script in BIG-IP, you can continuously monitor the health of F5XC DNS, which gives us the trigger. But what about the action to change the domain name server records in the registrar? It's easy: check if it can be configured via API, and the problem is solved! Let's explore using Namecheap as our registrar for this example. We will use the BIG-IP EAV (external) monitor to run the script. If you're unfamiliar with the BIG-IP external monitor and its capabilities, check this out: https://my.f5.com/manage/s/article/K71282813

A dummy pool configured with the external monitor will run at intervals. The attached script is designed to monitor F5XC and check if it can resolve DNS queries.
If it cannot, the script will trigger an API call to Namecheap (our domain registrar) to change the nameservers back to BIG-IP DNS. Simultaneously, the script will update the domain's NS records from F5XC to BIG-IP. Step 1: Create an external monitor using the custom script. Refer to article K71282813 how to create the external monitor. See the codeshare link for the sample custom script I used: Namecheap and BIG-IP Integration via API | DevCentral Step 2: Create a dummy pool and attach the custom external monitor Let's do some tests! See the results in the later part of this article. Solution to Challenge #3: We can't use Zone Transfer to synchronize GSLB configurations? No problem! Instead, we'll harness the power of APIs. We can run a custom script in BIG-IP to convert Wide-IP configurations into F5XC DNSLB records via API. Let's see below how we can do this. On BIG-IP DNS, configure the zone records for the domainf5sg.comto delegate the subdomains needed for GSLB. For example, we need to perform GSLB forwww.f5sg.com,we will configure the zone like below: www.f5sg.comCNAMEwww.gslb.f5sg.com gslb.f5sg.com NS ns1.f5clouddns.com On BIG-IP we will create Wide-IP configuration forwww.gslb.f5sg.comwhich should hold the A records. These Wide-IP configurations can be converted by a script to F5XC DNSLB configurations. Check the sample script on this codeshare link: BIG-IP Wide-IP to F5XC DNSLB converter | DevCentral Testing and Result Challenge #2:Failover Testing To simulate the scenario in which F5XC is unable to respond to DNS queries, we designed the script to execute a dig command to F5XC for a TXT record. If F5XC responds with "RESPONSE-OK," no further action is needed. However, if it fails to respond correctly or does not respond at all, the script will trigger a failover action. Scenario 1: When F5XC responds to DNS queries(TXT record value is RESPONSE-OK) Namecheap dashboard shows F5XC nameservers BIG-IP DNS zone records shows F5XC nameservers F5XC zone records shows F5XC nameservers Result when performing dig to resolve sales.f5sg.com -> it shows that F5XC nameservers are Authoritative Scenario: When F5XC doesn't respond to DNS queries (TXT record value is RESPONSE-NOT-OK) We changed the TXT record value to 'RESPONSE-NOT-OK,' which should mark the monitor as down. The dummy pool went down, which means the script inside the monitor detected that the dig result was not what it expected. You can see from the zone records below that the NS records have now changed to GTM (gtm1.f5sg.com and gtm2.f5sg.com) When we check our domain registrar, Namecheap, we can see that the nameservers are now automatically set to BIG-IP GTMs. When I issue a dig command from my workstation, I can see that the nameserver responding to my query isgtm1.f5sg.com Online DNS tools (like MXToolbox) also report that gtm1.f5sg.com is the authoritative NS that responds to the DNS queries for sales.f5sg.com, which resolves to 2.2.2.2 We have now solved one of the challenges by implementing a backup failover plan using custom monitors and automations, made possible by the power of BIG-IP and APIs! Challenge #3:Synchronization Testing Using this script, we can convert and synchronize the BIG-IP Wide-IP configuration to its F5XC equivalent configuration Note: The sample script is limited to handling a Wide-IP with a single GTM pool. Inside the pool is where you will define the IP addresses that you want to load balance. 
The script is designed to run at intervals. There are several ways to execute it: you can use external monitors (as we did earlier) or utilize a cronjob, etc. For testing and simplicity, I will use a cronjob set to run every 10 minutes. Let's begin creating our GSLB configuration. If you've configured BIG-IP GTM/DNS before, one of the first objects you need to create is a GTM server. I've configured two Generic Servers representing the application in two different Data Centers. Next, we create a GTM Pool and associate with it the Virtual Server inside the GTM server we created earlier (i.e., I'm assigning 1.1.1.1 and 2.2.2.2 as the members of the pool). Lastly, we will create the Wide-IP record and attach the GTM Pool we created earlier. After this, the script should get triggered and convert this BIG-IP DNS Wide-IP configuration into F5XC DNS configuration. We should see that a new Primary Zone will be created in F5XC (gslb.f5sg.com). When we view the resource records, we should see a DNSLB record whose record name is equivalent to the subdomain of the Wide-IP record (the BIG-IP DNS Wide-IP record is www.gslb.f5sg.com; in the F5XC DNS zone gslb.f5sg.com, the record name is www and it points to a DNSLB object). The load balancing rules should have the DNSLB pool (pool-www), which is the equivalent of the GTM Pool (pool_www) configured in BIG-IP DNS. The DNSLB pool members will include the same IP addresses we defined as GTM Pool members in BIG-IP DNS. There are four load balancing methods available in F5XC, each with an equivalent BIG-IP DNS load balancing method. The script was created to match these methods, but if you configure the BIG-IP DNS pool load balancing method to something other than these four, it will default to Round Robin.
BIG-IP DNS -> F5XC DNS
Round Robin -> Round-Robin
Ratio -> Ratio-Member
Static Persist -> Static-Persist
Global Availability -> Priority
Based on the results above, we have successfully converted and synchronized the BIG-IP DNS Wide-IP configuration into F5XC DNSLB records! Conclusion We have resolved DNS challenges using the power and integration of F5 solutions! By utilizing both the BIG-IP and F5XC platforms, which can sign and serve DNSSEC records, we can seamlessly implement DNSSEC in a hybrid setup without complexity. Furthermore, our scalable F5XC Cloud DNS will shield you from myriad DNS DoS attacks, which are continually evolving, especially with the rise of AI. In terms of DNS resiliency, with the power of our API-first platforms and automation, we can create a DNS hybrid solution capable of automatically failing over from Cloud DNS to on-prem DNS. Lastly, we can synchronize the configurations of both platforms using standards like Zone Transfer and APIs. This capability allows us to convert and synchronize GSLB configurations between our on-prem DNS and Cloud DNS, making administration easier and establishing a single source of truth.
Securing and Scaling Hybrid Apps with F5/NGINX (Part 3)
In part 2 of our series, I demonstrated how to configure ZT (Zero Trust) use cases centering around authentication with NGINX Plus in hybrid environments. We deployed NGINX Plus as the external LB to route and authenticate users connecting to our Kubernetes applications. In this article, we explore other areas of the ZT spectrum configurable on the External LB Service, including: Authorization and Access, Encryption (mTLS), and Monitoring/Auditing. ZT Use case #1: Authorization Many people think that authentication and authorization can be used interchangeably. However, they both mean different things. Authentication involves the process of verifying user identities based on the credentials presented. Even though authenticated users are verified by the system, they do not necessarily have the authority to access protected applications. That is where authorization comes into play. Authorization involves the process of verifying the authority of an identity before granting access to an application. Authorization in the context of OIDC authentication involves retrieving claims from user ID tokens and setting conditions to validate whether the user is authorized to enter the system. An authenticated user is granted an ID token from the IdP with specific user information through JWT claims. The configuration of these claims is typically set from the IdP. Revisiting the OIDC auth use case configured in the previous section, we can retrieve the ID tokens of authenticated users from the NGINX key-value store.
$ curl -i http://localhost:8010/api/9/http/keyvals/oidc_acess_tokens
Then we can view the decoded value of the ID token using jwt.io. Below is an example of decoded payload data from the ID token.
{
    "exp": 1716219261,
    "iat": 1716219201,
    "admin": true,
    "name": "Micash",
    "zone_info": "America/Los_Angeles",
    "jti": "9f8ff4bd-4857-4e12-9634-e5876f786f98",
    "iss": "http://idp.f5lab.com:8080/auth/realms/master",
    "aud": "account",
    "typ": "Bearer",
    "azp": "appworld2024",
    "nonce": "gMNK3tu06j6tp5-jGa3aRhkj4F0P-Z3e04UfcFeqbes"
}
NGINX Plus has access to these claims as embedded variables. They are accessed by prefixing $jwt_claim_ to the desired field (for example, $jwt_claim_admin for the admin claim). We can easily set conditions on these claims and block unauthorized users before they even reach the back-end applications. Going back to our frontend.conf file from the previous part of our series, we can set the $jwt_status variable to 0 or 1 based on the value of the admin JWT claim. We then use the auth_jwt_require directive to validate the ID token. ID tokens with admin claims set to false will be rejected.
map $jwt_claim_admin $jwt_status {
    "true" 1;
    default 0;
}
server {
    include conf.d/openid_connect.server_conf; # Authorization code flow and Relying Party processing
    error_log /var/log/nginx/error.log debug;  # Reduce severity level as required
    listen [::]:443 ssl ipv6only=on;
    listen 443 ssl;
    server_name example.work.gd;
    ssl_certificate /etc/ssl/nginx/default.crt; # self-signed for example only
    ssl_certificate_key /etc/ssl/nginx/default.key;
    location / {
        # This site is protected with OpenID Connect
        auth_jwt "" token=$session_jwt;
        error_page 401 = @do_oidc_flow;
        auth_jwt_key_request /_jwks_uri; # Enable when using URL
        auth_jwt_require $jwt_status;
        proxy_pass https://cluster1-https; # The backend site/app
    }
}
Note: Authorization with NGINX Plus is not restricted to only JWT tokens. You can technically set conditions on a variety of attributes, such as session cookies, HTTP headers, and source/destination IP addresses.
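As an aside, the same payload inspection that jwt.io performs can be scripted. Below is a small, hypothetical Python sketch using the PyJWT package to dump the claims of an ID token pulled from the key-value store; the token variable is a placeholder, and signature verification is deliberately skipped because this is for inspection only, not for authorization decisions.

# Inspect ID token claims locally instead of pasting the token into jwt.io.
# Requires PyJWT (pip install pyjwt).
import jwt  # PyJWT

id_token = "<paste an ID token retrieved from the NGINX key-value store>"  # placeholder

claims = jwt.decode(id_token, options={"verify_signature": False})
print("admin claim:", claims.get("admin"))
print("issuer     :", claims.get("iss"))
print("expires at :", claims.get("exp"))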
ZT use case #2: Mutual TLS Authentication (mTLS) When it comes to ZT, mTLS is one of the mainstream use cases falling under the Zero Trust umbrella. For example, enterprises are using Service Mesh technologies to stay compliant with ZT standards. This is because Service Mesh technologies aim to secure service-to-service communication using mTLS. In many ways, mTLS is similar to the OIDC use case we implemented in the previous section. Only here, we are leveraging digital certificates to encrypt and authenticate traffic. This underlying framework is defined by PKI (Public Key Infrastructure). To explain this framework in simple terms, we can refer to a familiar example: the driver's license you carry in your wallet. Your driver's license can be used to validate your identity, the same way digital certificates can be used to validate the identity of applications. Similarly, only the state can issue valid driver's licenses, the same way only Certificate Authorities (CAs) can issue valid certificates to applications. Just as no one other than the state should be able to issue a valid license, no one other than a CA should be able to issue valid certificates. Therefore, every CA must hold a secure private key with which it signs and issues valid certificates. Configuring mTLS with NGINX can be broken down into two parts: Ingress mTLS: securing SSL client traffic and validating client certificates against a trusted CA. Egress mTLS: securing SSL upstream traffic and offloading authentication of TLS material to a trusted HTTPS back-end server. Ingress mTLS You can configure ingress mTLS on the NLK deployment by simply referencing the trusted certificate authority with the ssl_client_certificate directive in the server context. This will configure NGINX to validate client certificates against the referenced CA. Note: If you do not have a CA, you can create one using OpenSSL or the Cloudflare PKI and TLS toolkits.
server {
    listen 443 ssl;
    status_zone https://cafe.example.com;
    server_name cafe.example.com;
    ssl_certificate /etc/ssl/nginx/default.crt;
    ssl_certificate_key /etc/ssl/nginx/default.key;
    ssl_client_certificate /etc/ssl/ca.crt;
}
Egress mTLS Egress mTLS is a slight alternative to ingress mTLS where NGINX verifies certificates of upstream applications rather than certificates originating from clients. This feature can be enabled by adding the proxy_ssl_trusted_certificate directive to the server context. You can reference the same trusted CA we used for verification when configuring ingress mTLS or reference a different CA. In addition to verifying server certificates, NGINX as a reverse proxy can pass certs/keys and offload verification to HTTPS upstream applications. This can be done by adding the proxy_ssl_certificate and proxy_ssl_certificate_key directives in the server context.
server {
    listen 443 ssl;
    status_zone https://cafe.example.com;
    server_name cafe.example.com;
    ssl_certificate /etc/ssl/nginx/default.crt;
    ssl_certificate_key /etc/ssl/nginx/default.key;
    #Ingress mTLS
    ssl_client_certificate /etc/ssl/ca.crt;
    #Egress mTLS
    proxy_ssl_certificate /etc/nginx/secrets/default-egress.crt;
    proxy_ssl_certificate_key /etc/nginx/secrets/default-egress.key;
    proxy_ssl_trusted_certificate /etc/nginx/secrets/default-egress-ca.crt;
}
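If a lab CA is needed to test the configuration above, it does not have to come from the OpenSSL command line; a minimal, hypothetical Python sketch using the cryptography package can generate a self-signed CA certificate and key. The file names, subject fields, and lifetime are placeholders, and this is a throwaway lab CA, not a production PKI.

# Generate a throwaway CA for lab mTLS testing (pip install cryptography).
# Not a production PKI: no CRL/OCSP, no HSM-protected key, short lifetime.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
subject = issuer = x509.Name([
    x509.NameAttribute(NameOID.COMMON_NAME, "lab-mtls-ca"),        # placeholder CN
    x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Example Lab"),  # placeholder O
])
cert = (
    x509.CertificateBuilder()
    .subject_name(subject)
    .issuer_name(issuer)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.datetime.utcnow())
    .not_valid_after(datetime.datetime.utcnow() + datetime.timedelta(days=30))
    .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
    .sign(key, hashes.SHA256())
)
with open("ca.crt", "wb") as f:   # reference this file with ssl_client_certificate
    f.write(cert.public_bytes(serialization.Encoding.PEM))
with open("ca.key", "wb") as f:
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))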
ZT use case #3: Secure Assertion Markup Language (SAML) SAML (Security Assertion Markup Language) is an alternative SSO solution to OIDC. Many organizations may choose between SAML and OIDC depending on requirements and the IdPs they currently run in production. SAML requires an SP (Service Provider) to exchange XML messages via HTTP POST binding with a SAML IdP. Once exchanges between the SP and IdP are successful, the user will have session access to the protected back-end applications with one set of user credentials. In this section, we will configure NGINX Plus as the SP and enable SAML with the IdP. This is similar to how we configured NGINX Plus as the relying party in an OIDC authorization code flow (see ZT Use case #1). Setting up the IdP The one prerequisite is setting up your IdP. In our example, we will set up Microsoft Entra ID on Azure. You can use the SAML IdP of your choosing. Once the SAML application is created in your IdP, you can access the SSO fields necessary to link your SP (NGINX Plus) to your IdP (Microsoft Entra ID). You will need to edit the basic SAML configuration by clicking the pencil icon next to Edit in Basic SAML Configuration, as seen in the figure above. Add the following values and click Save:
Identifier (Entity ID): https://fourth.run.place
Reply URL (Assertion Consumer Service URL): https://fourth.run.place/saml/acs
Sign on URL: https://fourth.run.place
Logout URL (Optional): https://fourth.run.place/saml/sls
Finally, download the Certificate (Raw) from Microsoft Entra ID and save it to your NGINX Plus instance. This certificate is used to verify signed SAML assertions received from the IdP. Once the certificate is saved on the NGINX Plus instance, extract the public key from the downloaded certificate and convert it to SPKI format. We will use this certificate later when we configure NGINX Plus in the next section.
$ openssl x509 -in demo-nginx.der -outform DER -out demo-nginx.der
$ openssl x509 -inform DER -in demo-nginx.der -pubkey -noout > demo-nginx.spki
Configuring NGINX Plus as the SAML Service Provider After the IdP is set up, we can configure NGINX Plus as the SP to exchange and validate XML messages with the IdP. Once logged into the NGINX Plus instance, simply clone the nginx SAML GitHub repo.
$ git clone https://github.com/nginxinc/nginx-saml.git && cd nginx-saml
Copy the config files into the /etc/nginx/conf.d directory.
$ cp frontend.conf saml_sp.js saml_sp.server_conf saml_sp_configuration.conf /etc/nginx/conf.d/
Notice that by default, frontend.conf listens on port 8010 with cleartext HTTP. You can merge kube_lb.conf into frontend.conf to enable TLS termination and update the upstream context with the application endpoints you wish to protect with SAML. Finally, we will need to edit the saml_sp_configuration.conf file and update the variables in the map context based on the parameters of your SP and IdP:
$saml_sp_entity_id; https://fourth.run.place
$saml_sp_acs_url; https://fourth.run.place/saml/acs
$saml_sp_sign_authn; false
$saml_sp_want_signed_response; false
$saml_sp_want_signed_assertion; true
$saml_sp_want_encrypted_assertion; false
$saml_idp_entity_id; Unique identifier that identifies the IdP to the SP. This field is retrieved from your IdP.
$saml_idp_sso_url; This is the login URL and is also retrieved from the IdP.
$saml_idp_verification_certificate; Variable referencing the certificate downloaded from the previous section when setting up the IdP. This certificate will verify signed assertions received from the IdP.
Use the full directory path (/etc/nginx/conf.d/demo-nginx.spki).
$saml_sp_slo_url; https://fourth.run.place/saml/sls
$saml_idp_slo_url; This is the logout URL retrieved from the IdP.
$saml_sp_want_signed_slo; true
The remaining variables defined in saml_sp_configuration.conf can be left unchanged, unless there is a specific requirement for enabling them. Once the variables are set appropriately, we can reload NGINX Plus.
$ nginx -s reload
Testing Now we will verify the SAML flow. Open your browser and enter https://fourth.run.place in the address bar. This should redirect you to the IdP login page. Once you log in with your credentials, you should be granted access to your protected application. ZT use case #4: Monitoring/Auditing NGINX logs/metrics can be exported to a variety of third-party providers, including Splunk, Prometheus/Grafana, cloud providers (AWS CloudWatch and Azure Monitor Logs), Datadog, the ELK stack, and more. You can monitor NGINX metrics and logs natively with NGINX Instance Manager or NGINX SaaS. The NGINX Plus API provides a lot of flexibility by exporting metrics to any third-party tool that accepts JSON. For example, you can export NGINX Plus API metrics to our native real-time dashboard from part 1. Whichever tool you choose, monitoring and auditing the data generated from your IT systems is key to understanding and optimizing your applications. Conclusion Cloud providers offer a convenient way to expose Kubernetes Services to the internet. Simply create a Kubernetes Service of type LoadBalancer, and external users connect to your services via a public entry point. However, cloud load balancers do nothing more than basic TCP/HTTP load balancing. You can configure NGINX Plus with many Zero Trust capabilities as you scale out your environment to multiple clusters in different regions, which is what we will cover in the next part of our series.
Securely Scale RAG - Azure OpenAI Service, F5 Distributed Cloud and NetApp
Arguably, the easiest and most massively scalable approach to harnessing Large Language Models (LLMs) is to consume leading services like OpenAI endpoints, the most well-known cloud-based offerings delivered to enterprises over the general Internet. Access to hardware, such as GPUs, and the significant skillset to run LLMs on your own become non-issues; consumption is simply an API call away. One concern, and a serious one, is that sensitive inferencing (AI prompts, both the requests and responses) travels "in the wild" to these LLMs found through DNS at public endpoints. Retrieval Augmented Generation (RAG) adds potentially very sensitive corporate data to prompts, to leverage AI for internal use cases, thus ratcheting up even further the uneasiness with using the general Internet as a conduit to reach LLMs. RAG is a popular method to greatly increase the accuracy and relevancy of generative AI for a company's unique set of problems. Finally, to leverage sensitive data with RAG, the source documents must be vectorized with similarly remote "embedding" LLMs; once again, sensitive, potentially proprietary data will leave the corporate premises to leverage large AI solutions like OpenAI or Azure OpenAI. Unlike purveyors of locally executed models, say a repository like Huggingface.com, which allows downloading of binaries to be harnessed on local compute, industry-leading solutions like OpenAI and Azure OpenAI Service are founded on the paradigm of remote compute. Beyond avoiding the complexity and resources of quickly and correctly setting up performant on-prem models, the choice to consume remote endpoints allows hassle-free management, like models perpetually updated to the latest revisions and full white-glove support available to enterprise customers consuming SaaS AI models. In this article, an approach will be presented where, using F5 Distributed Cloud (XC) and NetApp, Azure OpenAI Service can be leveraged with privacy, where prompts are carried over secured, encrypted tunnels over XC between on-premises enterprise locations and that enterprise's Azure VNET. The Azure OpenAI models are then exclusively exposed as private endpoints within that VNET, nowhere else in the world. This means both the embedding LLM activity to vectorize sensitive corporate data and the actual generative AI prompts to harness value from that data are encrypted in flight. All source data and resultant vector databases remain on-premises in well-known solutions like a NetApp ONTAP storage appliance. Why is the Azure OpenAI Service a Practical Enabler of AI Projects? Some of the items that distinguish Azure OpenAI Service include the following: Prompts sent to Azure OpenAI are not forwarded to OpenAI; the service exists within Microsoft Azure, benefiting from the performance of Microsoft's enormous cloud computing platform. Customer prompts are never used as training data to build new or refine existing models. Billing is simplified; think of the Azure OpenAI Service as analogous to an "all you can eat" buffet: simply harness the AI service and settle the charge incurred on a regular monthly billing cycle. With OpenAI, models are exposed at universal endpoints shared by a global audience; added HTTP headers such as the OPENAI_API_KEY value distinguish users and allow billing to occur in accordance with consumption. Azure OpenAI Service is slightly different. No models actually exist to be used until they are set up under an Azure subscription.
At this point, beyond receiving an API key to identify the source user, the other major difference is that a unique API "base" URL (endpoint) is set up for accessing the LLMs an organization wishes to use. Examples would be a truly unique enterprise endpoint for GPT-3.5-Turbo, GPT-4, or perhaps an embedding LLM used in vectorization, such as the popular text-embedding-ada-002 LLM. This second feature of Azure OpenAI Service presents a powerful opportunity to F5 Distributed Cloud (XC) customers. This stems from the fact that, unlike traditional OpenAI, this per-organization, unique base URL for API communications does not have to be projected into the global DNS, reachable from anywhere on the Internet. Instead, Microsoft Azure allows the OpenAI service to be constrained to a private endpoint, accessible only from where the customer chooses. Leveraging F5 XC Multicloud Networking offers a way to secure and encrypt communications between on-premises locations and Azure subnets only available from within the organization. What does this add up to for the enterprise with generative AI projects? It means huge scalability for AI services and consumption of leading-edge, modern OpenAI models, all in a simple manner an enterprise can consume today with limited technical onus on corporate technology services. The sense of certainty that sensitive data is not cavalierly exposed on the Internet is a critical cog in the wheel of good data governance. Tap Into Secure Data from NetApp ONTAP Clusters for Fortified Access to OpenAI Models The F5 Distributed Cloud global fabric consists of points of presence in 26+ metropolitan markets worldwide, such as Paris, New York, and Singapore, that are interconnected with high-speed links aggregating to more than 14 Tbps of bandwidth in total, and it is growing quarterly. With the F5 multicloud networking (MCN) solution, customers can easily set up dual-active encrypted tunnels (IPSec or SSL) to two points on the global fabric. The instances connected to are referred to as REs (Regional Edge nodes) and the customer-side sites are made up of CEs (Customer Edge nodes, scalable from one to a full cluster). The service is a SaaS solution and setup is turn-key, based upon menu click-ops or Terraform. The customer sites, beyond being in bricks-and-mortar customer data centers and office locations, can also exist within cloud locations such as Microsoft Azure Resource Groups or AWS VPCs, among others. Enterprise customers with existing bandwidth solutions may choose to directly interconnect sites as opposed to leveraging the high-speed F5 global fabric. The net result of an F5 XC Layer 3 multicloud network is high-speed, encrypted communications between customer sites. By disabling the default network access provided by Azure OpenAI Service, and only allowing private endpoint access, one can instantiate a private approach to running workloads with well-known OpenAI models. With this deployment in place, customers may tap into years of data acquired and stored on trusted on-premises NetApp storage appliances to inject value into AI use cases, with customized and enhanced inference results using well-regarded, industry-leading OpenAI models. A perennial industry leader in storage is ONTAP from NetApp, a solution that can safely expose volumes to file systems through protocols such as NFS and SMB/CIFS. The ability to also expose LUNs, meaning block-level data that constitutes remote disks, is available using protocols like iSCSI.
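Because the per-organization base URL resolves only to the private endpoint behind the CE, client code does not change much. Below is a minimal, hypothetical sketch using the openai Python SDK; the endpoint hostname, deployment name, API version, and key are placeholders for whatever your Azure OpenAI resource actually assigns.

# Minimal sketch: call an Azure OpenAI deployment whose base URL resolves to a
# private endpoint reachable only across the F5 XC multicloud network.
# Requires the openai package (pip install openai), version 1.x or later.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://contoso-private.openai.azure.com",  # placeholder private base URL
    api_key="YOUR_AZURE_OPENAI_KEY",                            # placeholder key
    api_version="2024-02-01",                                   # placeholder API version
)

response = client.chat.completions.create(
    model="gpt-35-turbo",   # the deployment name chosen when the model was created
    messages=[{"role": "user", "content": "Summarize RFC 2544 throughput testing."}],
)
print(response.choices[0].message.content)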
In the preceding diagram, one can leverage AI through a standard Python approach, in the case shown harnessing an Ubuntu Linux server and volumes provided by ONTAP. AI jobs, rather than calling out to an Internet-routed Azure OpenAI public endpoint, can instead interact with a private endpoint, one which resolves through private DNS to an address on a subnet behind a customer Azure CE node. This endpoint cannot be reached from the Internet; it is restricted to only communicating with customer subnets (routes) located in the L3 multicloud deployment. In use cases that leverage one's own data, a leading approach is Retrieval Augmented Generation (RAG) in order to empower Large Language Models (LLMs) to deliver niche, hyper-focused responses pertaining to specialized, sometimes proprietary, documents representing the corporate body of knowledge. Simple examples might include highly detailed, potentially confidential, company-specific information distilled from years of internal financial reporting. Another prominent early use case of RAG is to backstop frontline customer helpdesk employees. With customers sensitive to delays in handling support requests, and pressure to reduce support staff research delays, the OpenAI LLM can harvest only relevant knowledge base (KB) articles, release notes, and private engineering documents not normally exposed in their entirety. The net result is a much more effective helpdesk experience, with precise, relevant help provided to the support desk employee in seconds. RAG Using Microsoft Azure OpenAI, F5 and NetApp in a Nutshell In the sample deployment, one of the more important items to recognize is that two OpenAI models will be harnessed: an embedding LLM and a generative, transformer-based GPT-family LLM. A simple depiction of RAG would be as follows: Using OpenAI Embedding LLMs The OpenAI embedding model text-embedding-ada-002 is used first to vectorize data sourced from the on-premises ONTAP system, via NFS volumes mounted to the server hosting Python. The embedding model consumes "chunks" of text from each sourced document and converts the text to numbers, specifically long sequences of numbers, typically in the range of 700 to 1,500 values. These are known as vectors. The vectors returned in the private OpenAI calls are then stored in a vector database; in this case, ChromaDB was used. It is important to note that ChromaDB itself was directed to install within a volume supported by the on-premises ONTAP cluster; as such, the content at rest is governed by the same security governance as the source content in its native format. Other common industry solutions for vector storage and searches include Milvus and, for those looking for cloud-hosted vectors, Pinecone. Vector databases are purpose-built to manage vector embeddings. Conventional databases can, in fact, store vectors, but the art of doing a semantic search, finding similarities between vectors, would then require vector index solutions. One of the best known is FAISS (Facebook AI Similarity Search), which is a library that allows developers to quickly search for embeddings of multimedia documents. These semantic searches would otherwise be inefficient or impossible with standard database engines (SQL). When a prompt is first generated by a client, the text in the prompt is vectorized by the very same OpenAI embedding model, producing a vector on the fly. The key to RAG, the "retriever" function, then compares the newly arrived query with semantically similar text chunks in the database.
The actual semantic similarity of the query and previously stored chunks is arrived at through a nearest neighbor search of the vectors; in other words, phrases and sentences that might augment the original prompt can be provided to the OpenAI GPT model. The art of finding semantic similarities relies upon comparing the lengthy vectors. For instance, the objective of finding supportive text around the user query "how to nurture shrub growth" might reasonably align more closely with a previously vectorized paragraph that included "gardening tips for the North American spring of 2024" and less so with vectorized content stemming from a user guide for the departmental photocopy machine. The suspected closeness of vectors (are text samples actually similar topic-wise?) is a function of semantic similarity search algorithms; many exist in the marketplace, and two approaches commonly leveraged are cosine similarity and Euclidean distance. A brief description for those interested can be found here. The source text chunks corresponding to vectors are retained in the database, and it is this source text that augments the prompt after the closest neighbor vectors are calculated. Using OpenAI GPT LLMs Generative Pre-trained Transformer (GPT) refers to a family of LLMs created by OpenAI that are built on a transformer architecture. The specific OpenAI model used in this article is not necessarily the latest, premium model; GPT-4o and GPT-4 Turbo are more recent. However, the utilized gpt-35-turbo model is a good intersection of price versus performance and has been used extensively in deployed projects. With the retriever function helping to build an augmented prompt, the default use case documented included three text chunks to buttress the original query. The OpenAI prompt response will not only be infused with the provided content extracted from the customer but, unlike normal GPT responses, RAG will include specific attributions to which documents and specific paragraphs led to the response. Brief Overview of Microsoft OpenAI Service Setup Microsoft Azure has a long history of adding innovative new functions as subscribed "opt-in" service resources, and the Azure OpenAI Service is no different. A thorough, step-by-step guide to setting up the OpenAI service can be found here. This screenshot demonstrates the rich variety of OpenAI models available within Azure, specifically showing the Azure OpenAI Studio interface, highlighting models such as gpt-4, gpt-4o and dall-e-3. In this article, two models are added, one embedding and the other GPT. The following OpenAI Service Resource screen shows the necessary information to actually use our two models. This information consists of the keys (use either KEY1 or KEY2; both can be seen and copied with the Show Keys button) and the unique, per-customer endpoint path, frequently referred to as the base URL by OpenAI users. Perhaps the key Azure feature that empowers this article is the ability to disable network access to the configured OpenAI model, as seen below. With traditional network access disabled, we can then enable private endpoint access and set the access point to a network interface on the private subnet connected to the inside interface of our F5 Distributed Cloud CE node. The following re-visits the earlier topology diagram, with focus upon where the Azure OpenAI service interacts with our F5 Distributed Cloud multicloud network. The steps involved in setting up an Azure site in F5 Distributed Cloud are found here.
The corresponding steps for configuring an on-premises Distributed Cloud site are found in this location. Many options exist, such as using KVM or a bare metal server; the link provided highlights the VMware ESXi approach to on-premises site creation. Demonstrating RAG in Action using OpenAI Models with a Secure Private Endpoint The RAG setup, in lieu of vectorizing actual private and sensitive documents, utilized the OpenAI embedding LLM to process chunks taken from the classic H.G. Wells 1895 science fiction novel "The Time Machine" in text or markdown format. The novel is one of many in the public domain through the Gutenberg Project. Two NFS folders supported by the NetApp ONTAP appliance in a Redmond, Washington office were used: one for source content and one for supporting the ChromaDB vector database. The NFS mounts are seen below, with the megabytes consumed and remaining available shown per volume; the ONTAP address can be seen as 10.50.0.220.
(Linux Host) # df -h
10.50.0.220:/RAG_Source_Documents_2024   1.9M   511M   1%   /mnt/rag_source_files
10.50.0.220:/Vectors                     17M    803M   3%   /home/sgorman/langchain-rag-tutorial-main/chroma2
The creation of the vector database was handled by one Python script, and the actual AI prompts generated against the OpenAI gpt-35-turbo model were housed in another script. This may often make sense, as the vector database creation may be an infrequently run script, only executed when new source content is introduced (/mnt/rag_source_files), whereas the generative AI tasks targeting gpt-3.5-turbo are likely run continuously for imperative business needs like helpdesk support or code creation, as examples. Creating the vector database first entails preparing the source text, typically removing extraneous formatting or less valuable text fields; think of boilerplate statements such as repetitive footnotes or perhaps copyright/privacy statements that might be found on every single page of some corporate documents. The next step is to create text chunks for embedding; the tradeoff of using too-short chunks is a lack of semantic meaning in any one chunk and a growth in the vector count. Using overly long chunks, on the other hand, could lead to lengthy augmented prompts sent to gpt-35-turbo that significantly grow the token count for requests; although many models now support very large token counts, a common value remains a total, for requests and responses, of 4,096 tokens. Token counts are the foundation for most billing formulae of endpoint-based AI models. Finally, it is important to have some degree of overlap of generated chunks such that meanings and themes within documents are not lost; if an idea is fragmented at the demarcation point of adjacent chunks, the model may not pick up on its importance. The vectorization script for "The Time Machine" resulted in 978 chunks being created from the source text, with character counts per chunk not to exceed 300 characters. The text splitting function is loaded from LangChain and the pertinent code lines include:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
)
The value of 100 characters of overlap suggests each chunk will incorporate 200 characters of new text within the total of 300 grabbed. It is important to remember all characters, even white space, count towards totals.
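Continuing that snippet, a hedged sketch of how the chunks could then be embedded and persisted to ChromaDB on the ONTAP-backed volume is shown below. The document loader, deployment names, endpoint, key, and API version are illustrative assumptions, not the exact script used in this article.

# Sketch only: split the source text, embed each chunk with the Azure OpenAI
# embedding deployment, and persist the vectors to a ChromaDB directory that
# lives on the ONTAP-backed NFS volume. Paths and names are placeholders.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import AzureOpenAIEmbeddings

docs = TextLoader("/mnt/rag_source_files/the_time_machine.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=300, chunk_overlap=100
).split_documents(docs)

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://contoso-private.openai.azure.com",  # placeholder private endpoint
    openai_api_key="YOUR_AZURE_OPENAI_KEY",                     # placeholder key
    azure_deployment="text-embedding-ada-002",                  # embedding deployment name
    openai_api_version="2024-02-01",                            # placeholder API version
)

db = Chroma.from_documents(
    chunks, embeddings,
    persist_directory="/home/sgorman/langchain-rag-tutorial-main/chroma2",  # ONTAP-backed path
)
print(f"Stored {len(chunks)} chunks in ChromaDB")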
As per the following screenshot, the source novel, when split into increments of 200 new characters per chunk does indicate 978 chunks were indeed a correct total (double click to expand). With the source data vectorized and secure on the NetApp appliance, the actual use of the gpt-35-turbo OpenAI model could commence. The following shows an example, where the model is instructed in the system prompt to only respond via information it can glean from the RAG augmented prompt text, the response portions shown in red font. python3 create99.py “What is the palace of green porcelain?” <response highlights below, the response also included the full text chunks RAG suggested would potentially support the LLM in answering the posed question> Answer the question based on the above context: What is the palace of green porcelain? Response: content='The Palace of Green Porcelain is a deserted and ruined structure with remaining glass fragments in its windows and corroded metallic framework.' response_metadata={'token_usage': {'completion_tokens': 25, 'prompt_tokens': 175, 'total_tokens': 200}, 'model_name': 'gpt-35-turbo', The response of gpt-35-turbo is correct, and we see that the token consumption is heavily slanted towards the request (the “prompt”), with 175 tokens used whereas the response required only 25 tokens. The key takeaways are that the prompt and its response did not travel hop-by-hop over the Internet to a public endpoint, all traffic traveled with VPN-like security from the on-premises server and ONTAP to a private Azure subnet using F5 Distributed Cloud. The OpenAI model was utilized as a private endpoint, corresponding a network interface available only on that private subnet and not found within the global DNS, only the private corporate DNS or /etc/hosts files. Adding Laser Precision to RAG Using the default chunking strategy did lead to sub-optimal results, when ideas, themes and events were lost across chunk boundaries, even when including some degree of overlap. The following is one example: A key moment in the H.G. Wells book involves the protagonist meeting a character Weena, who provides strange white flowers which are pocketed. Upon returning to the present time, the time traveler relies upon the exotic and foreign look of the white flowers to attempt to prove to friends the veracity of his tale. # python3 query99.py “What did Weena give the Time Traveler?” As captured in the response below, the chunks provided by RAG do not provide all the details, only that something of note was pocketed, but gpt-35-turbo can therefore not return a sufficient answer as the full details are not provided in the augmented prompt. The screenshot shows first the three chunks and at the end the best answer the LLM could provide (double click to expand). The takeaway is that some effort will be required to adjust the vectorization process to pick optimally large chunk sizes, and sufficient numbers to properly empower the OpenAI model. In this demonstration, based upon vectors and their corresponding text, only three text chunks were harnessed to augment the user prompt. By increasing this number to 5 or 10, and increasing each of the chunk sizes, all of course at the expense of token consumption, one would expect more accurate results from the LLM. 
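Before moving to the summary, it may help to see the query-side flow condensed into one place. The sketch below is a simplification that reuses the same placeholder endpoint, key, and deployment names as the earlier examples and retrieves k=3 chunks, matching the default discussed above; it is not the exact notebook code.

# Sketch of the query path: embed the question, retrieve the 3 closest chunks
# from ChromaDB, then ask the gpt-35-turbo deployment to answer only from them.
from langchain_community.vectorstores import Chroma
from langchain_openai import AzureOpenAIEmbeddings
from openai import AzureOpenAI

ENDPOINT = "https://contoso-private.openai.azure.com"   # placeholder private endpoint
KEY = "YOUR_AZURE_OPENAI_KEY"                           # placeholder key

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=ENDPOINT, openai_api_key=KEY,
    azure_deployment="text-embedding-ada-002", openai_api_version="2024-02-01",
)
db = Chroma(
    persist_directory="/home/sgorman/langchain-rag-tutorial-main/chroma2",
    embedding_function=embeddings,
)

question = "What is the palace of green porcelain?"
context = "\n\n".join(d.page_content for d in db.similarity_search(question, k=3))

client = AzureOpenAI(azure_endpoint=ENDPOINT, api_key=KEY, api_version="2024-02-01")
answer = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "Answer only from the provided context."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ],
)
print(answer.choices[0].message.content)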
Summary This article demonstrated a more secure approach to using OpenAI models as a programmatic endpoint service, in which proprietary company information can be kept private by not using the general-purpose, insecure Internet to provide prompts for vectorization and general AI inquiries. Instead, an approach was followed where the Azure OpenAI service was deployed as a private endpoint, exclusively available at an address on a private subnet within an enterprise's Azure subscription, a subnet with no external access. By utilizing F5 Distributed Cloud Multicloud Networking, existing corporate locations and data centers can be connected to that enterprise's Azure resource groups, and private, encrypted communications can take place between these networks; the necessary routing and tunneling technologies are deployed in a turn-key manner without requiring advanced network skillsets. When leveraging NetApp ONTAP as the continued enterprise storage solution, RAG deployments based upon the Azure OpenAI service can continue to be managed and secured with well-developed storage administration skills. In this example, ONTAP housed both the sensitive source enterprise content and the actual vector database resulting from interactions with the Azure OpenAI embedding LLM. Subsequent to a discussion on vectors and optimal chunking strategies, RAG was utilized to answer questions on private documents using the well-known OpenAI gpt-35-turbo model.
Protect multi-cloud and Edge Generative AI applications with F5 Distributed Cloud
F5 Distributed Cloud capabilities allow customers to use a single platform for connectivity, application delivery, and security of GenAI applications in any cloud location and at the Edge, with a consistent and simplified operational model, a game changer for a streamlined operational experience for DevOps, NetOps, and SecOps.
F5 Distributed Cloud - Customer Edge Site - Deployment & Routing Options
F5 Distributed Cloud Customer Edge (CE) software deployment models for scale and routing for enterprises deploying multi-cloud infrastructure. Today's service delivery environments are composed of multiple clouds in a hybrid cloud environment. How your multi-cloud solution attaches to your existing on-prem and cloud networks can be the difference between a successful overlay fabric and one that leaves you wanting more out of your solution. Learn your options with F5 Distributed Cloud Customer Edge software.
Secure RAG for Safe AI Deployments Using F5 Distributed Cloud and NetApp ONTAP
Retrieval Augmented Generation (RAG) is one of the most discussed techniques to empower Large Language Models (LLM) to deliver niche, hyper-focused responses pertaining to specialized, sometimes proprietary, bodies of knowledge documents. Two simple examples might include highly detailed company-specific information distilled from years of financial internal reporting from financial controllers or helpdesk type queries with the LLM harvesting only relevant knowledge base (KB) articles, releases notes, and private engineering documents not normally exposed in their entirety. RAG is highly bantered about in numerous good articles; the two principal values are: LLM responses to prompts (queries) based upon specific, niche knowledge as opposed to the general, vast pre-training generic LLMs are taught with; in fact, it is common to instruct LLMs not to answer specifically with any pre-trained knowledge. Only the content “augmenting” the prompt. Attribution is a key deliverable with RAG. Generally LLM pre-trained knowledge inquiries are difficult to traceback to a root source of truth. Prompts augmented with specific assistive knowledge normally solicit responses that clearly call out the source of the answers provided. Why is the Security of RAG Source Content Particularly Important? To maximize the efficacy of LLM solutions in the realm of artificial intelligence (AI) an often-repeated adage is “garbage in, garbage out” which succinctly states an obvious fact with RAG: valuable and actionable items must be entered into the model to expect valuable, tactical outcomes. This means exposing key forms of data, examples being data which might include patented knowledge, intellectual property not to be exposed in raw form to competitors. Actual trade secrets, which will infuse the LLM but need to remain confidential in their native form. In one example around trade secrets, the Government of Canada spells out a series of items courts will look at in determining compensation for misuse (theft) of intellectual property. It is notable that the first item listed is not the cost associated with creation of the secret material (“the cost in money or time of creating or developing the information”) but rather the very first item is instead how much effort was made to keep the content secure (“the measures taken to maintain secrecy”). With RAG, incoming queries are augmented with rich, semantically similar enterprise content. The content has already been populated into a vector database by converting documents, they might be pdf or docx as examples, into raw text form and converting chunks of text into vectors. The vectors are long sequences of numbers with similar mathematical attributes for similar content. As a trivial example, one-word chunks such as glass, cup, bucket, jar might be semantically related, meaning similarities can be construed by both human minds and LLMs. On the other hand, empathy, joy, and thoughtfulness maintain similarities of their own. This semantic approach means a phrase/sentence/paragraph (chunk) using bow to mean “to bend in respect” will be highly distinct from chunks referring to the “front end of a ship" or “something to tie one’s hair back with”, even a tool every violinist would need. The list goes on; all semantic meanings of bow are very different in these chunks and would have distinctive embeddings within a vector database. The word embedding is likely derived from “fixing” or “planting” an object. In this case, words are “embedded” into a contextual understanding. 
The typical length of the number sequence describing the meaning of items has typically been more than 700, but this number of “dimensions” applied is always a matter of research, and the entire vector database is arrived at with an embedding LLM, distinct from the main LLM that will produce generative AI responses to our queries. Incoming queries destined for the main generative AI LLM can, in turn, be converted to vectors themselves by the very same text-embedding “helper” LLM and through retrieval (the “R” in RAG) similar textual content can buttress the prompt presented to the main LLM (double click to expand). Since a critical cog in the wheel of the RAG architecture is the ingestion of valuable and sensitive source documents into the vector database, using the embedding LLM, it is not just prudent but critical that this source content be brought securely over networks to the embedding engine. F5 Distributed Cloud Secure Multicloud Networking and NetApp ONTAP For many practical, time-to-market reasons, modern LLMs, both the main and embedding instances, may not be collocated with the data vaults of modern enterprises. LLMs benefit from cloud compute and GPU access, something often in short supply for on-premises production roll outs. A typical approach assisted by the economies of scale might be to harvest public cloud providers, such as Azure, AWS, and Google Cloud Platform for the compute side of AI projects. Azure, as one example, can turn up virtual machines with GPUs from NVIDIA like A100, A2, and Tesla T4 to name a few. The documents needed to feed an effective RAG solution may well be on-premises, and this is unlikely to change for reasons including governance, regulatory, and the weight of decades of sound security practice. One of the leading on-premises storage solutions of the last 25 years is the NetApp ONTAP storage appliance family, and reflected in this quote from NVIDIA: "Nearly half of the files in the world are stored on-prem on NetApp." — Jensen Huang, CEO of NVIDIA A key deliverable of F5 Distributed Cloud is providing encrypted interconnectivity of disparate physical sites and heterogeneous cloud instances such as Azure VNETs or AWS VPCs. As such, there are two immediate, concurrent F5 features that come to mind: Secure interconnectivity of on-premises NetApp volumes (NAS) or LUNs (Block) containing critical documents for ingestion into RAG. Utilize encrypted L3 connectivity between the enterprise location and the cloud instance where the LLM/RAG are instantiated. TCP load balancers are another alternative for volume sharing NAS protocols like NFS or SMB/CIFS. Secure access to the LLM web interface or RESTful API end points, with HTTPS load balancers including key features like WAF, anti-bot mechanisms, and API automatic rate limiting for abusive prompt sources. The following diagram presents the topology this article set out to create, REs are “regional edge” sites maintained internationally by F5 and harness private RE to RE, high-speed global communication links. DNS names, such as the target name of an LLM service, will leverage mappings to anycast IP addresses, thus users entering the RE network from southeast Asian might, for example, enter the Singapore RE while users in Switzerland might enter via a Paris or Frankfurt RE. Complementing the REs are Customer Edge (CE) nodes. These are virtual or physical appliances which act as security demarcation points. 
For instance, a CE placed in an Azure VNET can protect access to the server supporting the LLM, removing any need for Internet access to the server, which is now entirely accessible only through a private RFC-1918 type of private address. External access to the LLM for just employees or, maybe employees and contractors, or potentially access for the Internet community is enabled by a distributed HTTPS load balancer. In the example depicted above, oriented towards full Internet access, the FQDN of the LLM is projected by the load balancer into the global DNS and consumers of the service resolve the name to one IP address and are attracted to the closest RE by BGP-4’s support for anycast. As the name “distributed” load balancer suggests, the origin pool can be in an entirely different site than the incoming RE, in this case the origin pool is the LLM behind the CE in the Azure VNET. The LLM requests travel from RE to CE via a highspeed networking underlay. The portion of the solution that securely ties the LLM to the source content required for RAG to embed vectors is, in this case, utilizing layer 3 multicloud networking (MCN). The solution is turnkey, routing table are automatically connected to members of the L3 MCN, in this case the inside interfaces of the Azure CE and Redmond, Washington on-premises CE and traffic flows over an encrypted underlay network. As such, the NetApp ONTAP cluster can securely expose volumes with key file ware via a protocol like Network File System (NFS), no risk of data exposure to third-party prying eyes exists. The following diagram drills into the RE and CE and NetApp interplay (double click to expand). F5 Distributed Cloud App Connect and LLM Setup This article speaks to hands-on experience with web-driven LLM inferencing with augmented prompts derived from a RAG implementation. The AI compute was instantiated on an Azure-hosted Ubuntu 20.04 virtual machine with 4 virtual cores. Installed software included Python 3.10, and libraries such as Langchain, Pypdf (for converting pdf documents to text), FAISS (for similarity searching via a vector database), and other libraries. The actual open source LLM utilized for the generative AI is found here on huggingface.co. The binary, which exceeds 4 GB, is considered effective for CPU-based deployments. The embedding LLM model, critical to seed the vector database with entries derived from secured enterprise documentation, and then used again per incoming query for RAG similarity searches to build augmented prompts, was from Hugging Face: sentence-transformers/all-MiniLM-L6-v2 and can be found here. The AI RAG solution was implemented in Python3, and as such the Azure Ubuntu can be accessed both by SSH or via Jupyter Notebooks. The latter was utilized as this is the preferred final delivery mechanism for standard users, not a web chatbot design or the requirement to use API commands through solutions like Postman or Curl. This design choice, to steer the user experience towards Jupyter Notebook consumption, is in keeping with the fact that it has become a standard in AI LLM usage where the LLM is tactical and vital to an enterprise's lines of business (LOBs). Jupyter Notebooks are web-accessed with a browser like Chrome or Edge and as such, F5’s WAF, anti-bot, and L7 DDoS, all part of the F5 WAAP offering, can easily be laid upon an HTTP load balancer with a few mouse clicks in XC to provide premium security to the user experience. 
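To give a sense of how lightweight the CPU-based embedding step is, here is a small, hypothetical sketch using the sentence-transformers package to embed a few chunks with the all-MiniLM-L6-v2 model referenced above; it illustrates the general mechanics, not the exact notebook code, and the sample strings are placeholders.

# Sketch: embed text chunks on CPU with the embedding model family referenced above.
# Requires sentence-transformers (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chunks = [
    "The Time Traveller (for so it will be convenient to speak of him)...",
    "Frame sizes of 64 bytes drive the highest theoretical frames-per-second load.",
]
vectors = model.encode(chunks, normalize_embeddings=True)
print(vectors.shape)   # (2, 384) -- all-MiniLM-L6-v2 produces 384-dimensional vectors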
NetApp and F5 Distributed Cloud Secure Multicloud Networking The secure access to files for ingestion into the vector database, for similarity searches when user queries are received, makes use of an encrypted L3 Multicloud Network relationship between the Azure VNET and the LAN on prem in Redmond, Washington hosting the NetApp ONTAP cluster. The specific protocol chosen was NFS and the simplicity is demonstrated by the use of just one Linux command to present key, high-valued documents for the AI to populate the database: #mount -t nfs <IP Address of NetAPP LIF interface on-prem>:/Secure_docs_for_RAG /home/ubuntu_restriced_user/rag_project/docs/Secure_docs_for_RAG. This address is available nowhere else in the world except behind this F5 CE in the Azure VNET. After the pdf files are converted to text, chunked to reasonable sizes with some overlap suggested between the end of one chunk and the start of the next chunk, the embedding LLM will populate the vector database. The files are always only accessed remotely by NFS through the mounted volume, and this mount may be terminated until new documents are ready to be added to the solution. The Objective RAG Implementation - Described In order to have a reasonable facsimile of the real-word use cases this solution will empower today, but not having any sensitive documents to be injected, it was decided to use some seminal “Internet Boom”-era IETF Requests for Comment (RFCs) as source content. With the rise of multi-port routing and switching devices, it became apparent the industry badly needed specific and highly precise definitions around network device (router and switch) performance benchmarking to allow purchasers “apples-to-apples” comparisons. These documents recommend testing parameters, such as what frame or packet sizes to test with, test iteration time lengths, when to use FIFO vs LIFO vs LILO definitions of latency, etc. RFC-1242 (Request for Comment, terminology) and RFC-2544 (methodologies), chaired by Scott Bradner of Harvard University, and the later RFC 2285 (LAN switching terminologies), chaired by Bob Mandeville then of European Network Laboratories are three prominent examples, to which test and measurement solutions aspired to be compliant. Detailed LLM answers for quality assurance engineers in the network equipment manufacturing (NEM) space is the intended use case of the design, answers that must be distilled specifically by generative AI considering queries augmented by RAG and specifically only based upon these industry-approved documents. These documents are, of course, not containing trade secrets or patented engineering designs. They are in fact publicly available from the IETF, however they are nicely representative of the value offered in sensitive environments. Validating RAG – Watching the Context Provided to the LLM To ensure RAG was working, the content being augmented in the prompt was displayed to screen, we would expect to see relevant clauses and sentences from the RFCs being provided to the generative AI LLM. Also, if we were to start by asking questions that were outside the purview of this testing/benchmarking topic, we should see the LLM struggle to provide users a meaningful answer. To achieve this, rather than, say, asking what 802.3/Ethernetv2 frame sizes should be used in throughput measurements, and what precisely is the industry standard definition of the term “throughput” was, the question instead pertained to a recent Netflix release, featuring Lindsay Lohan. 
Due to the recency of the film, even if the LLM leaned upon its pretrained model, it will come up with nothing meaningful. “Question: Important, only use information provided as context in the prompt, do not use other trained knowledge. Please identify who played Heather in the March 2024 Lindsay Lohan Netflix movie titled Irish Wish?” As seen in the following Jupyter screenshot, the RAG solution can only provide augmented prompts from the database, in this case it has some test and measurement clauses and some rules pertaining to the winter ice sport of curling (double click to enlarge). The supportive context data augmenting the prompt was, as expected, not going to help in this off-the-wall query, as only the RFCs and some sports rules had been introduced to the vector database. What is especially noteworthy is the LLM did not simply admit it could not possibly answer this question but hallucinated a very articulate response that indicated “Heather played for her team against Lindsay Lohan’s Team Ireland”. To validate the RAG architecture for a working scenario, actually pertinent to the movie, a film review in pdf format was added to the ONTAP cluster-housed volume. The supportive documentation was consumed (using Pypdf from the Jupyter Notebook it became text). It was automatically chunked and introduced to the vector database through the Hugging Face embedding LLM. With the RAG vector database populated by the embedding LLM, the responses then became meaningful. The following exchange shows the value of RAG in producing meaningful, thought-out answers: Question: Important, only use information provided as context in the prompt, do not use other pre-trained knowledge. Who does Maddie edit novels for in the Lindsay Lohan March 2024 Netflix movie titled Irish Wish? Answer: Let's think step by step. We know that Maddie Kelly plays a book editor who harbors a big secret, and we have been given information about her job as an editor for Paul Kennedy (played by Alexander Vlahos), which is also part of the question context provided above. Therefore it seems logical to infer that she edits novels in general but specifically those written or co-written by author Paul Kennedy - who happens to be a bestselling romance novelist as stated earlier on, so we can safely conclude Maddie works for Paul himself (as opposed to any other book publishing company). This time we see relevant, meaningful context provided by RAG, with relevant portions of the pdf file review document augmenting the prompt. The augmentations are normally not displayed unless requested by the user or programmer, unlike humans, LLMs do not require superfluous text formatting (advantage: LLMs) and thus the content is packed tightly and efficiently. Fewer characters also mean fewer tokens get used by the targeted AI model and can allow more data before a token context limit is reached. In pay-for-use LLM approaches fewer tokens also help the enterprise’s bottom line financially. Also, note that the answer will likely not always be identical with subsequent asks of the same question as per LLM normal behavior. Features like “temperature setting” can also allow more “creative” ideas in responses, injecting humor and even outlandishness if desired. The RAG workflow is now validated, but the LLMs in question (embedding and main generative LLM) can still be made better with these suggestions: Increase “chunk” sizes so ideas are not lost when excessive breaks make for short chunks. 
Increase “overlap” so an idea/concept is not lost at the demarcation point of two chunks. Most importantly, provide more context from the vector database as context lengths (maximum tokens in a request/response) are generally increasing in size. Llama2, for instance, typically has a 4,096 context length but can now be used with larger values, such as 32,768. This article used only 3 augmentations to the user query, better results could be attained by increasing this value at a potential cost of more CPU cycles. Using Secure RAG – F5 L3 MCN, HTTPS Load Balancers and NetApp ONTAP Together With the RAG architecture validated to be working, the solution was used to assist the target user entering queries to the Azure server by means of Jupyter Notebooks, with RAG documents ingested over encrypted, private networking to the on-premises ONTAP cluster NFS volumes. The questions posed, which are answerable by reading and understanding key portions spread throughout the Scott Bradner RFCs, was: “Important, only use information provided as context in the prompt, do not use other pre-trained knowledge. Please explain the specific definition of throughput? What 802.3 frame sizes should be used for benchmarking? How long should each test iteration last? If you cannot answer the questions exclusively with the details included in the prompt, simply say you are unable to answer the question accurately. Thank you." The Jupyter Notebook representation of this query, which is made in the Python language and issued from the user’s local browser anywhere in the world and directly against the Azure-hosted LLM, looks like the following (click to expand image): The next screenshot demonstrates the result, based upon the provided secure documents (double click to expand). The response is decent, however, the fact that it is clearly using the provided augmentations to the prompt, that is the key objective of this article. The accuracy of the response can be questionable in some areas, the Bradner RFCs highlighted the importance of 64-byte 802.3/Ethernetv2 frame sizes in testing, as line rate forwarding with this minimum size produces the highest theoretically possible frame per second load. In the era of software driven forwarding in switches and routers this was very demanding. Sixty-four byte frames result in 14,881 fps (frames per second) for 10BaseT, 148,809 fps for 100BaseT, 1.48 million fps for Gigabit Ethernet. These values were frequently more aspirational in earlier times and also a frequent metric used in network equipment purchasing cycles. Suspiciously, the LLM response calls out 64kB in 802.3 testing, not 64B, something which seems to be an error. Again, with this architecture, the actual LLM providing the generative AI responses is increasingly viewed as a commodity, alternative LLMs can be plugged quickly and easily into the RAG approach of this Jupyter Notebook. The end user, and thus the enterprise itself, is empowered to utilize different LLMs, purchased or open-source from sites like Hugging Face, to determine optimal results. The other key change that can affect the overall accuracy of results is to experiment with different embedding models. In fact, there are on-line “leader” boards strictly for embedding LLMs so one can quickly swap in and out various popular embedding LLMs to see the impact on results. 
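For readers who want to see the retrieval step in isolation, the following is a minimal, hypothetical FAISS sketch that indexes a handful of chunk vectors and pulls back the nearest neighbors for a query. The chunk strings are placeholders, and the k value of 3 mirrors the default discussed above; it can be raised to 5 or 10 as suggested.

# Sketch: nearest-neighbor retrieval with FAISS over normalized embeddings.
# Requires faiss-cpu, numpy, and sentence-transformers.
import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
chunks = [
    "Throughput is the maximum rate at which none of the offered frames are dropped.",
    "Each trial should last at least 60 seconds per the benchmarking methodology.",
    "Weena gave the Time Traveller strange white flowers.",
]
vectors = model.encode(chunks, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(vectors.shape[1])   # inner product == cosine on normalized vectors
index.add(vectors)

query = model.encode(["What frame sizes and durations should benchmarks use?"],
                     normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 3)          # k=3 retrieved chunks
for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {chunks[i]}")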
Again, with this architecture, the actual LLM providing the generative AI responses is increasingly viewed as a commodity; alternative LLMs can be plugged quickly and easily into the RAG approach of this Jupyter Notebook. The end user, and thus the enterprise itself, is empowered to utilize different LLMs, purchased or open-source from sites like Hugging Face, to determine optimal results. The other key change that can affect the overall accuracy of results is to experiment with different embedding models. In fact, there are online leader boards strictly for embedding LLMs, so one can quickly swap various popular embedding LLMs in and out to see the impact on results.

Summary and Conclusions on F5 and NetApp as Enablers for Secure RAG

This article demonstrated an approach to AI usage that leveraged the compute and GPU availability found today within cloud providers such as Azure. To safely access such an AI platform for a production-grade enterprise requirement, F5 Distributed Cloud (XC) provided HTTPS load balancers connecting worker browsers to a Jupyter Notebook service on the AI platform, with advanced security applied to the traffic within XC, from WAF to anti-bot to L3/L7 DDoS protections. Utilizing secure Multicloud Networking (MCN), F5 provided a private L3 connectivity service between the inside interface of an Azure VNET-based CE (customer edge) node and the inside interface of an on-premises CE node in a building in Redmond, Washington. This secure network allowed a remote NFS volume, with content on spindles/flash in on-premises NetApp ONTAP, to be mounted on the Azure server. This secure file access provided peace of mind when exposing potentially critical and private materials from NetApp ONTAP volumes to the AI offering.

RAG was configured and files were ingested, populating a vector database within the Azure server that allowed details, ideas, and recommendations to be harnessed by a generative AI LLM through augmentation of user prompts with text gleaned from the vector database. Simple examples first demonstrated that RAG was working by posing queries that should not be answerable from the loaded secure content; as expected, such a query was not suitably answered. Feeding meaningful content from ONTAP was then shown to unleash the potential of AI to address queries based upon pertinent PDF files. Opportunities to improve results by swapping out the main generative AI model, as well as the embedding model, were also considered.

Distributed Cloud Support for NAS Migrations from On-Premises Approaches to Azure NetApp Files
F5 Distributed Cloud (XC) Secure Multicloud Networking (MCN) connects and secures distributed applications across offices, data centers, and various cloud platforms. Frequently the technology is web-based, meaning traffic is often carried on ports like TCP port 443; however, other traffic types are also prevalent in an enterprise’s traffic mix. Examples include SSH and relational database protocols. One major component of networked traffic is Network-Attached Storage (NAS), traffic that in the past was frequently carried over LANs between employees in offices and co-located NAS appliances, perhaps in wiring closets or server rooms. An example of such an appliance would be the ONTAP family from NetApp, which can take on physical or virtual form factors. NAS protocols are particularly useful as they integrate file stores into operating systems such as Microsoft Windows or Linux distributions as directories, mounted for easy access to files at any time, often permanently. This contrasts with SSH file transfers, which are often ephemeral actions and not as tightly integral to host operating system health. With the rise of remote work, the NAS appliances often see increasing file reads and writes to these directories traversing wide-area links. In fact, one study analyzing fundamental traffic changes due to the Covid-19 pandemic saw a 22 percent increase in file transfer protocol (FTP) traffic in a single year, suggesting access to files has undergone significant foundational changes in recent years.

Distributed Cloud and the Movement towards Centralized Enterprise Storage

A traditional concern about serving NAS files to offices from a centralized point, such as a cloud-instantiated file repository, is latency and reliability. With F5’s Distributed Cloud offering a 12 Tbps aggregate backbone and dedicated RE-to-RE links, the network component is both highly durable and performant. Given the efficiencies of a centralized corporate file distribution point and the guaranteed-uptime “nines” of modern cloud services, the logic of moving towards cloud-served NAS solutions makes a lot of sense. Replacing on-premises storage appliances with a secure, networked service eliminates the need to maintain costly spares, which are effectively a shadow NAS appliance infrastructure, along with onerous RMA procedures. All of this helps accomplish the goal of shrinking and greening office wiring closets.

To demonstrate this centralized model for a NAS architecture, a configuration was created whereby a simulated west coast office was connected by F5 Distributed Cloud to Azure NetApp Files (ANF) instantiated in the Azure East US 2 region. ANF is Microsoft Azure’s newest native file-serving solution, managed by NetApp, with data throughputs that increase in lockstep with the amount of reserved storage pool capacity. Different quality of service (QoS) levels are selectable by the consumer. In the streamlined ANF configuration workflow, where various transaction latency thresholds may be requested, even the most demanding relational database operations are typically accommodated. Microsoft offers additional details on ANF here; this article, however, should serve to sufficiently demonstrate the ANF and F5 Distributed Cloud Secure MCN solutions for most readers.
Distributed Cloud and Azure NetApp Files Deployment Example

NAS in the enterprise today largely involves either the NFS or SMB protocol, both of which can be used within Windows and Linux environments and make remote directories appear and perform as if local to users. In our example, a western US point of presence served as the simulated remote office, with standard Linux hosts acting as the consumers of NetApp volumes. In the east, a corporate VNET was deployed in an Azure resource group (RG) in East US 2, with one subnet delegated to provide Azure NetApp Files (ANF).

To securely connect the west coast office to the eastern Azure ANF service, F5 Distributed Cloud Secure MCN was utilized to create a Layer 3 multi-cloud network offering. This is achieved by dropping an F5 customer edge (CE) virtual appliance, a 2-port security appliance, into both the office and the Azure VNET in the east. The inside interfaces of both CEs were attached to a global virtual network, an exclusive layer-3 association that allows simple connectivity and fully preserves privacy. In keeping with the promise of SaaS, Distributed Cloud users require no routing protocol setup; the solution takes care of the control plane, including routing and encryption. This concept could be scaled to hundreds of offices, each equipped with a CE and easily attached to the same global virtual network. At boot-up, CEs automatically attach via IPsec (or SSL) tunnels to geographically close F5 backbone nodes, called regional edge (RE) sites. Like tunnel establishment, routing tables are updated under the hood to allow a turn-key security relationship between Azure NetApp Files volumes and consuming offices. The setup is depicted as follows:

Setup Azure NetApp Files (ANF) Volumes in Minutes

To put the centralized approach to offering NAS volumes for remote offices or locations into practice, a series of quick steps are undertaken, all of which can be done through the standard Microsoft Azure portal. The four steps are listed below, with screenshots provided for key points in the brief process:

1. If not starting from an existing Resource Group (RG), create a new RG and add an Azure VNET to it. Delegate one subnet in the VNET to support ANF: under “Delegate Subnet to a Service”, select the entry “Microsoft.NetApp/volumes” from the pull-down list.
2. Within the Resource Group, choose “Create” and make a NetApp account. This will appear in the Azure Marketplace listings as “Azure NetApp Files”.
3. In the NetApp account, under “Storage service”, create a capacity pool. The pool should be sized appropriately; larger is typically better, since numerous volumes, supporting your choice of NFSv3/v4 and SMB protocols, will be created from this single, large disk pool.
4. Create your first volume, selecting the size, the NAS protocols to support, and QoS parameters that meet your business requirements.

As seen below, when adding a capacity pool simply follow the numerical sequence in the portal, with a newly created sample 2 TiB pool highlighted; pools of up to 1,024 TiB (1 PiB) are possible. Interestingly, the capacity pool shown uses the “Standard” service level, as opposed to “Premium” or “Ultra”. With a QoS type of Auto selected, Azure NetApp Files provides increasing throughput, in megabytes per second, as the number of TiB in the pool increases. The throughput also increases with the service level; for Standard, as shown, 8 megabytes per second per TiB will be allocated.
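To make the sizing relationship concrete, the following is a minimal Python sketch applying the per-TiB figure quoted above to the sample pool; the per-TiB value is simply the Standard-tier number cited in this article and should be confirmed against current Azure NetApp Files documentation:

# Approximate throughput allocated to a capacity pool with QoS type "Auto".
MB_PER_S_PER_TIB = 8      # Standard service level figure quoted above; confirm against current ANF docs
pool_size_tib = 2         # the sample 2 TiB pool created above

print(f"{pool_size_tib} TiB Standard pool -> roughly {pool_size_tib * MB_PER_S_PER_TIB} MB/s allocated")
# 2 TiB Standard pool -> roughly 16 MB/s allocated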
Beyond throughput, ANF also provides the lowest average read and write latencies in the Azure portfolio of storage offerings. As such, ANF is a very good fit for database deployments that must see constrained average latency for mission-critical transactions. Deeper discussion of ANF service levels may be explored through the Microsoft document here.

The next screenshot shows the simple click-through sequence for adding a volume to the capacity pool: click on Volumes and then the “+ Add volume” button. A resulting sample volume is displayed in the figure with key parameters highlighted. In the volume shown (“f5-distributed-cloud-vol-001”), the NAS protocol selected was NFSv3 and the size of the volume (“Quota”) was set to 100 GiB.

Setup F5 Distributed Cloud Office-to-Azure Connectivity

To access the volume in a secured and highly responsive manner from corporate headquarters, remote offices, or existing data centers, three items from F5 Distributed Cloud are required:

A customer edge (CE) node, normally with 2 ports, must be deployed in the Azure RG VNET. This establishes the Azure instance as a “site” within the Distributed Cloud dashboard. Hub-and-spoke architectures may also be used if required, where VNET peering allows the secure multi-cloud network (MCN) solution to operate seamlessly.
A CE is deployed at a remote office or data center where file storage services are required by various lines of business. The CE is frequently deployed as a virtual appliance or installed on a bare-metal server and typically has 2 ports.
To instantiate a layer-3 MCN service, the inside ports of the two CEs are “joined” to a virtual global network created by the enterprise in the Distributed Cloud console; REST API and Terraform are also deployment options.

By having each inside port of the Azure and office CEs joined to the same virtual network, the “inside” subnets can now communicate with each other securely, with traffic normally exchanged over encrypted high-speed IPsec tunnels into the F5 XC global fabric. The following screenshot demonstrates adding the Azure CE inside interface to a global virtual network, allowing MCN connectivity to remote office clients requiring access to volumes. Further restrictions to prevent unauthorized clients are found within the NAS protocols themselves, such as simple export policies in NFS and ACL rules in SMB/CIFS, which can be configured quickly within ANF.

Remote Office Access – Establish Read/Write File Access to Azure ANF over F5 Distributed Cloud

With ANF configured and F5 Distributed Cloud now providing a layer-3 multicloud network (MCN) solution to patch enterprise offices into the centralized storage, some confirmation that the solution works as expected was desired. First, a choice of protocols was made. When configuring ANF, the normal choices for access are NFSv3/v4, SMB/CIFS, or both protocols concurrently. Historically, Microsoft hosts made use of SMB/CIFS and Linux/Unix hosts preferred NFS; today, however, both protocols are used throughout enterprises, one example being long-time Samba server (SMB/CIFS) support in the world of Linux. Azure NetApp Files provides all the necessary command samples to get hosts connected without difficulty.
For instance, to mount the volume to a folder off the Linux user home directory, such as the sample folder “f5-distributed-cloud-vol-001”, per the ANF suggestion the following single command will connect the office Linux host to the central storage in Azure East US 2:

sudo mount -t nfs -o rw,hard,rsize=262144,wsize=262144,vers=3,tcp 10.0.9.4:/f5-distributed-cloud-vol-001 f5-distributed-cloud-vol-001

At this point the volume is available for day-to-day tasks, including read and write operations, as if the NAS solution were local to the office, often literally down the hallway.

Remote Office Access - Demonstration of Azure ANF over F5 Distributed Cloud in Action

To repeatedly exercise file writes from a west coast US office to an east coast ANF deployment in Azure East US 2 (Richmond, Virginia), a simple shell script was used to perpetually write a file to a volume, delete it, and repeat over time. The sample wrote a file of 20,000 bytes to the ANF service, waited a few seconds, and then removed the file before beginning another cycle (a minimal sketch of such a loop appears at the end of this section).

At the lowest common denominator, packet analysis of the ensuing traffic from the western US office indicates both network and application latency sample values. As depicted in the following Wireshark trace, the TCP response to a transmitted segment carrying an NFS command was observed to be just 74.5 milliseconds. This prompt round-trip latency for a cross-continent data plane suggests a performant Distributed Cloud MCN service level. This is easily seen as the offset from the reference timestamp (time equal to zero) of the NFSv3 Create Call. The NAS response from ANF (packet 185) arrives less than 1 millisecond later, suggesting a very responsive, well-tuned NFS control plane offered by ANF.

To measure the actual write time of a file from west coast to east coast, the following trace demonstrates the 20,000-byte file write exercise from the shell script. In this case, the TCP segments making up the file, specifically the large packet body lengths called out in the screenshot, are delivered efficiently without TCP retransmissions, TCP zero-window events, or other indicators of layer 3 and 4 health concerns. The entirety of the write is measured at the packet layer to take only 150.8 milliseconds.

Since packet-level analysis is not the most turnkey method to monitor file read and write performance, a set of Linux and Windows utilities can also be leveraged. The Linux utility nfsiostat was used concurrently with the test file writes and produced similar, good latency measurements. Nfsiostat monitoring of the file write testing, from west coast to east coast, for the 20,000-byte file indicated an average write time to ANF of 151 milliseconds. The measurements presented here are simply observational, presenting rapid, digestible techniques for readers interested in service assurance when running ANF over an XC L3 MCN offering. For more rigorous monitoring treatments, Microsoft provides guidance on performing one’s own measurements of Azure NetApp Files here.
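For readers who prefer a scripted measurement, the following is a minimal Python sketch of the timed write/wait/delete loop described above. The mount path and file name are hypothetical, and this is an illustrative equivalent of the shell script used in the testing, not the original script:

# Minimal sketch of the write/wait/delete test loop described above (illustrative only).
import os
import time

MOUNT_PATH = "/home/user/f5-distributed-cloud-vol-001"   # NFS mount of the ANF volume (hypothetical path)
TEST_FILE = os.path.join(MOUNT_PATH, "write-test.bin")
PAYLOAD = os.urandom(20_000)    # 20,000-byte file, as in the test described above

while True:
    start = time.monotonic()
    with open(TEST_FILE, "wb") as f:
        f.write(PAYLOAD)
        f.flush()
        os.fsync(f.fileno())    # ensure the write reaches the ANF volume, not just the local page cache
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"20,000-byte write completed in {elapsed_ms:.1f} ms")

    time.sleep(5)               # wait a few seconds
    os.remove(TEST_FILE)        # delete the file before beginning another cycle
    time.sleep(5)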
Summary

As enterprise-class customers continue to rapidly look towards the cloud for compute performance, GPU access, and economies-of-scale savings for key workloads, the benefits of a centralized, scalable storage counterpart to this story exist. F5 Distributed Cloud offers the reach and performance levels to securely tie existing offices and data centers to cloud-native storage solutions. One example of this approach to modernizing storage was covered in this article: the turn-key ability to begin transitioning from traditionally on-premises NAS appliances to cloud-native, scalable volumes. The Azure NetApp Files approach to serving read/write volumes allows modern hosts, including Windows and Linux distributions, to utilize virtually unlimited folder sizes with service levels adjustable to business needs.