Active/Active load balancing examples with F5 BIG-IP and Azure load balancer
Background A couple years ago Iwrote an article about some practical considerations using Azure Load Balancer. Over time it's been used by customers, so I thought to add a further article that specifically discusses Active/Active load balancing options. I'll use Azure's standard load balancer as an example, but you can apply this to other cloud providers. In fact, the customer I helped most recently with this very question was running in Google Cloud. This article focuses on using standard TCP load balancers in the cloud. Why Active/Active? Most customers run 2x BIG-IP's in an Active/Standby cluster on-premises, and it's extremely common to do the same in public cloud. Since simplicity and supportability are key to successful migration projects, often it's best to stick with architectures you know and can support. However, if you are confident in your cloud engineering skills or if you want more than 2x BIG-IP's processing traffic, you may consider running them all Active. Of course, if your totalthroughput for N number of BIG-IP's exceeds the throughput thatN-1 can support, the loss of a single VM will leave you with more traffic than the remaining device(s) can handle. I recommend choosing Active/Active only if you're confident in your purpose and skillset. Let's define Active/Active Sometimes this term is used with ambiguity. I'll cover three approaches using Azure load balancer, each slightly different: multiple standalone devices Sync-Only group using Traffic Group None Sync-Failover group using Traffic Group None Each of these will use a standard TCP cloud load balancer. This article does not cover other ways to run multiple Active devices, which I've outlined at the end for completeness. Multiple standalone appliances This is a straightforward approach and an ideal target for cloud architectures. When multiple devices each receive and process traffic independently, the overhead work of disaggregating traffic to spread between the devices can be done by other solutions, like a cloud load balancer. (Other out-of-scope solutions could be ECMP, BGP, DNS load balancing, or gateway load balancers). Scaling out horizontally can be a matter of simple automation and there is no cluster configuration to maintain. The only limit to the number of BIG-IP's will be any limits of the cloud load balancer. The main disadvantage to this approach is the fear of misconfiguration by human operators. Often a customer is not confident that they can configure two separate devices consistently over time. This is why automation for configuration management is ideal. In the real world, it's also a reason customers consider our next approach. Clustering with a sync-only group A Sync-Only device group allows us to sync some configuration data between devices, but not fail over configuration objects in floating traffic groups between devices, as we would in a Sync-Failover group. With this approach, we can sync traffic objects between devices, assign them to Traffic Group None, and both devices will be considered Active. Both devices will process traffic, but changes only need to be made to a single device in the group. In the example pictured above: The 2x BIG-IP devices are in a Sync-Only group called syncGroup /Common partition isnotsynced between devices /app1 partition issynced between devices the /app1 partition has Traffic Group None selected the /app1 partition has the Sync-Only group syncGroup selected Both devices are Active and will process traffic received on Traffic Group None The disadvantage to this approach is that you can create an invalid configuration by referring to objects that are not synced. For example, if Nodes are created in/Common, they will exist on the device on which they were created, but not on other devices. If a Pool in /app1 then references Nodes from /Common, the resulting configuration will be invalid for devices that do not have these Nodes configured. Another consideration is that an operator must use and understand partitions. These are simple and should be embraced. However, not all customers understand the use of partitions and many prefer to use /Common only, if possible. The big advantage here is that changes only need to be made on a single device, and they will be replicated to other devices (up to 32 devices in a Sync-Only group). The risk of inconsistent configuration due to human error is reduced. Each device has a small green "Active" icon in the top left hand of the console, reminding operators that each device is Active and will process incoming traffic onTraffic Group None. Failover clustering using Traffic Group None Our third approach is very similar to our second approach. However, instead of a Sync-Only group, we will use a Sync-Failover group. A Sync-Failover group will sync all traffic objects in the default /Common partition, allowing us to keep all traffic objects in the default partition and avoid the use of additional partitions. This creates a traditional Active/Standby pair for a failover traffic group, and a Standby device will not respond to data plane traffic. So how do we make this Active/Active? When we create our VIPs in Traffic Group None, all devices will process traffic received on these Virtual Servers. One device will show "Active" and the other "Standby" in their console, but this is only the status for the floating traffic group. We don't need to use the floating traffic group, and by using Traffic Group None we have an Active/Active configuration in terms of traffic flow. The advantage here is similar to the previous example: human operators only need to configure objects in a single device, and all changes are synced between device group members (up to 8 in a Sync-Failover group). Another advantage is that you can use the/Common partition, which was not possible with the previous example. The main disadvantage here is that the console will show the word "Active" and "Standby" on devices, and this can confuse an operator that is familiar only with Active/Standby clusters using traffic groups for failover. While this third approach is a very legitimate approach and technically sound, it's worth considering if your daily operations and support teams have the knowledge to support this. Other considerations Source NAT (SNAT) It is almost always a requirement that you SNAT traffic when using Active/Active architecture, and this especially applies to the public cloud, where our options for other networking tricks are limited. If you have a requirement to see true source IPandneed to use multiple devices in Active/Active fashion, consider using Azure or AWS Gateway Load Balancer options. Alternative solutions like NGINX and F5 Distributed Cloud may also be worth considering in high-value, hard-requirement situations. Alternatives to a cloud load balancer This article is not referring to F5 with Azure Gateway Load Balancer, or to F5 with AWS Gateway Load Balancer. Those gateway load balancer solutions are another way for customers to run appliances as multiple standalone devices in the cloud. However, they typically requirerouting, not proxying the traffic (ie, they don't allow destination NAT, which many customers intend with BIG-IP). This article is also not referring to other ways you might achieve Active/Active architectures, such as DNS-based high availability, or using routing protocols, like BGP or ECMP. Note that using multiple traffic groups to achieve Active/Active BIG-IP's - the traditional approach on-prem or in private cloud - is not practical in public cloud, as briefly outlined below. Failover of traffic groups with Cloud Failover Extension (CFE) One option for Active/Standby high availability of BIG-IP is to use the CFE , which can programmatically update IP addresses and routes in Azure at time of device failure. Since CFE does not support Active/Active scenarios, it is appropriate only for failover of a single traffic group (ie., Active/Standby). Conclusion Thanks for reading! In general I see that Active/Standby solutions work for many customers, but if you are confident in your skills and have a need for Active/Active F5 BIG-IP devices in the cloud, please reach out if you'd like me to walk you through these options and explore any other possibilities. Related articles Practical Considerations using F5 BIG-IP and Azure Load Balancer Deploying F5 BIG-IP with Azure Cross-Region Load Balancer1.3KViews2likes2CommentsRunning F5 with managed Azure RedHat OpenShift
Summary In early 2020, Microsoft and RedHat announced a new release of Azure RedHat OpenShift. This article shows how to set up F5 to integrate with this offering. This is also an easy demo. Background OpenShift is now available as a managed service in Azure called ARO (as in, Azure RedHat OpenShift). Microsoft has published a tutorial to deploy a cluster into an existing virtual network, but this article shows a way to deploy an environment with F5 integrated in a single deployment. Use this for demo or learning purposes. Deploying Azure RedHat OpenShift (ARO) You can run OpenShift on your own servers on-premises or in the cloud. For example, these instructions were the way I first learned to deploy a cluster on AWS. Eric Ji from F5 recently published a guide that walks through these instructions and he includes deployment of F5 Container Ingress Services. This method is supported and gives you a high level of control. ARO is a deployment option where your servers are managed by Azure. Patching, upgrading, repair, and DR are all handled for you, along with joint support from Microsoft and RedHat. Microsoft have done a great job of documenting the process to deploy ARO in the tutorial already mentioned. If you were to follow their instructions, after about 35 minutes your deployment would produce something like this (image taken straight from OpenShift's announcement article): Microsoft's instructions to create the demo above require that you have the User Access Administrator role, or that you pass in the credentials of a ServicePrincipal that has contributor rights over the Resource Group in which the existing VNET resides. Deploying F5 + ARO Another way to build out the same environment in Azure is this automated demo, which will include the deployment of F5 and also takes around 35 minutes to complete. Click here to deploy this demo: https://github.com/mikeoleary/azure-redhat-openshift-f5 This does not require a User Access Administrator, but does require that you have a ServicePrincipal with Contributor permissions on the subscription. A ServicePrincipal is a principal in Azure ActiveDirectory to which you can assign roles at a scope like Resource Group or Subscription. For this demo, I recommend creating a ServicePrincipal and then assigning it the role of Contributor over your Subscription, or the Resource Group in which you intend to deploy. If you follow this demo, you'll have an environment that looks more like this: This demo adds the following resources to the environment. You could add these resources manually yourself, if you have an existing OpenShift environment. Adds 3x subnets for the F5 BIG-IP VM Deploys F5 VM's into those subnets using this ARM template Adds the BIG-IP into the OpenShift network following these instructions Installs CIS in OpenShift following these instructions. Deploys an app into OpenShift This includes a Route resource that is detected by CIS CIS then populates the app's pod IP addresses as pool members in BIG-IP Output values are added to the deployment, for users to verify successful completion Post-deployment verification This demo will deploy an app in OpenShift that is exposed by an OpenShift Route, and this requires that you manually change your DNS record on the Internet to point to the IP address value of the deployment output called publicExternalLoadBalancerAddress. After you have made this DNS change (optionally, use a local hosts record), you should see your demo app available on the Internet, like this: The outputs of this demo will also give you the public URL's of BIG-IP's and your OpenShift cluster. You can login to all of these to see the configuration at work. Deleting your environment Don't forget to delete your environment if you are just testing. I find the easiest way to do this is just to delete the Resource Group into which you deployed originally. You can delete individual resources via the Azure portal if you choose, but do remember that the Read-Only Resource Group that is created by ARO is deleted by deleting the OpenShift cluster resource, which is in the Resource Group into which you originally deployed. Conclusion To summarize, ARO allows us to deploy an OpenShift environment quickly. Integration with F5 is much like an on-prem installation of OpenShift. You integrate the BIG-IP with the OpenShift network, then deploy CIS so that it can configure the BIG-IP to expose your applications. Thanks for reading! Any questions, please leave a comment and I'll respond, thanks!1.3KViews1like1CommentCreate a BIG-IP HA Pair in Azure
Use an Azure ARM template to create a high availability (active-standby) pair of BIG-IP Virtual Edition instances in Microsoft Azure. When one BIG-IP VE goes standby, the other becomes active, the virtual server address is reassigned from one external NIC to another. Today, let’s walk through how to create a high availability pair of BIG-IP VE instances in Microsoft Azure. When we’re done, we’ll have an active-standby pair of BIG-IP VEs. To start, go to the F5 Networks Github repository. Click F5-azure-arm-templates. Then go to Supported>failover. You have several options at this point. You can chose which templates to use based on your needs, failing over via API calls, via upstream load balancers, and NIC counts. Read each readme to determine your desired deployment strategy. When you already have your subnets and existing IP addresses defined but to see how it works, let’s deploy a new stack. Click new stack and scroll down to the Deploy button. If you have a trial or production license from F5, you can use the BYOL or BIG-IQ as license server options but in this case we’re going to choose the PAYG option. Click Deploy and the template opens in the Azure portal. Now we simply fill out the fields. We’ll create a new Resource Group and set a password for the BIG-IP VEs. When you get to the questions: The DNS label is used as part of the URL. Instance Name is just the name of the VM in Azure. Instance Type determines how much memory and CPU you’ll have. Image Name determines how many BIG-IP modules you can run (and you can choose the latest BIG-IP version). Licensed Bandwidth determines the maximum throughput of the traffic going through BIG-IP. Select the Number of External IP addresses (we’ll start with one but can add more later). For instance, if you plan on running more than one application behind the BIG-IP, then you’ll need the appropriate external IP addresses. Vnet Address Prefix is for the address ranges of you subnets (we’ll leave at default). The next 3 fields (Tenant ID, Client ID, Service Principal Secret) have to do with security. Rather than using your own credentials to modify resources in Azure, you can create an Active Directory application and assign permissions to it. The last two fields also go together. Managed Routes let you route traffic from other external networks through the BIG-IPs. The Route Table Tag means that anytime this tag is found in the route table, routes that have this destination are updated so that the next hop is the IP address of the active BIG-IP VE. This is useful if you want all outbound traffic to go through the BIG-IP or if you want to send traffic from a bunch of different Vnets through the BIG-IP. We’ll leave the rest as default but the Restricted Src Address is good way to put IP addresses on my network – the ones that are allowed to connect to the BIG-IP. We’ll agree to the terms and click Purchase. We’re redirected to the Dashboard with the Deployment in Progress indicator. This takes about 15 minutes. Once finished we’ll go check all the resources in the Resource Group. Let’s find out where the virtual server address is located since this is associated with one of the external NICs, which have ‘ext’ in the name. Click the one you want. Then click IP Configuration under Settings. When you look at the IP Configuration for these NICs, whenever the NIC has two IP addresses that’s the NIC for the active BIG-IP. The Primary IP address is the BIG-IP Self IP and the Secondary IP is the virtual server address. If we look at the other external NIC we’ll see that it only has one Self IP and that’s the Primary and it doesn’t have the Secondary virtual server address. The virtual server address is assigned to the active BIG-IP. When we force the active BIG-IP to standby, the virtual server address is reassigned from one NIC to the other. To see this, we’ll log into the BIG-IPs and on the active BIG-IP, we’ll click Force to Standby and the other BIG-IP becomes Active. When we go back to Azure, we can see that the virtual server IP is no longer associated with the external NIC. And if we wait a few minutes, we’ll see that the address is now associated with the other NIC. So basically how BIG-IP HA works in the Azure cloud is by reassigning the virtual server address from one BIG-IP to another. Thanks to our TechPubs group and check out the demo video. ps6.4KViews0likes6CommentsDistributed Cloud Support for NAS Migrations from On-Premises Approaches to Azure NetApp Files
F5 Distributed Cloud (XC) Secure Multicloud Networking (MCN) connects and secures distributed applications across offices, data centers, and various cloud platforms. Frequently the technology is web-based, meaning traffic is often carried on ports like TCP port 443, however other traffic types are also prevalent in an enterprise’s traffic mix. Examples include SSH or relational database protocols. One major component of networked traffic is Network-Attached Storage (NAS), a protocol in the past frequently carried over LANs between employees in offices and co-located NAS appliances, perhaps in wiring closets or server rooms. An example of such an appliance would be the ONTAP family from NetApp which can take on physical or virtual form factors. NAS protocols are particularly useful as they integrate file stores into operating systems such as Microsoft Windows or Linux distributions as directories, mounted for easy access to files at any time, often permanently. This contrasts with SSH file transfers, which are often ephemeral actions and not so tightly integral to host operating system health. With the rise of remote work, often the NAS appliances see increasing file reads-and-writes to these directories, traversing wide-area links. In fact, one study analyzing fundamental traffic changes due to the Covid-19 pandemic saw a 22 percent increase in file transfer protocol (FTP) in a single year, suggesting access to files has undergone significant foundational changes in recent years. Distributed Cloud and the Movement towards Centralized Enterprise Storage A traditional concern about serving NAS files to offices from a centralized point, such as a cloud-instantiated file repository, is latency and reliability. With F5’s Distributed Cloud offering a 12 Tbps aggregate backbone and dedicated RE-to-RE links, the behavior of the network component is both highly durable and performant. The efficiencies of a centralized corporate file distribution point, with the required 9’s of guaranteed uptime of modern cloud services, and the logic of moving towards cloud-served NAS solutions makes a lot of sense. With on-premises storage appliances replaced by a secure, networked service eliminates the need to maintain costly spares, which are effectively a shadow NAS appliance infrastructure and onerous RMA procedures. All of this enables accomplishing the goal of shrinking/greening office wiring closets. To demonstrate this centralized model for a NAS architecture, a configuration was created whereby a west coast simulated office was connected by F5 Distributed Cloud to Azure NetApp Files (ANF) instantiated in Azure East-2 region. ANF is Microsoft Azure’s newest native file serving solution, managed by NetApp, with data throughputs that increase in lock step with the amount of reserved storage pool capacity. Different quality of service (QoS) levels are selectable by the consumer. In the streamlined ANF configuration workflow, where various transaction latency thresholds may be requested, even the most demanding relational database operations are typically accommodated. Microsoft offers additional details on ANF here, however, this article should serve to sufficiently demonstrate the ANF and F5 Distributed Cloud Secure MCN solutions for most readers. Distributed Cloud and Azure NetApp Files Deployment Example NAS in the enterprise today largely involves use of either NFS or SMB protocols, both of which can be used within Windows and Linux environments and make remote directories appear and perform as if local to users. In our example, a western US point of presence was leveraged to serve as the simulated remote office and standard Linux hosts to serve as the consumers of NetApp volumes. In the east, a corporate VNET was deployed in an Azure resource group (RG) in US-East-2, with one subnet delegated to provide Azure NetApp Files (ANF). To securely connect the west coast office to the eastern Azure ANF service, F5 Distributed Cloud Secure MCN was utilized to create a Layer 3 multi-cloud network offering. This is achieved by easily dropping an F5 customer edge (CE) virtual appliance into both the office and the Azure VNET in the east. The CE is a 2-port security appliance. The inside interfaces on both CEs were attached to a global virtual network, and exclusive layer-3 associations to allow simple connectivity and fully preserve privacy. In keeping with the promise of SaaS, Distributed Cloud users require no routing protocol setup. The solution takes care of the control plane, including routing and encryption. This concept could be scaled to hundreds of offices, if equipped with CEs, and easily attached to the same global virtual network. CEs, at boot-up, automatically attach via IP Sec (or SSL) tunnels to geographically close F5 backbone nodes, called regional edge (RE) sites. Like tunnel establishment, routing tables are updated under-the-hood to allow for a turn-key security relationship between Azure NetApp File volumes and consuming offices. The setup is depicted as follows: Setup Azure NetApp Files (ANF) Volumes in Minutes To put the centralized approach to offering NAS volumes for remote offices or locations into practice, a series of quick steps are undertaken, which can all be done through the standard Microsoft Azure portal. The four steps are listed below, with screenshots provided for key points in the brief process: If not starting from an existing Resource Group (RG), create a new RG and add an Azure VNET to it. Delegate one subnet in your VNET to support ANF. Under “Delegate Subnet to a Service” select from the pull-down-list the entry “Microsoft.NetApp/volumes”. Within the Resource Group, choose “Create” and make a NetApp account. This will appear in the Azure Marketplace listings as “Azure NetApp Files”. In your NetApp account, under “Storage service” create a capacity pool. The pool should be sized appropriately, larger is typically better, since numerous volumes, supporting your choice of NFS3/4 and SMB protocols, will be created from this single, large disk pool. Create your first volume, select size, NAS protocols to support, and QoS parameters that meet your business requirements. As seen below, when adding a capacity pool simply follow the numerical sequence to add your pool, with a newly created sample 2 TiB pool highlighted; 1,024 TiB (1 PiB) are possible (click image to enlarge). Interestingly, the capacity pool shown is the “Standard” service level, as opposed to “Premium” and “Ultra”. With QoS type of Auto selected, Azure NetApp Files provides increasing throughput in terms of megabytes per second as the number of TiB in the pool increases. The throughput also increases with service levels; for standard, as shown, 8 megabytes per second per TiB will be allocated. Beyond throughput, ANF also provides the lowest latency averages for reads and writes in the Azure portfolio of storage offerings. As such, ANF is a very good fit for database deployments that must see constrained, average latency for mission-critical transactions. Deeper discussion around ANF service levels may be explored through the Microsoft document here. The next screenshot shows the simple click-through sequence for adding a volume to the capacity pool, simply click on volumes and the “+Add volume” button. A resulting sample volume is displayed in the figure with key parameters highlighted. In the above volume (“f5-distributed-cloud-vol-001”) the NAS protocol selected was NFSv3 and the size of the volume (“Quota”) was set to 100GiB. Setup F5 Distributed Cloud Office-to-Azure Connectivity To access the volume in a secured and highly responsive manner, from corporate headquarters, remote offices or existing data centers, three items from F5 Distributed Cloud are required: A customer edge (CE) node, normally with 2-ports, must be deployed in the Azure RG VNET. This establishes the Azure instance as a “site” within the Distributed Cloud dashboard. Hub and spoke architectures may also be used if required, where VNET peering can also allow the secure multi-cloud network (MCN) solution to operate seamlessly. A CE is deployed at a remote office or datacenter, where file storage services are required by various lines of business. The CE is frequently deployed as a virtual appliance or installed on a bare metal server and typically has 2-ports. To instantiate a layer-3 MCN service, the inside ports of the two CEs are “joined” to a virtual global network created by the enterprise in the Distributed Cloud console, although REST API and Terraform are also deployment options. By having each inside port of the Azure and office CE’s joined to the same virtual network, the “inside” subnets can now communicate with each other, securely, with traffic normally exchanged over encrypted high-speed IPSec tunnels into the F5 XC global fabric. The following screenshot demonstrates adding the Azure CE inside interface to a global virtual network, allowing MCN connectivity to remote office clients requiring access to volumes. Further restrictions, to prevent unauthorized clients, are found within NAS protocols themselves, such as simple Export policies in NFS and ACL rules in SMB/CIFS, which can be configured quickly within ANF. Remote Office Access – Establish Read/Write File Access to Azure ANF over F5 Distributed Cloud With both ANF configured and F5 Distributed Cloud now providing a layer-3 muticloud network (MCN) solution, to patch enterprise offices to the centralized storage, some confirmation of the solution working as expected was desired. First off, a choice in protocols was made. When configuring ANF, the normal choices for access are NFSv3/v4 or SMB/CIFS or both protocols concurrently. Historically, Microsoft hosts made use of SMB/CIFS and Linux/Unix hosts preferred NFS, however today both protocols are used throughout enterprises. One example being long-time SAMBA server (SMB/CIFS) support in the world of Linux. Azure NetApp Files will provide all the necessary command samples to get hosts connected without difficulty. For instance, to mount the volume to a folder off the Linux user home directory, such as the sample folder “f5-distributed-cloud-vol-001”, per the ANF suggestion the following one command will connect the office Linux host to the central storage in Azure-East-2: sudo mount -t nfs -o rw,hard,rsize=262144,wsize=262144,vers=3,tcp 10.0.9.4:/f5-distributed-cloud-vol-001 f5-distributed-cloud-vol-001 At this point the volume is available for day-to-day tasks, including read and write operations, as if the NAS solution were local to the office, often literally down the hallway. Remote Office Access - Demonstration of Azure ANF over F5 Distributed Cloud in Action To repeatedly exercise file writes from a west coast US office to an east coast ANF deployment in Azure-East-2 (Richmond, Virgina) a simple shell script was used to perpetually write a file to a volume, delete it, and repeat over time. The following sample wrote a file of 20,000 bytes to the ANF service, waited a few seconds, and then removed the file before beginning another cycle. At the lowest common denominator, packet analysis for the ensuing traffic from the western US office will indicate both network and application latency sample values. As depicted in the following Wireshark trace, the TCP response to a transmitted segment carrying an NFS command, was observed to be just 74.5 milliseconds. This prompt round-trip latency for a cross-continent data plane suggests a performant Distributed Cloud MCN service level. This is easily seen as the offset from the reference timestamp (time equal to zero) of the NFS v3 Create Call. Click on image to expand. Analyzing the NAS response from ANF (packet 185) arrives less than 1 millisecond later, suggesting a very responsive, well-tuned NFS control plane offered by ANF. To measure the actual, write-time of a file from west coast to east coast, the following trace demonstrates the 20,000 byte file write exercise from the shell script. In this case, the TCP segments making up the file, specifically the large packet body lengths called out in the screenshot, are delivered efficiently without TCP retransmissions, TCP zero window events, nor having any indicators of layer 3 and 4 health concerns. The entirety of the write is measured at the packet layer to take only 150.8 milliseconds. Since packet-level analysis is not the most turnkey, easy method to monitor file read and write performance, a set of Linux and Windows utilities can also be leveraged. The Linux utility nfsiostat was concurrently used with the test file writes and produced similar, good latency measurements. Nfsiostat monitoring of the file write testing, from west coast to east coast, for the 20,000-byte file, has indicated an average write time to ANF of 151 milliseconds. The measurements presented here are simply observational, to present rapid, digestible techniques for readers interested in service assurance for running ANF over an XC L3 MCN offering. For more rigorous monitoring treatments, Microsoft provides guidance on performing one’s own measurements of Azure NetApp Files here. Summary As enterprise-class customers continue to rapidly look towards cloud for compute performance, GPU access, and economies-of-scale savings for key workloads, the benefits of a centralized, scalable storage counterpart to this story exists. F5 Distributed Cloud offers the reach and performance levels to securely tie existing offices and data centers to cloud-native storage solutions. One example of this approach to modernize storage was covered in this article, the turn-key ability to begin transitioning from traditionally on-premises NAS appliances to cloud-native scalable volumes. The Azure NetApp Files approach to serving read/write volumes allows modern hosts, including Windows and Linux distributions, to utilize virtually unlimited folder sizes with service levels adjustable to business needs.151Views0likes0CommentsF5 High Availability - Public Cloud Guidance
This article will provide information about BIG-IP and NGINX high availability (HA) topics that should be considered when leveraging the public cloud. There are differences between on-prem and public cloud such as cloud provider L2 networking. These differences lead to challenges in how you address HA, failover time, peer setup, scaling options, and application state. Topics Covered: Discuss and Define HA Importance of Application Behavior and Traffic Sizing HA Capabilities of BIG-IP and NGINX Various HA Deployment Options (Active/Active, Active/Standby, auto scale) Example Customer Scenario What is High Availability? High availability can mean many things to different people. Depending on the application and traffic requirements, HA requires dual data paths, redundant storage, redundant power, and compute. It means the ability to survive a failure, maintenance windows should be seamless to user, and the user experience should never suffer...ever! Reference: https://en.wikipedia.org/wiki/High_availability So what should HA provide? Synchronization of configuration data to peers (ex. configs objects) Synchronization of application session state (ex. persistence records) Enable traffic to fail over to a peer Locally, allow clusters of devices to act and appear as one unit Globally, disburse traffic via DNS and routing Importance of Application Behavior and Traffic Sizing Let's look at a common use case... "gaming app, lots of persistent connections, client needs to hit same backend throughout entire game session" Session State The requirement of session state is common across applications using methods like HTTP cookies,F5 iRule persistence, JSessionID, IP affinity, or hash. The session type used by the application can help you decide what migration path is right for you. Is this an app more fitting for a lift-n-shift approach...Rehost? Can the app be redesigned to take advantage of all native IaaS and PaaS technologies...Refactor? Reference: 6 R's of a Cloud Migration Application session state allows user to have a consistent and reliable experience Auto scaling L7 proxies (BIG-IP or NGINX) keep track of session state BIG-IP can only mirror session state to next device in cluster NGINX can mirror state to all devices in cluster (via zone sync) Traffic Sizing The cloud provider does a great job with things like scaling, but there are still cloud provider limits that affect sizing and machine instance types to keep in mind. BIG-IP and NGINX are considered network virtual appliances (NVA). They carry quota limits like other cloud objects. Google GCP VPC Resource Limits Azure VM Flow Limits AWS Instance Types Unfortunately, not all limits are documented. Key metrics for L7 proxies are typically SSL stats, throughput, connection type, and connection count. Collecting these application and traffic metrics can help identify the correct instance type. We have a list of the F5 supported BIG-IP VE platforms on F5 CloudDocs. F5 Products and HA Capabilities BIG-IP HA Capabilities BIG-IP supports the following HA cluster configurations: Active/Active - all devices processing traffic Active/Standby - one device processes traffic, others wait in standby Configuration sync to all devices in cluster L3/L4 connection sharing to next device in cluster (ex. avoids re-login) L5-L7 state sharing to next device in cluster (ex. IP persistence, SSL persistence, iRule UIE persistence) Reference: BIG-IP High Availability Docs NGINX HA Capabilities NGINX supports the following HA cluster configurations: Active/Active - all devices processing traffic Active/Standby - one device processes traffic, others wait in standby Configuration sync to all devices in cluster Mirroring connections at L3/L4 not available Mirroring session state to ALL devices in cluster using Zone Synchronization Module (NGINX Plus R15) Reference: NGINX High Availability Docs HA Methods for BIG-IP In the following sections, I will illustrate 3 common deployment configurations for BIG-IP in public cloud. HA for BIG-IP Design #1 - Active/Standby via API HA for BIG-IP Design #2 - A/A or A/S via LB HA for BIG-IP Design #3 - Regional Failover (multi region) HA for BIG-IP Design #1 - Active/Standby via API (multi AZ) This failover method uses API calls to communicate with the cloud provider and move objects (IP address, routes, etc) during failover events. The F5 Cloud Failover Extension (CFE) for BIG-IP is used to declaratively configure the HA settings. Cloud provider load balancer is NOT required Fail over time can be SLOW! Only one device actively used (other device sits idle) Failover uses API calls to move cloud objects, times vary (see CFE Performance and Sizing) Key Findings: Google API failover times depend on number of forwarding rules Azure API slow to disassociate/associate IPs to NICs (remapping) Azure API fast when updating routes (UDR, user defined routes) AWS reliable with API regarding IP moves and routes Recommendations: This design with multi AZ is more preferred than single AZ Recommend when "traditional" HA cluster required or Lift-n-Shift...Rehost For Azure (based on my testing)... Recommend using Azure UDR versus IP failover when possible Look at Failover via LB example instead for Azure If API method required, look at DNS solutions to provide further redundancy HA for BIG-IP Design #2 - A/A or A/S via LB (multi AZ) Cloud LB health checks the BIG-IP for up/down status Faster failover times (depends on cloud LB health timers) Cloud LB allows A/A or A/S Key difference: Increased network/compute redundancy Cloud load balancer required Recommendations: Use "failover via LB" if you require faster failover times For Google (based on my testing)... Recommend against "via LB" for IPSEC traffic (Google LB not supported) If load balancing IPSEC, then use "via API" or "via DNS" failover methods HA for BIG-IP Design #3 - Regional Failover via DNS (multi AZ, multi region) BIG-IP VE active/active in multiple regions Traffic disbursed to VEs by DNS/GSLB DNS/GSLB intelligent health checks for the VEs Key difference: Cloud LB is not required DNS logic required by clients Orchestration required to manage configs across each BIG-IP BIG-IP standalone devices (no DSC cluster limitations) Recommendations: Good for apps that handle DNS resolution well upon failover events Recommend when cloud LB cannot handle a particular protocol Recommend when customer is already using DNS to direct traffic Recommend for applications that have been refactored to handle session state outside of BIG-IP Recommend for customers with in-house skillset to orchestrate (Ansible, Terraform, etc) HA Methods for NGINX In the following sections, I will illustrate 2 common deployment configurations for NGINX in public cloud. HA for NGINX Design #1 - Active/Standby via API HA for NGINX Design #2 - Auto Scale Active/Active via LB HA for NGINX Design #1 - Active/Standby via API (multi AZ) NGINX Plus required Cloud provider load balancer is NOT required Only one device actively used (other device sits idle) Only available in AWS currently Recommendations: Recommend when "traditional" HA cluster required or Lift-n-Shift...Rehost Reference: Active-Passive HA for NGINX Plus on AWS HA for NGINX Design #2 - Auto Scale Active/Active via LB (multi AZ) NGINX Plus required Cloud LB health checks the NGINX Faster failover times Key difference: Increased network/compute redundancy Cloud load balancer required Recommendations: Recommended for apps fitting a migration type of Replatform or Refactor Reference: Active-Active HA for NGINX Plus on AWS, Active-Active HA for NGINX Plus on Google Pros & Cons: Public Cloud Scaling Options Review this handy table to understand the high level pros and cons of each deployment method. Example Customer Scenario #1 As a means to make this topic a little more real, here isa common customer scenario that shows you the decisions that go into moving an application to the public cloud. Sometimes it's as easy as a lift-n-shift, other times you might need to do a little more work. In general, public cloud is not on-prem and things might need some tweaking. Hopefully this example will give you some pointers and guidance on your next app migration to the cloud. Current Setup: Gaming applications F5 Hardware BIG-IP VIRPIONs on-prem Two data centers for HA redundancy iRule heavy configuration (TLS encryption/decryption, payload inspections) Session Persistence = iRule Universal Persistence (UIE), and other methods Biggest app 15K SSL TPS 15Gbps throughput 2 million concurrent connections 300K HTTP req/sec (L7 with TLS) Requirements for Successful Cloud Migration: Support current traffic numbers Support future target traffic growth Must run in multiple geographic regions Maintain session state Must retain all iRules in use Recommended Design for Cloud Phase #1: Migration Type: Hybrid model, on-prem + cloud, and some Rehost Platform: BIG-IP Retaining iRules means BIG-IP is required Licensing: High Performance BIG-IP Unlocks additional CPU cores past 8 (up to 24) extra traffic and SSL processing Instance type: check F5 supported BIG-IP VE platforms for accelerated networking (10Gb+) HA method: Active/Standby and multi-region with DNS iRule Universal persistence only mirrors to only next device, keep cluster size to 2 scale horizontally via additional HA clusters and DNS clients pinned to a region via DNS (on-prem or public cloud) inside region, local proxy cluster shares state This example comes up in customer conversations often. Based on customer requirements, in-house skillset, current operational model, and time frames there is one option that is better than the rest. A second design phase lends itself to more of a Replatform or Refactor migration type. In that case, more options can be leveraged to take advantage of cloud-native features. For example, changing the application persistence type from iRule UIE to cookie would allow BIG-IP to avoid keeping track of state. Why? With cookies, the client keeps track of that session state. Client receives a cookie, passes the cookie to L7 proxy on successive requests, proxy checks cookie value, sends to backend pool member. The requirement for L7 proxy to share session state is now removed. Example Customer Scenario #2 Here is another customer scenario. This time the application is a full suite of multimedia content. In contrast to the first scenario, this one will illustrate the benefits of rearchitecting various components allowing greater flexibility when leveraging the cloud. You still must factor in-house skill set, project time frames, and other important business (and application) requirements when deciding on the best migration type. Current Setup: Multimedia (Gaming, Movie, TV, Music) Platform BIG-IP VIPRIONs using vCMP on-prem Two data centers for HA redundancy iRule heavy (Security, Traffic Manipulation, Performance) Biggest App: oAuth + Cassandra for token storage (entitlements) Requirements for Success Cloud Migration: Support current traffic numbers Elastic auto scale for seasonal growth (ex. holidays) VPC peering with partners (must also bypass Web Application Firewall) Must support current or similar traffic manipulating in data plane Compatibility with existing tooling used by Business Recommended Design for Cloud Phase #1: Migration Type: Repurchase, migration BIG-IP to NGINX Plus Platform: NGINX iRules converted to JS or LUA Licensing: NGINX Plus Modules: GeoIP, LUA, JavaScript HA method: N+1 Autoscaling via Native LB Active Health Checks This is a great example of a Repurchase in which application characteristics can allow the various teams to explore alternative cloud migration approaches. In this scenario, it describes a phase one migration of converting BIG-IP devices to NGINX Plus devices. This example assumes the BIG-IP configurations can be somewhat easily converted to NGINX Plus, and it also assumes there is available skillset and project time allocated to properly rearchitect the application where needed. Summary OK! Brains are expanding...hopefully? We learned about high availability and what that means for applications and user experience. We touched on the importance of application behavior and traffic sizing. Then we explored the various F5 products, how they handle HA, and HA designs. These recommendations are based on my own lab testing and interactions with customers. Every scenario will carry its own requirements, and all options should be carefully considered when leveraging the public cloud. Finally, we looked at a customer scenario, discussed requirements, and design proposal. Fun! Resources Read the following articles for more guidance specific to the various cloud providers. Advanced Topologies and More on Highly Available Services Lightboard Lessons - BIG-IP Deployments in Azure Google and BIG-IP Failing Faster in the Cloud BIG-IP VE on Public Cloud High-Availability Load Balancing with NGINX Plus on Google Cloud Platform Using AWS Quick Starts to Deploy NGINX Plus NGINX on Azure5.5KViews5likes2CommentsHow I did it - "Integrating Fortanix SDKMS with the BIG-IP"
Let Me Ask You a Question I need to implement a key management service (KMS) to manage my organization’s TLS keys.The KMS/HSM needs to be cloud-agnostic, secure, scalable, and available to handle crypto operations offloaded from web applications deployed on a variety of platforms across the globe.What’s more, there is a requirement that the organization maintains full control and ownership of the KMS and it’s operations so using a SaaS offering is not an option. Here’s the question, what can I use? Why Fortanix? Answering the above question isn’t an easy one.When I hear phrases like “scalable and highly available” and “...across the globe”, I immediately start looking at the public cloud.But, I still need to be cloud agnostic and maintain full control so cloud HSMs and SaaS offerings don’t fit the bill. To address the above requirements, I turned to one of F5’s partners and deployed the Fortanix Self-Defending Key Management Service (SDKMS).The Fortanix SDKMS system checks all the boxes, including: Cloud Agnostic - I am able to make use of KMS services across all public cloud platforms as well as on-premises. Secure/Scalable/Highly-Available - Deploys in the public cloud on hardware that utilizes Intel SGX technology and runs every operation in HSM-grade security. Full Control - Deploys in your infrastructure and provides web-based UI for centralized management. Solution Overview In this article, we’ll walk through configuring the BIG-IP to offload TLS crypto operations to a Fortanix SDKMS.The deployment process is quite similar to F5’s integration with Equinix SmartKey, (Fortanix SDKMS SaaS offering).The following steps are based upon F5’s guidance for implementing a network HSM. Okay, let’s take a look at how I did it. Prerequisites F5 BIG-IP LTM 14.1.0 or later - Virtual Edition (VE) utilized for this article.Both hardware and virtual edition platforms support network HSM integration. Additionally, you will need to provide a license covering the network HSM module. Fortanix SDKMS Cluster - We’re going to go right to the source, (Fortanix installation guidance). Setting up the SDKMS cluster is relatively straightforward and runs on Azure’s new DC-series instances, (currently in preview).Currently, the DC-series is available for public preview in the US East region only.The cool thing about the DC-series is that it makes use of Intel SGX technology.A better way to put it is it allows Azure infrastructure customers to make use of SGX. Fortanix SDKMS makes use of the underlying SGX technology to providesecure scalable data at rest. Step 1 - Create SDKMS Application Assuming the prerequisites have been met, (i.e. I have a Fortanix SDKMS stood up), the first thing I need to do is create an application object in SDKMS. The application object can access certificates, keys, and secrets that will be used by my application, (delivered via the BIG-IP).. A. Login to the Fortanix SDKMS UI.Select the ‘Apps’ icon from the left blade and then the ‘+’ to open a new application form. B. Provide the name of the application and select ‘API Key’ for the authentication method. C. Create a group and assign the application to the group.The group represents a collection of security objects, (applications, keys, certificates, etc.) that are available to members of the group. D. Select ‘Save’ to create the application, (see below left). With the application created, select ‘COPY API KEY’, (see above right) to capture the api key and store for later use. The key will be used by the BIG-IP as the password to authenticate calls to SDKMS. Step 2 - Install Fortanix Plugin Now that we have SDKMS prepared, I need to turn my attention to the BIG-IP.In this step, I will use my favorite ssh client to log into the BIG-IP as root. From there I will use the following commands to download and install the Fortanix plugin onto the BIG-IP.The plugin, (RPM) is available for download from here. cd /shared/ mkdir nethsm cd nethsm curl -O https://d2bzqwib4mjc49.cloudfront.net/3.11.1281/fortanix-pkcs11-4.25.2353-0.x86_64.rpm rpm -ivh ./fortanix-pkcs11-4.25.2353-0.x86_64.rpm Step 3 - Configure BIG-IP netHSM Integration A. Add the Fortanix HSM library to the BIG-IP tmsh create sys crypto fips external-hsm vendor auto pkcs11-lib-path /opt/fortanix/pkcs11/fortanix_pkcs11.sofortanix-pkcs11-3.11.1281-0.x86_64.rpm B. Configure the netHSM partition tmsh create sys crypto fips nethsm-partition auto password <copied API key> C. Restart thepkcs11d service bigstart restart pkcs11d tmm D. Testing Connectivity I will now use the BIG-IP management GUI to test connectivity between the BIG-IP and SDKMS. After logging into the BIG-IP GUI navigate to System --> Certificate Management --> HSM Management --> External HSM. Under the 'Partitions' section check the checkbox next to the partition in the partition list and select 'Test'. Below is example output of a successful connectivity test. Step 4 - Import SSL Certificate/key to BIG-IP and SDKMS A. Import private key into SDKMS Now that we have our external HSM, (Fortanix), https://fortanix.aserracorp.com integrated with our BIG-IP let’s put it to use. To start with, I will import a private key into SDKMS. Login to the Fortanix SDKMS UI and select the ‘Security Objects’ icon from the left blade and then the ‘+’ to open a new security object form. Provide the name for the key and select ‘Import’. Select 'RSA' for the object type Select 'Base64' and upload the my key. Associate the key to previously created group. Select ‘Import' to create the security object, (see below). B. Import SSL Certificate and netHSM Key Pointer into BIG-IP With the SDKMS now hostng the private key, I now import the corresponding certificate into the BIG-IP. Additionally, I must create a key resource pointing to the Fortanix SDKMS-hosted key. Login to the BIG-IP management GUI and navigate to System --> Certificate Management --> SSL Certificate List --> Import Select 'Certificate' for import type and provide a name. Browse to and upload certificate, select 'Import', (see below left). navigate to System --> Certificate Management --> SSL Certificate List --> Import Select 'Key' for import type and provide a name. The name must match the security object name of the SDKMS-stored key. Select 'From NetHSM' for 'Key Source', select 'Import', (see below right). C. Create SSL Profile and Attach to Virtual Server The last thing I need to do is create a Client SSL profile and associate it with my virtual server. Login to the BIG-IP management GUI and navigate to Local Traffic --> Profiles --> SSL --> CLIENT --> + Provide a name and check the 'Custom' checkbox. In the 'Certificate Key Chain' section select 'Add' Select the previously imported certificate and key from the drop-down menus Select 'Finished' to create the profile, (see below left) navigate to Local Traffic --> Virtual Servers and select the appropriate virtual server Under the 'SSLProfile (Client)' section select the previously create SSL profile, (see below right). Select 'Update' to save the modified virtual server. Well, that's how I did it. Now with the setup and configuration completed, my application, (https://app.aserracorp.com) is now secured with the BIG-IP offloading the crypto workload to Fortanix SDKMS. Additional Links Fortanix Self-Defending Key Management Service Setting Up the Network HSM BIG-IP Local Traffic Manager Azure Confidential Computing1.4KViews0likes0CommentsRunning FTP services in Azure via ALB and BIG-IP
Introduction A customer recently asked if the Azure Load Balancer (ALB) allowed for FTP traffic? The answer is, yes it can be done! But to make it possible, I had to use F5 BIG-IP and it made a nice use case to demonstrate some typical cloud limitations, and how BIG-IP can help. Problem statement Customer uses a commmercial, enterprise-level FTP server on-prem today. They want to "lift-n-shift" this FTP server to Azure. Today, they support both Active and Passive FTP connections, and their Passive FTP data ports are locked down to a range of 5000 ports. That's a lot, but not uncommon on-premises. Some initial chellenges to overcome: Tiered architecture. They intend to have an internet-facing firewall in front of their BIG-IP tier. This is due to legacy on-prem architectures and a desire to replicate for consistency in Azure. So inbound FTP client traffic will traverse their FW, and then needs to be served via Azure LB. Ports. If you've ever worked with FTP, you'll know that FTP uses a control channel (usually port 21) and data channels (usually a random high port and sometimes locked down to a range). Active FTP requires the server to connect back to the client. Then the client communicates with server on (usually) destination port 20. Passive FTP (more common today) requires the client to make these connections to the server, and port ranges vary. ALB rules allow for a single port each. With a range of 5000 ports, opening ports in separate ALB rules would break the (current) Azure limit of 1500. ALB limitations. While an internal ALB can have a rule that listens on all ports, an external ALB cannot. They intend to have an external ALB in front of firewall appliances, and their clients will access the FTP server from Internet. Legacy IT. They cannot easily reduce this port range to something manageable, under Azure's limit. There's legacy reasons for this. High Availability requirement. They have load-balanced Active/Active FTP servers on-prem and would like to continue this in Azure with Active/Active FTP servers. Demo Click here to see the GitHub repo and deploy to Azure. Let's analyze this! As part of the design for this demo, I drew out the diagram below. There are some simplifications to the diagram above, but in short, the intent is to show: An external tier of firewall appliances. In my demo I have made these F5 appliances, but the customer (or you) may choose to use a 3rd party firewall vendor for their external tier. An internal tier of BIG-IP devices. These devices secure and proxy FTP traffic with an FTP profile. An application tier. I have pictured only a single FTP app server for this demo, but obviously you can put many servers behind F5 BIG-IP devices. Making the last layer Active/Active is somewhat arbitrary, as is multiple tiers of the app itself (like adding a database). I used vsftp on Linux, not the commercial product my customer uses. Technical overview Here's how we overcome the challenges in the problem statement above. The external tier shown just has FastL4 Virtual Servers that accept traffic and pass it through on TCP ports 20, 21, 22, 80, 443, and 5000-5003. These are just providing traffic to be proxied on Layer 4. There is no app-layer inspection. The internal tier of BIG-IP devices uses an LTM FTP profile. It uses an iRule which I wrote based off an example here. This allows us to re-write the data ports used from the 5000 ports used to a more manageable number. For further protection, ASM can use an FTP profile to protect against other types of FTP attacks. ALB rules are set up, as are Network Security Group (NSG) rules, to allow our defined port ranges. This meets our requirements for ALB's and overcomes port-range-related challenges. High Availability is easily gained by creating additional app servers and load-balancing within F5. F5 itself has HA via the Device Service Cluster. The internal tier also proxy ports 22 (SSH) and 80 and 443 (HTTP and HTTPS) for the sake of a demo web app. These aren't needed for FTP to work, but are nice to have set up for demo purposes. In my demo I reduce the port range to 4 data ports (tcp/5000-5003) but for busier sites you may want more. Just stay under the limit of 1500. Real world scenarios FYI, I started creating this demo by starting with this F5 Networks demo template here. If I was to do this again, I would use linked templates, instead of 1 large template, but this demo works. Remember a few points in your production environment: Never expose your management interfaces to the public Internet. This demo is intended to show the mgmt interface for demo purposes. Use your own SSL certificates that are valid and signed by a public CA. For deploying cloud resources I have created one large ARM template, but if I was to do this again, I would make child templates Alternatively, I would use a deployment tool like Ansible or Terraform to deploy my cloud resources, and then configure them This demo requires EVAL keys, and don't forget to revoke your eval keys before blowing away the environment. If you don't, you'll need to call F5 Support to have them release your key. In this demo, you'll need to revoke the internal tier devices first, and then external tier. This is due to the default routes and inline devices. Why do you need F5? What's the difference between a basic load balancer, and a full proxy? One reason you use F5 in this scenario is for the enterprise-level features available. For example, when traffic traverses the BIG-IP in this scenario, we aren't just performing load-balancing. We're re-writing ports, dynamically opening new ports when required, optionally performing security, and a host of other app services are available. I'm often asked to differentiate F5 from basic load balancing services, and this is a good example of functionality you just cannot get from a basic loadbalancer. Conclusion I hope this demo environment helps you see that you CAN run FTP servers in Azure, via an Azure Load Balancer, if you use BIG-IP and configure your application services to provide security, availability, and performance!1.4KViews0likes1CommentPractical considerations for using Azure internal load balancer and BIG-IP
Background I recently had a scenario that required me to do some testing and I thought it would be a good opportunity to share. A user told me that he wants to put BIG-IP in Azure, but he has a few requirements: He wants to use an Azure Load Balancer (ALB) to ensure HA for his BIG-IP pair. This makes failover times faster in Azure, compared to other options. He does not want to use an external Load Balancer. He has internet-facing firewalls that will proxy inbound traffic to BIG-IP, so there is no need to expose BIG-IP to internet. He needs internal BIG-IP's only to provide the app services he needs. He does not want his traffic SNAT'd. He wants app servers to see the true client IP. Ideally he does not want to automate an update of Azure routes at time of failover. He would like to run his BIG-IP pair Active/Active, but could also run Active/Standby. Quick side note: Why would we use ALB's when deploying BIG-IP? Isn't that like putting a load balancer in front of a load balancer? In this case we're using the ALB as a basic, Layer 3/4 traffic disaggregator to simply send traffic to multiple BIG-IP appliances and provide failover in the case of a VM-level failure. However we want our traffic to traverse BIG-IP's when we need advanced app services: TLS handling, authentication, advanced health monitoring, sophisticated load balancing methods, etc. Let's analyze this! Firstly, I put together a quick demo to easily show him how to deploy BIG-IP behind an ALB. My demo uses an external (internet-facing) ALB and an internal ALB. It is based on the official template provided by F5, but additionally deploys an app server and configures the BIG-IP's with a route and an AS3 declaration. If it wasn't for the internal-only part, we would have met his requirements with this set up: Constraints However, this user's case requires no internet-facing BIG-IP. And now we hit 2 problems: Azure will not allow 2x internal LB’s in the same Availability Set, or you’ll get a conflict with error: NetworkInterfacesInAvailabilitySetUseMultipleLoadBalancersOfSameType . So the diagram above cannot be re-created with 2x Internal Azure LB's. When using only 1x internal LB, you “ should only have one inbound rule if that rule loadbalances across all ports and protocols. ” Since we want at least 1 rule for all ports (for outbound traffic from server), we cannot also have individual LB rules for our apps. Trying to do so will fail with the error message in quotes. Alternatives This leaves us with a few options. We can meet most of his requirements but one that I have not been able to overcome is this: if your cluster of BIG-IP's is Active/Active, you will need to SourceNAT traffic to ensure the response traffic traverses the same BIG-IP. I'll discuss three options and how they meet the requirements. Use a single Azure internal LB. At BIG-IP, SNAT the traffic to the web server, and send XFF header in place of true client IP. Default route can be the Firewall, so server-initiated traffic like patch updates can still reach internet. Can be Active/Active or Active/Standby, but you must SNAT if you do not want to be updating a UDR at time of failover. Or, don’t SNAT traffic, and web server sees true source IP. You will need a UDR (User Defined Route) to point default route and any client subnets at the Active BIG-IP. You will need to automatically update this UDR at time of failover (automated via F5's Cloud Failover Extension or CFE). Can be Active/Standby only, as traffic will return following the default route. Use a single Azure internal LB, but with multiple LB rules. In our example we'll have 2x Front End IP's configured on the Azure LB, and these will be in different internal subnets. Then we'll have 2x back end pools that consist of the external SelfIP's for 1 rule, and internal SelfIP's for the other. Finally we'll have 2x LB rules. The rule for the "internal" side of the LB may listen on ALL ports (for outbound traffic) and the "external" side of the LB might listen on 80 and 443 only. Advanced use cases (not pictured) Single NIC. If you did not want to have a 3-NIC BIG-IP, it would be possible to achieve scenario C above with a single NIC or dual NIC VM: Use a 2-nic BIG-IP (1 nic for mgmt., 1 for dataplane). Put your F5 pair behind a single internal Azure LB with only 1 LB rule which has “HA” ports checked (all ports). We can then have the default route of the server subnet point to Azure LB, which will be served by a VIP 0.0.0.0/0 on F5. Because this only allows you 1 LB rule on the Azure LB, enable DSR on the Azure LB Rule. Designate an “alien subnet range” that doesn’t exist in VNET, but only on the BIG-IP. Create a route to this range, and point the next hop at the frontend IP on the only LB rule. Then have your FW send traffic to the actual VIP on F5 that's within the alien range (not the frontendIP), which will get forwarded to Azure LB, and to F5. I have tested this but see no real advantage and prefer multi-NIC VM's. Alien range. As mentioned above, an "alien IP range" - a subnet not within the VNET but configured only on the BIG-IP for VIPs - could exist. You can then create a UDR that points this "alien range" toward the FrontEnd IP on Azure LB. An alien range may help you overcome the limit of internal IP's on Azure NIC's, but with a limit of 256 private IP's per NIC, I have not seen a case requiring this. An alien range might also allow app teams to manage their own network without bothering network admins. I would not advise going "around" network teams however - cooperation is key. So I cannot find a great use for this in Azure, but I have written about how an alien range may help in AWS. DSR. Direct Server Return in Azure LB means that Azure LB will not perform Destination NAT on the traffic, and it will arrive at the backend pool member with the true destination IP address. This can be handy when you want to create 1 VIP per application in your BIG-IP cluster, and not worry about multiple VIP's per application, or VIP's with /30 masks, or VIP's that use Shared Address lists. However, given that we have multiple options to configure VIP's when Destination NAT is performed by Azure LB (as it is, by default), I generally don't recommend DSR on Azure LB unless it's truly desired. Personally, I'd recommend in this case to proceed with Option C below. I'd also point out: I believe operating in the cloud requires automation, so you should not shy away from automated updates to UDR's when required. Given these can be configured by tools like F5's Cloud Failover Extension (CFE), they are a part of mature cloud operations. I personally try to architect to allow for changes later. Making SNAT a requirement may be a limitation for app owners later on, so I try not to end up in a scenario where we enforce SNAT. I personally like to see outbound traffic traverse BIG-IP, and not just because it allows apps to see true source IP. Outbound traffic can be analyzed, optimized, secured, etc - and a sophisticated device like BIG-IP is the place to do it. Lastly, I recommend best practices we all should know: Use template-based deployments for production so that you have Infrastructure as Code Ideally keep your BIG-IP config off-box and deploy with AS3 templates Get your app and dev teams configuring BIG-IP using declarative deployments to speed your deployment times Conclusion There are multiple ways to deploy BIG-IP when your requirements dictate an architecture that is not deployed by default. By keeping in mind my priorities of application services, operational support and high availability, you can decide on the best architecture for a given scenario. Thanks for reading, and let me know if you have any questions.6.1KViews3likes20CommentsDeploying BIG-IP Telemetry Streaming with Azure Sentinel as its consumer.
AZURE SENTINEL and BIG-IP ...with Telemetry Streaming! This work was completed as a collaboration of Remo Mattei r.mattei@f5.com and Bill Wester b.wester@f5.com, feel free to email us if you have questions. One of the things that I have discovered recently is how neat it is to be able to leverage Azures new Sentinel to receive and display telemetry data from F5's BIG-IP devices. The devices don't even have to be in Azure, you could have dedicated hardware BIG-IPs and still send via Telemetry Streaming to Sentinel as your destination for statistics and logs. Let us explore a bit more on how to get all of the moving pieces together to a single cohesive implementation. Telemetry Streaming is a way for you to forward events and statistics from the BIG-IP system to your preferred data consumer and visualization application. You can do all of this byPOSTinga single JSON declaration to a declarative REST API endpoint. Telemetry Streaming uses a declarative model, meaning you provide a JSON declaration rather than a set of imperative commands. More info can be found here: https://clouddocs.f5.com/products/extensions/f5-telemetry-streaming/latest/userguide/about-telemetry.html BIG-IP allows you to send logs to several external providers. Splunk, awell knownone, is one of the most used out there. However, the new Azure Sentinel, a cloud solution, is something that many customers can take advantages from.This section,will help in understanding on how to setup BIG-IP to get the logs to Azure Sentinel. Setup BIG-IP First of all, this is broken into two parts, one shows the logs of the BIG-IP System Metrics, like what OS, what modules are installed etc. The second, is about themodule ASM. The two have a few things in common. They use the TS RPM file which is added to the BIG-IP, and the declaration, which tells the BIG-IP where to send the stream of data. To send data relate to BIG-IP System Metrics it is required to have AVR provisioned on the device. ASM is not required but we use it here as an example of how to enable another module. Here is a screenshot from the Azure which shows the required modules.One more important thing is that ASM will need to have AFM also enabled otherwise you will not get logs in Azure. ASM Once enabled the required modules it will show System Metrics Common components that you must install for this to work First you need Telemetry Streaming: The TS RPM can be found here on GITHUB: https://github.com/F5Networks/f5-telemetry-streaming/releases/ You can use Visual Studio Code to install the RPM or your favorite way... Here are some screen shots form VS Code, using the F5 Plugin. NOTE: in order to useVSCodeto push AS3, DOetcyou must install the F5 Plugin. Use the command options in Mac it’scommand+shift+P(here you can search for RPM by just typing it in the box) Select AS3and make sure to install both AS3 and TS: Select the version: (probably latest is best here) The Telemetry Streaming declaration looks like this: { "class": "Telemetry", "My_Listener": { "class": "Telemetry_Listener", "port": 6514 }, "Poller": { "class": "Telemetry_System_Poller", "interval": 60, "enable": true, "trace": false, "allowSelfSignedCert": false, "host": "localhost", "port": 8100, "protocol": "http", "actions": [ { "enable": true, "includeData": {}, "locations": { "system": true, "virtualServers": true, "httpProfiles": true, "clientSslProfiles": true, "serverSslProfiles": true } } ] }, "Pull_Consumer": { "class": "Telemetry_Pull_Consumer", "type": "default", "systemPoller": [ "Poller" ] }, "Azure_Consumer": { "class": "Telemetry_Consumer", "type": "Azure_Log_Analytics", "workspaceId": "workspaceID", "passphrase": { "cipherText": "primkey" } }, "schemaVersion": "1.12.0" } NOTE: You will need to get theworkspaceIDand theprimarykey. You can use the azure cli for that: azmonitor log-analytics workspace list --out table CustomerId Location Name ProvisioningState PublicNetworkAccessForIngestion PublicNetworkAccessForQuery ResourceGroup RetentionInDays ------------------------------------ ------------- ---------------------------------------------------------- ------------------- --------------------------------- ----------------------------- ------------------------- ----------------- a05d4bfb-27c8-49a6-96e2-351d2dc78c61 eastus adrianLA Succeeded Enabled Enabled adrian_rg_01 7 63be43ed-b3f5-4e9f-bc92-226bb3393d11 eastus DefaultWorkspace-77c6ebef-d849-4527-a355-742d8d7d3fdc-EUS Succeeded Enabled Enabled defaultresourcegroup-eus 30 2ccbd35a-dfdf-4a5e-ab5f-1d5314f52e4b southeastasia DefaultWorkspace-77c6ebef-d849-4527-a355-742d8d7d3fdc-SEA Succeeded Enabled Enabled defaultresourcegroup-sea 30 9436f742-069a-4e29-aac0-e1258f7b1f87 westus2 calalangakslog Succeeded Enabled Enabled calalang-rg 30 ac071b51-f0c6-43b6-8bef-16b9197fde0f westus2 edgar-log Succeeded Enabled Enabled defaultresourcegroup-eus 31 555ae8d5-75bc-4058-becf-df510c09f8d3 westus2 DefaultWorkspace-77c6ebef-d849-4527-a355-742d8d7d3fdc-WUS2 Succeeded Enabled Enabled defaultresourcegroup-wus2 30 f633bdb1-d560-43cd-a664-cc7a93ed8781 westus2 edgar-log-analytics Succeeded Enabled Enabled edgar-rg 30 9334eb7c-16fc-4db9-a84f-5824a7177ccb centralus DefaultWorkspace-77c6ebef-d849-4527-a355-742d8d7d3fdc-CUS Succeeded Enabled Enabled defaultresourcegroup-cus 30 091c2cf3-853d-4297-9001-41d2109c28ec westus DefaultWorkspace-77c6ebef-d849-4527-a355-742d8d7d3fdc-WUS Succeeded Enabled Enabled defaultresourcegroup-wus 30 52471748-d9c7-46ba-9f9f-72ed8e92a201 westus remo-analytics Succeeded Enabled Enabled remo-telemetry 30 bc8e90ca-f59c-4fbf-a28b-213fe1cfcfda westus wester-log Succeeded Enabled Enabled wester_rg 30 Here you can see the name of the resource group then run the following command: azmonitor log-analytics workspace get-shared-keys --resource-groupwester_rg--workspace-name wester-log Which will print out theprimarykey The workspace isCustomerIdfrom the main table. To install this declaration you can use POSTMAN, curl, or Visual Studio Code; we used Visual Studio Code. Copy the text into a newVScodetab, make sure it’s in json format and then use the command pallet to post it Verify by using the TS version at the bottom ofVSCode, it will execute a GET to the BIG-IP that is connected. ASM In order to use ASM you will need to configure a VIP with the IP of 255.255.255.254, and the port to the 6514, as well as aniRule. This can be done with an AS3 declaration or TMSH. Sample of AS3 declaration { "class":"ADC", "schemaVersion":"3.10.0", "remark":"Example depicting creation of BIG-IP module log profiles", "Common": { "Shared": { "class":"Application", "template":"shared", "telemetry_local_rule": { "remark":"Only required when TS is a local listener", "class":"iRule", "iRule":"when CLIENT_ACCEPTED {\nnode127.0.0.1 6514\n}" }, "telemetry_local": { "remark":"Only required when TS is a local listener", "class":"Service_TCP", "virtualAddresses": [ "255.255.255.254" ], "virtualPort":6514, "iRules": [ "telemetry_local_rule" ] }, "telemetry": { "class":"Pool", "members": [ { "enable":true, "serverAddresses": [ "255.255.255.254" ], "servicePort":6514 } ], "monitors": [ { "bigip":"/Common/tcp" } ] }, "telemetry_hsl": { "class":"Log_Destination", "type":"remote-high-speed-log", "protocol":"tcp", "pool": { "use":"telemetry" } }, "telemetry_formatted": { "class":"Log_Destination", "type":"splunk", "forwardTo": { "use":"telemetry_hsl" } }, "telemetry_publisher": { "class":"Log_Publisher", "destinations": [ { "use":"telemetry_formatted" } ] }, "telemetry_traffic_log_profile": { "class":"Traffic_Log_Profile", "requestSettings": { "requestEnabled":true, "requestProtocol":"mds-tcp", "requestPool": { "use":"telemetry" }, "requestTemplate":"event_source=\"request_logging\",hostname=\"$BIGIP_HOSTNAME\",client_ip=\"$CLIENT_IP\",server_ip=\"$SERVER_IP\",http_method=\"$HTTP_METHOD\",http_uri=\"$HTTP_URI\",virtual_name=\"$VIRTUAL_NAME\",event_timestamp=\"$DATE_HTTP\"" } }, "telemetry_security_log_profile": { "class":"Security_Log_Profile", "application": { "localStorage":false, "remoteStorage":"splunk", "protocol":"tcp", "servers": [ { "address":"255.255.255.254", "port":"6514" } ], "storageFilter": { "requestType":"illegal-including-staged-signatures" } }, "network": { "publisher": { "use":"telemetry_publisher" }, "logRuleMatchAccepts":false, "logRuleMatchRejects":true, "logRuleMatchDrops":true, "logIpErrors":true, "logTcpErrors":true, "logTcpEvents":true } } } } } To post an AS3 declaration like above use Visual Studio Code Use the command menu and select F5 Post an AS3 Declaration from the tab you have pasted the code OUTPUT from the declaration above: iRuleused Assign the Telemetry Policy to the Virtual Service by selecting the option in the advanced menu Once you have the modules installed, and configured the appropriate settings, like above, then you will see data coming in Azure Sentinel. Here is an example: ASM System Metrics For System Metrics to work, you will need to have AVR installed, you do not need an AS3 declaration or aniRule. Once you have AVR installed, and have pushed the declaration to the BIG-IP, you will need to execute the following command in your BIG-IP. tmshmodify analytics global-settings{offbox-protocoltcpoffbox-tcp-addresses add { 127.0.0.1 }offbox-tcp-port 6514 use-offboxenabled } tmshsave /sys config Check the logs in your BIG-IP less /var/log/restnoded/restnoded.log You will see something like: Fri, 18 Sep 2020 06:36:04 GMT - info: [telemetry] Starting systempollerPoller::Poller. Interval = 60 sec. Fri, 18 Sep 2020 06:36:04 GMT - info: [telemetry] 1 consumer plug-in(s) loaded Next you will need to go into the Azure Portal, and you can find a nice pre-defined Sentinel Workbook to view and start to work with: You will select the "template" and then fill out the correct workspace from the dropdown, then select the correct hostname from the dropdown and you will start to see data showing up. Azure Sentinel displaying the workbook As you enable more modules, they will show up in the Azure Sentinel and will show how it’s enabled.You can also add / modify / enhance the workbook to show more data that is in Sentinel sent from the BIG-IP. Remo and I hope you found this article helpful and enjoy using BIG-IPs with Sentinel!4.2KViews3likes9CommentsBIG-IP integration with Azure Gateway Load Balancer
Introduction Microsoft just announced the Gateway Load Balancer. This is a new load balancer sku to go along with the existing types of Basic and Standard, however, this is aimed at sending traffic to transparent network devices for in-line traffic inspection. This article will overview the gateway load balancer and explain how F5 BIG-IP can integrate for transparent traffic inspection. Check out MS's blog too. The Challenge Many organizations face challenges with Azure network operations, like: setting up High Availability (HA) can be a learning curve for admins of 3rd party appliances in Azure using User Defined Routes (UDR's) to direct traffic to a network appliance can be difficult for some organizations, especially updating at scale dealing with multiple VNets across multiple regions and routing traffic between them can add operational overhead requiring Source NAT (SNAT) can be a deal-breaker for app owners, but architecting for flow symmetry without SNAT can require serious know-how Finally, there's a very real skills gap between most Network/Security teams and Cloud teams. Often, a SecOps team member (eg, firewall admin) just wants to manage a firewall, without needing to know about Azure networking. Likewise, the CloudOps team want to send traffic through the firewall without having to alert the SecOps team for every new application added. With Azure's Basic and Standard load balancer types, it is possible to achieve much of this and I've blogged about this before. But things get complicated quickly with advanced scenarios. The Gateway load balancer makes load balancing network appliances much easier. No SNAT It's important to remember that this is aimed at network devices that are inspecting but not proxying traffic. I.e., src and dst IP and port should be preserved when traffic traverses BIG-IP in this scenario, so SNAT is not supported. Think of transparent firewalling as a good use case here. Enter Gateway Load Balancer The gateway load balancer is aimed at load balancing firewalls, routers or other network appliances. Some high-level benefits: It's transparent. Source and Destination IP addresses are unchanged when traffic traverses the gateway load balancer, via VXLAN tunnels to backend pool members. Your firewall admin can now set up policies without worrying about Source or Destination NATs. Flow symmetry. The load balancer will ensure that response traffic hits the same device on the return path as it did on the request path. This means you can run multiple standalone or Active/Active appliances without using SNAT or UDR's to ensure flow symmetry. UDR's are not required, because traffic reaches the gateway load balancer in one of two ways: Either a regular, external load balancer is updated to send traffic via the gateway load balancer before forwarding to it's backend pool, or, If the VM is not behind an Azure Load Balancer, the VM's NIC is updated so that traffic traverses the gateway load balancer before it receives traffic. Traffic across VNets and Regions is supported. You can target a gateway load balancer from a regular load balancer that is in a different VNet, or a different region. UDR's are not used, and VNet peering is not required for this traffic flow. One last important point I'll make: this load balancer type is aimed at solving for North-South traffic, not East-West. I.e., your typical use-case will be for inbound traffic from Internet to your app servers on your Azure VNet. Put your app servers behind a public-facing load balancer, update that load balancer to reference the gateway load balancer, and you're good to go! No need to ask the firewall or F5 admin to configure a new application for you, or a new public IP on their device, etc. Example scenario In this scenario we have a web app that is internet-facing. We want to protect the web app from attacks by having the traffic traverse BIG-IP Advanced WAF (AWAF). The gateway load balancer is configured to allow all traffic inbound to the BIG-IP VE's, where security policies are applied. So we deploy a public-facing Azure Load Balancer, configure the load balancer with an attribute referencing the gateway load balancer, and watch as traffic from internet clients is routed via the Advanced WAF devices in transit to the app servers. Demo A basic demo of this is hosted on GitHub, if you would like to deploy this yourself or see more details. The BIG-IP's will require VXLAN tunnels configuration, VLAN groups, FDB entries, and ARP records. This is to integrate with the VXLAN requirements of the gateway load balancer. See the demo on GitHub here.8.8KViews2likes3Comments