TurboFlex, Application Delivery, and Tires: What's the Common Thread? (Tread?)
What Is TurboFlex?
TurboFlex is the latest stop on F5's 15+ year journey using customized silicon to accelerate and optimize network traffic, but it's more than just a stop. In fact, it's more accurate to call it a quantum leap: use-case-oriented hardware optimizations, which can be changed with only a minimal interruption in service. TurboFlex, introduced with the iSeries appliances and VIPRION B4450 blade, can increase performance dramatically while reducing the load on other systems, such as the CPU. This advancement is more than just new software. It's the union of the software and hardware components working together in harmony, more so than on any prior generation of BIG-IP devices.
Application Delivery and Tires?
I thought about how best to describe the impact and advancement of TurboFlex and, while listing performance numbers and specific features could tell part of the story, the performance increase covers only a portion of what TurboFlex really is. Instead, I'll use an analogy based on a recent trip I took. I went on a road trip at the end of June and, due to an oversight on my part in checking tread depth, had to replace my tires in the early part of the 3500-mile journey. Fortunately for me, I was able to get the tires drop-shipped to an installer in a city on my route and installed in about an hour, and I was back on my way to have other adventures. While the tires were being installed, I wandered off, found a coffee shop, and thought about the importance of ordering the right sort of tires for my car, especially for driving in the summer and through some beautiful canyon roads (yes, I do contemplate these things because I am a gearhead). I had read some online forums for tire recommendations and, aside from the usual Brand-X-is-better-than-Brand-Y "discussions" (there are some strong opinions), there was a recurring thread about having the right tires for the type of driving you're doing and the season in which you're doing it.
It then hit me that this was very similar to the capabilities of the new TurboFlex features in the iSeries and VIPRION product lines. If you've ever been in a car running on tires with limited tread (like mine) or the wrong tread for the conditions, you know how difficult it can be to control the vehicle. Similar to putting different sets of tires on a car to optimize performance for different driving conditions, TurboFlex allows for changing the hardware offload and optimization of an iSeries without needing to purchase a statically configured device (or a season-specific car, to keep with the metaphor). Because it can change profiles to match the optimizations a workload needs without swapping out the BIG-IP hardware, TurboFlex is very similar to changing the tires on a car instead of changing out the entire car. Lori MacVittie spoke with Enterprise Networking Planet about how TurboFlex works in concert with other components in the BIG-IP ecosystem, such as App Connector and Container Connector, to provide adaptability and performance for application delivery. This holistic approach is no different than how all the components of a car work in concert to increase safety, performance, and capability for the driver and passengers.
Easier Than Changing A Tire To Gain Performance And Capability
With better performance for the prevailing conditions and the ability to adapt rapidly to new conditions as needed, TurboFlex sounds like the best of both worlds, being both flexible and purpose-built, while not needing to compromise on either. Similar to tires, however, you can't run multiple profiles simultaneously at this point (it doesn't seem wise to run winter tires on two wheels and summer tires on the other two, since they perform so differently). The iSeries is a substantial step in the evolution of the BIG-IP platform.
With increases in cores, RAM, throughput, storage size, and the introduction of 10G interfaces in the entry-level platforms, the iSeries packs a lot of performance and capability in a single RU package. Additionally, having high-performance hardware acceleration for SSL transactions that use Elliptic Curve Cryptography (ECC) certificates across the appliance line is a major step forward in meeting the growing requirements for ECC certificates, such as with mobile device connections or the Internet of Things (IoT). In a larger sense, TurboFlex is the "glue" that pulls software and hardware together in BIG-IP. Included in this umbrella diagram are a few components that haven't been talked about yet: TCAMs and the L2 Switch. In and of themselves, these components are fast and flexible and, in other applications, have a long history of working well together. Ternary Content-Addressable Memory (TCAM) chips have served in switches, routers, and firewalls for quite some time because they are able to provide high-speed table lookups, often several hundred thousand lookups per second. This capability is great for providing white/black/gray-list capabilities for ruleset evaluation in a firewall. When you integrate that capability with the L2 Switch and the FPGA capabilities, you've got a system that can provide protection against DDoS as well as enforcing Access-Control Lists (ACLs), all at line-rate. The great advantage to this integration is that none of these lookup requests are hitting the main CPUs. The CPU's capacity for more complicated work is diminished when its cycles are taken up by repetitive tasks, such as looking up and evaluating the firewall ruleset, so offloading those tasks to purpose-built silicon means there's more capacity to perform more detailed and involved traffic manipulation and evaluation.
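To make the TCAM lookup idea concrete, here is a minimal software sketch of ternary matching, assuming a simple IPv4 allow/deny/rate-limit list. The names `TernaryEntry`, `prefix_entry`, and `lookup` are hypothetical and purely illustrative; nothing here reflects how F5 programs its TCAMs. Real hardware compares the key against every entry in parallel in a single lookup, whereas this loop only imitates the result (first matching entry wins).

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TernaryEntry:
    """One TCAM-style row: a mask bit of 1 means 'compare', 0 means 'don't care'."""
    value: int
    mask: int
    action: str


def prefix_entry(cidr: str, action: str) -> TernaryEntry:
    """Build an entry from an IPv4 prefix (hypothetical helper for this sketch)."""
    net = ipaddress.ip_network(cidr)
    return TernaryEntry(int(net.network_address), int(net.netmask), action)


def lookup(table: list, addr: str, default: str = "allow") -> str:
    """Return the action of the first entry whose compared bits equal the key's."""
    key = int(ipaddress.ip_address(addr))
    for entry in table:  # a real TCAM evaluates all rows at once
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return default


# Entries are ordered by priority: the specific host exception comes first.
table = [
    prefix_entry("203.0.113.7/32", "allow"),
    prefix_entry("203.0.113.0/24", "deny"),
    prefix_entry("198.51.100.0/24", "rate-limit"),
]

for address in ("203.0.113.7", "203.0.113.99", "198.51.100.5", "192.0.2.1"):
    print(address, "->", lookup(table, address))
```

The "don't care" bits are what distinguish a ternary lookup from an exact-match table: one row can cover a whole prefix, which is why this kind of memory suits list and ACL evaluation.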
The software control components round out TurboFlex: flexible, responsive TMOS features such as iRules and iControl are able to interact with it. Ultimately, this relationship allows the BIG-IP to process more traffic, even with data flows that include very detailed inspections and manipulations.
A Short BIG-IP History Lesson
To gain some perspective on how large a change this is from previous generations and iterations of the Application Delivery Controller (ADC), I need to share a short history lesson. Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have been a part of the BIG-IP platform for over 15 years. Almost every electronic device, especially networking equipment, has some form of ASIC in it, but you may see them as different types of memory or System-on-Chip (SoC) implementations. ASICs can perform tasks at the line-rate of most interfaces, being optimized for specific and highly repetitive tasks, such as sorting traffic by IP address and port or SSL session key exchange/negotiation. General purpose x86 CPUs are not optimized to handle computationally intensive and repetitive operations like cryptographic negotiation at high rates of speed because they have to reread and evaluate the algorithm for each operation, taking up clock cycles and slowing down all the other processes running on the CPU. Offloading these operations to a purpose-built device like an ASIC frees up the cycles on the CPU, allowing for more complex and dynamic operations, such as interpreting and manipulating traffic with iRules, to perform at a higher rate of speed overall. F5 recognized the need to push different operations off to ASICs early on, incorporating ASICs to offload compression and SSL negotiation operations and eventually designing its own ASICs, the Packet Velocity ASICs (PVA): PVA1, PVA2, and PVA10.
These chips were designed to perform L4 traffic disaggregation at the line rate of the device they were in. The BIG-IP 8800 was the first BIG-IP to have 10G ports, and the PVA10 ensured it could pass L4 traffic at 10 Gbps, or "line-rate." It boasted the highest (at the time) rate of SYN cookie protection against SYN floods: over 9 million per second. The BIG-IP 8800 was also the last platform to have a PVA in it.
Fun Fact: F5's FPGA design teams have over 500 years of experience combined across three development sites. No, this isn't 1000 people who have worked on FPGAs for six months. The average experience level for each person working on FPGA programming at F5 is close to 10 years.
As it became clear over time that application, aka Layer 7 or L7, traffic was growing in usage, it also became clear that a static solution such as an ASIC was not the right tool for traffic processing. ASICs are very fast at what they do, but they are not flexible at all, seeing as all the logic to perform a specific task or tasks is etched in the silicon of the chip itself. Adding functions such as cookie persistence, header insertion and rewriting, and manipulation of the payload of packets was just not possible with an ASIC because it could not be reprogrammed once installed in a device. A more flexible and programmable solution was needed to evolve the optimizations through software upgrades. Enter the FPGA into the BIG-IP architecture. The first BIG-IP device to use FPGAs was the VIPRION B4100 (PB100), about 10 years ago. FPGAs, by their nature, can be reprogrammed and repurposed with different sets of logic and instructions, known as a bitstream. With the introduction of the VIPRION blades and the subsequent BIG-IP appliances, the functionality in the PVA was enhanced and included on these FPGAs as the embedded PVA (ePVA).
Because the bitstream could be updated, unlike the PVA, the ePVA architecture could allow for additional features to be added as new releases arrived, such as providing acceleration for IPv6 traffic, which would become very important for mobile phone networks as well as the Internet of Things (IoT). These updates eventually included being able to provide Denial of Service (DoS) attack mitigation at line rate, thanks to the ePVA. In the latest TMOS release, the bitstream includes mitigation for over 100 different vectors used in DoS attacks and it's updated as needed with new releases. The flexibility to update and tailor functionality was a great leap forward, but this was only a step towards a higher level of capability. While reprogramming the FPGAs to add new functionality was a great benefit of the architecture, the size (number of gates) of the FPGAs of the time didn't allow for many highly-specialized optimizations. The bitstreams for the early FPGA implementations had to be somewhat generic as a result. Now, with the increased capacity of the latest generation FPGAs, bitstreams can contain instructions to optimize traffic in multiple situations, oftentimes increasing the performance of different software modules by a noticeable margin. Leveraging this flexible capability of FPGAs is where TurboFlex comes in to provide higher performance and greater operating efficiency for BIG-IP.
Benefits of TurboFlex
The proof is, as they say, in the pudding, so here are some performance gains when using the Security profile of TurboFlex in conjunction with AFM:
Attack Mitigated | Performance Benefit
SYN Flood | 33% less CPU used
ICMP "Ping of Death" | 9x Packets/Second and Bandwidth Capacity; 38% less CPU used
UDP Flood | 6-13x Packets/Second and Bandwidth Capacity; 56-64% less CPU used
DNS Query Flood | 3x Packets/Second and Bandwidth Capacity
This is only a small sampling of the performance increase and resource saving that TurboFlex provides when implemented.
There are over 110 other DDoS and DoS vectors that can be mitigated in the FPGA, so the resource savings show up across the board when dealing with high-volume attacks. Other profiles, such as the Private Cloud profile, can work with the App and Container Connectors to provide specific optimizations to disaggregate, secure, and direct traffic to those components in different architectures, such as deployments in the Equinix infrastructure.
A Moment On Composable Architectures
Managing a data center has changed quite a bit over the last year, let alone the last 5 years. It's gone beyond simply making sure that bits go from Point A to Point B as fast as possible to ensuring that applications and traffic can be moved and optimized dynamically, all the while handling ever increasing numbers of users. Of course, this transition is definitely not exempt from the mantra heard in meetings and hallways everywhere: "You need to do more, with less..." One of the big changes to the data center environment is the Rise of DevOps (sounding similar to Terminator 3: Rise of the Machines) and how much it requires a data center to be reconfigurable without human intervention. Orchestration is the enabling technology umbrella; the items underneath it must be "composable" to create harmonious operation across all the pieces that define an application in the data center and work in concert with the... OK, too many music references. The idea here is to have an infrastructure that can be dynamically adjusted to meet the needs of changing application requirements and security postures.
How Does TurboFlex Fit Into These Composable Architectures?
Continuing with the theme of tires, DevOps and pit stops have quite a bit in common. No, not the pit stops you might have on a road trip to get out of the car and stretch your legs.
Instead, think of the highly trained and blazingly fast pit stops you see in the major auto racing series like Formula 1 or NASCAR, where pit stops are well under ten seconds to change all four tires (the fastest change was 1.92 seconds by the Williams F1 team in 2016). These changes are not unlike what might be seen in a DevOps world: lightning fast changes to handle varying conditions. TurboFlex can do the same, providing line-rate optimizations which can be changed without rebooting the BIG-IP (a restart of the daemons is required, but that's much faster than rebooting a BIG-IP and a much shorter planned outage). These changes can be accomplished via TMSH, GUI, or, in an upcoming release, iControl-REST calls. Being able to reconfigure a BIG-IP device to optimize a different kind of traffic by enabling TurboFlex profiles through an orchestration system (using iControl-REST) with minimal interruption enables the fast transformation capabilities that are part of the Modern Datacenter(tm).
How Do I Access TurboFlex?
Building off of the Pay-As-You-Grow licensing that was introduced with the previous generation of BIG-IP appliances, each iSeries model is offered as a Standard version (those ending in 600, such as the i7600) and a Performance version (those ending in 800, such as the i7800). TurboFlex is enabled on the Performance versions of each appliance with capabilities determined by the size of the FPGA(s) in the platform. These capabilities, along with others such as vCMP, can be enabled on Standard models at a later date with an additional license key. TurboFlex is quite easy to put into action, since the profiles are enabled based on the modules licensed and provisioned. Originally, TurboFlex profiles were attached to the modules that made use of the optimizations, so if the AFM module was provisioned, the Security profile was enabled. No muss, no fuss, and no other choice.
While easy to operate and reap the benefits of the optimizations, this setup did not allow for customization, especially when running multiple modules as you might with a Better or Best license. In version 13.1, additional flexibility to choose the active profile is available, providing the right optimization for the services that need it most, such as use cases where multiple modules are provisioned on a BIG-IP but one could benefit from hardware offload more than the others. With this capability to choose profiles, access via TMSH and the iControl-REST API will be possible, in addition to the existing TurboFlex components in the management GUI.
Operational Benefits of TurboFlex
Flexibility and specialization are usually not adjectives that go together; they seem somewhat at odds, in fact. In the case of TurboFlex, however, they do work together to provide an accurate description of what can be done to customize a deployment to meet the changing needs in the data center. As mentioned above, the increase in the capabilities and capacities of the latest generation of FPGAs allows for a greater variety of optimizations to be loaded simultaneously in each bitstream. The increase also means that there is a lot more room to add specialized and differentiated optimizations. By having a selection of profiles to tailor the BIG-IP appropriately to the kinds of traffic it may be asked to handle, each TurboFlex-enabled BIG-IP can provide a higher performance-per-watt and a lower cost-per-transaction. TurboFlex also allows for greater consolidation as smaller devices can outperform larger ones from the previous generations, reducing the rack space and cooling required to maintain the same performance point.
Finally, having optimizations performed in hardware at line rate reduces the support costs associated with these performance optimizations because there's no additional configuration or programming required to achieve the higher performance and capabilities TurboFlex provides. Troubleshooting is simplified because there aren't additional items in the GUI which might be set incorrectly or scripts that need to be reviewed line by line to determine if there's an error in the logic. In short: TurboFlex simply works. It's an "Easy Button" for increased performance, just like the aforementioned change of tires on a car.
Conclusion
I need to reiterate this important development: TurboFlex enables reconfiguring hardware resources to optimize traffic for different use cases without changing out the base hardware. This, in and of itself, is a great step forward to provide higher performance and reliability for applications. Traditionally, performance improvements required new hardware, simply because the internal components became faster and more powerful. Unfortunately, the speed of improvement rarely matched the budgeting cycle, so technology refreshes had to wait for depreciation cycles to complete and for major infrastructural changes to be approved, due to needing to swap out gear. TurboFlex provides a way to operate outside of those cycles and adapt to changing conditions, almost dynamically. With additional hardware capabilities being delivered in software with major releases (and with a very fast turnaround), TurboFlex ensures that the iSeries and B4450 are able to keep up with the requirements of delivering highly available applications as they evolve. Application performance doesn't go flat or have a blowout, making for a safer and quicker trip for everyone, and it's easier than changing a tire.
Mitigating 40Gbps DDoS Attacks with the new BIG-IP VE for SmartNICs Solution
Introduction
First off, if this is the first you've heard of this new solution, please go and either check out this Lightboard Lesson or review this Newsroom Article for more context and to bring yourself up to speed with what it is and how it works. In a nutshell though, the BIG-IP VE for SmartNICs solution is comprised of a high performance BIG-IP AFM VE integrated with an Intel FPGA PAC N3000 SmartNIC. By programming an FPGA embedded within the SmartNIC to assume responsibility for detecting and mitigating DDoS attacks, we can offload this function from BIG-IP VE. Processing and blocking all malicious DDoS packets within the FPGA before they reach the network infrastructure alleviates much of the strain such attacks usually place on VE CPU resources while significantly bolstering DDoS performance. If all that sounds too much like marketing fluff for your liking, the purpose of this article is to show just how significant those performance improvements are and how this solution really can protect cloud environments against a range of voluminous, complex attacks. To do so, we are going to compare the performance of the BIG-IP AFM VE for SmartNICs solution against a High Performance BIG-IP AFM VE (software-only) when handling the four different DDoS attack scenarios below:
1. TCP SYN-ACK Flood Attack
2. UDP Flood Attack
3. ICMPv4 Flood Attack
4. Combination of UDP Flood, ICMPv4 Flood and TCP SYN-ACK Flood Attacks
Diagram A shows our basic setup: an Ixia acts as both client and server, generating Malicious and Normal Traffic through the N3000 SmartNIC and BIG-IP VE. Diagram B shows the 3.5Gbps baseline of Normal Traffic against which we measure the effect of the malicious traffic, first with software-only mitigation and then with hardware mitigation ON (SmartNIC FPGA enabled). Note: We are generating Malicious and Normal Traffic off separate ports of the Ixia to max out the malicious traffic port.
Diagram A - Simplified layout of the Test Harness
Diagram B - Baseline 3.5Gbps of Normal Traffic (Goodput)
Test 1 – TCP SYN-ACK Flood Attack
Below in Figure 1 you will see a TCP SYN-ACK Flood Attack performed first with software-only (SmartNIC FPGA disabled); this shows an initial drop of our Goodput at 1.6Gbps of malicious traffic and approaching zero at only 2.4Gbps with 100% CPU usage.
Figure 1 – SYN-ACK Flood Mitigation with High Performance AFM VE (Software-only)
Next in Figure 2 you will see the same attack performed with the SmartNIC FPGA enabled; we pass the software-only limit of 2.4Gbps and increase the malicious traffic to 36Gbps with no effect on the Goodput with only 31.3% CPU usage.
Figure 2 – SYN-ACK Flood Mitigation with BIG-IP AFM VE for SmartNICs
Test 2 – UDP Flood Attack
Below in Figure 3 you will see a UDP Flood Attack performed first with software-only (SmartNIC FPGA disabled); this shows an initial drop of our Goodput at 1.2Gbps of malicious traffic and approaching zero at only 2.4Gbps with 100% CPU usage.
Figure 3 – UDP Flood Mitigation with High Performance BIG-IP AFM VE (Software-only)
Next in Figure 4 you will see the same attack performed with the SmartNIC FPGA enabled; we pass the software-only limit of 2.4Gbps and increase the malicious traffic to 36Gbps with no effect on the Goodput with only 31.3% CPU usage.
Figure 4 – UDP Flood Mitigation with BIG-IP AFM VE for SmartNICs
Test 3 – ICMPv4 Flood Attack
Below in Figure 5 you will see an ICMPv4 Flood Attack performed first with software-only (SmartNIC FPGA disabled); this shows an initial drop of our Goodput at 1.2Gbps of malicious traffic and approaching zero at only 2.4Gbps with 100% CPU usage.
Figure 5 – ICMPv4 Flood Mitigation with High Performance BIG-IP AFM VE (Software-only)
Next in Figure 6 you will see the same attack performed with the SmartNIC FPGA enabled; we pass the software-only limit of 2.4Gbps and increase the malicious traffic to 36Gbps with no effect on the Goodput with only 29.8% CPU usage.
Figure 6 – ICMPv4 Flood Mitigation with BIG-IP AFM VE for SmartNICs
Test 4 – Combined SYN-ACK Flood, UDP Flood Attack and ICMPv4 Flood Attack
Last, we are going to send a combined attack to show a complex mitigation scenario. We will be using the full 40G capability of the Ixia port to generate malicious traffic while still maintaining 3.5Gbps of Goodput from a second stream off another Ixia port. Below in Figure 7 you will see a complex multi-vector attack performed first with software-only (SmartNIC FPGA disabled); this shows an initial drop of our Goodput at 0.8Gbps of malicious traffic and approaching zero at only 2.4Gbps with 100% CPU usage.
Figure 7 – Combined attack mitigation with High Performance BIG-IP AFM VE (Software-only)
In Figure 8 you will see the same attack performed with the SmartNIC FPGA enabled; we pass the software-only limit of 2.4Gbps and increase the malicious traffic to 40Gbps with no effect on the 3.5Gbps of Goodput with only 27.4% CPU usage.
Figure 8 – Combined attack mitigation with BIG-IP AFM VE for SmartNICs
Wrap up
From the results it is very clear that with the FPGA enabled on the Intel PAC N3000 SmartNIC, BIG-IP AFM VE can handle single or complex multi-vector attacks without saturating the CPU or disrupting normal traffic flowing through the system. With software-only mitigation the CPU must deal with every packet entering the system, which quickly exhausts resources. With the assistance of the FPGA on the SmartNIC the malicious traffic is blocked before it ever reaches the CPU, preventing saturation.
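As a quick worked comparison, the figures reported in the four tests can be turned into headroom ratios. This is only a sketch over the numbers quoted above; the dictionary layout and the `headroom` helper are illustrative names, not part of any F5 tooling. Note that 36Gbps and 40Gbps are the highest rates offered in the tests, not measured limits, so the ratios are lower bounds.

```python
# (software-only rate where Goodput approaches zero in Gbps,
#  highest malicious rate sustained with the FPGA enabled in Gbps,
#  CPU usage in percent at that rate with the FPGA enabled)
results = {
    "TCP SYN-ACK flood": (2.4, 36.0, 31.3),
    "UDP flood": (2.4, 36.0, 31.3),
    "ICMPv4 flood": (2.4, 36.0, 29.8),
    "Combined multi-vector": (2.4, 40.0, 27.4),
}


def headroom(software_gbps: float, offload_gbps: float) -> float:
    """Ratio of attack traffic absorbed with offload to the software-only ceiling."""
    return offload_gbps / software_gbps


for attack, (software, offload, cpu) in results.items():
    ratio = headroom(software, offload)
    print(f"{attack}: at least {ratio:.1f}x the software-only ceiling, "
          f"at {cpu}% CPU instead of 100%")
```

For the three single-vector tests this works out to at least 15x, and to roughly 16.7x for the combined attack.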
Our normal traffic, or Goodput, is allowed through without interruption. This is important because it represents a customer application, web page, VoIP or other traffic which should not be affected during an attack; otherwise the attacker has met their objective. In summary, we have shown that using a Commercial Off-The-Shelf (COTS) server in conjunction with a SmartNIC and BIG-IP VE delivers protection similar to our BIG-IP iSeries appliances in an augmented software VE (VNF) solution.
Figure 9 – Magnitude of different DDoS attacks both solutions were capable of mitigating
Additional Resources
· F5 Cloud Docs – BIG-IP VE for SmartNICs
· F5 & Intel Solution Brief - BIG-IP VE for SmartNICs