on 28-Jul-2017 10:15
TurboFlex is the latest stop on F5's 15+ year journey using customized silicon to accelerate and optimize network traffic, but it’s more than just a stop. In fact, it’s more accurate to call it a quantum leap: use-case-oriented hardware optimizations that can be changed with only a minimal interruption in service. TurboFlex, introduced with the iSeries appliances and VIPRION B4450 blade, can increase performance dramatically while reducing the load on other systems, such as the CPU. This advancement is more than just new software. It’s the union of the software and hardware components working together in harmony, more so than on any prior generation of BIG-IP devices.
I thought about how best to describe the impact and advancement of TurboFlex. Listing performance numbers and specific features could tell part of the story, but the performance increase covers only a portion of what TurboFlex really is. Instead, I’ll use an analogy based on a recent trip I took. At the end of June I set out on a 3500-mile road trip and, due to an oversight on my part in checking tread depth, had to replace my tires early in the trip. Fortunately, I was able to get the tires drop-shipped to an installer in a city along my route and installed in about an hour, and I was back on my way to other adventures. While the tires were being installed, I wandered off, found a coffee shop, and thought about the importance of ordering the right sort of tires for my car, especially for driving in the summer through some beautiful canyon roads (yes, I do contemplate these things because I am a gearhead). I had read some online forums for tire recommendations and, aside from the usual Brand-X-is-better-than-Brand-Y “discussions” (there are some strong opinions), there was a recurring thread about having the right tires for the type of driving you're doing and the season in which you're doing it. It then hit me that this was very similar to the capabilities of the new TurboFlex features in the iSeries and VIPRION product lines.
If you’ve ever been in a car running on tires with limited tread (like mine) or the wrong tread for the conditions, you know how difficult it can be to control the vehicle. Similar to putting different sets of tires on a car to optimize performance for different driving conditions, TurboFlex allows for changing the hardware offload and optimization of an iSeries without needing to purchase a statically configured device (or a season-specific car, to keep with the metaphor).
Because TurboFlex can change profiles to match the optimizations a workload requires without swapping out the BIG-IP hardware, using it is very similar to changing the tires on a car instead of replacing the entire car. Lori MacVittie spoke with Enterprise Networking Planet about how TurboFlex works in concert with other components in the BIG-IP ecosystem, such as App Connector and Container Connector, to provide adaptability and performance for application delivery. This holistic approach is no different than how all the components of a car work in concert to increase safety, performance, and capability for the driver and passengers.
With better performance for the prevailing conditions and the ability to adapt rapidly to new conditions as needed, TurboFlex sounds like the best of both worlds, being both flexible and purpose-built, while not needing to compromise on either. Similar to tires, however, you can't run multiple profiles simultaneously at this point (it doesn't seem wise to run winter tires on two wheels and summer tires on the other two, since they perform so differently).
The iSeries is a substantial step in the evolution of the BIG-IP platform. With increases in cores, RAM, throughput, storage size, and the introduction of 10G interfaces in the entry-level platforms, the iSeries packs a lot of performance and capability in a single RU package. Additionally, having high-performance hardware acceleration for SSL transactions that use Elliptic Curve Cryptography (ECC) certificates across the appliance line is a major step forward in meeting the growing requirements for ECC certificates, such as with mobile device connections or the Internet of Things (IoT).
In a larger sense, TurboFlex is the "glue" that pulls software and hardware together in BIG-IP.
Included in this umbrella diagram are a few components that haven't been talked about yet: TCAMs and the L2 Switch. In and of themselves, these components are fast and flexible and, in other applications, have a long history of working well together. Ternary Content-Addressable Memory (TCAM) chips have served in switches, routers, and firewalls for quite some time because they are able to provide high-speed table lookups, often several hundred thousand lookups per second. This capability is great for providing white/black/gray-list capabilities for ruleset evaluation in a firewall.
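To make the lookup idea concrete, here is a minimal software sketch of ternary matching. It is only a model: a real TCAM compares the key against every row in parallel in hardware, while this loop checks rows one at a time. The names `TernaryEntry`, `entry_from_cidr`, and `lookup`, the addresses, and the action labels are all hypothetical and are not taken from any F5 product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass(frozen=True)
class TernaryEntry:
    """One table row: bits set in `mask` must match, cleared bits are "don't care"."""
    value: int
    mask: int
    action: str


def entry_from_cidr(cidr: str, action: str) -> TernaryEntry:
    """Build a ternary entry from an IPv4 prefix (hypothetical helper)."""
    net = ip_network(cidr)
    return TernaryEntry(int(net.network_address), int(net.netmask), action)


def lookup(table: List[TernaryEntry], address: str, default: str = "no-match") -> str:
    """Return the action of the first (highest-priority) row that matches."""
    key = int(ip_address(address))
    for entry in table:  # hardware would evaluate all rows at once
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return default


# Rows are ordered by priority, most specific first.
table = [
    entry_from_cidr("203.0.113.7/32", "allow"),        # white-list a single host
    entry_from_cidr("203.0.113.0/24", "deny"),         # black-list the rest of its subnet
    entry_from_cidr("198.51.100.0/24", "rate-limit"),  # gray-list another subnet
]

for addr in ("203.0.113.7", "203.0.113.99", "198.51.100.5", "192.0.2.1"):
    print(addr, "->", lookup(table, addr))
```

The mask is what makes the match "ternary": each bit is effectively 0, 1, or "don't care", so one row can cover a single host or an entire subnet.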
When you integrate that capability with the L2 Switch and the FPGA capabilities, you've got a system that can provide protection against DDoS attacks as well as enforce Access-Control Lists (ACLs), all at line rate. The great advantage of this integration is that none of these lookup requests hit the main CPUs. A CPU's capacity to do more complicated work is diminished when its cycles are taken up by repetitive tasks, such as looking up and evaluating the firewall ruleset, so offloading those tasks to purpose-built silicon leaves more capacity for more detailed and involved traffic manipulation and evaluation.
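The offload argument can be illustrated with a toy cost model. This is a sketch under made-up assumptions, not a measurement: the cost units (`ACL_COST`, `INSPECT_COST`), the traffic mix, and the function name `cpu_work` are all hypothetical. It only shows why moving ruleset evaluation off the CPU leaves more cycles for deeper inspection.

```python
from typing import Iterable, Set, Tuple

Packet = Tuple[str, int]  # (source IP, destination port): a deliberately tiny model

ACL_COST = 1      # hypothetical CPU units to evaluate the ruleset for one packet
INSPECT_COST = 5  # hypothetical CPU units for detailed inspection of one packet


def cpu_work(packets: Iterable[Packet], denied_sources: Set[str], offload: bool) -> int:
    """Total CPU units spent, with or without ACL evaluation offloaded to hardware."""
    total = 0
    for source, _port in packets:
        if not offload:
            total += ACL_COST   # the CPU itself has to evaluate the ruleset
        if source in denied_sources:
            continue            # dropped (in hardware when offload is on)
        total += INSPECT_COST   # only permitted traffic gets detailed inspection
    return total


# Hypothetical mix: 900 attack packets from one denied source, 100 legitimate packets.
packets = [("203.0.113.9", 80)] * 900 + [("192.0.2.10", 443)] * 100
denied = {"203.0.113.9"}

print("CPU units without offload:", cpu_work(packets, denied, offload=False))  # 1500
print("CPU units with offload:   ", cpu_work(packets, denied, offload=True))   # 500
```

In this model the attack traffic never costs the CPU anything once the ACL check is offloaded, so the gap widens as the attack volume grows.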
The software control components round out TurboFlex: flexible and responsive TMOS features such as iRules and iControl are able to interact with it. Ultimately, this relationship allows the BIG-IP to process more traffic, even with data flows that include very detailed inspections and manipulations.
To gain some perspective on how large a change this is from previous generations and iterations of the Application Delivery Controller (ADC), I need to share a short history lesson. Field Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) have been a part of the BIG-IP platform for over 15 years. Almost every electronic device, especially networking equipment, has some form of ASIC in it, but you may see them as different types of memory or System-on-Chip (SoC) implementations. ASICs can perform tasks at the line-rate of most interfaces, being optimized for specific and highly repetitive tasks, such as sorting traffic by IP address and port or SSL session key exchange/negotiation. General-purpose x86 CPUs are not optimized to handle computationally intensive and repetitive operations like cryptographic negotiation at high rates of speed because they have to reread and evaluate the algorithm for each operation, taking up clock cycles and slowing down all the other processes running on the CPU. Offloading these operations to a purpose-built device like an ASIC frees up the cycles on the CPU, allowing for more complex and dynamic operations, such as interpreting and manipulating traffic with iRules, to perform at a higher rate of speed overall.
F5 recognized the need to push different operations off to ASICs early on, incorporating ASICs to offload compression and SSL negotiation operations and eventually designing its own ASICs, the Packet Velocity ASICs (PVA): PVA1, PVA2, and PVA10. These chips were designed to perform L4 traffic disaggregation at the line rate of the device they were in. The BIG-IP 8800 was the first BIG-IP to have 10G ports, and the PVA10 ensured it could pass L4 traffic at 10 Gbps ("line-rate"). It boasted the highest (at the time) rate of protection against SYN flood attacks using SYN cookies: over 9 million per second. The BIG-IP 8800 was also the last platform to have a PVA in it.
Fun Fact: F5's FPGA design teams have over 500 years of experience combined across three development sites. No, this isn't 1000 people who have worked on FPGAs for six months. The average experience level for each person working on FPGA programming at F5 is close to 10 years.
As it became clear over time that application, aka Layer 7 or L7, traffic was growing in usage, it also became clear that a static solution such as an ASIC was not the right tool for traffic processing. ASICs are very fast at what they do, but they are not flexible at all, seeing as all the logic to perform a specific task or tasks is etched in the silicon of the chip itself. Adding functions such as cookie persistence, header insertion and rewriting, and manipulation of the payload of packets were just not possible with an ASIC because it could not be reprogrammed once installed in a device. A more flexible and programmable solution was needed to evolve the optimizations through software upgrades. Enter the FPGA into the BIG-IP architecture.
The first BIG-IP device to use FPGAs was the VIPRION B4100 (PB100), about 10 years ago. FPGAs, by their nature, can be reprogrammed and repurposed with different sets of logic and instructions, known as a bitstream. With the introduction of the VIPRION blades and the subsequent BIG-IP appliances, the functionality in the PVA was enhanced and included on these FPGAs as the embedded PVA (ePVA). Because the bitstream could be updated, unlike the PVA, the ePVA architecture could allow for additional features to be added as new releases arrived, such as providing acceleration for IPv6 traffic which would become very important for mobile phone networks as well as the Internet of Things (IoT). These updates eventually included being able to provide Denial of Service (DoS) attack mitigation at line rate, thanks to the ePVA. In the latest TMOS release, the bitstream includes mitigation for over 100 different vectors used in DoS attacks and it's updated as needed with new releases.
The flexibility to update and tailor functionality was a great leap forward, but this was only a step towards a higher level of capability. While reprogramming the FPGAs to add new functionality was a great benefit of the architecture, the size (number of gates) of the FPGAs of the time didn't allow for many highly-specialized optimizations. The bitstreams for the early FPGA implementations had to be somewhat generic as a result. Now, with the increased capacity of the latest generation FPGAs, bitstreams can contain instructions to optimize traffic in multiple situations, often increasing the performance of different software modules by a noticeable margin.
Leveraging this flexible capability of FPGAs is where TurboFlex comes in to provide higher performance and greater operating efficiency for BIG-IP.
The proof is, as they say, in the pudding, so here are some performance gains when using the Security profile of TurboFlex in conjunction with AFM:
33% less CPU used
ICMP "Ping of Death"
9x Packets/Second and Bandwidth Capacity
38% less CPU used
6-13x Packets/Second and Bandwidth Capacity
56-64% less CPU used
DNS Query Flood
3x Packets/Second and Bandwidth Capacity
This is only a small sampling of the performance increase and resource saving that TurboFlex provides when implemented. There are over 110 other DDoS and DoS vectors that can be mitigated in the FPGA, so the resource savings show up across the board when dealing with high-volume attacks. Other profiles, such as the Private Cloud profile, can work with the App and Container Connectors to provide specific optimizations to disaggregate, secure, and direct traffic to those components in different architectures, such as deployments in the Equinix infrastructure.
Managing a data center has changed quite a bit over the last year, let alone the last 5 years. It's gone beyond just simply making sure that bits go from Point A to Point B as fast as possible to ensuring that applications and traffic can be moved and optimized dynamically, all the while handling ever increasing numbers of users. Of course, this transition is definitely not exempt from the mantra heard in meetings and hallways everywhere: "You need to do more, with less..."
One of the big changes to the data center environment is the Rise of DevOps (sounding similar to Terminator 3: Rise of the Machines) and how much it requires a data center to be reconfigurable without human intervention. Orchestration is the enabling technology umbrella; the items underneath it must be "composable" to create harmonious operation across all the pieces that define an application in the data center and work in concert with the...OK, too many music references. The idea here is to have an infrastructure that can be dynamically adjusted to meet the needs of changing application requirements and security postures.
Continuing with the theme of tires, DevOps and pit stops have quite a bit in common. No, not the pit stops you might have on a road trip to get out of the car and stretch your legs. Instead, think of the highly trained and blazingly fast pit stops you see in the major auto racing series like Formula 1 or NASCAR where pit stops are well under ten seconds to change all four tires (the fastest change was 1.92 seconds by the Williams F1 team in 2016). These changes are not unlike what might be seen in a DevOps world – lightning fast changes to handle varying conditions. TurboFlex can do the same, providing line-rate optimizations which can be changed without rebooting the BIG-IP (a restart of the daemons is required, but that's much faster than rebooting a BIG-IP and a much shorter planned outage). These changes can be accomplished via TMSH, GUI, or, in an upcoming release, iControl-REST calls. Being able to reconfigure a BIG-IP device to optimize a different kind of traffic by enabling TurboFlex profiles through an orchestration system (using iControl-REST) with minimal interruption enables the fast transformation capabilities that are part of the Modern Datacenter(tm).
Building off of the Pay-As-You-Grow licensing that was introduced with the previous generation of BIG-IP appliances, each iSeries model is offered as a Standard version (those ending in 600, such as the i7600) and a Performance version (those ending in 800, such as the i7800). TurboFlex is enabled on the Performance versions of each appliance with capabilities determined by the size of the FPGA(s) in the platform. These capabilities, along with others such as vCMP, can be enabled on Standard models at a later date with an additional license key.
TurboFlex is quite easy to put into action, since the profiles are enabled based on the modules licensed and provisioned. Originally, TurboFlex profiles were attached to the modules that made use of the optimizations, so if the AFM module was provisioned, the Security profile was enabled. No muss, no fuss, and no other choice. While easy to operate and reap the benefits of the optimizations, this setup did not allow for customization, especially when running multiple modules as you might with a Better or Best license. In version 13.1, additional flexibility to choose the active profile is available, providing the right optimization for the services that need it most, such as use cases where multiple modules are provisioned on a BIG-IP but one could benefit from hardware offload more than the others. With this capability to choose profiles, access via TMSH and the iControl-REST API will be possible, in addition to the existing TurboFlex components in the management GUI.
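As a rough sketch of what an orchestration-driven profile change might look like, the snippet below only builds an iControl-REST style request and prints it; nothing is sent. Treat every specific as an assumption to verify against the documentation for your TMOS version: the endpoint path, the `type` field, and the profile name `turboflex-security` are guesses modeled on the TMSH object hierarchy, and the host and token are placeholders.

```python
import json
import urllib.request

# ASSUMPTION: this endpoint path is illustrative and is not taken from the article.
PROFILE_ENDPOINT = "/mgmt/tm/sys/turboflex/profile-config"


def build_profile_request(host: str, profile: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that selects a TurboFlex profile."""
    body = json.dumps({"type": profile}).encode("utf-8")  # ASSUMPTION: field name "type"
    return urllib.request.Request(
        url=f"https://{host}{PROFILE_ENDPOINT}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "X-F5-Auth-Token": token,  # token-based auth header used by iControl REST
        },
    )


# Placeholder host, profile name, and token, for illustration only.
req = build_profile_request("bigip.example.com", "turboflex-security", "EXAMPLE-TOKEN")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

An orchestration system would send a request like this with `urllib.request.urlopen` or its own HTTP client, and then allow for the daemon restart before directing traffic that depends on the new profile.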
Flexibility and specialization are usually not adjectives that go together; they seem somewhat at odds, in fact. In the case of TurboFlex, however, they do work together to provide an accurate description of what can be done to customize a deployment to meet the changing needs in the data center. As mentioned above, the increase in the capabilities and capacities of the latest generation of FPGAs allows for a greater variety of optimizations to be loaded simultaneously in each bitstream. The increase also means that there is a lot more room to add specialized and differentiated optimizations.
By having a selection of profiles to tailor the BIG-IP appropriately to the kinds of traffic it may be asked to handle, each TurboFlex-enabled BIG-IP can provide a higher performance-per-watt and a lower cost-per-transaction. TurboFlex also allows for greater consolidation as smaller devices can outperform larger ones from the previous generations, reducing the rack space and cooling required to maintain the same performance point. Finally, having optimizations performed in hardware at line rate reduces the support costs associated with these performance optimizations because there's no additional configuration or programming required to achieve the higher performance and capabilities TurboFlex provides. Troubleshooting is simplified because there aren't additional items in the GUI which might be set incorrectly or scripts that need to be reviewed line by line to determine if there's an error in the logic. In short: TurboFlex simply works. It's an "Easy Button" for increased performance, just like the aforementioned change of tires on a car.
I need to reiterate this important development: TurboFlex enables reconfiguring hardware resources to optimize traffic for different use cases without changing out the base hardware. This, in and of itself, is a great step forward to provide higher performance and reliability for applications. Traditionally, performance improvements required new hardware, simply because the internal components became faster and more powerful. Unfortunately, the speed of improvement rarely matched the budgeting cycle, so technology refreshes had to wait for depreciation cycles to complete and for major infrastructural changes to be approved, due to needing to swap out gear. TurboFlex provides a way to operate outside of those cycles and adapt to changing conditions, almost dynamically. With additional hardware capabilities being delivered in software with major releases (and with a very fast turnaround), TurboFlex ensures that the iSeries and B4450 are able to keep up with the requirements of delivering highly available applications as they evolve. Application performance doesn't go flat or have a blowout, making for a safer and quicker trip for everyone, and it's easier than changing a tire.