Handle Over 100 Gbps With a Single BIG-IP Virtual Edition

Cloud computing is an inescapable term. The general public knows that their cat pictures, videos, and memes go to the cloud somehow. Companies and application developers go cloud-first, or often cloud-only, when they develop a new service. They think of the cloud as the group of resources and APIs offered by cloud providers.

Large enterprises and service providers have a bifurcated view of cloud computing: they see a public cloud and a private cloud. A service provider might mandate that any new software or services run within its orchestrated virtualization or bare metal environment, actively discouraging or simply disallowing new vendor-specific hardware purchases. This pushes traditional networking vendors to improve the efficiency of their software offerings and to take advantage of the available server hardware opportunistically.

Behold the early fruits of our labor.

100+ Gbps L4 From a Single VE?!

We introduced the high performance license option for BIG-IP Virtual Edition with BIG-IP v13.0.0 HF1. Rather than a throughput-capped license, you can purchase a license that is restricted only by the maximum number of vCPUs that can be assigned. This license allows you to optimize the utilization of the underlying hypervisor hardware. BIG-IP v13.0.0 HF1 introduced a limit of 16 vCPUs per VE; BIG-IP v13.1.0.1 raised the maximum to 24 vCPUs. Given that this is a non-trivial amount of computing capacity for a single VM, we decided to see what kind of performance could be obtained with the largest VE license on recent hypervisor hardware. The result is decidedly awesome, and I want to show you precisely how we achieved 100+ Gbps in a single VE.

Test Harness Overview

The hypervisor for this test was KVM running on an enterprise-grade rack mount server. The server had two sockets, and each socket held an Intel processor with 24 physical cores / 48 hyperthreads. We exposed 3 x 40 Gbps interfaces from Intel XL710 NICs to the guest via SR-IOV. Each NIC occupied a PCIe 3.0 x8 slot. There was no over-subscription of hypervisor resources.
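
As a rough illustration of the SR-IOV side of this setup, virtual functions on an XL710 can usually be created through sysfs on the KVM host before being passed to the guest. The interface names below are placeholders for your own port names, not the ones used in our harness.

    # Create one virtual function per 40 Gbps port so it can be handed to the VE guest.
    # ens1f0, ens2f0, and ens3f0 are hypothetical interface names; substitute your own.
    for pf in ens1f0 ens2f0 ens3f0; do
        echo 1 > /sys/class/net/${pf}/device/sriov_numvfs
    done

    # Confirm that the virtual functions are now visible on the PCI bus.
    lspci | grep -i "virtual function"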

Support for "huge pages", or memory pages much larger than 4 KB, was enabled on the hypervisor. It is not a tuning requirement, but it proved beneficial in our environment. See: Ubuntu community - using hugepages.
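
For reference, a minimal sketch of how a hugepage pool might be reserved on an Ubuntu KVM host is shown below. The page size and count are illustrative only; size the pool to cover the guest memory you intend to back with hugepages.

    # Reserve 1 GiB hugepages at boot by appending to GRUB_CMDLINE_LINUX
    # in /etc/default/grub (example values, not our exact tuning):
    #   default_hugepagesz=1G hugepagesz=1G hugepages=52
    update-grub && reboot

    # After the reboot, verify the hugepage pool.
    grep Huge /proc/meminfo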

The VE was configured to run BIG-IP v13.1.0.1 with 24 vCPUs and 48 GB of RAM in an "unpacked" configuration, meaning that we dedicated a single vCPU per physical core. This was done to prevent hyperthread contention within each physical core. Additionally, all of the physical cores were on the same socket, which eliminated inter-socket communication latency and bus limitations.
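
One way to realize this "unpacked", single-socket layout under libvirt is to pin each vCPU to its own physical core. The domain name (bigip-ve) and the host CPU numbering below are assumptions; check your own topology with lscpu before pinning.

    # Inspect the core/socket topology to find 24 distinct physical cores on one socket.
    lscpu --extended=CPU,CORE,SOCKET

    # Pin each of the guest's 24 vCPUs to its own physical core
    # (this assumes host CPUs 0-23 map to separate cores on socket 0; adjust as needed).
    for vcpu in $(seq 0 23); do
        virsh vcpupin bigip-ve ${vcpu} ${vcpu}
    done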

The VE was provisioned with LTM only, and all test traffic utilized a single FastL4 virtual server. There were two logical VLANs, and the 3 x 40 Gbps interfaces were logically trunked. The VE had only two L3 presences, one on the client network and one on the server network.
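
The data-plane objects described above could be built with tmsh roughly as follows. The interface numbers, VLAN tags, addresses, and pool members are placeholders rather than the exact test configuration.

    # Aggregate the three 40 Gbps interfaces into a single trunk (no LACP).
    tmsh create net trunk ve_trunk interfaces add { 1.1 1.2 1.3 } lacp disabled

    # One VLAN per L3 presence: client-facing and server-facing.
    tmsh create net vlan client_vlan interfaces add { ve_trunk { tagged } } tag 101
    tmsh create net vlan server_vlan interfaces add { ve_trunk { tagged } } tag 102

    # Self IPs for the two L3 presences.
    tmsh create net self client_self address 10.1.0.245/24 vlan client_vlan
    tmsh create net self server_self address 10.2.0.245/24 vlan server_vlan

    # A pool of test servers and the single FastL4 virtual server that carries all traffic.
    tmsh create ltm pool http_pool members add { 10.2.0.10:80 10.2.0.11:80 }
    tmsh create ltm virtual l4_vs destination 10.1.0.100:80 profiles add { fastL4 } pool http_pool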

In direct terms, this is a single application deployment achieving 100+ Gbps with a single BIG-IP Virtual Edition.

Result

The network load was generated using Ixia IxLoad and Ixia hardware appliances. The traffic was legitimate HTTP traffic with full TCP handshakes and graceful TCP teardowns. A single 512 kB HTTP transaction was completed for every TCP connection. We describe this scenario as one request per connection, or 1-RPC. It's worth noting that 1-RPC is the worst case for an ADC, since the cost of connection setup and teardown is never amortized across multiple requests.

For every Ixia client connection:

  • Three-way TCP handshake
  • HTTP request (less than 200 B) delivered to Ixia servers
  • HTTP response (512 kB, multiple packets) from Ixia servers
  • Graceful TCP connection teardown

The following plot shows the L7 throughput in Gbps during the "sustained" period of a test, meaning that the device is under constant load and new connections are being established immediately after a previous connection is satisfied. If you work in the network testing world, you'll probably note how stupendously smooth this graph is...

The average for the sustained period ends up around 108 Gbps. Note that, as hardware continues to improve, this performance will only go up.

Considerations

Technical forums love car analogies and initialisms, such as YMMV for "your mileage may vary." That caveat applies to the result described above. You should consider the following factors when planning a high performance VE deployment:

  • Physical hardware layout of the hypervisor - Non-uniform memory access (NUMA) architectures are ubiquitous in today's high density servers. In very simple terms, NUMA means that the physical locality of a computational core matters: all of the work for a given task should be confined to a single NUMA node when possible. The slot placement of physical NICs can be a factor as well. Your server vendor can guide you in understanding the physical layout of your hardware, and the verification sketch after this list shows one way to check CPU and NIC locality from the hypervisor itself.

    Example: you have a hypervisor with two sockets, and each socket has 20c / 40t. You have 160 Gbps of connectivity to the hypervisor. The recommended deployment would be two 20 vCPU high performance VE guests, one per socket, with each receiving 80 Gbps of connectivity. Spanning a 24 vCPU guest across both sockets would result in more CPU load per unit of work done, as the guest would be communicating between both sockets rather than within a single socket.
     

  • Driver support - The number of drivers that BIG-IP supports for SR-IOV access is growing. See: https://support.f5.com/csp/article/K17204. Do note that we also have driver support for VMXNET3, virtio, and OvS-DPDK via virtio. Experimentation and an understanding of the available hypervisor configurations will allow you to select the proper deployment; the sketch after this list also shows how to confirm which driver an interface is bound to.
     

  • Know the workload - This result was generated with a pure L4 configuration using simple load balancing, and no L5-L7 inspection or policy enforcement. The TMM CPU utilization was at maximum during this test. Additional inspection and manipulation of network traffic requires more CPU cycles per unit of work.
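
As referenced in the first two bullets above, the sketch below shows a quick way to sanity-check NUMA locality and driver bindings on a Linux hypervisor. The interface name is a placeholder.

    # Show the NUMA topology: which host CPUs and memory belong to which node.
    numactl --hardware

    # Check which NUMA node a NIC's PCIe slot is attached to
    # (-1 means the platform did not report a node).
    cat /sys/class/net/ens1f0/device/numa_node

    # Confirm which kernel driver the interface and its VFs are bound to,
    # then compare against the supported list in K17204.
    ethtool -i ens1f0
    lspci -nnk | grep -A 3 -i ethernet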
     

Published Aug 21, 2018
Version 1.0
  • Hi team, please, I have a doubt: is it mandatory to set all NIC interfaces to SR-IOV for the F5 BIG-IP VE high performance licenses? Thanks. B.R.

  • Greetings rafaelbn!

     

    What you have described seems correct. Disabling LACP is required, which is satisfied by your "channel-group mode on" switch configuration. After that it's just a matter of making sure that the VFs you expose to the BIG-IP VE have driver support.

     

    I'd love to hear about any interesting problems that you solve with this functionality. Feel free to reach out to me directly. You should be able to get my contact information through the partner resources.

     

    Keep building awesome. :)

     

  • Hello Robby! Thanks for the link. I read that article, and the problem for me was seeing the entire solution. We tested a scenario here and I would like to share a topology.

     

     

    Does this schematic make sense? The idea is that ports 1/4 and 1/5 on the switch would be configured with "channel-group mode on". VF3-1 and VF4-1 would be the only VFs associated with interfaces 3 and 4 on the DELL server. That way, I could create a trunk inside the BIG-IP. In my head, this is the only way to make sure the VE will have high throughput. Does that make any sense? Thanks!

     

  • Hello Robby! Great article! I work for an F5 partner here in Brazil. We're going to implement a similar scenario, but with 40G on the VE instead. This is going to be a CGNAT solution inside Linux KVM with SR-IOV. I would love to get more details about the implementation you did in this guide. I'm having some trouble with this solution. Any tips for me? The main problem for me right now is that I don't know how to set up and design the networking part. The DELL server will have 4x10G interfaces. How can I trunk/bundle these interfaces?

     

  • Wow, that's a huge amount of traffic! Super cool.

     

    It would also be interesting to see results with the HTTP profile enabled, and to see how load balancing across several nodes affects performance.

     

    And finally, with basic ASM inspections turned on.