Exploring BIG-IP VE capabilities on HPE ProLiant DL365 Gen10 Plus servers - Part 1 of 2

This article continues an investigation into the viability of using popular and readily available rack-mountable servers as an enterprise-grade, performant platform for F5 software, namely BIG-IP Virtual Edition (VE) running on the VMware ESXi™ hypervisor. The server explored is from Hewlett Packard Enterprise (HPE) and is a platform commonly deployed in enterprise environments. It is a mid-range offering that combines a reasonable purchase cost with attractive, modern internal components.  The specific model selected for the study was the HPE ProLiant DL365 Gen10 Plus.

The goal is to understand the expected performance of the resulting BIG-IP and to document precisely the steps followed as it was quickly implemented on a mainstream HPE compute platform.  The hardware, specifically the internal components, was selected in part to keep server delivery to a matter of days, in line with short deployment cycles.

Performance was established in a lab environment using the very popular test and measurement load generators from Keysight’s Ixia division, specifically an Ixia PerfectStorm. The following tests demonstrate that using HPE to provide the compute platform for F5 software achieves the dual benefit of quick deployments and excellent performance results for modern, complex enterprise traffic patterns.

In short, whether purchasing a new server from HPE or leveraging existing hardware already owned, this document details how higher performance outcomes can be achieved when industry leading F5 software is executed using the popular HPE hardware platform.

HPE Server and BIG-IP Software Overview

One common approach to deploying BIG-IP is to use a hardware solution from F5, such as the iSeries or rSeries platforms. An expected question when looking at alternative approaches is whether commercial off-the-shelf (COTS) servers can match the performance of the BIG-IP system on F5 appliances. As a leading server worldwide in terms of customer adoption, an HPE ProLiant server was chosen for rigorous testing. Tests were performed using widely accepted, industry-standard load generation and measurement tools. The server selected for these tests, and its key internal components, were specifically chosen to reflect a readily available, mainstream solution.

It is important to note that we would expect similar outcomes from other HPE servers; this model and its particular internal parts can offer guidance when selecting a similar HPE platform.  The full parts list is available from F5, but the highlights include:

  • HPE ProLiant DL365 Gen10 Plus server (two, to allow for high availability [HA] deployments)
  • AMD EPYC 7543 Processor 2.8G (32 physical cores, only one of two motherboard CPU sockets populated allowing for future scaling)
  • 128 gigabytes of memory (RDIMM, 3200 MT/sec) (8x16GB DIMM configuration to utilize all memory channels)
  • 2x 960 gigabytes of storage in RAID-1 (SSD SAS ISE Read Intensive) for HA
  • Mellanox MT2894 ConnectX-6 Lx Dual Port 10/25GbE interface card equipped with 25 Gbps transceivers for traffic handling and HA configuration sync purposes
  • Broadcom BCM57416 NetXtreme-E 10 Gbps dual port interfaces (OCP 3.0) for ESXi management

See the section below for information on our rationale for selecting the above components over other possible choices.

The operating system of the server, ordered pre-installed from HPE as an out-of-the-box solution, was VMware ESXi version 7.0.3 (Build 20036589).  The ProLiant server was also ordered with an HPE MegaRAID MR216i-p hardware RAID disk controller in one of the PCI-E slots.

After powering up and configuring the ESXi server’s initial values (management, DNS, time, etc.), the next step was to create network port groups with a different VLAN and subnet per port group (e.g., management, internal, external, and high availability [HA]).  The following step was the deployment of a single instance of BIG-IP VE as a virtual machine on the ESXi host. To maximize performance and allow for repeatable deployments, no other virtual machines were deployed on the ESXi instance. BIG-IP VE was deployed using an OVA downloaded from downloads.f5.com and set up by following the BIG-IP VE deployment guide (no SR-IOV), configured with the 8 vCPU deployment model and the previously created networks from the ESXi/vSphere HTML5 console.  This type of deployment usually takes less than 30 minutes to complete.
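For readers who prefer to script the port group step rather than use the console, the following is a minimal sketch that only generates the `esxcli` commands for a standard virtual switch. It is not the procedure used in this testing. The VLAN IDs and the `vSwitch0` name are placeholders (assumptions), and the `portgroup_commands` helper is hypothetical; review the generated commands before running them on an ESXi host.

```python
# Hypothetical port group plan: the names follow the article, but the VLAN IDs
# and the vSwitch name are placeholders to replace with site-specific values.
PORT_GROUPS = {
    "management": 10,
    "internal": 20,
    "external": 30,
    "ha": 40,
}


def portgroup_commands(port_groups, vswitch="vSwitch0"):
    """Return esxcli commands that create each port group and tag its VLAN."""
    commands = []
    for name, vlan_id in port_groups.items():
        # Create the port group on the standard virtual switch.
        commands.append(
            "esxcli network vswitch standard portgroup add "
            f"--portgroup-name={name} --vswitch-name={vswitch}"
        )
        # Assign the VLAN ID to the newly created port group.
        commands.append(
            "esxcli network vswitch standard portgroup set "
            f"--portgroup-name={name} --vlan-id={vlan_id}"
        )
    return commands


if __name__ == "__main__":
    for line in portgroup_commands(PORT_GROUPS):
        print(line)
```

Generating the commands as text, rather than executing them, keeps the sketch safe to run anywhere and makes the intended layout easy to review before it is applied.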

To facilitate enhanced performance reporting and to follow common industry practice, a VMware vCenter™ instance was also used for management. vCenter lends itself to complementary features such as virtual distributed switching (VDS) with the HPE ESXi host.  In this testing, however, all results were achieved using the standard virtual switch (VSS) that can be run on a standalone ESXi server. The vCenter instance was hosted off-box to preserve server performance; its use is considered the industry norm in real-world scenarios, and thus it was leveraged for ESXi management.

HPE Server Components - Selection Rationale

The selection of elements internal to the HPE ProLiant DL365 Gen10 Plus was guided by F5 best practice criteria. Highlights of the decision-making process included the following:

  • An AMD EPYC™ 7003 Series processor was chosen, specifically the 7543 model, launched in 2021, with a 2.8 GHz frequency, 32 physical cores, and up to 64 threads. There are multiple classes of processors that could have been selected (e.g., higher core count CPUs).  The aim for this testing was to balance core count against CPU frequency, and therefore the 2.8 GHz AMD processor was selected. As core counts increase, frequency is typically reduced because of the need to dissipate more heat across the denser set of cores on the processor die.
  • Processor hyperthreading, referred to as simultaneous multithreading (SMT) by AMD, was not enabled. The optimal performance of virtual machines on a server such as the ProLiant should occur when virtual cores map directly to physical cores; this was also the experience measured in recent testing of Dell hardware. Hyperthreading enables two threads per core, effectively giving the appearance of a doubling in the underlying thread count; however, it does not double the performance of the processor.  Our preference in this case continues to be the guarantee that the virtual machine utilizes physical cores.
  • F5 co-develops enhanced drivers for Intel and Mellanox network adapters to allow for features such as SR-IOV support, where a virtual function (VF) of the adapter can be utilized for offloading and direct network access.  This guide was prepared using a Mellanox MT2894 ConnectX-6 Lx adapter.
  • The HPE server was equipped with durable and fast solid-state drives as opposed to spinning hard disk drives. We selected dual 960 GB drives running in RAID-1 to give the operating system (OS) and the BIG-IP VE room for expansion as well as redundancy.  The use of solid-state disks improves data access times for both the OS and VE.  A hardware RAID controller, the HPE MR216i-p, was ordered to maximize performance and occupies the first PCI-E slot.

Performance Test Setup in Lab

The HPE ProLiant DL365, with the factory installation of ESXi 7.0.3, was put through a series of highly controlled test scenarios to see what benchmarks could be achieved with no manipulation of hardware or software. This provides a first impression of what can be expected from the solution with no optimization. BIG-IP VE can be equipped with a range of license types that impose throughput ceilings at levels that reflect real-world scenarios. In this way, F5 delivers a variety of price/performance options to suit a wider range of users. Typical VE licenses are capped at 1, 3, 5, and 10 Gbps. For the purposes of our initial benchmark, a 10 Gbps license was installed. Full license details, including the physical cores each license can leverage, are available here.

An HTTP virtual server was configured on the external side of the BIG-IP deployment, and a corresponding internal pool of 72 configured nodes, representing internal servers, was set up. This is a standard setup within modern datacenters. In keeping with our intention to record a best-case scenario with the HPE server right out of the box, advanced features for which F5 is well known (e.g., detailed iRules and TLS encryption) were not used as part of the initial benchmark. The server profiles utilized were layer 4 TCP-centric for bandwidth measurements and layer 7 HTTP-centric for transaction rate tests.

The test bed is depicted in the following diagram.

Initial Test Results with HPE Server Out-of-the-Box Setup

The objective of the first set of tests was to determine

1) the achievable throughput of the solution, established with massive numbers of clients downloading large 512-kilobyte objects, and

2) the maximum HTTP transaction rate. Transaction rate is measured when users rapidly request many small 128-byte pages. In both scenarios, 100 transactions were conducted per established TCP connection before the connection was torn down.

Throughput Result:

With the Ixia PerfectStorm solution instructed to step up to 1,024 concurrent simulated users (clients) with 100 requests per TCP connection, each retrieving large 512KB (kilobyte) objects, the steady state achieved was just over 9.5 Gbps of sustained traffic across the HPE ProLiant and BIG-IP VE pairing.  The BIG-IP utilized a layer 4 oriented server profile.  Interestingly, CPU availability provided by the HPE server was not a bottleneck, with ample CPU reported available even at full load.  Load was spread evenly across cores with no core tasked more than 30 percent.  The corresponding transaction rate for these large downloads was measured to be approaching 2,300 transactions per second.
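As a quick sanity check on how the throughput and transaction figures relate, the following sketch converts a sustained bit rate into an object delivery rate. It assumes that 512KB means 512 × 1,024 bytes and that all measured bandwidth carries object payload, ignoring HTTP, TCP/IP, and Ethernet overhead; both are simplifications rather than details taken from the test configuration.

```python
def transactions_per_second(throughput_gbps, object_bytes):
    """Objects delivered per second if all bandwidth carried object payload."""
    bits_per_object = object_bytes * 8
    return throughput_gbps * 1e9 / bits_per_object


if __name__ == "__main__":
    # Assumption: 512KB = 512 * 1024 bytes; protocol overhead is ignored.
    rate = transactions_per_second(9.5, 512 * 1024)
    print(f"{rate:,.0f} transactions per second")
```

Under these assumptions, 9.5 Gbps works out to roughly 2,265 objects per second, which is in line with the measured rate approaching 2,300 given that the sustained throughput was just over 9.5 Gbps.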

 

Transaction Rate Result:

When layer 7 (HTTP) transactions (as opposed to bandwidth) became the focus, the smaller 128B (byte) page download meant many more potential transactions per second. Since the BIG-IP platform by nature performs a full proxy function, the achievable transaction rate is likely just as important as the supported bandwidth. The BIG-IP server profile leveraged was a layer 7 oriented configuration with basic iRules also enabled.  HTTP-centric features such as cookie-based server load balancing become available when using layer 7 profiles. The measured, sustained maximum HTTP transaction rate was determined to be in excess of 365,000 transactions per second. It is important to note that, as with throughput, this is simply out-of-the-box performance.

Transaction performance, depicted above, achieved with a layer 7 HTTP-aware server profile, could reach even greater values when using a layer 4 TCP-centric server profile, where it was observed to exceed 600,000 HTTP transactions per second.  A last data point worth noting occurred in test scenarios that restricted each TCP connection to a single HTTP transaction, thus maximizing the new TCP connections per second.  In this case as many as 195,000 TCP connections per second could be established, all with an out-of-the-box deployment.

Optimized Test Results with Specific Adjustments to the HPE ProLiant and ESXi 7.0.3

The impact of the optimizations to both the underlying HPE server and the BIG-IP virtual machine was significant, although the out-of-the-box performance numbers were already exceptionally good when equipped with a 10 Gbps F5 license. The details of those optimizations are provided in a subsequent section, while this analysis focuses entirely on the new results.

The throughput-oriented tests once again had the Ixia PerfectStorm stepping the number of active users up to 1,024, with 100 requests per TCP connection and a 512KB (kilobyte) object retrieved with each successive request over the TCP connection. After implementing the optimizations, throughput immediately rose to right on the 10 Gbps threshold, versus the originally measured 9.5 Gbps.

The 0.5 Gbps of extra throughput corresponds to an increase of roughly 5 percent over out-of-the-box performance. Also of importance, testing was performed with a 10 Gbps virtual edition license, suggesting the underlying HPE ProLiant server and interfaces could potentially have supported even more throughput.

A test that removed the 10 Gbps license from the equation was run simply to estimate what the solution could theoretically reach, keeping in mind that the two Mellanox Ethernet interfaces were clocked at 25 Gbps.  The result is depicted below and indicates that traffic handling by HPE hardware and F5 software was potentially bounded only by interface clock speed.

 

To measure transactions per second, with the 10 Gbps license in place, tests were performed using essentially the same logic as the Ixia PerfectStorm throughput test but with much smaller 128B (byte) objects retrieved 100 times per TCP connection. This resulted in more transactions attempted and even higher incremental gains. With the optimized server and virtual BIG-IP, running a layer 7 HTTP virtual server profile including basic iRules, the increase was just over 125 percent, with 822,000 transactions per second achieved on the optimized deployment versus 365,000 measured in the out-of-the-box test.  It was noted that transactional testing generally imposed more load on the CPU cores than bandwidth testing; an average core utilization of 65 percent was observed at peak loads.
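The percentage gains quoted for the optimized deployment can be reproduced with a few lines of arithmetic. The sketch below uses only the figures reported in this article; the `percent_increase` helper is a hypothetical name introduced for illustration.

```python
def percent_increase(baseline, optimized):
    """Relative gain of the optimized result over the baseline, in percent."""
    return (optimized - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Figures as reported in the article: out-of-the-box versus optimized.
    throughput_gain = percent_increase(9.5, 10.0)           # Gbps
    transaction_gain = percent_increase(365_000, 822_000)   # transactions/s
    print(f"Throughput gain:  {throughput_gain:.1f} percent")
    print(f"Transaction gain: {transaction_gain:.1f} percent")
```

This gives about 5.3 percent for throughput and about 125.2 percent for layer 7 transactions, matching the rounded figures in the text. The throughput gain is capped by the 10 Gbps license, so it likely understates what the optimizations made possible.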

To recap, BIG-IP virtual edition licenses in wide use today are bandwidth oriented, which allows selection of the appropriate trim level; our use of a 10 Gbps license for this testing was reflected in the fact that bandwidth plateaued at the 10 Gbps mark on the optimized server.  Heavy transactional workloads generally are not affected by license bandwidth caps, as our peak transactions per second resulted in barely 3 Gbps of sustained throughput.  With respect to peak TCP connections per second, testing with only a single small 128B (byte) object transferred on each connection was conducted on the optimized solution, and sustained measurements of 360,000 TCP connections per second were achieved.
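To illustrate why small-object workloads stay well under the license cap, the following sketch estimates the payload bandwidth at the peak transaction rate and the average number of bytes on the wire per transaction. It treats "barely 3 Gbps" as exactly 3.0 Gbps, which is an assumption, and the attribution of the remaining bytes to HTTP headers and TCP/IP framing is an interpretation rather than a measured breakdown.

```python
def payload_gbps(transactions_per_second, object_bytes):
    """Bandwidth consumed by object payload alone, in Gbps."""
    return transactions_per_second * object_bytes * 8 / 1e9


def bytes_per_transaction(throughput_gbps, transactions_per_second):
    """Average bytes on the wire per transaction at a given throughput."""
    return throughput_gbps * 1e9 / 8 / transactions_per_second


if __name__ == "__main__":
    peak_tps = 822_000  # optimized layer 7 result reported in the article
    # Assumption: "barely 3 Gbps" is treated as 3.0 Gbps for this estimate.
    print(f"Payload only: {payload_gbps(peak_tps, 128):.2f} Gbps")
    print(f"Wire bytes per transaction: {bytes_per_transaction(3.0, peak_tps):.0f}")
```

The 128-byte payloads alone account for about 0.84 Gbps, and 3 Gbps spread over 822,000 transactions averages out to roughly 456 bytes each. Under these assumptions, most of the traffic in the transactional test is protocol overhead rather than object data, which is consistent with the 10 Gbps license cap not being the limiting factor.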

Please click the following link to be taken to part two of this article, where the specifics on what optimizations were conducted are discussed in detail:

Exploring BIG-IP VE capabilities on HPE ProLiant DL365 Gen10 Plus servers - Part 2 of 2 

 

Updated Feb 21, 2023
Version 2.0