This is part two of this article; please find part one at the following link: Exploring BIG-IP VE capabilities on Dell PowerEdge R650 Servers - Part One
The Optimizations We Selected
After careful examination of the delivered Dell R650 and the installed ESXi hypervisor, and after the initial benchmark tests were completed, the following changes were made:
The first change was to disable Hyper-threading (HT) to guarantee that virtual cores (vCPUs) were mapped to physical cores (PCPUs). HT on Intel processors allows each PCPU to appear as two “logical” cores, meaning 32 physical cores can appear to the operating system as 64 threads. When the cores are enumerated, physical cores are interleaved with their corresponding logical cores: cores 0, 2, 4, 6, … are physical cores, while 1, 3, 5, 7, … are the associated logical (hyperthreaded) siblings. The traditional value of HT (allowing highly complex and disparate workloads to share one physical core simultaneously) adds no value in a focused, single-use-case server that seeks optimal performance.
Without disabling Hyper-threading, ensuring that separate PCPUs are used would require a careful assignment of cores so that only physical cores were leveraged (pinned). The simplest approach was to disable Hyper-threading in the Dell server BIOS. To do so, shut down the server, access the server setup upon startup, and select System BIOS via the Integrated Dell Remote Access Controller (iDRAC) module, as shown here:
Next, enter the Processors screen and simply disable Logical Processor with one mouse click (as shown below). After a system save and restart, Hyper-threading will be disabled.
Upon restart, to ensure consistency, one should also quickly confirm that HT has been disabled from the perspective of the hypervisor as well. Access the Dell R650 host via an instance of vCenter/ESXi HTML5 console and examine the hardware configuration of the server from the ESXi perspective, as shown here:
Scrolling down the hardware overview window will bring one to the Processors section, where the initial view will look like the following:
Use the “Edit Hyperthreading” button to gain access to the HT setting and disable the feature by removing the corresponding check mark, as seen in the following image. This ensures that the hypervisor does not re-enable hyperthreading after BIOS updates or changes.
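For those who prefer the command line, the hyperthreading state can also be confirmed from an SSH session on the host. The snippet below is a sketch assuming the esxcli syntax of ESXi 7.x; the guard lets it run harmlessly on a machine that is not an ESXi host.

```shell
# Confirm the hyperthreading state from the ESXi shell (sketch; assumes ESXi 7.x).
if command -v esxcli >/dev/null 2>&1; then
  # Shows the Hyperthreading Supported / Enabled / Active flags for the host.
  ht_info="$(esxcli hardware cpu global get)"
else
  ht_info="esxcli not found; run this on the ESXi host shell"
fi
echo "$ht_info"
```

After the BIOS and vSphere changes above, both the Enabled and Active flags should report false.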
As an additional optimization to harness maximum processing power, a potentially small boost might be achieved by adjusting CPU core affinity to bypass core 0 and use cores 1 through 8 instead.
The recommendation to bypass core 0 stems from previous versions of ESXi, where management tasks and virtual machine networking functions would typically run on the first physical core, CPU0. Although not tested here, this change can be made by logging directly into the host and using the vSphere interface to adjust the affinity setting, as shown below:
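For reference, the same affinity can also be expressed directly in the virtual machine's VMX file. The line below is illustrative only; the comma-separated list is one accepted form, and the values should match the cores you intend to pin on your host:

```
sched.cpu.affinity = "1,2,3,4,5,6,7,8"
```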
In keeping with BIG-IP deployment best practices, we recommend allocating 2 GB of memory for each of the 8 vCPUs, with one TMM (Traffic Management Microkernel) per vCPU, up to 8 TMMs.
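In VMX terms, that sizing corresponds to entries such as the following (illustrative values for an 8-vCPU guest with 2 GB per vCPU, i.e. 16 GB in total):

```
numvcpus = "8"
memSize = "16384"
```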
Power Management Optimization
The next optimization applied to the Dell R650 server was to change the default power management active policy to one of the available policies that prioritizes server performance and reduces the CPU throttling known to inhibit latency-sensitive applications (see the VMware KB). The first step of this policy change is made in the BIOS: restart the server, access the console via the physical machine or iDRAC, enter the BIOS menus, and select “System Profile Settings”.
Set the System Profile attribute to the Performance option and save the changes to the BIOS and restart.
With the BIOS setup saved and the server restarted, it is recommended to set the Power Management Active Policy to “High Performance” in the ESXi/vSphere console as well. This ensures that any unintentional BIOS updates or changes are also reflected in the operating system.
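The same policy can be applied from the host shell. This sketch assumes the /Power/CpuPolicy advanced option available in ESXi 7.x; the guard lets the snippet run harmlessly off-host.

```shell
# Set the host power policy from the ESXi shell (sketch; assumes ESXi 7.x).
if command -v esxcli >/dev/null 2>&1; then
  # Valid values include "High Performance", "Balanced", "Low Power", "Custom".
  esxcli system settings advanced set -o /Power/CpuPolicy -s "High Performance"
  power_status="$(esxcli system settings advanced list -o /Power/CpuPolicy)"
else
  power_status="esxcli not found; apply the policy from the ESXi host shell"
fi
echo "$power_status"
```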
High-speed Ethernet Adjustments
One primary concern in achieving high traffic rates through the Dell server is to ensure unhindered performance of the Intel 25 Gbps network interface cards (NICs). To do this, we first confirmed the “icen” drivers for the E810-XXV Dual Port OCP 3.0 adapter were supported on the VMware Hardware Compatibility Matrix.
A check with VMware’s Hardware Compatibility List (HCL) indicated that the Dell factory-installed OS (including the Dell factory ISO) contains a fully supported driver, version 1.6.2. To confirm this in your setup, use the search function in the compatibility tool with values such as those shown below:
The resulting details (shown below) will both identify the latest driver compatible with ESX 7.0 Update 3 and provide a link where the driver can be downloaded and applied to the server, if necessary.
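The installed driver name and version can also be confirmed from the host shell. In this sketch the vmnic number is an illustrative assumption; match it to the vmnics backed by your E810-XXV ports.

```shell
# Check the installed NIC driver and version (sketch; vmnic4 is illustrative).
if command -v esxcli >/dev/null 2>&1; then
  esxcli network nic list                         # enumerate the host's vmnics
  nic_info="$(esxcli network nic get -n vmnic4)"  # driver name, version, firmware
else
  nic_info="esxcli not found; check the driver from the ESXi host shell"
fi
echo "$nic_info"
```

For the E810-XXV adapter, the driver name reported should be "icen", and the version should match the HCL entry found above.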
Our next consideration was to ensure Receive Side Scaling (RSS) was enabled on the Intel NICs. RSS is a technology that distributes incoming network traffic across multiple receive queues so that it can be processed by more than one core on the receiving host. The default driver and default configuration for the Intel NIC were confirmed to both support RSS and have it enabled. The RSS support can be seen in the Features list in the previous firmware availability screenshot.
By default, the virtual NICs of an ESXi-based virtual machine, such as the BIG-IP VE, will only receive traffic from one hardware queue in the physical Intel network interface. To achieve at least 10 Gbps of throughput, it is important to have the virtual machine request traffic from at least four queues offered by the Intel interface.
The virtual machine layer parameter that governs the requested queue count is ethernetX.pnicFeatures = "4", where X is the NIC number in the VMX file (in a default BIG-IP OVA deployment, 0 is Management, 1 Internal, 2 External, 3 HA). This feature does not need to be enabled for the management NIC (ethernet0).
Another parameter that we defined was ethernetX.ctxPerDev, which controls the maximum number of threads available per virtual NIC. In our case, given the layer 2 speeds in question, we set the parameter to 1; a value of 3 would allow for between 2 and 8 threads per virtual NIC, and for much larger workloads/throughput we would consider setting it to 3 in the future. Again, this feature does not need to be enabled for the management NIC (ethernet0).
There are two equivalent approaches to achieving the objectives of queue count of four and a thread per NIC:
1) Manually edit the underlying VMX file onboard the Dell host.
2) Modify the variable using the vCenter/ESXi HTML5 Console.
The console configuration approach is highlighted in the following screenshots where both ethernetX.pnicFeatures and its companion variable ethernetX.ctxPerDev are set. For either approach, the results will be the following assignments in the VMX file:
ethernet1.ctxPerDev = "1"
ethernet2.ctxPerDev = "1"
ethernet3.ctxPerDev = "1"
ethernet1.pnicFeatures = "4"
ethernet2.pnicFeatures = "4"
ethernet3.pnicFeatures = "4"
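For the manual approach (option 1), the edit can be scripted. The sketch below works on a scratch stand-in file so it is safe to try anywhere; on the host, the real file lives under /vmfs/volumes/&lt;datastore&gt;/&lt;vm&gt;/&lt;vm&gt;.vmx (the path and filename here are illustrative), and the VM must be powered off before its VMX file is edited.

```shell
# Work on a scratch stand-in for the real VMX file so this sketch is safe to run.
VMX="./bigip-ve.vmx"
printf 'ethernet1.present = "TRUE"\n' > "$VMX"

# ethernet0 (management) is deliberately skipped; only data-plane vNICs 1-3.
for n in 1 2 3; do
  # Drop any stale entries, then append the queue and thread settings.
  sed -i "/^ethernet${n}\\.ctxPerDev/d; /^ethernet${n}\\.pnicFeatures/d" "$VMX"
  printf 'ethernet%s.ctxPerDev = "1"\n' "$n" >> "$VMX"
  printf 'ethernet%s.pnicFeatures = "4"\n' "$n" >> "$VMX"
done
cat "$VMX"
```

Reload or power-cycle the VM afterwards so that ESXi re-reads the modified VMX file.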
To reach the editing screen for the above in vCenter, select the BIG-IP VE in the Inventory screen, right-click and select “Edit Settings,” followed by “VM Options.” Expand the “Advanced” nested menu item and click on the blue hyperlink labelled “Edit Configuration,” as shown here:
At this point, simply enter the list of variables, ensuring that each virtual NIC can receive traffic from all four hardware queues of the Intel physical interfaces.
The net result of either the GUI variable additions or the alternate approach of editing the VMX file, when examined through vCenter, should look like the following; then just click OK. Ensure that the reconfiguration of the VM completes successfully (if it fails, retry; failures sometimes occur because VMware software is accessing the VMX file at the same time).
Summary of Findings
In the out-of-the-box experience with the Dell R650 server coupled with a newly deployed OVA of F5 BIG-IP Virtual Edition, users can achieve solid performance results. Using an Ixia PerfectStorm load generator with a Performance L4 (Layer 4) profile, a 512 KB payload, and 1,000 concurrent users (100 requests per connection), we were able to achieve greater than 8 Gbps of throughput. With an L7 (Layer 7) profile of HTTP transactions (with the same concurrency and a 128 B payload), we were able to achieve up to 510,000 transactions per second (TPS).
Applying the subsequent optimizations to the Dell server, the VMware hypervisor, and the BIG-IP virtual machine file (VMX), the BIG-IP produced even more impressive performance numbers. With the 10 Gbps Virtual Edition license, the test bed measured the full 10 Gbps of throughput on the same Performance L4 profile (a performance increase of 20%). Under the same L7 HTTP profile, transactions per second also increased significantly, now exceeding 1.28 million (a TPS increase of almost 151%).