Forum Discussion

rld_178240
Nov 25, 2014

Slow application performance when using BIG-IP LTM VE for load balancing.

The Question

 

Is there some limitation on the BIG-IP F5 LTM that prevents it from being able to support a large user load and/or are there any configurations that could specifically interfere with performance of an application?

 

Supporting Information

 

I am load testing our application and want to utilize our virtual F5 as the load balancer for the various clusters. Currently I am seeing slow response times when using a virtual F5 as the load balancer. The slow performance is most apparent when the system is under load; while I can see slow performance in single-user tests, it is not as extreme as when the system is under a large user load. When using other load-balancing options (in our specific case, NLB) I am not seeing the same slow performance.

 

I have tried a variety of debugging steps, but none have really helped me pinpoint the cause of the problem.

 

1. Checking VM resource usage

 

All VMs show acceptable resource consumption and availability (no VMs appear to be strained... this includes the F5 VM.)

 

2. Checking ESX host usage

 

All ESX hosts show acceptable resource consumption and availability (no ESX hosts appear to be strained.)

 

3. Adjustments to F5 configuration

 

Disabled OneConnect in the F5 virtual server configuration. An issue was discovered earlier with using OneConnect with our application on the 11.x versions of F5. This change did not have any obvious effect on test results.

 

4. Test validation

 

We have run the same test with another load-balancer setup, and that configuration shows acceptable response times. Is there perhaps a limitation on the virtual server as to the maximum number of users we can expect to support? If so, what might this number be?

 

5. Comparison against other environments

 

I have compared configurations between an environment using a physical F5 and the virtual F5 setup I am having issues with. I am not seeing any noticeable differences that would potentially cause performance issues. It should be noted that the physical F5 is yielding expected response-time performance.

 

6. Analytics from the F5 console

 

Monitoring latency of pool members (while under load) shows an average latency of 1,000+ ms. This seems high, particularly for a virtualized environment. Is this perhaps an F5 VE limitation, or is there something at the F5 configuration level that we are overlooking that could cause this?
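One way to sanity-check a number like that is to time a request straight at a pool member, bypassing the BIG-IP, and compare it with the same request through the virtual server; both addresses below are placeholders:

```shell
# Time a request directly against a pool member (placeholder address)...
curl -o /dev/null -s -w 'direct:  ttfb=%{time_starttransfer}s total=%{time_total}s\n' http://10.0.0.11/

# ...then the same request through the virtual server (placeholder address).
# A large gap between the two points at the LTM path rather than the member.
curl -o /dev/null -s -w 'via LTM: ttfb=%{time_starttransfer}s total=%{time_total}s\n' http://10.0.0.100/
```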

 

7. Changing vNIC type

 

It had been suggested that the particular vNIC being used may be a possible source of the bad behavior. As such, the vNIC type was changed from Intel to VMXNET3. Test executions thus far have not shown any noticeable change, but there are additional tests to be executed along this avenue.

 

8. DynaTrace Analysis

 

This is the latest testing being done, and as such the results are still under analysis. However, initial test runs suggest that the majority of the extra time is being spent in two points:

 

  • Requests between load test agents and
  • Requests between Java Tier and IIS host

This seems to suggest that the F5 is somehow bottlenecking the requests.

 

Version Information

 

  • F5 Version: BIG-IP v11.3.0 (Build 2806.0)
  • Node OS Version: Windows 2008 R2
  • VMware Tools Version: 9.4.0, build-1280544

Thank you in advance for any help that can be provided.

 

  • JG
    Not sure what you mean by: "Requests between load test agents and". It's hard to say without knowing the application architecture and how the application works.
  • OK, standard set of questions here:

     

    • Please double-check all vNICs (apart from mgmt) are VMXNET3
    • What TCP profile settings are being used client and server side?
    • What vNICs are assigned to the servers?
    • Are you using any FastL4 virtuals?
    • Have you enabled TCP Segmentation Offload (TSO) in VE?
    • Have you enabled Large Receive Offload (LRO) in the hypervisor?
    • If your physical host NICs support it, have you enabled SR-IOV support?
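    A couple of these can be checked from the command line; the tmsh db variable is the one referenced in the v11.4 release notes, while the esxcli setting name is an assumption and should be verified for your ESXi version:

    ```shell
    # On the BIG-IP VE: check whether TMM's TCP Segmentation Offload is enabled.
    tmsh list sys db tm.tcpsegmentationoffload

    # On the ESXi host: check whether Large Receive Offload is enabled for the
    # default TCP/IP stack (setting name assumed; verify on your ESXi build).
    esxcli system settings advanced list -o /Net/TcpipDefLROEnabled
    ```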

    There are lots of factors that may influence things and that can be tweaked, including:

     

    • TCP Slow Start
    • Nagle (disabled by default) and Delayed ACKs (enabled by default)
    • Server/host receive window, initial congestion window etc.

    Anyway, if you can answer/investigate my initial questions we can take it from there.

     

  • Apologies for the typo. It is not letting me save edits to the original question. That should have been Requests between load test agents and IIS host.
  • Alright. Sorry for the delay. Interestingly enough, I came across a similar issue in my own environment today, where file transfers take approximately 10x longer via 'traffic' interfaces compared to the management interface.

     

    We can probably leave the TSO/LSO options alone, especially if FastL4 isn't being used. SR-IOV would be nice but shouldn't make a massive difference.

     

    • Create custom client and server side TCP profiles

       

    • I'd suggest bumping the TCP initial congestion window up to 10 if you can, both server and client side. On the servers/hosts this should be done per http://andydavies.me/blog/2011/11/21/increasing-the-tcp-initial-congestion-window-on-windows-2008-server-r2/; within the TCP profile, change the Initial Congestion Window Size from the default of 0 (which actually means 4).

       

    • Enable Proxy MSS client side

       

    • Up the Initial Receive Window Size server side

       

    • If the client side is lossy/wireless, consider enabling D-SACK

       

    • Ensure Congestion Control is set to High Speed. If clients are on a lossy network, consider New Reno instead.
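    As a rough sketch of those changes in tmsh (attribute names and values are from memory of v11.x and should be double-checked on your build; the profile names are placeholders):

    ```shell
    # Client-side TCP profile: ICW of 10, proxy MSS, D-SACK, High Speed
    # congestion control (profile name is a placeholder).
    tmsh create ltm profile tcp app-tcp-client defaults-from tcp \
        init-cwnd 10 proxy-mss enabled dsack enabled congestion-control high-speed

    # Server-side TCP profile: ICW of 10 and a larger initial receive window
    # (value in MSS units; tune to taste).
    tmsh create ltm profile tcp app-tcp-server defaults-from tcp \
        init-cwnd 10 init-rwnd 10
    ```

    The two profiles are then assigned client side and server side respectively on the virtual server.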

       

    Let me know how you get on.

     

  • Thanks for the extensive feedback. VE should be able to handle such a load with ease.

    This command may provide some insight:

    more /var/log/tmm | grep UNIC
    

    My v11.4.1 VE (OpenStack on KVM on RHEL) returns this:

    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC: set MTU for eth1 to 9000
    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC: eth1 supports tx csum
    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC: eth1 supports TSO
    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC [un1] unic_attach(450): Hardware checksum offload is enabled
    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC [un1] unic_attach(462): TCP segmentation offload is enabled
    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC [un1] unic_attach(466): VLAN hardware checksum is disabled
    <13> Dec  3 11:54:18 host-192-xxx-xxx-xxx notice UNIC [un1] unic_attach(470): VLAN TCP segmentation offload is disabled
    

    I'm not sure what the last two lines might indicate, and unfortunately I can't test under load. It would be interesting to see what you get back on v11.3.
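    As an aside, a trivial filter pulls just the offload status out of that log; shown here against a pasted sample rather than the live /var/log/tmm:

    ```shell
    # Count UNIC offload features reported as disabled, using sample log lines;
    # on a live box, grep /var/log/tmm directly instead of this here-string.
    log_sample='<13> Dec  3 11:54:18 host notice UNIC [un1] unic_attach(466): VLAN hardware checksum is disabled
    <13> Dec  3 11:54:18 host notice UNIC [un1] unic_attach(470): VLAN TCP segmentation offload is disabled
    <13> Dec  3 11:54:18 host notice UNIC [un1] unic_attach(462): TCP segmentation offload is enabled'

    disabled=$(printf '%s\n' "$log_sample" | grep 'UNIC' | grep -c 'is disabled')
    echo "offload features disabled: $disabled"
    ```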

    I think, until you get to v11.5 at least, the only logical next step is to do a packet capture client and server side and try to identify where the delay actually is.

  • You may also want to consider these from the v11.4 and v11.5 release notes, respectively:

     

    ID 409234

     

    "FastL4 Virtual Servers can experience very low throughput on Virtual Edition with TCP Segmentation Offload disabled. The customer will notice a large amount of Transmit Datagram Errors for the fastl4 profile (tmsh show ltm profile fastl4)" "The customer must be running Big-IP version 11.4 Virtual Edition with at least one fastL4 virtual server configured. The customer must additionally have TCP Segmentation Offload (TSO) disabled in the TMM (sys db tm.tcpsegmentationoffload). The customer may see low throughput numbers in this configuration if their hypervisor has Large Receive Offload (LRO) enabled. This is a hypervisor configuration and is beyond our control. The customer may also see these low throughput numbers when their Virtual Edition is passing traffic to other virtual machines running on the same physical hypervisor." FastL4 virtual servers affected will have very low throughput. "The customer should enable TCP Segmentation Offload by modifying 'sys db tm.tcpsegmentationoffload'. The customer may also resolve this issue by disabling large Receive Offload (LRO) on any hypervisor they plan on running Virtual Edition."

     

    ID 455361

     

    Fixed improper handling of ICMP (Internet Control Message Protocol) 'Fragmentation Required' messages from routers. Bug resulted in extremely inefficient behavior by BIG-IP TCP segmentation offload if path MTU (Maximum Transmission Unit) was smaller than what TCP endpoints negotiated.

     

  • The overall answer to this question is: yes, there is a limitation, but it depends on the license type purchased. The license I had available to me had a 10 Mbps throughput limitation on it, rendering it useless for load-testing scenarios. This information is in the assorted comment threads.

     

    The following references on F5 were helpful, but ultimately I would have liked to find a SOL that explicitly calls out this limitation, as it feels somewhat masked (it may not be when talking to account teams from F5, but unfortunately I came into this post-purchase). It is not listed in the licensing information when running tmsh show /sys license.
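    For anyone hitting the same wall, the symptom is easiest to see by watching aggregate throughput plateau during a test; these are the obvious starting points (output fields vary by version):

    ```shell
    # License details as reported by the unit (the rate limit was not
    # listed here in my case).
    tmsh show sys license

    # Watch throughput during a load test; a hard plateau near the licensed
    # rate (e.g. 10 Mbps) suggests a license throughput limit.
    tmsh show sys performance throughput
    ```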

     

    Thank you again to WLB for all of the help. It was WLB's guidance that helped me find the answer to this question. I am just posting the direct answer so that the thread can be closed.