Forum Discussion

Marc_57522
Nimbostratus
Jun 06, 2011

High ping latency and loss of connection

I've installed the trial of LTM VE on ESXi 4.1 and, most of the time, it works perfectly. Randomly, however, I lose connectivity to the LTM virtual machine and all nodes behind it. During these outages the LTM cannot ping any nodes, including VMs on the same ESXi host.

This, to me, indicates that the issue isn't with networking outside of the ESXi host, but rather within the virtual machine or the virtual switch. I've moved the VM to another ESXi host, but the problem persists.

Another curious sign is the ping latency from the LTM out to a VM node on the same ESXi host:
PING 172.16.xxx.xxx (172.16.xxx.xxx) 56(84) bytes of data.
64 bytes from 172.16.xxx.xxx: icmp_seq=1 ttl=128 time=7.25 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=2 ttl=128 time=9.26 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=3 ttl=128 time=10.2 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=4 ttl=128 time=10.2 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=5 ttl=128 time=9.12 ms
64 bytes from 172.16.xxx.xxx: icmp_seq=6 ttl=128 time=10.3 ms

--- 172.16.xxx.xxx ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5035ms
rtt min/avg/max/mdev = 7.252/9.421/10.319/1.091 ms

If, on the other hand, I ping from a node to the LTM, I get <1ms latency. So:

LTM VE -> MyHost = ~10ms
MyHost -> LTM VE = <1ms
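One thing I may try next is capturing the ICMP exchange on both ends and comparing the timing, to see whether the extra ~10ms shows up on the LTM's side or the node's side. A rough sketch (the VLAN name "internal", the node interface eth0, and the masked addresses are placeholders for my setup):

# From the LTM VE's bash prompt: capture echo request/reply on the internal VLAN,
# printing the time delta between packets (-ttt)
tcpdump -nni internal -ttt icmp and host 172.16.xxx.xxx

# On the Linux VM node: capture the same exchange on its interface
tcpdump -nni eth0 -ttt icmp

If the node's capture shows it answering immediately while the LTM doesn't see the reply until ~10ms later, the delay would seem to be added on the VE's receive path or the vSwitch rather than by the node itself.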

Marc

  • A few notes:

    1. My LTM VE instance has two interfaces, but the internal interface handles four tagged VLANs; the external interface carries a single untagged VLAN (roughly the layout sketched below).

    2. Nothing is logged to any of the /var/log files that would be of any help.

    3. Performance graphs don't indicate that I'm hitting any sort of ceiling (there's no load on this yet).

    4. Outages last for 2-3 minutes, then traffic resumes on its own.
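    For reference, the layout from note 1 looks roughly like this in tmsh (the VLAN names, tags, and interface numbers here are placeholders, not my real config):

    # internal interface (1.2): four tagged VLANs
    create net vlan vlan_app1 tag 101 interfaces add { 1.2 { tagged } }
    create net vlan vlan_app2 tag 102 interfaces add { 1.2 { tagged } }
    create net vlan vlan_app3 tag 103 interfaces add { 1.2 { tagged } }
    create net vlan vlan_app4 tag 104 interfaces add { 1.2 { tagged } }

    # external interface (1.1): one untagged VLAN
    create net vlan vlan_ext interfaces add { 1.1 { untagged } }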

  • 1) Intermittent connectivity like this sounds like a duplicate MAC or IP address present elsewhere in the layer 2 infrastructure.

    2) 10ms ping responses from a node generally indicate a slow node/polling driver; the fact that the VE answers the node's pings quickly points to the node side rather than the VE. Are your nodes using E1000 NICs or other fully emulated NIC types? I've seen slow responses there.
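    A couple of quick checks along those lines (just a sketch; the interface names and masked address are placeholders, and this assumes the iputils arping run from a Linux host on the same VLAN):

    # See whether anything else answers ARP for the LTM self IP
    # (-D = duplicate address detection, -I = interface to probe from)
    arping -D -c 3 -I eth0 172.16.xxx.xxx

    # Watch which MAC addresses are answering ARP on that segment
    tcpdump -enni eth0 arp

    # On a VM node, confirm which virtual NIC driver it is really using
    ethtool -i eth0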

  • Posted By qe on 06/06/2011 01:19 PM

    1) Intermittent connectivity like this sounds like a duplicate MAC or IP address present elsewhere in the layer 2 infrastructure.

    Wouldn't a dupe only cause problems on one of the interfaces? I can't ping out of any of the LTM's interfaces while this is going on. Would the LTM log IP conflicts anywhere?

    2) 10ms ping responses from a node generally indicate a slow node/polling driver; the fact that the VE answers the node's pings quickly points to the node side rather than the VE. Are your nodes using E1000 NICs or other fully emulated NIC types? I've seen slow responses there.

    Node-to-node ping (even across two LTM interfaces) is about 1ms. The only pings that exceed this are LTM-to-node. The LTM installation is the trial, v10.1.0.3341, and it has E1000 interfaces. All of my VM nodes use VMXNET 3 adapters, and a few are physical nodes with Broadcom NICs.

    Is there a chance that the trial version of LTM VE is rate-limited by holding every packet for 10ms?
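    One way I can think of to test that (a sketch from the LTM's bash prompt; the node address is a placeholder): see whether the ~10ms changes with packet rate and size. A throughput cap should get worse as the rate or size goes up, while a fixed polling/coalescing delay should stay at roughly 10ms either way:

    # baseline: default 56-byte payload, one packet per second
    ping -c 20 172.16.xxx.xxx

    # higher rate: five packets per second
    ping -c 50 -i 0.2 172.16.xxx.xxx

    # larger payload (1400 bytes, still inside a 1500-byte MTU)
    ping -c 20 -s 1400 172.16.xxx.xxx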

  • Hi Marc,

    Have you ever found a solution to this problem? The exact same situation has been affecting my setup.
  • John_Hall_11177
    Historic F5 Account
    Marc,

    If LTM held packets for 10ms, then you'd see the same delay on LTM-to-LTM interfaces. One other diagnostic step would be to temporarily switch the nodes you're communicating with to the e1000 driver. We've seen some very weird behavior with VMware when receiving traffic on the VMXNET 3 driver: very large packets get handed off to upper layers that aren't expecting them. Apparently the VMXNET 3 driver doesn't always honor the configured MTU of the guest.
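    If you want to look for that on one of the Linux nodes first, a rough sketch (assuming an interface named eth0 and a standard 1500-byte MTU):

    # confirm the guest's configured MTU and the driver actually in use
    ip link show eth0
    ethtool -i eth0

    # watch for frames arriving larger than the configured MTU
    # (1514 bytes = 1500-byte payload + 14-byte Ethernet header)
    tcpdump -nni eth0 'greater 1515'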