
Dave_Whitla_254
Nov 25, 2011

VE cannot establish TCP connections from VLAN interfaces to other guests on untagged ports of the same VLAN

I've just spent a couple of days scratching my head on this one. I may just be slow, but in any case, to spare others the pain, here's the solution.




First the problem.




There may be other combinations which exhibit this behaviour but this is my setup.





The BIG-IP has interface 1.1 connected to the virtual switch, with several VLANs trunked on the switch port. The 802.1Q traffic is untagged by tagged VLAN interfaces configured on the BIG-IP on interface 1.1. I require this setup because I must present more than 7 pool subnets to the BIG-IP. The pool members for virtual services are connected to untagged ports on the various VLANs. The BIG-IP can ping the pool members but cannot establish TCP connections. Likewise, the pool members cannot connect to the BIG-IP (e.g. over SSH).




Strangely, while debugging connections between the BIG-IP and a pool member, you will find that running tcpdump on the BIG-IP magically makes everything work for as long as tcpdump keeps running. Also, when running tshark or tcpdump on the Xen host, bound to either the BIG-IP's switch port or the pool member's switch port, you will notice that it is the pool member which is ignoring all packets from the BIG-IP. The BIG-IP receives and acknowledges every packet from the pool member.




The reason for this odd behaviour, as detailed in this KB solution, is that the BIG-IP does not, in normal operation, calculate the UDP/TCP checksum for packets it transmits. It instead relies on the switch to do this in hardware, reducing the load on the appliance. This is a problem, however, when running VE on Xen or XenServer using either the Linux kernel bridging stack or Open vSwitch to implement a virtual switched network.




Due to issues with the combination of hardware TX checksumming, bridging and VLANs in the Linux kernel (resulting in a short circuit, IIRC), hardware TX checksumming is disabled for virtual interfaces which are bridged to VLANs. Likewise, the Open vSwitch software switch does not compute a checksum on ingress packets from a virtual interface which is configured to offload this to the switch.




Given the configuration described, the BIG-IP is not calculating the TX checksum on egress packets, because it assumes the switch will do this for it. At the receiving end, the nil TCP/UDP checksum on the packet does not match the one calculated by the recipient, and the packet is dropped.
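
To make the failure mode concrete, here is a minimal Python sketch of the standard Internet checksum (RFC 1071) as applied to TCP over IPv4. It is not BIG-IP or Xen code; the addresses, ports and helper names (build_syn, receiver_accepts) are made up for illustration. It builds a bare SYN segment with the checksum field left at zero, as an offloading sender would, and shows that a receiver's verification rejects it, while the same segment with the checksum filled in is accepted.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071: one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                   # reserved byte
        socket.IPPROTO_TCP,  # protocol number 6
        len(segment),
    )
    return internet_checksum(pseudo_header + segment)


def build_syn(src_port: int, dst_port: int, checksum: int = 0) -> bytes:
    """Hypothetical helper: a 20-byte TCP SYN header with no options."""
    offset_and_flags = (5 << 12) | 0x002  # data offset 5 words, SYN flag
    return struct.pack(
        "!HHIIHHHH",
        src_port, dst_port,
        1000,              # sequence number (arbitrary)
        0,                 # acknowledgement number
        offset_and_flags,
        65535,             # window size
        checksum,
        0,                 # urgent pointer
    )


def receiver_accepts(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """A receiver sums the segment including its checksum field.

    A correctly checksummed segment yields 0 here; anything else is dropped.
    """
    return tcp_checksum(src_ip, dst_ip, segment) == 0


if __name__ == "__main__":
    src, dst = "10.0.0.1", "10.0.0.2"  # made-up addresses

    unfilled = build_syn(40000, 22)    # checksum left for "hardware"
    print("unfilled accepted:", receiver_accepts(src, dst, unfilled))

    filled = build_syn(40000, 22, tcp_checksum(src, dst, unfilled))
    print("filled accepted:  ", receiver_accepts(src, dst, filled))
```

ICMP carries its own checksum and does not go through the TCP/UDP offload path, which would be consistent with ping working while TCP connections fail.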




Now the fix.




If you use LTM-VE on Xen, and whatever configuration you deploy results in behaviour similar to the above, you must force software TX checksumming on the BIG-IP.




To do this, from the shell on the BIG-IP:




bigpipe db TM.TcpUdpTxChecksum software-only



bigpipe save all




Personally I think this should be the default on VE, or at least prominently mentioned in the deployment notes.









2 Replies

  • When running on VMware, BIG-IP VE takes advantage of virtualized hardware TCP checksum offload; the VMXNET3 NIC handles this.



    In XenServer, hypervisor VLAN-tagged packets are passed to external devices with correct framing information. Leaving the TMM-generated checksum off improves performance and reduces hypervisor load. If you're forced into a corner where you must do the unsupported dance of passing guest-tagged packets around, this solution is excellent. Hooray for BIG-IP's flexible architecture.




  • I haven't actually tested this on Citrix XenServer, but AFAIK the code in question is identical to that in open-source Xen.


    So I expect identical behaviour on XenServer to that which I have described. Incidentally, the hypervisor doesn't tag anything; the dom0 kernel does. As described, the problem occurs when you must mix trunked and untagged ports on the same host.



    As for virtual hardware TX offload, I doubt it would be a performance improvement, as the LTM VM is fully virtualised, not paravirtualised. As a result, the "virtual hardware" runs in host user space, where you're looking at 8 context switches for every guest I/O:



    * context-switch to guest kernel
    * context-switch to hypervisor
    * context-switch to dom0 kernel
    * context-switch to dom0 qemu-dm
    * context-switch to dom0 kernel
    * context-switch to hypervisor
    * context-switch to guest kernel
    * context-switch to guest application



    If F5 used the pvops kernel and the PV xennet drivers there might be some performance improvement.