VE cannot establish TCP connections from VLAN interfaces to other guests on untagged ports of the same VLAN
I've just spent a couple of days scratching my head on this one. I may just be slow, but in any case, to spare others the pain, here's the solution.
First the problem.
There may be other combinations which exhibit this behaviour but this is my setup.
The BIG-IP has interface 1.1 connected to the virtual switch, with several VLANs trunked on the switch port. The 802.1Q tags are handled by tagged VLAN interfaces configured on interface 1.1 of the BIG-IP. I need this setup because I must present more than 7 pool subnets to the BIG-IP. The pool members for the virtual services are connected to untagged ports on the various VLANs. The BIG-IP can ping the pool members but cannot establish TCP connections to them. Likewise, the pool members cannot connect to the BIG-IP (e.g. over SSH).
Strangely, while debugging connections between the BIG-IP and a pool member, you will find that running tcpdump on the BIG-IP magically makes everything work for as long as tcpdump keeps running. Also, when running tshark or tcpdump on the Xen host, bound to either the BIG-IP's switch port or the pool member's switch port, you will notice that it is the pool member which is ignoring all packets from the BIG-IP. The BIG-IP receives and acknowledges every packet from the pool member.
The reason for this odd behaviour, as detailed in this KB solution, is that the BIG-IP does not, in normal operation, calculate the UDP/TCP checksum for packets being transmitted. It instead relies on the switch to do this in hardware, effectively reducing the load on the appliance. This is a problem, however, when running VE on Xen or XenServer using either the Linux kernel bridging stack or Open vSwitch to implement a virtual switched network.
Due to issues with the combination of hardware TX checksumming, bridging and VLANs in the Linux kernel (resulting in a short circuit, IIRC), hardware TX checksumming is disabled for virtual interfaces which are bridged to VLANs. Likewise, the Open vSwitch software switch does not compute any checksum for ingress packets from a virtual interface which is configured to offload this to the switch.
Given the configuration described, what is happening is that the BIG-IP is not calculating the TX checksum on egress packets, assuming the switch will do this for it. Nothing on the virtual switch fills it in either, so at the receiving end the nil TCP/UDP checksum on the packet does not match the one calculated by the recipient and the packet is dropped.
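To make the failure concrete, here is a minimal Python sketch of the check a receiving TCP stack performs. It is my own illustration, not anything taken from the BIG-IP or the Linux kernel: the function names, the addresses (192.0.2.x documentation range) and the port numbers are all made up for the example, and it only models a bare 20-byte SYN with the checksum field left at zero, as described above.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP segment."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment)


def build_syn(src_port: int, dst_port: int, checksum: int = 0) -> bytes:
    """Hypothetical bare 20-byte TCP SYN header with no options."""
    offset_flags = (5 << 12) | 0x002  # data offset of 5 words, SYN flag
    return struct.pack("!HHIIHHHH", src_port, dst_port, 1000, 0,
                       offset_flags, 65535, checksum, 0)


def receiver_accepts(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """A correct segment, summed with its checksum field included, gives 0."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0


src = socket.inet_aton("192.0.2.10")  # stands in for the BIG-IP
dst = socket.inet_aton("192.0.2.20")  # stands in for the pool member

unfilled = build_syn(40000, 22)  # checksum left for "the switch" to fill in
filled = build_syn(40000, 22, tcp_checksum(src, dst, unfilled))

print(receiver_accepts(src, dst, unfilled))  # False: segment is dropped
print(receiver_accepts(src, dst, filled))    # True: segment is accepted
```

The first segment is what the pool member sees when nobody computes the checksum on the BIG-IP's behalf; the second is what it sees once the sender computes the checksum itself in software.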
Now the fix.
If you use LTM-VE on Xen and your configuration results in behaviour similar to the above, you must force software TX checksumming on the BIG-IP.
To do this, from the shell on the BIG-IP:
bigpipe db TM.TcpUdpTxChecksum software-only
bigpipe save all
Personally I think this should be the default on VE, or at least prominently mentioned in the deployment notes.