Very Slow Application performance behind F5

We have one application that performs very poorly behind the F5. There is a 9-second delay on the initial GET request going through the VIP. If we bypass the F5 and go to the servers directly, there is no delay. Wireshark shows a lot of reassembled PDUs. I'm no guru with captures, so I'm not sure what this means. Here is our setup:

SSL offloading VIP.

One HTTP pool with 2 members.

TCP LAN- and WAN-optimized profiles on the VIP, plus a OneConnect profile.

We are using SNAT.

We tried disabling Nagle's algorithm: no effect.

We tried enabling Proxy Maximum Segment: no effect.

We tried going through the F5 using HTTP only: no effect.

If we connect to the servers directly, the 9-second initial delay vanishes.

No packet loss on the NICs.

The switch ports are set to 100 Mbps full duplex, as are the F5 NICs.

Two LTM 3400s in an HA pair, version 10.0.1.

We do have a case open with support, but they have not been able to identify the issue in our tcpdumps. Has anyone seen this type of delay only on the initial GET request? Any tips on improving performance? Our other applications behind the F5 don't have this delay.

Thanks,

Marc

hoolio
Cirrostratus
Mar 08, 2010
Hi Marc,

I'd hope the tcpdumps would show enough information to start troubleshooting this. It sounds like you've already checked the main things like port duplex and played around with the TCP profile options. I think your best bet would be to analyze the tcpdumps with F5 Support and go from there.

I have read here that LAN-optimized profiles generally provide better results than the WAN-optimized profile, even for public Internet-facing web apps.

Here are a few related posts:

FastL4 vs Optimized TCP Profiles

http://devcentral.f5.com/Default.aspx?tabid=53&forumid=31&postid=31604&view=topic

Any simple / quick ways to improve web traffic

http://devcentral.f5.com/Default.aspx?tabid=53&forumid=31&tpage=1&view=topic&postid=30500

Implementing TCP Lan Optimized Profile.

http://devcentral.f5.com/Default.aspx?tabid=53&forumid=31&tpage=1&view=topic&postid=31697

ICMP Virtual Server

http://devcentral.f5.com/Default.aspx?tabid=53&forumid=31&tpage=1&view=topic&postid=33817

Aaron
L4L7_53191
Nimbostratus
Mar 08, 2010
A 9-second delay sounds a lot like a couple of DNS timeouts may be happening (perhaps there are a few domain suffixes set up somewhere that don't resolve when hitting the VIP?).

-Matt
pagema1_69881
Nimbostratus
Mar 08, 2010
Thanks, I will look at these posts. I will also follow up once we get a resolution. The delay occurs whether we connect by the VIP's IP address or by FQDN, so I don't think it is DNS related.

Thanks!

Marc
hoolio
Cirrostratus
Mar 08, 2010
Now that Matt mentions it, I remember a customer running a Windows-based app where the app was trying to do a NetBIOS lookup of the client IP. The NetBIOS query was failing, but it took a few seconds to time out. A stab in the dark, but something to check.

Do you see the same delay if you use curl from LTM to the VIP? What about using curl from LTM to the servers directly?
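The curl comparison above can also be scripted. Below is a minimal Python stand-in (not something posted in the thread) that splits one GET into three parts: connect time (including the TLS handshake for https), the wait until the response headers arrive, and the body transfer. This is roughly what curl reports through its `time_connect`, `time_appconnect` and `time_starttransfer` write-out variables. The function name `time_request` and the `verify_tls` switch are illustrative assumptions. Turning verification off is only meant for testing against a VIP that uses a self-signed certificate.

```python
import http.client
import ssl
import time
from urllib.parse import urlsplit


def time_request(url: str, verify_tls: bool = True, timeout: float = 30.0) -> dict:
    """Time one GET, split into connect, wait-for-headers, and body transfer."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    if parts.scheme == "https":
        ctx = ssl.create_default_context()
        if not verify_tls:
            # Testing only: accept e.g. a self-signed certificate on the VIP.
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(
            parts.hostname, parts.port or 443, timeout=timeout, context=ctx)
    else:
        conn = http.client.HTTPConnection(
            parts.hostname, parts.port or 80, timeout=timeout)

    t0 = time.perf_counter()
    conn.connect()                 # TCP handshake, plus TLS handshake for https
    t_connected = time.perf_counter()
    conn.request("GET", path)
    resp = conn.getresponse()      # returns once the status line and headers arrive
    t_first_byte = time.perf_counter()
    resp.read()                    # drain the body
    t_done = time.perf_counter()
    conn.close()

    return {
        "status": resp.status,
        "connect": t_connected - t0,
        "wait": t_first_byte - t_connected,
        "transfer": t_done - t_first_byte,
    }
```

Run it once against the VIP and once against a pool member directly. If the symptoms match the ones described in this thread, `connect` should stay small in both cases, and only the VIP run should show the ~9 seconds, in `wait`.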

I'd think you'd see a definite gap between frames in the tcpdumps that indicates which side is sitting on the connection. Once you figure that out, you can work out why.
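Finding "who is sitting on the connection" comes down to finding the longest silence between consecutive frames and checking who sent the frames on either side of it. The Python sketch below assumes you have already pulled frame number, relative time and sender out of the capture (for example by reading them off Wireshark's packet list). The `Frame` type and the sender labels are hypothetical. The sample values are the frame numbers and intervals from the server-side capture reported later in this thread (GET, ACK 0.14 s later, first response data 8.77 s after that, transfer done about 0.02 s later).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    number: int   # Wireshark frame number ("No." column)
    time: float   # seconds relative to a common reference point
    src: str      # sender label, e.g. "ltm" or "server" (hypothetical labels)


def largest_gap(frames: List[Frame]) -> Tuple[float, Frame, Frame]:
    """Return (gap_seconds, frame_before_gap, frame_after_gap) for the longest silence."""
    ordered = sorted(frames, key=lambda f: f.time)
    if len(ordered) < 2:
        raise ValueError("need at least two frames to measure a gap")
    before, after = max(zip(ordered, ordered[1:]),
                        key=lambda pair: pair[1].time - pair[0].time)
    return after.time - before.time, before, after


def who_is_waiting(frames: List[Frame]) -> str:
    """Describe the longest gap. The sender of the frame that ends the
    silence is usually the side that was holding the connection."""
    gap, before, after = largest_gap(frames)
    return (f"{gap:.2f}s gap between frame {before.number} ({before.src}) "
            f"and frame {after.number} ({after.src})")


frames = [
    Frame(1038, 0.00, "ltm"),     # GET (shown only as Application Data)
    Frame(1042, 0.14, "server"),  # ACK
    Frame(1441, 8.91, "server"),  # first response data (0.14 + 8.77)
    Frame(1544, 8.93, "server"),  # end of transfer, about 0.02 s later
]
print(who_is_waiting(frames))
# → 8.77s gap between frame 1042 (server) and frame 1441 (server)
```

This is a heuristic. Retransmissions or keepalives inside a gap would split it, so check the frames around the reported gap by eye before drawing conclusions.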

Aaron
L4L7_53191
Nimbostratus
Mar 08, 2010
Yep, the issue I've run into was a reverse-style lookup as well. So even if forward resolution of the VIP works, it may be worth checking, especially with SNAT enabled.
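With SNAT enabled, the server sees the SNAT address instead of the real client address. If something on the server does a reverse lookup of its peer and that address has no PTR record, the lookup may only fail after a timeout. The sketch below is a minimal Python check, run on the web server itself, that times a reverse lookup through the OS resolver. The function name `time_reverse_lookup` is illustrative. Python's `socket.gethostbyaddr` may not follow the same resolution path as the application (for example, NetBIOS name resolution on Windows), so a fast result here does not rule the theory out.

```python
import socket
import time
from typing import Optional, Tuple


def time_reverse_lookup(ip: str) -> Tuple[Optional[str], float]:
    """Resolve ip -> hostname via the OS resolver and report how long it took.

    A failed lookup returns (None, elapsed). The elapsed time is what matters,
    because a lookup that fails slowly can stall the application.
    """
    start = time.perf_counter()
    try:
        hostname = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        hostname = None
    return hostname, time.perf_counter() - start
```

For example, call `time_reverse_lookup()` with the LTM's SNAT address and then with a client address that works fine when bypassing the LTM. If only the SNAT lookup takes several seconds, that points toward the lookup theory.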

-Matt
pagema1_69881
Nimbostratus
Mar 10, 2010
Updated info:

This is not DNS related. The capture shows the ACK from the server almost instantly, and the SSL handshake is very fast. After the handshake, nothing happens for 9 seconds before the page begins to download. We are still working with support on this. One caveat is that our LTM is deployed in our DMZ VLAN, and for this app the pool members are on another network. I don't know if this could pose an issue. The tech stated that in most deployments the LTM is directly connected to the same network as its pool members. We are now taking captures near the switches at all connection points to see if we can identify the bottleneck. Again, if we bypass the LTM everything is fine, but of course when we bypass the LTM we are also bypassing our DMZ network.
hoolio
Cirrostratus
Mar 10, 2010
Out of curiosity, during the nine-second hang, is it LTM or the server side that sends the last packet before the gap and the next packet after it?

We have customers who load balance remote apps with no issue. It's not as common as load balancing local servers, but definitely possible.

Aaron
L4L7_53191
Nimbostratus
Mar 10, 2010
@pagema1: note that if it were DNS related, it wouldn't affect the ACK coming back from the server at layer 4. The lookup I am referring to happens higher up the stack on the server side (usually a domain suffix list being traversed) and would very likely be a call off-box from the server to some remote system. Speaking of which, have you done a capture on the server side to see what is going on? That would be a solid data point to have if you haven't already.

The fact that you're seeing a consistent 9 second delay really sounds like an "other connection" timeout of some sort (DNS or not).

-Matt
pagema1_69881
Nimbostratus
Mar 10, 2010
Following up on the second round of packet captures: the file Wireshark_173_030910.pcap (taken on the web server) shows that the difference in connection response time is due to some unknown issue within the server itself.

The display filter tcp.port==2271 (the ephemeral client port) isolates the test session through the LTM to the web server.

Frame 1038 is the GET for the login page (it shows only as 'Application Data' due to encryption).

The ACK is sent 0.14 seconds later in frame 1042. The application data in response to the GET starts in frame 1441, 8.77 seconds after the ACK.

The data transfer continues until frame 1544, though the whole transfer took only about 0.02 seconds.
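Adding up the intervals reported from the server-side capture shows that the server's wait between the ACK and the first response data accounts for nearly all of the roughly 9-second delay:

$$
t_{\text{GET}\rightarrow\text{end}} \approx \underbrace{0.14\,\text{s}}_{\text{GET}\rightarrow\text{ACK}} + \underbrace{8.77\,\text{s}}_{\text{ACK}\rightarrow\text{first data}} + \underbrace{0.02\,\text{s}}_{\text{transfer}} = 8.93\,\text{s} \approx 9\,\text{s}
$$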

So the investigation should focus on the web server, to find out why it waits 8.77 seconds before returning the application data.

Note that we are taking additional captures to pinpoint where this delay lies. When connecting directly to the same web server there is no delay; it only occurs through the LTM. I would think the server should behave the same regardless of whether the request comes through the LTM or not. We continue to research and will update.
naladar_65658
Altostratus
Mar 10, 2010
The only things that come to mind are to try turning on source address persistence and turning off NetBIOS over TCP/IP on those servers' NICs, as hoolio mentioned. I did that on some Citrix boxes recently and it made a big difference.

I am running an F5 unit in the same kind of network setup as yours and haven't had any issues with over 400 virtual servers.

Edit:

What kind of SSL certificates are you using on the F5 virtual servers? Are they self-signed?