LTM TCP Connection management

I am trying to understand why the F5 always shows 2-3 times more active sessions for a pool member than is actaully in the physical server state table. In addition I am seeing a problem with having Linux (ubuntu) and Solaris servers in the same pool. What happens is that the Solaris servers get most all of the connections and the Ubunut servers which are on better hardware are sitting mostly idle... The distribution method we use is least connections (node) and a both performance layer 4 or standard TCP depending on location.

So I guess to questions from this: 1) My uderstanding of LTM is the TCP connections which are closed normally for 4-way/3-way close should be immediately closed on the F5. The server always intiates the active close and hence goes into TIME_WAIT. Why does the pool member active connections always show so much more than the server really has active? (Server side I can see this via netstat and F5 I can use b pool | grep cur)

2) Ubuntu has hard coded 60 sec TIME_WAIT in the kernel but Solaris it is a tuneable paramter which we have set to 10secs for performance reasons. ( These connections are very short/fast so no issues with lower time). Why would the f5 send most everything to Solaris servers on poor hardware which translates to slower response times? ( we are not using oneconnect)

I cant seem to find any data that would explain this behaviour and it does not make any technical sense. we are on archaic code (9.25) which I have no control over but I have not seen this issue with multiple OS before. I have also tried to use a Round Robin pool balance method which also did not work and same behaviour... Does anyone have any logic as what is the problem here?

Thanks Andy

18 Replies

What_Lies_Bene1
Cirrostratus
Sep 04, 2013
It doesn't appear to be hard coded (and it's actually 120s) - see here: http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/
andy_12_5042
Nimbostratus
Sep 04, 2013
It defintely is as I have looked into the source and it is harcoded to 60 seconds.

include/net/tcp.h as:

define TCP_TIMEWAIT_LEN (60*HZ)
andy_12_5042
Nimbostratus
Sep 04, 2013
sorry for formatting on that last post....
What_Lies_Bene1
Cirrostratus
Sep 04, 2013
No worries. I say 120s as the TIME WAIT is 2x MSL and I've assumed the value of 60 is for the MSL rather than the full time wait. Worth testing I guess.

Did you read the article? It can clearly be changed with a sysctrl entry;

net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = XXX
andy_12_5042
Nimbostratus
Sep 04, 2013
yep I read it but that key does not exist in ubuntu and hence is not exposed. ( At least in the 10.4LTS which we are using) I dont believe this is the real issue anyway though as I can do the opposite and increase solaris to 60 seconds. The behavior does not change and TIME_WAIT should only impact the server and not the F5. Meaning if that was an issue , I should see problem with socket creartion for new connections as there would be to many in TIME_WAIT. That is not the case here , so there is something else wrong.

THere is NO way to modify TIME_WAIT on ubuntu 10.4 and I have not looked into new versions but I dont beleive that has changed. I will have to go and double check.
andy_12_5042
Nimbostratus
Sep 04, 2013
this would be kernel most likley and not version, I suppose. maybe this is a module not loaded or something along those lines..
andy_12_5042
Nimbostratus
Sep 04, 2013
Ah I see , looks like the nf module was not loaded . So I fixed that and I can see this key now which I was unaware of.. I will modify just to see if it gives any difference in behaviour and does in fact override the hard code 60 seconds.
andy_12_5042
Nimbostratus
Sep 04, 2013
for the record, this does not override the hardcoded value in the code and does not work . ( I tested) I dont see anything in the source that would honor this change via sysctl. (Its easy to see the timers of a socket in any state with -o flag to netstat)

But again, I cant see how this could be the source of the issue based on how the F5 SHOULD be handling connections. This is where I wish we actaully had support :) but no doubt 9.25 would be a "yeah go ahead and upgrade to newer code first"..
What_Lies_Bene1
Cirrostratus
Sep 04, 2013
True, I've probably been focussed on the wrong thing. Can you tell me what values you have configured in the FastL4 and TCP profiles assigned to the Virtual Servers please, specifically;

-Idle Timeout -Loose Close -TCP Close Timeout

Enabling the last two (if not already) may help. I suspect a bug myself.
andy_12_5042
Nimbostratus
Sep 04, 2013
the particular unit which I am testing on is using this fast L4 profile:

reset on timeout enable
reassemble fragments disable
idle timeout 300
tcp handshake timeout 5
tcp close timeout 5
mss override 1460
pva acceleration full
tcp timestamp preserve
tcp wscale preserve
tcp generate isn disable
tcp strip sack disable
ip tos to client pass
ip tos to server pass
link qos to client pass
link qos to server pass
rtt from client disable
rtt from server disable
loose initiation disable
loose close disable

Forum Discussion

LTM TCP Connection management

18 Replies

Recent Discussions

Provisioning r2600 equipment

Block CBC

active/active Support Declarative Onboarding

Error while running ansible

URL

Related Content

Re: Management Certificate

Prepare BIG-IP Central Manager for Automation

Using iControl REST API to manage F5 BIG-IP Advanced Firewall Manager (AFM)

Managing NGINX App Protect WAF for Beginners

Device Management