Forum Discussion
LTM TCP Connection management
I am trying to understand why the F5 always shows 2-3 times more active connections for a pool member than are actually in the physical server's state table. In addition, I am seeing a problem with having Linux (Ubuntu) and Solaris servers in the same pool: the Solaris servers get almost all of the connections, and the Ubuntu servers, which are on better hardware, sit mostly idle... The distribution method we use is Least Connections (node), with either a Performance (Layer 4) or Standard TCP virtual server depending on location.
So I guess two questions come out of this: 1) My understanding of LTM is that TCP connections which are closed normally via a 3-way/4-way close should be torn down immediately on the F5. The server always initiates the active close and hence goes into TIME_WAIT. Why does the pool member's active connection count always show so much more than the server really has active? (Server side I can see this via netstat, and on the F5 I can use b pool | grep cur.)
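To be concrete, this is roughly how I am comparing the two counts (the pool name here is just a placeholder):

    # On the F5 (9.x bigpipe): current connections per pool member
    b pool my_http_pool show | grep cur

    # On each back-end server: connections the OS actually has established
    netstat -an | grep ESTABLISHED | wc -l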
2) Ubuntu has a hard-coded 60 second TIME_WAIT in the kernel, but on Solaris it is a tunable parameter, which we have set to 10 seconds for performance reasons. (These connections are very short/fast, so there are no issues with the lower value.) Why would the F5 send almost everything to the Solaris servers on poorer hardware, which translates into slower response times? (We are not using OneConnect.)
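(For reference, the Solaris tuning I am talking about is along these lines - I am quoting the parameter name from memory, and the value is in milliseconds:)

    # Solaris: drop the TIME_WAIT interval to 10 seconds
    ndd -set /dev/tcp tcp_time_wait_interval 10000
    # verify the current value
    ndd -get /dev/tcp tcp_time_wait_interval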
I can't seem to find any data that would explain this behaviour and it does not make any technical sense. We are on archaic code (9.25) which I have no control over, but I have not seen this issue with mixed-OS pools before. I have also tried a Round Robin balancing method, which did not work either - same behaviour... Does anyone have any idea as to what the problem is here?
Thanks Andy
18 Replies
- What_Lies_Bene1
Cirrostratus
It doesn't appear to be hard coded (and it's actually 120s) - see here: http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/
- andy_12_5042
Nimbostratus
It definitely is; I have looked into the source and it is hard-coded to 60 seconds in include/net/tcp.h:
#define TCP_TIMEWAIT_LEN (60*HZ)
- andy_12_5042
Nimbostratus
sorry for formatting on that last post....
- What_Lies_Bene1
Cirrostratus
No worries. I said 120s because TIME_WAIT is 2x the MSL, and I assumed the value of 60 is the MSL rather than the full TIME_WAIT. Worth testing I guess.
Did you read the article? It can clearly be changed with a sysctl entry:
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = XXX
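i.e. something like this (the value here is just an example, and I believe the netfilter conntrack module has to be loaded for the key to exist):

    # set at runtime (value in seconds)
    sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=30
    # or persist it across reboots
    echo "net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 30" >> /etc/sysctl.conf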
- andy_12_5042
Nimbostratus
Yep, I read it, but that key does not exist in Ubuntu and hence is not exposed (at least in the 10.04 LTS we are using). I don't believe this is the real issue anyway, though, as I can do the opposite and increase Solaris to 60 seconds and the behaviour does not change, and TIME_WAIT should only impact the server and not the F5. Meaning, if that were the issue, I should see problems with socket creation for new connections because there would be too many sockets in TIME_WAIT. That is not the case here, so there is something else wrong.
There is NO way to modify TIME_WAIT on Ubuntu 10.04, and I have not looked into newer versions, but I don't believe that has changed. I will have to go and double check.
- andy_12_5042
Nimbostratus
This would most likely be down to the kernel and not the Ubuntu version, I suppose. Maybe it is a module not loaded or something along those lines...
- andy_12_5042
Nimbostratus
Ah, I see - looks like the netfilter conntrack module was not loaded. I fixed that and I can see this key now, which I was unaware of. I will modify it just to see if it makes any difference in behaviour and does in fact override the hard-coded 60 seconds.
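Roughly what I did, for anyone following along (the exact conntrack module name may differ on other kernels, so treat this as a sketch):

    # load the conntrack module so the netfilter sysctl keys are exposed
    modprobe nf_conntrack_ipv4
    # confirm the key is now visible
    sysctl -a 2>/dev/null | grep ip_conntrack_tcp_timeout_time_wait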
- andy_12_5042
Nimbostratus
For the record, this does not override the hard-coded value in the code and does not work (I tested it). I don't see anything in the source that would honour this change via sysctl. (It's easy to see the timers on a socket in any state with the -o flag to netstat.)
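This is the kind of check I mean on the Ubuntu side:

    # -o shows the per-socket timers; the TIME_WAIT sockets still count down from 60s
    netstat -ant -o | grep TIME_WAIT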
But again, I can't see how this could be the source of the issue based on how the F5 SHOULD be handling connections. This is where I wish we actually had support :) but no doubt 9.25 would get a "yeah, go ahead and upgrade to newer code first"..
- What_Lies_Bene1
Cirrostratus
True, I've probably been focussed on the wrong thing. Can you tell me what values you have configured in the FastL4 and TCP profiles assigned to the Virtual Servers please, specifically:
- Idle Timeout
- Loose Close
- TCP Close Timeout
Enabling the last two (if not already) may help. I suspect a bug myself.
- andy_12_5042
Nimbostratus
The particular unit I am testing on is using this FastL4 profile:
- reset on timeout enable
- reassemble fragments disable
- idle timeout 300
- tcp handshake timeout 5
- tcp close timeout 5
- mss override 1460
- pva acceleration full
- tcp timestamp preserve
- tcp wscale preserve
- tcp generate isn disable
- tcp strip sack disable
- ip tos to client pass
- ip tos to server pass
- link qos to client pass
- link qos to server pass
- rtt from client disable
- rtt from server disable
- loose initiation disable
- loose close disable
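If I end up toggling loose close as suggested, I assume it would be something along these lines - I am not certain of the exact 9.x bigpipe wording, and the profile name is a placeholder, so treat this as a sketch:

    # enable loose close on the custom FastL4 profile (currently disabled above)
    b profile fastL4 my_fastL4 loose close enable
    # save the running configuration
    b save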