LTM TCP Connection management

andy_12_5042
Nimbostratus

I am trying to understand why the F5 always shows 2-3 times more active connections for a pool member than are actually in the physical server's state table. In addition, I am seeing a problem when Linux (Ubuntu) and Solaris servers are in the same pool: the Solaris servers get nearly all of the connections, while the Ubuntu servers, which are on better hardware, sit mostly idle. The distribution method we use is Least Connections (node), with either a Performance (Layer 4) or a Standard TCP virtual server depending on location.

 

So I guess two questions come from this: 1) My understanding of LTM is that TCP connections which are closed normally via a 4-way/3-way close should be closed on the F5 immediately. The server always initiates the active close and hence goes into TIME_WAIT. Why does the pool member's active connection count always show so much more than the server actually has? (Server side I can see this via netstat, and on the F5 I can use b pool | grep cur.)
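
For reference, this is roughly how I'm comparing the two sides (pool name and service port are placeholders):

# server side: count sockets by TCP state on the service port
netstat -ant | grep ':80 ' | awk '{print $6}' | sort | uniq -c

# F5 side (9.x bigpipe): current connections per pool member
b pool my_pool | grep cur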

 

2) Ubuntu has a hard-coded 60-second TIME_WAIT in the kernel, but on Solaris it is a tunable parameter, which we have set to 10 seconds for performance reasons. (These connections are very short/fast, so there is no issue with the lower value.) Why would the F5 send almost everything to the Solaris servers on poorer hardware, which translates to slower response times? (We are not using OneConnect.)
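
For context, the Solaris side is tuned roughly like this (the value is in milliseconds):

ndd /dev/tcp tcp_time_wait_interval              # read the current value
ndd -set /dev/tcp tcp_time_wait_interval 10000   # 10 seconds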

 

I can't seem to find any data that would explain this behaviour, and it does not make any technical sense. We are on archaic code (9.2.5), which I have no control over, but I have not seen this issue with multiple OSes before. I have also tried to use a Round Robin pool balance method which also did not work and same behaviour... Does anyone have any logic as to what the problem is here?

 

Thanks, Andy

 

18 REPLIES

What_Lies_Bene1
Cirrostratus

It doesn't appear to be hard coded (and it's actually 120s) - see here: http://www.lognormal.com/blog/2012/09/27/linux-tcpip-tuning/

 

andy_12_5042
Nimbostratus

It definitely is; I have looked into the source and it is hard-coded to 60 seconds.

 

In include/net/tcp.h:

#define TCP_TIMEWAIT_LEN (60*HZ)  /* how long to wait to destroy TIME-WAIT state, about 60 seconds */
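
If anyone wants to confirm it against their own kernel source tree:

grep -n TCP_TIMEWAIT_LEN include/net/tcp.h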

andy_12_5042
Nimbostratus

Sorry for the formatting on that last post...

 

What_Lies_Bene1
Cirrostratus

No worries. I say 120s as TIME_WAIT is 2x MSL, and I've assumed the value of 60 is the MSL rather than the full TIME_WAIT. Worth testing, I guess.

 

Did you read the article? It can clearly be changed with a sysctl entry:

 

net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = XXX
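
i.e. something along these lines (the key only appears once the conntrack module is loaded, and the value is in seconds; 30 is just an example):

sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=30
# or make it persistent:
echo 'net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 30' >> /etc/sysctl.conf
sysctl -p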

 

andy_12_5042
Nimbostratus

Yep, I read it, but that key does not exist in Ubuntu and hence is not exposed (at least in the 10.04 LTS which we are using). I don't believe this is the real issue anyway, though, as I can do the opposite and increase Solaris to 60 seconds, and the behaviour does not change. TIME_WAIT should only impact the server and not the F5. Meaning, if that were the issue, I should see problems with socket creation for new connections, as there would be too many in TIME_WAIT. That is not the case here, so there is something else wrong.

 

There is NO way to modify TIME_WAIT on Ubuntu 10.04, and I have not looked into newer versions, but I don't believe that has changed. I will have to go and double-check.
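
For what it's worth, checking whether the key is exposed is just a matter of:

sysctl -a 2>/dev/null | grep -i time_wait
ls /proc/sys/net/ipv4/netfilter/ 2>/dev/null | grep time_wait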

 

andy_12_5042
Nimbostratus

This would most likely be kernel-dependent rather than version-dependent, I suppose. Maybe it's a module not being loaded, or something along those lines...

 

andy_12_5042
Nimbostratus

Ah, I see; it looks like the netfilter conntrack module was not loaded. I fixed that and can now see this key, which I was unaware of. I will modify it just to see if it makes any difference in behaviour and does in fact override the hard-coded 60 seconds.
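
For anyone following along, loading the module and confirming the key shows up looks roughly like this (module names vary a bit between kernels, so treat these as approximate):

modprobe nf_conntrack_ipv4    # older kernels use ip_conntrack instead
lsmod | grep conntrack
sysctl net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait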

 

andy_12_5042
Nimbostratus

For the record, this does not override the hard-coded value and does not work (I tested it). I don't see anything in the source that would honour this change via sysctl. (It's easy to see the timers of a socket in any state with the -o flag to netstat.)
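
For example (the port and addresses below are placeholders; the value in parentheses is the seconds left on the timer):

netstat -anto | grep ':80 ' | grep TIME_WAIT
# tcp  0  0 10.0.0.10:80  10.1.1.1:51234  TIME_WAIT  timewait (54.23/0/0)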

 

But again, I can't see how this could be the source of the issue, based on how the F5 SHOULD be handling connections. This is where I wish we actually had support 🙂 but no doubt with 9.2.5 the answer would be "yeah, go ahead and upgrade to newer code first"...

 

What_Lies_Bene1
Cirrostratus

True, I've probably been focused on the wrong thing. Can you tell me what values you have configured in the FastL4 and TCP profiles assigned to the Virtual Servers please? Specifically:

 

  • Idle Timeout
  • Loose Close
  • TCP Close Timeout

 

Enabling the last two (if not already) may help. I suspect a bug myself.
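
If it's easier, you could just dump the profiles; on 9.x bigpipe that should be something like this (profile names are placeholders, and I may be misremembering the exact syntax):

b profile fastl4 my_fastl4 list
b profile tcp my_tcp list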

 

andy_12_5042
Nimbostratus

The particular unit which I am testing on is using this FastL4 profile:

 

  • reset on timeout enable
  • reassemble fragments disable
  • idle timeout 300
  • tcp handshake timeout 5
  • tcp close timeout 5
  • mss override 1460
  • pva acceleration full
  • tcp timestamp preserve
  • tcp wscale preserve
  • tcp generate isn disable
  • tcp strip sack disable
  • ip tos to client pass
  • ip tos to server pass
  • link qos to client pass
  • link qos to server pass
  • rtt from client disable
  • rtt from server disable
  • loose initiation disable
  • loose close disable

What_Lies_Bene1
Cirrostratus

OK, so could you drop the idle timeout? I'm clutching at straws, but you could also enable loose close.
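
From memory, the bigpipe commands would be roughly the following; I don't have a 9.x box to hand, so please double-check the exact attribute names (b profile fastl4 help):

b profile fastl4 my_fastl4 idle timeout 60
b profile fastl4 my_fastl4 loose close enable
b save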

 

OneConnect would also help reduce the number of server-side connections and reduce the load somewhat on the servers.

 

nitass
F5 Employee

I have also tried to use a Round Robin pool balance method which also did not work and same behaviour

 

I think round robin should work. How did you test? Can you reproduce the issue?

 

andy_12_5042
Nimbostratus

I have tried Round Robin and it would work for a while and then under heavy load, we start to see the same issue with much more traffic going to Solaris Servers.

 

I have reduced the idle timeout to as low as 10 seconds, but that does not help since these connections are active. The F5 sees these connections as EST rather than persistent, but this is not reflected in the server's session state table. There appears to be a difference in how long it is holding connections between these servers, and I just can't understand why.
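
One thing that might help narrow it down is LTM's own connection table, to see exactly what it thinks is still open towards each server. I believe 9.x bigpipe has something along these lines, though check b conn help for the exact options (the address is a placeholder):

b conn show all | grep 10.0.0.10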

 

nitass
F5 Employee

I have tried Round Robin and it would work for a while and then under heavy load, we start to see the same issue with much more traffic going to Solaris Servers.

 

How did you measure traffic to each server? Was it from statistics on the BIG-IP?

 

Are you using any setting which may affect load distribution?

 

sol10430: Causes of uneven traffic distribution across BIG-IP pool members

 

http://support.f5.com/kb/en-us/solutions/public/10000/400/sol10430.html

 

andy_12_5042
Nimbostratus

Traffic was measured from both the F5 pool member statistics and the server-side session table. The server will always reflect the most accurate number of sockets that are in the ESTABLISHED or TIME_WAIT state, for example.

 

None of the things mentioned in that article apply here. Since I have seen this with Round Robin, that rules out it being specific to Least Connections. This is one of those issues where I would need to get at the internals, which I can't do without support. For example, with some other vendors' devices I can turn on specific types of debugging and observe the decision logic for where a request is sent based on the current configuration, which is very helpful in these cases. It would at least provide some insight as to why more traffic is getting sent to the same set of servers.
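
Short of internal debugging, one crude way to watch the decision from the BIG-IP itself is to count server-side SYNs per pool member with tcpdump on the server-facing VLAN (interface name, addresses and port are placeholders):

tcpdump -ni internal 'tcp[tcpflags] & tcp-syn != 0 and dst port 80 and (dst host 10.0.0.10 or dst host 10.0.0.20)'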

 

What_Lies_Bene1
Cirrostratus

I doubt very much that we'll get to the root cause of this, particularly with such an old version of code. However (Nitass gave me this idea in response to another post), perhaps it can be overcome using a more 'intelligent' load balancing method. Candidates would be Weighted Least Connections, Dynamic Ratio, Observed or Predictive.

andy_12_5042
Nimbostratus

Yeah, I agree, and I was starting to think that's the only possible solution at this point. I will have to test some different balancing methods and see what I can do.

 

Thanks for the comments, guys! I don't know how I ended up with another gig that is using such old software with no support 🙂

 

Andy

 

What_Lies_Bene1
Cirrostratus

You're welcome, it's always the way. Please do post back if this does the trick. Here's a quick run-down of the methods I mentioned:

 

Weighted Least Connections – Member & Node - This method load balances new connections to whichever Pool Member or Node has the least number of active connections; however, you define a Connection Limit (Weight) for each Pool Member or Node based on your knowledge of its capabilities. The Connection Limits are used along with the active connection count to distribute connections unequally in a Least Connections fashion.

 

This method is suitable where the real servers have differing capabilities.

 

As each connection can have differing overheads (one could relate to a request for an HTML page, the other to a 20MB PDF document that needs to be generated and downloaded), this is not a reliable way of distributing bandwidth and processing load between servers.

 

Member method: The weights and connection count for each Pool Member are calculated only in relation to connections specific to the Pool in question.

 

Node method: The weights and connection count for each Node are calculated in relation to all the Pools the Node is a Member of.

 

If all Pool Members have the same Connection Limit then this method acts just like Least Connections.

 

Dynamic Ratio – Member & Node - Also known as Dynamic Round Robin, this method is similar to Ratio but dynamic; real-time server performance (such as the current number of connections and response time) analysis is used to distribute connections unequally in a circular (Round Robin) fashion. This may sound like Observed but keep in mind connections are still distributed in a circular way.

 

This method is suitable where the real servers have differing capabilities.

 

Member method: The performance of each Pool Member is calculated only in relation to the Pool in question.

 

Node method: The performance of each Node is calculated in relation to all the Pools the Node is a Member of.

 

Observed – Member & Node - This method load balances connections using a ranking derived from the number of Layer Four connections to each real server and each server’s response time to the last request. This is effectively a combination of the Least Connections and Fastest methods.

 

Not recommended except in specific circumstances and not at all for large Pools. Connections to each Pool Member are only considered in relation to the specific Pool in question.

 

Member method: The weights and connection count for each Pool Member are calculated only in relation to connections specific to the Pool in question.

 

Node method: The weights and connection count for each Node are calculated in relation to all the Pools the Node is a Member of.

 

Predictive – Member & Node - Similar to Observed but more aggressive: the resulting Pool Member rankings are analysed over time, and if a Pool Member's ranking is improving it will receive a higher proportion of connections than one whose ranking is declining.

 

Not recommended except in specific circumstances and not at all for large Pools.

 

Member method: The ranking and analysis for each Pool Member is calculated only in relation to connections and response times specific to the Pool in question.

 

Node method: The ranking and analysis for each Node is calculated in relation to connections and response times for all the Pools the Node is a Member of.