Closing Active Connections sooner for maintenance
Background on the problem:
I have a pool of 6 Windows machines running .NET web services in IIS 7.x. During a deployment we use iControl to disable 3 servers at a time so we can update the web service code/configuration. The problem is that active connections to the disabled pool members take a very long time to die off. I want to find a way to kill those connections quickly and cleanly. The pool is set to round robin.
I have searched the forums here and found questions similar to mine, but most involve taking all pool members out for maintenance. The solution usually involves an iRule that checks whether the number of active pool members is less than 1 and, if so, redirects to a maintenance page.
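For reference, the "all members out" pattern mentioned above usually looks something like the sketch below. It assumes the virtual server has a default pool (read back with `LB::server pool`) and uses the `active_members` command to count available members. The maintenance URL is a hypothetical placeholder. This does not address the partial-maintenance case asked about here.

```tcl
when HTTP_REQUEST {
    # LB::server pool returns the name of the pool currently
    # associated with this connection (assumed: the default pool).
    if { [active_members [LB::server pool]] < 1 } {
        # Hypothetical maintenance page; replace with a real URL.
        HTTP::redirect "http://maintenance.example.com/"
        return
    }
}
```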
A possible solution I have come up with but not tested is to add this iRule:
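when HTTP_REQUEST {
    # Check the status of the pool member handling this request
    switch [LB::status] {
        "session_disabled" -
        "down" {
            # Close the connection once the response has been sent
            HTTP::close
        }
        default {}
    }
}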
I believe it checks the status of the pool member selected for this request and if it is down or disabled it then calls the HTTP::close which is supposed to close the connection after the response is sent. https://devcentral.f5.com/wiki/irules.HTTP__close.ashx
This feels a bit convoluted and not the best approach. Has anyone run into this problem before, or can anyone confirm I am on the right track?
Thanks
-Mike
- What_Lies_Bene1 (Cirrostratus)
I've come up with two alternative iRules and something else entirely:
1) Reselect a different pool member. This should work and should not pick another disabled member (the connection only stayed on the down member because it already existed or was persisted), but there's a chance a loop could occur:
when LB_SELECTED {
    switch [LB::status] {
        "session_disabled" -
        "down" {
            # Pick a different pool member for this connection
            LB::reselect
            return
        }
        default { }
    }
}
3) Configure the pool's Action on Service Down setting as Reject or Reselect. I'd prefer this over either iRule.
- What_Lies_Bene1 (Cirrostratus)
Guess you can't use 'code' twice; here's the missing one:
when LB_SELECTED {
    switch [LB::status] {
        "session_disabled" -
        "down" {
            reject
            return
        }
        default { }
    }
}
- Mike_Young_6152 (Nimbostratus)
Yeah, I was reading about the LB_SELECTED event last night and thought it would be a better event to respond to. I am now curious what "reject" actually does. Does it kill the request and give the client an error? Does it complete the request and then kill the connection? Or does it just force a reselect? The Action on Service Down setting does sound like the best approach if it works the way I hope. Does it apply when the actual service (IIS) is down, or when I disable the node or pool member in F5? I have to avoid client errors like the plague, and each request still needs to be processed.
- What_Lies_Bene1 (Cirrostratus)
Reject will close the connection and send a RST to the client. The client should retry transparently to the user so it shouldn't be an issue, but it's worth testing.
- Mike_Young_6152 (Nimbostratus)
I just ran some tests with Action on Service Down set to Reject. Reselect had no effect on my situation, which I think is odd. My test sends a batch of roughly 1000 requests to the F5 for routing to my servers, using a test environment that has only two servers in the pool. Sometimes my requests get stuck and spin. It's almost as if the F5 is caught in a loop trying to decide which pool member to send the request to, or it silently drops the request. It doesn't happen every time, but it happens in roughly one run out of five, which is often enough to concern me.
- What_Lies_Bene1 (Cirrostratus)
This article suggests some ways to get round the loop: https://devcentral.f5.com/wiki/iRules.LB__reselect.ashx
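One common way to bound a reselect loop is to cap the number of retries per connection. The sketch below is an assumption about how that could look, not code taken from the linked article. It reselects from LB_FAILED rather than LB_SELECTED, the variable name `reselect_tries` is hypothetical, and the cap (one attempt per active member) is an arbitrary choice that should be tested against your own traffic.

```tcl
when CLIENT_ACCEPTED {
    # Per-connection retry counter (hypothetical variable name).
    set reselect_tries 0
}

when LB_FAILED {
    # Retry at most once per currently active pool member.
    # Once the cap is reached, do nothing and let the connection
    # fail as it would without this iRule, instead of looping.
    if { $reselect_tries < [active_members [LB::server pool]] } {
        incr reselect_tries
        LB::reselect
    }
}
```

The counter is only reset when a new client connection is accepted, so on long-lived keep-alive connections you may prefer to reset it per request instead.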
- Mike_Young_6152 (Nimbostratus)
Okay. After trying several different solutions, I think I have finally settled on the one that works best so far: enabling a OneConnect profile. My connections seem to die off faster after I disable the pool member, and I seem to get a better load balancing distribution. All this comes at the cost of a little performance.
- What_Lies_Bene1 (Cirrostratus)
I'm not aware of any; the only issue from back in the day was NTLM authentication, and that's solved now. I'm not sure why you think performance has suffered. Did your testing demonstrate that it did? OneConnect should improve it (or at least make things more efficient).
- Mike_Young_6152 (Nimbostratus)
I thought it would have helped as well. I ran the same test 5 times with it on and 5 times with it off. The average time to complete was 215 seconds with it off and 228 seconds with it on. Obviously, network or database performance at the time could explain some of this, and a larger sample size might bring these numbers closer together.
- What_Lies_Bene1 (Cirrostratus)
Interesting, that's around a 6% difference (13 seconds on 215). Thanks, I'll keep that in mind for the future...