Forum Discussion
Krzysztof_Kozlo
Nimbostratus
Aug 03, 2007
lb::reselect fails to select another node
I stripped out everything fancy and this still doesn't work. The behavior is peculiar:
rule reselect_test {
    when LB_FAILED {
        LB::reselect
    }
}
pool test {
    member 1.1.1.1:any
    member 1.1.1.2:any
}
virtual test {
    destination 1.1.2.1:any
    protocol tcp
    rule reselect_test
    pool test
    snat automap
}
When I connect, every other time the connection hangs while the LTM goes nuts trying to reconnect to the same back-end server.
Curiously, if I open another connection it breaks the first connection out of this loop and connects to the second.
I tested this on two 9.2.3 255.0 systems and one 9.3.0 system. Same thing.
I thought LB::reselect was a) supposed to select a _different_ node and b) supposed to be limited in the number of retries?
8 Replies
- romka_77775
Nimbostratus
Try calling LB::detach before LB::reselect.
LB::detach disconnects the server-side connection.
- d_9795
Nimbostratus
Thanks
- Krzysztof_Kozlo
Nimbostratus
Tried that already. Changes nothing in the behavior whatsoever.
- Joseph_Chan_463
Historic F5 Account
BTW, is there a monitor to check the health of those two nodes?
This topic also tries to do something similar.
http://devcentral.f5.com/Default.aspx?tabid=53&forumid=5&postid=14059&view=topic
You may wish to try LB::down, but a monitor is the proper way to do this. A monitor will watch for the node coming back up, whereas a rule marks it down and forgets about it.
http://devcentral.f5.com/wiki/default.aspx/iRules/LB__down.html
- Deb_Allen_18
Historic F5 Account
LB::reselect chooses a node based on the LB algorithm for the pool, which may or may not be a "different" server. It reselects only once, but if the server fails to respond, you will loop on the LB_FAILED event endlessly unless you include some count/stop logic in your iRule.
When you say "every other time the connection hangs", that would seem to indicate that one of your pool members is not responding. I don't see that you have any monitoring in place.
I'm not sure why the other node isn't selected on failure, though, since you have default LB method Round Robin configured.
I'd start by applying a monitor to the pool. You should see better behaviour then. If you continue to have difficulty, post back & we can try to help further.
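The count/stop logic mentioned above could look roughly like the following. This is only a minimal sketch, not a tested rule: the variable names `retries` and `max_retries` and the limit of 3 are assumptions chosen for illustration, and it does not by itself make LB::reselect pick a different member.

```tcl
when CLIENT_ACCEPTED {
    # Per-connection retry counter. The limit of 3 is an arbitrary assumption.
    set retries 0
    set max_retries 3
}
when LB_FAILED {
    if { $retries < $max_retries } {
        incr retries
        # Ask the pool's load-balancing algorithm for another pick.
        # As noted above, this may return the same member again.
        LB::reselect
    } else {
        # Stop looping on LB_FAILED and give up on this connection.
        log local0. "reselect_test: giving up after $retries retries for [IP::client_addr]"
        reject
    }
}
```

With a cap like this, a client whose connection keeps landing on the dead member is rejected after a few attempts instead of hanging indefinitely.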
/deb
- Krzysztof_Kozlo
Nimbostratus
I don't want monitoring on the pool. The whole idea is that this is supposed to be a layer-3 rule that will dynamically send users only to servers that are listening on a given port.
One of the servers is not responding, that's correct. That is by design. The connection hangs because the LTM is infinitely looping reselecting the same node that it selected to begin with (i.e. the one that doesn't respond).
What I want it to do is select the other one when the first one fails. That's what lb::reselect is supposed to do, but it doesn't.
- Deb_Allen_18
Historic F5 Account
I'd say you need to open a Support case, then, especially if you've been struggling with this for several months without resolution.
An iRules workaround might be to manually re-select the other server, then bail out if both are non-responsive. I've had other customers implement similar logic successfully for other reasons, but it obviously won't scale well above 2 servers:
when CLIENT_ACCEPTED {
    set failed 0
}
when LB_FAILED {
    incr failed
    if { $failed > 1 } {
        # specify action if both servers failed
        reject
    } else {
        # default case would match if no pool or server selected
        switch [LB::server addr] {
            1.1.1.1 { node 1.1.1.2 0 }
            1.1.1.2 { node 1.1.1.1 0 }
            default { reject }
        }
    }
}
The advantage of monitoring is that a monitor looks for an expected response, rather than just a SYN/ACK, to determine if the server is healthy enough to receive traffic.
HTH, and please let us know what you discover with Support.
/deb
- Krzysztof_Kozlo
Nimbostratus
I'm well aware of the advantage of out-of-band monitoring. In this case, that doesn't scale, since the servers in question are dynamically allocated on various ports and we don't want to have to update the load balancer configuration every time a server is brought up or down.
Also, this scheme will catch servers that fail to respond within the interval window of out-of-band monitoring.
I've had a support case opened since last Friday, but have only yesterday received a response, which was a suggestion to check this thread!
