Forum Discussion
Krzysztof_Kozlo
Nimbostratus
May 02, 2007
TCP redirect on LB_FAILED for in-band health check
We have several situations in the enterprise where it is desirable to have a large number of farmed services run on a single pool of servers. New instances come online all the time, and only TCP health checks are required, but we don't want to configure an explicit pool, complete with monitor, each time someone starts up a listening process on a port.
We want to use a Layer 3 virtual server like this:
virtual moo {
destination 1.1.1.1:any
ip protocol tcp
pool moo
rule moo
}
pool moo {
member server1:any
}
pool foo {
member server2:any
}
What I'd like to be able to do is create a rule like this:
rule moo {
when LB_FAILED {
log "connection to [IP::server_addr] failed"
# hoped-for behaviour: retry this connection against pool foo
use pool foo
}
}
This would essentially enable an on-the-fly TCP health check: if the host is not responding on that port, try the other server. I don't see any reason this shouldn't be possible, but it doesn't work. The client is simply disconnected when LB_FAILED fires. The LB_FAILED event itself is being raised, based on this LTM log output:
May 2 16:20:05 tmm tmm[1049]: 01220002:6: Rule moo : connection failed: 144.203.239.34
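The behaviour the rule above is after (try the client's destination port on one server, and on a failed handshake try the next) can be sketched outside LTM. This is a minimal Python model, not iRules and not how LTM implements it; `connect_with_failover`, `tcp_connect` and `refuse_on` are hypothetical names, and the 2-second timeout is an arbitrary assumption.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Addr = Tuple[str, int]


def tcp_connect(addr: Addr) -> socket.socket:
    # Real handshake with a short timeout. A refused (RST) or timed-out
    # handshake raises OSError, which plays the role of LB_FAILED here.
    return socket.create_connection(addr, timeout=2.0)


def connect_with_failover(members: Sequence[str], port: int,
                          connect: Callable[[Addr], object] = tcp_connect
                          ) -> Optional[Tuple[str, object]]:
    """Try each member on the client's destination port, in order.

    Returns (member, connection) for the first member that completes the
    handshake, or None if all of them fail (the client is then dropped).
    """
    for member in members:
        try:
            return member, connect((member, port))
        except OSError as exc:
            print(f"connection to {member}:{port} failed: {exc}")
    return None


def refuse_on(down_hosts):
    """Offline stand-in for tcp_connect: hosts in down_hosts refuse."""
    def connect(addr: Addr):
        if addr[0] in down_hosts:
            raise ConnectionRefusedError(f"{addr[0]}:{addr[1]} refused")
        return addr  # pretend this tuple is the established connection
    return connect
```

For example, `connect_with_failover(["server1", "server2"], 23, refuse_on({"server1"}))` logs the failure on server1 and returns `("server2", ("server2", 23))`. The connector is injectable so the failover logic can be exercised without a network.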
Also, it is not the case that LB_FAILED is processed after the client flow is closed. This rule works:
rule moo {
when LB_FAILED {
log "connection failed: [IP::server_addr]"
TCP::respond "sorry, dude, your server's down."
}
}
Observe:
zuul /u/ineteng/Data/f5 239$ telnet 10.165.29.17 23
Trying 10.165.29.17...
Connected to 10.165.29.17.
Escape character is '^]'.
sorry, dude, your server's down.Connection closed by foreign host.
zuul /u/ineteng/Data/f5 240$
Anyone have any ideas? This sure would be useful!
10 Replies
- Krzysztof_Kozlo
Nimbostratus
If no one has any experience or tips to offer on getting this working, can I ask if anyone at least sees this functionality as useful? Folks I've talked to here are pretty excited about the possibilities.
What we want to do in effect is set up a Layer 3 rule with no monitoring, but make sure that any connections on any port are directed to a server that's listening on that port. If nothing is listening, the connection would be dropped.
Combined with, say, source IP persistence, this would allow us to load balance services that talk on arbitrary port ranges, or our present use case, in which we want to be able to start up servers arbitrarily on the pool members and have them load balanced (or at least highly available) without having to touch the LTM.
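A rough Python model of that combination: persist each client IP to the member that last answered, try that member first on any port, and fall back to the other members if nothing is listening there. `SourceIPPersistence` and the `probe` callback are hypothetical; keying one persistence record per client IP across all ports is an assumption about the wildcard-port setup, not a statement about how LTM stores persistence records.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical probe: True if `member` completes a TCP handshake on `port`.
Probe = Callable[[str, int], bool]


class SourceIPPersistence:
    """Toy model of source-IP persistence plus in-band failover."""

    def __init__(self, members: Iterable[str]):
        self.members = list(members)
        self.table: Dict[str, str] = {}  # client IP -> last member that answered

    def select(self, client_ip: str, port: int, probe: Probe) -> Optional[str]:
        persisted = self.table.get(client_ip)
        # Try the persisted member first, then the rest in pool order.
        candidates = ([persisted] if persisted else []) + [
            m for m in self.members if m != persisted]
        for member in candidates:
            if probe(member, port):
                self.table[client_ip] = member  # re-persist to whoever answered
                return member
        # Nothing listens on this port on any member: drop the connection,
        # but keep the existing record so other ports are unaffected.
        return None
```

Leaving the persistence record alone on a total failure is a design choice: a client probing a port nobody serves shouldn't lose its affinity for the ports that do work.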
If we can't do this today, it sounds like a ripe, low-hanging feature request for the dev team at the least! I don't know of any other vendor who can claim in-band TCP health checking...
- JRahm
Admin
Actually, Cisco LocalDirector (yes, that dinosaur) did this kind of passive monitoring. It removed a member from the pool after X failed TCP handshake attempts, then occasionally threw it a bone in an attempt to bring it back "online".
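The passive scheme described here can be modeled in a few lines: count consecutive failed handshakes per member, take the member out after a threshold, and let a trial connection through once a hold-down period has passed. This Python sketch follows the description above only; it is not LocalDirector's or LTM's implementation, and `PassiveMonitor`, `max_failures` and `retry_after` are hypothetical names and defaults.

```python
import time
from typing import Callable, Dict, List


class PassiveMonitor:
    """Toy passive (in-band) health tracking driven by real traffic."""

    def __init__(self, members, max_failures: int = 3,
                 retry_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.members = list(members)
        self.max_failures = max_failures
        self.retry_after = retry_after
        self.clock = clock
        self.failures: Dict[str, int] = {m: 0 for m in self.members}
        self.down_since: Dict[str, float] = {}

    def available(self) -> List[str]:
        now = self.clock()
        # A downed member becomes eligible again after the hold-down, so
        # one trial connection can "throw it a bone".
        return [m for m in self.members
                if m not in self.down_since
                or now - self.down_since[m] >= self.retry_after]

    def record(self, member: str, ok: bool) -> None:
        if ok:
            self.failures[member] = 0
            self.down_since.pop(member, None)  # back online
        else:
            self.failures[member] += 1
            if self.failures[member] >= self.max_failures:
                self.down_since[member] = self.clock()  # (re)start hold-down
```

The clock is injectable so the hold-down can be tested without sleeping; a failed trial connection simply restarts the hold-down.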
I was hoping that the passive monitoring hyped for 9.4 was in line with this, but it is not the same.
- Krzysztof_Kozlo
Nimbostratus
This is great! The documentation for 9.2.3 does not list "LB::reselect" as a method. (F5, send your doc writers back to the salt mines.) Initial results seem positive. I'll post my full iRule when and if I get it working.
- Krzysztof_Kozlo
Nimbostratus
According to the iRules Wiki (which I just discovered, thank you very much):
This command is used to advance to the next available node in a pool, either using the load balancing settings of that pool, or by specifying a member explicitly. **Note that the reselection is currently limited to two tries.** (emphasis added)
If this is correct, it means that a loop is not possible, and the logic
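when LB_FAILED {
# an empty server address is taken to mean no member could be selected
if { [LB::server addr] == "" } {
log "connection failed: no servers available"
} else {
log "connection failed: [LB::server addr]"
# retry this connection against another member of the same pool
LB::reselect
}
}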
is all we need. It also means that this technique is limited to pools with three or fewer members (the initial pick plus two retries) unless that documentation is obsolete.
- bl0ndie_127134
Historic F5 Account
OK, I would like to kill the urban legend that passive monitoring is limited to HTTP right now. LB::status can be used from most reasonable events, such as LB_FAILED, HTTP_RESPONSE, etc.
- Casey_Lucas_167
Nimbostratus
I've found that LB::status is great if you want to know LTM's current understanding of a member's status. However, I don't think the status is updated instantly. I remember having to handle a situation where LB::status would report "up" even though a node had just failed. If you handle the LB_FAILED event, you know immediately that a member has just failed. I think LB::status would report "up" until a health check or iRule marked the member as down.
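A toy Python model of the lag described here: a status that only a periodic monitor updates keeps reporting the last result between runs, while a failure seen on the connection itself (what LB_FAILED reports) needs no polling. `MonitoredStatus` and its `interval` are hypothetical; this is not how LTM schedules monitors, just an illustration of the delay.

```python
class MonitoredStatus:
    """Toy model of a member status that only a periodic monitor updates."""

    def __init__(self, interval: float, start: float = 0.0):
        self.interval = interval  # hypothetical monitor period in seconds
        self.status = "up"
        self.next_check = start + interval

    def tick(self, now: float, probe_ok: bool) -> str:
        # Between monitor runs the last known status is reported unchanged,
        # even if the member has already died.
        if now >= self.next_check:
            self.status = "up" if probe_ok else "down"
            self.next_check = now + self.interval
        return self.status
```

With a 5-second interval and a member that dies at t=1, `tick(1.0, False)` still returns `"up"`; only the monitor run at t=5 flips it to `"down"`.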
So basically, LB::status tells you LTM's current knowledge of a member, which can lag by up to the health check interval. I found that handling LB_FAILED is more "instant".
- bl0ndie_127134
Historic F5 Account
You are right, the status value is determined by the monitors, so there is a bit of a lag depending on how you have the monitor set up.
However, you can set the status from the rule (mark the member down, for example because you got a SOAP exception), and the effect of this is immediate.
This server will be marked down and will only be marked back up the next time the monitors have a successful health check (or if for some reason you want to mark it up from a rule, which is actually possible but not recommended).
- Krzysztof_Kozlo
Nimbostratus
BTW, it would be useful to have the number of retries for LB::reselect be configurable.
- JRahm
Admin
As well as removing from consideration any failed pool member previously selected during the current iteration of the reselection process.
- Krzysztof_Kozlo
Nimbostratus
The rule above seemed to work when I tested it back in May, but now that I am trying it again, every other time it gets into an infinite loop of SYN/RSTs with the downed back end, with LB::reselect reselecting the same (broken) server.
Has anyone seen this? What could cause it? I'm running v9.2.3 255.0...
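A toy Python model of the two reselection improvements asked for in this thread: a configurable retry cap, and excluding members that already failed for this connection. It is not LTM's algorithm; the least-connections picker, `load` values and `pick_with_reselect` are hypothetical. It illustrates one way a selector can keep landing on a dead server if nothing removes it from consideration (a dead server carries no connections, so least-connections keeps preferring it); that is an illustration, not a diagnosis of the 9.2.3 behaviour.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def pick_with_reselect(members: Sequence[str], load: Dict[str, int],
                       handshake: Callable[[str], bool],
                       max_reselects: int = 2,
                       exclude_failed: bool = True
                       ) -> Tuple[Optional[str], List[str]]:
    """Toy model of LB_FAILED plus reselect for one client connection.

    Returns (chosen member or None, members tried in order). The default
    max_reselects=2 mirrors the "two tries" note, so at most three members
    are tried.
    """
    tried: List[str] = []
    failed = set()
    for _ in range(1 + max_reselects):  # initial pick plus the reselects
        pool = [m for m in members if not (exclude_failed and m in failed)]
        if not pool:
            break
        member = min(pool, key=lambda m: load[m])  # least connections wins
        tried.append(member)
        if handshake(member):
            return member, tried
        failed.add(member)
    return None, tried
```

With `load = {"s1": 0, "s2": 5, "s3": 7}` and s1 down, excluding failures reaches s2 on the first reselect, while `exclude_failed=False` retries s1 three times and gives up.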