Forum Discussion
Matt_Breedlove_
Nimbostratus
Apr 25, 2009
Help with avoiding Reselect bug in LB_Failed
Hi All,
Running BIG-IP 9.3.1 Build 37.1
Was planning on using LB:Reselect in LB_Failed then I saw the known issue on the reference page about causing a system crash...yikes. H...
hoolio
Cirrostratus
Apr 29, 2009
Hi Matt,
See below for feedback:
"Also concerned even if it doesn't cause a crash on the bigip, will re-select the same failed member. The whole reason I use "LB::reselect pool" instead of just "pool" is to make sure I get a member that is up."
You could mark the current pool member down using LB::down before reselecting. I think calling pool in LB_FAILED does only that: it selects the pool. It doesn't retry the request, so no new request would be sent to a server and the client would get no response. I could be wrong on this, but I've only ever seen LB::reselect used here, because it forces a new load balancing selection and a retry.
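As a rough sketch of the LB::down idea above (not a tested rule): the pool name example_pool, the variable lb_retries, and the retry limit of 2 are all made up for illustration. It does nothing to work around the crash described in the known issue, so on 9.3.1 it should only be tried once F5 Support has confirmed a fix is in place.

```tcl
when CLIENT_ACCEPTED {
    # Retry counter for this connection (hypothetical variable name).
    # It is not reset between requests on a keep-alive connection.
    set lb_retries 0
}
when LB_FAILED {
    # Cap the retries so a pool with no healthy members cannot loop forever
    if { $lb_retries < 2 } {
        incr lb_retries
        # Mark the member that was just tried as down so it is skipped
        LB::down
        # Force a new load balancing selection and retry the request
        # ("example_pool" is a placeholder pool name)
        LB::reselect pool example_pool
    }
}
```

The retry cap is an assumption on my part; without it, a pool whose members all fail could keep re-triggering LB_FAILED.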
1) Can I really not safely use LB::reselect and/or LB::mode inside of an LB_Failed event without crashing out my BigIP? What is the alternative...to only use "pool"? Doesn't using "pool" only instead of "LB::reselect pool" potentially send a request to a member node that is actually down? The wiki page makes it sound like that.
It looks like TMM crashes when the reselected pool member is down or sends a reset. This combination of events might not happen frequently, but when it does the result is significant. I'd check with F5 Support to see if there is a hotfix available for 9.3.1. If not, see if they can build one for you. 9.3.1 might be two major versions back but it's still under support for the better part of a year.
https://support.f5.com/kb/en-us/solutions/public/8000/700/sol8724.html
This is the result of a known issue. When you use the LB::detach and LB::reselect commands simultaneously within an LB_FAILED event, and the target pool member is unreachable or rejects the connection, double freeing causes a system crash.
The double freeing of the connection resources on the server side occurs because the LB::detach command frees the connection resource initially. When the LB::reselect command initiates, it attempts to free the same connection resource again before it attempts to reselect another pool member.
2) Are there context rules as to what commands/statements you can issue inside of events? Can I call HTTP::redirect inside of LB events with impunity or should I only call HTTP:: commands inside of CLIENT_ACCEPTED, HTTP_REQUEST, and HTTP_RESPONSE?
CLIENT_ACCEPTED is triggered when the TCP connection has been established to the VS. The HTTP headers aren't parsed until HTTP_REQUEST. So you can't use any HTTP:: commands until HTTP_REQUEST. I'm not sure why you couldn't call HTTP::redirect from LB_SELECTED, but per the wiki page you can't. If this is actually correct, you could use LB::select in HTTP_REQUEST to make a load balancing selection and then use an HTTP:: command. You should be able to use HTTP::redirect from LB_FAILED.
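A minimal sketch of the LB_FAILED redirect mentioned above, assuming HTTP::redirect is indeed allowed in that event on your version (worth confirming on a test box first). The URL is a placeholder, not one from the original rules.

```tcl
when LB_FAILED {
    # No pool member could be selected, or the selected member did not respond.
    # Send the client to an externally hosted maintenance page instead.
    # (placeholder URL, replace with your own)
    HTTP::redirect "http://maintenance.example.com/"
}
```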
3) In the first rule I am using a pool of static html web servers for the maintenance window page...in the second it is using a redirect to an external URL with the page. Is one or the other preferred? Should I just code the maintenance window page (very simple short html page...few lines) into the iRule itself using HTTP::respond and avoid having to use new pools or new external URLs? Performance impact?
There isn't much difference between using an HTTP redirect compared with selecting a new pool in terms of performance. The former would probably result in the client establishing a new TCP connection to the VS whereas selecting the sorry pool wouldn't. Also, selecting the sorry pool and not sending the client an HTTP redirect would mean the client doesn't see an update on the URI. If you go with the sorry pool option, you would want to make sure the sorry pool server sets appropriate caching headers to prevent the client, search engine spider, or intermediate proxy server from caching the sorry content.
4) Do the various uses of "persist none" below make sense? Is it not necessary for HTTP::redirect?
It's not necessary, as there is no pool selection being made.
Also, in the first rule, if you're always rewriting the URI to / then you can't have the root document reference any images, CSS files or other documents on the same VS, because the iRule will rewrite those requests to / as well. You can get around this by putting the server content in a specific directory (like /maintenance/) and then only rewriting the URI to / if it doesn't already start with /maintenance/.
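The conditional rewrite could look something like the following. This is only a sketch: main_pool and sorry_pool are placeholder pool names, and the active_members check is an assumption about how the original rule decides to show the maintenance page.

```tcl
when HTTP_REQUEST {
    # Assumed trigger: no members of the main pool are available
    if { [active_members main_pool] < 1 } {
        # Requests under /maintenance/ (images, CSS, etc.) are left alone;
        # everything else is rewritten to the root maintenance document.
        if { !([HTTP::uri] starts_with "/maintenance/") } {
            HTTP::uri "/"
        }
        pool sorry_pool
    }
}
```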
It would be faster to serve the maintenance content from LTM itself. It eliminates the need to have specific servers designated to serve sorry content. But it's a bit more complicated to configure and update the set up on LTM compared with a standard web server. And the content you serve is stored in LTM memory and takes resources away from other functions.
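For comparison, serving the page from LTM itself might look roughly like this. The pool name main_pool, the 503 status, the Retry-After value and the HTML are all illustrative assumptions; the Cache-Control header is there to address the caching concern raised earlier.

```tcl
when HTTP_REQUEST {
    # Assumed trigger: no members of the main pool are available
    if { [active_members main_pool] < 1 } {
        # Braces keep the HTML literal (no Tcl substitution).
        # Status code and headers are illustrative choices.
        HTTP::respond 503 content {<html><head><title>Maintenance</title></head><body><p>The site is down for scheduled maintenance. Please try again later.</p></body></html>} "Content-Type" "text/html" "Cache-Control" "no-cache" "Retry-After" "600"
    }
}
```

Since the response is generated on LTM, no sorry pool or external URL is needed, at the cost of keeping the page content inside the rule.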
Aaron