Forum Discussion

EBL_27513
Apr 30, 2012

Assistance Requested - iRule causes virtual to hang upon LTM reboot

Hello,

I'm a relative newbie when it comes to writing iRules, and so wanted to see if anyone in the forum would have some time to help me with an issue I'm having. Any help would be much appreciated!

Basically, what I want to do is have traffic for a virtual route to a "maintenance" pool when all the members of the virtual's pool are down. I have a high-availability pair of LTM 3900s in Active/Standby redundancy mode, with what I call the "primary" set with a Redundancy State Preference of "Active" and the "secondary" LTM set to a "Standby" preference. They're running OS 10.2.0, Hotfix 2.

I have an iRule that does reselect the maintenance pool successfully. However, upon a primary LTM reboot the nodes in the pool are briefly seen as "down", so the iRule is triggered, and because we're using cookie persistence it causes the LTM to hang the virtual for a few minutes. F5 investigated and said that since the iRule does a reselect but is still cookie-stuck to the down nodes, it goes into a loop until it sees a node as "up".

They recommended that I get rid of the persistence (stickiness) in the iRule before I try to select a node from the "maintenance" pool. I think I just need a slight tweak to my current iRule, but I wanted to get the thoughts of the pros on this board.

Current code from bigip.conf:

virtual website.com-tcp-443-vip {
   pool website.com
   destination x.x.x.x:443
   ip protocol 6
   rules i-maintenance-hub
   persist cookie
   profiles {
      website.com {
         clientside
      }
      http {}
      tcp {}
   }
}

rule i-maintenance-hub {
   when LB_FAILED {
      LB::reselect
      pool maintenance-hub
   }
}

To remove persistence when the iRule is triggered, can I basically add "persist none" as the first line of the when clause?

i.e.:

rule i-maintenance-hub {
   when LB_FAILED {
      persist none
      LB::reselect
      pool maintenance-hub
   }
}

If there's a better way to do this, I'd be very happy to learn it - we don't have to stick with this code. It just seemed the most straightforward and simple way to do it, which is what I'm going for.

Any help greatly appreciated! Thanks!
  • Hi EBL,

    You may want to change your methodology a little bit. You can configure LB reselect on the pool (which should be synchronized across the HA pair) by setting the pool's "Action On Service Down" option to "Reselect".

    So if a node fails, the client will be sent to another working member of the pool.

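    For reference, that pool-level setting might look like this in v10-style bigip.conf syntax (a sketch, not taken from this thread; the member addresses are placeholders):

    pool website.com {
       action on svcdown reselect
       members {
          10.0.0.1:443 {}
          10.0.0.2:443 {}
       }
    }
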
    If no pool members are available (because you have taken them all down for maintenance), you can use the "active_members" command inside an iRule to determine whether any members are available. If some are, the client proceeds to another node because of the Action On Service Down option being set to Reselect. If none are, you can send the client to a different pool (which negates the persistence).

    See the active_members iRule command.

    Hope this helps.
  • Thanks very much for your response - that would be a great solution. However, we'd like to implement this iRule across a bunch of virtuals, so we'd like to avoid having to check against a particular named pool if possible. Would the above suggestion (adding "persist none" to the iRule) remove persistence to the pool specified in the virtual, do you think?

    As an aside - we haven't had to use the "Action on Service Down" option on our pools yet, as it seems the load balancer recognizes down pool members fairly quickly and routes traffic to the remaining active members. Is there a benefit to setting this option to "Reselect" rather than just leaving it at its default ("None")?

    Thanks again for your help!

  • Hi EBL,

    If you want to use this as a generic iRule then you might want to try something like this:

    
    when HTTP_REQUEST {
        # Check active servers. If there are none, route to the maintenance pool.
        if { [active_members [LB::server pool]] == 0 } {
            # Clear the requested page before routing traffic to the new pool.
            # This keeps the requested [HTTP::host] value but resets [HTTP::uri].
            HTTP::uri /
            pool maintenance-hub
        }
    }
    when LB_FAILED {
        # Catch pending requests / any failures and nullify their persistence,
        # then redirect back to the base site for handling.
        persist none
        HTTP::redirect "/"
    }
    

    This will check the number of active members in the virtual's default pool ([LB::server pool]). If the result is 0, it changes the URI to the root of the website (HTTP::uri /) hosted in the target pool (maintenance-hub).

    For any requests that were in flight, the LB_FAILED event will capture them, nullify the persistence, and redirect the requestor to the base of the site.

    I did not use LB::reselect because it would attempt to select another member of the same pool; if there are none, you might get a long hang and then a failure rather than some immediate action.

    You may want to put even more thought into the LB_FAILED Event. If your site is operating normally and then takes some type of hit (say one server crashes), you wouldn't want those users to be sent to a maintenance page if there are still other servers that could service them.

    I would suggest keeping the first portion, the HTTP_REQUEST event that checks active_members and handles the user when all of the servers are down (most likely on purpose, for maintenance), and handling in-flight requests by setting the "Action On Service Down" option to "Reselect" on the pools to cover any server failures.
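
    Put together as a generic iRule, that might look like this (a sketch of that suggestion, using the names from this thread; it drops the LB_FAILED handler and relies on the pool's Action On Service Down setting for in-flight failures):

    rule i-maintenance-hub {
       when HTTP_REQUEST {
          # Send clients to the maintenance pool only when the virtual's
          # default pool has no available members
          if { [active_members [LB::server pool]] == 0 } {
             HTTP::uri /
             pool maintenance-hub
          }
       }
    }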

    Hope this helps.