f5gtm_45183
Jan 23, 2012

Graceful failover

Hi all,

I've found a couple of questions similar to mine, but they're either too confusing or the answers don't fit my requirements, so please accept my apologies if this has already been answered.

I have two data centres with different IP address ranges.

I have a GTM (v10) configured at each site, and the two sync with each other.

I have a website configured (cloned at each site); the GTM DNS has an IP address from each range configured for the website.

My problem: if one of the developers takes one of the sites offline (for maintenance), client browsers that were connected to the downed site get a blank screen on their next click.

Refreshing the browser (i.e. pressing F5) doesn't help. However, closing the browser and opening it again does seem to resolve the problem, as does waiting a while (5+ minutes) for the site to return. That's far from ideal for our customers. I understand this is a DNS / local cache issue.
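To make the caching behaviour concrete, here is a minimal Python sketch of a client-side DNS cache. It assumes a 30-second record TTL and models a client that may hold answers for at least `min_hold_s` seconds regardless of the TTL; the class, the addresses and the 30-minute hold are hypothetical and do not describe any particular browser's internals. A client that honours the TTL re-resolves shortly after the record expires and picks up the surviving site's address, while one that pins the answer keeps sending clicks to the downed site until its own timer runs out. That would be consistent with refreshing not helping while closing the browser (which discards its in-memory cache) does.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ClientDnsCache:
    """Toy model of a client-side DNS cache (hypothetical, for illustration)."""
    resolve: Callable[[str], Tuple[str, int]]  # name -> (address, ttl_seconds)
    min_hold_s: float = 0.0  # models a client that holds answers longer than the TTL
    _entries: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def lookup(self, name: str, now: float) -> str:
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still within the hold period: no new DNS query
        address, ttl = self.resolve(name)
        # Keep the answer for the TTL, or longer if the client enforces a minimum hold.
        self._entries[name] = (address, now + max(ttl, self.min_hold_s))
        return address


# Hypothetical addresses (documentation ranges) for the two data centres.
SITE_A, SITE_B = "192.0.2.10", "198.51.100.10"
current = {"answer": SITE_A}


def gtm_resolve(name: str) -> Tuple[str, int]:
    # Stand-in for the GTM: answers with whichever site is up, using an assumed 30 s TTL.
    return current["answer"], 30


honouring = ClientDnsCache(gtm_resolve)
pinning = ClientDnsCache(gtm_resolve, min_hold_s=1800)  # assumed 30-minute hold

for client in (honouring, pinning):
    client.lookup("www.example.com", now=0)

current["answer"] = SITE_B  # site A taken offline; the GTM now answers with site B

print(honouring.lookup("www.example.com", now=45))  # 198.51.100.10
print(pinning.lookup("www.example.com", now=45))    # 192.0.2.10
```

The same failover looks instant to the TTL-honouring client and lasts up to 30 minutes for the pinning one, even though the DNS answer changed at the same moment for both.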

I would imagine this is a standard, minimum requirement for most of us, and I'm sure others have run into this issue too.

Do I have to live with this situation, or is there anything I can do to resolve or minimise the impact on our customers' experience when browsing our site?

It might also be worth mentioning that our website config uses persistence, seeing as the customer's session adds items to their basket. Some of our site will be served via HTTPS too.

Any help or guidance would be greatly appreciated.

Many thanks,

3 Replies

  • George_Watkins_
    Historic F5 Account
    Hi f5gtm,

    The default TTL for a WideIP is 30 seconds. If a browser or local DNS cache holds the record longer than that, it is breaking the standard by not honoring the A record's TTL. I would recommend talking to your developers and asking for a 30-minute window to fail over the WideIP. You can monitor your LTM statistics until traffic subsides, and then they can proceed with their work.

    Site-specific session data always presents a challenge. A shared session database is the best way I know of to track shopping carts. I've seen it done site-to-site with a MySQL NDB cluster, and it works well.

    Hope this helps clarify,

  • Hi George,

    Thanks for the reply. I have more information, and I'd like you to clarify a point if possible.

    You mentioned getting the developers to wait 30 minutes for the failover to happen. Could you confirm how you would do the failover? We are doing it by disabling the pool member on the GTM, but the number of connections never seems to go down, or only by a small amount. We've even left it like that over the weekend, and the connections still seem to exist after that long period.

    The other part of this project has to do with Internet Explorer. It seems IE has its own internal DNS cache. When we perform a failover (by manually stopping the web service on the web server), our test systems using Firefox and Chrome carry on pretty much without problems; you might have to press F5 once or twice. With IE, however, the only way we can get our session to continue (i.e. carry on browsing the site) is to close the browser and start a fresh instance. We don't need any session data to be carried over on failover; we just need the browser to keep browsing the site so the customer isn't inconvenienced.

    Hopefully that makes sense. All we want these boxes to do is fail over without causing any disruption, and it seems we're unable to do this. I hope this is due to a lack of understanding on our part and not the 'features' of F5 and Microsoft products.
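On the question of how to tell when a disabled site has drained before maintenance starts: one approach is to poll the connection count until it falls to an acceptable level or a deadline passes. The Python sketch below assumes you have some way to read the current connection count for the site (for example, from the LTM statistics George mentions); `read_connections` is a hypothetical callable, not an F5 API, and the threshold, 30-minute timeout and poll interval are placeholder values. Returning `False` on timeout matters here, because long-lived or persistent connections may never reach zero, as in the weekend observation in the thread.

```python
import time
from typing import Callable


def wait_for_drain(
    read_connections: Callable[[], int],
    threshold: int = 0,
    timeout_s: float = 1800.0,  # placeholder 30-minute window
    poll_interval_s: float = 30.0,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the connection count is <= threshold or timeout_s elapses.

    Returns True if the count fell to the threshold in time, False otherwise.
    read_connections is a hypothetical hook; wire it to whatever statistics
    source you actually have. now/sleep are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = now() + timeout_s
    while True:
        if read_connections() <= threshold:
            return True
        if now() >= deadline:
            return False  # e.g. long-lived or persistent sessions never left
        sleep(poll_interval_s)


# Simulated counts: traffic tails off, but a few long-lived sessions remain.
samples = iter([120, 60, 25, 8, 5, 5, 5])
clock = {"t": 0.0}


def fake_sleep(seconds: float) -> None:
    clock["t"] += seconds  # advance the simulated clock instead of sleeping


drained = wait_for_drain(
    lambda: next(samples),
    threshold=10,
    timeout_s=300,
    now=lambda: clock["t"],
    sleep=fake_sleep,
)
print(drained)  # True: the count reached 8 on the fourth poll
```

Choosing a non-zero threshold is a judgement call: waiting for exactly zero may never finish if some clients hold connections open indefinitely.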
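On the IE behaviour described in the thread: it is consistent with a client that keeps using a cached address even after connections to it fail, whereas closing the browser throws that cache away. The Python sketch below models the alternative, a client that evicts its cached answer and re-resolves once when a connection fails. `resolve`, `connect` and the addresses are hypothetical stand-ins, and this is not a description of how Firefox, Chrome or IE are actually implemented; it only illustrates why re-resolving after a failure lets browsing continue without restarting the browser.

```python
from typing import Callable, Dict


class ReResolvingClient:
    """Toy client that drops a cached address when a connection to it fails."""

    def __init__(self, resolve: Callable[[str], str], connect: Callable[[str], bool]):
        self._resolve = resolve  # hostname -> address (hypothetical DNS lookup)
        self._connect = connect  # address -> True if the server answered
        self._cache: Dict[str, str] = {}

    def fetch(self, host: str) -> str:
        address = self._cache.get(host) or self._resolve(host)
        self._cache[host] = address
        if self._connect(address):
            return address
        # Connection failed: evict the stale entry and try one fresh lookup.
        del self._cache[host]
        address = self._resolve(host)
        self._cache[host] = address
        if self._connect(address):
            return address
        raise ConnectionError(f"{host} unreachable at {address}")


SITE_A, SITE_B = "192.0.2.10", "198.51.100.10"  # hypothetical addresses
dns = {"www.example.com": SITE_A}
up = {SITE_A: True, SITE_B: True}

client = ReResolvingClient(lambda h: dns[h], lambda addr: up[addr])
print(client.fetch("www.example.com"))  # 192.0.2.10

up[SITE_A] = False               # site A taken down for maintenance
dns["www.example.com"] = SITE_B  # the GTM now answers with site B
print(client.fetch("www.example.com"))  # 198.51.100.10
```

A client without the evict-and-retry step would keep returning the cached 192.0.2.10 and fail on every click until its cache entry expired or the process restarted.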