Forum Discussion

Robert_Landrito
Aug 04, 2009

browser-handled site failover

Greetings,

I'm trying to find a way to do a seamless site failover that may or may not involve GTM techniques. The current setup is:

- site 1, LTM 9.4.5
- site 2, LTM 9.4.5
- two A records for mysite.com

This is basic HTTP, and let's assume for simplicity that the website is static and mirrored at both sites. A standard http profile is attached and the client is Internet Explorer. With multiple A records, the idea was to have Internet Explorer fail over to site 2 in the event that site 1 goes down (or vice versa).

We find, however, that IE will only retry an alternate DNS record if the 3-way TCP handshake for a request fails. As far as I know, since LTM 9.x is a full proxy, the 3-way handshake always succeeds, even if all of the backend members of a particular VIP are down. I can use an iRule to send an RST on the CLIENT_ACCEPTED event, but this event only fires after the 3-way TCP handshake has completed. We did find that if a VIP is manually disabled (through the GUI or iControl), the 3-way handshake fails, and the browser does fail over in this case.
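
To make the client behavior concrete, here is a minimal Python sketch of "try the next A record only when the TCP connect fails". It is a model of the behavior described above, not IE's actual logic; the function name `connect_first_reachable` and the injectable `connect` parameter are hypothetical and exist only for illustration.

```python
import socket


def connect_first_reachable(addresses, port=80, timeout=3.0,
                            connect=socket.create_connection):
    """Try each address in order and return (address, connection) for the
    first one whose TCP connect succeeds.

    Note: a full proxy that completes the handshake and resets afterwards
    counts as a *success* here, so the loop never moves on to the next
    address. The failure only shows up later, on send/recv.
    """
    last_error = None
    for address in addresses:
        try:
            conn = connect((address, port), timeout)
            return address, conn
        except OSError as exc:  # refused, timed out, unreachable, ...
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

In this model, a disabled VIP makes the connect itself fail and the loop moves on, while a VIP with an empty pool still accepts the connection, which matches the difference we observed.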

We then considered a GTM option. In our setup, we found that GTM responded correctly to a downed site, and thereafter only published the IP address of the good site. But this does not help clients who are already making requests, or who already have the IP addresses cached locally.

We are now considering the following:

1) A user-defined trigger ("/config/user_alert.conf") that administratively disables a VIP on the log line "No members available for pool ". The problem with this is that no log is generated when a pool comes back online, so VIPs would have to be re-enabled manually.

2) An external monitor that checks the availability of a pool and administratively enables/disables VIPs accordingly. Our problem with this is that external monitors can get expensive. Our configs are already quite large and we'd like to avoid adding more load to the LTMs.

3) A separate external monitoring system that uses iControl and/or SNMP to monitor pool status and enable/disable VIPs. Polling too often would add unwanted load to the LTMs, and polling less often adds delay in detecting a change in pool status.
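
As a rough illustration of option 3, the polling logic could look like the Python sketch below. The three callables (`get_active_member_count`, `get_vip_enabled`, `set_vip_enabled`) are hypothetical placeholders for whatever iControl or SNMP calls the monitoring system would make; they are not real API names.

```python
def reconcile(pool_has_members, vip_enabled):
    """Return the action needed so the VIP state follows pool availability,
    or None if the VIP is already in the right state."""
    if pool_has_members and not vip_enabled:
        return "enable"
    if not pool_has_members and vip_enabled:
        return "disable"
    return None


def poll_once(get_active_member_count, get_vip_enabled, set_vip_enabled):
    """One polling cycle: read pool and VIP state, change the VIP only
    when it disagrees with the pool. Returns the action taken, if any."""
    action = reconcile(get_active_member_count() > 0, get_vip_enabled())
    if action is not None:
        set_vip_enabled(action == "enable")
    return action
```

Unlike option 1, this handles the "pool came back" case automatically, since the same check re-enables the VIP. The cost is the one already noted: detection is delayed by up to one polling interval.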

I'm open to any suggestions or input anyone can provide. Thanks!

5 Replies

  • Use the Fallback Host setting in the HTTP profile to issue a 302 redirect that points to the "other" LTM virtual server's destination IP.

    I'm not sure if IE will do a new lookup if you use the same URI, but you could just have two WideIPs in the GTM that both use the same pool(s). Then instead of issuing two responses to an A record request, you could issue one, and if an LTM virtual were to go down, DNS queries triggered by the fallback host redirect would automatically be answered with the active site.
  • Hamish
    Basically, it's not going to work very well... This is why load balancers came along in the first place many years ago (before iRules etc). They were an easy way to control the balance of traffic (better than round-robin DNS because it's harder to fake), and more importantly they provide availability.

    Now if you need 2 IP addresses... you need a smart DNS... In the BIG-IP world that would be a GTM. But you could fake it with DDNS and a custom alert on your LTM (when the pool goes down, remove the DNS record; when it comes up, add it back in).

    H
  • @Hamish - It works fine.

    @Robert - You need to hand out only one response to A/AAAA requests on GTM; then you are fully leveraging the synergy between LTM and GTM to service new lookups with the best available A or AAAA response, and using the HTTP fallback feature in LTM to redirect when the pool members are down. The only time you have to wait for a timeout is if you lose the LTM HA pair or the whole data center goes dark, both bigger problems than slow redirects. LTM will tell GTM when service is restored and then GTM will re-insert the LTM virtual server into rotation.

    If you have non-HTTP traffic, then you probably don't have a way to redirect the traffic in the application layer protocol, so the best thing to do is send an RST (which the LTM should do when the pool members are unavailable, depending on the Action On Service Down setting in the pool config). The more complex alternative is to add the virtual server address of the remote data center as a pool member, use priority group activation, and make the remote data center the fallback priority group; this will send the traffic from dc1 to dc2 if the servers in dc1 go down. It will also send new connections back to the restored priority group when the dc1 pool members come back online.

    The whole point of issuing only one A/AAAA response at the GTM is to keep this problem from happening. Older versions of Windows/IE were broken and would time out cached DNS records in minutes, even if the TCP SYN never elicited an ACK; that is mostly fixed now, so you should be OK setting responses to 1 and perhaps lowering your persistence timeout on GTM.
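
The priority group alternative mentioned in this reply can be modeled roughly as in the Python sketch below. It is a simplified illustration of the selection idea, not LTM's implementation: the function name, the tuple layout, and the member names are made up, and it assumes that a higher number means higher priority and that lower groups are added only while fewer than `min_active` members are available.

```python
def eligible_members(members, min_active=1):
    """members: list of (name, priority, is_up) tuples.

    Walk the priority groups from highest to lowest, adding each group's
    available members, and stop as soon as at least `min_active` members
    have been collected. Returns the names eligible for new connections.
    """
    up = [m for m in members if m[2]]
    chosen = []
    for priority in sorted({m[1] for m in up}, reverse=True):
        chosen.extend(name for name, prio, _ in up if prio == priority)
        if len(chosen) >= min_active:
            break
    return chosen


# Hypothetical pool in dc1: two local servers plus the dc2 virtual server
# as a low-priority member.
pool = [("dc1-web1", 10, True), ("dc1-web2", 10, True), ("dc2-vip", 5, True)]
```

With both dc1 servers up, only they are eligible; once both are marked down, the only eligible member is `dc2-vip`, and new connections return to dc1 as soon as one of its servers is up again.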