Forum Discussion

peter_booth_716
Nimbostratus
Aug 14, 2012

Why isn't Active/Active the norm?

I'm a technologist who admits to having some strong technical prejudices:

- I hate slow applications and websites, which is why I choose to work in performance.
- I prefer vendors who nurture open customer discussions (as F5 does).
- I prefer open source to commercial software (unless the commercial product is demonstrably better).
- I bristle at the idea of "enterprise vendors", having seen too much shelfware and too many wasted dollars when people spend Other People's Money.
- I hate seeing hardware that's under-utilized, having seen how it makes it harder to build snappy, fast systems.

And, in general, I hate Active/Standby configurations, because I don't like seeing hardware sitting "idle", and I don't trust a standby system to work unless I've seen it working.

All that said, I've become a fan of F5, having worked with many other hardware and software load balancers in the past. But one question stumps me:

Why is the most common LTM configuration active/standby and not dual active?

Peter


  • Hi Peter,

    The major drawback of active-active is that you risk loading both units past 50% of a single unit's capacity, and then suffering a complete outage when one unit goes down and the survivor can't carry the combined load (see the sketch at the end of this reply).

    F5 added support for device service clustering in v11. This allows you to cluster up to 8 devices (currently, with higher counts targeted) and run groups of VIPs on a single unit, which lets you achieve higher overall device utilization.

    Release Note: BIG-IP LTM and TMOS version 11.0.0

    https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/product/relnote_11_0_0_ltm.html#ltm_rn_1100_new

    Device Service Clustering

    In this release, the Traffic Management Operating System (TMOS) within the BIG-IP system includes an underlying architecture that allows you to create an N+1 redundant system configuration, known as device service clustering (DSC). This redundant system architecture provides both synchronization of configuration data across multiple BIG-IP devices and high availability at user-defined levels of granularity.

    Aaron
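
    As a rough back-of-the-envelope sketch of that 50% rule (plain Python, with utilization expressed as a fraction of one unit's capacity; the numbers are illustrative, not platform specs): for a group of N active units that must survive one failure, the combined load has to fit on N-1 units.

    ```python
    def survives_single_failure(unit_loads, unit_capacity=1.0):
        """True if the group can absorb the loss of any one member.

        unit_loads: per-unit utilization as a fraction of a single
        unit's capacity (0.4 == 40%)."""
        survivors = len(unit_loads) - 1
        # After a failure, the surviving units must carry the whole load.
        return survivors > 0 and sum(unit_loads) <= survivors * unit_capacity

    print(survives_single_failure([0.4, 0.4]))       # True: pair under 50% each
    print(survives_single_failure([0.6, 0.6]))       # False: combined 120% of one unit
    print(survives_single_failure([0.6, 0.6, 0.6]))  # True: N+1-style, 180% fits on two
    ```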
  • Hi Peter, Aaron,

    Fresh from the LTM course, and we have just implemented F5 in our network. We are running active/active with traffic profiles so we can steer the primary route of traffic to the right site, as we also have active/active data centres.

    We are hardly running anything through these boxes yet, but the memory allocation is already 32%. Does the 50% threshold for active/active implementations apply to memory, or just CPU?

    What parameters do you think we should check to ensure our failover is within limits?
  • The short answer is that all of this is currently changing. Aaron's answer (the first line) is spot on for most environments. In a 10.x world, you could run active/active, and with virtual-server creep over the years you would reach a point where the boxes could no longer fail over for one another in terms of traffic load. In 11.x, with device service clustering, you can run active/active/passive, or active/active/active/passive/passive (you get the point), as Aaron pointed out.

    I have set up a few active/active configurations. Most were on VIPRIONs running vCMP, but that's beside the point. What I am looking for in the future is clustering between physical and virtual hosts, which, IMHO, would be very cool, and something that folks ask about all the time.

    CPU, memory, throughput, SSL transactions per second: these are the limiting factors you run into when running active/active. Keep them in mind (see the sketch below), and remember: PLAN IT OUT FOR FUTURE GROWTH!
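
    A minimal sketch of that headroom check, with hypothetical per-unit limits and sample loads (look up your platform's real figures in its datasheet). Memory is left out here, since much of it is a static baseline rather than per-connection load:

    ```python
    # Hypothetical per-unit limits for the traffic-driven metrics.
    PER_UNIT_LIMITS = {"cpu_pct": 100, "throughput_gbps": 10, "ssl_tps": 20000}

    def failover_headroom(unit_a, unit_b, limits=PER_UNIT_LIMITS):
        """Per metric: room left before the combined load of both active
        units would exceed what a single surviving unit can carry."""
        return {m: limits[m] - (unit_a[m] + unit_b[m]) for m in limits}

    a = {"cpu_pct": 30, "throughput_gbps": 2, "ssl_tps": 4000}
    b = {"cpu_pct": 25, "throughput_gbps": 3, "ssl_tps": 6000}
    for metric, room in failover_headroom(a, b).items():
        status = "OK" if room >= 0 else "OVERCOMMITTED"
        print(f"{metric}: {status} (headroom {room:+})")
    ```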
  • About the memory datapoint of 32%: my limited experience suggests there's nothing to be concerned about there.

    We have a dev/QA LTM that gets almost zero traffic, sits under 2% CPU utilization, and currently shows 27% memory utilization.

    For me, that makes it a non-issue.

    Perhaps the memory figure is similar to memory usage in Linux or Solaris, where "free" memory is held on a free list or in a buffer pool, because that performs better than "really" freeing memory? (A small illustration follows.)
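
    To make the Linux side of that analogy concrete (assuming a Linux /proc/meminfo; BIG-IP may account for memory differently): memory parked in buffers and page cache is reported as "used" but is largely reclaimable, so a high utilization figure on a near-idle box isn't necessarily real pressure.

    ```python
    # Compare raw "free" memory with free-plus-reclaimable
    # (buffers and page cache). Linux-only.
    def meminfo_kb():
        fields = {}
        with open("/proc/meminfo") as f:
            for line in f:
                name, value = line.split(":", 1)
                fields[name] = int(value.strip().split()[0])  # values in kB
        return fields

    m = meminfo_kb()
    raw_free = m["MemFree"]
    reclaimable = m.get("Buffers", 0) + m.get("Cached", 0)
    print(f"raw free:           {raw_free} kB")
    print(f"free + reclaimable: {raw_free + reclaimable} kB")
    ```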