Forum Discussion

MH
Dec 08, 2022

GTM - Topology load balancing failover when one pool is down

Hello All,

I am looking for a solution to a problem that has been raised several times, but I have not found a confirmed solution. The situation I am in is described in the following post: GTM Topology across pools issue when one of the po... - DevCentral (f5.com)

We have two topology records with the same source but different destination pools, with different weights:

  • SRC: Region X => DEST: Pool A, weight 200
  • SRC: Region X => DEST: Pool B, weight 20
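
In tmsh terms the records would look roughly like the following (a sketch; "Region_X", "Pool_A", and "Pool_B" are placeholders for our objects, and the GUI "Weight" corresponds to the tmsh "score" attribute, which is also what the decision log below reports):

  gtm topology ldns: region /Common/Region_X server: pool /Common/Pool_A {
      score 200
  }
  gtm topology ldns: region /Common/Region_X server: pool /Common/Pool_B {
      score 20
  }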

When Pool A is down, topology load balancing for the Wide IP still selects Pool A, which is down, and no IP is returned to the client.

If the topology load balancing selection mechanism is not going to take the status of the destination pool into account and simply stops at the first match, then why have "Weight" at all? I do not believe disabling "Longest Match" would help, as that only affects the order in which the topology records are searched; it would still stop at the first match.
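
For reference, that global setting can be checked and toggled from tmsh (shown only to illustrate the point; it changes the record search order, not whether pool health is considered):

  tmsh list gtm global-settings load-balancing topology-longest-match
  tmsh modify gtm global-settings load-balancing topology-longest-match no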

The often mentioned solution is to use a single pool with Global Availability load balancing, as mentioned in the post: GTM and Topology - DevCentral (f5.com).

The problem I have is that Pool A and Pool B are pools with multiple generic host servers. I cannot put all the generic hosts into a single pool, because we want the members in each pool to be Active/Active, not Active/Backup.

Many thanks,

Michael

  • xuwen

    Please give the BIG-IP version and the configuration of the GTM wide IP and GSLB pools. In the wide IP, under Advanced >> Load Balancing Decision Log, check all the options. You can then see in /var/log/ltm why a given pool member was selected.
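
    The same logging can also be enabled from tmsh, for example (the wide IP name here is a placeholder):

    tmsh modify gtm wideip a www.example.com load-balancing-decision-log-verbosity { pool-selection pool-traversal pool-member-selection pool-member-traversal }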

    After testing: whether the wide IP can fall back and skip a GSLB pool whose members are TCP-monitor down depends on the Preferred, Alternate, and Fallback settings in the GSLB pool.
    1. If GSLB pool_A is manually disabled by the administrator, the wide IP will skip the disabled pool_A and automatically select the "up" GSLB pool_B.
    2. If all members of GSLB pool_A are disabled by the administrator, or are all marked down by the TCP monitor, the wide IP will only fall back to the "up" GSLB pool_B after the following steps:

    Set the wide IP Load Balancing Method to "Topology", with Pools ["gslb pool_A", "gslb pool_B"].

    For gslb pool_A and pool_B, set the Load Balancing Method "Preferred" to Round Robin, "Alternate" to None, and "Fallback" to Topology.
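
    From tmsh, this is roughly (using the pool names from the steps above):

    tmsh modify gtm pool a gslb_pool_A load-balancing-mode round-robin alternate-mode none fallback-mode topology
    tmsh modify gtm pool a gslb_pool_B load-balancing-mode round-robin alternate-mode none fallback-mode topology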

    root@(f5)(cfg-sync Standalone)(Active)(/Common)(tmos)# list gtm wideip a www.bestpay.com.cn
    gtm wideip a www.bestpay.com.cn {
        aliases {
            mapi.bestpay.com.cn
        }
        load-balancing-decision-log-verbosity { pool-selection pool-traversal pool-member-selection pool-member-traversal }
        pool-lb-mode topology
        pools {
            gslb_pool_bestpay_ctc_v4 {
                order 0
            }
            gslb_pool_bestpay_cuc_v4 {
                order 1
            }
        }
        topology-prefer-edns0-client-subnet enabled
    }
    
    
    
    root@(f5)(cfg-sync Standalone)(Active)(/Common)(tmos)# list gtm pool a gslb_pool_bestpay_ctc_v4 
    gtm pool a gslb_pool_bestpay_ctc_v4 {
        alternate-mode none
        fallback-mode topology
        members {
            DC-2-GTM-ipv4:/Common/vs_ctc_97_22 {
                disabled
                member-order 0
            }
            DC-2-GTM-ipv4:/Common/vs_ctc_97_23 {
                disabled
                member-order 1
            }
        }
    }
    root@(f5)(cfg-sync Standalone)(Active)(/Common)(tmos)# list gtm pool a gslb_pool_bestpay_cuc_v4 
    gtm pool a gslb_pool_bestpay_cuc_v4 {
        alternate-mode none
        fallback-mode topology
        members {
            DC-2-GTM-ipv4:/Common/vs_ctc_98_22 {
                member-order 0
            }
            DC-2-GTM-ipv4:/Common/vs_ctc_98_23 {
                member-order 1
            }
        }
    }


    • MH

      Hello,

      We already had the logging enabled, and we used it to verify that topology load balancing was selecting the pool with all members down:

      [pool member check failed (Yyyyyyyy_RAS:y.y.y.y)]
      [pool member (Yyyyyyyy_RAS:y.y.y.y) deleted persistence (y.y.y.y)]
      [matched topology record (ldns:(region:/Common/Japan_RAS_VPN_region), server:(pool:/Common/Geneva_RAS_VPN_Pool), score:20) to pool (Geneva_RAS_VPN_Pool)]
      [matched topology record (ldns:(region:/Common/Japan_RAS_VPN_region), server:(pool:/Common/Japan_RAS_VPN_Pool), score:200) to pool (Japan_RAS_VPN_Pool)]
      [topology selected pool (Geneva_RAS_VPN_Pool) - topology score (20) is higher]
      [topology selected pool (Japan_RAS_VPN_Pool) - topology score (200) is higher]
      [topology selected pool (Japan_RAS_VPN_Pool) with the highest topology score (200)]
      [topology selected pool (Japan_RAS_VPN_Pool)]
      [pool member select check failed (Zzzzzzzz_RAS_VPN:z.z.z.z) - pool member is disabled]
      [pool member select check failed (Yyyyyyyy_RAS:y.y.y.y) - pool member is disabled]
      [round robin failed to select a pool member]
      [failed to select pool member by preferred load balancing method]
      [selected configured option Return To DNS]

      We were using the default "Return to DNS" as the fallback load balancing method. If the Fallback load balancing method is set to Topology, would this not just apply the topology records to the pool members within that pool? I have not yet tested what you suggest; I will test it when I can.

      Many thanks,

      Michael

      • xuwen

        Whether the pool's fallback load balancing method is set to None or Topology, the result is the same (a lower-scoring "up" GTM pool is chosen for the wide IP).
        The system also checks the up status of the GTM pool members when persistence is enabled for the wide IP. As with LTM, if the pool member is down, a new one is selected.
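
        You can check the current records with tmsh (a quick way to confirm whether persistence is steering the answer):

        tmsh show gtm persist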

    • Hi xuwen,

      So setting fallback-mode to "none" within the pools replicates the issue?

      To me this makes no sense. We're talking about the LB mode within the wide-ip to choose the correct pool. The LB mode within the pools (and the fallback mode, at that!) should have no effect on that. Do you agree?

      /Mike

      • MH

        Hello,

        I found documentation saying that if we set the Alternate and Fallback methods to None, the wide IP will fall back and use the next available pool. This is what I am now testing; see the sketch below. It would work in a situation with only two load balancing pools. With three or more pools it is not ideal, as it does not give you explicit control over which pool is then selected: the documentation simply says the next available pool is used, so this is likely based on the order of the pools associated with the Wide IP.
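
        As a sketch, the change I am testing looks like this in tmsh (pool names as in the logs above):

        tmsh modify gtm pool a Japan_RAS_VPN_Pool alternate-mode none fallback-mode none
        tmsh modify gtm pool a Geneva_RAS_VPN_Pool alternate-mode none fallback-mode none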

        Thanks,

        Michael

  • Hi MH,

    This might be a long shot... but is it possible that you have enabled persistence? I'm not that experienced in DNS, but it does have a few quirks.

    /Mike

    • MH

      Hello,

      Persistence is enabled, and during normal operation of the service we can see in the log file that DNS requests are resolved to the correct IP based on persistence. In the failover situation, we can see that the selection is not using persistence to pick the answer.

      Many thanks,

      Michael