Forum Discussion

MH
Dec 08, 2022

GTM - Topology load balancing failover when one pool is down

Hello All,

I am looking for a solution to a problem that has been raised several times, but I have not found a confirmed solution. The situation I am in is described in the following post: GTM Topology across pools issue when one of the po... - DevCentral (f5.com)

We have two topology records with the same source but different destination pools and different weights:

  • SRC: Region X => DEST: Pool A, Weight 200
  • SRC: Region X => DEST: Pool B, Weight 20

When Pool A is down, Topology load balancing for the Wide IP still selects Pool A, and no IP is returned to the client.

If the topology selection mechanism does not take the status of the destination pool into account and simply stops at the first match, then why have a "Weight" at all? I do not believe disabling "longest match" would help, as this only affects the order in which the topology records are searched; it would still stop at the first match.

The often mentioned solution is to use a single pool with Global Availability load balancing, as mentioned in the post: GTM and Topology - DevCentral (f5.com).

The problem I have is that Pool A and Pool B are pools with multiple generic host servers. I cannot put all the generic hosts in a single pool, because we want the members in each pool to be Active/Active and not Active/Backup.

Many thanks,

Michael

11 Replies

  • Hi MH,

    This might be a long shot... but is it possible that you have enabled persistence? I'm not that experienced in DNS, but it does have a few quirks.

    /Mike

      MH

      Hello,

      Persistence is enabled. We can see in the log file that, during normal operation of the service, DNS requests are answered with the correct IP based on persistence. In the failover situation, we can see that the selection is not using persistence to pick the answer.

      Many thanks,

      Michael

    xuwen

    Please give the BIG-IP version and the configuration of the GTM wide IP and the GSLB pools. In the wide IP settings, under Advanced >> Load Balancing Decision Log, check all the options. You can then see why a pool member is selected in /var/log/ltm.

    After testing, whether the wide IP can fall back and skip a GSLB pool that is down (TCP monitor) depends on the Preferred, Alternate, and Fallback settings of the GSLB pool.
    1. If gslb pool_A is manually disabled by the administrator, the wide IP skips the disabled gslb pool_A and automatically selects gslb pool_B, whose status is "up".
    2. If all pool members of gslb pool_A are disabled by the administrator, or all of its pool members are marked down by the TCP monitor, the wide IP falls back to the "up" gslb pool_B only with the following settings:

    Wide IP: set Load Balancing Method to "Topology", with Pools ["gslb pool_A", "gslb pool_B"].

    gslb pool_A and pool_B: set "Preferred" to Round Robin, "Alternate" to None, and "Fallback" to Topology.

    root@(f5)(cfg-sync Standalone)(Active)(/Common)(tmos)# list gtm wideip a www.bestpay.com.cn
    gtm wideip a www.bestpay.com.cn {
        aliases {
            mapi.bestpay.com.cn
        }
        load-balancing-decision-log-verbosity { pool-selection pool-traversal pool-member-selection pool-member-traversal }
        pool-lb-mode topology
        pools {
            gslb_pool_bestpay_ctc_v4 {
                order 0
            }
            gslb_pool_bestpay_cuc_v4 {
                order 1
            }
        }
        topology-prefer-edns0-client-subnet enabled
    }
    
    
    
    root@(f5)(cfg-sync Standalone)(Active)(/Common)(tmos)# list gtm pool a gslb_pool_bestpay_ctc_v4 
    gtm pool a gslb_pool_bestpay_ctc_v4 {
        alternate-mode none
        fallback-mode topology
        members {
            DC-2-GTM-ipv4:/Common/vs_ctc_97_22 {
                disabled
                member-order 0
            }
            DC-2-GTM-ipv4:/Common/vs_ctc_97_23 {
                disabled
                member-order 1
            }
        }
    }
    root@(f5)(cfg-sync Standalone)(Active)(/Common)(tmos)# list gtm pool a gslb_pool_bestpay_cuc_v4 
    gtm pool a gslb_pool_bestpay_cuc_v4 {
        alternate-mode none
        fallback-mode topology
        members {
            DC-2-GTM-ipv4:/Common/vs_ctc_98_22 {
                member-order 0
            }
            DC-2-GTM-ipv4:/Common/vs_ctc_98_23 {
                member-order 1
            }
        }
    }
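To check many pools at once, the `list gtm pool` output above can be scanned for each pool's fallback mode. This is a minimal sketch, not an F5 tool: the function name `fallback_modes` and the pool `example_pool` are hypothetical, and it assumes that a stanza without a `fallback-mode` line is using the default, which this thread identifies as Return to DNS.

```python
import re

def fallback_modes(tmsh_output):
    """Map each 'gtm pool a <name>' stanza to its explicit fallback-mode.

    A value of None means no fallback-mode line was listed, which is
    assumed here to mean the default (Return to DNS).
    """
    modes = {}
    current = None
    for raw in tmsh_output.splitlines():
        line = raw.strip()
        # A new top-level stanza starts; track it only if it is an A-record pool.
        if line.startswith("gtm ") and line.endswith("{"):
            m = re.match(r"gtm pool a (\S+) \{$", line)
            current = m.group(1) if m else None
            if current is not None:
                modes[current] = None
            continue
        m = re.match(r"fallback-mode (\S+)$", line)
        if m and current is not None:
            modes[current] = m.group(1)
    return modes

sample = """
gtm pool a gslb_pool_bestpay_ctc_v4 {
    alternate-mode none
    fallback-mode topology
}
gtm pool a example_pool {
    alternate-mode none
}
"""
print(fallback_modes(sample))
# {'gslb_pool_bestpay_ctc_v4': 'topology', 'example_pool': None}
```

Pools reported as None are the ones that would stop at Return to DNS instead of letting the wide IP move on to the next pool.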

     

    • Hi xuwen,

      So fallback-mode within the pools as "none" replicates the issue?

      To me this makes no sense. We're talking about the LB mode within the wide-ip to choose the correct pool. The LB mode within the pools (and the fallback mode, at that!) should have no effect on that. Do you agree?

      /Mike

        xuwen

        In the wide IP settings, under Advanced >> Load Balancing Decision Log, check all the options. You can then see why a pool member is selected in /var/log/ltm.

        Overview of BIG-IP DNS Topology records (11.x - 17.x) (f5.com)

        After testing with BIG-IP VE: when the wide IP LB mode is Topology, it only considers the GTM pools bound to the wide IP and calculates a score for each one according to the topology algorithm. GTM pool A scores 200 points and GTM pool B scores 20 points, so pool A, with the highest score, is selected regardless of its up/down status at the wide IP level. (When the administrator disables a GTM pool, the wide IP skips it and directly selects the next GTM pool.)
        If GTM pool A has "Preferred" set to Round Robin, "Alternate" set to None, and "Fallback" set to None or Topology, then Round Robin fails because no active pool members are available at the pool level, the alternate and fallback methods fail as well, and the wide IP falls back to topology selection and chooses the lower-scoring pool_B, whose status is "up".

        If the fallback is set to Return to DNS, the request falls back to BIND for resolution, and all pool members in GTM pool A and pool B are returned as A records (such as A1, A2, B1, B2).
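The behaviour described above can be modelled in a few lines. This is only a sketch of what the decision logs in this thread suggest, not F5's implementation: the names `Pool`, `pick_member`, and `resolve` are hypothetical, the preferred method is simplified to "first available member", and the alternate method is assumed to be None.

```python
from dataclasses import dataclass, field

@dataclass
class Pool:
    name: str
    score: int                                   # topology score matched for this pool
    members: dict = field(default_factory=dict)  # member name -> is it available?
    fallback_mode: str = "return-to-dns"         # "none", "topology", or "return-to-dns"
    enabled: bool = True                         # False when an administrator disables the pool

def pick_member(pool):
    """Stand-in for the preferred method: first available member, else None."""
    for name, available in pool.members.items():
        if available:
            return name
    return None

def resolve(pools):
    """Walk enabled pools from highest to lowest topology score."""
    candidates = sorted((p for p in pools if p.enabled),
                        key=lambda p: p.score, reverse=True)
    for pool in candidates:
        member = pick_member(pool)
        if member is not None:
            return ("answer", pool.name, member)
        if pool.fallback_mode == "return-to-dns":
            # Selection stops here; the next pool is never tried.
            return ("return-to-dns", pool.name, None)
        # "none" or "topology": no member could be selected, try the next pool.
    return ("no-answer", None, None)

pool_a = Pool("pool_A", 200, {"a1": False, "a2": False}, fallback_mode="none")
pool_b = Pool("pool_B", 20, {"b1": True, "b2": True}, fallback_mode="none")
print(resolve([pool_a, pool_b]))   # ('answer', 'pool_B', 'b1')

pool_a.fallback_mode = "return-to-dns"
print(resolve([pool_a, pool_b]))   # ('return-to-dns', 'pool_A', None)
```

The point of the model is that the pool-level fallback mode decides whether a failed pool hands control back to the wide IP or ends the selection.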

         

        fallback set "None", log:

         

        [mapi.bestpay.com.cn A] 
        [matched topology record (ldns:(region:/Common/all_ctc), server:(region:/Common/Region_CTC_GSLB_Pool), score:2000) to pool (gslb_pool_bestpay_ctc_v4)] 
        [matched topology record (ldns:(region:/Common/Other_Region), server:(region:/Common/Region_CUC_GSLB_Pool), score:5) to pool (gslb_pool_bestpay_cuc_v4)] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4) - topology score (5) is higher] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4) - topology score (2000) is higher] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4) with the highest topology score (2000)] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4)] 
        [pool member select check failed (vs_ctc_97_23:58.213.97.23) - pool member is disabled] 
        [pool member select check failed (vs_ctc_97_22:58.213.97.22) - pool member is disabled] 
        [round robin failed to select a pool member] 
        [failed to select pool member by preferred load balancing method] 
        [Using none load balancing method] 
        [failed to select pool member by alternate load balancing method] 
        [Using none load balancing method] 
        [failed to select pool member by fallback load balancing method] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4) - topology score (5) is higher] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4) with the highest topology score (5)] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4)] 
        [pool member check succeeded (vs_ctc_98_23:58.213.98.23) - pool member state is available (green)] 
        [round robin selected pool member (vs_ctc_98_23:58.213.98.23)] 

         

         

        fallback set "Topology", log:

         

        [mapi.bestpay.com.cn A] 
        [matched topology record (ldns:(region:/Common/all_ctc), server:(region:/Common/Region_CTC_GSLB_Pool), score:2000) to pool (gslb_pool_bestpay_ctc_v4)] 
        [matched topology record (ldns:(region:/Common/Other_Region), server:(region:/Common/Region_CUC_GSLB_Pool), score:5) to pool (gslb_pool_bestpay_cuc_v4)] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4) - topology score (2000) is higher] 
        [topology skipped pool (gslb_pool_bestpay_cuc_v4) - topology score (5) is not higher] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4) with the highest topology score (2000)] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4)] 
        [pool member select check failed (vs_ctc_97_23:58.213.97.23) - pool member is disabled] 
        [pool member select check failed (vs_ctc_97_22:58.213.97.22) - pool member is disabled] 
        [round robin failed to select a pool member] 
        [failed to select pool member by preferred load balancing method] 
        [Using none load balancing method] 
        [failed to select pool member by alternate load balancing method] 
        [pool member select check failed (vs_ctc_97_22:58.213.97.22) - pool member is disabled] 
        [pool member select check failed (vs_ctc_97_23:58.213.97.23) - pool member is disabled] 
        [topology failed to select a pool member] 
        [failed to select pool member by fallback load balancing method] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4) - topology score (5) is higher] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4) with the highest topology score (5)] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4)] 
        [pool member check succeeded (vs_ctc_98_22:58.213.98.22) - pool member state is available (green)] 
        [round robin selected pool member (vs_ctc_98_22:58.213.98.22)] 

         

         

        fallback set "Return to DNS", log:

         

        [mapi.bestpay.com.cn A] 
        [matched topology record (ldns:(region:/Common/all_ctc), server:(region:/Common/Region_CTC_GSLB_Pool), score:2000) to pool (gslb_pool_bestpay_ctc_v4)] 
        [matched topology record (ldns:(region:/Common/Other_Region), server:(region:/Common/Region_CUC_GSLB_Pool), score:5) to pool (gslb_pool_bestpay_cuc_v4)] 
        [topology selected pool (gslb_pool_bestpay_cuc_v4) - topology score (5) is higher] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4) - topology score (2000) is higher] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4) with the highest topology score (2000)] 
        [topology selected pool (gslb_pool_bestpay_ctc_v4)] 
        [pool member select check failed (vs_ctc_97_23:58.213.97.23) - pool member is disabled] 
        [pool member select check failed (vs_ctc_97_22:58.213.97.22) - pool member is disabled] 
        [round robin failed to select a pool member] 
        [failed to select pool member by preferred load balancing method] 
        [Using none load balancing method] 
        [failed to select pool member by alternate load balancing method] 
        [selected configured option Return To DNS] 
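Long decision logs like the three above are easier to compare once reduced to their key events. The following is a minimal sketch that assumes the bracketed line format shown in these excerpts; the function name `summarize` and the event labels are hypothetical, and other BIG-IP versions may word the messages differently.

```python
import re

# Final pool choice only; lines that continue with "- topology score" or
# "with the highest topology score" are intentionally not matched.
POOL_RE = re.compile(r"\[topology selected pool \((?P<pool>[^)]+)\)\]")
FAIL_RE = re.compile(
    r"\[pool member select check failed \((?P<member>[^)]+)\) - (?P<reason>[^\]]+)\]"
)
PICK_RE = re.compile(r"\[\w[\w ]* selected pool member \((?P<member>[^)]+)\)\]")
RETURN_TO_DNS = "[selected configured option Return To DNS]"

def summarize(log_text):
    """Return the ordered decision events found in a decision log excerpt."""
    events = []
    for line in log_text.splitlines():
        if m := POOL_RE.search(line):
            events.append(("pool", m["pool"]))
        elif m := FAIL_RE.search(line):
            events.append(("member-failed", m["member"], m["reason"]))
        elif m := PICK_RE.search(line):
            events.append(("member-selected", m["member"]))
        elif RETURN_TO_DNS in line:
            events.append(("return-to-dns",))
    return events

excerpt = """
[topology selected pool (gslb_pool_bestpay_ctc_v4) with the highest topology score (2000)]
[topology selected pool (gslb_pool_bestpay_ctc_v4)]
[pool member select check failed (vs_ctc_97_23:58.213.97.23) - pool member is disabled]
[round robin failed to select a pool member]
[topology selected pool (gslb_pool_bestpay_cuc_v4)]
[round robin selected pool member (vs_ctc_98_23:58.213.98.23)]
"""
for event in summarize(excerpt):
    print(event)
```

A summary that ends in `return-to-dns` right after the first pool indicates that the second pool was never tried.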

         

         

      MH

      Hello,

      We already had the logging enabled, and we used this to verify that topology load balancing was selecting the pool with all members down:

      [pool member check failed (Yyyyyyyy_RAS:y.y.y.y)]
      [pool member (Yyyyyyyy_RAS:y.y.y.y) deleted persistence (y.y.y.y)]
      [matched topology record (ldns:(region:/Common/Japan_RAS_VPN_region), server:(pool:/Common/Geneva_RAS_VPN_Pool), score:20) to pool (Geneva_RAS_VPN_Pool)]
      [matched topology record (ldns:(region:/Common/Japan_RAS_VPN_region), server:(pool:/Common/Japan_RAS_VPN_Pool), score:200) to pool (Japan_RAS_VPN_Pool)]
      [topology selected pool (Geneva_RAS_VPN_Pool) - topology score (20) is higher]
      [topology selected pool (Japan_RAS_VPN_Pool) - topology score (200) is higher]
      [topology selected pool (Japan_RAS_VPN_Pool) with the highest topology score (200)]
      [topology selected pool (Japan_RAS_VPN_Pool)]
      [pool member select check failed (Zzzzzzzz_RAS_VPN:z.z.z.z) - pool member is disabled]
      [pool member select check failed (Yyyyyyyy_RAS:y.y.y.y) - pool member is disabled]
      [round robin failed to select a pool member]
      [failed to select pool member by preferred load balancing method]
      [selected configured option Return To DNS]

      We were using the default "Return to DNS" as the fallback load balancing method. If the fallback load balancing method is set to Topology, would this not only apply the topology rules to the pool members of the pool? I have not tested what you have suggested, but I will test it when I can.

      Many thanks,

      Michael

        xuwen

        With the fallback load balancing method set to None or Topology, the result is the same: the wide IP chooses a lower-scoring GTM pool whose state is "up".
        The system also checks the up status of the GTM pool member when persistence is enabled for the wide IP. As with LTM, if the pool member is down, a new one is selected.
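The persistence check described here can be pictured as a lookup that is only honoured while the remembered member is still up. This is a rough sketch under that assumption; `answer_with_persistence`, its arguments, and the `ldns-1` key are hypothetical and do not correspond to any BIG-IP API.

```python
def answer_with_persistence(ldns, persist, member_up, select):
    """Return a member for `ldns`, reusing the persisted one only while it is up.

    persist   -- dict mapping LDNS address to the member it was last given
    member_up -- dict mapping member name to True (up) or False (down)
    select    -- callable that runs the normal load balancing and returns
                 a member name, or None if nothing is available
    """
    member = persist.get(ldns)
    if member is not None and member_up.get(member, False):
        return member
    # Persisted member is missing or down: drop the record and reselect.
    persist.pop(ldns, None)
    member = select()
    if member is not None:
        persist[ldns] = member
    return member

persist = {"ldns-1": "a1"}
status = {"a1": False, "b1": True}
print(answer_with_persistence("ldns-1", persist, status, lambda: "b1"))  # b1
print(persist)  # {'ldns-1': 'b1'}
```

This matches the "deleted persistence" line in the earlier log, where the record for a failed member is removed before a new selection is made.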