
Forum Discussion

dipta_03_149731
Nimbostratus
Oct 07, 2015

The load balancing on GTM is not working properly. Preferred is Global Availability and rest are none.

I have a GTM setup with the following load balancing configuration between two VIPs, one for production and another for DR.

gtm pool xxx.lb {
    alternate-mode none
    fallback-ipv4 any
    fallback-mode none
    load-balancing-mode global-availability
    members {
        KAN-DMZ-GTM-A:xxx_lb-KC {
            order 1
        }
        NUS-DMZ-GTM-A:xxx_lb-NUS {
            disabled
            order 2
        }
        nus-f5-10200v:/xxx_com-80 {
            order 0
        }
    }
    monitor gateway_icmp
}

The preferred method is Global Availability (GA), and the alternate and fallback methods are none. With this load balancing method, all requests should go to the member with order 0, and only if that member is down should they go to the member with order 1. In our case, however, traffic often goes to the member with order 1, so the pool behaves somewhat like round robin.

Any suggestions?
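For reference, the expected Global Availability behavior described above can be modeled with a short Python sketch. This is only an illustration of the selection rule, not GTM's implementation: the `Member` class and `pick_global_availability` function are hypothetical names, and availability is set by hand instead of coming from monitors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    name: str
    order: int
    enabled: bool = True
    available: bool = True  # status as GTM currently sees it (assumed input)


def pick_global_availability(members: List[Member]) -> Optional[Member]:
    """Return the lowest-order member that is both enabled and available."""
    candidates = [m for m in members if m.enabled and m.available]
    if not candidates:
        # No usable member; what happens next depends on the alternate and
        # fallback settings, which this sketch does not model.
        return None
    return min(candidates, key=lambda m: m.order)


# Pool members and order values taken from the configuration in the post.
pool = [
    Member("KAN-DMZ-GTM-A:xxx_lb-KC", order=1),
    Member("NUS-DMZ-GTM-A:xxx_lb-NUS", order=2, enabled=False),
    Member("nus-f5-10200v:/xxx_com-80", order=0),
]

print(pick_global_availability(pool).name)  # order 0 member while it is up

pool[2].available = False  # simulate the order 0 member being marked down
print(pick_global_availability(pool).name)  # next lowest order takes over
```

Under this model, answers pointing at the order 1 member only appear while the order 0 member is considered down, so a round-robin-like pattern suggests the order 0 member's status is changing.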

11 Replies

  • Do you see any entries in /var/log/gtm about the order 0 vip being marked down/unavailable? Are these vips on LTMs? What sort of monitors is GTM using?

     

  • There's one more custom monitor apart from tcp. I have put that below:

     

    ltm monitor http xxx-http-lb.txt {
        defaults-from /Common/http
        destination :
        interval 30
        partition xxx
        recv YES
        recv-disable NO
        send "GET /_prodsupport/lb.txt HTTP/1.1\r\nHost: www3.xxx.com\r\nConnection: Close\r\n\r\n"
        time-until-up 0
        timeout 46
        up-interval 10
    }

     

  • I have completely disabled the DR VIP on GTM, so only the Prod VIP is active right now, yet we still see traffic going to DR.

     

  • Are you certain it is the GTM that is resolving the wideip to the DR vip? In the web interface check GTM virtual server statistics: Statistics >> Module Statistics: DNS: GSLB >> Statistics Type: Virtual Servers. The "Picks" column will show how many times GTM has resolved to each vip in the pool.

     

    GTM should only pick the DR vip if the primary is unavailable according to the way global availability works. If GTM is picking the DR vip then I would suspect it either thinks the primary is down or you are facing a bug in GTM.

     

    The monitor config you posted appears to be an LTM monitor. Does GTM also have monitors assigned to its pool members or virtual servers? These could be marking the GTM primary vip down causing GTM to resolve to the DR vip.

     

  • Hello Scott,

     

    Yes, I am sure that GTM is resolving the wideip to the DR VIP. One thing we noticed yesterday was that the primary VIP is continuously flapping and going RED. Whenever it goes RED, that is when the traffic goes to the DR VIP.

     

    On the LTM, the primary VIP and its pool members are green while the VIP shows red on GTM, so I am not sure why GTM is marking it down. I have pasted a few log entries below:

     

    Thu Oct 15 04:45:49 EDT 2015 notice gslb2 gtmd[2020] 011ae01c Connection complete to 216.229.158.9. Starting SSL handshake
    Thu Oct 15 04:45:49 EDT 2015 gslb2 iqmgmt_ssl_connect SSL error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
    Thu Oct 15 04:45:49 EDT 2015 err gslb2 gtmd[2020] 011ae0fa iqmgmt_ssl_connect: SSL error: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed (336134278)
    Thu Oct 15 04:45:49 EDT 2015 err gslb2 gtmd[2020] 011ae00b Could not find monitor object 216.229.154.10:443

     

    The main thing to notice here is that GTM logs the message "Could not find monitor object" for the primary VIP. I checked one of the SOL articles, which says GTM throws this error only when the GTM devices in sync are running different software versions. But our devices are running the same version and HF.

     

  • Hi dipta,

     

    The message iqmgmt_ssl_connect: SSL error: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed (336134278) would indicate that the iquery connection between the GTM and LTM is failing. This can cause unexpected monitor results. Possible causes are bigip device certificates are expired or bigips are not in time sync. Am I correct in assuming that the addr 216.229.158.9 is the LTM self-ip and the addr 216.229.154.10:443 is the vip?

     

    See these articles for troubleshooting the SSL error you are getting:

     

    sol9467

     

    sol14106

     

    Also take a look at articles regarding GTM monitors

     

    sol15408
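The two causes suggested above for the "certificate verify failed" error (an expired device certificate, or clocks out of sync) both come down to comparing a clock reading with the certificate's validity window. A minimal Python sketch of that comparison follows. The function name `cert_time_problem` and the example dates are hypothetical; obtaining the actual notBefore/notAfter values from a BIG-IP device certificate is not shown here.

```python
import ssl


def cert_time_problem(not_before: str, not_after: str, now: float) -> str:
    """Classify a clock reading against a certificate validity window.

    The timestamps use the text format accepted by ssl.cert_time_to_seconds,
    for example 'Oct 15 04:45:49 2015 GMT'. 'now' is seconds since the epoch.
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        return "not yet valid (is the local clock behind?)"
    if now > end:
        return "expired"
    return "ok"


# Hypothetical validity window, checked against the time of the log entries.
log_time = ssl.cert_time_to_seconds("Oct 15 08:45:49 2015 GMT")
print(cert_time_problem("Jan  1 00:00:00 2014 GMT",
                        "Jan  1 00:00:00 2024 GMT",
                        log_time))
```

Running the same check with each device's own clock reading would show whether one of them falls outside the window because of time drift.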

     

  • I checked those, Scott, and the iquery connection is successful. Also, iqdump over the self IP for the LTM and the GTM shows data, so the SSL error does not seem to be the cause. I checked the device certificates and they are valid until 2024.

     

    From the logs I see that the other GTM is marking this VIP down, even though it has the same big3d version and software version as the primary GTM.

     

    Oct 15 07:11:08 gslb2 alert gtmd[2020]: 011a4003:1: SNMP_TRAP: Pool /Common/ftp.lb member /SKP/xxx_im_com-22 (ip:port=216.x.x.x:22) state change green --> red ( Monitor /Common/gateway_icmp from /Common/NUS-DMZ-GTM-A : no reply from big3d: timed out)
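To see which monitor and reporting object are behind the flapping, state-change lines like the one above can be parsed and tallied. The Python sketch below is based only on the single log line quoted in this thread, so the regular expression is an assumption about the format, and the function names and the `source` field name (the object named after "from") are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Pattern derived from the one sample line in this thread (an assumption).
STATE_CHANGE = re.compile(
    r"Pool (?P<pool>\S+) member (?P<member>\S+) "
    r"\(ip:port=(?P<addr>[^)]+)\) "
    r"state change (?P<old>\w+) --> (?P<new>\w+) "
    r"\(\s*Monitor (?P<monitor>\S+) from (?P<source>\S+)\s*:\s*(?P<reason>.*)\)\s*$"
)


def parse_state_change(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a pool member state-change line, or None."""
    match = STATE_CHANGE.search(line)
    return match.groupdict() if match else None


def red_transitions(lines: Iterable[str]) -> Counter:
    """Count transitions to red per (monitor, source, reason)."""
    counts: Counter = Counter()
    for line in lines:
        event = parse_state_change(line)
        if event and event["new"] == "red":
            counts[(event["monitor"], event["source"], event["reason"])] += 1
    return counts


SAMPLE = (
    "Oct 15 07:11:08 gslb2 alert gtmd[2020]: 011a4003:1: SNMP_TRAP: "
    "Pool /Common/ftp.lb member /SKP/xxx_im_com-22 (ip:port=216.x.x.x:22) "
    "state change green --> red ( Monitor /Common/gateway_icmp from "
    "/Common/NUS-DMZ-GTM-A : no reply from big3d: timed out)"
)

print(parse_state_change(SAMPLE))
```

Feeding the lines of /var/log/gtm through `red_transitions` would show whether the red transitions cluster on one monitor or one reporting object.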

     

  • The "iqmgmt_ssl_connect: SSL error" is not normal. I still suspect there is an iquery connection failure somewhere. If I understand correctly you have 2 GTMs, and they will have an iquery connection with each other. Both of these GTMs will also have an iquery connection with every LTM server object present in the GTM config. This is how GTM delegates monitoring tasks to bigips in your data centers. With the default config GTM rotates monitoring tasks among all the bigips in a data center. I suspect there is an LTM that the GTM is failing to reach when it tries to delegate a monitor. The error "no reply from big3d: timed out" also leads me in that direction.

     

    Another possible issue could be that the bigip being asked to run the monitor is incapable of reaching the device that is to be monitored. This could account for the fact that the primary is up sometimes but down at other times; it would depend on which bigip is performing the monitoring. Are all bigips able to communicate with one another over tcp port 4353? Are all bigips able to reach every vip that is being monitored?

     

    If all the GTM vips are hosted on LTMs then the best practice is to not add additional GTM monitors to the servers or pools. Let the LTMs report vip status to the GTMs through the bigip monitor and iquery connection.

     

    I would also check big3d version on every bigip, GTM and LTM. They must all be compatible.

     

    If you haven't seen these articles I highly recommend reading them:

     

    sol13703

     

    sol8170
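The question above about TCP port 4353 can be checked roughly with a plain connection test from any host that has Python. This is a sketch under assumptions: `tcp_reachable` and `report` are hypothetical helper names, the host list must be supplied by the reader, and a successful TCP connect only shows the port is open, not that the iquery SSL handshake succeeds.

```python
import socket
from typing import Dict, Iterable

IQUERY_PORT = 4353  # port named in the reply above


def tcp_reachable(host: str, port: int = IQUERY_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def report(hosts: Iterable[str], port: int = IQUERY_PORT) -> Dict[str, bool]:
    """Map each host to whether the port accepted a TCP connection."""
    return {host: tcp_reachable(host, port) for host in hosts}
```

For example, `report(["192.0.2.10", "192.0.2.11"])` (placeholder addresses) returns a dictionary showing which self IPs accepted a connection. Running it from each site's network gives a rough picture of which bigips cannot reach each other.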

     

  • Hi Scott, thanks for shedding some light on this, which helped us troubleshoot the root cause.

     

    GTM was trying to communicate with an old LTM. Its virtual servers had been removed and put onto another LTM, but we forgot to remove the old server objects from GTM, hence the continuous "no reply from big3d" errors.

     

    I also removed the gateway_icmp monitor that I had on the pool in GTM.

     

    Above troubleshooting resolved the issue.

     

    The issue was due to the combination of:
    - 2 different VS
    - on 2 different servers
    - sharing the same IP
    - using the additional monitor /Common/gateway_icmp

     

    When a single bigip is responsible for probing both VS at the same time, the big3d response does not contain the server_name and the vs_name, so gtmd thinks there was no response to its request (the request did contain the vs_name and server). By removing the monitor or the VS from the pool we managed to resolve the issue.
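The combination described above (virtual servers on different server objects sharing one IP) can be spotted ahead of time by grouping an inventory of virtual servers by address. The Python sketch below assumes the inventory is already available as tuples; the function name `shared_addresses`, the tuple layout, and the example names and address are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (server object, virtual server name, ip address); hypothetical layout
VirtualServer = Tuple[str, str, str]


def shared_addresses(virtual_servers: List[VirtualServer]) -> Dict[str, List[Tuple[str, str]]]:
    """Return IPs used by virtual servers on more than one server object."""
    by_ip: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for server, vs_name, ip in virtual_servers:
        by_ip[ip].append((server, vs_name))
    return {
        ip: owners
        for ip, owners in by_ip.items()
        if len({server for server, _ in owners}) > 1
    }


# Hypothetical inventory: an old and a new server object both define a
# virtual server on the same address.
inventory = [
    ("old-ltm", "app_vs_old", "192.0.2.10"),
    ("new-ltm", "app_vs_new", "192.0.2.10"),
    ("new-ltm", "other_vs", "192.0.2.20"),
]

print(shared_addresses(inventory))
```

Any address reported here that also sits in a pool with an extra monitor such as /Common/gateway_icmp would be a candidate for the behavior seen in this thread.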