Forum Discussion

Dave_Watts_1515
Nov 24, 2014

F5 not sending health check to all members in a pool

We have an active/standby setup. The standby unit is only sending health checks to 1 out of 10 members in a pool. I manually tested the connection (SMTP) and the F5 can connect to all members. When capturing traffic via tcpdump, you never see the traffic go out to 9 of the 10 members.
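For context, the capture was along these lines (the subnet and port below are placeholders, not our actual values):

```shell
# Watch monitor traffic leaving the BIG-IP toward the pool members.
# 0.0 captures on all VLANs; 10.0.0.0/24 and port 25 are placeholder values.
tcpdump -ni 0.0 net 10.0.0.0/24 and port 25
```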

 

Anyone seen anything like this before?

 

    • Dave_Watts_1515
      1 of 10 members is marked as available. All 10 of them are reachable via ICMP and telnet to the SMTP port from the CLI. The other 9 are marked down by the monitor. Watching via tcpdump, you can see the health checks going to the 1 server, but you never see traffic to the other 9.
  • Have you checked the MAC address? Is it correct?

     

    Have you tried restarting bigd?
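    For anyone unfamiliar, a sketch of the restart from the advanced (bash) shell:

    ```shell
    # Restart the health-monitor daemon (bigd) on this unit
    bigstart restart bigd
    # Or restart it across all blades/cluster members at once
    clsh bigstart restart bigd
    ```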

     

  • I've seen this before, but don't yet have an answer as to why it happens. Force them down on the standby and then re-enable them to force the monitor to retest.
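    A sketch of that from tmsh, assuming a pool named smtp_pool with a member 10.0.0.1:25 (both placeholder names):

    ```shell
    # Force the member offline on the standby (placeholder pool/member names)
    tmsh modify ltm pool smtp_pool members modify { 10.0.0.1:25 { state user-down session user-disabled } }
    # Re-enable it so the monitor retests the member
    tmsh modify ltm pool smtp_pool members modify { 10.0.0.1:25 { state user-up session user-enabled } }
    ```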

     

  • Forcing them offline and then re-enabling them fixed this. Is there a specific condition that causes the health check service/process to fail, requiring this manual intervention? Thanks for the tip!

     

    • nitass (F5 Employee)
      I think it may be good to open a support case. Much appreciated if you update us with the outcome. :-)
  • I'm having the same problem. I use the http monitor, and some members stay in the checking state indefinitely; with tcpdump, the check attempt is never even made.

     

    Tests with ping, telnet, and curl are all successful.

     

    >> I already restarted bigd;

     

    >> I forced the members offline and re-enabled them;

     

    *** I believe a possible workaround would be the touch /service/mcpd/forceload procedure followed by a reboot; however, doing that would not give me the root cause of the problem.

     

    Note: As a stopgap, I enabled the tcp monitor and some members came back; however, some still have the problem.

     

    Update, November 14, 2018:

     

    The "Logging Monitor" enabled on a Node was the trigger for Health Check failure, it is a known BUG as described in article K06263705.

     

    After disabling monitor logging and restarting the bigd process (clsh bigstart restart bigd), the environment returned to normal.

     

    Problem solved!