24-Nov-2014 12:11
We have an active/standby setup. The standby unit is only sending health checks to 1 out of 10 members in a pool. I manually tested the connection (SMTP) and the F5 can connect to all members, but when capturing traffic via tcpdump you never see the monitor traffic go out to the other 9 of 10 members.
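For anyone wanting to reproduce the check described above: a minimal sketch of how to watch for monitor probes and manually test the SMTP path from the BIG-IP shell. The member address 10.0.0.10 and interface 0.0 are placeholders, not values from this thread.

```shell
# Capture monitor traffic to one pool member on the standby unit
# (0.0 is the BIG-IP "all VLANs" pseudo-interface; 10.0.0.10 is a placeholder member IP)
tcpdump -ni 0.0 host 10.0.0.10 and port 25

# Manually test the same SMTP path the monitor would use
telnet 10.0.0.10 25
```

If telnet connects but tcpdump shows no periodic probes from bigd, the monitor itself is not attempting the check, which matches the symptom described here.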
Anyone seen anything like this before?
24-Nov-2014 13:02
Are the nodes marked available?
24-Nov-2014 16:57
24-Nov-2014 17:05
I've seen this before, but don't yet have an answer as to why it happens. Force them down on the standby and then re-enable them to force the monitor to retest.
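The force-down/re-enable workaround suggested above can be done from tmsh roughly like this. Pool name smtp_pool and member 10.0.0.10:25 are hypothetical, not taken from the thread:

```shell
# Force the member offline so the monitor stops considering its last state
tmsh modify ltm pool smtp_pool members modify { 10.0.0.10:25 { state user-down session user-disabled } }

# Re-enable the member; the monitor should retest it from scratch
tmsh modify ltm pool smtp_pool members modify { 10.0.0.10:25 { state user-up session user-enabled } }
```

Run this on the unit showing the stuck monitor state (the standby, in this case).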
24-Nov-2014 18:12
Are you using a VLAN group?
25-Nov-2014 10:42
Forcing them offline and then re-enabling them fixed this. Is there a specific condition that causes the health-check service/process to fail, requiring this manual intervention? Thanks for the tip!
26-Nov-2014 16:39
Argh. This fixed the standby; however, it broke the active.
30-Nov-2014 00:06
12-Nov-2018 21:05
I'm having the same problem. I use the http monitor and some members stay stuck in "checking" status indefinitely; with tcpdump, the check attempt is never even made.
Tests with ping, telnet, and curl are all positive.
>> I already restarted bigd;
>> I forced the members offline and re-enabled them;
*** I believe a possible solution would be the touch /service/mcpd/forceload procedure and a reboot; however, doing that would leave me without the root cause of the problem.
Note: As a workaround I enabled the tcp monitor and some members came back; however, some still have the problem.
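For reference, the forceload procedure mentioned above is typically the following (it makes mcpd rebuild its state from the binary configuration on the next boot; use with care, as it requires a reboot):

```shell
# Tell mcpd to reload the configuration from scratch at next startup
touch /service/mcpd/forceload
reboot
```

As the poster notes, this may clear the symptom without revealing the root cause.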
Update 14-Nov-2018
A "Logging Monitor" enabled on a node was the trigger for the health-check failures; it is a known bug, described in article K06263705.
After disabling the logging monitor and restarting the bigd process (clsh bigstart restart bigd), the environment returned to normal.
Problem solved!