24-Nov-2014 12:11
We have an active/standby setup. The standby unit is only sending health checks to 1 out of 10 members in a pool. I manually tested the connection (SMTP) and the F5 can connect to all members, but when capturing traffic via tcpdump you never see the monitor traffic go out to the other 9 of 10 members.
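For anyone wanting to reproduce the check described above: a minimal sketch of how to watch for monitor probes and manually test the SMTP path from the BIG-IP shell. The member address 10.0.0.10 and interface 0.0 are placeholders, not values from this thread.

```shell
# Capture monitor traffic to one pool member on the standby unit
# (0.0 is the BIG-IP "all VLANs" pseudo-interface; 10.0.0.10 is a placeholder member IP)
tcpdump -ni 0.0 host 10.0.0.10 and port 25

# Manually test the same SMTP path the monitor would use
telnet 10.0.0.10 25
```

If telnet connects but tcpdump shows no periodic probes from bigd, the monitor itself is not attempting the check, which matches the symptom described here.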
Anyone seen anything like this before?
24-Nov-2014 13:02
Are the nodes marked available?
24-Nov-2014 16:57
24-Nov-2014 17:05
I've seen this before, but don't yet have an answer as to why it happens. Force them down on the standby and then re-enable them to force the monitor to retest.
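The force-down/re-enable workaround suggested above can be done from tmsh roughly like this. Pool name smtp_pool and member 10.0.0.10:25 are hypothetical, not taken from the thread:

```shell
# Force the member offline so the monitor stops considering its last state
tmsh modify ltm pool smtp_pool members modify { 10.0.0.10:25 { state user-down session user-disabled } }

# Re-enable the member; the monitor should retest it from scratch
tmsh modify ltm pool smtp_pool members modify { 10.0.0.10:25 { state user-up session user-enabled } }
```

Run this on the unit showing the stuck monitor state (the standby, in this case).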
24-Nov-2014 18:12
Are you using a VLAN group?
25-Nov-2014 10:42
Forcing them offline and then re-enabling them fixed this. Is there a specific condition that causes the health-check service/process to fail, requiring this manual intervention? Thanks for the tip!
26-Nov-2014 16:39
Argh. This fixed the standby; however, it broke the active.
30-Nov-2014 00:06
12-Nov-2018 21:05
I'm having the same problem. I use the http monitor and some members stay stuck in "checking" status indefinitely; with tcpdump, the check attempt is never even made.
Tests with ping, telnet, and curl are all positive.
>> I already restarted bigd;
>> I forced the members offline and re-enabled them;
*** I believe a possible solution would be the touch /service/mcpd/forceload procedure and a reboot; however, doing that would leave me without the root cause of the problem.
Note: As a workaround I enabled the tcp monitor and some members came back; however, some still have the problem.
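For reference, the forceload procedure mentioned above is typically the following (it makes mcpd rebuild its state from the binary configuration on the next boot; use with care, as it requires a reboot):

```shell
# Tell mcpd to reload the configuration from scratch at next startup
touch /service/mcpd/forceload
reboot
```

As the poster notes, this may clear the symptom without revealing the root cause.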
Update 14-Nov-2018
A "Logging Monitor" enabled on a node was the trigger for the health-check failures; it is a known bug, described in article K06263705.
After disabling the logging monitor and restarting the bigd process (clsh bigstart restart bigd), the environment returned to normal.
Problem solved!