Forum Discussion
- Brad_ParkerCirrus
Are the nodes marked available?
- Dave_Watts_1515Nimbostratus
1 of 10 members is marked as available. All 10 of them are reachable via ICMP and via telnet to the SMTP port from tmsh. The other 9 are marked as down by the monitor. When I watched via tcpdump, I could see the health checks going to the 1 available server, but I never saw any health check traffic to the other 9.
- nitassEmployee
Have you checked the MAC address? Is it correct?
Have you tried restarting bigd?
- Brad_ParkerCirrus
I've seen this before, but don't yet have an answer as to why it happens. Force them down on the standby and then re-enable them to force the monitor to retest.
Are you using a VLAN group?
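The force-down and re-enable step suggested above can be sketched from the command line as follows. This is only an illustration under assumptions: the pool name smtp_pool and the member 10.0.0.11:25 are hypothetical placeholders, and the thread does not say whether the step was done through the GUI or tmsh. The script prints the commands by default and runs them only when RUN=1 is set, so try it on the standby unit first.

```shell
#!/bin/sh
# Hypothetical names (not from the thread); replace with your own.
POOL="smtp_pool"
MEMBER="10.0.0.11:25"

# Print the tmsh command for one action: "offline" (forced offline)
# or "enable" (re-enable, which makes the monitor retest the member).
member_cmd() {
    case "$1" in
        offline) attrs="session user-disabled state user-down" ;;
        enable)  attrs="session user-enabled state user-up" ;;
        *) echo "usage: member_cmd offline|enable" >&2; return 1 ;;
    esac
    echo "tmsh modify ltm pool $POOL members modify { $MEMBER { $attrs } }"
}

# Dry run by default: only print. Set RUN=1 on the BIG-IP to execute.
for action in offline enable; do
    cmd=$(member_cmd "$action") || exit 1
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        eval "$cmd"
    fi
done
```

Running it without RUN=1 only shows the two commands, which makes it easy to review them before touching a production unit.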
- Dave_Watts_1515Nimbostratus
Forcing them offline and then re-enabling fixed this. Is there a specific condition that causes the healthcheck service/process to fail requiring this manual intervention? Thanks for the tip!
- Dave_Watts_1515Nimbostratus
Argh. This fixed the standby; however, it broke the active.
- nitassEmployee
I think it would be good to open a support case. It would be much appreciated if you could update us on the outcome. :-)
I'm having the same problem. I use the http monitor, and some members stay in the checking state indefinitely; tcpdump shows that the check attempt is never even made.
Tests with ping, telnet, and curl all succeed.
>> I already restarted bigd;
>> I forced the members offline and re-enabled them;
*** I believe a possible solution would be the touch /service/mcpd/forceload procedure followed by a reboot; however, if I do this I will lose the root cause of the problem.
Note: As a workaround I enabled the tcp monitor and some members came back; however, some still have the problem.
Update, November 14, 2018
The "Logging Monitor" option enabled on a Node was the trigger for the health check failure. It is a known bug, as described in article K06263705.
After disabling the logging monitor and restarting the bigd process (clsh bigstart restart bigd), the environment returned to normal.
Problem solved!
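For anyone who lands here with the same symptom, the fix described above might look like this from the shell. It is a hedged sketch, not the poster's exact steps: the node name 10.0.0.11 is a hypothetical placeholder, and using the tmsh logging attribute on the node is an assumption about how the "Logging Monitor" option was turned off (the post does not say whether the GUI or tmsh was used). The bigd restart command is the one quoted in the post. The script prints the commands by default and runs them only when RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical node name; replace with the node that has monitor logging on.
NODE="10.0.0.11"

# Build the two commands: turn off monitor logging on the node (assumed
# tmsh equivalent of the GUI option), then restart bigd as quoted in the post.
build_cmds() {
    echo "tmsh modify ltm node $1 logging disabled"
    echo "clsh bigstart restart bigd"
}

# Dry run by default: only print. Set RUN=1 on the BIG-IP to execute.
build_cmds "$NODE" | while IFS= read -r cmd; do
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        # Redirect stdin so the command cannot swallow the remaining lines.
        sh -c "$cmd" </dev/null
    fi
done
```

Restarting bigd briefly interrupts health monitoring, so it is worth doing in a maintenance window where possible.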