Forum Discussion

paulpatriot_129
Feb 01, 2018

Health checks failing on one unit but working on the other, until rebooting the F5 fixes the problem

We failed the active unit over to the standby, and on some pools the health checks are now failing on the newly active unit. The original active unit had been working fine; we had to reboot the failing device to get its health checks working again. We are running version 12.1.2 HF2. Is anyone else experiencing this problem?

4 Replies

  • There are many possible causes, not all of them directly involving the BigIP. To start, I would suggest taking a packet capture on the unit where the health checks are failing. All monitor traffic should be sourced from the non-floating self-IP of the BigIP (each BigIP handles monitoring independently), which should make capturing simpler.

    If the problem is not apparent at that point, if traffic is not leaving the BigIP, or if traffic is leaving the BigIP and being answered but we are still not marking the pool members up, please open a support case so we can look at the issue in more depth.

    Note that in this case, a packet capture, a qkview, and a network diagram would all help with resolution.
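
To make the triage above concrete, here is a minimal Python sketch of how the capture results might be sorted into those cases. It assumes the capture has already been decoded into simple source/destination records; the `Packet` type, the `triage_monitor` function, and the example addresses are all hypothetical and not part of any F5 tool. The "unanswered" case (probes leave, nothing comes back) is an assumption drawn from the thread's later mention of ARP and firewall issues, and the sketch only checks whether packets are present, not whether a reply would actually satisfy the monitor.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    """Hypothetical, already-decoded capture record: only addresses are kept."""
    src: str
    dst: str


def triage_monitor(packets: Iterable[Packet], self_ip: str,
                   member_ip: str, member_up: bool) -> str:
    """Sort one pool member's monitor traffic into a rough triage bucket.

    Only packet presence is checked, not content, so a reply that would fail
    the monitor's receive string still counts as "answered" here.
    """
    pkts = list(packets)
    # Probes are expected to leave from this unit's non-floating self-IP.
    probes = sum(1 for p in pkts if p.src == self_ip and p.dst == member_ip)
    replies = sum(1 for p in pkts if p.src == member_ip and p.dst == self_ip)

    if probes == 0:
        return "not_leaving"        # monitor traffic never leaves the BigIP: support case
    if replies == 0:
        return "unanswered"         # look outside the BigIP: ARP, routing, firewall, server
    if not member_up:
        return "answered_but_down"  # replies arrive but member stays down: support case
    return "healthy"


if __name__ == "__main__":
    # Hypothetical addresses: self-IP 10.1.1.5, pool member 10.1.1.50.
    capture = [Packet("10.1.1.5", "10.1.1.50")]  # one probe out, no reply seen
    print(triage_monitor(capture, "10.1.1.5", "10.1.1.50", member_up=False))  # unanswered
```

Keeping the decision on plain counts makes it easy to feed from whatever tool decoded the capture; a real check would also need to match ports and look at reply content.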

     

  • OK, thanks. I have a case open with F5 (c2634290) and have uploaded the qkview files. As for the network topology, we are in a one-arm configuration using source NAT. I never saw this before we upgraded to version 12.1.2 HF2.

  • Unfortunately, your engineer on that case is correct. If the problem is no longer happening, determining the root cause becomes very difficult, if not impossible. As for possible causes: it could be an ARP issue, it could be that something has gone wrong on the system (though I am not aware of any known issues that would cause this), or it could be that a firewall is blocking an overly aggressive monitor. Many things can cause the system to fail its health monitors. While finding the root cause may not be possible, having a plan of action in case this happens again is not only possible, it's advisable.

    Should this happen again, I would suggest starting by taking a packet capture of one or more of the failing monitors, then taking a qkview. If you have the luxury of doing so, it may be better to hold off on rebooting until an investigation has been completed. If this is configuration-related or caused by a software bug, it may keep happening, and without the ability to trace what is going on we may not be able to identify and fix the underlying problem. F5 Support should absolutely be able to help you with this.

  • I have a capture history from the server side; let me look and see whether the health monitors were reaching the servers. Thanks.