Forum Discussion
Receiving Log of Servers down for 9-10 Seconds
Hi,
We are regularly receiving logs that our pool members are temporarily unavailable, about 16 times per week at random intervals. The back end we monitor with ICMP never goes offline, but the pool member we monitor with HTTP does.
It's always the same amount of downtime, consistently 9-10 seconds. It really doesn't make sense. We're on BIG-IP 5000s running 11.5.1 HF4.
I created another test server in the same virtual environment thinking it could be related to the application but the issue was reproduced. It appears to be either a virtual environment or F5 issue.
Any ideas on how to troubleshoot?
6 Replies
- shaggy_121467
Cumulonimbus
If you have an HA pair, are both BIGIPs showing the monitor failures simultaneously? If so, that helps eliminate a BIGIP system issue. I would start by using tcpdump to capture the failed monitors and see exactly why they're failing (bad response, no response, etc.). Another option is to have a non-F5 device in the same network as the F5s mimic the health monitor using curl, to see if it shows failures/timeouts at the same time as the F5.
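The "mimic the health monitor from another device" idea could be sketched roughly as below, here in Python rather than curl. This is only an illustration: `http_check`, `run_probe`, the 5 second interval, the 2 second timeout, and the placeholder address `192.0.2.10` are all assumptions, not anything taken from the BIG-IP configuration. Set the interval and timeout to match the real monitor.

```python
import time
import urllib.request
from datetime import datetime, timezone


def http_check(url, timeout):
    """Perform one GET and return (ok, detail). Hypothetical helper."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return (200 <= resp.status < 400, f"HTTP {resp.status}")
    except Exception as exc:  # timeouts, resets, refused connections, HTTP errors
        return (False, f"{type(exc).__name__}: {exc}")


def run_probe(url, interval=5.0, timeout=2.0, count=None,
              check=http_check, sleep=time.sleep, log=print):
    """Probe url every `interval` seconds and log each failure with a UTC timestamp.

    count=None runs until interrupted; a number limits the probes sent.
    Returns the list of (timestamp, detail) failures.
    """
    failures = []
    sent = 0
    while count is None or sent < count:
        ok, detail = check(url, timeout)
        if not ok:
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            failures.append((stamp, detail))
            log(f"{stamp} FAIL {url} {detail}")
        sent += 1
        sleep(interval)
    return failures


# Example call (placeholder address, runs until interrupted):
# run_probe("http://192.0.2.10/")
```

Comparing the logged timestamps against the BIG-IP log entries would show whether the server also fails for an independent client, or only for the F5.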
- Nfordhk_66801
Nimbostratus
It's never both units in the pair at the same time, but the issue affects both units at various random times.
- nathe
Cirrocumulus
My advice would be to review the health monitor setup, in particular the Interval and Timeout. Do these need to be increased? Also, I'd run a packet capture to see whether there is a delay in the response from the server, with something like: tcpdump -ni 0.0 host server_ip and port 80
- Nfordhk_66801
Nimbostratus
Found the issue to be a bug, with the help of F5.
Impact
Pool members monitored by the affected health monitor are erroneously marked down.
Symptoms
As a result of this issue, you may encounter the following symptoms:
Pool members are marked down when they are actually up.
A packet capture on the affected monitor traffic shows that the BIG-IP system receives a SYN/ACK from a pool member and responds with an ICMP destination unreachable message.
Here is the link to a solution article that details the issue.
SOL15907: The BIG-IP system may incorrectly send an ICMP destination unreachable message to a server responding to health monitor traffic on TCP source port 54321
https://support.f5.com/kb/en-us/solutions/public/15000/900/sol15907.html
SOL13123: Managing BIG-IP product hotfixes (11.x)
https://support.f5.com/kb/en-us/solutions/public/13000/100/sol13123/
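For anyone wanting to confirm the same symptom in a capture, a small helper like the one below could build a tcpdump command that shows both the monitor's TCP traffic and any ICMP destination unreachable messages for one pool member. It is a sketch under assumptions: the function name `monitor_capture_args` is made up, and the interface `0.0` and port 80 are simply taken from the tcpdump suggestion earlier in the thread. The filter uses standard pcap-filter syntax.

```python
import shlex


def monitor_capture_args(server_ip, server_port=80, interface="0.0", src_port=None):
    """Build a tcpdump argument list for monitor traffic to one pool member.

    Hypothetical helper. src_port optionally narrows the TCP part to
    connections that also use that port (for example 54321, per SOL15907).
    """
    tcp = f"tcp port {server_port}"
    if src_port is not None:
        tcp += f" and tcp port {src_port}"
    # Match the monitor's TCP exchange, or ICMP destination unreachable packets.
    bpf = f"host {server_ip} and (({tcp}) or icmp[icmptype] == icmp-unreach)"
    return ["tcpdump", "-ni", interface, bpf]


def as_shell_command(args):
    """Quote the arguments so the command can be pasted into a shell."""
    return " ".join(shlex.quote(a) for a in args)


# 10.0.0.5 is a placeholder pool member address.
print(as_shell_command(monitor_capture_args("10.0.0.5", src_port=54321)))
```

In the resulting capture, a SYN/ACK from the pool member followed by an ICMP unreachable from the BIG-IP would match the symptom described in the solution article.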