Forum Discussion
Gea-Suan_Lin_34
Nimbostratus
Jan 05, 2010
Packet loss under load
Hello,
We have two F5 BIG-IP LTM 6400 units running 9.4.7 HF2 in Active/Standby mode. I set up SNMP traps and relay the messages to IRC so our management team can easily see what's happening.
For about a year we have seen lots of monitor DOWN and UP messages. Because members come back UP quickly, and we usually have three or more servers in each pool, this has not been a real problem.
Recently, as the site has grown, the DOWN/UP flapping has become quite annoying, so I want to find the root cause and fix it. I've tried several ways to diagnose it.
I ran tcpdump on both the web server and the F5 itself and found packet loss, which is what causes the monitor to go DOWN:
https://gist.github.com/6f573f746c2eed533e65
As you can see, after the F5 sends the first SYN packet, webapi-1's first reply (SYN+ACK) is never received by the F5. Both sides then retransmit, and the resulting delay is what causes the problem.
I also ran ping (with a 0.01 s interval) and got:
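To see why a lost SYN+ACK matters so much for a health check, here is a rough Python sketch of when a TCP handshake can complete if the first few SYN/SYN-ACK exchanges are lost. It assumes the client's SYN retransmissions drive recovery with classic exponential backoff from an initial retransmission timeout of 3 seconds (the older RFC 2988 value; stacks following RFC 6298 start at 1 second). The actual RTO used by the monitor's TCP stack on 9.4.x, and the 0.2 s request/response time, are assumptions, not measured values.

```python
# Rough model: when does a TCP handshake finish if the first
# `lost_attempts` SYN/SYN-ACK exchanges are dropped?
# Assumption: the client's SYN retransmissions drive recovery, with
# classic exponential backoff starting at `initial_rto` seconds.

def handshake_completion_time(lost_attempts: int,
                              initial_rto: float = 3.0,
                              rtt: float = 0.0005) -> float:
    """Earliest time (seconds after the first SYN) the handshake completes."""
    # Retransmissions go out at rto, rto + 2*rto, ... so the SYN that
    # finally succeeds is sent at initial_rto * (2**lost_attempts - 1).
    send_time = initial_rto * (2 ** lost_attempts - 1)
    return send_time + rtt


def probe_succeeds(lost_attempts: int,
                   monitor_timeout: float,
                   initial_rto: float = 3.0,
                   request_time: float = 0.2) -> bool:
    """True if the handshake plus a hypothetical request/response time
    finishes before the monitor gives up."""
    done = handshake_completion_time(lost_attempts, initial_rto) + request_time
    return done < monitor_timeout


if __name__ == "__main__":
    for lost in range(3):
        t = handshake_completion_time(lost)
        print(f"{lost} lost exchange(s): handshake done at ~{t:.1f}s, "
              f"fits a 5s timeout: {probe_succeeds(lost, 5.0)}")
```

Under these assumptions each additional lost exchange roughly doubles the delay, so a couple of consecutive losses quickly exceed a 5-second monitor timeout; the real margin depends on the stack's actual RTO and on how the monitor measures its timeout.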
10000 packets transmitted, 9995 packets received, 0.1% packet loss
round-trip min/avg/max/stddev = 0.082/0.235/8.674/0.182 ms
At the same time, I also pinged the standby unit, which showed no packet loss:
10000 packets transmitted, 10000 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.079/0.155/0.666/0.040 ms
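The two ping summaries are easier to compare with exact numbers: ping displays loss to one decimal place, so "0.1%" on the active unit is really 5 of 10000 packets (0.05%), and its max RTT and stddev are also far higher than the standby's. A small Python sketch that parses both summaries; the regular expressions assume BSD-style output as shown above (they also accept Linux's "N received" wording).

```python
import re

# Matches BSD ("9995 packets received") and Linux ("9995 received") summaries.
COUNTS_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
# Matches "= min/avg/max/stddev ms" (BSD) or "= min/avg/max/mdev ms" (Linux).
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def loss_percent(summary: str) -> float:
    """Exact loss percentage, before ping rounds it for display."""
    m = COUNTS_RE.search(summary)
    if m is None:
        raise ValueError("no ping packet summary found")
    sent, received = int(m.group(1)), int(m.group(2))
    return (sent - received) * 100.0 / sent


def rtt_stats(summary: str) -> dict:
    """min/avg/max/stddev in milliseconds from the round-trip line."""
    m = RTT_RE.search(summary)
    if m is None:
        raise ValueError("no round-trip line found")
    return dict(zip(("min", "avg", "max", "stddev"), map(float, m.groups())))


active = """10000 packets transmitted, 9995 packets received, 0.1% packet loss
round-trip min/avg/max/stddev = 0.082/0.235/8.674/0.182 ms"""
standby = """10000 packets transmitted, 10000 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.079/0.155/0.666/0.040 ms"""

for name, out in (("active", active), ("standby", standby)):
    s = rtt_stats(out)
    print(f"{name}: loss={loss_percent(out):.2f}% "
          f"max_rtt={s['max']}ms stddev={s['stddev']}ms")
```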
Any idea what the cause might be? I've seen some discussion at http://devcentral.f5.com/Default.aspx?tabid=53&view=topic&postid=34302, but it doesn't seem to be the same issue.
4 Replies
- The_Bhattman
Nimbostratus
Hi Gea-Suan,
I have had this problem on and off. I haven't been able to resolve every case, but about 95% of them were related to how the web server responds to health probes. In most cases, changing the probe interval from, say, 5 seconds to 10 seconds with a larger timeout either resolved it or at least cut down the false alarms. In other cases the application needed tuning. Ultimately, your best bet is to work with F5 support so they can review your configuration and determine the best course of action.
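The advice of a longer interval with a larger timeout can be reasoned about with a simple model. A commonly cited F5 guideline is to set a monitor's timeout to three times its interval plus one second (for example, interval 5 s and timeout 16 s on the default http monitor). The Python sketch below is a simplified model, not how BIG-IP actually schedules probes: it counts how many probes start within one timeout window, assuming the member goes DOWN only if none of them succeeds. If the interval equals the 5-second timeout mentioned later in this thread (the interval value is an assumption), a single failed probe is enough to mark the member DOWN.

```python
import math


def recommended_timeout(interval: int) -> int:
    """Timeout following the (3 * interval) + 1 rule of thumb."""
    return 3 * interval + 1


def probes_in_window(interval: int, timeout: int) -> int:
    """Probes that start at t = 0, interval, 2*interval, ... strictly
    before `timeout` (simplified model: all must fail for DOWN)."""
    return math.ceil(timeout / interval)


for interval, timeout in ((5, 5), (5, 16), (10, 31)):
    print(f"interval={interval}s timeout={timeout}s -> "
          f"{probes_in_window(interval, timeout)} probe(s) must all fail")
```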
Bhattman
- Gea-Suan_Lin_34
Nimbostratus
I upgraded to 9.4.8 HF2 yesterday, and the problem still exists.
If only the monitor were affected, I could accept raising the timeout from 5 seconds (the current setting) to 10 seconds. But I've seen 2.7% packet loss at peak times, which slows down traffic between the F5 and the backend servers.
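To put the 2.7% peak figure in perspective: if losses were independent (a simplifying assumption; loss caused by a link problem is often bursty), the chance that one health-check connection loses at least one of its k packets is 1 - (1 - p)^k. The count of 10 packets per probe below is a hypothetical round number for a short HTTP check, not a measured value.

```python
def p_any_loss(loss_rate: float, packets: int) -> float:
    """Probability that at least one of `packets` independent packets is lost."""
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    return 1.0 - (1.0 - loss_rate) ** packets


# Hypothetical: handshake, request, response, ACKs and FINs of one probe.
PACKETS_PER_PROBE = 10

for label, rate in (("off-peak (ping)", 0.0005), ("peak", 0.027)):
    share = p_any_loss(rate, PACKETS_PER_PROBE)
    print(f"{label}: {share:.1%} of probes see at least one loss")
```

Not every loss fails a probe, since TCP retransmits, but the numbers show why per-packet loss that looks small touches a noticeable share of connections, and every hit costs at least one retransmission timeout.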
Anyway, I'll contact the support team to investigate the issue and find out what's happening. Thanks for the reply.
- smp_86112
Cirrostratus
So the SYN-ACK sent from the web server to the LTM was never received, which definitely indicates packet loss. This sounds very much like a duplex mismatch somewhere between the two endpoints. I would start by comparing the speed and duplex settings on every device along the path. In particular, if you have 100 Mbit links anywhere, make sure one side isn't set to 100/Full while the other is set to Auto/Auto. In that case the auto-negotiating side falls back to half duplex, which causes collisions and packet loss; I've seen it many times.
- The_Bhattman
Nimbostratus
It's definitely possible. Also, check the software version on the switches you are using. We had a couple of switches that were incorrectly going half duplex even when all the cabling and settings were set to Auto, and it turned out to be a bug.
Bhattman
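The duplex-mismatch scenario smp_86112 describes can be captured in a small model of how each end of a 10/100 link resolves its duplex. It encodes the standard auto-negotiation behavior: when one side is forced and the other auto-negotiates, the auto side detects the speed (parallel detection) but cannot learn the duplex, so it falls back to half duplex. This Python sketch ignores speed and assumes both auto sides advertise full duplex; the `Port` type is purely illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Port:
    """Hypothetical description of one end of a 10/100 Ethernet link."""
    autoneg: bool
    duplex: str = "full"  # forced duplex; ignored when autoneg is True


def resolved_duplex(a: Port, b: Port) -> Tuple[str, str]:
    """Duplex each end actually runs, as (a, b)."""
    if a.autoneg and b.autoneg:
        # Both negotiate; assumes both advertise full duplex.
        return ("full", "full")
    if a.autoneg:
        # Parallel detection finds the speed but not the duplex -> half.
        return ("half", b.duplex)
    if b.autoneg:
        return (a.duplex, "half")
    # Both forced: each side uses whatever it was configured with.
    return (a.duplex, b.duplex)


def is_mismatch(a: Port, b: Port) -> bool:
    da, db = resolved_duplex(a, b)
    return da != db


forced_full = Port(autoneg=False, duplex="full")  # "100/Full"
auto = Port(autoneg=True)                         # "Auto/Auto"
print(resolved_duplex(forced_full, auto), is_mismatch(forced_full, auto))
```

The usual fix is to make both ends match: either both auto-negotiating, or both forced to the same speed and duplex.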