monitor timeouts vs actual behaviour
The 3n+1 rule (timeout = (3 x interval) + 1) is only a recommendation from F5. It's based on their view of a 'safe' number of tries before marking a pool member down. If you find that a different interval/timeout works better in your environment, go for it and create a new monitor based on the default one.
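If you want to try that, something along these lines in tmsh should do it (the monitor and pool names are just placeholders, and the 31-second timeout is simply 3x10+1 to stay with the same rule):

  tmsh create ltm monitor http custom_http_10_31 defaults-from http interval 10 timeout 31
  tmsh modify ltm pool my_pool monitor custom_http_10_31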
Having said that, I have also seen, and wondered at, the behaviour of monitors when the servers are not behaving optimally (as far as I can see they behave exactly as per the doco when servers complete handshakes and other responses efficiently).
The F5 is supposed to RST a TCP connection after 3 retransmissions, yet we can see 6 SYN transmissions for a single client port in your tcpdump above. Maybe the monitors don't follow the TCP profile settings?
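For anyone who wants to capture the same thing themselves, a capture along these lines on the BIG-IP should show the monitor traffic (10.1.1.10 is just a placeholder for the pool member address):

  tcpdump -nni 0.0 -s0 -w /var/tmp/monitor.pcap host 10.1.1.10 and port 80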
In addition, I totally agree with your observation: "Testing just now, I see the HTTP monitor just crudely stuffing additional GET's down the same connection that's still waiting for a response, what's that all about??" What indeed? It has bothered me for a long time and I've been meaning to post to this forum about it. It looks like HTTP request pipelining, and I've only just realised that pipelining is in fact enabled by default in the http profile, so I wonder whether disabling pipelining in the profile will prevent the monitor's request-stuffing behaviour. I will try tomorrow. The problem with this behaviour is that if the single TCP connection that gets opened is a not-very-well one, we really only get one try at a response - if it hasn't responded to the first request it's hardly likely to respond to the second.
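For what it's worth, this is roughly what I'm planning to test - treat it as a rough sketch only, since the exact attribute names vary between TMOS versions and I haven't confirmed the monitor even consults the profile (http_no_pipelining and my_virtual are placeholder names):

  tmsh create ltm profile http http_no_pipelining defaults-from http enforcement { pipeline reject }
  tmsh modify ltm virtual my_virtual profiles replace-all-with { tcp http_no_pipelining }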
Another thing I've thought of trying is to apply three HTTP-based monitors with interval/timeout 5/6 to the same pool and require a minimum of 2 to pass in order to mark the member up. This would get around the request stuffing, and perhaps the single TCP connection with 5 retransmits. If anyone else has experimented with this kind of thing, please let us know!!
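In tmsh that would look something like this (the monitor and pool names are placeholders; the three monitors are identical apart from their names so they count as separate instances):

  tmsh create ltm monitor http http_a defaults-from http interval 5 timeout 6
  tmsh create ltm monitor http http_b defaults-from http interval 5 timeout 6
  tmsh create ltm monitor http http_c defaults-from http interval 5 timeout 6
  tmsh modify ltm pool my_pool monitor min 2 of { http_a http_b http_c }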
- Chris_Phillips, Apr 15, 2014 (Nimbostratus)
The 3n+1 makes sense to *some* extent if things work as documented, but they don't. I understood it was to require three failed checks in a row, which, sure - a consistent failure is proven. But with only a single connection possible, that logic doesn't even begin to exist, let alone be sub-optimal. Health monitors won't follow a TCP or HTTP profile at all. Profiles are applied to the VIP, so pool member availability wouldn't be relevant to them. It's just bog-standard OS TCP/IP stack defaults as I understand it, so I could presumably go and tune the actual kernel parameters, but somehow I don't think so! As for monitoring a pool of 50 web apps in the rack next door over a 10GE interconnect: if I don't get a TCP connection within 0.01 seconds, I'm realistically not going to get one, and if I did, the very fact I had to retry should be enough to not want it. Yet that is totally blocked off from what I can do. The way things are documented and taught really seems massively different to the reality.