Forum Discussion
Son_of_Tom_1379
Nimbostratus
Jun 02, 2014
BigIP LTM Pool Monitors Stuck on "Checking"
We have dissolved an F5 BigIP HA pair to redesign and update the configuration of a system without any downtime. The plan is to run two BigIPs side by side, and once we're happy with the new build, we'...
nitass
Employee
Jun 02, 2014
I understand that monitors are distributed across blades (i.e. not every blade performs the same health checks). I suspect some blade may be overwhelmed, especially if you have a lot of monitors. To see which blade runs which monitor, you can turn on bigd debug and check /var/log/bigdlog. Please make sure you turn the debug off afterwards; otherwise it will eat up your disk space.
root@(VIP2400-R77-S2)(cfg-sync Standalone)(/S1-green-P:Active)(/Common)(tmos)# list sys db bigd.debug
sys db bigd.debug {
    value "enable"
}
root@(VIP2400-R77-S2)(cfg-sync Standalone)(/S1-green-P:Active)(/Common)(tmos)# list sys db bigd.dbgfile
sys db bigd.dbgfile {
    value "/var/log/bigdlog"
}
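For anyone following along, the debug toggle described above can be sketched from the bash prompt roughly as follows. This is only an outline, assuming the `bigd.debug` and `bigd.dbgfile` db keys behave as shown in the listing above; how long to leave debug running is a judgment call, and the log grows quickly.

```shell
# Turn on bigd debug logging (output goes to the file named by bigd.dbgfile).
tmsh modify sys db bigd.debug value enable

# Confirm the setting and the log file location.
tmsh list sys db bigd.debug
tmsh list sys db bigd.dbgfile

# Watch the monitor activity while the problem is occurring (Ctrl-C to stop).
tail -f /var/log/bigdlog

# Turn debug off again so the log does not fill the disk.
tmsh modify sys db bigd.debug value disable
```

The last step matters most: leaving debug enabled is what eats the disk space.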
Son_of_Tom_1379
Nimbostratus
Jun 02, 2014
Thanks Patrik,
I'll give that a go, but I don't see manually restarting a service after every reboot as a viable solution. If it works around the issue for now, that would suffice; we would just need to make it procedure. The strange part is that the old system never needed this, although the old system used mainly ICMP monitors, with only a couple of http/tcp monitors in operation.
Nitass, this is a single 1600 system (until I put it into HA, then there will be two), but I will certainly review some verbose logs.
A new point on this: the system has been running overnight with email alerting in place, and I've received about 30 alerts of members going down. The old system did not have alerting in place (SNMP or otherwise), so I'm not sure whether this is usual, but we've never had an issue accessing services. The next question is, how many monitors is too many? We have about 50 nodes. The nodes that are reporting down use http monitors and are only ever down for about 10 seconds (or so it reports when the monitor comes back up). I'm tempted to just increase the timeout, as it's at the default 5 / 16 (interval / timeout, in seconds), and perhaps that's too low.
Thanks for your time guys, I'll report back with my next set of findings.
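On the 5 / 16 question: the default pairing follows the usual guideline of timeout = 3 x interval + 1, so that a member is only marked down after three missed checks. A rough sketch of applying the same rule to a longer interval is below. The interval of 10 and the monitor name `http_slow` are purely illustrative assumptions, and the script only prints the tmsh command rather than running it.

```shell
#!/bin/bash
# Compute a monitor timeout from an interval using the 3n + 1 guideline.
recommended_timeout() {
    local interval="$1"
    echo $(( 3 * interval + 1 ))
}

interval=10                                   # assumed example value, not from the thread
timeout="$(recommended_timeout "$interval")"  # 3 * 10 + 1

# "http_slow" is a hypothetical custom monitor name inheriting from the built-in http monitor.
# The command is echoed for review rather than executed.
echo "tmsh create ltm monitor http http_slow defaults-from http interval ${interval} timeout ${timeout}"
```

A custom monitor is used here because the built-in `http` monitor itself cannot be edited. Whether a longer interval actually stops the flapping would still need checking against the bigd debug output.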