Inconsistent Monitors

Question

We have two F5 LTMs in a Sync-Failover pair, load balancing some critical production services. They sync automatically, and both report that they are in sync. I inherited them a couple of weeks ago, with minimal handover, from someone who resigned.
We have a number of services, and the monitors for these services are set at pool level. There is a variety of monitors: some are built-in and some are custom.
The primary shows all services and nodes as up. The standby shows some nodes and some complete services as down. Hovering over the red diamond on a 'down' node on the standby gives a message of the form "Offline(Enabled) {Monitor} failed to connect. Failed to succeed before deadline @ {date}". The dates range from 1 to 5 weeks ago.
I've tried the following to diagnose:
Telnetting from the management interface of the standby to all 'down' services on the relevant port for each service. No problems encountered.
Pinging the 'down' nodes from the management interface of the standby, and from its internal and external interfaces. All are successful.
From this I think it's safe to assume that there are no firewall or routing issues.
I've run packet tracing on both the primary and standby LTMs. This shows:
No traffic between the management interface and any node. This is the same on both LTMs.
Conversations between the internal IP address of the primary and all nodes on a regular basis.
Conversations between the internal IP address of the secondary and some of the nodes (the 'up' ones).
No traffic between the internal IP address of the secondary and any of the 'down' nodes.
This leads me to think that the monitoring is done via the internal interface and that the secondary has decided not to recheck 'down' nodes.
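The packet-trace check described above could be repeated per node with something like the following. This is only a sketch: it assumes tcpdump is the capture tool, and the function name, the VLAN name, and the addresses in the usage note are illustrative, not taken from the thread.

```shell
#!/bin/bash
# Sketch only: watch for traffic between one self IP and one node.
# TCPDUMP can be overridden (for example TCPDUMP=echo) to print the
# arguments instead of starting a capture.
TCPDUMP="${TCPDUMP:-tcpdump}"

watch_monitor_traffic() {
    # $1 = VLAN or interface name, $2 = self IP of this unit,
    # $3 = node IP, $4 = number of packets to capture (default 20)
    local vlan="$1" self_ip="$2" node_ip="$3" count="${4:-20}"
    if [ -z "$vlan" ] || [ -z "$self_ip" ] || [ -z "$node_ip" ]; then
        echo "usage: watch_monitor_traffic <vlan> <self-ip> <node-ip> [count]" >&2
        return 2
    fi
    # -i: capture interface, -n: no name resolution, -c: stop after N packets
    "$TCPDUMP" -i "$vlan" -n -c "$count" host "$self_ip" and host "$node_ip"
}
```

For example, `watch_monitor_traffic internal 10.0.0.2 10.0.0.50` (hypothetical VLAN name and addresses) run on the standby should show packets if that unit is still probing the node; no packets within a few monitor intervals would be consistent with the observation above.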
According to the BIG-IP Local Traffic Manager: Monitors Reference, I should be able to enable and disable monitor instances. I thought that doing this might cause the secondary to attempt to contact the down nodes. Although I can list the instances of any monitor, I'm not given any way to select an instance and thereby enable or disable it. This applies both to monitors created before I took over support and to a test monitor I've just created.
So, I have several questions:
Is this normal behaviour?
If not, (or even if it is) what happens when there's a failover?
Is there any way of manually forcing the secondary to rescan?

shaggy · Answer

Is this normal behaviour? No.
If not, (or even if it is) what happens when there's a failover? The nodes may still fail.
Is there any way of manually forcing the secondary to rescan? bigd is the process that handles monitoring, so restarting that service may kick something into gear on the standby unit:
tmsh restart sys service bigd
Also, since this is strange behavior, you should open a case with F5 support. They may be able to help you better identify potential issues.

whoward_194825 · Answer

I'm having the same inconsistent monitor problems with 2 HA pairs in my environment.
My primary BIG-IP 4K units are showing all nodes as green, but the standby units are marking some nodes down.
I proceeded to reload both standby units, and the issue is still present.
Please advise if there are any other troubleshooting steps that I can try.
Regards,
WH

martin_sharratt · Answer

I raised a support case with F5, who said that this is due to a bug in the version my F5s were on (11.5.1). The cure is to upgrade to 11.6.0, which I've now done and which has solved the problem.
In the interim (it took a few weeks before I could update, for various unrelated reasons), running this command at a bash prompt on the primary forces them back in sync:
/usr/bin/tmsh modify cm device-group cluster name devices modify {primary {set-sync-leader } }
As we had a service that is stopped by the administrators daily, I created a cron job to run this command once a day in the early hours of the morning.
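A cron-driven wrapper along these lines is one way that daily job might look. It is a sketch, not the script actually used: the function name, the script path, and the 03:30 schedule are assumptions, and the device group and device names are left as arguments because the thread does not give the real ones.

```shell
#!/bin/bash
# Sketch only: re-assert the sync leader for a device group.
# TMSH can be overridden (for example TMSH=echo) for a dry run.
TMSH="${TMSH:-/usr/bin/tmsh}"

force_sync_leader() {
    # $1 = device group name, $2 = device name (supply your own values)
    local group="$1" device="$2"
    if [ -z "$group" ] || [ -z "$device" ]; then
        echo "usage: force_sync_leader <device-group> <device>" >&2
        return 2
    fi
    # Same command as quoted in the post, with the names parameterised.
    "$TMSH" modify cm device-group "$group" devices modify \
        '{' "$device" '{' set-sync-leader '}' '}'
}

# Run only when both arguments are given, so sourcing the file has no side effects.
if [ "$#" -eq 2 ]; then
    force_sync_leader "$1" "$2"
fi

# Hypothetical crontab line (03:30 daily; the thread does not state the time or path):
# 30 3 * * * /path/to/force_sync_leader.sh <device-group> <device>
```

Running it once with `TMSH=echo` prints the tmsh arguments without changing anything, which is a cheap way to check the names before scheduling it.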
Martin S

whoward_194825 · Answer

Thanks for the information, Martin S.
I'm currently running version 11.5, so I will be getting a maintenance window to upgrade to 11.6 as soon as possible.
Regards,
-WH

david_bizzle_20 · Answer

Do you have or know of the Bug ID related to this issue?
