Clock advanced case with F5
Since the end of last year, either of our 6900 LTM units will appear to freeze for no apparent reason and log errors such as the following:
Mon Feb 11 06:35:49 GMT 2013 notice tmm2 tmm2[10266] 01010029 Clock advanced by 3457 ticks
Mon Feb 11 06:35:49 GMT 2013 notice tmm3 tmm3[10267] 01010029 Clock advanced by 3458 ticks
Mon Feb 11 06:35:49 GMT 2013 notice tmm1 tmm1[10265] 01010029 Clock advanced by 3456 ticks
Mon Feb 11 06:35:49 GMT 2013 notice tmm tmm[10264] 01010029 Clock advanced by 3459 ticks
Mon Feb 11 06:35:51 GMT 2013 notice f5unit sod[6369] 010c0025 Toggle from active to standby to active.
Mon Feb 11 06:35:51 GMT 2013 notice f5unit sod[6369] 010c0025 Toggle from active to standby to active.
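For anyone wanting to track how often this happens, here is a rough Python sketch that pulls the "Clock advanced" notices out of a log and groups the tick counts per tmm process. The regex is written against the log lines quoted above only; the function names are my own, and the tick-to-second conversion assumes 1 tick is roughly 1 ms, which is only inferred from F5 describing these ~3457-tick events as "3.5 second pauses".

```python
import re
from collections import defaultdict

# Matches the notices quoted above, e.g.
# "... notice tmm2 tmm2[10266] 01010029 Clock advanced by 3457 ticks"
CLOCK_ADVANCED = re.compile(
    r"notice\s+(?P<proc>tmm\d*)\s+\S+\s+01010029\s+"
    r"Clock advanced by (?P<ticks>\d+) ticks"
)

# Sample taken verbatim from the log excerpt in this post.
SAMPLE = """\
Mon Feb 11 06:35:49 GMT 2013 notice tmm2 tmm2[10266] 01010029 Clock advanced by 3457 ticks
Mon Feb 11 06:35:49 GMT 2013 notice tmm3 tmm3[10267] 01010029 Clock advanced by 3458 ticks
Mon Feb 11 06:35:49 GMT 2013 notice tmm1 tmm1[10265] 01010029 Clock advanced by 3456 ticks
Mon Feb 11 06:35:49 GMT 2013 notice tmm tmm[10264] 01010029 Clock advanced by 3459 ticks
Mon Feb 11 06:35:51 GMT 2013 notice f5unit sod[6369] 010c0025 Toggle from active to standby to active.
"""


def parse_clock_advances(lines):
    """Return {tmm_name: [ticks, ...]} for every 'Clock advanced' line."""
    stalls = defaultdict(list)
    for line in lines:
        match = CLOCK_ADVANCED.search(line)
        if match:
            stalls[match.group("proc")].append(int(match.group("ticks")))
    return dict(stalls)


def ticks_to_seconds(ticks):
    """ASSUMPTION: 1 tick is about 1 ms (inferred from the post, not from F5 docs)."""
    return ticks / 1000


if __name__ == "__main__":
    for proc, ticks in sorted(parse_clock_advances(SAMPLE.splitlines()).items()):
        worst = max(ticks)
        print(f"{proc}: {len(ticks)} event(s), worst stall ~{ticks_to_seconds(worst)} s")
```

Pointing `parse_clock_advances` at the lines of a real log file instead of `SAMPLE` should make it easy to compare event counts before and after a change such as the ones described below.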
We run a pair of LTM 6900s across data centres (10GB dark fibre) in HA (hot/standby) mode. These freezes have caused us no end of problems, because many of our applications are trading/price platforms. We don't yet have VIP mirroring: given the high volume of small-packet traffic, we are not sure the F5 could cope, and there is the current risk of breaking connections again (NB: CPU runs at about 20%, memory is OK). However, as you can see above, the unit goes into panic mode, toggling from active to standby and back to active, which breaks these critical TCP sessions. The case has been with F5 since December 2012. Still open!
We applied the following setting about 3 months ago, and the errors have been happening less often since:
tmsh modify sys db failover.nettimeoutsec value 6
The latest suggestion from their labs is to disable the Linux NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
According to their lab tests, they say the following:
Here's where we stand concerning the NMI Watchdog:
The escalation engineer believes it will be worthwhile to turn off the NMI (non-maskable interrupt) watchdog on your device as the next step.
On our devices, this should be relatively safe because the NMI watchdog is really only used to detect serious failures in system components which might cause an ordinary computer to freeze. On F5 devices, we have our own hardware watchdog systems in place which cover this use case. The NMI watchdog is in fact disabled in LTM VE and vCMP. It is not required for normal operation of our equipment.
We'd like to emphasise that we're not seeing any watchdog triggering, but we are seeing peculiar behaviour with some of the interrupts during the 3.5 second pauses. We've run one of our lab units for 2 weeks with NMI watchdog disabled, and we did not see any incidence of the 3.5s pause. After re-enabling NMI watchdog and rebooting the device, we had two incidences within 3 days.
We upgraded the code from 10.2.1 (HF3) to 11.1.0 (HF5) as recommended a few months back, but the errors persist. We have also replaced both hardware units and moved the power.
Any comments out there? Help!