Forum Discussion

Luca_55898

Nimbostratus

Jul 11, 2011

Failover event detected.... not sure why

Can anyone shed some light on the following logs?

We have two LTMs in an HA config, and I'm not sure whether the devices are actually failing over.

Mon Jul 11 11:48:24 EST 2011 notice local/LB003 sod[5151] 01140045 HA reports tmm NOT ready.

Mon Jul 11 11:48:24 EST 2011 notice local/LB003 sod[5151] 010c0050 Sod requests links down.

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160016 Failover event detected. (Switchboard failsafe disabled while offline)

Mon Jul 11 11:48:25 EST 2011 err local/LB003 bcm56xxd[4993] 012c0010 Failover event detected. Marking external interfaces down. bsx.c(3276)

Mon Jul 11 11:48:25 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 1.15 is DOWN

Mon Jul 11 11:48:25 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 1.16 is DOWN

Mon Jul 11 11:48:25 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 3.1 is DOWN

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160016 Interface 1.15, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160010 Link 1.15 removed from aggregation

Mon Jul 11 11:48:25 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 3.2 is DOWN

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160016 Interface 1.16, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160010 Link 1.16 removed from aggregation

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160016 Interface 3.1, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160010 Link 3.1 removed from aggregation

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160016 Interface 3.2, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down

Mon Jul 11 11:48:25 EST 2011 info local/LB003 lacpd[5133] 01160010 Link 3.2 removed from aggregation

Mon Jul 11 11:48:35 EST 2011 notice local/LB003 sod[5151] 01140029 HA daemon_heartbeat tmm5 fails action is go offline down links and restart.

Mon Jul 11 11:48:35 EST 2011 notice local/LB003 sod[5151] 010c003e Offline

Mon Jul 11 11:48:35 EST 2011 notice local/LB003 sod[5151] 01140044 HA reports tmm ready.

Mon Jul 11 11:48:35 EST 2011 notice local/LB003 sod[5151] 010c0018 Standby

Mon Jul 11 11:48:36 EST 2011 info local/LB003 lacpd[5133] 01160016 Connected to failover service.

Mon Jul 11 11:48:36 EST 2011 info local/LB003 bcm56xxd[4993] 012c0012 Connected to failover service.

Mon Jul 11 11:48:36 EST 2011 notice local/LB003 sod[5151] 010c0048 Bcm56xxd and lacpd connected - links up.

Mon Jul 11 11:48:38 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 3.1 is UP

Mon Jul 11 11:48:38 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 3.2 is UP

Mon Jul 11 11:48:39 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 1.15 is UP

Mon Jul 11 11:48:39 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 1.16 is UP

Mon Jul 11 11:48:40 EST 2011 info local/LB003 lacpd[5133] 01160009 Link 1.15 added to aggregation

Mon Jul 11 11:48:40 EST 2011 info local/LB003 lacpd[5133] 01160009 Link 1.16 added to aggregation

Mon Jul 11 11:48:40 EST 2011 info local/LB003 lacpd[5133] 01160009 Link 3.1 added to aggregation

Mon Jul 11 11:48:40 EST 2011 info local/LB003 lacpd[5133] 01160009 Link 3.2 added to aggregation
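
One way to read a log like the one above is to measure how long each interface stayed down. The following is only a minimal sketch: it assumes the exported line format shown in this post (weekday, date, timezone, year, then the message), and the `link_outages` helper and the `LOG` sample are hypothetical names used for illustration, not an F5 tool.

```python
import re
from datetime import datetime

# Four bcm56xxd link events copied from the log in this post.
LOG = """\
Mon Jul 11 11:48:25 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 1.15 is DOWN
Mon Jul 11 11:48:25 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 3.1 is DOWN
Mon Jul 11 11:48:38 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 3.1 is UP
Mon Jul 11 11:48:39 EST 2011 info local/LB003 bcm56xxd[4993] 012c0015 Link: 1.15 is UP
"""

# Captures: "Jul 11 11:48:25", the year, the interface name, and UP/DOWN.
LINK_RE = re.compile(
    r"^\w{3} (\w{3} +\d+ \d\d:\d\d:\d\d) \w+ (\d{4}) .* Link: (\S+) is (UP|DOWN)$"
)

def link_outages(text):
    """Return {interface: seconds between a DOWN event and the next UP event}."""
    down_at, outages = {}, {}
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        if not m:
            continue  # not a link state line
        stamp, year, iface, state = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "DOWN":
            down_at[iface] = when
        elif iface in down_at:
            outages[iface] = (when - down_at.pop(iface)).total_seconds()
    return outages

print(link_outages(LOG))  # {'3.1': 13.0, '1.15': 14.0}
```

On these lines the links were down for roughly 13 to 14 seconds, which matches the window between "Sod requests links down" and "links up" in the sod messages.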


25 Replies

iamczar_12961
Cirrus
Jul 11, 2011
Hi there,

Can you get the logs from your units?

There's a point in the logs where local/LB003 went to standby, but I'd have to check your logs and configs to see whether it was originally the standby or the active unit.

I'm also concerned about the "HA reports tmm NOT ready." message. Can you check /var/core for any core files?

Once you have the serial number of the unit, you can open a ticket by sending an email to support@f5.com

Send the following too:

File: Standard qkview output

Command to Generate: qkview

Location of Output file: /var/tmp/$HOSTNAME.tgz

Files: Host log tarball

Command to Generate: tar zcvf /var/tmp/$HOSTNAME-logs.tar.gz /var/log/*

Location of Output file: /var/tmp/$HOSTNAME-logs.tar.gz
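
If you would rather script the two checks above (core files and the log tarball), something like the following could work. It is a rough sketch, not a replacement for qkview, which still has to be run separately. The function names `list_core_files` and `bundle_logs` are made up for this example, and the archive stores bare file names rather than the `var/log/...` paths that the tar command produces.

```python
import os
import tarfile

def list_core_files(core_dir="/var/core"):
    """Return paths of regular files in core_dir, oldest first ([] if it is missing)."""
    if not os.path.isdir(core_dir):
        return []
    paths = [os.path.join(core_dir, name) for name in os.listdir(core_dir)]
    return sorted((p for p in paths if os.path.isfile(p)), key=os.path.getmtime)

def bundle_logs(log_dir, out_path):
    """Write a gzip tarball of everything in log_dir, roughly like `tar zcvf`."""
    with tarfile.open(out_path, "w:gz") as tar:
        for name in sorted(os.listdir(log_dir)):
            # arcname keeps only the file name inside the archive
            tar.add(os.path.join(log_dir, name), arcname=name)
    return out_path

# Example use on the unit (paths follow the list above; hostname lookup is an assumption):
#   import socket
#   print(list_core_files())
#   bundle_logs("/var/log", f"/var/tmp/{socket.gethostname()}-logs.tar.gz")
```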

hth..
Luca_55898
Nimbostratus
Jul 11, 2011
Hi,

Yes there are numerous files in /var/core

I will log a case today
iamczar_12961
Cirrus
Jul 12, 2011
Sounds great! =)
afedden_1985
Cirrus
Oct 28, 2014
I know this is an old thread, but we just had a failover with the same symptoms on an HA pair of 11050s running 11.5.1 Hotfix 4. Did you ever find out what the root cause was?
aj1
Nimbostratus
Mar 04, 2015
Hi, I had the exact same logs, and there was a failover. The logs start with the "HA reports tmm NOT ready." line. I'm running an HA pair on 11.5.1 HF7. A couple of core files were created. I would appreciate any light on this.

Thank you.
jgranieri
Nimbostratus
Mar 05, 2015
Can you advise what your HA configuration is?

VLAN failsafe, gateway failsafe, bonus scoring? (If so, what are you monitoring?)
Dizzle_79606
Nimbostratus
Mar 05, 2015
Have you checked whether your switch port is flapping? It looks like you might be monitoring your bcm56xxd[4993] port, and it could be flapping.

aj1

Nimbostratus

Mar 10, 2015

Opened a case with support. It's a bug in 11.5.1 HF7, resolved in 11.5.2 and 11.6.0 HF4.

/var/log/ltm:

Mar  4 21:29:55 slot1/bigip notice sod[6191]: 01140045:5: HA reports tmm NOT ready.
Mar  4 21:29:55 slot1/bigip notice sod[6191]: 010c0050:5: Sod requests links down.
Mar  4 21:29:55 slot1/bigip info lacpd[7970]: 01160016:6: Failover event detected. (Switchboard failsafe disabled while offline)
Mar  4 21:29:55 slot1/bigip err bcm56xxd[8241]: 012c0010:3: Failover event detected. Marking external interfaces down. bsx.c(3988)
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/1.6 is DOWN
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/1.5 is DOWN
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160016:6: Interface 1/1.6, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160010:6: Link 1/1.6 removed from aggregation
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/1.4 is DOWN
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/1.3 is DOWN
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/1.2 is DOWN
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/1.1 is DOWN
Mar  4 21:29:56 slot1/bigip info bcm56xxd[8241]: 012c0015:6: Link: 1/9.1 is DOWN
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160016:6: Interface 1/1.3, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160010:6: Link 1/1.3 removed from aggregation
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160016:6: Interface 1/1.2, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160010:6: Link 1/1.2 removed from aggregation
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160016:6: Interface 1/1.1, link admin status: enabled, link status: down, duplex mode: half, lacp operation state: down
Mar  4 21:29:56 slot1/bigip info lacpd[7970]: 01160010:6: Link 1/1.1 removed from aggregation
Mar  4 21:29:59 slot1/bigip notice chmand[8246]: 012a0005:5: Interface: 1/8.1 is DOWN
Mar  4 21:29:59 slot1/bigip notice clusterd[6184]: 013a0006:5: Bumping this blade's revision and saving cluster config
Mar  4 21:30:08 slot1/bigip notice sod[6191]: 01140029:5: HA daemon_heartbeat tmm fails action is go offline down links and restart.
Mar  4 21:30:08 slot1/bigip notice sod[6191]: 01140029:5: HA daemon_heartbeat tmm1 fails action is go offline down links and restart.
Mar  4 21:30:08 slot1/bigip notice sod[6191]: 01140029:5: HA daemon_heartbeat tmm2 fails action is go offline down links and restart.
Mar  4 21:30:08 slot1/bigip notice sod[6191]: 01140029:5: HA daemon_heartbeat tmm3 fails action is go offline down links and restart.
Mar  4 21:30:08 slot1/bigip notice sod[6191]: 010c0054:5: Offline for traffic group /Common/traffic-group-1.
Mar  4 21:30:08 slot1/bigip notice sod[6191]: 010c003e:5: Offline
Mar  4 21:30:08 slot1/bigip err clusterd[6184]: 013a0018:3: Blade 1 turned RED: Run, HA TABLE offline
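
For anyone comparing their own /var/log/ltm against the excerpt above, a small parser can pull out which tmm instances missed their heartbeat. This is a sketch under the assumption that lines follow the layout shown here (timestamp, host, severity, `daemon[pid]:`, `code:level:`, message). `parse_ltm_line` and `failed_tmms` are hypothetical helper names.

```python
import re

# timestamp, host, severity, daemon, pid, message code, level, message text
LTM_RE = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (\w+) (\w+)\[(\d+)\]: "
    r"([0-9a-f]{8}):(\d): (.*)$"
)
HEARTBEAT_RE = re.compile(r"^HA daemon_heartbeat (\S+) fails")

def parse_ltm_line(line):
    """Split one ltm log line into named fields, or return None if it does not match."""
    m = LTM_RE.match(" ".join(line.split(" ")).rstrip())
    if not m:
        return None
    keys = ("time", "host", "severity", "daemon", "pid", "code", "level", "message")
    return dict(zip(keys, m.groups()))

def failed_tmms(lines):
    """Return the daemon names reported by sod as failing their heartbeat, in log order."""
    names = []
    for line in lines:
        rec = parse_ltm_line(line)
        if rec and rec["daemon"] == "sod":
            hit = HEARTBEAT_RE.match(rec["message"])
            if hit:
                names.append(hit.group(1))
    return names

SAMPLE = [
    "Mar  4 21:29:55 slot1/bigip notice sod[6191]: 01140045:5: HA reports tmm NOT ready.",
    "Mar  4 21:30:08 slot1/bigip notice sod[6191]: 01140029:5: HA daemon_heartbeat tmm fails action is go offline down links and restart.",
    "Mar  4 21:30:08 slot1/bigip notice sod[6191]: 01140029:5: HA daemon_heartbeat tmm1 fails action is go offline down links and restart.",
]
print(failed_tmms(SAMPLE))  # ['tmm', 'tmm1']
```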

Eduardo_de_Oliv
Nimbostratus
Aug 13, 2015
I have a 4200 pair in active-standby and had the same problem a few times after upgrading from 11.4 to 11.5. After moving to 11.6.0 HF4 the problem didn't appear again until today, and I'm still on 11.6.0 HF4... I'm really trying to understand what happened this time, but I know one thing: I'm really fed up with this...
AdrienR_219328
Nimbostratus
Aug 27, 2015
Hello, same here.

I have a BIG-IP 2000 pair in active-standby running the latest 11.6.0 HF5, and I have been running into the same issue for several days. Mostly in the afternoon, presumably when more users are connected to it (around 50), it logs the following:

ltm:Aug 27 15:06:00 ssl-vpn-p notice sod[8619]: 01140045:5: HA reports tmm NOT ready.
ltm:Aug 27 15:18:02 ssl-vpn-p notice sod[8619]: 01140045:5: HA reports tmm NOT ready.
ltm:Aug 27 15:30:32 ssl-vpn-p notice sod[8619]: 01140045:5: HA reports tmm NOT ready.
ltm:Aug 27 15:41:54 ssl-vpn-p notice sod[8619]: 01140045:5: HA reports tmm NOT ready.
ltm:Aug 27 15:53:23 ssl-vpn-p notice sod[8619]: 01140045:5: HA reports tmm NOT ready.

Seems like a regression of this fix: https://support.f5.com/kb/en-us/solutions/public/15000/700/sol15713.html

I will try disabling TCP Segmentation Offload, but I don't like that... F5, please help!
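
The five "HA reports tmm NOT ready." timestamps in the post above look fairly evenly spaced, and a quick calculation makes that visible. This is just arithmetic on the times quoted in the post. `gaps_in_seconds` is a hypothetical helper name, and a regular interval on its own says nothing about the cause.

```python
from datetime import datetime

# Times of the "HA reports tmm NOT ready." lines quoted in the post (Aug 27).
STAMPS = ["15:06:00", "15:18:02", "15:30:32", "15:41:54", "15:53:23"]

def gaps_in_seconds(stamps):
    """Return the gap in seconds between each consecutive pair of HH:MM:SS times."""
    times = [datetime.strptime(s, "%H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

gaps = gaps_in_seconds(STAMPS)
print(gaps)                   # [722, 750, 682, 689]
print(sum(gaps) / len(gaps))  # 710.75
```

So the event recurs about every 12 minutes in this sample, which may be worth mentioning when opening a support case.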