Forum Discussion
BIG-IP 11.3.0 --> 11.6.0 Upgrade Failover Woes
Hello all,
During an upgrade from 11.3.0 to 11.6.0, I experienced issues when failing over to the standby device in order to upgrade the active one. The devices are in an active-standby configuration, but are not configured in an HA group.
When forcing the traffic group on the active device to standby, the following was initially seen in the ltm log:
Oct 18 21:39:52 LB-F5-1 notice sod[5994]: 010c0044:5: Command: go standby /Common/traffic-group-1 GUI.
Oct 18 21:39:52 LB-F5-1 notice sod[5994]: 010c0052:5: Standby for traffic group /Common/traffic-group-1.
Oct 18 21:39:52 LB-F5-1 err sod[5994]: 010c0035:3: Function run_external failed to fork at call to failover scripts.
Oct 18 21:39:52 LB-F5-1 notice sod[5994]: 010c0018:5: Standby
Oct 18 21:39:52 LB-F5-1 info tmm[25247]: 01190004:6: Resuming log processing at this invocation; held 32 messages.
Oct 18 21:39:52 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$
Oct 18 21:39:52 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$
Oct 18 21:39:52 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$
Oct 18 21:39:52 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$
Oct 18 21:39:53 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$
Oct 18 21:39:53 LB-F5-1 info tmm[25247]: 01190004:6: Per-invocation log rate exceeded; throttling.
Oct 18 21:39:54 LB-F5-1 notice tmm1[25248]: 01010029:5: Clock advanced by 124 ticks
Oct 18 21:39:54 LB-F5-1 notice tmm[25247]: 01010029:5: Clock advanced by 136 ticks
Oct 18 21:40:13 LB-F5-1 notice logger: /usr/bin/tmipsecd --tmmcount 4 ==> /usr/bin/bigstart stop racoon
This resulted in a situation where both devices flipped back and forth from active to standby.
What I'm most intrigued about here is the following portion of the above snippet:
Oct 18 21:39:52 LB-F5-1 err sod[5994]: 010c0035:3: Function run_external failed to fork at call to failover scripts.
Can anyone shed some light on this error, and what its implications might be?
Thanks! Chris
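For readers who want to pull the same events out of a longer ltm log, here is a minimal Python sketch. It is not from the original post. It assumes every line of interest follows the layout seen in the excerpt above (timestamp, host, severity, daemon[pid], an eight-digit hex message code, a level digit, then the message). The names `LTM_LINE`, `parse_ltm_line`, and `summarize` are hypothetical helpers, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

# Layout assumed from the excerpt in the post:
# "<Mon DD HH:MM:SS> <host> <severity> <daemon>[<pid>]: <8-hex code>:<level>: <message>"
LTM_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<severity>\w+) "
    r"(?P<daemon>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"(?P<code>[0-9a-fA-F]{8}):(?P<level>\d): "
    r"(?P<message>.*)$"
)


def parse_ltm_line(line):
    """Return a dict of fields, or None if the line does not match the assumed layout."""
    match = LTM_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count messages per (daemon, code) and collect every err-severity entry."""
    counts = Counter()
    errors = []
    for line in lines:
        entry = parse_ltm_line(line)
        if entry is None:
            continue  # e.g. the "logger:" line, which has no pid or message code
        counts[(entry["daemon"], entry["code"])] += 1
        if entry["severity"] == "err":
            errors.append(entry)
    return counts, errors


# Sample lines copied from the log excerpt in the post.
sample = [
    "Oct 18 21:39:52 LB-F5-1 notice sod[5994]: 010c0052:5: Standby for traffic group /Common/traffic-group-1.",
    "Oct 18 21:39:52 LB-F5-1 err sod[5994]: 010c0035:3: Function run_external failed to fork at call to failover scripts.",
    "Oct 18 21:39:52 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$",
    "Oct 18 21:39:53 LB-F5-1 warning tmm[25247]: 01190004:4: address conflict detected for X.X.X.X (XX:XX:XX:XX:XX:XX) on vlan $$$",
    "Oct 18 21:40:13 LB-F5-1 notice logger: /usr/bin/tmipsecd --tmmcount 4 ==> /usr/bin/bigstart stop racoon",
]

counts, errors = summarize(sample)
for (daemon, code), n in counts.most_common():
    print(daemon, code, n)
for entry in errors:
    print(entry["ts"], entry["code"], entry["message"])
```

Grouping by daemon and message code makes it easier to see whether the sod error appears once per failover attempt or repeatedly, and how many address-conflict warnings got through before tmm began throttling its log output.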
11 Replies
- NikhilB
Employee
Why are they not in an HA group?
- insomniak_11745
Nimbostratus
They are not in an HA group because there are only two devices, and an HA group wouldn't fit our use case for them.
- JG
Cumulonimbus
How did you get out of the loop? "tmsh run /sys failover offline" on the old active?
- insomniak_11745
Nimbostratus
I forced the (presumably now active) LB back to standby through the web management interface.
- JG
Cumulonimbus
So you aborted the upgrade after upgrading the standby to v11.6.0? Did you have MAC masquerade configured? I recently had a similar, though not identical, experience with failover, where some services failed to move over to the other device after a fail-back operation.
- insomniak_11745
Nimbostratus
It was all a bit chaotic, to be perfectly honest. I was eventually able to fail over to the upgraded 11.6 standby unit, and I then upgraded the 11.3 unit to 11.6 successfully. After that I failed back to the primary unit and the same thing happened again, but this time I was unable to recover from it. I eventually booted the standby device back to 11.3 while it was in an active state, and everything returned to normal. I understand that my panic could very well have been part of the cause, but that log snippet is from immediately after I forced a failover. I did not have MAC masquerade configured.
- JG
Cumulonimbus
Understandable, as nothing is more disturbing, or shortens my life more, than seeing a loop during an upgrade! How was your HA configured? With a dedicated VLAN? With a multicast address? You could have put the standby offline and troubleshot the HA.
- insomniak_11745
Nimbostratus
> You could have put the standby offline and troubleshot the HA.
Yes, this makes worlds of sense now. Again, panic. HA is essentially not configured: the two devices are not in a configured HA group. Before I failed over the active unit, I unchecked the "Auto Failback" option in the traffic group. Pardon me if I'm misinterpreting your question.
- JG
Cumulonimbus
You don't have to configure an HA group, but you must have HA configured (for example, failover and configsync addresses) in order to have an active/standby setup. See https://support.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/tmos-implementations-11-2-0/2.html for details.
- insomniak_11745
Nimbostratus
Ah, yes, sorry. Device failover is *not* configured with a multicast address. It is configured on a dedicated VLAN. The Primary Local Mirror address is on the same VLAN, and the Secondary Local Mirror address is a separate VLAN.