Forum Discussion
High Availability HA Group not failing over
We have two F5 BIG-IP 3600s paired in a high availability configuration. They are connected via serial port, and we have two trunks set up with 2 members each. I have set up the HA Group on each F5 so that each trunk has a weight of 10, with an Active Bonus of 5. My theory is that the pair will keep functioning if any one trunk member goes down, but that it should fail over if at least 2 trunk members go down (threshold is set to 1).
We are not getting a failover when one member of each trunk is down. I can see the scoring on each unit: the active server has a score of 15 and the standby server has a score of 20, yet no failover happens.
Active:
root@LBNCRLTM01(Active)(tmos.sys) show ha-group TEST detail
Sys::HA Group: TEST
---------------------
State enabled
Active Bonus 5
Score 15
Sys::HA Group Trunk: TEST:LTM01-LACP
------------------------------------
Threshold 1
Percent Up 50
Weight 10
Score Contribution 5
Sys::HA Group Trunk: TEST:LTM01-LACP-SRV
----------------------------------------
Threshold 1
Percent Up 50
Weight 10
Score Contribution 5
Standby:
root@LBNCRLTM02(Standby)(tmos.sys) show ha-group TEST detail
Sys::HA Group: TEST
---------------------
State enabled
Active Bonus 5
Score 20
Sys::HA Group Trunk: TEST:LTM02-LACP
------------------------------------
Threshold 1
Percent Up 100
Weight 10
Score Contribution 10
Sys::HA Group Trunk: TEST:SRV-LACP
----------------------------------
Threshold 1
Percent Up 100
Weight 10
Score Contribution 10
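The scores in the two outputs above are consistent with a simple calculation: each trunk contributes its weight scaled by Percent Up, and the Active Bonus is added only on the unit that is currently active. The sketch below reproduces the numbers under that assumption. It is inferred from the output shown here, not taken from F5 documentation, and the function names are hypothetical.

```python
# Minimal sketch of the scoring arithmetic implied by the output above.
# Assumption: contribution = weight * percent_up / 100, and the active
# bonus counts only on the active unit. Function names are hypothetical.

def trunk_contribution(weight, percent_up):
    """Score contributed by one trunk, given its weight and percent up."""
    return weight * percent_up // 100


def ha_score(trunks, active_bonus, is_active):
    """Sum the trunk contributions and add the bonus if the unit is active."""
    score = sum(trunk_contribution(weight, pct) for weight, pct in trunks)
    if is_active:
        score += active_bonus
    return score


# (weight, percent_up) pairs taken from the two "show ha-group" outputs.
active_score = ha_score([(10, 50), (10, 50)], active_bonus=5, is_active=True)
standby_score = ha_score([(10, 100), (10, 100)], active_bonus=5, is_active=False)

print("active:", active_score)
print("standby:", standby_score)
```

With these inputs the active unit comes out at 15 and the standby at 20, matching the Score lines reported by each box.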
Here are the setups on each F5:
Active:
root@LBNCRLTM01(Active)(tmos.sys) list ha-group
sys ha-group TEST {
    active-bonus 5
    trunks {
        LTM01-LACP {
            percent-up 50
            threshold 1
            weight 10
        }
        LTM01-LACP-SRV {
            percent-up 50
            threshold 1
            weight 10
        }
    }
}
Standby:
root@LBNCRLTM02(Standby)(tmos.sys) list ha-group
sys ha-group TEST {
    active-bonus 5
    trunks {
        LTM02-LACP {
            percent-up 100
            threshold 1
            weight 10
        }
        SRV-LACP {
            percent-up 100
            threshold 1
            weight 10
        }
    }
}
Any ideas as to why this isn't failing over?
Thanks,
Jason
6 Replies
- Chris_Miller
Altostratus
If I'm reading the documentation correctly, I would expect the threshold setting to be unique to each trunk. If you're set at 1, and each trunk still has 1 member left, I wouldn't expect a fail-over to happen. Also, the "Weight" setting seems interesting. According to documentation, "The sum of the weights in the HA group must equal 100." I'm going to read a bit more.
- Chris_Miller
Altostratus
I just read this:
"A health score is based on the number of members that are currently available for any trunks, pools, and clusters in the HA group, combined with a weight that you assign to each trunk, pool, and cluster. The unit that has the best overall score at any given time becomes or remains the active unit."
In your case, the standby has a higher score, so I'd expect a fail-over to happen. The only thing that comes to mind is that, because the weights don't add up to 100, the scores might be ignored. It might be worth a support case, but you could test it by simply adjusting the weights from 10/10 to 50/50.
- neo1674_66454
Nimbostratus
Hello guys,
I'm experiencing a similar issue. We have 4 LBs available in a LAB environment. Two of them are running an HA config (active/passive) with network failover only. The other two are running an HA config (active/passive) with serial cable failover. I have configured the HA group feature on both pairs in order to trigger a failover when trunk interfaces become unavailable...
The HA group feature is working fine on the network failover pair, but it doesn't work at all on the serial failover pair. I must say I'm surprised, because I was going through the F5 docs on HA groups and AFAIK there is no mention that network failover is needed for HA groups to work properly...
Any comments/experience with that?
Thanx!
- Chris_Miller
Altostratus
neo - can you create a support ticket and also paste in your config here? I don't use network failover anywhere but would certainly use HA Groups so I don't expect this to be a requirement.
- neo1674_66454
Nimbostratus
I'm currently working with an F5 engineer to try to isolate the root cause of this behavior. I have done some tests in a LAB environment, and I think the HA groups feature requires network failover to be in place, and possibly the serial failover cable to be removed. If that's the case, then we could argue that the documentation needs to be reviewed and updated to clearly state this.
I'll keep you posted once the case is closed.
- IanB
Employee
Just to add a very late reply here: HA groups absolutely require network failover to be enabled. This is what causes the BIG-IP to send UDP failover packets containing the current state of the box. Serial failover is very simplistic and carries no state information; it only indicates that the other box is powered on and active.
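To illustrate why the standby in the original post sat at a score of 20 without taking over, here is a toy model of the point above. It assumes the standby can compare scores only when it actually receives the peer's score, which serial failover never supplies. This is a simplified sketch for illustration, not F5's actual failover logic, and the function and parameter names are hypothetical.

```python
from typing import Optional

# Toy model only (hypothetical names, not F5's implementation).
# peer_score is None when the peer's state is not being received,
# e.g. when only the serial failover cable is in use.

def standby_should_take_over(local_score: int,
                             peer_alive: bool,
                             peer_score: Optional[int]) -> bool:
    """Decide whether the standby unit takes over from the active peer."""
    if not peer_alive:
        # The peer is gone entirely, so take over regardless of score.
        return True
    if peer_score is None:
        # The peer is alive but its score is unknown: nothing to compare.
        return False
    # Peer state is known, so the unit with the better score wins.
    return local_score > peer_score


# Serial only: the peer is alive but no score is exchanged.
serial_only = standby_should_take_over(20, peer_alive=True, peer_score=None)

# Network failover: the peer's score of 15 is received.
with_network = standby_should_take_over(20, peer_alive=True, peer_score=15)

print("serial only:", serial_only)
print("network failover:", with_network)
```

In this model the serial-only case never fails over on score, while the same scores with network failover enabled do trigger a takeover.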
