Forum Discussion
Zhinjio_101470
Nimbostratus
Sep 22, 2011
Active/Passive NODE question
Hey folks,
Having a friendly argument with a coworker about the best way to implement a particular failover scenario.
General Behavior Definition:
We want one node to be receiving all traffic until it fails.
Failover to the secondary node should be automated and quick.
Once a node is "promoted" to active, it should remain that way, even if the old node comes back into service.
A failed node returning to service (monitoring succeeds) should be a MANUAL step.
We want to minimize the work required to return a failed node into service again (but it should not be completely automated).
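The requirements above can be read as a small state machine. The following is only a toy model of that behavior in Python, not an F5 feature or API; the class name `StickyPair`, its methods, and the node names `db1`/`db2` are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StickyPair:
    """Toy model of the desired behavior; not an F5 feature or API."""
    active: str
    standby: str
    healthy: dict = field(default_factory=dict)     # node -> last monitor result
    in_service: dict = field(default_factory=dict)  # node -> operator has enabled it

    def __post_init__(self):
        for node in (self.active, self.standby):
            self.healthy.setdefault(node, True)
            self.in_service.setdefault(node, True)

    def usable(self, node):
        return self.healthy[node] and self.in_service[node]

    def _promote_if_needed(self):
        # Swap roles only when the active node is unusable and the standby is usable.
        if not self.usable(self.active) and self.usable(self.standby):
            self.active, self.standby = self.standby, self.active

    def monitor_result(self, node, ok):
        self.healthy[node] = ok
        if not ok:
            self.in_service[node] = False  # a failure always requires a manual return
        self._promote_if_needed()

    def manual_enable(self, node):
        if not self.healthy[node]:
            raise ValueError(f"{node} is still failing its monitor")
        self.in_service[node] = True  # rejoins as standby; no automatic failback
        self._promote_if_needed()

    def target(self):
        return self.active if self.usable(self.active) else None


pair = StickyPair(active="db1", standby="db2")
pair.monitor_result("db1", ok=False)   # db1 fails, db2 is promoted
pair.monitor_result("db1", ok=True)    # db1 recovers but stays out of service
pair.manual_enable("db1")              # operator returns db1 as the standby
print(pair.target())                   # db2 keeps the traffic
```

The point of the model is that "healthy" and "in service" are tracked separately, and that returning a node to service never changes which node is active.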
We currently have two different methods of implementing this, but it seems to me there should be a better way than either of them:
-------------------
Method 1:
Round Robin, Priority Pool (nodes set to, say ... 3 and 5) (Less than 1)
Monitor set "Manual Resume" to "YES"
No persistence profile
When active node fails, secondary node is promoted due to load balancing method. Recovered node will NOT return to service until manually checked back to "Enabled". At that time, I would also adjust its Priority to be LOWER than the active node so it will not receive traffic until the newly promoted node fails. Enable the node. All is well.
------------------
Method 2:
Round Robin, Priority Pool (nodes set to 3 and 5) (Less than 1)
Monitor set "Manual Resume" to "YES"
Persistence profile set to "Dest.Addr Affinity" and Timeout set to "Indefinite"
Similar to above, active node fails, secondary node is promoted. Recovered node will not return to service until manually checked back to Enabled. UNLIKE the above method though, it will not receive traffic due to the persistence profile and timeout setting.
--------------------
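To make the difference between the two methods concrete, here is a rough Python model of the selection logic. It is a simplification under stated assumptions, not how the BIG-IP actually implements it: `active_members` approximates priority group activation with "Less than 1", `pick` treats a persistence record as a simple override, and the pool layout and node names are hypothetical.

```python
def active_members(members, min_active=1):
    """Approximate priority group activation ("Less than min_active").

    Walk the priority groups from highest to lowest, adding available
    members until at least min_active of them are active.
    """
    chosen = []
    for prio in sorted({m["priority"] for m in members}, reverse=True):
        chosen += [m["name"] for m in members
                   if m["priority"] == prio and m["available"]]
        if len(chosen) >= min_active:
            break
    return chosen


def pick(members, persisted=None):
    """Honor a persistence record if its member is available, else use priority."""
    available = {m["name"] for m in members if m["available"]}
    if persisted in available:
        return persisted
    chosen = active_members(members)
    return chosen[0] if chosen else None


def pool(db1_priority, db1_available):
    return [
        {"name": "db1", "priority": db1_priority, "available": db1_available},
        {"name": "db2", "priority": 3, "available": True},
    ]


# db1 (priority 5) has failed and is held down by Manual Resume.
failed_over = pick(pool(5, False))

# Method 1: re-enabling db1 at its old priority flips traffic back,
# which is why its priority is lowered first.
method1_same_priority = pick(pool(5, True))
method1_lowered = pick(pool(1, True))

# Method 2: the persistence record keeps traffic on db2 ...
method2_with_record = pick(pool(5, True), persisted="db2")
# ... but only while the record exists.
method2_record_lost = pick(pool(5, True), persisted=None)

print(failed_over, method1_same_priority, method1_lowered,
      method2_with_record, method2_record_lost)
```

In this model Method 1 depends on the operator lowering the priority before re-enabling, while Method 2 depends on the persistence record surviving.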
So it seems to me that with the second method, you are setting the priority pool settings, but not really "taking advantage" of them. The only purpose they really serve is to ensure that only one node is getting traffic, and then the persistence profile overrides them anyway. The one advantage of the second method is that I don't have to muck with priority values when I bring a node back into service.
Either way, both of these methods seem like "hacks" that just cover for the fact that there is no "real" active/passive node implementation that meets our requirements.
Y'all are smart. Are we missing some feature or method for doing this that would still cover our requirements?
Thanks in advance,
- ZJ
5 Replies
- The_Bhattman
Nimbostratus
Hi ZJ,
Here is a method that I used in the past: http://devcentral.f5.com/wiki/iRules.SingleNodePersistence.ashx
You can alter it so that you can toggle persistence from one node to the other via an HTTP request.
That is the beauty of the F5: you can customize active/passive methods without relying on a vendor's version of what a "real" active/passive configuration is.
I hope this helps
Bhattman
- Zhinjio_101470
Nimbostratus
That is a great solution. However... you noted:
Doesn't offer the capability of manual resume after failure, or true designation of a "primary" and "secondary" instance (sometimes required for db applications)...
That is, in fact, the situation we're talking about, even though I hadn't mentioned it above. When a node fails, a DBA will check the failed node, verify its health, ensure there are no collision transactions, turn on replication again, and verify everything is healthy. We want to then "enable" the node again, but not take traffic. Ideally, the fewer times we "flip flop" the better.
The real question here is of solution "elegance", I guess.
- The_Bhattman
Nimbostratus
Hi Zhinjio,
In that case, "elegance" would mean that you want built-in functionality rather than a customized one.
You can still use the iRules version together with another script (iControl method) that the DBA can run to fail the persistence over to the other node when they are ready.
Bhattman
- hoolio
Cirrostratus
You could try using Manual Resume on the monitor. From the online help:
Specifies whether the system automatically changes the status of a resource to Enabled at the next successful monitor check. If you set this option to Yes, you must manually re-enable the resource before the system can use it for load balancing connections. The default is No.
* Yes: Specifies that you must manually re-enable the resource after an unsuccessful monitor check.
* No: Specifies that the system automatically changes the status of a resource to Enabled at the next successful monitor check.
Another option would be to use a db monitor to make a SQL query to a table that only returns a successful response if that SQL node is active.
Aaron
- Zhinjio_101470
Nimbostratus
Posted By hoolio on 09/27/2011 10:47 AM
You could try using Manual Resume on the monitor. From the online help:
Specifies whether the system automatically changes the status of a resource to Enabled at the next successful monitor check. If you set this option to Yes, you must manually re-enable the resource before the system can use it for load balancing connections. The default is No.
* Yes: Specifies that you must manually re-enable the resource after an unsuccessful monitor check.
* No: Specifies that the system automatically changes the status of a resource to Enabled at the next successful monitor check.
Another option would be to use a db monitor to make a SQL query to a table that only returns a successful response if that SQL node is active.
Aaron
Exactly. We are doing both of those things. Specifically, the monitor is "External". It ultimately runs a SQL query against the database, the result of which determines whether the monitor prints "UP" (it's alive) or prints nothing (it's not).
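As a rough illustration of that kind of check, here is a minimal Python sketch. It is not the actual monitor script: the function names are invented, `example_query` is a hypothetical stand-in for the real SQL query, and passing the host and port as the first two command-line arguments is an assumption about how the script is invoked.

```python
import sys


def run_check(query_is_active, host, port):
    """Return "UP" if the node reports itself active, otherwise an empty string."""
    try:
        return "UP" if query_is_active(host, port) else ""
    except Exception:
        return ""  # treat any error as "not alive"


def example_query(host, port):
    # Hypothetical stand-in for the real SQL query against the database.
    return (host, port) == ("10.0.0.5", "3306")


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Write "UP" or nothing at all, with no trailing newline.
    sys.stdout.write(run_check(example_query, sys.argv[1], sys.argv[2]))
```

Keeping the query behind a function argument makes the "UP or nothing" decision easy to test without a database.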
Recovery is manual on the failed node. The question is in how you bring the failed node back into service.
Method 1 requires rejiggering node priorities when you re-enable the node to make sure it does not start taking traffic.
Method 2 requires just re-enabling the node, but it also ignores node priorities, and it will swap nodes again if you restart the F5.
I guess it's sort of a moot point since there are manual steps in either case. It just seems that both options are "inelegant", and I was looking for some other solution that we might not have considered.
I do appreciate the conversation, though. It at least validates that we didn't miss something obvious.
- ZJ
