Forum Discussion
Florin_Andrei_1
Nimbostratus
Jan 21, 2010
GTM, active/passive and split-brain
The situation:
A pair of redundant sites, functionally equivalent (hardware-wise too, almost identical), geographically separated.
A private DS3 line in between, used to synchronize data from one site to another.
Site architecture is pretty typical: GTM for site-to-site switching, firewalls, pair of LTMs for local load balancing, webservers, other systems, storage.
The two sites are active/passive (they cannot both be active at the same time), mostly due to the database architecture. The active site synchronizes its fresh data to the backup site in near-real-time over the private line.
Each GTM monitors its local site, and a site switch decision is made if the current active site has a problem (let's say, the storage went offline for some reason).
The problem:
Typical for an active/passive design, a split-brain situation can be pretty bad. We don't want both sites to become active at the same time. It's better to leave that decision to a human operator (the site's reliability is only truly critical a few hours each week, during which time it is very closely monitored).
So, basically, we want this behavior:
- flip the active and backup roles if both GTMs can see each other and the current active site detects a local fault
- preserve the current state and wait for the human operator to make a decision if the connection between the two GTM units is lost
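To make the two rules above concrete, here is a minimal sketch of the decision table in Python. It is not GTM configuration, and the names (`Action`, `decide`, `peer_reachable`, `active_site_healthy`) are hypothetical, chosen only for illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP = "keep current roles"
    FAILOVER = "swap active and backup roles"
    HOLD_FOR_OPERATOR = "freeze state and wait for a human"


def decide(peer_reachable: bool, active_site_healthy: bool) -> Action:
    """Hypothetical decision table for the behavior described above."""
    # Rule 2: if the GTMs cannot see each other, never switch automatically,
    # whatever the local health check says.
    if not peer_reachable:
        return Action.HOLD_FOR_OPERATOR
    # Rule 1: both GTMs see each other and the active site reports a fault.
    if not active_site_healthy:
        return Action.FAILOVER
    # Nothing is wrong: leave the roles alone.
    return Action.KEEP
```

The link check comes first on purpose: a lost inter-site connection should override any local fault, since that is exactly the case where an automatic switch risks split-brain.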
I'm pretty new to the GTM so I'm wondering, is this doable at all?
2 Replies
- Robert_184589
Nimbostratus
Did you ever get an answer to this question? I'm facing the same thing.
- Max_Q_factor
Cirrocumulus
I will give you my thoughts on this for each behavior you are asking about:
1. Preventing split brain is a bit difficult to enforce and you have to be willing to get creative, but ultimately you could do things like add an additional TCP monitor to the pools (either GTM or LTM) that references the IP address and the 443 management port of the GTMs directly (Alias Address and Alias Service Port respectively). Also look at the minimum number of monitors that must be up, as well as the reverse option on the monitor.
2. Preserving the current state is pretty easy when using the "manual resume" option in the monitor. Once the monitor is tripped, the pool member stays down until you manually re-enable it. I see manual resume used quite a bit on database virtual servers.
As a bonus: if you have some sort of witness server, you might want to look at creating a monitor that queries the witness server for service availability, and/or using the appropriate database monitor to query a value in a table that reflects the operational state of the database.
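To illustrate the "manual resume" behavior described above, here is a toy model in Python. It is only a sketch of the latch semantics, not F5 code or configuration; the class and method names (`ManualResumeMember`, `observe`, `operator_enable`) are made up for this example.

```python
class ManualResumeMember:
    """Toy model of a pool member behind a monitor with manual resume."""

    def __init__(self) -> None:
        # Set on the first failed probe; only an operator clears it.
        self.latched_down = False

    def observe(self, probe_ok: bool) -> bool:
        """Feed one probe result; return whether the member is usable."""
        if not probe_ok:
            self.latched_down = True
        # A passing probe is not enough once the latch has tripped.
        return probe_ok and not self.latched_down

    def operator_enable(self) -> None:
        """Hypothetical stand-in for the operator re-enabling the member."""
        self.latched_down = False


member = ManualResumeMember()
member.observe(True)       # healthy, member is usable
member.observe(False)      # probe fails, latch trips
member.observe(True)       # probe recovers, member still held down
member.operator_enable()   # human decision releases the latch
```

The point of the latch is that a recovering probe never brings the member back by itself, so the site cannot flip back to active without someone deciding it should.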