#devops An ecosystem-based data center approach means accepting the constancy of change…
It is an interesting fact of life for aquarists that the term “stable” does not actually mean a lack of change. On the contrary, it means that the core system is maintaining equilibrium at a constant rate. That is, the change is controlled and managed automatically either by the system itself or through the use of mechanical and chemical assistance.
Sometimes, those systems need modifications or break (usually when you’re away from home and don’t know it and couldn’t do anything about it if you did anyway but when you come back, whoa, you’re in a state of panic about it) and must be repaired or replaced and then reinserted into the system. The removal and subsequent replacement introduces more change as the system attempts to realign itself to the temporary measures put into place and then again when the permanent solution is again reintroduced.
A recent automatic top-off system failure reminded me of this valuable lesson as I tried to compensate for the loss while waiting for a replacement. This 150 gallon tank is its own ecosystem and it tried to compensate itself for the fluctuations in salinity (salt-to-water ratio) caused by a less-than-perfect stop-gap measure I used while waiting a more permanent solution. As I was checking things out after the replacement pump had been put in place, it occurred to me that the data center is in a similar position as an ecosystem constantly in flux and the need for devops to be able to automate as much as possible in a repeatable fashion as a means to avoid incurring operational risk.
The reason my temporary, stop-gap measure was less than perfect was that the pump I used to simulate the same auto-top off process was not the same as the one used by the failed pump. The two systems were operationally incompatible. One monitored the water level and automatically pumped fresh water into the tank as a means to keep the water level stable while the other required an interval based cycle that pumped fresh water for a specified period of time and then shut off. To correctly configure it meant determining the actual flow rate (as opposed to the stated maximum flow rate) and doing some math to figure out how much water was actually lost on daily basis (which is variable) and how long to run the pumps to replace that loss over a 24 hour period.
Needless to say I did not get this right and it had unintended consequences. Because the water level increased too far it caused a siphon break to fail which resulted in even more water being pumped into the system, effectively driving it close to hypo-salinity (not enough salt in the water) and threatening the existence of those creatures sensitive to salinity levels (many corals and some invertebrates are particularly sensitive to fluctuations in salinity, among other variables).
The end result? By not nailing down the process I’d opened a potential hole through which the stability of the ecosystem could be compromised. Luckily, I discovered it quickly because I monitor the system on a daily basis, but if I’d been away, well, disaster may have greeted me on return.
The process in this tale of near-disaster was key; it was the poor automation of (what should be) a simple process.
This is not peculiar to the ecosystem of an aquarium, a fact of which Tufin Technologies recently reminded us when it published the results of a survey focused on change management. The survey found that organizations are acutely aware of the impact of poorly implemented processes and the (often negative) impact of manual processes in the realm of security:
66% of the sample felt their change management processes do or could place the organization at risk of a breach. The main reasons cited were lack of formal processes (56%), followed by manual processes with too many steps or people in the process (29%).
The Tufin survey focused on security change management (it is a security focused organization, so no surprise there) but as security, performance, and availability are intimately related it seems logical to extrapolate that similar results might be exposed if we were to survey folks with respect to whether or not their change management processes might incur some form of operational risk.
One of the goals of devops is to enable successful and repeatable application deployments through automation of the operational processes associated with a deployment. That means provisioning of the appropriate security, performance, and availability services and policies required to support the delivery of the application. Change management processes are a part of the deployment process – or if they aren’t, they should be to ensure success and avoid the risks associated with lack of formal processes or too many cooks in the kitchen with highly complex manually followed recipes. Automation of configuration and policy-related tasks as well as orchestration of accepted processes is critical to maintaining a healthy data center ecosystem in the face of application updates, changes in security and access policies, as well as adjustments necessary to combat attacks as well as legitimate sudden spikes in demand.
More focus on services and policy as a means to not only deploy but maintain application deployments is necessary to enable IT to continue transforming from its traditional static, manual environment to a dynamic and more fluid ecosystem able to adapt to the natural fluctuations that occur in any ecosystem, including that of the data center.