To Err is Human

#devops Automating incomplete or ineffective processes will only enable you to make mistakes faster – and more often.

Most folks probably remember the play on "to err is human…" proverb when computers first began to take over, well, everything.

The saying was only partially tongue-in-cheek, because as we've long since learned the reality is that computers allow us to make mistakes faster and more often and with greater reach.

One of the statistics used to justify a devops initiative is the rate at which human error contributes to a variety of operational badness: downtime, performance, and deployment life-cycle time. 

Human error is a non-trivial cause of downtime and other operational interruptions. A recent Paragon Software survey found that human error was cited as a cause of downtime by 13.2% of respondents. Other surveys have indicated rates much higher. Gartner analysts Ronni J. Colville and George Spafford in "Configuration Management for Virtual and Cloud Infrastructures" predict as much as 80% of outages through 2015 impacting mission-critical services will be caused by "people and process" issues.

Regardless of the actual rates at which human error causes downtime or other operational disruptions, reality is that it is a factor. One of the ways in which we hope to remediate the problem is through automation and devops.

While certainly an appropriate course of action, adopters need to exercise caution when embarking on such an initiative, lest they codify incomplete or inefficient processes that simply promulgate errors faster and more often.

DISCOVER, REMEDIATE, REFINE, DEPLOY

Something that all too often seems to be falling by the wayside is the relationship between agile development and agile operations. Agile isn't just about fast(er) development cycles, it's about employing a rapid, iterative process to the development cycle. Similarly, operations must remember that it is unlikely they will "get it right" the first time and, following agile methodology, are not expected to. Process iteration assists in discovering errors, missing steps, and other potential sources of misconfiguration that are ultimately the source of outages or operational disruption.

An organization that has experienced outages due to human error are practically assured that they will codify those errors into automation frameworks if they do not take the time to iteratively execute on those processes to find out where errors or missing steps may lie.

It is process that drives continuous delivery in development and process that must drive continuous delivery in devops. Process that must be perfected first through practice, through the application of iterative models of development on devops automation and orchestration.

What may appear as a tedious repetition is also an opportunity to refine the process. To discover and eliminate inefficiencies that streamline the deployment process and enable faster time to market. Inefficiencies that are generally only discovered when someone takes the time to clearly document all steps in the process – from beginning (build) to end (production). Cross-functional responsibilities are often the source of such inefficiencies, because of the overlap between development, operations, and administration.


The outage of Microsoft’s cloud service for some customers in Western Europe on 26 July happened because the company’s engineers had expanded capacity of one compute cluster but forgot to make all the necessary configuration adjustments in the network infrastructure.

-- Microsoft: error during capacity expansion led to Azure cloud outage

Applying an agile methodology to the process of defining and refining devops processes around continuous delivery automation enables discovery of the errors and missing steps and duplicated tasks that bog down or disrupt the entire chain of deployment tasks.

We all know that automation is a boon for operations, particularly in organizations employing virtualization and cloud computing to enable elasticity and improved provisioning. But what we need to remember is that if that automation simply encodes poor processes or errors, then automation just enables to make mistakes a whole lot faster.

Take care to pay attention to process and to test early, test often.


 

 

Published Oct 24, 2012
Version 1.0

Was this article helpful?

No CommentsBe the first to comment