DevOps and Infrastructure 2.0 are really about scaling the last bottleneck in operations: people. But the corollary is also true: don’t think you can depend solely on machines.
One of the reasons it’s so easy for folks to fall into the “Trough of Disillusionment” regarding virtualization and cloud computing is that they sound like they’re going to magically transform operations. Get rid of all those physical servers by turning them into virtual ones and voila! All your operational bottlenecks go away, right?
Nope. What the removal of physical devices from the data center does is eliminate a lot of time (and sweat) from the deployment phase of compute resources. There’s no more searching the rack for a place to shove a server, no more physically plugging cables into this switch or that, and no more operating system installation and the subsequent configuration that generally goes into the deployment of hardware.
What it doesn’t remove is the need for systems administrators and operators to manage the applications deployed on that server – physical or virtual. Sure, you got rid of X number of physical pieces of hardware, but you’ve still got the same number of applications (or more) that must be managed. Operations is still bogged down with the same burdens it always has had, and to make matters worse, virtualization is piling on yet another “virtual” stack of configurations that must be managed: virtual NICs, virtual switches, virtual platforms.
Over in finance and up in the corner offices, the operations budget is not necessarily growing and neither is headcount. There are only so many red pills to go around, after all, so ops will have to make do with the people and budgets they have. Which is clearly not enough. Virtualization adds complexity, which increases management costs primarily because we rely on people – on manpower – to perform a plethora of mundane operational tasks. Even when it’s recognized that this is a labor- (and thus time- and cost-) intensive process and the ops team puts together scripts, the scripts themselves must often be initiated by a human being.
Consider the process involved in scaling an application dynamically. Let’s assume it’s a typical three-tier architecture with web servers (tier 1) that communicate with application servers (tier 2) that communicate with a single, shared database (tier 3). The first tier that likely needs to scale is the web tier. So an instance is launched, which is immediately assigned an IP address – as are all its virtual interfaces. If addresses are hardwired in the image, they may need to be manually adjusted. Once the network configuration is complete, that instance needs to know how to communicate with the application server tier; if the location of that tier is not hardwired into the web server, the configuration must be changed and reloaded. Finally, the instance needs to be added to the pool of web servers on the load balancer. A fairly simple set of steps, true, but each of these steps takes time.
This process takes time. Not hours, but minutes, and these steps often require manual processing. And it’s worse in the second tier (application servers) unless that tier has itself been segmented and virtualized. Every human interaction with a network, application delivery network, or application infrastructure component introduces the possibility of an error, which increases the risk of downtime through erroneous configuration.
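The scale-out steps above can be sketched as a single automatable sequence. This is a minimal sketch using hypothetical stand-in functions (`launch_instance`, `configure_backend`, `add_to_pool`); a real implementation would call out to a cloud platform SDK and a load balancer API, which will look quite different.

```python
# Sketch: codifying the manual web-tier scale-out steps described above.
# Every function here is a hypothetical stand-in for a real orchestration API.

def launch_instance(image: str) -> dict:
    """Launch a VM from an image; the platform assigns its IP address."""
    return {"image": image, "ip": "10.0.1.42"}  # IP assigned at launch time

def configure_backend(instance: dict, app_tier_address: str) -> None:
    """Tell the new web server where the application server tier lives,
    rather than relying on an address hardwired into the image."""
    instance["app_tier"] = app_tier_address

def add_to_pool(pool: list, instance: dict) -> None:
    """Register the instance with the load balancer's web-server pool
    so it starts receiving traffic."""
    pool.append(instance["ip"])

def scale_out_web_tier(pool: list) -> dict:
    """The whole sequence a human would otherwise perform by hand."""
    instance = launch_instance("web-server-image")
    configure_backend(instance, "app-tier.internal:8080")
    add_to_pool(pool, instance)
    return instance

pool = []                        # the load balancer's pool of web servers
web = scale_out_web_tier(pool)   # new instance is launched, wired up, pooled
```

The point is not the particular calls but that the sequence is deterministic: once codified, it runs the same way every time, with no operator waiting to type the next command.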
The 150 minute-long outage, during which time the site was turned off completely, was the result of a single incorrect setting that produced a cascade of erroneous traffic, Facebook software engineering director Robert Johnson said in a posting to the site. [emphasis added]
“Facebook outage due to internal errors, says company” ZDNet UK (September 26, 2010)
This process takes time, it costs money, and it’s tedious and mundane. Most operations teams would rather be doing something else, I assure you, than manually configuring virtual instances as they are launched and decommissioned. And let’s face it, what organization has the human resources to dedicate to just handling these processes in a highly dynamic environment? Not many.
Agent Smith had a huge advantage over Neo in the movie The Matrix. Not just because he was technically a machine, but because he could “clone” himself using people in the Matrix. If organizations could clone operations teams out of their customer service or business analyst departments, maybe they could afford to continue running their data centers manually.
But as much as science fiction has spurred the invention of many time-saving and awesome gadgets, it can’t instantaneously clone operations folks to handle the job. This is one case where Agent Smith was right: never send a human to do a machine’s job.
These tedious tasks can easily be handled by a “machine” – by an automation or orchestration system that controls network components via an open, standards-based dynamic control plane. Codifying these tasks is the first step down the path toward a completely automated data center and what most folks would recognize as cloud computing. By eliminating the possibility of error and executing on a much faster timetable, an integrated network can remove the bottleneck to achieving a dynamic data center: people. Leveraging Infrastructure 2.0 to shift the burden from people to technology is what ultimately gives organizations a push out of the trough of disillusionment and up the slope of enlightenment toward the plateau of productivity – and cost savings.
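What “codifying these tasks” into an orchestration system might look like, in sketch form: a control loop that reacts to utilization events by performing the same scale-out a human would, with no operator in the path. The event shape, threshold, and addressing scheme are all illustrative assumptions, not any particular product’s API.

```python
# Sketch: an orchestrator that moves the scale-out decision from a person
# to a control loop. Thresholds and addresses are illustrative assumptions.

class Orchestrator:
    def __init__(self, scale_threshold: float):
        self.scale_threshold = scale_threshold  # e.g., 0.8 = 80% utilization
        self.pool = []                          # load balancer pool (web-tier IPs)
        self._next_host = 1

    def _launch(self) -> str:
        """Stand-in for launching a web-tier instance via a cloud API."""
        ip = f"10.0.1.{self._next_host}"
        self._next_host += 1
        return ip

    def handle_metric(self, utilization: float) -> bool:
        """Called by the monitoring plane with current web-tier utilization.
        Scales out when hot; returns True if an action was taken."""
        if utilization >= self.scale_threshold:
            self.pool.append(self._launch())  # same steps, no human in the loop
            return True
        return False

orch = Orchestrator(scale_threshold=0.8)
orch.handle_metric(0.5)  # below threshold: no action taken
orch.handle_metric(0.9)  # above threshold: a new instance joins the pool
```

A human still decides the policy (the threshold); the machine merely executes it, which is exactly the division of labor the rest of this piece argues for.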
Virtualization does in fact afford opportunities to make IT operations more efficient. In fact, many a pundit has claimed that automation of operations will drastically reduce the need for operations staff in the first place. Remember, though, that the incorrect configuration setting behind Facebook’s recent outage was almost certainly changed by a human operator – automation merely enabled it to spread like wildfire. A human operator is still required to understand what buttons to push and what knobs to turn, and when. The machines aren’t smart enough to do that yet, and it is possible (likely) they never will be. What’s more, it was human operators who tracked down and resolved the issue. The machines that excel at performing tasks are not able to self-diagnose, or even recognize that there is a problem in the first place. That requires people – people with the expertise and time to interpret, evaluate, and analyze data so that it can be used to do something: fix a problem, create a product, help a customer.
Automation and devops are the means by which human operators will get that time: by shifting the burden of mundane tasks to technology, where it belongs, and leveraging the depth and breadth of human knowledge and skill to better optimize and analyze the computational systems that are the foundation of virtually every business endeavor today. If we had enough people to do all that without automation, then perhaps the belief that operations automation would enable organizations to “get rid of IT” would hold. But IT doesn’t have enough people in the first place to get everything it needs to get done, done.
If they did, it’s likely we wouldn’t be here in the first place. Necessity is, after all, the mother of invention.