The Case (For & Against) Management-Driven Scalability in Cloud Computing Environments

Examining responsibility for auto-scalability in cloud computing environments.

[ If you’re coming in late, you may want to also read previous entries on the network and the application ]

Today, the argument regarding responsibility for auto-scaling in cloud computing as well as highly virtualized environments remains mostly constrained to e-mail conversations and gatherings at espresso machines. It’s an argument that needs more industry and “technology consumer” awareness, because it’s ultimately one of the underpinnings of a dynamic data center architecture; it’s the piece of the puzzle that makes or breaks one of the highest value propositions of cloud computing and virtualization: scalability.

The question appears to be a simple one: what component is responsible not only for recognizing the need for additional capacity, but acting on that information to actually initiate the provisioning of more capacity? Neither the answer, nor the question, it turns out are as simple as appears at first glance. There are a variety of factors that need to be considered, and each of the arguments for – and against - a specific component have considerable weight.

Today we’re going to specifically examine the case for management frameworks as the primary driver of scalability in cloud computing environments.

ANSWER: MANAGEMENT FRAMEWORK

We’re using “management framework” as a catch-all for the “system” in charge of “the cloud”. In some cases this might be a commercial solution offered by popular vendors like VMware, Citrix, Microsoft, or even those included in open-source solutions like Ubuntu. It might be a custom-built solution managed by a provider, like that of Amazon, Rackspace, BlueLock, and other cloud computing providers. These systems generally allow end-user (IT) control via APIs or web-based management consoles, and allow in varying degrees the provisioning and management of virtual instances and infrastructure services within the environment.

This management capability implies, of course, control over the provisioning of resources – compute, network, and storage – as well as any services required, such as load balancing services required to enable scalability. Obviously this means the management framework has the ability to initiate a scaling event because it has control over the required systems and components. The only problem with this approach is one we’ve heard before – integration fatigue. Because the network and server infrastructure often present management interfaces as open, but unique APIs, the management framework must be enabled through integration to control them. This is less of a problem for server infrastructure, where there are few hypervisor platforms that require such integration. It is more of a challenge in the infrastructure arena where there are many, many options for load balancing services.

But let’s assume for the sake of this argument that the network infrastructure has a common API and model and integration is not a problem. The question remains, does the management framework recognize the conditions under which a scaling event should be initiated, i.e. does it have the pertinent information required to make that decision? Does it have the visibility necessary?

In general, the answer is no, it does not. Most “cloud” management frameworks do not themselves collect the data upon which a decision made. Doing so would almost certainly require us to return to an agent-based collection model in which agents are deployed on every network and server infrastructure component as a means to feed that data back into the management system, where it is analyzed and used to determine whether a scaling event – up or down – is necessary. This is not efficient and, if you were to ask most customers, the prospect of paying for the resources consumed by such an agent may be a negative factor in deciding whether to use a particular provider or not.

The question remains, as well, as to how such a system would manage in real-time to track the appropriate thresholds, by application and by instance, to ensure a scaling event is initiated at the right time. It would need to manage each individual instance as well as the virtual application entity that exists in the load balancing service. So not only would it need to collect the data from each, but it would need to correlate and analyze on a fairly regular basis, which would require a lot of cycles in a fairly large deployment. This processing would be in addition to managing the actual provisioning process and keeping an eye on whether additional resources were available at any given moment.

It seems impractical and inefficient to expect the management framework to perform all these duties. Perhaps they can – and even do for small environments – but scaling the management framework itself would be a herculean task as the environment and demands on it grew.

To sum up, management frameworks have the capabilities to manage scaling events, but like the “application” it has no practical means of visibility into the virtual application. Assuming visibility was possible there remains processing and operational challenges that may in the long run likely impact the ability of the system to collect, correlate, and analyze data in large environments, making it impractical to lay sole responsibility on the management framework.

NEXT: Resolution to the Case (For & Against) X-Driven Scalability in Cloud Computing Environments