Examining responsibility for auto-scalability in cloud computing environments.
Today, the argument regarding responsibility for auto-scaling in cloud computing as well as highly virtualized environments remains mostly constrained to e-mail conversations and gatherings at espresso machines. It’s an argument that needs more industry and “technology consumer” awareness, because it’s ultimately one of the underpinnings of a dynamic data center architecture; it’s the piece of the puzzle that makes or breaks one of the highest value propositions of cloud computing and virtualization: scalability.
The question appears to be a simple one: what component is responsible not only for recognizing the need for additional capacity, but acting on that information to actually initiate the provisioning of more capacity? Neither the answer, nor the question, it turns out are as simple as appears at first glance. There are a variety of factors that need to be considered, and each of the arguments for – and against - a specific component have considerable weight.
Today we’re going to specifically examine the case for the application as the primary driver of scalability in cloud computing environments.
The first and most obvious answer is that the application should drive scalability. After all, the application knows when it is overwhelmed, right? Only the application has the visibility into its local environment that indicates when it is reaching capacity with respect to memory and compute and even application resource. The developer, having thoroughly load tested the application, knows the precise load in terms of connections and usage patterns that causes performance degradation and/or maximum capacity and thus it should logically fall to the application to monitor resource consumption with the goal of initiating a scaling event when necessary.
In theory, this works. In practice, however, it doesn’t. First, application instances may be scaled-up in deployment. The subsequent increase in memory and/or compute resources changing the thresholds at which the application should scale. If the application is developed with the ability to set thresholds based on an external set of values this may still be a viable solution. But once the application recognizes it is approaching a scaling event, how does it initiate that event? If it is integrated with the environment’s management framework – say the cloud API – it can do so. But to do so requires that the application be developed with an end-goal of being deployed in a specific cloud computing environment. Portability is immediately sacrificed for control in a way that locks in the organization to a specific vendor for the life of the application – or until said organization is willing to rewrite the application so that it targets a different management framework.
Some day, in the future, it may be the case that the application can simply send out an SoS of sorts – some interoperable message that indicates it is hovering near dangerous utilization levels and needs help. Such a message could then be interpreted by the infrastructure and/or management framework to automatically initiate a scaling event, thus preserving control and interoperability (and thus portability). That’s the vision of a stateless infrastructure, but one that is unlikely to be an option in the near future.
At this time, back in reality land, the application has the necessary data but is unable to practically act upon it.
But let’s assume it could, or that the developers integrated the application appropriately with the management framework such that the application could initiate a scaling event. This works for the first instance, but subsequently becomes problematic – and expensive. This is because visibility is limited to the application instance, not the application as a whole – the one which is presented to the end-user through network server virtualization enabled by load balancing and application delivery. As application instance #1 nears capacity, it initiates a scaling event which launches application instance #2. Any number of things go wrong at this point:
These are generally undesirable results; no one architects a dynamic data center with the intention of making scalability less efficient or rendering it unable to effectively meet business and operational requirements with respect to performance and availability. Both are compromised by an application-driven scalability strategy because application instances lack the visibility necessary to maintaining capacity at levels appropriate to meet demand, and have no visibility into end-user performance, only internal application and back-end integration performance. Both impact overall end-user performance, but are incomplete measures of the response time experienced by application consumers.
To sum up, the “application” has limited data and today, no control over provisioning processes. And even assuming full control over provisioning, the availability of only partial data leaves the process fraught with potential risk to operational stability.