The Secret to Doing Cloud Scalability Right

Hint: The answer lies in being aware of the entire application context and a little pre-planning

Thanks to the maturity of load balancing services and technology, dynamically scaling applications in pre-cloud and cloud computing environments is a fairly simple task. But doing it right – in a way that maintains performance while maximizing resources and minimizing costs – well, that is not so trivial a task unless you have the right tools.


Before we can explain how to do it right, we have to dig into the basics of how scalability (and more precisely auto-scalability) works and what’s required to scale not only dynamically, but efficiently.

A key characteristic of cloud computing is scalability, or more precisely the ease with which scalability can be achieved.

Scalability and Elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis near real-time, without users having to engineer for peak loads.

-- Wikipedia, “Cloud Computing”

When you take this goal apart, what folks are really after is the ability to transparently add and/or remove resources to an “application” as needed to meet demand. Interestingly enough, both in pre-cloud and cloud computing environments this happens due to two key components: load balancing and automation.

Load balancing has always been used to scale applications transparently. The load balancing service provides a layer of virtualization in the network that abstracts the “real” resources providing the application and makes many instances of that application appear to be a single, holistic entity. This layer of abstraction has the added benefit of allowing the load balancing service to see both the overall demand on the “application” and the demand on each individual instance. This is important to cloud scalability because a single application instance does not have the visibility necessary to see load at the “application” layer; it sees only load at the application instance layer, i.e. itself.

Visibility is paramount if scalability is to maintain efficiency of scale. That means measuring CAP (capacity, availability, and performance) both at the “virtual” application and application instance layers. These measurements are generally tied to business and operational goals – the goals upon which IT is measured by its consumers. The three are inseparable and impact each other in very real ways. High capacity utilization often results in degrading performance, availability impacts both capacity and performance, and poor performance can in turn degrade capacity. Measuring only one or two is insufficient; all three variables must be monitored and, ultimately, acted upon to achieve not only scalability but efficiency of scale. Just as important is flexibility in determining what defines “capacity” for an application. In some cases it may be connections, in others CPU and/or memory load, and in still others it may be some other measurement. It may be (should be) a combination of both capacity and performance, and any load balancing service ought to be able to balance all three variables dynamically to achieve maximum results with minimum resources (and therefore, in a cloud environment, minimum costs).
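To make the idea of acting on all three CAP variables concrete, here is a minimal sketch of a scaling decision made from the pool-wide view a load balancing service has. Everything in it is an assumption for illustration: the names `InstanceStats`, `Thresholds` and `decide`, the use of concurrent connections as the capacity measure, and the 80%/40% trigger points are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceStats:
    connections: int    # capacity: current concurrent connections
    healthy: bool       # availability: result of an application-level check
    response_ms: float  # performance: recent average response time


@dataclass
class Thresholds:
    max_connections: int      # per-instance limit (assumed to come from load testing)
    max_response_ms: float    # per-instance performance limit
    scale_out_at: float = 0.8  # hypothetical pool utilization trigger
    scale_in_at: float = 0.4   # hypothetical pool utilization trigger


def decide(instances: List[InstanceStats], t: Thresholds) -> str:
    """Return 'scale_out', 'scale_in' or 'hold' using all three CAP variables."""
    # Availability: only healthy instances contribute usable capacity.
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return "scale_out"

    # Capacity: utilization of the "virtual" application, not of one instance.
    used = sum(i.connections for i in healthy)
    utilization = used / (len(healthy) * t.max_connections)

    # Performance: how many healthy instances are over their limit.
    slow = sum(1 for i in healthy if i.response_ms > t.max_response_ms)

    if utilization >= t.scale_out_at or slow * 2 > len(healthy):
        return "scale_out"
    if utilization <= t.scale_in_at and slow == 0 and len(healthy) > 1:
        return "scale_in"
    return "hold"
```

Note that the second branch scales out on performance even when connection counts look comfortable, which is the case a capacity-only view would miss.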


There are three things you must do in order to ensure cloud scalability is efficient:

1. Determine what “capacity” means for your application. This will likely require load testing of a single instance to understand resource consumption and determine an appropriate set of thresholds based on connections, memory and CPU utilization. Depending on what load balancing service you will ultimately use, you may be limited to only viewing capacity in terms of concurrent connections. If this is the case – as is generally true in an off-premise cloud environment where services are limited – then ramp up connections while measuring performance (be sure to read #3 before you measure “performance”). Do this multiple times until you’re sure you have a good average connection limit at which performance becomes an issue.

2. Determine what “available” means for an application instance. Try not to think in simple terms such as “responds to a ping” or “returns an HTTP response”. Such health checks are not valid when measuring application availability as they only determine whether the network and web server stack are available and responding properly. Both can be true yet the application may be experiencing troubles and returning error codes or bad data (or no data). In any dynamic environment, availability must focus on the core unit of scalability – the application. If simple checks like these are all you’ve got in an off-premise cloud load balancing service, however, be aware of the risk to availability and pass on the warning to the business side of the house.

3. Determine “performance” threshold limitations for application instances. This value directly impacts the performance of the virtual application. Remember to factor in that application response times are the sum of the time it takes to traverse from the client to the application and back. That means the application instance response time is only a portion, albeit likely the largest portion, of the overall performance threshold. Determine the RTT (round trip time) for an average request/response and factor that into the performance thresholds for the application instances.
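The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, load test results are assumed to be lists of (concurrent connections, instance response time in ms) samples in ascending connection order, and the health check assumes the application returns a known marker string in a good response.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[int, float]  # (concurrent connections, instance response ms)


def instance_budget_ms(end_to_end_target_ms: float, network_rtt_ms: float) -> float:
    """Step 3: the instance only gets what is left after the network round trip."""
    budget = end_to_end_target_ms - network_rtt_ms
    if budget <= 0:
        raise ValueError("network RTT alone exceeds the end-to-end target")
    return budget


def connection_limit(run: List[Sample], budget_ms: float) -> int:
    """Step 1: highest connection count in one ramp-up run that stays within budget."""
    limit = 0
    for connections, response_ms in run:
        if response_ms > budget_ms:
            break  # performance became an issue at this load
        limit = connections
    return limit


def average_limit(runs: List[List[Sample]], budget_ms: float) -> int:
    """Step 1, repeated: average the limit over several load test runs."""
    return int(mean(connection_limit(run, budget_ms) for run in runs))


def is_available(status: int, body: str, expected_marker: str) -> bool:
    """Step 2: available means a success status AND the expected content."""
    return status == 200 and expected_marker in body
```

With a hypothetical 500 ms end-to-end target and an 80 ms round trip, the instance budget is 420 ms, and that figure, not 500 ms, is what each ramp-up run should be measured against.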


If you’re thinking at this point that it’s not supposed to require so much work to “auto-scale” in cloud computing environments, well, it doesn’t have to. As long as you’re willing to accept a higher risk of unnoticed failure and performance degradation, as well as potentially higher costs from inefficient scaling strategies, then you need do nothing more than just “let go, let cloud” (to shamelessly quote the 451 Group’s Wendy Nather).

Ignoring all the factors that impact when to scale out and back down is perilous because of the limitations in load balancing algorithms and, in off-premise cloud environments in particular, the inability to leverage layer 7 load balancing (application switching, page routing, et al) to architect scalability domains. You are left with a few simple and often inefficient algorithms from which to choose, which impedes efficiency by making it more difficult to scale in response to actual demand and its impact on the application. You are instead reacting (and often too late) to individual pieces of data that alone do not provide a holistic view of the application, but rather only limited views into application instances.
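As a rough illustration of how limited the simple algorithms are, the sketch below contrasts round robin with least connections. The function names and the instance labels are hypothetical, and real load balancing services implement these algorithms with far more nuance; the point is only that round robin ignores load entirely, while least connections sees a single variable and nothing of performance or application health.

```python
from itertools import cycle
from typing import Dict, List


def round_robin(instances: List[str], requests: int) -> List[str]:
    """Hand out requests in fixed rotation, ignoring how busy each instance is."""
    rotation = cycle(instances)
    return [next(rotation) for _ in range(requests)]


def least_connections(active: Dict[str, int], requests: int) -> List[str]:
    """Send each request to the instance with the fewest active connections."""
    load = dict(active)  # work on a copy so the caller's counts are untouched
    chosen = []
    for _ in range(requests):
        target = min(load, key=load.get)
        load[target] += 1
        chosen.append(target)
    return chosen


# Hypothetical pool: instance "a" is already far busier than "b" and "c".
active = {"a": 10, "b": 2, "c": 2}
rr = round_robin(list(active), 4)   # still sends traffic to the busy "a"
lc = least_connections(active, 4)   # avoids "a", but knows nothing else
```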

Cloud scalability – whether on-premise or off – should be a balancing (pun only somewhat intended) act that maximizes performance and efficiency while minimizing costs. While allowing “the cloud” to auto-scale encourages operational efficiency, it often does so at the expense of performance and at a higher cost.

An ounce of prevention is worth a pound of cure, and in the case of scalability a few hours of testing is worth a month of additional uptime.

Published Nov 09, 2011
Version 1.0
