load testing
The Secret to Doing Cloud Scalability Right
Hint: The answer lies in being aware of the entire application context and a little pre-planning

Thanks to the maturity of load balancing services and technology, dynamically scaling applications in pre-cloud and cloud computing environments is a fairly simple task. But doing it right – in a way that maintains performance while maximizing resources and minimizing costs – well, that is not so trivial a task unless you have the right tools.

SCALABILITY RECAP

Before we can explain how to do it right, we have to dig into the basics of how scalability (and more precisely auto-scalability) works and what's required to scale not only dynamically, but efficiently. A key characteristic of cloud computing is scalability, or more precisely the ease with which scalability can be achieved.

Scalability and Elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis near real-time, without users having to engineer for peak loads. -- Wikipedia, "Cloud Computing"

When you take this goal apart, what folks are really after is the ability to transparently add and/or remove resources to an "application" as needed to meet demand. Interestingly enough, in both pre-cloud and cloud computing environments this happens due to two key components: load balancing and automation.

Load balancing has always been used to scale applications transparently. The load balancing service provides a layer of virtualization in the network that abstracts the "real" resources providing the application and makes many instances of that application appear to be a single, holistic entity. This layer of abstraction has the added benefit of allowing the load balancing service to see both the overall demand on the "application" as well as each individual instance. This is important to cloud scalability because a single application instance does not have the visibility necessary to see load at the "application" layer; it sees only load at the application instance layer, i.e. itself.

Visibility is paramount to scalability – to maintaining efficiency of scale. That means measuring CAP (capacity, availability, and performance) at both the "virtual" application and application instance layers. These measurements are generally tied to business and operational goals – the goals upon which IT is measured by its consumers. The three are inseparable and impact each other in very real ways. High capacity utilization often results in degraded performance, availability impacts both capacity and performance, and poor performance can in turn degrade capacity. Measuring only one or two is insufficient; all three variables must be monitored and, ultimately, acted upon to achieve not only scalability but efficiency of scale.

Just as important is flexibility in determining what defines "capacity" for an application. In some cases it may be connections, in others CPU and/or memory load, and in still others it may be some other measurement. It may be (should be) a combination of capacity and performance, and any load balancing service ought to be able to balance all three variables dynamically to achieve maximum results with minimum resources (and therefore, in a cloud environment, minimum costs).

WHAT YOU NEED TO KNOW BEFORE YOU CONFIGURE

There are three things you must do in order to ensure cloud scalability is efficient:

1. Determine what "capacity" means for your application.
This will likely require load testing of a single instance to understand resource consumption and to determine an appropriate set of thresholds based on connections, memory, and CPU utilization. Depending on what load balancing service you will ultimately use, you may be limited to viewing capacity only in terms of concurrent connections. If this is the case – as is generally true in an off-premise cloud environment where services are limited – then ramp up connections while measuring performance (be sure to read #3 before you measure "performance"). Do this multiple times until you're sure you have a good average connection limit at which performance becomes an issue.

2. Determine what "available" means for an application instance.

Try not to think in simple terms such as "responds to a ping" or "returns an HTTP response". Such health checks are not valid when measuring application availability, as they only determine whether the network and web server stack are available and responding properly. Both can be true and yet the application may be experiencing trouble, returning error codes or bad data (or no data at all). In any dynamic environment, availability must focus on the core unit of scalability – the application. If a simple health check is all you've got in an off-premise cloud load balancing service, however, be aware of the risk to availability and pass the warning on to the business side of the house.

3. Determine "performance" threshold limitations for application instances.

This value directly impacts the performance of the virtual application. Remember to factor in that application response times are the sum of the time it takes a request to traverse from the client to the application and back. That means the application instance response time is only a portion, albeit likely the largest portion, of the overall performance threshold. Determine the RTT (round trip time) for an average request/response and factor it into the performance thresholds for the application instances.

WHY IS THIS ALL IMPORTANT

If you're thinking at this point that it's not supposed to require so much work to "auto-scale" in cloud computing environments, well, it doesn't have to. As long as you're willing to trade a higher risk of unnoticed failure and performance degradation, as well as potentially higher costs from inefficient scaling strategies, then you need do nothing more than just "let go, let cloud" (to shamelessly quote the 451 Group's Wendy Nather).

The reason that ignoring all the factors that impact when to scale out and back down is so perilous is the limitations of load balancing algorithms and, particularly in off-premise cloud environments, the inability to leverage layer 7 load balancing (application switching, page routing, et al) to architect scalability domains. You are left with a few simple and often inefficient algorithms from which to choose, which impedes efficiency by making it more difficult to actually scale in response to actual demand and its impact on the application. You are instead reacting (and often too late) to individual pieces of data that alone do not provide a holistic view of the application, but rather only limited views into application instances.

Cloud scalability – whether on-premise or off – should be a balancing (pun only somewhat intended) act that maximizes performance and efficiency while minimizing costs. While simply letting "the cloud" auto-scale encourages operational efficiency, it often does so at the expense of performance and at higher cost.
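To make those three determinations concrete, here is a minimal Python sketch of how a scaling decision might weigh capacity, availability, and performance together rather than acting on any single metric. The threshold values and names are entirely hypothetical; derive your own from the testing described above.

# Hypothetical auto-scaling decision combining capacity, availability,
# and performance (CAP) rather than acting on any single metric.
from dataclasses import dataclass

@dataclass
class InstanceMetrics:
    connections: int         # current concurrent connections on this instance
    app_healthy: bool        # application-level check, not just ping/HTTP 200
    response_time_ms: float  # measured at the instance, excluding client RTT

# Placeholder thresholds: derive yours from load testing a single instance
# (step 1) and from your measured RTT (step 3).
MAX_CONNECTIONS = 500
MAX_RESPONSE_MS = 800  # instance budget = SLA budget minus average RTT

def should_scale_out(pool):
    healthy = [i for i in pool if i.app_healthy]
    if not healthy:
        return True  # availability failure: add capacity immediately
    # Capacity: average utilization across healthy instances only.
    utilization = sum(i.connections for i in healthy) / (len(healthy) * MAX_CONNECTIONS)
    # Performance: worst-case instance response time against its budget.
    slowest = max(i.response_time_ms for i in healthy)
    return utilization > 0.8 or slowest > MAX_RESPONSE_MS

pool = [InstanceMetrics(300, True, 640.0), InstanceMetrics(310, True, 910.0)]
print(should_scale_out(pool))  # True: one instance has blown its performance budget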
An ounce of prevention is worth a pound of cure, and in the case of scalability a few hours of testing is worth a month of additional uptime.
Cloud Testing: The Next Generation

It seems only fair that as the Internet caused the problem, it should solve it.

One of the negatives of deploying an Internet-scale infrastructure and application is that until it's put to the test, you can't have 100 percent confidence that it will scale as expected. And if you do, you probably shouldn't. Applications and infrastructure that perform well – and correctly – at nominal scale may begin to act wonky as load increases. Dan Bartow, VP at SOASTA, says it is still often load balancing configuration errors cropping up during testing that impede scalability and performance under load. Choices regarding the load balancing algorithm have a direct impact on the way in which sites and applications scale – or fail to scale – and only under stress do infrastructure and applications begin to experience problems.

The last time I ran a scalability and performance test on industry load balancers, that's exactly what happened – what appeared to be a well-behaved load balancer under normal load turned into a temper-tantrum-throwing device under heavier load. The problem? A defect deep in the code that only appeared when the device's session table was full. Considering the capability of such devices even then, that meant millions of connections had to be seen in a single session before the problem reared its ugly head. Today load balancers are capable of not millions, but tens of millions of connections – scale that is difficult if not impossible for organizations to duplicate.

Cloud computing and virtualization bring new challenges to testing the scalability of an application deployment. An application deployed in a cloud environment may be designed to auto-scale "infinitely", which implies that testing the application and its infrastructure requires the same capability in a testing solution. That's no small trick. Traditionally, organizations would leverage a load testing solution capable of generating enough clients and traffic to push an application and its infrastructure to their limits. But given increases in raw compute power and parallel improvements in the capacity and performance of infrastructure solutions, the cost of a solution capable of generating the kind of Internet-scale load necessary is prohibitive. One of our internal performance management engineers applied some math and came up with a jaw-dropping investment:

In other words, enough hardware to test a top-of-the-line ADC [application delivery controller] would set you back a staggering $3 million. It should be clear that even buying equipment to test a fairly low-end ADC would be a big ticket item, likely costing quite a bit more than the device under test.

It seems fairly obvious that testing Internet-scale architectures is going to require Internet-scale load generation solutions, but without the Internet-scale cost. It's only fair that if the scalability of the Internet is the cause of the problem, it should also provide the solution.
It's 2am: Do You Know What Algorithm Your Load Balancer is Using?

The wrong load balancing algorithm can be detrimental to the performance and scalability of your web applications.

When you're mixing and matching virtual or physical servers, you need to take care with how you configure your load balancer – and that includes cloud-based load balancing services. Load balancers do not at this time, unsurprisingly, magically choose the right algorithm for distributing requests in a given environment. One of the nice things about a load balancing solution that comes replete with application-specific templates is that all the work required to determine the optimal configuration for the load balancer and its associated functionality (web application security, acceleration, optimization) has already been done – including the choice of the right algorithm for that application. But for most applications there are no such templates, no guidance, nothing. Making things more difficult are heterogeneous environments in which the compute resources available vary from instance to instance. These variations make some load balancing algorithms unsuited to such environments. There is some general guidance you can use when trying to determine which algorithm is best suited to meeting the performance and scalability needs of your applications, based on an understanding of how the algorithms are designed to make decisions, but if you want optimal performance and scalability you'll ultimately have to do some testing.

Heterogeneous environments can pose a challenge to scale if careful consideration is not given to the load balancing algorithm. Whether the limitations on compute resources are imposed by a virtualization solution or by the hardware itself, limitations that vary from application instance to application instance are an important factor to consider when configuring your load balancing solution.

Let's say you've got a pool of application instances and you know the capacity of each in terms of connections. Two of the servers can handle 500 concurrent connections each, and one can handle 1000 concurrent connections. Now assume that your load balancer is configured to perform standard round robin load balancing between the three instances. Even though the total capacity of these three servers appears to be 2000 concurrent connections, by the time you hit 1501, the first of the three servers will be over capacity because it will have to try to handle 501 connections. If you tweak the configuration just a bit to indicate the maximum connection capacity for each node (instance) you can probably avoid this situation, but there are no guarantees.

Now let's make a small change in the algorithm – instead of standard round robin we'll use weighted round robin (often called "ratio"), and give the largest-capacity server a higher weight based on its capacity ratio to the other servers, say 2. This means the "bigger" server will receive twice as many requests as the other two servers, which brings the total capacity closer to what is expected.

You might be thinking that a least connections algorithm would be more appropriate in a heterogeneous environment, but that's not the case. Least connections algorithms base distribution upon the number of connections currently open on any given server instance; they do not necessarily take into consideration the maximum connection capacity of that particular node.
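A quick simulation makes the arithmetic easy to verify. The sketch below is a minimal Python model, using the pool capacities from the example above; it assumes the worst case in which every connection stays open, and reports the connection count at which the first node goes over capacity.

# Minimal model of round robin vs. weighted ("ratio") round robin over
# a heterogeneous pool; assumes every connection stays open (worst case).
from itertools import cycle

def first_overload(capacities, weights, total_connections):
    # Build one scheduling cycle, e.g. weights (1, 1, 2) -> [0, 1, 2, 2].
    order = [i for i, w in enumerate(weights) for _ in range(w)]
    open_conns = [0] * len(capacities)
    for n, i in zip(range(1, total_connections + 1), cycle(order)):
        open_conns[i] += 1
        if open_conns[i] > capacities[i]:
            return n  # connection count at which a node went over capacity
    return None

pool = [500, 500, 1000]
print(first_overload(pool, [1, 1, 1], 2000))  # 1501: plain round robin overloads a small node
print(first_overload(pool, [1, 1, 2], 2000))  # None: a ratio of 2 keeps every node within capacity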
Fastest response time combined with per-node connection limits would be a better option, but a fastest response time algorithm tends to result in a very unequal distribution as load increases in a heterogeneous environment.

None of this, however, says anything about the performance of the application when using any of the aforementioned algorithms. We do know that as application instances near capacity, performance tends to degrade. Thus we could extrapolate that the performance of the two "smaller" servers will degrade faster than the performance of the bigger server, because under high load they will certainly reach capacity before the larger server instance – when using some algorithms, at least. Algorithms like fastest response time and least connections tend to favor higher performing servers, which means in the face of a sudden spike of traffic, performance may degrade using those algorithms as well.

How about more "dynamic" algorithms that take into consideration multiple factors? Dynamic load balancing methods are designed to work with servers that differ in processing speed and memory. The resulting load balancing decisions may be uneven in terms of distribution, but generally provide a more consistent user experience in terms of performance. For example, the observed dynamic load balancing algorithm distributes connections across applications based on a ratio calculated every second, and predictive dynamic load balancing uses the same ratio but also takes into consideration the change between previous connection counts and current connection counts, adjusting the ratio based on the delta. Predictive mode is more aggressive in adjusting ratio values for individual application instances based on connection changes in real time, and in a heterogeneous environment is likely better able to handle the differences between server capabilities.

What is TCP multiplexing?

TCP multiplexing is a technique used primarily by load balancers and application delivery controllers (but also by some stand-alone web application acceleration solutions) that enables the device to "reuse" existing TCP connections. This is similar to the way in which persistent HTTP 1.1 connections work, in that a single HTTP connection can be used to retrieve multiple objects, thus reducing the impact of TCP overhead on application performance. TCP multiplexing allows the same thing to happen for TCP-based applications (usually HTTP / web) except that instead of the reuse being limited to a single client, connections can be reused across many clients, resulting in much greater efficiency of web servers and faster performing applications.

Interestingly enough, chatting with Dan Bartow (now CloudTest Evangelist and Vice President at SOASTA) about his experiences as Senior Manager of Performance Engineering at Intuit revealed that testing different algorithms under heavy, externally generated load finally led them to the discovery that a simple round robin algorithm combined with the application of TCP multiplexing options yielded a huge boost in both capacity and performance. But that was only after testing under conditions similar to those the applications would experience during peaks in usage, and after normalization of the server environment.
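The observed/predictive distinction is easier to see in code. The Python sketch below is a rough illustration only – not any vendor's actual implementation – assuming a once-per-second recalculation in which lighter-loaded nodes earn a higher ratio, and predictive mode additionally rewards nodes whose connection counts are trending down.

# Rough illustration of observed vs. predictive ratio calculation,
# recalculated once per second. Not an actual vendor implementation.

def observed_ratios(conns):
    # Fewer open connections than the pool average -> higher ratio.
    avg = sum(conns) / len(conns)
    return [2 if c < avg else 1 for c in conns]

def predictive_ratios(conns, prev_conns):
    ratios = observed_ratios(conns)
    for i, (now, before) in enumerate(zip(conns, prev_conns)):
        if now < before:    # trending down: connections are draining
            ratios[i] += 1  # reward the node more aggressively
        elif now > before:  # trending up: load is building
            ratios[i] = max(1, ratios[i] - 1)
    return ratios

# The third (larger) server is below the pool average and draining connections.
print(observed_ratios([480, 490, 300]))                     # [1, 1, 2]
print(predictive_ratios([480, 490, 300], [470, 485, 350]))  # [1, 1, 3]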
That experience illustrates well that performance and availability aren't simply a matter of dumping a load balancing solution into the mix – it's important to test, to tweak configurations, and to test again to find the overall infrastructure configuration that's going to provide the best application performance (and thus end-user experience) while maximizing resource utilization. Theoretical, mathematically accurate models of load balancing are all well and good, but in the real world the complexity of the variables and the interaction between infrastructure solutions, applications, and servers is much higher, rendering the "theory" just that – theory.

Invariably, which load balancing algorithm is right for your application is going to depend heavily on what metrics are most important to you. A balance of server efficiency, response time, and availability is likely involved, but which of these key metrics is most important depends on what business stakeholders have deemed most important to them. The only way to really determine which load balancing algorithm will achieve the results you are looking for is to test them, under load, and observe the distribution and performance of the application.

FIRE and FORGET NOT a GOOD STRATEGY

The worst thing you can do is "fire and forget" with your load balancer. The algorithm that might be right for one application might not be right for another, depending on the style of the application, its usage patterns, the servers used to serve it, and even the time of year. Unfortunately we're not quite at the point where the load balancer can automatically determine the right load balancing algorithm for you, but there are ways to adjust – dynamically – the algorithm based on not just the application but also the capabilities of the servers (physical and/or virtual) being load balanced, so one day it is quite possible that, through the magic of Infrastructure 2.0, load balancing algorithms will be modified on demand based on the type of servers that make up the pool of resources.

In order to reach the level of sophistication we'd (all) like to see, however, it's necessary to first understand the impact of the load balancing algorithm on applications and determine which one is best able to meet service level agreements in various environments, based on a variety of parameters. This will become more important as public and private cloud computing environments are leveraged in new ways and introduce more heterogeneous environments. Seasonal demand might, for example, be met by leveraging different "sizes" of unused capacity across multiple servers in the data center. These "servers" would likely differ in CPU and RAM capabilities and thus would certainly be impacted by the choice of load balancing algorithm. Being able to dynamically modify the load balancing algorithm based on the capacities of application instances is an invaluable tool when attempting to maximize the efficiency of resources while minimizing associated costs.

There is, of course, a lack of control over algorithms in cloud computing environments as well, which makes the situation more difficult. With a limited set of choices available from providers, the algorithm that's best for your application and server resource composition may not be available. Providers need to make it easier for customers to take advantage of modern, application- and resource-aware algorithms that have evolved through trial and error over the past decade.
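As a thought experiment, such capability-aware selection might look something like the hypothetical Python sketch below, which falls back to plain round robin for a homogeneous pool and weights nodes by relative capacity otherwise. All names and the selection rule are illustrative, not any product's API.

# Hypothetical sketch: choose an algorithm from the pool's capacity mix.
# Names and the selection rule are illustrative only, not a real product API.

def choose_algorithm(node_capacities):
    lo = min(node_capacities)
    if lo == max(node_capacities):
        return "round-robin", [1] * len(node_capacities)  # homogeneous pool
    # Heterogeneous: weight each node by its capacity relative to the smallest.
    return "ratio", [round(c / lo) for c in node_capacities]

print(choose_algorithm([500, 500, 500]))   # ('round-robin', [1, 1, 1])
print(choose_algorithm([500, 500, 1000]))  # ('ratio', [1, 1, 2])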
Again, Infrastructure 2.0 enables this level of choice, but it must be leveraged by the provider to extend that choice and control to its customers. For now, it's going to have to be enough to (1) thoroughly test the application and its supporting infrastructure under load and (2) adjust the load balancing algorithm to meet your specific performance criteria based on what is available. You might be surprised to find how much better your response time and capacity can be when you're using the "right" load balancing algorithm for your application – or at least one that's more right than it is wrong, if you're in a cloud computing environment.
Load Testing as a Service: A Look at Load Impact (beta)

I admit it. I'm a load / performance testing junkie. During my years with Network Computing I burned through any number of solutions designed to throw more traffic at products than money Congress is throwing at failed banks these days. And I do mean burned, as the last time I was in the lab there were no less than three non-functioning Spirent Avalanche systems that had given up the ghost after being forced to their absolute limits over years of use and abuse. When I received a note telling me about LoadImpact.com, a load-testing-as-a-service site, naturally I was intrigued. Generate load? From across the Internet? Inconceivable! That's a no-no, after all – at least the kind of load that I'm used to generating. So I decided to check it out. Here's the low-down.

PICK YOUR POISON

Load Impact offers four service levels, with the lowest-capacity load generating service being free (as in gratis). Pricing is per month, with varying numbers of users, configuration and result management, and differences in what you can and cannot configure for any given test. I tested out the "Light" version, of course, but the site allows you to see all the configuration options – they're just not enabled for modification if you aren't using the right version. Load Impact does a nice job of laying out the various versions in an at-a-glance format.

LOAD 'EM UP

First you've got to create a test. When configuring a test you have two options: Basic and Advanced. Basic requires only the page you want to test. That's it. Load Impact supports both HTTP and HTTPS (SSL) and can handle sites requiring HTTP Basic Auth. From there it uses defaults. The Advanced settings offer you a little more control in the Light version, allowing you to analyze the page and then, if you so desire, edit the "load script". The load script is pretty simple:

# Initial sleep to distribute clients
sleep 500-5000
request 2 GET:http://www.nandgate.com/
request 2 GET:http://www.nandgate.com/base.css
request 2 GET:http://www.nandgate.com/bigip/f5-logo.png
request 2 GET:http://www.nandgate.com/donlori-w.jpg

Don't let the simplicity fool you. You can do quite a bit with the script to change the behavior of the clients and customize it to closely match the behavior of (or at least what you expect will be the behavior of) real visitors. The script format is documented in the FAQ. The format allows you to control the number of simultaneous connections that can be opened (in this case always 2, the default, which appears to emulate the behavior of IE6 and below) to better mimic the behavior of a "real" web browser. To simulate a POST method, you send parameters as though it were a GET request; presumably the Load Impact client interprets the script and translates that into an HTTP-compliant POST operation. The ability to edit the script allows you to simulate behavior rather than just the loading of a single page: you can continue the script and have a client load multiple pages, sleeping in between if desired. In more advanced (paid) versions you can also use a recorder to help you build out more complex scripts – a feature common in most commercial load testing products.
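For example, using only the sleep and request primitives shown above, a script might walk a client through a two-page visit. This is a hypothetical sketch – the URLs, timings, and the query-string form are placeholders, not taken from the Load Impact documentation:

# Hypothetical two-page visit; URLs and timings are placeholders
sleep 500-5000
request 2 GET:http://www.example.com/
request 2 GET:http://www.example.com/base.css
# "Read" the page for 2-10 seconds before clicking through
sleep 2000-10000
request 2 GET:http://www.example.com/products.html
request 2 GET:http://www.example.com/search?q=widgets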
Load Impact clients can also handle cookies, so if you're testing a web application that uses cookies (and what web application today doesn't?) you're good to go. This is also good news for testing web applications that are load balanced using persistence-based routing, as such functionality is often implemented via cookies. The actual load balancing algorithms and functions for hosted and cloud-based applications are rarely made public (and too often no one asks), so supporting cookies ensures that the most common of those methods will be supported.

You can change the number of clients and the "ramp up" (step) values, as long as they stay within the specified limits for the version you are using. The default for the Light (free) version is to start with 10 clients, stop at 50, and ramp up by 10. You cannot configure it to "burst" all the clients at one time, though you could start with 49, end at 50, and ramp by 1 to simulate something close enough to bursting. The load generators use HTTP 1.1, and each test consists of several subtests: one per step. Clients use persistent connections throughout each subtest, but open new connections at the onset of each subtest.

A nice option, though not configurable in the "Light" version, is "page view time". In high-end load testing terms this is usually referred to as "think time". "Page view time" is used to simulate the fact that a real client wouldn't just load the page and leave. They'd visit for a while – hence the reason we call them visitors, right? You can also change the client bandwidth in paid versions, though a change affects all clients, not just a specified subset. Load Impact plans to broaden this capability and allow more granular assignment of bandwidth in the future, in a manner similar to "page view time", by allowing you to configure a range and then randomly assigning bandwidth values within that range to clients.

An important note about Load Impact clients:

(*) Note that Load Impact clients always behave as web browsers with an empty client-side cache. They will always load all objects on a page, never caching anything. This means that they are sometimes "heavier" than a real client would be.

This means that if you were hoping to test the impact of a web acceleration solution that takes advantage of client-side caching (like F5's WebAccelerator), Load Impact will not be valuable in assessing the improvement in subsequent visits due to client-side caching. It also means Load Impact will not accurately test your web server configuration or help you assess the impact of using cache-control HTTP headers or HTML meta-tags. It will, however, help assess any improvements in performance that may be due to server-side caching or other optimization/acceleration techniques. This means Load Impact may be invaluable when assessing technology designed to help improve application performance before purchasing. The Light version will do well enough for assessing impact on performance and is free, so be sure to remember to include it on the list of options during a proof of concept run with new technology.

WYSIWYG

Once the test is running you'll see the graph and data updated in real time. That's nice to have, as you don't otherwise have visibility into the status of the clients, but it's not very interesting to watch as it tends to update slowly. The end result is a nice little graph depicting "Delay" in seconds as it relates to the number of clients. Initially you see only the summary: the total delay for the entire page. But you can drill into each of the file types and select up to three objects to be included in the graph. As you drill down and select different objects, a spreadsheet with delay as it relates to load is updated.
In this case, choosing the base.css and two image files showed the delay in milliseconds rather than in full seconds, as would be expected. You can export the data in CSV format, which should allow you to chart the performance of all the composite objects together to get a better view of what's causing any problems you might see. When you've selected more than one object you can also play with the "min", "max", and "average" settings to dig around a bit more in the actual data that was generated. Load Impact saves one average delay value per client step, so if you ramped from 10 to 50 on a step of 10, you'll see 5 distinct data points plotted, each representing the average across all clients at that step.

Load Impact also notes that "Delay" is not the time it would take a client (browser) to load the page. This value is the sum of all individual object load times:

Please note that this is not the same as the time the page will load and render in a web browser. Because objects are loaded across several connections in parallel, the total sum of all individual load times will not be the same as the time a web browser takes to load a page. The current "test summary" metric is intended for optimization work, as it accurately shows changes in object load times. We have, however, realized that many people expect it to show page load times as experienced by a user, and therefore we are going to implement such a metric as well.

It's also important to note that the data collected is not normalized against intercontinental latency. Load Impact plans to increase the number of load generators available in the future, but right now there are two options: one located in Stockholm and one in Chicago. In the Light version all load is generated from Stockholm, which necessarily incurs intercontinental latency that may or may not be applicable to your application. So if you're using the Light version, hosting somewhere other than Europe, and expecting most visitors to be outside Europe, you may want to manually adjust the data collected by downloading the CSV version.

The next version of Load Impact, expected to be available in the next couple of weeks, will offer more granular performance metrics, including "a new response time metric and a bandwidth usage metric." Initially these will be summary-only, with per-object reporting to follow. Plans for TCP metrics like TTFB (time to first byte) and TTLB (time to last byte) as well as IP metrics such as packet loss, delay, and jitter are also in the works. More data is always good for troubleshooting exactly where application performance problems are occurring: at the network or application layer.

RUN IT AGAIN SAM

You can save and re-run a number of tests, based on the version you choose, which is a nice way to test sites under low, medium, and high traffic volumes. With the free service version, the limit of 50 users makes the definitions of low-medium-high somewhat irrelevant, but even at 5,000 concurrent users with the Advanced version, Load Impact is definitely not going to generate a high enough load to be considered a mainstay for high-volume site testing. You'll also want to consider that the transactions generated by the clients are not synthetic, unless you use parameters you will recognize as synthetic and can account for them.
The traffic generated by Load Impact is real, and therefore it will register as real hits for purposes of advertising, campaign tracking, and any other visitor-based web site measurement you may have in place. If you are using script-based analytics or advertising you can relax; while scripts are downloaded, they are not executed or evaluated by the load generator clients.

Be careful about the potential to trigger security mechanisms if you're testing a hosted or cloud-based application with Load Impact, or even your own web application. At this time all clients appear to arrive from the same IP address, and depending on the configuration of security systems this could trigger alarms and notifications of an attack, potentially blacklisting (usually temporarily) the load generator's IP address. Of course, if you're implementing a security solution or are trying to evaluate rate shaping/QoS functionality, you may want to consider using Load Impact to generate some load to evaluate the solution's effectiveness as well.

To address the possibility of overwhelming third-party sites hosting scripts commonly used by web applications, Load Impact limits the number of times a resource can be loaded by the entire system in a 24-hour period. While this doesn't completely mitigate the potential for tripping up security infrastructure, it is a good idea for Load Impact to impose such a limit in the interests of being a good Internet citizen.

The FAQ and forums have a lot of good questions and answers, as well as tips to help you build out a test scenario. While the help and information available within the actual application are a bit light, you should be able to find what you need in the forums. There aren't enough knobs to make me truly happy yet (I am a load testing junkie, after all) but the roadmap for additional features and metrics is definitely on track to get me there. For most folks this is a good way to start out testing, and for many it is probably more than they'll ever need. For developers trying out their application in a cloud or hosted environment, this is definitely a good way to get a handle on how it's going to perform. Even developers wondering how their locally hosted web application is performing might want to give Load Impact a whirl; it provides the perspective of a user outside the local environment, which is something not always available to developers using in-house solutions. It's free to try out, so go on over and give it a run yourself.