Layer 7 Switching + Load Balancing = Layer 7 Load Balancing
Modern load balancers (application delivery controllers) blend traditional load-balancing capabilities with advanced, application-aware layer 7 switching to support the design of a highly scalable, optimized application delivery network. Here's the difference between the two technologies, and the benefits of combining the two into a single application delivery controller.

LOAD BALANCING

Load balancing is the process of balancing load (application requests) across a number of servers. The load balancer presents to the outside world a "virtual server" that accepts requests on behalf of a pool (also called a cluster or farm) of servers and distributes those requests across all servers based on a load-balancing algorithm. All servers in the pool must contain the same content. Load balancers generally use one of several industry-standard algorithms to distribute requests. Some of the most common are round-robin, weighted round-robin, least connections, and weighted least connections.

Load balancers are used to increase the capacity of a web site or application, to ensure availability through failover capabilities, and to improve application performance.

LAYER 7 SWITCHING

Layer 7 switching takes its name from the OSI model, indicating that the device switches requests based on layer 7 (application) data. Layer 7 switching is also known as "request switching", "application switching", and "content based routing". A layer 7 switch presents to the outside world a "virtual server" that accepts requests on behalf of a number of servers and distributes those requests based on policies that use application data to determine which server should service which request. This allows for the application infrastructure to be specifically tuned/optimized to serve specific types of content.
For example, one server can be tuned to serve only images, another for execution of server-side scripting languages like PHP and ASP, and another for static content such as HTML, CSS, and JavaScript. Unlike load balancing, layer 7 switching does not require that all servers in the pool (farm/cluster) have the same content. In fact, layer 7 switching expects that servers will have different content, thus the need to more deeply inspect requests before determining where they should be directed. Layer 7 switches are capable of directing requests based on URI, host, HTTP headers, and anything in the application message. The latter capability is what gives layer 7 switches the ability to perform content based routing for ESBs and XML/SOAP services.

LAYER 7 LOAD BALANCING

By combining load balancing with layer 7 switching, we arrive at layer 7 load balancing, a core capability of all modern load balancers (a.k.a. application delivery controllers). Layer 7 load balancing combines the standard load balancing features of a load balancer with the content-aware routing of a layer 7 switch to provide failover and improved capacity for specific types of content. This allows the architect to design an application delivery network that is highly optimized to serve specific types of content but is also highly available. Layer 7 load balancing allows for additional features offered by application delivery controllers to be applied based on content type, which further improves performance by executing only those policies that are applicable to the content. For example, data security in the form of data scrubbing is likely not necessary on JPG or GIF images, so it need only be applied to HTML and PHP. Layer 7 load balancing also allows for increased efficiency of the application infrastructure.
For example, only two highly tuned image servers may be required to meet application performance and user concurrency needs, while three or four optimized servers may be necessary to meet the same requirements for PHP or ASP scripting services. Being able to separate out content based on type, URI, or data allows for better allocation of physical resources in the application infrastructure.

The Secret to Doing Cloud Scalability Right
Hint: The answer lies in being aware of the entire application context and a little pre-planning

Thanks to the maturity of load balancing services and technology, dynamically scaling applications in pre-cloud and cloud computing environments is a fairly simple task. But doing it right – in a way that maintains performance while maximizing resources and minimizing costs – well, that is not so trivial a task unless you have the right tools.

SCALABILITY RECAP

Before we can explain how to do it right, we have to dig into the basics of how scalability (and more precisely auto-scalability) works and what’s required to scale not only dynamically but efficiently. A key characteristic of cloud computing is scalability, or more precisely the ease with which scalability can be achieved.

Scalability and Elasticity via dynamic ("on-demand") provisioning of resources on a fine-grained, self-service basis near real-time, without users having to engineer for peak loads. -- Wikipedia, “Cloud Computing”

When you take this goal apart, what folks are really after is the ability to transparently add and/or remove resources to an “application” as needed to meet demand. Interestingly enough, both in pre-cloud and cloud computing environments this happens due to two key components: load balancing and automation. Load balancing has always been used to scale applications transparently. The load balancing service provides a layer of virtualization in the network that abstracts the “real” resources providing the application and makes many instances of that application appear to be a single, holistic entity. This layer of abstraction has the added benefit of allowing the load balancing service to see both the overall demand on the “application” as well as each individual instance. This is important to cloud scalability because a single application instance does not have the visibility necessary to see load at the “application” layer; it sees only load at the application instance layer, i.e. itself.
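To make that difference in visibility concrete, here is a minimal Python sketch. The `Instance` class, the connection limits, and the 80% threshold are all hypothetical, chosen only for illustration; a real load balancing service exposes this kind of data in its own way.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    connections: int  # current concurrent connections
    limit: int        # hypothetical connection limit found through load testing

    def utilization(self) -> float:
        """Instance-layer view: all a single instance can see is itself."""
        return self.connections / self.limit


def application_utilization(instances) -> float:
    """Application-layer view: only the load balancer sees every instance."""
    total = sum(i.connections for i in instances)
    capacity = sum(i.limit for i in instances)
    return total / capacity


def needs_scale_out(instances, threshold: float = 0.8) -> bool:
    """Scale on demand against the whole application, not on one busy instance."""
    return application_utilization(instances) >= threshold
```

With one instance at 90 of 100 connections and another at 30 of 100, the first instance looks nearly saturated from its own point of view, while the application as a whole sits at 60% and does not yet need another instance.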
Visibility is paramount to scalability to maintain efficiency of scale. That means measuring CAP (capacity, availability, and performance) both at the “virtual” application and application instance layers. These measurements are generally tied to business and operational goals – the goals upon which IT is measured by its consumers. The three are inseparable and impact each other in very real ways. High capacity utilization often results in degrading performance, availability impacts both capacity and performance, and poor performance can in turn degrade capacity. Measuring only one or two is insufficient; all three variables must be monitored and, ultimately, acted upon to achieve not only scalability but efficiency of scale. Just as important is flexibility in determining what defines “capacity” for an application. In some cases it may be connections, in others CPU and/or memory load, and in still others it may be some other measurement. It may be (should be) a combination of both capacity and performance, and any load balancing service ought to be able to balance all three variables dynamically to achieve maximum results with minimum resources (and therefore in a cloud environment, costs).

WHAT YOU NEED TO KNOW BEFORE YOU CONFIGURE

There are three things you must do in order to ensure cloud scalability is efficient:

1. Determine what “capacity” means for your application. This will likely require load testing of a single instance to understand resource consumption and determine an appropriate set of thresholds based on connections, memory and CPU utilization. Depending on what load balancing service you will ultimately use, you may be limited to only viewing capacity in terms of concurrent connections. If this is the case – as is generally true in an off-premise cloud environment where services are limited – then ramp up connections while measuring performance (be sure to read #3 before you measure “performance”). Do this multiple times until you’re sure you have a good average connection limit at which performance becomes an issue.

2. Determine what “available” means for an application instance. Try not to think in simple terms such as “responds to a ping” or “returns an HTTP response”. Such health checks are not valid when measuring application availability as they only determine whether the network and web server stack are available and responding properly. Both can be true yet the application may be experiencing troubles and returning error codes or bad data (or no data). In any dynamic environment, availability must focus on the core unit of scalability – the application. If simple checks like these are all you’ve got in an off-premise cloud load balancing service, however, be aware of the risk to availability and pass on the warning to the business side of the house.

3. Determine “performance” threshold limitations for application instances. This value directly impacts the performance of the virtual application. Remember to factor in that application response times are the sum of the time it takes to traverse from the client to the application and back. That means the application instance response time is only a portion, albeit likely the largest portion, of the overall performance threshold. Determine the RTT (round trip time) for an average request/response and factor that into the performance thresholds for the application instances.

WHY IS THIS ALL IMPORTANT

If you’re thinking at this point that it’s not supposed to require so much work to “auto-scale” in cloud computing environments, well, it doesn’t have to. As long as you’re willing to accept a higher risk of unnoticed failure and performance degradation, as well as potentially higher costs from inefficient scaling strategies, then you need do nothing more than just “let go, let cloud” (to shamelessly quote the 451 Group’s Wendy Nather).
Ignoring all the factors that impact when to scale out and back down is so perilous because of the limitations in load balancing algorithms and, particularly in off-premise cloud environments, the inability to leverage layer 7 load balancing (application switching, page routing, et al) to architect scalability domains. You are left with a few simple and often inefficient algorithms from which to choose, which impedes efficiency by making it more difficult to actually scale in response to actual demand and its impact on the application. You are instead reacting (and often too late) to individual pieces of data that alone do not provide a holistic view of the application, but rather only limited views into application instances. Cloud scalability – whether on-premise or off – should be a balancing (pun only somewhat intended) act that maximizes performance and efficiency while minimizing costs. While allowing “the cloud” to auto-scale encourages operational efficiency, it often does so at the expense of performance and at higher cost. An ounce of prevention is worth a pound of cure, and in the case of scalability a few hours of testing is worth a month of additional uptime.

Applying Scalability Patterns to Infrastructure Architecture
Too often software design patterns are overlooked by network and application delivery network architects but these patterns are often equally applicable to addressing a broad range of architectural challenges in the application delivery tier of the data center.

The “High Scalability” blog is fast becoming one of my favorite reads. Last week did not disappoint with a post highlighting a set of scalability design patterns that was, apparently, inspired by yet another High Scalability post on “6 Ways to Kill Your Servers: Learning to Scale the Hard Way.” This particular post caught my attention primarily because although I’ve touched on many of these patterns in the past, I’ve never thought to call them what they are: scalability patterns. That’s probably a side-effect of forgetting that building an architecture of any kind is at its core computer science and thus algorithms and design patterns are applicable to both micro- and macro-architectures, such as those used when designing a scalable architecture. This forgetting is actually more common than you’d think, as it’s rarely the case that a network guy and a developer sit down and discuss scalability patterns over beer and deep fried cheese curds (hey, I live in Wisconsin and it’s my blog post so just stop making faces until you’ve tried it). Developers and architects sit over there and think about how to design a scalable application from the perspective of its components – databases, application servers, middleware, etc… Network architects sit over here and think about how to scale an application from the perspective of network components – load balancers, trunks, VLANs, and switches. The thing is that the scalability patterns leveraged by developers and architects can almost universally be abstracted and applied to the application delivery network – the set of components integrated as a means to ensure availability, performance, and security of applications.
That’s why devops is so important and why devops has to bring dev into ops as much as it’s necessary to bring some ops into dev. There needs to be more cross-over, more discussion, between the two groups if not an entirely new group in order to leverage the knowledge and skills that each has in new and innovative ways.

ABSTRACT and APPLY

So the aforementioned post is just a summary of a longer and more detailed post, but for purposes of this post I think the summary will do with the caveat that the original, “Scalability patterns and an interesting story...” by Jesper Söderlund is a great read that should definitely be on your “to read” list in the very near future. For now, let’s briefly touch on the scalability patterns and sub-patterns Jesper described with some commentary on how they fit into scalability from a network and application delivery network perspective. The original text from the High Scalability blog opens each pattern below, ahead of my commentary.

Load distribution - Spread the system load across multiple processing units

This is a horizontal scaling strategy that is well-understood. It may take the form of “clustering” or “load balancing” but in both cases it is essentially an aggregation coupled with a distributed processing model. The secret sauce is almost always in the way in which the aggregation point (strategic point of control) determines how best to distribute the load across the “multiple processing units.”

load balancing / load sharing - Spreading the load across many components with equal properties for handling the request

This is what most people think of when they hear “load balancing”, it’s just that at the application delivery layer we think in terms of directing application requests (usually HTTP but can be just about any application protocol) to equal “servers” (physical or virtual) that handle the request.
This is a “scaling out” approach that is most typically associated today with cloud computing and auto-scaling: launch additional clones of applications as virtual instances in order to increase the total capacity of an application. The load balancing distributes requests across all instances based on the configured load balancing algorithm.

Partitioning - Spreading the load across many components by routing an individual request to a component that owns that specific data

This is really where the architecture comes in and where efficiency and performance can be dramatically increased in an application delivery architecture. Rather than each instance of an application being identical to every other one, each instance (or pool of instances) is designated as the “owner” of a particular type of request. This allows for devops to tweak configurations of the underlying operating system, web and application server software for the specific type of request being handled. This is, also, where the difference between “application switching” and “load balancing” becomes abundantly clear as “application switching” is used as a means to determine where to route a particular request which is/can be then load balanced across a pool of resources. It’s a subtle distinction but an important one when architecting not only efficient and fast but resilient and reliable delivery networks.

Vertical partitioning - Spreading the load across the functional boundaries of a problem space, separate functions being handled by different processing units

When it comes to routing application requests we really don’t separate by function unless that function is easily associated with a URI. The most common implementation of vertical partitioning at the application switching layer will be by content. Example: creating resource pools based on the Content-Type HTTP header: images in pool “image servers” and content in pool “content servers”.
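The two-step flow described above (application switching picks the pool, load balancing picks the member) might be sketched in Python as follows. This is an illustration under stated assumptions, not how any particular ADC implements it: the server names are invented, round-robin stands in for whatever algorithm is configured, and the content type is inferred from the request path because a typical GET request carries no Content-Type header of its own.

```python
import mimetypes
from itertools import cycle
from urllib.parse import urlparse

# Hypothetical pools and server names, for illustration only.
POOLS = {
    "image servers": cycle(["img-1", "img-2"]),
    "content servers": cycle(["web-1", "web-2", "web-3"]),
}


def choose_pool(uri: str) -> str:
    """Application switching: pick a pool from the inferred content type."""
    path = urlparse(uri).path  # drop any query string before guessing
    content_type, _ = mimetypes.guess_type(path)
    if content_type and content_type.startswith("image/"):
        return "image servers"
    return "content servers"


def route(uri: str) -> str:
    """Load balancing: round-robin across the members of the chosen pool."""
    return next(POOLS[choose_pool(uri)])
```

Calling `route("/img/logo.png")` repeatedly alternates between the two image servers, while `route("/index.html")` cycles through the content servers independently. Each pool can be sized and tuned on its own.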
This allows for greater optimization of the web/application server based on the usage pattern and the content type, which can often also be related to a range of sizes. This also, in a distributed environment, allows architects to leverage say cloud-based storage for static content while maintaining dynamic content (and its associated data stores) on-premise. This kind of hybrid cloud strategy has been postulated as one of the most common use cases since the first wispy edges of cloud were seen on the horizon.

Horizontal partitioning - Spreading a single type of data element across many instances, according to some partitioning key, e.g. hashing the player id and doing a modulus operation, etc. Quite often referred to as sharding.

This sub-pattern is in line with the way in which persistence-based load balancing is accomplished, as well as the handling of object caching. This also describes the way in which you might direct requests received from specific users to designated instances that are specifically designed to handle their unique needs or requirements, such as the separation of “gold” users from “free” users based on some partitioning key which in HTTP land is often a cookie containing the relevant data.

Queuing and batch - Achieve efficiencies of scale by processing batches of data, usually because the overhead of an operation is amortized across multiple requests

I admit defeat in applying this sub-pattern to application delivery. I know, you’re surprised, but this really is very specific to middleware and aside from the ability to leverage queuing for Quality of Service (QoS) at the delivery layer this one is just not fitting in well. If you have an idea how this fits, feel free to let me know – I’d love to be able to apply all the scalability patterns and sub-patterns to a broader infrastructure architecture.
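Returning briefly to horizontal partitioning, the "hash the key and take a modulus" idea can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the instance names, the `tier` cookie, and the choice of SHA-256 as a stable hash (Python's built-in `hash()` is randomized per process for strings, so it would not map a key to the same instance across restarts).

```python
import hashlib


def shard_for(key: str, instances: list[str]) -> str:
    """Hash the partitioning key and take it modulo the instance count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


def pool_for(cookies: dict[str, str]) -> str:
    """Separate "gold" users from "free" users using a hypothetical cookie."""
    return "gold" if cookies.get("tier") == "gold" else "free"
```

The same key always lands on the same instance as long as the instance list does not change, which is the property persistence-based load balancing relies on. Note that a plain modulus remaps most keys whenever the instance count changes.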

Relaxing of data constraints - Many different techniques and trade-offs with regards to the immediacy of processing / storing / access to data fall in this strategy

This one takes us to storage virtualization and tiering and the way in which data storage and access is intelligently handled in varying properties based on usage and prioritization of the content. If one relaxes the constraints around access times for certain types of data, it is possible to achieve a higher efficiency use of storage by subjugating some content to secondary and tertiary tiers which may not have the same performance attributes as your primary storage tier. And make no mistake, storage virtualization is a part of the application delivery network – has been since its inception – and as cloud computing and virtualization have grown so has the importance of a well-defined storage tiering strategy. We can bring this back up to the application layer by considering that a relaxation of data constraints with regards to immediacy of access can be applied by architecting a solution that separates data reads from writes. This implies eventual consistency, as data updated/written to one database must necessarily be replicated to the databases from which reads are, well, read, but that’s part of relaxing a data constraint. This is a technique used by many large, social sites such as Facebook and Plenty of Fish in order to scale the system to the millions upon millions of requests it handles in any given hour.

Parallelization - Work on the same task in parallel on multiple processing units

I’m not going to be able to apply this one either, unless it was in conjunction with optimizing something like MapReduce and SPDY. I’ve been thinking hard about this one, and the problem is the implication that “same task” is really the “same task”, and that processing is distributed.
That said, if the actual task can be performed by multiple processing units, then an application delivery controller could certainly be configured to recognize that a specific URL should be essentially sent to some other proxy/solution that performs the actual distribution, but the processing model here deviates sharply from the request-reply paradigm under which most applications today operate.

DEVOPS CAN MAKE THIS HAPPEN

I hate to sound-off too much on the “devops” trumpet, but one of the primary ways in which devops will be of significant value in the future is exactly in this type of practical implementation. Only by recognizing that many architectural patterns are applicable to not only application but infrastructure architecture can we start to apply a whole lot of “lessons that have already been learned” by developers and architects to emerging infrastructure architectural models. This abstraction and application from well-understood patterns in application design and architecture will be invaluable in designing the new network; the next iteration of network theory and implementation that will allow it to scale along with the applications it is delivering.

Infrastructure Scalability Pattern: Partition by Function or Type
A deeper dive on how to apply scalability patterns at the infrastructure layer.

So it’s all well and good to say that you can apply scalability patterns to infrastructure and provide a high-level overview of the theory but it’s always much nicer to provide more detail so someone can actually execute on such a strategy. Thus, today we’re going to dig a bit deeper into applying a scalability pattern – vertical partitioning, to be exact – to an application infrastructure as a means to scale out an application in a way that’s efficient and supports growth and that leverages infrastructure, i.e. the operational domain. This is the reason for the focus on “devops”; this is certainly an architectural exercise that requires an understanding of both operations and the applications it is supporting, because in order to achieve a truly scalable partitioning-based architecture it’s going to have to take into consideration the functional aspects of the application. There is a less efficient but still inherently more scalable implementation that relies on content-type and generation, and we’ll briefly examine that, but the more efficient method of scaling requires some application awareness on the part of not only the infrastructure but the implementers as well.

OPTION ONE: PARTITION by TYPE

This vertical partitioning pattern requires no changes to the application and very little knowledge of its functional aspects or performance characteristics. A simple vertical partitioning pattern leverages the difference in delivery characteristics across content types as the basis for partitioning at the infrastructure layer. In this configuration the Application Delivery Controller (ADC) becomes the “endpoint” as far as the client is concerned. The ADC virtualizes the application and mediates all requests through it. This gives it the opportunity to apply all sorts of policies – security, acceleration, etc… – including application-layer switching.
Application-layer switching allows the ADC to inspect every request and, based on its Content-Type HTTP header, direct it to an appropriate pool of resources. Generally this type of logic is encoded in the ADC either by configuring a mapping of content-types to the appropriate pool of resources, or by leveraging the ADC’s innate network-side scripting capability.

HTTP: The de facto application transport protocol of the Web
When the ISO defined the OSI model it included a transport layer which was supposed to handle end-to-end connections and address communication reliability. In the early days of the web HTTP sat at the application layer (layer 7) and rode atop TCP, its transport layer. An interesting thing happened on the way to the 21st century: HTTP became an application transport layer. Many web applications today use HTTP to transport other application protocols such as JSON and SOAP and RSS. Applications now "speak" using a variety of languages to communicate, but underlying them all is HTTP.

This is not the same as tunneling a different application through port 80 simply because almost all HTTP traffic flows through that port and it is therefore likely to be open on the corporate firewall. Those applications that simply tunnel through port 80 use TCP and their own application layer protocols; they're essentially just pretending to be HTTP by using the same port to fool firewalls into allowing their traffic to pass unhindered. No, this is different. This is the use of HTTP to wrap other application protocols and transport them. The web server interprets the HTTP and handles sessions and cookies and parameters, but another application is required to interpret the messages contained within because they represent the protocol of yet another application.

In today's world the availability of exponentially expanding collaboration and syndication applications, all requiring different applications, is driving the need for smarter application delivery solutions to ensure availability, reliability, and scalability. Simple layer 4 (TCP) load balancing is not enough, and neither is load balancing based on layer 7 (HTTP). Load balancing requests based on TCP or HTTP doesn't address the need to distribute application requests because the app is no longer HTTP, it's something else entirely.
HTTP has been relegated to the status of application transport protocol, and that means in order to intelligently deliver an application we have to dig even deeper than layer 7. We've got to get inside. The problem is, of course, that there are no standards beyond HTTP. My JSON-based Web 2.0 application looks nothing like your SOAP-based Web 2.0 application. And yet a single solution must be able to adapt to those differences and provide the same level of scalability and reliability for me as it does you. It has to be extensible. It has to provide some mechanism for adding custom behavior and addressing the specific needs of application protocols that are unknown at the time the solution is created. This is an important facet of application delivery that is often overlooked. Applications aren't about HTTP anymore, they're about undefined and unknowable protocols. An application delivery solution can't distribute application load across servers unless it can understand which application it's supposed to be managing. And because HTTP connections are artificially limited by browsers, multiple application protocols are using the same HTTP connections over which to exchange data. That means an application delivery solution has to be able to dig into the application protocol and figure out where that request should be directed, and how to treat it, and what policies to apply. Application delivery today is about the message, not the protocol, and the message is undefined until it's created by a developer. There's a lot of traffic out there that's just HTTP, as it was conceived of and implemented years ago. But there's a growing amount of traffic out there that's more than HTTP, that's relegated this ubiquitous protocol to an application transport layer protocol and uses it as such to deliver custom applications that use protocols without RFCs, without standards bodies, without the W3C. 
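As a rough illustration of what "digging into the message" could look like, here is a Python sketch that inspects an HTTP request body and chooses a pool based on the protocol it carries rather than on anything in the HTTP layer. It is only a sketch under stated assumptions: the pool names are invented, only JSON and SOAP are recognized, and a production solution would also need to guard against oversized or hostile payloads before parsing them.

```python
import json
import xml.etree.ElementTree as ET

# SOAP 1.1 and SOAP 1.2 envelope namespaces.
SOAP_NAMESPACES = (
    "http://schemas.xmlsoap.org/soap/envelope/",
    "http://www.w3.org/2003/05/soap-envelope",
)

# Hypothetical pool names, for illustration only.
POOL_BY_PROTOCOL = {"json": "json-pool", "soap": "soap-pool"}


def classify(body: bytes) -> str:
    """Guess which application protocol is riding inside the HTTP body."""
    try:
        json.loads(body)
        return "json"
    except ValueError:  # covers JSON syntax errors and undecodable bytes
        pass
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unknown"
    if any(root.tag == f"{{{ns}}}Envelope" for ns in SOAP_NAMESPACES):
        return "soap"
    return "unknown"


def choose_pool(body: bytes) -> str:
    """Route on the message; fall back to a default pool when it is unrecognized."""
    return POOL_BY_PROTOCOL.get(classify(body), "default-pool")
```

Two requests that look identical at the HTTP layer (same method, same URI) can end up in different pools this way, and supporting a new protocol means adding a classifier branch and a pool entry, which is the kind of extensibility argued for above.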
If your application delivery solution doesn't offer a way that easily allows you to dig into the real application protocols, but instead relegates you to making load balancing and routing decisions based solely on HTTP, you need to reconsider your solution. HTTP is the de facto application transport protocol today, but because it's so often used this way we have to get smarter about how we load balance and distribute those messages riding on HTTP if we want to architect smarter, greener, more efficient architectures.

Imbibing: Coffee