scalability
91 TopicsClient-side vs Server-side Load Balancing
Lei Zhu @ Digital Web Magazine has an interesting article on Client Side Load Balancing for Web 2.0 Applications. It is interesting in that it presents an alternative mechanism for implementing high-availability without the use of an intermediate load balancing solution. His solution relies solely on the client and takes advantage of the dynamic nature of Web 2.0. The problem with Lei's article is that there are a few assumptions made that are simply inaccurate. Lei contends that the negatives to using an intermediate load balancing solution are: There is a limit to the number of requests the load balancer itself can handle. However, this problem can be resolved with the combination of round-robin DNS and dedicated load balancers. The upper bounds of a hardware load balancing solution are typically higher than any given server can handle. We are talking about hundreds of thousands of requests per second. Round-robin DNS, while an industry standard, is one of the most rudimentary implementations of "load balancing" that exists. Load balancers, a.k.a. application delivery controllers, today are capable of making routing decisions based on much more intelligent factors than a simple list of servers. There is no server that can handle more requests than a load balancing solution today. Even SMB load balancers can process requests faster than any server and maintain a session table much larger than existing web and application servers. There is an extra cost related to operating a dedicated load balancer, which can run into tens of thousands of dollars. The backup load balancer generally does nothing other than wait for the primary to fail. It is absolutely true that an intermediate load balancer is going to require an investment. That the backup "does nothing" is entirely dependent upon the implementation. Load balancers today are capable of both active-standby (this is Lei's assumption here) as well as active-active configurations. In an active-active configuration both load balancers are working all the time. It is completely up to the implementor to determine which configuration should be used. Lei also claims: It is easier to make the client code and resources highly available and scalable than to do so for the servers—serving non-dynamic content requires fewer server resources. There are two statements here, one regarding the ease with which you can deploy a highly available site and that serving non-dynamic content requires fewer server resources. The latter is absolutely correct, it does indeed take less resources on the server to serve non-dynamic content. The former assertion, however, is not entirely true. Lei's article describes a methodology that requires modifications to not only the application (the client code) but also to the infrastructure (additional servers and entries in DNS to accomodate the new servers). A good application delivery controller will not require changes to the application and will be less disruptive to the infrastructure because it effectively intercepts requests for the domain and handles them without changes to the server-side of the equation. For example, if your domain is www.example.com and you implement a load balancing solution, the load balancer becomes www.example.com. Servers can be added to the pool (a.k.a. cluster, farm) of servers and it will then distribute requests amongst those servers without any further changes to the client or the servers. A good application delivery controller can distribute requests based on more than DNS round-robin algorithms, and can base its decisions in real time on the health and status of any given server. This is one the problems with Lei's arguments - he bases his decision upon a single, outdated mechanism for load balancing and uses that as the basis for recommending a client-side solution. And as far as scalability - let's look at that for a moment. In the intermediate load balancer scenario if we need more resources we (1) add another server, and (2) add it to the pool on the load balancer. In Lei's scenario we (1) add another server, (2) reconfigure the client-application, and (3) add another entry to DNS. Lei's solution is much more invasive and disruptive to every piece of the equation than the intermediate load balancing solution scenario, and Lei's solution is not guaranteed to work as he cannot ensure that the client browser's caching scheme will definitively update and include the new server. And lest we forget, Lei does not discuss the problem of persistence inherent in his client-side solution. A purely round-robin based solution that ignores client-session is limited in use and value. The solution must take into consideration the possibility that a client needs to be tied to a particular server for the duration of a session. This is particularly true of e-commerce sites. A client-side solution that does not take this into consideration (and Lei's solution does not) will not be useful in many situations. A good application delivery controller by default is capable of handling persistence and "stickiness" to a server - and again does so without requiring modification to the client or server side code. Lei's client-side load balancing solution is novel, and while it certainly might be interesting for individual developers whose livelihoods do not rely on the availability of their site this solution is not something that should be recommended lightly and without a great deal of forethought. The assumptions upon which his article is based are, with the exception the additional cost of a load balancing solution, outdated and inaccurate. Client-side load balancing requires more maintenance, more initial work to deploy, and introduces security risks by "opening the organization's kimono" and showing clients what lies beneath, namely the infrastructure architecture. Imbibing: Coffee Technorati tags: MacVittie, F5, load balancing, application delivery, scalability, high availability, Web 2.02.5KViews0likes1CommentHTML5 Web Sockets Changes the Scalability Game
#HTML5 Web Sockets are poised to completely change scalability models … again. Using Web Sockets instead of XMLHTTPRequest and AJAX polling methods will dramatically reduce the number of connections required by servers and thus has a positive impact on performance. But that reliance on a single connection also changes the scalability game, at least in terms of architecture. Here comes the (computer) science… If you aren’t familiar with what is sure to be a disruptive web technology you should be. Web Sockets, while not broadly in use (it is only a specification, and a non-stable one at that) today is getting a lot of attention based on its core precepts and model. Web Sockets Defined in the Communications section of the HTML5 specification, HTML5 Web Sockets represents the next evolution of web communications—a full-duplex, bidirectional communications channel that operates through a single socket over the Web. HTML5 Web Sockets provides a true standard that you can use to build scalable, real-time web applications. In addition, since it provides a socket that is native to the browser, it eliminates many of the problems Comet solutions are prone to. Web Sockets removes the overhead and dramatically reduces complexity. - HTML5 Web Sockets: A Quantum Leap in Scalability for the Web So far, so good. The premise upon which the improvements in scalability coming from Web Sockets are based is the elimination of HTTP headers (reduces bandwidth dramatically) and session management overhead that can be incurred by the closing and opening of TCP connections. There’s only one connection required between the client and server over which much smaller data segments can be sent without necessarily requiring a request and a response pair. That communication pattern is definitely more scalable from a performance perspective, and also has a positive impact of reducing the number of connections per client required on the server. Similar techniques have long been used in application delivery (TCP multiplexing) to achieve the same results – a more scalable application. So far, so good. Where the scalability model ends up having a significant impact on infrastructure and architectures is the longevity of that single connection: Unlike regular HTTP traffic, which uses a request/response protocol, WebSocket connections can remain open for a long time. - How HTML5 Web Sockets Interact With Proxy Servers This single, persistent connection combined with a lot of, shall we say, interesting commentary on the interaction with intermediate proxies such as load balancers. But ignoring that for the nonce, let’s focus on the “remain open for a long time.” A given application instance has a limit on the number of concurrent connections it can theoretically and operationally manage before it reaches the threshold at which performance begins to dramatically degrade. That’s the price paid for TCP session management in general by every device and server that manages TCP-based connections. But Lori, you’re thinking, HTTP 1.1 connections are persistent, too. In fact, you don’t even have to tell an HTTP 1.1 server to keep-alive the connection! This really isn’t a big change. Whoa there hoss, yes it is. While you’d be right in that HTTP connections are also persistent, they generally have very short connection timeout settings. For example, the default connection timeout for Apache 2.0 is 15 seconds and for Apache 2.2 a mere 5 seconds. A well-tuned web server, in fact, will have thresholds that closely match the interaction patterns of the application it is hosting. This is because it’s a recognized truism that long and often idle connections tie up server processes or threads that negatively impact overall capacity and performance. Thus the introduction of connections that remain open for a long time changes the capacity of the server and introduces potential performance issues when that same server is also tasked with managing other short-lived, connection-oriented requests. Why this Changes the Game… One of the most common inhibitors of scale and high-performance for web applications today is the deployment of both near-real-time communication functions (AJAX) and traditional web content functions on the same server. That’s because web servers do not support a per-application HTTP profile. That is to say, the configuration for a web server is global; every communication exchange uses the same configuration values such as connection timeouts. That means configuring the web server for exchanges that would benefit from a longer time out end up with a lot of hanging connections doing absolutely nothing because they were used to grab standard dynamic or static content and then ignored. Conversely, configuring for quick bursts of requests necessarily sets timeout values too low for near or real-time exchanges and can cause performance issues as a client continually opens and re-opens connections. Remember, an idle connection is a drain on resources that directly impacts the performance and capacity of applications. So it’s a Very Bad Thing™. One of the solutions to this somewhat frustrating conundrum, made more feasible by the advent of cloud computing and virtualization, is to deploy specialized servers in a scalability domain-based architecture using infrastructure scalability patterns. Another approach to ensuring scalability is to offload responsibility for performance and connection management to an appropriately capable intermediary. Now, one would hope that a web server implementing support for both HTTP and Web Sockets would support separately configurable values for communication settings on at least the protocol level. Today there are very few web servers that support both HTTP and Web Sockets. It’s a nascent and still evolving standard so many of the servers are “pure” Web Sockets servers, many implemented in familiar scripting languages like PHP and Python. Which means two separate sets of servers that must be managed and scaled. Which should sound a lot like … specialized servers in a scalability domain-based architecture. The more things change, the more they stay the same. The second impact on scalability architectures centers on the premise that Web Sockets keep one connection open over which message bits can be exchanged. This ties up resources, but it also requires that clients maintain a connection to a specific server instance. This means infrastructure (like load balancers and web/application servers) will need to support persistence (not the same as persistent, you can read about the difference here if you’re so inclined). That’s because once connected to a Web Socket service the performance benefits are only realized if you stay connected to that same service. If you don’t and end up opening a second (or Heaven-forbid a third or more) connection, the first connection may remain open until it times out. Given that the premise of the Web Socket is to stay open – even through potentially longer idle intervals – it may remain open, with no client, until the configured time out. That means completely useless resources tied up by … nothing. Persistence-based load balancing is a common feature of next-generation load balancers (application delivery controllers) and even most cloud-based load balancing services. It is also commonly implemented in application server clustering offerings, where you’ll find it called server-affinity. It is worth noting that persistence-based load balancing is not without its own set of gotchas when it comes to performance and capacity. THE ANSWER: ARCHITECTURE The reason that these two ramifications of Web Sockets impacts the scalability game is it requires an broader architectural approach to scalability. It can’t necessarily be achieved simply by duplicating services and distributing the load across them. Persistence requires collaboration with the load distribution mechanism and there are protocol-based security constraints with respect to incorporating even intra-domain content in a single page/application. While these security constraints are addressable through configuration, the same caveats with regards to the lack of granularity in configuration at the infrastructure (web/application server) layer must be made. Careful consideration of what may be accidentally allowed and/or disallowed is necessary to prevent unintended consequences. And that’s not even starting to consider the potential use of Web Sockets as an attack vector, particularly in the realm of DDoS. The long-lived nature of a Web Socket connection is bound to be exploited at some point in the future, which will engender another round of evaluating how to best address application-layer DDoS attacks. A service-focused, distributed (and collaborative) approach to scalability is likely to garner the highest levels of success when employing Web Socket-based functionality within a broader web application, as opposed to the popular cookie-cutter cloning approach made exceedingly easy by virtualization. Infrastructure Scalability Pattern: Partition by Function or Type Infrastructure Scalability Pattern: Sharding Sessions Amazon Makes the Cloud Sticky Load Balancing Fu: Beware the Algorithm and Sticky Sessions Et Tu, Browser? Forget Hyper-Scale. Think Hyper-Local Scale. Infrastructure Scalability Pattern: Sharding Streams Infrastructure Architecture: Whitelisting with JSON and API Keys Does This Application Make My Browser Look Fat? HTTP Now Serving … Everything1.2KViews0likes5CommentsWILS: Load Balancing and Ephemeral Port Exhaustion
Understanding the relationship between SNAT and connection limitations in full proxy intermediaries. If you’ve previously delved into the world of SNAT (which is becoming increasingly important in large-scale implementations, such as those in the service provider world) you remember that SNAT essentially provides an IP address from which a full-proxy intermediary can communicate with server-side resources and maintain control over the return routing path. There is an interesting relationship between intermediaries that leverage two separate TCP stacks (such as full-proxies) and SNAT in terms of concurrent (open) connections that can be supported by any given “virtual” server (or virtual IP address, as they are often referred to in the industry). The number of ephemeral ports that can be used by any client IP address is 65535. Programmer types will recognize that as a natural limitation imposed by the use of an unsigned short integer (16 bits) in many programming languages. Now, what that means is that for each SNAT address assigned to a virtual IP address, a theoretical total of 65535 connections can be open at any other single address at any given time. This is because in a full-proxy architecture the intermediary is acting as a client and while servers use well-known ports for communication, clients do not. They use ephemeral (temporary) ports, the value of which is communicated to the server in the source port field in the request. Each additional SNAT address available increases the total number of connections by some portion of that space. As you should never use ephemeral ports in the privileged range (port numbers under 1024 are traditionally reserved for firewall and other sanity checkers - see /etc/services on any Unix box) that number can be as many as 64512 available ports between the SNAT address and any other IP address. For example, if a server pool (virtual or iron) has 24 members and assuming the SNAT address is configured to use ephemeral ports in the range of 1024-65535, then a single SNAT address results in a total of 24 x 48k = 1,152k concurrent connections to the pool. If the SNAT is assigned to a virtual server that is targeting a single address (like another virtual server or another intermediate device) then the total connections is 1 x 48k = 48k connections. Obviously this has a rather profound impact on scalability and capacity planning. If you only have one SNAT address available and you need the capabilities of a full-proxy (such as payload inspection inbound and out) you can only support a limited number of connections (and by extension, users). Some solutions provide the means by which these limitations can be mitigated, such as the ability to configure a SNAT pool (a set of dedicated IP addresses) from which SNAT addresses can be automatically pulled and used to automatically increase the number of available ephemeral ports. Running out of ephemeral ports is known as “ephemeral port exhaustion” as you have exhausted the ports available from which a connection to the server resource can be made. In practice the number of ephemeral ports available for any given IP address can be limited by operating system implementations and is always much lower than the 65535 available per IP address. For example, the IANA official suggestion is that ephemeral ports use 49152 through 65535, which means a limitation of 16383 open connections per address. Any full-proxy intermediary that has adopted this suggestion would necessarily require more SNAT addresses to scale an application to more concurrent connections. One of the advantages of a solution implementing a custom TCP/IP stack, then, is that they can ignore the suggestion on ephemeral port assignment typically imposed at the operating system or underlying software layer and increase the range to the full 65535 if desired. Another major advantage is making aggressive use of TIME-WAIT recycling. Normal TCP stacks hold on to the ephemeral port for seconds to minutes after a connection closes. This leads to odd bursting behavior. With proper use of TCP timestamps you can recycle that ephemeral port almost immediately. Regardless, it is an important relationship to remember, especially if it appears that the Load balancer (intermediary) is suddenly the bottleneck when demand increases. It may be that you don’t have enough IP addresses and thus ports available to handle the load. WILS: Write It Like Seth. Seth Godin always gets his point across with brevity and wit. WILS is an ATTEMPT TO BE concise about application delivery TOPICS AND just get straight to the point. NO DILLY DALLYING AROUND. Related Posts All WILS Topics on DevCentral Server Virtualization versus Server Virtualization1KViews0likes0Comments6 Reasons You Need an Application Delivery Controller Now
Application delivery controllers, and load balancing in general, are often seen as solutions waiting for a problem to solve. We know what those problems are, but until we experience them we often don't feel a sense of urgency in acquiring and deploying an application delivery controller. While it's certainly true that an application delivery controller can solve many problems that arise, it's also true that there are benefits to acquiring and deploying an application delivery controller before it becomes absolutely necessary in order to save your application, your site, or your job. So here are six good reasons to consider deploying an application delivery controller now rather than waiting until the next emergency. 6. Efficiency An application delivery controller (ADC) can improve the efficiency of the servers for which it manages application requests. By offloading compute intensive processing like SSL or TCP/IP connection management an ADC reduces the overhead associated with assembling and serving responses to application requests and makes better use of the resources (RAM, CPU, I/O) on each server. Making your infrastructure more efficient is also a great way to "go green". 5. Performance The performance of your applications can be improved dramatically through the deployment of an ADC. Whether it's because of compression, caching, protocol optimizations, connection management or intelligent load balancing algorithms, an ADC improves the overall performance of your applications. 4. Reliability If you rely on applications for business processes or as a revenue stream, the last thing you want is for those application to be unavailable. An ADC provides reliability by ensuring that requests are sent only to available servers, redirecting requests when a server is down for maintenance or finally hit the wall and died. If you're large enough to have two data centers, an ADC with global load balancing capabilities furthers assurance of reliability by redirecting requests from the primary data center to a secondary in the event of a disaster - whether that's a natural disaster (earthquake, fire, flood) or man-made (oops - was that our DS3 I just ripped out?). 3. Security We're not talking about advanced security options like web application firewalls or secure remote access products such as an SSL VPN, we're just talking basic security here. DDoS protection, rate limiting, blacklisting, whitelisting, authentication, resource obfuscation, SSL, content encryption - the bare minimum security you need to protect your applications and the servers on which they're deployed. An ADC provides the core security functions you need to ensure your site is safe. 2. Capacity Capacity is about how much throughput, how many requests, how many users you can support. It's nearly impossible to support thousands of concurrent users with a single server, unless it's one really really really big server. You need more than one server, and in order to architect a solution that uses a pool of servers you need something to mediate and direct those requests - to balance the load across those servers. That means you need an ADC, because the core purpose of an ADC is to perform load balancing and ensure that you can serve everyone who wants to be served. 1. Scalability Scaling up to meet demand is difficult, doing so without re-architecting your infrastructure or scheduling down-time is even more difficult. By including an ADC in your architecture from the very beginning, the process becomes a simple one. Add a new server, add it to the ADC and voila! You've just scaled up and can instantly support more users and more requests without requiring downtime or moving around network cables. Imbibing: Mountain Dew999Views0likes0CommentsF5 Friday: Gracefully Scaling Down
What goes up, must come down. The question is how much it hurts (the user). An oft ignored side of elasticity is scaling down. Everyone associates scaling out/up with elasticity of cloud computing but the other side of the coin is just as important, maybe more so. After all, what goes up must come down. The trick is to scale down gracefully, i.e. to do it in such a way as to prevent the disruption of service to existing users while simultaneously trying to scale back down after a spike in demand. The ramifications of not scaling down are real in terms of utilization and therefore cost. Scaling up with the means to scale back down means higher costs, and simply shutting down an instance that is currently in use can result in angry users as service is disrupted. What’s necessary is to be able to gracefully scale down; to indicate somehow to the load balancing solution that a particular instance is no longer necessary and begin preparation for eventually shutting it down. Doing so gracefully requires that you are somehow able to quiesce or bleed off the connections. You want to continue to service those users who are currently connected to the instance while not accepting any new connections. This is one of the benefits of leveraging an application-aware application delivery controller versus a simple Load balancer: the ability to receive instruction in-process to begin preparation for shut down without interrupting existing connections. SERVING UP ACTIONABLE DATA BIG-IP users have always had the ability to specify whether disabling a particular “node” or “member” results in the rejection of all connections (including existing ones) or if it results in refusing new connections while allowing old ones to continue to completion. The latter technique is often used in preparation for maintenance on a particular server for applications (and businesses) that are sensitive to downtime. This method maintains availability while accommodating necessary maintenance. In version 10.2 of the core BIG-IP platform a new option was introduced that more easily enables the process of draining a server/application’s connections in preparation for being taken offline. Whether the purpose is maintenance or simply the scaling down side of elastic scalability is really irrelevant; the process is much the same. Being able to direct a load balancing service in the way in which connections are handled from the application is an increasingly important capability, especially in a public cloud computing environment because you are unlikely to have the direct access to the load balancing system necessary to manually engage this process. By providing the means by which an application can not only report but direct the load balancing service, some measure of customer control over the deployment environment is re-established without introducing the complexity of requiring the provider to manage the thousands (or more) credentials that would otherwise be required to allow this level of control over the load balancer’s behavior. HOW IT WORKS For specific types of monitors in LTM (Local Traffic Manager) – HTTP, HTTPS, TCP, and UDP – there is a new option called “Receive Disable String.” This “string” is just that, a string that is found within the content returned from the application as a result of the health check. In phase one we have three instances of an application (physical or virtual, doesn’t matter) that are all active. They all have active connections and are all receiving new connections. In phase two a health check on one server returns a response that includes the string “DISABLE ME.” BIG-IP sees this and, because of its configuration, knows that this means the instance of the application needs to gracefully go offline. LTM therefore continues to direct existing connections (sessions) with that instance to the right application (phase 3), but subsequently directs all new connection requests to the other instances in the pool (farm, cluster). When there are no more existing connections the instance can be taken offline or shut down with zero impact to users. The combination of “receive string” and “receive disable string” impacts the way in which BIG-IP interprets the instruction. A “receive string” typically describes the content received that indicates an available and properly executing application. This can be as simple as “HTTP 200 OK” or as complex as looking for a specific string in the response. Similarly the “receive disable” string indicates a particular string of text that indicates a desire to disable the node and begin the process of bleeding off connections. This could be as simple as “DISABLE” as indicated in the above diagram or it could just as easily be based solely on HTTP status codes. If an application instance starts returning 50x errors because it’s at capacity, the load balancing policy might include a live disable of the instance to allow it time to cool down – maintaining existing connections while not allowing new ones. Because action is based on matching a specific string, the possibilities are pretty much wide open. The following table describes the possible interactions between the two receive string types: LEVERAGING as a PROVIDER One of the ways in which a provider could leverage this functionality to provide differentiated value-added cloud services (as Randy Bias calls them) would be to define an application health monitoring API of sorts that allows customers to add to their application a specific set of URIs that are used solely for monitoring and can thus control the behavior of the load balancer without requiring per-customer access to the infrastructure itself. That’s a win-win, by the way. The customer gets control but so does the provider. Consider an health monitoring API that is a single URI: http://$APPLICATION_INSTANCE_HOSTNAME/health/check. Now provide a set of three options for customers to return (these are likely oversimplified for illustration purposes, but not by much): ENABLE QUIESCE DISABLE For all application instances the BIG-IP will automatically use an HTTP-derived monitor that calls $APP_INSTANCE/health/check and examines the result. The monitor would use “ENABLE” as the “receive string” and “QUIESCE” as the “receive disable” string. Based on the string returned by the application, the BIG-IP takes the appropriate action (as defined by the table above). Of course this can also easily be accomplished by providing a button on the cloud management interface to do the same via iControl, but this option is more able to be programmatically defined by customers and thus is more dynamic and allows for automation. And of course such an implementation isn’t relegated only to service providers; IT organizations in any environment can take advantage of such an implementation, especially if they’re working toward an automated data center and/or self-service provisioning/management of IT services. That is infrastructure as a service. Yes, this means modification to the application being deployed. No, I don’t think that’s a problem – cloud and Infrastructure as a Service (IaaS), at least real IaaS is going to necessarily require modifications to existing applications and new applications will need to include this type of integration in the future if we are to take advantage of the benefits afforded by a more application aware infrastructure and, conversely, a more infrastructure-aware application architecture. Related Posts999Views0likes1CommentWAN Optimization is not Application Acceleration
Increasingly WAN optimization solutions are adopting the application acceleration moniker, implying a focus that just does not exist. WAN optimization solutions are designed to improve the performance of the network, not applications, and while the former does beget improvements of the latter, true application acceleration solutions offer greater opportunity for improving efficiency and end-user experience as well as aiding in consolidation efforts that result in a reduction in operating and capital expenditure costs. WAN Optimization solutions are, as their title implies, focused on the WAN; on the network. It is their task to improve the utilization of bandwidth, arrest the effects of network congestion, and apply quality of service policies to speed delivery of critical application data by respecting application prioritization. WAN Optimization solutions achieve these goals primarily through the use of data de-duplication techniques. These techniques require a pair of devices as the technology is most often based on a replacement algorithm that seeks out common blocks of data and replaces them with a smaller representative tag or indicator that is interpreted by the paired device such that it can reinsert the common block of data before passing it on to the receiver. The base techniques used by WAN optimization are thus highly effective in scenarios in which large files are transferred back and forth over a connection by one or many people, as large chunks of data are often repeated and the de-duplication process significantly reduces the amount of data traversing the WAN and thus improves performance. Most WAN optimization solutions specifically implement “application” level acceleration for protocols aimed at the transfer of files such as CIFS and SAMBA. But WAN optimization solutions do very little to aid in the improvement of application performance when the data being exchanged is highly volatile and already transferred in small chunks. Web applications today are highly dynamic and personalized, making it less likely that a WAN optimization solution will find chunks of duplicated data large enough to make the overhead of the replacement process beneficial to application performance. In fact, the process of examining small chunks of data for potential duplicated chunks can introduce additional latency that actual degrades performance, much in the same way compression of small chunks of data can be detrimental to application performance. Too, WAN optimization solutions require deployment in pairs which results in what little benefits these solutions offer for web applications being enjoyed only by end-users in a location served by a “remote” device. Customers, partners, and roaming employees will not see improvements in performance because they are not served by a “remote” device. Application acceleration solutions, however, are not constrained by such limitations. Application acceleration solutions act at the higher layers of the stack, from TCP to HTTP, and attempt to improve performance through the optimization of protocols and the applications themselves. The optimizations of TCP, for example, reduce the overhead associated with TCP session management on servers and improve the capacity and performance of the actual application which in turn results in improved response times. The understanding of HTTP and both the browser and server allows application acceleration solutions to employ techniques that leverage cached data and industry standard compression to reduce the amount of data transferred without requiring a “remote” device. Application acceleration solutions are generally asymmetric, with some few also offering a symmetric mode. The former ensures that regardless of the location of the user, partner, or employee that some form of acceleration will provide a better end-user experience while the latter employs more traditional WAN optimization-like functionality to increase the improvements for clients served by a “remote” device. Regardless of the mode, application acceleration solutions improve the efficiency of servers and applications which results in higher capacities and can aid in consolidation efforts (fewer servers are required to serve the same user base with better performance) or simply lengthens the time available before additional investment in servers – and the associated licensing and management costs – must be made. Both WAN optimization and application acceleration aim to improve application performance, but they are not the same solutions nor do they even focus on the same types of applications. It is important to understand the type of application you want to accelerate before choosing a solution. If you are primarily concerned with office productivity applications and the exchange of large files (including backups, virtual images, etc…) between offices, then certainly WAN optimization solutions will provide greater benefits than application acceleration. If you’re concerned primarily about web application performance then application acceleration solutions will offer the greatest boost in performance and efficiency gains. But do not confuse WAN optimization with application acceleration. There is a reason WAN optimization-focused providers have recently begun to partner with application acceleration and application delivery providers – because there is a marked difference between the two types of solutions and a single offering that combines them both is not (yet) available.946Views0likes2CommentsBack to Basics: Load balancing Virtualized Applications
#virtualization load balancing in a virtualized world is the same as it ever was, but different. The introduction of virtualization and cloud computing to data centers has been heralded as “transformational” and “disruptive” and “game changing.” From an operational IT perspective, that’s absolutely true. But like transformational innovation in other industries, such disruption is often not in how the core solution is leveraged or used, but how it impacts operations and the broader ecosystem, rather than the individual tasked with using the solution. The transformation of the auto-industry, for example, toward alternative fuel-sourced vehicles is disruptive and changes much about the industry. But it doesn’t change the way you drive a car; it still works on the same principles and the skills you’ve learned driving gas powered cars are still applicable to alternative fuel-source cars. What changes for the operator – just as within IT - is there may be new concerns with which you must contend. Load balancing virtualized applications is in this category. While the core principles you’ve always applied to load balancing applications still applies, there are a few additional concerns that arise from the use of virtualization that you’re going to have to take into consideration. LOAD BALANCING 101 REFRESH Let’s remember quickly how load balancing traditional applications works, shall we? The load balancing service presents to the end-user a single endpoint, i.e. “the application”. Users communicate exclusively with that endpoint. The load balancing service communicates with a pool of resources comprised of one or more application instances. It is by adding instances to the pool that an application is able to scale horizontally to meet demand. In the most common traditional load balancing environment, each application instance is hosted on a single, physical server. The availability of the “application” is maintained by insuring there are always enough instances (nodes) available to compensate for any failures that might occur at the physical server, operating system, platform, or application layers. Load balancing services also allow for the designation of “back up” nodes. Each node in a pool may have a back up node that is only activated in the event of a failure. This is used primarily for high-availability purposes to ensure continuous application availability rather than for scaling purposes. Now, when we replace the physical servers with virtual servers, we have pretty much the same system. There still exists a pool of resources that comprise “the application”, the load balancing service still mediates for the end-user, and there are still enough application instances in the pool to compensate for failure, thus ensuring availability of “the application.” However, there are some new potential sources of failure that must be addressed that impact the topology – the physical placement – of the application instances in the pool. TWO RULES for LOAD BALANCING VIRTUALIZED APPLICATIONS One of the most important changes coming from virtualization that must be addressed is fault isolation. Assume for a moment that we took all four physical nodes and consolidated them on a single, physical virtualized platform. In theory, nothing changes. The load balancing service views a “node” as a unique combination of IP address and TCP port, and whether that’s hosted on a virtual platform or a physical server is irrelevant to the load balancing service. The load balancing algorithms still work the same way, nodes are selected as directed by configured policies, backup nodes are still used to ensure continuous availability, and nothing about the way in which load balancing works changes. But it’s very relevant to operations because this type of server-consolidated deployment model introduces higher unrecoverable failure scenarios and it will directly impact the performance (in a bad way) of “the application.” There are a couple operational axioms at work here: 1. Shared infrastructure (network, compute, storage) means shared risk. 2. As load increases, performance decreases. Let’s say “Node 1” fails. In both the physical and virtual deployments, the load is simply shifted to the remaining active nodes. No problem. But what if the network connectivity between the load balancing service and “Node 1” fails? In a physical deployment, no problem – each node has its own physical connection and is unlikely to impact the other nodes. But what about the virtual deployment? Each node has its own virtual network connection, certainly, but does it have its own physical network connection or is it shared? If it’s a shared physical connection and it fails, then all nodes will fail – leaving “the application” unavailable. Load Balancing Virtualized Applications Rule #1: Team and Trunk. Physical network redundancy is a must. Modern server platforms are generally enabled with at least 2 if not 4 GBE connections, use them. So now you’ve got your network topology designed to ensure that a physical failure will not take out every application instance on the server. Next you need to consider how the application instances are isolated and deployed to ensure that a failure at the hypervisor layer does not disrupt all application instances. Consider that there are two possible reasons you are implementing load balancing: scalability and availability. In the former, you’re trying to ensure supply meets demand. In the latter, you’re trying to mitigate potential failure in a way to ensure “the application” is always available, regardless of failure. If there is a failure at the hypervisor layer, all instances relying on that hypervisor will be impacted (and not in a good way). Regardless of why you’re implementing load balancing, the result of such a failure is the same, instances are unavailable. Similarly, if the physical device on which virtualized applications are deployed fails, every instance on that device will be down. In both cases, if all your virtual eggs are in one basket and there’s a failure at the hypervisor layer, you’re in trouble. Load Balancing Virtualized Applications Rule #2: Divide and Conquer. Application instance redundancy is a must. Never put all your application instances on a single virtualized or physical platform. Spread them across at least two, to isolate potential failures in the virtualization layer or at the physical server layer. Node backups should always be located on physically separate devices. Load balancing services are adept at discerning failure but they are not necessarily able to determine the source. A failure to communicate with an application instance could be caused by a bad cable, a failed port, an unresponsive network stack, or an application error. The load balancing service knows the application instance is down, but not necessarily why it’s down. If it’s a crashed instance, then failing over to a back up instance on the same server is probably going to work out fine. But if the root cause is a failed port or bad cable, failing over to a backup instance on the same server isn’t going to help – because it is down too. It is imperative to ensure availability that there are always at least two of everything – and that means physical devices, as well. Never put all your eggs in one basket – at any layer. THE PERFORMANCE IMPACT Aside from general availability issues, there is also the very real possibility that where you deploy virtualized application instances will impact performance of “the application.” Remember that even though you can designate CPU and memory on a per application instance, they still ultimately shared I/O – both storage and network. That means even if you use rate limiting technologies to try to manage bandwidth consumption as a means to reduce congestion or latency, ultimately you’re impacting performance. If you don’t use rate limiting or other bandwidth-focused solutions to manage the shared network resource, you run the risk of congestion and increasing latency on the wire. Similarly, shared storage is even more problematic because when you trace I/O down through the system, you end up at a single, shared I/O controller that is going to have some serious limitations on it. I/O intense application instances deployed on the same physical device are going to cause contention in the underlying system, which is going to negatively impact performance. Again, divide and conquer. Disperse such instances across two (or more) physical servers. The number of servers will depend on the overall scale of the application and the resource consumption rate. Load balancing will be able to assist in maintaining performance across instances if you take advantage of a response-time aware algorithm such as fastest response time (the assumption is that response time correlates directly to load and in most cases, this is true). This keeps any given instance from becoming overwhelmed. Ultimately, what this means is that you have to be a little more aware of physical deployment location for application instances than you did with pure physical deployments. Consolidation is a great way to reduce operational and capital expenditures, but it also means consolidating risk. LOCATION MATTERS This is a particularly tough nut to crack especially when combined with the desire to implement auto-scaling operations in a more cloud-like environment. The idea that you can leverage “whatever idle resources” you can find to scale out applications on-demand is powerful, but it’s also potentially fraught with risk if you’re unable to control placement at all. While the possibility that every instance would end up deployed on a single server or even a select handful of servers is minimal, there is the possibility that multiple instances could be deployed in a way that means a single server failure could eliminate a sizeable number of application instances, resulting in an unacceptable degradation of performance or even downtime for some percentage of users. In the end, location really does matter when it comes to load balancing virtualized applications. Where they are deployed and in what groupings becomes a critical factor for maintaining performance and availability. The tendency to increase VM density is high, but that tendency can lead to highly disruptive situations in the event of a failed component. Be aware that cost savings from mass-consolidation and “high efficiency” through increasing VM density metrics may look good now, but may not look so good through the lens of hindsight. Digital is Different The Cost of Ignoring ‘Non-Human’ Visitors Cloud Bursting: Gateway Drug for Hybrid Cloud The HTTP 2.0 War has Just Begun Why Layer 7 Load Balancing Doesn’t Suck Network versus Application Layer Prioritization Complexity Drives Consolidation Performance in the Cloud: Business Jitter is Bad934Views0likes1CommentArchitecting Scalable Infrastructures: CPS versus DPS
#webperf As we continue to find new ways to make connections more efficient, capacity planning must look to other metrics to ensure scalability without compromising performance. Infrastructure metrics have always been focused on speeds and feeds. Throughput, packets per second, connections per second, etc… These metrics have been used to evaluate and compare network infrastructure for years, ultimately being used as a critical component in data center design. This makes sense. After all, it's not rocket science to figure out that a firewall capable of handling 10,000 connections per second (CPS) will overwhelm a next hop (load balancer, A/V scanner, etc… ) device only capable of 5,000 CPS. Or will it? The problem with old skool performance metrics is they focus on ingress, not egress capacity. With SDN pushing a new focus on both northbound and southbound capabilities, it makes sense to revisit the metrics upon which we evaluate infrastructure and design data centers. CONNECTIONS versus DECISIONS As we've progressed from focusing on packets to sessions, from IP addresses to users, from servers to applications, we've necessarily seen an evolution in the intelligence of network components. It's not just application delivery that's gotten smarter, it's everything. Security, access control, bandwidth management, even routing (think NAC), has become much more intelligent. But that intelligence comes at a price: processing. That processing turns into latency as each device takes a certain amount of time to inspect, evaluate and ultimate decide what to do with the data. And therein lies the key to our conundrum: it makes a decision. That decision might be routing based or security based or even logging based. What the decision is is not as important as the fact that it must be made. SDN necessarily brings this key differentiator between legacy and next-generation infrastructure to the fore, as it's just software-defined but software-deciding networking. When a switch doesn't know what to do with a packet in SDN it asks the controller, which evaluates and makes a decision. The capacity of SDN – and of any modern infrastructure – is at least partially determined by how fast it can make decisions. Examples of decisions: URI-based routing (load balancers, application delivery controllers) Virus-scanning SPAM scanning Traffic anomaly scanning (IPS/IDS) SQLi / XSS inspection (web application firewalls) SYN flood protection (firewalls) BYOD policy enforcement (access control systems) Content scrubbing (web application firewalls) The DPS capacity of a system is not the same as its connection capacity, which is merely the measure of how many new connections a second can be established (and in many cases how many connections can be simultaneously sustained). Such a measure is merely determining how optimized the networking stack of any given solution might be, as connections – whether TCP or UDP or SMTP – are protocol oriented and it is the networking stack that determines how well connections are managed. The CPS rate of any given device tells us nothing about how well it will actually perform its appointed tasks. That's what the Decisions Per Second (DPS) metric tells us. CONSIDERING BOTH CPS and DPS Reality is that most systems will have a higher CPS compared to its DPS. That's not necessarily bad, as evaluating data as it flows through a device requires processing, and processing necessarily takes time. Using both CPS and DPS merely recognizes this truth and forces it to the fore, where it can be used to better design the network. A combined metric helps design the network by offering insight into the real capacity of a given device, rather than a marketing capacity. When we look only at CPS, for example, we might feel perfectly comfortable with a topological design with a flow of similar CPS capacities. But what we really want is to make sure that DPS –> CPS (and vice-versa) capabilities were matched up correctly, lest we introduce more latency than is necessary into a given flow. What we don't want is to end up with is a device with a high DPS rate feeding into a device with a lower CPS rate. We also don't want to design a flow in which DPS rates successively decline. Doing so means we're adding more and more latency into the equation. The DPS rate is a much better indicator of capacity than CPS for designing high-performance networks because it is a realistic measure of performance, and yet a high DPS coupled with a low CPS would be disastrous. Luckily, it is almost always the case that a mismatch in CPS and DPS will favor CPS, with DPS being the lower of the two metrics in almost all cases. What we want to see is as close a CPS:DPS ratio as possible. The ideal is 1:1, of course, but given the nature of inspecting data it is unrealistic to expect such a tight ratio. Still, if the ratio becomes too high, it indicates a potential bottleneck in the network that must be addressed. For example, assume an extreme case of a CPS:DPS of 2:1. The device can establish 10,000 CPS, but only process at a rate of 5,000 DPS, leading to increasing latency or other undesirable performance issues as connections queue up waiting to be processed. Obviously there's more at play than just new CPS and DPS (concurrent connection capability is also a factor) but the new CPS and DPS relationship is a good general indicator of potential issues. Knowing the DPS of a device enables architects to properly scale out the infrastructure to remediate potential bottlenecks. This is particularly true when TCP multiplexing is in play, because it necessarily reduces CPS to the target systems but in no way impacts the DPS. On the ingress, too, are emerging protocols like SPDY that make more efficient use of TCP connections, making CPS an unreliable measure of capacity, especially if DPS is significantly lower than the CPS rating of the system. Relying upon CPS alone – particularly when using TCP connection management technologies - as a means to achieve scalability can negatively impact performance. Testing systems to understand their DPS rate is paramount to designing a scalable infrastructure with consistent performance. The Need for (HTML5) Speed SPDY versus HTML5 WebSockets Y U No Support SPDY Yet? Curing the Cloud Performance Arrhythmia F5 Friday: Performance, Throughput and DPS Data Center Feng Shui: Architecting for Predictable Performance On Cloud, Integration and Performance902Views0likes0CommentsWhy Layer 7 Load Balancing Doesn’t Suck
Alternative title: Didn’t We Resolve This One 10 Years Ago? There’s always been a bit of a disconnect between traditional network-focused ops and more modern, application-focused ops when it comes to infrastructure architecture. The arguments for an against layer 7 (application) load balancing first appeared nearly a decade ago when the concept was introduced. These arguments were being tossed around at the same time we were all arguing for or against various QoS (Quality of Service) technologies, namely the now infamous “rate shaping” versus “queuing” debate. If you missed that one, well, imagine an argument similar to that today of “public” versus “private” clouds. Same goals, different focus on how to get there. Seeing this one crop up again is not really any more surprising than seeing some of the other old debates bubbling to the surface. cloud computing and virtualization have brought network infrastructure and its capabilities, advantages and disadvantages – as well as its role in architecting a dynamic data center – to the fore again. But seriously? You’d think the arguments would have evolved in the past ten years. While load balancing hardware marketing execs get very excited about the fact that their product can magically scale your application by using amazing Layer 7 technology in the Load balancer such as cookie inserts and tracking/re-writing. What they fail to mention is that any application that requires the load balancer to keep track of session related information within the communications stream can never ever be scalable or reliable. -- Why Layer 7 load balancing sucks… I myself have never shied away from mentioning the session management capabilities of a full-proxy application delivery controller (ADC) and, in fact, I have spent much time advocating on behalf of its benefits to architectural flexibility, scalability, and security. Second, argument by selective observation is a poor basis upon which to make any argument, particularly this one. While persistence-based load balancing is indeed one of the scenarios in which an advanced application delivery controller is often used (and in some environments, such as scaling out VDI deployments, is a requirement) this is a fairly rudimentary task assigned to such devices. The use of layer 7 routing, aka page routing in some circles (such as Facebook), is a highly desirable capability to have at your disposable. It enables more flexible, scalable architectures by creating scalability domains based on application-specific functions. There are a plethora of architectural patterns that leverage the use of an intelligent, application-aware intermediary (i.e. layer 7 load balancing capable ADC). Here’s a few I’ve written myself, but rest assured there are many more examples of infrastructure scalability patterns out there on the Internets: Infrastructure Architecture: Avoiding a Technical Ambush Infrastructure Scalability Pattern: Partition by Function or Type Applying Scalability Patterns to Infrastructure Architecture Infrastructure Scalability Pattern: Sharding Streams Interestingly, ten years ago this argument may have held some water. This was before full-proxy application delivery controllers were natively able to extract the necessary HTTP headers (where cookies are exchanged). That means in the past, such devices had to laboriously find and extract the cookie and its value from the textual representation of the headers, which obviously took time and processing cycles (i.e. added latency). But that’s not been the case since oh, 2004-2005 when such capabilities were recognized as being a requirement for most organizations and were moved into native processing, which reduced the impact of such extraction and evaluation to a negligible, i.e. trivial, amount of delay and effectively removed as an objection this argument. THE SECURITY FACTOR Both this technique and trivial tasks like tracking and re-writing do, as pointed out, require session tracking – TCP session tracking, to be exact. Interestingly, it turns out that this is a Very Good Idea TM from a security perspective as it provides a layer of protection against DDoS attacks at lower levels of the stack and enables an application-aware ADC to more effectively thwart application-layer DDoS attacks, such as SlowLoris and HTTP GET floods. A simple, layer 4 load balancing solution (one that ignores session tracking, as is implied by the author) can neither recognize nor defend against such attacks because they maintain no sense of state. Ultimately this means the application and/or web server is at high risk of being overwhelmed by even modest attacks, because these servers are incapable of scaling sessions at the magnitude required to sustain availability under such conditions. This is true in general of high volumes of traffic or under overwhelming loads due to processor-intense workloads. A full-proxy device mitigates many of the issues associated with concurrency and throughput simply by virtue of its dual-stacked nature. SCALABILITY As far as scalability goes, really? This is such an old and inaccurate argument (and put forth with no data, to boot) that I’m not sure it’s worth presenting a counter argument. Many thousands of customers use application delivery controllers to perform layer 7 load balancing specifically for availability assurance. Terminating sessions does require session management (see above), but to claim that this fact is a detriment to availability and business continuity shows a decided lack of understanding of failover mechanisms that provide near stateful-failover (true stateful-failover is impossible at any layer). The claim that such mechanisms require network bandwidth indicates either a purposeful ignorance with respect to modern failover mechanisms or simply failure to investigate. We’ve come a long way, baby, since then. In fact, it is nigh unto impossible for a simple layer 4 load balancing mechanism to provide the level of availability and consistency claimed, because such an architecture is incapable of monitoring the entire application (which consists of all instances of the application residing on app/web servers, regardless of form-factor). See axiom #3 (Context-Aware) in “The Three Axioms of Application Delivery”. The Case (For & Against) Network-Driven Scalability in Cloud Computing Environments The Case (For & Against) Management-Driven Scalability in Cloud Computing Environments The Case (For & Against) Application-Driven Scalability in Cloud Computing Environments Resolution to the Case (For & Against) X-Driven Scalability in Cloud Computing Environments The author’s claim seems to rest almost entirely on the argument “Google does layer 4 not layer 7” to which I say, “So?” If you’re going to tell part of the story, tell the entire story. Google also does not use a traditional three-tiered approach to application architecture, nor do its most demanding services (search) require state, nor does it lack the capacity necessary to thwart attacks (which cannot be said for most organizations on the planet). There is a big difference between modern, stateless applications (and many benefits) and traditional three-tiered application architectures. Now it may in fact be the case that regardless of architecture an application and its supporting infrastructure do not require layer 7 load balancing services. In that case, absolutely – go with layer 4. But to claim layer 7 load balancing is not scalable, or resilient, or high-performance enough when there is plenty of industry-wide evidence to prove otherwise is simply a case of not wanting to recognize it.806Views0likes0CommentsDire (Load Balancing) Straits: I Want My Client IP
load balancing fu for developers to avoid losing what is vital business data I want my client IP Now read the manuals that's the way you do it Give the IP to the SLB That ain't workin' that’s the way to do it Set the gateway on the server to your SLB (many apologies to Dire Straits) My brother called me last week with a load balancing emergency. See, for most retailers the “big day” of the year is the Friday after Thanksgiving. In Wisconsin, particularly for retailers that focus on water sports, the “big day” of the year is Memorial Day because it’s the first long weekend of the summer. My brother was up late trying to help one of those retailers prepare for the weekend and the inevitable spike in traffic to the websites of people looking for boats and water-skis and other water-related sports “things”. They’d just installed a pair of load balancers and though they worked just fine they had a huge problem on their hands: the application wasn’t getting the client IP but instead was logging the IP address of the load balancers. This is a common problem that crops up the first time an application is horizontally scaled out using a Load balancer. The business (sales and marketing) analysts need the IP address of the client in order to properly slice and dice data collected. The application must log the correct IP address or the data is virtually useless to the business. There are a couple of solutions to this problem; which one you choose will depend on the human resources you have available and whether or not you can change your network architecture. The latter would be particularly challenging for applications deployed in a cloud computing environment that are taking advantage of a load balancing solution deployed as a virtual network appliance or “softADC” along with the applications, because you only have control over the configuration of your virtual images and the virtual network appliance-hosted load balancer. THE PROBLEM The problem is that often times load balancers in small-medium sized datacenters and cloud computing environments are deployed in a flat network architecture, with the load balancer essentially in a one-armed (or side-armed) configuration. This means the load balancer needs only one interface, and there’s no complex network routing necessary because the load balancer and the servers are on the same network. It’s also often used as a transparent option for a new load balancing implementation when applications are already live and in use and disrupting service is not an option. But herein lies the source of the problem: because the load balancer and the servers are on the same network the load balancer must pretend to be the client in order to force the responses back through the load balancer (This is also true of larger and more complex configurations in which the load balancer acts as a full proxy). What happens generally is the load balancer replaces the client IP address in the network layer with its own IP address to force responses to come back to it, then rewrites the Ethernet header on the way back out. When the web server logs the transaction, the client IP address is the IP address of the load balancer, and that’s what gets written into the logs. This is also the case in larger deployments when the load balancer is a full proxy; the requests appear to come from the load balancer in order to apply the appropriate optimization and security policies on the responses, e.g. content scrubbing and TCP multiplexing. The load balancer/ADC cannot provide any kind of TCP connection optimization (such as is used to increase VM density) and application acceleration capabilities in such a mode because it never sees the responses. The result is a DSR (Direct Server Return) mode of operation, meaning that while requests traverse the load balancer, the responses don’t. This is the situation my brother found himself in (on his birthday, no less, what a dedicated guy) when he called me. It was Friday afternoon of Memorial Day weekend and the site absolutely had to be up and logging client IP addresses accurately or the client – a retailer of water-related goods - was going to be very, very, very angry. THE SOLUTION(s) You might be thinking the obvious solution to this problem is to turn on the spoofing option on the load balancer. It is, but it isn’t. Or at least it isn’t without a configuration change to the servers. See, the real problem in these scenarios is that the servers are still configured with a default gateway other than the load balancer. Enabling spoofing, i.e. the ability of the load balancer to pretend to be the client IP address, in this situation would have resulted in the web server’s responses still being routed around the load balancer because the servers, having no static route to the client IP, would automatically send them to the default gateway (the router). The solution is to change the network configuration of the web servers to use a default gateway that is the load balancer’s IP address, thus insuring that responses are routed back through the load balancer regardless of the client IP. Once that’s accomplished then you can enable SNAT and/or spoofing or “preserve client IP” or whatever the vendor refers to the functionality as and the web server logs will show the actual IP address of the client instead of the load balancer. Another option is to enable the load balancer to insert the “X-Forwarded-For” custom HTTP header. Most modern load balancers offer this as a checkbox configuration feature. Every HTTP request is modified by the load balancer to include the custom HTTP header with a value that is the client’s true IP address. It is then the application’s responsibility to extract that value instead of the default for logging and per-client personalization/authorization. The extent of the modification of the application depends on how reliant on the client IP address is. If it’s only for logging then a web/app server configuration change is likely the only change necessary. If the application makes use of that IP to apply policies or make application-specific decisions, then modification to the application itself will be necessary. THIS is WHY DEVOPS is NECESSARY As more developers take to the cloud as a deployment option for applications this very basic – but very frustrating – scenario is likely to occur more often than not as developers become more familiar with load balancers and deploy them in an IaaS environment. IaaS providers offering “auto-scaling” or “load balancing” services as a capability will have taken this problem into consideration when they architected their underlying infrastructure and thus the customer is relieved of responsibility for the nitty-gritty details. Customers deploying a separate virtualized load balancing solution – to take advantage of more robust load balancing algorithms and application delivery features like network-side scripting – will need to be aware of the impact of network configuration on the application’s ability to “see” the client IP address. Even if the customer is not concerned about the ability to extract the real client IP address it is still important to understand the impact to the return data path as well as the loss of functionality (optimization, security, acceleration) from routing around the load balancing solution on the application response. This particular knowledge falls into the responsibility demesne of what folks like James Urquhart and Damon Edwards and Shlomo Swidler (among many others) calls “devops”; developers whose skill set and knowledge expands into operations in order to successfully take advantage of cloud computing. Developers must be involved with and understand the implications of what has before been primarily an operational concern because it impacts their application and potentially impacts the ability of their applications to function properly. Development and operations must collaborate on just such concerns because it impacts the business and its ability to succeed in its functions, such as understanding the demographics of its customer base by applying GeoLocation technologies to client IP addresses. Without the real client IP address such demographic data mining fails and the business cannot make the adjustments necessary to its products/services. Business, operations, and development must collaborate to understand the requirements and then implement them in the most operationally efficient manner possible. Devops promises to enable that collaboration by tearing down the wall that has long existed between developers and operations.699Views0likes0Comments