scalability
2.5 bad ways to implement a server load balancing architecture
I'm in a bit of a mood after reading a Javaworld article on server load balancing that presents some fairly poor ideas on architectural implementations. It's not the concepts that are necessarily wrong; they will work. It's the architectures offered as a method of load balancing that made me do a double-take and say "What?" I started reading this article because it was part 2 of a series on load balancing, and this installment focused on application layer load balancing. You know, layer 7 load balancing. Something we at F5 just might know a thing or two about. But you never know where and from whom you'll learn something new, so I was eager to dive in and learn something. I learned something alright. I learned a couple of bad ways to implement a server load balancing architecture.

TWO LOAD BALANCERS?

The first indication I wasn't going to be pleased with these suggestions came with the description of a "popular" load-balancing architecture that included two load balancers: one for the transport layer (layer 4) and another for the application layer (layer 7).

"In contrast to low-level load balancing solutions, application-level server load balancing operates with application knowledge. One popular load-balancing architecture, shown in Figure 1, includes both an application-level load balancer and a transport-level load balancer."

Even the most rudimentary, entry-level load balancers on the market today - software and hardware, free and commercial - can handle both transport and application layer load balancing. There is absolutely no need to deploy two separate load balancers to handle two different layers in the stack. This is a poor architecture, introducing unnecessary management and architectural complexity as well as additional points of failure into the network architecture. It's bad for performance because it introduces additional hops and points of inspection through which application messages must flow.

To give the author credit, he does recognize this and offers up a second option to counter the negative impact of the "additional network hops":

"One way to avoid additional network hops is to make use of the HTTP redirect directive. With the help of the redirect directive, the server reroutes a client to another location. Instead of returning the requested object, the server returns a redirect response such as 303."

I found it interesting that the author cited an HTTP response code of 303, which is rarely returned in conjunction with redirects; more often a 302 is used. But it is valid, if a bit odd. That's not the real problem with this one, anyway. The author claims "The HTTP redirect approach has two weaknesses." That's true, it has two weaknesses - and a few more as well. He correctly identifies that this approach does nothing for availability and exposes the infrastructure, which is a security risk. But he fails to mention that using HTTP redirects introduces additional latency because it requires additional requests that must be made by the client (increasing network traffic), and that it is further incapable of providing any other advanced functionality at the load balancing point because it essentially turns the architecture into a variation of a DSR (direct server return) configuration.

THAT'S ONLY 2 BAD WAYS, WHERE'S THE .5?

The half bad way comes from the fact that the solutions are presented as Java-based solutions. They will work in the sense that they do what the author says they'll do, but they won't scale.
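To see why, here is a minimal sketch of what such an application-level "load balancer" amounts to: a single process, with its own connection limits, sitting in front of the pool. Everything here (backend addresses, ports, buffer sizes) is hypothetical, and a real implementation would at least use threads or an event loop, but the structural problem stays the same.

```python
# Sketch: the ".5 bad way" -- an application-level proxy that is itself just
# one server. It works, but every connection flows through this single
# process, so its connection and session limits cap the whole system.
# Backend addresses and ports are hypothetical.
import socket
from itertools import cycle

BACKENDS = cycle([("10.0.0.11", 8080), ("10.0.0.12", 8080)])

def serve(listen_port: int = 8000) -> None:
    lb = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lb.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lb.bind(("0.0.0.0", listen_port))
    lb.listen(128)                            # the proxy's own connection ceiling
    while True:
        client, _ = lb.accept()               # one connection handled at a time
        backend = socket.create_connection(next(BACKENDS))
        backend.sendall(client.recv(65536))   # relay the request...
        client.sendall(backend.recv(65536))   # ...and the response, blocking throughout
        backend.close()
        client.close()

if __name__ == "__main__":
    serve()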
Consider this: the reason you're implementing load balancing is to scale, because one server can't handle the load. A solution that involves putting a single server - with the same limitations on connections and session tables - in front of two servers with essentially twice the capacity of the load balancer gains you nothing. The single server may be able to handle 1.5 times (if you're lucky) what the servers serving applications may be capable of, due to the fact that the burden of processing application requests has been offloaded to the application servers, but you're still limited in the number of concurrent users and connections you can handle because it's limited by the platform on which you are deploying the solution. An application server acting as a cluster controller or load balancer simply doesn't scale as well as a purpose-built load balancing solution because it isn't optimized to be a load balancer and its resource management is limited to that of a typical application server. That's true whether you're using a software solution like Apache mod_proxy_balancer or a hardware solution. So if you're implementing this type of solution to scale an application, you aren't going to see the benefits you think you are, and in fact you may see a degradation of performance due to the introduction of additional hops, additional processing, and poorly designed network architectures.

I'm all for load balancing, obviously, but I'm also all for doing it the right way. And these solutions are just not the right way to implement a load balancing solution unless you're trying to learn the concepts involved or are in a computer science class in college. If you're going to do something, do it right. And doing it right means taking into consideration the goals of the solution you're trying to implement. The goals of a load balancing solution are to provide availability and scale, neither of which the solutions presented in this article will truly achieve.
HTML5 Web Sockets Changes the Scalability Game

#HTML5 Web Sockets are poised to completely change scalability models … again.

Using Web Sockets instead of XMLHTTPRequest and AJAX polling methods will dramatically reduce the number of connections required by servers and thus has a positive impact on performance. But that reliance on a single connection also changes the scalability game, at least in terms of architecture. Here comes the (computer) science…

If you aren't familiar with what is sure to be a disruptive web technology, you should be. Web Sockets, while not broadly in use today (it is only a specification, and a non-stable one at that), is getting a lot of attention based on its core precepts and model.

Web Sockets

"Defined in the Communications section of the HTML5 specification, HTML5 Web Sockets represents the next evolution of web communications—a full-duplex, bidirectional communications channel that operates through a single socket over the Web. HTML5 Web Sockets provides a true standard that you can use to build scalable, real-time web applications. In addition, since it provides a socket that is native to the browser, it eliminates many of the problems Comet solutions are prone to. Web Sockets removes the overhead and dramatically reduces complexity."

- HTML5 Web Sockets: A Quantum Leap in Scalability for the Web

So far, so good. The premise upon which the improvements in scalability coming from Web Sockets are based is the elimination of HTTP headers (which reduces bandwidth dramatically) and of the session management overhead incurred by the closing and opening of TCP connections. There's only one connection required between the client and server, over which much smaller data segments can be sent without necessarily requiring a request and a response pair. That communication pattern is definitely more scalable from a performance perspective, and also has the positive impact of reducing the number of connections per client required on the server. Similar techniques have long been used in application delivery (TCP multiplexing) to achieve the same results – a more scalable application. So far, so good.

Where the scalability model ends up having a significant impact on infrastructure and architectures is the longevity of that single connection:

"Unlike regular HTTP traffic, which uses a request/response protocol, WebSocket connections can remain open for a long time."

- How HTML5 Web Sockets Interact With Proxy Servers

This single, persistent connection comes with a lot of, shall we say, interesting commentary on the interaction with intermediate proxies such as load balancers. But ignoring that for the nonce, let's focus on the "remain open for a long time."

A given application instance has a limit on the number of concurrent connections it can theoretically and operationally manage before it reaches the threshold at which performance begins to dramatically degrade. That's the price paid for TCP session management in general by every device and server that manages TCP-based connections.

But Lori, you're thinking, HTTP 1.1 connections are persistent, too. In fact, you don't even have to tell an HTTP 1.1 server to keep-alive the connection! This really isn't a big change. Whoa there hoss, yes it is. While you'd be right in that HTTP connections are also persistent, they generally have very short connection timeout settings. For example, the default connection timeout for Apache 2.0 is 15 seconds and for Apache 2.2 a mere 5 seconds.
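As an aside, here is a minimal sketch (using Python's asyncio purely for illustration; the 5-second value mirrors the Apache 2.2 default mentioned above, everything else is hypothetical) of how a single global idle timeout interacts with a long-lived connection: a short timeout tears down a Web Socket-style connection that is merely idle, while a long timeout leaves abandoned connections tying up the server.

```python
# Sketch: one global idle timeout applied to every connection, regardless of
# whether it is a short-lived HTTP exchange or a long-lived Web Socket-style
# session. Timeout value and port are hypothetical.
import asyncio

IDLE_TIMEOUT = 5.0  # mirrors the Apache 2.2 default cited above

async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    try:
        while True:
            # Wait for the next message, but give up if the peer is idle too long.
            data = await asyncio.wait_for(reader.readline(), timeout=IDLE_TIMEOUT)
            if not data:
                break                     # peer closed the connection
            writer.write(b"echo: " + data)
            await writer.drain()
    except asyncio.TimeoutError:
        pass                              # idle past the global timeout: dropped
    finally:
        writer.close()
        await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 9000)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())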
A well-tuned web server, in fact, will have thresholds that closely match the interaction patterns of the application it is hosting. This is because it's a recognized truism that long and often idle connections tie up server processes or threads and negatively impact overall capacity and performance. Thus the introduction of connections that remain open for a long time changes the capacity of the server and introduces potential performance issues when that same server is also tasked with managing other short-lived, connection-oriented requests.

Why this Changes the Game…

One of the most common inhibitors of scale and high performance for web applications today is the deployment of both near-real-time communication functions (AJAX) and traditional web content functions on the same server. That's because web servers do not support a per-application HTTP profile. That is to say, the configuration for a web server is global; every communication exchange uses the same configuration values, such as connection timeouts. That means configuring the web server for exchanges that would benefit from a longer timeout leaves a lot of hanging connections doing absolutely nothing, because they were used to grab standard dynamic or static content and then ignored. Conversely, configuring for quick bursts of requests necessarily sets timeout values too low for near- or real-time exchanges and can cause performance issues as a client continually opens and re-opens connections. Remember, an idle connection is a drain on resources that directly impacts the performance and capacity of applications. So it's a Very Bad Thing™.

One of the solutions to this somewhat frustrating conundrum, made more feasible by the advent of cloud computing and virtualization, is to deploy specialized servers in a scalability domain-based architecture using infrastructure scalability patterns. Another approach to ensuring scalability is to offload responsibility for performance and connection management to an appropriately capable intermediary.

Now, one would hope that a web server implementing support for both HTTP and Web Sockets would support separately configurable values for communication settings on at least the protocol level. Today there are very few web servers that support both HTTP and Web Sockets. It's a nascent and still-evolving standard, so many of the servers are "pure" Web Sockets servers, many implemented in familiar scripting languages like PHP and Python. Which means two separate sets of servers that must be managed and scaled. Which should sound a lot like … specialized servers in a scalability domain-based architecture. The more things change, the more they stay the same.

The second impact on scalability architectures centers on the premise that Web Sockets keep one connection open over which message bits can be exchanged. This ties up resources, but it also requires that clients maintain a connection to a specific server instance. This means infrastructure (like load balancers and web/application servers) will need to support persistence (not the same as persistent; you can read about the difference here if you're so inclined). That's because once connected to a Web Socket service, the performance benefits are only realized if you stay connected to that same service. If you don't, and end up opening a second (or Heaven forbid a third or more) connection, the first connection may remain open until it times out.
Given that the premise of the Web Socket is to stay open – even through potentially longer idle intervals – it may remain open, with no client, until the configured timeout. That means completely useless resources tied up by … nothing.

Persistence-based load balancing is a common feature of next-generation load balancers (application delivery controllers) and even most cloud-based load balancing services. It is also commonly implemented in application server clustering offerings, where you'll find it called server affinity. It is worth noting that persistence-based load balancing is not without its own set of gotchas when it comes to performance and capacity.

THE ANSWER: ARCHITECTURE

The reason these two ramifications of Web Sockets impact the scalability game is that they require a broader architectural approach to scalability. It can't necessarily be achieved simply by duplicating services and distributing the load across them. Persistence requires collaboration with the load distribution mechanism, and there are protocol-based security constraints with respect to incorporating even intra-domain content in a single page/application. While these security constraints are addressable through configuration, the same caveats with regard to the lack of granularity in configuration at the infrastructure (web/application server) layer must be made. Careful consideration of what may be accidentally allowed and/or disallowed is necessary to prevent unintended consequences.

And that's not even starting to consider the potential use of Web Sockets as an attack vector, particularly in the realm of DDoS. The long-lived nature of a Web Socket connection is bound to be exploited at some point in the future, which will engender another round of evaluating how to best address application-layer DDoS attacks.

A service-focused, distributed (and collaborative) approach to scalability is likely to garner the highest levels of success when employing Web Socket-based functionality within a broader web application, as opposed to the popular cookie-cutter cloning approach made exceedingly easy by virtualization.

Related posts:
Infrastructure Scalability Pattern: Partition by Function or Type
Infrastructure Scalability Pattern: Sharding Sessions
Amazon Makes the Cloud Sticky
Load Balancing Fu: Beware the Algorithm and Sticky Sessions
Et Tu, Browser?
Forget Hyper-Scale. Think Hyper-Local Scale.
Infrastructure Scalability Pattern: Sharding Streams
Infrastructure Architecture: Whitelisting with JSON and API Keys
Does This Application Make My Browser Look Fat?
HTTP Now Serving … Everything
4 Things You Need in a Cloud Computing Infrastructure

Cloud computing is, at its core, about delivering applications or services in an on-demand environment. Cloud computing providers will need to support hundreds of thousands of users and applications/services and ensure that they are fast, secure, and available. In order to accomplish this goal, they'll need to build a dynamic, intelligent infrastructure with four core properties in mind: transparency, scalability, monitoring/management, and security.

Transparency

One of the premises of cloud computing is that services are delivered transparently regardless of the physical implementation within the "cloud". Transparency is one of the foundational concepts of cloud computing, in that the actual implementation of services in the "cloud" is obscured from the user. This is actually another version of virtualization, where multiple resources appear to the user as a single resource. It is unlikely that a single server or resource will always be enough to satisfy demand for a given provisioned resource, which means transparent load-balancing and application delivery will be required to enable the transparent horizontal scaling of applications on-demand. The application delivery solution used to provide transparent load-balancing services will need to be automated and integrated into the provisioning workflow process such that resources can be provisioned on-demand at any time.

For example, when a service is provisioned to a user or organization, it may need only a single server (real or virtual) to handle demand. But as more users access that service it may require the addition of more servers (real or virtual). Transparency allows those additional servers to be added to the provisioned service without interrupting the service or requiring reconfiguration of the application delivery solution. If the application delivery solution is integrated via a management API with the provisioning workflow system, then transparency is also achieved through the automated provisioning and de-provisioning of resources.

Scalability

Obviously cloud computing service providers are going to need to scale up and build out "mega data centers". Scalability is easy enough if you've deployed the proper application delivery solution, but what about scaling the application delivery solution? That's often not so easy, and it usually isn't a transparent process; there's configuration work and, in many cases, re-architecting of the network. The potential to interrupt services is huge and, assuming that cloud computing service providers are servicing hundreds of thousands of customers, unacceptable. The application delivery solution is going to need to provide the ability to transparently scale not only the service infrastructure, but itself, as well. That's a tall order, and something very rarely seen in an application delivery solution. Making things even more difficult will be the need to scale on-demand in real time in order to make the most efficient use of application infrastructure resources. Many postulate that this will require a virtualized infrastructure such that resources can be provisioned and de-provisioned quickly, easily and, one hopes, automatically. The "control node" often depicted in high-level diagrams of the "cloud computing mega data center" will need to provide on-demand dynamic application scalability.
This means integration with the virtualization solution and the ability to be orchestrated into a workflow or process that manages provisioning.

Intelligent Monitoring

In order to achieve the on-demand scalability and transparency required of a mega data center in the cloud, the control node, i.e. the application delivery solution, will need to have intelligent monitoring capabilities. It will need to understand when a particular server is overwhelmed and when network conditions are adversely affecting application performance. It needs to know the applications and services being served from the cloud and understand when behavior is outside accepted norms. While this functionality can certainly be implemented externally in a massive management monitoring system, if the control node sees clients, the network, and the state of the applications, it is in the best position to understand the real-time conditions and performance of all involved parties without requiring the heavy lifting of correlation that would be required by an external monitoring system.

But more than just knowing when an application or service is in trouble, the application delivery mechanism should be able to take action based on that information. If an application is responding slowly and that is detected by the monitoring mechanism, then the delivery solution should adjust the distribution of application requests accordingly. If the number of concurrent users accessing a service is reaching capacity, then the application delivery solution should be able to not only detect that through intelligent monitoring but also participate in the provisioning of another instance of the service in order to ensure service to all clients.

Security

Cloud computing is somewhat risky in that if the security of the cloud is compromised, potentially all services and associated data within the cloud are at risk. That means that the mega data center must be architected with security in mind, and security must be considered a priority for every application, service, and network infrastructure solution that is deployed. The application delivery solution, as the "control node" in the mega data center, is necessarily one of the first entry points into the cloud data center and must itself be secure. It should also provide full application security - from layer 2 to layer 7 - in order to thwart potential attacks at the edge. Network security, protocol security, transport layer security, and application security should be prime candidates for implementation at the edge of the cloud, in the control node. While there certainly will be, and should be, additional security measures deployed within the data center, stopping as many potential threats as possible at the edge of the cloud will alleviate much of the risk to the internal service infrastructure.

What are your plans for cloud computing?
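Circling back to intelligent monitoring for a moment: here is a minimal sketch of the monitor-and-provision loop described above. The thresholds, the /health endpoint, and provision_instance() are hypothetical stand-ins for whatever management and orchestration APIs a given environment actually exposes.

```python
# Sketch of an intelligent-monitoring loop that participates in provisioning.
# Thresholds, the /health endpoint, and provision_instance() are hypothetical;
# a real control node would drive the provider's management API here.
import time
import urllib.request

CAPACITY_THRESHOLD = 0.8   # hypothetical: act when 80% of capacity is in use
CHECK_INTERVAL = 30        # seconds between health checks

def current_utilization(instance: str) -> float:
    """Ask the instance's (hypothetical) health endpoint for its load."""
    with urllib.request.urlopen(f"http://{instance}/health") as resp:
        return float(resp.read().decode())   # e.g. "0.73" = 73% of capacity

def provision_instance() -> str:
    """Placeholder for a call into the provisioning workflow system."""
    raise NotImplementedError("integrate with the orchestration API here")

def monitor(instances: list[str]) -> None:
    while True:
        utilizations = [current_utilization(i) for i in instances]
        if sum(utilizations) / len(utilizations) > CAPACITY_THRESHOLD:
            # Don't just observe the problem -- participate in fixing it.
            instances.append(provision_instance())
        time.sleep(CHECK_INTERVAL)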
SaaS Creating Eventually Consistent Business Model

Our reliance on #cloud and external systems has finally trickled down (or is it up?) to the business.

The success of SOA, which grew out of the popular Object Oriented development paradigm, was greatly hampered by the inability of architects to enforce its central premise of reuse. But it wasn't just the lack of reusing services that caused it to fail to achieve the greatness predicted; it was the lack of adopting the idea of an authoritative source for business-critical objects, i.e. data. A customer, an order, a lead, a prospect, a service call. These "business objects" within SOA were intended to be represented by a single, authoritative source as a means to ultimately provide a more holistic view of a customer that could then be used by various business applications to ensure higher quality service.

It didn't turn out that way, more's the pity, and while organizations adopted the protocols and programmatic methods associated with SOA, they never really got down to the business of implementing authoritative sources for business-critical "objects".

As organizations increasingly turn to SaaS solutions, particularly for CRM and SFA solutions (Gartner's Market Trends: SaaS's Varied Levels of Cannibalization to On-Premises Applications, published 29 October 2012), the ability to enforce a single, authoritative source becomes even less possible. What's perhaps even more disturbing is the potential inability to generate that holistic view of a customer that's so important to managing customer relationships and business processes.

The New Normal

Organizations have had to return to an integration-focused strategy in order to provide applications with the most current view of a customer. Unfortunately, that strategy often relies upon APIs from SaaS vendors who necessarily put limits on APIs that can interfere with that integration. As noted in "The Quest for a Cloud Integration Strategy", these limitations can strangle integration efforts to reassemble a holistic view of business objects as an organization grows:

"...many SaaS applications have very particular usage restrictions about how much data can be sent through their API in a given time window. It is critical that as data volumes increase that the solution adequately is aware of and handles those restrictions."

Note that the integration solution must be "aware of" and "handle" the restrictions. It is nearly a foregone conclusion that these limitations will eventually be met, and there is no real solution around them save paying for more, if that's even an option. While certainly that approach works for the provider - it keeps the service available - the definition of availability with respect to data is that it's, well, available. That means accessible. The existence of limitations means that at times and under certain conditions, your data will not be accessible; ergo, by most folks' definition, it's not available.

If it's not available, the ability to put together a view of the customer is pretty much out of the question. But eventually, it'll get there, right? Eventually, you'll have the data. Eventually, the data you're basing decisions on, managing customers with, and basing manufacturing processes on will be consistent with reality.

Kicking Costs Down the Road - and Over the Wall

Many point to exorbitant IT costs to set up, scale, and maintain on-premise systems such as CRM. It is true that a SaaS solution is faster and likely less expensive to maintain and scale.
But it is also true that if the SaaS is unable to scale along with your business in terms of your ability to access, integrate, and analyze your own data, you're merely kicking those capital and operating expenses down the road - and over the wall to the business.

The problem of limitations on cloud integration (specifically SaaS integration) methods is not trivial. A perusal of support forums shows a variety of discussion on how to circumvent, avoid, and work around these limitations to enable timely integration of data with other critical systems upon which business stakeholders rely to carry out their daily responsibilities to each other, to their investors, and to their customers.

Fulfillment, for example, may rely on data it receives as a result of integration with a SaaS. It is difficult to estimate fulfillment on data that may or may not be up to date and thus may not be consistent with the customer's view. Accounting may be relying on data it assumes is accurate, but actually is not. Most SaaS systems impose a 24-hour interval in which they enforce API access limits, which may set the books off by as much as a day - or more, depending on how much of a backlog may occur. Customers may be interfacing with systems that integrate with back-office SaaS that shows incomplete order histories, payments and deliveries, which in turn can result in increasing call center costs to deal with the inaccuracies.

The inability to access critical business data has a domino effect on every other system in place. The more distributed the sources of authoritative data, the more disruptive an effect the inability to access that data due to provider-imposed limitations has on the entire business.

Eventually consistent business models are not optimal, yet the massive adoption of SaaS solutions makes such a model inevitable for organizations of all sizes as they encounter artificial limitations imposed to ensure system-wide availability but not necessarily individual data accessibility. Being aware of such limitations can enable the development and implementation of strategies designed to keep data - especially authoritative data - as consistent as possible. But ultimately, any strategy is going to be highly dependent upon the provider and its ability to scale to meet demand - and loosen limitations on accessibility.
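What "aware of and handles those restrictions" might look like in code, sketched minimally; the budget of 1,000 calls per 24-hour window is hypothetical and would come from the provider's documented limits.

```python
# Sketch: an integration client that is "aware of and handles" a SaaS
# provider's API limits rather than blindly exhausting them.
# The budget (1,000 calls per 24-hour window) is hypothetical.
import time

class RateLimitedClient:
    def __init__(self, calls_per_window: int = 1000, window_seconds: int = 86400):
        self.calls_per_window = calls_per_window
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.calls_made = 0

    def call(self, fetch):
        """Run `fetch` (the actual API request) only if budget remains;
        otherwise wait until the provider's window resets."""
        now = time.monotonic()
        if now - self.window_start >= self.window_seconds:
            self.window_start, self.calls_made = now, 0
        if self.calls_made >= self.calls_per_window:
            # The data is technically "available" but not accessible until the
            # window resets -- this is the eventual-consistency gap.
            time.sleep(self.window_start + self.window_seconds - now)
            self.window_start, self.calls_made = time.monotonic(), 0
        self.calls_made += 1
        return fetch()
```

The bookkeeping itself is trivial; the point is that every integration that syncs authoritative data now carries a built-in delay, and every downstream system inherits it.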
Taking the driver’s seat in the App-centric journey

The app economy has taken hold of the world at lightning pace. It is as much social and cultural as it is economic. At its core is how it affects and engages the wider technology landscape.

Connectivity is no longer a mere mechanism of interconnected ‘dumb pipes’. Thanks to the rise of the Internet of Things (IoT), connectivity is evolving into an ecosystem of increasingly intelligent, bi-directional streams of knowledge. This shift to sensor-driven, wireless connectivity for devices, appliances, and inert objects, from the high-end (e.g. tracking electricity flows for cities) to the mundane (e.g. replacing the milk in your fridge), has taken the ubiquity of the Internet to a new level and is compelling companies of all sizes to transform and adapt. Gartner predicts that by 2020 the Internet of Things will create $1.9 trillion of economic value-add globally. In 2009 there were 2.5 billion connected devices globally; most of these were mobile phones, PCs and tablets. In 2020, Gartner predicts there will be over 30 billion devices connected, of far greater variety.

While demand is pushing new boundaries in connectivity and its applications, the technology ecosystem for enabling the IoT is highly fragmented. In order to make this new layer of interconnectedness work, solutions must be curated from various providers of sensors and communications modules, network management and control systems, communications networks, enterprise applications and customer-facing applications. The network borders as we knew them are collapsing, and in their place are seemingly disparate clusters of cloud and mobility, and the ever-present generation of new sources of data and higher streams of traffic.

For companies navigating this new terrain, management and planning of infrastructure is essential, and understanding how to get ‘social intelligence’ into the connectivity fabric is paramount. Intelligent data centres are now in vogue, splitting the control plane from the data plane so that data and services can be shared and moved within an expandable network fabric. In order for ‘social intelligence’ to work, data and services need to be orchestrated efficiently and simply from a single point of management.

Out of this shift, Software Defined Application Services (SDAS), the next inevitable phase in the evolution of application delivery, has emerged. SDAS is the result of delivering highly flexible and programmatic application services from a unified, high-performance application service fabric, and it serves to solve the significant challenges that the IoT is creating. SDAS relies on abstraction, the ability to take advantage of resources pooled from any combination of physical, virtual and cloud-deployed platforms. SDAS provides answers to new questions enterprises are facing, from controlling application delivery via traffic management, to broadening application services to include web application security, mobility, LTE and domain name services, to facilitating cloud-ready services.

The Synthesis solution is already changing the way we interact with our customers and partners by giving them flexibility and the ability to be as inclusive as they like across their application choices. The elastic, multi-tenant service fabric that delivers SDAS can cluster up to 32 F5 devices deployed across any combination of hardware, software or cloud, and supports up to 80 unique instances per device.
That translates to a combined throughput of 20 TB and connection capacity of 9.2 billion, more than three times the capacity needed to connect every Internet user in the world.

The changing application and network architecture landscape requires such an evolution in software defined app services. The new application-centred world needs solutions that empower IT and business stakeholders to align technology with their biggest challenges. The application services that have become critical to ensuring the reliability, security and performance of the plethora of applications that enterprises engage with must be provisioned, managed and scaled in a way that aligns with an application-driven world. As the industry collectively builds the faster, broader, super-connected superhighway for Software Defined Network players, we are ensuring that all cars get from point A to B smoothly, smartly and with tools to facilitate their journey.
F5 Friday: Gracefully Scaling Down

What goes up, must come down. The question is how much it hurts (the user).

An oft-ignored side of elasticity is scaling down. Everyone associates scaling out/up with the elasticity of cloud computing, but the other side of the coin is just as important, maybe more so. After all, what goes up must come down. The trick is to scale down gracefully, i.e. to do it in such a way as to prevent the disruption of service to existing users while simultaneously trying to scale back down after a spike in demand.

The ramifications of not scaling down are real in terms of utilization and therefore cost. Scaling up without the means to scale back down means higher costs, and simply shutting down an instance that is currently in use can result in angry users as service is disrupted. What's necessary is to be able to gracefully scale down: to indicate somehow to the load balancing solution that a particular instance is no longer necessary and begin preparation for eventually shutting it down. Doing so gracefully requires that you are somehow able to quiesce or bleed off the connections. You want to continue to service those users who are currently connected to the instance while not accepting any new connections. This is one of the benefits of leveraging an application-aware application delivery controller versus a simple load balancer: the ability to receive instruction in-process to begin preparation for shutdown without interrupting existing connections.

SERVING UP ACTIONABLE DATA

BIG-IP users have always had the ability to specify whether disabling a particular "node" or "member" results in the rejection of all connections (including existing ones) or if it results in refusing new connections while allowing old ones to continue to completion. The latter technique is often used in preparation for maintenance on a particular server for applications (and businesses) that are sensitive to downtime. This method maintains availability while accommodating necessary maintenance.

In version 10.2 of the core BIG-IP platform a new option was introduced that more easily enables the process of draining a server/application's connections in preparation for being taken offline. Whether the purpose is maintenance or simply the scaling-down side of elastic scalability is really irrelevant; the process is much the same.

Being able to direct a load balancing service in the way in which connections are handled from the application is an increasingly important capability, especially in a public cloud computing environment, because you are unlikely to have the direct access to the load balancing system necessary to manually engage this process. By providing the means by which an application can not only report to but direct the load balancing service, some measure of customer control over the deployment environment is re-established without introducing the complexity of requiring the provider to manage the thousands (or more) credentials that would otherwise be required to allow this level of control over the load balancer's behavior.

HOW IT WORKS

For specific types of monitors in LTM (Local Traffic Manager) – HTTP, HTTPS, TCP, and UDP – there is a new option called "Receive Disable String." This "string" is just that: a string that is found within the content returned from the application as a result of the health check. In phase one we have three instances of an application (physical or virtual, doesn't matter) that are all active. They all have active connections and are all receiving new connections.
In phase two a health check on one server returns a response that includes the string "DISABLE ME." BIG-IP sees this and, because of its configuration, knows that this means the instance of the application needs to gracefully go offline. LTM therefore continues to direct existing connections (sessions) with that instance to the right application (phase 3), but subsequently directs all new connection requests to the other instances in the pool (farm, cluster). When there are no more existing connections, the instance can be taken offline or shut down with zero impact to users.

The combination of "receive string" and "receive disable string" impacts the way in which BIG-IP interprets the instruction. A "receive string" typically describes the content received that indicates an available and properly executing application. This can be as simple as "HTTP 200 OK" or as complex as looking for a specific string in the response. Similarly, the "receive disable" string indicates a particular string of text that indicates a desire to disable the node and begin the process of bleeding off connections. This could be as simple as "DISABLE" as indicated in the above diagram, or it could just as easily be based solely on HTTP status codes. If an application instance starts returning 50x errors because it's at capacity, the load balancing policy might include a live disable of the instance to allow it time to cool down – maintaining existing connections while not allowing new ones. Because action is based on matching a specific string, the possibilities are pretty much wide open. The following table describes the possible interactions between the two receive string types:

LEVERAGING as a PROVIDER

One of the ways in which a provider could leverage this functionality to provide differentiated, value-added cloud services (as Randy Bias calls them) would be to define an application health monitoring API of sorts that allows customers to add to their application a specific set of URIs that are used solely for monitoring and can thus control the behavior of the load balancer without requiring per-customer access to the infrastructure itself. That's a win-win, by the way. The customer gets control, but so does the provider.

Consider a health monitoring API that is a single URI: http://$APPLICATION_INSTANCE_HOSTNAME/health/check. Now provide a set of three options for customers to return (these are likely oversimplified for illustration purposes, but not by much):

ENABLE
QUIESCE
DISABLE

For all application instances the BIG-IP will automatically use an HTTP-derived monitor that calls $APP_INSTANCE/health/check and examines the result. The monitor would use "ENABLE" as the "receive string" and "QUIESCE" as the "receive disable" string. Based on the string returned by the application, the BIG-IP takes the appropriate action (as defined by the table above). Of course this can also easily be accomplished by providing a button on the cloud management interface to do the same via iControl, but this option is more able to be programmatically defined by customers and thus is more dynamic and allows for automation. And of course such an implementation isn't relegated only to service providers; IT organizations in any environment can take advantage of such an implementation, especially if they're working toward an automated data center and/or self-service provisioning/management of IT services. That is infrastructure as a service.

Yes, this means modification to the application being deployed.
No, I don’t think that’s a problem – cloud and Infrastructure as a Service (IaaS), at least real IaaS, is necessarily going to require modifications to existing applications, and new applications will need to include this type of integration in the future if we are to take advantage of the benefits afforded by a more application-aware infrastructure and, conversely, a more infrastructure-aware application architecture.
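A minimal sketch of what that application-side /health/check endpoint could look like, using only Python's standard library; the route and the three strings follow the example above, while the mechanism that flips the state (ops tooling, a deploy script, capacity pressure) is left out and entirely hypothetical.

```python
# Sketch of the application-side /health/check endpoint described above.
# The monitor would use "ENABLE" as its receive string and "QUIESCE" as its
# receive disable string; how `state` gets flipped is out of scope here.
from http.server import BaseHTTPRequestHandler, HTTPServer

state = {"mode": "ENABLE"}   # ENABLE | QUIESCE | DISABLE

class HealthCheck(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/check":
            body = state["mode"].encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)      # the string the monitor matches against
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), HealthCheck).serve_forever()
```

Flipping state["mode"] to QUIESCE is the signal for the monitor to start bleeding off connections; returning ENABLE again puts the instance back into full rotation.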
Back to Basics: Load balancing Virtualized Applications

#virtualization Load balancing in a virtualized world is the same as it ever was, but different.

The introduction of virtualization and cloud computing to data centers has been heralded as "transformational" and "disruptive" and "game changing." From an operational IT perspective, that's absolutely true. But like transformational innovation in other industries, such disruption is often not in how the core solution is leveraged or used, but in how it impacts operations and the broader ecosystem, rather than the individual tasked with using the solution. The transformation of the auto industry, for example, toward alternative fuel-sourced vehicles is disruptive and changes much about the industry. But it doesn't change the way you drive a car; it still works on the same principles, and the skills you've learned driving gas-powered cars are still applicable to alternative fuel-source cars. What changes for the operator – just as within IT – is that there may be new concerns with which you must contend.

Load balancing virtualized applications is in this category. While the core principles you've always applied to load balancing applications still apply, there are a few additional concerns that arise from the use of virtualization that you're going to have to take into consideration.

LOAD BALANCING 101 REFRESH

Let's remember quickly how load balancing traditional applications works, shall we? The load balancing service presents to the end-user a single endpoint, i.e. "the application". Users communicate exclusively with that endpoint. The load balancing service communicates with a pool of resources comprised of one or more application instances. It is by adding instances to the pool that an application is able to scale horizontally to meet demand.

In the most common traditional load balancing environment, each application instance is hosted on a single, physical server. The availability of the "application" is maintained by ensuring there are always enough instances (nodes) available to compensate for any failures that might occur at the physical server, operating system, platform, or application layers. Load balancing services also allow for the designation of "backup" nodes. Each node in a pool may have a backup node that is only activated in the event of a failure. This is used primarily for high-availability purposes to ensure continuous application availability rather than for scaling purposes.

Now, when we replace the physical servers with virtual servers, we have pretty much the same system. There still exists a pool of resources that comprise "the application", the load balancing service still mediates for the end-user, and there are still enough application instances in the pool to compensate for failure, thus ensuring availability of "the application." However, there are some new potential sources of failure that must be addressed that impact the topology – the physical placement – of the application instances in the pool.

TWO RULES for LOAD BALANCING VIRTUALIZED APPLICATIONS

One of the most important changes coming from virtualization that must be addressed is fault isolation. Assume for a moment that we took all four physical nodes and consolidated them on a single, physical virtualized platform. In theory, nothing changes. The load balancing service views a "node" as a unique combination of IP address and TCP port, and whether that's hosted on a virtual platform or a physical server is irrelevant to the load balancing service.
The load balancing algorithms still work the same way, nodes are selected as directed by configured policies, backup nodes are still used to ensure continuous availability, and nothing about the way in which load balancing works changes. But it's very relevant to operations, because this type of server-consolidated deployment model introduces a higher risk of unrecoverable failure scenarios and it will directly impact the performance (in a bad way) of "the application." There are a couple of operational axioms at work here:

1. Shared infrastructure (network, compute, storage) means shared risk.
2. As load increases, performance decreases.

Let's say "Node 1" fails. In both the physical and virtual deployments, the load is simply shifted to the remaining active nodes. No problem. But what if the network connectivity between the load balancing service and "Node 1" fails? In a physical deployment, no problem – each node has its own physical connection and the failure is unlikely to impact the other nodes. But what about the virtual deployment? Each node has its own virtual network connection, certainly, but does it have its own physical network connection or is it shared? If it's a shared physical connection and it fails, then all nodes will fail – leaving "the application" unavailable.

Load Balancing Virtualized Applications Rule #1: Team and Trunk. Physical network redundancy is a must. Modern server platforms are generally enabled with at least 2 if not 4 GbE connections; use them.

So now you've got your network topology designed to ensure that a physical failure will not take out every application instance on the server. Next you need to consider how the application instances are isolated and deployed to ensure that a failure at the hypervisor layer does not disrupt all application instances.

Consider that there are two possible reasons you are implementing load balancing: scalability and availability. In the former, you're trying to ensure supply meets demand. In the latter, you're trying to mitigate potential failure in a way that ensures "the application" is always available, regardless of failure. If there is a failure at the hypervisor layer, all instances relying on that hypervisor will be impacted (and not in a good way). Regardless of why you're implementing load balancing, the result of such a failure is the same: instances are unavailable. Similarly, if the physical device on which virtualized applications are deployed fails, every instance on that device will be down. In both cases, if all your virtual eggs are in one basket and there's a failure at the hypervisor layer, you're in trouble.

Load Balancing Virtualized Applications Rule #2: Divide and Conquer. Application instance redundancy is a must. Never put all your application instances on a single virtualized or physical platform. Spread them across at least two, to isolate potential failures in the virtualization layer or at the physical server layer. Node backups should always be located on physically separate devices.

Load balancing services are adept at discerning failure, but they are not necessarily able to determine the source. A failure to communicate with an application instance could be caused by a bad cable, a failed port, an unresponsive network stack, or an application error. The load balancing service knows the application instance is down, but not necessarily why it's down. If it's a crashed instance, then failing over to a backup instance on the same server is probably going to work out fine.
But if the root cause is a failed port or bad cable, failing over to a backup instance on the same server isn't going to help – because it is down too. To ensure availability, it is imperative that there are always at least two of everything – and that means physical devices, as well. Never put all your eggs in one basket – at any layer.

THE PERFORMANCE IMPACT

Aside from general availability issues, there is also the very real possibility that where you deploy virtualized application instances will impact performance of "the application." Remember that even though you can designate CPU and memory on a per-application-instance basis, the instances still ultimately share I/O – both storage and network. That means even if you use rate limiting technologies to try to manage bandwidth consumption as a means to reduce congestion or latency, ultimately you're impacting performance. If you don't use rate limiting or other bandwidth-focused solutions to manage the shared network resource, you run the risk of congestion and increasing latency on the wire. Similarly, shared storage is even more problematic, because when you trace I/O down through the system, you end up at a single, shared I/O controller that is going to have some serious limitations on it. I/O-intense application instances deployed on the same physical device are going to cause contention in the underlying system, which is going to negatively impact performance.

Again, divide and conquer. Disperse such instances across two (or more) physical servers. The number of servers will depend on the overall scale of the application and the resource consumption rate. Load balancing will be able to assist in maintaining performance across instances if you take advantage of a response-time aware algorithm such as fastest response time (the assumption is that response time correlates directly to load, and in most cases this is true). This keeps any given instance from becoming overwhelmed.

Ultimately, what this means is that you have to be a little more aware of physical deployment location for application instances than you did with pure physical deployments. Consolidation is a great way to reduce operational and capital expenditures, but it also means consolidating risk.

LOCATION MATTERS

This is a particularly tough nut to crack, especially when combined with the desire to implement auto-scaling operations in a more cloud-like environment. The idea that you can leverage "whatever idle resources" you can find to scale out applications on-demand is powerful, but it's also potentially fraught with risk if you're unable to control placement at all. While the possibility that every instance would end up deployed on a single server or even a select handful of servers is minimal, there is the possibility that multiple instances could be deployed in a way that means a single server failure could eliminate a sizeable number of application instances, resulting in an unacceptable degradation of performance or even downtime for some percentage of users.

In the end, location really does matter when it comes to load balancing virtualized applications. Where they are deployed and in what groupings becomes a critical factor for maintaining performance and availability. The tendency to increase VM density is high, but that tendency can lead to highly disruptive situations in the event of a failed component.
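Pulling the two rules and the response-time aware algorithm together, here is a minimal sketch; the node addresses, physical host assignments, and timings are all hypothetical, and a real load balancer would measure response times continuously rather than store them statically.

```python
# Sketch: fastest-response-time selection across instances that have been
# deliberately spread over at least two physical hosts (Rule #2).
# Node addresses, host placements, and timings are hypothetical.
from dataclasses import dataclass

@dataclass
class Node:
    address: str
    physical_host: str
    avg_response_ms: float   # continuously updated by the monitoring layer

POOL = [
    Node("10.0.0.11:80", "host-A", 42.0),
    Node("10.0.0.12:80", "host-A", 55.0),
    Node("10.0.0.13:80", "host-B", 47.0),
    Node("10.0.0.14:80", "host-B", 61.0),
]

def pick_node(pool: list[Node]) -> Node:
    # Response time correlates with load, so the least-loaded instance wins --
    # including one that is slow only because its host's shared I/O is saturated.
    return min(pool, key=lambda n: n.avg_response_ms)

# Rule #2 in one line: the pool must never live in a single basket.
assert {n.physical_host for n in POOL} >= {"host-A", "host-B"}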
Be aware that cost savings from mass-consolidation and "high efficiency" through increasing VM density metrics may look good now, but may not look so good through the lens of hindsight.

Related posts:
Digital is Different
The Cost of Ignoring ‘Non-Human’ Visitors
Cloud Bursting: Gateway Drug for Hybrid Cloud
The HTTP 2.0 War has Just Begun
Why Layer 7 Load Balancing Doesn’t Suck
Network versus Application Layer Prioritization
Complexity Drives Consolidation
Performance in the Cloud: Business Jitter is Bad
The Event-Driven Data Center

#sdn #node.js Like the planets aligning, dev and the network sync up on architectural foundations so infrequently that it should be a major event.

One of the primary reasons node.js is currently ascending in the data center is because of its core model: event-driven, non-blocking processing. Historically, developers write applications based on connections and requests. It's blocking; it's not asynchronous; it's not fire-and-forget until some other event reminds them that something needs to be done.

If the underlying network fabric worked like applications today work, we'd be in a heap of trouble. A switch would grab an incoming packet and forward it and then... wait for it to return. You can imagine what that would do to traffic flow and just how much bigger and beefier switches would have to be to support the kind of traffic experienced today by enterprises and web monsters alike.

Luckily, the network isn't like that. It doesn't block waiting for a response. It grabs an ingress packet, determines where it should go next, forwards it and then moves on to the next packet in line. It does not hang out, mooning over and writing bad love poetry about the packet it just forwarded, wondering if it will ever come back. That, in part*, is why networks scale so well, why they are so fast and able to sustain a significant order of magnitude more concurrent connections than a web or application server.

So imagine what happens when a web or application server adopts a more laissez-faire attitude toward processing requests; when it fires and forgets until it is reminded by the return of a response. Exactly. It gains phenomenal network-like speed and much better scalability. That's what node.js is bringing to the data center table - an event-driven, non-blocking application infrastructure that aligns with the event-driven, non-blocking nature of the network fabric. Louis Simoneau sums it up well in "Node.js is the New Black":

"Here’s where some of that jargon from before comes into play: specifically non-blocking and event-driven. What those terms mean in this context is less complicated than you might fear. Think of a non-blocking server as a loop: it just keeps going round and round. A request comes in, the loop grabs it, passes it along to some other process (like a database query), sets up a callback, and keeps going round, ready for the next request. It doesn’t just sit there, waiting for the database to come back with the requested info."

And it's a quite capable platform, based on the numerous benchmarks and tests performed by developers and devops interested in understanding the differences between it and the old guard (Apache, PHP, etc.). Developers deploying on off-the-shelf operating systems have scaled node.js to 250,000 connections. On a purpose-built operating system, node.js has been clocked at over 4 million simultaneous connections. It scales, and it scales well.

Suffice to say that application and network infrastructure are starting to align in terms of performance capabilities and, interestingly enough, programmability. What the network is taking from development is programmability, and what development is taking from the network is speed and capacity. They're aligning in so many ways that it's almost mind-boggling to consider the potential. It's not just like Christmas - it's like Christmas when you're five years old. Yeah, it's that awesomesauce. I haven't been this excited about a technology since application switching broke onto the scene in, well, quite some years ago now.
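The non-blocking loop Simoneau describes, sketched below in Python's asyncio purely to keep the examples in this collection in one language (node.js would express the same thing with callbacks or promises); the half-second sleep is a hypothetical stand-in for a database query or any other slow backend call.

```python
# Sketch of the event-driven, non-blocking pattern: grab a request, hand the
# slow work off, and keep the loop spinning instead of waiting on it.
import asyncio

async def slow_database_query(req_id: int) -> str:
    await asyncio.sleep(0.5)        # the loop is free to serve others meanwhile
    return f"result for request {req_id}"

async def handle_request(req_id: int) -> None:
    result = await slow_database_query(req_id)   # suspends, does not block
    print(result)

async def main():
    # 1,000 "concurrent requests" handled by a single thread and one event loop.
    await asyncio.gather(*(handle_request(i) for i in range(1000)))

asyncio.run(main())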
Counting Down

This is not to say that the entire network fabric is truly event-driven. It's not quite Christmas yet, but it is close enough to taste... Network components, individually, are event-driven, but the overall data center network is not yet. But we're getting closer. You may recall that when we first started talking about Infrastructure 2.0, when cloud was in its infancy (almost pre-infancy, actually), we talked about event-driven configuration and policy deployment. From "Infrastructure 2.0: As a matter of fact that isn't what it means":

"The configuration and policies applied by dynamic infrastructure are not static; they are able to change based on predefined criteria or events that occur in the environment such that the security, scalability, or performance of an application and its environs are preserved. Some solutions implement this capability through event-driven architectures, such as 'IP_ADDRESS_ASSIGNED' or 'HTTP_REQUEST_MADE'."

Today we'd call that SDN (software-defined networking) or the SDDC (software-defined data center). Regardless of what we call it, the core principle remains: events trigger the configuration, deployment, and enforcement of infrastructure policies across the data center.

Now couple that with an inherently event-driven application infrastructure that complements the event-driven network infrastructure. And consider how one might use a platform that is event-driven and isn't going to bottleneck like more traditional development languages. Exactly. All the capacity and performance concerns we had around trying to architect an event-driven data center with Infrastructure 2.0 just evaporate, for the most part.

The data center planets are aligning, and what's yet to come will hopefully be a leap forward towards a dynamic, adaptable data center fabric that's capable of acting and reacting to common events across the entire network and application infrastructure.

* Yes, there's hardware and firmware and operating system design that also contributes to the speed and capacity of the network fabric, but that would all be undone were the network to sit around like a lovesick Juliet waiting for her Romeo-packet to return.
Is Cloud Built to Fail or Built to Scale?

#webperf #cloud The difference matters. A lot.

There's been a growing focus on scalability as the Internet of Things has continued its rapid growth. Perhaps due in part to large online failures during periodic or individual events, perhaps due in part to simple growth, the reason is less important than the reality that scalability is a critical technological driver for a variety of new technologies - cloud and SDN being the most often referenced. But while we've been focusing on scalability, we may have been overlooking the related and no less important availability factor. These two "itys" are related, as scalability is one way to achieve availability when dealing with growth, rapid or otherwise. But availability also means being sensitive to failure.

Cloud, in general, is designed for scalability. It is specifically architected to provide elasticity - which is scalability both in and out. Cloud is designed to enable resource growth and contraction to match demand. In this way, cloud addresses one aspect of availability: capacity. But it does not always address the other aspect - failure. Cloud is built to scale, not necessarily to fail.

The Many Faces of Availability

Availability and scale are both achieved primarily through the same mechanisms in both data centers and cloud environments (and SDN network fabrics, for that matter): load balancing. At the network layer we've seen techniques like link aggregation (trunking, teaming, bundling) used to manage both scale and failure. Multiple network links are bound together, usually using an established network protocol, and traffic is distributed via load balancing across those links. Similarly, the same techniques are used at the application layers (layers 4-7) to provide the same measure of scalability and resilience to failure at the server and application layer. Multiple resources are bound together, usually using a concept known as a virtual server (or virtual IP address) in a load balancing service, and then requests are distributed via load balancing across those resources.

In this way, scalability is achieved. As demand grows, resources are transparently added to increase capacity. Similarly, as demand contracts, resources can be transparently decommissioned to decrease capacity. Voila! Elasticity. But failure, the other and lesser-mentioned aspect impacting availability, is not so easily managed.

The Impact of Failure

In the case of the network, the failure of a single link in an aggregated bundle (or trunk) is handled by simply ignoring the failed link. All traffic is distributed across the remaining links, every packet is pushed, and availability is maintained. Except when it isn't, because of oversubscription or congestion that results in excessive latencies that cause delays, ultimately resulting in poor application performance. While in the purest sense of the word "available" the application is still accessible, most businesses today consider unresponsive or poorly performing applications to be "unavailable", especially when those applications are revenue generating.

At the application layers, failure is even more detrimental to availability. Oversubscription of an application due to failure of resources often results in true downtime: errors or timeouts that prevent the end-user from accessing the application at all. Worse, users that were active may suddenly find they have "lost" their connection as well as any work they may have been doing before the resource was lost.
Load balancing architectures compensate, of course, by directing those users to other application instances. Cloud environments imbued with auto-scaling capabilities may be able to redress the failure by provisioning a new instance to take its place and thus maintain the proper levels of capacity. But that does not mitigate the loss of productivity and access experienced due to the original failure. It addresses scalability, not availability.

That's because when a failure occurs in most cloud environments, all active sessions to the failed application instance are simply discarded. The users must start anew. The cloud infrastructure fabric will certainly redirect them to a new instance and start a new session (and in this way it will "handle" failure), but this is disruptive to the user; it is noticeable. And noticeable degradations of performance or availability are a no-no for most business stakeholders.

Beware the Long Term

It's not necessarily the immediate reaction that should be of concern, but the long-term impact. Everyone cites the data presented by Microsoft, Google, and Shopzilla at Velocity 2009 with respect to the impact of seconds of delay on revenue (spoiler: it's not good), but they tend to ignore the long-term impact - the behavioral impact - of such delays and disruption on the end user [emphasis mine]:

Their data showed that slow sites get fewer search queries per user, less revenue per visitor, fewer clicks, fewer searches, and lower search engine rankings. They found that in some cases even after site performance was improved users continued to interact as if it was slow. Bad experiences have a lasting influence on customer behavior.
-- More on how web performance impacts revenue…

Did you catch that? Bad experiences (of which disruption is certainly one, I don't think we need to argue about that, do we?) have a lasting influence on customer behavior. It's important, therefore, to understand the limitations of the environment in which you are deploying an application - particularly one that is customer-facing. Understanding that cloud is built to scale - not fail - is a key piece of knowledge you need to make decisions regarding which applications and workloads are fit for migration to the cloud, and which may be a fit if they are re-architected to address such failure themselves.
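What might "re-architected to address such failure" look like? Here's a minimal sketch, again in Python and purely illustrative - the shared store is a plain dict standing in for something like a replicated cache or session database, and the instance and session names are hypothetical - contrasting instance-local session state, which dies with the instance, with externalized session state, which survives a failover.

```python
# Minimal sketch contrasting instance-local session state (lost when the
# instance fails) with externalized session state (survives a failover).
# The store, keys, and instance names are hypothetical.

class AppInstance:
    def __init__(self, name, shared_store=None):
        self.name = name
        self.local_sessions = {}          # dies with the instance
        self.shared_store = shared_store  # e.g. a replicated cache, if provided

    def handle(self, session_id, data):
        target = self.shared_store if self.shared_store is not None else self.local_sessions
        target.setdefault(session_id, {}).update(data)

    def recall(self, session_id):
        source = self.shared_store if self.shared_store is not None else self.local_sessions
        return source.get(session_id)

# Instance-local state: the replacement instance knows nothing about the session.
a = AppInstance("app-1")
a.handle("sess-42", {"cart": ["widget"]})
replacement = AppInstance("app-2")
print(replacement.recall("sess-42"))      # None: the user starts anew

# Externalized state: the replacement picks up where the failed instance left off.
store = {}
b = AppInstance("app-1", shared_store=store)
b.handle("sess-42", {"cart": ["widget"]})
replacement = AppInstance("app-2", shared_store=store)
print(replacement.recall("sess-42"))      # {'cart': ['widget']}
```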
Scaling Stateful Network Devices

One of the premises of #SDN and #cloud scalability is that it's easy to simply replicate services - whether they be application or network focused - and distribute traffic across them to scale infinitely. In theory, this is absolutely the case. In theory, one can continue to add capacity to any layer of the data center and simply distribute requests across the layer to scale out as necessary. Where reality puts a big old roadblock in the way is when services are stateful. This is the case with many applications - much to the chagrin of cloud and REST purists, by the way - and it is also true with a significant number of network devices. Unfortunately, it is often these devices that proponents of network virtualization target without offering a clear path to addressing the challenges inherent in scaling stateful network devices.

SDN's claims to supporting load balancing, at least at layer 4, are almost certainly based on traditional, dumb layer 4 load balancing. We use the term "dumb" simply to mean that it doesn't care about the payload or the application or anything else other than the destination port and service, and that it does not participate in the flow. In most layer 4 load balancing scenarios for which this is the case, the only time the load balancer examines the traffic is when processing a new connection. The load balancer may buffer enough packets to determine some basic networking details - source and destination IP and TCP ports - and then it establishes a connection between the client and the server. From this point on, generally speaking, the load balancer assumes the role of a simple forwarder. Subsequent packets with the same pattern are simply forwarded on to the destination.

If you think about it, this is so close to the behavior described by an SDN-enabled network as to be virtually the same. In an SDN-enabled network, a new flow (session, if you will, in the load balancing vernacular) would be directed to the SDN controller for processing. The SDN controller would determine its destination and inform the appropriate network components of that decision. Subsequent packets with the same pattern would be forwarded on to the destination according to the information in the FIB (Forwarding Information Base). As the load balancing service was scaled out, inevitably packets would be distributed to components lacking an entry in the FIB. Said components would query the controller, which would simply return the appropriate entry to the device. In such a way, simple layer 4 load balancing can be achieved via SDN*.
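Here's a minimal sketch of that stateless model in Python. It's purely illustrative - the flow key, the controller's round-robin decision, and the class names are hypothetical - but it captures the essential behavior: the controller is consulted once per new flow, and every subsequent matching packet is forwarded straight from the table.

```python
# Minimal sketch of stateless flow handling: a forwarding element keeps a
# flow table (FIB) and asks the controller only on a table miss. The flow
# key and the controller's selection logic are hypothetical.

class Controller:
    def __init__(self, servers):
        self.servers = servers
        self._rr = 0

    def decide(self, flow_key):
        """Pick a destination for a new flow (simple round-robin here)."""
        server = self.servers[self._rr % len(self.servers)]
        self._rr += 1
        return server

class Forwarder:
    def __init__(self, controller):
        self.controller = controller
        self.fib = {}                      # flow key -> destination

    def forward(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.fib:            # table miss: ask the controller once
            self.fib[key] = self.controller.decide(key)
        return self.fib[key]               # later packets match the installed entry

ctrl = Controller(["10.0.0.1", "10.0.0.2"])
fwd = Forwarder(ctrl)
print(fwd.forward("192.0.2.7", 49152, "203.0.113.10", 80))  # controller consulted
print(fwd.forward("192.0.2.7", 49152, "203.0.113.10", 80))  # forwarded from the FIB
```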
However, the behavior of the layer 4 load balancing service described is stateless. It does not actively manage the flow. Aside from the initial inspection and routing decision, the load balancing service is actually just a bump in the wire, forwarding packets in much the same manner as any other switch in the network. But what happens when the load balancing service is actively participating in the flow, i.e. when it is stateful?

Scaling Stateful Devices

Stateful devices are those that actively manage a flow. That is, they may inspect, manipulate, or otherwise interact with flows in real-time. These devices are often used for security - both ingress and egress - as well as acceleration and optimization of application exchanges. They are also used for content transformation purposes, such as XML or SOA gateways, API management, and other application-focused scenarios. The most common use of stateful devices is persistent load balancing, aka sticky sessions, aka server affinity.

Persistent load balancing requires that the load balancing service (or device) maintain a mapping of user to application instance (or server, in traditional, non-virtualized environments). This mapping is unique to the device, and without it a wide variety of applications break when scaled - VDI being the most recent example of an application relying on persistence of sessions. In all these cases, however, one thing is true: the device providing the service is an active participant. The device maintains service-specific information regarding a variety of variables including the user, the device, the traffic, the application, and the data. The entire context of the session is often maintained by one or more devices along the traffic chain. What that means is that, like stateful, shared-nothing applications, it matters to which device a specific request is directed.

While the same model used at layer 4 and below - in which a central controller (or really a bank of controllers) maintains this information and doles it out on demand - could certainly be applied, the result is that, depending on the distribution algorithm used, every stateful device would end up with the same flows installed. In the interim, the network is frantically applying optimization and acceleration policies whose benefits may be offset by the latency introduced by the need to query the controller for session state information, resulting in a net loss of performance for the end-user. And we're not even considering the impact of secured traffic on such a model, where any device needing to make decisions on such traffic must have access to the certificates and keys used to encrypt the traffic in order to decrypt, examine, and usually re-encrypt it. Stateful network devices - application delivery controllers, intrusion prevention and detection systems, secure gateways, etc... - are often required to manage secured content, which means distributing and managing certificates and keys across what may be an ever-expanding set of network devices.

The reality is that stateful network devices are a necessary and integral component of not just networks but applications today. While modern network architectures like SDN bring much needed improvements to provisioning and management of large scale networks, their scaling models are based on the premise of stateless, relatively simple devices not actively participating in flows. For those devices that rely upon deep participation in the flow, this model introduces a variety of challenges that may not find a solution that fits well with SDN without either compromising on performance or adopting new protocols capable of carrying that state persistently throughout the lifetime of a session.

* This does not address the issue of the resources required to maintain said forwarding tables in a given device, which, given the current capacity of the commoditized switches supported for such a role, seems unlikely to be realistically achievable.
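To pull the persistence problem together, here's one last minimal sketch in Python. It's purely illustrative - the persistence key would typically be derived from a cookie, source IP, or SSL session ID, and the instance names are hypothetical - but it shows the mapping a stateful device must maintain, and why a second, freshly scaled-out copy of the device without that table breaks affinity.

```python
# Minimal sketch of the persistence (sticky session) mapping a stateful
# load balancer maintains: once a user is mapped to an instance, every
# subsequent request must land on that same instance. Names are hypothetical.

class PersistentLoadBalancer:
    def __init__(self, instances):
        self.instances = instances
        self.persistence = {}              # persistence key -> instance
        self._rr = 0

    def route(self, persistence_key):
        # A returning user must go back to the same instance...
        if persistence_key in self.persistence:
            return self.persistence[persistence_key]
        # ...a new user is load balanced and the mapping is recorded locally.
        instance = self.instances[self._rr % len(self.instances)]
        self._rr += 1
        self.persistence[persistence_key] = instance
        return instance

lb = PersistentLoadBalancer(["app-1", "app-2"])
print(lb.route("user-a"))   # app-1 (new mapping recorded)
print(lb.route("user-b"))   # app-2
print(lb.route("user-a"))   # app-1 again: the mapping is unique to this device

second_lb = PersistentLoadBalancer(["app-1", "app-2"])   # naive scale-out
print(second_lb.route("user-a"))   # app-1 only by luck; it has no idea where user-a belongs
```

Replicate that table across devices, or hand it to a central controller and query it per request, and you're right back to the distribution and latency problems described above.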