The Skeleton in the Global Server Load Balancing Closet
Like urban legends, every few years this one rears its head and makes its rounds. Everyone with an e-mail address has received some message claiming that something bad is going on, or that someone said something they didn't, or that someone influential wrote a letter that turns out to be wishful thinking. I often point the propagators of such urban legends to Snopes, because the folks who run Snopes are dedicated to hunting down the truth behind these tidbits that achieve urban-legend status. It would be nice, wouldn't it, if there were such a thing for technical issues; a technology-focused Snopes, if you will. But there isn't, and every few years a technical urban legend rears its head again and sends some folks into a panic. And we, as an industry, have to respond and provide some answers. This is certainly the case with Global Server Load Balancing (GSLB) and Round-Robin DNS (RR DNS).

CLAIM: DNS Based Global Server Load Balancing (GSLB) Doesn't Work
STATUS: Inaccurate

ORIGINS
The origin of this skeleton in GSLB's closet is a 2004 paper written by Pete Tenereillo, "Why DNS Based Global Server Load Balancing (GSLB) Doesn't Work." It is important to note that at the time of the writing Pete was not only very experienced with these technologies but was also considered an industry expert. Pete was intimately involved in the early days of load balancing and global server load balancing, having been an instrumental part of projects at both Cisco and Alteon (Nortel, now Radware). So his perspective on the subject certainly came from experience and even "inside" knowledge about the way in which GSLB worked and was actually deployed in the "real" world. The premise upon which Pete bases his conclusion, i.e. that GSLB doesn't work, is that the features and functionality offered over and above those of standard DNS servers sound good in theory but are inherently flawed in practice. His ultimate conclusion is that the only way to implement true global high availability is to use multiple A records, which are already a standard function of DNS servers.

"DNS based Global Server Load Balancing (GSLB) solutions exist to provide features and functionality over and above what is available in standard DNS servers. This paper explains the pitfalls in using such features for the most common Internet services, including HTTP, HTTPS, FTP, streaming media, and any other application or protocol that relies on browser based client access. … An Axiom: The only way to achieve high-availability GSLB for browser based clients is to include the use of multiple A records."

It would be easy to dismiss Pete's concerns on the grounds that his commentary is nearly seven years old at this point. But the basic principles upon which DNS and GSLB are implemented today are still the same theories with which Pete takes issue. What Pete missed in 2004, and what remains missing from this treatise, is twofold. First, GSLB implementations at that time, and today, do in fact support returning multiple A records, a.k.a. Round-Robin DNS. Second, the features and functionality provided over and above standard DNS do, in fact, address the issues raised, and they have continued to evolve over the past seven years.

Is returning multiple A records to the LDNS the only way of achieving high availability? How is advanced health checking an important component of providing high availability?
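Before digging in, it helps to be concrete about what plain Round-Robin DNS actually does. Here's a minimal simulation (the A records are placeholder documentation addresses, not anyone's real deployment): the authoritative server hands back all the records, and distribution of load is left to however the resolver and client happen to order them.

```python
import random

# Hypothetical A records for www.example.com. Plain RR DNS hands all
# of them back and relies on the resolver/client for distribution.
A_RECORDS = ["192.0.2.10", "198.51.100.10", "203.0.113.10"]

def rr_dns_response(records):
    """Simulate a caching LDNS: many resolvers rotate or shuffle the
    record list on each response, so clients see different orderings."""
    rotated = records[:]
    random.shuffle(rotated)  # ordering is not guaranteed to be preserved
    return rotated

for query in range(3):
    print(f"query {query}: {rr_dns_response(A_RECORDS)}")
```

Note that nothing in this scheme knows whether a site is up, slow, or overloaded; that gap is exactly where the GSLB features discussed below come in.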
Many people misuse the term "high availability" by treating it as a binary condition: a site is either up or down. That type of thinking is misguided and purely technical in focus. Our customers have all indicated that high availability also includes the performance of the application or site, because by business definitions a site or application that is too slow is unavailable. Poor performance directly impacts productivity, one of the key performance indicators used to measure the effectiveness of business employees and processes. As a result, high availability can be achieved in a number of different ways. Intelligent GSLB solutions, through advanced monitoring and statistical correlation, take into account not only whether the site is up or down but also such details as hop count, packet loss, round-trip time, and data-center capacity, to name a few. These metrics then transparently provide users with the most efficient and intelligent steering of traffic, achieving high availability. Geolocation is another means of steering traffic to the appropriate service location, as are any number of client- and business-specific variables. Context is important to application delivery in general, but it is a critical component of GSLB in maintaining availability, including performance.

The round-robin handling of A records by the Local DNS (LDNS) is a well-known problem in the industry. When multiple A records are handed back to the LDNS for an address resolution, the LDNS shuffles the list and returns it to the client without honoring the order in which it was received. The next time the client requests an address, the LDNS responds with a differently ordered list of A records. This behavior makes it very difficult to predict the order in which A records are returned to the client. To overcome this problem, many prefer to configure a GSLB solution to send back a single A record to the LDNS. Compared with a "plain" DNS server, which would send back any one of the site addresses with a TTL value, an intelligent GSLB sends back the address of the best-performing site, based on the metrics that matter to the business, and sets the TTL value. The majority of RFC-compliant LDNS implementations will honor the TTL value and resolve again after it has expired. The GSLB performs advanced health checking and returns the address of the best-performing site, taking into account metrics like application availability, site capacity, round-trip time, hops, and completion rate, thereby providing the best user experience and meeting applicable business service-level agreements.

In the event of a site failure (when the link is down or because of a catastrophic event), existing clients would connect to the unavailable site for a period of time equal to the TTL value. A GSLB typically sets a TTL value of 30 seconds when returning an A record to the LDNS. As soon as that 30-second period expires, the LDNS resolves again; the GSLB, using its advanced health-checking capability, determines that one of the sites is unavailable and begins transparently directing users to the best-performing available site by returning that site's address to the LDNS. A flexible GSLB also provides a Manual Resume option, letting administrators keep the failed site marked down in order to mitigate the well-known back-end database synchronization problem.
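As an illustration only, and not any vendor's actual algorithm, here is a minimal sketch of the metric-weighted "best site" selection just described, returning a single address with a 30-second TTL. The metric names, weights, and site data are all hypothetical:

```python
from dataclasses import dataclass

TTL_SECONDS = 30  # short TTL so clients re-resolve quickly after a failure

@dataclass
class SiteMetrics:
    name: str
    address: str
    up: bool              # result of an application-layer health check
    rtt_ms: float         # measured round-trip time
    packet_loss: float    # fraction of probes lost, 0.0-1.0
    capacity_used: float  # fraction of site capacity in use, 0.0-1.0

def score(site: SiteMetrics) -> float:
    """Lower is better. The weights here are arbitrary; a real GSLB
    would weight metrics according to business policy."""
    return site.rtt_ms + 500 * site.packet_loss + 200 * site.capacity_used

def best_site(sites: list[SiteMetrics]) -> tuple[str, int]:
    candidates = [s for s in sites if s.up]  # never return a down site
    if not candidates:
        raise RuntimeError("no healthy sites")
    chosen = min(candidates, key=score)
    return chosen.address, TTL_SECONDS

sites = [
    SiteMetrics("us-west", "192.0.2.10", True, 80.0, 0.01, 0.40),
    SiteMetrics("us-east", "198.51.100.10", True, 65.0, 0.00, 0.85),
    SiteMetrics("eu-west", "203.0.113.10", False, 0.0, 0.0, 0.0),
]
print(best_site(sites))  # -> ('192.0.2.10', 30): lowest combined score
```

Extending this to the multiple-A-record mode discussed next is a matter of returning the N best candidates instead of one, e.g. heapq.nsmallest(2, candidates, key=score).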
An intelligent GSLB also has the option of sending multiple A records, allowing delivery of content from the best-performing sites. For example, say an enterprise wants to deliver its content using ten sites and wants to provide high availability. Using sophisticated health checking, the GSLB can determine the two best-performing sites and return their addresses to users. The GSLB tracks each site's performance and sends back the best sites based on current network and site conditions (context) for every resolution. Slow sites, or sites that are down, are never sent back to the user.

What about the issues with DNS browser caching? Of all the issues raised by Pete in his seminal work of 2004, this is likely the one that is still relevant. Browser technology has evolved, yes, but the underlying network stack and functionality have not, mainly because DNS itself has not changed much in the past ten years. Modern browsers may or may not honor the DNS TTL (evidence is conflicting, and documentation nailing it down scant), but they have at least reduced the caching interval on the client side. Depending on timing, this may result in a slight delay in the event of a failure while resolution catches up, but it does not have nearly the dramatic negative impact it once had. In the early days, a delay of 15 minutes could be expected; today that delay can generally be counted in seconds. It is, admittedly, still a delay, but one that is certainly more acceptable to most business stakeholders and customers alike. And while the issue of DNS browser caching is still technically accurate, it is not all that relevant: the same solution Pete suggests to address the issue, RR DNS, has always been available as an option for GSLB. Any technology, when not configured to address the issue for which it was implemented, can be considered a failure. That is not a strike against the technology, but against the particular implementation. The instances of browser caching impacting site availability and performance are minimal in most cases, and for those organizations for which such instances would be completely unacceptable, it is simply a matter of mitigation using the proper policies and configuration.

SUMMARY
What it comes down to is that Pete, in his paper, is pushing for the use of Round-Robin DNS (RR DNS). Modern Global Server Load Balancing (GSLB) solutions fully support this option today and, generally speaking, always have. However, the focus on the technical aspects completely ignores the impact of business requirements and agreements, and does not take into consideration the functions and features over and above standard DNS that assist in supporting those requirements. For example, health checking has come a long way since 2004; it includes not only simple up-down indicators, or even performance-based indicators, but can now incorporate a full range of contextual variables into the equation. Location, client type, client network, data-center network, capacity: all these parameters can be leveraged to perform "health" checks that enable a more accurate and ultimately more adaptable decision. Interestingly, standard DNS servers leveraged to implement a GSLB solution neither are capable of nor provide the means by which such health checks can be implemented. Such "health monitoring" is, however, a standard offering for GSLB solutions.
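The client-facing half of all this, multiple A records and the TTL the resolver hands back, is easy to observe for yourself. A minimal sketch, assuming the third-party dnspython package is installed and that the queried name (a placeholder here) actually publishes multiple A records:

```python
import dns.resolver  # third-party package: dnspython

# Ask the local resolver (the LDNS from the article) for A records.
# The hostname is a placeholder; substitute a name you control.
answer = dns.resolver.resolve("www.example.com", "A")

print(f"TTL handed back by the resolver: {answer.rrset.ttl}s")
for record in answer:
    print("A record:", record.address)

# Repeating the query before the TTL expires typically returns the same
# rrset with a decremented TTL, which is the caching behavior discussed
# above; after expiry the LDNS resolves again and may get a new answer.
```

Running this against a GSLB-fronted name shows exactly what an LDNS sees: the record set chosen for you at that moment, and the window during which a failed site can still be cached.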
NEW FACTORS to CONSIDER
Given the dynamism inherent not only in local data centers but in global implementations, and the inclusion of cloud computing and virtualization, GSLB must also provide the means by which management, maintenance, and process automation can be accomplished. Traditional DNS solutions like BIND provide no such means of control, nor are they able to participate in the collaborative processes necessary to automate the migration and capacity-fulfillment functions for which virtualization and cloud computing are and will be used. Thus a simple RR DNS implementation may be desirable, but the solution through which it is implemented must be more modern and capable of addressing management and business concerns as well. These are the "functions and features" over and above standard DNS servers that provide value to organizations, regardless of the technical details of the algorithms and methods used to distribute DNS records.

Additionally, traditional DNS solutions, while supporting new security initiatives like DNSSEC, are less able to handle such initiatives in a dynamic environment. A GSLB must be able to provide dynamic signing of records in order to support DNSSEC while continuing to perform global server load balancing. DNSSEC introduces a variety of challenges for GSLB that cannot be easily or efficiently addressed by standard DNS services. Modern GSLB solutions can and do address these challenges while enabling integration and support for other emerging data-center models that make use of cloud computing and virtualization.

This skeleton is sure to creep out of the closet yet again in a few years, primarily because DNS itself is not changing. Extensions such as DNSSEC occasionally crop up to address issues that arise over time, but the core principles upon which DNS has always operated are still true and are likely to remain true for some time. What has changed are the data-center architectures, technology, and business requirements that IT organizations must support, in part through the use of DNS and GSLB. The fact is that GSLB does work, and modern GSLB solutions provide the means by which both technical and business requirements can be met while simultaneously addressing new and emerging challenges associated with the steady march of technology toward a dynamic data center.
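Why does per-query answer selection collide with DNSSEC? Pre-signed static zones assume the answer set is fixed in advance, while a GSLB picks the answer at resolution time, so the signature has to be produced (or pre-computed per candidate) on the fly. A toy sketch of the idea follows; this is not the real DNSSEC RRSIG wire format or key management, just an Ed25519 signature over a dynamically chosen record, using the third-party cryptography package:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Stand-in for a zone-signing key; real DNSSEC keys are published in
# the zone as DNSKEY records and managed with rollover procedures.
zone_key = Ed25519PrivateKey.generate()

def answer_with_signature(name: str, chosen_address: str, ttl: int):
    """Sign the answer *after* the GSLB has chosen it. A statically
    pre-signed zone can't do this, because the chosen rrset differs
    per query (health, geolocation, load, and so on)."""
    rrset = f"{name} {ttl} IN A {chosen_address}".encode()
    signature = zone_key.sign(rrset)  # toy stand-in for an RRSIG
    return rrset, signature

rrset, sig = answer_with_signature("www.example.com.", "192.0.2.10", 30)
zone_key.public_key().verify(sig, rrset)  # raises InvalidSignature if tampered
print("signed answer:", rrset.decode())
```

The operational point stands regardless of the toy format: a GSLB that cannot sign dynamically must either stop varying its answers or break validation for DNSSEC-aware resolvers.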
So You Put an Application in the Cloud. Now what?
We need to stop thinking of cloud as an autonomous system and start treating it as part of a global application delivery architecture.

When you decided you needed another garage to house that third car (the one your teenager is so proud of) you probably had a couple of choices in architecture. You could build a detached garage that, while connected to your driveway, was not connected to any existing structure, or you could ensure that the new garage was in some way attached to the house or the existing garage. In both cases the new garage is part of your location, in that both are accessed (most likely) from the same driveway. The only real question is whether you want to extend your existing dwelling or not.

When you decide to deploy an application to the cloud you have a very similar decision: do you extend your existing dwelling, essentially integrating the environment with your own, or do you maintain a separate building that is still "on the premises" but not connected in any way except that it's accessible via your shared driveway? In both cases the cloud-deployed application is still located at your "address" (or should be), and you'll need to ensure that it looks to consumers of that application like just another component of your data center.

THE OFF-SITE GARAGE
Global application delivery (a.k.a. Global Server Load Balancing) has been an integral part of multi-data-center deployment models for many years. Whether a secondary or tertiary data center is leveraged for business continuity, a.k.a. "OMG our main site is down," or as a means to improve the performance of applications for a more global user base is irrelevant. In both cases all "sites" have been integrated to appear as a single, seamless data center through the use of global application delivery infrastructure. So why, when we start talking about "cloud," do we treat it as some external, disconnected entity rather than as the extension of your data center that it is?

Like building a new garage, you have a couple of choices in architecture. There is, of course, the continued treatment of a cloud-deployed application as some external entity that is not under the management or control of the organization. That's like using an off-site garage. That doesn't make a lot of sense (unless your homeowners association has judged the teenager's pride and joy an eyesore and forbidden it to be parked on premises), and neither does it make a lot of sense to do the same with a cloud-deployed application. You need, at a minimum, the ability to direct customers/users to the application in whatever situation you find yourself using it: backup, failover, performance, geolocation, on-demand bursting. Even if you're only using off-premise cloud environments today for development or testing, it may be that in the future you'll want to leverage the on-demand nature of off-premise cloud computing for more critical business cases such as failover or bursting. In those cases a completely separate, unmanaged (in that you have no real operational control) off-premise cloud is not going to provide the control necessary for you to execute successfully on such an initiative. You need something more integrated, something more strategic than tactical. Instead, you want to include cloud as part of your greater, global (multi-site) application delivery strategy. It's either detached or attached, but in both cases it is just an extension of your existing property.
ATTACHED CLOUD
In the scenario in which the cloud is "attached" to your data center, it actually becomes an extension of your existing architecture. This is the "private virtual cloud" scenario, in which the resources provisioned in a public cloud computing environment are not directly accessible to the general Internet public. In fact, customers/users should have no idea that you are leveraging public cloud computing, as the resources are obfuscated by the many-to-one virtualization offered by an application delivery controller (load balancer). The data center is extended and connected to this pool of resources in the cloud via a secured (encrypted) and accelerated tunnel that bridges the network layer and provides whatever routing may be necessary to treat the remote application instances as local resources. This is simply a resource-focused use of a VPN (virtual private network), of the kind often used to integrate remote offices with the corporate data center, as opposed to individual use of VPNs to access a corporate network. Amazon, for example, uses IPsec as a means to integrate resources allocated in its environment with your data center, but other cloud computing providers may offer SSL, or a choice of either. In the case that the provider offers no option, it may be necessary to deploy a virtual VPN endpoint in the cloud in order to achieve this level of seamless connectivity. Once the cloud resources are "attached," they can be treated like any other pool of resources by the application delivery controller (load balancer). [This is depicted by connection (2) in the diagram.]

DETACHED CLOUD
A potentially simpler exercise (in both the house and cloud scenarios) is to treat the cloud-deployed resources as "detached" from the core networking and data-center infrastructure, and to integrate the applications served by those resources at the global application delivery layer. [This is depicted by connection (1) in the diagram.] In this scenario the application delivery network and the resources it manages are all deployed within an external cloud environment and can be accessed publicly (if one were to determine which public IP address was fronting them, of course). You don't want users/customers accessing those resources by some other name (you'd prefer www.example.com/remoteapp over 34.35.164.4-cloud1.provider.com, of course), and furthermore you want to be able to decide when a customer will use the detached cloud and when they will use local data-center resources. Even if the deployed application is new and no copy exists in the local data center, you still want to provide a consistent corporate naming scheme to ensure brand identity and trust that the application is yours. Regardless, the detached cloud resources require a means by which customers/users can be routed to them; hence the use of global application delivery infrastructure. In this case users attempt to access www.example.com/remoteapp and are provided with an IP address that is either local (in your data center) or remote (in a detached cloud environment). This resolution may be static, in that it does not change based on user location, application capacity, or performance, or it may take into consideration such variables as are available to it: location, performance, security, device, and so on (context).
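Here's a minimal sketch of that kind of context-aware resolution decision. The site table, regions, and health flags are hypothetical, and a real global application delivery controller would weigh far more variables than client location:

```python
# Hypothetical endpoints for www.example.com: one local data center,
# one detached-cloud deployment.
SITES = {
    "datacenter": {"address": "192.0.2.10",   "region": "us", "healthy": True},
    "cloud":      {"address": "203.0.113.10", "region": "eu", "healthy": True},
}

def resolve(client_region: str) -> str:
    """Prefer a healthy site in the client's region; otherwise fall back
    to any healthy site. This is the 'context' decision from the article,
    reduced to a single variable (location) for illustration."""
    healthy = {name: s for name, s in SITES.items() if s["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy sites")
    for site in healthy.values():
        if site["region"] == client_region:
            return site["address"]
    return next(iter(healthy.values()))["address"]

print(resolve("eu"))  # eu client lands on the detached cloud: 203.0.113.10
print(resolve("us"))  # us client lands on the local data center: 192.0.2.10
```

The point of the sketch is that the same corporate name resolves differently per request, which is exactly what a static DNS record cannot do.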
Yes, you could just slap a record in your DNS of choice and resolve the issue. That does not, however, lay a foundation for more dynamic and flexible integration of off-premise cloud-deployed applications in the future.

FOUR REASONS to LEVERAGE GLOBAL APPLICATION DELIVERY
There are many reasons to include a global load balancing architecture in a global application delivery strategy, but these four stand out as the ones that provide the most benefit to both the organization and the user. (A sketch of the configuration-indirection point in the first reason appears after the list.)

1. Avoid unnecessary application changes due to changes in providers. If all your images, or a certain type of content, are served by applications deployed in an external cloud computing environment, normalizing your global namespace eliminates the need to change the application references to that namespace when changing providers. The change is made at the global application delivery layer and is propagated quickly. This eliminates a form of vendor lock-in that is rarely remembered until a change in providers is desired. Developers should never be codifying domain names in applications, but legacy and third-party applications still need support, and these often draw their name and information from configuration files that effectively codify the operational location of the server and application. These configurations matter less when the platforms are deployed behind a global application delivery controller and virtualized.

2. Normalization of global namespaces preserves corporate identity and enables trust. Applications served by a trusted domain are desirable in an age when phishing and malicious code often redirect users to oddly named domains for the purpose of delivering a malware payload. Global application delivery normalizes global domain namespaces and provides a consistent naming scheme for applications regardless of physical deployment location.

3. Enhances decision-making processes. Leveraging global application delivery enables more control over the use of resources at the user level as well as at the business and technical layers. Decisions regarding which resources will be used, by whom, and when are the purview of global application delivery controllers (GSLB), giving the organization the flexibility to determine which application instance is best suited to serve any given request based on the context of the request.

4. Foundational component. Like load balancing, global load balancing (application delivery) is a foundational component of a well-rounded cloud computing architecture. It provides the means by which the first integrations of off-site cloud computing will be accomplished, e.g. cloud bursting, and lays the foundation upon which more advanced location-selection algorithms will be applied, e.g. cloud balancing. Without an intelligent, integrated global application delivery strategy it will be difficult to implement and execute on strategies which leverage external and internal cloud computing deployments that are more application- and business-focused.

External (detached) cloud computing environments need not be isolated (silo'd) from the rest of your architecture. A much better way to realize the benefits associated with public cloud computing is to incorporate them into a flexible, global application delivery strategy that leverages existing architectural principles and best practices to produce an integrated, collaborative, and seamless application delivery architecture.
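As promised above, a minimal sketch of the configuration-indirection point from the first reason: the application reads its service endpoint from configuration under the corporate namespace rather than hard-coding a provider hostname. The environment-variable name and hostnames are hypothetical:

```python
import os

# The application only ever sees the corporate name. Which provider
# (or data center) actually answers is decided at the global
# application delivery layer, so switching providers needs no code change.
SERVICE_HOST = os.environ.get("IMAGE_SERVICE_HOST", "images.example.com")

def image_url(path: str) -> str:
    """Build URLs against the normalized corporate namespace, never
    against a provider-specific hostname like cloud1.provider.com."""
    return f"https://{SERVICE_HOST}/{path.lstrip('/')}"

print(image_url("/catalog/123.jpg"))
# -> https://images.example.com/catalog/123.jpg
```

The same indirection is what rescues legacy and third-party applications: their configuration files point at the stable corporate name, and the provider swap happens behind it.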
Disaster Recovery: Not Just for Data Centers Anymore
It's about business continuity between the customer or user and your applications, and you only have control over half that equation.

Back in the day (when they still let me write code) I was contracted to a global transportation firm where we had just completed the very first implementation of an Internet-enabled tracking system. We had five whole pilot customers and it was, to say the least, a somewhat fragile system. We were all just learning back then, after all, and if you think integration today is difficult and fraught with disaster, try adding in some CICS and some EDI-delivered data. Yeah, exactly. In any case it was Saturday and I was on call, and of course the phone rang. A customer couldn't access the application, so into the car I went and off to the office to see what was going on (remote access then wasn't nearly as commonplace as it is now, and yes, I drove uphill, both ways, to get there). The thing was, the application was working fine. No errors, nothing in the logs; I was able to log in and cruise around without any problems whatsoever. So I pulled out my network fu, and what did I find? A router down around Chicago. Bleh. Trying to explain that to the business owner was more painful than giving birth, and trust me, I've done both so I would know.

Through nearly a decade of use and abuse, the Internet itself has grown a lot more resilient, and outages of core routers rarely, if ever, happen. But localized services (and their infrastructure) do experience outages and interruptions, and there are just as many (if not more) folks out there who don't understand the difference between an "Internet problem" and an "application problem." And you won't have any better luck explaining it to them than I did a decade ago.

GLOBAL APPLICATION DELIVERY to the RESCUE (SOMETIMES)
When most people hear the words "disaster" and "recovery" together, their eyes either glaze over and they try to run from the room, or they get very animated and start talking about all the backup plans they have for when the alien spacecraft lands atop their very own data center, complete with industrial-strength tin-foil hat. Okay, okay. There are people for whom disaster recovery is an imperative and very important; we just hope they never have to execute their plans. The thing is that those plans have a broader, more generalized use than disasters alone; in fact they should probably be active all the time, addressing the external "disasters" that may occur on a daily basis.

Global application delivery (which grew out of global server load balancing (GSLB)) is generally the primary strategic component in any disaster recovery plan, especially when those plans involve multiple data centers. Global application delivery is also often leveraged to provide geolocation-based load balancing across multiple application deployments, as a way to improve performance and better distribute load across a global web application presence. If you expand that out just a bit, and maybe throw in some cloud, you can see that in a disaster recovery architecture global application delivery is pretty key. It's the decision maker, the router, the "thing" on the network that decides whether you are directed to site A or site B or site Z. Because many outages today are localized or regionalized, it's often the case that a segment of your user population can't access your applications or sites even though most others can.
It's possible, then, to leverage either regional clouds plus global application delivery, or multiple data centers plus global application delivery, to "route around" a localized outage and ensure availability for all users (mostly). I keep saying "mostly" and "sometimes" because no system is 100%, and there's always the chance that someone is going to be left out in the cold, no matter what. The thing is that a router outage somewhere in Chicago doesn't necessarily impact users trying to access your main data center from California or Washington. It does, however, impact those of us in the midwest for whom Chicago is the onramp to the Internet. If only the route between Chicago and the west coast is impacted, it's quite possible (and likely) that we can access sites hosted on the east coast. But if you have only one data center, on the west coast, well, then you are experiencing an outage, whether you realize it or not.

cloud computing MAKES MULTIPLE DATA CENTERS FEASIBLE
For many organizations dual data centers were never a possibility because, well, they're downright expensive to buy, build, and maintain. Cloud computing offers a feasible alternative (for at least some applications) and provides an opportunity to improve availability in the event of any disaster or interruption, whether it happens in your data center or five states away. You don't have control over the routes that make up the Internet and deliver users (and customers) to your applications, but you do have control over where you deploy those applications and how you respond to an interruption in connectivity across the globe. Global application delivery enables a multi-site implementation that can be leveraged for more than just disaster recovery; such implementations can be used for localized service disruptions, as overflow (cloud-bursting style) handling for seasonal or event-based spikes, or to assist in maintaining service in the face of a DDoS attack. "Disaster recovery" isn't just about the data center anymore; it's about the applications, the users, and maintaining connectivity between them, regardless of what might cause a disruption, and where.
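As a closing illustration, here's a minimal sketch of the kind of active health probe that lets a global application delivery tier route around a regional outage. The site addresses are placeholders (substitute endpoints you control), and a production monitor would probe the application layer, not just the TCP port:

```python
import socket

# Hypothetical per-site endpoints fronting www.example.com.
SITES = [
    ("west-dc", "192.0.2.10", 443),
    ("east-dc", "198.51.100.10", 443),
]

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Cheap TCP-level health probe. A real monitor would also issue an
    HTTP request and check the response, per the article's point that
    availability means more than mere reachability."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def answer() -> str:
    """Return the first reachable site; during a regional outage the dead
    site simply drops out of the rotation until it recovers."""
    for name, host, port in SITES:
        if is_reachable(host, port):
            return host
    raise RuntimeError("all sites down")

print(answer())
```

Run periodically, a probe like this is what turns "a router is down around Chicago" from a support call into a routine, automatic rerouting decision.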