Data Center Feng Shui: Fault Tolerance and Fault Isolation
Like most architectural decisions, fault tolerance and fault isolation are not mutually exclusive goals. The difference between the two is not necessarily intuitive, but it is profound and has a substantial impact on data center architecture.

Fault tolerance is an attribute of systems and architectures that allows them to continue performing their tasks in the event of a component failure. Fault tolerance of servers, for example, is achieved through redundancy in power supplies, hard drives, and network cards. In an architecture, fault tolerance is also achieved through redundancy, by deploying two of everything: two servers, two load balancers, two switches, two firewalls, two Internet connections. A fault-tolerant architecture includes no single point of failure; no component can fail and cause a disruption in service. Load balancing, for example, is a fault tolerance-based strategy that leverages multiple application instances to ensure that failure of one instance does not impact the availability of the application.

Fault isolation, on the other hand, is an attribute of systems and architectures that contains the impact of a failure such that only a single system, application, or component is affected. Fault isolation allows that a component may fail as long as it does not impact the overall system. That sounds like a paradox, but it’s not. Many intermediary devices employ a “fail open” strategy as a method of fault isolation. When a network device is required to intercept data in order to perform its task – a common web application firewall configuration – it becomes a single point of failure in the data path. To mitigate the potential failure of the device, if something should fail and cause the system to crash, it “fails open” and acts like a simple network bridge, forwarding packets on to the next device in the chain without performing any processing.
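The “fail open” behavior described above can be illustrated with a small sketch. This is a toy model, not how any particular device implements it (real appliances typically rely on hardware bypass); the names `forward`, `healthy_inspector`, and `crashed_inspector` are hypothetical and exist only for this example.

```python
from typing import Callable, Optional

# An inspector returns the packet to pass it on, or None to block it.
Inspector = Callable[[bytes], Optional[bytes]]


def forward(packet: bytes, inspector: Inspector, fail_open: bool = True) -> Optional[bytes]:
    """Return the packet to send downstream, or None if it is dropped."""
    try:
        return inspector(packet)
    except Exception:
        # The inspection engine itself has failed. Failing open means
        # behaving like a plain bridge: pass the packet along untouched.
        # Failing closed means dropping traffic until the device recovers.
        return packet if fail_open else None


def healthy_inspector(packet: bytes) -> Optional[bytes]:
    """Hypothetical inspection rule: block anything containing b'attack'."""
    return None if b"attack" in packet else packet


def crashed_inspector(packet: bytes) -> Optional[bytes]:
    """Stands in for an inspection engine that has crashed."""
    raise RuntimeError("inspection engine unavailable")


if __name__ == "__main__":
    print(forward(b"hello", healthy_inspector))
    print(forward(b"attack", healthy_inspector))
    print(forward(b"attack", crashed_inspector))
    print(forward(b"attack", crashed_inspector, fail_open=False))
```

The trade-off is visible in the third call: when the inspector has failed, traffic keeps flowing, but it flows uninspected. The failure is isolated to the device rather than tolerated by a redundant peer.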
If the same component were deployed in a fault-tolerant architecture, two devices would be deployed, ideally leveraging non-network-based failover mechanisms. Similarly, application infrastructure components are often isolated through a contained deployment model (like sandboxes) that prevents a failure – whether an outright crash or a sudden massive consumption of resources – from impacting other applications. Fault isolation is of increasing interest as it relates to cloud computing environments, as part of a strategy to minimize the perceived negative impact of shared network, application delivery network, and server infrastructure.

Dispelling the New SSL Myth
Claiming SSL is not computationally expensive is like saying gas is not expensive when you don’t have to drive to work every day.

My car is eight years old this year. It has less than 30,000 miles on it. Yes, you heard that right, less than 30,000 miles. I don’t drive my car very often because, well, my commute is a short trip down two flights of stairs. I don’t need to go very far when I do drive; it’s only ten miles or so round trip to the grocery store. So from my perspective, gas isn’t really very expensive. I may use a tank of gas a month, which works out to … well, it’s really not even worth mentioning the cost. But for someone who commutes every day – especially someone who commutes a long distance every day – gas is expensive. It’s a significant expense every month for them and they would certainly dispute my assertion that the cost of gas isn’t a big deal. My youngest daughter, for example, would say gas is very expensive – but she’s got a smaller pool of cash from which to buy gas so, relatively speaking, we’re both right.

The same is true for anyone claiming that SSL is not computationally expensive. The way in which SSL is used – the ciphers, the certificate key lengths, the scale – has a profound impact on whether “computationally expensive” is an accurate statement. And as usual, it’s not just about speed – it’s also about the costs associated with achieving that performance. It’s about efficiency, and leveraging resources in a way that enables scalability. It’s not the cost of gas alone that’s problematic, it’s the cost of driving, which also has to take into consideration factors such as insurance, maintenance, tires, parking fees and other driving-related expenses.

MYTH: SSL is NOT COMPUTATIONALLY EXPENSIVE TODAY

SSL is still computationally expensive. Improvements in processor speeds have, in some circumstances, made that expense less impactful. Circumstances are changing.
Commoditized x86 hardware can in fact handle SSL a lot better today than it ever could before – when you’re using 1024-bit keys and “easy” ciphers like RC4. Under such parameters it is true that commodity hardware may perform efficiently and scale up better than ever when supporting SSL. Unfortunately for proponents of SSL-on-the-server, 1024-bit keys are no longer the preferred option, and security professionals are likely well aware that “easy” ciphers are also “easy” pickings for miscreants.

In January 2011, NIST recommendations regarding the deployment of SSL went into effect. While NIST is not a standards body that can require compliance “or else,” it can and does force government and military compliance, and it has shown its influence with commercial certificate authorities. All commercial certificate authorities now issue only certificates with 2048-bit keys. This increase has a huge impact on the capacity of a server to process SSL and renders completely inaccurate the statement that SSL is not computationally expensive anymore. A typical server that could support 1500 TPS using 1024-bit keys will only support 1/5 of that (around 300 TPS) when supporting modern best practices, i.e. 2048-bit keys. Also of note is that NIST recommends ephemeral Diffie-Hellman - not RSA - for key exchange, and per the TLS 1.0 specification, AES or 3DES-EDE-CBC, not RC4. These are much less “easy” ciphers than RC4, but unfortunately they are also more computationally intense, which also has an impact on overall performance.

Key length and ciphers become important to the performance and capacity of SSL not just during the handshaking process, but in bulk-encryption rates. It is one thing to say a standard server deployed to support SSL can handle X handshakes (connections) and quite another to simultaneously perform bulk-encryption on subsequent data responses.
The size and number of those responses have a huge impact on the consumption rate of resources when performing SSL-related functions on the overall server’s capacity. Larger data sets require more cryptographic attention that can drag down the rate of encryption – that means slower response times for users and higher resource consumption on servers, which decreases resources available for handshaking and server processing and cascades throughout the entire system to result in a reduction of capacity and poor performance. Tweaked configurations, poorly crafted performance tests, and a failure to consider basic mathematical relationships may seem to indicate SSL is “not” computationally expensive yet this contradicts most experience with deploying SSL on the server. Consider this question and answer in the SSL FAQ for the Apache web server: Why does my webserver have a higher load, now that it serves SSL encrypted traffic? SSL uses strong cryptographic encryption, which necessitates a lot of number crunching. When you request a webpage via HTTPS, everything (even the images) is encrypted before it is transferred. So increased HTTPS traffic leads to load increases. This is not myth, this is a well-understood fact – SSL requires higher computational load which translates into higher consumption of resources. That consumption of resources increases with load. Having more resources does not change the consumption of SSL, it simply means that from a mathematical point of view the consumption rates relative to the total appear to be different. The “amount” of resources consumed by SSL (which is really the amount of resources consumed by cryptographic operations) is proportional to the total system resources available. The additional consumption of resources from SSL is highly dependent on the type and size of data being encrypted, the load on the server from both processing SSL and application requests, and on the volume of requests. 
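The capacity figures cited earlier (roughly 1500 TPS per server with 1024-bit keys versus around 300 TPS with 2048-bit keys) translate directly into server counts. The sketch below works through that arithmetic; the target load of 6000 TPS and the function name `servers_needed` are assumptions chosen for illustration, and the calculation ignores bulk encryption and application processing, which would only widen the gap.

```python
import math

# Per-server handshake capacity, taken from the figures in the text.
TPS_PER_SERVER_1024 = 1500
TPS_PER_SERVER_2048 = 300


def servers_needed(target_tps: int, tps_per_server: int) -> int:
    """Smallest number of servers whose combined capacity meets target_tps."""
    return math.ceil(target_tps / tps_per_server)


if __name__ == "__main__":
    target = 6000  # hypothetical peak load, for illustration only
    print("1024-bit keys:", servers_needed(target, TPS_PER_SERVER_1024), "servers")
    print("2048-bit keys:", servers_needed(target, TPS_PER_SERVER_2048), "servers")
```

Under these assumptions the same load that fit on 4 servers needs 20 once key length doubles, which is the sense in which faster processors do not make the expense go away.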
Interestingly enough, the same improvements in capacity and performance of SSL associated with “modern” processors and architecture are also applicable to intermediate SSL-managing devices. Both their specialized hardware (if applicable) and general purpose CPUs significantly increase the capacity and performance of SSL/TLS encrypted traffic on such solutions, making their economy of scale much greater than that of server-side deployed SSL solutions.

THE SSL-SERVER DEPLOYED DISECONOMY of SCALE

Certainly if you have only one or even two servers supporting an application for which you want to enable SSL, the costs are going to be significantly different than for an organization that may have ten or more servers comprising such a farm. It is not just the computational costs that make SSL deployed on servers problematic, it is also the associated impact on infrastructure and the cost of management. Reports that fail to factor in the associated performance and financial costs of maintaining valid certificates on each and every server – and the management and creation of SSL certificates for ephemeral virtual machines – are misleading. Such solutions assume a static environment and a deep pocket, or perhaps less than ethical business practices. Such tactics attempt to reduce the capital expense associated with external SSL intermediaries by increasing the operational expense of purchasing and managing large numbers of SSL certificates – including having a ready store that can be used for virtual machine instances.

As the number of services for which you want to provide SSL-secured communication increases, and the scale of those services increases, the more costly it becomes to manage the required environment. Like IP address management in an increasingly dynamic environment, there is a diseconomy of scale that becomes evident as you attempt to scale the systems and processes involved.
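One small piece of that management burden, tracking which certificates are about to expire across a server farm, might be sketched as follows. The inventory format, the host names, and the function name `expiring_soon` are all hypothetical; a real deployment would read expiry dates from the certificates themselves rather than from a hand-maintained dictionary.

```python
from datetime import date, timedelta
from typing import Dict, List


def expiring_soon(inventory: Dict[str, date], today: date, window_days: int = 30) -> List[str]:
    """Return hosts whose certificate expires on or before today + window_days.

    Already-expired certificates are included, since they need attention too.
    """
    cutoff = today + timedelta(days=window_days)
    return sorted(host for host, expires in inventory.items() if expires <= cutoff)


if __name__ == "__main__":
    # Hypothetical inventory: one entry per server holding its own certificate.
    inventory = {
        "web01": date(2011, 6, 15),
        "web02": date(2012, 1, 1),
        "web03": date(2011, 5, 20),
    }
    print(expiring_soon(inventory, today=date(2011, 6, 1)))
```

Every server that terminates SSL adds an entry to this inventory, so the list to watch grows with the farm. With a single intermediary terminating SSL, the inventory shrinks to the certificates on that one device.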
DISECONOMY of SCALE #1: CERTIFICATE MANAGEMENT

Obviously the more servers you have, the more certificates you need to deploy. The costs associated with management of those certificates – especially in dynamic environments – continue to rise, and the possibility of missing an expiring certificate increases with the number of servers on which certificates are deployed. The promise of virtualization and cloud computing is to address the diseconomy of scale; the ability to provision a ready-to-function server, complete with the appropriate web or application stack, serving up an application for purposes of scale assumes that everything is ready. Unless you’re failing to properly provision SSL certificates, you cannot achieve this with a server-deployed SSL strategy. Each virtual image upon which a certificate is deployed must be pre-configured with the appropriate certificate and keys, and you can’t launch the same one twice. This has the result of negating the benefits of a dynamically provisioned, scalable application environment and unnecessarily increases storage requirements, because images aren’t small. Failure to recognize and address the management and resulting impact on other areas of infrastructure (such as storage and scalability processes) means ignoring completely the actual real-world costs of a server-deployed SSL strategy.

It is always interesting to note the inability of web servers to support SSL for multiple hosts on the same server, i.e. virtual hosts.

Why can't I use SSL with name-based/non-IP-based virtual hosts? The reason is very technical, and a somewhat "chicken and egg" problem. The SSL protocol layer stays below the HTTP protocol layer and encapsulates HTTP. When an SSL connection (HTTPS) is established Apache/mod_ssl has to negotiate the SSL protocol parameters with the client. For this, mod_ssl has to consult the configuration of the virtual server (for instance it has to look for the cipher suite, the server certificate, etc.).
But in order to go to the correct virtual server Apache has to know the Host HTTP header field. To do this, the HTTP request header has to be read. This cannot be done before the SSL handshake is finished, but the information is needed in order to complete the SSL handshake phase. Bingo!

Because an intermediary terminates the SSL session and then determines where to route the requests, a variety of architectures can be more easily supported without the hassle of configuring each and every web server – which must be bound to an IP address to support SSL in a virtual host environment. This isn’t just a problem for hosting and cloud computing providers; it is a common issue faced by organizations supporting different “hosts” across the domain for tracking, for routing, for architectural control. For example, api.example.com and www.example.com often end up on the same web server, but use different “hosts” for a variety of reasons. Each requires its own certificate and SSL configuration – and each must be bound to an IP address – making scalability, particularly auto-scalability, more challenging and more prone to the introduction of human error.

The OpEx savings in a single year from SSL certificate costs alone could easily provide an ROI justification for the CapEx of deploying an SSL device, before even considering the costs associated with managing such an environment. CapEx is a one-time expense while OpEx is recurring and expensive.

DISECONOMY of SCALE #2: CERTIFICATE/KEY SECURITY

The simplistic nature of the argument also fails to take into account the sensitive nature of keys and certificates and the regulatory compliance issues that may require hardware-based storage and management of those keys regardless of where they are deployed (FIPS 140-2 level 2 and above). While there are secure and compliant HSMs (Hardware Security Modules) that can be deployed on each server, this requires serious attention and an increase in management effort and skills to deploy.
The alternative is to fail to meet compliance (not acceptable for some) or simply deploy the keys and certificates on commoditized hardware (which increases the risk of theft, which could lead to far more impactful breaches). For some IT organizations to meet business requirements they will have to rely on some form of hardware-based solution for certificate and key management, such as an HSM or FIPS 140-2 compliant hardware. The choices are to deploy on every server (note this may become very problematic when trying to support virtual machines) or to deploy on a single intermediary that can support all servers at the same time, and scale without requiring additional hardware/software support.

DISECONOMY of SCALE #3: LOSS of VISIBILITY / SECURITY / AGILITY

SSL “all the way to the server” has a profound impact on the rest of the infrastructure, too, and the scalability of services. Encrypted traffic cannot be evaluated or scanned or routed based on content by any upstream device. IDS and IPS and even so-called “deep packet inspection” devices upstream of the server cannot perform their tasks upon the traffic because it is encrypted. The solution is to deploy the certificates from every machine on the devices such that they can decrypt and re-encrypt the traffic. Obviously this introduces unacceptable amounts of latency into the exchange of data, but the alternative is to not scan or inspect the traffic, leaving the organization open to potential compromise.

It is also important to note that encrypting “bad” traffic, e.g. malicious code, malware, phishing links, etc., does not change the nature of that traffic. It’s still bad; it’s just now “hidden” from every piece of security infrastructure that was designed and deployed to detect and stop it.

A server-deployed SSL strategy eliminates visibility and control and the ability to rapidly address both technical and business-related concerns. Security is particularly negatively impacted.
Emerging threats such as a new worm or virus for which AV scans have not yet been updated can be immediately addressed by an intelligent intermediary – whether as a long-term solution or stop-gap measure. Vulnerabilities in security protocols themselves, such as the TLS man-in-the-middle attack, can be immediately addressed by an intelligent, flexible intermediary long before the actual solutions providing the service can be patched and upgraded.

A purely technical approach to architectural decisions regarding the deployment of SSL or any other technology is simply unacceptable in an IT organization that is actively trying to support and align itself with the business. Architectural decisions of this nature can have a profound impact on the ability of IT to subsequently design, deploy and manage business-related applications and solutions and should not be made in a technical or business vacuum, without a full understanding of the ramifications.

The Anatomy of an SSL Handshake [Network Computing]
Get Ready for the Impact of 2048-bit RSA Keys [Network Computing]
SSL handshake latency and HTTPS optimizations [semicomplete.com]
Black Hat: PKI Hack Demonstrates Flaws in Digital Certificate Technology [DarkReading]
SSL/TLS Strong Encryption: FAQ [apache.org]
The Open Performance Testing Initiative
The Order of (Network) Operations
Congratulations! You do no nothing faster than anyone else!
Data Center Feng Shui: SSL
WILS: SSL TPS versus HTTP TPS over SSL
F5 Friday: The 2048-bit Keys to the Kingdom
TLS Man-in-the-Middle Attack Disclosed Yesterday Solved Today with Network-Side Scripting

F5 Friday: How to Stop Running the Project Gauntlet of Doom with DevOps
#devops #video #puppet #stack RedHat puppetizes its infrastructure to reduce inefficiency and streamline deployment I could cite various studies, pundits, and research to prove that automation improves the overall success rates of continuous deployment efforts, mitigates outages caused by human error and misconfiguration, and generally makes the data center smell minty fresh, but we already know all of that. It's generally understood that there is a general need for devops, for continuous deployment and lifecycle management systems enabled by frameworks like Puppet. What we don't often consider is that the process of "puppetizing" production (and pre-production) environments affords organizations the opportunity to re-evaluate existing network and application architecture and eliminate inefficiencies that exist there, as well. Many an architecture has been built upon legacy application network infrastructure, for example, that cannot be – easily or otherwise – puppetized and thus serves as an impediment to realizing the full benefits of operational automation. Sure, you could automate everything but those components that can't be integrated, but that leaves open the possibility for misconfiguration, for fat-finger errors, for fragmentation of ownership that is often the source of delays in moving releases through various environments and into production. RedHat, for example, maintains 4-7 pre-production environments through which it used to take 2-6 weeks to cycle a new release. Part of that time was caused by what RedHat veteran Bret McMillan called the "Project Gauntlet of Doom", which required collaboration across an increasingly fragmented set of groups responsible for managing the manual configuration of various infrastructure components – both hardware and software. What the fine folks at RedHat discovered when they decided to puppetize their environments to eliminate the fragmentation was a great deal of inefficiency in the existing architecture. 
Multiple tiers of proxies were performing fairly standard application networking functions: SSL termination, header manipulation, compression, URL rewrites and redirects and, of course, load balancing. Approximately 10% of the virtual machines in the infrastructure were performing what RedHat considered "low business value" functions – primarily because business (and end-users) don't directly realize the impact of such services, only the end-result of performance, security and availability.

What they found was that they could (1) consolidate 3 layers of services, (2) puppetize their entire environment, and (3) eliminate service-ip sprawl caused by the growing demands on IP addresses required to scale out the proxy layers. In other words, the desire for process automation through DevOps led to a streamlined, more efficient delivery architecture that benefited operations, customers, and the business.

Because BIG-IP could provide an integrated service platform upon which multiple application-layer services could be deployed, the need for multiple layers of proxies and load balancers was eliminated, reducing the number of devices that traffic had to traverse. Performance improved, as did the consistency of operations, both by providing a common, integrated platform management interface at the BIG-IP for all application network layer services and through Puppet for configuration and continuous deployment management. It's a win-win that effectively eliminates the need to run the Project Gauntlet of Doom during release cycles, because Puppet automates the processes and eliminates the manual collaboration previously required. The end result is faster … everything, across the board.

A presentation from PuppetConf 2012, "Managing F5 LTM with Puppet - Matthew Carpenter and Bret McMillan of Red Hat", is available in video format, including greater detail and color commentary.

Happy Automating!

F5 Friday: Lessons from (IT) Geese
Birds migrate in flocks, which means every individual has the support of others. IT often migrates alone – but it doesn’t have to.

“Lessons from Geese” has been around a long time. It is often cited and referenced, particularly with respect to teamwork and collaboration. The very first “lesson” learned from geese migrations applied to human collaboration is this:

Fact #1: As each goose flaps its wings, it creates an uplift for the others behind it. By flying in a "V" formation, the whole flock adds 71% greater flying range than if each bird flew alone. Lesson: People who share a common direction and sense of community can get where they are going quicker and easier because they are traveling on the thrust of another.

That’s probably not surprising at all and the basic lesson is one we’re all familiar with, no doubt.

Fact #3: When the lead goose tires, it rotates back into the formation and another goose flies to the point position. Lesson: It pays to take turns doing the hard tasks and sharing leadership; as with geese, people are interdependent on each other’s skills, capabilities and unique arrangement of gifts, talents or resources.

This lesson works well if every goose is similarly talented at flying. But within IT there are myriad skill sets being used that must come together to migrate implementations from one version to another. It’s not just software – it’s data stores, identity stores, switches, and application delivery systems. There are a lot of different skills required to successfully migrate large, business-critical systems. And we can’t just pick a random goose to lead when it comes to migrating specific subsets and components; we need experts in various systems to assist. And sometimes, we don’t have the right goose. So we have to find one.

“A Plan-Net survey found that 87% of organizations are currently using Exchange 2003 or earlier. 
There has been a reluctance to adopt the 2007 version, often considered to be the Vista of the server platform — faulty and dispensable.” -- 10 reasons to migrate to Exchange 2010

This doesn’t explain a reluctance to move to Exchange 2010. With larger mailboxes, virtualization support, voicemail transcription, and higher availability, what’s not to like? Significant changes in the underlying architecture – which cascade into the infrastructure – may be one reason for the reluctance. Upgrading a business-critical service like Exchange requires more planning and forethought than upgrading to the latest version of Angry Birds, after all. Continuity of service is required even as the new version is put in place. And while there are plenty of experts who can help with the migration of Exchange, there are fewer who can help with the migration of its supporting infrastructure services.

F5 has an answer for that, a skilled goose, if you will, who can take the lead and keep the organization on track.

Introducing: F5 Architecture Design for Microsoft Exchange Service

The F5 Architecture Design for Microsoft Exchange service comprises an intense three days of discussion, information gathering, analysis and knowledge-sharing of network considerations for the optimal deployment of Microsoft Exchange in an F5 network environment. F5 Professional Services consultants with Exchange expertise conduct assessments during which they review your current network and future needs to streamline your new implementation, upgrade or migration to your preferred version of Microsoft Exchange.

Plan

During the project kick-off call, F5 Professional Services consultants make sure to understand your overall project goals, flag dependencies, and validate that all questionnaires and information requirements have been addressed prior to the initiation of the engagement.
Analyze

The F5 Architecture Design for Microsoft Exchange Service facilitates the discussion, analysis and development of the network architecture requirements that best support your Exchange deployment. The engagement starts with an overview and whiteboard discussion of F5 technology, focusing on topics of high availability, scalability, security and performance. Next, the consultants engage in conversations about mail deployment for legacy mail systems or new deployments, touching on sizing, security and service-level agreements. Finally, they review the architectural components specific to your environment, including network flows, client access, unified messaging, and considerations of single vs. multisite deployments.

Design and Report

The F5 Professional Services consultants consolidate the results from the analysis phase and deliver a Proposed Microsoft Exchange Network Architecture and a Proposed Network Migration Plan report detailing the recommendations. F5 consultants intimately understand F5 BIG-IP® systems and their operation, and can draw on the F5 Solutions for Microsoft Exchange Server. You can be assured of the thoroughness and relevance of their recommendations. The consultants’ reports provide you with the blueprint for flexible and cost-effective communication and collaboration in your organization.
For more information about the F5 Architecture Design for Microsoft Exchange service, use the search function on f5.com or contact consulting@f5.com.

Additional Resources:
Microsoft Exchange 2010: HELO New Architecture
Deploying F5 with Microsoft Exchange 2010
F5 solution for Microsoft Exchange
F5 Friday: BIG-IP Solutions for Microsoft Private Cloud
Webcast - BIG-IP v11 and Microsoft Technologies
Social Forums - F5/Microsoft Solutions
Eliminating Data Center Vertigo with F5 and Microsoft
F5 Friday: Microsoft and F5 Lync Up on Unified Communications
F5 Friday: Playing in the Infrastructure Orchestra(tion)

Never attribute to technology that which is explained by the failure of people
#cloud Whether it’s Hanlon or Occam or MacVittie, the razor often cuts both ways.

I am certainly not one to ignore the issue of complexity in architecture, nor do I dismiss lightly the risk introduced by cloud computing through increased complexity. But I am one who will point out absurdity when I see it, and especially when that risk is unfairly attributed to technology. Certainly the complexity introduced by attempts to integrate disparate environments, computing models, and networks will give rise to new challenges and introduce new risk. But we need to carefully consider whether the risk we discover is attributable to the technology or to simple failure by those implementing it.

Almost all of the concepts and architectures being “discovered” in conjunction with cloud computing are far from original. They are adaptations, evolutions, and maturation of existing technology and architectures. Thus, it is almost always the case that when a “risk” of cloud computing is discovered it is not peculiar to cloud computing at all, and thus likely has its roots in implementation, not the technology. This is not to say there aren’t new challenges or risks associated with cloud computing; there are and will be cloud-specific risks that must be addressed (IP Identity Theft was unknown before the advent of cloud computing). But let’s not make mountains out of molehills by failing to recognize those “new” risks that actually aren’t “new” at all, but rather are simply being recognized by a wider audience due to the abundance of interest in cloud computing models.

For example, I found this article particularly apocalyptic with respect to cloud and complexity on the surface. Digging into the “simple scenario”, however, revealed that the meltdown referenced was nothing new, and certainly wasn’t a technological problem – it was another instance of lack of control, of governance, of oversight, and of communication.
The risk is being attributed to technology, but is more than adequately explained by the failure of people.

The Hidden Risk of a Meltdown in the Cloud
Ford identifies a number of different possibilities. One example involves an application provider who bases its services in the cloud, such as a cloud-based advertising service. He imagines a simple scenario in which the cloud operator distributes the service between two virtual servers, using a power balancing program to switch the load from one server to the other as conditions demand. However, the application provider may also have a load balancing program that distributes the customer load. Now Ford imagines the scenario in which both load balancing programs operate with the same refresh period, say once a minute. When these periods coincide, the control loops start sending the load back and forth between the virtual servers in a positive feedback loop.

Could this happen? Yes. But consider for a moment how it could happen. I see three obvious possibilities:

1. IT has completely abdicated its responsibility to governing foundational infrastructure services like load balancing and allowed the business or developers to run amok without regard for existing services.
2. IT has failed to communicate its overarching strategy and architecture with respect to high-availability and scale in inter-cloud scenarios to the rest of the IT organization, i.e. IT has failed to maintain control (governance) over infrastructure services.
3. The left hand of IT and the right hand of IT have been severed from the body of IT and geographically separated with no means to communicate. Furthermore, each hand of IT wholeheartedly believes that the other is incompetent and will fail to properly architect for high-availability and scalability, thus requiring each hand to implement such services as required to achieve high-availability.
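The feedback loop in Ford's scenario can be reproduced with a toy simulation. This is a deliberately simplified model under stated assumptions, not a description of any real load balancer: two servers, two independent controllers, and each controller tries to equalize load by moving half the observed difference. The function names `rebalance_step` and `simulate` are hypothetical.

```python
from typing import List, Tuple

Loads = Tuple[float, float]


def rebalance_step(loads: Loads, controllers: int, synchronized: bool) -> Loads:
    """Apply one refresh period of rebalancing to the pair of server loads."""
    a, b = loads
    if synchronized:
        # Every controller reads the same snapshot and acts on it without
        # knowing the others are doing the same, so the correction is
        # applied once per controller.
        shift = (a - b) / 2
        total = shift * controllers
        return (a - total, b + total)
    # Staggered: each controller sees the result of the previous one.
    for _ in range(controllers):
        shift = (a - b) / 2
        a, b = a - shift, b + shift
    return (a, b)


def simulate(initial: Loads, ticks: int, synchronized: bool, controllers: int = 2) -> List[Loads]:
    """Return the load history, starting with the initial state."""
    history = [initial]
    loads = initial
    for _ in range(ticks):
        loads = rebalance_step(loads, controllers, synchronized)
        history.append(loads)
    return history


if __name__ == "__main__":
    print("synchronized:", simulate((80.0, 20.0), 4, synchronized=True))
    print("staggered:   ", simulate((80.0, 20.0), 4, synchronized=False))
```

In this model, two synchronized controllers overcorrect by exactly the size of the imbalance, so the load swaps between the servers every period and never settles. When the controllers are staggered, or when there is only one, the load settles after a single step. Nothing in the model is specific to cloud; it only requires two uncoordinated services acting on the same resources.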
While the third possibility might make a better “made for SyFy tech-horror” flick, the reality is likely somewhere between 1 and 2.

This particular scenario, and likely others, is not peculiar to cloud. The same lack of oversight in a traditional architecture could lead to the same catastrophic cascade described by Ford in the aforementioned article. Given a load balancing service in the application delivery tier, and a cluster controller in the application infrastructure tier, the same cascading feedback loop could occur, causing a meltdown and inevitably downtime for the application in question. Astute observers will conclude that an IT organization in which both a load balancing service and a cluster controller are used to scale the same application has bigger problems than duplicated services and a failed application.

This is not a failure of technology, nor is it caused by excessive complexity or lack of transparency within cloud computing environments. It’s a failure to communicate, to control, to oversee the technical implementation of business requirements through architecture. That’s a likely conclusion before we even start considering an inter-cloud model with two completely separate cloud providers sharing access to virtual servers deployed in one or the other – maybe both? Still, the same analysis applies – such an architecture would require willful configuration and knowledge of how to integrate the environments. Which ultimately means a failure on the part of people to communicate.

THE REAL PROBLEM

The real issue here is failure to oversee – control – the integration and use of cloud computing resources by the business and IT. There needs to be a roadmap that clearly articulates what services should be used and in what environments. There needs to be an understanding of who is responsible for what services, where they connect, with whom they share information, and by whom they will (and can be) accessed.
Maybe I’m just growing jaded – but we’ve seen this lack of roadmap and oversight before. Remember SOA? It ultimately failed to achieve the benefits promised not because the technology failed, but because the implementations were generally poorly architected and governed. A lack of oversight and planning meant duplicated services that undermined the success promised by pundits. The same path lies ahead with cloud. Failure to plan and architect and clearly articulate proper usage and deployment of services will undoubtedly end with the same disillusioned dismissal of cloud as yet another over-hyped technology. Like SOA, the reality of cloud is that you should never attribute to technology that which is explained by the failure of people.

Hybrid Architectures Do Not Require Private Cloud
Oh, it certainly helps, but it’s not a requirement

Taking advantage of cloud-hosted resources does not require forklift re-architecture of the data center. That may sound nearly heretical but that’s the truth, and I’m not talking about just SaaS which, of course, has never required anything more than an Internet connection to “integrate” into the data center. I’m talking about IaaS and integrating compute and storage resources into the data center, whether it’s cloud-based or traditional or simply highly virtualized.

Extending the traditional data center using a hybrid model means being able to incorporate (integrate) cloud-hosted resources as part of the data center. For most organizations this means elasticity – expanding and contracting capacity by adding and removing remote resources to a data center-deployed application. Flexibility and cost savings drive this model, and the right model can realize the benefits of cloud without requiring wholesale re-architecture of the data center. That’s something that ought to please the 50% of organizations that, according to a 2011 CIO survey, are interested in cloud specifically to increase capacity and availability. Bonus: it also serves to address other top drivers identified in the same survey of reducing IT management and maintenance as well as IT infrastructure investment. Really Big Bonus? Most organizations probably have the means by which they can achieve this today.

LEVERAGING CLOUD RESOURCES FROM A TRADITIONAL DATA CENTER

Scalability requires two things: resources and a means to distribute load across them. In the world of application delivery we call the resources “pools” and the means to distribute load across them an application delivery controller (load balancing service, if you prefer).
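The two ingredients named above, a pool of resources and a means of distributing load across it, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than how any particular application delivery controller works: the `Member` and `Pool` names, the `location` label, and the choice of round robin are all hypothetical. The point is that members can be added or removed between requests without the caller noticing, and that the distribution logic never inspects where a member lives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    location: str  # e.g. "local" or "cloud"; informational only, never used for selection


class Pool:
    """Round-robin pool whose membership can change between requests."""

    def __init__(self):
        self._members = []
        self._next = 0

    def add(self, member):
        if member not in self._members:
            self._members.append(member)

    def remove(self, member):
        if member in self._members:
            self._members.remove(member)

    def pick(self):
        if not self._members:
            raise RuntimeError("pool is empty")
        member = self._members[self._next % len(self._members)]
        self._next += 1
        return member


pool = Pool()
pool.add(Member("app1", "local"))
pool.add(Member("app2", "local"))
pool.add(Member("cloud1", "cloud"))      # remote capacity joins the same pool
print([pool.pick().name for _ in range(6)])
pool.remove(Member("cloud1", "cloud"))   # and leaves it again when no longer needed
print([pool.pick().name for _ in range(4)])
```

A production service would also track health and connection counts; the sketch leaves those out to keep the pool abstraction visible.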
The application delivery tier, where the load balancing service resides topologically in the data center, is responsible for not only distributing load across resources but for being able to mitigate failure without disrupting the application service. That goes for elasticity, too. It should be possible to add and remove (intentionally through provisioning processes or unintentionally through failure) resources from a given pool without disrupting the overall application service. This is the primary business and operational value brought to an organization by load balancing services: non-disruptive (or seamless or transparent if you prefer more positive marketing terminology) elasticity. Yes, the foundations of cloud have always existed and they’re in most organizations’ data centers today.

Now, it isn’t that hard to imagine how this elasticity can extend to integrate cloud-hosted resources. Such resources are simply added to or removed from the load balancing service’s “pool” of resources without disruption. The application delivery controller does not care whether the resources in the pool are local or remote, traditional or cloud, physical or virtual. Resources are resources. So whether the data center is still very traditional (physical-based), has moved into a highly virtualized state, or has gone all the way to cloud is really not relevant to the application delivery service. All resources can be operationally managed consistently by the application delivery controller.

To integrate cloud-based resources into the architecture requires only one thing: connectivity. The connectivity between a data center and the “cloud” is generally referred to as a cloud bridge (or some variation thereof).
This cloud bridge has the responsibility of connecting the two worlds securely and providing a network compatibility layer that “bridges” the two networks, implying a transparency that allows resources in either environment to communicate without concern for the underlying network topology. How this is accomplished varies from solution to solution, and there are emerging “virtual network encapsulation” technologies (think VXLAN and GRE) that are designed to make this process even smoother. Once a connection is established, and assuming network bridging capabilities, resources provisioned in “the cloud” can be non-disruptively added to the data center-hosted “pools” and from there, load is distributed as per the load balancing service’s configuration for the resource (application, etc.).

THE ROAD to CLOUD

There seems to be a perception in the market that you aren’t going to get to hybrid cloud until you have private cloud, which may explain the preponderance of survey respondents who are focused on private cloud with much less focus on public cloud. The road to “cloud” doesn’t require that you completely revamp the data center to be cloud-based before you can begin taking advantage of public cloud resources. In fact, a hybrid approach that integrates public cloud into your existing data center provides an opportunity to move steadily in the direction of cloud without being overwhelmed by the transformation that must ultimately occur.

A hybrid traditional-cloud based approach allows the organization to build the skill sets necessary, define the appropriate roles that will be needed, and understand the fundamental differences in operational models required to implement the automation and orchestration that ultimately brings to the table all the benefits of cloud (as opposed to just the cheaper resources). Cloud is a transformational journey – for both IT and the business – but it’s not one that can be taken overnight.
The pressure to “go cloud” is immense, today, but IT still needs the opportunity to evaluate both the data center and cloud environments for appropriateness and to put into place the proper policies and governance structure around the use of cloud resources. A strategy that allows IT to begin taking advantage of cloud resources now without wholesale rip-and-replace of existing technology provides the breathing room IT needs to ensure that the journey to cloud will be a smooth one, where the benefits will be realized without compromising on the operational governance required to assure availability and security of network, data, and application resources.

Cloud Bursting: Gateway Drug for Hybrid Cloud
The first hit’s cheap, kid…

Recently Ben Kepes started a very interesting discussion on cloud bursting by asking whether or not it was real. This led to Christofer Hoff pointing out that “true” cloud bursting required routing based on business parameters. That needs to be extended to operational parameters, but in general, Hoff’s on the mark in my opinion.

The core of the issue with cloud bursting, however, is not that requests must be magically routed to the cloud in an overflow situation (that seems to be universally accepted as part of the definition), but the presumption that the content must also be dynamically pushed to the cloud as part of the process, i.e. live migration. If we accept that presumption then cloud bursting is nowhere near reality. Not because live migration can’t be done, but because the time requirement to do so prohibits a successful “just in time” bursting approach. There is already a requirement that provisioning of resources in the cloud as preparation for a bursting event happen well before the event; it’s a predictive, proactive process, not a reactive one. The inclusion of live migration as part of the process would likely result in false provisioning events (where content is migrated prematurely based on historical trending which fails to continue and therefore does not result in an overflow situation).

So this leaves us with cloud bursting as a viable architectural solution to scale on-demand only if we pre-position content in the cloud, with the assumption that provisioning is a less time intensive process than migration plus provisioning. This results in a more permanent, hybrid cloud architecture.

THE ROAD to HYBRID

The constraints on the network today force organizations who wish to address their seasonal or periodic need for “overflow” capacity to pre-position the content in demand at a cloud provider.
This isn’t as simple as dropping a virtual machine in EC2; it also requires DNS modifications to be made and the implementation of the policy that will ultimately trigger the routing to the cloud campus. Equally important – actually, perhaps more important – is having the process in place that will actually provision the application at the cloud campus. In other words, the organization is building out the foundation for a hybrid cloud architecture.

But in terms of real usage, the cloud-deployed resources may only be used when overflow capacity is required. So they’re only used periodically. But as the application’s user base grows, so does the need for that capacity, and organizations will see those resources provisioned more and more often, until they’re virtually always on. There’s obviously an inflection point at which the use of cloud-based resources moves out of the realm of “overflow capacity” and into the realm of “capacity”, period. At that point, the organization is in possession of a full, hybrid cloud implementation.

LIMITATIONS IMPOSE the MODEL

Some might argue – and I’d almost certainly concede the point – that a cloud bursting model that requires pre-positioning in the first place is a hybrid cloud model and not the original intent of cloud bursting. The only substantive argument I could provide to counter is that cloud bursting focuses more on the use of the resources and not the model by which they are used. It’s the on-again off-again nature of the resources deployed at the cloud campus that makes it cloud bursting, not the underlying model. Regardless, existing limitations on bandwidth force the organization’s hand; there’s virtually no way to avoid implementing what is a foundation for hybrid cloud as a means to execute on a cloud bursting strategy (which is probably a more accurate description of the concept than tying it to a technical implementation, but I’m getting off on a tangent now).
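The policy that triggers routing to the cloud campus, together with the requirement that provisioning start ahead of the overflow, might look something like the following. It is a sketch under stated assumptions: `BurstPolicy`, its threshold values, and the use of a plain utilization threshold in place of trend-based prediction are all hypothetical simplifications. Content is assumed to be pre-positioned already, so the only decisions left are when to start cloud capacity and when to send overflow to it.

```python
from dataclasses import dataclass


@dataclass
class BurstPolicy:
    local_capacity: int        # concurrent requests the data center can serve
    provision_at: float = 0.7  # utilization at which cloud capacity is started ahead of need
    burst_at: float = 0.9      # utilization at which overflow is routed to the cloud

    def decide(self, local_active, cloud_ready):
        """Return (route, start_provisioning) for the next incoming request."""
        utilization = local_active / self.local_capacity
        # Provisioning is requested early, before overflow actually occurs.
        start_provisioning = utilization >= self.provision_at and not cloud_ready
        # Overflow goes to the cloud only if capacity there is already running.
        route = "cloud" if utilization >= self.burst_at and cloud_ready else "local"
        return route, start_provisioning


policy = BurstPolicy(local_capacity=100)
for active, ready in [(50, False), (75, False), (95, False), (95, True)]:
    print(active, ready, policy.decide(active, ready))
```

The third case shows why the gap between the two thresholds matters: if utilization reaches the burst point before the cloud campus is ready, requests have nowhere else to go.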
The decision to embark on a cloud bursting initiative, therefore, should be made with the foresight that it requires essentially the same effort and investment as a hybrid cloud strategy. Recognizing that up front enables a broader set of options for using those cloud campus resources, particularly the ability to leverage them as true “utility” computing, rather than an application-specific (i.e. dedicated) set of resources. Because of the requirement to integrate and automate to achieve either model, organizations can architect both with an eye toward future integration needs – such as those surrounding identity management, which continues to balloon as a source of concern for those focusing on SaaS and PaaS integration.

Whether or not we’ll solve the issues with live migration as a barrier to “true” cloud bursting remains to be seen. As we’ve never managed to adequately solve the database replication issue (aside from accepting eventual consistency as reality), however, it seems likely that a “true” cloud bursting implementation may never be possible for organizations who aren’t mainlining the Internet backbone.

The Cloud API is Pseudo-Consolidation of Infrastructure
It’s about operational efficiency and consistency, emulated in the cloud by an API to create the appearance of a converged platform

In most cases, the use of the term “consolidation” implies the aggregation (and subsequent elimination) of like devices. Application delivery consolidation, for example, is used to describe a process of scaling up infrastructure that often occurs during upgrade cycles. Many little boxes are exchanged for a few larger ones as a means to simplify the architecture and reduce the overall costs (hard and soft) associated with delivering applications. Consolidation.

But cloud has opened (or should have opened) our eyes to a type of consolidation in which like services are aggregated; a consolidation strategy in which we layer a thin veneer over a set of adjacent functionalities in order to provide a scalable and ultimately operationally consistent experience: an API. A cloud API consolidates infrastructure from an operational perspective. It is the bringing together of adjacent functionalities into a single “entity.” Through a single API, many infrastructure functions and services can be controlled – provisioning, monitoring, security, and load balancing (one part of application delivery) are all available through the same API. Certainly the organization of an API’s documentation segments services into similar containers of functionality, but if you’ve looked at a cloud API you’ll note that it’s all the same API; only the organization of the documentation makes it appear otherwise. This service-oriented approach allows for many of the same benefits as consolidation, without actually physically consolidating the infrastructure. Operational consistency is one of the biggest benefits.

OPERATIONAL CONSISTENCY

The ability to consistently manage and monitor infrastructure through the same interface – whether API or GUI or script – is an important factor in data center efficiency.
One of the reasons enterprises demand overarching data center-level monitoring and management systems like HP OpenView, CA, and IBM Tivoli is consistency and an aggregated view of the entire data center. It is no different in the consumer world, where the consistency of the same interface greatly enhances the ability of the consumer to take advantage of underlying services. Convenience, too, plays a role here, as a single device (or API) is ultimately more manageable than the requirement to use several devices to accomplish the same thing. Back in the day I carried a Blackberry, a mobile phone, and a PDA – each had a specific function and there was very little overlap among the three. Today, a single “smart”phone provides the functions of all three – and then some. The consistency of a single interface, a single foundation, is paramount to the success of such consumer devices. It is the platform, whether consumers realize it or not, that enables their highly integrated and operationally consistent experience.

The same is true in the cloud, and ultimately in the data center. Cloud (pseudo) consolidates infrastructure the only way it can – through an API that ultimately becomes the platform analogous to an iPhone or Android-based device. Cloud does not eliminate infrastructure; it merely abstracts it into a consolidated API such that the costs to manage it are greatly reduced due to the multi-tenant nature of the platform. Infrastructure is still managed; it’s just managed through an API that simplifies and unifies the processes to provide a more consistent approach that is beneficial to the organization in terms of hard (hardware, software) and soft (time, administration) costs.

The cloud and its requisite API provide the consolidation of infrastructure necessary to achieve greater cost savings and higher levels of consistency, both of which are necessary to scale operations in a way that makes IT able to meet the growing demand on its limited resources.
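The idea of a thin API veneer over adjacent infrastructure functions can be sketched as a small dispatcher. Nothing here reflects a real cloud provider’s interface: `CloudAPI`, `build_demo_api`, and the service and action names are hypothetical, and the services behind the facade are in-memory stand-ins. The sketch only shows the shape of the argument, namely that separate functions stay separate underneath while operators see a single, consistent entry point.

```python
class CloudAPI:
    """Single entry point that dispatches to otherwise separate services."""

    def __init__(self):
        self._operations = {}

    def register(self, service, action, handler):
        self._operations[(service, action)] = handler

    def call(self, service, action, **params):
        handler = self._operations.get((service, action))
        if handler is None:
            raise ValueError(f"unknown operation: {service}.{action}")
        return handler(**params)


def build_demo_api():
    """Wire in-memory stand-ins for compute, load balancing, and monitoring."""
    instances, pool = [], []

    def provision(name):
        instances.append(name)
        return name

    def add_member(name):
        if name not in instances:
            raise ValueError(f"{name} has not been provisioned")
        pool.append(name)
        return list(pool)

    def status():
        return {"instances": len(instances), "pool_members": len(pool)}

    api = CloudAPI()
    api.register("compute", "provision", provision)
    api.register("loadbalancing", "add_member", add_member)
    api.register("monitoring", "status", status)
    return api


api = build_demo_api()
api.call("compute", "provision", name="web1")
api.call("loadbalancing", "add_member", name="web1")
print(api.call("monitoring", "status"))
```

The infrastructure behind each handler is untouched; what is consolidated is the way it is operated.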
Performance in the Cloud: Business Jitter is Bad
#fasterapp #ccevent

While web applications aren’t sensitive to jitter, business processes are.

One of the benefits of web applications is that they are generally transported via TCP, which is a connection-oriented protocol designed to assure delivery. TCP has a variety of native mechanisms through which delivery issues can be addressed – from window sizes to selective acks to idle time specification to ramp up parameters. All these technical knobs and buttons serve as a way for operators and administrators to tweak the protocol, often at run time, to ensure the exchange of requests and responses upon which web applications rely. This is unlike UDP, which is more of a “fire and forget” protocol in which the server doesn’t really care if you receive the data or not.

Now, voice and streaming video and audio over the web have always leveraged UDP and thus have always been highly sensitive to jitter. Jitter is, without getting into layer one (physical) jargon, an undesirable delay in the otherwise consistent delivery of packets. It causes the delay of and sometimes outright loss of packets that are experienced by users as pauses, skips, or jumps in multi-media content.

While the same root causes of delay – network congestion, routing changes, time out intervals – have an impact on TCP, they generally only delay the communication and, other than an uncomfortable wait for the user, do not negatively impact the content itself. The content is eventually delivered because TCP guarantees that; UDP does not.

However, this does not mean that there are no negative impacts (other than trying the patience of users) from the performance issues that may plague web applications and particularly those that are more and more often out there, in the nebulous “cloud”. Delays are effectively business jitter and have a real impact on the ability of the business to perform its critical functions – and that includes generating revenue.
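To make “delay in the otherwise consistent delivery of packets” concrete, one common way to put a number on it is the running interarrival jitter estimate defined for RTP in RFC 3550. The post itself does not say how jitter should be measured, so treating this estimator as the measure is an assumption, and the function name and sample timestamps below are made up for illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550.

    For each consecutive pair of packets, D is the change in transit time:
    (difference in arrival times) minus (difference in send times).
    The estimate moves 1/16 of the way toward |D| on every packet.
    """
    if len(send_times) != len(recv_times):
        raise ValueError("send_times and recv_times must have the same length")
    jitter = 0.0
    for i in range(1, len(send_times)):
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        jitter += (abs(d) - jitter) / 16
    return jitter


# Timestamps in milliseconds; packets are sent every 20 ms.
sent = [0, 20, 40, 60]
steady = [5, 25, 45, 65]    # constant 5 ms transit time
uneven = [5, 25, 55, 65]    # third packet arrives 10 ms late
print(interarrival_jitter(sent, steady), interarrival_jitter(sent, uneven))
```

A constant delay, however large, produces no jitter under this definition; only variation in delay does, which is the property that matters for the business-level argument that follows.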
BUSINESS JITTER and the CLOUD

David Linthicum summed up the issue with performance of cloud-based applications well and actually used the terminology “jitter” to describe the unpredictable pattern of delay:

Are cloud services slow? Or fast? Both, it turns out -- and that reality could cause unexpected problems if you rely on public clouds for part of your IT services and infrastructure. When I log performance on cloud-based processes -- some that are I/O intensive, some that are not -- I get results that vary randomly throughout the day. In fact, they appear to have the pattern of a very jittery process. Clearly, the program or system is struggling to obtain virtual resources that, in turn, struggle to obtain physical resources. Also, I suspect this "jitter" is not at all random, but based on the number of other processes or users sharing the same resources at that time.

-- David Linthicum, “Face the facts: Cloud performance isn't always stable”

But what the multitude of articles coming out over the past year or so with respect to performance of cloud services has largely ignored is the very real and often measurable impact on business processes. That jitter that occurs at the protocol and application layers trickles up to become jitter in the business process; a process that may be critical to servicing customers (and thus impacts satisfaction and brand) as well as to the bottom line.

Unhappy customers forced to wait for “slow computers”, as it is so often called by the technically less adept customer service representatives employed by many organizations, may take to the social media airwaves to express displeasure, or cancel an order, or simply refuse to do business in the future with the organization based on delays experienced because of unpredictable cloud performance.

Business jitter can also manifest as decreased business productivity measures, which it turns out can be measured mathematically if you put your mind to it.
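The kind of logging Linthicum describes lends itself to a simple summary of how much response times vary at different times of day. The sketch below is one hedged way to do that, not a method taken from his article: the function name, the `(hour, seconds)` sample format, and the choice of the coefficient of variation (standard deviation divided by mean) as the measure are all assumptions, and the sample values are invented.

```python
from collections import defaultdict
from statistics import mean, pstdev


def variability_by_hour(samples):
    """Summarize logged response times per hour of day.

    samples: iterable of (hour_of_day, response_seconds) pairs.
    Returns {hour: {"mean": ..., "cv": ...}} where cv is the coefficient
    of variation, a unitless measure of spread relative to the mean.
    """
    by_hour = defaultdict(list)
    for hour, seconds in samples:
        by_hour[hour].append(seconds)
    report = {}
    for hour, values in sorted(by_hour.items()):
        avg = mean(values)
        report[hour] = {"mean": avg, "cv": pstdev(values) / avg if avg else 0.0}
    return report


# Invented measurements: same average at 09:00 and 14:00, very different spread.
log = [(9, 2.0), (9, 2.0), (14, 1.0), (14, 3.0)]
for hour, stats in variability_by_hour(log).items():
    print(hour, stats)
```

Two hours with the same average can differ sharply in spread, and it is the spread, not the average, that makes a business process unpredictable.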
Understanding the variability of cloud performance is important for two reasons:

1. You need to understand the impact on the business and quantify it before embarking on any cloud initiative so it can be factored in to the overall cost-benefit analysis. It may be that the cost savings from public cloud are much greater than the potential loss of revenue and/or productivity, and thus the benefits of a cloud-based solution outweigh the risks.
2. Understanding the variability and from where it comes will have an impact and help guide you to choosing not only the right provider, but the right solutions that may be able to normalize or mitigate the variability. If the primary source of business jitter is your WAN, for example, then it may be that choosing a provider that supports your ability to deploy WAN optimization solutions would be an appropriate strategy. Similarly, if the variability in performance stems from capacity issues, then choosing a provider that allows greater latitude in load balancing algorithms or the deployment of a virtual (soft) ADC would likely be the best strategy.

It seems clear from testing and empirical (as well as anecdotal) evidence that cloud performance is highly variable and, as David puts it, unstable. This should not necessarily be seen as a deterrent to adopting cloud services – unless your business is so highly sensitive to latency that even milliseconds can be financially damaging – but rather it should be a reality that factors into your decision making process with respect to your choice of provider and the architecture of the solution you’ll be deploying (or subscribing to, in the case of SaaS) in the cloud.

Knowing is half the battle to leveraging cloud successfully. The other half is strategy and architecture.

I’ll be at CloudConnect 2012 and we’ll discuss the subject of cloud and performance a whole lot more at the show!

Is Features vs. Performance the New Cloud Battle Line?
On the performance of clouds
Face the facts: Cloud performance isn't always stable
Data Center Feng Shui: Architecting for Predictable Performance
A Formula for Quantifying Productivity of Web Applications
Enterprise Apps are Not Written for Speed
The Three Axioms of Application Delivery
Virtualization and Cloud Computing: A Technological El Niño

The API is the Center of the Application (Integration) Universe
#mobile #fasterapp #ccevent

Today, at least. Tomorrow, who knows?

Some have tried to distinguish between “mobile cloud” and “cloud” by claiming the former is the use of the web browser on a mobile device to access services while the latter uses device-native applications. Like all things cloud, the marketing fluff is purposefully obfuscating and sweeping under the rug the technology required to make things work for consumers, whether those consumers be your kids or IT professionals.

Infrastructure is not eliminated when organizations take to the cloud, nor do the constraints of web-based protocols and methodologies become irrelevant when Bob uses a service to store photos of his kid’s piano recital on Flickr. The applications and web browsers on a mobile device are using the same technology, the same protocols, suffering under the same constraints as the rest of us in wireline land.

If developers are as smart as they are lazy (and I say that as a compliment because it is the laziness of developers that more often than not leads to innovation) they have already moved to an API-centric model in which web site and device native-app interfaces both leverage the same APIs. This isn’t just a social integration phenomenon – it isn’t just about Twitter and Facebook and Google. API usage and demand are growing, and that growth is not expected to stop any time soon. When developers were asked which services they would like to connect to (assuming service = API), the overwhelming response was “everything, if it were easy.” (API Integration Pain Survey Results)

The API is rapidly becoming (if it isn’t already) the center of the application (integration) universe. This unfortunately has the potential to cause confusion and chaos in the data center. When a single API is consumed by multiple clients – mobile, remote, applications, partners, etc.
– solutions unique to each quickly seem to make their way into the code to deal with “exceptions” and “peculiarities” inherent to the client platform. That’s inefficient and, when one considers the growing number of platforms and form-factors associated with mobile communications alone, it is not scalable from a people and process perspective. But reality is that these exceptions and peculiarities – often caused by a lack of feature parity across form-factors and platforms – must be addressed somewhere, and that somewhere is unfortunately almost universally determined to be the application.

Do we need to treat mobile devices differently? In terms of performance and delivery concerns, yes. But that’s where we leverage the application delivery tier to differentiate by device to ensure delivery. That’s the beauty of an abstracted, service-enabled data center – there’s an intelligent and agile layer of application delivery services that mediates between clients (regardless of their form factor) and services to ensure that delivery needs (security, performance, and availability) are met in part by addressing the unique characteristics and reality of access via mobile devices.

ABSTRACT and ISOLATE

This is exactly the type of problem application delivery is designed to address. Multiple clients, multiple networks, all accessing the same application service or API but requiring specific authentication, security, and delivery characteristics to ensure that operational risk is mitigated in the most efficient manner possible. This includes the ability to throttle services based on user and client, a common approach used by mega-sites such as Twitter. This includes the ability to provide single sign-on to all clients, regardless of platform or form-factor, and to support enterprise-grade authentication integration for the same API or application service.
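Throttling by user and client, as mentioned above, is one of the easier delivery-tier functions to picture in code. The following token-bucket sketch is an illustration under stated assumptions, not how any particular product or site implements it: the `Throttle` class, the client type labels, and the rate and burst values are all hypothetical. Each (user, client type) pair gets its own bucket, so the limits live outside the API’s application logic.

```python
import time


class Throttle:
    """Token bucket per (user, client type), with limits set per client type."""

    def __init__(self, rates, clock=time.monotonic):
        self._rates = rates    # client type -> (requests per second, burst size)
        self._clock = clock    # injectable so the behaviour can be tested
        self._buckets = {}     # (user, client type) -> (tokens, time of last update)

    def allow(self, user, client):
        rate, burst = self._rates[client]  # raises KeyError for an unknown client type
        now = self._clock()
        tokens, last = self._buckets.get((user, client), (float(burst), now))
        # Refill in proportion to elapsed time, never beyond the burst size.
        tokens = min(float(burst), tokens + (now - last) * rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[(user, client)] = (tokens, now)
        return allowed


limits = {"mobile": (1.0, 2), "partner": (5.0, 10)}  # made-up values
throttle = Throttle(limits)
print([throttle.allow("bob", "mobile") for _ in range(3)])
print(throttle.allow("bob", "partner"))
```

Because the policy is keyed on client type, a new platform or form-factor means a new entry in the limits table rather than a new code path in the API.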
Addressing the problem in the delivery tier also includes leveraging the appropriate security policies to ensure inbound and outbound security of data regardless of client, so that corporate data is not infected and infections are not spread to other consumers.

A flexible, scalable application delivery tier addresses the problem of a single API being utilized by a variety of clients in a way that precludes the need to codify specific functionality on a per-platform or form-factor basis in the application logic itself, making the API simpler and easier to maintain as well as test and upgrade. It makes APIs and application services more scalable in terms of people and processes, which in turn makes the development and deployment process more efficient and able to focus on new services rather than constantly modifying and updating existing ones.

Service-oriented architecture may have begun in the application demesne as a means to abstract and isolate services such that they could more easily be integrated, maintained, and changed without disruption, but the concept is applicable to the data center as a whole. By leveraging SOA concepts at the data center architecture level, the entire technological landscape of the business can be transformed into one that is ultimately more adaptable, more scalable, and more secure.

I’ll be at CloudConnect 2012 and we’ll discuss the subject of cloud and performance a whole lot more at the show!

Facebook Wins “Worst API” in Developer Survey
API Integration Pain Survey Results
IT Survey: Businesses Embrace APIs for Apps Integration, Not Social
The Pythagorean Theorem of Operational Risk
At the Intersection of Cloud and Control…
Operational Risk Comprises More Than Just Security
IT Services: Creating Commodities out of Complexity
The Three Axioms of Application Delivery
The Magic of Mobile Cloud