12 Topics

Site stuck during provisioning?
Hi all, yesterday I installed the site with secure mesh, and after some time the site got stuck in the provisioning state. When I ran some commands to check the status of the site, they showed the following output: "status ver" : "The service is being restarted" and "config-network" : "The site is in provisioning state". SSH and HTTPS access to the site was also lost, and the "health" command did not show any configured IP address. I rebooted the site and nothing worked. The main console showed the site as being upgraded. I waited 2 to 3 hours with no difference. I also checked the F5 Zendesk document but couldn't relate it to my situation. I left the site overnight, rebooted it in the morning, and it actually provisioned. I observed similar behavior in the F5 lab when the site was upgraded: it took at least 6 to 7 hours to upgrade, and the site only started working after a manual reboot. I just want to know: is this default behavior, or did I miss something?

F5 provisioning
Just putting my lab together, and when I try adding LTM and ASM the following error message comes up: 01071008:3:Provisioning failed with error 1 - Disk limit exceeded. 20338MB are required to provision these modules,but only 30MB are available. I have 93GB of space allocated for the VM and 8.5GB of memory.

Provisioning AWAF
Hey everyone! So I'm new to AWAF and figured I'd lab a bit with it to check out the additional features. I have generated a StrongBox license that includes it and added it to my running BIG-IP VE. However, I cannot see it under my resource provisioning page. Now I'm thinking it is included in a separate module, like ASM for instance, but I'm not sure at all. There should be a difference between ASM and AWAF, so I believe they should run as two separate modules. Do you guys have any idea? Been googling like crazy but coming up short.

The Colonial Data Center and Virtualization
No, not colonial as in Battlestar Galactica or the British Empire, colonial as in corals and weeds and virtual machines

I was out pulling weeds this summer – Canada thistle to be exact – and was struck by how much its root system reminded me of Cnidaria (soft corals to those of you whose experience with aquaria remains relegated to suicidal goldfish). Canada thistle is difficult to control because of its extensive root system. Pulling a larger specimen, you often find yourself pulling up its root, only to find it connected to three, four or more other specimens. Cnidaria reproduce in a similar fashion, sharing a “root” system that enables them to share resources.

Unlike thistles, however, Cnidaria have several different growth forms. There’s a traditional colonial form that resembles thistles – a single, shared long root with various specimens popping up along the path – and one that may be familiar to folks who’ve seen Finding Nemo: a tree formation in which the root branches not only horizontally but vertically, with individual specimens forming upwards along the branch in what gives it a tree-like appearance.

Cnidaria produce a variety of colonial forms, each of which is one organism but consists of polyp-like zooids. The simplest is a connecting tunnel that runs over the substrate (rock or seabed) and from which single zooids sprout. In some cases the tunnels form visible webs, and in others they are enclosed in a fleshy mat. More complex forms are also based on connecting tunnels but produce "tree-like" groups of zooids. The "trees" may be formed either by a central zooid that functions as a "trunk" with later zooids growing to the sides as "branches", or in a zig-zag shape as a succession of zooids, each of which grows to full size and then produces a single bud at an angle to itself. In many cases the connecting tunnels and the "stems" are covered in periderm, a protective layer of chitin.
Some colonial forms have other specialized types of zooid, for example, to pump water through their tunnels. -- Wikipedia, Cnidaria

Of course, the notion of colonial inter-dependence exhibited by both thistle and Cnidaria is one that’s shared by the data center. Virtual machines deployed on the same physical host replicate in many ways the advantages and disadvantages of a Cnidarian tree-formation. The close proximity of the 15.6 average VMs per host (according to Vkernel VMI 2012) allows them to share the “local” (virtual) network, which eliminates many of the physical sources of network latency that occur naturally in the data center. But it also means that a failure in the physical network connecting them to the network backbone is catastrophic for all VMs on a given host.

Which is why you want to pay careful attention to placement of VMs in a dynamic data center. The concept of pulling compute resources from anywhere in the data center to support scalability on-demand is a tantalizing one, but doing so can have disastrous results in the event of a catastrophic failure in the network. Architecture and careful planning are necessary to ensure that resources do not end up grouped in such a way that a failure in one negatively impacts the entire application. Proximity must be considered as part of a fault isolation strategy, which is a requirement when resources are loosely – if at all – coupled to specific locations within the data center.

Referenced blogs & articles:
Wikipedia, Cnidaria
Virtualization Management Index: Issues 1 and 2
Back to Basics: Load balancing Virtualized Applications
Digital is Different
The Cost of Ignoring ‘Non-Human’ Visitors
Cloud Bursting: Gateway Drug for Hybrid Cloud
The HTTP 2.0 War has Just Begun
Why Layer 7 Load Balancing Doesn’t Suck
Network versus Application Layer Prioritization
Complexity Drives Consolidation
Performance in the Cloud: Business Jitter is Bad

The Infrastructure Turk: Lessons in Services
#devops #cloud If your goal is IT as a Service, then at some point you have to actually service-enable the policies that govern IT infrastructure. My eldest shared the story of “The Turk” recently and it was a fine example of how appearances can be deceiving – and of the power of abstraction. If you aren’t familiar with the story, let me briefly share before we dive in to how this relates to infrastructure and, specifically, IT as a Service. The Turk, the Mechanical Turk or Automaton Chess Player was a fake chess-playing machine constructed in the late 18th century. The Turk was in fact a mechanical illusion that allowed a human chess master hiding inside to operate the machine. With a skilled operator, the Turk won most of the games played during its demonstrations around Europe and the Americas for nearly 84 years, playing and defeating many challengers including statesmen such as Napoleon Bonaparte and Benjamin Franklin. Although many had suspected the hidden human operator, the hoax was initially revealed only in the 1820s by the Londoner Robert Willis. [2] -- Wikipedia, “The Turk” The Automaton was actually automated in the sense that the operator was able to, via mechanical means, move the arm of the Automaton and thus give the impression the Automaton was moving pieces around the board. The operator could also nod and shake its head and offer rudimentary facial expressions. But the Automaton was not making decisions in any way, shape or form. The operator made the decisions and did so quite well, defeating many a chess champion of the day. [ You might also recall this theme appeared in the “Wizard of Oz”, wherein the Professor sat behind a “curtain” and “automated” what appeared to the inhabitants to be the great Wizard of Oz. ] The Turk was never really automated in the sense that it could make decisions and actually play chess. Unlike Watson, the centuries old Automaton was never imbued with the ability to dynamically determine what moves to make itself. 
This is strikingly similar to modern “automation” and in particular the automation being enabled in modern data centers today. While automated configuration and set up of components and applications is becoming more and more common, the actual decisions and configuration are still handled by operators who push the necessary levers and turn the right knobs to enable infrastructure to react.

IT as a SERVICE needs POLICIES as well as RESOURCES

We need to change this model. We need to automate the Automaton in a way that enables automated provisioning initiated by the end-user, i.e. application owner. We need infrastructure and ultimately operational services not only to configure and manage infrastructure, but to provision it. More importantly, end-users need to be able to provision the appropriate infrastructure services (policies) as well. Right now, devops is doing a great job enabling deployment automation; that is, creating scripts and recipes that are repeatable with respect to provisioning the appropriate infrastructure resources necessary to successfully deploy an application. But what we aren’t doing (yet) is enabling those as services. We’re currently the 18th century version of the Automaton, when what we want is the 21st century equivalent – automation from top to bottom (or underneath, as the analogy would require).

What we’ve done thus far is put a veneer over what is still a very manual process. Ops still determines the configuration on a per-application basis and subsequently customizes the configurations before pushing out the script. Certainly that script reduces operational costs and time whenever additional capacity is required for that application, as it becomes possible to simply replicate the configuration, but it does not alleviate the need for manual configuration in the first place.
Nor does it leave room for end-users to tweak or otherwise alter the policies that govern myriad operational functions across network, storage, and server infrastructure that have a direct impact – for good and for ill – on the performance, security, and stability of applications. End users must still wait for the operator hidden inside the Automaton to make a move.

IT as a Service needs services. And not just services for devops, but services for end users, for the consumers of IT. The application owner, the business stakeholder, the admin. These services need to take into consideration not only the basic provisioning of the resources required, but also the policies that govern them. The intelligence behind the Automaton needs to be codified and encapsulated in a way that makes it as reusable as the basic provisionable resources. We need to provision not only resources – an IP address, network bandwidth, and the pool of resources from which applications are served and scale – but also the policies governing access, security, and even performance. These policies are at the heart of what IT provides for its consumers: the security that enables compliance and protects applications from intrusions and downtime, the dynamic adjustments required to keep applications performing within specified business requirements, the thresholds that determine the ebb and flow of compute capacity required to keep the application available. These policies should be service-enabled and provisionable by the end-user, by the consumers of IT services.

The definitions of cloud computing, from wherever they originate, tend to focus on resources and lifecycle management of those resources. If one construes that to include applicable policies as well, then we are on the right track. But if we do not, then we need to consider from a more strategic point of view what is required of a successful application deployment.
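As a rough illustration of what provisioning policies alongside resources might look like, here is a minimal Python sketch. Every name in it (`ServiceRequest`, `Resources`, `Policies`, `build_plan`, and the example policy keys) is hypothetical and not drawn from any real product; it only shows a single request carrying both the resources and the policies that govern them, so that neither has to be hand-configured afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    """The basic provisionable resources named in the text."""
    ip_address: str
    bandwidth_mbps: int
    pool_members: list = field(default_factory=list)


@dataclass
class Policies:
    """Policies governing access, security, and performance (hypothetical shape)."""
    access: dict = field(default_factory=dict)
    security: dict = field(default_factory=dict)
    performance: dict = field(default_factory=dict)


@dataclass
class ServiceRequest:
    """One end-user request that bundles resources with their policies."""
    application: str
    resources: Resources
    policies: Policies


def build_plan(request):
    """Turn one request into an ordered list of provisioning steps."""
    plan = [
        f"allocate ip {request.resources.ip_address}",
        f"reserve bandwidth {request.resources.bandwidth_mbps} Mbps",
    ]
    plan += [f"add pool member {m}" for m in request.resources.pool_members]
    # Policies are provisioned in the same pass as resources, not hand-tuned later.
    for kind in ("access", "security", "performance"):
        for name, value in sorted(getattr(request.policies, kind).items()):
            plan.append(f"apply {kind} policy {name}={value}")
    return plan


request = ServiceRequest(
    application="storefront",
    resources=Resources("10.0.0.10", 100, ["app1", "app2"]),
    policies=Policies(
        access={"allow_group": "customers"},
        security={"waf_profile": "strict"},
        performance={"max_latency_ms": 200},
    ),
)
plan = build_plan(request)
for step in plan:
    print(step)
```

The design point is simply that the policy entries travel in the same request as the IP address and pool, which is the encapsulation the article argues for; a real system would replace the string steps with calls to whatever infrastructure services are available.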
It is not just the provisioning of resources, but policies, as well, that make a deployment successful.

The Automaton is a great reminder of the power of automation, but it is just as powerful a reminder of the failure to encapsulate the intelligence and decision-making capabilities required. In the 18th century it was nearly impossible to imagine a mechanical system that could make intelligent, real-time decisions. That’s one of the reasons the Automaton was such a fascinating and popular exhibition. The revelation of the Automaton was a disappointment, because it revealed that under the hood, that touted mechanical system was still relying on manual and very human intelligence to function. If we do not pay attention to this lesson, we run the risk of the dynamic data center also being exposed as a hoax one day, still primarily enabled by manual and very human processes to function. Service-enablement of policy lifecycle management is a key component to liberating the data center and an integral part of enabling IT as a Service.

F5 Friday: The Evolution of Reference Architectures to Repeatable Architectures
A reference architecture is a solution with the “some assembly required” instructions missing.

As a developer and later an enterprise architect, I evaluated and leveraged an untold number of “reference architectures.” Reference architectures, in and of themselves, are a valuable resource for organizations as they provide a foundational framework around which a concrete architecture can be derived and ultimately deployed. As data center architecture becomes more complex, employing emerging technologies like cloud computing and virtualization, this process becomes fraught with difficulty. The sheer number of moving parts and building blocks upon which such a framework must be laid is growing, and it is rarely the case that a single vendor has all the components necessary to implement such an architecture. Integration and collaboration across infrastructure solutions alone, a necessary component of a dynamic data center capable of providing the economy of scale desired, becomes a challenge on top of the expected topological design and configuration of individual components required to successfully deploy an enterprise infrastructure architecture from the blueprint of a reference architecture.

It is becoming increasingly important to provide not only reference architectures, but repeatable architectures: architectural guidelines that not only provide the abstraction of a reference architecture but also offer the kind of detailed topological and integration guidance necessary for enterprise architects to move from concept to concrete implementation. Andre Kindness of Forrester Research said it well in a recent post titled, “Don’t Underestimate The Value Of Information, Documentation, And Expertise!”:

Support documentation and availability to knowledge is especially critical in networking design, deployment, maintenance, and upgrades. Some pundits have relegated networking to a commodity play, but networking is more than plumbing.
It’s the fabric that supports a dynamic business connecting users to services that are relevant to the moment, are aggregated at the point of use, and originate from multiple locations. The complexity has evolved from designing in a few links to tens of hundreds of relationships (security, acceleration, prioritization, etc.) along the flow of apps and data through a network. Virtualization, convergence, consolidation, and the evolving data center networks are prime examples of today’s network complexity. REPEATABLE ARCHITECTURE For many years one of F5’s differentiators has been the development and subsequent offering of “Application Ready Solutions”. The focus early on was on providing optimal deployment configuration of F5 solutions for specific applications including IBM, Oracle, Microsoft and more recently, VMware. These deployment guides are step-by-step, detailed documentation developed through collaborative testing with the application provider that offer the expertise of both organizations in deploying F5 solutions for optimal performance and efficiency. As the data center grows more complex, so do the challenges associated with architecting a firm foundation. It requires more than application-specific guidance, it now requires architectural guidance. While reference architectures are certainly still germane and useful, there also needs to be an evolution toward repeatable architectures such that the replication of proposed solutions derived from the collaborative efforts of vendors is achievable. It’s not enough to throw up an architecture comprised of multiple solutions from multiple vendors without providing the insight and guidance necessary to actually replicate that architecture in the data center. 
That’s why it’s exciting to see our collaborative efforts with vendors of key data center solutions like IBM and VMware result in what are “repeatable architectures.” These are not simply white papers and PowerPoint decks that came out of joint meetings; these are architectural blueprints that can be repeated in the data center. These are the missing instructions for the “some assembly required” architecture. These jointly designed and developed architectures have already been implemented and tested – and then tested again and again. The repeatable architectures that emerge from such efforts are based on the combined knowledge and expertise of the engineers involved from both organizations, providing insight normally not discovered – and certainly not validated – by an isolated implementation.

This same collaboration, this cooperative and joint design and implementation of architectures, is required within the enterprise as well. It’s not enough for architects to design and subsequently “toss over the wall” an enterprise reference architecture. It’s not enough for application specialists in the enterprise to toss a deployment over the wall to the network and security operations teams. Collaboration across compute, network and storage infrastructure requires collaboration across the teams responsible for their management, implementation and optimal configuration.

THE FUTURE is REPEATABLE

This F5-IBM solution is the tangible representation of an emerging model of collaborative, documented and repeatable architectures. It’s an extension of an existing model F5 has used for years to provide the expertise and insight of the engineers and architects inside the organization who know the products best and understand how to successfully integrate, optimize and deploy such joint efforts.
Repeatable architectures are as important an evolution in the support of jointly developed solutions as APIs and dynamic control planes are to the successful implementation of data center automation.

More information on the F5-IBM repeatable enterprise cloud architecture:
Why You Need a Cloud to Call Your Own – F5 and IBM White Paper
Building an Enterprise Cloud with F5 and IBM – F5 Tech Brief
SlideShare Presentation
F5 and IBM: Cloud Computing Architecture – Demo

Related blogs & articles:
F5 Application Ready Solutions
F5 and IBM Help Enterprise Customers Confidently Deploy Private Clouds
F5 Friday: A War of Ecosystems
Data Center Feng Shui: Process Equally Important as Preparation
Don’t Underestimate The Value Of Information, Documentation, And Expertise!
Service Provider Series: Managing the IPv6 Migration

Don’t Conflate Virtual with Dynamic
Focusing on form factor over function is as shallow and misguided as focusing on beauty over brains. The saying goes that if all you have is a hammer, everything looks like a nail. I suppose then that it only makes sense that if the only tool you have for dealing with the rapid dynamism of today’s architectural models is virtualization that everything looks like a virtual image. Virtualization is but one way of implementing a dynamic infrastructure capable of the rapid provisioning and configuration gyrations needed to address the fluidity of the “perimeter” of the network today. Dynamic is not a synonym for virtualization and virtualization does not inherently provide the fluidity of the network architecture required to address the challenges associated with highly dynamic environments. COMPLEXIFICATION Consider for a moment the conclusion that the perimeter must become virtual because it is trying to contain a moving target: In the world of cloud infrastructures (IaaS), it is not so easy to determine the “area” that is supposed to be surrounded. Resources are shared among different clients (multi-tenancy) and they are allocated in data-centers of external providers (outsourcing). Moreover, computing resources get virtual – physical resources are transparently shared – and elastic – they are allocated and destroyed on demand. Since this can be done via APIs in a programmable and automated way, cloud computing infrastructures are highly dynamic and volatile. How can one build a perimeter around a moving target? Well, the short answer is: the perimeter must also become virtual, highly dynamic, and automated. -- Why The Perimeter Must Become Virtual There are a number of issues this raises, not the least of which is the mechanism for scaling and managing such a virtual perimeter especially given the topological sensitivity to a variety of network-hosted services, especially those that are focused on security. 
I’ll simply paraphrase Hoff at this point from his “The Four Horsemen of the Virtualization Security Apocalypse” – there are issues with a fully virtualized approach to security around topology, routing, scalability, and resiliency. In short, there are myriad architectural challenges associated with a fully virtualized approach to enabling a dynamic data center model. [An easy answer as to why security and virtual network devices aren’t always compatible is any situation in which FIPS-140 Level 2 compliance is necessary.] That’s in addition to the complexity introduced by replacing what are high-speed network components capable of handling upwards of 40 and 100 Gbps with commodity hardware, limited compute resources, and constrained network connections. Achieving similar throughput rates using virtual components will require multiple instances of the virtual network appliance which introduce architectural and topological challenges that must be addressed, not the least of which is controlling flow which subsequently introduces overhead that will negatively impact those throughput rates. This also assumes that the protocols typically associated with the network perimeter will scale across multiple, dynamic instances without noticeable disruption to services. If you’ve ever changed a routing table or a VLAN on a router and then had to wait for spanning tree to converge you’ll know what I’m talking about. It’s anything but rapid and will almost certainly have a detrimental effect on availability of every dependent service (which, at the network perimeter, is everything). IT’S NOT ABOUT THE FORM FACTOR In order to implement the kind of dynamic network perimeter introduced by the author of “Why The Perimeter Must Become Virtual” we do, in fact, need a more flexible, automated perimeter. However, that perimeter does not have to be virtual and in fact the key to implementing such a fluid network is the inherent dynamism of its components, not its form factor. 
If the components are dynamic themselves – programmable, if you will – and can be configured, deployed, modified and shut-down automatically and on-demand then they can be leveraged to address the dynamism inherent in a cloud computing and highly virtualized architectural model. Because they can be integrated. Because they are collaborative. The strategic points of control that exist in every data center model must be dynamic – both from a configuration and execution point of view. Not only must the components that form a strategic net across the data center - effectively virtualizing business resources such as applications and storage – be dynamic in their management they must themselves be contextually aware and capable of taking action at run-time. The kind of dynamic action required to address “moving targets” is not inherent in virtualization. Virtualizing a component only makes provisioning easier. Without a means to remotely invoke services (APIs) and modify configuration dynamically (APIs) as well as the means by which the component can dynamically adjust its behavior based on events within the data center, a virtualized component is little more than a virtual brick. Fluidity of the network is not a result of virtualization. There are myriad examples already of how traditional “iron” not only enables but stabilizes the management and control of dynamic environments. Programmability, on-demand contextual-awareness, APIs, scripting, policy-based networking. All these capabilities enable the fluidity necessary to address the “moving targets” comprising cloud-based and highly virtualized modern data center models, but without the instability created by the lack of topological and architectural control inherent in a “toss another virtual appliance” at the problem approach. It’s more about designing an architecture comprised of highly dynamic and interactive components that can be provisioned and managed on-demand, as services. 
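To make the distinction between "virtual" and "dynamic" a little more concrete, the following Python sketch models a component that can be reconfigured through method calls (standing in for a remote API) and that adjusts its own behavior when it is told about events in the data center. The class name `DynamicComponent`, the event names, and the addresses are all hypothetical; nothing here represents an actual device API, and the component could equally be hardware or a virtual appliance.

```python
class DynamicComponent:
    """Hypothetical stand-in for a programmable infrastructure component."""

    def __init__(self, name):
        self.name = name
        self.pool = set()      # members currently receiving traffic
        self._handlers = {}    # event type -> list of callbacks

    def add_member(self, address):
        """Configuration change that a remote API call would trigger."""
        self.pool.add(address)

    def remove_member(self, address):
        """Inverse configuration change, also API-driven."""
        self.pool.discard(address)

    def on(self, event_type, handler):
        """Register behavior to run when a data center event arrives."""
        self._handlers.setdefault(event_type, []).append(handler)

    def notify(self, event_type, payload):
        """Deliver an event; the component reacts without operator action."""
        for handler in self._handlers.get(event_type, []):
            handler(self, payload)


adc = DynamicComponent("perimeter-adc")
adc.on("instance_launched", lambda c, e: c.add_member(e["address"]))
adc.on("instance_terminated", lambda c, e: c.remove_member(e["address"]))

# Simulated events from the provisioning system (assumed event shape).
adc.notify("instance_launched", {"address": "10.0.0.11"})
adc.notify("instance_launched", {"address": "10.0.0.12"})
adc.notify("instance_terminated", {"address": "10.0.0.11"})

members = sorted(adc.pool)
print(members)
```

Nothing in the sketch depends on form factor: the fluidity comes from the configuration methods and the event hooks, which is the article's point about programmability mattering more than virtualization.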
Yes - dynamic, highly automated data centers are necessary to combat the issues arising from constantly changing infrastructure. But dynamism and automation do not require virtualization; they require collaboration and integration and a platform capable of providing both.

Related blogs & articles:
Provisioning a Virtual Network is Only the Beginning
The Four Horsemen Of the Virtualization Security Apocalypse
Why The Perimeter Must Become Virtual
Are You Ready for the New Network?
VM Sprawl is Bad but Network Sprawl is Badder
The Devil is in the Details
Infrastructure 2.0 + Cloud + IT as a Service = An Architectural Parfait
The Question Shouldn’t Be Where are the Network Virtual Appliances but Where is the Architecture?
A Fluid Network is the Result of Collaboration Not Virtualization
What is a Strategic Point of Control Anyway?

Achieving Scalability Through Fewer Resources
Sometimes it’s not about how many resources you have but how you use them

The premise upon which scalability through cloud computing and highly virtualized architectures is built is the rapid provisioning of additional resources as a means to scale out to meet demand. That premise is a sound one and one that is a successful tactic in implementing a scalability strategy. But it’s not the only tactic that can be employed as a means to achieve scalability and it’s certainly not the most efficient means by which demand can be met.

WHAT HAPPENED to EFFICIENCY?

One of the primary reasons cited in surveys regarding cloud computing drivers is that of efficiency. Organizations want to be more efficient as a means to better leverage the resources they do have and to streamline the processes by which additional resources are acquired and provisioned when necessary. But somewhere along the line it seems we’ve lost sight of enabling higher levels of efficiency for existing resources and have, in fact, often ignored that particular goal in favor of simplifying the provisioning process. After all, if scalability is as easy as clicking a button to provision more capacity in the cloud, why wouldn’t you? The answer is, of course, that it’s not as efficient and in some cases it may be an unnecessary expense.

The danger with cloud computing and automated, virtualized infrastructures is in the tendency to react to demand for increases in capacity as we’ve always reacted: throw more hardware at the problem. While in the case of cloud computing and virtualization this has morphed from hardware to “virtual hardware”, the result is the same – we’re throwing more resources at the problem of increasing demand. That’s not necessarily the best option and it’s certainly not the most efficient use of the resources we have on hand. There are certainly efficiency gains in this approach; there’s no arguing that.
The process for increasing capacity can go from a multi-week, many man-hour manual process to an automated process of an hour or less that decreases the operational and capital expenses associated with increasing capacity. But if we want to truly take advantage of cloud computing and virtualization we should also be looking at optimizing the use of the resources we have on hand, for often it is the case that we have more than enough capacity; it simply isn’t being used to its full potential.

CONNECTION MANAGEMENT

Discussions of resource management generally include compute, storage, and network resources. But they often fail to include connection management. That’s a travesty, as TCP connection usage increases dramatically with modern application architectures and TCP connections are resource heavy; they consume a lot of RAM and CPU on web and application servers to manage. In many cases the TCP connection management duties of a web or application server are by far the largest consumers of resources; the application itself actually consumes very little on a per-user basis. Optimizing those connections – or the use of those connections – then, should be a priority for any efficiency-minded organization, particularly those interested in reducing the operational costs associated with scalability and availability.

As is often the case, the tool to make more efficient use of TCP connections is likely already in the data center and has merely been overlooked: the application delivery controller. The reason for this is simple: most organizations acquire an application delivery controller (ADC) for its load balancing capabilities and tend to ignore all the bells and whistles and additional features (value) it can provide. Load balancing is but one feature of application delivery; there are many more that can dramatically impact the capacity and performance of web applications if they are employed as part of a comprehensive application delivery strategy.
An ADC provides the means to perform TCP multiplexing (a.k.a. server offload, a.k.a. connection management). TCP multiplexing allows the ADC to maintain millions of connections with clients (users) while requiring only a fraction of that number to the servers. By reusing existing TCP connections to web and application servers, an ADC eliminates the overhead in processing time associated with opening, managing, and closing TCP connections every time a user accesses the web application. If you consider that most applications today are Web 2.0 and employ a variety of automatically updating components, you can easily see that eliminating the TCP management for the connections required to perform those updates will not only decrease the number of TCP connections required on the server side but will also eliminate the time associated with such a process, meaning better end-user performance.

INCREASE CAPACITY by DECREASING RESOURCE UTILIZATION

Essentially we’re talking about increasing capacity by decreasing resource utilization without compromising availability or performance. This is an application delivery strategy that requires a broader perspective than is generally available to operations and development staff. The ability to recognize a connection-heavy application and subsequently employ the optimization capabilities of an application delivery controller to improve the efficiency of resource utilization for that application requires a more holistic view of the entire architecture. Yes, this is the realm of devops and it is in this realm that the full potential of application delivery will be realized.
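The effect of the TCP multiplexing described above can be illustrated with a small simulation. This is a toy model under stated assumptions, not how any particular ADC is implemented: the `ServerSidePool` class and the `serve` function are hypothetical, connections are just strings, and requests arrive in equal-sized "waves" that complete before the next wave starts. It only counts how many server-side connections have to be opened with and without reuse.

```python
class ServerSidePool:
    """Hypothetical pool of reusable server-side connections."""

    def __init__(self):
        self.idle = []     # connections that are open and free for reuse
        self.opened = 0    # total server-side connections ever opened

    def acquire(self):
        """Reuse an idle connection if one exists, otherwise open a new one."""
        if self.idle:
            return self.idle.pop()
        self.opened += 1
        return f"server-conn-{self.opened}"

    def release(self, conn):
        """Keep the connection open so a later request can reuse it."""
        self.idle.append(conn)


def serve(requests_per_wave, waves, multiplex):
    """Return how many server-side connections were opened in total."""
    pool = ServerSidePool()
    for _ in range(waves):
        in_flight = [pool.acquire() for _ in range(requests_per_wave)]
        for conn in in_flight:
            if multiplex:
                pool.release(conn)
            # Without multiplexing the connection is closed and never reused.
    return pool.opened


without_mux = serve(requests_per_wave=50, waves=20, multiplex=False)
with_mux = serve(requests_per_wave=50, waves=20, multiplex=True)
print(without_mux, with_mux)
```

In this model the server handles the same 1,000 requests either way, but with reuse it only ever opens as many connections as the peak concurrency of a single wave. Real traffic is burstier and servers time out idle connections, so actual ratios will differ.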
It will take someone well-versed in both network and application infrastructure to view the two as part of a larger, holistic delivery architecture in order to assess the situation and determine that optimization of connection management will benefit the application not only as a means to improve performance but to increase capacity without increasing associated server-side resources. Efficiency through optimization of resource utilization is an excellent strategy for improving the overall delivery of applications whilst simultaneously decreasing costs. It doesn’t require cloud or virtualization; it simply requires a better understanding of applications and their underlying infrastructure and optimizing the application delivery infrastructure such that the innate behavior of such infrastructure is made more efficient without negatively impacting performance or availability. Leveraging TCP multiplexing is a simple method of optimizing connection utilization between clients and servers that can dramatically improve resource utilization and immediately increase capacity of existing “servers”. Organizations looking to improve their bottom line and do more with less ought to closely evaluate their application delivery strategy and find those places where resource utilization can be optimized as a way to improve the efficiency of existing resources before embarking on a “throw more hardware at the problem” initiative.

Long Live(d) AJAX
Cloud Lets You Throw More Hardware at the Problem Faster
WILS: Application Acceleration versus Optimization
Two Different Sock(et)s
What is server offload and why do I need it?
3 Really good reasons you should use TCP multiplexing
SOA and Web 2.0: The Connection Management Challenge
The Impact of the Network on AJAX
The Impact of AJAX on the Network

The Impossibility of CAP and Cloud
It comes down to this: the on-demand provisioning and elastic scalability systems that make up “cloud” are addressing NP-Complete problems for which there are no known efficient exact solutions. At the heart of what cloud computing provides – in addition to compute-on-demand – is the concept of elastic scalability. It is through the ability to rapidly provision resources and applications that we can achieve elastic scalability and, one assumes, through that high availability of systems. Obviously, given my relationship to F5 I am strongly interested in availability. It is, after all, at the heart of what an application delivery controller is designed to provide. So when a theorem is presented that basically says you cannot build a system that is Consistent, Available, and Partition-Tolerant I get a bit twitchy. Just about the same time that Rich Miller was reminding me of Brewer’s CAP Theorem someone from HP Labs claimed to have solved the P ≠ NP problem (shortly thereafter determined to not be a solution after all), which got me thinking about NP-Completeness in problem sets, of which solving the problem of creating a distributed CAP-compliant system certainly appears to be a member.

CLOUD RESOURCE PROVISIONING is NP-COMPLETE

A core conflict with cloud and CAP-compliance is on-demand provisioning. There is, after all, a finite set of resources available (cloud is not infinitely scalable, after all) with, one assumes, each resource having a variable amount of compute availability. For example, most cloud providers use a “large”, “medium”, and “small” sizing approach to “instances” (which are, in almost all cases, a virtual machine). Each “size” has a defined set of reserved compute (RAM and CPU) for use. Customers of cloud providers provision instances by size. At first glance this should not be a problem. The provisioning system is given an instruction, i.e.
“provision instance type X.” The problem begins when you consider what happens next – the provisioning system must find a hardware resource with enough capacity available on which to launch the instance. In theory this certainly appears to be a variation of the Bin packing problem (which is NP-complete). It is (one hopes) resolved by the cloud provider by removing the variability of location (parameterization) or the use of approximation (using the greedy approximation algorithm “first-fit”, for example). In a pure on-demand provisioning environment, the management system would search out, in real-time, a physical server with enough physical resources available to support the requested instance requirements but it would also try to do so in a way that minimizes the utilization of physical resources on each machine so as to better guarantee availability of future requests and to be more efficient (and thus cost-effective).

Brewer’s CAP Theorem

It is impractical, of course, to query each physical server in real-time to determine an appropriate location, so no doubt there is a centralized “inventory” of resources available that is updated upon the successful provisioning of an instance. Note that this does not avoid the problem of NP-Completeness and the resulting lack of a solution as data replication/synchronization is also an NP-Complete problem. Now, because variability in size and an inefficient provisioning algorithm could result in a fruitless search, providers might (probably do) partition each machine based on the instance sizes available and the capacity of the machine. You’ll note that most providers size instances as multiples of the smallest, if you were looking for anecdotal evidence of this. If a large instance is 16GB RAM and 4 CPUs, then a physical server with 32GB of RAM and 8 CPUs can support exactly two large instances.
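The “first-fit” approximation mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not any provider’s actual placement logic: the `Server` class, the `SIZES` table and the `first_fit` function are hypothetical names, the two-server inventory is invented, and the instance sizes are the ones used in this post (large: 16GB RAM and 4 CPUs; small: 4GB RAM and 1 CPU).

```python
from dataclasses import dataclass, field

# Instance sizes as (RAM in GB, CPUs), taken from the example in the text.
SIZES = {"large": (16, 4), "small": (4, 1)}


@dataclass
class Server:
    """A physical host with its remaining free capacity (hypothetical model)."""
    ram_gb: int
    cpus: int
    placed: list = field(default_factory=list)

    def fits(self, ram_gb, cpus):
        return self.ram_gb >= ram_gb and self.cpus >= cpus

    def place(self, size, ram_gb, cpus):
        self.ram_gb -= ram_gb
        self.cpus -= cpus
        self.placed.append(size)


def first_fit(requests, servers):
    """Place each request on the first server with room; return what did not fit."""
    unplaced = []
    for size in requests:
        ram_gb, cpus = SIZES[size]
        for server in servers:
            if server.fits(ram_gb, cpus):
                server.place(size, ram_gb, cpus)
                break
        else:
            # No server had enough free RAM and CPU for this request.
            unplaced.append(size)
    return unplaced


servers = [Server(32, 8), Server(32, 8)]
print(first_fit(["small", "large", "small", "large", "large"], servers))  # []
print(servers[0].placed)              # ['small', 'large', 'small']
print(servers[1].placed)              # ['large', 'large']
print(first_fit(["large"], servers))  # ['large']
```

First-fit answers quickly because it never reconsiders an earlier placement. The last call shows the cost: the first server still has 8GB and 2 CPUs free, but that leftover capacity is too small for another large instance, so the request fails.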
If a small instance is 4GB RAM and 1 CPU, that same server could ostensibly support a combination of both: 8 small instances or 4 small instances and 1 large instance, etc… However, that would make it difficult to keep track of the availability of resources based on instance size and would eventually result in a failure of capacity availability (which makes the system non-CAP compliant). However, not restricting the instances that can be deployed on a physical server returns us to a bin packing-like algorithm that is NP-complete, which necessarily introduces unknown latency that could impact availability. This method also introduces the possibility that while searching for an appropriate location some other consumer has requested an instance that is provisioned on a server that could have supported the first consumer’s request, which results in a failure to achieve CAP-compliance by violating the consistency constraint (and likely the availability constraint, as well). The provisioning will never be “perfect” because there is no known efficient exact solution to an NP-complete problem. That means the solution is basically the fastest/best it can be given the constraints, which we often distill down to “good enough.” That means that there are cases where either availability or consistency will be violated, making cloud in general non-CAP compliant. The core conflict is the definition of “highly available” as “working with minimal latency.” Or perhaps the real issue is the definition of “minimal”. For it is certainly the case that a management system that leverages opportunistic locking and shared data systems could alleviate the problem of consistency, but never availability. Eliminating the consistency problem by ensuring that every request has exclusive access to the “database” of instances when searching for an appropriate physical location introduces latency while others wait.
This is the “good enough” solution used by CPU schedulers – the CPU scheduler is the one and only authority for CPU time-slice management. It works more than well enough on a per-machine basis, but this is not scalable and in larger systems would result in essentially higher rates of non-availability as the number of requests grows.

WHY SHOULD YOU CARE

Resource provisioning and job scheduling in general are in the class of NP-complete problems. While the decision problem to choose an appropriate physical server on which to launch a set of requested instances can be considered an instantiation of the Bin packing problem, it can also be viewed as a generalized assignment problem or, depending on the parameters, a variation of the Knapsack problem, or any one of the multiprocessor scheduling problems, all of which are NP-complete. Cloud is essentially the integration of systems that provide resource provisioning and may include job scheduling as a means to automate provisioning and enable a self-service environment. Because of its reliance on problems that are NP-complete we can deduce that cloud is NP-complete. NOTE: No, I’m not going to provide a formal proof. I will leave that to someone with a better handle on the reductions necessary to prove (or disprove) that the algorithms driving cloud are either the same or derivations of existing NP-Complete problem sets. The question “why should I care if these problems are NP-Complete” is asked by just about every student in every algorithms class in every university there is. The answer is always the same: because if you can recognize that a problem you are trying to solve is NP-Complete you will not waste your time trying to solve a problem that thousands of mathematicians and computer scientists have been trying to solve for 50 years and have thus far not been able to do so.
And if you do solve it, you might want to consider formalizing it, because you’ve just proved P = NP and there’s a $1,000,000 bounty out on that proof. But generally speaking, it’s a good idea to recognize them when you see them because you can avoid a lot of frustration by accepting up front you can’t solve it, and you can also leverage existing research / algorithms that have been proposed as alternatives (approximation algorithms, heuristics, parameterized algorithms, etc…) to get the “best possible” answer and get on with more important things. It also means there is no one optimal solution to “cloud”, only a variety of “good enough” or “approximately optimal” solutions. Neither the time required to provision nor the availability of resources in a public cloud environment can be consistently guaranteed. This is, essentially, why the concept of reserved instances exists. Because if your priorities include high availability, you’d better consider budgeting for reserved instances, which is basically a more cost effective method of having a whole bunch of physical servers in your pool of available resources on stand-by. But if your priorities are geared toward pinching of pennies, and availability is lower on your “must have” list of requirements, then reserving instances is an unnecessary cost – as long as you’re willing to accept the possibility of lower availability. Basically, the impossibility of achieving CAP in cloud impacts (or should impact) your cloud computing strategy – whether you’re implementing locally or leveraging public resources. As I mentioned very recently – cloud is computer science, and if you understand the underlying foundations of the systems driving cloud you will be much better able to make strategic decisions regarding when and what type of cloud is appropriate and for what applications.

Agile Operations: A Formula for Just-In-Time Provisioning
One of the ways in which traditional architectures and deployment models are actually superior (yes, I said superior) to cloud computing is in provisioning. Before you label me a cloud heretic, let me explain. In traditional deployment models capacity is generally allocated based on anticipated peaks in demand. Because of the time required to acquire, deploy, and integrate hardware into the network and application infrastructure, this process is planned for and well understood, and the resources required are in place before they are needed. In cloud computing, the benefit is that the time required to acquire those resources is contracted to virtually nothing, making capacity planning much more difficult. The goal is just-in-time provisioning – resources are not provisioned until you are sure you’re going to need them because part of the value proposition of cloud and highly virtualized infrastructure is that you don’t pay for resources until you need them. But it’s very hard to provision just-in-time and sometimes the result will end up being almost-but-not-quite-in-time. Here’s a cute [whale | squirrel | furry animal] to look at until service is restored. While fans of Twitter’s fail whale are loyal and everyone will likely agree its inception and subsequent use bought Twitter more than a bit of patience with its oftentimes unreliable service, not everyone will be as lucky or have customers as understanding as Twitter. We’d all really prefer not to see the Fail Whale, regardless of how endearing he (she? it?) might be. But we also don’t want to overprovision and potentially end up spending more money than we need to. So how can these two needs be balanced?