strategy
Amazon Outage Casts a Shadow on SDN
#SDN #AWS #Cloud

Amazon's latest outage casts a shadow on the ability of software-defined networking to respond to catastrophic failure.

Much of the chatter regarding the Amazon outage has focused on issues related to global reliability, failover, and multi-region deployments. The issue of the costs associated with duplicating storage and infrastructure services has been raised, and much advice given on how to avoid the negative impact of a future outage at any cloud provider. But reading through the issues discovered during the outages caused specifically by Amazon's control plane for EC2 and EBS, one discovers a more subtle story.

After reading, it seems easy to conclude that Amazon's infrastructure is, in practice if not in theory, an SDN-based network architecture. Control planes (with which customers and systems interact via API) are separated from the actual data planes, and communicate constantly to assure service quality and perform more mundane operations across the entire cloud. After power was restored, the problem with applying this approach to such a massive system became evident in the inability of its control plane to scale.

The duration of the recovery time for the EC2 and EBS control planes was the result of our inability to rapidly fail over to a new primary datastore. Because the ELB control plane currently manages requests for the US East-1 Region through a shared queue, it fell increasingly behind in processing these requests; and pretty soon, these requests started taking a very long time to complete.
-- Summary of the AWS Service Event in the US East Region

This architecture is similar to the one described by SDN proponents, in which control is centralized and orders are dispatched through a single controller. In Amazon's case, that single controller is a shared queue. As we now know, it did not scale well. While the duration of the recovery may be tied to the excessive time it took to fail over to a new primary data store, the excruciating slowness with which services were ultimately restored to customers' customers was almost certainly due exclusively to the inability of the control plane to scale under load.

This is not a new issue. The difficulty SDN has scaling in the face of very high loads has been noted by many experts, who cite the latency such an architecture inserts into networking infrastructure, in conjunction with inadequate controller response times, as the primary cause of its failure to scale.

Traditional load balancing services, both global and local, deal with failure through redundancy and state mirroring. ELB mimics state mirroring through the use of a shared data store, much in the same way applications share state by sharing a data store. The difference is that traditional load balancing services are able to detect and react to failures in sub-second time, whereas a distributed, shared, application-based system cannot. In fact, by design one instance of ELB is unlikely to be aware that another has failed; only the controller of the overarching system is aware of such failures, as it is the primary mechanism through which such failures are addressed. Traditional load balancing services are instantly aware of such failures and enact counter-measures automatically, without being required to wait for customers to move resources from one zone to another to compensate.
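The scaling failure described in that summary is, at bottom, a queueing problem: once control-plane requests arrive faster than a single shared queue can drain them, the backlog, and with it the time to complete any request, grows without bound. The toy model below is not Amazon's implementation; the rates are invented purely to illustrate why a post-outage surge overwhelms a shared control-plane queue.

```python
"""Minimal, hypothetical sketch: why a shared control-plane queue falls behind.

This is NOT the ELB implementation -- just a toy fluid model showing that once
request arrivals exceed the service rate of a single shared queue, backlog
(and therefore time-to-complete) grows without bound. All rates are invented.
"""

def backlog_over_time(arrival_rate, service_rate, seconds):
    """Return the queue depth at each second for a single shared queue."""
    backlog = 0.0
    depths = []
    for _ in range(seconds):
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        depths.append(backlog)
    return depths

# Normal operations: 100 control requests/sec arriving, 120/sec of capacity.
normal = backlog_over_time(arrival_rate=100, service_rate=120, seconds=600)

# Post-outage recovery: every customer re-maps resources at once -- 500/sec arriving.
recovery = backlog_over_time(arrival_rate=500, service_rate=120, seconds=600)

print(f"backlog after 10 minutes (normal):   {normal[-1]:>10.0f} requests")
print(f"backlog after 10 minutes (recovery): {recovery[-1]:>10.0f} requests")

# A request arriving at minute 10 of the recovery waits behind the whole backlog.
wait_seconds = recovery[-1] / 120
print(f"approximate wait for a new request at minute 10: {wait_seconds:.0f} seconds")
```

Under normal load the queue stays empty; during the recovery surge every additional second of backlog translates directly into longer completion times for customers waiting on the control plane.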
A traditional load balancing architecture is designed to address this kind of failure automatically; doing so is one of the primary purposes for which load balancers are designed and used across the globe today. Such differences are not necessarily apparent, or all that important, in day-to-day operations when things are running smoothly. They rise to the surface only in the event of a catastrophic failure, and even then, in a well-architected system they are not cause for concern but rather for relief.

One can extend the issues with this SDN-like model for load balancing to the L2-3 network services SDN is designed to serve. The same issues with shared queues and a centralized model will be exposed in the event of a catastrophic failure. Excess requests in the shared queue (or bus) leave the control plane unable to scale to meet the demand experienced when the entire network must "come back online" after an outage. Even if the performance of an SDN is acceptable during normal operations, its ability to restore the network after a failure may not be.

It would be unwise to ignore the issues experienced by Amazon simply because it does not call its ELB architecture SDN. In every sense of the term it acts like an SDN for L4+, and this outage has exposed a potentially fatal flaw in the architecture that must be addressed moving forward.

LESSON LEARNED: SDN requires that both the control and data planes be architected for failure, and able to respond and scale instantaneously.

Applying ‘Centralized Control, Decentralized Execution’ to Network Architecture
WILS: Virtualization, Clustering, and Disaster Recovery
OpenFlow/SDN Is Not A Silver Bullet For Network Scalability
Summary of the AWS Service Event in the US East Region
After The Storm: Architecting AWS for Reliability
QoS without Context: Good for the Network, Not So Good for the End user
SDN, OpenFlow, and Infrastructure 2.0

Applying ‘Centralized Control, Decentralized Execution’ to Network Architecture
#SDN brings to the fore some critical differences between the concepts of control and execution.

While most discussions with respect to SDN are focused on a variety of architectural questions (me included) and technical capabilities, there's another very important pair of concepts that needs to be considered: control and execution. SDN definitions include the notion of centralized control through a single point of control in the network, a controller. It is through the controller that all important decisions are made regarding the flow of traffic through the network, i.e. execution. This is not feasible, at least not in very large (or even just large) networks. Nor is it feasible beyond simple L2/3 routing and forwarding.

HERE COMES the SCIENCE (of WAR)

There is very little more dynamic than combat operations. People, vehicles, supplies: all are distributed across what can be very disparate locations. One of the lessons the military has learned over time (sometimes quite painfully through experience) is the difference between control and execution. This has led to decisions to employ what is called "Centralized Control, Decentralized Execution."

Joint Publication (JP) 1-02, Department of Defense Dictionary of Military and Associated Terms, defines centralized control as follows: "In joint air operations, placing within one commander the responsibility and authority for planning, directing, and coordinating a military operation or group/category of operations." JP 1-02 defines decentralized execution as "delegation of execution authority to subordinate commanders." Decentralized execution is the preferred mode of operation for dynamic combat operations. Commanders who clearly communicate their guidance and intent through broad mission-based or effects-based orders rather than through narrowly defined tasks maximize that type of execution. Mission-based or effects-based guidance allows subordinates the initiative to exploit opportunities in rapidly changing, fluid situations.
-- Defining Decentralized Execution in Order to Recognize Centralized Execution, Lt Col Woody W. Parramore, USAF, Retired

Applying this to IT network operations means that a single point of control is contradictory to the "mission" and actually interferes with the ability of subordinates (strategic points of control) to dynamically adapt to rapidly changing, fluid situations such as those experienced in virtual and cloud computing environments. Not only does a single, centralized point of control (which in the SDN scenario implies control over execution through admittedly dynamically configured but rigidly executed directives) abrogate responsibility for adapting to "rapidly changing, fluid situations," it also becomes the weakest link.

Clausewitz, in the widely read and respected "On War," defines a center of gravity as "the hub of all power and movement, on which everything depends. That is the point against which all our energies should be directed." Most military scholars and strategists logically infer from the notion of a Clausewitzian center of gravity the existence of a critical weak link. If the "controller" in an SDN is the center of gravity, then it follows that it is likely a critical, weak link. This does not mean the model is broken, or poorly conceived, or a bad idea. What it means is that this issue needs to be addressed. The modern strategy of "Centralized Control, Decentralized Execution" does just that.
Centralized Control, Decentralized Execution in the Network

The major issue with the notion of a centralized controller is the same one air combat operations experienced in the latter part of the 20th century: agility, or more appropriately, the lack thereof. Imagine a large network fully adopting an SDN as defined today. A single controller is responsible for managing the direction of traffic at L2-3 across the vast expanse of the data center. Now imagine that a node behind a load balancer, deep in the application infrastructure, fails. The controller must respond and instruct both the load balancing service and the core network how to react, but first it must be notified.

It's simply impossible to recover from a node or link failure in 50 milliseconds (a typical requirement in networks handling voice traffic) when it takes longer to get a reply from the central controller. There's also the "slight" problem of network devices losing connectivity with the central controller if the primary uplink fails.
-- OpenFlow/SDN Is Not A Silver Bullet For Network Scalability, Ivan Pepelnjak (CCIE#1354 Emeritus), Chief Technology Advisor at NIL Data Communications

The controller, the center of network gravity, becomes the weak link, slowing down responses and inhibiting the network (and IT) from responding rapidly to evolving situations. This does not mean the model is a failure. It means the model must evolve to take into consideration the need to adapt more quickly. This is where decentralized execution comes in, and why predictions that SDN will evolve into an overarching management system rather than an operational one are likely correct.

There exist today, within the network, strategic points of control: locations within the data center architecture at which traffic (data) is aggregated and forced to traverse, and from which control over traffic and data is maintained. These locations are where decentralized execution can fulfill the "mission-based guidance" offered through centralized control. Certainly it is advantageous to both business and operations to centrally define and codify the operating parameters and goals of data center networking components (from L2 through L7), but it is neither efficient nor practical to assume that a single, centralized controller can both manage and execute on those goals.

What the military learned in its early attempts at air combat operations was that relying on a single entity to make real-time operational decisions about the state of the mission on the ground caused missions to fail. Airmen, unable to dynamically adjust their actions based on current conditions, were forced to watch situations deteriorate rapidly while waiting for central command (the controller) to receive updates and issue new orders. Thus, central command (the controller) has moved to issuing mission- or effects-based objectives and allowing the airmen (strategic points of control) to execute in a way that achieves those objectives, in whatever way (given a set of constraints) they deem necessary based on current conditions. This model is highly preferable to the one proffered today by SDN, and much more feasible given today's technology. It may be that such an extended model can easily be implemented by distributing a number of controllers throughout the network and federating them with a policy-driven control system that defines the mission but leaves execution up to the distributed control points: the strategic points of control.
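That federated model can be sketched in code: a central controller publishes intent (objectives and constraints) out of band, while each strategic point of control executes against that intent using only locally observed state, so a failed node is handled in place rather than after a controller round trip. This is a hypothetical illustration of the pattern, not any particular SDN controller's API; all names, pools, and thresholds are invented.

```python
"""Hypothetical sketch of 'centralized control, decentralized execution'.

A central controller publishes *intent* (objectives and constraints).
Each local enforcement point (a strategic point of control) executes
against that intent using only locally observed state, so a node failure
is handled in place rather than waiting on a controller round trip.
"""
from dataclasses import dataclass, field


@dataclass
class Intent:
    """Mission-style guidance: what to achieve, not how to do it."""
    max_response_ms: int = 50           # objective
    drain_unhealthy_nodes: bool = True  # constraint
    preferred_pool: str = "primary"     # guidance, not an order


@dataclass
class LocalEnforcementPoint:
    """A load balancer / strategic point of control executing the intent."""
    name: str
    intent: Intent
    pools: dict = field(default_factory=lambda: {
        "primary": ["10.0.0.10", "10.0.0.11"],
        "standby": ["10.0.1.10"],
    })
    health: dict = field(default_factory=dict)  # node -> healthy?

    def observe(self, node: str, healthy: bool) -> None:
        """Local health monitoring -- no controller involved."""
        self.health[node] = healthy

    def pick_node(self) -> str:
        """Decentralized execution: decide locally, within the intent."""
        for pool in (self.intent.preferred_pool, "standby"):
            candidates = [n for n in self.pools[pool]
                          if not self.intent.drain_unhealthy_nodes
                          or self.health.get(n, True)]
            if candidates:
                return candidates[0]
        raise RuntimeError("no healthy nodes in any pool")


# Central control: intent is defined once and pushed out of band.
intent = Intent(max_response_ms=50)
edge = LocalEnforcementPoint(name="dc1-adc", intent=intent)

edge.observe("10.0.0.10", healthy=False)   # a node fails deep in the app tier
print(edge.pick_node())                    # -> 10.0.0.11, chosen locally, instantly
```

The controller never sits in the data path; it only changes the Intent, which the local enforcement points consult as they decide.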
SDN is new, it's exciting, and it's got potential to be the "next big thing." Like all nascent technology and models, it will go through some evolutionary massaging as we dig into it and figure out where and why and how it can be used to its greatest potential and organizations' greatest advantage. One thing we don't want to do is replicate erroneous strategies of the past. No network model abrogating all control over execution has ever really worked. All successful models have been distributed, federated ones in which control may be centralized but execution is decentralized. Can we improve upon that? I think SDN does, in its recognition that static configuration is holding us back. But its decision to rein in all control while addressing that issue may very well give rise to new issues that will need resolution before SDN can become a widely adopted model of networking.

QoS without Context: Good for the Network, Not So Good for the End user
Cyclomatic Complexity of OpenFlow-Based SDN May Drive Market Innovation
SDN, OpenFlow, and Infrastructure 2.0
OpenFlow/SDN Is Not A Silver Bullet For Network Scalability
Prediction: OpenFlow Is Dead by 2014; SDN Reborn in Network Management
OpenFlow and Software Defined Networking: Is It Routing or Switching?
Cloud Security: It's All About (Extreme Elastic) Control
Ecosystems are Always in Flux
The Full-Proxy Data Center Architecture

The Mobile Chimera
#mobile #vdi #IPv6

In the case of technology, as with mythology, the whole is often greater (and more challenging) than the sum of its parts.

The chimera is a mythological beast of scary proportions. Not only is it fairly large, it also has three independent heads: traditionally a lion, a goat, and a snake. Some variations on this theme exist, but the basic principle remains: it's a three-headed, angry beast that should not be taken lightly should one encounter it in the hallway. Individually, one might have a strategy to meet the challenge of a lion or a goat head-on. But when they converge into one very angry and dangerous beast, the strategies and tactics employed to best any one of them will almost certainly not work to address all three of them simultaneously. The world of mobility is rapidly approaching its own technological chimera, one composed of three individual technology trends. While successful stratagems and tactics exist to address each one individually, taken together they form a new challenge requiring a new strategic approach.

THE MOBILE CHIMERA

Three technology trends are rapidly converging upon the enterprise: VDI, mobile, and IPv6. Each is driven in part by the others, and each requires in part the functionality and support of another. Addressing the challenges accompanying this trifecta requires a serious evaluation of the enterprise infrastructure with an eye toward performance, scalability, and flexibility, lest it be overwhelmed by demand originating both internally and externally.

Mobile

The myriad articles, blogs, and editorial orations on mobile device growth have to date focused on the need for organizations to step up and accept the need for device-ready enterprise applications. This focus has thus far ignored the reality of the diversity of the device client base, the ramifications of which those with long careers in IT will painfully recall from the client-server era. Thus it is no surprise that interest in and adoption of technology such as VDI is on the rise, as virtualization serves as a popular solution to the problem of delivering applications to a highly diverse set of clients. But virtualization, as popular a solution as it may be, is not a panacea. Security and control over corporate resources and applications are a growing necessity today because of the ease with which users can take advantage of mobile technology to access them. Access control does not entirely solve the challenges of a diverse mobile client audience, as attackers turn their attention to mobile platforms as a means to gain access to resources and data previously beyond their reach. The need for endpoint security inspection continues to grow as the threat posed by mobile devices continues to rear its ugly head.

VDI

It was inevitable that as mobile device usage in the enterprise continued to grow, so too would VDI, as the most efficient way to deliver applications without requiring mobile platform-specific versions. The desire of business owners and security practitioners to keep data securely within the data center "walls" is also a factor in the rising desire to deploy VDI. VDI enables organizations to deliver applications remotely while maintaining control over data inside the data center, preserving enforcement of corporate security policies and minimizing risk. But VDI deployments are not trivial, regardless of the virtualization platform chosen.
Each virtualization solution has its challenges, and most of those challenges revolve around the infrastructure necessary to support such an initiative. Scalability and flexibility are important facets of VDI delivery infrastructure, and performance cannot be overlooked if such deployments are to be considered successful.

IPv6

Who could forget that the Internet is being pressured to move to IPv6 sooner rather than later, in part because of the growth of mobile clients? The strain placed on service providers to maintain IPv4 support as a means to not "break the Internet" can only be borne so long before IPv6 becomes, as has been predicted, the Y2K for the network. The ability to deliver applications via VDI to mobile devices will soon require support for IPv6, but will not obviate the need to support IPv4 just yet. A dual-stack approach will be required during the transition period, putting delivery infrastructure again front and center in the battle to deploy and support applications for mobile devices.

With all accounts numbering mobile devices in the four-billion range across multiple platforms, and effectively zero IPv4 addresses left to assign to those devices, it should be no surprise that as these three technology trends collide the result will be the need for a new mobility strategy. This is why solutions are strategic and technology is tactical. There exist individual products that easily solve each of these problems on its own, but very few solutions that address the combined juggernaut. It is necessary to coordinate and architect a solution that can solve all three challenges simultaneously as a means to combat complexity and its associated best friend forever, operational risk. A flexible and scalable delivery strategy will be necessary to ensure performance and security without sacrificing operational efficiency.

I Scream, You Scream, We all Scream for Ice Cream (Sandwich)
The Full-Proxy Data Center Architecture
Scaling VDI Architectures
Virtualization and Cloud Computing: A Technological El Niño
The Future of Cloud: Infrastructure as a Platform
Strategic Trifecta: Access Management
From a Network Perspective, What Is VDI, Really?
F5 Friday: A Single Namespace to Rule Them All

Mature Security Organizations Align Security with Service Delivery
#adcfw #RSAC

Traditional strategy segregates delivery from security. Traditional strategy is doing it wrong…

Everyone, I'm sure, has had the experience of calling customer service. First you get the automated system, which often asks for your account number. You know, to direct you to the right place and "serve you better." Everyone has also likely been exasperated when the first question asked by a customer service representative upon being connected to a real live person is … "May I have your account number, please?" It's frustrating and, for everyone involved, it's cumbersome.

That's exactly the process that occurs in most data centers today as application requests are received by the firewall and then passed on to the service delivery layer. Traditional data center design segregates security from service delivery. There's an entire complement of security-related components that reside at the perimeter of the network, designed to evaluate incoming traffic for a wide variety of potential security risks: DDoS, unauthorized access, malicious packets, and so on. But that evaluation is limited to the network layers of the stack. It's focused on packets and connections and protocols, and it fails to take into consideration the broader contextual information that is carried along by every request. It's asking for an account number but failing to leverage it and share it in a way that effectively applies and enforces corporate security policies. It's cumbersome.

The reality is that many of the functions executed by firewalls are duplicated in the application delivery tier by service delivery systems. What's more frustrating is that many of those functions are executed more thoroughly and to better effect (i.e. they mitigate risk more effectively) at the application delivery layer. What should be frustrating to those concerned with IT budgets and operational efficiency is that this disconnected security strategy is more expensive to acquire, deploy, and maintain. Using shared infrastructure is the hallmark of a mature security organization; it's a sign of moving toward a security strategy that is not only more technically adept but also financially sound.

SHARED INFRASTRUCTURE

We most often hear the term "shared infrastructure" with respect to cloud computing and its benefits. The sharing of infrastructure across organizations in a public cloud computing environment nets operational savings not only from alleviating the need to manage the infrastructure but also from the fact that the capital costs are shared across hundreds if not thousands of customers. Inside the data center, private cloud computing models are rising to the top of the "must have" list for IT for similar reasons. In the data center, however, there are additional technical and security benefits that should not be overlooked.

Aligning corporate security strategy with the organization's service delivery strategy by leveraging shared infrastructure provides a more comprehensive, strategic deployment that is not only more secure but also more cost effective. Service delivery solutions already provide a wide variety of threat mitigation services that can be leveraged to mitigate the performance degradation associated with a disjointed security infrastructure, the kind that leads 9 of 10 organizations to sacrifice security in favor of performance.
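In principle, a shared delivery-and-security tier means parsing the request once and letting both the security policy and the routing decision work from the same parsed context. The sketch below is a hypothetical illustration of that pattern, not any particular product's API; the fields, header names, and pool names are invented.

```python
"""Hypothetical sketch: parse the request once, then apply security policy
and delivery (routing) decisions against the same parsed context, instead of
re-parsing at a separate firewall tier and again at the delivery tier.
"""
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestContext:
    """Everything extracted in a single pass over the request."""
    client_ip: str
    method: str
    path: str
    account_id: Optional[str]   # the 'account number': extracted once, reused


def parse_once(raw: dict) -> RequestContext:
    return RequestContext(
        client_ip=raw["client_ip"],
        method=raw["method"],
        path=raw["path"],
        account_id=raw.get("headers", {}).get("X-Account-ID"),
    )


def security_policy(ctx: RequestContext) -> bool:
    """Security decision made on the already-parsed context."""
    if ctx.method not in {"GET", "POST"}:
        return False
    if ctx.account_id is None and ctx.path.startswith("/account"):
        return False
    return True


def route(ctx: RequestContext) -> str:
    """Delivery decision made on the same context -- no second parse."""
    return "billing-pool" if ctx.path.startswith("/account") else "web-pool"


def handle(raw: dict) -> str:
    ctx = parse_once(raw)                 # crack the packet only once
    if not security_policy(ctx):
        return "403 rejected at the shared tier"
    return f"forwarded to {route(ctx)}"


print(handle({"client_ip": "203.0.113.9", "method": "GET",
              "path": "/account/42", "headers": {"X-Account-ID": "42"}}))
print(handle({"client_ip": "203.0.113.9", "method": "TRACE", "path": "/"}))
```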
By leveraging shared infrastructure to perform both service delivery acceleration and security, neither performance nor security need be sacrificed, because doing so essentially aligns with the mantra of the past decade with regard to performance and security: crack the packet only once. In other words, don't ask the customer for their account number twice. It's cumbersome, frustrating, and an inefficient means of delivering any kind of service.

F5 Friday: When Firewalls Fail…
At the Intersection of Cloud and Control…
F5 Friday: Multi-Layer Security for Multi-Layer Attacks
1024 Words: If Neo Were Your CSO
F5 Friday: No DNS? No … Anything.
F5 Friday: Performance, Throughput and DPS
When the Data Center is Under Siege Don't Forget to Watch Under the Floor
Challenging the Firewall Data Center Dogma
What We Learned from Anonymous: DDoS is now 3DoS
The Many Faces of DDoS: Variations on a Theme or Two

Cloud Control Does Not Always Mean ‘Do it yourself’
You're still asking the wrong questions about cloud computing.

The city of Santa Clara is covered by a cloud this week, but not the kind of cloud most folks associate with California. CloudConnect 2011 is gearing up for a week of sessions and workshops, thought-provoking panels, and general conversation on a topic that continues to be top of mind for everyone from press to analysts to IT professionals. "Everyone" is going to be there. Well, everyone but me.

Now you might think that's odd, that a co-chair of a track at a conference wouldn't attend the show. My cohort in cloud crime, Randy Bias, will be moderating many of the Private Cloud track panels and generally making sure that the track is as exciting, informative, and educational as we hope it will be. But I'll be in my home office, watching the chatter and sound bites intently from the sidelines via Twitter and blogs.

I rarely wax too personal or complain because, well, I'm from the Midwest. We're stoics and pragmatists, and "it is what it is" is not an uncommon mantra for us. But it's pertinent in this case because it's ultimately the cause of my absence from the show, and it provides some insight into cloud computing and organizational approaches to leveraging the right "cloud" for the "application."

WHAT CELIACS can TEACH US about CONTROL

Nearly two years ago I was diagnosed with Celiac's Disease. There's a lot of misinformation and misunderstanding about Celiac's out there, and even folks who have family members diagnosed often don't "get" the impact on your daily life, to say nothing of traveling and professional life. Add in a healthy dose of the popularizing of a "gluten-free" diet as the "new black" of dietary health for very visible celebrities like Oprah, and you have yourself a perfect storm of misconception regarding a disease that's most often described by experts as "debilitating." That's probably because most Celiacs are very thin, so it must be what they eat, right? Unfortunately for everyone, it isn't the diet; it's malabsorption and malnutrition, neither of which is a really good thing in the long run.

Talk to folks who frequent support forums for sufferers of Celiac's and you'll generally find a common theme regarding travel: they've given up. We don't eat out at restaurants and we don't travel far from the safety net of our own homes. That's because at home we have control; not just over what we eat, but over our environment. We have control over the process by which the food we eat is prepared and served, and ultimately that's as important as, if not more so than, what that food contains.

Every Celiac reacts differently to ingesting gluten. Some experience no side effects at all (asymptomatic) and others are wracked with so much pain and illness they end up in the hospital. If you think about having a stomach flu for two to three weeks, you wouldn't be far from how many Celiacs react to ingesting even microscopic amounts of gluten. Yes, microscopic amounts. Trust me, our toddler is the cleanest three-year-old in existence; the dust from Captain Crunch Berries is full of gluten, after all, and three-year-olds are not known for their proficiency with utensils (or their proclivity to use them). If you think about how that translates to eating out or on the run at a conference, you'll probably see that the practice is a whole lot more difficult than the theory.
It's all about process in my house these days: about following certain procedures to ensure that even minute traces of gluten do not come in contact with me, my food, or anything I might touch. If you can imagine trying to enforce such processes and policies while traveling, you'll probably see why so many Celiacs give up and cut travel from their lives. So after more than a year of traveling to conferences and events and ending up sick, I took a step back to try to figure out how I could manage the processes and procedures I need to enforce to stay healthy while traveling. What I've discovered is that, as with cloud computing, control is not a synonym for "do it yourself"; it's about asking the right questions before you do anything else.

DO not CONFUSE CONTROL with DIY

Like Celiac's, cloud computing is not just about the ingredients; it's about how they are put together, the process and preparation. Ultimately, ensuring that a cloud computing initiative achieves the goals the organization intended requires control. That control is over the implementation and, ultimately, over the deployment, to ensure ongoing compliance with operational and organizational policies intended to ensure the efficiency, security, and speedy delivery of applications critical to the business. Which makes the standard question, "Which applications are 'right' for the cloud?", the wrong question in the first place.

It's not just the applications you have to match to any given cloud implementation; it's the application ecosystem. Dependencies on application and network infrastructure providing for the security, optimization, or availability of the application must be considered when determining where to deploy an application: internal, external, cloud, or traditional. As part of the vendor "machine" I of course hope you want to replicate your infrastructure in the cloud, but in many cases today this is simply not realistic. Either topological constraints or infrastructure integration issues will prevent such a deployment from happening. What's important, overall, is to match the application's operational dependencies to services available in a cloud environment. If that's by deploying virtual network appliances, great. If it's by leveraging services in the cloud, that's great too. The point is that you can't simply look at the application; you have to examine its dependencies in the storage and application delivery network and replicate them, through service or solution, in the cloud environment (or architecturally, but that's another discussion).

The questions you should be asking about cloud are the same kinds of questions I have to ask a restaurant: How are meals prepared and handled in the kitchen? How are applications isolated to prevent collateral damage? What optimization services are available? Are WAN optimization services an option? How does the cloud provider combat jitter? How do you replicate application access control processes in the cloud environment? What infrastructure services can I provision (if not replicate) in the cloud environment?

I've recently eaten at a number of restaurants successfully (i.e. without ending up sick for weeks). The key was always asking the right questions: asking about isolation techniques and shared services, asking about the tools used and the processes for handling the food from preparation to delivery. The key to successfully deploying an application in an external (public) cloud computing environment is no different.
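One way to make "asking the right questions" concrete is to write the application's operational dependencies down as data and check them against what a candidate environment can actually provide. The sketch below is purely hypothetical; the service names and environment catalogs are invented for illustration and drawn from no real provider.

```python
"""Hypothetical sketch: match an application's operational dependencies
(the whole ecosystem, not just the app) against the services a candidate
environment actually offers. All service and environment names are made up.
"""

# Dependencies the application relies on today, inside the data center.
app_dependencies = {
    "web_application_firewall",
    "wan_optimization",
    "identity_federation",     # extend existing access management
    "layer7_load_balancing",
}

# What each candidate environment can provide (illustrative only).
environments = {
    "public-cloud-A": {"layer7_load_balancing", "web_application_firewall"},
    "private-cloud":  {"layer7_load_balancing", "web_application_firewall",
                       "wan_optimization", "identity_federation"},
}

for name, services in environments.items():
    missing = app_dependencies - services
    if missing:
        print(f"{name}: not yet a fit -- missing {sorted(missing)}")
    else:
        print(f"{name}: all operational dependencies can be met")
```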
The control you exercise is also in the decision-making process: in asking the right questions in the first place, not necessarily in commandeering the kitchen. Don't think that sticking to private cloud computing alleviates the need to ask and answer those questions. The control you exercise in your private cloud implementation is as vital to your long-term success as the control exercised over public cloud computing. Just as I examine the ingredient list on every product I might eat, even if it's labeled "gluten-free," so must you examine the infrastructure ingredients necessary for each application you want to deploy in a cloud environment. The questions still need to be asked, because it isn't just a matter of virtualizing an application and sticking a self-service layer over it. There are myriad network and application network components that make up an "application," and it is those services that must also be considered when posing the question "Is this application right for 'the cloud'?", whether private or public.

So, just as a Celiac's health is up to the Celiac alone, the health (security, performance, and availability) of the applications you manage to support the business is ultimately up to you and you alone. You need to take control of the processes and ensure that you're asking the right questions before deploying an application in any environment. See, control isn't necessarily the same thing as "do it yourself." Public cloud computing can be the right answer, but only if you've asked the right questions in the first place.

You can learn more about Celiac's Disease (also commonly called Celiac Sprue) by visiting the Celiac Sprue Association.

Related blogs & articles:
Hybrid Cloud: Fact, Fiction or Future?
Data Center Feng Shui: Process Equally Important as Preparation
The Gluten-free Application Network
Knowing is Half the Battle
Putting the Cloud Before the Horse
If You Focus on Products You'll Miss the Cloud
The Zero-Product Property of IT
What is a Strategic Point of Control Anyway?
Why You Need a Cloud to Call your Own | F5 White Paper
The New Network

About that ‘Unassailable Economic Argument’ for Public Cloud Computing
Turns out that 'unassailable' economic argument for public cloud computing is very assailable.

The economic arguments are unassailable. Economies of scale make cloud computing more cost effective than running their own servers for all but the largest organisations. Cloud computing is also a perfect fit for the smart mobile devices that are eating into PC and laptop market.
-- Tim Anderson, "Let the Cloud Developer Wars Begin"

Ah, Tim. The arguments are not unassailable, and in fact it appears you might be guilty of having tunnel vision: seeing only the list price and forgetting to factor in the associated costs that make public cloud computing not so economically attractive in many situations. Yes, on a per-hour basis, per CPU cycle, per byte of RAM, public cloud computing is almost certainly cheaper than any other option. But that doesn't mean that arguments for cloud computing (which is much more than just cheap compute resources) are economically unassailable. Ignoring for a moment that it isn't as clear-cut as basing a deployment strategy purely on costs, the variability in bandwidth and storage costs, along with other factors that generate both hard and soft costs associated with applications, must be considered.

MACRO versus MICRO ECONOMICS

The economic arguments for cloud computing almost always boil down to the competing views of micro versus macro economics. Those in favor of public cloud computing are micro-economic enthusiasts, narrowing in on the cost per cycle or hour of a given resource. But micro-economics don't work for an application, because an application is not an island of functionality; it's an integrated, dependent component that is part of a larger, macro-economic environment in which other factors impact total costs.

The lack of control over resources in external environments can be problematic for IT organizations seeking to leverage cheaper, commodity resources in public cloud environments. Failing to impose constraints on auto-scaling (and to define processes for de-scaling), together with the inability to track and manage developer instances launched and left running, are certainly two of the more common causes of "cloud sprawl." Such scenarios can certainly lead to spiraling costs that, while not technically the fault of cloud computing or providers, may engender enough concern in enterprise IT to keep it from pushing the "launch" button.

The touted cost savings associated with cloud services didn't pan out for Ernie Neuman, not because the savings weren't real, but because the use of the service got out of hand. When he worked in IT for the Cole & Weber advertising firm in Seattle two and a half years ago, Neuman enlisted cloud services from a provider called Tier3, but had to bail because the costs quickly overran the budget, a victim of what he calls cloud sprawl - the uncontrolled growth of virtual servers as developers set them up at will, then abandoned them to work on other servers without shutting down the servers they no longer need. Whereas he expected the developers to use up to 25 virtual servers, the actual number hit 70 or so. "The bills were out of control compared with what the business planned to spend," he says.
-- Unchecked usage can kill cost benefits of cloud services

But these are not the only causes of cost overruns in public cloud computing environments, and in fact uncontrolled provisioning, whether due to auto-scaling or developer forgetfulness, is not peculiar to public cloud; it can be a problem in private cloud computing implementations as well. Without the proper processes and policies, and the right infrastructure and systems to enforce them, cloud sprawl will certainly impact even those large enterprises for whom private cloud is becoming such an attractive option. While it's vastly more difficult to implement the proper processes and procedures automatically in public as opposed to private cloud computing environments, because of the lack of maturity in infrastructure services in the public arena, there are other, hotter issues in public cloud that will just as quickly burn up an IT or business budget if not recognized and addressed before deployment. And it's these that cloud computing cannot necessarily address even by offering infrastructure services, which makes private cloud all the more attractive.

TRAFFIC SPRAWL

Though not quite technically accurate, we'll use traffic sprawl to describe the increasing amount of unrelated traffic a cloud-deployed application must process. It's the extra traffic, the malicious attacks and the leftovers from the last application that occupied an IP address, that the application must field and ultimately reject. This traffic is nothing less than a money pit, burning up CPU cycles and RAM that translate directly into dollars for customers. Every request an application handles, good or bad, costs money.

The traditional answer to preventing the unnecessary consumption of server resources by malicious or unwanted traffic is a web application firewall (WAF) and basic firewalling services. Both do, in fact, prevent that traffic from consuming resources on the server because they reject it, thereby preventing it from ever being seen by the application. So far so good. But in a public cloud computing environment you're going to have to pay for the resources those services consume, too. In other words, you're paying per hour to process illegitimate and unwanted traffic no matter what. Even if IaaS providers were to offer WAF and additional firewall services, you're going to pay for those, and all the unwanted, malicious traffic that comes your way will still cost you, burning up your budget faster than you can say "technological money pit."

This is not to say that both types of firewall services are not a good idea in a public cloud environment; they are a valuable resource regardless and should be part and parcel of any dynamic infrastructure. But it is true that in a public cloud environment they address only security issues; they are unlikely to redress cost overruns and instead may help you further along the path to budget burnout.

HYBRID WILL DOMINATE CLOUD COMPUTING

I've made the statement before and I'll make it again: hybrid models will dominate cloud computing, due primarily to issues around control. Control over processes, over budgets, and over services. The inability to effectively control traffic at the network layer imposes higher processing and server consumption rates in public environments than in private, controlled environments, even when public resources are leveraged in the private environment through hybrid architectures enabled by virtual private cloud computing technologies.
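The traffic-sprawl point is ultimately arithmetic: in a metered environment you pay to receive and reject traffic you never wanted. The back-of-envelope sketch below uses invented figures; the absolute dollar amounts are meaningless, but the proportion of spend consumed by unwanted traffic is the point.

```python
"""Back-of-envelope sketch (all figures invented): the cost of paying,
per metered hour, to receive and reject traffic you never wanted.
"""

legit_requests_per_month = 200_000_000
unwanted_fraction = 0.30              # leftover and malicious traffic hitting a shared IP
requests_per_instance_hour = 360_000  # assumed capacity of one instance
cost_per_instance_hour = 0.12         # assumed compute price, USD

total_requests = legit_requests_per_month / (1 - unwanted_fraction)
unwanted_requests = total_requests - legit_requests_per_month

wasted_hours = unwanted_requests / requests_per_instance_hour
useful_hours = legit_requests_per_month / requests_per_instance_hour

wasted_cost = wasted_hours * cost_per_instance_hour
useful_cost = useful_hours * cost_per_instance_hour

print(f"unwanted requests processed per month: {unwanted_requests:,.0f}")
print(f"instance-hours spent just receiving and rejecting them: {wasted_hours:,.1f}")
print(f"spend on wanted traffic: ${useful_cost:,.2f}  on unwanted traffic: ${wasted_cost:,.2f}")
print(f"overhead: {wasted_cost / useful_cost:.0%} added to the bill before a single real user is served")
```

Whatever the actual rates, the overhead scales with the fraction of unwanted traffic, which is exactly the quantity a tenant on a shared public IP does not control.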
Traffic sprawl arising from shared IP addresses in public cloud computing environments is simply not a factor in private, and even hybrid-style, architectures where public resources are never exposed via a publicly accessible IP address. Malicious traffic is never processed by applications and servers in a well-secured and well-architected private environment, because firewalls and application firewalls screen out such traffic and prevent it from unnecessarily increasing compute and network resource consumption, effectively expanding the capacity of existing resources. The costs of such technology and controls are shared across the organization and are fixed, leading to better forecasting in budgeting and planning and eliminating the concern that such essential services might be the cause of a budget overrun. Control over provisioning of resources in private environments is more easily achieved through existing and emerging technology, while public cloud computing environments still struggle to offer even the most rudimentary of data center infrastructure services.

Without the ability to apply enterprise-class controls and limits to public cloud computing resources, organizations are likely to find that the macro-economic costs of cloud end up negating the benefits initially realized by cheap, easy-to-provision resources. A clear strategy with defined boundaries and processes, both technical and people-related, must be defined before making the leap, lest sprawl overrun budgets and eliminate the micro-economic benefits that could be realized by public cloud computing.

How to Earn Your Data Center Merit Badge
Two words: be prepared.

Way back when, Don was the Scoutmaster for our local Boy Scout Troop. He'd been a Scout and earned his Eagle and, as we had a son entering scouting age, it was a great opportunity for Don to give back and for me to get involved. I helped out in many ways, not the least of which was to help the boys memorize the Scout promise and be able to repeat on demand its Motto (Be Prepared) and its Slogan (Do a good turn daily). Back then there was no Robotics Merit Badge (it was eerily introduced while I was writing this post, not kidding), but Scouts embracing the concept of being prepared were surely able to apply that principle to other aspects of their lives, covered by merit badges or not. I was excited reading about this newest merit badge, of course, as our pre-schooler is an avid lover of robots, and knowing he may be able to merge the two was, well, very cool for a #geek parent.

Now, the simple motto of the Boy Scouts is one that will always serve IT well, especially when it comes to operational efficiency and effectiveness in dealing with unanticipated challenges. It was just such a motto put forward in different terms by a director in the US Federal Government working on "emergency preparedness plans." In a nutshell, he said, "Think about what you would do the day after and do it the day before." That was particularly good advice that expanded well on what it means to "Be Prepared."

Now obviously IT has to be more responsive to potential outages or other issues in the data center than the next day. But the advice still holds if we reduce it to this: put into place the policies and processes you would use to address a given challenge before it becomes a challenge. Or at least be prepared to implement such policies and processes should they become necessary. The deciding factor in when to implement pre-challenge policies is likely the time required. For example, if you lose your primary ISP connection, what would you do? Provision a secondary connection to provide connectivity until the primary is returned to service, most likely. Given the time it takes to provision such a resource, it's probably best to provision it before you need it. Similarly, the time to consider how you'll respond to a flash crowd is before it happens, not after. Ask yourself how you would maintain performance and availability, and then determine how best to ensure that those pieces of the solution that cannot be provisioned or implemented on demand are in place before they are needed.

EARNING the DATA CENTER MERIT BADGE

It is certainly the case that some policies, if pre-implemented as a mitigation technique to address future challenges, might interrupt normal operations in the data center. As a means to alleviate this possibility, it is advised that such policies be implemented in such a way as to trigger only in the event of an emergency; in other words, based on context and with a full understanding of the current conditions within and without the data center. Contextually aware policies implemented at a strategic point of control offer the means by which IT can "be prepared" to handle an emergency situation: suddenly constrained capacity, performance degradation, and even attacks against the data center network or the applications delivered from it.
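One way to codify that "day before" preparation is as dormant, contextually triggered policies: the mitigation is defined and rehearsed in advance but activates only when observed conditions cross the thresholds that define an emergency. The sketch below is a hypothetical illustration of the pattern; the thresholds and actions are invented and it represents no vendor's policy language.

```python
"""Hypothetical sketch: pre-built mitigation policies that lie dormant
until observed conditions (context) say the data center is in trouble.
Thresholds and actions are illustrative only.
"""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Conditions:
    """A snapshot of context gathered from monitoring."""
    capacity_used_pct: float
    p95_latency_ms: float
    requests_per_sec: float


@dataclass
class DormantPolicy:
    name: str
    trigger: Callable[[Conditions], bool]   # decided the day before
    action: Callable[[], None]              # rehearsed the day before


def enable_cloud_burst() -> None:
    print("ACTION: provisioning overflow capacity in the secondary site")


def enable_request_throttling() -> None:
    print("ACTION: rate-limiting non-critical API clients")


policies = [
    DormantPolicy("flash-crowd",
                  trigger=lambda c: c.capacity_used_pct > 85 and c.requests_per_sec > 5000,
                  action=enable_cloud_burst),
    DormantPolicy("performance-degradation",
                  trigger=lambda c: c.p95_latency_ms > 800,
                  action=enable_request_throttling),
]


def evaluate(context: Conditions) -> None:
    """Run every dormant policy against current context; fire only on a match."""
    for policy in policies:
        if policy.trigger(context):
            print(f"policy '{policy.name}' triggered")
            policy.action()


evaluate(Conditions(capacity_used_pct=60, p95_latency_ms=120, requests_per_sec=900))   # quiet day: nothing fires
evaluate(Conditions(capacity_used_pct=92, p95_latency_ms=950, requests_per_sec=7200))  # emergency: both fire
```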
Such policies, and the processes by which they are deployed, have traditionally been a manual operations task: push a new configuration, provision a new server, or force an update to a routing table. But contextually aware solutions provide a mechanism for encapsulating much of the process and policy required to address challenges that arise occasionally in the data center. You need infrastructure components that are capable of adapting the enforcement of policies with little to no manual intervention, such that availability, security, and performance levels are maintained at all times. That's Infrastructure 2.0, for the uninitiated. These components must be aware of all factors that might degrade the operational posture of any one of the three, incurring operational risk that is unacceptable to the business. By leveraging strategic points of control to deploy contextually aware policies, you can in many cases automatically respond to the unexpected without disruption. This leads to consistent application performance, behavior, and availability, and ensures that IT is meeting the challenges of the business.

Similarly, when considering deploying an application in a public cloud computing environment, part of the process needs to be asking serious questions regarding the management and future integration needs of that application. Today it may not be business critical, but if and when it is, what then? How would you integrate that application's data with your internal systems? How would you integrate processes that rely upon that application with business or operational processes inside the data center? How might you extend identity and application access management systems such that cloud-hosted applications can leverage them?

Being prepared in the data center means having the strategic platforms in place before they're necessary. It then requires laying out a set of tactical plans that address specific challenges that may arise along the way, noting the specific conditions that "trigger" the need for such measures, so that the "day after" procedures are codified and can be provisioned automatically when necessary. Doing so improves the responsiveness of IT, a major driver toward IT as a Service for both IT and the business.

Fulfilling the requirements for a data center merit badge is a lot easier than you might think: consider the challenges you may need to address, formulate a plan, and then implement it. Then wear your badge proudly. You'll have earned it.

Related blogs & articles:
Cloud Chemistry 101
Data Center Feng Shui: Reliability is not the Absence of Failure
Now Witness the Power of this Fully Operational Feedback Loop
Solutions are Strategic. Technology is Tactical.
What CIOs Can Learn from the Spartans
Operational Risk Comprises More Than Just Security
The Strategy Not Taken: Broken Doesn't Mean What You Think It Means
What is a Strategic Point of Control Anyway?

Data Center Feng Shui
The right form factor in the right location at the right time will maximize the benefits associated with cloud computing and virtualization.

Feng Shui, simply defined, is the art of knowing where to place things to maximize benefits. There are many styles of Feng Shui, but the goal of all forms is to create the most beneficial environment in which one can live, work, play, etc… based on the individual's goals.

Historically, feng shui was widely used to orient buildings—often spiritually significant structures such as tombs, but also dwellings and other structures—in an auspicious manner. Depending on the particular style of feng shui being used, an auspicious site could be determined by reference to local features such as bodies of water, stars, or a compass. Feng shui was suppressed in China during the cultural revolution in the 1960s, but has since seen an increase in popularity, particularly in the United States.
-- Feng Shui, Wikipedia

In the US, at least, Feng Shui has gained popularity primarily as it relates to interior design: the art of placing your furniture in the right places based on its relationship to water, stars, and compass directions. Applying the art of Feng Shui to your data center architecture is not nearly as difficult as it may sound, because essentially you're doing the same thing: determining the best location (on or off premise? virtual or physical? VNA or hardware?) for each network, application delivery network, and security component in the data center, based on a set of organizational (business and operational) needs or goals. The underlying theory of Feng Shui is that location matters, and it is certainly true that in the data center location and form factor matter to the harmony of the whole. The architectural decisions regarding a hybrid cloud computing infrastructure (a mix of virtual network appliances, hardware, and software) have an impact on many facets of operational and business goals.

Solutions are Strategic. Technology is Tactical.
And it all begins with the business.

Last week was one of those weeks where my to-do list was growing twice as fast as I was checking things off. And when that happens, you know some things end up deprioritized and just don't get the attention you know they deserve. Such was the case with a question from eBizQ regarding the relationship between strategy and technology:

Does strategy always trump technology? As Joe Shepley wonders in this interesting post, Strategy Trumps Technology Every Time, could you have an enterprise content management strategy without ECM technology? So do you think strategy trumps technology every time?

I answered with a short response because, well, it was a very long week:

I wish I had more time to expound on this one today but essentially technology is a tactical means to implement a solution as part of the execution on a strategy designed to address a business need/problem.

That definitely deserves more exploration and explanation.

STRATEGY versus TACTICS

The reason this was my answer is the difference between strategy and tactics. Strategy is the overarching goal; it's the purpose toward which you are working. Tactics, on the other hand, are the specific details regarding how you're going to achieve that goal. Let's apply it to something more mundane: the focus of a strategy may be very narrow (consuming a sammich, for example) or it may be very broad and vague, as it often is when applied to military or business strategy. Regardless, a strategy is always a response to some challenge, and it defines the goal, the solution, to addressing that challenge. Business analysts don't sit around, after all, and posit that the solution to rising call duration in the call center is to implement software X deployed on a cloud computing framework. The solution is to improve the productivity of the customer service representatives. That may result in the implementation of a new CRM system, i.e. technology, but it just as well may be a more streamlined business process that requires changes in the integration of the relevant IT systems.

The implementation, the technology, is tactical. Tactics are more specific. In military strategy the tactics are often refined as the strategy is imparted down the chain of command. If the challenge is to stop the enemy from crossing a bridge, the tactics will be very dependent on the resources and personnel available to each commander as they receive their orders. A tank battalion, for example, is going to use different tactics than the engineer corps, because they have different resources, equipment, and ultimately perspectives on how to go about achieving any stated goal.

The same is true for IT organizations. The question posed was focused on enterprise content management, but you can easily abstract this out to an enterprise architecture strategy, an application delivery strategy, or a cloud computing strategy. Having a strategy does not require a related technology, because technology is tactical; solutions are strategic. The challenge for an organization may be too much content, or it may be that the challenge is process-related, e.g. the approval process for content as it moves through the publication cycle is not well defined, or has a single point of failure that causes delays in publication. The solution is the strategy.
For the former, the solution may be to implement an enterprise content management solution; for the latter, it may be to sit down and hammer out a better process, and even to acquire and deploy a workflow or BPM (Business Process Management) solution that is better able to manage fluctuations in people and the process. The tactics are the technology; they're the how we're going to do it, as opposed to the what we're going to do.

CHALLENGE -> SOLUTION -> TECHNOLOGY

This is an important distinction: separating solutions from technology, strategy from tactics. If the business declares that the risk of a data breach is too high to bear, the enterprise IT strategy is not to implement a specific technology but to discover and plug all the possible "holes" in the strategic lines of defense. The solution to a vulnerability in an application is "web application security." The technology may be a web application firewall (WAF), or it may be vulnerability scanning solutions run against pre-deployed code to identify potential vulnerabilities. When we talk about strategic points of control, we aren't necessarily talking about specific technology but rather about solutions and about those locations within the data center that are best able to be leveraged tactically in service of a wide variety of strategic solutions.

The strategic trifecta is a good example of this model because it's based on the same concepts: a strategy is driven by a business challenge or need and executed upon using technology. The solution is not the implementation; it's not the tactical response. Technology doesn't enter the picture until we get down to the implementation, to the specific products and platforms we need to implement a strategy consistent with meeting the defined business goal or challenge.

The question remains whether "strategy trumps technology" or not, and what I was trying to impart is what a subsequent response said much more eloquently and concisely:

The question isn't which one trumps but how should they be aligned in order to provide value to the customer.
-- Kathy Long

There shouldn't be a struggle between the two for top billing honors. They are related, after all; a strategy needs to be implemented, to be executed upon, and that requires technology. It's more a question of which comes first in a process that should be focused on solving a specific problem or meeting some business challenge. Strategy needs to be defined before implementation, because if you don't know what the end goal is, you really can't claim victory or admit defeat. A solution is strategic; technology is tactical. This distinction can help IT by forcing more attention on the business and solutions layer, for it is at the strategic layer that IT is able to align itself with the business and provide greater value to the entire organization.

Does strategy always trump technology?
What CIOs Can Learn from the Spartans
Operational Risk Comprises More Than Just Security
The Strategy Not Taken: Broken Doesn't Mean What You Think It Means
What is a Strategic Point of Control Anyway?
Cloud is the How not the What