efficiency
True or False: Application acceleration solutions teach developers to write inefficient code
It has been suggested that the use of application acceleration solutions as a means to improve application performance would result in programmers writing less efficient code. In a comment on "The House that Load Balancing Built" a reader replies:

"Not only will it cause the application to grow in cost and complexity, it's teaching new and old programmers to not write efficient code and rely on other products and services on [sic] thier behalf. I.E. Why write security into the app, when the ADC can do that for me. Why write code that executes faster, the ADC will do that for me, etc., etc."

While no one can control whether a programmer writes "fast" code, the truth is that application acceleration solutions do not affect the execution of code in any way. A poorly constructed loop will run just as slowly with or without an application acceleration solution in place. Complex mathematical calculations will execute with the same speed regardless of the external systems that may be in place to assist in improving application performance. The answer is, unequivocally, that the presence or absence of an application acceleration solution should have no impact on the application developer because it does nothing to affect the internal execution of written code. If you answered false, you got the answer right.

The question has to be, then, just what does an application acceleration solution do that improves performance? If it isn't making the application logic execute faster, what's the point? It's a good question, and one that deserves an answer. Application acceleration is part of a solution we call "application delivery". Application delivery focuses on improving application performance through optimization of the use and behavior of transport (TCP) and application transport (HTTP/S) protocols, offloading certain functions from the application that are more efficiently handled by an external, often hardware-based, system, and accelerating the delivery of the application data.

OPTIMIZATION

Application acceleration improves performance by understanding how these protocols (TCP, HTTP/S) interact across a WAN or LAN and acting on that understanding to improve their overall performance. There are a large number of performance-enhancing RFCs (standards) around TCP that are usually implemented by application acceleration solutions:

- Delayed and Selective Acknowledgments (RFC 2018)
- Explicit Congestion Notification (RFC 3168)
- Limited and Fast Re-Transmits (RFC 3042 and RFC 2582)
- Adaptive Initial Congestion Windows (RFC 3390)
- Slow Start with Congestion Avoidance (RFC 2581)
- TCP Slow Start (RFC 3390)
- TimeStamps and Window Scaling (RFC 1323)

All of these RFCs deal with TCP and therefore have very little to do with the code developers create. Most developers code within a framework that hides the details of TCP and HTTP connection management from them. It is the rare programmer today who writes code to directly interact with HTTP connections, and even rarer to find one coding directly at the TCP socket layer. The execution of code written by the developer takes just as long regardless of the implementation or lack of implementation of these RFCs. The application acceleration solution improves the performance of the delivery of the application data over TCP and HTTP, which increases the performance of the application as seen from the user's point of view.
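To underline how far below typical application code these optimizations live, here is a minimal Python sketch of the few TCP knobs an application can even reach at the socket layer. The option names are standard socket options, but whether and how a given platform honors them is an assumption; the behaviors in the RFC list above (SACK, window scaling, congestion avoidance) are negotiated by the operating system's TCP stack, not set per line of application code.

```python
import socket

# Minimal sketch: the little an application can touch lives at the socket
# layer, well below the web frameworks most developers use. Everything in
# the RFC list above is handled by the kernel's TCP implementation.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small writes are not coalesced.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Request a larger receive buffer; the kernel may round or cap this value.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)

print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
sock.close()
```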
OFFLOAD

Offloading compute-intensive processing from application and web servers improves performance by reducing the consumption of CPU and memory required to perform those tasks. SSL and other encryption/decryption functions (cookie security, for example) are computationally expensive and require additional CPU and memory on the server. Offloading these functions to an application delivery controller or stand-alone application acceleration solution improves application performance because it frees the CPU and memory available on the server and allows them to be dedicated to the application. If the application or web server does not need to perform these tasks, it saves CPU cycles that would otherwise be used to perform them. Those cycles can be used by the application and thus increase the performance of the application.

Also beneficial is the way in which application delivery controllers manage TCP connections made to the web or application server. Opening and closing TCP connections takes time, and the time required is not something a developer – coding within a framework – can affect. Application acceleration solutions proxy connections for the client and subsequently reduce the number of TCP connections required on the web or application server as well as the frequency with which those connections need to be opened and closed. By reducing the number and frequency of connections, application performance is increased because the server is not spending time opening and closing TCP connections, which are necessarily part of the performance equation but not directly affected by anything the developer does in his or her code.

The commenter believes that an application delivery controller implementation should be an afterthought. However, the ability of modern application delivery controllers to offload certain application logic functions such as cookie security and HTTP header manipulation in a centralized, optimized manner through network-side scripting can be a performance benefit as well as a way to address browser-specific quirks, and therefore should be seriously considered during the development process.

ACCELERATION

Finally, application acceleration solutions improve performance through the use of caching and compression technologies. Caching includes not just server-side caching, but the intelligent use of the client (usually the browser) cache to reduce the number of requests that must be handled by the server. By reducing the number of requests the server is responding to, the web or application server is less burdened in terms of managing TCP and HTTP sessions and state, and has more CPU cycles and memory that can be dedicated to executing the application.

Compression, whether using traditional industry-standard web-based compression (GZip) or WAN-focused data de-duplication techniques, decreases the amount of data that must be transferred from the server to the client. Decreasing traffic (bandwidth) results in fewer packets traversing the network, which results in quicker delivery to the user. This makes it appear that the application is performing faster than it is, simply because it arrived sooner.

Of all these techniques, the only one that could possibly contribute to the delinquency of developers is caching. This is because application acceleration caching features act on HTTP caching headers that can be set by the developer, but rarely are.
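As a concrete illustration of the caching headers in question, here is a minimal sketch of how a developer might set them by hand. It uses Python's standard-library HTTP server purely for brevity, and the one-hour max-age is an arbitrary assumption rather than a recommendation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class StaticAssetHandler(BaseHTTPRequestHandler):
    """Minimal sketch: explicitly setting the HTTP caching headers that
    browsers and acceleration solutions act on."""

    def do_GET(self):
        body = b"body { color: #333; }"  # stand-in for a static asset (CSS)
        self.send_response(200)
        self.send_header("Content-Type", "text/css")
        # Cache-Control tells browsers and intermediaries how long the
        # response may be reused without revalidation (1 hour here).
        self.send_header("Cache-Control", "public, max-age=3600")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), StaticAssetHandler).serve_forever()
```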
These caching headers can also be configured by the web or application server administrator, but rarely are in a way that makes sense, because most content today is generated dynamically and is rarely static, even though individual components inside the dynamically generated page may in fact be very static (CSS, JavaScript, images, headers, footers, etc.). However, the methods through which caching (pragma) headers are set are fairly standard, and the actual code is usually handled by the framework in which the application is developed, meaning the developer ultimately cannot affect the efficiency of this method because it was developed by someone else.

The point of the comment was likely more broad, however. I am fairly certain that the commenter meant to imply that if developers know the performance of the application they are developing will be accelerated by an external solution, they will not be as concerned about writing efficient code. That's a layer 8 (people) problem that isn't peculiar to application delivery solutions at all. If a developer is going to write inefficient code, there's a problem – but that problem isn't with the solutions implemented to improve the end-user experience or scalability, it's a problem with the developer. No technology can fix that.

1024 Words: The Devops Butterfly Effect
#devops A single configuration error can have far-flung impact across IT and the business.

Chaos theory claims a butterfly flapping its wings in one area of the world can result in a hurricane elsewhere. The impact of devops – or the lack thereof – may not be as devastating, but it does have an impact in terms of time, money and risk.

- IT Chaos Theory: The PeopleSoft Effect
- Cloud Delivery Model is about Ops, not Apps
- 1024 Words: Building Secure Web Applications
- F5 Security: The Changing Threat Landscape
- The Changing Security Threat Landscape Infographic
- All 1024 Words Posts on DevCentral

The Secret to Doing Cloud Scalability Right
Hint: The answer lies in being aware of the entire application context and a little pre-planning.

Thanks to the maturity of load balancing services and technology, dynamically scaling applications in pre-cloud and cloud computing environments is a fairly simple task. But doing it right – in a way that maintains performance while maximizing resources and minimizing costs – well, that is not so trivial a task unless you have the right tools.

SCALABILITY RECAP

Before we can explain how to do it right, we have to dig into the basics of how scalability (and more precisely auto-scalability) works and what's required to scale not only dynamically but also efficiently. A key characteristic of cloud computing is scalability, or more precisely the ease with which scalability can be achieved.

"Scalability and Elasticity via dynamic ('on-demand') provisioning of resources on a fine-grained, self-service basis near real-time, without users having to engineer for peak loads." -- Wikipedia, "Cloud Computing"

When you take this goal apart, what folks are really after is the ability to transparently add and/or remove resources to an "application" as needed to meet demand. Interestingly enough, both in pre-cloud and cloud computing environments this happens due to two key components: load balancing and automation.

Load balancing has always been used to scale applications transparently. The load balancing service provides a layer of virtualization in the network that abstracts the "real" resources providing the application and makes many instances of that application appear to be a single, holistic entity. This layer of abstraction has the added benefit of allowing the load balancing service to see both the overall demand on the "application" as well as each individual instance. This is important to cloud scalability because a single application instance does not have the visibility necessary to see load at the "application" layer; it sees only load at the application instance layer, i.e. itself.

Visibility is paramount to scalability to maintain efficiency of scale. That means measuring CAP (capacity, availability, and performance) at both the "virtual" application and application instance layers. These measurements are generally tied to business and operational goals – the goals upon which IT is measured by its consumers. The three are inseparable and impact each other in very real ways. High capacity utilization often results in degrading performance, availability impacts both capacity and performance, and poor performance can in turn degrade capacity. Measuring only one or two is insufficient; all three variables must be monitored and, ultimately, acted upon to achieve not only scalability but efficiency of scale.

Just as important is flexibility in determining what defines "capacity" for an application. In some cases it may be connections, in others CPU and/or memory load, and in still others it may be some other measurement. It may be (should be) a combination of both capacity and performance, and any load balancing service ought to be able to balance all three variables dynamically to achieve maximum results with minimum resources (and therefore, in a cloud environment, costs).

WHAT YOU NEED TO KNOW BEFORE YOU CONFIGURE

There are three things you must do in order to ensure cloud scalability is efficient:

1. Determine what "capacity" means for your application.
This will likely require load testing of a single instance to understand resource consumption and determine an appropriate set of thresholds based on connections, memory and CPU utilization. Depending on what load balancing service you will ultimately use, you may be limited to only viewing capacity in terms of concurrent connections. If this is the case – as is generally true in an off-premise cloud environment where services are limited – then ramp up connections while measuring performance (be sure to read #3 before you measure "performance"). Do this multiple times until you're sure you have a good average connection limit at which performance becomes an issue.

2. Determine what "available" means for an application instance.

Try not to think in simple terms such as "responds to a ping" or "returns an HTTP response". Such health checks are not valid when measuring application availability, as they only determine whether the network and web server stack are available and responding properly. Both can be true yet the application may be experiencing trouble and returning error codes or bad data (or no data). In any dynamic environment, availability must focus on the core unit of scalability – the application. If that's all you've got in an off-premise cloud load balancing service, however, be aware of the risk to availability and pass on the warning to the business side of the house.

3. Determine "performance" threshold limitations for application instances.

This value directly impacts the performance of the virtual application. Remember to factor in that application response times are the sum of the time it takes to traverse from the client to the application and back. That means the application instance response time is only a portion, albeit likely the largest portion, of the overall performance threshold. Determine the RTT (round trip time) for an average request/response and factor that into the performance thresholds for the application instances.

WHY IS THIS ALL IMPORTANT

If you're thinking at this point that it's not supposed to require so much work to "auto-scale" in cloud computing environments, well, it doesn't have to. As long as you're willing to trade a higher risk of unnoticed failure with performance degradation, as well as potentially higher costs from inefficient scaling strategies, then you need do nothing more than just "let go, let cloud" (to shamelessly quote the 451 Group's Wendy Nather).

The reason that ignoring all the factors that impact when to scale out and back down is so perilous is the limitations in load balancing algorithms and, particularly in off-premise cloud environments, the inability to leverage layer 7 load balancing (application switching, page routing, et al) to architect scalability domains. You are left with a few simple and often inefficient algorithms from which to choose, which impedes efficiency by making it more difficult to actually scale in response to actual demand and its impact on the application. You are instead reacting (and often too late) to individual pieces of data that alone do not provide a holistic view of the application, but rather only limited views into application instances.

Cloud scalability – whether on-premise or off – should be a balancing (pun only somewhat intended) act that maximizes performance and efficiency while minimizing costs. While allowing "the cloud" to auto-scale encourages operational efficiency, it often does so at the expense of performance and higher costs.
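To tie the three determinations together, here is a minimal sketch of how capacity, availability and performance measurements might feed a scale-out decision. The thresholds, field names and response-time budget are illustrative assumptions, not prescriptions; a real application delivery controller would gather and act on these signals itself.

```python
from dataclasses import dataclass

@dataclass
class InstanceStats:
    """Measurements for one application instance (illustrative fields)."""
    concurrent_connections: int
    cpu_utilization: float     # 0.0 - 1.0
    healthy: bool              # did an application-level check (not a ping) pass?
    response_time_ms: float    # measured at the instance

# Assumed thresholds derived from load testing a single instance (steps 1 and 3).
MAX_CONNECTIONS = 500
MAX_CPU = 0.75
# Overall performance goal minus the average client round-trip time (step 3).
PERFORMANCE_GOAL_MS = 800
AVERAGE_RTT_MS = 150
INSTANCE_BUDGET_MS = PERFORMANCE_GOAL_MS - AVERAGE_RTT_MS

def should_scale_out(instances: list[InstanceStats]) -> bool:
    """Scale out when the healthy instances are near capacity or too slow."""
    healthy = [i for i in instances if i.healthy]
    if not healthy:
        return True  # nothing available: add capacity immediately
    near_capacity = all(
        i.concurrent_connections >= MAX_CONNECTIONS * 0.8 or i.cpu_utilization >= MAX_CPU
        for i in healthy
    )
    too_slow = any(i.response_time_ms > INSTANCE_BUDGET_MS for i in healthy)
    return near_capacity or too_slow

if __name__ == "__main__":
    pool = [
        InstanceStats(420, 0.78, True, 610),
        InstanceStats(460, 0.81, True, 700),
    ]
    print(should_scale_out(pool))  # True: both instances are near their limits
```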
An ounce of prevention is worth a pound of cure, and in the case of scalability a few hours of testing is worth a month of additional uptime.

Aligning IT with the Business by Decreasing Efficiency
Here's the conundrum: utilizing every last drop of network, storage, and compute resources can impede performance and, through it, the business' bottom line. So which do you choose?

There are a few vertical industries for which performance is absolutely critical. A delay of even a microsecond can mean a huge differential in revenue or lost opportunities. A delay of seconds is a disaster, and more than that? Might as well call yourself unavailable. While most organizations do not have such stringent "do or die" performance requirements, performance is always top of mind, because users, well, users and customers are increasingly demanding with regard to the performance of their hyper-connected, online-driven lives.

So along comes "cloud" and introduces the myth of 100% efficiency. Like a Maxwell House ad, providers and pundits alike tout full utilization – to the last drop – of compute, network, and storage infrastructure. Stop wasting resources! Put those idle resources to work for you! Save money! Do it now or the business will outsource you! It's enough to make you want to do it and the performance of their applications be damned, isn't it? So you do, and now you have to listen to complaints and watch the help desk tickets pile up regarding the performance of applications – yours and everyone else's, for that matter. You didn't realize you were going to be responsible for timely responses from Facebook and Twitter, did you?

See, there are some technical reasons why operations never ran network and server infrastructure components at 100% utilization. In fact, the rule of thumb was always 60% for most organizations and a harsher 30% or so for those with more performance-sensitive business needs.

LET'S TALK ABOUT QUEUES

No, not the UK English version of what many IT administrators and operators have hanging down their backs; the technical queues, the ones that handle input and output for network stacks and applications in every corner of the data center. See, any device that processes packets (which means everything network-capable) utilizes some sort of queue to manage the reality that packets will eventually come in faster than they can be processed. As packet processing "backs up", the queue fills up. The longer it takes for packets to get through the queue and be processed, the longer it takes the overall exchange of data to occur.

One of the reasons packets might "back up" in the queue is that the time it takes to process a packet – apply security, route it to the correct VLAN/network/etc., apply quality of service policies – is related to the utilization on the device. The more packets the device is trying to process, the more it consumes CPU, RAM and associated hardware resources, which translates into fewer available resources being spread around all the different functions that must be performed on the device. The more the resources are consumed, the slower the device can process packets.

This also happens on web/application servers, by the way, when a client is trying to receive data but is doing so over a relatively slow connection. The client can only pull data so fast, and so the send/receive queues on the web/application server remain filled with data until the client can complete the transfer. There are only so many send/receive queues available for use on a web/application server, so emptying those queues as quickly as possible is a primary focus for application delivery infrastructure as a means to improve capacity and overall performance.
In any case, there is a fixed amount of compute resources available for each device/server, and it must be shared across all the queues it is managing. As the utilization of a device increases, the time-slices each queue receives to process data decrease, which means fewer packets are processed with every processing time-slice. That means packets "back up" in the queue, waiting their turn to be processed. Waiting = latency, and latency = delay in service delivery. The higher the utilization, the longer the queues. The longer the queues, the higher the latency. It's actually a pretty simple equation when you get down to it.

A BALANCING ACT

This is where the rubber meets the road – balancing the need for speed with the need to be efficient. Idle resources are the devil's playground, right? It's a waste to leave resources unused when they could be doing something. At least that's the message being sent by cloud computing advocates, even though efficiency of IT processes is much more a realizable benefit from cloud computing than efficiency of resources. For many organizations that's absolutely true. Why not leverage idle resources to fill additional capacity needs? It just makes financial and technical sense.

Unless you're an organization whose livelihood (and revenue stream) depends on speed. If even a microsecond of latency may cost the business money, then utilization is important to you only in the sense that you want to keep it low enough on every network component that touches the data flow that near-zero latency is introduced. If that means imposing a "no more than 30% utilization on any component" policy, that's what it means – efficiency be damned. Different business models have different performance needs, and while the majority of organizations do not have such stringent requirements regarding performance, those that do will never buy into the Maxwell House theory of resource utilization. They can't, because doing so makes it impossible to meet performance requirements which are, for them, a much higher priority than utilization. Basically, the cost of failing to perform is much higher than the cost of acquiring and managing resources.

This doesn't mean that cloud computing isn't a fit for such performance-focused organizations. In fact, cloud computing can be an asset for those organizations in the same way it is for organizations trying to achieve a "good to the last drop" resource utilization policy. It's just that performance-minded organizations will set their thresholds for provisioning additional resources extremely low, to ensure optimal performance on each and every network and server component. Where most organizations may provision additional capacity when a component reaches 70-80% utilization, performance-minded organizations will likely try to remain below the 50% threshold – at around 30%. Or, more accurately, they'll use a context-aware network of application delivery components that can assist in maintaining performance levels by actually watching the real-time performance of applications and feeding that data into the appropriate systems to ensure additional capacity is provisioned before performance is impacted by increased utilization. Because load on a server – virtual or iron – is a direct input to the performance equation, utilization is an important key performance metric that should be monitored and leveraged as part of the automated provisioning process.
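That "simple equation" can be made concrete with the classic single-queue (M/M/1) model, in which average time in the system grows as 1/(1 - utilization). The model is a simplification and the service rate below is an arbitrary assumption, but it shows why latency climbs sharply well before 100% utilization.

```python
# Minimal sketch of the utilization/latency relationship using the M/M/1
# queueing model: average time in system W = 1 / (mu - lambda), where
# mu is the service rate and lambda = utilization * mu is the arrival rate.
SERVICE_RATE = 10_000.0  # packets per second a device can process (assumed)

def avg_latency_ms(utilization: float) -> float:
    arrival_rate = utilization * SERVICE_RATE
    return 1000.0 / (SERVICE_RATE - arrival_rate)

for util in (0.30, 0.60, 0.80, 0.90, 0.95, 0.99):
    print(f"{util:4.0%} utilization -> {avg_latency_ms(util):7.2f} ms average latency")

# The output shows the non-linear climb: roughly 0.14 ms at 30%, 1 ms at 90%,
# 10 ms at 99% -- the higher the utilization, the longer the queue.
```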
ENGAGE the ENTIRE INFRASTRUCTURE

Performance-minded organizations aren't just financial and banking firms, as you might assume. Organizations running call centers of any kind should be, if they aren't already, performance-focused for at least their call center applications. Why? Because milliseconds add up to seconds, add up to minutes, add up to reduced utilization of agents. It means a less efficient call center that costs more per customer to run. Most call centers leverage web-based applications, and delays in moving through those systems mean increased call duration and lowered agent utilization. That all translates into increased costs – hard and soft – that must be balanced elsewhere in the business' financial ledger.

While certainly not as laser-focused on performance as perhaps a financial institution, organizations for whom customer costs are an important key performance indicator should be concerned about the utilization of components across their entire data center. That means balancing the cost of "idle" resources against the costs incurred by delays caused by latency, and a decision: where is the tipping point for utilization? At what point does the cost of latency exceed the cost of those idle resources? That's your maximum utilization point, and it may be well below the touted 100% (or nigh unto that) utilization of cloud computing.

Don't forget, however, that this is essentially an exercise in tuning your entire data center infrastructure. You may baseline your infrastructure with a tipping point of 60% utilization, but by leveraging the appropriate application delivery technologies – caching, compression, network and application optimization, offload capabilities – the tipping point may be increased to 70% or higher. This is an iterative process requiring an agile infrastructure and operational culture; one that is able to tune, tweak, and refine the application delivery process until it's running like a finely honed race car engine, optimally burning the right amount of resources to provide just the right amount of performance such that the entire data center is perfectly balanced with the business.

This is the process that is often overlooked and rarely discussed: the data center is not – or should not be – simply a bunch of interconnected devices through which packets pass. It should be a collaborative, integrated ecosystem of components working in concert to enable the balance needed in the data center to ensure maximum utilization without compromising performance and, through it, business requirements. So though it sounds counterintuitive, it may actually be necessary to decrease efficiency within IT as a means to align with business needs and ensure the right balance of performance, utilization, and costs. Because "costs" aren't just about IT costs, they're about business costs, too. When a decrease in IT costs increases business costs, nobody wins.

Related blogs & articles:
- The Myth of 100% IT Efficiency
- IT Myths and Legends: Sharing Servers
- Cloud + BPM = Business Process Scalability
- WILS: What Does It Mean to Align IT with the Business
- Business-Layer Load Balancing
- Infrastructure 2.0: Aligning the network with the business (and the rest of IT)
- Caveat Emptor: Be sure to align your goals for cloud computing with provider models before you sign up
- Like Garth, We Fear Change
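Returning to the tipping-point question above, here is a rough sketch of the arithmetic. Every figure in it (hourly infrastructure cost, latency penalty per transaction, the shape of the latency curve) is a made-up assumption intended only to show the shape of the trade-off, not real pricing.

```python
# Rough sketch: find the utilization at which the cost of added latency
# starts to exceed the savings from running fewer, busier servers.
# All figures are illustrative assumptions.
TOTAL_SERVERS = 30
COST_PER_SERVER_HOUR = 1.50        # infrastructure cost (assumed)
TRANSACTIONS_PER_HOUR = 200_000
LATENCY_COST_PER_MS = 0.00005      # business cost of 1 ms delay per transaction (assumed)

def hourly_cost(utilization: float) -> float:
    # Servers actually needed shrinks as each one is allowed to run hotter
    # (baseline: 30 servers at 60% utilization).
    servers_needed = TOTAL_SERVERS * 0.6 / utilization
    infrastructure_cost = servers_needed * COST_PER_SERVER_HOUR
    # Simple queueing-style latency curve: delay grows as utilization nears 1.
    latency_ms = 1.0 / (1.0 - utilization)
    latency_cost = latency_ms * LATENCY_COST_PER_MS * TRANSACTIONS_PER_HOUR
    return infrastructure_cost + latency_cost

for util in (0.30, 0.50, 0.60, 0.70, 0.80, 0.90):
    print(f"{util:4.0%} utilization -> ${hourly_cost(util):7.2f} per hour")

# The minimum of this curve is the tipping point; it moves right (toward
# higher utilization) as offload and optimization flatten the latency curve.
```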
WAN Optimization is not Application Acceleration

Increasingly, WAN optimization solutions are adopting the application acceleration moniker, implying a focus that just does not exist. WAN optimization solutions are designed to improve the performance of the network, not applications, and while the former does beget improvements of the latter, true application acceleration solutions offer greater opportunity for improving efficiency and end-user experience, as well as aiding in consolidation efforts that result in a reduction in operating and capital expenditure costs.

WAN optimization solutions are, as their title implies, focused on the WAN – on the network. It is their task to improve the utilization of bandwidth, arrest the effects of network congestion, and apply quality of service policies to speed delivery of critical application data by respecting application prioritization. WAN optimization solutions achieve these goals primarily through the use of data de-duplication techniques. These techniques require a pair of devices, as the technology is most often based on a replacement algorithm that seeks out common blocks of data and replaces them with a smaller representative tag or indicator that is interpreted by the paired device, which reinserts the common block of data before passing it on to the receiver.

The base techniques used by WAN optimization are thus highly effective in scenarios in which large files are transferred back and forth over a connection by one or many people, as large chunks of data are often repeated and the de-duplication process significantly reduces the amount of data traversing the WAN and thus improves performance. Most WAN optimization solutions specifically implement "application" level acceleration for protocols aimed at the transfer of files, such as CIFS and SAMBA.

But WAN optimization solutions do very little to aid in the improvement of application performance when the data being exchanged is highly volatile and already transferred in small chunks. Web applications today are highly dynamic and personalized, making it less likely that a WAN optimization solution will find chunks of duplicated data large enough to make the overhead of the replacement process beneficial to application performance. In fact, the process of examining small chunks of data for potential duplicates can introduce additional latency that actually degrades performance, much in the same way compression of small chunks of data can be detrimental to application performance. Too, WAN optimization solutions require deployment in pairs, which means what little benefit these solutions offer for web applications is enjoyed only by end-users in a location served by a "remote" device. Customers, partners, and roaming employees will not see improvements in performance because they are not served by a "remote" device.

Application acceleration solutions, however, are not constrained by such limitations. Application acceleration solutions act at the higher layers of the stack, from TCP to HTTP, and attempt to improve performance through the optimization of protocols and the applications themselves. The optimizations of TCP, for example, reduce the overhead associated with TCP session management on servers and improve the capacity and performance of the actual application, which in turn results in improved response times.
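To make the de-duplication technique described above concrete, here is a minimal sketch of block-level replacement. Real WAN optimizers use far more sophisticated, variable-length fingerprinting against dictionaries synchronized between the paired appliances, so treat the fixed 64-byte blocks and in-memory dictionary here purely as an illustration.

```python
import hashlib

BLOCK_SIZE = 64  # fixed-size blocks purely for illustration

def deduplicate(data: bytes, dictionary: dict) -> list:
    """Replace blocks already known to the paired device with short tags."""
    stream = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        tag = hashlib.sha256(block).digest()[:8]
        if tag in dictionary:
            stream.append(("ref", tag))       # send an 8-byte tag instead of the block
        else:
            dictionary[tag] = block
            stream.append(("raw", block))     # first sighting: send the block itself
    return stream

def reassemble(stream: list, dictionary: dict) -> bytes:
    """The remote device reinserts the original blocks before delivery."""
    out = bytearray()
    for kind, payload in stream:
        if kind == "ref":
            out += dictionary[payload]
        else:
            out += payload
            dictionary[hashlib.sha256(payload).digest()[:8]] = payload
    return bytes(out)

if __name__ == "__main__":
    shared = {}  # in reality each appliance maintains its own synchronized store
    document = (b"quarterly report boilerplate " * 100) + b"edited paragraph"
    first = deduplicate(document, shared)
    second = deduplicate(document, dict(shared))  # resend: every block becomes a tag
    print(sum(1 for kind, _ in second if kind == "ref"), "of", len(second), "blocks sent as tags")
    assert reassemble(first, {}) == document
```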
The understanding of HTTP, and of both the browser and the server, allows application acceleration solutions to employ techniques that leverage cached data and industry-standard compression to reduce the amount of data transferred without requiring a "remote" device. Application acceleration solutions are generally asymmetric, with a few also offering a symmetric mode. The former ensures that regardless of the location of the user, partner, or employee, some form of acceleration will provide a better end-user experience, while the latter employs more traditional WAN optimization-like functionality to increase the improvements for clients served by a "remote" device. Regardless of the mode, application acceleration solutions improve the efficiency of servers and applications, which results in higher capacities and can aid in consolidation efforts (fewer servers are required to serve the same user base with better performance) or simply lengthens the time available before additional investment in servers – and the associated licensing and management costs – must be made.

Both WAN optimization and application acceleration aim to improve application performance, but they are not the same solutions, nor do they even focus on the same types of applications. It is important to understand the type of application you want to accelerate before choosing a solution. If you are primarily concerned with office productivity applications and the exchange of large files (including backups, virtual images, etc.) between offices, then certainly WAN optimization solutions will provide greater benefits than application acceleration. If you're concerned primarily about web application performance, then application acceleration solutions will offer the greatest boost in performance and efficiency gains. But do not confuse WAN optimization with application acceleration. There is a reason WAN optimization-focused providers have recently begun to partner with application acceleration and application delivery providers – because there is a marked difference between the two types of solutions, and a single offering that combines them both is not (yet) available.
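Since industry-standard compression keeps coming up as the web-application counterpart to de-duplication, here is a minimal sketch of the effect GZip has on a text-heavy response. The sample payload and resulting ratio are illustrative only; in practice the compression is negotiated via the Accept-Encoding and Content-Encoding headers and is typically offloaded from the application entirely.

```python
import gzip

# Minimal sketch: text-heavy web responses (HTML, CSS, JavaScript, JSON)
# compress extremely well, which is why offloading GZip to an acceleration
# device reduces bandwidth without touching application code.
html = ("<div class='row'><span class='label'>item</span>"
        "<span class='value'>42</span></div>\n") * 500

compressed = gzip.compress(html.encode("utf-8"))
print(f"original:   {len(html):6d} bytes")
print(f"compressed: {len(compressed):6d} bytes "
      f"({100 * len(compressed) / len(html):.1f}% of original)")
```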
How do you get the benefits of shared resources in a private cloud?

I was recording a podcast last week on the subject of cloud, with an emphasis on security, and of course we talked in general about cloud and definitions. During the discussion the subject of "private cloud" computing was raised, and one of the participants asked a very good question: some of the core benefits of cloud computing come from shared resources. In a private cloud, where does the sharing of resources come from?

I had to stop and think about that one for a second, because it's not something I've really thought about before. But it was a valid point; without sharing of resources the reduction in operating costs is not as easily realized. But even in an enterprise data center there is a lot more sharing that could be going on than perhaps people realize.

SHARING in the ENTERPRISE

There is a plethora of ways in which sharing of resources can be accomplished in the enterprise. That's because there are just as many divisions within an organization for which resources are often dedicated as there are outside the organization. Sometimes the separation is just maintained in the financial ledger, but just as frequently the separation manifests itself physically in the data center with dedicated resources: individual initiatives, departmental-level applications, lines of business, subsidiaries, organizations absorbed – mostly – via mergers and acquisitions.

Each of these "entities" can – and often does – have its own budgets and thus dedicated resources. Some physical resources in the data center are dedicated to specific projects, or departments, or lines of business, and it is often the case that the stakeholders of applications deployed on those resources "do not play well with others" in that they aren't about to compromise the integrity and performance of that application by sharing what might be perfectly good compute resources with other folks across the organization. Thus it is perfectly reasonable to believe that there are quite a few "dedicated" resources in any large data center which could be shared across the organization. And given chargeback methods and project portfolio management methods, this results in savings in much the same manner as it would were the applications deployed to a public cloud.

But there is also a good deal of compute resource that goes to waste in the data center due to constraints placed upon hardware utilization by organizational operating policies. Many organizations still limit the total utilization of resources on any given machine (and hardware) to somewhere between 60% and 80%. After that the administrators get nervous and begin thinking about deploying a second machine from which resources can be utilized. This is often out of consideration for performance and a fear of over-provisioning that could result in the dreaded "d" word: downtime.

Cloud computing models, however, are supposed to ensure availability and scalability through on-demand provisioning of resources. Thus if a single instance of an application begins to perform poorly or approaches capacity limits, another instance should be provisioned. The models themselves assume full utilization of all compute resources across available hardware, which means those pesky utilization limits should disappear. Imagine if you had twenty or thirty servers all running at 60% utilization that were suddenly freed to run up to 90% (or higher). That's like gaining 600-900 percentage points of utilization across the data center – the equivalent of 6 to 9 additional servers.
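A quick sketch of that arithmetic (the server counts and utilization ceilings are the ones from the example above, not measurements):

```python
# Back-of-the-envelope: how much capacity is freed by raising the
# allowed utilization ceiling across a pool of servers?
def freed_server_equivalents(servers: int, old_ceiling: float, new_ceiling: float) -> float:
    return servers * (new_ceiling - old_ceiling)

for servers in (20, 30):
    freed = freed_server_equivalents(servers, 0.60, 0.90)
    print(f"{servers} servers, 60% -> 90% ceiling: "
          f"{freed:.0f} server-equivalents of extra capacity")

# 20 servers free about 6 server-equivalents and 30 servers free about 9 --
# the "6 to 9 additional servers" figure in the text.
```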
The increase in utilization offers the ability to share the resources that otherwise sat idle in the data center.

INCREASING VM DENSITY

If you need even more resources available to share across the organization, then it's necessary to increase the density of virtual machines within the data center. Instead of a 5:1 VM-to-physical-server ratio you might want to try for 7:1 or 8:1. To do that, you're going to have to tweak out those virtual servers and ensure they are as efficient as possible so you don't compromise application availability or performance. Sounds harder than it is, trust me.

The same technology – unified application delivery – that offloads compute-intensive operations from physical servers can do the same for virtual machines, because what these solutions are really doing in the case of the former is optimizing the application, not the physical server. The offload techniques that provide such huge improvements in the efficiency of servers come from optimizing applications and the network stack, both of which are not tied to the physical hardware but are peculiar to the operating system and/or application or web server on which an application is deployed. By optimizing the heck out of that, the benefits of offload technologies can be applied to all servers: virtual or physical. That means lower utilization of resources on a per-virtual-machine basis, which allows an organization to increase the VM density in their data center and frees up resources across physical servers that can be "shared" by the entire organization.

CHANGE ATTITUDES AND ARCHITECTURES

The hardest thing about sharing resources in a private cloud implementation is going to be changing the attitudes of business stakeholders toward the sharing of resources. IT will have to assure those stakeholders that the sharing of resources will not adversely affect the performance of applications for which those stakeholders are responsible. IT will need to prove to business stakeholders that the resulting architecture may actually lower the cost of deploying new applications in the data center because they'll only be "paying" (at least on paper and in accounting ledgers) for what they actually use rather than what is available.

By sharing compute resources across all business entities in the data center, organizations can, in fact, realize the benefits of cloud computing models that come from sharing of systems. It may take a bit more thought in what solutions are deployed as a foundation for that cloud computing model, but with the right solutions that enable greater efficiencies and higher VM densities, the sharing of resources in a private cloud computing implementation can certainly be achieved.

- What is server offload and why do I need it?
- I am wondering why not all websites enabling this great feature GZIP?
- 3 Really good reasons you should use TCP multiplexing
- SOA & Web 2.0: The Connection Management Challenge
- Green IT: Reduce, Reuse, Recycle

Infrastructure 2.0: The Diseconomy of Scale Virus
The diseconomy of scale so adversely affecting the IP address management space isn't limited to network infrastructure; it's crawling up the stack steadily and infecting all layers of the data center like some kind of unstoppable infrastructure management virus.

"That is why the cost of even the simple act of managing an enterprise network's IP addresses, which is critical to the availability and proper functioning of the network, actually goes up as IP addresses are added. As TCP/IP continues to spread and take productivity to new heights, management costs are already escalating." -- Greg Ness, "What Are the Barriers to Entry and IT Diseconomies?"

Greg does a great job of explaining exactly why the costs of management escalate with each IP address added to the infrastructure which, in cloud computing environments, can be many. What isn't often explained is how that diseconomy of scale at the IP address layer travels upwards quickly to escalate management costs and increase complexity for traditional scaling infrastructure as well.

THE TRADITIONAL SCALING MODEL

Traditional scaling models take advantage of an application delivery controller (load balancer) to horizontally scale applications. In this model, the application and its server (web or application) are replicated a number of times, and the application delivery controller acts as a virtual copy of the application externally, distributing requests across the replicated copies of the server internally. If three applications are being scaled, then there are three virtual servers on the outside, with a set number of application servers on the inside actually serving up the application. So there may be ten physical instances of Application A, ten instances of Application B, and ten of Application C. The number may deviate periodically based on maintenance windows and unplanned outages, but generally speaking the number of instances and the physical servers on which those application instances are deployed stay constant.

In a virtualized or cloud computing model, these same principles of scaling are used, but the servers inside the data center are virtual and dynamic. The three applications in the previous example still require three virtual servers on the application delivery controller, but the number of servers on the inside (in each application pool|farm|cluster) is not static. There may be four servers for Application A while there are ten servers for Application B. At another time there may be seven servers for Application A and only two for Application B. Making the situation even more complex is the fact that not only is there a variable number of application servers in each virtual server's application pool|farm|cluster, those application servers may reside on different physical servers at any given time.

Using traditional scaling technology, each virtual server instance on the application delivery controller would need to be configured with every possible physical server instance on which the application server could be running. If the virtual data center contains thirty physical servers, the resources of which will be shared by those three applications, then each pool|farm|cluster for each application on the application delivery controller must necessarily be configured to contain and monitor each physical server in the infrastructure.
This results in increased configuration and management, and has adverse effects on the network infrastructure, as each virtual server must necessarily ping|query each server in its associated pool|farm|cluster in order to determine available instances of the application it is representing. This means a lot of additional configuration and network traffic as the application delivery controller attempts to manage the applications it is tasked with delivering.

THE INFRASTRUCTURE 2.0 SCALING MODEL

The Infrastructure 2.0 scaling model is based on the traditional model of scaling in its behavior, but extended to be a better fit with the dynamic, elastic nature of emerging data center architecture. What makes the I2.0 model much more efficient and able to scale from a management perspective is the ability of the application delivery controller to be as dynamic as the infrastructure it is supporting. Rather than configuring every instance of a virtual application with every possible physical server, the application delivery controller is notified using standards-based control mechanisms (APIs) when an application is brought on- or off-line, and it automatically configures itself appropriately.

This behavior results in a more efficient architecture, as the application delivery controller need only monitor the application servers actually executing applications at the time it performs its status inquiries, and it reduces the amount of traffic on the network inside the data center. An agile, adaptable application delivery controller also improves efficiency by reducing the number of pings, connections, and queries it must make of application servers, thus reducing the burden on those servers and ensuring that resources are consumed only when truly necessary. Implementing an I2.0 application delivery model requires less rigid control over the IP address space as well, as it is no longer necessary to hardwire such information in the application delivery controller; it can adapt, in real time, and automatically be configured with that information as applications are brought on- and off-line to deal with increases and decreases in capacity. The I2.0 model can be implemented through instrumentation of applications using standards-based APIs, or it can be implemented as a separate, integrated management mechanism that provides additional functionality by taking advantage of that same standards-based API.

As Greg so often notes, the increase in IP addresses due to virtualization and cloud computing can quickly result in escalating management costs and increased complexity in data center architecture. This is also true at the application layer due to the traditionally static nature of networks and load balancing infrastructure. Application delivery solutions are necessarily elastic; they are agile and adaptable infrastructure devices capable of responding in real time to server, network, and application conditions both internal and external to the data center. The use of an application delivery controller to implement a scaling solution for traditional and virtualized environments greatly reduces the burden on servers, on administrators, on the network, and on the client by optimizing, accelerating, and securing the applications it delivers in the most operationally efficient manner possible.
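As a rough illustration of that standards-based notification, here is a sketch of what an application instance might do as it boots: announce itself to the controller's management API so the pool reconfigures itself without manual intervention. The endpoint URL, payload fields and token are entirely hypothetical placeholders, not any real controller's API.

```python
import json
import socket
import urllib.request

# Hypothetical management endpoint on the application delivery controller;
# real controllers expose their own (different) APIs for pool membership.
CONTROLLER_API = "https://adc.example.internal/api/pools/application-a/members"
API_TOKEN = "REPLACE_ME"  # placeholder credential

def register_instance(app_port: int) -> None:
    """Tell the controller this instance is online so it joins the pool."""
    payload = json.dumps({
        "address": socket.gethostbyname(socket.gethostname()),
        "port": app_port,
        "state": "enabled",
    }).encode("utf-8")
    request = urllib.request.Request(
        CONTROLLER_API,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_TOKEN}",
        },
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        print("controller responded:", response.status)

# A matching deregistration call (for example an HTTP DELETE on the same
# member resource) would run during graceful shutdown so the controller
# stops sending traffic to the departing instance.
if __name__ == "__main__":
    register_instance(8080)
```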
The Unpossible Task of Eliminating Risk

An ant named Archimedes is in a hole 6' deep. He climbs half the distance to the top every hour. How long does it take for him to escape the hole? Trick question. He can never, mathematically, escape. Realistically, we know that when Archimedes gets close to the top he will escape, because he is actually longer than the amount of hole he has left to go. But what if every hour that Archimedes climbed, the hole expanded 6" and thus changed the equation? He'd be one frustrated ant, that's what he'd be.

That's how IT security professionals must certainly feel when trying to climb out of the hole that is web application security they're tossed into every day and then told "hurry up, get us out of here!"

Elimination of risk is an impossibility. If elimination were a possibility, then network errors would never occur. At the very core of computing and networking lies this basic fact: bits are either on or off. But are they? By using light and electrical signals to transmit bits we have introduced the risk that a bit will be maybe on or maybe off. Both types of signals can weaken due to distance or fluctuations in power strength, thus degrading the options to black, white, and some shade of gray. This makes interpretation more fuzzy: it's on, off, or somewhere in between.

This is why we always talk in terms of mitigating risk, not eliminating it. Elimination of risk is pretty much a mathematical limit, and the equation changes every day with the introduction of new technology, newly discovered exploits and vulnerabilities, and an increase in the number of "bad guys" out there attempting to slither through your security measures. The ratio of them to you is pretty frightening, and even though you've likely employed a vast array of security technology measures to stop them, you can't eliminate the possibility entirely. You can only mitigate it, and get it as close to zero as possible.

If Archimedes (who was really one of the greatest mathematicians in history and came up with the idea of limits, and not an ant) were an IT security professional today, he'd probably say that you can get close enough that you might as well have eliminated all the risk. But there's a big difference between a polygon being close enough to be a circle and mitigating risk being close enough to eliminating it. For one, Archimedes' job wasn't on the line if a polygon wasn't really a circle, and he wasn't trying to protect the personal, private data of thousands of people.

That's why it's amusing to me when folks rail against web application firewalls. A WAF is another weapon in your arsenal with which you can reduce the risk of a security breach. It's another layer of security that can help prevent a wide variety of attacks, and it has the added benefit of reducing the burden of scanning and inspecting requests on servers so they can perform better and work more efficiently. When you're faced with an impossible task like eliminating risk, why eschew any help you can get? While no technology can get you to zero risk, a WAF can get you closer much faster.

Side note: the etymology of impossible includes "unpossible", most commonly used in the Middle Ages. While now obsolete, sometimes it just sounds cooler than "impossible". But it is really a word.
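For the mathematically inclined, the ant's predicament, and the risk analogy it stands for, can be shown in a few lines: the remaining distance keeps shrinking but, in exact arithmetic, never reaches zero. The six-foot hole and the halving rule come from the riddle above; the number of hours in the loop is arbitrary.

```python
from fractions import Fraction

# Archimedes' climb in exact arithmetic: the remaining distance is halved
# every hour, so it approaches zero as a limit but never actually gets there.
remaining = Fraction(72)  # 6 feet, in inches

for hour in range(1, 11):
    remaining /= 2
    print(f"hour {hour:2d}: {float(remaining):.6f} inches left (still > 0)")

# Risk mitigation behaves the same way: each added layer of security narrows
# the gap, but the gap never becomes exactly zero.
```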