Intro to Load Balancing for Developers – The Algorithms
If you’re new to this series, you can find the complete list of articles in the series on my personal page here.

If you are writing applications to sit behind a Load Balancer, it behooves you to at least have a clue what the algorithm your load balancer uses is about. We’re taking this week’s installment to just chat about the most common algorithms and give a plain-programmer description of how they work. While historically the algorithm chosen is beyond the developers’ control, you’re the one that has to deal with performance problems, so you should know what is happening in the application’s ecosystem, not just in the application. Anything that can slow your application down or introduce errors is something worth having reviewed.

For algorithms supported by the BIG-IP, the text here is a paraphrased/modified version of the help text associated with the Pool Member tab of the BIG-IP UI. If they wrote a good description and all I needed to do was programmer-ize it, then I used it. For algorithms not supported by the BIG-IP, I wrote from scratch. Note that there are many, many more algorithms out there, but as you read through here you’ll see why these (or minor variants of them) are the ones you’ll see the most.

Plain Programmer Description: These are not intended to say anything about the way any particular dev team at F5 or any other company writes these algorithms; they’re just an attempt to put the process into terms that are easier for someone with a programming background to understand. Hopefully a successful attempt.

Interestingly enough, I’ve pared down what BIG-IP supports to a subset. That means that F5 employees and aficionados will be going “But you didn’t mention…!” and non-F5 employees will likely say “But there’s the Chi-Squared Algorithm…!” (no, chi-squared is a theoretical distribution method I know of because it was presented as a proof for testing the randomness of a 20-sided die, ages ago in Dragon Magazine).
The point being that I tried to stick to a group that builds on each other in some connected fashion. So send me hate mail… I’m good. Unless you can say more than 2-5% of the world’s load balancers are running the algorithm, I won’t consider that I missed something important. The point is to give developers and software architects a familiarity with core algorithms, not to build the world’s most complete lexicon of algorithms.

Random: This load balancing method randomly distributes load across the servers available, picking one via random number generation and sending the current connection to it. While it is available on many load balancing products, its usefulness is questionable except where uptime is concerned – and then only if you detect down machines.

Plain Programmer Description: The system builds an array of Servers being load balanced, and uses the random number generator to determine who gets the next connection… Far from an elegant solution, and most often found in large software packages that have thrown load balancing in as a feature.

Round Robin: Round Robin passes each new connection request to the next server in line, eventually distributing connections evenly across the array of machines being load balanced. Round Robin works well in most configurations, but falls short when the equipment that you are load balancing is not roughly equal in processing speed, connection speed, and/or memory.

Plain Programmer Description: The system builds a standard circular queue and walks through it, sending one request to each machine before getting to the start of the queue and doing it again. While I’ve never seen the code (or actual load balancer code for any of these for that matter), we’ve all written this queue with the modulus function before. In school if nowhere else.
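For anyone who hasn't written that queue since school, here is a minimal Python sketch of the two descriptions above. The class names (`RandomBalancer`, `RoundRobinBalancer`), the `next_server` method and the server names are made up for illustration; this is not how any vendor's load balancer is actually written.

```python
import random


class RandomBalancer:
    """Pick any server from the array using a random number generator."""

    def __init__(self, servers, rng=None):
        self.servers = list(servers)
        # A seeded random.Random can be passed in to make runs repeatable.
        self.rng = rng or random.Random()

    def next_server(self):
        return self.servers[self.rng.randrange(len(self.servers))]


class RoundRobinBalancer:
    """Walk a circular queue of servers using the modulus operator."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.position = 0  # index of the server that gets the next connection

    def next_server(self):
        server = self.servers[self.position]
        # Wrap back to the start of the queue after the last server.
        self.position = (self.position + 1) % len(self.servers)
        return server


if __name__ == "__main__":
    rr = RoundRobinBalancer(["web1", "web2", "web3"])
    print([rr.next_server() for _ in range(7)])
```

Neither class knows whether a server is up, which is why the Random description above only helps uptime if something else detects down machines and removes them from the array.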
Weighted Round Robin (called Ratio on the BIG-IP): With this method, the number of connections that each machine receives over time is proportionate to a ratio weight you define for each machine. This is an improvement over Round Robin because you can say “Machine 3 can handle 2x the load of machines 1 and 2”, and the load balancer will send two requests to machine #3 for each request to the others.

Plain Programmer Description: The simplest way to explain this one is that the system makes multiple entries in the Round Robin circular queue for servers with larger ratios. So if you set ratios at 3:2:1:1 for your four servers, that’s what the queue would look like – three entries for the first server, two for the second, one each for the third and fourth. In this version, the weights are set when the load balancing is configured for your application and never change, so the system will just keep looping through that circular queue. Different vendors use different weighting systems – whole numbers, decimals that must total 1.0 (100%), etc. – but this is an implementation detail; they all end up in a circular queue style layout with more entries for larger ratings.

Dynamic Round Robin (called Dynamic Ratio on the BIG-IP): This is similar to Weighted Round Robin; however, weights are based on continuous monitoring of the servers and are therefore continually changing. This is a dynamic load balancing method, distributing connections based on various aspects of real-time server performance analysis, such as the current number of connections per node or the fastest node response time. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: If you think of Weighted Round Robin where the circular queue is rebuilt with new (dynamic) weights whenever it has been fully traversed, you’ll be dead-on.

Fastest: The Fastest method passes a new connection based on the fastest response time of all servers.
This method may be particularly useful in environments where servers are distributed across different logical networks. On the BIG-IP, only servers that are active will be selected.

Plain Programmer Description: The load balancer looks at the response time of each attached server and chooses the one with the best response time. This is pretty straightforward, but can lead to congestion because response time right now won’t necessarily be response time in one or two seconds. Since connections are generally going through the load balancer, this algorithm is a lot easier to implement than you might think, as long as the numbers are kept up to date whenever a response comes through.

For these next three I use the BIG-IP names. They are variants of a generalized algorithm sometimes called Long Term Resource Monitoring.

Least Connections: With this method, the system passes a new connection to the server that has the least number of current connections. Least Connections methods work best in environments where the servers or other equipment you are load balancing have similar capabilities. This is a dynamic load balancing method, distributing connections based on various aspects of real-time server performance analysis, such as the current number of connections per node or the fastest node response time. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: This algorithm just keeps track of the number of connections attached to each server, and selects the one with the smallest number to receive the connection. Like Fastest, this can cause congestion when the connections are all of different durations – like if one is loading a plain HTML page and another is running a JSP with a ton of database lookups. Connection counting just doesn’t account for that scenario very well.
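The ratio queue, Fastest and Least Connections descriptions above all come down to a few lines each, so here is a rough Python sketch of them side by side. Everything named here (`Server`, `RatioBalancer`, `rebuild_each_pass`, `pick_fastest`, `pick_least_connections`) is hypothetical, and the sketch assumes something else keeps the connection counts, response times and weights up to date.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    weight: int = 1           # ratio weight; assumed to be a whole number >= 1
    connections: int = 0      # currently open connections
    response_ms: float = 0.0  # most recently observed response time


def build_ratio_queue(servers):
    """Expand each server into `weight` queue entries, e.g. 3:2:1:1."""
    return [s for s in servers for _ in range(s.weight)]


class RatioBalancer:
    """Round Robin over the expanded queue.

    With rebuild_each_pass=True the queue is rebuilt from the servers'
    current weights each time it has been fully traversed, which is the
    Dynamic Ratio idea. Updating the weights is left to a monitor.
    """

    def __init__(self, servers, rebuild_each_pass=False):
        self.servers = list(servers)
        self.rebuild_each_pass = rebuild_each_pass
        self.queue = build_ratio_queue(self.servers)
        self.position = 0

    def next_server(self):
        if self.position >= len(self.queue):
            # End of the queue: start over, optionally with fresh weights.
            self.position = 0
            if self.rebuild_each_pass:
                self.queue = build_ratio_queue(self.servers)
        server = self.queue[self.position]
        self.position += 1
        return server


def pick_fastest(servers):
    """Fastest: the server with the best (lowest) response time."""
    return min(servers, key=lambda s: s.response_ms)


def pick_least_connections(servers):
    """Least Connections: the server with the fewest open connections."""
    return min(servers, key=lambda s: s.connections)
```

Note that both `pick_` functions only see a snapshot, which is exactly the congestion weakness described above: the numbers say nothing about how long the current connections will last.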
Observed: The Observed method uses a combination of the logic used in the Least Connections and Fastest algorithms to load balance connections to the servers in the pool. With this method, servers are ranked based on a combination of the number of current connections and the response time. Servers that have a better balance of fewest connections and fastest response time receive a greater proportion of the connections. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: This algorithm tries to merge Fastest and Least Connections, which does make it more appealing than either one of the above alone. In this case, an array is built with the information indicated (how weighting is done will vary, and I don’t know even for F5, let alone our competitors), and the element with the highest value is chosen to receive the connection. This somewhat counters the weaknesses of both of the original algorithms, but does not account for when a server is about to be overloaded – like when three requests to that query-heavy JSP have just been submitted, but not yet hit the heavy work.

Predictive: The Predictive method uses the ranking method used by the Observed method; however, with the Predictive method, the system analyzes the trend of the ranking over time, determining whether a server’s performance is currently improving or declining. The servers in the specified pool with better performance rankings that are currently improving, rather than declining, receive a higher proportion of the connections. The Predictive methods work well in any environment. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: This method attempts to fix the one problem with Observed by watching what is happening with the server. If its performance has started to decline (its response time is climbing), it is less likely to receive the connection.
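To make the ranking idea a little more concrete, here is a purely illustrative Python sketch. The scoring formula, the `trend_factor` parameter and the function names are all invented for this example, since the real weightings are not documented here; the only things taken from the descriptions above are "combine connections and response time", "highest value wins", and "look at the trend".

```python
def observed_score(connections, response_ms):
    """Hypothetical ranking: fewer connections and faster responses score higher."""
    # max(..., 1.0) avoids dividing by zero for a server with no samples yet.
    return 1.0 / ((1 + connections) * max(response_ms, 1.0))


def predictive_score(history, trend_factor=0.5):
    """Adjust the latest Observed score by its recent trend.

    `history` is a list of past Observed scores, oldest first. A rising
    score (improving server) is boosted, a falling one is penalised.
    """
    latest = history[-1]
    if len(history) < 2:
        return latest
    trend = latest - history[-2]
    return latest + trend_factor * trend


def pick_highest(scores):
    """Return the server name whose score is highest."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    observed = {
        "web1": observed_score(connections=10, response_ms=20.0),
        "web2": observed_score(connections=2, response_ms=25.0),
    }
    print(pick_highest(observed))
```

With Predictive, two servers that have the same Observed score right now can still rank differently, because the one whose score has been rising gets the nod over the one whose score has been falling.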
Again, no idea what the weightings are, but an array is built and the most desirable is chosen.

You can see with some of these algorithms that persistent connections would cause problems. With Round Robin, for example, if the connections persist to a server for as long as the user session is working, some servers will build a backlog of persistent connections that slow their response time. The Long Term Resource Monitoring algorithms are the best choice if you have a significant number of persistent connections. Fastest works okay in this scenario also if you don’t have access to any of the dynamic solutions.

That’s it for this week. Next week we’ll start talking specifically about Application Delivery Controllers and what they offer – which is a whole lot – that can help your application in a variety of ways. Until then!

Don.

Force Multipliers and Strategic Points of Control Revisited
On occasion I have talked about military force multipliers. These are things like terrain and minefields that can make your force able to do their job much more effectively if utilized correctly. In fact, a study of military history is every bit as much a study of battlefields as it is a study of armies. He who chooses the best terrain generally wins, and he who utilizes tools like minefields effectively often does too. Rommel in the desert often used wadis to hide his dreaded 88mm guns – that at the time could rip through any tank the British fielded. For the last couple of years, we’ve all been inundated with the story of the 300 Spartans that held off an entire army. Of course it was more than just the 300 Spartans in that pass, but they were still massively outnumbered. Over and over again throughout history, it is the terrain and the technology that give a force the edge. Perhaps the first person to notice this trend and certainly the first to write a detailed work on the topic was von Clausewitz. His writing is some of the oldest military theory, and much of it is still relevant today, if you are interested in that type of writing.

For those of us in IT, it is much the same. He who chooses the best architecture and makes the most of available technology wins. In this case, as in a war, winning is temporary and must constantly be revisited, but that is indeed what our job is – keeping the systems in tip-top shape with the resources available. Do you put in the tool that is the absolute best at what it does but requires a zillion man-hours to maintain, or do you put in the tool that covers everything you need and takes almost no time to maintain? The answer to that question is not always as simple as it sounds like it should be. By way of example, which solution would you like your bank to put between your account and hackers? Probably a different one than the one you would like your bank to put in for employee timekeeping.
An 88 in the desert, compliments of WW2inColor

Unlike warfare though, a lot of companies are in the business of making tools for our architecture needs, so we get plenty of options and most spaces have a happy medium. Instead of inserting all the bells and whistles they inserted the bells and made them relatively easy to configure, or they merged products to make your life easier. When the terrain suits a commander’s needs in wartime, the need for such force multipliers as barbed wire and minefields is eliminated because an attacker can be channeled into the desired defenses by terrain features like cliffs and swamps. The same could be said of your network. There are a few places on the network that are Strategic Points of Control, where so much information (incidentally including attackers, though this is not, strictly speaking, a security blog) is funneled through that you can increase your visibility, level of control, and even implement new functionality. We here at F5 like to talk about three of them: between your users and the apps they access, between your systems and the WAN, and between consumers of file services and the providers of those services. These are places where you can gather an enormous amount of information and act upon that information without a lot of staff effort – force multipliers, so to speak.

When a user connects to your systems, the strategic point of control at the edge of your network can perform pre-application-access security checks, route them to a VPN, determine the best of a pool of servers to service their requests, encrypt the stream (on front, back, or both sides), redirect them to a completely different datacenter or an instance of the application they are requesting that actually resides in the cloud… The possibilities are endless.
When a user accesses a file, the strategic point of control between them and the physical storage allows you to direct them to the file no matter where it might be stored, allows you to optimize the file for the pattern of access that is normally present, allows you to apply security checks before the physical file system is ever touched… Again, the list goes on and on. When an application like replication or remote email is accessed over the WAN, the strategic point of control between the app and the actual Internet allows you to encrypt, compress, dedupe, and otherwise optimize the data before putting it out over your bandwidth-limited, publicly exposed WAN connection.

The first strategic point of control listed above gives you control over incoming traffic and early detection of attack attempts. It also gives you force multiplication with load balancing, so your systems are unlikely to get overloaded unless something else is going on. Finally, you get the security of SSL termination or full-stream encryption. The second point of control gives you the ability to balance your storage needs by scripting movement of files between NAS devices or tiers without the user having to see a single change. This means you can do more with less storage, and support for cloud storage providers and cloud storage gateways extends your storage to nearly unlimited space – depending upon your appetite for monthly payments to cloud storage vendors. The third force-multiplies the dollars you are spending on your WAN connection by reducing the traffic going over it, while offloading a ton of work from your servers because encryption happens on the way out the door, not on each VM.

Taking advantage of these strategic points of control – architectural force multipliers – offers you the opportunity to do more with less daily maintenance.
For instance, the point between users and applications can be hooked up to your ADS or LDAP server and be used to authenticate that a user attempting to access internal resources from… say… an iPad… is indeed an employee before they ever get to the application in question. That limits the attack vectors on software that may be highly attractive to attackers. There are plenty more examples of multiplying your impact without increasing staff size or even growing your architectural footprint beyond the initial investment in tools at the strategic point of control.

For F5, we have LTM at the Application Delivery Network Strategic Point of Control. Once that investment is made, a whole raft of options can be tacked on – APM, WOM, WAM, ASM, the list goes on again (tired of that phrase for this blog yet?). Since each resides on LTM, there is only one “bump in the wire”, but a ton of functionality that can be brought to bear, including integration with some of the biggest names in applications – Microsoft, Oracle, IBM, etc. Adding business value like remote access for devices, while multiplying your IT force. I recommend that you check it out if you haven’t; there is definitely a lot to be gained, and it costs you nothing but a little bit of your precious time to look into it.

No matter what you do, looking closely at these strategic points of control and making certain you are using them effectively to meet the needs of your organization is easy and important. The network is not just a way to hook users to machines anymore, so make certain that’s not all you’re using it for. Make the most of the terrain.

And yes, if you also read Lori’s blog, we were indeed watching the same shows, and talking about this concept, so no surprise our blogs are on similar wavelengths.

Related Blogs:
What is a Strategic Point of Control Anyway?
Is Your Application Infrastructure Architecture Based on the ...
F5 Tech Field Day – Intro To F5 As A Strategic Point Of Control
What CIOs Can Learn from the Spartans
What We Learned from Anonymous: DDoS is now 3DoS
What is Network-based Application Virtualization and Why Do You ...
They're Called Black Boxes Not Invisible Boxes
Service Virtualization Helps Localize Impact of Elastic Scalability
F5 Friday: It is now safe to enable File Upload

Load Balancing For Developers: Improving Application Performance With ADCs
If you’ve never heard of my Load Balancing For Developers series, it’s a good idea to start here. There are quite a few installments behind us, and I’m not going to look back in this post any more than I must to make it readable without going back… Meaning there’s much more detail back there than I’ll relate here. Again after a lengthy sojourn covering other points of interest, I return to Load Balancing For Developers with a more holistic view – application performance. Lori has talked a bit about this topic, and I’ve talked about it in the form of Load Balancing benefits and algorithms, but I’d like to look more architecturally again, and talk about those difficult to uncover performance issues that web apps often face. You’re the IT manager for the company’s Zap-n-Go website, it has grown nearly exponentially since launch, and you’re the one responsible for keeping it alive. Lately it’s online, but your users are complaining of sluggishness. Following the advice of some guy on the Internet, you put a load balancer in about a year ago, and things were better, but after you put in a redundant data center and Global Load Balancing services, things started to degrade again. Time to rethink your architecture before your product gets known as Zap-N-Gone… Again. Thus far you have a complete system with multiple servers behind an ADC in your primary data center, and a complete system with multiple servers behind an ADC in your secondary data center. Failover tests work correctly when you shut down the primary web servers, and the database at the remote location is kept up to date with something like Data Guard for Oracle or Merge Replication Services for SQL Server. This meets the business requirement that the remote database is up-to-date except for those transactions in-progress at the moment of loss. 
This makes you highly available, and if your ADCs are running as an HA pair and your Global DNS – like our GTM product – is smart enough to switch when it notices your primary site is down, most users won’t even know they’ve been shoved off to the backup datacenter. The business is happy, you’re sleeping at night, all is well.

Except that slowly, as usage for the site has grown, performance has suffered. What started as a slight lag has turned into a dragging sensation. You’ve put more web servers into the pool of available resources – or better yet, used your management tools (in the ADC and on your servers) to monitor all facets of web server performance – disk and network I/O, CPU and memory utilization. And still, performance lags. Then you check on your WAN connection and database, and find the problem. Either the WAN connection is overloaded, or the database is waiting long periods of time for responses from the secondary datacenter. If you have things configured so that the primary doesn’t wait for acknowledgment from the secondary database, then your problem might be even more sinister – some transactions may never get deposited in the secondary datacenter, causing your databases to be out of sync. And that’s a problem because you need the secondary database to be as up to date as possible, but buying more bandwidth is a monthly overhead expense, and sometimes it doesn’t help – because the problem isn’t always about bandwidth, sometimes it is about latency. In fact, with synchronous real-time replication, it is almost always about latency.

Latency, for those who don’t know, is a combination of how far your connection must travel over the wire and the number of “bumps in the wire” that have been inserted. More precisely, it is not just the number of devices that matters, but the number and their performance. Each device that touches your data – packet inspection, load balancing, security, whatever the reason – adds time to the delivery window. So does traveling over the wires/fiber.
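A quick back-of-the-envelope calculation shows why distance and "bumps in the wire" add up. The Python sketch below is only an illustration: the 2,000 km distance and the per-device delays are made-up numbers, the function names are hypothetical, and it assumes the same devices are crossed in both directions. The one physical constant is that light in fiber covers roughly 200 km per millisecond (about two thirds of its speed in a vacuum).

```python
FIBER_KM_PER_MS = 200.0  # light in fiber covers roughly 200 km per millisecond


def one_way_latency_ms(distance_km, device_delays_ms):
    """Propagation delay over the fiber plus the delay added by each device."""
    propagation = distance_km / FIBER_KM_PER_MS
    return propagation + sum(device_delays_ms)


def round_trip_ms(distance_km, device_delays_ms):
    """Assumes the return path crosses the same distance and devices."""
    return 2 * one_way_latency_ms(distance_km, device_delays_ms)


def max_sync_commits_per_second(rtt_ms):
    """A session that waits for one acknowledgment per commit cannot
    commit faster than one transaction per round trip."""
    return 1000.0 / rtt_ms


if __name__ == "__main__":
    # Hypothetical link: 2,000 km with a firewall, an ADC and a router in path.
    rtt = round_trip_ms(2000, [0.5, 0.2, 1.0])
    print(f"round trip: {rtt:.1f} ms")
    print(f"commits/sec per session: {max_sync_commits_per_second(rtt):.1f}")
```

Under those assumptions no amount of extra bandwidth changes the result, because the limit comes from the round trip, not from how much data fits on the wire.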
Synchronous replication is very time sensitive. If it doesn’t hear back in time, it doesn’t commit the changes, and then the primary and secondary databases don’t match up. So you need to cut down the latency and improve the performance of your WAN link. Conveniently, your ADC can help. Out-of-the-box it should have TCP optimizations that cut down the impact of latency by reducing the number of packets going back and forth over the wire. It may have compression too – which cuts down the amount of data going over the wire, reducing the number of packets required, which improves the “apparent” performance and reduces the amount of data on your WAN connection. It might offer more functionality than that too. And you’ve already paid for an HA pair – putting one in each datacenter – so all you have to do is check what they do “out of the box” for WAN connections, and then call your sales representative to find out what other functionality is available. F5 includes some functionality in our LTM product, and has more in our add-on WAN Optimization Module (WOM) that can be bought and activated on your BIG-IP. Other vendors have a variety of architectures to offer you similar functionality, but of course I work for and write for F5, so my view is that they aren’t as good as our products… Certainly check with your incumbent vendor before looking for other solutions to this problem.

We have seen cases where replication was massively improved with WAN Optimization. More on that in the coming days under a different topic, but just consider that you can increase the speed and reliability of transaction-based replication (and indeed, file/storage replication, but again, that’s another blog), and you as a manager or a developer do not have to do a thing to your code. That implies the other piece – that this method of improvement is applicable to applications that you have purchased and do not own the source code for.
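The compression point made a few sentences back is easy to see with Python's standard `zlib` module. This is only a sketch of the idea, not of what any WAN optimization product does internally: the 1,460-byte payload size per packet is a typical Ethernet figure used here as an assumption, the SQL-looking payload is invented, and the function names are hypothetical.

```python
import math
import zlib

MSS_BYTES = 1460  # assumed TCP payload per packet on a typical Ethernet path


def packets_needed(payload, mss=MSS_BYTES):
    """Number of full or partial packets required to carry the payload."""
    return math.ceil(len(payload) / mss)


def packets_before_and_after(payload):
    """Return (packets for the raw payload, packets after zlib compression)."""
    compressed = zlib.compress(payload)
    return packets_needed(payload), packets_needed(compressed)


if __name__ == "__main__":
    # Made-up, highly repetitive replication traffic.
    batch = b"INSERT INTO orders VALUES (42, 'widget', 19.99);\n" * 1000
    before, after = packets_before_and_after(batch)
    print(f"packets uncompressed: {before}, compressed: {after}")
```

Real traffic will not compress as well as this deliberately repetitive batch, and data that is already compressed or encrypted barely shrinks at all, so treat any ratio you see here as a best case.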
So check it out… At worst you will lose a few hours tracking down your vendor’s options, at best you will be able to go back to sleep at night. And if you’re shifting load between datacenters, as I’ve mentioned before, Long Distance vMotion is improved by these devices too. F5’s architecture for this solution is here – PDF deployment guide. This guide relies upon the WOM functionality mentioned above. And encryption is supported between devices. That means if you are not encrypting your replication, you can start without impacting performance, and if you are encrypting, you can offload the work of encryption to a device designed to handle it. And bandwidth allocation means you can guarantee your replication has enough bandwidth to stay up to date by giving it priority. But you won’t care too much about that, you’ll be relaxing and dreaming of beaches and stock options… Until the next emergency crops up anyway.

Does Cloud Solve or Increase the 'Four Pillars' Problem?
It has long been said – often by this author – that there are four pillars to application performance: Memory, CPU, Network, and Storage. As soon as you resolve one in response to application response times, another becomes the bottleneck, even if you are not hitting that bottleneck yet.

For a bit more detail, they are “memory consumption” – because this impacts swapping in modern Operating Systems. “CPU utilization” – because regardless of OS, there is a magic line after which performance degrades radically. “Network throughput” – because applications have to communicate over the network, and blocking or not (almost all coding for networks today is), the information requested over the network is necessary and will eventually block code from continuing to execute. “Storage” – because IOPS matter when writing/reading to/from disk (or the OS swaps memory out/back in).

These four have long been relatively easy to track. The relationship is pretty easy to spot: when you resolve one problem, one of the others becomes the “most dangerous” to application performance. But historically, you’ve always had access to the hardware. Even in highly virtualized environments, these items could be considered both at the Host and Guest level – because both individual VMs and the entire system matter.

When moving to the cloud, the four pillars become much less manageable. The amount “much less” implies depends a lot upon your cloud provider, and how you define “cloud”. Put in simple terms, if you are suddenly struck blind, that does not change what’s in front of you, only your ability to perceive it. In the PaaS world, you have only the tools the provider offers to measure these things, and are urged not to think of the impact that host machines may have on your app. But they do have an impact. In an IaaS world you have somewhat more insight, but as others have pointed out, less control than in your datacenter.

Picture Courtesy of Stanley Rabinowitz, Math Pro Press.
In the SaaS world, assuming you include that in “cloud”, you have zero control and very little insight. If your app is not performing, you’ll have to talk to the vendor’s staff to (hopefully) get them to resolve issues.

But is the problem any worse in the cloud than in the datacenter? I would have to argue no. Your ability to touch and feel the bits is reduced, but the actual problems are not. In a pureplay public cloud deployment, the performance of an application is heavily dependent upon your vendor, but the top-tier vendors (Amazon springs to mind) can spin up copies as needed to reduce workload. This is not a far cry from one common performance trick used in highly virtualized environments – bring up another VM on another server and add them to load balancing. If the app is poorly designed, the net result is not that you’re buying servers to host instances, it is instead that you’re buying instances directly.

This has implications for IT. The reduced up-front cost of using an inefficient app – no matter which of the four pillars it is inefficient in – means that IT shops are more likely to tolerate inefficiency, even though in the long run the cost of paying monthly may be far more than the cost of purchasing a new server was, simply because the budget pain is reduced.

There are a lot of companies out there offering information about cloud deployments that can help you to see if you feel blind. Fair disclosure: F5 is one of them, and I work for F5. That’s all you’re going to hear on that topic in this blog. While knowing does not always directly correlate to taking action, and there is some information that only the cloud provider could offer you, knowing where performance bottlenecks are does at least give some level of decision-making back to IT staff.
If an application is performing poorly, looking into what appears to be happening (you can tell network bandwidth, VM CPU usage, VM IOPS, etc., but not what’s happening on the physical hardware) can inform decision-making about how to contain the OpEx costs of cloud.

Internal cloud is a much easier play: you still have access to all the information you had before cloud came along, and generally the investigation is similar to that used in a highly virtualized environment. From a performance-troubleshooting perspective, it’s much the same. The key with both virtualization and internal (private) clouds is that you’re aiming for maximum utilization of resources, so you will have to watch for the bottlenecks more closely – you’re “closer to the edge” of performance problems, because you designed it that way.

A comprehensive logging and monitoring environment can go a long way in all cloud and virtualization environments to keeping on top of issues that crop up – particularly in a large datacenter with many apps running. And developer education on how not to be a resource hog is helpful for internally developed apps. For externally developed apps the best you can do is ask for sizing information and then test their assumptions before buying.

Sometimes, cloud simply is the right choice. If network bandwidth is the prime limiting factor, and your organization can accept the perceived security/compliance risks, for example, the cloud is an easy solution – bandwidth in the cloud is either not limited, or limited by your willingness to write a monthly check to cover usage. Either way, it’s not an Internet connection upgrade, which can be dastardly expensive not just at install, but month after month.

Keep rocking it. Get the visibility you need, don’t worry about what you don’t need.

Related Articles and Blogs:
Don MacVittie - Load Balancing For Developers
Advanced Load Balancing For Developers. The Network Dev Tool
Load Balancers for Developers – ADCs Wan Optimization ...
Intro to Load Balancing for Developers – How they work
Intro to Load Balancing for Developers – The Gotchas
Intro to Load Balancing for Developers – The Algorithms
Load Balancing For Developers: Security and TCP Optimizations
Advanced Load Balancers for Developers: ADCs - The Code
Advanced Load Balancing For Developers: Virtual Benefits
Don MacVittie - ADCs for Developers
Devops Proverb: Process Practice Makes Perfect
Devops is Not All About Automation
1024 Words: Why Devops is Hard
Will DevOps Fork?
DevOps. It's in the Culture, Not Tech.
Lori MacVittie - Development and General
Devops: Controlling Application Release Cycles to Avoid the ...
An Aristotlean Approach to Devops and Infrastructure Integration
How to Build a Silo Faster: Not Enough Ops in your Devops

Is That An ACK In Your Packet, Or Are You Just Glad To See Me?
Every once in a while, I like to step back a bit and write for those who haven’t been in the field for a zillion years. For starters, it helps refresh the pool of information out there for people trying to research something they haven’t done before. It helps a lot that I enjoy sharing my knowledge, so writing such a blog is like “non-work”. Since I’m gearing up for some holiday time, this seemed like a great time to do just such an article, so I cast about and TCP optimizations came to mind.

A lot has been written about TCP optimizations; this take will be for the beginner, and will cover them from the Application Delivery Controller (ADC) perspective. With a background in development, IT management, and storage, I had to learn this stuff the hard way, so hopefully this helps some of you skip ahead a few squares in the “IT Learn Something New!” game. My knowledge leans heavily upon F5 gear, specifically BIG-IP LTM, but as usual, I try to stick to features and functionality common to ADCs. At least the big names in ADCs.

One of the very cool bits about an ADC is that most act as a full proxy between the LAN and the WAN. This opens possibilities that would not normally exist in a standard network configuration. The ADC can ack to the server at server speed, while spooling to send to the client at client speed. In many cases, this single possibility helps performance by its mere existence. But there is much more going on in a modern ADC.

If you’re interested in the deep-delve details along with RFC numbers to research, check out Optimizing WAN and LAN Application Performance With TCP Express on F5.com. It’s getting older (well over three years), and is F5-centric, but by including the RFC numbers, the author has left you room to research. For those who aren’t crazy about reading through RFCs, here are some highlights of what you can hope to get out of an ADC. As always, my knowledge is F5-centric, so check with your vendor before assuming they’ve implemented all of these.
All of these are turned on or configurable on a BIG-IP.
Nagle’s Algorithm. This pools data until the receiver has ACK’d what has already been sent. By doing so, it sends fewer packets because it’s packing data while waiting for ACKs. While this can make it appear that latency has increased, it does generally result in fewer packets on the wire.
Dynamic Window Sizing (Including Slow Start). This adjusts the data window size to suit what’s on the other end. By doing so, the client can have one window size and the server another, each optimized to the network conditions it is seeing and the way its TCP stack is optimized. Normally the two would have to negotiate this to be the minimum.
Adaptive Initial Congestion Windows and TCP Slow Start With Congestion Avoidance. These simply change how fast initial Slow Start is handled, so that some connections get to the proper window size quickly.
Bandwidth Delay Control. An automatic calculation of how much data can be put into a link without overloading it.
TCP Congestion Avoidance. A set of standards to avoid and recover from lost packets due to link congestion.
Selective Acknowledgements and Limited and Fast Retransmits. When data is lost, this is a packet-based shorthand for recovery, cutting down the time and retransmits required.
Connection pooling to servers. We draw an imaginary line in the sand and don’t call this a TCP optimization, but it really is – it creates less TCP overhead on your server by putting multiple clients into one connection. Normally the server would open one (or more) connections for each client. An ADC, sitting in the middle, can “pool” these connections into one, saving your server from setting aside resources for each individual client.
What does all of this mean? Well, first off, these are not all of the possibilities. TCP has had a long history, and lots of improvements have been suggested through the RFC system.
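To make the “how much data can be put into a link” idea concrete, here is a minimal Python sketch of the classic bandwidth-delay product arithmetic. It is only an illustration of the general calculation, not how BIG-IP or any other ADC computes it; the function names and the sample link figures (100 Mbit/s, 80 ms round trip, a 65,535-byte window) are assumptions chosen for the example.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bytes that can be 'in flight' on a link before the first ACK can return.

    bandwidth_bps: link speed in bits per second (hypothetical input)
    rtt_ms: round-trip time in milliseconds (hypothetical input)
    """
    if bandwidth_bps < 0 or rtt_ms < 0:
        raise ValueError("bandwidth and RTT must be non-negative")
    # bits/second * seconds = bits; divide by 8 for bytes.
    # Integer math: (bps * ms) / 1000 gives bits, / 8 gives bytes.
    return bandwidth_bps * rtt_ms // 8000


def max_throughput_bps(window_bytes: int, rtt_ms: int) -> int:
    """Upper bound on throughput when only one window may be unacknowledged."""
    if rtt_ms <= 0:
        raise ValueError("RTT must be positive")
    # One window of data per round trip, converted to bits per second.
    return window_bytes * 8 * 1000 // rtt_ms


if __name__ == "__main__":
    bdp = bandwidth_delay_product_bytes(100_000_000, 80)
    print(f"Data needed in flight to fill the link: {bdp} bytes")
    capped = max_throughput_bps(65_535, 80)
    print(f"Throughput ceiling with a 65,535-byte window: {capped} bit/s")
```

With those sample numbers, the link can hold 1,000,000 bytes in flight, while a fixed 65,535-byte window tops out at roughly 6.5 Mbit/s. That gap is why window sizing matters so much for remote users.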
Our engineers will likely grind their teeth that I distilled all of their hard work down to a few bullet points that don’t even cover all the possibilities. But the point is to help you understand why the simple act of putting an ADC into your network can improve application performance. If your server is communicating with the BIG-IP at its maximum speed, and the client is communicating with the BIG-IP at its maximum speed, things seem faster to the end user. Add in the ability to recover quickly on lossy networks, and the more remote the user, the more benefits they’ll see. That’s pretty cool. And it’s free with your ADC. How much of it is free with your ADC, and how well it is implemented is going to be vendor dependent, but much of this stuff has been out there for years, so ask your ADC vendor. I’d be surprised if they told you “yeah, we don’t do Nagle’s algorithm” or “Congestion Avoidance? Congestion helps your packets get tougher, why would we want to avoid it?” A modern ADC is a complex system. While implementing TCP and HTTP optimizations is a natural offshoot of what a load balancer does, it is certainly one of the hallmarks of an ADC that this offshoot has been incorporated into the product. I reiterate that this is simply a starting point. There is lots of good information out there about TCP optimizations (starting with that PDF linked to above), and you can get right to it if you need it. This was just a toe-dip into a very complex world. No doubt I have simplified to the point that some experts will think I’ve over-simplified. If it piqued your interest though, then I did not oversimplify at all. The answer to the title? If you have an ADC in your network, the answer is “Both”. That IS an ACK in your server’s packet, and since its workload is reduced, the server IS glad to see the ADC.
Advanced Load Balancing For Developers. The Network Dev Tool
It has been a while since I wrote an installment of Load Balancing for Developers, and now I think it has been too long, but never fear, this is the granddaddy of Load Balancing for Developers blogs, covering a useful bit of information about Application Delivery Controllers that you might want to take advantage of. For those who have joined us since my last installment, feel free to check out the entire list of blog entries (along with related blog entries) here, though I assure you that this installment, like most of the others, does not require you to have read those that went before. ZapNGo! is still a growing enterprise, now with several dozen complex applications and a high availability architecture that spans datacenters and the cloud. While the organization relies upon its web properties to generate revenue, those properties have been going along fine with your Application Delivery Controller (ADC) architecture. Now though, you’re seeing a need to centralize administration of a whole lot of functions. What worked fine separately for one or two applications is no longer working so well now that you have several development teams and several dozen applications, and you need to find a way to bring the growing inter-relationships under control before maintenance and hidden dependencies swamp you in a cascading mess of disruption. With maintenance taking a growing portion of your application development manhours, and a reasonably well positioned test environment configured with a virtual ADC to mimic your production environment, all you need now is a way to cut those maintenance manhours and reduce the amount of repetitive work required to create or update an application. Particularly update an application, because that is a constant problem, where creating is less frequent. With many of the threats that your ZapNGo application will be known as ZapNGone eliminated, now it is efficiencies you are after. And believe it or not, these too are available in an ADC.
Not all ADC’s are created equal, but this discussion will stay on topics that most ADCs can handle, and I’ll mention it when I stray from generic into specific – which I will do in one case because only one vendor supports one of the tools you can use, but all of the others should be supported by whatever ADC vendor you have, though as always, check with your vendor directly first, since I’m not an expert in the inner workings of every one. There is a lot that many organizations do for themselves, and the array of possibilities is long – from implementing load balancing in source code to security checks in the application, the boundaries of what is expected of developers are shaped by an organization, its history, and its chosen future direction. At ZapNGo, the team has implemented a virtual test environment that as close as possible mirrors production, so that code can be implemented and tested in the way it will be used. They use an ADC for load balancing, so that they don’t have to rewrite the same code over and over, and they have a policy of utilizing a familiar subset of ADC functionality on all applications that face the public. The company is successful and growing, but as always happens in companies in that situation, the pressures upon them are changing just by virtue of their growth. There are more new people who don’t yet have intimate knowledge of the code base, network topology, security policies, whatever their area of expertise is. There are more lines of code to maintain, while new projects are being brought up at a more rapid pace and with higher priorities (I’ve twice lived through the “Everything is high priority? Well this is highest priority!” syndrome while working in IT. Thankfully, most companies grow out of that fast when it’s pointed out that if everything is priority #1, nothing is). 
Timelines to complete projects – be they new development, bug fixes, or enhancements are stretching longer and longer as the percentage of gurus in the company is down and the complexity of the code and the architecture it runs on is up. So what is a development manager to do to increase productivity? Teaming newer developers with people who’ve been around since the beginning is helping, but those seasoned developers are a smaller and smaller percentage of the workforce, while the volume of work has slowly removed them from some of the many products now under management. Adopting coding standards and standardized libraries helps increase experience portability between projects, but doesn’t do enough. Enter offloading to the ADC. Some things just don’t have to be done in code, and if they don’t have to be, at this stage in the company’s growth, IT management at ZapNGo (that’s you!) decides they won’t be. There just isn’t time for non-essential development anymore. Utilizing a policy management tool and/or an Application Firewall on the ADC can improve security without increasing the code base, for example. And that shaves hours off of maintenance projects, while standardizing on one or a few implementations that are simply selected on the ADC. Implementing Web Application Acceleration protocols on the ADC means that less in-code optimization has to occur. Performance is no longer purely the role of developers (but of course it is still a concern. No Web Application Acceleration tool can make a loop that runs for five minutes run faster), they can allow the Web Application Acceleration tool to shrink the amount of data being sent to the users’ browser for you. Utilizing a WAN Optimization ADC tool to improve the performance of bulk copies or backups to a remote datacenter or cloud storage… The list goes on and on. 
The key is that the ADC enables a lot of opportunities for App Dev to be more responsive to the needs of the organization by moving repetitive tasks to the ADC and standardizing them. And a heaping bonus is that it also does that for operations with a different subset of functionality, meaning one toolset gives both App Dev and Operations a bit more time out of their day for servicing important organizational needs. Some would say this is all part of DevOps, some would say it is not. I leave those discussions to others; all I care is that it can make your apps more secure, fast, and available, while cutting down on workload. And if your ADC supports an SSL VPN, your developers can work from home when necessary. Or more likely, if your code is your IP, a subset of your developers can. Making ZapNGo more responsive, easier to maintain, and more adaptable to the changes coming next week/month/year. That’s what ADCs do. And they’re pretty darned good at it. That brings us to the one bit that I have to caveat with F5 only, and that is iApps. An iApp is a constructed configuration tool that asks a few questions and then deploys all the bits necessary to set up an ADC for a particular application. Why do I mention it here? Well if you have dozens of applications with similar characteristics, you can create an iApp Template and use it to rapidly bring new applications or new instances of applications online. And since it is abstracted, these iApp templates can be designed such that AppDev, or even the business owner, is able to operate them, meaning less time worrying about what network resources will be available, how they’re configured, and waiting for operations to have time to implement them (in an advanced ADC that is being utilized to its maximum in a complex application environment, this can be hundreds of networking objects to configure – all encapsulated into a form). Less time on the project timeline, more time for the next project.
Or for the post deployment party. One of the two. That’s it for the F5 only bit. And knowing that all of these items are standardized means fewer things to get mis-configured, more surety that it will all work right the first time. As with all of these articles, that offers you the most important benefit… A good night’s sleep.
Load Balancing For Developers: Security and TCP Optimizations
It has been a while since I wrote a Load Balancing for Developers installment, and since they’re pretty popular and there’s still a lot about Application Delivery Controllers (ADCs) that are taken for granted in the Networking industry but relatively unknown in the development world, I thought I’d throw one out about making your security more resilient with ADCs. For those who are just joining this series, here’s the full list of posts I’ve tagged as Load Balancing for Developers, though only the ones whose title starts with “Load Balancing for Developers” or “Advance Load Balancing for Developers” were actually written from this perspective, utilizing our fictional web application Zap’N’Go! as an example. This post, like most of them, doesn’t require that you read the other entries in the “Load Balancers for Developers” series, but if you’re interested in the topic, they are all written from the developer’s perspective, and only bring in the networking/ops portions where it makes sense. So your organization has a truly successful web application called Zap’N’Go! That has taken the Internet by storm. Your hits are in the thousands an hour, and orders are rolling in. All was going well until your server couldn’t keep up and you went to a load balanced scenario so that multiple servers could share the load. The problem is that with the money you’ve generated off of Zap’N’Go, you’ve bought a competitor and started several new web applications, set up a forum or portal for your customers to communicate with you and each other directly, and are using the old datacenter from the company you purchased as a redundant datacenter in case the worst should happen. And all of that means that you are suffering server (and VM) sprawl. The CPU cycles being eaten up by your applications are truly astounding, and you’re looking into ways to drive them down. 
Virtualization helped you to be more agile in responding to the requests of the business, but also brings a lot of management overhead in making certain servers aren’t overloaded with too high a virtual density. One of the cool bits about an ADC is that they do a lot more than load balance, and much of that can be utilized to improve application performance without re-architecting the entire system. While there are a lot of ways that an ADC can improve application performance, we’ll look at a couple of easy ones here, and leave some of the more difficult or involved ones for another time. That keeps me in writing topics, and makes certain that I can give each one the attention it deserves in the space available. The biggest and most obvious improvement in an ADC is of course load balancing. This blog assumes you already have an ADC in place, and load balancing was your primary reason for purchasing it. While I don’t have market numbers in front of me, it is my experience that this is true of the vast majority of ADC customers. If you have overburdened web applications and have not looked into load balancing, before you go rewriting your entire system, take a look at the rest of this series. There really are options out there to help. After that win, I think the biggest place – in a virtualized environment – that developers can reap benefits from an ADC is one that developers wouldn’t normally think of. That’s the reason for this series, so I suppose that would be a good thing. Nearly every application out there hits a point where SSL is enabled. That point may be simply the act of accessing it, or it may be when they go to the “shopping cart” section of the web site, but they all use SSL to protect sensitive user data being passed over the Internet. As a developer, you don’t have to care too much about this fact. Pay attention to the protocol if you’re writing at that level and to the ports if you have reason to, but beyond that you don’t have to care. 
Networking takes care of all of that for you. But what if you could put a request in to your networking group that would greatly improve performance without changing a thing in your code and from a security perspective wouldn’t change much – most companies would see it as not changing anything, while a few will want to talk about it first? What if you could make this change over lunch and users wouldn’t know the difference? Here’s the background. SSL Encryption is expensive in terms of CPU cycles. No doubt you know that, most developers have to face this issue head-on at some point. It takes a lot of power to do encryption, and while commodity hardware is now fast enough that it isn’t a problem on a stand-alone server, in a VM environment, the number of applications requesting SSL encryption on the same physical hardware is many times what it once was. That creates a burden that, at this time at least, often drags on the hardware. It’s not the fault of any one application or a rogue programmer, it is the summation of the burdens placed by each application requiring SSL translation. One solution to this problem is to try and manage VM deployment such that encryption is only required on a couple of applications per physical server, but this is not a very appealing long-term solution as loads shift and priorities change. From a developers’ point of view, do you trust the systems/network teams to guarantee your application is not sharing hardware with a zillion applications that all require SSL encryption? Over time, this is not going to be their number one priority, and when performance troubles crop up, the first place that everyone looks in an in-house developed app is at the development team. We could argue whether that’s the right starting point or not, but it certainly is where we start. Another, more generic solution is to take advantage of a non-development feature of your ADC. This feature is SSL termination. 
Since the ADC sits between your application and the Internet, you can tell your ADC to handle encryption for your application, and then not worry about it again. If your network team sets this up for all of your applications, then you have no worries that SSL is burning up your CPU cycles behind your back. Is there a negative? A minor one that most organizations (as noted above) just won’t see as an issue. That is that from the ADC to your application, communications will happen in the clear. If your application is internal, this really isn’t a big deal at all. If you suspect a bad guy on your internal network, you have much more to worry about than whether communications between two boxes are in the clear. If your application is in the cloud, this concern is more realistic, but in that case, SSL termination is limited in usefulness anyway because you can’t know if the other apps on the same hardware are utilizing it. So you simply flick a switch on your ADC to turn on SSL termination, and then turn it off on your applications, and you have what the ADC industry calls “SSL offload”. If your ADC is purpose-built hardware (like our BIG-IP), then there is encryption hardware in the box and you don’t have to worry about the impact to the ADC of overloading it with SSL requests; it’s built to handle the load. If your ADC is software or a VM (like our BIG-IP LTM VE), then you’ll have to do a bit of testing to see what the tolerance level for SSL load is on the hardware you deployed it on – but you can ask the network staff to worry about all of that, once you’ve started the conversation. Is this the only security-based performance boost you can get? No, but it is the easy one. Everything on the Internet remains encrypted, but your application is not burdening the server’s CPU with encryption requests each time communications in or out occur. The other easy one is TCP optimizations. This one requires less talk because it is completely out of the realm of the developer.
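Coming back to SSL offload for a moment, there is one small place where it can touch your code: once the ADC terminates SSL, your application sees plain HTTP, so anything that asks “was this request encrypted?” (secure cookies, redirects to https) needs another way to find out. A common convention is for the proxy to add an X-Forwarded-Proto header. The Python sketch below assumes your ADC has been configured to insert that header and to strip any copy sent by the client; neither is guaranteed by default, and the function names are mine, purely for illustration.

```python
from typing import Mapping


def original_scheme(headers: Mapping[str, str], default: str = "http") -> str:
    """Return the scheme the client used when talking to the ADC.

    Assumes the ADC inserts an X-Forwarded-Proto header (a de facto
    convention, not something every deployment does).
    """
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "x-forwarded-proto":
            # A chain of proxies may produce "https, http"; the first entry
            # describes the hop closest to the client.
            first = value.split(",")[0].strip().lower()
            if first in ("http", "https"):
                return first
    return default


def was_encrypted(headers: Mapping[str, str]) -> bool:
    """True if the client-facing leg of the connection used SSL/TLS."""
    return original_scheme(headers) == "https"


if __name__ == "__main__":
    offloaded = {"Host": "zapngo.example", "X-Forwarded-Proto": "https"}
    print(was_encrypted(offloaded))          # request arrived over SSL at the ADC
    print(was_encrypted({"Host": "zapngo.example"}))  # no header, treated as plain HTTP
```

Only trust this header when the application is reachable solely through the ADC; otherwise a client could set it directly.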
Simply put, TCP is a well designed protocol that sometimes gets bogged down communicating and has a lot of overhead in those situations. Turning on TCP optimizations in your ADC can reduce the overhead – more or less, depending upon what is on the other end of the communications network – and improve perceived performance, which honestly is one of the most important measures of web application availability. By making it seem to load faster, you’ve improved your customer experience, and nothing about your development has to change. TCP optimizations are not new, and thus the ones that are turned on when you activate the option on most ADCs are stable and won’t disrupt most applications. Of course you should run a short test cycle with them enabled, just to be certain, but I would be surprised if you saw any issues. They’re not unheard of, but they are very rare. That’s enough for now, I think. I don’t want these to get so long that you wander off to develop some more. Keep doing what you do. And strive to keep your users from doing this.
[Image: Slow apps anger users]
Gotta Catch Em All. Multiple bottlenecks are a part of the IT lifestyle
My older children, like most kids in their age group, all played with or collected Pokemon cards. Just like I and all of my friends had GI Joes and discussed the strengths and weaknesses of Kung-fu grip versus hard hands, they and all of their friends sat around talking about how much cooler their current favorite Pokemon card was compared to all of the others. We let them play and kept an eye on how cards were being passed about the group (they’re small and tend to walk off, so we patrolled a bit, but otherwise stayed out of the way). And the interesting thing about Pokemon – or any other Collectible Card Game – is that as soon as you’ve settled your discussion about which card is “best”, someone picks a new favorite so you can rehash all the same issues with this new card in the mix. People – mostly but not exclusively children – honestly spend hours at this pastime, and every time they resolve the differences, it starts all over again. The point of Pokemon is to catch and train little creatures (build a deck of cards) that will, on your command, battle other little creatures (the other players’ card deck) for supremacy. But that’s often lost in the discussions of which individual card or small combinations of cards is “best”. Everyone has their favorites and a focused direction, so these conversations can grow quite heated. It is no mistake that I’m discussing Pokemon in an IT blog. Our role is to support the business with applications that will allow them to do their job, or do their job better, or do things the competition can’t do. That’s why we’re here. But everyone in IT has a focus and direction – Developer, Architect, Network Admin, Systems Admin, Storage Admin, Business Analyst… The list goes on – and sometimes our conversations about how to best serve the business get quite heated.
More importantly, sometimes the point of IT – to support the business – gets lost in examining the minutiae, just like comparing two Pokemon cards when there are hundreds of cards to build decks from. There are a few – like Charizard pictured here – that are special until they’re superseded by even cooler cards. But a lot of what we do is written in stone, and is easily lost in the shuffle. Just as no one champions the basic “energy” cards in Pokemon – because they don’t DO anything by themselves – we often don’t discuss some of the basic issues IT always has and always will struggle with, because they’re known, set in stone, and should be self-evident. Or at least we think they should. So I’ll remind you of one of the basics, and perhaps that will spur you to keep the simple stuff in mind whilst arguing over the coolest new toy in the datacenter.
[Image: Charizard card, courtesy of Pokebeach.com]
The item I’ve chosen? There is never one bottleneck. It is a truth. If you find and eliminate the performance bottleneck of your application, you have not resolved all problems, you have simply removed a roadblock on the way to the next bottleneck. A system that ran fine last week may not be running fine this week because a new bottleneck threshold has been hit. And the bottlenecks are always, always inter-related. (Warning – of course I reference F5 products in this list, if you have other vendors, insert their names) Consider this: your web app is having performance problems, and you track it down to your network card utilization. So you upgrade the server or throw it behind your BIG-IP (or other ADC or a load balancer), and the problem is resolved. So now your network card utilization is fine, but the application’s performance degrades again relatively quickly. You go researching and discover that your new bottleneck is storage. Too many high-access files on a single NAS device is slowing down simple file reads and writes.
So you move your web servers to use a different NAS device (downright simple if you have ARX in-house, not too terribly difficult if you don’t), and a couple of weeks later users are complaining again. You dig and research, and all seems well to you, but there are enough complaints that you are pretty certain there’s a problem. So you call up a coworker in a remote office and have them check. They say performance stinks. So you go home that night and try it from home, and sure enough, outside the building performance stinks. Inside, it’s fine. Now your problem is your Internet connection. So you check the statistics, and back-end services like replication are burying your Internet connection. So you do some research and decide that your problems are best addressed by reducing the bandwidth required for those back-end processes and setting guaranteed bandwidth numbers for HTTP traffic. Enter WAN Optimization. If you’re an F5 customer, you just add WOM to your BIG-IP and configure it. Other vendors have a few more steps, but not terribly more than if you were not an F5 customer and bought BIG-IP with WOM to solve this problem. And once all of that clears up, guess what? We’re back to Pikachu. Your two servers, now completely cleared of other bottlenecks, are servicing so many requests that their CPU utilization is spiking. Time for a third server. Now this whole story sounds simple, but it isn’t. Network, Storage, Systems, all fall under the bailiwick of different groups within IT. It is never so easy as the above paragraph makes it sound… I’ve glossed over the long nights, the endless status meetings, the frustration of not finding the bottleneck right away – mine are obvious only because I list them, I skip the part where you check fifty other things first. And inevitably, there is the discussion of what’s the right solution to a given problem that starts to sound like people who discuss the “best” Pokemon card.
Someone wants to cut back on the amount of bandwidth back-office applications use by turning off services, someone wants to buy a bigger pipe, someone suggests WAN optimization, and we go a few rounds until we settle on a plan that’s best for the organization in question. But in the end, keeping the business going and customers happy is the key to IT. Sure, clearing up one bottleneck will create another and spawn another round of “right solution” discussions, but that’s the point. It’s why you’re there. You have the skills and the expertise the company needs to keep moving forward, and this is how they’re applied. And along the way you’ll get to find the new hot toy in the datacenter and propose it as the right solution to everything, because it is your Charizard – until the next round of discussion anyway. And admit it, this stuff is fun, just like the game. Choosing the right solution, getting it implemented, that’s what drives all good IT people. Figuring out problems that are complex enough to be called rocket science under pressure that is sometimes oppressive. But the rush is there when the solution is in and is right. And it’s often a team effort by all of the different groups in IT. I personally think IT should throw itself more parties, but I guess we’ll just have to settle for more dinner-at-the-desk moments for the time being.
DevOps. It’s in the Culture, Not Tech.
#F5 DevOps – Managers need to make use of existing technology and adopt culture change.
It is entertaining to read all that is currently being written about DevOps. Having been a developer, a development manager, an operations manager, and even a CTO, I can attest to the fact that the “throw it over the wall” syndrome is real, and causes real problems for everyone involved. That is about where my agreement with the current round of pundits ends. The thing is that they talk like there is some fundamental technological reason why DevOps isn’t happening. That’s just not true. For those a little behind in your jargon, DevOps is making operations prevalent in the decisions of your development organization. We’ll take the discussion a little bit at a time. We have virtualization. We have astoundingly good virtualization from VMware, Microsoft, Red Hat, et al. So many of the concerns about development and ops go away immediately. “Developers test in the environment their tools run in!” is oft-heard in the DevOps conversations. But that’s just not an issue. They can run three different OS’s to test with, and as many browsers in each OS as you want to support – all with a single set of hardware. So make testing in your operational environment mandatory. In fact, for most tools, they’re developing on whatever OS is being targeted anyway, because there are subtle differences – or no support at all – in other OS’s. Virtualization taken hand-in-hand with the capabilities of an Application Delivery Controller (ADC) like F5’s BIG-IP can also remove some of the “throw it over the wall” symptoms easily – particularly in Agile or other high-rev environments. Take a copy of the VM running the app today, modify that copy, place it on the network, then, utilizing one of the many algorithms available for load-balancing, switch a select load to the new server.
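Steering a select share of load to a new server usually comes down to two ingredients: a lower weight for the new revision, and persistence so a given user keeps landing on the same server. Here is a minimal Python sketch of that idea. It is not how BIG-IP or any other ADC implements its ratio or persistence features; the class name, server names, and weights are hypothetical, chosen so the new revision is one third as likely to be picked as each existing server.

```python
import random
from typing import Dict, Optional


class StickyWeightedBalancer:
    """Weighted random server selection with per-client persistence."""

    def __init__(self, weights: Dict[str, float], seed: Optional[int] = None):
        self._names = list(weights)
        self._weights = [weights[name] for name in self._names]
        self._rng = random.Random(seed)          # seedable for repeatable tests
        self._assigned: Dict[str, str] = {}      # client id -> chosen server

    def route(self, client_id: str) -> str:
        # Persistence: a client that was already assigned a server keeps it,
        # so users do not bounce between revisions of the app.
        if client_id not in self._assigned:
            choice = self._rng.choices(self._names, weights=self._weights, k=1)[0]
            self._assigned[client_id] = choice
        return self._assigned[client_id]


if __name__ == "__main__":
    # Hypothetical pool: two servers on the current code, one on the new code.
    balancer = StickyWeightedBalancer(
        {"app-v1-a": 3, "app-v1-b": 3, "app-v2-new": 1}, seed=42
    )
    hits = {"app-v1-a": 0, "app-v1-b": 0, "app-v2-new": 0}
    for n in range(7000):
        hits[balancer.route(f"client-{n}")] += 1
    print(hits)  # the new server should receive roughly one seventh of clients
```

If the new revision misbehaves, dropping its weight to zero for new clients (and clearing its assignments) is the code equivalent of taking the copy back down.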
Make it 1/3rd as likely as any other server to receive a given connection, and utilize persistence so users aren’t bouncing back and forth between revisions of the app. See how it performs under real-life usage. Have the Devs there for the first half hour, then make a “deploy/don’t deploy” decision, and if not deploying, take down the copy with the new code on it so the devs can continue to work on it. If deploying, then bring up copies of the new server and bleed connections off of the old ones, then bring those virtuals down and archive or destroy them, as your organization sees fit. Most of the issues with DevOps are gone at that point. Testing is the only other issue that comes up a lot, and let’s talk frankly about testing. There just aren’t enough really good test tools out there. A test tool needs to automate as close to 100% of the testing process as possible in order for thorough testing to be feasible. The complexity of today’s applications leaves most test tools in the dust. When Microsoft had MS-Test available, one of the shops I worked at used it, and one of my friends was an expert at generating test scripts, but we were a software shop. Our product was the software, making it much more critical that there be as close to zero defects as possible. That’s not the case if your product is not software. In those cases, software is a supporting tool to help sell product, and as such the occasional bug, while regrettable and to be avoided, doesn’t reflect upon your entire product line. So there are some tools out there to do testing. They’ll help determine things like buffer overflows and such, and Microsoft is selling Microsoft Test Center, which looks to be a more comprehensive solution, but no program knows your business – and thus the testing context – in the manner that your employees do, and most companies are not willing to shell out the money required to start a dedicated testing group.
If you are one of the lucky ones, well you can replicate your entire production environment with VMs and an ADC… But you can’t get the volumes you would see in a live public-facing Internet application with actual personal interaction and all the dumb mistakes we users make. So you can test, after a fashion, and with real people involved even set up intelligent testing, but it will take some effort. The best bet for most shops is to make unit testing part of the developers’ jobs, and system testing part of the project managers’ job, with help from both dev and ops. See, Devops! It is a testament to the quality of tools and developers out there that we don’t see a ton of issues all the time. Think if the Web had the percentage of problems on a daily basis that Windows Apps did early on… We’d never get anything done. So don’t take DevOps so seriously from a technological standpoint, instead, let’s talk about culture. Developers need to feel like part of the larger team if you want them to worry about DevOps. Here’s the catch, most of them don’t want to feel like part of the larger team. Network issues and storage shortages annoy them as much as your business users. They want to write cool apps and shove them out the door (or the Internet connection, more precisely) without worrying about deployment issues too much. It is up to management to take concrete steps to move dev closer to ops. To do this, first mandate that all testing occur on the OS that deployment will occur on. This shouldn’t be a big deal for most shops, but in a few it will be. Second, mandate that the dev team put resources on the deployment, and utilize the Virtualization/ADC scenario discussed above. Third, remind developers that their app isn’t great unless users think it is, and it is deployable. They’ll come around, but it will take some work on your part. 
The days of developing in a vacuum are long gone (at least in the enterprise), and the disconnect between dev and ops is one of the few remaining artifacts of that time. You just have to remove the artifacts and hook up the strengths of your ops team with the leet app skills on your dev team, and the whole DevOps problem is significantly reduced.

One last bit: troubleshooting when things do go wrong. Developers can build a lot into their apps to give them hints about status, but ops can use tools like iApps (built into TMOS v11 on F5 BIG-IP LTM) to show that it is indeed the application not responding in a timely manner, or indeed to show exactly what IS the bottleneck in a deployment. The reporting functionality on iApps makes them a worthwhile endeavor even without the “ease of infrastructure management” they offer. iApp reporting can tell you exactly which piece of the application environment is slow, dragging the entire system response time down. I think that’s huge. If you have a BIG-IP, check it out; I’m pretty certain you will too.

So call a meeting. Make it around lunch time and announce that pizza will be provided. Both teams will show up, and you can start changing culture. The benefits will be long-term, with applications better suiting users’ needs and requiring fewer operations man-hours. And devs will get a better feel for what works and what doesn’t in your environment.

As Network Speeds Increase, Focus Shifts
Someone said something interesting to me the other day, and they’re right: “At 10 Gig WAN connections with compression turned on, you’re not likely to fill the pipe; the key is to make certain you’re not the bottleneck.” (“The other day” is relative. I’ve been sitting on this post for a while.) I saw this happen when 1 Gig LANs came about. Applications at the time were hard pressed to actually use up a Gigabit of bandwidth, so the focus became how slow the server and application were, whether the backplane on the switch was big enough to handle all that was plugged into it, and so on. After this had gone on for a while, server hardware became so fast that we threw application performance under the bus in most enterprises. And then those applications were running on the WAN, where we didn’t have really fast connections, and we started looking at optimizing those connections in lieu of optimizing the entire application.

But there is only so much that an application developer can do to speed network communications. Most of the work of network communications is out of their hands, and all they control is the amount of data they send over the pipe. Even then, if persistence is being maintained, how much data they send may be dictated by the needs of the application. And if you are one of those organizations where databases are communicating over your WAN connection, that is completely outside the control of application developers. So the speed bottleneck became the WAN. For every problem in high tech there is a purchasable solution, though, and several companies (including F5) offer solutions for both WAN Acceleration and Application Acceleration.
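To put rough numbers on the “compression turned on” point, here is a small Python sketch. It is only a generic illustration using the standard `zlib` module, not a description of how any WAN optimization or acceleration product works internally. The sample payload, the link speed, and the helper names `compress_stats` and `transfer_seconds` are assumptions made up for this example, and the timing ignores latency and protocol overhead entirely.

```python
import zlib


def compress_stats(payload: bytes, level: int = 6) -> dict:
    """Compress a payload with zlib and report sizes before and after."""
    compressed = zlib.compress(payload, level)
    return {
        "original_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(payload),
    }


def transfer_seconds(num_bytes: int, link_bits_per_second: float) -> float:
    """Idealized time on the wire: payload bits divided by link rate."""
    return num_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # Hypothetical, highly repetitive payload, the kind of markup that
    # tends to compress well. Real traffic will behave differently.
    payload = b"<tr><td>order</td><td>12345</td></tr>\n" * 10_000
    stats = compress_stats(payload)
    link = 10e9  # an assumed 10 Gigabit per second WAN link

    print("original bytes:  ", stats["original_bytes"])
    print("compressed bytes:", stats["compressed_bytes"])
    print("seconds uncompressed:", transfer_seconds(stats["original_bytes"], link))
    print("seconds compressed:  ", transfer_seconds(stats["compressed_bytes"], link))
```

How much you actually save depends entirely on the data: already-compressed content such as images or video will barely shrink, which is one reason measuring your own traffic matters more than any rule of thumb.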
The cool thing about solutions like BIG-IP WebAccelerator, EDGE Gateway, and WOM is that they speed application performance (WebAccelerator for web-based applications and WOM for more back-end applications or remote offices) while reducing the amount of data being sent over the wire, without requiring work on the part of developers. As I’ve said before: if developers can focus on solving the business problems at hand and not the technical issues that sit in the background, they are more productive.

Now that WAN connections are growing again, you would think we would be poised to shift the focus back to some other piece of the huge performance puzzle, but this stuff doesn’t happen in a vacuum, and there are other pressures growing on your WAN connection that keep the focus squarely on how much data it can pass. Those pressures are multi-core, virtualization and cloud. Multi-core increases the CPU cycles available to applications. To keep up, server vendors have been putting more NICs in every server, increasing potential traffic on both the LAN and the WAN. With virtualization we have a ton more applications running on the network, and the comparative ease with which they can be brought online implies this trend will continue. Cloud not only does the same thing, but puts the instances on a remote network that requires trips back to your datacenter for integration and database access (yeah, there are exceptions; I would argue not many).

All of these trends mean that the size of your pipe out to the world is not only important, but because it is a monthly expense, its use must be maximized. By putting in both WAN Optimization and Web Application Acceleration, you stand a chance of keeping your pipe from growing to the size of the Alaska pipeline, and that means savings for you on a monthly basis. You’ll also see that improved performance that is so elusive. Never mind that as soon as one bottleneck is cleared another will crop up; that comes with the territory.
By clearing this one you’ll have improved performance until you hit the next plateau, and you can then focus on settling it, secure in the knowledge that the WAN is not the bottleneck. And with most technologies, certainly with those offered by F5, you’ll have the graphs and data to show that the WAN link isn’t the bottleneck. Meanwhile, your developers will be busy solving business problems, and all of those cores won’t go to waste.

Photo of caribou walking alongside the pipeline, taken July 1998 by Stan Shebs