load balancing

186 Topics

Devops Proverb: Process Practice Makes Perfect
#devops Tools for automating – and optimizing – processes are a must-have for enabling continuous delivery of application deployments Some idioms are cross-cultural and cross-temporal. They transcend cultures and time, remaining relevant no matter where or when they are spoken. These idioms are often referred to as proverbs, which carries with it a sense of enduring wisdom. One such idiom, “practice makes perfect”, can be found in just about every culture in some form. In Chinese, for example, the idiom is apparently properly read as “familiarity through doing creates high proficiency”, i.e. practice makes perfect. This is a central tenet of devops, particularly where optimization of operational processes is concerned. The more often you execute a process, the more likely you are to get better at it and discover what activities (steps) within that process may need tweaking or changes or improvements. Ergo, optimization. This tenet grows out of the agile methodology adopted by devops: application release cycles should be nearly continuous, with both developers and operations iterating over the same process – develop, test, deploy – with a high level of frequency. Eventually (one hopes) we achieve process perfection – or at least what we might call process perfection: repeatable, consistent deployment success. It is implied that in order to achieve this many processes will be automated, once we have discovered and defined them in such a way as to enable them to be automated. But how does one automate a process such as an application release cycle? Business Process Management (BPM) works well for automating business workflows; such systems include adapters and plug-ins that allow communication between systems as well as people. But these systems are not designed for operations; there are no web servers or databases or Load balancer adapters for even the most widely adopted BPM systems. One such solution can be found in Electric Cloud with its recently announced ElectricDeploy. Process Automation for Operations ElectricDeploy is built upon a more well known product from Electric Cloud (well, more well-known in developer circles, at least) known as ElectricCommander, a build-test-deploy application deployment system. Its interface presents applications in terms of tiers – but extends beyond the traditional three-tiers associated with development to include infrastructure services such as – you guessed it – load balancers (yes, including BIG-IP) and virtual infrastructure. The view enables operators to create the tiers appropriate to applications and then orchestrate deployment processes through fairly predictable phases – test, QA, pre-production and production. What’s hawesome about the tools is the ability to control the process – to rollback, to restore, and even debug. The debugging capabilities enable operators to stop at specified tasks in order to examine output from systems, check log files, etc..to ensure the process is executing properly. While it’s not able to perform “step into” debugging (stepping into the configuration of the load balancer, for example, and manually executing line by line changes) it can perform what developers know as “step over” debugging, which means you can step through a process at the highest layer and pause at break points, but you can’t yet dive into the actual task. Still, the ability to pause an executing process and examine output, as well as rollback or restore specific process versions (yes, it versions the processes as well, just as you’d expect) would certainly be a boon to operations in the quest to adopt tools and methodologies from development that can aid them in improving time and consistency of deployments. The tool also enables operations to determine what is failure during a deployment. For example, you may want to stop and rollback the deployment when a server fails to launch if your deployment only comprises 2 or 3 servers, but when it comprises 1000s it may be acceptable that a few fail to launch. Success and failure of individual tasks as well as the overall process are defined by the organization and allow for flexibility. This is more than just automation, it’s managed automation; it’s agile in action; it’s focusing on the processes, not the plumbing. MANUAL still RULES Electric Cloud recently (June 2012) conducted a survey on the “state of application deployments today” and found some not unexpected but still frustrating results including that 75% of application deployments are still performed manually or with little to no automation. While automation may not be the goal of devops, but it is a tool enabling operations to achieve its goals and thus it should be more broadly considered as standard operating procedure to automate as much of the deployment process as possible. This is particularly true when operations fully adopts not only the premise of devops but the conclusion resulting from its agile roots. Tighter, faster, more frequent release cycles necessarily puts an additional burden on operations to execute the same processes over and over again. Trying to manually accomplish this may be setting operations up for failure and leave operations focused more on simply going through the motions and getting the application into production successfully than on streamlining and optimizing the processes they are executing. Electric Cloud’s ElectricDeploy is one of the ways in which process optimization can be achieved, and justifies its purchase by operations by promising to enable better control over application deployment processes across development and infrastructure. Devops is a Verb 1024 Words: The Devops Butterfly Effect Devops is Not All About Automation Application Security is a Stack Capacity in the Cloud: Concurrency versus Connections Ecosystems are Always in Flux The Pythagorean Theorem of Operational Risk
Lori_MacVittie
Jun 03, 2023 Place Technical Articles
374Views
0likes
1Comment
Intro to Load Balancing for Developers – The Algorithms
If you’re new to this series, you can find the complete list of articles in the series on my personal page here If you are writing applications to sit behind a Load Balancer, it behooves you to at least have a clue what the algorithm your load balancer uses is about. We’re taking this week’s installment to just chat about the most common algorithms and give a plain- programmer description of how they work. While historically the algorithm chosen is both beyond the developers’ control, you’re the one that has to deal with performance problems, so you should know what is happening in the application’s ecosystem, not just in the application. Anything that can slow your application down or introduce errors is something worth having reviewed. For algorithms supported by the BIG-IP, the text here is paraphrased/modified versions of the help text associated with the Pool Member tab of the BIG-IP UI. If they wrote a good description and all I needed to do was programmer-ize it, then I used it. For algorithms not supported by the BIG-IP I wrote from scratch. Note that there are many, many more algorithms out there, but as you read through here you’ll see why these (or minor variants of them) are the ones you’ll see the most. Plain Programmer Description: Is not intended to say anything about the way any particular dev team at F5 or any other company writes these algorithms, they’re just an attempt to put the process into terms that are easier for someone with a programming background to understand. Hopefully a successful attempt. Interestingly enough, I’ve pared down what BIG-IP supports to a subset. That means that F5 employees and aficionados will be going “But you didn’t mention…!” and non-F5 employees will likely say “But there’s the Chi-Squared Algorithm…!” (no, chi-squared is theoretical distribution method I know of because it was presented as a proof for testing the randomness of a 20 sided die, ages ago in Dragon Magazine). The point being that I tried to stick to a group that builds on each other in some connected fashion. So send me hate mail… I’m good. Unless you can say more than 2-5% of the world’s load balancers are running the algorithm, I won’t consider that I missed something important. The point is to give developers and software architects a familiarity with core algorithms, not to build the worlds most complete lexicon of algorithms. Random: This load balancing method randomly distributes load across the servers available, picking one via random number generation and sending the current connection to it. While it is available on many load balancing products, its usefulness is questionable except where uptime is concerned – and then only if you detect down machines. Plain Programmer Description: The system builds an array of Servers being load balanced, and uses the random number generator to determine who gets the next connection… Far from an elegant solution, and most often found in large software packages that have thrown load balancing in as a feature. Round Robin: Round Robin passes each new connection request to the next server in line, eventually distributing connections evenly across the array of machines being load balanced. Round Robin works well in most configurations, but could be better if the equipment that you are load balancing is not roughly equal in processing speed, connection speed, and/or memory. Plain Programmer Description: The system builds a standard circular queue and walks through it, sending one request to each machine before getting to the start of the queue and doing it again. While I’ve never seen the code (or actual load balancer code for any of these for that matter), we’ve all written this queue with the modulus function before. In school if nowhere else. Weighted Round Robin (called Ratio on the BIG-IP): With this method, the number of connections that each machine receives over time is proportionate to a ratio weight you define for each machine. This is an improvement over Round Robin because you can say “Machine 3 can handle 2x the load of machines 1 and 2”, and the load balancer will send two requests to machine #3 for each request to the others. Plain Programmer Description: The simplest way to explain for this one is that the system makes multiple entries in the Round Robin circular queue for servers with larger ratios. So if you set ratios at 3:2:1:1 for your four servers, that’s what the queue would look like – 3 entries for the first server, two for the second, one each for the third and fourth. In this version, the weights are set when the load balancing is configured for your application and never change, so the system will just keep looping through that circular queue. Different vendors use different weighting systems – whole numbers, decimals that must total 1.0 (100%), etc. but this is an implementation detail, they all end up in a circular queue style layout with more entries for larger ratings. Dynamic Round Robin (Called Dynamic Ratio on the BIG-IP): is similar to Weighted Round Robin, however, weights are based on continuous monitoring of the servers and are therefore continually changing. This is a dynamic load balancing method, distributing connections based on various aspects of real-time server performance analysis, such as the current number of connections per node or the fastest node response time. This Application Delivery Controller method is rarely available in a simple load balancer. Plain Programmer Description: If you think of Weighted Round Robin where the circular queue is rebuilt with new (dynamic) weights whenever it has been fully traversed, you’ll be dead-on. Fastest: The Fastest method passes a new connection based on the fastest response time of all servers. This method may be particularly useful in environments where servers are distributed across different logical networks. On the BIG-IP, only servers that are active will be selected. Plain Programmer Description: The load balancer looks at the response time of each attached server and chooses the one with the best response time. This is pretty straight-forward, but can lead to congestion because response time right now won’t necessarily be response time in 1 second or two seconds. Since connections are generally going through the load balancer, this algorithm is a lot easier to implement than you might think, as long as the numbers are kept up to date whenever a response comes through. These next three I use the BIG-IP name for. They are variants of a generalized algorithm sometimes called Long Term Resource Monitoring. Least Connections: With this method, the system passes a new connection to the server that has the least number of current connections. Least Connections methods work best in environments where the servers or other equipment you are load balancing have similar capabilities. This is a dynamic load balancing method, distributing connections based on various aspects of real-time server performance analysis, such as the current number of connections per node or the fastest node response time. This Application Delivery Controller method is rarely available in a simple load balancer. Plain Programmer Description: This algorithm just keeps track of the number of connections attached to each server, and selects the one with the smallest number to receive the connection. Like fastest, this can cause congestion when the connections are all of different durations – like if one is loading a plain HTML page and another is running a JSP with a ton of database lookups. Connection counting just doesn’t account for that scenario very well. Observed: The Observed method uses a combination of the logic used in the Least Connections and Fastest algorithms to load balance connections to servers being load-balanced. With this method, servers are ranked based on a combination of the number of current connections and the response time. Servers that have a better balance of fewest connections and fastest response time receive a greater proportion of the connections. This Application Delivery Controller method is rarely available in a simple load balancer. Plain Programmer Description: This algorithm tries to merge Fastest and Least Connections, which does make it more appealing than either one of the above than alone. In this case, an array is built with the information indicated (how weighting is done will vary, and I don’t know even for F5, let alone our competitors), and the element with the highest value is chosen to receive the connection. This somewhat counters the weaknesses of both of the original algorithms, but does not account for when a server is about to be overloaded – like when three requests to that query-heavy JSP have just been submitted, but not yet hit the heavy work. Predictive: The Predictive method uses the ranking method used by the Observed method, however, with the Predictive method, the system analyzes the trend of the ranking over time, determining whether a servers performance is currently improving or declining. The servers in the specified pool with better performance rankings that are currently improving, rather than declining, receive a higher proportion of the connections. The Predictive methods work well in any environment. This Application Delivery Controller method is rarely available in a simple load balancer. Plain Programmer Description: This method attempts to fix the one problem with Observed by watching what is happening with the server. If its response time has started going down, it is less likely to receive the packet. Again, no idea what the weightings are, but an array is built and the most desirable is chosen. You can see with some of these algorithms that persistent connections would cause problems. Like Round Robin, if the connections persist to a server for as long as the user session is working, some servers will build a backlog of persistent connections that slow their response time. The Long Term Resource Monitoring algorithms are the best choice if you have a significant number of persistent connections. Fastest works okay in this scenario also if you don’t have access to any of the dynamic solutions. That’s it for this week, next week we’ll start talking specifically about Application Delivery Controllers and what they offer – which is a whole lot – that can help your application in a variety of ways. Until then! Don.
Don_MacVittie_1
Mar 22, 2023 Place Technical Articles
22KViews
1like
9Comments
Why you still need layer 7 persistence
Tony Bourke of the Load Balancing Digest points out that mega proxies are largely dead. Very true. He then wonders whether layer 7 persistence is really all that important today, as it was largely implemented to solve the problems associated with mega-proxies - that is, large numbers of users coming from the same IP address. Layer 7 persistence is still applicable to situations where you may have multiple users coming from a single IP address (such as a small client base coming from a handful of offices, with each office using on public IP address), but I wonder what doing Layer 4 persistence would do to a major site these days. I’m thinking, not much. I'm going to say that layer 4 persistence would likely break a major site today. Layer 7 persistence is even more relevant today than it has been in the past for one very good reason: session coherence. Session coherence may not have the performance and availability ramifications of the mega-proxy problem, but it is essential to ensure that applications in a load-balanced environment work correctly. Where's F5? VMWorld Sept 15-18 in Las Vegas Storage Decisions Sept 23-24 in New York Networld IT Roadmap Sept 23 in Dallas Oracle Open World Sept 21-25 in San Francisco Storage Networking World Oct 13-16 in Dallas Storage Expo 2008 UK Oct 15-16 in London Storage Networking World Oct 27-29 in Frankfurt SESSION COHERENCE Layer 7 persistence is still heavily used in applications that are session sensitive. The most common example is shopping carts stored in the application server session, but it also increasingly important to Web 2.0 and interactive applications where state is important. Sessions are used to store that state and therefore Layer 7 persistence becomes important to maintaining that state in a load-balanced environment. It's common to see layer 7 persistence driven by JSESSIONID or PHPSESSIONID header variables today. It's a question we see in the forums here on DevCentral quite often. Many applications are rolled out, and then inserted into a load balanced environment, and subsequently break because sessions aren't shared across web application servers and the client isn't always routed to the same server or they come back "later" (after the connections have timed out) and expect the application to continue where they left it. If they aren't load balanced back to the same server, the session data isn't accessible and the application breaks. Application server sessions generally persist for hours as opposed to the minutes or seconds allowed for a TCP connection. Layer 4 (TCP) persistence can't adequately address this problem. Source port and IP address aren't always enough to ensure routing to the correct server because it doesn't persist once the connection is closed, and multiple requests coming from the same browser use multiple connections now, each with a different source port. That means two requests on the same page may not be load balanced to the same server, even though they both may require access to the application session data. These sites and applications are used for hours, often with long periods of time between requests, which means connections have often long timed out. Could layer 4 persistence work? Probably, but only if the time-out on these connections were set unreasonably high, which would consume a lot more resources on the load balancer and reduce its capacity significantly. And let's not forget SaaS (Software as a Service) sites like salesforce.com, where rather than mega-proxy issues cropping up we'd have lots-of-little-proxy issues cropping up as businesses still (thanks to IPv4 and the need to monitor Internet use) employ forward proxies. And SSL, too, is highly dependent upon header data to ensure persistence today. I agree with Tony's assessment that the mega proxy problem is largely a non-issue today, but session coherence is taking its place a one of the best reasons to implement layer 7 persistence over layer 4 persistence.
Lori_MacVittie
Jul 19, 2018 Place Technical Articles
497Views
0likes
1Comment
Load Balancing Fu: Beware the Algorithm and Sticky Sessions
The choice of load balancing algorithms can directly impact – for good or ill – the performance, behavior and capacity of applications. Beware making incompatible choices in architecture and algorithms. One of the most persistent issues encountered when deploying applications in scalable architectures involves sessions and the need for persistence-based (a.k.a. sticky) load balancing services to maintain state for the duration of an end-user’s session. It is common enough that even the rudimentary load balancing services offered by cloud computing providers such as Amazon include the option to enable persistence-based load balancing. While the use of persistence addresses the problem of maintaining session state, it introduces other operational issues that must also be addressed to ensure consistent operational behavior of load balancing services. In particular, the use of the Round Robin load balancing algorithm in conjunction with persistence-based load balancing should be discouraged if not outright disallowed. ROUND ROBIN + PERSISTENCE –> POTENTIALLY UNEQUAL DISTRIBUTION of LOAD When scaling applications there are two primary concerns: concurrent user capacity and performance. These two concerns are interrelated in that as capacity is consumed, performance degrades. This is particularly true of applications storing state as each request requires that the application server perform a lookup to retrieve the user session. The more sessions stored, the longer it takes to find and retrieve the session. The exactly efficiency of such lookups is determined by the underlying storage data structure and algorithm used to search the structure for the appropriate session. If you remember your undergraduate classes in data structures and computing Big (O) you’ll remember that some structures scale more efficiently in terms of performance than do others. The general rule of thumb, however, is that the more data stored, the longer the lookup. Only the amount of degradation is variable based on the efficiency of the algorithms used. Therefore, the more sessions in use on an application server instance, the poorer the performance. This is one of the reasons you want to choose a load balancing algorithm that evenly distributes load across all instances and ultimately why lots of little web servers scaled out offer better performance than a few, scaled up web servers. Now, when you apply persistence to the load balancing equation it essentially interrupts the normal operation of the algorithm, ignoring it. That’s the way it’s supposed to work: the algorithm essentially applies only to requests until a server-side session (state) is established and thereafter (when the session has been created) you want the end-user to interact with the same server to ensure consistent and expected application behavior. For example, consider this solution note for BIG-IP. Note that this is true of all load balancing services: A persistence profile allows a returning client to connect directly to the server to which it last connected. In some cases, assigning a persistence profile to a virtual server can create the appearance that the BIG-IP system is incorrectly distributing more requests to a particular server. However, when you enable a persistence profile for a virtual server, a returning client is allowed to bypass the load balancing method and connect directly to the pool member. As a result, the traffic load across pool members may be uneven, especially if the persistence profile is configured with a high timeout value. -- Causes of Uneven Traffic Distribution Across BIG-IP Pool Members So far so good. The problem with round robin- – and reason I’m picking on Round Robin specifically - is that round robin is pretty, well, dumb in its decision making. It doesn’t factor anything into its decision regarding which instance gets the next request. It’s as simple as “next in line", period. Depending on the number of users and at what point a session is created, this can lead to scenarios in which the majority of sessions are created on just a few instances. The result is a couple of overwhelmed instances (with performance degradations commensurate with the reduction in available resources) and a bunch of barely touched instances. The smaller the pool of instances, the more likely it is that a small number of servers will be disproportionately burdened. Again, lots of little (virtual) web servers scales out more evenly and efficiently than a few big (virtual) web servers. Assuming a pool of similarly-capable instances (RAM and CPU about equal on all) there are other load balancing algorithms that should be considered more appropriate for use in conjunction with persistence-based load balancing configurations. Least connections should provide better distribution, although the assumption that an active connection is equivalent to the number of sessions currently in memory on the application server could prove to be incorrect at some point, leading to the same situation as would be the case with the choice of round robin. It is still a better option, but not an infallible one. Fastest response time is likely a better indicator of capacity as we know that responses times increase along with resource consumption, thus a faster responding instance is likely (but not guaranteed) to have more capacity available. Again, this algorithm in conjunction with persistence is not a panacea. Better options for a load balancing algorithm include those that are application aware; that is, algorithms that can factor into the decision making process the current load on the application instance and thus direct requests toward less burdened instances, resulting in a more even distribution of load across available instances. NON-ALGORITHMIC SOLUTIONS There are also non-algorithmic, i.e. architectural, solutions that can address this issue. DIVIDE and CONQUER In cloud computing environments, where it is less likely to find available algorithms other than industry standard (none of which are application-aware), it may be necessary to approach the problem with a divide and conquer strategy, i.e. lots of little servers. Rather than choosing one or two “large” instances, choose to scale out with four or five “small” instances, thus providing a better (but not guaranteed) statistical chance of load being distributed more evenly across instances. FLANKING STRATEGY If the option is available, an architectural “flanking” strategy that leverages layer 7 load balancing, a.k.a. content/application switching, will also provide better consumptive rates as well as more consistent performance. An architectural strategy of this sort is in line with sharding practices at the data layer in that it separates out by some attribute different kinds of content and serves that content from separate pools. Thus, image or other static content may come from one pool of resources while session-oriented, process intensive dynamic content may come from another pool. This allows different strategies – and algorithms – to be used simultaneously without sacrificing the notion of a single point of entry through which all users interact on the client-side. Regardless of how you choose to address the potential impact on capacity, it is important to recognize the intimate relationship between infrastructure services and applications. A more integrated architectural approach to application delivery can result in a much more efficient and better performing application. Understanding the relationship between delivery services and application performance and capacity can also help improve on operational costs, especially in cloud computing environments that constrain the choices of load balancing algorithms. As always, test early and test often and test under high load if you want to be assured that the load balancing algorithm is suitable to meet your operational and business requirements. WILS: Why Does Load Balancing Improve Application Performance? Load Balancing in a Cloud Infrastructure Scalability Pattern: Sharding Sessions Infrastructure Scalability Pattern: Partition by Function or Type It’s 2am: Do You Know What Algorithm Your Load Balancer is Using? Lots of Little Virtual Web Applications Scale Out Better than Scaling Up Sessions, Sessions Everywhere Choosing a Load Balancing Algorithm Requires DevOps Fu Amazon Makes the Cloud Sticky To Boldly Go Where No Production Application Has Gone Before Cloud Testing: The Next Generation
Lori_MacVittie
May 31, 2018 Place Technical Articles
2.6KViews
0likes
1Comment
Load Balancing versus Application Routing
As the lines between DevOps and NetOps continue to blur thanks to the highly distributed models of modern application architectures, there rises a need to understand the difference between load balancing and application routing. These are not the same thing, even though they might be provided by the same service. Load balancing is designed to provide availability through horizontal scale. To scale an application, a load balancer distributes requests across a pool (farm, cluster, whatevs) of duplicated applications (or services). The decision on which pool member gets to respond to a request is based on an algorithm. That algorithm can be quite apathetic as to whether or the chosen pool member is capable of responding or it can be “smart” about its decision, factoring in response times, current load, and even weighting decisions based on all of the above. This is the most basic load balancing pattern in existence. It’s been the foundation for availability (scale and failover) since 1996. Load balancing of this kind is what we often (fondly) refer to as “dumb”. That’s because it’s almost always based on TCP (layer 4 of the OSI stack). Like honey badger, it don’t care about the application (or its protocols) at all. All it worries about is receiving a TCP connection request and matching it up with one of the members in the appropriate pool. It’s not necessarily efficient, but gosh darn it, it works and it works well. Systems have progressed to the point that purpose-built software designed to do nothing but load balancing can manage millions of connections simultaneously. It’s really quite amazing if you’re at all aware that back in the early 2000s most systems could only handle on the order of thousands of simultaneous requests. Now, application routing is something altogether different. First, it requires the system to care about the application and its protocols. That’s because in order to route an application request, the target must first be identified. This identification can be as simple as “what’s the host name” to something as complicated as “what’s the value of an element hidden somewhere in the payload in the form of a JSON key:value pair or XML element.” In between lies the most common application identifier – the URI. Application “routes” can be deduced from the URI by examining its path and extracting certain pieces. This is akin to routing in Express (one of the more popular node.js API frameworks). A URI path in the form of: /user/profile/xxxxx – where xxxxx is an actual user name or account number – can be split apart and used to “route” the request to a specific pool for load balancing or to a designated member (application/service instance). This happens at the “virtual server” construct of the load balancer using some sort of policy or code. Application routing occurs before the load balancing decision. In effect, application routing enables a single load balancer to distribute requests intelligently across multiple applications or services. If you consider modern microservices-based applications combined with APIs (URIs representing specific requests) you can see how this type of functionality becomes useful. An API can be represented as a single domain (api.example.com) to the client, but behind the scenes it is actually comprised of multiple applications or services that are scaled individually using a combination of application routing and load balancing. One of the reasons (aside from my pedantic nature) to understand the difference between application routing and load balancing is that the two are not interchangeable. Routing makes a decision on where to forward something – a packet, an application request, an approval in your business workflow. Load balancing distributes something (packets, requests, approval) across a set of resources designed to process that something. You really can’t (shouldn’t) substitute one for the other. But what it also means is that you have freedom to mix and match how these two interact with one another. You can, for example, use plain old load balancing (POLB) for ingress load balancing and then use application routing (layer 7) to distribute requests (inside a container cluster, perhaps). You can also switch that around and use application routing for ingress traffic, distributing it via POLB inside the application architecture. Load balancing and application routing can be layered, as well, to achieve specific goals with respect to availability and scale. I prefer to use application routing at the ingress because it enables greater variety and granularity in implementing both operational and application architectures more supportive of modern deployment patterns. The decision on where to use POLB vs application routing is largely based on application architecture and requirements. Scale can be achieved with both, though with differing levels of efficacy. That discussion is beyond the scope of today’s post, but there are trade-offs. It cannot be said often enough that the key to scaling applications today is about architectures, not algorithms. Understanding the differences of application routing and load balancing should provide a solid basis for designing highly scalable architectures.
Lori_MacVittie
Nov 25, 2017 Place Technical Articles
2.3KViews
0likes
5Comments
The Challenges of SQL Load Balancing
#infosec #iam load balancing databases is fraught with many operational and business challenges. While cloud computing has brought to the forefront of our attention the ability to scale through duplication, i.e. horizontal scaling or “scale out” strategies, this strategy tends to run into challenges the deeper into the application architecture you go. Working well at the web and application tiers, a duplicative strategy tends to fall on its face when applied to the database tier. Concerns over consistency abound, with many simply choosing to throw out the concept of consistency and adopting instead an “eventually consistent” stance in which it is assumed that data in a distributed database system will eventually become consistent and cause minimal disruption to application and business processes. Some argue that eventual consistency is not “good enough” and cite additional concerns with respect to the failure of such strategies to adequately address failures. Thus there are a number of vendors, open source groups, and pundits who spend time attempting to address both components. The result is database load balancing solutions. For the most part such solutions are effective. They leverage master-slave deployments – typically used to address failure and which can automatically replicate data between instances (with varying levels of success when distributed across the Internet) – and attempt to intelligently distribute SQL-bound queries across two or more database systems. The most successful of these architectures is the read-write separation strategy, in which all SQL transactions deemed “read-only” are routed to one database while all “write” focused transactions are distributed to another. Such foundational separation allows for higher-layer architectures to be implemented, such as geographic based read distribution, in which read-only transactions are further distributed by geographically dispersed database instances, all of which act ultimately as “slaves” to the single, master database which processes all write-focused transactions. This results in an eventually consistent architecture, but one which manages to mitigate the disruptive aspects of eventually consistent architectures by ensuring the most important transactions – write operations – are, in fact, consistent. Even so, there are issues, particularly with respect to security. MEDIATION inside the APPLICATION TIERS Generally speaking mediating solutions are a good thing – when they’re external to the application infrastructure itself, i.e. the traditional three tiers of an application. The problem with mediation inside the application tiers, particularly at the data layer, is the same for infrastructure as it is for software solutions: credential management. See, databases maintain their own set of users, roles, and permissions. Even as applications have been able to move toward a more shared set of identity stores, databases have not. This is in part due to the nature of data security and the need for granular permission structures down to the cell, in some cases, and including transactional security that allows some to update, delete, or insert while others may be granted a different subset of permissions. But more difficult to overcome is the tight-coupling of identity to connection for databases. With web protocols like HTTP, identity is carried along at the protocol level. This means it can be transient across connections because it is often stuffed into an HTTP header via a cookie or stored server-side in a session – again, not tied to connection but to identifying information. At the database layer, identity is tightly-coupled to the connection. The connection itself carries along the credentials with which it was opened. This gives rise to problems for mediating solutions. Not just load balancers but software solutions such as ESB (enterprise service bus) and EII (enterprise information integration) styled solutions. Any device or software which attempts to aggregate database access for any purpose eventually runs into the same problem: credential management. This is particularly challenging for load balancing when applied to databases. LOAD BALANCING SQL To understand the challenges with load balancing SQL you need to remember that there are essentially two models of load balancing: transport and application layer. At the transport layer, i.e. TCP, connections are only temporarily managed by the load balancing device. The initial connection is “caught” by the Load balancer and a decision is made based on transport layer variables where it should be directed. Thereafter, for the most part, there is no interaction at the load balancer with the connection, other than to forward it on to the previously selected node. At the application layer the load balancing device terminates the connection and interacts with every exchange. This affords the load balancing device the opportunity to inspect the actual data or application layer protocol metadata in order to determine where the request should be sent. Load balancing SQL at the transport layer is less problematic than at the application layer, yet it is at the application layer that the most value is derived from database load balancing implementations. That’s because it is at the application layer where distribution based on “read” or “write” operations can be made. But to accomplish this requires that the SQL be inline, that is that the SQL being executed is actually included in the code and then executed via a connection to the database. If your application uses stored procedures, then this method will not work for you. It is important to note that many packaged enterprise applications rely upon stored procedures, and are thus not able to leverage load balancing as a scaling option. Depending on your app or how your organization has agreed to protect your data will determine which of these methods are used to access your databases. The use of inline SQL affords the developer greater freedom at the cost of security, increased programming(to prevent the inherent security risks), difficulty in optimizing data and indices to adapt to changes in volume of data, and deployment burdens. However there is lively debate on the values of both access methods and how to overcome the inherent risks. The OWASP group has identified the injection attacks as the easiest exploitation with the most damaging impact. This also requires that the load balancing service parse MySQL or T-SQL (the Microsoft Transact Structured Query Language). Databases, of course, are designed to parse these string-based commands and are optimized to do so. Load balancing services are generally not designed to parse these languages and depending on the implementation of their underlying parsing capabilities, may actually incur significant performance penalties to do so. Regardless of those issues, still there are an increasing number of organizations who view SQL load balancing as a means to achieve a more scalable data tier. Which brings us back to the challenge of managing credentials. MANAGING CREDENTIALS Many solutions attempt to address the issue of credential management by simply duplicating credentials locally; that is, they create a local identity store that can be used to authenticate requests against the database. Ostensibly the credentials match those in the database (or identity store used by the database such as can be configured for MSSQL) and are kept in sync. This obviously poses an operational challenge similar to that of any distributed system: synchronization and replication. Such processes are not easily (if at all) automated, and rarely is the same level of security and permissions available on the local identity store as are available in the database. What you generally end up with is a very loose “allow/deny” set of permissions on the load balancing device that actually open the door for exploitation as well as caching of credentials that can lead to unauthorized access to the data source. This also leads to potential security risks from attempting to apply some of the same optimization techniques to SQL connections as is offered by application delivery solutions for TCP connections. For example, TCP multiplexing (sharing connections) is a common means of reusing web and application server connections to reduce latency (by eliminating the overhead associated with opening and closing TCP connections). Similar techniques at the database layer have been used by application servers for many years; connection pooling is not uncommon and is essentially duplicated at the application delivery tier through features like SQL multiplexing. Both connection pooling and SQL multiplexing incur security risks, as shared connections require shared credentials. So either every access to the database uses the same credentials (a significant negative when considering the loss of an audit trail) or we return to managing duplicate sets of credentials – one set at the application delivery tier and another at the database, which as noted earlier incurs additional management and security risks. YOU CAN’T WIN FOR LOSING Ultimately the decision to load balance SQL must be a combination of business and operational requirements. Many organizations successfully leverage load balancing of SQL as a means to achieve very high scale. Generally speaking the resulting solutions – such as those often touted by e-Bay - are based on sound architectural principles such as sharding and are designed as a strategic solution, not a tactical response to operational failures and they rarely involve inspection of inline SQL commands. Rather they are based on the ability to discern which database should be accessed given the function being invoked or type of data being accessed and then use a traditional database connection to connect to the appropriate database. This does not preclude the use of application delivery solutions as part of such an architecture, but rather indicates a need to collaborate across the various application delivery and infrastructure tiers to determine a strategy most likely to maintain high-availability, scalability, and security across the entire architecture. Load balancing SQL can be an effective means of addressing database scalability, but it should be approached with an eye toward its potential impact on security and operational management. What are the pros and cons to keeping SQL in Stored Procs versus Code Mission Impossible: Stateful Cloud Failover Infrastructure Scalability Pattern: Sharding Streams The Real News is Not that Facebook Serves Up 1 Trillion Pages a Month… SQL injection – past, present and future True DDoS Stories: SSL Connection Flood Why Layer 7 Load Balancing Doesn’t Suck Web App Performance: Think 1990s.
Lori_MacVittie
Aug 10, 2017 Place Technical Articles
2.6KViews
0likes
1Comment
8 things you can do with a proxy
After having recently discussed all the different kinds of proxies that exist, it occurred to me that it might be nice to provide some examples of what you can do with proxies besides the obvious web filtering scenario. This is by no means an exhaustive list, but is provided to show some of the more common (and cool, I think) uses of proxies. What's really awesome is that while some of these uses are available with only one type of proxy (reverse or forward), a full proxy can provide all these uses, and more, in a single, unified application delivery platform. 1. DATA SCRUBBING Data scrubbing is the process of removing sensitive information like credit card and social security numbers from web application responses. This is particularly useful in preventing data leaks, especially if you're subject to regulations like SOX, HIPPA, and PCI DSS where the penalties for divulging personally identifiable information can be harsh fines - or worse. Data scrubbing is is an implementation of a reverse proxy. 2. URL REWRITING Rewriting URLs is something everyone has likely had to do at one time or another if they've developed a web application. URL rewriting is used to refer web requests to new resources instead of sending out a redirect response in cases where resources have moved, renamed, or migrated to a new version. URL rewriting is an implementation of a reverse proxy. 3. LAYER 7 SWITCHING Layer 7 switching provides an organization with the ability to maximize their IP address space as well as architect a more efficient, better performing application architecture. Layer 7 switching routes specific web requests to different servers based on information in the application layer, like HTTP headers or application data. Layer 7 switching is an implementation of a reverse proxy. 4. CONTENT FILTERING The most common use of proxies is content filtering. Generally, content filtering allows or rejects requests for content based on organizational policies regarding content type, the existence of specific keywords, or based on the site itself. Content filtering is an implementation of a forward proxy. 5. REDIRECTION Redirection is the process of, well, redirecting a browser to a new resource. This could be a new instance of a requested resource or as part of application logic such as redirecting a failed login to the proper page. Redirection is generally implemented by a reverse proxy, but can also be implemented by a forward proxy as a means of redirecting rejected requests to an explanation page. 6. LOAD BALANCING Load balancing is one of the most common uses of a reverse proxy. Load balancing distributes requests for resources across a number of servers in order to provide scalability and availability services. Load balancing is an implementation of a reverse proxy. 7. APPLICATION FIREWALL An application firewall provides a number of functions including some in this list (data scrubbing and redirection). An application firewall sits in front of web applications and inspects requests for malicious content and attempts to circumvent security. An application firewall is an implementation of a reverse proxy. 8. PROTOCOL SECURITY Protocol security is the ability of a proxy to enforce protocol specifications on requests and responses in order to provide additional security at all layers of the OSI stack. Protocol security provides an additional layer of security atop traditional security mechanisms that focus on data. Protocol security is an implementation of a reverse proxy.
Lori_MacVittie
Oct 01, 2015 Place Technical Articles
1.7KViews
0likes
1Comment
Does Your Cloud Quiesce? It Should.
#cloud #sdn Without the ability to gracefully shutdown the "contraction" side of elasticity may be problematic Quiescence, in a nutshell, is your mom telling you to "finish what you're doing but don't start anything new, we're getting ready to go". It's an integral capability of load balancers (of enterprise-class load balancers, at least) that enables the graceful shutdown of application instances for a variety of purposes (patches, scheduled maintenance, etc... ). In more modern architectures this capability forms the foundation for non-disruptive (and thus live) migration of virtual machines.During the process the VM is moved and launched in location Y, the load balancer continues to send requests to the same VM in location X. Once the VM is available in location Y, the load balancer will no longer send new requests to location X but will continue to manage existing connections until they are complete. Cloud bursting, too, is enabled by the ability of a load balancer to quiesce connections at a global layer (virtual pattern) and at the local layer (bridged pattern). Load balancers must be able to support a "finish what's been started but don't start anything new" mode of operation on any given application The inability of a load balancing service to quiesce connections impacts not only the ability to implement specific architectural patterns, but it can seriously impact elasticity. The IMPACT on ELASTICITY Scaling out is easy, especially in the cloud. Add another instance of the application to the load balancing service and voila! Instant capacity. But scaling back, that's another story. You can't just stop the instance when load contracts, because, well, any existing connections relying on that instance will simply vanish. It's disruptive, in a very negative way, and can have a real impact on revenue (what happened to my order?) as well as productivity (that was three hours of work lost, OMERGERD). Scaling back requires careful collaboration between the load balancing service and the management framework to ensure that the process is graceful, that is, non-disruptive. It is unacceptable to business and operational stakeholders to simply "cut off" connections that may in the middle of executing a transaction or performing critical business functions. The problem is that the load balancing service must be imbued with enough intelligence to discern that there is a state between "up" and "down" for an instance. It must recognize that this state indicates "maintain, but do not add" connections. It is this capability, this stateful capability, that makes it difficult for many cloud and SDN-related architectures to really support the notion of elasticity at the application services layer. They might be able to scale out, but scaling back in requires more intelligence (and stateful awareness) than is currently available with most of these solutions.
Lori_MacVittie
Oct 01, 2015 Place Technical Articles
418Views
0likes
1Comment
Cloud bursting, the hybrid cloud, and why cloud-agnostic load balancers matter
Cloud Bursting and the Hybrid Cloud When researching cloud bursting, there are many directions Google may take you. Perhaps you come across services for airplanes that attempt to turn cloudy wedding days into memorable events. Perhaps you'd rather opt for a service that helps your IT organization avoid rainy days. Enter cloud bursting ... yes, the one involving computers and networks instead of airplanes. Cloud bursting is a term that has been around in the tech realm for quite a few years. It, in essence, is the ability to allocate resources across various public and private clouds as an organization's needs change. These needs could be economic drivers such as Cloud 2 having lower cost than Cloud 1, or perhaps capacity drivers where additional resources are needed during business hours to handle traffic. For intelligent applications, other interesting things are possible with cloud bursting where, for example, demand in a geographical region suddenly needs capacity that is not local to the primary, private cloud. Here, one can spin up resources to locally serve the demand and provide a better user experience. Nathan Pearce summarizes some of the aspects of cloud bursting in this minute long video, which is a great resource to remind oneself of some of the nuances of this architecture. While Cloud Bursting is a term that is generally accepted by the industry as an "on-demand capacity burst," Lori MacVittie points out that this architectural solution eventually leads to a Hybrid Cloud where multiple compute centers are employed to serve demand among both private-based resources are and public-based resources, or clouds, all the time. The primary driver for this: practically speaking, there are limitations around how fast data that is critical to one's application (think databases, for example) can be replicated across the internet to different data centers. Thus, the promises of "on-demand" cloud bursting scenarios may be short lived, eventually leaning in favor of multiple "always-on compute capacity centers" as loads increase for a given application. In any case, it is important to understand that that multiple locations, across multiple clouds will ultimately be serving application content in the not-too-distant future. An example hybrid cloud architecture where services are deployed across multiple clouds. The "application stack" remains the same, using LineRate in each cloud to balance the local application, while a BIG-IP Local Traffic Manager balances application requests across all of clouds. Advantages of cloud-agnostic Load Balancing As one might conclude from the Cloud Bursting and Hybrid Cloud discussion above, having multiple clouds running an application creates a need for user requests to be distributed among the resources and for automated systems to be able to control application access and flow. In order to provide the best control over how one's application behaves, it is optimal to use a load balancer to serve requests. No DNS or network routing changes need to be made and clients continue using the application as they always did as resources come online or go offline; many times, too, these load balancers offer advanced functionality alongside the load balancing service that provide additional value to the application. Having a load balancer that operates the same way no matter where it is deployed becomes important when resources are distributed among many locations. Understanding expectations around configuration, management, reporting, and behavior of a system limits issues for application deployments and discrepancies between how one platform behaves versus another. With a load balancer like F5's LineRate product line, anyone can programmatically manage the servers providing an application to users. Leveraging this programatic control, application providers have an easy way spin up and down capacity in any arbitrary cloud, retain a familiar yet powerful feature-set for their load balancer, ultimately redistribute resources for an application, and provide a seamless experience back to the user. No matter where the load balancer deployment is, LineRate can work hand-in-hand with any web service provider, whether considered a cloud or not. Your data, and perhaps more importantly cost-centers, are no longer locked down to one vendor or one location. With the right application logic paired with LineRate Precision's scripting engine, an application can dynamically react to take advantage of market pricing or general capacity needs. Consider the following scenarios where cloud-agnostic load balancer have advantages over vendor-specific ones: Economic Drivers Time-dependent instance pricing Spot instances with much lower cost becoming available at night Example: my startup's billing system can take advantage in better pricing per unit of work in the public cloud at night versus the private datacenter Multiple vendor instance pricing Cloud 2 just dropped their high-memory instance pricing lower than Cloud 1's Example: Useful for your workload during normal business hours; My application's primary workload is migrated to Cloud 2 with a simple config change Competition Having multiple cloud deployments simultaneously increases competition, and thus your organization's negotiated pricing contracts become more attractive over time Computational Drivers Traffic Spikes Someone in marketing just tweeted about our new product. All of a sudden, the web servers that traditionally handled all the loads thrown at them just fine are getting slashdotted by people all around North America placing orders. Instead of having humans react to the load and spin up new instances to handle the load - or even worse: doing nothing - your LineRate system and application worked hand-in-hand to spin up a few instances in Microsoft Azure's Texas location and a few more in Amazon's Virginia region. This helps you distribute requests from geographically diverse locations: your existing datacenter in Oregon, the central US Microsoft Cloud, and the east-coast based Amazon Cloud. Orders continue to pour in without any system downtime, or worse: lost customers. Compute Orchestration A mission-critical application in your organization's private cloud unexpectedly needs extra computer power, but needs to stay internal for compliance reasons. Fortunately, your application can spin up public cloud instances and migrate traffic out of the private datacenter without affecting any users or data integrity. Your LineRate instance reaches out to Amazon to boot instances and migrate important data. More importantly, application developers and system administrators don't even realize the application has migrated since everything behaves exactly the same in the cloud location. Once the cloud systems boot, alerts are made to F5's LTM and LineRate instances that migrate traffic to the new servers, allowing the mission-critical app to compute away. You just saved the day! The benefit to having a cloud-agnostic load balancing solution for connecting users with an organization's applications not only provides a unified user experience, but provides powerful, unified way of controlling the application for its administrators as well. If all of a sudden an application needs to be moved from, say, a private datacenter with a 100 Mbps connection to a public cloud with a GigE connection, this can easily be done without having to relearn a new load balancing solution. F5's LineRate product is available for bare-metal deployments on x86 hardware, virtual machine deployments, and has recently deployed an Amazon Machine Image (AMI). All of these deployment types leverage the same familiar, powerful tools that LineRate offers: lightweight and scalable load balancing, modern management through its intuitive GUI or the industry-standard CLI, and automated control via its comprehensive REST API. LineRate Point Load Balancer provides hardened, enterprise-grade load balancing and availability services whereas LineRate Precision Load Balancer adds powerful Node.js programmability, enabling developers and DevOps teams to leverage thousands of Node.js modules to easily create custom controls for application network traffic. Learn about some of LineRate's advanced scripting and functionality here, or try it out for free to see if LineRate is the right cloud-agnostic load balancing solution for your organization.
Andrew_Ragone_2
May 15, 2015 Place Technical Articles
1.1KViews
0likes
0Comments
Back to Basics: Health Monitors and Load Balancing
#webperf #ado Because every connection counts One of the truisms of architecting highly available systems is that you never, ever want to load balance a request to a system that is down. Therefore, some sort of health (status) monitoring is required. For applications, that means not just pinging the network interface or opening a TCP connection, it means querying the application and verifying that the response is valid. This, obviously, requires the application to respond. And respond often. Best practices suggest determining availability every 5 seconds or so. That means every X seconds the load balancing service is going to open up a connection to the application and make a request. Just like a user would do. That adds load to the application. It consumes network, transport, application and (possibly) database resources. Resources that cannot be used to service customers. While the impact on a single application may appear trivial, it's not. Remember, as load increases performance decreases. And no matter how trivial it may appear, health monitoring is adding load to what may be an already heavily loaded application. But Lori, you may be thinking, you expound on the importance of monitoring and visibility all the time! Are you saying we shouldn't be monitoring applications? Nope, not at all. Visibility is paramount, providing the actionable data necessary to enable highly dynamic, automated operations such as elasticity. Visibility through health-monitoring is a critical means of ensuring availability at both the local and global level. What we may need to do, however, is move from active to passive monitoring. PASSIVE MONITORING Passive monitoring, as the modifier suggests, is not an active process. The Load balancer does not open up connections nor query an application itself. Instead, it snoops on responses being returned to clients and from that infers the current status of the application. For example, if a request for content results in an HTTP error message, the load balancer can determine whether or not the application is available and capable of processing subsequent requests. If the load balancer is a BIG-IP, it can mark the service as "down" and invoke an active monitor to probe the application status as well as retrying the request to another available instance – insuring end-users do not see an error. Passive (inband) monitors are not binary. That is, they aren't simple "on" or "off" based on HTTP status codes. Such monitors can be configured to track the number of failures and evaluate failure rates against a configurable failure interval. When such thresholds are exceeded, the application can then be marked as "down". Passive monitors aren't restricted to availability status, either. They can also monitor for performance (response time). Failure to meet response time expectations results in a failure, and the application continues to be watched for subsequent failures. Passive monitors are, like most inline/inband technologies, transparent. They quietly monitor traffic and act upon that traffic without adding overhead to the process. Passive monitoring gives operations the visibility necessary to enable predictable performance and to meet or exceed user expectations with respect to uptime, without negatively impacting performance or capacity of the applications it is monitoring.
Lori_MacVittie
Dec 18, 2014 Place Technical Articles
3KViews
1like
2Comments