LTM with DNS - logging query answers DNS_RESPONSE clientside
Hi All,

We have our DNS services behind LTM VIPs. We have the DNS license and are using the DNS_QUERY and DNS_RESPONSE events for logging queries and answers. We are not using DNS Express, BIG-IP BIND, or GTM configurations - straight LB work.

Last week I was investigating some optimizations and wanted to add answer header information, specifically the truncate flag. This will allow us to gather some stats on the amount of UDP versus TCP queries occurring. I added [DNS::header tc] to the log message on DNS_RESPONSE and proceeded to test from a client system, using dig against a test SRV record whose answer exceeds 512 bytes over UDP. dig did the expected things: it sent a UDP query, got a response with the truncate flag on, and re-queried over TCP. All good. However, when I looked at the logs, [DNS::header tc] was always returning 0.

From a network trace, I see client to VIP, SNAT to service, service to SNAT, VIP to client. The client-side traffic is as expected - no EDNS0, buffer set to 512 bytes, etc. - and the UDP answer shows the truncate flag set to 1. Further examination showed that while the client had specified no EDNS0 and a 512-byte buffer, the SNAT-to-DNS-service traffic showed EDNS0 on and a buffer of 4096 bytes. I asked our F5 account rep if he had any insights and he agreed that DNS_RESPONSE seems to be pulling from the server side. I tried using [clientside {DNS::header tc}] in the logging statement, but got the same results - truncate still shows 0.

Questions:
1. Is there a way to tell LTM to respect the client settings for the server-side communications?
2. Can I get the client-side info in the DNS_RESPONSE event?

Thanks
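For reference, a minimal sketch of the kind of logging iRule described above; the log format and overall structure are assumptions rather than the original rule, and only [DNS::header tc] and the clientside variation come from the post:

    when DNS_QUERY {
        # log the question as seen on the client side of the connection
        log local0. "query: [DNS::question name] from [IP::client_addr]"
    }
    when DNS_RESPONSE {
        # DNS::header tc reads the truncate bit of the response being processed;
        # per the observation above, here it appears to reflect the server-side answer
        log local0. "answer: [DNS::question name] tc [DNS::header tc]"

        # variation tried in the post: evaluate the header in the client-side context
        # log local0. "clientside tc: [clientside {DNS::header tc}]"
    }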

Load balancing using an API

Hello team,

We have a bunch of hosts running behind F5. Every host runs a few services. One particular service can report its free memory through an API we developed:

    GET http://hostname/myservice/usage

API response:

    { "freeMemory": 369959592 }

Is it possible to consume this API in F5 and load balance accordingly? E.g., if freeMemory is less than a threshold, then no request should be sent to that host for the time being. After some time, when freeMemory is above the threshold value, F5 should again direct requests to that host. How can we load balance in F5 through such an API?

Note that we don't want to mark the server/host status Up or Down. We just want to make sure that this particular service has enough memory to take on the next memory-intensive request. We know about Dynamic Ratio load balancing, but that considers the overall health of the host. We want to load balance based on the status of one service out of the few other services running on the host.

Question on Priority Group Activation

Hi,

I want my virtual server with 9 pool members to be automatically disabled when four of its pool members are down. Can I achieve this with the settings below?

1. Put all the pool members in the same priority group, for example 5.
2. Under Priority Group Activation, select 6, i.e., traffic should be processed by the pool members of group 5 as long as the pool has a minimum of 6 active members, failing which the group should not process the traffic.

Now, since all the pool members belong to the same priority group 5, when the PGA condition fails, would the virtual server be down, as there are no more pool members to accept the traffic?

Please provide your inputs.

Thanks,
MSK

Devops Proverb: Process Practice Makes Perfect

#devops Tools for automating – and optimizing – processes are a must-have for enabling continuous delivery of application deployments

Some idioms are cross-cultural and cross-temporal. They transcend cultures and time, remaining relevant no matter where or when they are spoken. These idioms are often referred to as proverbs, which carries with it a sense of enduring wisdom. One such idiom, "practice makes perfect", can be found in just about every culture in some form. In Chinese, for example, the idiom is apparently properly read as "familiarity through doing creates high proficiency", i.e. practice makes perfect.

This is a central tenet of devops, particularly where optimization of operational processes is concerned. The more often you execute a process, the more likely you are to get better at it and discover which activities (steps) within that process may need tweaking, changes, or improvements. Ergo, optimization. This tenet grows out of the agile methodology adopted by devops: application release cycles should be nearly continuous, with both developers and operations iterating over the same process – develop, test, deploy – with a high level of frequency.

Eventually (one hopes) we achieve process perfection – or at least what we might call process perfection: repeatable, consistent deployment success. It is implied that in order to achieve this, many processes will be automated, once we have discovered and defined them in such a way as to enable them to be automated. But how does one automate a process such as an application release cycle? Business Process Management (BPM) works well for automating business workflows; such systems include adapters and plug-ins that allow communication between systems as well as people. But these systems are not designed for operations; there are no web server, database, or load balancer adapters for even the most widely adopted BPM systems. One such solution can be found in Electric Cloud with its recently announced ElectricDeploy.

Process Automation for Operations

ElectricDeploy is built upon a better-known product from Electric Cloud (well, better known in developer circles, at least) called ElectricCommander, a build-test-deploy application deployment system. Its interface presents applications in terms of tiers – but extends beyond the traditional three tiers associated with development to include infrastructure services such as – you guessed it – load balancers (yes, including BIG-IP) and virtual infrastructure. The view enables operators to create the tiers appropriate to applications and then orchestrate deployment processes through fairly predictable phases – test, QA, pre-production and production.

What's hawesome about the tool is the ability to control the process – to roll back, to restore, and even debug. The debugging capabilities enable operators to stop at specified tasks in order to examine output from systems, check log files, etc., to ensure the process is executing properly. While it's not able to perform "step into" debugging (stepping into the configuration of the load balancer, for example, and manually executing line-by-line changes), it can perform what developers know as "step over" debugging, which means you can step through a process at the highest layer and pause at break points, but you can't yet dive into the actual task.
Still, the ability to pause an executing process and examine output, as well as roll back or restore specific process versions (yes, it versions the processes as well, just as you'd expect), would certainly be a boon to operations in the quest to adopt tools and methodologies from development that can aid them in improving the speed and consistency of deployments.

The tool also enables operations to define what counts as failure during a deployment. For example, you may want to stop and roll back the deployment when a server fails to launch if your deployment only comprises 2 or 3 servers, but when it comprises thousands it may be acceptable that a few fail to launch. Success and failure of individual tasks as well as the overall process are defined by the organization, which allows for flexibility. This is more than just automation, it's managed automation; it's agile in action; it's focusing on the processes, not the plumbing.

MANUAL still RULES

Electric Cloud recently (June 2012) conducted a survey on the "state of application deployments today" and found some not unexpected but still frustrating results, including that 75% of application deployments are still performed manually or with little to no automation. Automation may not be the goal of devops, but it is a tool enabling operations to achieve its goals, and thus it should be more broadly considered standard operating procedure to automate as much of the deployment process as possible. This is particularly true when operations fully adopts not only the premise of devops but the conclusion resulting from its agile roots. Tighter, faster, more frequent release cycles necessarily put an additional burden on operations to execute the same processes over and over again. Trying to accomplish this manually may set operations up for failure and leave it focused more on simply going through the motions and getting the application into production successfully than on streamlining and optimizing the processes it is executing.

Electric Cloud's ElectricDeploy is one of the ways in which process optimization can be achieved, and it justifies its purchase by operations by promising to enable better control over application deployment processes across development and infrastructure.

Intro to Load Balancing for Developers – The Algorithms

If you're new to this series, you can find the complete list of articles in the series on my personal page here. If you are writing applications to sit behind a load balancer, it behooves you to at least have a clue about the algorithm your load balancer uses. We're taking this week's installment to chat about the most common algorithms and give a plain-programmer description of how they work. While historically the algorithm chosen has been beyond the developers' control, you're the one who has to deal with performance problems, so you should know what is happening in the application's ecosystem, not just in the application. Anything that can slow your application down or introduce errors is worth having reviewed.

For algorithms supported by the BIG-IP, the text here is a paraphrased/modified version of the help text associated with the Pool Member tab of the BIG-IP UI. If they wrote a good description and all I needed to do was programmer-ize it, then I used it. For algorithms not supported by the BIG-IP I wrote from scratch. Note that there are many, many more algorithms out there, but as you read through here you'll see why these (or minor variants of them) are the ones you'll see the most. The Plain Programmer Descriptions are not intended to say anything about the way any particular dev team at F5 or any other company writes these algorithms; they're just an attempt to put the process into terms that are easier for someone with a programming background to understand. Hopefully a successful attempt.

Interestingly enough, I've pared down what BIG-IP supports to a subset. That means that F5 employees and aficionados will be going "But you didn't mention…!" and non-F5 employees will likely say "But there's the Chi-Squared Algorithm…!" (no, chi-squared is a theoretical distribution method I know of because it was presented as a proof for testing the randomness of a 20-sided die, ages ago in Dragon Magazine). The point being that I tried to stick to a group that builds on each other in some connected fashion. So send me hate mail… I'm good. Unless you can say more than 2-5% of the world's load balancers are running the algorithm, I won't consider that I missed something important. The point is to give developers and software architects a familiarity with core algorithms, not to build the world's most complete lexicon of algorithms.

Random: This load balancing method randomly distributes load across the available servers, picking one via random number generation and sending the current connection to it. While it is available on many load balancing products, its usefulness is questionable except where uptime is concerned – and then only if you detect down machines.

Plain Programmer Description: The system builds an array of servers being load balanced, and uses the random number generator to determine who gets the next connection… Far from an elegant solution, and most often found in large software packages that have thrown load balancing in as a feature.

Round Robin: Round Robin passes each new connection request to the next server in line, eventually distributing connections evenly across the array of machines being load balanced. Round Robin works well in most configurations, but could be better if the equipment that you are load balancing is not roughly equal in processing speed, connection speed, and/or memory.

Plain Programmer Description: The system builds a standard circular queue and walks through it, sending one request to each machine before getting to the start of the queue and doing it again. While I've never seen the code (or actual load balancer code for any of these, for that matter), we've all written this queue with the modulus function before. In school if nowhere else.
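As a rough illustration of that modulus-style circular queue, here is a minimal sketch in Tcl (the language iRules are written in); the member list is an assumption for illustration, and no real load balancer is implemented this way verbatim:

    # hypothetical pool members; a real load balancer maintains this list dynamically
    set members {10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4}
    set counter 0

    proc next_member {} {
        global members counter
        # walk the circular queue: each call returns the next server in line
        set member [lindex $members [expr {$counter % [llength $members]}]]
        incr counter
        return $member
    }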
Weighted Round Robin (called Ratio on the BIG-IP): With this method, the number of connections that each machine receives over time is proportionate to a ratio weight you define for each machine. This is an improvement over Round Robin because you can say "Machine 3 can handle 2x the load of machines 1 and 2", and the load balancer will send two requests to machine #3 for each request to the others.

Plain Programmer Description: The simplest way to explain this one is that the system makes multiple entries in the Round Robin circular queue for servers with larger ratios. So if you set ratios at 3:2:1:1 for your four servers, that's what the queue would look like – 3 entries for the first server, two for the second, one each for the third and fourth. In this version, the weights are set when the load balancing is configured for your application and never change, so the system will just keep looping through that circular queue. Different vendors use different weighting systems – whole numbers, decimals that must total 1.0 (100%), etc. – but this is an implementation detail; they all end up in a circular-queue-style layout with more entries for larger ratings.
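Continuing the sketch above, a ratio queue can be built simply by repeating entries; the members and the 3:2:1:1 weights below are assumptions for illustration:

    # hypothetical members with ratio weights of 3:2:1:1
    set ratios {10.0.0.1 3 10.0.0.2 2 10.0.0.3 1 10.0.0.4 1}

    # expand into a circular queue with one entry per ratio point
    set queue {}
    foreach {member ratio} $ratios {
        for {set i 0} {$i < $ratio} {incr i} {
            lappend queue $member
        }
    }
    # the queue is then walked exactly like plain Round Robin:
    # 10.0.0.1 10.0.0.1 10.0.0.1 10.0.0.2 10.0.0.2 10.0.0.3 10.0.0.4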
Dynamic Round Robin (called Dynamic Ratio on the BIG-IP): This method is similar to Weighted Round Robin; however, weights are based on continuous monitoring of the servers and are therefore continually changing. This is a dynamic load balancing method, distributing connections based on various aspects of real-time server performance analysis, such as the current number of connections per node or the fastest node response time. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: If you think of Weighted Round Robin where the circular queue is rebuilt with new (dynamic) weights whenever it has been fully traversed, you'll be dead-on.

Fastest: The Fastest method passes a new connection based on the fastest response time of all servers. This method may be particularly useful in environments where servers are distributed across different logical networks. On the BIG-IP, only servers that are active will be selected.

Plain Programmer Description: The load balancer looks at the response time of each attached server and chooses the one with the best response time. This is pretty straightforward, but can lead to congestion because response time right now won't necessarily be response time in one or two seconds. Since connections are generally going through the load balancer, this algorithm is a lot easier to implement than you might think, as long as the numbers are kept up to date whenever a response comes through.

These next three I use the BIG-IP names for. They are variants of a generalized algorithm sometimes called Long Term Resource Monitoring.

Least Connections: With this method, the system passes a new connection to the server that has the least number of current connections. Least Connections methods work best in environments where the servers or other equipment you are load balancing have similar capabilities. This is a dynamic load balancing method, distributing connections based on various aspects of real-time server performance analysis, such as the current number of connections per node or the fastest node response time. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: This algorithm just keeps track of the number of connections attached to each server, and selects the one with the smallest number to receive the connection. Like Fastest, this can cause congestion when the connections are all of different durations – like if one is loading a plain HTML page and another is running a JSP with a ton of database lookups. Connection counting just doesn't account for that scenario very well.

Observed: The Observed method uses a combination of the logic used in the Least Connections and Fastest algorithms to load balance connections to servers being load-balanced. With this method, servers are ranked based on a combination of the number of current connections and the response time. Servers that have a better balance of fewest connections and fastest response time receive a greater proportion of the connections. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: This algorithm tries to merge Fastest and Least Connections, which does make it more appealing than either one of the above alone. In this case, an array is built with the information indicated (how weighting is done will vary, and I don't know even for F5, let alone our competitors), and the element with the highest value is chosen to receive the connection. This somewhat counters the weaknesses of both of the original algorithms, but does not account for when a server is about to be overloaded – like when three requests to that query-heavy JSP have just been submitted, but not yet hit the heavy work.

Predictive: The Predictive method uses the ranking method used by the Observed method; however, with the Predictive method, the system analyzes the trend of the ranking over time, determining whether a server's performance is currently improving or declining. The servers in the specified pool with better performance rankings that are currently improving, rather than declining, receive a higher proportion of the connections. The Predictive method works well in any environment. This Application Delivery Controller method is rarely available in a simple load balancer.

Plain Programmer Description: This method attempts to fix the one problem with Observed by watching what is happening with the server. If its response time has started going down, it is less likely to receive the packet. Again, no idea what the weightings are, but an array is built and the most desirable is chosen.

You can see with some of these algorithms that persistent connections would cause problems. Like Round Robin, if the connections persist to a server for as long as the user session is working, some servers will build a backlog of persistent connections that slow their response time. The Long Term Resource Monitoring algorithms are the best choice if you have a significant number of persistent connections. Fastest works okay in this scenario also, if you don't have access to any of the dynamic solutions.

That's it for this week; next week we'll start talking specifically about Application Delivery Controllers and what they offer – which is a whole lot – that can help your application in a variety of ways. Until then!
Don.

User access to servers

Hello Dears,

I have multiple servers working on different ports, like 8080, 8090, etc., and all of them are load balanced using F5. I am asking whether it's doable to have users reach the servers using the standard 443 port and have the F5 make the connection to the correct port. For example: web1.abc.com would go to 8080, and web2.abc.com would go to 8090.

Best Regards
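For illustration only, the kind of host-based routing described above is commonly done with a single HTTPS virtual server on 443 selecting a pool per hostname; the sketch below assumes hypothetical pool names whose members listen on 8080 and 8090 (SSL profiles and the rest of the configuration are not shown):

    when HTTP_REQUEST {
        # choose a pool based on the requested hostname;
        # each pool contains the members on their real service port
        switch [string tolower [HTTP::host]] {
            "web1.abc.com" { pool pool_web1_8080 }
            "web2.abc.com" { pool pool_web2_8090 }
            default        { reject }
        }
    }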

Why you still need layer 7 persistence

Tony Bourke of the Load Balancing Digest points out that mega proxies are largely dead. Very true. He then wonders whether layer 7 persistence is really all that important today, as it was largely implemented to solve the problems associated with mega-proxies - that is, large numbers of users coming from the same IP address.

Layer 7 persistence is still applicable to situations where you may have multiple users coming from a single IP address (such as a small client base coming from a handful of offices, with each office using one public IP address), but I wonder what doing Layer 4 persistence would do to a major site these days. I'm thinking, not much.

I'm going to say that layer 4 persistence would likely break a major site today. Layer 7 persistence is even more relevant today than it has been in the past for one very good reason: session coherence. Session coherence may not have the performance and availability ramifications of the mega-proxy problem, but it is essential to ensure that applications in a load-balanced environment work correctly.

SESSION COHERENCE

Layer 7 persistence is still heavily used in applications that are session sensitive. The most common example is shopping carts stored in the application server session, but it is also increasingly important to Web 2.0 and interactive applications where state is important. Sessions are used to store that state, and therefore layer 7 persistence becomes important to maintaining that state in a load-balanced environment. It's common to see layer 7 persistence driven by JSESSIONID or PHPSESSIONID header variables today. It's a question we see in the forums here on DevCentral quite often. Many applications are rolled out, and then inserted into a load-balanced environment, and subsequently break because sessions aren't shared across web application servers and the client isn't always routed to the same server, or they come back "later" (after the connections have timed out) and expect the application to continue where they left it. If they aren't load balanced back to the same server, the session data isn't accessible and the application breaks.

Application server sessions generally persist for hours, as opposed to the minutes or seconds allowed for a TCP connection. Layer 4 (TCP) persistence can't adequately address this problem. Source port and IP address aren't always enough to ensure routing to the correct server because they don't persist once the connection is closed, and multiple requests coming from the same browser use multiple connections now, each with a different source port. That means two requests on the same page may not be load balanced to the same server, even though they both may require access to the application session data. These sites and applications are used for hours, often with long periods of time between requests, which means connections have often long since timed out. Could layer 4 persistence work? Probably, but only if the time-out on these connections were set unreasonably high, which would consume a lot more resources on the load balancer and reduce its capacity significantly.
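As a concrete illustration, session-based persistence of this kind is often implemented with universal (UIE) persistence keyed on the session cookie; a minimal sketch, with the JSESSIONID cookie name taken from the example above and everything else assumed:

    when HTTP_REQUEST {
        # if the client already presents a session cookie, persist on its value
        if { [HTTP::cookie exists "JSESSIONID"] } {
            persist uie [HTTP::cookie "JSESSIONID"]
        }
    }
    when HTTP_RESPONSE {
        # when the application hands out a session, record the persistence entry
        if { [HTTP::cookie exists "JSESSIONID"] } {
            persist add uie [HTTP::cookie "JSESSIONID"]
        }
    }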
And let's not forget SaaS (Software as a Service) sites like salesforce.com, where rather than mega-proxy issues cropping up we'd have lots-of-little-proxy issues cropping up, as businesses still (thanks to IPv4 and the need to monitor Internet use) employ forward proxies. And SSL, too, is highly dependent upon header data to ensure persistence today.

I agree with Tony's assessment that the mega proxy problem is largely a non-issue today, but session coherence is taking its place as one of the best reasons to implement layer 7 persistence over layer 4 persistence.

Load Balancing Fu: Beware the Algorithm and Sticky Sessions

The choice of load balancing algorithms can directly impact – for good or ill – the performance, behavior and capacity of applications. Beware making incompatible choices in architecture and algorithms.

One of the most persistent issues encountered when deploying applications in scalable architectures involves sessions and the need for persistence-based (a.k.a. sticky) load balancing services to maintain state for the duration of an end-user's session. It is common enough that even the rudimentary load balancing services offered by cloud computing providers such as Amazon include the option to enable persistence-based load balancing. While the use of persistence addresses the problem of maintaining session state, it introduces other operational issues that must also be addressed to ensure consistent operational behavior of load balancing services. In particular, the use of the Round Robin load balancing algorithm in conjunction with persistence-based load balancing should be discouraged if not outright disallowed.

ROUND ROBIN + PERSISTENCE -> POTENTIALLY UNEQUAL DISTRIBUTION of LOAD

When scaling applications there are two primary concerns: concurrent user capacity and performance. These two concerns are interrelated in that as capacity is consumed, performance degrades. This is particularly true of applications storing state, as each request requires that the application server perform a lookup to retrieve the user session. The more sessions stored, the longer it takes to find and retrieve the session. The exact efficiency of such lookups is determined by the underlying storage data structure and the algorithm used to search the structure for the appropriate session.

If you remember your undergraduate classes in data structures and computing Big O, you'll remember that some structures scale more efficiently in terms of performance than do others. The general rule of thumb, however, is that the more data stored, the longer the lookup; only the amount of degradation is variable, based on the efficiency of the algorithms used. Therefore, the more sessions in use on an application server instance, the poorer the performance. This is one of the reasons you want to choose a load balancing algorithm that evenly distributes load across all instances, and ultimately why lots of little web servers scaled out offer better performance than a few scaled-up web servers.

Now, when you apply persistence to the load balancing equation, it essentially interrupts the normal operation of the algorithm, ignoring it. That's the way it's supposed to work: the algorithm essentially applies only to requests until a server-side session (state) is established, and thereafter (when the session has been created) you want the end-user to interact with the same server to ensure consistent and expected application behavior. For example, consider this solution note for BIG-IP. Note that this is true of all load balancing services:

A persistence profile allows a returning client to connect directly to the server to which it last connected. In some cases, assigning a persistence profile to a virtual server can create the appearance that the BIG-IP system is incorrectly distributing more requests to a particular server. However, when you enable a persistence profile for a virtual server, a returning client is allowed to bypass the load balancing method and connect directly to the pool member.
As a result, the traffic load across pool members may be uneven, especially if the persistence profile is configured with a high timeout value.
-- Causes of Uneven Traffic Distribution Across BIG-IP Pool Members

So far so good. The problem with round robin – and the reason I'm picking on Round Robin specifically – is that round robin is pretty, well, dumb in its decision making. It doesn't factor anything into its decision regarding which instance gets the next request. It's as simple as "next in line", period. Depending on the number of users and at what point a session is created, this can lead to scenarios in which the majority of sessions are created on just a few instances. The result is a couple of overwhelmed instances (with performance degradations commensurate with the reduction in available resources) and a bunch of barely touched instances. The smaller the pool of instances, the more likely it is that a small number of servers will be disproportionately burdened. Again, lots of little (virtual) web servers scale out more evenly and efficiently than a few big (virtual) web servers.

Assuming a pool of similarly capable instances (RAM and CPU about equal on all), there are other load balancing algorithms that should be considered more appropriate for use in conjunction with persistence-based load balancing configurations. Least connections should provide better distribution, although the assumption that an active connection is equivalent to the number of sessions currently in memory on the application server could prove to be incorrect at some point, leading to the same situation as would be the case with the choice of round robin. It is still a better option, but not an infallible one. Fastest response time is likely a better indicator of capacity, as we know that response times increase along with resource consumption, thus a faster-responding instance is likely (but not guaranteed) to have more capacity available. Again, this algorithm in conjunction with persistence is not a panacea. Better options for a load balancing algorithm include those that are application aware; that is, algorithms that can factor into the decision-making process the current load on the application instance and thus direct requests toward less burdened instances, resulting in a more even distribution of load across available instances.

NON-ALGORITHMIC SOLUTIONS

There are also non-algorithmic, i.e. architectural, solutions that can address this issue.

DIVIDE and CONQUER

In cloud computing environments, where it is less likely to find available algorithms other than the industry standards (none of which are application-aware), it may be necessary to approach the problem with a divide and conquer strategy, i.e. lots of little servers. Rather than choosing one or two "large" instances, choose to scale out with four or five "small" instances, thus providing a better (but not guaranteed) statistical chance of load being distributed more evenly across instances.

FLANKING STRATEGY

If the option is available, an architectural "flanking" strategy that leverages layer 7 load balancing, a.k.a. content/application switching, will also provide better consumption rates as well as more consistent performance. An architectural strategy of this sort is in line with sharding practices at the data layer in that it separates out different kinds of content by some attribute and serves that content from separate pools. Thus, images or other static content may come from one pool of resources while session-oriented, process-intensive dynamic content may come from another pool. This allows different strategies – and algorithms – to be used simultaneously without sacrificing the notion of a single point of entry through which all users interact on the client side.
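A minimal sketch of that kind of content switching as an iRule; the URI patterns and pool names are assumptions for illustration:

    when HTTP_REQUEST {
        # send static content to one pool and everything else to the application pool
        switch -glob [string tolower [HTTP::path]] {
            "*.css" -
            "*.js" -
            "*.png" -
            "*.jpg" { pool static_pool }
            default { pool app_pool }
        }
    }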
Regardless of how you choose to address the potential impact on capacity, it is important to recognize the intimate relationship between infrastructure services and applications. A more integrated architectural approach to application delivery can result in a much more efficient and better performing application. Understanding the relationship between delivery services and application performance and capacity can also help improve operational costs, especially in cloud computing environments that constrain the choices of load balancing algorithms. As always, test early, test often, and test under high load if you want to be assured that the load balancing algorithm is suitable to meet your operational and business requirements.

Load Balancing versus Application Routing

As the lines between DevOps and NetOps continue to blur thanks to the highly distributed models of modern application architectures, there is a growing need to understand the difference between load balancing and application routing. These are not the same thing, even though they might be provided by the same service.

Load balancing is designed to provide availability through horizontal scale. To scale an application, a load balancer distributes requests across a pool (farm, cluster, whatevs) of duplicated applications (or services). The decision on which pool member gets to respond to a request is based on an algorithm. That algorithm can be quite apathetic as to whether the chosen pool member is actually capable of responding, or it can be "smart" about its decision, factoring in response times, current load, and even weighting decisions based on all of the above. This is the most basic load balancing pattern in existence. It's been the foundation for availability (scale and failover) since 1996.

Load balancing of this kind is what we often (fondly) refer to as "dumb". That's because it's almost always based on TCP (layer 4 of the OSI stack). Like honey badger, it don't care about the application (or its protocols) at all. All it worries about is receiving a TCP connection request and matching it up with one of the members in the appropriate pool. It's not necessarily efficient, but gosh darn it, it works and it works well. Systems have progressed to the point that purpose-built software designed to do nothing but load balancing can manage millions of connections simultaneously. It's really quite amazing if you're at all aware that back in the early 2000s most systems could only handle on the order of thousands of simultaneous requests.

Now, application routing is something altogether different. First, it requires the system to care about the application and its protocols. That's because in order to route an application request, the target must first be identified. This identification can be as simple as "what's the host name" or as complicated as "what's the value of an element hidden somewhere in the payload in the form of a JSON key:value pair or XML element." In between lies the most common application identifier – the URI.

Application "routes" can be deduced from the URI by examining its path and extracting certain pieces. This is akin to routing in Express (one of the more popular node.js API frameworks). A URI path in the form of /user/profile/xxxxx – where xxxxx is an actual user name or account number – can be split apart and used to "route" the request to a specific pool for load balancing or to a designated member (application/service instance). This happens at the "virtual server" construct of the load balancer using some sort of policy or code. Application routing occurs before the load balancing decision. In effect, application routing enables a single load balancer to distribute requests intelligently across multiple applications or services. If you consider modern microservices-based applications combined with APIs (URIs representing specific requests), you can see how this type of functionality becomes useful. An API can be represented as a single domain (api.example.com) to the client, but behind the scenes it is actually comprised of multiple applications or services that are scaled individually using a combination of application routing and load balancing.
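To make that concrete, here is a minimal sketch of URI-based application routing as an iRule; the path segments and pool names are assumptions for illustration:

    when HTTP_REQUEST {
        # split the path and route on its first segment, e.g. /user/profile/12345
        switch [lindex [split [HTTP::path] "/"] 1] {
            "user"   { pool user_service_pool }
            "orders" { pool order_service_pool }
            default  { pool default_pool }
        }
    }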
One of the reasons (aside from my pedantic nature) to understand the difference between application routing and load balancing is that the two are not interchangeable. Routing makes a decision on where to forward something – a packet, an application request, an approval in your business workflow. Load balancing distributes something (packets, requests, approvals) across a set of resources designed to process that something. You really can't (shouldn't) substitute one for the other. But it also means that you have the freedom to mix and match how these two interact with one another.

You can, for example, use plain old load balancing (POLB) for ingress load balancing and then use application routing (layer 7) to distribute requests (inside a container cluster, perhaps). You can also switch that around and use application routing for ingress traffic, distributing it via POLB inside the application architecture. Load balancing and application routing can be layered, as well, to achieve specific goals with respect to availability and scale. I prefer to use application routing at the ingress because it enables greater variety and granularity in implementing both operational and application architectures more supportive of modern deployment patterns.

The decision on where to use POLB versus application routing is largely based on application architecture and requirements. Scale can be achieved with both, though with differing levels of efficacy. That discussion is beyond the scope of today's post, but there are trade-offs. It cannot be said often enough that the key to scaling applications today is about architectures, not algorithms. Understanding the differences between application routing and load balancing should provide a solid basis for designing highly scalable architectures.

Uneven balancing - ratio (member)

Hi everybody,

So here is my situation, as you can see here: I have a pool (Pool_Extranet) which contains 2 nodes, ExtranetB and ExtranetC, where B is a physical server and C is a virtual server. The thing is, B is much more powerful than C and our customers would like to do 75/25 balancing, but as you can see that's not the case at all: 95% of the connections are on C. We tried putting the pool on ratio (member) and configuring the ratios, but it didn't change anything; same goes for dynamic ratio (member), which didn't change anything either... We thought about rebooting ExtranetC to "move" the connections over to ExtranetB, but that's our last resort and we wanted to try everything else before doing it.

Any idea? Thanks in advance.