Back to Basics: Least Connections is Not Least Loaded

#webperf #ado When load balancing, "least connections" does not mean "least loaded"

Performance is important, which means our infrastructure has to support the need for speed. Load balancing algorithms are an integral piece of the performance equation and can either improve or degrade performance.

That's why it's important to understand more about the algorithms than their general selection mechanism. Understanding that round robin is basically an iterative choice, traversing a list one by one, is good, but understanding what that means in terms of performance and capacity for different types of applications and application workloads is even better.

We last checked out "fastest response time" and today we're diving into "least connections" which, as stated above, does not mean "least loaded."

INTRA-APPLICATION WORKLOADS 

The industry standard "least connections" load balancing algorithm uses the number of current connections to each application instance (member) to make its load balancing decision. The member with the fewest active connections is chosen. Pretty simple, right?
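The selection rule is small enough to sketch in a few lines. This is only an illustration, not any particular load balancer's implementation; the `Member` class, the `pick_least_connections` function, and the pool contents are all hypothetical names made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One application instance behind the load balancer (hypothetical model)."""
    name: str
    active_connections: int = 0


def pick_least_connections(pool):
    """Return the member with the fewest active connections.

    Ties go to the first such member in pool order, because min() returns
    the first minimal element it encounters.
    """
    if not pool:
        raise ValueError("pool is empty")
    return min(pool, key=lambda m: m.active_connections)


# Made-up pool state for illustration.
pool = [Member("app-1", 12), Member("app-2", 7), Member("app-3", 9)]

chosen = pick_least_connections(pool)
chosen.active_connections += 1  # the new connection now counts against it
print(chosen.name, chosen.active_connections)
```

Note that the only input to the decision is a connection count. Nothing in it describes what those connections are actually doing.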

The premise of this algorithm is a general assumption that fewer connections (and thus fewer users) means less load and therefore better performance. That's operational axiom #2 at work - if performance decreases as load increases it stands to reason that performance increases as load decreases.

That would be true (and in the early days of load balancing it was true) if all intra-application workloads required the same resources. Unfortunately, that's no longer true and the result is uneven load distribution that leads to unpredictable performance fluctuations as demand increases.

Consider a simple example: a user logging into a system requires one or more database queries to validate credentials and then update the system to indicate the activity. Depending on the nature of the application, other intra-application activities will require different quantities of resources. Some are RAM heavy, others CPU heavy, others file or database heavy. Furthermore, depending on the user in question, the usage pattern will vary greatly. One hundred users can be logged into the same system (requiring at a minimum 100 connections) but if they're all relatively idle, the system will be lightly loaded and performing well.

Conversely, another application instance may boast only 50 connections, but all 50 users are heavily active with database queries returning large volumes of data. The system is far more heavily loaded and performance may already be beginning to suffer.

When the next request comes in, however, the load balancer using a "least connections" algorithm will choose the latter member, increasing the burden on that member and likely further degrading performance.  
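The mismatch can be made concrete with a toy model of the two instances described above. The per-connection cost figures are invented purely for illustration (real workloads are not this uniform), and the member names are hypothetical.

```python
# Hypothetical per-connection cost in arbitrary "work units". These numbers
# are assumptions for illustration only, not measurements.
IDLE_COST = 1
HEAVY_QUERY_COST = 20

members = {
    "member-a": {"connections": 100, "cost_per_connection": IDLE_COST},
    "member-b": {"connections": 50, "cost_per_connection": HEAVY_QUERY_COST},
}


def load(stats):
    """Approximate load as connections multiplied by cost per connection."""
    return stats["connections"] * stats["cost_per_connection"]


# What a least connections algorithm sees versus what is actually happening.
by_connections = min(members, key=lambda name: members[name]["connections"])
by_load = min(members, key=lambda name: load(members[name]))

for name, stats in members.items():
    print(f"{name}: {stats['connections']} connections, load {load(stats)}")
print("least connections picks:", by_connections)
print("least loaded is:", by_load)
```

Under these assumed costs the two criteria disagree: the member with half the connections carries ten times the load, and it is the one that receives the next request.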

The premise of the least connections algorithm is that the application instance with the fewest connections is the least loaded. Except, it's not.

The only way to know which application instance is the least loaded is to monitor its system variables directly, gathering CPU and memory utilization and comparing them against known maximums. That generally requires SNMP, agents, or other active monitoring mechanisms, which can themselves tax the system by consuming resources.
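As a rough sketch of what a load-based decision could look like once those measurements are available: the `Sample` fields, the equal weighting of CPU and memory, and the sample values below are all assumptions for illustration. How the numbers are collected (SNMP, an agent, and so on) is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring reading for a member (hypothetical structure)."""
    name: str
    cpu_percent: float      # current CPU utilization, 0 to 100
    memory_used_mb: float   # memory currently in use
    memory_max_mb: float    # known maximum for this member


def load_score(sample, cpu_weight=0.5, memory_weight=0.5):
    """Combine CPU and memory into one score, each relative to its maximum.

    The equal weights are an arbitrary assumption; a real deployment would
    tune them to the application's workload profile.
    """
    cpu_fraction = sample.cpu_percent / 100.0
    memory_fraction = sample.memory_used_mb / sample.memory_max_mb
    return cpu_weight * cpu_fraction + memory_weight * memory_fraction


def pick_least_loaded(samples):
    """Return the sample with the lowest combined load score."""
    if not samples:
        raise ValueError("no samples available")
    return min(samples, key=load_score)


# Made-up readings for illustration.
samples = [
    Sample("member-a", cpu_percent=15.0, memory_used_mb=2048, memory_max_mb=8192),
    Sample("member-b", cpu_percent=85.0, memory_used_mb=6144, memory_max_mb=8192),
]

print(pick_least_loaded(samples).name)
```

The score is only as fresh as the last reading, so polling more often gives better decisions at the price of more monitoring overhead on the members.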

This is a quandary for operations, because "application workload" is simply too broad a generalization. Certainly some applications are more I/O heavy than others, and still others are more CPU or connection heavy. But all applications have both a general workload profile and an intra-application workload profile. Understanding the usage patterns (the intra-application workload profile) of an application is critical to determining not only which load balancing algorithm to choose but also which limitations to specify to get better overall performance and use of capacity during execution.

As always, being aware of the capabilities and the limitations of a given load balancing algorithm will assist in choosing one that is best able to meet the performance and availability requirements of an application (and thus the business).


 

Published Jan 02, 2013
Version 1.0
  • Workload analysis can be performed in a number of ways. There's CPU and memory load, of course, but there are also network connections (database call utilization) and other calls out to dependent services that can be tracked using fairly standard network analysis tools (even netstat will help here).

    Instrumentation during dev / QA would be most helpful. Even dumping method/function timing and memory use during execution to a log file (which can then be analyzed by tools like Splunk) can assist in determining which calls are putting what load on the server.

    Profiling tools are another good option if you're using a language (Java, .NET) that has a good set of options for finding out where time and memory are spent in the code.

    There's no good automatic way of determining workload, however. It takes some time to gather all the appropriate statistics and then analyze them holistically to determine load per workload type.

    HTH

    Lori
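
The instrumentation idea in the comment above (logging function timing and memory use during dev / QA) might look something like the following in Python. This is a minimal sketch using only the standard library; the `instrumented` decorator, the `workload` logger name, and the `build_report` function are hypothetical, and the key=value log format is just one assumption about what a log analysis tool could parse.

```python
import functools
import logging
import time
import tracemalloc

# Logs go to stderr here; passing filename="workload.log" to basicConfig
# would send them to a log file instead.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("workload")


def instrumented(func):
    """Log wall-clock time and peak Python memory allocation per call.

    Not safe for nested instrumented calls: the inner call stops tracemalloc,
    so the outer call's peak reading would come back as zero.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            _, peak_bytes = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            logger.info(
                "call=%s elapsed_ms=%.2f peak_kb=%.1f",
                func.__name__, elapsed_ms, peak_bytes / 1024,
            )
    return wrapper


@instrumented
def build_report(rows):
    """Stand-in for an application call whose cost we want to measure."""
    return sum(range(rows))


result = build_report(100_000)
```

Tracing allocations adds noticeable overhead of its own, so a decorator like this belongs in dev / QA runs rather than in production traffic.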