One of the unfortunate effects of the continued evolution of the load balancer into today's application delivery controller (ADC) is that it is often too easy to forget the basic problem for which load balancers were originally created—producing highly available, scalable, and predictable application services. We get too lost in the realm of intelligent application routing, virtualized application services, and shared infrastructure deployments to remember that none of these things are possible without a firm basis in basic load balancing technology. So how important is load balancing, and how do its effects lead to streamlined application delivery?
Let’s examine the basic application delivery transaction. The ADC will typically sit in-line between the client and the hosts that provide the services the client wants to use. As with most things in application delivery, this is not a rule, but more of a best practice in a typical deployment. Let's also assume that the ADC is already configured with a virtual server that points to a cluster consisting of two service points. In this deployment scenario, it is common for the hosts to have a return route that points back to the load balancer so that return traffic will be processed through it on its way back to the client.
The basic application delivery transaction is as follows:
Figure 1. A basic load balancing transaction.
This very simple example is relatively straightforward, but there are a couple of key elements to take note of. First, as far as the client knows, it sends packets to the virtual server and the virtual server responds—simple. Second, the NAT takes place. This is where the ADC replaces the destination IP sent by the client (of the virtual server) with the destination IP of the host to which it has chosen to load balance the request. Step three is the second half of this process (the part that makes the NAT "bi-directional"). The source IP of the return packet from the host will be the IP of the host; if this address were not changed and the packet was simply forwarded to the client, the client would be receiving a packet from someone it didn't request one from, and would simply drop it. Instead, the ADC, remembering the connection, rewrites the packet so that the source IP is that of the virtual server, thus solving this problem.
So, how does the ADC decide which host to send the connection to? And what happens if the selected host isn't working?
Let's discuss the second question first. What happens if the selected host isn't working? The simple answer is that it doesn't respond to the client request and the connection attempt eventually times out and fails. This is obviously not a preferred circumstance, as it doesn't ensure high availability. That's why most ADC technology includes some level of health monitoring that determines whether a host is actually available before attempting to send connections to it.
There are multiple levels of health monitoring, each with increasing granularity and focus. A basic monitor would simply PING the host itself. If the host does not respond to PING, it is a good assumption that any services defined on the host are probably down and should be removed from the cluster of available services. Unfortunately, even if the host responds to PING, it doesn't necessarily mean the service itself is working. Therefore most devices can do "service PINGs" of some kind, ranging from simple TCP connections all the way to interacting with the application via a scripted or intelligent interaction. These higher-level health monitors not only provide greater confidence in the availability of the actual services (as opposed to the host), but they also allow the load balancer to differentiate between multiple services on a single host. The ADC understands that while one service might be unavailable, other services on the same host might be working just fine and should still be considered as valid destinations for user traffic.
This brings us back to the first question: How does the ADC decide which host to send a connection request to? Each virtual server has a specific dedicated cluster of services (listing the hosts that offer that service) which makes up the list of possibilities. Additionally, the health monitoring modifies that list to make a list of "currently available" hosts that provide the indicated service. It is this modified list from which the ADC chooses the host that will receive a new connection. Deciding the exact host depends on the ADC algorithm associated with that particular cluster. The most common is simple round-robin where the ADC simply goes down the list starting at the top and allocates each new connection to the next host; when it reaches the bottom of the list, it simply starts again at the top. While this is simple and very predictable, it assumes that all connections will have a similar load and duration on the back-end host, which is not always true. More advanced algorithms use things like current-connection counts, host utilization, and even real-world response times for existing traffic to the host in order to pick the most appropriate host from the available cluster services.
Sufficiently advanced application delivery systems will also be able to synthesize health monitoring information with load balancing algorithms to include an understanding of service dependency. This is the case when a single host has multiple services, all of which are necessary to complete the user's request. A common example would be in e-commerce situations where a single host will provide both standard HTTP services (port 80) as well as HTTPS (SSL/TLS at port 443) and any other potential service ports that need to be allowed. In many of these circumstances, you don't want a user going to a host that has one service operational, but not the other. In other words, if the HTTPS services should fail on a host, you also want that host's HTTP service to be taken out of the cluster list of available services. This functionality is increasingly important as HTTP-like services become more differentiated with this things like XML and scripting.
Load balancing in regards to picking an available service when a client initiates a transaction request is only half of the solution. Once the connection is established, the ADC must keep track of whether the following traffic from that user should be load balanced. There are generally two specific issues with handling follow-on traffic once it has been load balanced: connection maintenance and persistence.
If the user is trying to utilize a long-lived TCP connection (telnet, FTP, and more) that doesn't immediately close, the ADC must ensure that multiple data packets carried across that connection do not get load balanced to other available service hosts. This is connection maintenance and requires two key capabilities: 1) the ability to keep track of open connections and the host service they belong to; and 2) the ability to continue to monitor that connection so the connection table can be updated when the connection closes. This is rather standard fare for most ADCs.
One of the most basic forms of persistence is source-address affinity. Source address affinity persistence directs session requests to the same server based solely on the source IP address of a packet. This involves simply recording the source IP address of incoming requests and the service host they were load balanced to, and making all future transaction go to the same host. This is also an easy way to deal with application dependency as it can be applied across all virtual servers and all services. In practice however, the wide-spread use of proxy servers on the Internet and internally in enterprise networks renders this form of persistence almost useless; in theory it works, but proxy-servers inherently hide many users behind a single IP address resulting in none of those users being load balanced after the first user's request—essentially nullifying the ADC capability. Today, the intelligence of ADCs allows organizations to actually open up the data packets and create persistence tables for virtually anything within it. This enables them to use much more unique and identifiable information, such as user name, to maintain persistence. However, organizations one must take care to ensure that this identifiable client information will be present in every request made, as any packets without it will not be persisted and will be load balanced again, most likely breaking the application.
It is important to understand that basic load balancing technology, while still in use, is now only considered a feature of Application Delivery Controllers. ADCs evolved from the first load balancers through the service virtualization process and today with software only virtual editions. They can not only improve availability, but also affect the security and performance of the application services being requested.
Today, most organizations realize that simply being able to reach an application doesn't make it usable; and unusable applications mean wasted time and money for the enterprise deploying them. ADCs enable organizations to consolidate network-based services like SSL/TLS offload, caching, compression, rate-shaping, intrusion detection, application firewalls, and even remote access into a single strategic point that can be shared and reused across all application services and all hosts to create a virtualized Application Delivery Network. Basic load balancing is the foundation without which none of the enhanced functionality of today's ADCs would be possible.
Now that you’ve gotten this far, would you like to dig deeper or learn more about how application delivery works? Cool, then check out these resources: