Forum Discussion

zigzagdna_11404
Nimbostratus
Jul 18, 2012

I do not understand F5 observed setting

My company uses an F5 load balancer. I have two app servers behind it, both hosting the same web site. I can reach the site as appserver1.xxx.com, appserver2.xxx.com, or lb.xxx.com. I will call the last one the load balancer link.

F5 provides the following options when setting up load balancing:

1. Default (probably a random way of load balancing)
2. Round Robin (sends the first message to app server 1, the second to app server 2, the third to app server 1, and so on)
3. Least Connections (whichever server has the fewest connections; not sure what happens when an app server is down)
4. Observed (a combination of the number of connections and server performance; i.e., if a server responds faster, it may be sent traffic even if it has more connections?)

I do not understand how Observed works, but it seemed the most sophisticated, so I selected it.

Then I ran some tests:

1. Several users (5 of them) accessed my load balancer link, and all of them went to app server 1. Why were none of them sent to app server 2?

2. I shut down IIS (I have v7) on app server 1; when users then logged on, they went to app server 2. That makes sense.

3. I shut down IIS on app server 2 as well; now we get a page not found error, which also makes sense.

4. Now I started IIS on app server 2 (IIS on app server 1 is still down). When users access the load balancer link, they go to app server 2. That makes sense.

5. Now I brought up IIS on app server 1, so IIS is up on both servers. When I access the web site, it takes me to app server 1, which is also OK.

6. Now comes the troubling part. I rebooted app server 1 (a Windows Server); a reboot takes about 5 minutes. During this time I accessed the load balancer link and kept getting a page not found error, because F5 was still sending requests to app server 1. After 5 minutes, when app server 1 came back up, I could access my web site through the load balancer link again. I do not understand at all why F5 Observed mode behaves differently during a reboot than when IIS is down. How does it really determine whether the web site is down?

I would really appreciate your insight on questions 1 and 6.
  • It's been a while since I revisited the load balancing algorithms, but I think I can point you in the right direction on what to check. Observed, as I remember it, is a combination of least connections and response time. The two metrics are combined so that, as you stated, a server that is getting a lot of small connections and responding quickly (or whose sessions end quickly) may get more connections than another server that has fewer, but higher-bandwidth or more CPU-intensive, connections that cause it to respond more slowly. There is also a Predictive method that takes the combined observed metric and plots it over time in intervals of seconds. This gives the "declining average" of that metric's response. The point of both is to distribute connections based not only on the number of connections (which can be misleading), but on both the number of connections and the server's response time while processing them.
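To make the idea concrete, here is a rough sketch of how a combined "connections plus response time" score could pick a pool member. This is only an illustration: the formula, the weights, and the names `Member`, `observed_score`, and `pick_member` are my own assumptions, not F5's actual Observed algorithm.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    connections: int        # currently open connections
    avg_response_ms: float  # recent average response time


def observed_score(m: Member, conn_weight: float = 1.0,
                   time_weight: float = 1.0) -> float:
    """Lower is better. Hypothetical formula, NOT F5's real one."""
    return conn_weight * m.connections + time_weight * m.avg_response_ms / 100.0


def pick_member(pool: list[Member]) -> Member:
    # Choose the member with the lowest combined score.
    return min(pool, key=observed_score)


pool = [
    Member("appserver1", connections=12, avg_response_ms=40.0),
    Member("appserver2", connections=5, avg_response_ms=900.0),
]
print(pick_member(pool).name)  # appserver1: more connections, but much faster
```

With these made-up numbers, appserver1 scores 12.4 and appserver2 scores 14.0, so the busier but faster server still wins, which is the behavior described above.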

     

     

    In any case, after choosing Observed, you may want to verify that you are not using persistence. If you are, all bets are off on the exact distribution of requests, since each client is sent back to a particular server (server 1 in your case) based on source IP, cookie, etc. Next, check what is called the "ramp up" variable. This setting was developed for the exact case you mentioned: a server is rebooted or under maintenance and then rejoins the pool. Suddenly, based on the observed metric, it is by far the best pool member to send all requests to. The setting can be adjusted so that traffic ramps up slowly over a certain period of time and the new server is not overloaded. This can affect the distribution of requests in your test scenario.
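As a rough picture of what a ramp-up setting does, the sketch below scales a member's share of new traffic linearly from 0 to 1 over the ramp period. The linear shape and the name `ramp_factor` are assumptions for illustration only; I am not claiming this is how the F5 implements it.

```python
def ramp_factor(seconds_since_up: float, ramp_seconds: float) -> float:
    """Fraction (0.0 to 1.0) of its normal share a rejoined member may receive.

    Hypothetical linear ramp; a non-positive ramp period disables ramping.
    """
    if ramp_seconds <= 0:
        return 1.0
    return max(0.0, min(1.0, seconds_since_up / ramp_seconds))


# A member that came back 30 seconds ago, with a 120-second ramp period:
print(ramp_factor(30, 120))   # 0.25
# After the ramp period has passed it gets its full share:
print(ramp_factor(300, 120))  # 1.0
```

In a short test with a handful of users, a member that has just rejoined the pool could therefore see far less (or, without ramping, far more) traffic than an even split would suggest.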

     

     

     

    Finally, you may be dealing with a relatively small amount of traffic. Observed, Predictive, and other dynamic methods are intended for larger amounts of traffic, and give better overall performance and distribution over lots of traffic and time. A few connections at the rate you are probably testing at is not *exactly* what a real-world production environment would look like, nor is it any indication that there is actually anything wrong.

    Now, as far as the server being sent requests while it is booting up, check the monitor(s) you have assigned to the nodes. Are they looking just at the IP address, or at the service? Do you have a content monitor that checks whether valid content is coming back, which would not mark the server as up until it received specific content only available when the site is running properly (GET /login.asp, response string "Welcome you are Logged into...")? Persistence would of course cause the same problem, so, as mentioned earlier, make sure it is disabled unless needed; a node marked down should have its traffic redirected to a node that is up, but that depends on the persistence method. Also check your monitor timing. Although it is recommended that you have an interval of 30 seconds and a timeout of n*3+1 (where n is the interval), this can lead to a delay of anywhere from seconds to several minutes, depending on the values. Try a 5-second interval with a 16-second timeout, or even a 2 sec/5 sec setting, just to see if that is the issue (the timeout must be larger than the interval). In other words, maybe it is not marking your node down quickly enough or, more likely, the timeout is such that the up state continues until the timeout expires. This is also affected by using an IP or service monitor instead of a content monitor, which will receive a definitive response faster.
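The timing arithmetic is easy to check by hand, but here it is as a small sketch, together with the kind of test a content monitor performs. The function names `recommended_timeout` and `content_check` are made up for this example, and the sketch assumes a member stays marked up until the full timeout passes without a successful check.

```python
def recommended_timeout(interval_s: int) -> int:
    """Timeout from the n*3+1 rule of thumb, where n is the interval in seconds."""
    return 3 * interval_s + 1


def content_check(body: str, expected: str) -> bool:
    """A content monitor only passes if the expected string is in the response."""
    return expected in body


for interval in (30, 5):
    # Assumption: the member is still treated as up for roughly this long
    # after its last successful check.
    print(interval, recommended_timeout(interval))
# 30 91
# 5 16

print(content_check("<h1>Welcome you are Logged into...</h1>",
                    "Welcome you are Logged into"))     # True
print(content_check("<h1>Service Unavailable</h1>",
                    "Welcome you are Logged into"))     # False
```

So with a 30-second interval the member could keep receiving traffic for about a minute and a half after it stops answering, versus roughly 16 seconds with a 5-second interval. The 2 sec/5 sec setting is deliberately tighter than the n*3+1 rule and is only meant as a diagnostic.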

     

     

     

    Hope this helps.