BalasM_147788
May 06, 2014

SharePoint 2013 HTTP Health Monitoring

Hi All,

 

I have been working with F5 LTM on fine-tuning the health monitor for our SharePoint 2013 iApp deployment. Please go through the information below and share your input.

Environment:
BIG-IP LTM 11.3
SharePoint 2010 template released in October 2012 (not the latest)
2 SharePoint WFE servers (Windows Server 2012)

 

Setup:
1. Several web applications are published through a single virtual server IP on the F5: webapp1.contoso.com, webapp2.contoso.com, webapp3.contoso.com, webapp4.contoso.com and webapp5.contoso.com.
2. UAG sits in front of the F5 and handles authentication and authorization for the end users. The F5 is used mainly for load balancing.
3. The F5 iApp rule is as follows:

 

Questions:
1. The current basic HTTP monitor has the send string GET /\r\n (it doesn't include any application URLs). We are not 100% sure this is correct. Will it monitor all the web applications?
2. How do we configure the HTTP monitor to improve load balancing for the SharePoint web applications? Please provide examples, along with a way to test the monitors and make sure they are working as expected.
3. We also noticed the latest iApp templates for SharePoint at http://support.f5.com/kb/en-us/solutions/public/15000/000/sol15043.html. Do they include any specific improvements to monitoring?

 

I really appreciate your help!

 

Regards

 

Bala

 

4 Replies

  • UAG is Windows software running on a Windows server, usually in a DMZ. I've always been amazed how that idea doesn't scare the pants off of IA folk. A BIG-IP, on the other hand, is a default-deny, hardened security appliance. It may not have the inside knowledge that UAG may have on the health of a SharePoint environment, but it makes up for that in security and flexibility. That lack of internal introspection means that BIG-IP, like any ADC vendor's technology (other than UAG of course), cannot see things like resource utilization, CPU usage, and app pool thread counts. From the outside you can make a request to a service and use the service's response as basis for health status. If you think about it, that's not really a bad thing.

    Can you provide this resource? Yes? Okay you're good until the next health check. 
    

    Additionally, some of the more advanced LTM load balancing algorithms can look at things like TCP session counts and response latency, and actually help to steer traffic away from servers that appear to be having problems.

    The SharePoint iApp uses a very simple health monitor, most likely because 1) it's almost guaranteed not to break, and 2) the SharePoint environment can get so complex and so customized that there's simply no way to predict the best way to evaluate health. It's definitely not the best it can be, but the tools are there to make it better. For example, you could modify the monitor to do something like GET /Pages/Default.aspx, use NTLM credentials, and look for some specific attribute in the response payload. If you want to get crazy complex, you can use an external monitor, put on your Bash scripting hat (or Perl, Python, TCL, etc.) and make something that can log in, consume cookies, run queries, and do all sorts of things that would only be possible if the SharePoint server were truly healthy.

    And then you could simply look at this from SharePoint's perspective. What better place to know the true health and availability of a SharePoint server than the SharePoint server itself? You could, for example, build an ASP.NET page that queries various aspects of the environment (resource utilization, CPU usage, app pool thread counts, database connectivity, you name it) and then simply reports "good", "bad", or something in between to a basic HTTP GET health monitor.

    I know this didn't exactly answer your question, so:

    1. The current basic HTTP monitor has the following send string GET /\r\n (it doesn't have any application URL's) and we are not sure 100% if this is correct and it will monitor all the web applications?

    The best way to make this monitor better is to know your SharePoint environment. What kind of HTTP request will truly identify a healthy (or unhealthy) SharePoint server? What URI is only available, or what response payload is only given if the SharePoint server is up and running? Do you need to provide credentials to make a request?
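One concrete point about the existing send string: GET /\r\n carries no Host header, so it cannot target a particular host-named web application. A minimal sketch of building an HTTP/1.1 send string for one application follows. The function name is hypothetical, and the host and path are only examples taken from this thread, not a tested configuration.

```shell
#!/bin/bash
# Sketch: build an HTTP/1.1 monitor send string for one host-named web application.
# build_send_string is a hypothetical helper; host and path are illustrative.
build_send_string() {
    local host="$1" path="${2:-/}"
    # Emit literal \r\n escapes, the form the monitor's send string field uses.
    printf 'GET %s HTTP/1.1\\r\\nHost: %s\\r\\nConnection: Close\\r\\n\\r\\n' "$path" "$host"
}

build_send_string webapp1.contoso.com /Pages/Default.aspx
echo
```

Each string produced this way names exactly one host, so a monitor built from it checks exactly one web application.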

    2. How do we configure the HTTP monitor to improve the load balancing for SharePoint web applications? Please provide me the examples and also the way to test the monitor and make sure they are working as expected.

    Health monitoring and load balancing are tightly integrated, but they are still two different things. A load balancing decision is based on all sorts of available data, one input being the basic health and availability of a server as reported by monitoring. Using something better than Round Robin load balancing is your best first step. Least Connections and Observed methods are popular.
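The question also asked for a way to test a monitor. One option is to issue the same request by hand with curl from the BIG-IP command line and look at what each web application returns. The sketch below assumes the pool members listen on HTTPS and accept NTLM. The function name, IP address, and credentials are placeholders.

```shell
#!/bin/bash
# Sketch: print the HTTP status each web application returns from one pool member.
# probe_member is a hypothetical helper; hosts and credentials are placeholders.
probe_member() {
    local ip="$1" port="$2" creds="$3"; shift 3
    local host code
    for host in "$@"; do
        # -o /dev/null discards the body; -w prints only the status code.
        code=$(curl -s -k -o /dev/null -w '%{http_code}' \
            --ntlm -u "$creds" -H "Host: $host" "https://$ip:$port/")
        printf '%s %s\n' "$host" "${code:-000}"
    done
}

# Example, run against a real pool member:
# probe_member 10.0.0.11 443 'domain\user:password' webapp1.contoso.com webapp2.contoso.com
```

A status of 000 means no HTTP response was received at all, which usually points at connectivity rather than at the application.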

    3. Also noticed the latest iAPP templates for SharePoint http://support.f5.com/kb/en-us/solutions/public/15000/000/sol15043.html, does it have any specific improvements to monitoring?

    I don't believe the health monitors have specifically changed in the latest iApp, but given what I've already discussed, you're going to want to change it anyway.

  • mikeshimkus_111
    Historic F5 Account

    Adding a bit to Kevin's excellent response.

     

    The latest version of the SharePoint iApp will allow you to specify custom send and receive strings and select either Basic or NTLM authentication for the monitor. It also lets you choose an existing custom monitor (an EAV, for example).

     

    W/R/T monitors and load balancing, another option is to use F5's WMI monitor together with dynamic ratio load balancing:

     

    http://support.f5.com/kb/en-us/solutions/public/6000/900/sol6914.html

     

    We also have an iCall script that can dynamically adjust pool member ratios based on the self-reported health of the SharePoint servers:

     

    https://devcentral.f5.com/wiki/iCall.Prioritize_Sharepoint_Nodes_on_Reported_Health.ashx

     

    Thanks, Mike

     

  • Hi Kevin and Mike!

     

    Thank you so much for your response. I am a newbie to F5 and in the process of learning its awesomeness. To answer your questions:

     

    1. What kind of HTTP request will truly identify a healthy (or unhealthy) SharePoint server? What URI is only available, or what response payload is only given if the SharePoint server is up and running? Do you need to provide credentials to make a request?

       Bala: We consider an HTTP 200 OK response from the SharePoint applications to be healthy. Yes, we need to use NTLM to access the application pages. One thing I would like to understand about F5 HTTP monitoring: can it monitor all the web applications through a single GET string? In our case, five web applications are published through a single virtual IP on the F5. Is it possible, through the HTTP monitor, to check that all five are up and running before the LTM passes a request? Our goal is to send GET requests to the home pages of the different web applications (i.e. welcomePage.aspx, default.aspx and homepage.aspx) and make sure they are all healthy before passing user requests to the servers.

       

    2. Using something better than Round Robin load balancing is your best first step. Least Connections and Observed methods are popular.

       Bala: Yes, we are using Least Connections (member) for that rule.

       

    3. The iCall script looks great, but I need some details on how to implement it in conjunction with the SharePoint iApp. I also noticed in that article that the BIG-IP needs to be on 11.4 or higher to use iCall. Ours is currently on BIG-IP 11.3.0 Build 3117.0 Hotfix HF5, and I am not sure it's possible to upgrade the BIG-IP at this stage.

       

    We do have WMI-based software for monitoring the websites in our environment. The last time we had an outage on the SharePoint application (app pool down), the F5 was still passing requests to the servers :( without even marking the node down.

     

    I really appreciate your help!

     

    Regards

     

    Bala

     

  • We consider an HTTP 200 OK response from the SharePoint applications to be healthy. Yes, we need to use NTLM to access the application pages. One thing I would like to understand about F5 HTTP monitoring: can it monitor all the web applications through a single GET string? In our case, five web applications are published through a single virtual IP on the F5. Is it possible, through the HTTP monitor, to check that all five are up and running before the LTM passes a request? Our goal is to send GET requests to the home pages of the different web applications (i.e. welcomePage.aspx, default.aspx and homepage.aspx) and make sure they are all healthy before passing user requests to the servers.

     

    Not within a single HTTP monitor. I believe you'd necessarily need to use an external monitor for something like this. So, assuming you need all of the applications to be healthy to mark the server up, here's an example Bash script:

     

    #!/bin/sh
    # These arguments are supplied automatically for all external pingers:
    #   $1 = IP (::ffff:nnn.nnn.nnn.nnn notation or hostname)
    #   $2 = port (decimal, host byte order)

    # Kill any earlier instance of this monitor still running for the same member
    pidfile="/var/run/$MONITOR_NAME.$1..$2.pid"
    if [ -f "$pidfile" ]
    then
        kill -9 -`cat "$pidfile"` > /dev/null 2>&1
    fi
    echo "$$" > "$pidfile"

    # Remove the IPv6/IPv4 compatibility prefix
    node_ip=`echo "$1" | sed 's/::ffff://'`

    # res stays 1 only if every check below finds its expected string
    res=1

    creds='domain\user:password'

    curl -fNs -k -u "$creds" --ntlm "https://$node_ip:$2/app1" | grep -i "app1-blah" > /dev/null 2>&1
    if [ $? -ne 0 ]; then res=0; fi

    curl -fNs -k -u "$creds" --ntlm "https://$node_ip:$2/test" | grep -i "test-blah" > /dev/null 2>&1
    if [ $? -ne 0 ]; then res=0; fi

    curl -fNs -k -u "$creds" --ntlm "https://$node_ip:$2/blah" | grep -i "blah-blah" > /dev/null 2>&1
    if [ $? -ne 0 ]; then res=0; fi

    curl -fNs -k -u "$creds" --ntlm "https://$node_ip:$2/bar" | grep -i "bar-blah" > /dev/null 2>&1
    if [ $? -ne 0 ]; then res=0; fi

    curl -fNs -k -u "$creds" --ntlm "https://$node_ip:$2/foo" | grep -i "foo-blah" > /dev/null 2>&1
    if [ $? -ne 0 ]; then res=0; fi

    if [ $res -eq 1 ]
    then
        # Remove pidfile before script echoes anything to stdout and is killed by bigd
        rm -f "$pidfile"
        echo "up"
    fi

    rm -f "$pidfile"
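Because the five web applications in this thread are separated by host name rather than by path, a variation on the script above may fit better: send a Host header per application and drive the checks from a small table. This is only a sketch under that assumption. The function name, paths, and marker strings are placeholders, and the pid-file housekeeping shown above is left out for brevity.

```shell
#!/bin/bash
# Sketch of a table-driven external monitor. Every host, path, and marker
# below is a placeholder to replace with values from your own farm.
CHECKS='webapp1.contoso.com|/Pages/Default.aspx|marker-one
webapp2.contoso.com|/welcomePage.aspx|marker-two
webapp3.contoso.com|/homepage.aspx|marker-three'

# Succeeds only if every host/path pair returns a body containing its marker.
all_apps_healthy() {
    local ip="$1" port="$2" creds="$3" host path marker
    while IFS='|' read -r host path marker; do
        [ -n "$host" ] || continue
        # </dev/null keeps curl from touching the table being read by the loop.
        curl -fs -k --max-time 5 --ntlm -u "$creds" \
            -H "Host: $host" "https://$ip:$port$path" </dev/null \
            | grep -qi -- "$marker" || return 1
    done <<EOF
$CHECKS
EOF
}

# bigd passes the node address as $1 and the port as $2.
if [ $# -ge 2 ]; then
    node_ip=${1#::ffff:}
    # Any output on stdout marks the member up; silence marks it down.
    all_apps_healthy "$node_ip" "$2" 'domain\user:password' && echo "up"
fi
```

The loop stops at the first failing application, so one broken web application takes the whole member out of rotation. That matches the "all applications must be healthy" assumption stated above.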
    

    The iCall script looks great, but I need some details on how to implement it in conjunction with the SharePoint iApp. I also noticed in that article that the BIG-IP needs to be on 11.4 or higher to use iCall. Ours is currently on BIG-IP 11.3.0 Build 3117.0 Hotfix HF5, and I am not sure it's possible to upgrade the BIG-IP at this stage.

     

    You can still do iCall-like functions with a user_alert.conf configuration, but in this case you need something to run periodically, which user_alert wouldn't do natively. The iCall script in Mike's response is basically TMSH scripting, so you could still technically run the code as an external monitor (with some modification).

     

    We do have WMI-based software for monitoring the websites in our environment. The last time we had an outage on the SharePoint application (app pool down), the F5 was still passing requests to the servers :( without even marking the node down.

     

    Understand that in many cases an application can fail while the web server itself is still alive and well. I've even seen instances where the application failed and the web server still reported 200 OK. This is why, especially in complex environments like this, you should dive deeper into the application for monitoring. Don't just look for a 200 OK, but something that would only exist if the application was truly working.
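As a small illustration of that last point, a check can require both the status code and a piece of content that only the working application would render. The function name and marker text below are made up for the example.

```shell
#!/bin/bash
# Sketch: healthy only when the status is 200 AND the body contains text
# that the working application alone would render. The marker is a placeholder.
is_healthy() {
    local status="$1" body="$2" marker="$3"
    [ "$status" = "200" ] || return 1
    printf '%s' "$body" | grep -qiF -- "$marker"
}

# A 200 that carries an error page is still treated as a failure:
is_healthy 200 '<html>Server Error in Application</html>' 'Team Site Home' \
    || echo "down: 200 OK but marker missing"
```

Choosing the marker is the part that needs knowledge of the application: it should be text that disappears when the app pool or its database is unavailable.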