Forum Discussion

dean132_137579 (Nimbostratus)
Mar 07, 2014

F5 LTM and SharePoint

Hi there, I am hoping someone can help me.

 

I have created a virtual server using the SharePoint 2010 template for our SharePoint 2010 farm. We are only load balancing HTTP and HTTPS (a separate virtual server for each). Browsing to the site works fine for the most part, but every 6 hours or so the website becomes unavailable and we have to restart the application pool on the SharePoint servers.

 

I know what you are thinking: that is an application issue (I thought the same myself). But if I move the SharePoint front-end servers off the F5 VLAN, the problem does not occur at all.

 

I should point out that the F5 LTMs are the gateway for the subnet/VLAN that the front-end servers reside on (the server-side VLAN). I should also point out that the application server and the database servers are on separate VLANs, so traffic between the two passes through the F5 via an "AllowEverythingThrough" virtual server configured with loose initiation and loose close, as explained in this document: http://packetpushers.net/stateless-routing-f5-ltm

 

My theory is that something on the F5 is keeping one or more connections to the SharePoint servers alive, and that this is what forces the app pool recycle, but I am struggling to prove it. I really believed the issue was with the SharePoint servers, but if I move them away from the F5 the issue does not occur.
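The netstat check suggested in the first reply can be turned into a quick test of this theory: capture `netstat -an` on a front-end server while it is failing and tally TCP connections by state and by remote host. A minimal sketch, assuming the Windows `netstat -an` column layout (Proto, Local Address, Foreign Address, State); `summarize_netstat` and the sample addresses are hypothetical. If the F5 is SNATting, all of its connections show the same remote address (the SNAT address), so a large and growing count against that one host would support the theory.

```python
from collections import Counter


def summarize_netstat(text: str) -> tuple[Counter, Counter]:
    """Tally TCP connections by state and by remote host.

    Assumes Windows-style `netstat -an` rows:
        Proto  Local Address  Foreign Address  State
    """
    by_state: Counter = Counter()
    by_remote: Counter = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines and UDP rows (which have no State column).
        if len(fields) != 4 or fields[0] != "TCP":
            continue
        _proto, _local, foreign, state = fields
        by_state[state] += 1
        if state != "LISTENING":
            # Drop the port; rsplit keeps bracketed IPv6 hosts such as [::1] intact.
            by_remote[foreign.rsplit(":", 1)[0]] += 1
    return by_state, by_remote


if __name__ == "__main__":
    # Hypothetical sample; in practice save real output with `netstat -an > ns.txt`
    # and pass the file contents in instead.
    sample = """
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:80             0.0.0.0:0              LISTENING
  TCP    10.1.1.10:80           10.1.1.1:51234         ESTABLISHED
  TCP    10.1.1.10:80           10.1.1.1:51235         TIME_WAIT
  TCP    10.1.1.10:52000        10.2.2.20:1433         ESTABLISHED
  UDP    0.0.0.0:500            *:*
"""
    states, remotes = summarize_netstat(sample)
    print("By state:", dict(states))
    print("By remote host:", remotes.most_common(10))
```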

 

Any help or ideas would be greatly appreciated.

 

Thanks in advance

 

8 Replies

  • Have you tried running netstat on the server when it has problems? If the issue is a lack of local ports, you should see tens of thousands of connections there. That could be a start.
  • Sorry, I forgot to reply. Thanks very much for the reply; I did try that. We get around 5,000 unique hits a day, and the SharePoint/IIS settings have a timeout of 30 minutes, so in theory that should free up any ports. I will check again the next time it goes down. Any ideas what I can do if there is a lack of local ports, given that it would be the F5 keeping the ports open?
  • Port exhaustion would depend on the number of unique source IP addresses, so it could happen if you're SNATting traffic with a single auto-map self-IP. Do you have a monitor on the pools? What kind of failure are you seeing: no access to SharePoint at all, or limited access, as if there were no access to the database? If the former, when it fails, open a shell on the F5 and see if you can reach the server (ping, curl).
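Whether a single SNAT address could run out of ports can be estimated with Little's law: the average number of concurrent connections equals the rate of new connections multiplied by how long each one stays open. A rough sketch, assuming all connections go to one destination IP and port and that about 64,000 source ports are usable per SNAT address (an assumed figure; the real range depends on the platform); the function names and the traffic numbers in the example are hypothetical.

```python
import math

# Assumption: usable source ports per SNAT address toward one destination
# IP:port. The real figure depends on the platform's configured port range.
ASSUMED_PORTS_PER_SNAT_ADDRESS = 64_000


def concurrent_connections(new_conns_per_sec: float, avg_lifetime_sec: float) -> float:
    """Little's law: average concurrency = arrival rate x time each connection is held."""
    return new_conns_per_sec * avg_lifetime_sec


def snat_addresses_needed(new_conns_per_sec: float, avg_lifetime_sec: float,
                          ports_per_address: int = ASSUMED_PORTS_PER_SNAT_ADDRESS) -> int:
    """Smallest number of SNAT addresses whose combined port pool covers the load."""
    load = concurrent_connections(new_conns_per_sec, avg_lifetime_sec)
    return max(1, math.ceil(load / ports_per_address))


if __name__ == "__main__":
    # Hypothetical: 50 new connections/s, each held open for 1,800 s
    # (for example, if idle connections linger for a 30-minute timeout).
    print(f"~{concurrent_connections(50, 1800):,.0f} concurrent connections")
    print("SNAT addresses needed:", snat_addresses_needed(50, 1800))
```

The design point is that connection lifetime matters as much as request volume: halving how long idle connections linger halves the number of ports in use.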

     

  • Brainstorming here: have you checked the IIS logs at the time to see whether requests are registered at all when the problems occur? Have you checked the event log? This is a tough one, but putting the site into maintenance mode, allowing only your IP, and capturing the traffic in Wireshark would also help. You could then compare a working scenario with a non-working one. Test the server with curl as Kevin suggests, and check the IIS logs. Good luck! /Patrik
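Comparing a working period with a failing one, as suggested in this reply, is easier if the IIS logs are summarized per hour. A minimal sketch, assuming the W3C log format (space-separated values, with column order given by the `#Fields:` directive) and that the `date`, `time`, `sc-status` and `time-taken` fields are enabled; `summarize_iis_log` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize_iis_log(lines):
    """Per-hour status-code counts and mean time-taken (ms) from a W3C IIS log.

    Column positions come from the '#Fields:' directive. Assumes the date,
    time, sc-status and time-taken fields are enabled for the site.
    """
    fields = []
    statuses = defaultdict(Counter)
    timings = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue  # blank lines, other directives, or data before a #Fields line
        entry = dict(zip(fields, line.split()))
        hour = f"{entry['date']} {entry['time'][:2]}:00"
        statuses[hour][entry["sc-status"]] += 1
        timings[hour].append(int(entry["time-taken"]))
    return {h: (dict(statuses[h]), mean(timings[h])) for h in sorted(statuses)}


if __name__ == "__main__":
    import sys

    # Usage (file name is only an example): python summarize_iis.py u_ex140307.log
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            for hour, (codes, avg_ms) in summarize_iis_log(log).items():
                print(hour, codes, f"avg {avg_ms:.0f} ms")
```

IIS writes W3C timestamps in UTC by default, so convert the local failure time before comparing hours.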
  • JG (Cumulonimbus)

    Do you have a firewall between the F5 and the SP servers?

     

  • Apologies for the delay in getting back to everyone. I had to agree a time with the business to let it fail, and it takes 9 hours to reach the tipping point.

     

    So,

     

    When the issue occurs, I can ping the SQL server from the front-end servers, I can RDP to the front-end servers, and I can see entries in the IIS logs for my failed connection attempts.

     

    Hope this helps?

     

  • Okay, so based on what we now know (please correct me if I'm wrong):

     

    1. You don't have OneConnect enabled, so it's likely not a keep-alive/persistent-connection issue.

       

    2. When it fails, you can actually reach SharePoint, but the page doesn't load completely. This suggests there is no loss of network connectivity, at least between the F5 and the SharePoint server.

       

    3. A restart of the app pool clears it up. Perhaps I'm splitting hairs here, but do you mean a complete service restart or recycling an app pool? They are subtly different: recycling just creates a new worker process and swaps out the old one, whereas a service reset destroys all of the existing state, connections, and threads.

       

    4. When it fails, you can ping the SQL server. Does the ping also go through the F5 gateway?

       

    5. When it fails, you can RDP to the SharePoint server and see IIS log entries for the failed attempts. Are there any other clues in the logs about what is going on? Have you considered running a set of performance monitors to observe the app pools/IIS throughout the day and see whether memory or thread count is steadily increasing? Does the Windows event log give you any additional detail? How about the SQL server itself: does it log anything?
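For the performance-monitor idea in point 5, a least-squares line through the samples shows whether a counter such as thread count or private bytes climbs steadily after each recycle, and roughly when it would hit a chosen limit. A minimal sketch, assuming the counter samples have already been exported as (hours since recycle, value) pairs; the function names, the sample values and the limit of 400 are hypothetical, not an IIS threshold.

```python
def linear_trend(samples):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until(samples, threshold):
    """Hours since recycle at which the fitted line reaches threshold (None if flat or falling)."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


if __name__ == "__main__":
    # Hypothetical worker-process thread counts sampled every 2 hours after a recycle.
    threads = [(0, 80), (2, 140), (4, 205), (6, 260)]
    slope, _ = linear_trend(threads)
    print(f"growing by ~{slope:.1f} threads/hour")
    print(f"hypothetical limit of 400 reached after ~{hours_until(threads, 400):.1f} hours")
```

A steady slope that extrapolates to roughly the 6 to 9 hour failure window described in this thread would be worth investigating as a resource leak in the worker process.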

       

    We almost have to assume that it's not so much a connectivity issue between the F5 and the web server, because you never technically lose that, but perhaps a lack of connectivity to something SharePoint needs in order to function. At the point of failure, SharePoint is either struggling to keep up with its current threads or unable to reach some external resource that it needs. Let's assume for the moment that it is a connectivity issue, in which case several connections are potentially suspect:

     

    1. Client to F5 (web)
    2. F5 to SharePoint (web)
    3. SharePoint to F5 gateway (to SQL server)
    4. F5 gateway to SQL server

    While it's working, run tcpdump on each of these connections so that you know what normal traffic looks like. Then, when it fails, capture again to see if any patterns emerge. For example, if we suspect SQL connectivity problems, you might see SharePoint trying to reach SQL around the proxy, failing to get through the proxy, or SQL not returning responses. Are there any other off-box dependencies for this SharePoint instance?
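When comparing the captures suggested above, one quick check for "SQL not returning responses" is to look for connection attempts (SYN) that never receive a SYN-ACK. A minimal sketch that reads tcpdump's default text output for IPv4 TCP, for example from `tcpdump -nn -r capture.pcap`, assuming the usual `IP src.port > dst.port: Flags [...]` line layout; `unanswered_syns` and the addresses and port 1433 in the example are hypothetical.

```python
import re

# Matches tcpdump's default IPv4 TCP line layout when run with -nn, e.g.
# "10:00:00.000000 IP 10.1.1.10.52000 > 10.2.2.20.1433: Flags [S], ..."
LINE_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]+)\]"
)


def unanswered_syns(lines):
    """Return {(client, cport, server, sport): syn_count} for SYNs with no SYN-ACK."""
    pending = {}
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        src, sport, dst, dport, flags = m.groups()
        if flags == "S":
            key = (src, int(sport), dst, int(dport))
            pending[key] = pending.get(key, 0) + 1  # retransmitted SYNs count up
        elif flags == "S.":
            # A SYN-ACK travels server -> client, so look up the reversed tuple.
            pending.pop((dst, int(dport), src, int(sport)), None)
    return pending


if __name__ == "__main__":
    import sys

    # Example: tcpdump -nn -r sql.pcap > sql.txt, then: python unanswered_syns.py sql.txt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as capture:
            for (client, cport, server, sport), syns in unanswered_syns(capture).items():
                print(f"{client}:{cport} -> {server}:{sport}  {syns} SYN(s), no SYN-ACK")
```

Repeated SYNs for the same tuple with no answer point at the path or the server not responding; a reset in reply would appear with different flags and is not counted here.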

     

  • Any chance of an update to this post? Was the issue resolved, and if so, what was the cause?