Mike_62629
Jul 16, 2008 (Nimbostratus)
Rate limiting Search Spiders
We're currently having problems with web spiders beating up our web servers, using up available sessions in our application, and consuming a lot of our bandwidth. We're interested i...
hooleylist
Jan 24, 2012 (Cirrostratus)
I think you can tell Google and Bing to crawl your sites at a slower rate:
http://www.bing.com/community/site_blogs/b/webmaster/archive/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot.aspx
In the robots.txt file, within the generic user agent section, add the crawl-delay directive as shown in the example below:
User-agent: *
Crawl-delay: 1
http://googlewebmastercentral.blogspot.com/2008/12/more-control-of-googlebots-crawl-rate.html
We've upgraded the crawl rate setting in Webmaster Tools so that webmasters experiencing problems with Googlebot can now provide us more specific information. Crawl rate for your site determines the time used by Googlebot to crawl your site on each visit.
If those options don't work for you, it might be better to assign a rate class to search engine spiders rather than sending back a 503. That should add less overhead on LTM and let crawls finish faster overall. Of course, I'm not an SEO expert, so this is something you might want to research before using it.
You could use a list of spider user-agents like this to identify spiders:
http://www.useragentstring.com/pages/Crawlerlist/
You could check the User-Agent header either with a switch statement or by putting the header tokens in a data group and using the class command to do the lookup. Once you identify a spider, you could assign a rate class:
http://devcentral.f5.com/wiki/iRules.switch.ashx
http://devcentral.f5.com/wiki/iRules.class.ashx
http://devcentral.f5.com/wiki/iRules.rateclass.ashx
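As a rough, untested sketch of the approach described above, an iRule along these lines could combine both lookup styles. The rate class name spider_rate and the data group name spider_agents are hypothetical and would need to be created on the BIG-IP first; the crawler tokens are only examples, and the class match syntax assumes v10 or later. In practice you would probably keep only one of the two options.

```tcl
# Sketch only: "spider_rate" (rate class) and "spider_agents" (string data
# group) are hypothetical names that must already exist on the BIG-IP.
when HTTP_REQUEST {
    # Lower-case the header once so the matching is case-insensitive.
    set ua [string tolower [HTTP::header "User-Agent"]]

    # Option 1: match a few example crawler tokens inline with switch.
    switch -glob -- $ua {
        "*googlebot*" -
        "*msnbot*" -
        "*bingbot*" -
        "*slurp*" {
            rateclass spider_rate
            return
        }
    }

    # Option 2: look the tokens up in a data group (v10+ syntax).
    # Entries in the data group should be stored in lower case.
    if { [class match $ua contains spider_agents] } {
        rateclass spider_rate
    }
}
```

The switch version is easier to read for a handful of tokens, while the data group can be updated without editing the iRule as the list of spiders grows.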
Aaron