Forum Discussion
How can I see if Google is hitting us?
We are experiencing many problems with Google hitting us really hard. Our Tomcat log indicates this, and it is almost crashing our ecommerce website. Is there a way that I can check F5 to see when we are being hit by Google, and then block that IP address? I am looking at the Event Application Request logs, and do not even see that we are being hit by Google, nor does the web scraping indicate that we are being hit. I appreciate any suggestions, ideas or experience. Many thanks, Dianna
8 Replies
- StephanManthey
Nacreous
AFAIK, Googlebot can be identified by its User-Agent header.
But that header can be faked. And sometimes Googlebot does not identify itself, to make sure you are not tuning your site for Google.
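Since the User-Agent can be faked, one common cross-check is a reverse DNS lookup on the source IP followed by a forward lookup on the returned hostname. Below is a minimal Python sketch of that check, meant to be run offline against IPs pulled from the logs, not on the BIG-IP itself. The function name, the injectable `reverse`/`forward` parameters, and the suffix list are my own assumptions for illustration; please confirm the current list of crawler domains against Google's documentation. It only handles IPv4.

```python
import socket

# Assumption: hostname suffixes used by Google's crawlers.
# Verify against Google's current documentation before relying on this.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Return True if `ip` reverse-resolves to a Google hostname
    that forward-resolves back to the same IP (IPv4 only)."""
    # The resolvers are injectable so the logic can be tested without DNS.
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        # Step 1: the PTR record must be under a Google crawler domain.
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the original IP.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError.
        return False
```

If an IP claims to be Googlebot in its User-Agent but fails this check, it is probably a scraper, and blocking it should not affect how Google sees the site.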
How to proceed?
The first step might be an analysis of your Tomcat logs to figure out what these requests have in common. Perhaps it's always the same source IP, the same user agent, the same path, or something else.
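As a rough illustration of that kind of log analysis, here is a small Python sketch that counts requests per source IP, user agent, and path. It assumes the Tomcat access log uses the "combined" pattern (host, ident, user, time, request line, status, bytes, referer, user agent); if your AccessLogValve pattern differs, the regular expression needs adjusting. The function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Assumption: Tomcat AccessLogValve "combined" pattern, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines, top=10):
    """Count requests by source IP, user agent, and path (query string stripped)."""
    ips, agents, paths = Counter(), Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ips[match.group("ip")] += 1
        agents[match.group("agent")] += 1
        paths[match.group("path").split("?", 1)[0]] += 1
    return {
        "ip": ips.most_common(top),
        "agent": agents.most_common(top),
        "path": paths.most_common(top),
    }
```

To use it, open the access log file and pass the file object to `summarize`, then print each of the three lists. Whatever shows up at the top of all three is usually a good candidate for the criteria in a rate-limiting rule.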
Based on this, an iRule can be applied to limit the number of requests that match specific criteria. There are some table-based sample iRules for request rate limiting on DevCentral.
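To make the idea concrete, here is the counting logic behind that kind of rate limiting, sketched in Python. This is not an iRule and not one of the DevCentral samples; it only illustrates a fixed-window counter per key, which is roughly what a table-based rule keeps track of. The class name, the `throttle_key` helper, and the limit and window values are all hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock      # injectable so the logic can be tested
        self.counters = {}      # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:
            # The previous window has expired, so start a new one.
            start, count = now, 0
        count += 1
        self.counters[key] = (start, count)
        return count <= self.limit


def throttle_key(user_agent, client_ip):
    """Return a counter key for requests that should be throttled, else None."""
    if "googlebot" in user_agent.lower():
        return "googlebot:" + client_ip
    return None
```

A request whose key is not `None` and for which `allow` returns `False` would be the one to reject or delay; everything else passes through untouched, so normal customers are not affected.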
Just dropping all Google requests may have some negative business impact ...
- Dianna_129659
Nimbostratus
Hi Stephan. Thank you for this thoughtful reply. We recognize the negative impact of blocking Google, but the webstore truly is crashing. I did not know that I could use iRules to limit Google. That sounds like a good potential solution. Thank you!
- Leonardo_39231
Nimbostratus
Here is a good link for implementing iRules for bots.
https://devcentral.f5.com/wiki/iRules.Controlling-Bots.ashx
- Dianna_129659
Nimbostratus
Hi Leonardo. Many thanks for this good resource!
- hoolio
Cirrostratus
You could try rate limiting Google requests based on the user-agent header value, but I think as Steve Iveson pointed out on the codeshare example, it may affect your Google site ranking.
It might be better to send a 503 response from the iRule when you want to block access to a search engine spider:
http://googlewebmastercentral.blogspot.com/2011/01/how-to-deal-with-planned-site-downtime.html
Outages that are not clearly marked as such can negatively affect a site’s reputation. ... it’s better to return a 503 HTTP result code (Service Unavailable) which tells search engine crawlers that the downtime is temporary.
Aaron
- Dianna_129659
Nimbostratus
Thank you, Aaron. There is much to consider, with the website eCommerce remaining open to our customers being the most important. Still, we don't want to cause negative google ratings, etc. I appreciate the knowledge being shared. Many thanks, Dianna
- Hopefully this works out for you. But when things calm down, I would re-examine what is going on, because I really doubt Google is in the business of crashing sites.
- Dianna_129659
Nimbostratus
Yesterday someone posted information about slowing down google instead of completely blocking with an iRule. I thought there was sample code to throttle user-agents. Do you know where I can find that sample code, please?