Forum Discussion
How to hide URLs (Virtual Servers) that are exposed to the Internet from search engines using robots.txt?
Hi,
Kindly guide us on this: we want to hide our virtual server, which is exposed to the Internet, from search engines using robots.txt.
I was also going through the link below, in which Kevin provided guidance:
https://devcentral.f5.com/questions/irules-and-robotstxt-question
Kindly assist so that we can understand the relationship between an iRule and robots.txt with respect to a virtual server or URL that is exposed to the Internet, where the requirement is to hide it from search engines.
Thanks and Regards Parveez
5 Replies
- Kevin_Stewart
Employee
As the other post alludes, a robots.txt file is purely advisory. Most of the major search engines do honor it, but they certainly don't have to. The contents of the robots.txt file, assuming you want to block all crawlers, are pretty straightforward:
User-agent: *
Disallow: /

This tells all robots to go away.
So to generate that with an iRule, you might do something like this:
when HTTP_REQUEST {
    # Serve a blanket "deny all crawlers" robots.txt directly from the BIG-IP
    if { [string tolower [HTTP::uri]] equals "/robots.txt" } {
        HTTP::respond 200 content "User-agent: *\nDisallow: /"
    }
}

Something that is a little more forceful, and potentially more dangerous, is to match on the requesting client's User-Agent header. The list of potential crawlers could get large, so I'd probably include those in a string-based data group. Example:
Crawler data group (ex. my_robots_dg)
bingbot
msnbot
exabot
googlebot
slurp

** Reference: http://user-agent-string.info/list-of-ua/bots
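A minimal sketch of building that data group from the command line, assuming you have tmsh access and keep the hypothetical name my_robots_dg (adjust the bot list to suit your environment):

# Create an internal string data group holding lowercase bot name fragments
tmsh create ltm data-group internal my_robots_dg type string records add { bingbot { } msnbot { } exabot { } googlebot { } slurp { } }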
And then an iRule like this:
when HTTP_REQUEST {
    # Drop the request if the (lowercased) User-Agent contains any bot name from the data group
    if { [class match [string tolower [HTTP::header User-Agent]] contains my_robots_dg] } {
        drop
    }
}

Again, this approach is a bit more exhaustive and potentially dangerous if you don't get one of the bot names right in the data group, or a legitimate browser client sends this string.
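A quick way to sanity-check the User-Agent approach, assuming the virtual server answers at a reachable address (the hostname below is only a placeholder), is to send one request with a crawler User-Agent and one with a normal browser string:

# Simulated crawler: with the drop iRule in place this should hang or reset rather than return a page
curl -A "Googlebot" http://vs.example.com/

# Normal browser User-Agent: should still receive the application response
curl -A "Mozilla/5.0" http://vs.example.com/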
- Parveez_70209
Nimbostratus
Hi Kevin,
Thank you for the guidance. If we attach this iRule to the virtual server, does the server/application team also need to include or configure anything related to robots.txt?
Thanks and Regards Parveez
- Kevin_Stewart
Employee
In the first example, the robots.txt content is included in the iRule itself and covers ALL robots.
- Parveez_70209
Nimbostratus
Yes, we are planning to test this with the iRule below, which covers all robots:
when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] equals "/robots.txt" } {
        HTTP::respond 200 content "User-agent: *\nDisallow: /"
    }
}
So there is nothing specific the server/application team needs to do to achieve this objective, right?
Thanks and Regards Parveez
- Kevin_Stewart
Employee
Correct. If the crawler makes a request for "/robots.txt", the iRule will serve it. Nothing else needs to be done.
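If you want to verify it yourself before a crawler does, a simple check from any client (again, the hostname is a placeholder for your virtual server) looks like this:

# Should return the blanket-deny content served by the iRule
curl http://vs.example.com/robots.txt
# Expected body:
#   User-agent: *
#   Disallow: /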