Forum Discussion
How to hide URLs (virtual servers) that are exposed to the Internet from search engines using robots.txt?
As the other post alludes to, a robots.txt file is purely advisory. Most of the major search engines do honor it, but they certainly don't have to. The contents of the robots.txt file, assuming you want to block all crawlers, are pretty straightforward:
User-agent: *
Disallow: /
This tells all robots to go away.
So to generate that with an iRule, you might do something like this:
when HTTP_REQUEST {
    # Intercept requests for /robots.txt and answer directly from the BIG-IP
    if { [string tolower [HTTP::uri]] equals "/robots.txt" } {
        HTTP::respond 200 content "User-agent: *\nDisallow: /"
    }
}
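If you want crawlers to see the response explicitly labelled as plain text, HTTP::respond can also set response headers. A minimal variant of the same iRule (the Content-Type header is the only addition) might look like this:
when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] equals "/robots.txt" } {
        # Same response as above, but with an explicit Content-Type
        HTTP::respond 200 content "User-agent: *\nDisallow: /" "Content-Type" "text/plain"
    }
}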
Something that is a little more forceful, and potentially more dangerous, is to key on the requesting client's User-Agent header. The list of potential crawlers could get large, so I'd probably keep those in a string-based data group (a tmsh sketch for creating one follows the list). Example:
Crawler data group (e.g. my_robots_dg)
bingbot
msnbot
exabot
googlebot
slurp
** Reference: http://user-agent-string.info/list-of-ua/bots
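On the BIG-IP side, one way to create that data group is from the command line with tmsh; a rough sketch (the group name my_robots_dg and the records simply mirror the list above, and exact syntax can vary a bit by TMOS version) would be:
tmsh create ltm data-group internal my_robots_dg type string records add { bingbot { } msnbot { } exabot { } googlebot { } slurp { } }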
And then an iRule like this:
when HTTP_REQUEST {
    # Silently drop any request whose User-Agent contains an entry from the data group
    if { [class match [string tolower [HTTP::header User-Agent]] contains my_robots_dg] } {
        drop
    }
}
Again, this approach is more exhaustive but potentially dangerous: if one of the bot names in the data group is wrong, or a legitimate browser happens to send a User-Agent that matches one of the entries, that client gets dropped too.
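If a silent drop feels too risky, one option is to respond with an explicit 403 instead, so a legitimate client caught by a bad data-group entry at least sees an error page rather than a hung connection. A sketch along those lines, using the same hypothetical my_robots_dg data group:
when HTTP_REQUEST {
    if { [class match [string tolower [HTTP::header User-Agent]] contains my_robots_dg] } {
        # Visible rejection instead of a silent drop
        HTTP::respond 403 content "Forbidden"
    }
}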