Forum Discussion
Kevin_Stewart
Nov 08, 2022 · Employee
The issue here is that amazonaws.com is not the Host value. The bot running on amazonaws.com is making a request to your site, so the Host header still contains your site's hostname. To identify a crawler bot, you'd want to inspect the User-Agent header instead:
when HTTP_REQUEST {
    # Compare the lowercased User-Agent against the blacklist_host string data group
    if { [class match [string tolower [HTTP::header User-Agent]] contains blacklist_host] } {
        reject
    }
}
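For that rule to work, the blacklist_host string data group has to exist on the BIG-IP. A minimal tmsh sketch for creating it, assuming you want to match User-Agent substrings (the entries here are just examples, not a recommended blocklist):

# Create a string data group of User-Agent substrings to reject
# (the name blacklist_host matches the iRule above)
tmsh create ltm data-group internal blacklist_host type string records add { amazonbot { } petalbot { } }

Because the iRule lowercases the User-Agent before matching, keep the data group entries lowercase as well.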
It may also be worth serving a robots.txt file (https://developers.google.com/search/docs/crawling-indexing/robots/intro) to ask well-behaved crawlers to stay away, which you could host directly from an iRule:
when HTTP_REQUEST {
    if { [HTTP::uri] eq "/robots.txt" } {
        # Serve the robots.txt iFile directly from the BIG-IP,
        # with an explicit Content-Type and no Server header
        HTTP::respond 200 content [ifile get robots.txt] noserver "Content-Type" "text/plain"
    }
}
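The [ifile get robots.txt] call assumes an iFile named robots.txt has already been imported. A tmsh sketch of that two-step import, assuming the file has been copied to /var/tmp/robots.txt on the BIG-IP (paths and names are illustrative):

# Import the file into the system file store
tmsh create sys file ifile robots.txt source-path file:/var/tmp/robots.txt
# Expose it to iRules under the name used by [ifile get robots.txt]
tmsh create ltm ifile robots.txt file-name robots.txt

The LTM iFile name is what the iRule references, so it must match the argument to ifile get exactly.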