Forum Discussion
nrelihan_68143
Jul 21, 2011 (Nimbostratus)
How to omit bots & crawlers from an iRule?
Hey!
I've written a geolocation iRule that redirects users to their regional websites if they try to go to the main domain website, www.test.com.
So, for instance, if a visitor from the UK goes...
Joel_Moses
Aug 03, 2011 (Nimbostratus)
I'm almost afraid to post this one for fear of being laughed off of here, but you mentioned that you wanted to use the content from user-agents.org to seed the user-agent determination. There's actually another service called useragentstring.com that offers a simple user agent lookup service. I wrote this iRule a long while back to play with HTTP::retry but never did anything with it. It _does_ work, but it's a piece of... work.
Fun to look at though.
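As a rough illustration of the lookup request itself, here is a minimal Python sketch (Python only so it can run outside a BIG-IP) of the path and headers the iRule below builds. The host name and the `uas`/`getText=all` query layout are copied from the iRule; whether useragentstring.com still answers in this format is an assumption, and `quote(..., safe="")` is only assumed to approximate what `URI::encode` does. The function names are hypothetical.

```python
from urllib.parse import quote

# Host and query layout taken from the iRule in this thread; the service's
# current behaviour is an assumption.
UAS_HOST = "www.useragentstring.com"


def build_lookup_uri(user_agent: str) -> str:
    """Return the request path the iRule sends to the UA lookup service."""
    # safe="" percent-encodes every reserved character, including "/",
    # which is assumed to be close to URI::encode for this purpose.
    return "/?uas=" + quote(user_agent, safe="") + "&getText=all"


def build_lookup_request(user_agent: str) -> str:
    """Return a minimal HTTP/1.1 GET for the lookup that asks the server to close.

    Unlike the iRule, this sketch does not carry over the Accept-Encoding,
    Cookie, or Keep-Alive headers that HTTP::header sanitize keeps.
    """
    return (
        f"GET {build_lookup_uri(user_agent)} HTTP/1.1\r\n"
        f"Host: {UAS_HOST}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
```

For example, `build_lookup_uri("Mozilla/5.0 (X11; Linux x86_64)")` returns `/?uas=Mozilla%2F5.0%20%28X11%3B%20Linux%20x86_64%29&getText=all`.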
It takes an incoming web request, formulates an HTTP request to the UA lookup service, gets the response, and puts it into both cookies and an in-memory array (yeah, an array -- old TCL habits die hard -- if I had to do it again, I'd use tables). It also allows through only clients whose user-agents are recognized as a "Browser" type.
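The allow/deny decision and the cookies it produces can be sketched in a few lines of Python. The cookie name `x_browser` and the list of copied fields come from the iRule; treating a missing `agent_type` as "not a browser" is this sketch's own assumption, and the sample values such as `"Crawler"` are hypothetical, not confirmed output of the lookup service.

```python
# Cookie name and copied fields are taken from the iRule's RULE_INIT and
# switch statement.
COOKIE_NAME = "x_browser"
COOKIE_FIELDS = (
    "os_type", "agent_type", "agent_name",
    "agent_version", "os_name", "agent_language",
)


def is_allowed(characteristics: dict) -> bool:
    """Only an agent_type of exactly "Browser" is let through.

    Assumption: a missing agent_type is treated as not a browser.
    """
    return characteristics.get("agent_type") == "Browser"


def cookies_for(characteristics: dict) -> dict:
    """Return the cookie name/value pairs the iRule would set."""
    cookies = {COOKIE_NAME: "1"}  # marker cookie: "already validated"
    for field in COOKIE_FIELDS:
        if field in characteristics:
            cookies[f"{COOKIE_NAME}_{field}"] = characteristics[field]
    return cookies
```

Fields outside the allowlist are simply not copied into cookies, which matches the iRule's `switch` having no default branch.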
Enjoy!
when RULE_INIT {
    set static::browser_id_cookie_name "x_browser"
    set static::browser_id_header_name "X-Browser-Characteristics"
}
when CLIENT_ACCEPTED {
    # Set initial per-connection variables so we can track where we are
    # in the validation of the incoming user-agent string.
    set do_lookup 1
    set browsercookie 0
    set content_collected 0
    set done_retrying 0
}
when HTTP_REQUEST {
    # Save the original pool and hostname. We will need these later to
    # rebuild the original request.
    set original_pool [LB::server pool]
    set original_host [HTTP::host]

    # First check that we:
    #   1. Do NOT have a cookie indicating we've been through validation before.
    #   2. Are not in a user-agent validation retry.
    #   3. Have no HTTP content collected from a previous run.
    #   4. Still have do_lookup set (it starts at 1 in CLIENT_ACCEPTED), so a lookup is expected.
    # If these conditions are satisfied, we look up the IP address of the user-agent
    # validation service, save the current request, then rewrite the outgoing request
    # so it goes to the service IP instead.
    # To do this, we use the existing request as a base, use HTTP::header sanitize to
    # strip out most headers, then put back the headers needed to connect and close.
    if { (! $done_retrying) && (! $content_collected) && ($do_lookup) && (! [HTTP::cookie exists $static::browser_id_cookie_name]) } {
        set original_request [HTTP::request]
        # RESOLV::lookup can return several addresses; use the first one.
        set uas_lookup_node [lindex [RESOLV::lookup @4.2.2.2 "www.useragentstring.com"] 0]
        node $uas_lookup_node 80
        HTTP::uri "/?uas=[URI::encode [HTTP::header User-Agent]]&getText=all"
        HTTP::header sanitize "Accept-Encoding Connection Cookie Keep-Alive"
        HTTP::header replace Host "www.useragentstring.com"
        # sanitize keeps the Connection header, so replace it rather than
        # inserting a second one.
        HTTP::header replace Connection "close"
        # If we have a browser cookie set, we forgo the outgoing lookup and keep marching.
    } elseif { [HTTP::cookie exists $static::browser_id_cookie_name] } {
        set browsercookie 1
        set do_lookup 0
    }
}
when HTTP_RESPONSE {
    # If we're in a user-agent lookup loop, have no content currently collected, and
    # have not yet sent our HTTP::retry, then collect the HTTP response from the
    # UA lookup service (at most 2048 bytes).
    if { ($do_lookup) && (! $content_collected) && (! $done_retrying) } {
        if { [HTTP::header exists Content-Length] && ([HTTP::header Content-Length] < 2048) } {
            set con_length [HTTP::header Content-Length]
        } else {
            set con_length 2048
        }
        HTTP::collect $con_length
        set content_collected 1
    }

    # If the current connection has a characteristics array and its user-agent type
    # is missing or is not "Browser", deny access. The array is seeded from the UA
    # verification response in HTTP_RESPONSE_DATA; it is not rebuilt from the cookies,
    # so a client that already holds the cookie skips this check on a new connection.
    # This blocks ALL but browser clients, including robots, crawlers, etc.
    if { [array exists browser_characteristics] && (! [info exists browser_characteristics(agent_type)] || $browser_characteristics(agent_type) ne "Browser") } {
        HTTP::respond 403 content "<html><head><title>Not Allowed</title></head><body><h1>403 - Not allowed</h1><p>Your browser type is not allowed here.</p></body></html>"
        return
    }

    # If we have a characteristics array but no cookie, create one and place it in
    # the outgoing response.
    # This example iRule also inserts the common fields derived from the UA check as
    # separate cookies named after their data fields. It doesn't have to do this, but
    # it may be helpful because the web application then benefits from the learned
    # information. If you don't need or like this, comment out the foreach loop.
    if { (! $browsercookie) && ([array exists browser_characteristics]) } {
        HTTP::cookie insert name ${static::browser_id_cookie_name} value "1" domain .$original_host path /
        # Iterate over the field names only ([array get] would also return the values).
        foreach item [array names browser_characteristics] {
            switch $item {
                "os_type" -
                "agent_type" -
                "agent_name" -
                "agent_version" -
                "os_name" -
                "agent_language" {
                    HTTP::cookie insert name ${static::browser_id_cookie_name}_$item value "$browser_characteristics($item)" domain .$original_host path /
                }
            }
        }
        set browsercookie 1
    }
}
when HTTP_RESPONSE_DATA {
    # Determine whether the response we just got came from the user-agent service.
    # If it did, parse it into an array and then replay the original HTTP request
    # to the original pool.
    if { ($do_lookup) && ($content_collected) && (! $done_retrying) } {
        if { [HTTP::payload] contains "agent_type" } {
            # Take the second line of the payload, strip its last character, and
            # split it into "field=value" records on ";".
            set parse_payload [split [string replace [lindex [split [HTTP::payload] "\n"] 1] end end] ";"]
            set browser_array_list [list]
            foreach record $parse_payload {
                if { $record ne "" } {
                    set record [split $record "="]
                    set rtype [lindex $record 0]
                    set rvalue [lindex $record 1]
                    # Skip empty and "null" values.
                    if { ($rvalue ne "") && ([string tolower $rvalue] ne "null") } {
                        lappend browser_array_list $rtype $rvalue
                    }
                }
            }
            array set browser_characteristics $browser_array_list
        }
        # Discard the lookup response body, then replay the saved request.
        HTTP::payload replace 0 [HTTP::payload length] ""
        set do_lookup 0
        set content_collected 0
        pool $original_pool
        HTTP::retry $original_request
        set done_retrying 1
    }
}
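To see what the collection and parsing steps in HTTP_RESPONSE and HTTP_RESPONSE_DATA actually do to the lookup response, here is a Python sketch that mirrors them line for line: the 2048-byte collect cap, then "take line 2, drop its last character, split on `;` and `=`, skip empty and `null` values". The response layout is only inferred from the iRule's parsing, and the sample payload in the usage note is hypothetical, not real useragentstring.com output. Function names are illustrative.

```python
MAX_COLLECT = 2048  # the iRule never collects more than 2048 bytes


def collect_length(content_length):
    """Mirror the HTTP::collect sizing: Content-Length if below 2048, else 2048."""
    if content_length is not None and content_length < MAX_COLLECT:
        return content_length
    return MAX_COLLECT


def parse_uas_payload(payload: str) -> dict:
    """Parse a lookup response the way the iRule's HTTP_RESPONSE_DATA does."""
    if "agent_type" not in payload:
        return {}  # the iRule builds no array in this case
    lines = payload.split("\n")
    # Second line with its final character removed ("string replace ... end end").
    line = lines[1][:-1] if len(lines) > 1 else ""
    result = {}
    for record in line.split(";"):
        if not record:
            continue
        parts = record.split("=")
        rtype = parts[0]
        rvalue = parts[1] if len(parts) > 1 else ""
        # Like the iRule, ignore empty and "null" values; later keys overwrite earlier ones.
        if rvalue and rvalue.lower() != "null":
            result[rtype] = rvalue
    return result
```

With the hypothetical payload `"header\nagent_type=Browser;agent_name=Firefox;agent_version=5.0;os_name=null;\n"`, `parse_uas_payload` returns `{"agent_type": "Browser", "agent_name": "Firefox", "agent_version": "5.0"}`: the trailing `;` is the character stripped, and `os_name` is dropped because its value is `null`.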