Forum Discussion
nrelihan_68143 (Nimbostratus)
Jul 21, 2011
How to omit bots & crawlers from an iRule?
Hey!
I've written a geolocation iRule that redirects users to their regional websites if they try to go to a domain website, www.test.com.
So, for instance, if a visitor from the UK goes to www.test.com, they are redirected to www.test.co.uk.
Instead of using a DNS lookup to find a visitor's region, I use their IP address to do so, which is a new F5 feature.
So here's what I want to do:
It’s a requirement that when visitors come from outside a region covered by the iRule (the IP address cannot determine where to direct them), the visitor must be prompted to select a region. The region selector page will be like a splash page on the internet. Bots and crawlers are not to be redirected and will not see the splash page.
The first part of policy 3 should be OK, as I just need to redirect the non-regional visitors to a region selector page. However, is there a way to exempt bots and crawlers from the iRule?
Thanks for any help.
Neil
- nrelihan_68143 (Nimbostratus): OK, this is like something I'm looking for:
- The_Bhattman (Nimbostratus): Hi Neil,
- nrelihan_68143 (Nimbostratus): Hey Bhattman,
- Michael_Yates (Nimbostratus): Data Groups and Classes are interchangeable (which can get confusing until you get used to it).
- nrelihan_68143 (Nimbostratus): Thanks for the help Michael,
- Michael_Yates (Nimbostratus): Looks right.
- Joel_Moses (Nimbostratus): I'm almost afraid to post this one for fear of being laughed off of here, but you mentioned that you wanted to use the content from user-agents.org to seed the user-agent determination. There's actually another service called useragentstring.com that offers a simple user agent lookup service. I wrote this iRule a long while back to play with HTTP::retry but never did anything with it. It _does_ work, but it's a piece of... work.
when RULE_INIT {
    set static::browser_id_cookie_name "x_browser"
    set static::browser_id_header_name "X-Browser-Characteristics"
}

when CLIENT_ACCEPTED {
    # Set initial session variables to allow us to track where we are in the
    # validation of the incoming user-agent string.
    set do_lookup 1
    set browsercookie 0
    set content_collected 0
    set done_retrying 0
}

when HTTP_REQUEST {
    # Save the original pool and hostname. We will need these later to rebuild
    # our original request.
    set original_pool [LB::server pool]
    set original_host [HTTP::host]

    # First check to see if we:
    #   1. Do NOT have a cookie set that indicates we've been through validation before.
    #   2. Are not in a user-agent validation retry.
    #   3. Have no HTTP content collected from a previous run.
    #   4. Are just inside a CLIENT_ACCEPTED event where we are expected to do a lookup.
    # If these conditions are satisfied, we look up the user-agent validation
    # service IP address, save the current request, then construct an outgoing
    # HTTP request to the service IP. To do this, we use the existing request
    # as a base, but use HTTP::header sanitize to strip out most data, then put
    # back in the headers needed to connect and close.
    if { (! $done_retrying) && (! $content_collected) && ($do_lookup) && (! [HTTP::cookie exists $static::browser_id_cookie_name]) } {
        set original_request [HTTP::request]
        set uas_lookup_node [RESOLV::lookup @4.2.2.2 "www.useragentstring.com"]
        node $uas_lookup_node 80
        HTTP::uri "/?uas=[URI::encode [HTTP::header User-Agent]]&getText=all"
        HTTP::header sanitize "Accept-Encoding Connection Cookie Keep-Alive"
        HTTP::header replace Host "www.useragentstring.com"
        HTTP::header insert Connection "close"
    } elseif { ([HTTP::cookie exists $static::browser_id_cookie_name]) } {
        # If we have a browser cookie set, we forgo the outgoing lookup and
        # keep marching.
        set browsercookie 1
        set do_lookup 0
    }
}

when HTTP_RESPONSE {
    # If we're in a user-agent lookup loop, have no content currently
    # collected, and have not yet sent our HTTP::retry, then collect the HTTP
    # response from the UA lookup service.
    if { ($do_lookup) && (! $content_collected) && (! $done_retrying) } {
        if { [HTTP::header exists Content-Length] && ([HTTP::header Content-Length] < 2048) } {
            set con_length [HTTP::header Content-Length]
        } else {
            set con_length 2048
        }
        HTTP::collect $con_length
        set content_collected 1
    }

    # If the current connection has a characteristics array and doesn't have a
    # user-agent type that's a browser, then deny access. This array is either
    # seeded directly from the UA verification response or is derived from the
    # cookies we set. This will block ALL but browser clients -- including
    # robots, crawlers, etc.
    if { ([array exists browser_characteristics]) && ($browser_characteristics(agent_type) ne "Browser") } {
        HTTP::respond 403 content "<html><head><title>Not Allowed</title></head><body><h1>403 - Not allowed</h1>Your browser type is not allowed here.</body></html>"
    }

    # If we have a characteristics array but no cookie, formulate one and place
    # it in the outgoing response. This example iRule inserts the common things
    # derived from the UA check in multiple cookies named after their
    # datafields. It doesn't have to do this, but might be helpful because the
    # web application can get the benefit of the learned information. If you
    # don't need this or like it, then comment out the foreach loop.
    if { (! $browsercookie) && ([array exists browser_characteristics]) } {
        HTTP::cookie insert name ${static::browser_id_cookie_name} value "1" domain .$original_host path /
        # Iterate over the array keys only, so a value can never be mistaken
        # for a field name.
        foreach item [array names browser_characteristics] {
            switch $item {
                "os_type" -
                "agent_type" -
                "agent_name" -
                "agent_version" -
                "os_name" -
                "agent_language" {
                    HTTP::cookie insert name ${static::browser_id_cookie_name}_$item value "$browser_characteristics($item)" domain .$original_host path /
                }
            }
        }
        set browsercookie 1
    }
}

when HTTP_RESPONSE_DATA {
    # Determine if the response we just got was from the user-agent service.
    # If it was, then we're going to parse it into an array and then replay
    # the original HTTP request to the original pool.
    if { ($do_lookup) && ($content_collected) && (! $done_retrying) } {
        if { [HTTP::payload] contains "agent_type" } {
            set parse_payload "[split [string replace [lindex [split [HTTP::payload] "\n"] 1] end end] ";"]"
            set browser_array_list ""
            foreach record $parse_payload {
                if { ($record ne "") } {
                    set record [split $record "="]
                    set rtype [lindex $record 0]
                    set rvalue [lindex $record 1]
                    if { ($rvalue ne "") && ([string tolower $rvalue] ne "null") } {
                        set browser_array_list "$browser_array_list{$rtype} {$rvalue} "
                    }
                }
            }
            array set browser_characteristics $browser_array_list
        }
        HTTP::payload replace 0 [HTTP::payload length] ""
        set do_lookup 0
        set content_collected 0
        pool $original_pool
        HTTP::retry $original_request
        set done_retrying 1
    }
}
- Joel_Moses (Nimbostratus): Naturally, the code is a bit mangled by the editor. :>
- hooleylist (Cirrostratus): Try clicking edit and save to have the ampersands re-rendered correctly.
- nrelihan_68143 (Nimbostratus): That's an impressive piece of code, Joel, but I don't think I'll be implementing it this time. Perhaps when I become more experienced with iRules, I'll look into doing something similar. Thanks anyway. It's a pretty good idea though!
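For anyone landing on this thread later, the simpler data-group idea mentioned earlier (a list of known bot user agents checked before any geolocation redirect) might look roughly like the sketch below. It is only an illustration under several assumptions, not the rule Neil ended up using: the data group name `bot_user_agents` and the `/select-region` path are hypothetical, the country list is cut down to the UK example from the original post, and it assumes a BIG-IP version where `class match` and `whereis` are available.

```tcl
when HTTP_REQUEST {
    # Only act on the main site (hostname taken from the original post).
    if { [string tolower [HTTP::host]] ne "www.test.com" } { return }

    # Exempt bots and crawlers: skip every redirect when the User-Agent
    # contains any entry of the data group. "bot_user_agents" is a
    # hypothetical string data group holding lowercase fragments such as
    # "googlebot" or "bingbot".
    if { [class match [string tolower [HTTP::header "User-Agent"]] contains bot_user_agents] } {
        return
    }

    # Look up the visitor's country code from the client IP address.
    switch [whereis [IP::client_addr] country] {
        "GB" {
            HTTP::redirect "http://www.test.co.uk[HTTP::uri]"
        }
        default {
            # Region not covered: send the visitor to the region selector.
            # "/select-region" is a hypothetical path; the check below
            # avoids redirecting the selector page to itself.
            if { !([HTTP::path] starts_with "/select-region") } {
                HTTP::redirect "http://www.test.com/select-region"
            }
        }
    }
}
```

Checking the user agent first means crawlers never reach the geolocation branch at all, so they see www.test.com directly and are never shown the splash page. User-Agent strings are easy to spoof, so this only handles well-behaved bots.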