Forum Discussion
mtobkes_64700
May 12, 2010 (Nimbostratus)
Rate-Limiting Crawlers
Hi, I found this iRule here that limits requests to 1 request per n seconds. How can I allow n requests per 1 second instead, e.g. 5 requests per second?
wh...
mtobkes_64700
May 13, 2010 (Nimbostratus)
I've modified the iRule I found to limit crawlers. I want to allow ::max_req_count requests per ::min_interval seconds, but I'm getting a TCL error in my logs. Can someone help me figure out what the problem is? The error I'm getting is:
TCL error: googlebot_rate-limit_vb5 HTTP_REQUEST - invalid command name ::active_crawlersmozilla/4.0 compatible msie 7.0 windows nt 5.1 gtb6.4 .net clr 1.1.4322 .net clr 2.0.50727 .net clr 3.0.4506.2152 .net clr 3.5.30729 while executing ::active_crawlers$user_agent $curr_time
when RULE_INIT {
    array set ::active_crawlers { }
    # min_interval is the minimum amount of seconds
    set ::min_interval 10
    # max_req_count variable is the maximum amount of request per min_interval
    set ::max_req_count 3
    set ::rate_limit_message "You've been rate limited for sending more than $::max_req_count request every $::min_interval seconds."
}
when HTTP_REQUEST {
    set user_agent [string tolower [HTTP::header "User-Agent"]]
    # remove below log when we go to production
    log local0. "user agent is $user_agent"
    if { [matchclass $user_agent contains $::Crawlers] } {
        # Throttle crawlers.
        # remove below log when we go to production
        log local0. "user agent matches $user_agent"
        set curr_time [clock seconds]
        if { [info exists ::active_crawlers($user_agent)] } {
            # remove below log when we go to production
            log local0. "passed active Crawlers"
            if { [ ::active_crawlers($user_agent) < $curr_time ] } {
                set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
                set reqcount 1
                # remove below log when we go to production
                log local0. "passed set active crawlers"
            } else {
                if { [$reqcount > $::max_req_count] } {
                    # allow 10 request then block
                    HTTP::respond 503 content $::rate_limit_message
                    # log when crawler hits more than 10 requests and block it
                    log local0. "Rate Limit Has Reached $::max_req_count Requests Per $min_interval for $user_agent"
                } else {
                    # reqcount keeps track of request
                    set reqcount [expr {$reqcount + 1}]
                }
            }
        } else {
            set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
            set reqcount 1
        }
    }
}
Thanks
m
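A possible reading of the error, with a sketch of a fix: in Tcl, square brackets perform command substitution, so `[ ::active_crawlers($user_agent) < $curr_time ]` tries to run a command named `::active_crawlers(<user agent>)`, which matches the "invalid command name" message. The condition inside `if { ... }` is already an expression, so the variable needs a leading `$` and no brackets; `[$reqcount > $::max_req_count]` has the same problem. Two further issues would probably surface next: `reqcount` is a local variable, and iRule local variables are scoped to a single connection, so the count would not carry over between connections; and the log line reads `$min_interval` without the `::` prefix. The sketch below keeps the original global-array approach and stores the count in a second array, `::crawler_req_count`, which is a hypothetical name not in the original post. It has not been tested on a BIG-IP and assumes the `Crawlers` data group from the original rule exists.

```tcl
when RULE_INIT {
    # End time of the current window, keyed by User-Agent.
    array set ::active_crawlers { }
    # Requests seen in the current window, keyed by User-Agent.
    # (Hypothetical array name, introduced for this sketch.)
    array set ::crawler_req_count { }
    # Length of one rate-limit window, in seconds.
    set ::min_interval 10
    # Maximum number of requests allowed per window.
    set ::max_req_count 3
    set ::rate_limit_message "You've been rate limited for sending more than $::max_req_count requests every $::min_interval seconds."
}
when HTTP_REQUEST {
    set user_agent [string tolower [HTTP::header "User-Agent"]]
    if { [matchclass $user_agent contains $::Crawlers] } {
        set curr_time [clock seconds]
        if { ![info exists ::active_crawlers($user_agent)] || $::active_crawlers($user_agent) < $curr_time } {
            # First request from this crawler, or the previous window expired:
            # open a new window and count this request as the first one.
            set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
            set ::crawler_req_count($user_agent) 1
        } elseif { $::crawler_req_count($user_agent) >= $::max_req_count } {
            # Window still open and the allowance is used up: reject.
            HTTP::respond 503 content $::rate_limit_message
            log local0. "Rate limit of $::max_req_count requests per $::min_interval seconds reached for $user_agent"
        } else {
            # Window still open and under the limit: count the request.
            incr ::crawler_req_count($user_agent)
        }
    }
}
```

With these values, the first three requests in any 10-second window pass and later ones receive a 503 until the window expires. Setting `::min_interval` to 1 and `::max_req_count` to 5 would give roughly the "5 requests per 1 second" behavior asked about in the first post. Because the arrays are keyed on the User-Agent string, every client sending the same string shares one counter.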