Forum Discussion
mtobkes_64700
Nimbostratus
May 12, 2010
Rate-Limiting Crawlers
Hi, I found this iRule here that limits requests to 1 request per n seconds. I would like to know how to allow n requests per second instead, e.g. 5 requests per second.
when RULE_INIT {
    array set ::active_crawlers { }
    set ::min_interval 1
    set ::rate_limit_message "You've been rate limited for sending more than 1 request every $::min_interval seconds."
}
when HTTP_REQUEST {
    set user_agent [string tolower [HTTP::header "User-Agent"]]
    if { [matchclass $user_agent contains $::Crawlers] } {
        # Throttle crawlers.
        set curr_time [clock seconds]
        if { [info exists ::active_crawlers($user_agent)] } {
            if { [ $::active_crawlers($user_agent) < $curr_time ] } {
                set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
            } else {
                # block it somehow
                HTTP::respond 503 content $::rate_limit_message
            }
        } else {
            set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
        }
    }
}
Thanks,
myles
7 Replies
- JRahm
Admin
For cleaner, more accurate rate-limiting, check out the table command article series that covers this in depth:
http://devcentral.f5.com/Tutorials/TechTips/tabid/63/articleType/ArticleView/articleId/2391/categoryId/96/v101--iRules-rate-limiting-with-the-table-command.aspx
- mtobkes_64700
Nimbostratus
Thanks for the link. However, I'm only running v9.4.7. Can you tell me what options that leaves me?
Thanks again,
myles
- JRahm
Admin
Check this version of the DNS flood protection rule; the bones of the rate limiting are there:
http://devcentral.f5.com/wiki/default.aspx/iRules/DNS_Flood_Protection_v2.html
- mtobkes_64700
Nimbostratus
I've modified the iRule I found to limit crawlers. I want to allow ::max_req_count requests per ::min_interval seconds, but I am getting a TCL error in my logs. Can someone help me figure out what the problem is? The error I'm getting is:
TCL error: googlebot_rate-limit_vb5 HTTP_REQUEST - invalid command name ::active_crawlersmozilla/4.0 compatible msie 7.0 windows nt 5.1 gtb6.4 .net clr 1.1.4322 .net clr 2.0.50727 .net clr 3.0.4506.2152 .net clr 3.5.30729 while executing ::active_crawlers$user_agent $curr_time
when RULE_INIT {
    array set ::active_crawlers { }
    # min_interval is the minimum amount of seconds
    set ::min_interval 10
    # max_req_count variable is the maximum amount of request per min_interval
    set ::max_req_count 3
    set ::rate_limit_message "You've been rate limited for sending more than $::max_req_count request every $::min_interval seconds."
}
when HTTP_REQUEST {
    set user_agent [string tolower [HTTP::header "User-Agent"]]
    # remove below log when we go to production
    log local0. "user agent is $user_agent"
    if { [matchclass $user_agent contains $::Crawlers] } {
        # Throttle crawlers.
        # remove below log when we go to production
        log local0. "user agent matches $user_agent"
        set curr_time [clock seconds]
        if { [info exists ::active_crawlers($user_agent)] } {
            # remove below log when we go to production
            log local0. "passed active Crawlers"
            if { [ ::active_crawlers($user_agent) < $curr_time ] } {
                set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
                set reqcount 1
                # remove below log when we go to production
                log local0. "passed set active crawlers"
            } else {
                if { [$reqcount > $::max_req_count] } {
                    # allow 10 request then block
                    HTTP::respond 503 content $::rate_limit_message
                    # log when crawler hits more than 10 requests and block it
                    log local0. "Rate Limit Has Reached $::max_req_count Requests Per $min_interval for $user_agent"
                } else {
                    # reqcount keeps track of request
                    set reqcount [expr {$reqcount + 1}]
                }
            }
        } else {
            set ::active_crawlers($user_agent) [expr {$curr_time + $::min_interval}]
            set reqcount 1
        }
    }
}
Thanks
m
- hoolio
Cirrostratus
Hi myles,
Try changing this line:
if { [ ::active_crawlers($user_agent) < $curr_time ] } {
to
if { $::active_crawlers($user_agent) < $curr_time } {
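The reason for the change: inside an if condition, square brackets mean "run this as a command", while the condition itself is already evaluated as an expression. A minimal plain-Tcl sketch of the difference follows. It uses a local array instead of the iRule's ::active_crawlers, and the user agent string and timestamps are made-up example values.

```tcl
# Plain Tcl sketch (run with tclsh); all values below are illustrative assumptions.
array set active_crawlers {}
set user_agent "examplebot/1.0"
set active_crawlers($user_agent) 1273758282
set curr_time 1273758290

# Wrong: the brackets make Tcl treat the stored timestamp as a command name,
# with "<" and the current time as its arguments.
if { [catch { if { [ $active_crawlers($user_agent) < $curr_time ] } { } } err] } {
    puts "error: $err"
}

# Right: no brackets, so the comparison is evaluated as an expression.
if { $active_crawlers($user_agent) < $curr_time } {
    puts "window expired"
}
```

The first test should report an "invalid command name" error naming the stored timestamp, and the second should print "window expired". The same applies to any other condition in the rule that wraps a comparison in square brackets.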
Aaron
- mtobkes_64700
Nimbostratus
Thanks Aaron. I changed the line; however, I now get this TCL error in my logs:
TCL error: googlebot_rate-limit_vb5 HTTP_REQUEST - invalid command name 1273758282 while executing $::active_crawlers$user_agent $curr_time
- hoolio
Cirrostratus
Do you still have the parentheses around $user_agent and the less than sign in this line?
if { $::active_crawlers($user_agent) < $curr_time } {
Can you post a current copy of the iRule and the exact error message from /var/log/ltm?
Thanks, Aaron
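For the original goal of allowing n requests per interval, one point beyond the syntax errors: the posted rule keeps reqcount in a plain local variable, which appears not to be shared across a crawler's connections, so the count would likely need to live in a global array keyed by user agent, next to the window expiry. Here is a minimal plain-Tcl sketch of that fixed-window counter. The proc name allow_request and the arrays window_end and req_count are hypothetical names for illustration, not part of the posted iRule.

```tcl
# Hypothetical helper, plain Tcl (run with tclsh); not taken from the posted iRule.
# Returns 1 if the request is allowed, 0 if it should be blocked.
proc allow_request { key now max_req_count min_interval } {
    global window_end req_count
    if { ![info exists window_end($key)] || $window_end($key) <= $now } {
        # No window yet, or the previous one has expired: start a new window.
        set window_end($key) [expr {$now + $min_interval}]
        set req_count($key) 1
        return 1
    }
    # Still inside the current window: count this request.
    incr req_count($key)
    return [expr {$req_count($key) <= $max_req_count}]
}

# Usage: 3 requests allowed per 10-second window, with made-up timestamps.
foreach t {100 100 101 105 111} {
    puts "$t -> [allow_request examplebot $t 3 10]"
}
```

With these inputs the first three requests are allowed, the fourth (at 105) is blocked, and the request at 111 starts a new window. As far as I know, proc is not available in iRules on v9.4.x, so there the same logic would have to be written inline in HTTP_REQUEST, using ::-prefixed global arrays and HTTP::respond 503 for the blocked case.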