
Forum Discussion

gnico
Nov 08, 2022

iRule to block requests from amazonaws.com

Hello,

I have an iRule to block requests from the amazonaws.com bad crawlers (millions of requests a day), but my iRule doesn't work: total executions is 0.

Here is the code:
when HTTP_REQUEST  {
    if { [matchclass [string tolower [HTTP::header Host]] contains blacklist_host] } {
        reject
    }
}


In my data group blacklist_host, I have an amazonaws.com entry.
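
For reference, a string data group with that entry would look something like this in tmsh (a sketch based on the names in the post):

ltm data-group internal blacklist_host {
    records {
        amazonaws.com { }
    }
    type string
}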

If someone has a solution, thank you.

4 Replies

  • The issue here is that amazonaws.com is not the Host value. The amazonaws.com bot is making a request to your site, so the Host value is still your HTTP Host. To find a crawler bot, you'd want to use the User-Agent header.

    when HTTP_REQUEST  {
        if { [matchclass [string tolower [HTTP::header User-Agent]] contains blacklist_host] } {
            reject
        }
    }
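
    As a side note, newer TMOS versions favor the class command over the older matchclass; an equivalent check would look like this (a sketch, reusing the blacklist_host data group name from the post):

    when HTTP_REQUEST {
        if { [class match [string tolower [HTTP::header User-Agent]] contains blacklist_host] } {
            reject
        }
    }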

    But then it may also be useful to consider using a robots.txt file: https://developers.google.com/search/docs/crawling-indexing/robots/intro, which you could host directly from an iRule:

    when HTTP_REQUEST {
      if { [HTTP::uri] == "/robots.txt" } {
        HTTP::respond 200 content [ifile get robots.txt]
      }
    }
  • gnico

    Thank you for your reply.

    What I want is a property like Apache's remote_host. In my Apache logs, I have millions of hits from remote_host ec2-xx-xxx-xxx-xx.eu-west-3.compute.amazonaws.com.

    They are malicious bots with a classic User-Agent like Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78, coming from a lot of different IPs.

    So the only solution I found is to block them by their remote host. I don't want to use Apache rules; I want to block them before Apache.

    Is there a way?

    Thank you

    • Hi gnico,
      If you receive millions of hits and you have an Advanced WAF license on your F5 BIG-IP appliance, I think configuring Bot Defense would be a good workaround.
      Please check this video:
      https://www.youtube.com/watch?v=zSw4boZmNBA

      Then monitor traffic in the ASM event logs, and keep an eye on your CPU and memory as well.

    • Kevin_Stewart (Employee)

      Apache remote_host is essentially a reverse DNS lookup. You can do this in an iRule:

      1. Create a resolver object:

      list net dns-resolver my-resolver 
      net dns-resolver my-resolver {
          forward-zones {
              . {
                  nameservers {
                      10.1.20.1:domain { }
                  }
              }
          }
          route-domain 0
      }

      2. Create an iRule that uses the resolver object.

      Ref: https://clouddocs.f5.com/api/irules/RESOLVER__name_lookup.html

      proc resolv_ptr_v4 { addr_v4 } {
          set ret [scan $addr_v4 {%d.%d.%d.%d} a b c d]
          if { $ret != 4 } {
              return
          }
          set ret [RESOLVER::name_lookup "/Common/my-resolver" "$d.$c.$b.$a.in-addr.arpa" PTR]
          set ret [lindex [DNSMSG::section $ret answer] 0]
          if { $ret eq "" } {
              return
          }
          return [lindex $ret end]
      }
      when CLIENT_ACCEPTED {
          set result [call resolv_ptr_v4 [IP::client_addr]]
          log local0. $result
          ## put your data group search here
      }
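
      The data group search placeholder above could be filled in with a class match on the resolved name, for example (a sketch, reusing the blacklist_host data group from the original post; ends_with matches the domain suffix):

      when CLIENT_ACCEPTED {
          set result [call resolv_ptr_v4 [IP::client_addr]]
          if { $result ne "" && [class match [string tolower $result] ends_with blacklist_host] } {
              reject
          }
      }

      Note that a reverse DNS lookup on every new connection adds latency, so caching the result per client IP may be worth considering.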