Forum Discussion

JamesS_40157 (Nimbostratus)
Dec 16, 2011

ASM search engine configuration query

Hi all,

We are currently in the process of setting up the ASM for production use, and unfortunately I'm finding the documentation on search engine configuration quite lacking.

Search engine crawlers/bots are very important to our site, but it is also important for us to block malicious data scrapers. We have a list of bots/crawlers we want to allow (by user agent). However, I'm struggling to understand how to translate this into a search engine list on the ASM.

I understand the "domain" part of the configuration is to do with reverse lookups on the IP addresses of suspected crawlers, however what happens if a particular crawler doesn't have reverse DNS configured?
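For context, domain-based crawler verification is usually done as a forward-confirmed reverse DNS check: resolve the client IP to a hostname (PTR record), check that the hostname ends in an expected domain, then resolve that hostname forward and confirm it maps back to the same IP. The Python sketch below shows that generic technique only; whether ASM's "domain" field performs exactly these steps is an assumption, and the suffix list is illustrative. In this sketch, an IP with no reverse DNS simply fails verification.

```python
import socket
from typing import Callable, Iterable, List

# Illustrative suffixes only; check each search engine's own verification docs.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def _default_reverse(ip: str) -> str:
    # PTR lookup; raises socket.herror (an OSError) when no reverse DNS exists.
    return socket.gethostbyaddr(ip)[0]


def _default_forward(hostname: str) -> List[str]:
    # IPv4 A-record lookup; returns the list of addresses for the hostname.
    return socket.gethostbyname_ex(hostname)[2]


def verify_crawler_ip(
    ip: str,
    suffixes: Iterable[str] = ALLOWED_SUFFIXES,
    reverse_lookup: Callable[[str], str] = _default_reverse,
    forward_lookup: Callable[[str], List[str]] = _default_forward,
) -> bool:
    """Forward-confirmed reverse DNS: PTR lookup, suffix check, A lookup back to ip.

    The lookups are injectable so the logic can be exercised without network access.
    """
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        # No PTR record (reverse DNS not configured): cannot verify by domain.
        return False
    hostname = hostname.rstrip(".").lower()
    # Leading dots in the suffixes stop "evilgoogle.com" matching ".google.com".
    if not any(hostname.endswith(suffix) for suffix in suffixes):
        return False
    try:
        forward_ips = forward_lookup(hostname)
    except OSError:
        return False
    # Whoever controls an IP block controls its PTR records, so the forward
    # lookup must map back to the original IP for the check to mean anything.
    return ip in forward_ips
```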

Also, what should the "name" part of the search engine entry relate to? Is it a string match against the user agent the crawler provides? And must both the user agent and the domain name match, or is one or the other sufficient?

Thanks in advance

4 Replies

  • James,

    You are right, there does not appear to be much in the way of configuration for search engines, although I am not sure there needs to be. If you have your site mapping done correctly and are using the right Attack Signatures, crawlers/bots from the legitimate search engines should get through, and the Attack Signatures should take care of blocking malicious crawlers/bots. If you want to be doubly sure you are not blocking one of the UA strings that needs to be allowed, you can send the logs to a remote syslog server and have a script there look for the relevant UA strings.

    Really, once you set up proper site mapping in your policy, any legitimate traffic should make it to your website just fine.

  • Thanks for the response Mike.

    My concern is that they would be blocked by the "web scraping" portion of the ASM. As I understand it, the web scraping logic sends JavaScript and cookie challenges, looking to see whether the client is moving the mouse, pressing keys, etc. I would have thought that even good bots would not be able to respond correctly to these challenges. And even if they can, I'd imagine only the big players (Google, Yahoo, Bing, etc.) would bother implementing that. Smaller engines may not...?
  • Well, actually, depending on which version of ASM you are running, you may want to disable Web Scraping detection. There is an XSS vulnerability in that feature in versions 10.1.0 through 10.2.2.

    If you are running a version where it is safe to leave this enabled, another way to work around this might be to write an ASM iRule that matches the UA string against a safe list of UA strings you define as a data list, and then bypasses Web Scraping protection for those requests. I am definitely not an iRule guru, so you will want to hunt through the wiki for more information on how you might do that. Be careful with something like this, though, as it would be easy to spoof the UA string.
  • Thanks Mike. I agree that just looking for a UA string might be risky, as it is easily spoofed; we may have to push back on our SEO team if there is no easy way to define this. I'll also try raising a formal information request with F5 to see exactly what those search engine fields mean.

    Also, we're running v11.1, so we shouldn't be vulnerable to the XSS issue you mention.
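The log-review script and UA safe list suggested in the replies could start out roughly like the Python sketch below. The client_ip="..." / user_agent="..." log format and the safe-list entries are hypothetical placeholders; adapt the regexes to whatever your remote logging profile actually emits. Because the UA header is trivially spoofed, a hit here only tells you that a blocked request claimed to be a safe-listed crawler; the source IP still needs independent verification (for example, a forward-confirmed reverse DNS check) before it is trusted.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Hypothetical safe list of crawler UA substrings (the same entries an iRule
# data list might hold); matching is case-insensitive.
SAFE_UA_SUBSTRINGS = ("googlebot", "bingbot", "slurp")

# Hypothetical log format: each blocked-request line carries key="value" pairs
# such as client_ip="192.0.2.10" user_agent="...". Adjust to your real format.
UA_RE = re.compile(r'user_agent="([^"]*)"')
IP_RE = re.compile(r'client_ip="([^"]*)"')


def matched_safe_ua(user_agent: str, safe_list: Iterable[str] = SAFE_UA_SUBSTRINGS) -> Optional[str]:
    """Return the first safe-list entry contained in the UA, else None."""
    ua = user_agent.lower()
    for entry in safe_list:
        if entry.lower() in ua:
            return entry
    return None


def scan_blocked_log(lines: Iterable[str], safe_list: Iterable[str] = SAFE_UA_SUBSTRINGS) -> Counter:
    """Count blocked requests per (safe-list entry, client IP) for manual review."""
    safe_list = tuple(safe_list)  # allow reuse across lines even if a generator is passed
    hits: Counter = Counter()
    for line in lines:
        ua_match = UA_RE.search(line)
        if not ua_match:
            continue  # line has no UA field in the assumed format
        entry = matched_safe_ua(ua_match.group(1), safe_list)
        if entry is None:
            continue  # not a safe-listed crawler; nothing to review
        ip_match = IP_RE.search(line)
        hits[(entry, ip_match.group(1) if ip_match else "unknown")] += 1
    return hits
```

Feeding it a day of blocked-request lines gives a count per (safe-list entry, client IP) pair, which is a reasonable short list to run through IP verification before deciding whether the policy is catching real crawlers.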