When security experts talk about IT attacks, the first point covered is normally the scale of destruction.  What harm does a specific attack do to networks, to servers, to applications?

We look at the attack’s capabilities in terms of which service or application will be rendered unavailable or compromised in some way.  A whole army of security technologies - firewalls, IPS/IDS, anti-virus, AAA, SSL - stands ready to bat down these attacks.

There are also attack scenarios where no permanent or visible damage is caused - other than to business models.  Web scraping is very widespread, and essentially involves automatically extracting information from web servers.  Search engines use web scraping.  This isn’t bad per se - everyone wants to appear in a Google search, so few people will complain about it.

Other, more insidious forms of web scraping exist.  Intellectual property is a common target, often to benefit a particular business model that in itself is perfectly legal and above board.  There are many, many organisations that - for example - trawl airline and travel agent flight destinations and costs to present ‘the lowest cost flight to Ibiza with XYZ Airline’.

Of course, if you are one of these travel agents or airlines, and you object to your information being skimmed in this way, the law offers recourse, and there are a few examples where organisations (Ryanair, for one) have successfully brought an action against web scraping.  But many organisations are put off this course of action by the potential expense, the time involved, and a far-from-certain outcome.

Better, then, for the more litigation-averse, to treat web scraping just as you would more directly damaging attack scenarios - and defend against it technically.  Those that seek to scrape your information use scanners or bots to get the information they want.

If you have an application firewall as part of your defences, you may already have the functionality in place to recognise and block web scraping tools and attempts.  You might even be able, as you can with F5’s offering, to be extremely selective: allowlisting IP addresses so that search engines can do their job while other web scraping attempts are blocked.
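
To make the idea of selective blocking more concrete, here is a minimal Python sketch of the kind of decision such a defence might make for each request.  It is not how F5’s product or any other application firewall is implemented; the allowlisted network, the User-Agent hints, the thresholds and the function names are all assumptions chosen purely for illustration.

```python
import ipaddress
import time
from collections import defaultdict, deque

# Hypothetical allowlist. 192.0.2.0/24 is a documentation-only range (RFC 5737)
# standing in for the networks of search engine crawlers you want to admit.
ALLOWLIST = [ipaddress.ip_network("192.0.2.0/24")]

# Hypothetical User-Agent fragments typical of off-the-shelf scraping tools.
SCRAPER_AGENT_HINTS = ("python-requests", "curl", "scrapy")

# Assumed thresholds: more than MAX_REQUESTS within WINDOW_SECONDS looks automated.
MAX_REQUESTS = 5
WINDOW_SECONDS = 10.0

# Timestamps of recent requests, kept per client IP address.
_recent_hits = defaultdict(deque)


def is_allowlisted(ip):
    """Return True if the address falls inside any allowlisted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in ALLOWLIST)


def should_block(ip, user_agent, now=None):
    """Decide whether a single request looks like a scraping attempt."""
    # Allowlisted clients (e.g. search engines) are never blocked.
    if is_allowlisted(ip):
        return False

    # Block clients that announce themselves as a known scraping tool.
    agent = user_agent.lower()
    if any(hint in agent for hint in SCRAPER_AGENT_HINTS):
        return True

    # Otherwise apply a sliding-window rate check per IP address.
    now = time.monotonic() if now is None else now
    hits = _recent_hits[ip]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()
    return len(hits) > MAX_REQUESTS
```

A real deployment would need more than this: User-Agent strings are trivially spoofed and scrapers can spread requests across many addresses, which is why the recognition is better left to a dedicated device.  The sketch only shows the order of the checks - allowlist first, then signature, then request rate.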

Published Jun 09, 2011
Version 1.0
