Forum Discussion
Dave_Burnett_20
Nimbostratus
Nov 10, 2008
How to allow Search Engine Robots/Slurps through ASM?
We have recently installed a pair of F5 6400s (v9.4.3) in front of our website with ASM in blocking mode.
We are seeing and blocking loads of Non-RFC compliant request violations. Examination of these violation entries reveals them to be predominantly Yahoo robots.
As RFC compliance checking is a standard feature of the ASM policy (which we have not changed in any way), I would have thought that anyone with an F5 using ASM will be blocking these robots, unless they have non-RFC blocking turned off.
Is this indeed the case? Are other users experiencing the same issues? Does anyone know how we can allow search engine robots access to our site through the ASM, as blocking them could harm our website's search ranking?
Would be grateful for any advice or pointers.
13 Replies
- AaronJB
Ret. Employee
Hi David,
I did a little digging and I haven't seen anyone else report this problem via the Support channel - that doesn't necessarily mean that nobody else is seeing this, of course, just that nobody else has come to Support with a request about it.
If you don't want to adjust your non-RFC blocking mask for your policy, which is entirely understandable, then I suspect your only recourse would be to filter these requests with either a Class or an iRule attached to the VIP and then pass them through a separate security policy with a more lenient blocking mask.
That approach would retain good security and open up the smallest possible attack vector (since, obviously, it is possible for someone to forge the user agent or source IP, depending on your filtering, and pass a malicious request through your more lenient policy).
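To make the filtering idea concrete, here is a minimal Python sketch of the selection logic only. It is not an iRule or a Class definition, and every name in it (the policy names, the network list, the `select_policy` helper) is hypothetical. The network shown is a documentation placeholder, not a real crawler range. Requiring both the User-Agent and the source IP to match is one way to keep the lenient path as narrow as possible.

```python
import ipaddress

# Placeholder network (RFC 5737 documentation range). Substitute crawler
# ranges you have verified yourself; this is NOT a real Yahoo range.
TRUSTED_CRAWLER_NETS = [ipaddress.ip_network("192.0.2.0/24")]

# Assumed User-Agent token for the Yahoo crawler.
CRAWLER_UA_TOKENS = ("Yahoo! Slurp",)

# Hypothetical policy names, for illustration only.
STRICT_POLICY = "strict_policy"
LENIENT_POLICY = "crawler_policy"


def select_policy(user_agent: str, source_ip: str) -> str:
    """Return the lenient policy only when BOTH the User-Agent and source IP match."""
    ua_match = any(token in user_agent for token in CRAWLER_UA_TOKENS)
    ip = ipaddress.ip_address(source_ip)
    ip_match = any(ip in net for net in TRUSTED_CRAWLER_NETS)
    return LENIENT_POLICY if ua_match and ip_match else STRICT_POLICY
```

A forged User-Agent from an unlisted address still lands on the strict policy, which is the point of checking both fields.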
That's the best way around this that I can think of, certainly - though I am intrigued as to why Yahoo are sending non-RFC compliant requests. That sounds like the kind of thing that ought to be mentioned to them in parallel with your efforts, since I would consider that "bad manners" on their part.
Let us know if you need assistance in filtering the requests with a Class or iRule and we can work from there.
Thanks,
Aaron
- hoolio
Cirrostratus
As Aaron said, I haven't seen this in other customer implementations and I'm surprised to see that Yahoo's spider would be making invalid requests. I'd guess that this might be a "custom" interpretation of an HTTP RFC requirement that ASM is enforcing. Can you post an anonymized copy of a request or two that is being marked illegal?
Thanks,
Aaron
- Dave_Burnett_20
Nimbostratus
Do these help?
GET /spurs/savings/adult_saver/index.html HTTP/1.0
Accept: */*
User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
LLF-Cache-Control:
Host: www.britannia.co.uk
This violation was generated by IP address 74.6.22.97
GET /isa/transfer_in.html HTTP/1.0
Accept: */*
User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
LLF-Cache-Control:
Host: www.britannia.co.uk
If-Modified-Since: Fri, 07 Nov 2008 17:33:57 GMT
This violation was generated by IP address 67.195.37.92
GET /careers/benefits.html HTTP/1.0
Accept: */*
User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
LLF-Cache-Control:
Host: www.britannia.co.uk
This violation was again generated by IP address 74.6.22.97
- AaronJB
Ret. Employee
That helps a great deal;
I passed one of those requests through my lab unit and my v9.4.5 ASM is triggering a violation on "Header name with no header value"
While that isn't actually an HTTP RFC violation, since both the 1.0 and 1.1 RFCs list the header value as optional, it is a configurable blocking setting which is enabled by default for newly created policies.
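For anyone who wants to spot these requests in their own logs, here is a rough Python sketch of the condition being flagged: a header line with a name but an empty value. It only illustrates the idea and makes no claim about how ASM implements the check. The helper name `headers_without_value` is made up for this example, and the sample is one of the requests posted above with CRLF line endings assumed.

```python
def headers_without_value(raw_request: str) -> list[str]:
    """Return the names of header lines whose value is empty or whitespace only."""
    lines = raw_request.replace("\r\n", "\n").split("\n")
    empty = []
    for line in lines[1:]:      # skip the request line
        if line == "":          # a blank line ends the header section
            break
        name, sep, value = line.partition(":")
        if sep and value.strip() == "":
            empty.append(name.strip())
    return empty


sample = (
    "GET /careers/benefits.html HTTP/1.0\r\n"
    "Accept: */*\r\n"
    "User-Agent: Mozilla/5.0 (compatible; Yahoo! Slurp; "
    "http://help.yahoo.com/help/us/ysearch/slurp)\r\n"
    "LLF-Cache-Control:\r\n"
    "Host: www.britannia.co.uk\r\n"
    "\r\n"
)
print(headers_without_value(sample))  # ['LLF-Cache-Control']
```

In the posted requests, the empty `LLF-Cache-Control:` header is the only line that matches.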
Unfortunately I can't remember if that was an easily configurable option in v9.4.3, but in v9.4.5 (and v9.4.4 IIRC) it certainly is, under Policy->Blocking->HTTP Protocol Compliance where you can simply uncheck that blocking option for your policy.
- Dave_Burnett_20
Nimbostratus
Thanks for this information.
I think the approach we'll take is to upgrade to 9.4.5 first (it's on our to-do list!) and then take it from there as to what changes to implement regarding the Yahoo robots, if any.
If I need any assistance with new classes or iRules to achieve this, I'll certainly be back to take up your offer of help gratefully.
- dburnett_103851
Nimbostratus
Posted By abrailsford on 11/11/2008 6:25 AM
That helps a great deal;
I passed one of those requests through my lab unit and my v9.4.5 ASM is triggering a violation on "Header name with no header value"
While that isn't actually an HTTP RFC violation, since both the 1.0 and 1.1 RFCs list the header value as optional, it is a configurable blocking setting which is enabled by default for newly created policies.
Unfortunately I can't remember if that was an easily configurable option in v9.4.3, but in v9.4.5 (and v9.4.4 IIRC) it certainly is, under Policy->Blocking->HTTP Protocol Compliance where you can simply uncheck that blocking option for your policy.
We've upgraded to v9.4.5 and I can see where we can allow the Yahoo slurps through by turning off the Header Name with No Header Value check. However, what are the implications of turning this particular feature off? Are we potentially opening ourselves up and making our site vulnerable? Is there an alternative way to allow the Yahoo robots through that carries less risk?
- hoolio
Cirrostratus
I couldn't find any references to HTTP attacks using an empty header value. You could open a case with F5 Support and ask them why this is considered a violation by default in 9.4.5. If you do get more information on this, could you reply here?
Thanks,
Aaron
- dburnett_103851
Nimbostratus
Did what you suggested and opened a case with F5.
Their response was that turning off the 'Header Name with No Header Value' blocking did not give rise to any issues/risks to the website that they were aware of.
They pointed out that, within the HTTP Protocol Compliance section, there are only 3 blocks that you should never turn off: Unparsable request content, Null in request, and Several Content-Length headers.
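To give a feel for what two of those three checks are looking for, here is a rough Python sketch of my own. It is not anything F5 supplied and not how ASM does it internally. The function name `basic_compliance_problems` is made up, and the "Unparsable request content" check is left out because it depends on the parser. Duplicate Content-Length headers matter because two devices in the path may disagree about where the request body ends.

```python
def basic_compliance_problems(raw_request: bytes) -> list[str]:
    """Flag a null byte anywhere in the request and repeated Content-Length headers."""
    problems = []

    if b"\x00" in raw_request:
        problems.append("Null in request")

    # Only look at the header section, i.e. everything before the first blank line.
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    header_lines = head.split(b"\r\n")[1:]  # drop the request line
    content_length_count = sum(
        1
        for line in header_lines
        if line.split(b":", 1)[0].strip().lower() == b"content-length"
    )
    if content_length_count > 1:
        problems.append("Several Content-Length headers")

    return problems
```

A request carrying both `Content-Length: 5` and `Content-Length: 0` would be reported under the second label, while a normal GET with no body returns an empty list.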
I asked why, if there are no risks, the Header Name with No Header Value block is enabled by default after upgrading to 9.4.5. They haven't answered that one yet.
Anyway, the upshot of the matter is that I've turned off the block so that the slurps should now be getting through.
- hoolio
Cirrostratus
Thanks for the info. If you get more info on the logic behind marking no value headers illegal, let me know.
Thanks,
Aaron
- dburnett_103851
Nimbostratus
Thanks for the posting.
As it now transpires that the 'Header Name with No Header Value' check actively protects against an HTTP Request Smuggling attack, I've had no other choice but to re-enable the checks on our F5s.
However, this should now mean the Yahoo robots will start to be blocked again, so if there are other options to allow the slurps through, or to prevent HTTP Request Smuggling attacks whilst having the check turned off, I'd be glad to hear them.
