David_Homoney
Nimbostratus
Nov 30, 2007
Removing some links in payload but not all
So I have a dickens of a problem. I am trying to remove some
- Deb_Allen_18
Historic F5 Account
I think you mostly just need to move the release outside of the loop; otherwise you will always be releasing on the first match.
when HTTP_REQUEST {
    # Don't allow data to be chunked
    if { [HTTP::version] eq "1.1" } {
        if { [HTTP::header is_keepalive] } {
            HTTP::header replace "Connection" "Keep-Alive"
        }
        HTTP::version "1.0"
    }
}
when HTTP_RESPONSE {
    # Only check responses that are a text content type (text/html, text/xml, text/plain, etc.)
    if { [HTTP::header "Content-Type"] starts_with "text/" } {
        # Get the content length so we can request the data to be processed
        # in the HTTP_RESPONSE_DATA event (capped at 1048576 bytes)
        if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] < 1048577 } {
            set content_length [HTTP::header "Content-Length"]
        } else {
            set content_length 1048576
        }
        log local0.info "Content Length: $content_length"
        if { $content_length > 0 } {
            HTTP::collect $content_length
        }
    }
}
when HTTP_RESPONSE_DATA {
    # Find ALL the possible URLs in one pass. The pattern is unanchored and has no
    # capture groups, so each element of url_indices is exactly one {start end} pair.
    log local0.info "Time for some regex action baby"
    set url_indices [regexp -all -inline -indices {https?://[^\s"'<>]+} [HTTP::payload]]
    log local0.info "url_indices: $url_indices"
    foreach url_idx $url_indices {
        set url_start [lindex $url_idx 0]
        set url_end [lindex $url_idx 1]
        set url_len [expr {$url_end - $url_start + 1}]
        log local0.info "url_start: $url_start url_end: $url_end url_len: $url_len"
        set url_address [string range [HTTP::payload] $url_start $url_end]
        log local0.info "url_address: $url_address"
        # Check to see if URL is not part of allowed hosts data group
        if { !([matchclass $url_address contains $::valid_hosts]) } {
            # If not a valid URL, mask it out with X's (same length, so the
            # remaining indices stay valid)
            HTTP::payload replace $url_start $url_len [string repeat "X" $url_len]
        }
    }
    # Release once, after the loop, so every match is processed first
    HTTP::release
}
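The masking loop in the rule above can be tried outside BIG-IP with plain Tcl. This is a minimal sketch, not an iRule: `mask_urls` is a hypothetical proc, a Tcl list stands in for the `valid_hosts` data group, a `string first` loop stands in for `matchclass ... contains`, and `string replace` stands in for `HTTP::payload replace`. It assumes a simplified pattern that only matches absolute http/https URLs. Because each rejected URL is replaced with the same number of X's, the indices from the single `regexp` pass stay valid while the loop edits the string.

```tcl
# Hypothetical stand-in for the iRule's HTTP_RESPONSE_DATA loop.
# payload     - the response body (HTTP::payload in the iRule)
# valid_hosts - list of allowed hosts (the valid_hosts data group in the iRule)
proc mask_urls {payload valid_hosts} {
    # One pass: collect {start end} index pairs for every absolute http(s) URL.
    # No capture groups, so each list element is exactly one match.
    set url_indices [regexp -all -inline -indices {https?://[^\s"'<>]+} $payload]
    foreach url_idx $url_indices {
        set url_start [lindex $url_idx 0]
        set url_end [lindex $url_idx 1]
        set url_len [expr {$url_end - $url_start + 1}]
        set url_address [string range $payload $url_start $url_end]
        # Emulate "matchclass $url_address contains $::valid_hosts":
        # the URL is allowed if it contains any host from the list.
        set allowed 0
        foreach host $valid_hosts {
            if {[string first $host $url_address] >= 0} {
                set allowed 1
                break
            }
        }
        if {!$allowed} {
            # Same-length replacement keeps the later index pairs valid.
            set payload [string replace $payload $url_start $url_end \
                             [string repeat "X" $url_len]]
        }
    }
    return $payload
}

set html {<a href="http://www.domain.com/ok">a</a> <a href="https://evil.example/x">b</a>}
puts [mask_urls $html {www.domain.com}]
```

Note that a `contains` check is a substring test, so a URL such as http://www.domain.com.attacker.example/ would also pass; comparing against the host portion extracted from the URL would be stricter.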
- hoolio
Cirrostratus
Have you considered using a stream profile to do the replacements? You can use a regex for the search parameters.
- David_Homoney
Nimbostratus
Thanks for the input, guys. OK, so the goal is to allow some URLs to remain in the pages and have others scrubbed. There is a data group of hosts (www.domain.com) that each link needs to be checked against to decide whether it gets scrubbed. If the URL contains a host from the DG it is left alone; otherwise it needs to be X'ed out. Make sense?
- hoolio
Cirrostratus
How many domains do you want to remove from the HTTP content? I would guess that using a stream profile/expression with the list would be significantly faster than collecting the response data and performing regex operations on it. Using a stream would also simplify the rule:

when HTTP_RESPONSE {
    if { [HTTP::header value Content-Type] contains "text" } {
        # Each @search@replace@ pair masks one domain
        STREAM::expression {@https?://(?:www\.)?example1\.com@xxxxxxxxxx@ @https?://(?:www\.)?example2\.com@xxxxxxxxxx@}
        STREAM::enable
    }
}
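If the list of domains is long or kept outside the rule, the expression string can be assembled from a list instead of typed by hand. This is a minimal plain-Tcl sketch, assuming a hypothetical `build_stream_expr` proc; it escapes literal dots and emits one `@search@replace@` pair per domain, in the same form as the rule above. Its output is the string you would pass to `STREAM::expression`.

```tcl
# Hypothetical helper: build a STREAM::expression argument from a list of domains.
# Each domain becomes one @search@replace@ pair, space-separated.
proc build_stream_expr {domains {mask xxxxxxxxxx}} {
    set pairs {}
    foreach domain $domains {
        # Escape literal dots so "example1.com" does not also match "example1Xcom".
        set escaped [string map {. \\.} $domain]
        lappend pairs "@https?://(?:www\\.)?${escaped}@${mask}@"
    }
    return [join $pairs " "]
}

puts [build_stream_expr {example1.com example2.com}]
```

Keep in mind that this approach scrubs the domains that are listed, whereas the data group described earlier in the thread lists the hosts to keep.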
- David_Homoney
Nimbostratus
Aaron,
- David_Homoney
Nimbostratus
No problem. Since it would appear that your regex-fu is better than mine, do you know how to regex for any ? I have to capture both relative links and absolute links. Frequently the URL is also the link text, and I need to X all of that out.
- hoolio
Cirrostratus
Sure. Can you post some anonymized examples of strings you want to match and strings you don't, as they would appear in the HTML of a response?
- David_Homoney
Nimbostratus
I need to match anything in an href tag. This includes the URL (either relative or explicit) and the link text that follows, as it could contain the link too.
- hoolio
Cirrostratus
Here is a sample of hrefs I assume you don't want to check:
- Eduardo_Saito_1
Nimbostratus
Hello homoney,