
iRule to examine response HTML for image URLs

gerhackett
Nimbostratus

Hi,

 

I'd like to write an iRule to scan an HTML response body for image tags and store all of the source URLs in a table. I understand that once I've called HTTP::collect in the HTTP_RESPONSE event, I can retrieve the response body in the HTTP_RESPONSE_DATA event using HTTP::payload. However, I am struggling to find an approach for scanning or parsing the response string to find all of the image tags. I do not want to modify the response; I only want to identify and store the image URLs.

 

Do you think it's possible to do something like this in an iRule? I'll only be processing a very small subset of responses, so I'm not too concerned about any performance overhead this may incur.

 

Thanks in advance,

 

Ger.

4 REPLIES

Hello Gerhackett.

 

After collecting the HTTP payload, you can do something like this.

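when HTTP_RESPONSE_DATA {
    # A negated character class keeps each match inside one URL; the dot before "jpg" is escaped
    set find {https?://[^\s"'<>]+\.jpg}
    set payload [HTTP::payload]
    foreach idx [regexp -all -indices -inline -- $find $payload] {
        # string range takes start and end indices (substr would read the last argument as a length)
        log local0. [string range $payload [lindex $idx 0] [lindex $idx 1]]
    }
}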

Modify the regex properly to match your goal. You can test your regex here:

https://regex101.com/
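
To tie the collect step and the table storage from the original question together, here is a minimal sketch rather than a tested rule. The subtable name img_urls, the 1 MB cap, the 3600-second timeout and the img/src pattern are all assumptions chosen for illustration. Removing Accept-Encoding is also an assumption (a compressed body would not match the regex), and you would want to restrict it to the requests you actually sample.

```tcl
when HTTP_REQUEST {
    # Assumption: request an uncompressed body so the regex sees plain HTML
    HTTP::header remove "Accept-Encoding"
}

when HTTP_RESPONSE {
    if { [HTTP::header "Content-Type"] starts_with "text/html" } {
        # Collect the whole body when its length is known, capped at 1 MB (arbitrary limit)
        if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] <= 1048576 } {
            set collect_length [HTTP::header "Content-Length"]
        } else {
            set collect_length 1048576
        }
        if { $collect_length > 0 } {
            HTTP::collect $collect_length
        }
    }
}

when HTTP_RESPONSE_DATA {
    # Capture group 1 is the quoted src value of each img tag
    set img_re {<img[^>]*\ssrc\s*=\s*["']([^"']+)["']}
    foreach {tag src} [regexp -all -inline -nocase -- $img_re [HTTP::payload]] {
        # Key = src value, value = 1, entry times out after 3600 seconds
        table set -subtable img_urls $src 1 3600
    }
}
```

The payload is only read, never replaced, so the response reaches the client unchanged. The pattern only handles quoted src values; unquoted attributes would need a looser expression.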

 

Regards,

Dario.


Andrew-F5
F5 Employee

You can use an HTML profile plus an iRule to log all the src attributes. This is fairly simple and less resource-intensive than HTTP::collect, but the src attribute isn't necessarily a full path or URI, if that's what you really need. In my example below, the logged src attribute is image/blacklogo.png, while the full URL is http://mywebsite.com/test/image/blacklogo.png. From here you could store the src attribute's value in a table or a variable to be used later.

 

Using your approach, you will likely want a regex that identifies the img tag in the collected HTTP data, and then log or store whatever value you want, for example "<img([\w\W]+?)/>".

 

A third alternative would be to just parse the URI looking for image file extensions then log the full URL.

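when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] contains ".png" } {
        # HTTP::host plus HTTP::uri gives the requested URL without the scheme
        log local0. "[HTTP::host][HTTP::uri]"
    }
}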
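
If more than one extension matters, the same request-side check could be written with a glob switch. This is only a sketch: the extension list, the img_urls subtable name and the 3600-second timeout are assumptions, not something from the thread.

```tcl
when HTTP_REQUEST {
    # Match on the path only, so a query string cannot hide or fake the extension
    switch -glob -- [string tolower [HTTP::path]] {
        "*.png" -
        "*.jpg" -
        "*.jpeg" -
        "*.gif" {
            # Store host plus URI as the key; the value 1 is just a placeholder
            table set -subtable img_urls "[HTTP::host][HTTP::uri]" 1 3600
        }
    }
}
```

Note that this records images that clients actually request, not the images referenced in a given HTML page, so it answers a slightly different question than the other two approaches.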

 

== References ==

K99872325: Modifying HTML tag attributes using an HTML profile

∟ Match Tag Name = img

Match Attribute Name = (empty)

∟ Match Attribute Value = (empty)

 

HTML::tag attribute (iRules CloudDocs)

table (iRules CloudDocs)

when HTML_TAG_MATCHED {
    log local0. "[HTML::tag attribute value "src"]"
}

http://mywebsite.com/test/

Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/blacklogo.png
Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/22.png
Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/iislogo.png
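
As a rough sketch of storing the value instead of logging it, the rule below resolves a relative src against the requested page and writes the result to a table. The variable names, the img_urls subtable and the 3600-second timeout are assumptions made for this example, and the scheme is left out because the rule does not know whether the virtual server is HTTP or HTTPS.

```tcl
when HTTP_REQUEST {
    set page_host [HTTP::host]
    # Directory part of the requested path, e.g. "/test/" for "/test/" or "/test/index.html"
    set page_dir [string range [HTTP::path] 0 [string last "/" [HTTP::path]]]
}

when HTML_TAG_MATCHED {
    set src [HTML::tag attribute value "src"]
    if { $src eq "" } { return }
    if { [string match -nocase "http*://*" $src] || [string match "//*" $src] } {
        # Already absolute or protocol-relative: keep as-is
        set img_url $src
    } elseif { [string index $src 0] eq "/" } {
        # Root-relative: prepend the host only
        set img_url "$page_host$src"
    } else {
        # Path-relative: prepend the host and the page's directory
        set img_url "$page_host$page_dir$src"
    }
    table set -subtable img_urls $img_url 1 3600
}
```

With the page and src from the log above, this would store mywebsite.com/test/image/blacklogo.png. Other forms such as data: URIs are not handled and would be treated as path-relative.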

 

gerhackett
Nimbostratus

Thanks very much, Dario and Andrew, for your responses. I think I'll go with the collect/regex approach. I only want to examine a very small percentage of the responses, fewer than 1 in 1,000. On that basis, I think using HTML_TAG_MATCHED would be less efficient.

 

Thanks again,

 

Ger.

Andrew-F5
F5 Employee

Just an FYI: the tag is found by the HTML profile, and built-in profiles are generally faster than iRules in many situations. The iRule only parses the tag(s) that the HTML profile has already found. I do agree with your assessment, though, since you're only trying to review a small sample.
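
For the small sample discussed here, one way to gate the collect is a random draw per request. This is a sketch under the assumption that the subset is chosen at random, which the thread does not state. The 0.001 rate, the 1 MB cap and the variable names are illustrative only.

```tcl
when HTTP_REQUEST {
    # rand() yields a float in [0, 1), so about 1 request in 1,000 sets the flag
    set sample_response [expr { rand() < 0.001 }]
}

when HTTP_RESPONSE {
    if { $sample_response && [HTTP::header "Content-Type"] starts_with "text/html" } {
        # Collect the whole body when its length is known, capped at 1 MB (arbitrary limit)
        if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] <= 1048576 } {
            set collect_length [HTTP::header "Content-Length"]
        } else {
            set collect_length 1048576
        }
        if { $collect_length > 0 } {
            HTTP::collect $collect_length
        }
    }
}
```

Responses that are not sampled skip HTTP::collect entirely, so HTTP_RESPONSE_DATA never fires for them and they pass through without being buffered.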