gerhackett
Oct 13, 2020

iRule to examine response HTML for image URLs

Hi,

I'd like to write an iRule to scan an HTML response body for image tags and store all of the source URLs in a table. I understand that once I've fired an HTTP::collect in an HTTP_RESPONSE event, I can retrieve the response body in the HTTP_RESPONSE_DATA event using HTTP::payload. However, I'm struggling to find an approach to scan or parse the response string to find all of the image tags. I do not want to modify the response; I only want to identify and store the image URLs.

Do you think it’s possible to do something like this in an iRule? I’ll only be processing a very small subset of responses, so I’m not too concerned about any performance overhead this may incur.

Thanks in advance,

Ger.

  • Hello Gerhackett.

    After collecting the HTTP payload, you can do something like this.

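    when HTTP_RESPONSE_DATA {
    	# Read the collected body once and reuse it
    	set payload [HTTP::payload]
    	# Stop at whitespace, quotes, or angle brackets so each URL matches separately
    	set find {https?://[^\s"'<>]+\.jpg}
    	# With no capture groups, each element is a {start end} index pair
    	set indices [regexp -all -indices -inline -- $find $payload]

    	foreach idx $indices {
    		set start [lindex $idx 0]
    		set end [lindex $idx 1]
    		# string range takes start and end indices (substr expects a skip count)
    		log local0. "[string range $payload $start $end]"
    	}
    }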

    Adjust the regex to match your goal. You can test your regex here:

    https://regex101.com/
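
    For completeness, HTTP_RESPONSE_DATA only fires once HTTP::collect has been called on the response side. A minimal sketch of that collection step might look like the following. The 1 MB cap and the `max_collect` variable name are assumptions for illustration, and the Content-Type check is just one possible way to limit which responses get buffered.

    ```tcl
    when HTTP_RESPONSE {
        # Only buffer HTML responses; everything else passes through untouched
        if { [HTTP::header value "Content-Type"] starts_with "text/html" } {
            # max_collect is a hypothetical cap (1 MB) to bound memory use
            set max_collect 1048576
            set len $max_collect
            if { [HTTP::header exists "Content-Length"] } {
                set cl [HTTP::header value "Content-Length"]
                # Use the declared length when it is a valid number below the cap
                if { [string is integer -strict $cl] && $cl < $max_collect } {
                    set len $cl
                }
            }
            # A zero-length body has nothing to collect
            if { $len > 0 } {
                HTTP::collect $len
            }
        }
    }
    ```

    If the server compresses its responses, the collected payload will be compressed as well and the regex will not match. A common workaround is to remove the Accept-Encoding request header in HTTP_REQUEST for the requests you plan to inspect.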

    Regards,

    Dario.

  • You can use an HTML profile plus an iRule to log all the src attributes. This is fairly simple and less resource-intensive than HTTP::collect, but the src attribute isn't necessarily a full path or URI, if that's what you really need. In my example below, the logged src attribute is image/blacklogo.png, while the full URL is http://mywebsite.com/test/image/blacklogo.png. From here you could store the src attribute values in a table or a variable to be used later.

    Using your approach, you will likely want a regex that identifies the img tag in the collected HTTP data, and then log or store whatever value you want. For example, "<img[^>]*>" matches both plain <img ...> tags and self-closing <img ... /> tags.
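
    As a rough sketch of that regex approach, the pattern can carry a capture group so that only the src value is stored. The subtable name `img_urls` and the 300-second timeout are hypothetical, and the pattern assumes the src value is quoted.

    ```tcl
    when HTTP_RESPONSE_DATA {
        set payload [HTTP::payload]
        # Capture the quoted src value of each <img> tag
        set pattern {<img[^>]*\ssrc\s*=\s*["']([^"']+)["']}
        # -all -inline returns a flat list: full match, capture, full match, capture, ...
        foreach {tag src} [regexp -all -inline -nocase -- $pattern $payload] {
            # "img_urls" is a hypothetical subtable; entries expire after 300 s idle
            table set -subtable img_urls $src 1 300
        }
    }
    ```

    Using the URL as the table key means duplicates collapse into one entry. Note that Tcl regular expressions differ from the more common PCRE flavour in places: \b is a backspace, and the word-boundary escape is \y.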

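    A third alternative would be to just check the request URI for image file extensions and then log the host and URI.

    when HTTP_REQUEST {
    	# Match on the path so query-string contents cannot cause a false hit
    	if { [string tolower [HTTP::path]] ends_with ".png" } {
    		# HTTP::uri holds only the path and query, so prepend the host
    		log local0. "[HTTP::host][HTTP::uri]"
    	}
    }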

    == References ==

    K99872325: Modifying HTML tag attributes using an HTML profile
      ∟ Match Tag Name = img
      ∟ Match Attribute Name = (empty)
      ∟ Match Attribute Value = (empty)

    HTML::tag attribute (iRules CloudDocs)

    table (iRules CloudDocs)

    when HTML_TAG_MATCHED {
    	log local0. "[HTML::tag attribute value "src"]"
    }
     
    Log output when requesting http://mywebsite.com/test/:
    Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/blacklogo.png
    Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/22.png
    Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/iislogo.png
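
    If the full URL is needed rather than the raw src value, one possible sketch is to save the request host and path and resolve the src against them. The variable names and the hard-coded http:// scheme are assumptions taken from the example above; protocol-relative (//host/...) and ../ sources are not handled here.

    ```tcl
    when HTTP_REQUEST {
        # HTTP::host and HTTP::path are request-side, so save them for later
        set req_host [HTTP::host]
        set req_path [HTTP::path]
    }

    when HTML_TAG_MATCHED {
        set src [HTML::tag attribute value "src"]
        if { $src eq "" } { return }

        if { [string match -nocase "http://*" $src] || [string match -nocase "https://*" $src] } {
            # Already an absolute URL
            set full_url $src
        } elseif { [string index $src 0] eq "/" } {
            # Root-relative: prepend the assumed scheme and the host
            set full_url "http://${req_host}${src}"
        } else {
            # Path-relative: resolve against the directory of the requested page
            set base [string range $req_path 0 [string last "/" $req_path]]
            set full_url "http://${req_host}${base}${src}"
        }
        log local0. $full_url
    }
    ```

    For a request to /test/ on mywebsite.com, a src of image/blacklogo.png resolves to http://mywebsite.com/test/image/blacklogo.png.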
  • Thanks very much, Dario and Andrew, for your responses. I think I'll go with the collect/regex approach. I only want to examine a very small percentage of the responses, fewer than 1 in 1,000. On that basis, I think using HTML_TAG_MATCHED would be less efficient.

    Thanks again,

    Ger.

  • Just an FYI: the tag is found by the HTML profile, and built-in profiles are generally faster than iRules in many situations. The iRule only processes the tags that the HTML profile has already found. That said, I agree with your assessment, given that you only want to review a small sample.