iRule to examine response HTML for image URLs

Question

Hi,

I'd like to write an iRule to scan an HTML response body for image tags and store all of the source URLs in a table. I understand that once I've called HTTP::collect in the HTTP_RESPONSE event, I can retrieve the response body in the HTTP_RESPONSE_DATA event using HTTP::payload. However, I'm struggling to find an approach to scan or parse the response string to find all of the image tags. I do not want to modify the response; I only want to identify and store the image URLs.

Do you think it's possible to do something like this in an iRule? I'll only be processing a very small subset of responses, so I'm not too concerned about any performance overhead this may incur.

Thanks in advance,

Ger.

dario_garrido · Answer

Hello Gerhackett.

After collecting the HTTP payload, you can do something like this.

when HTTP_RESPONSE_DATA {
	# Match absolute .jpg URLs; the character class stops at quotes, whitespace and angle brackets
	set find {https?://[^\s"'<>]+\.jpg}
	set payload [HTTP::payload]
	set indices [regexp -all -indices -inline -nocase $find $payload]

	foreach idx $indices {
		set start [lindex $idx 0]
		set end [lindex $idx 1]
		# string range takes inclusive start and end indices
		log local0. "[string range $payload $start $end]"
	}
}

Adjust the regex to match your goal. You can test your regex here: https://regex101.com/

Regards,

Dario.
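
For context, the collection step and the table storage that the question asks about could be sketched as follows. This is a minimal, untested sketch rather than a definitive implementation: the 1 MB cap, the subtable name img_urls, and the img/src pattern are assumptions chosen for illustration, and it assumes the server returns an uncompressed HTML body.

```tcl
when HTTP_RESPONSE {
    # Only buffer HTML responses; other content types pass through untouched
    if { [HTTP::header value "Content-Type"] starts_with "text/html" } {
        # Collect the declared length, capped at 1 MB (arbitrary limit for this sketch)
        if { [HTTP::header exists "Content-Length"] && [HTTP::header value "Content-Length"] <= 1048576 } {
            set collect_length [HTTP::header value "Content-Length"]
        } else {
            set collect_length 1048576
        }
        if { $collect_length > 0 } {
            HTTP::collect $collect_length
        }
    }
}

when HTTP_RESPONSE_DATA {
    # Capture group 1 is the src value of each <img> tag, without its quotes
    set pattern {<img[^>]+src\s*=\s*["']([^"']+)["']}
    foreach {tag src} [regexp -all -inline -nocase $pattern [HTTP::payload]] {
        # img_urls is a hypothetical subtable name; table add skips keys that already exist
        table add -subtable img_urls $src 1
    }
}
```

Because the payload is only read and never changed, the response reaches the client unmodified. Entries added this way use the default table timeout unless a timeout is passed to table add.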

andrew-f5 · Answer

You can use an HTML profile plus an iRule to log all the src attributes. This is fairly simple and less resource-intensive than HTTP::collect, but the src attribute isn't necessarily a full path or URI, if that's what you really need. In my example below, the logged src attribute is image/blacklogo.png, while the full URL is http://mywebsite.com/test/image/blacklogo.png. From here you could store the src attribute's value in a table or a variable to be used later.

Using your approach, you will likely want to use a regex to identify the img tag in the collected HTTP data and then log or store whatever value you want, for example "<img([\w\W]+?)/>".

A third alternative would be to just check the request URI for image file extensions and then log the requested URL.

when HTTP_REQUEST {
	if { [string tolower [HTTP::uri]] contains ".png" } {
		# HTTP::host plus HTTP::uri gives the requested URL without the scheme
		log local0. "[HTTP::host][HTTP::uri]"
	}
}

== References ==
K99872325: Modifying HTML tag attributes using an HTML profile
∟ Match Tag Name = img
∟ Match Attribute Name = (empty)
∟ Match Attribute Value = (empty)
HTML::tag attribute (iRules CloudDocs)
table (iRules CloudDocs)

when HTML_TAG_MATCHED {
	log local0. "[HTML::tag attribute value "src"]"
}

Log output for the page http://mywebsite.com/test/:

Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/blacklogo.png
Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/22.png
Oct 19 08:40:33 bigip1 info tmm[18126]: Rule /Common/matchImgTagiRule <HTML_TAG_MATCHED>: image/iislogo.png
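
Storing the matched src values instead of logging them might look something like the sketch below. It is illustrative only: the subtable name img_srcs and the key layout are made up for this example, and it assumes the table command can be called from HTML_TAG_MATCHED on your BIG-IP version, which is worth confirming before relying on it.

```tcl
when HTTP_REQUEST {
    # Remember the requested page so a relative src value can be interpreted later
    set page_host [HTTP::host]
    set page_path [HTTP::path]
}

when HTML_TAG_MATCHED {
    set src [HTML::tag attribute value "src"]
    if { $src ne "" } {
        # img_srcs is a hypothetical subtable name; the key pairs the page with the raw src value
        table add -subtable img_srcs "$page_host$page_path|$src" 1
    }
}
```

Keeping the page host and path next to the raw src value leaves enough information to rebuild the full URL later without attempting relative-URL resolution inside the iRule.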

gerhackett · Answer

Thanks very much, Dario and Andrew, for your responses. I think I'll go with the collect/regex approach. I only want to examine a very small percentage of the responses, fewer than 1 in 1,000. On that basis, I think using HTML_TAG_MATCHED would be less efficient.
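
One way to limit collection to a small fraction of responses is sketched below. The thread does not say how the subset is chosen, so the random 1-in-1,000 sampling, the variable name inspect_response, and the 1 MB cap are all assumptions made for illustration.

```tcl
when HTTP_REQUEST {
    # rand() returns a value in [0,1), so this flags roughly 1 request in 1,000
    set inspect_response [expr { rand() < 0.001 }]
    if { $inspect_response } {
        # Ask the server for an uncompressed body so the payload is readable text
        HTTP::header remove "Accept-Encoding"
    }
}

when HTTP_RESPONSE {
    # Unsampled responses are never buffered, so they incur no collection overhead
    if { $inspect_response && [HTTP::header value "Content-Type"] starts_with "text/html" } {
        if { [HTTP::header exists "Content-Length"] && [HTTP::header value "Content-Length"] <= 1048576 } {
            set collect_length [HTTP::header value "Content-Length"]
        } else {
            set collect_length 1048576
        }
        if { $collect_length > 0 } {
            HTTP::collect $collect_length
        }
    }
}
```

If the subset is defined by something other than chance, such as a specific path or host, the rand() test could be replaced with a check on HTTP::path or HTTP::host in the same place.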

Thanks again,

Ger.

andrew-f5 · Answer

Just an FYI: the tag is found by the HTML profile, and built-in profiles are generally faster than iRules in many situations. The iRule only processes the tags that the HTML profile has already matched. That said, I agree with your assessment, given that you only want to review a small sample.