Social Security Number Scrubbing

Problem this snippet solves:

This sample iRule searches outbound HTTP responses for a selected set of URIs and masks any Social Security numbers they contain.

Want to secure your site from accidentally exposing Social Security Numbers? No problem, says the iRules team. This example shows how to scrub SSNs from the response content for a given class of URIs, replacing each one with the blanket "xxx-xx-xxxx" string.
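To see what the regexp and index arithmetic in the first HTTP_RESPONSE_DATA event does, here is a minimal sketch in plain Tcl that runs in tclsh rather than on a BIG-IP. It assumes a hypothetical sample string in place of [HTTP::payload] and uses string replace, which takes an inclusive end index, in place of HTTP::payload replace, which takes an offset and a length (hence the "+ 1" in the iRule).

```tcl
# Plain-Tcl sketch for tclsh, not an iRule: a sample string (hypothetical
# data) stands in for [HTTP::payload], and [string replace] stands in for
# HTTP::payload replace.
set payload "Name: Jane Doe, SSN: 123-45-6789, Alt: 987-65-4321"

# -indices makes regexp return {start end} pairs, where end is inclusive.
set ssn_indices [regexp -all -inline -indices {\d{3}-\d{2}-\d{4}} $payload]
puts $ssn_indices

foreach ssn_idx $ssn_indices {
    lassign $ssn_idx ssn_start ssn_end
    # HTTP::payload replace wants a length, hence end - start + 1 in the
    # iRule; string replace takes the inclusive end index directly.
    # The 11-character mask matches the 11-character SSN, so replacing
    # one match does not shift the offsets of the matches after it.
    set payload [string replace $payload $ssn_start $ssn_end "xxx-xx-xxxx"]
}
puts $payload
```

Running it prints `{21 31} {39 49}` followed by `Name: Jane Doe, SSN: xxx-xx-xxxx, Alt: xxx-xx-xxxx`.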

Code :

class scrub_uris {
   "/cgi-bin",
   "/account"
}

when HTTP_REQUEST {
   if { [matchclass [HTTP::uri] starts_with $::scrub_uris] } {
      set scrub_content 1
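      # Downgrade to HTTP/1.0 so the server won't send a chunked response;
      # keep the connection alive explicitly, since HTTP/1.0 defaults to closing it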
      if { [HTTP::version] eq "1.1" } {
         if { [HTTP::header is_keepalive] } {
            HTTP::header replace "Connection" "Keep-Alive"
         }
         HTTP::version "1.0"
      }
   } else {
      set scrub_content 0
   }
}
when HTTP_RESPONSE {
   if { $scrub_content } {
      # Only collect up to 2048000 bytes (SOL6578)
      if { [HTTP::header exists "Content-Length"] && [HTTP::header "Content-Length"] < 2048000 } {
         set content_length [HTTP::header "Content-Length"]
      } else {
         set content_length 2048000
      }
      if { $content_length > 0 } {
         HTTP::collect $content_length
      }
   }
}
when HTTP_RESPONSE_DATA {
   # Find the SSNs; -indices returns inclusive {start end} pairs
   set ssn_indices [regexp -all -inline -indices {\d{3}-\d{2}-\d{4}} [HTTP::payload]]
   # Mask each SSN in place; the mask is the same length (11 characters)
   # as the match, so the offsets of later matches stay valid
   foreach ssn_idx $ssn_indices {
      set ssn_start [lindex $ssn_idx 0]
      set ssn_len [expr {[lindex $ssn_idx 1] - $ssn_start + 1}]
      HTTP::payload replace $ssn_start $ssn_len "xxx-xx-xxxx"
   }
}

# Here is an alternative way to write the last event:

when HTTP_RESPONSE_DATA {
    if { [regsub -all {\d{3}-\d{2}-\d{4}} [HTTP::payload] "xxx-xx-xxxx" newdata] } {
        HTTP::payload replace 0 [HTTP::payload length] $newdata  
    }
}

# It's fewer lines of code, but is it faster? That depends on how many
# replacements the page needs. With only a few, it is cheaper to replace them
# one at a time, passing only the changes, as the original event does. With
# many, it is better to let the very efficient regsub scrub the whole buffer
# in one command and pass the complete result to HTTP::payload replace. In a
# test with a smallish (3 KB) web page, the regsub method was faster once
# there were more than 10 replacements. Both methods are equally fast when
# there is nothing to replace. Your mileage may vary.
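If you want to experiment with this trade-off, the following plain-Tcl sketch (for tclsh, not BIG-IP) wraps both approaches in hypothetical procs, scrub_by_index and scrub_by_regsub, checks that they produce identical output on a synthetic page of about 3 KB built by a hypothetical make_page helper, and times each with Tcl's time command. Because string replace stands in for HTTP::payload replace, the timings reflect tclsh string handling rather than the BIG-IP payload buffer, so treat them only as a rough illustration, not as a reproduction of the measurement above.

```tcl
# Plain-Tcl sketch: [string replace] stands in for HTTP::payload replace.
# Timings from tclsh are not representative of BIG-IP performance.

proc scrub_by_index {data} {
    # One replacement per match, like the first HTTP_RESPONSE_DATA event.
    foreach idx [regexp -all -inline -indices {\d{3}-\d{2}-\d{4}} $data] {
        lassign $idx start end
        set data [string replace $data $start $end "xxx-xx-xxxx"]
    }
    return $data
}

proc scrub_by_regsub {data} {
    # One pass over the whole buffer, like the alternative event.
    regsub -all {\d{3}-\d{2}-\d{4}} $data "xxx-xx-xxxx" data
    return $data
}

# Build a synthetic page: 3000 characters of filler plus ssn_count SSNs.
proc make_page {ssn_count} {
    set page [string repeat "lorem ipsum " 250]
    for {set i 0} {$i < $ssn_count} {incr i} {
        append page " 123-45-6789"
    }
    return $page
}

foreach n {0 1 10 50} {
    set page [make_page $n]
    # Both strategies must produce identical output.
    if {[scrub_by_index $page] ne [scrub_by_regsub $page]} {
        error "strategies disagree for $n SSNs"
    }
    # [time script count] reports average microseconds per iteration.
    set t_index  [lindex [time {scrub_by_index $page} 1000] 0]
    set t_regsub [lindex [time {scrub_by_regsub $page} 1000] 0]
    puts "$n SSNs: index=${t_index}us regsub=${t_regsub}us per call"
}
```

The equality check matters more than the numbers: it confirms the two events are interchangeable, which holds here because the mask and the matched pattern have the same length.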

Published Mar 18, 2015

Version 1.0