TLS Fingerprinting Update - a method for identifying a TLS client without decryption

Some time has passed since the original version of this article, and some interesting things have happened:

New BIG-IP versions. A few readers reported on the previous article that BIG-IP versions 14 and 15 caused validation errors in the iRule code. These are addressed here.

Salesforce.com extended Lee Brotherston's original work into a project called JA3, and open-sourced it. JA3 included some excellent updates, including support for TLS 1.3 and GREASE protocol extensions, plus integrations with tools like Zeek/Bro and a bunch of other open source and commercial products. Most importantly, Lee's version included a data set of fingerprint-to-user-agent matches that hasn't really been maintained for a while. JA3, however, offers a much larger and more current crowd-sourced data set.

TL;DR

TLS fingerprinting is a methodology for uniquely identifying a client (user-agent) by examining a TLS Client Hello message for patterns that are particular to that user-agent. In any given TLS Client Hello message, the client sends a TLS version, a number of supported ciphers, a number of extensions, a number of elliptic curves, and curve point formats, all in a specific order. This pattern (how many elements and in what order) has been found to be somewhat unique among the various user-agents (e.g. different browsers running on different OSes, non-browser Internet apps, malware command-and-control agents, etc.). It is possible, then, to use this information in a few ways. Here are a few example use cases where I've personally witnessed its use over the last few years:

To attempt positive identification of user agents to control some process flow. For example, consider an SSL visibility use case where SSL Orchestrator is deployed to decrypt and inspect outbound traffic, but devices may exist in the environment that aren't controlled by the organization and thus don't have the local issuing CA certificate installed. A version of TLS fingerprinting can be deployed to whitelist the known assets for decryption and inspection. It's important to understand here that TLS fingerprinting isn't (can't be) a 100%-reliable fingerprinting option. Clients can change the type and order of ciphers and extensions they use to mask their signature. But for identification of corporate-owned assets, this method can still be quite effective. It's also been highly useful for identifying malware command-and-control (C2) agents, which is exactly how some of the aforementioned JA3 integrated tools use it.

To uniquely identify a specific client (not the user-agent). As mentioned above, positive identification of a user-agent isn't 100% reliable. But, sometimes just having a hash of these unique signature properties is enough to isolate one potentially malicious client from another. And this, as it turns out, is how some of the other JA3 integrated tools use TLS fingerprint signatures. It doesn't matter if the client changes TLS properties to mask user-agent fingerprinting, as the hash of its current TLS signature is enough to identify and track it through its journey of destruction.
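To make the Client Hello fields described above a bit more concrete, here is a minimal Python sketch that walks a single TLS record and pulls out the version, ciphers, extensions, elliptic curves, and point formats. It is only an illustration of the idea, not the iRule logic itself: the function names `parse_client_hello` and `build_client_hello` are hypothetical, the sample message is synthetic, and it assumes the whole Client Hello fits in one record.

```python
import struct

def parse_client_hello(record: bytes) -> dict:
    """Pull the fingerprint-relevant fields out of one TLS record holding a ClientHello."""
    rtype, _outer, _rlen, hs_type = struct.unpack_from("!BHHB", record, 0)
    if rtype != 22 or hs_type != 1:      # 22 = handshake record, 1 = ClientHello
        raise ValueError("not a TLS ClientHello")
    (version,) = struct.unpack_from("!H", record, 9)

    pos = 43                             # record hdr (5) + handshake hdr (4) + version (2) + random (32)
    pos += 1 + record[pos]               # session ID length byte + session ID
    (cipher_len,) = struct.unpack_from("!H", record, pos)
    pos += 2
    ciphers = list(struct.unpack_from("!%dH" % (cipher_len // 2), record, pos))
    pos += cipher_len
    pos += 1 + record[pos]               # compression length byte + compression methods

    extensions, curves, point_formats = [], [], []
    if pos + 2 <= len(record):
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            data = record[pos:pos + ext_len]
            extensions.append(ext_type)
            if ext_type == 10 and len(data) >= 2:
                # elliptic curves: 2-byte list length, then 2-byte entries
                curves = list(struct.unpack_from("!%dH" % ((len(data) - 2) // 2), data, 2))
            elif ext_type == 11:
                # ec point formats: 1-byte list length, then 1-byte entries
                point_formats = list(data[1:])
            pos += ext_len
    return {"version": version, "ciphers": ciphers, "extensions": extensions,
            "curves": curves, "point_formats": point_formats}

def build_client_hello(ciphers, extensions):
    """Hypothetical helper: assemble a minimal synthetic ClientHello record for the demo."""
    body = struct.pack("!H", 0x0303) + bytes(32) + b"\x00"   # version, random, empty session ID
    body += struct.pack("!H", 2 * len(ciphers)) + struct.pack("!%dH" % len(ciphers), *ciphers)
    body += b"\x01\x00"                                        # one compression method (null)
    ext = b"".join(struct.pack("!HH", t, len(d)) + d for t, d in extensions)
    body += struct.pack("!H", len(ext)) + ext
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake

sample = build_client_hello(
    [0x1A1A, 0xC02B, 0xC02F],
    [(10, struct.pack("!HHH", 4, 29, 23)), (11, b"\x01\x00"), (0xFF01, b"\x00")],
)
fields = parse_client_hello(sample)
print(fields)
```

The number and order of the values in each list are what make up the signature. Two clients that offer the same ciphers in a different order produce different fingerprints.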

This article provides a few updates:

First and foremost, it fixes the iRule validation issues in the original article for BIG-IP 14 and higher.

Our security heroes over in F5 SIRT created a JA3 version of this iRule that returns the JA3 version of the fingerprint hash. I've updated that JA3 code some more to include an option to search the crowd-sourced user-agent data set.

So without further ado...

TLS Fingerprinting Update

I'll ask that you refer to the original article for information on collecting the user-agent data and converting to a data group, as that hasn't changed. Otherwise, the relevant iRule code change is here in the updated Library-Rule:

## Library-Rule

## TLS Fingerprint Procedure #################
## 
## Author: Kevin Stewart, Original (12/2016), Update(09/2020)
## Derived from Lee Brotherston's "tls-fingerprinting" project @ https://github.com/LeeBrotherston/tls-fingerprinting
## Purpose: to identify the user agent based on unique characteristics of the TLS ClientHello message
## Input: 
##      Full TCP payload collected in CLIENT_DATA event of a TLS handshake ClientHello message
##      Record length (rlen)
##      TLS outer version (outer)
##      TLS inner version (inner)
##      Client IP
##      Server IP
##
## Update v2 to remove TCL errors from original code, and add code to return FP string, fp hash, or user-agent lookup
##############################################
proc fingerprintTLS { payload rlen outer inner clientip serverip } {
    
    ## user-defined: enable logging
    set debug 0
    
    ## user-defined: enable 
    ## - fingerprint string return      ("fp")
    ## - fingerprint md5 hash return    ("fphash")
    ## - user-agent lookup result       ("ua")
    set proc_return "fp"


    ## The first 43 bytes of a ClientHello message are the record type, TLS versions, some length values and the
    ## handshake type. We should already know this stuff from the calling iRule. We're also going to be walking the
    ## packet, so the field_offset variable will be used to track where we are.
    set field_offset 43

    ## The first value in the payload after the offset is the session ID, which may be empty. Grab the session ID length
    ## value and move the field_offset variable that many bytes forward to skip it.
    binary scan ${payload} @${field_offset}c sessID_len
    set field_offset [expr {${field_offset} + 1 + ${sessID_len}}]

    ## The next value in the payload is the ciphersuite list length (how big the ciphersuite list is). We need the binary
    ## and hex values of this data.
    binary scan ${payload} @${field_offset}S cipherList_len
    binary scan ${payload} @${field_offset}H4 cipherList_len_hex
    set cipherList_len_hex_text ${cipherList_len_hex}

    ## Now that we have the ciphersuite list length, let's offset the field_offset variable to skip over the length (2) bytes
    ## and go get the ciphersuite list. Multiply by 2 to get the number of appropriate hex characters.
    set field_offset [expr {${field_offset} + 2}]
    set cipherList_len_hex [expr {${cipherList_len} * 2}]
    binary scan ${payload} @${field_offset}H${cipherList_len_hex} cipherlist

    ## Next is the compression method length and compression method. First move field_offset to skip past the ciphersuite
    ## list, then grab the compression method length. Then move field_offset past the length (1) byte and grab the 
    ## compression method value. Finally, move field_offset past the compression method bytes.
    set field_offset [expr {${field_offset} + ${cipherList_len}}]
    binary scan ${payload} @${field_offset}c compression_len
    set field_offset [expr {${field_offset} + 1}]
    binary scan ${payload} @${field_offset}H[expr {${compression_len} * 2}] compression_type
    set field_offset [expr {${field_offset} + ${compression_len}}]

    ## We should be in the extensions section now, so we're going to just run through the remaining data and
    ## pick out the extensions as we go. But first let's make sure there's more record data left, based on 
    ## the current field_offset vs. rlen.
    if { [expr {${field_offset} < ${rlen}}] } {
        ## There's extension data, so let's go get it. Skip the first 2 bytes that are the extensions length
        set field_offset [expr {${field_offset} + 2}]

        ## Make a variable to store the extension types we find
        set extensions_list ""

        ## Pad rlen by 1 byte
        set rlen [expr {${rlen} + 1}]

        while { [expr {${field_offset} <= ${rlen}}] } {
            ## Grab the first 2 bytes to determine the extension type
            binary scan ${payload} @${field_offset}H4 ext

            ## Store the extension in the extensions_list variable
            append extensions_list ${ext}

            ## Increment field_offset past the 2 bytes of the extension type
            set field_offset [expr {${field_offset} + 2}]

            ## Grab the 2 bytes of extension length
            binary scan ${payload} @${field_offset}S ext_len

            ## Increment field_offset past the 2 bytes of the extension length
            set field_offset [expr {${field_offset} + 2}]

            ## Look for specific extension types in case these need to increment the field_offset (and because we need their values)
            switch $ext {
                "000b" {
                    ## ec_point_format - there's another 1 byte after length
                    ## Grab the extension data
                    binary scan ${payload} @[expr {${field_offset} + 1}]H[expr {(${ext_len} - 1) * 2}] ext_data
                    set ec_point_format ${ext_data}
                }
                "000a" {
                    ## elliptic_curves - there's another 2 bytes after length
                    ## Grab the extension data
                    binary scan ${payload} @[expr {${field_offset} + 2}]H[expr {(${ext_len} - 2) * 2}] ext_data
                    set elliptic_curves ${ext_data}
                }
                "000d" {
                    ## sig_alg - there's another 2 bytes after length
                    ## Grab the extension data
                    binary scan ${payload} @[expr {${field_offset} + 2}]H[expr {(${ext_len} - 2) * 2}] ext_data
                    set sig_alg ${ext_data}
                }
                default {
                    ## Grab the otherwise unknown extension data
                    binary scan ${payload} @${field_offset}H[expr {${ext_len} * 2}] ext_data
                }
            }

            ## Increment the field_offset past the extension data length. Repeat this loop until we reach rlen (the end of the payload)
            set field_offset [expr {${field_offset} + ${ext_len}}]
        }
    }

    ## Now let's compile all of that data.
    set cipl [string toupper ${cipherList_len_hex_text}]
    set ciph [string toupper ${cipherlist}]
    set coml ${compression_len}
    set comp [string toupper ${compression_type}]
    if { ( [info exists extensions_list] ) and ( ${extensions_list} ne "" ) } { set exte [string toupper ${extensions_list}] } else { set exte "@@@@" }
    if { ( [info exists elliptic_curves] ) and ( ${elliptic_curves} ne "" ) } { set ecur [string toupper ${elliptic_curves}] } else { set ecur "@@@@" }
    if { ( [info exists sig_alg] ) and ( ${sig_alg} ne "" ) } { set siga [string toupper ${sig_alg}] } else { set siga "@@@@" }
    if { ( [info exists ec_point_format] ) and ( ${ec_point_format} ne "" ) } { set ecfp [string toupper ${ec_point_format}] } else { set ecfp "@@@@" }

    ## Now let's build the fingerprint string and search the database
    set fingerprint_str "${outer}+${inner}+${cipl}+${ciph}+${coml}+${comp}+${exte}+${ecur}+${siga}+${ecfp}"
    if { ${debug} } { log local0. "${clientip}-${serverip}: fingerprint_str = ${fingerprint_str}" }

    switch ${proc_return} {
        "fp" {
            return ${fingerprint_str}
        }
        "fphash" {
            binary scan [md5 ${fingerprint_str}] H* fp_digest
            return ${fp_digest}
        }
        "ua" {
            ## Initialize the match variable
            set match ""
        
            if { [class match ${fingerprint_str} equals fingerprint_db] } {
                ## Direct match
                set match [class match -value ${fingerprint_str} equals fingerprint_db]
            } elseif { not ( ${ciph} starts_with "C0" ) and not ( ${ciph} starts_with "00" ) } {
                ## Hmm.. there's no direct match, which could either mean a database entry doesn't exist, or Chrome (and Opera) are adding
                ## special values to the cipherlist, extensions list and elliptic curves list.
                ##  ex. 9A9A, 5A5A, EAEA, BABA, etc. at the beginning of the cipherlist 
                ## Let's strip out these anomalous values and try the match again.
        
                ## Subtract 2 bytes from cipherlist length (v2 fix). Format as uppercase hex to match the other uppercased fields.
                set cipl_tmp "set cipl \[format %04X \[expr \{ \[expr 0x$\{cipl\}\] - 2 \}\]\]"
                eval ${cipl_tmp}
        
                ## Subtract 2 bytes from the front of the cipher list
                set ciph [string range ${ciph} 4 end]
        
                ## Subtract 2 bytes from the front of the extensions list
                set exte [string range ${exte} 4 end]
                ## There might be an additional random set in the string that needs to be removed (pattern is "(.)A\1A")
                ## (v2 fix)
                regsub {(.)A\\1A} ${exte} "" exte
        
                ## Subtract 2 bytes from the front of the elliptic curves list
                set ecur [string range ${ecur} 4 end]
        
                ## Rebuild the fingerprint string
                set fingerprint_str "${outer}+${inner}+${cipl}+${ciph}+${coml}+${comp}+${exte}+${ecur}+${siga}+${ecfp}"
        
                if { [class match ${fingerprint_str} equals fingerprint_db] } {
                    ## Guess match
                    set match [class match -value ${fingerprint_str} equals fingerprint_db]
                } else {
                    ## No match
                    set match ""
                }
            }
        
            ## Return the matching user agent string
            return ${match}
        }
    }
}

Of particular interest, I've added some control flags at the top of the iRule to control debug logging and what to output:

    ## user-defined: enable logging
    set debug 0
    
    ## user-defined: enable 
    ## - fingerprint string return      ("fp")
    ## - fingerprint md5 hash return    ("fphash")
    ## - user-agent lookup result       ("ua")
    set proc_return "fp"

where "fp" returns the full TLS fingerprint string, "fphash" returns a short md5 hash of that string (see the second use case above), and "ua" attempts to match the fingerprint against the data group (removing the GREASE extensions first). The caller iRule is the same as the original.
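The GREASE removal mentioned above can be pictured as filtering a set of reserved values out of each list before matching. The following Python sketch shows that idea on decoded integer lists. It is an illustration only: the iRule above works on hex strings and trims the leading value instead, and the names `GREASE_VALUES` and `strip_grease` are hypothetical. The sixteen reserved values follow the pattern defined in RFC 8701 (0x0A0A, 0x1A1A, ... 0xFAFA).

```python
# GREASE code points repeat a single byte of the form 0x?A: 0x0A0A, 0x1A1A, ..., 0xFAFA.
# Each step adds 0x1010 to the previous value.
GREASE_VALUES = frozenset(0x0A0A + 0x1010 * n for n in range(16))

def strip_grease(values):
    """Return the list with GREASE placeholders removed, preserving order."""
    return [v for v in values if v not in GREASE_VALUES]

ciphers = [0x9A9A, 0xC02B, 0xC02F]       # a GREASE value followed by two real ciphersuites
print(strip_grease(ciphers))             # the GREASE entry is dropped, order is kept
```

Because browsers pick a different GREASE value on each connection, leaving these values in would make the same client produce a different fingerprint every time.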

JA3 TLS Fingerprint Update

For this exercise, we'll extend upon Aaron Brailsford's JA3 version (F5 SIRT) and add a user-agent lookup. As with the above, I've given you the option to select what to output, so if you only want the hash, then you won't need the data set. But if you do want to look up the user-agent, you'll first need to acquire the data.

In an empty folder on your local machine (or BIG-IP), download the crowd-sourced JA3 data to a file:

curl -vk https://ja3er.com/getAllUasJson > ja3.db

Then create this simple Python 3 script, save it as ja3-converter.py, and run it to convert this file to external data group format. It's a lot of data, so you'll want to use an external data group for this.

import json

# Load the crowd-sourced JA3 data (a JSON array of objects with 'md5' and 'User-Agent' keys)
with open('ja3.db', encoding='utf-8') as data_file:
    data = json.load(data_file)

# Open for write (not append) so that re-running the script doesn't duplicate records
with open('ja3.dg', 'w', encoding='utf-8') as dg_file:
    for x in data:
        if x['User-Agent']:
            # External data group record format: "key" := "value",
            dg_file.write('"%s" := "%s",\n' % (x['md5'], x['User-Agent'].strip('"')))

Simply run it like this:

python3 ja3-converter.py

The script looks for ja3.db and creates a new ja3.dg file. Now go ahead and import this to the BIG-IP. Navigate to System -> File Management -> Data Group File List, then click Import.

File Name: browse to and select the ja3.dg file
Name: enter "ja3_dg"
File Contents: select "String"
Key / Value Pair Separator: select :=
Data Group Name: enter "ja3_dg"
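If the import complains about the file, it may help to check the records before trying again. The sketch below is one hedged way to do that in Python: it flags any line that doesn't look like `"<32 hex characters>" := "<value>",`. The function name `check_records` is hypothetical, the sample hash is made up, and treating an embedded double quote in the value as a problem is my assumption rather than a documented rule.

```python
import re

# One record per line: "md5-hex" := "user agent string",
RECORD = re.compile(r'^"([0-9a-fA-F]{32})" := "(.*)",$')

def check_records(lines):
    """Return (line_number, text) for every line that doesn't match the expected record shape."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        match = RECORD.match(text)
        # Assumption: an unescaped double quote inside the value is treated as suspect
        if not match or '"' in match.group(2):
            bad.append((number, text))
    return bad

sample = [
    '"0123456789abcdef0123456789abcdef" := "Mozilla/5.0 (example)",\n',   # made-up record
    'this line is not a record\n',
]
problems = check_records(sample)
print(problems)

# To check the real file:
# with open("ja3.dg", encoding="utf-8") as f:
#     print(check_records(f))
```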

Now create the "Library-Rule" iRule:

## Library-Rule

## JA3 TLS Fingerprint Procedure #################
##
## Author: Kevin Stewart, 09/2020
## Derived from Aaron Brailsford's JA3 version
## Derived from Lee Brotherston's "tls-fingerprinting" project @ https://github.com/LeeBrotherston/tls-fingerprinting
## Based on the TLS Fingerprinting iRule by Kevin Stewart @ https://devcentral.f5.com/s/articles/tls-fingerprinting-a-method-for-identifying-a-tls-client-without-decrypting-24598
## Purpose: to identify the user agent based on unique characteristics of the TLS ClientHello message
## Input:
##   Full TCP payload collected in CLIENT_DATA event of a TLS handshake ClientHello message
##   Record length (rlen)
##   TLS inner version (sslversion)
##
## Update v2 to remove GREASE information and provide method to search a local ja3 user-agent database
##############################################
proc fingerprintTLS { payload rlen sslversion } {

    ## user-defined: debug logging
    set debug 0
    
    ## user-defined: enable
    ## - ja3 fingerprint hash           ("fphash")
    ## - ja3 user-agent lookup result   ("ua")
    set proc_return "fphash"    


    ## decimal-converted GREASE values - these will be removed from the output
    set GREASE_LIST [list 2570 6682 10794 14906 19018 23130 27242 31354 35466 39578 43690 47802 51914 56026 60138 64250]

    ## The first 43 bytes of a ClientHello message are the record type, TLS versions, some length values and the
    ## handshake type. We should already know this stuff from the calling iRule. We're also going to be walking the
    ## packet, so the field_offset variable will be used to track where we are.
    set field_offset 43

    ## The first value in the payload after the offset is the session ID, which may be empty. Grab the session ID length
    ## value and move the field_offset variable that many bytes forward to skip it.
    binary scan ${payload} @${field_offset}c sessID_len
    set field_offset [expr {${field_offset} + 1 + ${sessID_len}}]

    ## The next value in the payload is the ciphersuite list length (how big the ciphersuite list is).
    binary scan ${payload} @${field_offset}S cipherList_len

    ## Now that we have the ciphersuite list length, let's offset the field_offset variable to skip over the length (2) bytes
    ## and go get the ciphersuite list.
    set field_offset [expr {${field_offset} + 2}]
    binary scan ${payload} @${field_offset}S[expr {${cipherList_len} / 2}] cipherlist_decimal

    ## Next is the compression method length and compression method. First move field_offset to skip past the ciphersuite
    ## list, then grab the compression method length. Then move field_offset past the length (1) byte.
    ## Finally, move field_offset past the compression method bytes.
    set field_offset [expr {${field_offset} + ${cipherList_len}}]
    binary scan ${payload} @${field_offset}c compression_len
    set field_offset [expr {${field_offset} + 1}]
    set field_offset [expr {${field_offset} + ${compression_len}}]

    ## We should be in the extensions section now, so we're going to just run through the remaining data and
    ## pick out the extensions as we go. But first let's make sure there's more record data left, based on
    ## the current field_offset vs. rlen.
    if { [expr {${field_offset} < ${rlen}}] } {
        ## There's extension data, so let's go get it. Skip the first 2 bytes that are the extensions length
        set field_offset [expr {${field_offset} + 2}]

        ## Make a variable to store the extension types we find
        set extensions_list ""

        ## Pad rlen by 1 byte
        set rlen [expr {${rlen} + 1}]

        while { [expr {${field_offset} <= ${rlen}}] } {
            ## Grab the first 2 bytes to determine the extension type
            binary scan ${payload} @${field_offset}S ext
            set ext [expr {$ext & 0xFFFF}]

            ## Store the extension in the extensions_list variable
            lappend extensions_list ${ext}

            ## Increment field_offset past the 2 bytes of the extension type
            set field_offset [expr {${field_offset} + 2}]

            ## Grab the 2 bytes of extension length
            binary scan ${payload} @${field_offset}S ext_len

            ## Increment field_offset past the 2 bytes of the extension length
            set field_offset [expr {${field_offset} + 2}]

            ## Look for specific extension types in case these need to increment the field_offset (and because we need their values)
            switch $ext {
                "11" {
                    ## ec_point_format - there's another 1 byte (list length) after the extension length
                    ## Grab each format as its own 8-bit value so a multi-entry list comes out as e.g. 0-1-2
                    binary scan ${payload} @[expr {${field_offset} + 1}]c[expr {${ext_len} - 1}] ext_data
                    set ec_point_format ${ext_data}
                }
                "10" {
                    ## elliptic_curves - there's another 2 bytes after length
                    ## Grab the extension data
                    binary scan ${payload} @[expr {${field_offset} + 2}]S[expr {(${ext_len} - 2) / 2}] ext_data
                    set elliptic_curves ${ext_data}
                }
                default {
                    ## Grab the otherwise unknown extension data
                    binary scan ${payload} @${field_offset}H[expr {${ext_len} * 2}] ext_data
                }
            }

            ## Increment the field_offset past the extension data length. Repeat this loop until we reach rlen (the end of the payload)
            set field_offset [expr {${field_offset} + ${ext_len}}]
        }
    }

    ## Now let's compile all of that data.
    ## The cipherlist values need masking with 0xFFFF to return the unsigned integers we need
    ## v2 update: strip out GREASE values
    foreach cipher $cipherlist_decimal {
        if { [lsearch -exact ${GREASE_LIST} [expr {$cipher & 0xFFFF}]] == -1 } {
            lappend cipd [expr {$cipher & 0xFFFF}]
        }
    }

    set cipd_str [join $cipd "-"]

    ## v2 update: strip out GREASE values
    if { ( [info exists extensions_list] ) and ( ${extensions_list} ne "" ) } {
        set exte_tmp [list]
        foreach x ${extensions_list} {
            if { [lsearch -exact ${GREASE_LIST} ${x}] == -1 } {
                lappend exte_tmp ${x}
            }
        }
        set exte [join ${exte_tmp} "-"] 
    } else { 
        set exte "" 
    }

    ## v2 update: strip out GREASE values
    if { ( [info exists elliptic_curves] ) and ( ${elliptic_curves} ne "" ) } { 
        set ecur_tmp [list]
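        foreach x ${elliptic_curves} {
            ## Mask with 0xFFFF so values above 0x7FFF (including GREASE) compare as unsigned integers
            set x [expr {${x} & 0xFFFF}]
            if { [lsearch -exact ${GREASE_LIST} ${x}] == -1 } {
                lappend ecur_tmp ${x}
            }
        }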
        set ecur [join ${ecur_tmp} "-"] 
    } else { 
        set ecur "" 
    }

    if { ( [info exists ec_point_format] ) and ( ${ec_point_format} ne "" ) } { set ecfp [join ${ec_point_format} "-"] } else { set ecfp "" }

    set ja3_str "${sslversion},${cipd_str},${exte},${ecur},${ecfp}"
    binary scan [md5 ${ja3_str}] H* ja3_digest
    if { ${debug} } { 
        log local0. "ja3 string = ${ja3_str}"
        log local0. "ja3 digest = ${ja3_digest}"
    }

    switch ${proc_return} {
        "fphash" {
            return ${ja3_digest}    
        }
        "ua" {
             if { [set ua_match [class match -value ${ja3_digest} eq ja3_dg]] ne "" } {
                return ${ua_match}
            } else {
                return "No Match"
            }    
        }
    }
}
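For readers who want to check the iRule's output offline, the JA3 assembly step above can be sketched in a few lines of Python: five comma-separated fields, values within a field joined by dashes, GREASE values dropped, then an md5 of the resulting string. This is a rough equivalent under those assumptions, not a reference implementation, and the function names `is_grease` and `ja3` are hypothetical.

```python
import hashlib

def is_grease(value: int) -> bool:
    """True for the reserved GREASE pattern 0x?A?A where both bytes are identical."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def ja3(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string from decimal field values and return (string, md5 hex digest)."""
    fields = [str(version)]
    for values in (ciphers, extensions, curves):
        fields.append("-".join(str(v) for v in values if not is_grease(v)))
    fields.append("-".join(str(v) for v in point_formats))
    ja3_str = ",".join(fields)
    return ja3_str, hashlib.md5(ja3_str.encode("ascii")).hexdigest()

ja3_str, ja3_digest = ja3(
    771,
    [0x1A1A, 49195, 49199],   # leading GREASE value is dropped
    [10, 11, 65281],
    [29, 23],
    [0],
)
print(ja3_str)
print(ja3_digest)
```

A client that sends no extensions still produces all four commas, with the trailing fields left empty, which matches the empty-string defaults in the iRule above.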

Again, I've given you the option to enable/disable debug logging, and select what data to return:

    ## user-defined: debug logging
    set debug 0
    
    ## user-defined: enable
    ## - ja3 fingerprint hash           ("fphash")
    ## - ja3 user-agent lookup result   ("ua")
    set proc_return "fphash"

where "fphash" is the JA3 fingerprint hash, and "ua" attempts to find the matching user-agent in the data group. The caller iRule is pretty similar to the original:

when CLIENT_ACCEPTED {
  ## Collect the TCP payload
  TCP::collect
}
when CLIENT_DATA {
  ## Get the TLS packet type and versions
  if { ! [info exists rlen] } {
    ## We actually only need the record type (rtype), record length (rlen), handshake type (hs_type) and 'inner' SSL version (inner_sslver) here
    ## But it's easiest to parse them all out of the payload along with the bytes we don't need (outer_sslver & rilen)
    binary scan [TCP::payload] cSScH6S rtype outer_sslver rlen hs_type rilen inner_sslver

    if { ( ${rtype} == 22 ) and ( ${hs_type} == 1 ) } {
      ## This is a TLS ClientHello message (22 = TLS handshake, 1 = ClientHello)

      ## Call the fingerprintTLS proc
      set ja3_fingerprint [call Library-Rule::fingerprintTLS [TCP::payload] ${rlen} ${inner_sslver}]
      #binary scan [md5 ${ja3_fingerprint}] H* ja3_digest

      ### Do Something here ###
      log local0. "[IP::client_addr]:[TCP::client_port] ja3 ${ja3_fingerprint}"
      ### Do Something here ###

    }
  }

  # Collect the rest of the record if necessary
  if { [TCP::payload length] < $rlen } {
    TCP::collect $rlen
  }

  ## Release the payload
  TCP::release
}
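The `binary scan [TCP::payload] cSScH6S ...` line in the caller reads the first 11 bytes of the payload: record type, outer version, record length, handshake type, a 3-byte handshake length, and the inner version. A rough Python equivalent using `struct` is sketched below. The function name `parse_record_header` and the sample bytes are made up for illustration, and unlike the Tcl `c` and `S` specifiers this sketch reads the fields as unsigned values.

```python
import struct

def parse_record_header(payload: bytes) -> dict:
    """Decode the leading 11 bytes of a TLS record that carries a handshake message."""
    rtype, outer, rlen, hs_type, hs_len, inner = struct.unpack_from("!BHHB3sH", payload, 0)
    return {
        "rtype": rtype,                 # 22 = TLS handshake
        "outer_sslver": outer,          # record-layer version
        "rlen": rlen,                   # record length (excludes the 5-byte record header)
        "hs_type": hs_type,             # 1 = ClientHello
        "rilen": hs_len.hex(),          # 3-byte handshake length as hex, like H6
        "inner_sslver": inner,          # ClientHello version, the first JA3 field
        "is_client_hello": rtype == 22 and hs_type == 1,
    }

header = parse_record_header(bytes.fromhex("16030100310100002d0303"))
print(header)
```

The `inner_sslver` value is passed to the proc as a decimal number (771 for 0x0303), which is the form the JA3 string expects.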

And there you have it. I think you'll find that if you experiment with both of these versions, you'll have more success matching a user-agent to the JA3 data set. Otherwise, using the fingerprint hash (from either version) provides a compelling method to uniquely identify and track clients as a function of threat intelligence.

Thanks

-- Kevin

Published Oct 05, 2020

Version 1.0