21-Dec-2012 09:09 - edited 02-Oct-2023 10:08
I had the privilege of watching an exercise in string handling unfold on an internal list recently that should have its day in the community.
URLs for this particular application arrive in this format:
/session/636971/abcd/hlsa/dcwiuh/index.html
/session/128766/ssrg/hlsb/yuyt/yutyu/index.html
/session/128766/dwlmhlsb/ewfwef/index.html
/session/122623/wqs/wedew/mhlsb/ewfwef/uiy/index.html
The useful data lies between /session/[nnn] and /index.html, so the values we want to extract are, respectively:
/abcd/hlsa/dcwiuh
/ssrg/hlsb/yuyt/yutyu
/dwlmhlsb/ewfwef
/wqs/wedew/mhlsb/ewfwef/uiy
There are almost always multiple solutions to a problem, and this scenario is no different. Four approaches are shown below. The proc wrapper is only for testing purposes; the code within each proc is the actual solution.
proc splitjoin {arg} {
# convert: /session/1111/abcd/efg/hij/index.html
# to: { {} session 1111 abcd efg hij index.html }
set sl [split $arg {/}]
# remove leading { {} session 1111 } and trailing index.html from $sl,
# then produce / followed by remainder of $sl joined with /
set result "/[join [lrange $sl 3 [expr { [llength $sl] -2}]] {/}]"
}
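proc scana {arg} {
# split the path, setting a to 'session', b to '[nnn]', and c to '/[xxx]/[yyy]'
scan $arg {/%[^/]/%[^/]%s} a b c
# remove the set of characters following the last slash (i.e., '/[yyy]') in c
set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}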
proc scanb {arg} {
# split the path, skipping 'session' and '[nnn]' and setting c to '/[xxx]/[yyy]'
scan $arg {/%*[^/]/%*[^/]%s} c
# remove the set of characters following the last slash (i.e., '/[yyy]') in c
set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
proc regex1 {arg} {
# after match, $whole is /session/[nnn]/[xxx]/ and $result is /[xxx]
# note: regexp itself returns 1 or 0 (match or no match); the extracted path is in $result
regexp {^/[^/]+/[^/]+(/.+)/} $arg whole result
}
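To make the intermediate values easier to follow, the same commands can be run directly in tclsh without the proc wrappers. This is only an illustrative sketch: the variable names url, split_result, scan_result, and regex_result are chosen here and do not appear in the original code, and the comments show the values each step should produce for the first sample URL.

```tcl
set url /session/636971/abcd/hlsa/dcwiuh/index.html

# Split & lists: the leading slash yields an empty first element.
set sl [split $url {/}]
puts $sl            ;# {} session 636971 abcd hlsa dcwiuh index.html
set split_result "/[join [lrange $sl 3 [expr { [llength $sl] -2}]] {/}]"
puts $split_result  ;# /abcd/hlsa/dcwiuh

# Scan with all variables: a and b are assigned but never used afterwards.
scan $url {/%[^/]/%[^/]%s} a b c
puts "$a $b $c"     ;# session 636971 /abcd/hlsa/dcwiuh/index.html

# Scan with only the necessary variable: %* matches a field without storing it.
scan $url {/%*[^/]/%*[^/]%s} c
set scan_result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
puts $scan_result   ;# /abcd/hlsa/dcwiuh

# Regular expression: the greedy .+ runs up to the last slash.
set matched [regexp {^/[^/]+/[^/]+(/.+)/} $url whole regex_result]
puts $matched       ;# 1
puts $whole         ;# /session/636971/abcd/hlsa/dcwiuh/
puts $regex_result  ;# /abcd/hlsa/dcwiuh
```

All three techniques arrive at the same string; they differ only in how much intermediate data they build along the way.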
In order from least to most efficient:
Approach | Command | Time per iteration |
Regular Expressions | time {regex1 $x} 100000 | 17.22531 microseconds |
Scan with all variables | time {scana $x} 100000 | 7.999566 microseconds |
Split & Lists | time {splitjoin $x} 100000 | 6.90115 microseconds |
Scan with necessary variable | time {scanb $x} 100000 | 6.12924 microseconds |
Here, $x is the longest of the original strings at the top of this article (the fourth URL). I ran these tests in tclsh on BIG-IP LTM VE 11.2.1 running on an ESXi 4.1 installation. Actual numbers in iRules will likely differ, but the performance of these commands relative to one another shouldn't vary much. Many thanks to F5ers Vernon Wells, Cameron Jenkins, and Ken Wong for the source information!
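For anyone who wants to repeat the comparison, a script along the following lines should work in a plain tclsh. It is a sketch rather than the original test script: the proc bodies are copied from the listing above, while the approaches list, the iterations variable, and the output formatting are assumptions made here for illustration. Calling each proc through a variable adds a little overhead to every measurement, so absolute numbers will not match the table exactly.

```tcl
# Proc bodies copied from the listing in this article.
proc splitjoin {arg} {
    set sl [split $arg {/}]
    set result "/[join [lrange $sl 3 [expr { [llength $sl] -2}]] {/}]"
}
proc scana {arg} {
    scan $arg {/%[^/]/%[^/]%s} a b c
    set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
proc scanb {arg} {
    scan $arg {/%*[^/]/%*[^/]%s} c
    set result [string range $c 0 [expr { [string last {/} $c] - 1 }]]
}
proc regex1 {arg} {
    regexp {^/[^/]+/[^/]+(/.+)/} $arg whole result
}

# The longest of the sample URLs, as used for the published timings.
set x /session/122623/wqs/wedew/mhlsb/ewfwef/uiy/index.html

# Hypothetical harness: run each proc repeatedly and print the report
# returned by the time command ("N microseconds per iteration").
set iterations 100000
set approaches {regex1 scana splitjoin scanb}
foreach name $approaches {
    puts [format "%-10s %s" $name [time {$name $x} $iterations]]
}
```

Running the script a few times and comparing the ordering, rather than the raw figures, is the fairer way to read the output.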