optimization
110 TopicsAdvanced iRules: An Abstract View of iRules with the Tcl Bytecode Disassembler
In case you didn't already know, I'm a child of the 70's. As such, my formative years were in the 80's, where the music and movies were synthesized and super cheesy. Short Circuit was one of those cheesy movies, featuring Ally Sheedy, Steve Guttenberg, and Johnny Five, the tank-treaded laser-wielding robot with feelings and self awareness. Oh yeah! The plot...doesn't at all matter, but Johnny's big fear once reaching self actualization was being disassembled. Well, in this article, we won't dissamble Johnny Five, but we will take a look at disassembling some Tcl code and talk about optimizations. Tcl forms the foundation of several code environments on BIG-IP: iRules, iCall, tmsh, and iApps. The latter environments don't carry the burden of performance that iRules do, so efficiency isn't as big a concern. When we speak at conferences, we often commit some time to cover code optimization techniques due to the impactful nature of applying an iRule to live traffic. This isn't to say that the system isn't highly tuned and optimized already, it's just important not to introduce any more impact than is absolutely necessary to carry out purpose. In iRules, you can turn timing on to see the impact of an iRule, and in the Tcl shell (tclsh) you can use the time command. These are ultimately the best tools to see what the impact is going to be from a performance perspective. But if you want to see what the Tcl interpreter is actually doing from an instruction standpoint, well, you will need to disassemble the code. I've looked at bytecode in some of the python scripts I've written, but I wasn't aware of a way to do that in Tcl. I found a thread on stack that indicated it was possible, and after probing a little further was given a solution. This doesn't work in Tcl 8.4, which is what the BIG-IP uses, but it does work on 8.5+ so if you have a linux box with 8.5+ you're good to go. Note that there are variances from version to version that could absolutely change the way the interpreter works, so understand that this is an just an exercise in discovery. Solution 1 Fire up tclsh and then grab a piece of code. For simplicity, I'll use two forms of a simple math problem. The first is using the expr command to evaluate 3 times 4, and the second is the same math problem, but wraps the evaluation with curly brackets. The command that will show how the interpreter works its magic is tcl::unsupported::disassemble. ## ## unwrapped expression ## ## % tcl::unsupported::disassemble script { expr 3 * 4 } ByteCode 0x0x1e1ee20, refCt 1, epoch 16, interp 0x0x1d59670 (epoch 16) Source " expr 3 * 4 " Cmds 1, src 12, inst 14, litObjs 4, aux 0, stkDepth 5, code/src 0.00 Commands 1: 1: pc 0-12, src 1-11 Command 1: "expr 3 * 4 " (0) push1 0 # "3" (2) push1 1 # " " (4) push1 2 # "*" (6) push1 1 # " " (8) push1 3 # "4" (10) concat1 5 (12) exprStk (13) done ## ## wrapped expression ## ## % tcl::unsupported::disassemble script { expr { 3 * 4 } } ByteCode 0x0x1de7a40, refCt 1, epoch 16, interp 0x0x1d59670 (epoch 16) Source " expr { 3 * 4 } " Cmds 1, src 16, inst 3, litObjs 1, aux 0, stkDepth 1, code/src 0.00 Commands 1: 1: pc 0-1, src 1-15 Command 1: "expr { 3 * 4 } " (0) push1 0 # "12" (2) done Because the first expression is unwrapped, the interpreter has to build the expression and then call the runtime expression engine, resulting in 4 objects and a stack depth of 5. With the wrapped expression, the interpreter found a compile-time constant and used that directly, resulting in 1 object and a stack depth of 1 as well. Much thanks to Donal Fellows on Stack Overflow for the details. Using the time command in the shell, you can see that wrapping the expression results in a wildly more efficient experience. % time { expr 3 * 4 } 100000 1.02325 microseconds per iteration % time { expr {3*4} } 100000 0.07945 microseconds per iteration Solution 2 I was looking in earnest for some explanatory information for the bytecode fields displayed with tcl::unsupported::disassemble, and came across a couple pages on the Tcl wiki, one building on the other. Combining the pertinent sections of code from each page results in this script you can paste into tclsh: namespace eval tcl::unsupported {namespace export assemble} namespace import tcl::unsupported::assemble rename assemble asm interp alias {} disasm {} ::tcl::unsupported::disassemble proc aproc {name argl body args} { proc $name $argl $body set res [disasm proc $name] if {"-x" in $args} { set res [list proc $name $argl [list asm [dis2asm $res]]] eval $res } return $res } proc dis2asm body { set fstart " push -1; store @p; pop " set fstep " incrImm @p +1;load @l;load @p listIndex;store @i;pop load @l;listLength;lt " set res "" set wait "" set jumptargets {} set lines [split $body \n] foreach line $lines { ;#-- pass 1: collect jump targets if [regexp {\# pc (\d+)} $line -> pc] {lappend jumptargets $pc} } set lineno 0 foreach line $lines { ;#-- pass 2: do the rest incr lineno set line [string trim $line] if {$line eq ""} continue set code "" if {[regexp {slot (\d+), (.+)} $line -> number descr]} { set slot($number) $descr } elseif {[regexp {data=.+loop=%v(\d+)} $line -> ptr]} { #got ptr, carry on } elseif {[regexp {it%v(\d+).+\[%v(\d+)\]} $line -> copy number]} { set loopvar [lindex $slot($number) end] if {$wait ne ""} { set map [list @p $ptr @i $loopvar @l $copy] set code [string map $map $fstart] append res "\n $code ;# $wait" set wait "" } } elseif {[regexp {^ *\((\d+)\) (.+)} $line -> pc instr]} { if {$pc in $jumptargets} {append res "\n label L$pc;"} if {[regexp {(.+)#(.+)} $instr -> instr comment]} { set arg [list [lindex $comment end]] if [string match jump* $instr] {set arg L$arg} } else {set arg ""} set instr0 [normalize [lindex $instr 0]] switch -- $instr0 { concat - invokeStk {set arg [lindex $instr end]} incrImm {set arg [list $arg [lindex $instr end]]} } set code "$instr0 $arg" switch -- $instr0 { done { if {$lineno < [llength $lines]-2} { set code "jump Done" } else {set code ""} } startCommand {set code ""} foreach_start {set wait $line; continue} foreach_step {set code [string map $map $fstep]} } append res "\n [format %-24s $code] ;# $line" } } append res "\n label Done;\n" return $res } proc normalize instr { regsub {\d+$} $instr "" instr ;# strip off trailing length indicator set instr [string map { loadScalar load nop "" storeScalar store incrScalar1Imm incrImm } $instr] return $instr } Now that the script source is in place, you can test the two expressions we tested in solution 1. The output is very similar, however, there is less diagnostic information to go with the bytecode instructions. Still, the instructions are consistent between the two solutions. The difference here is that after "building" the proc, you can execute it, shown below each aproc expression. % aproc f x { expr 3 * 4 } -x proc f x {asm { push 3 ;# (0) push1 0 # "3" push { } ;# (2) push1 1 # " " push * ;# (4) push1 2 # "*" push { } ;# (6) push1 1 # " " push 4 ;# (8) push1 3 # "4" concat 5 ;# (10) concat1 5 exprStk ;# (12) exprStk ;# (13) done label Done; }} % f x 12 % aproc f x { expr { 3 * 4 } } -x proc f x {asm { push 12 ;# (0) push1 0 # "12" ;# (2) done label Done; }} % f x 12 Deeper Down the Rabbit Hole Will the internet explode if I switch metaphors from bad 80's movie to literary classic? I guess we'll find out. Simple comparisons are interesting, but now that we're peeling back the layers, let's look at something a little more complicated like a for loop and a list append. % tcl::unsupported::disassemble script { for { $x } { $x < 50 } { incr x } { lappend mylist $x } } ByteCode 0x0x2479d30, refCt 1, epoch 16, interp 0x0x23ef670 (epoch 16) Source " for { $x } { $x < 50 } { incr x } { lappend mylist $x " Cmds 4, src 57, inst 43, litObjs 5, aux 0, stkDepth 3, code/src 0.00 Exception ranges 2, depth 1: 0: level 0, loop, pc 8-16, continue 18, break 40 1: level 0, loop, pc 18-30, continue -1, break 40 Commands 4: 1: pc 0-41, src 1-56 2: pc 0-4, src 7-9 3: pc 8-16, src 37-54 4: pc 18-30, src 26-32 Command 1: "for { $x } { $x < 50 } { incr x } { lappend mylist $x }" Command 2: "$x " (0) push1 0 # "x" (2) loadStk (3) invokeStk1 1 (5) pop (6) jump1 +26 # pc 32 Command 3: "lappend mylist $x " (8) push1 1 # "lappend" (10) push1 2 # "mylist" (12) push1 0 # "x" (14) loadStk (15) invokeStk1 3 (17) pop Command 4: "incr x " (18) startCommand +13 1 # next cmd at pc 31 (27) push1 0 # "x" (29) incrStkImm +1 (31) pop (32) push1 0 # "x" (34) loadStk (35) push1 3 # "50" (37) lt (38) jumpTrue1 -30 # pc 8 (40) push1 4 # "" (42) done You'll notice that there are four commands in this code. The for loop, the x variable evaluation, the lappend operations, and the loop control with the incr command. There are a lot more instructions in this code, with jump pointers from the x interpretation to the incr statement, the less than comparison, then a jump to the list append. Wrapping Up I went through an exercise years ago to see how far I could minimize the Solaris kernel before it stopped working. I personally got down into the twenties before the system was unusable, but I think the record was somewhere south of 15. So...what's the point? Minimal for minimal's sake is not the point. Meet the functional objectives, that is job one. But then start tuning. Less is more. Less objects. Less stack depth. Less instantiation. Reviewing bytecode is good for that, and is possible with the native Tcl code. However, it is still important to test the code performance, as relying on bytecode objects and stack depth alone is not a good idea. For example, if we look at the bytecode differences with matching an IP address, there is no discernable difference from Tcl's perspective between the two regexp versions, and very little difference between the two regexp versions and the scan example. % dis script { regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} 192.168.101.20 _ a b c d } ByteCode 0x0x24cfd30, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15) Source " regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-" Cmds 1, src 90, inst 19, litObjs 8, aux 0, stkDepth 8, code/src 0.00 Commands 1: 1: pc 0-17, src 1-89 Command 1: "regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9" (0) push1 0 # "regexp" (2) push1 1 # "([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})" (4) push1 2 # "192.168.101.20" (6) push1 3 # "_" (8) push1 4 # "a" (10) push1 5 # "b" (12) push1 6 # "c" (14) push1 7 # "d" (16) invokeStk1 8 (18) done % dis script { regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ a b c d } ByteCode 0x0x24d1730, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15) Source " regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _" Cmds 1, src 64, inst 19, litObjs 8, aux 0, stkDepth 8, code/src 0.00 Commands 1: 1: pc 0-17, src 1-63 Command 1: "regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ " (0) push1 0 # "regexp" (2) push1 1 # "^(\d+)\.(\d+)\.(\d+)\.(\d+)$" (4) push1 2 # "192.168.101.20" (6) push1 3 # "_" (8) push1 4 # "a" (10) push1 5 # "b" (12) push1 6 # "c" (14) push1 7 # "d" (16) invokeStk1 8 (18) done % dis script { scan 192.168.101.20 %d.%d.%d.%d a b c d } ByteCode 0x0x24d1930, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15) Source " scan 192.168.101.20 %d.%d.%d.%d a b c d " Cmds 1, src 41, inst 17, litObjs 7, aux 0, stkDepth 7, code/src 0.00 Commands 1: 1: pc 0-15, src 1-40 Command 1: "scan 192.168.101.20 %d.%d.%d.%d a b c d " (0) push1 0 # "scan" (2) push1 1 # "192.168.101.20" (4) push1 2 # "%d.%d.%d.%d" (6) push1 3 # "a" (8) push1 4 # "b" (10) push1 5 # "c" (12) push1 6 # "d" (14) invokeStk1 7 (16) done However, if you look at the time results from these examples, they are very different. % time { regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} 192.168.101.20 matched a b c d } 100000 11.29749 microseconds per iteration % time { regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ a b c d } 100000 7.78696 microseconds per iteration % time { scan 192.168.101.20 %d.%d.%d.%d a b c d } 100000 1.03708 microseconds per iteration Why is that? Well, bytecode is a good indicator, but it doesn't address the inherent speed of the commands being invoked. Regex is a very slow operation comparatively. And within the regex engine, the second example is a simpler regex to evaluate, so it's faster (though less accurate, so make sure you are actually passing an IP address.) Then of course, scan shows off its optimized self in grand fashion. Was this a useful exercise in understanding Tcl under the hood? Drop some feedback in the comments if you'd like more tech tips like this that aren't directly covering a product feature or solution, but reveal some utility that assists in learning how things tick.1.3KViews1like3CommentsUseful IT. Bringing Health Record Transfer into the 21st Century.
I read the Life as a Healthcare CIO blog on occasion, mostly because as a former radiographer, health care records integration and other non-diagnostic IT use in healthcare is a passing interest of mine. Within the last hospital I worked at the systems didn’t communicate – not even close, as in there was no effort to make them do so. This intrigues me, as since I’ve entered IT I have watched technology uptake in healthcare slowly ramp up at a great curve behind the rest of the business world. Oh make no mistake, technology has been in overdrive on the equipment used, but things like systems interoperability and utilizing technology to make doctors, nurses, and tech’s lives easier is just slower in the medical world. A huge chunk of the resistance is grounded in a very common sense philosophy. “When people’s lives are on the line you do not rush willy-nilly to the newest gadget.” No one in healthcare says it that way – at least not to my knowledge – but that’s the essence of what they think. I can think of a few businesses that could use that same mentality applied occasionally with a slightly different twist: “When the company’s viability is on the line…” but that’s a different blog. Even with this very common-sense resistance, there has been a steady acceleration of uptake in technology use for things like patient records and prescriptions. It has been interesting to watch, as someone on the outside with plenty of experience with the way hospitals worked and their systems were all silos. Healthcare IT is to be commended for things like electronic prescription pads and instant transfer of (now nearly all electronic) X-Rays to those who need them to care for the patient. Applying the “this can help with little impact on critical care” or even “this can help with positive impact on critical care and little risk of negative impact” viewpoint as a counter to the above-noted resistance has produced some astounding results. A friend of mine from my radiographer days is manager of a Cardiac Cath Lab, and talking with him is just fun. “Dude, ninety percent of the pups coming out of Radiology schools can’t set an exposure!” is evidence that diagnostic tools are continuing to take advantage of technology – in this case auto-detecting XRay exposure limits. He has more glowing things to say about the non-diagnostic growth of technology within any given organization. But outside the organization? Well that’s a completely different story. The healthcare organization wants to keep your records safe and intact, and rarely even want to let you touch them. That’s just a case of the “intact” bit. Some people might want their records to not contain some portion – like their blood alcohol level when brought to the ER – and some people might inadvertently lose some portion of the record. While they’re more than happy to send them on a referral, and willing to give you a copy if you’re seeking a second opinion, these records all have one archaic quality. Paper. If I want to buy a movie, I can go to netflix, sign up, and stream it (at least many of them) to watch. If I want my medical records transferred to a specialist so I can get treatment before my left eye oozes out of its socket, they have to be copied, verified, and mailed. If they’re short or my eye is on the verge of falling out right this instant, then they might be faxed. But the bulk of records are mailed. Even overnight is another day lost in the treatment cycle. Recently – the last couple of years – there has been a movement to replicate the records delivery process electronically. As time goes on, more and more of your medical records are being stored digitally. It saves room, time, and makes it easier for a doctor to “request” your record should he need it in a hurry. It also makes it easier to track accidental or even intentional changes in records. While it didn’t happen as often as fear-mongers and ambulance chasers want you to believe, of course there are deletions and misplacements in the medical records of the 300 million US citizens. An electronic system never forgets, so while something as simple as a piece of paper falling out of a record could forever change it, in electronic form that can’t happen. Even an intentional deletion can be “deleted” as in not show up, but still there, stored with your other information so that changes can be checked should the need ever arise. The inevitable off-shoot of electronic records is the ability to communicate them between hospitals. If you’re in the ER in Tulsa, and your normal doctor is in Manhattan, getting your records quickly and accurately could save your life. So it made sense that as the percentage of new records that were electronic grew, someone would start to put together a way to communicate them. No doubt you’re familiar with the debate about national health information databases, a centralized location for records is a big screaming target from many people’s perspectives, while it is a potentially life-saving technological advancement to others (they’re both right, but I think the infosec crowd has the stronger argument). But a smart group of people put together a project to facilitate doing electronically exactly what is being done today physically. The process is that the patient (or another doctor) requests the records be sent, they are pulled out, copied, mailed or faxed, and then a follow-up or “record received” communication occurs to insure that the source doctor got your records where they belong. Electronically this equates to the same thing, but instead of “selected” you get “looked up”, and instead of “mailed or faxed” you get “sent electronically”. There’s a lot more to it, but that’s the gist of The Direct Project. There are several reasons I got sucked into reading about this project. From a former healthcare worker’s perspective, it’s very cool to see non-diagnostic technology making a positive difference in healthcare, from a patient perspective, I would like the transfer of records to be as streamlined as possible, from the InfoSec perspective (I did a couple of brief stints in InfoSec), I like that it is not a massive database, but rather a “faster transit” mechanism, and from an F5 perspective, the possibilities for our gear to help make this viable were in my mind while reading. While Dr. Halamka has a lot of interesting stuff on his blog, this is one I followed the links and read the information about. It’s a pretty cool initiative, and what may seem very limiting in their scope assumptions holds true to the Direct Project’s idea of replacing the transfer mechanism and not creating a centralized database. While they’re not specifying formats to use during said transfer, they do list some recommended reading on that topic. What they do have is a registry of people who can receive records, and a system for transferring data over the wire. They worry about DNS-style health-care provider lookups, transfer protocols, and encryption, which is certainly a large enough chunk for them to bite off, and then they show how they fit into the larger nation-wide healthcare electronic records efforts going on. I hope they get it right, and the system they’re helping to build results in near-instantaneous secure records transfers, but many inventions are a product of the time and society in which they live, and even if The Direct Project fails, something like it will eventually succeed. If you’re in Healthcare IT, this is certainly a way to add value to the organization, and worth checking out. Meanwhile, I’m going to continue to delve into their work and the work of other organizations they’ve linked to and see if there isn’t a way F5 can help. After all, we can compress, dedupe, and encrypt communications on-the-wire, and the entire system is about on-the-wire communications, so it seems like a perfectly logical route to explore. Though the patient care guy in me will be reading up as much as the IT guy, because healthcare was a very rewarding field that seriously needed a bit more non-diagnostic technology when I was doing it.282Views1like0Comments