optimization
Advanced iRules: An Abstract View of iRules with the Tcl Bytecode Disassembler
In case you didn't already know, I'm a child of the 70's. As such, my formative years were in the 80's, where the music and movies were synthesized and super cheesy. Short Circuit was one of those cheesy movies, featuring Ally Sheedy, Steve Guttenberg, and Johnny Five, the tank-treaded laser-wielding robot with feelings and self awareness. Oh yeah! The plot...doesn't at all matter, but Johnny's big fear once reaching self actualization was being disassembled. Well, in this article, we won't disassemble Johnny Five, but we will take a look at disassembling some Tcl code and talk about optimizations.

Tcl forms the foundation of several code environments on BIG-IP: iRules, iCall, tmsh, and iApps. The latter environments don't carry the burden of performance that iRules do, so efficiency isn't as big a concern. When we speak at conferences, we often commit some time to cover code optimization techniques due to the impactful nature of applying an iRule to live traffic. This isn't to say that the system isn't highly tuned and optimized already; it's just important not to introduce any more impact than is absolutely necessary to carry out your purpose. In iRules, you can turn timing on to see the impact of an iRule, and in the Tcl shell (tclsh) you can use the time command. These are ultimately the best tools to see what the impact is going to be from a performance perspective. But if you want to see what the Tcl interpreter is actually doing from an instruction standpoint, well, you will need to disassemble the code.

I've looked at bytecode in some of the python scripts I've written, but I wasn't aware of a way to do that in Tcl. I found a thread on Stack Overflow that indicated it was possible, and after probing a little further I was given a solution. This doesn't work in Tcl 8.4, which is what the BIG-IP uses, but it does work on 8.5+, so if you have a Linux box with 8.5+ you're good to go. Note that there are variances from version to version that could absolutely change the way the interpreter works, so understand that this is just an exercise in discovery.

Solution 1

Fire up tclsh and then grab a piece of code. For simplicity, I'll use two forms of a simple math problem. The first uses the expr command to evaluate 3 times 4, and the second is the same math problem, but wraps the evaluation with curly brackets. The command that will show how the interpreter works its magic is tcl::unsupported::disassemble.

##
## unwrapped expression
##

% tcl::unsupported::disassemble script { expr 3 * 4 }
ByteCode 0x0x1e1ee20, refCt 1, epoch 16, interp 0x0x1d59670 (epoch 16)
  Source " expr 3 * 4 "
  Cmds 1, src 12, inst 14, litObjs 4, aux 0, stkDepth 5, code/src 0.00
  Commands 1:
      1: pc 0-12, src 1-11
  Command 1: "expr 3 * 4 "
    (0) push1 0      # "3"
    (2) push1 1      # " "
    (4) push1 2      # "*"
    (6) push1 1      # " "
    (8) push1 3      # "4"
    (10) concat1 5
    (12) exprStk
    (13) done

##
## wrapped expression
##

% tcl::unsupported::disassemble script { expr { 3 * 4 } }
ByteCode 0x0x1de7a40, refCt 1, epoch 16, interp 0x0x1d59670 (epoch 16)
  Source " expr { 3 * 4 } "
  Cmds 1, src 16, inst 3, litObjs 1, aux 0, stkDepth 1, code/src 0.00
  Commands 1:
      1: pc 0-1, src 1-15
  Command 1: "expr { 3 * 4 } "
    (0) push1 0      # "12"
    (2) done

Because the first expression is unwrapped, the interpreter has to build the expression and then call the runtime expression engine, resulting in 4 objects and a stack depth of 5. With the wrapped expression, the interpreter found a compile-time constant and used that directly, resulting in 1 object and a stack depth of 1.
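The practical takeaway carries straight over to iRules: brace your expr (and if) expressions so the byte compiler can do this work once, up front. A hypothetical fragment to illustrate the habit (the header math is purely illustrative and assumes a Content-Length header is present):

when HTTP_REQUEST {
    # Unbraced: the expression is assembled as a string and handed to the
    # runtime expression engine on every request.
    # set limit [expr [HTTP::header value Content-Length] + 1024]

    # Braced: compiled once, evaluated directly.
    set limit [expr {[HTTP::header value Content-Length] + 1024}]
}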
Much thanks to Donal Fellows on Stack Overflow for the details. Using the time command in the shell, you can see that wrapping the expression results in a wildly more efficient experience. % time { expr 3 * 4 } 100000 1.02325 microseconds per iteration % time { expr {3*4} } 100000 0.07945 microseconds per iteration Solution 2 I was looking in earnest for some explanatory information for the bytecode fields displayed with tcl::unsupported::disassemble, and came across a couple pages on the Tcl wiki, one building on the other. Combining the pertinent sections of code from each page results in this script you can paste into tclsh: namespace eval tcl::unsupported {namespace export assemble} namespace import tcl::unsupported::assemble rename assemble asm interp alias {} disasm {} ::tcl::unsupported::disassemble proc aproc {name argl body args} { proc $name $argl $body set res [disasm proc $name] if {"-x" in $args} { set res [list proc $name $argl [list asm [dis2asm $res]]] eval $res } return $res } proc dis2asm body { set fstart " push -1; store @p; pop " set fstep " incrImm @p +1;load @l;load @p listIndex;store @i;pop load @l;listLength;lt " set res "" set wait "" set jumptargets {} set lines [split $body \n] foreach line $lines { ;#-- pass 1: collect jump targets if [regexp {\# pc (\d+)} $line -> pc] {lappend jumptargets $pc} } set lineno 0 foreach line $lines { ;#-- pass 2: do the rest incr lineno set line [string trim $line] if {$line eq ""} continue set code "" if {[regexp {slot (\d+), (.+)} $line -> number descr]} { set slot($number) $descr } elseif {[regexp {data=.+loop=%v(\d+)} $line -> ptr]} { #got ptr, carry on } elseif {[regexp {it%v(\d+).+\[%v(\d+)\]} $line -> copy number]} { set loopvar [lindex $slot($number) end] if {$wait ne ""} { set map [list @p $ptr @i $loopvar @l $copy] set code [string map $map $fstart] append res "\n $code ;# $wait" set wait "" } } elseif {[regexp {^ *\((\d+)\) (.+)} $line -> pc instr]} { if {$pc in $jumptargets} {append res "\n label L$pc;"} if {[regexp {(.+)#(.+)} $instr -> instr comment]} { set arg [list [lindex $comment end]] if [string match jump* $instr] {set arg L$arg} } else {set arg ""} set instr0 [normalize [lindex $instr 0]] switch -- $instr0 { concat - invokeStk {set arg [lindex $instr end]} incrImm {set arg [list $arg [lindex $instr end]]} } set code "$instr0 $arg" switch -- $instr0 { done { if {$lineno < [llength $lines]-2} { set code "jump Done" } else {set code ""} } startCommand {set code ""} foreach_start {set wait $line; continue} foreach_step {set code [string map $map $fstep]} } append res "\n [format %-24s $code] ;# $line" } } append res "\n label Done;\n" return $res } proc normalize instr { regsub {\d+$} $instr "" instr ;# strip off trailing length indicator set instr [string map { loadScalar load nop "" storeScalar store incrScalar1Imm incrImm } $instr] return $instr } Now that the script source is in place, you can test the two expressions we tested in solution 1. The output is very similar, however, there is less diagnostic information to go with the bytecode instructions. Still, the instructions are consistent between the two solutions. The difference here is that after "building" the proc, you can execute it, shown below each aproc expression. 
% aproc f x { expr 3 * 4 } -x proc f x {asm { push 3 ;# (0) push1 0 # "3" push { } ;# (2) push1 1 # " " push * ;# (4) push1 2 # "*" push { } ;# (6) push1 1 # " " push 4 ;# (8) push1 3 # "4" concat 5 ;# (10) concat1 5 exprStk ;# (12) exprStk ;# (13) done label Done; }} % f x 12 % aproc f x { expr { 3 * 4 } } -x proc f x {asm { push 12 ;# (0) push1 0 # "12" ;# (2) done label Done; }} % f x 12 Deeper Down the Rabbit Hole Will the internet explode if I switch metaphors from bad 80's movie to literary classic? I guess we'll find out. Simple comparisons are interesting, but now that we're peeling back the layers, let's look at something a little more complicated like a for loop and a list append. % tcl::unsupported::disassemble script { for { $x } { $x < 50 } { incr x } { lappend mylist $x } } ByteCode 0x0x2479d30, refCt 1, epoch 16, interp 0x0x23ef670 (epoch 16) Source " for { $x } { $x < 50 } { incr x } { lappend mylist $x " Cmds 4, src 57, inst 43, litObjs 5, aux 0, stkDepth 3, code/src 0.00 Exception ranges 2, depth 1: 0: level 0, loop, pc 8-16, continue 18, break 40 1: level 0, loop, pc 18-30, continue -1, break 40 Commands 4: 1: pc 0-41, src 1-56 2: pc 0-4, src 7-9 3: pc 8-16, src 37-54 4: pc 18-30, src 26-32 Command 1: "for { $x } { $x < 50 } { incr x } { lappend mylist $x }" Command 2: "$x " (0) push1 0 # "x" (2) loadStk (3) invokeStk1 1 (5) pop (6) jump1 +26 # pc 32 Command 3: "lappend mylist $x " (8) push1 1 # "lappend" (10) push1 2 # "mylist" (12) push1 0 # "x" (14) loadStk (15) invokeStk1 3 (17) pop Command 4: "incr x " (18) startCommand +13 1 # next cmd at pc 31 (27) push1 0 # "x" (29) incrStkImm +1 (31) pop (32) push1 0 # "x" (34) loadStk (35) push1 3 # "50" (37) lt (38) jumpTrue1 -30 # pc 8 (40) push1 4 # "" (42) done You'll notice that there are four commands in this code. The for loop, the x variable evaluation, the lappend operations, and the loop control with the incr command. There are a lot more instructions in this code, with jump pointers from the x interpretation to the incr statement, the less than comparison, then a jump to the list append. Wrapping Up I went through an exercise years ago to see how far I could minimize the Solaris kernel before it stopped working. I personally got down into the twenties before the system was unusable, but I think the record was somewhere south of 15. So...what's the point? Minimal for minimal's sake is not the point. Meet the functional objectives, that is job one. But then start tuning. Less is more. Less objects. Less stack depth. Less instantiation. Reviewing bytecode is good for that, and is possible with the native Tcl code. However, it is still important to test the code performance, as relying on bytecode objects and stack depth alone is not a good idea. For example, if we look at the bytecode differences with matching an IP address, there is no discernable difference from Tcl's perspective between the two regexp versions, and very little difference between the two regexp versions and the scan example. 
% dis script { regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} 192.168.101.20 _ a b c d }
ByteCode 0x0x24cfd30, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15)
  Source " regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-"
  Cmds 1, src 90, inst 19, litObjs 8, aux 0, stkDepth 8, code/src 0.00
  Commands 1:
      1: pc 0-17, src 1-89
  Command 1: "regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9"
    (0) push1 0      # "regexp"
    (2) push1 1      # "([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})"
    (4) push1 2      # "192.168.101.20"
    (6) push1 3      # "_"
    (8) push1 4      # "a"
    (10) push1 5     # "b"
    (12) push1 6     # "c"
    (14) push1 7     # "d"
    (16) invokeStk1 8
    (18) done

% dis script { regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ a b c d }
ByteCode 0x0x24d1730, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15)
  Source " regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _"
  Cmds 1, src 64, inst 19, litObjs 8, aux 0, stkDepth 8, code/src 0.00
  Commands 1:
      1: pc 0-17, src 1-63
  Command 1: "regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ "
    (0) push1 0      # "regexp"
    (2) push1 1      # "^(\d+)\.(\d+)\.(\d+)\.(\d+)$"
    (4) push1 2      # "192.168.101.20"
    (6) push1 3      # "_"
    (8) push1 4      # "a"
    (10) push1 5     # "b"
    (12) push1 6     # "c"
    (14) push1 7     # "d"
    (16) invokeStk1 8
    (18) done

% dis script { scan 192.168.101.20 %d.%d.%d.%d a b c d }
ByteCode 0x0x24d1930, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15)
  Source " scan 192.168.101.20 %d.%d.%d.%d a b c d "
  Cmds 1, src 41, inst 17, litObjs 7, aux 0, stkDepth 7, code/src 0.00
  Commands 1:
      1: pc 0-15, src 1-40
  Command 1: "scan 192.168.101.20 %d.%d.%d.%d a b c d "
    (0) push1 0      # "scan"
    (2) push1 1      # "192.168.101.20"
    (4) push1 2      # "%d.%d.%d.%d"
    (6) push1 3      # "a"
    (8) push1 4      # "b"
    (10) push1 5     # "c"
    (12) push1 6     # "d"
    (14) invokeStk1 7
    (16) done

However, if you look at the time results from these examples, they are very different.

% time { regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} 192.168.101.20 matched a b c d } 100000
11.29749 microseconds per iteration
% time { regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ a b c d } 100000
7.78696 microseconds per iteration
% time { scan 192.168.101.20 %d.%d.%d.%d a b c d } 100000
1.03708 microseconds per iteration

Why is that? Well, bytecode is a good indicator, but it doesn't address the inherent speed of the commands being invoked. Regex is a very slow operation comparatively. And within the regex engine, the second example is a simpler regex to evaluate, so it's faster (though less accurate, so make sure you are actually passing an IP address). Then of course, scan shows off its optimized self in grand fashion.

Was this a useful exercise in understanding Tcl under the hood? Drop some feedback in the comments if you'd like more tech tips like this that aren't directly covering a product feature or solution, but reveal some utility that assists in learning how things tick.
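One footnote on that accuracy point: scan by itself will happily accept something like 999.168.101.20, so if you lean on it for IP matching, pair the speed with a cheap sanity check. A minimal tclsh sketch of my own (not from the original exercise):

proc ip4ok {str} {
    # scan returns the number of successful conversions; require all four octets
    if {[scan $str %d.%d.%d.%d a b c d] != 4} { return 0 }
    # and require each octet to be in range
    foreach octet [list $a $b $c $d] {
        if {$octet < 0 || $octet > 255} { return 0 }
    }
    return 1
}

% ip4ok 192.168.101.20
1
% ip4ok 999.168.101.20
0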
Selective Compression on BIG-IP

BIG-IP provides Local Traffic Policies that simplify the way in which you can manage traffic associated with a virtual server. You can associate a BIG-IP local traffic policy to support selective compression for types of content that can benefit from compression, like HTML, XML, and CSS stylesheets. These file types can realize performance improvements, especially across slow connections, by compressing them. You can easily configure your BIG-IP system to use a simple Local Traffic Policy that selectively compresses these file types. In order to use a policy, you will want to create and configure a draft policy, publish that policy, and then associate the policy with a virtual server in BIG-IP v12.

Alright, let's log into a BIG-IP. The first thing you'll need to do is create a draft policy. On the main menu select Local Traffic > Policies > Policy List and then the Create or + button. This takes us to the create policy config screen. We'll name the policy SelectiveCompression, add a description like "This policy compresses file types," and we'll leave the Strategy as the default of Execute First matching rule. This is so the policy uses the first rule that matches the request. Click Create Policy, which saves the policy to the policies list.

When saved, the Rules search field appears but has no rules. Click Create under Rules. This brings us to the Rules General Properties area of the policy. We'll give this rule a name (CompressFiles), and then the first settings we need to configure are the conditions that need to match the request. Click the + button to associate file types. We know that the files for compression are comprised of specific file types associated with a Content-Type HTTP header. We choose HTTP Header and select Content-Type in the Named field. Select "begins with" next and type "text/" for the condition, and compress at the "response" time. We'll add another condition to manage CPU usage effectively, so we click CPU Usage from the list with a duration of 1 minute and a conditional operator of "less than or equal to" 5 as the usage level at response time.

Next, under Do the following, click the create + button to create a new action when those conditions are met. Here, we'll enable compression at the response time. Click Save. Now the draft policy screen appears with the General Properties and a list of rules. Here we want to click Save Draft.

Now we need to publish the draft policy and associate it with a virtual server. Select the policy and click Publish. Next, on the main menu click Local Traffic > Virtual Servers > Virtual Server List and click the name of the virtual server you'd like to associate for the policy. On the menu bar click Resources and for Policies click Manage. Move SelectiveCompression to the Enabled list and click Finished. The SelectiveCompression policy is now listed in the policies list and is associated with the chosen virtual server. The virtual server with the SelectiveCompression Local Traffic Policy will compress the file types you specified.

Congrats! You've now added a local traffic policy for selective compression! You can also watch the full video demo thanks to our TechPubs team.

ps
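If you'd rather do this in an iRule than an LTM policy, a rough functional equivalent is sketched below. This is a minimal sketch, assuming an HTTP profile and an HTTP compression profile are already attached to the virtual server; it also skips the CPU-usage guard from the policy above.

when HTTP_RESPONSE {
    # Compress text-based content types only; leave everything else untouched.
    if { [HTTP::header value Content-Type] starts_with "text/" } {
        COMPRESS::enable
    } else {
        COMPRESS::disable
    }
}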
Optimizing the CVE-2015-1635 iRule

A couple days ago an iRule was published that mitigates Microsoft's HTTP.sys vulnerability described in CVE-2015-1635 and MS15-034. It's a short rule, but it features the dreaded regex. Every time I get the chance at user groups, conferences, webinars, I preach the evils of regex on the data plane. One of the many reasons it is not the best choice is that just calling the regex engine is the equivalent of an 8x operation, let alone actually performing the matching. So I decided to look into some optimizations.

Original Rule

The rule as published is pretty simple. And actually, the regex is clean and the string it's matching against is not long, so whereas the instantiation penalty is still high, the matching penalty should not be, so this isn't a terrible use of regex. I've cleaned up the rule to only test the regex itself, setting the malicious header to a variable in CLIENT_ACCEPTED so I can just slam the box with an ab request (ab -n 5000 http://testvip/) from the BIG-IP cli.

when CLIENT_ACCEPTED {
    set x "GET / HTTP/1.1\r\nHost: stuff\r\nRange: bytes=0-18446744073709551615\r\n\r\n"
}
when HTTP_REQUEST {
    if { $x matches_regex {bytes\s*=.*[0-9]{10}} } {
        HTTP::respond 200 ok
    }
}

With this modified version, over 5000 requests my average CPU cycles landed at 45.4K on a BIG-IP VE on TMOS 11.6 running in VMware Fusion 5. Not bad at all, regex or not.

My Optimized Rule

My approach was to use scan to pull out the appropriate fields, then string commands to match the bytes and the digits in the number 10 or greater. For a scan deep dive, see my scan revisited article in the iRules 101 series.

when CLIENT_ACCEPTED {
    set x "GET / HTTP/1.1\r\nHost: stuff\r\nRange: bytes=0-18446744073709551615\r\n\r\n"
}
when HTTP_REQUEST {
    scan $x {%[^:]:%[^:]:%[^=]=%[^-]-%[0-9]} 0 0 a 0 b
    if { ($a eq " bytes") && ([string length $b] > 9) } {
        HTTP::respond 200 ok
    }
}

This is definitely faster at 42.6K average CPU cycles, but not nearly as much of an improvement as I had anticipated. Why? Well, scan is doing a lot, and then I'm using the ends_with operator, which is equivalent to a string match, and then also performing a string length operation and finally a comparison operation. So whereas it is a little faster, the number of operations performed was more than it needs to be, mostly because I wasn't clever enough to make a better string match.

Another Way Better Optimized Rule

Which leads us to the current course leader…who I'll leave unnamed for now, but you all know him (or her) very well in these parts. Back to a better string match. I love that 6+ years in, I learn new ways to manipulate the string commands in Tcl. I suppose if I really RTFM'd the Tcl docs a little better I'd know this, but I had no idea you could match quite the way this rule does.

when CLIENT_ACCEPTED {
    set x "GET / HTTP/1.1\r\nHost: stuff\r\nRange: bytes=0-18446744073709551615\r\n\r\n"
}
when HTTP_REQUEST {
    if { [string match -nocase {*bytes*=*[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]*} $x] } {
        HTTP::respond 200 ok
    }
}

So there you go. One simple string command. And it comes in at 31.3K average CPU cycles. That's a significant reduction.

(Almost Always) A Better Way

Sometimes regex is the only way. But it's a rare day when that is the case. There is almost always a better way than regex when it comes to iRules.
And since speed is the name of the game on the data plane, I highly encourage all you iRulers out there to not be like me and get yourself deep into the weeds of the Tcl string commands and arm yourself with some seriously powerful weapons. You've been charged. Go forth!

True or False: Application acceleration solutions teach developers to write inefficient code
It has been suggested that the use of application acceleration solutions as a means to improve application performance would result in programmers writing less efficient code. In a comment on “The House that Load Balancing Built” a reader replies: Not only will it cause the application to grow in cost and complexity, it's teaching new and old programmers to not write efficient code and rely on other products and services on [sic] thier behalf. I.E. Why write security into the app, when the ADC can do that for me. Why write code that executes faster, the ADC will do that for me, etc., etc. While no one can control whether a programmer writes “fast” code, the truth is that application acceleration solutions do not affect the execution of code in any way. A poorly constructed loop will run just as slow with or without an application acceleration solution in place. Complex mathematical calculations will execute with the same speed regardless of the external systems that may be in place to assist in improving application performance. The answer is, unequivocally, that the presence or lack thereof of an application acceleration solution should have no impact on the application developer because it does nothing to affect the internal execution of written code. If you answered false, you got the answer right. The question has to be, then, just what does an application acceleration solution do that improves performance? If it isn’t making the application logic execute faster, what’s the point? It’s a good question, and one that deserves an answer. Application acceleration is part of a solution we call “application delivery”. Application delivery focuses on improving application performance through optimization of the use and behavior of transport (TCP) and application transport (HTTP/S) protocols, offloading certain functions from the application that are more efficiently handled by an external often hardware-based system, and accelerating the delivery of the application data. OPTIMIZATION Application acceleration improves performance by understanding how these protocols (TCP, HTTP/S) interact across a WAN or LAN and acting on that understanding to improve its overall performance. There are a large number of performance enhancing RFCs (standards) around TCP that are usually implemented by application acceleration solutions. Delayed and Selective Acknowledgments (RFC 2018) Explicit Congestion Notification (RFC 3168) Limited and Fast Re-Transmits (RFC 3042 and RFC 2582) Adaptive Initial Congestion Windows (RFC 3390) Slow Start with Congestion Avoidance (RFC 2581) TCP Slow Start (RFC 3390) TimeStamps and Windows Scaling (RFC 1323) All of these RFCs deal with TCP and therefore have very little to do with the code developers create. Most developers code within a framework that hides the details of TCP and HTTP connection management from them. It is the rare programmer today that writes code to directly interact with HTTP connections, and even rare to find one coding directly at the TCP socket layer. The execution of code written by the developer takes just as long regardless of the implementation or lack of implementation of these RFCs. The application acceleration solution improves the performance of the delivery of the application data over TCP and HTTP which increases the performance of the application as seen from the user’s point of view. 
OFFLOAD Offloading compute intensive processing from application and web servers improves performance by reducing the consumption of CPU and memory required to perform those tasks. SSL and other encryption/decryption functions (cookie security, for example) are computationally expensive and require additional CPU and memory on the server. The reason offloading these functions to an application delivery controller or stand-alone application acceleration solution improves application performance is because it frees the CPU and memory available on the server and allows it to be dedicated to the application. If the application or web server does not need to perform these tasks, it saves CPU cycles that would otherwise be used to perform them. Those cycles can be used by the application and thus increases the performance of the application. Also beneficial is the way in which application delivery controllers manage TCP connections made to the web or application server. Opening and closing TCP connections takes time, and the time required is not something a developer – coding within a framework – can affect. Application acceleration solutions proxy connections for the client and subsequently reduce the number of TCP connections required on the web or application server as well as the frequency with which those connections need to be open and closed. By reducing the connections and frequency of connections the application performance is increased because it is not spending time opening and closing TCP connections, which are necessarily part of the performance equation but not directly affected by anything the developer does in his or her code. The commenter believes that an application delivery controller implementation should be an afterthought. However, the ability of modern application delivery controllers to offload certain application logic functions such as cookie security and HTTP header manipulation in a centralized, optimized manner through network-side scripting can be a performance benefit as well as a way to address browser-specific quirks and therefore should be seriously considered during the development process. ACCELERATION Finally, application acceleration solutions improve performance through the use of caching and compression technologies. Caching includes not just server-side caching, but the intelligent use of the client (usually the browser) cache to reduce the number of requests that must be handled by the server. By reducing the number of requests the server is responding to, the web or application server is less burdened in terms of managing TCP and HTTP sessions and state, and has more CPU cycles and memory that can be dedicated to executing the application. Compression, whether using traditional industry standard web-based compression (GZip) or WAN-focused data de-duplication techniques, decreases the amount of data that must be transferred from the server to the client. Decreasing traffic (bandwidth) results in fewer packets traversing the network which results in quicker delivery to the user. This makes it appear that the application is performing faster than it is, simply because it arrived sooner. Of all these techniques, the only one that could possibly contribute to the delinquency of developers is caching. This is because application acceleration caching features act on HTTP caching headers that can be set by the developer, but rarely are. 
These headers can also be configured by the web or application server administrator, but rarely are in a way that makes sense because most content today is generated dynamically and is rarely static, even though individual components inside the dynamically generated page may in fact be very static (CSS, JavaScript, images, headers, footers, etc…). However, the methods through which caching (pragma) headers are set are fairly standard and the actual code is usually handled by the framework in which the application is developed, meaning the developer ultimately cannot affect the efficiency of the use of this method because it was developed by someone else.

The point of the comment was likely more broad, however. I am fairly certain that the commenter meant to imply that if developers know the performance of the application they are developing will be accelerated by an external solution, they will not be as concerned about writing efficient code. That's a layer 8 (people) problem that isn't peculiar to application delivery solutions at all. If a developer is going to write inefficient code, there's a problem – but that problem isn't with the solutions implemented to improve the end-user experience or scalability, it's a problem with the developer. No technology can fix that.
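As a footnote to the caching point above, this is exactly the kind of policy an application delivery controller can apply centrally, without touching application code. A hedged iRule sketch of the idea (the content types and lifetime are illustrative only):

when HTTP_RESPONSE {
    # Stamp a cache policy onto static object types the application rarely sets headers for.
    switch -glob [string tolower [HTTP::header value Content-Type]] {
        image/* -
        text/css* -
        *javascript* {
            HTTP::header replace Cache-Control "public, max-age=86400"
        }
    }
}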
The Image Format Wars

In the late 1990's we experienced the browser wars between Internet Explorer and Netscape Navigator; more recently the battle has waged on between IE, Firefox, Chrome, Safari and Opera. Now be prepared for the next battle – the image format wars. Browsers are now battling for a more optimized image format, with Microsoft, Google, and Mozilla all introducing new image formats. But before we go into these new image formats, let's look at why this is emerging and the history behind web-based images.

The most popular image formats on the web today are GIF, JPEG and PNG. The oldest image format is the GIF, introduced in 1987, but its popularity is waning in favor of image formats with more color options and better compression algorithms like JPEG (introduced in the early 90s) and PNG (introduced in 1995). These image formats have worked for many years, so why the sudden interest in introducing new file formats? If it ain't broke, don't fix it. Well, it's not that the image formats are broken, but rather how they impact page performance is the issue. The size of the average web page is growing exponentially and the largest contributor to that is images. According to HttpArchive, images comprise 65% of the content of a typical web page at just over 1MB in total size. This presents a huge opportunity for improving the web browsing experience. Doesn't everybody want a faster loading web page and to use less data on their smartphones? This has led to the search for a way to optimize images without sacrificing quality.

This brings us to the introduction of new image formats and encoding algorithms that are in contention to improve the performance of images on the web. Google introduced WebP, Microsoft introduced JPEG XR in IE 9, and Mozilla announced earlier this year a new project entitled mozjpeg.

Let's start with WebP and JPEG XR; both provide significant savings when it comes to the size of images. We did some testing recently to see how much improvement can be achieved with WebP and JPEG XR. The test consisted of 417 JPEG images and 50 PNG images. The results showed that both provided substantial benefits, reducing the file size by over 50%. If the results are this good, why aren't more sites using these formats? The answer: browser compatibility. WebP only works in Chrome and Opera while JPEG XR only works in IE. If you want your users to see the benefits of these new formats you need to create 3 versions of every image, which can be time consuming. Luckily the image optimization functionality in BIG-IP Application Acceleration Manager allows an administrator to click a single check mark enabling an image to be converted to the appropriate format on a per client basis. This eliminates the need to create multiple versions of the same object and all users can benefit from the optimizations. No, wait – not all users: only clients connecting with certain IE versions, Chrome and Opera will benefit from the new format types.

This is where mozjpeg is looking to change the game. Mozjpeg's goal is to improve the compressibility of JPEGs without sacrificing quality. The beauty of this project is that the file format would remain a JPEG, meaning that every browser and user would be able to see the benefits of reduced image sizes. It is interesting to track the progress of this project and see who will win the image wars; in the meantime, optimize the images you can for the users that you can.
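Until something like mozjpeg levels the field, the usual workaround is content negotiation: serve the new format only to browsers that ask for it. A hedged iRule sketch of that idea (the path convention is hypothetical and assumes pre-generated .webp variants exist on the origin; responses should also carry a Vary: Accept header):

when HTTP_REQUEST {
    # Browsers that support WebP advertise it in the Accept header.
    if { [HTTP::header value Accept] contains "image/webp" &&
         [string tolower [HTTP::path]] ends_with ".jpg" } {
        # Point the request at the WebP variant instead.
        HTTP::path "[HTTP::path].webp"
    }
}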
F5 Synthesis: Trickle Up Performance

#sdas #webperf New TCP algorithms and protocols in a platform net improved performance for apps.

The original RFC for TCP (793) was written in September of 1981. Let's pause for a moment and reflect on that date. When TCP was introduced, applications on "smart" devices were relegated to science fiction and the use of the Internet by your average consumer was still more than two decades away. Yet TCP remains, like IP, uncontested as "the" transport protocol of ... everything. New application architectures, networks, and devices all introduce challenges with respect to how TCP behaves. Over the years it's become necessary to tweak the protocol with new algorithms designed to address everything from congestion control to window size control to key management for the TCP MD5 signature option to header compression. There are, in fact, over 100 (that's where I stopped counting) TCP-related RFCs on the digital books. The most recent addition is RFC 6897, or Multipath TCP (MPTCP) as it's more commonly referred to. I wouldn't take any bets against there being more in the future, if I were you.

What most of these RFCs related to TCP have in common is that they attempt to address some issue that's holding application performance hostage. Congestion, for example, is a huge network issue that can impact TCP performance (and subsequently the performance of applications) in a very negative way. Congestion can cause lost packets and, for TCP at least, that often means retransmission. You see, TCP is very particular about the order in which it receives packets, and is also designed to be reliable. That means if a packet is lost, you're going to hear about it (or more accurately your infrastructure will hear about it and need to resend it). All those retransmitted packets, trying to traverse an already congested network... well, you can imagine it isn't exactly a good thing. So there are a variety of congestion control algorithms designed to better manage TCP in such situations. From TCP Reno to Vegas to Illinois to H-TCP, algorithms are the most common way in which network stacks in general deal with congestion. The important thing to remember about this is that performance trickles up. Improvements in the TCP stack benefit those layers that reside above, like the application.

F5 Synthesis: Faster Platforms Means Faster Apps

The F5 platforms that comprise the Synthesis High Performance Service Fabric are no different. They implement the vast majority of the congestion control algorithms available to ensure the fastest TCP stack we can offer. In the latest release of our platforms, we also added an F5-created algorithm - TCP Woodside - designed to use both loss- and latency-based algorithms to improve, in particular, the performance of applications operating over mobile networks. By implementing at the TCP - the platform - layer, all applications receive the benefit of improved performance, but it's particularly noticeable in mobile environments because of the differences inherent in mobile versus fixed networks.

Also new in our latest platforms is support for MPTCP, another potential boon for mobile application users. MPTCP is designed to improve performance by enabling the use of multiple TCP subflows over a single TCP connection. Messages can then be dynamically routed across those subflows. For web applications, this can result in significantly faster retrieval of the more than 90 objects that comprise the average web page today.
Synthesis 1.5 comprises a variety of new performance and security-related features that improve the platforms that make up its High Performance Services Fabric. What that ultimately means for customers and users alike is faster apps.

For more information on Synthesis:
- F5 Synthesis Site
- More DevCentral articles on F5 Synthesis

A Brief History of TCP
The foundation of the Internet is TCP. It includes the rules for formatting messages, handling congestion and error correction, and provides information on where a packet should be delivered, whether it arrived too quickly for the receiving computer, and whether it arrived at all. Basically, TCP is the glue that ensures network conditions work smoothly. As the way people connect to the internet has changed from 2400 baud modems to high speed fiber to the home and mobile connectivity, TCP is being stretched to perform optimally across multiple channels. It seems like for many years development and innovation in TCP had slowed down; now, with mobile becoming a more popular way of accessing the internet and initiatives underway to make the web faster, there is renewed development and research into TCP. The table below highlights some of the major events in the life of TCP over the past 40 years.

1974: RFC 675, the specification of the Internet Transmission Control Protocol, published.
1988: Jacobson's "Congestion Avoidance and Control" paper sparked multiple congestion-related improvements, including the Tahoe release of BSD.
1990: Reno introduced in BSD.
1994: TCP Vegas altered the way in which set timeouts and RTT delays were measured.
1997: RFC 2001 proposed as standard for dealing with TCP slow start, congestion avoidance, fast retransmit and fast recovery.
1999: RFC 2581, proposed standard for TCP congestion control, obsoletes RFC 2001.
2002: First 3G network available commercially.
2006: First 4G system deployed in South Korea.
2009: RFC 5681, draft standard for TCP congestion control, obsoletes RFC 2581.
2010: TCP Fast Open proposed.
2011: Proposal to reduce the retransmission timeout from 3 seconds to 1.
2013: RFC 6824, the specification for Multipath TCP, published as an experimental standard; QUIC proposed as an alternative to TCP for web-based applications.

TCP was designed in 1973, during the infancy of the internet, and was made for a wired infrastructure, namely ARPANET, a low capacity network of 213 computers - very different from today, where there are more than 500 million connected devices in US homes. As the network of computers increased and the internet expanded, "congestion collapses" began to occur, where the transmission rates of networks dropped by a thousand fold, from 32 Kbps to 40 bps. This drastic drop in rates led to some investigation and analysis by leading computer scientists and ultimately led to the creation of what we now know as congestion control.

The internet has been advancing at a rate that nobody could imagine, where today not only are our computers connected but also our phones, televisions, cars and even our glasses. With the rise of more connected devices, we are using mobile networks such as 3G and 4G and high capacity fixed line networks to access information. Needless to say, these networks have very different characteristics in comparison to their ancestral networks. These different characteristics require that TCP stacks evolve to become more effective in the changing landscape. Successful, reliable and fast transmission of data is all controlled by a number of TCP parameters that can be tuned and modified depending on the network characteristics. Within BIG-IP we break down the TCP properties into the following major categories:

- Resource Management
- Behavior
- Performance

Over the coming weeks we will be diving into each of these categories in further detail.

1024 Words: I Didn't Say It Was Your Fault, I Said I Was Going to Blame You
#webperf We often lay the blame for application performance woes on the nebulous (and apparently sentient-with-malevolent-tendencies) "network". But the truth is that the causes of application performance woes are more often than not related to the "first*" and "last" mile of connectivity. That's why optimization is as important, often more so, than acceleration. And yes, there is a difference.

* That whole "first mile" thing isn't the network as we generally see it, per se, but the network internal to the server. It's complicated, engineering things. Trust me, there's a bus over which data has to travel that slows things down. Besides, the analogy doesn't work well if there isn't a "first" mile to match the "last" mile, so just run with it, okay?

The Disadvantages of DSR (Direct Server Return)
I read a very nice blog post yesterday discussing some of the traditional pros and cons of load-balancing configurations. The author comes to the conclusion that if you can use direct server return, you should. I agree with the author's list of pros and cons; DSR is the least intrusive method of deploying a load-balancer in terms of network configuration. But there are quite a few disadvantages missing from the author's list.

Author's List of Disadvantages of DSR

The disadvantages of Direct Routing are:
- Backend server must respond to both its own IP (for health checks) and the virtual IP (for load balanced traffic).
- Port translation or cookie insertion cannot be implemented.
- The backend server must not reply to ARP requests for the VIP (otherwise it will steal all the traffic from the load balancer).
- Prior to Windows Server 2008, some odd routing behavior could occur.
- In some situations either the application or the operating system cannot be modified to utilise Direct Routing.

Some additional disadvantages:
- Protocol sanitization can't be performed. This means vulnerabilities introduced due to manipulation or lax enforcement of RFCs and protocol specifications can't be addressed.
- Application acceleration can't be applied. Even the simplest of acceleration techniques, e.g. compression, can't be applied because the traffic is bypassing the load-balancer (a.k.a. application delivery controller).
- Implementing caching solutions becomes more complex. With a DSR configuration, the routing that makes it so easy to implement requires that caching solutions be deployed elsewhere, such as via WCCP on the router. This requires additional configuration and changes to the routing infrastructure, and introduces another point of failure as well as an additional hop, increasing latency.
- Error/Exception/SOAP fault handling can't be implemented. In order to address failures in applications such as missing files (404) and SOAP Faults (500) it is necessary for the load-balancer to inspect outbound messages. Using a DSR configuration this ability is lost, which means errors are passed directly back to the user without the ability to retry a request, write an entry in the log, or notify an administrator (a quick sketch of this follows below).
- Data Leak Prevention can't be accomplished. Without the ability to inspect outbound messages, you can't prevent sensitive data (SSN, credit card numbers) from leaving the building.
- Connection Optimization functionality is lost. TCP multiplexing can't be accomplished in a DSR configuration because it relies on separating client connections from server connections. This reduces the efficiency of your servers and minimizes the value added to your network by a load balancer.

There are more disadvantages than you're likely willing to read, so I'll stop there. Suffice to say that the problem with the suggestion to use DSR whenever possible is that if you're an application-aware network administrator you know that most of the time, DSR isn't the right solution, because it restricts the ability of the load-balancer (application delivery controller) to perform additional functions that improve the security, performance, and availability of the applications it is delivering. DSR is well-suited, and always has been, to UDP-based streaming applications such as audio and video delivered via RTSP. However, in the increasingly sensitive environment that is application infrastructure, it is necessary to do more than just "load balancing" to improve the performance and reliability of applications.
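To make that error-handling point concrete, here is a hedged sketch of the kind of response-side logic a full-proxy deployment allows and a DSR deployment gives up. It follows a common DevCentral pattern of saving the request so it can be replayed; a production rule would also cap the number of retries.

when HTTP_REQUEST {
    # Keep a copy of the request headers so the request can be replayed on error.
    set request_headers [HTTP::request]
}
when HTTP_RESPONSE {
    if { [HTTP::status] >= 500 } {
        # Log the failure and resend the request rather than passing the error back to the user.
        log local0. "Server error [HTTP::status] from [IP::server_addr], retrying request"
        HTTP::retry $request_headers
    }
}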
Additional application delivery techniques are an integral component to a well-performing, efficient application infrastructure. DSR may be easier to implement and, in some cases, may be the right solution. But in most cases, it's going to leave you simply serving applications, instead of delivering them. Just because you can, doesn't mean you should.

HTML5 WebSockets Illustrates Need for Programmability in the Network
#HTML5 #SDN The increasing use of HTML5 WebSockets illustrates one of the lesser mentioned value propositions of SDN – and ADN: extensibility. It's likely that IT network and security staff would agree that HTML5 WebSockets has the potential for high levels of disruptions (and arguments) across the data center. Developers want to leverage the ability to define their own protocols while reaping the benefits of the HTTP-as-application-transport paradigm. Doing so, however, introduces security risks and network challenges as never-before-seen protocols start streaming through firewalls, load balancers, caches and other network-hosted intermediaries that IT network and security pros are likely to balk at. Usually because they're the last to know, and by the time they do – it's already too late to raise objections. Aside from the obvious "you folks need to talk more" (because that's always been the answer and as of yet has failed to actually occur) there are other answers. Perhaps not turn-key, perhaps not easy, but there are other answers. One of them points to a rarely discussed benefit of SDN that has long been true for ADN but is often overlooked: extensibility through programmability. In addition, leveraging the SDN controller’s centralized intelligence, IT can alter network behavior in real-time and deploy new applications and network services in a matter of hours or days, rather than the weeks or months needed today. By centralizing network state in the control layer, SDN gives network managers the flexibility to configure, manage, secure, and optimize network resources via dynamic, automated SDN programs. Moreover, they can write these programs themselves and not wait for features to be embedded in vendors’ proprietary and closed software environments in the middle of the network. -- ONF, Software-Defined Networking: The New Norm for Networks The ability to alter behavior of any network component in real-time, to make what has been traditionally static dynamic enough to adapt to changing conditions is the goal of many modern technology innovations including SDN (the network) and cloud computing (applications and services). When developers and vendors can create and deploy new protocols and toss them over the wall into a production environment, operations needs the ability to adapt the network and delivery infrastructure to ensure the continued enforcement of security policies as well as provide support to assure availability and performance expectations are met. Doing so requires extensibility in the network. Ultimately that means programmability. EXTENSIBILITY through PROGRAMMABILITY While most of the networking world is focused on OpenFlow and VXLAN and NVGRE and virtual network gateways, the value of the ability to extend SDN through applications seems to be grossly underestimated. The premise of SDN is that the controller's functionality can be extended through specific applications that provide for handling of new protocols, provide new methods of managing flows, and do other nifty things that likely only network geeks would truly appreciate. The ability to extend packet processing and add new functions or support for new protocols rapidly, through software, is a significant part of the value proposition of SDN. Likewise, it illustrates the value of the same capabilities that currently exist in ADN solutions. ADN, too, enables extensibility through programmability. 
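As a small, hedged illustration of what that programmability looks like on the ADN side today, an iRule can recognize a WebSocket upgrade and apply policy before a custom framed protocol ever starts flowing (the endpoint path below is purely illustrative):

when HTTP_REQUEST {
    # WebSocket handshakes arrive as ordinary HTTP requests carrying an Upgrade header.
    if { [string tolower [HTTP::header value Upgrade]] equals "websocket" } {
        log local0. "WebSocket upgrade for [HTTP::host][HTTP::uri], subprotocol: [HTTP::header value Sec-WebSocket-Protocol]"
        # Example policy: only permit upgrades on an expected endpoint.
        if { not ([HTTP::path] starts_with "/ws/") } {
            HTTP::respond 403 content "WebSocket upgrade not permitted here"
        }
    }
}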
While varying degrees of control and capabilities exist across the ADN spectrum, at least some provide complete programmatic control over traffic management by offering the ability to "plug-in" applications (of a sort) that provide support for application-specific handling or new (and often proprietary) protocols, like those used to exchange data over WebSockets-transported connections. What both afford is the ability to extend the functionality of the network (SDN) or application traffic management (ADN) without requiring upgrades or new products. This has been a significant source of value for organizations with respect to security, who often turn to the ADN solutions topologically positioned in a strategic point of control within the network to address zero-day or emerging exploits for which there are no quick fixes. When it comes to something like dealing with custom (proprietary) application protocols and the use of WebSockets, for which network infrastructure services naturally have no support, the extensibility of SDN and ADN is a boon to network and security staff looking for ways in which to secure and address operational risk associated with new and heretofore unknown protocols.

Related reading:
- The Need for (HTML5) Speed
- SPDY versus HTML5 WebSockets
- Oops! HTML5 Does It Again
- Reactive, Proactive, Predictive: SDN Models
- The Next IT Killer Is… Not SDN
- SDN is Network Control. ADN is Application Control.