optimization
114 TopicsThe Disadvantages of DSR (Direct Server Return)
I read a very nice blog post yesterday discussing some of the traditional pros and cons of load-balancing configurations. The author comes to the conclusion that if you can use direct server return, you should. I agree with the author's list of pros and cons; DSR is the least intrusive method of deploying a load-balancer in terms of network configuration. But there are quite a few disadvantages missing from the author's list. Author's List of Disadvantages of DSR The disadvantages of Direct Routing are: Backend server must respond to both its own IP (for health checks) and the virtual IP (for load balanced traffic) Port translation or cookie insertion cannot be implemented. The backend server must not reply to ARP requests for the VIP (otherwise it will steal all the traffic from the load balancer) Prior to Windows Server 2008 some odd routing behavior could occur in In some situations either the application or the operating system cannot be modified to utilse Direct Routing. Some additional disadvantages: Protocol sanitization can't be performed. This means vulnerabilities introduced due to manipulation of lax enforcement of RFCs and protocol specifications can't be addressed. Application acceleration can't be applied. Even the simplest of acceleration techniques, e.g. compression, can't be applied because the traffic is bypassing the load-balancer (a.k.a. application delivery controller). Implementing caching solutions become more complex. With a DSR configuration the routing that makes it so easy to implement requires that caching solutions be deployed elsewhere, such as via WCCP on the router. This requires additional configuration and changes to the routing infrastructure, and introduces another point of failure as well as an additional hop, increasing latency. Error/Exception/SOAP fault handling can't be implemented. In order to address failures in applications such as missing files (404) and SOAP Faults (500) it is necessary for the load-balancer to inspect outbound messages. Using a DSR configuration this ability is lost, which means errors are passed directly back to the user without the ability to retry a request, write an entry in the log, or notify an administrator. Data Leak Prevention can't be accomplished. Without the ability to inspect outbound messages, you can't prevent sensitive data (SSN, credit card numbers) from leaving the building. Connection Optimization functionality is lost. TCP multiplexing can't be accomplished in a DSR configuration because it relies on separating client connections from server connections. This reduces the efficiency of your servers and minimizes the value added to your network by a load balancer. There are more disadvantages than you're likely willing to read, so I'll stop there. Suffice to say that the problem with the suggestion to use DSR whenever possible is that if you're an application-aware network administrator you know that most of the time, DSR isn't the right solution because it restricts the ability of the load-balancer (application delivery controller) to perform additional functions that improve the security, performance, and availability of the applications it is delivering. DSR is well-suited, and always has been, to UDP-based streaming applications such as audio and video delivered via RTSP. However, in the increasingly sensitive environment that is application infrastructure, it is necessary to do more than just "load balancing" to improve the performance and reliability of applications. Additional application delivery techniques are an integral component to a well-performing, efficient application infrastructure. DSR may be easier to implement and, in some cases, may be the right solution. But in most cases, it's going to leave you simply serving applications, instead of delivering them. Just because you can, doesn't mean you should.5.9KViews0likes4CommentsWhat is server offload and why do I need it?
One of the tasks of an enterprise architect is to design a framework atop which developers can implement and deploy applications consistently and easily. The consistency is important for internal business continuity and reuse; common objects, operations, and processes can be reused across applications to make development and integration with other applications and systems easier. Architects also often decide where functionality resides and design the base application infrastructure framework. Application server, identity management, messaging, and integration are all often a part of such architecture designs. Rarely does the architect concern him/herself with the network infrastructure, as that is the purview of “that group”; the “you know who I’m talking about” group. And for the most part there’s no need for architects to concern themselves with network-oriented architecture. Applications should not need to know on which VLAN they will be deployed or what their default gateway might be. But what architects might need to know – and probably should know – is whether the network infrastructure supports “server offload” of some application functions or not, and how that can benefit their enterprise architecture and the applications which will be deployed atop it. WHAT IT IS Server offload is a generic term used by the networking industry to indicate some functionality designed to improve the performance or security of applications. We use the term “offload” because the functionality is “offloaded” from the server and moved to an application network infrastructure device instead. Server offload works because the application network infrastructure is almost always these days deployed in front of the web/application servers and is in fact acting as a broker (proxy) between the client and the server. Server offload is generally offered by load balancers and application delivery controllers. You can think of server offload like a relay race. The application network infrastructure device runs the first leg and then hands off the baton (the request) to the server. When the server is finished, the application network infrastructure device gets to run another leg, and then the race is done as the response is sent back to the client. There are basically two kinds of server offload functionality: Protocol processing offload Protocol processing offload includes functions like SSL termination and TCP optimizations. Rather than enable SSL communication on the web/application server, it can be “offloaded” to an application network infrastructure device and shared across all applications requiring secured communications. Offloading SSL to an application network infrastructure device improves application performance because the device is generally optimized to handle the complex calculations involved in encryption and decryption of secured data and web/application servers are not. TCP optimization is a little different. We say TCP session management is “offloaded” to the server but that’s really not what happens as obviously TCP connections are still opened, closed, and managed on the server as well. Offloading TCP session management means that the application network infrastructure is managing the connections between itself and the server in such a way as to reduce the total number of connections needed without impacting the capacity of the application. This is more commonly referred to as TCP multiplexing and it “offloads” the overhead of TCP connection management from the web/application server to the application network infrastructure device by effectively giving up control over those connections. By allowing an application network infrastructure device to decide how many connections to maintain and which ones to use to communicate with the server, it can manage thousands of client-side connections using merely hundreds of server-side connections. Reducing the overhead associated with opening and closing TCP sockets on the web/application server improves application performance and actually increases the user capacity of servers. TCP offload is beneficial to all TCP-based applications, but is particularly beneficial for Web 2.0 applications making use of AJAX and other near real-time technologies that maintain one or more connections to the server for its functionality. Protocol processing offload does not require any modifications to the applications. Application-oriented offload Application-oriented offload includes the ability to implement shared services on an application network infrastructure device. This is often accomplished via a network-side scripting capability, but some functionality has become so commonplace that it is now built into the core features available on application network infrastructure solutions. Application-oriented offload can include functions like cookie encryption/decryption, compression, caching, URI rewriting, HTTP redirection, DLP (Data Leak Prevention), selective data encryption, application security functionality, and data transformation. When network-side scripting is available, virtually any kind of pre or post-processing can be offloaded to the application network infrastructure and thereafter shared with all applications. Application-oriented offload works because the application network infrastructure solution is mediating between the client and the server and it has the ability to inspect and manipulate the application data. The benefits of application-oriented offload are that the services implemented can be shared across multiple applications and in many cases the functionality removes the need for the web/application server to handle a specific request. For example, HTTP redirection can be fully accomplished on the application network infrastructure device. HTTP redirection is often used as a means to handle application upgrades, commonly mistyped URIs, or as part of the application logic when certain conditions are met. Application security offload usually falls into this category because it is application – or at least application data – specific. Application security offload can include scanning URIs and data for malicious content, validating the existence of specific cookies/data required for the application, etc… This kind of offload improves server efficiency and performance but a bigger benefit is consistent, shared security across all applications for which the service is enabled. Some application-oriented offload can require modification to the application, so it is important to design such features into the application architecture before development and deployment. While it is certainly possible to add such functionality into the architecture after deployment, it is always easier to do so at the beginning. WHY YOU NEED IT Server offload is a way to increase the efficiency of servers and improve application performance and security. Server offload increases efficiency of servers by alleviating the need for the web/application server to consume resources performing tasks that can be performed more efficiently on an application network infrastructure solution. The two best examples of this are SSL encryption/decryption and compression. Both are CPU intense operations that can consume 20-40% of a web/application server’s resources. By offloading these functions to an application network infrastructure solution, servers “reclaim” those resources and can use them instead to execute application logic, serve more users, handle more requests, and do so faster. Server offload improves application performance by allowing the web/application server to concentrate on what it is designed to do: serve applications and putting the onus for performing ancillary functions on a platform that is more optimized to handle those functions. Server offload provides these benefits whether you have a traditional client-server architecture or have moved (or are moving) toward a virtualized infrastructure. Applications deployed on virtual servers still use TCP connections and SSL and run applications and therefore will benefit the same as those deployed on traditional servers. I am wondering why not all websites enabling this great feature GZIP? 3 Really good reasons you should use TCP multiplexing SOA & Web 2.0: The Connection Management Challenge Understanding network-side scripting I am in your HTTP headers, attacking your application Infrastructure 2.0: As a matter of fact that isn't what it means2.7KViews0likes1CommentAdvanced iRules: An Abstract View of iRules with the Tcl Bytecode Disassembler
In case you didn't already know, I'm a child of the 70's. As such, my formative years were in the 80's, where the music and movies were synthesized and super cheesy. Short Circuit was one of those cheesy movies, featuring Ally Sheedy, Steve Guttenberg, and Johnny Five, the tank-treaded laser-wielding robot with feelings and self awareness. Oh yeah! The plot...doesn't at all matter, but Johnny's big fear once reaching self actualization was being disassembled. Well, in this article, we won't dissamble Johnny Five, but we will take a look at disassembling some Tcl code and talk about optimizations. Tcl forms the foundation of several code environments on BIG-IP: iRules, iCall, tmsh, and iApps. The latter environments don't carry the burden of performance that iRules do, so efficiency isn't as big a concern. When we speak at conferences, we often commit some time to cover code optimization techniques due to the impactful nature of applying an iRule to live traffic. This isn't to say that the system isn't highly tuned and optimized already, it's just important not to introduce any more impact than is absolutely necessary to carry out purpose. In iRules, you can turn timing on to see the impact of an iRule, and in the Tcl shell (tclsh) you can use the time command. These are ultimately the best tools to see what the impact is going to be from a performance perspective. But if you want to see what the Tcl interpreter is actually doing from an instruction standpoint, well, you will need to disassemble the code. I've looked at bytecode in some of the python scripts I've written, but I wasn't aware of a way to do that in Tcl. I found a thread on stack that indicated it was possible, and after probing a little further was given a solution. This doesn't work in Tcl 8.4, which is what the BIG-IP uses, but it does work on 8.5+ so if you have a linux box with 8.5+ you're good to go. Note that there are variances from version to version that could absolutely change the way the interpreter works, so understand that this is an just an exercise in discovery. Solution 1 Fire up tclsh and then grab a piece of code. For simplicity, I'll use two forms of a simple math problem. The first is using the expr command to evaluate 3 times 4, and the second is the same math problem, but wraps the evaluation with curly brackets. The command that will show how the interpreter works its magic is tcl::unsupported::disassemble. ## ## unwrapped expression ## ## % tcl::unsupported::disassemble script { expr 3 * 4 } ByteCode 0x0x1e1ee20, refCt 1, epoch 16, interp 0x0x1d59670 (epoch 16) Source " expr 3 * 4 " Cmds 1, src 12, inst 14, litObjs 4, aux 0, stkDepth 5, code/src 0.00 Commands 1: 1: pc 0-12, src 1-11 Command 1: "expr 3 * 4 " (0) push1 0 # "3" (2) push1 1 # " " (4) push1 2 # "*" (6) push1 1 # " " (8) push1 3 # "4" (10) concat1 5 (12) exprStk (13) done ## ## wrapped expression ## ## % tcl::unsupported::disassemble script { expr { 3 * 4 } } ByteCode 0x0x1de7a40, refCt 1, epoch 16, interp 0x0x1d59670 (epoch 16) Source " expr { 3 * 4 } " Cmds 1, src 16, inst 3, litObjs 1, aux 0, stkDepth 1, code/src 0.00 Commands 1: 1: pc 0-1, src 1-15 Command 1: "expr { 3 * 4 } " (0) push1 0 # "12" (2) done Because the first expression is unwrapped, the interpreter has to build the expression and then call the runtime expression engine, resulting in 4 objects and a stack depth of 5. With the wrapped expression, the interpreter found a compile-time constant and used that directly, resulting in 1 object and a stack depth of 1 as well. Much thanks to Donal Fellows on Stack Overflow for the details. Using the time command in the shell, you can see that wrapping the expression results in a wildly more efficient experience. % time { expr 3 * 4 } 100000 1.02325 microseconds per iteration % time { expr {3*4} } 100000 0.07945 microseconds per iteration Solution 2 I was looking in earnest for some explanatory information for the bytecode fields displayed with tcl::unsupported::disassemble, and came across a couple pages on the Tcl wiki, one building on the other. Combining the pertinent sections of code from each page results in this script you can paste into tclsh: namespace eval tcl::unsupported {namespace export assemble} namespace import tcl::unsupported::assemble rename assemble asm interp alias {} disasm {} ::tcl::unsupported::disassemble proc aproc {name argl body args} { proc $name $argl $body set res [disasm proc $name] if {"-x" in $args} { set res [list proc $name $argl [list asm [dis2asm $res]]] eval $res } return $res } proc dis2asm body { set fstart " push -1; store @p; pop " set fstep " incrImm @p +1;load @l;load @p listIndex;store @i;pop load @l;listLength;lt " set res "" set wait "" set jumptargets {} set lines [split $body \n] foreach line $lines { ;#-- pass 1: collect jump targets if [regexp {\# pc (\d+)} $line -> pc] {lappend jumptargets $pc} } set lineno 0 foreach line $lines { ;#-- pass 2: do the rest incr lineno set line [string trim $line] if {$line eq ""} continue set code "" if {[regexp {slot (\d+), (.+)} $line -> number descr]} { set slot($number) $descr } elseif {[regexp {data=.+loop=%v(\d+)} $line -> ptr]} { #got ptr, carry on } elseif {[regexp {it%v(\d+).+\[%v(\d+)\]} $line -> copy number]} { set loopvar [lindex $slot($number) end] if {$wait ne ""} { set map [list @p $ptr @i $loopvar @l $copy] set code [string map $map $fstart] append res "\n $code ;# $wait" set wait "" } } elseif {[regexp {^ *\((\d+)\) (.+)} $line -> pc instr]} { if {$pc in $jumptargets} {append res "\n label L$pc;"} if {[regexp {(.+)#(.+)} $instr -> instr comment]} { set arg [list [lindex $comment end]] if [string match jump* $instr] {set arg L$arg} } else {set arg ""} set instr0 [normalize [lindex $instr 0]] switch -- $instr0 { concat - invokeStk {set arg [lindex $instr end]} incrImm {set arg [list $arg [lindex $instr end]]} } set code "$instr0 $arg" switch -- $instr0 { done { if {$lineno < [llength $lines]-2} { set code "jump Done" } else {set code ""} } startCommand {set code ""} foreach_start {set wait $line; continue} foreach_step {set code [string map $map $fstep]} } append res "\n [format %-24s $code] ;# $line" } } append res "\n label Done;\n" return $res } proc normalize instr { regsub {\d+$} $instr "" instr ;# strip off trailing length indicator set instr [string map { loadScalar load nop "" storeScalar store incrScalar1Imm incrImm } $instr] return $instr } Now that the script source is in place, you can test the two expressions we tested in solution 1. The output is very similar, however, there is less diagnostic information to go with the bytecode instructions. Still, the instructions are consistent between the two solutions. The difference here is that after "building" the proc, you can execute it, shown below each aproc expression. % aproc f x { expr 3 * 4 } -x proc f x {asm { push 3 ;# (0) push1 0 # "3" push { } ;# (2) push1 1 # " " push * ;# (4) push1 2 # "*" push { } ;# (6) push1 1 # " " push 4 ;# (8) push1 3 # "4" concat 5 ;# (10) concat1 5 exprStk ;# (12) exprStk ;# (13) done label Done; }} % f x 12 % aproc f x { expr { 3 * 4 } } -x proc f x {asm { push 12 ;# (0) push1 0 # "12" ;# (2) done label Done; }} % f x 12 Deeper Down the Rabbit Hole Will the internet explode if I switch metaphors from bad 80's movie to literary classic? I guess we'll find out. Simple comparisons are interesting, but now that we're peeling back the layers, let's look at something a little more complicated like a for loop and a list append. % tcl::unsupported::disassemble script { for { $x } { $x < 50 } { incr x } { lappend mylist $x } } ByteCode 0x0x2479d30, refCt 1, epoch 16, interp 0x0x23ef670 (epoch 16) Source " for { $x } { $x < 50 } { incr x } { lappend mylist $x " Cmds 4, src 57, inst 43, litObjs 5, aux 0, stkDepth 3, code/src 0.00 Exception ranges 2, depth 1: 0: level 0, loop, pc 8-16, continue 18, break 40 1: level 0, loop, pc 18-30, continue -1, break 40 Commands 4: 1: pc 0-41, src 1-56 2: pc 0-4, src 7-9 3: pc 8-16, src 37-54 4: pc 18-30, src 26-32 Command 1: "for { $x } { $x < 50 } { incr x } { lappend mylist $x }" Command 2: "$x " (0) push1 0 # "x" (2) loadStk (3) invokeStk1 1 (5) pop (6) jump1 +26 # pc 32 Command 3: "lappend mylist $x " (8) push1 1 # "lappend" (10) push1 2 # "mylist" (12) push1 0 # "x" (14) loadStk (15) invokeStk1 3 (17) pop Command 4: "incr x " (18) startCommand +13 1 # next cmd at pc 31 (27) push1 0 # "x" (29) incrStkImm +1 (31) pop (32) push1 0 # "x" (34) loadStk (35) push1 3 # "50" (37) lt (38) jumpTrue1 -30 # pc 8 (40) push1 4 # "" (42) done You'll notice that there are four commands in this code. The for loop, the x variable evaluation, the lappend operations, and the loop control with the incr command. There are a lot more instructions in this code, with jump pointers from the x interpretation to the incr statement, the less than comparison, then a jump to the list append. Wrapping Up I went through an exercise years ago to see how far I could minimize the Solaris kernel before it stopped working. I personally got down into the twenties before the system was unusable, but I think the record was somewhere south of 15. So...what's the point? Minimal for minimal's sake is not the point. Meet the functional objectives, that is job one. But then start tuning. Less is more. Less objects. Less stack depth. Less instantiation. Reviewing bytecode is good for that, and is possible with the native Tcl code. However, it is still important to test the code performance, as relying on bytecode objects and stack depth alone is not a good idea. For example, if we look at the bytecode differences with matching an IP address, there is no discernable difference from Tcl's perspective between the two regexp versions, and very little difference between the two regexp versions and the scan example. % dis script { regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} 192.168.101.20 _ a b c d } ByteCode 0x0x24cfd30, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15) Source " regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-" Cmds 1, src 90, inst 19, litObjs 8, aux 0, stkDepth 8, code/src 0.00 Commands 1: 1: pc 0-17, src 1-89 Command 1: "regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9" (0) push1 0 # "regexp" (2) push1 1 # "([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})" (4) push1 2 # "192.168.101.20" (6) push1 3 # "_" (8) push1 4 # "a" (10) push1 5 # "b" (12) push1 6 # "c" (14) push1 7 # "d" (16) invokeStk1 8 (18) done % dis script { regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ a b c d } ByteCode 0x0x24d1730, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15) Source " regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _" Cmds 1, src 64, inst 19, litObjs 8, aux 0, stkDepth 8, code/src 0.00 Commands 1: 1: pc 0-17, src 1-63 Command 1: "regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ " (0) push1 0 # "regexp" (2) push1 1 # "^(\d+)\.(\d+)\.(\d+)\.(\d+)$" (4) push1 2 # "192.168.101.20" (6) push1 3 # "_" (8) push1 4 # "a" (10) push1 5 # "b" (12) push1 6 # "c" (14) push1 7 # "d" (16) invokeStk1 8 (18) done % dis script { scan 192.168.101.20 %d.%d.%d.%d a b c d } ByteCode 0x0x24d1930, refCt 1, epoch 15, interp 0x0x2446670 (epoch 15) Source " scan 192.168.101.20 %d.%d.%d.%d a b c d " Cmds 1, src 41, inst 17, litObjs 7, aux 0, stkDepth 7, code/src 0.00 Commands 1: 1: pc 0-15, src 1-40 Command 1: "scan 192.168.101.20 %d.%d.%d.%d a b c d " (0) push1 0 # "scan" (2) push1 1 # "192.168.101.20" (4) push1 2 # "%d.%d.%d.%d" (6) push1 3 # "a" (8) push1 4 # "b" (10) push1 5 # "c" (12) push1 6 # "d" (14) invokeStk1 7 (16) done However, if you look at the time results from these examples, they are very different. % time { regexp {([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})} 192.168.101.20 matched a b c d } 100000 11.29749 microseconds per iteration % time { regexp {^(\d+)\.(\d+)\.(\d+)\.(\d+)$} 192.168.101.20 _ a b c d } 100000 7.78696 microseconds per iteration % time { scan 192.168.101.20 %d.%d.%d.%d a b c d } 100000 1.03708 microseconds per iteration Why is that? Well, bytecode is a good indicator, but it doesn't address the inherent speed of the commands being invoked. Regex is a very slow operation comparatively. And within the regex engine, the second example is a simpler regex to evaluate, so it's faster (though less accurate, so make sure you are actually passing an IP address.) Then of course, scan shows off its optimized self in grand fashion. Was this a useful exercise in understanding Tcl under the hood? Drop some feedback in the comments if you'd like more tech tips like this that aren't directly covering a product feature or solution, but reveal some utility that assists in learning how things tick.1.3KViews1like3CommentsWhat is a Strategic Point of Control Anyway?
From mammoth hunting to military maneuvers to the datacenter, the key to success is control Recalling your elementary school lessons, you’ll probably remember that mammoths were large and dangerous creatures and like most animals they were quite deadly to primitive man. But yet man found a way to hunt them effectively and, we assume, with more than a small degree of success as we are still here and, well, the mammoths aren’t. Marx Cavemen PHOTO AND ART WORK : Fred R Hinojosa. The theory of how man successfully hunted ginormous creatures like the mammoth goes something like this: a group of hunters would single out a mammoth and herd it toward a point at which the hunters would have an advantage – a narrow mountain pass, a clearing enclosed by large rock, etc… The qualifying criteria for the place in which the hunters would finally confront their next meal was that it afforded the hunters a strategic point of control over the mammoth’s movement. The mammoth could not move away without either (a) climbing sheer rock walls or (b) being attacked by the hunters. By forcing mammoths into a confined space, the hunters controlled the environment and the mammoth’s ability to flee, thus a successful hunt was had by all. At least by all the hunters; the mammoths probably didn’t find it successful at all. Whether you consider mammoth hunting or military maneuvers or strategy-based games (chess, checkers) one thing remains the same: a winning strategy almost always involves forcing the opposition into a situation over which you have control. That might be a mountain pass, or a densely wooded forest, or a bridge. The key is to force the entire complement of the opposition through an easily and tightly controlled path. Once they’re on that path – and can’t turn back – you can execute your plan of attack. These easily and highly constrained paths are “strategic points of control.” They are strategic because they are the points at which you are empowered to perform some action with a high degree of assurance of success. In data center architecture there are several “strategic points of control” at which security, optimization, and acceleration policies can be applied to inbound and outbound data. These strategic points of control are important to recognize as they are the most efficient – and effective – points at which control can be exerted over the use of data center resources. DATA CENTER STRATEGIC POINTS of CONTROL In every data center architecture there are aggregation points. These are points (one or more components) through which all traffic is forced to flow, for one reason or another. For example, the most obvious strategic point of control within a data center is at its perimeter – the router and firewalls that control inbound access to resources and in some cases control outbound access as well. All data flows through this strategic point of control and because it’s at the perimeter of the data center it makes sense to implement broad resource access policies at this point. Similarly, strategic points of control occur internal to the data center at several “tiers” within the architecture. Several of these tiers are: Storage virtualization provides a unified view of storage resources by virtualizing storage solutions (NAS, SAN, etc…). Because the storage virtualization tier manages all access to the resources it is managing, it is a strategic point of control at which optimization and security policies can be easily applied. Application Delivery / load balancing virtualizes application instances and ensures availability and scalability of an application. Because it is virtualizing the application it therefore becomes a point of aggregation through which all requests and responses for an application must flow. It is a strategic point of control for application security, optimization, and acceleration. Network virtualization is emerging internal to the data center architecture as a means to provide inter-virtual machine connectivity more efficiently than perhaps can be achieved through traditional network connectivity. Virtual switches often reside on a server on which multiple applications have been deployed within virtual machines. Traditionally it might be necessary for communication between those applications to physically exit and re-enter the server’s network card. But by virtualizing the network at this tier the physical traversal path is eliminated (and the associated latency, by the way) and more efficient inter-vm communication can be achieved. This is a strategic point of control at which access to applications at the network layer should be applied, especially in a public cloud environment where inter-organizational residency on the same physical machine is highly likely. OLD SKOOL VIRTUALIZATION EVOLVES You might have begun noticing a central theme to these strategic points of control: they are all points at which some kind of virtualization – and thus aggregation – occur naturally in a data center architecture. This is the original (first) kind of virtualization: the presentation of many resources as a single resources, a la load balancing and other proxy-based solutions. When there is a one —> many (1:M) virtualization solution employed, it naturally becomes a strategic point of control by virtue of the fact that all “X” traffic must flow through that solution and thus policies regarding access, security, logging, etc… can be applied in a single, centrally managed location. The key here is “strategic” and “control”. The former relates to the ability to apply the latter over data at a single point in the data path. This kind of 1:M virtualization has been a part of datacenter architectures since the mid 1990s. It’s evolved to provide ever broader and deeper control over the data that must traverse these points of control by nature of network design. These points have become, over time, strategic in terms of the ability to consistently apply policies to data in as operationally efficient manner as possible. Thus have these virtualization layers become “strategic points of control”. And you thought the term was just another square on the buzz-word bingo card, didn’t you?1.1KViews0likes6CommentsSelective Compression on BIG-IP
BIG-IP provides Local Traffic Policies that simplify the way in which you can manage traffic associated with a virtual server. You can associate a BIG-IP local traffic policy to support selective compression for types of content that can benefit from compression, like HTML, XML, and CSS stylesheets. These file types can realize performance improvements, especially across slow connections, by compressing them. You can easily configure your BIG-IP system to use a simple Local Traffic Policy that selectively compresses these file types. In order to use a policy, you will want to create and configure a draft policy, publish that policy, and then associate the policy with a virtual server in BIG-IP v12. Alright, let’s log into a BIG-IP The first thing you’ll need to do is create a draft policy. On the main menu select Local Traffic>Policies>Policy List and then the Create or + button. This takes us to the create policy config screen. We’ll name the policy SelectiveCompression, add a description like ‘This policy compresses file types,’ and we’ll leave the Strategy as the default of Execute First matching rule. This is so the policy uses the first rule that matches the request. Click Create Policy which saves the policy to the policies list. When saved, the Rules search field appears but has no rules. Click Create under Rules. This brings us to the Rules General Properties area of the policy. We’ll give this rule a name (CompressFiles) and then the first settings we need to configure are the conditions that need to match the request. Click the + button to associate file types. We know that the files for compression are comprised of specific file types associated with a content type HTTP Header. We choose HTTP Header and select Content-Type in the Named field. Select ‘begins with’ next and type ‘text/’ for the condition and compress at the ‘response’ time. We’ll add another condition to manage CPU usage effectively. So we click CPU Usage from the list with a duration of 1 minute with a conditional operator of ‘less than or equal to’ 5 as the usage level at response time. Next under Do the following, click the create + button to create a new action when those conditions are met. Here, we’ll enable compression at the response time. Click Save. Now the draft policy screen appears with the General Properties and a list of rules. Here we want to click Save Draft. Now we need to publish the draft policy and associate it with a virtual server. Select the policy and click Publish. Next, on the main menu click Local Traffic>Virtual Servers>Virtual Server List and click the name of the virtual server you’d like to associate for the policy. On the menu bar click Resources and for Policies click Manage. Move SelectiveCompression to the Enabled list and click Finished. The SelectiveCompression policy is now listed in the policies list which is now associated with the chosen virtual server. The virtual server with the SelectiveCompression Local Traffic Policy will compress the file types you specified. Congrats! You’ve now added a local traffic policy for selective compression! You can also watch the full video demo thanks to our TechPubs team. ps966Views0likes7CommentsWAN Optimization is not Application Acceleration
Increasingly WAN optimization solutions are adopting the application acceleration moniker, implying a focus that just does not exist. WAN optimization solutions are designed to improve the performance of the network, not applications, and while the former does beget improvements of the latter, true application acceleration solutions offer greater opportunity for improving efficiency and end-user experience as well as aiding in consolidation efforts that result in a reduction in operating and capital expenditure costs. WAN Optimization solutions are, as their title implies, focused on the WAN; on the network. It is their task to improve the utilization of bandwidth, arrest the effects of network congestion, and apply quality of service policies to speed delivery of critical application data by respecting application prioritization. WAN Optimization solutions achieve these goals primarily through the use of data de-duplication techniques. These techniques require a pair of devices as the technology is most often based on a replacement algorithm that seeks out common blocks of data and replaces them with a smaller representative tag or indicator that is interpreted by the paired device such that it can reinsert the common block of data before passing it on to the receiver. The base techniques used by WAN optimization are thus highly effective in scenarios in which large files are transferred back and forth over a connection by one or many people, as large chunks of data are often repeated and the de-duplication process significantly reduces the amount of data traversing the WAN and thus improves performance. Most WAN optimization solutions specifically implement “application” level acceleration for protocols aimed at the transfer of files such as CIFS and SAMBA. But WAN optimization solutions do very little to aid in the improvement of application performance when the data being exchanged is highly volatile and already transferred in small chunks. Web applications today are highly dynamic and personalized, making it less likely that a WAN optimization solution will find chunks of duplicated data large enough to make the overhead of the replacement process beneficial to application performance. In fact, the process of examining small chunks of data for potential duplicated chunks can introduce additional latency that actual degrades performance, much in the same way compression of small chunks of data can be detrimental to application performance. Too, WAN optimization solutions require deployment in pairs which results in what little benefits these solutions offer for web applications being enjoyed only by end-users in a location served by a “remote” device. Customers, partners, and roaming employees will not see improvements in performance because they are not served by a “remote” device. Application acceleration solutions, however, are not constrained by such limitations. Application acceleration solutions act at the higher layers of the stack, from TCP to HTTP, and attempt to improve performance through the optimization of protocols and the applications themselves. The optimizations of TCP, for example, reduce the overhead associated with TCP session management on servers and improve the capacity and performance of the actual application which in turn results in improved response times. The understanding of HTTP and both the browser and server allows application acceleration solutions to employ techniques that leverage cached data and industry standard compression to reduce the amount of data transferred without requiring a “remote” device. Application acceleration solutions are generally asymmetric, with some few also offering a symmetric mode. The former ensures that regardless of the location of the user, partner, or employee that some form of acceleration will provide a better end-user experience while the latter employs more traditional WAN optimization-like functionality to increase the improvements for clients served by a “remote” device. Regardless of the mode, application acceleration solutions improve the efficiency of servers and applications which results in higher capacities and can aid in consolidation efforts (fewer servers are required to serve the same user base with better performance) or simply lengthens the time available before additional investment in servers – and the associated licensing and management costs – must be made. Both WAN optimization and application acceleration aim to improve application performance, but they are not the same solutions nor do they even focus on the same types of applications. It is important to understand the type of application you want to accelerate before choosing a solution. If you are primarily concerned with office productivity applications and the exchange of large files (including backups, virtual images, etc…) between offices, then certainly WAN optimization solutions will provide greater benefits than application acceleration. If you’re concerned primarily about web application performance then application acceleration solutions will offer the greatest boost in performance and efficiency gains. But do not confuse WAN optimization with application acceleration. There is a reason WAN optimization-focused providers have recently begun to partner with application acceleration and application delivery providers – because there is a marked difference between the two types of solutions and a single offering that combines them both is not (yet) available.799Views0likes2Comments6 Reasons You Need an Application Delivery Controller Now
Application delivery controllers, and load balancing in general, are often seen as solutions waiting for a problem to solve. We know what those problems are, but until we experience them we often don't feel a sense of urgency in acquiring and deploying an application delivery controller. While it's certainly true that an application delivery controller can solve many problems that arise, it's also true that there are benefits to acquiring and deploying an application delivery controller before it becomes absolutely necessary in order to save your application, your site, or your job. So here are six good reasons to consider deploying an application delivery controller now rather than waiting until the next emergency. 6. Efficiency An application delivery controller (ADC) can improve the efficiency of the servers for which it manages application requests. By offloading compute intensive processing like SSL or TCP/IP connection management an ADC reduces the overhead associated with assembling and serving responses to application requests and makes better use of the resources (RAM, CPU, I/O) on each server. Making your infrastructure more efficient is also a great way to "go green". 5. Performance The performance of your applications can be improved dramatically through the deployment of an ADC. Whether it's because of compression, caching, protocol optimizations, connection management or intelligent load balancing algorithms, an ADC improves the overall performance of your applications. 4. Reliability If you rely on applications for business processes or as a revenue stream, the last thing you want is for those application to be unavailable. An ADC provides reliability by ensuring that requests are sent only to available servers, redirecting requests when a server is down for maintenance or finally hit the wall and died. If you're large enough to have two data centers, an ADC with global load balancing capabilities furthers assurance of reliability by redirecting requests from the primary data center to a secondary in the event of a disaster - whether that's a natural disaster (earthquake, fire, flood) or man-made (oops - was that our DS3 I just ripped out?). 3. Security We're not talking about advanced security options like web application firewalls or secure remote access products such as an SSL VPN, we're just talking basic security here. DDoS protection, rate limiting, blacklisting, whitelisting, authentication, resource obfuscation, SSL, content encryption - the bare minimum security you need to protect your applications and the servers on which they're deployed. An ADC provides the core security functions you need to ensure your site is safe. 2. Capacity Capacity is about how much throughput, how many requests, how many users you can support. It's nearly impossible to support thousands of concurrent users with a single server, unless it's one really really really big server. You need more than one server, and in order to architect a solution that uses a pool of servers you need something to mediate and direct those requests - to balance the load across those servers. That means you need an ADC, because the core purpose of an ADC is to perform load balancing and ensure that you can serve everyone who wants to be served. 1. Scalability Scaling up to meet demand is difficult, doing so without re-architecting your infrastructure or scheduling down-time is even more difficult. By including an ADC in your architecture from the very beginning, the process becomes a simple one. Add a new server, add it to the ADC and voila! You've just scaled up and can instantly support more users and more requests without requiring downtime or moving around network cables. Imbibing: Mountain Dew799Views0likes0CommentsAAM iSession replacement?
Hi, I am aware per this that the AAM product suite is going away, and is not even available on current hardware models. I am currently using iSessions to create optimized tunnels for some TCP traffic between sites, which is then secured with SSL profiles. In particular the deduplication and compression is what we mainly take advantage of. My question is, is this functionality going to eventually ship in the eventual BigIP product? Or is iSession/Dedup/Compression simply going away? Thanks, Bryan459Views0likes2CommentsCloud Storage Gateways. Short term win, but long term…?
In the rush to cloud, there are many tools and technologies out there that are brand new. I’ve covered a few, but that’s nowhere near a complete list, but it’s interesting to see what is going on out there from a broad-spectrum view. I have talked a bit about Cloud Storage Gateways here. And I’m slowly becoming a fan of this technology for those who are considering storing in the cloud tier. There are a couple of good reasons to consider these products, and I was thinking about the reasons and their standing validity. Thought I’d share with you where I stand on them at this time, and what I see happening that might impact their value proposition. The two vendors I have taken some time to research while preparing this blog for you are Nasuni and Panzura. No doubt there are plenty of others, but I’m writing you a blog here, not researching a major IT initiative. So I researched two of them to have some points of comparison, and leave the in-depth vendor selection research to you and your staff. These two vendors present similar base technology and very different additional feature sets. Both rely heavily upon local caching in the controller box, and both work with multiple cloud vendors, and both claim to manage compression. Nasuni delivers as a Virtual Appliance, includes encryption on your network before transmitting to the cloud, automated cloud provisioning, and caching that has timed updates to the cloud, but can perform a forced update if the cache gets full. It presents the cloud storage you’ve provisioned as a NAS on your end. Panzura delivers a hardware appliance that also presents the cloud as a NAS, works with multiple cloud vendors, handles encryption on-device, and claims global dedupe. I say claims, because “global” has a meaning that is “all” and in their case “all” means “all the storage we know about”, not “all the storage you know”. I would prefer a different term, but I get what they mean. Like everything else, they can’t de-dupe what they don’t control. They too present the cloud storage you’ve provisioned as a NAS on your end, but claim to accelerate CIFS and NFS also. Panzura is also trying to make a big splash about speeding access to MS-Sharepoint, but honestly, as a TMM for F5, a company that makes two astounding products that speed access to Sharepoint and nearly everything else on the Internet (LTM and WOM), I’m not impressed by Sharepoint acceleration. In fact, our Sharepoint Application Ready Solution is here, and our list of Application Ready Solutions is here. Those are just complete architectures we support directly, and don’t touch on what you can do with the products through Virtuals, iRules, profiles, and the host of other dials and knobs. I could go on and on about this topic, but that’s not the point of this blog, so suffice it to say there are some excellent application acceleration and WAN Optimization products out there, so this point solution alone should not be a buying criteria. There are some compelling reasons to purchase one of these products if you are considering cloud storage as a possible solution. Let’s take a look at them. Present cloud storage as a NAS – This is a huge benefit right now, but over time the importance will hopefully decrease as standards for cloud storage access emerge. Even if there is no actual standard that everyone agrees to, it will behoove smaller players to emulate the larger players that are allowing access to their storage in a manner that is similar to other storage technologies. Encryption – As far as I can see this will always be a big driver. They’re taking care of encryption for you, so you can sleep at night as they ship your files to the public cloud. If you’re considering them for non-public cloud, this point may still be huge if your pipe to the storage is over the public Internet. Local Caching – With current broadband bandwidths, this will be a large driver for the foreseeable future. You need your storage to be responsive, and local caching increases responsiveness, depending upon implementation, cache size, and how many writes you are doing this could be a huge improvement. De-duplication – I wish I had more time to dig into what these vendors mean by dedupe. Replacing duplicate files with a symlink is simplest and most resembles existing file systems, but it is also significantly less effective than partial file de-dupe. Let’s face it, most organizations have a lot more duplication laying around in files named Filename.Draft1.doc through Filename.DraftX.doc than they do in completely duplicate files. Check with the vendors if you’re considering this technology to find out what you can hope to gain from their de-dupe. This is important for the simple reason that in the cloud, you pay for what you use. That makes de-duplication more important than it has historically been. The largest caution sign I can see is vendor viability. This is a new space, and we have plenty of history with early entry players in a new space. Some will fold, some will get bought up by companies in adjacent spaces, some will be successful… at something other than Cloud Storage Gateways, and some will still be around in five or ten years. Since these products compress, encrypt, and de-dupe your data, and both of them manage your relationship with the cloud vendor, losing them is a huge risk. I would advise some due diligence before signing on with one – new companies in new market spaces are not always a risky proposition, but you’ll have to explore the possibilities to make sure your company is protected. After all, if they’re as good as they seem, you’ll soon have more data running through them than you’ll have free space in your data center, making eliminating them difficult at best. I haven’t done the research to say which product I prefer, and my gut reaction may well be wrong, so I’ll leave it to you to check into them if the topic interests you. They would certainly fit well with an ARX, as I mentioned in that other blog post. Here’s a sample architecture that would make “the Cloud Tier” just another piece of your virtual storage directory under ARX, complete with automated tiering and replication capabilities that ARX owners thrive on. This sample architecture shows your storage going to a remote data center over EDGE Gateway, to the cloud over Nasuni, and to NAS boxes, all run through an ARX to make the client (which could be a server or a user – remember this is the NAS client) see a super-simplified, unified directory view of the entire thing. Note that this is theoretical, to my knowledge no testing has occurred between Nasuni and ARX, and usually (though certainly not always) the storage traffic sent over EDGE Gateway will be from a local NAS to a remote one, but there is no reason I can think of for this not to work as expected – as long as the Cloud Gateway really presents itself as a NAS. That gives you several paths to replicate your data, and still presents client machines with a clean, single-directory NAS that participates in ADS if required. In this case Tier one could be NAS Vendor 1, Tier two NAS Vendor 2, your replication targets securely connected over EDGE Gateway, and tier 3 (things you want to save but no longer need to replicate for example) is the cloud as presented by the Cloud Gateway. The Cloud Gateway would arbitrate between actual file systems and whatever idiotic interface the cloud provider decided to present and tell you to deal with, while the ARX presents all of these different sources as a single-directory-tree NAS to the clients, handling tiering between them, access control, etc. And yes, if you’re not an F5 shop, you could indeed accomplish pieces of this architecture with other solutions. Of course, I’m biased, but I’m pretty certain the solution would not be nearly as efficient, cool, or let you sleep as well at night. Storage is complicated, but this architecture cleans it up a bit. And that’s got to be good for you. And all things considered, the only issue that is truly concerning is failure of a startup cloud gateway vendor. If another vendor takes one over, they’ll either support it or provide a migration path, if they are successful at something else, you’ll have plenty of time to move off of their storage gateway product, so only outright failure is a major concern. Related Articles and Blogs Panzura Launches ANS, Cloud Storage Enabled Alternative to NAS Nasuni Cloud Storage Gateway InfoSmack Podcasts #52: Nasuni (Podcast) F5’s BIG-IP Edge Gateway Solution Takes New Approach to Unifying, Optimizing Data Center Access Tiering is Like Tables or Storing in the Cloud Tier444Views0likes1CommentCloud Computing: Vertical Scalability is Still Your Problem
Horizontal scalability achieved through the implementation of a load balancing solution is easy. It's vertical scalability that's always been and remains difficult to achieve, and it's even more important in a cloud computing or virtualized environment because now it can hurt you where it counts: the bottom line. Horizontal scalability is the ability of an application to be scaled up to meet demand through replication and the distribution of requests across a pool or farm of servers. It's the traditional load balanced model, and it's an integral component of cloud computing environments. Vertical scalability is the ability of an application to scale under load; to maintain performance levels as the number of concurrent requests increases. While load balancing solutions can certainly assist in optimizing the environment in which an application needs to scale by reducing overhead that can negatively impact performance (such as TCP session management, SSL operations, and compression/caching functionality) it can't solve core problems that prevent vertical scalability. The problem is that a single database table or SQL query that is poorly constructed can destroy vertical scalability and actually increase the cost of deploying in the cloud. Because you generally pay on a resource basis, if the application isn’t scaling up well it will require more resources to maintain performance levels and thus cost a lot more. Cloud computing isn’t going to magically optimize code or database queries or design database tables with performance in mind, that’s still squarely in the hands of the developers regardless of whether or not cloud computing is used as the deployment model. The issue of vertical scalability is very important when considering the use of cloud computing because you’re often charged based on compute resources used, much like the old mainframe model. If an application doesn’t vertically scale well, it’s going to increase the costs to run in the cloud. Cloud computing providers can't, and probably wouldn't if they could (it makes them money, after all), address vertical scalability issues because they are peculiar to the application. No external solution can optimize code such that the application will magically scale up vertically. External solutions can improve overall performance, certainly, by optimizing protocols, reducing protocol and application overhead, and reducing bandwidth requirements, but it can't dig into the application code and rearrange the order in which joins are performed inside an SQL query, rewrite a particularly poorly written loop, or refactor code to use a more efficient data structure. Vertical scalability, whether the application is deployed inside the local data center or out there in the cloud, is still the domain of the application developer. While developers can certainly take advantage of technologies like network-side scripting and inherent features in application delivery solutions to assist with efforts to increase vertical scalability, there is a limit to what solutions can do to address the root cause of an application's failure to vertically scale. Improving the vertical scalability of applications is important in achieving the benefits of a reduction in costs associated with cloud computing and virtualization. Applications that fail to vertically scale well may end up costing more when deployed in the cloud because of the additional demand on compute resources required as demand increases. Things you can do to improve vertical scalability Optimize SQL / database queries Take advantage of offload capabilities of application delivery solutions available Be aware of the impact on performance and scalability of decomposing applications into too finely grained services Remember that API usage will impact vertical scalability Understand the bottlenecks associated with the programming language(s) used and address them Cloud computing and virtualization can certainly address vertical scalability limitations by using horizontal scaling techniques to ensure capacity meets demand and performance level agreements are met. But doing so may cost you dearly and eliminate many of the financial incentives that led you to adopt cloud computing or virtualization in the first place. Related articles by Zemanta Infrastructure 2.0: The Diseconomy of Scale Virus Why you should not use clustering to scale an application Top 10 Concepts That Every Software Engineer Should Know Can Today's Hardware Handle the Cloud? Vendors air the cloud's pros and cons Twitter and the Architectural Challenges of Life Streaming Applications424Views0likes3Comments