The Disadvantages of DSR (Direct Server Return)
I read a very nice blog post yesterday discussing some of the traditional pros and cons of load-balancing configurations. The author comes to the conclusion that if you can use direct server return, you should. I agree with the author's list of pros and cons; DSR is the least intrusive method of deploying a load-balancer in terms of network configuration. But there are quite a few disadvantages missing from the author's list.

Author's List of Disadvantages of DSR

The disadvantages of Direct Routing are:

- The backend server must respond to both its own IP (for health checks) and the virtual IP (for load-balanced traffic).
- Port translation or cookie insertion cannot be implemented.
- The backend server must not reply to ARP requests for the VIP (otherwise it will steal all the traffic from the load balancer).
- Prior to Windows Server 2008, some odd routing behavior could occur.
- In some situations either the application or the operating system cannot be modified to utilize Direct Routing.

Some additional disadvantages:

- Protocol sanitization can't be performed. This means vulnerabilities introduced due to manipulation or lax enforcement of RFCs and protocol specifications can't be addressed.
- Application acceleration can't be applied. Even the simplest of acceleration techniques, e.g. compression, can't be applied because the traffic is bypassing the load-balancer (a.k.a. application delivery controller).
- Implementing caching solutions becomes more complex. With a DSR configuration, the very routing that makes it so easy to implement requires that caching solutions be deployed elsewhere, such as via WCCP on the router. This requires additional configuration and changes to the routing infrastructure, and introduces another point of failure as well as an additional hop, increasing latency.
- Error/Exception/SOAP fault handling can't be implemented. In order to address failures in applications such as missing files (404) and SOAP faults (500), it is necessary for the load-balancer to inspect outbound messages. With a DSR configuration this ability is lost, which means errors are passed directly back to the user without the ability to retry a request, write an entry in the log, or notify an administrator.
- Data Leak Prevention can't be accomplished. Without the ability to inspect outbound messages, you can't prevent sensitive data (SSNs, credit card numbers) from leaving the building.
- Connection Optimization functionality is lost. TCP multiplexing can't be accomplished in a DSR configuration because it relies on separating client connections from server connections. This reduces the efficiency of your servers and minimizes the value added to your network by a load balancer.

There are more disadvantages than you're likely willing to read, so I'll stop there. Suffice to say that the problem with the suggestion to use DSR whenever possible is that if you're an application-aware network administrator you know that most of the time, DSR isn't the right solution, because it restricts the ability of the load-balancer (application delivery controller) to perform additional functions that improve the security, performance, and availability of the applications it is delivering. DSR is well suited, and always has been, to UDP-based streaming applications such as audio and video delivered via RTSP. However, in the increasingly sensitive environment that is application infrastructure, it is necessary to do more than just "load balancing" to improve the performance and reliability of applications.
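To make the trade-off concrete, here is a minimal sketch (not from the original post) of the kind of outbound inspection a full-proxy deployment makes possible and a DSR configuration gives up. It assumes a BIG-IP-style iRule; the log message, status handling, and retry hint are purely illustrative.

when HTTP_REQUEST {
    # remember what was asked for so the response event can reference it
    set req_uri [HTTP::uri]
}
when HTTP_RESPONSE {
    # catch server-side failures on the way out instead of passing them
    # straight back to the user
    if { [HTTP::status] >= 500 } {
        log local0. "Server [IP::server_addr] returned [HTTP::status] for $req_uri"
        HTTP::respond 503 content "Temporarily unavailable, please retry." "Retry-After" "10"
    }
}

Because responses bypass the intermediary in a DSR configuration, this kind of logic simply has no place to run.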
Additional application delivery techniques are an integral component of a well-performing, efficient application infrastructure. DSR may be easier to implement and, in some cases, may be the right solution. But in most cases, it's going to leave you simply serving applications, instead of delivering them. Just because you can, doesn't mean you should.

HTTP Pipelining: A security risk without real performance benefits
Everyone wants web sites and applications to load faster, and there’s no shortage of folks out there looking for ways to do just that. But all that glitters is not gold, and not all acceleration techniques actually do all that much to accelerate the delivery of web sites and applications. Worse, some actually incur risk in the form of leaving servers open to exploitation.

A BRIEF HISTORY

Back in the day when HTTP was still evolving, someone came up with the concept of persistent connections. See, in ancient times – when administrators still wore togas in the data center – HTTP 1.0 required one TCP connection for every object on a page. That was okay, until pages started comprising ten, twenty, and more objects. So someone added an HTTP header, Keep-Alive, which basically told the server not to close the TCP connection until (a) the browser told it to or (b) it didn’t hear from the browser for X number of seconds (a time out). This eventually became the default behavior when HTTP 1.1 was written and became a standard. I told you it was a brief history.

This capability is known as a persistent connection, because the connection persists across multiple requests. This is not the same as pipelining, though the two are closely related. Pipelining takes the concept of persistent connections and then ignores the traditional request – reply relationship inherent in HTTP and throws it out the window. The general line of thought goes like this: “Whoa. What if we just shoved all the requests from a page at the server and then waited for them all to come back rather than doing it one at a time? We could make things even faster!” Tada! HTTP pipelining.

In technical terms, HTTP pipelining is initiated by the browser by opening a connection to the server and then sending multiple requests to the server without waiting for a response. Once the requests are all sent, the browser starts listening for responses. The reason this is considered an acceleration technique is that by shoving all the requests at the server at once you essentially save the RTT (Round Trip Time) on the connection waiting for a response after each request is sent.

WHY IT JUST DOESN’T MATTER ANYMORE (AND MAYBE NEVER DID)

Unfortunately, pipelining was conceived of and implemented before broadband connections were widely utilized as a method of accessing the Internet. Back then, the RTT was significant enough to have a negative impact on application and web site performance and the overall user experience was improved by the use of pipelining. Today, however, most folks have a comfortable speed at which they access the Internet and the RTT impact on most web applications’ performance, despite the increasing number of objects per page, is relatively low. There is no arguing, however, that some reduction in time to load is better than none. Too, anyone who’s had to access the Internet via high latency links can tell you anything that makes that experience faster has got to be a Good Thing. So what’s the problem? The problem is that pipelining isn’t actually treated any differently on the server than regular old persistent connections. In fact, the HTTP 1.1 specification requires that a “server MUST send its responses to those requests in the same order that the requests were received.” In other words, the requests are returned in serial, despite the fact that some web servers may actually process those requests in parallel.
Because the server MUST return responses to requests in order, the server has to do some extra processing to ensure compliance with this part of the HTTP 1.1 specification. It has to queue up the responses and make certain they are returned in the proper order, which essentially negates the performance gained by reducing the number of round trips using pipelining. Depending on the order in which requests are sent, if a request requiring particularly lengthy processing – say a database query – were sent relatively early in the pipeline, this could actually cause a degradation in performance because all the other responses have to wait for the lengthy one to finish before the others can be sent back.

Application intermediaries such as proxies, application delivery controllers, and general load-balancers can and do support pipelining, but they, too, will adhere to the protocol specification and return responses in the proper order according to how the requests were received. This limitation on the server side actually inhibits a potentially significant boost in performance because we know that processing dynamic requests takes longer than processing a request for static content. If this limitation were removed it is possible that the server would become more efficient and the user would experience non-trivial improvements in performance. Or, if intermediaries were smart enough to rearrange requests such that their execution was optimized (I seem to recall I was required to design and implement a solution to a similar example in graduate school) then we’d maintain the performance benefits gained by pipelining. But that would require an understanding of the application that goes far beyond what even today’s most intelligent application delivery controllers are capable of providing.

THE SILVER LINING

At this point it may be fairly disappointing to learn that HTTP pipelining today does not result in as significant a performance gain as it might at first seem to offer (except over high latency links like satellite or dial-up, which are rapidly dwindling in usage). But that may very well be a good thing. As miscreants have become smarter and more intelligent about exploiting protocols and not just application code, they’ve learned to take advantage of the protocol to “trick” servers into believing their requests are legitimate, even though the desired result is usually malicious. In the case of pipelining, it would be a simple thing to exploit the capability to enact a layer 7 DoS attack on the server in question. Because pipelining assumes that requests will be sent one after the other and that the client is not waiting for the response until the end, the server would have a difficult time distinguishing between someone attempting to consume resources and a legitimate request. Consider that the server has no understanding of a “page”. It understands individual requests. It has no way of knowing that a “page” consists of only 50 objects, and therefore a client pipelining requests for the maximum allowed – by default 100 for Apache – may not be seen as out of the ordinary. Several clients opening connections and pipelining hundreds or thousands of requests every second without caring if they receive any of the responses could quickly consume the server’s resources or available bandwidth and result in a denial of service to legitimate users.
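One way an intermediary can blunt this particular abuse is to cap how many requests it will accept on a single client connection. A minimal iRule sketch of that idea follows; it is not from the original article, and the threshold of 100 simply mirrors the Apache default mentioned above.

when CLIENT_ACCEPTED {
    # count requests per TCP connection
    set request_count 0
}
when HTTP_REQUEST {
    incr request_count
    # a single page rarely needs this many requests on one connection;
    # anything beyond the cap is treated as abusive pipelining
    if { $request_count > 100 } {
        reject
    }
}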
So perhaps the fact that pipelining is not really all that useful to most folks is a good thing, as server administrators can disable the feature without too much concern and thereby mitigate the risk of the feature being leveraged as an attack method against them. Pipelining as it is specified and implemented today is more of a security risk than it is a performance enhancement. There are, however, tweaks to the specification that could be made in the future that might make it more useful. Those tweaks do not address the potential security risk, however, so perhaps, given that there are so many other optimizations and acceleration techniques that can be used to improve performance without incurring any measurable security risk, we should simply let sleeping dogs lie.

IMAGES COURTESY WIKIPEDIA COMMONS

What is server offload and why do I need it?
One of the tasks of an enterprise architect is to design a framework atop which developers can implement and deploy applications consistently and easily. The consistency is important for internal business continuity and reuse; common objects, operations, and processes can be reused across applications to make development and integration with other applications and systems easier. Architects also often decide where functionality resides and design the base application infrastructure framework. Application server, identity management, messaging, and integration are all often a part of such architecture designs. Rarely does the architect concern him/herself with the network infrastructure, as that is the purview of “that group”; the “you know who I’m talking about” group. And for the most part there’s no need for architects to concern themselves with network-oriented architecture. Applications should not need to know on which VLAN they will be deployed or what their default gateway might be. But what architects might need to know – and probably should know – is whether the network infrastructure supports “server offload” of some application functions or not, and how that can benefit their enterprise architecture and the applications which will be deployed atop it.

WHAT IT IS

Server offload is a generic term used by the networking industry to indicate some functionality designed to improve the performance or security of applications. We use the term “offload” because the functionality is “offloaded” from the server and moved to an application network infrastructure device instead. Server offload works because the application network infrastructure is almost always these days deployed in front of the web/application servers and is in fact acting as a broker (proxy) between the client and the server. Server offload is generally offered by load balancers and application delivery controllers. You can think of server offload like a relay race. The application network infrastructure device runs the first leg and then hands off the baton (the request) to the server. When the server is finished, the application network infrastructure device gets to run another leg, and then the race is done as the response is sent back to the client.

There are basically two kinds of server offload functionality: protocol processing offload and application-oriented offload.

Protocol processing offload

Protocol processing offload includes functions like SSL termination and TCP optimizations. Rather than enable SSL communication on the web/application server, it can be “offloaded” to an application network infrastructure device and shared across all applications requiring secured communications. Offloading SSL to an application network infrastructure device improves application performance because the device is generally optimized to handle the complex calculations involved in encryption and decryption of secured data and web/application servers are not.
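As a rough illustration – not from the original article – here is what enabling both kinds of protocol processing offload might look like in a BIG-IP-style configuration. The virtual server, pool, and profile names are invented, the inline comments are annotations for the reader rather than configuration syntax, and exact syntax varies by product and version; treat it as a sketch of the idea, not a copy-and-paste recipe.

ltm virtual vs_app_https {
    destination 192.0.2.10:443
    ip-protocol tcp
    pool pool_app_http
    profiles {
        # terminate SSL on the device instead of on the web/application servers
        clientssl { context clientside }
        http { }
        # oneconnect provides the TCP multiplexing described next
        oneconnect { }
    }
}

With something like this in place the pool members speak plain HTTP; the device terminates SSL and reuses a small set of server-side connections on their behalf.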
TCP optimization is a little different. We say TCP session management is “offloaded” from the server, but that’s really not what happens, as obviously TCP connections are still opened, closed, and managed on the server as well. Offloading TCP session management means that the application network infrastructure is managing the connections between itself and the server in such a way as to reduce the total number of connections needed without impacting the capacity of the application. This is more commonly referred to as TCP multiplexing and it “offloads” the overhead of TCP connection management from the web/application server to the application network infrastructure device by effectively giving up control over those connections. By allowing an application network infrastructure device to decide how many connections to maintain and which ones to use to communicate with the server, it can manage thousands of client-side connections using merely hundreds of server-side connections. Reducing the overhead associated with opening and closing TCP sockets on the web/application server improves application performance and actually increases the user capacity of servers. TCP offload is beneficial to all TCP-based applications, but is particularly beneficial for Web 2.0 applications making use of AJAX and other near real-time technologies that maintain one or more connections to the server for their functionality. Protocol processing offload does not require any modifications to the applications.

Application-oriented offload

Application-oriented offload includes the ability to implement shared services on an application network infrastructure device. This is often accomplished via a network-side scripting capability, but some functionality has become so commonplace that it is now built into the core features available on application network infrastructure solutions. Application-oriented offload can include functions like cookie encryption/decryption, compression, caching, URI rewriting, HTTP redirection, DLP (Data Leak Prevention), selective data encryption, application security functionality, and data transformation. When network-side scripting is available, virtually any kind of pre- or post-processing can be offloaded to the application network infrastructure and thereafter shared with all applications. Application-oriented offload works because the application network infrastructure solution is mediating between the client and the server and it has the ability to inspect and manipulate the application data.

The benefits of application-oriented offload are that the services implemented can be shared across multiple applications and in many cases the functionality removes the need for the web/application server to handle a specific request. For example, HTTP redirection can be fully accomplished on the application network infrastructure device. HTTP redirection is often used as a means to handle application upgrades, commonly mistyped URIs, or as part of the application logic when certain conditions are met. Application security offload usually falls into this category because it is application – or at least application data – specific. Application security offload can include scanning URIs and data for malicious content, validating the existence of specific cookies/data required for the application, etc… This kind of offload improves server efficiency and performance but a bigger benefit is consistent, shared security across all applications for which the service is enabled. Some application-oriented offload can require modification to the application, so it is important to design such features into the application architecture before development and deployment. While it is certainly possible to add such functionality into the architecture after deployment, it is always easier to do so at the beginning.
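To make the HTTP redirection example concrete, here is a minimal iRule-style sketch (not from the original article; the hostnames and paths are illustrative) that answers a request entirely on the device, so the web/application server never sees it:

when HTTP_REQUEST {
    # requests for the retired URI space are answered here;
    # the pool members never receive them
    if { [HTTP::path] starts_with "/old-app/" } {
        HTTP::redirect "https://[HTTP::host]/new-app[string range [HTTP::path] 8 end]"
    }
}

Because the redirect is generated on the intermediary, the request consumes no server-side connection or application resources – which is precisely the point of application-oriented offload.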
WHY YOU NEED IT

Server offload is a way to increase the efficiency of servers and improve application performance and security. Server offload increases the efficiency of servers by alleviating the need for the web/application server to consume resources performing tasks that can be performed more efficiently on an application network infrastructure solution. The two best examples of this are SSL encryption/decryption and compression. Both are CPU-intense operations that can consume 20-40% of a web/application server’s resources. By offloading these functions to an application network infrastructure solution, servers “reclaim” those resources and can use them instead to execute application logic, serve more users, handle more requests, and do so faster. Server offload improves application performance by allowing the web/application server to concentrate on what it is designed to do: serve applications, and putting the onus for performing ancillary functions on a platform that is more optimized to handle those functions. Server offload provides these benefits whether you have a traditional client-server architecture or have moved (or are moving) toward a virtualized infrastructure. Applications deployed on virtual servers still use TCP connections and SSL and run applications and therefore will benefit the same as those deployed on traditional servers.

3 Really good reasons you should use TCP multiplexing
SOA & Web 2.0: The Connection Management Challenge
Understanding network-side scripting
I am in your HTTP headers, attacking your application
Infrastructure 2.0: As a matter of fact that isn't what it means

iCall Triggers - Invalidating Cache from iRules
iCall is BIG-IP's all new (as of BIG-IP version 11.4) event-based automation system for the control plane. Previously, I wrote up the iCall system overview, as well as an article on the use of a periodic handler for automating backups. This article will feature the use of the triggered iCall handler to allow a user to submit an http request to invalidate the cache served up for an application managed by the Application Acceleration Manager.

Starting at the End

Before we get to the solution, I'd like to address the use case for invalidating cache. In many cases, the team responsible for an application's health is not the network services team, which is the typical point of access to the BIG-IP. For large organizations with process overhead in generating tickets, invalidating cache can take time. A lot of time. So the request has come in quite frequently..."How can I invalidate cache remotely?" Or even more often, "Can I invalidate cache from an iRule?" Others have approached this via script, and it has been absolutely possible previously with iRules, albeit through very ugly and very-not-recommended ways. In the end, you just need to issue one TMSH command to invalidate the cache for a particular application:

tmsh::modify wam application content-expiration-time now

So how do we get signal from iRules to instruct BIG-IP to run a TMSH command? This is where iCall trigger handlers come in. Before we hop back to the beginning and discuss the iRule, the process looks like this:

Back to the Beginning

The iStats interface was introduced in BIG-IP version 11 as a way to make data accessible to both the control and data planes. I'll use this to pass the data to the control plane. In this case, the only data I need to pass is to set a key. To set an iStats key, you need to specify:

· Class
· Object
· Measure type (counter, gauge, or string)
· Measure name

I'm not measuring anything, so I'll use a string starting with "WA policy string" and followed by the name of the policy. You can be explicit or allow the users to pass it in a query parameter as I'm doing in this iRule below:

when HTTP_REQUEST {
    if { [HTTP::path] eq "/invalidate" } {
        set wa_policy [URI::query [HTTP::uri] policy]
        if { $wa_policy ne "" } {
            ISTATS::set "WA policy string $wa_policy" 1
            HTTP::respond 200 content "App $wa_policy cache invalidated."
        } else {
            HTTP::respond 200 content "Please specify a policy /invalidate?policy=policy_name"
        }
    }
}

Setting the key this way will allow you to create as many triggers as you have policies. I'll leave it as an exercise for the reader to make that step more dynamic.

Setting the Trigger

With iStats-based triggers, you need linkage to bind the iStats key to an event-name, wacache in my case. You can also set thresholds and durations, but again since I am not measuring anything, that isn't necessary.

sys icall istats-trigger wacache_trigger_istats {
    event-name wacache
    istats-key "WA policy string wa_policy_name"
}

Creating the Script

The script is very simple. Clear the cache with the TMSH command, then remove the iStats key.

sys icall script wacache_script {
    app-service none
    definition {
        tmsh::modify wam application dc.wa_hero content-expiration-time now
        exec istats remove "WA policy string wa_policy_name"
    }
    description none
    events none
}

Creating the Handler

The handler is the glue that binds the event I created in the iStats trigger. When the handler sees an event named wacache, it'll execute the wacache_script iCall script.
sys icall handler triggered wacache_trigger_handler {
    script wacache_script
    subscriptions {
        messages {
            event-name wacache
        }
    }
}

Notes on Testing

Add this command to your arsenal – tmsh generate sys icall event <event-name> context none – where event-name in my case is wacache. This allows you to troubleshoot the handler and script without worrying about the trigger. And this one – tmsh modify sys db log.evrouted.level value Debug. Just note that the default is Notice when you're all done troubleshooting.

F5 Friday: How to Create Your Own URL Shortener
Network-side scripting and really big, really fast tables let you implement your own (controllable) URL shortening service.

We all use URL shorteners to share links, especially via Twitter and other space-constrained communications channels. At the same time, we’re leery of clicking on a short URL that comes from someone we don’t know well enough to trust implicitly. And unless the service you’re using to exchange thoughts automatically applies a URL shortening service to any links contained within your message, you’re likely creating those short URLs by hand. We love to hate them and we hate to love them. But it is what it is, and what it is is both useful and somewhat risky. Basically there are three core issues with leveraging URL shortening services:

- Unless you’ve got a developer on hand (and even sometimes if you do) external URL shortening services require manual creation.
- Most services don’t allow “custom” domains, i.e. allow you to use your domain and simply shorten the URI. Those that do (bit.ly for example) require changes to your infrastructure (specifically DNS entries).
- Shortened URLs shared via traditional services are often suspect because these services have been used to “hide” the destination. The malicious use of short URLs engenders suspicion with many and a refusal to investigate on the off-chance the destination is a malware laden site or something NSFW (Not Safe For Work).

And yet sharing URLs becomes increasingly tedious the longer the URL is. Really, just because you can use several thousand characters doesn’t mean you should. Thus URL shorteners, despite their shortcomings, have become the method du jour for turning long URLs into easily consumed, sharable tidbits. We hate to love them, we love to hate them. We’re addicted to short URLs. To address the shortcomings, wouldn’t it be nice if you could maintain your own domain and still shorten those URLs? And wouldn’t it be even nicer if that meant you could actually gather usage statistics about that URL? While bit.ly’s “pro” service allows the former, it’s still amazingly immature in the reporting department, and it’s nigh-unto-impossible to extract that data any way but manually. Finally, wouldn’t it be nice if you could integrate the shortening process in a dynamic way rather than always creating them manually? Have I got a deal for you…

iRULE CUSTOM URL SHORTENER

I talk a lot about network-side scripting as an agile method of, well, manipulating application requests and data on-demand. From inbound inspection to outbound rewriting, network-side scripting is the realization of one of the foundational dynamic datacenter components: dynamic infrastructure. Providing real-time interaction with requests and responses traversing an intelligent intermediary means devops, infosec, developers, and network teams have the tools with which they can address a variety of obstacles and pain-points. In this case, it’s adding business value and increasing visibility; maintaining control and ensuring the integrity of links shared for whatever the reason. It also allows the ability to better discern from where and whom links are being picked up. It’s real-time campaign tracking. The core value here though is two-fold: (1) you maintain control and (2) you use your own domain to provide some measure of integrity assurance to those you’re sharing the links with. The secondary and tertiary benefits are in having a way to track business and marketing campaigns.
An immediate question should be (it was for me) “what about performance?” Just how large can a table containing a mapping of short URIs to long URIs get before it starts to impede performance? This is essentially a proxy solution, so every microsecond it takes to look up the short URI and replace it with a long URI adds to the response time of a request. Well, the bonus if you’re using BIG-IP LTM and an iRule is that the functionality is taking advantage of the core platform session table which, if you know a thing or two about networking, absolutely must be high-speed, high-performance in its ability to perform lookups because it can grow to billions of entries in high-traffic situations. So the answer from the experts to my question was, “Giant. Huge. Ginormous.”

The second bonus is that you don’t necessarily have to do a redirect, which adds to the overall response time. With out-of-band URL shortening services the request goes to a third-party proxy, is translated, and a redirect to the original is returned to the user. Then the user’s browser automatically makes a second request and gets the content they wanted. With an integrated, full-proxy iRule-based solution the redirect isn’t strictly necessary. While you can still use that same method, it would be much more efficient to simply look up the short URI, grab the full URI, and then simply replace the requested URI with the real one and send it on to the server. You’re eliminating time on the wire between the third-party service and the user completely, and the associated TCP-session setup/teardown time, which we know is rather expensive in terms of time and resources. You can still do a redirect if you want to, but it’s completely unnecessary unless, of course, you’re planning on offering the capability as an out-of-band service to your customers. So by using an iRule you can improve performance, increase visibility, and provide some measure of integrity assurance while you’re out there sharing links with whomever you’re sharing them with. Additionally, it’s just a darn cool use of iRules that has a lot of potential to be modified and used for other situations in which URI mapping might be useful.

And of course it happens to be the case that DevCentral’s newest cohort, George Watkins, has written up an iRule to handle URI shortening. iRule wizard Colin Walker helped optimize the rule, so it ought to be a very efficient little iRule. Go ahead and give George’s URI shortening iRule a look-see and try it out. If you don’t have a BIG-IP yourself, then go ahead and get one – iRules are a part of the core TMOS platform upon which BIG-IP products and modules are based, so the VE (Virtual Edition) of BIG-IP LTM has everything you need to deploy the iRule and take it for a spin.

NOTE: George’s version of the iRule is based on an out-of-band service model. Using HTTP::uri instead of HTTP::redirect for the URL will change the behavior and eliminate the overhead of the redirect, but don’t forget to assign the iRule to the appropriate VIP. It is also a manual create, but there’s no reason you could not integrate the iRule functionality into the response processing and rewrite all URIs in a page to be small URLs automatically – or just any URL with a length greater than .

Happy coding!
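To illustrate the in-band variant described in the note above, here is a minimal sketch of the lookup-and-rewrite approach using the session table. It is not George's iRule; the key prefix and the /s/ path convention are assumptions made purely for the example.

when HTTP_REQUEST {
    # short links are assumed to look like /s/<key>, with the mapping
    # stored in the session table out of band (e.g., by a creation iRule)
    if { [HTTP::path] starts_with "/s/" } {
        set short_key [string range [HTTP::path] 3 end]
        set long_uri [table lookup "shorturl_$short_key"]
        if { $long_uri ne "" } {
            # rewrite in place -- no redirect, no extra round trip
            HTTP::uri $long_uri
        } else {
            HTTP::respond 404 content "Unknown short link."
        }
    }
}

Swapping HTTP::uri for HTTP::redirect in the sketch turns it back into the out-of-band model, at the cost of the extra round trip discussed earlier.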
Related Posts
All F5 Friday Entries on DevCentral
All About iRules from tag iRules
F5 Friday: Eavesdropping on Availability
Defeating Attacks Easier Than Detecting Them
F5 Friday: An On-Demand Turing Test
Out, Damn’d Bot! Out, I Say!
F5 Friday: A Network Heatwave That’s Good For Operations
No Shirt, No Shoes, No HTTP Service
Is Vendor Lock-In Really a Bad Thing?
AJAX and Network-Side Scripting
Automatically Removing Cookies from tag performance
The Great Client-Server Architecture Myth
IE8: Robbing Peter to pay Paul
Your Network is Not My Network

del.icio.us Tags: MacVittie, F5, F5 Friday, George Watkins, Colin Walker, iRules, URL shortener, bit.ly, performance

WILS: SSL TPS versus HTTP TPS over SSL
The difference between these two performance metrics is significant, so be sure you know which one you’re measuring, and which one you wanted to be measuring.

It may be the case that you’ve decided that SSL is, in fact, a good idea for securing data in transit. Excellent. Now you’re trying to figure out how to implement support and you’re testing solutions or perhaps trying to peruse reports someone else generated from testing. Excellent. I’m a huge testing fan and it really is one of the best ways to size a solution specifically for your environment. Some of the terminology used to describe specific performance metrics in application delivery, however, can be misleading. The difference between SSL TPS (Transactions per second) and HTTP TPS over SSL, for example, is significant, and the two therefore should not be used interchangeably when comparing performance and capacity of any solution – that goes for software, hardware, or some yet-to-be-defined combination thereof.

The reason interpreting claims of SSL TPS is so difficult is the ambiguity that comes from SSL itself. SSL “transactions” are, by general industry agreement (unenforceable, of course), a single transaction that is “wrapped” in an SSL session. Generally speaking one SSL transaction is considered:

1. Session establishment (authentication, key exchange)
2. Exchange of data over SSL, often a 1KB file over HTTP
3. Session closure

Seems logical, but technically speaking a single SSL transaction could be interpreted as any single transaction conducted over an SSL encrypted session because the very act of transmitting data over the SSL session necessarily requires SSL-related operations. SSL session establishment requires a handshake and an exchange of keys, and the transfer of data within such a session requires the invocation of encryption and decryption operations (often referred to as bulk encryption). Therefore it is technically accurate for SSL capacity/performance metrics to use the term “SSL TPS” and be referring to two completely different things. This means it is important that whoever is interested in such data must do a little research to determine exactly what is meant by SSL TPS when presented with such data. Based on the definition, the actual results mean different things. When used to refer to HTTP TPS over SSL the constraint is actually on the bulk encryption rate (related more to response time, latency, and throughput measurements), while SSL TPS measures the number of SSL sessions that can be created per second and is more related to capacity than response time metrics.

It can be difficult to determine which method was utilized, but if you see the term “SSL ID re-use” anywhere, you can be relatively certain the test results refer to HTTP TPS over SSL rather than SSL TPS. When SSL session IDs are reused, the handshaking and key exchange steps are skipped, which reduces the number of computationally expensive RSA operations that must be performed and artificially increases the results. As always, if you aren’t sure what a performance metric really means, ask. If you don’t get a straight answer, ask again, or take advantage of all that great social networking you’re doing and find someone you trust to help you determine what was really tested. Basing architectural decisions on misleading or misunderstood data can cause grief and be expensive later when you have to purchase additional licenses or solutions to bring your capacity up to what was originally expected.

WILS: Write It Like Seth.
Seth Godin always gets his point across with brevity and wit. WILS is an attempt to be concise about application delivery topics and just get straight to the point. No dilly-dallying around.

The Anatomy of an SSL Handshake
When Did Specialized Hardware Become a Dirty Word?
WILS: Virtual Server versus Virtual IP Address
Following Google’s Lead on Security? Don’t Forget to Encrypt Cookies
WILS: What Does It Mean to Align IT with the Business
WILS: Three Ways To Better Utilize Resources In Any Data Center
WILS: Why Does Load Balancing Improve Application Performance?
WILS: Application Acceleration versus Optimization
All WILS Topics on DevCentral
What is server offload and why do I need it?

F5 Friday: The 2048-bit Keys to the Kingdom
There’s a rarely mentioned move from 1024-bit to 2048-bit key lengths in the security demesne … are you ready? More importantly, are your infrastructure and applications ready?

Everyone has likely read about DNSSEC and the exciting day on which the root servers were signed. In response to security concerns – and very valid ones at that – around the veracity of responses returned by DNS, which underpins the entire Internet, the practice of signing responses was introduced. Everyone who had anything to do with encryption and certificates said something about the initiative. But less mentioned was a move to leverage longer RSA key lengths as a means to increase the security of the encryption of data, a la SSL (Secure Socket Layer). While there have been a few stories on SSL vulnerabilities – Dan Kaminsky illustrated flaws in the system at Black Hat last year – there’s been very little public discussion about the transition in key sizes across the industry. The last time we had such a massive move in the cryptography space was back when we moved from 128-bit to 256-bit keys. Some folks may remember that many early adopters of the Internet had issues with browser support back then, and the performance and capacity of infrastructure were very negatively impacted. Well, that’s about to happen again as we move from 1024-bit keys to 2048-bit keys – and the recommended transition deadline is fast approaching. In fact, NIST is recommending the transition by January 1st, 2011 and several key providers of certificates are already restricting the issuance of certificates to 2048-bit keys.

NIST: Recommends transition to 2048-bit key lengths by Jan 1st 2011 (Special Publication 800-57 Part 1, Table 4)
VeriSign: Started focusing on 2048-bit keys in 2006; complete transition by October 2010. Indicates their transition is to comply with best practices as recommended by NIST
GeoTrust: Clearly indicates why it transitioned to only 2048-bit keys in June 2010
Entrust: Also following NIST recommendations: TN 7710 - Entrust is moving to 2048-bit RSA keys
GoDaddy: "We enforced a new policy where all newly issued and renewed certificates must be 2048-bit." Extended Validation (EV) required 2048-bit keys on 1/1/09

Note that it isn’t just providers who are making this move. Microsoft uses and recommends 2048-bit keys per the NIST guidelines for all servers and other products. Red Hat recommends 2048+ length for keys using the RSA algorithm. And as of December 31, 2013 Mozilla will disable or remove all root certificates with RSA key sizes smaller than 2048 bits. That means sites that have not made the move as of that date will find it difficult for customers and visitors to hook up, as it were.

THE IMPACT on YOU

The impact on organizations that take advantage of encryption and decryption to secure web sites, sign code, and authenticate access is primarily in performance and capacity. The decrease in performance as key sizes increase is not linear, but more on the lines of exponential. For example, though the key size is shifting by a factor of two, F5 internal testing indicates that such a shift results in approximately a 5x reduction in performance (as measured by TPS – Transactions per Second). This reduction in performance has also been seen by others in the space, as indicated by a recent Citrix announcement of a 5x increase in performance of its cryptographic processing. This decrease in TPS is due primarily to heavy use of the key during the handshaking process.
The impact on you is heavily dependent on how much of your infrastructure leverages SSL. For some organizations – those that require SSL end-to-end – the impact will be much higher. Any infrastructure component that terminates SSL and re-encrypts the data as a means to provide inline functionality (think IDS, load balancer, web application firewall, anti-virus scan) will need to also support 2048-bit keys, and if new certificates are necessary these, too, will need to be deployed throughout the infrastructure. Any organization with additional security/encryption requirements over and above simply SSL encryption, such as FIPS 140-2 or higher, is looking at new/additional hardware to support the migration. (Note: There are architectural solutions to avoid the type of forklift upgrade necessary; we’ll get to that shortly.)

If your infrastructure is currently supporting SSL encryption/decryption on your web/application servers, you’ll certainly want to start investigating the impact on capacity and performance now. SSL with 1024-bit keys typically requires about 30% of a server’s resources (RAM, CPU) and the increase to 2048-bit keys will require more, which necessarily comes from the resources used by the application. That means a decrease in capacity of applications running on servers on which SSL is terminated and typically a degradation in performance. In general, the decrease we (and others) have seen in TPS performance on hardware should give you a good idea of what to expect on software or virtual network appliances. As a general rule you should determine what level of SSL transactions you are currently licensed for and divide that number by five to determine whether you can maintain the capacity you have today after a migration to 2048-bit keys. It may not be a pretty picture.

ADVANTAGES of SSL OFFLOAD

If the advantages of offloading SSL to an external infrastructure component were significant before, the move from 1024-bit keys to 2048-bit keys makes them nearly indispensable to maintaining performance and capacity of existing applications and infrastructure. Offloading SSL to an external infrastructure component enabled with specialized hardware further improves the capacity and performance of these mathematically complex and compute intensive processes.

ARCHITECTURAL SOLUTION to support 1024-bit key only applications

If you were thinking about leveraging a virtual network appliance for this purpose, you might want to think about that one again. Early testing of RSA operations using 2048-bit keys on 64-bit commodity hardware shows a capacity in the hundreds of transactions per second. Not tens of thousands, not even thousands, but hundreds. Even if the only use of SSL in your organization is to provide secure web-based access to e-mail, a la Microsoft Web Outlook, this is likely unacceptable. Remember there is rarely a 1:1 relationship between connections and web applications today, and each connection requires the use of those SSL operations, which can drastically impact the capacity in terms of user concurrency. Perhaps as important is the ability to architect around limitations imposed by applications on the security infrastructure. For example, many legacy applications (Lotus Notes, IIS 5.0) do not support 2048-bit keys. Thus meeting the recommendation to migrate to 2048-bit keys is all but impossible for this class of application.
Leveraging the capabilities of an application delivery controller that can support 2048-bit keys, however, allows for the continued support of 1024-bit keys to the application while supporting 2048-bit keys to the client.

ARE YOU READY?

That’s a question only you can answer, and you can only answer that by taking a good look at your infrastructure and applications. Now is a good time to evaluate your SSL strategy to ensure it’s up to the challenge of 2048-bit keys. Check your licenses, determine your current capacity and requirements, and compare those to what can be realistically expected once the migration is complete. Validate that applications currently requiring 1024-bit keys can support 2048-bit keys or whether such a migration is contraindicated by the application, and investigate whether a proxy-based (mediation) solution might be appropriate. And don’t forget to determine whether or not compliance with regulations may require new hardware solutions.

Now this is an F5 Friday post, so you knew there had to be some tie-in, right? Other than the fact that the red glowing ball on every BIG-IP just looks hawesome in the dim light of a data center, F5 solutions can mitigate many potential negative impacts resulting from a migration of 1024-bit to 2048-bit key lengths:

BIG-IP Specialized Hardware – BIG-IP hardware platforms include specialized RSA acceleration hardware that improves the performance of the RSA operations necessary to support encryption/decryption and SSL communication and enables higher capacities of the same.

EM (Enterprise Manager) Streamlines Certificate Management – F5’s centralized management solution, EM (Enterprise Manager), allows an organization to better manage a cryptographic infrastructure by providing the means to monitor and manage key expirations across all F5 solutions and collect TPS history and usage when sizing to better understand capacity constraints.

BIG-IP Flexibility – BIG-IP is a full proxy-based solution. It can mediate between clients and applications that have disparate requirements, such as may be the case with key sizes. This allows you to use 2048-bit keys but retain the use of 1024-bit keys to web/application servers and other infrastructure solutions. Strong partnerships and integration with leading centralized key management and crypto vendors provide automated key migration and provisioning through open and standards-based APIs and robust scripting capabilities.

DNSSEC – Enhance security through DNSSEC to validate domain names. Although it has been suggested that 1024-bit keys might be sufficient for signing zones, with the forced migration to 2048-bit keys there will be increased pressure on the DNS infrastructure that may require a new solution for your DNS systems.

THIS IS IN MANY REGARDS INFOSEC’S “Y2K”

In many ways a change of this magnitude is for Information Security professionals their “Y2K”, because such a migration will have an impact on nearly every component and application in the data center. Unfortunately for the security folks, we had a lot more time to prepare for Y2K… so get started, go through the checklist, and get yourself ready to make the switch now before the eleventh hour is upon us.
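As a rough sketch of the mediation scenario described under "BIG-IP Flexibility" – 2048-bit keys on the client side, an existing 1024-bit-only application on the server side – the configuration might look something like the following. The profile, certificate, pool, and address names are invented for the example, the syntax is abbreviated, and details vary by version, so treat it as an illustration of the pattern rather than a working configuration.

ltm profile client-ssl clientssl_2048 {
    defaults-from clientssl
    cert www_example_com_2048.crt
    key www_example_com_2048.key
}
ltm profile server-ssl serverssl_legacy {
    defaults-from serverssl
}
ltm virtual vs_legacy_app {
    destination 192.0.2.20:443
    pool pool_legacy_app
    profiles {
        clientssl_2048 { context clientside }
        serverssl_legacy { context serverside }
        http { }
    }
}

Clients negotiate against the 2048-bit certificate, while the server-side connection is re-encrypted to the legacy application using whatever key length it still supports.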
Related blogs & articles:
The Anatomy of an SSL Handshake [Network Computing]
DNSSEC Readiness [ISC.org]
Get Ready for the Impact of 2048-bit RSA Keys [Network Computing]
SSL handshake latency and HTTPS optimizations [semicomplete.com]
Pete Silva Demonstrates the FirePass SSL-VPN
Data Center Feng Shui: SSL
WILS: SSL TPS versus HTTP TPS over SSL
SSL performance - DevCentral
DevCentral Weekly Roundup | Audio Podcast - SSL
iControl Apps - #12 - Global SSL Statistics
Oracle 10g SSL Offload - JInitiator:X509CertChainInvalidErr error
Requiring an SSL Certificate for Parts of an Application
The Order of (Network) Operations

What is a Strategic Point of Control Anyway?
From mammoth hunting to military maneuvers to the datacenter, the key to success is control.

Recalling your elementary school lessons, you’ll probably remember that mammoths were large and dangerous creatures and like most animals they were quite deadly to primitive man. But yet man found a way to hunt them effectively and, we assume, with more than a small degree of success as we are still here and, well, the mammoths aren’t.

Image: Marx Cavemen. Photo and art work: Fred R Hinojosa.

The theory of how man successfully hunted ginormous creatures like the mammoth goes something like this: a group of hunters would single out a mammoth and herd it toward a point at which the hunters would have an advantage – a narrow mountain pass, a clearing enclosed by large rock, etc… The qualifying criteria for the place in which the hunters would finally confront their next meal was that it afforded the hunters a strategic point of control over the mammoth’s movement. The mammoth could not move away without either (a) climbing sheer rock walls or (b) being attacked by the hunters. By forcing mammoths into a confined space, the hunters controlled the environment and the mammoth’s ability to flee, thus a successful hunt was had by all. At least by all the hunters; the mammoths probably didn’t find it successful at all.

Whether you consider mammoth hunting or military maneuvers or strategy-based games (chess, checkers) one thing remains the same: a winning strategy almost always involves forcing the opposition into a situation over which you have control. That might be a mountain pass, or a densely wooded forest, or a bridge. The key is to force the entire complement of the opposition through an easily and tightly controlled path. Once they’re on that path – and can’t turn back – you can execute your plan of attack. These easily and highly constrained paths are “strategic points of control.” They are strategic because they are the points at which you are empowered to perform some action with a high degree of assurance of success.

In data center architecture there are several “strategic points of control” at which security, optimization, and acceleration policies can be applied to inbound and outbound data. These strategic points of control are important to recognize as they are the most efficient – and effective – points at which control can be exerted over the use of data center resources.

DATA CENTER STRATEGIC POINTS of CONTROL

In every data center architecture there are aggregation points. These are points (one or more components) through which all traffic is forced to flow, for one reason or another. For example, the most obvious strategic point of control within a data center is at its perimeter – the router and firewalls that control inbound access to resources and in some cases control outbound access as well. All data flows through this strategic point of control and because it’s at the perimeter of the data center it makes sense to implement broad resource access policies at this point. Similarly, strategic points of control occur internal to the data center at several “tiers” within the architecture. Several of these tiers are:

Storage virtualization provides a unified view of storage resources by virtualizing storage solutions (NAS, SAN, etc…). Because the storage virtualization tier manages all access to the resources it is managing, it is a strategic point of control at which optimization and security policies can be easily applied.
Application Delivery / load balancing virtualizes application instances and ensures availability and scalability of an application. Because it is virtualizing the application it therefore becomes a point of aggregation through which all requests and responses for an application must flow. It is a strategic point of control for application security, optimization, and acceleration.

Network virtualization is emerging internal to the data center architecture as a means to provide inter-virtual machine connectivity more efficiently than perhaps can be achieved through traditional network connectivity. Virtual switches often reside on a server on which multiple applications have been deployed within virtual machines. Traditionally it might be necessary for communication between those applications to physically exit and re-enter the server’s network card. But by virtualizing the network at this tier the physical traversal path is eliminated (and the associated latency, by the way) and more efficient inter-vm communication can be achieved. This is a strategic point of control at which access to applications at the network layer should be applied, especially in a public cloud environment where inter-organizational residency on the same physical machine is highly likely.

OLD SKOOL VIRTUALIZATION EVOLVES

You might have begun noticing a central theme to these strategic points of control: they are all points at which some kind of virtualization – and thus aggregation – occurs naturally in a data center architecture. This is the original (first) kind of virtualization: the presentation of many resources as a single resource, a la load balancing and other proxy-based solutions. When there is a one —> many (1:M) virtualization solution employed, it naturally becomes a strategic point of control by virtue of the fact that all “X” traffic must flow through that solution and thus policies regarding access, security, logging, etc… can be applied in a single, centrally managed location. The key here is “strategic” and “control”. The former relates to the ability to apply the latter over data at a single point in the data path.

This kind of 1:M virtualization has been a part of datacenter architectures since the mid 1990s. It’s evolved to provide ever broader and deeper control over the data that must traverse these points of control by nature of network design. These points have become, over time, strategic in terms of the ability to consistently apply policies to data in as operationally efficient a manner as possible. Thus have these virtualization layers become “strategic points of control”. And you thought the term was just another square on the buzz-word bingo card, didn’t you?

Caching FAQs
One of the most mysterious parts of the BIG-IP Application Acceleration Manager (AAM) is caching. Rarely is it explained, and there are very few documents that describe why you would or would not use one of the BIG-IP's caching facilities. Even harder to find is some kind of description of what numbers you should use, or whether or not to push some specific caching button when trying to configure your AAM policies or applications. So here's an overview of a select few bits of frequently asked AAM caching questions, and some explanation of why you would or would not do something with those pretty buttons and number fields.

To be clear, AAM does not use fast Cache; it has two entirely separate and distinct caching systems of its own: Metastor and the Small Object Cache. In this posting, however, we'll be talking about them, mostly, as if they are one and the same.

The 4 most commonly asked questions we get regarding caching are as follows:

· Why is there an option to turn off cache on first hit, and why would I ever enable this?
· What does Queue Parallel Requests do?
· Why would I ever set the maximum object size to anything less than infinity?
· OK, a maximum object size makes sense, but what about the minimum object size?

Each question is addressed using an analogy of putting marbles into a mason jar. We are, of course, talking about web objects and bytes of data, not marbles and weight.

1) "Why is there an option to turn off cache on first hit, and why would I ever do so?"

OK, well, let's start with a simple mental model of a cache. Imagine your website as just a bunch of marbles. To keep it simple, all your marbles are the same size. Now think of a cache as being like a Mason jar. Imagine if the Mason jar is just big enough to hold exactly one marble. You can think of the BIG-IP as a super-fast copying machine that can copy marbles, and store one copy of one marble. Finally, imagine a single user sending requests for marbles to your website through the BIG-IP, where every policy node has "Cache marbles on first hit" turned on, and every marble is cacheable, and cached if requested. Pretty simple, right?

If you have "Cache marble on first hit" turned on, then the very first request your user makes for a marble will cause the BIG-IP to turn around, get that marble from the website, copy it, put that copy into the Mason jar, and then hand the original marble to your user. At this point, the Mason jar is full. If the next request your user makes is for a different marble, then the first marble must be removed from the jar in order to make room for the one just requested. Sadly, the effort and time it took to copy and put the first marble into the Mason jar was entirely wasted, and the user got both of his marbles later, and more slowly, than he would have if the BIG-IP had simply taken them from the website and handed them to your user. If the third request the customer makes is for the first marble, then again the Mason jar has to be emptied and the first marble cached (remember only a single marble can be cached at any time). The BIG-IP is churning away, copying then putting a marble into the Mason jar, then emptying out the Mason jar, but never actually getting any value out of having that Mason jar. If the user keeps switching back and forth between requesting the first marble and the second marble, the jar will never have the marble being requested, and the load on the back end servers has not been reduced. This is considered a zero cache scenario, where the benefits of the cache are moot.
But imagine if "Cache marble on first hit" is turned off. Now the same marble has to be requested twice before the BIG-IP will copy it and put the copy in the Mason jar. So now, with the first request the BIG-IP does nothing but pass it along. However, the BIG-IP remembers that the blue marble was requested once. The second request also does nothing but pass the marble along, but again, the BIG-IP remembers that, say, a red marble was requested once. At this point, if the user goes back and asks for the blue marble again, it has been requested twice, so it will be copied and stored in the Mason jar. If the user then asks for a green marble, the BIG-IP remembers that the request was made, but does not discard the marble in the jar, as this is the first request. If the user requests the blue marble again, then the user will get a copy of that from the Mason jar, not from your website. You now have an effective cache where 1 in 5 requests have been offloaded from the origin server. In summary, turn off "Cache object on first hit" for policy nodes where the objects either change very quickly, or where the time between requests is relatively long. This will prevent the cache from discarding an object that your users will hopefully be requesting more often, and more frequently. Obviously, the flip side of that coin is that the BIG-IP will have to get the same object from your website twice, so if you are sure that the objects matched by a particular policy node are really popular, and that they will be requested quite frequently, (such as the company logo and navigation buttons) then copy 'em and dump them in the cache the first time they are requested. 2) What is "Queue Parallel Requests" and why would I turn it on? Queuing parallel requests is interesting, as it interacts with caching, but it really only helps when you have a lot of users trying to get the same marble at the same time, and that marble is being cached for the first time. A cache is kind of stupid, and it doesn't remember the marbles it threw away. As a result, any marble being put into it looks like it is being stored "for the first time", even when it is actually being put into the jar for the hundredth time. "Queue Parallel Requests" basically makes all the users who are requesting the same marble wait for it to be fetched off of your website, and then copied once for each user by the BIG-IP. That doesn't sound too interesting or useful until you realize that if you don't turn this on, then between the time you start the process of requesting that marble from your website and finish putting it into the jar, every other request for that same marble will have to be forwarded to your website. Image a scenario where a server takes 2 ms to respond to a request for an object. Every ms 2 new users request the object. In the time it has taken the server to respond to the first request 3 additional requests would have been sent for the server to process. This has created unnecessary demand on the servers. With queuing turned on all subsequent requests for the object will be placed into a parking area to wait for the original response to be returned and cached. Four requests doesn’t sound like it will cause a server to be overloaded, but what if it isn’t 4 but 400 requests. Suddenly, queuing sounds like a better idea, right? It is, but like any other feature, it is not a panacea. Turn it on for new, shareable, highly popular objects that remain the same for a relatively long time. 
3) There is an option to set the minimum and maximum cacheable object size. Why would I ever set the maximum object size to anything less than infinity?

Yeah, that's a tough one. First, go read the answer to "Why turn off cache content on first hit?". Then, let's imagine a Mason jar that, instead of one marble, is big enough to store one thousand marbles. In this scenario, however, we are going to assume exactly 16 simultaneous users, and also that the marbles they are requesting are all in the jar. Obviously, the web servers in your pool are getting zero requests. Cool, right!? When caching is working, it can be really handy!

But now let us change one assumption: let's allow your website objects to vary in size. We still have 16 users, but there is one marble that is twice the diameter of the marbles in our first example (think of a double-diameter marble as taking up the space of four ordinary marbles, and of the jar as already being full of popular marbles). Caching that one large marble reduces the total number of marbles that can be cached: only 13 of the original 16 requests can be served from the jar, and the other 3 requests have to go to the server pool. If every marble in the cache is twice the diameter of the marbles in our first example, twelve of the 16 requests being made have to go to your pool. At the extreme, if one object completely fills the Mason jar, that marble (well, bowling ball, really!) is the only object that can be served from cache; the other 15 requests have to go to your pool. So you limit the maximum size of the marbles that can be stored in your Mason jar to configure the BIG-IP to serve the average number of simultaneous users you expect, and wish, to serve. By the emergent properties of the system, it turns out that large objects are often not that popular anyway. Unless you are running a web server whose job it is to serve large patch files to end users, that is.
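If it helps to see the arithmetic, here is a small Python sketch of a size-aware jar; it is illustrative only (the real AAM eviction logic is weight-based, not a simple oldest-first eviction), but it shows how one oversized object crowds out many small ones unless a maximum object size keeps it out of the cache.

# A jar with a fixed byte budget. Without a maximum object size, one
# "bowling ball" evicts many marbles; with a limit, it simply bypasses the
# cache. Illustrative only; AAM's real eviction is weight-based.
from collections import OrderedDict

class Jar:
    def __init__(self, capacity_units, max_object_units=None):
        self.capacity = capacity_units
        self.max_object = max_object_units
        self.items = OrderedDict()              # name -> size, oldest first

    def used(self):
        return sum(self.items.values())

    def put(self, name, size):
        if self.max_object is not None and size > self.max_object:
            return False                        # too big: do not cache it
        while self.items and self.used() + size > self.capacity:
            self.items.popitem(last=False)      # evict the oldest to make room
        if size <= self.capacity:
            self.items[name] = size
        return True

no_limit = Jar(capacity_units=16)               # room for 16 one-unit marbles
for i in range(16):
    no_limit.put(f"marble-{i}", 1)
no_limit.put("bowling-ball", 12)                # evicts 12 of the 16 marbles
print(len(no_limit.items), "objects cached")    # 5: the ball plus 4 marbles

with_limit = Jar(capacity_units=16, max_object_units=4)
for i in range(16):
    with_limit.put(f"marble-{i}", 1)
with_limit.put("bowling-ball", 12)              # rejected; the marbles stay
print(len(with_limit.items), "objects cached")  # 16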
4) OK, a maximum object size makes sense. So why have a minimum object size?

OK, now we have to get explicit about the jar, and about knowing what has been requested, copied, and stored in the jar. Assume that we have a peg board that has exactly one thousand holes in it. Each time we dump a marble in the jar, we write out a tag that describes the marble, tie it to a peg, then put that peg into the peg board. When we remove a marble from the jar, we remove its associated peg from the board. When the peg board is full, we can't store any more marbles in the Mason jar. Now, what if your minimum size is that of a grain of sand, but your Mason jar is big enough to fit 100 marbles with a diameter of 2 inches? If what is popular, and requested quite frequently, is a bunch of grains of sand, you can end up running out of peg board space long, LONG before you even finish coating the bottom of your Mason jar with sand.

Giving your customers copies of those grains of sand will happen often, but will by definition be a smaller percentage of the total volume of traffic than if you made your minimum size larger, provided you still have enough marbles of that minimum size on your website to fill your cache. Another way of looking at it is in terms of a collection of marbles of all sizes. If a large marble is in the cache, and it has to be displaced to make room on the peg board for a tag that records the information for a grain of sand, and then the grain of sand has to be displaced to make room for the large marble, you will have to get both off of your origin servers. If you don't try to cache the sand grain, then when a user asks for the larger marble, the total weight of marbles requested from your server is going to be smaller. Even if that grain of sand has to be served from your server several times in order to keep the larger marble in the jar, that is still a lot fewer total grams of marbles moved, copied, and stored in or retrieved from the jar. Obviously, there is a trade-off here between the number of requests and the total weight of the marbles being requested.
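A short Python sketch of the peg-board problem (again, just the analogy, not AAM internals): every cached object consumes one peg regardless of its size, so caching grains of sand can exhaust the pegs long before the jar's byte budget is used, while a minimum object size keeps the pegs free for marbles.

# Every cached object consumes one metadata "peg" no matter how small it is,
# so tiny objects can exhaust the peg board while the jar stays nearly empty.
# A minimum object size keeps the sand out. Illustrative only.
def fill(jar_bytes, max_pegs, object_size, min_object_size=0):
    if object_size < min_object_size:
        return 0, 0.0                   # too small: bypasses the cache entirely
    stored = 0
    while stored < max_pegs and (stored + 1) * object_size <= jar_bytes:
        stored += 1                     # one peg per object, whatever its size
    return stored, stored * object_size / jar_bytes

jar_bytes = 100 * 2048                  # room for 100 two-kilobyte "marbles"
max_pegs = 1000                         # the peg board

pegs, full = fill(jar_bytes, max_pegs, object_size=16)        # grains of sand
print(f"sand:    {pegs} pegs used, jar {full:.1%} full")      # 1000 pegs, 7.8% full

pegs, full = fill(jar_bytes, max_pegs, object_size=2048)      # marbles
print(f"marbles: {pegs} pegs used, jar {full:.1%} full")      # 100 pegs, 100.0% full

pegs, _ = fill(jar_bytes, max_pegs, object_size=16, min_object_size=1024)
print(f"sand with a 1 KB minimum object size: {pegs} pegs used")   # 0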
Putting it all together

Knowing when and what to cache is an important step in ensuring that BIG-IP and your application are performing optimally. Setting a parameter to the wrong value can have negative effects, causing increased traffic on your origin servers and consuming resources unnecessarily on the BIG-IP. Think about what you are trying to achieve, what other optimization features are enabled, and the traffic patterns of your site when configuring the cache settings.

Thank you to my colleague John Stevens for assistance in writing this article.

Make Your Cache Work For You

One of the questions we frequently get from the field and customers is how to appropriately tune the profile for caching. There are lots of settings in the profile, and a misconfiguration can actually cause some pretty adverse effects, so getting the settings tuned properly is highly recommended. Of course the answer to this question is my go-to response: ‘It depends.’ I am sure many people have gotten tired of always hearing the same answer for every question, but there is no one-size-fits-all answer to this question. The natural follow-on question is “What does it depend on?” Here I can help you with more details. First, are you trying to tune caching for RAM Cache (a.k.a. Fast Cache), or are you trying to tune for the Application Acceleration Manager (AAM)? The settings in the profile will perform differently for each of the caches.

How do you determine which objects are cacheable and for how long?

RAM Cache, as the name implies, is based entirely on RAM memory and is available with every BIG-IP LTM. AAM's cache, on the other hand, uses both RAM memory and disk for storing objects. How the two determine which objects to cache, and for how long, differs. AAM decides if an object is cacheable based on the policy associated with the application assigned to the profile. Filters are then applied based on object size, “Responses Cached” and profile settings. How long an object is cached for is then determined by the lifetime settings within the policy. RAM Cache determines if an object is cacheable, and for how long, based on the configuration within the profile. The settings are the same for all object types; there is no per-object setting as there is with AAM. This profile can control both AAM and RAM Cache, although the settings mean different things depending on which you are configuring for. Table 1 below highlights the differences in how caching decisions are made.

Table 1: How caching decisions are made

Cache Size
  RAM Cache: Maximum amount of space that can be used per profile. No borrowing occurs.
  AAM Cache: Minimum amount of space that is dedicated to the profile; borrowing will occur if resources are available.

Max Entries
  RAM Cache: Maximum number of objects that can be stored.
  AAM Cache: Number of references that are stored for objects in the resource and entity caches. A reference to an object can be evicted from the resource cache while the item still exists in cache and can be served. Responses served from cache may be slightly delayed in these circumstances, but requests will not be proxied to the origin web servers.

How long objects are cached for
  RAM Cache: Fixed for all objects, based on the max-age setting in the acceleration profile.
  AAM Cache: Configurable on a per-object or object-type basis in the acceleration policy.

Determination if an object is cacheable
  RAM Cache: Based on the configuration in the acceleration profile.
  AAM Cache: Based on the acceleration policy's "Responses Cached" and proxy settings, along with the object size settings from the acceleration profile.

How much space can be used for caching?

The maximum amount of space available for caching is half of the RAM a TMM process has been allocated, so which platform you are using will affect how much space is available for caching. RAM is used for smaller objects and disk is used for larger objects. The maximum amount of space (both memory and disk) that is available for caching with AAM is up to 256 GB per profile, if resources are available. This does NOT mean you should set the size on all profiles to 256 GB; AAM will borrow if space is available.
The trick is figuring out what the initial value should be. The following provides some guidelines on how to calculate this initial value.

Calculating the ideal cache size

The initial variables to care about regarding the cache size are OBJECT_SIZE and the lifetime settings. Of course, the values of these variables are going to depend (there's that pesky word again) on the application, the application content, the traffic patterns, and so on. The more unique cacheable objects an application has, the larger the cache it may need to run faster; however, if the frequency of access for those objects is low, a large cache may be a waste of space, since the objects expire in the cache (based on their lifetimes) before the next request needs them, and a high number of records adds cache latency. See, it depends.

When the cache is full, AAM will evict the entry that is deemed least important in order to make room for a new one, resulting in cache misses if the number of popular entities is higher than what the cache can accommodate. Lifetime settings have meaning here again, since a high age value can force the cache to keep rotating (evicting) still-valid content. The main goal should be to minimize evictions and maximize the load savings on the origin web servers. Other "external factors" that dictate the amount of memory/disk space available for caching in AAM are:

· Hardware specs.
· Number of applications running on that device.
· Other modules running on the BIG-IP.

As I said in the beginning, and as you can now see, this depends on a number of variables; there's no hard answer that applies to all scenarios. Knowing the specifics of the application makes setting the values easier, but if you don't know the specifics, here are some general guidelines on setting the values:

· Min/Max Object Size: Knowing the distribution of object sizes can help determine what these values could be. If your site is made up of mostly GIFs, setting a minimum object size of 10 KB could result in the majority of the objects not being cached. Similarly, if your objects are mostly Flash files and the maximum object size was set to 100 KB, not many items would be cached. Minimum values of 2-4 KB and maximum values of 1 MB are good starting points for these settings.

· Aging/Lifetime settings: How long content should be cached for is oftentimes a business decision. AAM uses default lifetimes of 4 hours for static content such as images and includes. This means an object will not be revalidated for 4 hours; in most instances this is good. Whether to alter this depends on how often objects are updated and how long it is safe to serve stale content. In most businesses it is rare for an object to be edited frequently. Yes, new objects and content will be added, but the same exact file will likely not change. Take a social site like LinkedIn for example: people are constantly changing their profiles, posting articles, and adding content, but much of the content, such as icons and JS files, stays the same. The last-modified dates of content on my LinkedIn home page range from November 2012 to today, with only a few objects from today. Having a cache serve these objects for 4 hours is relatively safe.

· Cache size: The cache-size value for the LTM web-acceleration profile should be set to a "trivial" value based on the content type. A good starting point could be the default value of 100 MB; however, if your site serves a lot of heavy images, a larger-than-default value may be appropriate.
Remember, AAM will borrow space if needed, so there is no need to set this to 100 GB. A value between 100-500 MB is likely a good starting point. The trick here is making sure the space isn't over- or under-utilized (more on this below).

· Number of entries: This should not be set to the total number of objects in the application, but rather calculated based on the cache size above, in either of the following ways:

1) If the content is primarily a single object type, such as images, you can calculate based on the average object size. According to HttpArchive, the average image size is 19 KB. If you set the cache size to 100 MB, then the max entries can be calculated using the following formula:

Cache size / average object size = Max entries
102400 / 19 = 5389

I would suggest rounding up to pad slightly, to a value of 6000.

2) Not all caches will hold exactly the same type of object; there will be objects of varying sizes and content types, so an alternative way of calculating the max entries is:

# of HTML pages * average # of objects per page = Max entries

HttpArchive reports that the average number of objects on a page is 95, and the average number of requests across a single domain is 51. Why the discrepancy, and which number should you use? With domain sharding and third-party content, the requests will not all come from a single FQDN. For the purpose of this calculation we are concerned with the objects that are being served from the origin servers, not the third-party content, so I will choose the lower of the two numbers. Sadly, there is no metric for the average number of pages; if you have access to that number, use it, otherwise you will have to guess. For the purpose of this example I am going with a nice round number of 300 pages.

300 * 51 = 15300

That's a lot of objects, and honestly it is probably too high, but we're not done calculating yet. We assumed that every page will be downloading 51 unique objects from cache, and this is not the case. There are likely common items on the pages (JS, CSS, images) that will be served from the browser's cache, and some pages that are only accessed once in a blue moon, so it is safe to estimate that 50-75% of the objects will be served from the cache, resulting in a total of 7650-11475. A number within this range would be a good starting point.
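As a sanity check on the arithmetic, here is a tiny Python sketch of both estimates, using the numbers from the example above (the 19 KB average image size and 51 requests per domain from HttpArchive, and the guessed 300 pages):

# Two back-of-the-envelope estimates for the max entries setting, using the
# example numbers above. Illustrative arithmetic only.
def entries_by_object_size(cache_size_kb, avg_object_kb):
    # Cache size / average object size = Max entries
    return cache_size_kb // avg_object_kb

def entries_by_page_count(pages, objects_per_page, cache_share=(0.50, 0.75)):
    # (# of HTML pages) * (average # of objects per page), scaled by the
    # share of those objects actually expected to be served from the cache.
    total = pages * objects_per_page
    return tuple(int(total * share) for share in cache_share)

print(entries_by_object_size(cache_size_kb=100 * 1024, avg_object_kb=19))  # 5389; round up to ~6000
print(entries_by_page_count(pages=300, objects_per_page=51))               # (7650, 11475)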
There is a bit of trial and error that goes into configuring the settings, but with the above guidance and the process below it becomes a bit easier to narrow in on the best settings.

1.- Set the cache values to a seed value as described above and evaluate.

2.- Let the application receive the traffic it is expected to receive normally.

3.- Monitor the cache stats. Via TMSH on the box:

$ tmsh show ltm profile web-acceleration

Or the TMCTL version, which provides the output in CSV for scripting analysis and parsing:

$ tmctl profile_webacceleration_jail_stat

For example:

tmctl -c profile_webacceleration_jail_stat | grep WEB_ACCEL_PROFILE_NAME | grep VIRTUAL_SERVER_NAME

And look for cache_size and cache_evictions. You can run the following (just put in the appropriate WEB_ACCEL_PROFILE_NAME and VIRTUAL_SERVER_NAME) to get the simplified table:

% cut_fields=`tmctl -c profile_webacceleration_jail_stat | head -1 | awk 'BEGIN{FS=","; fields="name,vs_name,cache_size,cache_evictions"; split(fields,sfx,","); for (x in sfx) sf[sfx[x]] = sfx[x]; cut_fields=""} { for (i=1; i<=NF; ++i) { if ($i in sf ) cut_fields=cut_fields i"," } } END{ print cut_fields }'`; echo ; echo 'Stats table:' ; tmctl -c profile_webacceleration_jail_stat | head -1 | cut -d ',' -f $cut_fields ; tmctl -c profile_webacceleration_jail_stat | grep WEB_ACCEL_PROFILE_NAME | grep VIRTUAL_SERVER_NAME | cut -d ',' -f $cut_fields; echo

Like:

% cut_fields=`tmctl -c profile_webacceleration_jail_stat | head -1 | awk 'BEGIN{FS=","; fields="name,vs_name,cache_size,cache_evictions"; split(fields,sfx,","); for (x in sfx) sf[sfx[x]] = sfx[x]; cut_fields=""} { for (i=1; i<=NF; ++i) { if ($i in sf ) cut_fields=cut_fields i"," } } END{ print cut_fields }'`; echo ; echo 'Stats table:' ; tmctl -c profile_webacceleration_jail_stat | head -1 | cut -d ',' -f $cut_fields ; tmctl -c profile_webacceleration_jail_stat | grep webacceleration | grep _listener | cut -d ',' -f $cut_fields; echo

This command outputs the cache size at that moment, and the cache evictions (the number of objects that were pushed out of the cache to make room for new objects). In the example, the cache is empty and as a result there are no evictions.

4.- Given that applications and traffic patterns are fluid and constantly changing, it is recommended to periodically monitor the cache size and store the data in a table to view trends over time (a small helper for this appears at the end of the article). If the maximum cache size is reached frequently, or there is a high number of cache evictions, then adjusting the cache size upward would be recommended. On the other hand, if you are barely reaching half the value for the cache size and there are no evictions, consider reducing the setting for a more efficient use of resources.

Maximizing cache hits depends highly on the traffic pattern. A pattern that is conducive to caching has a subset of documents, out of the entire document space, that are highly popular, plus a long tail of less popular documents. Ideally there is enough space to fit all of the highly popular documents; if not, then whatever can fit becomes the cacheable popular content and we have to live with it. As cache pressure rears its head, AAM throws out a document based on a calculated weight derived from its policy parameters, picking the document that has been configured as less important to discard when under pressure. An important observation: the more objects cached, the greater the time to first byte, so if latency is more important to you than origin web server offload, you should take note of that. Look carefully at the traffic. Any content produced by programs or scripts, or that requires database access, may not be useful to cache. If it is useful, a select subset of very-low-recency, high-hit-count, highly ephemeral objects should be marked as memory only.

A very big thank you to my coworkers Eswar Bala, Sergio Ligregni, Matt Miller and John Stevens for contributing to this article.
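And, as promised, a small helper in the spirit of step 4. It is a sketch that assumes the CSV column names shown in the awk field list above (name, vs_name, cache_size, cache_evictions) and treats the placeholder profile and virtual server names the same way the shell example does; verify the column names and output format on your own system before relying on it.

# Collect cache_size and cache_evictions from the tmctl CSV output shown above
# and append them to a log so the values can be trended over time. Run it on
# the BIG-IP itself, e.g. periodically from cron. Assumptions: the columns are
# named as in the awk field list above, and PROFILE/VIRTUAL are placeholders
# to replace, just as in the shell example.
import csv
import datetime
import subprocess

PROFILE = "WEB_ACCEL_PROFILE_NAME"      # placeholder: your web-acceleration profile name
VIRTUAL = "VIRTUAL_SERVER_NAME"         # placeholder: your virtual server name

def sample():
    out = subprocess.run(
        ["tmctl", "-c", "profile_webacceleration_jail_stat"],
        capture_output=True, text=True, check=True).stdout
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    for row in csv.DictReader(out.splitlines()):
        if PROFILE in row.get("name", "") and VIRTUAL in row.get("vs_name", ""):
            yield stamp, row["vs_name"], row["cache_size"], row["cache_evictions"]

if __name__ == "__main__":
    with open("cache_trend.csv", "a", newline="") as log:
        writer = csv.writer(log)
        for when, vs, size, evictions in sample():
            writer.writerow([when, vs, size, evictions])
            # Frequent evictions with a full cache suggest raising cache-size;
            # a half-empty cache with zero evictions suggests shrinking it.
            print(when, vs, "cache_size:", size, "cache_evictions:", evictions)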