What is HTTP Part VIII - Compression and Caching
In the last article of this What is HTTP? series we covered the nuances of OneConnect on HTTP traffic through the BIG-IP. In this article, we’ll cover caching and compression. We’ll deal with compression first, and then move on to caching. Compression In the very early days of the internet, much of the content was text based. This meant that the majority of resources were very small in nature. As popularity grew, the desire for more rich content filled with images grew as well, and resource sized began to explode. What had not yet exploded yet, however, was the bandwidth available to handle all that rich content (and you could argue that’s still the case in mobile and remote terrestrial networks as well.) This intersection of more resources without more bandwidth led to HTTP development in a few different areas: Methods for getting or sending partial resources Methods for identifying if resources needed to be retrieved at all Methods for reducing resources during transit that could be successfully reproduced after receipt The various rangeheaders were developed to handle the first case, caching, which we will discuss later in this article, was developed to handle the second case, and compression was developed to handle the third case. The basic definition of data compression is simply reducing the bits necessary to accurately represent the resource. This is done not only to save network bandwidth, but also on storage devices to save space. And of course money in both areas as well. In HTTP/1.0, end-to-end compression was possible, but not hop-by-hop as it does not have a distinguishing mechanism between the two. That is addressed in HTTP/1.1, so intermediaries can use complex algorithms unknown to the server or client to compress data between them and translate accordingly when speaking to the clients and servers respectively. In 11.x forward, compression is managed in its own profile. Prior to 11.x, it was included in the http profile. The httpcompression profile overview on AskF5 is very thorough, so I won’t repeat that information here, but you will want to pay attention to the compression level if you are using gzip (default.) The default of level 1 is fast from the perspective of the act of compressing on BIG-IP, but having done minimal compressing, reaps the least amount of benefit on the wire. If a particular application has great need for less bandwidth utilization toward the clientside of the network footprint, bumping up to level 6 will increase the reduction in bandwidth without overly taxing the BIG-IP to perform the operation. Also, it’s best to avoid compressing data that has already been compressed, like images and pdfs. Compressing them actually makes the resource larger, and wastes BIG-IP resources doing it! SVG format would be an exception to that rule. Also, don’t compress small files. The profile default is 1M for minimum content length. For BIG-IP hardware platforms, compression can be performed in hardware to offload that function. There is a database variable that you can configure to select the data compression strategy via sys modify db compression.strategy . The default value is latency, but there are four other strategies you can employ as covered in the manual. Caching Web caching could (and probably should) be its own multi-part series. The complexities are numerous, and the details plentiful. We did a series called Project Acceleration that covered some of the TCP optimization and compression topics, as well as the larger product we used to call Web Accelerator but is now the Application Acceleration Manager or AAM. AAM is caching and application optimization on steroids and we are not going to dive that deep here. We are going to focus specifically on HTTP caching and how the default functionality of the ramcache works on the BIG-IP. Consider the situation where there is no caching: In this scenario, every request from the browser communicates with the web server, no matter how infrequently the content changes. This is a wasteful use of resources on the server, the network, and even the client itself. The most important resource to our short attention span end users is time! The more objects and distance from the server, the longer the end user waits for that page to render. One way to help is to allow local caching in the browser: This way, the first request will hit the web server, and repeat requests for that same resource will be pulled from the cache (assuming the resource is still valid, more on that below.) Finally, there is the intermediary cache. This can live immediately in front of the end users like in an enterprise LAN, in a content distribution network, immediately in front of the servers in a datacenter, or all of the above! In this case, the browser1 client requests an object not yet in the cache serving all the browser clients shown. Once the cache has the object from the server, it will serve it to all the browser clients, which offloads the requests to server, saves the time in doing so, and brings the response closer to the browser clients as well. Given the benefits of a caching solution, let’s talk briefly of the risks. If you take the control of what’s served away from the server and put it in the hands of an intermediary, especially an intermediary the administrators of the origin server might not have authority over, how do you control then what content the browsers ultimately are loading? That’s where the HTTP standards on caching control come into play. HTTP/1.0 introduced the Pragma, If-Modified-Since, Last-Modified, and Expires headers for cache control. The Cache-Control and ETag headers along with a slew of “If-“ conditional headers were introduced in HTTP/1.1, but you will see many of the HTTP/1.0 cache headers in responses alongside the HTTP/1.1 headers for backwards compatibility. Rather than try to cover the breadth of caching here, I’ll leave it to the reader to dig into the quite good resources linked at the bottom (start with "Things Caches Do") for detailed understanding. However, there's a lot to glean from your browser developer tools and tools like Fiddler and HttpWatch. Consider this request from my browser for the globe-sm.svg file on f5.com. Near the bottom of the image, I’ve highlighted the request Cache-Control header, which has a value of no-cache. This isn’t a very intuitive name, but what the client is directing the cache is that it must submit the request to the origin server every time, even if the content is fresh. This assures authentication is respected while still allowing for the cache to be utilized for content delivery. In the response, the Cache-Control header has two values: public and max-age. The max-age here is quite large, so this is obviously an asset that is not expected to change much. The public directive means the resource can be stored in a shared cache. Now that we have a basic idea what caching is, how does the BIG-IP handle it? The basic caching available in LTM is handled in the same profile that AAM uses, but there are some features missing when AAM is not provisioned. It used to be called ramcache, but now is the webacceleration profile. Solution K14903 provides the overview of the webacceleration profile but we’ll discuss the cache size briefly. Unlike the Web Accelerator, there is no disk associated with the ramcache. As the name implies, this is “hot” cache in memory. So if you are memory limited on your BIG-IP, 100MB might be a little too large to keep locally. Managing the items in cache can be done via the tmsh command line with the ltm profile ramcache command. tmsh show/delete operations can be used against this method. An example show on my local test box: root@(ltm3)(cfg-sync Standalone)(Active)(/Common)(tmos)# show ltm profile ramcache webacceleration Ltm::Ramcaches /Common/webacceleration Host: 192.168.102.62 URI : / -------------------------------------- Source Slot/TMM 1/1 Owner Slot/TMM 1/1 Rank 1 Size (bytes) 3545 Hits 5 Received 2017-11-30 22:16:47 Last Sent 2017-11-30 22:56:33 Expires 2017-11-30 23:16:47 Vary Type encoding Vary Count 1 Vary User Agent none Vary Encoding gzip,deflate Again, if you have AAM licensed, you can provision it and then additional fields will be shown in the webacceleration profile above to allow for an acceleration policy to be applied against your virtual server. Resources RFC 2616 - The standard fine print. Things Caches Do- Excellent napkin diagrams that provide simple explanations of caching operations. Caching Tutorial - Comprehensive walk through of caching. HTTP Caching - Brief but informative look at caching from a webdev perspective. HTTP Caching - Google develops page with examples, flowcharts, and advice on caching strategies. Project Acceleration - Our 10 part series on web acceleration technology available on the BIG-IP platform in LTM and/or AAM modules. Solution K5157 - BIG-IP caching and the Vary header Make Your Cache Work For You - Article by Dawn Parzych here on DevCentral on tuning techniques2.7KViews1like0CommentsMake Your Cache Work For You
One of the questions we frequently get from the field and customers is how to appropriately tune the profile for caching. There are lots of settings in the profile and a mis-configuration can actually cause some pretty adverse effects, so getting the settings tuned properly is highly recommended. Of course the answer to this question is my go-to response ‘It depends.’ I am sure many people have gotten tired of always hearing the same answer for every question, but there is no one size fits all answer to this question. The natural follow on question is “What does it depend on?” Here I can help you with more details. First are you trying to tune caching for RAM cache (AKA Fast cache) or are you trying to tune for Application Acceleration Manager (AAM)? The settings in the profile will perform differently for each of the caches. How do you determine which objects are cacheable and for how long? RAM Cache as the name implies is based entirely on RAM memory and is available with every BIG-IP LTM. AAM’s cache on the other hand uses both RAM memory and disk for storing objects. How the two determine which objects to cache and for how long differs. AAM decides if an object is cacheable based on the policy associated with the application assigned to the profile. Filters are then applied based on object size, “Responses Cached” and Profile settings. How long an object is cached for is then determined by the lifetime settings within the policy. RAM Cache determines if an object is cacheable and for how long based on the configuration within the profile. The settings are the same for all object types there is no per-object setting as exists with AAM. This profile can control both AAM and RAM Cache, although the settings mean different things depending on which you are configuring for. The table below outlines the differences Table 1 highlights the differences between how decisions on caching are made. Setting RAM Cache AAM Cache Cache Size Maximum amount of space that can be used per profile. No borrowing occurs. Minimum amount of space that is dedicated to the profile, borrowing will occur if resources are available. Max Entries Maximum number of objects that can be stored Number of references that are stored for objects in the resource and entity cache. A reference to an object can be evicted from the resource cache but the item still exists in cache and can be served. Responses served from cache may be slightly delayed in these circumstances, but requests will not be proxied to the origin web servers. How long objects are cached for Fixed for all objects based on the max-age setting in the acceleration profile Configurable on a per object or object type basis in the acceleration policy Determination if an object is cacheable Based on configuration in the acceleration profile Based on the acceleration policy responses cached and proxy settings along with the object size setting from the acceleration profile. How much space can be used for caching? The maximum amount of space available for caching is half of the RAM a TMM process has been allocated. Depending on which platform you are using will impact how much space is available for caching. RAM is used for smaller objects and disk is used for larger objects. The maximum amount of space, both memory and disk) that is available for caching with AAM is up to 256 GB per profile, if resources are available. This does NOT mean you should set the size on all profiles to 256 GB. AAM will borrow if space is available. The trick is figuring out what the initial value should be. The following provides some guidelines on how to calculate this initial value. Calculating the ideal cache size The initial set of variables to care about regarding the cache size: OBJECT_SIZE and lifetime settings. Of course, the values of these variables are going to depend (there’s that pesky word again) on the application, the application content, the traffic patterns, etc. The more unique cacheable objects the application may require a larger cache to run faster, however the frequency of access for those objects, if it's low, may make a large cache to be a waste of space since the objects expire in the cache before the next request needs them based on the lifetime, plus cache latency introduced by the high number of records. See it depends. When the cache is full, AAM will evict the entry that is deemed less important, in order to make room for a new one, resulting in cache misses if the number of popular entities is higher than what the cache can accommodate. Lifetime settings have meaning here again, since it could be the case where having a high age value forces the cache to keep on rotating (evicting) still valid content. The main goal should be to minimize evictions and maximize the load savings on the origin web servers. Other "external factors", that dictate amount of memory/disk space available for caching in AAM are: · Hardware specs. · Number of applications running on that device. · Other modules running in the BIG-IP. As I said in the beginning and you can now see this depends on a number of variables, there's no hard answer that applies to all scenarios. Knowing the specifics of the application makes setting the values easier, however if you don’t know the specifics here are some general guidelines on setting the values-: · Min/Max Object Size: Knowing the distribution of object sizes can help determine what these values could be. If your site is made up of mostly GIFs setting a minimum object size of 10Kb could result in the majority of the objects not being cached. Similarly if your objects are mostly flash files and the maximum object size was set to 100 Kb not many items would be cached. Minimum values of 2-4Kb and maximum values of 1MB are good starting points for these settings · Aging/Lifetime settings: How long should content be cached for is often times a business decision. AAM uses default lifetimes of 4 hours for static content such as images and includes. This means an object will not be revalidated for 4 hours, in most instances this is good. Altering this would determine on how often objects are updated and how long it is safe to serve stale content. In most businesses it is rare for an object to be edited frequently. Yes, new objects and content will be added but the same exact file will likely not change. Take a social site like LinkedIn for example – people are constantly changing their profiles, posting articles, and adding content, but much of the content such as icons and JS files stays the same. The last modified dates of content on my LinkedIn home page range from November 2012 – today. With only a few objects from today. Having a cache serve the objects for 4 hours is relatively safe. · Cache size: The cache-size value for the LTM web-acceleration profile should be set to a "trivial" value based on the content type. A good starting point could be the default value of 100MB, however if your site serves a lot of heavy images maybe a larger than default value should be used. Remember AAM will borrow space if needed so there is no need to set this to 100 GB. A value between 100-500MB is likely a good starting point. The trick here is making sure the space isn’t over or under utilized (more on this below). · Number of entries: This should not be set to the total number of objects on the application but rather calculated based on the size of the cache above in either of the following ways: 1) If all content is of primarily a single object type such as images, you can calculate based on the average object size. According to HttpArchive the average image size is 19KB. If you set the cache size to be 100 MB then the max entries could be calculated using the following formula: Cache size / average object size = Max entries 102400/19 = 5389 I would suggest rounding up to pad slightly to a value of 6000. 2) Now not all caches will cache the same exact type of object there will be objects of varying sizes and content types so an alternative way of calculating the max entries # of HTML pages * average # of objects per page = Max entries HttpArchive reports that the average number of objects on a page is 95 and the average number of requests across a single domain is 51. Why the discrepancy and which number to use? With domain sharding and third party content the requests will not all come from a single FQDN. For the purpose of this calculation we are concerned with the objects that are being served from the origin servers no the third party content so I will choose the lower of the numbers. Sadly there is not a metric for the average number of pages if you have access to that number use it otherwise you will have to guess. For the purpose of this example I am going with a nice round number of 300 pages. 300 * 51 = 15300 That’s a lot of objects and honestly is probably too high but we’re not done calculating yet. We assumed that every page will be downloading 51 unique objects from cache, this is not the case. There are likely common items on the pages js, css, images which will be getting served from the browser’s cache and some pages which are only accessed once in a blue moon, it would be safe to estimate that 50-75% of the objects will be getting served from caching resulting in a total of 7650-11475. A number within this range would be a good starting point. There is a bit of trial and error that goes into configuring the settings. With the above guidance and the process below it becomes a bit easier to narrow in on the best settings. 1.- Set the cache values to a seed value as described above and evaluate. 2.- Let the Application receive the traffic it is expected to receive normally. 3.- Monitor the cache stats: Via TMSH on box $ tmsh show ltm profile web-acceleration Or the TMCTL version which provides the output in csv for scripting analysis & parsing $ tmctl profile_webacceleration_jail_stat For example: tmctl -c profile_webacceleration_jail_stat | grep | grep And look for cache_size and cache_evictions. You can run the following (just put the appropriate WEB_ACCEL_PROFILE_NAME and VIRTUAL_SERVER_NAME) to get the simplified table: % cut_fields=`tmctl -c profile_webacceleration_jail_stat | head -1 | awk 'BEGIN{FS=","; fields="name,vs_name,cache_size,cache_evictions"; split(fields,sfx,","); for (x in sfx) sf[sfx[x]] = sfx[x]; cut_fields=""} { for (i=1; i<=NF; ++i) { if ($i in sf ) cut_fields=cut_fields i"," } } END{ print cut_fields }'`; echo ; echo 'Stats table:' ; tmctl -c profile_webacceleration_jail_stat | head -1 | cut -d ',' -f $cut_fields ; tmctl -c profile_webacceleration_jail_stat | grep WEB_ACCEL_PROFILE_NAME | grep VIRTUAL_SERVER_NAME | cut -d ',' -f $cut_fields; echo Like: % cut_fields=`tmctl -c profile_webacceleration_jail_stat | head -1 | awk 'BEGIN{FS=","; fields="name,vs_name,cache_size,cache_evictions"; split(fields,sfx,","); for (x in sfx) sf[sfx[x]] = sfx[x]; cut_fields=""} { for (i=1; i<=NF; ++i) { if ($i in sf ) cut_fields=cut_fields i"," } } END{ print cut_fields }'`; echo ; echo 'Stats table:' ; tmctl -c profile_webacceleration_jail_stat | head -1 | cut -d ',' -f $cut_fields ; tmctl -c profile_webacceleration_jail_stat | grep webacceleration | grep _listener | cut -d ',' -f $cut_fields; echo This command will output the cache size at that moment, and the cache evictions (the number of objects that were pushed out of the cache to make room for new objects). In the example below the cache is empty and as a result there are no evictions. 4.- Given that applications and traffic patterns are fluid and constantly changing it is recommended to periodically monitor the cache size and store the data in a table to view trends over time. If the maximum cache size is reached frequently or there is a high number of cache evictions then adjusting the cache size would be recommended. On the other hand, if you are barely reaching half the value for the cache size and there are no evictions, consider reducing the setting for a more efficient use of resources. Maximizing the cache hits, highly depends on the traffic pattern. A pattern that is conducive to caching depends on having a subset of documents out of the entire document space that are highly popular, and a long tail of less popular documents. Ideally we have enough space to fit all the highly popular documents. If not, then whatever can fit in becomes the cacheable popular content and we have to live with it. As cache pressure rears its head, we throw out a document based on a calculated weight that is derived from some of the parameters AAM to pick a document that has been configured as less important to throw out when under pressure. An important observation here, note that the more objects cached, the greater the time to first byte, so if latency is mentioned as something more important than OWS off-load, you should take note of that. Look carefully at the traffic. Any content produced by programs or scripts, or that require database accesses may not be useful to cache. If it is useful, a select sub-set of very low recency, high hit count, highly ephemeral objects should be marked as memory only. A very big thank you to my following coworkers Eswar Bala, Sergio Ligregni, Matt Miller and John Stevens for contributing to this article.1.1KViews0likes2CommentsDNS Caching
I've been writing some DNS articles over the past couple of months, and I wanted to keep the momentum going with a discussion on DNS Caching. As a reminder, my first five articles are: Let's Talk DNS on DevCentral DNS The F5 Way: A Paradigm Shift DNS Express and Zone Transfers The BIG-IP GTM: Configuring DNSSEC DNS on the BIG-IP: IPv6 to IPv4 Translation We all know that caching improves response time and allows for a better user experience, and the good news is that the BIG-IP is the best in the business when it comes to caching DNS requests. When a user requests access to a website, it's obviously faster if the DNS response comes directly from the cache on a nearby machine rather than waiting for a recursive lookup process that spans multiple servers. The BIG-IP is specifically designed to quickly and efficiently respond to DNS requests directly from cache. This cuts down on administrative overhead and provides a better and faster experience for your users. There are three different types of DNS caches on the BIG-IP: Transparent, Resolver, and Validating Resolver. In order to create a new cache, navigate to DNS >> Caches >> Cache List and create a new cache (I'm using v11.5). Let's check out the details of each one! Transparent Resolver When the BIG-IP is configured with a transparent cache, it uses external DNS resolvers to resolve DNS queries and then cache the responses from the resolvers. Then, the next time the BIG-IP receives a query for a response that exists in cache, it immediately returns the response to the user. Transparent cache responses contain messages and resource records. The diagram below shows a transparent cache scenario where the BIG-IP does not yet have the response to a DNS query in its cache. In this example, the user sends a DNS query, but because the BIG-IP does not have a response cached, it transparently forwards the query to the appropriate external DNS resolver. When the BIG-IP receives the response from the resolver, it caches the response for future queries. A big benefit of transparent caching is that it consolidates content that would otherwise be cached across many different external resolvers. This consolidated cache approach produces a much higher cache hit percentage for your users. The following screenshot shows the configuration options for setting up a transparent cache. Notice that when you select the "Transparent" Resolver Type, you simply configure a few DNS Cache settings and you're done! The Message Cache Size (listed in bytes) is the maximum size of the message cache, and the Resource Record Cache Size (also listed in bytes) is the maximum size of the resource record cache. Pretty straightforward stuff. The "Answer Default Zones" setting is not enabled by default; meaning, it will pass along DNS queries for the default zones. When enabled, the BIG-IP will answer DNS queries for the following default zones: localhost, reverse 127.0.0.1 and ::1, and AS112. The "Add DNSSEC OK Bit On Miss" setting is enabled by default., and simply means that the BIG-IP will add the DNSSEC OK bit when it forwards DNS queries. Adding this bit indicates to the server that the resolver is able to accept DNSSEC security resource records. Resolver Whereas the Transparent cache will forward the DNS query to an external resolver, the "Resolver" cache will actually resolve the DNS queries and cache the responses. The Resolver cache contains messages, resource records, and the nameservers that the BIG-IP queries to resolve DNS queries. The screenshot below shows the configuration options for setting up a Resolver cache. When you select the Resolver cache, you will need to select a Route Domain Name from the dropdown list. This specifies the route domain that the resolver uses for outbound traffic. The Message Cache Size and Resource Record Cache Size are the same settings as in Transparent cache. The Name Server Cache Count (listed in entries) is the maximum number of DNS nameservers on which the BIG-IP will cache connection and capability data. The Answer Default Zones is the same setting as described above for the Transparent cache. As expected, the Resolver cache has several DNS Resolver settings. The "Use IPv4, IPv6, UDP, and TCP" checkboxes are fairly straightforward. They are all enabled by default, and they simply specify whether the BIG-IP will answer and issue queries in those specific formats. The "Max Concurrent UDP and TCP Flows" specifies the maximum number of concurrent sessions the BIG-IP supports. The "Max Concurrent Queries" is similar in that it specifies the maximum number of concurrent queries used by the resolver. The "Unsolicited Reply Threshold" specifies the number of replies to DNS queries the BIG-IP will support before generating an SNMP trap and log message as an alert to a potential attack. DNS cache poisoning and other Denial of Service attacks will sometimes use unsolicited replies as part of their attack vectors. The "Allowed Query Time" is listed in milliseconds and specifies the time the BIG-IP will allow a query to remain in the queue before replacing it with a new query when the number of concurrent distinct queries exceeds the limit listed in the "Max Concurrent Queries" setting (discussed above). The "Randomize Query Character Case" is enabled by default and will force the BIG-IP to randomize the character case (upper/lower) in domain name queries issued to the root DNS servers. Finally, the "Root Hints" setting specifies the host information needed to resolve names outside the authoritative DNS domains. Simply input IP addresses of the root DNS servers and hit the "add" button. Validating Resolver The Validating Resolver cache takes things to the next level by recursively querying public DNS servers and validating the identity of the responding server before caching the response. The Validating Resolver uses DNSSEC to validate the responses (which mitigates DNS attacks like cache poisoning). For more on DNSSEC, you can check out my previous article. The Validating Resolver cache contains messages, resource records, the nameservers that the BIG-IP queries to resolve DNS queries, and DNSSEC keys. When an authoritative server signs a DNS response, the Validating Resolver will verify the data prior to storing it in cache. The Validating Resolver cache also includes a built-in filter and detection mechanism that rejects unsolicited DNS responses. The screenshot below shows the configuration options for setting up a Validating Resolver cache. I won't go into detail on all the settings for this page because most of them are identical to the Validating Resolver cache. However, as you would expect, there are a few extra options in the Validating Resolver cache. The first is found in the DNS Cache settings where you will find the "DNSSEC Key Cache Size". This setting specifies the maximum size (in bytes) of the DNSKEY cache. The DNS Resolver settings are identical to the Resolver cache. The only other difference is found in the DNSSEC Validator settings. The "Prefetch Key" is enabled by default and it specifies that the BIG-IP will fetch the DNSKEY early in the validation process. You can disable this setting if you want to reduce the amount of resolver traffic, but also understand that, if disabled, a client might have to wait for the validating resolver to perform a key lookup (which will take some time). The "Ignore Checking Disabled Bit" is disabled by default. If you enable this setting, the BIG-IP will ignore the Checking Disabled setting on client queries and will perform response validation and only return secure answers. Keep in mind that caching is a great tool to use, but it's also good to know how much space you are willing to allocate for caching. If you allocate too much, you might serve up outdated responses, but if you allocate too little, you will force your users to wait while DNS recursively asks a bunch of servers for information you could have been holding all along. In the end, it's a reminder that you should know how often your application data changes, and you should configure these caching values accordingly. Well, that does it for this edition of DNS caching...stay tuned for more DNS goodness in future articles!2.2KViews1like3CommentsCaching FAQs
One of the most mysterious parts of the BIG-IP Application Acceleration Manager (AAM) is caching. Rarely is it explained, and there are very few documents that describe why you would or would not use one of the BIG-IP's caching facilities. Even harder to find is some kind of description of what numbers you should use, or whether or not to push some specific caching button when trying configure your AAM policies or applications. So here's an overview of a select few bits of frequently asked AAM caching questions, and some explanation of why you would or would not do something with those pretty buttons and number fields. To be clear, AAM does not use fast Cache, it has two entirely separate and distinct caching systems of its own: Metastor and the Small Object Cache. In this posting, however, we'll be talking about them, mostly, as if they are one in the same. The 4 most commonly asked questions we get regarding caching are as follows: · Why is there an option to turn off cache on first hit, and why would I ever enable this? · What does Queue Parallel Requests do? · Why would I ever set the maximum object size to anything less than infinity? · OK, a maximum object size makes sense, but what about the minimum object size? Each question is addressed using an analogy of putting marbles into a mason jar. We are, of course, talking about web objects and bytes of data, not marbles and weight. 1) "Why is there an option to turn off cache on first hit, and why would I ever do so?" OK, well, let's start with a simple mental model of a cache. Imagine your website as just a bunch of marbles. To keep it simple, all your marbles are the same size. Now think of a cache as being like a Mason jar. Imagine if the Mason jar is just big enough to hold exactly one marble. You can think of the BIG-IP as a super-fast copying machine that can copy marbles, and store one copy of one marble. Finally, imagine a single user sending requests for marbles to your website through the BIG-IP, where every policy node has "Cache marbles on first hit" turned on, and every marble is cacheable, and cached if requested. Pretty simple, right? If you have "Cache marble on first hit" turned on, then the very first request your user makes for a marble will cause the BIG-IP to turn around, get that marble from the website, copy it, put that copy into the Mason jar, and then hand the original marble to your user. At this point, the Mason jar is full. If the next request your user makes is for a different marble, then the first marble must be removed from the jar in order to make room for the one just requested. Sadly, the effort and time it took to copy and put the first marble into the Mason jar was entirely wasted, and the user got both of his marbles later, and slower than he would have if the BIG-IP had simply taken them from the website, and handed them to your user. If the third request the customer makes is for the first marble, then again the Mason jar has to be emptied and the first marble cached (remember only a single marble can be cached at any time). The BIG-IP is churning away, copying then putting a marble into the Mason jar, then emptying out the Mason jar, but never actually getting any value out of having that Mason jar. If the user keeps switching back and forth between requesting the first marble and the second marble, the jar will never have the marble being requested, and the load on the back end servers has not been reduced. This is considered a zero cache scenario where the benefits of the cache are moot. But imagine if "Cache marble on first hit" is turned off. Now the same marble has to be requested twice before the BIG-IP will copy it and put the copy in the Mason jar. So now, with the first request the BIG-IP does nothing but pass it along. However, the BIG-IP remembers that the blue marble was requested once. The second request also does nothing but pass the marble along, but again, the BIG-IP remembers that, say, a red marble was requested once. At this point, if the user goes back and asks for the blue marble again, it has been requested twice, so it will be copied and stored in the Mason jar. If the user then asks for a green marble, the BIG-IP remembers that the request was made, but does not discard the marble in the jar, as this is the first request. If the user requests the blue marble again, then the user will get a copy of that from the Mason jar, not from your website. You now have an effective cache where 1 in 5 requests have been offloaded from the origin server. In summary, turn off "Cache object on first hit" for policy nodes where the objects either change very quickly, or where the time between requests is relatively long. This will prevent the cache from discarding an object that your users will hopefully be requesting more often, and more frequently. Obviously, the flip side of that coin is that the BIG-IP will have to get the same object from your website twice, so if you are sure that the objects matched by a particular policy node are really popular, and that they will be requested quite frequently, (such as the company logo and navigation buttons) then copy 'em and dump them in the cache the first time they are requested. 2) What is "Queue Parallel Requests" and why would I turn it on? Queuing parallel requests is interesting, as it interacts with caching, but it really only helps when you have a lot of users trying to get the same marble at the same time, and that marble is being cached for the first time. A cache is kind of stupid, and it doesn't remember the marbles it threw away. As a result, any marble being put into it looks like it is being stored "for the first time", even when it is actually being put into the jar for the hundredth time. "Queue Parallel Requests" basically makes all the users who are requesting the same marble wait for it to be fetched off of your website, and then copied once for each user by the BIG-IP. That doesn't sound too interesting or useful until you realize that if you don't turn this on, then between the time you start the process of requesting that marble from your website and finish putting it into the jar, every other request for that same marble will have to be forwarded to your website. Image a scenario where a server takes 2 ms to respond to a request for an object. Every ms 2 new users request the object. In the time it has taken the server to respond to the first request 3 additional requests would have been sent for the server to process. This has created unnecessary demand on the servers. With queuing turned on all subsequent requests for the object will be placed into a parking area to wait for the original response to be returned and cached. Four requests doesn’t sound like it will cause a server to be overloaded, but what if it isn’t 4 but 400 requests. Suddenly, queuing sounds like a better idea, right? It is, but like any other feature, it is not a panacea. Turn it on for new, shareable, highly popular objects that remain the same for a relatively long time. More to the point, however, if the web server that is giving one marble to the BIG-IP to copy and give to a bunch of users hiccups (say, you decide to take down one of the web servers in your pool, or as luck would have it, one of them fails in the middle of hand over that marble), all of those users will get part of a marble, and that is all. You are trading less pool traffic for what our engineers like to call a "single point of failure" risk. But if you have a really rare and valuable marble that everyone wants a copy of, all at the same time, and your website pool is pretty stable and handing out marbles pretty efficiently, then request queuing will really reduce the traffic on your web servers! 3) There is an option to set the minimum and maximum cacheable object size. Why would I ever set the maximum object size to anything less than infinity? Yeah, that's a tough one. First, go read the answer to "Why turn off Cache content on first hit". Then, let's imagine a Mason jar where instead of one marble, we have a jar big enough to store one thousand marbles. In this scenario, however, we are going to assume exactly 16 simultaneous users, and also that the marbles they are requesting are in the jar. Obviously, the web servers in your pool are getting zero requests. Cool, right!? When caching is working, it can be really handy! But now let us change one assumption: let's allow your web site objects to vary in size. We still have 16 users, but there is one marble that is twice the diameter as the marbles in our first example. When this marble is cached it reduces the total number of marbles that can be cached. Only 13 of the original 16 requests can be served from the jar, the other 3 requests have to go to the server pool. If every marble in the cache is twice the diameter of the marbles in our first example, twelve of the 16 requests being made have to go to your pool. At the extreme, if one object completely fills the Mason jar, that marble (well, bowling ball, really!) is the only object that can be served from cache; the other 15 requests have to go to your pool. So you limit the maximum size of the marbles that can be stored in your Mason jar to configure the BIG-IP to serve the average number of simultaneous users you expect, and wish to serve. By the emergent properties of the system, it turns out that large objects are often times not that popular, anyway. Unless you are running a web server whose job it is to serve large patch files to end users, that is. 4) OK, a maximum object size makes sense. So why have a minimum object size? OK, now we have to get explicit about the jar, and about knowing what has been requested, copied, and stored in the jar. Assume that we have a peg board that has exactly one thousand holes in it. Each time we dump a marble in the jar, we write out a tag that describes the marble, tie it to a peg, then put that peg into the peg board. When we remove a marble from the jar, we remove its associated peg from the board. When the peg board is full, we can't store any more marbles in the Mason jar. Now, what if your minimum size is that of a grain of sand, but your mason jar is big enough to fit 100 marbles with a diameter of 2 inches? If what is popular, and requested quite frequently is a bunch of grains of sand, you can end up running out of peg board space long, LONG before you even finish coating the bottom of your Mason jar with sand. Giving your customers copies of those grains of sand will happen often, but will by definition be a smaller percentage of the total volume of traffic than if you made your minimum size larger, AND if you still have enough marbles of that minimum size on your website to fill your cache. Another way of looking at it is in terms of a collection of marbles of all sizes. If a large marble is in cache, and it has to be displaced to make room on the peg board for a tag that records the information for a grain of sand, and then the grain of sand has to be displaced to make room for the large marble, you will have to get both off of your origin servers. If you don't try to cache the sand grain, then when a user asks for the larger marble, the total weight of marbles requested from your server is going to be smaller. Even if that grain of sand has to be served from your server several times in order to keep the larger marble in the jar, that will be a lot less total grams of marbles moved, copied and stored or retrieved from the jar. Obviously, there is a trade off here between the number of requests, versus the total weight of the marbles being requested. Putting it all together Knowing when and what to cache is an important step to ensure that BIG-IP and your application is performing optimally. Setting a parameter with a wrong value can have negative effects causing increased traffic on your origin servers and consuming resources unnecessarily on BIG-IP. Think about what you are trying to achieve, what other optimization features are enabled and the traffic patterns of your site when configuring the cache settings. Thank you to my colleague John Stevens for assistance in writing this article.1.1KViews0likes2CommentsCache Confusion
Caching is a great way to offload servers and networks; reducing the amount of requests coming into and out of a network. The concept behind the need for caching is simple; store the content most frequently accessed closest to the end user to improve performance. The notion of caching can be applied in many areas: server side caching, client side caching, edge caching, and transparent caching. It is possible when accessing a web page that a user will encounter each of these caches. Server-side Caching One of the primary purposes of caching within the data center is to reduce the load on the web servers and build a more scalable infrastructure. Web servers can quickly become overwhelmed with requests to serve static content, it is best to let the servers handle the high value transactions than being overwhelmed with serving static content. Caching content closer to the edge of the data center improves the application response time and reduces server load. Edge Caching A logical extension of server side caching is edge caching. If the application performs better when content is at the edge of the data center; it would perform even better from the edge of the network. Content Delivery Networks (CDNs) are the most common edge caching solution. Transparent Caching Transparent caches are deployed at ISPs to cache frequently accessed items at the edge of the operator’s network. Deploying a transparent cache provides savings for the operator on the backhaul and improves delivery to the end user. End users and content owners are not aware that content was delivered from the operator’s network hence the classification as a transparent cache. Forward Proxy Cache Forward proxy caching addresses outbound HTTP versus inbound HTTP requests to any internet facing property. They are typically deployed to reduce upstream bandwidth utilization and costs. Client-side Caching Once a user has retrieved a web page resources can be cached on the local device to improve performance on subsequent visits. Browsers determine what content to cache based on a variety of response headers including: Cache-Control, ETag, and Last-Modified. Caching content on the client side provides the greatest benefit for repeat visits as the request never leaves the end-users device. The below diagram illustrates how an end user may encounter each of these caches when requesting a web page. When the request hits a cache along the chain a cache hit will result in that cache serving the object a cache miss will result in the request moving along the chain until the next cache is encountered. Future blog posts in this series will offer an in-depth look at server side and client side caching.309Views0likes0CommentsCloud: Impact of DNS on Performance
#webperf #devops Developers and operations must work together to mitigate the impact of hybrid architectures on application performance One of the ramifications of relying on off-premise cloud infrastructure is that you're necessarily stuck with some of the idiosyncrasies that come with it. For example, it's not your network, and thus topologically-related identifiers such as host names and IP address are not within your purview. But you certainly aren't going to ask your customers to visit "host111-east-virginia-zone3-subnet5.cloudproivder.com". At least not if you want them visit, you won't. Luckily, you control your own DNS destiny, so you'll just CNAME that crazy long host name provided by the provider to be something more catchy and inline with your branding, say, "coolappz.com". While certainly more appealing to everyone (easy to remember, fits better on a bumper sticker and on branded swag) it does have a downside: double the latency. You see, CNAME lookups require two distinct DNS queries to resolve - the first retrieves the ultra-ugly-long host name, the second resolves the ultra-ugly-long host name into an IP address that can actually be used by the browser to connect. So that's double the lookup, double the roundtrips, double the latency. Of course, no web page comprises just one host. That would be so 90s and this, this is the 21st century! This is Web 2.0, the age of integration and interconnection and inter-everything. And if the services upon which you rely to build that web app are using CNAMEs, too, well... I hope you like math cause you're going to be added up some roundtrips and latency for a while. The point here is not to scare you off of hybrid architectures due to the potential impact on performance, but rather to remind you to keep the impact in the fore. It is important to remember the impact of topology, proximity, and the technology in general on the overall performance of your web applications. A Google Developers article nails down where DNS latency comes from quite well: There are two components to DNS latency: - Latency between the client (user) and DNS resolving server. In most cases this is largely due to the usual round-trip time (RTT) constraints in networked systems: geographical distance between client and server machines; network congestion; packet loss and long retransmit delays (one second on average); overloaded servers, denial-of-service attacks and so on. - Latency between resolving servers and other nameservers. This source of latency is caused primarily by the following factors: - Cache misses. If a response cannot be served from a resolver's cache, but requires recursively querying other nameservers, the added network latency is considerable, especially if the authoritative servers are geographically remote. - Underprovisioning. If DNS resolvers are overloaded, they must queue DNS resolution requests and responses, and may begin dropping and retransmitting packets. - Malicious traffic. Even if a DNS service is overprovisioned, DoS traffic can place undue load on the servers. Similarly, Kaminsky-style attacks can involve flooding resolvers with queries that are guaranteed to bypass the cache and require outgoing requests for resolution. -- Introduction: causes and mitigations of DNS latency Interestingly, Google is arguing for public DNS services, even though this may in fact contribute to location-induced DNS latency, particularly for custom domains for which the authoritative zone is served by relatively few number of DNS servers, most of which are geographically located far from the majority of users. Intercontinental latency is still very much problematic. Catchpoint, a web performance monitoring service, mentions this in its exhaustive list of the ways in which DNS impacts performance: Exotic Domains: be careful with the exotic domain names, .ly, .tv… these domains have authoritative servers that are often far away from you end user ISPs. The records will have almost always 2 day TTL, however you never know when someone will be impacted because the query has to go to the authoritative servers and they fail. Example “.ly”, 2 authoritative servers are in Libya, 2 in the US, and 1 in the Netherlands. So when we go connecting clouds and data centers, we need to be concerned with where and how domains are being disseminated, sharded, and resolved. We need to more carefully consider how we are referencing content and whether or not the performance boosts we get from some techniques (such as domain sharding) are being offset by the impact of double the latency from the need to resolve those extra hosts. We need to examine that in the context of other contributing factors, such as TTL (time to live). If the time to live is long enough, then perhaps the initial hit from the extra lookup required to resolve a CNAME isn't going to matter over the life of the session. If we're looking at supporting a stateless API in which sessions don't really exist, then the second lookup may indeed be problematic, but only if the calls are generally spread out over a time interval that is greater than the TTL. It's a balancing act, where understanding how application network services contribute to the performance of applications is critical to pushing the right buttons and twisting the right knobs will alleviate performance issues that can damage adoption and growth of the web applications that are key to business. You're Not Off The Hook, Developers So often it's the case that applications are written with a specific behavior in mind and it is left to devops to figure out how to mitigate these kinds of potential performance issues. But it is just as important for developers to understand how the application network services contribute to performance because sometimes, all it takes is for the application to be "tweaked' with respect to an update interval or use of a different host name to generate a significant improvement in performance. It is increasingly difficult for - and sometimes even impossible - for operations to make adjustments in the infrastructure, particularly in hybrid environments where infrastructure services are black-box and off-limits. Thus, it is of growing importance that developers and operations work together to map the interaction of applications with application network services such that each group can make appropriate modifications and configuration changes that serve to improve the overall performance of the application, no matter where it might be deployed. As more and more organizations adopt hybrid, distributed applications that span geographies in addition to environments, this level of cooperation and collaboration will be key to managing web application performance issues.393Views0likes0CommentsiRule: prevent caching on certain file types
A user wanted to offload some cache control manipulation from the webserver configuration onto the BIG-IP via iRules. Since this is just HTTP header settings it should be fairly straightforward. We'll, almost... Trying to write an IRule to make client computers not cache certain file types. We can not have our client computers download any files that start with "slide" or "ps" into their temporary internet files. I have tried the following however the files are still caching into the Temp internet files. when HTTP_REQUEST { if {[HTTP::uri] contains "slide."} { set foundmatch 1 } if {[HTTP::uri] contains "ps."} { set foundmatch 1 } else { set foundmatch 0 } } when HTTP_RESPONSE { if {$foundmatch == 1} { HTTP::header replace Cache-Control no-cache HTTP::header replace Pragma no-cache } } We'll after a bit of packet sniffing it was determined that the browser wasn't honoring the headers since the default response was HTTP/1.0. Fortunately you can easily change this in iRules with the HTTP::version command. Here's the completed iRule. when HTTP_REQUEST { set foundmatch 0 if { ([HTTP::uri] contains ".swf") or ([HTTP::uri] contains "ps")} { set foundmatch 1 } } when HTTP_RESPONSE { if {$foundmatch == 1} { HTTP::version "1.1" HTTP::header replace Pragma no-cache HTTP::header replace Cache-Control no-cache HTTP::header replace Expires -1 HTTP::header remove Connection HTTP::header insert Connection Close } } So, the lesson here is to remember to upgrade the version to 1.1 if you want the browser to honor the no-cache directives B-). Click here to access the original forum thread. -Joe [Listening to: The Boxer - Simon & Garfunkel - Concert in Central Park (06:03)]808Views0likes0Comments