Inside Look - PCoIP Proxy for VMware Horizon View
I sit down with F5 Solution Architect Paul Pindell to get an inside look at BIG-IP's native support for VMware's PCoIP protocol. He reviews the architecture, business value and gives a great demo on how to configure BIG-IP. BIG-IP APM offers full proxy support for PC-over-IP (PCoIP), a leading virtual desktop infrastructure (VDI) protocol. F5 is the first to provide this functionality which allows organizations to simplify their VMware Horizon View architectures. Combining PCoIP proxy with the power of the BIG-IP platform delivers hardened security and increased scalability for end-user computing. In addition to PCoIP, F5 supports a number of other VDI solutions, giving customers flexibility in designing and deploying their network infrastructure. ps Related: F5 Friday: Simple, Scalable and Secure PCoIP for VMware Horizon View Solutions for VMware applications F5's YouTube Channel In 5 Minutes or Less Series (24 videos – over 2 hours of In 5 Fun) Inside Look Series Life@F5 Series Technorati Tags: vdi,PCoIP,VMware,Access,Applications,Infrastructure,Performance,Security,Virtualization,silva,video,inside look,big-ip,apm Connect with Peter: Connect with F5:355Views0likes0CommentsiCall Triggers - Invalidating Cache from iRules
iCall is BIG-IP's all new (as of BIG-IP version 11.4) event-based automation system for the control plane. Previously, I wrote up the iCall system overview, as well as an article on the use of a periodic handler for automating backups. This article will feature the use of the triggered iCall handler to allow a user to submit a http request to invalidate the cache served up for an application managed by the Application Acceleration Manager. Starting at the End Before we get to the solution, I'd like to address the use case for invalidating cache. In many cases, the team responsible for an application's health is not the network services team which is the typical point of access to the BIG-IP. For large organizations with process overhead in generating tickets, invalidating cache can take time. A lot of time. So the request has come in quite frequently..."How can I invalidate cache remotely?" Or even more often, "Can I invalidate cache from an iRule?" Others have approached this via script, and it has been absolutely possible previously with iRules, albeit through very ugly and very-not-recommended ways. In the end, you just need to issue one TMSH command to invalidate the cache for a particular application: tmsh::modify wam application content-expiration-time now So how do we get signal from iRules to instruct BIG-IP to run a TMSH command? This is where iCall trigger handlers come in. Before we hope back to the beginning and discuss the iRule, the process looks like this: Back to the Beginning The iStats interface was introduced in BIG-IP version 11 as a way to make data accessible to both the control and data planes. I'll use this to pass the data to the control plane. In this case, the only data I need to pass is to set a key. To set an iStats key, you need to specify : Class Object Measure type (counter, gauge, or string) Measure name I'm not measuring anything, so I'll use a string starting with "WA policy string" and followed by the name of the policy. You can be explicit or allow the users to pass it in a query parameter as I'm doing in this iRule below: when HTTP_REQUEST { if { [HTTP::path] eq "/invalidate" } { set wa_policy [URI::query [HTTP::uri] policy] if { $wa_policy ne "" } { ISTATS::set "WA policy string $wa_policy" 1 HTTP::respond 200 content "App $wa_policy cache invalidated." } else { HTTP::respond 200 content "Please specify a policy /invalidate?policy=policy_name" } } } Setting the key this way will allow you to create as many triggers as you have policies. I'll leave it as an exercise for the reader to make that step more dynamic. Setting the Trigger With iStats-based triggers, you need linkage to bind the iStats key to an event-name, wacache in my case. You can also set thresholds and durations, but again since I am not measuring anything, that isn't necessary. sys icall istats-trigger wacache_trigger_istats { event-name wacache istats-key "WA policy string wa_policy_name" } Creating the Script The script is very simple. Clear the cache with the TMSH command, then remove the iStats key. sys icall script wacache_script { app-service none definition { tmsh::modify wam application dc.wa_hero content-expiration-time now exec istats remove "WA policy string wa_policy_name" } description none events none } Creating the Handler The handler is the glue that binds the event I created in the iStats trigger. When the handler sees an event named wacache, it'll execute the wacache_script iCall script. sys icall handler triggered wacache_trigger_handler { script wacache_script subscriptions { messages { event-name wacache } } } Notes on Testing Add this command to your arsenal - tmsh generate sys icall event <event-name> context none</event-name> where event-name in my case is wacache. This allows you to troubleshoot the handler and script without worrying about the trigger. And this one - tmsh modify sys db log.evrouted.level value Debug. Just note that the default is Notice when you're all done troubleshooting.1.6KViews0likes6CommentsThe Order of (Network) Operations
Thought those math rules you learned in 6 th grade were useless? Think again…some are more applicable to the architecture of your data center than you might think. Remember back when you were in the 6 th grade, learning about the order of operations in math class? You might recall that you learned that the order in which mathematical operators were applied can have a significant impact on the result. That’s why we learned there’s an order of operations – a set of rules – that we need to follow in order to ensure that we always get the correct answer when performing mathematical equations. Rule 1: First perform any calculations inside parentheses. Rule 2: Next perform all multiplications and divisions, working from left to right. Rule 3: Lastly, perform all additions and subtractions, working from left to right. Similarly, the order in which network and application delivery operations are applied can dramatically impact the performance and efficiency of the delivery of applications – no matter where those applications reside.359Views0likes1CommentHTML5 Web Sockets Changes the Scalability Game
#HTML5 Web Sockets are poised to completely change scalability models … again. Using Web Sockets instead of XMLHTTPRequest and AJAX polling methods will dramatically reduce the number of connections required by servers and thus has a positive impact on performance. But that reliance on a single connection also changes the scalability game, at least in terms of architecture. Here comes the (computer) science… If you aren’t familiar with what is sure to be a disruptive web technology you should be. Web Sockets, while not broadly in use (it is only a specification, and a non-stable one at that) today is getting a lot of attention based on its core precepts and model. Web Sockets Defined in the Communications section of the HTML5 specification, HTML5 Web Sockets represents the next evolution of web communications—a full-duplex, bidirectional communications channel that operates through a single socket over the Web. HTML5 Web Sockets provides a true standard that you can use to build scalable, real-time web applications. In addition, since it provides a socket that is native to the browser, it eliminates many of the problems Comet solutions are prone to. Web Sockets removes the overhead and dramatically reduces complexity. - HTML5 Web Sockets: A Quantum Leap in Scalability for the Web So far, so good. The premise upon which the improvements in scalability coming from Web Sockets are based is the elimination of HTTP headers (reduces bandwidth dramatically) and session management overhead that can be incurred by the closing and opening of TCP connections. There’s only one connection required between the client and server over which much smaller data segments can be sent without necessarily requiring a request and a response pair. That communication pattern is definitely more scalable from a performance perspective, and also has a positive impact of reducing the number of connections per client required on the server. Similar techniques have long been used in application delivery (TCP multiplexing) to achieve the same results – a more scalable application. So far, so good. Where the scalability model ends up having a significant impact on infrastructure and architectures is the longevity of that single connection: Unlike regular HTTP traffic, which uses a request/response protocol, WebSocket connections can remain open for a long time. - How HTML5 Web Sockets Interact With Proxy Servers This single, persistent connection combined with a lot of, shall we say, interesting commentary on the interaction with intermediate proxies such as load balancers. But ignoring that for the nonce, let’s focus on the “remain open for a long time.” A given application instance has a limit on the number of concurrent connections it can theoretically and operationally manage before it reaches the threshold at which performance begins to dramatically degrade. That’s the price paid for TCP session management in general by every device and server that manages TCP-based connections. But Lori, you’re thinking, HTTP 1.1 connections are persistent, too. In fact, you don’t even have to tell an HTTP 1.1 server to keep-alive the connection! This really isn’t a big change. Whoa there hoss, yes it is. While you’d be right in that HTTP connections are also persistent, they generally have very short connection timeout settings. For example, the default connection timeout for Apache 2.0 is 15 seconds and for Apache 2.2 a mere 5 seconds. A well-tuned web server, in fact, will have thresholds that closely match the interaction patterns of the application it is hosting. This is because it’s a recognized truism that long and often idle connections tie up server processes or threads that negatively impact overall capacity and performance. Thus the introduction of connections that remain open for a long time changes the capacity of the server and introduces potential performance issues when that same server is also tasked with managing other short-lived, connection-oriented requests. Why this Changes the Game… One of the most common inhibitors of scale and high-performance for web applications today is the deployment of both near-real-time communication functions (AJAX) and traditional web content functions on the same server. That’s because web servers do not support a per-application HTTP profile. That is to say, the configuration for a web server is global; every communication exchange uses the same configuration values such as connection timeouts. That means configuring the web server for exchanges that would benefit from a longer time out end up with a lot of hanging connections doing absolutely nothing because they were used to grab standard dynamic or static content and then ignored. Conversely, configuring for quick bursts of requests necessarily sets timeout values too low for near or real-time exchanges and can cause performance issues as a client continually opens and re-opens connections. Remember, an idle connection is a drain on resources that directly impacts the performance and capacity of applications. So it’s a Very Bad Thing™. One of the solutions to this somewhat frustrating conundrum, made more feasible by the advent of cloud computing and virtualization, is to deploy specialized servers in a scalability domain-based architecture using infrastructure scalability patterns. Another approach to ensuring scalability is to offload responsibility for performance and connection management to an appropriately capable intermediary. Now, one would hope that a web server implementing support for both HTTP and Web Sockets would support separately configurable values for communication settings on at least the protocol level. Today there are very few web servers that support both HTTP and Web Sockets. It’s a nascent and still evolving standard so many of the servers are “pure” Web Sockets servers, many implemented in familiar scripting languages like PHP and Python. Which means two separate sets of servers that must be managed and scaled. Which should sound a lot like … specialized servers in a scalability domain-based architecture. The more things change, the more they stay the same. The second impact on scalability architectures centers on the premise that Web Sockets keep one connection open over which message bits can be exchanged. This ties up resources, but it also requires that clients maintain a connection to a specific server instance. This means infrastructure (like load balancers and web/application servers) will need to support persistence (not the same as persistent, you can read about the difference here if you’re so inclined). That’s because once connected to a Web Socket service the performance benefits are only realized if you stay connected to that same service. If you don’t and end up opening a second (or Heaven-forbid a third or more) connection, the first connection may remain open until it times out. Given that the premise of the Web Socket is to stay open – even through potentially longer idle intervals – it may remain open, with no client, until the configured time out. That means completely useless resources tied up by … nothing. Persistence-based load balancing is a common feature of next-generation load balancers (application delivery controllers) and even most cloud-based load balancing services. It is also commonly implemented in application server clustering offerings, where you’ll find it called server-affinity. It is worth noting that persistence-based load balancing is not without its own set of gotchas when it comes to performance and capacity. THE ANSWER: ARCHITECTURE The reason that these two ramifications of Web Sockets impacts the scalability game is it requires an broader architectural approach to scalability. It can’t necessarily be achieved simply by duplicating services and distributing the load across them. Persistence requires collaboration with the load distribution mechanism and there are protocol-based security constraints with respect to incorporating even intra-domain content in a single page/application. While these security constraints are addressable through configuration, the same caveats with regards to the lack of granularity in configuration at the infrastructure (web/application server) layer must be made. Careful consideration of what may be accidentally allowed and/or disallowed is necessary to prevent unintended consequences. And that’s not even starting to consider the potential use of Web Sockets as an attack vector, particularly in the realm of DDoS. The long-lived nature of a Web Socket connection is bound to be exploited at some point in the future, which will engender another round of evaluating how to best address application-layer DDoS attacks. A service-focused, distributed (and collaborative) approach to scalability is likely to garner the highest levels of success when employing Web Socket-based functionality within a broader web application, as opposed to the popular cookie-cutter cloning approach made exceedingly easy by virtualization. Infrastructure Scalability Pattern: Partition by Function or Type Infrastructure Scalability Pattern: Sharding Sessions Amazon Makes the Cloud Sticky Load Balancing Fu: Beware the Algorithm and Sticky Sessions Et Tu, Browser? Forget Hyper-Scale. Think Hyper-Local Scale. Infrastructure Scalability Pattern: Sharding Streams Infrastructure Architecture: Whitelisting with JSON and API Keys Does This Application Make My Browser Look Fat? HTTP Now Serving … Everything641Views0likes5CommentsMake Your Cache Work For You
One of the questions we frequently get from the field and customers is how to appropriately tune the profile for caching. There are lots of settings in the profile and a mis-configuration can actually cause some pretty adverse effects, so getting the settings tuned properly is highly recommended. Of course the answer to this question is my go-to response ‘It depends.’ I am sure many people have gotten tired of always hearing the same answer for every question, but there is no one size fits all answer to this question. The natural follow on question is “What does it depend on?” Here I can help you with more details. First are you trying to tune caching for RAM cache (AKA Fast cache) or are you trying to tune for Application Acceleration Manager (AAM)? The settings in the profile will perform differently for each of the caches. How do you determine which objects are cacheable and for how long? RAM Cache as the name implies is based entirely on RAM memory and is available with every BIG-IP LTM. AAM’s cache on the other hand uses both RAM memory and disk for storing objects. How the two determine which objects to cache and for how long differs. AAM decides if an object is cacheable based on the policy associated with the application assigned to the profile. Filters are then applied based on object size, “Responses Cached” and Profile settings. How long an object is cached for is then determined by the lifetime settings within the policy. RAM Cache determines if an object is cacheable and for how long based on the configuration within the profile. The settings are the same for all object types there is no per-object setting as exists with AAM. This profile can control both AAM and RAM Cache, although the settings mean different things depending on which you are configuring for. The table below outlines the differences Table 1 highlights the differences between how decisions on caching are made. Setting RAM Cache AAM Cache Cache Size Maximum amount of space that can be used per profile. No borrowing occurs. Minimum amount of space that is dedicated to the profile, borrowing will occur if resources are available. Max Entries Maximum number of objects that can be stored Number of references that are stored for objects in the resource and entity cache. A reference to an object can be evicted from the resource cache but the item still exists in cache and can be served. Responses served from cache may be slightly delayed in these circumstances, but requests will not be proxied to the origin web servers. How long objects are cached for Fixed for all objects based on the max-age setting in the acceleration profile Configurable on a per object or object type basis in the acceleration policy Determination if an object is cacheable Based on configuration in the acceleration profile Based on the acceleration policy responses cached and proxy settings along with the object size setting from the acceleration profile. How much space can be used for caching? The maximum amount of space available for caching is half of the RAM a TMM process has been allocated. Depending on which platform you are using will impact how much space is available for caching. RAM is used for smaller objects and disk is used for larger objects. The maximum amount of space, both memory and disk) that is available for caching with AAM is up to 256 GB per profile, if resources are available. This does NOT mean you should set the size on all profiles to 256 GB. AAM will borrow if space is available. The trick is figuring out what the initial value should be. The following provides some guidelines on how to calculate this initial value. Calculating the ideal cache size The initial set of variables to care about regarding the cache size: OBJECT_SIZE and lifetime settings. Of course, the values of these variables are going to depend (there’s that pesky word again) on the application, the application content, the traffic patterns, etc. The more unique cacheable objects the application may require a larger cache to run faster, however the frequency of access for those objects, if it's low, may make a large cache to be a waste of space since the objects expire in the cache before the next request needs them based on the lifetime, plus cache latency introduced by the high number of records. See it depends. When the cache is full, AAM will evict the entry that is deemed less important, in order to make room for a new one, resulting in cache misses if the number of popular entities is higher than what the cache can accommodate. Lifetime settings have meaning here again, since it could be the case where having a high age value forces the cache to keep on rotating (evicting) still valid content. The main goal should be to minimize evictions and maximize the load savings on the origin web servers. Other "external factors", that dictate amount of memory/disk space available for caching in AAM are: · Hardware specs. · Number of applications running on that device. · Other modules running in the BIG-IP. As I said in the beginning and you can now see this depends on a number of variables, there's no hard answer that applies to all scenarios. Knowing the specifics of the application makes setting the values easier, however if you don’t know the specifics here are some general guidelines on setting the values-: · Min/Max Object Size: Knowing the distribution of object sizes can help determine what these values could be. If your site is made up of mostly GIFs setting a minimum object size of 10Kb could result in the majority of the objects not being cached. Similarly if your objects are mostly flash files and the maximum object size was set to 100 Kb not many items would be cached. Minimum values of 2-4Kb and maximum values of 1MB are good starting points for these settings · Aging/Lifetime settings: How long should content be cached for is often times a business decision. AAM uses default lifetimes of 4 hours for static content such as images and includes. This means an object will not be revalidated for 4 hours, in most instances this is good. Altering this would determine on how often objects are updated and how long it is safe to serve stale content. In most businesses it is rare for an object to be edited frequently. Yes, new objects and content will be added but the same exact file will likely not change. Take a social site like LinkedIn for example – people are constantly changing their profiles, posting articles, and adding content, but much of the content such as icons and JS files stays the same. The last modified dates of content on my LinkedIn home page range from November 2012 – today. With only a few objects from today. Having a cache serve the objects for 4 hours is relatively safe. · Cache size: The cache-size value for the LTM web-acceleration profile should be set to a "trivial" value based on the content type. A good starting point could be the default value of 100MB, however if your site serves a lot of heavy images maybe a larger than default value should be used. Remember AAM will borrow space if needed so there is no need to set this to 100 GB. A value between 100-500MB is likely a good starting point. The trick here is making sure the space isn’t over or under utilized (more on this below). · Number of entries: This should not be set to the total number of objects on the application but rather calculated based on the size of the cache above in either of the following ways: 1) If all content is of primarily a single object type such as images, you can calculate based on the average object size. According to HttpArchive the average image size is 19KB. If you set the cache size to be 100 MB then the max entries could be calculated using the following formula: Cache size / average object size = Max entries 102400/19 = 5389 I would suggest rounding up to pad slightly to a value of 6000. 2) Now not all caches will cache the same exact type of object there will be objects of varying sizes and content types so an alternative way of calculating the max entries # of HTML pages * average # of objects per page = Max entries HttpArchive reports that the average number of objects on a page is 95 and the average number of requests across a single domain is 51. Why the discrepancy and which number to use? With domain sharding and third party content the requests will not all come from a single FQDN. For the purpose of this calculation we are concerned with the objects that are being served from the origin servers no the third party content so I will choose the lower of the numbers. Sadly there is not a metric for the average number of pages if you have access to that number use it otherwise you will have to guess. For the purpose of this example I am going with a nice round number of 300 pages. 300 * 51 = 15300 That’s a lot of objects and honestly is probably too high but we’re not done calculating yet. We assumed that every page will be downloading 51 unique objects from cache, this is not the case. There are likely common items on the pages js, css, images which will be getting served from the browser’s cache and some pages which are only accessed once in a blue moon, it would be safe to estimate that 50-75% of the objects will be getting served from caching resulting in a total of 7650-11475. A number within this range would be a good starting point. There is a bit of trial and error that goes into configuring the settings. With the above guidance and the process below it becomes a bit easier to narrow in on the best settings. 1.- Set the cache values to a seed value as described above and evaluate. 2.- Let the Application receive the traffic it is expected to receive normally. 3.- Monitor the cache stats: Via TMSH on box $ tmsh show ltm profile web-acceleration Or the TMCTL version which provides the output in csv for scripting analysis & parsing $ tmctl profile_webacceleration_jail_stat For example: tmctl -c profile_webacceleration_jail_stat | grep | grep And look for cache_size and cache_evictions. You can run the following (just put the appropriate WEB_ACCEL_PROFILE_NAME and VIRTUAL_SERVER_NAME) to get the simplified table: % cut_fields=`tmctl -c profile_webacceleration_jail_stat | head -1 | awk 'BEGIN{FS=","; fields="name,vs_name,cache_size,cache_evictions"; split(fields,sfx,","); for (x in sfx) sf[sfx[x]] = sfx[x]; cut_fields=""} { for (i=1; i<=NF; ++i) { if ($i in sf ) cut_fields=cut_fields i"," } } END{ print cut_fields }'`; echo ; echo 'Stats table:' ; tmctl -c profile_webacceleration_jail_stat | head -1 | cut -d ',' -f $cut_fields ; tmctl -c profile_webacceleration_jail_stat | grep WEB_ACCEL_PROFILE_NAME | grep VIRTUAL_SERVER_NAME | cut -d ',' -f $cut_fields; echo Like: % cut_fields=`tmctl -c profile_webacceleration_jail_stat | head -1 | awk 'BEGIN{FS=","; fields="name,vs_name,cache_size,cache_evictions"; split(fields,sfx,","); for (x in sfx) sf[sfx[x]] = sfx[x]; cut_fields=""} { for (i=1; i<=NF; ++i) { if ($i in sf ) cut_fields=cut_fields i"," } } END{ print cut_fields }'`; echo ; echo 'Stats table:' ; tmctl -c profile_webacceleration_jail_stat | head -1 | cut -d ',' -f $cut_fields ; tmctl -c profile_webacceleration_jail_stat | grep webacceleration | grep _listener | cut -d ',' -f $cut_fields; echo This command will output the cache size at that moment, and the cache evictions (the number of objects that were pushed out of the cache to make room for new objects). In the example below the cache is empty and as a result there are no evictions. 4.- Given that applications and traffic patterns are fluid and constantly changing it is recommended to periodically monitor the cache size and store the data in a table to view trends over time. If the maximum cache size is reached frequently or there is a high number of cache evictions then adjusting the cache size would be recommended. On the other hand, if you are barely reaching half the value for the cache size and there are no evictions, consider reducing the setting for a more efficient use of resources. Maximizing the cache hits, highly depends on the traffic pattern. A pattern that is conducive to caching depends on having a subset of documents out of the entire document space that are highly popular, and a long tail of less popular documents. Ideally we have enough space to fit all the highly popular documents. If not, then whatever can fit in becomes the cacheable popular content and we have to live with it. As cache pressure rears its head, we throw out a document based on a calculated weight that is derived from some of the parameters AAM to pick a document that has been configured as less important to throw out when under pressure. An important observation here, note that the more objects cached, the greater the time to first byte, so if latency is mentioned as something more important than OWS off-load, you should take note of that. Look carefully at the traffic. Any content produced by programs or scripts, or that require database accesses may not be useful to cache. If it is useful, a select sub-set of very low recency, high hit count, highly ephemeral objects should be marked as memory only. A very big thank you to my following coworkers Eswar Bala, Sergio Ligregni, Matt Miller and John Stevens for contributing to this article.1.1KViews0likes2CommentsDeduplication and Compression – Exactly the same, but different.
One day many years ago, Lori and I’s oldest son held up two sheets of paper and said “These two things are exactly the same, but different!” Now, he’s a very bright individual, he was just young, and didn’t even get how incongruous the statement was. We, being a fun loving family that likes to tease each other on occasion, we of course have not yet let him live it down. It was honestly more than a decade ago, but all is fair, he doesn’t let Lori live down something funny that she did before he was born. It is all in good fun of course. Why am I bringing up this family story? Because that phrase does come to mind when you start talking about deduplication and compression. Highly complimentary and very similar, they are pretty much “Exactly the same, but different”. Since these technologies are both used pretty heavily in WAN Optimization, and are growing in use on storage products, this topic intrigued me. To get this out of the way, at F5, compression is built into the BIG-IP family as a feature of the core BIG-IP LTM product, and deduplication is an added layer implemented over BIG-IP LTM on BIG-IP WAN Optimization Module (WOM). Other vendors have similar but varied (there goes a variant of that phrase again) implementation details. Before we delve too deeply into this topic though, what caught my attention and started me pondering the whys of this topic was that F5’s deduplication is applied before compression, and it seems that reversing the order changes performance characteristics. I love a good puzzle, and while the fact that one should come before the other was no surprise, I started wanting to know why the order it was, and what the impact of reversing them in processing might be. So I started working to understand the details of implementation for these two technologies. Not understand them from an F5 perspective, though that is certainly where I started, but try to understand how they interact and compliment each other. While much of this discussion also applies to in-place compression and deduplication such as that used on many storage devices, some of it does not, so assume that I am talking about networking, specifically WAN networking, throughout this blog. At the very highest level, deduplication and compression are the same thing. They both look for ways to shrink your dataset before passing it along. After that, it gets a bit more complex. If it was really that simple, after all, we wouldn’t call them two different things. Well, okay, we might, IT has a way of having competing standards, product categories, even jobs that we lump together with the same name. But still, they wouldn’t warrant two different names in the same product like F5 does with BIG-IP WOM. The thing is that compression can do transformations to data to shrink it, and it also looks for small groupings of repetitive byte patterns and replaces them, while deduplication looks for larger groupings of repetitive byte patterns and replaces them. In the implementation you’ll see on BIG-IP WOM, deduplication looks for larger byte patterns repeated across all streams, while compression applies transformations to the data, and when removing duplication only looks for smaller combinations on a single stream. The net result? The two are very complimentary, but if you run compression before deduplication, it will find a whole collection of small repeating byte patterns and between that and transformations, deduplication will find nothing, making compression work harder and deduplication spin its wheels. There are other differences – because deduplication deals with large runs of repetitive data (I believe that in BIG-IP the minimum size is over a K), it uses some form of caching to hold patterns that duplicates can match, and the larger the caching, the more strings of bytes you have to compare to. This introduces some fun around where the cache should be stored. In memory is fast, but limited in size, on flash disk is fast and has a greater size, but is expensive, and on disk is slow but has a huge advantage in size. Good deduplication engines can support all three and thus are customizable to what your organization needs and can afford. Some workloads just won’t benefit from one, but will get a huge benefit from the other. The extremes are good examples of this phenomenon – if you have a lot of in-the-stream repetitive data that is too small for deduplication to pick up, and little or no cross-stream duplication, then deduplication will be of limited use to you, and the act of running through the dedupe engine might actually degrade performance a negligible amount – of course, everything is algorithm dependent, so depending upon your vendor it might degrade performance a large amount also. On the other extreme, if you have a lot of large byte count duplication across streams, but very little within a given stream, deduplication is going to save your day, while compression will, at best, offer you a little benefit. So yes, they’re exactly the same from the 50,000 foot view, but very very different from the benefits and use cases view. And they’re very complimentary, giving you more bang for the buck.297Views0likes1CommentTuning TCP
In the previous few posts I’ve discussed the new congestion control algorithms and rate pacing features that are available in BIG-IP 11.5; but if you’re not ready to move to 11.5 there is still plenty that you can do to optimize your TCP profiles. Adjusting the Initial Congestion Window Size The initial congestion window is a key component of slow start, or the exponential growth phase. Historically the initial congestion window has been 2, which means that slow start ramps up from 2 to 4 to 8 to 16 to 32. In this scenario, it takes 3 round trips before 8 segments are transmitted. The majority of web transactions are on the small side (under 16 Kb), very short lived, and are completed before slow start has a chance to ramp up. As a result of the significant increase in bandwidth available to users, increasing the initial congestion window to 10 has been proposed. We have increased the initial congestion window size in some of our default profiles (depending on version); but this is a modification you can easily make to any profile in any version. Ignoring Packet Loss By default, all packet loss events are passed directly to congestion control, which considers this an indicator of congestion and slow down transmission. In some networks, such as wireless, packet loss may not be a reliable indicator of congestion. In such cases, ignoring a small percentage of background loss can be beneficial. There are two settings in the BIG-IP TCP profile that can be adjusted to ignore loss: Packet loss ignore rate and packet loss ignore burst. Packet loss ignore rate allows you to specify the percentage of loss to ignore to prevent congestion control from kicking in and altering the transmission of packets. This is very useful when there is intermittent or stray packet drops on an uncongested network. If the network normally experiences a high degree of congestion, it is not recommended to configure this as it can be too aggressive and cause more packet loss. The packet loss ignore burst setting provides an exception value for the ignore rate loss parameter. If the connection sees a consecutive number of packet drops but the rate has not exceeded the ignore rate value; congestion control should kick in. A consecutive number of packet drops is evidence of a tail drop and should be considered a loss event. When configuring packet loss ignore rate, it is suggested that the packet loss ignore burst be set to a value between 6 and 12. Adjusting for SSL TCP profiles on SSL virtual servers should have Nagle and delayed ACKs disabled as this can create stalls. If your virtual server has a mixture of SSL and non-SSL traffic, Nagle can be disabled via an iRule when the SSL connection is detected.610Views1like3CommentsCaching FAQs
One of the most mysterious parts of the BIG-IP Application Acceleration Manager (AAM) is caching. Rarely is it explained, and there are very few documents that describe why you would or would not use one of the BIG-IP's caching facilities. Even harder to find is some kind of description of what numbers you should use, or whether or not to push some specific caching button when trying configure your AAM policies or applications. So here's an overview of a select few bits of frequently asked AAM caching questions, and some explanation of why you would or would not do something with those pretty buttons and number fields. To be clear, AAM does not use fast Cache, it has two entirely separate and distinct caching systems of its own: Metastor and the Small Object Cache. In this posting, however, we'll be talking about them, mostly, as if they are one in the same. The 4 most commonly asked questions we get regarding caching are as follows: · Why is there an option to turn off cache on first hit, and why would I ever enable this? · What does Queue Parallel Requests do? · Why would I ever set the maximum object size to anything less than infinity? · OK, a maximum object size makes sense, but what about the minimum object size? Each question is addressed using an analogy of putting marbles into a mason jar. We are, of course, talking about web objects and bytes of data, not marbles and weight. 1) "Why is there an option to turn off cache on first hit, and why would I ever do so?" OK, well, let's start with a simple mental model of a cache. Imagine your website as just a bunch of marbles. To keep it simple, all your marbles are the same size. Now think of a cache as being like a Mason jar. Imagine if the Mason jar is just big enough to hold exactly one marble. You can think of the BIG-IP as a super-fast copying machine that can copy marbles, and store one copy of one marble. Finally, imagine a single user sending requests for marbles to your website through the BIG-IP, where every policy node has "Cache marbles on first hit" turned on, and every marble is cacheable, and cached if requested. Pretty simple, right? If you have "Cache marble on first hit" turned on, then the very first request your user makes for a marble will cause the BIG-IP to turn around, get that marble from the website, copy it, put that copy into the Mason jar, and then hand the original marble to your user. At this point, the Mason jar is full. If the next request your user makes is for a different marble, then the first marble must be removed from the jar in order to make room for the one just requested. Sadly, the effort and time it took to copy and put the first marble into the Mason jar was entirely wasted, and the user got both of his marbles later, and slower than he would have if the BIG-IP had simply taken them from the website, and handed them to your user. If the third request the customer makes is for the first marble, then again the Mason jar has to be emptied and the first marble cached (remember only a single marble can be cached at any time). The BIG-IP is churning away, copying then putting a marble into the Mason jar, then emptying out the Mason jar, but never actually getting any value out of having that Mason jar. If the user keeps switching back and forth between requesting the first marble and the second marble, the jar will never have the marble being requested, and the load on the back end servers has not been reduced. This is considered a zero cache scenario where the benefits of the cache are moot. But imagine if "Cache marble on first hit" is turned off. Now the same marble has to be requested twice before the BIG-IP will copy it and put the copy in the Mason jar. So now, with the first request the BIG-IP does nothing but pass it along. However, the BIG-IP remembers that the blue marble was requested once. The second request also does nothing but pass the marble along, but again, the BIG-IP remembers that, say, a red marble was requested once. At this point, if the user goes back and asks for the blue marble again, it has been requested twice, so it will be copied and stored in the Mason jar. If the user then asks for a green marble, the BIG-IP remembers that the request was made, but does not discard the marble in the jar, as this is the first request. If the user requests the blue marble again, then the user will get a copy of that from the Mason jar, not from your website. You now have an effective cache where 1 in 5 requests have been offloaded from the origin server. In summary, turn off "Cache object on first hit" for policy nodes where the objects either change very quickly, or where the time between requests is relatively long. This will prevent the cache from discarding an object that your users will hopefully be requesting more often, and more frequently. Obviously, the flip side of that coin is that the BIG-IP will have to get the same object from your website twice, so if you are sure that the objects matched by a particular policy node are really popular, and that they will be requested quite frequently, (such as the company logo and navigation buttons) then copy 'em and dump them in the cache the first time they are requested. 2) What is "Queue Parallel Requests" and why would I turn it on? Queuing parallel requests is interesting, as it interacts with caching, but it really only helps when you have a lot of users trying to get the same marble at the same time, and that marble is being cached for the first time. A cache is kind of stupid, and it doesn't remember the marbles it threw away. As a result, any marble being put into it looks like it is being stored "for the first time", even when it is actually being put into the jar for the hundredth time. "Queue Parallel Requests" basically makes all the users who are requesting the same marble wait for it to be fetched off of your website, and then copied once for each user by the BIG-IP. That doesn't sound too interesting or useful until you realize that if you don't turn this on, then between the time you start the process of requesting that marble from your website and finish putting it into the jar, every other request for that same marble will have to be forwarded to your website. Image a scenario where a server takes 2 ms to respond to a request for an object. Every ms 2 new users request the object. In the time it has taken the server to respond to the first request 3 additional requests would have been sent for the server to process. This has created unnecessary demand on the servers. With queuing turned on all subsequent requests for the object will be placed into a parking area to wait for the original response to be returned and cached. Four requests doesn’t sound like it will cause a server to be overloaded, but what if it isn’t 4 but 400 requests. Suddenly, queuing sounds like a better idea, right? It is, but like any other feature, it is not a panacea. Turn it on for new, shareable, highly popular objects that remain the same for a relatively long time. More to the point, however, if the web server that is giving one marble to the BIG-IP to copy and give to a bunch of users hiccups (say, you decide to take down one of the web servers in your pool, or as luck would have it, one of them fails in the middle of hand over that marble), all of those users will get part of a marble, and that is all. You are trading less pool traffic for what our engineers like to call a "single point of failure" risk. But if you have a really rare and valuable marble that everyone wants a copy of, all at the same time, and your website pool is pretty stable and handing out marbles pretty efficiently, then request queuing will really reduce the traffic on your web servers! 3) There is an option to set the minimum and maximum cacheable object size. Why would I ever set the maximum object size to anything less than infinity? Yeah, that's a tough one. First, go read the answer to "Why turn off Cache content on first hit". Then, let's imagine a Mason jar where instead of one marble, we have a jar big enough to store one thousand marbles. In this scenario, however, we are going to assume exactly 16 simultaneous users, and also that the marbles they are requesting are in the jar. Obviously, the web servers in your pool are getting zero requests. Cool, right!? When caching is working, it can be really handy! But now let us change one assumption: let's allow your web site objects to vary in size. We still have 16 users, but there is one marble that is twice the diameter as the marbles in our first example. When this marble is cached it reduces the total number of marbles that can be cached. Only 13 of the original 16 requests can be served from the jar, the other 3 requests have to go to the server pool. If every marble in the cache is twice the diameter of the marbles in our first example, twelve of the 16 requests being made have to go to your pool. At the extreme, if one object completely fills the Mason jar, that marble (well, bowling ball, really!) is the only object that can be served from cache; the other 15 requests have to go to your pool. So you limit the maximum size of the marbles that can be stored in your Mason jar to configure the BIG-IP to serve the average number of simultaneous users you expect, and wish to serve. By the emergent properties of the system, it turns out that large objects are often times not that popular, anyway. Unless you are running a web server whose job it is to serve large patch files to end users, that is. 4) OK, a maximum object size makes sense. So why have a minimum object size? OK, now we have to get explicit about the jar, and about knowing what has been requested, copied, and stored in the jar. Assume that we have a peg board that has exactly one thousand holes in it. Each time we dump a marble in the jar, we write out a tag that describes the marble, tie it to a peg, then put that peg into the peg board. When we remove a marble from the jar, we remove its associated peg from the board. When the peg board is full, we can't store any more marbles in the Mason jar. Now, what if your minimum size is that of a grain of sand, but your mason jar is big enough to fit 100 marbles with a diameter of 2 inches? If what is popular, and requested quite frequently is a bunch of grains of sand, you can end up running out of peg board space long, LONG before you even finish coating the bottom of your Mason jar with sand. Giving your customers copies of those grains of sand will happen often, but will by definition be a smaller percentage of the total volume of traffic than if you made your minimum size larger, AND if you still have enough marbles of that minimum size on your website to fill your cache. Another way of looking at it is in terms of a collection of marbles of all sizes. If a large marble is in cache, and it has to be displaced to make room on the peg board for a tag that records the information for a grain of sand, and then the grain of sand has to be displaced to make room for the large marble, you will have to get both off of your origin servers. If you don't try to cache the sand grain, then when a user asks for the larger marble, the total weight of marbles requested from your server is going to be smaller. Even if that grain of sand has to be served from your server several times in order to keep the larger marble in the jar, that will be a lot less total grams of marbles moved, copied and stored or retrieved from the jar. Obviously, there is a trade off here between the number of requests, versus the total weight of the marbles being requested. Putting it all together Knowing when and what to cache is an important step to ensure that BIG-IP and your application is performing optimally. Setting a parameter with a wrong value can have negative effects causing increased traffic on your origin servers and consuming resources unnecessarily on BIG-IP. Think about what you are trying to achieve, what other optimization features are enabled and the traffic patterns of your site when configuring the cache settings. Thank you to my colleague John Stevens for assistance in writing this article.1.1KViews0likes2Comments4 Things You Need in a Cloud Computing Infrastructure
Cloud computing is, at its core, about delivering applications or services in an on-demand environment. Cloud computing providers will need to support hundreds of thousands of users and applications/services and ensure that they are fast, secure, and available. In order to accomplish this goal, they'll need to build a dynamic, intelligent infrastructure with four core properties in mind: transparency, scalability, monitoring/management, and security. Transparency One of the premises of Cloud Computing is that services are delivered transparently regardless of the physical implementation within the "cloud". Transparency is one of the foundational concepts of cloud computing, in that the actual implementation of services in the "cloud" are obscured from the user. This is actually another version of virtualization, where multiple resources appear to the user as a single resource. It is unlikely that a single server or resource will always be enough to satisfy demand for a given provisioned resource, which means transparent load-balancing and application delivery will be required to enable the transparent horizontal scaling of applications on-demand. The application delivery solution used to provide transparent load-balancing services will need to be automated and integrated into the provisioning workflow process such that resources can be provisioned on-demand at any time. Related Articles from around the Web What cloud computing really means How Cloud & Utility Computing Are Different The dangers of cloud computing Guide To Cloud Computing For example, when a service is provisioned to a user or organization, it may need only a single server (real or virtual) to handle demand. But as more users access that service it may require the addition of more servers (real or virtual). Transparency allows those additional servers to be added to the provisioned service without interrupting the service or requiring reconfiguration of the application delivery solution. If the application delivery solution is integrated via a management API with the provisioning workflow system, then transparency is also achieved through the automated provisioning and de-provisioning of resources. Scalability Obviously cloud computing service providers are going to need to scale up and build out "mega data centers". Scalability is easy enough if you've deployed the proper application delivery solution, but what about scaling the application delivery solution? That's often not so easy and it usually isn't a transparent process; there's configuration work and, in many cases, re-architecting of the network. The potential to interrupt services is huge, and assuming that cloud computing service providers are servicing hundreds of thousands of customers, unacceptable. The application delivery solution is going to need to not only provide the ability to transparently scale the service infrastructure, but itself, as well. That's a tall order, and something very rarely seen in an application delivery solution. Making things even more difficult will be the need to scale on-demand in real-time in order to make the most efficient use of application infrastructure resources. Many postulate that this will require a virtualized infrastructure such that resources can be provisioned and de-provisioned quickly, easily and, one hopes, automatically. The "control node" often depicted in high-level diagrams of the "cloud computing mega data center" will need to provide on-demand dynamic application scalability. This means integration with the virtualization solution and the ability to be orchestrated into a workflow or process that manages provisioning. Intelligent Monitoring In order to achieve the on-demand scalability and transparency required of a mega data center in the cloud, the control node, i.e. application delivery solution, will need to have intelligent monitoring capabilities. It will need to understand when a particular server is overwhelmed and when network conditions are adversely affecting application performance. It needs to know the applications and services being served from the cloud and understand when behavior is outside accepted norms. While this functionality can certainly be implemented externally in a massive management monitoring system, if the control node sees clients, the network, and the state of the applications it is in the best position to understand the real-time conditions and performance of all involved parties without requiring the heavy lifting of correlation that would be required by an external monitoring system. But more than just knowing when an application or service is in trouble, the application delivery mechanism should be able to take action based on that information. If an application is responding slowly and is detected by the monitoring mechanism, then the delivery solution should adjust application requests accordingly. If the number of concurrent users accessing a service is reaching capacity, then the application delivery solution should be able to not only detect that through intelligent monitoring but participate in the provisioning of another instance of the service in order to ensure service to all clients. Security Cloud computing is somewhat risky in that if the security of the cloud is compromised potentially all services and associated data within the cloud are at risk. That means that the mega data center must be architected with security in mind, and it must be considered a priority for every application, service, and network infrastructure solution that is deployed. The application delivery solution, as the "control node" in the mega data center, is necessarily one of the first entry points into the cloud data center and must itself be secure. It should also provide full application security - from layer 2 to layer 7 - in order to thwart potential attacks at the edge. Network security, protocol security, transport layer security, and application security should be prime candidates for implementation at the edge of the cloud, in the control node. While there certainly will be, and should be, additional security measures deployed within the data center, stopping as many potential threats as possible at the edge of the cloud will alleviate much of the risk to the internal service infrastructure. What are your plans for cloud computing? ( polls)391Views0likes2CommentsHow Apps and APIs are Changing the App Acceleration Game
Modern apps are changing. And I’m not just talking about the rush to containerize apps or put them in the cloud. I’m talking about app architectures and, more specifically, the techniques behind them that are having a significant impact on networking and app services. No, not microservices or containers. I’m talking about how apps are composed and the division of duties within an app architecture. Way back in the early days of the web (in the ‘e-commerce’ era), the need for speed propelled a number of “acceleration” technologies into data centers everywhere. These “front-end” acceleration services were focused primarily on caching and compression at the time. It was an appropriate response to an app model in which the server-side response bore most of the burden for delivering an app. The traditional “layers” of an application: presentation, logic, and data, were all bound up in a single – often large – response wrapped up in HTML. Web 2.0 and AJAX started to change that. More of the presentation layer of the app was moved onto the client (the browser). Responses became smaller and the notion of dynamic content drove us toward newer acceleration techniques designed to address the unique challenges of these applications. Image optimization, inlining scripts, and reordering of content became an expected response to improving the speed with which these apps performed. Connection (TCP) related optimizations became imperative as a single web app started to consume often eight-times the number of connections (and commensurately, compute and memory) to achieve the dynamism users craved. Today, REST-based API-driven apps are becoming (if they aren’t already) the norm. At this point, presentation logic is almost completely contained within the client via JavaScript-based frameworks like JQuery and Angular. The presentation layer has always (necessarily, some might say) been the fattest of the three layers. That remains true, but with most of it constrained to the client (browser or native mobile app), the burden on the server-side has been dramatically reduced. Responses to queries are now JSON, for the most part, and often fit within the somewhat restrictive size of a single packet (about 1460 bytes). HTTP/2 and WebSockets with request pipelining capabilities and binary payloads have made connection (TCP) management on the client side nearly impotent in terms of improving performance because it’s implicitly included in the protocols themselves. The days of needing to multiplex connections are gone thanks to protocols like SPDY and HTTP/2 that provide the same result and require only a single connection. For most apps, the images that used to provide visual cues (the icons) have moved from individual images to sprites to web icon sets from offerings like Font Awesome. With relatively tiny footprints, such icon sets provide the same visual cues at a greatly reduced data footprint and reside client-side, not server side, making image optimization less of an problem for most modern applications. The scripts driving the UI and app logic on the client side, too, are almost universally offered in an already minimized form. Developers know to grab the “.min.” script for deployment and judiciously load them from their hosted sites – already backed by a CDN and cached for more immediate and timely response. In the past year the percentage of HTML being served via a CDN has risen to 20%. For JavaScript, it’s 21.5% of the Quantcast Top10k and 7% of the BuiltWith Top Million. But Lori, you’re thinking, then why is the average size of a web page going up and not down? Well, all those scripts and presentation logic are heavy. The actual size of a web page is going up because it’s now containing much of the presentation and logic that used to be delivered from the server. It’s still delivered initially by the server, but then most of the actual interaction with the app occurs dynamically, with very small chunks of data being exchanged with the server side logic. Consider the change in the composition of a “page” over the past year. Where fonts, JavaScript, and CSS have increased, HTML has actually decreased by 18%. That’s a significant change. What this all means is that the focus of “front-end acceleration” services are changing, necessarily. The focus has to shift from manipulating and modifying content to streamlining and managing protocols. Compression may be beneficial – particularly on the first visit to an app – and caching thereafter, but performance improvements where modern apps are concerned will be best achieved through the use of protocols like HTTP/2 and optimization and offload on the server-side of the equation. Apps are changing; not just where and in what they’re deployed, but their basic composition. That means the technologies and tools traditionally used to improve performance and address capacity issues has to change, as well.315Views0likes0Comments