persistence
Why you still need layer 7 persistence
Tony Bourke of the Load Balancing Digest points out that mega proxies are largely dead. Very true. He then wonders whether layer 7 persistence is really all that important today, as it was largely implemented to solve the problems associated with mega-proxies - that is, large numbers of users coming from the same IP address. Layer 7 persistence is still applicable to situations where you may have multiple users coming from a single IP address (such as a small client base coming from a handful of offices, with each office using one public IP address), but I wonder what doing layer 4 persistence would do to a major site these days. I'm thinking not much of it would still work correctly; layer 4 persistence would likely break a major site today. Layer 7 persistence is even more relevant today than it has been in the past for one very good reason: session coherence. Session coherence may not have the performance and availability ramifications of the mega-proxy problem, but it is essential to ensure that applications in a load-balanced environment work correctly.

SESSION COHERENCE

Layer 7 persistence is still heavily used in applications that are session sensitive. The most common example is shopping carts stored in the application server session, but it is also increasingly important to Web 2.0 and interactive applications where state is important. Sessions are used to store that state and therefore layer 7 persistence becomes important to maintaining that state in a load-balanced environment. It's common to see layer 7 persistence driven by JSESSIONID or PHPSESSIONID header variables today. It's a question we see in the forums here on DevCentral quite often. Many applications are rolled out, then inserted into a load balanced environment, and subsequently break because sessions aren't shared across web application servers and the client isn't always routed to the same server, or because clients come back "later" (after the connections have timed out) and expect the application to continue where they left off. If they aren't load balanced back to the same server, the session data isn't accessible and the application breaks.

Application server sessions generally persist for hours, as opposed to the minutes or seconds allowed for a TCP connection. Layer 4 (TCP) persistence can't adequately address this problem. Source port and IP address aren't always enough to ensure routing to the correct server because that mapping doesn't persist once the connection is closed, and multiple requests coming from the same browser use multiple connections now, each with a different source port. That means two requests on the same page may not be load balanced to the same server, even though they both may require access to the application session data. These sites and applications are used for hours, often with long periods of time between requests, which means connections have often long timed out. Could layer 4 persistence work? Probably, but only if the time-out on these connections were set unreasonably high, which would consume a lot more resources on the load balancer and reduce its capacity significantly.
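To make the difference concrete, here is a minimal, vendor-neutral sketch (the server names, the round-robin chooser and the session cookie value are all invented for illustration) showing why a layer 4 key tied to the connection cannot hold a session together the way a layer 7 key can:

```python
import itertools

SERVERS = ["app01", "app02", "app03"]
_round_robin = itertools.cycle(SERVERS)  # used whenever a key has no entry yet
persistence_table = {}                   # persistence key -> chosen server

def route(key):
    """Return the persisted server for a key, choosing one if the key is new."""
    if key not in persistence_table:
        persistence_table[key] = next(_round_robin)
    return persistence_table[key]

client_ip = "203.0.113.10"
session_cookie = "JSESSIONID=9597856473431"

# Three requests from the same browser session, each on a new TCP connection
# with a new ephemeral source port, as modern browsers routinely do.
for src_port in (49152, 49153, 49154):
    layer4 = route(f"{client_ip}:{src_port}")   # key changes with every connection
    layer7 = route(f"cookie:{session_cookie}")  # key is stable for the whole session
    print(f"src port {src_port}: layer 4 -> {layer4}, layer 7 -> {layer7}")
```

The layer 7 key hits the persistence table on every request after the first; the layer 4 key never does, so each new connection is a brand new load balancing decision and the session data is stranded wherever the first request happened to land.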
And let's not forget SaaS (Software as a Service) sites like salesforce.com, where rather than mega-proxy issues cropping up we'd have lots-of-little-proxy issues cropping up as businesses still (thanks to IPv4 and the need to monitor Internet use) employ forward proxies. And SSL, too, is highly dependent upon header data to ensure persistence today. I agree with Tony's assessment that the mega proxy problem is largely a non-issue today, but session coherence is taking its place as one of the best reasons to implement layer 7 persistence over layer 4 persistence.

Load Balancing Fu: Beware the Algorithm and Sticky Sessions
The choice of load balancing algorithms can directly impact – for good or ill – the performance, behavior and capacity of applications. Beware making incompatible choices in architecture and algorithms.

One of the most persistent issues encountered when deploying applications in scalable architectures involves sessions and the need for persistence-based (a.k.a. sticky) load balancing services to maintain state for the duration of an end-user’s session. It is common enough that even the rudimentary load balancing services offered by cloud computing providers such as Amazon include the option to enable persistence-based load balancing. While the use of persistence addresses the problem of maintaining session state, it introduces other operational issues that must also be addressed to ensure consistent operational behavior of load balancing services. In particular, the use of the Round Robin load balancing algorithm in conjunction with persistence-based load balancing should be discouraged if not outright disallowed.

ROUND ROBIN + PERSISTENCE –> POTENTIALLY UNEQUAL DISTRIBUTION of LOAD

When scaling applications there are two primary concerns: concurrent user capacity and performance. These two concerns are interrelated in that as capacity is consumed, performance degrades. This is particularly true of applications storing state, as each request requires that the application server perform a lookup to retrieve the user session. The more sessions stored, the longer it takes to find and retrieve the session. The exact efficiency of such lookups is determined by the underlying storage data structure and algorithm used to search the structure for the appropriate session. If you remember your undergraduate classes in data structures and computing Big O, you’ll remember that some structures scale more efficiently in terms of performance than do others. The general rule of thumb, however, is that the more data stored, the longer the lookup. Only the amount of degradation is variable, based on the efficiency of the algorithms used. Therefore, the more sessions in use on an application server instance, the poorer the performance. This is one of the reasons you want to choose a load balancing algorithm that evenly distributes load across all instances, and ultimately why lots of little web servers scaled out offer better performance than a few, scaled up web servers.

Now, when you apply persistence to the load balancing equation it essentially interrupts the normal operation of the algorithm, ignoring it. That’s the way it’s supposed to work: the algorithm essentially applies only to requests until a server-side session (state) is established, and thereafter (when the session has been created) you want the end-user to interact with the same server to ensure consistent and expected application behavior. For example, consider this solution note for BIG-IP. Note that this is true of all load balancing services:

A persistence profile allows a returning client to connect directly to the server to which it last connected. In some cases, assigning a persistence profile to a virtual server can create the appearance that the BIG-IP system is incorrectly distributing more requests to a particular server. However, when you enable a persistence profile for a virtual server, a returning client is allowed to bypass the load balancing method and connect directly to the pool member.
As a result, the traffic load across pool members may be uneven, especially if the persistence profile is configured with a high timeout value.
-- Causes of Uneven Traffic Distribution Across BIG-IP Pool Members

So far so good. The problem with round robin – and the reason I’m picking on Round Robin specifically – is that round robin is pretty, well, dumb in its decision making. It doesn’t factor anything into its decision regarding which instance gets the next request. It’s as simple as “next in line”, period. Depending on the number of users and at what point a session is created, this can lead to scenarios in which the majority of sessions are created on just a few instances. The result is a couple of overwhelmed instances (with performance degradations commensurate with the reduction in available resources) and a bunch of barely touched instances. The smaller the pool of instances, the more likely it is that a small number of servers will be disproportionately burdened. Again, lots of little (virtual) web servers scales out more evenly and efficiently than a few big (virtual) web servers.

Assuming a pool of similarly-capable instances (RAM and CPU about equal on all), there are other load balancing algorithms that should be considered more appropriate for use in conjunction with persistence-based load balancing configurations. Least connections should provide better distribution, although the assumption that the number of active connections is equivalent to the number of sessions currently in memory on the application server could prove to be incorrect at some point, leading to the same situation as would be the case with the choice of round robin. It is still a better option, but not an infallible one. Fastest response time is likely a better indicator of capacity, as we know that response times increase along with resource consumption, thus a faster responding instance is likely (but not guaranteed) to have more capacity available. Again, this algorithm in conjunction with persistence is not a panacea. Better options for a load balancing algorithm include those that are application aware; that is, algorithms that can factor into the decision making process the current load on the application instance and thus direct requests toward less burdened instances, resulting in a more even distribution of load across available instances.

NON-ALGORITHMIC SOLUTIONS

There are also non-algorithmic, i.e. architectural, solutions that can address this issue.

DIVIDE and CONQUER

In cloud computing environments, where it is less likely to find available algorithms other than industry standard (none of which are application-aware), it may be necessary to approach the problem with a divide and conquer strategy, i.e. lots of little servers. Rather than choosing one or two “large” instances, choose to scale out with four or five “small” instances, thus providing a better (but not guaranteed) statistical chance of load being distributed more evenly across instances.

FLANKING STRATEGY

If the option is available, an architectural “flanking” strategy that leverages layer 7 load balancing, a.k.a. content/application switching, will also provide better consumptive rates as well as more consistent performance. An architectural strategy of this sort is in line with sharding practices at the data layer in that it separates out by some attribute different kinds of content and serves that content from separate pools.
Thus, image or other static content may come from one pool of resources while session-oriented, process intensive dynamic content may come from another pool. This allows different strategies – and algorithms – to be used simultaneously without sacrificing the notion of a single point of entry through which all users interact on the client-side.

Regardless of how you choose to address the potential impact on capacity, it is important to recognize the intimate relationship between infrastructure services and applications. A more integrated architectural approach to application delivery can result in a much more efficient and better performing application. Understanding the relationship between delivery services and application performance and capacity can also help control operational costs, especially in cloud computing environments that constrain the choices of load balancing algorithms. As always, test early and test often, and test under high load if you want to be assured that the load balancing algorithm is suitable to meet your operational and business requirements.

WILS: Why Does Load Balancing Improve Application Performance?
Load Balancing in a Cloud
Infrastructure Scalability Pattern: Sharding Sessions
Infrastructure Scalability Pattern: Partition by Function or Type
It’s 2am: Do You Know What Algorithm Your Load Balancer is Using?
Lots of Little Virtual Web Applications Scale Out Better than Scaling Up
Sessions, Sessions Everywhere
Choosing a Load Balancing Algorithm Requires DevOps Fu
Amazon Makes the Cloud Sticky
To Boldly Go Where No Production Application Has Gone Before
Cloud Testing: The Next Generation
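To see how round robin can pile sessions onto a subset of instances, here is a deliberately contrived toy model (two instances, an alternating traffic pattern, and Python stand-ins for the load balancer are all assumptions made for the sketch, not measured behavior):

```python
from collections import Counter
from itertools import cycle

servers = cycle(["web01", "web02"])   # plain round robin across two instances
sessions_per_server = Counter()

# 'L' = a request that logs in and creates a server-side session
# '.' = a request with no session impact (landing page hit, bot, static asset)
# Requests that already carry a persistence cookie would bypass the algorithm
# entirely, so only this "new" traffic advances the round-robin pointer.
traffic = ". L . L . L . L . L . L".split()

for request in traffic:
    server = next(servers)            # round robin is blind to what the request is
    if request == "L":
        sessions_per_server[server] += 1

print(sessions_per_server)            # Counter({'web02': 6}) - every session on one box
```

With this interleaving, every session-creating request lands on the same instance, which then carries all of the session lookups while its peer sits nearly idle. Real traffic is messier, but the underlying point stands: round robin distributes requests, not sessions, and persistence then pins the resulting imbalance in place.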
Persistent and Persistence, What's the Difference?

The English language is one of the most expressive, and confusing, in existence. Words can have different meaning based not only on context, but on placement within a given sentence. Add in the twists that come from technical jargon and suddenly you've got words meaning completely different things. This is evident in the use of persistent and persistence. While the conceptual basis of persistence and persistent are essentially the same, in reality they refer to two different technical concepts.

Both persistent and persistence relate to the handling of connections. The former is often used as a general description of the behavior of HTTP and, necessarily, TCP connections, though it is also used in the context of database connections. The latter is most often related to TCP/HTTP connection handling but almost exclusively in the context of load-balancing.

Persistent

Persistent connections are connections that are kept open and reused. The most commonly implemented form of persistent connections is HTTP, with database connections a close second. Persistent HTTP connections were implemented as part of the HTTP 1.1 specification as a method of improving the efficiency of HTTP in general.

Related Links:
HTTP 1.1 RFC
Persistent Connection Behavior of Popular Browsers
Persistent Database Connections
Apache Keep-Alive Support
Cookies, Sessions, and Persistence

Before HTTP 1.1 a browser would generally open one connection per object on a page in order to retrieve all the appropriate resources. As the number of objects in a page grew, this became increasingly inefficient and significantly reduced the capacity of web servers while causing browsers to appear slow to retrieve data. HTTP 1.1 and the Keep-Alive header in HTTP 1.0 were aimed at improving the performance of HTTP by reusing TCP connections to retrieve objects. They made the connections persistent such that they could be reused to send multiple HTTP requests using the same TCP connection.

Similarly, this notion was implemented by proxy-based load-balancers as a way to improve performance of web applications and increase capacity on web servers. Persistent connections between a load-balancer and web servers are usually referred to as TCP multiplexing. Just like browsers, the load-balancer opens a few TCP connections to the servers and then reuses them to send multiple HTTP requests.

Persistent connections, both in browsers and load-balancers, have several advantages:

Less network traffic due to less TCP setup/teardown. It requires no less than 7 exchanges of data to set up and tear down a TCP connection, thus each connection that can be reused reduces the number of exchanges required, resulting in less traffic.

Improved performance. Because subsequent requests do not need to set up and tear down a TCP connection, requests arrive faster and responses are returned quicker. TCP has built-in mechanisms, for example window sizing, to address network congestion. Persistent connections give TCP the time to adjust itself appropriately to current network conditions, thus improving overall performance. Non-persistent connections are not able to adjust because they are open and almost immediately closed.

Less server overhead. Servers are able to increase the number of concurrent users served because each user requires fewer connections through which to complete requests.
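For readers who want to see a persistent (keep-alive) connection in action, here is a minimal sketch using only the Python standard library; the host name is a placeholder and the reuse only happens if the server honors keep-alive, which most HTTP 1.1 servers do:

```python
import http.client

# One TCP (and TLS) handshake up front ...
conn = http.client.HTTPSConnection("example.com")

for path in ("/", "/", "/"):
    conn.request("GET", path, headers={"Connection": "keep-alive"})
    response = conn.getresponse()
    body = response.read()          # the response must be drained before reuse
    print(path, response.status, len(body), "bytes")

# ... and one teardown, instead of a full setup/teardown per request.
conn.close()
```

The three GETs travel over the same TCP connection, which is exactly the setup/teardown saving described in the first advantage above.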
Persistence

Persistence, on the other hand, is related to the ability of a load-balancer or other traffic management solution to maintain a virtual connection between a client and a specific server. Persistence is often referred to in the application delivery networking world as "stickiness" while in the web and application server demesne it is called "server affinity". Persistence ensures that once a client has made a connection to a specific server, subsequent requests are sent to the same server. This is very important to maintain state and session-specific information in some application architectures and for handling of SSL-enabled applications.

Examples of Persistence:
Hash Load Balancing and Persistence
LTM Source Address Persistence
Enabling Session Persistence
20 Lines or Less #7: JSessionID Persistence

When the first request is seen by the load-balancer it chooses a server. On subsequent requests the load-balancer will automatically choose the same server to ensure continuity of the application or, in the case of SSL, to avoid the compute intensive process of renegotiation. This persistence is often implemented using cookies but can be based on other identifying attributes such as IP address. Load-balancers that have evolved into application delivery controllers are capable of implementing persistence based on any piece of data in the application message (payload), headers, or in the transport protocol (TCP) and network protocol (IP) layers.

Some advantages of persistence are:

Avoid renegotiation of SSL. By ensuring that SSL enabled connections are directed to the same server throughout a session, it is possible to avoid renegotiating the keys associated with the session, which is compute and resource intensive. This improves performance and reduces overhead on servers.

No need to rewrite applications. Applications developed without load-balancing in mind may break when deployed in a load-balanced architecture because they depend on session data that is stored only on the original server on which the session was initiated. Load-balancers capable of session persistence ensure that those applications do not break by always directing requests to the same server, preserving the session data without requiring that applications be rewritten.

To Summarize

So persistent connections are connections that are kept open so they can be reused to send multiple requests, while persistence is the process of ensuring that connections and subsequent requests are sent to the same server through a load-balancer or other proxy device. Both are important facets of communication between clients, servers, and mediators like load-balancers, and increase the overall performance and efficiency of the infrastructure as well as improving the end-user experience.

Load Balancing on the Inside
Business critical internal processing systems often require high-availability and fault tolerance, too. Load balancing and application delivery is almost always associated with scaling out interactive, web-based applications. Rarely does anyone think about load balancing and application delivery in batch processing systems, even when those systems might be critical to the business they are supporting. But scaling out non-interactive processing systems and providing high-availability to such critical systems is just as easily accomplished with an application delivery controller (ADC) as scaling out an interactive web-based application. Maybe easier. When that system also requires a bit more intelligence than just simple load balancing, it makes a lot of sense to look closer at a context-aware system that can support all the requirements in a single solution.

THE SCENARIO

A batch document processing system uses a document ID to match all related documents to the same “case.” The first time a document ID is encountered, it creates a new “case” and subsequent documents bearing that ID are attached to the original case. To ensure processing around the clock, a redundant set of application servers is configured to process the documents, and the vendor’s application server clustering solution is used to load balance documents (in simple round-robin fashion) across the two instances.

A load test is conducted, ramping up to 2500 documents per hour (41 per minute, fewer than 1 per second). During the test it is discovered that in some situations two documents with the same ID will arrive at the clustering solution back to back. They will each be load balanced to separate instances. There is no existing “case” for this document ID. Because of processing times and load on the servers, both documents result in the creation of separate “cases.” The test is considered a failure, because the system, while managing the load fine from a network perspective, executed incorrectly under load from a process perspective. The solution? Reconfigure the clustering solution to an active-standby configuration, thus introducing the process latency needed to ensure that the scenario does not occur. Retest. Success.

The result? The investment in the second instance of the application server – hardware, software licenses, management, maintenance – is wasted. It is a “failover” node only and reduces the overall capacity – and ultimately performance at higher load levels – of the system.

WHEN CONTEXT MATTERS

This scenario is real; it was described to me by a program manager at a Fortune 500 with a great deal of frustration, as it seemed, to her anyway, that the architects could not come up with a working solution other than wasting a perfectly good set of resources. Instinctively she described a solution that leveraged persistence to force all documents with the same ID to the same server, as it had been proven repeatedly that if all documents with the same ID were processed by the same application server, the system processed them correctly and associated them with the right “case” in all situations. But the application server clustering solution, which can provide server affinity (persistence) based on a few variables, was for some reason not able to support affinity (persistence) based on the document ID. After a few questions regarding the overall system and processing times it became clear that a context-aware application delivery controller could indeed solve this problem.
The solution is fairly simple, actually, and based on existing persistence-based load balancing solutions. It is a given that documents with the same ID are batch processed within minutes of each other. Thus, a persistence table with a life of an hour or even thirty minutes would provide the proper context in which documents could be processed and directed to the “right” web application server. This requires context; it requires that the load balancing solution, the application delivery controller, be aware of not only what it is processing but what it has processed already, and where it’s been sent.

Document ID Based Persistence Logic (a rough code sketch of this logic appears at the end of this post):

Extract the document ID from the document.
Check the persistence table for the document ID.
If the document ID already exists, route the document to the same server as the previous document(s) with that ID.
If the document ID does not exist, decide which server the document will be sent to for processing and create an entry in the persistence table.
Wash. Rinse. Repeat.

This problem is really about process level execution; about enforcing a business requirement on the technological implementation. In order to achieve compliance with the business process expectations it is necessary to be able to view each request in the context of that process rather than as an individual request that needs to be executed. Thus each touch point in the architecture that needs to manipulate, transform, or perform some task with or on or to the request needs to be able to take into consideration the process; it needs to be context-aware so that its decisions are made within the context of the entire process and not just the individual request.

Layer 7 switching, application load balancing, application delivery. Whatever you want to call it, it is the way in which load balancing becomes context-aware and becomes collaborative. It enables the business requirements to be not only taken into consideration but enforced while ensuring that CapEx and OpEx investments in additional systems are not left to sit idle; wasted. It improves capacity essentially by introducing process latency into the equation. By forcing the process to follow a particular path the application delivery controller assists in the technological implementation meeting the goals of the business. In other words, it aligns IT with the business. Sometimes the marketing fluff is more solid than it appears.

To Boldly Go Where No Production Application Has Gone Before
WILS: Network Load Balancing versus Application Load Balancing
Sessions and Cookies and Persistence, oh my!
Persistent and Persistence, What's the Difference?
If Load Balancers Are Dead Why Do We Keep Talking About Them?
A new era in application delivery
Infrastructure 2.0: The Diseconomy of Scale Virus
The Politics of Load Balancing
Business-Layer Load Balancing
Not all application requests are created equal
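Here is the promised sketch of the document-ID persistence logic, in Python rather than any particular product's configuration; the pool, the 30-minute table lifetime drawn from the discussion above, and the extract_document_id() helper are all assumptions made for illustration:

```python
import itertools
import time

SERVERS = ["proc01", "proc02"]
PERSISTENCE_TTL = 30 * 60                   # 30-minute table life, per the post
_round_robin = itertools.cycle(SERVERS)
_persistence_table = {}                     # document ID -> (server, expiry time)

def extract_document_id(document):
    return document["doc_id"]               # hypothetical document format

def route_document(document):
    doc_id = extract_document_id(document)  # 1. extract the document ID
    now = time.time()
    entry = _persistence_table.get(doc_id)  # 2. check the persistence table
    if entry and entry[1] > now:
        server = entry[0]                   # 3. known ID: same server as before
    else:
        server = next(_round_robin)         # 4. new ID: pick a server
    _persistence_table[doc_id] = (server, now + PERSISTENCE_TTL)
    return server

# Two documents with the same ID arriving back to back land on the same
# instance, so only one "case" is ever created.
print(route_document({"doc_id": "CASE-1138", "body": "first page"}))
print(route_document({"doc_id": "CASE-1138", "body": "second page"}))
```

On a BIG-IP this is roughly the sort of decision flow an iRule-based universal persistence profile would express; the sketch above simply spells it out in plain Python.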
Sessions and Cookies and Persistence, oh my!

At some point (you hope!) it becomes necessary to implement load-balancing for your applications. So you went out and got one, either from a hardware vendor or maybe downloaded a solution, and put it into place. Now you're ready to go, right? Maybe not just yet. Do your applications require persistence? Yes? You did remember to validate that your solution is capable of performing persistence-based load-balancing, didn't you?

If you're shaking your head wondering why this application thing is important to load balancing, read on. Persistence is one of the best examples of why it's so very important to understand how the applications you will be load-balancing work, because if an application needs persistence, you may break it without a persistence-capable load-balancing solution.

The Relationship between Sessions and Cookies

Sessions are not cookies, but they can (and do) work together to create the illusion of persistence in an otherwise stateless protocol. Sometimes persistence is referred to as "stickiness", or "sticky connections." That's because what persistence does is ensure that a client connects to the "real" server on which his/her current session is active.

When a user connects the first time to a site, a session is created on the server to which the user was directed. If the site is load balanced and the user is directed to a second server on the next request, a new session is created. Obviously this is not an optimal situation. What we need is some mechanism to ensure that a user is reconnected to the same server for the duration of a session. Persistence is the process of ensuring that a user is connected to the same server every time they make a request within the boundaries of a single session. Even though users could be persisted based on their IP address, this is rarely done due to the sharing of IP addresses. Persistence is most often implemented using a cookie containing the server session id because it is the most accurate method of determining where a user's session is currently stored.

If you're a web developer or administrator, you might think "that sounds a lot like server affinity". You'd be right, of course; server affinity and persistence are two different terms that mean the same thing.

Sessions are stored on the server, and are not reliant on cookies being enabled in the client's browser. Sessions are where web developers store bits of application-relevant data that they may wish to use across requests. Shopping carts are the most ubiquitous example of session data, but there are other uses for it, especially in complex web applications like CRM (customer relationship management) or SFA (sales force automation) applications. Cookies store bits of data on the client (the browser) and are passed to the server via the HTTP header Cookie.

Without persistence, users would be unknowingly creating sessions willy-nilly across multiple web servers in a load-balanced environment. That's a waste of resources, as sessions will remain in memory on the web server until they time out according to the web server's configuration. Additionally, a lack of persistence breaks web applications, because the data stored in the session on previous requests is no longer accessible on Server 2 because it's still sitting over on Server 1. This is why it's so important that a load-balancing or application delivery solution is capable of handling persistence-based load distribution.
If your load-balancing solution works based on an industry standard algorithm like round-robin, least-connections, or a weighted version of either, then you're likely to break those applications which require persistence, because the load balancing algorithms aren't taking session persistence needs into consideration.

Persisting Connections

The most common data used to persist connections is the SSL session id. Load balancing SSL connections without persistence is like crust without the bread. Yeah, it's that bad. Basically, load balancing SSL without persistence doesn't work. The second most common data used to persist connections is application or server session id, like JSESSIONID or PHPSESSIONID. These IDs are automatically generated by applications and web servers, and are generally passed to the client as a cookie on the first response, and then used by the load balancer to determine to which server it should direct subsequent requests.

An example of HTTP headers storing a JSESSIONID in a cookie:

Cookie: JSESSIONID=9597856473431
Cache-Control: no-cache
Host: 127.0.0.2:8080
Connection: Keep-Alive

Your chosen load-balancing or application delivery solution needs to be able to take application session data into consideration when making routing decisions. It must be able to look at HTTP headers and extract the data the web application stored to determine which server it should direct the request to, or you risk breaking your web applications and wasting resources on your servers.

Imbibing: Coffee

ADDITIONAL RESOURCES

Colin has a great entry in his 20LOL series that implements JSessionID based persistence. The iRule should be easily modified to support other types of application ID based persistence, as long as the value is stored in the HTTP headers somewhere. And Joe has a short article on enabling session persistence on BIG-IP. Wikipedia has a great discussion on Web server session management here.
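As a rough illustration of the cookie-based persistence described above, here is a sketch that pulls a JSESSIONID out of the request headers and pins the session to a pool member; the pool names, the round-robin fallback and the helper are invented for the example and are not any product's implementation:

```python
import itertools
from http.cookies import SimpleCookie

POOL = ["tomcat01", "tomcat02", "tomcat03"]
_fallback = itertools.cycle(POOL)      # used only when no session id is present yet
_session_map = {}                      # JSESSIONID value -> pool member

def pick_member(request_headers):
    cookies = SimpleCookie(request_headers.get("Cookie", ""))
    morsel = cookies.get("JSESSIONID")
    if morsel is None:
        return next(_fallback)         # no session yet: let the algorithm decide
    session_id = morsel.value
    if session_id not in _session_map: # first request carrying this session id
        _session_map[session_id] = next(_fallback)
    return _session_map[session_id]    # every later request sticks to that member

headers = {
    "Cookie": "JSESSIONID=9597856473431",
    "Cache-Control": "no-cache",
    "Host": "127.0.0.2:8080",
    "Connection": "Keep-Alive",
}
print(pick_member(headers))   # first request establishes the mapping
print(pick_member(headers))   # subsequent requests land on the same member
```

Colin's 20LOL iRule referenced above does the equivalent job on a BIG-IP; the sketch simply spells the decision out in plain Python.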
Scaling Stateful Network Devices

One of the premises of #SDN and #cloud scalability is that it's easy to simply replicate services - whether they be application or network focused - and distribute traffic across them to scale infinitely. In theory, this is absolutely the case. In theory, one can continue to add capacity to any layer of the data center and simply distribute requests across the layer to scale out as necessary. Where reality puts a big old roadblock in the way is when services are stateful. This is the case with many applications - much to the chagrin of cloud and REST purists, by the way - and it is also true with a significant number of network devices. Unfortunately, it is often these devices that proponents of network virtualization target without offering a clear path to addressing the challenges inherent in scaling stateful network devices.

SDN's claims of supporting load balancing, at least at layer 4, are almost certainly based on traditional, dumb layer 4 load balancing. We use the term "dumb" to simply mean that it doesn't care about the payload or the application or anything else other than its destination port and service, and does not participate in the flow. In most layer 4 load balancing scenarios for which this is the case, the only time the load balancer examines the traffic is when processing a new connection. The load balancer may buffer enough packets to determine some basic networking details - source and destination IP and TCP ports - and then it establishes a connection between the client and the server. From this point on, generally speaking, the load balancer assumes the role of a simple forwarder. Subsequent packets with the same pattern are simply forwarded on to the destination.

If you think about it, this is so close to the behavior described by an SDN-enabled network as to be virtually the same. In an SDN-enabled network, a new flow (session if you will, in the load balancing vernacular) would be directed to the SDN controller for processing. The SDN controller would determine its destination and inform the appropriate network components of that decision. Subsequent packets with the same pattern would be forwarded on to the destination according to the information in the FIB (Forwarding Information Base). As the load balancing service was scaled out, inevitably packets would be distributed to components lacking an entry in the FIB. Said components would query the controller, which would simply return the appropriate entry to the device. In such a way, simple layer 4 load balancing can be achieved via SDN*.

However, the behavior of the layer 4 load balancing service described is stateless. It does not actively manage the flow. Aside from the initial inspection and routing decision, the load balancing service is actually just a bump in the wire, forwarding packets much in the same manner as any other switch in the network. But what happens when the load balancing service is actively participating in the flow, i.e. it is stateful?

Scaling Stateful Devices

Stateful devices are those that actively manage a flow. That is, they may inspect, manipulate, or otherwise interact with flows in real-time. These devices are often used for security - both ingress and egress - as well as acceleration and optimization of application exchanges. They are also used for content transformation purposes, such as XML or SOA gateways, API management, and other application-focused scenarios. The most common use of stateful devices is persistent load balancing, aka sticky sessions, aka server affinity.
Persistent load balancing requires that the load balancing service (or device) maintain a mapping of user to application instance (or server, in traditional, non-virtualized environments). This mapping is unique to the device, and without it a wide variety of applications break when scaled - VDI being the most recent example of an application relying on persistence of sessions. In all these cases, however, one thing is true: the device providing the service is an active participant. The device maintains service-specific information regarding a variety of variables including the user, the device, the traffic, the application, the data. The entire context of the session is often maintained by one or more devices along the traffic chain.

What that means is that, like stateful, shared-nothing applications, it matters to which device a specific request is directed. While the same model used at layer 4 and below – in which a central controller (or really a bank of controllers) maintains this information and doles it out on demand – could certainly be applied, the result is that, depending on the distribution algorithm used, every stateful device would end up with the same flows installed. In the interim, the network is frantically applying optimization and acceleration policies to traffic whose benefits may be offset by the latency introduced by the need to query the controller for session state information, resulting in a net loss of performance experienced by the end-user.

And we're not even considering the impact of secured traffic on such a model, where any device needing to make decisions on such traffic must have access to the certificates and keys used to encrypt the traffic in order to decrypt, examine, and usually re-encrypt the traffic. Stateful network devices - application delivery controllers, intrusion prevention and detection systems, secure gateways, etc... - are often required to manage secured content, which means distributing and managing certificates and keys across what may be an ever-expanding set of network devices.

The reality is that stateful network devices are a necessary and integral component of not just networks but applications today. While modern network architectures like SDN bring much needed improvements to provisioning and management of large scale networks, their scaling models are based on the premise of stateless, relatively simple devices not actively participating in flows. For those devices that rely upon deep participation in the flow, this model introduces a variety of challenges that may not find a solution that fits well with SDN without compromising on performance, outside new protocols capable of carrying that state persistently throughout the lifetime of a session.

* This does not address the issue of resources required to maintain said forwarding tables in a given device, which given current capacity of commoditized switches supported for such a role seems unlikely to be realistically achieved.
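For readers less familiar with the stateless forwarding model the article contrasts against, here is a minimal sketch of it; the flow keys, the controller's placeholder decision and the class names are illustrative assumptions, not an SDN implementation:

```python
class Controller:
    """Central brain: decides once per flow, hands the entry to any switch that asks."""
    def __init__(self, servers):
        self.servers = servers
        self.flows = {}                              # flow key -> chosen destination

    def lookup(self, flow_key):
        if flow_key not in self.flows:
            # simplistic placeholder decision: spread new flows by arrival order
            self.flows[flow_key] = self.servers[len(self.flows) % len(self.servers)]
        return self.flows[flow_key]

class Switch:
    """Dumb forwarder: consults its local table, asks the controller only on a miss."""
    def __init__(self, controller):
        self.controller = controller
        self.fib = {}                                # local forwarding entries

    def forward(self, src_ip, src_port, dst_ip, dst_port):
        key = (src_ip, src_port, dst_ip, dst_port)
        if key not in self.fib:                      # table miss: ask once
            self.fib[key] = self.controller.lookup(key)
        return self.fib[key]                         # then forward without thinking

controller = Controller(["10.0.0.11", "10.0.0.12"])
switch_a, switch_b = Switch(controller), Switch(controller)

flow = ("203.0.113.9", 51000, "192.0.2.80", 80)
print(switch_a.forward(*flow))   # miss on switch A: controller decides
print(switch_b.forward(*flow))   # miss on switch B: same answer, no new decision
```

Scaling this out is easy precisely because the switches hold nothing a peer cannot fetch from the controller; the article's point is that a device actively inspecting, persisting, or re-encrypting a flow carries far more than a four-tuple-to-destination entry, and that state does not move so cheaply.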
Web 2.0 Killed the Middleware Star

Pondering the impact of cloud and Web 2.0 on traditional middleware messaging-based architectures and PaaS. It started out innocently enough with a simple question, “What exactly *is* the model for PaaS services scalability? If based on HTTP/REST API integration, fairly easy. If native middleware… input?” You’ll forgive the odd phrasing – Twitter’s limitations sometimes make conversations of this nature … interesting. The discussion culminated in what appeared to be the sentiment that middleware was mostly obsolete with respect to PaaS.

THE OLD WAY

Very briefly for those of you who are more infrastructure / network minded than application architecture fluent, let’s review the traditional middleware-based application architecture. Generally speaking, middleware – a.k.a. JMS, MQ Series and most recently, ESB – is leveraged as a means to enable a publish-subscribe model of sharing data. Basically, it’s an integration pattern, but no one really likes to admit that because of the connotations associated with the evil word “integration” in the enterprise. But that’s what it’s used for – to integrate applications by sharing data. It’s more efficient than a point-to-point integration model, especially when one application might need to share data with two, three or more other applications. One application puts data into a queue and other applications pull it out. If the target of the “messages” is multiple applications or people, then the queue keeps the message for a specified period of time; otherwise, it deletes or archives (or both) the message after the intended recipient receives it.

That pattern is probably very familiar to even those who aren’t entrenched in enterprise application architecture because it’s similar to most social networking software in action today. One person writes a status update, the message, and it’s distributed to all the applications and users who have subscribed (followed, put in a circle, friended, etc… ). The difference between Web-based social networking and traditional enterprise applications is two-fold:

First – web-based applications were not, until the advent of Web 2.0 and specifically AJAX, well-suited to “polling for” or “subscribing to” messages (updates, statuses, etc…), thus the use of traditional pub-sub architectures for web applications never much gained traction.

Second – middleware has never scaled well using traditional scalability models (up or out). Web-based applications generally require higher capacity and transaction rates than traditional applications taking advantage of middleware, making middleware’s inability to scale problematic. It is unsuited to use in social networking and other high-volume data sharing systems, where rapidity of response is vital to success.

Moreover, as James Urquhart noted earlier in the conversation, although cloud computing and virtualization appear capable of addressing the scalability issue by scaling middleware at the VM layer – which is certainly viable and makes the solution scalable in terms of volume – this introduces issues with consistency, a.k.a. CAP, because persistence is not addressed and thus the consistency of messages across queues shared by users is always in question. Basically, we end up – as pointed out by James Saull – with a model that kicks the problem to another tier: the scalable persistence service.
Generally that means a database-based solution, even if we use the power of virtualization and cloud computing to address the innate challenges associated with scaling messaging middleware. Mind you, that doesn’t mean an RDBMS is involved, but a data store of some kind, and all data stores introduce similar architectural and technological issues with consistency, reliability and scalability.

THE NEW WAY

Now this is not meant to say the concept of queuing, of pub-sub, is absent in web applications and social networking. Quite the contrary, in fact. The concept is seen in just about any social networking site today that bases itself on interaction (integration) of people. What’s absent is the traditional middleware as a means to manage the messages across those people (and applications). See, scaling middleware ran into the same issues as stateful applications – they required persistence or a shared-nothing architecture to ensure proper behavior. The problem as you added middleware servers became the same as other persistence-based issues seen in web applications, digital shopping carts and even today’s VDI implementations. How do you ensure that, having subscribed to a particular topic, you actually manage to get the messages when the load balancing solution arbitrarily directs you to the next available server? You can’t. Hence the use of persistent stores to enable scalability of middleware.

What you end up with is essentially 4 tiers – web, application, middleware and database. You might at this point begin to recognize that one of these tiers is redundant and, given the web-based constraints above, unnecessary. Three guesses which one it is, and the first two do not count. Right. The middleware tier. Web 2.0 applications don’t generally use a middleware tier to facilitate messaging across users or applications. They use APIs and web-based database access methods to go directly to the source. Same concept, more scalable implementation, less complexity. As James put it later in the conversation, “the “proven” architecture seems to be Web2, which has its limitations.”

BRINGING IT BACK to PaaS

So how does this relate to PaaS? Well, PaaS is Platform as a Service, which is really a nebulous way of describing developer services delivered in a cloud computing environment. Data, messaging, session management, libraries; the entire application development ecosystem. One of those components is messaging and, so it would seem, traditional middleware (as a service, of course). But the scalability issues with middleware really haven’t been solved and the persistence issues remain. Adding pressure is the web development paradigm in which middleware has traditionally been excluded. Most younger developers have not had the experience (and they should count themselves lucky in this regard) of dealing with queuing systems and traditional pub-sub implementations. They’re a three-tier generation of developers who implement the concept of messaging by leveraging database connectivity directly and, most recently, polling via AJAX and APIs. Queueing may be involved, but if it is, it’s implemented in conjunction with the database – the persistent store – and clients access it via the application tier directly, not through middleware. The ease with which web and application tiers are scaled in a cloud computing environment, moreover, meets the higher concurrent user and transaction volume requirements (not to mention performance) associated with highly integrated web applications today.
Scaling middleware services at the virtualization layer, as noted above, is possible, but reintroduces the necessity of a persistent store. And if we’re going to use a persistent store, why add a layer of complexity (and cost) to the architecture when Web 2.0 has shown us it is not only viable but inherently more scalable to go directly to that source?

At the end of the day, it certainly appears that between cloud computing models and Web 2.0 having been forced to solve the shared messaging concept without middleware – and having done so successfully – middleware as a service is obsolete. Not dead, mind you, as some will find a use case in which it is a vital component, but those will be few and far between. The scalability and associated persistence issues have been solved by some providers – take RabbitMQ for example – but that ignores the underlying reality that Web 2.0 forced a solution that did not require middleware and that nearly all web-based applications eschew middleware as the mechanism for implementing pub-sub and other similar architectural patterns. We’ve gotten along just fine on the web without it, so why reintroduce what is simply another layer of complexity and cost in the cloud unless there’s a good reason? Viva la evolution.

The Inevitable Eventual Consistency of Cloud Computing
Let’s Face It: PaaS is Just SOA for Platforms Without the Baggage
Cloud-Tiered Architectural Models are Bad Except When They Aren’t
The Database Tier is Not Elastic
The New Distribution of The 3-Tiered Architecture Changes Everything
The Great Client-Server Architecture Myth
Infrastructure Scalability Pattern: Sharding Sessions
Infrastructure Scalability Pattern: Partition by Function or Type
Applying Scalability Patterns to Infrastructure Architecture
Sessions, Sessions Everywhere

Amazon Makes the Cloud Sticky
Stateless applications may be the long-term answer to scalability of applications in the cloud, but until then, we need a solution like sticky sessions (persistence). Amazon recently introduced “stickiness” to its ELB (Elastic Load Balancing) offering. I’ve written a bit before about “stickiness”, a.k.a. what we’ve called persistence for, oh, nearly ten years now, so I won’t reiterate other than to say, “it’s about time.” A description of why sticky sessions is necessary was offered in the AWS blog announcing the new feature:

Up until now each Load balancer had the freedom to forward each incoming HTTP or TCP request to any of the EC2 instances under its purview. This resulted in a reasonably even load on each instance, but it also meant that each instance would have to retrieve, manipulate, and store session data for each request without any possible benefit from locality of reference.
-- New Elastic Load Balancing Feature: Sticky Sessions

What the author is really trying to say is that without “sticky sessions” ELB breaks applications because it does not honor state. Remember that most web applications today rely upon state (session) to store quite a bit of application and user specific data that’s necessary for the application to behave properly. When a load balancer distributes requests across instances without consideration for where that state (session) is stored, the application behavior can become erratic and unpredictable. Hence the need for “stickiness”.

Five questions you need to ask about load balancing and the cloud
Whether you are aware of it or not, if you’re deploying applications in the cloud or building out your own “enterprise class” cloud, you’re going to be using load balancing. Horizontal scaling of applications is a fairly well understood process that involves (old skool) server virtualization of the network kind: making many servers (instances) look like one to the outside world. When you start adding instances to increase capacity for your application, load balancing necessarily gets involved as it’s the way in which horizontal scalability is implemented today. The fact that you may have already deployed an application in the cloud and scaled it up without recognizing this basic fact may lead you to believe you don’t need to care about load balancing options. But nothing could be further from the truth. If you haven’t asked already, you should. But more than that you need to understand the importance of load balancing and its implications to the application. That’s even more true if you’re considering an enterprise cloud, because it will most assuredly be your problem in the long run. Do not be fooled; the options available for load balancing and assuring availability of your application in the cloud will affect your application – if not right now, then later. So let’s start with the five most important things you need to ask about load balancing and cloud environments regardless of where they may reside.

#5 DIRECT SERVER RETURN

If you’re going to be serving up video or audio – real-time streaming media – you should definitely be interested in whether or not the load balancing solution is capable of supporting direct server return (DSR). While there are pros and cons to using DSR, for video and audio content it’s nearly an untouchable axiom of application delivery that you should enable this capability. DSR basically allows the server to return content directly to the client without being processed by any intermediary (other than routers/switches/etc… which of course need to process individual packets). In most load balancing situations the responses from the server are returned via the same path they took to get to the server, notably through the load balancer. DSR allows responses to return outside the path of the load balancer or, if still returning through it, to do so unmolested. In the latter scenario the load balancer basically acts as a simple packet forwarder and does no additional processing on the packets. The advantage to DSR is that it removes any additional latency imposed by additional processing by intermediaries. Because real-time streaming media is very sensitive to the effects of latency (jitter), DSR is often suggested as a best practice when load balancing servers responsible for serving such content.

Question: Is it supported?

#4 HEALTH CHECKING

One of the ways in which load balancers/application delivery controllers make decisions regarding which server should handle which request is to understand the current status of the application. It’s part of being context-aware, and it provides information about the application that is invaluable not just to the load balancing decision but to the overall availability of the application. Health checking allows a load balancing solution to determine whether a server/instance is “available” based on a variety of factors. At the simplest level an ICMP ping can be used to determine whether the server is available. But that tells it nothing of the state of the application.
A three-way TCP handshake is the next “step” up the ladder, and this will tell the load balancing solution whether an application is capable of accepting connections, but still tells it nothing of the state of the application. A simple HTTP GET takes it one step further, but what’s really necessary is the ability of the load balancing solution to retrieve actual data and ensure it is valid in order to consider an application “available”. As the availability of an application may be (and should be if it is not) one way to determine whether new instances are necessary or not, the ability to determine whether the actual application is available and responding appropriately is important in keeping costs down in a cloud environment, lest instances be launched for no reason or – more dangerously – instances are not launched when necessary due to an outage or failure. In an external cloud environment it is important to understand how the infrastructure determines when an application is “available” or “not”, based on such monitoring, as the subtle differences in what is actually being monitored/tested can impact application availability. (A rough code sketch of this kind of escalating health check appears at the end of this post.)

Question: What determines when an application (instance) is available and responding as expected?

#3 PERSISTENCE

Persistence is one of the most important facets of load balancing that every application developer, architect, and network professional needs to understand. Nearly every application today makes heavy use of application sessions to maintain state, but not every application utilizes a shared database model for its session management. If you’re using standard application or web server session features to manage state in your application, you will need to understand whether the load balancing solutions available support persistence and how that persistence is implemented. Persistence basically ensures that once a user has been “assigned” a server/instance, all subsequent requests go to that same server/instance in order to preserve access to the application session. Persistence can be based on just about anything depending on the load balancing solution available, but most commonly takes the form of either source IP address or cookie-based persistence. In the case of the former there’s very little for you to do, though you should be somewhat concerned over the use of such a rudimentary method of enabling persistence as it is quite possible – probable, in fact – that many users will be sharing the same source IP address based on NAT and masquerading at the edge of corporate and shared networks. If the persistence is cookie-based then you’ll need to understand whether you have the ability to determine what data is used to enable that persistence. For example, many applications use PHPSESSIONID or ASPSESSIONID as it is routine for those environments to ensure that these values are inserted into the HTTP header and are available for such use. But if you can’t configure the option yourself, you’ll need to understand what values are used for persistence and to ensure your application can support that value in order to match up users with their application state.

Question: How is persistence implemented?

#2 QUIESCING (BLEEDING) CONNECTIONS

Part of the allure of a cloud architecture is the ability to provision resources on-demand. With that comes the assumption that you can also de-provision resources when they are no longer needed.
One would further hope this process is automated and based on a policy configurable by the user, but we are still in the early days of cloud so that may be just a goal at this point. Load balancers and clustering solutions can usually be told to begin quiescing (bleeding off) connections. This means that they stop distributing requests to the specified servers/instances but allow existing users to continue using the application until they are finished. It basically takes a server/instance out of the “rotation” but keeps it online until all users have finished and the server/instance is no longer needed. At that point, either through a manual or automated process, the server/instance can be de-provisioned or taken offline. This is often used in traditional data centers to enable maintenance such as patching/upgrades to occur without interrupting application availability. By taking one server/instance at a time offline the other servers/instances remain in service, serving up requests. In an on-demand environment this is of course used to keep costs controlled by only keeping the instances necessary for current capacity online. What you need to understand is whether that process is manual, i.e. you need to push a button to begin the process of bleeding off connections, or automated. If the latter, then you’ll need to ask about what variables you can use to create a policy to trigger the process. Variables might be number of total connections, requests, users, or bandwidth. It could also, if the load balancing solution is “smart enough”, include application performance (response time) or even time of day variables.

Question: How do connections quiesce (bleed) off – manually or automatically based on thresholds?

#1 FAILOVER

We talk a lot about the cloud as a means to scale applications, but we don’t very often mention availability. Availability usually means there needs to be in place some sort of “failover” mechanism, in case an application or server fails. Applications crash, hardware fails, these things happen. What should not happen, however, is that the application becomes unavailable because of these types of inevitable problems. If one instance suddenly becomes unavailable, what happens? That’s the question you need to ask. If there is more than one instance running at that time, then any load balancing solution worth its salt will direct subsequent requests to the remaining available instances. But if there are no other instances running, what happens? If the provisioning process is manual, you may need to push a button and wait for the new instance to come online. If the provisioning process is not manual, then you need to understand how long it will take for the automated system to bring a new instance online, and perhaps ask about the ability to serve up customized “apology” pages that reassure visitors that the site will return shortly.

Question: What kind of failover options are available (if any)?

THERE ARE NO STUPID QUESTIONS

Folks seem to talk and write as if cloud computing relieves IT staff (customers) of the need to understand the infrastructure and architecture of the environments in which applications will be deployed. Because there is an increasingly symbiotic relationship between applications and their infrastructure – both network and application network – this fallacy needs to be exposed for the falsehood it is.
THERE ARE NO STUPID QUESTIONS

Folks seem to talk and write as if cloud computing relieves IT staff (customers) of the need to understand the infrastructure and architecture of the environments in which applications will be deployed. Because there is an increasingly symbiotic relationship between applications and their infrastructure, both network and application network, this fallacy needs to be exposed for the falsehood it is. It is more important today, with cloud computing, than it ever has been for all of IT (application, network, and security) to understand the infrastructure and how it works together to deliver applications. That means there are no stupid questions when it comes to cloud computing infrastructure. There are certainly other questions you can, and should, ask a potential provider or vendor in order to make the right decision about where to deploy your applications. When it comes down to it, it’s your application, and your customers, partners, and users are not going to be calling, e-mailing, or tweeting the cloud provider; they’re going to be gunning for you if things don’t work as expected.

Getting the answers to these five questions will provide a better understanding of how your application will handle unexpected failures, allow you to plan appropriately for maintenance, upgrades, and patches, and help you formulate the proper policies for dealing with the nuances of a load-balanced application environment. Don’t just ask which product or vendor is in use and hope that answers your questions. Sure, your cloud provider may be using F5 or another advanced application delivery platform, but that doesn’t mean they’re utilizing the product in a way that offers the features you need to ensure your application is always available. So dig deeper and ask questions. It’s your application, it’s your responsibility, no matter where it ends up running.

And the Killer App for Private Cloud Computing Is…
Not All Virtual Servers are Created Equal
Infrastructure 2.0: The Feedback Loop Must Include Applications
Cloud Computing: Is your cloud sticky? It should be
The Disadvantages of DSR (Direct Server Return)
Cloud Computing: Vertical Scalability is Still Your Problem
Server Virtualization versus Server Virtualization

Cloud Computing: Is your cloud sticky? It should be.
Load balancing an application should, by now, be a fairly routine scaling exercise. But too often, when an application is moved into a load balanced architecture, it breaks. The reason? Application sessions are often specific to an application server instance. The solution? Persistence, also known as sticky connections.

The use of sessions on application servers to add state to web (HTTP) applications is a common practice. In fact, it's one of the greatest "hacks" in the history of the web, and an excellent solution to the problem of using a stateless application protocol to build applications for which state is important. But sessions are peculiar to the application server instance on which they were created, and in general are not shared across multiple instances unless you've specifically architected the application infrastructure to do so. Inserting applications into a load balanced environment often ignores this requirement, as load balancing decisions are frequently made based on server and application load and not on application-specific parameters.

THE PROBLEM

When a client connects for the first time, it is directed to Server A and a session is created. Data specific to the application that needs to persist for the life of that session, like shopping carts or search parameters, is stored there. When the client makes subsequent requests, however, the load balancer doesn't automatically understand this and may direct the client to Server B, effectively losing the data in the session and hence breaking the application.

This affects cloud computing initiatives because an integral part of cloud computing is load balancing to provide horizontal scalability, and an integral part of applications is session management. Deploying an application that works properly in a test environment into the cloud may break that application, because the load balancing solution utilized by the cloud computing provider isn't aware of the importance of that session to the application and, too often, the session variables are unique to the application and the provider's static network infrastructure isn't capable of adapting to those unique needs.

THE SOLUTION

When choosing a cloud provider, it is imperative that the load balancing solution used by the provider support sticky connections (persistence) so that developers don't have to rewrite or re-architect applications for cloud deployment. "Sticky connections", or persistence in the networking vernacular, ensure that client requests are load balanced to the appropriate server on which their application session resides. This maintains the availability of the session and means applications don't "break" when deployed into the cloud or inserted into a load balanced environment.

Sticky (persistent) connections are often implemented in load balancers (application delivery controllers) by using application server session IDs to help decide where requests should be sent. This results in consistent behavior and a well-behaved application. The most common method of implementing sticky (persistent) connections is cookie persistence, which inserts a cookie into the response that is later used to properly direct requests. The cookie often contains the JSESSIONID, PHPSESSIONID, or ASPSESSIONID, but can actually contain any data that allows the load balancer (application delivery controller) to identify the right application server for any given request.
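As a rough illustration of the cookie-insert method just described, the fragment below shows a load balancer setting its own persistence cookie on a client's first response and honoring it on every request after that. It is a sketch under assumptions: the cookie name STICKY_SRV and the server addresses are invented, and real application delivery controllers implement this with built-in persistence profiles (and typically encode the cookie value rather than exposing a server address).

    import random

    SERVERS = ["10.0.0.11:8080", "10.0.0.12:8080"]
    COOKIE_NAME = "STICKY_SRV"   # hypothetical persistence cookie

    def pick_server(request_cookies):
        """Return (server, cookies_to_set_on_the_response)."""
        server = request_cookies.get(COOKIE_NAME)
        if server in SERVERS:
            # The client presented a valid persistence cookie: stay put.
            return server, {}
        # First request, or the cookie points at a server that no longer
        # exists: load balance, then insert the cookie so later requests stick.
        server = random.choice(SERVERS)
        return server, {COOKIE_NAME: server}

    # First request: no cookie, so a server is chosen and the cookie is set.
    server, cookies_to_set = pick_server({})
    # Subsequent requests: the browser returns the cookie, routing is stable.
    assert pick_server({COOKIE_NAME: server})[0] == server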
ACTION ITEMS

If you are deploying applications into the cloud, or planning to, and those applications are session-sensitive, it is important to determine before you deploy whether your provider's solution supports sticky (persistent) connections, and how that persistence is implemented. Some solutions are not very flexible and will only provide persistence based on a limited array of variables, in which case you will need to instrument your application to supply the proper variables. Other solutions provide network-side scripting capabilities that allow the provider, and you, to persist on any variable you deem unique enough to identify the proper application server instance.

Note that server FQDN (fully qualified domain name) or IP address will likely not be adequate in a cloud computing environment, as these values are not guaranteed to be static. The unique variable should be something application-specific, as application data is likely the only thing you will have control over in the cloud and the only thing guaranteed to be consistent across instances as they are brought online and offline in response to changes in demand.

If you're currently examining the possibility of deploying applications in the cloud, make sure the cloud you choose is sticky, so you don't end up having to re-architect your application to get it to work properly.
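To show what persisting on an application-specific variable could look like, here is one more illustrative Python fragment that keys persistence on a hypothetical X-Tenant-ID request header rather than on a server name or IP address. On an F5 platform this kind of logic is the sort of thing network-side scripting (iRules) is used for; the header name and instance names here are assumptions, not a recommendation.

    import random

    SERVERS = ["instance-a", "instance-b", "instance-c"]
    sticky = {}   # application-specific value -> chosen instance

    def select_instance(headers):
        # Persist on a value the application controls and sends on every
        # request (a tenant ID, account ID, cart ID, and so on). Deliberately
        # NOT keyed on server FQDN or IP, which may change as instances are
        # provisioned and de-provisioned.
        key = headers.get("X-Tenant-ID")
        if key is None:
            # Nothing application-specific to key on: just load balance.
            return random.choice(SERVERS)
        if key not in sticky:
            sticky[key] = random.choice(SERVERS)
        return sticky[key]

    # Every request carrying the same tenant ID reaches the same instance.
    assert select_instance({"X-Tenant-ID": "acme"}) == select_instance({"X-Tenant-ID": "acme"})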