replication
Sessions, Sessions Everywhere
If you’re replicating session state across application servers you probably need to rethink your strategy. There are other options – more efficient options – than wasting RAM and, ultimately, money.

Although the discussion of Oracle’s “cloud in a box” announcement at OpenWorld dominated much of the tweet-stream this week, there were other discussions going on that proved to be not only interesting but a good reminder of how cloud computing has brought to the fore the importance of architecture. Foremost in my mind was what started as a lamentation about the fact that Amazon EC2 does not support multicasting and evolved into a discussion of why that would cause grief for those deploying applications in the environment. Remember that multicast is essentially spraying the same data to a group of endpoints and is usually leveraged for streaming media topologies (a socket-level sketch of what this looks like for session data appears a bit further down):

In computer networking, multicast is the delivery of a message or information to a group of destination computers simultaneously in a single transmission from the source creating copies automatically in other network elements, such as routers, only when the topology of the network requires it.
-- Wikipedia, multicast

As it turns out, a primary reason behind the need for multicasting in the application architecture revolves around the mirroring of session state across a pool of application servers. Yeah, you heard that right – mirroring session state across a pool of application servers. The first question has to be: why? What is it about an application that requires this level of duplication?

MULTICASTING for SESSIONS

There are three reasons why someone would want to use multicasting to mirror session state across a pool of application servers. There may be additional reasons that aren’t as common, and if so, feel free to share.

1. The application relies on session state and, when deployed in a load balanced environment, broke because the tight coupling between user and session state was not respected by the load balancer. This is a common problem when moving from dev/qa to production and is generally caused by using a load balancing algorithm without enabling persistence, a.k.a. sticky sessions.
2. The application requires high availability that necessitates architecting a stateful-failover architecture. By mirroring sessions to all application servers, if one fails (or is decommissioned in an elastic environment) another can easily re-establish the coupling between the user and their session. This is not peculiar to application architecture – load balancers and application delivery controllers mirror their own “session” state across redundant pairs to achieve a stateful failover architecture as well.
3. Some applications, particularly those that are collaborative in nature (think white-boarding and online conferences), “spray” data across a number of sessions in order to enable the real-time sharing aspect of the application. There are other architectural choices that can achieve this functionality, but there are tradeoffs to all of them, and in this case it is simply one of several options.

THE COST of REPLICATING SESSIONS

With the exception of addressing the needs of collaborative applications (and even then there are better options from an architectural point of view), there are much more efficient ways to handle the tight coupling of user and session state in an elastic or scaled-out environment.
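To ground that cost discussion, here is roughly what multicast session mirroring looks like at the socket level. This is a minimal, hypothetical Python sketch – real application servers do this through clustering frameworks rather than raw sockets, and the group address, port, and session payload are illustrative assumptions – but it shows the essential property: one send, and every member of the group now has to hold a copy.

```python
import json
import socket

MCAST_GRP = "224.1.1.1"   # hypothetical multicast group address
MCAST_PORT = 5007

# --- publisher (the app server that currently owns the session) ---
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)
session = {"session_id": "abc123", "user": "jdoe", "cart": ["sku-1", "sku-2"]}
# One transmission reaches every subscribed app server...
send_sock.sendto(json.dumps(session).encode(), (MCAST_GRP, MCAST_PORT))

# --- subscriber (every other app server in the pool, in its own process) ---
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv_sock.bind(("", MCAST_PORT))
mreq = socket.inet_aton(MCAST_GRP) + socket.inet_aton("0.0.0.0")
recv_sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
# ...and every one of them must now burn RAM keeping its own copy of the session.
data, _ = recv_sock.recvfrom(65535)
mirrored_session = json.loads(data)
```

On EC2 as discussed in this post, that group send simply has nowhere to go, which is what triggered the lamentation in the first place.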
The arguments against multicasting session state are primarily around resource consumption, which is particularly important in a cloud computing environment. Consider that the typical session state is 3-200 KB in size (Session State: Beyond Soft State). Remember that if you’re mirroring every session across an entire cluster (pool) of application servers, each server must use memory to store that session. Each mirrored session, then, is going to consume resources on every application server.

Every application server has, of course, a limited amount of memory it can utilize. It needs that memory for more than just storing session state – it must also store connection tables, its own configuration data, and of course it needs memory in which to execute application logic. If you consume a lot of the available memory storing the session state from every other application server, you are necessarily reducing the amount of memory available to perform other important tasks. This reduces the capacity of the server in terms of users and connections, it reduces the speed with which it can execute application logic (which translates into slower response times for users), and it operates on a diminishing returns principle. The more application servers you need to scale – and you’ll need more, more frequently, using this technique – the less efficient each added application server becomes, because a good portion of its memory is required simply to maintain the session state of all the other servers in the pool.

It is exceedingly inefficient and, when leveraging a public cloud computing environment, more expensive. It’s a very good example of the diseconomy of scale associated with traditional architectures – it results in a “throw more ‘hardware’ at the problem, faster” approach to scalability.

BETTER ARCHITECTURAL SOLUTIONS

There are better architectural solutions to maintaining session state for every user.

SHARED DATABASE

Storing session state in a shared database is a much more efficient means of mirroring session state and allows for the same guarantees of consistency when experiencing a failure. If session state is stored in a database, then regardless of which application server instance a user is directed to, that application server has access to its session state. The interaction between the user and application becomes (a minimal code sketch of this pattern follows below):

1. User sends request
2. Clustering/load balancing solution routes to application server
3. Application server receives request, looks up session in database
4. Application server processes request, creates response
5. Application server stores updated session in database
6. Application server returns response

If a single database is problematic (because it is a single point of failure), then multicasting or other replication techniques can be used to implement a dual-database architecture. This is somewhat inefficient, but far less so than doing the same at the application server layer.

PERSISTENCE-BASED LOAD BALANCING

It is often the case that the replication of session state is implemented in response to wonky application behavior that occurs only when the application is deployed in a scalable environment, i.e., when a load balancing solution is introduced into the architecture. This is almost always because the application requires tight coupling between user and session and the load balancing is incorrectly configured to support this requirement.
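Before digging into how persistence is configured, here is a minimal sketch of the shared-database pattern listed above. It uses Python’s built-in sqlite3 purely as a stand-in for whatever shared, networked database you would actually use, and the table name, cookie handling, and cart example are illustrative assumptions; the point is only that any instance in the pool can load and save a session by its ID.

```python
import json
import sqlite3
import uuid

class SharedSessionStore:
    """Any app server instance can load/save a session by ID, so the load
    balancer is free to route the user to any instance in the pool."""

    def __init__(self, dsn="sessions.db"):   # sqlite stands in for a shared RDBMS
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)")

    def load(self, session_id):
        row = self.conn.execute(
            "SELECT state FROM sessions WHERE id = ?", (session_id,)).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id, state):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
            (session_id, json.dumps(state)))
        self.conn.commit()

# Per-request flow, matching the numbered steps above:
store = SharedSessionStore()
sid = str(uuid.uuid4())                              # normally read from a request cookie
state = store.load(sid)                              # 3. look up session in database
state["cart"] = state.get("cart", []) + ["sku-42"]   # 4. process request
store.save(sid, state)                               # 5. store updated session back
```

The tradeoff, as noted above, is that the database becomes the shared dependency – which is why it usually gets its own replication story.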
Almost every load balancing solution – hardware, software, virtual network appliance, infrastructure service – is capable of supporting persistence, a.k.a. sticky sessions. This solution requires, however, that the load balancing solution of choice be configured to support the persistence.

Persistence (also sometimes referred to as “server affinity” when implemented by a clustering solution) can be configured in a number of ways. The most common configuration is to leverage the automated session IDs generated by application servers, e.g. PHPSESSID, ASPSESSIONID. These IDs are contained in the HTTP headers and are, as a matter of fact, how the application server “finds” the appropriate session for any given user’s request. The load balancer intercepts every request (it does anyway) and performs the same type of lookup on its own session table (which is much, much higher capacity than an application server’s and leverages the same high-performance lookups used to store connection and network session tables), then routes the user to the appropriate application server based on the session ID. The interaction between the user and application becomes (a toy sketch of this lookup follows below):

1. User sends request
2. Clustering/load balancing solution finds, if it exists, the session-to-app-server mapping. If it does not, it chooses the application server based on the load balancing algorithm and configured parameters
3. Application server receives request
4. Application server processes request, creates response
5. Application server returns response
6. Clustering/load balancing solution creates the session-to-app-server mapping if it did not already exist

Persistence can generally be based on any data in the HTTP header or payload, but using the automatically generated session IDs tends to be the most common implementation.

YOUR INFRASTRUCTURE, GIVE IT TO ME

Now, there may be cases in which the multicasting architecture is the right one. It is impossible to say it’s never the right solution, because there are always applications and specific scenarios in which an architecture that may not be a good idea in general is, in fact, the right solution. In most situations, however, it is likely not the right solution, and it has more than likely been implemented as a workaround in response to problems with application behavior when moving through a staged development environment.

This is one of the best reasons why the use of a virtual edition of your production load balancing solution should be encouraged in development environments. The earlier a holistic strategy for application design and architecture can be employed, the fewer complications will be experienced when the application moves into the production environment. Leveraging a virtual version of your load balancing solution during the early stages of the development lifecycle can also enable developers to become familiar with production-level infrastructure services so that they can employ a holistic, architectural approach to solving application issues.

See, it’s not always because developers don’t have the know-how; it’s because they don’t have access to the tools during development and therefore can’t architect a complete solution. I recall a developer’s plaintive query after a keynote at [the now defunct] SD West conference a few years ago that clearly indicated a reluctance to even ask the network team for access to their load balancing solution – to learn how to leverage its services in application development – because he knew he would likely be denied.
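Returning to the persistence mechanism described above, here is a toy, hypothetical sketch of the mapping a load balancer maintains: session ID in, back-end server out, with the load balancing algorithm consulted only when no mapping exists yet. The hash-based fallback and the server names are assumptions for illustration; a real ADC would use its configured algorithm (least connections, round robin, and so on) and a far faster internal table.

```python
import hashlib

class PersistenceTable:
    """Toy sticky-sessions sketch: map a session ID (e.g. the PHPSESSID or
    ASPSESSIONID cookie value) to the app server that already holds it."""

    def __init__(self, pool):
        self.pool = pool      # list of back-end app servers
        self.table = {}       # session id -> chosen server

    def pick_server(self, session_id):
        if session_id in self.table:
            return self.table[session_id]     # existing mapping: stay sticky
        # No mapping yet: fall back to a "load balancing algorithm"
        # (a simple hash here, purely as a stand-in).
        digest = int(hashlib.sha1(session_id.encode()).hexdigest(), 16)
        server = self.pool[digest % len(self.pool)]
        self.table[session_id] = server       # remember it for the next request
        return server

lb = PersistenceTable(["app1:8080", "app2:8080", "app3:8080"])
assert lb.pick_server("abc123") == lb.pick_server("abc123")   # same ID, same server
```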
Network and application delivery network pros should encourage the use of, and tinkering with, virtual versions of application delivery controllers/load balancers in the application development environment as much as possible if they want to keep infrastructure- and application-architecture-related issues from cropping up during production deployment. A greater understanding of application-infrastructure interaction will enable more efficient, higher performing applications in general and reduce the operational expenses associated with deploying applications that use inefficient methods – such as replication of session state – to address application architectural constraints.

Related blogs & articles:
- Applying Scalability Patterns to Infrastructure Architecture
- Scalability Only One Half the Reliability Equation
- Service Virtualization Helps Localize Impact of Elastic Scalability
- Web 2.0: Integration, APIs, and Scalability
- Automating scalability and high availability services
- To Take Advantage of Cloud Computing You Must Unlearn, Luke.
- Scalability with multiple networks for Virtual Servers ...
- Cloud Lets You Throw More Hardware at the Problem Faster
- And That, Young Cloudwalker, Is Why You Fail

Databases in the Cloud Revisited
A few of us were talking on Facebook about high speed rail (HSR) and where/when it makes sense the other day, and I finally said that it almost never does. Trains lost out to automobiles precisely because they are rigid and inflexible, while population densities and travel requirements are highly flexible. That hasn’t changed since the early 1900s, and isn’t likely to in the future, so we should be looking at different technologies to answer the problems that HSR tries to address. And since everything in my universe is inspiration for either blogging or gaming, this led me to reconsider the state of cloud and the state of cloud databases in light of synergistic technologies (did I just use “synergistic technologies” in a blog? Arrrggghhh…).

There are several reasons why your organization might be looking to move out of a physical datacenter, or to have a backup datacenter that is completely virtual. Think of the disaster in Japan or Hurricane Katrina. In both cases, having even the mission critical portions of your datacenter replicated to the cloud would keep your organization online while you recovered from all of the other very real issues such a disaster creates. In other cases, if you are a global organization, the cost of maintaining your own global infrastructure might well be more than utilizing a global cloud provider for many services… Though I’ve not checked, if I were CIO of a global organization today, I would be looking into it pretty closely, particularly since this option should continue to get more appealing as technology continues to catch up with hype.

Today though, I’m going to revisit databases, because like trains, they are in one place, and are rigid. If you’ve ever played with database Continuous Data Protection or near-real-time replication, you know this particular technology area has issues that are only now starting to see technological resolution. Over the last year, I have talked about cloud and remote databases a few times, talking about early options for cloud databases, and mentioning Oracle GoldenGate – or praising GoldenGate is probably more accurate.

Going to the west in the US? HSR is not an option.

The thing is that the options get a lot more interesting if you have GoldenGate available. There are a ton of tools, both integral to database systems and third-party, that allow you to encrypt data at rest these days, and while it is not the most efficient access method, it does make your data better protected. Add to this capability the functionality of Oracle GoldenGate – or, if you don’t need heterogeneous support, any of the various database replication technologies available from Oracle, Microsoft, and IBM – and you can seamlessly move data to the cloud behind the scenes, without interfering with your existing database. Yes, initial configuration of database replication will generally require work on the database server, but once configured, most of them run without interfering with the functionality of the primary database in any way – though if it is one that runs inside the RDBMS, remember that it will use up CPU cycles at the least, and most will work inside of a transaction so that they can ensure transaction integrity on the target database, so know your solution.
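On the “know your solution” point, it pays to watch replication health continuously rather than discovering lag after a failover. The following is a hedged, product-agnostic sketch – not specific to GoldenGate or any other tool – of the common heartbeat technique: write a timestamp on the primary, read it on the replica, and alert when the gap exceeds your tolerance. The table, threshold, connection objects, and alert() hook are all assumptions for illustration.

```python
import time

LAG_THRESHOLD_SECONDS = 300   # assumption: more than 5 minutes behind is a problem here

def write_heartbeat(primary_conn):
    """Run on a schedule against the primary (a DB-API-style connection is assumed)."""
    primary_conn.execute(
        "UPDATE replication_heartbeat SET updated_at = ? WHERE id = 1", (time.time(),))
    primary_conn.commit()

def check_lag(replica_conn):
    """Run against the replica; the difference is your replication lag."""
    (updated_at,) = replica_conn.execute(
        "SELECT updated_at FROM replication_heartbeat WHERE id = 1").fetchone()
    lag = time.time() - updated_at
    if lag > LAG_THRESHOLD_SECONDS:
        alert(f"Replica is {lag:.0f}s behind the primary")
    return lag

def alert(message):
    # Stand-in for whatever actually wakes the DBA: email, pager, monitoring system.
    print("ALERT:", message)
```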
Running inside the primary transaction is not necessary, and for many uses may not even be desirable, so if you want your commits to happen rapidly, something like GoldenGate, which spawns a separate transaction for the replica, is a good option… Just remember that you then need to pay attention to alerts from the replication tool so that you don’t end up with successful transactions on the primary not getting replicated because something goes wrong with the transaction on the secondary. But for DBAs, this is just an extension of their daily work, as long as someone is watching the logs.

With the advent of GoldenGate, advanced database encryption technology, and products like our own BIG-IP WOM, you now have the ability to drive a replica of your database into the cloud. This is certainly a boon for backup purposes, but it also adds an interesting perspective to application mobility. You can turn on replication from your data center to the cloud or from cloud provider A to cloud provider B, then use VMotion to move your application VMs… And you’re off to a new location. If you think you’ll be moving frequently, this can all be configured ahead of time, so you can flick a switch and move applications at will.

You will, of course, have to weigh the impact of complete or near-complete database encryption against the benefits of cloud usage. Even if you use the adaptability of the cloud to speed encryption and decryption operations by distributing them over several instances, you’ll still have to pay for that CPU time, so there is a balancing act that needs some exploration before you’ll be certain this solution is a fit for you. And at this juncture, I don’t believe putting unencrypted corporate data of any kind into the cloud is a good idea. Every time I say that, it angers some cloud providers, but frankly, cloud being new and by definition shared resources, it is up to the provider to prove it is safe, not up to us to take their word for it. Until then, encryption is your friend, both going to/from the cloud and at rest in the cloud. I say the same thing about Cloud Storage Gateways; it is just a function of the current state of cloud technology, not some kind of unreasoning bias.

So the key then is to make sure your applications are ready to be moved. This is actually pretty easy in the world of portable VMs, since the entire VM will pick up and move. The only catch is that you need to make sure users can get to the application at the new location. There are a ton of Global DNS solutions, like F5’s BIG-IP Global Traffic Manager, that can get your users where they need to be, since your public-facing IPs will be changing when moving from organization to organization. Everything else should be set, since you can use internal IP addresses to communicate between your application VMs and database VMs. Utilizing some form of in-flight encryption and some form of acceleration for your database replication will round out the solution architecture, and leave you with a road map that looks more like a highway map than an HSR map. More flexible, more pervasive.

Cloud Computing: Will data integration be its Achilles Heel?
Wesley: Now, there may be problems once our app is in the cloud.
Inigo: I'll say. How do I find the data? Once I do, how do I integrate it with the other apps? Once I integrate it, how do I replicate it?

If you remember this somewhat altered scene from the Princess Bride, you also remember that no one had any answers for Inigo. That's apropos of this discussion, because no one has any good answers for this version of Inigo either. And no, a holocaust cloak is not going to save the day this time.

If you've been considering deploying applications in a public cloud, you've certainly considered what must be the Big Hairy Question regarding cloud computing: how do I get at my data? There's very little discussion about this topic, primarily because at this point there's no easy answer. Data stored in the cloud is not easily accessible for integration with applications not residing in the cloud, which can definitely be a roadblock to adopting public cloud computing. Stacey Higginbotham at GigaOM had a great post on the topic of getting data into the cloud, and while the conclusion that bandwidth is necessary is also applicable to getting your data out of the cloud, the details are left in your capable hands.

We had this discussion when SaaS (Software as a Service) first started to pick up steam. If you're using a service like salesforce.com to store business critical data, how do you integrate that back into other applications that may need it? Web services were the first answer, followed by integration appliances and solutions that included custom-built adapters for salesforce.com to more easily enable access and integration to data stored "out there", in the cloud.

Amazon offers URL-based and web services access to data stored in its SimpleDB offering, but that doesn't help folks who are using Oracle, SQL Server, or MySQL offerings in the cloud. And SimpleDB is appropriately named; it isn't designed to be an enterprise class service – caveat emptor is in full force if you rely upon it for critical business data. RDBMSs have their own methods of replication and synchronization, but mirroring and real-time replication methods require a lot of bandwidth and very low latency connections – something not every organization can count on having. Of course you can always deploy custom triggers and services that automatically replicate back into the local data center, but that, too, is problematic depending on bandwidth availability and accessibility of applications and databases inside the data center. The reverse scenario is much more likely, with a daemon constantly polling the cloud computing data and pulling updates back into the data center (a sketch of this pattern follows below). You can also just leave that data out there in the cloud and implement – or take advantage of, if they exist – service-based access to the data and integrate it with business processes and applications inside the data center. You're relying on the availability of the cloud, the Internet, and all the infrastructure in between, but like the solution for integrating with salesforce.com and other SaaS offerings, this is likely the best of a set of "will have to do" options.

The issue of data and its integration has not yet raised its ugly head, mostly because very few folks are moving critical business applications into the cloud and, admittedly, cloud computing is still in its infancy.
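The polling daemon mentioned above is simple enough to sketch. The endpoint, the modified_since parameter, and apply_locally() are hypothetical stand-ins – the real interface depends entirely on what service-based access your cloud application exposes – but the shape is the same: ask the cloud for whatever changed since the last pull, apply it locally, repeat.

```python
import json
import time
import urllib.request

CLOUD_ENDPOINT = "https://cloud.example.com/api/orders"   # hypothetical service API
POLL_INTERVAL_SECONDS = 60
last_seen = 0   # high-water mark, e.g. the last modification timestamp applied locally

def apply_locally(record):
    # Stand-in for an upsert into the data center's own database.
    print("would upsert into local DB:", record["id"])

while True:
    url = f"{CLOUD_ENDPOINT}?modified_since={last_seen}"
    with urllib.request.urlopen(url) as resp:     # outbound-only: nothing inside the
        changes = json.load(resp)                 # data center is exposed inward
    for record in changes:
        apply_locally(record)
        last_seen = max(last_seen, record["modified"])
    time.sleep(POLL_INTERVAL_SECONDS)
```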
But even non-critical applications are going to use or create data, and that data will, invariably, become important or need to be accessed by folks in the organization, which means access to that data will – probably sooner rather than later – become a monkey on the back of IT. The availability of, and ease of access to, data stored in the public cloud for integration, data mining, business intelligence, and reporting – all common enterprise uses of data – will certainly affect adoption of cloud computing in general. The benefits of saving dollars on infrastructure (management, acquisition, maintenance) aren't nearly as compelling a reason to use the cloud when those savings would quickly be eaten up by the extra effort necessary to access and integrate data stored in the cloud.

Related articles by Zemanta:
- SQL-as-a-Service with CloudSQL bridges cloud and premises
- Amazon SimpleDB ready for public use
- Blurring the functional line - Zoho CloudSQL merges on-site and on-cloud
- As a Service: The many faces of the cloud
- A comparison of major cloud-computing providers (Amazon, Mosso, GoGrid)
- Public Data Goes on Amazon's Cloud

Load Balancing For Developers: Improving Application Performance With ADCs
If you’ve never heard of my Load Balancing For Developers series, it’s a good idea to start here. There are quite a few installments behind us, and I’m not going to look back in this post any more than I must to make it readable without going back… Meaning there’s much more detail back there than I’ll relate here.

Again, after a lengthy sojourn covering other points of interest, I return to Load Balancing For Developers with a more holistic view – application performance. Lori has talked a bit about this topic, and I’ve talked about it in the form of load balancing benefits and algorithms, but I’d like to look more architecturally again, and talk about those difficult-to-uncover performance issues that web apps often face.

You’re the IT manager for the company’s Zap-n-Go website. It has grown nearly exponentially since launch, and you’re the one responsible for keeping it alive. Lately it’s been online, but your users are complaining of sluggishness. Following the advice of some guy on the Internet, you put a load balancer in about a year ago, and things were better, but after you put in a redundant data center and Global Load Balancing services, things started to degrade again. Time to rethink your architecture before your product gets known as Zap-N-Gone… Again.

Thus far you have a complete system with multiple servers behind an ADC in your primary data center, and a complete system with multiple servers behind an ADC in your secondary data center. Failover tests work correctly when you shut down the primary web servers, and the database at the remote location is kept up to date with something like Data Guard for Oracle or Merge Replication Services for SQL Server. This meets the business requirement that the remote database is up to date except for those transactions in progress at the moment of loss. This makes you highly available, and if your ADCs are running as an HA pair and your Global DNS – like our GTM product – is smart enough to switch when it notices your primary site is down, most users won’t even know they’ve been shoved off to the backup datacenter. The business is happy, you’re sleeping at night, all is well.

Except that slowly, as usage for the site has grown, performance has suffered. What started as a slight lag has turned into a dragging sensation. You’ve put more web servers into the pool of available resources – or better yet, used your management tools (in the ADC and on your servers) to monitor all facets of web server performance: disk and network I/O, CPU and memory utilization. And still, performance lags. Then you check on your WAN connection and database, and find the problem. Either the WAN connection is overloaded, or the database is waiting long periods of time for responses from the secondary datacenter. If you have things configured so that the primary doesn’t wait for acknowledgment from the secondary database, then your problem might be even more sinister – some transactions may never get deposited in the secondary datacenter, causing your databases to be out of synch. And that’s a problem, because you need the secondary database to be as up to date as possible, but buying more bandwidth is a monthly overhead expense, and sometimes it doesn’t help – because the problem isn’t always about bandwidth, sometimes it is about latency. In fact, with synchronous real-time replication, it is almost always about latency.
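A quick back-of-the-envelope sketch shows why latency, not bandwidth, becomes the ceiling with synchronous replication: every commit waits for a round trip to the secondary site. The numbers here are illustrative assumptions, not measurements from any particular environment.

```python
# Synchronous replication: each commit must wait for the remote acknowledgment,
# so round-trip time (RTT) caps the serialized commit rate regardless of pipe size.

rtt_ms = 40          # assumed round trip between primary and secondary data centers
local_commit_ms = 2  # assumed local time to harden the write

per_commit_ms = rtt_ms + local_commit_ms
print(f"~{1000 / per_commit_ms:.0f} synchronous commits/sec per serialized session")

# Cutting latency (fewer round trips, TCP optimization) raises the ceiling;
# adding bandwidth alone does not.
for better_rtt_ms in (20, 10):
    rate = 1000 / (better_rtt_ms + local_commit_ms)
    print(f"{better_rtt_ms} ms RTT -> ~{rate:.0f} commits/sec")
```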
Latency, for those who don’t know, is a combination of how far your connection must travel over the wire and the number of “bumps in the wire” that have been inserted. Not actually the number of devices, but the number and their performance. Each device that touches your data – packet inspection, load balancing, security, whatever the reason – adds time to the delivery window. So does traveling over the wires/fiber. Synchronous replication is very time sensitive. If it doesn’t hear back in time, it doesn’t commit the changes, and then the primary and secondary databases don’t match up.

So you need to cut down the latency and improve the performance of your WAN link. Conveniently, your ADC can help. Out of the box it should have TCP optimizations that cut down the impact of latency by reducing the number of packets going back and forth over the wire. It may have compression too – which cuts down the amount of data going over the wire, reducing the number of packets required, which improves the “apparent” performance and reduces the amount of data on your WAN connection. They might offer more functionality than that too. And you’ve already paid for an HA pair – putting one in each datacenter – so all you have to do is check what they do “out of the box” for WAN connections, and then call your sales representative to find out what other functionality is available. F5 includes some functionality in our LTM product, and has more in our add-on WAN Optimization Module (WOM) that can be bought and activated on your BIG-IP. Other vendors have a variety of architectures to offer you similar functionality, but of course I work for and write for F5, so my view is that they aren’t as good as our products… Certainly check with your incumbent vendor before looking for other solutions to this problem.

We have seen cases where replication was massively improved with WAN Optimization. More on that in the coming days under a different topic, but just consider the thought that you can increase the speed and reliability of transaction-based replication (and indeed, file/storage replication, but again, that’s another blog), and you as a manager or a developer do not have to do a thing to your code. That implies the other piece – that this method of improvement is applicable to applications that you have purchased and do not own the source code for. So check it out… At worst you will lose a few hours tracking down your vendor’s options; at best you will be able to go back to sleep at night.

And if you’re shifting load between datacenters, as I’ve mentioned before, Long Distance vMotion is improved by these devices too. F5’s architecture for this solution is here – PDF deployment guide. This guide relies upon the WOM functionality mentioned above. And encryption is supported between devices. That means if you are not encrypting your replication, you can start without impacting performance, and if you are encrypting, you can offload the work of encryption to a device designed to handle it. And bandwidth allocation means you can guarantee your replication has enough bandwidth to stay up to date by giving it priority. But you won’t care too much about that, you’ll be relaxing and dreaming of beaches and stock options… Until the next emergency crops up anyway.

Copied Data. Is it a Replica, Snapshot, Backup, or an Archive?
It is interesting to me the number of variant Transformers that have been put out over the years, and the effect that has on those who like Transformers. There are four different “Construction Devastator” figures put out over the years (there may be more, I know of four), and every Transformers collector or fan that I know – including my youngest son – wants them all. That’s great marketing on the part of Hasbro, for certain, but it does mean that those who are trying to collect them are going to have a hard time of it, just because they were produced and then stopped, and all of them consist of seven or more parts. That’s a lot of things to go wrong. But still, it is savvy for Hasbro to recognize that a changed Transformer equates to more sales, even though it angers the diehard fans.

As time moves forward, technology inevitably changes things. In IT that statement implies “at the speed of light”. Just like your laptop has been replaced with a newer model before you get it, and is “completely obsolete” within 18 months, so other portions of the IT field are quickly subsumed or consumed by changes. The difference is that IT is less likely to get caught up in the “new gadget” hype than the mass market. So while your laptop was technically outdated before it landed in your lap, IT knows that it is still perfectly usable and will only replace it when the warranty is up (if you work for a smart company) or it completely dies on you (for a company pinching pennies).

The same is true in every piece of storage; it is just that we don’t suffer from “Transformer Syndrome”. Old storage is just fine for our purposes, unless it actually breaks. Since you can just continue to pay annual licensing fees, there’s no such thing as “out of warranty” storage unless you purchase very inexpensive gear, or choose to let support lapse. For the very highest end, letting it lapse isn’t an option, since you’re licensing the software. The same is true with how we back up and restore that data.

Devastator, image courtesy of Gizmodo.com

But even with a stodgy group like IT, who has been bitten enough times to know that we don’t change something unless there’s a darned good reason, eventually change does come. And it’s coming to backup and replication. There are a lot of people still differentiating between backups and replication. I think it’s time for us to stop doing so. What are the differences? Let’s take a look.

1. Backups go to tape. Hello Virtual Tape Libraries, how are you?
2. Backups are archival. Hello tiering, you allow us to move things to different storage types, and replicate them at different intervals, right? So all is correctly backed up for its usage levels?
3. Replication is near-real-time. Not really. You’re thinking of Continuous Data Protection (CDP), which is gaining traction by app, not broadly.
4. Replication goes to disk and that makes it much faster. See #1. VTL is fast too.
5. Tape is slow. Right, but that’s a target problem, not a backup problem. VTLs are fast.
6. Replication can do just the changes. Yeah, why this one ever became a myth, I’ll never know, but remember “incremental backups”? Same thing.

I’m not saying they’re exactly the same – incremental replicas can be reverse applied so that you can take a version of the file without keeping many copies, and that takes work in a backup environment. What I AM saying is that once you move to disk (or virtual disk in the case of cloud storage), there isn’t really a difference worthy of keeping two different phrases.
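One way to see the “same mechanism, different names” point is to write the policies down: once everything is disk-to-disk, a “backup” and a “replica plus snapshot” are just different values for the same two knobs – how often you copy changes, and how often you freeze a point in time. The tier names, intervals, and retention counts below are illustrative assumptions, not a recommendation.

```python
# Hypothetical per-tier data protection policies: the same two knobs cover what
# we used to call "replication" (copy changes) and "backup" (point-in-time copy).
policies = {
    "tier1-primary": {"replicate_every": "5m",  "snapshot_every": "1h",  "retain": 24},
    "tier2-general": {"replicate_every": "1h",  "snapshot_every": "24h", "retain": 14},
    "tier3-archive": {"replicate_every": "24h", "snapshot_every": "7d",  "retain": 12},
}

for tier, p in policies.items():
    print(f"{tier}: copy changes every {p['replicate_every']}, "
          f"freeze a point-in-time copy every {p['snapshot_every']}, "
          f"keep {p['retain']} of them")
```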
Tape isn’t dead – many of you still use a metric ton of it a year – but it is definitely waning, slowly. Meaning more and more of us are backing up or replicating to disk.

Where did this come from? A whitepaper I wrote recently came back from technical review with “this is not accurate when doing backups”, and that got me to thinking “why the heck not?” If the reason for maintaining two different names is simply a people reason, while the technology is rapidly becoming the same mechanism – disk in, disk out – then I humbly suggest we just call it one thing, because all maintaining two names and one fiction does is cause confusion. For those who insist that replicas are regularly updated, I would say making a copy or snapshotting them eliminates even that difference – you now have an archival copy that is functionally the same as a major backup. Add in an incremental snapshot and, well, we’re doing a backup cycle.

With tiering, you can set policies to create snapshots or replicas on different timelines for different storage platforms, meaning that your tier three data can be backed up very infrequently, while your tier one (primary) storage is replicated all of the time. Did you see what I did there? The two are used interchangeably. Nobody died, and there’s less room for confusion.

Of course I think you should use our ARX to do your tiering, ARX Cloud Extender to do your cloud connections, and take advantage of the built-in rules engine to help maintain your backup schedule. But the point is that we just don’t need two names for what is essentially the same thing any more. So let’s clean up the lingo. Since replication is more accurate to what we’re doing these days, let’s just call it replication. We have “snapshot”, which is already associated with replication for point-in-time copies, which lets us differentiate between a regularly updated replica and a frozen-in-time “backup”. Words fall in and out of usage all of the time; let’s clean up the tech lingo and all speak the same language. No, no we won’t, but I’ve done my bit by suggesting it. And no doubt there are those confused by the current state of lingo that this will help to understand that yes, they are essentially the same thing; only archaic history keeps them separate.

Or you could buy all three – replicate to a place where you can take a snapshot and then back up the snapshot (not as crazy as it sounds, I have seen this architecture deployed to get the backup process out of production, but I was being facetious). And you don’t need a ton of names. You replicate to secondary (tertiary) storage, then take a snapshot, then move or replicate the snapshot to a remote location – like the cloud or a remote datacenter. Not so tough, and one term is removed from the confusion, inadvertently adding crispness to the other terms.

Committing to Overhead: Proceed With Caution.
Back when SaaS was making its debut in the enterprise, I was a mid-level IT manager with a boss that was smart. It was a great experience working for him overall, and if not for external pressures, I might still be working on his team. One of the SaaS conversations we had was pretty relevant to today’s rush to public cloud.

He looked around the room and asked “Why are we getting rid of our mainframes?” There was the standard joking about old dogs and new tricks, and then the more serious cost analysis. Finally he said “No, we’re getting rid of our mainframes because a couple of decades ago, someone in my position said ‘we’ll sign these contracts that create overhead forever, and future IT managers will have to deal with it. We won’t consider what happens when the market turns and the overhead is fixed even though the organization is making less, we won’t consider that this overhead will cost millions over the years. We’ll take the route we like, and everyone moving forward will have to deal with it.’”

We all pondered; it was a pretty cynical way to look at a process that chose the only viable solution back in the day, but it had a kernel of truth in it. He waited a bit, then finished. “And that’s why we will not be using SaaS unless we have an exit strategy that covers all of the bases. We will not sign future IT managers on to overhead that we cannot determine is onerous or not. If we have a way to get our data into the system, a way to get our data out of the system, and proof that it is as secure as it is on our premises, then we will utilize SaaS to the maximum.”

That was good reasoning then, and it’s good reasoning now. Though cloud is much more forgiving in terms of getting your data in and out, his point about committing the future to a fixed overhead holds today. When you own the systems, delaying upgrades or consolidating servers is an option. Dropping support to save money is an option. There are all sorts of fiscal flexibility issues that cloud takes away from management when times get tough.

Typical mainframe – the early years. Compliments of ComputerScienceLab.Com

That’s not to say “public cloud is a bad thing”; it is to say that the needs of an enterprise are not the same as those of a start-up or small business. There are even valid reasons that international corporations have chosen not to take email to the cloud, though cloud based email is appealing to an organization that would need servers in multiple datacenters and administrators with extreme email chops. As with everything, consider the options and do what’s best for your organization. The buzz words are not why we all have jobs; solving problems for business is.

Even if you feel about cloud as my boss did about SaaS, you still have cloud opportunities. Replication is a good one if the replication tool handles encryption and compression. Testing is a no-brainer if your test data is scrubbed first. And capacity planning is a big one. If you deploy a pilot to the cloud and get a reasonable estimation of what kind of throughput, server utilization, etc. the application will require, then you can move it in-house and right-size the environment based upon projections from the pilot. It won’t be perfect, but it’s better than many of the capacity planning systems out there today, particularly the “let’s turn it on, and then worry about capacity” model some of you are using.
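The pilot-to-production projection above is mostly arithmetic. Here is a hedged back-of-the-envelope sketch of it; every number is an illustrative assumption from a hypothetical pilot, and real capacity planning would also account for non-linear scaling, headroom policy, and peak-to-average ratios.

```python
# Hypothetical cloud pilot observations
pilot_users           = 500
pilot_peak_rps        = 120     # requests per second at peak
pilot_instances       = 4
pilot_cpu_utilization = 0.55    # average utilization across instances at peak

# Production projection and sizing policy (assumptions)
expected_users     = 8000
target_cpu_ceiling = 0.65       # leave headroom for spikes

scale_factor  = expected_users / pilot_users
projected_rps = pilot_peak_rps * scale_factor
# Instances needed so the projected load lands at or below the CPU ceiling,
# assuming roughly linear scaling from the pilot.
instances_needed = pilot_instances * scale_factor * (pilot_cpu_utilization / target_cpu_ceiling)

print(f"Projected peak load: ~{projected_rps:.0f} req/sec")
print(f"Right-sized estimate: ~{instances_needed:.0f} equivalent instances in-house")
```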
And for some organizations, tasks like email really can be shipped to the cloud (or to a SaaS provider that claims to be a cloud); it just depends upon the legal and accountability standards your organization must, or has chosen to, implement.

Though looking ahead, make a plan for getting out. It’s not about distrusting your provider, it is about risk management. Even if you love your provider today, they’re one purchase or upper management change away from being the biggest PITA you have to deal with every day. The best system, if you’re actually doing cloud, is to replicate your VMs back to HQ on a regular basis. This process is easy and gives you a fall-back. You don’t have to “get your data out of the cloud”, it will already be out if you need it. And like I’ve said elsewhere, for many of the compliance/security concerns, extend your existing infrastructure to the cloud where you can. No sense implementing two separate access control systems when you really only need one; only geographic location separates them.

Just some things to keep in mind when moving. Sure it’s cheaper this month, and maybe even cheaper in the long haul (the vote is still very much out on that one), but it will cost you some financial flexibility and fix more of your budget into immobility. If that trade-off is good for you, then just make sure you have an exit plan, because sooner or later, keeping a cloud service will no longer be your first choice, or you’ll have moved on and it will be someone else’s.

F5 Friday: CSG Case Study Shows Increased Performance, Less WAN Traffic With Dell and F5
When time and performance mattered, CSG Content Direct turned to Dell and F5 to make their replication faster while reducing WAN utilization.

We talk a lot in our blogs about the benefits you could get from an array of F5 products, so when this case study (pdf link) hit our inboxes, we thought you’d like to hear about what CSG’s Content Direct did get out of deploying F5 BIG-IP WOM. Utilizing tools by two of the premier technology companies in the world, Content Direct was able to decrease backup windows to as little as 5% of their previous time, and reduce traffic on the WAN significantly. At the heart of the problem was WAN performance that was inhibiting their replication to a remote datacenter and causing them to fall further and further behind. Placing a BIG-IP WOM between their Dell EqualLogic iSCSI devices, Content Direct was able to improve performance to the point that they are now able to meet their RPOs and RTOs with room for expansion.

Since Content Direct had already deployed F5 BIG-IP LTM, they were able to implement this solution by purchasing and installing F5 BIG-IP WAN Optimization Manager (WOM) on the existing BIG-IP hardware, eliminating the need for new hardware. The improvements that they saw while replicating iSCSI devices are in line with the improvements our testing has shown for NAS device replication also, making this case study a good examination of what you can expect from BIG-IP WOM in many environments. Since BIG-IP WOM supports a wide array of applications – from the major NAS vendors to the major database vendors – and includes offloading of encryption from overburdened servers, you can deploy it once and gain benefits at many points in your architecture.

If you are sending a lot of data between two datacenters, BIG-IP WOM has help for your overburdened WAN connection. Check out our White Papers and Solution Profiles relevant to BIG-IP WOM for more information about how it might help, and which applications have been tested for improvement measurements. Of course BIG-IP WOM works on IP connections, and as such can improve many more scenarios than we have tested or even could reasonably test, but those applications tested will give you a feel for the amount of savings you can get when deploying BIG-IP WOM on your WAN. And if you are already a BIG-IP LTM customer, you can upgrade to include WOM without introducing a new device into your already complex network.

Related Blogs:
- F5 Friday: Speed Matters
- F5 Friday: Performance, Throughput and DPS
- F5 Friday: A War of Ecosystems
- F5 Friday: IPv6 Day Redux
- F5 Friday: Spelunking for Big Data
- F5 Friday: The 2048-bit Keys to the Kingdom
- F5 Friday: ARX VE Offers New Opportunities
- F5 Friday: Eliminating the Blind Spot in Your Data Center Security ...
- F5 Friday: Gracefully Scaling Down
- F5 Friday: Data Inventory Control

In Replication, Speed isn’t the Only Issue
In the US, many people watch the entire season of NASCAR without ever really paying attention to the racing. They are fixated on seeing a crash, and at the speeds that NASCAR races average – 81 mph on the most complex track to 188 mph on the least curvy track – they’re likely to get what they’re watching for. But that misses the point of the races. The merging of man and machine to react at lightning speed to changes in the environment is what the races are about. Of course speed figures in, but it is not the only issue. Mechanical issues, and the dreaded “other driver”, are things that must be watched for by every driver on the track.

I’ve been writing a whole lot about remote replication and keeping systems up to date over a limited WAN pipe, but in all of those posts I’ve only lightly touched upon some of the other very important issues, because first and foremost in most datacenter or storage managers’ minds is “how fast can I cram out a big update, and how fast can I restore if needed”. But of course the other issues are more in-your-face, so in the interests of not being lax, I’ll hit them a little more directly here.

Images from NASCAR.com

The cost of remote replication and backups is bandwidth. Whether that bandwidth is taken in huge bursts or leeched from your WAN connection in a steady stream is merely a question of implementation. You have X amount of data to move in Y amount of time before the systems on the other end are not current enough to be useful. Some systems copy changes as they occur (largely application-based replication that resembles, or is actually called, Continuous Data Protection), some systems (think traditional backups) run the transfers in one large lump at a given time of the day. Both move roughly the same amount of data; the only variable is the level of impact on your WAN connection. There are good reasons to implement both – a small, steady stream of data is unlikely to block your WAN connection and will keep you the closest to up to date, while traditional backups can be scheduled such that at peak times they use no bandwidth whatsoever, and utilize the connection at times when there is not much other usage.

Of course your environment is not so simple. There is always other usage, and if you’re a global organization, “peak time” becomes “peak times” in a very real sense as the sun travels around the globe and different people come online at different times. This can have implications for both types of remote replication, for even the CDP style utilizes bandwidth in bursty bits. When you hit a peak time, changes to databases and files also peak. This can effectively put a throttle on your connection by increasing replication bandwidth at the same time that normal usage is increasing in bandwidth needs.

The obvious answer to this dilemma is the same answer that is obvious for every “the pipe is full” problem – get a bigger connection. But we’ve gone over this one before: bigger connections are a monthly fee, and the larger you go, the larger the hike in price. In fact, because the growth is near exponential, the price spike is near exponential. And that’s something most of us can’t just shell out. So the obvious answer is often a dead end. Not to mention that the smaller the city your datacenters are in, the harder it is to get more bandwidth in a single connection. This is improving in some places, but is still very much the truth in many smaller metropolitan areas. So what is an IT admin to do?
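Before answering that, it helps to put rough numbers on the problem. The sketch below is a back-of-the-envelope calculation with illustrative assumptions – the data volume, window, and reduction ratio are made up – but it shows how quickly a replication window translates into sustained Mbps, and how much a given compression/deduplication factor shrinks the requirement.

```python
def required_mbps(gb_to_move, window_hours, reduction_factor=1.0):
    """Sustained megabits/sec needed to move gb_to_move within window_hours,
    optionally after an assumed compression/dedup reduction."""
    bits = gb_to_move * 8 * 1000**3 / reduction_factor   # decimal GB -> bits
    return bits / (window_hours * 3600) / 1_000_000      # -> Mbps

data_to_move_gb = 400   # assumed changed data per replication cycle
window_hours    = 6     # assumed window before the replica is "too stale"
reduction       = 3.0   # assumed combined compression + dedup factor

print(f"Raw:     ~{required_mbps(data_to_move_gb, window_hours):.0f} Mbps sustained")
print(f"Reduced: ~{required_mbps(data_to_move_gb, window_hours, reduction):.0f} Mbps sustained")
```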
This is where WAN Optimization Controllers come into the game. Standard disclaimer: F5 plays in this space with our WOM module.

Many users approach WAN Optimization products from the perspective of cramming more through the pipe – which most are very good at – but often the need is not for a bigger pipe, it is for a more evenly utilized pipe, or one that can differentiate between the traffic (like replication and web store orders) that absolutely must get through versus traffic – like YouTube streams to employees’ desks – that doesn’t have to. If you could allocate bandwidth to data going through the pipe in such a way that you tagged and tracked the important data, you could reduce the chance that your backups are invalid due to network congestion, and improve the responsiveness of backups and other critical applications simply by rating them higher and allocating bandwidth to them. Add WAN Optimization style on-the-fly compression and deduplication, and you’re sending less data over the pipe and dedicating bandwidth to it. Leaving more room for other applications while guaranteeing that critical ones get the time they need is a huge combination.

Of course the science of bandwidth allocation requires a good solid product, and the art of bandwidth allocation requires knowledge of your organization. Only you know what is critical and how much of your pipe that critical data needs. You can get help making these determinations, but in the end, your staff has the knowledge necessary to make a go of it. But think about it: your replication taking 20-50% (or less – lots of variables in this number) of its current bandwidth requirements and being more reliable. Even if nothing in your organization runs one tiny smidgen faster (and that is highly unlikely if you’re using a WAN Optimization Controller), that’s a win in overall circuit usage. And that’s huge. Like I’ve said before, don’t buy a bigger pipe, use your connection more intelligently.

Not all WAN Optimization products offer bandwidth allocation; check with your vendor. Or call us, we’ve got it all built in – because WOM runs on TMOS, and all LTM functionality comes with the package.

Once you’ve cleared away the mechanical failures and the risks of collision, unlike a NASCAR driver, then you should focus on speed. Unlike them, we don’t have to live with the risk. Maybe that’s why they’re famous and we’re geeks ;-). And no, sorry, I’m not a NASCAR fan. Just a geek with Google.

Related Articles and Blogs:
- NASCAR.com
- Remote Backup and the Massive Failure
- IT and Data: If Not Me Then Who? If Not Now, Then When?
- How May I Speed and Secure Replication? Let Me Count the Ways.
- Informatica: Data Integration In and For the Cloud

It’s Show Time
Ladies and gentlemen. In tonight’s show, the role of Application Delivery, normally played by Load Balancer, will be played by ADC. We hope you enjoy the performance.

I studied Theatre in college and have spent a good amount of time in and around the performing arts. The telling of an engaging story and the creativity, imagination and spontaneity of a great live performance is something I truly enjoy. Most of my life, when I think of the term performance, I think of the performing arts – acting, dancing, singing and the rest. When you pay good money for a show, you expect a great performance. Actors embodying the characters, musicians merged with their instruments, singers feeling every note, dancers moving to the tune. When we perform ourselves, we want to give it our all, have good energy, be prepared, engage our audience and tell a good story, no matter if it’s vocal, musical or movement. And if we nail it, there’s no better feeling than when you hit every note, lived the character or let the music take your body.

With Method acting (Stanislavski/Strasberg/Actors Studio) you try to create, in yourself, the thoughts and feelings of the character and often rely on emotional memory to generate, for instance, tears. Remember how you felt when your first dog died. Hoffman, De Niro, Pacino and Baldwin are some that practice this technique. William Gillette, an actor/director/playwright in the late 1800s, talked about ‘The Illusion of the First Time.’ That is, no matter how many times you’ve done this, you need to make it seem/feel as if it is the first time that the character has ever heard or encountered whatever is occurring. This gives true responses, reactions and behavior, within the character itself, to the many conflicts within the story. The other important facet to this is that it is the audience’s first time seeing it, so an actor should not ‘telegraph’ a response.

Just what the heck does this all have to do with application delivery? As part of the 50 Ways to Use Your BIG-IP series, this week we cover performance: how the BIG-IP system helps improve performance and what some of the variables are that can impact the performance of an application. Again you may ask, what does acting have to do with application delivery? ‘Method’ application delivery might be things like caching and data deduplication – I know I’ve seen this before, so let me pull it from memory and deliver the content. What is this character (user) trying to accomplish, and how can I get them there? Session persistence might be another area. I remember you from an earlier meeting, remember that you were doing this particular thing and it made you happy or more productive. I remember that if users are requesting access from a particular geo-location, then send them to that data center.

The illusion of the first time also connects well with application delivery via context. The ADC might have seen this user hundreds, maybe thousands of times, but this time, they are coming from an unrecognized network or from an unknown device and the ADC needs to make an instantaneous decision as to how best to handle the request… since it is the first time… within this context. Just like a character, the ADC absorbs the information, processes it and answers with the best possible response at that moment. I can tell you, there have been a few times where I did forget my line but was so immersed in the moment that when I opened my mouth, the actual written words just came out.
ADCs need to perform at their best every moment of every day, not just 8 times a week on an Equity stage. They need to remember certain pieces of information but also receive information for the very first time and make instantaneous, intelligent decisions. They need to adjust depending on the conditions and star in that strategic point of control within the data center stage. They don’t sign autographs, appear on the front page of the National Enquirer or show up at red carpet events, but they can help deliver all the Tony, Grammy, Oscar, Emmy and Obie award(s) data. As a director/actor once said in one of my acting classes, a true artist is someone who cannot do anything else but their craft… if there is anything else that you can do with your life, do it. Hello Internet circa 1995, I’m Peter.

ps

Resources:
- All “50 Ways” to use your BIG-IP system entries
- 50 Ways to Use Your BIG-IP: Performance Presentation
- Availability resources on DevCentral
- Availability Solutions on F5.com
- Security resources on DevCentral
- Security Solutions on F5.com
- Follow #50waystousebigip on Twitter

Building the Hydra – Array Virtualization is not File Virtualization
So I’m jealous that Lori works D&D references into her posts regularly and I never have… Until today! For those who aren’t gamers or literary buffs, a Hydra is a big serpent or lizard with a variable number of heads (normally five to nine in both literature and gaming). They’re very powerful and very dangerous, and running into one unprepared is likely to get you p0wned. The worst part about them is that, mythologically speaking, if you cut one of the heads off, two grow in its place. Ugly stuff if you’re determined to defeat it.

That’s the way I see array-based file virtualization and other tack-on functionality. Vendors who are implementing it (many of whom are F5 partners) try to tell you that they’re unifying everything and the world is a wonderful place with greener grass and more smiling children due to their efforts. And they’re right… if you’re a homogeneous shop with nothing but their storage gear. Then their multi-headed hydra looks pretty appealing. Everyone else feels like it only does a part of the job and is wary of getting too close.

For the rest of us there are products like ARX to take care of that nasty truth that no organization is an all-one-vendor shop, particularly not in the NAS space, where higher end gear can cost hundreds of thousands while entry level is a commodity server with a thousand bucks worth of disk slapped into it. In fact, I have never seen an IT department that was all one vendor for NAS, and that’s the problem with single-vendor messaging. Sure they can give you a handle on their stuff, help you virtualize it, give you a unified directory structure, automate tiering, but what about that line where their box ends and the rest of the organization begins? That’s the demarcation line where you have to find other products to do the job.

Picture from Pantheon.org

Related Articles and Blogs:
- Storage Virtualization, Redux: Arise File Virtualization
- Lernaean Hydra on Wikipedia
- Storage Vendors – The Deduplication Stakes are Raised
- SAN-Based Data Replication