f5 big-ip wom
Deduplication and Compression – Exactly the same, but different.
One day many years ago, Lori's and my oldest son held up two sheets of paper and said "These two things are exactly the same, but different!" Now, he's a very bright individual; he was just young, and didn't even get how incongruous the statement was. Being a fun-loving family that likes to tease each other on occasion, we of course have not yet let him live it down. It was honestly more than a decade ago, but all is fair: he doesn't let Lori live down something funny that she did before he was born. It is all in good fun, of course.

Why am I bringing up this family story? Because that phrase comes to mind when you start talking about deduplication and compression. Highly complementary and very similar, they are pretty much "exactly the same, but different." Since these technologies are both used pretty heavily in WAN optimization, and are growing in use on storage products, the topic intrigued me. To get this out of the way: at F5, compression is built into the BIG-IP family as a feature of the core BIG-IP LTM product, and deduplication is an added layer implemented over BIG-IP LTM on the BIG-IP WAN Optimization Module (WOM). Other vendors have similar but varied (there goes a variant of that phrase again) implementation details.

Before we delve too deeply, what caught my attention and started me pondering was that F5's deduplication is applied before compression, and reversing the order changes the performance characteristics. I love a good puzzle, and while the fact that one should come before the other was no surprise, I wanted to know why the order is what it is, and what the impact of reversing them in processing might be. So I started working to understand the implementation details of these two technologies. Not to understand them from an F5 perspective, though that is certainly where I started, but to understand how they interact and complement each other. While much of this discussion also applies to in-place compression and deduplication such as that used on many storage devices, some of it does not, so assume that I am talking about networking, specifically WAN networking, throughout this blog.

At the very highest level, deduplication and compression are the same thing. They both look for ways to shrink your dataset before passing it along. After that, it gets a bit more complex. If it were really that simple, after all, we wouldn't call them two different things. Well, okay, we might; IT has a way of lumping competing standards, product categories, even jobs together under the same name. But still, they wouldn't warrant two different names in the same product, like F5 does with BIG-IP WOM. The thing is that compression can apply transformations to data to shrink it, and it also looks for small groupings of repetitive byte patterns and replaces them, while deduplication looks for larger groupings of repetitive byte patterns and replaces them. In the implementation you'll see on BIG-IP WOM, deduplication looks for larger byte patterns repeated across all streams, while compression applies transformations to the data and, when removing duplication, only looks for smaller combinations within a single stream. The net result? The two are very complementary, but if you run compression before deduplication, compression will find a whole collection of small repeating byte patterns and apply its transformations, leaving deduplication nothing to match: compression works harder and deduplication spins its wheels.
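To make that ordering argument concrete, here is a minimal Python sketch, my own toy rather than anything resembling BIG-IP WOM's actual engine: it dedupes fixed 1 KB chunks against a fingerprint cache shared across streams, then compresses whatever literal bytes remain, one stream at a time. The chunk size, the SHA-256 fingerprints, and the fixed-offset chunking are all assumptions for illustration; real products use smarter boundary detection and reference encoding.

```python
import hashlib
import zlib

CHUNK = 1024  # assumed ~1 KB minimum match size, per the post; purely illustrative

def dedupe(stream: bytes, seen: dict) -> tuple:
    """Send only chunks not already seen on ANY previous stream;
    duplicates become short references instead of bytes on the wire."""
    literal = bytearray()
    refs = 0
    for i in range(0, len(stream), CHUNK):
        chunk = stream[i:i + CHUNK]
        fp = hashlib.sha256(chunk).digest()
        if fp in seen:
            refs += 1            # duplicate chunk: only a tiny reference is sent
        else:
            seen[fp] = True      # first sighting: remember it and send the bytes
            literal.extend(chunk)
    return bytes(literal), refs

seen: dict = {}
document = b"the same corporate boilerplate, over and over. " * 200
for name in ("stream A", "stream B"):        # stream B repeats stream A across the WAN
    literal, refs = dedupe(document, seen)
    wire = zlib.compress(literal)            # per-stream compression happens afterward
    print(f"{name}: {refs} duplicate chunks, {len(literal)} literal bytes, "
          f"{len(wire)} bytes after compression")
```

Run it and stream B dedupes down to almost nothing because every chunk already sits in the shared cache, while stream A gets most of its savings from zlib squeezing the small in-stream repeats: the complementary split described above. In real traffic, where streams are similar rather than identical, compressing first scrambles the byte patterns that deduplication would otherwise have matched.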
There are other differences – because deduplication deals with large runs of repetitive data (I believe that in BIG-IP the minimum size is over a K), it uses some form of caching to hold patterns that duplicates can match against, and the larger the cache, the more strings of bytes you have to compare to. This introduces some fun around where the cache should be stored. In memory is fast but limited in size; on flash disk is fast and has a greater size, but is expensive; and on disk is slow but has a huge advantage in size. Good deduplication engines can support all three and thus are customizable to what your organization needs and can afford.

Some workloads just won't benefit from one, but will get a huge benefit from the other. The extremes are good examples of this phenomenon – if you have a lot of in-the-stream repetitive data that is too small for deduplication to pick up, and little or no cross-stream duplication, then deduplication will be of limited use to you, and the act of running through the dedupe engine might actually degrade performance a negligible amount – of course, everything is algorithm dependent, so depending upon your vendor it might degrade performance a large amount also. On the other extreme, if you have a lot of large-byte-count duplication across streams, but very little within a given stream, deduplication is going to save your day, while compression will, at best, offer you a little benefit. So yes, they're exactly the same from the 50,000-foot view, but very, very different from the benefits and use cases view. And they're very complementary, giving you more bang for the buck.
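The cache-placement point boils down to how many chunk fingerprints you can afford to keep within reach. A hedged sketch of what the in-memory tier might look like, assuming a simple least-recently-used eviction policy (my assumption; real engines may use very different replacement strategies and tiering logic):

```python
from __future__ import annotations
from collections import OrderedDict

class FingerprintCache:
    """Bounded in-memory fingerprint store with LRU eviction.
    A flash or disk tier would be a second, much larger map consulted on a miss."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self._entries: OrderedDict[bytes, bytes] = OrderedDict()

    def lookup(self, fingerprint: bytes) -> bytes | None:
        chunk = self._entries.get(fingerprint)
        if chunk is not None:
            self._entries.move_to_end(fingerprint)   # refresh recency on a hit
        return chunk

    def store(self, fingerprint: bytes, chunk: bytes) -> None:
        self._entries[fingerprint] = chunk
        self._entries.move_to_end(fingerprint)
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)        # evict the coldest pattern

# Rough sizing: a 1 KB chunk costs about 1 KB of cache plus a 32-byte fingerprint,
# so a 4 GB RAM budget holds on the order of four million distinct patterns, while
# a disk tier holds orders of magnitude more at the price of lookup latency.
cache = FingerprintCache(max_entries=4_000_000)
```

More patterns within reach means more chances for an incoming chunk to match, which is exactly why the memory/flash/disk tradeoff above is worth tuning to your data set and budget.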
SSDs, Velocity and the Rate of Change.

The rate of change in a mathematical equation can vary immensely based upon the equation and the inputs to it. Certainly the rate of change for f(x) = x^2 is a far different picture than the rate of change for f(x) = 2x, for example. The old adage "the only constant is change" is absolutely true in high tech. The definition of "high" in tech changes every time something becomes mainstream. You're working with tools and systems that even ten years ago were hardly imaginable. You're carrying a phone that Alexander Graham Bell would not recognize – or know how to use. You have tablets with power that not so long ago was only held by mainframes. But that change did not occur overnight. Apologies to iPhone fans, but all the bits Apple put together to produce the iPhone had existed before; Apple merely had the foresight to see how they could be put together in a way customers would love. The changes happen over time, and we're in the midst of them; sometimes that's difficult to remember. Sometimes it's really easy to remember, as our brand-new system or piece of architecture gives us headaches. Depends upon the day.

[Image generated at Cool Math]

So what is coming of age right now? Well, SSDs for one. They're being deployed in the numbers that were expected long ago, largely because prices have come down far enough to make them affordable. We offer an SSD option for some of our systems these days, and since the stability of our products is paramount to our customers' interests, we certainly aren't out there on the cutting edge with this development. They're stable enough for mission-critical use, and the uptick in sales reflects that fact. If you have a high-performance application that relies upon speedy database access, you might look into them. There are a lot of other valid places to deploy SSDs – tier one, for example – but a database is an easy win. If access times are impacting application performance, it is relatively easy to drop in an SSD drive and point the DB (the cache or the whole DB) at it, speeding performance of every application that relies on that DBMS. That's an equation that is pretty simple to figure out, even if the precise numbers are elusive: faster disk access = faster database response times = faster applications.

That is the same type of equation that led us to offer SSDs for some of our products. They sit in the network between data and the applications that need the data. Faster is better, assuming reliability, which, after years of tweaking and incremental development, SSDs offer. Another place to consider SSDs is in your virtual environment. If you have twenty VMs on a server, and two of them have high disk access requirements, putting SSDs into place will lighten the load on the overall system simply by reducing the blocking time spent waiting for disk responses. While some are starting to call for SSDs everywhere, remember that there were also some who said cloud computing meant no one should ever build out a datacenter again. The price of HDDs has gone down with the price of SSDs pushing them from the top, so there is still a significant cost differential, and frankly, a lot of applications just don't need the level of performance that SSDs offer. The final place I'll offer up for SSDs is if you are implementing storage tiering such as that available through our ARX product. If you have high-performance NAS needs, placing an SSD array as tier one behind a tiering device can significantly speed access to the files most frequently used.
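A quick footnote on that opening analogy, restated in calculus terms (my aside, not in the original post):

```latex
\frac{d}{dx}\,x^2 = 2x \quad\text{(grows without bound as } x \text{ grows)},
\qquad
\frac{d}{dx}\,2x = 2 \quad\text{(constant, no matter how large } x \text{ gets)}.
```

Which is the point of the metaphor: some parts of the industry change at a steady clip, others accelerate the bigger they get.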
That acceleration is global to the organization, too: all clients and apps that access the data receive the performance boost, making it another high-gain solution. Will we eventually end up in a market where old-school HDDs are a thing of the past and we're all using SSDs for everything? I honestly can't say. We have plenty of examples in high tech where, as demand went down, the older technology started to cost more, because margins plus volume equals profit. Tube monitors versus LCDs, a variety of memory types, and even big old HDDs – the 5.25 inch ones. But the key is whether SSDs can fulfill all the roles of HDDs, and whether you and I believe they can. That has yet to be seen, IMO. The arc of price reduction for both HDDs and SSDs plays in there also – if quality HDDs remain cheaper, they'll remain heavily used. If they don't, that market will get eaten by SSDs just because, all other things being roughly equal, speed wins.

It's an interesting time. I'm trying to come up with a plausible use for this puppy – an OCZ Technology 1 TB SSD – just so I can buy one and play with it. Suggestions are welcome; our websites don't have enough volume to warrant it, and using this monster for laptop backups would be extreme, though it would shorten my personal backup window ;-).

Related blogs:
The Golden Age of Data Mobility?
What Do You Really Need?
Use the Force Luke. (Zzzaap)
Don't Confuse A Rubber Stamp With Validation
On Cloud, Integration and Performance
Data Center Feng Shui: Architecting for Predictable Performance
F5 Friday: Performance, Throughput and DPS
F5 Friday: Performance Analytics–More Than Eye-Candy Reports
Audio White Paper - High-Performance DNS Services in BIG-IP ...
Analyzing Performance Metrics for File Virtualization
You Say Tomato, I Say Network Service Bus

It's interesting to watch the evolution of IT over time. I have repeatedly been told "you people, we were doing that with X back before you had a name for it!" And likely the speaker is telling the truth, as far as it goes. Seriously, while the mechanisms may be different, putting a ton of commodity servers behind a load balancer and tweaking for performance looks an awful lot like having LPARs that can shrink and grow. Put "dynamic cloud" into the conversation and the similarities become more pronounced. The biggest difference is how much you're paying for hardware and licensing.

Back in the day, Enterprise Service Buses (ESBs) were all the rage, able to handle communications between a variety of application sources and route things to the correct destination in the correct format, even providing guaranteed delivery if you needed it for transactional services. I trained in several of these tools, most notably IBM MQSeries (now called IBM WebSphere MQ, surprised?) and Microsoft MSMQ. I was briefed on a ton more during my time at Network Computing. In the end, they're simply message delivery and routing mechanisms that can translate along the way. Oh sure, with MQSeries Integrator you could include all sorts of other things like security callouts and such, but core functionality was restricted to message flow and delivery. While ESBs are still used today in highly mixed environments or highly complex application infrastructures, they're not deployed broadly in IT, largely because XML significantly reduced the need for the translation aspect, which was a primary use for them in the enterprise.

Today, technology is leading us to a parallel development that will likely turn out to be much more generically useful than ESBs. Others have referred to it by several names, but Network Service Bus is the closest I've seen in terms of accuracy, so I'll run with that term. This is routing, translation, and delivery across the network from consumer to the correct service. The service is running on a server somewhere, but that's increasingly less relevant to the consumer application; that the request gets serviced is sufficient. Serviced in a timely and efficient manner is a big part of it, too. Translation while servicing is seeing a temporary (though not short, in my estimation) bump while IPv4 is slowly supplanted by IPv6, but it has other uses – encrypted to unencrypted, for example.

The network of the future will use a few key strategic points of control – like the one between consumers and web servers – to handle routing to a service that is (a) active, (b) responsive, and (c) appropriate to the request. In the interim, while passing the request along, the strategic point of control will translate the incoming request into a format that the service expects, and if necessary will validate the user in the context of the service being requested and the username/platform/location the request is coming from. This offloads a lot from your apps and your servers. Encryption can be offloaded to the strategic point of control, freeing up a lot of CPU time by running unencrypted within your LAN while maintaining encryption on the public Internet.
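A minimal sketch of that routing decision in Python – a generic illustration, not anything F5 ships, with the health fields, thresholds, and pool structure all invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    active: bool          # (a) is the service up at all?
    latency_ms: float     # (b) how responsive has it been lately?
    handles: set          # (c) which request types it is appropriate for

def route(request_type: str, pool: list, max_latency_ms: float = 200.0) -> Backend:
    """Pick a backend that is active, responsive, and appropriate to the request."""
    candidates = [b for b in pool
                  if b.active
                  and b.latency_ms <= max_latency_ms
                  and request_type in b.handles]
    if not candidates:
        raise RuntimeError(f"no healthy backend for {request_type!r}")
    return min(candidates, key=lambda b: b.latency_ms)   # fastest eligible wins

pool = [
    Backend("app-01", active=True,  latency_ms=35.0,  handles={"web", "api"}),
    Backend("app-02", active=True,  latency_ms=400.0, handles={"web"}),   # too slow
    Backend("app-03", active=False, latency_ms=20.0,  handles={"api"}),   # down
]
print(route("api", pool).name)   # -> app-01
```

Translation and validation would hang off the same decision point: once you own the hop between consumer and service, rewriting the request format or checking the user's context is a matter of adding steps before the hand-off.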
IPv6 packets can be translated to IPv4 on the way in and back to IPv6 on the way out, so you don't have to switch everything in your datacenter over to IPv6 at once; security checks can occur before the connection is allowed inside your LAN; and scalability gets a major upgrade because you now have a device in place that will route traffic according to the current back-end configuration. Adding and removing servers and upgrading apps all benefit from a strategic point of control that allows you to maintain a given public IP while changing the machines that service requests as needed.

And then we factor in cloud computing. If all of this functionality – or at least a significant chunk of it – were available in the cloud, regardless of cloud vendor, then you could ship overflow traffic to the cloud. There are a lot of issues to deal with, like security, but they're manageable if you can handle all of the other service requests as if the cloud servers were part of your everyday infrastructure. That's a datacenter of the future. Let's call it a tomato. In the end it makes your infrastructure more adaptable while giving you a point of control that you can harness to implement whatever monitoring or functionality you need. And if you have several of those points of control – one to globally load balance, one for storage, one in front of servers – then you are offering services that are highly adaptable to fluctuations in usage. Like having a tomato, right in the palm of your hands.

Completely irrelevant observation: the US Bureau of Labor Statistics (BLS) mentioned today that IT unemployment is at 3.3%. Now there's a bright spot in our economic doldrums.
F5 Friday: CSG Case Study Shows Increased Performance, Less WAN Traffic With Dell and F5

When time and performance mattered, CSG Content Direct turned to Dell and F5 to make their replication faster while reducing WAN utilization. We talk a lot in our blogs about what benefits you could get from an array of F5 products, so when this case study (pdf link) hit our inboxes, we thought you'd like to hear about what CSG's Content Direct did get out of deploying F5 BIG-IP WOM. Utilizing tools from two of the premier technology companies in the world, Content Direct was able to decrease backup windows to as little as 5% of their previous time and reduce traffic on the WAN significantly.

At the heart of the problem was WAN performance that was inhibiting their replication to a remote datacenter and causing them to fall further and further behind. Placing BIG-IP WOM between their Dell EqualLogic iSCSI devices, Content Direct was able to improve performance to the point that they are now able to meet their RPOs and RTOs with room for expansion. Since Content Direct had already deployed F5 BIG-IP LTM, they were able to implement this solution by purchasing and installing the F5 BIG-IP WAN Optimization Module (WOM) on the existing BIG-IP hardware, eliminating the need for new hardware.

The improvements they saw while replicating iSCSI devices are in line with the improvements our testing has shown for NAS device replication as well, making this case study a good examination of what you can expect from BIG-IP WOM in many environments. Since BIG-IP WOM supports a wide array of applications – from the major NAS vendors to the major database vendors – and includes offloading of encryption from overburdened servers, you can deploy it once and gain benefits at many points in your architecture. If you are sending a lot of data between two datacenters, BIG-IP WOM has help for your overburdened WAN connection. Check out our White Papers and Solution Profiles relevant to BIG-IP WOM for more information about how it might help, and which applications have been tested for improvement measurements. Of course BIG-IP WOM works on IP connections, and as such can improve many more scenarios than we have tested or even could reasonably test, but the applications tested will give you a feel for the amount of savings you can get when deploying BIG-IP WOM on your WAN. And if you are already a BIG-IP LTM customer, you can upgrade to include WOM without introducing a new device into your already complex network.

Related Blogs:
F5 Friday: Speed Matters
F5 Friday: Performance, Throughput and DPS
F5 Friday: A War of Ecosystems
F5 Friday: IPv6 Day Redux
F5 Friday: Spelunking for Big Data
F5 Friday: The 2048-bit Keys to the Kingdom
F5 Friday: ARX VE Offers New Opportunities
F5 Friday: Eliminating the Blind Spot in Your Data Center Security ...
F5 Friday: Gracefully Scaling Down
F5 Friday: Data Inventory Control
The Question Is Not "Are You Ready For Cloud Storage?"

I recently read a piece in Network Computing Magazine that was pretty disparaging of NAS devices, and with a hand-wave the author pronounced NAS dead, long live cloud storage. Until now, storage has been pretty much immune to the type of hype that "The Cloud" gets. Sure, there have been some saying that we should use the cloud for primary storage, and others predicting that it will kill this or that technology, but storage has largely been spared the outrageous and intangible claims that accompany placing your applications in the cloud. My favorite, repeated even by a lot of people I respect, is that cloud mystically makes you greener. Okay, I'll sidetrack for a moment and slay that particular demon yet again, because it is just too easy. Virtualization makes you greener by running more apps on less hardware. Moving virtualized anything to the cloud changes not one iota of carbon footprint, because it still has to run on hardware. So if you take 20 VMs from one server and move them to your favorite cloud provider, you have moved where they are running, but they are certainly running on at least one server. Just because it is not your datacenter does not change the fact that it is in a datacenter. Not greener, not a smaller carbon footprint. But this column was awash with the claim that cloud storage is it. We no longer need those big old NAS boxes, and they can just go away from the datacenter, starting with the ones that have been cloudwashed. The future is cloudy, cloouuuudddyyy.

Okay, let us just examine a hypothetical corporation for a moment – I'll use my old standby, Zap-N-Go. Sally, the CIO of Zap-N-Go, is under pressure to "do something with the cloud!" or "identify three applications to move to the cloud within the next six months!" Now this is a painful way to run an IT shop, but it's happening all over, so Sally assigns Bob to check out the possibilities, and Bob suggests that moving storage to the cloud might be a big win because of the cost of putting in a new NAS box. They work out a plan to move infrequently accessed files to the cloud as a test of robustness, but that's not a bold enough step for the rest of senior management, so their plan to test the waters turns into a full-blown movement of primary data to the cloud. Now this may be a bit extreme – Sally, like any good CIO, would dig in her heels at this one – but bear with me.

They move primary storage to the cloud on a cloudy Sunday, utilizing ARX or one of the other cloud-enabled devices on the market, and start to reorganize everything so that people can access their data. On Monday morning, everyone comes in and starts to work, but work is slow; nothing is performing like it used to. The calls start coming in to the help desk: "Why is my system so slow?" And then the CEO calls Sally directly. "It should not take minutes to open an Excel spreadsheet," he harrumphs. And Sally goes down to help her staff figure out how to improve performance. Since the storage move was the day before, everyone knows the ultimate source of the problem; they're just trying to figure out what is happening. Sue, the network wizard, pops off with "Our Internet connection is overloaded," and everyone stops looking. After some work, the staff is able to get WOM running with the cloud provider to accelerate data flowing between the two companies… But doing so in the middle of the business day has cost the company money, and Sally is in trouble.
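A back-of-the-envelope illustration of why Monday morning hurt, with numbers of my own choosing rather than anything from the post: say the spreadsheet is 25 MB, the LAN runs at 1 Gbps, and the office shares a 100 Mbps Internet uplink among 50 people hitting cloud storage at once.

```latex
t_{\text{LAN}} = \frac{25\ \text{MB} \times 8\ \text{bits/B}}{1000\ \text{Mbps}} = \frac{200\ \text{Mb}}{1000\ \text{Mbps}} \approx 0.2\ \text{s},
\qquad
t_{\text{WAN}} = \frac{200\ \text{Mb}}{100\ \text{Mbps} / 50\ \text{users}} = \frac{200\ \text{Mb}}{2\ \text{Mbps}} = 100\ \text{s}.
```

Deduplication, compression, and protocol optimization – what the staff eventually gets from WOM – attack the numerator; only a bigger pipe attacks the denominator.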
After days of redress meetings, and acceptable if not perfect performance, all seems well, and Sally can report to the rest of upper management that files have been moved to the cloud, and that a low monthly fee will now be paid instead of large incremental chunks of budget going to new NAS devices.

It's Almost Ready for Primary Storage…

Until the first time the Internet connection goes down. And then, gentle reader, Sally's and Bob's résumés will end up on your desk, because they will not survive the aftermath of "no one can do anything." Cloud in general and cloud storage in particular have amazing promise – I really believe that – but pumping it full of meaningless hyperbole does no one any good. Not IT, not the business, and not whatever you're hawking. So take such proclamations with a grain of salt and keep your eye on the goal: secure, fast, and agile solutions for your business, not going "all in" like it's a poker table. And don't let such buffoons sour you on the promise of cloud. While I wouldn't call them visionary, I do see a day when most of our storage and apps are in a cloud somewhere. It's just not tomorrow. Or next year. Next year archiving and tier three will be out there; let's just see how that goes before we start discussing primary storage.

…And Ask Not "Are We Ready For Cloud Storage?" but rather "Is Cloud Storage Ready For Us?"

My vote? Archival and tier three are getting a good workout; start there.
Databases in the Cloud Revisited

A few of us were talking on Facebook the other day about high speed rail (HSR) and where/when it makes sense, and I finally said that it almost never does. Trains lost out to automobiles precisely because they are rigid and inflexible, while population densities and travel requirements are highly flexible. That hasn't changed since the early 1900s, and isn't likely to in the future, so we should be looking at different technologies to answer the problems that HSR tries to address. And since everything in my universe is inspiration for either blogging or gaming, this led me to reconsider the state of cloud and the state of cloud databases in light of synergistic technologies (did I just use "synergistic technologies" in a blog? Arrrggghhh…).

There are several reasons why your organization might be looking to move out of a physical datacenter, or to have a backup datacenter that is completely virtual. Think of the disaster in Japan or Hurricane Katrina. In both cases, having even the mission-critical portions of your datacenter replicated to the cloud would keep your organization online while you recovered from all of the other very real issues such a disaster creates. In other cases, if you are a global organization, the cost of maintaining your own global infrastructure might well be more than utilizing a global cloud provider for many services… Though I've not checked, if I were CIO of a global organization today, I would be looking into it pretty closely, particularly since this option should continue to get more appealing as technology continues to catch up with the hype.

Today though, I'm going to revisit databases, because like trains, they are in one place and are rigid. If you've ever played with database continuous data protection or near-real-time replication, you know this particular technology area has issues that are only now starting to see technological resolution. Over the last year, I have talked about cloud and remote databases a few times, covering early options for cloud databases and mentioning Oracle GoldenGate – or praising GoldenGate is probably more accurate.

[Going to the west in the US? HSR is not an option.]

The thing is that the options get a lot more interesting if you have GoldenGate available. There are a ton of tools these days, both integral to database systems and third-party, that allow you to encrypt data at rest, and while it is not the most efficient access method, it does make your data more protected. Add to this capability the functionality of Oracle GoldenGate (or, if you don't need heterogeneous support, any of the various database replication technologies available from Oracle, Microsoft, and IBM) and you can seamlessly move data to the cloud behind the scenes, without interfering with your existing database. Yes, initial configuration of database replication will generally require work on the database server, but once configured, most of these tools run without interfering with the functionality of the primary database in any way – though if it is one that runs inside the RDBMS, remember that it will use up CPU cycles at the least, and most will work inside of a transaction so that they can ensure transaction integrity on the target database, so know your solution.
Running inside the primary transaction is not necessary, and for many uses may not even be desirable, so if you want your commits to happen rapidly, something like GoldenGate, which spawns a separate transaction for the replica, is a good option… Just remember that you then need to pay attention to alerts from the replication tool, so that you don't end up with successful transactions on the primary not getting replicated because something went wrong with the transaction on the secondary. But for DBAs, this is just an extension of their daily work, as long as someone is watching the logs.

With the advent of GoldenGate, advanced database encryption technology, and products like our own BIG-IP WOM, you now have the ability to drive a replica of your database into the cloud. This is certainly a boon for backup purposes, but it also adds an interesting perspective to application mobility. You can turn on replication from your data center to the cloud, or from cloud provider A to cloud provider B, then use vMotion to move your application VMs… and you're off to a new location. If you think you'll be moving frequently, this can all be configured ahead of time, so you can flick a switch and move applications at will.

You will, of course, have to weigh the impact of complete or near-complete database encryption against the benefits of cloud usage. Even if you use the adaptability of the cloud to speed encryption and decryption operations by distributing them over several instances, you'll still have to pay for that CPU time, so there is a balancing act that needs some exploration before you'll be certain this solution is a fit for you. And at this juncture, I don't believe putting unencrypted corporate data of any kind into the cloud is a good idea. Every time I say that, it angers some cloud providers, but frankly, cloud being new and by definition shared resources, it is up to the provider to prove it is safe, not up to us to take their word for it. Until then, encryption is your friend, both going to/from the cloud and at rest in the cloud. I say the same thing about cloud storage gateways; it is just a function of the current state of cloud technology, not some kind of unreasoning bias.

So the key then is to make sure your applications are ready to be moved. This is actually pretty easy in the world of portable VMs, since the entire VM will pick up and move. The only catch is that you need to make sure users can get to the application at the new location. There are a ton of global DNS solutions, like F5's BIG-IP Global Traffic Manager, that can get your users where they need to be, since your public-facing IPs will be changing when moving from organization to organization. Everything else should be set, since you can use internal IP addresses to communicate between your application VMs and database VMs. Utilizing some form of in-flight encryption and some form of acceleration for your database replication will round out the solution architecture, and leave you with a road map that looks more like a highway map than an HSR map. More flexible, more pervasive.
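To make the out-of-band replication point concrete, here is a minimal Python sketch – a generic illustration of the pattern, not how GoldenGate itself is built – of a primary that commits immediately and ships changes to the replica asynchronously, with the failure tracking the post says someone has to watch:

```python
import queue
import threading

changes: "queue.Queue[dict]" = queue.Queue()   # change records awaiting replication
failures: list = []                            # what your monitoring must watch

def commit_on_primary(change: dict) -> None:
    """The primary's transaction completes here; replication happens later."""
    # ... write to the primary database and commit ...
    changes.put(change)                        # hand off to the replication worker

def apply_to_replica(change: dict) -> None:
    # ... apply the change in its own transaction on the secondary ...
    if change.get("poison"):                   # simulate a failure on the replica
        raise RuntimeError("replica apply failed")

def replication_worker() -> None:
    while True:
        change = changes.get()
        try:
            apply_to_replica(change)
        except Exception as err:
            # This is where an alert should fire: a committed primary transaction
            # is now missing from the replica until someone intervenes.
            failures.append({"change": change, "error": str(err)})
        finally:
            changes.task_done()

threading.Thread(target=replication_worker, daemon=True).start()
commit_on_primary({"table": "orders", "id": 1})
commit_on_primary({"table": "orders", "id": 2, "poison": True})
changes.join()
print(f"unreplicated changes needing attention: {len(failures)}")
```

The primary never waits on the secondary, which keeps commits fast; the cost is exactly the monitoring obligation described above.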
Smart Energy Cloud? Sounds like fun.

Having just returned from our annual D&D tournament, this year in Las Vegas, I have role-playing on the mind, so when I read the title of Elizabeth White's blog, IBM and Cable & Wireless to Develop UK Smart Energy Cloud, I immediately thought of the AD&D Druid spell Call Lightning, which gathers clouds and then emits lightning every ten minutes until it runs out. Which is kind of in line with what her blog is talking about: two companies with a history in smart energy grids getting together to make it a reality. Most striking to me is the following quote from the article:

"The challenge is for smart meters to reach the entire UK population and this will require a combination of enabling solutions, such as GPRS, radio and Power Line Carrier to make sure it's cost effective. However, it is the network connecting it all and intelligent data management, that is central to the smart agenda's success."

Yes indeed, that's one of the challenges. Regular readers will note that I've talked about how complex this gets and how fast. We had everything – power line carrier, telephone, RIM, local wireless, and cell phone towers – being used to get data back to our DC when I was running just such a project. It was a lot of work both in IT and in the field with engineering; we had some great guys on the engineering side to keep things coming, and my staff was busy pretty much non-stop keeping up with the IT end of the problem. But the private cloud and entire-country bits add some interesting twists to the problem domain, like the volume of data that is transferred and the bandwidth required to handle that much data.

An individual meter read is not a huge chunk of data, and read once a day it does not, even with header information, amount to a substantial amount of data. For most uses a single daily read fits inside an IP packet, which is not much bother on a decent-sized WAN. The problem comes up when there are hundreds of thousands (or in the UK's case no doubt millions) of meters and many of them need 15-minute or even five-minute interval reads, or there are meters that can gauge power being put back onto the grid from local sources, which needs to be compared to usage figures. It rapidly becomes not only complex data-wise, which no doubt they're hoping having it all in one major cloud will resolve, but also complex network-bandwidth-wise.

This would be one heck of a project to work on. There is a little of everything in meter reading, and I really enjoyed the technology part of that job. You have infosec: is the data protected? You have database: is the data in a common format (much of our internal application development was on this topic)? You have communications issues: is the communications method being used for a given meter adequate for the requirements of not just that meter but the entire collection of meters that use the medium, and can you set reading schedules to maximize bandwidth usage over the course of a day? There are analytics: can you drag patterns out of the data that will help your generation units better provide for the needs of the customer base? That sounds like a lot of the focus of this project. And there is systems administration, because the article lists three communications mechanisms, which means at least three systems to read meters. We had many, all required to meet our goal of 100% coverage, because truly rural places or "end of the line" customers often could not be well serviced by power line carrier solutions, which were our first choice.
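Some rough arithmetic on why the bandwidth becomes the interesting part. The numbers below are my own illustration, not from the article: assume on the order of 25 million meters, five-minute interval reads (288 per day), and roughly 100 bytes per read including headers.

```latex
25{,}000{,}000\ \text{meters} \times 288\ \tfrac{\text{reads}}{\text{day}} \times 100\ \text{B}
  \approx 7.2 \times 10^{11}\ \text{B/day} \approx 720\ \text{GB/day},
\qquad
\frac{7.2 \times 10^{11}\ \text{B} \times 8\ \tfrac{\text{bits}}{\text{B}}}{86{,}400\ \text{s}} \approx 67\ \text{Mbps}.
```

The sustained average is modest for a national network; the trouble is that reads don't arrive evenly, retries and security handshakes add overhead, and the reading schedules mentioned above exist precisely to keep the peaks from swamping whichever medium a given meter happens to use.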
They're going to add "build a cloud" and "manage a massive network" to an already massively complex undertaking. If you're a geek in the UK, see if they're looking for help; you'll have a blast and learn a lot. Having been through most of this, I'm focused on learning about the cloud bit. Moving that much data around in a timely manner is difficult, and we certainly had a fair number of translation programs to get the data into a unified format that could be utilized for billing, analysis, and troubleshooting. But using private cloud might relieve much of that application overhead by allowing more translations to occur in real time, and utilizing IBM might give a leg up on standardizing the smart meter communications themselves (even "standardized" isn't "interoperable", just like every other IT field), so then the networking aspect becomes the bottleneck. Can you switch enough servers over to task X to accomplish the job? Do you have acceleration in place to get the data transmitted down to the smallest number of bytes possible? Is the security as secure as possible, or merely "good enough"? And once you start talking security, is there an upgrade path for that security?

Seriously, most of the functionality of a meter never needs to be updated; it's reading usage. But the tools used to secure those connections are guaranteed to become outdated over time, so what's the upgrade path? Is security in the network (a la our BIG-IP WAN Optimization Module (WOM), which wouldn't work for this particular solution because it requires symmetric deployment, but which is an example of offloading security to a place that can be easily updated), or is security hard-bundled into the smart meter, touching every piece of it? My guess is the latter, just so that the reads can be stored encrypted locally and the meter can be tamper-proofed. Both are worthy reasons, but that means instead of upgrading one tiny bit when security becomes dated, you'll have to upgrade an array of systems on the meter.

It would also be cool to see how the private cloud is locked down, because one big pool of any kind of data becomes a target for curious or malicious hackers just because it is a lot of information collected at one point. Point-to-point security within the cloud is probably going to be a requirement, even if the perimeter is well locked down, so that's a huge overhead; it would be interesting to see how they're implementing it and what the cost is in terms of on-server encryption versus offload devices.

Apparently you can take the geek out of IT, but you can't take IT out of the geek. I'd love to see the design and analysis of this monster – from the cloud to the meter – and see how far we've come since I was running a project like this. I'll bet there are a lot of similarities and a lot of differences. You might want to follow along too. There are a lot of uses for private cloud, and if this goes forward, IBM and Cable & Wireless are going to have experience in highly redundant, highly efficient private cloud implementations. Any word on how or what they're doing might be an indicator of what you can expect moving forward in the growing world of private cloud, which is driven by the desire for more IT efficiency; these people will be proving or disproving that reasoning over the next several years. Like the AD&D Druid spell though, this is going to take time, longer than we'd like.
Sometimes we in IT forget the massive investment of time and resources that this level of project requires, so be patient if you want to watch them; the clouds have to be summoned and the energy built up.
Stop Repeating Yourself. Deduping WAN-Opt Style

Ever hang out with a person who just wants to make their point, and no matter what the conversation is, says the same thing over and over in slightly different ways? Ever want to tell them they were doing their favorite cause/point/whatever a huge disservice by acting like a repetitive fool? That's what your data is doing when you send it across the WAN. Ever seen the data in a database file? Or in your corporate marketing documents? R E P E T I T I V E. And under a normal backup or replication scenario – or a remote office scenario – you are sending the same sequence of bytes over and over and over. Machines may be quad-word these days, but your pipe is still measured in bits. That means even most of your large integers have 32 bits of redundant zeroes. Let's not talk about all the places your corporate logo appears in files, or how many times the word "the" appears in your documents.

It is worth noting, for those of you just delving into this topic, that WAN deduplication shares some features and even technologies with storage deduplication, but because the WAN has to handle an essentially unlimited stream of data running through it, and does not have to store that data and keep differentials or anything moving forward, it is a very different beast than disk-based deduplication. WAN deduplication is more along the lines of "fire and forget" (though forget is the wrong word, since it keeps duplicate info for future reference) than storage deduplication, which is "fire and remember exactly what we did."

Thankfully, your data doesn't have feelings, so we can offer a technological solution to its repetitive babbling. There is a growing number of products out there that tell your data "Hey! Say it once and move on!" These products either are, or implement, in-flight data deduplication. These devices require a system on each end – one to dedupe, one to rehydrate (the receive side is sketched below, after the related links) – and there are a variety of options the developer can choose, along with a few that you can choose, to make the deduplication of higher or lower quality. Interestingly, some of these options are perfect for one customer's data set and not at all high-return for others. So I thought we'd talk through them generically, giving you an idea of what to ask your vendor when you consider deduplication as part of your WAN optimization strategy.

Related Articles and Blogs:
WAN Optimization Continues to Evolve
Best Practices for Deploying WAN Optimization with Data Replication
Like a Matrushka, WAN Optimization is Nested
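Since the products deploy as a symmetric pair, here is the rehydrate-side counterpart to the toy dedupe sketch in the first post above. Like that one, it is my own illustration with an assumed token format, not any vendor's wire protocol:

```python
import hashlib

CHUNK = 1024  # must match the sender's chunking; an illustrative assumption

def rehydrate(tokens: list, seen: dict) -> bytes:
    """Rebuild the original stream from a mix of literal chunks and references.
    Both endpoints must keep the same fingerprint-to-chunk dictionary, which is
    why in-flight deduplication needs a box on each end of the WAN link."""
    out = bytearray()
    for kind, payload in tokens:
        if kind == "raw":                 # literal bytes: remember them and emit
            seen[hashlib.sha256(payload).digest()] = payload
            out.extend(payload)
        elif kind == "ref":               # reference: look up the cached chunk;
            out.extend(seen[payload])     # a KeyError here means the two caches
        else:                             # have drifted out of sync
            raise ValueError(f"unknown token type: {kind!r}")
    return bytes(out)

# Minimal round trip: the sender shipped one literal chunk, then a reference to it.
chunk = b"x" * CHUNK
fingerprint = hashlib.sha256(chunk).digest()
restored = rehydrate([("raw", chunk), ("ref", fingerprint)], seen={})
print(len(restored))   # -> 2048 bytes rebuilt from ~1 KB plus a 32-byte reference
```

The quality knobs the post alludes to live mostly on the dedupe side (chunk size, cache size and placement, how references are encoded); the rehydrate side mainly has to be fast and keep its dictionary in lockstep.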
Our data is so deduped that no two bits are alike!

Related Articles and Blogs:
Dedupe Ratios Do Matter (NWC)
Ask Dr Dedupe: NetApp Deduplication Crosses the Exabyte Mark (NetApp)
Dipesh on Dedupe: Deduplication Boost or Bust? (CommVault)
Deduplication Ratios and their Impact on DR Cost Savings (About Restore)
Make the Right Call (Online Storage Optimization) – okay, that one's a joke
BIG-IP WAN Optimization Module (f5 – PDF)
Like a Matrushka, WAN Optimization is Nested (F5 DevCentral)