Copied Data. Is it a Replica, Snapshot, Backup, or an Archive?
It is interesting to me the number of variant Transformers that have been put out over the years, and the effect that has on those who like transformers. There are four different “Construction Devastator” figures put out over the years (there may be more, I know of four), and every Transformers collector or fan that I know – including my youngest son – want them all. That’s great marketing on the part of Hasbro, for certain, but it does mean that those who are trying to collect them are going to have a hard time of it, just because they were produced and then stopped, and all of them consist of seven or more parts. That’s a lot of things to go wrong. But still, it is savvy for Hasbro to recognize that a changed Transformer equates to more sales, even though it angers the diehard fans.
As time moves forward, technology inevitably changes things. In IT that statement implies “at the speed of light”. Just like your laptop has been replaced with a newer model before you get it, and is “completely obsolete” within 18 months, so other portions of the IT field are quickly subsumed or consumed by changes. The difference is that IT is less likely to get caught up in the “new gadget” hype than the mass market. So while your laptop was technically outdated before it landed in your lap, IT knows that it is still perfectly usable and will only replace it when the warrantee is up (if you work for a smart company) or it completely dies on you (for a company pinching pennies). The same is true in every piece of storage, it is just that we don’t suffer from “Transformer Syndrome”. Old storage is just fine for our purposes, unless it actually breaks. Since you can just continue to pay annual licensing fees, there’s no such thing as “out of warrantee” storage unless you purchase very inexpensive, or choose to let it lapse. For the very highest end, letting it lapse isn’t an option, since you’re licensing the software. The same is true with how we back up and restore that data.
Devastator, image courtesy of Gizmodo.com
But even with a stodgy group like IT, who has been bitten enough times to know that we don’t change something unless there’s a darned good reason, eventually change does come. And it’s coming to backup and replication.
There are a lot of people still differentiating between backups and replication. I think it’s time for us to stop doing so. What are the differences? Let’s take a look.
- Backups go to tape. Hello Virtual Tape Libraries, how are you?
- Backups are archival. Hello tiering, you allow us to move things to different storage types, and replicate them at different intervals, right? So all is correctly backed up for its usage levels?
- Replication is near-real-time. Not really. You’re thinking of Continuous Data Protection (CDP), which is gaining traction by app, not broadly.
- Replication goes to disk and that makes it much faster. See #1. VTL is fast too.
- Tape is slow. Right, but that’s a target problem, not a backup problem. VTLs are fast.
- Replication can do just the changes. Yeah, why this one ever became a myth, I’ll never know, but remember “incremental backups”? Same thing.
I’m not saying they’re exactly the same – incremental replicas can be reverse applied so that you can take a version of the file without keeping many copies, and that takes work in a backup environment, what I AM saying is that once you move to disk (or virtual disk in the case of cloud storage), there isn’t really a difference worthy of keeping two different phrases. Tape isn’t dead, many of you still use a metric ton of it a year, but it is definitely waning, slowly. Meaning more and more of us are backing up or replicating to disk.
Where did this come from? A whitepaper I wrote recently came back from technical review with “this is not accurate when doing backups”, and that got me to thinking “why the heck not?” If the reason for maintaining two different names is simply a people reason, while the technology is rapidly becoming the same mechanisms – disk in, disk out, then I humbly suggest we just call it one thing, because all maintaining two names and one fiction does is cause confusion.
For those who insist that replicas are regularly updated, I would say making a copy or snapshotting them eliminates even that difference – you now have an archival copy that is functionally the same as a major backup. Add in an incremental snapshot and, well, we’re doing a backup cycle.
With tiering, you can set policies to create snapshots or replicas on different timelines for different storage platforms, meaning that your tier three data can be backed up very infrequently, while your tier one (primary) storage is replicated all of the time. Did you see what I did there? The two are used interchangeably. Nobody died, and there’s less room for confusion.
Of course I think you should use our ARX to do your tiering, ARX Cloud Extender to do your cloud connections, and take advantage of the built-in rules engine to help maintain your backup schedule. But the point is that we just don’t need two names for what is essentially the same thing any more.
So let’s clean up the lingo. Since replication is more accurate to what we’re doing these days, let’s just call it replication. We have “snapshot” that is already associated with replication for point-in-time copies, which makes us able to differentiate between a regularly updated replica and a frozen-in-time “backup”. Words fall in and out of usage all of the time, let’s clean up the tech lingo and all get on the same language.
No, no we won’t, but I’ve done my bit by suggesting it. And no doubt there are those confused by the current state of lingo that this will help to understand that yes, they are essentially the same thing, only archaic history keeps them separate. Or you could buy all three – replicate to a place where you can take a snapshot and then back up the snapshot (not as crazy as it sounds, I have seen this architecture deployed to get the backup process out of production, but I was being facetious). And you don’t need a ton of names. You replicate to secondary (tertiary) storage, then take a snapshot, then move or replicate the snapshot to a remote location – like the cloud or remote datacenter. Not so tough, and one term is removed from the confusion, inadvertently adding crispness to the other terms.