Data safety: Backups
OK, now for the topic that most folks treat like going to the dentist: backups. You almost certainly have several types of data whose loss would at worst be devastating and at best be time-consuming and painful to recover from. However, my guess is that you are not currently safeguarding this data well enough to prevent loss or, if it is lost, to recover it. In fact, most homes and small businesses have either no backups or woefully inadequate ones.
So, let’s get some basic terminology down, so that it’s clear what’s being discussed:
- Backup – It’s the act of making a controlled copy of your data. There are two basic types of backups:
  - Full – This backs up the entire dataset at a given time. Most backup solutions perform a full backup initially, then perform incremental backups thereafter.
  - Incremental – This backs up only the files that have changed since the last backup. This is critical since an average computer holds 50-150 GB of data, but only a few MB change between backups. Backing up 100 GB can take half a day, while a few MB will take only minutes.
- Data Reconstruction – Data that is recoverable through an alternative to backups. Reinstalling the OS or applications is a form of reconstruction, as is the rebuilding of a RAID set.
- Dataset – The collection of files being protected via backup.
- NAS Drive – This refers to a Network Attached Storage device. Though NAS comes in all sorts of sizes, configurations and protocols, for our purposes it refers to a storage solution that has the following attributes:
  - Network attached – It resides on your LAN and is accessible by all devices with the appropriate credentials. BTW: An available filesystem on an alternate computer is also considered to be network attached.
  - Mirrored drives – It has at least 2 drives configured with RAID-1, which means every write is simultaneously written to both drives so that if one drive is unavailable, the data is still secure and accessible on the other drive.
  - Multi-protocol access – It can appear as a remote filesystem that speaks AFP (Apple Filing Protocol) for Macs, CIFS for Windows and NFS for Linux.
- RAID set – The set of disks in a RAID configuration that act as one drive. When a disk is replaced, the RAID set needs to be rebuilt, which will take several hours, but the system will remain available during the rebuild.
- Restore – It’s the act of moving a backed up copy of all or part of the dataset back into place for use.
- Versioning – The backups contain different versions of a given file, usually indexed by time.
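To make the full-versus-incremental distinction concrete, here is a minimal sketch in Python. It is not how any particular backup product works; the function name `incremental_backup` and the idea of passing in the time of the previous backup are my own assumptions for illustration. It simply copies files whose modification time is newer than the last backup.

```python
import shutil
from pathlib import Path
from typing import List


def incremental_backup(source: Path, dest: Path, last_backup_time: float) -> List[Path]:
    """Copy files under `source` that changed after `last_backup_time` into `dest`.

    `last_backup_time` is seconds since the epoch (an assumption of this sketch).
    Passing 0.0 copies every file, which amounts to a full backup.
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        # Only files modified since the previous backup are copied.
        if path.stat().st_mtime > last_backup_time:
            rel = path.relative_to(source)
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves the modification time
            copied.append(rel)
    return copied
```

A first run with `last_backup_time=0.0` behaves like a full backup; later runs pass the time of the previous run and copy only what changed. Note that this sketch relies on file modification times and does not notice deleted files, which real backup software has to handle.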
Here’s a list of potential losses, with associated requirements:
- Bad disk – This is a lot more common than most people realize. Yearly disk failure rates range from 2%-10%, with lower rates within a disk’s first couple of years and higher rates as the disk ages. This means that for every 100 drives, 2-10 will fail in any given year. Said another way: If you have 5 disks in your home, you have a good chance of losing one within 2-3 years. Environment, activity and power-cycling also affect failure rates. The requirement is that the data is replicated elsewhere or there is some method to reconstruct the data.
- Hardware loss/theft – If the laptop that has the manuscript for your great-American-novel on it is stolen or lost, you are in a bad way. Again, you need to have the data replicated elsewhere.
- Fat-Fingered-Freddy – We’ve all done it: made a change to a file that corrupts it, or simply deleted the file. Frequently, we don’t even know we’ve done it until several days have passed and we try to access it. This is known in the industry as “fat fingering”. The requirement is that we can get an older version of the file from before it was messed up, as it’s likely that we’ve backed up at least one copy of the corrupted file by then.
- Backup/recovery failure – During backup and recovery, there are several stages where a failure can occur. The backed-up data could be silently corrupted during backup. The disk the backup resides on can go bad (see Bad disk above), or a failure can occur during recovery. The requirement is that we have at least two different forms of backup, preferably in different locations, so that a failure of one can be recovered from the other.
- Types of data – Though not strictly a loss scenario, the type of data places requirements on your backup strategy. For example, pictures and archived files typically are stored once and do not change. However, the files containing emails, Quicken data and the like do change frequently, demanding file versioning and frequent backups.
- Catastrophic failure – How do you recover your critical data if your home is destroyed, including all of the electronics? The requirement here is off-site backups, which come in two forms: backup services in the “cloud” and copies of files stored in alternative locations like a safe-deposit box or Dropbox.
- Automation – Data replication and backups need to be performed in an automated fashion. Just doing them when you think about it will not be sufficient to ensure they get done and are done frequently enough.
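The disk failure numbers in the Bad disk item can be checked with a little arithmetic. The sketch below assumes that drives fail independently and at a constant yearly rate, which is a simplification (the rates above vary with a disk's age), and the function name is just for illustration.

```python
def prob_at_least_one_failure(disks: int, annual_failure_rate: float, years: float) -> float:
    """Chance that at least one of `disks` fails within `years`.

    Assumes independent failures at a constant annual rate (a simplification).
    """
    survive_one = (1.0 - annual_failure_rate) ** years  # one disk survives the whole period
    return 1.0 - survive_one ** disks                   # complement of "all disks survive"


for rate in (0.02, 0.05, 0.10):
    p = prob_at_least_one_failure(disks=5, annual_failure_rate=rate, years=3)
    print(f"annual rate {rate:.0%}: {p:.0%} chance of losing a disk in 3 years")
```

Under these assumptions, 5 disks over 3 years give roughly a 26% chance of at least one failure at a 2% yearly rate, about 54% at 5%, and about 79% at 10%. So losing a disk within a few years is more likely than not once you are in the middle or upper part of the range.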
So, to summarize: You really need a couple of backup solutions, one local and one remote. Requirements for the local version include:
- Frequent, automated backups – I’d recommend at least daily, unless your dataset doesn’t change often.
- Detached disk for storage – You need a disk that is not part of your computer for storing backups. This means either a USB external disk, space on an alternate computer or a NAS solution.
- File Versioning – Though the basic backup strategy doesn’t need file versioning, being able to browse and recover a particular version of a file is highly useful, especially with fat-fingered issues.
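File versioning can be as simple as keeping timestamped copies. The following is a rough sketch only; the function names, the `versions_dir` layout and the timestamp suffix are all assumptions of mine, not what Time Machine or any other product does internally.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def save_version(file: Path, versions_dir: Path, timestamp: Optional[float] = None) -> Path:
    """Copy `file` into `versions_dir` under a timestamped name and return the copy's path."""
    ts = time.time() if timestamp is None else timestamp
    # UTC stamp such as 20240131T093000; it sorts chronologically as plain text.
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(ts))
    versions_dir.mkdir(parents=True, exist_ok=True)
    target = versions_dir / f"{file.name}.{stamp}"
    shutil.copy2(file, target)
    return target


def list_versions(file_name: str, versions_dir: Path) -> List[Path]:
    """Return the saved versions of `file_name`, oldest first."""
    return sorted(versions_dir.glob(f"{file_name}.*"))
```

With something like this in place, recovering from a fat-fingered edit means picking the newest entry from `list_versions(...)` that predates the mistake and copying it back.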
For remote backups, the requirements include:
- Automated backups – Unless remote backups are your primary form of backup, they can be less frequent, as they are primarily backing up your local backups and providing for catastrophic loss.
- Critical data backups – With remote backups, you won’t back up any data that can be reconstructed, like your computer’s OS or applications. If you store your email on a cloud service, you won’t need to perform a remote backup of email. The reason for limiting your dataset is two-fold: First, cloud storage is much more expensive than local storage. Second, backing up and restoring over the Net is very slow compared to a LAN. So, you’ll only want to store critical datasets.
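To get a feel for why the size of the remote dataset matters, here is a rough transfer-time estimate. The link speeds below are hypothetical placeholders, not measurements; plug in the sustained rates you actually see on your own network and Internet connection.

```python
def transfer_hours(size_gb: float, throughput_mbit_per_s: float) -> float:
    """Hours needed to move `size_gb` gigabytes at a sustained rate in megabits per second."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / throughput_mbit_per_s / 3600


# Hypothetical sustained rates; measure your own links before relying on these.
links = {"home upload (5 Mbit/s)": 5, "LAN (100 Mbit/s)": 100}
for name, rate in links.items():
    print(f"{name}: 100 GB in {transfer_hours(100, rate):.1f} h, "
          f"50 MB in {transfer_hours(0.05, rate) * 3600:.0f} s")
```

With these assumed rates, 100 GB takes about 44 hours over the upload link versus a bit over 2 hours on the LAN, while a 50 MB incremental change takes about 80 seconds versus 4 seconds. For comparison, the half-day figure I quoted earlier for 100 GB corresponds to a sustained rate of roughly 18.5 Mbit/s.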
Backup strategy and implementation
To come up with a strategy, you’ll need to answer some questions:
- How many computers need to be backed up?
- What is the critical data on each?
- Is some of this data already being backed up? For example, your CD collection is in iTunes and resides on your phone and tablet. That doesn’t mean that it isn’t critical, but it might not be necessary to back it up remotely.
- Do you need to be able to back up while moving your computer around within your home or office? If you have a laptop, a network attached drive makes sense.
- Do you travel with your computer? Cloud backups are really nice when you travel, though you’ll probably need to back up more frequently.
- For local backups, do you have adequate backup software or will you need to acquire software? If you’re running Macs with Time Machine support or Windows 7, you have a built-in solution. Older versions of Windows probably need a third-party solution like Retrospect, or you can simply use a cloud solution.
So, once you’ve answered those questions, you can design a viable plan. It can also be a phased approach. Assuming you have no backup solution, start with a USB drive on your computer and back it up using the existing backup software or a third-party solution. That will get you started. Next, I’d suggest you get another USB drive or flash drive, make a copy of your critical data and get it to a safe place off-site, like a safe-deposit box. Alternatively, you can use Dropbox to drop a copy of your critical files into the cloud. Just be aware that this is largely a manual operation.
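Since a manual copy is easy to get wrong, and a silently corrupted backup is one of the failure modes listed earlier, it can help to verify each copy before the drive goes off-site. Here is a minimal sketch that compares SHA-256 checksums of the original and the copy; the function names are illustrative and not taken from any backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, dest: Path) -> bool:
    """Copy `source` to `dest` and report whether the two files have identical checksums."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, dest)
    return sha256_of(source) == sha256_of(dest)
```

A `False` result means the copy should not be trusted. A `True` result is a good sign but not an absolute guarantee, since a read immediately after a write may be served from the operating system's cache rather than from the drive itself.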
That might be sufficient if you only have one or two computers. However, I’d recommend that you replace the safe-deposit-box/Dropbox solution (or augment it) with a cloud solution like Mozy or Carbonite. You can find a comparison of various cloud solutions here.
Finally, if you have significant amounts of data or multiple computers, I’d highly recommend getting a NAS system. It’s also a good solution to store and serve up your music, movies and photos. I have an Iomega ix2-200. I store critical files and all my computer backups on this device. Once per quarter, I make a copy of the critical files off this device and remotely store them (on an encrypted USB drive).
The value of RAID-1 was pounded home a couple weeks ago, when one of the drives in the Iomega device failed. Iomega sent me a new drive, I replaced the drive, the system rebuilt the RAID set and all’s normal again. I had full use of the system and data while I waited to receive the replacement drive. I only needed to power it down to replace the drive (there are hot-swappable systems, but for home use, that’s overkill). Without RAID-1, I would have lost all or most of the backups and critical data.
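For the curious, here is a toy model of what the mirror did during that episode. It is purely illustrative: each "drive" is a Python dictionary, the class and method names are made up, and a real RAID controller is far more involved.

```python
from typing import Dict, List, Optional


class Mirror:
    """Toy two-drive RAID-1 set; each drive maps a block number to its data."""

    def __init__(self) -> None:
        # None marks a failed or missing drive.
        self.drives: List[Optional[Dict[int, bytes]]] = [{}, {}]

    def _healthy(self) -> List[Dict[int, bytes]]:
        healthy = [d for d in self.drives if d is not None]
        if not healthy:
            raise IOError("no healthy drives left")
        return healthy

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to all healthy drives.
        for drive in self._healthy():
            drive[block] = data

    def read(self, block: int) -> bytes:
        # Any healthy drive can serve a read.
        return self._healthy()[0][block]

    def fail(self, index: int) -> None:
        self.drives[index] = None

    def replace_and_rebuild(self, index: int) -> None:
        # The rebuild copies everything from the surviving drive to the new one.
        self.drives[index] = dict(self._healthy()[0])


mirror = Mirror()
mirror.write(0, b"backups")
mirror.fail(0)                    # one drive dies; data is still readable
mirror.write(1, b"new data")      # and the set keeps accepting writes
mirror.replace_and_rebuild(0)     # new drive arrives and is rebuilt
```

One thing the model makes obvious: a mirror copies every write faithfully, including a bad edit or a deletion, so RAID-1 covers the bad disk scenario but does not replace versioned backups.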
Whatever you do … protect that data! We now live in an environment where our lives are largely digitized. Like going to the dentist, you need to be willing to invest time and some $$$ to protect what’s precious to you. The good news is once this is in place, you should sleep easier.