
Backing up Business Data

A timely reminder: if you have data that your business depends on, get it backed up properly. If you have personal data that you wouldn't like to lose, back that up too.

http://journalspace.com/ has been well covered in the media. It wasn't one of the great players on the web, but it had a sizeable user base. Visit the link to see what it has become now that its database has been wiped.

There are many ways to back up your data, depending on budget and 'continuity' requirements. For example, with a PHP/MySQL website such as this you'd certainly pull a dump of the MySQL data down to some form of backup device, perhaps daily. But for disaster recovery, this form of backup requires you to build a new server from scratch (or at least reinstall the system software), then import your data, then spend the next six months finding and applying the performance tweaks that took you years to discover.

You're talking hours to days of downtime. But you will get back online.
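As a rough sketch of that first level - every name, credential and path below is invented for the example, not taken from a real setup - a nightly dump script might look something like this:

#!/bin/sh
# Nightly MySQL dump to a backup disk (example only - adjust the user,
# password handling and paths; --single-transaction suits InnoDB tables).
DATE=$(date +%Y%m%d)
mysqldump --single-transaction --all-databases -u backupuser -pEXAMPLE \
    | gzip > /mnt/backupdisk/mysql/all-databases-$DATE.sql.gz
# Keep a couple of weeks of history.
find /mnt/backupdisk/mysql -name 'all-databases-*.sql.gz' -mtime +14 -delete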

The next level is a complete snapshot of the server. This is most common with tape backups - you place a complete copy of the server's file system on tape - perhaps once a week with daily incrementals, depending on your data sizes and backup overhead. All you need to do now is get server hardware similar to that which you have lost and pull the image back onto the disks. Well, it's rarely that simple. But the expense of tape buys you time - you'll be back up and running in hours - and there are other important advantages including robust offsite backup.

After that, you get into standby servers and clusters. These increase your availability following a failure to the point where a server failure is completely hidden from your users.

Note that these are cumulative backup systems - if you have a cluster, you also have tape backups of at least one typical machine. If you have tape, you also have MySQL dumps.

I have designed, tested and implemented backup solutions for a variety of situations, including PHP/MySQL hosting on both shared and dedicated systems, with PHP server clusters and MySQL replication. I'll shamelessly plug my services here - I'm not looking for full-time work, but if you've got a need for this sort of system administration consultancy for your small business or start-up, then give me a shout. You'll find an email address in the contact link at the bottom of the page.

In contrast to other blogs and news sites, I'd like to discuss the real reason behind JournalSpace's (JS's) failure - the technical problem of data loss seems to have been caused by a human, and certainly the fundamental problem was human.

JS's manager doesn't seem to be a very technical person. He/she didn't need to be; there were employees to look after that aspect. From the publicly available information, it looks like this unfortunate businessman has been taken for a ride by a thoroughly unprofessional idiot.

In business, you really need to trust your employees. It's an unfortunate truth that you can't. This is why computer networks get locked down, you can't install software on your work PC, you can't visit Facebook, you're not allowed in the server room, etc.

But there are some employees that you have to trust. The ones who look after the money. The ones who look after your data. If your company is a jeweller's, somebody is going to handle the diamonds - will it be the contract cleaner who comes in on Saturday mornings and is a different face every week?

JS has been exposed to a toxic and dangerous employee. I wish they'd name names so we know who to avoid. There are two aspects here:
1. The person was expected to be technically literate. If they're in charge, or the only IT guy there, they need to be better than average. Indeed, they are claimed to have boasted about being smart.
2. The person must be trustworthy. They have your future (your business data) in their hands, you must know that whatever happens, they will try their hardest to ensure its safety.

In JS's case, the person was neither and unfortunately there was no way of raising alarm bells that this might be the case.

Until, that is, it was too late. The employee was caught stealing, and on being ejected caused damage to servers. I assume something along the lines of deleting system files. This behaviour of someone so trusted is shocking.

At this point, JS should have called in security experts to go over each and every system the employee had touched, but that's easy to say in hindsight. I have worked on post-attack forensics and have rescued compromised systems and then patched the holes. In this case a strong security policy would be needed for new and existing employees.

I'm reminded of this from last year:
http://www.zedshaw.com/rants/rails_is_a_ghetto.html
Would you trust that person with everything you have worked for and built up? It's a shame that such people are not always so clearly marked out by publishing their own career suicide notes.


Storage

I am fascinated by storage on computers. The fact is, I'm fascinated by most things in computing, and storage is one of them. Hard disks are noisy, slow and unreliable - yet they are still the best mass storage devices we have today.

On my home network, incoming data from the Internet first hits my firewall machine. This has been upgraded many times using spares from other machines, but the storage is a legacy dating back maybe six years: a single 30GB disk. The disk has actually been replaced, and even the OS has been replaced, but on a spare-parts machine the replacements never happen at the same time: no co-ordinated upgrades. So this is an unRAIDed disk. To compensate, it's fully backed up and can be restored in its entirety from tape.

The firewall also runs CCTV and other odd security tasks - not all of them IP related. I have a pair of spare 120GB disks for it (30GB minus the OS doesn't leave much room for video). Strangely, these disks were once in the firewall as part of a RAID5, back when I needed some extra PCI and IDE slots for a storage array that was later moved to a more appropriate machine.

The next machine is my primary server. This is by far the most interesting storage machine:
8 disks
3x RAID1 mirrors - /boot & /, /home, music
1x RAID5 (4 disks) - everything else

The first 1GB of each of the RAID5 disks is actually part of one of two RAID1 arrays. That is, using Linux software RAID I can run different RAID levels on the same physical disks. The upper 300GB portions of those disks form the RAID5 itself.
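A minimal sketch of how such a layout might be built with mdadm - the device names here are hypothetical, not my actual ones:

# Each disk gets a small leading partition (~1GB) and a large second
# partition, so different RAID levels can share the same physical disks.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1
mdadm --create /dev/md2 --level=5 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2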

On top of the RAID5 is LVM. LVM is truly great. I have 400GB allocated to partitions - /var, /usr and so on. Another 400GB is currently completely unallocated - if I need it for anything, I can bring it online at any moment, either to grow existing partitions or as new partitions. If I ever run out of space, I can add more physical disks and extend the LVM onto them.
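For illustration, the LVM layering looks something like this - the volume group and logical volume names are made up, and the resize step assumes an ext3/ext4 filesystem:

# Build LVM on top of the RAID5 array, leaving plenty unallocated.
pvcreate /dev/md2
vgcreate vg0 /dev/md2
lvcreate -L 100G -n usr vg0
lvcreate -L 300G -n var vg0
# Later, grow a volume into the unallocated space and resize its filesystem...
lvextend -L +50G /dev/vg0/var
resize2fs /dev/vg0/var
# ...or extend the volume group onto a new disk or array.
vgextend vg0 /dev/md3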

Some of the RAID1 arrays are backed up by replication onto a partition on the LVM. For this I use rsync, run nightly from a small script which also handles backing up and archiving the databases.
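The script is roughly along these lines - a sketch only, with example paths rather than my real layout:

#!/bin/sh
# Nightly replication of the RAID1 filesystems onto a partition on the LVM,
# plus database archival. All paths here are examples.
rsync -a --delete /home/ /srv/backups/home/
rsync -a --delete /boot/ /srv/backups/boot/
mysqldump --all-databases | gzip > /srv/backups/db/all-$(date +%Y%m%d).sql.gz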

The root and boot partitions are not on LVM, following general advice: if you lose the LVM for whatever reason, you want to be able to boot the machine to fix it - especially if the machine has no CDROM drive to boot a rescue disk from.

Each disk is on its own IDE channel - no slave disks, because on IDE, if one disk fails, the other on the same cable will often be lost too. I used to split RAID1 arrays across shared channels, but then a failure affecting one array can kill another array as well. Not good when diagnosing it later.

Moving on: my 1U server.
This has a pair of 20GB disks in a Linux RAID1. The machine has front-access disks, so you can remove them without disassembling the machine (or removing it from a rack - it's not on rails). However, they are not hot-swap. It's IDE again. IDE is a very poor interface, but it has long been the cheapest and most widely available. The machine also has a BIOS-based hard/software RAID. It's awful: it is supported only by Windows, FreeBSD and Linux 2.4, and to rebuild the array you have to drop down to the BIOS.

On to my main desktop, one which isn't a Mac:
One 160GB SATA2 disk, almost empty, as all real storage is held on the server. There are a lot of temporary files on it, plus the OS - the disk was chosen for speed, and the size reflects the best price point for a disk used only for temporary files.

Further: Amiga 4000
Two disks, a 40GB and a 4GB, both IDE. It used to be a 4GB and a 20GB, but the 20GB developed 'noise' - it worked fine from 1999 until mid 2008, and still does, but it's loud. The Amiga runs 24 hours a day and needs to be quiet. So I replaced it with a silent 40GB and decided to get rid of the 4GB (dating from 1997). Bah, lots of work - the machine doesn't like the large disk. I used to run an 80GB on it, but this one is somehow different. So the 4GB boots the basic OS, which then reboots into a version of the OS that supports the 40GB.

Onwards: iMac
250GB SATA2, 3.5". Not interesting. Boots MacOS, works until it fails (no RAID in an iMac). Feels very fast.

Onwards: MacBook
80GB SATA2, 2.5". The same, but vastly slower than the iMac. The iMac has almost the same spec - same CPU, same memory - yet the overall performance is completely different, and it's pretty much entirely down to the disk.

Onwards: Amiga 4000T
36GB and 9GB SCSI2-Ultrawide. Old, tried, tested, and as fast as some new IDE disks I've bought. IDE sucks. SCSI is good (except for the plethora of connectors which all do much the same thing - beyond physical connectors SCSI is almost completely forward and backward compatible).

Boredom is hitting right now. Disks are disks and I have lots of them, so on to something else:

Backups: DLT-7000 x2 drives, DLT-IV tapes, all in a robot tape changer unit with 32 tape cells.
This is the coolest piece of hardware I have. A huge, power hungry Dell box (rebranded StorageTek, of course - do Dell make anything?). Lots of tapes. Terabytes of potential storage. Most importantly, a big robot arm that moves tapes around, loads and unloads the tape drives. Very cool to watch.

It's connected to the primary server using high voltage differential SCSI (I didn't know this existed back when I bought the changer, it was called just 'differential SCSI' - cue lots of effort trying to get it working with LVD SCSI...).

Amanda is the software I use to manage it. It does everything for me now that I have it set up, though it did take a great deal of configuration. I have it dump backups to a holding disk overnight on Friday and send them to tape on Saturday (switch on the unit, run the command on the server, wait for the notification that it has finished).
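For the curious, the weekly routine boils down to something like this - the config name 'weekly' and the times are examples rather than my real setup:

# Friday night, from cron: dump everything to the holding disk. The tape
# unit is switched off, so the dumps simply stay on disk.
30 23 * * 5  /usr/sbin/amdump weekly
# Saturday, by hand, once the changer is powered up: flush the holding
# disk out to tape, then wait for the report.
#   amflush weekly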

I was hoping to write something interesting in this post. It seems that I haven't. I blame MIT and UC Berkeley, as I'm watching some of their physics lectures on YouTube while I (try to) write this.

Backing up Data

I have a bit of experience backing up unix and Windows platforms using open tools. I'll take five minutes to share a little of what I do, but I know it's imperfect - I'm really looking for hints and tips on how to improve things!

Backing up Linux from one disk to another

The quickest way to retrieve a lost file is to have it sitting on the file system so that you can simply copy it back. If that file system is mounted remotely by NFS, you get protection from disk loss too.

What we're really doing here is replicating the data exactly as it sits on the source.

The way to do this is to use rsync. Backing up with cp or tar results in every file being copied every time; rsync reduces the load on the system by copying only what has changed, which matters greatly once you get beyond a few megabytes and into gigabytes. I initially did this over an NFS mount, but found it inefficient: disk I/O and NFS traffic were much higher than I wanted, and that led to high CPU load too. The solution was to have the network I/O handled by rsyncd - with the rsync protocol at both ends, only the changed data crosses the network, instead of whole files being read over NFS.

Configuring rsyncd:

The defaults in most installations are good; you need only add your 'module', which is where backups are placed.

[backup]
path = /mnt/massstorage/backups
comment = Backup storage
read only = no
write only = yes
hosts allow = 192.168.9.6 192.168.9.7
hosts deny = *
auth users = newuser
secrets file = /etc/rsyncd.secrets
# required to preserve file ownership and attributes.
uid = root
gid = root

In rsyncd.secrets, create a user - perhaps one per machine being backed up.

newuser:password
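One detail worth noting: with the default 'strict modes' setting, rsyncd will refuse to use a secrets file that other users can read, so lock the permissions down:

chown root:root /etc/rsyncd.secrets
chmod 600 /etc/rsyncd.secrets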

On the client machine, create a script to be run from cron. Note that there are two ways to give the password to rsync: an environment variable or a secured file. The file is the better way, as it reduces the chance of leaking the password to other programs. The example below uses a bash-style local environment variable.

The trailing slashes on the source and destination paths are significant - please read the rsync man page, which explains this well.

RSYNC_PASSWORD="password" rsync -a --delete -x /var/ rsync://newuser@mybackupmachine/backup/var/
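The file-based alternative mentioned above would look something like this (the path to the password file is just an example):

# Store the password in a root-only file once...
echo "password" > /root/.rsync-backup-pass
chmod 600 /root/.rsync-backup-pass
# ...then reference it from the cron job instead of the environment variable.
rsync -a --delete -x --password-file=/root/.rsync-backup-pass /var/ rsync://newuser@mybackupmachine/backup/var/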

Backing up Windows to unix using rsync

This is much like the above, but I prefer to do things a little differently to ensure that the more involved tasks are done on unix, where they are more easily managed.

I install the best Windows port I've found - cwRsync. If this is to go over the public internet or anything untrusted, I install the cwRsync Server version together with OpenSSH.

I configure the Windows version to run as a server and then pull the files over from the unix side, with the selection of files to back up specified on the unix machine. This gives a great deal of flexibility and control if you are backing up a customer's machine. Configuring the SSH tunnel to run reliably when customers have very unreliable ADSL lines can be a challenge, but I leave that to the reader.
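A rough example of the pull from the unix side - the module name 'docs', the host name and the paths are made up:

# Pull a module exported by the Windows machine's rsync service.
rsync -a rsync://backupuser@windowsbox/docs/ /mnt/backups/windowsbox/docs/
# Over an untrusted link, go through the OpenSSH service instead; cwRsync is
# Cygwin-based, so Windows drives appear under /cygdrive.
rsync -a -e ssh backupuser@windowsbox:/cygdrive/c/Data/ /mnt/backups/windowsbox/data/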

There are a couple of issues with backing up Windows:
Locked files.
Permissions.

For locked files (and database-style files which are locked and need to be grabbed in a consistent fashion - think Exchange), I configure Windows Backup to create a large bundle of files in a .bkf file; this is what rsync grabs. Rsync does a good job of applying deltas to large files like this to speed up the transfer, but it is still much slower and more I/O intensive than grabbing smaller individual files.
Permissions are more of a mess. Rsync runs in the Backup Operators group by default. Normally a Windows backup utility would do this but would also set a special Windows API flag to say "I'm a backup tool, let me at the files"; rsync can't do this. Therefore, any file not explicitly readable by the Administrators or Backup Operators groups is missed. The best solution I have is to change the rsync service to run as the one user which does have the equivalent of 'root' access - Administrator. This isn't the cleanest or most secure solution. You may also find that rsync then refuses to start - you need to delete the two special stdin and stdout log files in C:\Program Files\cwRsync.
This method still has problems, as Windows permissions can be set to deny even Administrator access - quite common in my experience. The only solution I have for this is to watch the rsync logs and ask the customer/admin to grant Backup Operators access to any files you can't reach. Messy.

Backing up to tape

Use Amanda if you have a changer. In fact, I'd say use Amanda even if you don't - the reporting features are useful even if you only use a fraction of the software's capabilities. It works well under Windows too, using the available Windows client package. The same permissions and locked-file issues will occur under Windows, with the same solutions as for rsync.

I can't think of much more to say regarding Amanda and tapes - as I remember it, success is more a matter of getting the configuration correct than of any special voodoo. The Amanda docs, mailing list and especially the wiki are by far the best sources of information, and more valuable than anything I could write here.
