
Backing up Data

I have a bit of experience in backing up unix and Windows platforms using open tools. I'll take five minutes to share a little of what I do, but I know that this is imperfect. I'm really looking for hints and tips on how to improve things!

Backing up Linux from one disk to another

The quickest way to retrieve a lost file is to have it sitting on the file system so that you can simply copy it back. If that file system is mounted remotely by NFS, you get protection from disk loss too.

What we're really doing here is replicating the data exactly as it sits on the source.

The way to do this is to use rsync. Backing up using cp or tar results in every file being copied every time. Rsync reduces the load on the system by copying only the changes, which becomes very important once you get beyond a few megabytes and into gigabytes. I initially did this over an NFS mount, but found it to be inefficient: disk I/O and NFS traffic were much higher than I wanted, and this led to high CPU load too. The solution was to have the network I/O go through rsyncd, which is designed for network transfers rather than disk I/O.

Configuring rsyncd:

The defaults in most installations are good; you need only add your 'module', which is where the backups are placed.

[backup]
path = /mnt/massstorage/backups
comment = Backup storage
read only = no
write only = yes
hosts allow = 192.168.9.6 192.168.9.7
hosts deny = *
auth users = newuser
secrets file = /etc/rsyncd.secrets
# Required to preserve file ownership and attributes.
uid = root
gid = root

In rsyncd.secrets, create a user, perhaps one per machine backed up.

newuser:password
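
With rsync's default 'strict modes' setting, the daemon refuses a secrets file that other users can read, so it is worth locking the file down as you create it. A minimal sketch, assuming a POSIX shell; the helper name add_backup_user is made up for illustration:

```shell
# Hypothetical helper: append a "user:password" line to a secrets file
# and make sure only the owner can read or write it.
add_backup_user() {
    secrets_file=$1
    user=$2
    password=$3
    # The subshell keeps the tighter umask from leaking into the caller.
    ( umask 077; printf '%s:%s\n' "$user" "$password" >> "$secrets_file" )
    # umask only affects newly created files, so fix up an existing one too.
    chmod 600 "$secrets_file"
}
```

For example, run as root on the backup server: add_backup_user /etc/rsyncd.secrets newuser password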

On the client machine, create a script to be run from cron. Note that there are two ways to give the password to rsync: by environment variable or by a secured file. The file is the better way to go, as it reduces the chance of leaking the password to other programs. The example below uses a bash-style environment variable set for that one command only.

The use of a trailing slash on the source and destination paths is significant. Please read the rsync man page, which explains it well.
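
A small local experiment makes the trailing-slash rule concrete. This sketch only uses throwaway directories under a temporary path (nothing here touches /var or the backup server), and it does nothing if rsync is not installed:

```shell
# Scratch area for the experiment; all directory names are illustrative.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/with_slash" "$demo/without_slash"
echo hello > "$demo/src/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # Trailing slash on the source: copy the *contents* of src.
    rsync -a "$demo/src/" "$demo/with_slash/"
    # No trailing slash: copy the directory itself, creating without_slash/src.
    rsync -a "$demo/src" "$demo/without_slash/"
    # List where file.txt ended up in each case.
    find "$demo/with_slash" "$demo/without_slash" -type f
fi
```

The first copy should leave file.txt directly inside with_slash, while the second leaves it one level down, inside without_slash/src.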

RSYNC_PASSWORD="password" rsync -a --delete -x /var/ rsync://newuser@mybackupmachine/backup/var/
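
For comparison, the password-file route might look like the following sketch. The file location and the DRY_RUN switch are assumptions for illustration. The --password-file option expects a file containing only the password, and rsync refuses to use it if other users can access it.

```shell
# Assumed locations; adjust for your own setup.
PASSWORD_FILE="/etc/rsync-backup.password"
BACKUP_URL="rsync://newuser@mybackupmachine/backup/var/"

backup_var() {
    # Same options as the one-liner above, with the password read from a file.
    set -- rsync -a --delete -x --password-file="$PASSWORD_FILE" /var/ "$BACKUP_URL"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command, so the script can be checked safely.
        echo "$*"
    else
        "$@"
    fi
}
```

Call backup_var at the end of the script and point a crontab entry at it, for instance "30 2 * * * /usr/local/bin/backup-var.sh" (a hypothetical path and schedule).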

Backing up Windows to unix using rsync

This is much like the above, but I prefer to do things a little differently to ensure that the more involved tasks are done on unix, where they are more easily managed.

I install the best Windows port I've found - cwRsync. If this is to go over the public internet or anything untrusted, I install the cwRsync Server version together with OpenSSH.

I configure the Windows version to run as a server and then pull the files from the unix machine, where I specify which files to fetch. This gives a great deal of flexibility and control if you are backing up a customer's machine. Configuring the SSH tunnel to run reliably when customers have very unreliable ADSL lines can be a challenge, but I leave this up to the reader.
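
As a rough sketch of the pull over an SSH tunnel: the host, SSH user, module name, local port and destination below are all hypothetical, and this makes no attempt to cope with a flaky line. It forwards a local port to the rsync daemon on the Windows machine (port 873, the rsync default) and then pulls through it.

```shell
# All of these values are made up for illustration.
WIN_HOST="customer-pc.example.com"
SSH_USER="backup"
LOCAL_PORT=8873
MODULE="documents"
DEST="/mnt/massstorage/backups/customer-pc/"

# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

pull_windows_backup() {
    # Open the tunnel in the background; "sleep 30" holds it open long enough
    # for rsync to connect, and ssh then stays up until the transfer ends.
    run ssh -f -o ExitOnForwardFailure=yes \
        -L "$LOCAL_PORT:localhost:873" "$SSH_USER@$WIN_HOST" sleep 30 || return 1
    # Pull from the Windows rsync daemon through the local end of the tunnel.
    run rsync -a --delete "rsync://localhost:$LOCAL_PORT/$MODULE/" "$DEST"
}
```

Running it with DRY_RUN=1 first prints the two commands, so you can check them without contacting the customer's machine.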

There are a couple of issues with backing up Windows:
Locked files.
Permissions.

For locked files (and database style files which are locked and need to be grabbed in a consistent fashion - think Exchange) I configure Windows Backup to create a large bundle of files in a .bkf file. This is what rsync grabs. Rsync does a good job of applying deltas to large files like this to speed up the transfer, but it is still much slower and more I/O intensive than grabbing smaller individual files.
Permissions are more of a mess. Rsync runs in the Backup Operators group by default. Normally, a Windows backup utility would do this too, but it would also set a special Windows API flag to say "I'm a backup tool, let me at the files". Rsync can't do this. Therefore, any file not explicitly readable by the Administrators or Backup Operators groups is left out of the backup. The best solution I have is to change the rsync service to run under the one user which does have the equivalent of 'root' access: Administrator. This isn't the cleanest or most secure solution. You may find that rsync then refuses to start; if so, you need to delete the two special stdin and stdout log files in C:\Program Files\cwRsync.
This method still has problems, as Windows permissions can block access even for Administrator. This is quite common in my experience. The only solution here is to watch the rsync logs and ask the customer/admin to add Backup Operator access to any files you can't get to. Messy.
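
Watching the logs can be partly automated. The sketch below assumes that unreadable files appear in the rsync log as lines that quote the path and contain "Permission denied"; check this against your own logs before relying on it. The function name is made up for illustration.

```shell
# Hypothetical helper: print each distinct path that rsync could not read.
# Assumes log lines of the form: ... "path" ...: Permission denied ...
list_denied_files() {
    grep 'Permission denied' "$1" \
        | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' \
        | sort -u
}
```

The resulting list is something you can send to the customer/admin as the set of files needing Backup Operator access.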

Backing up to tape

Use Amanda if you have a changer. In fact, I'd say use Amanda otherwise too. The reporting features are useful even if you only use a fraction of the software's capabilities. This works well under Windows too using the available Windows client package. The same permissions and locked files issues will occur under Windows with the same solutions as rsync.

I can't think of much to say here regarding Amanda and tapes. From memory, it is more a matter of getting the configuration correct than of any special voodoo. The Amanda docs, mailing list and especially wiki are by far the best sources of information and more valuable than anything I could write here.
