Saturday, April 25, 2009

Backing up VDI files to Amazon S3

I've got a CentOS 5.3 server that uses VirtualBox to run a couple of headless virtual servers. As well as regularly (i.e. nightly or weekly) backing up the contents of the virtual servers, it's useful to take the occasional snapshot of a whole virtual server. To do this you just need to grab a copy of its .vdi file, which can be found somewhere like /root/.VirtualBox/VDI/.

Then, in the unfortunate case of a complete hardware failure or data centre screwup, I'll be able to put the .vdi file on another box somewhere and start up the virtual server in the same state it was in when I took the snapshot. At that point I would probably restore the files from the nightly backup into the restored virtual server so that everything is as up to date as possible.
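The download half of that restore could look something like the sketch below. `restore_vdi` is a hypothetical helper name, and the bucket layout (a `VDI/` prefix) and file names are taken from the upload example later in this post. Registering the unpacked .vdi with VirtualBox on the new box is not covered here.

```shell
# Sketch: fetch a compressed snapshot from S3 and unpack it on a replacement host.
# restore_vdi is a hypothetical helper, not part of s3cmd or VirtualBox.
restore_vdi() {
    local bucket="$1"    # e.g. myservername.com
    local archive="$2"   # e.g. myvps_snapshotdate.vdi.tgz
    local dest="$3"      # directory to unpack into

    mkdir -p "$dest" || return 1
    # Download the archive from the VDI/ prefix of the bucket.
    s3cmd get "s3://$bucket/VDI/$archive" "$dest/$archive" || return 1
    # Unpack the .vdi next to the downloaded archive.
    tar -xzvf "$dest/$archive" -C "$dest"
}
```

Usage would be along the lines of `restore_vdi myservername.com myvps_snapshotdate.vdi.tgz /root/.VirtualBox/VDI`.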

What I'm going to run through here is how to back up the snapshot to Amazon's Simple Storage Service (S3). I'm using S3 for backups because:
  1. It's cheaper and easier than maintaining hardware in the office.
  2. It's faster to copy a large file from the server (inside the data centre) to Amazon S3 than to copy it to the office server.
  3. Bandwidth is still very expensive in Australia and my fellow office-mates get rather annoyed when our connection is shaped by me copying large backups onto the office server.
Here's the basic process for doing the backup. Hopefully at some point I'll get around to automating it with a shell script.

1. Install s3cmd by executing the following as the root user:
yum install s3cmd
Update: You'll need to add the s3tools repository before yum can find the package. Instructions are here: http://s3tools.org/repositories

2. You will now need to configure s3cmd with the details of your Amazon S3 account.
s3cmd --configure
This will prompt you for your access key and secret access key. I also chose to enable encryption and HTTPS.
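Before running a long backup it can be worth checking that s3cmd is installed and has been configured. The sketch below assumes the configuration was saved to the default location, `~/.s3cfg`. The `check_s3cmd` function and the `S3CFG` override variable are hypothetical names used for illustration only.

```shell
# Sketch: fail early if s3cmd is missing or has not been configured yet.
# check_s3cmd and S3CFG are illustrative names, not part of s3cmd itself.
check_s3cmd() {
    if ! command -v s3cmd >/dev/null 2>&1; then
        echo "s3cmd is not installed" >&2
        return 1
    fi
    # s3cmd --configure saves its settings to ~/.s3cfg by default.
    if [ ! -f "${S3CFG:-$HOME/.s3cfg}" ]; then
        echo "no s3cmd config found; run: s3cmd --configure" >&2
        return 1
    fi
}
```

Calling `check_s3cmd || exit 1` at the top of a backup script stops it before the virtual server is shut down for nothing.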

3. Create a bucket to store your backups. Personally, I've got one S3 account with a bucket for each server I need to back up, so my bucket is set up like this:
s3cmd mb s3://myservername.com

4. Log in to the virtual server guest that you want to back up and shut it down.

5. Navigate to the location of the VDI files.
cd /root/.VirtualBox/VDI/

6. Make a copy of the VDI file you want to back up.
cp myvps.vdi myvps_snapshotdate.vdi
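The `snapshotdate` part of the file name is a placeholder. One way to fill it in, assuming a YYYYMMDD stamp is wanted (the variable names here are illustrative only), is:

```shell
# Sketch: build a dated snapshot file name such as myvps_20090425.vdi
vm_name="myvps"
snapshot_date="$(date +%Y%m%d)"
snapshot_file="${vm_name}_${snapshot_date}.vdi"
echo "$snapshot_file"

# With the guest shut down, the copy would then be:
#   cp "${vm_name}.vdi" "$snapshot_file"
```

A stamp in this order sorts chronologically, which makes older snapshots easy to spot in a bucket listing.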

7. Start up the virtual server again. Note: this uses a startup script I wrote for my virtual servers, so the command isn't available by default. The startup scripts might be the subject of another blog post if anyone requests it.
/etc/init.d/myvps start

8. Compress the VDI backup file. My original VDI was about 9GB; compressed, it came down to 1.8GB.
tar -czvf myvps_snapshotdate.vdi.tgz ./myvps_snapshotdate.vdi
Note: I didn't compress the file while copying it from the original because I wanted to get the virtual server started up again with as little downtime as possible.
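Since the uncompressed copy gets deleted later, it may be worth confirming that the archive is readable first. This is a minimal sketch, where `verify_archive` is a hypothetical helper name:

```shell
# Sketch: check that a .tgz archive is intact before relying on it.
verify_archive() {
    # gzip -t tests the compressed stream (including its checksum);
    # tar -tzf confirms the archive contents can be listed end to end.
    gzip -t < "$1" && tar -tzf "$1" >/dev/null
}
```

For example: `verify_archive myvps_snapshotdate.vdi.tgz && echo "archive OK"`.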

9. Send the compressed VDI to S3. This might take a while.
s3cmd put myvps_snapshotdate.vdi.tgz s3://myservername.com/VDI/myvps_snapshotdate.vdi.tgz
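Before deleting anything locally, it seems sensible to check that the object really is in the bucket. The sketch below wraps the upload in a hypothetical `upload_snapshot` function and then uses `s3cmd ls` to confirm that the object is listed. It assumes the listing prints the full `s3://` URI of each object.

```shell
# Sketch: upload an archive, then confirm it appears in the bucket listing.
# upload_snapshot is an illustrative helper, not an s3cmd feature.
upload_snapshot() {
    local archive="$1"   # local path to the .tgz
    local bucket="$2"    # e.g. myservername.com
    local target="s3://$bucket/VDI/$(basename "$archive")"

    s3cmd put "$archive" "$target" || return 1
    # An empty listing means the object is not there; grep -F matches the URI literally.
    s3cmd ls "$target" | grep -qF "$target"
}
```

Usage: `upload_snapshot myvps_snapshotdate.vdi.tgz myservername.com || echo "upload failed" >&2`.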

10. Clean up the local copies once the upload has finished.
rm myvps_snapshotdate.vdi.tgz
rm myvps_snapshotdate.vdi
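Steps 4 to 10 could be strung together roughly as follows. This is only a sketch under some assumptions that go beyond the steps above: that the init script accepts `stop` as well as `start` and does not return until the guest is fully shut down, and that a YYYYMMDD stamp is used for the snapshot name. `backup_vdi`, `VDI_DIR` and `INIT_DIR` are hypothetical names.

```shell
#!/bin/sh
# Sketch of steps 4-10 as a single function.
# Assumptions: "$INIT_DIR/<vm> stop" exists and blocks until the guest is down.
VDI_DIR="${VDI_DIR:-/root/.VirtualBox/VDI}"
INIT_DIR="${INIT_DIR:-/etc/init.d}"

backup_vdi() {
    local vm="$1" bucket="$2" snap copy_status
    snap="${vm}_$(date +%Y%m%d).vdi"

    # Steps 4, 6, 7: downtime lasts only as long as the copy.
    "$INIT_DIR/$vm" stop || return 1
    cp "$VDI_DIR/$vm.vdi" "$VDI_DIR/$snap"
    copy_status=$?
    "$INIT_DIR/$vm" start    # restart the guest even if the copy failed
    [ "$copy_status" -eq 0 ] || return 1

    # Step 8: compress after the guest is back up.
    tar -czf "$VDI_DIR/$snap.tgz" -C "$VDI_DIR" "./$snap" || return 1

    # Step 9: upload; stop here (keeping local files) if it fails.
    s3cmd put "$VDI_DIR/$snap.tgz" "s3://$bucket/VDI/$snap.tgz" || return 1

    # Step 10: remove local copies only after a successful upload.
    rm -f "$VDI_DIR/$snap.tgz" "$VDI_DIR/$snap"
}
```

It would be called as `backup_vdi myvps myservername.com`, for example from a cron job run as root.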
