Borg backup - You have been assimilated.

Because we want to get rid of an expensive backup solution, we started looking around for options to back up our servers. I asked around on IRC for backup software recommendations, and mirage showed me the light: Borg.

It has several pros that I really do enjoy:

  • Open source.
  • Excellent documentation and getting started articles.
  • No fussing with the kernel, meaning no loading of external kernel modules (something our current backup software gave us trouble with).
  • Can be scripted and run from the CLI, so we can use standard Linux tools to handle the backup process.
  • Does all the neat things that backup software should do: deduplication, compression, and encryption, in a very simple way.
  • Has an option to prune backups, letting you control the retention periods of your backups.
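Pruning is what enforces those retention periods. As a rough illustration of the idea behind a rule such as --keep-daily, here is a simplified Python sketch, not Borg's actual code: it walks the archives from newest to oldest and keeps the newest archive of each distinct day until it has kept n of them. The function name keep_daily and the sample timestamps are hypothetical.

```python
from datetime import datetime


def keep_daily(timestamps: list[datetime], n: int) -> list[datetime]:
    """Simplified illustration of a keep-daily rule (not Borg's implementation).

    Walk archives from newest to oldest and keep the newest archive of
    each calendar day, stopping once n days are covered.
    """
    kept: list[datetime] = []
    seen_days = set()
    if n <= 0:
        return kept
    for ts in sorted(timestamps, reverse=True):
        day = ts.date()
        if day not in seen_days:
            seen_days.add(day)
            kept.append(ts)
            if len(kept) == n:
                break
    return kept


# Hypothetical archive timestamps
archives = [
    datetime(2024, 1, 2, 3, 0),  # newest archive of Jan 2: kept
    datetime(2024, 1, 2, 1, 0),  # older archive of Jan 2: pruned
    datetime(2024, 1, 1, 2, 0),  # only archive of Jan 1: kept
]
for ts in keep_daily(archives, n=7):
    print(ts.isoformat())
```

Borg combines several such rules (daily, weekly, monthly and so on) and deletes every archive that no rule keeps; the prune section of the Borg documentation has the exact semantics.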

Cons:

We have automated the deployment of the backup server and client with Puppet. This gives us a central point where we can configure the backups and see what is backed up, where, and when.

Here are the statistics from the initial backup of one of our development servers:

    Duration: 2 hours 31 minutes 44.02 seconds
    Number of files: 3390851

                            Original size    Compressed size  Deduplicated size
    This archive:            93.20 GB           93.20 GB           62.59 GB
    All archives:            93.20 GB           93.20 GB           62.59 GB

                            Unique chunks       Total chunks
    Chunk index:              1301009            3408522

This means that, even without compression enabled, deduplication alone already saves about 33% of disk space (62.59 GB stored for 93.20 GB of original data). Quite impressive.
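As a quick check of that figure, here is a small Python sketch that pulls the three sizes out of a "This archive:" line and computes 1 - deduplicated / original. It assumes the line looks like the one above (number, space, unit) and that sizes use decimal units (kB, MB, GB, TB); the helper names parse_sizes and dedup_savings are hypothetical.

```python
import re

# Assumption: sizes are printed as "<number> <unit>" with decimal (SI) units,
# as in the statistics shown above.
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}
SIZE_RE = re.compile(r"([\d.]+) (B|kB|MB|GB|TB)")


def parse_sizes(line: str) -> list[float]:
    """Return every size found on a stats line, converted to bytes."""
    return [float(num) * UNITS[unit] for num, unit in SIZE_RE.findall(line)]


def dedup_savings(line: str) -> float:
    """Percentage saved: 1 - deduplicated size / original size."""
    original, _compressed, deduplicated = parse_sizes(line)
    return (1 - deduplicated / original) * 100


line = "This archive: 93.20 GB 93.20 GB 62.59 GB"
print(f"{dedup_savings(line):.1f}% saved")  # prints "32.8% saved"
```

Converting to bytes first means the ratio stays correct even when the columns are printed in different units.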

For the deployment, I wrote some Puppet code that does the following:

  • Reads all the client properties needed for the client to work from an included Hiera file, for example the public SSH key, the schedule, and extra folders to back up. This gives us an organized, central place to see which servers are backed up, what is backed up, and when.
  • Creates a templated Bash script from these variables, so that each client has its own set of rules and definitions applied locally.
  • On the server side, uses the same variables to create a backup folder for each server being backed up and to add the client keys to the .ssh/authorized_keys file.
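To make the steps above more concrete, here is a Python sketch of what the templating boils down to. It is not the author's Puppet code, and everything in it is an assumption: the Hiera-like CLIENTS dictionary, the host backup.example.com, the /srv/borg repository root, the /etc base path and the retention values. The authorized_keys entry follows the command="borg serve --restrict-to-path ..." pattern from Borg's deployment documentation, which the setup described here may or may not use.

```python
import shlex
from string import Template

# Hypothetical client data standing in for the Hiera file; key names, hosts,
# paths and retention values are illustrative only.
BACKUP_HOST = "backup.example.com"
REPO_ROOT = "/srv/borg"
CLIENTS = {
    "dev01": {
        "ssh_pubkey": "ssh-ed25519 AAAAexamplekey root@dev01",
        "schedule": "0 2 * * *",
        "extra_paths": ["/srv/data", "/opt/app"],
    },
}

# Client-side Bash script; {hostname} and {now} are Borg archive-name placeholders.
# The schedule is only recorded as a comment in this sketch.
CLIENT_SCRIPT = Template("""#!/bin/bash
# Generated for $host (schedule: $schedule)
export BORG_REPO='borg@$backup_host:$repo'
borg create --stats ::'{hostname}-{now}' /etc $paths
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6
""")


def repo_path(host: str) -> str:
    """Per-client repository folder on the backup server."""
    return f"{REPO_ROOT}/{host}"


def render_client_script(host: str, props: dict) -> str:
    """Fill the client template with that client's properties."""
    return CLIENT_SCRIPT.substitute(
        host=host,
        schedule=props["schedule"],
        backup_host=BACKUP_HOST,
        repo=repo_path(host),
        paths=" ".join(shlex.quote(p) for p in props["extra_paths"]),
    )


def authorized_keys_line(host: str, props: dict) -> str:
    """Server-side entry that only lets the client key run borg serve on its own repo."""
    return (f'command="borg serve --restrict-to-path {repo_path(host)}",restrict '
            f'{props["ssh_pubkey"]}')


for host, props in CLIENTS.items():
    print(render_client_script(host, props))
    print(authorized_keys_line(host, props))
```

Deriving both the client script and the server entry from the same dictionary mirrors the idea of a single central source of truth for what is backed up, where, and when.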

The only part we have not yet automated with Puppet is generating the SSH key file and running borg init after the initial Puppet run. Puppet has no native ssh-keygen wrapper (and I didn't bother to write one), and borg init requires some manual input the first time it runs. We still need to look into automating this part, but I'm already quite happy with what we got for such low effort.
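One possible direction for those two manual steps, sketched in Python under several assumptions: ssh-keygen can run without prompts given -N "" and a key path that does not exist yet, Borg 1.x borg init reads the new repository passphrase from the BORG_PASSPHRASE environment variable, and BORG_RSH points Borg at the new key. The key path, repository URL and passphrase are hypothetical, and the sketch only prints the commands unless dry_run is turned off.

```python
import os
import shlex
import subprocess


def bootstrap_commands(key_path: str, repo: str) -> list[list[str]]:
    """Commands for the two manual steps: create the SSH key, then borg init."""
    return [
        # -N "" = no key passphrase, -q = quiet. ssh-keygen still asks before
        # overwriting an existing file, so bootstrap() skips it in that case.
        ["ssh-keygen", "-t", "ed25519", "-N", "", "-q", "-f", key_path],
        ["borg", "init", "--encryption=repokey", repo],
    ]


def bootstrap_env(key_path: str, passphrase: str) -> dict[str, str]:
    """Environment meant to let borg init run without interactive input."""
    env = dict(os.environ)
    env["BORG_RSH"] = f"ssh -i {shlex.quote(key_path)}"  # use the new key
    env["BORG_PASSPHRASE"] = passphrase  # assumed to answer the passphrase prompt
    return env


def bootstrap(key_path: str, repo: str, passphrase: str, dry_run: bool = True) -> None:
    env = bootstrap_env(key_path, passphrase)
    keygen, init = bootstrap_commands(key_path, repo)
    # Skip key generation if the key already exists (idempotent, like Puppet).
    steps = [init] if os.path.exists(key_path) else [keygen, init]
    for cmd in steps:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, env=env, check=True)


# Hypothetical key path, repository URL and passphrase.
bootstrap("/root/.ssh/borg_ed25519", "borg@backup.example.com:/srv/borg/dev01", "change-me")
```

Even with this, the ordering still matters: the new public key has to reach the server's authorized_keys, and the server's host key has to be known to the client, before borg init can connect without prompting.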

All of this took little effort thanks to Borg's simplicity, its thorough documentation, and how easily it can be scripted, which is very different from every commercial backup solution I have worked with so far.

For reference, read the awesome Borg documentation, especially the Internals section, which demystifies a lot of what Borg does and includes some cool tips.

The next post in this series will cover reporting with Borg.