Basic concepts to know before using storeBackup's replication

The prior subsection listed some of the main features of storeBackup's replication functionality. The following subsection offers a simple, typical example of keeping a copy of your backup data on another disk (or at another location). A later subsection offers a more advanced example.

But first, there are a few important conventions and concepts related to storeBackup's replication functionality that you need to be aware of. Replication involves four storage locations that you need to be conceptually familiar with. All four are ordinary directory trees.36

Of these four conceptual locations, one is the original source. The other three are related to backups or replication:

  1. ``master backup''37
  2. ``backup copy''38
  3. ``deltaCache''.
None of these three directories may be a subdirectory of another; they must be separate directory trees.39
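The separation rule can be checked mechanically. The following Python sketch is a hypothetical helper, not part of storeBackup; the function name check_separate_trees and the example paths are assumptions for illustration. It normalizes the three paths and reports any pair where one tree is the same as, or nested inside, another.

```python
from itertools import combinations
from pathlib import Path


def check_separate_trees(master, backup_copy, delta_cache):
    """Return a list of problems; an empty list means the trees are separate.

    Hypothetical helper illustrating the rule only; it does not call storeBackup.
    """
    # resolve() makes each path absolute, normalizes it, and follows
    # symlinks for the parts of the path that exist.
    trees = {
        "master backup": Path(master).resolve(),
        "backup copy": Path(backup_copy).resolve(),
        "deltaCache": Path(delta_cache).resolve(),
    }
    problems = []
    # Look at every unordered pair of trees once and test nesting both ways.
    for (name_a, a), (name_b, b) in combinations(trees.items(), 2):
        if a == b:
            problems.append(f"{name_a} and {name_b} are the same directory: {a}")
        elif b in a.parents:
            problems.append(f"{name_a} ({a}) is inside {name_b} ({b})")
        elif a in b.parents:
            problems.append(f"{name_b} ({b}) is inside {name_a} ({a})")
    return problems
```

For example, `check_separate_trees("/backup/master", "/backup/master/copy", "/backup/deltaCache")` reports that the backup copy lies inside the master backup. Comparing path components (via `parents`) rather than string prefixes keeps sibling names such as `/backup/master2` from being flagged.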

You are already familiar with what we are calling the ``master backup'' if you are doing any kind of backup: it is just the backup of your original data.40

The next important storage location for replication is the backup copy. Its role is probably obvious: creating it is the whole point of replication.41

The last of the important storage locations for replication is a cache of deltas (and meta data), which storeBackup uses to provide its advanced replication functionality as efficiently as possible. We refer to this location as the ``deltaCache''. The deltaCache exists because it allows the master backup to be completed (including hard linking) independently of the backup copies.

Another important replication detail is that each of these backup-related directory trees must have its own configuration file in the root of the tree. Because the configuration files are always in this fixed location, storeBackupUpdateBackup.pl can handle everything without additional options (or complications).

In storeBackup replication, data always flows as follows: master backup → deltaCache → (one or more) backup copies.

The configuration files are located as follows:

  1. ``master backup'' contains its own unique storeBackupBaseTree.conf
  2. each ``backup copy'' directory tree contains its own unique storeBackupBaseTree.conf
  3. ``deltaCache'' contains deltaCache.conf
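Because each tree keeps its configuration file at a fixed place in its root, checking that a setup is complete only requires the tree roots. The Python sketch below is hypothetical: the helper missing_configs and its arguments are not part of storeBackup, and only the two file names come from the list above.

```python
from pathlib import Path

# File names as given in the text; everything else here is hypothetical.
BASE_TREE_CONF = "storeBackupBaseTree.conf"
DELTA_CACHE_CONF = "deltaCache.conf"


def missing_configs(master, delta_cache, copies):
    """Return the expected config file paths that do not exist.

    Data flows master backup -> deltaCache -> backup copies, and each of
    these trees must carry its own config file in its root directory.
    """
    expected = [Path(master) / BASE_TREE_CONF, Path(delta_cache) / DELTA_CACHE_CONF]
    # Every backup copy tree has its own storeBackupBaseTree.conf.
    expected += [Path(c) / BASE_TREE_CONF for c in copies]
    return [str(p) for p in expected if not p.is_file()]
```

An empty result means every tree has its configuration file in place; the sketch only checks presence, not content.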

The ``master backup'' directory tree must contain the configuration file storeBackupBaseTree.conf. This config file defines which backup series are copied to the deltaCache.

Each ``backup copy'' directory tree contains its own individual configuration file, also named storeBackupBaseTree.conf. It defines which backup series are copied to that specific backup copy directory tree.

The ``deltaCache'' directory tree contains deltaCache.conf in the root of the tree. This configuration file is the one central place that defines which backup series are copied to which named backup copy (physical directory paths are not used). storeBackupUpdateBackup.pl needs this information to decide whether a backup can be marked as processed and, later, deleted: it has to know which backup copies want a backup and whether each of them has already copied it.
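The decision described above can be modeled with a little bookkeeping. In this hypothetical Python sketch, the class DeltaCacheState, its methods, and the backup labels are illustrative assumptions, not storeBackup's actual implementation or the format of deltaCache.conf. A backup counts as processed only when every named backup copy that wants its series has copied it.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaCacheState:
    """Hypothetical in-memory model of the deltaCache bookkeeping.

    wanted: backup series name -> identifiers of the backup copies that
            want that series (the role deltaCache.conf plays in the text).
    done:   (series, backup) -> identifiers of copies that finished it.
    """
    wanted: dict
    done: dict = field(default_factory=dict)

    def mark_copied(self, copy_name, series, backup):
        # Record that one named copy has finished this backup.
        self.done.setdefault((series, backup), set()).add(copy_name)

    def is_processed(self, series, backup):
        # Processed only when every copy that wants the series has it.
        # In this model, a series no copy wants is trivially processed.
        return self.wanted.get(series, set()) <= self.done.get((series, backup), set())
```

For instance, with `wanted={"homeBackup": {"CopyA", "CopyB"}}`, a backup stays unprocessed after only CopyA has copied it and becomes processed once CopyB has copied it too. Note that the copies are tracked purely by name; no directory path appears anywhere.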

These config files contain some options (e.g., backupTreeName) for which you specify a unique identifier. Note that this parameter is simply a named reference to another location. It is not a file system path or an actual directory name, but a unique identifier that you make up yourself. This is explained further below.

No information is shared between two different backup copies. For a home user, this is necessary because the external disks used for replication might not always be connected. In a professional setting, there may be no network route between the backup copies, for example for security reasons.

To understand the overall concept of storeBackup replication, it helps to know why the replication configuration uses these unique identifiers instead of specific directory names. Why not just use the directory name? Two examples illustrate the reasons.

First, consider someone who wants to make two backup copies (replicas) on two different external disks, one on odd weekends and one on even weekends, with both disks mounted at the same mount point. The most elegant way for storeBackup to manage the alternation between these two copies is via unique identifiers. In this example, imagine you have unique identifiers named CopyA and CopyB. These allow storeBackup to know whether each copy was completed (copied + hard linked), so that a backup can be moved to processedBackups, even if a copy run was interrupted. Identifying the copies by directory path would not work as well, because both disks appear under the same path.
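The alternation can be simulated in a few lines. The following Python sketch is a simplified, hypothetical model: the function name, the weekly schedule, and the backup labels are assumptions, and it does not reproduce storeBackup's internals. Each week one new backup arrives in the deltaCache, and whichever disk is attached identifies itself by name, not by mount point, and receives every backup it has not yet copied.

```python
def simulate_rotation(weeks, interrupted=frozenset()):
    """Hypothetical model of two backup copies rotated at one mount point.

    CopyA is attached in even weeks, CopyB in odd weeks. A week listed in
    `interrupted` is one in which copying failed and recorded nothing.
    Returns {backup label: set of copy identifiers that still need it}.
    """
    wanted = {"CopyA", "CopyB"}
    pending = {}
    for week in range(weeks):
        # A new backup reaches the deltaCache; every copy still needs it.
        pending[f"backup-week-{week}"] = set(wanted)
        attached = "CopyA" if week % 2 == 0 else "CopyB"
        if week not in interrupted:
            # The attached disk is recognized by its identifier, not its
            # path, and receives every backup it has not yet copied.
            for need in pending.values():
                need.discard(attached)
        # Backups that no copy still needs could move to processedBackups.
        pending = {label: need for label, need in pending.items() if need}
    return pending
```

Without interruptions, at most one backup is ever waiting for the other disk. After `simulate_rotation(3, interrupted={1})`, all three backups are still waiting for CopyB, which would catch up on its next successful run.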

The second example is a sysadmin who wants to make two replicas, one in the same data center and the other in a remote data center. In the local data center, he sets up a server that pulls its data from the deltaCache via mount points. In the remote data center, he sets up another server in the same way. Because the replication configuration uses unique identifiers, decoupled from the physical directories, this administration is easier.

The deltaCache configuration file does not know the directory where a backup copy is located. It knows only a name (the unique identifier), which is more flexible: if you change the directory of a backup copy, you do not have to change the deltaCache configuration file. And, as the examples above illustrate, you can have two unique identifiers pointing to the same physical path to rotate backup copies.

You will probably have at least four separate configuration files in your storeBackup replication setup: the three files mentioned above and your normal storeBackup.pl config file42.

The use of replication can affect two options of storeBackup.pl: --lateLinks and --otherBackupSeries.

If you do not currently run your backups with the option lateLinks and want to use replication, you have to enable lateLinks in storeBackup.pl. There is no real disadvantage to doing so: the option simply splits the full backup process into two steps without otherwise altering anything that would be done without it.

You also need to be aware of the option --otherBackupSeries in the main config file and how it relates to the possible need for a command line parameter with storeBackup.pl (e.g., 0:homeBackup, as shown in the example below).

If you want to replicate only one backup series, that series cannot have cross links to other backup series. Of course, this restriction only applies if you have multiple backup series (e.g., for different computers) in your master backup. From a series that is replicated, you cannot refer to series that are not replicated to the same backup copy. (Conversely, from a series that is not replicated, you can refer to any series that is replicated.)
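The rule can be expressed as a simple check. In this hypothetical Python sketch, the hard-link relationships are given as (from_series, to_series) pairs, an assumed representation for illustration rather than how storeBackup stores them. A link is a problem only if it starts in a series replicated to this backup copy and points to a series that is not.

```python
def invalid_cross_links(links, replicated):
    """Return links that would break replication to one backup copy.

    links:      iterable of (from_series, to_series) pairs, meaning backups
                in from_series hard link to files in to_series
                (hypothetical representation)
    replicated: set of series names replicated to this backup copy
    """
    # Links out of a non-replicated series are allowed, and links between
    # replicated series can be re-established in the backup copy.
    return [(src, dst) for src, dst in links
            if src in replicated and dst not in replicated]
```

With the links `[("pc1", "pc1"), ("pc1", "pc2"), ("pc2", "pc1")]` and only `pc1` replicated, the check reports `("pc1", "pc2")`; the link from `pc2` into `pc1` is fine. Replicating both series reports nothing.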

This restriction might go away in the future. (Lifting it would mean that the otherwise unresolvable files have to be added to the deltas in the deltaCache when storeBackupUpdateBackup.pl is run on the master backup.)

In short, to keep things simple when setting up replication for the first time, make sure that backups only hard link to older versions of the same backup series. Whatever is linked to in the master backup must also be present in the backup copy, so that the same links can be established there. If you replicate all series, you do not have to change anything about hard linking.

This is all very simple, but only if you understand what is happening. (Naturally, the situation is somewhat more complicated if you replicate different, overlapping sets of series to different backup copies.)

When running storeBackupUpdateBackup.pl on the backup copy, autorepair is switched on by default (but generates INFO entries only, no ERROR messages).

Heinz-Josef Claes 2014-04-20