Reducing Disk Space

Saving files as a whole

The first measure to reduce the required disk space is to compress the data, where that makes sense. storeBackup can use any compression algorithm that is available as an external program; the default is bzip2.
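
A minimal sketch of how a backup tool can hand compression to an external program. It assumes the compressor reads the original data on standard input and writes the compressed data to standard output, as `bzip2 -c` does. The function name `compress_with_external` and the default command are illustrative and do not describe storeBackup's internals.

```python
import subprocess
from pathlib import Path

# Illustrative default: any command that reads raw data on stdin and
# writes compressed data to stdout can be plugged in here.
DEFAULT_COMPRESS_CMD = ["bzip2", "-c"]


def compress_with_external(src, dest, cmd=DEFAULT_COMPRESS_CMD):
    """Compress src into dest by piping it through an external program."""
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        # check=True raises CalledProcessError if the compressor fails
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)
    return Path(dest)
```

Switching to another algorithm only means passing a different command list, for example `["gzip", "-c"]`.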

Looking closely at the stored data, it becomes apparent that relatively few files change from one backup to the next, which is the reason for incremental backups. We also find that a backup often contains many files with identical content, because users copy files or because a version control system (like cvs) is in use. In addition, users rename files or directory structures, and an incremental backup then stores them again, unnecessarily. The solution is to check the backup for files with the same content (possibly already compressed) and to refer to those instead of storing them again. storeBackup uses hard links for this referencing. By hard linking to files that already exist in earlier backups, each file is present in every backup, although it exists physically on the hard drive only once. Copying and renaming files or directories therefore costs only the space of the hard links, which is almost nothing.
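
The following sketch illustrates content-based deduplication with hard links. It assumes an in-memory dictionary that maps an md5 digest to a file already stored in a backup; the names `md5_of` and `store_file` are hypothetical, and files are copied uncompressed for brevity. storeBackup's actual bookkeeping may differ.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file's content, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_file(src, dest, index):
    """Store src at dest, hard linking to an existing copy if the content is known.

    index maps md5 digest -> path of a file already stored in a backup
    (a hypothetical stand-in for real backup bookkeeping).
    Returns True if a hard link was created, False if the data was copied.
    """
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    digest = md5_of(src)
    existing = index.get(digest)
    if existing is not None:
        os.link(existing, dest)   # new name for the same inode, no extra data
        return True
    shutil.copy2(src, dest)       # first occurrence: store the data once
    index[digest] = dest
    return False
```

All names that share one inode also share its permissions and timestamps, so a tool built this way would have to keep per-file metadata somewhere else.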

Most likely not just one computer needs to be backed up, but several. They often share a high proportion of identical files, especially in directories like /etc, /usr or /home. Obviously, identical files should be stored only once on the backup drive. The simplest solution would be to mount all directories on the backup server and back up all computers in one sweep; duplicate files would then be detected and hard linked. However, this approach has the disadvantage that all machines must be available during the backup. In many cases that is not feasible, for example when notebooks are to be backed up with storeBackup. Notebooks in particular tend to have a high overlap of files, because users create local copies. For such cases, or when servers are backed up independently of one another while the available disk space should still be used optimally through hard links, storeBackup can hard link files across independent backups (that is, backups made independently of each other, possibly from different machines).
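
A sketch of the same idea across independent backups, possibly from different machines: an index is built from the files already present in earlier backup trees, and a new backup links against it. It assumes that all backups live on one file system (hard links cannot cross file systems) and, for simplicity, rehashes the existing backups; a practical implementation would cache the checksums instead. `index_backups` and `backup_tree` are hypothetical names.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Return the md5 hex digest of a file's content."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def index_backups(backup_roots):
    """Map md5 -> one existing file path, across several independent backups."""
    index = {}
    for root in backup_roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                index.setdefault(md5_of(path), path)
    return index


def backup_tree(src_root, dest_root, index):
    """Back up src_root into dest_root, hard linking any content found in index.

    Returns (files_linked, files_copied).
    """
    linked = copied = 0
    for dirpath, _dirs, files in os.walk(src_root):
        target_dir = os.path.join(dest_root, os.path.relpath(dirpath, src_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(dirpath, name)
            dest = os.path.join(target_dir, name)
            digest = md5_of(src)
            if digest in index:
                os.link(index[digest], dest)   # shared with another machine's backup
                linked += 1
            else:
                shutil.copy2(src, dest)
                index[digest] = dest
                copied += 1
    return linked, copied
```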

Splitting files into parts: blocked files

Compressing and hard linking whole files works well for "normal" files such as office documents, configuration files, program code and all other kinds of small files.
It largely fails, however, for big image files of which only parts change. A 3 GB image may have only a few megabytes of changes between two backups, yet the method described above would copy or compress the whole 3 GB into the backup, which wastes both space and time. To solve this problem, storeBackup can handle such files in a special way.
In the configuration file you can specify which files should be handled as "blocked files". For each blocked file, a directory is created in the backup instead of a plain file (the directory has the same name as the original file). The file from the source is not stored as a whole in the backup; instead it is stored as small numbered blocks inside that directory. These blocks can be compressed.
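
A minimal sketch of the layout described above: the source file becomes a directory with the same name, filled with numbered and optionally bzip2-compressed blocks. The block size, the zero-padded naming scheme and the `.bz2` suffix are assumptions for illustration, not storeBackup's actual on-disk format.

```python
import bz2
import os

BLOCK_SIZE = 1024 * 1024  # hypothetical block size, chosen for illustration


def block_name(number, compressed):
    # Hypothetical naming: zero-padding keeps lexical and numeric order equal;
    # ".bz2" marks a compressed block.
    return f"{number:012d}" + (".bz2" if compressed else "")


def store_blocked(src, backup_dir, block_size=BLOCK_SIZE, compress=True):
    """Store src as a directory of numbered blocks inside backup_dir.

    Returns (directory_path, number_of_blocks).
    """
    target = os.path.join(backup_dir, os.path.basename(src))
    os.makedirs(target)  # the directory carries the original file name
    count = 0
    with open(src, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            data = bz2.compress(block) if compress else block
            with open(os.path.join(target, block_name(count, compress)), "wb") as out:
                out.write(data)
            count += 1
    return target, count
```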
In the next backup (after something in the original file has changed), storeBackup checks which of these blocks have changed and copies or compresses only those blocks. For the unchanged blocks, which are not copied again, it creates hard links to the matching blocks in the previous backup(s). This md5-sum-based comparison also covers other blocked files, so if you duplicate a VM for a different purpose, storeBackup will find the identical blocks. It will also find identical blocks within a single blocked file. This can happen when unused areas in an image are blanked, and happens on a massive scale when saving sparse files.
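
The md5-based block comparison can be sketched as follows, assuming an index from block digest to an already stored block that is shared across backups and across blocked files. Blocks are stored uncompressed to keep the example short; `backup_blocked` and its block naming are hypothetical.

```python
import hashlib
import os


def backup_blocked(src, backup_dir, block_index, block_size=1024 * 1024):
    """Store src as numbered blocks, hard linking every block whose md5 is known.

    block_index maps md5 of a block's content -> path of a stored block.
    Returns (blocks_written, blocks_linked).
    """
    target = os.path.join(backup_dir, os.path.basename(src))
    os.makedirs(target)
    written = linked = 0
    with open(src, "rb") as f:
        number = 0
        while block := f.read(block_size):
            dest = os.path.join(target, f"{number:012d}")
            digest = hashlib.md5(block).hexdigest()
            if digest in block_index:
                os.link(block_index[digest], dest)  # identical block already stored
                linked += 1
            else:
                with open(dest, "wb") as out:
                    out.write(block)
                block_index[digest] = dest
                written += 1
            number += 1
    return written, linked
```

Because the index is keyed only by content, the same lookup covers unchanged blocks from the previous backup, blocks of a duplicated VM image and repeated (for example all-zero) blocks within one file.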
As a result, the required space is reduced dramatically (compared with copying or compressing the whole file), and it is still possible to restore the contents of the original file without a running storeBackup. This follows the philosophy of storeBackup (restoring is the most important part of a backup) and might be useful in, say, 10 years. (Who knows what will happen by then!?)
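
Because the blocks are ordinary files, the original can be rebuilt with generic tools. This sketch assumes zero-padded numeric block names (so that sorting the names restores the original order) and a `.bz2` suffix on compressed blocks; both are assumptions, not storeBackup's documented format, and `restore_blocked` is a hypothetical name.

```python
import bz2
import os


def restore_blocked(block_dir, dest):
    """Rebuild the original file from a directory of numbered blocks.

    Assumes zero-padded numeric block names; blocks ending in ".bz2"
    are decompressed before being appended.
    """
    with open(dest, "wb") as out:
        for name in sorted(os.listdir(block_dir)):
            with open(os.path.join(block_dir, name), "rb") as f:
                data = f.read()
            if name.endswith(".bz2"):
                data = bz2.decompress(data)
            out.write(data)
    return dest
```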

Deleting Backups

storeBackup offers a set of options for deleting backups. Because every backup is a full backup, backups can be deleted in any order, which is a great advantage. Unlike with traditional backups, there is no need to consider whether an incremental backup depends on previous ones.
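
A small demonstration of why full backups built from hard links can be deleted in any order: removing a backup directory only removes names, and the data stays on disk as long as another backup still links to it. The two backup directories named by date are hypothetical.

```python
import os
import shutil
import tempfile

# Two "full" backups that share one file through a hard link.
root = tempfile.mkdtemp()
old = os.path.join(root, "2014-04-19")
new = os.path.join(root, "2014-04-20")
os.makedirs(old)
os.makedirs(new)
with open(os.path.join(old, "report.txt"), "w") as f:
    f.write("unchanged content\n")
os.link(os.path.join(old, "report.txt"), os.path.join(new, "report.txt"))

print(os.stat(os.path.join(new, "report.txt")).st_nlink)  # 2: two names, one copy

# Deleting the older backup only removes its directory entries ...
shutil.rmtree(old)

# ... the data survives because the newer backup still links to it.
with open(os.path.join(new, "report.txt")) as f:
    print(f.read(), end="")  # unchanged content
print(os.stat(os.path.join(new, "report.txt")).st_nlink)  # 1
```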
The options allow backups to be deleted or kept based on specific weekdays, or on whether a backup is the first or last existing one of the week, month or year. It can be ensured that a minimum number of backups always remains. This is especially useful if backups are not made on a regular basis: for example, the last backups of a laptop can be kept until the end of a four-week vacation even though the retention period is set to three weeks. Furthermore, a maximum number of backups can be defined. There are further options for resolving conflicts between contradictory rules in a common-sense way.
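
A sketch of how such rules might combine: an age limit, a guaranteed minimum number of backups, an optional maximum number, and keeping the last existing backup of each month. The parameter names and the order in which conflicts are resolved here (the minimum overrides the age limit, the maximum caps everything, dropping the oldest first) are assumptions for illustration; they are neither storeBackup's option names nor its exact semantics.

```python
from datetime import date


def backups_to_keep(backup_dates, today, keep_days, min_number=0,
                    max_number=None, keep_last_of_month=False):
    """Return the set of backup dates to keep (hypothetical policy sketch).

    keep_days: keep every backup younger than this many days.
    min_number: always keep at least this many of the newest backups.
    max_number: never keep more than this many backups (newest win).
    keep_last_of_month: also keep the last existing backup of each month.
    """
    newest_first = sorted(backup_dates, reverse=True)
    keep = {d for d in newest_first if (today - d).days < keep_days}
    if keep_last_of_month:
        last_of_month = {}
        for d in newest_first:  # first hit per month is the newest one
            last_of_month.setdefault((d.year, d.month), d)
        keep |= set(last_of_month.values())
    # The minimum number overrides the age rule ...
    keep |= set(newest_first[:min_number])
    # ... and the maximum number caps the result, dropping the oldest first.
    if max_number is not None:
        keep = set(sorted(keep, reverse=True)[:max_number])
    return keep
```

For the vacation case: if a laptop's last backups were made daily up to March 20 and deletion runs on April 17 with a 21-day retention period, `min_number=3` keeps the backups of March 18 to 20 instead of deleting all of them.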

Heinz-Josef Claes 2014-04-20