linkToDirs.pl

Makes a de-duplicated copy of the files in the specified directories at another location. Uses hard links wherever possible to avoid wasting storage space.

linkToDirs.pl is a general-purpose tool, but it is especially helpful if you want to copy a backup made with storeBackup to another disk.

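Usage note: whereas many file copy utilities have just two primary parameters (the source and the destination), linkToDirs.pl allows three primary parameters:

The source location is the directory (or directories) whose files are copied (see sourceDir).

The destination is the directory the files are copied to (see --targetDir option).

The reference location is the place to look for existing content which can be hard linked to (see --linkWith option).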

The --linkWith option is not required. If you use it, you can optionally specify multiple link references for hard linking (i.e., the --linkWith option can be repeated).

Files on the same file system with the same content as a file in the link reference(s) will be hard linked to that file. Hard links within the copied files are maintained or re-created: linkToDirs.pl always hard links identical files, with one exception: files in the directories specified by --linkWith are never changed. So if two identical files in those directories are not hard linked to each other, they remain unlinked. linkToDirs.pl supports hard linking of symbolic links at least as well as the main storeBackup.pl program does.

(Naturally, if there are no identical files, it will only copy files.)
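The matching rule above can be illustrated with a minimal Python sketch. It is not linkToDirs.pl's code: the function names (file_digest, index_reference, dedup_copy), the use of SHA-256 to compare content, and the assumption that the reference, source and target directories all sit on one file system are illustrative choices. Reference files are only read, and the first copy of any new content becomes the link target for later identical files, so duplicates inside the source end up linked to each other as well.

```python
import hashlib
import os
import shutil

# Hypothetical sketch, not linkToDirs.pl's implementation.
# Assumes all directories are on one file system (os.link fails otherwise).


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_reference(dirs):
    """Map content digest -> path of an existing regular file under dirs.

    The reference files are only read, never modified.
    """
    index = {}
    for top in dirs:
        for root, _subdirs, files in os.walk(top):
            for name in files:
                path = os.path.join(root, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    index.setdefault(file_digest(path), path)
    return index


def dedup_copy(source_dir, target_dir, link_with=()):
    """Copy source_dir into target_dir, hard linking every file whose
    content already exists in link_with or earlier in this copy."""
    index = index_reference(link_with)
    for root, _subdirs, files in os.walk(source_dir):
        rel = os.path.relpath(root, source_dir)
        dest_root = os.path.normpath(os.path.join(target_dir, rel))
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(dest_root, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # symlinks and special files are out of scope here
            digest = file_digest(src)
            if digest in index:
                os.link(index[digest], dst)  # same content: hard link
            else:
                shutil.copy2(src, dst)       # new content: real copy
                index[digest] = dst          # later duplicates link to it
```

Symbolic links and special files are skipped only to keep the sketch short.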

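Hard links on Linux have these rules:

A hard link can only be created on the same file system as the file it points to.

Each file system limits the number of hard links a single file can have, and that limit differs between file systems.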

If it is not possible to create a hard link to the reference file (due to the limitations of hard links), linkToDirs.pl creates a new copy of the file on the target file system and hard links subsequent identical files to that copy. In this way, linkToDirs.pl keeps the source files de-duplicated when copying them to another file system.
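The fallback can be sketched in Python as follows, assuming the failures of interest are the link(2) error codes EXDEV (target on a different file system) and EMLINK (too many links to the reference file). The function name link_or_copy and its link parameter, which only exists so that a test can simulate a full link count, are hypothetical. The returned path is the reference to use for the next identical file, which is how a single new copy keeps the following files de-duplicated.

```python
import errno
import os
import shutil

# Hypothetical sketch of "link if possible, otherwise copy and link to the copy".


def link_or_copy(ref, dst, link=os.link):
    """Hard link dst to ref; if that is impossible, copy ref to dst instead.

    Returns the path that later identical files should be linked to:
    ref if the hard link succeeded, otherwise the fresh copy dst.
    """
    try:
        link(ref, dst)
        return ref
    except OSError as e:
        # EXDEV: ref lives on another file system.
        # EMLINK: ref already has the maximum number of hard links.
        # Anything else (permissions, missing file, ...) is a real error.
        if e.errno not in (errno.EXDEV, errno.EMLINK):
            raise
    shutil.copy2(ref, dst)
    return dst
```

In a copy loop, index[digest] = link_or_copy(index[digest], dst) would switch the reference to the new copy whenever linking to the old one fails.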

linkToDirs.pl has a special synergy with storeBackup. storeBackup avoids wasting space in its backup location by keeping the backups de-duplicated with hard links (even if the target file system supports fewer hard links per file than the source file system). But hard links cannot span different file systems.

Therefore, when you want to copy an existing backup made with storeBackup to a new disk (or new file system), linkToDirs.pl allows you to do so and to maintain all the storage efficiency benefits of the original storeBackup backup.

            linkToDirs.pl [--linkWith copyBackupDir] [--linkWith ...]
                          --targetDir targetForSourceDir
                          [--progressReport number[,timeframe]]
                          [--printDepth] [--dontLinkSymlinks]
                          [--ignoreErrors] [--saveRAM] [-T tmpdir]
                          [--createSparseFiles [--blockSize]]
                          sourceDir ...

--help / -h
Print a help message
--linkWith / -w
The reference location; consider the files in these directories for hard linking. This option can be repeated. (The directories are recursed, as you would expect.)
--targetDir / -t
The destination; files from each sourceDir will be copied to this directory.
--dontLinkSymlinks
Do not hard link identical symbolic links (symlinks). By default, a symlink is hard linked to an existing identical symlink rather than copied.
--progressReport / -P / progressReport
Print a progress report after the specified number of files. If you also want a message at least once per time frame, append that time frame separated by a comma, e.g.:
-P 1000,1m10s (on the command line) or
progressReport = 1000,1m10s (in the configuration file).
There must be no white space in the parameter to this option. The syntax of the time frame is the same as for the keep* options.
sourceDir
The source directory; files (or existing storeBackup backups) from this directory will be copied to targetDir. sourceDir may be repeated with different directories. Normal shell file and directory conventions, including wildcards, are accepted. Copying recurses into all subdirectories of each sourceDir.
--ignoreErrors
Do not stop if errors occur while copying or linking.
--saveRAM / saveRAM
Use this option if linkToDirs.pl runs on a system with very little memory. You will then see some dbm files in tmpdir. This slows down linkToDirs.pl a little, so use it only if you run into problems without it. On modern computers, it should only be necessary if you copy millions of files.
--tmpdir / -T / tmpdir
Directory for temporary files. The default value is taken from the environment variable $TMPDIR; if $TMPDIR is not set, /tmp is used.
--createSparseFiles / -s
A mismatch between the block size, the number of blocks used by a file, and the file size indicates a (possible) sparse file. If this option is set, linkToDirs.pl copies each possibly sparse file with the external program cp, so that sparse files can be preserved. Linux systems (and many others) ship GNU cp, which supports sparse files; on other systems this option may not work, depending on the cp installed.
--blockSize
The block size used to detect a sparse file. The default value is 512.
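The value of --progressReport can be read as a file count plus an optional time frame. The Python sketch below parses it under two assumptions: the keep* time syntax uses the units d, h, m and s (days, hours, minutes, seconds), and units may be combined as in 1m10s. The function names are hypothetical; this is not linkToDirs.pl's own parser.

```python
import re

# Hypothetical parser for "number[,timeframe]".
# Assumed units (storeBackup-style "dhms"): days, hours, minutes, seconds.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_TIMEFRAME = re.compile(r"(?:[0-9]+[dhms])+")


def parse_timeframe(text):
    """Convert a time frame such as '1m10s' or '2h' into seconds."""
    if not _TIMEFRAME.fullmatch(text):
        raise ValueError(f"invalid time frame: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u]
               for n, u in re.findall(r"([0-9]+)([dhms])", text))


def parse_progress_report(value):
    """Split 'number[,timeframe]' into (file count, seconds or None)."""
    if any(c.isspace() for c in value):
        raise ValueError("no white space allowed in progressReport")
    count, sep, frame = value.partition(",")
    if not re.fullmatch(r"[0-9]+", count):
        raise ValueError(f"invalid file count: {count!r}")
    return int(count), (parse_timeframe(frame) if sep else None)
```

With this reading, 1000,1m10s means a report every 1000 files and at least every 70 seconds.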
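The sparse file test described for --createSparseFiles can be written as a single comparison: a file is possibly sparse if its allocated blocks, multiplied by the block size, cover fewer bytes than its size. This Python sketch assumes that is the comparison meant; the function names are hypothetical. On Linux, stat(2) reports st_blocks in 512-byte units, which fits the default --blockSize of 512.

```python
import os

# Hypothetical helpers illustrating the "possibly sparse" check;
# not linkToDirs.pl's code.


def possibly_sparse(size, blocks, block_size=512):
    """Return True if `blocks` blocks of `block_size` bytes hold fewer bytes
    than the file size, i.e. part of the file is probably a hole."""
    return blocks * block_size < size


def file_possibly_sparse(path, block_size=512):
    """Apply possibly_sparse() to a file on disk using os.stat().

    On Linux, st_blocks counts 512-byte units regardless of the file
    system's own block size, so block_size=512 compares like with like.
    """
    st = os.stat(path)
    return possibly_sparse(st.st_size, st.st_blocks, block_size)
```

A file flagged this way could then be copied with, for example, GNU cp --sparse=always, which writes holes for runs of zero bytes; this is one possible invocation, not necessarily the one linkToDirs.pl uses.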

Heinz-Josef Claes 2014-04-20