Hard links

Nick Duffek nick@duffek.com
Tue, 12 Mar 2002 08:37:58 -0500 (EST)

On 11-Mar-2002, Ben Escoto wrote:

>If /bin/gzip were diffed, then there would be 4 diffs in the increments
>directory instead of one snapshot.  Which of these would use more space
>depends on the specifics of how /bin/gzip was changed.

Agreed, diffing 4 copies of a binary is unlikely to be better than making
one snapshot, so that was a bad example.

However, text files tend to change more frequently and in a diff-friendly
fashion.  Would you agree that usually it would waste space to snapshot
them instead of diffing them?

What about viewing hard links in rdiff-backup-data as a special case of
compression, i.e. of noticing similarities between files and saving the
similar sections once instead of multiple times?
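
To make the "special case of compression" idea concrete, here is a minimal
sketch of storing each distinct block of data once and describing every
file as a list of block digests.  It is only an illustration: the
fixed-size blocks, the SHA-256 digests, and the names store_blocks and
rebuild are my assumptions, not anything rdiff-backup does.

```python
import hashlib


def store_blocks(files, block_size=4096):
    """Split each file into fixed-size blocks and keep each distinct block once.

    files: dict mapping a file name to its contents as bytes.
    Returns (store, recipes): store maps digest -> block bytes, and
    recipes maps file name -> ordered list of digests.
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # A block already seen in any file is not stored again.
            store.setdefault(digest, block)
            digests.append(digest)
        recipes[name] = digests
    return store, recipes


def rebuild(store, recipe):
    """Reassemble a file's contents from its list of block digests."""
    return b"".join(store[digest] for digest in recipe)
```

Under this scheme two hard-linked names end up with identical recipes and
add no new blocks, which is the sense in which hard links are just the
extreme case of "similar files".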

If there were long-term plans for rdiff-backup to do that for all files,
then the hard-link space savings question could be ignored, since
eventually it would become irrelevant.

>But this seems to contradict the earlier remark - why would it be useful
>if only 1% of your files are hardlinked?

I'm not sure which remark you mean: that it'd be useful to support hard
links, or that it'd be useful to have hard links intact in the mirror.

It's useful to support hard links because I want my backups to support
restoring the whole filesystem.  Users may depend on hard links for
obscure reasons other than space savings.

It's useful to have hard links intact in the mirror so that I can use the
mirror as-is -- for browsing, NFS-mounting, etc. -- and have an accurate
representation of yesterday's snapshot, including hard links.
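
One way to check that a mirror really is an accurate representation in
this sense is to compare which names share an inode on each side.  The
following is a rough sketch using only the Python standard library; the
function names are hypothetical, and it looks at regular files only.

```python
import os
import stat
from collections import defaultdict


def hard_link_groups(root):
    """Return sorted lists of relative paths under root that share an inode.

    Only groups with at least two names inside root are reported.
    """
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with a link count above one can be
            # part of a hard-link group.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                by_inode[key].append(os.path.relpath(path, root))
    return sorted(sorted(paths) for paths in by_inode.values()
                  if len(paths) > 1)


def same_link_structure(source_root, mirror_root):
    """True if both trees group the same relative paths into hard links."""
    return hard_link_groups(source_root) == hard_link_groups(mirror_root)
```

A mirror made with a plain file-by-file copy would fail this comparison,
because each name would get its own inode.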

>Why don't you think it would be useful to hardlink (snapshots) in the
>rdiff-backup-data dir?

Because rdiff-backup-data isn't a mirror, so I'd never browse it or
NFS-mount it to see a past snapshot.  I'd use rdiff-backup to regenerate
past filesystem states, and rdiff-backup would restore hard links during
the regeneration.
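
As a sketch of what "restore hard links during the regeneration" could
look like, assume the backup has recorded which relative paths were
linked together and that the first path of each group has already been
restored as an ordinary file.  The function name and the shape of the
groups argument are hypothetical; this is not rdiff-backup's actual
restore code.

```python
import os


def relink(dest_root, groups):
    """Recreate hard links under dest_root.

    groups: list of lists of relative paths.  The first path in each
    group must already exist; the others are made hard links to it.
    """
    for paths in groups:
        first = os.path.join(dest_root, paths[0])
        for rel in paths[1:]:
            target = os.path.join(dest_root, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Replace any separately restored copy with a real link.
            if os.path.lexists(target):
                os.remove(target)
            os.link(first, target)
```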

Yes, hard links in rdiff-backup-data would save me a small amount of
space, but they would complicate the task of manually fixing things when
something goes wrong, which for me counterbalances the small space