reversing order of diffs.

Donovan Baarda
Thu, 14 Mar 2002 14:37:00 +1100


new to the list... have some comments/ideas.

As I understand it, rdiff-backup currently uses a full copy of the most
recent backup, with diffs for older backups. It is also capable of
efficiently updating a remote backup by sending only deltas over the wire.

This makes it nice and easy to restore the latest backup, and a bit slower
to restore older backups. At first glance, this "full-latest + old-deltas"
architecture looks like a case where rdiff would be less efficient than
xdelta, which can calculate better deltas than rsync's block-aligned
matching algorithm. Also, xdelta2 would give you all the
"get-a-particular-version" and ACID features for free.
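
To make the restore cost concrete, here is a toy sketch of a
"full-latest + old-deltas" archive. It is not rdiff-backup's code: the
class and function names are made up for illustration, a backup is
modelled as a dict of {path: contents}, and deltas are per-file rather
than per-block.

```python
def reverse_delta(newer, older):
    """Return the changes needed to turn `newer` back into `older`.

    A value of None means the path did not exist in the older backup.
    """
    delta = {}
    for path in set(newer) | set(older):
        if newer.get(path) != older.get(path):
            delta[path] = older.get(path)
    return delta


def apply_delta(tree, delta):
    """Apply a delta to a backup tree and return the resulting tree."""
    result = dict(tree)
    for path, contents in delta.items():
        if contents is None:
            result.pop(path, None)
        else:
            result[path] = contents
    return result


class FullLatestArchive:
    """Hypothetical archive: one full latest copy plus reverse deltas."""

    def __init__(self, first_backup):
        self.latest = dict(first_backup)
        self.reverse_deltas = []  # newest first

    def add_backup(self, tree):
        # Record how to get from the new backup back to the previous one,
        # then replace the full copy with the new backup.
        self.reverse_deltas.insert(0, reverse_delta(tree, self.latest))
        self.latest = dict(tree)

    def restore(self, age):
        """age=0 is the latest backup; age=n applies n reverse deltas."""
        tree = dict(self.latest)
        for delta in self.reverse_deltas[:age]:
            tree = apply_delta(tree, delta)
        return tree
```

Restoring the latest backup is a plain copy, while restoring a backup n
generations back walks n deltas.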

However, xdelta alone can't do efficient over-the-wire transfers, because it
requires access to full copies of both versions to calculate the delta.
But as I understand it, rdiff-backup's efficient over-the-wire transfers
must involve calculating forward deltas to transmit over the wire,
generating the latest version for the archive, then calculating backward
deltas to record older versions in the archive. It looks to me like you
could still benefit from using xdelta as the archive store and rdiff
for the efficient over-the-wire transfers.
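
The point that a forward delta can be built without the old file in hand
can be sketched roughly as below. This is a simplified illustration, not
librsync's API or algorithm: the names signature, delta, patch and BLOCK
are invented here, it uses SHA-256 per block, and it rehashes at every
offset where real rsync uses a rolling weak checksum plus a strong one.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def signature(old, block=BLOCK):
    """Map the digest of each full block of `old` to its block index."""
    sig = {}
    for start in range(0, len(old) - block + 1, block):
        digest = hashlib.sha256(old[start:start + block]).digest()
        sig.setdefault(digest, start // block)
    return sig


def delta(new, sig, block=BLOCK):
    """Build a delta from `new` and the signature alone.

    The result is a list of ("copy", block_index) and ("data", bytes) ops.
    The old file itself is never read here.
    """
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        idx = None
        if len(chunk) == block:
            idx = sig.get(hashlib.sha256(chunk).digest())
        if idx is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", idx))
            pos += block
        else:
            literal.append(new[pos])
            pos += 1
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops, block=BLOCK):
    """Rebuild the new file from the old file plus a delta."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += old[arg * block:(arg + 1) * block]
        else:
            out += arg
    return bytes(out)
```

In this picture the archive side sends signature(old), the client replies
with delta(new, sig), and the archive runs patch(old, ops) to get the
latest version. The backward delta is then a second pass on the archive
side, where both versions are available.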

But... I question the whole full-latest+old-deltas archive. My problem is
that it doesn't allow you to make backups that you can store offline. You
cannot make a full backup, store it offline, then make small incremental
backups that you also keep offline. I know that people are going to say
"that is not what rdiff-backup is for", but I think it is pretty close and a
small change or two could add this. 

All you need is to (optionally) reverse things so you have a
full-oldest+new-deltas archive. For each backup you keep a full list of file
signatures online. The beauty of keeping this signature list online is you
can calculate new diffs against any backup, without having the full backup
online.

Storing the signature list online also saves recalculating it for remote
updates. Keeping forward deltas saves the forward+reverse delta calculation
needed when doing efficient over-the-wire transfers, as you just keep the
transferred delta. This brings the whole thing more in line with traditional
full+incremental backup tools, with the added benefit that _any_ previous
backup, full or incremental, can be used as a basis for an incremental
backup. Note that using offline backups with only online signatures means
you can't use xdelta as the store, since xdelta needs full copies of both
versions.
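
A rough sketch of the "full-oldest + new-deltas" idea, under the same toy
assumptions as before (a backup is a dict of {path: contents}, deltas are
per-file rather than per-block, and every function name is hypothetical):

```python
import hashlib


def signatures(tree):
    """The small per-backup signature list that stays online."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in tree.items()}


def incremental(tree, base_sigs):
    """Build a forward delta using only the base backup's signatures.

    The base backup itself can be sitting offline on a shelf.
    """
    changed = {path: data for path, data in tree.items()
               if base_sigs.get(path) != hashlib.sha256(data).hexdigest()}
    deleted = sorted(set(base_sigs) - set(tree))
    return {"changed": changed, "deleted": deleted}


def restore(full, increments):
    """Replay forward deltas, oldest first, on top of the full backup."""
    tree = dict(full)
    for inc in increments:
        for path in inc["deleted"]:
            tree.pop(path, None)
        tree.update(inc["changed"])
    return tree
```

Because incremental() only needs a signature list, the basis can be the
signatures of any earlier backup: either the previous incremental, giving
a chain, or the original full backup, giving a single larger delta.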

I'm going to look at the rdiff-backup code in more detail to implement
something like this soon, as I _need_ offline backups. I actually have a
significant amount of Python code already written towards this end,
including things like rsync-style include/exclude lists with efficient
directory pruning. I never quite finished it, and now that rdiff-backup is
here, I'm more interested in "molding/extending" it to my needs than
releasing Yet Another Backup Tool.

If anyone is interested, let me know...

ABO: finger for more info, including pgp key