xdelta vs. rdiff
Tue, 12 Feb 2002 12:12:07 -0500 (EST)
The ridiculously low number I was getting before is way off. The actual
size of the delta using xdelta on the tar (or iso) files is 115 Megs.
This is still substantially lower than the 150 Megs created by rdiffs of
each individual file and has the added advantage of working well with a
large single file with lots of binary data. The algorithm works
better under my set of constraints. The limiting factor others might
run into is that it requires a large amount of memory.
Ideally, you would have enough memory to have both files in
memory at the same time during delta generation. In my case this size is
1.3 Gigs. However, if the xdelta algorithm were used instead of rdiff
in a system like rdiff-backup, the size of each individual file in memory
would be much smaller than my 650 Meg file. Backups would take longer,
but the deltas for binary data would be substantially smaller.
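
To make the memory point concrete, here is a rough sketch of the arithmetic.
It assumes, as described above, that delta generation ideally holds the old
and new version of one file in memory at the same time. The function name
peak_delta_memory and the per-file package sizes are made up for
illustration; only the 650 Meg figure comes from my test.

```python
MB = 1024 * 1024


def peak_delta_memory(old_sizes, new_sizes):
    """Return the largest old+new byte total over files present in both sets.

    Assumption: the delta tool wants both versions of a file resident in
    memory at once, so the worst case is the biggest matching pair.
    """
    shared = old_sizes.keys() & new_sizes.keys()
    return max((old_sizes[name] + new_sizes[name] for name in shared), default=0)


# Whole-archive case: one pair of 650 MB files (figure from the message).
whole = peak_delta_memory({"distro.iso": 650 * MB}, {"distro.iso": 650 * MB})

# Per-file case: hypothetical package sizes, purely for illustration.
old = {"a.rpm": 12 * MB, "b.rpm": 30 * MB, "c.rpm": 5 * MB}
new = {"a.rpm": 13 * MB, "b.rpm": 31 * MB, "d.rpm": 8 * MB}
per_file = peak_delta_memory(old, new)

print(whole // MB, "MB for the whole archive")
print(per_file // MB, "MB for the largest per-file pair")
```

With these invented sizes the per-file worst case is a small fraction of the
whole-archive case, which is why a per-file approach should be much easier on
memory even though it has to run many more deltas.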
> On Mon, 11 Feb 2002, Dan Sturtevant wrote:
> > Ben, I used xdelta to create a diff of the distributions.
> > There were approx 50 rpms that were different between the two distros.
> > distro 1: 610 Megs
> > distro 2: 623 Megs
> > Delta file generated by rdiff on the two tarballs was ~650 Megs.
> > Delta directory generated by rdiff-backup ~150 Megs.
> > Delta directory created by rsync+ ~150 Megs. (although this system is
> > still beta and very broken.)
> > All the above systems were based on the rdiff algorithm. The reason the
> > rdiff-backup and rsync+ got down to 150 Megs is because they traverse
> > directories and make deltas against individual files. The compression
> > within each file is still based upon the inefficient rdiff algorithm.
> > Here is the impressive part.
> > running:
> > xdelta delta tar1.tar tar2.tar tar.patch
> > produced a patch file of 87K
> > I couldn't believe it.
> > I moved tar1.tar and tar.patch to a different directory and ran:
> > xdelta patch tar.patch tar1.tar tar2-2.tar
> > I then ran
> > diff tar2.tar tar2-2.tar.
> > No difference.
> > xdelta is very computationally intensive. I don't have any hard numbers
> > thus far, but my system running a 2.4.3-12 redhat kernel with 512 Megs of
> > memory was swapping.
> > Needless to say, I recommend looking into using this system in any case
> > where binary data represents the majority of the data you are trying to
> > compress.
> > Thanks,
> > Dan
> > On Thu, 7 Feb 2002, Ben Escoto wrote:
> > > >>>>> "DS" == Dan Sturtevant <email@example.com>
> > > >>>>> wrote the following on Thu, 7 Feb 2002 14:05:45 -0500 (EST)
> > >
> > > DS> 4. This was a nogo. rdiff's output (from the diff of the 2
> > > DS> tarballs) was ~650Megs. Each of the distros was approximately
> > > DS> the same size. I assume that this was because of file offsets
> > > DS> within the tar file and because lots of binary info was present.
> > > DS> The algorithm just didn't work for this case.
> > >
> > > This is a bit disappointing... It seems rdiff isn't as good at
> > > finding binary similarities as I thought. Just for my curiosity
> > > though, if you still have the tarballs around, could you try the same
> > > thing with xdelta v1.x.x? You can find RPMs of it with rpmfind. I'm
> > > wondering if it is superior to rdiff for this kind of thing.
> > _______________________________________________
> > Rdiff-backup mailing list
> > Rdiff-backup@keywest.Stanford.EDU
> > http://keywest.Stanford.EDU/mailman/listinfo/rdiff-backup