rdiff-backup across the network
Tue, 12 Mar 2002 11:48:01 -0800
>>>>> "ST" == Stephen Tan <Tan>
>>>>> wrote the following on Tue, 12 Mar 2002 13:59:47 -0000
ST> Hi, I'm considering using rdiff-backup here at work and I'm
ST> generally very impressed with the concept and ease of use of this
ST> It has a very low cpu and memory overhead which is welcome.
Well, I find that it uses lots of cpu, but I'm glad for your
ST> There is one thing I was wondering about though, and that is the
ST> speed across the network of rdiff-backup. I am running across a
ST> 100mbit switched LAN, and using rsync, I can achieve a transfer
ST> rate of about 1.5 mb/s (albeit with the CPU consumption going
ST> very high.)
ST> I get about 4-5 mb/min (for actual data transfer speed)
ST> using rdiff-backup.
Well that sounds good, except for the "min" part.
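
To put the two figures in the same units, here is a rough sketch. It
assumes "mb" means the same thing (megabytes) in both numbers, and the
function and constant names are only illustrative:

```python
# Compare the reported rsync and rdiff-backup rates in the same units.
# Assumption: "mb" is the same unit (megabytes) in both figures.
RSYNC_MB_PER_SEC = 1.5           # rsync rate reported above
RDIFF_MB_PER_MIN = (4.0, 5.0)    # rdiff-backup range reported above


def slowdown(rsync_mb_per_sec, rdiff_mb_per_min):
    """Return how many times faster rsync is than rdiff-backup."""
    return rsync_mb_per_sec * 60.0 / rdiff_mb_per_min


factors = [slowdown(RSYNC_MB_PER_SEC, rate) for rate in RDIFF_MB_PER_MIN]
print("rsync: %.0f mb/min" % (RSYNC_MB_PER_SEC * 60))
print("rsync is %.1fx to %.1fx faster" % (min(factors), max(factors)))
```

Under that assumption the gap is roughly a factor of 20, not 2-3.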
ST> You did mention that rdiff-backup was slower than rsync, but I
ST> didn't anticipate such a large factor.
ST> Is this because:
ST> (i) increasing throughput would load the cpu more? (ii) ssh is a
ST> bottleneck? (I think this is unlikely!) (iii) the rdiff
ST> algorithm is set for smaller bandwidths?
ST> I'd love to be able to increase the throughput by a factor of
ST> 2-3 if possible - I have some cpu on both ends to spare and lots
ST> of bandwidth. Is this possible?
I'm surprised rdiff-backup is doing so poorly. If cpu and bandwidth
aren't bottlenecks, what is? There is nothing in rdiff-backup (except
at one small point which I don't think would be an issue) which tells
it to take it easy, so it should just run as fast as it can.
Are you running rsync over ssh? I agree that it is unlikely that
ssh is the problem, but comparing rsync w/ ssh to rdiff-backup would
at least remove that variable.
If I can bother you to do some of my work for me, it would help if
you ran a few tests to try to narrow the problem down. There are three
natural possibilities for what rdiff-backup is doing too slowly:
1. Just transferring files. So you could just transfer one file for
an initial mirroring and see if that is much slower than rsync.
2. Comparing lots of files that are the same to see if they are the
same. I'm sure rdiff-backup has more overhead than rsync (or at
least it should, considering how much overhead there is, and rsync
has that neato superpipelining stuff), so this may be part of the
problem.
3. Updating changed files when most of the file is the same. So
maybe rsync is using a better diffing algorithm than rdiff.
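
Something like the following could be used to time each of the three
cases. It is only a sketch: the host name and paths are made up, the
helper names are mine, and the rsync and rdiff-backup command lines
assume the usual "rsync -a -e ssh SRC HOST:DEST" and
"rdiff-backup SRC HOST::DEST" forms, so adjust them for your setup.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion; return (wall-clock seconds, exit status)."""
    start = time.perf_counter()
    status = subprocess.call(argv)
    return time.perf_counter() - start, status


def build_commands(src_dir, host, dest_base):
    """Build comparable rsync and rdiff-backup commands (both over ssh).

    src_dir, host and dest_base are placeholders supplied by the caller.
    """
    return {
        "rsync": ["rsync", "-a", "-e", "ssh",
                  src_dir + "/", "%s:%s/rsync-test/" % (host, dest_base)],
        "rdiff-backup": ["rdiff-backup",
                         src_dir, "%s::%s/rdiff-test" % (host, dest_base)],
    }


def compare(commands):
    """Time each labelled command once; return {label: (seconds, status)}."""
    return {label: time_command(argv) for label, argv in commands.items()}


# Example use (hypothetical host and paths):
#   results = compare(build_commands("/tmp/testdir", "backuphost", "/tmp"))
#   for label, (secs, status) in results.items():
#       print("%-14s %8.2fs  exit=%d" % (label, secs, status))
```

For case 1, put a single large file in the source directory and run
against an empty destination. For case 2, run it a second time with
nothing changed. For case 3, modify a small part of a large file and
run it again.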
I ran some benchmarks of an earlier version (0.4.x?) against rsync. I
found that rdiff-backup uses more memory than rsync for small file
sets, but uses A LOT less memory for large file sets (rsync wants to
load the whole filelist into memory at once). In the local case,
rdiff-backup was 25% faster maybe. For the remote case, I think rsync
is about twice as fast (?) for lots of small files or not much change
(but this isn't as bad as it sounds, because rsync is already
something like 1000 times faster than ftp for small files under some
conditions - read Tridgell's dissertation), but rdiff-backup
approaches equality the larger the files get. But even if I'm
remembering this correctly, the current version is probably much