rdiff-backup optimization

Ben Escoto bescoto@stanford.edu
Thu, 16 May 2002 10:44:08 -0700

>>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
>>>>> wrote the following on Thu, 16 May 2002 19:30:34 +1000

  DB> I have a cleaner version of the rolling checksum code that is
  DB> 2~3x faster, for a start.

  DB> I posted a list of things that could be fixed to the rproxy list
  DB> a while ago. I'm looking at implementing them now. Depending on
  DB> when/if I get developer access on SF, I'll either post it all as
  DB> a patch, or release a new version of librsync.

Anything that makes rdiff faster will help with rdiff-backup, of
course, but I think the main problem with rdiff-backup is that it uses
too much CPU time.  For instance, if out/ doesn't exist and manyfiles
is a directory containing 10000 1-byte files:

~/prog/python/rdiff-backup/src $ time rsync -a manyfiles/ out
real    0m19.684s
user    0m1.300s
sys     0m5.260s

~/prog/python/rdiff-backup/src $ time rdiff-backup manyfiles out
real    1m32.337s
user    0m59.870s
sys     0m7.980s

    Running it again (so no files are changed, and they all just need
to be checked):

~/prog/python/rdiff-backup/src $ time rsync -a --delete manyfiles/ out
real    0m1.598s
user    0m0.990s
sys     0m0.530s

~/prog/python/rdiff-backup/src $ time rdiff-backup manyfiles/ out
real    0m31.987s
user    0m31.340s
sys     0m0.630s

    The directory in question is kind of a worst-case test for
rdiff-backup (for copying large files locally, it is actually faster
than rsync), but I think at least the second case is typical, where
rdiff-backup spends a lot of time realizing that nothing has changed.
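
    For anyone who wants to try the same kind of test, here is a
rough sketch of how a directory like manyfiles could be built.  The
function name make_many_files and the file naming scheme are just
assumptions for illustration; only the "10000 files of 1 byte each"
part comes from the test above.

```python
import os


def make_many_files(directory, count=10000):
    """Create `count` one-byte files in `directory`; return how many were made.

    Hypothetical helper: the name and the fileNNNNN naming scheme are
    illustrative only.
    """
    os.makedirs(directory, exist_ok=True)
    for i in range(count):
        path = os.path.join(directory, "file%05d" % i)
        with open(path, "wb") as f:
            f.write(b"x")  # exactly one byte per file
    return count
```

    Calling make_many_files("manyfiles") and then timing the two
commands as above should give a comparable worst case, though the
absolute numbers will depend on the machine and filesystem.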

    So rdiff-backup may waste more system calls (and maybe this
would be a bigger deal under Solaris) but at least on my system the
main reason it is much slower than rsync in these cases is its CPU
time.  Also, it seems that a lot of rdiff-backup's code is in the
"inner loop" (the profiler says the top 10 functions together account
for less than 50% of CPU time), so it won't be easy to get any miracle
speedup.
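
    The "top 10 functions" figure can be checked with something
like the sketch below.  It uses Python's standard cProfile and pstats
modules; the helper name top_n_share is made up for this example, and
this is not necessarily how the numbers above were produced.

```python
import cProfile
import pstats


def top_n_share(profile, n=10):
    """Return the fraction of total internal time spent in the n costliest
    functions recorded by `profile` (a cProfile.Profile object).

    Hypothetical helper, for illustration only.
    """
    stats = pstats.Stats(profile)
    # stats.stats maps (file, line, name) -> (cc, nc, tt, ct, callers);
    # tt is the time spent in the function itself, excluding callees.
    times = sorted((entry[2] for entry in stats.stats.values()), reverse=True)
    total = sum(times)
    return sum(times[:n]) / total if total else 0.0


def profile_call(func, *args, **kwargs):
    """Run func under the profiler and return the populated Profile."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args, **kwargs)
    finally:
        profiler.disable()
    return profiler
```

    A low value from top_n_share(profile_call(some_function)) means
the time is spread over many functions, so tuning a handful of hot
spots is unlikely to help much.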

    Unless I'm missing something, there are three options as far as
rdiff-backup optimization goes:

1.  Leave it the way it is.
2.  Conceptually rejigger the architecture so it somehow comes out
    much faster.
3.  Rewrite substantial portions of it in C.

Probably (1) is the only likely one in the near future.

Ben Escoto
