0.7.6 slower than 0.6.1

dean gaudet <dean-list-rdiff-backup@arctic.org>
Sat, 15 Jun 2002 19:21:28 -0700 (PDT)

On Sat, 15 Jun 2002, Ben Escoto wrote:

> Yes, rdiff-backup eats a lot of CPU.  There are undoubtedly
> inefficiencies along the lines Dean mentioned, but for most systems
> CPU (or possibly bandwidth) will be the limiting factor.

at times there's a cpu limitation, but i'm guessing the problem is that
rdiff-backup is mostly serialized.  fixing that is a chore though :)

i suspect that there'd be some benefit to spawning a couple of rdiffs in
parallel.  basically so that while one rdiff is blocked on reading data,
another is calculating.
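a rough sketch of the overlap idea, not how rdiff-backup is built: a thread pool works on several files at once, so one worker can hash while another is blocked in a read. `block_checksums`, `signatures_in_parallel` and the 2048-byte block size are hypothetical stand-ins for rdiff's signature step, and md5 over fixed blocks is only a placeholder for librsync's real signature format. threads are enough for this in CPython because file reads and hashlib digests of buffers this size release the GIL. spawning separate rdiff processes, as suggested above, would get the same overlap.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 2048  # hypothetical block size, not rdiff-backup's setting


def block_checksums(path):
    """Return (path, [one checksum per block]) for a single file.

    Reading each block is I/O-bound and hashing it is CPU-bound, which is
    the mix that benefits from having more than one file in flight.
    """
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sums.append(hashlib.md5(block).hexdigest())
    return path, sums


def signatures_in_parallel(paths, workers=2):
    """Checksum several files concurrently; returns {path: [checksums]}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(block_checksums, paths))
```

`workers=2` mirrors the "couple of rdiffs" above; past the point where either the disk or the cpu is saturated, more workers shouldn't help.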

with that change i know that my bottleneck would be bandwidth -- there's
only a 128kbit uplink from my mirror to my primary (1.5mbit the other
way).  i can watch the uplink saturate when rdiff hits a large file and
read-ahead can feed it data as fast as the cpu can do the checksums.
when it's in amongst small files the uplink isn't saturated at all.
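one simple way to see why small files leave the uplink idle: assume each file carries a fixed cost (stat, round trips, process setup) that does not overlap with the transfer. the 0.5 s overhead and the file sizes below are made-up numbers for illustration only, and 128kbit is taken as 128,000 bits/s.

```python
def effective_throughput(file_bytes, link_bytes_per_s, per_file_overhead_s):
    """Average bytes/s actually moved for one file, under the (hypothetical)
    model that a fixed per-file overhead is paid serially before the transfer."""
    transfer_s = file_bytes / link_bytes_per_s
    return file_bytes / (per_file_overhead_s + transfer_s)


UPLINK = 128_000 / 8   # 128 kbit/s -> 16,000 bytes/s
OVERHEAD_S = 0.5       # hypothetical per-file cost

large = effective_throughput(50_000_000, UPLINK, OVERHEAD_S)
small = effective_throughput(4_000, UPLINK, OVERHEAD_S)
print(f"large file: {large:.0f} B/s, small file: {small:.0f} B/s")
```

with these assumptions the large file keeps the link essentially full while the small file uses about a third of it, which is the gap that running several files in parallel would fill.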

in terms of scaling the mirror host to handle many primaries it might be
nice to have rdiff-backup cache the signature files on the mirror.
perhaps a file per directory with the names and signatures of the
directory contents.  not bulletproof -- if someone goes about mucking in
the mirror they could damage things.  but this would reduce the cpu and
i/o requirements on the mirror host.
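a minimal sketch of the per-directory signature cache, assuming a hypothetical `.sigcache.json` file in each mirror directory keyed by file name and validated by size and mtime. none of these names exist in rdiff-backup; `compute` stands in for whatever actually produces a signature and must return something JSON-serializable. the staleness check carries exactly the weakness mentioned above: an edit in the mirror that preserves size and mtime would go unnoticed.

```python
import json
import os

CACHE_NAME = ".sigcache.json"  # hypothetical per-directory cache file


def load_cache(directory):
    """Load the cache for one directory; a missing or corrupt file means empty."""
    try:
        with open(os.path.join(directory, CACHE_NAME)) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_cache(directory, cache):
    """Write the cache via a temp file + rename so a crash can't leave it half-written."""
    path = os.path.join(directory, CACHE_NAME)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(cache, f)
    os.replace(tmp, path)


def cached_signature(directory, name, compute, cache):
    """Return the signature for directory/name, recomputing only when size or mtime changed."""
    full = os.path.join(directory, name)
    st = os.stat(full)
    key = [st.st_size, st.st_mtime_ns]
    entry = cache.get(name)
    if entry is not None and entry["key"] == key:
        return entry["sig"]  # unchanged since last run: skip reading the file
    sig = compute(full)
    cache[name] = {"key": key, "sig": sig}
    return sig
```

on an unchanged file the mirror only pays for a `stat()`, which is where the cpu and i/o savings would come from.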

my particular setup is working like a charm though.  nightly finishes in 3
hours, and i've got about a 6 hour window in which i don't mind having a
1.5mbit outbound load from the server, so i've got room to grow.  ~50GB of
data in ~600k files.
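rough arithmetic behind "room to grow", assuming decimal units (1.5mbit = 1.5 x 10^6 bits/s, 1 GB = 10^9 bytes) and a link kept saturated for the whole window, which the small-file behavior described earlier suggests it won't be. this is an upper bound on what can cross the link per night, not on the size of the data set, since only changes get sent.

```python
HOUR_S = 3600

link_bits_per_s = 1.5e6   # outbound from the server
window_s = 6 * HOUR_S     # acceptable window for that load
nightly_s = 3 * HOUR_S    # current nightly run time

window_bytes = link_bits_per_s * window_s / 8
headroom = window_s / nightly_s
avg_file_bytes = 50e9 / 600e3  # ~50GB spread over ~600k files

print(f"max transfer in window: {window_bytes / 1e9:.2f} GB")  # 4.05 GB
print(f"window / current run: {headroom:.1f}x")                 # 2.0x
print(f"average file size: {avg_file_bytes / 1e3:.0f} KB")      # 83 KB
```

an ~83 KB average file size is also a reminder that most of the run is spent in the small-file regime, where per-file overhead rather than raw bandwidth sets the pace.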