--bwlimit how to

trevor@tecnopolis.ca trevor@tecnopolis.ca
Tue, 3 Sep 2002 13:33:09 -0500 (CDT)

> Very cool.  I added this to the FAQ at
> with some editing.  Tell me if you mind and I'll take it down.  I also
> edited it a bit so you may want to look at it and make sure it is
> still OK.

I don't mind.  I'm sure it will help people and keep you from getting so
many emails asking about bwlimit!

You might also want to put a URL to a source for cstream in there as
only people who've read your list archives will have a clue what we're
talking about.

>     One question though:  it looks like only ssh's output is getting
> piped into cstream.  I suppose this is appropriate since the source
> directory is remote.  Would the commands have to be rearranged if the
> source directory is local?

You mean (local source, remote destination)?  You must, because the
(local source, local destination) permutation wouldn't really benefit
from a bwlimit :-)

Good point.  It works for my setup because I'm limiting the bandwidth
coming into my box, which is acting as the rdiff-backup client.  I have
no need to limit the outgoing bandwidth usage.

If you were to reverse it (local source, remote dest), then it would
seem necessary to reverse the pipe ordering as you say.

In fact, I just tested:

rdiff-backup --remote-schema \
  'cstream -v 1 -t 10000 | ssh %s '\''rdiff-backup --server'\'' | cstream -t 10000' \
  'netbak@foo.bar.com::/mnt/backup' localbakdir

and it seems to work.  That applies a limit in both directions.  I
don't think you'd ever really want this, though, as you normally only
need to limit one direction.  Also, note that I only pass -v 1 to one
of the two cstreams.  You probably don't want stats output for both
directions, as it will confuse whatever script you have parsing the
output.  I guess it wouldn't hurt for manual runs, however.

So taking off the last cstream you get:

* rdiff-backup --remote-schema \
  'cstream -v 1 -t 10000 | ssh %s '\''rdiff-backup --server'\' \
  localsrcdir 'user@foo.bar.com::/remotebakdir'

I haven't tested that to see if the limiting works as we'd expect, but
it should.

The only other option would be to put cstream in the ssh command like:

rdiff-backup --remote-schema \
  'ssh %s '\''cstream -v 1 -t 10000 | rdiff-backup --server'\' \
  localsrcdir 'user@foo.bar.com::/remotebakdir'

I'm racking my brain trying to determine what the difference between
that and the previous option would be, if any.  I guess with this last
example you are limiting the incoming bandwidth on the remote dest box,
as opposed to limiting the outgoing bandwidth on the local src box.
Network buffering means the two should behave slightly differently,
with limiting on the receiving side probably being "burstier", but the
overall bandwidth limit should still hold.

Come to think of it, the same notes should apply, in reverse, to my
original usage of (remote src, local dest).

* rdiff-backup --remote-schema \
  'ssh %s '\''rdiff-backup --server | cstream -v 1 -t 10000'\' \
  'netbak@foo.bar.com::/mnt/backup' localbakdir

This may actually be preferable to eliminate the "burstiness" of
limiting it on the local destination.

In fact, I've just changed my script to do it this way.  It seems to
work perfectly.  Thanks for making me think more about this!

So if you want to update your FAQ, use the 2 examples I've marked with
*'s as they, in theory, should be the best methods.
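
For scripting, the two starred schemas could be wrapped in a small shell
function that picks the right pipe order for the direction of the
backup.  This is only a sketch: bw_schema and its "pull"/"push"
arguments are made-up names, not anything rdiff-backup provides, and the
strings it prints are simply the two starred examples with the rate
filled in.

```shell
# bw_schema is a hypothetical helper name, not part of rdiff-backup.
# Usage: bw_schema pull|push BYTES_PER_SECOND
bw_schema() {
    direction=$1
    rate=$2
    case $direction in
        pull)
            # Remote source, local destination: throttle the server's
            # output on the remote box before it enters the ssh tunnel.
            printf "ssh %%s 'rdiff-backup --server | cstream -v 1 -t %s'\n" "$rate"
            ;;
        push)
            # Local source, remote destination: throttle our outgoing
            # stream before it reaches ssh.
            printf "cstream -v 1 -t %s | ssh %%s 'rdiff-backup --server'\n" "$rate"
            ;;
        *)
            echo "usage: bw_schema pull|push BYTES_PER_SECOND" >&2
            return 1
            ;;
    esac
}
```

It would then be used as, for example:
rdiff-backup --remote-schema "$(bw_schema pull 10000)" 'netbak@foo.bar.com::/mnt/backup' localbakdir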

One last thing: I'm trying to determine whether cstream limits bandwidth
on a per-second basis or as an average over the total connect time.  I
suspect the latter.  By this I mean that it will "bank" bandwidth
during periods of low or no transmission.  rdiff-backup can spend a lot
of time thinking, especially on big files, so there will be periods
where nothing is being transmitted.  If it "banks" that time, there
could be 30 seconds of no bw usage, then a burst of traffic when
rdiff-backup actually sends some data, which could potentially
saturate your network connection until the "bank" runs out and it goes
back to your specified limit.

I'm only guessing at this, as I'm hard pressed to find an easy method of
testing this.  However, this would make it extremely bursty and somewhat
contrary to what I want to achieve with --bwlimit.
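
One rough way to check this would be to feed the limiter nothing for a
while, then a known number of bytes, and time the whole run.  The sketch
below assumes head -c is available; probe_banking is a made-up name, and
the limiter command is passed in as arguments so the same function works
with any rate-limiting filter.

```shell
# probe_banking is a hypothetical helper name.
# Usage: probe_banking IDLE_SECONDS BYTES LIMITER_COMMAND [ARGS...]
# Prints the total elapsed wall-clock seconds.
probe_banking() {
    idle=$1
    bytes=$2
    shift 2
    start=$(date +%s)
    # Send nothing for $idle seconds, then $bytes zero bytes, through
    # the limiter command given as the remaining arguments.
    { sleep "$idle"; head -c "$bytes" /dev/zero; } | "$@" >/dev/null
    end=$(date +%s)
    echo $((end - start))
}
```

For example, "probe_banking 30 1000000 cstream -t 10000" should take
roughly 130 seconds if idle time is not banked (30 idle plus 100 to send
1000000 bytes at 10000 bytes/second), and closer to 100 seconds if the
limit is averaged over the whole run.  Whether cstream starts its clock
at launch or at the first byte is something I don't know, so treat the
numbers as a guide only.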

From the cstream man page:

-t num    Limit the throughput of the data stream to num bytes/second.
       Limiting is done at the input side, you can rely on cstream not
       to accept more than this rate. cstream accumulates errors and
       tries to keep the overall rate at the specified value, not just
       every single read or pairs of reads in isolation.

That would seem to indicate my suspicions are correct.  :-(
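
To put a rough number on the burst, here is a back-of-the-envelope
calculation in shell arithmetic.  It assumes a simple model that is not
taken from the cstream source: unused allowance accumulates at the limit
rate while idle, and the burst then runs at full line rate until the
backlog is used up.  banked_burst and its arguments are made-up names.

```shell
# banked_burst is a hypothetical helper; the model behind it is an
# assumption, not taken from the cstream source.
# Usage: banked_burst LIMIT_BPS IDLE_SECONDS LINE_BPS  (integers, bytes/second)
banked_burst() {
    limit=$1
    idle=$2
    line=$3
    if [ "$line" -le "$limit" ]; then
        echo "line rate must exceed the limit" >&2
        return 1
    fi
    # Allowance accumulated while nothing was sent.
    banked=$((limit * idle))
    # While bursting at line rate, allowance keeps accruing at the limit,
    # so the backlog drains at (line - limit) bytes per second.
    burst_ms=$((banked * 1000 / (line - limit)))
    echo "banked=$banked bytes, burst_ms=$burst_ms"
}

# 10000 bytes/s limit, 30 s idle, 10 Mbit/s link (1250000 bytes/s)
banked_burst 10000 30 1250000
# prints: banked=300000 bytes, burst_ms=241
```

So under this model the 30 second pause in my example would buy about a
quarter of a second at full speed on a 10 Mbit link, and proportionally
more after longer pauses or with a higher limit.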

This leads me to a suggestion.  rdiff-backup should either multithread
or fork into 2 processes so that while one file's data is transferring,
it can be processing the next file and getting it ready for transfer.
That way you keep the data pipe busy at all times, instead of the
think - transfer - think - transfer pattern it seems to follow now.

I'm not saying that this is imperative or you should triple the
complexity of the code to do this... (I am a programmer and know how
this stuff works)... but if you ever do a rewrite it would be something
to think about.

Phew, long email!  I'll leave it at that for now.  Hopefully this will
be helpful to someone (including me!).