Loading whole filelist into memory

Ben Escoto bescoto@stanford.edu
Thu, 28 Mar 2002 00:12:02 -0800

Another thing I wanted feedback on was how to process filelists.  For
technical reasons it would be a lot easier for me to read the whole
filelist into memory and sort it, instead of, say, reading a line,
backing up that file, reading the next line, etc.  The main
differences would be:

1.  Applications couldn't generate filelists that depended on
    rdiff-backup already having processed an earlier file in the
    list.  This is a stretch; I don't think it would be an issue
    in real life.

2.  More importantly, large filelists may not fit into memory easily.
    For instance, rsync builds whole filelists ahead of time, and for
    that reason often consumes hundreds of megabytes of memory.  Say
    each entry in a filelist takes up 60 bytes.  If the filelist
    contained 10 million files, that's 600 MB.

3.  Sorting a very long filelist could take a while, but in practice
    filelists would mostly arrive pre-sorted, and even then sorting 10
    million entries probably wouldn't take long anyway (?).
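As a rough sketch of what I mean (plain Python; `backup_file` is just a
hypothetical stand-in for the per-file backup step, not anything in
rdiff-backup), the load-sort-process approach would look something like:

```python
# Sketch of reading a whole filelist into memory, sorting it, and only
# then processing each entry.  The streaming alternative would instead
# call backup_file() inside the read loop, one line at a time.
import io

def process_filelist(filelist, backup_file):
    """Read the entire filelist into memory, sort it, then back up each file."""
    entries = [line.rstrip("\n") for line in filelist if line.strip()]
    entries.sort()            # paths compare lexicographically
    for path in entries:
        backup_file(path)     # hypothetical per-file backup step
    return entries

# Example with an in-memory filelist:
listing = io.StringIO("b/file2\na/file1\nb/file1\n")
processed = []
process_filelist(listing, processed.append)
# processed is now ['a/file1', 'b/file1', 'b/file2']
```

The difference in point 1 above falls out of this directly: nothing gets
backed up until every line has been read, so a later line can't depend on
an earlier one already having been processed.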

Also, --exclude-from-filelist wouldn't make much sense if the entire
filelist couldn't be read first.
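To illustrate that point, here is a minimal sketch (the leading "- "
exclude marker is just an assumed syntax for this example, not
necessarily the real one) of why exclusion needs the full list up front:
an exclude entry can appear anywhere in the file, so no entry can safely
be processed until every line has been read.

```python
# Separate include entries from exclude entries (hypothetical "- " syntax),
# then drop any include that was excluded somewhere in the list.
def split_filelist(lines):
    includes, excludes = [], set()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("- "):
            excludes.add(line[2:])
        else:
            includes.append(line)
    return [p for p in includes if p not in excludes]

paths = split_filelist(["a/file1", "- b/file2", "b/file2", "b/file1"])
# paths == ['a/file1', 'b/file1']
```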

Ben Escoto
