Loading whole filelist into memory

Donovan Baarda abo@minkirri.apana.org.au
Fri, 29 Mar 2002 21:39:30 +1100

On Thu, Mar 28, 2002 at 12:12:02AM -0800, Ben Escoto wrote:
> Another thing I wanted feedback on was how to process filelists.  For
> technical reasons it would be a lot easier for me to read the whole
> filelist into memory and sort it, instead of, say, reading a line,
> backing up that file, reading the next line, etc.  The main
> differences would be:

generally, reading a whole filelist is much simpler. However, it also chews
memory, and it extends the time-gap between when a file is "scanned" and when
it is backed up. This means you are more likely to hit the "file has
changed/disappeared/appeared between scanning and backing up" problem.
> Also, --exclude-from-filelist wouldn't make much sense if the entire
> filelist couldn't be read first.

not entirely true... a smart scanner can exclude files and skip whole
directories as it scans.
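A minimal sketch of that idea, assuming a hypothetical `scan` generator and an
`exclude` predicate (this is not dirscan.py's actual API): because exclusion is
tested before descending, an excluded directory's contents are never even
listed.

```python
import os

def scan(startdir, exclude):
    """Yield matching paths under startdir, pruning excluded directories
    so their whole subtrees are skipped (illustrative sketch only)."""
    for name in sorted(os.listdir(startdir)):
        path = os.path.join(startdir, name)
        if exclude(path):
            continue  # excluded file, or excluded directory and subtree
        yield path
        if os.path.isdir(path):
            for sub in scan(path, exclude):
                yield sub
```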

Currently my "dirscan.py" module builds and returns a big python list of all
matching files. This was so you could do things like:

    for file in scan(startdir, selectlist):
        do something...

I'm thinking of changing/extending this so that it can be used to process
files as they are scanned. The simplest approach would be to introduce an
os.path.walk() style function that applies a callback to each matching file as
it is found. A probably better way would be for me to delve into how things
like xrange work, to see if I could implement something similar: a lazy object
that yields files one at a time as they are scanned.
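The two approaches can be sketched like this (the names `walk_scan` and
`iter_scan` are illustrative, not dirscan.py's API, and matching/exclusion is
omitted for brevity):

```python
import os

def walk_scan(startdir, func):
    # Callback style: apply func to each file as it is found,
    # so nothing needs to be accumulated in memory.
    for dirpath, dirnames, filenames in os.walk(startdir):
        for name in filenames:
            func(os.path.join(dirpath, name))

def iter_scan(startdir):
    # xrange-style lazy object: a generator that yields files one at
    # a time, so the caller's loop body runs as each file is scanned.
    for dirpath, dirnames, filenames in os.walk(startdir):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

With the generator form, the original `for file in scan(...)` loop keeps
working unchanged, but files can be backed up as they are found instead of
after the whole list has been built.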

ABO: finger abo@minkirri.apana.org.au for more info, including pgp key