Loading whole filelist into memory

Donovan Baarda abo@minkirri.apana.org.au
Sat, 30 Mar 2002 17:30:56 +1100

On Fri, Mar 29, 2002 at 12:04:01PM -0800, Ben Escoto wrote:
> >>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
> >>>>> wrote the following on Fri, 29 Mar 2002 21:39:30 +1100
>   >> Also, --exclude-from-filelist wouldn't make much sense if the
>   >> entire filelist couldn't be read first.
>   DB> not entirely true... a smart scanner can exclude files and skip
>   DB> whole directories as it scans.
> Well, suppose there is an exclude filelist.  rdiff-backup wants to
> start backing up, so it begins with file, say, /bin/ls.  Should it
> back it up, or it is somewhere in the exclude list?  Unless we require
> the exclude list to be sorted or something like that we can't process
> a single file until the whole list is read.

Ahh. A misunderstanding.

I thought you meant reading the whole directory tree filelist, not the whole
include/exclude list. I can't really see any way of avoiding reading the
whole include/exclude list into memory. 
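
As a rough sketch of the split I mean (this is not the real dirscan.py; the names load_exclude_list and scan and their signatures are made up for illustration, and it assumes a Python recent enough to have os.walk and sets): the exclude list is read completely first, and the directory tree is then pruned as it is walked.

```python
import os

def load_exclude_list(lines):
    """Read the whole exclude list into a set of normalised paths.

    Hypothetical helper: one path per line, blank lines ignored.
    """
    return {os.path.normpath(line.strip()) for line in lines if line.strip()}

def scan(startdir, excluded):
    """Return the files under startdir, skipping excluded files and
    never descending into excluded directories."""
    found = []
    for dirpath, dirnames, filenames in os.walk(startdir):
        # Prune in place so os.walk skips excluded directories entirely.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(dirpath, d)) not in excluded
        ]
        for name in filenames:
            path = os.path.normpath(os.path.join(dirpath, name))
            if path not in excluded:
                found.append(path)
    return found
```

The exclude set has to be complete before the first file is looked at, but membership tests are cheap, so the order of the list does not matter. This assumes the list entries are written relative to the same base as startdir.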

>   DB> Currently my "dirscan.py" module builds and returns a big python
>   DB> list of all matching files. This was so you could do things
>   DB> like;
>   DB> for file in scan(startdir,selectlist): do something...
>   DB> I'm thinking of changing/extending this so that it can be used
>   DB> to process files as they are scanned. The simplest approach
>   DB> would be to introduce an os.walk() style command that applies a
>   DB> function to each matching file as it finds them. A probably
>   DB> better way would be for me to delve into how things like xrange
>   DB> work to see if I could implement something like it.
> They are called generators and are a great new feature of python 2.2.
> So you can use the exact same:
> for file in scan(startdir,selectlist):
>     do something...
> but have scan(..) yield objects as they are requested by the for loop.
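
For reference, the generator version being described would look something like the sketch below. The body of scan is invented for illustration and is not dirscan.py; in particular, select is assumed here to be a callable that returns true for wanted paths, and os.walk is assumed to be available. On Python 2.2 itself the module would also need "from __future__ import generators".

```python
import os

def scan(startdir, select):
    """Yield matching file paths one at a time instead of building a list.

    `select` is a hypothetical predicate: select(path) -> bool.
    """
    for dirpath, dirnames, filenames in os.walk(startdir):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if select(path):
                yield path  # execution pauses here until the next item is requested
```

The calling loop stays exactly "for file in scan(startdir, select): ...", but only one path exists in memory at a time.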

I've thus far been avoiding 2.2 features. I thought that since 2.1 had
xrange, there might be a way to make it do the same thing... but maybe not.
I thought maybe you could do something weird with a UserList that builds the
elements as they are referenced...
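
Something along these lines might do it without generators. This is only a sketch: it uses a plain class rather than UserList, the name LazyFileList is made up, and it leans on the old sequence protocol where a for loop calls __getitem__(0), __getitem__(1), ... until IndexError is raised.

```python
import os

class LazyFileList:
    """Sequence-like object whose elements are built as they are referenced.

    Hypothetical illustration only. Supports sequential access, which is
    all a plain for loop needs.
    """

    def __init__(self, startdir):
        self._pending = [startdir]  # directories not yet listed
        self._files = []            # files found but not yet handed out
        self._next = 0              # index the for loop should ask for next

    def __getitem__(self, index):
        if index != self._next:
            raise IndexError("only sequential access is supported")
        # List directories one at a time until a file is available.
        while not self._files:
            if not self._pending:
                raise IndexError(index)  # scan finished; ends the for loop
            directory = self._pending.pop()
            for name in sorted(os.listdir(directory)):
                path = os.path.join(directory, name)
                if os.path.isdir(path) and not os.path.islink(path):
                    self._pending.append(path)
                else:
                    self._files.append(path)
        self._next += 1
        return self._files.pop(0)
```

With this, "for file in LazyFileList(startdir): ..." only ever holds the contents of one directory plus the list of directories still to visit, not the whole tree.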

ABO: finger abo@minkirri.apana.org.au for more info, including pgp key