Expanding the include/exclude options

Ben Escoto bescoto@stanford.edu
Fri, 29 Mar 2002 12:31:43 -0800

>>>>> "DB" == Donovan Baarda <abo@minkirri.apana.org.au>
>>>>> wrote the following on Fri, 29 Mar 2002 21:21:44 +1100

  DB> Someone suggested forgetting about include/exclude lists, and
  DB> just using a list of files from something like 'find'. This is
  DB> fine, except when you want to selectively restore files, you
  DB> can't use 'find' to search through a backed-up file list, so
  DB> your "select for backup" tool ends up different to your "select
  DB> for restore" tool. This can be a pain.

Yes, excellent point.

  DB> rsync's extended unix-wildcard syntax is nice; directories end
  DB> in '/', '*' and '?' match anything except '/', ** and ?? match
  DB> anything including '/'.  rsync has taken the "require
  DB> directories to be explicitly included" approach which means you
  DB> need to do things like "--include /home/ --include /home/*/
  DB> --include /home/*/Mail/ --include /home/*/Mail/** --exclude **"
  DB> to get what you really wanted from the above example. It also
  DB> allows a sort of shorthand where anything without a '/' matches a
  DB> filename with any directory prefix.
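[Illustration] The rule list quoted above can be read as a first-match-wins filter. The sketch below is only a rough model of that behaviour, not rsync's code: the regular expressions are hand-translated from the quoted patterns, directories are represented by a trailing '/', and the names RULES, decide and select are hypothetical. It shows why the directory rules are needed: once a directory is excluded, nothing beneath it is ever examined.

```python
import re

# Hand-translated from the quoted rules: '*' -> [^/]*, '**' -> .*
# Assumption: directory paths carry a trailing '/'.
RULES = [
    ("include", re.compile(r"/home/\Z")),
    ("include", re.compile(r"/home/[^/]*/\Z")),
    ("include", re.compile(r"/home/[^/]*/Mail/\Z")),
    ("include", re.compile(r"/home/[^/]*/Mail/.*\Z", re.DOTALL)),
    ("exclude", re.compile(r".*\Z", re.DOTALL)),
]


def decide(path, rules=RULES):
    """Return the action of the first rule whose pattern matches path."""
    for action, pattern in rules:
        if pattern.match(path):
            return action
    return "include"  # assumption: paths matching no rule are kept


def select(paths, rules=RULES):
    """Filter a parent-before-child path list, pruning excluded directories."""
    pruned = []  # excluded directories; their contents are never examined
    kept = []
    for path in paths:
        if any(path.startswith(prefix) for prefix in pruned):
            continue
        if decide(path, rules) == "include":
            kept.append(path)
        elif path.endswith("/"):
            pruned.append(path)
    return kept
```

Passing only the last two rules to select() excludes "/home/" through the final '**' rule, so the Mail directories below it are never reached.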

Ok, so one question is python/perl style regular expressions vs
(extended?) shell globbing.  In favor of regular expressions:

1.  More flexible
2.  Entire syntax may be known by people already
3.  Backwards compatibility (--exclude already uses them)

And in favor of extended shell globbing:

1.  Superset (?) of normal globbing, which everyone knows about
2.  Less complicated (. means .)
3.  Allows implicit matching, if this is a good idea
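
[Illustration] To make the comparison concrete, here is a minimal sketch of how an extended glob could be translated into a regular expression. It is not the attached efnmatch.py and not rsync's matcher; glob_to_regex and glob_match are hypothetical names. It handles only '*', '?' and '**', leaving out the '??' form and the no-slash shorthand mentioned in the quoted text.

```python
import re


def glob_to_regex(pattern):
    """Translate an extended glob into an anchored regular expression string."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")     # '**' may cross directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # '*' stays inside one path component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")   # '?' is one character other than '/'
            i += 1
        else:
            # Every other character is literal, so '.' means '.'
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out) + r"\Z"


def glob_match(pattern, path):
    """Return True if the whole path matches the extended glob."""
    return re.match(glob_to_regex(pattern), path, re.DOTALL) is not None
```

Under these assumptions, glob_match("*.py", "efnmatch.py") succeeds while glob_match("*.py", "efnmatchxpy") does not, which is the "less complicated" point: the glob writer never has to escape the dot.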

The implicit including sounds like a good idea, and definitely would
be good to use with file lists.  It may be compatible with regular
expressions, or at least the most common ones; I'll have to think
about it.  Why doesn't rsync use that system?  Did they have a reason
or did it just turn out that way?
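
[Illustration] Assuming "implicit including" means that selecting a path also selects every directory leading to it, the idea could be sketched for a plain file list as follows. The function names are hypothetical and the paths are assumed to be absolute POSIX paths.

```python
import posixpath


def implied_directories(path):
    """Return every ancestor directory of an absolute path, outermost first."""
    parents = []
    current = posixpath.dirname(path.rstrip("/"))
    while current and current != "/":
        parents.append(current)
        current = posixpath.dirname(current)
    return parents[::-1]


def expand_file_list(paths):
    """Add the implied ancestor directories to an explicit file list."""
    selected = set()
    for path in paths:
        selected.update(implied_directories(path))
        selected.add(path.rstrip("/"))
    return sorted(selected)
```

With this, a list naming only /home/ben/Mail/inbox would also select /home, /home/ben and /home/ben/Mail, so no separate directory rules are required.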

  DB> I posted that I had rsync include/exclude list code available
  DB> for this. I think the rsync method is perfect for this, and see
  DB> no reason to re-invent something else.

I remember that you had some code on this topic, but I was assuming
that none of the possible schemes would be too hard to implement.  So
it seemed that we should figure out the Right Way of doing this, and
then worry whether there was any code that did this.

  DB> I have code to do all of the above. The extended unix-wildcard
  DB> "efnmatch.py" is complete and attached. The include/exclude list
  DB> matching and directory scanning code is complete, but is
  DB> different from rsync in that it takes the "don't require
  DB> directories to be included, don't implicitly include them"
  DB> approach. I was going to expand this to handle all of the
  DB> above before I posted them, but I thought I'd better post it now
  DB> before someone re-invents something worse :-). I'll post a
  DB> "Usage" blurb + help info to anyone that asks. I'll also update
  DB> it to do pretty much anything you want.

  DB> The future of this code is up in the air. I would like to
  DB> maintain and make them publicly available under GPL. I have a few
  DB> small Python projects on freshmeat that I maintain this way, but
  DB> this one is so small I'd feel embarrassed creating a project out
  DB> of it. Any suggestions as to the best way to support and
  DB> advertise this code are welcome :-)

Perhaps in some kind of Python repository?  (Vaults of
Parnassus(sp?)?)  I have never searched through any of them, but
something like this would probably be best distributed as a Python
module, so it should go wherever people look for Python modules.

Ben Escoto
