Expanding the include/exclude options
Fri, 29 Mar 2002 12:31:43 -0800
>>>>> "DB" == Donovan Baarda <firstname.lastname@example.org>
>>>>> wrote the following on Fri, 29 Mar 2002 21:21:44 +1100
DB> Someone suggested forgetting about include/exclude lists, and
DB> just using a list of files from something like 'find'. This is
DB> fine, except when you want to selectively restore files, you
DB> can't use 'find' to search through a backed-up file list, so
DB> your "select for backup" tool ends up different to your "select
DB> for restore" tool. This can be a pain.
Yes, excellent point.
DB> rsync's extended unix-wildcard syntax is nice; directories end
DB> in '/', '*' and '?' match anything except '/', '**' and '??' match
DB> anything including '/'. rsync has taken the "require
DB> directories to be explicitly included" approach which means you
DB> need to do things like "--include /home/ --include /home/*/
DB> --include /home/*/Mail/ --include /home/*/Mail/** --exclude **"
DB> to get what you really wanted from the above example. It also
DB> allows a sort of shorthand where anything without a '/' matches a
DB> filename with any directory prefix.
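To make the wildcard rules above concrete, here is a minimal sketch in Python that translates such a pattern into a regular expression. It is not rsync's code and not efnmatch.py; the names translate() and ematch() are made up for illustration, and only '*', '?' and '**' are handled ('??' and character classes are left out).

```python
import re

def translate(pattern):
    """Translate an extended wildcard pattern into a regex string (sketch)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # '**' may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # '*' stops at '/'
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")     # '?' is one character other than '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    body = "".join(out)
    if "/" not in pattern:
        # Shorthand: a pattern without '/' matches under any directory prefix.
        body = "(?:.*/)?" + body
    return body

def ematch(path, pattern):
    """Return True if the whole path matches the pattern."""
    return re.fullmatch(translate(pattern), path, re.DOTALL) is not None

print(ematch("/home/bob/Mail/", "/home/*/Mail/"))           # directory, one level
print(ematch("/home/bob/Mail/inbox/1", "/home/*/Mail/**"))  # anything below Mail/
print(ematch("/home/bob/notes.txt", "*.txt"))               # no '/' in pattern
```

Directories are assumed to be written with a trailing '/', as in the rsync convention described above, so the translation treats '/' as an ordinary literal character.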
Ok, so one question is whether to use Python/Perl-style regular
expressions or (extended?) shell globbing. In favor of regular expressions:
1. More flexible
2. People may already know the entire syntax
3. Backwards compatibility (--exclude already uses them)
And in favor of extended shell globbing:
1. Superset (?) of normal globbing, which everyone knows about
2. Less complicated (. means .)
3. Allows implicit matching, if this is a good idea
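The ". means ." point can be seen with a small sketch using Python's standard fnmatch and re modules. The file names are made up for illustration.

```python
import fnmatch
import re

names = ["notes.txt", "notes-txt", "notesxtxt"]

# Glob: '.' is a literal dot, so only the exact name matches.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "notes.txt")]

# Regex: an unescaped '.' matches any character, so all three names match.
regex_hits = [n for n in names if re.fullmatch("notes.txt", n)]

# Regex with the dot escaped behaves like the glob again.
escaped_hits = [n for n in names if re.fullmatch(re.escape("notes.txt"), n)]

print(glob_hits)
print(regex_hits)
print(escaped_hits)
```

So a user who types a plain file name as an exclude pattern gets what they expect with globbing, but has to remember to escape the dot with regular expressions.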
Implicit inclusion sounds like a good idea, and would definitely be
good to use with file lists. It may be compatible with regular
expressions, or at least the most common ones, I'll have to think
about it. Why doesn't rsync use that system? Did they have a reason
or did it just turn out that way?
DB> I posted that I had rsync include/exclude list code available
DB> for this. I think the rsync method is perfect for this, and see
DB> no reason to re-invent something else.
I remember that you had some code on this topic, but I was assuming
that none of the possible schemes would be too hard to implement. So
it seemed that we should figure out the Right Way of doing this, and
then worry whether there was any code that did this.
DB> I have code to do all of the above. The extended unix-wildcard
DB> "efnmatch.py" is complete and attached. The include/exclude list
DB> matching and directory scanning code is complete, but is
DB> different from rsync in that it takes the "don't require
DB> directories to be included, don't implicitly include them"
DB> approach. I was going to expand this to handle all of the
DB> above before I posted them, but I thought I'd better post it now
DB> before someone re-invents something worse :-). I'll post a
DB> "Usage" blurb + help info to anyone that asks. I'll also update
DB> it to do pretty much anything you want.
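One possible reading of the "don't require directories to be included" approach, sketched in Python. This is a guess for illustration, not the attached code: select() and the (include, pattern) rule format are hypothetical, the first matching rule is assumed to win, and the standard fnmatch module stands in for the extended matcher (so '*' crosses '/' here).

```python
import fnmatch

def select(paths, rules, default=True):
    """Return the paths chosen by an ordered list of (include, pattern) rules."""
    chosen = []
    for path in paths:
        verdict = default            # used when no rule matches
        for include, pattern in rules:
            if fnmatch.fnmatchcase(path, pattern):
                verdict = include    # first matching rule decides
                break
        if verdict:
            chosen.append(path)
    return chosen

# Only file paths are tested, so no rules are needed for /home/ or /home/*/.
rules = [(True, "/home/*/Mail/*"), (False, "*")]
paths = ["/home/bob/Mail/inbox", "/home/bob/tmp/junk", "/etc/passwd"]
selected = select(paths, rules)
print(selected)
```

Because select() only needs a list of path strings, the same rules could be applied to a live directory scan at backup time and to a stored file list at restore time.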
DB> The future of this code is up in the air. I would like to
DB> maintain it and make it publicly available under GPL. I have a few
DB> small Python projects on freshmeat that I maintain this way, but
DB> this one is so small I'd feel embarrassed creating a project out
DB> of it. Any suggestions as to the best way to support and
DB> advertise this code are welcome :-)
Perhaps in some kind of python repository? (Vaults of
Parnassus(sp?)?) I never searched through any of them, but it seems
something like this would probably best be distributed as a python
module, so should be wherever it is people look for python modules.