regexp support

Donovan Baarda
Wed, 10 Apr 2002 19:16:51 +1000

On Wed, Apr 10, 2002 at 10:09:47AM +0200, Louis-David Mitterrand wrote:
> On Tue, Apr 09, 2002 at 10:02:32AM -0700, Ben Escoto wrote:
> >   LDM> '\.exe$' will not exclude file.exe even though I did not give
> >   LDM> any start anchor, one has to write '.*\.exe$' which seems a
> >   LDM> bit counter-intuitive as regexps should match partially.
> > 
> > Hmm, I would have thought the opposite, that regular expressions
> > should only match from the beginning.
> That's what anchors are for. If you want to match from the beginning
> then anchor the expression with '^'. At least that's the way it works in
> Perl (sorry for mentioning that language ;-).

In Python, "match"-type regex comparisons must match from the beginning;
"search" comparisons are the ones to use for what you want. I personally think that
Python's "match" is a waste of space, as search with explicit anchors is more
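
As a minimal sketch of the difference, using the pattern and file name from
this thread (nothing here is rdiff-backup code):

```python
import re

pattern = re.compile(r"\.exe$")

# match() only succeeds if the pattern matches starting at position 0.
print(pattern.match("file.exe"))             # None
# search() scans the whole string, so the unanchored pattern is found.
print(bool(pattern.search("file.exe")))      # True
# An explicit '^' anchor makes search() behave like match().
print(re.search(r"^\.exe$", "file.exe"))     # None
```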

> > If the regular expression starts with (?i) it will be interpreted
> > case-insensitively.  So '(?i).*\.(exe|sys)$' may be what you want.
> Great, I hadn't tried that. Apparently the Python regexp engine is
> totally Perl-compatible (heh!).
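
For what it's worth, a quick way to check the (?i) behaviour in plain Python;
the file names are made up for illustration:

```python
import re

# Pattern quoted above; the leading (?i) turns on case-insensitive matching.
pattern = re.compile(r"(?i).*\.(exe|sys)$")

for name in ["SETUP.EXE", "driver.Sys", "notes.txt"]:
    # Prints whether each name would be caught by the pattern.
    print(name, bool(pattern.match(name)))
```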

I think they both use a common library these days...

> I've just read the thread on --exclude --include syntax plans for
> rdiff-backup. Please don't follow the rsync route by using their
> perverted shell globbing and rules. Much too complicated and inelegant
> IMHO. Regular expressions are the way to go, please don't deviate from
> your initial implementation (save the anchoring), everyone should know
> and use regular expressions. The only rule should be that --include
> overrides --exclude as in rsync.

I agree regexes are great; you can do just about anything with them,
particularly with the Perl/Python extended regexes. The entire --include
--exclude list functionality can be provided by a single regex --exclude:
there is no --include/--exclude wildcard list that can't be implemented by a
single regex --exclude. In fact I'm pretty sure I could implement the
--include --exclude wildcard stuff (inefficiently) by getting it to generate
and compile a single huge regex.
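
A rough sketch of that idea, assuming the "--include overrides --exclude"
rule suggested above. build_exclude_regex is a hypothetical helper, not
anything in rdiff-backup, and it leans on Python's fnmatch.translate to turn
each wildcard into a regex. I have kept to one '*' per wildcard; patterns
with several may need more care when joined into one expression.

```python
import fnmatch
import re

def build_exclude_regex(includes, excludes):
    """Compile one regex that matches names to exclude (hypothetical helper)."""
    def alternation(globs):
        # fnmatch.translate turns a shell wildcard into an anchored regex string.
        return "|".join(fnmatch.translate(g) for g in globs)

    if not excludes:
        return re.compile(r"(?!)")  # empty negative lookahead: never matches
    # A negative lookahead vetoes any name that an include wildcard matches.
    guard = "(?!(?:%s))" % alternation(includes) if includes else ""
    return re.compile(guard + "(?:%s)" % alternation(excludes))

exclude = build_exclude_regex(includes=["keep*.exe"], excludes=["*.exe", "*.sys"])
for name in ["setup.exe", "keepme.exe", "driver.sys", "notes.txt"]:
    print(name, "excluded" if exclude.match(name) else "kept")
```

The generated pattern is not something you would want to write by hand,
which is rather the point.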

However, they can become quite complicated for end-users wanting to just
select files. They have a syntax that was not "tuned" for filename matching,
and hence common filename chars like '.' have a special meaning and must be
escaped. Most of my more elaborate regex work has required writing scripts
to assemble the complex regex, rather than hand-coding a 1000+ char regex.
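
To illustrate the '.' problem with a made-up file name: an unescaped dot
matches any character, and re.escape is the usual way to quote a literal
name when a script assembles the regex.

```python
import re

name = "notes.txt"
# Unescaped, '.' matches any character, so an unrelated name slips through.
print(bool(re.search(r"^notes.txt$", "notesXtxt")))        # True
# re.escape() quotes regex metacharacters so the name is matched literally.
literal = re.escape(name)
print(bool(re.search("^" + literal + "$", "notesXtxt")))   # False
print(bool(re.search("^" + literal + "$", "notes.txt")))   # True
```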

I think an application specific simple syntax can be more "handy" than a
general purpose powerful syntax for a simple application. I think the rsync
extended wildcards are a good match to this application.

just my 2c :-)

ABO: finger for more info, including pgp key