Path: utzoo!attcan!uunet!lll-winken!lll-tis!ames!elroy!jpl-devvax!lwall
From: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Newsgroups: comp.unix.wizards
Subject: Re: what should egrep '|root' /etc/passwd print?
Message-ID: <2894@jpl-devvax.JPL.NASA.GOV>
Date: 19 Sep 88 22:49:58 GMT
References: <44414@beno.seismo.CSS.GOV> <68203@sun.uucp> <8202@alice.UUCP> <410@quintus.UUCP> <8209@alice.UUCP> <5060@watdcsu.waterloo.edu>
Reply-To: lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall)
Organization: Jet Propulsion Laboratory, Pasadena, CA.
Lines: 32

In article <5060@watdcsu.waterloo.edu> dmcanzi@watdcsu.waterloo.edu (David Canzi) writes:
: As for utility, consider the case, which I have actually run into,
: where I wanted an expression like 'aa(|bb)cc' to match the strings
: 'aacc' and 'aabbcc'.  In this case, it's clear I want the expression
: in parentheses to match the null string.  The program I was using
: wouldn't let me do this, and I had to use something like 'a(a|abb)cc'
: to get what I wanted.  If I had had a program generate that expression,
: I would have had to add code to detect this special case and rewrite
: the regular expression.  Yecch.

Interestingly enough, in Henry Spencer's regexp routines (which I borrowed
for perl), if you say /aa(bb)?cc/, it gets translated internally to
the equivalent /aa(bb|)cc/.
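
For anyone who wants to poke at that equivalence, here is a small sketch. It uses Python's re module as a stand-in, which is an assumption of convenience: it is not Henry Spencer's code and says nothing about how any engine rewrites the pattern internally. It only checks that the three spellings accept the same strings. The sample list is made up.

```python
import re

# Three spellings of "aa, then optionally bb, then cc".
PATTERNS = [r"aa(bb)?cc", r"aa(bb|)cc", r"aa(|bb)cc"]

# Hypothetical test strings; the first two are the ones David wanted.
SAMPLES = ["aacc", "aabbcc", "aabcc", "aabbbbcc", "cc"]

def verdicts(pattern):
    """Return, for each sample, whether the pattern matches it in full."""
    return [bool(re.fullmatch(pattern, s)) for s in SAMPLES]

results = {p: verdicts(p) for p in PATTERNS}

for pattern, outcome in results.items():
    print(pattern, outcome)
```

All three patterns should give the same list of verdicts, with only 'aacc' and 'aabbcc' accepted.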

The null pattern should match anything because the whole idea of regular
expressions involves rejecting strings that you can't match.  To match /abc/,
you say "For each of the next N characters, bomb out if it doesn't match.
Otherwise it matches."  You don't go and change the rules just because N
happens to be 0 sometimes.
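
The rule above can be sketched as a literal matcher. This is only an illustration of the "reject on mismatch, otherwise accept" idea for a fixed string; matches_at is a hypothetical name, and no real regexp package is claimed to look like this.

```python
def matches_at(pattern, text, start=0):
    """Literal match of pattern against text beginning at index start.

    For each of the N pattern characters, bomb out if it doesn't match.
    If nothing objects, it matches.  N == 0 needs no special case.
    """
    for i, ch in enumerate(pattern):
        pos = start + i
        if pos >= len(text) or text[pos] != ch:
            return False  # bomb out on the first mismatch
    return True  # no character objected


print(matches_at("abc", "abcd"))   # True
print(matches_at("abc", "abd"))    # False
print(matches_at("", "anything"))  # True: the loop body never runs
print(matches_at("", ""))          # True
```

Note that there is no "if the pattern is empty" test anywhere. The empty pattern matches simply because the loop has nothing to reject.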

If you DO change the rules on boundary conditions, people who write program
generators will hate you forever, as David mentioned.  I know, I've been
there.  "Whaddya mean, I can't declare an array of size 0?"

Or look at it another way.  As the pattern gets shorter and shorter,
it matches more and more things.  When it gets as short as it can,
it ought to match as many things as it can, by the Principle of Least
Surprise.
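
A quick way to watch this happen, again with Python's re module as a stand-in and a made-up word list: chop characters off the end of the pattern and count how many words it still finds a match in.

```python
import re

# Hypothetical word list, including one empty line.
WORDS = ["root", "roost", "rot", "toor", "daemon", ""]

def count_matches(pattern):
    """Count the words that contain a match for pattern anywhere."""
    return sum(1 for w in WORDS if re.search(pattern, w))

# Patterns "root", "roo", "ro", "r", "" in that order.
patterns = ["root"[:n] for n in range(4, -1, -1)]
counts = [count_matches(p) for p in patterns]

for p, c in zip(patterns, counts):
    print(repr(p), c)
```

The counts never go down as the pattern shrinks, and the empty pattern at the end matches every word in the list, the empty one included.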

Let's hear it for intuitionalization.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov