Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!mailrus!cornell!blandy
From: blandy@marduk.cs.cornell.edu (Jim Blandy)
Newsgroups: comp.editors
Subject: Re: pattern matches
Message-ID: <18838@cornell.UUCP>
Date: 5 Jul 88 12:35:52 GMT
References: <427@grand.UUCP> <37200009@m.cs.uiuc.edu> <7618@watdragon.waterloo.edu> <11070@sol.ARPA>
Sender: nobody@cornell.UUCP
Reply-To: blandy@cs.cornell.edu (Jim Blandy)
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 41

I think you guys are missing something important about regular expressions
and the way they're matched.  The rules are as follows:

	1) If it is AT ALL possible for the pattern to match, it will.
	   This means that a pattern like e*eeee will always match a
	   string consisting of at least four e's; there's no danger of
	   the * operator "overstepping its bounds."
	2) If one is searching for a pattern match in a large string
	   (a text buffer, for example), then we are faced with
	   the question "HOW MUCH did it match?"  (This question is
	   especially important in a search-and-replace situation:
	   what are we to replace?)  That's where the "longest possible"
	   rule comes into play.
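
A quick way to see both rules at work is the sketch below.  It uses
Python's re module purely as a stand-in for an editor's matcher (an
assumption on my part: re is a backtracking engine that prefers the
first alternative it finds rather than strictly the longest one, but
for this pattern the two agree).  The helper names are made up for
illustration.

```python
import re

# The pattern from rule 1: any number of e's, then four more e's.
PATTERN = re.compile(r"e*eeee")

def whole_string_matches(count):
    """Rule 1: does a string of `count` e's match the whole pattern?"""
    return PATTERN.fullmatch("e" * count) is not None

def matched_span(text):
    """Rule 2: how much of `text` did the first match cover?"""
    found = PATTERN.search(text)
    return found.span() if found else None

if __name__ == "__main__":
    # The * gives back e's as needed, so four or more e's always match.
    for count in range(7):
        print(count, whole_string_matches(count))
    # In a larger buffer the match stretches over the whole run of e's.
    print(matched_span("xx eeeeee yy"))
```

The * first grabs every e it can and then gives back just enough for
the trailing eeee to succeed, so the match neither fails nor comes up
short.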

So the "longest possible" rule will never mess up a match that would
otherwise be successful; it only settles how much text a successful
match covers.

I do think negation is well-defined; using the proposed syntax, (pat)^
matches exactly those strings that pat would not.  Since the set of
strings matched by pat is (presumably) well-defined, the set for (pat)^
is too.
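
As a definition, that is just set complement.  A minimal sketch, again
borrowing Python's re module as the matcher; the function name is
hypothetical, and the (pat)^ syntax itself is not something re
supports:

```python
import re

def matches_negation(pat, text):
    """True exactly when `pat` does NOT match the whole of `text`."""
    return re.fullmatch(pat, text) is None

if __name__ == "__main__":
    print(matches_negation(r"ab*", "abb"))  # pat matches, so (pat)^ does not
    print(matches_negation(r"ab*", "ba"))   # pat fails, so (pat)^ matches
```

This only shows that the meaning is unambiguous.  It runs the matcher
and inverts the answer from outside, which says nothing about how to
build negation into the pattern language itself.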

About the claim that "negation should be trivial, since it only entails
flipping the accept/reject-ingness of the states in the automaton...":

If you're using a DFA (deterministic finite automaton) to match your
patterns, this is true.  No problem.  But that trick does not work
for NFAs (non-deterministic finite automata).  An NFA could be in any one
of several states after a certain input, some accepting, some rejecting;
the rule is, if any of the possible states is an accepting state, the
NFA accepts its input, i.e. matches the pattern.  Okay, so now flip
all your states.  After the same input the NFA can still be in some
rejecting states and some accepting states, so the flipped machine can
accept a string the original also accepted; you're not excluding
everything you matched before.  You want to say "anything this matches,
I don't," and that's not what happens.
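
Here is a small sketch of the flipping argument above.  Everything in
it is a made-up toy: a two-state NFA over {a, b} that accepts strings
ending in "a", simulated by tracking the set of states it could be in.
It then builds the equivalent DFA by the standard subset construction
and flips that instead.

```python
from itertools import product

# Hypothetical toy NFA: accepts strings over {a, b} that end in "a".
# State 0 loops on every character; reading "a" may also move to state 1.
ALPHABET = "ab"
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}}
START, STATES, ACCEPTING = 0, {0, 1}, {1}
FLIPPED = STATES - ACCEPTING  # accept/reject swapped on the NFA itself

def step(states, ch):
    """All NFA states reachable from `states` by reading `ch`."""
    return frozenset(t for s in states for t in DELTA.get((s, ch), ()))

def nfa_accepts(accepting, text):
    """Simulate the NFA: accept if ANY possible state is accepting."""
    current = frozenset({START})
    for ch in text:
        current = step(current, ch)
    return bool(current & accepting)

def determinize():
    """Subset construction: each DFA state is a set of NFA states."""
    start = frozenset({START})
    table, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for ch in ALPHABET:
            target = step(subset, ch)
            table[(subset, ch)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return table, start, seen

def dfa_accepts(table, start, accepting_subsets, text):
    """Run the DFA: exactly one state at a time, so flipping is safe."""
    state = start
    for ch in text:
        state = table[(state, ch)]
    return state in accepting_subsets

def all_strings(max_len):
    """Every string over ALPHABET up to length `max_len`."""
    for n in range(max_len + 1):
        for chars in product(ALPHABET, repeat=n):
            yield "".join(chars)

TABLE, DFA_START, DFA_STATES = determinize()
DFA_ACCEPTING = {s for s in DFA_STATES if s & ACCEPTING}
DFA_FLIPPED = DFA_STATES - DFA_ACCEPTING

# Strings accepted by BOTH the NFA and its flipped version.
BOTH = [w for w in all_strings(3)
        if nfa_accepts(ACCEPTING, w) and nfa_accepts(FLIPPED, w)]

if __name__ == "__main__":
    print("accepted by the NFA and by the flipped NFA:", BOTH)
```

After reading "a" the toy NFA could be in state 0 or state 1, so the
original accepts (state 1) and the flipped version accepts too (state
0).  Flipping the DFA built from it does give the true complement, but
the subset construction can need exponentially many states in the
worst case, which is part of why this is not free.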

Well, so what?  Well, most pattern-matching functions in editors simulate
an NFA.  All (I think?) the Unix editors do this.  So negation is not
a trivial thing.  I can't think of any simple way to do it off the bat,
but that's just me; maybe someone else?
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatisit."  - archie