Path: utzoo!utgpu!water!watmath!clyde!att!rutgers!mailrus!cornell!blandy
From: blandy@marduk.cs.cornell.edu (Jim Blandy)
Newsgroups: comp.editors
Subject: Re: pattern matches
Message-ID: <18838@cornell.UUCP>
Date: 5 Jul 88 12:35:52 GMT
References: <427@grand.UUCP> <37200009@m.cs.uiuc.edu> <7618@watdragon.waterloo.edu> <11070@sol.ARPA>
Sender: nobody@cornell.UUCP
Reply-To: blandy@cs.cornell.edu (Jim Blandy)
Organization: Cornell Univ. CS Dept, Ithaca NY
Lines: 41

I think you guys are missing something important about regular
expressions and the way they're matched. The rule is as follows:

1) If it is AT ALL possible for the pattern to match, it will.
   This means that a pattern like e*eeee will always match a string
   consisting of at least four e's; there's no danger of the *
   operator "overstepping its bounds."

2) If one is searching for a pattern match in a large string (like,
   for example, a text buffer), then we are faced with the question
   "HOW MUCH did it match?"  (This question is especially important
   in a search-and-replace situation; what are we to replace?)
   That's where the "longest possible" rule comes into play.

So the "longest possible" rule will never mess up a match that would
otherwise be successful...

I do think negation is well-defined; using the proposed syntax, (pat)^
matches any string pat would not.  Since the set of strings matched by
pat is (presumably) well-defined, the set for (pat)^ is too.

About the claim that "negation should be trivial, since it only
entails flipping the accept/reject-ingness of the states in the
automaton...":

If you're using a DFA (deterministic finite automaton) to match your
patterns, this is true.  No problem.

But that trick does not work for NFA's (non-deterministic FA).  An NFA
could be in any one of several states after a certain input, some
accepting, some rejecting; the rule is, if any of the possible states
are accepting states, the NFA accepts its input, i.e. matches the
pattern.
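To make the DFA/NFA difference concrete, here is a minimal sketch in
Python.  It is not taken from any editor's source.  The toy NFA (strings
over a and b that end in "a") and every name in it (NFA_DELTA,
nfa_accepts, dfa_complement_accepts, flipped) are made up for
illustration.  It runs the NFA by tracking the set of possible states,
then compares naively flipping the accepting states against flipping the
answer of the equivalent deterministic machine.

```python
# Toy NFA over {a, b} accepting strings that end in "a".
# The states, transitions and names are illustrative assumptions.
NFA_DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}}
NFA_START = {0}
NFA_ACCEPT = {1}
ALL_STATES = {0, 1}

def step(states, ch):
    """All states reachable from any state in `states` on character `ch`."""
    nxt = set()
    for s in states:
        nxt |= NFA_DELTA.get((s, ch), set())
    return nxt

def nfa_accepts(text, accept):
    """NFA rule: accept if ANY of the possible current states is accepting."""
    states = set(NFA_START)
    for ch in text:
        states = step(states, ch)
    return bool(states & accept)

def dfa_complement_accepts(text):
    """Treat the whole set of possible states as a single DFA state
    (a lazy subset construction), then flip the answer: accept only
    if NONE of the possible states was accepting."""
    states = set(NFA_START)
    for ch in text:
        states = step(states, ch)
    return not (states & NFA_ACCEPT)

# Naive "negation": swap accepting and rejecting states of the NFA itself.
flipped = ALL_STATES - NFA_ACCEPT

for text in ["", "a", "b", "ab", "ba"]:
    print(repr(text),
          nfa_accepts(text, NFA_ACCEPT),   # original NFA
          nfa_accepts(text, flipped),      # NFA with states flipped
          dfa_complement_accepts(text))    # flipped deterministic machine
```

In this toy case the flipped NFA still accepts "a" and "ba", which the
original also accepts, because state 0 is always among the possible
states.  Only the deterministic version rejects them.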
Okay, so now flip all your states; the NFA could be in some rejecting
states and some accepting states; you're not excluding everything you
matched before.  You want to say "anything this matches, I don't," and
that's not what happens.

Well, so what?  Well, most pattern-matching functions in editors
simulate an NFA.  All (I think?) the Unix editors do this.  So negation
is not a trivial thing.  I can't think of any simple way to do it off
the bat, but that's just me; maybe someone else?
--
Jim Blandy - blandy@crnlcs.bitnet
"insects were insects when man was just a burbling whatisit." - archie