Path: utzoo!attcan!uunet!husc6!rutgers!bellcore!clyde!watmath!watdcsu!dmcanzi
From: dmcanzi@watdcsu.waterloo.edu (David Canzi)
Newsgroups: comp.unix.wizards
Subject: Re: what should egrep '|root' /etc/passwd print?
Message-ID: <5060@watdcsu.waterloo.edu>
Date: 19 Sep 88 03:55:58 GMT
References: <44414@beno.seismo.CSS.GOV> <68203@sun.uucp> <8202@alice.UUCP> <410@quintus.UUCP> <8209@alice.UUCP>
Reply-To: dmcanzi@watdcsu.waterloo.edu (David Canzi)
Organization: U. of Waterloo, Ontario
Lines: 34

In article <8209@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>it sounds appealing to allow a missing RE to mean the empty string
>but i am unconvinced as to its utility.

If x, y, and z are regular expressions, then xyz matches those strings
which can be formed by concatenating any three strings X, Y, and Z
where x matches X, y matches Y, and z matches Z.  The expression 'x|y'
matches any string that is matched by x or y.

So, suppose y=''.  Let x='aa' and z='bb'.  Then xyz='aabb'.  'aa' is
the only string x matches, 'bb' is the only string z matches, and
'aabb' is the only string xyz matches.  The only thing left for y to
match is the null string between 'aa' and 'bb'.  Therefore, the null
string matches the null string.

Let x='' and y='root', so that x|y = '|root'.  Then x|y matches the
null string (because it matches x) and the string 'root' (because it
matches y).  So the egrep command in the subject line should print out
all of /etc/passwd, since every line has the null string on it.

This is intuitively obvious to me, but I tried to prove it because I'm
not sure other people's intuitions are similar to mine.

As for utility, consider the case, which I have actually run into,
where I wanted an expression like 'aa(|bb)cc' to match the strings
'aacc' and 'aabbcc'.  In this case, it's clear I want the expression in
parentheses to match the null string.  The program I was using wouldn't
let me do this, and I had to use something like 'a(a|abb)cc' to get
what I wanted.
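
For anyone who wants to poke at both claims, here is a rough sketch.
It assumes Python's re module, which happens to accept an empty
alternative; that is one engine's behaviour and says nothing about what
any particular egrep does.  The sample lines, the candidate strings,
and the helper names grep() and accepts() are made up for illustration.

```python
import re

# Hypothetical stand-ins for lines of /etc/passwd (not real data).
SAMPLE_LINES = [
    "root:x:0:0:root:/root:/bin/sh",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/sh",
    "",
]

# Hypothetical candidate strings for the 'aa(|bb)cc' comparison.
CANDIDATES = ["aacc", "aabbcc", "aabcc", "acc", "aabbbbcc"]


def grep(pattern, lines):
    """Return the lines in which pattern matches somewhere, egrep-style."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


def accepts(pattern, candidates):
    """Return the candidates that pattern matches from start to end."""
    regex = re.compile(pattern)
    return [s for s in candidates if regex.fullmatch(s)]


if __name__ == "__main__":
    # The empty alternative matches the null string, which every line has.
    print(grep("|root", SAMPLE_LINES))
    # The pattern with an empty alternative and the hand-rewritten one.
    print(accepts("aa(|bb)cc", CANDIDATES))
    print(accepts("a(a|abb)cc", CANDIDATES))
```

Under this engine '|root' selects every sample line, including the
empty one, and both 'aa(|bb)cc' and 'a(a|abb)cc' accept exactly 'aacc'
and 'aabbcc' from the candidate list.
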
If I had had a program generate that expression, I would have had to
add code to detect this special case and rewrite the regular
expression.  Yecch.

--
David Canzi