Xref: utzoo comp.unix.wizards:8983 comp.unix.questions:7331
Path: utzoo!attcan!uunet!mcvax!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.unix.wizards,comp.unix.questions
Subject: Re: grep replacement
Message-ID: <488@philmds.UUCP>
Date: 31 May 88 10:42:03 GMT
References: <7882@alice.UUCP> <5630@umn-cs.cs.umn.edu> <6866@elroy.Jpl.Nasa.Gov> <2312@bgsuvax.UUCP> <292@ncar.ucar.edu>
Reply-To: leo@philmds.UUCP (L.J.M. de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 177

In article <292@ncar.ucar.edu> russ@groucho.UCAR.EDU (Russ Rew) writes:
>I also recently had a need for printing multi-line "records" in which a
>specified pattern appeared somewhere in the record.  The following
>short csh script uses the awk capability to treat whole lines as fields
>and empty lines as record separators to print all the records from
>standard input that contain a line matching a regular specified as an
>argument:
>
>#!/bin/csh -f
>awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
>
>

Awk is a nice solution, but sed is a much faster one. I've been following the 'grep' discussion for some time now, and have seen much demand for features that are simply within sed. Here are some; I have left the discussion about the function of this or that sed-command out: there is a sed article and a man page...

Patrick Powell writes:
>The other facility is to find multiple line patterns, as in:
>find the pair of lines that have pattern1 in the first line
>pattern2 in the second, etc.

Try this one:

sed -n -e '/PATTERN1/,/PATTERN2/p' file

It prints all lines between PATTERN1 and PATTERN2 matches. Of course you can have subcommands to do special things (with '{' I mean).

Alan (..!cit-vax!elroy!alan) writes:
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

There is.
Try this one:

sed -n -e '
/PATTERN/{
x
p
x
p
n
p
}
h' file

It prints the line before, the line containing the PATTERN, and the line after. Of course you can make the output fancier and the number of lines printed larger.

David Connet writes:
>>
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines.  I have wanted
>>to do this many times and there is no good way.
>Also, what line number it was found on.

Sed can also handle this one:

sed -n -e '/PATTERN/=' file

Lloyd Zusman writes:
>Or another way to get this functionality would be for this new greplike
>thing to allow matches on the newline character.  For example:
>    ^.*foo\nbar.*$
>          ^^
>          newline

Sed can match on embedded newline characters in the substitute command (it is indeed \n here!). The trailing newline is matched by $.

Barry Shein writes [story about relative addressing]:
>I dunno, food for thought, like I said, maybe there's a generalization
>here somewhere. Or maybe grep should just emit line numbers in a form
>which could be post-processed by sed for fancier output (grep in
>backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think that most of the time you want to see the context, it is in interactive use. In that case you can write a simple sed-script that does just what is needed, i.e. display the [/pattern/-N] through [/pattern/+N] lines, where N is a constant. The example I gave for N == 1 can be extended for larger N, with fancy output etc.

Bill Wyatt writes:
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

Much simpler, and faster:

sed -n -e '/PATTERN/{
p
q
}' file

Sed quits immediately after finding the first match. You could even create an alias for something like that.

Michael Morrell writes:
>>Also, what line number it was found on.
>grep -n does this, but I'd like to see an option which ONLY prints the line
>numbers where the pattern was found.

The sed trick does this:

sed -n -e '/PATTERN/=' file

Or you could even:

sed -n -e '/PATTERN/{
=
q
}' file

which prints the first matched line number and exits.

Roy Smith writes:
>wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>[as a way to get just the first occurance of pattern]
>> grep '(your_pattern_here)' | head -1
>	Yes, it'll certainly work, but I think it bypasses the original
>intention; to save CPU time.  If I had a 1000 line file with pattern on
>line 7, I want grep to read the first 7 lines, print out line 7, and exit.
>grep|head, on the other hand, will read and search all 1000 lines of the
>file; it won't exit (with a EPIPE) until it writes another line to stdout
>and finds that head has already exited.  In fact, if grep block-buffers its
>output, it may never do more than a single write(2) and never notice that
>head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, who needs head:

sed 10q

does the job, as you can find in the book by Kernighan and Pike; I believe the title is 'The UNIX Programming Environment'.

Stan Brown writes:
>	Along this same general line it would be nice to be abble to
>	look for paterns that span lines. But perhaps this would be
>	tom complete a change in the philosophy of grep ?

As I mentioned before, embedded newlines can be matched by sed in the substitute command.

What I also often see is things like

grep 'pattern' file | sed 'expression'

A pity that a lot of people don't know that sed can do the pattern matching itself.

S. E. D. (Sic Erat Demonstrandum)

As far as options for a new grep are concerned, I suggest using the options proposed (and no more). Let other tools handle other problems - that's in the Un*x spirit.
What I would appreciate most in a new grep is: no more grep, egrep, fgrep, just one tool that can be both fast (for fixed strings) and elaborate (for pattern matching like egrep). The 'bm' tool that was on the net (author Peter Bain) is very fast for fixed strings, using the Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?

	Leo.
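
To make the embedded-newline remark above concrete, here is a minimal sketch, assuming a POSIX-style sed in which \n inside a pattern matches a newline in the pattern space. The file name sample.txt and its contents are made up for the illustration, and using N and D to keep two lines in the pattern space is only one common way to do it, not necessarily the script that was meant above.

```shell
# sample.txt is a hypothetical file, created here only for the illustration.
printf '%s\n' 'alpha' 'xx foo' 'bar yy' 'omega' > sample.txt

# $!N          on every line except the last, append the next input line, so
#              the pattern space holds two lines joined by an embedded newline
# /foo\nbar/p  print the pair when "foo" ends one line and "bar" starts the next
# D            delete the first of the two lines and restart the cycle with
#              the remainder, giving a sliding two-line window over the input
out=$(sed -n -e '$!N' -e '/foo\nbar/p' -e 'D' sample.txt)

# Prints the two lines "xx foo" and "bar yy".
printf '%s\n' "$out"

rm -f sample.txt
```

Replacing the p command with a substitution such as s/foo\nbar/foo bar/ (and printing with P before D, without -n) would join the two lines instead of just reporting them.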