Xref: utzoo comp.lang.c++:1631 comp.unix.questions:9265 comp.sources.wanted:5067
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!uwmcsd1!marque!uunet!mcvax!hp4nl!philmds!leo
From: leo@philmds.UUCP (Leo de Wit)
Newsgroups: comp.lang.c++,comp.unix.questions,comp.sources.wanted
Subject: Re: "cut" needed to run CC
Summary: sed *IS* faster
Message-ID: <809@philmds.UUCP>
Date: 18 Sep 88 11:48:24 GMT
References: <990@acornrc.UUCP> <486@poseidon.UUCP> <911@riddle.UUCP>
Reply-To: leo@philmds.UUCP (Leo de Wit)
Organization: Philips I&E DTS Eindhoven
Lines: 49

In article <911@riddle.UUCP> domo@riddle.UUCP (Dominic Dunlop) writes:
    [lines deleted]...
>Quick hack fix: echo "How are you today" | awk -d" " '{print $1 " " $3}'
    [more lines deleted]...
>It is left as an exercise for the reader to perform the operation above
>using sed.  Clue: it ain't pretty...

Not so pretty as awk, but not as ugly as some nroff scripts I've seen 8-).
Timing a sed version, however, showed it to be almost 4 times as fast as
the awk version in user time (about 3.6 times in real time).
Consider:

------------ start here for awk version --------------
#!/bin/sh
# cut by awk

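# awk has no -d" " field-separator option (the flag is -F); the default
# splitting on whitespace already does what is wanted here.
exec awk '{print $1 " " $3}' "$@"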
------------ end   here --------------

and

------------ start here for sed version --------------
#!/bin/sh
# cut by sed

sp='  *'
wd='[^ ][^ ]*'

# Skip the second word, keep the third, drop the rest of the line.
exec sed "s/$sp$wd$sp\\($wd\\).*/ \\1/" "$@"
------------ end   here --------------
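
For anyone who wants to check that the two versions select the same words
before timing them, here is a minimal sketch. The function names cut_awk
and cut_sed are made up for illustration, and the awk call relies on
default whitespace splitting rather than on any separator flag. It has only
been reasoned through for the sample sentence from the quoted article.

```sh
#!/bin/sh
# Sketch only: cut_awk and cut_sed are illustrative names, not part of
# the scripts above.

# Words 1 and 3 using awk's default whitespace field splitting.
cut_awk() {
    awk '{print $1 " " $3}' "$@"
}

# Same selection with sed: match from the first blank onward, skip
# word 2, capture word 3, and discard the remainder of the line.
cut_sed() {
    sp='  *'
    wd='[^ ][^ ]*'
    sed "s/$sp$wd$sp\\($wd\\).*/ \\1/" "$@"
}

echo "How are you today" | cut_awk
echo "How are you today" | cut_sed
```

Both commands should print the first and third words of the sentence. Note
that the two are not equivalent on every input: on a line with fewer than
three words the sed substitution does not match and the line passes through
unchanged, while awk still prints the first word followed by a blank.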

On a source text (a News mailbox) of about 180 Kbytes this was the result:

awk-version
       44.1 real        41.4 user         1.8 sys
sed-version
       12.3 real        10.5 user         1.1 sys

Output was redirected to /dev/null; directing it to a file will probably
increase both real and sys time slightly.
If you are going to use it with a compiler, you'd better make sure it's
fast, considering how often, and on how large an input, it is going to be
run.  Maybe it's even worth considering writing a small C program for
it, although I doubt this would gain much over the sed version.

           Leo.

B.T.W. Although awk has builtin mechanisms for handling words ($1 etc.),
its speed is pretty disappointing here. Anyone care to try perl on this one?