Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Posting-Version: version B 2.10.2 9/5/84; site zaphod.UUCP
Path: utzoo!watmath!clyde!burl!ulysses!mhuxr!mhuxt!houxm!ihnp4!alberta!sask!zaphod!bobd
From: bobd@zaphod.UUCP (Bob Dalgleish)
Newsgroups: net.unix,net.bugs.usg
Subject: Re: fgrep (isn't)
Message-ID: <293@zaphod.UUCP>
Date: Thu, 11-Jul-85 14:08:06 EDT
Article-I.D.: zaphod.293
Posted: Thu Jul 11 14:08:06 1985
Date-Received: Wed, 17-Jul-85 04:21:23 EDT
References: <495@unisoft.UUCP>
Reply-To: bobd@zaphod.UUCP (Bob Dalgleish)
Distribution: net
Organization: Develcon Electronics, Saskatoon, SK
Lines: 43
Xref: watmath net.unix:5036 net.bugs.usg:255
Summary:

In article <495@unisoft.UUCP> fnf@unisoft.UUCP writes:
>After grabbing the bgrep distribution off of mod.sources recently
>I decided to try a quick test of the various grep's on one of our
>System V Release 2 ports:
>
>	Trial 1:	*grep FOOBARBLETCH /etc/termcap
>	Trial 2:	*grep BARF /etc/termcap
>	Trial 3:	*grep MI /etc/termcap
>
>	[Table depicting /bin/time stats for "benchmarks"]
>Notice that plain old grep is the fastest of all, and fgrep is the slowest!
>
>Fred Fish UniSoft Systems Inc, 739 Allston Way, Berkeley, CA 94710 USA

The Boyer-Moore pattern matching algorithm is slower than a naive
pattern matching algorithm in many cases (including all of the cases
in the "benchmark").  It uses a lookahead table, built from the
pattern, to decide how far to advance the pattern against the subject
string.  With a one- or two-character pattern, the overhead of that
machinery (both the setup and the run time) greatly exceeds the cost
of the matching itself.  The BM algorithm works best on longer
patterns, *especially* patterns that resemble the subject string,
i.e., the front part of the pattern occurs in the subject string, with
variations in the rest of the pattern.
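To make the setup-versus-skip trade-off concrete, here is a minimal
sketch in C, assuming Horspool's simplification of Boyer-Moore (only
the "bad character" skip table; the full algorithm's second, good-suffix
table, which handles shifts after a partial match, is omitted).  The
function names and the comparison counter are hypothetical, chosen for
illustration; this is not the code in bgrep or in System V fgrep.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Naive search: slide the pattern one position at a time.
   Returns the index of the first match, or -1.  *cmps counts
   character comparisons.  There is no setup cost at all. */
long naive_search(const char *text, size_t n, const char *pat, size_t m,
                  unsigned long *cmps)
{
    assert(m > 0);
    *cmps = 0;
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            ++*cmps;
            if (text[i + j] != pat[j])
                break;
            j++;
        }
        if (j == m)
            return (long)i;
    }
    return -1;
}

/* Boyer-Moore-Horspool: compare right to left, then shift by a
   precomputed amount.  Building the table touches all UCHAR_MAX+1
   entries no matter how short the pattern is; that fixed cost is
   not counted in *cmps. */
long horspool_search(const char *text, size_t n, const char *pat, size_t m,
                     unsigned long *cmps)
{
    size_t skip[UCHAR_MAX + 1];

    assert(m > 0);
    *cmps = 0;
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        skip[c] = m;                  /* char not in pattern: jump past it */
    for (size_t k = 0; k + 1 < m; k++)
        skip[(unsigned char)pat[k]] = m - 1 - k;  /* distance to last char */

    size_t i = 0;
    while (i + m <= n) {
        size_t j = m;
        while (j > 0) {               /* right-to-left comparison */
            ++*cmps;
            if (text[i + j - 1] != pat[j - 1])
                break;
            j--;
        }
        if (j == 0)
            return (long)i;
        /* Shift by the entry for the text char under the pattern's end. */
        i += skip[(unsigned char)text[i + m - 1]];
    }
    return -1;
}
```

On "the quick brown fox jumps over the lazy dog", searching for "lazy"
takes 39 comparisons naively but 13 with the skip table.  For a
one-character pattern every skip is 1, so the Horspool version makes
exactly as many comparisons as the naive one and still pays for the
table setup, which is the situation in Trials 2 and 3 taken to the
extreme.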
Since only one of the benchmark patterns actually occurs in the
subject file (at least I presume it does; it doesn't in mine), the
partial-match recovery that the BM algorithm uses will come into
effect only a few times overall.  A much better test would use (for
instance) a misspelled word in a large document; the BM algorithm
would then show some improvement.

As mentioned in the documentation for the grep family, ideally there
need be only one program, which determines the best algorithm to use
for the pattern.  Since the best algorithm actually depends on both
the pattern and the subject text, "best" is not possible in practice.

REMEMBER, in benchmarking as in everything else:
	CHOOSE HORSES FOR COURSES.
--
[Forgive me, Father, for I have signed ...]
Bob Dalgleish		...!alberta!sask!zaphod!bobd
			      ihnp4!
(My company has disclaimed any knowledge of me and whatever I might say)