Path: utzoo!utgpu!watmath!clyde!att!pacbell!ames!sgi!arisia!quintus!ok
From: ok@quintus.uucp (Richard A. O'Keefe)
Newsgroups: comp.arch
Subject: Re: BENCHMARKS AND LIPS
Message-ID: <798@quintus.UUCP>
Date: 2 Dec 88 10:49:02 GMT
References: <1740MLWLG@CUNYVM> <746@quintus.UUCP> <595@mqcomp.oz>
Sender: news@quintus.UUCP
Reply-To: ok@quintus.UUCP (Richard A. O'Keefe)
Organization: Quintus Computer Systems, Inc.
Lines: 36

In article <595@mqcomp.oz> s8504867@mqcomp.mq.oz (John Gardner) writes:
>In article <746@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>>Logical Inferences Per Second is a property of a _Prolog_ implementation,

>Come off the grass.  That's like saying MIPS are a property of C
>only.  While it is nice to do benchmarks by only varying what you want to
>compare, it is certainly valid to calculate LIPS using any theorem prover,
>not just prolog.  Do you think that all other theorem provers are incapable
>of logical inferences ?  Do you think prolog is the only language available
>for this sort of work ?

Stop knocking down straw men; you'll only get straw in your hair and then
what will the neighbours think?

It is no more "valid to calculate LIPS using any theorem prover" than it
is to calculate Dhrystones "using any program".  Remember: the meaning
of a compound noun is *not* a simple composition of the meanings of the
words it is made from.  Is a foot-hill a hill made of feet?  Is a
benchmark a mark on a bench?  (Not now.)  Yes, other programs are
theorem provers capable of drawing logical inferences.  Prolog is a
pretty weak theorem prover; that's _why_ it is usable as a programming
language.  If you compare the number of resolutions per second in a good
theorem prover (say Markgraf Karl, or ITP) with the kind of LIPS rating
a good Prolog should get, (a) the theorem prover would look _terrible_,
and (b) you would learn nothing of interest.  In fact, you don't learn
a whole lot comparing the LIPS rating of two Prolog systems, either.
Run some tests and you can get quite a few more procedure calls per second
than the LIPS rating; run some others and you can get far fewer.

We should be trying to get rid of "LIPS", not trying to spread the disease.

For comparing theorem provers in general, there's a book of examples
from one of the Argonne crowd which might be useful for benchmarking.
CPU and wall time in seconds to solve each of those problems would be
much more illuminating than a single figure which favours depth-first
search.
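
As a rough sketch of that kind of per-problem report (Python is used
purely for illustration; the names time_problem, report, toy_sum and
toy_sort are hypothetical, and the two toy workloads are stand-ins,
not problems from any real suite), one could record CPU and wall time
separately for each problem instead of folding everything into one rate:

```python
import time


def time_problem(solve, *args):
    """Run solve(*args) once; return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall-clock) time
    cpu_start = time.process_time()    # CPU time charged to this process
    result = solve(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


def report(problems):
    """problems maps a problem name to a zero-argument callable.

    Returns one (name, cpu_seconds, wall_seconds) row per problem,
    in the order the problems were given.
    """
    rows = []
    for name, solve in problems.items():
        _, cpu, wall = time_problem(solve)
        rows.append((name, cpu, wall))
    return rows


# Hypothetical stand-in workloads (assumptions for the sketch only).
def toy_sum():
    return sum(range(100000))


def toy_sort():
    return sorted(range(50000, 0, -1))[0]


if __name__ == "__main__":
    for name, cpu, wall in report({"toy_sum": toy_sum, "toy_sort": toy_sort}):
        print(f"{name:10s} cpu={cpu:.4f}s wall={wall:.4f}s")
```

The point of keeping one row per problem is that the reader can see
which problems a system handles well and which it does not, which a
single aggregate figure hides.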