Path: utzoo!utgpu!watmath!clyde!att!rutgers!iuvax!jwmills
From: jwmills@iuvax.cs.indiana.edu (Jonathan Mills)
Newsgroups: comp.arch
Subject: Re: BENCHMARKS AND LIPS
Message-ID: <15512@iuvax.cs.indiana.edu>
Date: 3 Dec 88 04:21:54 GMT
References: <1740MLWLG@CUNYVM> <746@quintus.UUCP> <595@mqcomp.oz> <798@quintus.UUCP>
Reply-To: jwmills@iuvax.UUCP (Jonathan Mills)
Organization: Indiana University CSCI, Bloomington
Lines: 69


To expand a little on what Richard has said (and I agree with him): LIPS 
figures depend heavily on the ability to perform the following fragment of 
Warren Abstract Machine code (the inner loop of list concatenate - see "An 
Abstract Prolog Instruction Set" by D.H.D. Warren, SRI Tech Note 309, 
October 1983):

conc/3:	switch_on_term C1a, C1, C2, fail
C2:		get_list	A1	% conc( [
		unify_variable 	X4	%	X|
		unify_variable	A1	%	L1],

		get_variable 	X2,A2	%	L2,  

		get_list	A3	%	[
		unify_value	X4	%	X|
		unify_variable	A3	%	L3]  ) :-

		execute conc/3		%	conc( L1, L2, L3 ).


Because the functionality of this instruction sequence could be duplicated 
by writing it in assembler, Pascal, C, LISP, or whatever, the "LIPS" for 
ANY language or machine can be obtained... and be just as meaningful as you 
like.  Or as meaningless.

LIPS bear little relation to inference rates in the rest of the system for 
other reasons:  smart Prolog compilers look for concatenate-like operations 
and produce code optimized for them (so LIPS may vary between NREV and any 
other program by a factor of 2 to 10).  And if one is willing to accept as 
LIPS the rate at which a Prolog can do procedure calls, unbelievably high 
(and non-representative) speeds can be claimed.  (For example, benchmarking 
"a :- a", which reduces to "a: goto a" in assembly language, produces a LIPS 
rate *approximately* equal to the MIPS rate of your CPU.)  And I am still
embarrassed to recall a version of NREV on a DPS-88 that fit both the
program and the list into cache... and ran at 2.7 MegaLIPS...and dropped to
700 KLIPS when the list was made too large for the cache.  Live and learn.

Theorem provers and Prologs can be rated in LIPS, but again, this has 
little relation to the power of the system.  Proving Sam's Lemma takes ITP 
(an interactive theorem prover written in Pascal, running on a VAX-11/780) 
16,000 seconds, ALS Prolog (running on an 88000 ANGELFIRE 1) 150 seconds, 
and OTTER (a batch theorem prover written in C, running on a VAX 8800) 29 
seconds.  All use the set-of-support strategy and generate roughly the 
same number of clauses, so the number of logical inferences is similar - 
but this tells little about what makes the programs' execution times vary 
(aside from the CPU, of course).

Because rewriting operations (subsumption, demodulation, paramodulation), 
clause selection (indexing), unification, and clause integration (asserts) 
all interact to determine how many clauses are generated (i.e., how many 
inferences are *completed*), it is not an easy task to give a single 
"inference rate".  Indeed, it would be NAIVE to do so - LIPS are now part 
of the *Prolog* mythos, and should be left there, as firmly embedded as is 
TAK in the Lisp community.  If anyone out there would undertake Prolog 
benchmarks analogous to Gabriel's benchmarks for Lisp, we would all 
benefit.  At least we could attempt to justify our prejudices with better 
numbers! (:-)

And when I have better numbers for the behavior of various theorem provers
under various constraints over a wide variety of domains, I'll post them.
Don't expect to see anything before 1990.

(ITP is available from NAG (Numerical Algorithms Group), and OTTER is
available from Argonne National Lab, try "mccune@mcs.anl.gov".  I can e-mail
a compiled version of Sam's Lemma (useful as a benchmark) to interested folks,
but the FOL compiler itself is a ways off yet.)