Path: utzoo!utgpu!watmath!clyde!att!rutgers!iuvax!jwmills
From: jwmills@iuvax.cs.indiana.edu (Jonathan Mills)
Newsgroups: comp.arch
Subject: Re: BENCHMARKS AND LIPS
Message-ID: <15512@iuvax.cs.indiana.edu>
Date: 3 Dec 88 04:21:54 GMT
References: <1740MLWLG@CUNYVM> <746@quintus.UUCP> <595@mqcomp.oz> <798@quintus.UUCP>
Reply-To: jwmills@iuvax.UUCP (Jonathan Mills)
Organization: Indiana University CSCI, Bloomington
Lines: 69

To expand a little on what Richard has said (and I agree with him): LIPS
depend heavily on the ability to perform the following fragment of Warren
abstract machine code, the inner loop of list concatenate (see "An Abstract
Prolog Instruction Set" by D.H.D. Warren, SRI Tech Note 309, October 1983):

conc/3: switch_on_term C1a, C1, C2, fail
C2:     get_list A1           % conc( [
        unify_variable X4     %        X|
        unify_variable A1     %          L1],
        get_variable X2,A2    %               L2,
        get_list A3           %                   [
        unify_value X4        %                    X|
        unify_variable A3     %                       L3] ) :-
        execute conc/3        % conc( L1, L2, L3 ).

Because the functionality of this instruction sequence could be duplicated
by writing it in assembler, Pascal, C, LISP, or whatever, the "LIPS" for
ANY language or machine can be obtained... and be just as meaningful as you
like.  Or as meaningless.

LIPS bear little relation to inferences in the rest of the system for other
reasons: smart Prolog compilers look for concatenate-like operations and
produce code optimized for them (so LIPS may vary between NREV and any
other program by a factor of 2 to 10).  And if one is willing to accept as
LIPS the rate at which a Prolog can do procedure calls, unbelievably high
(and non-representative) speeds can be claimed.  (For example, benchmarking
"a :- a", which reduces to "a: goto a" in assembly language, produces a
LIPS rate *approximately* equal to the MIPS rate of your CPU.)  And I am
still embarrassed to recall a version of NREV on a DPS-88 that fit both the
program and the list into cache...
and ran at 2.7 MegaLIPS... and dropped to 700 KLIPS when the list was made
too large for the cache.  Live and learn.

Theorem provers and Prologs can be rated for LIPS, but again, this has
little relation to the power of the system.  Proving Sam's Lemma takes ITP
(an interactive theorem prover written in Pascal running on a VAX-11/780)
16,000 seconds, ALS Prolog (running on an 88000 ANGELFIRE 1) 150 seconds,
and OTTER (a batch theorem prover written in C running on a VAX 8800) 29
seconds.  All use the set-of-support strategy and generate roughly the same
number of clauses, so the number of logical inferences is similar - but
this tells little about what makes the programs' execution times vary
(aside from the CPU, of course).  Because rewriting operations
(subsumption, demodulation, paramodulation), clause selection (indexing),
unification, and clause integration (asserts) all interact to determine how
many clauses are generated (i.e., how many inferences are *completed*), it
is not an easy task to give a single "inference rate".  Indeed, it would be
NAIVE to do so - LIPS are now part of the *Prolog* mythos, and should be
left there, as firmly embedded as TAK is in the Lisp community.

If anyone out there would undertake Prolog benchmarks analogous to
Gabriel's benchmarks for Lisp, we would all benefit.  At least we could
attempt to justify our prejudices with better numbers!  (:-)  And when I
have better numbers for the behavior of various theorem provers under
various constraints over a wide variety of domains, I'll post them.  Don't
expect to see anything before 1990.

(ITP is available from NAG (Numerical Algorithms Group), and OTTER is
available from Argonne National Lab; try "mccune@mcs.anl.gov".  I can
e-mail a compiled version of Sam's Lemma (useful as a benchmark) to
interested folks, but the FOL compiler itself is a ways off yet.)
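To make the NREV arithmetic above concrete, here is a sketch - in Python
rather than period Prolog, with function names of my own choosing - of the
standard LIPS counting convention: naive reverse of an n-element list
performs (n+1)(n+2)/2 logical inferences (one per nrev call, one per append
call), i.e. 496 for the classic 30-element list, and "LIPS" is just that
count divided by wall-clock time.

```python
import time

def nrev_inferences(n):
    """Inference count for naive reverse of an n-element list,
    by the standard convention: one inference per nrev call
    plus one per append call.  Closed form: (n+1)(n+2)/2."""
    # nrev is entered n+1 times; each level appends a list of
    # length 0..n-1, and append on a k-element list is entered
    # k+1 times.
    return (n + 1) + sum(k + 1 for k in range(n))

def nrev(lst, count):
    # Models: nrev([],[]).  nrev([H|T],R) :- nrev(T,Z), append(Z,[H],R).
    count[0] += 1
    if not lst:
        return []
    return append(nrev(lst[1:], count), [lst[0]], count)

def append(a, b, count):
    # Models: append([],L,L).  append([H|T],L,[H|R]) :- append(T,L,R).
    count[0] += len(a) + 1          # k+1 entries for a k-element first arg
    return a + b

count = [0]
t0 = time.perf_counter()
nrev(list(range(30)), count)
elapsed = time.perf_counter() - t0
print(count[0], nrev_inferences(30))        # both print 496
print(f"{count[0] / elapsed:.0f} LIPS (for this Python model)")
```

Which is the whole point: dividing 496 by a measured time rates one
concatenate-shaped loop, not the system - hence the 2-to-10x spread between
NREV and anything else.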