Xref: utzoo comp.arch:6063 comp.lang.prolog:1180
Path: utzoo!utgpu!water!watmath!clyde!att!osu-cis!tut.cis.ohio-state.edu!mailrus!cornell!uw-beaver!teknowledge-vaxc!sri-unix!quintus!ok
From: ok@quintus.uucp (Richard A. O'Keefe)
Newsgroups: comp.arch,comp.lang.prolog
Subject: Re: Perils of comparison -- an example
Message-ID: <292@quintus.UUCP>
Date: 14 Aug 88 20:41:30 GMT
References: <282@quintus.UUCP> <15221@shemp.CS.UCLA.EDU>
Sender: news@quintus.UUCP
Reply-To: ok@quintus.UUCP (Richard A. O'Keefe)
Organization: Quintus Computer Systems, Inc.
Lines: 41

In article <15221@shemp.CS.UCLA.EDU> casey@cs.ucla.edu.UUCP (Casey Leedom) writes:
>In article <282@quintus.UUCP> ok@quintus () writes:
>>
>> ... kLI/s are defined solely by that particular benchmark, by the way.
>> Other benchmarks may be "procedure calls per second", but _only_ Naive
>> Reverse gives "logical instructions".
>
> I believe "kLI/s" is 1000's of Logical Inferences per second (but I may
>be wrong of course). This is normally abbreviated as kLIPS. Really fast
>PROLOG machines are rated in mLIPS (10^6 LIPS).

Right, it is "logical _inferences_ per second". Silly me.

There is a single specific benchmark, called naive reverse, which happens
to do 496 procedure calls. To determine the kLI/s rating, you run this
benchmark N times, for some large N. If it takes T seconds, you report
(496*N)/T as the LIPS rating. (The program itself, and a sketch of a
timing driver, appear at the end of this article.)

When you are benchmarking, it is necessary to be precise about what you
have measured. Some people have taken any old small program and reported
the number of procedure calls it did per second as LIPS. It simply won't
*DO*! Procedures can have different numbers of arguments, and the cost of
head unification can range from next to nothing to exponential in the
size of the arguments.

Don't get me wrong: Naive Reverse is not a specially good benchmark.
(Think about the fact that native code for it fits comfortably into a
68020's on-chip instruction cache...) But using *different* benchmarks
when talking about different machines can't yield better comparisons!

There is a more comprehensive set of micro-benchmarks, described in AI
Expert last year. Instead of a single LI/s rating, it would be better to
report an "AIE spectrum". But even the best micro-benchmarks don't always
predict the performance of real programs well, for reasons explained in
the Smalltalk books, amongst others.

One of the things which makes the DLM article credible is that it reports
figures for several other (small) benchmarks (I surmise that "quickstart"
really meant "quicksort"). I have seen too many papers that report really
high performance for systems which seem never to have run anything _but_
Naive Reverse. At least the DLM is realer than that!
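
For concreteness, here is the program in question. This is the usual
textbook statement of naive reverse; the names nrev/2 and append/3 are
the conventional ones, not taken from any particular vendor's listing.

    % Naive reverse: reverse a list by appending each head to the
    % end of the reversed tail.  On the standard 30-element list
    % this makes 31 calls to nrev/2 and 465 calls to append/3,
    % 496 in all -- which is where the magic number comes from.

    nrev([], []).
    nrev([H|T], R) :-
            nrev(T, R0),
            append(R0, [H], R).

    append([], L, L).
    append([H|T], L, [H|R]) :-
            append(T, L, R).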
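
And here is a sketch of the sort of driver used to get the LIPS figure.
Take it as illustrative only: the names bench/1, do_nrev/2, and list30/1
are invented for this article, and timing predicates differ from system
to system. I have assumed a statistics(runtime, [_,T]) that reports
milliseconds elapsed since the previous such call, in the DEC-10 style.

    % Illustrative LIPS driver.  Assumes statistics(runtime, [_,T])
    % yields milliseconds since the previous call; choose N large
    % enough that T is well away from zero.

    bench(N) :-
            list30(L),
            statistics(runtime, _),         % reset the clock
            do_nrev(N, L),
            statistics(runtime, [_,T]),     % T ms for N iterations
            LIPS is (496 * N * 1000) / T,
            write(LIPS), nl.

    do_nrev(0, _) :- !.
    do_nrev(N, L) :-
            nrev(L, _),
            N1 is N - 1,
            do_nrev(N1, L).

    list30([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,
            16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]).

A careful measurement would also time an empty loop (do_nrev/2 with the
call to nrev/2 deleted) and subtract it, so that the loop overhead is not
charged to nrev/2.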