Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: daemon@ig.UUCP
Newsgroups: bionet.molbio.news
Subject: CSLG|COMMENTARY: From Andrew Coulson
Message-ID: <4307@ig.ig.com>
Date: Fri, 4-Dec-87 22:23:25 EST
Article-I.D.: ig.4307
Posted: Fri Dec  4 22:23:25 1987
Date-Received: Thu, 10-Dec-87 01:46:22 EST
Sender: daemon@presto.ig.com
Lines: 49

From: Sunil Maulik 

 4-Dec-87 12:33:43-PST,9076;000000000001
Return-Path: <@WISCVM.WISC.EDU:A.F.W.Coulson@EDINBURGH.AC.UK>
Received: from WISCVM.WISC.EDU by BIONET-20.ARPA with TCP; Fri 4 Dec 87 12:33:28-PST
Received: from UKACRL.BITNET by WISCVM.WISC.EDU ; Fri, 04 Dec 87 14:34:41 CDT
Received: from RL.IB by UKACRL.BITNET (Mailer X1.25) with BSMTP id 1126; Fri,
 04 Dec 87 20:27:53 GMT
Via:        UK.AC.RL.EARN; Fri, 04 Dec 87 20:27:52 GMT
Received:
Via:        000015001006.FTP.MAIL;  4 DEC 87 20:27:44 GMT
Date:       04 Dec 87  20:28:06 gmt
From:       A.F.W.Coulson@EDINBURGH.AC.UK
Subject:    CSLG Discussion or Conference
To:         MAULIK%arpa.bionet-20%RL.earn
Message-ID: <04 Dec 87  20:28:06 gmt  100798@EMAS-A>

May I make some comments on questions raised in the CSLG discussion?

       Searching large databases for sequence similarities.

       It is clear from several contributions that it is still not generally
appreciated, even in the computer-sophisticated US, how grossly inefficient
implementations of the exhaustive searching methods (NWS algorithms) are on
machines of conventional architecture -- including conventional supercomputers.
       For some time we have been routinely running exhaustive searches of the
protein sequence database, using the "Best Local Homology" algorithm of
Smith and Waterman to compare the query sequence in turn with every known
sequence, and returning a list of the best 4000+ similarities in the complete
surface of comparison.   This program will use any arbitrarily defined scoring
scheme for both residue comparisons and indel penalties, and it takes 1-2
seconds of CPU time per residue of the query sequence.  The algorithm is essentially
the same as that used in the second stage of the FASTP program; the key
difference is that we can afford to run this exhaustive algorithm on the
entire database, while FASTP attempts to filter out a subset of the database
by a rapid and "approximate" search method, and runs the full search only on
this subset.
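
       For readers who have not seen the method written out, here is a
minimal sketch of an exhaustive local-similarity search in Python.  It is not
the DAP program: the names local_score, search and identity are hypothetical,
the gap penalty is assumed to be a fixed cost per indel, and only the single
best score per database entry is kept rather than a list of several thousand
similarities.  The scoring function is passed in as a parameter, to mirror the
"arbitrarily defined scoring scheme" described above.

```python
import heapq


def local_score(query, target, score, gap):
    """Best local alignment score of two sequences (Smith-Waterman recurrence).

    score(a, b) gives the residue comparison score; gap is an assumed
    fixed penalty per inserted or deleted residue.
    """
    prev = [0] * (len(target) + 1)   # previous row of the comparison matrix
    best = 0
    for a in query:
        curr = [0]
        for j, b in enumerate(target, start=1):
            cell = max(0,                          # start a new local match
                       prev[j - 1] + score(a, b),  # align a with b
                       prev[j] - gap,              # indel: skip residue a
                       curr[j - 1] - gap)          # indel: skip residue b
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


def search(query, database, score, gap, k=10):
    """Compare the query with every entry in turn and return the k best."""
    scored = ((local_score(query, seq, score, gap), name)
              for name, seq in database.items())
    return heapq.nlargest(k, scored)


def identity(a, b):
    """Toy scoring scheme: reward identical residues, penalise mismatches."""
    return 2 if a == b else -1
```

The work is proportional to the query length times the total database length,
which is why the exhaustive method is so costly on a serial machine and why
FASTP filters the database first.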
       This program runs on the ICL Distributed Array Processor ("DAP"; DAPs
are now being marketed by an ICL spin-off company called AMT), a massively
parallel SIMD machine.  It is orders of magnitude cheaper than any of
the other high performance machines whose names have been invoked in this
correspondence so far, because of the simplicity of the processors at each
node; each processor in a DAP is a single-bit full adder.  Numerical calculation
can be made to run on the DAP with great efficiency (returning an overall
performance on these applications comparable to a Cray-1), but the architecture
is obviously best suited to applications (such as the database search) involving
a lot of logical and integer operations, and for these applications
the machine out-performs anything we've heard of.
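
       To illustrate how an array of single-bit full adders can still do
integer arithmetic, here is a small Python simulation of bit-serial addition.
It is only a sketch and makes no claim to reflect the DAP's actual
instruction set: the function names are hypothetical, and each bit position
of a Python integer is assumed to stand in for one processor.  Every
processor adds its own pair of numbers, one bit per step, and all processors
take the same step together, which is the SIMD idea.

```python
def to_planes(values, width):
    """Pack values into bit planes: plane k holds bit k of every value,
    with processor p owning bit position p of each plane."""
    return [sum(((v >> k) & 1) << p for p, v in enumerate(values))
            for k in range(width)]


def from_planes(planes, count):
    """Unpack bit planes back into one integer per processor."""
    return [sum(((plane >> p) & 1) << k for k, plane in enumerate(planes))
            for p in range(count)]


def bit_serial_add(a_planes, b_planes):
    """Add two sets of numbers one bit plane at a time, least significant
    plane first.  Each step is one full-adder operation per processor."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)            # full-adder sum bit
        carry = (a & b) | (carry & (a ^ b))  # full-adder carry-out
    out.append(carry)                        # final carry is the top bit
    return out


xs = [3, 5, 250, 0]
ys = [4, 11, 7, 0]
total = from_planes(bit_serial_add(to_planes(xs, 8), to_planes(ys, 8)),
                    len(xs))
```

An addition of w-bit numbers takes w steps whatever the number of processors,
so the cost depends on the word length and not on how many sums are being
formed at once.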
-------