Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: daemon@ig.UUCP
Newsgroups: bionet.molbio.news
Subject: CSLG|COMMENTARY: From Andrew Coulson
Message-ID: <4307@ig.ig.com>
Date: Fri, 4-Dec-87 22:23:25 EST
Article-I.D.: ig.4307
Posted: Fri Dec 4 22:23:25 1987
Date-Received: Thu, 10-Dec-87 01:46:22 EST
Sender: daemon@presto.ig.com
Lines: 49

From: Sunil Maulik
4-Dec-87 12:33:43-PST,9076;000000000001
Return-Path: <@WISCVM.WISC.EDU:A.F.W.Coulson@EDINBURGH.AC.UK>
Received: from WISCVM.WISC.EDU by BIONET-20.ARPA with TCP; Fri 4 Dec 87 12:33:28-PST
Received: from UKACRL.BITNET by WISCVM.WISC.EDU ; Fri, 04 Dec 87 14:34:41 CDT
Received: from RL.IB by UKACRL.BITNET (Mailer X1.25) with BSMTP id 1126; Fri, 04 Dec 87 20:27:53 GMT
Via: UK.AC.RL.EARN; Fri, 04 Dec 87 20:27:52 GMT
Received: Via: 000015001006.FTP.MAIL; 4 DEC 87 20:27:44 GMT
Date: 04 Dec 87 20:28:06 gmt
From: A.F.W.Coulson@EDINBURGH.AC.UK
Subject: CSLG Discussion or Conference
To: MAULIK%arpa.bionet-20%RL.earn
Message-ID: <04 Dec 87 20:28:06 gmt 100798@EMAS-A>

May I make some comment on questions raised in the CSLG discussion?

Searching large databases for sequence similarities.

It is clear from several contributions that it is still not generally appreciated, even in the computer-sophisticated US, how grossly inefficient implementations of the exhaustive searching methods (NWS algorithms) are on machines of conventional architecture -- including conventional supercomputers. For some time we have been routinely running exhaustive searches of the protein sequence database, using the "Best Local Homology" algorithm of Smith and Waterman to compare the query sequence in turn with every known sequence, and returning a list of the best 4000+ similarities in the complete surface of comparison.
This program will use any arbitrarily defined scoring scheme for both residue comparisons and indel penalties, and it takes 1-2 seconds of CPU time per residue of the query sequence. The algorithm is essentially the same as that used in the second stage of the FASTP program; the key difference is that we can afford to run this exhaustive algorithm on the entire database, while FASTP attempts to select a subset of the database by a rapid and "approximate" search method, and runs the full search only on this subset.

This program runs on the ICL Distributed Array Processor ("DAP"; DAPs are now being marketed by an ICL spin-off company called AMT), a massively parallel SIMD machine. Because of the simplicity of the processors at each node (each processor in a DAP is a single-bit full adder), it is orders of magnitude cheaper than any of the other high-performance machines whose names have been invoked in this correspondence so far.

Numerical calculation can be made to run on the DAP with great efficiency (returning an overall performance on these applications comparable to a Cray-1), but the architecture is obviously best suited to applications (such as the database search) involving a lot of logical and integer operations, and for these applications the machine out-performs anything we've heard of.
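
For readers who have not met the method, the exhaustive search described above can be sketched serially in a few lines of Python. This is only an illustration under stated assumptions, not the DAP program: the function names, the simple identity scoring (+2 match, -1 mismatch), the linear indel penalty, and the toy database are all hypothetical choices made for the example. It computes the Smith-Waterman best local score of the query against every database entry and returns the top hits.

```python
import heapq
from typing import Callable, Dict, List, Tuple


def smith_waterman_score(
    query: str,
    target: str,
    substitution: Callable[[str, str], int],
    gap_penalty: int,
) -> int:
    """Best local alignment score (Smith-Waterman with a linear indel penalty)."""
    # Only the previous row is kept, since we want the score, not the alignment.
    previous = [0] * (len(target) + 1)
    best = 0
    for q in query:
        current = [0]
        for j, t in enumerate(target, start=1):
            score = max(
                0,                                     # start a fresh local alignment
                previous[j - 1] + substitution(q, t),  # align residue q with residue t
                previous[j] - gap_penalty,             # indel: q aligned to a gap
                current[j - 1] - gap_penalty,          # indel: t aligned to a gap
            )
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def exhaustive_search(
    query: str,
    database: Dict[str, str],
    substitution: Callable[[str, str], int],
    gap_penalty: int,
    top_n: int = 10,
) -> List[Tuple[int, str]]:
    """Score the query against every entry and return the top_n (score, name) pairs."""
    scored = (
        (smith_waterman_score(query, seq, substitution, gap_penalty), name)
        for name, seq in database.items()
    )
    return heapq.nlargest(top_n, scored)


def identity_scoring(a: str, b: str) -> int:
    """Hypothetical scoring scheme: +2 for identical residues, -1 otherwise."""
    return 2 if a == b else -1


if __name__ == "__main__":
    # Toy database, purely illustrative.
    toy_database = {
        "exact": "ACACACTA",
        "related": "AGCACACA",
        "unrelated": "GGGGGGGG",
    }
    for score, name in exhaustive_search("ACACACTA", toy_database, identity_scoring, 1):
        print(name, score)
```

Because the scoring function and the indel penalty are passed in as arguments, any other residue comparison table can be substituted without touching the search loop. The work grows as the product of query length and total database length, which is why a serial version like this is slow on a conventional machine.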