Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: daemon@ig.UUCP
Newsgroups: bionet.molbio.news
Subject: CSLG|COMMENTARY: From Ellis Golub (2)
Message-ID: <4260@ig.ig.com>
Date: Tue, 1-Dec-87 14:39:45 EST
Article-I.D.: ig.4260
Posted: Tue Dec 1 14:39:45 1987
Date-Received: Sat, 5-Dec-87 13:17:08 EST
Sender: daemon@presto.ig.com
Lines: 30

From: Sunil Maulik

Computer Applications in the Sequencing of Large Genomes

At present, the Genbank database consists of approximately 14,000 sequences comprising ~15 mb. Searching this entire database using IFIND on BIONET is already impractical, and searches are often restricted to subsets of the total database. For example, the mammalian and unannotated sequences comprise about 7000 sequences totalling more than 6 mb. A recent search of this segment of Genbank using a 1.6 kb probe required approximately 3 hours of cpu time on the BIONET computer in batch mode. The same program running on a VAX (about 5 times faster on the Sieve of Eratosthenes benchmark) required about 45 minutes of cpu time. Using the faster Lipman and Pearson algorithm, XFASTN on BIONET, 1.7 mb in 1565 sequences were searched in 9 min, while another implementation of the Wilbur and Lipman method on a VAX searched the mammalian and unannotated lists in about 20 min. Because search time depends strongly on the probe size (smaller is faster) and the word size (larger is faster), these searches were conducted with approximately similar parameters. It was also somewhat distressing that several of these searches returned different lists of similar sequences. Clearly, attempts to apply these techniques to the complete human genome (~30 gb; 2000 times larger than the current Genbank database) will strain all available facilities beyond the breaking point.
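
To put rough numbers on that last point, the timings quoted above can be scaled up to the genome size given in this post. The sketch below is only a back-of-the-envelope estimate. It assumes that cpu time grows linearly with the number of bases searched, which the post does not claim, and it treats "more than 6 mb" as exactly 6 mb. The function name extrapolate_hours and the labels are made up for illustration.

```python
# Back-of-the-envelope extrapolation of the search times quoted in the post.
# ASSUMPTION: cpu time scales linearly with database size (not stated in the post).
# ASSUMPTION: "more than 6 mb" is treated as exactly 6 mb.

MB = 1_000_000        # one megabase
GB = 1_000_000_000    # one gigabase


def extrapolate_hours(measured_minutes, measured_bases, target_bases):
    """Scale a measured search time linearly to a larger database; returns hours."""
    return (measured_minutes / 60) * (target_bases / measured_bases)


GENOME_BASES = 30 * GB  # genome size as quoted in the post

# (cpu minutes, bases searched) for each run described in the post
runs = {
    "IFIND on BIONET": (180, 6 * MB),
    "IFIND on VAX": (45, 6 * MB),
    "XFASTN on BIONET": (9, 1.7 * MB),
    "Wilbur-Lipman on VAX": (20, 6 * MB),
}

for name, (minutes, bases) in runs.items():
    hours = extrapolate_hours(minutes, bases, GENOME_BASES)
    print(f"{name}: roughly {hours:,.0f} cpu hours for one probe")
```

Under these assumptions the 3 hour IFIND run scales to about 15,000 cpu hours for a single probe, well over a year of continuous computing, and even the fastest run quoted works out to more than a thousand hours.
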

The recent proposal to begin accumulating sequence data of this magnitude poses a clear challenge to the molecular biology software community: to develop new and faster algorithms, for new and faster hardware, that make gigabase databases practical to use.

-------