Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: daemon@ig.UUCP
Newsgroups: bionet.molbio.news
Subject: CSLG|COMMENTARY: From Alex Reisner
Message-ID: <4256@ig.ig.com>
Date: Tue, 1-Dec-87 14:32:18 EST
Article-I.D.: ig.4256
Posted: Tue Dec 1 14:32:18 1987
Date-Received: Sat, 5-Dec-87 13:14:38 EST
Sender: daemon@presto.ig.com
Lines: 24
From: Sunil Maulik

Computers and the Sequencing of Large Genomes

When this topic is raised, the proposal to obtain the complete and ordered
nucleotide sequence of the human genome immediately comes to mind. Leaving
aside the question of whether it is sensible to set such a goal as a
specific, highly directed project, automated data acquisition, processing
and analysis are an obvious necessity.

The least difficult of the problems in dealing with the acquired data would
be storage and straightforward single-sequence comparison searches against
the database. 3,000 million bases, at two bits per base (ignoring ambiguity
coding), comes to 750 Mbytes; dealing with peptide sequences is slightly
more complex. Allowing for storage overheads, two current CD-ROMs would
easily contain the lot (a worked sketch of this arithmetic is appended
below). Add the additional information one would want the database to
contain, and some half-dozen 120mm CD-ROMs would probably be sufficient.
With more advanced optical technology and 300-400mm discs, a single disc
would be more than adequate. The slowness of data transfer from optical
media would, even now, be no particular obstacle: it could be overcome by
using even sub-Gbyte magnetic discs and high-speed cache memory as
intermediate staging. At this stage not even particularly large sums would
be involved.
-------
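
Appended sketch: a back-of-envelope check, in Python, of the storage
arithmetic above. The 3,000-million-base figure and the two-bit encoding
are taken from the post; the ~650 Mbyte CD-ROM capacity and the pack()
helper are illustrative assumptions, not anything the original author
specified.

    import math

    GENOME_BASES = 3_000_000_000   # 3,000 million bases, per the post
    BITS_PER_BASE = 2              # four symbols A/C/G/T, ambiguity codes ignored
    CDROM_BYTES = 650 * 10**6      # assumed capacity of a 120mm CD-ROM

    raw_bytes = GENOME_BASES * BITS_PER_BASE // 8
    print(f"raw sequence: {raw_bytes / 10**6:.0f} Mbytes")          # 750 Mbytes
    print(f"CD-ROMs needed: {math.ceil(raw_bytes / CDROM_BYTES)}")  # 2

    # Illustrative two-bit packing of a DNA string, four bases per byte.
    CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

    def pack(seq: str) -> bytes:
        out = bytearray()
        for i in range(0, len(seq), 4):
            b = 0
            for base in seq[i:i + 4]:
                b = (b << 2) | CODE[base]
            out.append(b)
        return bytes(out)

    print(pack("ACGTACGT").hex())  # '1b1b': 0b00011011 twice

The post's claim survives the check: the raw sequence fits in 750 Mbytes,
i.e. two CD-ROMs once storage overheads are allowed for.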