Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: daemon@ig.UUCP
Newsgroups: bionet.molbio.news
Subject: CSLG|COMMENTARY: From Alex Reisner
Message-ID: <4256@ig.ig.com>
Date: Tue, 1-Dec-87 14:32:18 EST
Article-I.D.: ig.4256
Posted: Tue Dec  1 14:32:18 1987
Date-Received: Sat, 5-Dec-87 13:14:38 EST
Sender: daemon@presto.ig.com
Lines: 24

From: Sunil Maulik 

		Computers and the Sequencing of Large Genomes

	Currently, the proposal to obtain the complete and ordered nucleotide
sequence of the human genome comes immediately to mind when this topic is
raised.  Leaving aside the question of whether it is sensible to set such
a goal as a specific and highly directed project, the need for automated
data acquisition, processing and analysis is obvious.
	The least difficult of the problems in dealing with the data that
would be acquired would be their storage and straightforward single-sequence
comparison searches against the database.  3,000 million bases, at two bits
per character (ignoring ambiguity coding), come to 750 Mbytes; dealing with
peptide sequences is slightly more complex.  Allowing for storage overheads,
two current CD-ROMs would easily contain the lot.  Add to that the
additional information one would want the database to hold, and probably
some half dozen 120 mm CD-ROMs would be sufficient.  With more advanced
optical technology and 300-400 mm discs, a single disc would be more than
adequate.  The slowness of data transfer from optical media would, even
now, be no particular obstacle: it could be overcome by using even
sub-Gbyte magnetic discs and high-speed cache memory as intermediate
stages.  At this stage not even particularly large sums would be involved.
	
-------