Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!uunet!ig!daemon
From: daemon@ig.UUCP
Newsgroups: bionet.molbio.news
Subject: CSLG: COMMENTARY: From Ellis Golub (5)
Message-ID: <4264@ig.ig.com>
Date: Tue, 1-Dec-87 14:45:08 EST
Article-I.D.: ig.4264
Posted: Tue Dec 1 14:45:08 1987
Date-Received: Sat, 5-Dec-87 13:18:51 EST
Sender: daemon@presto.ig.com
Lines: 26

From: Sunil Maulik

Computer Applications in the Sequencing of Large Genomes

Easy access to coding and regulatory sequences for the entire human genome will also lead to unprecedented growth in sequence-derived data, such as consensus sequences for transcriptional regulatory sites, splice junctions, and other RNA processing signals, as well as a windfall of putative protein sequence data from open reading frames in the nucleotide sequences.

This latter class of data will put pressure on biochemists to locate the proteins encoded by sequences of interest and to determine their properties. One route currently used to approach such problems is to predict the structure of the protein from its sequence, and to use the predicted structure as a basis for designing useful probes to study the actual protein, either in its natural tissue of origin or in engineered expression systems. For example, prediction of continuous epitope locations and synthesis of isosequential peptides have been successful in eliciting the production of antibodies specific for the protein from which the sequence data were derived.

The state of the art in protein structure prediction is as yet quite primitive and essentially empirical. To take full advantage of the large genome sequence database, much effort will have to be expended on the further development of these methods.

-------