Report
Integrated Genomic/Proteomic Analysis

The challenge of analyzing biological samples is driving diverse disciplines toward a common analytical platform.

William Hancock, Alex Apffel, John Chakel, Karen Hahnenberger, Gargi Choudhary
Agilent Laboratories
Joseph A. Traina, Erno Pungor
Berlex Biosciences

Analytical Chemistry News & Features, November 1, 1999

A comprehensive understanding of biological systems requires knowledge of the chemical composition of cellular systems, and this understanding is becoming increasingly important in elucidating disease mechanisms (1). Less than 2% of the noninfectious human disease load is monogenic in nature (2). The remaining 98% is polygenic (involving several genes) in origin or of an epigenetic nature (i.e., nongenetic). Therefore, elucidating disease mechanisms will require diagnostic tools ranging from direct DNA sequencing to mRNA profiling, protein sequencing, protein localization studies, and metabolic profiling. Another essential step is characterizing the normal range of human polymorphisms (localized changes in a specific DNA sequence in a genome), which provides a necessary benchmark for correlating genetic variance with disease states. The situation is further complicated because the biological diversity associated with disease may also be influenced by post-translational processes (structural modifications of the initial protein transcript) controlled by the cellular environment, changes that cannot be inferred from known DNA variance.

With the mapping of the human genome proceeding rapidly, scientists have recognized the need to characterize the corresponding gene products, namely proteins (3-6). In a recent book (in a chapter entitled "Proteome: A new field of biology"), it was noted that "PROTEOME indicates the PROTEins expressed by a genOME or tissue" (3).

The tremendous technical advances in both genomics and proteomics have also spawned a related revolution in bioinformatics. Bioinformatics can be described as the acquisition, analysis, and storage of biological information, specifically nucleic acid and protein sequences (7). Computational biology is the development of algorithms and computer programs integral to these endeavors. For example, the powerful program Entrez allows users to move from DNA sequences to the corresponding proteins, the chromosome maps of the genes, and the three-dimensional structures of proteins. The goal of computational biology is to generate programs that allow the user to extract more information than was originally entered into the system (7). An example of this process, known as "data mining," is the inferring of structural, functional, and evolutionary relationships via protein sequence alignments (phylogenetics).

Another challenge is to close the gap between protein entries in databases versus the entries in protein sequence databases resulting from DNA sequencing; in 1997, the difference was 507 versus 428,814, respectively (7). Traditional protein techniques, such as N-terminal sequencing performed on individual samples, cannot hope to bridge this gap, and thus the biologist must rely on predictive methods. This predictive approach uses amino acid compositional data such as molecular weights, hydrophobicity values, isoelectric points, peptide masses, and the presence of α-helices or β-sheets. Such information is used in various algorithms to assign new protein sequences to known protein families (7).

This Report will concentrate on issues involved with integrating genomic and proteomic data in the context of a biological system and some of the technology issues related to this goal. The importance of integrating the two fields was recently emphasized by the discovery that a low correlation exists between the protein and mRNA abundance for one gene product across 60 cell lines (4) and for selected genes in yeast (5). This has become an active area of study now that data from genome and proteome studies are becoming available. Many pharmaceutical companies, in search of new drug targets, are using a combination of genomic and proteomic information to drive screening programs for broad [...] which are conserved in many genomes (8). At present, much of this integration is being done on an ad hoc basis, but it will ultimately be performed with the raw information from new, massively parallel, miniaturized analytical systems that are being designed to give the necessary throughput.

Integrating genomic and proteomic analysis

A gene's DNA sequence provides a basic map of the amino acid sequence of the corresponding translated protein. However, gene function can be regulated by manipulating any of the many steps required for gene expression in higher eukaryotic cells (9). Examples include transcription control by DNA methylation or by mRNA editing. Thus, the presence of a given DNA sequence does not guarantee the synthesis of a corresponding protein. In addition, DNA information is insufficient to describe protein structure and function because much of a protein's complexity arises from cellular context-dependent post-translational processes (2).
In many cases, the activity and distribution of a protein within an organism depends on processes such as phosphoryla-
Glossary

Amplicon: DNA fragment generated by PCR
Antisense drugs: Oligonucleotides whose sequence is antiparallel to the sense gene and can be used to interfere with normal expression
bp: Base pair; also used to denote double-stranded DNA
Epigenetic: Nongenetic
Genome: An organism's complete set of genes and chromosomes
mRNA: Messenger RNA, which is used to specify the sequence of amino acids in a protein
pBR322: Plasmid cloning vector containing unique sites that allow insertion of foreign DNA into a production cell
Plasmid: Closed circular DNA duplex that can replicate in a host cell
Polygenic: Involving several genes
Polymorphism: Localized change in a specific DNA sequence in a genome
Post-translational processes: Structural modifications of initial protein transcript
Primer: Short oligonucleotide used for DNA polymerase initiation
Proteome: Proteins expressed by the genome
Restriction enzyme: An endonuclease that recognizes specific target nucleotide sequences and catalyzes hydrolysis of DNA
RFLP: Restriction fragment length polymorphism resulting from DNA polymorphism in two or more individuals
A valuable reference is Dictionary of Gene Technology, G. Kahl, VCH: New York, NY, 1995.
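
The predictive approach mentioned in this Report relies on descriptors computed directly from an amino acid sequence, such as composition, molecular weight, and hydrophobicity. The short Python sketch below illustrates how such descriptors might be derived. It is an illustration only and not a method taken from the cited work: the function names and the example peptide are hypothetical, the Kyte-Doolittle scale is an assumed choice (the Report does not name a hydrophobicity scale), and isoelectric point and secondary-structure prediction are left out.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values; using this scale is an assumption,
# since the Report does not say which hydrophobicity scale is meant.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in daltons (free amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def composition(seq):
    """Return the fraction of each amino acid present in the sequence."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def molecular_weight(seq):
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def gravy(seq):
    """Grand average of hydropathy: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def describe(seq):
    """Collect simple sequence-derived descriptors into one dictionary."""
    seq = seq.upper()
    return {
        "length": len(seq),
        "molecular_weight": round(molecular_weight(seq), 2),
        "gravy": round(gravy(seq), 3),
        "composition": composition(seq),
    }


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical peptide, for illustration only
    print(describe(example))
```

Descriptors of this kind could then be compared against values tabulated for known protein families; the comparison step itself is outside the scope of this sketch.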
Table 1. Analytical approaches to DNA analysis.

Approach                                          Performance
Conventional electrophoresis
  Sequencing
    Slab-gel                                      700 b/4 h/lane
    CGE                                           600 b/2 h/capillary
  Sizing
    Slab-gel (agarose)                            > 1 Mb
    CGE (pulsed field)                            > 1 Mb
  Mutation detection
    Single-strand conformational polymorphism
    Constant denaturant CE