Peer Reviewed: Genomics technologies for environmental science

Environmental Science & Technology, September 1, 2001
Genomics Technologies for Environmental Science

New molecular tools are creating innovative ways to address environmental challenges.

CHARLES W. GREER, LYLE G. WHYTE, JOHN R. LAWRENCE, LUKE MASSON, AND ROLAND BROUSSEAU

The ongoing explosion of nucleotide sequence data from prokaryotic and eukaryotic organisms is creating a vast information resource. Potential applications range from understanding and combating human disease to characterizing naturally occurring microbial communities, whose combined biomass is the single most important biological force influencing global elemental cycles and the balance of atmospheric gases. Although the health sector is moving rapidly into the postgenomics era, environmental genomics, the use of genomics to address environmental issues and problems, remains in its infancy. For example, although specific nucleotide sequence information has been available for some time, whole genome sequences for environmentally relevant microorganisms are only now beginning to appear in databases.

Presently, ecological studies of microbial communities remain largely process-oriented. Measurable metabolic parameters, like nitrogen fixation and substrate biodegradation, are determined as a function of the whole community. The contribution of individual bacterial species to these parameters is nearly impossible to determine because many organisms cannot be cultivated in vitro. The use of genomics-based tools to augment these traditional methods promises to accelerate our understanding of the complexities of species diversity, population dynamics, and metabolic pathways within microbial communities in soil and water. Their use should markedly improve the reliability and accuracy of remediation activities and of predictions of adverse environmental impacts before they happen. (In the following discussion of recent developments, readers who are less familiar with the terms used can refer to glossaries such as http://biotech.icmb.utexas.edu/search/dict-search.html, http://genomics.phrma.org/lexicon, and www.onelook.com.)

KEN EWARD/BIOGRAFX (illustration)
© 2001 American Chemical Society

Sequencing projects

In its broadest sense, genomics entails the complete sequencing of an organism’s entire complement of DNA, which consists of the bases adenine, guanine, cytosine, and thymine. The sequence of DNA for a particular gene is its genetic code, or blueprint, which can be translated into specific proteins, the key components in assembling all of the organism’s structures and regulating its functions, behavior, and physiology. Since 1995, 49 genomes (http://wit.integratedgenomics.com/GOLD) have been completely sequenced, of which 37 (www.tigr.org/tdb/mdb/mdbcomplete.html) are prokaryotic. A further 188 prokaryotic genomes (archaea and bacteria) and ~128 eukaryotic genomes are presently being sequenced. Hans-Peter Klenck of EPIDAUROS Biotechnologie AG in Bernried, Germany, estimates that as of September 2000, ~200 additional microbial genomes were being sequenced by private organizations.

Most microbial genome-sequencing projects are targeting pathogenic microorganisms in hopes of elucidating the mechanisms of their pathogenicity and discovering new drug targets. The remaining projects were initially directed at extremophiles (life forms that survive extremes of temperature, radiation, pressure, salt, and pH, often where no other life forms can exist), primarily archaea, because of their significance to fundamental evolutionary processes and, to a lesser extent, their potential biotechnological applications. The sequence data obtained from these initial projects, whether from pathogenic bacteria or extremophiles, can be mined for useful sequence information relevant to closely related, environmentally important microorganisms. For example, sequence data from the human pathogen Mycobacterium tuberculosis have enabled the identification of alkane degradation genes from related bacteria in the genus Rhodococcus.

With costs of genome sequencing substantially reduced, an increasing number of microbial genomes are being sequenced from microorganisms considered important from both industrial and environmental perspectives. Many of the publicly available genome sequencing projects directed toward these organisms are sponsored by the U.S. Department of Energy’s (DOE’s) Microbial Genome Project in collaboration with other partners. The DOE projects, as of Jan. 24, 2001, target microorganisms involved in carbon sequestering (8 genome projects), bioremediation (16 projects), energy production (4 projects), cellulose degradation (4 projects), industrial processes (8 projects), and technology development (4 projects). Among environmentally relevant organisms being fully sequenced are Deinococcus radiodurans, an astonishingly radiation-resistant bacterium with potential for the remediation of organic compounds in radioactive environments; Pseudomonas putida, which has a very versatile catabolic potential; and Dehalococcoides ethenogenes, which degrades the important groundwater pollutant trichloroethylene.

How will the large amount of sequence data available in the near future benefit environmental technology? What technical challenges remain? Analyses of newly sequenced microbial genomes reveal that ~40% of the putative genes identified encode proteins having unknown functions, indicating that an enormous reservoir of interesting proteins and their biological value remain to be exploited. This has motivated the search for unique biomolecules from organisms that live in the most diverse environments on earth whose DNA may produce commercially viable bioactive molecules and enzymes.

The nascent but rapidly developing sciences of bioinformatics (assembling and annotating sequences, identifying genes and metabolic pathways) and functional genomics and proteomics (i.e., methodologies for determining the functions of proteins encoded by unknown genes) will undoubtedly identify new microbial processes involved in bioremediation and, equally important, expand our knowledge of microbial diversity and ecology. Genome sequencing data will also lead to the development of novel technologies and methodologies, genomic approaches, and molecular monitoring tools for studying the structure and functions of complex microbial communities, including those associated with contaminated environments.

FIGURE 1. Analyzing nucleic acids extracted directly from environmental samples. Isolated DNA can be directly examined following reduction of fragment size (i.e., restriction enzyme digestion) and cloning to obtain functional data or sequence information, or the fragments can be labeled and hybridized to microarrays. Alternatively, the DNA and RNA in the nucleic acid extract are separated and processed further to look for specific genes or functions in individual organisms, or to evaluate organism function or relatedness at the microbial community level.
[Flowchart not reproduced. Recoverable box labels: environmental sample (soil, sediment, water); extract total community nucleic acids; fragment, clone, functional analysis, sequence analysis, microarray analysis; polymerase chain reaction (PCR) amplification of DNA; reverse transcriptase PCR amplification of RNA (production of cDNA); cloning and sequencing of rRNA genes, database analysis, phylogeny; analysis of gene expression in the total community; denaturing gradient gel electrophoresis or thermal gradient gel electrophoresis of the ribosomal DNA gene; restriction fragment length polymorphism; detection of specific catabolic/functional genes.]

Genomic-based tools

The study of nonculturable organisms has benefited enormously from recent advances in molecular genetics. Prokaryotic cells represent a large amount of the global biological diversity and may make up more than 50% of the planet’s protoplasmic biomass (1).

Most of these organisms are located in subsurface terrestrial and oceanic environments and, because of this inaccessibility, have received little attention. Although these microorganisms have survived on the planet for over 3.7 billion years and are found in every conceivable environment, most have not been successfully cultured in the laboratory. About 99% of the microorganisms present in complex environments like soil have yet to be cultivated. Molecular genetic analyses of total nucleic acids extracted from environmental samples provide our only window for studying these microbial populations. Figure 1 schematically represents many of the important molecular analyses typically performed on nucleic acids extracted from environmental samples.

The methods used to recover nucleic acids from environmental samples are often dictated by the sample’s source (2). Soils and sediments contain large quantities of insoluble/particulate material that must be separated from solubilized nucleic acids, which in turn must be separated from coextracted humic substances known to interfere with DNA analysis. Sludge and biofilm samples contain large quantities of biomaterial, such as cell wall polymers, that can interfere with efficient separation and recovery of nucleic acids. Direct extraction techniques, in which cells are disrupted before separation from particulate material, have been developed and are generally more effective than techniques that separate the cells from the particulates before disruption. Although direct extraction techniques yield DNA from most samples, these extracts require additional purification before they can be effectively used in hybridization or polymerase chain reaction (PCR) procedures.

Purified messenger RNA (mRNA), the chemical “messenger” or transcript that carries a synthesis blueprint from the DNA of the cell to its protein synthetic apparatus, is an excellent measure of cellular functions at any given time. Unfortunately, the technical challenges of isolating mRNA from natural environments are imposing. RNA is highly susceptible to the enzymatic degradation prevalent in “dirty” preparations like soil. Moreover, the asynchronous growth of bacterial cells in many natural environments, which are often characterized by starvation conditions, coupled with the rapid turnover and relatively short life cycles of bacterial transcripts, makes these transcripts difficult to isolate in appreciable quantities. Studying mRNA production under typical soil conditions will likely remain a challenge for the future.
An extremely important development in molecular biology is PCR, an in vitro process that amplifies a specific fragment (template) of DNA, using primers (two short oligonucleotides that bind to opposite strands of the template DNA and act as initiators of the amplification reaction) designed from the desired target sequence to produce large quantities of the specific DNA fragment. The process can be used to detect specific genes and microorganisms or to produce large amounts of DNA to be used as probes. Because of the difficulties of working with RNA, it is generally converted into complementary DNA (cDNA) using reverse transcriptase-PCR (RT-PCR). This very powerful technique provides an opportunity to study unculturable bacteria and is an integral element of many of the techniques described below. Derivations of the PCR technique have been used to compare or classify closely related bacterial isolates (e.g., random amplified polymorphic DNA-PCR) and to perform in situ detection or analysis of single isolates and total communities (primed in situ amplification).

The DNA sequences that code for 16S rRNA (the small ribosomal subunit RNA) have a special value in bacterial taxonomy, as essential cellular functions, such as protein synthesis, have evolved slowly. Thousands of 16S rDNA sequences have been determined, providing a taxonomical basis that is coherent, informative, and easily applied to novel or nonculturable species. Universal primers capable of amplifying rDNA from bacteria, archaea, or eukarya have been designed and successfully used for several environmental studies. PCR using conserved regions of the rRNA genes permits amplification of many diverse fragments, originating from different organisms in environmental nucleic acid extracts. These fragments, although of essentially identical length, differ substantially at the nucleotide sequence level, and this property is used to subsequently separate them using denaturing- or thermal-gradient gel electrophoresis. The resolved fragment patterns can be analyzed using image analysis software, and the DNA can be gel-extracted for subsequent nucleotide sequence determination. Comparisons of the fragment patterns are used to assess changes in microbial populations and community composition with time or with different treatments, and the sequence of extracted fragments is used to identify members of the community.

Although the greatest impact of using rRNA gene sequences has been a reclassification of all living organisms into three domains above the kingdom level (3), their greatest utility in environmental research has been in identifying closely related novel organisms (4) and in designing specific fluorescent-labeled oligonucleotide probes (fluorescent in situ hybridization, or FISH) to monitor individual organisms in complex natural environments. Although PCR amplification bias (factors that affect the relative amplification frequencies of different genes) can occur, these methods remain useful for a range of environmental applications, independent of culture-based approaches, and have contributed significantly to increasing our knowledge of microbial diversity, particularly in the identification of viable but nonculturable microorganisms (5). Recently, the coupling of FISH with flow cytometry has provided microbial ecologists with a powerful approach for exploring microbial community structures such as those found in complex aquatic environments (6).
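The comparison of gel fragment patterns between samples can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the band positions, the lane names, and the choice of the Jaccard index as the similarity measure are all hypothetical and are not taken from the article, which does not specify a metric.

```python
def jaccard_similarity(bands_a, bands_b):
    """Jaccard index between two sets of band positions (shared / total distinct)."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0  # two empty lanes are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical band positions (arbitrary gel units) for three lanes.
lanes = {
    "control_day0": {12, 18, 25, 31, 40},
    "control_day30": {12, 18, 25, 31, 44},
    "treated_day30": {12, 25, 52, 60},
}

# Compare each later sample against the starting community.
similarity_control = jaccard_similarity(lanes["control_day0"], lanes["control_day30"])
similarity_treated = jaccard_similarity(lanes["control_day0"], lanes["treated_day30"])

print(f"control vs. day 0: {similarity_control:.2f}")
print(f"treated vs. day 0: {similarity_treated:.2f}")
```

A lower similarity for the treated lane would suggest a larger shift in community composition, although in practice band intensity and co-migrating fragments would also need to be considered.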
Catabolic gene probes have also been used to monitor contaminated environments. These probes are designed from specific genes involved in key enzymatic steps in the microbial degradation pathways for environmental pollutants, such as petroleum hydrocarbons, chlorinated or nitrated organic compounds, and various pesticides (2,4-dichlorophenoxyacetic acid, atrazine). This approach can be used to examine both pristine and contaminated environments to determine the presence of organisms having specific functional capabilities. Known genes are used in the design of probes or PCR primers to screen for genes of similar function from environmental samples or isolates. As more nucleotide sequence information becomes available, additional genes from catabolic pathways involved in organic pollutant degradation or heavy metal reduction will be discovered.
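Screening a sequence with a primer pair can be sketched in silico. The example below is a deliberately simplified model, assuming exact primer matches only; the sequences are invented for illustration and do not correspond to any real catabolic gene, and the function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template, forward_primer, reverse_primer):
    """Predict the amplicon on the given strand, or None if a primer does not bind.

    Simplification: primers must match exactly. The reverse primer binds the
    opposite strand, so its reverse complement is searched on this strand.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Invented sequences: flank + forward site + insert + reverse site + flank.
target = "TTGACC" + "ATGGCAAG" + "TCCGTTAG" + "GCTAACGT" + "TCGA"
non_target = "TTGACCGGGGTTTTCCCCAAAA"

forward = "ATGGCAAG"
reverse = reverse_complement("GCTAACGT")  # primer written 5'->3' on the other strand

amplicon = in_silico_pcr(target, forward, reverse)
no_product = in_silico_pcr(non_target, forward, reverse)

print(amplicon)
print(no_product)
```

Real primers tolerate some mismatches and amplify from either strand, so a practical screen would need a mismatch-aware search; this sketch only shows the core idea that a product appears when both primer sites are present in the right orientation.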

Environmental utility of microarrays

DNA microarrays are an ordered arrangement of multiple DNA probes (>10,000 spots, each ∼100 µm in diameter) printed onto a solid surface. Typically, these surfaces are treated glass slides similar to the common (25 mm × 75 mm) microscope slide. The two primary types of DNA used in microarrays are oligonucleotides and PCR fragments (amplicons). The great advantage of oligonucleotide microarrays is that they can be designed directly from GenBank sequence data without handling the actual organism from which the sequence was derived. In contrast, the length of a PCR amplicon allows for variations in the stringency of hybridization, which is useful when dealing with sequence variation within a species. A PCR amplicon-based microarray can give a reliable signal with sequences differing by 20% or more from the immobilized probe, whereas an oligonucleotide of average length (25–30 bases) may show significant signal degradation even with a single base mismatch. The immobilization procedures for PCR amplicons are also simpler and more robust than those for oligonucleotides. Balanced against these advantages are the added cost of synthesizing two PCR primers for each amplicon (unless one is fortunate enough to be using universal primers, for instance, in conserved regions of 16S rDNA) and the added difficulties and expense of obtaining the relevant organisms, preparing the genomic DNA templates, and subsequently preparing the PCR amplicons.

Microarrays combine analysis of vast numbers of genes with parallel data acquisition (7). Microbiology, whether classical or molecular, has generally relied on the isolation of pure cultures followed by identification and characterization of each isolate. In contrast, the microarray approach analyzes, in a single experiment, total extracted DNA or RNA on an array that detects all the species that may reasonably be expected in the sample. For instance, a wastewater sample could be analyzed simultaneously for the presence of poliovirus and Escherichia coli O157:H7, a virus and a bacterium, each of which requires distinctly different culture protocols. This parallel processing power allows experimental designs that would be much less costly and time-consuming than conventional methods.
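The contrast between mismatch-sensitive oligonucleotide probes and mismatch-tolerant amplicon probes can be made concrete with a crude identity calculation. This is a minimal sketch, not a hybridization model: the sequences are invented, and the identity thresholds (100% for the oligonucleotide, 80% for the amplicon) are assumptions chosen to mirror the figures quoted above. Real signal depends on thermodynamics, mismatch position, and wash stringency.

```python
def percent_identity(probe, target):
    """Percentage of positions at which two equal-length sequences match."""
    if len(probe) != len(target):
        raise ValueError("sequences must be the same length")
    matches = sum(p == t for p, t in zip(probe, target))
    return 100.0 * matches / len(probe)


def predicted_signal(probe, target, min_identity):
    """Crude yes/no call: signal expected if identity meets the threshold."""
    return percent_identity(probe, target) >= min_identity


# Invented 25-base oligonucleotide probe and a target with one mismatch.
oligo = "ACGTACGTACGTACGTACGTACGTA"
oligo_target = oligo[:12] + "G" + oligo[13:]  # position 12 changed from A to G

# Invented 200-base amplicon probe and a variant differing at every fifth base.
amplicon = "ACGT" * 50
variant = "".join(
    ("A" if base == "T" else "T") if i % 5 == 0 else base
    for i, base in enumerate(amplicon)
)

oligo_identity = percent_identity(oligo, oligo_target)
amplicon_identity = percent_identity(amplicon, variant)

# Assumed thresholds: oligo needs a perfect match, amplicon tolerates 20% divergence.
oligo_call = predicted_signal(oligo, oligo_target, min_identity=100.0)
amplicon_call = predicted_signal(amplicon, variant, min_identity=80.0)

print(oligo_identity, oligo_call)
print(amplicon_identity, amplicon_call)
```

Under these assumptions the single-mismatch oligonucleotide target is rejected even at 96% identity, while the amplicon variant at 80% identity is still called positive, which is the trade-off between specificity and tolerance of within-species variation described above.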
Microarrays have two major areas of application in environmental genomics (see the sidebar “Assessing environmental samples using microarray technology”): the simultaneous evaluation of differential gene expression within a bacterial strain or bacterial community, and the simultaneous detection of a large number of microbial genes. The first approach requires extraction of mRNA followed by conversion into cDNA. The answers obtained in this instance provide gene expression results for selected target genes, such as housekeeping, catabolic, or facultatively expressed genes. The second approach relies on the extraction of microbial DNA. Depending on the genes used and the prior use of PCR, the answers obtained may provide information about the presence or absence of organisms or virulence genes, or could focus instead on the relative abundance of species, genera, or microbial domains in a given sample.

The most common approach to detecting and measuring hybridization between the sample and the relevant probes on the microarray is to label the sample DNA with a covalently attached fluorescent molecule. Fluorescent enzymatic labeling of DNA extracted from environmental samples presents difficulties because of possible copurification of polymerase inhibitors such as humic acid, porphyrins, and colloidal iron. Alternative labeling approaches, using direct chemical labeling, are therefore favored in many applications. DNA hybridization between the labeled sample and the microarray is measured as a fluorescence signal in a commercial microarray scanner. A variety of fluorescent probes are available, but most of the published work uses the fluorescent cyanine dyes Cy3 and Cy5. The fluorescent labeling approach works well within a research environment, but it is not well suited to field use or routine applications. More robust detection strategies based on magnetic detection, interferometry, and electrochemistry are being developed to address this problem.

In general, microarrays do not offer the low detection levels available by PCR and may not be suited for detection of low-abundance organisms in biologically complex samples. A partial answer to this problem lies in the coupling of PCR to microarrays; this may prove quite advantageous, especially when universal primers in conserved sequences such as 16S rRNA genes, DNA gyrases, or cpn60 (8) genes can be used. One could then identify hundreds of diagnostic sequences in parallel while performing only one amplification test and maintaining the low detection limits typical of PCR. Although the known problems of PCR bias (9) will also manifest themselves in this approach, the possibility of independent confirmation on the same array with different conserved genes and primers provides an experimental safeguard.
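When two samples are labeled with different dyes such as Cy3 and Cy5 and hybridized to the same array, each spot is commonly summarized as a log2 ratio of the two fluorescence intensities. The sketch below illustrates that calculation under stated assumptions: the dye assignment (Cy3 for the reference sample, Cy5 for the altered sample), the background value, the twofold cutoff, the intensities, and all spot names except alkB are hypothetical.

```python
import math


def classify_spot(cy5, cy3, background=0.0, fold_threshold=2.0):
    """Return (log2 ratio, call) for one spot.

    Assumes Cy5 labels the altered sample and Cy3 the reference sample.
    Background-corrected signals are floored at 1.0 to avoid log of zero.
    """
    signal_cy5 = max(cy5 - background, 1.0)
    signal_cy3 = max(cy3 - background, 1.0)
    log_ratio = math.log2(signal_cy5 / signal_cy3)
    cutoff = math.log2(fold_threshold)
    if log_ratio >= cutoff:
        call = "up"
    elif log_ratio <= -cutoff:
        call = "down"
    else:
        call = "unchanged"
    return log_ratio, call


# Hypothetical scanner intensities as (Cy5, Cy3) pairs for three spots.
spots = {
    "alkB": (8000.0, 1000.0),
    "housekeeping_gene": (1500.0, 1400.0),
    "unknown_orf": (500.0, 2000.0),
}

calls = {name: classify_spot(cy5, cy3) for name, (cy5, cy3) in spots.items()}

for name, (log_ratio, call) in calls.items():
    print(f"{name}: log2 ratio {log_ratio:+.2f} -> {call}")
```

A fixed fold-change cutoff is only one possible convention; replicate spots and normalization between the two dye channels would normally be needed before such calls are trusted.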

Sidebar: Assessing environmental samples using microarray technology

There are two basic applications involving DNA microarrays in environmental genomics. Initially, total community nucleic acids are extracted, and the DNA and RNA are separated. As shown in the sidebar figure, the level of gene expression in an organism (differential gene expression) is assessed by comparing mRNA levels (A, normal) with a change in a single external parameter (B, altered), such as exposure to an environmental (temperature) or chemical (substrate) alteration (left-hand side of figure). This type of application, understanding how a whole organism reacts at the gene level, currently plays a major role in both toxico- and pharmacogenomics.

Alternatively, one could substitute total community RNA to look at differential gene expression at the community level. As illustrated in the figure, the two sets of RNA (A and B) are labeled with different fluorescent dyes during or after conversion to cDNA. The mixed cDNA is hybridized to the chip, then washed and scanned at two different wavelengths to separately detect the individual fluorescent dyes. The scans are superimposed, and through the use of specialized imaging software, each spot is assessed for different fluorescent intensities of the two labels. The spots are then color-coded to indicate alterations (up- or down-regulated) in the relative levels of the mRNA between the two samples. The cDNA can also be analyzed for the expression of specific target genes (metabolic/functional analysis).

A second area of application is the detection of known genes or groups of similar gene sequences (right side of figure). This approach is primarily useful for biomonitoring studies involving the detection of specific microorganisms (e.g., through 16S rDNA) or specific genes (e.g., key degradative enzymes) in total extracted community DNA.

[Sidebar figure: DNA microarray applications. Flowchart not reproduced. Recoverable box labels: environmental sample (soil, sediment, water); extract total nucleic acids; separation and purification of DNA and RNA; RNA (A, B); DNA; fragment DNA; analysis; convert RNA to cDNA; label nucleic acid(s) with fluorescent dye(s); hybridization; differential gene expression; metabolic/functional analysis; detection of target gene/organism.]

Example applications

A variety of bioprocesses are used to treat environmental pollution, including landfarming, biopiles, subsurface permeable barriers for remediating contaminated aquifers, and activated sludge or anaerobic granule-containing bioreactors for industrial wastewater treatment. These processes are often subject to perturbations that can jeopardize the treatment efficiency of the system, or even result in total system failure. The use of microarray-based monitoring systems that target key organisms and their functions could prove to be a versatile strategy for rapidly detecting system imbalances and provide an opportunity to take the necessary corrective action to restore system integrity.

Microarray technology has great potential for assessing changes in the functionality of microbial communities, that is, enrichment or depletion of specific functional groups that are linked to the presence of particular stressors. For example, several genes code for the initial step in alkane degradation (alkB2 from Rhodococcus spp. Q15 and 16531, alkB from Pseudomonas oleovorans, alkM from Acinetobacter calcoaceticus), and changes in their abundance and expression indicate both a response to contamination and the potential for natural remediation of contamination. Comparisons of the changes in functional gene presence or absence, and in relative abundance, may be made, provided that valid “uncontaminated reference” sites are available.

Microarrays may also be used to assess the response of a microbial community to the challenge of a specific stressor or combinations of stressors. It is possible to examine factors such as the total number or range of functional genes, the richness within specific functional genes (e.g., diversity of alkane degradation genes), the changes in the proportions of functional genes, and the presence or absence of functional genes. From a regulatory perspective, the development of gene microarrays that target a broad range of functional genes provides a potentially rapid and efficient method of using molecular information to assess changes in biodiversity and community function. This in turn provides a basis for the determination of environmental effects.

As previously mentioned, there are major hurdles to using microarrays for analyzing mRNA in environmental samples. Nevertheless, they may be used to assess gene expression in a community and responses to external perturbations at the molecular level. Microarrays can routinely process numerous samples, so the problems of sampling intensity and required replication due to the heterogeneity in natural systems may be addressed. The potential exists to produce less equivocal field data, providing much stronger inference regarding causality and acceptable limits for stressors in terms of community response. Assuming that the challenges of analyzing mRNA can be overcome, traditional toxicology, as well as experimental and field survey studies, separately and in concert, will still be needed to understand the significance of a specific gene array profile and how it reflects community function and impacts at various trophic levels.

A natural consequence of the availability of high-density human, mouse, and rat oligonucleotide and cDNA microarrays used in drug testing and research has been their direct application in toxicogenomics, the use of bioinformatics and genomics to assess mechanisms of action of toxicants. For example, the same array used to detect undesired oestrogenic side effects in a candidate drug can be used to detect the effects of these substances in environmental samples (10). The newly created science of toxicogenomics offers the promise of detecting and avoiding nonlethal, subtle, long-term side effects that do not show up in short-term studies on laboratory animals. Research at the National Institute of Environmental Health Sciences in Research Triangle Park, NC, is exploiting this process, specifically addressing the effects of toxic compounds on humans. Researchers at the Oak Ridge National Laboratory (Oak Ridge, TN) and the U.S. Army Engineer Research and Development Center (Vicksburg, MS) are developing “genosensors,” oligonucleotide arrays designed to detect changes in the expression of stress-response genes. These are being used to monitor pollutant impacts in different environments.

The ability to rapidly and unequivocally detect potential human pathogens in water is of immense importance to helping ensure the integrity of potable water treatment and distribution systems and to identifying harmful inputs. The level of protection provided by current methods relying on indicator organisms, such as coliform bacteria, would be strengthened by the actual detection of a broader variety of potential human pathogens, typified by Cryptosporidium and Giardia, parasitic organisms often associated with farm runoff. A microarray for detecting multiple pathogens in water is now being commercially developed.
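The community-level factors discussed in this section (the number of functional genes detected, the proportions of those genes, and their presence or absence relative to a reference site) can be summarized from array signals with a few lines of code. The sketch below is an illustration under stated assumptions: the signal values are invented, "geneX" is a placeholder, and the Shannon index is added as one common diversity summary that the article itself does not name.

```python
import math


def profile_summary(signals):
    """Summarize a functional-gene profile from per-gene array signals."""
    present = {gene: s for gene, s in signals.items() if s > 0}
    total = sum(present.values())
    proportions = {gene: s / total for gene, s in present.items()} if total else {}
    shannon = -sum(p * math.log(p) for p in proportions.values())
    return {
        "richness": len(present),    # number of genes detected
        "proportions": proportions,  # relative share of total signal
        "shannon": shannon,          # diversity of the detected genes
    }


# Invented background-corrected signals for a reference and a contaminated site.
reference_site = {"alkB": 10.0, "alkB2": 10.0, "alkM": 10.0, "geneX": 10.0}
contaminated_site = {"alkB": 60.0, "alkB2": 20.0, "alkM": 20.0, "geneX": 0.0}

reference = profile_summary(reference_site)
contaminated = profile_summary(contaminated_site)

# Genes detected at the reference site but absent from the contaminated site.
lost_genes = set(reference["proportions"]) - set(contaminated["proportions"])

print("richness:", reference["richness"], "->", contaminated["richness"])
print("alkB share:", reference["proportions"]["alkB"], "->", contaminated["proportions"]["alkB"])
print("lost genes:", sorted(lost_genes))
```

Such summaries are only as meaningful as the reference site they are compared against, which is why a valid uncontaminated reference is a precondition for this kind of comparison.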

Postgenomic world

The study and understanding of diverse groups of microbes are leading to solutions to problems such as environmental cleanup; drug development and pathogen detection in medicine; agriculture; control and monitoring of industrial processes; energy production and its use; and detection of biological warfare agents. As we enter the postgenomic era, many of the molecular tools outlined in this article will allow us to move beyond the broader process-oriented studies of microbial communities to a finer level of analysis of biodiversity and physiological ecology at the species level. With the current pace of sequence data accumulation, it is exciting to note that the barrier imposed by process-oriented studies will soon be surpassed as the new sequence information being generated is combined with other technologies like flow cytometry (11). The ability to identify and monitor multiple parameters from individual cells (both culturable and nonculturable) within a population will improve our analysis of microbial ecology and bioprocesses. The potential uses of this information are as limitless as our imaginations.

References

(1) Whitman, W. B.; Coleman, D. C.; Wiebe, W. J. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 6578–6583.
(2) Miller, D. N.; Bryant, J. E.; Madsen, E. L.; Ghiorse, W. C. Appl. Environ. Microbiol. 1999, 65, 4715–4724.
(3) Woese, C. R.; Kandler, O.; Wheelis, M. L. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 4576–4579.
(4) Hugenholtz, P.; Pace, N. R. Trends Biotechnol. 1996, 14, 190–197.
(5) Amann, R. I.; Ludwig, W.; Schleifer, K. H. Microbiol. Rev. 1995, 59, 143–169.
(6) Porter, J.; Pickup, R. W. J. Microbiol. Methods 2000, 42, 75–79.
(7) Graves, D. J. Trends Biotechnol. 1999, 17, 127–134.
(8) Goh, S. H.; Potter, S.; Wood, J.; Hemmingsen, S. M.; Reynolds, R. P.; Chow, A. W. J. Clin. Microbiol. 1996, 34, 818–823.
(9) Suzuki, M. T.; Giovannoni, S. J. Appl. Environ. Microbiol. 1996, 62, 625–630.
(10) Nuwaysir, E. F.; Bittner, M.; Trent, J.; Barrett, J. C.; Afshari, C. A. Mol. Carcinog. 1999, 24, 153–159.
(11) Collier, J. L.; Campbell, L. Hydrobiologia 1999, 401, 33–53.

Charles W. Greer, Lyle G. Whyte, Luke Masson, and Roland Brousseau are research officers in the Environmental Biotechnology Sector at the Biotechnology Research Institute of the National Research Council of Canada in Montreal. John R. Lawrence is a research scientist at the National Water Research Institute of Environment Canada in Saskatoon.