Multicolor Super-Resolution DNA Imaging for

Jun 14, 2012 - ABSTRACT: Many types of cancer and neurodegenerative diseases are caused by abnormalities and variations in the genome. We have designe...
npg

© 2012 Nature America, Inc. All rights reserved.

news and views

This process is performed automatically by a new commercial instrument that vastly improves data quality relative to previous similar techniques.

Lam et al.3 analyze 95 bacterial artificial chromosome (BAC) clones spanning the 4.7-Mb MHC region from two individuals. First, they use a nicking enzyme and a polymerase to incorporate fluorescently labeled nucleotides at specific sequence sites in the BAC DNA. Next, the labeled molecules are loaded onto a chip containing the nanochannel array. One optical scan of the loaded array collects images of 23,000 molecules, corresponding to 3 Gb of DNA sequence. The molecules range in size from 20–220 kb, with the majority 100–170 kb in length, as expected from the sizes of the BAC clones. These data are used to create maps of the locations of the labeled sites. Maps of individual BACs are tiled to cover the MHC region and show excellent agreement with the reference sequence. The mapping data are sufficient to detect errors in the reference sequence (verified by resequencing) as well as to resolve haplotypes and to identify the presence and location of structural variations (Fig. 1).

The authors assemble the optical maps into three long contigs spanning the 4.7-Mb MHC region. They then sequence the BAC DNA on an Illumina instrument, assemble de novo the resulting short reads into contigs and use the optical maps as scaffolds to assemble the much shorter sequencing contigs, which rarely span more than several hundred kilobases.

Despite these encouraging results for the BAC library, analysis of unamplified, native genomic DNA has yet to be demonstrated. Such experiments would allow direct comparison of long genomic regions between single chromosomes. Because each measured molecule originates from a single cell, the method of Lam et al.3 avoids the need for single-cell processing when comparing regions in the size range of several hundred kilobases extracted from cell populations.
Working with unamplified DNA would also mitigate PCR amplification bias and maintain native information that does not transfer to amplified material, such as chemical DNA modifications. The beauty of optical imaging analysis is the potential to distinguish between different types of information through color. For example, the use of two (or more) sets of sequence-specific patterns may greatly increase the resolution and reliability of genome mapping, reducing the coverage needed and increasing the frequency of haplotype detection. This may be improved even more by the use of super-resolution techniques during data acquisition and analysis7–9. One of the most exciting prospects for optical techniques capable of probing native, unamplified DNA is the possibility of mapping genetic information simultaneously with epigenetic information, such as DNA methylation patterns, or the positions and identity of DNA-binding proteins9. The integration of genome mapping approaches with single-cell manipulation and processing methods10 may lead to a truly comprehensive genetic-epigenetic view of single chromosomes.
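The idea of a map of labeled sites, described earlier in this commentary, can be illustrated with a minimal Python sketch that predicts where labels would fall on a known sequence and how far apart they would be. The recognition motif `GCTCTTC`, the toy sequence and the function names are assumptions made for illustration only; the commentary does not name the nicking enzyme, and real optical maps are measured from imaged molecules, not computed from sequence.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def label_sites(seq: str, motif: str) -> list[int]:
    """0-based start positions of the motif on either strand, sorted."""
    seq = seq.upper()
    sites = set()
    # A labeled site can arise from the motif on the forward or reverse strand.
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def inter_label_distances(sites: list[int]) -> list[int]:
    """Distances between consecutive labels: the pattern an optical map records."""
    return [right - left for left, right in zip(sites, sites[1:])]


MOTIF = "GCTCTTC"  # assumed recognition motif, for illustration only

# Toy sequence: motif, spacer, reverse-strand motif, spacer, motif.
toy = "AA" + MOTIF + "A" * 10 + reverse_complement(MOTIF) + "A" * 5 + MOTIF + "AA"

sites = label_sites(toy, MOTIF)
distances = inter_label_distances(sites)
print(sites, distances)
```

Comparing such predicted distance patterns with the distances measured on imaged molecules is, in spirit, how a map can be checked against a reference sequence; a real comparison would also have to tolerate measurement error and missing or extra labels.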

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Treangen, T.J. & Salzberg, S.L. Nat. Rev. Genet. 13, 36–46 (2012).
2. 1000 Genomes Project Consortium. Nature 467, 1061–1073 (2010).
3. Lam, E.T. et al. Nat. Biotechnol. 30, 771–776 (2012).
4. Zohar, H. & Muller, S.J. Nanoscale 3, 3027–3039 (2011).
5. Samad, A., Huff, E.F., Cai, W. & Schwartz, D.C. Genome Res. 5, 1–4 (1995).
6. Michalet, X. Science 277, 1518–1523 (1997).
7. Baday, M. et al. Nano Lett. doi:10.1021/nl302069q (2012).
8. Neely, R.K. et al. Chem. Sci. 1, 453–460 (2010).
9. Kim, S. et al. Angew. Chem. Int. Ed. 51, 3578–3581 (2012).
10. Fan, H.C., Wang, J., Potanina, A. & Quake, S.R. Nat. Biotechnol. 29, 51–57 (2011).

Transcriptome sequencing of single cells with Smart-Seq

Jillian J Goetz & Jeffrey M Trimarchi

A method for performing RNA-Seq on picograms of mRNA facilitates gene expression profiling of rare cells.

Jillian J. Goetz and Jeffrey M. Trimarchi are in the Department of Genetics, Development and Cell Biology, Neuroscience Program, Iowa State University, Ames, Iowa. e-mail: [email protected]

nature biotechnology volume 30 number 8 AUGUST 2012

A single eukaryotic cell contains a vanishingly small amount of RNA, around one thousandth of what’s needed for standard RNA sequencing protocols. In this issue, Ramsköld et al.1 introduce Smart-Seq, a robust and reproducible method for sequencing the transcriptomes of single cells that far outperforms existing methods in capturing sequences across the full length of mRNAs. As the authors demonstrate in studies of gene expression and alternative splicing in cancer cells, the new approach opens up a host of possible inquiries into RNA processing events in single cells.

The detailed drawings of the nineteenth-century anatomist Santiago Ramón y Cajal revealed that the central nervous system is composed of a myriad of diverse cell types, and it is thus fitting that the first single-cell global mRNA profiling experiment was carried out on a hippocampal neuron2. Many other cell types were also found to be more diverse than had been appreciated. To begin to explore this diversity, researchers have applied transcriptome profiling of single cells to the developing pancreas3, retina4 and olfactory system5, as well as to stem cells6 and cancer cells7. These studies all used similar single-cell sequencing protocols. Polyadenylated mRNA was selected with an oligo d(T) primer and reverse transcribed. The small amount of cDNA obtained was then amplified using either T7 RNA polymerase or a PCR protocol, and analyzed by qPCR or microarray hybridization (Fig. 1). These methods had several drawbacks, including bias toward previously known transcripts, a possibility of cross-hybridization with related genes and a lack of information beyond the 3′ ends of transcripts.

High-throughput RNA-Seq has begun to supplant microarrays for quantitative transcriptome analysis8. To adapt RNA-Seq for use on single cells, researchers have amplified cDNA by existing protocols9,10 (Fig. 1), allowing transcriptome analysis of a single mouse blastomere isolated from a four-cell embryo10. But this work did not take advantage of the ability of RNA-Seq to examine transcripts along their entire length and not just at the ends. In fact, only ~10% of the transcripts reached 3 kb in length. With Smart-Seq, Ramsköld et al.1 improve both the average transcript size and the number of full-length transcripts. Beginning with only 10 pg of RNA (the estimated amount present in

a single cell), they achieve nearly 40% coverage through the 5′ end of transcripts.

[Figure 1: schematic of three single-cell workflows (Early methods; Tang et al. (2009); Smart-Seq). Each starts from a single cell and proceeds through cDNA amplification to microarray or qPCR (early methods) or to library prep and sequencing (Tang et al. and Smart-Seq), followed by assessments of coverage and reproducibility. A graph plots read coverage (%) along the transcript from the 3′ to the 5′ end.]

Figure 1 Methods for single-cell transcriptomics. The first single-cell transcriptomes were generated using an oligo d(T) primer to initiate reverse transcription and subsequent amplification. cDNA libraries from these single cells were hybridized to microarrays. These methods were biased toward assaying the 3′ ends of known sequences. Tang et al.10 modified these protocols for use with RNA-Seq technology, overcoming the sequence bias of microarrays, but not the 3′ skewing inherent to the protocols (green curves in graph). Smart-Seq uses Moloney murine leukemia virus reverse transcriptase to incorporate an oligonucleotide having a PCR-primer sequence at the 5′ end of transcripts (purple). Only these cDNAs are amplified and sequenced, facilitating the analysis of full-length transcripts (blue curve in graph).

The protocol of Ramsköld et al.1 generates and amplifies full-length cDNA from single cells using a reverse transcriptase enzyme from the Moloney murine leukemia virus. This enzyme possesses two features that are critical to the success of Smart-Seq: template switching and terminal transferase activity. When the enzyme arrives at the end of an mRNA it is reverse transcribing, the terminal transferase activity adds non-templated cytosine residues to the cDNA. If the reaction mixture includes an oligo containing guanine residues that base pair with the C’s, the reverse transcriptase can switch templates and transcribe to the end of the oligonucleotide. This method greatly enriches for transcripts with an intact 5′ end and eliminates the need for second strand synthesis (Fig. 1). Template switching is not new and was previously called switch mechanism at the 5′ end of RNA templates (SMART). In Ramsköld et al.’s version of the approach1, individual cells are lysed in a hypotonic solution containing a high concentration of RNase inhibitors, leading to improved stabilization of the mRNA. The end result is greatly expanded coverage along the entire length of mRNA transcripts in experiments that start from as little as 10 pg of mRNA. A direct comparison between data generated using Smart-Seq and previously published10 RNA-Seq data from single mouse oocytes shows a substantial increase in the number of alternatively spliced exons detected by Smart-Seq. Thus, researchers can now discover a greater number of transcripts present in single cells.

Ramsköld et al.1 also apply Smart-Seq to study rare cells that may give rise to cancer. Deciphering the cell of origin for different cancers has been a holy grail for many years, but the small numbers of cells present at the stage of tumor initiation are difficult to isolate and analyze. The authors capture potentially cancerous circulating cells from the blood of a melanoma patient using antibodies conjugated to magnetic beads and a MagSweeper instrument. The gene expression profiles of the cells are highly correlated with those of melanoma cell lines, strongly indicating that the circulating cells originated from a melanoma tumor. Intriguingly, the authors observe heterogeneity in the gene expression profiles. This result is consistent with previous studies that have found extensive differences in gene expression between individual cells3–7. With Smart-Seq, it will be possible to uncover even more differences between cells by examining differentially spliced mRNAs and alternative transcripts.

The challenge for all single-cell transcriptomic studies is to determine which gene expression differences are biologically relevant and which are the result of technical or biological noise. Many researchers have been deterred from gene expression profiling experiments on single cells owing to questions regarding reproducibility and sensitivity. To address these issues, Ramsköld et al.1 perform Smart-Seq on reference RNA samples diluted down to picogram levels. Comparisons of the relative expression between Smart-Seq and standard RNA-Seq yield reproducible results (Spearman correlations of 0.87 and 0.77) using 1 ng or 0.1 ng of starting material, which is 100- or 1,000-fold, respectively, less input material than in standard RNA-Seq protocols. Using 10 pg of starting RNA (the estimated amount of RNA in a small eukaryotic cell), the method still performs quite well.
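For readers unfamiliar with the Spearman correlation quoted above, the following Python sketch shows how such a rank correlation between two expression profiles could be computed. The expression values are invented for illustration and are not data from Ramsköld et al.; the function names are likewise hypothetical, and in practice a statistics library would normally be used.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression values for five genes (arbitrary units).
standard_input = [120.0, 35.0, 8.0, 2.0, 0.5]
low_input = [100.0, 40.0, 0.0, 3.0, 0.0]  # two low-abundance genes not detected

rho = spearman(standard_input, low_input)
print(f"Spearman correlation: {rho:.2f}")
```

Because the measure depends only on rank order, it is insensitive to overall differences in scale between the two protocols, but undetected low-abundance genes (the zeros in the toy low-input profile) still pull the correlation below 1.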
However, at 10 pg of starting RNA, a limitation of the Smart-Seq technique becomes apparent: some transcripts, especially those expressed at lower levels, show stochastic loss. Transcripts expressed at lower levels might code for very important proteins that are dose-sensitive and, therefore, kept at lower levels. Future improvements should focus on methods to more consistently recover low-abundance transcripts from single cells.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

1. Ramsköld, D. et al. Nat. Biotechnol. 30, 777–782 (2012).
2. Eberwine, J. et al. Proc. Natl. Acad. Sci. USA 89, 3010–3014 (1992).
3. Chiang, M.K. & Melton, D.A. Dev. Cell 4, 383–393 (2003).
4. Trimarchi, J.M., Stadler, M.B. & Cepko, C.L. PLoS One 3, e1588 (2008).
5. Tietjen, I. et al. Neuron 38, 161–175 (2003).
6. Ramos, C.A. et al. PLoS Genet. 2, e159 (2006).
7. Dalerba, P. et al. Nat. Biotechnol. 29, 1120–1127 (2011).
8. Wang, Z., Gerstein, M. & Snyder, M. Nat. Rev. Genet. 10, 57–63 (2009).
9. Islam, S. et al. Genome Res. 21, 1160–1167 (2011).
10. Tang, F. et al. Nat. Methods 6, 377–382 (2009).

The tomato genome fleshed out

Todd P Michael & Rob Alba



Sequencing of the tomato genome reveals key events in the evolution of fruit size, texture, flavor and nutritional quality.

Todd P. Michael and Rob Alba are in The Genome Analysis Center, Monsanto Company, St. Louis, Missouri, USA. e-mail: [email protected]

Flowering plants have a long history of DNA acquisition, and increasing evidence points to whole genome duplication resulting in polyploidy as a recurring mechanism in the evolution of traits such as seed, fruit and flower development1. This mechanism has now been identified in the tomato lineage, as described in a letter to Nature from the Tomato Genome Consortium2. The paper reports genome sequences of cultivated tomato (Solanum lycopersicum cv. Heinz 1706) and its closest wild relative (Solanum pimpinellifolium) and reveals two paleo-hexaploidy (triplication) events and introgressions from wild relatives that underpin various agriculturally important traits.

The availability of the tomato sequence is beneficial to tomato breeders as it will enable the discovery of genes associated with yield, fruit flavor, nutritional value, shelf life, disease resistance and tolerance to abiotic stresses. It will also facilitate the identification of tightly linked molecular markers for virtually any tomato gene and provide a universal reference sequence for mapping most nucleotide sequences from any tomato cultivar and for integrating different genetic maps. The worldwide economic value of tomato crops is in the billions of dollars, and the fruits are an important source of nutrients such as lycopene, flavonoids, vitamin C, minerals and dietary fiber.

Tomato is one of 29 plants whose genome sequence has been published, including other crop species (apple, banana, cabbage, castor bean, cocoa, cucumber, date palm, grape, hemp, jatropha, maize, melon, millet, papaya, pigeon pea, poplar, potato, sorghum, soybean, strawberry and rice), model species (Arabidopsis thaliana, Arabidopsis lyrata, Brachypodium distachyon, Lotus japonicus, Medicago truncatula, and Thellungiella parvula) and nonmodel species (moss and spikemoss). The tomato genome (900 Mb) is similar in size to that of its close relative, potato (844 Mb), 2.5-fold smaller than the maize genome (2,300 Mb) and 3.5-fold smaller than the human genome (3,200 Mb).

The tomato genome project was initiated in 2003 and involved 300 scientists from 14 countries. The team produced a high-quality sequence of S. lycopersicum and a draft sequence of S. pimpinellifolium by integrating physical and genetic maps and leveraging high-throughput sequencing technologies, including strand-specific RNA-Seq. The S. lycopersicum genome was sequenced using a combination of long Sanger and 454/Roche GS FLX reads, together with high-coverage, shorter SOLiD and Illumina GAIIx reads. To minimize problems arising from repetitive nucleotide regions, the consortium ensured that most (66%) of the bacterial artificial chromosome (BAC) clones used for Sanger sequencing had ends with no sequence similarity to any of the tomato repetitive sequences that were previously captured in the tomato genome repeat data sets of the Sol Genomics Network. The remainder of the BAC clones chosen for Sanger sequencing included those with one unique sequence and one repetitive sequence at each of the two BAC ends. The consortium assembled 84% (760 Mb) of the genome into 91 scaffolds anchored to the 12 chromosomes and predicted 34,727 genes, 727 of which may be specific to plants with fleshy fruit.

Decoding the tomato genome sequence revealed that floral architecture and fruit texture, size and nutritional quality are the result of gene retention after two sequential paleo-hexaploidy events followed by recombination, natural selection and domestication. For example, after two rounds of triplication, the genes for fruit ripening, fruit quality and lycopene biosynthesis were retained by natural selection. In addition, some genes have been lost from tomato compared with its close relatives in the deadly nightshade family (Belladonna; Atropa belladonna). Genome analysis showed that some cytochrome P450 genes associated with the biosynthesis of toxic alkaloids have been lost, whereas other cytochrome P450 genes associated with alkaloid biosynthesis are not expressed in ripe fruit from modern cultivars.

Comparison of the cultivated and wild tomato genomes identified recent admixture (introgression from S. pimpinellifolium) in the Heinz 1706 genome; two large introgressions on chromosomes 9 and 11 have been implicated in disease resistance and several others have been associated with small fruit size (cherry tomatoes)2. Domesticated genomes of maize3, soybean4 and grape5 also show evidence of admixture with wild relatives, so similar observations in the tomato genome lend weight to the hypothesis that gene flow from wild relatives is an important source of novel or improved traits in cultivated varieties. For instance, 43 Mb of recently introgressed wild material was identified in selected cultivated soybean accessions, and some of these regions overlap with quantitative trait loci for plant height and internode length4. High-density marker platforms built using the tomato reference genome will facilitate linking of wild traits to genes so that breeders can harness the benefits of wild traits without introducing less desirable linked genes (Fig. 1).

Unlike the co-linear genomes of mammals, plant genomes have undergone rapid and extensive rearrangements after genome fusion events1,6. Data from the tomato genome are consistent with this picture of genome fusions, rearrangements and shuffling in the flowering plant lineage.
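The genome-size comparisons and the assembly fraction quoted earlier in this article follow from simple ratios, sketched below in Python. The numbers are the ones given in the text; the variable and function names are illustrative only.

```python
# Genome sizes in megabases, as quoted in the article.
GENOME_MB = {"tomato": 900, "potato": 844, "maize": 2300, "human": 3200}

# Portion of the tomato genome assembled into anchored scaffolds (Mb).
ASSEMBLED_MB = 760


def fold_difference(larger: str, smaller: str) -> float:
    """How many times larger one genome is than another."""
    return GENOME_MB[larger] / GENOME_MB[smaller]


maize_vs_tomato = fold_difference("maize", "tomato")
human_vs_tomato = fold_difference("human", "tomato")
assembled_fraction = ASSEMBLED_MB / GENOME_MB["tomato"]

print(f"maize / tomato: {maize_vs_tomato:.2f}-fold")
print(f"human / tomato: {human_vs_tomato:.2f}-fold")
print(f"assembled: {assembled_fraction:.0%} of the genome")
```

The computed ratios come out slightly above the rounded 2.5-fold and 3.5-fold figures given in the text, and the assembled fraction matches the quoted 84%.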
In addition to a whole genome duplication in an ancestral seed plant (~320 Myr ago), a whole genome duplication in an ancestral angiosperm (~200 Myr ago) and the paleo-hexaploid event that resulted in the core eudicots1,6, the Tomato Genome Consortium2 was able to show that another triplication occurred in the tomato and potato genomes, followed by reduction, diploidization, recombination, and selection for genes that are essential for fruit ripening and quality in the cultivated tomato. It is interesting that the Potato Genome Sequencing Consortium7 did not identify this additional triplication event, perhaps because the potato genome scaffold contiguity is much shorter than that achieved by the Tomato Genome Consortium2 (1 Mb and 16 Mb N50, respectively). DNA sequence divergence between the wild and cultivated tomato genomes is very