Perspective Cite This: ACS Infect. Dis. XXXX, XXX, XXX−XXX
pubs.acs.org/journal/aidcbc
Single-Cell RNA Sequencing to Understand Host−Pathogen Interactions
Cristina Penaranda and Deborah T. Hung*
Infectious Disease and Microbiome Program, Broad Institute of Harvard and MIT, 415 Main Street, Cambridge, Massachusetts 02142, United States
Department of Molecular Biology and Center for Computational and Integrative Biology, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02114, United States
Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115, United States
ABSTRACT: Host−pathogen interactions, particularly in the context of bacterial infections, are dynamic exchanges where transcriptional heterogeneity from both the host and the pathogen can lead to many diverse outcomes via distinct molecular pathways. Transcriptional profiling at the single-cell level, on a genome-wide scale, has enabled a greater appreciation of the cellular diversity in complex biological organisms and the myriad of host transcriptional states during infection. Here, we highlight recent reports of single-cell RNA sequencing within the context of host−pathogen interactions, describe current limitations for detecting and profiling the transcriptome of invading pathogens at the single-cell level, and suggest exciting future prospects for this technology in the study of infection. We propose that understanding infection as an integrated process between pathogen and host with resolution at the single-cell level will ultimately inform development of vaccines with greater productive and protective host immunity, enable the development of novel therapeutics that harness host mechanisms, and yield more accurate biomarkers to guide better diagnostics.
KEYWORDS: sc-RNAseq, dual-RNAseq, transcriptional profiling, infectious disease, bacteria, virus
© XXXX American Chemical Society
There has been a recent explosion in studies that examine individual cellular behavior with unprecedented resolution, revealing the heterogeneity heretofore buried within population behavior (Figure 1). In the past, populations of cells, either at rest or in response to stimuli, have been studied in bulk, averaging the behavior of all cells within the population. This limitation persisted despite the recognition that tissues contain distinct cell types, including rare ones, and that individual cells within a population can adopt different states. However, the recent development of methods to interrogate populations at the single-cell level, on a genome-wide scale, has dramatically altered our understanding of cellular heterogeneity by providing much greater resolution of different cell types and cell states. Single-cell approaches have now been used to characterize cellular transcriptional profiles,1−5 DNA methylation states,6 protein content,7,8 and chromatin accessibility.9,10 In this review, we focus on the most advanced and widely used of these technologies, single-cell RNAseq (sc-RNAseq), which characterizes individual gene expression programs, and discuss its application to studying host−pathogen interactions. This topic has also been recently reviewed by others with an emphasis on the microbiome11 and noncoding RNAs.12 sc-RNAseq involves single-cell isolation and capture,13 followed by cell lysis, cDNA synthesis from messenger RNA (mRNA) and amplification, library construction, sequencing, and analysis (Figure 2). Since the first studies, which were performed on a handful of cells in individual tubes or 96-well plates, engineering advances in droplet- and microwell-based methods such as Drop-seq,14 inDrop,15 and Seq-Well16 have enabled much higher-throughput assays.
The key technological innovation that has enabled sc-RNAseq is the ability to barcode transcripts from a single cell, thereby allowing them to be pooled and amplified together with those of hundreds to thousands of other cells and ultimately enabling transcript enumeration with standard sequencing technologies (e.g., Illumina).1−4 Many distinct protocols now exist for single-cell barcoding and library construction at both the microliter and nanoliter scales. Comparisons of these protocols have shown that each introduces different biases and sources of error, likely resulting from amplification bias or from inefficiencies in steps of library construction such as reverse transcription or template switching. However, all methods tested were quantitatively comparable to qRT-PCR, which is more sensitive and is considered the gold standard for validating gene expression.17 The use of unique molecular identifiers (UMIs), short random sequences in the barcodes that tag each individual transcript, has enabled more accurate enumeration of transcripts by correcting for the skewing that can occur with nonuniform PCR amplification.18 A number of commercial platforms,
Received: December 19, 2018
DOI: 10.1021/acsinfecdis.8b00369 ACS Infect. Dis. XXXX, XXX, XXX−XXX
ACS Infectious Diseases
Perspective
fields untouched, including, but not limited to, descriptions of the developmental landscapes of the lung27 and intestine,28 the composition of solid tumors,29 and the identification of rare cell types.30 These findings can have significant implications for understanding disease biology, as evidenced by the recent, comprehensive analysis of the composition of the proximal airway epithelium by Plasschaert et al.31 They described a novel cell type, the pulmonary ionocyte, which makes up 1−2% of epithelial cells and uniquely expresses high levels of Cftr, the gene that encodes a chloride channel and is mutated in patients with cystic fibrosis. Similarly, Cochain et al. described a previously unrecognized macrophage subpopulation in atherosclerosis that may have a role in lesion calcification.32 These data suggest that small subpopulations of cells, which until now had been obscured in bulk analyses, may have an enormous impact on disease states. This revolution has similarly reached the study of infection through efforts such as the Immunological Genome Project (http://www.immgen.org/), which aims to comprehensively define immune cell types by creating an expression profile database for all mouse immune cell types, which so far number more than 250. These collaborative efforts have been complemented by a large number of individual studies, both in animal models and in samples taken directly from humans, that have defined heterogeneity in immune cell behavior, discovered new, often rare immune cell types, and provided insight into immune cell development. For example, Jaitin et al. used sc-RNAseq of more than 4000 mouse spleen cells to show that in response to LPS stimulation, some genes are upregulated in all cells, while others are regulated in a cell-type-specific manner, demonstrating heterogeneity in response to stimuli at the level of both cell types and cell states.33 Meanwhile, Villani et al.
identified six different subpopulations of dendritic cells and four subtypes of monocytes in human blood, thereby redefining the taxonomy of these cells, which play key roles in pathogen sensing, recognition, and clearance.34 Lastly, Drissen et al. analyzed granulocyte-macrophage progenitors and identified two distinct myeloid differentiation pathways determined by expression of the transcription factor GATA-1: GATA-1 expression leads to differentiation into mast cells, eosinophils, and red blood cells, whereas its absence leads to differentiation into monocytes, neutrophils, and lymphocytes.35 This level of analysis of cell-type specificity and lineage development will be critical to understanding cellular function and dysfunction under pathological conditions such as infection.
Figure 1. Single-cell studies reveal heterogeneity obscured in bulk phenotypic measurements. Traditional bulk measurements generate an average of the phenotype over all cells (gray). Bulk measurements can thus obscure heterogeneity (red and blue) within cellular populations, masking bimodal phenotypes (upper), phenotypes in which cells adopt a broad range of states (middle), and rare cell types that are obscured by the larger, dominant population (bottom). The expression program of a cell is one such phenotype that can vary within a population, and sc-RNAseq can reveal this heterogeneity. sc-RNAseq can be performed in both in vitro studies and in vivo human studies.
including Fluidigm’s C1 system and the Chromium system from 10x Genomics, are now available that streamline automated single-cell lysis, RNA extraction, cDNA synthesis, and library construction. Analysis methods are then required to extract meaningful biological information from the large data sets, including removal of technical noise that arises from amplifying picograms of RNA from a single cell.19,20 Methods such as t-distributed stochastic neighbor embedding (t-SNE)21 can be applied to project high-dimensional gene expression data into two or three dimensions to facilitate visualization, interpretation, and analysis. Computational tools such as Seurat22 and SCDE23 have been developed that integrate the various steps of single-cell sequencing data analysis, including unsupervised learning to cluster cells into types or states and determination of differential gene expression. As data sets grow larger and analyses become more specific to a given data set, more specialized algorithms will need to be developed. All of these developments have enabled the widespread use of sc-RNAseq, leading to significant advances in almost all fields of biology, including that of host−pathogen interactions.
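As a minimal sketch of the barcode and UMI bookkeeping described above, the Python below collapses aligned reads into a cell-by-gene count matrix by counting distinct UMIs for each cell barcode and gene, so PCR duplicates of one transcript are counted once. It then applies one common normalization, scaling each cell to 10,000 total counts and taking log1p, similar to the default log-normalization in Seurat. The read tuples, barcodes, and gene names are hypothetical, and real pipelines also correct sequencing errors in barcodes and UMIs, which this sketch omits.

```python
import math
from collections import defaultdict

# Each aligned read is (cell_barcode, umi, gene). All values used below are hypothetical.
Read = tuple[str, str, str]


def umi_count_matrix(reads: list[Read]) -> dict[str, dict[str, int]]:
    """Count distinct UMIs per (cell, gene); PCR duplicates share a UMI and collapse to one."""
    seen: dict[tuple[str, str], set[str]] = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)
    matrix: dict[str, dict[str, int]] = defaultdict(dict)
    for (cell, gene), umis in seen.items():
        matrix[cell][gene] = len(umis)
    return dict(matrix)


def log_normalize(counts: dict[str, int], scale: float = 10_000.0) -> dict[str, float]:
    """Scale one cell's counts to a fixed total, then apply log1p (an assumed, common choice)."""
    total = sum(counts.values())
    return {gene: math.log1p(n / total * scale) for gene, n in counts.items()}


if __name__ == "__main__":
    reads = [
        ("AAAC", "UMI1", "Il1b"),
        ("AAAC", "UMI1", "Il1b"),  # PCR duplicate: same cell, gene, and UMI
        ("AAAC", "UMI2", "Il1b"),
        ("AAAC", "UMI3", "Actb"),
        ("TTTG", "UMI1", "Actb"),
    ]
    matrix = umi_count_matrix(reads)
    print(matrix)  # {'AAAC': {'Il1b': 2, 'Actb': 1}, 'TTTG': {'Actb': 1}}
    print(log_normalize(matrix["AAAC"]))
```

Counting UMIs rather than reads is what removes the amplification skew; the normalized values are the kind of input that dimensionality reduction such as t-SNE and clustering tools then operate on.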
■
CELLULAR HETEROGENEITY DURING BACTERIAL INFECTION
While heterogeneity at the single-cell level has been explored mostly in eukaryotic cells, genetically identical bacterial populations have also been shown to be phenotypically heterogeneous, even under homogeneous laboratory conditions. Thus, infection is likely the result of an extremely elaborate interplay between two complex (host and bacterial) populations. In bacterial populations, the term bistability has been used to describe reversible bimodal gene expression that leads to two distinct, coexisting subpopulations.36 One of the clearest examples of bistability in bacteria is persister cells in Escherichia coli, a small fraction of the bacterial population that has been shown to survive otherwise lethal antibiotic exposure because of its reduced growth rate.37
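To illustrate how a bulk measurement can mask bistability (the upper case in Figure 1), the sketch below takes hypothetical single-cell expression values for one gene, computes the bulk mean, and then recovers the two states with a simple one-dimensional threshold that minimizes within-group variance (an Otsu-style criterion). The values and the thresholding method are assumptions chosen for illustration, not an analysis from the cited studies.

```python
from statistics import fmean, pvariance


def best_split(values: list[float]) -> float:
    """Return the threshold t minimizing size-weighted within-group variance when cells
    are split into values <= t and values > t (Otsu-style, one dimension)."""
    xs = sorted(values)
    best_t, best_score = xs[0], float("inf")
    for i in range(1, len(xs)):
        if xs[i - 1] == xs[i]:
            continue  # no threshold can separate equal values
        low, high = xs[:i], xs[i:]
        score = len(low) * pvariance(low) + len(high) * pvariance(high)
        if score < best_score:
            best_t, best_score = low[-1], score
    return best_t


if __name__ == "__main__":
    # Hypothetical expression of one gene in 10 cells: most "off", a minority "on".
    expr = [0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 0.0, 5.1, 4.8, 5.3]
    t = best_split(expr)
    high = [x for x in expr if x > t]
    print(f"bulk mean = {fmean(expr):.2f}")  # no single cell sits near this value
    print(f"threshold = {t}, high-state fraction = {len(high) / len(expr):.0%}")
```

Here the bulk mean (about 1.6) describes no actual cell, while the split recovers a low state and a high state containing 30% of cells.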
■
THE sc-RNAseq REVOLUTION
Single-cell studies have expanded our appreciation of cellular diversity in complex biological systems and have led to an explosion of cell atlases, which demonstrate heterogeneity at the level of both cell type and cell state,17,24,25 in addition to the discovery of novel cell types. Analogous to the Human Genome Project, laboratories from around the world have come together in collaborative projects such as the Human Cell Atlas,26 supported by the Chan Zuckerberg Initiative, and the Human BioMolecular Atlas Program, supported by the National Institutes of Health, with the goal of creating a census of all human cell types and tissues, defined by their molecular profiles using genomic approaches that analyze DNA, RNA, and proteins. Such studies have left relatively few
Figure 2. sc-RNAseq library construction methods for single-cell host, bulk bacterial, and paired dual host−bacterial transcriptional profiling. (A) Host sc-RNAseq relies on the poly(A) tail to selectively capture mRNA transcripts. Unique molecular identifiers (UMIs) and barcodes (BCs) are added during cDNA synthesis to mark individual transcripts and cells, respectively, allowing cells to be pooled early in the library construction protocol. Illumina adapters (IA1 and IA2) needed for sequencing are added, and libraries are amplified to obtain sufficient material for sequencing. Bioinformatic analysis is used for quality control, to assess technical variability, and to extract interpretable biological data through dimensionality reduction, such as t-distributed stochastic neighbor embedding (t-SNE) or principal component analysis (PCA), and through differential expression analysis. (B) Bulk bacterial RNAseq relies on depletion of rRNA before or in the early steps of library construction to enrich for bacterial mRNA transcripts. cDNA is then synthesized. Barcodes, which allow pooling of finished libraries prior to sequencing, and Illumina adapters are added, and libraries are amplified. Bioinformatic analysis is used to determine differential gene expression between populations. (C) Paired dual host−bacterial sc-RNAseq relies on capture of all RNA transcripts via direct adapter ligation, random hexamer priming, or enzymatic polyadenylation, followed by protocols similar to those of host sc-RNAseq. Because dual sc-RNAseq libraries consist of >90% rRNA from both species, which is uninformative for transcriptional profiling, deep sequencing, subtractive hybridization to deplete rRNA-derived templates, or enrichment for bacterial-derived transcripts must be performed. Analysis and interpretation of results have the added complexity of correlating host and pathogen transcriptional profiles.
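The read-depth cost of rRNA-dominated dual libraries follows from simple arithmetic: if a fraction r of reads is rRNA, only 1 - r is informative, so the total reads needed per cell scale as 1/(1 - r). The sketch below, using exact fractions, makes this concrete and models rRNA depletion in an idealized way (non-rRNA transcripts assumed untouched). The 90% rRNA share mirrors the >90% figure in panel C; the 1,000-read target and 90% depletion efficiency are hypothetical.

```python
from fractions import Fraction


def reads_needed(informative_target: int, rrna_fraction: Fraction) -> int:
    """Total reads per cell so that the non-rRNA share reaches the target (rounded up)."""
    informative_share = 1 - rrna_fraction
    total = Fraction(informative_target) / informative_share
    return -(-total.numerator // total.denominator)  # ceiling of a positive Fraction


def rrna_fraction_after_depletion(rrna_fraction: Fraction, removed: Fraction) -> Fraction:
    """rRNA share of the library after removing a fraction `removed` of rRNA molecules,
    assuming depletion leaves non-rRNA transcripts untouched (an idealization)."""
    kept_rrna = rrna_fraction * (1 - removed)
    return kept_rrna / (kept_rrna + (1 - rrna_fraction))


if __name__ == "__main__":
    r = Fraction(9, 10)  # 90% rRNA, in line with the >90% noted for dual libraries
    print(reads_needed(1_000, r))  # 10000 total reads for 1,000 informative reads
    r_dep = rrna_fraction_after_depletion(r, Fraction(9, 10))  # hypothetical 90% removal
    print(float(r_dep), reads_needed(1_000, r_dep))  # rRNA share 9/19 (about 0.47); 1900 reads
```

In this idealized model, removing 90% of rRNA molecules still leaves 9/19 (about 47%) of reads as rRNA, but it cuts the required depth from 10,000 to 1,900 reads per cell.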
Whether these switches are regulated responses or stochastic, spontaneous events is not well understood. Evolutionarily, bistability is viewed as a bet-hedging or division-of-labor strategy that promotes survival of the population by maximizing its flexibility in fluctuating environments, including encounters with a eukaryotic host.36
To date, the behavior of individual bacteria has been characterized using microscopy, promoter reporters, and selective qRT-PCR, but single-bacterium RNAseq has not been possible because of the limitations of current technologies.
differing, individual host cell responses.39 Transcriptional profiling of individual infected macrophages revealed a cluster of bimodally expressed genes that were enriched for the type I interferon response and were induced in only about one-third of the infected cells. This response was correlated with induction of the bacterial PhoPQ two-component system and driven by PhoPQ’s modulation of LPS on the surface of individual bacteria. Host cells that had taken up a bacterium with high PhoPQ activity expressed high levels of the type I interferon response; conversely, host cells that had taken up a bacterium with low PhoPQ activity expressed it at lower levels. These results demonstrate that differences in the induction of the type I interferon response are due not to differences in the intrinsic state of the host cell but rather to differences in the infecting bacterium, emphasizing the complexity that emerges when these two populations interact. In another study, Saliba et al. used fluorescent reporters of bacterial cell division to study Salmonella infection of individual macrophages, focusing on distinguishing primary bone-marrow-derived macrophages harboring nongrowing Salmonella from those harboring growing Salmonella.40 Their results revealed that macrophages harboring nongrowing Salmonella display hallmarks of proinflammatory M1 macrophages, whereas macrophages harboring fast-growing Salmonella display hallmarks of anti-inflammatory M2 macrophages. They also observed a range of host states between these two extreme programs.
In a subsequent study using bulk dual-RNAseq, which captured the transcriptomes of both host and pathogen, they reported that the transcriptomes of intracellular nongrowing and growing Salmonella were similar and that M2 polarization is driven by a type 3 secretion system effector, demonstrating how invading bacteria influence macrophage polarization to their advantage.41 These data highlight the heterogeneity in both host and pathogen and the crosstalk that occurs during their encounters as each struggles to win. These studies only begin to scratch the surface of the many basic questions that remain in our understanding of the biology of host−bacterial interactions. Many questions can be raised from the bacterial perspective. Why do bacteria infect some cells but not others? Why do bacteria proliferate in some cells but get cleared from others? Why and how do some bacteria enter an antibiotic-tolerant state in response to the cells in which they reside? Understanding the bacterial subpopulations that arise during infection may be critical for developing truly novel therapeutic approaches to eradicate infections. Similarly, many questions can be raised from the host perspective. Which cell types are actually infected? How do their responses differ? What determines the mode of cell death if the host cell succumbs? These questions are likely to become only more complex and numerous as one moves from a reductionist in vitro host cell model to animal models of infection and, ultimately, to human infections in which multiple cell types respond. Significant progress has been made in developing tools to begin answering these questions; however, technical challenges still limit full dissection of infection, including simultaneous profiling of pathogen and host cell transcriptional programs at the single-cell level.
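The kind of paired analysis described for Salmonella-infected macrophages, relating each host cell's type I interferon program to the state of the bacterium inside it, can be sketched as a per-cell gene-set score correlated with a per-cell bacterial readout. This is a hypothetical illustration: the interferon-stimulated gene set, the expression values, and the use of a PhoPQ-dependent fluorescent reporter signal as the bacterial readout are assumptions, and Pearson correlation stands in for whatever statistics the cited studies used.

```python
from math import sqrt
from statistics import fmean


def gene_set_score(cell_expr: dict[str, float], gene_set: list[str]) -> float:
    """Mean (already normalized) expression of the genes in a set; missing genes count as 0."""
    return fmean(cell_expr.get(g, 0.0) for g in gene_set)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


if __name__ == "__main__":
    ifn_genes = ["Ifit1", "Isg15", "Rsad2"]  # hypothetical interferon-stimulated gene set
    # Each infected host cell: (normalized host expression, reporter signal of its bacterium).
    cells = [
        ({"Ifit1": 3.0, "Isg15": 2.5, "Rsad2": 2.8}, 900.0),
        ({"Ifit1": 0.4, "Isg15": 0.2, "Rsad2": 0.1}, 120.0),
        ({"Ifit1": 2.2, "Isg15": 2.9}, 760.0),
        ({"Isg15": 0.3}, 80.0),
    ]
    scores = [gene_set_score(expr, ifn_genes) for expr, _ in cells]
    reporter = [signal for _, signal in cells]
    print([round(s, 2) for s in scores], round(pearson(scores, reporter), 3))
```

With true dual single-cell profiling, the reporter column could in principle be replaced by bacterial transcript counts from the same cell, which is the capability this section identifies as still limited.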
Bacterial interactions with the host involve dynamic exchanges where heterogeneity from both sides can lead to many different outcomes38 (Figure 3). When a bacterium
Figure 3. Transcriptional heterogeneity of host and bacteria results in various infection outcomes. Transcriptional heterogeneity exists within both the host and bacterial populations, such that infection outcome is the product of their interacting transcriptional programs and how they respond to one another. When an individual bacterium comes in contact with an individual eukaryotic cell, the bacterium may be internalized. As a consequence of bacterial internalization, the host cell may survive, leading to bacterial persistence, proliferation, or clearance; alternatively, the host cell may succumb, either killing the bacterium with it or allowing it to escape. There may be multiple ways to achieve each of these outcomes, such that different host−bacterial pairs may result in the same infection outcome (black arrows), highlighting the variety of molecular pathways that may be at play during these interactions.
comes in contact with a eukaryotic host, the bacterium may or may not be internalized. If internalization occurs, the host cell may survive, permitting the intracellular bacteria to persist, proliferate, or be cleared; alternatively, the host cell may succumb by different mechanisms of cell death, resulting in either bacterial death or escape. There may be multiple ways to achieve each of these cellular outcomes, further expanding the number of molecular pathways that may be at play. At the whole-organism level, there may similarly be multiple paths toward determining host outcome, with the outcome emerging either as the sum of hundreds or thousands of such single-cell interactions or as the result of a single or small number of interactions gone awry, in which a single bacterium “winning”, particularly early in infection, can determine the course of infection. Transcriptomic analysis at the single-cell level has now become a powerful tool to better understand these interactions. Two of the first studies describing the host−bacterial interaction at the single-cell level could characterize only the host by sc-RNAseq, relying on other methods to characterize the pathogen. They focused on the interactions between the intracellular pathogen Salmonella Typhimurium and the host macrophage.39,40 Using a combination of host sc-RNAseq and fluorescently labeled bacteria, Avraham et al. investigated the macrophage response after Salmonella invasion and discovered that, indeed, the variability of individual bacteria can determine
■
HOST HETEROGENEITY IN RESPONSE TO VIRAL INFECTION
Single-cell characterizations of the host−viral interaction have similarly provided unprecedented resolution of host cellular behavior upon infection. Recent single-cell analyses of viral infection have revealed host heterogeneity at the level of both cell type and cell state.42,43 However, viral infection differs from bacterial infection in that the pathogen lacks transcriptional heterogeneity prior to infection. Viral heterogeneity occurs both at the sequence level, due to polymorphisms in the infecting viral population or mutations that occur during viral mRNA (vmRNA) transcription after infection, and at the transcript level, due to differences in viral genome copy number or temporal gene expression. Technically, sc-RNAseq can capture viral RNAs from viruses, such as influenza, that use the viral polymerase to add poly(A) tails to vmRNA,44 thus making them indistinguishable from host mRNA for capture purposes and amenable to poly(A)-based capture and library construction. Taking advantage of this, two studies have used sc-RNAseq after influenza infection to identify which cells are infected and to detect and quantify vmRNA. Steuerman et al.42 performed sc-RNAseq on immune (CD45+) and nonimmune (CD45−) cells isolated from the lungs of mice 2 days after influenza infection. They clustered cells by cell type and then inferred viral load within each cell type from the proportion of unique reads that aligned to viral segments (vmRNA). They found that viral infection can be detected in all cell types, ranging from 62% of epithelial cells to 22% of T cells, challenging the traditional view that influenza virus primarily infects epithelial cells.
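The read-proportion approach attributed to Steuerman et al. can be sketched as follows: for each cell, divide the unique reads aligned to viral segments by the cell's total unique reads, call the cell infected when that fraction exceeds a cutoff, and report the infected fraction per cell type. This is not the authors' code; the cell types, counts, and the 0.002 cutoff are hypothetical, and cell-type labels are assumed to come from prior clustering.

```python
from collections import defaultdict

# Hypothetical per-cell data: (cell type, unique reads on viral segments, total unique reads).
Cell = tuple[str, int, int]


def viral_fraction(viral_reads: int, total_reads: int) -> float:
    """Proportion of a cell's unique reads that align to viral segments (vmRNA)."""
    return viral_reads / total_reads if total_reads else 0.0


def infected_fraction_by_type(cells: list[Cell], cutoff: float) -> dict[str, float]:
    """Fraction of cells of each type whose viral read fraction exceeds `cutoff` (assumed)."""
    infected: dict[str, int] = defaultdict(int)
    seen: dict[str, int] = defaultdict(int)
    for cell_type, viral, total in cells:
        seen[cell_type] += 1
        if viral_fraction(viral, total) > cutoff:
            infected[cell_type] += 1
    return {t: infected[t] / n for t, n in seen.items()}


if __name__ == "__main__":
    cells = [
        ("epithelial", 450, 3000),
        ("epithelial", 4, 2500),
        ("epithelial", 12, 2800),
        ("T cell", 0, 1800),
        ("T cell", 9, 2000),
    ]
    print(infected_fraction_by_type(cells, cutoff=0.002))
```

Because the infected fractions depend directly on the cutoff, its choice is a key assumption in any analysis of this kind.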
This broad detection of infection contrasts with a previous study using a recombinant influenza virus carrying a GFP reporter, which found a much lower fraction of infected cells based on GFP fluorescence,45 suggesting that RNAseq may be more sensitive than fluorescent reporters at detecting viral infection. Steuerman et al.42 also found that the viral load in infected lung epithelial cells could vary by 2 orders of magnitude, while the viral load in most nonepithelial cells was low. In a second study, Russell et al.43 used sc-RNAseq of epithelial cells infected in vitro with influenza to further demonstrate the variability of viral load in infected cells. The authors used virus stocks that were “pure” of defective particles, thereby minimizing polymorphic particles and controlling for viral heterogeneity at the genomic sequence level. After infecting at a low multiplicity of infection, they found that most infected cells had