Bacterial Single Cell Whole Transcriptome Amplification in Microfluidic

4 days ago - Single cell RNA sequencing is a technology that provides the capability of analyzing the transcriptome of a single cell from a population...
0 downloads 0 Views 1MB Size
Subscriber access provided by UNIV AUTONOMA DE COAHUILA UADEC

Article

Bacterial Single Cell Whole Transcriptome Amplification in Microfluidic Platform Shows Putative Gene Expression Heterogeneity Yuguang Liu, Patricio Jeraldo, Jinsung Jang, Bruce Eckloff, Jin Jen, and Marina Walther-Antonio Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.8b04773 • Publication Date (Web): 03 Jun 2019 Downloaded from http://pubs.acs.org on June 6, 2019

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Bacterial Single Cell Whole Transcriptome Amplification in Microfluidic Platform Shows Putative Gene Expression Heterogeneity Yuguang Liu1,2, Patricio Jeraldo1,2, Jin Sung Jang4, Bruce Eckloff4, Jin Jen4, Marina Walther-Antonio1,2,3* 1Department

of Surgery, Division of Surgical Research, Mayo Clinic, Rochester, MN, USA Program, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA 3Department of Obstetrics and Gynecology, Mayo Clinic, Rochester, MN, USA 4Medical Genome Facility, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA 2Microbiome

*Corresponding author Email: [email protected]

ACS Paragon Plus Environment

-1-

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 20

Abstract Single cell RNA sequencing is a technology that provides the capability of analyzing the transcriptome of a single cell from a population. So far, single cell RNA sequencing has been focused mostly on human cells due to the larger starting amount of RNA template for subsequent amplification. One of the major challenges of applying single cell RNA sequencing to microbial cells is to amplify the femtograms of RNA template to obtain sufficient material for downstream sequencing with minimal contamination. To achieve this goal, efforts have been focused on multi-round RNA amplification but would introduce additional contamination and bias. In this work, we for the first time coupled a microfluidic platform with multiple displacement amplification technology to perform single cell whole transcriptome amplification and sequencing of Porphyromonas somerae, a microbe of interest in endometrial cancer, as a proof-ofconcept demonstration of using single cell RNA sequencing tool to unveil gene expression heterogeneity in single microbial cells. Our results show that the bacterial single-cell gene expression regulation is distinct across different cells, supporting widespread heterogeneity.

Keywords Bacterial single cell RNA sequencing; RNA seq; Bacterial single cell gene expression

ACS Paragon Plus Environment

-2-

Page 3 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Introduction Phenotypically identical and genetically similar cells from the same population can have dramatic heterogeneity in their behavior which is often expressed in their transcriptomic landscape1. This heterogeneity plays a significant role in various biological processes including tumor progression2, 3 and immune response4. Single cell whole transcriptome sequencing (SC-WTS) is emerging as a powerful tool for profiling the cell-to-cell variability to uncover this heterogeneity which has been obscured in bulk studies1, 5-7. As a result, SC-WTS is starting to impact our understanding of human physiology and diseases such as how particular cells can be the determinants of an infection8, drug resistance9, 10

and cancer relapse11. The key steps in SC-WTS include single cell isolation, lysis and amplification of femto to picograms of total RNA

to reach the quantity sufficient for library preparation and sequencing. Microfluidic platforms are ideal for single cell applications as they offer the unique ability of handling nanoliters of fluid in a controlled manner, thus allowing for the manipulation and isolation of single cells for subsequent nanoscale reactions12-17 with minimal contamination and marginal reagent cost. Multiple displacement amplification (MDA)18 has been a popular method for whole genome19, 20 and transcriptome amplification due to the relatively low amplification bias21. It is based on reverse transcription (RT) to generate template complementary DNA (cDNA) and amplify the cDNA by φ29 polymerase and random primers. The MDA method is reported to have high fidelity and lower error rates following isothermal procedures easy to implement in microfluidic systems22-24. So far, most of the SC-WTA applications have been focused on human cells25, 26. SC-WTA of bacterial species is still rare due to the rigid and multi-layered cell walls of these microorganisms27 and their 100X-500X lower amount of starting total RNA than single eukaryotic cells for total transcriptome amplification19, 28, 29. To generate sufficient cDNA from the miniscule amount of RNA from a single cell for downstream sequencing and analyses, multiple rounds of amplification30, 31 is often adopted. However, the additional processes would take up to 48 hours and can increase the percentage of duplicated reads and thus bias in the data analyses. To extend microfluidic-based SC-WTA to microbiological and microbiome research, it is essential to develop protocols suitable for a single bacterial cell that circumvent the aforementioned concerns.

ACS Paragon Plus Environment

-3-

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 20

In this work, we demonstrate an MDA-based bacterial SC-WTA in microfluidic platform that produces 2-3 nanograms of cDNA for downstream library preparation and sequencing. In this proof-of-concept study, we used Porphyromonas somerae as the object because this bacterium is not considered as a common contaminant; moreover, this species was recently identified as one of the microbiome signatures of gynecological diseases including endometrial cancer and thus has the potential to be used as a biomarker for early disease detection32. The results revealed heterogeneity of gene expression in single cells that can be obscured by bulk cell gene expression studies. We believe that an effective microfluidic-based protocol for bacterial SC-WTA would serve as a guideline for bacterial single cell transcriptomic analyses in microfluidic platforms and can be applied to a wide range of biological research and medical applications.

Materials and Methods Cell preparation P. somerae (ATCC, BAA1230) was cultured in chopped meat carbohydrate broth (DB) at 37C in an anaerobic chamber (Coy) and harvested during log phase (~107/mL). Two volumes of RNAprotect Bacteria Reagent (Qiagen) was added to the culture inside the anaerobic chamber and incubated for 10 min. The bacterial cells were then pelleted at 12,000 g for 3 min at 4C, and the supernatant was aspirated. The cell pellet was washed twice with nuclease-free water and resuspended in 1 mL nuclease-free water, then the sample was divided into two 0.5 mL aliquots. RNA was extracted from one aliquot using RNeasy Micro Kit (Qiagen) following manufacturer’s instruction, and was stored at -20C for bulk RNA sequencing without amplification in the microfluidic device. Pluronic F-127 (10%, Sigma) was added to the other aliquot (final concentration of 0.08%) before the cells were introduced into the microfluidic device for single cell isolation and whole transcriptome amplification.

Microfluidic experimental setup The study was performed in our optofluidic platform at Mayo Clinic (Rochester, MN) with minor modifications of the microfluidic device used in our previous work33, 34. Briefly, this platform integrates a microscope (Nikon Eclipse), optical tweezers (Thorlabs) and a customized Polydimethylsiloxane (PDMS) microfluidic device with 12 parallel reaction

ACS Paragon Plus Environment

-4-

Page 5 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

lines with sets of valves that allow for the on-demand creation of isolated microenvironments. The number of chambers in each reaction line corresponds to the number of reagents that needs to be sequentially added to perform the reactions. We chose this microfluidic structure as it has been widely used and validated in a number of single cell genomic works that rely on optical tweezers for single cell isolation35-37. The major advantages of using optical tweezers to isolate single cells from a population include high target single cell confidence, providing a way to visually ensure that only one cell is trapped into a microchamber and thus maintaining minimal possibility of sequencing contaminating cells unintendedly. Prior to the experiments, the sample channel in the microfluidic device was pre-soaked in the chip diluent (0.04% Pluronic F127 in Phosphate buffered saline (PBS)) for 30 min to prevent the cells from sticking to the PDMS surface during cell sorting. The cell sample was introduced into the chip, and single cells were trapped and transported into microchambers by optical tweezers. Visually identifiable contaminating cells were trapped and moved out of the chambers to ensure only the target cell was in the chamber prior to the lysis step. Reagents including lysis buffers, RT mix, ligation mix and polymerase were sequentially added to the isolated single cells to perform chemical reactions in the same way as described in our recent work33. The amplified products were collected from the outlet ports of the device and transferred into microwell plates for downstream processing. All the supplies and reagents were filtered (0.2 μm), autoclaved or UV-sterilized, except for the RT mix, ligation mix and polymerase. Ten single cell reactions and two negative control (sterile PBS) reactions were performed in the on-chip SC-WTA experiment. Lysis buffer components The on-chip bacterial SC-WTA was performed using primarily REPLI-g WTA Single Cell Kit (Qiagen) except for a customized lysis buffer, as the lysis buffer provided in the kit does not support the lysis of bacterial or plant cells. Our lysis buffer was adapted from a recipe reported by Kang et al.30, which include 100 mM Tris-Cl (pH 8), 200 mM KCl, 0.5 mM Ethylenediaminetetraacetic acid (EDTA), 0.1% Triton X-100, 200 mM Dithiothreitol (DTT) (BioRad), 0.04 U/L RnaseOut (Invitrogen) and 3x10-7 U/L Ready-Lyse lysozyme (Epicentre). The rest of the SC-WTA reagents were prepared using supplied components according to the kit manufacturer’s instruction with the additional of low concentration of Tween 20. A detailed description of these reagents is provided in supplemental information. Microfluidic bacterial lysis for SC-WGA workflow

ACS Paragon Plus Environment

-5-

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 20

The general workflow of SC-WTA in microfluidic chip is shown in Figure 1. After single cell isolation in the microfluidic chip using laser tweezers, customized lysis buffer was introduced into the chambers and incubated on a hotplate at 37C for 15 min, followed by incubation at 80C for 2 min for RNA denaturation. Genomic DNA (gDNA) wipeout buffer was then added and incubated at 42C for 10 min to remove gDNA released from the cell. RT mix was added and incubated at 42C for 1 hr to transcribe the bacterial RNA into a first-strand cDNA template, and this reaction was terminated by incubating at 95C for 3 min. Ligation buffer was then added and incubated at 24C for 30 min to create a second-strand cDNA, synthesizing the double-stranded cDNA ready for MDA-based amplification. The ligation process was terminated by incubation at 95C for 5 min. φ29 polymerase was added and incubated at 30C for 2 hrs followed by the inactivation of all enzymes at 65C for 5 min. Gel-loading pipette tips were inserted into the outlet ports of the chip, and nuclease-free water with 0.02% Tween 20 (Sigma) was introduced into the chip to flush the amplified into the pipette tips until the fluid level reached the 20 µL mark. The product was collected and stored at 4˚C, and highsensitivity Qubit assay (Thermo Fisher) was performed to assess the amount of the amplified cDNA from single cells. We arbitrarily chose three replicates of cDNA amplified on-chip for library construction. For RNA-seq library construction, both single cell and bulk cDNAs were diluted to 250 pg and used to construct indexed libraries using Nextera XT DNA Sample Preparation kit (Illumina, CA). Libraries were quantified by Bioanalyzer (High Sensitivity DNA analysis kit, Agilent, CA) and Qubit (dsDNA BR Assay kits, Thermo Fisher). The cDNA libraries were sequenced using 101 bases paired-end protocol on Illumina HiSeq 4000 platform. Bioinformatics post-processing Following sequencing, adapter removal was perfomed using atropos v1.1.938. Reads were mapped using BWA MEM v0.7.1739 to a reference genome of P. somerae DSM 23386 (a synonym of ATCC BAA-1230). This reference (provided in the supplemental files) was re-assembled from raw reads obtained from NCBI Sequence Read Archive run accession SRR3947667, using the Unicycler v0.4.6 40pipeline with SPAdes assembler v3.11.141, and annotated using Prokka v1.1342 and UniFam43 as an additional database. Mapped reads were coordinate-sorted using samtools v1.844. Gene expression was then profiled using HTSeq v0.10.045. Gene body coverage plots were calculated using Picard Tools v2.18.746 using the CollectRnaSeqMetrics program. To profile potential contaminants, we used the adapter-trimmed reads and called

ACS Paragon Plus Environment

-6-

Page 7 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

their taxonomy using LMAT v1.2.647 with database kML+H.noprune.4-14.2025, and using its genus-level output. Reads per Kilobase per Million mapped (RPKM) values were verified by using the Rockhopper pipeline version 2.0348 to de novo assemble transcripts and calculate their expression values. The RPKM values are comparable in magnitude to the ones in the original data. The table of RPKM expression values and the sequences of the de novo assembled transcripts are provided in the supplemental information.

Results and Discussion RNA amplification in microfluidic device Most library constructions require 25 ng minimum DNA input for subsequent sequencing. To enable standard cDNA library construction for single bacterial cells, we attempted producing sufficient cDNA from a single P. somerae in the microfluidic chip. Studies show that the total RNA per bacterial cell is estimated to be 1.5-200 fg19, 28, 49. To investigate the effect of the amount of starting RNA on the amplification yield, we performed a set of experiments to compare the on-chip amplification of 100 pg and 0.1 pg total RNA extracted from a P. somerae population, and single P. somerae cells, with each batch of amplification for 2 hrs under the same conditions (Figure 2(a)). The RNA extracted from P. somerae was diluted in nuclease-free water to desired concentrations before introducing it into the microfluidic chip for amplification to ensure that each microchamber contains the aforementioned amount of starting RNA. All ten replicates of extracted RNA and 9 out of 10 single P. somerae replicates were successfully amplified. Although the starting RNA amount differed 1000X in these experiments, no statistical difference was observed for the cDNA produced from each batch of on-chip amplification. The comparable amplification results of 0.1 pg and 100 pg of the extracted bulk P. somerae RNA and a single P. somerae cell RNA suggest that the cell lysis was efficient to release starting RNA template within the cell. However, if the microchamber volume was further reduced, higher amount of starting RNA would likely lead to larger amount of cDNA after amplification50-52. On the other hand, the actual starting RNA in a single P. somerae cell could be as low as 1.5 fg and the quality of the RNA template from a single cell could be lower compared to the bulk RNA, these factors can also lead to the low amount of cDNA after amplification. To increase cDNA amplification yield, we attempted the amplification with extended time (Figure 2(b)). In addition to performing SC-WTA for 2 hrs as recommended, we performed the on-chip amplification for 16 hrs. Still, this did not

ACS Paragon Plus Environment

-7-

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 20

increase the average amount of cDNA produced. We conducted another experiment with an initial 2 hr amplification of single P. somerae cells, followed by a second around of on-chip amplification by introducing additional polymerase into the microchamber and incubating for 16 hrs. This led to a 5X increase in the average amount of cDNA produced; however, significant amplification was also detected in the negative controls. To minimize concerns caused by contaminants and duplicated reads introduced by the second round of amplification, cDNA produced from 2 hrs of on-chip amplification (2-3 ng) was used for low-input library construction and sequencing. The amount of DNA in the negative controls from this experiment was below the detection limit of the high sensitivity Qubit assay (Thermo Fisher) and therefore was not sequenced. To evaluate the possibility of gDNA contamination in the on-chip SC-WTA process, we did additional tests to verify the removal of gDNA before the amplification. In the first test, we used gDNA wipeout reagent provided in the SC-WTA kit according to manufacturer’s instruction to degrade the gDNA released from the P. somerae single cells after the lysis step. In the second test, we proceeded straight onto reverse transcription step after cell lysis without adding gDNA wipeout reagent (Figure 2(b)). Results did not show significant difference after 2 hrs of on-chip amplification, however, ~5X DNA was harvested from the sample without gDNA removal step after 16 hrs of on-chip amplification. These results verified the efficiency of gDNA wipeout reagent. We analyzed the species origins of the sequencing data to estimate the contamination of the single cell samples. The taxonomy profile was called using Livermore Metagenomic Analysis Toolkit (LMAT), and groups of sequences featuring a hit to the database were sorted into taxonomic hit distributions (Figure 3). The level of contamination was in a range comparable with on-chip MDA-based applications53. Others have compared amplification biases across various amplification methods and found that MDA-based reaction, specifically Repli-g kit, introduces the lowest amplification bias21. However, bias inevitably exists because of the capture and non-linear amplification of minute amount RNA in a single cell54. In the future, we seek to label target cDNA with unique molecular identifiers to further reduce the intrinsic amplification noise and bias in the library preparation and sequencing55, 56, as well as comparing the results of bacterial and eukaryotic SC-WTA. Figure 4 shows the distribution of normalized sequencing coverage across the transcript for the RNA extracted from the bulk P. somerae and three single cells amplified on-chip and selected for sequencing. The results show that the

ACS Paragon Plus Environment

-8-

Page 9 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

sequencing coverage between bulk RNA and single cells are comparable, which reconciles with earlier results showing that low-volume MDA reactions tend to reduce amplification bias which effectively improves coverage uniformity23, 57, 58.

The coverage of assembled transcript of the three single P. somerae cells did not display significant differences (Figure

4(b)-(d)), even though the gene expression profile vary largely from cell to cell which will be discussed in details below. Performing MDA reaction and sequencing a few cells rather than a single cell would likely achieve higher coverage and improve uniformity, however, it would pose challenges of deconvolution during data analysis, and most importantly, it would jeopardize the chance to access transcriptomic data originating from a single bacterial unit of life. Accessing this type of data has the potential to transform our understanding of how bacteria operate at the single-cell level and how disparate their activities can be, despite the fact that they may live in the same colony and be genomic clones. We compared the ratio of expressed genes detected in the three single P. somerae cells with the bulk RNA (Figure 5 (a)). On average, 1517 expressed genes were detected in single P. somerae cells, which is 74.7% of all the expressed genes detected in the bulk RNA sample. Compared with earlier work by others, the percentage of expressed genes detected in single P. somerae cells was ~10% higher than the percentage of the expressed genes detected in single eukaryotic cells31. When grouping single cell results, the total number of expressed genes detected in every two single P. somerae cells reached 1756, which is 86.5% of the total expressed genes detected in the bulk. We found that 1850 expressed genes were detected in three cells, making up of 91.1% of those in the bulk. This trend shows that by sequencing a very small number of single cell libraries, it is possible to reconstruct a bulk equivalent transcriptome, offering global information as well as cellular heterogeneity. Moreover, the gene expression of the bulk sample is assumed to be the sum of multiple single cell gene expression, and could lead to the discovery of some emergent expressions due to cell-cell interaction. We estimated that the sequencing of about 170 single P. somerae cells would lead to the construction of a bulk gene expression profile while still having single cell resolution, based on a combinatoric estimate. Therefore, in this study, three single P. somerae cells are not sufficient to represent the bulk population gene expression due to both small cell number and technical variations. In our follow-up studies, it would be worthwhile to perform a larger scale SC-WTS based on the combinatoric estimate to experimentally reconstruct a bulk-equivalent transcriptome and to examine and minimize the effects of technical variation occurred during amplification and sequencing. The Venn diagram shows the correlation between the number of expressed genes in the three single cells respectively (Figure 5(b)). 1133 expressed

ACS Paragon Plus Environment

-9-

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 20

genes were detected in all three cells, which makes up of 55.8% of the ones in the bulk sample, and 4-5.5% of the expressed genes were detected in only one of the cells. It is likely that these transcripts are low-abundance or expressed in a small fraction of cells. The results display that bacterial single cell RNA-seq could potentially enable the detection of genes that are below the detection limit in the bulk sample or are in low-abundance. Note that there was only 0.3% ribosomal RNA detected in the RNA extracted from the bulk P. somerae sample and 0.005% in the single cells. Based on earlier studies, the low amount of ribosomal RNA in P. somerae is mostly linked to its slow growth rate59. To the best of our knowledge, there is no evidence that P. somerae is a species that has high polyploidy, and others have found that polyploidy is often tightly linked with cell size although not proportionally60. We sorted the detected genes in the order of the ones with highest to the lowest RPKM in the bulk sample (Figure 6(a)). In the bulk sample, 99.8% of the genes showed certain number of transcripts, among which 93.2% of the genes displayed medium expression level (100 – 1,000 RPKM), while only 2.5% displayed relatively high expression level (>1,000 RPKM) and 4.3% displayed low expression (1 RPKM, and only 0.5-1.2% of the genes had >1,000 RPKM expression level, which echoes the finding done by fluorescent in-situ hybridization that single bacterial cells display zero transcripts for many genes and the expression is stochastic61. Earlier reports show that bacterial gene expression is influenced by a number of cellular parameters and growth conditions including cell size and growth rate which further affect other factors such as gene copy numbers and RNA polymerase and ribosomal content62, 63. It is worthwhile to mention that 0.2-0.7% genes of the single cells had very high expression level (>10,000 RPKM). We also report the correlation coefficient values for each of the single cells (Figure 7), the results showed that the bulk sample and single cell gene expression values are not correlated. Also, note that the detectable RPKM level largely relies on the efficiency of RNA-seq technology. In this study, 0 RPKM indicates no gene expression

ACS Paragon Plus Environment

- 10 -

Page 11 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

was detected rather than the gene had no expression due to the levels of detection of current RNA-seq technology. In addition, technical variabilities have not been rigorously excluded as a source of RPKM variation in the single cell results, and it is worthwhile to determine the sources of the variations and characterize their impact on the sequencing data in the future. We showed the top 20 expressed genes detected from each of the single cells and the bulk sample in Figure 8. The results show that each cell is transcriptomically distinct, and that a bulk analysis assumes a transcriptomic homogeneity would be erroneous. Large scale studies will be able to provide much needed clarity into how much gene expression heterogeneity is present within clonal bacterial populations, between clonal populations, through space, through time, and disturbances inflicted by environmental changes and antibiotic exposures. We anticipate that the thorough understanding of single-cell bacterial gene expression will revolutionize our understanding of biology, and ultimately how we approach and treat microbes in the context of a disease.

Conclusions Single cell whole transcriptome sequencing has found various applications in eukaryotic cells. However, it has rarely been applied to microbial cells, and the major hurdles including the minute amount of RNA template insufficient for producing required amount of cDNA as well as prominent contamination issues. These challenges can be adequately addressed using microfluidic platforms without resorting to additional rounds of amplification as the reaction volume is confined to nanoscales, thus minimizing contamination during non-targeted amplification. This work demonstrated single-round P. somerae SC-WTA in a microfluidic platform to generate a few nanograms of cDNA for low-input library construction and sequencing. The results revealed the vast differences of gene expression of single cells that cannot be identified otherwise. This proof-of-concept protocol can be adapted to perform SC-WTS on various bacterial species in wide range of human microbiome studies using microfluidic platforms. In the future, we endeavor to perform a larger scale of bacterial SC-WTS to investigate the minimize technical variation during the amplification and sequencing. Ultimately, we envision that it would be possible to perform bacterial SC-WTS on various clinical samples including surgically collected samples or bacterial cells in fixed tissues to open up this technology to clinical studies.

ACS Paragon Plus Environment

- 11 -

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 20

Supporting Information Supporting information includes the components and concentrations of reagents (Table S1-S5) and a description of bioinformatic methods used in this work including reference genome and annotations, RPKM expression values and sequences of de novo assembled transcripts. Author Contributions YL cultured the cells, designed and fabricated the microfluidic devices, performed the SC-WTA experiments, analyzed the data, prepared Figure 1, 2, 5, 6, 7 and wrote the manuscript. PJ did bioinformatics work and analyzed the sequencing data, prepared Figure 3, 4, 7 and wrote the manuscript. JSJ prepared the library and wrote the manuscript. BE did the sequencing. JJ supervised the library preparation and sequencing. MWA supervised the project, analyzed the data and wrote the manuscript. Notes The authors declare no competing financial interest. Acknowledgements This work was support by the Ivan Bowen Family Foundation and by CTSA Grant Number KL2 TR002379 from the National Center for Advancing Translational Science (NCATS). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. In addition, we thank the Microbiome Program and the Center for Individualized Medicine at Mayo Clinic for their support, and Dr. Alexander Revzin at Mayo Clinic for granting us the access to his microfabrication facilities.

References 1. A.-E. Saliba, A. J. Westermann, S. A. Gorski and J. Vogel, Nucleic acids research 42 (14), 8845-8860 (2014). 2. N. Navin, J. Kendall, J. Troge, P. Andrews, L. Rodgers, J. McIndoo, K. Cook, A. Stepansky, D. Levy and D. Esposito, Nature 472 (7341), 90-94 (2011). 3. P. Dalerba, T. Kalisky, D. Sahoo, P. S. Rajendran, M. E. Rothenberg, A. A. Leyrat, S. Sim, J. Okamoto, D. M. Johnston and D. Qian, Nature biotechnology 29 (12), 1120 (2011). 4. J. Zhu and W. E. Paul, Cell research 20 (1), 4 (2010). 5. E. Hedlund and Q. Deng, Molecular aspects of medicine (2017). 6. Y. Liu and M. Walther-Antonio, Biomicrofluidics 11 (6), 061501 (2017).

ACS Paragon Plus Environment

- 12 -

Page 13 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

7. D. Zemmour, R. Zilionis, E. Kiner, A. M. Klein, D. Mathis and C. Benoist, Nature immunology 19 (3), 291 (2018). 8. B. Snijder, R. Sacher, P. Rämö, E.-M. Damm, P. Liberali and L. Pelkmans, Nature 461 (7263), 520 (2009). 9. S. V. Sharma, D. Y. Lee, B. Li, M. P. Quinlan, F. Takahashi, S. Maheswaran, U. McDermott, N. Azizian, L. Zou and M. A. Fischbach, Cell 141 (1), 69-80 (2010). 10. S. Helaine, A. M. Cheverton, K. G. Watson, L. M. Faure, S. A. Matthews and D. W. Holden, Science 343 (6167), 204-208 (2014). 11. I. Baccelli, A. Schneeweiss, S. Riethdorf, A. Stenzinger, A. Schillert, V. Vogel, C. Klein, M. Saini, T. Bäuerle and M. Wallwiener, Nature biotechnology 31 (6), 539 (2013). 12. G. M. Whitesides, Nature 442 (7101), 368-373 (2006). 13. S.-Y. Teh, R. Lin, L.-H. Hung and A. P. Lee, Lab on a Chip 8 (2), 198-220 (2008). 14. S. Siddiqui, N. Tufenkji and C. Moraes, Integrative Biology 8 (9), 914-917 (2016). 15. E. K. Sackmann, A. L. Fulton and D. J. Beebe, Nature 507 (7491), 181-189 (2014). 16. T. P. Lagus and J. F. Edd, Journal of Physics D: Applied Physics 46 (11), 114005 (2013). 17. J. Autebert, B. Coudert, F.-C. Bidard, J.-Y. Pierga, S. Descroix, L. Malaquin and J.-L. Viovy, Methods 57 (3), 297-307 (2012). 18. F. B. Dean, J. R. Nelson, T. L. Giesler and R. S. Lasken, Genome research 11 (6), 1095-1099 (2001). 19. Y. Kang, M. H. Norris, J. Zarzycki-Siek, W. C. Nierman, S. P. Donachie and T. T. Hoang, Genome research 21 (6), 925-935 (2011). 20. J. Wang, L. Chen, Z. Chen and W. Zhang, Integrative Biology 7 (11), 1466-1476 (2015). 21. R. Pinard, A. de Winter, G. J. Sarkis, M. B. Gerstein, K. R. Tartaro, R. N. Plant, M. Egholm, J. M. Rothberg and J. H. Leamon, BMC genomics 7 (1), 216 (2006). 22. S. T. Motley, J. M. Picuri, C. D. Crowder, J. J. Minich, S. A. Hofstadler and M. W. Eshoo, BMC genomics 15 (1), 443 (2014). 23. C. F. De Bourcy, I. De Vlaminck, J. N. Kanbar, J. Wang, C. Gawad and S. R. Quake, PloS one 9 (8), e105585 (2014). 24. M. Chen, P. Song, D. Zou, X. Hu, S. Zhao, S. Gao and F. Ling, PloS one 9 (12), e114520 (2014). 25. S. Zhong, S. Zhang, X. Fan, Q. Wu, L. Yan, J. Dong, H. Zhang, L. Li, L. Sun and N. Pan, Nature 555 (7697), 524 (2018). 26. V. Svensson, R. Vento-Tormo and S. A. Teichmann, Nature protocols 13 (4), 599 (2018). 27. L. Brown, J. M. Wolf, R. Prados-Rosales and A. Casadevall, Nature Reviews Microbiology 13 (10), 620-630 (2015). 28. Z. Chen, J. Wang, L. Chen and W. Zhang, Cyanobacteria, 75 (2017). 29. F. Han and S. J. Lillard, Analytical chemistry 72 (17), 4073-4079 (2000). 30. Y. Kang, I. McMillan, M. H. Norris and T. T. Hoang, Nature protocols 10 (7), 974 (2015). 31. A. M. Streets, X. Zhang, C. Cao, Y. Pang, X. Wu, L. Xiong, L. Yang, Y. Fu, L. Zhao and F. Tang, Proceedings of the National Academy of Sciences 111 (19), 7048-7053 (2014). 32. M. R. Walther-António, J. Chen, F. Multinu, A. Hokenstad, T. J. Distad, E. H. Cheek, G. L. Keeney, D. J. Creedon, H. Nelson and A. Mariani, Genome medicine 8 (1), 122 (2016). 33. Y. Liu, D. Schulze-Makuch, J.-P. de Vera, C. Cockell, T. Leya, M. Baqué and M. Walther-Antonio, Micromachines 9 (8), 367 (2018). 34. Y. Liu, J. Yao and M. Walther-Antonio, Biomicrofluidics 13 (3), 034109 (2019). 35. Z. C. Landry, S. J. Giovanonni, S. R. Quake and P. C. Blainey, Methods in enzymology 531 (2013). 36. J. A. Dodsworth, P. C. Blainey, S. K. Murugapiran, W. D. Swingley, C. A. Ross, S. G. Tringe, P. S. Chain, M. B. Scholz, C.-C. Lo and J. Raymond, Nature communications 4, 1854 (2013). 37. N. H. Youssef, P. C. Blainey, S. R. Quake and M. S. Elshahed, Applied and environmental microbiology, AEM. 06059-06011 (2011). 38. J. P. Didion, M. Martin and F. S. Collins, PeerJ 5, e3720 (2017). 39. H. Li, arXiv preprint arXiv:1303.3997 (2013). 40. R. R. Wick, L. M. Judd, C. L. Gorrie and K. E. Holt, PLoS computational biology 13 (6), e1005595 (2017). 41. A. Bankevich, S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko, S. Pham and A. D. Prjibelski, Journal of computational biology 19 (5), 455-477 (2012).

ACS Paragon Plus Environment

- 13 -

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 20

42. T. Seemann, Bioinformatics 30 (14), 2068-2069 (2014). 43. J. Chai, G. Kora, T.-H. Ahn, D. Hyatt and C. Pan, BMC evolutionary biology 14 (1), 207 (2014). 44. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis and R. Durbin, Bioinformatics 25 (16), 2078-2079 (2009). 45. S. Anders, P. T. Pyl and W. Huber, Bioinformatics 31 (2), 166-169 (2015). 46. . 47. S. K. Ames, D. A. Hysom, S. N. Gardner, G. S. Lloyd, M. B. Gokhale and J. E. Allen, Bioinformatics 29 (18), 2253-2260 (2013). 48. B. Tjaden, Genome biology 16 (1), 1 (2015). 49. A. J. Westermann, S. A. Gorski and J. Vogel, Nature Reviews Microbiology 10 (9), 618 (2012). 50. M. Hammond, F. Homa, H. Andersson-Svahn, T. J. Ettema and H. N. Joensson, Microbiome 4 (1), 52 (2016). 51. M. Rhee, Y. K. Light, R. J. Meagher and A. K. Singh, PloS one 11 (5), e0153699 (2016). 52. Y. Fu, C. Li, S. Lu, W. Zhou, F. Tang, X. S. Xie and Y. Huang, Proceedings of the National Academy of Sciences 112 (38), 11923-11928 (2015). 53. Z. Yu, S. Lu and Y. Huang, Analytical chemistry 86 (19), 9386-9390 (2014). 54. F. Ozsolak and P. M. Milos, Nature reviews genetics 12 (2), 87 (2011). 55. K. Shiroguchi, T. Z. Jia, P. A. Sims and X. S. Xie, Proceedings of the National Academy of Sciences 109 (4), 1347-1352 (2012). 56. Q. Peng, R. V. Satya, M. Lewis, P. Randad and Y. Wang, BMC genomics 16 (1), 589 (2015). 57. C. Zong, S. Lu, A. R. Chapman and X. S. Xie, Science 338 (6114), 1622-1626 (2012). 58. Y. Marcy, T. Ishoey, R. S. Lasken, T. B. Stockwell, B. P. Walenz, A. L. Halpern, K. Y. Beeson, S. M. Goldberg and S. R. Quake, PLoS Genet 3 (9), e155 (2007). 59. B. R. Levin, I. C. McCall, V. Perrot, H. Weiss, A. Ovesepian and F. Baquero, MBio 8 (1), e02253-02216 (2017). 60. J. E. Mendell, K. D. Clements, J. H. Choat and E. R. Angert, Proceedings of the National Academy of Sciences 105 (18), 6730-6734 (2008). 61. Y. Taniguchi, P. J. Choi, G.-W. Li, H. Chen, M. Babu, J. Hearn, A. Emili and X. S. Xie, science 329 (5991), 533-538 (2010). 62. S. Klumpp and T. Hwa, Current opinion in biotechnology 28, 96-102 (2014). 63. V. Shahrezaei and S. Marguerat, Current opinion in microbiology 25, 127-135 (2015).

ACS Paragon Plus Environment

- 14 -

Page 15 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Figure 1. An overview of the workflow of bacterial SC-WTA for sequencing. The process includes cell culture and RNA protection treatment, single cell isolation, RT and cDNA amplification in microfluidic chip. Blue rectangles standard off-chip processes. Yellow box highlights the on-chip steps.

ACS Paragon Plus Environment

- 15 -

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 20

Figure 2. On-chip single cell RNA amplification. (a) Comparison of on-chip amplification of 100 pg and 0.1 pg purified P. somerae total RNA and a single P. somerae cell. (b) Comparison of amplification yield versus amplification time. N=10 per experiment, error bars show standard deviation.

Figure 3. Top contamination sources in amplified transcripts from single cells compared to extracted RNA from bulk cells.

ACS Paragon Plus Environment

- 16 -

Page 17 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Figure 4. Normalized coverage for (a) P. somerae bulk RNA and (b-d) P. somerae single cells.

ACS Paragon Plus Environment

- 17 -

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 20

Figure 5. (a) Comparison of total number of genes detected in single P. somerae cells and P. somerae bulk RNA. Detected gene ratio refers to the number of expressed genes detected in single cell versus the number of expressed genes in the bulk RNA. (b) Venn diagram showing total number of genes detected in each of the three single P. somerae and the overlapped detection among the three cells.

Figure 6. (a) Genes sorted based on the descending order of expression level (RPKM) in the bulk sample. Genes expressed in the single cells did not show correlation with those in the bulk. (b) Genes sorted based on the descending order of expression level (RPKM) in single cells and bulk sample respectively.

ACS Paragon Plus Environment

- 18 -

Page 19 of 20 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Figure 7. Comparison of bulk and single cell gene expression profiles, indicating lack of correlation between the bulk and single cell gene expression level.

Figure 8. Top 20 expressed genes in the bulk RNA and each of the single cells, displaying the heterogeneity of the expression level.

ACS Paragon Plus Environment

- 19 -

Analytical Chemistry 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 20

For TOC only

ACS Paragon Plus Environment

- 20 -