
Article

Design and experimental validation of small activating RNAs targeting an exogenous promoter in human cells Edouard A. Harris, Alla Buzina, Jason Moffat, and David R. McMillen ACS Synth. Biol., Just Accepted Manuscript • DOI: 10.1021/acssynbio.6b00125 • Publication Date (Web): 29 Dec 2016 Downloaded from http://pubs.acs.org on December 31, 2016


ACS Synthetic Biology

Design and experimental validation of small activating RNAs targeting an exogenous promoter in human cells

Edouard A. Harris,†,‡,¶ Alla Buzina,§ Jason Moffat,∥ and David R. McMillen∗,†,‡,¶

†Department of Physics, University of Toronto, 60 St. George St., Toronto, Ontario, M5S 1A7 Canada
‡Department of Chemical and Physical Sciences, University of Toronto Mississauga, 3359 Mississauga Rd., Mississauga, Ontario, L5L 1C6 Canada
¶Impact Centre, University of Toronto, 112 College St., Toronto, Ontario, M5G 1A7 Canada
§Banting and Best Department of Medical Research, University of Toronto, 160 College St., Toronto, Ontario, M5E 3E1 Canada
∥Department of Molecular Genetics, University of Toronto, 160 College St., Toronto, Ontario, M5S 3E1 Canada

E-mail: [email protected]

Abstract

It is increasingly practical to co-opt many native cellular components into use as elements of synthetic biological systems. We present the design, and experimental investigation, of the first exogenous genetic construct to be successfully targeted by RNA activation, a phenomenon whereby small double-stranded RNAs increase gene expression from sequence-similar promoters by a mechanism thought to be related to that of RNA interference. Our selection of activating RNA candidates was informed by a custom-written computer program designed to choose target sites in the promoter of interest according to a set of empirical optimality criteria drawn from prior research. Activating RNA candidates were assessed for activity against two exogenously-derived target promoters, with successful candidates being subjected to further rounds of validation as a precaution against potential off-target effects. A genetic platform was assembled that allowed activating RNA candidates to be simultaneously screened both for positive activity on the target reporter gene, and for possible nonspecific effects on cell metabolism. Several candidate sequences were tested to appraise the utility of this platform, with the most successful achieving a moderate activation level with minimal off-target effects.

Keywords RNA activation, promoter, double-stranded RNA, target prediction

1 Introduction

RNA activation (RNAa) is a phenomenon whereby double-stranded RNA (dsRNA) molecules termed small activating RNAs (saRNAs) increase the rate of gene expression from specific loci. RNAa was first discovered in human cells (1, 2), when it was found that small dsRNAs of both native and exogenous origin were capable of stimulating the transcription of native genes whose promoters included a sequence complementary to the introduced dsRNA. As such, RNAa is analogous to RNA interference (RNAi), in that it allows, in principle, for sequence-specific and orthogonal control of the expression of multiple genes (3). Since its discovery, RNAa has been observed in several mammalian cell types (4), and in nematodes (5); moreover, chemical modifications that enhance its activation potential have also been reported (6). More recently, the biological mechanism of RNAa has become the subject of systematic experimental investigation (7–9). Though the specific biochemical pathway has yet to be definitively established, there is substantial evidence of an epigenetic connection (7, 8, 10, 11). It has also been suggested (8, 12) that saRNA molecules may orient RNA polymerase II in the sense direction on the DNA strand, thereby reducing the number of antisense false starts by the enzyme.

For all the recent advances in this field, there remain a few obstacles to the deployment of RNAa in synthetic biological systems. Reported studies up to this point (1, 2, 4–6, 13, 14) have been exclusively concerned with the regulation of native promoters, as opposed to those of exogenous origin. Yet, in order for RNAa to constitute an effective, modular tool in the arsenal of the synthetic biologist, the phenomenon must, to some extent, be functionally divested from its endogenous environment. This is particularly true in mammalian cells, whose promoters tend to be ill-defined and inconveniently long, making them poorly suited to synthetic applications (15, 16). By contrast, many exogenous promoters, which may be derived from viral or other sources, are far better characterized, and are often short enough to package into plasmids for efficient lentiviral delivery into mammalian cells (17). Notably, in bacteria, the simplicity of typical prokaryotic promoters means that RNA-based engineering techniques for transcriptional activation tend to be better understood, and further advanced (18). Unfortunately, due to this very difference in promoter complexity, few control strategies developed in bacteria are viable in eukaryotic contexts. However, several recent approaches that have made use of modified forms of the CRISPR/Cas9 system (19, 20) have achieved notable success in driving increased transcription from targeted loci, apparently with minimal off-target effects (21–24).

In this work, we describe the first RNAa system that targets a viral promoter, the murine stem cell virus (MSCV) 5’ long terminal repeat (5’LTR) (25), upon this promoter’s stable integration into a human cell line via lentiviral infection. The MSCV 5’LTR promoter was scanned by custom-written software that subjected potential target sites on its sequence to a set of heuristic criteria that, from past observations (4, 26), had been found to yield functional saRNAs with reasonably high probability. The saRNA candidate sequences returned by this software were then assessed experimentally by monitoring the level of enhanced green fluorescent protein (EGFP) expressed from the reporter gene downstream of the targeted promoter. Candidates were also tested on cells harboring an EGFP construct driven by an unrelated promoter, to control for the possibility that an observed increase in expression could have been due to off-target effects.

To streamline the validation process, we constructed a genetic platform that allowed saRNAs to be assessed, both for increased gene expression from the MSCV 5’LTR promoter, and for relevant nonspecific effects, concurrently in the same experiment. Our platform consists of two genetic loci: in the first, the MSCV 5’LTR promoter drives expression of EGFP; in the second, the human phosphoglycerate kinase (hPGK) promoter (27) drives expression of enhanced blue fluorescent protein (EBFP2). The plasmid containing this test platform was packaged into a lentivirus, and the platform was then stably integrated, by infection, into a human cancer cell line. Because of the low overlap in the emission spectra of the two fluorescent reporters, the expression levels of both could be monitored simultaneously by flow cytometry. Moreover, the integration of the two loci in close proximity to each other in the genome, along with their presence in each cell in necessarily the same copy numbers, has the effect of reducing or eliminating many sources of error. Our platform was used to evaluate the increase in gene expression caused by, and potential off-target activity due to, several saRNA candidates, and it successfully detected a few instances of nonspecificity.

2 Results and discussion

2.1 Promoter target site identification

High-probability RNAa targets share many thermodynamic characteristics with promising RNAi targets, such as intermediate GC content (40%–65%), lower 3’ stability of the sense strand relative to its 5’ stability, and a bias against large numbers of consecutive identical nucleotides (4, 28). But some additional features are unique to RNAa, such as avoidance of CpG islands by, and minimization of CpG sites within, the targeted sequence (26). With this in mind, we wrote an automated search program in Python (https://www.python.org) to scan a promoter and generate a list of potential RNAa target sites within that promoter that satisfy the criteria set out in (4). The program also attempts to predict possible nonspecific effects by automatically aligning the returned target sites against a collection of known human transcripts, using the National Center for Biotechnology Information’s (NCBI) Basic Local Alignment Search Tool (BLAST). Transcripts that match the target sequence according to known RNAi discovery criteria (28) are flagged by the program.

We began by investigating two well-characterized and readily available promoters for potential sensitivity to RNAa. Our RNAa search program was applied to the cytomegalovirus (CMV) and MSCV 5’LTR promoters, both of which are constitutive. The canonical CMV promoter is entirely contained in a CpG island, whereas the MSCV 5’LTR promoter comprises sub-regions that are not covered by CpG islands (29). On the basis of the program’s output, we selected two promising targets against the CMV promoter (Fig. 1(b)), and three against the MSCV 5’LTR promoter (Fig. 1(a)). We chose these particular targets from among those identified by our program, primarily because they appeared the least prone to nonspecific effects, based on the results of an automated BLAST search. We remark that, in order to identify any potential targets at all on the CMV promoter, the canonical requirement that targets not lie within a CpG island had to be relaxed, since the entire CMV promoter lies within a CpG island (see Fig. 1(b)).
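The sequence-level filters described in this section can be illustrated with a minimal Python sketch. This is not the authors' program: it slides a 19-nucleotide window along a promoter and keeps windows whose GC content lies in the 40%–65% range quoted above. The limits on homopolymer run length (`max_run`) and on the number of CpG sites per window (`max_cpg`), as well as all function names, are assumptions chosen for illustration. The 3’/5’ stability asymmetry, the CpG-island annotation, and the BLAST screening step are omitted.

```python
import re


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of consecutive identical bases."""
    return max(len(m.group(0)) for m in re.finditer(r"(.)\1*", seq))


def candidate_sites(promoter, length=19, gc_min=0.40, gc_max=0.65,
                    max_run=4, max_cpg=1):
    """Return (start, end, sequence) for every window passing the filters.

    Coordinates are 1-based and inclusive.
    max_run and max_cpg are hypothetical thresholds, not values from the paper.
    """
    promoter = promoter.upper()
    sites = []
    for start in range(len(promoter) - length + 1):
        window = promoter[start:start + length]
        if not gc_min <= gc_fraction(window) <= gc_max:
            continue  # GC content outside the intermediate range
        if longest_run(window) > max_run:
            continue  # too many consecutive identical nucleotides
        if window.count("CG") > max_cpg:
            continue  # too many CpG sites within the target
        sites.append((start + 1, start + length, window))
    return sites
```

Reporting 1-based inclusive coordinates follows the convention of Fig. 1, where each 19-nucleotide target begins and ends inclusively at the indicated bases.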

2.2 Test for activity: MSCV 5’LTR and CMV promoters

We tested both CMV saRNA candidates, and all three MSCV 5’LTR saRNA candidates, for activity against their respective target promoters, both in the human kidney cancer cell line HEK293T, and in the human prostate cancer cell line PC-3. We began by infecting both cell lines separately with lentivirus containing the pLJM1 plasmid, which incorporates the CMV promoter driving an enhanced green fluorescent protein (EGFP) reporter gene, and with lentivirus containing the pLL5.0 plasmid, which includes the MSCV 5’LTR promoter driving an EGFP reporter gene. This resulted in four distinct stable cell lines: HEK293T cells infected with pLJM1 (HEK293T-pLJM1); HEK293T cells infected with pLL5.0 (HEK293T-pLL5.0); PC-3 cells infected with pLJM1 (PC-3-pLJM1); and PC-3 cells infected with pLL5.0 (PC-3-pLL5.0). Our lentivirus infection protocol is given in Section 3.3. All plasmid maps are provided in Section 10 of the SI (30), along with GenBank files for each plasmid.

[Figure 1 image. Panel (a) labels: fLTR3, fLTR4, fLTR6; marked bases 38, 47, 56, 65, 144, 162. Panel (b) labels: fCMV1, fCMV3; marked bases 87, 105, 327, 345.]

Figure 1: Promoter maps and target site locations for saRNAs identified by RNAa software for initial validation. Annotated promoter regions are shown in green, CpG islands in grey, and saRNA target sites as pink arrows. All saRNA target sites are 19 nucleotides in length, beginning and ending inclusively at the indicated bases. The sequences of both promoters are provided in the Supporting Information (SI) (30), Sections 1.1 and 1.2. (a) The MSCV 5’LTR promoter, with three identified target sites. There is a CpG island downstream of base 166, with the portion of the promoter upstream of this being amenable to RNAa, according to the criteria of (4). (b) The CMV promoter, with two identified target sites. Notice that the entirety of the promoter lies within a CpG island. Images were created using the gene annotation program Geneious (31). The sequences of all saRNA candidates are provided in Table 1 of the SI (30).

The per-cell fluorescence of the EGFP reporter was assessed in each case by flow cytometry (see Section 3.5). The MSCV 5’LTR-targeting sequences were all found to deliver increased expression from their target locus (Figs. 2(a), 2(b), 2(c); 3(a), 3(b), 3(c)). In HEK293T cells, the saRNAs fLTR3, fLTR4 and fLTR6 were found to result in fluorescence increases of 55%, 146%, and 54% over controls, respectively (see Section 3.5). The corresponding numbers for PC-3 cells were 29% for fLTR3, 49% for fLTR4, and 38% for fLTR6. Controls were mock transfections without saRNA, carried out by following the transfection procedure given in Section 3.4, with the single change that the OPTI-MEM-diluted saRNA mixture was replaced by pure OPTI-MEM. Maximum activity was recorded between 72 h and 96 h following transfection, and EGFP levels increased steadily and consistently over the first three days (see the SI (30), Figs. 1 and 3). These kinetics are consistent with those that have been reported elsewhere for RNAa (6). The CMV-targeting saRNAs, by contrast, exerted little detectable influence upon reporter fluorescence (Figs. 2(d), 2(e); 3(d), 3(e)), and this influence was inconsistent across cell lines and time points (see the SI (30), Figs. 2 and 4).

Owing to the greater observed activity of RNAa in the HEK293T cell line, all subsequent experiments were conducted solely in these cells. The evidence for successful activation of the MSCV 5’LTR promoter, along with the consistency of the effect across different replicates of the HEK293T cell line, and across time points, prompted us to investigate the MSCV 5’LTR-targeting saRNAs in fuller detail.
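To make the reported "increase over controls" concrete, the following minimal sketch computes the percent change of a treated sample relative to mock-transfected controls. It assumes the arithmetic mean is used as the summary statistic for each replicate set; the procedure actually used is described in Section 3.5, which may differ. The function name and the fluorescence values in the test are hypothetical.

```python
from statistics import mean


def percent_change(treated, mock):
    """Percent change of the mean treated signal relative to the mean mock signal.

    treated, mock: per-replicate fluorescence values (arbitrary units).
    """
    baseline = mean(mock)
    return 100.0 * (mean(treated) - baseline) / baseline
```

For example, hypothetical replicate values averaging 1550 units for the treated sample and 1000 units for the mock control correspond to a 55% increase.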


2.3 External control for off-target effects

We sought to verify that the observed increase in the expression of the EGFP reporter from the MSCV 5’LTR locus was due to specific, promoter-dependent activation, rather than to a spurious off-target effect. To this end, we produced a stable line of HEK293T cells, in which EGFP expression was driven by the CMV promoter (see Section 3.3), and then transfected these cells with the fLTR3, fLTR4, and fLTR6 saRNAs. If the effect of these saRNAs were indeed tied to the sequence of the MSCV 5’LTR promoter, we would expect them to have no measurable effect on the expression of a reporter associated with a non-cognate promoter such as CMV.

Transfection with fLTR6 was, indeed, found not to significantly increase the expression of EGFP from the CMV promoter, as compared with mock transfections, over the course of the control experiment (Fig. 4). It thus remained plausible that the upregulatory effect of fLTR6 was, as expected, specific to its cognate target promoter.

By contrast, we observed that two of the three saRNAs, fLTR3 and fLTR4, did in fact increase EGFP expression from the CMV promoter to a significant degree, contrary to our prior expectations (see Fig. 4). This is a strong indication that the previously observed increases in EGFP expression from the MSCV 5’LTR promoter may have been due to nonspecific effects. The fLTR4 saRNA caused a notably large rise in EGFP expression in pLJM1-transfected cells (Fig. 4). Moreover, this expression was associated with a kinetic profile that was inconsistent with its previously recorded effect upon the MSCV 5’LTR promoter (see the SI (30), Fig. 5).

We also applied our fCMV1 and fCMV3 saRNAs, which we previously found did not substantially increase expression from the CMV promoter, to HEK293T cells infected with the MSCV 5’LTR promoter, in order to verify that this promoter was not simply prone to increased expression as a general consequence of the transfection of an saRNA into the host cell. Interestingly, we noted a slight, though statistically significant, decrease in the expression of CMV-driven EGFP 72 hours following the initial transfection (Fig. 5). This indicates that the nonspecific increase in EGFP expression caused by the fLTR3 and fLTR4 saRNAs was probably sequence-dependent, rather than arising as a general consequence of the transfection process itself. More importantly, it suggests that the observed specific effect of fLTR6 was most likely related to its particular sequence.

2.4 RNAa internal control platform

The use of an independent cell line to verify the specificity of an saRNA, though indicative, is nevertheless subject to several potential sources of experimental error. In Section 2.3, activation potential and specificity were tested in two different cell lines, and these two cell lines will inevitably have been subjected to slightly disparate conditions. If these differences in any way substantially affect the RNAa pathway, then they may in and of themselves lead to misleading results, either falsely supporting or falsely calling into question the specificity or activity of a tested saRNA.

To eliminate this source of variability, we designed and assembled a genetic construct that combined the activation and specificity test loci into a single plasmid. The construct pLEH1 (see the SI (30), Fig. 13) harbored the original MSCV 5’LTR-EGFP locus, in addition to a locus which contained the EBFP2 gene driven by the hPGK promoter. Following sequencing and validation, pLEH1 was stably integrated into a human HEK293T cell line by lentiviral infection (see Section 3.3). This infected line then underwent selection by fluorescence-activated cell sorting (FACS), being gated for cells that were bright in both the green and blue channels, and were thus known to be constitutively expressing reporters from both loci (see Section 3.5).

The sorted cell line could then be used as a platform for the general validation of saRNA candidates designed to target the MSCV 5’LTR promoter. The activity of an saRNA candidate was assessed by tracking the increase in EGFP fluorescence that resulted from that saRNA’s transfection into the infected cell line. Simultaneously, the specificity of a candidate could be estimated by measuring the level of EBFP2 fluorescence following transfection. EBFP2 fluorescence should not increase appreciably over mock controls, provided that the transfected saRNA candidate differs markedly in sequence from the hPGK promoter that drives EBFP2 on the test plasmid. The pLEH1 plasmid thus provides a convenient internal control for off-target effects of an saRNA candidate that effect substantial changes to overall rates of protein synthesis.

We used our platform to validate five saRNA candidates in addition to the fLTR6 sequence, all of which targeted the MSCV 5’LTR promoter. These candidates were given the designations fLTR5, fLTR8, fLTRcg, rLTR, and rLTRcg (see Fig. 6). Of these, fLTR5, fLTR8 and fLTRcg all targeted the promoter’s sense strand, whereas rLTR and rLTRcg targeted the promoter’s antisense strand. Both fLTRcg and rLTRcg targeted sequences within the MSCV 5’LTR promoter’s unique CpG island, in breach of the prescription against localizing RNAa targets in GC-rich regions (4).

[Figure 6 image. Labels: rLTRcg, rLTR, fLTR5, fLTR8, fLTRcg; marked bases 3, 21, 42, 52, 60, 70, 129, 147, 274, 292.]

Figure 6: MSCV 5’LTR promoter map with further target site locations for saRNAs identified by RNAa software (see Fig. 1). Software constraints were relaxed to allow for identification of potential RNAa targets within CpG islands, and also of targets on the reverse (antisense) strand of the promoter, which are indicated by pink arrows pointing to the left. saRNA target sites are all 19 nucleotides in length, beginning and ending inclusively at the indicated bases. Image was created using the gene annotation program Geneious (31). The sequences of all saRNA candidates are provided in Table 1 of the SI (30).


We verified the activity and specificity of the fLTR6 saRNA using our platform, observing an increase in EGFP expression over controls of ∼43% at 72 hours post-transfection, which was consistent with the results observed in the earlier pilot experiments (Fig. 7, p ≈ 0.13). At the same time, an increase of only ∼4% in EBFP2 fluorescence was observed as a result of fLTR6 transfection (p ≈ 0.64), indicating a low probability of non-specific effects upon transcription for this saRNA.

All of the saRNAs tested were found to effect changes in fluorescence in the green channel of 20% or more, but only for the rLTR saRNA was the change statistically significant (∼61% increase, p ≈ 0.005). The rLTR sequence, however, was also found to cause a significant fluorescence increase in the EBFP2 channel (∼28% increase, p ≈ 0.02; see Fig. 7). The much higher magnitude and significance level associated with rLTR’s EGFP fluorescence increase, as compared to its increase in EBFP2, indicates that the EGFP increase associated with rLTR may have been caused by a combination of specific and non-specific effects.

The rLTRcg saRNA effected the second-highest mean fluorescence increase in the EGFP channel (∼47% increase, p ≈ 0.09), and led to virtually no change in EBFP2 levels (∼3% decrease, p ≈ 0.72; see Fig. 7). Though the EGFP increase does not reach the significance threshold of p = 0.05, it provides some evidence for a specific, on-target activation effect. By contrast, the fLTRcg saRNA increased both EGFP and EBFP2 levels by roughly similar relative amounts, respectively ∼31% (p ≈ 0.11) and ∼23% (p ≈ 0.04; see Fig. 7). The EBFP2 increase for fLTRcg suggests that the observed increase in EGFP expression for this saRNA was likely caused by off-target changes to cellular metabolism, rather than by the desired sequence-dependent on-target effect. The fLTR8 saRNA gave a similar result, increasing EGFP levels by ∼24% (p ≈ 0.09) and EBFP2 levels by ∼11% (p ≈ 0.19), indicating that the response due to fLTR8 may be ascribable to similar causes as those observed for fLTRcg.

The fLTR5 saRNA candidate delivered an EGFP increase of modest magnitude and significance (∼23% increase, with p ≈ 0.18) with virtually no change in EBFP2 levels compared to controls (∼2% decrease, with p ≈ 0.84; see Fig. 7). In this circumstance, while fLTR5 is likely not causing undue off-target effects, its efficacy as an activator of transcription from its target locus is minimal and somewhat inconclusive.
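The two-channel readout discussed in this section can be summarized with a small Python sketch. The percent changes and p-values below are transcribed from the text above; the decision rule, the labels, and the function name are assumptions made for illustration, not the authors' analysis. The rule flags a candidate only when its EBFP2 increase is itself significant at p ≤ 0.05.

```python
# (percent change, p-value) per channel, transcribed from the text of this section
RESULTS = {
    "fLTR6":  {"EGFP": (43, 0.13),  "EBFP2": (4, 0.64)},
    "rLTR":   {"EGFP": (61, 0.005), "EBFP2": (28, 0.02)},
    "rLTRcg": {"EGFP": (47, 0.09),  "EBFP2": (-3, 0.72)},
    "fLTRcg": {"EGFP": (31, 0.11),  "EBFP2": (23, 0.04)},
    "fLTR8":  {"EGFP": (24, 0.09),  "EBFP2": (11, 0.19)},
    "fLTR5":  {"EGFP": (23, 0.18),  "EBFP2": (-2, 0.84)},
}

ALPHA = 0.05  # significance threshold used in the text


def classify(entry, alpha=ALPHA):
    """Label a candidate from its EGFP (target) and EBFP2 (control) changes.

    Hypothetical rule: a significant EBFP2 increase takes precedence,
    because it signals a possible nonspecific effect.
    """
    egfp_change, egfp_p = entry["EGFP"]
    ebfp2_change, ebfp2_p = entry["EBFP2"]
    if ebfp2_change > 0 and ebfp2_p <= alpha:
        return "possible off-target"
    if egfp_change > 0 and egfp_p <= alpha:
        return "on-target, significant"
    if egfp_change > 0:
        return "on-target trend, not significant"
    return "no activation"


if __name__ == "__main__":
    for name, entry in RESULTS.items():
        print(f"{name:7s} {classify(entry)}")
```

Under this rule only rLTR and fLTRcg are flagged. The rule is stricter than the discussion in the text, which also treats the non-significant ∼11% EBFP2 rise for fLTR8 as a possible sign of nonspecific activity; a magnitude-based cutoff would be needed to reproduce that judgment.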

2.5 qPCR validation of platform

In order to verify that the activity levels observed during our test of the saRNA control platform were in fact due to an increase in transcriptional rather than translational activity, we performed qPCR on EGFP, along with seven control genes (ACTB, CUL2, TIGD2, NUP43, DUSP16, ZNF14, and FIS1). The mRNA levels of these genes were assessed under induction by the fLTR5, fLTR6, and fLTRcg saRNA candidates (see Fig. 7), along with the control sequence fCMV1. The results are shown in Fig. 8, with more detail provided in Fig. 8 of the SI (30). Increases in average mRNA levels over controls for EGFP were found to be 1% for fCMV1, 43% for fLTR5, 87% for fLTR6, and 84% for fLTRcg. These results differ slightly in magnitude from those of Fig. 7 (see green bars), but the trend does appear consistent, considering the differences between protein and mRNA levels.

Results for the endogenous control genes appear to broadly follow the trends of Fig. 7 in most cases. In particular, fLTRcg was identified in our platform test as potentially resulting in off-target activity, as it was found to increase EBFP2 levels beyond those of mock controls. This trend was supported by our qPCR results, which indicate that this saRNA candidate increased the mRNA levels of all genes tested, and of three of the tested genes significantly, above those of controls. This strongly suggests that fLTRcg acts to increase transcription in a way that is not specific to its putative target gene, and confirms the fluorescent reporter-based result obtained from our platform for this candidate sequence. The fLTR5 saRNA caused significantly increased transcriptional activity only in the EGFP gene, and did not significantly affect activity of the control genes. The fLTR6 saRNA caused a slight increase of activity (16%) for 1 of 7 examined genes (p ≈ 0.04). This suggests that gene-specific transcriptional activation of EGFP is likely occurring in our system, which is consistent with the results of Fig. 7 for these saRNAs.
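For readers unfamiliar with how qPCR data are expressed relative to a reference gene and a mock control, the sketch below shows the widely used 2^(−ΔΔCt) calculation. This section does not state which quantification method was used, so the choice of this method is an assumption, as is the implied amplification efficiency of 100% (a doubling per cycle). The function name and the threshold-cycle (Ct) values in the test are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_mock, ct_ref_mock):
    """Fold change of a target gene versus mock, normalized to a reference gene.

    Implements 2 ** -(ddCt), where
      dCt  = Ct(target) - Ct(reference)   within each sample (e.g. reference = ACTB)
      ddCt = dCt(treated) - dCt(mock)
    Assumes perfect doubling of product in every PCR cycle.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_mock = ct_target_mock - ct_ref_mock
    return 2.0 ** -(delta_treated - delta_mock)
```

A fold change of 1.0 means no change relative to mock; a target whose normalized Ct drops by one cycle after treatment has doubled, which corresponds to a 100% increase on the scale used in this section.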

2.6 Significance and future work

RNA activation is a promising phenomenon from the standpoint of synthetic biology, because it potentially allows a biologist to modulate the expression of an unlimited number of genes independently, without the laborious protein engineering entailed by a search for artificial transcription factors. Until now, however, the promise of RNAa has been confined to native target promoters, which do not afford the flexibility or facility of direct manipulation provided by an exogenous system amenable to synthetic fine-tuning. We have advanced evidence that RNAa can target an exogenous promoter successfully in a mammalian cellular context, and show that exogenous promoters are amenable to essentially similar discovery techniques that have previously been deployed in the targeting of their native counterparts for RNAa. Indeed, our saRNA screening software identified two or three saRNAs (namely, fLTR6, rLTRcg, and possibly fLTR5) that detectably increased the expression of the target protein (EGFP)while effecting no visible change to levels of the internal control reporter (EBFP2). This success rate of 25-38% is comparable to that reported elsewhere (4 ) for a heuristic algorithm working based on similar selection criteria in the context of native genes. Though our research has not revealed saRNAs that have the same three- to six-fold activation potency as some of the others that have been reported (4 ), it does constitute a proof of concept for the deployment of RNAa in a synthetic context. Moreover, it is known that even a modest increase in expression of a target gene may have biologically relevant effects if the target gene is coupled downstream to a sensitive pathway (32 ), and this potentially imparts direct utility on our discovered saRNAs. Our findings also lend some support to the heuristic saRNA discovery framework advanced in (4 , 26 ), though notably with some mixed evidence with regard to the effect of CpG islands on RNAa activity. 
Our results do indicate, however, that this framework might be applied more broadly, beyond the native promoters of mammalian (4) and other eukaryotic cells (5), to virus-derived promoters introduced artificially into eukaryotic genomes. Questions remain to be addressed with regard to the modularity of the activation phenomenon in the broader genetic context in which it acts. Future investigations could focus, for example, on the effects of varying both the target promoter and the output gene. Preliminary results (data not shown) indicate that the activation persists upon replacement of EGFP with d2EGFP, a faster-degrading form of the protein (but one sharing most of its sequence with EGFP). Testing the activation over a wider range of promoter and gene targets will be required before the degree of modularity of the phenomenon can be established.

Apart from its utility as a validator of saRNAs for synthetic biological applications, our RNAa control platform may also serve as an effective starting point in the experimental investigation of the basic biology of RNAa. It may be possible, for instance, to mutate sites on the MSCV 5’LTR promoter, in an effort to make certain regions of the sequence more amenable to saRNA activity, with or without materially affecting the promoter’s constitutive level of activity. The presence of a synthetic platform might also allow for the investigation of which region of the promoter is best to target to attain maximal results from RNAa; it may be that target sequences distal to the transcription start site are more responsive than proximal ones, for example. The artificial construct created in this work provides a means of carrying out such an investigation in a systematic manner, making the process of inferring the characteristics of a successful saRNA far more straightforward, and accelerating research upon, and systematization of, the RNAa phenomenon in the interest of both biological research and synthetic genetic circuit assembly.

Figure 8: qPCR results showing levels of EGFP, CUL2, TIGD2, NUP43, DUSP16, ZNF14, and FIS1 gene expression relative to mock transfections in RNA samples obtained from three biological replicates (each consisting of three technical replicates) of the HEK293T-pLL5.0 cell line, sampled 48 hours after transfection with 50 nM of the indicated saRNAs (fCMV1, fLTR5, fLTR6, and fLTRcg). Each gene’s RNA levels have been normalized to those of actin (ACTB). Error bars show ± one standard deviation. An asterisk (*) above a column in a panel denotes a significant deviation (p ≤ 0.05) of that column from the transfection with the control fCMV1 saRNA for that panel. Two asterisks (**) correspond to p ≤ 0.01.

3 Materials and methods

3.1 Calculations and software

RNAa targets were identified and ranked using a custom-written Python program (https://www.python.org/), the source code for which will be made available upon request. This program was based on a series of design criteria that have previously been published (4). In brief, the program takes a promoter sequence as input and converts it into a list of all possible subsequences 19 nucleotides in length. Each subsequence, which is a potential RNAa target, then undergoes a series of checks; if the putative target fails any one of these checks, it is eliminated from the list of potential candidates. For each target sequence, the program verifies (1) that the target does not contain a stretch of four or more consecutive identical nucleotides; (2) that the 5’ end of the target (defined as the last four bases of the target (33)) is more thermodynamically stable than its 3’ end (defined as the first four bases of the target (33)), where the nearest-neighbor method is used to calculate nucleotide binding energies according to the values of the Gibbs free energies provided in (34); (3) that the proportion of GC nucleotides in the target is between 40% and 65%; (4) that the nucleotide at the 19th position of the target (that is, at the very 3’ end of the target) is an adenine (A); (5) that the nucleotide at the 18th position of the target is either an A or a thymine (T); (6) that the nucleotide at the 7th position of the target is a T; (7) that the target does not overlap with CpG sites; and (8) that the target does not overlap with CpG islands, with CpG islands being as defined in (29). Target sequences that survive the above checks are optionally aligned by the program against the human nucleotide transcript collection, using NCBI’s BLAST tool.
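The purely sequence-based checks listed above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' program, whose source is not reproduced here. The function names are hypothetical, check (7) is interpreted as "the target contains no CG dinucleotide", the GC bounds are assumed to be inclusive, and checks (2) and (8) are omitted because they depend on the nearest-neighbor energy table of (34) and the CpG island definition of (29).

```python
import re

TARGET_LENGTH = 19  # window size stated in the text


def candidate_windows(promoter):
    """Yield (start, subsequence) for every 19-nt window of the promoter."""
    promoter = promoter.upper()
    for start in range(len(promoter) - TARGET_LENGTH + 1):
        yield start, promoter[start:start + TARGET_LENGTH]


def passes_sequence_checks(target):
    """Apply checks (1) and (3)-(7) to one 19-nt target written as DNA."""
    # (1) no run of four or more identical nucleotides
    if re.search(r"(.)\1{3}", target):
        return False
    # (3) GC content between 40% and 65% (bounds assumed inclusive)
    gc_fraction = (target.count("G") + target.count("C")) / len(target)
    if not 0.40 <= gc_fraction <= 0.65:
        return False
    # (4) position 19 is A; (5) position 18 is A or T; (6) position 7 is T
    if target[18] != "A" or target[17] not in "AT" or target[6] != "T":
        return False
    # (7) assumed interpretation: no CpG dinucleotide inside the target
    if "CG" in target:
        return False
    return True


def find_candidates(promoter):
    """Return all (start, target) pairs that survive the checks above."""
    return [(start, target)
            for start, target in candidate_windows(promoter)
            if passes_sequence_checks(target)]


# Hypothetical toy input, not a real promoter sequence.
toy_promoter = "TT" + "GCAGTCTGACTGCATGCAA" + "GG"
toy_candidates = find_candidates(toy_promoter)
```

Positions are 1-based in the text and 0-based in the code, so the 19th, 18th, and 7th nucleotides are `target[18]`, `target[17]`, and `target[6]`.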
The BLAST results are parsed by the program to identify targets whose RNAi seed regions closely match those of human genetic transcripts, since the saRNA corresponding to that target sequence is intended to be deployed in human cells (35). Genes that are deemed to yield close matches to the saRNA seed region are flagged as possible off-target matches, and passed to the user of the program for manual assessment. The final output is a list of saRNA candidates that may target the promoter of interest, along with the possible off-target matches identified for each one of these candidates.

All p-values were obtained by applying a two-tailed t-test between treated and control samples, as indicated in the appropriate figures. The calculation was implemented using the ttest_ind() function of scipy.stats, a submodule of the third-party Python library SciPy.
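A call of the kind described might look like the following sketch. The replicate values are invented for illustration only, the helper `significance_stars` is hypothetical, and the text does not say whether equal variances were assumed, so the SciPy default (`equal_var=True`, two-sided) is used here as an assumption.

```python
from scipy import stats

# Invented triplicate measurements (arbitrary units), for illustration only.
control = [1.00, 0.95, 1.05]
treated = [1.60, 1.45, 1.70]

# ttest_ind is two-sided by default and assumes equal variances by default.
result = stats.ttest_ind(treated, control)
p_value = result.pvalue


def significance_stars(p):
    """Map a p-value to the asterisk convention used in the figure captions."""
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


label = significance_stars(p_value)
```

Passing `equal_var=False` would give Welch's t-test instead; which variant suits a given data set is a judgment the sketch does not settle.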

3.2 Plasmid cloning

The plasmids used for our initial pilot experiments with the CMV and MSCV 5’LTR promoters were, respectively, pLJM1 and pLL5.0 (see the Supporting Information (SI) (30), Sections 10.2 and 10.4). The external control experiment was carried out by transfecting the saRNAs targeting the MSCV 5’LTR promoter into stable HEK293T cell lines harboring pLJM1, which incorporates the CMV promoter (Fig. 4). The data for Fig. 5 were obtained by performing the opposite experiment, transfecting the saRNAs targeting the CMV promoter into stable HEK293T cell lines infected with pLL5.0, which includes the MSCV 5’LTR promoter.

The internal control plasmid pLEH1 (SI (30), Section 10.6) was constructed in two steps, using both traditional restriction enzyme cloning and the Gibson assembly method (36). Our starting material consisted of the pLJM1 plasmid, which incorporates one locus in which the EGFP gene is driven by the CMV promoter, and one locus in which the puromycin resistance (puroR) gene is driven by the hPGK promoter. We began by constructing plasmid pLJM1-EBFP2 (SI (30), Section 10.3) by first amplifying the EBFP2 gene from the plasmid pLJY1 (SI (30), Section 10.7), with forward primer EBFP2-pLJM1 FWD (sequence 5’-CTC CCT CGT TGA CCG AAT CAC CGA CCT CTC TCC CCA GGG GGA TCC ACC GGA GCT TAC CAT GGT GAG CAA GGG CGA GGA GC-3’), and reverse primer EBFP2-pLJM1 REV (sequence 5’-CAT TGG TCT TAA AGG TAC CGA TGC ATG GGG TCG TGC GCT CCT TTC GGT CGG GCG CTG CGG GTC GTG GGG CGG GCG AGC TCG TCC ATG CCG AGA GTG ATC C-3’), both of which include 3’ flanking sequences that overlap with the pLJM1 backbone. The pLJM1 plasmid was then restriction-digested with BamHI and NsiI, and the backbone joined to the amplified EBFP2 gene using a Gibson Assembly Cloning Kit (New England Biolabs), thereby replacing the puroR gene in pLJM1 with the EBFP2 gene from pLJY1. Insertion of EBFP2 into pLJM1 was verified by sequencing the plasmid (Sanger method) using the primer pLJM1 EBFP2 SEQ1F, whose sequence is given in Table 2 of the SI (30), and by microscopic observation that transiently transfected cells fluoresced both green and blue, as expected.

Next, we replaced the CMV promoter driving EGFP in pLJM1-EBFP2 with the MSCV 5’LTR promoter, taken from pLL5.0. To do this, we Gibson-assembled the backbone of pLJM1-EBFP2 (amplified from pLJM1-EBFP2 using the forward primer gibs-pLJM1-EBFP2 FWD, whose sequence is 5’-GCA ATT CGT CGA GGG ACC TAG CTT CGA ATT CTC GAC CTC G-3’, and the reverse primer gibs-pLJM1-EBFP2 REV, whose sequence is 5’-CGA CCT GCT GGA ATC TCG TGT CAT GGG AAA TAG GCC CTC G-3’) and the MSCV 5’LTR promoter (amplified from pLL5.0 using the forward primer gibs-LTR FWD, whose sequence is 5’-CGA GGG CCT ATT TCC CAT GAC ACG AGA TTC CAG CAG GTC G-3’, and the reverse primer gibs-LTR REV, whose sequence is 5’-CGA GGT CGA GAA TTC GAA GCT AGG TCC CTC GAC GAA TTG C-3’), using a Gibson Assembly Cloning Kit as before. Insertion of the MSCV 5’LTR promoter into pLJM1-EBFP2 was verified by sequencing the promoter region with the primers pLEH1 LTR SEQ1F and pLEH1 LTR SEQ2F, whose sequences are provided in Table 2 of the SI (30). We also confirmed plasmid functionality by microscopic observation of transfected and lentivirally-infected cells, as well as by flow cytometry during FACS.
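As a quick sanity check on the four Gibson primers quoted above (and not a step reported by the authors), one can confirm that they form two reverse-complementary pairs, which is what gives the backbone and promoter fragments matching 40-nt ends. The sketch below uses only the primer sequences as printed in the text; the dictionary keys and helper names are hypothetical.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer):
    """Remove the triplet spacing used in the text and upper-case the sequence."""
    return primer.replace(" ", "").upper()


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


# Sequences copied from the text, written 5'->3'.
primers = {
    "backbone_fwd": clean("GCA ATT CGT CGA GGG ACC TAG CTT CGA ATT CTC GAC CTC G"),
    "backbone_rev": clean("CGA CCT GCT GGA ATC TCG TGT CAT GGG AAA TAG GCC CTC G"),
    "ltr_fwd": clean("CGA GGG CCT ATT TCC CAT GAC ACG AGA TTC CAG CAG GTC G"),
    "ltr_rev": clean("CGA GGT CGA GAA TTC GAA GCT AGG TCC CTC GAC GAA TTG C"),
}

# Each promoter primer should be the reverse complement of the backbone
# primer at the opposite junction, so the two PCR products share their ends.
junction_1_ok = reverse_complement(primers["ltr_fwd"]) == primers["backbone_rev"]
junction_2_ok = reverse_complement(primers["backbone_fwd"]) == primers["ltr_rev"]
overlap_length = len(primers["ltr_fwd"])
```

Both comparisons hold for the sequences as printed, so each junction carries a full-length primer overlap.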

3.3 Lentiviral production and infection

Lentivirus for the delivery of each plasmid was produced over a 5-day cycle. On day 1, the HEK293T packaging cells were seeded in 10 mL of low-antibiotic growth medium (Dulbecco’s modified Eagle’s medium (DMEM), 10% fetal bovine serum (FBS), and 0.1% penicillin / streptomycin, all from Gibco, Lifetech) on 10 cm clear polystyrene tissue culture plates (Corning), at a density of ∼3.5 million cells per plate. These cells were left at room temperature for 20 to 30 minutes prior to incubation at 37 °C, in an atmosphere of 5% CO2, for 24 hours. On day 2, cells were transfected by preparing a DNA mixture of 6.0 µg of the chosen plasmid, 5400 ng of the PAX2 lentiviral delivery vector, and 600 ng of the MD2G lentiviral delivery vector per 10 cm plate. This mixture was then combined with a mixture of 540 µg of OPTI-MEM (Gibco, LifeTech) and 36 µL of X-tremeGENE 9 DNA Transfection Reagent (Roche) per 10 cm plate, following a 5-minute incubation of the latter mixture at room temperature. The combined mixture was then incubated for a further 45 minutes at room temperature, and subsequently transferred, drop-by-drop, to the packaging cells. The transfected packaging cells were incubated for 18 hours at 37 °C and 5% CO2. On day 3, the media was removed by aspiration, and replaced with 10 mL of high-bovine serum albumin (BSA) growth medium (DMEM with 6.4% 200 g/L BSA stock solution (BSA from BioShop) and 1% penicillin / streptomycin) for later viral harvests. Cells were incubated in the new media for a further 24 hours at 37 °C and 5% CO2. On day 4, the high-BSA media overlying the packaging cells, which contained the live lentivirus, was collected and purified by centrifugation at 200 × g for 3 minutes. This media was replaced with 10 mL of fresh high-BSA media, and cells were incubated for a further 24 hours at 37 °C and 5% CO2. On day 5, a final viral harvest was collected by the same method as on day 4, and the supernatants from days 4 and 5 were combined. Live virus was stored at −80 °C until required for infections.
Viral infections were carried out on cell suspensions by combining, in each well of a six-well tissue culture plate (Corning), approximately 200 000 cells, either HEK293T or PC-3, with 4 µL of 2 mg / mL polybrene stock solution (hexadimethrine bromide; Sigma) in a variable volume of standard cell growth media (DMEM with 10% FBS for HEK293T; RPMI 1640 + L-glutamine (Gibco, Lifetech) with 10% FBS for PC-3). This cell / polybrene suspension was then combined with a mixture of standard cell growth media and the appropriate live lentivirus solution, in variable proportions, that totaled 1 mL per well of a six-well plate. The volume proportions of lentivirus were varied from 1% to 100%, depending upon the plasmid to be integrated, to obtain different viral titers. The cell-lentivirus suspension was then incubated for 24 hours at 37 °C and 5% CO2. Following this incubation, the supernatant, containing live virus, was aspirated and replaced with standard cell growth media, and the cells were grown to approximately 70% confluence. All cell lines except those infected with pLJM1 were then passaged, and selected for the appropriate fluorescent marker by fluorescence-activated cell sorting (FACS), described in Section 3.5. HEK293T-pLJM1 and PC-3-pLJM1 cells, which expressed the puromycin resistance gene puroR, were selected by treatment with standard growth media supplemented with 2 µg / mL puromycin (Sigma).

3.4 saRNA transfections and cell growth

A forward transfection protocol was used to introduce the candidate dsRNA molecules into the cell lines described above. Assorted test saRNAs were ordered as custom siRNAs (Sigma); all sequences are provided in Table 1 of the SI (30). Note that all of these dsRNAs additionally possessed UU overhangs at the 3’ ends of each strand. Transfections were carried out in 12-well tissue culture plates (Corning), over the course of two days. On day 1, cells were seeded at densities of 10 000 to 20 000 per well, depending upon the experiment, in 2.5 mL of the appropriate standard cell growth media (see Section 3.3). On day 2, the transfection was performed by diluting an appropriate volume of 50 µM saRNA stock solution in 250 µL of OPTI-MEM (per well of a 12-well plate), diluting 5 µL of Lipofectamine RNAiMAX (LifeTech) in another 250 µL of OPTI-MEM, combining these two mixtures, and allowing the combined mixture to incubate at room temperature for 10 to 20 minutes. This combined mixture was then added to the cells, making a total volume of 3 mL of growth media. The transfected cells were incubated at 37 °C and 5% CO2 for between 24 and 120 hours, prior to being detached from their wells with 250 µL of EDTA solution (190 mg ethylenediaminetetraacetic acid disodium salt dihydrate (EDTA), 4.0 g NaCl, 280 mg NaHCO3, 500 mg dextrose, and 200 mg KCl (all chemicals from Sigma) in 500 mL of de-ionized water), then resuspended in Dulbecco’s phosphate buffered saline (PBS, Gibco, Lifetech) for subsequent cell sorting or flow cytometry.
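The "appropriate volume" of stock follows from the usual dilution relation C1·V1 = C2·V2. The sketch below assumes that the final saRNA concentration is defined with respect to the 3 mL total volume in the well; the text does not state this explicitly, and the function name is hypothetical. The 50 nM example dose is the one quoted in the Figure 8 caption.

```python
def sarna_stock_volume_ul(final_nm, total_volume_ml, stock_um=50.0):
    """Volume of stock (in µL) needed to reach final_nm in total_volume_ml.

    From C1 * V1 = C2 * V2 with C1 in µM, C2 in nM, and V2 in mL:
    V1 [µL] = C2 [nM] * V2 [mL] / C1 [µM].
    """
    return final_nm * total_volume_ml / stock_um


# 50 nM final in a 3 mL well, from the 50 µM stock described in the text.
volume_for_50_nm = sarna_stock_volume_ul(final_nm=50.0, total_volume_ml=3.0)
```

Under these assumptions a 50 nM dose corresponds to 3 µL of stock per well, and other doses scale linearly.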

3.5 Flow cytometry and cell sorting

To quantify the green and blue fluorescence of experimental cells, populations suspended in 250 µL of EDTA solution (see Section 3.4) were treated with 1 µL of 7-AAD live cell stain (BioLegend). Samples corresponding to the initial experiments of Fig. 2 were, however, not treated with 7-AAD. For all samples except those used to validate the saRNA internal control platform, prepared cells were analyzed by a FACSCalibur flow cytometer (BD Biosciences). Events were first gated in the forward- and side-scatter (FSC and SSC) channels, in order to identify those most likely to correspond to actual cells. Events were further gated in the red FL3 channel (682 nm / 30 nm) to identify live cells by their fluorescence due to the 7-AAD stain. Finally, the fluorescence of these live cells was recorded in the green FL1 channel (530 nm / 30 nm). The internal control samples shown in Fig. 7 were run on an LSRFortessa flow cytometry system (BD Biosciences), coupled to a high-throughput sampler (HTS, BD Biosciences). As above, events were gated in the FSC-H and SSC-H channels to enrich for HEK293T cells. These were, in turn, gated for live cells in the red channel (710 nm / 50 nm). The EGFP and EBFP2 fluorescences of the live cells were assessed, respectively, in the green (530 nm / 30 nm) and blue (450 nm / 50 nm) channels. All gating was carried out post-experimentally, using the flow cytometry software FlowJo 7.6.5 (FlowJo LLC). Events gated as live cells were exported from FlowJo as raw data, and this data was analyzed using a custom-written Python script, the source code for which will be made available upon request. Except in the case of cells containing the pLJM1 plasmid, the plasmids bearing the promoters being tested for RNAa activity incorporated no antibiotic resistance markers, so the cells in which these plasmids were integrated had to be selected by FACS (see below). 
Since FACS is prone to more false positives than is antibiotic selection, the resulting sorted cell populations were composed of two main sub-populations, one of which was successfully infected with the plasmid of interest, and one of which was not; this can clearly be discerned, for example, in Fig. ??. Consequently, when comparing mock controls to saRNA-treated populations, the triplicate fluorescence measurements shown in all figures are the averaged medians of the populations’ fluorescences. Had the population means been used instead, the uninfected sub-population would have caused the shift due to RNAa to incorrectly appear reduced, because this shift only occurs in the sub-population that actually harbors the reporter gene. All cell lines, with the exception of that infected with pLJM1, were sorted using a FACSAria II (BD Biosciences) instrument, by gating, in the case of the pLL5.0 and pLL5.1 plasmids, for green cells (525 nm / 50 nm). In the case of the pLEH1 plasmid, cells were gated for both green and blue (450 nm / 50 nm) fluorescence. Cells were prepared in standard sorting buffer (1× PBS, 1 mM EDTA, 25 mM HEPES (Sigma), pH 7.0, 1% BSA), and collected in FBS-rich media (standard media, with 50% FBS rather than 10%) prior to seeding in high-antibiotic media (standard media, with 2% penicillin / streptomycin).
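The reasoning behind the averaged-median summary described above can be illustrated with a toy calculation. All numbers below are invented: each replicate is assumed to hold seven infected cells and three uninfected cells with a fixed background signal, and the treatment is assumed to raise only the infected cells' signal by a factor of 1.5. The function names are hypothetical and this is not the authors' analysis script.

```python
from statistics import mean, median


def averaged_median(replicates):
    """Average of the per-replicate median fluorescence values."""
    return mean(median(events) for events in replicates)


def averaged_mean(replicates):
    """Average of the per-replicate mean fluorescence values."""
    return mean(mean(events) for events in replicates)


def toy_replicate(infected_signal, background=20, n_infected=7, n_uninfected=3):
    """Build one invented replicate: a dim uninfected and a bright infected group."""
    return [background] * n_uninfected + [infected_signal] * n_infected


# Invented triplicates; only the infected cells respond to the saRNA.
mock = [toy_replicate(s) for s in (100, 102, 98)]
treated = [toy_replicate(s) for s in (150, 153, 147)]

median_fold = averaged_median(treated) / averaged_median(mock)
mean_fold = averaged_mean(treated) / averaged_mean(mock)
```

In this toy case the median-based ratio recovers the full 1.5-fold shift, while the mean-based ratio is pulled toward 1 by the unresponsive cells. The median tracks the infected sub-population only as long as that sub-population makes up the majority of gated events.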

3.6 RNA isolation and qPCR

Transfections were carried out in 12-well tissue culture plates in biological triplicates as described in Section 3.4, except that on day 1 cells were seeded at a density of 100 000 per well. The transfected cells were incubated at 37 °C and 5% CO2 for 48 hours, and RNA was extracted using the RNeasy Plus mini kit (Qiagen) following the manufacturer’s protocol. RNA was then reverse-transcribed using SuperScript II Reverse Transcriptase (Thermo Fisher Scientific) according to the manufacturer’s protocol. Real-time PCR was performed on a Bio-Rad CFX96 instrument, using the Power SYBR Green Master Mix (Applied Biosystems) with a 1/300 dilution of cDNA and a number of primer pairs. For each primer pair, a calibration curve was generated using dilutions of cDNA from the untransfected HEK293T-pLL5.0 cell line. Every PCR was performed in triplicate. SYBR Green detection was followed by melting curve analysis to ensure that a single fragment was amplified during the reaction. Relative expression levels were calculated using actin as the endogenous control. Primer sequences are presented in Table 3 in the SI (30).
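The text does not give the formula used to turn quantification cycles into relative expression. One common approach that is consistent with the calibration curves and actin normalization described above is the standard-curve method, sketched here as an assumption rather than as the authors' procedure. All Cq values, dilutions, and function names below are hypothetical.

```python
import math


def fit_standard_curve(dilutions, cq_values):
    """Least-squares fit of Cq = slope * log10(dilution) + intercept."""
    xs = [math.log10(d) for d in dilutions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, curve):
    """Invert the calibration curve to get a relative template quantity."""
    slope, intercept = curve
    return 10 ** ((cq - intercept) / slope)


def normalized_expression(cq_target, cq_reference, target_curve, reference_curve):
    """Target quantity divided by the reference-gene (here ACTB) quantity."""
    return (quantity_from_cq(cq_target, target_curve)
            / quantity_from_cq(cq_reference, reference_curve))


# Invented calibration data: ten-fold dilutions of control cDNA.
egfp_curve = fit_standard_curve([1, 0.1, 0.01], [20.0, 23.32, 26.64])
actb_curve = fit_standard_curve([1, 0.1, 0.01], [16.0, 19.32, 22.64])

# Invented sample Cq values: (target gene, ACTB).
mock = normalized_expression(25.0, 18.0, egfp_curve, actb_curve)
treated = normalized_expression(24.0, 18.0, egfp_curve, actb_curve)
fold_change = treated / mock  # expression relative to the mock transfection
```

With a slope near −3.32 cycles per ten-fold dilution, a one-cycle decrease in the target Cq at constant ACTB corresponds to roughly a two-fold increase in relative expression.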

Acknowledgement

The authors thank Franco Vizeacoumar, Richard Kil, and Brendan Hussey for valuable suggestions and helpful discussions. This work was funded by the Ontario Research Fund GL2 Program, and by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the Discovery grant and Vanier Canada Graduate Scholarship programmes.

Supporting Information Available

The following files are available free of charge.

• Harris et al - Activating RNA - SI.pdf: Sequences of targeted promoters, candidate saRNAs, primers, all plasmid maps, and additional time course and qPCR data.
• Harris et al - GenBank files for plasmids.zip: An archive of GenBank-format files for each plasmid used in this study. All files are named simply [plasmid name].gb: pLEH1.gb; pLJM1-EBFP2.gb; pLJM1.gb; pLJM17.gb; pLJY1.gb; pLL5.0.gb; and pLL5.1.gb.

This material is available free of charge via the Internet at http://pubs.acs.org/.

References

1. Li, L. C., Okino, S. T., Zhao, H., Pookot, D., Place, R. F., Urakami, S., Enokida, H., and Dahiya, R. (2006) Small dsRNAs induce transcriptional activation in human cells. PNAS 103, 17337–17342.
2. Place, R. F., Li, L. C., Pookot, D., Noonan, E. J., and Dahiya, R. (2008) MicroRNA-373 induces expression of genes with complementary promoter sequences. PNAS 105, 1608–1613.
3. Rinaudo, K., Bleris, L., Maddamsetti, R., Subramanian, S., Weiss, R., and Benenson, Y. (2007) A universal RNAi-based logic evaluator that operates in human cells. Nat. Biotechnol. 25, 795–801.
4. Huang, V., Qin, Y., Wang, J., Wang, X., Place, R. F., Lin, G., Lue, T. F., and Li, L. C. (2010) RNAa is conserved in mammalian cells. PLoS ONE 5, e8848.
5. Turner, M., Jiao, A., and Slack, F. J. (2014) Autoregulation of lin-4 microRNA transcription by RNA activation (RNAa). Cell Cycle 13, 772–781.
6. Place, R. F., Noonan, E. J., Foldes-Papp, Z., and Li, L. C. (2010) Defining features and exploring chemical modifications to manipulate RNAa activity. Curr. Pharm. Biotechno. 11, 518–526.
7. Janowski, B. A., Younger, S. T., Hardy, D. B., Ram, R., Huffman, K. E., and Corey, D. R. (2007) Activating gene expression in mammalian cells with promoter-targeted duplex RNAs. Nat. Chem. Biol. 3, 166–173.
8. Cecere, G., Hoersch, S., O’Keefe, S., Sachidanandam, R., and Grishok, A. (2014) Global effects of the CSR-1 RNA interference pathway on the transcriptional landscape. Nat. Struct. Mol. Biol. 21, 358–367.
9. Portnoy, V., Lin, S. H. S., Li, K. H., Burlingame, A., Hu, Z.-H., Li, H., and Li, L. C. (2016) saRNA-guided Ago2 targets the RITA complex to promoters to stimulate transcription. Cell Res. 26, 320–335.
10. Morris, K. V., Santoso, S., Turner, A. M., Pastori, C., and Hawkins, P. G. (2008) Bidirectional transcription directs both transcriptional gene activation and suppression in human cells. PLoS Genet. 4, e1000258.
11. Guo, G., Barry, L., Lin, S. S. H., Huang, V., and Li, L. C. (2015) RNAa in action: from the exception to the norm. RNA Biol. 11, 1221–1225.
12. Huang, V., Zheng, J., Qi, Z., Wang, J., Place, R. F., Yu, J., Li, H., and Li, L. C. (2013) AgoI interacts with RNA polymerase II and binds to the promoters of actively transcribed genes in human cells. PLoS Genet. 9, e1003821.
13. Wang, X., Wang, J., Huang, V., Place, R. F., and Li, L. C. (2012) Induction of NANOG expression by targeting promoter sequence with small activation RNA antagonizes retinoic acid-induced differentiation. Biochem. J. 443, 821–828.
14. Matsui, M., Sakurai, F., Elbashir, S., Foster, D. J., Manoharan, M., and Corey, D. R. (2010) Activation of LDL receptor expression by small RNAs complementary to a noncoding transcript that overlaps the LDLR promoter. Chem. and Biol. 17, 1344–1355.
15. Carninci, P. et al. (2006) Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635.
16. Lin, H., and Li, Q. Z. (2011) Eukaryotic and prokaryotic promoter prediction using hybrid approach. Theor. Biosci. 7, 91–100.
17. Qin, J. Y., Zhang, L., Clift, K. L., Hulur, I., Xiang, A. P., Ren, B. Z., and Lahn, B. T. (2010) Systematic comparison of constitutive promoters and the doxycycline-inducible promoter. PLoS ONE 5, e10611.
18. Meyer, S., Chappell, J., Sankar, S., Chew, R., and Lucks, J. B. (2015) Improving fold activation of small transcription activating RNAs (STARs) with rational RNA engineering strategies. Biotechnology and Bioengineering
19. Cong, L., Ran, F. A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P. D., Wu, X., Jiang, W., Marraffini, L. A., and Zhang, F. (2013) Multiplex genome engineering using CRISPR/Cas System. Science 339, 819–823.
20. Mali, P., Yang, L., Esvelt, K. M., Aach, J., Guell, M., DiCarlo, J. E., Norville, J. E., and Church, G. M. (2013) RNA-guided human genome engineering via Cas9. Science 339, 823–826.
21. Gilbert, L. A., Horlbeck, M. A., Adamson, B., Villalta, J. E., Chen, Y., Whitehead, E. H., Guimaraes, C., Panning, B., Ploegh, H. L., Bassik, M. C., Qi, L. S., Kampmann, M., and Weissman, J. S. (2014) Genome-scale CRISPR-mediated control of gene repression and activation. Cell 3, 647–661.
22. Tanenbaum, M. E., Gilbert, L. A., Qi, L. S., Weissman, J. S., and Vale, R. D. (2014) A protein tagging system for signal amplification in gene expression and fluorescence imaging. Cell 3, 635–646.
23. Dahlman, J. E., Abudayyeh, O. O., Joung, J., Gootenberg, J. S., Zhang, F., and Konermann, S. (2015) Orthogonal gene knockout and activation with a catalytically active Cas9 nuclease. Nat. Biotechnol. 33, 1159–1161.
24. Konermann, S., Brigham, M. D., Trevino, A. E., Joung, J., Abudayyeh, O. O., Barcena, C., Hsu, P. D., Habib, N., Gootenberg, J. S., Nishimasu, H., Nureki, O., and Zhang, F. (2015) Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588.
25. Hong, S., Hwang, D. Y., Yoon, S., Isacson, O., Ramezani, A., Hawley, R. G., and Kim, K. S. (2007) Functional analysis of various promoters in lentiviral vectors at different stages of in vitro differentiation of mouse embryonic stem cells. Mol. Ther. 15, 1630–1639.
26. Wang, J., Place, R. F., Portnoy, V., Huang, V., Kang, M. R., Kosaka, M., Ho, M. K. C., and Li, L. C. (2015) Inducing gene expression by targeting promoter sequences using small activating RNAs. J. Biol. Methods 2, e14.
27. Pfeifer, G. P. (1992) Analysis of chromatin structure by ligation-mediated PCR. Genome Res. 2, 107–111.
28. Jagla, B., Aulner, N., Kelly, P. D., Song, D., Volchuk, A., Zatorski, A., Shum, D., Mayer, T., de Angelis, D. A., Ouerfelli, O., Rutishauser, U., and Rothman, J. E. (2005) Sequence characteristics of functional siRNAs. Bioinformatics 11, 864–872.
29. Takai, D., and Jones, P. A. (2002) The CpG Island Searcher: a new WWW resource. In Silico Biol. 3.
30. See Supporting Information at INSERT_SM_LINK_HERE for further details.
31. Drummond, A. J., Buxton, S., Cheung, M., Cooper, A., Duran, C., Field, M., Heled, J., Kearse, M., Markowitz, S., Moir, R., Stones-Havas, S., Sturrock, S., Thierer, T., and Wilson, A. Geneious v5.5. 2011; available online at http://www.geneious.com.
32. Culler, S. J., Hoff, K. G., and Smolke, C. D. (2010) Reprogramming cellular behavior with RNA controllers responsive to endogenous proteins. Science 330, 1251–1255.
33. Schwarz, D. S., Hutvagner, G., Du, T., Xu, Z., Aronin, N., and Zamore, P. D. (2003) Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199–208.
34. SantaLucia, J., Jr. (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. PNAS 95, 1460–1465.
35. Jackson, A. L., Burchard, J., Schelter, J., Chan, B. N., Cleary, M., Lim, L., and Linsley, P. S. (2006) Widespread siRNA ’off-target’ transcript silencing mediated by seed region sequence complementarity. RNA 12, 1179–1187.
36. Gibson, D. G., Young, L., Chuang, R. Y., Venter, J. C., Hutchison, C. A., III, and Smith, H. O. (2009) Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345.


Graphical TOC Entry

[Graphic; labels: Promoter, Reporter, activating RNA, Baseline GFP, Induced GFP]