
Article

Combined Computational-Experimental Approach to Explore Molecular Mechanism of SaCas9 with Broadened DNA Targeting Range
Binquan Luan, Guangxue Xu, Mei Feng, Le Cong, and Ruhong Zhou
J. Am. Chem. Soc., Just Accepted Manuscript • DOI: 10.1021/jacs.8b13144 • Publication Date (Web): 29 Mar 2019


Journal of the American Chemical Society

Combined Computational-Experimental Approach to Explore Molecular Mechanism of SaCas9 with Broadened DNA Targeting Range

Binquan Luan,† Guangxue Xu,‡ Mei Feng,¶ Le Cong,∗,§ and Ruhong Zhou∗,¶,†

†Computational Biological Center, IBM Thomas J. Watson Research, Yorktown Heights, NY 10598, USA
‡School of Life Sciences, Tsinghua University, Beijing, 100084, China
¶Institute of Quantitative Biology, Department of Physics, Zhejiang University, Hangzhou 310027, China
§Department of Pathology, Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA

∗To whom correspondence should be addressed. E-mail: [email protected]; [email protected]

Abstract

Despite accelerating development of CRISPR technology, there remains high demand for further interrogation of its fundamental biology. This is particularly fascinating as new improved CRISPR tools were artificially engineered to harbor beneficial features but often lack mechanistic explanation. SaCas9, a minimal Cas9 ideal for in vivo applications, suffers from a long protospacer adjacent motif (PAM), which prompted effort on the mutant KKH SaCas9 with a relaxed PAM requirement. Leveraging structure-based molecular dynamics simulation, free-energy perturbation, and targeted experimentation, we developed a workflow for probing SaCas9 and a series of its variants, revealing intriguing dynamics of PAM recognition and the molecular mechanism of the KKH mutations. Furthermore, we deployed this approach to design and validate new SaCas9 mutants, SaCas9-NR and SaCas9-RL, with enhanced targeting range distinct from that of the KKH mutant and improved activity in mammalian cells. Overall, our approach provides a dynamic understanding of SaCas9 PAM recognition and a new gateway to rationally designed Cas9 variants harboring novel properties.

Introduction

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) system, originally found in bacteria and archaea, can adaptively resist foreign genetic materials to provide microbial immunity, employing RNA-guided protein machineries and intricate molecular mechanisms. 1–9 Recent advances enabled harnessing of customized CRISPR systems for genome editing in eukaryotic organisms. 10–15 The exemplar class 2 Type II CRISPR system consists of Cas9 protein in complex with single-guide RNA (sgRNA), forming a programmable endonuclease that cleaves a double-stranded DNA (dsDNA) target. The dsDNA substrate contains a target strand complementary to the guide sequence in sgRNA 7 and a non-target strand bearing a protospacer adjacent motif (PAM) required for target recognition. 1,2 For example, the widely used Cas9 from Streptococcus pyogenes (SpCas9) requires an NGG PAM, 7 while the newly identified Cas9 from Staphylococcus aureus (SaCas9) recognizes a longer NNGRRT PAM. 16 SaCas9 is significantly smaller than SpCas9, which makes its delivery more convenient and efficient for gene therapy applications. 16 Despite this promise for clinical translation owing to its compact size, the longer PAM of SaCas9 limits its targeting range and application potential, e.g. when its PAM is not in proximity of disease-relevant loci. A recent study, performed without knowledge of the crystal structure, screened for and found a set of triple mutations, E782K/N968K/R1015H (KKH), that could effectively alter the SaCas9 PAM from NNGRRT to NNNRRT. 17 Concurrently, the structure of wild-type SaCas9 bound with sgRNA/DNA was independently resolved, 18 which provides valuable insights on the molecular basis of SaCas9 function. However, given the dynamic nature of PAM recognition, it is highly desirable to further leverage the crystal structure to probe this elaborate process by examining both large-scale conformational changes and residue-wise interactions. Recently, in-silico approaches to study Cas9 recognition have been explored, where the authors used enhanced molecular dynamics (MD) simulations to study the conformational transition of Cas9. 19 Nonetheless, establishing the general applicability of MD-based methodology for CRISPR systems and validating its predictive power for gene editing activity warrant a concerted workflow combining computational analysis and experimental assay. Hence, we set out to develop this approach, which we termed COmbined Molecular dynamics and Experimental Target validation (COMET), and explore its utility in the study of SaCas9 PAM recognition and the KKH mutant with altered PAM specificity. We also applied the COMET approach to develop SaCas9-NR and SaCas9-RL, which harbor improved activity over the NNGRR PAM in mammalian cells, removing the strong preference for a thymine (T) base at the last position of the SaCas9 PAM. Our results highlight the unique impact of each mutation and their coordinated interplay, which can guide rational engineering of SaCas9 variants with altered PAM preference. We believe this methodology could potentially serve as a general framework for exploring non-natural CRISPR utilities, combining the power of computational physical chemistry and gene editing.
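The PAM notation used above relies on degenerate nucleotide codes (N for any base, R for a purine). As a minimal sketch of what it means for a site to satisfy NNGRRT versus the relaxed NNNRRT, the following Python snippet checks concrete sequences against a pattern. The function name `matches_pam` and the example sites are illustrative assumptions, not part of the original work.

```python
# Degenerate nucleotide codes needed for the PAM patterns quoted in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine (A or G)
    "N": "ACGT",  # any base
}

def matches_pam(site: str, pattern: str) -> bool:
    """Return True if a concrete DNA site satisfies a degenerate PAM pattern.

    Hypothetical helper for illustration; both strings are read 5' to 3'.
    """
    if len(site) != len(pattern):
        return False
    return all(base in IUPAC[code]
               for base, code in zip(site.upper(), pattern.upper()))

WT_PAM = "NNGRRT"   # wild-type SaCas9 PAM, as stated in the text
KKH_PAM = "NNNRRT"  # KKH SaCas9 PAM, as stated in the text

# Example sites chosen for illustration only.
for site in ("TTGAAT", "TTCAAT", "TTGAAC"):
    print(site, matches_pam(site, WT_PAM), matches_pam(site, KKH_PAM))
```

In this sketch, a site with a non-G base at the third position fails the wild-type pattern but passes the KKH pattern, while a site lacking the final T fails both.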

Results

All-atom MD simulation of SaCas9 complex

We begin our analysis by examining the high-resolution SaCas9 complex structure reported recently, 18 which provides an excellent underlying framework for our simulation (although crystal contacts may affect local structure; see Figure S1 in supporting information). To establish the dynamic details of the SaCas9 complex in its native state, we resort to the molecular dynamics (MD) method to model the complex under physiological conditions (additional notes in supporting information). MD simulations, complementary to experimental studies, have proven to be effective in understanding protein-DNA interactions. 19,20 Here, we undertake extensive all-atom MD simulations to characterize the molecular mechanism of the binding of a DNA target with the SaCas9-sgRNA complex (Figure 1a and Method). In our MD analysis, after equilibration in the bound state with the target DNA substrate (Figure S2a), the secondary structures of the RNA and DNA molecules were stable, as the averages of the saturated root-mean-square deviations (RMSDs) calculated against the crystal structures were both around 2.5 Å. Following a similar protocol, we independently obtained the equilibrated complex without bound DNA. From a global view, the saturated RMSD of backbone atoms was only about 3.2 Å in the SaCas9 complex with substrate DNA (Figure S2a), while the HNH domain of the nuclease (NUC) lobe, without being blocked by the recognition (REC) lobe from a neighboring protein in the crystal environment (note Figure S1c), moved a distance of 7.6 Å toward the cleavage site on the target DNA strand, which recapitulated the physiological process (Figure S2c). Similarly, the REC lobe as well as the end fragment of the DNA-RNA heteroduplex moved closer to the NUC lobe (Figure S2c). On the other hand, for SaCas9 without the bound DNA (Figure S3), RMSDs increase and saturate at 7.5 Å, indicating a larger conformational change in relevant domains (see motions of the HNH domain and REC lobe in Movie S2). Nonetheless, RMSDs for the DNA binding region (where the proposed mutations lie) remain small (∼3.5 Å) (Figure S3). These observations are all in accordance with biochemical and biophysical analyses in previous reports. 21–24 With these equilibrated structures from both bound and free states that relate well to experimental findings, we laid the foundation for examining the molecular basis of SaCas9 PAM recognition and performing free-energy perturbation (FEP) calculations on mutations of novel SaCas9 variants (i.e., in silico mutagenesis studies).
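For readers less familiar with the RMSD values quoted above, the following Python sketch shows how an RMSD against a reference structure and a "saturated" (plateau) average could be computed. It assumes each frame has already been superimposed on the reference (the fitting step is not shown), and the plateau is taken as the mean over the final half of the trajectory, which is an illustrative choice rather than the protocol used in this work.

```python
import math

def rmsd(coords, reference):
    """RMSD between two equally sized lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superimposed.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum((a - b) ** 2
                  for p, q in zip(coords, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(coords))

def plateau_average(values, fraction=0.5):
    """Mean of the last `fraction` of a time series (hypothetical plateau estimate)."""
    start = int(len(values) * (1.0 - fraction))
    tail = values[start:]
    if not tail:
        raise ValueError("no data points in the selected tail")
    return sum(tail) / len(tail)

# Toy example: one atom displaced by a 3-4-5 triangle, one atom unchanged.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame = [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(frame, reference))
```

In practice the per-frame RMSD series would come from a trajectory analysis tool; only the arithmetic is illustrated here.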
As enlarged and highlighted in Figure 1a, the binding site in the PAM-interacting (PI) domain contains all three residues mutated in KKH SaCas9, namely E782, N968 and R1015, along with the G in NNGRRT, whose specificity was altered. 17 Hereafter, we use G3 to denote the third position of the PAM (the base changed in the KKH SaCas9 PAM) and G′ to denote the first nucleotide at the PAM-proximal end on the target DNA strand (see below). From the MD results, we observed all key PAM-recognizing residue interactions found in the crystal structure: R1015 coordinates the G3; N985 coordinates the A at the fourth position; and R991 dynamically coordinates both the A and the T at the fifth and the sixth positions, respectively (see Figure S2b and Movie S1). It is worth noting that the role of R991 in coordinating A (or R generally) at the fifth position cannot be derived from the (static) crystal structure, where none of the residues are in contact with the A, illustrating the additional molecular detail available with MD methodology.

On closer inspection, several key residues located near the PI domain were observed to adjust their side-chain positions compared with those in the crystal. This could potentially be due to an environment that bears closer resemblance to the in vivo active state. By calculating distances of residue pairs (defined as the shortest distance between non-hydrogen atoms), we found that residue E782 could be close to either K910 or the G′, signaling its possible direct involvement in PAM interaction (Figure 1b). On the other hand, the flexible N968 was near the G3, but it was not close enough to form a direct contact (Figure 1c). Notably, K910 formed dynamic coordinations that were not shown in the crystal structure. Figure 1d shows the conformation of K910 (in the crystal environment at 0 ns), which was sandwiched by the negatively charged E782 and G3. However, the positively charged amine group (NH3+) in K910 did not form a salt bridge with either the carboxyl group (COO−) of E782 or the phosphate group (PO4−) of G3. During MD simulation, K910 moved toward E782 and formed a salt bridge with E782 after 57 ns (Figure 1d). This salt bridge was broken later, after a Na+ diffused into this region. Figure 1d shows that, at 80 ns, K910 formed a new salt bridge with G3, while at the same time E782 bound this Na+, further coordinating the phosphate group of G′ in the target DNA strand to stabilize dsDNA binding. These coordinations were absent in the substrate-free state and further demonstrate the critical role of dynamic conformational transitions for strong PAM recognition by SaCas9.

Figure 1: MD simulations of SaCas9 with the bound DNA and RNA. a) Model system. The SaCas9 protein is shown in cartoon representation (green); the PAM-containing DNA, target DNA and RNA are colored in purple, grey and orange, respectively. Na+ and Cl− are represented as yellow and cyan spheres, respectively. Water is shown transparently. The interactions between the PAM region of DNA and its surrounding protein residues are enlarged. b) Time-dependent distances for pairs E782–K910 and E782–G′. c) Time-dependent distances for pairs N968–G3 and R1015–G3. d) Snapshots of coordinations of E782 at 0, 57 and 80 ns. In FEP calculations for the E782K mutation, the Na+ ion in (d) was annihilated along with E782 to avoid an unfavorable clash (or electrostatic repulsion) with the emerging K782. Accordingly, in FEP calculations for the free state, an extra Na+ in the electrolyte (not close to the protein complex) was annihilated simultaneously with E782 as well.
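The residue-pair distances plotted in Figure 1b,c are defined in the text as the shortest distance between non-hydrogen atoms. A minimal Python sketch of that definition follows. The data layout (a list of element and coordinate pairs per residue) and the 4.0 Å contact cutoff are assumptions made for illustration; the text does not specify a cutoff for calling a salt bridge or contact.

```python
import math

def min_heavy_atom_distance(res_a, res_b):
    """Shortest distance between non-hydrogen atoms of two residues.

    Each residue is a list of (element, (x, y, z)) tuples; coordinates in Å.
    """
    heavy_a = [xyz for element, xyz in res_a if element.upper() != "H"]
    heavy_b = [xyz for element, xyz in res_b if element.upper() != "H"]
    if not heavy_a or not heavy_b:
        raise ValueError("each residue needs at least one non-hydrogen atom")
    return min(math.dist(p, q) for p in heavy_a for q in heavy_b)

def in_contact(res_a, res_b, cutoff=4.0):
    """True if the residues are within `cutoff` Å (illustrative threshold)."""
    return min_heavy_atom_distance(res_a, res_b) <= cutoff

# Toy coordinates, not taken from the simulation.
lys_amine = [("N", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0))]
glu_carboxyl = [("O", (3.0, 0.0, 0.0)), ("H", (1.5, 0.0, 0.0))]
print(min_heavy_atom_distance(lys_amine, glu_carboxyl))
```

Applying such a function frame by frame to a trajectory would yield time series of the kind shown in Figure 1b,c.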

COMET combines FEP and experiments to probe Cas9 PAM recognition

Completing system equilibrations and MD analysis allowed us to gain insight into previously under-appreciated dynamics of PAM recognition. They also enabled us to begin addressing one of the fundamental challenges in modeling genome editing tools, which is how to quantify the contribution of protein residues to target recognition. To this end, we utilized a combined process in which structural insight guided computational analysis, followed by targeted gene editing experimentation that justifies further computational mapping of Cas9 variant activity, so that in-silico prediction could be correlated with experimental Cas9 editing efficiency.

First, we pushed beyond a qualitative understanding to quantify the contribution of PI domain residues to SaCas9 PAM recognition with free energy perturbation (FEP) calculations (Figure 2a and details in Method). We performed alanine scan analysis of the residues in the direct vicinity of the PAM sequence. Figure 2b shows that mutations R991A and R1015A significantly reduce the binding free energies, while N986A (due to the small value of ∆∆G) is much less important. Mutations N985A and E993A also caused increases of about 2-4 kcal/mol in ∆∆G, which could destabilize the PAM binding. Experimentally, we created and expressed the corresponding SaCas9 mutants bearing the targeted alanine mutations with guide RNAs (gRNAs) to quantitatively assess Cas9 activity, measured as cleavage efficiency across three different genomic targets. We normalized the efficiency against wild-type Cas9 so that a reduction of activity after introducing an alanine mutation would indicate the importance of the particular residue tested (Figure 2c). Next, to examine the correlation between computational and experimental data, we performed linear fitting of ∆∆G from each alanine mutation versus the transformed activity of its experimental counterpart (calculated by taking the natural log of mutant SaCas9 efficiency over wild-type control), as plotted in Figure 2c (inset). Measured biological activity matched our FEP calculations well, as indicated by a goodness of fit (R²) of 0.92 (Figure 2c inset), which highlights the efficacy of our computational screening method for identifying key residues for further experimental investigation and verification. Overall, these results revealed the strong predictive power of the COMET approach, while we also noted possible non-linear factors that may affect the computational-experimental translation, such as endogenous genome context. The validation of COMET thus allowed analysis of the more complex KKH-SaCas9 mutant to understand the molecular mechanism of its broadened PAM range.

Figure 2: Free energy perturbation calculations with targeted experimental validation lead to development of the COMET workflow. a) Thermodynamic cycle for calculating ∆∆G of the mutation R1015H. ∆GA and ∆GB are free energy changes for the dsDNA's binding to the wild-type and mutant proteins, respectively; ∆G1 and ∆G2 are free energy changes for the mutation occurring in the DNA-bound state and DNA-free state, respectively. Atoms in protein residues 993 and 1015 are highlighted as van der Waals spheres; the rest are the same as described in the caption of Figure 1. b) Free energy changes (∆∆G, kcal/mol) from alanine scanning on selected residues involved in PAM recognition (N985A, N986A, R991A, E993A, R1015A). c) Normalized Cas9 efficiency as measured in mammalian cell experiments using molecular constructs corresponding to the mutation scanning performed in the computational analysis. Inset: robust linear correlation between FEP results (∆∆G) and experimental Cas9 efficiency (ln(WT/MT)) demonstrates the validity of COMET. Linear regression was performed using ∆∆G and the natural log (ln) of the efficiency ratio for each mutant Cas9 tested over wild-type control. Goodness of fit by R² is 0.92.
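The linear fit reported in the Figure 2c inset relates ∆∆G to the log-transformed efficiency ratio. The following Python sketch shows one way such a fit and its R² could be computed with ordinary least squares. All numerical values in the example are hypothetical placeholders (the per-mutant data are not tabulated in the text), and the ln(WT/MT) orientation follows the inset axis label.

```python
import math

def log_ratio(wt_efficiency, mutant_efficiency):
    """ln(WT/MT), the transformed activity used on the inset x-axis."""
    return math.log(wt_efficiency / mutant_efficiency)

def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally sized series with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical placeholder data, NOT values from the paper.
mutant_efficiency = [0.80, 0.30, 0.05]   # normalized to wild type = 1.0
ddg_kcal_per_mol = [1.0, 5.0, 14.0]

x = [log_ratio(1.0, eff) for eff in mutant_efficiency]
slope, intercept, r_squared = linear_fit(x, ddg_kcal_per_mol)
print(slope, intercept, r_squared)
```

A fit of this kind only tests for a linear trend; as the text notes, non-linear factors such as genome context may also shape the relationship.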


Analysis on KKH SaCas9 to reveal molecular mechanism of expanded PAM

The KKH mutations of SaCas9 involve three residues, namely E782K, N968K and R1015H. 17 The thermodynamic cycle for R1015H is illustrated in Figure 2a. In the bound state, R1015 binds the G3 with two hydrogen bonds, responsible for the PAM specificity NNGRRT. This interaction was further stabilized by a salt bridge between E993 and R1015, which can significantly reduce the conformational fluctuation of R1015. The same salt bridge is also present in the free state of SaCas9, as shown in Figure 2a. After the R1015H mutation, in the bound state H1015 moved away from G3, thereby releasing the specificity on G3 in the NNGRRT PAM. However, such a mutation (denoted by ∆G1 in Figure 2a) significantly reduces the binding free energy (or binding affinity). Compared with the same mutation process in the free state (denoted by ∆G2 in Figure 2a), the net change of the binding free energy is +11.3 kcal/mol (Figure 3a). This is a significant reduction in binding affinity, although the ∆∆G for the R1015A mutation is even more unfavorable (∼16.9 kcal/mol, Figure 2b). Thus, despite reducing the PAM specificity, R1015H destabilizes the binding between the PI domain of SaCas9 and the PAM region of dsDNA. To compensate for this and stabilize protein-DNA binding, extra mutations (E782K and N968K) were introduced in previous work. 17 As shown in Figure 1d, the E782K mutation is expected to cause profound changes in local coordinations: 1) the NH3+ group of K782 binds to the phosphate group of G′ in the target DNA strand directly; 2) K910, repelled by K782, binds to G3 more stably. Indeed, in the final stage of the FEP calculation, K910 and K782 were bound to G3 and G′ in the two complementary DNA strands, respectively, via the salt bridges formed by the amine and the phosphate groups (Figure 3b), which can significantly increase the DNA-protein binding free energy. Consistently, the calculated ∆∆G for the E782K mutation is -13.1 kcal/mol, much more favorable than the E782A mutation, which destabilizes the local coordination (E782-Na+-G′, Figure 1d) with a calculated ∆∆G of about 1.1 kcal/mol (Figure 3a). Furthermore, the N968K mutation was suggested to enhance substrate binding. 17 Consistent with this experimental result, we found via FEP calculation that the ∆∆G for this residue change is about -2.3 kcal/mol (Figure 3a). At the end of the FEP analysis for the bound state, K968 could move close to G3 in the PAM sequence because of electrostatic attraction between the amine group in K968 and the phosphate group in G3. Note that K910 can momentarily bind to G3 as well (Figure 1d). Therefore, the smaller free energy reduction for the N968K mutation compared with that for E782K is the result of the temporary electrostatic repulsion between K910 and K968, i.e. a weaker binding between K968 and G3 of the PAM. Mutating N968 to alanine has a negligible effect on protein-DNA binding (∆∆G=0.5 kcal/mol), indicating its relative neutrality in PAM recognition in wild-type SaCas9, which corroborates our earlier results from MD simulation (Figure 1c).
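The thermodynamic cycle referred to above (Figure 2a) can be written out explicitly. The relation below is a sketch using the symbols defined in the Figure 2 caption; it follows from the fact that free energy is a state function, so the two routes from the free wild-type protein to the DNA-bound mutant must give the same total. With this sign convention, a positive value means the mutation weakens DNA binding, which matches how the values are interpreted in the text.

```latex
\begin{align*}
\Delta G_A + \Delta G_1 &= \Delta G_2 + \Delta G_B \\
\Delta\Delta G \;\equiv\; \Delta G_B - \Delta G_A &= \Delta G_1 - \Delta G_2
\end{align*}
```

Here $\Delta G_1$ and $\Delta G_2$ are the alchemical mutation free energies in the DNA-bound and DNA-free states, which are the quantities an FEP calculation evaluates, so that the difference in binding free energies $\Delta G_B - \Delta G_A$ is obtained without simulating the binding process itself.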

Figure 3: Molecular mechanism of KKH SaCas9 as elucidated via COMET. a) Free energy changes (∆∆G, kcal/mol) for various mutations associated with the KKH mutant (E782A, E782K, N968A, N968K, R1015A, R1015H, KK, KKH). b) The E782K mutation in SaCas9. Atoms in protein residues K782 and K910 are highlighted as van der Waals spheres; the rest are the same as described in the caption of Figure 1. c) The representative configuration illustrating aqueous solvation at residues 782 and 968, after their respective E782K and N968K mutations. d) Perspective view of key interactions between KKH-SaCas9 protein and bound DNA.

The double mutations E782K and N968K (KK) yield an even stronger protein-DNA binding, enhancing the binding free energy by 14.2 kcal/mol (Figure 3a). If acting in a simply additive manner, the KK double mutation would be expected to increase the binding free energy by at least 15.3 kcal/mol (via addition of the two -∆∆G values). As a consequence of the E782K mutation, K910 can stably bind to G3 in the PAM sequence. However, competitively, K968 can also interact with the same G3. Thus, the change of -∆∆G for the KK mutation is less than the simple addition of two independent ones, indicating a complex interplay among these mutated residues. Finally, the simultaneous triple mutations E782K, N968K, and R1015H (KKH) yielded a ∆∆G of -3.9 kcal/mol, a net gain in binding free energy (Figure 3a). As expected, when the specific binding between R1015 and G3 was released, the PAM region on the non-target DNA strand was allowed larger conformational fluctuation, as indicated by the entropy calculation (see Supporting Information). Consequently, the FEP computation shows that K968 can bind to the phosphate group of T (one nucleotide ahead of the G3 in the TTGAAT PAM) and K910 can bind to G3, which reduces the electrostatic repulsion between K968 and K910, improving the affinity of protein-DNA binding (Figure 3c). The salt bridge formed by K968 and the T is well exposed to water, with 12 water molecules within 4 Å of the salt bridge (Figure 3c). However, the salt bridge formed by K782 and the G′ is considerably buried within the complex, with only 6 water molecules within 4 Å of the salt bridge. Thus, due to the different dielectric environments, the binding free energy enhancement from the K968-T salt bridge can be much smaller than that from the K782-G′ salt bridge. 25 Therefore, based on the FEP calculation, the KKH mutations were only able to modestly enhance protein-DNA binding (Figure 3a), taking into account the error of our analysis. Overall, the molecular mechanism of the KKH mutations is summarized in Figure 3d: E782K and N968K compensate for the free energy loss caused by the R1015H mutation, which removes the restriction to G3 in the PAM, leading to the expanded targeting range of KKH SaCas9 without compromising its energetic property. On top of the energy calculation, we also observed from our simulation that all other coordinations between wild-type SaCas9 and the bound DNA are preserved. For example, the phosphate locker T787 forms a hydrogen bond with the G′ (Figure 3d) and R991 coordinates AT in the TTGAAT PAM; both are key residues involved in target DNA binding.
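The non-additivity argument above can be made concrete with the rounded ∆∆G values quoted in the text. The arithmetic below is a worked sketch, reading the KK result as ∆∆G = -14.2 kcal/mol; the residual terms (labelled "coupling" here, a name introduced only for illustration) are simply the difference between the combined mutant and the sum of the single mutants.

```latex
\begin{align*}
\Delta\Delta G_{\mathrm{E782K}} + \Delta\Delta G_{\mathrm{N968K}}
  &= -13.1 + (-2.3) = -15.4\ \mathrm{kcal/mol} \\
\Delta\Delta G_{\mathrm{KK}}^{\mathrm{coupling}}
  &= -14.2 - (-15.4) = +1.2\ \mathrm{kcal/mol} \\[4pt]
\Delta\Delta G_{\mathrm{E782K}} + \Delta\Delta G_{\mathrm{N968K}} + \Delta\Delta G_{\mathrm{R1015H}}
  &= -13.1 + (-2.3) + 11.3 = -4.1\ \mathrm{kcal/mol} \\
\Delta\Delta G_{\mathrm{KKH}}^{\mathrm{coupling}}
  &= -3.9 - (-4.1) = +0.2\ \mathrm{kcal/mol}
\end{align*}
```

The sum of the rounded single-mutation values (15.4 kcal/mol) differs by 0.1 kcal/mol from the 15.3 kcal/mol quoted in the text; this is assumed here to reflect rounding of the underlying values. Both residuals are small compared with the individual contributions and should be weighed against the statistical error of the FEP estimates, which is not given in this section.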


Taken together, simulation results for the double mutations (KK) and triple mutations (KKH) demonstrated the complex interplay among the three single mutations and their coordination in establishing the underlying PAM-relaxing biology. This also highlights the importance of the dynamic nature of the CRISPR recognition process, and the need for compatible techniques (such as molecular dynamics simulation and single-molecule biophysics) to guide experimental investigation.

COMET-based engineering of SaCas9 variants to expand PAM range Our analysis on KKH SaCas9 and its consistency with previous experiments provided us with the confidence to expand COMET approach for rational exploration of novel SaCas9 designs to alter PAM specificity. To this end, we decided to target the remaining non-wobbling position of SaCas9 PAM, the last (sixth) T base of NNGRRT, a prime constraint in gene editing applications. From structural information and our MD simulation, N986 serves as the key residue for coordinating this PAM position. Hence, as a first step, we performed screening of various mutations on N986 to change it to alternative amino acids (mostly charged for maintaining protein-DNA interaction) with COMET workflow, and yielded a set of FEP calculations to guide downstream experiments (Fig. 4a). Based on our free energy results, the most promising candidates were N986H/K/R mutants. The unfavorable energy prediction for N986A, N986E, and N986Q mutants guided our experimental efforts so that we could exclude these variants from our experimental tests. Here the COMET workflow spared significant time and cost given that, for defining Cas9 PAM specificity, we would have to test each individual mutant against complete sets of editing sites spanning four different bases at the target position, i.e. NNGRRT /C/G/A. The targeted experiment on SaCas9 N986H/K/R variants revealed that their PAM recognition profiles were indeed modified to various degree, with SaCas9 N986R as the single most notable candidate (Fig. 4b). Compared with the wild type, SaCas9 N986R moderately prefers the non-natural PAM NNGRRG, with decreased activity against NNGRRT while mostly maintained the PAM recognition activity of other bases on the sixth PAM position. As expected, a single mutation could affect but not sufficiently create a powerful new variant, demanding combination effect from additional mutations that we wanted 12

ACS Paragon Plus Environment

Page 12 of 23

Page 13 of 23 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of the American Chemical Society

to probe with another iteration of COMET. To guide the selection of an additional target residue for combinatorial mutagenesis, we zoomed in on the top-ranked SaCas9 N986R variant and performed a new MD simulation modeling its PAM recognition process (Fig. 4c). From the molecular details of residue coordination, we reasoned that R991, in close proximity to R986 (after the N986R mutation), would interact unfavorably with R986 owing to electrostatic repulsion. Hence, after computationally screening possible mutations of R991, we focused on R991A/L/K, to be combined with N986R to further enhance non-T PAM recognition. With this lead from the COMET workflow, we applied DNA targeting assays to test the base preference at the last PAM position of these combinatorial SaCas9 variants, again compared with the wild-type reference. Our results yielded another non-T PAM SaCas9 variant, SaCas9 N986R + R991L, which showed significantly enhanced recognition of NNGRRC and NNGRRG, as well as moderately improved NNGRRA PAM binding activity, across different targets in endogenous genome sequences (Supporting Information, Fig. S4). The activity of both SaCas9-NR and SaCas9-RA, when compared to the original SaCas9, would for the first time allow efficient targeting of new PAM sequences that were previously inaccessible to this small Cas9, potentially expanding the SaCas9 targeting range three- to fourfold, as demonstrated in human cells (Fig. 4d, Fig. S5). These encouraging results, verified on multiple targets in a mammalian cell context, led us to term the new variants SaCas9-NR (for SaCas9 N986R) and SaCas9-RL (for SaCas9 N986R+R991L). These SaCas9 variants serve as promising components in the family of Cas9 tools for targeting disease-relevant loci where the last position of the natural SaCas9 PAM would prevent optimal design of an editing strategy.
We envisioned that this expansion could enhance the range of available small Cas9 tools, particularly given the ability to combine SaCas9-NR and SaCas9-RL with other powerful Cas9-based enhancements. 26 These results demonstrated the capacity of COMET to provide fresh ground for engineering novel Cas9s with modified properties.
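The "triple or quadruple" estimate can be illustrated with a minimal Python sketch that compares how often the wild-type PAM (NNGRRT) and a fully relaxed last position (NNGRRN) occur. The function names are hypothetical, only one strand is scanned, and the random-hexamer fraction assumes uniform base composition, which real genomes do not have; the fourfold figure is therefore an upper bound under those assumptions, not a measured value.

```python
import re

# IUPAC nucleotide codes used in the PAM notation (N = any base, R = purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "N": "[ACGT]"}


def pam_regex(pam):
    """Translate an IUPAC PAM string such as 'NNGRRT' into a compiled regex.

    A lookahead is used so that overlapping PAM sites are all reported.
    """
    return re.compile("(?=(" + "".join(IUPAC[base] for base in pam) + "))")


def count_pam_sites(sequence, pam):
    """Count (possibly overlapping) PAM matches on the given strand only."""
    return len(pam_regex(pam).findall(sequence.upper()))


def fraction_of_random_hexamers(pam):
    """Probability that a uniformly random k-mer matches the PAM.

    Assumption: all four bases are equally likely and independent.
    """
    fraction = 1.0
    for base in pam:
        fraction *= {"N": 1.0, "R": 0.5}.get(base, 0.25)
    return fraction


if __name__ == "__main__":
    wild_type = fraction_of_random_hexamers("NNGRRT")
    relaxed = fraction_of_random_hexamers("NNGRRN")
    print("fold expansion under uniform base composition:", relaxed / wild_type)
    print("NNGRRT sites in TTGAAT:", count_pam_sites("TTGAAT", "NNGRRT"))
```

A variant that accepts only some of the non-T bases at the last position would fall between the wild-type and fully relaxed cases.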


[Figure 4, panels a–e. a) ∆∆E (kcal/mol) for the N986R, N986K, N986Q, N986E, N986A, and N986H mutations. b) Normalized Cas9 efficiency (vs. WT) of N986R, N986K, N986A, and N986H for the NNGRR|T|, NNGRR|C|, NNGRR|G|, and NNGRR|A| PAMs. c) Structural view with R986 and L991 labeled. d) Endogenous gene activation (relative to SaCas9 WT) for SaCas9_NR, SaCas9_RA, and SaCas9_RL at different PAM sites, with the SaCas9_WT activity level marked. e) Structural & Functional Studies on CRISPR-based Technology: COmbined Molecular dynamics & Experimental Target validation (COMET), comprising Molecular Dynamics Simulation with All-atom Structural Model, FEP Calculation of Key Residue Mutations, Experimental Target Validation using Relevant Assay, and Molecular & Mechanistic Insight for Novel CRISPR Variants.]

Figure 4: COMET-based optimization of SaCas9 variants with expanded PAM range. a) FEP calculations for various mutations of N986. b) Normalized Cas9 efficiency for engineered SaCas9 variants targeting NNGRRT/C/G/A PAMs. c) Illustration of the coordination of R986 with the DNA backbone and the hydrophobic interaction between R986 and L991 (after the R991L mutation). d) Endogenous genome targeting activity of novel SaCas9 variants discovered through the COMET workflow; the dashed line represents wild-type SaCas9 activity as the basis for normalization. For each PAM sequence shown on the x-axis, results from different targets are shown with S.E.M. as error bars. e) Summary of COMET as a combined approach to understand and engineer CRISPR genome editing tools.


Discussion

In summary, by developing and implementing COMET, which leverages extensive molecular dynamics simulations and targeted measurement of biological activity, we interrogated the molecular mechanism of KKH SaCas9. In this process, we revealed intriguing features of Cas9 target recognition mediated by dynamic interactions between the PI domain and the PAM sequence. Our work captured molecular details and associated energetic properties of KKH SaCas9, owing to the combined power of the crystal structure of the complex, structure-based simulation, and experimental validation. This parallel computation and experimentation determined the unique impact of each mutation on SaCas9 PAM recognition and the complicated interplay among them. Then, as a further step to validate our methodology, we designed and tested the novel SaCas9-NR and SaCas9-RL variants, which harbor expanded PAM activities and enabled human genome targeting at PAM sequences that were previously non-targetable. Additionally, the discovered mechanism could be used to design a SpCas9 variant that recognizes the NG PAM (Fig. S5), broadening its targeting space. Therefore, we expect that the newly revealed insights and accompanying approach could be useful as a possible guiding principle for other Cas9 orthologs. While we were preparing our work for submission, another group applied a directed-evolution approach to evolve Cas9 PAM specificity and enable a relaxed NG PAM for SpCas9, 27,28 which complements our approach. Taken together, these works reveal the flexibility of Cas9 PAM recognition as a tunable property of the CRISPR system, and lead to the perspective that additional properties such as Cas9 specificity could also be tackled in a similar fashion, building on each other in a synergistic way. We believe the COMET approach established here opens an exciting path to further understand, validate, and inventively predict CRISPR variants with new capacity (Fig. 4e). In particular, it has potential intersections with additional areas of experimental research, such as single-molecule experiments, 29 a set of powerful biophysical methods that, despite limited scalability, have already provided fascinating understanding of CRISPR system dynamics. 23,24,30–32 We expect that this work could serve as one of the stepping stones towards a concerted effort by biologists, bioengineers, and computational scientists to tackle the remaining challenges in generating more powerful and precise genome engineering technology, with potential impact not only in basic science but also in various biomedical applications.

Methods

MD simulations. We performed all-atom MD simulations of the SaCas9-sgRNA complex, with or without bound DNA, solvated in a 0.15 M NaCl electrolyte (Figure 1a). Details are listed in the Supporting Information.

Free energy perturbation calculations. The free energy perturbation (FEP) method 33 has been widely used for in silico mutagenesis studies of proteins. Here, after obtaining the equilibrated bound and free states of the complexes, we deployed this method to calculate the change in binding free energy for each proposed mutation of SaCas9. Figure 2a shows the thermodynamic cycle used in the FEP method to calculate the free energy difference ∆∆G for the mutation R1015H: ∆GA and ∆GB are the free energy changes for dsDNA binding to SaCas9 and to the mutated SaCas9, respectively; ∆G1 and ∆G2 are the free energy changes for annihilating R1015 and simultaneously creating H1015 in the bound (with dsDNA) and free (without dsDNA) states, respectively. As shown in this cycle, for the R1015H mutation, the difference between the dsDNA binding free energies can be calculated by the following equation,

$$\Delta\Delta G = \Delta G_A - \Delta G_B = \Delta G_1 - \Delta G_2 , \qquad (1)$$

Generally, direct calculations of ∆GA and ∆GB are challenging and can be circumvented by computing ∆G1 and ∆G2 instead (Eq. 1). From the following ensemble average, 33 ∆G1 and ∆G2 can be calculated theoretically,

$$\Delta G = -k_B T \ln \left\langle \exp\left( -\frac{H_f - H_i}{k_B T} \right) \right\rangle_i , \qquad (2)$$


where kB is the Boltzmann constant, T is the temperature, and Hi and Hf are the Hamiltonians of the initial (i) and final (f) states, respectively. For example, for the R1015H mutation, the initial state is the wild-type SaCas9 and the final state is the one with R1015 replaced by H1015. In the perturbation method, many intermediate stages (denoted by λ), with Hamiltonian H(λ) = λHf + (1 − λ)Hi, are inserted between the initial and final states to improve accuracy. In the calculations of ∆G1 and ∆G2, λ changes from 0 to 1 in 18 perturbation windows with the soft-core potential enabled, yielding gradual annihilation and creation processes for R1015 and H1015, respectively.
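The windowed exponential averaging described above can be sketched in a few lines of Python. This is a minimal illustration and not the simulation protocol used in this work: the function names are hypothetical, the 18 windows are assumed to be evenly spaced in λ, the default temperature of 300 K is an assumption, and the soft-core potential is not modeled. Each window is assumed to supply an array of energy differences (in kcal/mol) between its end and start λ values, sampled at the start value.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def lambda_windows(n_windows=18):
    """Return (start, end) lambda pairs tiling [0, 1]; even spacing is an assumption."""
    edges = np.linspace(0.0, 1.0, n_windows + 1)
    return list(zip(edges[:-1], edges[1:]))


def hybrid_energy(lam, h_initial, h_final):
    """Linear mixing H(lambda) = lambda * H_f + (1 - lambda) * H_i."""
    return lam * h_final + (1.0 - lam) * h_initial


def zwanzig_free_energy(delta_h, temperature=300.0):
    """Exponential average -kT ln <exp(-dH/kT)> over initial-state samples."""
    kt = K_B * temperature
    x = -np.asarray(delta_h, dtype=float) / kt
    x_max = x.max()
    # log-mean-exp with the maximum factored out for numerical stability
    log_mean = x_max + np.log(np.mean(np.exp(x - x_max)))
    return -kt * log_mean


def total_free_energy(per_window_delta_h, temperature=300.0):
    """Sum the per-window free energy changes along the lambda path."""
    return sum(zwanzig_free_energy(dh, temperature) for dh in per_window_delta_h)


def binding_ddg(dg_bound, dg_free):
    """Difference between the bound-state and free-state mutation free energies."""
    return dg_bound - dg_free


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic energy differences, for illustration only (not simulation data).
    bound = [rng.normal(0.10, 0.05, 500) for _ in lambda_windows()]
    free = [rng.normal(0.15, 0.05, 500) for _ in lambda_windows()]
    print(binding_ddg(total_free_energy(bound), total_free_energy(free)))
```

Splitting the path into many windows keeps each energy difference small relative to kB T, which is what makes the exponential average converge with a practical number of samples.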

SaCas9 experiments. Experimental assays were performed using constructs from the original SaCas9 work, with molecular cloning to introduce mutations or alterations corresponding to the engineering design or computational simulation. The backbone vector was the pX601-SaCas9 plasmid (available from Addgene), as previously described. 16 Briefly, oligo primers (IDT DNA) were designed to amplify DNA fragments containing the desired SaCas9 mutations and used in PCR reactions with the pX601 plasmid as template. The resulting PCR products were purified using a PCR purification kit (QIAGEN), further separated by agarose gel electrophoresis, purified again with a gel-extraction kit (QIAGEN), and normalized for downstream assembly. Final cloning of vectors was performed using the Gibson Assembly method, and the products were transformed into bacteria for plasmid isolation. All plasmids were verified by Sanger sequencing (Genewiz) and stored for cell transfection experiments.

For measurement of SaCas9 activity in mammalian cells, human embryonic kidney 293FT cells (Thermo Fisher) were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with FBS and GlutaMAX (Thermo Fisher), in incubators at 37 °C with 5% CO2. Around 24 hours prior to transfection, cells were seeded into 24-well plates (Corning) at a density of 2.5 × 10^5 cells per well and transfected at appropriate confluency using Lipofectamine 2000 (Thermo Fisher), according to the manufacturer's recommended protocol. A total of 600 ng DNA was used for each well of the 24-well plate. Cells were then incubated until ready to be harvested.

Detection and quantification of genomic modification followed a workflow similar to previous work. 10,18 Briefly, about 72 hours after transfection, genomic DNA from transfected cells was harvested using QuickExtract DNA Extraction Solution (Epicentre) with a step-wise incubation method, followed by indel analysis using the SURVEYOR assay, as described previously. 10 The targeted genomic region was amplified using primers for the SURVEYOR assay, with amplicon sizes between 500 and 900 bp for all targets. In the SURVEYOR assay, the purified PCR product was re-annealed, subjected to SURVEYOR nuclease digestion, and then analyzed and quantified by polyacrylamide gel electrophoresis. 10 The transcriptional reporter assay and endogenous gene activation/targeting were performed in a similar setup, as previously described. 18 All experiments were performed in triplicate to account for possible technical noise in the assay and to obtain error statistics.
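As a rough illustration of the quantification step, the Python sketch below converts gel band intensities into an indel percentage and normalizes replicate measurements to wild type. It assumes the commonly used SURVEYOR estimate, indel (%) = 100 × (1 − sqrt(1 − f_cut)) with f_cut = (b + c)/(a + b + c), where a is the undigested band and b and c are the cleavage products; the exact quantification follows the cited prior work and may differ in detail. The function names are hypothetical, and the S.E.M. computed here ignores the uncertainty in the wild-type mean.

```python
import math
import statistics


def indel_percent(uncut, cut_1, cut_2):
    """Estimate indel % from band intensities: 100 * (1 - sqrt(1 - f_cut))."""
    total = uncut + cut_1 + cut_2
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = (cut_1 + cut_2) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - f_cut))


def normalize_to_wild_type(variant_values, wild_type_values):
    """Return (mean, SEM) of variant replicates relative to the wild-type mean.

    Needs at least two variant replicates, since the SEM uses the sample
    standard deviation.
    """
    wt_mean = statistics.mean(wild_type_values)
    ratios = [value / wt_mean for value in variant_values]
    sem = statistics.stdev(ratios) / math.sqrt(len(ratios))
    return statistics.mean(ratios), sem


if __name__ == "__main__":
    # Made-up band intensities and replicate values, for illustration only.
    print(indel_percent(uncut=64.0, cut_1=18.0, cut_2=18.0))
    print(normalize_to_wild_type([2.0, 4.0, 6.0], [2.0, 2.0, 2.0]))
```

The square-root term accounts for the fact that re-annealing produces heteroduplexes, so the cleaved fraction is not itself the fraction of modified alleles.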

Supporting Information

Contacts between neighboring copies in the crystal environment; the MD simulation of SaCas9 in the bound and free states; the equilibrated structure in the bound state from the MD simulation; conformational entropies of TGA in the PAM sequence TTGAAT; possible application of the SaCas9-KKH mechanism to SpCas9; MD simulation protocols; movies showing dynamic PAM coordination of SaCas9 in the bound state and domain fluctuations in the free state.

Acknowledgement

The authors would like to express their gratitude to Drs. Yuexin Zhou, Feng Zhang, Bernd Zetsche, Yinqing Li, and Zihe Rao for their support and helpful discussions. The authors gratefully acknowledge financial support from the IBM Bluegene Science Program (grant numbers W125859, W1464125, and W1464164).

References

(1) Mojica, F. J.; Díez-Villaseñor, C.; García-Martínez, J.; Soria, E. Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 2005, 60, 174–182.
(2) Bolotin, A.; Quinquis, B.; Sorokin, A.; Ehrlich, S. D. Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 2005, 151, 2551–2561.
(3) Barrangou, R.; Fremaux, C.; Deveau, H.; Richards, M.; Boyaval, P.; Moineau, S.; Romero, D. A.; Horvath, P. CRISPR provides acquired resistance against viruses in prokaryotes. Science 2007, 315, 1709–1712.
(4) Garneau, J. E.; Dupuis, M.-È.; Villion, M.; Romero, D. A.; Barrangou, R.; Boyaval, P.; Fremaux, C.; Horvath, P.; Magadán, A. H.; Moineau, S. The CRISPR/Cas bacterial immune system cleaves bacteriophage and plasmid DNA. Nature 2010, 468, 67.
(5) Deltcheva, E.; Chylinski, K.; Sharma, C. M.; Gonzales, K.; Chao, Y.; Pirzada, Z. A.; Eckert, M. R.; Vogel, J.; Charpentier, E. CRISPR RNA maturation by trans-encoded small RNA and host factor RNase III. Nature 2011, 471, 602.
(6) Sapranauskas, R.; Gasiunas, G.; Fremaux, C.; Barrangou, R.; Horvath, P.; Siksnys, V. The Streptococcus thermophilus CRISPR/Cas system provides immunity in Escherichia coli. Nucl. Acids Res. 2011, 39, 9275–9282.
(7) Jinek, M.; Chylinski, K.; Fonfara, I.; Hauer, M.; Doudna, J. A.; Charpentier, E. A programmable dual-RNA–guided DNA endonuclease in adaptive bacterial immunity. Science 2012, 337, 816–821.
(8) Gasiunas, G.; Barrangou, R.; Horvath, P.; Siksnys, V. Cas9–crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. USA 2012, 109, E2579–E2586.
(9) Wiedenheft, B.; Sternberg, S. H.; Doudna, J. A. RNA-guided genetic silencing systems in bacteria and archaea. Nature 2012, 482, 331.
(10) Cong, L.; Ran, F. A.; Cox, D.; Lin, S.; Barretto, R.; Habib, N.; Hsu, P. D.; Wu, X.; Jiang, W.; Marraffini, L. A.; Zhang, F. Multiplex genome engineering using CRISPR/Cas systems. Science 2013, 339, 819–823.
(11) Mali, P.; Yang, L.; Esvelt, K. M.; Aach, J.; Guell, M.; DiCarlo, J. E.; Norville, J. E.; Church, G. M. RNA-guided human genome engineering via Cas9. Science 2013, 339, 823–826.
(12) Jiang, W.; Bikard, D.; Cox, D.; Zhang, F.; Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nature Biotech. 2013, 31, 233–239.
(13) Jinek, M.; East, A.; Cheng, A.; Lin, S.; Ma, E.; Doudna, J. RNA-programmed genome editing in human cells. Elife 2013, 2, e00471.
(14) Cho, S. W.; Kim, S.; Kim, J. M.; Kim, J.-S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nature Biotech. 2013, 31, 230.
(15) Hwang, W. Y.; Fu, Y.; Reyon, D.; Maeder, M. L.; Tsai, S. Q.; Sander, J. D.; Peterson, R. T.; Yeh, J. J.; Joung, J. K. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nature Biotech. 2013, 31, 227.
(16) Ran, F. A.; Cong, L.; Yan, W. X.; Scott, D. A.; Gootenberg, J. S.; Kriz, A. J.; Zetsche, B.; Shalem, O.; Wu, X.; Makarova, K. S.; Koonin, E.; Sharp, P.; Zhang, F. In vivo genome editing using Staphylococcus aureus Cas9. Nature 2015, 520, 186–191.
(17) Kleinstiver, B. P.; Prew, M. S.; Tsai, S. Q.; Nguyen, N. T.; Topkar, V. V.; Zheng, Z.; Joung, J. K. Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nature Biotech. 2015, 33, 1293–1298.
(18) Nishimasu, H.; Cong, L.; Yan, W. X.; Ran, F. A.; Zetsche, B.; Li, Y.; Kurabayashi, A.; Ishitani, R.; Zhang, F.; Nureki, O. Crystal structure of Staphylococcus aureus Cas9. Cell 2015, 162, 1113–1126.
(19) Palermo, G.; Miao, Y.; Walker, R. C.; Jinek, M.; McCammon, J. A. CRISPR-Cas9 conformational activation as elucidated from enhanced molecular simulations. Proc. Natl. Acad. Sci. USA 2017, 114, 7260–7265.
(20) Cong, L.; Zhou, R.; Kuo, Y.-c.; Cunniff, M.; Zhang, F. Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat. Commun. 2012, 3, 968.
(21) Sternberg, S. H.; LaFrance, B.; Kaplan, M.; Doudna, J. A. Conformational control of DNA target cleavage by CRISPR–Cas9. Nature 2015, 527, 110.
(22) Jiang, F.; Taylor, D. W.; Chen, J. S.; Kornfeld, J. E.; Zhou, K.; Thompson, A. J.; Nogales, E.; Doudna, J. A. Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science 2016, 351, 867–871.
(23) Dagdas, Y. S.; Chen, J. S.; Sternberg, S. H.; Doudna, J. A.; Yildiz, A. A conformational checkpoint between DNA binding and cleavage by CRISPR-Cas9. Sci. Adv. 2017, 3, eaao0027.
(24) Chen, J. S.; Dagdas, Y. S.; Kleinstiver, B. P.; Welch, M. M.; Sousa, A. A.; Harrington, L. B.; Sternberg, S. H.; Joung, J. K.; Yildiz, A.; Doudna, J. A. Enhanced proofreading governs CRISPR–Cas9 targeting accuracy. Nature 2017, 550, 407.
(25) Zhou, R. Trp-cage: folding free energy landscape in explicit water. Proc. Natl. Acad. Sci. USA 2003, 100, 13280–13285.
(26) Slaymaker, I. M.; Gao, L.; Zetsche, B.; Scott, D. A.; Yan, W. X.; Zhang, F. Rationally engineered Cas9 nucleases with improved specificity. Science 2016, 351, 84.
(27) Hu, J. H.; Miller, S. M.; Geurts, M. H.; Tang, W.; Chen, L.; Sun, N.; Zeina, C. M.; Gao, X.; Rees, H. A.; Lin, Z.; Liu, D. R. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 2018, 556, 57.
(28) Nishimasu, H.; Shi, X.; Ishiguro, S.; Gao, L.; Hirano, S.; Okazaki, S.; Noda, T.; Abudayyeh, O. O.; Gootenberg, J. S.; Mori, H.; Oura, S.; Holmes, B.; Tanaka, M.; Seki, M.; Hirano, H.; Aburatani, H.; Ishitani, R.; Ikawa, M.; Yachie, N.; Zhang, F.; Nureki, O. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 2018, 361, 1259–1262.
(29) Cuculis, L.; Schroeder, C. M. A Single-Molecule View of Genome Editing Proteins: Biophysical Mechanisms for TALEs and CRISPR/Cas9. Annu. Rev. Chem. Biomol. Eng. 2017, 8, 577–597.
(30) Sternberg, S. H.; Redding, S.; Jinek, M.; Greene, E. C.; Doudna, J. A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 2014, 507, 62–67.
(31) Josephs, E. A.; Kocak, D. D.; Fitzgibbon, C. J.; McMenemy, J.; Gersbach, C. A.; Marszalek, P. E. Structure and specificity of the RNA-guided endonuclease Cas9 during DNA interrogation, target binding and cleavage. Nucl. Acids Res. 2015, 43, 8924–8941.
(32) Singh, D.; Sternberg, S. H.; Fei, J.; Doudna, J. A.; Ha, T. Real-time observation of DNA recognition and rejection by the RNA-guided endonuclease Cas9. Nat. Commun. 2016, 7, 12778.
(33) Chipot, C.; Pohorille, A. Free energy calculations; Springer, 2007.

[Table of Contents graphic: CRISPR-SaCas9 optimization through in-silico mutagenesis prediction and experimental validation. Normalized Cas9 efficiency (vs. WT) of N986R, N986K, N986A, and N986H for the NNGRR|T|, NNGRR|C|, NNGRR|G|, and NNGRR|A| PAMs, with N986R and L991 labeled in the structural view.]