Molecular Dynamics Simulations for Deciphering the Structural Basis


Article

Molecular dynamics simulations for deciphering structural basis of recognition of pre-let-7 miRNAs by LIN28 Chhaya Sharma, and Debasisa Mohanty Biochemistry, Just Accepted Manuscript • DOI: 10.1021/acs.biochem.6b00837 • Publication Date (Web): 11 Jan 2017 Downloaded from http://pubs.acs.org on January 12, 2017


Biochemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Biochemistry

Molecular dynamics simulations for deciphering structural basis of recognition of pre-let-7 miRNAs by LIN28

Chhaya Sharma and Debasisa Mohanty* Bioinformatics Center, National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi – 110067, India.

*

Correspondence

Debasisa Mohanty, Bioinformatics Center, National Institute of Immunology, Aruna Asaf Ali Marg, New Delhi – 110067, India. E-mail: [email protected]; [email protected] Phone : +91 11-26703749

E-mail address of the author Chhaya Sharma: [email protected]

Running Title: Structural basis of LIN28:pre-let-7 interaction

Keywords: pre-miRNAs, LIN28, let-7, miRNA Biogenesis, molecular dynamics simulations, RNA hairpin conformation

Abbreviations: MD, molecular dynamics; SDRs, specificity determining residues; CSD, cold shock domain; CCHCx2, two Cys-Cys-His-Cys type zinc binding domains; preE, precursor element.

ACS Paragon Plus Environment


ABSTRACT

LIN28 protein inhibits biogenesis of miRNAs belonging to the let-7 family by binding to precursor forms of these miRNAs. Overexpression of LIN28 and low levels of let-7 miRNAs are associated with several forms of cancer. We have performed multiple explicit solvent molecular dynamics simulations, ranging from 200 to 500 ns in length, on different isoforms of preE-let-7 in complex with LIN28 and also in isolation to identify structural features and key specificity determining residues (SDRs) important for the inhibitory role of LIN28. Our simulations suggest that a conserved structural feature of the loop regions of preE-let-7 miRNAs is more important for LIN28 recognition than sequence conservation among members of the let-7 family or the presence of the GGAG motif in the 3' region. A loop region consisting of a minimum of five nucleotides helps pre-miRNAs acquire a conformation ideal for binding to LIN28, but pre-let-7c-2 prefers a conformation with a three-nucleotide loop. Thus, our simulations provide a theoretical rationale for the recent experimental observation that pre-let-7c-2 escapes LIN28-mediated repression. The essential structural and sequence features highlighted in this study might aid in designing synthetic small molecule inhibitors for modulating the LIN28:let-7 interaction in malignant cells. We have also identified crucial specificity determining residues of the LIN28:preE-let-7 complex, involving 13 residues of LIN28 and 10 residues of the pre-miRNA. Based on the conservation profile of these 13 SDRs, we have identified 10 novel proteins which are not annotated as LIN28-like but show sequence, domain or fold level similarity to LIN28.


INTRODUCTION

MicroRNAs (miRNAs) are small non-coding RNAs that play an important role in gene regulation by targeting messenger RNAs (mRNAs) (1-3). Therefore, targeting of the miRNA machinery is emerging as an important tool for the development of therapeutics (4). miRNA biogenesis is initiated in the nucleus, where a miRNA gene is transcribed as a pri-miRNA hairpin and is cleaved by Drosha into a precursor miRNA (pre-miRNA) hairpin of ~70 nucleotides (5-7). The pre-miRNA is transported to the cytoplasm, where Dicer, an RNase III endonuclease, cleaves an extension of the pre-miRNA stem-loop to generate a ~22 nucleotide active mature miRNA duplex (8). Dicer activity can also be repressed by specific inhibitors that compete with Dicer for pre-miRNA binding (9). One such example is LIN28, which specifically blocks let-7 miRNA biogenesis. LIN28 can inhibit not only pre-let-7 processing by Dicer but also pri-let-7 processing by Drosha, by binding to the precursor forms of the let-7 family of miRNAs (10, 11). The LIN28 protein contains two well-known nucleic acid binding domains: a cold shock domain (CSD) and two Cys-Cys-His-Cys (CCHC)-type zinc binding domains (CCHCx2) (12). Since LIN28 inhibits biogenesis of let-7, a double-negative feedback loop exists between LIN28 and let-7 (13), and their expression levels are inversely related, such that LIN28 has been found to be highly expressed in many malignant cells (14). Therefore, understanding the structural details of let-7:LIN28 interactions and disrupting the complex might be a useful strategy for controlling development and progression of malignancies. Interestingly, it has recently been reported that LIN28 inhibits all isoforms of let-7 miRNA except let-7a-3 and its mouse ortholog let-7c-2, which bypass LIN28-mediated regulation (15). Deciphering structural details of the let-7:LIN28 complex might explain why let-7a-3 escapes LIN28-mediated inhibition. LIN28 is one of a few specific inhibitors of miRNA biogenesis discovered so far. Hence, understanding mechanistic details of inhibition of let-7 biogenesis by LIN28 might provide clues for identifying other miRNA and protein pairs which function in an analogous manner in miRNA biogenesis pathways. The elucidation of the crystal structure of mouse LIN28A in complex with the precursor element of let-7 (preE-let-7) (PDB ID: 3TRZ) has opened up possibilities to study the structural basis of this recognition process (16). A number of MD simulation studies on different RNA systems alone (17, 18) and various protein-RNA complexes (19-22) have been reported in recent years. These studies have suggested that simulations ranging from at least a few hundred nanoseconds to microsecond length are required for deciphering dynamic features of RNA or RNA-protein complexes (23, 24).

In an earlier study involving structural analysis of preE-let-7 recognition by LIN28 and MD simulations, the authors utilized the 3TRZ crystal structure to deduce interacting residues between human LIN28 protein and let-7g miRNA (25). Although this study highlighted several important details of LIN28/let-7 binding and emphasized the requirement of full-length LIN28 for efficient binding, the simulation length was relatively short. Secondly, the recent experimental observation that let-7a-3 escapes LIN28-mediated inhibition necessitates further comparative analysis of different preE-let-7 isoforms in complex with LIN28 to decipher crucial specificity determining residues for recognition by LIN28. Here, we have performed multiple explicit solvent molecular dynamics simulations, ranging in length from 200 ns to 500 ns, on different isoforms of preE-let-7 in complex with LIN28 and also in isolation to identify structural features and key specificity determining residues (SDRs) important for the inhibitory role of LIN28. Our simulation results provide a structural rationale for understanding why let-7a-3 does not bind to LIN28, as shown in recent experimental studies. Analysis of the conservation profile of these SDRs in sequence and structural homologs of LIN28 has revealed that they are evolutionarily conserved, thus highlighting their importance in LIN28 activity. Based on conservation of these crucial SDRs, we have identified 10 novel proteins in sequence databases which are not annotated as members of the LIN28 family but might have a regulatory/inhibitory function in the miRNA biogenesis pathway similar to LIN28. We have also analyzed the sequence and structure of some pre-miRNAs known to be regulated by LIN28 to highlight additional features in these pre-miRNAs, other than the GGAG motif, that aid in their interaction with LIN28 protein.

MATERIALS AND METHODS

An overview of the different types of computational analysis carried out in this study is depicted in Figure S1. Explicit solvent molecular dynamics simulations have been carried out on the crystal structure of the LIN28:preE-let-7d complex for 200 ns, and key specificity determining residues (SDRs) of the LIN28-let-7 interaction have been identified by a variety of structural analyses. The conservation profile of these residues across species has been analyzed to judge the functional relevance of the SDRs identified by structural analysis. Sequence profile and structure based searches have been carried out in protein sequence databases to identify novel proteins that are functionally similar to LIN28 but are not annotated as LIN28 homologs. Conservation of the important residues in half of these novel proteins highlighted their functional similarity to LIN28. Searches were also carried out based on similarity in structural features of pre-miRNAs for identifying other miRNAs whose biogenesis can be potentially inhibited by LIN28.

Molecular dynamics simulations


The X-ray crystal structure of LIN28 protein in complex with the 20-residue-long pre-let-7d element (PDB ID: 3TRZ) (16) was downloaded from the Protein Data Bank (PDB). In this 2.9 Å resolution crystal structure, the coordinates of the 14-residue linker region joining the CSD and zinc binding domains were not available. This linker region was modeled using SWISS-MODEL (26). In order to model various RNA fragments, the crystal structure of preE-let-7d from 3TRZ was used as the template and RNA nucleotides were mutated using the software 3-DNA version 2.1 (27).

Using the coordinates of the crystal structure with the modeled linker region, the topology file was prepared with the xleap module of AMBER12 (28). The ff99SB force field (29) was used to assign the molecular mechanics parameters for the protein molecule and the AMBER parm99 force field (30, 31) was used for the RNA. Even though newer versions of the AMBER force field are available for protein as well as RNA, we preferred to use ff99SB for protein and parm99 for RNA, as they have been used in a large number of simulations on protein-RNA systems (22, 25, 32, 33). Secondly, the recent comparative analysis of the effect of force field on simulations of protein-RNA complexes by Krepl et al. has revealed that major structural features, like interactions in the protein-RNA interface, remain conserved in simulations using the ff99 RNA force field (34). However, publications describing the latest nucleic acid force fields have highlighted some limitations of the parm99 force field in the case of long-timescale simulations on RNA alone (35, 36). Therefore, while the parm99 force field has been used for RNA-protein complexes, for the purpose of comparison we have carried out simulations on RNA fragments in the absence of protein using parm99 as well as the latest bsc0χOL3 force field (35, 36). Since the structure had two Zn2+ ions, each coordinated to three cysteine residues and one histidine residue, a cationic dummy atom method was used during simulation of this complex to impose the orientational requirements for the zinc atom and the cysteine and histidine residues (37).

The zinc parameter files were downloaded from the cationic dummy atom approach (CaDA) website (http://:www.mayo.edu/research/labs/computer-aided-molecular-design/projects/zinc-proteinsimulations-using-cationic-dummy-atom-cada-approach). The tetrahedron-shaped zinc divalent cation consisted of four peripheral dummy atoms attached to the central zinc atom, and the xleap module of AMBER generated coordinates for these dummy atoms. The charge states of the coordinated histidine and cysteine residues were changed to represent anionic forms. It may be noted that the cationic dummy atom approach has also been used in a recent study involving modeling and simulation of zinc finger domains using the AMBER force field (38). The overall charge of the system was neutralized by adding Na+ ions (39) (11 Na+ ions were required for neutralizing the protein-RNA complexes and 20 Na+ ions were required for neutralizing the free RNA systems), and the system was solvated using a TIP3P water box (40), with box edges lying 8 Å from the outermost atoms of the system in all directions. The system was then minimized using the steepest descent approach to remove steric clashes. Molecular dynamics (MD) simulations were carried out using a 2 fs time step, and the SHAKE algorithm (41) was applied to constrain the bonds containing hydrogens. The system was heated to 300 K over a 20 ps simulation in the NVT ensemble. The temperature was controlled using Langevin dynamics temperature coupling with a collision frequency of 3 ps-1. After that, pressure was equilibrated to 1 atm over a 100 ps timescale using isotropic position scaling at constant temperature. Production MD was carried out for 200 ns in the NPT ensemble after ensuring stable density (~1 g cm-3) and temperature (~300 K). All simulations were carried out using periodic boundary conditions, and long-range electrostatic interactions were computed using the PME (Particle Mesh Ewald) method (42) with a 10 Å cut off in direct space. The simulation of preE-let-7c-2 in complex with LIN28 was also carried out for 200 ns using a cut off of 10 Å. However, in order to understand the effect of the direct space cut off on the results, the simulations on the preE-let-7d:LIN28 complex were also repeated using a larger water box, with box edges lying 20 Å from the outermost atoms of the system in all directions, and a 15 Å cut off for electrostatic interactions in direct space.
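The stage lengths quoted in this protocol translate directly into integration step counts. The following is only an arithmetic sketch: the stage labels are hypothetical names chosen for illustration, and the numbers are the time step and stage lengths stated in the text, not values read from any AMBER input file.

```python
# Illustrative arithmetic only; the stage labels below are hypothetical.
TIMESTEP_FS = 2  # integration time step reported in the text (fs)

# Stage lengths in femtoseconds (1 ps = 1_000 fs, 1 ns = 1_000_000 fs).
STAGES_FS = {
    "heating_nvt": 20 * 1_000,               # 20 ps heating to 300 K
    "pressure_equilibration": 100 * 1_000,   # 100 ps equilibration to 1 atm
    "production_200ns": 200 * 1_000_000,     # 200 ns production run
    "production_500ns": 500 * 1_000_000,     # 500 ns production run
}


def n_steps(length_fs: int, timestep_fs: int = TIMESTEP_FS) -> int:
    """Return the number of integration steps needed to cover length_fs."""
    if length_fs % timestep_fs:
        raise ValueError("stage length is not a multiple of the time step")
    return length_fs // timestep_fs


# Step counts for every stage, keyed by the same hypothetical labels.
STEPS = {name: n_steps(length) for name, length in STAGES_FS.items()}
```

At a 2 fs time step, a 200 ns production run therefore corresponds to 10^8 integration steps and a 500 ns run to 2.5 x 10^8 steps.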

In addition, explicit solvent MD simulations were also carried out for 200 ns each on free preE-let-7d, preE-let-7c-2 and preE-let-7a-1, i.e. without the bound LIN28 protein, and each of these simulations was repeated once by assigning different initial velocities using different random number seeds. The simulations of free preE hairpins were carried out using the PME approach and water boxes which extended 8 Å from the outermost atoms of the system in all directions. In these simulations, a 10 Å cut off for electrostatic interactions was used. All simulations were carried out using the PMEMD module of the AMBER12 package (28) on NVIDIA GPU cluster nodes having M2090 or K20x GPUs.

Analysis of MD trajectories

The various conformations of let-7:LIN28 complexes sampled during MD simulations were analyzed using the ptraj module of AMBER 12 for variations in RMSD (root mean square deviation), RMSF (root mean square fluctuation), computational B-factor and intermolecular interactions involving hydrogen bonds, electrostatic and van der Waals contacts. In order to identify water mediated interactions between RNA and protein molecules, the cpptraj module of AMBER 12 was used. Graphs were plotted using the Grace plotting tool (http://plasmagate.weizmann.ac.il/Grace/) and R v3.0.2 (R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL: http://www.R-project.org/). The PyMOL Molecular Graphics System, Version 1.6, Schrödinger, LLC was used for visualization and analysis of molecular complexes. The inter-residue cross correlations were calculated using the ptraj module of AMBER 12. In order to analyze correlated movements in MD trajectories, PCA (principal component analysis) was performed using GROMACS v4.5.7 (43). The first eigenvector, which captured the major component of movements, was visualized using the IED plug-in (44) of VMD v1.8.4 (45).

Search for LIN28 homologs

LIN28 protein homologs across various species were identified using a BLASTp (Protein-Protein Basic Local Alignment Search Tool) search (46). The mouse LIN28 protein sequence (Accession: AAH68304.1) was used as a query and the search was carried out against all non-redundant protein sequences (GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding environmental samples from WGS projects) with default parameters of BLASTp. Out of the total of 110 hits, 104 sequences were selected, excluding sequences that had not been annotated as LIN28 or LIN28-like. The remaining 6 sequences that were not annotated as LIN28 were analyzed separately for functional similarity to LIN28 in terms of involvement in binding to pre-let-7. A phylogenetic tree was generated for these 104 sequences using ClustalW2-Phylogeny (47) and one representative sequence was chosen from each of the distinct clusters of sequences in this tree for further analyses. The multiple sequence alignment tool Clustal Omega (48) was utilized to identify conserved residues in 21 representative LIN28 homologs identified from the phylogenetic tree. The multiple sequence alignments were visualized and analyzed using Jalview v3 (49). The conservation profile of residues on the three dimensional structure of LIN28 was depicted using Chimera v1.10.1 (50). Chimera uses a multiple sequence alignment and a 3-D structure as input and, based on the conservation of sequences, highlights the different regions on the structure.
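The idea of scoring conservation column by column in a multiple sequence alignment can be illustrated with a minimal sketch. This is a simplified stand-in and not the scoring scheme used by Clustal Omega, Jalview or Chimera: here conservation is assumed to be the fraction of sequences that share the most common residue in a column, with gaps counted as non-matching. The toy alignment is hypothetical and is not taken from the LIN28 homologs.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, for each alignment column, the fraction of sequences that
    carry the most common (non-gap) residue in that column."""
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    n_sequences = len(alignment)
    scores = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(residue for residue in column if residue != "-")
        top_count = counts.most_common(1)[0][1] if counts else 0
        scores.append(top_count / n_sequences)
    return scores


# Hypothetical four-column toy alignment of three sequences.
toy_alignment = ["WRKD", "WRRD", "WK-D"]
toy_scores = column_conservation(toy_alignment)
```

In this toy case the first and last columns are fully conserved, while the second and third columns score 2/3 and 1/3 respectively.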


In addition to the BLASTp search, a search for distant homologs of LIN28 was also carried out using HMM profiles and structural similarity. Proteins having similar domain organization and architecture as the LIN28 protein were identified using the Pfam (Protein family database) domain architecture similarity search tool (Pfam::Alyzer) (51). The profile HMM search was performed using the HH-Pred server (52). HH-Pred accepts a query sequence or an MSA; if a single sequence is input, it builds an MSA using HHblits or PSI-BLAST and generates an HMM (hidden Markov model) that is searched against databases like Pfam and PDB. The results of HH-Pred are sorted in terms of E-value or probability score.

Analysis of RNA secondary structure

In order to identify a consensus secondary structure for different pre-miRNAs, the LocARNA alignment and folding tool was used (53). LocARNA simultaneously aligns and predicts a secondary structure for multiple RNA sequences that are provided as query. It generates high quality alignments taking into account the structural similarity. LocARNA makes use of similarity scoring and a realistic gap cost for alignment and a free energy RNA model for folding.

RESULTS

Dynamics of the LIN28:preE-let7 complex

Table 1 lists the various simulations carried out in the current study. Since obtaining stable trajectories of protein-RNA complexes has been a difficult task, for the LIN28:preE-let-7 complex we carried out two independent 500 ns explicit solvent simulations using PME and nonbonded cut offs of 10 Å and 15 Å. Even though the LIN28:preE-let-7d complex remained stable throughout the 200 ns MD simulation (cut off 10 Å), when we extended the simulation length to 500 ns it was observed that after ~250 ns the RMSD for the complex started increasing, and it reached ~5 Å at the end of the 500 ns simulation (Fig. S2). Detailed analysis of the RMSDs of the individual domains indicated that the higher overall RMSD arose primarily from higher RMSD values for the CCHCx2 domains, which increased up to ~6 Å. On the other hand, the RMSDs for the CSD domain remained around 1 Å throughout the 500 ns trajectory, while RMSD values for the RNA and linker region fluctuated between 2 and 3 Å.
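The per-domain RMSD decomposition described above can be sketched in a few lines. This is a minimal illustration and not the ptraj procedure used in the study: it assumes that the frame has already been superposed on the reference structure, and the coordinates and group names are hypothetical toy values.

```python
import math


def rmsd(reference, frame):
    """Root mean square deviation between two equally sized lists of
    (x, y, z) coordinates, assuming they are already superposed."""
    if not reference or len(reference) != len(frame):
        raise ValueError("coordinate sets must be non-empty and equal in size")
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(reference, frame):
        total += (x1 - x0) ** 2 + (y1 - y0) ** 2 + (z1 - z0) ** 2
    return math.sqrt(total / len(reference))


def per_group_rmsd(reference, frame, groups):
    """RMSD computed separately for each named group of atom indices."""
    return {
        name: rmsd([reference[i] for i in indices], [frame[i] for i in indices])
        for name, indices in groups.items()
    }


# Hypothetical four-atom system: two atoms move by 1 Å, two atoms by 3 Å.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
frame = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
groups = {"rigid_domain": [0, 1], "mobile_domain": [2, 3]}

overall = rmsd(reference, frame)
by_group = per_group_rmsd(reference, frame, groups)
```

The toy example shows how a single mobile group can dominate the overall value: the groups have RMSDs of 1 Å and 3 Å, while the RMSD over all four atoms is the square root of 5 Å, about 2.24 Å.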

Analysis of interactions between protein and RNA revealed that the higher structural deviations with respect to the crystal structure resulted from breakage of crucial interactions between the CCHC zinc finger domains and preE-let-7d. It may be noted that, even though our simulations have been carried out using the PME approach, which eliminates artifacts of cut off in reciprocal space, the interactions in direct space are computed using a cut off of 10 Å. It is possible that the loss of crucial electrostatic contacts between RNA and zinc finger domains might be an artifact arising from truncation of electrostatic interactions in direct space. Therefore, we wanted to investigate whether simulations using a larger water box and a larger cut off distance in direct space can eliminate these possible artifacts and result in a stable MD trajectory. We carried out another 500 ns simulation by solvating the LIN28:preE-let-7d complex in a water box which extended 20 Å in all three directions from the outermost atoms of the complex and used the PME approach with a direct space cut off of 15 Å. Interestingly, the complex remained stable throughout 500 ns and the overall RMSD values for the entire complex remained below 5 Å (Fig. S3). Moreover, the final structure from this 500 ns MD simulation with the larger 15 Å cut off was very similar to the final structure from the 200 ns MD simulation, with an RMSD of 1.8 Å (Fig. S3A). The RMSD variation and B-factor variation for the 500 ns MD simulation (cut off 15 Å) and the 200 ns MD simulation (cut off 10 Å) were also comparable (Fig. S3B and Fig. S3C). Since the simulation with the 15 Å cut off was stable over the entire 500 ns, we have used the 500 ns trajectory from this simulation for all subsequent analysis. However, for preE-let-7c-2 in complex with LIN28 we have carried out a 200 ns simulation using a cut off of 10 Å for the purpose of comparison with the LIN28:preE-let-7d complex.

Fig. 1 shows the variation of RMSD over the 500 ns explicit solvent MD trajectory. As can be seen, even for this complex containing a multi-domain protein and a flexible RNA fragment, the RMSD of the complete LIN28:preE-let-7d complex (black) remained under 4 Å for most of the 500 ns simulation. The variations in individual RMSD values for the preE-let-7d miRNA hairpin, the two domains and the connecting linker region of LIN28 are also depicted in different colors. As can be seen in Fig. 1, the CSD domain (red) showed RMSD values under 2 Å throughout the simulation time. On the other hand, the preE-let-7d miRNA displayed slightly higher RMSD values (2 to 3 Å) compared to the CSD over the entire length of the simulation. Interestingly, the linker region between the CSD and CCHC domains showed low RMSD values. However, the two CCHC domains showed relatively higher RMSD values, though all values remained under 5 Å with an average of ~4 Å. This might be because the CCHC domains do not have any ordered secondary structure elements, i.e. α helices or β sheets, but are stabilized by Zn tetrads. The two Zn tetrads in the CCHCx2 domain consist of two divalent Zn cations, each coordinated to three cysteine residues and one histidine residue. The comparison of the starting structure and final structure of LIN28 after the 500 ns simulation indicates that the higher RMSD of the CCHCx2 domain arises primarily from relative movement between the two Zn tetrads, resulting in an increase in the distance between the two Zn binding motifs after simulation, but the structure of each CCHC Zn tetrad is maintained throughout the simulations. Thus, our RMSD analysis indicates that the CSD and Zn binding domains of LIN28 remain close to their starting structure and the higher RMSD arises from movements in the linker joining the two CCHC domains.

We also wanted to investigate whether the higher flexibilities of individual fragments, as indicated by our MD simulations, are in agreement with B-factor values of different residues in the crystal structure. RMSF values for individual residues were computed over the 500 ns MD trajectories and the corresponding values were used to obtain the computational or theoretical B-factor from the MD trajectory (last panel in Fig. 1). As can be seen, apart from a few N-terminal residues of the CSD domain, higher computational B-factors are seen for the linker region connecting the CSD and CCHCx2 domains, the two Zn binding domains and the 3' region of pre-let-7d, which interacts with the Zn binding motifs. Interestingly, comparison with experimental B-factor values in the crystal structure (PDB ID: 3TRZ) (Fig. S4) indicates that our simulation results are in agreement with experiments. However, it must be noted that crystallographic B-factors contain contributions from other factors apart from thermal fluctuations. The comparison with experimental B-factors has been made only to indicate that flexible regions identified in the simulations are in agreement with regions of higher flexibility found in crystallographic studies.

In order to analyze the dynamic cross correlations in the inter-domain movements in the LIN28:preE-let-7d complex in detail, we also performed principal component analysis (PCA) of the conformers sampled during the 500 ns MD trajectory (Fig. 2B). The different regions of the LIN28:preE-let-7d complex exhibit positive correlation in their movements throughout the 500 ns simulation, except for a small terminal region of the CCHCx2 domain, i.e. residues 160 to 167, which shows anti-correlated motion with respect to other regions in the complex (Fig. 2B).
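The meaning of positive and anti-correlated motion in such a map can be illustrated with the commonly used normalized cross-correlation coefficient, C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i^2> <Δr_j^2>), where Δr is the displacement of an atom from its mean position. The sketch below is an assumption about the general form of this analysis and is not the exact ptraj implementation; the three toy trajectories are hypothetical.

```python
import math


def cross_correlation(traj_i, traj_j):
    """Normalized cross-correlation between the displacements of two atoms.

    Each trajectory is a list of (x, y, z) positions, one per frame.
    Returns a value in [-1, 1]: +1 for fully correlated motion and
    -1 for fully anti-correlated motion.
    """
    if not traj_i or len(traj_i) != len(traj_j):
        raise ValueError("trajectories must be non-empty and equal in length")
    n_frames = len(traj_i)
    mean_i = [sum(pos[k] for pos in traj_i) / n_frames for k in range(3)]
    mean_j = [sum(pos[k] for pos in traj_j) / n_frames for k in range(3)]

    dot_sum = sq_i = sq_j = 0.0
    for pos_i, pos_j in zip(traj_i, traj_j):
        # Displacements from the mean position in this frame.
        d_i = [pos_i[k] - mean_i[k] for k in range(3)]
        d_j = [pos_j[k] - mean_j[k] for k in range(3)]
        dot_sum += sum(a * b for a, b in zip(d_i, d_j))
        sq_i += sum(a * a for a in d_i)
        sq_j += sum(b * b for b in d_j)

    if sq_i == 0.0 or sq_j == 0.0:
        raise ValueError("an atom with no displacement has undefined correlation")
    return dot_sum / math.sqrt(sq_i * sq_j)


# Hypothetical toy trajectories oscillating along x over four frames.
atom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
atom_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-2.0, 0.0, 0.0)]
atom_c = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

same_direction = cross_correlation(atom_a, atom_b)
opposite_direction = cross_correlation(atom_a, atom_c)
```

Atoms moving in the same direction give a coefficient of +1 regardless of amplitude, whereas atoms moving in opposite directions give -1, which is the kind of behaviour reported for residues 160 to 167.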

Since a major component of the movements was captured by the first eigenvector, this has been plotted on the 3-D structure of the complex, where the cyan arrows show the direction and extent of motion in the complex (Fig. 2C). In agreement with the results from RMSD analysis, the linker and the two CCHC domains displayed higher fluctuations compared to the CSD domain, as indicated by the arrow heads and length of the arrows. The cross correlation map also indicates that the movements in the end region of the miRNA are highly correlated with the middle region of the CCHCx2 domain, as highlighted by the dark red color on the map (Fig. 2B). It was observed that the lower unpaired region of the miRNA had moved towards the CCHCx2 domain during the simulation to maintain crucial RNA-protein interactions (Fig. 2A and 2C).

Specificity Determining Residues (SDR) for binding of LIN28 to preE-let-7d

In order to identify the crucial interactions governing binding of LIN28 to preE-let-7d, the hydrogen bonded (H-bond) residue pairs between RNA and the protein which remained stable over the 500 ns MD trajectory were analyzed. The hydrogen bonds were defined using the criteria of the distance between hydrogen donor (D) and acceptor (A) being ≤ 3 Å and the D-H..A angle being ≥ 120 degrees. The networks of H-bonds between LIN28 and preE-let-7d were analyzed using the ptraj module of AMBER. Since a recent study has highlighted the contribution of water molecules in stabilizing interactions between RNA and protein (54), we also analyzed the water mediated RNA-protein interactions in the RNA-protein complexes sampled during the simulation. If a water molecule was involved in H-bonded interactions with both preE-let-7d and LIN28 at the same time, it was counted as a water bridged RNA-protein interaction between the corresponding nucleotide of RNA and amino acid residue of the protein. This analysis was performed using the cpptraj module of AMBER. Apart from water mediated interactions, we also analyzed the MD trajectories of RNA-protein complexes as well as RNAs alone to find out if ions interacted with the RNA backbone or mediated interactions between RNA and protein. However, it was found that ions mostly remained in the solvent.
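
The geometric hydrogen bond criterion stated above (donor-acceptor distance ≤ 3 Å and D-H..A angle ≥ 120 degrees) can be written as a small check. This is a minimal sketch of the criterion itself and not the ptraj or cpptraj implementation; the function names and the test coordinates are hypothetical.

```python
import math

MAX_DONOR_ACCEPTOR_DISTANCE = 3.0  # Å, distance cut off stated in the text
MIN_DHA_ANGLE = 120.0              # degrees, angle cut off stated in the text


def dha_angle(donor, hydrogen, acceptor):
    """Return the D-H..A angle in degrees, measured at the hydrogen atom."""
    to_donor = [d - h for d, h in zip(donor, hydrogen)]
    to_acceptor = [a - h for a, h in zip(acceptor, hydrogen)]
    norm = math.hypot(*to_donor) * math.hypot(*to_acceptor)
    if norm == 0.0:
        raise ValueError("hydrogen coincides with donor or acceptor")
    dot = sum(u * v for u, v in zip(to_donor, to_acceptor))
    # Clamp to [-1, 1] to guard against floating point round-off.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


def is_hbond(donor, hydrogen, acceptor):
    """True if both the distance and the angle criteria are satisfied."""
    close_enough = math.dist(donor, acceptor) <= MAX_DONOR_ACCEPTOR_DISTANCE
    straight_enough = dha_angle(donor, hydrogen, acceptor) >= MIN_DHA_ANGLE
    return close_enough and straight_enough


# Hypothetical geometries: linear and short, linear but long, short but bent.
linear_short = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0))
linear_long = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.5, 0.0, 0.0))
bent_short = is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.5, 0.0))
```

Only the first geometry passes both tests; the second fails the distance cut off and the third fails the angle cut off. Under the definition given in the text, a water bridge would correspond to one water molecule satisfying such a check with an RNA atom and a protein atom in the same frame.
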
Since our initial structures used for simulations lacked crystallographically determined bound ions, and ions were only added during solvation for overall charge neutralization, they remained mostly in the solvent instead of forming direct interactions with protein or RNA. The analysis of water-mediated RNA-protein interactions revealed a few such interactions, for instance C3:LYS111, G20:LYS136, C11:GLU75 and C14:GLN165. However, all these water-mediated interactions persisted for only about 25% of the total simulation time. Since only the hydrogen bonds that were maintained for ≥50% of the total simulation time were considered stable in our analysis, none of these water-mediated interactions showed up in our final list of stable interactions between RNA and protein. However, it is important to note that water molecules and ions can mediate crucial interactions between RNA and protein. MD simulation studies on high-resolution structures of RNA-protein complexes with bound water molecules and ions can provide novel insight into the role of water in RNA-protein recognition. Thus, our analysis led to the identification of 13 important direct interactions between RNA and protein, comprising 13 key residues of LIN28 and 10 nucleotides of preE-let-7 (Fig. 3). The 13 protein residues were Trp32, Arg36, Asp57, Arg71, Ser86, Glu91, Arg108, Lys113, Arg119, Asp123, Glu143 and Lys146, while the RNA nucleotides were C3, G5, A8, U9, U12, G13, C16, G17, G18 and G20. Out of the 13 LIN28 residues, six residues (Trp32, Arg36, Asp57, Arg71, Ser86 and Glu91) are present on the CSD domain, two (Arg108 and Lys113) are in the linker region and the remaining four residues (Arg119, Asp123, Glu143 and Lys146) are contributed by the CCHCx2 domain. Five of the six residues from the CSD domain interact with RNA nucleotides that fall in the loop region of the pre-miRNA. The two linker-region residues interact with the stem region, while the CCHCx2 domain interacts with the unpaired 3’ end of preE-let-7d (Fig. 3). The insets in Fig.
3 show the atomistic details of the interacting residue pairs. As can be seen, most interactions are contributed by different bases of preE-let-7d rather than by phosphates; our analysis therefore highlights the crucial role of the nucleotide sequence of the pre-miRNA in governing the specificity of recognition. The LIN28 protein, which was originally discovered in Caenorhabditis elegans, is highly conserved across evolution (55). We also analyzed whether the residues involved in recognition of preE-let-7d are
conserved in LIN28 homologs across various species. A BLASTp search for LIN28 homologs in the nr database of NCBI revealed 104 LIN28 sequences, which clustered into 21 distinct nodes in the phylogenetic tree (Fig. S5A). Fig. S5B shows the multiple sequence alignment of representative sequences from each of the 21 clusters. It is interesting to note that the 13 residues of LIN28 identified by our simulation studies as crucial for preE-let-7 recognition also show a high degree of evolutionary conservation (Fig. 4 and Fig. S5B). Thus, our phylogenetic analysis further corroborates the functional importance of the SDRs identified by structural modeling.

Structural conservation of loop region is important for recognition by LIN28

It is well known that the LIN28 protein selectively inhibits biogenesis of let-7 family miRNAs, and it has been suggested that LIN28 specifically binds to the GGAG motif present in the 3’ arm of the precursor elements of let-7 family miRNAs (56). Our MD simulations revealed a set of 10 nucleotides of preE-let-7d that were crucial for binding to LIN28 (Fig. 3 and Fig. 5A). However, the A of the GGAG motif did not form a stable interaction (one persisting for more than 50% of the simulation time) with any of the residues of LIN28, despite conservation of the GGAG motif in the let-7 family of miRNAs. This result from our simulation agrees with experimental observations that mutation of the A of the GGAG motif does not have a significant effect on pre-let-7:LIN28 binding (25, 57). Interestingly, some recent studies have also shown involvement of the
LIN28 protein in post-transcriptional regulation of certain miRNAs that are not members of the let-7 miRNA family and lack the 3’ GGAG motif (58, 59) (Fig. 5B). This suggests that, apart from the GGAG motif, there are other sequence or structural features that govern the specificity of recognition between these pre-miRNAs and LIN28. We performed a structure-based sequence alignment of preE-let-7d of the let-7 family, which harbors the GGAG motif, with three recently identified preE-miRNAs from other families (miR-9, miR-3596 and miR-370) that lack the GGAG motif at the 3’ end. This structure-based sequence analysis revealed that all four of these miRNAs had a five nucleotide loop similar to the let-7 family pre-miRNAs, despite no obvious similarity in the sequence of the loop region (Fig. 5B&C). An extension of the alignment shows that the GGAG motif is absent in these pre-miRNAs (Fig. 6A). In fact, the base pair flanking the loop region is predicted to be an A:U or G:U pair in most of these pre-miRNA sequences, but the crystal structure of preE-let-7d indicates that the corresponding G:U pair is not formed and that the structure consists of a four base pair stem and a 7 nucleotide loop region which makes extensive contacts with the CSD domain of LIN28. This indicates that an RNA hairpin consisting of a seven nucleotide loop and a four base pair stem region constitutes the conserved structural element in pre-miRNAs which is crucial for their recognition by LIN28 and subsequent inhibition of their biogenesis. Therefore, based on our simulations and secondary structure prediction of other pre-miRNAs from the let-7 family, we hypothesize that nucleotides 6 to 10 of the preE-miRNA must be unpaired to attain a loop conformation essential for binding to the CSD domain of the LIN28 protein. Very recently it has been shown that a single let-7 family mouse pre-miRNA, let-7c-2, and its human ortholog let-7a-3 bypass repression by the LIN28 protein (15). Even though this pre-miRNA lacks a 3’ GGAG motif, introduction of a GGAG motif did not rescue LIN28
binding. However, a five nucleotide sequence mutation in the preE-loop region was sufficient to rescue LIN28 binding. This study supports the results from our LIN28:preE-let-7d simulations regarding the role of the interaction between the loop region and the CSD domain in recognition of pre-let-7 miRNAs by LIN28. Secondary structure prediction and alignment of the sequence of preE-let-7c-2 onto the preE-let-7d structure revealed that in preE-let-7c-2, C6 and G10 form the G:C pair flanking a three nucleotide loop (Fig. 5D). Hence, preE-let-7c-2 lacks the five nucleotide loop conformation required for efficient binding to the CSD domain of LIN28. For one other let-7 family miRNA precursor element, preE-let-7a-1, the secondary structure based alignment predicts that it also forms a 3 residue loop similar to preE-let-7c-2, with an A:U pair constituting the last base pair of the stem preceding the 3 nucleotide loop (Fig. 6B). However, experimental studies indicated that biogenesis of let-7a-1 is inhibited by the LIN28 protein just like other let-7 miRNAs, despite the shorter predicted loop region. We wanted to investigate whether these crucial differences in the conformation of the loop regions of preE-let-7d, preE-let-7c-2 and preE-let-7a-1, as suggested by secondary structure analysis, are also seen in explicit solvent simulations on these three molecules. Explicit solvent MD simulations were carried out for 200 ns on free preE-let-7d, preE-let-7c-2 and preE-let-7a-1, i.e. without the bound LIN28 protein. Each of these simulations using the parm99 forcefield was repeated twice by assigning different initial velocities using different random number seeds. It has been reported that long time scale MD simulations using the parm99 forcefield result in formation of spurious ladder-like structures due to degradation of certain dihedral angles in the nucleic acid backbone (35, 36).
While this effect is less pronounced in protein-nucleic acid complexes, it is much stronger in long-timescale simulations of nucleic acids alone in explicit solvent. We wanted to check whether the RNA in our simulations with the parm99
forcefield forms spurious ladder-like structures. Therefore, we monitored the RMSD of the let-7d RNA fragment with respect to the conformation found in the let-7d:LIN28 crystal structure over the simulation trajectory. As can be seen from supplementary Figure S6, in the RNA-protein complex let-7d remains close to the crystal structure conformation, with RMSD values remaining around 1.5 Å throughout the trajectory, and the final structure from the simulation also shows very good superposition with the crystal structure. On the other hand, in the simulation of the free let-7d structure (in the absence of bound LIN28), the base-paired stem region remains close to the crystal structure, but the loop region shows higher fluctuations, in the range of 3 to 3.5 Å. In the absence of any crystal structure for free let-7d, it is difficult to judge whether the observed fluctuations in the loop region are artifacts of the forcefield or reflect the inherent conformational flexibility of the let-7d molecule. Since the free RNA simulations showed higher fluctuations, we decided to repeat all the simulations on the three free RNAs (preE-let-7d, preE-let-7a-1 and preE-let-7c-2) using the new bsc0χOL3 forcefield for nucleic acids (35, 36). Fig. 7 shows 2-D and 3-D representations of the
structures of preE-let-7d, preE-let-7c-2 and preE-let-7a-1 before and after simulation, while variations of the hydrogen bond distances over the MD trajectories are shown in Fig. S7, Fig. S8 and Fig. S9. The starting backbone conformations for these three pre-miRNAs were the same as the conformation of preE-let-7d in the LIN28-bound crystal structure (Fig. 7). As can be seen, in the case of preE-let-7c-2, even though G5 and C11 as well as C6 and G10 were unpaired in the initial structure, the G5:C11 and C6:G10 pairs formed during the simulation, and the molecule attained a conformation with only three nucleotides, i.e. U7, C8 and U9, in the loop region (Fig. 7). In the case of preE-let-7a-1, the G5:C11 base pair was formed after 200 ns of simulation, but U6 and A10 remained unpaired throughout the 200 ns trajectory (Fig. 7); thus the molecule prefers a conformation with a 5 nucleotide loop. It is interesting to note that, even though secondary
structure prediction suggests that both preE-let-7c-2 and preE-let-7a-1 have a propensity to adopt a conformation with a three nucleotide loop, structural modeling and MD simulations reveal different conformational features for the loop region because of the stronger preference for base pairing between G and C compared to A and U. LIN28 can probably form a stable complex with preE-let-7a-1 because of its five nucleotide loop, while the conformation with a three nucleotide loop in the case of preE-let-7c-2 is not optimal for LIN28 binding. On the other hand, in preE-let-7d the nucleotides G5 to U11 remained unpaired even after 200 ns, constituting a seven nucleotide loop as seen in the LIN28-bound structure of preE-let-7d (Fig. 7 and Fig. S9). Even though the nucleotides in the loop region of preE-let-7d showed conformational rearrangements in the absence of the bound protein, they still maintained a loop conformation which facilitates interaction of bases with the CSD domain of the LIN28 protein. It is also interesting to note that for these preE-let-7 isoforms, simulations using the parm99 forcefield give results similar to those obtained using the recently developed bsc0χOL3 forcefield for nucleic acids. These results from our simulations on the free structures of preE-let-7d, preE-let-7c-2 and preE-let-7a-1 indicate that preE-let-7c-2 has an intrinsic preference for a conformation which is not optimal for LIN28 binding, while preE-let-7a-1 and preE-let-7d can adopt a conformation with a minimum of five nucleotides in the loop, which is required for binding to LIN28. Thus, our simulations provide a theoretical rationale for the experimental observation of the lack of LIN28-mediated inhibition for let-7c-2 miRNA.
Since our simulations suggest an important role for the conserved conformation of the loop region in recognition of preE-let-7 by LIN28, either crystallographic or NMR studies on preE-let-7c-2 and preE-let-7a-1, or biochemical binding studies involving synthetic sequences which do not permit G:C or G:U pairs in the loop region, might help in testing this hypothesis arising out of our simulation studies.
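The loop sizes discussed above follow from which candidate base pairs flanking the loop are actually formed during a trajectory. The Python sketch below illustrates one way such a readout could be automated; it is not the procedure used in this study. It assumes a single representative hydrogen bond distance per candidate base pair, reuses the 3 Å distance cutoff from the hydrogen bond definition, and borrows the 50% persistence threshold from the intermolecular H-bond analysis, although no such threshold is stated for base pairs. The function names and all distance values are hypothetical.

```python
def pair_occupancy(distances, cutoff=3.0):
    """Fraction of frames in which the monitored distance is <= cutoff (in A)."""
    if not distances:
        return 0.0
    return sum(d <= cutoff for d in distances) / len(distances)


def loop_residues(first, last, pair_distances, cutoff=3.0, min_occupancy=0.5):
    """Return the residue numbers left unpaired between `first` and `last`.

    Candidate closing pairs (first, last), (first + 1, last - 1), ... are
    examined from the outside inward; each pair whose occupancy reaches
    min_occupancy is treated as part of the stem. The first pair that fails
    the test marks the start of the loop.
    pair_distances maps (i, j) to a list of per-frame distances in angstroms.
    """
    i, j = first, last
    while i < j:
        occupancy = pair_occupancy(pair_distances.get((i, j), []), cutoff)
        if occupancy < min_occupancy:
            break
        i, j = i + 1, j - 1
    return list(range(i, j + 1))


# Made-up distance series for residues 5..11, for illustration only.
both_pairs_formed = {(5, 11): [2.9, 2.8, 3.0, 2.9], (6, 10): [2.9, 3.1, 2.8, 2.9]}
outer_pair_only = {(5, 11): [2.9, 2.8, 3.0, 2.9], (6, 10): [4.5, 5.0, 4.8, 5.2]}
no_pairs_formed = {(5, 11): [6.0, 5.5, 6.2, 5.9]}

for label, data in [("both pairs", both_pairs_formed),
                    ("outer pair only", outer_pair_only),
                    ("no pairs", no_pairs_formed)]:
    loop = loop_residues(5, 11, data)
    print(label, "loop length:", len(loop), "residues:", loop)
```

With these illustrative inputs the three cases give loops of three, five and seven nucleotides, mirroring the qualitative behaviour reported for preE-let-7c-2, preE-let-7a-1 and preE-let-7d respectively.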


We also wanted to investigate whether LIN28 can induce conformational changes in preE-let-7c-2 to facilitate optimum interaction of the loop region with its CSD domain. Starting from the LIN28:preE-let-7d complex, the nucleotides of preE-let-7d were mutated to the corresponding nucleotides of preE-let-7c-2, and MD simulations were performed for 200 ns on the LIN28:preE-let-7c-2 complex (Fig. 8A). Analysis of intermolecular hydrogen bonds that were present for more than 50% of the simulation time revealed that only U9 and U19 of preE-let-7c-2 form stable interactions with the residues Asp57, Trp32 and Asn127 of the LIN28 protein (Fig. 8C). In contrast to the LIN28:preE-let-7d complex, which had stable interactions between 10 miRNA nucleotides and 13 LIN28 residues over the 200 ns trajectory, preE-let-7c-2 showed stable interactions between only two miRNA nucleotides and three LIN28 residues. This clearly indicates a significantly diminished propensity of preE-let-7c-2 for binding to LIN28 compared to preE-let-7d. Detailed analysis of the conformations sampled during the 200 ns trajectory revealed that, even though preE-let-7c-2 did not adopt the conformation with a three nucleotide loop and G5:C11 and C6:G10 base pairs in the stem, the nucleotides G5, C6, G10 and C11 moved away from the CSD domain and favored conformations with increased intramolecular interactions (Fig. 8B). This suggests that a longer simulation on this complex might result in formation of the expected 3 nucleotide loop conformation in pre-let-7c-2.

Identification of Proteins functionally similar to LIN28

From among the homologs of LIN28 identified by the BLASTp search, there were six novel proteins which had not been annotated as LIN28 or LIN28-like proteins (Table S1). Multiple sequence alignment of these six proteins with the mouse LIN28 protein (Fig. S9) revealed that four of the six proteins have more than 10 specificity determining residues conserved. These four proteins, namely unnamed protein product (Tetraodon nigroviridis),
hypothetical protein EGM 00377 (Macaca fascicularis), hypothetical protein PANDA 005261 (Ailuropoda melanoleuca) and Rcg58531, isoform CRA b (Rattus norvegicus), could be annotated as LIN28-like proteins based on the conservation of functionally important residues (Fig. S10). Similarly, from among the proteins which lack obvious sequence similarity to LIN28 but have a domain architecture similar to that of LIN28, we found another 13 proteins which were not annotated as LIN28 or LIN28-like proteins (Table S2). Interestingly, six of these 13 proteins could be re-annotated as LIN28-like based on the conservation profile of functionally important residues (Fig. S11 & S12). Even though Cold shock protein-1 (CSP-1) and Glycine rich protein-2 (GRP-2) from Arabidopsis thaliana were identified by the HH-pred search (Table S3), they lacked the conserved SDRs required for binding to pre-miRNAs. Interestingly, neither of these two proteins is involved in regulation of miRNA biogenesis; both are known to aid freezing tolerance in plants (60).

DISCUSSION AND CONCLUSION

In this study we have carried out structural bioinformatics analysis and explicit solvent MD simulations on different isoforms of preE-let-7, in complex with LIN28 and also in isolation, to decipher key structural determinants of preE-let-7 recognition by LIN28. Based on the simulation studies, we have identified crucial specificity determining residues of the LIN28:preE-let-7 complex, involving 13 residues of LIN28 and 10 nucleotides of the pre-miRNA. Phylogenetic analysis revealed these 13 protein residues to be evolutionarily conserved, further supporting their role in LIN28 activity. Based on the conservation profile of these 13 SDRs in LIN28-like proteins, we have identified 10 novel proteins which are not annotated as LIN28-like but show sequence, domain or fold level similarity to LIN28. They might play roles similar to that of the LIN28 protein in the miRNA biogenesis pathway.
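The SDR-based annotation step summarized above amounts to counting, for each candidate protein, how many SDR positions of the reference LIN28 carry a conserved residue in a pairwise or multiple sequence alignment. The Python sketch below shows one possible form of that count. It assumes that "conserved" means an identical residue (the manuscript does not specify whether conservative substitutions were accepted), and the function name, toy sequences and position numbers are hypothetical.

```python
def count_conserved_sdrs(ref_aligned, query_aligned, sdr_positions):
    """Count SDR positions at which the query matches the reference.

    ref_aligned and query_aligned are rows of the same alignment (equal
    length, '-' for gaps). sdr_positions holds 1-based residue numbers in
    the ungapped reference sequence.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")
    wanted = set(sdr_positions)
    conserved = 0
    ref_index = 0  # residue number in the ungapped reference
    for ref_res, query_res in zip(ref_aligned, query_aligned):
        if ref_res == "-":
            continue  # insertion in the query; no reference residue here
        ref_index += 1
        if ref_index in wanted and query_res == ref_res:
            conserved += 1
    return conserved


# Toy alignment: reference residues are M1 K2 W3 R4 D5; the query has an
# insertion after K2 and an R-to-K substitution at reference position 4.
reference = "MK-WRD"
candidate = "MKAWKD"
n_conserved = count_conserved_sdrs(reference, candidate, [3, 4])
print("conserved SDR positions:", n_conserved)
```

Under the criterion reported in the text, a candidate would be flagged as LIN28-like when this count exceeds 10 for the SDR positions of the reference LIN28 sequence.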


The comparative analysis of conformational features of different members of the let-7 family using MD simulations reveals that a conserved structural feature of the loop regions of preE-let-7 miRNAs is more important for LIN28 recognition than sequence conservation among members of the let-7 family. A terminal loop region consisting of a minimum of five nucleotides helps pre-miRNAs acquire a conformation ideal for binding to LIN28. Thus, our simulations provide a theoretical rationale for the recent experimental observation of escape from LIN28 inhibition by pre-let-7c-2 (15). The requirement of a five nucleotide loop conformation for binding to the CSD domain of LIN28 is also consistent with experimental studies which have identified miRNAs like miR-9, miR-3596 and miR-370, which do not belong to the let-7 family but are also inhibited by LIN28 (58, 59). The lack of contact between the adenine of the GGAG motif in preE-let-7 and LIN28 in our simulation trajectories also suggests that the presence or absence of the GGAG motif is not the only specificity determinant for pre-miRNA:LIN28 binding. This result is in agreement with a previous study which showed that the CSD remodels the terminal loop of pre-let-7 miRNA, hence facilitating the subsequent binding of the CCHCx2 domain to the GGAG motif (61). Our analysis also suggests that, based on tertiary structure prediction for preE-miRNAs using tools like Assemble (62) and the Vienna RNA package (63), it might be possible to identify other novel miRNAs that might be regulated by the LIN28 protein. Recently, an approach named INFORNA has been reported that can design small lead molecules for targeting pre-miRNAs using only sequence information (64). The essential structural and sequence features for let-7 biogenesis highlighted by our study can in principle be combined with approaches like INFORNA to develop small molecule inhibitors to modulate preE-let-7:LIN28 interactions in cancerous cells, hence restoring normal let-7 levels, which might reverse the malignancy.


Even though our simulations have provided several interesting insights into the structural basis of inhibition of let-7 biogenesis by LIN28, in view of the various approximations involved in the development of molecular mechanics force fields, results obtained from simulation studies should be interpreted with appropriate caution (34). It has been suggested that stable atomistic simulations should show low RMSD (