Computational Insights into the Stability and Folding Pathways of

May 23, 2016 - School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore. J. Phys. Chem. B , 2016, 120 (22), pp 491...
0 downloads 15 Views 5MB Size
Subscriber access provided by UCL Library Services

Article

Computational Insights into the Stability and Folding Pathways of Human Telomeric DNA G-Quadruplexes Di Luo, and Yuguang Mu J. Phys. Chem. B, Just Accepted Manuscript • DOI: 10.1021/acs.jpcb.6b01919 • Publication Date (Web): 23 May 2016 Downloaded from http://pubs.acs.org on May 24, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

The Journal of Physical Chemistry B is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Computational Insights into the Stability and Folding Pathways of Human Telomeric DNA G-quadruplexes

Di Luo, Yuguang Mu* *E-mail: [email protected] Telephone: (+65) 63162885

School of Biological Sciences, Nanyang Technological University Singapore 637551, Singapore

1 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 2 of 60

Abstract G-quadruplex is a non-canonical yet crucial secondary structure of nucleic acids, which has proven its importance in cell aging, anti-cancer therapies, gene expression and genome stability. In this study, the stability and folding dynamics of human telomeric DNA Gquadruplexes were investigated via enhanced sampling techniques. Firstly, temperaturereplica exchange MD (REMD) simulations were employed to compare the thermal stabilities among the five established folding topologies. The hybrid-2 type adopted by extended human telomeric sequence is revealed to be the most stable conformation in our simulations. Next, the free energy landscapes and folding intermediates of the hybrid-1 and -2 types were investigated with parallel tempering metadynamics simulations in the welltempered ensemble. It was observed that the N-glycosidic conformations of guanines can flip over to accommodate into the cyclic Hoogsteen H-bonding on G-tetrads in which they were not originally involved. Furthermore, a hairpin and a triplex intermediate were identified for the folding of the hybrid-1 type conformation. Whereas for the hybrid-2 type, there were no folding intermediates observed from its free energy surface. However, the energy barrier from its native topology to the transition structure is found extremely high compared to that of the hybrid-1 type, which is consistent with our stability predictions from the REMD simulations. We hope the insights presented in this work can help to complement current understanding on the stability and dynamics of G-quadruplexes, which is necessary not only to stabilize the structures but also to intervene their formation in genome.

2 ACS Paragon Plus Environment

Page 3 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Introduction G-quadruplex is a non-canonical yet crucial secondary structure of nucleic acids. It resembles a four-stranded building block with at least two stacking G-tetrads connected by loops in diverse lengths and types. Held by cyclic Hoogsteen hydrogen bonds, each G-tetrad possesses a tetrad polarity in the direction from H-bond donors to the acceptors1-2. Monovalent cations such as Na+ and K+ are recruited to promote and stabilize various Gquadruplex topologies, since the vacant orbitals of these cations can be coordinated by the carbonyl oxygen atoms of guanines. G-quadruplex has proven its importance in cell aging, anti-cancer therapies, gene expression and genome stability. Firstly, the double helical DNA at the end of human telomere is known to be tagged with a single-stranded 3’ end overhang in the tandem repeats of d(TTAGGG)3. This G-rich sequence has been well-established in vitro experiments to form G-quadruplexes at the presence of Na+ or K+ ions. The occurrence of Gquadruplexes at this location is therefore inferred as a protection for the single-stranded DNA from being recognized as damaged signals. On the other hand, the length of telomere is known to be shortened after each cell cycle4; whereas the lengths of telomeres in most cancer cells are maintained to avoid cell apoptosis due to the resumed activity of telomerase5-6. However, experiments have found that the existence of G-quadruplexes at the end of human telomeres can inhibit the activity of telomerase7-8 and therefore terminate the immortal of cancer cells. As a result, stabilizing human telomeric Gquadruplexes with potential drug molecules has been proposed as a promising anti-cancer strategy. Secondly, G-quadruplex structures can be also adopted by the G-rich sequences at promoter sites. For example, the G-quadruplex formed at the human c-MYC promoter can 3 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 4 of 60

act as a transcriptional silencer9 and has been recognized as an anti-cancer target, since the over-expression of transcription factors coded by the c-MYC genes has been associated with the progression of human cancers. What’s more, the block structure of G-quadruplex can make itself either an impediment where single-stranded DNA is required or a hallmark to initiate certain biological processes. For example, G-quadruplexes may form while duplex DNA is transiently unwinding for replication. Consequently, the obstacle posed by G-quadruplex at replication fork will affect DNA replication fidelity10. Interestingly, people have identified a few helicases which can specifically unwind G-quadruplexes to avoid genome instability10-12. Additionally, G-quadruplexes have been also attributed as a component to indicate replication origins13. The biological importance of G-quadruplex has to rely on its existence in vivo. The first evidence for the formation of G-quadruplexes in vivo was observed at the telomeres of Stylonychia in 200114. Around a decade later, G-quadruplexes in the human genome were detected15 and visualized16 in vivo to reinforce the involvements of G-quadruplexes in cellular activities. Recently, the genome-wide distribution of G-quadruplex structures in the human genome has been uncovered with the aid of high-throughput sequencing technology17. It has been shown that there is a prevalence of G-quadruplexes at regulatory regions and cancer-related genes17. Extensive efforts have been made to disclose the G-quadruplex topologies adopted by human telomeric sequence (Table 1 & Figure S1). The first structural model of human telomeric DNA G-quadruplex in the sequence of AGGG(TTAGGG)3 in Na+ solution was determined by NMR studies in 199318. The structure was promoted by Na+ and

4 ACS Paragon Plus Environment

Page 5 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

characterized as a basket type conformation with three stacking G-quartets connected by edgewise-diagonal-edgewise loops consecutively. In 2002, a crystal structure of the same sequence in K+ was reported to adopt a parallel conformation with all loops in the doublechain-reversal type19. Instead of residing at the centres of G-quartet planes as Na+ in the first structure, two K+ ions were captured to locate at the in-betweens of adjacent Gquartets. In 2006, elongated wild-type sequences with flanking bases at the 5’ and 3’ ends of GGG(TTAGGG)3 were observed to adopt multiple conformations in K+ solution. Specifically, the sequences with fewer or no 3’ flanking bases were observed to form the hybrid-1 type conformation20-22, in which the three connecting loops are double-chainreversal, edgewise and edgewise loops, respectively. However, for sequences with one or more 3’ flanking bases, a hybrid-2 type with the reversed loop arrangement was determined as the major conformation in K+ solution23-24. A T:A:T triple capping structure involving the 3’ flanking bases was revealed to stabilize the hybrid-2 conformation24. In 2009, the sequence GGG(TTAGGG)3T was discovered to adopt a two stacking G-tetrads basket type conformation in K+ solution25. In short, human telomeric DNA sequence can fold into different G-quadruplex topologies mainly depending on the types of monovalent cations and the 3’ flanking bases26. The structural features of these topologies can be distinguished from their strand orientations, loop arrangements, G-tetrad polarities and/or anti- or syn- N-glycosidic conformations. G-quadruplexes are readily formed in solution with the presence of Na+ or K+ at a minimum time scale from milliseconds to minutes. It thus has long been an attractive research topic to decipher the folding dynamics from single-stranded DNA to G-quadruplex. A variety of experimental techniques have been utilized to probe the folding kinetics of G-quadruplexes, 5 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 6 of 60

including circular dichroism (CD)27, fluorescence resonance energy transfer (FRET)27-28, magnetic tweezers29-30, time-resolved NMR spectroscopy31, and stopped-flow device32. Current experimental findings suggest that the folding of G-quadruplex is not a simple twostate kinetic process; intermediates that bridge the unfolded and folded states must be present to facilitate jumping between energy basins with large barriers. Although it is hard to unravel the conformations of intermediates directly from experiments, the structural information obtained has implied that the intermediates may be characterized as a hairpin31 and/or a triplex29, 33-34 conformation. Theoretical approaches have also been applied to elucidate the folding dynamics of G-quadruplexes. Comprehensive folding pathways have been proposed by Sugiyama H. et al. for human telomeric G-quadruplexes in the hybrid types35. They used quantum mechanics computation coupled with molecular simulations to examine the stabilization energies of Hoogsteen GG base pair, G-triplet and G-tetrad and to determine the preferable anti-/syn- N-glycosidic arrangements. Their results favour the involvements of both a hairpin and a triplex intermediate in the folding pathways for the hybrid-1 and -2 type human telomeric G-quadruplexes35. Moreover, they proposed that the folding of G-quadruplex would proceed in a way to decrease the total number of anti-conformations35. The stability and possible conformations of hairpin36 and triplex37 intermediates have been systematically examined by Sponer J. et al. via both conventional and enhanced MD simulations. They identified that the double chain reversal or propeller loop is most unstable compared to the lateral or diagonal loops in both hairpin and triplex intermediates36-37. Besides, they predict that the final folding topologies of Gquadruplexes are not associated with the initial anti/syn arrangements in the unfolded state36. 6 ACS Paragon Plus Environment

Page 7 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

In this study, we investigated the stability and folding pathways of human telomeric Gquadruplexes via enhanced sampling techniques. Firstly, we used the temperature-replica exchange MD (REMD) simulations to determine the frequencies of native folding topologies occurring at different temperatures by incorporating the five established structures as conformational isomers in simulations. The most thermal-stable topology will be therefore determined as the one with the highest probability to occur at low temperatures. Secondly, the parallel-tempering metadynamics simulations in the well-tempered ensemble were used to explore the free energy landscapes of the hybrid-1 and -2 types on folding-related dimensions. The intermediates identified from the free energy basins were further utilized to construct the folding pathways of G-quadruplexes. We hope the insights presented in this work can help to complement current understanding on the stability and dynamics of G-quadruplexes, which is necessary not only to stabilize the structures but also to intervene their formation in genome.

7 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 60

Methods i.

Temperature-replica exchange MD simulations

In this study, temperature-replica exchange MD (REMD) simulations38 were used to compare the thermal stabilities of the five human telomeric DNA G-quadruplex structural models as summarized in Table 1 & Figure S1. To incorporate the five distinct topologies while still maintain a uniform system for exchange in REMD simulations, we reconciled the five DNA sequences into 5’TTGGG(TTAGGG)3TTT-3’. Specifically, the core sequence GGG(TTAGGG)3 of each structure was conserved to maintain its folding topology; whereas the original 5’ and 3’ flanking bases were mutated into TT and TTT, respectively. The corresponding TT and TTT fragments were prepared with NAB program in Amber and attached to the truncated Gquadruplexes via “g_confrms” in Gromacs. The force field used for G-quadruplexes in this study is “parmbsc0”39. An equilibrated TIP3P water model and a modified force field40 for ions were used for all of the simulations in this work. Besides, all simulations were evolving at an integration time-step of 2 femtoseconds. Firstly, a 100 ns conventional MD simulation was run for each of the five G-quadruplex structures in the unified sequence with a bulk concentration of KCL at 0.1 M in a cubic box under the periodic boundary condition. Prior to the production run, each system went through a 500-step energy minimization, 100 ps NVT (300 K) and 100 ps NPT (1 atm) equilibration in succession. The long-range electrostatic interactions were solved by PME41 and the cut-off for electrostatic and Van der Waals interactions was set as 1.2 nm. The

8 ACS Paragon Plus Environment

Page 9 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

“LINCS”42 algorithm was used to apply constraints on the bond lengths of the solute. The velocity rescaling thermostat43 and Parrinello-Rahman barostat were used for temperature and pressure coupling, respectively. The final conformations were used as the seeding structures of our REMD simulations. Next, the conformations of the five folding topologies in the unified sequence were arranged repeatedly over 64 temperature replicas. The temperature distribution in the range from 313.12 K to 482.31 K was decided so as to attempt an exchange rate of ~0.2544. Each replica was simulated in a rhombic dodecahedron box at a volume of 165.91 nm3. There were 4793 water molecules added in each replica and the salt concentration of KCL was 0.1 M. A 100 ps NVT equilibration was run for each replica at its own temperature prior to the replica exchange simulations. The facility used to exchange neighbouring replicas was “-replex” in Gromacs and an exchange was attempted every 1 ps. Last, the REMD simulations were run for 380 ns on each replica. The software package used was Gromacs-4.6.345.

9 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ii.

Page 10 of 60

Parallel tempering metadynamics simulations in the well-tempered ensemble

It is known that conventional MD simulations within reachable time scales can only sample conformations of biomolecules around local minima in free energy surface. Observations for large conformational fluctuations like folding/unfolding, which are associated with transitions between local minima, are mostly studied with enhanced simulation techniques. Enhanced simulation techniques are effective in locating different free energy basins near the starting structure in the real free energy landscape. They are designed to change the way of molecules overcoming energy barriers, whereas they do not alter the real free energy landscapes of molecules. However, as the real free energy landscape is rough and there could be many local minima encountered; a structure identified from an enhanced simulation technique should be carefully evaluated. Parallel tempering metadynamics46 is an enhanced sampling technique, which simulates N parallel replicas at N different temperatures with each replica being biased on a couple of pre-defined collective variables (CVs) and being exchanged with its neighboring replicas subject to the metropolis criteria. A CV is a collective degree of freedom, the variation of which should well describe the conformational changes of the system. The success of a metadynamics47-48 simulation is largely dependent on its choice of CVs since bad CVs will lead to insufficient sampling for the intermediates of interest. With the combination of parallel tempering and metadynamics, sampling on hidden CVs can be accomplished since parallel tempering enhances MD simulations on all degrees of freedom by increasing temperature38. However, exchanges between neighboring replicas can be hardly achieved 10 ACS Paragon Plus Environment

Page 11 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

for a considerable temperature range with limited number of replicas and computing resources. To ensure a good overlapping of energies between neighboring replicas for exchange, the well-tempered ensemble49 was introduced to enlarge the fluctuations of potentials while still approximately maintain the canonical ensemble averages in each replica. In this study, parallel tempering metadynamics simulations in the well-tempered ensemble (PTMetaD-WTE) were used to explore the free energy landscapes and folding intermediates of the hybrid-1 and -2 type human telomeric G-quadruplexes. Firstly, a total of 16 temperature replicas from 300 K to 503 K were determined based on the temperature distribution proposed by Meher K. P. et al50. The PDB structures of the hybrid-1 and -2 human telomeric G-quadruplexes in their original sequences were used as the initial structures for all of their 16 replicas, respectively. Each replica was equilibrated in a 1 ns NVT simulation at its own temperature. In the first stage, replica exchange simulations between the 16 replicas were performed with the potential energy of the system as the only CV in the well-tempered metadynamics simulation of each replica. The Gaussian height and width for potential energy were 1.0 and 20 kJ/mol, respectively. The bias factor was set as 10. With the enlarged fluctuations of potentials, the averaged exchange rate between neighboring replicas reached 0.60 by the end of the first 20 ns. The simulations at this stage were then stopped since the bias applied on the potential energy had reached a plateau and the exchange rate had remained unchanged in the last few nanoseconds. In the second stage, replica exchange simulations between the 16 replicas were performed with “cmap” and “dRMSD” as the two CVs in the well-tempered metadynamics simulation

11 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 60

of each replica. “cmap” refers to the total number of native Hoogsteen H-bonds in the three G-tetrads of a G-quadruplex; hence, it theoretically varies from 0 to 24 since each G-tetrad possesses 8 Hoogsteen H-bonds. The switch function used for “cmap” was defined with the reference distance from the hydrogen atom to the acceptor as 0.25 nm and the two exponential factors as 6 and 12, respectively. The second CV is “dRMSD”, which accounts for the differences between the distances of any two atoms in the current conformation and those in the native or reference one. Because covalently bonded atoms will always maintain their distances throughout the simulations; thus, dRMSD in fact describes the variations of distances between non-bonded atoms. The advantage of dRMSD as a CV is that no fitting procedure is required and it is indicative of structural differences between the current conformation and the reference one. For computational efficiency, only backbone heavy atoms (non-hydrogen atoms) were considered for the calculations of dRMSD. The potential energy of the system was still acting as one of the CVs in the second stage; however, the bias on it would never be updated in the whole course of simulations in this stage. The historically deposited bias on potential energy from the simulations in the first stage would now act as a static contribution to the total bias to maintain the enlarged fluctuations of potentials. The bias factor was 10. The Gaussian height was still 1.0 kJ/mol, and the Gaussian widths for “cmap” and “dRMSD” were 0.2 and 0.01 nm, respectively. The simulations at this stage were run for 300 ns for each replica. The software package used for the PTMetaD-WTE simulations in this study was Gromacs-5.0.1 coupled with plumed2.1.0.

12 ACS Paragon Plus Environment

Page 13 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Results and discussions I.

REMD simulations

The folding topologies of the five intramolecular human telomeric DNA G-quadruplexes in the reconciled sequence 5’-TTGGG(TTAGGG)3TTT-3’ were incorporated as conformational isomers in our REMD simulations. Since the five topologies were arranged repeatedly over 64 temperature replicas, the last topology (2KF8, referring to the folding topology of 2KF8 in the unified sequence, same below for the other folding topologies) was therefore taking up one replica fewer than the others. However, due to the subtle difference in the resulted replica occupancies, we consider the five topologies were initially equally distributed over the 64 replicas. a.

Random walks in the temperature space

To realize how the conformations were exchanged during the whole course of simulations, random walks in the temperature space of conformations initially at the lowest five temperature replicas were investigated. The lowest five temperature replicas were initiated in the native folding topologies of 1KF1, 143D, 2JSM, 2JSL and 2KF8, respectively. As is shown in Figure 1, the three folding topologies experimentally promoted in K+ solution which is the same as the simulation environment in our study exhibited a broad temperature range for exchange. Although it is shown that they frequently occupied the low to mild temperature replicas, there were still considerable exchanges of them to the

13 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 60

high temperature replicas. Differently, the other two folding topologies, which are experimentally promoted in Na+ solution (143D) or crystallization environment with K+ (1KF1), were resistant to visit the low temperature replicas and only stayed at the high temperature replicas, especially for 143D. It is clear that the five folding topologies possess distinct thermal stabilities, which lead to the distinct exchange tendencies as observed in Figure 1. Further investigations will be discussed in the next sections. b. Frequencies of native folding topologies occurring at the lowest temperature replica as a function of time It is known that REMD simulations enhance sampling by increasing temperatures; thus, it is expected that thermally induced rare events will occur when conformations are exchanged to high temperature replicas. As a result, the low temperature replicas would be mainly occupied by native or folded conformations. Since we include different folding topologies in the simulations, it is worth investigating which folding topology has the highest probability to occur at the low temperature replicas and how such a probability evolves as a function of time. First of all, frequencies of the native folding topologies occurring at the lowest temperature replica (replica-0) were investigated as a function of time. Conformations from the lowest temperature ensemble were examined against the five native topologies at a time-window of 20 ns. Specifically, a conformation would be attributed to a native structure if its RMSD with reference to that structure is the lowest and within a cutoff of 0.4 nm. The frequencies of the native folding topologies occurring at replica-0 over each 20 ns window were illustrated in Figure 2.

14 ACS Paragon Plus Environment

Page 15 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Although replica-0 was initiated with 1KF1, after replica exchanges in the first 20 ns, it was 2KF8 that exhibited the highest frequency of occurrence at this replica. Whereas after 60 ns, 2JSL had started to rank as the most probable folding topology at replica-0 and remained its highest frequency till the end of the simulation. Moreover, the frequencies of all of the native folding topologies were maintained in the last 100 ns simulations in spite of slight fluctuations, which gives evidence for the convergence of our simulations. c.

Frequencies of native folding topologies occurring at different replicas

Similarly, frequencies of the native folding topologies occurring at different replicas were examined as in Figure 3. The frequencies were calculated based on the conformations from the last 100 ns simulation of each replica. As expected, the occurrences of native folding topologies were dominant at low temperature replicas while prohibited at high temperatures as the result of thermally induced unfolding. Among the low-to-mild temperatures, it was observed that 2JSL exhibited an overwhelming probability of occurrence across a wide range of temperature. For example, at replica-0, nearly 50% of the conformations at this replica adopted the native folding topology of 2JSL; the frequency was decreased to 15% when the temperature was increased to 386 K (replica-30). Moreover, 2KF8 also exhibited a distinguishable probability occurring in a wide range of temperature; whereas the frequencies of 2JSM were much lower. Additionally, it has been checked that there were no conformations at the lowest five temperature replicas to adopt the native topologies of 1KF1 and 143D in the last 100 ns simulations.

15 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 16 of 60

d. Relative free energies estimated from frequencies of the native folding topologies at replica-0 The probability of a native structure occurring at a temperature is linked to its stability at that temperature; more specifically, the higher the probability, the more stable the structure is; whereas the lower, the less. As discussed in sections b and c, distinct frequencies of occurrence over replicas were observed among the five native folding topologies. To quantify the difference in stability between native structures from these frequencies, relative free energies were estimated as follows: ∆ = − ∗  (



)

Since there were no observations for the native folding topologies 1KF1 and 143D in the last 100 ns simulation at replica-0, they were not included in the free energy calculations but were recognized as the most unstable native folding topologies. As is shown in Table 2, the free energy differences of 2JSM and 2KF8 with reference to 2JSL were estimated as 10.14 and 5.25 kJ/mol, respectively. This result is consistent with previous experimental findings that the hybrid-2 type (2JSL) is the major conformation adopted by the elongated human telomeric sequence in K+ solution24. In fact, both the hybrid-1 (2JSM) and -2 (2JSL) conformations were observed with the extended human telomeric sequence in K+ solution; however, ~75% of the conformations were identified in the hybrid-2 type with being stabilized by its 3’ end triple capping structure24. The interconversion between the two

16 ACS Paragon Plus Environment

Page 17 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

structures was suggested slow in kinetics; whereas the energy difference between the two was small since the structures are co-existent in dynamic equilibrium and can be readily shifted in solution24. In summary, the relative thermal stabilities between the five structural models of human telomeric DNA G-quadruplexes is predicted as: 2JSL > 2KF8 > 2JSM > 1KF1 ≈ 143D. In the work recently published by Neidle and Sponer51, relative free energies of G-stems in different types of human telomeric G-quadruplexes were evaluated via MM-PBSA method. There are five types of G-quadruplex topologies probed in their study, the first four of which are the same as ours (the parallel, basket, hybrid-1 and -2 types). For the fifth model, we included the two stacking G-tetrads basket type; while they examined the (2+2) antiparallel type for comparison between different G-stems with three stacking G-tetrads. Although both of the two studies are aiming to determine the stabilities of human telomeric G-quadruplexes, they have different research focuses. In Neidle and Sponer’s work, the relative free energies were determined for G-stems (three stacking G-tetrads) only. Although MD simulations with and without loops were both carried out, only G-stems were included in the MM-PBSA calculations. Their objective is to examine the free energy differences between G-stems with different anti/syn arrangements; therefore, their work addresses the stability of G-tetrads stacking in a G-quadruplex. In our REMD study, both the simulations and analyses were inclusive of loops of G-quadruplexes. Hence, our study is focused on the comparison of stabilities between G-quadruplexes with different loop arrangements. The most stable topology predicted from our REMD simulations is the

17 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 60

hybrid-2 type, which has been established as the major conformation adopted by extended human telomeric sequence in experiments.

e.

Unfolding events revealed from the continuous trajectories

Although dynamics in REMD simulations is thermally biased, to monitor the process of unfolding for each native folding topology, continuous trajectories were reassembled. With reference to the initial conformation of each continuous trajectory, the continuous RMSD were calculated as in Figure 4. It was observed that almost all of the replicas initiated from 143D were unfolded in the simulations. This is consistent with the finding that 143D was extremely unstable at the low temperatures in our REMD simulations with K+ and frequently exchanged to the highest temperatures for unfolding. Since the energies of the unfolded structures were high, therefore they were unable to be exchanged back to the very low temperatures. Oppositely, for the most stable conformations 2JSL and 2KF8, although it was shown that there were exchanges of them to the high temperature replicas as indicated in the random walks in temperature space, there were no unfolding events of them to be observed in the simulations. Moreover, there were only one and two unfolding events for 2JSM and 1KF1, respectively. f. Polarization effects of K+ ions on the stabilities of G-quadruplexes Polarization effect induced by K+ ions in the ionic channel of G-quadruplex has been demonstrated by previous study52 to enhance the electrostatic attractions between K+ ions and guanines (O6 atoms mainly). Hence, it is believed that this polarization effect can stabilize the coordination of ions in the ionic channel and the stacking of G-tetrads. 18 ACS Paragon Plus Environment

Page 19 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

However, polarization effects are not described in commonly used MD force fields; as a result, K+ ions are observed to escape from the channel, especially those at the entry/exit sites of the channel. The escaping of K+ ions has been also observed in our previous MD simulations53. Even so, with non-polarizable force fields, K+ ions still show dominant presence at the coordination sites (in-betweens of adjacent stacking G-tetrads). As a result, the overall topologies of Gquadruplexes are still maintained in the MD simulations and G-quadruplexes will not unfold because of the underestimated electrostatic attractions between K+ ions and DNA. In our REMD simulations, the five G-quadruplex structures are all in the unified sequence though in different topologies. The entry/exit sites of the ionic channels are capped by the same 5’ or 3’ end flanking bases. Hence, we believe that the escaping of K+ ions does exist due to the non-polarizable force field and it may affect the absolute free energies of individual structures. However, the relative stabilities between the five investigated topologies will not be biased. In conclusion, the numbers of unfolding events of the five native folding topologies are consistent with their relative stabilities. However, with the few unfolding events for the most stable conformations, the sampling in our REMD simulations was insufficient to reveal the conformations of folding intermediates. II.

PTMetaD-WTE simulations

To explore the folding dynamics of G-quadruplexes, the K+ promoted conformations are of a higher priority than Na+ as K+ is intracellularly more abundant and has a higher biological relevance. Hence, we further investigated the unfolding of the hybrid-1 (2JSM) and -2 (2JSL) type human telomeric DNA G-quadruplexes via PTMetaD-WTE simulations. 19 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 60

In the PTMetaD-WTE simulations, the unfolding of G-quadruplexes was explored in a twodimensional configuration space. In particular, “cmap”, as one of the CV dimensions, measures the total number of the native Hoogsteen H-bonds in the three G-tetrads of a Gquadruplex; while “dRMSD” describes the differences between the distances of any two heavy atoms on the backbone of G-quadruplex in the current conformation and those in the reference one. These two CVs are quantitatively sensitive to the conformational changes of G-quadruplexes during unfolding. Besides, the ranges in which the two CVs span define a moderate configuration space for the sampling of intermediates. The simulations were run for 300 ns on each replica. The convergences of the simulations were checked by the evolution of FES as in Figure S2. Additionally, although there were 16 replicas exchanged in the simulations, only the FESs at the lowest temperature replica (300K) were adopted for analysis. As is shown in Figure S3, the locations of free energy basins on FES are conserved over different replicas; however, energy barriers between the free energy basins are changed with temperatures (i.e. unfolded structures are more accessible at high temperatures). a.

Free energy landscape of 2JSM

In the FES of 2JSM, a total of seven free energy basins were identified as in Figure 5(A). 1) Native structure Firstly, basin A characterizes the native structure of 2JSM at an occupancy of 61.8%. The representative structure of basin A (Figure 6(A)) shows that the structure is well stabilized by the three stacking G-tetrads in a good coordination with two K+ ions. 2) Transition structures: the escaping of K+ and flip-over of N-glycosidic bond Next, four partially unfolded transition structures were identified from basin B, C and D. 20 ACS Paragon Plus Environment

Page 21 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

In basin B, there were two representative structures identified as B1 and B2 (Figure 6(B1) & (B2)), respectively. These two structures are similar in the way of their DG-23 moving away from the 3’ end G-tetrad and only differ in the orientations of their 5’ end flanking bases. The flexibility of DG-23 has been observed in our previous studies in which its reversible conformational changes had facilitated the binding of a ligand molecule at the 3’ end of 2JSM53. Apart from DG-23, the other guanine DG-11 at the 3’ end G-tetrad also tilts out from its native position. The destabilization observed here at the 3’ end G-tetrad might be attributed to the loss of the K+ ion originally sitting in between the 2nd and 3rd G-tetrad. Moreover, the absence of K+ ion at the same position was also observed in the representative structure of basin D (Figure 6(D)). In this structure, DG-23 is also deviating from the 3’ end G-tetrad plane, which further implies a correlation between the flexibility of DG-23 and the escaping of K+ ion at the in-between of the 2nd and 3rd G-tetrad of 2JSM. Nevertheless, the major deformation of this structure in fact arises from the up-shift of its first strand. As a result, it is DG-4 and -5 instead of DG-3 and -4 that participate in the formation of the new 1st and 2nd G-tetrads (Figure S4 (A)) with the presence of the coordinated K+. Strikingly, it has been verified that the up-shift was not a simple translational displacement; however, it was accompanied by the conversion of the Nglycosidic conformation of DG-4 from anti to syn (Figure 7). The reason is that to participate in the cyclic Hoogsteen H-bonding on the 1st G-tetrad, DG-4 which takes the place of DG-3, has to cooperate with the other three guanines to maintain the original tetrad polarity in the counter-clockwise direction (from H-bond donor to acceptor). Whereas for DG-5 which is in the same anti-conformation as DG-4, there would be no adjustment required to participate in the formation of the 2nd G-tetrad. 21 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 60

Similarly, the representative structure of basin C (Figure 6(C)) describes a structure with its last strand shifted upwards. Consequently, DG-22 takes the place of DG-21 and converts its N-glycosidic conformation from anti to syn on the 1st G-tetrad; while DG-23 is incorporated in the 2nd G-tetrad with no flip-over of its anti-conformation (Figure 7). The stacking of the shifted G-tetrads (Figure S4 (B)) was again stabilized by the presence of a well-coordinated K+ ion. 3) Folding intermediates: the hairpin and the triplex Lastly, the folding intermediates were revealed from basin E and G. A triplex intermediate (Figure 6(E)) formed with the first three strands of 2JSM was found to populate at basin E at an occupancy of 93%. The triplex discovered here is different from that proposed previously35 in which the last three strands are engaged in the formation of the triplex with the first strand being put aside to form the double-chain-reversal (propeller) loop lastly. In this triplex, DG-23 of the last strand is intruding into the groove formed by the first strand and the propeller loop; in the groove, DG-23 is stacking with DT-6 and interacting with DG4 via bifurcated H-bonds (Figure 8). The steric hindrance posed by the intrusion of the last strand (mainly DG-23) may suggest its involvement in the formation of the propeller loop which renders the first two strands in the parallel direction. In addition, two K+ ions were captured at the in-betweens of adjacent G-triplets to stabilize the structure. It has been verified that the guanines in this triplex are all in the identical N-glycosidic arrangements as those in the native structure. Moreover, an analogue of the triplex intermediate was also identified at basin F (Figure 6(F)), which adopts a similar topology but with no effective Hoogsteen H-bonding due to the wrong configuration of its anti-/syn- conformations of guanines involved in the triplex. 22 ACS Paragon Plus Environment

Page 23 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

The second folding intermediate was uncovered from basin G (Figure 6(G)), which resembles a hairpin-like conformation formed by the 2nd and 3rd strands. In the conformation, the guanines on these strands are in the correct anti-/syn- arrangements for the Hoogsteen GG base pairing. Besides, a K+ ion sandwiched by the first two GG base pairs was observed to stabilize the structure. b. Free energy landscape of 2JSL In the FES of 2JSL, only two energy basins were recognized as in Figure 5(B). 1) Native structure: the 3’ end T:A:T triple capping structure The native structure of 2JSL was identified at basin H (Figure 6(H)). The structure is wellstabilized with the presence of two coordinated K+ ions and the 3’ end T:A:T capping structure as experimentally determined. 2) Transition structure: the flip-over of N-glycosidic conformation is not necessarily accompanied by an interchange between syn and anti A transition structure was identified at basin I (Figure 6(I)), in which the 1st and last strands are shifted downwards. As a result, a two stacking G-tetrads (Figure S4 (C)) is formed and stabilized by the presence of a coordinated K+. Similar to the structures at basin C and D, the shift was accompanied by the adjustments of N-glycosidic bonds of guanines to accommodate into the newly formed G-tetrads (Figure 9). Specifically, DG-21 which takes the place of DG-22, converts its N-glycosidic conformation from syn to anti to participate in the 1st G-tetrad of the transition structure. DG-3 which takes the place of DG-4 and is expected to flip over into the anti-conformation, however, still keeps its syn-conformation. This result at first confused us on the relation between the flip-over of N-glycosidic bond of a guanine and its participation in the cyclic Hoogsteen H-bonding. Since what we have 23 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 60

found so far is that when deformation occurs, guanines will re-orient themselves to participate in the cyclic Hoogsteen H-bonding for a new G-tetrad by adjusting their Nglycosidic conformations from anti to syn, or vice versa. However, the observation here on DG-3 seems to contradict to the relation. So, it drives us to further investigate the changes of the N-glycosidic torsional angles of the involved guanines as in Figure 9. By comparing the N-glycosidic torsional angles and orientations of guanines before and after deformation, a consistent conclusion was reached. It is true that guanines may re-orient themselves to participate in the cyclic Hoogsteen H-bonding, however, it is not necessary to convert from anti to syn or vice versa as long as an appropriate orientation is reached. Here, an appropriate orientation of a guanine refers to the accommodative directions of its Hoogsteen H-bonding donor and acceptor to the adjacent guanines on the same plane. Therefore, in conclusion, when deformation occurs, a guanine will adjust itself to accommodate into the newly formed G-tetrad. If its orientation is just fit into the G-tetrad, no adjustment is required; however, if not, it will flip over its N-glycosidic bond to re-orient the directions of its donor and acceptor whereas a total conversion from anti to syn or vice versa is not necessary. Specific re-arrangements of N-glycosidic conformations induced by chemical modifications on guanines have been experimentally studied recently1-2. It is found that the rearrangements reversed the tetrad polarity of a G-tetrad while the overall folding topology of the G-quadruplex was maintained. This result is in accordance with our findings that the N-glycosidic conformation is designated to the formation of a feasible cyclic Hoogsteen Hbonding.

24 ACS Paragon Plus Environment

Page 25 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

What’s more, the force field used in this study is Amber “parmbsc039”, which is an improved version of “parm99” for the correct descriptions on the / torsion of nucleic acids. With this force field, we observed that the folding of G-quadruplex is associated with interchanges between the anti and syn conformations of guanines. However, as the profile of glycosidic torsion  has been refined in bsc0OL354 and bsc0OL455 for RNA and DNA, respectively, it is advisable to use these refined versions in studies where the glycosidic torsions may play an important role in the dynamics of nucleic acids. 3) No folding intermediates observed: the high energy barriers On the FES of 2JSL, unfortunately, there were no representative folding intermediates observed. However, the energy barrier of 2JSL from its native topology to the transition structure is extremely high compared to that of 2JSM (Figure S5), which is consistent with our prediction from the REMD simulations that 2JSL is the most stable topology among the five. Thus, we conclude that 2JSL is extremely stable and exhibits high energy barriers from its native conformation to the transition structure; that’s why there are no folding intermediates identified within the time scale used in this study. c.

Proposed folding pathways

Based on the results in this study, new folding pathways for the hybrid-1 and -2 type human telomeric G-quadruplexes were proposed (Figure 10). Same as the hairpin structure proposed by Sugiyama H. et al. for the hybrid-1 type G-quadruplex, a hairpin intermediate formed by the 2nd and 3rd strands in the correct anti/syn configuration was revealed from the FES of 2JSM. The hairpin structure seems to be promoted by the Hoogsteen H-bonding and stabilized by a coordinated K+ between the first two GG base pairs. However, the triplex intermediate formed by the first three strands is different from 25 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 26 of 60

that proposed previously which involves the last three strands in the triplex. Since the last strand is captured to interact with the groove formed by the first strand and the propeller loop via π-π stacking and bifurcated H-bonding, it is speculated that the last strand may participate in the formation of the propeller loop. Whereas for 2JSL, unfortunately, we did not observe any folding intermediates from its FES. Considering the sequence similarity between 2JSM and 2JSL, there might be a possibility that it also folds through a hairpin structure. While with the 3’ end flanking bases exclusively for 2JSL, a triplex intermediate consisting of the last three strands may be stimulated by the presence of the 3’ end T:A:T triple capping structure. Besides, it should be noted that the self-adjustments on the Nglycosidic conformations of guanines as observed in the transition structures may occur throughout the folding process for the formation of feasible G-tetrads.

Conclusion In this study, the stability and folding dynamics of human telomeric DNA G-quadruplexes were investigated via enhanced sampling techniques. Firstly, REMD simulations were used to compare the thermal stabilities among the five established human telomeric G-quadruplex folding topologies. The hybrid-2 type conformation adopted by extended human telomeric sequence is revealed to be the most stable conformation in our simulations. Whereas the conformations experimentally promoted in Na+ solution or crystallization environment with K+ turn out to be the most unstable conformations in the simulations. What’s more, the relative free energy differences between the three K+ promoted conformations were determined. It is consistent with experimental findings that the free energy difference between the hybrid-1

26 ACS Paragon Plus Environment

Page 27 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

and -2 type is small while interconversion between them is slow in kinetics, which may arouse from the large energy barriers bridging the two structures. Next, the free energy landscapes and folding intermediates of the hybrid-1 and -2 type human telomeric G-quadruplexes were investigated with PTMetaD-WTE simulations. It is observed that guanines can flip over their N-glycosidic conformations to accommodate into the cyclic Hoogsteen H-bonding on G-tetrads in which they were not originally involved. The flip-over is to re-orient the directions of the H-bond donor and acceptor to the neighboring guanines on the same plane; thus, a complete conversion from anti to syn or vice versa is not necessary as long as an appropriate direction is reached. Furthermore, a hairpin and a triplex intermediate were revealed from the FES of 2JSM. The π-π stacking and bifurcated H-bonding identified in the triplex suggest that the last strand may be involved in the formation of the propeller loop. Whereas for 2JSL, there were no folding intermediates observed from its FES. However, the energy barrier of 2JSL from its native topology to the transition structure is found extremely high compared to that of 2JSM. This is consistent with our stability predictions from the REMD simulations. We hope the insights presented in this work can deepen current understanding on the stability and dynamics of G-quadruplexes, which is necessary not only to stabilize the structures but also to intervene their formation in genome.

27 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 28 of 60

Supporting Information Descriptions Figure S1. Structural models of the five human telomeric DNA G-quadruplexes. Figure S2 (A). Evolution of the free energy surface of 2JSM. Figure S2 (B). Evolution of the free energy surface of 2JSL. Figure S3 (A). Free energy surfaces of 2JSM at different replicas (time = 300 ns). Figure S3 (B). Free energy surfaces of 2JSL at different replicas (time = 300 ns). Figure S4. The G-tetrads formed with the shifted guanines. Figure S5 (A). Projections of the free energy surface of 2JSM on CV: “cmap” and “drmsd” from t = 200 to 300 ns (at replica-0). Figure S5 (B). Projections of the free energy surface of 2JSL on CV: “cmap” and “drmsd” from t = 200 to 300 ns (at replica-0). This material is available free of charge via the Internet at http://pubs.acs.org/.

28 ACS Paragon Plus Environment

Page 29 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Acknowledgment This work was partially supported by NTU Tier 1 grant RG138/15.

29 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 60

Reference (1). Dickerhoff, J.; Weisz, K. Flipping a G-Tetrad in a Unimolecular Quadruplex without Affecting Its Global Fold. Angew. Chem., Int. Ed. 2015, 54, 5588-5591. (2). Cheong, V. V.; Lech, C. J.; Heddi, B.; Phan, A. T. Inverting the G-Tetrad Polarity of a G-Quadruplex by Using Xanthine and 8-Oxoguanine. Angew. Chem., Int. Ed. 2016, 55, 160-163. (3). Moyzis, R. K.; Buckingham, J. M.; Cram, L. S.; Dani, M.; Deaven, L. L.; Jones, M. D.; Meyne, J.; Ratliff, R. L.; Wu, J. R. A Highly Conserved Repetitive DNA-Sequence, (Ttaggg)N, Present at the Telomeres of Human-Chromosomes. Proc. Natl. Acad. Sci. U. S. A. 1988, 85, 6622-6626. (4). Harley, C. B.; Futcher, A. B.; Greider, C. W. Telomeres Shorten During Aging of Human Fibroblasts. Nature 1990, 345, 458-460. (5). Bodnar, A. G.; Ouellette, M.; Frolkis, M.; Holt, S. E.; Chiu, C. P.; Morin, G. B.; Harley, C. B.; Shay, J. W.; Lichtsteiner, S.; Wright, W. E. Extension of Life-Span by Introduction of Telomerase into Normal Human Cells. Science 1998, 279, 349-352. (6). Counter, C. M.; Avilion, A. A.; Lefeuvre, C. E.; Stewart, N. G.; Greider, C. W.; Harley, C. B.; Bacchetti, S. Telomere Shortening Associated with Chromosome Instability Is Arrested in Immortal Cells Which Express Telomerase Activity. Embo J. 1992, 11, 1921-1929. (7). Zahler, A. M.; Williamson, J. R.; Cech, T. R.; Prescott, D. M. Inhibition of Telomerase by G-Quartet DNA Structures. Nature 1991, 350, 718-720. (8). Sun, D. Y.; Thompson, B.; Cathers, B. E.; Salazar, M.; Kerwin, S. M.; Trent, J. O.; Jenkins, T. C.; Neidle, S.; Hurley, L. H. Inhibition of Human Telomerase by a G-Quadruplex-Interactive Compound. J. Med. Chem. 1997, 40, 2113-2116. (9). Siddiqui-Jain, A.; Grand, C. L.; Bearss, D. J.; Hurley, L. H. Direct Evidence for a G-Quadruplex in a Promoter Region and Its Targeting with a Small Molecule to Repress C-Myc Transcription. Proc. Natl. Acad. Sci. U. S. A. 2002, 99, 11593-11598. (10). Paeschke, K.; Capra, J. A.; Zakian, V. A. DNA Replication through G-Quadruplex Motifs Is Promoted by the Saccharomyces Cerevisiae Pif1 DNA Helicase. Cell 2011, 145, 678-691. (11). Budhathoki, J. B.; Stafford, E. J.; Yodh, J. G.; Balci, H. ATP-Dependent G-Quadruplex Unfolding by Bloom Helicase Exhibits Low Processivity. Nucleic Acids Res. 2015, 43, 5961-5970. (12). London, T. B. C.; Barber, L. J.; Mosedale, G.; Kelly, G. P.; Balasubramanian, S.; Hickson, I. D.; Boulton, S. J.; Hiom, K. Fancj Is a Structure-Specific DNA Helicase Associated with the Maintenance of Genomic G/C Tracts. J. Biol. Chem. 2008, 283, 36132-36139. (13). Valton, A. L.; Hassan‐Zadeh, V.; Lema, I.; Boggetto, N.; Alberti, P.; Saintomé, C.; Riou, J. F.; Prioleau, M. N. G4 Motifs Affect Origin Positioning and Efficiency in Two Vertebrate Replicators. EMBO J. 2014, 33, 732-746. (14). Schaffitzel, C.; Berger, I.; Postberg, J.; Hanes, J.; Lipps, H. J.; Plückthun, A. In Vitro Generated Antibodies Specific for Telomeric Guanine-Quadruplex DNA React with Stylonychia Lemnae Macronuclei. Proc. Natl. Acad. Sci. U. S. A. 2001, 98, 8572-8577. (15). Lam, E. Y. N.; Beraldi, D.; Tannahill, D.; Balasubramanian, S. G-Quadruplex Structures Are Stable and Detectable in Human Genomic DNA. Nat. Commun. 2013, 4, 1796. (16). Biffi, G.; Tannahill, D.; McCafferty, J.; Balasubramanian, S. Quantitative Visualization of DNA GQuadruplex Structures in Human Cells. Nat. Chem. 2013, 5, 182-186. (17). Chambers, V. S.; Marsico, G.; Boutell, J. M.; Di Antonio, M.; Smith, G. P.; Balasubramanian, S. High-Throughput Sequencing of DNA G-Quadruplex Structures in the Human Genome. Nat. Biotech. 2015, 33, 877-881. 30 ACS Paragon Plus Environment

Page 31 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

(18). Wang, Y.; Patel, D. J. Solution Structure of the Human Telomeric Repeat D[Ag(3)(T(2)Ag(3))3] GTetraplex. Structure 1993, 1, 263-282. (19). Parkinson, G. N.; Lee, M. P. H.; Neidle, S. Crystal Structure of Parallel Quadruplexes from Human Telomeric DNA. Nature 2002, 417, 876-880. (20). Ambrus, A.; Chen, D.; Dai, J.; Bialis, T.; Jones, R. A.; Yang, D. Human Telomeric Sequence Forms a Hybrid-Type Intramolecular G-Quadruplex Structure with Mixed Parallel/Antiparallel Strands in Potassium Solution. Nucleic Acids Res. 2006, 34, 2723-2735. (21). Luu, K. N.; Phan, A. T.; Kuryavyi, V.; Lacroix, L.; Patel, D. J. Structure of the Human Telomere in K+ Solution:  An Intramolecular (3 + 1) G-Quadruplex Scaffold. J. Am. Chem. Soc. 2006, 128, 9963-9970. (22). Xu, Y.; Noguchi, Y.; Sugiyama, H. The New Models of the Human Telomere D[Aggg(Ttaggg)3] in K+ Solution. Bioorg. Med. Chem. 2006, 14, 5584-91. (23). Phan, A. T.; Luu, K. N.; Patel, D. J. Different Loop Arrangements of Intramolecular Human Telomeric (3+1) G-Quadruplexes in K+ Solution. Nucleic Acids Res. 2006, 34, 5715-5719. (24). Dai, J.; Carver, M.; Punchihewa, C.; Jones, R. A.; Yang, D. Structure of the Hybrid-2 Type Intramolecular Human Telomeric G-Quadruplex in K+ Solution: Insights into Structure Polymorphism of the Human Telomeric Sequence. Nucleic Acids Res. 2007, 35, 4927-4940. (25). Lim, K. W.; Amrane, S.; Bouaziz, S.; Xu, W. X.; Mu, Y. G.; Patel, D. J.; Luu, K. N.; Phan, A. T. Structure of the Human Telomere in K+ Solution: A Stable Basket-Type G-Quadruplex with Only Two GTetrad Layers. J. Am. Chem. Soc. 2009, 131, 4301-4309. (26). Dai, J.; Carver, M.; Yang, D. Polymorphism of Human Telomeric Quadruplex Structures. Biochimie 2008, 90, 1172-1183. (27). Gray, R. D.; Buscaglia, R.; Chaires, J. B. Populated Intermediates in the Thermal Unfolding of the Human Telomeric Quadruplex. J. Am. Chem. Soc. 2012, 134, 16834-16844. (28). Lee, J. Y.; Okumus, B.; Kim, D. S.; Ha, T. Extreme Conformational Diversity in Human Telomeric DNA. Proc. Natl. Acad. Sci. U. S. A. 2005, 102, 18938-18943. (29). Li, W.; Hou, X.-M.; Wang, P.-Y.; Xi, X.-G.; Li, M. Direct Measurement of Sequential Folding Pathway and Energy Landscape of Human Telomeric G-Quadruplex Structures. J. Am. Chem. Soc. 2013, 135, 6423-6426. (30). You, H.; Zeng, X.; Xu, Y.; Lim, C. J.; Efremov, A. K.; Phan, A. T.; Yan, J. Dynamics and Stability of Polymorphic Human Telomeric G-Quadruplex under Tension. Nucleic Acids Res. 2014, 13, 8789–8795. (31). Bessi, I.; Jonker, H. R. A.; Richter, C.; Schwalbe, H. Involvement of Long-Lived Intermediate States in the Complex Folding Pathway of the Human Telomeric G-Quadruplex. Angew. Chem., Int. Ed. 2015, 54, 8444-8448. (32). Zhang, A. Y. Q.; Balasubramanian, S. The Kinetics and Folding Pathways of Intramolecular GQuadruplex Nucleic Acids. J. Am. Chem. Soc. 2012, 134, 19297-19308. (33). Limongelli, V.; De Tito, S.; Cerofolini, L.; Fragai, M.; Pagano, B.; Trotta, R.; Cosconati, S.; Marinelli, L.; Novellino, E.; Bertini, I., et al. The G-Triplex DNA. Angew. Chem., Int. Ed. 2013, 52, 2269-2273. (34). Bončina, M.; Lah, J.; Prislan, I.; Vesnaver, G. Energetic Basis of Human Telomeric DNA Folding into G-Quadruplex Structures. J. Am. Chem. Soc. 2012, 134, 9657-9663. (35). Mashimo, T.; Yagi, H.; Sannohe, Y.; Rajendran, A.; Sugiyama, H. Folding Pathways of Human Telomeric Type-1 and Type-2 G-Quadruplex Structures. J. Am. Chem. Soc. 2010, 132, 14910-14918. (36). Stadlbauer, P.; Kührová, P.; Banáš, P.; Koča, J.; Bussi, G.; Trantírek, L.; Otyepka, M.; Šponer, J. Hairpins Participating in Folding of Human Telomeric Sequence Quadruplexes Studied by Standard and T-Remd Simulations. Nucleic Acids Res. 2015, 43, 9626-9644. (37). Stadlbauer, P.; Trantírek, L.; Cheatham Iii, T. E.; Koča, J.; Šponer, J. Triplex Intermediates in Folding of Human Telomeric Quadruplexes Probed by Microsecond-Scale Molecular Dynamics Simulations. Biochimie 2014, 105, 22-35.

31 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 32 of 60

(38). Sugita, Y.; Okamoto, Y. Replica-Exchange Molecular Dynamics Method for Protein Folding. Chem. Phys. Lett. 1999, 314, 141-151. (39). Perez, A.; Marchan, I.; Svozil, D.; Sponer, J.; Cheatham, T. E.; Laughton, C. A.; Orozco, M. Refinenement of the Amber Force Field for Nucleic Acids: Improving the Description of Alpha/Gamma Conformers. Biophys. J. 2007, 92, 3817-3829. (40). Yoo, J. J.; Aksimentiev, A. Improved Parametrization of Li+, Na+, K+, and Mg2+ Ions for All-Atom Molecular Dynamics Simulations of Nucleic Acid Systems. J. Phys. Chem. Lett. 2012, 3, 45-50. (41). Darden, T.; York, D.; Pedersen, L. Particle Mesh Ewald - an N.Log(N) Method for Ewald Sums in Large Systems. J. Chem. Phys. 1993, 98, 10089-10092. (42). Hess, B. P-Lincs: A Parallel Linear Constraint Solver for Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 116-122. (43). Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101. (44). Patriksson, A.; van der Spoel, D. A Temperature Predictor for Parallel Tempering Simulations. Phys. Chem. Chem. Phys. 2008, 10, 2073-2077. (45). Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. Gromacs 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435-447. (46). Bussi, G.; Gervasio, F. L.; Laio, A.; Parrinello, M. Free-Energy Landscape for Β Hairpin Folding from Combined Parallel Tempering and Metadynamics. J. Am. Chem. Soc. 2006, 128, 13435-13441. (47). Laio, A.; Gervasio, F. L. Metadynamics: A Method to Simulate Rare Events and Reconstruct the Free Energy in Biophysics, Chemistry and Material Science. Rep. Prog. Phys. 2008, 71, 126601. (48). Barducci, A.; Bussi, G.; Parrinello, M. Well-Tempered Metadynamics: A Smoothly Converging and Tunable Free-Energy Method. Phys. Rev. Lett. 2008, 100, 020603. (49). Bonomi, M.; Parrinello, M. Enhanced Sampling in the Well-Tempered Ensemble. Phys. Rev. Lett. 2010, 104, 190601. (50). Prakash, M. K.; Barducci, A.; Parrinello, M. Replica Temperatures for Uniform Exchange and Efficient Roundtrip Times in Explicit Solvent Parallel Tempering Simulations. J. Chem. Theory Comput. 2011, 7, 2025-2027. (51). Islam, B.; Stadlbauer, P.; Neidle, S.; Haider, S.; Sponer, J. Can We Execute Reliable MM-PBSA Free Energy Computations of Relative Stabilities of Different Guanine Quadruplex Folds? J. Phys. Chem. B 2016, 120, 2899-2912. (52). Song, J.; Ji, C.; Zhang, J. Z. H. The Critical Effect of Polarization on the Dynamical Structure of Guanine Quadruplex DNA. Phys. Chem. Chem. Phys. 2013, 15, 3846-3854. (53). Luo, D.; Mu, Y. All-Atomic Simulations on Human Telomeric G-Quadruplex DNA Binding with Thioflavin T. J. Phys. Chem. B 2015, 119, 4955-4967. (54). Zgarbová, M.; Otyepka, M.; Šponer, J.; Mládek, A.; Banáš, P.; Cheatham, T. E.; Jurečka, P. Refinement of the Cornell et al. Nucleic Acids Force Field Based on Reference Quantum Chemical Calculations of Glycosidic Torsion Profiles. J. Chem. Theory Comput. 2011, 7, 2886-2902. (55). Krepl, M.; Zgarbová, M.; Stadlbauer, P.; Otyepka, M.; Banáš, P.; Koča, J.; Cheatham, T. E.; Jurečka, P.; Šponer, J. Reference Simulations of Noncanonical Nucleic Acids with Different χ Variants of the Amber Force Field: Quadruplex DNA, Quadruplex RNA, and Z-DNA. J. Chem. Theory Comput. 2012, 8, 2506-2520. (56). Phan, A. T.; Kuryavyi, V.; Luu, K. N.; Patel, D. J. Structure of Two Intramolecular G-Quadruplexes Formed by Natural Human Telomere Sequences in K+ Solution. Nucleic Acids Res. 2007, 35, 6517-6525.

32 ACS Paragon Plus Environment

Page 33 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Table 1. Structural models of the five human telomeric DNA G-quadruplexes.

The five established human telomeric DNA G-quadruplexes investigated in this study are 1KF1(PDB code, same below)19 (K+ induced parallel type), 143D18 (Na+ induced anti-parallel or 3-tetrad basket type), 2JSM56 (K+ induced hybrid-1 type), 2JSL56 (K+ induced hybrid-2 type) and 2KF825 (K+ induced two G-tetrads basket type).

33 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 60

Table 2. Relative free energies of the three K+ promoted G-quadruplexes. PDB Relative free energy (kJ/mol)

2JSM 10.14

2JSL 0

2KF8 5.25

Frequencies of G-quadruplexes (folded) occurring at replica-0 in the last 100 ns simulations were utilized to determine the relative free energies among different topologies. Since 1KF1 and 143D do not occur at replica-0 in the last 100 ns simulations, they are not included in the calculation.

34 ACS Paragon Plus Environment

Page 35 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure Caption Figure 1. Random walks in the temperature space of conformations initially (time = 0 ps) at the lowest five temperature replicas. Time-evolving replica series visited by conformations initially (time = 0 ps) at the lowest five temperature replicas are shown. The lowest five temperature replicas are initiated in the folded Gquadruplex topologies: 1KF1, 143D, 2JSM, 2JSL and 2KF8, respectively.

Figure 2. Frequencies of the five G-quadruplex topologies (folded) occurring at replica-0 as a function of time. Frequencies of the five G-quadruplex topologies (folded) occurring at replica-0 are shown at a time window of 20 ns. The convergence of the REMD simulations is indicated by the steady frequencies in the last 100 ns.

Figure 3. Frequencies of the five G-quadruplex topologies (folded) occurring at different replicas. Frequencies of the five G-quadruplex topologies (folded) occurring at all replicas in the last 100 ns simulations are shown. The highest frequencies of 2JSL across a wide range of low temperature replicas reveal its highest thermal stability over the other G-quadruplex topologies investigated in this study.

Figure 4. RMSDs of continuous trajectories from the REMD simulations. To trace the unfolding events, continuous trajectories were assembled from the temperatureensemble trajectories in the REMD simulations and continuous RMSDs were calculated. The RMSDs are referenced to the initial conformation of each continuous trajectory and classified into five groups based on their initial folding topologies.

Figure 5. Free energy surfaces from the PTMetaD-WTE simulations. The free energy surfaces of 2JSM and 2JSL were explored with the PT-MetaD simulations in the well-tempered ensemble, as shown in (A) and (B), respectively. There are seven local minima identified in (A), denoted as A, B1, B2, C, D, E, F, and G. B1 & B2 are from the same free energy basin while their representative structures exhibit slight conformational distinction. In (B), only two local minima were identified, denoted as H and I.

35 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 60

Figure 6. Representative structures of free energy basins. Conformations delimited by free energy basins were clustered for the representative structures. From (A) to (I), the representative structures to free energy basins A to I are shown with their clustering occupancies (percentage of conformations in the top cluster). (A) Depicts the native topology of 2JSM in which two K+ ions are well-coordinated between the three stacking G-tetrads. Both of (B1) and (B2) exhibit the deformation of DG-23 moving away from the third G-tetrad of 2JSM, but the difference between the two is the orientations of their 5’ end flanking bases. In transition structure (C), the last G-tract of 2JSM is shifted upwards with the N-glycosidic conformation of DG-22 interchanged from anti to syn. Similarly, in transition structure (D), the first strand of 2JSM is shifted upwards with the N-glycosidic conformation of DG-4 interchanged from anti to syn. (E) depicts the triplex folding intermediate of 2JSM, which is formed by the first three Gtracts in the correct anti/syn conformations and stabilized by the presence of two coordinated K+ ions. (F) also exhibits a triplex-like structure; however, its wrong configuration of the anti/syn Nglycosidic conformations makes no effective Hoogsteen H-bonds. (G) is the hairpin folding intermediate, formed by the 2nd and 3rd G-tracts of 2JSM. (H) describes the native topology of 2JSL, which is stabilized by the presence of two coordinated K+ ions and the 3’ end T:A:T capping structure. (I) is characterized as a transition structure, in which the first and last G-tracts of 2JSL are shifted towards the 3’ end. Flip-overs of the N-glycosidic conformations are expected for DG-3 & 21 to participate in the newly formed cyclic Hoogsteen H-bonding. However, the flip-over on DG-3 is not associated with a complete conversion from syn to anti.

Figure 7. Interchanges of the N-glycosidic conformations in 2JSM. Structural models from the native structure of 2JSM to transition structures D & C are shown in (A) with the N-glycosidic conformations coloured in blue (“anti” type) and red (“syn” type), respectively. The torsional angle (O4’-C1’-N9-C4) around the N-glycosidic bond in the range from -90° to +90° is defined in the “syn” type; oppositely, angles outside this range (i.e. [-180°, -90°] & [90°, 180°]) are recognized in the “anti” type. In transition structure D, where the first strand of 2JSM undergoes upward shifting, DG-4 which is originally in the “anti” type conformation has to take the place of DG-3 which exhibits a “syn” type conformation in the native structure. The shifting results DG-4, 9, 17 and 21 on the same plane. To involve in the cyclic Hoogsteen H-bonding with the other three guanines, DG-4 has to flip over to expose its donor edge towards DG-21 and acceptor edge towards DG-9. By rotating the N-glycosidic bond, the conformation of DG-4 was converted from anti (-152.1°) to syn (56.6°) as illustrated in (B). Conformations in the native structure are shown in green and those in the transition structures are shown in cyan (same below). In the same transition structure D, DG-5 has to take the place of DG-4 and involves in the cyclic Hoogsteen H-bonding with DG-10, 16 and 22. As both of them are originally in the anti type conformation, there is no flip-over required (as shown in C). Similarly, in transition structure C, where the last strand of 2JSM is shifted upwards, DG-22 originally in the anti type conformation has to take the place of DG-21 to involve in the G-tetrad formation with DG-3, 9 and 17. Since DG-21 is originally in the syn type conformation, DG-22 has to convert from anti to syn (shown in (D)) to re-orient its donor edge towards DG-17 and acceptor edge towards DG-3. Lastly, DG-23 which is in the same anti type conformation as the original DG-22, does not need to interchange its N-glycosidic conformation (shown in (E)) to form a G-tetrad with DG-4, 10 and 16. 36 ACS Paragon Plus Environment

Page 37 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 8. Interactions in the triplex intermediate of 2JSM. In the triplex intermediate, as the last strand of 2JSM is intruding into the groove formed by the first strand and the propeller loop, DG-23 is stacking with DT-6 and forming bifurcated H-bonds with DG-4.

Figure 9. Interchanges of the N-glycosidic conformations in 2JSL. Structural model from the native structure of 2JSL to transition structure I is shown in (A) with the N-glycosidic conformations coloured in blue (anti) and red (syn), respectively. In transition structure I, where the first and last strands of 2JSL are shifted downwards, DG-3 originally in the syn type conformation has to take the place of DG-4 to participate in the cyclic Hoogsteen Hbonding with DG-10, 16 and 21 (DG-21 also shifted downwards). It has been observed that DG-3 does flip over itself to expose its donor edge towards DG-10 and acceptor edge towards DG-21 (like DG-4 in the native structure). However, DG-3 is remained in the syn type conformation (shown in (B)). For DG-21, it flips over from its original syn conformation to the anti type (shown in (D)). Whereas, there are no flip-overs required for DG-4 (shown in (C)) and DG-22 (shown in (E)) to participate in the G-tetrad formation with DG-9 & 17.

Figure 10. Proposed folding pathways for the hybrid-1 and -2 type human telomeric DNA Gquadruplexes. Folding pathways for the hybrid-1 and -2 type human telomeric DNA G-quadruplexes were proposed as shown above. The folding of a G-quadruplex can be described as a process in which a single-stranded G-rich sequence seeks to optimize its N-glycosidic conformations for the tetradwise G-G Hoogsteen base pairing. The initial types of the N-glycosidic conformations of guanines can be arbitrary (coloured in grey); however, to form effective Hoogsteen H-bonding, proper arrangements of the N-glycosidic conformations are essential for guanines to pose their donor and acceptor edges in an accommodative manner towards their neighbours. In the hairpin and triplex folding intermediates identified for 2JSM, all guanines involved in the intermediates are in the correct anti/syn types as in the native structure (anti in blue and syn in red). K+ ions are captured to stabilize the π-π stacking in the intermediates. For 2JSL, due to its high free energy barriers from the native topology to the transition structures, no folding intermediates were targeted within the time scale explored in this study. Considering its sequence similarity to 2JSM, a hairpin intermediate formed by the 2nd and 3rd strands was proposed. While with the 3’ end flanking bases exclusively for 2JSL, a triplex intermediate consisting of the last three strands may be stimulated by the presence of the 3’ end T:A:T triple capping structure. (Thymine is shown in yellow and Adenine in green.)

37 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 38 of 60

Figure 1. Random walks in the temperature space of conformations initially (time = 0 ps) at the lowest five temperature replicas.

38 ACS Paragon Plus Environment

Page 39 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 2. Frequencies of the five G-quadruplex topologies (folded) occurring at replica-0 as a function of time.

39 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 40 of 60

Figure 3. Frequencies of the five G-quadruplex topologies (folded) occurring at different replicas.

40 ACS Paragon Plus Environment

Page 41 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 4. RMSDs of continuous trajectories from the REMD simulations.

41 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 42 of 60

Figure 5. Free energy surfaces from the PTMetaD-WTE simulations. 42 ACS Paragon Plus Environment

Page 43 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 6. Representative structures of free energy basins. 43 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 44 of 60

Figure 7. Interchanges of the N-glycosidic conformations in 2JSM.

44 ACS Paragon Plus Environment

Page 45 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 8. Interactions in the triplex intermediate of 2JSM.

45 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 46 of 60

Figure 9. Interchanges of the N-glycosidic conformations in 2JSL.

46 ACS Paragon Plus Environment

Page 47 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 10. Proposed folding pathways for the hybrid-1 and -2 type human telomeric DNA Gquadruplexes.

47 ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 48 of 60

TOC Image

48 ACS Paragon Plus Environment

Page 49 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 1. Random walks in the temperature space of conformations initially (time = 0 ps) at the lowest five temperature replicas. 145x72mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2. Frequencies of the five G-quadruplex topologies (folded) occurring at replica-0 as a function of time. 250x94mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 50 of 60

Page 51 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 3. Frequencies of the five G-quadruplex topologies (folded) occurring at different replicas. 250x122mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 4. RMSDs of continuous trajectories from the REMD simulations. 145x74mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 52 of 60

Page 53 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 5(A). Free energy surfaces from the PTMetaD-WTE simulations. 600x405mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5(B). Free energy surfaces from the PTMetaD-WTE simulations. 600x408mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 54 of 60

Page 55 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 6. Representative structures of free energy basins. 330x430mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 8. Interactions in the triplex intermediate of 2JSM. 246x170mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 56 of 60

Page 57 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

Figure 9. Interchanges of the N-glycosidic conformations in 2JSL. 150x91mm (300 x 300 DPI)

ACS Paragon Plus Environment

The Journal of Physical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 10. Proposed folding pathways for the hybrid-1 and -2 type human telomeric DNA G-quadruplexes. 291x412mm (300 x 300 DPI)

ACS Paragon Plus Environment

Page 58 of 60

Page 59 of 60

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

The Journal of Physical Chemistry

TOC image 34x14mm (600 x 600 DPI)

ACS Paragon Plus Environment