GADDLE Maps: General Algorithm for Discrete object Deformations

Dec 19, 2017 - A new method for switching between structures consisting of equivalent discrete and flexible objects with different particle representa...
1 downloads 11 Views 9MB Size
Subscriber access provided by RMIT University Library

Article

GADDLE Maps: General Algorithm for Discrete object Deformations based on Local Exchange Maps Txema Otero, Hadrián Montes, Martín Calvelo, Rebeca GarciaFandino, Luis J. Gallego, Ángel Piñeiro, and Luis Miguel Varela J. Chem. Theory Comput., Just Accepted Manuscript • DOI: 10.1021/acs.jctc.7b00861 • Publication Date (Web): 19 Dec 2017 Downloaded from http://pubs.acs.org on December 21, 2017

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Journal of Chemical Theory and Computation is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

GADDLE Maps: General Algorithm for Discrete object Deformations based on Local Exchange Maps J. Manuel Otero-Mato,† Hadri´an Montes-Campos,‡ Mart´ın Calvelo,¶ Rebeca ´ Garc´ıa-Fandi˜no,§ Luis J. Gallego,‡ Angel Pi˜neiro,∗,k and Luis M. Varela∗,‡ †Nanomaterials, Photonics and Soft Matter Group, Departamento de F´ısica de Part´ıculas y Departamento de F´ısica Aplicada, Facultade de F´ısica, Universidade de Santiago de Compostela, Campus Vida s/n, E-15782 Santiago de Compostela, Spain ‡Nanomaterials, Photonics and Soft Matter Group. Departamento de F´ısica de Partculas y Departamento de F´ısica Aplicada, Facultade de F´ısica, Universidade de Santiago de Compostela, Campus Vida s/n, E-15782 Santiago de Compostela, Spain ¶CIQUP, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, R. Campo alegre, 687 P-4169-007 Porto, Portugal §Department of Organic Chemistry, Center for Research in Biological Chemistry and Molecular Materials, University of Santiago de Compostela, Campus Vida s/n E-15782 Santiago de Compostela, Spain kSoft Matter and Molecular Biophysics Group, Departamento de F´ısica Aplicada, Facultade de F´ısica, Universidade de Santiago de Compostela, Campus Vida s/n, E-15782 Santiago de Compostela, Spain E-mail: [email protected]; [email protected]

1

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Abstract A new method for switching between structures consisting of equivalent discrete and flexible objects with different particle representation and object configuration, including different resolution levels (number of particles per object), is reported. The method is fully general since it does not require any extra code nor additional database elements for new systems. It is based on a Monte Carlo sampling of the configurational space for each object type of the target system, which is controlled by a Metropolis acceptance criterion of movements (translations, rotations and relative deformations of the object configuration) based on the generalized distance between the sets of particles at both representations. For Gaussian distributed distances, such minimization procedure is equivalent to an optimization of χ2 in a maximum likelihood method. This provides sound statistical support since the method leads to the statistically most probable configuration of the system at each representation. The configurations obtained in this way are then used to create resolution exchange maps for each object type, which allows the extrapolation of the conversion to every object configuration throughout the whole system. As an example, the method is here tested with several molecular dynamics simulated systems (ionic liquids, cyclodextrins, cell-penetrating peptides, cyclic peptides, lipid bilayers, vesicles, heterogeneous organic molecules, DNA and solvated proteins) for different resolution forcefields (GROMOS, AMBER, OPLS, MARTINI) using Gromacs. In this context, the method can be applied to map structures described by any other pair of force fields, as well as to homogeneous and heterogeneous systems with many different molecules. The method is proved to be highly efficient since the time required for the mapping is practically independent of the number of molecules in the target system.

Introduction In 1978, the first computational molecular dynamics (MD) simulation of a protein at atomistic level was performed. 1,2 The simulated system was the bovine pancreatic trypsin in2

ACS Paragon Plus Environment

Page 2 of 43

Page 3 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

hibitor (BPTI), with approximately 500 atoms, in vacuum, and the simulation time was only 9.2 ps. Nowadays, typical MD simulation systems include tens or even hundreds of thousands of atoms, for which microsecond timescales are reachable. The combined effort of many research groups through the last decades boosted the advance of hardware, software and algorithms, allowing such amazing progress in this kind of calculations. Thus, classical MD simulations at atomistic (AT) resolution have been successfully employed to study an impressive variety of molecular systems including supramolecular self-assembled structures in solution and at interfaces (amorphous aggregates of different size, micelles, membranes, vesicles and liposomes formed by a large variety of molecules), peptides, water-soluble or membrane proteins, DNA and other polymers, drugs, solvents, nanoparticles, and functional surfaces (gold, graphene, etc.). The correct analysis of well-designed simulations has the potential to provide energetic, structural and dynamic properties of the studied systems, in addition to clear pictures and movies that undoubtedly help us to understand the connection between all these properties and the corresponding interatomic and intermolecular interactions. The effect of explicit solvent and the presence of different phases can also be directly considered. More specific analysis also allows identifying useful interaction and dynamic mechanisms such as the passage of a molecule through a pore or membrane channel, 3 the location of hot residues in proteins or peptides, which can stabilize or perturb the structures upon mutation, 4 and folding/unfolding patterns. 5 In spite of the amazing possibilities affordable in terms of system variety and size, as well as in simulation timescales, there are many situations where the current limitations of atomic resolution MD simulations are too restrictive. This can be due to the huge amount of molecules involved in the target system or/and to the large time required to achieve a given process, amongst other reasons. The so-called Coarse Grained (CG) MD simulations, where groups of several atoms are represented by single particles, are typically employed in these cases. Such simulations allow increasing the system size and the total simulation timescales by 2-3 orders of magnitude due to the largely reduced number of explicit particles

3

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

to be considered, to the consequent reduction of the number of degrees of freedom and the smoothening of the energy landscape. 6 This allows significantly increasing the time step employed for the integration of the equation of motion. Thus, the spontaneous formation of large aggregates, the equilibrium or molecular exchange between different supramolecular structures or the aggregation of large macromolecules can be studied at this resolution level. One of the most successful CG approaches is the MARTINI force field, 7 which has been routinely employed by many research groups for the last fifteen years. Simulations at CG resolution are specially well suited for self-assembly studies of relatively small molecules, but more complex studies like protein aggregation in solution or embedded in membranes have also been reported (see refs. 8, 9 and references cited therein). The key points of CG simulations are the definition of the CG particles on one hand, since the grouping of actual atoms can be performed in different ways, and, on the other hand, the parameterization of the resulting structures, i.e. the interaction between CG beads. Specific CG topologies and structures for a number of molecules are available in the official MARTINI web site. 10 New molecules can be parameterized by hand with some effort but a number of automated tools are also available. 11,12 The loss of resolution associated to CG simulations entails a loss of information and, in some cases, artefacts that could lead to misinterpretations of the real molecular system behaviour. A good solution for this is performing multiscale simulations taking advantage of the best features of each simulation level, i.e. CG simulations would be done for relatively large scale self-assembly studies, for molecular exchange between different structures, to monitor the transfer of molecules between different regions separated by energy barriers, etc; and AT simulations would be then performed to go into details at atomistic or molecular level. The bottleneck of this procedure is the recovery of the AT resolution from CG structures. Several automated tools and methods have also been developed for this aim. Rzepiela et al. 13 proposed a method where the set of atoms represented by each CG bead are randomly positioned within a sphere with a radius of 0.3 nm around it; then, a simulated annealing simulation is employed to optimize the structure, gradually releasing

4

ACS Paragon Plus Environment

Page 4 of 43

Page 5 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

the imposed restraints between the center of mass of each set of atoms and the position of the corresponding CG bead. This method has the inconvenience that it is necessary to know the equivalence of each CG bead in AT resolution, so some kind of database is required. The method proposed by Stansfeld and Sansom 14 is specifically addressed to the backmapping of membrane systems; proteins are solved by simply aligning the AT to the CG structures or using the positions of the reference beads combined with modeller tools if conformational changes at CG level are considered. In their method, the atomic lipid coordinates are recovered based on the replacement of individual CG by AT structures taken from a pool of many fragment conformations previously generated. Brocos et al. 15 proposed an interpolation-extrapolation method for the atomic positions that is then optimized by a series of energy minimization methods. Wassenaar et al. 16 suggested a slightly more elaborate geometric approach. The disadvantage of the latter two methods is that only the molecules or residues included in the code or in a specific library can be considered. In general, all these methods are slow and complex, or they require specific implementations for new molecules. In this paper we present an efficient and fully general Monte Carlo-based method capable of recovering atomic resolution from any CG structure in a very easy and statistically well founded way without generating new code, library elements nor previously performing AT simulations to generate conformational pools. A detailed description of the method is presented followed by its application to a large variety of molecular systems of interest.

Theoretical and computational methods In this section, a new method to switch between structures consisting of equivalent discrete and flexible objects with different particle representations and object configurations is described in detail. The method is expected to be fully general but we will take the mapping of CG molecular structures to their corresponding AT resolution as a test case. In this context,

5

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

the information required to apply this algorithm is: (i) the CG coordinates of the system to be mapped (a PDB file or the equivalent information in another format); (ii) the CG and AT topology files for the compounds of the system to be mapped (ITP files in Gromacs format are being employed in our current implementation but the method could be easily adapted to any other MD engine); and (iii) one arbitrary AT-resolution conformation for each compound to be mapped. These files do not represent strong limitations to employ the method since the CG target structure is obviously always available and the CG and AT topologies are also required to perform the corresponding simulations at both resolution levels. Meanwhile, the AT conformation can be easily generated using a variety of molecular editors. For better understanding of each step, the complete algorithm is graphically summarized in Fig. 1.

Description of the algorithm The atomic coordinates of a molecule randomly chosen from the CG system are carefully modified to minimize the distance between them and the set of closest atoms in the provided AT structure. This is done by implementing a Metropolis Monte Carlo algorithm starting from a configuration where the geometric centers of both molecules are located at the same coordinates. Flexible Fitting of a random CG structure to the AT coordinates Each step of the Monte Carlo algorithm is employed to optimize the covering of the atomistic resolution molecule by that extracted from the CG system. In each step of this optimization, the coordinates of the CG beads can be modified by a rigid translation or rotation, which are applied over the CG molecule as a whole, or by a relative displacement of a single bead. ˆ Rigid translations of the whole CG molecule are performed in such a way that each

~ is randomly chosen from a normal distribution component of the translation array ∆R centered at zero with a standard deviation equal to twice the minimum bond distance in the AT structure. 6

ACS Paragon Plus Environment

Page 6 of 43

Page 7 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

ˆ Rigid rotations are always counter-clockwisely performed around an axis with a random

direction passing through the geometric center of the molecule. The value of the rotation angle is also randomly chosen from a normal distribution centered at zero with a standard deviation of π/4 rad. This choice produces mostly small rotations but also admits quite abrupt rotations (the probability of a rotation larger than π is one in 15787), thus allowing scanning all possible configurations. ˆ The relative displacement of single beads provides flexibility to the CG structures.

The bead to be displaced is always randomly chosen. The corresponding coordinates are changed in a different way depending on the number of bonds connected to the CG-bead (this information is in the corresponding topology file mentioned previously). Beads with a single bond are randomly translated along the plane perpendicular to that bond; beads with two bonds are randomly translated along the plane perpendicular to the line joining the two attached beads; beads with three or more bonds are randomly translated along the line perpendicular to the plane containing three randomly chosen bonded beads (see Fig. 2). Once the direction of the bead displacement is determined, the distance to be translated is randomly chosen from a normal distribution centered at zero and with a standard deviation equal to half the average bond distance of the CG molecule. This provides a small perturbation in each step which does not ensure the conservation of the bond distances. To avoid this, first neighbours of the already moved atoms (previously frozen) are subsequently displaced in order to restore the bond distances (see Fig. 3 for a visualization of the process). After each modification of the coordinates, the distance between the CG and the AT structures is determined as:

d2 =

X

|~riAT − ~riCG |2 ,

(1)

i∈AT

where ~riAT are the coordinates of the i-th atom of the AT molecule and ~riCG are the co7

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 43

ordinates of the closest CG-bead to the ith atom. For simplicity, in the following we will write ~riAT ≡ ~ri . Thus, considering that, in the case of molecular systems, the distances between both atoms (at AT resolution) and beads (at CG resolution) have characteristic length scales, we can confidently assume that the AT atoms are independently and normally distributed around the the positions of the corresponding nearest CG beads (and vice versa). Hence, the probability of finding the AT structure in a certain configuration around the CG one can be written as:   |~ri − ~riCG |2 . P = pi ∝ exp − 2 2σ i i∈AT i∈AT Y

Y

(2)

This probability represents the likelihood of having a given AT distribution if the average positions were those of the beads of the CG configuration. So, the mapping algorithm here explained provides the optimal covering through the maximum likelihood estimation. As the bond lengths take similar values for every atom, we can assume that the standard deviation for all atoms is constant and the latter equation can be rewritten as: (

1 X P ∝ exp − 2 |~ri − ~riCG |2 2σ i∈AT

)



 1 2 = exp − 2 d . 2σ

(3)

The optimal CG molecular configuration is found by maximizing P , so minimizing

χ2 =

1 X |~ri − ~riCG |2 , σ 2 i∈AT

(4)

which in this case is directly proportional to the square of the distance between the two data sets, d2 . If the statistical weights of the atoms were not the same, then the statistical distance in the exponent of eq 2 should be minimized. The distance between the molecules in both resolutions calculated with eq. 1 may have local minima around configurations where one or more CG beads are far away from all the AT atoms. In this case, those CG beads would not be involved in the set-to-set distance

8

ACS Paragon Plus Environment

Page 9 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

calculation. Additionally, it is known that every CG bead represents a set of atoms in the AT-resolution molecule, so we are interested in computing configurations in which all CG beads are involved. In order to favor that the Monte Carlo algorithm explores the whole χ2 landscape, jumping out of the local minima, a multiplicative factor greater than 1 was introduced. The penalty should not be too high since the transit throughout those unrealistic states should be allowed in some cases in order to achieve more stable configurations. Thus, after some tests that consider the speed of convergence, we decided to set a penalty factor (λ) of 1.1 for the states where at least one CG bead does not contribute to χ2 . Different λ values greater than 1 are expected to lead to the same final molecular configuration, perhaps even faster. This is one of the points of the algorithm that could be optimized in the future. The previously described algorithm to identify the set of atoms represented by each CG bead is based just on the spatial coordinates of both molecular representations, so it does not take into account the atomic charge distribution. This could also be considered to avoid artefacts like the covering of a linear surfactant in the AT representation by the CG structure aligned in an antiparallel way; i.e., in some cases the algorithm could provide a good alignment of both structures but with the CG and AT polar/ionic heads in opposite directions. In our implementation of the method, these undesired solutions have been prevented by directly associating CG beads to specific atoms in eq 4; i.e., when such kind of restrictions are imposed to one atom, its contribution to χ2 is not based on its distance to the closest CG bead but to a predetermined specific one. In the case of CG protein structures using the MARTINI force field, our code automatically imposes a restriction that identifies the alpha carbons in both resolution levels. Similarly, reference atoms are easy to identify for DNA and other bio or artificial polymers. In general, these user-defined restrictions significantly accelerate the convergence and ensure that the correct alignment between the CG and the AT structures is reached.

9

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Acceptance criterion for the Monte Carlo algorithm The acceptance criterion for every new CG structure configuration is based on the ratio of χ2 values before (χ2i ) and after (χ2f ) the corresponding modification of coordinates (C = χ2i /χ2f ). New configurations with C > 1 are always accepted. Otherwise, the new configuration is accepted if C/100 is higher than a number randomly chosen between 0 and 1. Thus the acceptation probability is proportional to the C value. Once again, this factor was arbitrarily chosen to minimize the mapping time of our test systems. Any modification should affect just the convergence time but not the resulting AT configuration.

Convergence criterion The Monte Carlo sampling stops when the number of sequentially proposed structures which do not decrease the lowest χ2 is larger than a certain value Nsteps . We set a linear dependence between Nsteps and the number of CG beads of the structure to be mapped (NCG ). Such linear dependence comes from the fact that the probability to move a certain particle in the correct direction to minimize χ2 is independent of the chosen particle. Thus, when the optimum configuration was reached for all the CG structures except for one bead, the probability of choosing the wrong placed particle is 1/NCG . For the studied systems Nsteps was set to 5000 NCG . Note that Nsteps is intimately related to the acceptance factor in the Metropolis algorithm, so a compromise between both numbers should be reached to optimize the performance of the mapping process.

Mapping of the whole CG system After the alignment of the CG and AT structures, i.e. when the CG beads which are expected to be closest to the corresponding atoms have been identified, we have the information required to map the whole CG system considering all the appearing structural configurations. In order to do this, a coordinate system is defined for each CG bead. The basis vectors for each coordinate system are built considering the positions of the two closest CG beads. The 10

ACS Paragon Plus Environment

Page 10 of 43

Page 11 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

first vector is parallel to the line joining the reference CG bead and its closest neighbour; the second one is chosen to be normal to the plane containing the three considered beads; and the last one is taken to conform an orthonormal basis. In the particular case of three collinear CG beads, the coordinate system is not univocally defined so the first vector of the basis is determined by the line containing the beads, the second vector is randomly chosen in the perpendicular plane to the first vector, and the third one is defined by the cross product of the first two vectors. These coordinate systems can be defined for every CG molecules in the system. Thus the local coordinate systems are extrapolated to all the CG structures to identify the position of the atoms in the AT resolution system. Then these positions are extrapolated to the coordinate systems of the rest of the CG molecules in the system. It is worth to mention that this process is computationally very cheap, even for large systems. Overlaps are prevented by scaling the atomic positions by a factor of 0.75 so the size of the mapped molecules is somehow reduced. This artificial compression of individual molecules at local level will be later removed by energy minimization.

Optimization of the mapping process In order to optimize the final AT resolution structure, we decided to apply a series of minimization processes. First, a double precision minimization using the steepest descent algorithm was applied using the maximum force between atoms as convergence criterion. Then, a longer minimization using the conjugated gradients algorithm was performed until the energy converged.

Molecular Dynamics Simulation methods Preparation of the initial structures Six totally different systems (ionic liquids, cyclodextrins, cell-penetrating peptides, cyclic peptides, lipid membranes, vesicles, organic molecules, DNA and solvated proteins) were prepared in order to test our method. The preparation of the initial structures was as 11

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

follows: ˆ The CG structure of ubiquitin was obtained from 1UBQ and solvated using 729 MAR-

TINI water molecules. Elastic network restraints (ElNeDyn) 17 were applied in order to maintain the secondary structure. ˆ The helicoidal peptide with sequence RRLKRLLRRLKRL was built using Avogadro. 18

The resulting structure was inserted into a preformed CG dipalmitoylphosphatidylcholine (DPPC) membrane (extracted from ref. 19) and solvated with 2063 MARTINI water molecules. The total system was neutralized by introducing eight chloride ions. The helical structure of the peptide was restricted during the CG MD simulation using a harmonic potential function with a constant force of 104 kJ/mol · nm2 between the carbon atoms of the protein backbone. Such contribution to the potential energy does not directly affect the motion of the amino acid lateral chains, which remain unrestrained. ˆ A self-assembled nanotube (SCPN) composed by eight cyclic peptides (CP) units was

built from the replication of a dimer composed by CPs with sequence RRKWLWLW (underlining represents D-amino-acid residues). Distance restraints were applied to the backbone of the CP to maintain the structure of the nanotube. The resulting structure was approached to a 2-Oleoyl-1-palmitoyl-sn-glycero-3-phosphocholine 95.5% 98% (POPC) membrane, which was built using the tool insane. 20 Then, the system was solvated with 4140 preequilibrated MARTINI water molecules and neutralized. ˆ In the case of the vitamin E, an initial box with a random distribution of 500 vitamin

E molecules in CG resolution was prepared. A CG DNA molecule with sequence 5’AGGTAGTGTAATCGCCTTG-3’ was also added to the initial random mixture of vitamin E. The resulting system was solvated with 72387 MARTINI water molecules in a larger box so that the local concentration of vitamin E is larger than the total

12

ACS Paragon Plus Environment

Page 12 of 43

Page 13 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

concentration in order to speed up the self-assembly process. To keep the double-helix shape of the DNA, elastic network restraints (ElNeDyn) 17 were applied. ˆ A dimer composed of two α-cyclodextrins (α-CDs) in a head-head conformation 21 was

built using CG resolution. Single rings were connected through bonds in order to represent the dimer, according to previous work with analogous dimers composed of β-CDs. 22 A SDS molecule was introduced in the inner cavity of the dimer. Once this complex was constructed, a box with 50 of this complexes randomly distributed was prepared, solvating with 45019 MARTINI water molecules and adding 50 sodium ions in order to neutralize the total system. ˆ The initial simulation box for 1-butyl-3-methylimidazolium tetrafluoroborate ([BMIm][BF4 ])

ionic liquid (IL) system was obtained by packing 300 ionic pairs in a cubic box with the help of Packmol. 23 The side length of the box was equal to 5.4 nm.

CG and AT parameters The CG simulations were performed with the standard simulation parameters associated with the MARTINI force field. 7,24 The CG topologies for ubiquitin, helicoidal peptides, cyclic peptides, and DNA were built using the martinize.py tool. 25 For the CD, SDS and lipids the available parameters in http://cgmartini.nl were used. 22 The topologies of the CP and the CD (bonds and charges) were manually modified to close the cycles. The CG parameters for vitamin E were manually obtained following the MARTINI tutorial. 26 For the ionic liquid, the MARTINI CG resolution topologies of the molecules were obtained with the help of the auto-martini tool. 11 The AT simulations upon backmapping using our algorithm and resolvation with preequilibrated water were carried out using different force fields, depending on the system: GROMOS53a6 27 for the CD-SDS, GROMOS54a7 atb 28 for the Cell-penetrating peptides 29 and DPPC, 30 GROMOS54a7 28 for the protein in solution, OPLS 31–36 for SCPN and POPC, 37

13

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

and GAFF, 38 OPLS-AA 31,33,34,36 and AMBER 39 for the vitamin E, ILs and DNA, respectively.

CG and AT Simulation Conditions All the CG calculations were carried out at a temperature of 300 K using the V-rescale thermostat. 40 The Berendsen barostat 41 with isotropic pressure coupling was used for CDSDS, vitamin E-DNA and ubiquitin systems, while the same barostat with semiisotropic pressure coupling was used for the membrane systems: the Cell-penetrating peptides and the SCPN. All the AT simulations were carried out at 300 K using the V-rescale thermostat. In order to control the pressure, the Berendsen barostat with isotropic pressure coupling was used for the CD-SDS, vitamin E-DNA and ubiquitin systems, whereas the Parrinello-Rahman 42 with semiisotropic pressure coupling was used for the Cell-penetrating peptides and SCPN systems. All the simulations were done using Gromacs 5.1.2. 43

Results The systems selected to test the method are varied and relatively complex. They include ionic liquids (ILs), cyclodextrins, linear and cyclic peptides, vitamin E, DNA, lipids and proteins. We consider them as a representative sample, in terms of structural complexity, of the wide variety of molecular systems that are of current research interest, but our algorithm should work for any other problem involving the transformation between structures consisting of equivalent discrete and flexible objects with different particle representation and object configuration. It is worth mentioning that in this article we do not intend to perform a detailed study of the test systems but just to get representative CG structures to see how our algorithm is able to map them back to atomistic scale detail. Our strategy was to apply our

14

ACS Paragon Plus Environment

Page 14 of 43

Page 15 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

backmapping algorithm after a reasonable simulation time at CG resolution. This allows the system to evolve and the backmapping process is more challenging since the final structure of the system, the one to be mapped to high resolution, is significantly different from the initial one. The validation of the backmapped structures was left to the employed AT force field. Final structures that run stable MD simulations at AT level after the backmapping process are considered as valid and useful for further studies at high resolution. The mapping depends on the reference AT-CG taken to make the alignment, the identification of atom sets for each CG bead, and the associated local coordinate systems. The extrapolation of this mapping to all the CG molecules of the same species, each with different configuration, may lead to a slightly different final system if the reference structures change. This can be assessed by using a highly sensitive property for a system containing a significant number of molecules with a large variety of configurations. A well suited option is the determination of order parameters of lipids in model membranes which showed to be really sensitive to the structure of the bilayer. 44 Thus we took from ref. 45 an equilibrated atomicresolution model membrane consisting of 128 DPPC molecules, run it for 100 ns at 323 K to take it to the fluid phase, we mapped it to CG resolution and then, with no equilibration at all, we applied our algorithm to map it back to AT resolution. After the backmapping, a 20-ns-long AT simulation was performed. The order parameters were determined for single structures before the mapping to AT, after the backmapping to AT and after 5, 10, 15 and 20 ns of MD simulation at AT resolution (see Fig. 4 below). The order parameters just after backmapping −with no equilibration− were significantly distorted. This is obviously a consequence of the loss of information associated to the CG structure. A short equilibration allows the system to recover the correct structure. It is important to mention that the aim of the backmapping method is to produce useful structures for atomic resolution MD simulation that maintain the configuration provided by the CG resolution system. It would be unreasonable to expect a perfectly equilibrated AT resolution system after ignoring the information missing in the CG resolution. The order parameters determined after 10, 15

15

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

and 20 ns of MD simulation are undistinguishable of those obtained at 5 ns. Note that we decided to backmap only the solute molecules. In all cases the CG solvent was removed and the structures resulting from the backmapping process were resolvated with preequilibrated AT molecules. The solvent could also be mapped from CG to AT in a simple way using our algorithm, but we do consider this would not represent any real advantage in the particular case of MD simulations.

Protein in solution Ubiquitin is a globular 76-residue-long cytoplasmic protein with a molecular weight of 8.6 kDa found in all eukaryotic cells. 46 It is involved in a variety of biochemical functions, being the best known its role in protein apoptosis, which has triggered it to earn the title of the molecular “kiss of death” for proteins. 47 The protein native state involves a high content of secondary-structure elements, comprising a five-stranded β-sheet, an α-helix (fitting into a concavity formed by the β-sheet), a 3-10 helix, and seven reverse turns (Fig. 5). 48,49 Several studies, including MD simulations, have suggested that it remains in its native state in solution up to temperatures of 355 K. 50 We used this protein to test the ability of our method to map the structure back to atomic resolution after a CG simulation. A 200 ns-long MD simulation at CG resolution was carried out. The obtained structure was subsequently backmapped to the AT resolution level using the GROMOS54a7 force field. 28 The main features of the original structure were recovered upon the backmapping process and they were stable upon 50 additional nanoseconds of MD simulations at AT level.

Cell-penetrating Peptides Cell-penetrating peptides (CPPs) are short amino acid sequences characterized by their facially amphiphilic patterns of hydrophobicity and cationic charge. They have a remarkable ability to cross cell membranes and have become one of the emerging vehicles for delivery of cargo drugs. 51,52 16

ACS Paragon Plus Environment

Page 16 of 43

Page 17 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

An helicoidal peptide with sequence RRLKRLLRRLKRL (see Fig. 6) inserted into a DPPC membrane was simulated. During the CG simulation, the peptide is rapidly expelled from the hydrophobic core of the membrane (in less than 10 ns) and it places at the region of the lipid heads, with the hydrophilic residues oriented towards the water interface. After 1.2 µs of CG simulation, both the peptide and the lipids were mapped to AT using the G53a6 27 parameterization of the GROMOS force field. The system was then solvated with preequilibrated water, neutralized and minimized using the Gromacs tools. Subsequent atomistic MD simulations were carried out during 20 ns, confirming the stability of the peptide horizontally oriented over the DPPC bilayer.

Cyclic peptides Cyclic antimicrobial peptides (AMPs) have emerged as good antimicrobial candidates due to their robust secondary structure and high activity. 53 CPs that form SCPNs, such as demonstrated in the pioneering work by Ghadiri et al. 54 using CPs composed of amino acids of opposite alternating chirality (L and D), expose all side chains to the outer surface of the structure leaving an empty internal hole (Fig. 8). The radial disposition of the side chains outwards the channel modifies the stability and external properties of the nanotube and so those of the membrane where it is eventually embedded. 55 Depending on the CP sequences, the formed SCPNs can be oriented perpendicularly to the lipid membrane (hydrophobic CPs) or in a parallel fashion (amphipatic CPs), conferring them a high antimicrobial activity. 56,57 Structural studies suggest a carpet type mechanism 58 that decreases the membrane stability, dissipating the membrane potential that finally causes the cell death. One key feature that makes CPs very promising antimicrobial peptides is their low toxicity since the nanotube formation only takes place upon membrane interaction and this supramolecular structure -the nanotube- is the active form. A SCPN composed by the self-assembly of CPs developed by Ghadiri et al., 59 with sequence RRKWLWLW (underlining represents D-amino-acid residues) and moderate activities against E. coli, is here used to challenge the ability of our 17

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

method to map CG to AT resolution. A SCPN composed by eight CP units was approached to a solvated POPC membrane using the MARTINI force field. The resulting system was simulated during 150 ns, leading to a structure with the SCPN partially embedded into the membrane (Fig. 9). Upon applying our algorithm in order to obtain a representation of the system at AT resolution using the OPLS force field and resolvating with pre-equilibrated TIP3P water, the stability of the structure (with charged residues being oriented towards the hydrophilic part of the membrane) was confirmed by simulating the AT backmapped system during 20 additional nanoseconds.

Cyclodextrins Cyclodextrins are natural cyclic oligosaccharides formed by six (α-CD), seven (β-CD) or eight (γ-CD) glucopyranoside groups that play a relevant role in different fields such as pharmacy, medicine, biotechnology, chemistry, materials design, food or agricultural science. 60 CDs are able to form host-guest complexes with hydrophobic groups, mainly aliphatic chains and aromatic rings. Depending on the structure of the guest, it binds with different affinity constants and stoichiometries. Both single CDs and also CD-based supramolecular complexes can self-assemble in solution and at interfaces leading to interesting structures including cyclodextrinosomes and interfacial films that can be employed for different purposes. 61–63 The structures formed by these molecules are typically too large to be studied by MD simulations at atomic level in a reliable way and so the common strategy is to significantly reduce the size of the systems. 64,65 CG resolution simulations allow approaching better the size scale of such structures, so we decided to use the simulation of one of these systems as a test for our backmapping algorithm. In particular, we performed a preliminary study on the selfassembly process of αCD2 SDS1 complexes in water solution, starting from a random mixture of 100 previously solvated and minimized αCD2 SDS1 complexes at CG resolution. 22 Under these conditions, a 6 µs long MD simulation was carried out leading to a large amorphous aggregate that was subsequently mapped to atomic resolution using the GROMOS54a7 force 18

ACS Paragon Plus Environment

Page 18 of 43

Page 19 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

field. 28 The resulting structure was then simulated at atomic level during 10 ns. The AT simulation yielded a stable configuration, although the system continued evolving (see Fig. 10). The CDs have proven to be highly sensitive to the employed force field 66 and so the employed CG parameterization is probably not well suited for these systems. Whether or not the resulting structures are reliable mainly depends on the quality of the CG and AT parameterizations, but what is important in this context is that the backmapping perfectly conserved the structure of the aggregate and that it was straightforward to perform the AT simulation starting from it.

Organic molecules aggregation and DNA Vesicular drug delivery has received particular attention because of their potential ability to confine the activity of encapsulated or membrane embedded drugs at the target tissue. Liposomes are often used to deliver a molecular cargo such as DNA for therapeutic benefit. 67 Due to its amphipathic structure, vitamin E (see Fig. 11) and its analogues can be easily incorporated into lipid bilayers to produce functional liposomes. Vitamin E has proven to self-organize into vesicles even in the absence of lipids or any other cosolute, at appropriate concentrations. Such vesicles become unstable in the presence of divalent cations, acidic pH, and serum. 68 With the aim of testing our algorithm with organic molecules and DNA we decided to include a study using vitamin E and a short DNA chain in our backmapping test. Starting from a random distribution of 500 vitamin E molecules and a single DNA chain with sequence 5’-AGGTAGTGTAATCGCCTTG-3’ at CG resolution using the MARTINI force field, the spontaneous formation of a supramolecular aggregate was observed after 250 ns of MD simulation. The aggregate does not contain water molecules inside and most of the polar heads of vitamin E are oriented towards the bulk water. The DNA molecule is partially inserted in the vesicular nanostructure. The resulting system was mapped to AT resolution using our algorithm and the AMBER/GAFF force field, and an atomistic simulation during 10 ns was run. Again, the obtained simulation was stable and no serious changes of the 19

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

structure were observed (Fig. 12). In this case we have performed an additional validation test of our algorithm by checking that the chiral centers of the vitamin E molecules were conserved after the mapping process (see Fig. 13).

Ionic liquids ILs are compounds formed only by ions with a melting point below 100 ◦ C. Because of their interesting properties, such as low vapor pressures, high thermal stability, good electrochemical properties, excellent solvation capacities, etc., ILs have been widely studied during the last two decades. 69–72 Although experimental studies provide a good way to address the ILs characterization at macroscopic level, MD simulations could be considered as the best option to characterize these systems at a molecular resolution level. Besides the mentioned properties of these densely charged fluids, recent studies show that the microscopic organization due to the amphiphilic nature of ILs coexists with another level of organization at a mesoscopic level. 73,74 To carry out computational studies of these systems at a mesoscopic level, low-resolution MD simulations should be performed. Such simulations would allow reaching the length scales required for the formation of large-scale structures. The backmapping of the resulting structures to atomic resolution would allow extending the simulations to get more detailed information at lower size scales. We decided to test our algorithm using one of the most studied IL, the 1-butyl-3-methylimidazolium tetrafluoroborate ([BMIM][BF4 ]). 300 ionic pairs were simulated in NPT ensemble at a 1 atm and 298.15 K using the MARTINI force field. After 500 ns the system was mapped to OPLS-AA force field, 31 minimized and stabilized to finish with a simulation of 5 ns at AT resolution. In Fig. 14 the map of a single ionic pair and the simulation box at both resolution levels are shown.

Conclusions and perspectives A new method to switch between different representations consisting of equivalent discrete and flexible objects is described and validated using a heterogeneous set of relatively complex 20

ACS Paragon Plus Environment

Page 20 of 43

Page 21 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

molecular systems including proteins, membranes, cyclic and linear peptides, DNA, vitamin E, cyclodextrins and ionic liquids. The method is expected to be useful in the more general context of different representations of discrete systems at the same or different resolution levels, i.e. each object can be described by a different number of particles in each representation, including eventual restraints. For molecular systems it can be employed to exchange between different force fields regardless the number of atoms represented by each particle. The most useful and challenging cases of recovering the atomistic resolution representation from the output of relatively long MD trajectories at CG level were employed in the present paper to test the new algorithm. It was observed that, in all cases, the recovered AT resolution structures fairly reproduced the corresponding CG counterparts. Subsequent MD simulations at AT resolution were also stable in the timescale of several nanoseconds for all the backmapped structures. The algorithm presented here has the following advantages with respect to other alternative methods specifically developed for the recovery of AT resolution from CG representation of molecular systems. 13–16 ˆ It can be applied on the fly for any system without neither the implementation of new

code nor the addition of new elements to a specific library. ˆ It is highly efficient mainly for large systems since the computational effort is invested

in the alignment and atom-particle identification for each different molecule type of the system; then, all the equivalent molecules are mapped using the same atom-particle identification, always respecting the configuration of each individual molecule in the whole CG system, and the full mapping is almost instantaneous. For instance, mapping a system with 200 POPC molecules takes practically the same time as mapping a similar system with 2000 molecules. However, if the system has 100 POPC and another 100 DPPC molecules, the computational time is doubled. In other words, the method reported here is more efficient for large systems with many molecules than for smaller systems with many different types of molecules. 21

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ˆ It is general in the sense that can be applied to transform any molecular system between

any pair of force field representations, including high-to-low resolution, low-to-high resolution or equivalent resolution force fields. These features facilitate the automation of the whole process although some issues have to be optimized. For instance, other methods also consider the backmapping of the solvent molecules. We decided to alternatively resolvate the system with pre-equilibrated water molecules because this is more efficient and, to our view, there are no clear advantages in mapping the water molecules, but we do not discard to test different alternatives for the treatment of the solvent in the future. It is worth to mention that the CG representation of a molecular system could be interpreted as a compressed version of the original AT representation. Such transformation leads to a loss of information that introduces some ambiguity in the low resolution structure. To our knowledge, no algorithms or CG parameterization methods have been proposed to decipher such ambiguity. For this reason, the recovery of the high resolution structure requires additional information. We are currently working in a web-app to make our method freely available. The only inputs of this app, as indicated in the workflow presented in Fig. 1, will be the coordinates of the CG system, the coordinates of one molecule of each type in the whole system and the topologies (ITP files in the case of Gromacs) for each molecule type. The topologies could even be unnecessary by connecting our app to one of the currently available automated generators of topologies for different force fields. 75–77 We hope this algorithm and the associated tool will be useful for the MD community to do multiscale MD simulations by taking the best of each resolution level. At this moment we are working on a web application that would allow everyone to perform their own maps interactively. This platform will be released as soon as possible.

22

ACS Paragon Plus Environment

Page 22 of 43

Page 23 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Acknowledgements The authors wish to thank the financial support of the Spanish Ministry of Economy and Competitiveness (Projects MAT2015-71826-P and MAT2014-57943-C3-1-P). Moreover, this work was funded by the Xunta de Galicia (AGRUP2015/11 and GRC ED431C 2016/001). All these research projects were partially supported by FEDER. J. M. O.-M. and H. M.-C. thank the Spanish Ministry of Education for their FPU grants. R. G.-F. thanks to FCT for a Investigator Grant (Starting Grant, IF/01133/2015) and M. C. thanks to Xunta de Galicia for a predoctoral fellowship. Facilities provided by the Galician Supercomputing Centre (CESGA) are also acknowledged.

References (1) McCammon, J. A.; Gelin, B. R.; Karplus, M. Dynamics of folded proteins. Nature. 1977, 267, 585. (2) Adcock, S. A.; McCammon, J. A. Molecular dynamics: survey of methods for simulating the activity of proteins. Chem. Rev. 2006, 106, 1589. (3) Trick, J. L.; Aryal, P.; Tucker, S. J.; Sansom, M. S. P. Molecular simulation studies of hydrophobic gating in nanopores and ion channels. Biochem. Soc. T 2015, 43, 146–150. (4) Quezada, A. G.; D´ıaz-Salazar, A. J.; Cabrera, N.; P´erez-Montfort, R.; Pi˜ neiro, A.; Costas, M. Interplay between Protein Thermal Flexibility and Kinetic Stability. Structure. 2017, 25, 167 – 179. (5) Fersht, A. R.; Daggett, V. Protein folding and unfolding at atomic resolution. Cell. 2002, 108, 573–582. (6) Marrink, S. J.; De Vries, A. H.; Mark, A. E. Coarse grained model for semiquantitative lipid simulations. J. Phys. Chem. B 2004, 108, 750–760. 23

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(7) Marrink, S. J.; Risselada, H. J.; Yefimov, S.; Tieleman, D. P.; De Vries, A. H. The MARTINI force field: coarse grained model for biomolecular simulations. J. Phys. Chem. B 2007, 111, 7812–7824. (8) Kmiecik, S.; Gront, D.; Kolinski, M.; Wieteska, L.; Dawid, A. E.; Kolinski, A. CoarseGrained Protein Models and Their Applications. Chem. Rev. 2016, 116, 7898–7936, PMID: 27333362. (9) Wu, C.; Shea, J.-E. Coarse-grained models for protein aggregation. Curr. Opin. Struct. Biol. 2011, 21, 209 – 220. (10) Martini: Coarse Grain Forcefield for Biomolecules. http://www.cgmartini.nl. (11) Bereau, T.; Kremer, K. Automated parametrization of the coarse-grained Martini force field for small organic molecules. J. Chem. Theory. Comput. 2015, 11, 2783–2791. (12) Graham, J. A.; Essex, J. W.; Khalid, S. PyCGTOOL: Automated Generation of Coarsegrained Molecular Dynamics Models from Atomistic Trajectories. J. Chem. Inf. Model. 2017, (13) Rzepiela, A. J.; Sch¨afer, L. V.; Goga, N.; Risselada, H. J.; De Vries, A. H.; Marrink, S. J. Reconstruction of atomistic details from coarse-grained structures. J. Comput. Chem. 2010, 31, 1333–1343. (14) Stansfeld, P. J.; Sansom, M. S. From coarse grained to atomistic: a serial multiscale approach to membrane protein simulations. J. Chem. Theory. Comput. 2011, 7, 1157– 1166. ´ Multiscale (15) Brocos, P.; Mendoza-Espinosa, P.; Castillo, R.; Mas-Oliva, J.; Pineiro, A. molecular dynamics simulations of micelles: coarse-grain for self-assembly and atomic resolution for finer details. Soft. Matter. 2012, 8, 9005–9014.

24

ACS Paragon Plus Environment

Page 24 of 43

Page 25 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(16) Wassenaar, T. A.; Pluhackova, K.; Bckmann, R. A.; Marrink, S. J.; Tieleman, D. P. Going Backward: A Flexible Geometric Approach to Reverse Transformation from Coarse Grained to Atomistic Models. J. Chem. Theory. Comput. 2014, 10, 676–690, PMID: 26580045. (17) Periole, X.; Cavalli, M.; Marrink, S.-J.; Ceruso, M. A. Combining an elastic network with a coarse-grained molecular force field: structure, dynamics, and intermolecular recognition. J. Chem. Theory. Comput. 2009, 5, 2531–2543. (18) Hanwell, M. D.; Curtis, D. E.; Lonie, D. C.; Vandermeersch, T.; Zurek, E.; Hutchison, G. R. Avogadro: an advanced semantic chemical editor, visualization, and analysis platform. J. Cheminform. 2012, 4, 17. (19) Martini:

DPPC Bilayer. http://cgmartini.nl/images/applications/bilayer/

dppc_bilayer.gro. (20) Wassenaar, T. A.; Ingolfsson, H. I.; Bockmann, R. A.; Tieleman, D. P.; Marrink, S. J. Computational lipidomics with insane: a versatile tool for generating custom membranes for molecular simulations. J. Chem. Theory. Comput. 2015, 11, 2144–2155. (21) Brocos, P.; D´ıaz-Vergara, N.; Banquy, X.; P´erez-Casas, S.; Costas, M.; Pi˜ neiro, A. Similarities and Differences Between CyclodextrinSodium Dodecyl Sulfate HostGuest Complexes of Different Stoichiometries: Molecular Dynamics Simulations at Several Temperatures. J. Phys. Chem. B 2010, 114, 12455–12467, PMID: 20836518. (22) L´opez, C. A.; de Vries, A. H.; Marrink, S. J. Computational microscopy of cyclodextrin mediated cholesterol extraction from lipid model membranes. Sci. Rep. 2013, 3, 2071. (23) Mart´ınez, L.; Andrade, R.; Birgin, E. G.; Mart´ınez, J. M. PACKMOL: a package for building initial configurations for molecular dynamics simulations. J. Comput. Chem. 2009, 30, 2157–2164.

25

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(24) Monticelli, L.; Kandasamy, S. K.; Periole, X.; Larson, R. G.; Tieleman, D. P.; Marrink, S.-J. The MARTINI coarse-grained force field: extension to proteins. J. Chem. Theory. Comput. 2008, 4, 819–834. (25) de Jong, D. H.; Singh, G.; Bennett, W. D.; Arnarez, C.; Wassenaar, T. A.; Schafer, L. V.; Periole, X.; Tieleman, D. P.; Marrink, S. J. Improved parameters for the martini coarse-grained protein force field. J. Chem. Theory. Comput. 2012, 9, 687–697. (26) Martini: Coarse Grain Forcefield for Biomolecules. http://cgmartini.nl/index.php/ tutorials-general-introduction/parametrzining-new-molecule. (27) Oostenbrink, C.; Villa, A.; Mark, A. E.; Van Gunsteren, W. F. A biomolecular force field based on the free enthalpy of hydration and solvation: the GROMOS force-field parameter sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656–1676. (28) Malde, A. K.; Zuo, L.; Breeze, M.; Stroet, M.; Poger, D.; Nair, P. C.; Oostenbrink, C.; Mark, A. E. An automated force field topology builder (ATB) and repository: version 1.0. J. Chem. Theory. Comput. 2011, 7, 4026–4037. (29) Schmid, N.; Eichenberger, A. P.; Choutko, A.; Riniker, S.; Winger, M.; Mark, A. E.; van Gunsteren, W. F. Definition and testing of the GROMOS force-field versions 54A7 and 54B7. Eur. Biophys. J 2011, 40, 843. (30) Chen, R.; Poger, D.; Mark, A. E. Effect of high pressure on fully hydrated DPPC and POPC bilayers. J. Phys. Chem. B 2010, 115, 1038–1044. (31) Jorgensen, W. L.; Maxwell, D. S.; Tirado-Rives, J. Development and testing of the OPLS all-atom force field on conformational energetics and properties of organic liquids. J. Am. Chem. Soc. 1996, 118, 11225–11236. (32) McDonald, N. A.; Jorgensen, W. L. Development of an all-atom force field for hete-

26

ACS Paragon Plus Environment

Page 26 of 43

Page 27 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

rocycles. Properties of liquid pyrrole, furan, diazoles, and oxazoles. J. Phys. Chem. B 1998, 102, 8049–8059. (33) Rizzo, R. C.; Jorgensen, W. L. OPLS all-atom model for amines: resolution of the amine hydration problem. J. Am. Chem. Soc. 1999, 121, 4827–4836. (34) Price, M. L.; Ostrovsky, D.; Jorgensen, W. L. Gas-phase and liquid-state properties of esters, nitriles, and nitro compounds with the OPLS-AA force field. J. Comput. Chem. 2001, 22, 1340–1352. (35) Watkins, E. K.; Jorgensen, W. L. Perfluoroalkanes: Conformational analysis and liquidstate properties from ab initio and Monte Carlo calculations. J. Phys. Chem. A 2001, 105, 4118–4125. (36) Kaminski, G. A.; Friesner, R. A.; Tirado-Rives, J.; Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 2001, 105, 6474–6487. (37) Ulmschneider, J. P.; Ulmschneider, M. B. United atom lipid parameters for combination with the optimized potentials for liquid simulations all-atom force field. J. Chem. Theory. Comput. 2009, 5, 1803–1813. (38) Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 2004, 25, 1157–1174. (39) Lindorff-Larsen, K.; Piana, S.; Palmo, K.; Maragakis, P.; Klepeis, J. L.; Dror, R. O.; Shaw, D. E. Improved side-chain torsion potentials for the Amber ff99SB protein force field. Proteins: Struct., Funct., Bioinf. 2010, 78, 1950–1958. (40) Bussi, G.; Donadio, D.; Parrinello, M. Canonical sampling through velocity rescaling. J. Chem. Phys. 2007, 126, 014101.

27

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(41) Berendsen, H. J.; Postma, J. v.; van Gunsteren, W. F.; DiNola, A.; Haak, J. Molecular dynamics with coupling to an external bath. J. Chem. Phys. 1984, 81, 3684–3690. (42) Parrinello, M.; Rahman, A. Polymorphic transitions in single crystals: A new molecular dynamics method. J. Appl. Phys. 1981, 52, 7182–7190. (43) Abraham, M. J.; Murtola, T.; Schulz, R.; P´all, S.; Smith, J. C.; Hess, B.; Lindahl, E. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 2015, 1, 19–25. (44) Piggot, T. J.; Pineiro, A.; Khalid, S. Molecular dynamics simulations of phosphatidylcholine membranes: a comparative force field study. J. Chem. Theory Comput. 2012, 8, 4593–4609. (45) Doma´ nski, J.; Stansfeld, P. J.; Sansom, M. S.; Beckstein, O. Lipidbook: a public repository for force-field parameters used in membrane simulations. The Journal of membrane biology 2010, 236, 255–258. (46) Goldstein, G.; Scheid, M.; Hammerling, U.; Schlesinger, D.; Niall, H.; Boyse, E. Isolation of a polypeptide that has lymphocyte-differentiating properties and is probably represented universally in living cells. Proc. Natl. Acad. Sci. 1975, 72, 11–15. (47) Pickart, C. M.; Eddins, M. J. Ubiquitin: structures, functions, mechanisms. BBA-Mol. Cell. Res. 2004, 1695, 55–72. (48) Vijay-Kumar, S.; Bugg, C. E.; Cook, W. J. Structure of ubiquitin refined at 1.8 ˚ Aresolution. J. Mol. Biol. 1987, 194, 531–544. (49) Cornilescu, G.; Marquardt, J. L.; Ottiger, M.; Bax, A. Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J. Am. Chem. Soc. 1998, 120, 6836–6837.

28

ACS Paragon Plus Environment

Page 28 of 43

Page 29 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

(50) Kony, D. B.; H¨ unenberger, P. H.; van Gunsteren, W. F. Molecular dynamics simulations of the native and partially folded states of ubiquitin: Influence of methanol cosolvent, pH, and temperature on the protein structure and dynamics. Protein. Sci. 2007, 16, 1101–1118. (51) Koren, E.; Torchilin, V. P. Cell-penetrating peptides: breaking through to the other side. Trends. Mol. Med. 2012, 18, 385–393. (52) Louzao, I.; Garcia-Fandino, R.; Montenegro, J. Hydrazone-modulated peptides for efficient gene transfection. J. Mater. Chem. B 2017, 5, 4426–4434. (53) Stavrakoudis, A.; Tsoulos, I. G.; Shenkarev, Z. O.; Ovchinnikova, T. V. Molecular dynamics simulation of antimicrobial peptide arenicin-2: β-Hairpin stabilization by noncovalent interactions. Pept. Sci. 2009, 92, 143–155. (54) Ghadiri, M. R.; Granja, J. R.; Milligan, R. A.; McRee, D. E.; Khazanovich, N. Selfassembling organic nanotubes based on a cyclic peptide architecture. Nature. 1993, 366, 324. (55) Garcia-Fandi˜ no, R.; Pi˜ neiro, A.; Trick, J. L.; Sansom, M. S. P. Lipid Bilayer Membrane Perturbation by Embedded Nanopores: A Simulation Study. ACS Nano. 2016, 10, 3693–3701, PMID: 26943498. (56) Ghadiri, M. R.; Granja, J. R.; Buehler, L. K. Artificial transmembrane ion channels from self-assembling peptide nanotubes. Nature. 1994, 369, 301. (57) S´anchez-Quesada, J.; Isler, M. P.; Ghadiri, M. R. Modulating ion channel properties of transmembrane peptide nanotubes through heteromeric supramolecular assemblies. J. Am. Chem. Soc. 2002, 124, 10004–10005. (58) Montenegro, J.; Ghadiri, M. R.; Granja, J. R. Ion Channel Models Based on Self-

29

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 30 of 43

Assembling Cyclic Peptide Nanotubes. Accounts. Chem. Res. 2013, 46, 2955–2965, PMID: 23898935. (59) Fernandez-Lopez, S.; Kim, H.-S.; Choi, E. C.; Delgado, M.; Granja, J. R.; Khasanov, A.; Kraehenbuehl, K.; Long, G.; Weinberger, D. A.; Wilcoxen, K. M. et al. Antibacterial agents based on the cyclic D, L-α-peptide architecture. Nature. 2001, 412, 452–455. (60) Ain, S.; Kumar, B.; Pathak, K. Cyclodextrins: versatile carrier in drug formulations and delivery systems. Int. J. Pharm. Chem. Biol. Sci 2015, 5, 583–598. (61) Jiang, L.; Peng, Y.; Yan, Y.; Huang, J. Aqueous self-assembly of SDS@ 2 β-CD complexes: lamellae and vesicles. Soft. Matter. 2011, 7, 1726–1731. (62) Mathapa, B. G.; Paunov, V. N. Cyclodextrin stabilised emulsions and cyclodextrinosomes. Phys. Chem. Chem. Phys. 2013, 15, 17903–17914. (63) Ryzhakov, A.; Do Thi, T.; Stappaerts, J.; Bertoletti, L.; Kimpe, K.; Couto, A. R. S.; Saokham, P.; Van den Mooter, G.; Augustijns, P.; Somsen, G. W. et al. Self-assembly of cyclodextrins and their complexes in aqueous solutions. J. Pharm. Sci-Us. 2016, 105, 2556–2569. (64) Mixcoha, E.; Campos-Teran, J.; Pineiro, A. Surface adsorption and bulk aggregation of cyclodextrins by computational molecular dynamics simulations as a function of temperature: α-CD vs β-CD. J. Phys. Chem. B 2014, 118, 6999–7011. ´ Ruso, J. M.; Hassan, N.; Campbell, R. A.; Campos(65) Hernandez-Pascacio, J.; Pi˜ neiro, A.; Teran, J.; Costas, M. Complex Behavior of Aqueous α-Cyclodextrin Solutions. Interfacial Morphologies Resulting from Bulk Aggregation. Langmuir. 2016, 32, 6682–6690. (66) Henriksen, N. M.; Gilson, M. K. Evaluating force field performance in thermodynamic calculations of cyclodextrin host-guest binding:

30

ACS Paragon Plus Environment

water models, partial

Page 31 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

charges, and host force field parameters. J. Chem. Theory. Comput. 2017, DOI: 10.1021/acs.jctc.7b00359. (67) Sercombe, L.; Veerati, T.; Moheimani, F.; Wu, S. Y.; Sood, A. K.; Hua, S. Advances and challenges of liposome assisted drug delivery. Front. Pharmacol. 2015, 6, 286. (68) Neophytou, C. M.; Constantinou, A. I. Drug delivery innovations for enhancing the anticancer potential of Vitamin E isoforms and their derivatives. Biomed. Res. Int. 2015, 2015 . (69) Zhang, S.; Sun, N.; He, X.; Lu, X.; Zhang, X. Physical properties of ionic liquids: Database and evaluation. J. Phys. Chem. Ref. Data. 2006, 35, 1475–1517. (70) Freemantle, M. An Introduction to Ionic Liquids; RSC Publishing, 2009. (71) Wasserscheid, P.; Welton, T. Ionic liquids in synthesis; Wiley Online Library, 2003. (72) Zhang, S.; Lu, X.; Zhou, Q.; Li, X.; Zhang, X.; Li, S. Ionic Liquids: Physicochemical Properties; Elsevier, 2009. (73) Hayes, R.; Warr, G. G.; Atkin, R. Structure and nanostructure in ionic liquids. Chem. Rev. 2015, 115, 6357–6426. (74) Montes-Campos, H.; Otero-Mato, J. M.; M´endez-Morales, T.; L´opez-Lago, E.; Russina, O.; Cabeza, O.; Gallego, L. J.; Varela, L. M. Nanostructured solvation in mixtures of protic ionic liquids and long-chained alcohols. J. Chem. Phys. 2017, 146, 124503. (75) Malde, A. K.; Zuo, L.; Breeze, M.; Stroet, M.; Poger, D.; Nair, P. C.; Oostenbrink, C.; Mark, A. E. An Automated Force Field Topology Builder (ATB) and Repository: Version 1.0. J. Chem. Theory. Comput. 2011, 7, 4026–4037, PMID: 26598349. (76) Zoete, V.; Cuendet, M. A.; Grosdidier, A.; Michielin, O. SwissParam: A fast force field generation tool for small organic molecules. J. Comput. Chem. 2011, 32, 2359–2368. 31

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

(77) Dodda, L. S.; Cabeza de Vaca, I.; Tirado-Rives, J.; Jorgensen, W. L. LigParGen web server: an automatic OPLS-AA parameter generator for organic ligands. Nucleic. Acids. Res. 2017, 45, W331–W336.

32

ACS Paragon Plus Environment

Page 32 of 43

Page 33 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

CG SYSTEM

FOR EACH MOLECULE TYPE REQUIRED INPUT ONE MOL. CG COORDINATES*

REQUIRED INPUT

ONE MOL. CG BONDS

ONE MOL. AT COORDINATES

SINGLE MOLECULE CG CONFORMATION

SINGLE MOLECULE AT CONFORMATION

ONE MOL. AT BONDS

ALIGNMENT ENGINE

χ2

RIGID TRANSLATION

RIGID ROTATION

YE

S

ACCEPTANCE CRITERIA

SINGLE BEAD DISPLACEMENT

NO UPDATE CG CONFORMATION

NO

CONVERGENCE CRITERIA

YES

CG SYSTEM

EXTRAPOLATION TO ALL THE MOLECULES OF THE SAME TYPE

OPTIMAL EXCHANGE MAP

AT SYSTEM *The CG coordinates may be obtained from the CG system

Figure 1: Schematic workflow of the algorithm operation. First of all, the input files are used to load one molecule of each species in both AT and CG resolutions. Next, these molecules are used as inputs in the alignment engine to obtain the optimal exchange map between resolutions through a Monte Carlo simulation. Finally, these exchange maps are used to extrapolate the AT configuration to the rest of the molecules of the CG system.

33

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 34 of 43

Figure 2: Scheme with the possible single atom displacements for atoms with one, two or three bonds. Red beads represent the CG molecule atoms and black lines the bonds. Blue vectors indicate the possible directions to move the corresponding atom, perpendicular to the cyan lines and surface.

Final state 3rd step 2nd step 1st step Initial state

Figure 3: Sequence of displacements in a deformation of a CG structure. Blue spheres represent the displaced beads. Light colors represent the initial positions in the previous step.

34

ACS Paragon Plus Environment

Page 35 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 4: Average order parameters determined for the sn-1 (circles) and sn-2 (triangles) chains of 128 DPPC molecules corresponding to a lipid bilayer obtained from: the final structure of a 100-ns-long MD simulation at 323 K (left); the direct backmapping to AT level, with no equilibration, from the CG structure corresponding to the previous AT bilayer simulated for 100-ns (middle); and the equilibrated AT system with 5 ns of MD simulation after the backmapping. The bilayer structure was simulated for 15 ns more but the order parameters did not change after the first 5 ns of MD. The original bilayer was obtained from the lipidbook web site.

35

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5: Top: Representations of the initial and final structures of ubiquitin at CG level. Bottom: Initial and final structures of ubiquitin at AT level. The initial CG structure was obtained from the transformation of the AT structure with PDB-ID=1UBQ 48 to the MARTINI force field representation using the martinize.py tool. The final CG structure was obtained by 200 ns of MD simulation. The backmapping of the final CG structure to AT resolution using the GROMOS54a7 force field 28 corresponds to the initial AT structure. The final AT structure was obtained by 50 ns of MD simulation. In all snapshots water molecules were removed for clarity.

36

ACS Paragon Plus Environment

Page 36 of 43

Page 37 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 6: 2D structure of the peptide with sequence RRLKRLLRRLKRL.

Initial CG

Final CG

Final AT

Initial AT

Figure 7: Top: Overlapped CG and AT representations of DPPC and the CPP. Middle: Representations of the initial and final structures of the CPP in a DPPC bilayer at CG level. Bottom: Initial and final structures of the CPP in a DPPC bilayer at AT level. The initial CG structure was constructed by inserting the CPP in a preformed DPPC bilayer using the MARTINI force field. The final CG structure was obtained by 1.2 s of MD simulation. The backmapping of the final CG structure to AT resolution using the GROMOS53a6 force field 27 corresponds to the initial AT structure. The final AT structure was obtained by 50 ns of MD simulation. In all snapshots water molecules were removed for clarity.

37

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 8: Left: 2D structure of the CP with sequence RRKWLWLW (underlining represents D-amino-acid residues). Right: schematic representation of a SCPN.

38

ACS Paragon Plus Environment

Page 38 of 43

Page 39 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Initial CG

Final CG

Final AT

Initial AT

Figure 9: Top: Overlapped CG and AT representations of DPPC and the CP. Middle: Representations of the initial and final structures of the CP in a POPC bilayer at CG level. Bottom: Initial and final structures of the CP in a POPC bilayer at AT level. The initial CG structure was constructed by inserting the CP in a preformed POPC bilayer using the MARTINI force field. The final CG structure was obtained by 150 ns of MD simulation. The backmapping of the final CG structure to AT resolution using the OPLS-AA force field 31 corresponds to the initial AT structure. The final AT structure was obtained by 20 ns of MD simulation. In all snapshots water molecules were removed for clarity.

39

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Initial CG

Final CG

Final AT

Initial AT

Figure 10: Top: Overlapped CG and AT representations of SDS and αCD. Middle: Representations of the initial and final structures of the αCD2SDS1 complexes at CG level. Bottom: Initial and final structures of the αCD2SDS1 complexes at AT level. The initial CG structure was constructed by inserting 100 MARTINI αCD2SDS1 complexes at random locations. The final CG structure was obtained by 6 µs of MD simulation. The backmapping of the final CG structure to AT resolution using the GROMOS54a7 28 force field corresponds to the initial AT structure. The final AT structure was obtained by 10 ns of MD simulation. In all snapshots water molecules were removed for clarity.

Figure 11: 2D structure of vitamin E.

40

ACS Paragon Plus Environment

Page 40 of 43

Page 41 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Chemical Theory and Computation

Figure 12: Top: Overlapped CG and AT representations of vitamin E and the DNA chain with sequence 5’-AGGTAGTGTAATCGCCTTG-3’. Middle: Representations of the initial and final structures of the simulated system at CG level. Bottom: Initial and final structures of the vitamin E aggregate with the DNA chain partially embedded at AT level. The initial CG structure was constructed by inserting 500 MARTINI vitamin E molecules and a DNA chain at random locations. The final CG structure was obtained by 250 ns of MD simulation. The backmapping of the final CG structure to AT resolution using the AMBER/GAFF force field corresponds to the initial AT structure. The final AT structure was obtained by 10 ns of MD simulation. In all snapshots water molecules were removed for clarity. 41

ACS Paragon Plus Environment

Journal of Chemical Theory and Computation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 13: Vitamin E representation before (transparent) and after (opaque) the mapping. The chiral center is enlarged to highlight chirality conservation.

Figure 14: Top: Overlapped CG and AT representations of the simulated ion pair ([BMIM][BF4 ]). Bottom: Final structures at CG and AT resolution for the 500 molecules of the IL simulated. The CG structure was obtained after 500 ns of MD simulation using the MARTINI force field, starting from a (5.4 nm)3 cubic box built using Packmol. 23 The final AT structure was obtained after 5 ns of MD simulation using the OPLS-AA force field, 31 starting from the backmapped structure at the end of the CG-MD simulation.

42

ACS Paragon Plus Environment

Page 42 of 43

Page 43 of 43 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59

Journal of Chemical Theory and Computation

ACS Paragon Plus Environment