Configurational-Bias Monte Carlo Back-Mapping Algorithm for Efficient

Jun 20, 2018 - Troy D. Loeffler*† , Henry Chan† , Badri Narayanan‡ , Mathew J. Cherukara§ , Stephen Gray† , and Subramanian K. R. S. Sankaran...
0 downloads 0 Views 4MB Size
Article Cite This: J. Phys. Chem. B XXXX, XXX, XXX−XXX

pubs.acs.org/JPCB

Configurational-Bias Monte Carlo Back-Mapping Algorithm for Efficient and Rapid Conversion of Coarse-Grained Water Structures into Atomistic Models Troy D. Loeffler,*,†,∥ Henry Chan,†,∥ Badri Narayanan,‡ Mathew J. Cherukara,§ Stephen Gray,† and Subramanian K. R. S. Sankaranarayanan*,† Center for Nanoscale Materials, ‡Material Science Division, and §X-ray Science Division, Argonne National Lab, Lemont, Illinois 60439, United States

Downloaded via KAOHSIUNG MEDICAL UNIV on July 6, 2018 at 22:32:10 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.



S Supporting Information *

ABSTRACT: Coarse-grained molecular dynamics (MD) simulations represent a powerful approach to simulate longer time scale and larger length scale phenomena than those accessible to all-atom models. The gain in efficiency, however, comes at the cost of atomistic details. The reverse transformation, also known as back mapping, of coarse-grained beads into their atomistic constituents represents a major challenge. Most existing approaches are limited to specific molecules or specific force fields and often rely on running a long-time atomistic MD of the back-mapped configuration to arrive at an optimal solution. Such approaches are problematic when dealing with systems with high diffusion barriers. Here, we introduce a new extension of the configurational-bias Monte Carlo (CBMC) algorithm, which we term the crystallineconfigurational-bias Monte Carlo (C-CBMC) algorithm, which allows rapid and efficient conversion of a coarse-grained model back into its atomistic representation. Although the method is generic, we use a coarse-grained water model as a representative example and demonstrate the back mapping or reverse transformation for model systems ranging from the ice−liquid water interface to amorphous and crystalline ice configurations. A series of simulations using the TIP4P/Ice model are performed to compare the new CBMC method to several other standard Monte Carlo and molecular dynamics-based back-mapping techniques. In all of the cases, the C-CBMC algorithm is able to find optimal hydrogen-bonded configuration many thousand evaluations/steps sooner than the other methods compared within this paper. For crystalline ice structures, such as a hexagonal, cubic, and cubic-hexagonal stacking disorder structures, the C-CBMC was able to find structures that were between 0.05 and 0.1 eV/water molecule lower in energy than the ground-state energies predicted by the other methods. Detailed analysis of the atomistic structures shows a significantly better global hydrogen positioning when contrasted with the existing simpler backmapping methods. The errors in the radial distribution functions (RDFs) of back-mapped configuration relative to reference configuration for the C-CBMC, MD, and MC were found to be 6.9, 8.7, and 12.9, respectively, for the hexagonal system. For the cubic system, the relative errors of the RDFs for the C-CBMC, MD, and MC were found to be 18.2, 34.6, and 39.0, respectively. Our results demonstrate the efficiency and efficacy of our new back-mapping approach, especially for crystalline systems where simple force-field-based relaxations have a tendency to get trapped in local minima.



INTRODUCTION

typical simulations over nanometer length scales and nanosecond time scales. This, of course, can reach the microsecond range through the use of specialized equipment, such as GPUs.3,4 Coarse-grained approaches with reduced degrees of freedom enable us to reach scales longer than the micron/ microsecond range. In many soft-matter applications, such as self/directed assemblies of colloidal nanoparticles, phase transitions in aqueous polymers/proteins subjected to external stimuli to condensed phase dynamics in supercooled water, both

Molecular dynamics (MD) is a powerful technique for materials modeling. There are several examples in the literature where different flavors of MD have led to breakthroughs in materials design and discovery for applications ranging from catalysis, tribology,1 and corrosion to energy storage.2 MD is also emerging to be a valuable tool that can be integrated with microscopy and X-ray imaging to potentially enable real-time three-dimensional atomistic characterization of materials. There are various flavors of MD ranging from ab initio MD to coarse-grained MD. The more accurate and flexible ab initio MD is computationally demanding and hence limited to smaller time (picoseconds) and length scales (nanometers). Classical MD with semiempirical potential models allows © XXXX American Chemical Society

Received: February 21, 2018 Revised: May 17, 2018 Published: June 20, 2018 A

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 1. Schematic overview of the C-CBMC algorithm.

atomistic and coarse-grained molecular dynamics simulations, are routinely employed to understand the structure and dynamical evolution of system. As an example, there are several atomistic models of water, which have been used to study homogeneous and heterogeneous nucleation of ice in supercooled systems. These atomistic models, such as SPC/E,5 TIP3P,6 and TIP4P/2005,7 to name a few, describe the dynamics of nucleation and growth phenomena over nanometer length scales and nanosecond time scales.8 In recent years, coarse-grained water models, such as the mW9 model, have sought to solve many of the mesoscale problems associated with studying events, such as the crystallization of water over hundreds of nanometer and nanosecond scales under atmospherically relevant conditions. Such coarse-grained models gain in efficiency by abstracting individual water models into single monoatomic beads. In these models, the explicit hydrogen-bonding interactions between water molecules are replaced by a set of first-neighbor angular potentials, which ensure that a system will adopt the correct tetrahedral configuration. Reducing water molecules down to single monoatomic units results in several advantages. For example, traditional water models, such as the SPC and TIP models, typically have three or more atomic sites, which contain a partial charge.6,10 This means that the pairwise interaction between any two water molecules requires the evaluation of a minimum of nine pairs distance to accurately capture the interaction energy. This, of course, does not include any long-range electrostatic corrections that may be required as well. In contrast to a

monoatomic water model, 9−12 distance pairs may be sufficient to completely calculate the interaction of a particle and it’s first-neighbor shell. In addition, the abstraction into a single atom removes higher degrees of freedom, such as bond vibration and rotation. This allows one to potentially integrate the system using larger time steps in the case of molecular dynamics or removes the need for intramolecular moves in the case of Monte Carlo. The problem of reverse transformation or back-mapping atomistic coordinates is common to all coarse-grained models, and several attempts have been made in the past to find a way to switch from a coarse-grained structure to a valid atomistic structure. A few examples of methods that have been used include using a fragment or database protocol,11,12 local energy minimization,13 hybrid insertion,14,15 layered insertion methods,16 matrix transform,17 and other geometric reconstruction approaches.18 Since coarse graining is commonly applied to polymer systems, a fair fraction of these approaches have been optimized for those particular applications. Methods such as the one proposed by Ghanbari et al. in ref 16 work by taking the initial center of mass information given by the coarsegrained model, sequentially inserting individual atoms based on the known geometry of the atomistic structure, and then relaxing the system using an MD simulation. These approaches provide excellent results for polymer systems since the primary challenge in these systems is ensuring that the atomistic structure retains the same backbone and side group locations. Many of the popular back-mapping schemes follow a similar logic where one uses the knowledge of the underlying B

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

components of a polymer system segment by segment. Each segment can be regrown by randomly selecting a bond length and angle according to the exact probability distribution of both values. This ensures that the resulting configuration’s intramolecular parameters are sufficiently close to their equilibrium values. In addition, since large conformational changes with a high rate of acceptance are possible with CBMC, the CBMC approach can rapidly sample through a large number of different chain conformations in a much shorter time period. When one examines the problem of converting a coarsegrained water structure, which contains no information of the hydrogen positions to an atomistic model and requires valid hydrogen positions, one can begin to draw common characteristics between polymer systems and the back-mapping process. First, due to the strong hydrogen-bonding interactions, to get an MC move accepted, one must ensure that a given move maintains a proper hydrogen-bonding structure. Second, even if the hydrogen-bonding behavior is maintained, significantly changing the rotational orientation of a given water molecule may require several thousand moves not only of the water molecule itself, but also of the neighboring water molecules. Given the similarities between a network of hydrogenbonding and long-chain molecules, one can effectively view the back-mapping procedure as the cutting and regrowing of hydrogen-bonding chains within the water system. Thus, it is hypothesized that extending the CBMC formalism to the backmapping problem can provide an efficient algorithm that can accurately predict the hydrogen positions of a system, given that the oxygen positions are known. The crystallineconfigurational-bias Monte Carlo (C-CBMC) that we have developed uses the following approach. 1. Randomly choose a number N between 1 and Nmax. This number will act as the number of water molecules to be cut and regrown in the system. Henceforth, this will be defined as a “chain” of water molecules. 2. Begin by selecting a water molecule in the simulation box to act as a starting point. 3. Perform a random walk from neighbor to neighbor through the system until N water molecules have been selected. 4. Remove all members of the chain from the system. 5. Starting with the first molecule in the chain, regrow the water molecule by selecting a new rotational orientation via Rosenbluth sampling.24 6. Once the position of the first molecule has been fixed, regrow the next water molecule in the chain. 7. Repeat the regrowth process until all members of the chain have been regrown. Most CBMC algorithms take advantage of Rosenbluth sampling,24 which enhances sampling efficiency by proposing multiple trial positions for each step in the regrowth process. If one were to take the simplistic approach of trying to regrow multiple atoms in a single MC move without some kind of advanced sampling, the likelihood of accepting the move would be very low, given that if even a single atom is positioned poorly the entire move will be doomed. For the purposes of this paper, the trial orientations are selected according to a uniform distribution of all possible rotational orientations that keep the oxygen location fixed. This gives a rotational generation probability of 1 2 . Since the selection

geometric structure to generate a reasonable initial position and then relaxation/dynamics are performed using MD or MC.18 For system types where the nonbonded interactions are fairly weak, where the structure is expected to be disordered, where it is reasonably easy to predict corresponding atomic positions from the coarse-grain data, or where there are only a few valid back-mapped configurations, most of these methods will give satisfactory results. However, extending some of these formalisms into well-ordered or partially ordered aqueous systems has proven to be challenging. In the case of water, even if the oxygen positions are known from the coarse-grained simulations, there are still a large number of unique configurations for the hydrogen positions that correspond to the same coarse-grained configuration. A single water molecule can typically have three to five neighboring water molecules of which it may be able to hydrogen bond with, and with each water molecule, several different viable hydrogen bonds may be made by rotating both water molecules in appropriate directions. This creates potentially thousands of valid hydrogen positions that correspond to a single set of oxygen position. Because of this, there exist an incredibly large volume of local minima in a water system that may not be present in other systems that common back-mapping techniques are applied to. This is especially troublesome for crystalline structures since failing to get the correct hydrogen-bonding structure can potentially give rise to unphysical configurations when a dynamics simulation is performed. Since the study of longtime-scale events, such as ice nucleation,19 is a desired application of the mW9 and any other coarse-grained water models, having a reliable way to convert structures generated by coarse-grained simulations back to atomistic models can be an invaluable tool. In this work, we outline a new back-mapping method, which we term the crystalline-configurational-bias Monte Carlo or CCBMC method, which is designed to search and locate optimal hydrogen orientations while resolving the atomistic coordinates in a coarse-grained model. We demonstrate the computational efficiency of our method by comparing it to other local optimization methods as well as standard forcefield-based relaxation strategies. We use several different test systems ranging from crystalline ice phase with long-range order, amorphous ice phases with short-range order, and ice− water heterointerfaces with partial order. Our C-CBMC algorithm requires far fewer evaluations in reaching the optimal hydrogen-bonded configuration than the other standard back-mapping procedures used in this work (Figure 1).



BACK MAPPING BASED ON CONFIGURATIONAL-BIAS MONTE CARLO The configurational-bias Monte Carlo (CBMC) algorithm20−23 is a method that has been used to efficiently sample long-chain molecules and polymers. Traditionally, these molecules have been difficult to sample using simple MC algorithms due to the fact that internal degrees of freedom (bond stretching, bending, etc.) have very large energetic penalties if an MC algorithm selects a bond configuration that is far away from the equilibrium position. In addition, it may take upward of millions of simple MC moves to significantly change the conformation of a long-chain molecule given that it often requires multiple successful moves of multiple atoms in the correct sequence. The CBMC family of algorithms circumvents this problem by cutting and regrowing the



C

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

the diffusion barriers are high and are used to demonstrate the efficiency of our C-CBMC algorithm over other back-mapping algorithms. It should be noted that the exact locations of the hydrogen atoms on each oxygen and the nature of hydrogenbonding network are precisely known for these target systems.27 Thus, for both the cubic and hexagonal ice systems, the reference structures were obtained from ref 27, relaxed for the TIP4P/Ice model, and used as the target. The stacking disordered ice represents an entropically stabilized defective structure that is a mix of both cubic and hexagonal ice and represents additional levels of complexity over the pure crystalline phases. The amorphous ice phase represents an example where short-range order is present and the hydrogenbonding network is highly distorted. Here, the barriers for hydrogen rotation are expected to be lower than the crystalline phase. The heterointerfaces of the ice−water system present an example where the nature of the hydrogen bonding varies greatly near the solid−liquid interface. Moreover, the barriers for hydrogen rotation are lower in the liquid phase, whereas they are much higher in the solid region. Taken together, these different ice/water configurations represent a rigorous test of the CBMC back-mapping algorithm and are used to highlight the efficiency and efficacy of our method over other existing back-mapping strategies. The initial structures were either generated by taking a known crystal structure and randomizing the hydrogen positions (hexagonal, cubic, and stacking) or by running a simulation using the mW9 water model (interface and amorphous). For each system, a sequence of three different simulation approaches were examined: (i) a simple MC simulation, which contained standard rotational displacement moves, (ii) an MD simulation, which begins at a high temperature and slowly quenches the system, and finally (iii) a simulation containing both standard rotational moves and the C-CBMC move. For the MC simulations, the maximum rotation angle was scaled automatically to ensure a 50% acceptance rate. In all simulations, the oxygen position is fixed and only the hydrogen orientation is allowed to move. For the C-CBMC method, eight Rosenbluth trials per water molecule were used. For this study, the TIP4P/Ice model was used with a molecular-based cutoff.10 If two water molecules have a distance of 6.5 Å between their corresponding oxygen atoms, then the interactions of all atoms in both water molecules are evaluated. This is done to ensure that effects due to unbalanced charges (i.e., including a hydrogen’s interaction without the corresponding oxygen charge) are avoided. For the Rosenbluth weight, if two oxygen atoms are within 3.5 Å, the two molecules are considered to be neighbors and are included in the weight. These cutoff values correspond to the first and second neighbor shells. All simulations are performed at a temperature of 10 K with exception to the amorphous system, where the final temperature is 110 K. Since the oxygen positions do not change during the course of the simulation, the neighbor list must only be calculated once at the beginning of the simulation.

process is uniform, the generation probabilities will cancel out in the Rosenbluth weight and acceptance algorithm and so these probabilities do not need to be considered. The trial positions are weighted by their Boltzmann weight and then selected according to the probability given by Pi , m =

e−βEi ,m wi

(1)

M

wi =

∑ e − βE

i ,m

(2)

m=1

where Em is the energy of the mth proposed trial position for the current step in the chain regrowth, β is equal to 1 , and wi kBT

is the normalization factor for the ith step in the chain regrowth. The design of this weight is such that when one inserts it into the Metropolis25,26 accept/reject equation, one has ij W yz P Acc = minjjj1, new zzz j Wold z k {

(3)

where Wnew and W old are given as products of the normalization constants for each member of the chain that was regrown Nchain

W=

∏ wi i=1

(4)

It should be noted for Rosenbluth sampling to satisfy detailed balance, if one proposes M number of trial moves in the forward direction, then one must also provide M − 1 trial moves in addition to the weight of the old state in the reverse direction. Equation 3 assumes that all interactions in the system were included in the Rosenbluth weights. In practice, however, it is usually computationally inefficient to include the entire interaction energy in the Rosenbluth weight since this is effectively calculating the entire system energy several times over. Instead, the nearest neighbors of the molecule being regrown are used since those interactions are the most likely to cause a move to be rejected due to overlap or poor hydrogenbonding orientation. If only a subset of all of the energetic interactions is used, then eq 3 is modified to become ji W zy P Acc = minjjj1, new ·e−β ΔEex zzz j Wold z k {

(5)

where ΔEex is any energetic contribution that was not included in the Rosenbluth weight. In this situation, it is the pairwise interaction between the chain’s members and all molecules beyond each molecule’s respective first-neighbor cutoff.



MODEL SYSTEMS AND SIMULATION DETAILS A set of five different crystalline and amorphous ice structures were used to test and compare different algorithmic approaches. The following different test sets were employed: (a) crystalline ice structure (hexagonal and cubic ice), (b) a stacking disordered structure (mix of cubic/hexagonal), (c) an amorphous ice structure, and (d) a heterointerfacial system, i.e., a liquid water−ice interface system. In the case of crystalline ice structures, there are both short- and long-range order. Moreover, these two systems represent examples where



RESULTS AND DISCUSSION The results for the amorphous ice system are shown in Figure 2. From these results, it can be seen that the C-CBMC algorithm is able to find a low-energy configuration within 500 total Monte Carlo cycles. In contrast, after nearly 7500 steps for both the MD and simple MC simulations, there still remain D

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 2. Energy as a function of the number of evaluations/steps for an amorphous ice configuration for the C-CBMC simulation (dashed blue curve), MD simulation (dashed red curve), and simple MC simulation (dashed green curve). For the MC simulation, the x-axis is given in terms of the number of MC cycles of 100 moves each, and for the MD simulation, it is given in terms of picoseconds. Each simulation is averaged over 16 different trajectories. In addition, snapshots of the final configuration for each of the three simulation methods and the reference structure are given.

Figure 3. Energy as a function of the number of evaluations/steps (top) for a liquid−hex interface for the C-CBMC simulation (dashed blue curve), MD simulation (dashed red curve), and simple MC simulation (dashed green curve). Each simulation is averaged over 16 different trajectories. For the MC simulation, the x-axis is given in terms of the number of MC cycles of 100 moves each, and for the MD simulation, it is given in terms of picoseconds. In addition, snapshots of the final configuration for each of the three simulation methods and the reference structure are given.

small energy differences of approximately 0.04 and 0.07 eV/ water, respectively. While the C-CBMC approach is able to find a low-energy structure faster, it appears that given enough time, both of the simpler methods can also achieve a sufficiently optimal hydrogen-bonding structure. For disordered structures, such as a liquid or in this case an amorphous ice structure, the energetic barriers associated with hydrogen rearrangement are rather low. Therefore, in such cases, even simplistic methods seem more than capable of giving good structures so long as the simulation is run for several thousand steps. In a similar fashion, for the liquid−ice interface system shown in Figure 3, the C-CBMC algorithm is able to find a low-energy structure within about 1000 simulation steps, whereas the MD and simple MC simulations find a similar energy configuration after many more evaluations (approximately 7500 steps). For the sake of statistics, repeated runs using different water−ice surfaces were performed in addition to the simulation presented in Figure 3. These interfaces were generated by cutting the ice crystal along different geometric planes. In general, while the trends showed a quantitative difference, they generally showed the same qualitative trends. The results of those can be found in the Supporting Information. In both the ice−water interface and amorphous ice systems, a large fraction of the system consists of a disordered phase without any long-range order. Since these intrinsically disordered systems have low diffusion barriers, it is

understandable that the lower-energy configurations can reasonably be accessed without a specialized algorithm (moderate sampling) given that the simulation is ran for a long enough time. Back mapping of atoms in crystalline systems, however, is expected to be difficult given that the barriers for molecular diffusion and rotation are much higher. Unlike the disordered systems shown in Figures 2 and 3, the hexagonal ice system does not show a similar trend. As shown in Figure 4, the CCBMC algorithm requires several thousand steps to settle down into a stable minima. However, it is able to find a structure that is almost a full 0.07 eV/water lower than the MC simulation and a 0.15 eV/water difference from the MD simulation. Even after running the MC and MD simulations for an additional 5000 simulation steps, the total system energy did not change sufficiently, which may be an indication that the system is trapped in a local minima. Likewise, for the cubic ice and the hexagonal−cubic stacking disordered systems (shown in Figures 5 and 6, respectively), the C-CBMC is typically able to find a lower structure than either the MC or the MD simulation can achieve. The C-CBMC algorithm is able to do so within 2000−3500 simulation steps, which is far fewer than that required for simple MC or MD simulation. The final energy differences for both systems are found to be between 0.05 and 0.1 eV/water. In the case of the disordered systems, we note a continually converging trend by the MC and MD E

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 4. Energy as a function of the number of evaluations/steps for a hexagonal ice configuration is given for the C-CBMC simulation (dashed blue curve), MD simulation (dashed red curve), and simple MC simulation (dashed green curve). For the MD simulation, the xaxis is given in terms of the number of MC cycles of 100 moves each, and for the MD simulation, it is given in terms of picoseconds. Each simulation is averaged over 16 different trajectories. The solid black line represents the target energies, i.e., energies of density functional theory (DFT)-minimized lowest-energy configuration of cubic ice obtained from ref 27. In addition, snapshots of the final configuration for each of the three simulation methods and the reference structure are given. Note that the conjugate gradient minimization of the final configurations (i.e., oxygen positions) obtained from C-CCBMC allows convergence to the target energies shown by the dotted lines.

Figure 5. Energy as a function of the number of evaluations/steps for a cubic ice configuration for the C-CBMC simulation (dashed blue curve), MD simulation (dashed red curve), and simple MC simulation (dashed green curve). For the MD simulation, the x-axis is given in terms of the number of MC cycles of 100 moves each, and for the MD simulation, it is given in terms of picoseconds. Each simulation is averaged over 16 different trajectories. The solid black line represents the target energies, i.e., energies of DFT-minimized lowest-energy configuration of cubic ice obtained from ref 27. In addition, snapshots of the final configuration for each of the three simulation methods and the reference structure are given. Note that the conjugate gradient minimization of the final configurations (i.e., oxygen positions) obtained from C-CCBMC allows convergence to the target energies shown by the dotted line.

simulations toward the C-CBMC structure, even though the number of evaluations required was much larger. On the other hand, in all of the crystalline test systems, we observe that the MC and MD simulations typically get stuck in a higher-energy configuration. Even after running for a significantly longer time (MD) or a number of evaluations (simple MC), no notable change in their energy is observed. The final radial distribution functions (RDFs) for the hexagonal and cubic systems are shown in Figures 7 and 8, respectively. In general, all of the methods were able to get reasonable estimates of the first peak for the hexagonal system. However, the higher-order peaks for the MD and MC simulations showed smaller intensities relative to the reference structure. In particular, the peaks located above 2.5 Å were notably smaller. For the cubic structure, several of the peaks, including the first peak, showed significant splitting for the MC and MD simulations. The CCBMC showed a single peak for the first peak, although it was slightly broader than the reference structure’s first peak. The

two-peaked nature of the RDFs from the cubic system is indicative of an alternative local minima that exists in the system. In both the MC and MD simulations, it is evident that half the system is trapped in this local minimum or alternative hydrogen positioning. In contrast, the C-CBMC simulation is able to escape and approach the desired hydrogen position barring a few alternative hydrogen orientations. The relative errors of the simulation RDFs and the reference RDF were computed to quantify the differences between the three methods. The C-CBMC, MD, and MC RDFs were found to have relative errors of 6.9, 8.7, and 12.9, respectively, for the hexagonal system. Similarly, we find relative errors of 18.2, 34.6, and 39.0, respectively, for C-CBMC, MD, and MC RDFs in the case of the cubic system. From a molecular statics perspective, the origin of this energetic difference between the disordered and ordered systems obtained via C-CBMC, simple MC, and MD can F

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 8. Final oxygen−hydrogen radial distribution function of cubic system from each simulation method. The RDFs of the C-CBMC (orange line), MD (red line), and MC (green line) are plotted on top of the RDF of the reference structure (dashed black line). The RDFs are offset by a constant factor to plot all of the RDFs in a single figure. The relative errors of the RDFs for the C-CBMC, MD, and MC were found to be 18.2, 34.6, and 39.0, respectively, compared to the reference structure. Figure 6. Energy as a function of the number of evaluations/steps for a hexagonal/cubic stacking disorder ice configuration for the CCBMC simulation (dashed blue line), MD simulation (dashed red curve), and simple MC simulation (dashed green curve). For the MD simulation, the x-axis is given in terms of the number of MC cycles of 100 moves each, and for the MD simulation, it is given in terms of picoseconds. Each simulation is averaged over 16 different trajectories. In addition, snapshots of the final configuration for each simulation are given.

positions not only need to be locally ordered, but instead must satisfy a broader global ordering. Because of this, collective motion becomes important since for even one water molecule to escape a local minimum, multiple water molecules may have to rotate in sequence to maintain the proper local hydrogen-bonding structure. This is where simplistic MC simulations run into trouble since traditionally only a single water molecule can be altered in one MC move. In other words, the simple MC simulations are unable to adequately sample the available configurational space compared to our CCBMC algorithm. It thus requires many successful attempts in a series to finally get out of a given minimum, which explains the significantly larger number of MC moves. In contrast, the MD approach can simulate collective motion, but it must still go through the free-energy barriers found in the system. Since these barriers can be very high, it can require that a large number of simulation steps be carried out to finally overcome these barriers. The advantage of the CCBMC approach is that multiple water molecules can be rotated in sequence, which greatly improves the rate of convergence, and unlike the MD approach, the unphysical nature of the moves allows the system to hop from one minimum to another without having to go through some of the barriers associated with flipping multiple water molecules. After the hydrogen positions have been determined, the oxygen positions can then be relaxed using any desired local minimization scheme. The results of these can be found in Table 1. This can often account for the remaining energy difference between the target structures and the resulting backmapped structure. It can also be the case as seen in Figure 5 that if the chosen back-mapping algorithm does not produce a set of stable hydrogen positions after a minimization and box relaxation, several water molecules can move from their lattice positions and create an unphysical crystal structure. This method can naturally be extended to other small molecules without any modifications. When applying it to larger systems, such as polymers, for example, one must adjust the rotation operations to match the geometric constraints of

Figure 7. Final oxygen−hydrogen radial distribution function of hexagonal system from each simulation method. The RDFs of the CCBMC (orange line), MD (red line), and MC (green line) are plotted on top of the RDF of the reference structure (dashed black line). The relative errors of the RDFs for the C-CBMC, MD, and MC were found to be 6.9, 8.7, and 12.9, respectively, compared to the reference structure. The RDFs are offset by a constant factor to plot all of the RDFs in a single figure.

largely be attributed to the fact that to create a valid liquid or amorphous ice structure one only needs to ensure that the hydrogen positions are ordered enough to satisfy a local hydrogen bonding with the surrounding water molecules. However, for a crystalline configuration, the hydrogen G

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

performance of simple MC and MD is better, although the CCBMC still achieves optimal configurations (10−70 meV/ molecule) with fewer evaluations. These improvements arise from the ability of C-CBMC to efficiently and adequately sample the configurational space compared to the simpler MC and MD approaches. Although we have demonstrated the back-mapping strategy for water, the C-CBMC method is quite generic and can be used to map back and forth between coarsegrain and atomistic configurations for a broad range of softmatter systems, allowing researchers to mix and match molecular models to harness each model’s individual strengths.

Table 1. Final Energies of Each of the Systems after an AllAtom Relaxationa system

MD relaxed

MC relaxed

C-CBMC relaxed

hexagonal cubic

−0.753863 −0.99132

−0.723344 −0.94624

−0.784446 −1.01626

a

Box relaxation was performed in addition to relaxation of the atomic positions. The values here are averaged over multiple trajectories for each simulation method. All values are given in units of eV/water.

that system. For example, in a polymer, one may wish to restore atomistic detail to a side group attached to a polymer backbone. In this situation, rotation of the entire group will be restricted due to the bonding requirement. Thus, rotations will likely consist of rotating about the bond that connects the side chain to the polymer backbone. Polymer side chains can then be rotated in sequence using the same selection scheme (i.e., creating a C-CBMC chain out of neighboring polymer side chains). Another advantage of the C-CBMC algorithm is that since it is Monte Carlo based, samplings are not restricted to simple energy minimization and can be done at any given finite temperature. Naturally, the higher the temperature, the easier it is to search for optimal configurations. It can also be readily coupled with other advanced sampling methods that are available to MC simulations, such as umbrella sampling28 or metadynamics.29 These methods can actively flatten barriers that are naturally encountered in the system, further improving the efficiency of MC-based algorithms. In addition, the formulation of the C-CBMC move presented here is far from the most optimal selection method. The efficiency of the move can potentially be improved by selectively targeting algorithms similar to the energy-bias aggregation-volume-bias Monte Carlo method.30 In these algorithms, the water molecules which have the worst orientation, have a high probability of being targeted compared to molecules which are fairly well aligned. Another potential optimization that could be performed is if one could rapidly generate trial positions with a bias toward viable hydrogen bonding similar to the Jacobian−Gaussian CBMC scheme.23 Although harder to implement for intermolecular interactions, these schemes have been shown to massively improve the acceptance rate since they have a higher chance of generating a low-energy configuration.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jpcb.8b01791.



Oxygen relaxation, energy vs simulation time for interface system (PDF)

AUTHOR INFORMATION

Corresponding Authors

*E-mail: tloeff[email protected] (T.D.L.). *E-mail: [email protected] (S.K.R.S.S.). ORCID

Troy D. Loeffler: 0000-0003-0244-1793 Mathew J. Cherukara: 0000-0002-1475-6998 Subramanian K. R. S. Sankaranarayanan: 0000-0002-9708396X

Author Contributions ∥

T.D.L. and H.C. contributed equally to this work.

Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS Use of the Center for Nanoscale Materials was supported by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, under Contract No. DE-AC0206CH11357. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC0205CH11231.





CONCLUSIONS In summary, we have developed a C-CBMC algorithm that allows back mapping of atoms into a coarse-grained model resulting in a fully atomistic model with greater accuracy and efficiency compared to simplistic optimization approaches. Using water as a representative system, we have rigorously tested our C-CBMC back-mapping strategy by applying it to a broad range of configurations from disordered to crystalline to mixed interfaces. The C-CBMC outperforms the simple MC and the popularly employed MD back-mapping approach; it requires fewer evaluations and converges to a lower-energy configuration than those from simple MC and MD. C-CBMC back-mapped configurations for crystalline test systems (cubic and hexagonal ice) quickly converge to the lowest-energy DFT configurations, whereas simple MC and MD get trapped in local minima (several tens of milli-electron volts higher in energy). For disordered systems and interfaces where the barriers for atomic rearrangement are relatively low, the

REFERENCES

(1) Erdemir, A.; Ramirez, G.; Eryilmaz, O. L.; Narayanan, B.; Liao, Y.; Kamath, G.; Sankaranarayanan, S. K. Carbon-based tribofilms from lubricating oils. Nature 2016, 536, 67. (2) Zhang, Z.; Schwanz, D.; Narayanan, B.; Kotiuga, M.; Dura, J. A.; Cherukara, M.; Zhou, H.; Freeland, J. W.; Li, J.; Sutarto, R.; et al. Perovskite nickelates as electric-field sensors in salt water. Nature 2018, 553, 68. (3) Götz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born. J. Chem. Theory Comput. 2012, 8, 1542−1555. (4) Salomon-Ferrer, R.; Götz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 2. Explicit Solvent Particle Mesh Ewald. J. Chem. Theory Comput. 2013, 9, 3878−3888. (5) Berendsen, H. J. C.; Grigera, J. R.; Straatsma, T. P. The missing term in effective pair potentials. J. Phys. Chem. 1987, 91, 6269−6271.

H

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

(27) Engel, E. A.; Monserrat, B.; Needs, R. J. Anharmonic Nuclear Motion and the Relative Stability of Hexagonal and Cubic ice. Phys. Rev. X 2015, 5, No. 021033. (28) Torrie, G. M.; Valleau, J. P. Non-Physical Sampling Distributions in Monte-Carlo Free-Energy Estimation: Umbrella Sampling. J. Comput. Phys. 1977, 23, 187−199. (29) Ensing, B.; de Vivo, M.; Liu, Z.; Moore, P.; Klein, M. L. Metadynamics as a Tool for Exploring Free Energy Landscapes of Chemical Reactions. Acc. Chem. Res. 2006, 39, 73−81. (30) Loeffler, T. D.; Sepehri, A.; Chen, B. Improved Monte Carlo Scheme for Efficient Particle Transfer in Heterogeneous Systems in the Grand Canonical Ensemble: Application to Vapor-Liquid Nucleation. J. Chem. Theory Comput. 2015, 11, 4023−4032.

(6) Jorgensen, W. L.; Chandrasekhar, J.; Madura, J. D.; Impey, R. W.; Klein, M. L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983, 79, 926−935. (7) Abascal, J. L. F.; Vega, C. A general purpose model for the condensed phases of water: TIP4P/2005. J. Chem. Phys. 2005, 123, No. 234505. (8) Sanz, E.; Vega, C.; Espinosa, J. R.; Caballero-Bernal, R.; Abascal, J. L. F.; Valeriani, C. Homogeneous Ice Nucleation at Moderate Supercooling from Molecular Simulation. J. Am. Chem. Soc. 2013, 135, 15008−15017. (9) Molinero, V.; Moore, E. B. Water Modeled As an Intermediate Element between Carbon and Silicon. J. Phys. Chem. B 2009, 113, 4008−4016. (10) Abascal, J. L. F.; Sanz, E.; Fernández, R. G.; Vega, C. A potential model for the study of ices and amorphous water: TIP4P/ Ice. J. Chem. Phys. 2005, 122, No. 234511. (11) Stansfeld, P. J.; Sansom, M. S. From Coarse Grained to Atomistic: A Serial Multiscale Approach to Membrane Protein Simulations. J. Chem. Theory Comput. 2011, 7, 1157−1166. (12) Holm, L.; Sander, C. Database algorithm for generating protein backbone and side-chain co-ordinates from a C alpha trace application to model building and detection of co-ordinate errors. J. Mol. Biol. 1991, 218, 183−194. (13) Shih, A. Y.; Freddolino, P. L.; Sligar, S. G.; Schulten, K. Disassembly of Nanodiscs with Cholate. Nano Lett. 2007, 7, 1692− 1696. (14) Harmandaris, V. A.; Adhikari, N. P.; van der Vegt, N. F. A.; Kremer, K. Hierarchical Modeling of Polystyrene: From Atomistic to Coarse-Grained Simulations. Macromolecules 2006, 39, 6708−6719. (15) Krajniak, J.; Pandiyan, S.; Nies, E.; Samaey, G. Generic Adaptive Resolution Method for Reverse Mapping of Polymers from Coarse-Grained to Atomistic Descriptions. J. Chem. Theory Comput. 2016, 12, 5549−5562. (16) Ghanbari, A.; Böhm, M. C.; Müller-Plathe, F. A Simple Reverse Mapping Procedure for Coarse-Grained Polymer Models with Rigid Side Groups. Macromolecules 2011, 44, 5520−5526. (17) Wassenaar, T. A.; Pluhackova, K.; Böckmann, R. A.; Marrink, S. J.; Tieleman, D. P. Going Backward: A Flexible Geometric Approach to Reverse Transformation from Coarse Grained to Atomistic Models. J. Chem. Theory Comput. 2014, 10, 676−690. (18) Lombardi, L. E.; Martí, M. A.; Capece, L. CG2AA: backmapping protein coarse-grained structures. Bioinformatics 2016, 32, 1235−1237. (19) Russo, J.; Romano, F.; Tanaka, H. New metastable form of ice and its role in the homogeneous crystallization of water. Nat. Mater. 2014, 13, 733−739. (20) Siepmann, J. I.; Frenkel, D. Configurational bias Monte Carlo: a new sampling scheme for flexible chains. Mol. Phys. 1992, 75, 59−70. (21) Escobedo, F. A.; de Pablo, J. J. Extended continuum configurational bias Monte Carlo methods for simulation of flexible molecules. J. Chem. Phys. 1995, 102, 2636−2652. (22) Martin, M. G.; Siepmann, J. I. Novel Configurational-Bias Monte Carlo Method for Branched Molecules. Transferable Potentials for Phase Equilibria. 2. United-Atom Description of Branched Alkanes. J. Phys. Chem. B 1999, 103, 4508−4517. (23) Sepehri, A.; Loeffler, T. D.; Chen, B. Improving the Efficiency of Configurational-Bias Monte Carlo: Extension of the JacobianGaussian Scheme to Interior Sections of Cyclic and Polymeric Molecules. J. Chem. Theory Comput. 2017, 13, 4043−4053. (24) Rosenbluth, M. N.; Rosenbluth, A. W. Monte Carlo Calculation of the Average Extension of Molecular Chains. J. Chem. Phys. 1955, 23, 356−359. (25) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087−1092. (26) Hastings, W. K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97−109. I

DOI: 10.1021/acs.jpcb.8b01791 J. Phys. Chem. B XXXX, XXX, XXX−XXX