Constructing an Interpolated Potential Energy Surface of a Large

Oct 19, 2016 - Constructing a reliable potential energy surface (PES) is a key step toward computationally studying the chemical dynamics of any molec...
0 downloads 0 Views 1MB Size
Article pubs.acs.org/JCTC

Constructing an Interpolated Potential Energy Surface of a Large Molecule: A Case Study with Bacteriochlorophyll a Model in the Fenna−Matthews−Olson Complex Chang Woo Kim and Young Min Rhee* Center for Self-assembly and Complexity, Institute for Basic Science (IBS), Pohang 37673, Korea Department of Chemistry, Pohang University of Science and Technology (POSTECH), Pohang 37673, Korea S Supporting Information *

ABSTRACT: Constructing a reliable potential energy surface (PES) is a key step toward computationally studying the chemical dynamics of any molecular system. The interpolation scheme is a useful tool that can closely follow the accuracy of quantum chemical means at a dramatically reduced computational cost. However, applying interpolation to building a PES of a large molecule is not a straightforward black-box approach, as it frequently encounters practical difficulties associated with its large dimensionality. Here, we present detailed courses of applying interpolation toward building a PES of a large chromophore molecule. We take the example of S0 and S1 electronic states of bacteriochlorophyll a (BChla) molecules in the Fenna−Matthews−Olson light harvesting complex. With a reduced model molecule that bears BChla’s main π-conjugated ring, various practical approaches are designed for improving the PES quality in a stable manner and for fine-tuning the final surface such that the surface can be adopted for long time molecular dynamics simulations. Combined with parallel implementation, we show that interpolated mechanics/molecular mechanics (IM/MM) simulations of the entire complex in the nanosecond time scale can be conducted readily without any practical issues. With 1500 interpolation data points for each chromophore unit, the PES error relative to the reference quantum chemical calculation is found to be ∼0.15 eV in the thermally accessible region of the conformational space, together with ∼0.01 eV error in S0 − S1 transition energies. The performance issue related to the use of a large interpolation database within the framework of our parallel routines is also discussed. and the efficiency of MM models.6,7 In QM/MM simulations, the QC treatment is limited to a relatively small region where accurate computation is crucial. Nevertheless, the cost required for a typical QM/MM simulation is often very high, and it is rare to see reports with time scales reaching beyond well over picoseconds, especially when high level approaches are employed on the QM side. Because many interesting chemical processes take place at a time scale slower than the picosecond time scale, this cost issue still acts as a major obstacle in employing QM/MM approaches. A method which can dramatically reduce the computational cost while maintaining the accuracy of quantum calculations will surely benefit us in that regard. The potential energy surface (PES) interpolation method, originally developed by Collins and co-workers8 and then extended by many other groups, exactly serves that purpose. In this method, the energy of a molecular conformation is estimated from a preconstructed database of QC calculations at diverse geometries. The utilization of PES interpolation was first toward simulating gas-phase reactions of small molecules8−22 and was later extended to studying surface reactions23−26 and solution phase dynamics.27−29 In the

1. INTRODUCTION During the last few decades, the ability to simulate complex molecular systems has been greatly improved with active developments of both computational methods and their algorithmic implementations.1 Molecular dynamics (MD), which propagates trajectories of a system of interest with numerical integrations of Newton’s equations of motion, has heavily benefited from such developments. One of the most popular targets handled with MD is biomolecules in solution.2 In this case, the number of atoms in the simulation box can often be higher than hundreds of thousands. Molecular mechanics (MM) force fields based on analytical potential models with various functional forms have been extensively developed based on empirical parameters and quantum chemical (QC) calculations.3−5 The relative simplicity of potential energy functions enabled fast and efficient simulations, and this likely has made MD a useful tool for numerous purposes such as calculating thermodynamic properties or unveiling protein conformational dynamics. However, due to its simplicity, the MM force field model is not suitable for investigating the subtle nature of complex chemical phenomena. This limitation can be overcome by adopting the quantum mechanics/molecular mechanics (QM/MM) method, a multiscale approach that combines the accuracy of QC calculations © XXXX American Chemical Society

Received: June 27, 2016 Published: October 19, 2016 A

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

the reference QC energy and ∼0.01 eV S0 − S1 gap energy errors, we could attain ground and excited states MD simulations over multiple nanosecond time scales without any apparent trajectory instability at an ambient temperature. Through this work, we hope that a useful guideline for adopting IM/MM to conducting realistic large-scale condensed phase simulations can be provided. The organization of this article is as follows. In section 2, we present an overview of the PES interpolation method and its relevant theories. Then, in section 3 we describe our practical approaches in constructing the interpolated PES of BChla together with considerations on the parallelization of the interpolation algorithm. We also demonstrate the overall performances of our PES in terms of both accuracy and efficiency. This is followed by concluding remarks in section 4.

solution phase approach, interpolation was employed to a part of the entire system, while its surroundings were described by a simpler MM model. This multiscale-style approach was referred to as interpolated mechanics/molecular mechanics (IM/ MM)28 in an analogy to QM/MM. Indeed, in conjunction with techniques that can properly treat excited state properties,30 it successfully described realistic behaviors of chromophore dynamics in the condensed phase.31 An important practical issue around constructing an interpolated PES is collecting the molecular geometries into a database at which the energy and its derivatives are evaluated. A widely accepted paradigm for the database construction is the “grow scheme”, which iteratively adds new points with the aid of sampling MD simulations. As new data points are added to an existing interpolation database, the PES literally grows until it converges to a desired accuracy in a wide enough region of the conformational space. As the size of the conformational space becomes exponentially larger with the system’s dimensionality, efficiently selecting the data point locations will be important to render the PES interpolation a viable approach for a large system. Another issue that is frequently encountered during the interpolated PES construction is the creation of unphysically low energy traps by data points with negative Hessian characters.32,33 Because such traps tend to lock the interpolated molecule in regions of unphysically distorted conformations, the issue can become serious for both database construction and the actual production simulation. In practice, it turns out that such traps cannot be removed trivially by merely supplying more data points, and thus an active filtering scheme should be adopted.34 Nonetheless, for these practical issues, interpolation has the potential to become a very useful tool toward studying large systems especially when long simulation times are sought for.33 Thus, extending its capability to such a direction, namely, targeting a large system with long time simulations, will be an important step forward in its development. In this article, we present our development and subsequent testing of the interpolated PES construction scheme toward that direction with special attention paid to the two issues commented in the above paragraph. As an example of a large system, we chose to construct both the ground and the first excited state PES’s of seven bacteriochlorophyll a (BChla) molecules in the Fenna−Matthews−Olson (FMO) light harvesting complex.35 Indeed, these seven molecules in the IM region with a total of 46 atoms constitute a much larger target than other systems studied by interpolation before. At the same time, with the increased interest in the coupling between the BChla’s electronic states and its vibrations,36−38 the constructed surfaces may serve a role of elucidating new physical aspects that could not be reached with more conventional simulation approaches in the past. Toward this goal, we devised a few extra features to the existing database grow method and carefully employed them to a stable construction of the surfaces, especially by considering a general situation that the large molecules are embedded in a protein scaffold. In the end, the number of the data points in the database of the seven chromophore units reached the order of 104, and still at this large number, the burden of performing interpolation calculations was yet comparable to that of the accompanying MM calculations. We also implemented parallelization of the interpolation routines and discovered a peculiar performance issue that is unique to interpolation. With the surfaces that bear ∼0.15 eV typical error in comparison to

2. THEORETICAL BACKGROUND For completeness, we will first briefly overview the technical aspects of the interpolation method. Curious readers can refer to historic developments8,34,39−42 of the technique and the related recent reviews.33,43 Hereafter, for simplicity, we will use IM to refer either literally to interpolation mechanics or more loosely to the interpolation approach, unless such a use causes excessive ambiguity. For example, we will denote interpolated PES as IM PES and the interpolation region as IM region. Potential Energy Surface. Suppose we know the energy Va, the gradient ga, and the Hessian Ha at a conformational point Za. Then, the PES in the nearby region of Za can be approximated as a second-order Taylor expansion, Va(Z) = Va + ΔaT ·ga +

1 T Δa ·Ha ·Δa 2

(1)

Here, Δa represents geometric difference between Z and Za expressed for a chosen molecular coordinate system. The total IM PES is defined as the weighted average of individual secondorder Taylor expansions for all N data points in the IM database, N

VI(Z) =

∑ wa(Z)Va(Z)

(2)

a

The weights, {wa(Z)}, are calculated by normalizing primitive weights {va(Z)}:

wa(Z) =

va(Z) N

∑a va(Z)

(3)

The Cartesian gradient of the IM PES can be derived from eq 2 as N

g (X) =

N

∑ wa(X)J·(ga + Δa ·Ha) + ∑ Va(X)∇wa(X) a

a

(4)

where J is the Jacobian matrix for the Z to X conversion. In the limit of N → ∞, the IM PES may become exactly identical to the PES defined by the reference quantum chemical (QC) method adopted during the database construction. This basic scheme has been amended into many variants, each of which differs in terms of the choice of the coordinate system and the weighting function. The simplest weighting function will likely be of the Shepard interpolation style,8 B

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation va(Z) =

1 d(Z , Za)2p

even for cases where multiple IM regions are separately present in the system. In this work, separate IM regions were assigned for each of the seven BChla units in the protein. With multiple regions, the total energy is

(5)

with the Euclidean distance d(Z,Za) between Z and Za. In this work, the Euclidian distance was measured with Cartesian coordinates in all cases, based on an iterative but rigorous geometric alignment algorithm.42 The exponent 2p determines the switching speed between data points. A generalized version of eq 5, often called modified Shepard weighting function, was also proposed by Collins and co-workers,34,41 2p ⎧ ⎡ d(Z , Za) ⎤ ⎪ ⎪⎡ d(Z , Z ) ⎤ a va(Z) = ⎨ +⎢ ⎢ ⎥ ⎥ ⎬ ⎪ ⎪ ⎣ rad(a) ⎦ ⎭ ⎩⎣ rad(a) ⎦

n

V=

i=1 n

+

Ma b=1

i=1

i=1

ij ∑ ∑ V IM ‐ IM

(9)

where i and j are the indices for the IM regions (n = 7). The last term accounts for the nonbonded interactions between different IM regions. For the stabilizer, si = 0.02 was adopted for all i.

(6)

with p > q. The scaling factor rad(a) is the confidence radius of the data point Za, and it represents the relative fidelity of the local Taylor expansion given in eq 1. Bayesian Confidence Radius. The confidence radius in eq 6 can be estimated by applying Bayesian analysis to the IM database. For this, one first picks a set of molecular geometries {Zab} (b = 1, 2, ..., Ma) located in the vicinity of a data point Za and compares their energies calculated by the local Taylor expansions from Za to their reference value from QC calculations. Operationally, rad(a) is estimated as34



n

n

i=1 j>i

2q ⎫−1

1 1 = 6 Ma rad(a)

n

i i i ∑ V IM + VMM + ∑ V IM ‐ MM + ∑ siV h

3. PES CONSTRUCTION FOR BCHLA MOLECULES Preparations. Figure 1 presents the BChla chromophore after rendering by Visual Molecular Dynamics (VMD).45

[VQC(Zba) − Va(Zba)]2 2 Etol d(Zba, Za)6

(7)

Here, of course, VQC(Zab) and Va(Zab) are, respectively, the energies at Zab evaluated by the reference QC calculation and by the local Taylor expansion from Za. Intuitively, rad(a) from the above equation is closely related to the radius of the hyperspherical conformational space inside which the rootmean-squared error of energy becomes equivalent to the energy tolerance Etol. In the original scheme,34 {Zab} was chosen as a subset of the data points in the IM database. This can be a practical decision, as the QC energies will be already known for these molecular conformations, and no additional information will be required for evaluating eq 7. IM/MM in Condensed Phase. For condensed phase MD simulations, the IM PES can be combined with a classical molecular mechanics (MM) model. The nonbonded interactions between the IM and the MM regions are treated by assuming MM-style nonbonded interaction parameters for the IM atoms. If IM and MM atoms are connected by a covalent bond, its bond length is constrained, and a hydrogen link atom is placed at a predetermined fixed position along the bond. The link atom is then treated as a part of the IM region, and the force exerted on it is distributed to boundary atoms according to the scheme proposed by Case and co-workers.44 We confirmed that this force redistribution scheme, when applied to interpolation, conserves the system total energy with constant volume and constant energy (NVE) MD simulations. Lastly, to prevent unphysical extreme distortions in the IM region, an MM-based harmonic stabilizer function Vh for bonded interaction27 is also imposed on the IM region. As Vh derives from the BChla MM model, it is practically the sum of the individual bonded interaction potentials of the IM region. The total energy of the entire system is then calculated as V = VIM + VMM + VIM ‐ MM + sVh

Figure 1. BChla molecule and its core part adopted for interpolation. The core with hydrogen link atoms are shown with thick sticks, while the H-replaced phytyl tail and other side chains are shown with thin sticks.

Because of the large size of the chromophore, the IM region was limited to its core structure that contains the main πconjugation system by terminating the phytyl tail and other dangling side chains and by replacing them with hydrogen link atoms. For the reference QC calculations toward constructing the ground and the excited state PES’s, density functional theory (DFT) and its time-dependent version (TDDFT) were, respectively, used with the B3LYP exchange-correlation functional together with the 6-31G(d,p) basis set. All of the quantum chemical calculations were performed with the developers’ version of Q-Chem 4.3 quantum chemistry package.46 We first prepared the MM stabilizer potential (eq 9) for BChla. As explained in a previous section, we constructed a separate database for each individual chromophore. During the construction of the IM database for the i-th chromophore, the other chromophore units, namely, the j-th chromophore with j ≠ i, were treated as parts of the MM region by adopting the stabilizer function as the MM force field of the chromophore. In this way, the databases were constructed for one chromophore at a time in a step-by-step manner. In

(8)

with apparent definitions for the IM, the MM, and their interaction energies. This equation can be generally applied C

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation constructing this MM force field, the bonded interaction parameters were generated by a Hessian matching process,47 which fits the MM parameters to best mimic the Hessian matrix obtained from a QC calculation. For the ground state MM PES, we first performed geometry optimization of the BChla (Figure 1) in the gas phase and then calculated the Hessian matrix. The atomic partial charges were fit by the RESP method48 and scaled by 0.715 to compensate for the polarization screening by the protein.49 For the Lennard-Jones parameters, we used the same values as those described in ref 50 regardless of the electronic states. Then, we performed energy minimization of the IM region in the protein through the QM/MM method to determine the initial starting structure of the IM database construction. The simulation box was constructed by placing the FMO complex from Chlorobaculum tepidum (PDB ID: 3BSD)51 in a rhombic dodecahedron box of 70.62 Å side length and by solvating the system with 21053 TIP3P water molecules. The protein scaffold and the phytyl tails of BChla were described by the AMBER99sb force field parameters.52 Link hydrogen (LH) atoms were placed along the covalent bonds to the side chains, and their distances to their attached carbon atoms were set to the bond lengths obtained with gas-phase geometry optimizations of the LH-terminated BChla structures (Figure 1). There were a total of eight link atoms in one BChla unit. The QM/ MM minimizations of the IM region were performed by the combination of GROMACS 4.553 and Q-Chem 4.3.46 Finally, a set of valence internal coordinates (“Z-matrix coordinates”) was defined for interpolating BChla, whose components are listed in Table S1 together with Figure S1 in Supporting Information. All of the IM/MM simulations were performed by using GROMACS with our in-house library for interpolation. Collecting IM Data Points. In earlier applications of PES interpolation, IM database construction was usually started from a small set of judiciously chosen data points.8,11,41 These data points form a primitive PES that guides the database construction at the initial stage. For example, for gas-phase reactions, the minimum energy reaction path can be adopted to define the primitive database, and then, the database can be iteratively updated with the grow scheme by performing trajectory simulations to sample the configurational subspaces where the trajectories tend to visit frequently but the data points are yet sparse, and/or where the accuracy of the IM energy is the lowest.39,43 Of course, these two criteria can be combined in an alternating manner. Here, we followed a similar but at the same time somewhat distinct approach for the collection. Because we are handling a case without any reaction path, we initiated the sampling simulations from the QM/MM minimized structure of each BChla. To expedite the database collection, we ran 10 trajectories in parallel with different initial velocities. For each trajectory, if it reached a conformation separated by more than 0.5 Å Euclidean distance from all existing points in the database within 2 ps of simulation time, that conformation was added to the database. This was repeated until no new points were added after completing the 2 ps sampling trajectories. After completing this first stage, we then moved on to the next stage of checking the quality of the PES with the IM energy deviation from the QC energy. We adopted the 10 trajectories of 2 ps durations that just completed the first stage for the conformational sampling and obtained a total of 20 conformations from each trajectory at 0.1 ps intervals. DFT single point energy calculations were performed for these

conformations, and the earliest geometry whose energy error was larger than a preset cutoff (Ecut = 2 kJ/mol) was selected and added to the IM database after further calculating derivative information. After this addition, the 10 trajectories were propagated again from the same initial conditions, and these new trajectories were considered again for the data addition by checking their IM energy errors. This was repeated until no new point was added during the 2 ps simulation time. Upon the completion of this stage, the process went back to the first stage that used the geometrical criterion. After some cycling through these two stages, we could reach a situation where no data point was added from both stages. When this condition was met, the initial geometry was allowed to “drift” in space by performing 1 ps trajectory simulation. After resetting the initial structure this way, the above two stages with geometry and energy error criteria were again repeated in a similar manner. We will call the whole process between two adjacent drifts as a segment. Namely, a segment can contain multiple cycles of the geometry criterion stage and the energy error criterion stage. The 10 trajectories were used as independent workers, and the whole process was completed when all 10 workers finished adding new points. Of course, we allowed workers to finish with different numbers of cycles. The number of new data points that were collected in each 1 ps drift segment tended to become less as we gather more and more IM data points. Eventually, both the geometry and the energy error criteria were satisfied even without adding any new data points with the 2 ps simulations. This was considered as a sign that indicated that the IM data points were scattered widely enough in the conformational space and that the IM PES was becoming accurate over practically reachable regions. Thus, we monitored the average number of IM data points gathered at each segment index. Figure 2 displays how this average varied with the increase of the segment index. Some worker trajectories drifted over different segments faster than others, and the average in this figure was computed only with workers that passed the relevant position of the segment index. In any case, the number of collected data points in each drift segment clearly decreased as the workers passed through more and more segments, and

Figure 2. Average number of added IM data points collected to pass the i-th segment. The collecting attempts were made by using 10 parallel worker trajectories on each of the seven chromophore units, and thus, the average at i = 0 was taken from a total of 70 attempts. However, because each worker trajectory moved with a different speed in the segment index space, the total number of attempts used for the averaging naturally decreased at higher i. In the end, data collection was stopped when 1500 data points were accumulated for each BChla unit. Over the course of collecting 1500 × 7 = 10500 data points, the slowest worker trajectory passed 69 segments, while the fastest one moved over 233 segments. D

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

the Hessian matrix of an IM data point has a negative eigencomponent, a displacement along the negative mode will lower the potential. When the displacement is too excessive, the potential will subsequently be excessively low.33 When there are an infinite number of data points, this issue will not be a problem as the weight of a data point at a large distance quickly decays to zero. However, in the practical application, if there is no data point at a shorter distance than the point with a negative Hessian character, it can become quite a serious issue. In our experience, when this issue arose, the IM part faced abnormal elongation or contraction of a bond that was not contained in the set of Z-matrix coordinates. The use of a stabilizing potential (eq 9) was helpful in general, but still in some cases, its energetic penalty was not enough. With this problem, a long trajectory simulation with the PES defined by eq 5 could not be attained. One practical way to reduce the influence of such problematic data points will be to introduce the confidence radius such that it can control the relative fidelity of the local Taylor expansion around an IM data point. When the modified Shepard weighting function as displayed in eq 6 is used with properly evaluated confidence radii, the surface is affected more by data points with higher fidelity especially when the molecular conformation is far from the IM data points. For this, one needs to know how to calculate the radius for each data point. As was briefly explained in section 2, Collins and coworkers proposed to estimate confidence radii by Bayesian analysis (eq 7) using the set of points in the database.34 In fact, eq 7 becomes reliable when the QC energies are known at the proximity of an individual IM data point Za. Because the database is constructed such that its member points are collected as sparsely as possible, this condition will not be rigorously met by design in our scheme. Thus, we performed additional analysis to better estimate the confidence radii with the aid of IM/MM MD simulations. First, primitive values of the confidence radii were evaluated by applying the original approach with the energies at the database geometries. The elements of {Zba} for the Bayesian analysis for each a were selected to be the 250 nearest molecular conformations around Za in the database, judged by the Euclidian distances between the conformations. The value of Etol was set to 2 kJ/mol. This defined a new IM PES with eq 6 by adopting p = 12 and q = 2, and we performed MD simulations with this surface. We needed to sample diverse conformations of BChla directed by the protein motion, and therefore, we extracted a batch of snapshots from a long trajectory with pure MM potential. The reason we used the pure MM model here was because IM PES at this stage could not support a long time simulation. In any case, after the batch was collected, the conformations were adjusted to the IM/MM potential by additionally performing MD simulations on the IM/MM potential starting from the batch geometries. The long pure MM simulation was initiated from the energy-minimized crystal structure and continued for 6 ns at 300 K condition with 1 ps coupling time constant. With this, the first 1 ns window was discarded, and the atomic positions and the velocities were recorded every 10 ps. In this manner, we prepared a batch of 500 different initial conditions. The adjusting IM/MM simulations were performed at 30 K, the same temperature we adopted for collecting the IM data points. After discarding the first 10 ps data from these IM/MM simulations, conformations of the IM region were recorded every 1 ps for

we chose to stop collecting IM data points when the average number of newly added data points at a single segment index became zero. This condition was met after collecting 1500 IM data points for each chromophore. In this manner, our grow scheme adopted criteria that considered both geometric and energetic deviations. In addition, the energy deviation criterion was based on the difference between the IM and the QC energies. As there is a possibility that multiple worker trajectories can sample space close to each other, one can easily notice that our design is not processor time efficient. The database size may also be larger than optimal. However, in our experience, this approach was efficient in terms of wall clock time.31 The database size did not matter either, as the eventual IM load was at most comparable to the MM load. During the whole process, we used the simple Shepard weight in eq 5 with the power p = 12. Dual Thermostat Scheme. Toward making the IM PES reliable for room temperature simulations, an obvious approach would be to run the aforementioned data collection routines at the same room temperature. However, such an attempt of ours revealed that it was actually inefficient for the following reason. Because we started only with the QM/MM minimum energy geometry in the initial database, we needed to collect some data points with the geometric criterion through the first stage before subjecting the energy deviation criterion. Because of relatively high thermal energy and the initial crudeness of the IM PES, the BChla molecule quickly distorted and upon visual inspection, the sampled data points were obviously with irrelevant conformations. Because QC calculations provided very high energies for such data points, adding them actually induced drastic change to the PES, which in turn severely altered the direction of the sampling trajectory. As a result, the overall behavior was very erratic, and adding a very large number of data points did not result in a converging termination of the first stage of the data collection. Simply, this was because most of the data points were in a dynamically unimportant region of the conformational space and contributed little to the actual quality of the PES. To circumvent this issue, we applied two different temperatures for the IM and the MM regions during the sampling simulations. With this, an excessive thermal energy that would drive the IM part too far from the region with reliable PES description could be safely discarded. At the same time, the IM part was still constantly distorted by the contacts with the MM part. Indeed, in the FMO complex case, the IM−MM interaction primarily modulated the geometrical distortions of BChla, and thus the protein shape governed more on the chromophore conformational space that we needed to sample during the data collection. Thus, maintaining the IM part at a low temperature did not mean that its vibrational motions would not be sampled. Practically, this dual temperature scheme was realized by adopting Berendsen’s weak coupling thermostat54 algorithm separately for the IM and the MM regions, with 30 and 300 K target temperatures and with coupling time constants of 0.01 and 0.1 ps, respectively. The initial velocity was randomly generated for the whole system at 30 K, and the MM region was gradually heated to its target temperature. Postprocessing: Bayesian Analysis Updating. As we mentioned in the Introduction, one important issue with interpolation is the existence of energetic traps which can effectively lock trajectories to severely distorted regions.32,33 This is related to the negativity of Hessian data. Namely, when E

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

have both the ground state and the first excited state PES’s, we applied the construction procedures only to the ground state. This was based on a presumption that the shapes of the two PES’s would not be much different, from the fact that the reorganization energy of BChla is rather small.55 Of course, skipping the additional grow procedures for the excited state was to save time. In the ground state case, after all the procedural details were fixed, the entire database collection took roughly 5 weeks of wall-clock time by employing ∼300 processor cores in parallel. Had we gone through the same procedures for the excited state, judging from the cost difference between the ground state DFT and the excited state TDDFT calculations, it would have cost ∼13 more weeks for the completion of the excited state database constructions. By adopting the same molecular geometries and the confidence radii from the ground state data, the excited state database could be constructed in only ∼5 weeks. The validity of this assumption will be discussed together with the tests on the accuracy of the IM PES’s in the next section. Parallelization. With an increase of the database size, the computational burden of IM became comparable to the cost of MM. Similarly to the MM scheme where efficient parallelization is generally well-known,56−59 it was natural to pursue additional parallelization within the IM scheme. Therefore, we implemented a parallel algorithm to the IM stage by utilizing the message passing interface (MPI) library.60 The parallelization is relatively straightforward as the energy and the gradient expressions displayed in eqs 1 and 4 are basically summations of the contributions by individual interpolation data points. A flowchart describing the overall scheme for the parallel IM in conjunction with the MM part is presented in Scheme 1. Basically, as the scheme shows, the IM routine commences right after the finish of the MM force calculation, and it inserts its own force components to the MM vector just before the integrations of the equations of motion. Because the MM parallelization may divide the IM region into multiple domains,58 the IM atom coordinates are collected by the master processor and then distributed to the slaves first. This is because computing the geometric displacement vector Δa, as a crucial quantity required during the IM calculations with eqs 1 and 4 for treating every data point at every MD step, requires the knowledge of all atomic coordinates in the entire IM region. With the IM region coordinate information, all processors treat their allocated data points by aligning them first toward the interpolation weight calculation and then by computing the force and the energy elements. These elements are finally accumulated in parallel and sent to the master. After the final treatment by the master processor on relatively cheap routines such as the interpolation weight normalization and internalCartesian coordinate transform, the IM force vector is broadcast to slave processors so that the vector components are appropriately inserted to its relevant parallelized domain in the MM routine. The parallelization efficiency is determined by the ratio between the time spent by parallelized computational routines and by their overheads, respectively, represented by blue and red/gray in Scheme 2. We checked the parallel efficiency with a series of benchmark tests up to 128 processor cores in 8 separate nodes. All of the tests were conducted by running the IM/MM simulations for 1000 steps by using an IM database with N = 10500 and then by separately measuring the time spent during individual stages. The computation nodes were equipped with dual Intel E5−2690 processors with 64 GBs of

an additional duration of 100 ps. This resulted in a total of 50000 geometries for each BChla unit. The QC single point energies were evaluated for these 50000 sampled molecular geometries and were subsequently used for updating the confidence radii. Of course, geometries from the IM database were considered together with these newly selected points toward generating 250 nearby geometries for each data point for the Bayesian analysis. The improvement after updating the confidence radii with our additional computations was checked by monitoring the agreement between IM and QC energies at selected BChla geometries. The geometries for this monitoring were defined as the last frames of the 500 IM/MM trajectories described in the above paragraph. Figure 3 visualizes the monitoring result obtained

Figure 3. Improvement in the quality of IM PES for BChla-2 by updating the Bayesian confidence radii with our scheme based on 500 monitoring geometries. At these geometries, the IM energies evaluated with primitive radii (black) and with updated radii (red) were calculated and compared against the corresponding QC energies.

for the BChla-2 unit, where the improvement was most significant. (See Figure S2 for other BChla units.) In principle, if needed, this approach can be applied in an iterative manner until convergence, although we did not attempt it here. Scheme 1 summarizes the overall procedures for constructing the IM PES in this work. Construction of Excited State PES. The overall procedures described above can be applied to any electronic state. For BChla considered here, even though we aimed to Scheme 1. Flow Chart Representation of Our IM PES Construction Procedures

F

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation Scheme 2. Flow Chart Representation of the Parallel IM/ MM Algorithma

a

The nature of each step is color coded as listed at the bottom.

memory with the highest possible bandwidth configuration at 1600 MHz rate. Parallel communications were through a 10 Gbit ethernet switch fabric. Figure 4 summarizes the overall benchmark results with actual simulations. In the most ideal case with perfect parallelization efficiency, the total processor time should be equal for all test conditions. Even though the wall time monotonically decreases with the use of more processor cores as displayed in Figure 4a, in reality, the total processor time for the MM part displays moderate increases, with ∼70% parallel efficiency when 128 processor cores on 8 nodes were utilized at the same time. This is likely a usual behavior with the domain decomposition scheme58 and the computational setup adopted in this work. For this MM part, we note that how the processor cores were allocated affected the efficiency in an interesting manner. Namely, the 8 core × 8 node condition performed slightly better than the 16 core × 4 node condition (dashed circle in Figure 4b). Because the internode network communication is slower than the intranode one, the better performance of the 8 × 8 condition with twice more nodes signifies that the parallel communication is not likely a serious bottleneck for the MM part computation and that the processor cores have actually performed better in this condition. The better performance should be quite expected from the modern processor design, with which the processor clock rate boosts up to by ∼20% when only a fraction of its cores are utilized.61 In the case of IM calculation, the overall trend is similar. Especially, when we compare the four-core conditions in Figure 4b, namely, 4 × 1, 4 × 2, 4 × 4, and 4 × 8 results (in core × node notation), we can observe that the parallel efficiency deteriorated only very slightly. The efficiency drop was drastic when a total of 128 cores were used. At that point, the parallelization was excessive with ∼80 data points per core, and thus, the processor synchronization (“MPI wait”) was an issue. Actually, what is more peculiar is the fact that parallelizing over separate nodes performed much better for the IM case. Namely, using the 8 core × 8 node condition performed much better than using the 16 core × 4 node condition (dashed rectangle in Figure 4b). This performance increase is much larger than the corresponding increase from the MM part

Figure 4. Efficiency of our parallel implementation of IM/MM with up to 128 processor cores on 8 nodes. All tests were conducted by running 1000 steps of IM/MM simulations with a database size N = 10500. Different colors denote the number of adopted nodes. (a) The wall time spent by the MM part (filled squares) and the IM part (open circles). The inset shows the magnified view on the high core count results. (b) The total accumulated processor times utilized by the MM part (filled squares) and the IM part (open circles). (c) The processor times utilized by the computational components of the IM part, represented by the blue plus red routines in Scheme 2.

(dashed circle) and should not be ascribed to the processor clock boost. This puzzling aspect could be explained after we separately inspected the processor time spent for pure computations in the IM part. This is mainly governed by vector−vector and vector−matrix multiplications (eqs 1 and 4), and as such, total processor time should be insensitive to the number of used cores beyond the level of the clock boost. However, when we inspected this time as a function of the number of used cores per node, it increased drastically with the per-node core usage (Figure 4c). This seemingly odd behavior can be explained by the saturation of the memory bandwidth. For every MD step, the processors must read all the energy, gradient, and Hessian data stored in the memory. The total size of the data that needed to be accessed during 1000 steps of MD simulation was 1.33 TBs, while the memory bandwidth of one node was 102 GBs per second. This means that when a single core occupied the whole bandwidth, the memory access time would have been ∼13 s. When 16 cores competed for the memory with each G

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

describing chromophore thermal motions at the simulated temperature. We also inspected how the PES quality has improved during IM database construction. Figure 5 presents the variations of

other, it would have become 208 s, and likewise, eight cores would have spent 104 s. Indeed, this argument agrees quite well with the time increase in the dotted box of Figure 4b: using 16 cores × 4 nodes took ∼164 s more than using 8 cores × 8 nodes, while our bandwidth consideration predicted an increase of 104 s. Thus, with a massive database like that in our case, we can see that the interpolation is more affected by the memory access time than the parallel communication cost. However, memory bandwidth is one of the fast improving properties with recent technical advancements, and it will not likely be a hurdle in future applications in the practical sense. Even when this issue does become bothersome, a reduction scheme such as occasionally generating a neighbor list8,41 can be considered. IM PES Accuracy. We tested the accuracy of our IM PES by comparing IM and QC energies at various chromophore geometries. To generate the test geometry set, an IM/MM MD trajectory was initiated from one of the initial conditions of confidence radii evaluation simulations. After a quick equilibration over 1 ns at 300 K and 1 atm with Berendsen’s weak-coupling algorithms, the trajectory was extended for additional 10 ns with a velocity rescale thermostat62 and Parrinello−Rahman barostat63 at the same temperature and pressure. Along this extended trajectory, snapshots of the IM region was recorded every 10 ps to generate a test set of 1000 conformations for each BChla unit. Table 1 summarizes the root-mean-square (RMS) errors between the IM and the reference QC energies evaluated with

Figure 5. Changes of RMS errors in IM energy with the growth of the IM database. The same test geometries adopted for constructing Table 1 were again employed here.

the RMS errors at the same test geometries with the growth of the IM database for each chromophore. From the figure, we can easily observe that the errors improved over the first ∼500 data points, but the speed of improvement became quite slow beyond that point. However, judging from Figure 2, it is apparent that the points in the later stage of database construction were improving the surface fidelity in terms of the energy reliability. With this, we can infer that the use of an RMS error should be discouraged in judging the surface convergence as it will likely be affected more heavily by the majority of well-behaving points especially at a later stage of database construction. In addition, we can observe that the RMS errors are not converging to zero in Figure 5. This is because we imposed a low-temperature thermostat to the IM region. With this, localized high frequency motions such as C− H vibrations were not reliably sampled during the database construction. Because we applied an ambient temperature to the whole system for collecting the final test geometries, such motions were more actively sampled, and they contributed largely to the RMS errors. As mentioned in the above paragraph, their effects are rather well balanced in both the ground and the excited states, and the gap energies between them were affected little as shown in Table 1. When needed, of course, these high frequency motions can be described better by additionally employing database construction at a higher temperature. For all of the geometries in these final test sets, the contribution of the stabilizer term was less than 0.05 eV for most cases and did not exceed 0.1 eV in any case. We also note that we could perform this 10 ns simulation in 68 h with 64 processor cores. Thus, performing much longer simulations will be practically very feasible. In addition, excited state simulation will not take a longer time as its interpolation database size is the same as the size of the ground state database. With these final test results, we can perform one additional verification on our assumptions. As explained in a previous part, during the construction of the excited state IM PES, we assumed that the confidence radii for the data points from the ground state would be transferable to the excited state. Figure 6 shows the correlation between the ground and the excited state QC energies at the test set geometries. An excellent correlation is observed between the two sets of energies, strongly

Table 1. Root-Mean-Square Errors of IM Energies in Reference to the QC Energies from the Test Sets with 1000 Geometriesa

a

entry

ground state

excited state

transition energy

BChla-1 BChla-2 BChla-3 BChla-4 BChla-5 BChla-6 BChla-7

0.170 0.093 0.100 0.124 0.127 0.110 0.169

0.166 0.090 0.098 0.122 0.124 0.107 0.165

0.010 0.007 0.007 0.009 0.007 0.007 0.008

All values are in eV units.

these test geometry sets for the seven chromophore units. The pictorial representation of the correlations between the IM and the QC energies are also presented in Figure S3 of Supporting Information. We can see that the errors are in general in the order of 0.1 eV. Moreover, the RMS errors for the S0 → S1 transition energies are notably smaller by an order of magnitude than the absolute errors of each state. This is likely because error generating vibrations affect the two states in a balanced manner, which we will discuss further in the next paragraph. In addition, we should also note that ∼0.01 eV error may still be large in the physical sense. Namely, because the spread of the transition energies (dynamic and/or static disorders64) are in the order of 0.05 eV, which is not much larger than the error level of IM/MM, care needs to be taken when interpreting dynamics obtained with the IM PES. Even still, it is encouraging to observe that representing the ample BChla surfaces with only 1500 data points each can have this level of reliability. Indeed, considering that the intrinsic error of any amenable quantum chemical method can be even larger than our observed RMS interpolation error, expanding the interpolation database will not be that meaningful for H

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

For example, the use of transition charge from electrostatic potential (TrESP) can be utilized,81,82 where the electronic transition densities are approximated as partial transition charges located at atomic positions. With such descriptions, adiabatic excited states as superpositions of locally excited states can be obtained by diagonalizing the final Hamiltonian matrix.83 Philosophically, this approach will be similar to a recently reported exciton model approach for quantum chemically calculating molecular aggregates,77 which showed reliable reproductions of excitation energies from brute force style supercomplex calculations.

4. CONCLUSIONS In this article, we have presented practical schemes for constructing an interpolated PES of a large molecule with an application example of a BChla chromophore in the FMO complex. We extended the conventionally employed grow scheme by iteratively adding new interpolation data points with distance and energy criteria until convergence. These iterative additions were performed by multiple worker trajectories that started from equilibrium geometries of each chromophore unit in the complex and then drifted in space along the progress of the iterative additions. Even though the drift was somewhat slow in time, the interpolation database could be improved stably in this manner. Employing multiple workers was of course to compensate the slow speed of the drift. Eventually, both the distance and the energy criteria displayed converging behaviors in terms of the number of data points added during each segment of drifts by the worker trajectories. To further stabilize sampling trajectories especially when the interpolated PES was primitive, the vibrational motion of the chromophore unit was limited by imposing a low temperature condition on the chromophore. Even still, large scale motions were still promoted by imposing a higher temperature on the protein. Thus, this dual thermostat scheme with different temperatures on the IM and the MM parts of the system assisted the data point collection process with stability and efficiency. Through these procedures, we constructed seven separate IM databases with 1500 data points for each individual BChla unit. Right after the collection, the PES was affected by negativities in Hessian characters and was not appropriate for long time simulations. As a remedy, the collected data points were subjected to postprocessing with Bayesian confidence radii evaluations. To complement the sparsity of the IM data points for this purpose, we supplied additional molecular conformations to the Bayesian analysis. This additional process, which can in principle be applied in an iterative manner, improved the surface fidelity consistently for all chromophore units. Finally, we parallelized the IM calculation routine and tested its efficiency for a series of conditions. All the features described above allowed us to conduct a stable IM/MM simulation up to many multiples of nanoseconds, which would not be practically possible with the matching QM/MM scheme. From this relatively long simulation, the errors between our IM PES and the reference QC values were only ∼0.15 eV for the absolute state energy and ∼0.01 eV for the S0 − S1 energy gap. The much reduced error in the energy gap resulted from the way we constructed the interpolation database. Most importantly, our work demonstrates that a reliable interpolated PES can be constructed even for a large molecule such as BChla, with practical adjustments of existing approaches such as ours. We hope this work can act as a guide for overcoming problems that

Figure 6. High correlation between the ground and the excited state energies of BChla-1 at the test geometries. A high similarity between the two PES shapes can be inferred from this figure.

suggesting that the PES shapes of the two electronic states are quite similar. Physically, this is supported by the smallness of the Huang−Rhys factor65 of the BChla chromophore.66 Considering the similarity of the two surfaces, it is reasonable to assume that the confidence radii will be similar for the two states. However, this assumption is only valid for BChla, and for other molecules with large reorganization energies67 and/or substantial Duschinsky rotations,68 the full procedures will need to be separately employed for shaping the excited state PES.69 Considerations on Delocalized Excitonic States. In the excited state PES’s we have generated so far, the excitation is completely localized on a single chromophore. However, it is widely known that considerable amounts of interchromophore coupling exist in the FMO complex, with subsequent delocalization of electronic excitations.64,70,71 Thus, our excited state IM PESs are not true adiabatic excited states but rather localized site-based quasi-diabatic states. In the case of quantum chemical descriptions, the recently developed multistate DFT (MS-DFT) approach has shown that the adiabatic states of a large system can be rigorously derived from localized diabatic configurations.72 In this approach, the orbitals of the full system are partitioned into subgroups corresponding to valence-bond (VB) like nonorthogonal diabatic states, for which the energies and interstate overlap elements are determined. The adiabatic state of the full system can then be calculated in the framework of the VB theory.72,73 This formalism can be adopted for molecular aggregates by using localized monomeric states as its building blocks toward producing fully delocalized adiabatic states. Indeed, in this manner, excited state dynamics of singlet fission has been simulated to give valuable insights on interpreting experimental observations.74 Meanwhile, when the excited states of a chromophore aggregate system are of particular interest, the subsystem formulation of TDDFT can also be an effective tool for generating supermolecular excited states.75,76 Additional monomer-based approaches are being further developed and tested.77,78 Related to these earlier developments, in the case of FMO, a rather simple model that can interconvert the adiabatic and the site-based representations is the Frenkel exciton model. In this model, the two representations are linked by a similarity transform defined by a Hamiltonian matrix whose diagonal and off-diagonal elements are, respectively, dictated by site transition energies and interchromophore electronic coupling.70,79,80 Thus, we will need to further consider such coupling elements toward predicting spectroscopic features. I

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

(10) Nguyen, K. A.; Rossi, I.; Truhlar, D. G. A Dual-Level Shepard Interpolation Method for Generating Potential Energy Surfaces for Dynamics Calculations. J. Chem. Phys. 1995, 103, 5522−5530. (11) Rhee, Y. M.; Lee, T. G.; Park, S. C.; Kim, M. S. Potential Energy Surfaces for Polyatomic Reactions by Interpolation with Reaction Path Weight: CH2OH+ → CHO+ + H2 Reaction. J. Chem. Phys. 1997, 106, 1003−1012. (12) Ishida, T.; Schatz, G. C. Automatic Potential Energy Surface Generation Directly from Ab Initio Calculations Using Shepard Interpolation: A Test Calculation for the H2 + H System. J. Chem. Phys. 1997, 107, 3558−3568. (13) Collins, M. A.; Zhang, D. H. Application of Interpolated Potential Energy Surfaces to Quantum Reactive Scattering. J. Chem. Phys. 1999, 111, 9924−9931. (14) Thompson, K.; Martínez, T. J. Ab Initio/Interpolated Quantum Dynamics on Coupled Electronic States with Full Configuration Interaction Wave Functions. J. Chem. Phys. 1999, 110, 1376−1382. (15) Ishida, T.; Schatz, G. C. A Local Interpolation Scheme Using no Derivatives in Quantum-Chemical Calculations. Chem. Phys. Lett. 1999, 314, 369−375. (16) Zhang, D. H.; Collins, M. A.; Lee, S.-Y. First-Principles Theory for the H + H2O, D2O Reactions. Science 2000, 290, 961−963. (17) Yagi, K.; Taketsugu, T.; Hirao, K. Generation of FullDimensional Potential Energy Surface of Intramolecular Hydrogen Atom Transfer in Malonaldehyde and Tunneling Dynamics. J. Chem. Phys. 2001, 115, 10647−10655. (18) Ishida, T.; Schatz, G. C. A Local Interpolation Scheme Using no Derivatives in Potential Sampling: Application to O(1D) + H2 System. J. Comput. Chem. 2003, 24, 1077−1086. (19) Le, H.-A.; Frankcombe, T. J.; Collins, M. A. Reaction Dynamics of H3+ + CO on an Interpolated Potential Energy Surface. J. Phys. Chem. A 2010, 114, 10783−10788. (20) Zhang, W.; Zhou, Y.; Wu, G.; Lu, Y.; Pan, H.; Fu, B.; Shuai, Q.; Liu, L.; Liu, S.; Zhang, L.; Jiang, B.; Dai, D.; Lee, S. Y.; Xie, Z.; Braams, B. J.; Bowman, J. M.; Collins, M. A.; Zhang, D. H.; Yang, X. Depression of Reactivity by the Collision Energy in the Single Barrier H + CD4 → HD + CD3 Reaction. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 12782−12785. (21) Evenhuis, C.; Martínez, T. J. A Scheme to Interpolate Potential Energy Surfaces and Derivative Coupling Vectors without Performing a Global Diabatization. J. Chem. Phys. 2011, 135, 224110. (22) Evenhuis, C. R.; Collins, M. A. Interpolation of Diabatic Potential Energy Surfaces. J. Chem. Phys. 2004, 121, 2515−2527. (23) Olsen, R. A.; Busnengo, H. F.; Salin, A.; Somers, M. F.; Kroes, G. J.; Baerends, E. J. Constructing Accurate Potential Energy Surfaces for a Diatomic Molecule Interacting with a Solid Surface: H2 + Pt(111) and H2 + Cu(100). J. Chem. Phys. 2002, 116, 3841−3855. (24) Crespos, C.; Collins, M. A.; Pijper, E.; Kroes, G. J. Application of the Modified Shepard Interpolation Method to the Determination of the Potential Energy Surface for a Molecule−Surface Reaction: H2 + Pt(111). J. Chem. Phys. 2004, 120, 2392−2404. (25) Krishnamohan, G. P.; Olsen, R. A.; Kroes, G.-J.; Gatti, F.; Woittequand, S. Quantum Dynamics of Dissociative Chemisorption of CH4 on Ni(111): Influence of the Bending Vibration. J. Chem. Phys. 2010, 133, 144308. (26) Frankcombe, T. J.; Collins, M. A.; Zhang, D. H. Modified Shepard Interpolation of Gas-Surface Potential Energy Surfaces with Strict Plane Group Symmetry and Translational Periodicity. J. Chem. Phys. 2012, 137, 144701. (27) Park, J. W.; Kim, H. W.; Song, C. I.; Rhee, Y. M. Condensed Phase Molecular Dynamics Using Interpolated Potential Energy Surfaces with Application to the Resolvation Process of Coumarin 153. J. Chem. Phys. 2011, 135, 014107. (28) Park, J. W.; Rhee, Y. M. Interpolated Mechanics-Molecular Mechanics Study of Internal Rotation Dynamics of the Chromophore Unit in Blue Fluorescent Protein and Its Variants. J. Phys. Chem. B 2012, 116, 11137−11147.

may be encountered in future applications of interpolation routines for studying large systems.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.jctc.6b00647. Definitions of the internal coordinates, PES improvement by Bayesian treatment, and pictorial comparisons between IM and QC energies (PDF)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Funding

This work was supported by Institute for Basic Science (IBS) of Korea. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS ́ (Stanford University) for fruitful Y.M.R. thanks Todd Martinez discussions during his visit in Korea.



ABBREVIATIONS MD, molecular dynamics; MM, molecular mechanics; IM, interpolated mechanics; QC, quantum calculation; PES, potential energy surface; BChla, bacteriochlorophyll a; FMO, Fenna−Matthews−Olson; MPI, message-passing interface; MS-DFT, multistate density functional theory; VB, valencebond; TrESP, transition charge from electrostatic potential



REFERENCES

(1) Frenkel, D.; Smit, B. Understanding Molecular Simulation: From Algorithms to Applications; Academic Press: New York, 2002. (2) Adcock, S. A.; McCammon, J. A. Molecular Dynamics: Survey of Methods for Simulating the Activity of Proteins. Chem. Rev. 2006, 106, 1589−1615. (3) Leach, A. R. Molecular Modeling: Principles and Applications, 2nd ed.; Prentice Hall: New York, 2001. (4) MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586−3616. (5) Ponder, J. W.; Case, D. A. Force Fields for Protein Simulations. Adv. Protein Chem. 2003, 66, 27−85. (6) Senn, H. M.; Thiel, W. QM/MM Methods for Biomolecular Systems. Angew. Chem., Int. Ed. 2009, 48, 1198−1229. (7) Brunk, E.; Rothlisberger, U. Mixed Quantum Mechanical/ Molecular Mechanical Molecular Dynamics Simulations of Biological Systems in Ground and Electronically Excited States. Chem. Rev. 2015, 115, 6217−6263. (8) Ischtwan, J.; Collins, M. A. Molecular Potential Energy Surfaces by Interpolation. J. Chem. Phys. 1994, 100, 8080−8088. (9) Jordan, M. J. T.; Thompson, K. C.; Collins, M. A. Convergence of Molecular Potential Energy Surfaces by Interpolation: Application to the OH + H2 → H2O + H Reaction. J. Chem. Phys. 1995, 102, 5647− 5657. J

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation (29) Park, J. W.; Rhee, Y. M. Emission Shaping in Fluorescent Proteins: Role of Electrostatics and π-stacking. Phys. Chem. Chem. Phys. 2016, 18, 3944−3955. (30) Park, J. W.; Rhee, Y. M. Diabatic Population Matrix Formalism for Performing Molecular Mechanics Style Simulations with Multiple Electronic States. J. Chem. Theory Comput. 2014, 10, 5238−5253. (31) Park, J. W.; Rhee, Y. M. Constructing Polyatomic Potential Energy Surfaces by Interpolating Diabatic Hamiltonian Matrices with Demonstration on Green Fluorescent Protein Chromophore. J. Chem. Phys. 2014, 140, 164112. (32) Crittenden, D. L.; Thompson, K. C.; Chebib, M.; Jordan, M. J. T. Efficiency Considerations in the Construction of Interpolated Potential Energy Surfaces for the Calculation of Quantum Observables by Diffusion Monte Carlo. J. Chem. Phys. 2004, 121, 9844−9854. (33) Rhee, Y. M.; Park, J. W. Interpolation for Molecular Dynamics Simulations: from Ions in Gas Phase to Proteins in Solution. Int. J. Quantum Chem. 2016, 116, 573−577. (34) Bettens, R. P. A.; Collins, M. A. Learning to Interpolate Molecular Potential Energy Surfaces with Confidence: A Bayesian Approach. J. Chem. Phys. 1999, 111, 816−826. (35) Fenna, R. E.; Matthews, B. W. Chlorophyll Arrangement in a Bacteriochlorophyll Protein from Chlorobium Limicola. Nature 1975, 258, 573−577. (36) Aghtar, M.; Strü m pfer, J.; Olbrich, C.; Schulten, K.; Kleinekathöfer, U. Different Types of Vibrations Interacting with Electronic Excitations in Phycoerythrin 545 and Fenna−Matthews− Olson Antenna Systems. J. Phys. Chem. Lett. 2014, 5, 3131−3137. (37) Wang, X.; Ritschel, G.; Wuster, S.; Eisfeld, A. Open Quantum System Parameters for Light Harvesting Complexes from Molecular Dynamics. Phys. Chem. Chem. Phys. 2015, 17, 25629−25641. (38) Lee, M. K.; Huo, P.; Coker, D. F. Semiclassical Path Integral Dynamics: Photosynthetic Energy Transfer with Realistic Environment Interactions. Annu. Rev. Phys. Chem. 2016, 67, 639−668. (39) Thompson, K. C.; Collins, M. A. Molecular Potential-Energy Surfaces by Interpolation: Further Refinements. J. Chem. Soc., Faraday Trans. 1997, 93, 871−878. (40) Thompson, K. C.; Jordan, M. J. T.; Collins, M. A. Molecular Potential Energy Surfaces by Interpolation in Cartesian Coordinates. J. Chem. Phys. 1998, 108, 564−578. (41) Thompson, K. C.; Jordan, M. J. T.; Collins, M. A. Polyatomic Molecular Potential Energy Surfaces by Interpolation in Local Internal Coordinates. J. Chem. Phys. 1998, 108, 8302−8316. (42) Rhee, Y. M. Construction of an Accurate Potential Energy Surface by Interpolation with Cartesian Weighting Coordinates. J. Chem. Phys. 2000, 113, 6021−6024. (43) Collins, A. M. Molecular Potential-Energy Surfaces for Chemical Reaction Dynamics. Theor. Chem. Acc. 2002, 108, 313−324. (44) Walker, R. C.; Crowley, M. F.; Case, D. A. The Implementation of a Fast and Accurate QM/MM Potential Method in Amber. J. Comput. Chem. 2008, 29, 1019−1031. (45) Humphrey, W.; Dalke, A.; Schulten, K. VMD: Visual Molecular Dynamics. J. Mol. Graphics 1996, 14, 33−38. (46) Shao, Y.; Gan, Z.; Epifanovsky, E.; Gilbert, A. T. B.; Wormit, M.; Kussmann, J.; Lange, A. W.; Behn, A.; Deng, J.; Feng, X.; Ghosh, D.; Goldey, M.; Horn, P. R.; Jacobson, L. D.; Kaliman, I.; Khaliullin, R. Z.; Kuś, T.; Landau, A.; Liu, J.; Proynov, E. I.; Rhee, Y. M.; Richard, R. M.; Rohrdanz, M. A.; Steele, R. P.; Sundstrom, E. J.; Woodcock, H. L.; Zimmerman, P. M.; Zuev, D.; Albrecht, B.; Alguire, E.; Austin, B.; Beran, G. J. O.; Bernard, Y. A.; Berquist, E.; Brandhorst, K.; Bravaya, K. B.; Brown, S. T.; Casanova, D.; Chang, C.-M.; Chen, Y.; Chien, S. H.; Closser, K. D.; Crittenden, D. L.; Diedenhofen, M.; DiStasio, R. A.; Do, H.; Dutoi, A. D.; Edgar, R. G.; Fatehi, S.; Fusti-Molnar, L.; Ghysels, A.; Golubeva-Zadorozhnaya, A.; Gomes, J.; Hanson-Heine, M. W. D.; Harbach, P. H. P.; Hauser, A. W.; Hohenstein, E. G.; Holden, Z. C.; Jagau, T.-C.; Ji, H.; Kaduk, B.; Khistyaev, K.; Kim, J.; Kim, J.; King, R. A.; Klunzinger, P.; Kosenkov, D.; Kowalczyk, T.; Krauter, C. M.; Lao, K. U.; Laurent, A. D.; Lawler, K. V.; Levchenko, S. V.; Lin, C. Y.; Liu, F.; Livshits, E.; Lochan, R. C.; Luenser, A.; Manohar, P.; Manzer, S. F.; Mao, S.-P.; Mardirossian, N.; Marenich, A.

V.; Maurer, S. A.; Mayhall, N. J.; Neuscamman, E.; Oana, C. M.; Olivares-Amaya, R.; O’Neill, D. P.; Parkhill, J. A.; Perrine, T. M.; Peverati, R.; Prociuk, A.; Rehn, D. R.; Rosta, E.; Russ, N. J.; Sharada, S. M.; Sharma, S.; Small, D. W.; Sodt, A.; Stein, T.; Stück, D.; Su, Y.-C.; Thom, A. J. W.; Tsuchimochi, T.; Vanovschi, V.; Vogt, L.; Vydrov, O.; Wang, T.; Watson, M. A.; Wenzel, J.; White, A.; Williams, C. F.; Yang, J.; Yeganeh, S.; Yost, S. R.; You, Z.-Q.; Zhang, I. Y.; Zhang, X.; Zhao, Y.; Brooks, B. R.; Chan, G. K. L.; Chipman, D. M.; Cramer, C. J.; Goddard, W. A.; Gordon, M. S.; Hehre, W. J.; Klamt, A.; Schaefer, H. F.; Schmidt, M. W.; Sherrill, C. D.; Truhlar, D. G.; Warshel, A.; Xu, X.; Aspuru-Guzik, A.; Baer, R.; Bell, A. T.; Besley, N. A.; Chai, J.-D.; Dreuw, A.; Dunietz, B. D.; Furlani, T. R.; Gwaltney, S. R.; Hsu, C.-P.; Jung, Y.; Kong, J.; Lambrecht, D. S.; Liang, W.; Ochsenfeld, C.; Rassolov, V. A.; Slipchenko, L. V.; Subotnik, J. E.; Van Voorhis, T.; Herbert, J. M.; Krylov, A. I.; Gill, P. M. W.; Head-Gordon, M. Advances in Molecular Quantum Chemistry Contained in the QChem 4 Program Package. Mol. Phys. 2015, 113, 184−215. (47) Song, C.-I.; Rhee, Y. M. Development of Force Field Parameters for Oxyluciferin on Its Electronic Ground and Excited States. Int. J. Quantum Chem. 2011, 111, 4091−4105. (48) Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A WellBehaved Electrostatic Potential Based Method Using Charge Restraints for Deriving Atomic Charges: the RESP Model. J. Phys. Chem. 1993, 97, 10269−10280. (49) Kim, H. W.; Kelly, A.; Park, J. W.; Rhee, Y. M. All-Atom Semiclassical Dynamics Study of Quantum Coherence in Photosynthetic Fenna−Matthews−Olson Complex. J. Am. Chem. Soc. 2012, 134, 11640−11651. (50) Ceccarelli, M.; Procacci, P.; Marchi, M. An Ab Initio Force Field for the Cofactors of Bacterial Photosynthesis. J. Comput. Chem. 2003, 24, 129−142. (51) Ben-Shem, A.; Frolow, F.; Nelson, N. Evolution of Photosystem I − from Symmetry through Pseudosymmetry to Asymmetry. FEBS Lett. 2004, 564, 274−280. (52) Hornak, V.; Abel, R.; Okur, A.; Strockbine, B.; Roitberg, A.; Simmerling, C. Comparison of Multiple Amber Force Fields and Development of Improved Protein Backbone Parameters. Proteins: Struct., Funct., Genet. 2006, 65, 712−725. (53) Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: a High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845−854. (54) Berendsen, H. J. C.; Postma, J. P. M.; van Gunsteren, W. F.; DiNola, A.; Haak, J. R. Molecular Dynamics with Coupling to an External Bath. J. Chem. Phys. 1984, 81, 3684−3690. (55) Rät sep, M.; Cai, Z.-L.; Reimers, J. R.; Freiberg, A. Demonstration and Interpretation of Significant Asymmetry in the Low-Resolution and High-Resolution Qy Fluorescence and Absorption Spectra of Bacteriochlorophyll a. J. Chem. Phys. 2011, 134, 024506. (56) Liem, S. Y.; Brown, D.; Clarke, J. H. R. Molecular Dynamics Simulations on Distributed Memory Machines. Comput. Phys. Commun. 1991, 67, 261−267. (57) Bowers, K. J.; Dror, R. O.; Shaw, D. E. The Midpoint Method for Parallelization of Particle Simulations. J. Chem. Phys. 2006, 124, 184109. (58) Hess, B.; Kutzner, C.; van der Spoel, D.; Lindahl, E. GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 435−447. (59) Hess, B. P-LINCS: A Parallel Linear Constraint Solver for Molecular Simulation. J. Chem. Theory Comput. 2008, 4, 116−122. (60) Open MPI: Open Source High Performance Computing. https://www.open-mpi.org/ (accessed Sep 8, 2016). (61) Intel Turbo Boost Technology 2.0. http://www.intel.com/ content/www/us/en/architecture-and-technology/turbo-boost/turboboost-technology.html (accessed Sep 8, 2016). (62) Bussi, G.; Donadio, D.; Parrinello, M. Canonical Sampling through Velocity Rescaling. J. Chem. Phys. 2007, 126, 014101. K

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX

Article

Journal of Chemical Theory and Computation

Photosynthetic Antennae and Reaction Centers. J. Phys. Chem. B 2006, 110, 17268−17281. (83) Valkunas, L.; Abramavicius, D.; Mancal, T. Molecular Excitation Dynamics and Relaxation; Wiley-VCH: Weinheim, Germany, 2013.

(63) Parrinello, M.; Rahman, A. Polymorphic Transitions in Single Crystals: A New Molecular Dynamics Method. J. Appl. Phys. 1981, 52, 7182−7190. (64) Shim, S.; Rebentrost, P.; Valleau, S.; Aspuru-Guzik, A. Atomistic Study of the Long-Lived Quantum Coherences in the FennaMatthews-Olson Complex. Biophys. J. 2012, 102, 649−660. (65) Rätsep, M.; Freiberg, A. Electron−Phonon and Vibronic Couplings in the FMO Bacteriochlorophyll a Antenna Complex Studied by Difference Fluorescence Line Narrowing. J. Lumin. 2007, 127, 251−259. (66) Monahan, D. M.; Whaley-Mayda, L.; Ishizaki, A.; Fleming, G. R. Influence of Weak Vibrational-Electronic Couplings on 2D Electronic Spectra and Inter-Site Coherence in Weakly Coupled Photosynthetic Complexes. J. Chem. Phys. 2015, 143, 065101. (67) Nitzan, A. Chemical Dynamics in Condensed Phases; Oxford Graduate Texts: New York, 2006. (68) Kupka, H. J. Transitions in Molecular Systems; Wiley-VCH Verlag: Weinheim, Germany, 2010. (69) Park, J. W.; Rhee, Y. M. Towards the Realization of Ab Initio Dynamics at the Speed of Molecular Mechanics: Simulations with Interpolated Diabatic Hamiltonian. ChemPhysChem 2014, 15, 3183− 3193. (70) Adolphs, J.; Renger, T. How Proteins Trigger Excitation Energy Transfer in the FMO Complex of Green Sulfur Bacteria. Biophys. J. 2006, 91, 2778−2797. (71) Olbrich, C.; Jansen, T. L. C.; Liebers, J.; Aghtar, M.; Strümpfer, J.; Schulten, K.; Knoester, J.; Kleinekathöfer, U. From Atomistic Modeling to Excitation Transfer and Two-Dimensional Spectra of the FMO Light-Harvesting Complex. J. Phys. Chem. B 2011, 115, 8609− 8621. (72) Cembran, A.; Song, L.; Mo, Y.; Gao, J. Block-Localized Density Functional Theory (BLDFT), Diabatic Coupling, and Their Use in Valence Bond Theory for Representing Reactive Potential Energy Surfaces. J. Chem. Theory Comput. 2009, 5, 2702−2716. (73) Cembran, A.; Provorse, M. R.; Wang, C.; Wu, W.; Gao, J. The Third Dimension of a More O’Ferrall−Jencks Diagram for Hydrogen Atom Transfer in the Isoelectronic Hydrogen Exchange Reactions of (PhX)2H with X = O, NH, and CH2. J. Chem. Theory Comput. 2012, 8, 4347−4358. (74) Chan, W.-L.; Berkelbach, T. C.; Provorse, M. R.; Monahan, N. R.; Tritsch, J. R.; Hybertsen, M. S.; Reichman, D. R.; Gao, J.; Zhu, X. Y. The Quantum Coherent Mechanism for Singlet Fission: Experiment and Theory. Acc. Chem. Res. 2013, 46, 1321−1329. (75) Neugebauer, J. Couplings between Electronic Transitions in a Subsystem Formulation of Time-Dependent Density Functional Theory. J. Chem. Phys. 2007, 126, 134116. (76) König, C.; Neugebauer, J. Protein Effects on the Optical Spectrum of the Fenna−Matthews−Olson Complex from Fully Quantum Chemical Calculations. J. Chem. Theory Comput. 2013, 9, 1808−1820. (77) Sisto, A.; Glowacki, D. R.; Martínez, T. J. Ab Initio Nonadiabatic Dynamics of Multichromophore Complexes: A Scalable GraphicalProcessing-Unit-Accelerated Exciton Framework. Acc. Chem. Res. 2014, 47, 2857−2866. (78) Liu, J.; Herbert, J. M. An Efficient and Accurate Approximation to Time-Dependent Density Functional Theory for Systems of Weakly Coupled Monomers. J. Chem. Phys. 2015, 143, 034106. (79) Renger, T.; Muh, F. Understanding Photosynthetic LightHarvesting: a Bottom Up Theoretical Approach. Phys. Chem. Chem. Phys. 2013, 15, 3348−3371. (80) Chenu, A.; Scholes, G. D. Coherence in Energy Transfer and Photosynthesis. Annu. Rev. Phys. Chem. 2015, 66, 69−96. (81) Curutchet, C.; Mennucci, B. Quantum Chemical Studies of Light Harvesting. Chem. Rev. 2016, DOI: 10.1021/acs.chemrev.5b00700. (82) Madjet, M. E.; Abdurahman, A.; Renger, T. Intermolecular Coulomb Couplings from Ab Initio Electrostatic Potentials: Application to Optical Transitions of Strongly Coupled Pigments in L

DOI: 10.1021/acs.jctc.6b00647 J. Chem. Theory Comput. XXXX, XXX, XXX−XXX