Kinetic Transition Network Based on Trajectory Mapping - The Journal

Jul 23, 2010 - ... and the metastable states and interstate transition kinetics are correctly ... Proceedings of the National Academy of Sciences 2013...
0 downloads 0 Views 3MB Size
10266

J. Phys. Chem. B 2010, 114, 10266–10276

Kinetic Transition Network Based on Trajectory Mapping Linchen Gong†,‡ and Xin Zhou*,†,§ Asia Pacific Center for Theoretical Physics, Pohang, Gyeongbuk 790-784, Korea, Institute for AdVanced Study, Tsinghua UniVersity, Beijing 100080, China, and Department of Physics, Pohang UniVersity of Science and Technology, Pohang, Gyeongbuk 790-784, Korea ReceiVed: January 25, 2010; ReVised Manuscript ReceiVed: July 6, 2010

The understanding of biomolecular systems can be greatly improved by constructing conformational transition networks. Network representation honestly reflects the intrinsic complexity of high-dimensional systems, which facilitates the estimation of various thermodynamic and kinetic properties. In this paper, we introduce the trajectory mapping (TM) method for naturally detecting the metastable states of biomolecules and for subsequently constructing the hierarchical kinetic transition networks. In TM, multiple simulation trajectories are mapped to high-dimensional vectors, and the interrelation between these trajectory-mapped vectors is analyzed to locate metastable states. Kinetic information can be quantitatively extracted through simple algebraic manipulations of the identified metastable states. We apply the TM method to a toy model and alanine dodecapeptide. The polypeptide system is analyzed at two time scales, and the metastable states and interstate transition kinetics are correctly revealed. Introduction The rapidly increasing computational power enables the routine use of various simulation schemes to study complex systems. In particular, the atomic details of biomolecules, which are not always accessible in experiments, could be thoroughly investigated by the molecular dynamics (MD) method.1 To better understand the massive high-dimensional simulation data generated by MD, there has been a lot of attention focused on developing rational methods for data analysis in recent years. For biomolecular systems, there are metastable regions in conformational space (also referred to as metastable states in the following). Once a simulation trajectory enters a metastable state, it may be trapped there for a while before switching fast to another state. Therefore, a coarse-grained picture of the simulation system could be established by identifying the metastable states and constructing the transition network that characterizes the transition kinetics between metastable states. Traditionally, to detect metastable states, the high-dimensional simulation data are projected into low-dimensional spaces spanned by a few (usually one or two) order parameters.2 After reproducing the low-dimensional free energy surface, the free energy wells are visually identified as metastable states. However, the hidden complexity of biomolecules is often ignored by the low-dimensional projection method.3,4 In the projected space, different metastable states and transition paths may overlap, leading to artificial states and distorted kinetics. Recently, some dimension reduction techniques5-10 have been designed and applied to biomolecules. These methods provide better characterization of the essential degrees of freedom, and make the low-dimensional projection more reliable. However, even with advanced dimension reduction techniques, there is no guarantee that a complete and visually perceivable low* To whom correspondence should be addressed. E-mail: xinzhou71@ gmail.com. † Asia Pacific Center for Theoretical Physics. ‡ Tsinghua University. § Pohang University of Science and Technology.

dimensional projection of high-dimensional simulation data can always be obtained.11 In the past two decades, there have been lots of attempts to identify the metastable states without low-dimensional projection. In particular, the Markov state model (MSM) has become a popular way for constructing kinetic transition networks.4 In the MSM, the simulation conformations are first grouped into lots of microstates.12-15 Usually enough (usually thousands of) microstates should be generated to ensure that the transition dynamics between these states is compatible with the discretetime discrete-state Markov process. Subsequently, a transition matrix of the microstates is established by counting the transition events in simulation data. The transition matrix can be analyzed with various methods, which enables the microstates to be further grouped into metastable states at different time scales.12-20 The kinetic relation between the metastable states can also be reconstructed accordingly.19,21 In this paper, we present a new method for detecting metastable states and constructing transition networks. While the MSM is a bottom-up method which reproduces the whole picture of complex systems by first defining lots of microstates, a top-down strategy is employed in our method. The top-down viewpoint is based on the fact that the number of metastable states of a biomolecular system is often determined by the observation time scale. Short-term measurement usually results in the discovery of many small metastable states with short lifetimes. When the observation time interval becomes longer, the simulation trajectory between two observations may visit several states and reach equilibrium in these states; consequently, small metastable states may merge into larger states with longer lifetimes. Broadly speaking, temporal coarse-graining will reduce the number of observable metastable states,13 and the different state structures at different time scales reflect the hierarchical organization of conformational space. Therefore, we tend to first locate the major metastable states with long lifetimes in the current method. After that, we can tune down the time scale in each metastable state to search for more detailed structure.

10.1021/jp100737g  2010 American Chemical Society Published on Web 07/23/2010

Kinetic Transition Network Based on Trajectory Mapping To implement the top-down strategy, we define the metastable states by clustering simulation trajectory segments. In practice, we first collect trajectory segments of similar lengths by truncating a long MD trajectory or spawning short independent MD trajectories. Then, these trajectory segments are mapped to high-dimensional vectors using a series of conformational functions (also referred to as basis functions in the following). Here, for each trajectory segment, the average values of basis functions are calculated and used to construct the trajectorymapped vector. The subsequent clustering analysis will be performed in the trajectory-mapped vector space. We call the new method trajectory mapping (TM) in the following according to its implementation procedure. The clustering idea in TM can be traced back to the studies that focus on grouping concentrated simulation conformations with standard clustering methods.22,23 Since in conformational space the regions of high sample density usually correspond to the metastable states, it is plausible to identify the metastable states by clustering individual simulation conformations. However, some practical difficulties always exist in this tactic. First of all, the metastable states of a high-dimensional complex system usually have various shapes and sizes, making it hard to find a proper standard for clustering. Besides, no matter clustering individual conformations or implementing dimension reduction techniques, the dynamic relation between simulation conformations is routinely ignored. Consequently, some conformations that are spatially close but kinetically disconnected may be grouped together, which often results in a misunderstanding for hierarchical systems.24 Finally, clustering lots of individual conformations could be very time-consuming.8 Nonetheless, the above-mentioned problems can be alleviated by clustering trajectory segments instead of individual conformations. In TM, the basis-function-averaging procedure depresses the thermal fluctuation of a simulation system, and makes it easier and more meaningful to cluster the simulation conformations in trajectory-mapped vector space. In a nutshell, if several trajectory segments stay in the same metastable state and their lengths are longer than the intrastate equilibration time scale, the conformational distributions of these trajectory segments should converge to the local equilibrium distribution, and the resultant trajectory-mapped vectors should be almost the same. Therefore, the simulation conformations in these trajectory segments can be grouped together straightforwardly in vector space. Because the averaging procedure naturally incorporates the dynamic information, this group of conformations can be defined as a kinetically meaningful metastable state. In other words, the interconformation difference that reflects the thermal fluctuation of a system can be transformed to the much smaller intertrajectory difference by threading individual conformations into trajectory segments and comparing the distributions of trajectory segments. Here, the distribution of a trajectory segment is characterized by averaging basis functions on this distribution. According to the central limit theorem, suppose each trajectory segment includes M independent conformations, the thermal fluctuation in a metastable state will be reduced by a factor of M in the trajectory-mapped vector space. In TM, we also carefully calibrate the correlation between basis functions; hence, the simulation trajectories can be converted into interstate transition curves that clearly show the transition events between metastable states. Kinetic information can be quantitatively extracted from these curves.21 Considering the correlation between basis functions also allows us to measure the quality of the observed metastable states. To study the

J. Phys. Chem. B, Vol. 114, No. 32, 2010 10267 transition networks at different time scales, we can simply truncate the trajectories into different lengths. The paper is organized as follows. We first present the detailed formalism of TM. After that, this method is illustrated with a simple model system, and then applied to analyze alaninedodecapeptide (Ala12). Finally, a short conclusion is given. Some details of analysis and simulation as well as some supplementary results are listed in the Supporting Information. Theory In this section, the detailed formalism of the trajectory mapping (TM) method will be presented. We first list the procedures in TM and then explain the physical meaning of trajectory-mapped vectors. Finally, we discuss how to extract kinetic information from the simulation trajectories. Trajectory Mapping. We consider Nt trajectory segments b)}µ)1,...,Nb {ti}i)1,...,Nt, each of length τ. Nb basis functions {Aµ(q are selected to map the trajectory segments into vector space, where b q denotes the conformational coordinates of the system. With properly selected basis functions, the trajectory-mapped vectors will be able to reflect the relation between trajectories and metastable states. To characterize the difference between trajectory segments (or say the metastable states), a reference set of conformations is constructed. The distribution of the reference conformations should be a linear combination of the local equilibrium distributions (i.e., probability densities in conformational space) of metastable states. As a practical choice, the collection of all the simulation conformations could be taken as the reference set. After identifying the metastable states, a more rigorous reference set can be established with the conformations in the metastable states. For the model systems in this paper, there is no prominent difference between the two reference sets (see the Supporting Information for more details). In the following, the normalized conformational distribution of b). To eliminate the the reference set will be denoted as Pref(q arbitrariness in the selection of basis functions, we orthogonalize these functions with the reference conformation set. Following the standard orthogonalization procedure (see supporting derivation for more details), the original set of basis functions is b)}µ)1,..., Nb which satisfies transformed into a new set {A˜µ(q

〈A˜µ(q b)A˜ν(q b)〉ref ) δµν,

〈A˜µ(q b)〉ref ) 0

(1)

Here, δµν is the Kronecker δ symbol. 〈 〉ref denotes the average b). Subsequently, a trajectory segment ti over distribution Pref(q could be mapped to a vector b Vi by

b V i ) (〈A˜1(q b)〉i, 〈A˜2(q b)〉i, ... , 〈A˜Nb(q b)〉i)T

(2)

Here, the superscript T denotes the transposition of a vector. For a conformational function B(q b), the average operation, 〈 〉i, is defined as

〈B(q b)〉i )

∫ B(qb)Pi(qb) dqb ) 1τ ∫0τ B(qbi(t)) dt

(3)

where Pi(q b) denotes the normalized distribution function of the conformations in the ith trajectory (or trajectory segment) and b qi(t) denotes the conformation at time point t in the ith trajectory. Actually, using eq 2, it is possible to map any conformational distribution into the vector space.

10268

J. Phys. Chem. B, Vol. 114, No. 32, 2010

Gong and Zhou

The trajectory averaging procedure directly leads to a simple topological structure of the trajectory-mapped vectors. As mentioned in the Introduction, when τ is not so short, two trajectory segments staying in the same metastable state will be mapped to almost the same vector. As a result, the highdimensional local equilibrium distribution of a metastable state can be approximately reduced to a (zero-dimensional) point in the vector space. Now we consider a transition trajectory segment tRβ which visits two states SR and Sβ; its conformational b) distribution and trajectory-mapped vector are denoted as PRβ(q and b VRβ, respectively. Due to the metastability of SR and Sβ, it b) as the linear combination is reasonable to approximate PRβ(q b) and Pβs (q b), of of the two local equilibrium distributions, PsR(q s s b) ≈ aPR(q b) + (1 - a)Pβ(q b). Here, a is a SR and Sβ, i.e., PRβ(q real number within [0.0, 1.0], which represents the length fraction of tRβ in SR. Consequently, we get the following relation

b V Rβ ≈ aV bsR + (1 - a)V bβs

(4)

Vsβ are mapped from PsR(q b) and Psβ(q b), respectively. where b VsR and b Equation 4 implies that, in the vector space, a point mapped from a transition trajectory between two metastable states should lie in the straight line defined by the vectors mapped from the nontransition trajectories in the two states. This conclusion could be easily generalized to trajectories visiting multiple states. Consequently, the high-dimensional distribution of all the simulation samples will be transformed into vectors distributed on or inside a polygon in the vector space. The vectors located at the vertices of the polygon correspond to the nontransition trajectories in metastable states, and the ones located in the lowdimensional subspaces defined by the vertices, such as the lines, triangles, and tetrahedrons, correspond to the transition trajectories between a few metastable states. In analysis, longer trajectories correspond to less vertices and a more coarse-grained picture of the system; shorter trajectory segments correspond to more vertices and more detailed knowledge of the conformational space. The trajectory length τ also determines the quality of the polygon. When τ is too short, thermal fluctuation will finally blur the shape and boundary of the polygon, indicating that the metastable assumption is no longer valid under the selected too short time scale. Given the polygon in vector space, the metastable states could be identified by selecting the vectors on the vertices (i.e., the nontransition trajectory segments). This task could be solved by hand or by clustering methods in general. Subsequently, the local equilibrium distributions of these metastable states could be reconstructed by the nontransition trajectories inside.25 The representative vector of a metastable state can be obtained by transforming the local equilibrium distribution using eq 2. It is worth noting that, when applying clustering methods in vector space, some vertices that are geometrically very close to each other may be grouped together. To solve this problem, we can establish a local reference conformation set with the simulation conformations only in these closely located vertices. When analyzing with the local reference, the local region will be zoomed up with its detailed structure revealed. In practice, after clustering the trajectory segments with the global reference conformation set which involves all of the simulation conformations, we can remap the trajectory segments in each state with the local reference of the state (i.e., only the conformations inside the state are applied as the reference set). Thus, the detailed structure in each state (if it exists) can be found by clustering algorithms. This process will be illustrated in the analysis of alanine-dodecapeptide.

Although orthonormal basis functions are not necessary for the simple topology in the trajectory-mapped vector space, the orthogonalization procedure indeed facilitates the subsequent analysis in two aspects. (1) The dimensions of the basis functions will be scaled off. (2) The trajectory-mapped vectors will have clear physical meaning as the next subsection will show. In our previous work, a similar orthogonalization procedure has been used for reproducing the equilibrium distribution with multiple simulation trajectories.25 The Physical Meaning of Trajectory-Mapped Vectors. In this subsection, we first discuss how to estimate the ratio between two conformational distribution functions with the selected basis functions, which lays the foundation for the physical meaning of trajectory-mapped vectors. b)/Pref(q b), i ) 1, ... , Similar to Fourier expansion, the item Pi(q Nt, could be expanded by the basis functions25 as

Pi(q b) )1+ Pref(q b)

∑ 〈A˜µ(qb)〉iA˜µ(qb)

(5)

µ

b)/Pref(q b))A˜µ(q b)〉ref ) 〈A˜µ(q b)〉i and the where the identity 〈(Pi(q b)}µ)1,...,Nb have been used. Equation 5 is orthonormality of {A˜µ(q always valid under various coordinate transformations. For example, we consider a set of coarse-grained coordinates b x, which can be expressed as b x ) X(q b). Suppose only basis functions of b x are included in the analysis. Then, in analogy to eq 5, we have

j i(x P b) )1+ j ref(x P b)

∑ [ ∫ A˜µ(xb)Pj i(xb) dxb]A˜µ(xb)

(6)

µ

Here, for any conformational distribution function P(q b), we define the conformational distribution of the coarse-grained coordinates b x as

j (x P b) )

∫ P(qb)δ(X(qb) - bx ) dqb

(7)

Combining eqs 6 and 7, we get

j i(x P b) )1+ j Pref(x b)

∑ 〈A˜µ(X(qb))〉iA˜µ(X(qb))

(8)

µ

Equation 8 implies that, by selecting sufficient basis functions of coarse-grained coordinates, the right side of eq 5 is equal to the ratio between two distribution functions of the coarse-grained coordinates. Consequently, for complex systems, we can only include the functions of important coordinates (such as the backbone dihedral angles of a protein) in analysis; then, a reasonable number of basis functions should already be enough for a good estimation of eq 5. Subsequently, we define the inner product between two trajectory segments as

〈i|j〉 )



Pi(q b)Pj(q b) dq b, Pref(q b)

i, j ) 1, 2, ... , Nt

(9)

Given the feasibility of eq 5 in practice, eq 9 can be easily estimated by the trajectory-mapped vectors as

Kinetic Transition Network Based on Trajectory Mapping

∑ 〈A˜µ(qb)〉i〈A˜µ(qb)〉j ) 1 + bV i · bV j

〈i|j〉 ) 1 +

(10)

µ

From the definition, the inner product between two nontransition trajectories in different metastable states should be almost zero. Similarly, we can define the inner product between two metastable states SR and Sβ as

〈SR |Sβ〉 ) 1 + b V sR · b V βs

(11)

where b VsR and b Vβs are the representative vectors of SR and Sβ, respectively. Equation 11 vanishes for two different metastable states, since their distributions do not overlap. This property also provides a handy way for measuring the quality of the observed metastable states. Given Ns identified metastable states {Sγ}γ)1,...,Ns, a simulation trajectory ti can be transformed into Ns state-indicator curves, {fiR(t)}R)1,...,Ns, as shown below

1 fiR(t) ) 〈SR |SR〉



Pit,∆t(q b)PsR(q b) V sR 1+b V it,∆t · b dq b) Pref(q b) 1+b V sR · b V sR (12)

Here, Pt,∆t b) denotes the conformational distribution for the part i (q of the ith trajectory within interval [t - ∆t, t + ∆t] and the b) is denoted as b Vt,∆t vector corresponding to Pt,∆t i (q i . If ∆t ) 0, only the individual conformation qi(t) (the conformation at time t of the ith trajectory) is considered in calculation. Using finite ∆t, the statistical noise in the state-indicator curves could be depressed. Ideally, for trajectory i, metastable state SR, and time t, fiR(t) should be either zero or unity while ∆t f 0. If the trajectory is in SR around t, fiR(t) ≈ 1; otherwise, fiR(t) should almost vanish. Therefore, the transition events between metastable states can be identified from the state-indicator curves. Extracting Kinetic Information from Simulation Trajectories. Given the state-indicator curves {fiR(t)}, various kinetic information can be extracted by directly identifying the transition events in simulation trajectories. Here, we mainly focus on the transition matrix R(t) and the transition rate matrix K for Ns metastable states. The transition matrix R(t) is an Ns × Ns dimensional matrix with its elements defined as

RβR(t) ) P(β, t|R, 0), R, β ) 1, ... , Ns

(13)

i.e., the conditional probability to find a trajectory in Sβ at time t when it is in SR at time zero. On the basis of Bayesian analysis,26 the elements of R(t) are estimated by

RβR(t) )

uβR(t) wR(t)

(14)

Here, uβR(t) ) nβR(t) + 1 and nβR(t) is the number of transitions from SR to Sβ when the observation time interval is t. wR(t) ) ∑β uβR(t). The uniform Dirichlet distribution is taken as the prior probability in estimation. The uncertainty of transition probability can also be estimated with the Bayesian method. The standard deviation for the estimation of element RβR(t) is

J. Phys. Chem. B, Vol. 114, No. 32, 2010 10269

∆RβR(t) )

uβR(t)[wR(t) - uβR(t)] wR(t)2[wR(t) + 1]

(15)

In the above estimations, the key quantity nβR(t) could be directly derived from the state-indicator curves. In practice, we use the following formalism to estimate nβR(t).

nβR(t) )

∑ ∑ h[fiβ(t0 + t)]h[fiR(t0)] i

(16)

t0

Here, h[fjγ(t′)] is a rounding function. If the jth trajectory is in Sγ at t′ (i.e., fjγ(t′) ≈ 1), the rounding function equals 1; otherwise, it equals zero. Ideally, we can directly use fjγ(t′) in eq 16 instead of h[fjγ(t′)]. However, considering the deviation of real systems from perfect metastability and the computational errors, we introduce the rounding function to ensure that the obtained RβR(t) is physically meaningful [i.e., to ensure ∑β RβR(t) ≡ 1 and RβR(t) g 0]. Further details for constructing R(t) and the discussion on the rounding function are presented in the Supporting Information. The matrix R(t) contains all of the kinetic information of the system. Once R(t) is obtained, it is reasonable to ask whether the transition between metastable states can be approximated as the Markov process.21 If the approximation is feasible, then a matrix R(t) with relatively small t could be used to temporally extend the simulation to longer time scales. A necessary condition for such extension is that the R(t) matrix at different t values should be consistent with each other, which means that the following relation

R(nt) ) R(t)n

(17)

should be satisfied for any integer n. We denote the eigenvalues of R(t) as λi(t), i ) 1, ... , Ns. Then, eq 17 implies that log[λi(t)] should be linearly related to t. Similarly, log[λi(t)]/t should be a horizontal line when t is large enough.13 We will examine these properties to check if real physical processes are Markovian. Furthermore, suppose the kinetics between metastable states is compatible with the continuous-time discrete-state Markov process, then the probability evolution of the system can be described by the following master equation

b(t) dP b(t) ) KP dt

(18)

where b P(t) is a vector recording the probability distribution over the metastable states at time t. K is the transition rate matrix. The nondiagonal element kβR (β * R) of K corresponds to the transition rate from state SR to state Sβ, and the diagonal element kRR is defined as kRR ) -∑β*R kβR. The lifetime of a state SR is estimated by -1/kRR. To estimate the transition rate matrix K, we analyze the stateindicator curves to collect the following information in simulation trajectories. (1) Every transition event has a waiting time. The waiting times for transitions from SR to Sβ are denoted by i trans trans , i ) 1, ... , NβR tβR , where NβR denotes the total number of such transition events. (2) At the end of each trajectory, there exists a short piece of trajectory after the final transition event. The lengths of these short trajectory pieces in certain state SR stay are denoted by τiR, i ) 1, ... , Nstay R , where NR denotes the total number of such short pieces. On the basis of this information,

10270

J. Phys. Chem. B, Vol. 114, No. 32, 2010

Gong and Zhou

the likelihood function of the simulation trajectories can be written as19 i L({tβR }, {τγj }|K) )

∏ PβRtrans(tβRi ) ∏ Pγstay(τγj ) R,β,i

γ,j

(19) trans Here, PβR (t) is the t time probability density for the transition from SR to Sβ if the system is initially in SR. Pγstay(t) ) 1 trans (τ) dτ is the probability that the transition from Sγ ∫0t ∑β*γ Pβγ trans to other states happens later than t. According to eq 18, PβR (t) ) kβR exp[kRRt]. The likelihood function eq 19 could also be analyzed with the Bayesian method. Thus, both the estimation and the uncertainty of the kinetic rates could be obtained. The kinetic rates are estimated by

kβR )

trans NβR +1



i tγR +

γ*R,i

kRR ) -

∑ τjR

,

β*R

(20)

j

∑ kβR

(21)

β*R

The current estimation is spiritually similar to the one implemented in Sriraman et al.’s work.27 However, the original likelihood function in Sriraman et al.’s work is substituted by its continuous-time limit, eq 19. Besides, we do not enforce the detailed balance condition; thus, all of the nondiagonal kinetic rates, kβR, β * R, are taken as free parameters. In doing so, the kinetic rates and their uncertainty can be easily estimated with an analytical formula. It should be noted that the kinetic rates estimated by eqs 20 and 21 are only used to provide qualitative intuition on the transition network and the lifetime of metastable states. The kinetics between metastable states is deemed to be better characterized by the R(t) matrix. To more accurately estimate K, the method in Sriraman et al.’s work could be used; then, Monte Carlo simulation is necessary for sampling the distribution of kinetic rates.27 The implementation details for estimating K are listed in the Supporting Information. Results and Discussion In this section, the trajectory mapping (TM) method is first illustrated with a toy model. After that, it is applied to the more complicated system of Ala12. Toy Model: Two-Dimension-Four-State Potential. The energy landscape of the toy model is designed as a twodimensional potential with four degenerate potential wells (see Figure 1a). 900 simulation trajectories are spawned from uniformly distributed initial positions. The simulation length τ for each trajectory is set to 100 (dimensionless time units), which is comparable to the mean first passage time between neighboring states. The simulation samples are recorded every 0.01 time length. In analysis, the following two-dimensional trigonometrical functions are selected as basis functions.

sin[(m + n)x], cos[(m + n)x], sin[(m + n)y], cos[(m + n)y], m+n>0 sin(mx) sin(ny), sin(mx) cos(ny), cos(mx) sin(ny), cos(mx) cos(ny), m g 1, n g 1

(22)

Figure 1. TM analysis of the toy model. (a) The energetic iso-contour map of the two-dimension-four-state toy potential (gray lines). Blue dots represent the simulation samples. (b) The largest ten eigenvalues in the PCA analysis of trajectory-mapped vectors. (c) The projection of trajectory-mapped vectors into the space spanned by the first three PCs (PCs with the largest three eigenvalues). Each red circle represents a trajectory-mapped vector. The vertices are labeled with corresponding state names.

Here, x and y are the two-dimensional Cartesian coordinates multiplied with transformation factor π/2. m and n are nonnegative integers. We define the summation of m and n in eq 22 as the order of these functions and use the one- to fourorder functions (a total of 40 basis functions) in analysis. Hence, the trajectories are mapped to 40-dimensional vectors. To provide an intuitionistic picture, we first analyze the trajectory-mapped vectors by the principle component analysis method (PCA).5 PCA is usually used to characterize the most variable directions [or say principle components (PCs)] for a bi}, set of data vectors. In PCA, given Nvdv-dimensional vectors {x i ) 1, ... , Nv, a dv × dv-dimensional covariance matrix Σ is first established by

Σ)

1 Nv

∑ (xbi - 〈xb〉)(xbi - 〈xb〉)T

(23)

i

where 〈x b〉 ) (1/Nv)∑i b xi is the average vector. Σ is a positive semidefinite matrix; its eigenvectors and eigenvalues correspond to the PCs and the variation along the PCs, respectively. A lowdimensional representation of the data can be obtained by projecting the vectors onto a few PCs with the largest eigenvalues. As shown in Figure 1b, the 40-dimensional trajectory-mapped vectors can be reduced to three-dimensional vectors by PCA

Kinetic Transition Network Based on Trajectory Mapping

J. Phys. Chem. B, Vol. 114, No. 32, 2010 10271 transition curve could be obtained (see Figure 2a, lower panel). All of the transition events along the trajectory can be clearly identified from the interstate transition curve. After identifying all of the metastable states, we will investigate whether the kinetics of the system could be coarsegrained as a continuous-time discrete-state Markov process in the following. If the answer is positive, then due to the symmetry of the system, the following rate matrix should be enough for predicting the long time behavior of the system.

(

-2k0 k0 K) k0 0

Figure 2. Kinetic information of the toy model. (a) The state-indicator curves for a simulation trajectory that visits all four states are plotted in the upper panel, and the corresponding interstate transition curve is plotted in the lower panel. In the upper panel, the gray curves are calculated with ∆t ) 0 in eq 12. They are subsequently divided into 250 segments and averaged in each segment, leading to the smoother red dashed curves. (b) The diagonal elements of matrix R(t) are plotted in the upper panel, and the nondiagonal elements of matrix R(t) are plotted in the middle panel. In the upper and middle panel, the red solid lines correspond to the theoretical prediction of transition state theory, and the dashed lines are estimated with simulation data. -log[λi(t)]/t,i ) 1, ... ,4 are plotted versus t in the lower panel. Here, λi(t) are the eigenvalues of R(t). Since λ0(t) ) 1 is ensured by the construction of R(t), no error bar is assigned to λ0(t). See the Supporting Information for more details of error estimation.

analysis. By projecting the trajectory-mapped vectors onto the first three PCs, an embedded three-dimensional space is obtained, where each point represents a simulation trajectory, and all the points together define a tetrahedron (see Figure 1c). As expected, the points located at the four vertices of the tetrahedron correspond to the nontransition trajectories in the four states. The other points located inside the lines, planes, or tetrahedron defined by the vertices correspond to the trajectories that visit two, three, and all four states within τ, respectively. Within an individual trajectory length τ, most of the transition trajectories only visit two states, which leads to the conspicuous line structures in Figure 1c. Consistent with the topology of the potential function, each vertex is only connected to two others by straight lines (the simulation trajectories usually cannot climb over the global potential maximum under the selected simulation temperature). When τ is too short, or the dynamics of the system shows prominent diffusive behavior, there may be outliers besides the two kinds of points introduced above. Since the toy model possesses good metastability, outlier points hardly exist in Figure 1c. Taking a trajectory that visits all four states as an example, the state-indicator curves are shown in Figure 2a, upper panel. Here, the representative vector for a metastable state is obtained by arbitrarily selecting one trajectory-mapped vector at the corresponding vertex. The curves are calculated with relatively short ∆t[(1/250)τ] in eq 12, and are approximately jumping between 0 and 1 as expected. Therefore, a related interstate

k0 -2k0 0 k0

k0 0 -2k0 k0

0 k0 k0 -2k0

)

Here, k0 is a constant, which is estimated as 5.2 × 10-3 by transition state theory.28 Given K, the transition matrix R(t) could be written as R(t) ) exp(Kt). The four eigenvalues of K are 0, -2k0, -2k0, and -4k0 (where -2k0 is 2-fold degenerate). Correspondingly, the eigenvalues of R(t) should be λ0(t) ) 1, λ1(t) ) exp(-2k0t), λ2(t) ) exp(-2k0t), and λ(t) ) exp(-4k0t). To test whether the model system satisfies the predicted behavior, we first estimate matrix R(t) at different t with the state-indicator curves. As shown in Figure 2b, the elements of R(t) could be classified into three groups. The diagonal elements decay from 1 with increasing t (Figure 2b, upper panel). Four nondiagonal elements increase slowly from zero, with their initial temporal gradients close to zero (see the blue dashed curves in Figure 2b, middle panel). These four elements correspond to the transition probabilities between non-neighboring states, such as R14(t) and R41(t). The other nondiagonal elements grow faster with time (see the black-dashed curves in Figure 2b, middle panel). They represent the transition probabilities between neighboring states. In accord with the symmetry of the model potential, the curves in the same group almost overlap with each other. Besides, the estimated transition probability curves are quantitatively consistent with the theoretical prediction from the Markov model (see the red solid curves in Figure 2b, upper and middle panels) within statistical error. Subsequently, we calculate the eigenvalues of R(t) [λi(t), i ) 1, ... , 4] and plot -log[λi(t)]/t versus t in Figure 2b, lower panel. As a result, we get four almost horizontal lines. For a Markov process generated by K [or R(t) ) exp(Kt)], -log[λi(t)]/ t, i ) 1, ... , 4, should be constant with values 0, 2k0, 2k0, and 4k0. The estimated eigenvalues are quantitatively consistent with the theoretical results. The difference between TM results and theoretical results becomes slightly larger when t approaches τ, which may be due to the poorer statistics of R(t) at larger t (see the larger error bars there). In summary, the metastable states of the toy model are correctly identified by the TM method. The reproduced kinetic properties are also consistent with the theoretical description based on the coarse-grained transition rate matrix K. Alanine-Dodecapeptide. In this section, we study alaninedodecapeptide (Ala12). This molecule consists of 12 alanine residues. Charged termini are used in simulation, which leads to versatile metastable structures.13 Two important free energy wells are usually thought to exist in the conformational space of Ala12.29,30 One is composed of β-hairpin/coil conformations with short end-end distance and large conformational entropy; lots of metastable states are found in this region. The other is composed of R-helix-like conformations with long end-end distance; less metastable states are found in this free energy

10272

J. Phys. Chem. B, Vol. 114, No. 32, 2010

well. The β-hairpin/coil conformations dominate in polar solution. In the current study, 1000 20 ns-length trajectories are generated from different initial conformations. The simulation is performed with the TINKER4.2 simulation package using the OPLSUA force field and the GB/SA implicit solvent model.31 The simulation conformations are recorded every 0.5 ps. To analyze Ala12, we select the one- to two-order twodimensional trigonometrical functions (see eq 22) of backbone dihedral angles φ and ψ as basis functions. Here, φ is defined as the backbone dihedral angle around the bond connecting CR and N atoms and ψ is defined as the backbone dihedral angle around the bond connecting CR and carbonyl carbon atoms. There are 22 φ or ψ angles in Ala12. These angles fully account for the backbone flexibility of this molecule. In this study, only the correlation between sequentially neighboring dihedral angles is modeled by the basis functions. Therefore, 172 basis functions are finally included in the analysis. 88 of them are functions of single dihedral angles, and the remaining 84 are functions of neighboring dihedral angles. According to our previous study of alanine dipeptide,25,32 the one- to two-order basis functions are already enough for a good estimation of eq 5. Besides, although it is possible to consider other degrees of freedom (such as interatomic distances) in analysis and similar results could be obtained,32 backbone dihedral angles should be the most natural choice for characterizing peptide conformation.33 After basis function selection, the trajectories are first analyzed at the O(10 ns) time scale and 15 metastable states are found (the lengths of trajectory segments at this time scale are comparable to the simulation length of each trajectory; see the following and the Supporting Information for more details). Among all of the metastable states, only one is composed of R-helix-like conformations (see Supporting Figure 2a of the Supporting Information), which is consistent with previous research. Since the simulation length of each trajectory (20 ns) is relatively short for the transitions between metastable states to happen, we nearly cannot find any transition trajectories. Hence, the following analysis will focus on the detailed structure at the time scales shorter than 10 ns. Using the trajectory mapping (TM) method, we analyze the internal structure of the two most populated states found at the O(10 ns) time scale. One of the two states has four substates and could be closely compared with the toy model. The other has more substates and a funnel-like free energy landscape. Both of the two states are composed of β-hairpin/coil conformations. Transition Network with Four States. For the first state at the O(10 ns) time scale, we truncate the long trajectory segments into pieces of 1 ns length. Then, the trajectory-mapped vectors are analyzed with the principle component analysis (PCA) method (see the analysis of the toy model). PCA shows that these vectors can be approximately embedded into threedimensional space (see Figure 3b). Therefore, we project the vectors onto the first three PCs in Figure 3a. In analogy to the toy model, it could be seen that there are four vertices corresponding to four substates. The three-dimensional points constitute three straight lines that respectively connect state S1 with the other states. The transition network for the four substates is plotted in Figure 3c, where the characteristic conformations, the main transition pathways, and the lifetime of each state (see below) are shown. Obviously, S1 acts as the hub of this local transition network. To analyze the coarse-grained dynamics of the system, we arbitrarily select one vector at each vertex in Figure 3a to be the representative vector of the corresponding metastable state.

Gong and Zhou

Figure 3. TM analysis of the four-state transition network of Ala12. (a) The projection of trajectory-mapped vectors into the space spanned by the first three PCs. Each red circle represents a vector. (b) The largest 25 eigenvalues in the PCA analysis of the trajectory-mapped vectors. (c) The typical conformations in the metastable states and their transition relation. The transition relation is implied from the estimated kinetic rates (see Table 1). We only illustrate the kinetic rates (transitions) with acceptable estimation error. The lifetimes of the states together with the estimated uncertainties are also shown. (d) The orthogonality test of the metastable states. The height of each column is defined by the scaled inner product 〈SR|Sβ〉 between the corresponding two states SR and Sβ (see eq 24).

TABLE 1: Kinetic Rates of Four-State Transition Network (Units: ns-1)a reaction

forward rate

reverse rate

S2 r S1 S3 r S1 S4 r S1

1.0(1) × 10-1 9(1) × 10-2 3.3(8) × 10-2

1.4(2) 5.2(7) × 10-1 4(1) × 10-1

a The uncertainty of the last digit of estimation is given in parentheses. Here, standard deviation is taken as the measure of uncertainty. The items in the reaction column indicate the direction of the rates in the forward rate column, and the rates in the reverse rate column take the opposite direction. Only the rates with acceptable estimation error are shown in the table. For a rate with estimation value k and standard deviation σ, it is listed only if zero is not in the interval [k - 2.58σ, k + 2.58σ] (for normal distribution, this interval corresponds to 99% confidence interval).

The representative vectors show good orthogonality between each other, as shown in Figure 3d (see also eq 11). In Figure 3d, the inner product between any two states is approximately scaled to [0, 1] by defining

〈SR |Sβ〉 )

〈SR |Sβ〉

√〈SR |SR〉〈Sβ |Sβ〉

(24)

Kinetic Transition Network Based on Trajectory Mapping

Figure 4. Kinetic information of the four-state transition network of Ala12. (a) The state-indicator curves of a simulation trajectory and the corresponding interstate transition curve are plotted, respectively, in the upper and lower panels. In the upper panel, the gray curves are calculated with ∆t ) 0 in eq 12. They are subsequently divided into 500 segments and averaged in each segment, leading to the smoother red dashed curves. (b) The diagonal and nondiagonal elements of matrix R(t) versus t are plotted, respectively, in the upper and middle panels. In the middle panel, only transition probabilities related to S1 are plotted, where the black dashed lines show the transition probabilities from other states to S1 and the blue dashed lines show the transition probabilities from S1 to other states [from bottom to top are the curves of R21(t), R41(t), and R31(t)]. log[λi(t)] are plotted versus t in the lower panel. λi(t),i ) 1, ... ,4, are the eigenvalues of R(t).

The state-indicator curves of a 20 ns trajectory that visits all four states are shown in Figure 4a, upper panel. From these curves, we can again identify the transition events, as shown in Figure 4a, lower panel. The transition probabilities between S1 and the other states are shown in Figure 4b, upper and middle panel. Theoretically, we have

lim P(β, t|R, 0) ) tf∞

∫S Peq(qb) dqb β

(25)

i.e., any element RβR(t) will approach the equilibrium population in Sβ with increasing t. For example, R11(t), R12(t), and R13(t) have almost reached the same level within 5 ns. Although R14(t) also increases fast at the beginning, it shows prominent fluctuation after 1 ns. This is because there are only a limited number of trajectories starting from S4 in the current simulation; hence, the statistics of R14(t) becomes less reliable at larger t. Further refinement of the current results could be obtained by spawning new trajectories from the conformations in S4. In the lower panel of Figure 4b, we show the eigenvalues of R(t) [λi(t), i ) 1, ... , 4] in logarithm form. In accord with the prediction of the Markov model, the curves of log[λi(t)] versus t closely resemble straight lines. However, we also notice that the nonlinearity of log[λ2(t)] versus t is more prominent than the others. Comparing the behavior of log[λ2(t)] with R14(t) and R44(t), we suggest that the abnormality of log[λ2(t)] should be

J. Phys. Chem. B, Vol. 114, No. 32, 2010 10273

Figure 5. TM analysis of the six-state transition network of Ala12. (a) The projection of trajectory-mapped vectors into the space spanned by the first three PCs. Each red circle represents a vector. (b) The largest 25 eigenvalues in the PCA analysis of the trajectory-mapped vectors. (C) The typical conformations in the metastable states and their transition relation. The transition relation is implied from the estimated kinetic rates (see Tables 2 and 3). We only illustrate the kinetic rates (transitions) with acceptable estimation error. (d) The orthogonality test of the metastable states.

attributed to the poor statistics at larger t instead of the deviation of the system kinetics from the Markov process. Actually, except for the elements related to S4, the long time behavior of R(t) could be successfully predicted by a R(t) matrix at small t (see Supporting Figure 5a of the Supporting Information). The estimated transition rates of the four-state transition network are listed in Table 1. While S2, S3, and S4 are all connected to S1, there is no direct connection between these three states. Besides, according to the transition rates, the transitions from other states to S1 are almost 1 order of magnitude faster than the corresponding reverse transitions. Consequently, S1 is both the hub and the free energy minimum in the transition network. It naturally has the longest lifetime as predicted by kinetic rates (see Figure 3c). Transition Network with More States. In this section, we study the detailed structure of the more complicated state found at the O(10 ns) time scale. After truncating the simulation trajectories into pieces of 2 ns length, the trajectory-mapped vectors are analyzed with the principle component analysis (PCA) method. As shown in Figure 5b, the intrinsic dimension of these vectors is higher than the models we studied above. After projecting the high-dimensional vectors onto the first three principle components (PCs), we find that the vectors indeed constitute a polygon with clearly identifiable vertices and lines in three-dimensional space (see Figure 5a). However, due to

10274

J. Phys. Chem. B, Vol. 114, No. 32, 2010

Gong and Zhou

TABLE 2: Kinetic Rates of Six-State and Eight-State Transition Networks (Units: ns-1)a forward rate reaction S3 S4 S5 S6 S5 S6

r r r r r r

S1 S3 S3 S3 S4 S4

6 state

reverse rate

8 state -1

1.0(1) × 10 4(1) × 10-2 1.7(2) × 10-1 4(1) × 10-2 1.3(1) × 10-1 1.9(4) × 10-2

6 state -1

0.9(1) × 10 4(1) × 10-2 1.5(2) × 10-1 4(1) × 10-2 1.3(1) × 10-1 1.9(4) × 10-2

8 state -1

3.1(3) × 10 1.3(3) × 10-2 7.5(8) × 10-1 6(2) × 10-2 1.04(9) 1.8(3) × 10-1

3.1(3) × 10-1 1.3(3) × 10-2 5.9(6) × 10-1 7(2) × 10-2 0.93(8) 1.7(3) × 10-1

a

Shown are the transition rates between S1, S3, S4, S5, and S6. See the caption of Table 1 for error estimation and other details.

the high intrinsic dimension of the trajectory-mapped vectors, picking out the vertices by hand is no longer reliable. Therefore, we design a clustering method to locate the vertices (metastable states) without dimension reduction. The clustering method is thoroughly introduced in the Supporting Information. In brief, we further truncate each trajectory piece into a few subpieces; by comparing neighboring subpieces, we can get rid of the subpieces including transition events, leaving the ones that always stay in one metastable state. Thus, the straight lines in Figure 5a will almost disappear and the remaining vertices could be found by standard clustering methods. This clustering method is also used to analyze the system at the O(10 ns) time scale. Of course, any other clustering method that takes into account the characteristic structure of the trajectory-mapped vectors can also be used. For the current system, clustering at the O(1 ns) time scale gives out six states. The characteristic conformations as well as their relationship are shown in Figure 5c. Among all of the states, S2 has the longest lifetime of 82 ns. S1, S4, and S6 are also stable with their lifetimes longer then 1 ns. S2 is kinetically connected to S1, S4, and S6 through two intermediate states S3 and S5. [Although we predict the direct transition between S2 and S4, the estimated transition rates between these two states are more than 1 order of magnitude smaller than the rates between S2 and S3 or S3 and S4 (see Table 2).] We find that S3 and S5 both have a relatively shorter lifetime and poorer hydrogen bond structure. In S3, there is only one well-defined hydrogen bond between 1N and 12O (for convenience, we use the serial number of a residue plus “N” or “O” to label the amino nitrogen or carbonyl oxygen atoms in that residue), while there are additional hydrogen bonds in the other states. In S5, the 12O atom simultaneously forms hydrogen bonds with 1N, 2N, and 3N, while 12O mainly interacts with 1N and 2N in S2 or 2N and 3N in S4 and S6. The steric effect leads to the weak hydrogen bonds in S5. Therefore, both kinetic and structural analyses indicate that S3 and S5 should be the intermediate states bridging the transition between the other states. The orthogonality of the six states is tested and shown in Figure 5d. Except for the intermediate states, the other states are almost exactly orthogonal to each other. The slightly inferior orthogonality of S3 and S5 could be explained by their relatively poorer metastability. Since the hydrogen bond structures of S3 and S5 are not well-defined as the other states, the conformational dynamics inside the two states should be dominated by diffusive behavior. Hence, for S3 and S5, the time scale of intrastate equilibration might not be well discernible from the time scale of interstate transition, which leads to the defected metastability. To test whether there is a more detailed structure in the six states found at the O(1 ns) time scale, we analyze each state with a local (state-specific) reference conformation set, as mentioned in the Introduction section. When analyzing a state,

Figure 6. TM analysis of the eight-state transition network of Ala12. (a) The typical conformations in the metastable states and their transition relation. The transition relation is implied from the estimated kinetic rates (see Tables 2 and 3). We only illustrate the kinetic rates (transitions) with acceptable estimation error. The dotted arrow indicates that only one kinetic rate between the connected two states suffices the error standard. (b) The orthogonality test of the metastable states. Here, S21, S22, and S23 correspond to the second, third, and fourth states, while the sequence of the other states is kept the same. (c) The qualitative free energy landscape of the eight-state transition network. The thickness of the dotted arrows qualitatively represents the transition rate along the specified direction.

the reference set only includes the conformations in that state instead of the conformations in all six states, and the trajectory segments inside the state are remapped into vectors accordingly. As a result, S2 is found to actually include three well-defined states, S21, S22, and S23 (see Supporting Figure 3 in the Supporting Information), while the other states do not have a more detailed structure. Hence, an eight-state transition network is obtained and portrayed in Figure 6a. Compared with the sixstate transition network (see Figure 5c), S21, S22, and S23 have much shorter lifetimes than S2, and the lifetimes of the other states are almost the same as before. The orthogonality of the updated transition network is shown in Figure 6b; the newly found states are indeed orthogonal to each other and to the other states. Since the interstate transition between the states inside S2 is found to be faster than the transition between S2 and the other states, a funnel-like free-energy landscape emerges for the eight-state transition network. As shown in Figure 6c, in the original six-state model, S2 represents the most stable state, or the global free energy minimum. It is further refined to three different states in the more exact eight-state model. The lifetimes of S21, S22, and S23 are all around 1 ns. These new states constitute the rough bottom of the funnel. S1, S4, and S6 are three metastable states, they are connected to the states in S2 through the intermediate states S3 and S5.

Kinetic Transition Network Based on Trajectory Mapping

J. Phys. Chem. B, Vol. 114, No. 32, 2010 10275 TABLE 3: Kinetic Rates of Six-State and Eight-State Transition Networks (Units: ns-1)a forward rate reaction S3 r S2 S3 r S21 S3 r S22 S4 r S2 S4 r S21 S5 r S2 S5 r S22 S22 r S21 S23 r S21 S23 r S22

6 state

8 state -2

1.0(1) × 10

0.8(3) × 10-3 9(3) × 10-4

reverse rate 6 state

8 state -1

1.3(2) × 10-2 0.3(1) × 10-2 1.3(5) × 10-3 2.33(6) × 10-1 0.66(3) × 10-1 0.06(1) × 10-1

3.7(3) × 10

7(2) × 10-3 1.1(3) × 10-1

3.0(3) × 10-1 0.8(2) × 10-1

0.6(2) × 10-1 0.44(1) 1.60(8) 0.09(2)

a Shown are the transition rates related to S2, S21, S22, and S23. See the caption of Table 1 for error estimation and other details.

Figure 7. Kinetic information of the six-state transition network and the more detailed eight-state transition network of Ala12. (a) The diagonal elements of R(t) for states S1, S3, S4, S5, and S6 are plotted. The results for the six-state transition network are plotted with a black dashed line, and the results for the eight-state transition network are plotted with a red dash-dotted line. (b) The diagonal transition probability for S2 is plotted with a black dashed line. The diagonal transition probabilities for S21, S22, and S23 are plotted with a red dashdotted line. (c) The eigenvalues of R(t) for the six-state transition network are shown in the style of log[λi(t)] versus t. (d) Similar curves to part c for the eight-state transition network. In part d, the two lines with the most negative slopes almost overlap with each other. This may be due to the coincidence of the relaxation time scales of S5 and S23 (see parts a and b).

Generally speaking, the transition dynamics between geometrically very close states is not necessarily fast. Consequently, to honestly reproduce the kinetics of the system, it is better to refine the observed states using the local reference conformation set, in case some adjacent states cannot be separated with the global reference set. However, in the current system, the states inside S2 are closely connected to each other in dynamics, so the six-state model can be used to approximately interpret the kinetics of the system. The diagonal elements of R(t) for states S1, S3, S4, S5, and S6 are plotted in Figure 7a, and the kinetic transition rates between these states are listed in Table 2. As can be seen from Figure 7a and Table 2, the two transition network models are in great agreement on the kinetic properties of S1, S3, S4, S5, and S6. Since these states do not change when extending the six-state model to an eight-state model, the above results reflect the consistency between the two models. In Figure 7b, the diagonal elements of R(t) for states S2, S21, S22, and S23 are plotted. The kinetic rates for the transitions related to these states are listed in Table 3. In the six-state model, the diagonal transition probability of S2 almost does not decay within 5 ns. In the eight-state model, this nondecaying curve is substituted by quickly decaying ones of S21, S22, and S23. This is consistent with the fast interstate transition between the states inside S2. The eigenvalues of R(t) are calculated and plotted in Figure 7c (six-state network) and d (eight-state network). The logarithm curves of the eigenvalues closely resemble straight lines, indicating that both transition network models can be used to extend the simulation time scale (see also Supporting Figures 5b and c and 4 in the Supporting Information). In Figure 7d, there are more lines with large negative slopes as compared to Figure 7c, which indicates the existence of new fast relaxation

modes in the eight-state transition network. These new modes may correspond to the fast interstate transition between S21, S22, and S23. Conclusion In this paper, we present the trajectory mapping (TM) method for constructing the transition networks of biomolecules. The TM method takes multiple MD trajectories as input data. By selecting some basis functions and calculating the trajectory averages of these functions, we transform the trajectories to highdimensional vectors. The averaging process suppresses the thermal fluctuation, which facilitates the identification of metastable states. We also properly treat the correlation between basis functions; thus, the kinetic properties of a model system can be quantitatively estimated, and a coarse-grained picture of the system could be established. Since the time scale of the identified metastable states depends on the length of input trajectories, the TM method provides a natural framework for constructing hierarchical transition networks. Once the metastable states at long time scales are found, the trajectories can be truncated to shorter lengths for extracting more detailed information. The TM method is first illustrated with a two-dimensional toy model, the results are consistent with the continuous-timediscrete-state Markov model. Then, we study the Ala12 molecule in implicit solvent with the current method. At the order of the 10 ns time scale, several metastable states are found without a prominent kinetic connection between them, indicating that the transitions between these states happen at longer time scale. Within the discovered states, both helix and coil conformations are found. Subsequently, we investigate two coil states at the O(1 ns) time scale, one with four substates and the other with six substates. For the latter one, more detailed information can be obtained by refining the observed states with local reference. As a result, a transition network with eight substates is established. This transition network shows a funnel-like free energy landscape. For current model systems, we find the kinetics between metastable states can be approximated as the Markov process. On one hand, it reflects the reliability of the identified metastable states. On the other hand, the Markov kinetics can be used to coarse-grain the system and to extend the simulation time scale.20 Compared with the other methods for establishing transition networks, e.g., the Markov state model, it is not necessary to maintain a large transition matrix between microstates in TM. Since the identification of metastable states is based on trajectory averaging and direct comparison of trajectory-mapped vectors,

10276

J. Phys. Chem. B, Vol. 114, No. 32, 2010

the discovered states and the hierarchical analysis in TM have clear physical meaning. TM is appropriate for analyzing the biomolecular systems with rich metastability (e.g., intrinsic disorder proteins34). By choosing different input trajectory lengths and basis functions of different variables, the systems can be coarse-grained to different levels at will. Since the input trajectories of TM can be generated both by truncating long simulation trajectories and by spawning short independent trajectories, the results of TM can be flexibly improved step by step. For example, if the kinetic properties of a metastable state are not well characterized by existing data, some additional simulation trajectories can be initiated from that state to improve statistics. Similarly, for a trajectory segment that is almost orthogonal to existing metastable states, we can initiate new simulations from some conformations in the trajectory segment; then, new metastable states may be found following TM procedures. TM could be further developed to overcome the limitation on simulation time scale. When the simulation trajectories are too short to bridge different metastable states [like the case in the analysis of Ala12 at the O(10 ns) time scale], it is still possible to define the metastable states by TM. Then, considering the orthogonality between metastable states and the well-defined behavior of state-indicator curves, it may be possible to adapt the existing simulation accelerating methods35-37 to the current framework. In doing so, we may establish the energetic and kinetic relation between the metastable states with long lifetimes. Designing accelerating methods to complement the TM method will be our future work. Acknowledgment. The authors acknowledge the Max Planck Society (MPG) of Germany and the Korea Ministry of Education, Science and Technology (MEST) for the support of the Independent Junior Research Group at the Asia Pacific Center for Theoretical Physics (APCTP) in Korea. L.G. is also supported by the National Basic Research Program of China (973 Program), No. 2007CB935903. Supporting Information Available: Experimental procedures and characterization data for all new compounds. This material is available free of charge via the Internet at http:// pubs.acs.org. References and Notes (1) Adcock, S. A.; McCammon, J. A. Chem. ReV. 2006, 106, 1589– 1615. (2) Shea, J.-E.; Brooks, C. L., III. Annu. ReV. Phys. Chem. 2001, 52, 499–535. (3) Krivov, S. V.; Karplus, M. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 14766–14770.

Gong and Zhou (4) Noe, F.; Fischer, S. Curr. Opin. Struct. Biol. 2008, 18, 154–162. (5) A. Amadei, H. J. B.; Linssen, A. B. Proteins 1993, 17, 412–425. (6) Mu, Y.; Nguyen, P. H.; Stock, G. Proteins 2005, 58, 45–52. (7) Sims, G. E.; Choi, I. G.; Kim, S. H. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 618–621. (8) Das, P.; Moll, M.; Stamati, H.; Kavraki, L. E.; Clementi, C. Proc. Natl. Acad. Sci. U.S.A. 2006, 103, 9885–9890. (9) Coifman, R. R.; Lafon, S. Appl. Comput. Harmon. Anal. 2006, 21, 5–30. (10) Nadler, B.; Lafon, S.; Coifman, R. R.; Kevrekidis, I. G. Appl. Comput. Harmon. Anal. 2006, 21, 113–127. (11) Maisuradze, G. G.; Liwo, A.; Scheraga, H. A. Phys. ReV. Lett. 2009, 102, 238101.1-4. (12) Gfeller, D.; DeLosRios, P.; Caflisch, A.; Rao, F. Proc. Natl. Acad. Sci. U.S.A. 2007, 104, 1817–1822. (13) Noe, F.; Horenko, I.; Schutte, C.; Smith, J. C. J. Chem. Phys. 2007, 126, 155102.1-17. (14) Chodera, J. D.; Singhal, N.; Pande, V. S.; Dill, K. A.; Swope, W. C. J. Chem. Phys. 2007, 126, 155101.1-17. (15) Rao, F.; Karplus, M. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 91529157. (16) Deuflhard, P.; Huisinga, W.; Fischer, A.; Schutte, C. Linear Algebra Appl. 2000, 315, 39–59. (17) Weber, M. ZIB Rep. 2003, 03-04, 1–11. (18) Prada-Gracia, D.; Gomez-Gardenes, J.; Echenique, P.; Falo, F. PLoS. Comput. Biol. 2009, 5, e1000415.1-9. (19) Jayachandran, G.; Vishal, V.; Pande, V. S. J. Chem. Phys. 2006, 124, 164902.1-12. (20) Noe, F.; Schutte, C.; Vanden-Eijnden, E.; Reich, L.; Weikl, T. R. Proc. Natl. Acad. Sci. U.S.A. 2009, 106, 19011–19016. (21) Buchete, N. V.; Hummer, G. J. Phys. Chem. B 2008, 112, 6057– 6069. (22) Torda, A. E.; van Gunsteren, W. F. J. Comput. Chem. 1994, 15, 1331–1340. (23) Shao, J. Y.; Tanner, S. W.; Thompson, N.; Cheatham, T. E., III. J. Chem. Theory. Comput. 2007, 15, 2312–2334. (24) Nadler, B.; Galun, M. AdV. Neural Inf. Process. Syst. 2007, 19, 1017–1024. (25) Gong, L.; Zhou, X. Phys. ReV. E 2009, 80, 026707.1-9. (26) Singhal, N.; Pande, V. S. J. Chem. Phys. 2005, 123, 204909.113. (27) Sriraman, S.; Kevrekidis, I. G.; Hummer, G. J. Phys. Chem. B 2005, 109, 6479–6484. (28) Chandler, D. J. Chem. Phys. 1978, 68, 2959–2970. (29) Levy, Y.; Jortner, J.; Becker, O. M. Proc. Natl. Acad. Sci. U.S.A. 2001, 98, 2188–2193. (30) Wales, D. J. Phys. Biol. 2005, 2, S86–S93. (31) Qiu, D.; Shenkin, P. S.; Hollinger, F. P.; Still, W. C. J. Phys. Chem. A 1997, 101, 3005–3014. (32) Gong, L.; Zhou, X. arXiV 2009, 0911.3527, 1–24. (33) Hovmoller, S.; Zhou, T.; Ohlson, T. Acta Crystallogr. 2002, D58, 768–776. (34) Mao, A. H.; Crick, S. L.; Vitalis, A.; Chicoine, C. L.; v Pappu, R. Proc. Natl. Acad. Sci. U.S.A. 2010, 107, 8183–8188. (35) Voter, A. F. Phys. ReV. Lett. 1997, 78, 3908–3911. (36) Laio, A.; Parrinello, M. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 12562–12566. (37) Zhou, X.; Jiang, Y.; Kremer, K.; Ziock, H.; Rasmussen, S. Phys. ReV. E 2006, 74, 035701(r).

JP100737G