Structural Information-Based Method for the Efficient and Reliable

Mar 7, 2017 - Predictions of structures of biomolecules are challenging due to the high dimensionalities of the potential energy surfaces (PESs) invol...
0 downloads 11 Views 3MB Size
Article pubs.acs.org/JPCB

Structural Information-Based Method for the Efficient and Reliable Prediction of Oligopeptide Conformations Xiao Ru,† Ce Song,†,‡ and Zijing Lin*,† †

Hefei National Laboratory for Physical Sciences at Microscales & CAS Key Laboratory of Strongly-Coupled Quantum Matter Physics, Department of Physics, University of Science and Technology of China, Hefei 230026, China ‡ Department of Theoretical Chemistry and Biology, School of Biotechnology, Royal Institute of Technology, SE-10691 Stockholm, Sweden ABSTRACT: Predictions of structures of biomolecules are challenging due to the high dimensionalities of the potential energy surfaces (PESs) involved. Reducing the necessary PES dimensionality is helpful for improving the computational efficiency of all relevant structure prediction methods. For that purpose, a systematic analysis of the backbone dihedral angles (DAs) in the low energy conformations of amino acids, di-, tri-, and tetrapeptides is performed. The analysis reveals that the DAs can be represented by a set of discretized values. Moreover, there are rules limiting the combinations of neighboring DA states. The DA combination rules are used to formulate a path matrix scheme for locating the low energy conformations of peptides. Comparing with the full DA combinations, the PES dimensionality in the path matrix method is reduced by a factor of 2.5n, where n is the number of amino acid residues in a peptide. The path matrix method is validated by applications to find the conformations of representative tri-, tetra-, and pentapeptides and comparison with the best literature results. All the tests show that the path matrix method is very efficient and highly reliable by producing the best search results for the low energy peptide conformations. costs of the “divide and conquer” method and the systematic search method are the same for the PES subspace that requires a thorough search. Meanwhile, the reliability of the “divide and conquer” method rests on whether the basic features of low energy conformations are reflected by the chosen PES subspaces. Therefore, the divided PES subspaces should cover some adequate numbers of rotational bonds essential for reflecting the structural characteristics of the low energy conformations. Consequently, the scope of applicability of the “divide and conquer” method for the conformational search is dependent on the required sizes of the PES subspaces. The aim of this work is to develop a method to reduce the dimensionality of PES necessary for locating the low energy conformations of peptides. This is achieved through a systematic analysis of the structural features of low energy conformations of a number of AAs, di-, tri-, and tetrapeptides. As a result of the structural analysis, a constraint on the combinations of neighboring dihedral angles (DAs) of molecular bonds is found. The constraint is expressed as a path matrix quantifying the reduced dimensionality of PES space/subspace. The reduction factor increases with the bond rotational degrees of freedom of peptide. The reliability and

1. INTRODUCTION Computational determination of conformations of biomolecules has long been an active area of research in the field of biochemistry. There are a variety of computational methods for locating the low energy conformations of biomolecules. The representative methods may be broadly classified as (1) the stochastic approach that samples the high-dimensional space of the potential energy surface (PES) of biomolecule in some designated manner, such as Monte Carlo, 1−4 genetic algorithm,5−8 simulated annealing,9−14 and basin-hopping;15 (2) the exhaustive or systematic search approach that considers all combinations of the bond rotational degrees of freedom of biomolecule;16 (3) the “divide and conquer” search method17−20 that applies and properly combines the systematic searches on subsets of bond rotational degrees of freedom. The stochastic approach can be efficient but its improvement toward reliability is challenging due to the high dimensionality of the PES space. The systematic search method is the most reliable, but its computational cost increases exponentially with the number of molecular bond rotational degrees of freedom. As a result, the systematic search method is applicable only to oligopeptide with a very small number of amino acid (AA) residues. The “divide and conquer” search method is highly desirable as it possess the feature that the required number of PES searching points increases linearly with the number of AA residues in a peptide. Notice, however, that the computational © XXXX American Chemical Society

Received: December 9, 2016 Revised: February 28, 2017 Published: March 7, 2017 A

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Figure 1. Schematic illustration of the dihedral angles of peptide backbone.

Figure 2. Distributions and correlations of backbone DAs: (a) α vs β for 19 AAs with trans carboxyl; (b) α vs Ψ1 for 15 dipeptides; (c) α vs Ψ1 for 12 tripeptides and 3 tetrapeptides; (d) Φn vs β for 15 dipeptides with trans carboxyl; (e) Φn vs β for 15 dipeptides with cis carboxyl; (f) Φn vs β for 12 tripeptides and 3 tetrapeptides with trans carboxyl; (g) Φn vs β for 12 tripeptides and 3 tetrapeptides with cis carboxyl; (h) Φi vs Ψi (1 < i < n) for 12 tripeptides and 3 tetrapeptides.

peptides are used to characterize the AA and peptide structures. The notations of DAs in a peptide backbone are illustrated in Figure 1 and are similar to that commonly defined, Φ for CN− CC and Ψ for NC−CN. Notice that the DAs for the bonds linking the amino group and the carboxyl group are named as α and β, respectively, to indicate their difference with Φ1 and Ψn (n denotes the number of AA residues in a peptide) due to the terminal effect. Notice also that α is defined with H0N−CC, where H0 is a fictitious atom with H0 → N bisecting ∠NH2. While β for NC−CO(H) appears similar to Ψ for NC−CN, α for H0N−CC is likely different from Φ for CN−CC. In order to deduce a reliable method to reduce the size of PES space required for finding the low energy conformations of peptides, the analysis of structural information consists of two steps. First, DAs of the low energy conformations of AAs and representative small peptides are compiled in order to find the

efficiency of the path matrix sampling method is illustrated by providing high quality ensembles of the low energy conformations of representative tripeptides, tetrapeptides, and pentapeptide. The reduced PES size by the path matrix method is helpful for increasing the computational efficiency of both the systematic method and the “divide and conquer” approach, enabling the methods to be applied to larger molecules. It is also helpful for stochastic methods, such as Monte Carlo,1−4 genetic algorithm,5−8 and basin-hopping,15 with more effective samplings.

2. METHODS The conformations of representative AAs, dipeptides, tripeptides, and tetrapeptides obtained through extensive conformational searches16−19,21−23 are used in the analysis of structural information on peptides. DAs in the backbones of AA and B

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Table 1. Results of the Backbone DA Discretizations Deduced from the Analysis of Low Energy Conformations of Amino Acids and Small Peptides DA

α

Φi (1 < i < n)

Φn

Ψ1

Ψi (1 < i < n)

β (γ = 180°)

nDA state

3 −60° (Eα), 60° (Fα), 180° (Gα)

3 90° (E), 180° (F), −90° (G)

3 90° (En), 180° (Fn), −90° (Gn)

4 30° (A1), −30° (B1), 150° (C1), −150° (D1)

4 −60° (A), 0° (B), 180° (C), 60° (D)

4 −60° (Aβ), 0° (Bβ), 180° (Cβ), 60° (Dβ)

β (γ = 0°) 2 0° (Bβ), 180° (Cβ)

Figure 3. Correlations of backbone DAs in the low energy conformations of 12 tripeptides and 3 tetrapeptides: (a) Φi vs Ψi−1 when Φi−1 = F (2 < i ≤ n); (b) Φi vs Ψi−1 when Φi−1=E (2 < i ≤ n); (c) Φi vs Ψi−1 when Φi−1=G (2 < i ≤ n); (d) Φ2 vs Ψ1.

energy conformations of AAs and peptides. The size of PES space necessary for locating the low energy conformations is then reduced by eliminating these unnecessary combinations of DA states. The connectivity of DA states is casted into a path matrix form so that the total number of rotational degrees of freedom can easily evaluated. The path matrix is used to improve the computational efficiency of systematic search method by screening out unnecessary combinations of bond rotational degrees of freedom. The reliability of the path matrix method for finding the low energy conformations of peptides is demonstrated by comparison with the best literature results for a number of representative tripeptides, tetrapeptides, and pentapeptide.

characteristic values useful for the discretization of DA. A conformation is viewed here as of low energy if its energy is within a range of 3 kcal/mol from the global minimum. The energy is determined at the BHandHLYP/6-311++G**// BHandHLYP/6-31+G** level. All natural AAs except proline are considered in the analysis. Proline is excluded here due to its distinctly different feature.24,25 The analysis also includes 15 dipeptides (AA, AG, GG, GH, GT, GV, GW, HG, IG, KG, LG, MG, VG, WG and YG), 12 tripeptides (FGG, GFA, GFG, GGF, GTG, GVG, GWG, GYG, MGG, TGG, VGG and WGG), and 3 tetrapeptides (GFGG, GTGG and GVGG). The conformations of all these molecules are known through extensive conformational searches.18,19,23 Extensive tests have been performed by applying local geometry optimization operations on the structures constructed by the characteristic DAs to verify if the DAs may be discretized. The approach is found to reliably recover the true structures of the original conformations and the discretized DA states are fully validated. The number of discretized states for each DA thus determined is less than 6, i.e., less than that required by the Newman projection for a general DA.16 The dimensionality of a peptide PES is thus reduced in comparison with that determined by the Newman projection. Second, the rule for the combinations of the discretized states of neighboring DAs is analyzed. Some DA state combinations are found to be irrelevant for finding the low

3. RESULTS 3.1. Distributions of the Backbone DAs of Amino Acids and Small Peptides. Figure 2 shows the distributions of DAs for the low energy conformations of 19 AAs and that of di-, tri-, and tetrapeptides concerning either the amino or the carboxyl terminal. Other than AAs, conformations with trans and cis carboxyl are separately considered for β in Figure 2. The separate consideration of trans- and cis-form of carboxyl is based on their different structural characteristics, corresponding respectively to γ = 180° and 0° for the DA of OC−NH of the carboxyl. Intuitively, the trans carboxyl plan is structurally similar to the peptide plane of O−C−N−H, while the cis C

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

Φi−1 is not an F state. Regarding the (Φ2, Ψ1) correlation shown in Figure 3d, it appears that all the 12 possible combinations can be found in the low energy conformations. Figure 4 is a schematic illustration summarizing the rule of backbone DA combinations found in the low energy

carboxyl plan is quite different from the OCNH peptide plan. Indeed, the DA distributions in Figure 2d,f are quite different from that in Figure 2e,g. As seen in Figure 2a, b, and c, α is broadly distributed but near one of the three values of −60°, 60°, and 180°. The three discretized α states (nα = 3) are verified to fully recover the original conformations by geometry optimizations and are referred below as Eα, Fα, and Gα, respectively. From Figure 2d, e, and f, β may be viewed as distributed around the four values of −60°, 0°, 180°, and 60°. Though Figure 2a indicates that β may take value near β = ± 30°, discretizing β into the values of ±30° is unnecessary as all the conformations corresponding to β ≈ ± 30° can be recovered by optimizing the structures corresponding to β = ± 60°. Moreover, all the conformations shown in Figure 2e,g can be recovered by starting with β = 0° or 180° and followed with geometry optimizations. That is, β may be discretized into the four values of −60°, 0°, 180°, and 60° (nβ = 4) for trans carboxyl termini, but only the two values of β = 0° and 180° are necessary for the cis carboxyl termini. The four states of β = −60°, 0°, 180°, and 60° are denoted as Aβ, Bβ, Cβ, and Dβ, respectively. Similarly, the analysis of Figure 2b and c shows that Ψ1 can be discretized into four values (nψ1 = 4) of 30°, −30°, 150°, and −150° that are referred below as A1, B1, C1, and D1, respectively. The analysis Figure 2d, e, f, and g yields that Φn can be discretized into the three values of 90°, 180°, and −90° (nΦn = 3) and are denoted as En, Fn, and Gn, respectively. The analysis of Figure 2h gives the discretization states of Φi and Ψi (1 < i < n). Φi can be discretized into the three values of 90°, 180°, and −90° (nΦi = 3), denoted as E, F, and G, respectively. Ψi can be discretized into the four values of −60°, 0°, 180°, and 60° (nΨi = 4) that are denoted respectively as A, B, C, and D. The results of the above analysis on the backbone DA discretization are summarized in Table 1. 3.2. Correlations Among the Dihedral Angle States. Analysis of Figure 2 not only provides the information for the discretization of DA states, it also reveals the combinations of DA states favored in the low energy conformations. For example, there are a total of 12 possible DA state combinations for any pair of neighboring DAs as there are three Φ (α) states and four Ψ (β) states. However, only the six combinations of (A1, Eα), (B1, Fα), (C1, Eα), (C1, Gα), (D1, Fα), and (D1, Gα) can be seen in Figure 2b,c for the Ψ1-α pair. Similarly, only the five β−Φn combinations of (Aβ, En), (Bβ, En), (Bβ, Gn), (Cβ, Fn), and (Dβ, Gn) appear in Figure 2d,f concerning trans carboxyl termini. Moreover, tests show that the five β−Φn combinations of (Bβ, En), (Bβ, Gn), (Cβ, En), (Cβ, Fn), and (Cβ, Gn) can recover all the conformations shown in Figure 2e,g concerning cis carboxyl termini. For the Ψi−Φi pair, the seven combinations of (A, E), (B, E), (B, G), (C, E), (C, F), (C, G), and (D, G) are required to recover all the conformations shown in Figure 2h. Figure 3 shows the correlation between the neighboring DA pair of (Φi, Ψi−1) (2 < i ≤ n). The favored combinations of (Φn, Ψn−1) and that of (Φi 2) is repetitive. The pattern is repeated by n−2 times. Each repeating pattern may be expressed as a matrix, M,

(4)

The factor of “2” in eq 4 is used to account for both the cases of γ = 0° and γ = 180°. The total number of the backbone DA state combinations is therefore determined as, NP =





Pab

It is not difficult to compute NP by brute force when n is small. It is also easy to calculate NP with a simple compute code even when n is large. However, it is useful to find an explicit expression of NP so that the variation of NP with n can be easily understood. For that purpose, we decompose M in eq 1 into:

The path matrix, eq 4, can then be evaluated as,

Consequently, we have, NP = 51.27 × 4.696n − 2 − 1.069 × 0.822n − 2 + ( −1)n − 2 × 4.737 × 0.518n − 2 ∼ 2.3 × 4.7n

(9)

For moderate to large n (n > 5), the reduction factor in computational saving by using the correlations between the backbone dihedral angles is, Nfull/NP ≈ 2.5n. To be more concrete, Table 2 shows the variations of Nfull (eq 9) and NP (eq 5) with n for 3 ≤ n ≤ 10. Assuming a handling of up to one million trial structures as being computationally manageable, Table 2 shows that the constrained and straightforward systematic search may be applied to a peptide with up to 8 and 5 AA residues, respectively. The constrained systematic search method is very useful for reliably studying the characteristics of protein secondary structures that appear in oligopeptides with n ≤ 8. Moreover, when combined with some properly designed sampling method, such as Monte Carlo26 and genetic algorithm,27 the size of peptide that can be handled with a good degree of confidence will increase. As the search space size of the constrained search method grows with n much slower than that of the unconstrained search method, the combination will clearly benefit the former more than the latter, further enhancing the importance of the constraint. 3.4. Sample Testing Results on the Path Matrix Conformational Searches. To demonstrate that the path matrix scheme is an efficient replacement of the systematic search approach, the method has been applied to a number of tripeptides, tetrapeptides, and pentapeptides whose conforma-

where a matrix element Mab corresponds to the number of paths connecting the states of a and b (a,b = E, F, or G). The total number of paths connecting the DA states of residue i−1 with the DA states of residue i corresponds to the sum of all matrix elements of M. Similarly, the network of connecting the states of Φ2 and α can be expressed as

(2)

The numbers of paths pointing to En, Fn, and Gn from the β states concerning either γ = 0° or γ = 180° are 2, 1, and 2, respectively. It may be expressed in a matrix form as, ⎛2 0 0⎞ ⎜ ⎟ MC = ⎜ 0 1 0 ⎟ ⎝0 0 2⎠

(8)

In comparison, the number of all possible backbone DA state combinations in an unconstrained systematic search is, Nfull = 2 × (3 × 4)n = 2 × 12n

⎡2 1 2⎤ MN = ⎢⎢ 2 1 2 ⎥⎥ ⎣2 1 2⎦

(5)

a=E ,F ,G b=E ,F ,G

(3)

The path matrix for counting the total number of routes from all β states to all α states is, E

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

conformer, while the systematic search misses the third, seventh, and many more other conformers. 3.4.2. GYG. Figure 6 compares the low energy conformations of GYG found by the path matrix method and a systematic

tions have been known through extensive searches. The quality of the path matrix search results is found to be similar or better than the best literature results. To illustrate, the path matrix conformational search results for four tripeptides (FGG, GGF, GWG, and GYG), two tetrapeptides (GGGG and GFGG), and one pentapeptide (GGGGG) are shown below and compared with the best available results. The seven molecules are representative of the most challenging cases reported for the conformational search of small peptides. 3.4.1. GWG. Figure 5 compares the low energy conformations of GWG found by the path matrix method and a

Figure 6. Comparison of the low energy conformations of GYG obtained by the path matrix method and the systematic search method.

search approach.19 As in ref 19, the conformational energies are reported at the BHandHLYP/6-311++G(d,p)//BHandHLYP/ 6-31G(d) level. The total numbers of the initial structures are 1440 and 36 864 for the path matrix method and the systematic search of ref 19, respectively. The numbers of the low energy conformations that are less than 4 kcal/mol above the global minimum are 93 and 25 for the path matrix method and the systematic search method, respectively. Like the case for GWG, the search results of the low energy conformations by the path matrix method is far superior to that by the systematic method. Among the joined results of the path matrix method and the systematic search method, the path matrix misses only the unimportant 61st and 68th lowest energy conformers, while the systematic search misses the third, fourth, sixth conformers, and many others. 3.4.3. FGG. Figure 7 compares the low energy conformations of FGG found by the path matrix method with the best

Figure 5. Comparison of the low energy conformations of GWG obtained by the path matrix method and the systematic search method.

systematic search approach.19 As in ref 19, the conformational energies shown in Figure 5 are determined at the BHandHLYP/6-311++G(d,p)//BHandHLYP/6-31G(d) level. The total number of the initial structures generated in the path matrix method is 240 × 3 × 2 = 1440, where 240 is the number of combinations required by the tripeptide backbone (Table 2) and 3 × 2 is used for the two rotational degrees of freedom for the side chain of the W residue. In comparison, a total of 36 864 trial structures were generated as reported in the systematic search of ref 19. Seventy one conformers are found by the path matrix method to be within 3.5 kcal/mol of the global minimum. Even though the systematic search method is expected to produce a complete set of conformers, however, it finds only 26 conformers in the same energy range.19 The missing of conformers by the systematic approach can be attributed to the crude PES of the semiempirical method of AM128,29 used to optimize the set of initial trial structures in order to reduce the computational burden.19,27 For visualizing the overall qualities of the conformational search results, Figure 5 also shows the density of states (DOS) of the low-energy conformations found by the two search 1

2

2

methods. A normalized Gaussian,φ(x) = α π e−(x − E) / α , is used to represent the contribution of a conformer to the DOS. Here, E is the conformer energy relative to the global minimum and α is a Gaussian width parameter with a value of 0.24 kcal/ mol. As shown in Figure 5, the set of low energy conformations found by the path matrix method is far superior to that found by the systematic search. The main reason for the inferior systematic search results is the use of a crude semiempirical method for screening the set of trial structures to reduce the computational cost. Overall, among the joined conformational results of the path matrix method and the systematic search, the path matrix search misses only the 33rd lowest energy

Figure 7. Comparison of the low energy conformations of FGG obtained by the path matrix method and the best literature results.

literature results.30 As in ref 30, the final conformational energies are determined at the MP2/cc-pVTZ//MP2/cc-pVDZ level. The best literature results were obtained through a very complicated strategy of combined MD runs, structural screenings, DFTB-D geometry optimizations, MP2 energy calculations, and geometry optimizations, etc. For convenience, the literature approach is referred to below as the multiprocess method. F

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B The numbers of conformations in the energy range of 3.5 kcal/mol from the global minimum are 20 and 13 for the path matrix method and the multiprocess approach, respectively. Considering the combined conformational results of the path matrix and multiprocess searches, the path matrix method misses the fifth lowest energy conformer, while the multiprocess approach misses the sixth, seventh, 11th, 13th, 14th, 16th, 18th, and 20th lowest energy conformers. The reason for missing the fifth lowest energy conformer by the path matrix method is unclear and difficult to analyze as its structural details are not given in the literature. Nevertheless, it is reasonable to state that the overall quality of the path matrix results is at least comparable to that of the multiprocess searches. Notice that the number of initial structures generated in the path matrix method is only 720, while the multiprocess method is quite laborious and computationally intensive. It is clear that the path matrix method is very efficient and its search results are of high quality. 3.4.4. GGF. Figure 8 compares the low energy conformations of GGF found by the path matrix method with the best

Figure 9. Comparison of the low energy conformations of GGGG obtained by the path matrix method and the best literature results.

3.4.6. GFGG. Figure 10 compares the low energy conformations of GFGG found by the path matrix method

Figure 10. Comparison of the low energy conformations of GFGG obtained by the path matrix method and the systematic search method. Figure 8. Comparison of the low energy conformations of GGF obtained by the path matrix method and the best literature results.

and the best literature results obtained by a systematic search method.28 For comparison with the literature results, the conformational energies are reported at the level of BHandHLYP/6-311++G(d,p)//BHandHLYP/6-31+G(d,p). The numbers of conformations within 3.5 kcal/mol of the global minimum are both 39 for the path matrix method and the systematic search. Among the combined conformational results of the path matrix method and the systematic search, the path matrix approach misses the 11th, 30th, 32nd, and 34th lowest energy conformer, while the systematic strategy misses the sixth, 31st, 35th, and 42nd lowest energy conformers. The overall quality of the path matrix results is comparable or slightly better than that of the systematic search method. Notice that the total number of the initial structures generated by the systematic search method is 884,736, while that by the path matrix method is only 1130 × 3 × 2 = 6780. The computational benefit of the path matrix method is evident. 3.4.7. GGGGG. Conformations of GGGGG were searched by both the systematic method and the path matrix method. The set of 497 664 initial structures generated by the systematic method were first optimized by the empirical AM1 method. The final conformational energies of both the search methods are determined at the B97D31/6-311++G(d,p)//B97D/631+G(d,p) level. The global minimum identified here is 3.7 kcal/mol lower in energy than the literature result.31,32 The path matrix and systematic search methods produce respectively 39 and 34 low energy conformations that are within 4 kcal/mol of the global minimum. Among the

literature results obtained by the multiprocess method.19 The numbers of conformers in the energy range of 3.5 kcal/mol from the global minimum are 27 and 15 for the path matrix method and the multiprocess approach, respectively. For the joined conformational results of the path matrix and multiprocess searches, the path matrix method misses the 16th and 17th lowest energy conformers. In comparison, the multiprocess method misses the eighth, ninth, 11th, 13th, 15th, 19th, ∼25th, 28th, and 29th lowest energy conformers. Again, the path matrix method is found to be not only computationally efficient, but also providing high quality search results. 3.4.5. GGGG. Figure 9 compares the low energy conformations of GGGG found by the path matrix method and the best literature results obtained through a so-called step-by-step approach.20 The conformational energies are reported at the BHandHLYP/6-311++G(d,p)//BHandHLYP/6-31G(d) level. The numbers of conformations in the 3.5 kcal/mol range of the global minimum are 28 and 26 for the path matrix method and the systematic search method, respectively. Among the combined conformational results of the path matrix method and the step-by-step search, the path matrix approach misses the 23rd, 30th, and 31st lowest energy conformer, while the step-by-step strategy misses the 11th, 21st, 22nd, 25th, and 27th lowest energy conformers. The overall quality of the path matrix results is comparable or slightly better than that of the step-by-step search method. G

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B

(2) Kapota, C.; Ohanessian, G. The Low Energy Tautomers and Conformers of the Dipeptides HisGly and GlyHis and of their Sodium Ion Complexes in the Gas Phase. Phys. Chem. Chem. Phys. 2005, 7, 3744−3755. (3) Schlund, S.; Müller, R.; Grassmann, C.; Engels, B. Conformational Analysis of Arginine in Gas Phase - a Strategy for Scanning the Potential Energy Surface Effectively. J. Comput. Chem. 2008, 29, 407− 415. (4) Christen, M.; Van Gunsteren, W. F. On Searching in, Sampling of, and Dynamically Moving through Conformational Space of Biomolecular Systems: A review. J. Comput. Chem. 2008, 29, 157−166. (5) Holland, J. H. Genetic Algorithms. Sci. Am. 1992, 267, 66−72. (6) Rak, J.; Skurski, P.; Simons, J.; Gutowski, M. Low-Energy Tautomers and Conformers of Neutral and Protonated Arginine. J. Am. Chem. Soc. 2001, 123, 11695−1170. (7) Le Grand, S. M.; Merz, K. M., Jr The Genetic Algorithm and Conformational Search of Polypeptides and Proteins. Mol. Simul. 1994, 13, 299−320. (8) Meza, J. C.; Judson, R. S.; Faulkner, T. R.; Treasurywala, A. M. A Comparison of a Direct Search Method and a Genetic Algorithm for Conformational Searching. J. Comput. Chem. 1996, 17, 1142−1151. (9) Saunders, M. Stochastic Exploration of Molecular Mechanics Energy Surfaces. Hunting for the Global Minimum. J. Am. Chem. Soc. 1987, 109, 3150−3512. (10) Gao, B.; Wyttenbach, T.; Bowers, M. T. Hydration of Protonated Aromatic Amino Acids: Phenylalanine, Tryptophan, and Tyrosine. J. Am. Chem. Soc. 2009, 131, 4695−4701. (11) Ye, S.; Clark, A. A.; Armentrout, P. B. Experimental and Theoretical Investigation of Alkali Metal Cation Interactions with Hydroxyl Side-Chain Amino Acids. J. Phys. Chem. B 2008, 112, 10291−10302. (12) Vásquez, M.; Némethy, G.; Scheraga, H. A. Conformational Energy Calculations on Polypeptides and Proteins. Chem. Rev. 1994, 94, 2183−2239. (13) Corcho, F. J.; Filizola, M.; Pérez, J. Evaluation of the Iterative Simulated Annealing Technique in Conformational Search of Peptides. Chem. Phys. Lett. 2000, 319, 65−70. (14) Fujitani, N.; Shimizu, H.; Matsubara, T.; Ohta, T.; Komata, Y.; Miura, N.; Sato, T.; Nishimura, S.-I. Structural Transition of a 15 Amino Acid Residue Peptide Induced by GM1. Carbohydr. Res. 2007, 342, 1895−1903. (15) Wales, D. J.; Doye, J. P. Global Optimization by Basin-Hopping and the Lowest Energy Structures of Lennard-Jones Clusters Containing up to 110 Atoms. J. Phys. Chem. A 1997, 101, 5111−5116. (16) Huang, Z.; Lin, Z. Detailed Ab Initio Studies of the Conformers of Gaseous Tryptophan. J. Phys. Chem. A 2005, 109, 2656−2659. (17) Ling, S.; Yu, W.; Huang, Z.; Lin, Z.; Haranczyk, M.; Gutowski, M. Gaseous Arginine Conformers and their Unique Intramolecular Interactions. J. Phys. Chem. A 2006, 110, 12282−12291. (18) Yu, W.; Xu, X.; Li, H.; Pang, R.; Fang, K.; Lin, Z. Extensive Conformational Searches of 13 Representative Dipeptides and an Efficient Method for Dipeptide Structure Determinations Based on Amino Acid Conformers. J. Comput. Chem. 2009, 30, 2105−2121. (19) Yu, W.; Wu, Z.; Chen, H.; Liu, X.; MacKerell, A. D., Jr.; Lin, Z. Comprehensive Conformational Studies of Five Tripeptides and a Deduced Method for Efficient Determinations of Peptide Structures. J. Phys. Chem. B 2012, 116, 2269−2283. (20) Li, H.; Lin, Z.; Luo, Y. A Fragment Based Step-by-Step Strategy for Determining the Most Stable Conformers of Biomolecules. Chem. Phys. Lett. 2014, 610−611, 303−309. (21) Zhang, M.; Huang, Z.; Lin, Z. Systematic Ab Initio Studies of the Conformers and Conformational Distribution of Gas-Phase Tyrosine. J. Chem. Phys. 2005, 122, 134313−134319. (22) Leng, Y.; Zhang, M.; Song, C.; Chen, M.; Lin, Z. A SemiEmpirical and Ab Initio Combined Approach for the Full Conformational Searches of Gaseous Lysine and Lysine-H2O Complex. J. Mol. Struct.: THEOCHEM 2008, 858, 52−65.

combined results of the path matrix and systematic searches, the path matrix method misses the 14th and 29th lowest energy conformer, while the systematic search misses the 11th, 13th, 23rd, 24th, 27th, 31st, and 41st lowest energy conformers. The results are summarized in Figure 11. Even though the number

Figure 11. Comparison of the low energy conformations of GGGGG located by the path matrix method and the systematic search method.

of initial structures generated by the path matrix method is 2 orders of magnitude smaller than that of the systematic method, the path matrix search results are slightly better than the systematic approach.

4. SUMMARY The discrete states and their combination rules for the peptide backbone DAs are determined through a systematic analysis of the structural information in the low energy conformations of dipeptides, tripeptides, and tetrapeptides. A path matrix scheme based on the discrete DA states and their combinations is then developed to reduce the dimensionality of PES necessary for locating the low energy conformations of peptides. The path matrix method is proven to be not only very efficient, but also highly reliable by providing the best results for the low energy conformations of representative tripeptides, tetrapeptides, and pentapeptides. The reduced PES dimensionality by the path matrix approach can be used to improve the computational efficiency of a variety of conformational search methods, including both the systematic and stochastic search methods as well as the “divide and conquer” approach.



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. ORCID

Zijing Lin: 0000-0001-9270-1717 Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS This work was supported by the National Natural Science Foundation of China (Grant No. 11374272 & 11574284) and the Collaborative Innovation Center of Suzhou Nano Science and Technology.



REFERENCES

(1) Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys. 1953, 21, 1087−1092. H

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX

Article

The Journal of Physical Chemistry B (23) Chen, H.; Wang, Y.; Chen, X.; Lin, Z. Gas Phase Conformations of Tetrapeptide Glycine-Phenylalanine-Glycine-Glycine. Chin. J. Chem. Phys. 2012, 25, 77−85. (24) Grdadolnik, J.; Mohacek-Grosev, V.; Baldwin, R. L.; Avbelj, F. Populations of the Three Major Backbone Conformations in 19 Amino Acid Dipeptides. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 1794−1798. (25) Porter, L. L.; Rose, G. D. Redrawing the Ramachandran Plot after Inclusion of Hydrogen-Bonding Constraints. Proc. Natl. Acad. Sci. U. S. A. 2011, 108, 109−113. (26) Ozkan, S.; Meirovitch, H. Efficient Conformational Search Method for Peptides and Proteins: Monte Carlo Minimization with an Adaptive Bias. J. Phys. Chem. B 2003, 107, 9128−9131. (27) Ru, X.; Song, C.; Lin, Z. A Genetic Algorithm Encoded with the Structural Information of Amino Acids and Dipeptides for Efficient Conformational Searches of Oligopeptides. J. Comput. Chem. 2016, 37, 1214−1222. (28) Winget, P.; Horn, A. H. C.; Selcuki, C.; Martin, B.; Clark, T. AM1* Parameters for Phosphorus, Sulfur and Chlorine. J. Mol. Model. 2003, 9, 408−441. (29) Winget, P.; Clark, T. AM1* Parameters for Aluminum, Silicon, Titanium and Zirconium. J. Mol. Model. 2005, 11, 439−456. (30) Ř eha, D.; Valdés, H.; Vondrásě k, J.; Hobza, P.; Abu-Riziq, A.; Crews, B.; de Vries, M. S. Vries, Structure and IR Spectrum of Phenylalanyl−Glycyl−Glycine Tripetide in the Gas-Phase: IR/UV Experiments, Ab Initio Quantum Chemical Calculations, and Molecular Dynamic Simulations. Chem. - Eur. J. 2005, 11, 6803−6817. (31) Grimme, S. Semiempirical GGA-Type Density Functional Constructed with a Long-Range Dispersion Correction. J. Comput. Chem. 2006, 27, 1787−1799. (32) Strittmatter, E. F.; Williams, E. R. Computational Approach to the Proton Affinities of Glyn (n = 1−10). Int. J. Mass Spectrom. 1999, 185, 935−948.

I

DOI: 10.1021/acs.jpcb.6b12415 J. Phys. Chem. B XXXX, XXX, XXX−XXX