J. Phys. Chem. 1995,99, 3379-3386
3379
Sensitivity Analysis of a Two-Dimensional Square Lattice Model of Protein Folding Richard E. Bleil and Chung F. Wong* Department of Physiology and Biophysics, Mount Sinai School of Medicine of the City University of New York, New York, New York 10029-6574
Herschel Rabitz Department of Chemistry, Princeton University, Princeton, New Jersey 08544 Received: May 18, 1994; In Final Form: November 3, I994@
In order to further test the use of the sensitivity analysis approach for identifying the determinants of protein structure and for guiding the design of novel bioactive molecules, we apply sensitivity analysis techniques to study a 2-dimensional lattice model of protein folding. The 2-dimensional lattice model is simple enough so that one can calculate sensitivity coefficients, which relate the properties of a model polypeptide to the parameters that control these properties, quickly without convergence problems as in molecular dynamics simulations. The predictions from a sensitivity analysis can also be checked easily by doing many exact calculations for this model. The sensitivity analysis technique is shown to be useful for elucidating traits that might be missed without the use of this method. The study of the 2-dimensional lattice model also suggests that first-order sensitivity theory is useful for identifying residues whose modifications may have the most profound effects on the properties of a model polypeptide.
I. Introduction The problem of protein folding is biologically important, but our understanding of the process is severely limited by many difficulties. Experimentally, means are found to “trap” certain structures that are partially folded in an effort to determine the “folding Theoretically, one of the most significant difficulties is the enormously overwhelming task of exploring a large configuration pace.^-^ The use of a two-dimensional square lattice model makes it easier to study the entire configuration space of the model for a relatively short chain.1°-19 The model itself is an extension of a model proposed by Zwanzig and Lauritzien20,21and later generalized by Go et al.22-25and Skolnick et al.26-31 In the current m~del,’O-’~ the protein is envisioned as a chain restricted to a two-dimensional array composed of either hydrophobic (designated by “h’)or hydrophilic (designated by “p”) residues in some given sequence. This sequence of residues is then allowed to “walk” on a square lattice in a self-avoiding manner. All configurations with a “mirror image” already in the configuration space are discarded as equivalent structures. The entire configuration space is thus no larger than 3n1-212. For chain length 13, this is about 9 x lo4, which can easily be scanned and have relevant information stored by a computer. Specific side chain interactions present in real proteins are not present in the model, so one should be cautious when trying to extrapolate information from this model to the “real world’. There are some nonsequential residues in the configuration space which are the same distance from one another as sequential neighbors along the chain. These pairs are referred to as “topological neighbors” and are assumed to be the only interactions which contribute energy to the system.12 This assumption removes the distances dependency from the pairwise interaction energy of the system. In this model, the only energy which is significant is pairwise hydrophobic interaction energy between topological neighbors.12 Thus, if residues i and j are
* To whom correspondence should be addressed. @
Abstract published in Advance ACS Abstracts, February 15, 1995.
0022-365419512099-3379$09.00/0
topological neighbors, then there is an energy E* contributed to the system only if both residue i and residue j have been previously designated as hydrophobic (“h’). The model has been set up such that if one or both residues involved in the pairwise topological interaction are hydrophilic, then the interaction energy is negligibly small. Aside from the advantage of being able to examine the entire configuration space available, there are also some parallels between this lattice model and real proteins.1°-19 There are some 2-dimensional analogues to real secondary structure^,^^-^^ such as the helix, the turn, and the antiparallel sheet. The parallel sheet also has a 2-dimensional analogue but is not significantly present except for relatively long chain lengths which are beyond the scope of this s t ~ d y . ’ O - ~It~ has been shown that the frequency of appearance of the previously mentioned secondary structures resembles the frequency of appearance in real proteins.1°-19 Several properties can be examined with this model. These include the system’s free energy, entropy, and internal energy, as well as equilibrium constants between two structures of the system. Ensemble average quantities can also be obtained.12s3* For instance, one can find the ensemble average number of bends, the average number of hydrophilic-hydrophobic contacts, the average compactness, the fraction of buried residues which are hydrophobic in the system, and the average number of hydrophobic contacts per bend.12,32Some of these quantities will be discussed in more detail later. Although sensitivity analysis has been utilized in one form or another in engineering and the study of chemical reactions and dynamics for many years,33-37its recent introduction into biophysics has brought forth a tool that seems to be a powerful technique in analyzing biomolecular structures and elucidating the major features of these molecule^.^^-^ The idea behind a sensitivity analysis is rather simple; one studies the relationships between the properties of a system and the parameters that control these properties by calculating first-order and higherorder derivatives of the properties with respect to the parameters. It has been applied to the study of relatively small molecules 0 1995 American Chemical Society
3380 J. Phys. Chem., Vol. 99, No. 10, 1995
Bleil et al.
and simple systems and has been developing steadly toward use in larger systems as a part of molecular dynamics s t u d i e ~ . ~ * - ~ Since the use of sensitivity analysis is still relatively novel in biophysics, much work is needed as yet to clarify its utility, pitfalls, and interpretation. The first-order sensitivity provides an efficient means to estimating the relative importance of different residues in affecting the properties of a polypeptide represented by Dill’s model. With the use of sensitivity analysis, we are able to ascertain the relative importance of each of the residues in a single run, without doing computer “mutagenesis” experiments which require explicit modifications of a sequence. The secondorder sensitivity illustrates the impact of changing the hydrophobic character of two residues simultaneously. This is a particularly insightful exercise if one is interested in determining how two residues cooperate to infuence the properties of a sequence. Aside from extracting more information from the model with fewer sequences, sensitivity analysis has also proven useful in emphazing certain trends that may not be altogether obvious without the use of this technique. Furthermore, the use of sensitivity analysis is insightful when choosing particular mutations to improve some given desirable qualities.
11. Formalisms of the 2-Dimensional Protein Folding Model Much of the current work has been given a form to resemble previous efforts with the rnodel.l2 Due to some of the specifics of calculating the sensitivities, however, we have made some small changes that can facilitate sensitivity analysis. Although previous works have utilized degeneracy for configurations with equivalent energy,12 we find it more convenient to not use degeneracy. Instead, we write the canonical partition function as an explicit sum over states, where each state corresponds to one acceptable configuration. This change makes it easier to carry out the sensitivity analysis as described in this paper. In order to maintain the formalisms introduced earlier, our choice of reference state will be the energy of that (those) configuration(s) with the maximum possible number of topological neighbours and assuming all these contacts are hydrophobic contacts.’* For any given chain length, this maximum number of possible topological neighbors, tmm,is fixed. Not all sequences can give tmaxhydrophobic interactions, but all sequences can give t- total interactions, including hydrophobichydrophilic and hydrophilic interactions. Therefore, the partition function is expressed as
z = Cexp(-E5)
(11.1)
5
where the sum is over all allowed configurations and Et is the energy of configuration 5 (11.2) Here, the sum is over all pairs of residues in the sequence, d(d - do) = 1 when d = di, and d(d - do) = 0 otherwise; d is the distance between topological neighbors defined to be the grid size of the lattice, do is the distance between residues i and j , and U U = € ( i ) EO’)
(11.3)
is the magnitude of the hydrophobic interaction energy between two residues i and j and has been given this form to help
facilitate a sensitivity analysis. To maintain agreement with previous works,12 every hydrophobic residue pair is assumed to interact with energy ch, so c(i) = lch11’2 if residue i is hydrophobic. c ( i ) is chosen to be small (10-l2) if residue i is hydrophilic. For clarification, the constant l l k ~ Twhere , k~ is the Boltzmann’s constant and T is the absolute temperature, has been absorbed into the constant ~ ( i ) In ~ .other words, the hydrophobic interaction energy is expressed in ~ B units. T Therefore, when E M increases, one can think in terms of an increase in hydrophobic interactions at a constant temperature or a decrease in absolute temperature with the hydrophobic interaction energy held fixed. Returning for a moment to considering the reference state, since the free energy of the system is FIk,T = -In Z
(11.4)
then, if the ground state configuration has all tmm hydrophobic contacts, as the hydrophobic interaction energy c h becomes very large, FIkBT approaches zero ifthe ground state is not degenerate. This is because as c~ becomes large, the only terms that would make the exponential in the partition function nonzero are those terms with tmaxhydrophobic interactions. If there is only one such configuration, then the total free energy would be zero. However, if there are multiple configurations with tmax hydrophobic interactions, then the total free energy would be less than zero. These results are the consequence of the choice of the reference energy to be t-ck, which varies as c b changes and with changes in chain length. Statistical thermodynamics gives the entropy S as Slk, = -
CP5ln(P5)
(11.5)
5
where the sum is over all configurations and
Pc = exp( -E5)/Z
(11.6)
As for the internal energy, we have from thermodynamics E=F+TS
(11.7)
Because of the way the model was set up, all configurations with the same number of hydrophobic contacts nh will have the same energy, that energy being (tmm - nh)Ehh, designated Es. Thus, if there are s different configurations with n h hydrophobic contacts, then the probability that any one given configuration will be in one of these s configurations with the same energy is related to the equilibrium constant, K(s), given by the formula
K(s) = s exp(-E,)l[Z - s exp(-E,)]
(11.8)
III. Formalisms of Sensitivity Analysis of the Folding Model
The sensitivity of any given observable to change in any parameter of a specific residue can be studied by calculating the derivative of the observable with respect to the parameter.33.34Thus, for example, the first-order sensitivity of the free energy of the system with respect to changes in the hydrophobic interaction energy of residue i, ~ ( i ) is , simply represented by a(F/kBT)/&(i). The second-order sensitivity of the free energy of the system with respect to simultaneous changes of c(i) and EO’) is a2(F/k~T)l&(i) &(j),and it reflects how residues i and j cooperate to affect the free energy. The second-order sensitivity of an observable with respect to a set
Two-Dimensional Square Lattice Model of Protein Folding of n, parameters can be represented by a matrix of size n, x ns. The diagonal (i =jJ is simply how the first-order sensitivity of free energy responds as E(i) changes. It is particularly interesting to use this simple lattice model to examine the calculation and utility of the second-order sensitivity coefficients since it is harder to calculate second-order sensitivity coefficients in more complicated models. For example, it has been found in studying model peptides and the protein bovine pancreatic trypsin inhibitor that second-order sensitivity coefficients converge slowly in molecular dynamics simulation^.^^ The derivations of these sensitivities are straightforward, and the details shall not be presented. For the thermodynamic variable F , the free energy of the system, it can be shown that
where (111.2)
J. Phys. Chem., Vol. 99, No. IO, I995 3381
(0) =Z 5
O exp(-E5)/Z
(111.12)
The sensitivity of any ensemble average has an equally general form. If 0 does not explicitly depend on E @ ) , it can be shown that for any observable 0 (111.13)
W ) ~ W =) (EkO) - (Ek)(O) and
Thus, to find the first- and second-order sensitivitiesof any given ensemble average quantity, all that is necessary is to substitute the quantity for 0 in eqs III.13 and III.14, respectively. Because in its definition the equilibrium constant K(s) is not divided by the entire partition function (eq 11.8), the form of the sensitivity of K(s) is slightly different from the other observable quantities. It can be shown that
& exp(-Es)/[Z - s exp(-Es)] exp(-Eg) - & exp(-Es)l/[Z - exp(-Es)l
aK(s)/ac(k)=
For the second-order sensitivity, we have
S
K(S)[& 5
S
S
(111.15)
where
where the sum over s is only over those configurations with s hydrophobic contacts. Furthermore, it can be shown that the second-order sensitivity of the equilibrium constant is
a2K(S)/a€(k)aE(0 =
Kkl
eXp(-E,)/[Z - S exp(-Es)l
+
S
and
&er
exp( -Es)/[Z - s exp( -Es) - {&
S
exp( -Es)/[Z -
S
s ~ X P ( - E ~ ) I H & e x p ( - ~ < )-
5
s exp(-Es)11 -
el exp (-E,)I/[~ S
{C exp(-E,)l[Z - exp(-Es)11 x - & exp(-ES)1/[Z - s exp(-Es)ll El
S
S
exp(-E5) 5
S
K(s){[C e x ~ ( - ~ g-) 5 Kkl
~XP(-E,)II -
w{[&
& ~ X (-Es)l/[z P S
~XP(-E,)I/[Z -
S
exp(-E5) 5
exp(-E,)l/[Z - s exp(-E,)Il For the internal energy, we have
&kl
+ 2K(s){[&
S
ekel x
exp(-E$
-
5
- s ex~(-EJll{[&
exp(-Eg) 5 exp(-E,)]/[Z - s exp(-E,)]} (111.16)
zcl S
and i3*(E/kBQ/ac(k)&(l) = a2(F/kBT)/ac(k)&(l)
+
a2(s/kB)/a€(k) a@) (111.1 1) Now, we come to the ensemble averages. For any observable 0, where 0 can be the average compactness, average number of hydrophobic contacts, average fraction of buried residues that are hydrophobic, average number of bends, average number of hydrophobic contacts per bend, or any other observable quantity, we can write the ensemble average generally as
So far we have been discussing absolute sensitivities. To facilitate comparison of the sensitivities of different ensembleaveraged quantities, we choose to display and discuss our results in terms of “log plot^".^*-^^ Consider any observable, Q, where Q can be an equilibrium constant (K(i)), a thermodynamic quantity (F, S or E), or one of the ensemble averages. The log sensitivity of [aQ/ac(i)][~(i)/Q] = a In Q/a In ~ ( i ) . We also scale the second-order plots as [ $ Q / a ~ ( ia) c ( j ) ] [ ~ (c(j)/Q*]. i) Through the use of the Taylor expansion, it should be possible to utilize the results of the sensitivity analysis to estimate the results of making a given single or double mutation. For any
Bleil et al.
3382 J. Phys. Chem., Vol. 99, No. 10, 1995
.3
% v
a
M c is
-0.14 -0.18 -0.22 -0.26
-0.08 -o.l -0.12 -0.14
-0.02 -0.025 -0.03 -0.035
-&hh
Figure 1. Equilibrium constants K(i) as the hydrophobic interaction energy becomes increasingly negative. The value i is the number of hydrophobic interactions for which the equilibrium constant is calculated. given observable, Q, the effect of changing the hydrophobic energy of residue i, or residue j , or both on Q should be
I
0
2
-1.76 -2.08
1/
1
0
1
I
2
Ehh'2.0
I
Although in the study leading to this publication the authors have examined a great variety of different sequences, it has been determined that most of the principal findings can be demonstrated by considering a single sequence, namely, the all hydrophobic sequence for chain length 10. Therefore, we restrict our discussion, for brevity, to the all hydrophobic sequences unless otherwise noted. We begin this discussion with the new quantities which we would like to introduce. In Figure 1, we show the plot of the equilibrium constants vs the hh interaction energy for the all hydrophobic sequence of chain length 10. Unlike the average number of hh interactions used previously by other authors,12 the equilibrium constant plots provide a quick detailed overview of the prominent and increasingly significant interactions in the system. Here, we see that configurations with 1 hydrophobic contact for E& are favored between 0 and 0 . 4 5 k ~ T .Between approximately 0 . 4 5 k ~ Tand 0.7k& configuration with two hydrophobic contacts are favored, and after about 0 . 7 k ~ T , configurations with all four hydrophobic contacts are favored. Two prominent observations can be made. First, except for large values of E M , configurations with 0-4 contacts are significant. There is no region in which one particular configuration type is so strongly favored that all other types of configurations can be neglected. Second, this graph seems to show an intermediate, partially folded state with only two hydrophobic contacts favored which exists over only a short range of interaction energy. As E M increases, thermodynamically the model favors one contact, then two, and finally all four contacts.
8
j
-1.44
IV. Results and Discussion
6
Residue Number Figure 2. First-order log sensitivity of the entropy with respect to changes in the contribution to the interaction energy by residue i, c(i), for hydrophobic interaction energies cm = 0.01, 0.5, 1, 2, and 5. -1.12
provided Ac(i) is small. We found that this was not completely reliable at predicting the result of a specific mutation, especially at high values of Ac(i), although sensitivity analysis was useful for predicting which residues will most likely cause the greatest or the smallest changes in a given quantity once mutated.
4
I
c -0.18 -0.21
E=:;:;: mo j2 -0.27
1
-0.0044
0
2
Residue Number Figure 3. First-order log sensitivity of the equilibrium constant of configurations with 0 hydrophobic interactions with respect to changes , in the contribution to the interaction energy by residue i, ~ ( i )for hydrophobic interaction energies E M = 0.01, 0.5, 1, and 2. In Figures 2 and 3, we show two first-order sensitivity analysis plots that are representative of virtually all of the firstorder sensitivity plots we have seen for the hydrophobic sequence of chain length 10. Shown are the first-order sensitivities of the entropy and the equilibrium constant K(0). Immediately we see at least one utility of using sensitivity analysis in examining this model. Previous authors have demonstrated, via examination of a large portion of the sequence space, the significance of the terminal residues in determining the characteristics of the lattice protein folding Examination of Figures 2 and 3 verifies the significance of the end group through the examination of only one sequence. Furthermore, we simultaneously see the relative significances of all remaining residues as well. For example, residues 2 and 9 (n, - 1) tend to show the lowest sensitivities in these plots at all hydrophobic energies. At low hydrophobic energies, residues 4, 5, 6, and 7 also show comparable sensitivities to the end groups. However, their sensitivities become substantially reduced at high hydrophobic energies. These results are harder to predict a priori and demonstrate the usefulness of the sensitivity analysis method
. I . Phys. Chem., Vol. 99,No.
Two-Dimensional Square Lattice Model of Protein Folding -0.096 1 2 -0.1 -0.128 -0.144
a ro
ns= 1 o
-0.1
-0.12 -0.14
-0.104-0.128-0.1SZ -0.176-
e
-0.7
-
ns=9
-
ns= 12
-0.8
-0.4 -0.5
3
-
\
I
L
y
h
-0.7 -0.43-
involving five residues, an odd number, is not possible. The pattern in these plots seems to be the result of this constraint on contacts between the residues. It can be shown that, as a result of these constraints, the pattern of the number of other residues each residue can contact behave differently for even and odd numbered chains. For even chain lengths, the number of residues any given residue can contact for even chain lengths gives rise to the pattern j l...j...j...j...,while for odd chain lengths the pattern is j 1.J 1J.J 1J.J l...j...j 1 .... For example, for chain length 10,beginning with residue number 1 and ending with residue number 10, the number of other residues each residue can contact is 4...3...3...3...3...3..,3...3...3...4. However, for chain length l l , again beginning with residue number 1 and ending at 11, the number of other residues each given residue can contact in the configuration space is 4...4...3...4...3...4...3.,.4...3...4...4. These differences give rise to the jagged kind of sensitivity pattern for odd chain lengths and a smoother pattern for even chain lengths. That the number of contacts any given residue can make in the configuration space reflects significantly in its sensitivities is further shown in comparing the sensitivities of the end groups with the central residues. As shown in Figures 4 and 5, the end groups for the odd lengths chains do not show significantly higher sensitivity than the central residues that form the same number of contacts as the end groups. However, for the even chain lengths, the end residues are more sensitive than the central residues. In both even and odd chain lengths, the end residues always can contact the maximum number of other residues in the chain. For even chain lengths, this is always one more residue than any other residue in the chain, thereby causing the end groups to have higher sensitivities than any other residue in the chain. However, for odd chain lengths, every other residue can come in contact with this same number of other residues (Le., for chain length 11, residues number 1 , 2 , 4 , 6, and so forth can all make contact with four other residues, while in chain length 10 it is only the end residues that can come in contact with four other residues). Although the fact that two residues that are separated by an odd number of residues cannot form direct contacts in the 2-dimensional square lattice model has been briefly mentioned by other authors,18its consequence is further demonstrated here. It can be shown that these results are extendible to the cubic lattice as well. In Figure 6, we see the second-order sensitivities for the entropy for the chain length 10 all hydrophobic sequence. The first point deserving attention is the strong interaction between residues i...i 3, i..i 5 , i...i 7, and i...i 9. These indications are most clearly evident at low hydrophobic interaction energy. The cause is again due to the fact that only residues separated by an even number of residues can have direct contact. However, even though residues that are separated by an odd number of residues cannot form direct contact, one can still observe cooperativity of these residues, especially at high hydrophobic energies. These cooperative effects are mediated by the surrounding residues and are harder to predict by intuition alone. Sensitivity, analysis, however, provides an efficient means to analyzing such cooperative effects. Finally, to study the reliability of sensitivity analysis in predicting the results to mutations, we show in Figures 7 and 8 the actual results of mutations in the all hydrophobic sequence for hydrophobic energy EM = 1. In these plots, the number along the abscissa refers to which residue has been mutated from hydrophobic to hydrophilic in order to generate this plot. We see that, in general, the changes made reflect the changes predicted in the sensitivity analysis plots (Figure 2 and 3). In
+
-0.08
s
g
-0.1 -0.12 -0.14 -0.16
A
IO, 1995 3383
+
+
+
+
+
2 4
:% -0.7
0
10
1 2
14
Residue Number Figure 5. First-order log sensitivity of the equilibrium constant of configurations with 0 hydrophobic interactions with respect to changes in the contribution to the interaction energy by residue i, E@, for hydrophobic interaction energy E* = 1, for chain lengths n, = 9-13. in identifying the key components of a molecule that determine the properties of the molecule. Another trait of the model demonstrated by the sensitivity analysis is shown in Figures 4 and 5 . Here we plot exactly the same quantities as plotted in Figures 2 and 3, again for the purely hydrophobic sequence, but this time for chain lengths 9-13. The most striking feature of these sensitivity plots is the zigzag structure for odd chain lengths, as compared with the even chain length plots, which show a fairly smooth interior region of the plots. These plots show that an odd lengthed chain will behave differently than an even lengthed chain. The cause for this is that, in the square lattice model, even numbered residues can have direct interaction only with odd numbered residues. For instance, an i..i 3 interaction is possible because there are an even number of residues, four, in the loop. Likewise, there are an even number of residues in the loop with i...i 5 , i...i 7, and i...i 9 interactions. However, an i...i 4 interaction,
+
+
+
+
+
+
+
+
+
3384 J. Phys. Chem., Vol. 99, No. IO, I995
Bleil et al. 1-
234-
5-
6789-
IO-1
2
3
4
5
6
7
8
9
10
1
2
3
4
5
6
7
8
9
10
7
8
9
1
2 3 4
5 6
7 8
9 10
1
2
3
4
1
1
5
6
,
7
1
8
9
0
I, 6
7
8 9
10 1
2
3 4 5 6 7 8 9 ' Figure 6. (a, top left) Scaled second-order sensitivity of the entropy with respect to changes in the contribution IO the interaction energy by residues i and j , f(i) and (ti), for hydrophobic interaction energies f m = 0.01. Positive correlations are represented by dots on a white background. Higher densities of dots suggest higher magnitudes of correlations. Negative correlations an represented similarly except on a darker background. (b, middle left) Same as (a), except f w = 0.5 (c. bottom leii) Same as (a), except EM = 1.0. (d, top right) Same as (a), excevt shh= 2.0. (e. middle right) Same as (a), except chh = 5.0. virmally all of the quantities examined, we found that, for single mutations, the first-order sensitivity coefficient is a good indicator of the changes we can expect, with some few exceptions. To get an idea of the quantitative reliability of the sensitivity analysis predictions, we ran a series of mutations on a variety of sequences. Figure 9 presents the two mutation paths we have chosen. Rather than beginning with the all hydrophobic sequence as before, we choose instead to begin with a sequence which contained both hydrophilic and hydrophobic residues. We then change one residue at a time and use the sensitivity analysis to predict the change that we would expect in 13 different observables. which can be categorized as follows: for the thermodynamic quantities we predicted changes in the free energy, entropy, and internal energy; for the equilibrium constants we predicted changes in K(0).K(l), K(2), K(3), and
K(4); for the ensemble average observables we predicted changes in the compactness, number of hydrophobic-hydrophobic interactions, bends, hydrophobic-hydrophobic interactions per bend, and number of buried residues that are h y d r ~ p h o h i c .For ~ ~ ~each ~ ~ of these quantities, we made the predictions at hydrophobic interaction energies of 0.01 with Aeh = 0.1, 1.0 with Aeh = 1.0 and of 4.0 with Aeh = 2.0, making predictions in both directions of any given mutation. Finally, we included the second path shown in Figure 9, with E,,,, = 4.0, but with AEh = 1.0. and referring to any given residue i with e(i) = 1.0 as hydrophobic. In all mutations, we restricted our predictions to the direction of the change only, and considering only the first-order sensitivity term, resulting in over 660 predictions in all. In Table 1, we have tabulated the results of the first-order sensitivity predictions of mutations as described above. We
J. Phys. Chem., Vol. 99, No. 10, 1995 3385
Two-Dimensional Square Lattice Model of Protein Folding 7.05 1
I
7l\
I
I
I
I
hPPhPPhPPh
1
f
6.95
6.8
0
I
1
I
I
I
I
2
4
6
8
IO
12
Residue Number Figure 7. Results of actual mutations in the entropy as -EM increases. 0.11
,
I
I
I
I
I
I
0.105
1
0.1 h
2
0.085
t
4
0.08
0.075 0
Figure 9. (a, top) Mutation pathways: changes made to test the use of sensitivity analysis for predicting the results of changes in the sequence. All mutations h p or p h. (b, bottom) Same as (a), except mutations are h x, p x, x h, or x p. Here, x is a residue whose contribution to the hydrophobic energy is only half of
- - - --
0.095 0.09
hhphpphxph
2
4
I
Y
I
I
6
8
10
12
Residue Number Figure 8. Results of actual mutations in the equilibrium constant K(0) as -cb increases. have excluded the second-order terms, because that greatly complicates the picture and has been found to decrease the accuracy of the predictions. One thing that we note immediately from this table is that the accuracy of the-prediction decreases as hydrophobic energy increases. This is problematic, because in the square lattice model it is the higher hydrophobic energies that correspond to physiologically interesting qualities of these proteins. Overall, we find that at low hydrophobicities we have over 90% correct predictions in the direction (but not magnitude) of the changes once the mutations are made. At higher hydrophobicities, this accuracy decreases to roughly 75%. Therefore, from these studies, it appears that fmt- and secondorder sensitivity analyses are particularly useful for identifying the residues that are most or least important in determining a certain property of a macromolecular system. It is, however, harder to pick one residue at random and try to quantitatively predict with sensitivity analysis how a property of a macromolecule may be affected by a modification of this residue.
V. Conclusions Due to the large computational costs required to carry out molecular dynamics simulations, we have used a relatively simple 2-dimensional lattice model of protein folding to test the use of the sensitivity analysis approach for identifying the determinants of the properties of biomolecular system and for guiding the design of novel bioactive molecules. For this model, the predictions from a sensitivity analysis can be checked by carrying out many exact calculations, which is difficult to do with molecular dynamics simulations.
-
that of h.
TABLE 1: Predictions of Results of Mutations Using First-Order Sensitivity Analysis EM AEh correct incorrect % correct 0.01 1 4 4
0.1 1 2 1
133 128 119 152
14 25 36 56
90 84 77 13
total
532
131
80
Even for this relatively simple model of polypeptides, sensitivity theory was found to be very useful for demonstrating traits of the model that were not expected a priori. For example, two residues that did not interact directly could still cooperate to affect the properties of the polypeptide model. These effects were mediated by the other residues in the model polypeptide and were harder to predict by intuition alone. Sensitivity analysis provides an efficient and economical tool to systematically identify such effects. In evaluating the ability of the first-order sensitivity theory to aid molecular design, the patterns predicted via the firstorder sensitivity have been shown to in general give the correct qualitative features when single residues are changed. The use of the first-order sensitivity analysis was shown to be an excellent indicator of the direction of changes to be expected upon sequence mutation for low hydrophobic energies (90% accuracy for EM < I), but only moderately reliable for higher hydrophobic energies (75% for .GM> 1). However, first-order sensitivity analysis appears to be very useful for identifying residues whose modifications may affect the properties of the model polypeptide the most or the least, even at high hydrophobic energies. Acknowledgment. This research has been supported by the Petroleum Research Fund, administered by the American Chemical Society, the National Institutes of Health, and the Bristol-Meyer Squibb Institute for Medical Research.
3386 J. Phys. Chem., Vol. 99, No. 10, 1995 References and Notes (1) Schultz, G. E.; Schrimer, R. H. Principles of Protein Structure; Springer-Verlag: New York, 1990; p 154ff and references therein. (2) Anfinsen, C. B. Science 1973, 181, 223. (3) Beeman, D. J. Comput. Phys. 1976, 20, 130. (4) Creighton, T. E. J . Phys. Chem. 1985, 89, 2452. ( 5 ) Creighton, T. E. Proc. Natl. Acad. Sci. USA. 1988, 85, 5082. (6) Skolnick, J.; Kolinski, A. Annu. Rev. Phys. Chem. 1989, 40, 207. (7) McCammon, J. A,; Harvey, S . C. Dynamics of Proteins and Nucleic Acids; Cambridge University Press: New York, 1987; p 42 and references therein. (8) Gilson, M. K.; Sharp, K. A.; Honig, B. H. J . Comput. Chem. 1988, 9, 327. (9) Gilson, M.; Honig, B. Proteins 1988, 4, 7. (10) Dill, K. A. Biochemistry 1985, 24, 1501. (11) Chan, H. S.; Dill, K. A. J. Chem. Phys. 1989, 90, 492. (12) Lau, K. F.; Dill, K. A. Macromolecules 1989, 22, 3986. (13) Chan, H. S.; Dill, K. A. Macromolecules 1989, 22, 4559. (14) Lau, K. F.; Dill, K. A. Proc. Natl. Acad. Sci. U S A . 1990,87,638. (15) Chan, H. S.; Dill, K. A. Proc. Natl. Acad. Sci. U S A . 1990, 87, 6388. (16) Dill, K. A. Biochemistry 1990, 29, 7133. (17) Chan, H. S.; Dill, K. A. Annu. Rev. Biophys. Biophys. Chem. 1991, 20, 447. (18) Chan, H S.; Dill, K. A. J . Chem. Phys. 1991, 95, 3775. (19) Chan, H. S.; Dill, K. A. Phys. Today 1993, 46, 24. (20) Zwanzig, R.; Lauritzen, J. J. Chem. Phys. 1968, 48, 3351. (21) Lauritzen, J.; Zwanzig, R. J. Chem. Phys. 1970, 52, 3740. (22) Taketomi, H.; Ueda, Y.; Go, N. Int. J . Peptide Res. 1975, 7, 445. (23) Go, N. Adv. Biophys. 1976, 9, 65. (24) Ueda, Y.; Taketomi, H.; Go, N. Biopolymers 1978, 17, 1531.
Bleil et al. (25) Go, N.; Taketomi, H. Int. J. Pept. Protein Res. 1979, 13, 447. (26) Kolinski, A.; Skolnick, J.; Yaris, R. Proc. Natl. Acad. Sci. U S A . 1986, 83, 7267. (27) Kolinski, A.; Skolnick, J.; Yaris, R. Biopolymers 1987, 26, 937. (28) Skolnick, J.; Kolinski, A.; Yaris, R. Proc. Natl. Acad. Sci. U.SA. 1988, 85, 5057. (29) Skolnick, J.; Kolinski, A. Science 1990, 250, 1121. (30) Kolinski, A.; Milk, M.; Skolnick, J. J . Chem. Phys. 1991,94,3978. (31) Skolnick, J.; Kolinski, A. J. Mol. Biol. 1991, 221, 499. (32) Bleil, R. E.; Mohanty, U. Manuscript in preparation. (33) Tomavick, R.; Vukobratovic, M. General Sensifivity Theory; Elsevier: New York, 1972. (34) Franck, P. Introduction to System Sensitivity Theory; Academic Press: New York, 1978. (35) Rabitz, H.; Kramer, M.; Dacol, D. Annu. Rev. Phys. Chem. 1983, 34, 419. (36) Rabitz, H. Chem. Rev. 1987, 87, 101. (37) Rabitz, H. Science 1989, 246, 221. (38) Susnow, R.; Nachbar, R. B., Jr.; Schutt, C.; Rabitz, H. J . Phys. Chem. 1991, 95, 8585. (39) Thacher, T.; Hagler, A.; Rabitz, H. J . Am. Chem. SOC.1991, 113, 2020. (40) Wong, C. F.; Rabitz, H. J. Phys. Chem. 1991, 95, 9628. (41) Wong, C. F. J. Am. Chem. SOC.1991, 113, 3208. (42) Zhu, S.-B.; Wong, C. F. J. Chem. Phys. 1993, 98, 8892. (43) Zhu, S.-B.; Wong, C. F. J. Chem. Phys. 1993, 99, 9047. (44) Wong, C. F.; Zheng, C.; Shen, J.; McCammon, J. A,; Wolynes, P. J . Phys. Chem. 1993, 97, 3100. (45) Wong, C. F. Unpublished results. JP941217G