Computing Stability Effects of Mutations in Human Superoxide

Jan 28, 2014 - In this work, stability changes of SOD1 mutations were computed with five methods, CUPSAT, I-Mutant2.0, I-Mutant3.0, PoPMuSiC, and SDM,...
1 downloads 0 Views 3MB Size
Article pubs.acs.org/JPCB

Computing Stability Effects of Mutations in Human Superoxide Dismutase 1 Kasper P. Kepp* DTU Chemistry, Technical University of Denmark, DK 2800 Kongens Lyngby, Denmark S Supporting Information *

ABSTRACT: Protein stability is affected in several diseases and is of substantial interest in efforts to correlate genotypes to phenotypes. Superoxide dismutase 1 (SOD1) is a suitable test case for such correlations due to its abundance, stability, available crystal structures and thermochemical data, and physiological importance. In this work, stability changes of SOD1 mutations were computed with five methods, CUPSAT, I-Mutant2.0, I-Mutant3.0, PoPMuSiC, and SDM, with emphasis on structural sensitivity as a potential issue in structure-based protein calculation. The large correlation between experimental literature data of SOD1 dimers and monomers (r = 0.82) suggests that mutations in separate protein monomers are mostly additive. PoPMuSiC was most accurate (typical MAE ∼ 1 kcal/mol, r ∼ 0.5). The relative performance of the methods was not very structure-dependent, and the more accurate methods also displayed less structural sensitivity, with the standard deviation from different high-resolution structures down to ∼0.2 kcal/mol. Structures of variable resolution and number of protein copies locally affected specific sites, emphasizing the use of state-relevant crystal structures when such sites are of interest, but had little impact on overall batch estimates. Protein-interaction effects (as a mimic of crystal packing) were small for the more accurate methods. Thus, batch computations, relevant to, e.g., comparisons of disease/ nondisease mutant sets or different clades in phylogenetic trees, are much more significant than single mutant calculations and may be the only meaningful way to computationally bridge the genotype−phenotype gap of proteomics. Finally, mutations involving glycine were most difficult to model, of relevance to future method improvement. This could be due to structure changes (glycine has a low structural propensity) or water colocalization with glycine.



INTRODUCTION An urgent scientific challenge is to translate the vast amount of genomic data into phenotypes, preferably by mapping the effects of single amino acids on the overall protein properties, and in turn, on the cell and organism physiology. Such a mapping would aid both toward engineering new proteins for, e.g., industrial purposes and toward understanding and managing genetic disease risks all the way down to the individual, as such a mapping from genotype to phenotype would carry substantial diagnostic and therapeutic potential.1,2 Among protein properties potentially mapped in this way are the functionalities of signal sites, catalytic proficiencies (e.g., kcat, Km of enzymes), modification propensities of surface sites (e.g., glycosylation or phosphorylation), and general properties such as protein stability.3−5 Despite focus on all these properties, there is little effort to understand them in connection, except perhaps via whole-cell models6 and extraordinary cases where the chemical properties of systemically important proteins can be mapped all the way to the organism fitness level.7,8 The enzyme superoxide dismutase 1 (SOD1, Figure 1) is one of three SOD isoforms in humans that protect against oxidative stress caused by infections, toxic exposure, and the mitochondria’s production of superoxide, O2−, as a byproduct during oxidative metabolism.9,10 SOD1 is a homodimer with Cu and © 2014 American Chemical Society

Zn in its active site, and it performs two half-reactions in a catalytic cycle that oxidizes two O2− radicals into one molecule of O2 and one H2O2, while Cu shifts between oxidations states I and II, viz., eqs 1 and 2:9,11 Cu(II) + O2− → Cu(I) + O2

(1)

Cu(I) + O2− + 2H+ → Cu(II) + H 2O2

(2)

SOD1 is an important protein in higher organisms, due to its role in antioxidant modulation and defense. This importance is clear from its direct genetic role in amyotrophic lateral sclerosis (ALS), a devastating disease that attacks the motor neurons.12−14 ALS-causing SOD-1 mutants gain an unknown toxic function,9,14 hypothesized to relate to either protein misfolding or aggregation15,16 or redox toxicity.17,18 Protein misfolding is seen to be coupled to metal release,19 and many mutations have been identified that reduce stability9,20 or reduce metal content,11,21 or a combination of these.22 Despite its pathological importance, there have been no attempts at modeling and understanding the stability effects of SOD1 at the theoretical level. Also, the many available crystal Received: December 5, 2013 Revised: January 28, 2014 Published: January 28, 2014 1799

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Figure 1. (A) Human superoxide dismutase (Protein Data Bank structure 2C9V), with zinc and copper shown in purple and blue. (B) Correlation between literature experimental free energies (kcal/mol) of mutagenesis for dimer and monomer SOD1.

structure.37,38 Thiltgen and Goldstein39 also identified this effect independently and suggested using it for benchmarking protein stability predictors. Because of the results from previous benchmarks,23,29,31 the present study focused on available web-server-based methods PoPMuSiC32,33 and I-Mutant2.025 and I-Mutant3.0 that were found to do well in independent benchmarks,23,29,31 together with SDM31 and CUPSAT,35,36 which together represent several design philosophies. The two goals of the work were (i) to understand in detail how variations in structural input affect the results of such methods and (ii) to identify accurate predictors of the stability effects of SOD1 mutations.

structures of SOD1 provide an opportunity for testing the role of structural input on the protein properties predicted by computational methods. Structures in the protein database vary substantially due to methodological factors such as overall resolution, poorly resolved local geometries, missing residues, parameters used for assignment of heteroatom geometries, and the programs and algorithms used for refinement. The structures also vary as they reflect different chemical systems, e.g., different crystal types and unit cells, different numbers of protein molecules per structure file (crystal packing forces may affect structures indirectly even when only one protein is present in each structure file), different environments and conditions (buffer, ionic strength, temperature, and pH), and different protein states measured (oxidation states, posttranslation modified states, and active or inactive forms, etc.). This heterogeneity in deposited protein structures, although poorly explored, could potentially critically affect the robustness and physiological relevance of structure-based protein property prediction. Several attempts have been made at evaluating the relative performance of theoretical methods for calculating the stability of protein variants: In one comparison,23 EGAD,24 IMutant2.0,25 and CC/PBSA26 performed best in terms of overall correlation coefficients (r), mean absolute error (MAE), and root-mean-square deviation (RMSD) relative to experimental data, vs well-known competitors such as FoldX27 and Rosetta.28 Another independent benchmark study found IMutant3.0 to be most accurate.29 Many methods have a preference toward destabilization which reflects the general distribution of mutations to be shifted toward destabilization.30 The method SDM31 was suggested to predict more accurately the minority of stabilizing mutations,31 although the method PoPMuSiC32−34 performed best in terms of correlation (r = 0.67−0.71 for the data sets) and substantially better than FoldX, ERIS, I-Mutant2.0, and CUPSAT.35,36 We previously reported several applications of the software FoldX that identified a range of methodological challenges relating to structural sensitivity and wild-type sampling bias, as evident, e.g., in differences in forward and backward mutations, referred to as “hysteresis” in analogy to free energy perturbation methods, because it arises from the better-sampled crystal



METHODS Experimental Data Used for Benchmarking. The experimental data available for the change in free energy of folding, ΔΔG, associated with single-point mutations in SOD1, were taken from Nordlund and Oliveberg,40 Lindberg et al.,41 Stathopulos et al.,42 and Byström et al.43 In total, the data derived from differential scanning calorimetry and kinetic unfolding experiments amounted to 54 values of ΔΔG for the apomonomer and 33 data points for mutants of the holodimer SOD1. The data are collected in Table 1 together with the solvent accessibility of each residue, given in percent of the corresponding fully exposed amino acid side chain, and the closest distance between identical sites in the two monomers of the dimer structure 2C9V. The methodological error in the experimental ΔΔG is on the order of up to ∼0.3 kcal/mol.41,42 To test the intrinsic variation in experimental data due to the use of either of these protein forms, the two experimental data sets were first correlated, as such variation between protein forms provides a first, underlying premise for the accuracy of computational predictions. Tested Methods. Methods were selected on the basis of their performance in previous benchmarks, as discussed in the Introduction (I-Mutant,25 PoPMuSiC,32 and EGAD24), on the ability to model all possible mutations (i.e., they should be generally applicable to any type of mutation, ruling out, e.g., EGAD,24 which cannot calculate ΔΔG for mutations involving Pro, Cys, and Gly), they should give quantitative output on ΔΔG (i.e., not just “stabilizing/destabilizing as, e.g., MuPro), 1800

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Table 1. Effects of Point Mutations on the Stability of Human SOD1 (ΔΔG < 0, Stabilizing; ΔΔG > 0, Destabilizing) mutant A4V I18V V29A V31A I35V L38A L38V G41D G41S H43R F45A H46R V47A F64A D76V D76Y V81A L84A L84V G85R N86D N86K N86S V87A D90A D90V G93A G93S G93R G93D G93V V97A E100G D101G D101N I104A I104F S105L L106V L106A C111A I112A I113T G114A L117A V119A N139D N139K L144A L144S L144F V148G I149A I149V

exptl ΔΔGmonomer (kcal/mol) 41

1.62 0.3740 2.8140 1.3940 0.6640 2.9540 2.2543 2.9141 2.9341 2.6841 2.0740 −0.8141 1.4540 −0.2040 0.2043 0.2143 0.0140 1.8540 1.8741 1.9341 −0.0543 0.6643 −0.0743 1.6140 0.6641 1.4143 2.4341 3.7042 4.4042 5.7042 7.0042 2.9040 0.9141 0.7243 −0.8043 1.5340 0.7241 1.8143 1.7841 3.7140 −0.4240 1.2540 1.2541 2.3343 1.7840 2.5540 0.0143 0.0643 −0.1340 0.2043 0.2341 2.5341 4.0540 0.2740

exptl ΔΔGdimer (kcal/mol) 4.31

41

3.2441 3.4741 4.4741 4.0541 −0.4841

0.0643 0.0943

2.6541 0.9041 0.9443 1.4143 0.4543 0.6541 1.8543 2.9841 3.1042 3.0042 4.5042 5.4042 2.2241 1.3943 −0.7543 1.2441 2.6043 3.6241

2.4841 3.2743

−0.3643 0.2443 1.0743 1.8941 4.5641

% solvent accessibilitya

closest distance to twin site (Å)b

1.8 0.0 0.0 0.0 0.0 6.0 6.0 28.0 28.0 1.1 0.0 0.0 0.0 3.0 40.3 40.3 2.6 1.6 1.6 2.5 25.3 25.3 25.3 4.6 52.7 52.7 6.4 6.4 6.4 6.4 6.4 5.9 85.9 7.6 7.6 0.5 0.5 4.3 3.8 3.8 18.5 0.0 10.3 0.0 0.0 0.0 34.8 34.8 27.9 27.9 27.9 14.3 0.0 0.0

16.1 (CB−CB) 21.6 (O−O) 26.3 (CG1−CG1) 29.7 (CG1−CG1) 28.4 (CG2−CG2) 33.8 (N−N) 33.8 (N−N) 46.6 (O−O) 46.6 (O−O) 39.2 (CB−CB) 28.9 (CE1−CE1) 25.5 (O−O) 19.8 (CG1−CG1) 20.0 (CZ−CZ) 46.8 (O−O) 46.8 (O−O) 26.1 (CG1−CG1) 29.5 (CD1−CD1) 29.5 (CD1−CD1) 40.3 (N−N) 43.4 (CA−CA) 43.4 (CA−CA) 43.4 (CA−CA) 37.3 (CG2−CG2) 46.5 (OD1−OD2) 46.5 (OD1−OD2) 36.9 (O−O) 36.9 (O−O) 36.9 (O−O) 36.9 (O−O) 36.9 (O−O) 35.2 (CG2−CG2) 39.5 (C−C) 33.3 (O−O) 33.3 (O−O) 24.5 (C−C) 24.5 (C−C) 17.1 (O−O) 12.2 (O−O) 12.2 (O−O) 8.5 (SG−SG) 11.1 (O−O) 3.7 (O−CG2) 4.3 (N−CA) 16.6 (N−N) 26.4 (N−N) 48.5 (C−C) 48.5 (C−C) 22.1 (O−O) 22.1 (O−O) 22.1 (O−O) 4.4 (O−CG2) 5.6 (O−O) 5.6 (O−O)

a

Computed with CUPSAT. bClosest distance of any two atoms in twin sites, as calculated from crystal structure 2C9V. Bold sites are metal-region sites, defined as having any atom within 8 Å of either Cu or Zn. Of these, H46 directly binds Cu.

were selected for this work, all providing web-based, quantitative estimates of stability effects of all possible mutations: CUPSAT,36 I-Mutant2.0,25 I-Mutant3.0, PoPMu-

and on their general availability as web-servers (ruling out, e.g., CC/PBSA26 and FoldX,27 which we have however investigated in detail in previous work37,38). Thus, five stability predictors 1801

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

average ΔΔG reported; thus, all five methods report a free energy change of a single mutation. The experimental dimer data correspond to mutations in both chains and are thus not directly comparable, as will be discussed. As seen from the results of this work, the three metrics used are distinct and not strongly correlated: The correlation coefficient (r) from a linear regression analysis describes the ability of a method to provide the correct overall trend in the data set; the mean signed error (MSE) describes the systematic error in kilocalories per mole toward either destabilization and stabilization; and the mean absolute error (MAE, kcal/mol) describes the overall numerical accuracy of the method compared to experimental data. Thus, all three metrics should be used for assessing the performance of protein property calculations. A fourth metric, the structural robustness (defined as the standard deviation from the average obtained for all five structures), is also investigated in this work, as it is important for studying robustly proteins with crystal structures of variable resolution and representing different types of chemical environment and protein state.

SiC2.1,32,33,44 and SDM.31 While there are also sequence-based versions of some of the methods, the structure-based versions were used for comparison to the other structure-based approaches. CUPSAT uses atom potentials from chemical properties and empirically derived torsion potentials. PoPMuSiC and SDM use environment-specific statistical potentials based on empirically observed substitution probabilities as developed by Topham et al.45 The latest version of PoPMuSiC2.1 that uses structural input was applied.34,44 IMutant2.0 and I-Mutant3.0 (the structure-based versions are used in this work) use support vector machines accounting for amino acid substitution and structural environment, trained on ∼2000 experimental data points from the Protherm data base.46 Protein Structures and Structural Sensitivity. To test the sensitivity of methods to the choice of crystal structure used for computing stability effects upon mutation, we performed all calculations for the data set with five different Protein Data Bank (pdb) structures: 2C9V,47 2V0A,48 1PU0,49 1HL4,50 and 1HL5.50 Structures 2V0A and 2C9V both represent the physiologically relevant holodimer structure of similar high (near-atomic, 1.15 Å) resolution with only one dimer per structure file, but display some structural differences, with RMSD ∼ 1.3 Å. Thus, they provide highly accurate structures that can be used to probe the lower limit of sensitivity of protein stability calculators to structural input, something that has not been investigated. 1PU0 is also a holodimer structure but of lower resolution (1.70 Å) than 2C9V and 2V0A, and it contains five dimers in the unit cell, providing a test of the role of resolution and protein−protein interactions in providing accurate property estimates. Many crystal structures in the protein database have such moderate resolutions or include multiple protein copies per structure file. Importantly, multiple copies in pdb structures allow an investigation of how crystal packing interactions affect protein properties, by explicit computation and comparison to single-protein results. 1HL4 and 1HL5 (1.8 Å resolution) represent a metal-deficient (apo) dimer form of the enzyme with two dimers in the structure, and a corresponding metal-containing holodimer form with nine dimers in the structure file, respectively. The structural sensitivity was quantified by computing the average ΔΔG obtained for a mutation across all five crystal structures, then calculating the standard deviation from this average for each site, and finally averaging over all computed data points. Using 2C9V, the closest distances between twin sites in the dimer have been calculated as shown in Table 1. There are five twin sites with mutual distances shorter than 10 Å: Cys111 (8.5 Å between sulfurs, atom type SG); Ile113 (3.7 Å from backbone oxygen (O) to γ carbon CG2); Gly114 (4.3 Å between backbone nitrogen (N) and α carbon, CA); Val148 (4.4 Å from backbone O to γ carbon CG2); and Ile149 (5.6 Å between backbone oxygens (O)). In the latter case where backbone atoms are in closest contact, the interaction will not be very affected by the mutations that change the side chains. The remaining 50 mutated sites are separated from their twin sites by >10 Å (Table 1) and can thus be expected to have limited interaction in the dimer. Analysis of Predictor Performance. The performance of the five stability calculators was evaluated independently against both the apomonomer data set (54 data points) and the holodimer data set (33 data points). Results with all five structures were compared to the experimental data. Except for PoPMuSiC, chain A was explicitly chosen for computation. For PoPMuSiC, mutations of all chains were computed and the



RESULTS AND DISCUSSION Correlations between Holodimer vs Apomonomer Data. Experimentally determined ΔΔG values are in some cases available for both the holodimer and apomonomer SOD1. It is of interest to know how such data correlate, not only for estimating the variation in experimental data but also for computing protein properties of multimers or, as is often the case, experimental protein structures containing several protein molecules. To understand these effects, in Table 1, 33 experimental ΔΔG values have been collected that are available for the same mutation in both the monomer and dimer states. The comparison in Figure 1B provides an estimate of the validity of monomer data in the discussion of multimers. A correlation coefficient of 1 would imply that mutations are completely additive, because all information of the holodimer structure is also present in the monomer; i.e., the monomers would be noninteracting. This assessment is important before concluding on the basis of either of these data types independently, e.g., using monomer stability effects to conclude on stabilities of the biologically relevant functional dimer. Also, most protein stability calculators can only compute single-point mutations, making them potentially less applicable in estimating the properties of mutants of functional multimers. Figure 1B should be viewed with bearing in mind that the experimental uncertainty is of the order of ±0.3 kcal/mol.42 Figure 1B shows that the correlation coefficients of the ΔΔG for mutations occurring in the holodimer and in the monomer are quite high (r = 0.82, r2 = 0.68; see details in the Supporting Information). The mean average deviation (MAD) between the data sets is 0.92 kcal/mol (RMSD = 1.12 kcal/mol), and the MAE of prediction of dimer ΔΔGs from the linear regression model using the monomer data (i.e., the MAD from the regression line) is 0.75 kcal/mol, with a standard error of 1.01 kcal/mol. The correlation shows that, as a whole, the effects of the mutations are mainly additive: The reason is that the monomers are separated in space so that only a handful (see Methods) of the mutated twin sites can mutually interact in the dimer structure. The typical deviation of ∼1 kcal/mol, the correlation (r = 0.82), and the tendency of the dimer double mutants to have larger numerical effects suggest that the typical stability losses of single-point mutations are of the order of +1 kcal/mol (destabilizing). The average ΔΔG is +1.59 kcal/mol for the 1802

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Table 2. Overall Performance of Methods for 54 SOD1Monomer Mutantsa method CUPSAT

I-Mutant2.0

I-Mutant3.0

PoPMuSiC

SDM

MAE MSE r MAE MSE r MAE MSE r MAE MSE r MAE MSE r

2C9V (1.15 Å)

2V0A (1.15 Å)

1PU0 (1.70 Å)

1HL4 (1.8 Å)

1HL5 (1.8 Å)

holodimer

holodimer

holodimer

apodimer

holodimer

1.78 −0.23 0.39 1.16 0.28 0.28 1.17 0.46 0.22 1.10 −0.23 0.50 1.50 0.60 0.22

2.27 −0.26 0.33 1.20 0.28 0.26 1.20 0.47 0.20 1.09 −0.24 0.51 1.50 0.58 0.11

1.71 −0.28 0.32 1.15 0.23 0.34 1.19 0.46 0.19 1.12 −0.24 0.47 1.43 0.49 0.16

2.28 −0.52 0.30 1.14 0.29 0.28 1.16 0.48 0.20 1.11 −0.23 0.48 1.48 0.53 0.11

2.33 −0.27 0.27 1.18 0.26 0.26 1.16 0.46 0.21 1.11 −0.25 0.49 1.39 0.60 0.19

a

MAE: Mean absolute error in kcal/mol. MSE: Mean signed error (experimental minus computed) in kcal/mol. MSE < 0 means that the method is on average too destabilizing compared to experimental data. r: Correlation coefficient from linear regression analysis.

experimental monomer data set, making it representative of typical destabilizing mutations.30 As seen from the correlation analysis discussed previously, the dimer data are largely deducible from the monomer data by linear regression. The variation resembles the uncertainty of the best stability predictors (MAE ∼ 1 kcal/mol), showing that the errors of computing mutations only in one chain cannot be neglected, even if the batch trend is robust. Thus, one can conclude that the two twin-site mutations of the dimers exhibit the same physics, causing similar, largely additive properties of the two mutations. If one is interested in knowing the total quantitative effect of a mutation in a dimer such as SOD1, one must account for the fact that the mutation occurs twice (except in special cases where heterodimers are formed, e.g., by heterozygote combination of monomers, but this is not relevant here, as the experimental data are for homodimers). Still, the remaining correlation errors above will be present in the computational estimates, in addition to the computational method’s own intrinsic errors. Thus, the methods are first compared against the larger experimental monomer data set. Afterward, the structural sensitivity from using holodimer or monomer structures is investigated. Performance of Stability Predictors for Monomer SOD1 Data. In Table 2, several performance indicators are shown for the five methods, which have been systematically applied to five different crystal structures, as discussed in Methods, of variable resolution and number of protein molecules per unit cell. The plots resulting from linear regression are shown in Figure 2 for two crystal structures, 2C9V, which is representative as a high-resolution dimer structure with no other molecules in the unit cell, i.e., representing the real functional dimer, and 1HL4, which is an example of the direct opposite, an apomonomer structure with several closely interacting crystal-packed protein molecules in the unit cell, potentially affecting the structure of each monomer. More results for other structures are provided in Supporting Information. CUPSAT. From Table 2, it can be seen that the accuracy of CUPSAT is quite dependent on the crystal structure used, with accuracy being generally higher for the higher resolution

structures containing only the dimer molecule. In all cases, the MAE is high (>1.7 kcal/mol). However, the systematic bias toward destabilization/stabilization compared to experimental data (as measured by MSE) is relatively small (e.g., −0.23 for 2C9V) except when using the structure 1HL4, where the systematic error is −0.52. For all five structures, CUPSAT is too destabilizing compared to experimental data. The correlation between CUPSAT and experiment is quite reasonable and is markedly higher for the realistic dimer structures, in particular 2C9V (r = 0.39) and reduces to r = 0.27 when using 1HL5. Thus, CUPSAT reproduces much of the stability trend of the data set and has a modest destabilization bias, but exhibits poor quantitative accuracy due to large variations in numerical results: This scattering is very clear from Figure 2. The example of CUPSAT therefore gives a good insight into the importance of using all three performance metrics together. A fourth metric, structural sensitivity, is investigated later in this work. I-Mutant2.0. From Table 2, it can be seen that across the five used crystal structures, I-Mutant2.025 had a very similar MAE of 1.14−1.20 kcal/mol, which is comparable to the typical accuracy that parametrized methods will provide on their parametrization data sets. This is the limit of “chemical accuracy” that these methods typically struggle with, and it is hard to improve accuracy much further except by explicit fitting of predictive models to the protein class of interest.23,37 One of the reasons, as shown in this work, is that in addition to the intrinsic errors in the computational model itself, errors arise from the use of crystal structures that are not representative of the protein state that the experimental data derive from. IMutant2.0 provides similar high accuracy across all structures, suggesting that it may be less structure-sensitive than CUPSAT, which will be investigated further in subsequent text. The MSE of 0.23−0.28 kcal/mol implies that I-Mutant2.0 is too stabilizing for SOD1, although this is an expected and relatively modest bias. The difference in MSE between CUPSAT and I-Mutant2.0 of ∼0.5 kcal/mol is substantial; i.e., CUPSAT is much more destabilizing than I-Mutant2.0. The correlation coefficients of I-Mutant2.0 (0.26−0.34) are slightly smaller than those of CUPSAT, but I-Mutant2.0 performs numerically better than CUPSAT largely because the 1803

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Figure 2. Correlation between experimental monomer and computed stability changes (kcal/mol) for 54 single-point mutations; using structures 2C9V and 1HL4. (More data in the Supporting Information.) 1804

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Figure 3. Correlation between experimental holodimer and computed stability changes (kcal/mol) for 33 single-point mutations, using structures 2C9V.

Interestingly, as for I-Mutant2.0 and I-Mutant3.0, this performance was very similar on all five structures despite the substantial differences in these structures, suggesting that the method is very structure-insensitive, as already reported from alternative approaches in previous work.34 At the same time, the method had the highest correlation to experimental data, with r between 0.47 and 0.51. Thus, this method, which is based on environment-dependent substitution frequencies, which correlate with the chemical properties of the amino acids, was most accurate using all three metrics. SDM. Finally, the predictor SDM was investigated, with the overall results shown in Table 2. SDM displayed MAEs ranging from 1.39 to 1.50 kcal/mol, was more problematic, and had the largest systematic error of the tested methods, with strong bias (0.5−0.6 kcal/mol) toward too stable mutants. As seen in Figure 2, SDM produces many results that are substantially too stable (the scattered points below the regression line). This is consistent with its previously reported tendency toward stabilization, relative to other methods.31 The correlation coefficient of linear regression ranged from 0.11 to 0.22; i.e., the method was generally not capable of providing reasonable trends of the stabilities of the mutants. Finally, to estimate the redundancy in the performance of the methods, it was also attempted to produce a more accurate model by combining any three methods with equal-weight normalization: The best such combination consists of CUPSAT, I-Mutant2.0, and PoPMuSiC. As seen from Figure 2, using 2C9V as an example, while such a model can of course be optimized to be more accurate than any single method, the partial removal of systematic errors by combination of too

computed values are numerically more realistic and resemble the distribution and variance of the experimental data, as seen in Figure 2. Thus, the comparison provides insight into the strengths and weaknesses of such methods, as described by these three metrics: The qualitative physics of the models, as represented by the correlations, are similar and of modest accuracy, whereas they are biased differently in terms of systematic errors (MSE), and CUPSAT lacks the numerical scaling of a realistic distribution of such stability effects, having too large a numerical spread in its output. I-Mutant3.0. A later version of I-Mutant3.0 was also studied in this work, because it is methodologically different from IMutant2.0.25 In terms of numerical accuracy, it performed very similarly to its predecessor for the 54 SOD1 data in this work across all five structures (MAE ∼ 1.16−1.20 kcal/mol) but displayed a large systematic error, being too stabilizing (MSE ∼ 0.46−0.48 kcal/mol). At the same time, the overall predicted trend in stability was markedly worse than that of CUPSAT, with typical correlation coefficients of 0.2. The method thus still provides some of the real physics of the stabilization effects of the mutants but substantially less than both CUPSAT and IMutant2.0. As for I-Mutant2.0, there was no strong structural dependence on overall performance. PoPMuSiC. As summarized in Table 2, PoPMuSiC displayed the highest numerical accuracy (the smallest MAE ∼ 1.09−1.12 kcal/mol) for the five structures and also had a modest systematic error, as evaluated by a MSE between −0.23 and −0.25 kcal/mol (the smallest overall for all five structures). Thus, PoPMuSiC provided a realistic balance between stabilization and destabilization of the SOD1 mutations. 1805

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

stabilizing and destabilizing methods (MSE ∼ −0.06 kcal/mol) did not substantially improve numerical accuracy (MAE ∼ 1 kcal/mol) or correlation (increase in r2 by 0.03) compared to PoPMuSiC. The three metrics used so far in this work are not very correlated, as seen, e.g., by comparing CUPSAT (having accurate trends with large numerical error) and I-Mutant3.0 (having high numerical accuracy despite large systematic errors and poor trend). A good trend with high correlation can be obtained despite having both large systematic errors (the entire prediction curve shifted by a constant) and random errors (scattering of data, although following the correct trend), this being the tendency of CUPSAT (Figure 2). In contrast, the methods I-Mutant2.0, I-Mutant3.0, and PoPMuSiC provide realistic, narrow distributions of mutation effects,30 which, regardless of trend, will tend to make the MAE smaller. When the PoPMuSiC results were corrected for systematic errors by an additive constant to obtain MSE = 0 kcal/mol, it resulted in an MAE of 1.05 kcal/mol, only 0.05 kcal/mol smaller than the uncorrected PoPMuSiC results. The overall numerical accuracy, as measured by MAE, was PoPMuSiC > I-Mutant2.0, I-Mutant3.0 ≫ SDM ≫ CUPSAT. In terms of systematic error describing the method’s balance between stabilization and destabilization, the order of increasing MSE was PoPMuSiC < I-Mutant2.0 < CUPSAT < I-Mutant3.0 < SDM. The most challenging metric is the stability trend of the studied mutants, because this trend cannot simply be remedied by any scaling of the predictor’s output, as it reflects the structure-and property-dependent variation between amino acid types. For this metric, the performance was PoPMuSiC ≫ CUPSAT > I-Mutant2.0 > I-Mutant3.0 > SDM. In this ranking, the position of CUPSAT was very dependent on the structure used. Performance on the Experimental Holodimer SOD1 Data. While most experimental data have been obtained from the monomer of SOD1, some are also available for the dimer that is the functional state of the protein in vivo, as discussed above. The computational methods report the ΔΔG of only one mutation. However, PoPMuSiC reports the average ΔΔG of both mutations of the dimer and thus in principle takes into account twin-site correlation. The four other methods only perform mutation in one chain (chosen to be chain A in this work). Many proteins are functional multimers, and many protein structures are deposited with several protein copies per structure file. Furthermore, all crystal structures represent a reality of closely packed protein molecules that may have affected even a structure that includes only a single protein molecule. These three effects (real monomer interactions in multimers, copy interactions in structure files, and indirect effects from crystal packing) should be of considerable concern when computing protein properties. To investigate the sensitivity of the five methods to monomer/dimer state and to interprotein interactions in crystal structures, two strategies were followed: One was to compare the performance also against experimental dimer data, and the other was to use five different crystal structures with increasing numbers of protein molecules in each structure file. First, the performance of the methods against an experimental data set of 33 single-point mutants of the dimer SOD1 is evaluated (see Table 1). If one naively computes the stability of this functional dimer, one would get the results of Figure 3. The methods reveal trends that are at least as accurate

and in several cases more accurate than when compared to the monomer data. This suggests that the physics of the different mutations remain unaffected by the dimer/monomer difference, with the trends preserved. However, all five methods report only the result of a single mutation, which does not reflect the nature of the experimental homodimer. Thus, they all exhibit large systematic errors, as measured by MSE (and, consequently, larger overall MAE), toward being too stabilizing, simply because the data sets are substantially destabilizing on average, and the computations miss one of the twin mutations. The conclusion from this analysis is, as partly suggested from the correlation of experimental monomer and dimer data in Figure 1B, that the methods are generally indifferent to the dimerization state of the protein; i.e., they can preserve the overall batch correlation with experimental dimer data, although some methods (notably CUPSAT) are more sensitive than others, to be investigated further in subsequent text. Table 3 lists the main metrics of each method when using the representative high-resolution dimer structure 2C9V with no Table 3. Overall Performance of Methods for Describing 33 SOD1 Holo-Dimer Mutantsa method CUPSAT (r = 0.52) I-Mutant2.0 (r = 0.38) I-Mutant3.0 (r = 0.38) PoPMuSiC (r = 0.50) SDM (r = 0.07)

additivity assumptionc

regression from experimental datad

error

direct calculationb

MAE

1.68

1.41

1.27

MSE MAE

0.90 1.47

−0.17 0.69

−0.31 1.33

MSE MAE

1.01 1.63

−0.05 0.64

−0.40 1.39

MSE MAE

1.31 1.30

0.24 0.66

−0.63 1.22

MSE MAE

0.76 2.20

−0.31 1.27

−0.20 1.67

MSE

1.76

0.69

−0.97

a

Using structure 2C9V. MAE: mean absolute error (kcal/mol). MSE: mean signed error (experimental minus computed; kcal/mol). MSE < 0 means that the method is too destabilizing compared to experimental data. r: correlation coefficient from linear regression. b Directly calculated and compared to dimer data. cCalculated and multiplied by 2 (additive assumption of mutant effects in monomers). d Calculated from the regression model in Figure 1B.

additional protein copies in the structure, which, aside from indirect crystal packing effects, provides the most realistic structure possible of the experimentally used dimer state. As can be seen, CUPSAT performs quite well in its trend on this data set (r = 0.52) despite large numerical errors due to its stability bias and the too large spread in its ΔΔG values. SDM is the only method that provides a poor trend (r = 0.07) on the dimer data set. The next column of Table 3 shows the results of the predictions if the ΔΔG of all methods are simply multiplied by 2; two reflect the presence of two twin mutations in the experimental dimer data. This approach, which is basically an additive assumption, performs well, as could partly be expected based on the correlation in Figure 1B. While simple multiplication preserves the correlations, the MSE now comes much closer to zero and is in fact often smaller than when comparing to the monomer data set. Consequently, all methods 1806

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Figure 4. Signed errors (kcal/mol) averaged over all five structural templates (2C9V, 2V0A, 1PU0, 1HL4, and 1HL5) for the five methods, decomposed into separate mutants, showing the sites most difficult to predict. Negative values imply that the method is too stabilizing.

destabilize too much) The latter two are among the seven metal-adjacent mutated sites in the data set (F45, H46, V47, F64, V81, L117, V119). All methods also either destabilize or stabilize too much for the following mutations: I18V, V31A, D90V, D101N, S105L, L106A, I112A, N139D, and I149A. These cases are noteworthy because, overall, the methods have substantially different biases toward either destabilization or stabilization, as discussed above. The most difficult cases involve a range of glycine mutations at G41 and G93, coinciding with large changes in chemical properties, either volume, hydrophilicity, or both. In these cases, all methods substantially overestimate the destabilization. The experimental numbers for these sites are very large and destabilizing, as seen in Table 1. These large changes may affect the real protein structure via long-range electrostatics, conformational changes, or water displacement when the side chains grow substantially more than the 30 Å3 volume of a water molecule. As the clearest example, the methods give too stable G41 mutants by typically 3−4 kcal/mol and up to 7 kcal/ mol. Thus, the five methods cannot account accurately for the large destabilizations that are observed when chemical properties change substantially compared to the wild type. G41 and G93 are positioned close together (8.5 Å between α carbons in 2C9V) at two neighboring turns, both at the very beginning of β-sheets. The ends of β-sheets are known from temperature- and ionic-strength variable molecular dynamics simulations to be particularly sensitive to disruptions.51 Disruption of these two sites is much more critical, as seen from the experimental ΔΔG, than predicted by any of the methods. If one removes these two sites from the data set, the numerical accuracy is substantially improved for all methods, with MAEs reducing from 1.78, 1.16, 1.17, 1.10, and 1.50 kcal/ mol to 1.61, 0.81, 0.77, 0.86, and 0.99 kcal/mol, with the two IMutant models now being more accurate than PoPMuSiC. For the seven metal-adjacent sites, the errors are significantly above average for CUPSAT and PoPMuSiC, thus constituting difficult sites for these two methods. Considering these seven sites alone shows that all five methods substantially over-

substantially reduce their numerical errors as measured by the MAEs, with I-Mutant2.0 and I-Mutant3.0 and PoPMuSiC giving very low MAEs of 0.64−0.69 kcal/mol. These results show that the investigated mutations are to a large extent independent of the dimerization state of the protein. From Table 1, five twin sites in the dimer are within 10 Å at their closest encounter: Cys111 (8.5 Å), Ile113 (3.7 Å), Gly114 (4.3 Å), Val148 (4.4 Å), and Ile149 (5.6 Å). The remaining sites are separated from their twin sites by >10 Å, partly explaining why the monomer mutants are good proxies of the dimer mutants. Modification of Cys111 has been shown to lead to dimer dissociation,22 showing that these five twin sites may be biochemically important. Finally, the third column of Table 3 shows that the use of a regression equation from the experimental data (Figure 1B) to estimate dimer ΔΔG values from the computed monomer mutations performs markedly worse than the additive approximation, because the uncertainty of ∼1 kcal/mol in this regression propagates to the computational prediction. In conclusion, for SOD1, the mutations as a whole can be considered additive although there are some strongly interacting twin sites, and this allows a computation of the holodimer mutation effects using PoPMuSiC as the most accurate of the studied methods. Analysis of Difficult Sites. In addition to the overall performance of the methods, as described by the three metrics MAE, MSE, and r, it is also relevant to know how the performance of the methods depends on the specific sites in the protein. This decomposition will allow identification of sites and substitutions that are poorly described by all methods and to identify issues with individual methods. Figure 4 shows the MSEs of the five methods averaged over all five structures, compared to the monomer experimental data, and decomposed into the individual 54 sites. From the figure, it can be seen that there are hot spots where all five methods tend to perform poorly. These are multiple mutations at sites G41 and G93 and the single mutations H43R (all methods tend to stabilize too much) and F64A and V81A (all methods 1807

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Figure 5. Structural sensitivity of methods decomposed into individual mutants (standard deviation from average result (kcal/mol) using the five structures 2C9V, 2V0A, 1PU0, 1HL4, and 1HL5). Top: All five methods. Bottom: I-Mutant2.0, PoPMuSiC, and I-Mutant3.0.

estimate the destabilization of these sites, with MSEs = −2.64 (CUPSAT), −0.46 (I-Mutant2.0), −1.85 (PoPMuSiC), −1.26 (SDM), and −0.32 kcal/mol (I-Mutant3.0). The likely explanation for this important observation is that the metals in the structure, which are ignored in the methods, distort the metal-adjacent sites away from their typically encountered environment. Thus, it is understandable why the torsion-angle potential of CUPSAT and the empirical structure-based substitution probability of PoPMuSiC fail for these sites, which behave differently from typical protein sites. This finding is relevant to the modeling of properties near metal sites in metalloproteins. Thus, PoPMuSiC, being less sensitive to structure, is the best predictor across all types of mutations, whereas I-Mutant2.0 and I-Mutant3.0 are more accurate when difficult cases are removed. On a final note, the sites that are close to their twin sites in the dimer structure (C111, I113, G114, V148) are not among the difficult ones according to the above analysis, suggesting that the correlation even between these sites is relatively minor. Structural Sensitivity of the Methods. Next, the structural sensitivity across the five structures was investigated. The sensitivity of protein property calculators to the choice of structure is not generally studied, although one study on PoPMuSiC showed that this particular method, due to its emphasis on amino acid substitution frequencies, is not very structure-sensitive.34 In the following, it is shown that this characteristic is not applicable to all of the investigated methods

of this work. As structural sensitivity could be a major issue in structure-derived proteomics data for, e.g., disease prediction, molecular simulation input, or detection of posttranslational modification sites, the sensitivity was investigated as a fourth metric of the present work. To evaluate structural sensitivity, the performance of the five methods was investigated systematically on five different SOD1 crystal structures, as described in Methods. These structures cover a range from realistic dimers at near-atomic (1.15 Å) resolution (2C9V, 2V0A) to moderate-resolution structures with increasing numbers of protein copies in the deposited and used structure files (1PU0, 1HL4, and 1HL5 with resolutions of 1.7−1.8 Å). The latter structures are useful for evaluating the role of resolution and protein−protein interactions, of relevance for understanding how deficiencies in crystal structures affect structure-based computational output. Figure 5 shows the total structural sensitivity over the five sets of structures, evaluated by computing for each method and each mutation the standard deviation from the average computed ΔΔG using the five structures, giving a measure of the structure-caused spread in computational output. The top panel of Figure 5 shows the result for all five methods, whereas the lower panel shows the results for the three least sensitive methods, I-Mutant2.0, I-Mutant3.0, and PoPMuSiC. As seen in the top panel of Figure 5, in particular CUPSAT and to some extent SDM are very sensitive to the structures used. Importantly, the sensitivities are not very site-dependent compared to the method-dependence. The average root-mean1808

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B

Article

Figure 6. Correlation between stability effects computed using PoPMuSiC for all 2907 possible mutations in human SOD1 using different crystal structures.

square deviations in Figure 5 are 0.93 kcal/mol for CUPSAT, 0.08 kcal/mol for I-Mutant2.0, 0.08 kcal/mol for PoPMuSiC, 0.20 kcal/mol for SDM, and 0.04 kcal/mol for I-Mutant3.0. The very large sensitivity of CUPSAT applies broadly across the investigated SOD1 sites and is thus not due to site-specific effects, but instead most likely due to the torsion-potential force field employed in this method, which is strongly affected by local changes in structure. In contrast, the high sensitivity of SDM seems to be mainly due to some specific mutations, e.g., G85R, C111A, I113T, and I149A. The last three of these sites

are relatively close to their twin sites in the dimer, potentially explaining this sensitivity. In the lower panel of Figure 5, the site-specific structural sensitivity is shown in closer detail for the three least structuresensitive methods, I-Mutant2.0, I-Mutant3.0, and PoPMuSiC. IMutant3.0 has the smallest sensitivity, with that of PoPMuSiC and I-Mutant2.0 being similar. Importantly, there are no sites with very large contributions to this sensitivity; i.e., the overall sensitivity is relatively site-independent. The small structure sensitivity of these methods can be viewed as an advantage or a disadvantage, depending on the desired analysis: High 1809

dx.doi.org/10.1021/jp4119138 | J. Phys. Chem. B 2014, 118, 1799−1812

The Journal of Physical Chemistry B



structural sensitivity is an advantage if important and delicate structural details of particular high-resolution structures are of interest. However, for the bulk of proteins whose structures are deposited in the Protein Data Bank, low structural sensitivity is a clear advantage as it increases the universality of the method and reduces the role of resolution issues and crystal packing effects. As the structures 1PU0, 1HL4, and 1HL5 mimic such protein−protein interactions, the results suggest that crystal packing effects can affect the output of computational methods, as seen clearly by CUPSAT. Still, I-Mutant2.0, I-Mutant3.0, and PoPMuSiC are largely insensitive to such packing forces, as far as these forces are mimicked by the multiple protein copies present in 1PU0, 1HL4, and 1HL5. The variation in results with the five methods thus shows that insensitivity to such packing effects is an important strength of a protein property predictor. Among the five structures used, 2C9V and 2V0A are similar in resolution (1.15 Å) although they differ locally due to the refinement details (mutual RMSD of 1.3 Å). As both of these structures represent near-atomic resolution of the biologically relevant dimer state, they can be used to quantify the lower limit of sensitivity expected from using different crystal structures as input in protein property predictors. By calculating for each mutation the standard deviation from the average ΔΔG obtained using the two highest resolution structures 2C9V and 2V0A, a lower limit of structural sensitivity can be evaluated for all five methods (Table S7, Supporting Information). The average standard deviations are 0.52 kcal/ mol for CUPSAT, 0.04 kcal/mol for I-Mutant2.0, 0.02 kcal/ mol for I-Mutant3.0, 0.05 kcal/mol for PoPMuSiC, and 0.16 kcal/mol for SDM. These numbers represent an estimate of the smallest attainable structural sensitivity of the methods using highly similar, realistic near-atomic resolution crystal structures. To provide a visualization of the structural sensitivity in full detail, Figure 6 shows the correlation plots of ΔΔG values for all possible 2907 SOD1 mutations (i.e., 19 mutations in 153 sites) computed with PoPMuSiC, comparing several structural templates. When one considers all possible mutations in SOD1, the structural sensitivity is somewhat larger than for the data set used for benchmarking. Still, the use of higher resolution structures (2C9V and the as-isolated associated structure 2C9U47 as well as 2V0A48) provide the smallest spread in computed results (standard deviations of ∼0.17 kcal/mol) and the strongest correlations (r2 ∼ 0.975). If crystal structures with more moderate solution are used, the computed output will deviate more from the results obtained with high-resolution structures, with typical standard deviations of ∼0.2 kcal/mol, although the correlation remains high. The number of problematic outliers increases correspondingly, as seen from the scattering of points in Figure 6. Lastly, to give an example of the danger of applying NMR structures for computational property prediction, using 1DSW as a basis for computation provides a substantial loss of correlation (r2 = −0.65) and accuracy (standard deviation of 0.59 kcal/mol compared to 2C9V), probably due to the fact that these structures are minimized to destroy the ensemble average, although this requires further investigation on more protein classes to be understood fully. One should bear in mind that PoPMuSiC is insensitive to structure choice, so the use of other methods may lead to sensitivities much worse than those reported in Figure 6.

Article

CONCLUSION

In this work, mutations in the human SOD1 have been computed using five methods and compared to known experimental data. For this well-studied protein, a range of crystal structures is available of variable resolution, protein state, and including some structures with multiple protein copies per structure file. This, combined with a substantial experimental data set of stability effects of single-point mutations, allowed a first investigation into the structural sensitivity of protein property prediction using these five methods. First, the correlation between experimental literature data of SOD1 dimers and monomers was investigated. The large correlation (r = 0.82) suggested that mutations in separate protein monomers are mostly additive, which was later confirmed by explicit computation (Table 3). SOD1 crystal structures with several molecules in the unit cell did not prevent accurate computation of protein properties for the more accurate methods; i.e., the crystal packing effects are small relative to the physics of the single-point mutations. It was found that four metrics are all important for obtaining accurate protein-structure-based properties: (i) the overall trend in the data set as modeled by the regression correlation coefficient (for this metric, PoPMuSiC and CUPSAT performed most accurately for SOD1, with correlation coefficients up to ∼0.5); (ii) the systematic error describing the sign bias of the method (for this metric, PoPMuSiC and IMutant2.0 and I-Mutant3.0 were least biased, down to ∼0.25 kcal/mol); (iii) the overall numerical accuracy, as measured by the MAE, which is largely determined by random (i.e., nonsystematic) errors (the accuracy was highest for PoPMuSiC and I-Mutant2.0 (down to ∼1 kcal/mol for monomer data and ∼0.65 kcal/mol for dimer data); this metric can be optimized by establishing a realistic distribution in the computed property); (iv) sensitivity to crystal structure input (PoPMuSiC, I-Mutant2.0, and I-Mutant3.0 displayed very little sensitivity to variations in crystal structure whereas CUPSAT and to some extent SDM were quite sensitive). The comparison of results obtained with two near-atomic resolution crystal structures of the SOD1 dimer allowed an estimate of the lowest possible structural sensitivity achievable with each method, giving a lowest limit of ∼0.2 kcal/mol, but large sensitivity (0.6 kcal/mol) seen, e.g., for CUPSAT. Several difficult sites contributed to reducing the accuracy of the methods, with errors above 4 kcal/mol seen for a few sites with all methods, but errors >2 kcal/mol were seen in only a handful of sites for the most accurate method, PoPMuSiC. Particularly difficult mutations involved changing glycine, of relevance to future method improvement. It is suggested that this is due to substantial structural changes caused by glycine mutation, since glycine has low structural propensity and interrupts structure, or due to water colocalization with the small glycine side chain. Also, there were seven metal-adjacent (