Principal Component Analysis of the Conformational Freedom within the EF-Hand Superfamily

Elena Babini,†,‡ Ivano Bertini,*,‡,§ Francesco Capozzi,†,‡ Claudio Luchinat,‡,# Alessandro Quattrone,‡,| and Maria Turano†,⊥

Department of Food Science, University of Bologna, 47023 Cesena, Italy, Center for Magnetic Resonance, University of Florence, 50019 Sesto Fiorentino, Italy, Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy, Department of Agricultural Biotechnology, University of Florence, 50019 Sesto Fiorentino, Italy, FiorGen Foundation, 50019 Sesto Fiorentino, Italy, and Department of Chemistry, University of Calabria, 87036 Arcavacata di Rende, Italy.

Received May 23, 2005

A database of nonredundant structures of EF-hand domains, i.e., pairs of helix-loop-helix motifs, has been assembled, and the six angles among the four helices have been redetermined. A principal component analysis of these angles shows that two components (PC1 and PC2) describe the system while retaining 80% of the total variance. A plot of PC2 against PC1 represents in a compact way the full range of structural diversity of EF-hand domains, their grouping into protein families, and the variation for each family upon calcium and peptide binding.

Keywords: EF-hand • principal component analysis • calcium binding • interhelical angle • structural analysis

Introduction

EF-hand (EFh hereafter) proteins are at the crossroads of many calcium-mediated processes in cells.1-4 Accordingly, the EFh structural domain (EFhD) is one of the most common metal binding domains in the whole proteome, with more than 600 EFh domains annotated in the human genome alone. The prototype EFh protein, calmodulin, is a paradigm of proteins that act as hubs in interactome networks.5 An EFh motif is characterized by a helix-loop-helix structure, with a calcium ion usually bound to a 12- or 14-amino acid-long interhelical loop sequence (Figure 1A).6 The minimal structural and functional unit of EF-hand proteins is a domain (EFhD) composed of a pair of EFh motifs (Figure 1B).1 The two motifs are tethered together by a linker of variable length.

In the whole genome-proteome relationship, the EFhD is perhaps the most blatant contradiction of the 1:1 correspondence between sequence and structure.7 In many of its occurrences, the EFhD shows the ability to occupy different conformations depending on the environment, i.e., on the presence or absence of bound calcium ions and/or of target proteins. Furthermore, significant conformational differences are apparent between different EFh families even under the same environmental conditions. A continuously growing number of structural studies of EFh proteins seem to indicate that EFhDs cover almost a continuum of conformational states.8,9 This variety is fascinating, as it reflects the variety of functions in which EFh proteins are involved, but it is also challenging, because it makes it very difficult to understand function through a rationalization of the sequence-structure relationships which, in turn, are reflected in the functional properties of a domain.

Figure 1. A. Coordination scheme of the calcium binding motif: (X) first calcium ligand; (Y) second calcium ligand; (Z) third calcium ligand; (G) glycine; (#) fourth calcium ligand, provided by a backbone carbonyl; (I) isoleucine (although other aliphatic residues are also found at this position); (-X) water(w)-mediated calcium ligand; (-Z) sixth and seventh calcium ligands provided by a bidentate glutamate or aspartate. B. Schematic representation of the EF-hand domain. Arrows represent α-helices, while strips correspond to unstructured parts of the polypeptide chain.

* To whom correspondence should be addressed. Phone: +39-0554574272. Fax: +39-0554574271. E-mail: [email protected].
† Department of Food Science, University of Bologna, 47023 Cesena, Italy.
‡ Center for Magnetic Resonance, University of Florence, 50019 Sesto Fiorentino, Italy.
§ Department of Chemistry, University of Florence, 50019 Sesto Fiorentino, Italy.
# Department of Agricultural Biotechnology, University of Florence, 50019 Sesto Fiorentino, Italy.
| FiorGen Foundation, 50019 Sesto Fiorentino, Italy.
⊥ Department of Chemistry, University of Calabria, 87036 Arcavacata di Rende, Italy.

10.1021/pr050148n CCC: $30.25 © 2005 American Chemical Society
Journal of Proteome Research 2005, 4, 1961-1971


Published on Web 11/05/2005


Table 1. PCA Parameters for the Six Interhelical Angles of the Whole EFhD Dataset

Importance of Components
                          PC1       PC2       PC3       PC4       PC5       PC6
standard deviation        35.6      20.326    14.648    13.0612   5.3028    4.15642
eigenvalues               3.600     1.176     0.612     0.485     0.080     0.049
proportion of variance    0.6       0.196     0.102     0.0809    0.0133    0.00819
cumulative proportion     0.6       0.796     0.898     0.9785    0.9918    1.00000

Loadings
                          PC1       PC2       PC3       PC4       PC5       PC6
A12                       0.49124   -0.3048   -0.4918   0.02913   -0.1581   0.63096
A13                       -0.6836   -0.3413   -0.0223   0.02642   -0.614    0.19495
A14                       0.07692   0.44408   -0.548    -0.4048   -0.4461   -0.3656
A23                       0.02714   0.58182   0.45656   -0.2789   -0.1984   0.57894
A24                       -0.0212   0.44805   -0.1549   0.86246   -0.1736   0.02886
A34                       0.53315   -0.2327   0.47425   0.11387   -0.574    -0.307

With these premises, many researchers have been looking for recognizable structural patterns that would permit a classification of the types of conformations (or conformational changes) in terms of some kind of unified structural descriptors. As each EFh motif has a helix-loop-helix structure, the interhelical angles have been proposed as parameters.10 Distance difference matrices based on the interhelical Cα-Cα distances have also been used to make structural comparisons, mainly to evaluate the overall effect of the binding of calcium on the loss of van der Waals contacts in the hydrophobic core.11 Ikura has further developed a method of analysis of the spatial relationships between the two helices in an EFh motif, called vector geometry mapping,8,12 which uses three descriptors that take into account not only the interhelical angle but also the relative rotations of the two helices about their own axes. This method is very valuable in helping to classify new EFh protein structures,13 but it is intrinsically limited to the intra-motif spatial relationships and therefore does not contain direct information on inter-motif spatial relationships. The latter are also important because the function is related to the overall conformation of the EFhD. Analysis of all six interhelical angles should in principle provide the desired completeness of information. Indeed, all six angles are often tabulated in the structural analyses of EFh proteins.10,11,14-19

Irrespective of the method used, all structural analyses of EFhDs seem to suggest that a continuum of conformational states occurs within the whole EFh protein superfamily.8 As a consequence, it appears that meaningful structural comparisons can only be performed within restricted groups of proteins, while a more general use of structural descriptors to describe the whole EFhD superfamily seems of limited utility.

It is possible, however, that the conformational states of EFh proteins do not really span a continuum, but can be at least partially clustered provided that better and more compact descriptors of these conformational states can be found. Our aim here is to assess whether a minimal number of parameters, or descriptors, can be found that describe unambiguously any EFhD conformation and, if so, whether this restricted set of descriptors can unambiguously point to one or another of the many EFh protein families, thereby acquiring predictive ability on the function itself. We find that the two most meaningful linear combinations of the six interhelical angles, obtained using a standard Principal Component Analysis (PCA) approach,20 retain as much as 80% of the total variance, i.e., 80% of the ability to describe the complex structural variations within the EFh protein superfamily (Table 1). These two linear combinations provide a satisfactory representation of the whole conformational space.
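As a worked check of the 80% figure, the proportions of variance can be recomputed from the standard deviations listed in Table 1, since the variance carried by each component is the square of its standard deviation. The sketch below is only an illustration in Python, not the authors' procedure. The function names are hypothetical, and the rescaling used for Kaiser's criterion (eigenvalues normalized so that they sum to the number of variables, which appears to match the eigenvalue row of Table 1) is an assumption.

```python
# Standard deviations of PC1..PC6 as read from Table 1.
SDEV = [35.6, 20.326, 14.648, 13.0612, 5.3028, 4.15642]


def variance_summary(sdev):
    """Return (proportion, cumulative) lists from the PC standard deviations."""
    variances = [s * s for s in sdev]          # variance = sdev squared
    total = sum(variances)
    proportion = [v / total for v in variances]
    cumulative = []
    running = 0.0
    for p in proportion:
        running += p
        cumulative.append(running)
    return proportion, cumulative


def kaiser_count(proportion):
    """Count components whose rescaled eigenvalue exceeds 1.

    Assumption: eigenvalues are rescaled to sum to the number of variables,
    so 'eigenvalue > 1' means 'more variance than an average variable'.
    """
    n = len(proportion)
    return sum(1 for p in proportion if p * n > 1.0)


proportion, cumulative = variance_summary(SDEV)
for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"PC{i}: proportion={p:.3f} cumulative={c:.3f}")
print("components with rescaled eigenvalue > 1:", kaiser_count(proportion))
```

Under these assumptions the first two components account for roughly 80% of the total variance, in line with the cumulative proportion reported in Table 1.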

Methods

Measurement of the EFhD Interhelical Angles. Inter-helix angles for each EFh domain in the fragments were calculated using the command "CalcHelix" of the program MOLMOL,21 which generates a report with the angles between helix axes. The helices considered were, when possible, the 8 residues flanking each side of the loop in the EF-hand motifs. Cylinders approximating the backbone atoms were added to the four helices of each fragment using the command "AddCylinder" with the fit criterion 'least_square', which performs a least-squares fit of the cylinder surface to the atoms.

Principal Component Analysis. Principal Component Analysis (PCA)22-24 of the six inter-helix angles was performed using the function "prcomp" stored in the MVA (classical MultiVariate Analysis) standard library package of the R software (v 2.1.0, www.r-project.org). The input data were zero-centered; they did not need standardization because they were all in the same units. The resulting PCn scores are linear combinations of the six inter-helix angles whose weight coefficients are the loadings reported in Table 1. According to both Kaiser's criterion and Cattell's Scree Test,25-27 and using the eigenvalues shown in Table 1, two principal components turned out to be sufficient to adequately describe the interhelical angle variance. The first two PC scores (PC1 and PC2) were plotted using the program Sigma-plot (version 2.0.1; Jandel Corporation).

Statistical Analysis. One-way ANOVA of the PC score distances, calculated for apo/holo pairs of domains with known structures, was performed using the program STATISTICA (version 6; StatSoft, Inc.).
The apo/holo protein structures used to calculate the shifts in both PC1 and PC2 scores upon calcium binding are as follows: S100A6 human (1k9p/1k9k), S100A6 rabbit (2cnp/1jwd), S100B bovin (1cfp/1mho), S100B rat (1b4c/1qlk), S100D pig/bovin (1cb1/4icb), S100A11 rabbit/pig (1nsh/1qls), CaM C-term bovin (1cmf/1cmg), CaM C-term human (1f71/1j7p), CaM N-term human (1f70/1j7o), CaM N-term yeast (1f54/1f55), TpCc N-term human (1spy/1ap4), TpCs N-term chicken (1top/1avs), CalpSmall C-term human/pig (1kfu/1alv), CalpSmall N-term human/pig (1kfu/1alv), CalpSmall C-term rat (1aj5/1dvi), CalpSmall N-term rat (1aj5/1dvi), Grancalcin C-term human (1f4q/1k94), Grancalcin N-term human (1f4q/1k94).
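The PCA step described in this section (zero-centered, unscaled input; scores as linear combinations of the six angles) can be illustrated with a small dependency-free sketch. This is not the authors' R code: R's prcomp works by singular value decomposition, whereas the sketch below swaps in power iteration with deflation on the covariance matrix to stay within the Python standard library. All function names are hypothetical, and the six-angle rows at the bottom are invented numbers used only to exercise the code.

```python
import math


def center_columns(rows):
    """Subtract the column mean from every value (zero-centering, no scaling)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[r[j] - means[j] for j in range(k)] for r in rows]


def covariance_matrix(centered):
    """Sample covariance matrix (divisor n - 1) of already centered data."""
    n, k = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_eigenpair(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    k = len(matrix)
    v = [1.0 / (j + 1) for j in range(k)]      # arbitrary non-symmetric start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:                       # no variance left to extract
            return 0.0, v
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(v[i] * w[i] for i in range(k)), v


def pca(rows, n_components=2):
    """Return ([(variance, loadings), ...], scores, total_variance)."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    k = len(cov)
    total_variance = sum(cov[i][i] for i in range(k))
    components = []
    for _ in range(n_components):
        eigenvalue, vec = leading_eigenpair(cov)
        components.append((eigenvalue, vec))
        # Deflation: remove the component just found before the next search.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(k)]
               for i in range(k)]
    scores = [[sum(x * l for x, l in zip(row, vec)) for _, vec in components]
              for row in centered]
    return components, scores, total_variance


# Invented interhelical angles (degrees), one row per hypothetical domain.
demo = [
    [130, 110, 120, 115, 40, 125],
    [135, 105, 125, 110, 45, 120],
    [90, 150, 110, 120, 35, 95],
    [95, 155, 105, 125, 30, 90],
    [125, 115, 118, 112, 42, 122],
]
components, scores, total = pca(demo)
for i, (variance, _) in enumerate(components, start=1):
    print(f"PC{i}: proportion of variance = {variance / total:.3f}")
```

The sign of each loading vector is arbitrary in any PCA implementation, so scores from this sketch may be mirrored relative to those produced by other software.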

Results and Discussion

EFhD Dataset Derivation. The first step of the work was the construction of a structural EFhD dataset, based on the two most comprehensive protein structural classification databases, CATH28 and SCOP.29 We queried both databases for domains belonging to the EF-hand superfamily, obtaining in total 471 entries (as of April 2005). This initial dataset was the seed for a further exploration of the PDB,30 based on the construction of a sequence profile by the combination of a ClustalX31 algorithm-based multiple alignment tool and a HMMER32 algorithm-based domain identification tool. This procedure added other entries, for a total of 564, which were sequentially filtered for outdated entries, mutants, adducts with metals other than Ca and Mg, and/or with nonphysiological molecules, to obtain 133 different EFhD sequences corresponding to a final number of 307 nonredundant entries, because each sequence is present with or without bound metal(s) and/or peptides. This final EFhD dataset is reported in Supporting Information Table S1 and is also available at www.cerm.unifi.it.

Analysis of the Overall Distribution. Given that only the analysis of the domain as a whole is meaningful for the description of structural changes occurring upon binding of calcium or the target peptide, the best approach adopted so far to describe EFhDs is based on the measurement of the six interhelical angles, which represent the reciprocal orientation of the four helices. Since our aim is to examine the whole EFhD dataset, including those domains whose helices may show substantial deviations from ideality, especially far from the loops that act as hinges for the helical motions, we have defined all helices as constituted by only the first eight helical residues immediately preceding and following each loop. The interhelical angles have been measured using the standard routine of the MOLMOL program.21 This routine provides absolute values of the angles, and we conventionally define two antiparallel helices (taken in sequential order) as forming a 180° angle and two parallel helices as forming a 0° angle.
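The angle convention stated here (0° for parallel helices, 180° for antiparallel ones, unsigned values) can be sketched as follows. This is an illustration only: it assumes each helix axis is already available as a direction vector pointing from the N- to the C-terminal end of the helix, and it does not reproduce how MOLMOL fits the axes. The function names and the example vectors are hypothetical.

```python
import math


def interhelical_angle(axis_a, axis_b):
    """Unsigned angle in degrees (0 to 180) between two helix axis vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


def six_angles(axes):
    """Map four helix axes (keys 'I'..'IV') to the six pairwise angles."""
    names = ["I", "II", "III", "IV"]
    return {
        f"{names[i]}-{names[j]}": interhelical_angle(axes[names[i]], axes[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Hypothetical axis vectors, chosen only to exercise the convention.
example_axes = {
    "I": (0.0, 0.0, 1.0),
    "II": (0.0, 0.0, -1.0),   # antiparallel to helix I
    "III": (1.0, 0.0, 0.0),   # orthogonal to helix I
    "IV": (0.0, 0.0, 2.0),    # parallel to helix I (length is irrelevant)
}
for pair, angle in six_angles(example_axes).items():
    print(f"{pair}: {angle:.1f}")
```

Because the arccosine of a normalized dot product is always between 0° and 180°, this formulation has no sign ambiguity, which is the property the text asks of any alternative algorithm.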
Other algorithms for obtaining interhelical angles are available and could be used equally well,12 provided that care is taken to avoid erratic behavior of the sign convention. Supporting Information Table S1 also contains all the interhelical angle values measured. Supporting Information Figure S1 shows the distribution of values for each of the six interhelical angles. Angles I-II, I-III, and III-IV span the largest ranges (ca. 115°, 140°, and 100°, respectively). These ranges, however, are small if referred to each family and to the EFhD biological status (i.e., bound to calcium and/or to the target peptide, Supporting Information Table S2), suggesting that a more compact representation of the structural diversity within the EFhD may be possible.

A powerful way to concentrate the information distributed throughout a large number of variables is Principal Component Analysis (PCA). The PCA approach uses no a priori knowledge. Given a set of variables (six in this case) with a given total variance, PCA finds the unique set of six linear combinations of the variables (called Principal Components, PC) that attributes as much variance as possible to the first of them (PC1), the second highest variance to PC2, and so on. The coefficients (loadings) relating PC1-PC6 to the six interhelical angles are reported in Table 1 (see Methods section). The combination corresponding to PC1 turns out to retain as much as 60% of the total variance, and inclusion of PC2 yields 80% of the variance. According to statistical criteria,33,34 reduction of the interhelical angle information to these two components is a correct approximation. Interestingly, PC1 is dominated by only three angles, namely the I-II, I-III, and III-IV angles, which are those that span the largest ranges, with angle I-III having the largest weight. Conversely, PC2 depends on all six angles, but the largest weight is attributed to the three angles that essentially do not contribute to PC1.
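The statement about which angles dominate PC1 and PC2 can be checked directly against the loadings in Table 1. The short sketch below ranks the angles by the absolute value of their loadings and verifies that the two loading vectors are close to orthonormal, as PCA loadings should be. It assumes that the table rows A12 to A34 correspond to angles I-II to III-IV in the order shown; the function names are hypothetical.

```python
# Assumed mapping of Table 1 rows A12, A13, A14, A23, A24, A34 to helix pairs.
ANGLES = ["I-II", "I-III", "I-IV", "II-III", "II-IV", "III-IV"]

# PC1 and PC2 loadings as read from Table 1.
PC1 = [0.49124, -0.6836, 0.07692, 0.02714, -0.0212, 0.53315]
PC2 = [-0.3048, -0.3413, 0.44408, 0.58182, 0.44805, -0.2327]


def dominant_angles(loadings, count=3):
    """Names of the angles with the largest absolute loadings, largest first."""
    ranked = sorted(zip(ANGLES, loadings), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:count]]


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


print("PC1 dominated by:", dominant_angles(PC1))
print("PC2 dominated by:", dominant_angles(PC2))
print("squared norms:", round(dot(PC1, PC1), 3), round(dot(PC2, PC2), 3))
print("PC1 . PC2:", round(dot(PC1, PC2), 3))
```

The ranking separates the six angles into two disjoint groups of three, one per component, which is the pattern described in the text.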

Figure 2 shows a 2D plot (PC plot) where the structural features of each member of the whole EFhD dataset are now represented by a pair of PC1 and PC2 scores. The distribution of scores has a vague anchor shape, with the majority of domains clustering on each of the two arms of the anchor and a few on the shank. Each EFhD is symbol/color coded according to Table 2. All members of the same family in the same calcium/peptide bound state are given the same symbol/color code. Colors were chosen to maximize the readability of the figure: "warm" colors for domains with high PC1 scores, "cold" colors for domains with low PC1 scores, and intermediate colors for intermediate PC1 scores to visualize differences in PC2. It can be immediately noted that points with identical combinations of colors and symbols cluster well, as discussed in detail below. We should stress that the particular choice of colors does not introduce any bias in interpretation, since all members of each family in the same state have the same code independently of how well they cluster in the PC plot.

We initially assessed whether the PC1 and PC2 scores of a given EFhD are able to predict the family to which an EF-hand protein belongs on the basis of the variation of such parameters upon calcium binding. The variations of PC1 and PC2 scores upon calcium binding for all apo/holo domain pairs with known structure were subjected to statistical analysis (see Methods section for the PDB-entry list). One-way ANOVA showed, according to the Newman-Keuls test,35 the ability of the variation of PC1 and PC2 scores to differentiate proteins belonging to the calmodulin-like/troponin C, S100, and penta-EFh (PEF) families (Table 3). PEF proteins are, indeed, differentiated from the other two families by the PC1 variations. On the other hand, S100 proteins are differentiated from the other two families by the PC2 shifts upon calcium binding.
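Because each PC score is a linear combination of zero-centered angles, the shift of a score between the apo and holo forms of the same domain depends only on the differences between the corresponding angles: the dataset mean cancels out. A minimal sketch of that calculation follows, using the PC1 and PC2 loadings from Table 1. The apo and holo angle values are invented for illustration and do not correspond to any structure in the dataset; the function name is hypothetical.

```python
# Loadings read from Table 1, in the order A12, A13, A14, A23, A24, A34.
PC1 = [0.49124, -0.6836, 0.07692, 0.02714, -0.0212, 0.53315]
PC2 = [-0.3048, -0.3413, 0.44408, 0.58182, 0.44805, -0.2327]


def pc_shift(apo_angles, holo_angles):
    """Return (delta PC1, delta PC2) for a change from apo to holo angles.

    The mean used for centering is the same for both forms, so it cancels
    and only the angle differences enter the result.
    """
    delta = [h - a for a, h in zip(apo_angles, holo_angles)]
    d_pc1 = sum(l * d for l, d in zip(PC1, delta))
    d_pc2 = sum(l * d for l, d in zip(PC2, delta))
    return d_pc1, d_pc2


# Invented angles (degrees) for a hypothetical apo/holo pair.
apo = [135.0, 105.0, 125.0, 110.0, 45.0, 120.0]
holo = [95.0, 155.0, 110.0, 120.0, 35.0, 95.0]
d1, d2 = pc_shift(apo, holo)
print(f"delta PC1 = {d1:.1f}, delta PC2 = {d2:.1f}")
```

A decrease in angles I-II and III-IV combined with an increase in angle I-III, as in this invented pair, drives PC1 toward more negative values, which follows from the signs of the PC1 loadings.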
Thus, using both the PC1 and PC2 shifts upon calcium binding allows unambiguous assignment of a given EFhD to the family to which the protein bearing it belongs. This means that the PC shifts would be endowed with predictive power for family identification of a hypothetical new EFhD.

We turn now to a detailed analysis of each family separately, to assess to what extent in each case the PC plot is able to grasp their essential structural features and, more importantly, the peculiar conformational changes upon binding of calcium and peptides. We begin our analysis with the most represented groups of EFhDs in our dataset, i.e., the calmodulin-like protein family group (75 entries, 24.4% of the EFhD dataset) and the S100 protein family group (31 entries, 10.1% of the EFhD dataset). We will then consider other well represented domain groups such as skeletal and cardiac troponin Cs, penta-EFh (PEF) proteins, proteins containing EH motifs, parvalbumins, and myosin essential or regulatory light chains. Altogether, the above-mentioned groups of EFhDs belong to only 14 families, but cover about 80% of the whole dataset. Finally, the remaining 20% of the domains, which spread over as many as 22 protein families, will be considered.

Figure 2. (A) Principal component plot of the whole EFhD dataset derived from PCA of the six inter-helix angles. Eight colors and fourteen symbols are differently assigned for the N- and C-terminal domains, for the apo and holo forms, and for proteins in the presence or absence of physiological ligands. The three points located at the extreme edges of the plot (top, middle-right, and bottom-left) represent the positions associated with the theoretical structures represented in the schemes shown close to each point. In such schemes the helices are represented as oriented cylinders. (B) The six interhelical angles for these structures. The PC plot only contains information about the interhelical orientations within a single EFhD and does not provide information on the reciprocal orientations of EFhDs in multi-EFhD proteins.

Calmodulin-Like Proteins. Both N- and C-terminal domains of calmodulin-like proteins in the apo form are in the so-called closed conformational state.11,12,16,36-40 Both domains undergo drastic conformational changes upon passing to the calcium bound forms, ending up in the so-called open conformational state (Supporting Information Table S2), while peptide binding to the open calcium form does not change its conformation significantly.7,41-46 In the PC plot of Figure 3A, calmodulin-like N- and C-terminal EFhDs move clearly from the right arm of the anchor (PC1 scores between 20 and 60, PC2 scores between -25 and 30) to the left arm when bound to calcium (PC1 scores between -75 and -10, PC2 scores between -40 and 15), independently of the presence of the target peptide. The only available structure of a peptide-bound C-terminal domain of apo-calmodulin47 is located in the shank (PC1 score = -14 and PC2 score = 43) rather than in the right (apo) or in the left (calcium bound) arm of the anchor. It should be noted that similar interhelical angles, and similar PC scores, are found in the C-terminal apo-myosin light chains bound to peptides, and the resulting conformation has been called semi-open.

For the PC plot to be more informative from the structural point of view, it is useful to associate some kind of idealized, reference conformational states with the various regions. We thus looked for the coordinates in the PC plot that correspond to idealized conformations (black-grey points indicated by arrows in Figure 2). A geometry with all four helices parallel/antiparallel to each other, as in a classical four-helix bundle, generates a point at the extreme right of the plot. We call this arrangement antiparallel bundle. The point at the tip of the shank represents a geometry in which each EFh has the two helices (I/II and III/IV) orthogonal to one another, while helices I/IV


and helices II/III remain parallel. We call this arrangement chair bundle. Another idealized geometry is represented by the point at the left of the plot, where helices I and III are antiparallel and all other reciprocal orientations are at 90 degrees. We call this arrangement orthogonal bundle. ["Orthogonal bundle" is also the name of one of the five domain architectures used in the CATH classification for the "mainly alpha" protein class. CATH classifies EFh as "orthogonal bundle". However, according to Figure 2, the many EFh domains falling closer to point (a) (defined as antiparallel bundle here) would rather belong to the CATH "up-down bundle" architecture. Apparently, the flexibility of EFh domains makes them span these two domain architectures.] The antiparallel bundle and orthogonal bundle conformations may be seen as limit cases of the closed and open conformations, respectively, while the chair bundle approximates the limit case of the semi-open conformation (see also later). In this view, both domains of calmodulin, upon calcium binding, move roughly from about halfway between the antiparallel and orthogonal bundles to close to the orthogonal bundle, with trajectories that, on average, are higher for the C-terminal domains (closer to the chair bundle) than for the N-terminal domains (Figure 3A).

Table 2. Color/Symbol Pairs Used to Code the 307 Domains Constituting the Whole EFhD Dataset
a CALM_BOVIN; CALM_CAEEL; CALM_YEAST; CALL_HUMAN; CALM_PARTE; CALM_HUMAN; CALL_ARABIDOPSIS; CALM_DROME; CALM_S.TUBEROSUM; CDPK_SOYBN.
b S103_HUMAN; S104_HUMAN; S106_HUMAN; S106_RABIT; S108_HUMAN; S109_HUMAN; S10A_RAT; S10B_BOVIN; S10B_HUMAN; S10B_RAT; S10D_BOVIN; S10D_PIG; S10E_HUMAN; S10P_HUMAN; S110_HUMAN; S111_PIG; S111_RABIT; S112_HUMAN.
c MLC1_YEAST; MLE_AEQIR; MLE_CHICK; MLEY_HUMAN; MLR4_SCHPO.
d PRVB_ESOLU; PRVB_CYPCA; PRVB_MERBI; PRVB_MERMR; PRVA_TRISE; ONCO_RAT; PRVA_RAT; PRVA_ESOLU; PRVA_HUMAN.

S100 Proteins. In the apo-form, all interhelical angles are somewhat different from those of both N- and C-terminal domains of calmodulin, angles I-II, I-III, I-IV, and II-IV being smaller and angles II-III and III-IV being larger, on average.48-51 Upon calcium binding, there are striking changes in the extent of the individual angles, especially angles I-III, II-III, and III-IV (see Supporting Information Table S2).48,49,51,52 These striking changes are of the same magnitude as those observed for the corresponding angles in the C-terminal domain of calmodulin, but are not accompanied by similar changes in the other angles. Indeed, it has already been noticed that these changes are mainly due to a large movement of helix III alone.2,53,54 The location of S100 proteins in the PC plot is shown in Figure 3B. The single EFhDs of S100 proteins in the apo form are in the extreme right

Table 3. Newman-Keuls Test Results for the ANOVA One-Way Analysis of PC Scores Distances35

PC1 distances(a)
                   S100        calmodulin-like   PEF
S100                           0.382520          0.008948
calmodulin-like    0.382520                      0.003851
PEF                0.008948    0.003851

PC2 distances(a)
                   S100        calmodulin-like   PEF
S100                           0.003005          0.001055
calmodulin-like    0.003005                      0.308868
PEF                0.001055    0.308868

(a) Two groups are distinguishable if P < 0.01 (bold values).

arm of the anchor (PC1 scores between 45 and 85, PC2 scores between -5 and 40), and when bound to two calcium ions shift to the left to reach the tip (PC1 scores between 0 and 20, PC2 scores between -30 and 5). The different orientation of only one helix is correctly reflected in the movement observed along PC1 in the PC plot, which is in the same direction but less marked than that observed in calmodulin and, moreover, does not bring the calcium-bound forms to the left arm of the anchor. S100 thus roughly moves from being close to the antiparallel bundle limit to about halfway along the line connecting the antiparallel bundle with the orthogonal bundle, while remaining far from the chair bundle limit. Peptide binding to this calcium-loaded EFhD does not shift the position in the PC plot appreciably, while the only peptide-bound apo S100 EFhD structure available (S100A10) clusters together with the calcium-bound forms.48 This structure has already been noted to be exceptional, because of the lack of essential calcium binding amino acids in both loops and major differences in the length of the linker connecting the two EF-hands, the most flexible region in S100A10.55 Packing in the hydrophobic core and solvation effects also contribute to the structural features of peptide-bound S100A10.1,56-58

Skeletal and Cardiac Troponins C. Skeletal and cardiac troponins C are very similar to calmodulin, and are often considered as calmodulin-like proteins.35 In all of their forms (apo and holo, with and without bound peptide), the six interhelical angles are very similar to those of calmodulin (Supporting Information Table S2). The only difference is in the angle III-IV of the C-terminal domain, which is slightly larger in the calcium peptide bound form for troponin C with respect to calmodulin.
Accordingly, each different form of these proteins occupies very similar regions of the PC plot, with a slightly less negative PC1 score for the calcium peptide bound form of troponin C (Figure 3C). Cardiac troponin Cs are also structurally characterized in the one-calcium bound form. The interhelical angles that change upon calcium binding show an intermediate behavior (angle I-II changing mainly upon binding of the second calcium ion, angle I-III changing mainly upon binding of the first calcium ion, and angle III-IV changing progressively, Supporting Information Table S2). This is mainly reflected in the PC plot as a progressive decrease in the PC1 score (Figure 3C). As for the N-terminal domain of calmodulin, the movement upon calcium binding is essentially from the antiparallel to the orthogonal bundle, starting from mid-way and ending relatively close to the latter limit.
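The PC score ranges quoted in the text for the calmodulin-like and S100 domains can be encoded as rectangles in the PC plot, which makes it easy to see where a given point falls. The sketch below is only an illustration: treating the quoted ranges as hard rectangular boundaries is an assumption made here, not a classifier proposed by the authors, and the names are hypothetical.

```python
# (label, PC1 min, PC1 max, PC2 min, PC2 max), using the ranges quoted in the text.
REGIONS = [
    ("calmodulin-like, apo", 20, 60, -25, 30),
    ("calmodulin-like, calcium bound", -75, -10, -40, 15),
    ("S100, apo", 45, 85, -5, 40),
    ("S100, calcium bound", 0, 20, -30, 5),
]


def matching_regions(pc1, pc2):
    """Return the labels of all quoted score ranges that contain the point."""
    return [
        label
        for label, lo1, hi1, lo2, hi2 in REGIONS
        if lo1 <= pc1 <= hi1 and lo2 <= pc2 <= hi2
    ]


# The peptide-bound C-terminal domain of apo-calmodulin (PC1 = -14, PC2 = 43).
print(matching_regions(-14, 43))
# A hypothetical point on the right arm of the anchor.
print(matching_regions(50, 10))
```

The point at (-14, 43) lies outside all four rectangles, consistent with its description as sitting in the shank rather than in either arm. The apo ranges of the two families overlap, so position alone in the apo state does not separate them; this is in line with the text's use of the shifts upon calcium binding, rather than static scores, for family assignment.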


PEF Proteins. PEF proteins represent a grouping of EFh proteins characterized by a different general architecture, including sorcin, ALG-2, and grancalcin (Figure 3D) and the small and large subunits of calpains (Figure 3E). The apo-form of the N-terminal EFh domain of the small subunit of calpain has interhelical angles very similar to the corresponding domain of calmodulin59 (Supporting Information Table S2) and, accordingly, sits in the same region of the PC plot (Figure 3E). The corresponding domains of all other PEFs, when available, are similar to one another and differ a little more from those of the N-terminal domain of calmodulin, showing a slightly higher I-III angle and slightly smaller I-IV and II-III angles. In the PC plot, the latter proteins have PC1 scores in the range 0-30 while the former falls in the 30-60 range (Figure 3D). The apo-forms of the C-terminal domain of all PEFs are similar to one another but differ appreciably from those of calmodulin-like proteins. In particular, the I-III angle is somewhat larger while angles I-IV and II-III are smaller (Supporting Information Table S2). Accordingly, the PC2 score is between 30 and 0 for calmodulin-like proteins and between 0 and -30 for PEFs, i.e., farther away from the chair bundle limit. It has already been noted that PEFs remain in the closed form upon calcium binding.59-61 However, some interhelical angles change appreciably. For instance, in the N-terminal domain of the small subunit of calpain, angles I-II, I-IV, and II-IV decrease, while the I-III angle increases (Supporting Information Table S2). This translates into a significant decrease of the PC1 score and a small decrease in the PC2 score (Figure 3E), in the same antiparallel-orthogonal bundle direction as observed in the corresponding domain of calmodulin-like proteins.
On the other hand, addition of calcium to the C-terminal domain does not cause any significant change in the interhelical angles (Supporting Information Table S2) and hence in the PC scores (Figure 3E). Proteins Containing EH Motifs. EH proteins of known structure are the EPS15, the EPS15 receptor protein and the REPS proteins. The EH domain resembles the N-terminal domain of calmodulin in the apo-form (Supporting Information Table S2),62,63 and sits in the same region of the PC plot (Figure 3F). At variance with calmodulin, addition of calcium (only one loop is able to bind calcium) or of calcium and peptide does not appreciably alter the interhelical angles and, therefore, the PC scores. Parvalbumin. Parvalbumins (both alpha and beta lineages) are believed to originate from an ancestral calmodulin-like tandem domain protein which lost the first EF-hand of the N-terminal domain.64,65 The second, non functional, EFh behaves like an endogenous peptide bound to the C-terminal domain in the presence of calcium. The interhelical angles are very similar to those of the C-terminal domain of calcium calmodulin loaded with peptide, with the only exception of angle II-IV, which is significantly lower and more similar to that found in the calcium calmodulin without peptide (Supporting Information Table S2). Correspondingly, parvalbumins are somewhat lower in PC2 score and somewhat higher in PC1 score, resembling more closely the calcium calmodulin without peptide (Figure 3G). Myosin Essential Light Chains (MELCs). In MELCs, the Nand C-terminal domains differ. In the apo form, only angles I-II and I-IV are similar to calmodulin, while angle III-IV is smaller in both domains and angle II-III is smaller in the N-terminal and larger in the C-terminal domain (Supporting Information Table S2).11 Again, the individual differences in the interhelical angles with respect to calmodulin are sizable for

Principal Component Analysis of Conformational Freedom

research articles

Figure 3. PC plot details of main EFh protein families: (A) calmodulins (CALL_ARATH, CALL_HUMAN, CALM_BOVIN, CALM_CAEEL, CALM_DROME, CALM_ECOLI, CALM_HUMAN, CALM_PARTE, CALM_SOLTU, CALM_YEAST); (B) S100 (S103_HUMAN, S104_HUMAN, S106_HUMAN, S106_RABIT, S108_HUMAN, S109_HUMAN, S10A_RAT, S10B_BOVIN, S10B_HUMAN, S10B_RAT, S10D_BOVIN, S10D_PIG, S10E_HUMAN, S10P_HUMAN, S110_HUMAN, S111_PIG, S111_RABIT, S112_HUMAN); (C) troponin C, cardiac and skeletal (TPCC_CHICK, TPCC_HUMAN, TPCC_TROUT, TPCS_CHICK, TPCS_MELGA, TPCS_RABIT); (D) penta-EFh (SORC_CRILO, SORC_CRILO, SORC_HUMAN, PCD6_MOUSE, PCD6h_LEIMA, GRAN_HUMAN); (E) calpains, large and small subunits (CAN2_HUMAN, CAN2_RAT, CANS_HUMAN, CANS_PIG, CANS_RAT); (F) EH containing proteins (EP15_HUMAN, EP15_MOUSE, REPS1_MOUSE, REPS2_HUMAN); (G) parvalbumins (PRVA_ESOLU, PRVA_HUMAN, PRVA_RAT, PRVA_TRISE, PRVB_CYPCA, PRVB_ESOLU, PRVB_MERBI, PRVB_MERMR, ONCO_RAT, ONCO_HUMAN); (H) myosin Essential Light Chains (MLC1_YEAST,MLE_AEQIR, MLE_CHICK, MLEY_HUMAN, MLR4_SCHPO); (I) myosin Regulatory Light Chains (MLR_AEQIR).

both domains. However, the overall structure of the N-terminal domain is not much different from that of calmodulin, while that of the C-terminal domain is substantially different. This feature is strikingly underlined by the relatively large and positive values of both PC1 and PC2 scores for the MELC C-terminal domain, which place it in a unique location of the PC plot (Figure 3H). The N- and C-terminal domains react differently to binding of the target peptide: the C-terminal domain shows a relatively large decrease in both PC1 and PC2 scores, which brings this domain into the shank of the anchor. The N-terminal domain shows a small increase in PC1 score, which is further enhanced in the presence of Ca2+ and brings the domain further into the right arm of the anchor. This behavior of the N-terminal domain of MELCs is surprisingly different from that of calmodulin, S100, and all other domains, which upon binding of calcium and/or peptides invariably either move from the antiparallel to the orthogonal bundle limit, and not vice versa, or at most remain in the same region. The movement of the C-terminal domain is better described as a movement toward the orthogonal chair limit, but starting from a position that is much closer to the chair bundle than to the antiparallel bundle limit.

Myosin Regulatory Light Chains (MRLCs). For both domains of MRLCs, structural data exist only for a peptide bound form, with (C-term) and without (N-term and C-term) calcium/magnesium.66-68 The C-terminal domain behaves very similarly to the corresponding domain of calmodulin in all six interhelical angles (Supporting Information Table S2) and, as expected, falls in the same region of the PC plot (Figure 3I). Interestingly, the peptide bound N-terminal domain, with and without metal, has been described as being in an open form.1 Again, the six interhelical angles are similar (Supporting Information Table S2) and so are the PC scores (Figure 3I). The overall movement in the PC plot is along the same line as for the MELCs, but with the MRLC apo form starting already from the region of the calcium bound MELCs and the MRLC calcium bound form lying further toward the orthogonal bundle limit.

Miscellaneous EFhDs. All of the remaining EFhDs, not belonging to the eight groups already described, constitute only 20% of the total domains but are spread over 22 families. These include neuronal calcium sensors,69-72 the recently described SPARC family,73 actins,74 the CBL oncogene,75 guanylyl cyclase activating proteins,76 etc. Many of them possess interhelical angles similar to those found in one form or another of a protein already described (Supporting Information Table S2), and thus fall in the same PC plot regions.
These are summarized in Figure 4. Some particular cases are worth mentioning because some interhelical angles are peculiar and are reflected in peculiar positions in the PC plot. For example, the apo-N-terminal domain of recoverin has interhelical angles similar to those measured in calmodulin-like apo-N-terminal domains (Supporting Information Table S2) and falls in the corresponding region of the PC plot (Figure 4A). However, myristoylation of the protein dramatically alters the orientation of helix I, thereby increasing the I-II angle from about 106° to about 172°, decreasing the I-III angle from 113° to 18°, and increasing the I-IV angle from 113° to 144°, with the other angles experiencing smaller readjustments (Supporting Information Table S2). This corresponds to a large movement in the PC plot, with PC1 score increasing from 10 to 110 and PC2 score from 0 to 40 (Figure 4A). The myristoylated apo-N-terminal domain of recoverin is the domain with the highest PC1 score in the PC plot (Figure 2). An early structure of the calcium loaded C-terminal domain of centrin displays interhelical angles similar to those of the peptide bound apo C-terminal domain of calmodulin, with the exception of angles I-IV and II-III, which are somewhat higher and somewhat lower, respectively, than those of calmodulin (Supporting Information Table S2). It falls in the same region of the PC plot (Figure 4B), i.e., close to the chair bundle limit (Figure 3A).77 This somewhat aberrant behavior was later attributed to a poor construct selection, when another high-resolution structure of the same protein in complex with a peptide was solved.78 The latter structure has PC scores typical of a calmodulin-like protein. It is worth noting that the peculiar features of the early structure have been immediately captured


Figure 4. PC plot details of miscellaneous EFh protein families: (A) α-actinins and neuronal sensors (AAC1_CHICK, AAC2_HUMAN, CABP_ENTHI, CALB_HUMAN, CBL2_ARATH, CNBL4_ARATH, NCAD_HUMAN, NCB1_HUMAN, NCS1_HUMAN, NCS1_YEAST, RECO_BOVIN); (B) aequorin, obelin, centrin, calcium vector, RING finger, and calerythrin (AEQ2_AEQVI, OBL_OBELO, CAT1_HUMAN, CATR_CHLRE, CAVP_BRALA, CBL_HUMAN, CBP_SACER); (C) dystrophin, guanylyl cyclase activating protein 2, Kv Channel Interacting Protein 1, calcium and integrin-binding protein 1, plasmodial specific Lav1-2 protein, phosphoinositide-specific phospholipase C-1, polcalcin Bet V 4, sarcoplasmic calcium-binding proteins, and SPARC proteins (DMD_HUMAN, GCA2_BOVIN, Kchip1_HUMAN, KChIP1b_RAT, KIP1_HUMAN, LAV1_PHYPO, PID1_RAT, POC4_BETVE, SCP_NERDI, SCP2_BRALA, SPRC_HUMAN).

by our PC analysis. Kchip1 has the peculiarity that both loops of the N-terminal domain are unable to bind calcium because they lack essential metal binding groups. It has been observed that the variant amino acids in the loops may force the domain to stay in an open conformation even in the absence of calcium,79 although, as mentioned above for S100 proteins, alterations in the loop are not the sole determinants of the preferred conformation. The interhelical angles are very similar to those of the calcium bound N-terminal domain of calmodulin (Supporting Information Table S2), and the PC scores are correspondingly similar (Figure 4C). Finally, the N-terminal domain of sarcoplasmic calcium binding proteins, loaded with one or two calcium ions, has the I-II interhelical angle somewhat lower and the angles II-III and II-IV sizably higher with respect to calcium loaded calmodulin (Supporting Information Table S2).80 As a consequence, PC1 score is close to normal while PC2 score is sizably higher, so that this domain falls quite isolated in the upper left region of the PC plot (Figure 4C).

Semi-Open Conformation and Chair Bundle Limit. The closed/semi-open/open conformations are often defined in terms of the intra-motif angles I-II and III-IV.8,17,81,82 However, these angles have only a modest discriminatory power between the open and the semi-open conformations, as can be appreciated by inspection of Figure 5, which reports all EFhDs, color/symbol coded as in Figure 2, as a function of their I-II and III-IV interhelical angle values. Indeed, the discrimination between open and semi-open forms is mainly due to the inter-motif angles I-IV and II-III which, in our PC analysis, are reflected mostly in the PC2 score. This is because domains approximated by the chair bundle conformation have intermediate values of PC1 scores but much higher values of PC2 scores with respect to other domains with similar PC1 scores, and are thus located in the shank of the anchor, i.e., close to the chair bundle limit.

The chair bundle has been previously identified as a third distinct conformation of EF-hand domains in the regulatory region of scallop myosin on the grounds of distance difference matrices, analysis of interresidue contacts, comparison of interhelical angles, and inspection of structures using molecular graphics.11 The PC plot described in the present work seems to greatly simplify the structural analysis necessary to discriminate among these three limit conformations. The domains with similar PC1 scores but low PC2 scores, i.e., located at the lower tip of the anchor and far away from the chair bundle limit, deserve further comment. They include, among others, the N-terminal domains of several proteins with only one calcium/magnesium and no peptide bound (yellow symbols). They are indeed characterized by an incomplete opening of the domain and, in our view, represent the true intermediate situation between a closed and an open domain, as they lie on the line connecting the antiparallel and the orthogonal bundle limits.

Figure 5. Plot of all EFhDs, color/symbol coded as in Table 2, as a function of their I-II (EFh1) and III-IV (EFh2) interhelical angle values.

Conclusion

In this work, a complete structural dataset of EFhDs has been assembled, and the structural features of these domains have been analyzed in terms of their six interhelical angles. All of these angles have been directly remeasured from the original PDB files, and a protocol has been devised to minimize heterogeneity due to nonideality of the helices and unnecessary use of positive and negative values for the angles themselves. The angle values have been subjected to a principal component analysis (PCA) to reduce redundancy. Just two principal components (PC1 and PC2) account for 80% of the total variance. The intra-motif angles I-II and III-IV, although useful to describe the transition between the so-called closed and open forms of the EFhDs, show a large covariance and therefore are not enough to provide a complete description of the conformational space spanned by the EFhDs. In particular, the inter-motif angle I-III adds precious, nonredundant information. In fact, angles I-II and III-IV are not the largest contributors to PC1 score and are actually the smallest contributors to PC2 score. The information contained in angles I-II, I-III, and III-IV is condensed into PC1, whereas PC2 adds the contribution of the other three angles. Most of the essential conformational features and conformational variations of each EFhD family are thus distinctly highlighted in the PC plot of Figure 2, as detailed in Figures 3-4.

We conclude that PCA is a convenient step that stops just short of a complete analysis of all the interhelical angles and all the structural parameters, while still capturing the essential information. For example, N-terminal apo-calmodulins have positive PC1 scores and close to zero PC2 scores, and move to negative PC1 scores upon calcium and/or peptide binding. A visual inspection of the PC plot is enough to see that they move along the line connecting the antiparallel bundle to the orthogonal bundle limit conformations. It is not unexpected, after the analysis of the six interhelical angles, that ensembles of proteins belonging to the same family and in the same form (calcium and/or peptide-bound) fall in the same region of the PC plot. If reference is made to idealized geometries (antiparallel, orthogonal, or chair bundles), one also gains an idea of the shape of the molecule. In turn, the shape is related to the function, and the simple description in terms of two parameters may prove to be useful. For instance, the proteins that exhibit specific types of shifts in the PC plot upon calcium binding seem to be more likely to bind a large variety of substrates (e.g., calmodulin-like proteins),45,83-85 while proteins that exhibit different types of shifts (e.g., S100 proteins) seem more likely to bind only specific peptides.48,55,86-89

We should recall, however, that there is an intrinsic limitation in all structural descriptions of proteins if we do not take into account that proteins are dynamic objects, and dynamics is of course central to EF-hand proteins. Therefore, the structures in the present EFhD dataset only represent static descriptions of one or another conformational state of proteins that may in reality sample other conformational states for different fractions of time. Among the many works dealing with dynamic studies of EF-hand proteins in solution, we refer the reader to the series of papers by Akke and co-workers.90-93

In summary, we have presented an efficient and convenient way to describe essential structural features of EFh domains in terms of two parameters which are linear combinations of the six interhelical angles. Every time a new structure becomes available, such parameters provide first-hand information on its similarity with other proteins of the same or of different families.
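The two-parameter description summarized above can be illustrated with a minimal Python sketch. It is not the authors' protocol: it assumes a covariance-based PCA on mean-centered, unscaled angles, and it uses synthetic placeholder values in place of the Supporting Information angle table. The names `fit_pca`, `pc_scores`, and `ANGLE_NAMES` are hypothetical and introduced only for this illustration.

```python
import numpy as np

# Column order assumed for the six interhelical angles (hypothetical convention).
ANGLE_NAMES = ("I-II", "I-III", "I-IV", "II-III", "II-IV", "III-IV")


def fit_pca(angles):
    """Return the column means, the loadings, and the explained-variance fractions.

    Assumption: PCA on the covariance matrix of mean-centered angles.
    The sign of each loading vector is arbitrary.
    """
    angles = np.asarray(angles, dtype=float)
    mean = angles.mean(axis=0)
    cov = np.cov(angles - mean, rowvar=False)   # 6 x 6 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()
    return mean, eigvecs, explained


def pc_scores(angles, mean, loadings, n_components=2):
    """Project one or more six-angle rows onto the first principal components."""
    centered = np.atleast_2d(np.asarray(angles, dtype=float)) - mean
    return centered @ loadings[:, :n_components]


# Synthetic placeholder data: 200 "domains" driven by two hidden factors plus noise.
# These numbers are NOT measured angles and need not be physically meaningful.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([30.0, 15.0])
mixing = rng.normal(size=(2, len(ANGLE_NAMES)))
angles = 90.0 + latent @ mixing + rng.normal(scale=5.0, size=(200, len(ANGLE_NAMES)))

mean, loadings, explained = fit_pca(angles)
scores = pc_scores(angles, mean, loadings)          # (PC1, PC2) for every domain
new_scores = pc_scores(angles[0], mean, loadings)   # a "new structure" projected alone

print("variance retained by PC1 + PC2:", explained[:2].sum())
print("PC1/PC2 of the new structure:", new_scores[0])
```

With a real angle table in place of the synthetic one, `explained[:2].sum()` is the quantity to compare with the 80% figure quoted above, and `pc_scores` places a newly solved structure on the existing PC plot without refitting.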

Acknowledgment. This research was financially supported by Ente Cassa di Risparmio di Firenze and Fondo per gli Investimenti per la Ricerca di Base (MIUR), contract RBNE01TTJW.

Supporting Information Available: Supporting Information Table S1 listing structural parameters and sequences of all the EF-hand domains present in the PDB database. Supporting Information Table S2 listing interhelical angles and main principal components ranges for EFhD families. Supporting Information Figure S1 showing the distribution of values for each of the six interhelical angles within the EFhD dataset. This material is available free of charge via the Internet at http://pubs.acs.org.

References
(1) Nelson, M. R.; Chazin, W. J. Biometals 1998, 11, 297-318.
(2) Bhattacharya, S.; Bunick, C. G.; Chazin, W. J. Biochim. Biophys. Acta 2004, 1742, 69-79.
(3) Carafoli, E.; Klee, C. B. Calcium as Cellular Regulator; Oxford University Press: New York, 1999.
(4) Evenas, J.; Malmendal, A.; Forsen, S. Curr. Opin. Chem. Biol. 1998, 2, 293-302.
(5) Han, J. D.; Bertin, N.; Hao, T.; Goldberg, D. S.; Berriz, G. F.; Zhang, L. V.; Dupuy, D.; Walhout, A. J.; Cusick, M. E.; Roth, F. P.; Vidal, M. Nature 2004, 430, 88-93.
(6) Kretsinger, R. H. Annu. Rev. Biochem. 1976, 45, 239-266.
(7) Hoeflich, K. P.; Ikura, M. Cell 2002, 108, 739-742.
(8) Yap, K. L.; Ames, J. B.; Swindells, M. B.; Ikura, M. Proteins 1999, 37, 499-507.
(9) Theret, I.; Baladi, S.; Cox, J. A.; Gallay, J.; Sakamoto, H.; Craescu, C. T. Biochemistry 2001, 40, 13888-13897.
(10) Harris, N. L.; Presnell, S. R.; Cohen, F. E. J. Mol. Biol. 1994, 236, 1356-1368.
(11) Nelson, M. R.; Chazin, W. J. Protein Sci. 1998, 7, 270-282.
(12) Yap, K. L.; Ames, J. B.; Swindells, M. B.; Ikura, M. Methods Mol. Biol. 2002, 173, 317-324.
(13) de Alba, E.; Tjandra, N. Biochemistry 2004, 43, 10039-10049.
(14) Akke, M.; Forsen, S.; Chazin, W. J. J. Mol. Biol. 1995, 252, 102-121.
(15) Ikura, M. Trends Biochem. Sci. 1996, 21, 14-17.
(16) Zhang, M.; Tanaka, T.; Ikura, M. Nat. Struct. Biol. 1995, 2, 758-767.
(17) Kuboniwa, H.; Tjandra, N.; Grzesiek, S.; Ren, H.; Klee, C. B.; Bax, A. Nat. Struct. Biol. 1995, 2, 768-776.
(18) Finn, B. E.; Evenas, J.; Drakenberg, T.; Waltho, J. P.; Thulin, E.; Forsen, S. Nat. Struct. Biol. 1995, 2, 777-783.
(19) Drohat, A. C.; Amburgey, J. C.; Abildgaard, F.; Starich, M. R.; Baldisseri, D.; Weber, D. J. Biochemistry 1996, 35, 11577-11588.
(20) Massart, D. L.; Vandeginste, B. G. M.; Deming, S. N.; Michotte, Y.; Kaufman, L. Chemometrics: A Textbook; Elsevier Science Publisher: Amsterdam, 1988.
(21) Koradi, R.; Billeter, M.; Wuthrich, K. J. Mol. Graph. 1996, 14, 5132.
(22) Becker, R. A.; Chambers, J. M.; Wilks, A. R. The New S Language - A Programming Environment for Data Analysis and Graphics; Wadsworth and Brooks/Cole: Pacific Grove, CA, 1988.
(23) Mardia, K. V.; Kent, J. T.; Bibby, J. M. Multivariate Analysis; Academic Press: London, 1979.
(24) Venables, W. N.; Ripley, B. D. Modern Applied Statistics with S-PLUS. Statistics and Computing, 4th ed.; Springer-Verlag: New York, 2002.

(25) Cattell, R. B. Multivar. Behav. Res. 1966, 1, 245-276.
(26) Kaiser, H. F. Educ. Psychol. Meas. 1960, 20, 141-151.
(27) Hakstian, R. A.; Rogers, W. T.; Cattell, R. B. Multivar. Behav. Res. 1982, 17, 193-219.
(28) Pearl, F. M.; Bennett, C. F.; Bray, J. E.; Harrison, A. P.; Martin, N.; Shepherd, A.; Sillitoe, I.; Thornton, J.; Orengo, C. A. Nucleic Acids Res. 2003, 31, 452-455.
(29) Andreeva, A.; Howorth, D.; Brenner, S. E.; Hubbard, T. J.; Chothia, C.; Murzin, A. G. Nucleic Acids Res. 2004, 32, D226-D229.
(30) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res. 2000, 28, 235-242.
(31) Chenna, R.; Sugawara, H.; Koike, T.; Lopez, R.; Gibson, T. J.; Higgins, D. G.; Thompson, J. D. Nucleic Acids Res. 2003, 31, 3497-3500.
(32) Madera, M.; Gough, J. Nucleic Acids Res. 2002, 30, 4321-4328.
(33) Hair, J. F., Jr.; Tatham, R. L.; Anderson, R. E.; Black, W. C. Multivariate Data Analysis, 5th ed.; Prentice Hall: Upper Saddle River, New Jersey, 1998.
(34) Stevens, J. Applied Multivariate Statistics for the Social Sciences, 4th ed.; Lawrence Erlbaum Associates: Mahwah, New Jersey, 2002.
(35) Winer, B. J.; Brown, D. R.; Michels, K. M. Statistical Principles in Experimental Design, 3rd ed.; McGraw-Hill: New York, 1991.
(36) Houdusse, A.; Love, M. L.; Dominguez, R.; Grabarek, Z.; Cohen, C. Structure 1997, 5, 1695-1711.
(37) Bunick, C. G.; Nelson, M. R.; Mangahas, S.; Hunter, M. J.; Sheehan, J. H.; Mizoue, L. S.; Bunick, G. J.; Chazin, W. J. J. Am. Chem. Soc. 2004, 126, 5990-5998.
(38) Fallon, J. L.; Quiocho, F. A. Structure (Camb.) 2003, 11, 1303-1307.
(39) Mizoue, L. S.; Chazin, W. J. Curr. Opin. Struct. Biol. 2002, 12, 459-463.
(40) Ikura, M.; Hiraoki, T.; Hikichi, K.; Mikuni, T.; Yazawa, M.; Yagi, K. Biochemistry 1983, 22, 2573-2579.
(41) Ikura, M.; Clore, G. M.; Gronenborn, A. M.; Zhu, G.; Klee, C. B.; Bax, A. Science 1992, 256, 632-638.
(42) Kurokawa, H.; Osawa, M.; Kurihara, H.; Katayama, N.; Tokumitsu, H.; Swindells, M. B.; Kainosho, M.; Ikura, M. J. Mol. Biol. 2001, 312, 59-68.
(43) Porumb, T.; Yau, P.; Harvey, T. S.; Ikura, M. Protein Eng. 1994, 7, 109-115.
(44) Yagi, K.; Yazawa, M.; Ikura, M.; Hikichi, K. Adv. Exp. Med. Biol. 1989, 255, 147-154.
(45) Yap, K. L.; Kim, J.; Truong, K.; Sherman, M.; Yuan, T.; Ikura, M. J. Struct. Funct. Genomics 2000, 1, 8-14.
(46) Bertini, I.; Del, B. C.; Gelis, I.; Katsaros, N.; Luchinat, C.; Parigi, G.; Peana, M.; Provenzani, A.; Zoroddu, M. A. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 6841-6846.
(47) Elshorst, B.; Hennig, M.; Forsterling, H.; Diener, A.; Maurer, M.; Schulte, P.; Schwalbe, H.; Griesinger, C.; Krebs, J.; Schmid, H.; Vorherr, T.; Carafoli, E. Biochemistry 1999, 38, 12320-12332.
(48) Bhattacharya, S.; Large, E.; Heizmann, C. W.; Hemmings, B.; Chazin, W. J. Biochemistry 2003, 42, 14416-14426.
(49) Dempsey, A. C.; Walsh, M. P.; Shaw, G. S. Structure (Camb.) 2003, 11, 887-897.
(50) Vallely, K. M.; Rustandi, R. R.; Ellis, K. C.; Varlamova, O.; Bresnick, A. R.; Weber, D. J. Biochemistry 2002, 41, 12670-12680.
(51) Rustandi, R. R.; Baldisseri, D. M.; Inman, K. G.; Nizner, P.; Hamilton, S. M.; Landar, A.; Landar, A.; Zimmer, D. B.; Weber, D. J. Biochemistry 2002, 41, 788-796.
(52) Zhang, H.; Wang, G.; Ding, Y.; Wang, Z.; Barraclough, R.; Rudland, P. S.; Fernig, D. G.; Rao, Z. J. Mol. Biol. 2003, 325, 785-794.
(53) Sastry, M.; Ketchem, R. R.; Crescenzi, O.; Weber, C.; Lubienski, M. J.; Hidaka, H.; Chazin, W. J. Structure 1998, 6, 223-231.
(54) Maler, L.; Sastry, M.; Chazin, W. J. J. Mol. Biol. 2002, 317, 279-290.
(55) Rety, S.; Sopkova, J.; Renouard, M.; Osterloh, D.; Gerke, V.; Tabaries, S.; Russo-Marie, F.; Lewit-Bentley, A. Nat. Struct. Biol. 1999, 6, 89-95.
(56) Ababou, A.; Desjarlais, J. R. Protein Sci. 2001, 10, 301-312.
(57) Ababou, A.; Shenvi, R. A.; Desjarlais, J. R. Biochemistry 2001, 40, 12719-12726.
(58) Skelton, N. J.; Kordel, J.; Akke, M.; Forsen, S.; Chazin, W. J. Nat. Struct. Biol. 1994, 1, 239-245.
(59) Xie, X.; Dwyer, M. D.; Swenson, L.; Parker, M. H.; Botfield, M. C. Protein Sci. 2001, 10, 2419-2425.
(60) Jia, J.; Borregaard, N.; Lollike, K.; Cygler, M. Acta Crystallogr. D. Biol. Crystallogr. 2001, 57, 1843-1849.
(61) Jia, J.; Tarabykina, S.; Hansen, C.; Berchtold, M.; Cygler, M. Structure (Camb.) 2001, 9, 267-275.

(62) Enmon, J. L.; de Beer, T.; Overduin, M. Biochemistry 2000, 39, 4309-4319.
(63) Whitehead, B.; Tessari, M.; Carotenuto, A.; van Bergen en Henegouwen, P. M.; Vuister, G. W. Biochemistry 1999, 38, 11271-11277.
(64) Cox, J. A.; Durussel, I.; Scott, D. J.; Berchtold, M. W. Eur. J. Biochem. 1999, 264, 790-799.
(65) Babini, E.; Bertini, I.; Capozzi, F.; Del, B. C.; Hollender, D.; Kiss, T.; Luchinat, C.; Quattrone, A. Biochemistry 2004, 43, 16076-16085.
(66) Xie, X.; Harrison, D. H.; Schlichting, I.; Sweet, R. M.; Kalabokis, V. N.; Szent-Gyorgyi, A. G.; Cohen, C. Nature 1994, 368, 306-312.
(67) Houdusse, A.; Cohen, C. Structure 1996, 4, 21-32.
(68) Houdusse, A.; Szent-Gyorgyi, A. G.; Cohen, C. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 11238-11243.
(69) Vijay-Kumar, S.; Kumar, V. D. Nat. Struct. Biol. 1999, 6, 80-88.
(70) Kissinger, C. R.; Parge, H. E.; Knighton, D. R.; Lewis, C. T.; Pelletier, L. A.; Tempczyk, A.; Kalish, V. J.; Tucker, K. D.; Showalter, R. E.; Moomaw, E. W. Nature 1995, 378, 641-644.
(71) Flaherty, K. M.; Zozulya, S.; Stryer, L.; McKay, D. B. Cell 1993, 75, 709-716.
(72) Bourne, Y.; Dannenberg, J.; Pollmann, V.; Marchot, P.; Pongs, O. J. Biol. Chem. 2001, 276, 11949-11955.
(73) Hohenester, E.; Maurer, P.; Hohenadl, C.; Timpl, R.; Jansonius, J. N.; Engel, J. Nat. Struct. Biol. 1996, 3, 67-73.
(74) Atkinson, R. A.; Joseph, C.; Kelly, G.; Muskett, F. W.; Frenkiel, T. A.; Nietlispach, D.; Pastore, A. Nat. Struct. Biol. 2001, 8, 853-857.
(75) Zheng, N.; Wang, P.; Jeffrey, P. D.; Pavletich, N. P. Cell 2000, 102, 533-539.
(76) Ames, J. B.; Dizhoor, A. M.; Ikura, M.; Palczewski, K.; Stryer, L. J. Biol. Chem. 1999, 274, 19329-19337.
(77) Matei, E.; Miron, S.; Blouquit, Y.; Duchambon, P.; Durussel, I.; Cox, J. A.; Craescu, C. T. Biochemistry 2003, 42, 1439-1450.
(78) Hu, H.; Chazin, W. J. J. Mol. Biol. 2003, 330, 473-484.
(79) Scannevin, R. H.; Wang, K.; Jow, F.; Megules, J.; Kopsco, D. C.; Edris, W.; Carroll, K. C.; Lu, Q.; Xu, W.; Xu, Z.; Katz, A. H.; Olland, S.; Lin, L.; Taylor, M.; Stahl, M.; Malakian, K.; Somers, W.; Mosyak, L.; Bowlby, M. R.; Chanda, P.; Rhodes, K. J. Neuron 2004, 41, 587-598.
(80) Tossavainen, H.; Permi, P.; Annila, A.; Kilpelainen, I.; Drakenberg, T. Eur. J. Biochem. 2003, 270, 2505-2512.
(81) Chou, J. J.; Li, S.; Klee, C. B.; Bax, A. Nat. Struct. Biol. 2001, 8, 990-997.
(82) Song, J.; Zhao, Q.; Thao, S.; Frederick, R. O.; Markley, J. L. J. Biomol. NMR 2004, 30, 451-456.
(83) Drum, C. L.; Yan, S. Z.; Bard, J.; Shen, Y. Q.; Lu, D.; Soelaiman, S.; Grabarek, Z.; Bohm, A.; Tang, W. J. Nature 2002, 415, 396-402.
(84) Matsubara, M.; Nakatsu, T.; Kato, H.; Taniguchi, H. EMBO J. 2004, 23, 712-718.
(85) Clapperton, J. A.; Martin, S. R.; Smerdon, S. J.; Gamblin, S. J.; Bayley, P. M. Biochemistry 2002, 41, 14669-14679.
(86) McClintock, K. A.; Shaw, G. S. J. Biol. Chem. 2003, 278, 6251-6257.
(87) Rety, S.; Osterloh, D.; Arie, J. P.; Tabaries, S.; Seeman, J.; Russo-Marie, F.; Gerke, V.; Lewit-Bentley, A. Structure Fold. Des. 2000, 8, 175-184.
(88) Wu, H.; Maciejewski, M. W.; Marintchev, A.; Benashski, S. E.; Mullen, G. P.; King, S. M. Nat. Struct. Biol. 2000, 7, 575-579.
(89) Itou, H.; Yao, M.; Fujita, I.; Watanabe, N.; Suzuki, M.; Nishihira, J.; Tanaka, I. J. Mol. Biol. 2002, 316, 265-276.
(90) Akke, M.; Chazin, W. J. Nat. Struct. Biol. 2001, 8, 910-912.
(91) Evenas, J.; Malmendal, A.; Akke, M. Structure (Camb.) 2001, 9, 185-195.
(92) Evenas, J.; Forsen, S.; Malmendal, A.; Akke, M. J. Mol. Biol. 1999, 289, 603-617.
(93) Malmendal, A.; Evenas, J.; Forsen, S.; Akke, M. J. Mol. Biol. 1999, 293, 883-899.

PR050148N

Journal of Proteome Research • Vol. 4, No. 6, 2005 1971