Biophysical Constraints for Protein Structure Prediction - American

Jul 19, 2002 - Department of Biochemistry and Molecular Biology, Georgetown University School of Medicine,. Washington D.C. 20007, U.S.A., Institute f...
Biophysical Constraints for Protein Structure Prediction

Olga Tcherkasskaya,*,† Eugene A. Davidson,† and Vladimir N. Uversky‡,§

Department of Biochemistry and Molecular Biology, Georgetown University School of Medicine, Washington D.C. 20007, U.S.A., Institute for Biological Instrumentation, RAS, 142292 Pushchino, Moscow Region, Russia, and Department of Chemistry and Biochemistry, University of California, Santa Cruz, California 95064, U.S.A.

Received July 19, 2002

However desirable it would be, no single experimental technique or computational approach is sufficient to rationalize a protein structure. The incorporation of biophysical constraints derived from conventional biophysical measurements might lead to considerable improvement of the simulation procedures. In this regard, our analysis of 180 proteins in different conformational states allows prediction of the overall protein dimension from the chain length, i.e., the protein molecular weight, with an accuracy of 10%.

Keywords: protein • structure • folding • molecular dimension • prediction

Introduction

Following the completion of several genome sequences, immediate efforts in advanced molecular biology are focused on structural proteomics. As a result of the tremendous progress in large-scale DNA sequencing projects,¹,² the rapid growth in biological sequence information has put strong pressure on the scientific community to produce structural information for new proteins with high throughput. The estimated number of different human proteins is enormous. To complete the human proteome, each protein will have to be identified and characterized, yet most of their structures and functions are unknown. Although genomics delivered the mass of raw information as promised, gene sequences reveal little about protein function or disease relevance. In fact, the true value of the genome sequence information will be realized only after a function, among other things, has been assigned to each of the encoded proteins and their modified products. The challenge facing proteomics is enormous because more than 75% of the proteins in a multicellular organism have no known cellular function.³ Because the function of a protein has many facets, ranging from biochemical activity to a physiological role, the optimal proteomics strategy must integrate many different technologies, which are unbiased in design and poised to discover the unknown.

The primary structure of a protein determines its three-dimensional (3D) structure, which in turn determines its function. Therefore, experimental techniques that are capable of providing information on protein topology play a key role in structural proteomics. Even though the number of techniques that can provide 3D structure data increases steadily (e.g., synchrotron X-ray crystallography, microcrystallography, solid-state NMR, and electron microscopy), determining the complete protein conformation is still a long, involved process, with proteomics aiming at what is presently an unattainable goal. With the human-genome effort poised to identify many thousands of new proteins, a faster way to determine 3D structures is urgently required. In this context, it is noteworthy that the conservation of protein folds is more robust than the conservation of protein sequences, and that there exists a significant correlation between structure and biological function, which is being explored systematically.⁴⁻⁶ Structural homology is, thus, a very powerful tool by which we can assign functions to new proteins that bear only a relatively remote sequence resemblance to known proteins.

During the past decade, a large number of computational approaches intended to predict the 3D structure of a protein with a given amino acid sequence have been proposed.⁷⁻⁹ Indeed, molecular modeling has become a versatile technique that can be applied to a variety of macromolecular studies. The number of software packages including different modeling-type simulations has increased dramatically.⁷ The first class of protein structure prediction methods, including threading and comparative (homology) modeling, relies on detectable similarity between most of the modeled sequence and at least one known structure. The second class of methods, de novo or ab initio methods, predicts the structure from sequence alone, without relying on similarity at the fold level between the modeled sequence and any of the known structures; no direct experimental data are used. Note that regardless of the computational methodology used for ab initio prediction, the candidate structure always needs to be verified.

* To whom correspondence should be addressed. Phone: (202) 687-1303. Fax: (202) 687-7186. E-mail: [email protected].
† Department of Biochemistry and Molecular Biology, Georgetown University School of Medicine.
‡ Institute for Biological Instrumentation, RAS.
§ Department of Chemistry and Biochemistry, University of California, Santa Cruz.

10.1021/pr025552q CCC: $25.00 © 2003 American Chemical Society
Journal of Proteome Research 2003, 2, 37-42. Published on Web 10/16/2002.

Basically, direct data from a variety of biophysical experiments can be included in the prediction procedures, complementing one another. Indeed, modeling-type simulations are utilized routinely in experimental methods for structure determination, e.g., in crystallography and NMR. In the latter, for instance, the experimental data are incorporated into restraint terms in the force field. Recently, several approaches have been developed in which simulation schemes include data from either residual dipolar couplings¹⁰ or small-angle X-ray scattering and electron micrographs.¹¹,¹² As to the computational time needed to generate the initial set of candidate structures, the simple coarse-grained approach¹²⁻¹⁴ seems to be the technique of choice. This method is intended to capture the overall topology of the protein's native conformation: a candidate set of Cα-only backbones is generated, and no secondary structure or side-chain orientation information is included in the algorithm. This representation is a good starting point for hierarchical modeling of protein structure, with more details added at different levels.¹⁵ It is a homogeneous model in the sense that all interactions between residues are treated as identical in form and value. Given a sufficiently accurate model of the Cα backbone, an all-atom model can be built in a relatively straightforward manner by side-chain packing.¹⁶,¹⁷ Above all, this approach requires shorter computational time, allows for generating a large protein-fold library in advance, and uses a minimum number of preliminary assumptions. In this context, the use of coarse-grained approaches in combination with physical filters, based on biophysical/biochemical experiments, as constraints on structural prediction algorithms might contribute to the effective operation of high-throughput structural genomics and, ultimately, to its application in identifying the function of unknown genes. However desirable it would be, no single experimental technique or computational approach is sufficient to rationalize a protein structure.
It is essential, therefore, to develop bioinformatic tools and databases that might be useful for structure prediction. Such databases might include information on the correlation between protein/peptide chain length and molecular dimension, the distribution of distances between solvent-exposed residues, and other aspects. In the present work, we address these issues with regard to the effective hydrodynamic radius of the protein molecule in a variety of conformational states. Clearly, the size and shape of the bounding volume used for structure simulations play a crucial role in determining the efficiency and accuracy of any algorithm. The incorporation of a size/shape constraint derived from experimental data will lead to considerable improvement of simulation procedures.

Results and Discussion

Molecular density remains the most unambiguous characteristic of the macromolecular forms. For instance, the density of a globule is expected to be independent of chain length.¹⁸ By contrast, the density of partially collapsed or swollen macromolecules depends on both the chain length (and therefore on the molecular weight, M) and the solvent. In this context, under conditions known as "ideal" or "Θ-conditions", that is, when the attractions between the macromolecular segments are balanced by their interactions with the solvent, the density of macromolecules is expected¹⁸⁻²⁰ to follow $M^{-0.5}$, and the conformational behavior of polymer molecules can be described by Gaussian statistics.¹⁸ It appears that understanding protein folding might be aided by an analysis of the density of proteins adopting different conformational states. Recently, we have found that there is a clear correlation between protein length and the dimension of the final molecular form.²¹

To further develop this idea, we analyzed the molecular density as a function of chain length for a number of conformational states of protein molecules. The first class of samples included native proteins with nearly spherical shape, whereas the second class comprised proteins that have been established to exhibit the molten globule state under physiological conditions or a non-native globule state during unfolding by denaturant or a change in pH. For characterizing the fully unfolded states induced by strong denaturants, such as urea or guanidinium chloride (GdnHCl), we initially concentrated on proteins without cross-links (i.e., on proteins either with reduced disulfide bridges or without cysteines), as the presence of cross-links might significantly perturb the molecular dimensions. Additional attention was paid to the third class of samples, namely, to proteins that exhibit denatured compact states or a pre-molten globule state during unfolding by strong denaturants. In this case, we exclusively used proteins exhibiting a four-state unfolding transition, indicating molten and pre-molten globule states, along with native and unfolded states. Further, a large number of proteins that show little or no ordered structure under physiological conditions, so-called natively unfolded proteins, were also incorporated in the analysis.²²,²³ Furthermore, a set of proteins with intact disulfide bridges that were unfolded by strong denaturants was taken into consideration. The experimental data used in the present study are gathered in Table 1. To gain insight into the residual structure of the proteins in different conformational states, we analyzed hydrodynamic data based on intrinsic viscosity, chromatography, and dynamic light scattering measurements, which allow the overall dimensions of the peptide chains under study to be recovered.
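The two limiting behaviors quoted above (a chain-length-independent density for a globule and an $M^{-0.5}$ dependence under Θ-conditions) can be checked with a short scaling argument. The sketch below assumes that the density is the chain mass divided by the volume of a sphere of radius R, and that R grows as a power of M; the exponent symbol ν is introduced only for this illustration and does not appear in the original text.

```latex
\rho = \frac{M}{\tfrac{4}{3}\pi R^{3}}, \qquad R \propto M^{\nu}
\quad\Longrightarrow\quad \rho \propto M^{\,1-3\nu}
% globule,       \nu = 1/3 :  \rho \propto M^{0}     (no dependence on chain length)
% Gaussian coil, \nu = 1/2 :  \rho \propto M^{-1/2}  (the Theta-condition result)
```

Under these assumptions, any exponent between 1/3 and 1/2 gives a density that falls with chain length more slowly than for an ideal coil.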
Specifically, we calculated the apparent density of the protein molecules, ρ, based on protein molecular weight, M, and hydrodynamic radius, R, that is, $\rho = 3M/(4\pi R^3)$. The comprehensive lists of proteins, together with the corresponding references on the data used for analysis, were published in our recent papers.²¹⁻²³ The logarithmic plot in Figure 1 displays molecular density as a function of the molecular weight for 180 proteins in their native, molten globule, fully unfolded (in 8 M urea and 6 M GdnHCl), and compact denatured (pre-MG) states. Altogether, the data indicate a number of equilibrium conformations, which are characterized by essential differences in molecular density and its dependence on peptide length. For instance, analysis of the native proteins, known from structural studies to be essentially spherical (curve 1), points to a strong correlation between the molecular dimensions and the chain length: in particular, molecular density shows no changes with the chain length, giving a density of about 0.5 Da/Å³. These data agree well with those reported previously.²⁴ Molten globules (curve 2) also exhibit a lack of molecular weight dependence, having a lower density of 0.39 Da/Å³. A correlation between the apparent density and the chain length is observed for fully unfolded states in urea (curve 4b) and GdnHCl (curve 5). In this case, the density follows the molecular weight as $\rho \approx M^{-0.64}$, seemingly supporting results reported elsewhere.¹⁹,²⁰,²⁵ Clearly, in all of the unfolded states, the density of the protein molecules decreases with increasing chain length, pointing to coil-like features of their molecular forms. The data generated for denatured compact states (curve 3, open triangles) are combined into one curve, demonstrating intermediate behavior with respect to the native, molten globule and fully unfolded states. In this case, we used proteins exhibiting four-state unfolding transition induced by urea or

Table 1. Hydrodynamic Characteristics of the Proteins in Different Conformational Statesᵃ

protein

basic protein; immunoglobulin binding; ubiquitin; (apo)cytochrome C; ribonuclease A; α-lactalbumin; lysozyme; intestinal fatty acid binding; tumor suppressor, p16; (apo)myoglobin; staphylococcal nuclease; β-lactoglobulin; sarcoplasmic calcium binding; adenylate kinase; ovine placental lactogen; trypsinogen; chymotrypsinogen; tryptophan synthase; β-lactamase; carbonic anhydrase B; (apo)cytochrome C; α-lactalbumin; tumor suppressor, p16; (apo)myoglobin (pH 4); β-lactoglobulin; sarcoplasmic calcium binding; adenylate kinase; ovine placental lactogen

intestinal fatty acid-binding; tumor suppressor, p16; staphylococcal nuclease (KCl); (apo)myoglobin (Na₂SO₄); adenylate kinase; tryptophan synthase; osteocalcin; protein kinase inhibitorᵈ; caldesmonᵉ; SNaseΔ, A90S mutant; Pf1 gene 5 proteinᶠ; PPI-1; DARRP-32; manganese stabilizing proteinᵍ; calreticulin, humanʰ; ribonuclease A; α-lactalbumin bovine; α-lactalbumin human; lysozyme; trypsinogen; β-lactoglobulin

α-fetoproteinʲ; Vmw65 C-terminal domain; PDE γ; Em protein; apo-cytochrome c; prothymosin α; fibronectin binding domain B; γ-synuclein; fibronectin binding domain A; ribonuclease A, reduced; β-synuclein; insulin; ubiquitin; cytochrome c; ribonuclease A; lysozyme; hemoglobin; myoglobin; β-lactoglobulin; chymotrypsinogen; insulin; albebetin

M KDa

〈R〉, Å ([η], mL/g)

protein

Native Stateᵇ 4.3 (3.8) RTEM β-lactamase 6.2 (2.8) ovalbumin A1 8.5 16 β-lactoglobulin 11.7 18.5 pepsinogen 13.7 (3.4) G-actin 14.1 18.5 (MMP-1) interstitial collagenase 14.2 (2.7) ovalbumin 15.1 20 serum albumin 16.5 20 α-fetoprotein 17.0 20.9 (3.9) hemoglobin 17.5 20.5 DnaK 18.5 22 creatine kinase, dimer 19.5 21.5 purple acid phosphatase 21.7 21.9 acetylcholinesterase 21.8 22.4 lactate dehydrogenase 25.4 19.8 GAP dehydrogenase 25.7 (2.5) aldolase 28.7 24.2 phosphoribosyl transferase 28.8 23.7 catalase 28.8 23.3 (2.9) bushy stunt virus, multimer

Molten Globule Stateᵇ 11.7 20.1 trypsinogen 14.1 20.2 β-lactamase 16.5 23.6 carbonic anhydrase B 17.0 25.3 (4.1) RTEM β-lactamase 18.5 24 (MMP-1) interstitial collagenase 19.5 24.2 ovalbumin 21.7 24.3 α-fetoprotein 21.8 24.5 DnaK

Compact Denatured State (nonglobule state) Unfolding by Urea or GdnHCl at 25 °Cᵇ 15.1 29 β-lactamase 16.5 30.3 carbonic anhydrase B 16.8 29.2 (MMP-1) interstitial collagenase 17.0 27.2 creatine kinase 21.7 30.3 DnaK 28.7 33.9

Natively Unfolded Protein with High Hydrophobicityᶜ 5.4 18.4 calsequestrin, rabbit 7.9 22.3 calreticulin, human 14.0 28.1 calreticulin, bovine 14.1 25.0 taka-amylase A, reduced 15.8 29.5 SdrD proteinⁱ 20.8 32.3 chromatogranin B 23.1 34.0 topoisomerase I 26.5 32.7 fibronectin 40.6 46.2

Proteins with Intact Cross-Links in 8 M Urea or in 6 M GdnHClᶜ 13.7 27.3 adenylate kinase 14.1 25.1 chymotrypsinogen 14.2 25.8 (MMP-1) interstitial collagenase 14.2 25.7 serum albumin 25.4 29.1 α-fetoprotein 18.5 32.0

Unfolded States (with minimal residual structure) Coil-like Proteins with Low Hydrophobicity (pH 7, 25 °C)ᶜ 3.6 15.5 α-synuclein 9.3 28.0 fibronectin binding domain D 9.7 26.0 stathmin 11.2 28.2 CFos-AD domainᵏ 11.7 30.0 calf thymus histone 12.1 24.3 β-casein 12.3 30.7 phosvitin 13.3 30.4 chromatogranin A 13.7 31.7 caldesmon 13.7 50.6 MAP-2 14.3 32.0

Unfolded States (with minimal residual structure) 8 M Urea, 25 °C (proteins without cross-links)ᶜ 3.0 14.6 carbonic anhydrase B 8.5 24.6 β-lactamase 11.7 4.05 ovalbumin 13.7 32.4 serum albumin 14.2 33.1 lactate dehydrogenase 15.5 33.5 GAP dehydrogenase 16.9 35.1 aldolase 18.5 37.8 transferrin 25.7 45.0 thyroglobulin

6 M GdnHCl, 25 °C (proteins without cross-links)ᵇ 3.0 (6.1) β-lactamase 7.8 24.1 carbonic anhydrase B

M KDa

〈R〉, Å ([η], mL/g)

29.0 36.0 36.8 40.0 41.6 42.6 45.3 66.3 66.5 68.0 70.0 86.2 101 121 141 145 160 210 220 10700

24.5 (3.9) (3.4) (3.2) 28 27.2 (4.4) (3.7) 32.4 (3.6) 32.5 35.2 41.7 49 43.9 (3.8) (4) 51 (4) (3.4)

25.4 28.8 28.8 28.9 42.6 42.8 66.5 69.0

25.6 26.6 26.4 27 32.1 33.5 (5) 35.5 36.3

28.8 28.8 42.6 43.1 69.0

32.4 31.7 40.2 42 53.1

45.2 46.8 47.6 52.5 64.8 77.3 90.7 530

45.0 46.2 44.2 43.1 54.7 50.3 58.5 115

21.7 25.7 42.6 66.3 66.5

35.0 35.5 36.3 57.2 55.2

14.5 14.7 17.0 17.3 19.8 24.0 24.9 48.3 140.0 220.0

32.3 31.8 33.0 35.0 36.7 41.7 39.9 58.5 91.0 122.0

28.8 28.8 43.5 66.3 35.3 36.3 40.0 81.0 165.0

47.8 48.9 58.8 74 52.0 54.0 57 81.0 116.0

28.8 28.8

52 52(29.5)


Table 1. Continued protein

M KDa

〈R〉, Å ([η], mL/g)

protein

M KDa

〈R〉, Å ([η], mL/g)

ubiquitin; prothymosin α; ribonuclease A; α-lactalbumin; lysozyme; α-synuclein; intestinal fatty acid-binding; intestinal fatty acid-binding; myoglobin; apoflavodoxin; tumor suppressor, p16; staphylococcal nuclease; dihydrofolate reductase; β-lactoglobulin; adenylate kinase; ovine placental lactogen; chymotrypsinogen

8.5 12.1 13.7 14.1 14.2 14.5 14.7 15.1 17.2 16.5 16.5 17.5 17.6 18.5 21.7 21.8 25.7

25.8 31.4 32.8(16.2) 31.8 (17) 34.3 36.1 36.4 (18.8) 37.7 37.1 36 37.6 37 (22.7) 42.1 41.6 (26.8)

phosphoribosyl transferase; lactate dehydrogenase; ovalbumin A1; GAP dehydrogenase; pepsinogen; aldolase; G-actin; creatine kinase; purple acid phosphatase; serum albumin; α-fetoprotein; DnaK; transferrin; acetylcholinesterase; thyroglobulin; myosin; cervical mucus glycoprotein

35.0 35.3 36.0 36.3 40.0 40.0 41.6 43.1 50.2 66.3 66.5 69 81 121 165 197 12000

(31.8) 55.1 (31.1) (34.4) (31.4) (35.4) 60 61 72 82(52) 72 73 86.8 110 (81.6) (92.6) 1300

ᵃ 〈R〉 is the hydrodynamic radius of a protein molecule; [η] is the intrinsic viscosity of a protein solution. ᵇ For details, see ref 11. ᶜ For details, see refs 22 and 23. ᵈ Heat stable. ᵉ Fragment 636-771. ᶠ D4 domain, fragment 1-144. ᵍ L245E mutant. ʰ -41C fragment. ⁱ B1-B5 fragment. ʲ Fragment 447-480. ᵏ Fragment 216-380.

Figure 1. Variation of the density of protein molecules, ρ, with protein molecular weight, M, for a number of thermodynamically stable conformational states: 1, native; 2, molten globule; 3, denatured compact (natively unfolded proteins are shown as gray reversed triangles; proteins with intact disulfide bridges in 8 M urea or 6 M GdnHCl are shown as gray squares; intermediates accumulated during the unfolding by urea or GdnHCl are shown by open triangles); 4a, coil-like proteins under physiological conditions; 4b, unfolded in 8 M urea (proteins without cross-links or with reduced cross-links); and 5, unfolded in 6 M GdnHCl (proteins without cross-links or with reduced cross-links). The proteins used for this analysis are listed in Table 1. The solid lines represent the best fit of the data to the standard function $R = K_h M^{\varepsilon}$.

GdnHCl, while the molten and pre-molten globule states along with the native and the unfolded states were observed. Overall, the data obtained for denatured compact states indicate poor solvent conditions, which encourage intramolecular interactions, compared to the fully unfolded states in 8 M urea or 6 M GdnHCl.

The analysis of natively unfolded proteins provides unexpected results, which need to be addressed in detail. We found that these proteins can be divided into two groups, based on their hydrodynamic characteristics.²²,²³ The first group of the natively unfolded proteins, characterized by relatively high hydrophobicity, follows curve 3 (gray triangles), recovered previously for the denatured compact state; the second group, characterized by lower hydrophobicity,²² exhibits conformational behavior (curve 4a) that is close to that in 8 M urea (curve 4b). Furthermore, the analysis of the unfolded proteins in 8 M urea and 6 M GdnHCl points to sequence specificity, which might affect the stability of the denatured compact state. In particular, we found that the proteins with intact disulfide bridges exhibit features of the denatured compact state even under strong denaturing conditions (curve 3, gray squares). Therefore, it is prudent to assume that disulfide bridges tend to stabilize the denatured compact state under physiological conditions.

On the basis of the recovered data, we derived standard equations for a number of conformational states of protein molecules that show the correlation between the protein chain length, characterized by M, and the hydrodynamic radius, R:

$R = K_h M^{\varepsilon}$

In the above equation, $K_h$ and $\varepsilon$ are expected to be constant for a given conformational state over a wide range of chain lengths. In fact, we found that the best fit of the experimental data to equations of this type provides the following correlations between molecular dimension and chain length:

$R(\text{native}) = (0.75 \pm 0.05)\,M^{0.33 \pm 0.02}$  (1)

$R(\text{MG}) = (0.90 \pm 0.10)\,M^{0.33 \pm 0.02}$  (2)

$R(\text{pre-MG}) = (0.60 \pm 0.10)\,M^{0.40 \pm 0.02}$  (3)

$R(\text{coiled}) = (0.28 \pm 0.02)\,M^{0.49 \pm 0.01}$  (4a)

$R(\text{8 M urea}) = (0.22 \pm 0.01)\,M^{0.52 \pm 0.01}$  (4b)

$R(\text{6 M GdnHCl}) = (0.19 \pm 0.01)\,M^{0.54 \pm 0.01}$  (5)
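As an illustration of how eqs 1-5 might be used in practice, the following minimal sketch evaluates the predicted hydrodynamic radius for each conformational state and the corresponding apparent density, $\rho = 3M/(4\pi R^3)$. It assumes that M is expressed in Da and R in Å (units inferred from Table 1 and from the densities quoted in Da/Å³, not stated next to the equations), it uses only the central values of $K_h$ and the exponent and ignores their quoted uncertainties, and the function and dictionary names are hypothetical.

```python
import math

# Central values (K_h, exponent) copied from eqs 1-5.
# Assumption: M in Da, R in Angstrom; the +/- uncertainties are ignored here.
POWER_LAW = {
    "native": (0.75, 0.33),              # eq 1
    "molten_globule": (0.90, 0.33),      # eq 2
    "pre_molten_globule": (0.60, 0.40),  # eq 3
    "coil": (0.28, 0.49),                # eq 4a
    "urea_8M": (0.22, 0.52),             # eq 4b
    "gdnhcl_6M": (0.19, 0.54),           # eq 5
}


def hydrodynamic_radius(mass_da, state):
    """Estimate R (Angstrom) from R = K_h * M**exponent for the given state."""
    k_h, exponent = POWER_LAW[state]
    return k_h * mass_da ** exponent


def apparent_density(mass_da, radius_a):
    """Apparent density rho = 3M / (4 pi R^3), in Da per cubic Angstrom."""
    return 3.0 * mass_da / (4.0 * math.pi * radius_a ** 3)


if __name__ == "__main__":
    mass = 14200.0  # a 14.2 kDa chain, the mass listed for lysozyme in Table 1
    for state in POWER_LAW:
        radius = hydrodynamic_radius(mass, state)
        density = apparent_density(mass, radius)
        print(f"{state:>20s}: R = {radius:5.1f} A, rho = {density:.3f} Da/A^3")
```

For a fixed mass, the predicted radius grows monotonically from the native state to the GdnHCl-unfolded state, which is the ordering of the curves described for Figure 1.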

Note that the data obtained for denatured compact (pre-MG) states include (1) proteins without disulfide bridges, which exhibit the pre-molten globule state upon unfolding by GdnHCl or urea; (2) natively unfolded proteins with relatively high hydrophobicity under physiological conditions; as well as (3) proteins with intact disulfide bridges in 8 M urea or 6 M GdnHCl. Further, although the natively unfolded proteins with low hydrophobicity (denoted as coiled) exhibit molecular parameters that are close to those in 8 M urea, we still distinguish these curves, owing to the differences in the solvents encouraging different molecular forms. To provide further support for the correlations described by eqs 1-5, we calculated the relative error of the approximation as follows:

$\frac{\Delta R_i}{R_i} = \frac{R_i^{\mathrm{exp}} - R_i^{\mathrm{theor}}}{R_i^{\mathrm{exp}}} \times 100\%$  (6)

and the standard deviation of the relative errors for a wide range of the molecular mass:

$\sigma^2 = \frac{1}{n-1} \sum_{i}^{n} \left( \frac{\Delta R_i}{R_i} \right)^2$  (7)

In the above expression, n is the number of protein molecules used for the data analysis. One can see from the data of Figure 2 that the relative errors of the recovered approximations are randomly distributed over the wide range of chain lengths and do not exceed 10%. A somewhat larger amplitude of the weighted residuals is observed for the pre-MG state, most likely indicating the lower stability of this state in comparison with others.

Several important observations follow from the present study. First, the hydrodynamic data testify that, regardless of the differences in primary amino acid sequence, protein molecules in a number of conformational states behave as polymer homologues, allowing speculation as to volume interactions being a driving force in native structure formation. Second, both native and molten globule proteins exhibit features of macromolecular globules, where the fluctuations of the molecular density are expected to be much less than the density itself. In this regard, the value of the exponent $\varepsilon = 0.33$ in eqs 1 and 2 is in good agreement with that predicted by the theory of the coil-globule transition.¹⁸ Third, in the denatured compact states, protein molecules reveal features that are similar to those of polymer molecules in relatively poor solvents, given that the exponent $\varepsilon$ in eq 3 has a value of 0.4.

Figure 2. Distribution of the relative errors (calculated with eq 6) for the approximations of the hydrodynamic radii of the protein molecules with standard equations of the $R = K_h M^{\varepsilon}$ type for different conformational states (see eqs 1-5). Each plot is labeled like the corresponding curve in Figure 1. The standard deviation of the relative errors, σ, calculated with eq 7, is also shown.

Concluding Remarks

A number of denatured conformational states, which are thermodynamically stable under nearly physiological conditions, have been distinguished and analyzed thoroughly in protein-folding studies. Many of them exhibit common properties, allowing speculation as to the polymeric aspects of native structure formation. For instance, the hydrodynamic behavior of protein molecules, namely, the variation of the molecular dimension with the chain length, points to an overall residual structure that is invariant across protein sequences: protein molecules in a particular conformational state behave as polymer homologues. Denatured compact (pre-MG) states, which were observed on numerous occasions, exhibit the features of a squeezed polymer coil. In this regard, small changes in the thermodynamic quality of the solvent, or changes induced by proton transfer, interactions with a ligand, fluctuations of temperature, and so forth, can trigger a transition to the more ordered molten globule or native states. In addition, the stabilization of this denatured compact state might be achieved by the incorporation of disulfide bridges into the protein sequence.

It should be noted that a bioinformatics filter based on the known dependence of protein dimension on the peptide length might aid greatly in structure prediction. Clearly, the size and shape of the bounding volume used for structure simulations play a crucial role in determining the efficiency and accuracy of any algorithm: a larger volume results in exponentially longer searches, whereas a smaller volume or wrong shape may cause exhaustive enumeration to miss relevant conformations. In fact, employing a biophysical filter based on overall protein dimension as a constraint in structure prediction allows for more effective enumeration of molecular conformations, because the restricted molecular volume sharply reduces the number of conformations to be considered.
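A size filter of the kind described above could be sketched as follows. The helper functions follow the relative error and its standard deviation as defined in eqs 6 and 7; the filter itself, its name, and the default 10% tolerance (taken from the scatter reported for Figure 2) are assumptions made for illustration and are not a procedure given in the paper. Note also that the filter normalizes by the predicted radius, whereas eq 6 normalizes by the experimental one.

```python
import math


def relative_error_percent(r_exp, r_theor):
    """Relative error in the form of eq 6: (R_exp - R_theor) / R_exp * 100%."""
    return (r_exp - r_theor) / r_exp * 100.0


def sigma_percent(errors_percent):
    """Spread of the relative errors in the form of eq 7: sqrt(sum(e_i^2) / (n - 1))."""
    n = len(errors_percent)
    if n < 2:
        raise ValueError("at least two proteins are needed")
    return math.sqrt(sum(e * e for e in errors_percent) / (n - 1))


def passes_size_filter(candidate_radius, predicted_radius, tolerance_percent=10.0):
    """Keep a candidate whose overall dimension lies within the tolerance.

    Assumption: the deviation is normalized by the predicted radius, and the
    10% default mirrors the reported accuracy rather than a prescribed cutoff.
    """
    deviation = abs(candidate_radius - predicted_radius) / predicted_radius * 100.0
    return deviation <= tolerance_percent


if __name__ == "__main__":
    predicted = 17.6  # hypothetical predicted radius, in Angstrom
    candidates = [15.0, 17.0, 18.9, 22.0]  # hypothetical candidate dimensions
    kept = [r for r in candidates if passes_size_filter(r, predicted)]
    print("kept candidates:", kept)
```

Applied before any detailed scoring, such a check discards candidates by a single comparison per structure, which is where the reduction in the number of conformations would come from.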
Given that the effective molecular dimension implies, in general, the geometric mean of the molecular axes, i.e., $R = (R_x R_y R_z)^{1/3}$, this biophysical filter might help both to identify tentative structures that exhibit larger or smaller molecular dimensions than expected, and to specify dimensions of the molecular shapes (a range of ellipsoids) used in structure simulations. Furthermore, such a database will provide estimates of the molecular dimensions of proteins in different conformational states, thereby allowing a particular conformational property to be assigned to a protein under study on the basis of conventional hydrodynamic measurements. The latter appears to be particularly important because many new proteins are structurally disordered under native conditions, namely, natively unfolded proteins. Although the present study points to a strong correlation between protein dimensions and chain length, the theoretical analysis of the databases generated with NMR and crystallography data reveals a slightly different dependence, namely, the density of the native protein is expected to decrease with peptide length.²⁶,²⁷ Clearly, further analysis of such dependencies is required to take full advantage of this information.
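The relation $R = (R_x R_y R_z)^{1/3}$ can be turned into a small helper that lists ellipsoids sharing one effective radius. The sketch below is an illustration under stated assumptions: it restricts the "range of ellipsoids" to spheroids with two equal semi-axes, and the axial ratios and function names are hypothetical choices, not values from the paper.

```python
def effective_radius(rx, ry, rz):
    """Geometric mean of the three semi-axes: R = (Rx * Ry * Rz) ** (1/3)."""
    return (rx * ry * rz) ** (1.0 / 3.0)


def spheroid_axes(radius, axial_ratio):
    """Semi-axes (a, b, b) with a = axial_ratio * b and effective radius `radius`.

    From a * b * b = radius**3 and a = axial_ratio * b it follows that
    b = radius / axial_ratio ** (1/3).
    """
    if radius <= 0 or axial_ratio <= 0:
        raise ValueError("radius and axial_ratio must be positive")
    b = radius / axial_ratio ** (1.0 / 3.0)
    return axial_ratio * b, b, b


def ellipsoid_range(radius, axial_ratios=(0.5, 1.0, 2.0, 3.0)):
    """Candidate bounding shapes (oblate to prolate) for one effective radius."""
    return [spheroid_axes(radius, p) for p in axial_ratios]


if __name__ == "__main__":
    for a, b, c in ellipsoid_range(17.6):  # 17.6 A is a hypothetical radius
        print(f"a = {a:5.1f}, b = {b:5.1f}, c = {c:5.1f}, R = {effective_radius(a, b, c):.1f}")
```

Each generated shape encloses the same volume, so the choice among them changes only the shape of the search space, not its size.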

References

(1) The Genome International Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001, 409, 860-921.
(2) Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G. et al. The sequence of the human genome. Science 2001, 291, 1304-1351.
(3) Edwards, A. M.; Arrowsmith, C. H.; des Pallieres, B. Proteomics: New tools for a new era. Modern Drug Discovery 2000, 5 (7), 35-44.
(4) Turcotte, M.; Muggleton, S. H.; Sternberg, M. J. Automated discovery of structural signatures of protein fold and function. J. Mol. Biol. 2001, 306, 591-605.
(5) Todd, A. E.; Orengo, C. A.; Thornton, J. M. Evolution of function in protein super-families, from a structural perspective. J. Mol. Biol. 2001, 307, 1113-1143.
(6) Thornton, J. M.; Todd, A. E.; Milburn, D.; Borkakoti, N.; Orengo, C. A. From structure to function: approaches and limitations. Nature Struct. Biol. 2000, 7 (Suppl.), 991-994.
(7) Forster, M. J. Molecular modeling in structural biology. Micron 2002, 33 (4), 365-384.
(8) Osguthorpe, D. J. Ab initio protein folding. Curr. Opin. Struct. Biol. 2000, 10 (2), 146-152.
(9) Simons, K. T.; Strauss, C.; Baker, D. Prospects for ab initio protein structural genomics. J. Mol. Biol. 2001, 306, 1191-1199.
(10) Rohl, C. A.; Baker, D. De novo determination of protein backbone structure from residual dipolar couplings using Rosetta. J. Am. Chem. Soc. 2002, 124 (11), 2723-2729.
(11) Zheng, W.; Doniach, S. Protein structure prediction constrained by solution X-ray scattering data and structural homology identification. J. Mol. Biol. 2002, 316 (1), 173-187.
(12) Svergun, D. I. Restoring low resolution structure of biological macromolecules from solution scattering using simulated annealing. Biophys. J. 1999, 76 (6), 2879-2886.
(13) Jernigan, R. L.; Bahar, I.; Covell, D. G.; Atilgan, A. R.; Erman, B.; Flatow, D. T. Relating the structure of HIV-1 reverse transcriptase to its processing step. J. Biomol. Struct. Dyn. 2000, 1, 49-55.
(14) Hinds, D. A.; Levitt, M. Exploring conformational space with a simple lattice model for protein structure. J. Mol. Biol. 1994, 243, 668-682.
(15) Xia, Y.; Huang, E. S.; Levitt, M.; Samudrala, R. Ab initio construction of protein tertiary structures using a hierarchical approach. J. Mol. Biol. 2000, 300, 171-185.
(16) Lee, C.; Subbiah, S. Prediction of protein side-chain conformation by packing optimization. J. Mol. Biol. 1991, 217, 373-388.
(17) Koehl, P.; Delarue, M. Application of a self-consistent mean field theory to predict protein side-chains conformation and estimate their conformational entropy. J. Mol. Biol. 1994, 239, 249-275.
(18) Grossberg, A. Yu.; Khohlov, A. R. Statistical Physics of Macromolecules; Nauka: Moscow, 1989; p 344.
(19) Tanford, C. Protein denaturation. Adv. Prot. Chem. 1968, 23, 121-282.
(20) Tanford, C. Physical Chemistry of Macromolecules; Wiley: New York, 1961.
(21) Tcherkasskaya, O.; Uversky, V. N. Denatured collapse states in protein folding: Example of apomyoglobin. Proteins: Struct., Funct., Genet. 2001, 44, 244-254.
(22) Uversky, V. N. What does it mean to be natively unfolded? Eur. J. Biochem. 2002, 269, 2-12.
(23) Uversky, V. N. Natively unfolded proteins: A point where biology waits for physics. Protein Sci. 2002, 11, 739-756.
(24) Wilkins, D. K.; Grimshaw, S. B.; Receveur, V.; Dobson, C. M.; Jones, J. A.; Smith, L. J. Hydrodynamic radii of native and denatured proteins measured by pulse field gradient NMR techniques. Biochemistry 1999, 38, 16424-16431, and references therein.
(25) Tanford, C.; Kawahara, K.; Lapanje, S. Proteins in 6 M guanidine hydrochloride. Demonstration of random coil behavior. J. Am. Chem. Soc. 1967, 89 (9), 729-736.
(26) Skolnick, J.; Kolinski, A.; Ortiz, A. R. MONSSTER: A method for folding globular proteins with a small number of distance restraints. J. Mol. Biol. 1997, 265, 217-241.
(27) Kuszewski, J.; Gronenborn, A. M.; Clore, G. M. Improving the packing and accuracy of NMR structures with pseudo-potential for the radius of gyration. J. Am. Chem. Soc. 1999, 121 (1), 2337-2338.
