Biochemistry 1993, 32, 1407-1422


Hevein: NMR Assignment and Assessment of Solution-State Folding for the Agglutinin-Toxin Motif

Niels H. Andersen, Bolong Cao, Adela Rodriguez-Romero, and Barbarín Arreguín

Department of Chemistry, University of Washington, Seattle, Washington 98195, and Instituto de Química, Universidad Nacional Autónoma de México, Ciudad Universitaria, México D.F. 04510, Mexico

Received June 9, 1992; Revised Manuscript Received September 26, 1992

ABSTRACT: The first high-resolution solution-state structure of a member of the toxin-agglutinin folding motif with the WGA disulfide linkage is presented. The ¹H NMR spectrum of hevein has been 100% assigned from residue 2 through residue 43, the C-terminus, using two-dimensional correlation and NOE spectroscopy. During the course of the NOESY analysis, the three-dimensional structural features of hevein were derived, using nonstereospecific distance constraints (with tight bounds) for XPLOR simulated annealing followed by unconstrained relaxation in the CHARMm force field, at two levels of long-range constraint density. In addition, a large number of low-bound-only constraints, corresponding to unobserved NOEs, were used in both refinements. The first structure elucidation employed a total of 180 distance constraints (60 of which were medium or long range, i/i+n with n ≥ 2). The second refinement employed 244 (101 medium or long range) constraints: some conformation-insensitive intraresidue constraints were deleted, two misassigned long-range constraints were corrected, and 41 new i/i+n (n ≥ 2) constraints were added. The average bounds precisions of the two refinements were comparable (±0.44 Å) and significantly tighter than those that result when a universal low bound corresponding to the sum of the van der Waals radii is used. (The more conservative treatment of NOEs gave the same final structure but required a higher constraint density before assignment errors would stand out during the refinement.) Constraint density also has a significant influence on convergence and accuracy using tight constraints. The study demonstrates that convergence within an ensemble of solution structures is not a dependable criterion for either the accuracy or precision of the derived structure. The best fitting conformers from the refinement at the higher constraint density bear a greater similarity to the solid-state structure of the domains of wheat germ agglutinin (0.95 Å rmsd over residues 2-32) than to the recently reported 2.8-Å X-ray structure of hevein (1.25 Å rmsd over residues 2-32, 2.83 Å rmsd over residues 2-42). The consensus conformer from the solution data is defined to a backbone rmsd of [...]

[...] 2 Å), and the authors suggested that this may be "an example of structural resolution being limited by motional disorder". More recent studies (Metzler et al., 1992) have established that a different disulfide linkage pattern is present in that allergen, and its solution structure has been defined to a <0.7-Å backbone rmsd from residues 5-39. With the distinct linkage isomerism there is no reason for a structural analogy between hevein and Ra5G; however, given the degree of homology to WGA, one would certainly expect very extensive local structural correspondences and a similar overall fold (see Figure 1).

We now report the complete nonstereospecific assignment of the NMR spectra of hevein in 10% aqueous dioxane and two stages of conformation elucidation using NOESY-derived distance constraints (DCs). At both stages (which differ in constraint density), we have employed tighter bounds for the DCs than is typical of NMR structure elucidations of proteins, a strategy (Andersen et al., 1992) which is still exploratory.
¹ Abbreviations: CD and ΔCD, circular dichroism and difference CD spectrum; DCs, distance constraints; FR, SA structures that have been relaxed (without NOE distance constraints) into the nearest conformational minimum; HIV, human immunodeficiency virus; HaLN-06, HaLN-07, HaLW-R1, HaRN-32, etc., specific conformer models from the ensemble of solution structures of hevein; LBOs, low-bound-only distance constraints; MM and MD, molecular mechanics and dynamics; SA, dynamics simulated annealing procedure or structures generated from it; TFA, trifluoroacetic acid; UV, ultraviolet absorption spectrum; Amb. a. V and Amb. t. V (=Ra5G), ragweed allergens, fraction 5; d_ij and r_ij, experimental estimates of interproton distances and those observed in the actual molecule or a structural model, respectively; rmsd, root-mean-square deviation measured over the N, Cα, C', and O atoms of specified segments of the peptide backbone; rmswv, a specified weighted root-mean-square violation measure; αR and αL, amino acid residue conformations centered about φ = -70°/ψ = -40° and φ = +60°/ψ = +30°, respectively; [θ]λ, molar ellipticity at λ (nm) in units of deg·cm²/(residue·dmol). The common abbreviations of 2D NMR (NOESY, TOCSY, ROESY, COSY, DQF-COSY, TPPI, MLEV-17, NOE, t1, t2, τm, dNN, dαN, etc.) are employed without further comment. Specific NOE interactions and the corresponding derived distance constraints are identified by amino acid (residue number or one-letter symbol and residue number) and hydrogen position (HN (or N), α, β, etc.) for each proton involved; e.g., 28α/31HN represents a cross-peak or constraint for the α-proton of residue 28 and the backbone NH of residue 31. All other references to individual residues are by symbol and residue number (L16 corresponds to the leucine at position 16) or use standard format, Leu16.

The results in the present case were confirmed by parallel refinements using a conservative conversion of NOE intensities into looser bounds. The initial set of 180 distance constraints afforded a moderate-resolution picture of the aqueous folding geometry and (when tight distance constraints were employed) provided a surprisingly well-defined core structure which was distinguishable from that of WGA. At the stage of the final refinement (using 136 interresidue and 108 intraresidue distances with both high and low bounds and a large set of low-bound-only constraints), the consensus conformer (defined to a backbone rmsd of [...]

[...] 0.15 Å. The final set of 26 SA structures displayed an rmswv of 0.10 ± 0.02 Å. The SA structures were relaxed into the nearest conformational minimum by steepest descent CHARMm minimization (without NOE constraints). The rmswv measure

FIGURE 7 (caption fragment; both axes are residue number, ticks at 10, 20, 30, 40): [...] between HN(i) and HN(i+1) only, a horizontally hatched □ indicates an NOE between Hα(i) and HN(i+1) only, a cross-hatched □ indicates a sequential NOE between HN and side-chain protons, a stippled □ indicates an NOE between Hα(i) and [...], and □ indicates NOEs either between HN and side-chain protons or between side-chain protons. Connectivities appearing as an × are long-range NOEs which were found late in the analysis and were used only in the final conformer refinements (vide infra). The two enriched boxes correspond to the 19/33 connectivities that were used in the initial refinements and were later found to be suspect.

tertiary contacts, are evident in a diagonal plot of the NOE data (Figure 7). For an initial elucidation of the folding of hevein, we selected 180 conformationally significant NOE connectivities (58 intraresidue, 62 i/i+1, and 60 medium to long range) which were converted into tight distance constraints (as per Table I). XPLOR-2.1 (Polygen) (Brunger, 1990) was used to


Due to our use of DCs that are considerably tighter than those used in most XPLOR-based protein structure refinements, this violation measure needs to be put in perspective and compared to the commonly reported rms violation measure (from program XPLOR) which would be observed with both tight and loose DCs. For a typical first-generation structure (rmswv = 0.13 Å) generated by SA against tight bounds, XPLOR reports an rms violation of 0.08 Å (versus our tight constraints) or ≤0.04 Å when broader, "conservative", constraint ranges are used for the XPLOR statistics. Likewise, for a fully relaxed second-generation structure, the rmswv measure is 0.11 and 0.03 Å, respectively, against the tight and loose constraints; XPLOR reports "rms diff" values of 0.08 and 0.04 Å, respectively, against the same constraints.
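The dependence of a violation statistic on the width of the bounds can be illustrated with a minimal sketch. This is a plain, unweighted rms violation, not the weighted rmswv measure (which is defined by an equation not reproduced in this section), and the distances and bounds below are hypothetical values chosen only for illustration. The sign convention follows the text: a model distance shorter than the low bound counts as a negative violation.

```python
import math

def violation(r, low, high):
    """Signed violation in angstroms: negative if the model distance is below
    the low bound, positive if above the high bound, zero if within bounds."""
    if r < low:
        return r - low
    if r > high:
        return r - high
    return 0.0

def rms_violation(distances, bounds):
    """Unweighted root-mean-square violation over all constraints."""
    v = [violation(r, lo, hi) for r, (lo, hi) in zip(distances, bounds)]
    return math.sqrt(sum(x * x for x in v) / len(v))

# Hypothetical model distances and bounds (angstroms); not data from the paper.
model = [2.4, 3.1, 4.6, 3.9]
tight = [(2.2, 2.8), (2.6, 3.0), (3.5, 4.3), (3.4, 4.2)]
loose = [(1.8, 3.3), (1.8, 3.5), (1.8, 5.0), (1.8, 5.0)]

print(f"rms violation vs tight bounds: {rms_violation(model, tight):.3f}")
print(f"rms violation vs loose bounds: {rms_violation(model, loose):.3f}")
```

The same model scores a nonzero violation against the tight bounds and zero against the loose ones, which is why a violation figure is only meaningful together with the bounds it was measured against.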

Biochemistry, Vol. 32, No. 6, 1993

Andersen et al.

FIGURE 8: Views of structural features of the first-generation hevein structure ensemble. The disordered C-terminus appears in the lower left-hand corner of each panel. Panels A and B show the entire ensemble of 26 NMR structures obtained using tight constraints and reveal that the overall fold is defined (pairwise 3-41 backbone rmsd = 1.44 Å); in panel B, the better defined core (pairwise 15-34 backbone rmsd = 0.65 Å) is shown with the more "disordered" termini in gray tone. Panel C shows, in the same perspective employed for panel B, the consensus (17 of 30 conformers) from the ensemble obtained using a "conservative" conversion of the NOE intensities to broader constraints (pairwise 15-34 backbone rmsd = 0.80 Å).


Table IV: Changes in Constraint Violations during the Course of the First-Generation Structure Refinement

                                          penalty functions (180 constraints)
refinement stage              E_NOE(a)    rmswv(b)       frac >0.2 Å    Σ|viol|(b)
initial structures            >10^6
  all (av, SE)                            11.7 ± 10.5    0.41 ± 0.12    767
  helical only                            11.6           0.50           727
  homology                                1.14           0.29           87
simulated annealing
  second stage (worst 15%)    636         0.55           0.20           26.2
  second stage (best 15%)     458         0.20           0.12           11.6
final stage
  best                        21          0.06           0.01           2.65
  mean                        49 ± 27     0.10 ± 0.02    0.04           5.48 ± 1.10
  worst                       132         0.13           0.08           7.55
FR (no NOE constraints)
  best                        148         0.18           0.09           7.99
  mean                        425 ± 141   0.27 ± 0.04    0.14           14.18 ± 2.77
  worst                       776         0.37           0.16           17.15

                                          82 core nonintraresidue constraints(c)
refinement stage                          rmswv(b)       frac >0.2 Å    Σ|viol|(b)
SA structures
  best                                    0.04           0.00           0.78
  mean                                    0.09 ± 0.03    0.03           2.28 ± 0.77
FR structures
  best                                    0.13           0.07           3.01
  mean                                    0.27 ± 0.07    0.17           7.38 ± 1.92

(a) Units of kcal/mol. (b) Violation measures are given in angstroms; rmswv is defined by eq 1. (c) This includes all nonintraresidue constraints involving at least one proton that is within the core region, defined as residues 14-35.

increased to 0.27 ± 0.04 Å for the fully relaxed (FR) structures. The 310 added LBO constraints contributed a maximum sum of violation of only 0.19 Å for the FR structures; thus the summary of violations (Table IV) includes only the structurally significant constraints (in the lower portion of the table only 82 interresidue constraints involving at least one spin site from within the structure-conserved core region, residues 14-35, are included in the penalty functions). The CHARMm energies for the FR structures ranged from -1046 to -859 kcal/mol, equal to -24.3 to -20.0 kcal/(residue·mol), consistent with values obtained for unstrained structures using this force field (Andersen et al., 1992). The E_L-J term obtained, -4.2 kcal/(mol·residue), is consistent with the values observed in NMR structures of larger, rigid protein systems, for example, interleukin-1β, -3.7 kcal/(mol·residue) (Clore et al., 1991). While 20 of the 26 FR structures could be viewed as superior with regard to both constraint violation measures and CHARMm energies, we chose not to eliminate any potential conformers in our presentation of the initial conclusions concerning the solution-state fold of hevein.
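The per-residue CHARMm energies quoted above follow directly from the total energies and the 43-residue chain length. A short check, offered only as a worked illustration of that arithmetic (the function name is ours, not part of any program used in the study):

```python
N_RESIDUES = 43  # hevein chain length, as stated in the text

def per_residue(total_kcal_per_mol, n_residues=N_RESIDUES):
    """Convert a total energy in kcal/mol to kcal/(residue*mol)."""
    return total_kcal_per_mol / n_residues

# Range of total CHARMm energies reported for the FR structures.
for total in (-1046.0, -859.0):
    print(f"{total:8.1f} kcal/mol -> {per_residue(total):6.1f} kcal/(residue*mol)")
```

Rounded to one decimal place, the two ends of the range reproduce the -24.3 and -20.0 kcal/(residue·mol) figures given in the text.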

Features in the First-Generation Structures of Hevein. Views of the consensus features in our ensemble of FR structures appear in Figure 8. Panel A reveals that all structures represent the same overall folding pattern with a backbone atom rmsd (excluding the extreme residue at each terminus) of 1.44 Å. Panel B shows that the degree of structural convergence is greater in a core region from residues 15 through 34 (backbone rmsd = 0.65 Å). By a stricter definition of convergence the core is actually limited to residues 16-32. A short helical segment (residues 28-32) is well-defined in essentially all of the structures. As will be shown, this depiction obscures additional elements of conserved structure within the noncore segments. The structural convergence obtained at this stage is not an artifact of the use of tight bounds which might not compensate for the effects of segmental motion. Panel C shows that the same fold is obtained when all LBOs are eliminated and the low bounds are set at the values used when NOE intensities are converted to distances in the conservative fashion.
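The contrast between whole-chain and core convergence can be made concrete with a minimal sketch of a mean pairwise backbone rmsd restricted to a residue range. It assumes the structures have already been superimposed (the least-squares fitting step is omitted), and the three toy "structures" below, with one atom per residue, are hypothetical and stand in for real coordinate sets.

```python
import math
from itertools import combinations

def rmsd(a, b, residues):
    """RMS deviation over the backbone atoms of the given residues.
    a, b: dicts mapping residue number -> list of (x, y, z) backbone atoms.
    Assumes the two structures are already superimposed."""
    sq, n = 0.0, 0
    for res in residues:
        for p, q in zip(a[res], b[res]):
            sq += sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            n += 1
    return math.sqrt(sq / n)

def mean_pairwise_rmsd(ensemble, residues):
    """Average rmsd over every distinct pair of structures in the ensemble."""
    pairs = list(combinations(ensemble, 2))
    return sum(rmsd(a, b, residues) for a, b in pairs) / len(pairs)

# Hypothetical three-residue toy ensemble: residue 2 is a fixed "core",
# residues 1 and 3 are "termini" that differ between structures.
s1 = {1: [(0.0, 0.0, 0.0)], 2: [(1.0, 0.0, 0.0)], 3: [(2.0, 0.0, 0.0)]}
s2 = {1: [(0.0, 2.0, 0.0)], 2: [(1.0, 0.0, 0.0)], 3: [(2.0, 0.0, 0.0)]}
s3 = {1: [(0.0, 0.0, 0.0)], 2: [(1.0, 0.0, 0.0)], 3: [(2.0, 2.0, 0.0)]}
ensemble = [s1, s2, s3]

print("all residues:", round(mean_pairwise_rmsd(ensemble, [1, 2, 3]), 3))
print("core only:   ", round(mean_pairwise_rmsd(ensemble, [2]), 3))
```

As in the hevein ensemble, the figure obtained depends strongly on which residues are included, so an rmsd value should always be read together with its residue range.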


One of the most unusual features of the WGA domains is the occurrence of two adjacent non-glycine residues with an αL conformation: φ14 = +57 ± 7°, ψ14 = +31 ± 10°, φ15 = +64 ± 7°, ψ15 = +22 ± 12° (for eight determinations, two protomers of each of four domains) (Wright, 1987). These correspond to an Asn-Asn pair in all WGA domains and in hevein. In our hevein structures, the Asn-Asn unit is found predominantly in the form that is analogous to the WGA structures; the values for the 11 best fitting structures (rmsd = 0.36 Å from 13-16) are φ14 = +62 ± 10°, ψ14 = +10 ± 16°, φ15 = +66 ± 13°, ψ15 = 18 ± 20°. However, another conformational cluster (seven structures with an rmsd of 0.43 Å over the same range of residues) was also obtained with an αR conformation: φ14 = -44 ± 13°, ψ14 = -23 ± 17°, φ15 = -54 ± 19°, ψ15 = -59 ± 10°. This may be a locus of conformational heterogeneity for the solution state of hevein.
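The assignment of a φ/ψ pair to the αR or αL region can be sketched as a nearest-center test. The two centers are those given in the abbreviations footnote (φ = -70°/ψ = -40° and φ = +60°/ψ = +30°); the 45° cutoff and the function names are assumptions made for this illustration, not criteria taken from the study.

```python
import math

# Region centers (phi, psi) in degrees, from the abbreviations footnote.
CENTERS = {"alpha_R": (-70.0, -40.0), "alpha_L": (60.0, 30.0)}

def ang_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def classify(phi, psi, cutoff=45.0):
    """Return the nearest region within `cutoff` degrees (an assumed
    threshold), or 'other' if neither center is close enough."""
    best, best_d = "other", cutoff
    for name, (phi0, psi0) in CENTERS.items():
        d = math.hypot(ang_diff(phi, phi0), ang_diff(psi, psi0))
        if d < best_d:
            best, best_d = name, d
    return best

# Mean values quoted in the text for residue 14.
print(classify(62.0, 10.0))    # major hevein cluster
print(classify(-44.0, -23.0))  # minor hevein cluster
print(classify(57.0, 31.0))    # WGA domains
```

With these assumed settings the major cluster and the WGA values fall in the αL region and the minor cluster in the αR region, in line with the description above.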


The first-generation ensemble of NMR structures generated for hevein was obtained without using any hydrogen-bonding information. The CHARMm minimization, used to relax the structures, also excluded the E_H-bond term. Thus a comparison of the structures and the NH exchange data should provide a test of the NOE-derived structures. As shown in Figure 6, at least nine of the backbone NHs (residues 16-19, 22, 23, 25, and 30-31) display extremely slow exchange (t1/2 > 12 h at pH 3). An additional 11 residues display t1/2 > 0.8 h, with the values for residues 3, 4, 24, and 32 approaching those observed for persistently H-bonded structures. Of the 13 NHs thus implicated as highly sequestered, nine (3, 4, 18, 19, 22, 25, 30-32) are listed in Wright's (1987) tabulation of homologous main-chain hydrogen bonds in the four WGA domains. The backbone NHs in residues 16, 18, 23, 24, 27, and 39 were water-inaccessible, as judged by a Connolly surface generated with a 1.4-Å probe (using INSIGHT).

Further Refinement, Second-Generation Structures. The solution-state structure of hevein had been defined to surprisingly high precision in the first-generation structures using only a semiquantitative set of nonstereospecific NOE distance constraints. The solution that emerged presented a well-defined fold, of which the dominant feature is a buckled antiparallel sheet, characterized by a close 18α/24α contact surrounded by the expected persistent H-bonds (17→25/25→17 and 19→23). Buckling at one end of the sheet could be seen by the appearance of a turn-associated 22→19 H-bond, which is also present in the WGA domain structures. The other end of the β-sheet structure was indicated by the much reduced intensity of the 16α/26α NOE, which was barely detectable in the NOESY spectra. This through-space interaction was, however, confirmed in a ROESY spectrum and displayed significantly greater intensity in the rotating frame experiment; its reduced intensity in the NOESY may be attributable to motional disorder, since ROESY cross-peak intensities are less sensitive to changes in the effective correlation time. The only other section of regular secondary structure was a short α-helix from Asp28 to Ser32. By the usual criteria, this represented a more than acceptable degree of both consistency with experimental data and convergence for an NMR structure at this level of refinement. As it turns out, the structural conclusions are flawed even though they display excellent convergence in the core region. A further round of refinement was undertaken on the basis of three developments: (1) further analysis of the NOESY data yielded in excess of 40 new long-range constraints, some


of which were significantly violated by our FR structures;⁵ (2) a moderate-resolution X-ray structure of hevein became available for comparison (Rodriguez-Romero et al., 1991),⁶ but the X-ray structure did not agree with the conclusions reached by the NMR study; and (3) a set of potentially key constraints between residues 19 and 33 were shown to be either invalid or suspect.⁷ Our aims in this additional refinement stage were to ascertain the effect of the now suspect constraints on the refinement process, to incorporate the new constraints, to establish whether the solution- and solid-state structures of hevein are distinguishably different from each other and from domain C of WGA, and to define individual conformers (Andersen et al., 1992; Lai et al., 1993) rather than an "average structure" with regions of motional disorder.

The best 20 structures in our FR ensemble represent only five distinct conformers based on the criteria outlined under Experimental and Computational Methods. The better fitting representatives of each (a total of nine structures) were refined through three cycles of SA against the new constraint table including, as regularizing distance constraints (only during the first two cycles of SA), the previously recognized H-bonds that characterize each distinct conformer cluster. The X-ray structure² was also used as a starting structure for XPLOR SA refinement based on the NOE constraints. A total of 43 structures were generated. The additional long-range constraints that became available late in the assignment process eliminated a number of conformers from consideration, and ten of the structures were eliminated by their high CHARMm energies. The remaining conformer structures are collected in Figure 9 together with depictions of the core structure from the previous round of refinement and regions from the X-ray structures of WGA and hevein. (Similar results were obtained using expanded low bounds for this constraint set.)

The fit measure shown for each structure, or ensemble, is the average violation of the tight constraints after relaxation. The rmswv values ranged from 0.10 to 0.15 Å and had increased by less than 10% during the unconstrained relaxation. The fit measures were taken over 206 constraints [including, within the core, 37 sequential (i/i+1) and 82 i/i+n (n ≥ 2)] and 44 "key" LBOs. Conformer models that were rejected displayed average violations greater than 0.06 Å and rmswv values greater than 0.18 Å.⁴

Panel A of Figure 9 shows the most extreme conformers that were judged to be consistent with the NMR data. The largest ensemble of structures (16/23, E_T = -1366 ± 47 kcal/mol) is represented in panel A by five structures shown in green and is designated as the "hevein αL-NMR conformer" (HaLN). The next most frequently obtained conformer, HaRN (3/23, E_T = -1329 ± 58), which is shown in yellow, is the 15-αR conformer. Given that the HaLN species typically violate less than 10/250 constraints by in excess of 0.3 Å, and those violations all involve wild-carded δ/ε-aryl protons or

⁵ A total of 60 new constraints were added as they became available from the continuing NOESY analysis. These replaced 30 conformation-insensitive intraresidue constraints in the original set. As this increase in constraint density occurred, structural convergence decreased in our ensemble of structures and a number of large violations (>0.8 Å) began to appear.

⁶ The coordinates and conclusions reached in the parallel X-ray and NMR investigations were intentionally not shared in order to maintain each as an independent structure determination.

⁷ The basis for this confusion in the assignment is the subject of a supplementary material figure.


FIGURE 9: Second-generation conformer models for hevein in the solution state: comparisons with WGA-like structures, the FR structures from the first-stage refinement (from Figure 8) which included two unsubstantiated long-range distance constraints and had a lesser density of long-range connectivities, and the reported (Rodriguez-Romero et al., 1991) low-resolution solid-state structure. In panels A, B, and D the structures are overlaid to a least-squares fit over the backbone atoms of residues 16-32. The average violations shown (panels A and B) for each structure, or ensemble, are taken over 250 constraints, which included 37 sequential (i/i+1) constraints within the core region and 82 i/i+n (n ≥ 2) constraints involving the core region. In the case of panel B, two of the structures had been refined with an incomplete set of constraints lacking 51 (or 48) of the constraints that were located in the later NOESY analyses; these constraints were, however, used in evaluating the average violation measure. Panel A displays residues 2-42 of four second-generation conformers for the solution state of hevein that survived our fit and energy criteria; the right-hand view is a 38° rotation of the left-hand view about the vertical axis. The major conformer (HaLN, green structures) is represented by five structures (backbone rmsd = 0.42 ± 0.02 Å; the values for the entire ensemble of this conformer were 0.58 ± 0.13 Å). The backbone rmsd comparisons between the other conformers and HaLN for residues 2-42 (and 16-32) were as follows: for HaRN shown in yellow, 1.18 ± 0.13 (0.30 ± 0.03); for the pink conformer, 0.77 ± 0.13 (0.30 ± 0.08); for the cyan conformer, 0.98 ± 0.08 (0.27 ± 0.05). Panel B compares the major conformer from panel A (in green), that in Figure 8 (blue), and those (red) obtained with a corrected abbreviated constraint set (see text). Residues 17-35 are displayed. The least-squares fitting was over all backbone atoms of residues 16-32; residue 16 and the carbonyl oxygens were deleted for clarity. Panel C shows the helical domain in the first- and second-generation ensembles. The disconnected segment is Ser19. In the first-generation set 19-Hα, 33-Hδ1, and 33-Hδ2 are shown as blue balls. In the second-generation set the 19-, 30-, and 37-Hα's appear as green balls. Panel D shows residues 2-42 for the X-ray structure of hevein (cyan), which displayed a large average violation (1.12 Å) of the NOE-derived distance constraints, the HaLN conformer derived by NMR (green), and the best-fitting (⟨|viol|⟩ = 0.08 Å) WGA-like structure (red) that could be generated. For clarity, only a single representative of the WGA-like set is included; the convergence within the group is excellent (2-32 backbone rmsd = 0.19 ± 0.04 Å).


methyl groups,⁸ it becomes necessary to examine whether the minor conformers are required to rationalize the NOE data: is there any basis for viewing them as populated conformers rather than aberrant solutions? A careful analysis of the differences in violations between HaLN and HaRN reveals a self-consistent set of ten long-range NOE distances that appear as negative violations (model distances shorter than

⁸ We employ ⟨r⁻⁶⟩ averaging to calculate model r_ij values, for comparison to the constraints, and in the XPLOR SA refinements for all categories of wild-carded proton sets. There is no evidence of hindered rotation for the ring of Tyr30.
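A minimal sketch of ⟨r⁻⁶⟩ averaging for a wild-carded proton set follows. It assumes the effective distance is the inverse sixth root of the mean of r⁻⁶ over the equivalent protons (some programs sum rather than average, which gives a shorter value); the three distances are hypothetical.

```python
def r6_average(distances):
    """Effective distance (angstroms) from <r^-6> averaging:
    r_eff = (mean of r_i^-6) ** (-1/6). Normalizing by the number of
    protons is an assumption of this sketch."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)

# Hypothetical distances from one proton to the three protons of a methyl group.
methyl = [2.5, 3.0, 4.0]
r_eff = r6_average(methyl)
arithmetic = sum(methyl) / len(methyl)

print(f"<r^-6> average:  {r_eff:.2f} A")
print(f"arithmetic mean: {arithmetic:.2f} A")
```

Because of the steep r⁻⁶ weighting, the effective distance is dominated by the closest proton and is always at or below the arithmetic mean, which is the behavior expected of an NOE-derived distance.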


the experimental constraint) in the αR conformer but as positive violations in HaLN and a set of four related NOE distances with the opposite trend. This is precisely what would be expected (Bruschweiler et al., 1991) if the αR conformer is a minor conformer in the solution-state equilibrium occurring under the experimental conditions examined. The relative reduced intensity for the 16α/26α cross-peak in NOESY spectra (versus the ROESY) may also reflect conformational changes at these loci. The two remaining conformer structures show the largest differences that were observed in panel A at other loci (Gly9, Pro13, and His35); their inclusion in the


ensemble, however, does not provide a significant improvement in fit to any long-range NOE constraints. There is, to date, no experimental basis for viewing them as populated conformers of hevein in the solution state.

Panel B provides a basis for discussing the differences between the structures at this and the previous stage of refinement; the region from Cys17 to His35 as it appears in the final refinement (green) and the preceding one (blue) is illustrated. Excellent convergence was observed in each case; pairwise backbone (16-32) rmsd values for the ten best structures were 0.19 ± 0.03 Å for the final refinement and 0.40 ± 0.04 Å for the penultimate series. The structures are, however, quite distinct, displaying an average pairwise backbone (16-32) rmsd of 1.08 ± 0.06 Å between structures from the first- and second-generation ensembles. The difference can be shown to be largely due to the two "incorrect" constraints in the earlier refinement. The changes in constraints were 19α/33δ* → 19α/30α and 19α/37α, and 19β*/33δ* → 19β*/30α. We carried out nine SA refinements in which only these three changes were made in the constraints; none of the additional 65 constraints found during subsequent NOESY data analysis were added. The resulting structures appear in red in panel B of Figure 9 and match the final set quite well. It is thus apparent that the extent of structural error that can be introduced by a few questionable long-range constraints is large and that structural convergence is not always a dependable criterion for the correctness of an ensemble of NOE-derived structures. The added long-range constraints did, however, serve to improve the convergence of the final ensemble of structures.

Panel C [which shows the residue 28-35 span and Ser19, the latter being included to illustrate the specific locations of the altered constraints, 19α/33δ* (blue balls) and 19α/30α and 19α/37α (green balls)] illustrates the helical domain of hevein.
This span is now well-determined, displaying backbone torsion rmsds of less than ±14° throughout. With the exception of ψ32 (= +140 ± 7°), the entire span is αR configured: ⟨φ28-35⟩ = -74 ± 19°, ⟨ψ28-35⟩ = -24 ± 21°. This helical region, limited to Asp28 to Ser32 in the first-generation structures, can thus be described as an eight-residue helix with a single kink in it.⁹ At several points the backbone torsion angles suggest an internally hydrated α-helix, φ = -95° and ψ = +10° (Karle & Balaram, 1990).

Turning to comparisons (panel D of Figure 9) of the NMR structure (HaLN) to WGA and the existing X-ray structure, the criterion used to trim the SA conformers would have eliminated both as models for the solution-state structure of hevein. In order to test the possibility of hevein having a structure similar to WGA, we incorporated additional constraints so as to maintain a WGA-like fold while retaining the full set of NOE constraints. The resulting structures were indeed WGA-like: backbone (2-32) rmsd, 0.46 ± 0.02 Å versus domain C of WGA and 0.96 ± 0.05 Å versus HaLN; the rmsd between HaLN and WGA-C is 0.95 ± 0.04 Å. As a group these WGA-like structures displayed a mean violation of 0.09 ± 0.01 Å from the NOE-derived distances. The best fitting of these conformers is shown (red structure) in panel D. In excess of 60 XPLOR SA runs with torsion constraints that maintain a ≤0.50-Å backbone (2-32) rmsd with WGA were carried out; in every case the rmswv values for WGA-like structures exceeded 0.27 (versus 0.11-0.13 for all members of the HaLN ensemble). The solution-state structure of hevein is thus readily distinguishable from the solid-state structure of WGA over any homologous region: the core only or residues 2-32, the full span over which a similar fold would be predicted. However, the solution-state structures derived in the present work resemble the solid-state structure of WGA more than that recently reported for hevein itself.

The crystal structure of hevein displays, over the Leu16-Ser32 core region, a backbone rmsd of 1.25 ± 0.05 Å versus HaLN. This 2.8-Å solid-state structure does not have a well-formed antiparallel sheet in the core region, displaying a Σ(viol)² of 31.7 Å² over 19 cross-strand NOE distances (versus 0.32 ± 0.20 Å² for HaLN). Violations of NOE constraints include ten instances of X-ray model distances of >5.0 Å for which indisputable backbone-H/backbone-H interstrand NOEs are observed. Large deviations are also observed for a number of loop/loop and loop/core interactions. Over the 2-42 backbone the X-ray structure has a 2.83 ± 0.10 Å rmsd with the HaLN ensemble.

The solution structure also provides better explanations for the unusual chemical shifts observed in hevein. As previously noted, the side-chain shifts of Q20 are remarkable, and the methylene protons of Gly25 are extremely shift divergent, with one of them appearing at an unusual upfield position, 2.08 ppm. Figure 10 provides a rationale for these observations and demonstrates that the X-ray structure does not rationalize the shifts as well. A rationale for a key inter-side-chain NOE in this region is also absent in the solid-state structure. The differences between the solution- and solid-state structure models for hevein will be explored in collaboration with Dr. Soriano.

⁹ All attempts (using constrained dynamics followed by relaxation in the CHARMm force field) to create models lacking this kink produced structures that had not only significantly increased violations but also higher CHARMm energies (by >20 kcal/mol) in the absence of NOE constraints.


CONCLUSIONS

The present study has provided another demonstration of the importance of obtaining the highest possible density of long-range constraints in attempts to define protein tertiary structure by NMR and a vivid illustration of the need for absolute certainty in assignments of any long-range constraints employed. In the case of hevein, two incorrect assignments (out of a total of 60 long-range constraints for a 43-residue system) did not prevent convergence; rather, they improved the convergence to an incorrect structure. As the additional 41 long-range constraints, extracted from further NOESY analysis, were added, an inconsistency became apparent: the ensemble of structures from SA refinements began to show larger NOE constraint violations and a lessened degree of convergence as measured by the backbone rmsd over the core region of the structure. Structural convergence within a set of NMR structures cannot be taken as an indication of the accuracy of the structure refinement nor as validation of the procedure used. When two long-range NOE attributions were corrected, only two conformers were required to rationalize all of the NOEs observed to within experimental error (≤ ±0.25 Å at d_ij ≤ 3.0 Å, ±0.5 Å at d_ij = 3.8 Å). Neither of these solution conformers corresponds to either the recently reported X-ray structure or the WGA domain structure, but they do resemble the latter. Thus the present study is one of the rare instances in which NMR has provided a notably different structure for a small protein than that which was predicted on the basis of homology modeling or derived by X-ray crystallography. It should also be noted that the degree of structural convergence observed within the entire ensemble of solution structures (<1.2 Å pairwise backbone rmsd from residues 2-42, <0.4 Å over the residue 16-32 core region) is better


FIGURE 10: To the left, views of the shielding of the Gly25 Hα1 by the aryl ring diamagnetic effect of Tyr30 and the differential shielding of the Gln20 side-chain protons by the indole ring of Trp21 as seen in the major conformer of the second-generation NMR structure. The unusual chemical shifts placed by the protons on the structural depictions are those observed at pH 6.6. The location of 27HN (which appears far upfield, at 7.05 ppm) is indicated by an asterisk. The upfield 20γ proton displays a moderately weak NOE to 21δ1 and ε1, which correlates with the 3.4 ± 0.1 Å 20γ1/21δ1 distance in the model. To the right, the corresponding segments from the solid-state structure are shown, in one case rotated to yield a better view of the Tyr side-chain orientation. 25α1 is not predicted to be as highly shielded and 27HN would not be shielded at all in this model. Both the Gln20 and Trp21 side chains appear in different conformations which explain neither the shifts nor the inter-side-chain NOE (20γ*/21δ1 distance = 7.4 Å).

than that expected (Clore & Gronenborn, 1991, 1992). Within the core region the constraint density (counting all interresidue constraints twice) varies from 8 to 23 per residue; outside of the core the count varies from 2 to 13 per residue. At a total of 244 constraints for 43 residues, in the absence of torsion constraints and pro-R/S assignments, this represents a first-stage NMR structure based on the criteria of Clore and Gronenborn (1991), and the expected structural accuracy is ≥1.5 Å backbone rmsd. The higher accuracy obtained in this study is the result of tighter distance bounds, in particular, abandoning the universal 2-Å low bound. The risks and rewards associated with this strategy will be fully documented in a subsequent paper.

CD spectroscopy provides some insights into one of the distinctions between the present NMR study, the X-ray structure, and the WGA analogy. The far-UV CD traces for hevein throughout the pH range from 2.4 to 8.0 are shown in Figure 11 and bear a considerable resemblance to those of WGA. The previous CD comparisons of hevein and WGA (Rodriguez-Romero et al., 1989), which did not penetrate to 180 nm, establish that the CD spectral features are absent upon cleavage of the disulfide linkages. The difference CD, between the most acidic conditions and the others, corresponds to the expectation spectrum for a very short helix (Yang et al., 1986; Harris et al., 1992). If the five-residue span (28-32) with i/i+3 connectivities is the sole contributor to this ΔCD feature, the decrease in θ222 at pH 2.4 corresponds to a 41% increase¹⁰ in helix population at the most acidic conditions examined, based on the formula for θ222 given by Scholtz et al. (1991).

¹⁰ Given the φ/ψ values observed in the full ensemble of NMR structures (vide supra), this estimate of population change may be high; the αR conformations at 33-35 and at 1 6 1 5 (in a minor conformer), as well as type I β turns (Perczel & Fasman, 1992), may also contribute CD signals resembling a short helix.

The NMR data also provide evidence for a pH-induced conformational change in or near the helical segment. When the amide NH chemical shifts at pH 2.4 and 6.6 are compared, we note an insignificant downfield shift (+0.03 ± 0.04 ppm) upon acidification for all residues outside of the 28-35 region. Within this region much larger shifts are observed: Glu29, -0.46 ppm; Tyr30, -0.14 ppm; Ser32, -0.39 ppm; Asp34, +0.10 ppm; +0.21 ppm. At pH 6.6, the Asp28 HN is absent (presumably due to rapid cross-saturation), and two distinct NH chemical shift gradients are observed over this span: δ (residue number) 8.75 (29), 7.89 (30), 7.59 (31), and 8.51 (32); Pro, 8.28 (34) and 7.60 (35). At pH 2.4 the Asp28 HN appears at an extremely downfield position, and a single NH chemical shift trend, consistent with a helix dipole effect (Wishart et al., 1991), is observed from 28 to 31: 9.43 (28), 8.29 (29), 7.74 (30), 7.63 (31); the value of 8.11 ppm at residue 32 breaks the trend. The CD data are consistent with the significant population of a short helix at all pH conditions examined. At pH < 4 the helix population or extent increases. While i/i+3 connectivities over residues 28-32 are observed at both pH 2.4 and pH 6.6, a detailed quantitative analysis gives better agreement for the pH 2.4 data. With regard to the absence of this feature in the solid-state structure, it should be noted that the crystals used for the X-ray structure were grown at pH 8.06 (Rodriguez-Romero et al., 1991).

The solution structure of hevein derived in the present study is remarkable in several respects. Unlike most small globular proteins which have been examined, or the scorpion toxins [as exemplified by charybdotoxin (Bontems et al., 1991)], hydrophobic cluster formation does not appear to be the major fold determinant. The only residues that are judged to be entirely solvent-inaccessible are Cys3,18 and the residues
adjacent to this disulfide linkage (Gly4 and Ser19) and Gly22. None of the hydrophobic side chains (Leu, Pro, Trp, and Tyr) are fully internalized. One of the methyls of each leucine is partially shielded, the other being fully exposed. Both Trp and Pro residues are solvent exposed. All of the polar side chains (Asp, Glu, Lys, and Arg) are fully exposed. Even though there is no covalent connection between the core and the C-terminus, both WGA and hevein (in the solution and solid state) display a rather similar disposition of this flexible loop relative to the core. This is remarkable given that the connecting region between the core and the conformationally independent C-terminal disulfide loop is the point of greatest sequence divergence in the structures; n in the following comparison is 3 for hevein and 1 for WGA.

S-S
X-CG-X,-G(C/N)-X,-CPNN
(L/H/Y)CCS-X2-GXC-Xd-(Y/F)C(G/S)-X,-(G/N)CQ-

The sequence comparison reveals Cys residues flanked by Gly or Asn units which impart unusual flexibility and allow for local αL conformations: a survey of all conformer structures which we generated reveals that at least five of these seven residues are in this rarer local conformation; the conformer with all Asn residues in the αL conformation¹¹ is consistent with the NOEs observed. We propose that these are required, in what becomes the core, in order to predispose the system to a fold that is subsequently fixed by the closure of the disulfides. The flexible segment between Cys31 and the small disulfide loop in the C-terminus would, presumably, allow for a variety of spatial dispositions of the C-loop relative to the core. An examination of the WGA structure and HaLN reveals the following common features: the 37-Cβ-S-S-Cβ-41 (hevein numbering) unit is fully internalized, shielded from the solvent surface by the 38-40 backbone, Cys31, and the 17-19 backbone. Two H-bonds stitch the loop to the core (NH(leyO(39) and NH(38)···O(18)), the latter displaying excellent geometry and a 2.87-Å N···O distance. In both WGA and hevein, the conserved Gln(36/38) is in close contact with both the core and the same portion of the N-terminal loop, N-H(5)···OE1. In WGA the trans-amide NH is bonded to OC(19). In the NMR structures of hevein, the H-bond acceptor is either the 19 or 20 carbonyl. Structures refined to optimize these interactions while retaining a fit to the NOE constraints indicate that the HE22(38)···OC(19) bond, as also seen in WGA, is the better model with regard to both constraint violations and CHARMm energy. Gln38 displays very unusual chemical shifts. For all of the other side-chain CONH2 groups the NHs cis and trans to the carbonyl oxygen are found at δ 6.83 ± 0.13 and 7.40 ± 0.25, respectively; for Gln38 they appear at 8.30 and 8.20 ppm. We propose that the interactions described provide the driving force for the common fold preference for the otherwise unconstrained C-terminus.

¹¹ A quick inspection of a short mixing time NOESY provides a nearly foolproof initial assignment of αL sites. They are diagnosed by an intraresidue dNα peak that is as large as the larger sequential dαN peaks in the spectrum and the appearance of at least one of the corresponding d"iti cross-peaks.

The NOE-derived structures obtained in this study should prove useful for defining the molecular basis of the agglutinizing reactions recently found for hevein (Rodriguez-Romero et al., 1991). These activities, and possible allergic responses, are almost certainly associated with the exposed loops, the structures of which could not be defined if the broader bounds typical of protein structure elucidation (for example, 1.9-2.7, 2.0-3.5, 2.4-4.5, and 2.7-5.5 Å) were employed (Andersen and Cao, studies to be published separately).

The agglutinin-toxin folding motif represents an excellent system for studies of the significance of local secondary structure preferences as folding determinants (Dyson et al., 1988a,b, 1990; Wright et al., 1988). As stereospecific assignments become available and both sequential and long-range constraints are defined more tightly by the use of the DISCON algorithm (Andersen et al., 1992; Lai et al., 1993), the persistently structured regions and the nature of any remaining conformational averaging should be defined to greater precision. The resulting dynamic structure hypothesis, and the resonance assignments from the present study, will serve as the basis for studies of the refolding of hevein. Even at the present stage of refinement, the derived solution structure for hevein differs significantly from that seen in a moderate-resolution solid-state structure (Rodriguez-Romero et al., 1991). Further study, including a higher resolution X-ray data set (Soriano & Rodriguez, research in progress), will be required in order to ascertain which of these differences represent actual changes in conformational preference and fold in the states examined.

FIGURE 11: CD spectra (residue molar ellipticity versus wavelength) of hevein at pH 2.4, 4.5, 6.6, and 8.0 (at 50 μM protein and 10 mM phosphate throughout). The pH 2.4 and 6.6 traces are solid lines; those at pH 4.5 and 8.0 are broken lines and are labeled as to the pH. The heavy dashed-line trace corresponds to the ΔCD (pH = 2.4 - pH = 6.6), with the residue molar ellipticity scale based on the assumption that the entire change occurs within an eight-residue span.

ACKNOWLEDGMENT

Chin-pan Chen provided the XPLOR procedures which were modified for use on this problem. We thank Scott M. Harris for help in recording CD spectra. Professor M. Soriano-Garcia (UNAM, México) provided coordinates from the 2.8-Å X-ray data for hevein.

SUPPLEMENTARY MATERIAL AVAILABLE

Six figures showing d” connectivities at pH 6.6, aryl-CH connectivities in D2O at pH 6.6, representative annotated HN/Hα and Hβ segments of the NOESY at 310 K, illustrations of the assignment of both Pro residues and protons coincident with the Pro33 δ signals and the appearance of diagnostic intraresidue peaks for αL residues, and a comparison of NOESY spectra recorded at three temperatures and an annotated listing of the complete set of distance constraints used in the final SA runs (12 pages). Ordering information is given on any current masthead page.

REFERENCES

Andersen, N. H., Eaton, H. L., & Nguyen, K. T. (1987) Magn. Reson. Chem. 25, 1025-1034.
Andersen, N. H., Eaton, H. L., & Lai, X. (1989) Magn. Reson. Chem. 27, 515-528.
Andersen, N. H., Lai, X., & Marschner, T. (1991) NOESYSIM/DISCON Documentation, University of Washington, Seattle, WA.
Andersen, N. H., Chen, C., Marschner, T. M., Krystek, S. R., Jr., & Bassolino, D. A. (1992) Biochemistry 31, 1280-1295.
Archer, B. L. (1960) Biochem. J. 75, 236-240.
Bax, A., & Davis, D. G. (1985) J. Magn. Reson. 65, 355-360.
Bax, A., & Drobny, G. (1985) J. Magn. Reson. 61, 306-320.
Bean, J. W., Kopple, K. D., & Peishoff, C. E. (1992) J. Am. Chem. Soc. 114, 3328-3334.
Bodenhausen, G., Kogler, H., & Ernst, R. R. (1984) J. Magn. Reson. 58, 370-388.
Bontems, F., Roumestand, C., Gilquin, B., Menez, A., & Toma, F. (1991) Science 254, 1521-1523.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S., & Karplus, M. (1983) J. Comput. Chem. 4, 187-217.
Brünger, A. T. (1990) XPLOR Version 2.1 Manual, Yale University, New Haven, CT.
Brüschweiler, R., Blackledge, M., & Ernst, R. R. (1991) J. Biomol. NMR 1, 3-11.
Clore, G. M., & Gronenborn, A. M. (1991) Annu. Rev. Biophys. Biophys. Chem. 20, 29-63.
Clore, G. M., & Gronenborn, A. M. (1992) 33rd Experimental NMR Conference (March 31), Book of Abstracts, pp 44, Pacific Grove, CA.
Clore, G. M., Wingfield, P. T., & Gronenborn, A. M. (1991) Biochemistry 30, 2315-2323.
Drenth, J., Low, B. W., Richardson, J. S., & Wright, C. S. (1980) J. Biol. Chem. 255, 2652-2655.
Driscoll, P. C., Clore, G. M., Beress, L., & Gronenborn, A. M. (1989) Biochemistry 28, 2178-2187.
Drobny, G., Pines, A., Sinton, S., Weitekamp, D. P., & Wemmer, D. (1979) Faraday Symp. Chem. Soc. No. 13, 49-55.
Dyson, H. J., Rance, R., Houghten, R. A., Wright, P. E., & Lerner, P. A. (1988a) J. Mol. Biol. 201, 161-200.
Dyson, H. J., Rance, R., Houghten, R. A., Wright, P. E., & Lerner, P. A. (1988b) J. Mol. Biol. 201, 201-217.
Dyson, H. J., Satterthwait, A. C., Lerner, R. A., & Wright, P. E. (1990) Biochemistry 29, 7828-7837.
Griesinger, C., Otting, G., Wüthrich, K., & Ernst, R. R. (1988) J. Am. Chem. Soc. 110, 7870-7872.
Harris, S. M., Cao, B., Lee, Y. G., & Andersen, N. H. (1992) Biopolymers (submitted for publication).
Johnson, W. C., Jr. (1990) Proteins: Struct., Funct., Genet. 7, 205-214.
Karle, I. L., & Balaram, P. (1990) Biochemistry 29, 6747-6756.
Kessler, H., Griesinger, C., Kerssebaum, R., Wagner, K., & Ernst, R. R. (1987) J. Am. Chem. Soc. 109, 607-609.
Kraulis, P. J., Clore, G. M., Nilges, M., Jones, T. A., Pettersson, G., Knowles, J., & Gronenborn, A. M. (1989) Biochemistry 28, 7241-7257.
Lai, X., Chen, C., & Andersen, N. H. (1993) J. Magn. Reson. (in press).
Low, B. W., Preston, H. S., Sato, A., Rosen, L. S., Searl, J. E., Rudko, A. D., & Richardson, J. S. (1976) Proc. Natl. Acad. Sci. U.S.A. 73, 2991-2994.
Marion, D., & Wüthrich, K. (1983) Biochem. Biophys. Res. Commun. 113, 967-974.
Metzler, W. J., Valentine, K., Roebber, M., Friedrichs, M. S., & Mueller, L. (1992) Biochemistry 31, 5117-5127.
Montelione, G. T., Wüthrich, K., Burgess, A. W., Nice, E. C., Wagner, G., Gibson, K. D., & Scheraga, H. A. (1992) Biochemistry 31, 236-249.
Négrerie, M., Grof, P., Bouet, F., Ménez, A., & Aslanian, D. (1990) Biochemistry 29, 8258-8265.
Otting, G., Widmer, H., Wagner, G., & Wüthrich, K. (1986) J. Magn. Reson. 66, 187-193.
Perczel, A., & Fasman, G. D. (1992) Protein Sci. 1, 378-395.
Rance, M., Sørensen, O. W., Bodenhausen, G., Wagner, G., Ernst, R. R., & Wüthrich, K. (1983) Biochem. Biophys. Res. Commun. 117, 479-485.
Rodriguez, A., Tablero, B., Barragán, B., Lara, P., Rangel, M., Arreguín, B., Possani, L., & Soriano-Garcia, M. (1986) J. Cryst. Growth 76, 710-714.
Rodríguez-Romero, A., Arreguín, B., & Hernández-Arana, A. (1989) Biochim. Biophys. Acta 998, 21-24.
Rodríguez-Romero, A., Ravichandran, K. G., & Soriano-Garcia, M. (1991) FEBS Lett. 291, 307-309.
Scholtz, J. M., Qian, H., York, E. J., Stewart, J. M., & Baldwin, R. L. (1991) Biopolymers 31, 1463-1470.
Wagner, G. (1983) J. Magn. Reson. 55, 151-156.
Walujono, L., Sholma, R. A., Beintema, J. J., Mariono, A., & Hahn, A. M. (1976) in Atlas of Protein Sequences and Structure (Dayhoff, M. O., Ed.) Vol. 5, Suppl. 3, p 308, National Biomedical Press, Washington, DC.
Warren, G. L., Beshah, K., Goodfriend, L., Petsko, G. A., & Neuringer, L. J. (1991) J. Cell. Biochem., Suppl. 15G, 90.
Wishart, D. S., Sykes, B. D., & Richards, F. M. (1991) J. Mol. Biol. 222, 311-333.
Wright, C. S. (1987) J. Mol. Biol. 194, 501-529.
Wright, P. E., Dyson, H. J., & Lerner, R. A. (1988) Biochemistry 27, 7167-7175.
Wüthrich, K. (1986) NMR of Proteins and Nucleic Acids, Wiley, New York.
Yang, J. T., Wu, C.-S. C., & Martinez, H. M. (1986) Methods Enzymol. 130, 208-269.
Zagorski, M. G. (1990a) J. Magn. Reson. 86, 400-405.
Zagorski, M. G. (1990b) J. Magn. Reson. 89, 608-614.