Structural studies of the acidic transactivation domain of the Vmw65

Derek N. Lavery and Iain J. McEwan ... JuFang Chang, J. Günter Grossmann, Ronan O'Brien, John Ladbury, Brian Carpenter, Stefan Roberts, and Ben Luisi...
0 downloads 0 Views 1MB Size
Biochemistry 1992, 31, 4150-4156

4150

Umezawa, H. (1980) Recent Results Cancer Res. 75, 115-125. Ward, P. E., Benter, I. F., Dick, L., & Wilk, S . (1990) 40, 1725-1732.

Watt, V. M., & Yip, C. C. (1989) J . Biol. Chem. 264, 5480-5487. Wilkes, S . H., & Prescott, J. M. (1985) J. Biol. Chem. 260, 13154-13162.

Structural Studies of the Acidic Transactivation Domain of the Vmw65 Protein of Herpes Simplex Virus Using ‘H NMR P. O’Hare*J and G . Williams$ Marie Curie Research Institute, The Chart, Oxted, Surrey RH8 OTL. U.K., and Roche Products Ltd., Welwyn Garden City, Herts AL7 3AY, U.K. Received December 4, 1991; Revised Manuscript Received February I I , 1992

W e have overproduced and purified the carboxy-terminal transactivation domain of Vmw65 (VP16) of herpes simplex virus, and studied potential folding of the domain by ‘H NMR. Two species of the acidic domain were obtained from the bacterial expression system, and we demonstrate that one of these represents read-through of the natural amber termination codon of the Vmw65 reading frame producing a larger polypeptide. Additional residues in the read-through product were identified by total amino acid analysis and by NMR. Study of the correctly terminated product by 1D N M R gave resonances which were clustered into groups around their random-coil chemical shift positions, and 2D N M R demonstrated that, even in mixed solvents containing up to 80% MeOH, there was very little evidence of secondary structure. Together these results indicate that the isolated acid domain has little if any a-helical content of any stable nature. We discuss these results with reference to the demonstrated activity of the acidic domain in a wide variety of polypeptide contexts. ABSTRACT:

v m w 6 5 (VP16) is a structural protein of herpes simplex virus (HSV) which is packaged into the tegument region of the virion at approximately 400-600 molecules per virion (Heine et al., 1974). In addition to its structural role, Vmw65 is a regulatory protein (Post et al., 1981; Campbell et al., 1984) which selectively induces the transcription of HSV immediate-early (IE) genes after virus infection by a complex interaction with at least two cellular factors including Oct-1 (McKnight et al., 1987; Kristie & Roizman, 1987; Gerster & Roeder, 1988; OHare & Goding, 1988; Stern et al., 1989; Preston et al., 1988) and CFF (VCAF-1, Cl/C2) (Xiao & Capone, 1990; Kristie et al., 1989; Katan et al., 1990). A transcription complex containing these three proteins is selectively assembled on IE-specific, cis-acting regulatory elements (the TAATGARAT motif) present upstream of each IE gene (Mackem & Roizman, 1982a,b; Preston et al., 1984; Whitton et al., 1983; Treizenberg et al., 1988a; O’Hare & Hayward, 1987). Requirements within Vmw65 for transactivation are distinct and separable from those for complex formation (Treizenberg et al., 1988b; Greaves & O’Hare, 1989; Werstruck & Capone, 1989). Thus, removal of the acidic carboxy-terminalregion of Vmw65 completely abrogates transactivation of IE genes but has no effect on complex formation. Moreover, this acidic region of Vmw65 will function in transcriptional activation when fused to a wide variety of different proteins, including, for example, the human estrogen receptor (Elliston et al., 1990), the chicken oncogene v-myb (Ibanez & Lipsick, 1990), and the yeast regulatory protein GAL4 (Sadowski et al., 1988; Cousens et al., 1989). *Marie Curie Research Institute. 5 Roche Products Ltd.

The carboxy-terminal region used in such “domain swap” experiments is highly acidic, containing 2 1 glutamic or aspartic residues and, from secondary structure predictions (Garnier et al., 1978), is proposed to form 2 helical regions separated by a poorly structured proline/glycine-rich loop. The prediction that this activation region of Vmw65 may adopt an cy-helical conformation is consistent with the proposal of Ptashne and colleagues that acidic activation regions may have the general property of forming amphipathic helices having one hydrophilic face bearing the acidic residues, and one hydrophobic face (Ma & Ptashne, 1987; Giniger & Ptashne, 1987). However, although the activation regions of a number of other regulatory proteins are enriched in acidic residues (Hope et al., 1988; Hollenberg & Evans, 1988), there has been no direct examination of the physical structure of such regions and no analysis of the proposal for a requirement for a-helix formation. Since the transcriptional function of the acidic region of Vmw65 is independent of the sequence of the DNA binding domain to which it is attached, we anticipated that any specification for folding would be entirely contained within the region itself. Here we report the absence of any major stable secondary structure of the domain and discuss this in light of current results from site-directed mutagenesis of the region and previous predictions on the requirement for CYhelical formation for the function of acidic transactivation domains. MATERIALS AND METHODS Plasmid Construction, Expression, and Purification. The coding region for the acidic carboxy-terminal 79 amino acids of Vmw65 protein of HSV-1 strain MP was fused in-frame

0006-2960/92/043 1-4150%03.00/0 , 0 1992 American Chemical Society I

- I

NMR Study of the Vmw65 Acid Domain to the coding sequence for glutathione S-transferase contained in the plasmid pGEX2T (Pharmacia). Insertion of the appropriate HincII to EcoRI fragment of pRG5O (Greaves & O'Hare, 1989) into the SmaI to EcoRI sites of pGEX2T resulted in a vector expressing a glutathione transferaseacid domain fusion protein from which the acid domain could be isolated by thrombin cleavage. Cleavage at the site engineered in the vector was predicted to result in an N-terminal extension of 3 amino acids, GlySer-Pro, in front of the 79 amino acid region of Vmw65 (which commenced with threonine-412). Expression of the fusion protein was induced by addition of 0.5 mM IPTG to large-scale fermentations (2-10 L) of Escherichia coli (HB 101) containing the recombinant plasmid (pP070). Preliminary experiments demonstrated that virtually all of the fusion protein was soluble under mild solubilization conditions (sonication in PBS containing 1% NP40) and the fusion protein was purified on glutathionesepharose affinity columns and eluted in a buffer containing 50 mM Tris-HCl, pH 8.0, and 5 mM reduced glutathione. Routinely, 5 mL of buffer was used to solubilize the pellet from 1 L of bacterial culture. Bound material was eluted in 3-mL fractions, 10 pL of which was loaded on SDS-polyacrylamide gels for analysis. The purified fusion protein was cleaved with thrombin in a buffer containing 100 mM NaCl and 2.5 mM CaCl,, using 1 pg of thrombin to 1 mg of fusion protein. After complete cleavage was ensured, the material was applied directly to an FPLC MonoQ column, equilibrated in 50 mM Tris, pH 7.5, and 20 mM NaC1, and eluted with a linear gradient of 20 mM-1 M NaCl. Desalting and buffer exchange were performed using Amicon Centricon units with nominal cutoff values of 3000 Da. NMR Spectroscopy. NMR samples contained typically 2 mg of protein in 450 pL of D,O at pH 6.8 f 0.2. pH adjustments in aqueous solution were made via microliter additions of 1 M solutions of 2HC1 and Na02H. The acidic domain is insoluble at these concentrations below pH 5.3 but redissolves on raising the pH. There is no significant change in the 1D NMR spectrum between pH 5.3 and pH 8.0. Samples in methanol-water mixtures were prepared by first freeze-drying the protein from water at pH 6.8, followed by dissolution in the desired solvent mixture without further adjustment. 'H NMR spectra were recorded at 400 MHz using a Bruker AM400 spectrometer. Typical 2D spectra were collected with 96-160 scans per t l increment and with data sizes of 4096 X 400 points ( t 2 X t J . Free induction decays were multiplied by a 45O-shifted sine-bell function before Fourier transformation and zero-filled to 4096 X 1024 points (E2X F1) during processing. Base lines were optimized by careful adjustment of the receiver phase (Marion & Bax, 1988) and ADC delay (Hoult et al., 1983), and no further correction was applied. TOCSY spectra employed a 36-ms, 13.6-kHz WALTZ- 16 spin-lock, flanked by 1.5-ms trim pulses. Double-quantumfiltered COSY spectra were obtained using the phase cycling scheme of Derome and Williamson (1990). Other Methods. Polyacrylamide gel electrophoresis was performed using denaturing conditions and buffers as previously described (Laemmli, 1975). N-Terminal sequencing was performed using conventional Edman degradation of the appropriate purified species by Jaytee Biosciences Ltd., Kent, U.K. RESULTS Protein Production and Purification. The acidic domain (residues 412-490) of Vmw65 was expressed in bacteria as a fusion protein with glutathione S-transferase and the product

Biochemistry, Vol. 31, No. 16, 1992 4151

(A)

L U W 1 2 3 4 5 6 97 68

*

43 29 18

-?I

(B)

M-Q 14 29 30 3 1 3 2 37 38 3 9 4 0 4 1

LI

29 18 14 -

4 4-

68

43

-lAD

- SAD

FIGURE1: Purification of the acid domain of Vmw65. (A) A soluble extract was prepared from bacterial cells as described under Materials and Methods and applied to a glutathione-Sepharose affinity column, the unbound material was collected, and the column was washed. Bound material was eluted in a buffer containing 50 mM Tris, pH 8.0, and 5 mM glutathione. Lanes indicate load (L), unbound (U), wash (W), and fractions 1-6 (10 pL out of 3-mL fractions). (B) Typical results of cleavage are shown in the left two tracks [(-) and (+) thrombin]. After cleavage, the sample was loaded onto a MonoQ column as described under Materials and Methods. One-milliliter fractions were collected, and 2 pL was analyzed by SDS-polyacrylamide gel electrophoresis. GST eluted mainly in the unbound fraction (fractions 10-14) although a proportion of GST eluted at higher salt concentrations (fractions 3 1 /32). The two cleavage products (arrowed) eluted in overlapping peaks from 0.3 to 0.4 M NaC1. (C) Although the elution profiles of LAD and SAD overlapped, the peaks were distinct, and LAD could be efficiently separated from SAD. The figure shows a summary of the uncleaved fusion protein (Un), and two peak fractions containing purified LAD and SAD as indicated.

purified as described under Materials and Methods by affinity chromatography on glutathione-Sepharose columns (Figure 1A). One major product with a molecular weight of approximately 38K and one minor product of approximately 43K were observed in the purified fractions (Figure lA, solid arrows lanes 1-6). The 38K protein was of the size predicted, and minor lower molecular weight products were considered to be heterogeneous degraded products. Thrombin cleavage of the purified fusion protein resulted in three main products: one of 29K (the size of glutathione S-transferase) and two smaller products with molecular weights of approximately 18K and 14K [Figure lB, lanes (-) and (+) Thr]. These products could be separated by anion-exchange chromatography (Figure 1B) and ultimately purified to virtual homogeneity. N-Terminal sequencing of the two cleavage products gave identical results, and the sequence was exactly as predicted for the three residue extension reading into the acidic domain region, i.e., Gly

4152 Biochemistry, Vol. 31, No. 16, I992

1

"

'

"

'

'

~

8.0

'

1

"

"

l

7.5 PPM

FIGURE 2:

'

'

"

7.0

O'Hare and Williams

fi

"310'

"

'

2.5 I '

"

' 2 .I 0

'

"

'

' 1 .I 5

"

1 I. 0 "

"

"

'

PPM

IH NMR spectra of (a) the small acidic domain (SAD) and (b) the large acidic domain (LAD) protein products from expression

of the Vmw65 acidic domain in E. coli. Random-coilchemical shift ranges are illustrated for some amino acid types (indicated by single-letter codes).

Ser-Pro-Thr.Ala.Pro.Pro-Thr-AspVa1 .... Consequently these two products were named large acidic domain (LAD) and small acidic domain (SAD), and we originally anticipated that the SAD was a specific proteolysis product of LAD. Overall we obtained 5-10 mg of each species, and each was purified to at least 95% homogeneity as determined by Coomassie Brilliant Blue staining of SDS-polyacrylamide gels. NMR Spectroscopy. One-dimensional 'H NMR spectra of the long acidic domain (LAD) and short acidic domain (SAD) proteins show obvious similarities (Figure 2). In spectra of both LAD and SAD, resonances are clustered into groups around the Urandom-coilnchemical shift positions of amino acid residues. Each group of signals has a similar intensity in both LAD and SAD spectra, with the exception of resonances around 2.0 ppm, which are more intense in the spectrum of LAD. Resonances from Glu, Pro, Ile, Gln, Arg, Val, and Met amino acids occur in this region. However, both samples have three intense singlet resonances in this region of the spectrum which arise from the C,H3 groups of three Met residues. This is entirely consistent with the expected amino acid composition, and these can be discounted as a source of heterogeneity between the two samples. In the aromatic region of the spectrum, resonances of histidine C2H protons and the C3H/C5H doublets of tyrosine residues are separated from the remainder. Signals from three His residues are observed in SAD while four residues are apparent in LAD. The variation in line width among these signals, due to chemical exchange effects can be attributed to a minor variation in pKa among these residues. In all cases, the observed pKa is close to the value (6.6) observed in small peptides. At least two tyrosines are present in both samples, although in SAD the two C3H/C5H doublets overlap almost

exactly. However, two-dimensional, double-quantum-filtered COSY (DQF-COSY) spectra can be used to resolve, at least partially, this overlap. Figure 3a shows the cross-peaks arising from correlations between C3H/C5H and C2H/C6H proton resonances of tyrosine residues in SAD. Although the C3H/C5H proton resonances of the tyrosines are degenerate, the C2H/C6H resonances are resolved, and two antiphase patterns are evident in the cross-peak above the spectrum diagonal. These patterns are not resolved in the cross-peak below the diagonal due to the poorer digitization of the spectrum in the F1 dimension. The presence of two tyrosine residues in SAD is consistent with the predicted composition. However, the spectrum of LAD in this region is more complex (Figure 3b). Apparent phase distortions are due to the combined effects of remnance overlap and mild resolution enhancement. It is possible to discern two antiphase multiplets in the cross-peak above the diagonal; however, one multiplet is approximately twice as intense as the other. Close inspection of the more intense antiphase multiplet reveals a small splitting in the F1 dimension, which is more clearly resolved in the F2 dimension of the cross-peak below the spectrum diagonal. This splitting, together with the increase in intensity, indicates that two tyrosine residues with almost degenerate resonances overlap at these chemical shifts. Thus, LAD contains at least three tyrosine residues in total. Of the two tyrosine residues in the acidic domain sequence, one is two residues from the C-terminus. Since both LAD and SAD contain at least two tyrosines, any differences between them are unlikely to arise from C-terminal truncation of the acidic domain sequence. This conclusion is supported by the observation of a C,H/C,H3 correlation from a single isoleucine residue in each sample (Figure 4). The acidic domain

Biochemistry, Vola31, No. 16, 1992 4153

NMR Study of the Vmw65 Acid Domain y2

v.

n

b

712

7:i

PPM

6.8

6.8

6.9

6.9

7.0

7.0

7.1

7.1

7.2

7.2

'M

6:8

7:O

n

7 2

PM

fia

7 0

7 1

FIGURE 3: DQF-COSY spectra of (a) SAD and (b) LAD. Cross-peaks arising from C3H/C5H-C2H/C6H correlations of two tyrosines (YI and Yz)are evident in (a), while three, highly-overlapped cross-peaks are evident in (b) (Yl-Y3). Positive contours are drawn at intervals of 21/2while negative contours are drawn at intervals of 4.

1.9

-

1.9

2.0

-

2.0

2.1

-

2.1

2.2

-

2.2

OP

2.3

\

I

I

I

I

I

1.1

1.0

.9

.8

.7

PPM

I

.6

'M

t

I

I

I

I

I

1.1

1.0

.9

.8

.7

PPM

I

.6

I

PPM

FIGURE 4: DQF-COSY spectra of (a) SAD and (b) LAD. Cross-peaks arising from isoleucine and valine p-y correlations are shown. A single isoleucine residue is evident in both spectra while LAD contains an additional valine residue (V3), whose p-7 correlations are labeled. The cross-peak denoted R in each spectrum arises from a reference compound.

sequence contains a single isoleucine residue, which is five residues from the C-terminus. As indicated above, N-terminal peptide sequencing over the first five residues indicates that both LAD and SAD have identical, and correct, N-terminal sequences, thus eliminating N-terminal cleavage as a source of the LAD/SAD heterogeneity. Comparison of Figure 4a with Figure 4b indicates the presence of an additional valine residue in LAD whose C,H/C,H3 antiphase correlation multiplets are clearly visible to one side of a more intense peak produced by the overlap of four such multiplets arising from two valine residues. These amino acid compositions can be confirmed by inspection of a TOCSY spectrum in which valine and isoleucine CJI/C,H3 correlations are present (Figure 5 ) . Due to the greater chemical shift dispersion of the C,H resonances, these correlation peaks (which are in-phase multiplets in TOCSY spectra) are better resolved from each other than the corre-

sponding C,H/C,H3 correlations. Thus, three valine (and one isoleucine) residues are evident in LAD while two valine residues (and one isoleucine) are present in SAD. From the predicted sequence, the acid domain should contain two valine residues. This region of the TOCSY spectrum also contains signals arising from Ala C,H/C,H3, Thr C,H/C,H3, and Thr C,H/C,H3 correlations. There are nine Ala and five Thr residues in the acidic domain sequence, and most of their correlation peaks are clustered into two groups (boxed) whose chemical shift coordinates are centered on 4.25, 1.22 ppm (Thr C,H/C,H3 and Thr CBH/C,H3) and 4.32, 1.40 ppm (Ala C,H/C,H3). However, the C,H resonances of one Thr and two Ala residues are shifted significantly downfield from these values while the corresponding Thr C,H resonance is shifted upfield, in spectra from both LAD and SAD. While such secondary shifts are commonplace in structured proteins, ar-

4154

Biochemistry, Vola31, No. 16, 1992

O'Hare and Williams I

IY"

a8

\rF9

I

1.2

I

1.3

1.3

T

I

1.4

T

I

1.4

I 4j6

415

4!4

4!3 PPM

412

411

1.5

4.0

FIGURE5: TOCSY spectra of (a) SAD and (b) LAD. A portion of the spectrum is shown which contains cross-peaks due to isoleucine a-y, valine a-y,threonine a-y, threonine B-y, and alanine a-/3 correlations. Two valine a-y correlations are identified in SAD while an additional cross-Deak is Dresent in LAD. Downfield shifts are evident on the C,H protons of one threonine (TI) and two alanine (Al, A2) residues in both ipectra. -

ising from the spatial proximity of magnetically anisotropic groups such as aromatic rings, there is little evidence of such structure under the conditions of Figures 2-5. In extended polypeptides, downfield shifts on C,H protons are commonly observed in residues which precede a proline in the amino acid sequence (G. Williams, unpublished data) and are probably due to restriction of free rotation around the C,H-CO bond, caused by steric interactions of the proline NCHz group. In the acidic domain, one threonine and two alanine residues precede proline, and thus the observed shifts in both LAD and SAD are consistent with the predicted sequence at these points. TOCSY spectra have revealed an additional set of correlations which are present in LAD and absent in SAD. The resonances involved have chemical shifts which correspond to those expected for the B, y, and 6 protons of an arginine residue and occur in a region of the spectrum which is well separated from other signals (data not shown). No such residue is expected from the acidic domain sequence. Overall then, the one-dimensional and two-dimensional DQF-COSY and TOCSY spectra of SAD are consistent with the amino acid composition of the expected product of the expression system. In addition, confirmation of the amino acid composition of SAD has been provided using negative-ion, electrospray mass spectrometry (Dr. Walter Vetter, Hoffmann-La Roche AG, Basel). The relative molecular mass predicted from the amino acid sequence of the construct is 8519.0 while that observed for SAD is 8518.5. However, 'H NMR spectra of LAD indicate the presence of additional residues in LAD which include one Val, one Arg, one His, one Tyr, and, by integration of Figure 2, approximately seven Pro or Glu residues. Due to the extensive overlap in both 1D and 2D spectra, this is not likely to represent a complete list. Since there is no evidence of heterogeneity in the sequence of the gene from which these proteins products are derived, it is unlikely that the additional residues of LAD arise from an insertion which maintains the correct reading frame. However, the extra residues observed in LAD are consistent with an extension to the sequence at the C-terminus which would result from inefficient termination at the single stop codon employed in the construct and translation of the DNA

sequence until the next stop is reached. This adds the sequence XGARPDPHPPSGFSPPVTGSYPQ,where X represents the amino acid resulting from the translation of the stop codon and neatly accounts for the presence of the additional resonances observed in 'H NMR spectra of LAD. Those additional residues which have not been identified by NMR (Ala, Thr, Phe, Glu, three Gly, and three Ser) have resonances which are obscured by overlap with those of similar residues or which occur in inaccessible regions of the spectrum (e.g., under the HOD resonance). Amino acid analysis of the peptides supports these general conclusions: LAD contains significantly larger amounts of Pro, Ser, and Gly than SAD. The absence of secondary and tertiary structure in aqueous solution at 35 OC, which is implied by Figure 2, is confirmed by inspection of two-dimensional NOE (NOESY) and rotating-frame NOE (ROESY) spectra of SAD and LAD. Intraresidue and C,Hi-NH,,' sequential NOES are observed, characteristic of an extended polypeptide chain; however, no sequential NHi-NHi+l NOEs, which are indicative of helical or nonhelical turns, are observed. Small changes in chemical shift are seen on the addition of salt (20 mM CaClz or 100 mM NaCl), but there is no evidence of secondary structure formation under these conditions. Addition of a nonaqueous solvent and reduction of temperature have been shown to promote structure formation in a variety of peptides. SAD is soluble in methanol-water mixtures containing up to 80% methanol. A portion of the NOESY spectrum of SAD at 7 OC in 80% methanol is shown in Figure 6. Strong intraresidue NOEs are evident between the C2H/C6H and C3H/C5H resonances of the two tyrosines: a less intense cross-peak arises from chemical exchange between the two N,H protons of the single glutamine residue, caused by slow rotation about the side chain C-N amide bond. However, while some interresidue NH-NH NOEs are observed, these are considerably weaker. The dominant conformer is thus still extended although some turn-containing conformers are probably present. However, these are populated in relatively small amounts (