Biochemistry 1992, 31, 6203-621 1
6203
Sequence of a cDNA Clone Encoding the Zinc Metalloproteinase Hemorrhagic Toxin e from Crotalus atrox: Evidence for Signal, Zymogen, and Disintegrin-like Structures+,$ Lawrence A. Hite,$ John D. Shannon,$ Jbn B. Bjarnason,ll and Jay W. Fox*,$ Department of Microbiology, University of Virginia Health Sciences Center, Charlottesville, Virginia 22908, and Science Institute, University of Iceland, Reykjavik, Iceland Received September 17, 1991; Revised Manuscript Received January 17, 1992
ABSTRACT: The sequence of two overlapping cDNA clones for the zinc metalloproteinase hemorrhagic toxin e (also known as atrolysin e, EC 3.4.24.44) from the venom gland of Crotalus atrox, the Western diamondback rattlesnake, is presented. The assembled cDNA sequence is 1975 nucleotides in length and encodes an open reading frame of 478 amino acids. The mature hemorrhagic toxin e protein as isolated from the crude venom has a molecular weight of approximately 24 0oO and thus represents the processed product of this open reading frame. From the deduced amino acid sequence, it can be hypothesized that the enzyme is translated with a signal sequence of 18 amino acids, an amino-terminal propeptide of 169 amino acids, a central hemorrhagic proteinase domain of 202 amino acids, and a carboxy-terminal sequence of 89 amino acids. The propeptide has a short region similar to the region involved in the activation of matrix metalloproteinase zymogens. The proteinase domain is similar to other snake venom metalloproteinases, with over 57% identity to the low molecular weight proteinases HR2a and H2-proteinase from the Habu snake Trimeresurusflauouiridis. The carboxy-terminal region, which is not observed in the mature protein, strongly resembles the protein sequence immediately following the proteinase domain of H R 1 B (a high molecular weight hemorrhagic proteinase from the venom of T. flauouiridis) and the members of a different family of snake venom polypeptides known for their platelet aggregation inhibitory activity, the disintegrins. The cDNA sequence bears striking similarity to a previously reported sequence for a disintegrin cDNA. This report is evidence that this subfamily of venom metalloproteinases is synthesized in a proenzyme form which must be proteolytically activated. Furthermore, these studies suggest a structural link between the low and high molecular weight hemorrhagic toxins and the disintegrin-like family of polypeptides.
A
prominent feature of crotalid snake venoms is that they contain an abundance of proteolytic enzymes. Some of these proteinases are directly capable of causing hemorrhage at the site of envenomation and often disseminate from the site, affecting the entire appendage. In severe cases, systemic hemorrhage has been observed, affecting the heart, lungs, kidneys, and brain. Both types of hemorrhage are the result of the action of zinc metalloproteinases found in the venom [reviewed in Bjarnason and Fox (1989)l. Bjarnason and Tu (1978) first isolated and partially characterized five hemorrhagic metalloproteinasesfrom Crotalus atrox venom. The cleavage specificities of these enzymes were initially investigated using insulin B chain as substrate in order to understand more about the mechanism of hemorrhage induction (Bjarnason & Fox, 1983, 1987; Fox et al., 1986; Bjarnason et al., 1988). From both microscopic pathological studies (Ownby et al., 1978) and specificity studies using purified basement membrane components (Baramova et al., 1989, 1990a), it has been well established that these hemorragic metalloproteinases degrade the proteins of basement membrane including collagen type IV, nidogen (entactin), and laminin. Despite these studies, it was still not understood why the sizes of these metalloproteinases are so varied (from 24 to 68 'This work was supported by Grant ROlGM31289 to J.W.F. from the National Institutes of Health and Grant INT-8902965 to J.W.F. from the National Science Foundation. *The nucleic acid sequence in this paper has been submitted to GenBank under Accession Number 505372. * To whom correspondence should be addressed. f University of Virginia Health Sciences Center. 11 University of Iceland.
m a ) and why the higher molecular weight toxins are generally more effective at producing hemorrhage than the low molecular weight toxins. One explanation based upon our previous studies (Baramova et al., 1990b) is that the low molecular weight hemorrhagic toxins (Ht-c, -d, and -e)' are effectively inhibited by plasma a2-macroglobulinwhile the high molecular weight toxin Ht-a (atrolysin A, EC 3.4.24.1) was not. Partial protein sequence data are now available for many such enzymes from a broad range of crotalid snake venoms, which suggests that these enzymes are evolutionarily related members of a subfamily of zinc metalloproteinases (Shannon et al., 1989; Miyata et al., 1989; Takeya et al., 1989, 1990a,b; Tanizaki et al., 1989; Nakagawa et al., 1989; Sanchez et al., 1991). Indeed, the entire protein sequences of five hemorrhagic proteinases and one nonhemorrhagic proteinase have been completed by traditional Edman protein sequencing. With the elucidation of the protein sequence for H R l B (trimerelysin I, EC 3.4.24.52) from Trimeresurusflauovridis ~
~~
Abbreviations: EDTA, ethylenediaminetetraacetic acid; HPLC, high-performance liquid chromatography; Ht, hemorrhagic toxin; kDa, kilodalton(s); KGD, tripeptide sequence lysine-glycine-aspartate; MMuLV, murine Molony leukemia virus; MBIR, Molecular Biology Information Resource; MMP, matrix metalloproteinase; NZY, rich bacteriological broth containing 10 g of N Z amine, 5 g of sodium chloride, 5 g of yeast extract, and 2 g of magnesium sulfate heptahydrate per liter, adjusted to pH 7.5; ORF, open reading frame; RGD, tripeptide sequence arginine-glycine-aspartate; SDS, sodium dodecyl sulfate; SM, phage buffer consisting of 0.1 M sodium chloride, 8 mM magnesium sulfate, 50 mM Tris-HC1, pH 7.5, and 2%gelatin; SSC, standard saline citrate (0.15 M sodium chloride/l5 mM sodium citrate, pH 7.0); TAE, Trisacetate-EDTA buffer (40 mM Tris-acetate/l mM EDTA); TFA, trifluoroacetic acid; Tris, tris(hydroxymethy1)aminomethane. I
0006-2960/92/0431-6203$03.00/00 1992 American Chemical Society
6204 Biochemistry, Vol. 31, No. 27, 1992 (Takeya et al., 1990a), it was shown that this high molecular weight toxin consists of three domains. At the amino-terminal end is a proteinase domain similar to the small molecular weight hemorrhagic metalloproteinases, followed by two additional domains, one of which appears to be related to the disintegrin family of proteins while the other shows weak similarity to von Willebrand factor. Here we present the complete cDNA sequence and the translated protein sequence of a low molecular weight hemorrhagic proteinase, Ht-e, from Crotalus atrox, the Western diamondback rattlesnake. On the basis of this work, we can now suggest a new explanation for the structural relationship between the high and low molecular weight snake venom metalloproteinases and the disintegrin-like family of polypeptides. EXPERIMENTAL PROCEDURES Protein Preparation. Hemorrhagic toxins were purified from crude C. atrox venom (Miami Serpentarium) using the purification scheme of Bjarnason and Tu (1978). Hemorrhagic toxins were alkylated with iodoacetamide by the method of Hirs (1967), or with 4-vinylpyridine by the method of Friedman et al. (1970), then HPLC-purified away from the reduction/alkylation reagents using a reverse-phase ((2-4) column with a starting solvent of 0.1% aqueous TFA, and eluted with 0.1% TFA in acetonitrile at a gradient of 1% increase in eluting solvent per milliliter. Cyanogen bromide digestion of Ht-e by the method of Gross (1967) followed by HPLC (using the same conditions outlined previously) yielded purified CNBr fragments. Amino Acid Analysis. Amino acid analysis of Ht-e was done by a modification (Shannon et al., 1989) of the method of Bidlingmeyer et al. (1984). Protein Sequencing. Protein sequencing was performed on an Applied Biosystems 470A gas-phase sequenator (Applied Biosystems, Foster City, CA) with an on-line Applied Biosystems Model 120A analyzer using the manufacturer's standard protocols. Poly(A+) R N A Isolation. RNA was isolated from the venom glands of two freshly sacrificed adult rattlesnakes (Maverick Trading Post, Farmer's Branch, TX) approximately 4 ft in length. Briefly, the snakes were sacrificed by decapitation; the glands were removed and placed directly into a denaturing solution of 4 M guanidinium isothiocyanate and homogenized using a motorized homogenizer. Total RNA was isolated by the method of Chirgwin et al. (1979). Homogenate was layered over ultracentrifuge tubes containing 5.7 M cesium chloride plus 0.1 M EDTA and centrifuged for 18 h at 33 000 rpm at 20 OC. The pelleted RNA was recovered and ethanol-precipitated. Purified total RNA was then passed over an oligo(dT)%llulose column and eluted as poly(A+) RNA. The total yield of poly(Af) RNA was about 10 pg from two snakes. cDNA Library Construction. cDNA synthesis was accomplished using the UniZAP XR kit by Stratagene (La Jolla, CA). First-strand synthesis from 1.2 pg of poly(A+) RNA was primed by an XhoI/oligo(dT) linker/primer and catalyzed by M-MuLV reverse transcriptase in the presence of [a32P]dATPand 5-methyl-dCTP (to hemimethylate the cDNA at all internal XhoI sites). After second-strand synthesis using Escherichia coli DNA polymerase the cDNA ends were blunted with T4 DNA polymerase, and EcoRI adapters were ligated on. The cDNA was the digested with XhoI to remove the adapter from the end containing the XhoI linker/primer sequence (internal XhoI sites were protected by the hemimethylated state of the cDNA). The resultant cDNA had an EcoRI site at the 5' end of the message and an XhoI site at the 3' [poly (A)] end. cDNA was then purified from the
Hite et al. excess adapters and cleaved XhoI linkers by gel filtration chromatography using Sepharose CL-4B resin. This step was also used to size-select for larger cDNAs. This cDNA was then ligated into Stratagene's UniZAP XR vector (a derivative of phage A) and packaged using Stratagene's Gigapack I1 packaging extract. The packaged DNA was then titered and found to contain 2.9 X lo7 recombinants. Of these, 2 X lo6 were amplified to make the library. Design and Synthesis of Oligonucleotide Probes. The amino-terminal protein sequence of Ht-e was compared to similar regions of other C. atrox hemorrhagic toxins (partial protein sequences of which we had previously determined), and a heptameric amino acid sequence was chosen which would be unlikely to cross-hybridize with other hemorrhagic toxin sequences, yet which would have as few as possible degenerate positions. A mixture of oligonucleotide probes of 20 bases in length was synthesized using a Biosearch 8600 synthesizer (MilliGen/Biosearch, Burlington, MA), incorporating the proper mixture of bases at each site of degeneracy. The DNA sequence of the probe was A-T-G-T-A-C/T-A-C-N-A-AA/G-T-A-C/T-A-A-C/T-G-G, corresponding to the peptide sequence M-Y-T-K-Y-N-G (positions 18-24) from the amino-terminal region of the mature Ht-e proteinase. Prior to hybridization, the probe was labeled using T4 polynucleotide kinase and [y-32P]ATP(6000 Ci/mmol; New England Nuclear, Boston, MA) (Ausubel et al., 1987) and purified by spun column chromatography using BioGel P-2 (Bio-Rad, Richmond, CA) and a method adapted from Sambrook et al. (1982). cDNA Library Screening. Since the hemorrhagic toxins were expected to be highly expressed proteins in the venom gland, the primary screening of the library was done at a low plaque density. The library was plated on 150-mm NZY plates and grown 9 h at 37 OC. Plates were then chilled for several hours, and plaques were lifted in duplicate onto nitrocellulose filters and processed as recommended by Stratagene. After vacuum baking at 80 "C, filters were washed in three changes of 3 X SSC/O.l% SDS at room temperature and then prehybridized for 3 h at 42 OC. Labeled probe and hybridization solution were then added, and the hybridization was allowed to proceed overnight at 42 "C. The next day the solution was removed, and the filters were washed in three changes of 6 X SSC/O.l% SDS at room temperature and then in the same buffer at 42 OC. Filters were then sealed in plastic wrap, and Kodak XAR film (Rochester, NY) was placed over them in an autoradiography cassette. Film was exposed overnight at -70 OC. Positive plaques were identified after the film was developed and the autoradiograph was aligned to the plates. Characterization of Positive Clones. Plaques which were both positive and clearly defined from surrounding plaques were removed using a sterile Pasteur pipet and eluted in SM buffer. Clones were then rescued into pBlueScript I 1 phagemids using R408 helper phage. Plasmid DNA was then obtained using a Qiagen plasmid kit (Qiagen Inc., Chatsworth, CA). The plasmid DNA was restriction-digestedwith EcoRI and XhoI to remove the insert and analyzed using 1% agarose gels in TAE buffer. Two of the largest clones (HO11 and H012) were chosen for DNA sequence analysis. Those clones to be sequenced were also isolated in single-stranded form using the in vivo packaging signals in the pBlueScript I1 vector using the protocols provided by Stratagene. DNA Sequencing. DNA sequencing was performed on a du Pont Genesis 2000 automated DNA sequencer (E. I . du Pont de Nemours & Co., Wilmington, DE). Sanger dideoxy
Biochemistry, Vol. 31, No. 27, 1992 6205
Metalloproteinase cDNA with Disintegrin Sequence sequencing reactions (Sanger et al., 1977) were performed on both single- and double-stranded DNA using reagents and protocols furnished by du Pont. Typical reactions gave readable sequences of 250-350 bases after manual editing of the results given by the base-calling software. The singlestranded reactions generally gave superior data. Sequence overlaps were obtained by “walking” across the insert, sequencing in from both ends starting with primers within the vector sequence and making new oligonucleotide primers to the end of each sequence run until each strand was completed. Synthesized primers were 17 or more bases in length. The coding strand was sequenced from single-stranded template and the noncoding strand from double-stranded template. Both strands were sequenced in their entirety at least twice. Sequence Assembly. Individual sequence runs were assembled into overlaps using the computer program SAM (Sequence Assembly Manager) of the MBIR suite of programs (Lawrence et al., 1989). After being edited, the assembled sequence was translated into the coding sequence, and both the protein and DNA sequences were compared to sequences in the GenBank (Bilofsky et al., 1986) and PIR (Protein Identification Resource, National Biomedical Research Foundation, Georgetown University, Washington, D. C.) databases using the fasta program of Pearson and Lipman (1988). RESULTS Protein Sequencing and Amino Acid Analysis of Ht-e. The amino acid sequence of the amino terminus of the mature Ht-e protein was determined to 42 residues by Edman degradation, and part of this protein sequence was used to design a suitable oligonucleotide probe for screening the cDNA library. Several Ht-e CNBr peptides were sequenced to verify that positive clones isolated from the library encoded the correct protein sequence. The carboxy-terminal CNBr peptide was sequenced from Asn-192 to Pro-202, where the signal ended, suggesting the end of the peptide. Amino acid analysis of this peptide (data not shown) confiied that Pro-202 was the final residue of the mature proteinase. The amino acid composition of Ht-e was determined. Table I displays the average results of five amino acid analyses compared to the predicted composition from the cDNA sequence. Isolation of cDNA Clones. As stated earlier, it was expected that Ht-e would be highly expressed in the venom gland library. This assumption was borne out by the screening of the library with the Ht-e 20-mer oligonucleotide probe. The average number of clones which tested positive to this probe from the screening of three plates was greater than 1%. Due to the high number of positives, 16 clones were selected for further work. The clone (H012) which was first selected to be completely sequenced consisted of 1795 base pairs. When it was discovered that this was not sufficient to unambiguously determine the translation initiation codon, a second, overlapping clone (H011) was sequenced. This clone is slightly larger than the H 0 1 2 clone and gave an additional 180 base pairs of 5’-untranslated sequence for a total length of 1975 base pairs. Figure 1 shows the cDNA sequence of Ht-e and the translated protein sequence from the initiation codon to the termination codon. The deduced amino acid sequence from this open reading frame is 478 amino acids, much larger than would be necessary to encode a 24-kDa protein. Within this sequence, we assume the presence of a signal peptide, a propeptide, the mature protein, and a carboxy-terminal peptide which must be posttranslationally removed. Similarity of Ht-e to Other Sequences. When the nucleotide sequence of the Ht-e clone was compared with the
Table I: Amino Acid Comwsition of Ht-e” protein deduced from translated CDNA analysis Asn-1 to residue results Asn-1 to Pr0-202~ Glv-291C Asx 27 25 36 29 Glx 23 21 23 Ser 17 17 22 14 13 GlY His 9 9 9 Thr 9 17 10 6 Ala 7 10 8 14 10 Pro 8 8 12 11 10 12 TYr Val 11 11 16 7 7 Met 10 Ile 20 20 21 Leu 14 13 19 5 5 6 Phe 10 10 13 LYS 6 7 19 CYS ND 3 3 Trp ‘Values for the protein analysis were averaged from five samples of Ht-e, calculated for a 24-kDa protein and rounded to integer values expressed as residues per molecule. Asx = sum of Asn + Asp; Glx = sum of Gln Glu; N D = not determined. Asn- 1 was determined to be the amino terminus of mature Ht-e protein using amino-terminal protein sequence analyis. Pro-202 is the predicted carbxy-terminal residue of mature Ht-e. cGly-291 is the last residue of the open reading frame.
+
GenBank database, one significant similarity was found, that of the 2017-nucleotide cDNA sequence for trigramin (Neeper & Jacobson, 1990) from Trimeresureus gramineus. These two sequences are 85.3% identical over a 1997-nucleotide overlap (which includes the entire Ht-e cDNA sequence plus gaps inserted for optimal alignment). Comparison of the translated open reading frames of these two clones showed a 70.3% identity over a 482 amino acid overlap. Comparison of the deduced protein sequence to the PIR database, as was expected, confirmed a strong similarity to other snake venom metalloproteinases (5147%) as seen in Figure 2, and also to the disintegrins (up to 75%) as seen in Figure 3. No other significant similarities were seen, except to the zinc binding site of other metalloproteinases,the consensus sequence (H-E-X-X-H) which was described by Jongeneel et al. (1989), and the extended motif described by Murphy et al. (1991), (H-E-X-X-H)X-X-G-X-X-H. The proposed signal peptide region of Ht-e is similar in amino acid sequence to the reported trigramin cDNA signal sequence. The sequence structure (a central region rich in hydrophobic residues and a likely cleavage site 18 amino acids from the amino terminus) is consistent with other signal sequences. Upon close examination, a sequence, P-K-M-C-G-V, was located between the proposed signal sequence and the metalloproteinase domain (residues 164-1 69) which resembles a region in the propeptide of matrix metalloproteinases (MMPs). This sequence is thought to be involved in activation of these enzymes (Valee & Auld, 1990; Van Wart & Birkedal-Hansen, 1990) and might serve a similar function in the zymogen form of Ht-e.
DISCUSSION A comparison of the amino acid sequence of mature Ht-e with the sequences of other known snake venom metalloproteinases is made in Figure 2. As was expected, the protein sequence of Ht-e is very similar to the other known venom metalloproteinases. When all of the proteinases shown in the
Hite et al.
6206 Biochemistry, Vol. 31, No. 27, 1992 21
41
81
61
CTCAGATTGGCTTGAliAGCGGGAAGAGATTGCC~G~CTTCCAGCCAliATCCAGCCTCC~TGATCCAAGTTCTCTTGGTGAC~ATATGCTTASCAGCT
M 101
121
141
I
Q
V
L
L
V
T
I
C
L
A
A
181
161
TTTCCTTATCAAGGGAGCTCTATAATCCTGGAATCTGGG~CGTCAATGATTATGAAGTAATTTATCCACG~GTCACTGCATTGCCCAAAGGAGCAG
F P 231
Y
Q
G
S
S
I I 221
L
E
S
G
N V 241
N
D
Y
E
V
I Y 261
P
R
K
V
T
A L 281
P
K
G
A
TTCAGCCAAAGTATGAAGACACCATGCAATATGAATTG~GTGAATGGAGAGCCAGTGGTCCTTCA~C~GG~T~GGACT~T~TTC~GA V Q P K Y E D T M Q Y E L K V N G E P V V L H L E K N K G S K D
301 _
321
T Y
A S
34:
C A G T G A G E T H Y S F
401
A D
361
C T C A T T G R K i T
A T
T T C C T T N P S V E
441
421
T 3
381 G q
A T G S C C Y Y H
A G
461
G A R i
A A A A T E N D A
481
GACTCAACTGCAAGCATCAGTGCATGCATGCAACGGTTTGAAAGGACATTTCAAGCTTCAAGGGGAGA~G~ACCT~A~TGAACCCTTGAAGCTT~CCGACAGTG 3 S T A S I S A C N G L K G H F K L Q G E M Y L I E P L K L S 3 S
501
521
5 61
541
581
AAGCCCATGCAGTCTTCAARTTGPJUE4ATGTAG~~G~GGATGAGGCCCC~AAAATGTGTGGGGTAACCCAGAATTGGS~TCATATGAGCCCATC~ E A ~ A V F K L ~ R V E K E ~ E A ” ~ C G V T ~ N ~ E
631
621
641
661
68,
AAAGGCCTCTGATTTAAATCTTAATCCTSAACATCAAAGATATGTTGAGCTTTTTCATTG?TGTGGACCATGGAATGTACACRAAATATAATAATSGCGAT~CA
K 70;
A
S
3
L
N
L N 721
P
E
H
Q
R
Y V 74;
E
L
F
I
V
V D 761
H
G
M
Y
T K 781
Y
N
G
D
S
GATAAGATAAGACMCGGGTACAT~~TGGTCAACATTATGAAGGAGAGTTACACTTATATGTACATCGATATATTACTGGC“GTATAGAAAT-TGGT
D K 801
I
R
Q
R
V
H 82:
Q
M
V
N
I
M K 841
E
S
Y
T
Y
M Y 861
I
D
I
L
L
A G 881
I
E
I
W
CCAACGGAGATTTGAlTAACGTGCAGCCAGCATCGCC~AA~AC~~TGAACTCATT?GSAGAATGGASAGAGACAGA’iT”GCTGAAGCGC
S
N
G
D
L
I
N
V Q 921
901
P
A
S
P
N
T L 941
N
S
F
G
E W 961
R
E
T
D
L
L K 981
R
K
S
H
D
TAATGCTCAGTTACTCACGTCCATTGCCTTCGATGAAC~TTATAGGACGGGCTTATATAGGTGGCATATSCSACCCAAAGCGTTCTACAGGAGT_GTC
N A lOOi
Q
L
L
T
S I A 1021
F
D
E
Q
I I 104i
G
R
A
Y
I
G G io61
I
C
D
P
K
R S 1381
T
G
V
V
C A G G A T C A T A G C G A A A T A A A T C T T A G G G T T G C A G T T A C A A T G C S
Q D 1101
H
S
E
I
N
L
-. 1-21
R
V
A
V
T
M T H 1141
E
L
G
H
N L 1161
G
I
H
H
D
T D 1181
S
C
S
C
GTGGTTACTCATSCATTATGTCTCCTGTGATAAGCGATGAACCTTCC~TATTTCAGCGATTGTAGTTACA~CCAATG-TSGGAATTTAT~ATGAA~CA
G G 1201
Y
S
C
I
M
S P 1221
V
I
S
D
P
P S 121;
K
Y
F
S
D C S 1261
Y
I
Q
C
G R A G C C A C A A T G C A T T C T C A G A A A C C C T T G A G A - T K P Q C I L K K P L R ’ D T V S T P V S G N ~ ~
i321
1301
1341
136;
W E 1281
F
L
A
E
I
M
G
N
~
Q
E
C
D
C
1381
GGCTCTCTTGAAAATCCGTGC_GTTATGCTATGCTACAACCTGT~ATGAGACCAGGG?CACAGTGTSCAGAAGGAC~GTGT_G~GACCASTGCAGA~TTATGA G S L E N P C C Y A T T C K M R P G S Q C A E G , C C D C C R F M
1401
1421
1441
1461
1481
AAAAAGGAACAGTATGCCGGGTATCAATSS~-SATAGGAATSATGA~ACCTGCACTSGCCAA-C~GCTGACTGTCCCAG~TGGCCTCTATGGC~~C
K
K
G
T
V
C
X
15c:
V S I521
M
V
D
R
N
D D 1541
T
C
T
G
Q S 1561
A
2
C
P
R
N G 1581
L
Y
G
AACAATSSAGATGGAliAGGTCTGCAGCAACAGGCAGGCACTGTGTTGATG~GACTACAGCCTACTAATCAACC?C_GGCT~C_CTCAGATTTSA~C_-GGAGAT
16C1 C
C
1701
1621 T
T
C
T
T
T
172;
1641 C
A
G
M
S
G
174:
1661 T
T
C
A
A
C
1681 ~
T
C
C
1761
C
T
~
T
A
G
A
C
C
1781
_GCAATAT_?CTTCTCCATATTTAATCTGTTTACCTTTT~CTGT~TC~CCTTTTCCCCACCACAAAGCTC~ATGGGCATATAC~CACCAAGGGC-T
1801 1821 1841 1861 I881 A T T T G C T G T C A A G A A R A A C A A C A G G T G G C C A T T T T A C C G T T T C 1901
1921
-941
196:
AA-GCTTCCTCTTCCAAAATTTCATGC_GCC””-CCAAGA_G“AGCTGCTGCTTC~ATCAACAGGTAliACTAAC~AT~CTCA
FIGURE 1: cDNA sequence of Ht-e clones H011 and H012, with the translated open reading frame from the start ATG codon to the termination codon. The signal sequence and mature protein are denoted in boldface.
alignment in Figure 2 are compared, there is an overall shared identity of 25% (50 residues). It has the highest identity with HR2a and H,-proteinase (57.3% for each), followed next by a hemorrhagic proteinase from Lachesis muta muta (the bushmaster), LHF-I1 (57.1%). If only the proteinase domain (mature protein) of Ht-e is compared with the high molecular weight proteinase HRlB, a 56.1% identity is obtained, whereas there is a 55.7% identity if both the proteinase and disintegrin-like domains are compared. Surprisingly, of the other snake venom metalloproteinases whose protein sequences have been determined, Ht-e is least similar to a hemorrhagic proteinase, Ht-d, from the same species snake (54.8%). Another surprising observation is that the mature Ht-e protein sequence is 56.6% identical to the upstream region of the translated open reading frame of the cDNA for the disintegrin trigramin reported by Neeper and Jacobson (1990). The translated open reading frames of these two clones are 70.3% identical overall. The structures of these translated clones parallel each other (signal sequence, zymogen, met-
alloproteinase, disintegrin); indeed, even the 5’- and 3‘-untranslated regions of the nucleotide sequence bear striking similarity. Although there has not been a disintegrin isolated from C. atrox venom, it is likely that the processed disintegrin-like peptide (Figure 2, residues 203-295) is a component of the venom. This peptide would lack the RGD sequence found in many of the disintegrins, and its biological activity is not known. This sequence shows a high degree of similarity to the disintegrins, including an identical cysteine pattern (Figure 3). For instance, there is a 75.3% identity between this region of Ht-e and elegantin (55 out of 73 amino acids). The disintegrins are a family of anticoagulant peptides which have a high cysteine content [reviewed by Gould et al. (1990)l. These peptides inhibit platelet aggregation by competing with fibrinogen for binding to the platelet’s glycoprotein IIb-IIIa complex, a necessary event for thrombus formation. The RGD sequence found in most disintegrins has been suggested to be important in the binding of integrin proteins to their receptors
S
~
Metalloproteinase cDNA with Disintegrin Sequence 1
10
Biochemistry, Vol. 31, No. 27, 1992 6207 20
30
40
so
Ht-e NPEHQRYVELFIVVDHGMYTKYNGDSDKIRQRVHQMVNIMKESYTYMYID T. gramineus ORF Q Q R F P Q R Y I K L G I F V D H G M Y T K Y S G N S E R I T K R V H Q M I N N I N M M C R A L N I V HRlB > Glu > Asp = Cys. Of the 50 residues conserved in all the venom sequences in Figure 2 (excluding the H-E-X-X-H zinc binding site), 4 are aspartates (at positions 15,93, 128, and 153), 1 is a glutamate (at position 9), and 1 is a histidine (at position 152). Noteworthy is the observation that the second histidine ligand is preceded by an absolutely conserved glycine residue in the snake venom enzymes, which is also the case for the MMPs. Members of the thermolysin family do not contain a glycine at this position. The sequence similarities between the zinc binding residues of snake venom metalloproteinases and the MMPs, along with their similar substrate preferences, set them apart from the thermolysin family. Recent work by Murphy et al. (199 1) proposed the classification of the zinc metalloproteinases into five related groups (Figure 4). The first is represented by thermolysin and contains a conserved second sequence which harbons the third zinc ligand. The second group includes Serratia proteinase and Envinia proteinases B and C and contains the two zinc binding motifs of the first group, but also contains a third motif adjacent to the zinc binding site, (H-E-X-X-H)-X-X-G-X-XH. The third group, which contains the matrix metalloproteinases, the snake venom metalloproteinases, and Astacus proteinase, has the first and third motifs but lacks the second motif (which contains the conserved third zinc ligand). The other two groups are not yet fully described. The amino acid sequence of Ht-e conforms to the third group. In summary, the cDNA sequence for a snake venom hemorrhagic metalloproteinase was determined. The translated protein sequence of the mature Ht-e protein shows similarity to other snake venom metalloproteinases. The continuation of the open reading frame beyond the end of the observed carboxy terminus of the mature protein suggests a carboxyterminal domain, which is processed away from the proteinase. The high degree of similarity between this clone and a cDNA clone for the disintegrin trigramin strengthens this hypothesis. Furthermore, it helps to bridge our understanding of the interrelatedness of the large and small molecular weight venom metalloproteinases to one another and to the disintegrin family of peptides. It also predicts the existence of this disintegrin-like peptide in C. atrox venom, the function of which remains to be established. Efforts to isolate this peptide are underway. A cluster of amino acids from residues 92 to 101 was noted to be conserved in the hemorrhagic metalloproteinases but not
6210 Biochemistry, Vol. 31, No. 27, 1992 conserved in the only nonhemorrhagic snake venom metalloproteinase whose protein sequence has been determined. This could be evidence that this region of the proteinase is crucial to its ability to hydrolyze substrates in a manner which results in hemorrhage. Since this region is now defined to a single, 10 amino acid sequence and a cDNA clone is available, it should be possible to perform site-directed mutagenesis studies in order to test this hypothesis. Evidence exists for a signal sequence and prosequence for Ht-e. The propeptide shows some similarity to the propeptide of MMPs and may have a similar mechanism of activation. This region of the cDNA clone is also a good candidate for site-directed mutagenesis studies. The zinc binding region of Ht-e more closely resembles that of the MMPs than of the thermolysin family of zinc metalloproteinases, which is not surprising since the substrates of Ht-e are more similar to those of the MMPs. More studies are needed to better define the structure and function of this family of metalloproteinases. ACKNOWLEDGMENTS We thank Ms. Lolita L. Bland for her assistance in DNA sequencing. Registry No. DNA (Crotalus atrox atrolysin e mRNA-complementary and flanking region), 141436-80-8; DNA (Crotalus atrox atrolysin e mRNA-complementary), 141436-79-5; preproatrolysin e (Crotalus atrox reduced), 141436-82-0; atrolysin e (Crotalus atrox reduced), 141436-81-9; atrolysine, 81669-70-7; proatrolysin e (Crotalus atrox reduced), 14 1436- 8 3- 1.
REFERENCES Ausubel, F. M., Brent, R., Kingston, R. E., Moore, D. D., Seidman, J. G., Smith, J. A,, & Struhl, K., Eds. (1987) Current Protocols in Molecular Biology, John Wiley & Sons, New York. Baramova, E. N., Shannon, J. D., Bjarnason, J. B., & Fox, J. W. (1989) Arch. Biochem. Biophys. 275, 63-71. Baramova, E. N., Shannon, J. D., Bjarnason, J. B., & Fox, J. W. (1990a) Matrix 10, 91-97. Baramova, E. N., Shannon, J. D., Bjarnason, J. B., Gonias, S. L., & Fox, J. W. (1990b) Biochemistry 29, 1069-1074. Bidlingmeyer, B. A,, Cohen, S . A,, & Tarvin, T. L. (1984) J . Chromatogr. 336, 93-104. Bilofsky, H. S . , Burks, C., Fickett, J. W., Goad, W. B., Lewitter, F. I., Rindone, W. P., Swindell, C. D., & Tung, C . - S . (1986) Nucleic Acids Res. 14, 1-4. Bjarnason, J. B., & Tu, A. T. (1978) Biochemistry 17, 3395-3404. Bjarnason, J. B., & Fox, J. W. (1983) Biochemistry 22, 3770-3778. Bjamason, J. B., & Fox, J. W. (1987) Biocheim. Biophys. Acta 91 1 , 356-363. Bjarnason, J. B., & Fox, J. W. (1989) J. Toxicol. Toxin Rev. 7, 121-209. Bjarnason, J. B., Hamilton, D., & Fox, J. W. (1988) Biol. Chem. Hoppe-Seyler 369, 121-129. Chao, B. H., Jakubowski, J. A., Savage, B., Ping Chow, E., Marzec, U. M., Harker, L. A., & Maraganore, J. M. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 8050-8054. Chirgwin, J. M., Przybyla, A. E., MacDonald, R. J., & Rutter, W. J. (1979) Biochemistry 18, 5294-5299. Delepelaire,P., & Wandersman, C. (1989) J. Biol. Chem. 264, 9083-9089. Dennis, M. S . , Henzel, W. J., Pitti, R. M., Lipari, M. T., Napier, M. A., Deisher, T. A., Bunting, S . , & Lazarus, R. A. (1989) Proc. Natl. Acad. Sci. U.S.A. 87, 2471-2475.
Hite et al. Fox, J. W., Campbell, R., Beggerly, L., & Bjarnason, J. B. (1986) Eur. J. Biochem. 156, 65-72. Friedman, M., Krull, L. H., & Cavins, J. F. (1970) J. Biol. Chem. 245, 3868-387 1. Gan, Z.-R., Gould, R. J., Jacobs, J. W., Friedman, P. A,, & Polokoff, M. A. (1988) J. Biol. Chem. 263, 19827-19832. Gould, R. J., Polokoff, M. A., Friedman, P. A,, Huang, T.-F., Holt, J. C., Cook, J. J., & Niewiarowski, S . (1990) Proc. SOC.Exp. Biol. Med. 195, 168-171. Gross, E. (1967) Methods Enzymol. 11, 238-255. Hirs, C. H. W. (1967) Methods Enzymol. 1 1 , 199-203. Huang, T.-F., Holt, J. C., Kirby, E. P., & Niewiarowski, S . (1989) Biochemistry 28, 661-666. Jongeneel, C. V., Bouvier, J., & Bairoch, A. (1989) FEBS Lett. 242, 21 1-214. Lawrence, C. B., Shalom, T., & Honda, S. (1989) SAM: A Software Package for Sequence Assembly Management for UNZX Systems, Molecular Biology Information Resource, Baylor College of Medicine, Houston, TX. Matthews, B. W., Weaver, L. H., & Kester, W. R. (1974) J. Biol. Chem. 249, 8030-8044. Miyata, T., Takeya, H., Ozeki, Y., Arakawa, M., Tokunaga, F., Iwanaga, S., & Omori-Satoh, T. (1989) J.Biochem. 105, 847-853. Murphy, G. J. P., Murphy, G., & Reynolds, J. J. (1991) FEBS Lett. 289, 4-7. Musial, J., Niewiarowski, S . , Rucinski, B., Stewart, G. J., Cook, J. J., Williams, J. A., & Edmunds, L. H., Jr. (1990) Circulation 82, 261-273. Nagase, H., Enghild, J. J., Suzuki, K., & Salvesen, G. (1990) Biochemistry 29, 5783-5789. Nakagawa, H., Miller, R. A,, Tu, A. T., Nikai, T., & Sugihara, H. (1989) Biochem. Arch. 5, 243-247. Nakahama, K., Yoshimura, K., Marumoto, R., Kikuchi, M., Lee, I. S . , Hase, T., & Matsubara, H. (1986) Nucleic Acids Res. 14, 5843-5855. Neeper, M. P., & Jacobson, M. A. (1990) Nucleic Acids Res. 18, 4255. Ownby, C. L., Bjarnason, J. B., & Tu, A. T. (1978) Am. J . Pathol. 93, 201-218. Pearson, W. R., & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. U.S.A. 85, 2444-2448. Rucinski, B., Niewiarowski, S., Holt, J. C., Soszka, T., & Knudsen, K. A. (1990) Biochim. Biophys. Acta 1054, 257-262. Sambrook, J., Fritsch, E. F., & Maniatis, T. (1982) in Molecular Cloning: A Laboratory Manual, pp 466-467, Cold Spring Harbor Laboratory Press, Salem, MA. Sanchez, E. F., Diniz, C. R., & Richardson, M. (1991) FEBS Lett. 282, 178-182. Sanger, F., Nicklen, S., & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467. Scarborough, R. M., Rose, J. W., Hsu, M. A,, Phillips, D. R., Fried, V. A., Csmpbell, A. M., Nannizzi, L., & Charo, I. F. (1991) J. Biol. Chem. 266(15), 9359-9362. Shannon, J. D., Baramova, E. N., Bjarnason, J. B., & Fox, J. W. (1989) J. Biol. Chem. 264, 11575-11583. Shebuski, R. J., Ramjit, D. R., Bensen, G. H., & Polokoff, M. A. (1989) J . Biol. Chem. 264, 21550-21556. Sidler, W., Niederer, E., Suter, F., & Zuber, H. (1986) Biol. Chem. Hoppe-Seyler 367, 643-657. Takeya, H., Arakawa, M., Iwanaga, S . , & Omori-Satoh, T. (1989) J. Biochem. 106, 151-157. Takeya, H., Oda, K., Miyata, T., Omori-Satoh, T., & Iwanaga, S . (1990a) J. Biol. Chem. 265, 16068-16073.
Biochemistry 1992, 31, 621 1-6219 Takeya, H., Onikura, A., Nikai, T., Sugihara, H., & Iwanaga, S . (1990b) J . Biochem. 108, 711-719. Takeya, H., Kubo, J., Omori-Satoh, T., & Iwanaga, S . (1991) Thromb. Haemostasis 65, 842. Tanizaki, M. M., Zingali, R. B., Kawazaki, H., Imajoh, S., Yamazaki, S., & Suzuki, K. (1989) Toxicon 27,747-755. Titani, K., Hermodson, M. A., Eriason, L. H., Walsh, K. A., & Neurath, H. (1972) Nature (London), New Biol. 238, 35-37. Valee, B. L., & Auld, D. S . (1990) Biochemistry 29, 5 647-5 659. Valee, B. L., & Auld, D. S . (1991) in Methods in Protein Sequence Analysis (Hornvall, H., HGg, J.-O., & Gustavsson, A,-M., Eds.) pp 363-372, Birkhauser Verlag, Basel.
621 1
Van Wart, H. E., & Birkedal-Hansen, H. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 5578-5582. von Heijne, G. (1983) Eur. J . Biochem. 133, 17-21. Whitham, S . E., Murphy, G., Angel, P., Rahmsdorf, H. J., Smith, B. J., Lyons, A., Harris, T. J. R., Reynolds, J. J., Herrlich, P., & Docherty, A. J. P. (1986) Biochem. J. 240, 9 13-9 16. Williams, J., Rucinski, B., Holt, J. C., & Niewiarowski, S . (1990) Biochim. Biophys. Acta 1039, 81-89. Woessner, J. F. (1991) FASEB J . 5, 2145-2154. Yamanaka, M., Matsuda, M., Isobe, J., Ohsaka, A,, Takahasi, T., & Omori-Satoh, T. (1974) Platelets, Thrombosis, and Inhibitors (Didisheim, P., Ed.) pp 335-344, F. K. Schattauer-Verlag, New York.
Identification of Five Sites of Acetylation in Alfalfa Histone H4t Jakob H. Waterborg Division of Cell Biology and Biophysics, School of Basic Life Sciences, University of Missouri-Kansas City, Kansas City, Missouri 64110-2499 Received January 16, 1992; Revised Manuscript Received April 7, 1992
ABSTRACT: Radioactive acetylation in vivo of plant histone H4 of alfalfa, Arabidopsis, tobacco, and carrot revealed five distinct forms of radioactive, acetylated histone. In histone H 4 of eukaryotes ranging from fungi to man, acetylation is restricted to four lysines (residues 5, 8, 12, and 16) possibly caused by a quantitative methylation of lysine-20. Chemical and proteolytic fragmentation of the amino terminally blocked alfalfa H 4 protein, dynamically acetylated by radioactive acetate in vivo, allowed protein sequencing and identification of selected peptides. Peptide identification was facilitated by analyzing fully characterized calf histone H4 in parallel. Acetylation in vivo of alfalfa histone H 4 was restricted to the lysines in the amino-terminal domain of the protein, residues 1-23. Lysine-20 was shown to be free of methylation, as in pea histone H4. This apparently makes lysine-20 accessible as a novel target for histone acetylation. The in vivo pattern of lysine acetylation (16 > 12 > 8 2 5 = 20) revealed a preference for lysines-16 and 12 without an apparent strict sequential specificity of acetylation.
-
c o r e histone acetylation has been suggested to allow transient displacement of histones from DNA by polymerases during gene transcription and DNA duplication and permanent displacement by protamines during spermatogenesis [for recent reviews, see Van Holde (1989), Grunstein (1990), and Csordas (1990)], Multiacetylated core histones H3 and H4 appear the functional forms for these processes (Van Holde, 1989; Delcuve & Davie, 1989; Boffa et al., 1990; Walker et al., 1990). In both histones, the amino acid sequence of the histone amino-terminal protein domain with the multiple lysine residues which can be acetylated in vivo is highly conserved (Matthews & Waterborg, 1985). Experiments in yeast (Megee et al., 1990; Grunstein, 1990; Durrin et al., 1991), Physarum (Pesis & Matthews, 1986), Tetrahymena (Chicoine et al., 1986), Drosophila (Mu& et al., 1991), and animal cells (Couppez et al., 1987; Turner & Fellows, 1989; Thome et al., 1990) have revealed functional differences between lysine residues, a nonrandom order of lysine acetylation, and a nonrandom distribution of steady-state acetylation patterns in histone H4. Nonrandom distributions of lysine acetylation This work was supported by an NIH Biological Research Support Grant to the School of Basic Life Sciences, University of MissouriKansas City, and by National Science Foundation Grants DCB-8896292 and DCB-9118999.
and methylation have recently been described for two distinct histone H3 variants in alfalfa (Waterborg, 1990). This paper presents the pattern of acetylation of histone H4 in alfalfa. The highest level of histone H4 acetylation observed in animals and lower eukaryotes is four, restricted to lysines-5, -8, -12, and -16 in the amino-terminal domain of histone H4 (Matthews & Waterborg, 1985; Van Holde, 1989; Thorne et al., 1990). Methylation of lysine-20 in lower eukaryotes like Physarum (Waterborg et al., 1983) and in animals ranging from sea urchins and fishes to birds and mammals (Van Holde, 1989; Duerre & Buttz, 1990) precludes modification by acetylation. In contrast, when alfalfa cells were labeled in vivo with acetate, five radioactive, charge-modified forms of histone H4 were observed when histone H4 was fractionated in acid/urea/Triton (AUT)' gels (Waterborg et al., 1989, 1990). In vivo treatment of alfalfa cells with [32P]phosphatefailed to label histone H4 (Waterborg et al., 1989), suggesting that phosphorylation (Ruiz-Carrillo et al., 1975) is not involved. This paper describes experiments that identify lysine-20 of I AUT, acid/urea/Triton; BAP, 6-benzylaminopurine; BSA, bovine serum albumin fraction V; PTH, phenylthiohydantoin;TEMED, N,N,N',N'-tetramethylethylenediamine; TFA, trifluoroacetic acid; TPCK, N-tosyl-L-phenylalanine chloromethyl ketone; 2,4D, 2,4-dichlorophenoxyacetic acid.
0006-2960/92/043 1-6211%03.00/0 0 1992 American Chemical Society