Amino acid sequence of human histidine-rich glycoprotein derived

Aug 28, 1985 - with an affinity-purified antibody to human histidine-rich glycoprotein (HRG) and then with a restriction fragment isolated from the 5'...
0 downloads 0 Views 1MB Size
2220

Biochemistry 1986, 25, 2220-2225

Demple, B., & Linn, S. (1980) Nature (London) 287, 203-208. Doetsch, P. W., Chan, G. L., & Haseltine, W. A. (1985) Nucleic Acids Res. 13, 3285-3304. Fisher, G. J., & Johns, H . E. (1976) in Photochemistry and Photobiology of Nucleic Acids (Wang, S . Y., Ed.) Vol. 1, pp 169-224, Academic, New York. Friedberg, E. C., & Bridges, B., Eds. (1983) Cellular Response to DNA Damages, Liss, New York. Gates, F. T., & Linn, S. (1977) J. Biol. Chem. 252, 2802-2807. Gordon, L. K., & Haseltine, W. A. (1981) J. Biol. Chem. 256, 6608-66 16. Haseltine, W. A., Gordon, L. K., Lindan, C. P., Grafstrom, R. H., Shaper, N. L., & Grossman, L. (1980) Nature (London) 285, 634-641. Helland, D. E., Male, R., & Kleppe, K. (1986) FEBS Lett. (submitted for publication). Karran, P., Lindahl, T., & Griffin, B. (1979) Nature (London) 280, 76-79. Kleppe, K., & Lillehaug, J. R. (1979) A h . Enzymol. Relat. Areas Mol. Biol. 48, 245-275. Kochetkov, N . K., & Budovskii, E. I. (1972) Organic Chemistry of Nucleic Acids, Part B, pp 543-618, Plenum, London. Lindahl, T., & Nyberg, B. (1972) Biochemistry 1 1 , 36 10-36 18.

Linn, S. (1982) in Nucleases (Linn, S . M., & Roberts, R. J., Eds.) pp 59-84, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. Lippke, J . A., Gordon, L. K., Brash, D. E., & Haseltine, W. A. (1981) Proc. Natl. Acad. Sci. U.S.A. 78, 3388-3392. Maniatis, T., Fritsch, E. F., & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. Maxam, A. M., & Gilbert, W. (1980) Methods Enzymol. 65, 499-560. Messing, J., & Vieira, J. (1982) Gene 19, 269-276. Nes, I. F. (1980) Eur. J. Biochem. 112, 161-168. Radman, M. (1976) J . Biol. Chem. 251, 1438-1445. Riazuddin, S., & Grossman, L. (1977) J . Biol. Chem. 252, 6280-6286. Royer-Pokora, B., Gordon, L. K., & Haseltine, W. A. (1 98 1) Nucleic Acids Res. 9, 4595-4609. Sancar, A,, & Sancar, G. B. (1984) J . Mol. Biol. 172, 223-227. Tapper, D. P., & Clayton, D. A. (1981) Nucleic Acids Res. 9, 6787-6794. vanlanker, J. L., & Tomura, T. (1974) Biochim. Biophys. Acta 353, 99-1 14. Yamane, T., Wyluda, B. J., & Shulman, R. G. (1967) Proc. Natl. Acad. Sci. U.S.A. 58, 439-442. Yanisch-Perron, C., Vieira, J., & Messing, J. (1985) Gene 33, 103-1 19.

Amino Acid Sequence of Human Histidine-Rich Glycoproteiq Derived from the Nucleotide Sequence of Its cDNAt Takehiko Koide,t,E Donald Foster,$ Shinji Yoshitake,*J and Earl W . Davie*i$ Department of Biochemistry, University of Washington, Seattle, Washington 981 95, and Department of Biochemistry, Niigata University School of Medicine, Niigata 951, Japan Received August 28, 1985

ABSTRACT: A Xgt 1 1 library containing cDNA inserts prepared from human liver m R N A has been screened

with an affinity-purified antibody to human histidine-rich glycoprotein (HRG) and then with a restriction fragment isolated from the 5’ end of the largest c D N A insert obtained by antibody screening. A number of positive clones were identified and shown to code for H R G by D N A sequence analysis. A total of 2067 nucleotides were determined by sequencing 3 overlapping c D N A clones, which included 121 nucleotides of 5’-noncoding sequence, 54 nucleotides coding for a leader sequence of 18 amino acids, 1521 nucleotides coding for the mature protein of 507 amino acids, a stop codon of TAA, and 352 nucleotides of 3’-noncoding sequence followed by a poly(A) tail of 16 nucleotides. The length of the noncoding sequence of the 3’ end differed in several clones, but each contained a polyadenylylation or processing sequence of AATAAA followed by a poly(A) tail. More than half of the amino acid sequence of H R G consisted of five different types of internal repeats. Within the last 3 internal repeats (type V), there were 12 tandem repetitions of a 5 amino acid segment with a consensus sequence of Gly-His-His-Pro-His. This repeated portion, referred to as a “histidine-rich region”, contained 53% histidine and showed a high degree of similarity to a histidine-rich region of high molecular weight kininogen.

H i s t i d i n e - r i c h glycoprotein (HRG) has been isolated and characterized from human serum (Heimburger et al., 1972; Morgan, 1978), plasma (Lijnen et al., 1980; Koide et al., 1982, ‘This work was supported in part by National Institutes of Health Research Grant HL 16919 to E.W.D. $University of Washington. 8 Niigata University School of Medicine. Present address: Eisai Co., Ltd., Tsukuba Research Laboratory, Ibaraki 300-26, Japan. ‘8

0006-2960/86/0425-2220$01.50/0

1985), and platelets (Leung et al., 1983) as well as from rabbit serum (Morgan, 198 1). Although the physiological function of HRG has not yet been established, a number of biological properties have been reported including interaction with heparin (Heimburger et al., 1972; Koide et al., 1982; Lijnen et



Abbreviations: HRG, histidine-rich glycoprotein; HMW kininogen, high molecular weight kininogen; Tris-HCI, tris(hydroxymethy1)aminomethane hydrochloride; EDTA, ethylenediaminetetraacetic acid; kDa, kilodalton(s).

0 1986 American Chemical Society

HUMAN HISTIDINE-RICH

GLYCOPROTEIN

al., 1983a; Kindness et ai., 1984; Niwa et al., 1985), with the lysine binding site of plasmin(ogen) (Lijnen et al., 1980; Ichinose et al., 1984), and with thrombospondin (Leung et al., 1984). HRG will also cause an inhibition of autorosette formation between erythrocytes and lymphocytes (Rylatt et al., 1981; Lijnen et al., 1983b) and will bind to some divalent metals, heme and dyes (Morgan, 1978, 1981). Human HRG consists of a single polypeptide chain with an apparent molecular weight of 75 000, including 14% carbohydrate. Its N-terminal sequence is Val-Ser-Pro-Thr-Asp (Koide et al., 1982). One of the striking characteristics of HRG is its unusual high content of histidine and proline, each of which exceeds 12% of the total amino acid content of the protein (Heimburger et al., 1972; Lijnen et al., 1980; Koide et al., 1985). Preliminary studies on the primary structure of HRG have shown that the N-terminal sequence is homologous to antithrombin I11 (Koide et al., 1982). The histidine-rich region of HRG is located in the C-terminal portion of the polypeptide chain (Koide et al., 1985). Detailed information of the primary structure of H R G is essential to understand the mechanism of the diverse functions of this protein and to clarify its physiological function in vivo. In this paper, the isolation and characterization of cDNAs coding for HRG are reported which establish the complete primary structure of the protein. MATERIALS AND METHODS Materials. HRG was purified to homogeneity from human plasma as described previously (Koide et al., 1985). The purified protein migrated as a single band of apparent M , 75 000 on sodium dodecyl sulfate-polyacrylamide gel electrophoresis. Antibody to human HRG was prepared as described previously (Koide, 1979) and purified by affinity chromatography using an HRG-coupled agarose column. Screening of the Xgtl I cDNA Library. A Xgtl 1 cDNA library containing cDNA inserts prepared from human liver mRNA was kindly provided by Dr. Savio L. C. Woo. Approximately 2 X lo6 phage were screened for HRG by the antibody-screening method of Young and Davis (1983a,b). The library was then rescreened by employing a restriction fragment of 291 nucleotides isolated from the 5’ end of the largest cDNA insert. The affinity-purified antibody was labeled with lZ5Ito a specific activity of 1.5 X lo6 cpm/pg and was used to screen nitrocellulose filters containing phage plated at a density of 1.5 X lo5 plaques per 150-mm plate. The cDNA probe was labeled with 32Pby nick translation to a specific activity of 1.7 X lo8 cpm/pg. Positive clones were isolated, and each was plaque-purified. DNA Sequence Analysis. Phage DNA was prepared by the plate-lysate method (Maniatis et al., 1982), followed by banding on a cesium chloride step gradient essentially as described by Degen et al. (1983). The cDNA insert in the purified phage DNA was isolated by digestion with EcoRI and subcloned into the plasmid pUC9 (Vieira & Messing, 1982). Appropriate restriction fragments from the insert were subcloned into M13mp18 or M13mp19 for sequencing by the dideoxy chain termination method developed by Sanger et al. (1977). Sequencing reactions were carried out with deoxyadenosine ~ ’ - ( ( Y - [ ~thiotriphosphate) ~S] ( [35S]dATPaS)and run on buffer gradient gels containing 0.5-2.5X TBE buffer (0.089 M Tris-HCl/O.O89 M boric acid/2 mM EDTA, pH 8.3) (Biggin et al., 1983). Over 90% of each strand of the cDNA insert was sequenced 2 or more times. Exonuclease Ba131 was employed to generate overlapping sequences at the 3’ end of XHHRGl by using the systematic deletion method described by Poncz et al. (1982).

VOL. 25, N O . 8 , 1 9 8 6

-

2221

- a 0

0

n o

x w

5’ AHHRGI

Y I500 2000

0

500

- -

1000

3’ (bP)

L (

LHHRGl

%HHRG2

FIGURE 1: Partial restriction map and sequencing strategy for the cDNA inserts in clones XHHRG1, XHHRG2, and XHHRG3. The extent of sequencing is shown by the length of each arrow, and the direction of the arrow indicates the strand that was sequenced. The solid bars indicate the regions coding for the mature protein in each cDNA insert, and the open portion indicates the region coding for the leader sequence. Noncoding regions are shown by the lines at the end(s) of each cDNA insert.

M13mp18, M13mp19, and [ 3 5 S ] d A T P ~were S purchased from Amersham. Restriction enzymes, nuclease Ba13 1, T4 DNA ligase, bacterial alkaline phosphatase, Escherichia coli DNA polymerase I (Klenow fragment), and other enzymes were from Bethesda Research Laboratories and/or New England Biolabs. Deoxynucleotide triphosphates and dideoxynucleotide triphosphates were from P-L Biochemicals. NalZ51was from New England Nuclear. DNA sequences were stored and analyzed by the computer programs of Staden (1977, 1978). RESULTS Isolation and Characterization of cDNAs Codingfor HRG. A human liver cDNA library cloned into a Xgtl 1 phage expression vector was screened for cDNAs coding for HRG by the method of Young and Davis (1983a,b). In this method, IZ5I-labeledaffinity-purified antibody was used to detect phage directing the synthesis of a fusion protein between P-galactosidase and HRG. Twelve positive clones were isolated by screening 2 X IO6 phage, and each was plaque-purified. The cDNA inserts were isolated from 10 clones by EcoRI digestion, and the 5’ and 3’ ends of each insert were sequenced. Out of 10 inserts that were characterized, 7 contained a poly(A) tail at the 3’ end. The clones which lacked a poly(A) tail were apparently derived from a single clone by amplification of the cDNA library since they were identical in length and had the same terminal sequences. A total of 36 more positive clones were then isolated from the Xgtll library by screening 1.5 X lo6 phage with a 32Plabeled cDNA fragment. The cDNA from one phage with the largest insert (XHHRG3) obtained from the second screening and cDNA inserts obtained from XHHRGl and XHHRG2 from the first screening were each subcloned into pUC9, and their nucleotide sequence was determined by using the strategy shown in Figure 1. A total of 2067 nucleotides for the three overlapping cDNA inserts is shown in Figure 2 along with the predicted amino acid sequence. This sequence contains 121 nucleotides from the 5’-noncoding end, 54 nucleotides coding for a leader sequence of 18 amino acids, 1521 nucleotides coding for 507 amino acids present in the mature protein, a stop codon of TAA, and 352 nucleotides of 3’noncoding sequence, followed by a poly(A) tail of 16 nucleotides. The leader sequence of 18 amino acids in HRG contains a typical region rich in hydrophobic amino acids (Blobel et al., 1979). The leader sequence is removed by the cleavage of an Ala-Val peptide bond by signal peptidase. A number of stop codons precede the initiation methionine at position

2222

BIOCHEMISTRY

KOIDE ET A L . 5'

A TAT AAT ATA AAC TAA TAA AGA TTA GGA A.41 AAT TAA TGT ATA CCG TAA TGT AGA CCG ACT CAG GTA 10 20 30 40 50 60

-18 -10 -1 Met Lys Ala Leu I l e A l a Ala Leu Leu Leu l l e Thr Leu Gln Tyr Ser Cys A l a TGT AAG TAG AM ATA TGA AGG TGA ATT AGA TAA TTA AAG M;A TGG TTT AAC AM ATG AAG GCA CTC ATT GCA MA CTG CTT TTG ATC ACA TTG CAG TAT TCG TGT GCC 70 80 90 100 110 120 130 140 150 160 170 +1 10 20 7n Val Ser Pro Thr Asp Cys Ser Ala Val Giu Pro Glu A l a Glu Lys Ala Leu Asp Leu l i e Asn LYS Arg Arg A r g Asp Gly Tyr Leu Ph; Gln Leu Leu Arg I l e Ala GTG AGT CCC ACT GAC TGC AGT GCT GTT MG CCG GAG GCT GAG AAA GCT C T A GAC CTG ATC A A C MA AGG CGA CGG GAT GGC TAC CTT TTC CAA TTG CTG CGG A T T GCT 180 190 200 210 220 230 240 250 260 270 280 40 50 60 .:':. 70 Asp A l a H i s Leu Asp Arg Val Glu A m Thr Thr Val Tyr Tyr Leu Val Leu Asp Val Gln Glu Ser Asp Cys Ser V a l Leu Ser Arg Lys Tyr Trp A h Asp Cys Glu GAT GCC CAC TTG GAC AGA GTG GAA AAT ACA ACT GTA TAT TAC TTA GTC TTA GAT GTG CAP GA4 TCG GAC TGT TCG GTC C T A TCC AGG A4P TAC TGG AAT GAC TGT GAG 360 370 380 390 290 300 310 320 330 34G 350

+

80 90 100 Pro Pro Asp Ser Arg Arg Pro Ser Glu I l e V a l ! l e Gly Gln Cys Lys Val I l e A l a Thr Arg H i s Ser H i s Glu Ser Gln Asp Leu Arg V a l I l e Asp Phe Asn Cys CCA CCT GAT TCC AGA CGT CCA TCT GAA ATA GTG ATC GGA CAA TGT AAG GTA ATA GCT ACA AGA CAT TCC CAT GAA TCT CAG GAC CTC AGA GTG A T T GAC TTT AAC TGC 400 410 420 430 440 450 460 470 480 490 110 120 130 140 Thr Thr Ser Ser V a l Ser Ser Ala Leu Ala Asn Thr Lys Asp Ser Pro Val Leu I l e Asp Phe Phe Glu Asp Thr Glu Arg Tyr Arg Lys Gln A l a Asn t y s Ala Leu ACC ACA AGT TCT GTC TCT TCA GCA CTG GCC AAT ACC AAA GAT AGT CCG GTC CTC ATA GAT TTC TTT GAG GAT A C T GAG CGC T A C AGA MA CAA GCC A A C MA GCC C T T 500 510 520 530 540 550 560 570 580 590 600 150 160 170 180 Glu Lys Tyr Lys Glu Glu Asn Asp Asp Phe Ala Ser Phe Arg Val Asp Arg I l e Glu Arg Val A l a Arg Val Arg Gly Gly Glu Gly Thr Gly Tyr Phe V a l Asp Phe GAG M G TPC AAP GAG GAG AAT GAT GAC TTT GCC TCT TTC AGA G T G GAC CGA ATC MG AGA GTT GCA AGA GTG AGA GGA GGG GAA GGA ACT GGT TAC TTC GTG GAC TTC 610 620 630 640 650 660 670 680 690 700 7 10 190 200 210 Ser Val Arg Asn Cys Pro Arg H i s H i s Phe Pro Arg H i s Pro Asn Val Phe Gly Phe Cys Arg Ala Asp Leu Phe Tyr Asp V a l Glu A l a Leu Asp Leu Glu Ser Pro TCT GTG CGG A A C TGC CCC AGA CAC CAT TTC CCC AGA CAC CCC PAT GTC T T l GGA TTC TGC AGA GCA GAT T T G TTC TAT GAT GTA GAP GCC T T G GAC T T G GAA AGC CCG 720 730 740 7 50 760 770 780 790 800 810 820 220 230 240 250 Lys Asn Leu V a l I l e Asn Cys Glu Val Phe Asp Pro Gln Glu H i s Glu Asn I l e Asn Gly V a l Pro Pro H i s Leu Gly H i s Pro Phe H i s Trp Gly Gly H i s Glu Arg A M PAC C T l GTC ATA AAC TGT GAA GTC TTC GAC C C T CAG GAA CAT GAG A A C A T C AAT GGT GTA CCG CCT CAT TTG GGA CAT CCC TTC CAC TGG GGT GGG CAT GAG CGT 830 840 850 860 870 880 890 900 910 920 930 260 270 280 Ser Ser Thr Thr Lys Pro Pro Phe Lys Pro H i s Gly Ser Arg Asp H i s H i s H i s Pro H i s Lys Pro H i s Glu H i s Gly Pro Pro Pro Pro Pro Asp Glu Arg Asp His TCT TCT ACC ACC AAG CCT CCA T T C AAG CCC CAT GGA TCT AM GAT CAT CAT CAT CCC CAC AAG CCA CAC GAA CAT GGA CCC CCA CCT CCT CCA GAT GA4 AGA GAT CAC 94 0 950 960 970 980 990 1000 1010 1020 1030 290 300 310 320 Ser H i s Gly Pro Pm Leu Pro Gln Gly Pro Pro Pro Leu Leu Pro Met Ser Cys Ser Ser Cys Gln H i s Ala Thr Phe Gly Thr Asn Gly Ala Gln Arg H i s Ser H i s TCA CAT GGA CCC CCA CTT CCA C A A GGC CCT CCT CU CTA TTG CCC ATG TCC TGC TCA ACT TGT CAA CAT GCC ACT TTT GGC ACA AAT GGG GCC CAA AGA CAT TCT CAT 1040 1050 1060 1070 1080 1090 1100 1110 1120 1130 1140

A A 330 340 350 360 Asn Ayn A& Ser Ser Leu H i s P r o H i s Lys H i s H i s Ser H i s Glu Gln H i s Pro H i s Gly H i s H i s Pro H i s A l a H i s H i s pro H i s Glu H i s Asp Thr H i s Arg MT AAT AAT TCC AGT GAC CTC CAT CCC CAT AAG CAT CAT TCC CAT I;AA CAG CAT CCC CAC GGA CAC CAT CCC CAT GCA CAC CAT CCT CAT GAA C A T GAT ACC CAT AGA 1150 1160 1170 1180 1190 1200 1210 1220 1230 1240 1250

Ai,

370 380 390 Gln H i s Pro H i s Gly H i s H i s P r o H i s Gly H i s H i s Pro H i s Gly H i s His Pro H i s Gly H i s H i s Pro H i s Gly H i s H i s Pro H i s Cys H i s Asp Phe Gln Asp Tyr C A G CAT CCC C A T GGA CAC CAC CCC CAT GGA CAC CAT CCT CAT GGA CAC CAC CCC CAT GGA CAC CAT CCC C A T GGA CAC CAT CCC CAC TGC CAT GAT T T C C A A GAC T A T 1260 1270 1280 1290 1300 1310 1320 1330 1340 1350 1360 400 410 420 430 Gly Pro Cys Ai; Pro Pro Pro H i s Asn Gln Gly H i s Cys Cys H i s Gly His Gly Pro Pro Pro Gly H i s Leu A r g Arg Arg Gly Pro Gly Lys Gly Pro A r g Pro Phe GGA CCT TGT GAC CCA CCA CCC CAT AAC CAA GGT CAC TGT TGC CAT GGC CAC GGC CCA CCA C C T GGG CAC T T A AGA AGG CGA GGC CCA GGT A4P GGA C C C CGT C C C T T C 1370 1380 1390 1400 1410 1420 1430 1440 1450 1460 1470 440 450 460 H i s Cys Arg Gln I l e Gly Ser Val Tyr Arg Leu Pro P r o Leu Arg Lys Gly Glu Val Leu Pro Leu Pm Glu Ala Asn Phe Pm Ser Phe Pro Leu P r o H i s H i s Lys CAT TGC AGA CM ATT GGA TCT GTG TAC CGA CTC CCT CCT CTA At9 AAA GGT GAG GTG CTG CCA CTT CCT GAG GCC AAT TTT CCC AGC TTC CCA TTG CCG CAC CAC AAA 1480 1490 1500 1510 1520 1530 1540 1550 1560 1570 500 490 480 470 H i s Pro Leu Lys Pro Asp Asn Gln Pro Phe Pro Gln Ser Val Ser Glu Ser Cys Pro Gly Lys Phe Lys Ser Gly Phe Pro Gln V a l Ser Met Phe Phe Thr H i s Thr CAT CCT CTA A A G CCA GAC AAT CAG CCC TTT CCT CAA TCA GTC TCT GAA TCA TGT CCA GGG AAG TTC AAG AGT GGG TTT CCA C A A GTT TCC ATG T T T TTT ACA CAT ACA 1630 1640 1650 1660 1670 1680 1600 1610 1620 1580 1590 507 Phe Pro Lys STOP 0 TTT CCA m-TPn-41 GTG ATT CCT TTG AAG AGG AM ATG AAT AAT ACA TTG AAT TAG AM CAT @A TAA $41 GAC CAG TAA TTG TGA A M TTA CAG TTC TTT TCA ACC 1690 1700 1710 1720 1730 1740 1750 1760 1770 1780 1790 TAC TTT CAT ACT GA4 MT GCA GCA AM TGT wu\ T U ; 1800 1810 1820 1830

GA4 AAG AM TGG CCT GAG AAG AGA GAT CAA ATG GAA AGG A U GM AAG A A C T C A GTG CTG CCT ATT AGT AGT 1840

1850

1860

1870

1880

1890

1900

TAA TTC TGT CAC TCA CCA CTA CAT CAC TTG AGA CAA ATC TAT GCC ACT CAG RAT CTC CTT CTT TCC TGG ACT T U CTC TAP TTC TAG ACT CTC TGT TAC TGC TTG GGC 1910 1920 1930 1940 1950 1960 1970 1980 1990 2000 2010 TAT ACC 2020 TGG GCA TAC

AT:iF

0 0 . TTG AAA CTA TM AM AAA AM AM AA 2050 2060 2067

3'

FIGURE2: Nucleotide sequence of cDNA inserts in XHHRG1, -2, and -3 that code for human HRG. The predicted amino acid sequence

contains the leader sequence (amino acids -18 to -1) and the mature protein (amino acids 1-507). (+, 0 )Potential carbohydrate binding sites to asparagine residues; (e) sites of polyadenylylation in XHHRG clones (details in text). The polyadenylylation or processing sequences (AATAAA) are shown in boxes. A potential polyadenylylation or processing sequence is shown by a dashed underline.

-18. This indicates that there is not a second methionine start site upstream from position -18. The N-terminal sequence for the mature protein starts with valine at position + 1. This sequence is in excellent agreement with the sequence of Val-Ser-Pro-Thr-Asp-Cys-Ser-Ala-

a cyanogen bromide fragment (CN-50 kDa) of HRG (Koide et al., 1985). The C-terminal sequence starting with phenylalanine at position 500 is also in good agreement with the sequence of Phe-Phe-Thr-His-Thr-(Phe,Pro,Lys) determined from a small cyanogen bromide fragment of HRG (Koide et al., 1985). Two additional fragments have been isolated and Val-Glu-Pro-Glu-Ala-Glu-Lys-Ala-Leu-Asp-Leu-Ile-Asn-Lys characterized (CN-30 kDa and CN-14 kDa) and shown to determined from the intact protein (Koide et al., 1982) and

VOL. 2 5 , NO. 8, 1986

H U M A N HISTIDINE-RICH GLYCOPROTEIN

PD

2223

L I R RQY (16-38) contain amino acid sequences of Ser-Cys-Ser-Ser-Cys-GlnA L E K I K E E N D D I A L Type I (143-165) S I II~ V D QR I ~I u vL (166-188) A R v R G c His-Ala-Thr-Phe-Gly and Pro-Pro-Pro-His-Asn-Gln-GlyN E c~ T cR Y P v D I s v P n c P R B Type I1 (100-121) D L R V I D P N C T T S S V S S A L A W 1 K His-Cys-Cys-His-Gly-His-Gly-Pro-Pro-Pro-Gly-His-Leu, re(122-142) M S P I i D F I D I E R I R K Q spectively (Koide et al., 1985). These sequences correspond Type Ill (226-244) P D P Q I B E N I n - C V P P B L c B P to those starting with Ser-305 and Pro-401. 1 (245-265) U B H C C B R S S T ‘ I K m F K P i c ? Polyadenylylation in the three cDNA clones coding for HRG occurred at different locations relative to the polyadenylation or processing sequence of AATAAA (Proudfoot & Brownlee, 1976). The cDNA insert in XHHRG2 was polyadenylylated two nucleotides upstream from the site used FIGURE 3: Five types of internal repeats present in HRG. Identical in XHHRG1, while the cDNA insert in XHHRG3 was six residues in each repeat are enclosed in boxes. Gaps were inserted nucleotides upstream from that in XHHRG1. A number of for maximal homology in the amino acid sequences. The single-letter other shorter clones that also coded for HRG were studied and code for amino acids is as follows: Ala, A; Asn, N ; Asp, D; Cys, C; shown to be polyadenylylated at one of the same positions as Gln, Q; Glu, E; Gly, G; His, H; Ile, I; Leu, L; Lys, K; Met, M; Phe, F; Pro, P; Ser, S ; Thr, T; Trp, W; Tyr, Y; Val, V. those described above. One clone, however, was polyadenylylated at nucleotide number 1774. This clone contains A) Amino Acid Sequence 8 ) Nucleotide Sequence an AATAAA sequence 17 nucleotides upstream from the poly(A) tail. An additional potential polyadenylylation or 330 1163 processing sequence of AATAAA was identified, which includes the last two nucleotides of the coding region and the GlU GlnlHis Pro His1 stop codon of TAA (Figure 2). This sequence, however, was not found to direct the polyadenylylation of any of the clones isolated thus far.

DISCUSSION In the present studies, a total of 2067 nucleotides have been sequenced from the cDNA inserts obtained from clones XHHRG1, -2, and -3. The derived amino acid sequence establishes the complete primary structure of HRG, consisting of 507 amino acid residues. This corresponds well with the estimate of 5 17 total amino acids from amino acid analysis (Koide et al., 1985). The amino acid composition of HRG as derived from the nucleotide sequence of the cDNA was as follows: Asp3,, Thr,,, Ser,,, Gluz8,Gln,,, Pro,,, Gly,,, ‘/~-CYSM, Val,,, Met2, I h , Leu29, Tyrl0,Phe26, Trp2, Lys2,, His,,, and Arg,,. The molecular weight calculated for the protein free of carbohydrate was 57 646. HRG contains about 14% carbohydrate (Heimburger et al., 1972). In the amino acid sequence reported here, four potential carbohydrate attachment sites having the sequence of Asn-X-Thr/Ser were found starting with Asn-45, Asn-107, Asn-326, and Asn-327. At the present time, it is not clear whether both of the adjacent asparagine residues (326 and 327) contain carbohydrate, but the presence of carbohydrate in one of these sites has been confirmed by the presence of glucosamine in a tryptic peptide isolated from a cyanogen bromide fragment (CN-30 kDa; T. Koide et al., unpublished results). A fifth potential carbohydrate attachment site is Asn-69 present in the sequence Asn-Asp-Cys. The sequence of Asn-X-Cys has been shown to contain carbohydrate in protein C (Stenflo & Fernlund, 1982). The addition of 14% carbohydrate to HRG would increase the molecular weight of the glycoprotein to about 67 000. Most of the histidine and proline residues in HRG are found in the C-terminal region. Histidine residues are highly concentrated in a small region between residues 330 and 389 which forms the “histidine-rich” region in the protein. Proline residues, on the other hand, are more widely distributed, but many are concentrated in “proline-rich” regions prior to and following the histidine-rich region. These regions (residue5 267-419) contain very few hydrophobic and basic amino acids. A striking feature of the sequence organization of HRG is the internal repeats of 23 amino acid segments, as classified into 5 different types (types I-V, Figure 3). More than half of the amino acids in HRG are present in these five different types of internal repeats, most of which are successive and

Gly His His Pro His Gly His His Pro His Gly His His Pro His 389

Consensus Sequence

Gly His His Pro His

-n-g

GGA CAC CAT CCC C 1342

GGA CAC CAT CCC CAT

FIGURE 4: Tandem repeats in the cDNA and derived amino acid sequences of the histidine-rich region of HRG. Amino acid residues and nucleotides that are identical with the consensus sequence are enclosed in boxes.

contiguous. Types I, IV, and V are triplications, and types I1 and I11 are duplications (Figure 3). These last repeats include 12 tandem repeats of 5 amino acids. The tandem repetition becomes striking when the amino acid sequence of the histidine-rich region (residues 330-389) is rearranged in blocks, as shown in Figure 4A. It is evident that a sequence of 5 amino acids is repeated 12 times in tandem in this region. Also, the last five repeated sequences are identical, representing exact repetitions of a consensus sequence of Gly-His-HisPro-His. The nucleotide sequence in this region revealed a more strictly conserved repeat of 15 nucleotides (Figure 4B). One possibility is that this region evolved from a short stretch of 15 nucleotides (nucleotides 1328-1 342) by successive gene duplication. This could occur in the following way: an ancestral gene coding for Gly-His-His-Pro-His was initially duplicated to generate a gene coding for 10 amino acids. Duplication of the latter sequence would generate a gene coding for a repeat of 20 amino acids composed of 4 tandem repeats of the original 5 amino acids (type V, Figure 3). Duplication of the repeat of 20 amino acids would yield the second and third type V repeats. It also seems possible that the tandem repeats of Gly-His-His-Pro-His are responsible for heme and metal binding to HRG, since two imidazole rings of histidine residues in His-Pro-His will form a configuration that is ideal for metal and heme binding. Furthermore, this probability is supported by experiments with rabbit HRG (Morgan, 1985). In these experiments, a peptide which contained most of the histidine and proline residues in HRG was isolated by limited plasmin digestion of HRG and shown to have the ability to bind heme and metal ions similar to intact HRG.

~

A

~

A

2224

B IOCH EM J STRY

K O I D E ET A L .

A)

H R G (364-391) H G O H P m H P H - - G H H P - H G H - H P H G H H P H C H HHMWK (475-506) O K F K L D D D LQ[E G B V L D q K m K L l G H GK O BHMWH (475-505) H K m G - m G K H K N K G K N N G K H Y D W R T P Y L A S S R)

FIGURE 5: Comparison of the amino acid sequences of human HRG with human and bovine HMW kininogen in their histidine-richregions (A) and amino-terminal regions (B). Amino acid residues identical with those of HRG are enclosed in boxes. Gaps were inserted for maximal homology. The numbering of the amino acid residues for human and bovine HMW kininogen was taken from Takagaki et al. (1985) and Kitamura et al. (1983), respectively. HHMWK, human HMW kininogen; BHMWK, bovine HMW kininogen.

Two other plasma proteins are well known to have a histidine-rich sequence in their polypeptide chains, including high molecular weight (HMW) kininogen (Han et al., 1975; Kitamura et al., 1983; Takagaki et al., 1985) and ceruloplasmin (Kingston et al., 1979). A comparison of the histidine-rich region in HRG with the histidine-rich regions in human and bovine HMW kininogen is shown in Figure 5A. These regions show a high degree of sequence homology (about 50%). Furthermore, many of the nonidentical residues differ by just one base in their respective codons. These data suggest that the histidine-rich regions of HRG and HMW kininogen are evolutionarily and functionally related. The histidine-rich regions of HRG and HMW kininogen are also similar to each other in that their hydrophobic amino acid content is very low. The hydrophobic amino acid content of the histidine-rich region of ceruloplasmin, however, is high. There also appears to be no sequence similarity to ceruloplasmin. The histidine-rich region of HMW kininogen is known to bind to kaolin in the kaolin-mediated initiation of blood coagulation (Sugo et al., 1980). On the basis of the high sequence homology of histidine-rich regions, HRG may have some function as a modulator of the contact activation of blood coagulation mediated by the histidine-rich region. A computer search also revealed a high degree of sequence homology of HRG with another part of H M W kininogen (Figure 5B). In this comparison, 20 residues out of 46 were identical (43% homology), where the homology between human and bovine HMW kininogen is 73%. This region in HMW kininogen is also present in low molecular weight kininogen (Ohkubo et al., 1984; Takagaki et al., 1985). The primary structure of the histidine-rich protein of the malaria parasite has recently been reported (Ravetch et al., 1984). This protein contains 73% histidine and also has several different types of internal repeats. However, no sequence homology has been found between this protein and the histidine-rich region of HRG. The amino acid sequences of the proline-rich regions of HRG are compared with those of proline-rich peptides (Saitoh et al., 1983a,b) and proline-rich proteins (Azen et al., 1984; Ziemer et al., 1984) from parotid saliva in Figure 6. The homology of HRG and proline-rich peptides and proteins is as high as 49% (Figure 6, top) and 45% (Figure 6, bottom). The characteristic sequence of Gln-Gly-Pro-Pro-Pro- in the proline-rich regions of proline-rich peptides and proteins is also present in HRG as a Gln/His-Gly-Pro-Pro-Pro sequence. Circular dichroism studies on proline-rich peptides (Isemura et al., 1983; Shibata et al., 1984) and glycoprotein (Aubert et al., 1982) demonstrated the existence of poly(L-proline) form I1 conformation in their polypeptide structures due to sequences of continuous proline residues. Recently, similar

PRP-1

FIGURE 6: Comparison of the amino acid sequenGes of two proline-rich regions in HRG with those of proline-rich peptides and proteins from parotid saliva. Amino acid residues that are identical with those of the proline-rich regions in HRG are’enclosed in boxes. Gaps were inserted for maximal homology. The sequences of proline-rich peptides (abbreviated as P-D and P-F) were taken from Saitoh et al. (1983a,b, respectively) and those of proline-rich proteins (abbreviated as PRP and PRP-1) from Ziemer et al. (1984) and Azen et al. (1984), respectively.

I

1

100

200

300

400

500

Residue No.

Hydropathy index for human HRG. Analyses were made by the method of Kyte and Doolittle (1982). FIGURE 7:

observations have also been reported with HRG, suggesting the presence of sequences of continuous proline residues (Morgan, 1985). In the present studies, four Pro-Pro, three Pro-Pro-Pro, and one Pro-Pro-Pro-Pro-Pro were identified in HRG. Furthermore, the amino acid sequences of two proline-rich regions in HRG are homologous to those of the proline-rich peptides and proteins. From these observations, it appears likely that the two proline-rich regions of human HRG may also have an elongated bending and twisting conformation similar to that suggested for the proline-rich protein (Ziemer et al., 1984) and rabbit HRG (Morgan, 1985). Hydropathy analysis by the method of Kyte and Doolittle (1982) showed that HRG is hydrophilic throughout the molecule (Figure 7). The histidine-rich region (residues 330-389) is particularly hydrophilic, giving large negative values in its hydropathic index. The secondary structure of HRG predicted by the method of Garnier et al. (1978) consists of 8% helix, 14% @-sheet,46% @-turn,and 32% random coil. The C-terminal half of the molecule (residues 215-507), however, would be predicted to contain no helical structure where the histidine-rich region consists of only @-turn and random coil. In spite of the high content of histidine and glycine residues and the frequent Occurrence of a Gly-His sequence, there were no Gly-His-Lys sequences in HRG. This tripeptide, which has been detected in plasma, has been shown to promote cell growth in rat hepatoma cells (Pickart et al., 1973). Homologies of HRG with HMW kininogen and proline-rich peptides and proteins in the C-terminal portion, together with homology of HRG with antithrombin I11 in the N-terminal portion (Koide et al., 1982, 1986), suggest that multifunctional properties of HRG may be due to the organization of the polypeptide chain into multidomains. ACKNOWLEDGMENTS We thank Drs. Savio L. C. Woo and Vincent Kidd for kindly providing the Xgtl 1 cDNA library and Drs. Dominic

H U M A N HISTIDINE-RICH GLYCOPROTEIN

Chung, Steven Leytus, and Kotoku Kurachi for their advice and suggestions. We also thank Dr. Gordon Vehar for assistance in the computer search for homology, secondary structure, and hydropathy analyses. Registry No. DNA (human liver histidine-rich glycoprotein mRNA complementary), 100857-87-2;glycoprotein (human liver histidine-rich precursor protein moiety reduced), 100857-88-3; glycoprotein (human liver histidine-rich protein moiety reduced), 100857-89-4.

REFERENCES Aubert, J. P., Porchet, N., Boersma, A., Loucheux-Lefebvre, M. H., & Degand, P. (1982) Biochim. Biophys. Acta 704, 361-365. Azen, E., Lyons, K. M., McGonigal, T., Barrett, N. L., Clements, L S., Maeda, N., Vanin, E. F., Carlson, D. M., & Smithies, 0. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 5561-5565. Biggin, M. D., Gibson, T. J., & Hong, G. F. (1983) Proc. Natl. Acad. Sci. U.S.A. 80, 3963-3965. Blobel, G., Walter, P., Chang, C. N., Goldman, B. M., Erickson, A. H., & Lingappa, R. (1979) Symp. SOC.Exp. Biol. 33. 9-36. Degen, S. J. F., MacGillivray, R. T. A., & Davie, E. W. (1983) Biochemistry 22, 2087-2097. Gamier, J., Osguthorpe, D. J., & Robson, B. (1978) J . Mol. Biol. 120, 97-120. Han, Y . N., Komiya, M., Iwanaga, S., & Suzuki, T. (1975) J . Biochem. (Tokyo) 79, 55-68. Heimburger, N., Haupt, H., Kranz, T., & Baudner, S. (1972) Hoppe-Seyler’s Z. Physiol. Chem. 353, 1133-1 140. Ichinose, A., Mimuro, J., Koide, T., & Aoki, N. (1984) Thromb. Res. 33, 401-401. Isemura, T., Asakura, J., Shibata, S., Isemura, S., Saitoh, E., & Sanada, K. (1983) Int. J. Pept. Protein Res. 21, 281-287. Kindness, G., Long, W. F., Vigrow, E. A., Williamson, F. B., Morgan, W. T., & Smith, A. (1984) Thromb. Res. 33, 139-143. Kingston, I. B., Kingston, B. L., & Putnam, F. W. (1979) Proc. Natl. Acad. Sci. U.S.A. 76, 1668-1672. Kitamura, N., Takagi, Y., Furuto, S., Tanaka, T., Nawa, H., & Nakanishi, S. (1983) Nature (London) 305, 545-549. Koide, T. (1979) J . Biochem. (Tokyo) 86, 1841-1850. Koide, T., Odani, S., & Ono, T. (1982) FEBS Lett. 141, 222-224. Koide, T., Odani, S., Takahashi, K., Ono, T., & Sakuragawa, N. (1984) Proc. Natl. Acad. Sci. U.S.A. 81, 289-293. Koide, T., Odani, S., & Ono, T. (1985) J . Biochem. (Tokyo) 98, 1191-1200. Koide, T., Foster, D., & Odani, S. (1986) FEBS Lett. 194, 242-244. Kyte, J., & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132. Leung, L. L. K., Harpel, P. C., Nachman, R. L., & Rabellino, E. M. (1983) Blood 62, 1016-1021.

VOL. 2 5 , N O . 8,

1986

2225

Leung, L. L. K., Nachman, R. L., & Harpel, P. C. (1984) J. Clin. Invest. 73, 5-12. Lijnen, H. R., Hoylaerts, M., & Collen, D. (1980) J . Biol. Chem. 255, 10214-10222. Lijnen, H. R., Hoylaerts, M., & Collen, D. (1983a) J . Biol. Chem. 258, 3803-3808. Lijnen, H. R., Rylatt, D. B., & Collen, D. (1983b) Biochim. Biophys. Acta 742, 109-1 15. Maniatis, T., Fritsch, E. F., & Sambrook, J. (1982) in Molecular Cloning, A Laboratory Manual, pp 371-372, Cold Spring Harbor Laboratories, Cold Spring Harbor, NY. Morgan, W. T. (1978) Biochim. Biophys. Acta 533, 319-333. Morgan, W. T. (1981) Biochemistry 20, 1054-1061. Morgan, W. T. (1985) Biochemistry 24, 1496-1501. Niwa, M., Yamagishi, R., Kondo, S., Sakuragawa, N., & Koide, T. (1985) Thromb. Res. 37, 237-240. Ohkubo, I., Kurachi, K., Takasawa, T., Shiokawa, H., & Sasaki, M. (1984) Biochemistry 23, 5691-5697. Pickart, L., Thaler, L., & Thaler, M. M. (1973) Biochem. Biophys. Res. Commun. 54, 562-566. Poncz, M., Solowiejczyk, D., Ballantine, M., Schwartz, E., & Surry, S . (1982) Proc. Natl. Acad. Sci. U.S.A. 79, 4298-4302. Proudfoot, N. J., & Brownlee, G. G. ( I 976) Nature (London) 263, 21 1-214. Ravetch, J. V., Feder, R., Pavlovec, M., & Blobel, G. (1984) Nature (London) 312, 616-620. Rylatt, D. B., Sia, D. Y., Mundy, J. P., & Parish, C. R. (198 1) Eur. J . Biochem. 119, 641-646. Saitoh, E., Isemura, S., & Sanada, K. (1983a) J . Biochem. (Tokyo) 93, 495-502. Saitoh, E., Isemura, S., & Sanada, K. (1983b) J . Biochem. (To~,vo)93, 883-888. Sanger, F., Nicklen, S., & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U.S.A. 74, 5463-5467. Shibata, S., Asakura, J., Isemura, T., Isemura, S., Saitoh, E., & Sanada, K. (1984) Int. J. Pept. Protein Res. 23, 158-165. Staden, R. (1977) Nucleic Acids Res. 4, 4037-4051. Staden, R. (1978) Nucleic Acids Res. 5, 1013-1015. Stenflo, J., & Fernlund, P. (1982) J . Biol. Chem. 257, 12180-1 2190. Sugo, T., Ikari, N., Kato, H., Iwanaga, S., & Fujii, S. (1980) Biochemistry 19, 321 5-3220. Takagaki, Y., Kitamura, N., & Nakanishi, S. (1985) J . Biol. Chem. 260, 8601-8609. Vieira, J., & Messing, J. (1982) Gene 19, 259-268. Young, R. A., & Davis, R. W. (1983a) Proc. Natl. Acad. Sci. U.S.A.80, 1194-1 198. Young, R. A., & Davis, R. W. (1983b) Science (Washington, D.C.) 222, 778-782. Ziemer, M. A., Swain, W. F., Rutter, W. J., Clements, S., Ann, D. K., & Carlson, D. M. (1984) J . Biol. Chem. 259, 10475-1 0480.