Characterization of cDNAs of the human pregnancy-specific. beta. 1

Biochemistry 1990, 29, 2845-2852

Characterization of cDNAs of the Human Pregnancy-Specific β1-Glycoprotein Family, a New Subfamily of the Immunoglobulin Gene Superfamily†,‡

Qiao-Xi Zheng, Le Ann Tease, W. Lesley Shupert, and Wai-Yee Chan*

Departments of Pediatrics, Biochemistry, Anatomy, and Cell Biology, Georgetown University Medical Center, 3800 Reservoir Road, NW, Washington, D.C. 20007

Received August 11, 1989; Revised Manuscript Received November 2, 1989

ABSTRACT: Three highly homologous cDNAs encoding human pregnancy-specific β1-glycoprotein (SP1) were isolated from a human placental cDNA library. These cDNAs share >90% nucleotide homology in their coding sequences, and >79% of the encoded amino acids are homologous. Proteins encoded by these cDNAs are very similar to members of the carcinoembryonic antigen family and contain repeating domains, conserved disulfide bridges, and β-sheet structure typical of the immunoglobulin gene superfamily. However, the high degree of sequence homology and relatively lesser degree of glycosylation among the SP1 proteins suggest that they exist as a unique family instead of being members of the CEA family. Both soluble and potentially membrane-bound forms of SP1 proteins were present in the placenta. Northern blot analysis using specific probes confirmed the expression of multiple mRNA species in human term placenta.

†Supported in part by NIH Grant HD 21793 and by the Oklahoma Medical Research Foundation.
‡The nucleotide sequences reported for hPS12 and hPS2 have been submitted to GenBank under Accession Numbers 502892 and 502893.
*Author to whom correspondence should be addressed at the Department of Pediatrics.

0006-2960/90/0429-2845$02.50/0 © 1990 American Chemical Society

Pregnancy-specific β1-glycoprotein (SP1) is an early pregnancy protein that can be detected in maternal serum as early as 16 days after conception (Grudzinskas et al., 1977). The maternal serum level of SP1 increases with progression of gestation and has been shown to be a good index for monitoring fetal growth and pregnancy complications (Bischof, 1984). SP1 is also elevated in serum of patients with tumors of trophoblastic origin as well as some nontrophoblastic tumors (Sorensen, 1982). The physiological function of SP1 is not known. Like many of the early pregnancy proteins, it has been suggested to be an immunosuppressive agent enhancing survival of the fetus (Cerni et al., 1977).

Human placental SP1 as first reported (Bohn, 1972) is a single-subunit glycoprotein with a molecular mass of 90 kDa, 29% of which is carbohydrate. This protein, besides being shown to be synthesized by the syncytiotrophoblasts of the placenta (Horne et al., 1976), was recently shown also to be produced in extraplacental sites including intestine, testis, and uterus (Chan et al., 1988a-c). Further studies indicate that SP1 as it was initially defined might be composed of a group of highly homologous proteins (Chan et al., 1988a). These different species of SP1 proteins are the products of a family of genes. Several species of

placental SP1 cDNAs have been cloned (Chan et al., 1988a; Rooney et al., 1988; Streydio et al., 1988; Watanabe & Chou, 1988a). Computer analysis of the cloned SP1 cDNAs revealed that they are very similar to carcinoembryonic antigen (CEA), a subfamily of the immunoglobulin (Ig) gene superfamily (Beauchemin et al., 1987; Oikawa et al., 1987; Zimmermann et al., 1987; Chan et al., 1988a; Rooney et al., 1988; Streydio et al., 1988; Watanabe & Chou, 1988b). All the cDNAs reported encode proteins that are highly hydrophilic. However, in vitro translation studies suggested that SP1 protein might be amphiphilic (Hau et al., 1983). The present report describes the structure of two hitherto unreported human placental SP1 cDNAs, one of which has a sequence encoding a hydrophobic C-terminus, making it a very likely candidate for the membrane-bound form of SP1. Detailed analysis of the three placental SP1 cDNAs cloned in our laboratory and the other SP1 cDNAs reported was performed. The results support the hypothesis that the placental SP1 proteins are products of a multigene family and constitute a new subfamily of the Ig gene superfamily (Williams & Barclay, 1988).

MATERIALS AND METHODS

Screening of cDNA Library. A human placental cDNA expression library in λgt11, generously provided by Dr. Brian Knoll of M. D. Anderson Hospital and Tumor Institute, was screened with ¹²⁵I-labeled anti-human SP1 antibody (Calbiochem, San Diego, CA) as described previously (Chan et al., 1988c). Two million phages were screened. The positive plaques picked were rescreened until purified. In order to obtain longer clones, another human placental cDNA library in λgt11, purchased from Clontech Laboratories, Palo Alto, CA, was rescreened with confirmed partial SP1 cDNAs.

Preparation of DNA for Subcloning and Sequencing. Recombinant phage DNA was prepared by plate lysis followed by banding on a cesium chloride step gradient as described previously (Chan & Qiu, 1988). The cDNA insert was released from the phage vector by partial digestion with EcoRI and separated in a 1% SeaPlaque agarose gel (FMC BioProducts, Rockland, ME) by electrophoresis in 1× TAE buffer (40 mM Tris, 20 mM sodium acetate, 2 mM EDTA, pH 8.0) (Weislander, 1979). The cDNA insert was excised and the gel melted by incubation at 65 °C for 10 min (Dumais & Nochumson, 1987). For hPS12, the cDNA insert was subcloned into M13mp18 and M13mp19 vectors for single-strand sequencing (Messing, 1983) and into pBluescript for double-strand sequencing (Stratagene, 1988). For hPS2, the total cDNA insert was recovered, and a number of clones containing overlapping cDNA fragments were generated by Bal-31 deletion subcloning.

Bal-31 Deletion Subcloning. The hPS2 cDNA insert was blunt-ended by filling in with dNTPs by the action of DNA polymerase I in melted agarose at 37 °C (Dumais & Nochumson, 1987). The blunt-ended DNA was ligated into pUC13, and the recombinant plasmid DNA was prepared for Bal-31 deletion subcloning as previously described (Maniatis et al., 1982). The recombinant plasmid was linearized by digestion with either PstI or SstI. Linearized plasmid DNA was subjected to sequential digestion with Bal-31 nuclease (Yoshitake et al., 1985). Sequentially deleted DNA fragments were first blunt-ended and then digested with either SstI or PstI. Deleted cDNA fragments were separated on 1.3% SeaPlaque agarose gels (Weislander, 1979) and ligated into M13mp18 or M13mp19 vectors for single-strand sequencing (Messing, 1983).

DNA Sequence Determination. The nucleotide sequence of the cDNAs was determined by a modified dideoxy chain

termination method using Klenow fragment (Boehringer Mannheim Biochemicals, Indianapolis, IN) at 50 °C or Sequenase (U.S. Biochemical Corp., Cleveland, OH) at 37 °C (Johnston-Dow et al., 1987). M13 universal sequencing primer and T3 and T7 sequencing primers as well as synthetic oligonucleotide primers were used to prime the sequencing reaction in single-strand sequencing (Chan et al., 1989) or double-strand sequencing with the pBluescript system (Stratagene, 1988). All sequences were determined three or more times as well as from different subclones and from both strands.

DNA Sequence Analysis. DNA sequences were analyzed by the Sequence Analysis Software Package of the Genetics Computer Group at the University of Wisconsin (Devereux et al., 1984). SP1 cDNAs reported in this work were compared with those reported previously, including hPS11 (Chan et al., 1988a), PSG16 and PSG93 (Watanabe & Chou, 1988a), PSBGC, PSBGD, and PSBGE (Streydio et al., 1988), and pSP1-i (Rooney et al., 1988).

Northern Blot Analysis. Total RNA was extracted from human term placenta in the presence of a 4 M guanidine monothiocyanate-phenol-chloroform mixture (Chomczynski & Sacchi, 1987). Fifty micrograms of total placental RNA was denatured with formaldehyde, separated in an agarose/formaldehyde gel, and analyzed by Northern blotting with labeled SP1 cDNA probes (Chan et al., 1988c; Fourney et al., 1988).

RESULTS

Screening of Human Placental cDNA Library. Six positive clones were initially obtained by immunological screening of a human placental cDNA library (Chan et al., 1988c). These clones were confirmed to be partial cDNAs encoding human placental SP1 (Chan et al., 1988a). One of these cloned cDNAs was used as a probe to rescreen another human placental cDNA library. A total of 26 clones were obtained. The nucleotide sequence of one of these clones (hPS11) has been reported (Chan et al., 1988a). The nucleotide sequences of two of the longer clones, hPS12 and hPS2, were determined. Both represented partial cDNAs with an incomplete 5' coding sequence. To obtain more 5' sequence, the most 5' EcoRI-BamHI fragments of hPS12 and hPS2 were used as probes to rescreen the same cDNA library. Two clones, hPS89 and hPS90, with more 5' sequence than hPS12 were identified, while no clone having more 5' sequence than hPS2 was found.

Sequence Analysis of hPS12 and hPS2. The composite sequence of hPS12 is shown in Figure 1. The cDNA had 1573 bp with a 5' noncoding sequence of 45 bp, an open reading frame of 1272 bp encoding 424 amino acids with a calculated molecular mass of 47.5 kDa, a stop codon of TAA, and a 3' noncoding sequence of 253 bp. Even though no upstream stop codon could be identified, the sequence around the presumed translation initiation codon, ACCATGG, agreed with the consensus sequence for initiation of translation in vertebrates suggested by Kozak (1987). No poly(A) tail was found in any of the hPS12 clones sequenced. Eight potential glycosylation sites, two of the form Asn-X-Ser and six of the form Asn-X-Thr, were present. The encoded protein contained an N-terminal domain, two n-subdomains and one c-subdomain each containing two conserved Cys residues, and a C-terminal domain, all characteristic of SP1 proteins reported previously (Chan et al., 1988a; Rooney et al., 1988; Streydio et al., 1988; Watanabe & Chou, 1988a). Figure 2 shows the nucleotide sequence of hPS2. It had 1744 bp with an open reading frame of 1053 bp encoding 351 amino acids, a stop codon TGA, and a 3' noncoding sequence

FIGURE 1: Nucleotide and predicted amino acid sequence of hPS12. Nucleotide numbers are indicated. Potential glycosylation sites are underlined. Open inverted triangles indicate conserved Cys residues. An asterisk indicates a stop codon. Boundaries of the different domains are indicated as follows: N, N-terminal domain; R1n, n-subdomain of repeat unit 1; R2n, n-subdomain of repeat unit 2; R2c, c-subdomain of repeat unit 2; C, C-terminal domain.

FIGURE 2: Nucleotide and predicted amino acid sequence of hPS2. Nucleotide numbers are indicated. Potential glycosylation sites are underlined. Open inverted triangles indicate conserved Cys residues. Solid squares indicate the polyadenylation site. An asterisk indicates a stop codon. Boundaries of the different domains are indicated as follows: N, N-terminal domain; Rn, n-subdomain of repeat unit; Rc, c-subdomain of repeat unit; C, C-terminal domain.
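The sequence features reported for hPS12 and hPS2 (potential glycosylation sites of the form Asn-X-Ser or Asn-X-Thr, and the segment lengths that make up each cDNA) can be checked with a few lines of code. The following is a minimal sketch rather than the procedure used in this work: the function names and the toy peptide are hypothetical, X is taken to be any residue as the text states, and for hPS2 it is assumed that there is no 5' noncoding sequence and that the 29-bp poly(A) tract is counted separately from the 659-bp 3' noncoding sequence.

```python
import re

# Asn-X-Ser / Asn-X-Thr, written as a zero-width lookahead so that
# overlapping sites (e.g. "NNSS") are each reported.
# Assumption: X may be any residue, as stated in the text.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein):
    """Return (1-based position of Asn, tripeptide) for each Asn-X-Ser/Thr site."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


def cdna_length(five_prime_nc, orf, three_prime_nc, poly_a=0, stop_codon=3):
    """Total cDNA length in bp from its component segments."""
    return five_prime_nc + orf + stop_codon + three_prime_nc + poly_a


# Hypothetical toy peptide, NOT taken from hPS12 or hPS2.
toy_peptide = "MKNLTGYNQSANPT"
print(find_sequons(toy_peptide))  # [(3, 'NLT'), (8, 'NQS'), (12, 'NPT')]

# Segment lengths quoted in the Results.
print(cdna_length(45, 1272, 253))            # hPS12: 1573
print(cdna_length(0, 1053, 659, poly_a=29))  # hPS2 (see assumptions): 1744
print(1272 // 3, 1053 // 3)                  # encoded residues: 424 351
```

Under these assumptions the segment lengths add up to the reported totals of 1573 bp and 1744 bp, and the open reading frames correspond to 424 and 351 codons.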

of 659 bp. A polyadenylation signal, ATTAAA, 14 bp upstream from a 29-bp poly(A) tail could be identified. The polypeptide encoded by hPS2 contained part of the N-terminal domain, an n-subdomain, a c-subdomain similar to that observed in hPS12, and an 81 amino acid C-terminal domain. In addition to the conserved Cys residues in the n- and c-subdomains, two Cys residues separated by 27 amino acids were present in the C-terminal domain. Five potential glycosylation sites could also be identified.

The domain structures of hPS12, hPS2, and hPS11 are compared in Figure 3. hPS11 and hPS12 had the same domain structure. The most significant differences between hPS2 and the other two cDNAs cloned from the same placental library were the presence of only one complete repeating unit (i.e., one n-subdomain and one c-subdomain) and an 81 amino acid hydrophobic C-terminal domain. In spite of these differences, the Cys residues as well as the potential glycosylation sites were conserved in these three cDNAs.

Prediction of secondary structure with programs available in the Sequence Analysis Software Package of the Genetics Computer Group (Kyte & Doolittle, 1982) showed that both hPS12 and hPS2 were mainly in the form of β-sheets (results not shown). The PEPPLOT program (Goldman et al., 1986) and the PLOTSTRUCTURE program (Chou & Fasman, 1978) both indicated that hPS12 was largely hydrophilic except for the N-terminal 120 amino acids. The N-terminal 34-35 amino acids had the characteristics of a putative signal peptide as observed for hPS11 (Chan et al., 1988a). The N-terminal domain of hPS2, similar to that of hPS12, was also hydrophobic. However, in contrast to any of the SP1 sequences reported, the C-terminal domain of hPS2 was highly hydrophobic (results not shown).

FIGURE 3: Comparison of domain structure of human placental pregnancy-specific β1-glycoprotein cDNAs hPS12, hPS11, and hPS2. Key: potential glycosylation site; disulphide bond; signal peptide; N-terminal domain; n-subdomain of repeating unit; c-subdomain of repeating unit; hydrophobic C-terminal domain; hydrophilic C-terminal domain. Scale bar: 100 base pairs.

Northern Blot Analysis of Human Placental RNA. Figure 4 shows results of Northern blot analysis of human placental RNA probed with a specific fragment of each cDNA, namely, the 3' 620-bp EcoRI-EcoRI fragment (the entire 3' noncoding sequence) of hPS11, the 3' NcoI-EcoRI fragment (the C-terminal 24 bp of the coding sequence and the 3' noncoding sequence) of hPS12, and the 280-bp BamHI-EcoRI fragment (sequence encoding the C-terminal domain) of hPS2. Both the hPS11- and the hPS2-specific probes hybridized with a 2.25-kb mRNA band, while the hPS12-specific probe hybridized with a 1.65-kb mRNA band. In addition to the 2.25-kb mRNA band, the hPS2-specific probe also hybridized with the 1.65-kb mRNA band. Even though the ratio of the specific radioactivity of the three probes used to hybridize to the blot was roughly 7:4:3 (hPS2:hPS12:hPS11), the intensity of the hybridization signal was hPS11 > hPS12 > hPS2.

Comparison of Members of the SP1 Family. A comparison of the encoded amino acid sequences of all SP1 cDNAs whose nucleotide sequences had been reported and their consensus sequences is shown in Figure 5. An amino acid or nucleotide was considered to be consensus if it occurred at least three

FIGURE 4: Human placental Northern blot probed with cDNA specific probes. Mobilities of size standards (in kb) are indicated. The specific probe used for each lane is shown.

times among the clones hPS12, hPS2, PSBGE, pSP1-i, and hPS11/PSG16/PSBGC-D. The percent homology with the consensus amino acid sequence ranged from 92.5% to 100% in the N-terminal domains and the n-subdomains except for the N-terminal domain of hPS2, which showed only 85.9% homology. Levels of homology among the c-subdomains were slightly lower than those of the other domains and ranged from 85.9% to 97.6%. Nucleotide sequence homology among the N-terminal domains and n- and c-subdomains was similar to that among the amino acid sequences and ranged from 90% to 99%. On the other hand, the amino acid and nucleotide sequences of the C-terminal domains of these cDNAs shared very little sequence homology, with two exceptions. First, the C-terminal domains of hPS11 and PSBGD were identical, and second, the 3' noncoding sequence of PSG16 was identical with that of hPS11 except for the deletion of 86 bp near the 3' end of the coding region (Chan et al., 1988a).

The positions of the cysteine residues in the n- and c-subdomains of all cDNAs were conserved. The positions of the potential glycosylation sites in the N-terminal domain and the R1n-subdomain were also conserved. Figure 5A shows that hPS12 differed from the other SP1 cDNAs in the deletion of three nucleotides encoding Ile-90 in the N-terminal domain. It was also apparent from Figure 5A that, albeit very similar, hPS12 and pSP1-i were different from each other and from the other SP1 cDNAs. Both hPS2 and PSBGE had one fewer n-subdomain than the other SP1 cDNAs. The percent homology with the R1n-subdomain consensus sequence was 93.5 and 92.5 and that with the R2n-subdomain consensus sequence was 52.7 and 48.4 for hPS2 and PSBGE, respectively. The c-subdomains of hPS2 and PSBGE were less homologous to the consensus sequence than those of the other SP1 cDNAs, being 87.1% and 85.9%, respectively, while all other cDNAs showed >90% homology. The most significant difference between hPS2 and the other SP1s was the presence of the 81 amino acid hydrophobic C-terminus, while all other SP1s had only relatively short (14 amino acids or fewer) hydrophilic or amphiphilic C-termini.

Comparison of the 5' noncoding sequences showed that there was over 91% similarity among all SP1 cDNAs. The 3' noncoding sequences were more varied. The 3' noncoding sequence of hPS2 had very little homology with any of the SP1 cDNAs reported. On the other hand, the 3' noncoding sequence of hPS12 was very homologous to those of PSBGE and pSP1-i. Comparison of these three sequences using the LINEUP program (Devereux et al., 1984) showed that aside from 27, 30, and 39 bp at the 5' end of this region in

FIGURE 5: Comparison of (A) aligned amino acid sequences encoded by human placental SP1 cDNAs and (B) consensus amino acid sequences of SP1 cDNAs with those encoded by human CEA and NCA cDNAs. Identical amino acids are indicated by dots. Potential glycosylation sites are underlined. Solid inverted triangles indicate conserved Cys residues. Amino acids are numbered with reference to the beginning of each domain or subdomain. The Rn-subdomain of hPS2 and PSBGE was repeated under both R1n- and R2n-subdomains for the purpose of comparison. SP1 NCon, consensus sequence of the N-terminal domain of SP1 cDNAs; SP1R1nCon, consensus sequence of the R1n-subdomain of SP1 cDNAs; SP1R2nCon, consensus sequence of the R2n-subdomain of SP1 cDNAs; SP1R2cCon, consensus sequence of the R2c-subdomain of SP1 cDNAs. Domain notation of CEA and NCA is the same as described (Beauchemin et al., 1987; Tawaragi et al., 1988).
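The consensus rule used for Figure 5 (a residue is consensus if it occurs at least three times among the five clone groups) and the percent homology to that consensus can be illustrated in code. The sketch below is an assumption-laden illustration, not the GCG procedure used in this work: the sequences are hypothetical toy data already aligned to equal length, positions without a consensus are written as "X", and percent homology is taken here to mean percent identity over positions that have a consensus residue, which the text does not specify.

```python
from collections import Counter


def consensus(aligned, min_count=3, no_consensus="X", gap="-"):
    """Column-wise consensus: call a residue only if it occurs >= min_count times."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    called = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
        else:
            residue, n = no_consensus, 0
        called.append(residue if n >= min_count else no_consensus)
    return "".join(called)


def percent_identity(seq, cons, no_consensus="X"):
    """Percent of consensus-bearing positions where seq matches the consensus."""
    pairs = [(a, b) for a, b in zip(seq, cons) if b != no_consensus]
    if not pairs:
        raise ValueError("consensus has no called positions")
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical aligned toy sequences standing in for the five clone groups.
toy_alignment = ["MGPLSA", "MGPLSA", "MGALSC", "MEPLTC", "MGPFTD"]
cons = consensus(toy_alignment)
print(cons)                               # MGPLSX
print(percent_identity("MEPLTC", cons))   # 60.0
```

With five sequences and a threshold of three, at most one residue per column can qualify, so ties between candidate consensus residues cannot arise.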

2850 Biochemistry, Vol. 29, No. 11, 1990 hPS12, PSBGE, and pSP1-i, respectively, there was 95% homology among the three cDNAs. The 3‘ noncoding sequences of hPS 1 1, PSGl6, and PSBGD were almost identical except for the deletion of 70 bp after the stop codon in PSG16 and one mismatch in both PSG16 and PSBGD when compared with hPSl1. Comparison with Carcinoembryonic Antigen ( C E A ) and Nonspecific Cross-Reacting Antigen (NCA),Members of the Ig Gene Superfamily. Among the placental SPl cDNAs, hPS2 and PSBGE had the domain structure closest to that of CEA and NCA. Both hPS2 and PSBGE, similar to NCA, had only one complete repeat unit, Le., one n-subdomain and one c-subdomain (Neumaier et al., 1988; Tawaragi et al., 1988), while CEA had three complete repeating units (Beauchemin et al., 1987; Oikawa et al., 1987; Zimmermann et al., 1987). All other SP1 cDNAs had one complete repeating unit and an additional n-subdomain. Figure 5B shows the comparison of the consensus amino acid sequences of the different domains of the SPl cDNAs with the corresponding domains in CEA and NCA. The percent homology of both CEA and NCA to SP1 was quite similar in all domains compared, ranging from 53.8 to 62.4 at the amino acid level. The amino acid sequences of the C-terminal domains were very variable among the SPl cDNAs and were not compared with those of CEA or NCA. Nucleotide sequence comparison of the different domains of SPl, CEA, and NCA showed results similar to those of amino acid comparison. DISCUSSION All the SP1 cDNAs reported demonstrate characteristics required for inclusion in the Ig gene superfamily, namely, sequence homology, characteristic domain structure, conserved disulfide bonds within domains, and @-sheetstructure (Williams, 1987). The SP1 proteins therefore constitute a subfamily of the Ig gene superfamily. There is increasing evidence suggesting the presence of multiple species of highly homologous SPl proteins in human placenta. 
It has also been shown previously that the SP1 proteins are encoded by multiple genes (Chan & Qiu, 1988). Even though a recent report suggested the possibility that different SP1 proteins might be the differentially spliced products of one gene (Oikawa et al., 1988), results of these studies support the contention that SP1 protein in human placenta consists of products of more than one gene (Chan et al., 1988a; Rooney et al., 1988; Streydio et al., 1988). Two of the reported placental SP1 cDNAs, hPS11 and PSBGD, are identical. PSG16 and another partially sequenced cDNA, PSG93, differ from these two cDNAs at only four bases, which could correspond to polymorphisms (Chan et al., 1988a; Streydio et al., 1988; Watanabe & Chou, 1988a). On the basis of their almost identical protein coding sequences, hPS11, PSG16, PSG93, PSBGC, and PSBGD are likely to be the products of the same gene with differentially spliced exons encoding the C-terminus and 3' noncoding sequence. This conclusion is in line with that derived from studies of a cloned fragment of human SP1 gene (Oikawa et al., 1988) and CEA gene (Barnett et al., 1989).

Three other cDNAs, hPS12, PSBGE, and pSP1-i, have 221 bp in their 3' noncoding sequences that are highly homologous (95%). Considering that this is observed in the noncoding region and that the three clones were derived from two different libraries, it is conceivable that the few differences were individual polymorphisms. These three cDNAs are products of splicing different amino acid coding exons to one common exon that contains the 3' noncoding sequence. The exon encoding the N-terminal domain of hPS12 is unique in that it has a 3-bp deletion when compared to the other cDNAs. PSBGE differs from hPS12 and pSP1-i by having one less n-subdomain, which could be the result of a difference in splicing. The gene encoding hPS12/PSBGE/pSP1-i could be different from that encoding hPS11/PSG16/PSG93/PSBGC-D. The other SP1 cDNA, hPS2, is unique. It has no significant homology with any of the SP1 cDNAs reported. Thus it might be the product of a third SP1 gene.

The presence of at least three different species of mRNA in human placenta was confirmed by the Northern blot analysis. The difference in hybridization signal intensity suggested that hPS2 mRNAs were present in lesser quantity than either hPS12 mRNA or hPS11 mRNA. It is interesting to notice that the hPS2-specific probe hybridized to both the 1.65-kb and 2.25-kb mRNA bands. These results suggest that each mRNA band contains more than one species of mRNA. Up to the present time only one placental SP1 cDNA with a sequence encoding a hydrophobic C-terminus has been found. The Northern blot results thus also suggest that there might be more than one species of membrane-bound SP1 present in human placenta. Even though the size of pSP1-i cDNA reported is 2.016 kb (Rooney et al., 1988), the hPS12-specific probe, which is supposed to be also capable of hybridizing with pSP1-i, hybridized only to an mRNA band of 1.65 kb. The cause of this discrepancy is not clear from this study. Nonetheless, the presence of multiple species of SP1 mRNA in human term placenta is demonstrated by Northern analysis. Multiple mRNAs have also been reported for the CEA gene family (Thompson & Zimmermann, 1988). Thus the SP1 genes, like the CEA family of genes, are probably localized in clusters (Thompson & Zimmermann, 1988). Naturally, there is also the possibility that all SP1 cDNAs so far reported were derived from one very large gene that gives rise to the above three groups of products by differential splicing.
Both hPS2 and PSBGE, unlike the other SP1 cDNAs, contain only one n-subdomain. These cDNAs could be formed by having one complete repeating unit containing an n-subdomain and a c-subdomain (i.e., the n-subdomain in hPS2 is an R2n-subdomain) or by splicing the n-subdomain of one repeating unit with the c-subdomain of another repeating unit (i.e., the n-subdomain in hPS2 is an R1n-subdomain). By comparison to the consensus sequences of R1n- and R2n-subdomains of the SP1 cDNAs, it is apparent from Figure 5 that the n-subdomain of hPS2 was probably derived from the R1 repeating unit instead of the R2 repeating unit. A similar conclusion was drawn for PSBGE (Streydio et al., 1988).

Figure 3 shows one distinctive feature of hPS2: the presence of an 81 amino acid hydrophobic C-terminus. This feature of hPS2 is very similar to that observed in CEA (Beauchemin et al., 1987; Oikawa et al., 1987; Zimmermann et al., 1987), TM-CEA (Barnett et al., 1989), and NCA (Tawaragi et al., 1988). Hydropathy plot analysis showed that this C-terminal domain of hPS2 is very hydrophobic and supports the inference that it represents the membrane-anchor region of the molecule. Thus it appears that there are two types of SP1 proteins in human placenta, namely, the cytoplasmic or soluble SP1 and the membrane-bound SP1. Analogous phenomena have been reported for the CEA family of proteins (Thompson & Zimmermann, 1988; Barnett et al., 1989).

Even though the SP1 proteins and CEAs appear to share many common properties, there are sufficient features unique to the SP1 proteins to qualify them as a separate subfamily of the Ig gene superfamily instead of being members of the CEA gene subfamily. Even though the SP1s are very similar to CEA and NCA, the homology among the different SP1 family members is even higher, being consistently >90% at the nucleotide level and >85% at the amino acid level. The SP1 proteins are also less glycosylated. The number of potential glycosylation sites ranges from four in SP1-i to eight in PSBGC with the majority having six or seven. Most of these sites are also conserved. On the other hand, both CEA and NCA are more heavily glycosylated, with 27 potential sites in CEA (Beauchemin et al., 1987; Oikawa et al., 1987; Zimmermann et al., 1987) and 12 potential sites in NCA (Tawaragi et al., 1988; Neumaier et al., 1988). A number of these sites are conserved between CEA and NCA but not between CEAs and SP1s, as shown in Figure 5B.

The strong conservation of both nucleotides and amino acids among the internal repeats of SP1 and CEA genes suggests that both gene families evolved recently by the duplication of a primordial gene. It is interesting to notice that the percent homology at the amino acid level between the R1n-subdomain and R2n-subdomain of SP1 is 52.7, which is significantly lower than that between the R1n-subdomain of SP1 and the n-subdomains of CEA (62.4 with IA, 57.0 with IIA, and 59.1 with IIIA) and NCA (62.4). On the other hand, it is comparable with the percent similarity between R2n-subdomain and the n-subdomains of CEA and NCA. Comparison of nucleotide sequences gives the same result. This seems to suggest that duplication of the R1 repeat unit to the R2 repeat unit occurred before the divergence of SP1 and CEA and that the divergence of the two genes involved duplication of the R1 repeat unit only.

Similarity in structure to the Ig gene superfamily suggests potential immunological function of the SP1 proteins. Preliminary studies using purified human placental SP1 proteins show that these proteins inhibit stimulated lymphocytes in a mixed lymphocyte assay but not those stimulated by phytohemagglutinin (L. Thompson and W. Y.
Chan, unpublished). The specific species of SP1 protein(s) that has this immunosuppressive function, however, is not clear from these studies. Previous studies have indicated that SP1 might have a growth-promoting property (Chan et al., 1988b). This property of the SP1 proteins has been demonstrated by using murine megakaryocyte cultures (S. A. Burnstein and W. Y. Chan, unpublished). The presence of multiple species as well as soluble and membrane-bound members in the SP1 family might indicate different functions for different members. Cloning of the cDNA of the different species of SP1 will allow testing of the properties of each individual protein after expression of the specific cDNA with appropriate systems.

ACKNOWLEDGMENTS

We acknowledge the generosity of Dr. Brian Knoll of the Department of Pathology, M. D. Anderson Hospital and Tumor Institute, for use of the placental cDNA library.

REFERENCES

Barnett, T. R., Kretschmer, A., Austen, D. A., Goebel, S. J., Hart, J. T., Elting, J. J., & Kamarck, M. E. (1989) J. Cell Biol. 108, 267-276.
Beauchemin, N., Benchimol, S., Cournoyer, D., Fuks, A., & Stanners, C. P. (1987) Mol. Cell. Biol. 7, 3221-3230.
Bischof, P. (1984) Contrib. Gynecol. Obstet. 12, 6-92.
Bohn, H. (1972) Blut 24, 292-302.
Cerni, C., Tatra, G., & Bohn, H. (1977) Arch. Gynaekol. 223, 1-7.
Chan, W. Y., & Qiu, W. R. (1988) Am. J. Hum. Genet. 43, 152-159.

Chan, W. Y., Borjigin, J., Zheng, Q. X., & Shupert, W. L. (1988a) DNA 7, 545-555.
Chan, W. Y., Tease, L. A., Bates, J. M., Jr., Borjigin, J., & Shupert, W. L. (1988b) Hum. Reprod. 3, 687-692.
Chan, W. Y., Tease, L. A., Borjigin, J., Chan, P. K., Rennert, O. M., Srinivasan, B., Shupert, W. L., & Cook, R. G. (1988c) Hum. Reprod. 3, 677-685.
Chan, W. Y., Lin, Q., Busch, H., Rennert, O. M., Tease, L. A., & Chan, P. K. (1989) Biochemistry 28, 1033-1039.
Chomczynski, P., & Sacchi, N. (1987) Anal. Biochem. 162, 156-159.
Chou, P. Y., & Fasman, G. (1978) Adv. Enzymol. Relat. Areas Mol. Biol. 47, 145-147.
Devereux, J., Haeberli, P., & Marquess, P. (1984) Nucleic Acids Res. 12, 387-395.
Dumais, M. M., & Nochumson, S. (1987) BioTechniques 5, 62-67.
Fourney, R. M., Miyakoshi, J., Day, R. S., III, & Paterson, M. C. (1988) Focus 10, 5-7.
Goldman, A., Engelman, D. M., & Steitz, T. A. (1986) Annu. Rev. Biophys. Biophys. Chem. 15, 321-353.
Grudzinskas, J. G., Lenton, E. A., Gordon, Y. B., Kelso, I. M., Jeffrey, D., Sobowale, O., & Chard, T. (1977) Br. J. Obstet. Gynaecol. 84, 740-742.
Hau, J., Grudzinskas, J. G., Teisner, B., & Svendsen, P. (1983) Placenta 4, 291-294.
Horne, C. H. W., Towler, C. M., Pugh-Humphreys, R. G. P., Thompson, A. W., & Bohn, H. (1976) Experientia 32, 1197-1199.
Jameson, B. A., & Wolf, H. (1988) Comput. Appl. Biosci. 4, 181-186.
Johnston-Dow, L., Mardis, E., Heiner, C., & Roe, B. A. (1987) BioTechniques 5, 754-765.
Kozak, M. (1987) Nucleic Acids Res. 15, 8125-8148.
Kyte, J., & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-131.
Maniatis, T., Fritsch, E. F., & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Messing, J. (1983) Methods Enzymol. 101C, 20-77.
Neumaier, M., Zimmermann, W., Shively, L., Hinoda, Y., Riggs, A. D., & Shively, J. E. (1988) J. Biol. Chem. 263, 3202-3207.
Oikawa, S., Imajo, S., Noguchi, T., Kosaki, G., & Nakazato, H. (1987) Biochem. Biophys. Res. Commun. 144, 634-642.
Oikawa, S., Inuzuka, C., Kosaki, G., & Nakazato, H. (1988) Biochem. Biophys. Res. Commun. 156, 68-77.
Rooney, B. C., Horne, C. H. W., & Hardman, N. (1988) Gene 71, 439-449.
Sorensen, S. (1982) Clin. Chim. Acta 121, 199-208.
Stratagene (1988) pBluescript II Exo/Mung DNA sequencing system, Stratagene, La Jolla, CA.
Streydio, C., Lacka, K., Swillens, S., & Vassart, G. (1988) Biochem. Biophys. Res. Commun. 154, 130-137.
Tawaragi, Y., Oikawa, S., Matsuoka, Y., Kosaki, G., & Nakazato, H. (1988) Biochem. Biophys. Res. Commun. 150, 89-96.
Thompson, J., & Zimmermann, W. (1988) Tumor Biol. 9, 63-83.
Watanabe, S., & Chou, J. Y. (1988a) J. Biol. Chem. 263, 2049-2054.
Watanabe, S., & Chou, J. Y. (1988b) Biochem. Biophys. Res. Commun. 152, 762-768.
Weislander, L. (1979) Anal. Biochem. 98, 305-309.
Williams, A. F. (1987) Immunol. Today 8, 298-303.
Williams, A. F., & Barclay, A. N. (1988) Annu. Rev. Immunol. 6, 381-405.
Yoshitake, S., Schach, B. G., Foster, D. C., Davie, E. W., & Kurachi, K. (1985) Biochemistry 24, 3736-3750.
Zimmermann, W., Ortlieb, B., Friedrich, R., & von Kleist, S. (1987) Proc. Natl. Acad. Sci. U.S.A. 84, 2960-2964.

Biochemistry 1990, 29, 2852-2860

An NMR Study of the Covalent and Noncovalent Interactions of CC-1065 and DNA

Terrence A. Scahill,* Randy M. Jensen, David H. Swenson,‡ Nicole T. Hatzenbuhler, Gary Petzold, Wendell Wierenga, and Nanda D. Brahme§

Research Laboratories, Pharmaceutical Research and Development Division, The Upjohn Company, Kalamazoo, Michigan 49001

Received August 8, 1989; Revised Manuscript Received November 16, 1989

ABSTRACT: The binding of the antitumor drug CC-1065 has been studied with nuclear magnetic resonance (NMR) spectroscopy. This study involves two parts, the elucidation of the covalent binding site of the drug to DNA and a detailed investigation of the noncovalent interactions of CC-1065 with a DNA fragment through analysis of 2D NOE (NOESY) experiments. A CC-1065-DNA adduct was prepared, and an adenine adduct was released upon heating. NMR (¹H and ¹³C) analysis of the adduct shows that the drug binds to N3 of adenine by reaction of its cyclopropyl group. The reaction pathway and product formed were determined by analysis of the ¹³C DEPT spectra. An octamer duplex, d(CGATTAGC·GCTAATCG), was synthesized and used in the interaction study of CC-1065 and the oligomer. The duplex and the drug-octamer complex were both analyzed by 2D spectroscopy (COSY, NOESY). The relative intensity of the NOEs observed between the drug (CC-1065) and the octamer duplex shows conclusively that the drug is located in the minor groove, covalently attached to N3 of adenine 6 and positioned from the 3' → 5' end in relation to strand A [d(CGATTA6GC)]. A mechanism for drug binding and stabilization can be inferred from the NOE data and model-building studies.


CC-1065, a fermentation product of Streptomyces zelensis, was one of the most cytotoxic antitumor agents known when discovered (Hanka et al., 1978):

[Structure of CC-1065, with subunits labeled A, B, and C]

The isolation and characterization of this antibiotic have been reported (Martin et al., 1980, 1981), and the X-ray crystal study, including an interaction study of CC-1065 with DNA, was published shortly thereafter (Chidester et al., 1981). The antibiotic has a unique structure possessing a cyclopropyl ring with alkylating potential and a twist or pitch in the backbone of the molecule, giving it a half-moon or banana shape with concave and convex sides. This structural feature permits a compatible fit in the minor groove of DNA. Experimental evidence for minor groove binding includes competitive binding studies with netropsin, site-exclusion studies, and reduction of the alkylation of minor groove sites after treatment of the DNA with methylation agents (Swenson et al., 1982). Molecular model building using CPK models and computer

* Author to whom correspondence should be addressed.
‡ Present address: Visiting Scientist at the National Center for Toxicological Research, Jefferson, AR 72079.
§ Present address: Bio-Rad Laboratories, 1000 Alfred Nobel Dr., Hercules, CA 94547.
0006-2960/90/0429-2852$02.50/0

graphics have also been used to understand the drug-DNA interaction in a qualitative way. (+)-CC-1065 was shown to be efficacious in the treatment of experimental tumors in mice (Reynolds et al., 1986; Martin et al., 1981). Unfortunately, (+)-CC-1065 also produced delayed death in mice at therapeutic doses (McGovern et al., 1984), and subsequently, its development was halted. However, this lead compound has led to the development of potent, efficacious analogues that do not cause delayed death and are promising as therapeutic agents (Warpehoski et al., 1986; Li et al., 1987). Preliminary reports (Hurley et al., 1984; Scahill et al., 1986) detailed the structure of the adduct formed upon reaction with calf thymus (CT)¹ DNA, and two others demonstrating the sequence specificity of the drug with DNA have also appeared (Reynolds et al., 1986; Hurley et al., 1988). While CC-1065 interacts with DNA and is thought to exhibit its cytotoxic effects through disruption of DNA synthesis, the exact nature of the DNA binding has been elusive. Circular dichroism (CD) studies (Krueger et al., 1985, 1987) suggest strong binding to duplex DNA, especially sequences rich in adenine-thymine (A-T) base pairs. CC-1065 and DNA interact

¹ Abbreviations: NMR, nuclear magnetic resonance; NOE, nuclear Overhauser effect; COSY, homonuclear correlated spectroscopy; NOESY, two-dimensional NOE correlated spectroscopy; COLOC, correlated spectroscopy with long-range coupling; DEPT, distortionless enhancement with polarization transfer; CT, calf thymus; FID, free induction decay; CD, circular dichroism; HPLC, high-performance liquid chromatography.

© 1990 American Chemical Society