Human von Willebrand factor gene and pseudogene - American

Aug 3, 1990 - pseudogene is 21-29 kilobases in length and corresponds to 12 exons (exons ... not only to the vWF locus on chromosome 12 but also to...
1 downloads 0 Views 2MB Size
Biochemistry 1991,30, 253-269

253

Human von Willebrand Factor Gene and Pseudogene: Structural Analysis and Differentiation by Polymerase Chain Reaction?,$ David J . Mancuso,s Elodee A. Tuley,s Lisa A. Westfield,§ Teresa L. Lester-Mancuso,%Michelle M. L e Beau," James M. Sorace,sJ and J. Evan Sadler*,s Howard Hughes Medical Institute Laboratories, Department of Medicine, and Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 631 10, and Joint Section of HematologylOncology, Department of Medicine, University of Chicago Medical Center, Chicago, Illinois 60637 Received August 3, 1990; Revised Manuscript Received September 1I , I990

ABSTRACT: Structural analysis of the von Willebrand factor gene located on chromosome 12 is complicated

by the presence of a partial unprocessed pseudogene on chromosome 22qll-13. T h e structures of the von Willebrand factor pseudogene and corresponding segment of the gene were determined, and methods were developed for the rapid differentiation of von Willebrand factor gene and pseudogene sequences. T h e pseudogene is 21-29 kilobases in length and corresponds to 12 exons (exons 23-34) of the von Willebrand factor gene. Approximately 21 kilobases of the gene and pseudogene were sequenced, including the 5' boundary of the pseudogene. The 3' boundary of the pseudogene lies within an 8-kb region corresponding to intron 34 of the gene. The presence of splice site and nonsense mutations suggests that the pseudogene cannot yield functional transcripts. The pseudogene has diverged -3.1% in nucleotide sequence from the gene. This suggests a recent evolutionary origin 19-29 million years ago, near the time of divergence of humans and apes from monkeys. Several repetitive sequences were identified, including 4 Alu, one Line-1, and several short simple sequence repeats. Several of these simple repeats differ in length between the gene and pseudogene and provide useful markers for distinguishing these loci. Sequence differences between the gene and pseudogene were exploited to design oligonucleotide primers for use in the polymerase chain reaction to selectivity amplify sequences corresponding to exons 23-34 from either the von Willebrand factor gene or the pseudogene. This method is useful for the analysis of gene defects in patients with von Willebrand disease, without interference from homologous sequences in the pseudogene.

-

H u m a n von Willebrand factor ( V W F ) is ~ a complex multimeric plasma glycoprotein that is required for platelet adhesion to damaged endothelium and for factor VI11 survival in the circulation [reviewed in Sadler (1989)l. Deficiency of v W F results in von Willebrand disease (vWD), which is the most common inherited bleeding disorder in man if all degrees of severity are included (Rodeghiero et al., 1987). Many deficient phenotypes have been identified (Ruggeri & Zimmermann, 1987). The human v W F gene has been cloned, and its structure was reported in detail recently. It spans 178 kb of DNA and contains 52 exons. A high-resolution restriction map was constructed for -75% of the locus, and -20% of the gene sequence has been reported (Mancuso et al., 1989b). I n the course of characterizing the vWF gene, restriction fragments from the center of the v W F cDNA were found to hybridize not only to the v W F locus on chromosome 12 but also to sequence on chromosome 22 (Shelton-Inloes et al., 1987;

-

'This work was supported by the Howard Hughes Medical Institute and (M.M.L.B.) by the University of Chicago Cancer Research Foundation and the Leukemia Society of America. *The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank under Accession Numbers M36184 and M36185. * T o whom correspondence should be addressed at the Howard Hughes Medical Institute, Washington University School of Medicine, 660 S. Euclid Avenue, Box 8045, St. Louis, MO 631 10. 9 Washington University School of Medicine. 'I University of Chicago Medical Center. Present address: Chief, Blood Bank and Hematology, Laboratory Service ( 1 I3), Veterans Administration Medical Center, 3900 Loch Raven Blvd., Baltimore, MD 21218.

0006-2960/9 1/0430-253$02.50/0

Mancuso et al., 1988). This cross-hybridizing locus proved to be a pseudogene for vWF. The presence of this pseudogene intereferes with analysis by Southern blotting of gene defects in patients with vWD and has caused some difficulty in the identification of restriction fragment length polymorphisms for the gene. The v W F pseudogene corresponds to a region of the vWF gene that encodes important functional domains of the protein and also has an interesting evolutionary history. The human vWF precursor consists of four types of repeated domains, present in 2-5 copies each in the following order: D1-D2D'-D3-Al-A2-A3-D4-Bl-B2-B3-Cl-C2 (Bonthron et al., 1986; Verweij et al., 1986; Shelton-Inloes et al., 1986). The segment of the gene that is duplicated in the vWF pseudogene encodes domains Al-A3. The v W F A repeats appear to be homologous to segments of several proteins: complement factor B (Shelton-Inloes et al., 1986; Sadler et al., 1986; Mole et al., 1984), complement component C2 (Bentley, 1986), cartilage matrix protein (Kiss et al., 1989), all three a-chains of collagen type VI (Bonaldo et al., 1989, 1990; Chu et al., 1989; Koller et al., 1989), the a-subunits of the leukocyte adhesion receptors Mac-1 (Arnaout et al., 1988; Corbi et al., 1988; Pytela, 1988), p150,95 (Corbi et al., 1987), LFA-1 (Larson et al., 1989), and the a-subunit of the platelet collagen receptor (Takada & Hemler, 1989). The present distribution of v W F A domain containing genes may have arisen in part

'

Abbreviations: bp, base pair(s); kb, kilobase(s); PCR, polymerase chain reaction; vWF, von Willebrand factor: vWD, von Willebrand disease; R, a purine nucleoside, adenine or guanosine: Y , a pyrimidine nucleoside, thymidine or cytidine.

0 199 1 American Chemical Society

254 Biochemistry, Vol. 30, No. 1, 1991

Mancuso et al.

by repeated gene segment duplication and exon shuffling (Gilbert, 1978). The A domains of vWF contain binding sites for heparin, collagen, and platelet glycoprotein Ib (Mohri et al., 1988; Fujimura et al., 1987; Pareti et al., 1986; Roth et al., 1986). One can speculate that the homologous A domains in other proteins might also recognize one or more similar ligands. In particular, the collagen-binding properties of the platelet collagen-receptor might be dependent upon the presence of the v W F A domain in the a-chain. We report here -21 kb of DNA sequence for the v W F pseudogene and for the corresponding DNA sequence of the vWF gene, including the 5’ boundary of the pseudogene. the pseudogene is -21-29 kb in length and has diverged -3.1% from the gene. Sequence differences between the gene and pseudogene were exploited to selectively amplify segments of either locus by using the polymerase chain reaction (PCR). This method permits the analysis of gene defects in patients with vWD without interference from the pseudogene. EXPERIMENTAL PROCEDURES Materials. Restriction enzymes, T4 DNA ligase, T4 polynucleotide kinase, and T4 DNA polymerase were purchased from Bethesda Research Laboratories or New England Biolabs. The Klenow fragment of Escherichia coli DNA polymerase and calf intestine alkaline phosphatase were purchased from Boehringer Mannheim. Deoxycytidine 5’[ L U - ~triphosphate ~P] ( [ c + ~ ~ P ] ~ C Tdeoxyadenosine P), 5’-[a35S]thiotriphosphate( [35S]dATPaS),and [y-32P]ATPwere purchased from Amersham. Taq polymerase (Amplitaq) was purchased from Perkin-Elmer Cetus. Placental DNA was purchased from Sigma. Isolation of u WF Pseudogene DNA Clones. The isolation of clones X6, pWE64, and pWEl13 was described previously (Mancuso et al., 1989b). The bacteriophage X Charon 4A library was derived from fetal liver and was provided by Tom Maniatis (Harvard University, Boston) (Lawn et al., 1978). The pWEl5 cosmid library was derived from human placenta (pWEI5) and was provided by Glen Evans (Wahl et al., 1987) (Salk Institute, San Diego). Chromosomal Localization by in Situ Hybridization. Human metaphase cells were prepared from phytohemagglutinin-stimulated peripheral blood lymphocytes. The vWF cDNA fragment of plasmid X3/SacI-2a, which contains the 5’ 3.8-kb Sacl-Sac1 fragment of XHvWF3 (Shelton-Inloes et al., 1987) cloned in the Sac1 site of pUCl8, was radiolabeled by nick translation of the entire plasmid with all four 3H-labeled deoxynucleoside triphosphates to a specific activity of 1 .O X 10’ dpm/Fg. In situ hybridization was performed as described previously (Le Beau et al., 1984). Metaphase cells were hybridized at 2.0 and 4.0 ng of probe/mL of hybridization mixture. Autoradiographs were exposed for 1 1 days. Polymerase Chain Reaction. The polymerase chain reaction was performed (Saiki et al., 1988) with a Perkin-Elmer Cetus DNA thermal cycler. Typically, 100 ng of a cosmid or bacteriophage X DNA isolate was amplified with 2.5 units of Taq polymerase for 30 cycles of the following: 30 s at 94 OC (denaturing), 30 s at 50-60 OC (annealing), and 1 min at 72 O C (extension), with a final extension time of 5 min at 72 OC. Human peripheral blood leukocyte DNA was amplified similarly with the following modifications. Reactions without enzyme were preheated at 99 OC for 5 min and then at 96 OC for an additional 5 min after addition of Taq polymerase. DNA was amplified for 30 cycles of the following: 1 min at 96 “ C (denaturing), 1 min at 53-55 OC (annealing), and 2 min at 72 OC (extension) with a final extension time of 5 min at 72 OC.

-

22 Chromosomal localizationof the human vWF pseudogene. Shown in the figure is the distribution of labeled sites on chromosome 22 in normal human metaphase cells from phytohemagglutininstimulated peripheral blood lymphocytes hybridized with a vWF cDNA probe. Of 38 labeled sites observed on chromosome 22, 23 (86.8%) were clustered at 22q11-13, with the largest cluster at 22qll-12. FIGURE I :

Computer-Assisted Methods. Computer programs were those of the Genetics Computer Group (Madison, WI) or Compugene (St. Louis, MO). Nucleotide sequences were analyzed and compared to the Genbank Genetic Sequence Data Bank (BBN Laboratories, Inc., Cambridge, MA, release 62.0, December 1989) and the EMBL database (release 19.0, May 1989) with the program FASTA (Lipman & Pearson, 1985). Other Methods. Restriction mapping and Southern blotting, DNA subcloning and sequencing, and preparation of peripheral blood leukocyte DNA were performed as described previously (Mancuso et al., 1989b; Grunebaum et al., 1984).

RESULTS Chromosomal Localization of the u WF Pseudogene. A restriction fragment from the middle of the v W F cDNA, corresponding to amino acids 144-544 of the mature v W F subunit (Shelton-Inloes et al., 1986), was hybridized to human metaphase chromosomes. Specific labeling of chromosomes 12 and 22 was observed due to the presence of the vWF gene and pseudogene on these chromosomes, respectively. Of the 38 labeled sites observed on chromosome 22, 36 (94.7%) were clustered at bands qll-13 with the largest number of grains at bands ql1-12. This cluster of grains represented 16% (38/238) of all labeled sites (p 64% identical residues between pseudogene nucleotides 17104 and 17800. The inverted Alu repeat repeats between pseudogene nucleotides 20377 and 20724 apparently is inserted into the Line-1 element. Direct repeats

-

-

-

256

Mancuso et al.

Biochemistry, Vol. 30, No. I , 1991 1 GAGCAGAATCTTTGGGGAAGGGAGGGAATGAGTCATGCCAGTATCTAGGGGAGAGAGTTTAGGCAGACGGAACMGCACG

81 TGCCGAGGCCTTAAGGTGGTAGGTGTCTGCAGCCGTTGTACAAGGAGGTCAGTGTGGCTACAGAGGAGGGACTTGGGGGT

1 GACCTTAAAATGGAGAGATTATCCTGAATTATCTAGGT 161 AATGGTAGTTCACGAGGGCAGGGAGGGGCAGTAGCCATACCATGGGGGCTACTAGGCCATGGGAAGATCGTTGGCC~TT 39 GGGCCCACTGTAGTCACAAAGAACCTTCAAACAGGAAGGCCGAAAGACCAGAGTGAGTAGAAGAGCTTGGGG~CTGG~ 241 GCCCAACAGTGAGAACATTCTTCCTCTATGTTTCCTGGTGACTTACACCCCTTGTACCAAGCTGAATGCAGCTTATTG~ 119 GCAAGATAGAATACATTTTTTCTCTTTGTCTCAACATCAGGGCTCTAAGTATATTGTAAAAGTATAGCACCTTTTCTTCT 321 199 CCAGGCCAAATATTTCCAGTTTTTTCTGTAGTTTTTTTTTTGTTTTATTATTATTTTTAGTTGACATTTAACTGTATGTA __401 279 TTTACTGGTGACAGTGTGATATTTCAATACATGTATACAGTGTGTAATGATCAAATCAGGGTAGTTAGTATATCTC~CC 478 G G C G 359 CTCAAACATTTACCATTTCTTTGTGTTGGGAACATTCAAAGTCTGCTCTTCCAGCTACTTGAAAATATACAATAAACTGT 558 439 TGATAATTATAGTCACCCTGTGGTGCTACAGAACACTAGAACTTATCCCTCCTCTTTGTGTAGTTTTCACTTGTTTCTCA 638 C 519 CTCATTCATAACTACTTTTTCAGAACTTGTTGCATGTTGCACTCCATGATGGGTGAACATGGAGACAGGTGAGACATATC 718 G 599 TTCCTGGTTACATTTATGATAGAGCAGAATTCTGCCCAGCCTGGTCTGACAGGCTGCGGTTCTCCAGGCAAGGGGACTC 798 A G AC AC 679 ACCTGCCCAGAGGCAGGCAGTTCTATAACGTTCCCTCCTTGGACCCCTACACCTCCATTGGTGGAGCAGTGTACCTGGCT 878 T T 759 CCTCTCTCTGCTTCTACGCTTGTTGGGGACAGCGATAGTTCCCACCAGTGATCTCAGGGCCAAGGCTGCCTGATTCCCAC 958 T T 839 TTCTGCCCTTGGCTGACTATGTGACATGGGCATGTTGCCTCTCTTTGTTTCCATAGCTTC~TAAAATGGGGCCAGCM -T 1038 C 919 GGAAGCTCAGGAATAGGGCTTGGCAATGGCAAGGCTTTGCTGCTCACCTCGGGCCTCCTCTGAGTCTCTGTCCCGCTC~ G T 1116 999 CCTCCTCTCCCTCGAATGCCCTCTGCCTCCATTGCCGCCAGGAACGTTCCCCTTTCCCCTGAGCCAGAGAGCATGCCC~ 1196 T T G T 1079 GGGCTTGACGGTGCTCATCCCTCAACTTGTCTCTCAA~GAGAAAGTGTGTGGCCTGTGTGGGAATTTTGATGGCATCCAG 1276 Bxon23+E K V C G L C G N P D G I 0

1003

1159 AACAGTGACCTCACCAGCAGCAACCTCCAAGTGGAGGAAGACCCTGTGGACTTTGGGAAGCTCCTGG~GTGAGCTCGCA A 1356 N S D L T S S N L O V E E D P V D P G N S V K V S S Q 1030 N 1239 GTCTGCTGACACCAGAAAACTACGTCTGGGTCTCTGTGTGGACAGAGCCCGAGAGCTTGCTTCCTGGAATGTCCCTCTGT G T 1436 S A D T R K-

c-

1036

C 1319 CCCCATTCTTATGGGGGCTGGAAGGGGGGGTTGTGGGTGGTATGACCTTCAGGTGGCTGCAGGGTGGGAGGAGGGTCTCT 1516 G C C 1399 TGGATCCTTCTGGGCTGAATAACCCCAGTTTGACCAGCTGACGGCTGGCCTATCTCTTGCCTGGTTCCC~GTGCCTCTG 1596 Exon 24 + V P L

1039

1479 GACTCATCCCCTGCCACCTGCCATAACAAGCATCATGAAGCAGACGATGGTGGATTCCTCCTGTAGAATCCTTACCAGTGA 1676 D S S P A T C E N N I ~ K O T ~ V D S S C R I L T S1066D 1559 CGTCTTCCAGGACTGCAACAGCTGGTGAGGACCTTGAGGGTAGTGGGAGCAGATGGTCCCAAGGCTTGGCCCAGTGGT 1756 C TG V P O D C N K L 1639 ATGGACACAGAGTGTGACCTTCTAACGTGGACACTACCCTTGTGTCTTGACATGATCTGCACCAAGACACCACTTTAGCT 1836 C CG 1719 TTTTTTTCTTGGCTTTCAATCTGGGAAACAAAAAGTAAAATC~CAGTTTCTAGGGGAAGCAATGCCTGGC~CATTT 1916 1799 CCTTCTGCATGAGAAGTAACTCCCCTTGGCATGTGCCAACGTTTCTCTTTCGGCCCCAGTCTTAGGATTTGTTCTCTTAT T C A 1995 1879 GGAAGTATCTTGTTTTCAACACCAGAGCCAGAGATTTCCTTTTCCTGTCACTGCTGCATTTGTCCAGACC~GACCGT T 2075 T 1959 CCTCTCCCACTCCTCAAAACCCCTTGGTGCCCATTTCTTGTCTCACAGAAATTCTTTTCTGGCCTTACTTTTGGTGATTT 2155 C CT A 2039 TGAGTCCTCGTATTATGACTTATTTTTGTGTCTTCATCTCTAATGACAAGGAGGAACTCCCCAGGGTTAAGTGCAGTTGT T G A 2235

1074

Biochemistry, Vol. 30, No. 1, 1991 257

von Willebrand Factor Gene and Pseudogene

CTCTCTTCTTTTTTTTT------GAGACAGAGTCTCGCTCTGTCACTCAGGCTGGAGTGCAATGGCATGATCTTGGCTTA 2119 T TTTTTT C 2315 2193 GTGCAGCCTCTGCTTCCCGGGTTCAAGCGATTCTCCTGCCTCAACCTCCCGAGTAGCTGGGATTACAGGTACCCGCCACC C T C C C T 2394 2273 ACACCCAGCTAATTTTTGTATTTTTAGTAGAGACGAGGTTTCACCACGTTGGTCAGGCTGGCCTCGAACTCCTGACCTCA G T A T T 2474 2353 GGTGATCTACCCACCTGAGCCTCCCAAAGTGCTGGGATTACAGGTGTAAGCCACCATGCCCGACCA~AGTGC~GTCT CG G 2553

CTTATCTGCTGCATGTATGTGTGTACACACATGTGGACATGGGCTTGTACATGGACATATGTGACTACAGAAGTCTATGT 2433 2633 G 2513 ACACACACACAGCCATGCACACATACATGTTGCTGTGTAGCACCACCAATGTGGGAGAGACTGGTTG~CATGGATCT __ C A 2713 2593 CAATTCTCTTTCTATCTATGCAGTGATTTTCAGTCTCTGAGGACTCCAAAGGATACTTACATTCCC-GGTTTGGTGG~ G T 2791 2672 TCCTGGGCATCTGAGTTGGAAGAGTGAGGACAGGGGAGGAGCTGGGGACATTGGGACTGTTGG~CGTCTTGGAAAC~T T A 2871 2752 GACCCACTCAGTGTCTGAATTCATTTTCTGTCATAACTGCCCTG~GTCCAGTGTGTTTAGAGGCGTGT~TGGG-A 2951 G 2831 TGAGGAAGGGTGGGAACTGGTTGAACTGGATTTGGAATCAGAGTCTAGGCCCTATTGTTCTGCATACCTGCCCCATAGCA 3031 T C 2911 CTGCAGTGAGGCAGCTGCAGGGCTTGATGTATTTCCCCATTCTGCTTGCTGAATTGAGGCAAAGAAAGACAGTGACCAGC 3111 C CT 2991 ACATATGTGTGTTTGTGTTTTTGTAAAAACACCCACATGCTCATGAGGCCAAGAGTGGGTTGTGAGGACAGATGGGTGGC 3191 G T 3071 TGAGCAGGGAGGCAGGCAGAGGGACAGAGGGAATGTTCTTCTGGAATATCCTCAGGCTCATTGTGTTCTGCAGAAGGCCA 3271 T A 3151 GCAGCACTGCATTATTCAACTCTTCTTGCTGGAATGCAGATTAGAAACTAAGAATCTTGCCTTCCCACTCATTCCCTCTT 3351 3231 TGAGACCATTGAGCTACATTTCTCCTTCTACCTGGACCCCCCTTATCCTTAAATTGACCATCAGAACATTTGCACCCAGA 3431 G 3311 CTAAGAGCCAGAGTTTCTGACACCTGGCCATAGGCCTGGGCCACCTGAGGCTACCTTTGC~GTGGACCCCGAGCCATAC C G T 3510 Exon 25 .* V D P E P I 1080 3391 CTGGATGTCTGCATTTACGACACCTGCTCCTGTGAGTCCATTGGGGACTGCGCCTGCTTCTGCGACACCATTGCTGCCTA 3590 L D V C I Y D T C S C E S I G D C A C P C D T I A A Y 1107 3471 TGCCCACGTGTGTGCCCAGCATGGCAAGGTGGTGACCTGGAGGACGGCCACATTGTGCCATGAGTCCTGACACCCTCATG 3670 E A G A E V C A Q E G K V V T V R T A T L C

1126

3551 TTCTCAGATGCCCTCCCTTCTTCCCATGTGTCTATGCTTGAAGACCTTGTGAGTGCAGGGGGATATCTTCATGGGCGAGA 3750 3631 GAAGACAG--TGCTAGTATGGGACTGGATGGCCAAGGGCTAACGAGGGTTAAGACGTTGGCTGAGTAAGAAGTTTATATT AG T G A T 3830 3709 ATGGACTTTGAGGCGGGACATGGATTCAAGGCATGAACATGTGGATACTTTTCTTCTGGAGAGATTCTGGCAGAGGAGAA T G G 3909 C G--3789 GAGGGAATACTGATGAAAAGAAGGAAGTTGATTTATGTCTTTAATTAGGCGTGATTATATTTGCCAATATGAGTGATC~ C 3986 3869 CTCATACATTCATGTCTATAGAATCTTCCTTCTTTGGACATAAGCAACTTAATGCTTTTATGT~C~GGCCAGGC G G T -G 4066 3949 ACAGTGACTCACACCTGTGATACCTGTGTTTTGGGAGGCTGAGGCCAGAGGATTCCTTGAGCCCAGGAGTTCAAGACCAG T A C 4146 4029 CCTGGGCAACATAGCAAGACCCCATCTCTACAAAAATT~GAAATGAATTCAGAACCAATAGATTCTGGTTTAGG 4226 G 4109 TGCTTCAACAATCCAGAAGTCTCTAATATTGGTGACGCCCATAGTCCCTTAGTTCCCCAACATTATCTCCAGATGCTGCA C GC 4306 4189 GGCCATCACCACATGGGTCTGCAGTCCTGGAGGCTTTGCCTTTTGTGGCCACAGCCTTGTCTCCTGTCTACAC~CCCAG G 4386 Exon 26 .* P Q 1128 4269 AGCTGCGAGGAGAGGAATCTCCGGGAGAATGGGTATGAGTGTGAGTGGCAATAGAACAGCTGTGCACCTGCCTGTCAAGT C GC T 4466 S C B E R N L R E N G Y E C E V Q * N S C A P A C Q V 1155

Y

258

Biochemistry, Vol. 30, No. 1, 1991 4349 4546

Mancuso et a].

TACGTGCCAGCACCCTGAGCCGCTGGCCTGCCCTGTGCAGTGTGTGGAGGTCTGCCATGCCCACTGCCCTCCAG~GAGG C T A G T C O E P ~ P L A C P V O C V E V C E A E C P P G

1179

4429 4626

CCTCTATCCCTGGGGGTCGGGCTGGTGGGATGGGATAGGGATGGATGGAAAGGTGCTTCTAGGTCTTGCTTCATCTCAGC A

4509 4706

CTCCAICCTGCCACATCCCATCTCTGACCTGCAA~CCTGCCACCCCATCTCTGACCTGC~~CCTGCCACCCCATCTCTGA _________________________________________----~

4589

C C T G C A A C I C C T G C C G C C C C A T C T C T G A ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ C ~ ~ ~ ~

G

T

---___________________________________

4669 4740

ACCTGCAA~CCTGCTGCCCCATCTCTGACCTGCAAG~CTGCCGC~GGTTCCATGGGTTCTTTCATCAGAGTCAGGCCAGT ___________________-________________ G T G A

4749 4784

TGTGATTTTTCTCAAGTCGAGCTCCTCCAIVLATGCTTTTCTGTGCCTATTTGTGGCATTCTCATCTAAAGCAGCCCCTGC C A G C

4029 4864

CGATAGAACTTTCTGCAGTGGGGG~TGTTGTATTGAATGTAACTGTGACGAGTGTGTCTGAGGAACTGAATTTTTGAAT G

4909 4944

TTTATTGAATTTTACTTAATTTAACTTAAATAGCCACATGCAGTTTATGACTATCCTATGGTACAGCACAGCTCGAGAGT G C G G T

4989 5024

TACTGAAGGTGTTATTCAGAAGAGCAGAAAGAGCCCTGGAGATAAGATTTCATTTGTCCTGAGGCTTGGGGAGGTGAGGT C

5069 5104

AGGATGAAGGAATCCCCGCTCCCAGTTTTGCAGAGGGATCAATCAAGGCACAAGGAGGAGAGATGCTCCTTGAGTGGCAG G C ATG

5149 5184

GGTGACCCCTAGGGGTGCAGGCAGGAGGAGTTGGCTTCTAGGGCAGGAGGAGGAGTTGGCTCCTCCCTTTTAGTT~ G A

5229 5264

TGAGACTTCCTTGTGGGAAAGGGGAGCGTTTTGGTTCCTAATGAGAGCTTTCTTTTGC~GGAAAATCCTGGATGAGCTT G C Exon 27 + G K I L D E L 1186

5309 5344

T A

C

G

L Q T C V D P E D C P V C E V A G W R P A S R K K V T l 2 1 3

R

G

5389 5424

CTTGAATCCCAGTGACCCTGAGCACTGCCAGATTTG~AIVLACAGATGCCTGGATTGTTTG~GTGACGAATCTTATTGC T G T L N P S D P E E C O I C 1225

5469 5504

TTCTCCAGGGGCCAGCTCAG AG ACTAGGTTTTGGGT T

AAAA

5549 5584 5629 5664

TGGACTAGACTGCTTCCTCCACATATATTCCTCTACAATGCAGGTTTTT

5709 5744

GCTTTTCTCACTGAAGGTGGGGGTATGGTCACTTATTGTCCTTGTAGGTCATGACTAGTCTGAGCACAGGTGTCCATAT~

5789 5824

CTGGAGGTGGCGACTCAGGAGAGGAAAGTGAGATGGAACGCGGGGCCCTCAGGGTAACATGAACGTCGTGGGGGCCATAG G C A

5869 5903

TCAAGGTGGCAGGGTGTGGCTGTCATAATGGGGAGGGAAGGTGCTCGTTGCTCTCACCCTGGGTTCTCACTGGACTTGCC

5949 5983

TCTGGGAAACTGGCCTTTGGGGAGGTGAGAAGGAGTCCTCAGGGTGAGCTCTGAGCCCTCTGAAGTCCTGCTCATAAGTA

6029 6063

GGGAACATTTATTAAAATCCACATCTATTCAACACCACCACTCCCTGACACTGATCAGGCCTTCTGGGAGCTT~CTTGAGA G G A A

6109 6143

ATGTAAATTCGCTGGGGAGGGGCTATGGTCACAGCTGCCAGAGTCAGATCCTGCCACCACCATCTGCAC~CAAGAACG C T G T T

6189 6223

AACACTGCCTGGGGTGAGTTTTCTAGCAAACACAGGAAACTAGAAAGTATGTCACTCACGCACCGTTAACCAAGGATTGA T T

6269 6303

AGCAATAAATCTCATGGATCATGGGAAGGAGGAATTAACTTCCTAGTAAGACTCTACCTGCGCACACTCTTTATG~GAA TG C

6349 6383

ACAGGGACTGTAGGGAGCAAAGGATTGGGTGATTCTGATGCTGGCTTGGGGTTCTGTGCC~GGGCATGAGAGCTTCG A C

6429 6463

AGCCCTCTCCCCACCTCCTTTGCTGGCACACGGGCAGGGGCAGCTGCCACGGTGCACGCTGAAGCTGC~GCAGTGTGTG

6509 6543

GACAGAAGTCCCACAGCAGGCTGGGGGCATCACCTGGTATCAAGCCACAGGGGGACACTTCCCAGGCTCAGGGCACTGCC T G G

6589 6623

GCCTTGTCAGCTTTACTGGGAGTGCTGAAGGCCAGGTGCCTCCAAGGCTGAGAGTCCAAAGAGGAGATGGTCCTGGCCCA

C

G

C

von Willebrand Factor Gene and Pseudogene

Biochemistry, Vol. 30, No. 1, 1991 259

6669 TATTGGTCTCCACCACATGGCCACTTGATTTCACCTGTGGGCTGGCAGGCCGTGCCACTTTCCGGGACACTGGC~G C T 6703 6749 GTGATGTTACCAGCAGAGCCTCGAAATTTGGCATAGTAGTGTAGGAATTTGTGTGTGTGGGGCCACTGCTCTACTACAGA G G G 6783 6829 CTTGTGGTGTGTGCTTGGGGCTGGGCACAGAGCCTGGCCAATACCTGGCACATGAGAGGTGCTCACCCTATGTCTGCTGG G 6863 6909 AGGAATGAGTGAATGTCTTTTCTTAGGCCTAACACCTTGGGGACCCCTGTCTGCTACTCCCAAATAATTTGTTCTGCTAG G G A 6943 6989 ATAAAGCTGGGGTTTTGAATCTTTATTCATCTTTCCACAATGTTTTTGCTCCAATTTTTTCTCTCTTCGTTAGGGCCCAA T 7023 7069 CGTATTAAACATGTTTTACTCATGCAATTTACATTCTGTTAGAATATGTACTGTAAAGCATTTATCTTCAACAGAAC 7103 A C 7149 ATGTACAAGCACAGGTTAGCGTCTACATTTTAGTAACATTTGTTCTTGGAGACACTTGTAAGAAGGCTTAGATTATAATT A G 7183 7229 AAGTATTCATGTTGCTAAAGATGTAGGTTGAAAAGAAATCAGCTTTGAGTGGCTGTGTGGATGGCCTAGAACAACGAGTA T 7263 7309 TATTTTTTCATATCCTAGGATGGGCTAAAGCTTTGGTTTCTAGAATCAAAGGAAGTCGGCTATGTGTGTGTTTTGAAGTG GT 7343 7389 GGGGGGCATGCTATTTGGGGACAGATGTTAAACAGTGACATCTCACTTGGGTGTGGAATGGTCCGTGGGATCTCAAGTTC A A A 7423 7469 AGGTGGAACAGAGGAGATTCTGTGGGAATATGGAAGTAGTCATACACTGTAGGGCTCAGAAGTGTCCATAGGTTCTTTCT CA TG C C 7503 7549 GAACCATTTTAATTTCTTCGCTCTTTTCTGC~CCACTGTGATGGTGTCACCCTCACCTGTGAAGCCTGCCAGGAGCCGG T A 7583 Exon28iE

V T L T C E A N

C D G

V

C

Q

E

1240

P

7629 GAGGCCTGGTGGTACCTCCCACAGATGCCCCGGTGAGCCCCACCACTCTGTATGTGGAGGACATCTCAGAACCGCTGTTG G G C 7663

1267

G G L V V P P T D A P V S P T T L Y V E D I S E P L L P

7709 CACGATTTCTACTGCAGCAGGCTACTGGACCTGATCTTCCTGCTGGATGGCTCCTCCAGGCTGTCCGAGGCTGAGTTTGA G 7743 E D P Y C S R L L D L I P L L D G S S R L S E A E P E l Z 9 4

V

7789 AGTGCTGAAGGCCTTTGTGGTGGACATGATGGAGCGGCTGCGCATCTCCTAGAAGTGGGTCCGCGTGGCTGTGGTGGAGT C C 7823 V

L

K

A

P

V

V

D

~

~

E

R

L

R

I

S

*

K

Y

V

R

V

A

V 1320 E

V

Q

7869 ACCACGACGGCTCCCACGCCTACATCGGGCTCAAGGACCGGAAGCGACCGTCAGAGCTGCGGCGCGTTGCCAGCCAGGTG 7903 A

1347

Y E D G S E A Y I G L K D R K R P S E L R R V A S Q V I

7949 AAGTATGCGGGCAGCCAGGTGGCCTCCACCAGCGAGGCCTTGAAATACACACTGTTCCAAATCATCAGCAAGATCGACCG T T 7983 R Y A G S Q V A S T S E A L K Y T L F Q I I S K I D R l 3 7 4

V

P

8029 CCCTGAAGCCTTCCGCATCGCCCTGCTCCTGATGGCCAGCCAGGAGCCCCAACGGATGTCCCGGAACTTTGTCCGCTACG C 8063 P

E

A

P S

R

I

A

L

L

L

~

A

S

Q

E

P

O

R

~

S

R

N

P

R 1400 Y

V

8109 TCCAGGGCCTGAAGAAGAAGAAGGTCATCGTGATCCCGGTGGGCATTGGGCCCCATGCCAACCTCAAGCAGATCCGCCTC T 8143 V Q G L K K K K V I V I P V G I G P E A N L K Q I R L

1427

8189 ATTGAGAAGCAGGCCCCTGAGAACAAGGCCTTCGTGCTGAGCGGTGTGGATGAGCTGGAGCAGCAAAGGGAAGAGATCGT 8223 C A C I E K Q A P E N K A P V L S G V D E L E Q Q R E E I V l 4 5 4 S D

8269 TAGCTACCTCTGTGACCTTGCCCCTGACGCCCCTCCTCCTACTCTGCCCCCCCACATGGCACAAGTCACTGTGGGCCCGG A 8303 S

Y

L

C

D

L

A

P

D

A

P

P

P

T

L

P

P

E

~

A

Q

V

T

V

P 1480

G

E

8349 GGCTCTTGGGGCTTTCGACCCTGGGGCCCAAGAGGAACTCCATGGTTCTGGATGTGGCATTTGTCCTGGAAGGATCGGAC 8383 G G C G

L

L

G

L V

S

T

L

G

P

K

R

N

S

~

V

L

D

V

A

P

V

L

E

G

S

D1507

8429 AAAATTGGTGAAGCCAACTTCAACAGGAGCAAGGAGTTCATGGAGGAGGTGATTCAGCGGATGGATGTGGGCCAGGACAG 8463 G R

I

G

E

A

N

P

N

R

S

K

E

P

~

E

E

V

I

Q

R

~

D

V

G

Q

D

8509 CATCCACGTCACGGTGCTGCAGTACTCCTACATGGTGACCGTGGAGTACCCCTTCAGCGAGGCACAGTCCAAAGGGGACA 0543 I

H

V

T

V

L

Q

Y

S

Y

~

V

T

V

E

Y

P

P

S

E

A

Q

S

K

G

D1560

D

S

~

~

~

~

260

Mancuso et al.

Biochemistry, Vol. 30, No. 1, 1991 8589 TCCTGCAGCGGGTGCGAGAGATCCGCTACCAGGGTGGCAACAGGACCAACACTGGGGTGGCCCTGCTGTACCCCTCCGAC C C G T T 8623 1587 I L O R V R E I R Y O G G N R T N T G V A L L Y P S D L R L 8669 CACAGCTTCTTGGTCAGCCAGGGTGACCGGGAGCAGGTGCCCAACCTGGTCTCCATGGTCACCAGAAATCCTGCCTCTGA C A G 8703 E S P L V S ~ G D R E O V P N L V S ~ V T R N P A A Y G

S1614D

8749 TAAGATCAAGAGGTTGCCTGGAGACATCCAGGTGGTGCCCATTGGAGTGGGCCCTAATGCCAACGTTCAGGAGCTGGAGA C G 8783 G 1640 K I K R L P G D I O V V P I G V G P N A N V O E L E

E 8829 GGATCGGCTGGCCCAATGCCCCCATCCTCATCCAGGACTTTGAGACGCTCCCCTGAGAGGCTCCTGACCTGGTGCTGCAG T T C 8863 1667 R I G Y P N A P I L I O D P E T L P * E A P D L V L O R 8909 AGGTGCTGCTCCGGAGAGGGGCTGCAGATCCCCACCCGCTCCCCTGCCCCTG~ATGCTGGCACCTTGTGTGCAGGTGGG 8943 T A R C C S G E G L O I P T R S P A P L

1684

8989 AGGGCTGGGCGAGGGCTGGCATGGCCTTGGTACTACATGCATCTGCCAAGATATGACTTGGGTTCTAATCCTGGCTTCCC 9023 G c c 9069 TGGTCTGTGTGCCCTTGGTTGAAACTTGCCTTCAAAGGGCCTGTGTTTCCTCACCTCCCTGGCAGGGAGACAAACTGTGA 9103 G 9149 TCCTTTTGCGGGGGAAGATCTTTTCTTATTGTGTGTGCTTCTGTAAATTCAAAGGTAATGGAAGCCTATTGTTAACATTT T A 9183 9229 AAACAATGCCACTAAGGAGTAIYVVV1AAAAA--TATGCAAGTCTTACTTTATCACACGGCCTCTTCCCCACCCAATTCCA 9263 A AAA 9307 CCCTCCACCTTCAAGGTAACCTCTGTTAAAGGTGATGATGATAAGCTTCGTTTTGACCACACTGGGTGGGTGAAATTATT TG G A 9343 9387 9423 V

9467 GTGTCTGCCTTAGTTTGAATCTTATCTCTGCCACTTGCTAGCTGGGTAATTTTGGGC~TACTT~CCTCTTTGTATAT T 9503 --A 9547 9581 9556 C A T C T A T A A A A T G G G G A T A A T A A T A G C A C C T A C T T G C T T A - T G A A 9660 9634 CTTGCTCCCACTGCTGTTATTACTACCTTTTACTGACACTGGTGTCTAATCCATTCCTAGTTCCTGAACATCTTTATTCG C C 9740 9714 GTGTGTTTGGGACCATCTCAGAATAATAAGCCTTCTTTTAGTATCTTTTGAGTCTCACACTTGTCAGTACTTTTGTTTCT C A 9820 9794 TTGTTTTTGTTTTTATTCATGGCCATAGATTTATTTAAATTCTTGTAATATTTCTGCTGAGGAAAACAACAATTACATCA C 9900 9874 TTTCATCAAATCTCAGATGTGCTCAGACACTAACAGGAGCACTAGGCATTTATAGCTGGAAGAATCACAGTACGTTCACC 9980 T 9954 TGCCTTGCAAGATCTGAGGACACAGCAGCTCATTGTCCAGGGAGGGGCTGCCGCTCCATTTCCTTTTGCAGT---CTTGT 10060 C CTC 10031 ATTGCCAGGCCAGTATTTTACTCATTTCTAGAAGAGTGGCGGCCCCTTCTTACCGAGQAAGCCCATGCCTACTGCTTTTA 10140 A T T G 10111 TTTGTAGACATTTAAGCTTCCTTTGGGTAGAACTGGAGGCTTCTCAGTGTCTCCAAACCTGAGGTAGTCCAGCCCCAGGA 10220 A T T T T G A 10191 ACTCCATCTCCCCTATCCCCTCCTCCCTGGCCCACGTTGCCCTTGCACTCACAAAGGCACCCCCCGCCCCCCTTGGTGGT 10300 A T G 10271 GCCACGTGGTCAGCACGCCCTGCAGATCCTATGGGATGTCAGGTTGTAGGCCTGGCGGCCAGTGCCCCTGCTGGCACCTG 10379 T T T T 10351 TGTGCTCACCTTCCTGGTTATCTTTGC~ACTGCAGCCAGCCCCTGGACGTGATCCTTCTCCTGGATGGCTCCTCCAGTT 10459 G Exon29+D C S 0 P L D V I L L L D G S S S

1701

10431 TCCCAGCTTCTTCTTTTGATGAAATGAAGAGTTTTGCTAAGGCTTTCATTTC~GCCAATATAG~GAGTGAGTGAGG 10539 A c c G C P P A S S P D E ~ K S P A K A P I S K A N I 1723 Y 10511 CACCTGAAGCAGCAGGTGACGAAGAGGCTCTTTTTGGGGCTCCACTTGATTCAGAATAATCCCCATTTTCTTGTTCCG~ 10619 T T A G C

von Willebrand Factor Gene and Pseudogene

Biochemistry, Vol. 30, No. 1, 1991 261

10591 TAGGGCCTCATCTCACTCAGGGGTCAGTGCTCCAGTATGGAAGCATCACCACCATTGACGTGCCATGGAACGTGGCCCCG 10699 G T G T G P B L T Q G S V L Q Y G S I T T I D V P W N V A P R V V

1749

10671 GAGAAAGCCCATTTGCTGAGCCTTGTGGACGTCATGCAGCGGGAGGGAGGCCCCAGCCAAATCG~AACGTTGGTGCCAC 10779 E K A E L L S L V D V kl Q R E G G P S Q I c E x o n 3 0 1770 10751 AGGCTGGATGCAGAAGCTGCATTCTGGTTCTTATTTTTGGCATAAGTGACTGTGTGACCTTGGCCAGTCACTTTGCTCCT 10859 C 10831 TGGCCTTAGTTTCTTCCCCTGGAAAGTGAGGGGCTAGATGCTCTTCCACGTCTCTCCAGATCTC~CTGGGTGTTCCTTG T 10939 10911 GAGTCTCTGAATCATTCAGCTTTTAAGTGACTTAAGGATCCACCGT-GAGACAGGGTGTCCAGCCGCAGTCAGTACTGAC 11019 T T G 10990 TTGGTGTGATCTGTTCTCCATCCTCAGGGGGATGCCTTGGGCTTTGCTGTGCGACACTTGACTTCAG~TGCATGGTGCC C T 11099 E x o n 3 1 I G D A L G P A V R E L T S B kl E G A 1788 Y 11070 AGGCTGGGAGCCTCGAAGGCGGTGGTCATCCTGGTCACGGATGTCTCTGTGGATTCAGTGGATGCAGCAGCTGATGCCGC 11179 C A C R L G A S K A V V I L V T D V S V D S V D A A A D A A 1815 P 11150 CAGGTCCAACAGTAAGAATCTGGTGTACAGTCCTCAATTCAGGAGAGCGAGACTTCAGAAATGATAGAGGATT~GTG 11259 R S N 1818 11230 GCCTGGGGTTACTTTTGGATGTTGTGGCTGATATTAAATATGCTCAGTACCAACCTCTGTCCTGGTTTC~CAGCAGGA 11339 G A 11310 A A T T A T C C A A T C A C G A T G T G A T T A G G ~ A C C A A A C A A T A C A 11419 11390 CCCTTTAATTTCTCCGCATTTTCCCTTTGTGTCCAGGAGTGATCATACCTCTATGCCCTTTCTTTCTAAAGTAGGATTCG A 11499 -T 11470 TGGTCTCTTACTAAAATGTCATGTCTGAGGCCAGAGTGATTCATTGTGGGGCCTTGAGAACCAATGGGGACTTTCATGTA 11578 A G 11550 TTTAGAGTCCTTTCCTGGTACCAGGTAATGAGCTTGGCAGGGTTGATGGATGAGAGAC~GGATCATGCTGTAGGAAAG 11658 G 11630 GGAAGGATCCAGATGTCCAGCCTTGTGCTCCATAAGAACATGTGCTTATCATGTTCTTAATATCAACAGGATGTGAGCTG 11738 11710 GGACAGGTGGTAGAGTCTGAGGGACCACCTCCCAGCTGAAGGAGGTCTGGGTGAGTGGAGTGACATTGAGTCCCCTCGTG 11818 11790 GCCACCTGGAGAACTGTTCATCTTTCAGAACCCTGCCTAGGAGTGCTGTCAGAACCCTGTCTTTTCTCTGCTGTCTTCCT 11897 T

T

11870 GATGTTCTTCCTGAGGAAGTTATTTGCTCCGTTCTCTATGTCACATATGAGCTACTAAAAATGTGCAAATC-----GTAT 11977 C E-T

V

V

T

T

T

11945 TGTATTGTATTTGTAACATATCTATTCCTTGCAAAAAACTGGGAGTTCTTTGAAGGCAAGGGAAGATCTTGCTTATTTTT 12057 12025 GTATCATCTGTGTATGATGCACAAAATAGGTGACCATTTAATGTTTGTTGAGTGAATGAATGTGACATGTTTATGAAATC 12137 C 12105 TATAATTTTAGGGTAAAGGACACTAAAAAGGAAAGGATATATATTTGGATGATATAACGAACTTACAAGGGCTG~GC G G A A 12217 12185 TGCTAGTAGAAAATCCATTCCAACCTGAAGCACATTGCAAATTCCTAGAACTTTCTGGAGAGCTGAATTGAGGAATTTCA A G 12297 12265 GGATGGAGATTTTTTCTCCTTTAGTATCAAGAATCCAAAGAAAGATTTTCACTGGG~GGAAGCAGTTGTGGTCTTGAG C A 12377 12345 GAAGCAGATATGATCCAGAGCACGTCATTTAACCATCTAAAGCCTCGGTTTCTCATGTGT~TGAGCATGCACATTCTC A G C 12456 -12425 TGGGCTCAGCCCTGTGATATTTTCTCTTCTCTTCTTACATATACCCTGTTGGTGGTCTAGTTCAGTGTCATGGGTCTG~ cc A 12534 12505 TATATTTATACATCTCTGACTTTGAAATTCGTATCTCCAGCCTCACCTCAACTCAGCTTGCATGCTCT~CAGGC~C C T T T 12614 12585 T A A A C T C C T G A T T T T C A C A T C C G T C T T T C T T A T C T T A G A A G C T A 12694 12665 CCAGCATCCTTGATTCTTCTTCCTTTTTCCTGATACCTCCATGCAAGCCACG~AAACTATAGCTCCATCTTC~TA 12774 T C 12745 CTTTGAGAGTTGCACCTCTCCACACCATGTCCACCAGGGTCACCTCAATCTGAGTCCCCAGCATCTCTCACCTGGACTAT C CA C T C T 12854 12825 TGAAATAGCTGTCTAACTGGAATCCCAGCTTCCACCTTTGTCCCCTGCAGCTCATTCTCAGCACAGCAGCTGCAGTAGCC C A G 12934

262 Biochemistry, Vol. 30, No. 1, 1991

Mancuso et al.

12905 13014

CTTTACTATCAAAGTCAGCCCCTGTCCCTCTGCTCTGTAGTCTTGGCTGG-TTTGTTTCATTCACAGAACATGGTGAGCC A A T C

12984 13094

CTTACTGTGGCCAATGGAGCATTGCCTGATCCATGCCCTTTCCCTCTCTTGCTTGTCCCTCTCTCCCCAGCCCCCGGGAG

13064 13174

CATGCAGTCACACTTCTACTCTGGGGCCTTTGCACTCACTCTTCTCTCTGCTTCGGATGTCCCTCCCTCAGACATCCTCT T G

13144 13254

TGGCTCCTTCCCTCCTTACATCAGGTCTCTGCTCAGATGCCATCTTTTCACCAAATGCTTCTCCAACTGCCATATATACA A T G

13224 ACGATAACCCACACCTCAGCCCTGGGCATTCTCATCCCCCTTATCTCATTTTGTTTTTTTCCTAAGCATTTATCCACATC 13334 13304 TGAAATTCTATATATTTAC----TTTGTGTGTTTGTTGTCTGTCTCTCCATGAGGACGGGGGACAGGGAGGGACTTTATG 13414 TTTG A 13380 TGCTTGGTTCACTGCTGTACCCCTATTGCTGACAATAGTACCTGACACAGAGTAGGAGCTCATTAATATCTGTTGACTGA 13494 T C 13460 A C A T C T T C C T C A T A G G G C T G G T G T A T G T G A C C A G C C T G G A C G T C A G G A 13574 13540 CCAGTCCATTTTGAACCTTCTTGCCCACAGATCCTTTCTTGTCTCTTTGCTGACTCT~GAGTGACAGTGTTCCCTATTG G 13654

Exon32 + R

V T V

P P I

1825

13620 GAATTGGAGATCGCTACGATGCAGCCCAGCTACGGATCTTGGCCGGCCCAGCAGGCGACTCCAACATGGTGAAGCTCCAG A G 13734

G I G D R Y D A A Q L R I L A G P A G D S N ~ V K L Q1852 V 13700 13814

CAAATCGAAGACCTCCCTACCATGGTCACCTTGGGCAATTCCTTCCTCCAC~CTGTGCTCTCGTGAGTCTTATAATAC G GO I E D L P T ~ V T L G N S P L E K L C S 1873 R

13780 13894

CTTTCTTACTTCCCTCA-GTTCCAAAACAAACCCGCCAAAGAAAC~TACTATTCAAGAGACCCCAGGGTG A T G

13859 13974

CCCATGCATAAGATTTGGCCTTGAAAGAAGATTATTATATAGACCAGGACATTCAAGTTTCTGCTCATGGGAGGTGTATA T C T T

13939 14054

CCACAAGCCCCCGTGGGGCCAGCAGAAAGCTTTCTGTGTATGGTCCTATTTCCACTCTAATTGGCCATGTGAACTTGGCA A T

14019 GAGAG--CCTAAGCTTATTTGTGTAGAGGGATCTTGGACAAAGGAATCAGAAGACATAAGCAAGGAACCCCAGAGTTCCA 14134 AG A C G 14097 TAG----AGTCTTCTTCCCCATGGGCCTAGACTCACTCTCAACATGGGTATAAAGGGCTTTAG~TACAATAACATGGA GTAT A G C 14214 14173 GACTCATATCAAAGTACCATAGTTTAAGTTGATTTTAGGTTAGAAACTTAAAAAATACGCTTTTG~CAGGCACGGTGGC 14294 -TTG A 14253 TCATGTCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCACGAGGTCAGGAGATCGAGACCGTCTTGGCTAAC C C T A 14374 14333 ACAGTGAAACCCCGCCTCTACTAAAGATACAAAAAATTAGCCGGGCATGGTGGCGGGCACCTGTAGTACCAGCTACTTGG T C A A G 14454 14413 GAGGCTGAGGCAGGAGAATGGCGTGAACCCAGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGTCACTGCACTCCAGCCT 14534 T G A C T 14493 14614

GGGCGACAGAGCAAGACTCTGTCTCAAAA----------CACACACACACACACACACACACACACACACACATGCTTTT TcA TG CA --- ------ ---- T

14563 TTTG-GGGGGAGGCCTGGGTTTATATATTTTCCCGTTACTAGATGTAAGTCAAAACCTGCAT~AGCTACTGTCTTTCAG 14681 T C G G C G 14642 GGGAATAAGTCAGTGCAAGTTTGCCCTTAAAGGGCAATAACTCTACACAAGTTTTGACTTATAGCAAAGATGGTCACATTAGCTG A TG T A 14761 14722 TACAGAGAGATGGCAGCTCTCCTGGTAGGAATCTTCAAGTAGATCTCTTTCAGGTTTCCAGGATCTTGCTTCATCTCCCC 14841 14802 14921

ACCTTCCCCATCCCTGGCGTGATCTACATGTGAACCAAGATAATGACAGCGTAAGCTGTAGTTATTGCCATATTATCGCT

14882 GTTGTTGACATCATAATTATTAACAACCGCAGAGCATGTCTGAAGAACCACAGGATGACCACCTCAGCCACATGTCCCTA 15001 G T T T 14962 GGTCTCCACTGTTAACCTTGTTCAGATTCTTTTCAGAGTTGAGTTGACTTC~CTAGACCAGGTTGCTTAAGCAGAC 15081 T 15042 ATTGTGAATGGTTCAGAATTTCTGGGTGAAAGATGGGATGGGAACTAAGGTCTTATTTGTGTCTGTTGC~GATTTGTTAGGATT 15161 Exon 33 + R P V R I 1878

G 15122 15241

CGCATGGATGAGGATGGGAATGAGAAGAGG~AAGTTCCTTTCTGTTGACTTTGAAAGAAAGATTAGAGATGTGTTTGGG T G R ~ D E D G N E K R C

1888

von Willebrand Factor Gene and Pseudogene

Biochemistry, Vol. 30, No. 1, 1991 263

15202 GCTCTTGTTCCCACTGTTTAATTTTTCCTCCTTTGGTCTTAGTCCAGCGCTTCCTCTTACTATTATCTTGTTTTTGAGGG G T T C 15321 15282 TCCATCTGTACATCTTGTGTTTTGCTTCCTGTCTCATGTACAGGAGGCCTCCTTGCTGTGTAGGCCCGTGTTCAATTCTA G T 15401 15362 GGGGTCAGTTGTCTGGCAGATGGGCTTAGAGTTGGAGTACCTCATCTTATTCCCTGCCTGAATCTGCTGTTTTCTTCTGC 15481 15442 AGCCCGGGGACGTCTGGACCTTGCCAGACCAGTGCCACACCGTGACTTGCCAGCCAGATGGCCAGACCTTGCTGAAGAGT 15561 P G D V W T L P D O C E T V T C O P D G O T L L K S

1914

15522 CATCGGGTCAACTGTGACCGGGGGCCGAGGCCTTCATGCCCTAACAGCCAGTCCCC~GTTAAAGTGGAAGACACCTGTGG T G G 15641 E R V N C D R G P R P S C P N S O S P V K V B D T C G 1941 L B 15602 CTGCCGCTGGACCTGCCCCTGTGAGTCCTTTGCTTCTCCAGCCAGGGCAGCGTCAAAGGGGCAGTGCTTTTAGCTTGGCT 15721 C R W T C P + Exon 34

1947

15682 GTGCAGAAAAGTAGAGCAGGCACCCACCAGCCCAGAAGTACCCTTTCCCTCATCACCACATGCACAGTGCTACCTTCACT 15801 15762 CACCTTCCTTTCCTCCTGTGCTCTTTGGACATGCATGCAGCCAGTCTCAGGGATCACTGCCCTCTTTCTCTGTCTTTGGA 15881 15842 AGGCACTTCCCCAGATTATGCATAACCGGAAGGAAG~TTGCTTTTCTGAGGTCAGTGCTCAGCTTGGCTGTTGGCAAGT T A 15961 15922 CAACCTTTAGGAATCTATGTATTCAGGGTATAGCAGTGGAAGTATAGCAGTGAGAGAGATGCTTCCAGAAAACTCTCCAA G C 16041 16002 CCACACCTATAGCAAGAGTACATCACCAAGGTGGCCTGAATTAAGAGGAGGAAATATTTAAAGTCCATCTCACGTACACT C C G T 16121 16082 TGTGGAGAGCATTTCATCCATGGGCTGGGTTAGCCTTCTGTCTCATGTTTGGGAGCAATGACTGCTTATGGTGACCATGA G 16201 16162 C T C A G C A G T G G A T T G T A A C - T T T A T G C T G A A C A G G C A T A A A C T G T A T G T T A T A G C T C A T T C C 16281 16241 CCCCTTTAGAATTTATTTAGTGAGATAAATTTCCCTCTTGGTCTAGGTGAAAACTCCACCAGACTTTAGGTTTCTTCTGT 16361 16321 TTATTCATAAAGAATTCTATAGTTACTTTGTCAGAAGACAAAGTCTCAGTTGATAATTAGATACCCACTCTTGTGT~TA C C 16440 16401 ATCAAAACTTTGAATTTGATTAGAGTTAAAACACCTCCCAGTTCCAATTTGCGCTGCCTGCCTCTGGGAAGGAGGGTATG AG 16520 16481 GTCAGCTTTCCTCTGTACAAAGAGCATTGCCTTTTGTGATTAAAAGCATATCCTGGAAATTACCCGATGTCATTTCAAAG T T 16600 16561 AGAGCTTCCTAATACTTCACTGTGTACCATAGGTTACCCAACTGTTCTTCTATGGATGTGCATTTAGGCTGTTTACAGTG G C 16680 16641 TCTTGTAATGACAAGCAATGCTGCAATGAGTAAACCTGCACATACGAA----TTGTCATATGGTTGGAGGTGTGTCTTCA A CGAA T 16760 T 16717 GGGTAGATTCCTAGAAGTGAGATTCTTGGGTCAAAAGATAAGTGCATATGAAGTTCCCCTTAGATATTCCCAGATTCTCC C 16840 16797 TCTAACAGGTGAGATGTGGTTGTGCAAAACTGGAATGTCCATCAGTTTGCTTTTCGATCAGCGAGGTATGAGAGTGCAAG G T 16920 16877 CTTCTCCATGGCCTTGCTGAGTGCGTTGTTACAGTGTTTTTTTTTTTT-----CAAATTTCTTACTTTTCAGGAAAGAAA C T T A A TTTTG 17000 A 16952 TAGTATCTGAGTGTACTTGAAATTCATTTCTCTAATGATGAGTGAGGGTAAACATTTT-TCATATGTTTAATTGCCA~ C T A 17080 I C Line-1 17031 TTTGGTCTAATTTTATGAATTTTCTGCTCATGCCTTGCCCCATTTTCCCATTAGAAGAGTTGTGATAA~C~CATGAG G G A G 17160 17111 ATCTACCCTATTAATAAATTTTCCAGTGTACAATACAATATTGTTAGCTATTTGCAAGTTGTTGTACAGCAGACCTTTAG C T C 17240 17191 AAGTTTTTGTTTTGTGTGGCTGAAACTATGCCCACTGAACAGC~CTTCCCATTTTCCCCTTCCCCTAGCCCCCACAACC A G 17320 17271 ACCATTCTGCTTTTTGCTTCTATGAATTTGACGATTTTAGATACCTCCTGTAAGTGGATTCATGCAGTAGTTATCCTTCT 17400 17351 GTGACTGGCTCATTTCACTTAGCTTAATGTCCTCCAAGTTCATCTATGTTGTGGCATATGA~GGATTTCCTTCTTTCTA 17480 CA 17431 AAGACTGAATAGTATTCCATTGCATGTATATACCACATTTTCTTTATCCATTTGTCCACTGATGTGCATTTGGATTGTTT A 17560 17511 CTAACTCTTGGCTATTGTGAAT~TGTTGCAGTGAACGTGGAAGTGTA--AGTATCTTTTTGAGATCCTGATTTCAATGC T TA C 17640

264

Biochemistry, Vol. 30, No. 1, 1991 17589 17720

TTTTGGATGAATACCCAGAAGTGGGATTGCGGGGTCATATGCTAATACCAGTTTTAATTTTCTGAGGAACCTTCTTACCG A T T T T

17669 17800

TTTTCTGTAGTGGCTACACCACTTTACATTCCTACCATCAATGCACAAGGGTTTCAGTTTCTTCACATTCTTGTTAACAC C G A

17749 17880

TTGGTATTTCCTG----GTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGATAATGGCCATTCTAGCACGTATGAGGTAATA G G

17825 17960

CCTCATTATGGTTTTGATTTGCATTTCCCTGATGATCAGTAATGTTGGATATTTTTTCATATAATTGACTTTTTAGATGT G G

17905 18040

CTCTTTGGAGGAATGTCTTTTCAAGTCCTTTGCTCATTTTTAAATTGGGTTATTTGCATTTTTTTTTTTTTTGCTATTGA

17985 18120

GTTGTGTGAGTTCCTAATATATTTTGGATATTAACCTTTTATCAAATGTACGGTTTGCAAATATTTTCTTCCATTCTGTA A C

18065 18200

GGCTGCCTTTTTATTCTGTTGATTGTTTCTTTGGCTGCACAGAAGCTTTTTGGTTTGATGTAGTCCCACTTGTCTATTTT

18145 18280

TGCTTTTGTTGCCTTGTTTTTGGTAACGATATCCAAGAAATCGTCGCCAAGACCAATGTTGCAAAGTTCTTGTGTTTTCT C T

18225 18360

TTTAGGAGTTTTATAATTTCAGGTTTTACGTATCAGTGTTTAAGTTTGATTTTGGGTTAACTTTGTGTATGATATAAGAT G

18305 18440

AAGGGTCCAACTTCATTCTTTTTCATGTGGACAGGT-CATTGTTAGTGTATAAAAACAAATGGTTTTCTATGTTGATTTT T T TG G

18384 18520

GTATTCTGCAACTGAATTTGTTTCAACTTTGCTGAATTTGCTAACAAATTTTTGGTGAAATCTTTAGAGTTTTCTACACA T C TG

18464 18600

TAAGATTATGTCATCTGCAAATTGAGATAATTTTACTTCTTTTTTGATTTGGGTGATGTTTATATCTTCTTGATTCATTG T C G C

18544 18680

CTCTAGCTAAGACTTCTGGTACTATGTTGAATAGAAGTGGTGAGAGTGGGCATCCTTGCTGTCTTCCTGA~TAGAGGAAA C C

18624 18760

AGCTTTTAGTTTTGCACTATTCAG-GTGATGTTAGCTGTGGTATTTTCATTAATGGCCTTAATTTTGTTGAGGTAATTTC G T T T

18703 18840

CTTCTATTCTTAGTTTGTTGAATGTTTTAATCATGAAATGGTGTTTGAATTTTGACAAATGCTTTATCTGCATCTATTGA A T G

18783 18920

CATGACAGTATGATTTTTATCCTTTGTCCTGTTAATGCTGTGTATCACATTTACTGATTTTTAGATGCTGAGCCATTCTC T T A C T C T

18863 19000

CCATCTGAGGGATAAATCCCACTC----AGAGTGTATGATCCTTTGATTGTGCTGTTGAATTCAGTTTGCTATCATTTTG T CGTC T T G

18939 19080

TTGAGAATTTTTGCATCTATGTTCATCAGGAATATTGACTTGTAGTTTTCTTT-AACGTAAAGTCTTTGGCTTTGGTATC G T T T A

19018 19160

AGGGTAATTCTGGCTTC-------AGGTTTCCCACTTCAATGTTTTGGAAGAGTTTGAGAAAGACTGGCATTAATTCTTC ATAAAGT G G T

19091 TTTAAATCTTTGGTAGAATTATCCAGTGAAAGCATCTGAGCTCTTCTTTGTTGGGAGATTATGATTATTAATTTGATCTC 19240 A A A C C 19171 CTTACTAATTGTGGGGCTGTTCAGATTTTCTGTTTATTCATCATTCAGTCTTGGTAGGTTGTATGTTTCAGGGATTTATT 19320 GA G 19251 19400

CATTTCTTTAGGTTATCCAATTTGTTAGTGTGTAGTTGTTCATAGTAGTCTCTTACTAGCCTTTTTATTTCTGTGCTATC A C

19331 AATTGTAATATAACCTCTTCTATTTCTGATTTTATTTATTTGAGTCTT---TTTTTGTCTTGGTTAATCTAGCTAAAGAT 19480 G G CTC C 19408 19560

TTGTTAATTTTGTTGACCTTTAAAAAACCAATTCTAAGTTTTATTGATATTTTGTATTCTATATTTCATTTATTTTTCCT

19488 19640

CAAATCTTA---ATTTCCTTCCTTCTGCTAACTTAGGGCTTAGTTTATTCATTTTCTAGTTCCTTATGTTGTAAAATTAG TATT C - CA

19565 19719

GTT----GTTTGAGATCTTTCTTACTGTAGACATTTACTGCTGTAAACTTTCCTTTTATTGCTTTTGCTGTTTCCCATAA ATTT A

19641 GCTTTGCTATGTTGTGCTTTAGTTTTTGTTTTTCTCAAAATATTTTCTGATTTTTCTTTTGATTTCTTCTTTGACCCATT 19799 T 19721 GGTTGTTCAAGAGTGTTTTTTAATTTCCACATATTTGAGAACTTTCTGAAATTATTTCTATTAGCAATGTCTCATTCCGT 19879 C TTA 19801 T--GTGGTCAGAAAAGATACTTGGTATGATTTTAGTCTCCTTAAATTTGTTAAGATTTGTTTTGTGGGCTAGCATGTGAA 19959 CT T C C 19879 20039

CTATTCTGGAGACTGTTCTATGTGTGTTTGAGGAGAATGTATAATTTGCTGTTGTTGGGTAGAATATTCTGTATTTGTTT A G

19959 20119

GATAGGTCCACTTGGGCCATAGTGTTGTTTGAGTTATCTGTTTC~TA~GATCTTCTGTCTGGATTTTTTATCCATTAT C A

20039 20199

TGAAAGTGAGATATTACAATTTCTCACTATTATGTTGTTCTATATTTATCATCACTTCTGTCAACATTTGCTTCTTATAT A C T

Mancuso et al.

von Willebrand Factor Gene and Pseudogene

Biochemistry, Vol. 30, No. 1, 1991 265

20119 20279

TTGTGTGCTTTAATGTTGGGTGCAATGTATTTATAATTATATCTTCCTGATGAATTGATGCATTTATCACTATATAATGT C-----G

20199 20353

CCGTTTTTTTGTCTCTTGTGACCATCTTTGACTTAAAGTTTAATTTGTCTAAGTATGGCTACCT-------CTTTTAGTT G C GTGCTGT

20272 20433

AACATTGGCATGAAATATCTTTTTCCATCTTCTCACTTTCAACCTATGTGCATTCTTAAATCCAAAGTGAGTCTCTCTCT A T TGTAGACAGCACATAGTTGGATCTTG~TTTATATTTA~TATTTATTTATTTATTTATTTATTTATTTAATTTATTTAGA T T T T T T T T T T T T V

20352 20513

-

--_---------_-_

L

20432 20578

TGGAGTCTGGCTCTGTTGCCCAGGCTGGAGTGCAATGGCTCGATCTCGGCTCACTGCAATCTATGCCTCCCGGGTTCAAG T c c

20512 20658

TGATTCTCCTCCCTCAGCCTCCTAAGTAGCTGGAACTAGAAGCGCATGCCACCACTCCTGGCTAATTTTTTGTATTTTTA C A G TG

20592 20738

GTAGAGACGGGGTTTCACTGTGTTAGCCAGGATGGTCTTGATCTCCTGATCTCTCATGATCTGCCTACCTCAGCGTCCCA CA C C C

AAGTGCTGGGATTACAGACGTGAGCCACCACACCTGGCTGTTTTATTTTATTTTTAAAAATTCATTTCAGCCACTTCATA C A G c Line-1 el 20752 GCTTTTGATTGGAGAGTTTATTCTATTTATACTTAAAGTAACTACTGATAGGGAAGGACTTACTGTTGCCATTTTGGTAA C TG 20898 20672 20818

20832 20977

TTATTTTCAGTCAGTCTTGTAGCTTTTTTGTCCCTCTTTTCTACTCTTACTGCCCTTCTTTGTGTTTCATTGATG~

20912 21056

TGGGTAATGATGAGATTTGATTCCTTTCTCATTTCCTTTTGTGTATCTTCTATAGATATTTTCTTTGTGGTTACCATGAA

20992 21136

GCTTACTTAAAACAGCCTATACTTATAACAGTCTATTCTAAG

c

c

A

CTTATAACAACTTAATTTCAATTGCTCACGAAAACTCT

21216 21296

TTTTTATAGTGATAGTTATTTTTATAAGTTTGTCTTTTAAGTACTATACCAGAATTC

FIGURE 4: Nucleotide sequence of the v W F pseudogene and gene. Dots above and within the sequence are placed a t IO-bp intervals. The pseudogene sequence is shown in the first line of each comparison. Where different, nucleotides corresponding to the v W F gene sequence are indicated in the second line, below the pseudogene sequence. Regions of the pseudogene corresponding to v W F exons are numbered 23-34. Translated amino acid sequence for the pseudogene is shown in single letter code in the third line. Where different, the translated amino acid sequence of the v W F gene is shown in the fourth line. The 5' boundary of the pseudogene is indicated by double arrowheads, pointing up and down. N u repeats are flanked by arrows that indicate their extent and direction, and flanking direct repeats are underlined. The approximate boundaries of a Line-I repeat near the 3' end of the sequence are indicated (Line-1), and candidate flanking direct repeats are underlined. Simple sequence repeats described in Table 111 are underlined, and the extent of monomer units is indicated by single arrowheads, pointing down. Insertions and deletions necessary to maintain alignment of the two sequences are indicated by dashes (- - -).

Table 1: Divergence between the vWF Gene and Pseudogene substitutions sequcnces compared BP TV TS

TS/TV

CG

-.

TG

gaps Ins

Del

7% divergence

2.38 31 0 0 2.47 2 875 21 50 exons 17816 142 382 2.69 141 24 32 3.25 introns 2.65 172 24 32 3.14 163 432 total 20 69 1 BP is the length of the sequences compared in nucleotides minus the length of any sequence opposite a gap; TV, transversion; TS, transition; TS/TV, the ratio of transitions to transversions; CG TG, transitions that could be accounted for by deamination of methylcytosine; Ins, insertions relative to the gene sequence; Del, deletions relative to the gene sequence. Substitutions are the sum of TV + TS. Gaps are the sum of Ins + Del. Percent divcreence is eaual to [(IOO)(substitutions + gaps)/(BP + gaps)], in which gaps are counted as single events regardless of their length.

-

Table I I : Alu and Line-I Repeats in the Human vWF Gene and Pseudogene Location/ Nucleotides

Repeat/ Orientation

Planking direct repeats

Intron 24 2123-2417

Alu/ inverted

15

Intron 25 3941-4072

Alu/ direct

9

7

*/g5' wg3'

Intron 32 14237-14554

Alu/ direct

8

*5

Length

Source

Sequence

h olg5' WE3'

;

g5

*3 '

g3' Intron 34 20388-20709

Alu/ inverted

Intron 34 17104-20790

Line-I/ inverted

12 11

29 28 27

AGTGCAGTTGTCTCT G AGAAAACTG . .

ACGCTTTT T T C

*/g5' Ug3'

TGTTTTATATTT TGTTTTAT-TTT

Ug?'

TATGTTTAATTGCCATTTTTTGGTCTAAT

$3

83'

G G

C C

ATG AT-

-G -G

flanking this Line-1 element were not identified with certainty; candidate 29 bp repeats are underlined in Figure 4 a n d listed

Table 111: Simple Sequence Repeats in the Human vWF Gene and Pseudogene no. of reoeats pseudoconsensus location gene gene (CAI. 2513-2539 -12 -13 14522-14555 IO 17 -1.5 -7.5 45 14-47 13 CCTGCCGCCCCATCTCTGACCTGCAAG 5539-5624 15 15 (TTTA)n 12 12 20379-20429 4 3 6799-6806 (GT), 4 4 7 37 2-7 3 79 4 4 11353-11360 16 17762-17789 14 4 9416-9444 4 (Y CAGGTC),