Biochemistry 1991,30, 253-269
253
Human von Willebrand Factor Gene and Pseudogene: Structural Analysis and Differentiation by Polymerase Chain Reaction?,$ David J . Mancuso,s Elodee A. Tuley,s Lisa A. Westfield,§ Teresa L. Lester-Mancuso,%Michelle M. L e Beau," James M. Sorace,sJ and J. Evan Sadler*,s Howard Hughes Medical Institute Laboratories, Department of Medicine, and Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 631 10, and Joint Section of HematologylOncology, Department of Medicine, University of Chicago Medical Center, Chicago, Illinois 60637 Received August 3, 1990; Revised Manuscript Received September 1I , I990
ABSTRACT: Structural analysis of the von Willebrand factor gene located on chromosome 12 is complicated
by the presence of a partial unprocessed pseudogene on chromosome 22qll-13. T h e structures of the von Willebrand factor pseudogene and corresponding segment of the gene were determined, and methods were developed for the rapid differentiation of von Willebrand factor gene and pseudogene sequences. T h e pseudogene is 21-29 kilobases in length and corresponds to 12 exons (exons 23-34) of the von Willebrand factor gene. Approximately 21 kilobases of the gene and pseudogene were sequenced, including the 5' boundary of the pseudogene. The 3' boundary of the pseudogene lies within an 8-kb region corresponding to intron 34 of the gene. The presence of splice site and nonsense mutations suggests that the pseudogene cannot yield functional transcripts. The pseudogene has diverged -3.1% in nucleotide sequence from the gene. This suggests a recent evolutionary origin 19-29 million years ago, near the time of divergence of humans and apes from monkeys. Several repetitive sequences were identified, including 4 Alu, one Line-1, and several short simple sequence repeats. Several of these simple repeats differ in length between the gene and pseudogene and provide useful markers for distinguishing these loci. Sequence differences between the gene and pseudogene were exploited to design oligonucleotide primers for use in the polymerase chain reaction to selectivity amplify sequences corresponding to exons 23-34 from either the von Willebrand factor gene or the pseudogene. This method is useful for the analysis of gene defects in patients with von Willebrand disease, without interference from homologous sequences in the pseudogene.
-
H u m a n von Willebrand factor ( V W F ) is ~ a complex multimeric plasma glycoprotein that is required for platelet adhesion to damaged endothelium and for factor VI11 survival in the circulation [reviewed in Sadler (1989)l. Deficiency of v W F results in von Willebrand disease (vWD), which is the most common inherited bleeding disorder in man if all degrees of severity are included (Rodeghiero et al., 1987). Many deficient phenotypes have been identified (Ruggeri & Zimmermann, 1987). The human v W F gene has been cloned, and its structure was reported in detail recently. It spans 178 kb of DNA and contains 52 exons. A high-resolution restriction map was constructed for -75% of the locus, and -20% of the gene sequence has been reported (Mancuso et al., 1989b). I n the course of characterizing the vWF gene, restriction fragments from the center of the v W F cDNA were found to hybridize not only to the v W F locus on chromosome 12 but also to sequence on chromosome 22 (Shelton-Inloes et al., 1987;
-
'This work was supported by the Howard Hughes Medical Institute and (M.M.L.B.) by the University of Chicago Cancer Research Foundation and the Leukemia Society of America. *The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank under Accession Numbers M36184 and M36185. * T o whom correspondence should be addressed at the Howard Hughes Medical Institute, Washington University School of Medicine, 660 S. Euclid Avenue, Box 8045, St. Louis, MO 631 10. 9 Washington University School of Medicine. 'I University of Chicago Medical Center. Present address: Chief, Blood Bank and Hematology, Laboratory Service ( 1 I3), Veterans Administration Medical Center, 3900 Loch Raven Blvd., Baltimore, MD 21218.
0006-2960/9 1/0430-253$02.50/0
Mancuso et al., 1988). This cross-hybridizing locus proved to be a pseudogene for vWF. The presence of this pseudogene intereferes with analysis by Southern blotting of gene defects in patients with vWD and has caused some difficulty in the identification of restriction fragment length polymorphisms for the gene. The v W F pseudogene corresponds to a region of the vWF gene that encodes important functional domains of the protein and also has an interesting evolutionary history. The human vWF precursor consists of four types of repeated domains, present in 2-5 copies each in the following order: D1-D2D'-D3-Al-A2-A3-D4-Bl-B2-B3-Cl-C2 (Bonthron et al., 1986; Verweij et al., 1986; Shelton-Inloes et al., 1986). The segment of the gene that is duplicated in the vWF pseudogene encodes domains Al-A3. The v W F A repeats appear to be homologous to segments of several proteins: complement factor B (Shelton-Inloes et al., 1986; Sadler et al., 1986; Mole et al., 1984), complement component C2 (Bentley, 1986), cartilage matrix protein (Kiss et al., 1989), all three a-chains of collagen type VI (Bonaldo et al., 1989, 1990; Chu et al., 1989; Koller et al., 1989), the a-subunits of the leukocyte adhesion receptors Mac-1 (Arnaout et al., 1988; Corbi et al., 1988; Pytela, 1988), p150,95 (Corbi et al., 1987), LFA-1 (Larson et al., 1989), and the a-subunit of the platelet collagen receptor (Takada & Hemler, 1989). The present distribution of v W F A domain containing genes may have arisen in part
'
Abbreviations: bp, base pair(s); kb, kilobase(s); PCR, polymerase chain reaction; vWF, von Willebrand factor: vWD, von Willebrand disease; R, a purine nucleoside, adenine or guanosine: Y , a pyrimidine nucleoside, thymidine or cytidine.
0 199 1 American Chemical Society
254 Biochemistry, Vol. 30, No. 1, 1991
Mancuso et al.
by repeated gene segment duplication and exon shuffling (Gilbert, 1978). The A domains of vWF contain binding sites for heparin, collagen, and platelet glycoprotein Ib (Mohri et al., 1988; Fujimura et al., 1987; Pareti et al., 1986; Roth et al., 1986). One can speculate that the homologous A domains in other proteins might also recognize one or more similar ligands. In particular, the collagen-binding properties of the platelet collagen-receptor might be dependent upon the presence of the v W F A domain in the a-chain. We report here -21 kb of DNA sequence for the v W F pseudogene and for the corresponding DNA sequence of the vWF gene, including the 5’ boundary of the pseudogene. the pseudogene is -21-29 kb in length and has diverged -3.1% from the gene. Sequence differences between the gene and pseudogene were exploited to selectively amplify segments of either locus by using the polymerase chain reaction (PCR). This method permits the analysis of gene defects in patients with vWD without interference from the pseudogene. EXPERIMENTAL PROCEDURES Materials. Restriction enzymes, T4 DNA ligase, T4 polynucleotide kinase, and T4 DNA polymerase were purchased from Bethesda Research Laboratories or New England Biolabs. The Klenow fragment of Escherichia coli DNA polymerase and calf intestine alkaline phosphatase were purchased from Boehringer Mannheim. Deoxycytidine 5’[ L U - ~triphosphate ~P] ( [ c + ~ ~ P ] ~ C Tdeoxyadenosine P), 5’-[a35S]thiotriphosphate( [35S]dATPaS),and [y-32P]ATPwere purchased from Amersham. Taq polymerase (Amplitaq) was purchased from Perkin-Elmer Cetus. Placental DNA was purchased from Sigma. Isolation of u WF Pseudogene DNA Clones. The isolation of clones X6, pWE64, and pWEl13 was described previously (Mancuso et al., 1989b). The bacteriophage X Charon 4A library was derived from fetal liver and was provided by Tom Maniatis (Harvard University, Boston) (Lawn et al., 1978). The pWEl5 cosmid library was derived from human placenta (pWEI5) and was provided by Glen Evans (Wahl et al., 1987) (Salk Institute, San Diego). Chromosomal Localization by in Situ Hybridization. Human metaphase cells were prepared from phytohemagglutinin-stimulated peripheral blood lymphocytes. The vWF cDNA fragment of plasmid X3/SacI-2a, which contains the 5’ 3.8-kb Sacl-Sac1 fragment of XHvWF3 (Shelton-Inloes et al., 1987) cloned in the Sac1 site of pUCl8, was radiolabeled by nick translation of the entire plasmid with all four 3H-labeled deoxynucleoside triphosphates to a specific activity of 1 .O X 10’ dpm/Fg. In situ hybridization was performed as described previously (Le Beau et al., 1984). Metaphase cells were hybridized at 2.0 and 4.0 ng of probe/mL of hybridization mixture. Autoradiographs were exposed for 1 1 days. Polymerase Chain Reaction. The polymerase chain reaction was performed (Saiki et al., 1988) with a Perkin-Elmer Cetus DNA thermal cycler. Typically, 100 ng of a cosmid or bacteriophage X DNA isolate was amplified with 2.5 units of Taq polymerase for 30 cycles of the following: 30 s at 94 OC (denaturing), 30 s at 50-60 OC (annealing), and 1 min at 72 O C (extension), with a final extension time of 5 min at 72 OC. Human peripheral blood leukocyte DNA was amplified similarly with the following modifications. Reactions without enzyme were preheated at 99 OC for 5 min and then at 96 OC for an additional 5 min after addition of Taq polymerase. DNA was amplified for 30 cycles of the following: 1 min at 96 “ C (denaturing), 1 min at 53-55 OC (annealing), and 2 min at 72 OC (extension) with a final extension time of 5 min at 72 OC.
-
22 Chromosomal localizationof the human vWF pseudogene. Shown in the figure is the distribution of labeled sites on chromosome 22 in normal human metaphase cells from phytohemagglutininstimulated peripheral blood lymphocytes hybridized with a vWF cDNA probe. Of 38 labeled sites observed on chromosome 22, 23 (86.8%) were clustered at 22q11-13, with the largest cluster at 22qll-12. FIGURE I :
Computer-Assisted Methods. Computer programs were those of the Genetics Computer Group (Madison, WI) or Compugene (St. Louis, MO). Nucleotide sequences were analyzed and compared to the Genbank Genetic Sequence Data Bank (BBN Laboratories, Inc., Cambridge, MA, release 62.0, December 1989) and the EMBL database (release 19.0, May 1989) with the program FASTA (Lipman & Pearson, 1985). Other Methods. Restriction mapping and Southern blotting, DNA subcloning and sequencing, and preparation of peripheral blood leukocyte DNA were performed as described previously (Mancuso et al., 1989b; Grunebaum et al., 1984).
RESULTS Chromosomal Localization of the u WF Pseudogene. A restriction fragment from the middle of the v W F cDNA, corresponding to amino acids 144-544 of the mature v W F subunit (Shelton-Inloes et al., 1986), was hybridized to human metaphase chromosomes. Specific labeling of chromosomes 12 and 22 was observed due to the presence of the vWF gene and pseudogene on these chromosomes, respectively. Of the 38 labeled sites observed on chromosome 22, 36 (94.7%) were clustered at bands qll-13 with the largest number of grains at bands ql1-12. This cluster of grains represented 16% (38/238) of all labeled sites (p 64% identical residues between pseudogene nucleotides 17104 and 17800. The inverted Alu repeat repeats between pseudogene nucleotides 20377 and 20724 apparently is inserted into the Line-1 element. Direct repeats
-
-
-
256
Mancuso et al.
Biochemistry, Vol. 30, No. I , 1991 1 GAGCAGAATCTTTGGGGAAGGGAGGGAATGAGTCATGCCAGTATCTAGGGGAGAGAGTTTAGGCAGACGGAACMGCACG
81 TGCCGAGGCCTTAAGGTGGTAGGTGTCTGCAGCCGTTGTACAAGGAGGTCAGTGTGGCTACAGAGGAGGGACTTGGGGGT
1 GACCTTAAAATGGAGAGATTATCCTGAATTATCTAGGT 161 AATGGTAGTTCACGAGGGCAGGGAGGGGCAGTAGCCATACCATGGGGGCTACTAGGCCATGGGAAGATCGTTGGCC~TT 39 GGGCCCACTGTAGTCACAAAGAACCTTCAAACAGGAAGGCCGAAAGACCAGAGTGAGTAGAAGAGCTTGGGG~CTGG~ 241 GCCCAACAGTGAGAACATTCTTCCTCTATGTTTCCTGGTGACTTACACCCCTTGTACCAAGCTGAATGCAGCTTATTG~ 119 GCAAGATAGAATACATTTTTTCTCTTTGTCTCAACATCAGGGCTCTAAGTATATTGTAAAAGTATAGCACCTTTTCTTCT 321 199 CCAGGCCAAATATTTCCAGTTTTTTCTGTAGTTTTTTTTTTGTTTTATTATTATTTTTAGTTGACATTTAACTGTATGTA __401 279 TTTACTGGTGACAGTGTGATATTTCAATACATGTATACAGTGTGTAATGATCAAATCAGGGTAGTTAGTATATCTC~CC 478 G G C G 359 CTCAAACATTTACCATTTCTTTGTGTTGGGAACATTCAAAGTCTGCTCTTCCAGCTACTTGAAAATATACAATAAACTGT 558 439 TGATAATTATAGTCACCCTGTGGTGCTACAGAACACTAGAACTTATCCCTCCTCTTTGTGTAGTTTTCACTTGTTTCTCA 638 C 519 CTCATTCATAACTACTTTTTCAGAACTTGTTGCATGTTGCACTCCATGATGGGTGAACATGGAGACAGGTGAGACATATC 718 G 599 TTCCTGGTTACATTTATGATAGAGCAGAATTCTGCCCAGCCTGGTCTGACAGGCTGCGGTTCTCCAGGCAAGGGGACTC 798 A G AC AC 679 ACCTGCCCAGAGGCAGGCAGTTCTATAACGTTCCCTCCTTGGACCCCTACACCTCCATTGGTGGAGCAGTGTACCTGGCT 878 T T 759 CCTCTCTCTGCTTCTACGCTTGTTGGGGACAGCGATAGTTCCCACCAGTGATCTCAGGGCCAAGGCTGCCTGATTCCCAC 958 T T 839 TTCTGCCCTTGGCTGACTATGTGACATGGGCATGTTGCCTCTCTTTGTTTCCATAGCTTC~TAAAATGGGGCCAGCM -T 1038 C 919 GGAAGCTCAGGAATAGGGCTTGGCAATGGCAAGGCTTTGCTGCTCACCTCGGGCCTCCTCTGAGTCTCTGTCCCGCTC~ G T 1116 999 CCTCCTCTCCCTCGAATGCCCTCTGCCTCCATTGCCGCCAGGAACGTTCCCCTTTCCCCTGAGCCAGAGAGCATGCCC~ 1196 T T G T 1079 GGGCTTGACGGTGCTCATCCCTCAACTTGTCTCTCAA~GAGAAAGTGTGTGGCCTGTGTGGGAATTTTGATGGCATCCAG 1276 Bxon23+E K V C G L C G N P D G I 0
1003
1159 AACAGTGACCTCACCAGCAGCAACCTCCAAGTGGAGGAAGACCCTGTGGACTTTGGGAAGCTCCTGG~GTGAGCTCGCA A 1356 N S D L T S S N L O V E E D P V D P G N S V K V S S Q 1030 N 1239 GTCTGCTGACACCAGAAAACTACGTCTGGGTCTCTGTGTGGACAGAGCCCGAGAGCTTGCTTCCTGGAATGTCCCTCTGT G T 1436 S A D T R K-
c-
1036
C 1319 CCCCATTCTTATGGGGGCTGGAAGGGGGGGTTGTGGGTGGTATGACCTTCAGGTGGCTGCAGGGTGGGAGGAGGGTCTCT 1516 G C C 1399 TGGATCCTTCTGGGCTGAATAACCCCAGTTTGACCAGCTGACGGCTGGCCTATCTCTTGCCTGGTTCCC~GTGCCTCTG 1596 Exon 24 + V P L
1039
1479 GACTCATCCCCTGCCACCTGCCATAACAAGCATCATGAAGCAGACGATGGTGGATTCCTCCTGTAGAATCCTTACCAGTGA 1676 D S S P A T C E N N I ~ K O T ~ V D S S C R I L T S1066D 1559 CGTCTTCCAGGACTGCAACAGCTGGTGAGGACCTTGAGGGTAGTGGGAGCAGATGGTCCCAAGGCTTGGCCCAGTGGT 1756 C TG V P O D C N K L 1639 ATGGACACAGAGTGTGACCTTCTAACGTGGACACTACCCTTGTGTCTTGACATGATCTGCACCAAGACACCACTTTAGCT 1836 C CG 1719 TTTTTTTCTTGGCTTTCAATCTGGGAAACAAAAAGTAAAATC~CAGTTTCTAGGGGAAGCAATGCCTGGC~CATTT 1916 1799 CCTTCTGCATGAGAAGTAACTCCCCTTGGCATGTGCCAACGTTTCTCTTTCGGCCCCAGTCTTAGGATTTGTTCTCTTAT T C A 1995 1879 GGAAGTATCTTGTTTTCAACACCAGAGCCAGAGATTTCCTTTTCCTGTCACTGCTGCATTTGTCCAGACC~GACCGT T 2075 T 1959 CCTCTCCCACTCCTCAAAACCCCTTGGTGCCCATTTCTTGTCTCACAGAAATTCTTTTCTGGCCTTACTTTTGGTGATTT 2155 C CT A 2039 TGAGTCCTCGTATTATGACTTATTTTTGTGTCTTCATCTCTAATGACAAGGAGGAACTCCCCAGGGTTAAGTGCAGTTGT T G A 2235
1074
Biochemistry, Vol. 30, No. 1, 1991 257
von Willebrand Factor Gene and Pseudogene
CTCTCTTCTTTTTTTTT------GAGACAGAGTCTCGCTCTGTCACTCAGGCTGGAGTGCAATGGCATGATCTTGGCTTA 2119 T TTTTTT C 2315 2193 GTGCAGCCTCTGCTTCCCGGGTTCAAGCGATTCTCCTGCCTCAACCTCCCGAGTAGCTGGGATTACAGGTACCCGCCACC C T C C C T 2394 2273 ACACCCAGCTAATTTTTGTATTTTTAGTAGAGACGAGGTTTCACCACGTTGGTCAGGCTGGCCTCGAACTCCTGACCTCA G T A T T 2474 2353 GGTGATCTACCCACCTGAGCCTCCCAAAGTGCTGGGATTACAGGTGTAAGCCACCATGCCCGACCA~AGTGC~GTCT CG G 2553
CTTATCTGCTGCATGTATGTGTGTACACACATGTGGACATGGGCTTGTACATGGACATATGTGACTACAGAAGTCTATGT 2433 2633 G 2513 ACACACACACAGCCATGCACACATACATGTTGCTGTGTAGCACCACCAATGTGGGAGAGACTGGTTG~CATGGATCT __ C A 2713 2593 CAATTCTCTTTCTATCTATGCAGTGATTTTCAGTCTCTGAGGACTCCAAAGGATACTTACATTCCC-GGTTTGGTGG~ G T 2791 2672 TCCTGGGCATCTGAGTTGGAAGAGTGAGGACAGGGGAGGAGCTGGGGACATTGGGACTGTTGG~CGTCTTGGAAAC~T T A 2871 2752 GACCCACTCAGTGTCTGAATTCATTTTCTGTCATAACTGCCCTG~GTCCAGTGTGTTTAGAGGCGTGT~TGGG-A 2951 G 2831 TGAGGAAGGGTGGGAACTGGTTGAACTGGATTTGGAATCAGAGTCTAGGCCCTATTGTTCTGCATACCTGCCCCATAGCA 3031 T C 2911 CTGCAGTGAGGCAGCTGCAGGGCTTGATGTATTTCCCCATTCTGCTTGCTGAATTGAGGCAAAGAAAGACAGTGACCAGC 3111 C CT 2991 ACATATGTGTGTTTGTGTTTTTGTAAAAACACCCACATGCTCATGAGGCCAAGAGTGGGTTGTGAGGACAGATGGGTGGC 3191 G T 3071 TGAGCAGGGAGGCAGGCAGAGGGACAGAGGGAATGTTCTTCTGGAATATCCTCAGGCTCATTGTGTTCTGCAGAAGGCCA 3271 T A 3151 GCAGCACTGCATTATTCAACTCTTCTTGCTGGAATGCAGATTAGAAACTAAGAATCTTGCCTTCCCACTCATTCCCTCTT 3351 3231 TGAGACCATTGAGCTACATTTCTCCTTCTACCTGGACCCCCCTTATCCTTAAATTGACCATCAGAACATTTGCACCCAGA 3431 G 3311 CTAAGAGCCAGAGTTTCTGACACCTGGCCATAGGCCTGGGCCACCTGAGGCTACCTTTGC~GTGGACCCCGAGCCATAC C G T 3510 Exon 25 .* V D P E P I 1080 3391 CTGGATGTCTGCATTTACGACACCTGCTCCTGTGAGTCCATTGGGGACTGCGCCTGCTTCTGCGACACCATTGCTGCCTA 3590 L D V C I Y D T C S C E S I G D C A C P C D T I A A Y 1107 3471 TGCCCACGTGTGTGCCCAGCATGGCAAGGTGGTGACCTGGAGGACGGCCACATTGTGCCATGAGTCCTGACACCCTCATG 3670 E A G A E V C A Q E G K V V T V R T A T L C
1126
3551 TTCTCAGATGCCCTCCCTTCTTCCCATGTGTCTATGCTTGAAGACCTTGTGAGTGCAGGGGGATATCTTCATGGGCGAGA 3750 3631 GAAGACAG--TGCTAGTATGGGACTGGATGGCCAAGGGCTAACGAGGGTTAAGACGTTGGCTGAGTAAGAAGTTTATATT AG T G A T 3830 3709 ATGGACTTTGAGGCGGGACATGGATTCAAGGCATGAACATGTGGATACTTTTCTTCTGGAGAGATTCTGGCAGAGGAGAA T G G 3909 C G--3789 GAGGGAATACTGATGAAAAGAAGGAAGTTGATTTATGTCTTTAATTAGGCGTGATTATATTTGCCAATATGAGTGATC~ C 3986 3869 CTCATACATTCATGTCTATAGAATCTTCCTTCTTTGGACATAAGCAACTTAATGCTTTTATGT~C~GGCCAGGC G G T -G 4066 3949 ACAGTGACTCACACCTGTGATACCTGTGTTTTGGGAGGCTGAGGCCAGAGGATTCCTTGAGCCCAGGAGTTCAAGACCAG T A C 4146 4029 CCTGGGCAACATAGCAAGACCCCATCTCTACAAAAATT~GAAATGAATTCAGAACCAATAGATTCTGGTTTAGG 4226 G 4109 TGCTTCAACAATCCAGAAGTCTCTAATATTGGTGACGCCCATAGTCCCTTAGTTCCCCAACATTATCTCCAGATGCTGCA C GC 4306 4189 GGCCATCACCACATGGGTCTGCAGTCCTGGAGGCTTTGCCTTTTGTGGCCACAGCCTTGTCTCCTGTCTACAC~CCCAG G 4386 Exon 26 .* P Q 1128 4269 AGCTGCGAGGAGAGGAATCTCCGGGAGAATGGGTATGAGTGTGAGTGGCAATAGAACAGCTGTGCACCTGCCTGTCAAGT C GC T 4466 S C B E R N L R E N G Y E C E V Q * N S C A P A C Q V 1155
Y
258
Biochemistry, Vol. 30, No. 1, 1991 4349 4546
Mancuso et a].
TACGTGCCAGCACCCTGAGCCGCTGGCCTGCCCTGTGCAGTGTGTGGAGGTCTGCCATGCCCACTGCCCTCCAG~GAGG C T A G T C O E P ~ P L A C P V O C V E V C E A E C P P G
1179
4429 4626
CCTCTATCCCTGGGGGTCGGGCTGGTGGGATGGGATAGGGATGGATGGAAAGGTGCTTCTAGGTCTTGCTTCATCTCAGC A
4509 4706
CTCCAICCTGCCACATCCCATCTCTGACCTGCAA~CCTGCCACCCCATCTCTGACCTGC~~CCTGCCACCCCATCTCTGA _________________________________________----~
4589
C C T G C A A C I C C T G C C G C C C C A T C T C T G A ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ C ~ ~ ~ ~
G
T
---___________________________________
4669 4740
ACCTGCAA~CCTGCTGCCCCATCTCTGACCTGCAAG~CTGCCGC~GGTTCCATGGGTTCTTTCATCAGAGTCAGGCCAGT ___________________-________________ G T G A
4749 4784
TGTGATTTTTCTCAAGTCGAGCTCCTCCAIVLATGCTTTTCTGTGCCTATTTGTGGCATTCTCATCTAAAGCAGCCCCTGC C A G C
4029 4864
CGATAGAACTTTCTGCAGTGGGGG~TGTTGTATTGAATGTAACTGTGACGAGTGTGTCTGAGGAACTGAATTTTTGAAT G
4909 4944
TTTATTGAATTTTACTTAATTTAACTTAAATAGCCACATGCAGTTTATGACTATCCTATGGTACAGCACAGCTCGAGAGT G C G G T
4989 5024
TACTGAAGGTGTTATTCAGAAGAGCAGAAAGAGCCCTGGAGATAAGATTTCATTTGTCCTGAGGCTTGGGGAGGTGAGGT C
5069 5104
AGGATGAAGGAATCCCCGCTCCCAGTTTTGCAGAGGGATCAATCAAGGCACAAGGAGGAGAGATGCTCCTTGAGTGGCAG G C ATG
5149 5184
GGTGACCCCTAGGGGTGCAGGCAGGAGGAGTTGGCTTCTAGGGCAGGAGGAGGAGTTGGCTCCTCCCTTTTAGTT~ G A
5229 5264
TGAGACTTCCTTGTGGGAAAGGGGAGCGTTTTGGTTCCTAATGAGAGCTTTCTTTTGC~GGAAAATCCTGGATGAGCTT G C Exon 27 + G K I L D E L 1186
5309 5344
T A
C
G
L Q T C V D P E D C P V C E V A G W R P A S R K K V T l 2 1 3
R
G
5389 5424
CTTGAATCCCAGTGACCCTGAGCACTGCCAGATTTG~AIVLACAGATGCCTGGATTGTTTG~GTGACGAATCTTATTGC T G T L N P S D P E E C O I C 1225
5469 5504
TTCTCCAGGGGCCAGCTCAG AG ACTAGGTTTTGGGT T
AAAA
5549 5584 5629 5664
TGGACTAGACTGCTTCCTCCACATATATTCCTCTACAATGCAGGTTTTT
5709 5744
GCTTTTCTCACTGAAGGTGGGGGTATGGTCACTTATTGTCCTTGTAGGTCATGACTAGTCTGAGCACAGGTGTCCATAT~
5789 5824
CTGGAGGTGGCGACTCAGGAGAGGAAAGTGAGATGGAACGCGGGGCCCTCAGGGTAACATGAACGTCGTGGGGGCCATAG G C A
5869 5903
TCAAGGTGGCAGGGTGTGGCTGTCATAATGGGGAGGGAAGGTGCTCGTTGCTCTCACCCTGGGTTCTCACTGGACTTGCC
5949 5983
TCTGGGAAACTGGCCTTTGGGGAGGTGAGAAGGAGTCCTCAGGGTGAGCTCTGAGCCCTCTGAAGTCCTGCTCATAAGTA
6029 6063
GGGAACATTTATTAAAATCCACATCTATTCAACACCACCACTCCCTGACACTGATCAGGCCTTCTGGGAGCTT~CTTGAGA G G A A
6109 6143
ATGTAAATTCGCTGGGGAGGGGCTATGGTCACAGCTGCCAGAGTCAGATCCTGCCACCACCATCTGCAC~CAAGAACG C T G T T
6189 6223
AACACTGCCTGGGGTGAGTTTTCTAGCAAACACAGGAAACTAGAAAGTATGTCACTCACGCACCGTTAACCAAGGATTGA T T
6269 6303
AGCAATAAATCTCATGGATCATGGGAAGGAGGAATTAACTTCCTAGTAAGACTCTACCTGCGCACACTCTTTATG~GAA TG C
6349 6383
ACAGGGACTGTAGGGAGCAAAGGATTGGGTGATTCTGATGCTGGCTTGGGGTTCTGTGCC~GGGCATGAGAGCTTCG A C
6429 6463
AGCCCTCTCCCCACCTCCTTTGCTGGCACACGGGCAGGGGCAGCTGCCACGGTGCACGCTGAAGCTGC~GCAGTGTGTG
6509 6543
GACAGAAGTCCCACAGCAGGCTGGGGGCATCACCTGGTATCAAGCCACAGGGGGACACTTCCCAGGCTCAGGGCACTGCC T G G
6589 6623
GCCTTGTCAGCTTTACTGGGAGTGCTGAAGGCCAGGTGCCTCCAAGGCTGAGAGTCCAAAGAGGAGATGGTCCTGGCCCA
C
G
C
von Willebrand Factor Gene and Pseudogene
Biochemistry, Vol. 30, No. 1, 1991 259
6669 TATTGGTCTCCACCACATGGCCACTTGATTTCACCTGTGGGCTGGCAGGCCGTGCCACTTTCCGGGACACTGGC~G C T 6703 6749 GTGATGTTACCAGCAGAGCCTCGAAATTTGGCATAGTAGTGTAGGAATTTGTGTGTGTGGGGCCACTGCTCTACTACAGA G G G 6783 6829 CTTGTGGTGTGTGCTTGGGGCTGGGCACAGAGCCTGGCCAATACCTGGCACATGAGAGGTGCTCACCCTATGTCTGCTGG G 6863 6909 AGGAATGAGTGAATGTCTTTTCTTAGGCCTAACACCTTGGGGACCCCTGTCTGCTACTCCCAAATAATTTGTTCTGCTAG G G A 6943 6989 ATAAAGCTGGGGTTTTGAATCTTTATTCATCTTTCCACAATGTTTTTGCTCCAATTTTTTCTCTCTTCGTTAGGGCCCAA T 7023 7069 CGTATTAAACATGTTTTACTCATGCAATTTACATTCTGTTAGAATATGTACTGTAAAGCATTTATCTTCAACAGAAC 7103 A C 7149 ATGTACAAGCACAGGTTAGCGTCTACATTTTAGTAACATTTGTTCTTGGAGACACTTGTAAGAAGGCTTAGATTATAATT A G 7183 7229 AAGTATTCATGTTGCTAAAGATGTAGGTTGAAAAGAAATCAGCTTTGAGTGGCTGTGTGGATGGCCTAGAACAACGAGTA T 7263 7309 TATTTTTTCATATCCTAGGATGGGCTAAAGCTTTGGTTTCTAGAATCAAAGGAAGTCGGCTATGTGTGTGTTTTGAAGTG GT 7343 7389 GGGGGGCATGCTATTTGGGGACAGATGTTAAACAGTGACATCTCACTTGGGTGTGGAATGGTCCGTGGGATCTCAAGTTC A A A 7423 7469 AGGTGGAACAGAGGAGATTCTGTGGGAATATGGAAGTAGTCATACACTGTAGGGCTCAGAAGTGTCCATAGGTTCTTTCT CA TG C C 7503 7549 GAACCATTTTAATTTCTTCGCTCTTTTCTGC~CCACTGTGATGGTGTCACCCTCACCTGTGAAGCCTGCCAGGAGCCGG T A 7583 Exon28iE
V T L T C E A N
C D G
V
C
Q
E
1240
P
7629 GAGGCCTGGTGGTACCTCCCACAGATGCCCCGGTGAGCCCCACCACTCTGTATGTGGAGGACATCTCAGAACCGCTGTTG G G C 7663
1267
G G L V V P P T D A P V S P T T L Y V E D I S E P L L P
7709 CACGATTTCTACTGCAGCAGGCTACTGGACCTGATCTTCCTGCTGGATGGCTCCTCCAGGCTGTCCGAGGCTGAGTTTGA G 7743 E D P Y C S R L L D L I P L L D G S S R L S E A E P E l Z 9 4
V
7789 AGTGCTGAAGGCCTTTGTGGTGGACATGATGGAGCGGCTGCGCATCTCCTAGAAGTGGGTCCGCGTGGCTGTGGTGGAGT C C 7823 V
L
K
A
P
V
V
D
~
~
E
R
L
R
I
S
*
K
Y
V
R
V
A
V 1320 E
V
Q
7869 ACCACGACGGCTCCCACGCCTACATCGGGCTCAAGGACCGGAAGCGACCGTCAGAGCTGCGGCGCGTTGCCAGCCAGGTG 7903 A
1347
Y E D G S E A Y I G L K D R K R P S E L R R V A S Q V I
7949 AAGTATGCGGGCAGCCAGGTGGCCTCCACCAGCGAGGCCTTGAAATACACACTGTTCCAAATCATCAGCAAGATCGACCG T T 7983 R Y A G S Q V A S T S E A L K Y T L F Q I I S K I D R l 3 7 4
V
P
8029 CCCTGAAGCCTTCCGCATCGCCCTGCTCCTGATGGCCAGCCAGGAGCCCCAACGGATGTCCCGGAACTTTGTCCGCTACG C 8063 P
E
A
P S
R
I
A
L
L
L
~
A
S
Q
E
P
O
R
~
S
R
N
P
R 1400 Y
V
8109 TCCAGGGCCTGAAGAAGAAGAAGGTCATCGTGATCCCGGTGGGCATTGGGCCCCATGCCAACCTCAAGCAGATCCGCCTC T 8143 V Q G L K K K K V I V I P V G I G P E A N L K Q I R L
1427
8189 ATTGAGAAGCAGGCCCCTGAGAACAAGGCCTTCGTGCTGAGCGGTGTGGATGAGCTGGAGCAGCAAAGGGAAGAGATCGT 8223 C A C I E K Q A P E N K A P V L S G V D E L E Q Q R E E I V l 4 5 4 S D
8269 TAGCTACCTCTGTGACCTTGCCCCTGACGCCCCTCCTCCTACTCTGCCCCCCCACATGGCACAAGTCACTGTGGGCCCGG A 8303 S
Y
L
C
D
L
A
P
D
A
P
P
P
T
L
P
P
E
~
A
Q
V
T
V
P 1480
G
E
8349 GGCTCTTGGGGCTTTCGACCCTGGGGCCCAAGAGGAACTCCATGGTTCTGGATGTGGCATTTGTCCTGGAAGGATCGGAC 8383 G G C G
L
L
G
L V
S
T
L
G
P
K
R
N
S
~
V
L
D
V
A
P
V
L
E
G
S
D1507
8429 AAAATTGGTGAAGCCAACTTCAACAGGAGCAAGGAGTTCATGGAGGAGGTGATTCAGCGGATGGATGTGGGCCAGGACAG 8463 G R
I
G
E
A
N
P
N
R
S
K
E
P
~
E
E
V
I
Q
R
~
D
V
G
Q
D
8509 CATCCACGTCACGGTGCTGCAGTACTCCTACATGGTGACCGTGGAGTACCCCTTCAGCGAGGCACAGTCCAAAGGGGACA 0543 I
H
V
T
V
L
Q
Y
S
Y
~
V
T
V
E
Y
P
P
S
E
A
Q
S
K
G
D1560
D
S
~
~
~
~
260
Mancuso et al.
Biochemistry, Vol. 30, No. 1, 1991 8589 TCCTGCAGCGGGTGCGAGAGATCCGCTACCAGGGTGGCAACAGGACCAACACTGGGGTGGCCCTGCTGTACCCCTCCGAC C C G T T 8623 1587 I L O R V R E I R Y O G G N R T N T G V A L L Y P S D L R L 8669 CACAGCTTCTTGGTCAGCCAGGGTGACCGGGAGCAGGTGCCCAACCTGGTCTCCATGGTCACCAGAAATCCTGCCTCTGA C A G 8703 E S P L V S ~ G D R E O V P N L V S ~ V T R N P A A Y G
S1614D
8749 TAAGATCAAGAGGTTGCCTGGAGACATCCAGGTGGTGCCCATTGGAGTGGGCCCTAATGCCAACGTTCAGGAGCTGGAGA C G 8783 G 1640 K I K R L P G D I O V V P I G V G P N A N V O E L E
E 8829 GGATCGGCTGGCCCAATGCCCCCATCCTCATCCAGGACTTTGAGACGCTCCCCTGAGAGGCTCCTGACCTGGTGCTGCAG T T C 8863 1667 R I G Y P N A P I L I O D P E T L P * E A P D L V L O R 8909 AGGTGCTGCTCCGGAGAGGGGCTGCAGATCCCCACCCGCTCCCCTGCCCCTG~ATGCTGGCACCTTGTGTGCAGGTGGG 8943 T A R C C S G E G L O I P T R S P A P L
1684
8989 AGGGCTGGGCGAGGGCTGGCATGGCCTTGGTACTACATGCATCTGCCAAGATATGACTTGGGTTCTAATCCTGGCTTCCC 9023 G c c 9069 TGGTCTGTGTGCCCTTGGTTGAAACTTGCCTTCAAAGGGCCTGTGTTTCCTCACCTCCCTGGCAGGGAGACAAACTGTGA 9103 G 9149 TCCTTTTGCGGGGGAAGATCTTTTCTTATTGTGTGTGCTTCTGTAAATTCAAAGGTAATGGAAGCCTATTGTTAACATTT T A 9183 9229 AAACAATGCCACTAAGGAGTAIYVVV1AAAAA--TATGCAAGTCTTACTTTATCACACGGCCTCTTCCCCACCCAATTCCA 9263 A AAA 9307 CCCTCCACCTTCAAGGTAACCTCTGTTAAAGGTGATGATGATAAGCTTCGTTTTGACCACACTGGGTGGGTGAAATTATT TG G A 9343 9387 9423 V
9467 GTGTCTGCCTTAGTTTGAATCTTATCTCTGCCACTTGCTAGCTGGGTAATTTTGGGC~TACTT~CCTCTTTGTATAT T 9503 --A 9547 9581 9556 C A T C T A T A A A A T G G G G A T A A T A A T A G C A C C T A C T T G C T T A - T G A A 9660 9634 CTTGCTCCCACTGCTGTTATTACTACCTTTTACTGACACTGGTGTCTAATCCATTCCTAGTTCCTGAACATCTTTATTCG C C 9740 9714 GTGTGTTTGGGACCATCTCAGAATAATAAGCCTTCTTTTAGTATCTTTTGAGTCTCACACTTGTCAGTACTTTTGTTTCT C A 9820 9794 TTGTTTTTGTTTTTATTCATGGCCATAGATTTATTTAAATTCTTGTAATATTTCTGCTGAGGAAAACAACAATTACATCA C 9900 9874 TTTCATCAAATCTCAGATGTGCTCAGACACTAACAGGAGCACTAGGCATTTATAGCTGGAAGAATCACAGTACGTTCACC 9980 T 9954 TGCCTTGCAAGATCTGAGGACACAGCAGCTCATTGTCCAGGGAGGGGCTGCCGCTCCATTTCCTTTTGCAGT---CTTGT 10060 C CTC 10031 ATTGCCAGGCCAGTATTTTACTCATTTCTAGAAGAGTGGCGGCCCCTTCTTACCGAGQAAGCCCATGCCTACTGCTTTTA 10140 A T T G 10111 TTTGTAGACATTTAAGCTTCCTTTGGGTAGAACTGGAGGCTTCTCAGTGTCTCCAAACCTGAGGTAGTCCAGCCCCAGGA 10220 A T T T T G A 10191 ACTCCATCTCCCCTATCCCCTCCTCCCTGGCCCACGTTGCCCTTGCACTCACAAAGGCACCCCCCGCCCCCCTTGGTGGT 10300 A T G 10271 GCCACGTGGTCAGCACGCCCTGCAGATCCTATGGGATGTCAGGTTGTAGGCCTGGCGGCCAGTGCCCCTGCTGGCACCTG 10379 T T T T 10351 TGTGCTCACCTTCCTGGTTATCTTTGC~ACTGCAGCCAGCCCCTGGACGTGATCCTTCTCCTGGATGGCTCCTCCAGTT 10459 G Exon29+D C S 0 P L D V I L L L D G S S S
1701
10431 TCCCAGCTTCTTCTTTTGATGAAATGAAGAGTTTTGCTAAGGCTTTCATTTC~GCCAATATAG~GAGTGAGTGAGG 10539 A c c G C P P A S S P D E ~ K S P A K A P I S K A N I 1723 Y 10511 CACCTGAAGCAGCAGGTGACGAAGAGGCTCTTTTTGGGGCTCCACTTGATTCAGAATAATCCCCATTTTCTTGTTCCG~ 10619 T T A G C
von Willebrand Factor Gene and Pseudogene
Biochemistry, Vol. 30, No. 1, 1991 261
10591 TAGGGCCTCATCTCACTCAGGGGTCAGTGCTCCAGTATGGAAGCATCACCACCATTGACGTGCCATGGAACGTGGCCCCG 10699 G T G T G P B L T Q G S V L Q Y G S I T T I D V P W N V A P R V V
1749
10671 GAGAAAGCCCATTTGCTGAGCCTTGTGGACGTCATGCAGCGGGAGGGAGGCCCCAGCCAAATCG~AACGTTGGTGCCAC 10779 E K A E L L S L V D V kl Q R E G G P S Q I c E x o n 3 0 1770 10751 AGGCTGGATGCAGAAGCTGCATTCTGGTTCTTATTTTTGGCATAAGTGACTGTGTGACCTTGGCCAGTCACTTTGCTCCT 10859 C 10831 TGGCCTTAGTTTCTTCCCCTGGAAAGTGAGGGGCTAGATGCTCTTCCACGTCTCTCCAGATCTC~CTGGGTGTTCCTTG T 10939 10911 GAGTCTCTGAATCATTCAGCTTTTAAGTGACTTAAGGATCCACCGT-GAGACAGGGTGTCCAGCCGCAGTCAGTACTGAC 11019 T T G 10990 TTGGTGTGATCTGTTCTCCATCCTCAGGGGGATGCCTTGGGCTTTGCTGTGCGACACTTGACTTCAG~TGCATGGTGCC C T 11099 E x o n 3 1 I G D A L G P A V R E L T S B kl E G A 1788 Y 11070 AGGCTGGGAGCCTCGAAGGCGGTGGTCATCCTGGTCACGGATGTCTCTGTGGATTCAGTGGATGCAGCAGCTGATGCCGC 11179 C A C R L G A S K A V V I L V T D V S V D S V D A A A D A A 1815 P 11150 CAGGTCCAACAGTAAGAATCTGGTGTACAGTCCTCAATTCAGGAGAGCGAGACTTCAGAAATGATAGAGGATT~GTG 11259 R S N 1818 11230 GCCTGGGGTTACTTTTGGATGTTGTGGCTGATATTAAATATGCTCAGTACCAACCTCTGTCCTGGTTTC~CAGCAGGA 11339 G A 11310 A A T T A T C C A A T C A C G A T G T G A T T A G G ~ A C C A A A C A A T A C A 11419 11390 CCCTTTAATTTCTCCGCATTTTCCCTTTGTGTCCAGGAGTGATCATACCTCTATGCCCTTTCTTTCTAAAGTAGGATTCG A 11499 -T 11470 TGGTCTCTTACTAAAATGTCATGTCTGAGGCCAGAGTGATTCATTGTGGGGCCTTGAGAACCAATGGGGACTTTCATGTA 11578 A G 11550 TTTAGAGTCCTTTCCTGGTACCAGGTAATGAGCTTGGCAGGGTTGATGGATGAGAGAC~GGATCATGCTGTAGGAAAG 11658 G 11630 GGAAGGATCCAGATGTCCAGCCTTGTGCTCCATAAGAACATGTGCTTATCATGTTCTTAATATCAACAGGATGTGAGCTG 11738 11710 GGACAGGTGGTAGAGTCTGAGGGACCACCTCCCAGCTGAAGGAGGTCTGGGTGAGTGGAGTGACATTGAGTCCCCTCGTG 11818 11790 GCCACCTGGAGAACTGTTCATCTTTCAGAACCCTGCCTAGGAGTGCTGTCAGAACCCTGTCTTTTCTCTGCTGTCTTCCT 11897 T
T
11870 GATGTTCTTCCTGAGGAAGTTATTTGCTCCGTTCTCTATGTCACATATGAGCTACTAAAAATGTGCAAATC-----GTAT 11977 C E-T
V
V
T
T
T
11945 TGTATTGTATTTGTAACATATCTATTCCTTGCAAAAAACTGGGAGTTCTTTGAAGGCAAGGGAAGATCTTGCTTATTTTT 12057 12025 GTATCATCTGTGTATGATGCACAAAATAGGTGACCATTTAATGTTTGTTGAGTGAATGAATGTGACATGTTTATGAAATC 12137 C 12105 TATAATTTTAGGGTAAAGGACACTAAAAAGGAAAGGATATATATTTGGATGATATAACGAACTTACAAGGGCTG~GC G G A A 12217 12185 TGCTAGTAGAAAATCCATTCCAACCTGAAGCACATTGCAAATTCCTAGAACTTTCTGGAGAGCTGAATTGAGGAATTTCA A G 12297 12265 GGATGGAGATTTTTTCTCCTTTAGTATCAAGAATCCAAAGAAAGATTTTCACTGGG~GGAAGCAGTTGTGGTCTTGAG C A 12377 12345 GAAGCAGATATGATCCAGAGCACGTCATTTAACCATCTAAAGCCTCGGTTTCTCATGTGT~TGAGCATGCACATTCTC A G C 12456 -12425 TGGGCTCAGCCCTGTGATATTTTCTCTTCTCTTCTTACATATACCCTGTTGGTGGTCTAGTTCAGTGTCATGGGTCTG~ cc A 12534 12505 TATATTTATACATCTCTGACTTTGAAATTCGTATCTCCAGCCTCACCTCAACTCAGCTTGCATGCTCT~CAGGC~C C T T T 12614 12585 T A A A C T C C T G A T T T T C A C A T C C G T C T T T C T T A T C T T A G A A G C T A 12694 12665 CCAGCATCCTTGATTCTTCTTCCTTTTTCCTGATACCTCCATGCAAGCCACG~AAACTATAGCTCCATCTTC~TA 12774 T C 12745 CTTTGAGAGTTGCACCTCTCCACACCATGTCCACCAGGGTCACCTCAATCTGAGTCCCCAGCATCTCTCACCTGGACTAT C CA C T C T 12854 12825 TGAAATAGCTGTCTAACTGGAATCCCAGCTTCCACCTTTGTCCCCTGCAGCTCATTCTCAGCACAGCAGCTGCAGTAGCC C A G 12934
262 Biochemistry, Vol. 30, No. 1, 1991
Mancuso et al.
12905 13014
CTTTACTATCAAAGTCAGCCCCTGTCCCTCTGCTCTGTAGTCTTGGCTGG-TTTGTTTCATTCACAGAACATGGTGAGCC A A T C
12984 13094
CTTACTGTGGCCAATGGAGCATTGCCTGATCCATGCCCTTTCCCTCTCTTGCTTGTCCCTCTCTCCCCAGCCCCCGGGAG
13064 13174
CATGCAGTCACACTTCTACTCTGGGGCCTTTGCACTCACTCTTCTCTCTGCTTCGGATGTCCCTCCCTCAGACATCCTCT T G
13144 13254
TGGCTCCTTCCCTCCTTACATCAGGTCTCTGCTCAGATGCCATCTTTTCACCAAATGCTTCTCCAACTGCCATATATACA A T G
13224 ACGATAACCCACACCTCAGCCCTGGGCATTCTCATCCCCCTTATCTCATTTTGTTTTTTTCCTAAGCATTTATCCACATC 13334 13304 TGAAATTCTATATATTTAC----TTTGTGTGTTTGTTGTCTGTCTCTCCATGAGGACGGGGGACAGGGAGGGACTTTATG 13414 TTTG A 13380 TGCTTGGTTCACTGCTGTACCCCTATTGCTGACAATAGTACCTGACACAGAGTAGGAGCTCATTAATATCTGTTGACTGA 13494 T C 13460 A C A T C T T C C T C A T A G G G C T G G T G T A T G T G A C C A G C C T G G A C G T C A G G A 13574 13540 CCAGTCCATTTTGAACCTTCTTGCCCACAGATCCTTTCTTGTCTCTTTGCTGACTCT~GAGTGACAGTGTTCCCTATTG G 13654
Exon32 + R
V T V
P P I
1825
13620 GAATTGGAGATCGCTACGATGCAGCCCAGCTACGGATCTTGGCCGGCCCAGCAGGCGACTCCAACATGGTGAAGCTCCAG A G 13734
G I G D R Y D A A Q L R I L A G P A G D S N ~ V K L Q1852 V 13700 13814
CAAATCGAAGACCTCCCTACCATGGTCACCTTGGGCAATTCCTTCCTCCAC~CTGTGCTCTCGTGAGTCTTATAATAC G GO I E D L P T ~ V T L G N S P L E K L C S 1873 R
13780 13894
CTTTCTTACTTCCCTCA-GTTCCAAAACAAACCCGCCAAAGAAAC~TACTATTCAAGAGACCCCAGGGTG A T G
13859 13974
CCCATGCATAAGATTTGGCCTTGAAAGAAGATTATTATATAGACCAGGACATTCAAGTTTCTGCTCATGGGAGGTGTATA T C T T
13939 14054
CCACAAGCCCCCGTGGGGCCAGCAGAAAGCTTTCTGTGTATGGTCCTATTTCCACTCTAATTGGCCATGTGAACTTGGCA A T
14019 GAGAG--CCTAAGCTTATTTGTGTAGAGGGATCTTGGACAAAGGAATCAGAAGACATAAGCAAGGAACCCCAGAGTTCCA 14134 AG A C G 14097 TAG----AGTCTTCTTCCCCATGGGCCTAGACTCACTCTCAACATGGGTATAAAGGGCTTTAG~TACAATAACATGGA GTAT A G C 14214 14173 GACTCATATCAAAGTACCATAGTTTAAGTTGATTTTAGGTTAGAAACTTAAAAAATACGCTTTTG~CAGGCACGGTGGC 14294 -TTG A 14253 TCATGTCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGTGGATCACGAGGTCAGGAGATCGAGACCGTCTTGGCTAAC C C T A 14374 14333 ACAGTGAAACCCCGCCTCTACTAAAGATACAAAAAATTAGCCGGGCATGGTGGCGGGCACCTGTAGTACCAGCTACTTGG T C A A G 14454 14413 GAGGCTGAGGCAGGAGAATGGCGTGAACCCAGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGTCACTGCACTCCAGCCT 14534 T G A C T 14493 14614
GGGCGACAGAGCAAGACTCTGTCTCAAAA----------CACACACACACACACACACACACACACACACACATGCTTTT TcA TG CA --- ------ ---- T
14563 TTTG-GGGGGAGGCCTGGGTTTATATATTTTCCCGTTACTAGATGTAAGTCAAAACCTGCAT~AGCTACTGTCTTTCAG 14681 T C G G C G 14642 GGGAATAAGTCAGTGCAAGTTTGCCCTTAAAGGGCAATAACTCTACACAAGTTTTGACTTATAGCAAAGATGGTCACATTAGCTG A TG T A 14761 14722 TACAGAGAGATGGCAGCTCTCCTGGTAGGAATCTTCAAGTAGATCTCTTTCAGGTTTCCAGGATCTTGCTTCATCTCCCC 14841 14802 14921
ACCTTCCCCATCCCTGGCGTGATCTACATGTGAACCAAGATAATGACAGCGTAAGCTGTAGTTATTGCCATATTATCGCT
14882 GTTGTTGACATCATAATTATTAACAACCGCAGAGCATGTCTGAAGAACCACAGGATGACCACCTCAGCCACATGTCCCTA 15001 G T T T 14962 GGTCTCCACTGTTAACCTTGTTCAGATTCTTTTCAGAGTTGAGTTGACTTC~CTAGACCAGGTTGCTTAAGCAGAC 15081 T 15042 ATTGTGAATGGTTCAGAATTTCTGGGTGAAAGATGGGATGGGAACTAAGGTCTTATTTGTGTCTGTTGC~GATTTGTTAGGATT 15161 Exon 33 + R P V R I 1878
G 15122 15241
CGCATGGATGAGGATGGGAATGAGAAGAGG~AAGTTCCTTTCTGTTGACTTTGAAAGAAAGATTAGAGATGTGTTTGGG T G R ~ D E D G N E K R C
1888
von Willebrand Factor Gene and Pseudogene
Biochemistry, Vol. 30, No. 1, 1991 263
15202 GCTCTTGTTCCCACTGTTTAATTTTTCCTCCTTTGGTCTTAGTCCAGCGCTTCCTCTTACTATTATCTTGTTTTTGAGGG G T T C 15321 15282 TCCATCTGTACATCTTGTGTTTTGCTTCCTGTCTCATGTACAGGAGGCCTCCTTGCTGTGTAGGCCCGTGTTCAATTCTA G T 15401 15362 GGGGTCAGTTGTCTGGCAGATGGGCTTAGAGTTGGAGTACCTCATCTTATTCCCTGCCTGAATCTGCTGTTTTCTTCTGC 15481 15442 AGCCCGGGGACGTCTGGACCTTGCCAGACCAGTGCCACACCGTGACTTGCCAGCCAGATGGCCAGACCTTGCTGAAGAGT 15561 P G D V W T L P D O C E T V T C O P D G O T L L K S
1914
15522 CATCGGGTCAACTGTGACCGGGGGCCGAGGCCTTCATGCCCTAACAGCCAGTCCCC~GTTAAAGTGGAAGACACCTGTGG T G G 15641 E R V N C D R G P R P S C P N S O S P V K V B D T C G 1941 L B 15602 CTGCCGCTGGACCTGCCCCTGTGAGTCCTTTGCTTCTCCAGCCAGGGCAGCGTCAAAGGGGCAGTGCTTTTAGCTTGGCT 15721 C R W T C P + Exon 34
1947
15682 GTGCAGAAAAGTAGAGCAGGCACCCACCAGCCCAGAAGTACCCTTTCCCTCATCACCACATGCACAGTGCTACCTTCACT 15801 15762 CACCTTCCTTTCCTCCTGTGCTCTTTGGACATGCATGCAGCCAGTCTCAGGGATCACTGCCCTCTTTCTCTGTCTTTGGA 15881 15842 AGGCACTTCCCCAGATTATGCATAACCGGAAGGAAG~TTGCTTTTCTGAGGTCAGTGCTCAGCTTGGCTGTTGGCAAGT T A 15961 15922 CAACCTTTAGGAATCTATGTATTCAGGGTATAGCAGTGGAAGTATAGCAGTGAGAGAGATGCTTCCAGAAAACTCTCCAA G C 16041 16002 CCACACCTATAGCAAGAGTACATCACCAAGGTGGCCTGAATTAAGAGGAGGAAATATTTAAAGTCCATCTCACGTACACT C C G T 16121 16082 TGTGGAGAGCATTTCATCCATGGGCTGGGTTAGCCTTCTGTCTCATGTTTGGGAGCAATGACTGCTTATGGTGACCATGA G 16201 16162 C T C A G C A G T G G A T T G T A A C - T T T A T G C T G A A C A G G C A T A A A C T G T A T G T T A T A G C T C A T T C C 16281 16241 CCCCTTTAGAATTTATTTAGTGAGATAAATTTCCCTCTTGGTCTAGGTGAAAACTCCACCAGACTTTAGGTTTCTTCTGT 16361 16321 TTATTCATAAAGAATTCTATAGTTACTTTGTCAGAAGACAAAGTCTCAGTTGATAATTAGATACCCACTCTTGTGT~TA C C 16440 16401 ATCAAAACTTTGAATTTGATTAGAGTTAAAACACCTCCCAGTTCCAATTTGCGCTGCCTGCCTCTGGGAAGGAGGGTATG AG 16520 16481 GTCAGCTTTCCTCTGTACAAAGAGCATTGCCTTTTGTGATTAAAAGCATATCCTGGAAATTACCCGATGTCATTTCAAAG T T 16600 16561 AGAGCTTCCTAATACTTCACTGTGTACCATAGGTTACCCAACTGTTCTTCTATGGATGTGCATTTAGGCTGTTTACAGTG G C 16680 16641 TCTTGTAATGACAAGCAATGCTGCAATGAGTAAACCTGCACATACGAA----TTGTCATATGGTTGGAGGTGTGTCTTCA A CGAA T 16760 T 16717 GGGTAGATTCCTAGAAGTGAGATTCTTGGGTCAAAAGATAAGTGCATATGAAGTTCCCCTTAGATATTCCCAGATTCTCC C 16840 16797 TCTAACAGGTGAGATGTGGTTGTGCAAAACTGGAATGTCCATCAGTTTGCTTTTCGATCAGCGAGGTATGAGAGTGCAAG G T 16920 16877 CTTCTCCATGGCCTTGCTGAGTGCGTTGTTACAGTGTTTTTTTTTTTT-----CAAATTTCTTACTTTTCAGGAAAGAAA C T T A A TTTTG 17000 A 16952 TAGTATCTGAGTGTACTTGAAATTCATTTCTCTAATGATGAGTGAGGGTAAACATTTT-TCATATGTTTAATTGCCA~ C T A 17080 I C Line-1 17031 TTTGGTCTAATTTTATGAATTTTCTGCTCATGCCTTGCCCCATTTTCCCATTAGAAGAGTTGTGATAA~C~CATGAG G G A G 17160 17111 ATCTACCCTATTAATAAATTTTCCAGTGTACAATACAATATTGTTAGCTATTTGCAAGTTGTTGTACAGCAGACCTTTAG C T C 17240 17191 AAGTTTTTGTTTTGTGTGGCTGAAACTATGCCCACTGAACAGC~CTTCCCATTTTCCCCTTCCCCTAGCCCCCACAACC A G 17320 17271 ACCATTCTGCTTTTTGCTTCTATGAATTTGACGATTTTAGATACCTCCTGTAAGTGGATTCATGCAGTAGTTATCCTTCT 17400 17351 GTGACTGGCTCATTTCACTTAGCTTAATGTCCTCCAAGTTCATCTATGTTGTGGCATATGA~GGATTTCCTTCTTTCTA 17480 CA 17431 AAGACTGAATAGTATTCCATTGCATGTATATACCACATTTTCTTTATCCATTTGTCCACTGATGTGCATTTGGATTGTTT A 17560 17511 CTAACTCTTGGCTATTGTGAAT~TGTTGCAGTGAACGTGGAAGTGTA--AGTATCTTTTTGAGATCCTGATTTCAATGC T TA C 17640
264
Biochemistry, Vol. 30, No. 1, 1991 17589 17720
TTTTGGATGAATACCCAGAAGTGGGATTGCGGGGTCATATGCTAATACCAGTTTTAATTTTCTGAGGAACCTTCTTACCG A T T T T
17669 17800
TTTTCTGTAGTGGCTACACCACTTTACATTCCTACCATCAATGCACAAGGGTTTCAGTTTCTTCACATTCTTGTTAACAC C G A
17749 17880
TTGGTATTTCCTG----GTGTGTGTGTGTGTGTGTGTGTGTGTGTTTGATAATGGCCATTCTAGCACGTATGAGGTAATA G G
17825 17960
CCTCATTATGGTTTTGATTTGCATTTCCCTGATGATCAGTAATGTTGGATATTTTTTCATATAATTGACTTTTTAGATGT G G
17905 18040
CTCTTTGGAGGAATGTCTTTTCAAGTCCTTTGCTCATTTTTAAATTGGGTTATTTGCATTTTTTTTTTTTTTGCTATTGA
17985 18120
GTTGTGTGAGTTCCTAATATATTTTGGATATTAACCTTTTATCAAATGTACGGTTTGCAAATATTTTCTTCCATTCTGTA A C
18065 18200
GGCTGCCTTTTTATTCTGTTGATTGTTTCTTTGGCTGCACAGAAGCTTTTTGGTTTGATGTAGTCCCACTTGTCTATTTT
18145 18280
TGCTTTTGTTGCCTTGTTTTTGGTAACGATATCCAAGAAATCGTCGCCAAGACCAATGTTGCAAAGTTCTTGTGTTTTCT C T
18225 18360
TTTAGGAGTTTTATAATTTCAGGTTTTACGTATCAGTGTTTAAGTTTGATTTTGGGTTAACTTTGTGTATGATATAAGAT G
18305 18440
AAGGGTCCAACTTCATTCTTTTTCATGTGGACAGGT-CATTGTTAGTGTATAAAAACAAATGGTTTTCTATGTTGATTTT T T TG G
18384 18520
GTATTCTGCAACTGAATTTGTTTCAACTTTGCTGAATTTGCTAACAAATTTTTGGTGAAATCTTTAGAGTTTTCTACACA T C TG
18464 18600
TAAGATTATGTCATCTGCAAATTGAGATAATTTTACTTCTTTTTTGATTTGGGTGATGTTTATATCTTCTTGATTCATTG T C G C
18544 18680
CTCTAGCTAAGACTTCTGGTACTATGTTGAATAGAAGTGGTGAGAGTGGGCATCCTTGCTGTCTTCCTGA~TAGAGGAAA C C
18624 18760
AGCTTTTAGTTTTGCACTATTCAG-GTGATGTTAGCTGTGGTATTTTCATTAATGGCCTTAATTTTGTTGAGGTAATTTC G T T T
18703 18840
CTTCTATTCTTAGTTTGTTGAATGTTTTAATCATGAAATGGTGTTTGAATTTTGACAAATGCTTTATCTGCATCTATTGA A T G
18783 18920
CATGACAGTATGATTTTTATCCTTTGTCCTGTTAATGCTGTGTATCACATTTACTGATTTTTAGATGCTGAGCCATTCTC T T A C T C T
18863 19000
CCATCTGAGGGATAAATCCCACTC----AGAGTGTATGATCCTTTGATTGTGCTGTTGAATTCAGTTTGCTATCATTTTG T CGTC T T G
18939 19080
TTGAGAATTTTTGCATCTATGTTCATCAGGAATATTGACTTGTAGTTTTCTTT-AACGTAAAGTCTTTGGCTTTGGTATC G T T T A
19018 19160
AGGGTAATTCTGGCTTC-------AGGTTTCCCACTTCAATGTTTTGGAAGAGTTTGAGAAAGACTGGCATTAATTCTTC ATAAAGT G G T
19091 TTTAAATCTTTGGTAGAATTATCCAGTGAAAGCATCTGAGCTCTTCTTTGTTGGGAGATTATGATTATTAATTTGATCTC 19240 A A A C C 19171 CTTACTAATTGTGGGGCTGTTCAGATTTTCTGTTTATTCATCATTCAGTCTTGGTAGGTTGTATGTTTCAGGGATTTATT 19320 GA G 19251 19400
CATTTCTTTAGGTTATCCAATTTGTTAGTGTGTAGTTGTTCATAGTAGTCTCTTACTAGCCTTTTTATTTCTGTGCTATC A C
19331 AATTGTAATATAACCTCTTCTATTTCTGATTTTATTTATTTGAGTCTT---TTTTTGTCTTGGTTAATCTAGCTAAAGAT 19480 G G CTC C 19408 19560
TTGTTAATTTTGTTGACCTTTAAAAAACCAATTCTAAGTTTTATTGATATTTTGTATTCTATATTTCATTTATTTTTCCT
19488 19640
CAAATCTTA---ATTTCCTTCCTTCTGCTAACTTAGGGCTTAGTTTATTCATTTTCTAGTTCCTTATGTTGTAAAATTAG TATT C - CA
19565 19719
GTT----GTTTGAGATCTTTCTTACTGTAGACATTTACTGCTGTAAACTTTCCTTTTATTGCTTTTGCTGTTTCCCATAA ATTT A
19641 GCTTTGCTATGTTGTGCTTTAGTTTTTGTTTTTCTCAAAATATTTTCTGATTTTTCTTTTGATTTCTTCTTTGACCCATT 19799 T 19721 GGTTGTTCAAGAGTGTTTTTTAATTTCCACATATTTGAGAACTTTCTGAAATTATTTCTATTAGCAATGTCTCATTCCGT 19879 C TTA 19801 T--GTGGTCAGAAAAGATACTTGGTATGATTTTAGTCTCCTTAAATTTGTTAAGATTTGTTTTGTGGGCTAGCATGTGAA 19959 CT T C C 19879 20039
CTATTCTGGAGACTGTTCTATGTGTGTTTGAGGAGAATGTATAATTTGCTGTTGTTGGGTAGAATATTCTGTATTTGTTT A G
19959 20119
GATAGGTCCACTTGGGCCATAGTGTTGTTTGAGTTATCTGTTTC~TA~GATCTTCTGTCTGGATTTTTTATCCATTAT C A
20039 20199
TGAAAGTGAGATATTACAATTTCTCACTATTATGTTGTTCTATATTTATCATCACTTCTGTCAACATTTGCTTCTTATAT A C T
Mancuso et al.
von Willebrand Factor Gene and Pseudogene
Biochemistry, Vol. 30, No. 1, 1991 265
20119 20279
TTGTGTGCTTTAATGTTGGGTGCAATGTATTTATAATTATATCTTCCTGATGAATTGATGCATTTATCACTATATAATGT C-----G
20199 20353
CCGTTTTTTTGTCTCTTGTGACCATCTTTGACTTAAAGTTTAATTTGTCTAAGTATGGCTACCT-------CTTTTAGTT G C GTGCTGT
20272 20433
AACATTGGCATGAAATATCTTTTTCCATCTTCTCACTTTCAACCTATGTGCATTCTTAAATCCAAAGTGAGTCTCTCTCT A T TGTAGACAGCACATAGTTGGATCTTG~TTTATATTTA~TATTTATTTATTTATTTATTTATTTATTTAATTTATTTAGA T T T T T T T T T T T T V
20352 20513
-
--_---------_-_
L
20432 20578
TGGAGTCTGGCTCTGTTGCCCAGGCTGGAGTGCAATGGCTCGATCTCGGCTCACTGCAATCTATGCCTCCCGGGTTCAAG T c c
20512 20658
TGATTCTCCTCCCTCAGCCTCCTAAGTAGCTGGAACTAGAAGCGCATGCCACCACTCCTGGCTAATTTTTTGTATTTTTA C A G TG
20592 20738
GTAGAGACGGGGTTTCACTGTGTTAGCCAGGATGGTCTTGATCTCCTGATCTCTCATGATCTGCCTACCTCAGCGTCCCA CA C C C
AAGTGCTGGGATTACAGACGTGAGCCACCACACCTGGCTGTTTTATTTTATTTTTAAAAATTCATTTCAGCCACTTCATA C A G c Line-1 el 20752 GCTTTTGATTGGAGAGTTTATTCTATTTATACTTAAAGTAACTACTGATAGGGAAGGACTTACTGTTGCCATTTTGGTAA C TG 20898 20672 20818
20832 20977
TTATTTTCAGTCAGTCTTGTAGCTTTTTTGTCCCTCTTTTCTACTCTTACTGCCCTTCTTTGTGTTTCATTGATG~
20912 21056
TGGGTAATGATGAGATTTGATTCCTTTCTCATTTCCTTTTGTGTATCTTCTATAGATATTTTCTTTGTGGTTACCATGAA
20992 21136
GCTTACTTAAAACAGCCTATACTTATAACAGTCTATTCTAAG
c
c
A
CTTATAACAACTTAATTTCAATTGCTCACGAAAACTCT
21216 21296
TTTTTATAGTGATAGTTATTTTTATAAGTTTGTCTTTTAAGTACTATACCAGAATTC
FIGURE 4: Nucleotide sequence of the v W F pseudogene and gene. Dots above and within the sequence are placed a t IO-bp intervals. The pseudogene sequence is shown in the first line of each comparison. Where different, nucleotides corresponding to the v W F gene sequence are indicated in the second line, below the pseudogene sequence. Regions of the pseudogene corresponding to v W F exons are numbered 23-34. Translated amino acid sequence for the pseudogene is shown in single letter code in the third line. Where different, the translated amino acid sequence of the v W F gene is shown in the fourth line. The 5' boundary of the pseudogene is indicated by double arrowheads, pointing up and down. N u repeats are flanked by arrows that indicate their extent and direction, and flanking direct repeats are underlined. The approximate boundaries of a Line-I repeat near the 3' end of the sequence are indicated (Line-1), and candidate flanking direct repeats are underlined. Simple sequence repeats described in Table 111 are underlined, and the extent of monomer units is indicated by single arrowheads, pointing down. Insertions and deletions necessary to maintain alignment of the two sequences are indicated by dashes (- - -).
Table 1: Divergence between the vWF Gene and Pseudogene substitutions sequcnces compared BP TV TS
TS/TV
CG
-.
TG
gaps Ins
Del
7% divergence
2.38 31 0 0 2.47 2 875 21 50 exons 17816 142 382 2.69 141 24 32 3.25 introns 2.65 172 24 32 3.14 163 432 total 20 69 1 BP is the length of the sequences compared in nucleotides minus the length of any sequence opposite a gap; TV, transversion; TS, transition; TS/TV, the ratio of transitions to transversions; CG TG, transitions that could be accounted for by deamination of methylcytosine; Ins, insertions relative to the gene sequence; Del, deletions relative to the gene sequence. Substitutions are the sum of TV + TS. Gaps are the sum of Ins + Del. Percent divcreence is eaual to [(IOO)(substitutions + gaps)/(BP + gaps)], in which gaps are counted as single events regardless of their length.
-
Table I I : Alu and Line-I Repeats in the Human vWF Gene and Pseudogene Location/ Nucleotides
Repeat/ Orientation
Planking direct repeats
Intron 24 2123-2417
Alu/ inverted
15
Intron 25 3941-4072
Alu/ direct
9
7
*/g5' wg3'
Intron 32 14237-14554
Alu/ direct
8
*5
Length
Source
Sequence
h olg5' WE3'
;
g5
*3 '
g3' Intron 34 20388-20709
Alu/ inverted
Intron 34 17104-20790
Line-I/ inverted
12 11
29 28 27
AGTGCAGTTGTCTCT G AGAAAACTG . .
ACGCTTTT T T C
*/g5' Ug3'
TGTTTTATATTT TGTTTTAT-TTT
Ug?'
TATGTTTAATTGCCATTTTTTGGTCTAAT
$3
83'
G G
C C
ATG AT-
-G -G
flanking this Line-1 element were not identified with certainty; candidate 29 bp repeats are underlined in Figure 4 a n d listed
Table 111: Simple Sequence Repeats in the Human vWF Gene and Pseudogene no. of reoeats pseudoconsensus location gene gene (CAI. 2513-2539 -12 -13 14522-14555 IO 17 -1.5 -7.5 45 14-47 13 CCTGCCGCCCCATCTCTGACCTGCAAG 5539-5624 15 15 (TTTA)n 12 12 20379-20429 4 3 6799-6806 (GT), 4 4 7 37 2-7 3 79 4 4 11353-11360 16 17762-17789 14 4 9416-9444 4 (Y CAGGTC),