Genotyping of Simple and Compound Short Tandem Repeat Loci

Allison P. Null, James C. Hannis, and David C. Muddiman*,†,‡. Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284...
Anal. Chem. 2001, 73, 4514-4521

Genotyping of Simple and Compound Short Tandem Repeat Loci Using Electrospray Ionization Fourier Transform Ion Cyclotron Resonance Mass Spectrometry

Allison P. Null, James C. Hannis, and David C. Muddiman*,†,‡

Department of Chemistry, Virginia Commonwealth University, Richmond, Virginia 23284

The utility of electrospray ionization Fourier transform ion cyclotron resonance (ESI-FTICR) mass spectrometry as a new approach for genotyping short tandem repeats (STRs) is demonstrated. STRs are currently valued as a powerful source of genetic information with repeats that range in structure from simple to hypervariable. Two tetranucleotide STR loci were chosen to evaluate ESI-FTICR mass spectrometry as a tool for genotyping: HUMTH01, a simple STR with nonconsensus alleles, and vWA, a compound STR with nonconsensus alleles. For HUMTH01, the genotype (i.e., repeat number of each allele) was determined for each of 30 individuals using mass measurements of double-stranded amplicons. Low-intensity peaks observed in the spectra of amplicons derived from heterozygous individuals were identified by mass as heteroduplexes that had formed between nonhomologous strands. Mass measurement of the double-stranded vWA amplicon was not sufficient for determining whether the individual was homozygous for allele subtype 18 or 18′ since the amplicons differ by only 0.99 Da. Therefore, single-stranded amplicons were generated by incorporating a phosphorylated primer, prepared using T4 polynucleotide kinase, into the PCR phase and subsequently digesting the bottom strand using λ-exonuclease. Accurate mass measurements were obtained for the single-stranded amplicons using internal calibration and the addition of a correction factor to adjust for the natural variation of isotopic abundances, confirming that the individual is homozygous for allele 18. Our results clearly demonstrate that ESI-FTICR mass spectrometry is a powerful approach to characterize both simple and compound STRs beyond the capabilities of electrophoretic technologies.

Deciphering the human genome relies on genetic markers to map genes, identify evolutionary and pathological mutations, and investigate genetic diversity.
This is in part accomplished by linkage disequilibrium and association studies that use multiple genetic markers from various and numerous populations.¹ Genetic markers include many types of polymorphisms that occur within the human genome, including repetitive sequences, deletions, insertions, and substitutions. For genetic analysis, the most scientifically useful of these variations are the repetitive sequences known as short tandem repeats (STRs).²⁻⁵ STRs, or microsatellites, typically consist of 4-20 repeat units with each unit ranging from 1 to 7 nucleobases in length.²,³ Six types of STRs have been defined by Urquhart in order of increasing complexity according to repeat structure:⁶ (1) simple, (2) simple with nonconsensus alleles, (3) compound, (4) compound with nonconsensus alleles, (5) complex, and (6) hypervariable. Overall, STRs are extremely informative genetic markers since they are highly polymorphic,⁷,⁸ are found evenly dispersed throughout the human genome (1 in every 10-20 kb),²⁻⁴ and are relatively easy to isolate by the polymerase chain reaction (PCR).²,³,⁹ Thousands of di-, tri-, and tetranucleotide repeats have been validated for use as genetic markers for gene mapping⁴,¹⁰,¹¹ as well as for human identification in forensic science.¹²

Standard strategies for characterizing STRs rely on slab gel or capillary electrophoresis, both of which are laborious, time-consuming, and often inaccurate while requiring fluorescent or radioactive tags to visualize electrophoretic mobility. In addition, single-stranded amplicons may differ in electrophoretic mobility due to the adenine/cytosine content,¹³ resulting in complications in STR genotyping. The recently developed DNA microarray technology that has revolutionized high-throughput analysis of single-nucleotide polymorphisms (SNPs) is far less effective for STR characterization. The repeating sequence results in nonspecific annealing, yielding erroneous results, and STR characterization by hybridization has only been accomplished using a high-stringency electronic hybridization technique.¹⁴

An alternative to the inaccuracies of present techniques with respect to STR identification is a mass spectrometric approach. The inherent mass accuracy and short analysis time of mass spectrometry provide the basis for development of electrospray ionization mass spectrometry¹⁵ (ESI-MS) for the characterization of nucleic acids. Analysis of a PCR amplicon can be completed within seconds compared to the time scale of hours required for electrophoretic methods. Each peak in the mass spectrum corresponds to the mass-to-charge ratio (m/z) of an ion, which in turn provides the molecular mass; thus, defining the repeat number is possible without the use of an allelic ladder.

Previous reports on the detection of PCR amplicons using ESI-MS,¹⁶⁻²⁵ with the largest extending beyond 500 base pairs in size,²⁶ have established ESI-MS as an effective approach for characterization of nucleic acids.²⁷ Specifically, measurement of double-stranded amplicons for genotyping STRs by ESI has been shown for human genomic template using a Fourier transform ion cyclotron resonance (FTICR) mass spectrometer²²,²⁸,²⁹ and for equine template using an ion trap mass spectrometer.²⁵ ESI-FTICR detection of single-stranded amplicons produced by λ-exonuclease digestion of double-stranded amplicons has also been shown, which allows two alleles to be distinguished when double-stranded measurements cannot achieve the required mass accuracy and precision.²³ The superior sensitivity of FTICR, with respect to other types of mass spectrometers, has allowed the routine analysis of PCR amplicons derived from single acquisitions of a single 50-µL reaction³⁰ while previous reports relied on pooling numerous reactions.¹⁶⁻²⁴ Additionally, single-acquisition spectra of double-stranded amplicons have been acquired when electrospraying from solution concentrations as low as 5 nM.³¹

High throughput will also be imperative to analyze the significant number of amplicons required for genetic studies in a timely manner. To address this issue, a flow injection approach has recently been designed for ESI mass spectrometry and demonstrated using PCR amplicons that would provide over 300 genotypes per day.³² Furthermore, we have recently integrated an automated liquid handling system with ESI-FTICR that will eventually increase the throughput to more than 2000 genotypes per day.³³

* Corresponding author. Phone: 804-828-7510; Fax: 804-828-8599; E-mail: [email protected].
† Affiliate Appointment in Biochemistry and Molecular Biophysics.
‡ The Massey Cancer Center.

Analytical Chemistry, Vol. 73, No. 18, September 15, 2001
10.1021/ac0103928 CCC: $20.00 © 2001 American Chemical Society. Published on Web 08/11/2001

(1) Lander, E. S.; Schork, N. J. Science 1994, 265, 2037-2049.
(2) Weber, J. L.; May, P. E. Am. J. Hum. Genet. 1989, 44, 388-396.
(3) Litt, M.; Luty, J. A. Am. J. Hum. Genet. 1989, 44, 397-401.
(4) Edwards, A.; Civitello, A.; Hammond, H. A.; Caskey, T. C. Am. J. Hum. Genet. 1991, 49, 746-756.
(5) Tautz, D. Nucleic Acids Res. 1989, 17, 6463-6471.
(6) Urquhart, A.; Kimpton, C. P.; Downes, T. J.; Gill, P. Int. J. Leg. Med. 1994, 107, 13-20.
(7) Weber, J. L. Curr. Opin. Biotechnol. 1990, 1, 166-171.
(8) Weber, J. L.; Wong, C. Hum. Mol. Genet. 1993, 2, 1123-1128.
(9) Mullis, K.; Faloona, F. Methods Enzymol. 1987, 155, 335.
(10) Gyapay, G.; Morissette, J.; Vignal, A.; Dib, C.; Fizames, C.; Millasseau, P.; Marc, S.; Bernardi, G.; Lanthrop, M.; Weissenbach, J. Nat. Genet. 1994, 7, 246-339.
(11) The Utah Marker Development Group. Am. J. Hum. Genet. 1995, 57, 619-628.
(12) Alford, R. L.; Caskey, C. T. Curr. Opin. Biotechnol. 1994, 5, 29-33.
(13) Saitoh, H.; Ueda, S.; Kurosaki, K.; Kiuchi, M. Forensic Sci. Int. 1998, 91, 81-90.
(14) Radtkey, R.; Feng, L.; Muralhidar, M.; Duhon, M.; Canter, D.; DiPierro, D.; Fallon, S.; Tu, E.; McElfresh, K.; Nerenberg, M.; Sosnowski, R. Nucleic Acids Res. 2000, 28, e17.
(15) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64-71.
(16) Muddiman, D. C.; Wunschel, D. S.; Liu, C. L.; Pasatolic, L.; Fox, K. F.; Fox, A.; Anderson, G. A.; Smith, R. D. Anal. Chem. 1996, 68, 3705-3712.
(17) Naito, Y.; Ishikawa, K.; Koga, Y.; Tsuneyoshi, T.; Terunuma, H. Rapid Commun. Mass Spectrom. 1995, 9, 1484-1486.
(18) Wunschel, D. S.; Fox, K. F.; Fox, A.; Bruce, J. E.; Muddiman, D. C.; Smith, R. D. Rapid Commun. Mass Spectrom. 1996, 10, 29-35.
(19) Wunschel, D. S.; Muddiman, D. C.; Fox, K.; Fox, A.; Smith, R. D. Anal. Chem. 1997, 70, 1203-1207.
(20) Muddiman, D. C.; Smith, R. D. Rev. Anal. Chem. 1998, 17, 1-68.
(21) Hannis, J. C.; Muddiman, D. C. Rapid Commun. Mass Spectrom. 1999, 13, 323-330.
(22) Hannis, J. C.; Muddiman, D. C. Rapid Commun. Mass Spectrom. 1999, 13, 954.
(23) Null, A. P.; Hannis, J. C.; Muddiman, D. C. Analyst 2000, 125, 619-625 (Special Issue on Biological Mass Spectrometry).
(24) Cerda, B. A.; Wesdemiotis, C. J. Am. Chem. Soc. 1996, 117, 9734.
(25) Hahner, S.; Schneider, A.; Ingendoh, A.; Mosner, J. Nucleic Acids Res. 2000, 28, e82.
(26) Muddiman, D. C.; Null, A. P.; Hannis, J. C. Rapid Commun. Mass Spectrom. 1999, 13, 1201-1204.
(27) Null, A. P.; Muddiman, D. C. J. Mass Spectrom. 2001, 36, 589-606.
(28) Comisarow, M. B.; Marshall, A. G. Chem. Phys. Lett. 1974, 25, 282-283.
(29) Henry, K. D.; Williams, E. R.; Wang, B. H.; McLafferty, F. W.; Shabanowitz, J.; Hunt, D. F. Proc. Natl. Acad. Sci. U.S.A. 1989, 86, 9075-9078.
(30) Hannis, J. C.; Muddiman, D. C.; Null, A. P. Proc. SPIE 2000, 3926, 36-47.
(31) Hannis, J. C.; Muddiman, D. C. Fresenius' J. Anal. Chem. 2001, 369, 246-251.

Although FTICR mass spectrometers routinely achieve impressive, often unparalleled, mass accuracy, obtaining mass measurements with the accuracy needed to confidently provide a genotype is challenging, specifically for genotyping length polymorphisms derived from types 3-6 STR loci. Typically, analysis of large DNA molecules using ESI-FTICR with external calibration yields a mass accuracy of better than 200 ppm, corresponding to 10 Da for a 50-kDa amplicon. This mass error is primarily attributed to space charge effects and is greatly improved by internal calibration as previously shown by our drug-DNA characterization study.³⁴ Recently, we have developed a dual-electrospray source that allows separate introduction of analyte and internal standard into the mass spectrometer for simultaneous trapping and detection.³⁵,³⁶ By using a dual-electrospray source and internal calibration, we were able to routinely obtain a mass accuracy of better than ±50 ppm (∼2.5 Da for a 50-kDa amplicon). Furthermore, to achieve high-accuracy mass measurements, one must also take into account the negative bias introduced by the natural distribution of isotopic abundances.³⁷,³⁸ This is a result of measuring the top of centroid as opposed to the true centroid for isotopically unresolved peaks.
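The ppm figures quoted above follow from simple proportionality. As a minimal sketch (the function names are illustrative and do not come from the original work), the conversion between a relative mass accuracy and an absolute mass error can be written as:

```python
def ppm_to_da(mass_da: float, ppm: float) -> float:
    """Absolute mass error (Da) corresponding to a relative error in ppm."""
    return mass_da * ppm / 1e6


def da_to_ppm(mass_da: float, error_da: float) -> float:
    """Relative error (ppm) corresponding to an absolute mass error in Da."""
    return error_da / mass_da * 1e6


if __name__ == "__main__":
    amplicon = 50_000.0  # a 50-kDa amplicon, as in the text
    # External calibration, 200 ppm: roughly 10 Da
    print(ppm_to_da(amplicon, 200))
    # Dual-electrospray internal calibration, 50 ppm: roughly 2.5 Da
    print(ppm_to_da(amplicon, 50))
```

Read the other way around, `da_to_ppm` gives the relative accuracy needed to resolve a given mass difference on an amplicon of known size.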
We have developed a correction factor that can be added to the top-of-centroid measurement to remove the systematic bias introduced by nature.³⁹

We describe here STR genotyping for both type 2 (HUMTH01) and type 4 (vWA) human tetranucleotide repeat loci in a blind study using ESI-FTICR mass spectrometry. Simple tetranucleotide repeats (types 1 and 2) require only measurement of the double-stranded amplicon since the smallest mass difference between adjacent alleles, a single base pair, is ∼618 Da. However, more complex STRs contain alleles that can differ by less than 1 Da, which places stringent mass accuracy requirements on a double-stranded amplicon measurement. An alternative approach is to generate single-stranded amplicons using λ-exonuclease, resulting in a mass accuracy at approximately the theoretical limit of ±10 ppm for biomolecules greater than 10 kDa.⁴⁰ The use of an internal standard and the application of a correction factor to the mass measurement, yielding an accurate genotype for vWA, are described here.

(32) Hannis, J. C.; Muddiman, D. C. Rapid Commun. Mass Spectrom. 2001, 15, 348-350.
(33) George, L. T.; Null, A. P.; Hannis, J. C.; Muddiman, D. C. Proceedings of the American Society for Mass Spectrometry; Chicago, IL, 2001; ThPK266.
(34) Kloster, M. B. G.; Hannis, J. C.; Muddiman, D. C.; Farrell, N. P. Biochemistry 1999, 38, 14731-14737.
(35) Hannis, J. C.; Muddiman, D. C. J. Am. Soc. Mass Spectrom. 2000, 11, 876-883.
(36) Flora, J. W.; Hannis, J. C.; Muddiman, D. C. Anal. Chem. 2001, 73, 1247-1251.
(37) Zubarev, R. A. Int. J. Mass Spectrom. Ion Processes 1991, 107, 17-27.
(38) Zubarev, R. A.; Demirev, P. A.; Hakansson, P.; Sundqvist, B. U. R. Anal. Chem. 1995, 67, 3793-3798.
(39) Null, A. P.; Muddiman, D. C. Rapid Commun. Mass Spectrom., submitted.
(40) Beavis, R. C. Anal. Chem. 1993, 65, 496-497.

While characterization of large nucleic acids by ESI-MS is in its infancy, its potential as a platform for rapid and


accurate mass measurements of large biomolecules is clearly established.

EXPERIMENTAL SECTION

PCR Amplification. The tetrameric repeat region located within the first intron of the human tyrosine hydroxylase (HUMTH01) gene⁴¹ (GenBank D00269) was amplified from 30 human genomic DNA samples of unknown genotype and from K562 DNA (Promega, Madison, WI); the latter has been purified from a subculture of the human myelogenous leukemia cell line. Primers were synthesized and purified using reversed-phase HPLC at the Midland Certified Reagent Co. in Midland, TX (forward sequence, 5′-CCT GTT CCT CCC TTA TTT CCC-3′; reverse sequence, 5′-GGG AAC ACA GAC TCC ATG GTG-3′). Each 50-µL reaction included 1× GeneAmp PCR buffer including 1.5 mM MgCl2, 2.5 units of AmpliTaq polymerase (Perkin-Elmer, Branchburg, NJ), 50 pmol of each primer, 0.2 mM deoxynucleoside triphosphates (Perkin-Elmer), and 100 ng of template DNA. Amplification was accomplished using a 24-well Perkin-Elmer GeneAmp 2400 thermal cycler at an initial hold temperature of 94 °C for 1 min followed by six cycles of 94 °C for 15 s, 63 °C for 15 s, and 72 °C for 15 s. In subsequent cycles, the primer annealing temperature was reduced by 2-3 °C every 6 cycles until reaching a final annealing temperature of 53 °C for 10 cycles. Final elongation was carried out at 72 °C for 7 min.

The repeat region located within intron 40 of the von Willebrand factor (vWA) gene⁴² (GenBank M25858) was amplified from K562 DNA with each 50-µL reaction containing 1× GeneAmp PCR Gold buffer, 3 mM MgCl2, 1.25 units of AmpliTaq Gold (Perkin-Elmer), 12.5 pmol each primer (forward sequence, 5′-TCA GTA TGT GAC TTG GAT TG-3′; reverse sequence, 5′-GAT AAA TAC ATA GGA TGG ATG G-3′) (Midland Certified Reagent Co.), 0.2 mM deoxynucleoside triphosphates, and 100 ng of K562 DNA template.
Amplification was accomplished using a 96-well MJ Research DNA Engine thermal cycler (Waltham, MA) at an initial hold temperature of 95 °C for 10 min followed by 30 cycles of 95 °C for 15 s, 55 °C for 30 s, and 72 °C for 30 s and a final hold of 72 °C for 10 min. After PCR, each amplicon was ethanol precipitated as previously described and all oligonucleotides were microdialyzed prior to ESI-FTICR mass analysis.²¹⁻²³,²⁶,⁴³

5′-Phosphorylation Using T4 Polynucleotide Kinase. Each 25-µL reaction was composed of 1× PNK reaction buffer, 3 units of T4 PNK (Epicentre Technologies, Madison, WI), 150 pmol of vWA reverse primer, and 600 pmol of adenosine triphosphate. The reaction was incubated at 37 °C for 30 min and the enzyme deactivated for 5 min at 70 °C.

Preparation of Single-Stranded Amplicons. Amplicons were digested using λ-exonuclease (Boehringer Mannheim, Indianapolis, IN) as previously described.²³ Briefly, 20 pmol of PCR amplicon was added to 1× buffer and 5 units of λ-exonuclease in a final volume of 10 µL. Reactions were carried out for 1 h at 37 °C followed by an enzyme deactivation step at 75 °C for 10 min. The product was purified with an additional ethanol precipitation step as described for the double-stranded amplicons.

(41) Polymeropoulos, M.; Xiao, H.; Rath, D.; Merril, C. Nucleic Acids Res. 1991, 19, 3753.
(42) Kimpton, C.; Walton, A.; Gill, P. Hum. Mol. Genet. 1992, 1, 779.
(43) Liu, C.; Muddiman, D. C.; Smith, R. D. J. Mass Spectrom. 1997, 32, 425-431.


Mass Spectrometry. All spectra described here were obtained with a modified IonSpec ESI-FTICR mass spectrometer (Irvine, CA) using a 4.7-T superconducting magnet (Cryomagnetics, Inc., Oak Ridge, TN). The ESI source (Analytica of Branford, Inc., Branford, CT) was modified to accept a heated metal capillary⁴⁴ and fitted with a dual-microelectrospray emitter previously described.³⁵,³⁶ The analytes were electrosprayed from 50-µm fused-silica capillary (Polymicro Technologies, Inc., Phoenix, AZ) pulled to a fine tip and remotely coupled to a potential of approximately -2500 V with a 3 nL/s flow rate.⁴⁵ Each experiment included two hexapole accumulations,⁴⁶ one each for the amplicon and internal standard, when necessary, whose respective accumulation times were adjusted for sufficient relative signal to noise to accomplish internal calibration. Trapped ions were excited using a 4-ms chirp waveform with Vp-p = 30 V from m/z = 400 to 3500 using a preamplifier gain of 1×, without a window function, acquiring 128K of data at a 500-kHz ADC rate with 12 bits of accuracy. All electrospray reagents and the internal standard, poly(ethylene glycol) (PEG; MW = 1500), were purchased from Sigma at the highest purity available and used as received. Oligonucleotides were electrosprayed at concentrations of 2-4 µM while PEG (MW = 1500) was electrosprayed from a concentration of 10 µM. Electrospray buffer for all samples consisted of 60% acetonitrile, 20% 2-propanol, 2 mM ammonium acetate, 20 mM piperidine, and 20 mM imidazole.⁴⁷,⁴⁸

RESULTS AND DISCUSSION

Genotyping a Simple STR Locus. The well-characterized, tetrameric repeat within the HUMTH01 gene was chosen as the model locus for genotyping simple STRs. Of the 13 known alleles at this locus, 4 contain a nonconsensus repeat in which a dT residue has been deleted in the coding strand and the complementary dA residue in the noncoding strand.
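The mass consequence of deleting one dT/dA pair, or of adding one whole repeat, can be estimated from average nucleotide residue masses. The sketch below is illustrative only: the residue masses are commonly tabulated average values rather than numbers from this paper, the TCAT repeat motif for HUMTH01 is taken from the general literature rather than from the text above, and the function names are hypothetical.

```python
# Average masses (Da) of 2'-deoxynucleotide residues within a DNA chain.
# Commonly tabulated values; an assumption, not taken from the paper.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def base_pair_mass(base: str) -> float:
    """Mass contributed to a duplex by one base and its complement."""
    return RESIDUE_MASS[base] + RESIDUE_MASS[COMPLEMENT[base]]


def duplex_segment_mass(top_strand: str) -> float:
    """Mass contributed to a duplex by a stretch of base pairs."""
    return sum(base_pair_mass(base) for base in top_strand)


if __name__ == "__main__":
    # One dT/dA pair: the difference between a nonconsensus allele
    # (e.g., 9.3) and the next full allele (e.g., 10).
    print(round(base_pair_mass("T"), 2))
    # One full tetranucleotide repeat (assumed motif TCAT).
    print(round(duplex_segment_mass("TCAT"), 2))
```

Under these assumptions a single A-T base pair contributes about 617 Da and a full repeat about 2471 Da, which is in line with the spacing of roughly 618 Da between adjacent alleles quoted in this paper.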
Each allele is named according to the number of repeats (e.g., 8, 9, 10) with nonconsensus alleles named by the number of complete repeats and the number of bases present in the incomplete repeat separated by a decimal (e.g., 8.3, 9.3, 10.3).⁴⁹

Thirty purified human genomic DNA samples were provided by Microdiagnostics (Nashville, TN) for genotyping in a blind study. HUMTH01 was amplified from each by PCR using a primer pair designed to directly flank the repeating sequence while maintaining a 3′-GC pair to ensure effective amplification. Generating the smallest possible amplicon improves the resolution and sensitivity of a mass spectrometric measurement as well as improving PCR efficiency. Based on known alleles, amplicons from 55 to 98 base pairs were expected, corresponding to molecular weights of ∼34 000-60 000. Table 1 lists the HUMTH01 genotypes determined for all 30 genomic samples using ESI-FTICR mass spectrometry. Genotypes were validated using polyacrylamide gel electrophoresis (PAGE) with conventional primers, and each agreed with the genotype determined by ESI-FTICR.

Table 1. HUMTH01 Genotypes Derived from the Genomic DNA of 30 Individuals

individual   genotype     individual   genotype
1            6/7          16           7/7
2            7/9          17           8/8
3            7/8          18           8/9.3
4            7/7          19           8/9.3
5            7/7          20           7/9.3
6            7/7          21           9/9
7            6/9.3        22           6/9
8            6/7          23           8/9.3
9            8/9.3        24           6/6
10           9/9          25           7/9
11           9/9          26           8/9
12           6/9.3        27           7/9
13           6/7          28           6/6
14           6/7          29           9.3/9.3
15           9/9.3        30           6/9.3

(44) Chowdhury, S. K.; Katta, V.; Chait, B. T. Rapid Commun. Mass Spectrom. 1990, 4, 81-87.
(45) Hannis, J. C.; Muddiman, D. C. Rapid Commun. Mass Spectrom. 1998, 12, 443-448.
(46) Senko, M. W.; Hendrickson, C. L.; Emmett, M. R.; Shi, S. D.-H.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1997, 8, 970-976.
(47) Greig, M.; Griffey, R. H. Rapid Commun. Mass Spectrom. 1995, 9, 97-102.
(48) Muddiman, D. C.; Cheng, X. H.; Udseth, H. R.; Smith, R. D. J. Am. Soc. Mass Spectrom. 1996, 7, 697-706.
(49) Bar, W.; Brinkmann, B.; Lincoln, P.; Mayr, W. R.; Rossi, U. Int. J. Leg. Med. 1994, 107, 159-160.

Although gel electrophoresis and mass spectrometry provided the same conclusion for the HUMTH01 genotypes, ESI-FTICR mass spectrometry provides several significant advantages. First, one can generate a mass spectrum in a matter of seconds whereas gel electrophoresis can take many hours to yield results. Second, silver staining, the most sensitive approach used for detection in electrophoresis, requires picomoles (or nanograms) of material to visualize a signal whereas mass spectrometry requires only femtomole to attomole quantities of PCR amplicon electrosprayed from micromolar to nanomolar concentrations.²⁷,³¹ Currently, 2 µL of sample is required; thus, for a 10 nM solution of PCR amplicon,³¹ 20 fmol of amplicon must be produced. Third, the ability to directly measure the mass of an amplicon by mass spectrometry eliminates the need for the allelic ladder required in electrophoretic-based methods that compare relative mobility.

Figure 1 shows single-acquisition ESI-FTICR mass spectra for 6 out of the 30 amplicons generated for the HUMTH01 locus. Due to the terminal transferase activity inherent with Taq polymerase,⁵⁰ three species are produced per allele separated by a mass of ∼313 Da: blunt-ended, monoadenylated, and diadenylated amplicons. Each amplicon was electrosprayed at 2 µM to ensure a good signal-to-noise ratio, which decreases with an increase in the number of molecular species. Only mono- and diadenylated species are significant peaks in the spectra shown here, with very little salt adduction. The top three spectra were derived from individuals 24, 4, and 21, who are homozygous at the HUMTH01 locus with genotypes of (6,6), (7,7), and (9,9), respectively. There are clearly two peaks at each charge state representing the mono- and diadenylated species. Occasionally, primers from the PCR were observed as illustrated in the spectrum generated for the amplicon derived from individual 4.
These peaks were easily recognizable and did not interfere with genotype calling for any of the individuals.

(50) Clark, J. M. Nucleic Acids Res. 1988, 16, 9677-9686.

The three mass spectra in the bottom row of Figure 1 were derived from individuals 1, 2, and 9, who are heterozygous for the HUMTH01 locus. A pair of charge-state distributions with essentially equal intensities immediately distinguishes the mass spectra derived from heterozygous individuals from those derived from homozygous individuals. The average molecular masses correspond to the genotypes (6,7), (7,9), and (8,9.3), respectively. The signal-to-noise ratio of both charge-state distributions in the mass spectra derived from heterozygous individuals is ∼50% less on average than those derived from homozygous individuals since the signal is dispersed over twice as many peaks.

An interesting observation in all spectra derived from heterozygous individuals was that minor peaks were present that did not correspond to any expected component of the PCR. Figure 2 shows the ESI-FTICR mass spectrum derived from individual 15, who is heterozygous (9,9.3) for HUMTH01, with an expansion to illustrate these minor peaks. A second expansion of the mass spectrum in Figure 2 shows two peaks whose average molecular masses correspond to the sum of the top strand of allele 9.3 and the bottom strand of allele 9 and vice versa; both are 3′ monoadenylated. The second set of peaks is ∼313 Da greater than the first and is identified by mass as the diadenylated heteroduplexes. The primary structures of the homoduplexes and subsequent heteroduplexes are shown in the schematic in Figure 2 to illustrate how a portion of one strand loops out when a heteroduplex forms. This depiction shows the nonconsensus repeat looping out; however, an ensemble of configurations is possible because the heteroduplex will assume the most thermodynamically favorable secondary structure. These heteroduplexes could occur two ways: (i) when the polymerase falls off of the synthesizing strand during PCR, a repeat unit within the template strand loops out and elongation resumes without including that repeat or (ii) when “complementary” strands from two different alleles anneal in the final step of PCR.
The former explanation is unlikely since heteroduplexes are not observed in the mass spectra of amplicons generated from homozygous individuals. The latter explanation is supported by the fact that any two alleles of the HUMTH01 locus possess a high degree of homology. The highest degree of homology from the samples analyzed here exists between alleles 9 and 9.3 (96%), so it was no surprise that the heteroduplex peaks generated from these alleles were the most intense among all spectra derived from heterozygous individuals. The intensity of heteroduplex peaks among all heterozygous individuals in this study was roughly proportional to the degree of homology between alleles. Heteroduplexes could complicate gel electrophoresis results if they are interpreted as stutter bands or an alternative allele due to DNA mixtures. The mass accuracy of mass spectrometry alleviates this problem by providing unique identification of such artifacts by direct mass measurement.

Mass accuracy and precision are an advantage of mass spectrometry over electrophoretic methods, especially when accompanied by internal calibration, where measurements routinely approach the theoretical limit of 10 ppm⁴⁰ in our laboratory for large biomolecules.³⁵ However, the smallest mass difference between two adjacent alleles for a double-stranded amplicon derived from the HUMTH01 locus is ∼618 Da for alleles 9.3 and 10. Although separation of these alleles is challenging when electrophoretic methods are used, this difference equates to a mass accuracy requirement of ∼12 000 ppm, which is routine for any mass spectrometer. Therefore, mass measurement of the


Figure 1. ESI-FTICR mass spectra of amplicons derived from the tetrameric repeat region of the HUMTH01 locus for individuals 24, 4, 21, 1, 2, and 9 (6 out of the 30 individuals used in this study). Amplicons were electrosprayed from a concentration of 2 µM and each single-acquisition mass spectrum was collected using 128K data points and an ADC rate of 500 kHz. The three spectra in the top row are derived from homozygous individuals whereas the bottom row consists of spectra from amplicons derived from heterozygous individuals. The genotype for each individual and charge states have been labeled with only the monoadenylated peaks specified in the spectra of heterozygous individuals for clarity. The presence of excess primer, occasionally observed in mass spectra of PCR amplicons, has been indicated in the spectrum generated for individual 4.
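The acquisition parameters quoted in the caption can be related to one another with a little arithmetic. The sketch below is only an illustration: it assumes that "128K" means 128 × 1024 points and uses the unperturbed cyclotron relation f = zeB/(2πm), which ignores trapping-field and space-charge shifts. The 4.7-T field and the m/z 400 to 3500 excitation range come from the Experimental Section; the function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def cyclotron_frequency_hz(mz: float, field_tesla: float) -> float:
    """Unperturbed cyclotron frequency for an ion of the given m/z."""
    return ELEMENTARY_CHARGE * field_tesla / (
        2 * math.pi * mz * ATOMIC_MASS_UNIT
    )


def transient_duration_s(n_points: int, adc_rate_hz: float) -> float:
    """Length of the recorded time-domain signal."""
    return n_points / adc_rate_hz


if __name__ == "__main__":
    n_points = 128 * 1024   # assumed meaning of "128K"
    adc_rate = 500e3        # 500-kHz ADC rate
    field = 4.7             # tesla

    print("transient (s):", transient_duration_s(n_points, adc_rate))
    print("Nyquist limit (Hz):", adc_rate / 2)
    for mz in (400, 3500):
        print(mz, "->", round(cyclotron_frequency_hz(mz, field)), "Hz")
```

Under these assumptions the transient lasts about 0.26 s, and the highest cyclotron frequency in the excited range (roughly 180 kHz at m/z 400) lies below the 250-kHz Nyquist limit of the digitizer.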

double-stranded amplicon is sufficient for accurate genotyping of simple repeat loci. However, genotyping a complex STR system places stringent demands on the technology applied for accurate genotyping, and a novel approach using mass spectrometry will be detailed below.

Genotyping a Compound STR Locus. The tetranucleotide repeat sequence vWA was employed for characterization of a more complex STR system by ESI-FTICR mass spectrometry. The 18 possible alleles at this compound, nonconsensus locus consist of 5′-d(TCTA)x-3′ and 5′-d(TCTG)y-3′ repeats with the nonconsensus sequence 5′-d(TCCA)z-3′. Alleles in which the value of x is 4 are named by the number of full repeats (i.e., x + y + z),⁴⁹ while those that differ are named by the number of full repeats followed by a prime (′). Those alleles that lack the last eight nucleotides altogether are denoted by a double prime (′′).⁵¹ We should note that another nomenclature exists that neglects the last eight nucleotides and calls alleles by the repeat number only, regardless of structure;⁵¹ the former nomenclature will be used throughout this report.

Several alleles at the vWA locus contain the same number of repeats but differ in sequence. For example, there are two alleles that contain 18 repeats, one with the sequence 5′-d(TCTA[TCTG]4[TCTA]11TCCATCTA)-3′ referred to as allele 18, and the other with the sequence 5′-d(TCTA[TCTG]3[TCTA]12TCCATCTA)-3′ referred to as allele 18′. Using our primer pair that closely flanks the repeating sequence, the double-stranded PCR amplicons generated from these alleles have theoretical monoadenylated molecular masses of 71 846.21 and 71 845.22 Da, respectively. Scientists genotyping this locus using electrophoretic methods call both alleles 18 since they are indistinguishable by electrophoretic mobility. The goal of our work here is to distinguish between two very similar yet distinct alleles using mass spectrometry to provide accurate STR genotypes.

(51) Brinkmann, B.; Sajantila, A.; Goedde, H. W.; Matsumoto, H.; Nishi, K.; Wiegand, P. Eur. J. Hum. Genet. 1996, 4, 175-182.

The repeat region at the vWA locus was amplified from K562 human genomic DNA and the amplicon analyzed using ESI-FTICR mass spectrometry. Figure 3 shows a mass spectrum of the double-stranded PCR amplicon electrosprayed from a 4 µM solution, which consists of a single distribution with an average monoadenylated mass of 71 877.78 Da. The blunt-ended and diadenylated peaks can be observed flanking the most intense monoadenylated peak. It can be concluded that this individual is homozygous for an allele with 18 repeats; however, these measurements cannot confirm whether the individual is homozygous for allele 18 or 18′. The experimental average mass, taken as an average of the eight most abundant monoadenylated peaks, deviates from the theoretical average mass of allele 18 by 389 ppm and likewise by 403 ppm for allele 18′. The mass accuracy

Figure 2. ESI-FTICR mass spectrum of the HUMTH01 amplicon generated from the genomic DNA of individual 15 with expansions of the heteroduplexes formed between alleles 9 and 9.3. Expansion of the m/z range 2295 to 2485 shows the mono- and diadenylated heteroduplex peaks representing the duplex formed between the top strand of allele 9.3 and the bottom strand of allele 9 (9.3/9) and vice versa (9/9.3). The dominant 3′-monoadenylated species have been expanded to illustrate each heteroduplex with average molecular masses of 49 902.08 (9.3/9) and 49 942.10 Da (9/9.3). The schematic depicts only one of the many possible heteroduplexes that may form between complementary strands of alleles 9 and 9.3.
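The relationship between the heteroduplex masses and the m/z window quoted in the caption can be checked with a short calculation. This is a sketch under stated assumptions: ions are taken to be fully deprotonated negative ions, [M - zH]z-, which is consistent with the negative spray potential reported in the Experimental Section, and salt adducts are ignored. The function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_negative(mass_da: float, z: int) -> float:
    """m/z of an [M - zH]z- ion formed from a neutral of the given mass."""
    return mass_da / z - PROTON_MASS


def mass_from_mz(mz: float, z: int) -> float:
    """Neutral mass recovered from the m/z and charge of an [M - zH]z- ion."""
    return z * (mz + PROTON_MASS)


def charge_states_in_window(mass_da, mz_low, mz_high, z_max=100):
    """Charge states whose m/z falls inside the given window."""
    return [
        z for z in range(1, z_max + 1)
        if mz_low <= mz_negative(mass_da, z) <= mz_high
    ]


if __name__ == "__main__":
    # Monoadenylated heteroduplex masses from the Figure 2 caption.
    for mass in (49_902.08, 49_942.10):
        states = charge_states_in_window(mass, 2295, 2485)
        print(mass, states, [round(mz_negative(mass, z), 2) for z in states])
```

Under these assumptions only a single charge state of each heteroduplex falls within the m/z 2295 to 2485 window, and the two heteroduplexes, about 40 Da apart in mass, are separated by roughly 1.9 m/z units at that charge state.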

required to confidently assign a genotype must be better than 7 ppm; however, achieving a mass accuracy better than 10 ppm is theoretically unsound for biomolecules greater than 10 000 Da due to the natural variation of isotope abundances.⁴⁰ For this reason, internal calibration was not employed for this experiment. Clearly, accurately characterizing such similar alleles demands an approach based on mass measurement of a single strand, which reduces the mass accuracy requirements to a level that is achievable for genotype calling.

The production of single-stranded amplicons using the DNA repair enzyme λ-exonuclease has previously been shown for the HUMTH01 locus;²³ more recently we have explored electrospraying PCR amplicons under denaturing conditions utilizing both modified solution compositions and a heated transfer capillary.⁵² However, when addressing complex mixtures, such as those encountered in assaying microsatellite instability, detection of a single strand per allele will prove invaluable. λ-Exonuclease selectively digests a single strand of a DNA duplex that contains a 5′-phosphorylation, leaving the complementary strand intact. In the previous work, a 5′-phosphorylated primer was included in the PCR and the phosphorylated amplicon treated with λ-exonuclease. Although single-stranded amplicons were successfully produced from the phosphorylated amplicons, the signal-to-noise ratio suffered due to the presence of unphosphorylated double-stranded amplicons. Mass spectrometric analysis of these primers, which were commercially prepared using phosphoramidite chemistry, found that only half of the primers were phosphorylated.²³

To avoid such a problem, for these experiments we performed an in-house phosphorylation assay using T4 polynucleotide kinase (T4 PNK). T4 PNK catalyzes the transfer of the γ-phosphate of adenosine triphosphate (ATP) to the 5′-hydroxyl terminal of single- or double-stranded DNA as illustrated in Figure 4A. The reverse primer was chosen as the primer to be phosphorylated and for subsequent digestion of the bottom strand of the amplicon. Each 25-µL T4 PNK reaction contains 150 pmol of primer, which is sufficient for 15 PCR reactions. The entire reaction takes 35 min and can be run while the PCR is being prepared. Figure 4B shows a mass spectrum of the reverse primer after T4 PNK treatment. The phosphorylated primer is clearly the most abundant species in the spectrum. On the basis of peak intensities, 85% of the total reverse primer has been converted to phosphorylated primer, ensuring an effective λ-exonuclease digestion. An advantage of phosphorylating the primer using T4 PNK is that the reaction can be directly added to the PCR without any sample cleanup.

(52) Mangrum, J. B.; Flora, J. F.; Null, A. P.; Muddiman, D. C. Proceedings of the American Society for Mass Spectrometry, Chicago, IL, 2001; ThPK272.

PCR amplicons were prepared from the vWA locus as before, substituting the phosphorylated reverse primer for the normal reverse primer in the reaction. After amplification and ethanol


Figure 3. Five-acquisition ESI-FTICR mass spectrum of the double-stranded vWA amplicon derived from the genomic DNA template of a homozygous individual. The experimental mass determined using external calibration was 71 877.78 Da, indicating 18 tetranucleotide repeats. It cannot be determined from this average mass measurement whether the individual is homozygous for allele subtype 18 (MWtheo = 71 846.21) or subtype 18′ (MWtheo = 71 845.22) since the mass accuracy required to discriminate between the two alleles is below theoretically achievable limits (≤10 ppm).
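The mass accuracy requirements quoted for the vWA alleles can be reproduced with a short calculation. The sketch below assumes that the error must stay below half of the ppm separation between the two candidate alleles; this rule is an inference that happens to match the thresholds stated in the text (about 7 ppm for the duplex and 225 ppm for the top strand), not a formula given by the authors. The function names are illustrative only.

```python
def ppm_separation(mass_a, mass_b):
    """Mass difference between two alleles, in ppm of their mean mass."""
    mean_mass = (mass_a + mass_b) / 2.0
    return abs(mass_a - mass_b) / mean_mass * 1e6


def required_accuracy_ppm(mass_a, mass_b):
    """Assumed rule: the measurement error must stay below half the separation."""
    return ppm_separation(mass_a, mass_b) / 2.0


# Theoretical average masses (Da) quoted in the paper for vWA alleles 18 and 18'.
DOUBLE_STRANDED = (71846.21, 71845.22)
TOP_STRAND = (35608.80, 35592.81)

for label, (m18, m18_prime) in (("double-stranded", DOUBLE_STRANDED),
                                ("top strand", TOP_STRAND)):
    print(f"{label}: separation {ppm_separation(m18, m18_prime):.1f} ppm, "
          f"required accuracy {required_accuracy_ppm(m18, m18_prime):.1f} ppm")
```

Under this assumption the duplex needs an accuracy slightly better than 7 ppm, which is below the roughly 10 ppm limit cited for large biomolecules, whereas the single strand needs only about 225 ppm.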

Figure 4. Phosphorylation of the vWA reverse primer using T4 polynucleotide kinase. (A) The schematic illustrates the T4 PNK-catalyzed transfer of the γ-phosphate of adenosine triphosphate to the 5′-terminus of the reverse primer. (B) ESI-FTICR mass spectrum of the PNK-treated reverse primer electrosprayed at a concentration of 2 µM. The relative peak intensities indicate that ∼85% of the primer has been phosphorylated.
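As a rough illustration of how a phosphorylation yield can be estimated from relative peak intensities, the sketch below divides the phosphorylated-primer intensity by the summed intensity of both primer forms. The intensity values are hypothetical placeholders chosen to give 85%, and the estimate assumes that both forms ionize with similar efficiency. The per-reaction primer amount is inferred from the stated 150 pmol per 15 PCR reactions.

```python
def phosphorylated_fraction(phos_intensity, unphos_intensity):
    """Fraction of primer phosphorylated, estimated from two peak intensities."""
    total = phos_intensity + unphos_intensity
    if total <= 0:
        raise ValueError("summed peak intensity must be positive")
    return phos_intensity / total


def primer_per_pcr(total_pmol, n_reactions):
    """Primer available per PCR when one kinase reaction is split evenly."""
    return total_pmol / n_reactions


# Hypothetical relative intensities (arbitrary units), not measured values.
fraction = phosphorylated_fraction(85.0, 15.0)

# 150 pmol of primer per T4 PNK reaction, shared across 15 PCR reactions.
per_pcr = primer_per_pcr(150.0, 15)

print(f"phosphorylated fraction: {fraction:.0%}")
print(f"primer per PCR: {per_pcr:.1f} pmol "
      f"({per_pcr * fraction:.1f} pmol phosphorylated)")
```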

precipitation, the reactions were treated with λ-exonuclease to digest the bottom strand, leaving only the top strand for analysis by mass spectrometry. The top strands of alleles 18 and 18′ have theoretical average masses of 35 608.80 and 35 592.81 Da, respectively; therefore, a mass accuracy of better than 225 ppm is required to distinguish between these two alleles. Using external calibration, an experimental average mass of 35 623.65 Da was determined for the single-stranded amplicon by averaging the five most abundant peaks in each of eight spectra. The experimental mass differs from allele 18 by 14.85 Da (417 ppm) and from allele 18′ by 30.84 Da (861 ppm); therefore, an internal calibrant is clearly necessary to obtain an accurate genotype (data not shown). Internal calibration using a dual-electrospray source has been shown to significantly improve mass measurements for nucleic acids by reducing the systematic error resulting from space charge effects.35,36 For this study, PEG (MW = 1500) was used since it spans the same m/z range as the single-stranded amplicon. Figure 5 shows the single-acquisition mass spectrum of the single-stranded amplicon including the internal calibrant. Both the blunt-ended and monoadenylated top strands of the amplicon are present, although the monoadenylated form is the predominant species. Inspection of the mass spectrum shows that no double-stranded amplicon is detectable. Several charge states of the adenylated PCR amplicon have been labeled, and the PEG (MW = 1500) peaks are denoted by asterisks. Using five peaks from each of eight single-acquisition spectra, the experimental average mass is 35 607.69 Da, which differs from allele 18 by -31 ppm and from allele 18′ by 418 ppm. These results indicate that the individual from whom this amplicon was derived is homozygous for allele 18. Sanger sequencing of the individual confirmed this result.53 The fact that the remaining error in the mass measurement possesses a negative bias strongly suggests that a systematic error is introduced by the natural abundance of isotopes.37,38 A correction factor that relies on the average mass and the resolution of the spectral peak has been described in detail for nucleic acids.39 The correction factor for a molecule of 35 608 Da with a resolution of 2000 (fwhm) is 0.24 Da. The corrected mass is 35 607.93 Da with a mass error of -25 ppm, which closely approaches the theoretical limit of average mass measurement accuracy and precision of large biomolecules.40

These data demonstrate the reliable and accurate genotyping of tetranucleotide STR loci that is achieved using ESI-FTICR mass spectrometry. Externally calibrated mass measurements derived from double-stranded amplicons were sufficient for genotyping the HUMTH01 locus, while production of a single-stranded amplicon and internal calibration were required to accurately define the genotype for vWA. The accuracy, resolution, and short analysis time of ESI-FTICR ensure the ability to characterize many STRs of any complexity.

(53) Personal communication, Promega Corp., Madison, WI, 2000.

Figure 5. Single-acquisition ESI-FTICR mass spectrum of a single-stranded vWA amplicon including an internal standard. The single-stranded amplicon was generated by incorporating a 5′-phosphorylated reverse primer into the PCR and subsequently digesting the product with λ-exonuclease. PEG (MW = 1500) was introduced as an internal standard using the dual-electrospray source and is denoted by asterisks. The internally calibrated molecular mass and mass error were determined by averaging five charge states from each of eight mass spectra of similar signal-to-noise ratios and using 16 PEG (MW = 1500) peaks. The average mass shown includes the addition of a 0.24-Da correction factor. From these data, the individual can be genotyped as homozygous for allele 18.
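The genotype call for the single-stranded vWA amplicon can be sketched as a comparison of the measured average mass against each candidate allele, optionally after adding the isotopic correction factor. This is a minimal illustration rather than the authors' software: the function names and the decision rule (accept the closest allele only if its error is within the 225 ppm requirement) are assumptions, while the masses, the 0.24-Da correction, and the 225 ppm threshold are taken from the text.

```python
# Theoretical average masses (Da) of the top strands, as quoted in the text.
CANDIDATES_DA = {"18": 35608.80, "18'": 35592.81}

# Accuracy stated in the text as necessary to distinguish the two alleles.
TOLERANCE_PPM = 225.0


def ppm_error(measured_da, theoretical_da):
    """Signed mass error in parts per million."""
    return (measured_da - theoretical_da) / theoretical_da * 1e6


def call_allele(measured_da, candidates, tolerance_ppm, correction_da=0.0):
    """Return (call, errors); call is None if no allele is within tolerance.

    The decision rule is an assumption made for this sketch.
    """
    corrected = measured_da + correction_da
    errors = {name: ppm_error(corrected, mass) for name, mass in candidates.items()}
    best = min(errors, key=lambda name: abs(errors[name]))
    call = best if abs(errors[best]) < tolerance_ppm else None
    return call, errors


measurements = {
    "external calibration": (35623.65, 0.0),
    "internal calibration": (35607.69, 0.0),
    "internal + correction": (35607.69, 0.24),
}

for label, (mass, correction) in measurements.items():
    call, errors = call_allele(mass, CANDIDATES_DA, TOLERANCE_PPM, correction)
    summary = ", ".join(f"{name}: {err:+.0f} ppm" for name, err in errors.items())
    print(f"{label}: call={call} ({summary})")
```

With the rounded masses quoted above, the externally calibrated value falls outside the tolerance for both alleles, while the internally calibrated values select allele 18 with errors close to those reported.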

CONCLUSIONS

We have demonstrated that ESI-FTICR mass spectrometry is a reliable approach for genotyping STRs by accurately genotyping both simple and compound STR loci. STRs with simple repeat sequences only require the accurate mass measurement of a duplex PCR amplicon for reliable genotyping. Clearly, analysis of single-stranded amplicons using internal calibration and a correction factor will be required to satisfy the stringent mass accuracy requirements for accurate genotyping of types 3-6 STR loci. Mass spectrometry possesses the ability to directly and rapidly provide specific genetic information that would otherwise require Sanger sequencing.

ACKNOWLEDGMENT

We gratefully acknowledge MicroDiagnostics (Nashville, TN) for providing the 30 human genomic DNA samples and the National Institutes of Health (R01HG02159) and the Mary E. Kapp Foundation of the Department of Chemistry, Virginia Commonwealth University for the generous financial support.

Received for review April 5, 2001. Accepted July 3, 2001.

AC0103928

Analytical Chemistry, Vol. 73, No. 18, September 15, 2001
