Anal. Chem. 2009, 81, 7515–7526
Accelerated Articles

Base Composition Profiling of Human Mitochondrial DNA Using Polymerase Chain Reaction and Direct Automated Electrospray Ionization Mass Spectrometry

Thomas A. Hall,† Kristin A. Sannes-Lowery,† Leslie D. McCurdy,‡ Constance Fisher,‡ Theodore Anderson,§ Almira Henthorne,† Lora Gioeni,‡ Bruce Budowle,| and Steven A. Hofstadler*,†

Ibis Biosciences, subsidiary of Abbott Molecular, Inc., Carlsbad, California 92008, Federal Bureau of Investigation, Quantico, Virginia 22135, Armed Forces DNA Identification Laboratory, Rockville, Maryland 20850, and Department of Forensic and Investigative Genetics, Institute of Investigative Genetics, University of North Texas Health Science Center, Fort Worth, Texas 76107

We describe an automated system for high-resolution profiling of human mitochondrial DNA (mtDNA) based upon multiplexed polymerase chain reaction (PCR) followed by desolvation and direct analysis using electrospray ionization mass spectrometry (PCR/ESI-MS). The assay utilizes 24 primer pairs that amplify targets in the mtDNA control region, including the hypervariable regions typically sequenced in a forensic analysis. Profiles consisting of product base compositions can be stored in a database, compared to each other, and compared to sequencing results. Approximately 94% of discriminating information obtained by sequencing is retained with this technique. The assay is more discriminating than sequencing minimum HV1 and HV2 regions because it interrogates more of the mitochondrial genome. A profile compared to a population database can be subjected to the same statistics used for assessing the significance of concordant mtDNA sequences. The assay is not hindered by length heteroplasmy, can directly analyze template mixtures, and has a sensitivity of 99% of trials producing a full profile with automated analysis. The technique has direct application to analysis of forensic biological evidence.
Forensic analysis of mitochondrial DNA (mtDNA) often provides results for samples where nuclear autosomal marker analyses are problematic (such as old bones, teeth, and hair shafts).1-5 Typing generally involves PCR amplification of two regions of the mtDNA control region called hypervariable regions 1 and 2 (HV1 and HV2, nucleotide positions 16024-16365 and 73-340, respectively, of the revised Cambridge Reference Sequence6,7) followed by Sanger sequencing of the amplicons.8,9 This process can be laborious, time-consuming, and costly. Additionally, data analysis can be confounded by sequence artifacts, electrophoretic anomalies, the presence of heteroplasmy (i.e., the presence of more than one mitochondrial genome variant within an individual), and limited ability to quantify the components of a mixed sample. Heteroplasmy can be confined to two genomes usually differing at a single nucleotide position (SNP) or differing in length, typically at polycytosine tracts within the regularly analyzed regions.10-14 The presence of length heteroplasmy can make sequence analysis cumbersome by producing mixed, out-of-register sequence results. Likewise, evidence samples comprised of DNA from more than one contributor render sequence analysis difficult unless individual components are separated, for example, by cloning and sequencing multiple clones.15,16

Alternate technical approaches provide mechanisms to overcome challenges and limitations inherent in direct mitochondrial sequencing. Various technologies, such as pyrophosphate-based sequencing (pyrosequencing),17,18 denaturing high-performance liquid chromatography (DHPLC),19 individual SNP-targeting analyses,20 microarray-based resequencing,21 and mass spectrometry-based analysis,22-24 may obviate some of the limitations of Sanger sequencing but have other disadvantages. For example, SNP-based techniques such as SNaPshot (Applied Biosystems) require that each individual polymorphic position be previously characterized and specifically targeted through primer design. However, in forensic analyses, substantially more information for differentiating individuals can be obtained by exploiting private mutations whose position within a DNA sequence may not be known a priori. Array-based techniques for mitochondrial resequencing are expensive to set up, are challenged when confronted with homopolymeric stretches, and generally require large amounts of pristine sample DNA. Hence they are less well-suited for forensic analysis of limited or poor-quality samples.21,25,26 Pyrosequencing, in contrast, shows promise in terms of sensitivity and applicability to mixture analysis but has limited capability for multiplexing.17,27

Mass spectrometry has been demonstrated as a viable platform for several modes of nucleic acid analysis.28-31 DNA typing methods have been described for the detection and identification of clinically relevant microbes both from isolates and clinical specimens and for forensic profiling of human DNA.22,24,32-35 Several SNP identification methodologies based upon mass spectrometry have been described, primarily using matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) platforms.36,37 Recently, multiplex PCR followed by high-performance liquid chromatography (HPLC) coupled to electrospray ionization time-of-flight mass spectrometry (ESI-TOF-MS) was demonstrated to be amenable for typing coding regions of human mtDNA targeted for the presence of SNPs between individuals to complement sequencing of regions HV1 and HV2.24 Mass spectrometry of PCR products has the advantage that private mutations within the targeted regions are detected in addition to positions known to vary frequently, expanding the discriminating potential of an assay beyond that of specific SNP targeting.

In this work we describe a multiplexed PCR/ESI-TOF-MS-based assay that interrogates 1051 bases of the human mtDNA control region encompassing the most polymorphic regions assayed in a mitochondrial forensic analysis. A total of 24 overlapping segments, each defined by a pair of PCR primers, are partitioned into 8 triplex reactions. The amplified products are analyzed directly following automated amplicon purification with no required size separation of products prior to mass spectrometry. Accurately determined masses of PCR products are converted to base compositions for each set of amplicons to develop an overall base composition profile for a sample. The compiled profile can be compared directly to another base composition profile or a database generated from sequence data, base composition, or both.22 Individual polymorphic positions do not need to be specifically characterized and targeted as with sequencing, as all variation within the amplified regions is interrogated simultaneously.

* To whom correspondence should be addressed. E-mail: [email protected]. Phone: (760) 603-2599.
† Ibis Biosciences, subsidiary of Abbott Molecular, Inc.
‡ Federal Bureau of Investigation.
§ Armed Forces DNA Identification Laboratory.
| University of North Texas Health Science Center.
10.1021/ac901222y CCC: $40.75 2009 American Chemical Society. Published on Web 08/17/2009
Analytical Chemistry, Vol. 81, No. 18, September 15, 2009

(1) Budowle, B.; Allard, M. W.; Wilson, M. R.; Chakraborty, R. Annu. Rev. Genomics Hum. Genet. 2003, 4, 119–141.
(2) Ginther, C.; Issel-Tarver, L.; King, M. C. Nat. Genet. 1992, 2, 135–138.
(3) Sullivan, K. M.; Hopgood, R.; Gill, P. Int. J. Legal Med. 1992, 105, 83–86.
(4) Wilson, M. R.; DiZinno, J. A.; Polanskey, D.; Replogle, J.; Budowle, B. Int. J. Legal Med. 1995, 108, 68–74.
(5) Bogenhagen, D.; Clayton, D. A. J. Biol. Chem. 1974, 249, 7791–7795.
(6) Anderson, S.; Bankier, A. T.; Barrell, B. G.; de Bruijn, M. H. L.; Coulson, A. R.; Drouin, J.; Eperon, I. C.; Nierlich, D. P.; Roe, B. A.; Sanger, F.; Schreier, P. H.; Smith, A. J. H.; Staden, R.; Young, I. G. Nature 1981, 290, 457–465.
(7) Andrews, R. M.; Kubacka, I.; Chinnery, P. F.; Lightowlers, R. N.; Turnbull, D. M.; Howell, N. Nat. Genet. 1999, 23, 147.
(8) Bar, W.; Brinkmann, B.; Budowle, B.; Carracedo, A.; Gill, P.; Holland, M.; Lincoln, P. J.; Mayr, W.; Morling, N.; Olaisen, B.; Schneider, P. M.; Tully, G.; Wilson, M. Int. J. Legal Med. 2000, 113, 193–196.
(9) Tully, G.; Bar, W.; Brinkmann, B.; Carracedo, A.; Gill, P.; Morling, N.; Parson, W.; Schneider, P. Forensic Sci. Int. 2001, 124, 83–91.
(10) Santos, C.; Sierra, B.; Alvarez, L.; Ramos, A.; Fernandez, E.; Nogues, R.; Aluja, M. P. J. Mol. Evol. 2008, 67, 191–200.
(11) Bendall, K. E.; Sykes, B. C. Am. J. Hum. Genet. 1995, 57, 248–256.
(12) Budowle, B.; Allard, M. W.; Wilson, M. R.; Chakraborty, R. Annu. Rev. Genomics Hum. Genet. 2003, 4, 119–141.
(13) Lagerstrom-Fermer, M.; Olsson, C.; Forsgren, L.; Syvanen, A. C. Am. J. Hum. Genet. 2001, 68, 1299–1301.
(14) Salas, A.; Lareu, M. V.; Carracedo, A. Int. J. Legal Med. 2001, 114, 186–190.
(15) Hatsch, D.; Amory, S.; Keyser, C.; Hienne, R.; Bertrand, L. J. Forensic Sci. 2007, 52, 891–894.
(16) Walker, J. A.; Garber, R. K.; Hedges, D. J.; Kilroy, G. E.; Xing, J.; Batzer, M. A. Anal. Biochem. 2004, 325, 171–173.
(17) Andreasson, H.; Nilsson, M.; Budowle, B.; Frisk, S.; Allen, M. Int. J. Legal Med. 2006, 120, 383–390.
(18) Andreasson, H.; Asp, A.; Alderborn, A.; Gyllensten, U.; Allen, M. Biotechniques 2002, 32, 124–126, 128, 130-123.
(19) Kristinsson, R.; Lewis, S. E.; Danielson, P. B. J. Forensic Sci. 2009, 54, 28–36.
(20) Divne, A. M.; Nilsson, M.; Calloway, C.; Reynolds, R.; Erlich, H.; Allen, M. J. Forensic Sci. 2005, 50, 548–554.
(21) Vallone, P. M.; Jakupciak, J. P.; Coble, M. D. Forensic Sci. Int. Genet. 2007, 1, 196–198.
(22) Hall, T. A.; Budowle, B.; Jiang, Y.; Blyn, L.; Eshoo, M.; Sannes-Lowery, K. A.; Sampath, R.; Drader, J. J.; Hannis, J. C.; Harrell, P.; Samant, V.; White, N.; Ecker, D. J.; Hofstadler, S. A. Anal. Biochem. 2005, 344, 53–69.
(23) Oberacher, H.; Niederstatter, H.; Parson, W. Int. J. Legal Med. 2007, 121, 57–67.
(24) Oberacher, H.; Niederstatter, H.; Pitterl, F.; Parson, W. Anal. Chem. 2006, 78, 7816–7827.
(25) Zhou, S.; Kassauei, K.; Cutler, D. J.; Kennedy, G. C.; Sidransky, D.; Maitra, A.; Califano, J. J. Mol. Diagn. 2006, 8, 476–482.
(26) Maitra, A.; Cohen, Y.; Gillespie, S. E.; Mambo, E.; Fukushima, N.; Hoque, M. O.; Shah, N.; Goggins, M.; Califano, J.; Sidransky, D.; Chakravarti, A. Genome Res. 2004, 14, 812–819.
(27) Andreasson, H.; Nilsson, M.; Styrman, H.; Pettersson, U.; Allen, M. Forensic Sci. Int. Genet. 2007, 1, 35–43.
(28) Doktycz, M. J.; Hurst, G. B.; Habibi-Goudarzi, S.; McLuckey, S. A.; Tang, K.; Chen, C. H.; Uziel, M.; Jacobson, K. B.; Woychik, R. P.; Buchanan, M. V. Anal. Biochem. 1995, 230, 205–214.
(29) Hurst, G. B.; Doktycz, M. J.; Vass, A. A.; Buchanan, M. V. Rapid Commun. Mass Spectrom. 1996, 10, 377–382.
(30) Naito, Y.; Ishikawa, K.; Koga, Y.; Tsuneyoshi, T.; Terunuma, H.; Arakawa, R. J. Am. Soc. Mass Spectrom. 1997, 8, 737–742.
(31) Hofstadler, S. A.; Sampath, R.; Blyn, L. B.; Eshoo, M. W.; Hall, T. A.; Jiang, Y.; Drader, J. J.; Hannis, J. C.; Sannes-Lowery, K. A.; Cummins, L. L.; Libby, B.; Walcott, D. J.; Schink, A.; Massire, C.; Ranken, R.; White, N.; Samant, V.; McNeil, J. A.; Knize, D.; Robbins, D.; Rudnik, K.; Desai, A.; Moradi, E.; Ecker, D. J. Int. J. Mass Spectrom. 2005, 242, 23–41.
(32) Ecker, D. J.; Sampath, R.; Blyn, L. B.; Eshoo, M. W.; Ivy, C.; Ecker, J. A.; Libby, B.; Samant, V.; Sannes-Lowery, K.; Melton, R. E.; Russell, K.; Freed, N.; Barrozo, C.; Wu, J.; Rudnick, K.; Desai, A.; Moradi, E.; Knize, D. J.; Robbins, D. W.; Hannis, J. C.; Harrell, P. M.; Massire, C.; Hall, T. A.; Jiang, Y.; Ranken, R.; Drader, J. J.; White, N.; McNeil, J. A.; Crooke, S. T.; Hofstadler, S. A. Proc. Natl. Acad. Sci. U.S.A. 2005, 102, 8012–8017.
(33) Ecker, J. A.; Massire, C.; Hall, T. A.; Ranken, R.; Pennella, T. T.; Agasino Ivy, C.; Blyn, L. B.; Hofstadler, S. A.; Endy, T. P.; Scott, P. T.; Lindler, L.; Hamilton, T.; Gaddy, C.; Snow, K.; Pe, M.; Fishbain, J.; Craft, D.; Deye, G.; Riddell, S.; Milstrey, E.; Petruccelli, B.; Brisse, S.; Harpin, V.; Schink, A.; Ecker, D. J.; Sampath, R.; Eshoo, M. W. J. Clin. Microbiol. 2006, 44, 2921–2932.
(34) Sampath, R.; Russell, K. L.; Massire, C.; Eshoo, M. W.; Harpin, V.; Blyn, L. B.; Melton, R.; Ivy, C.; Pennella, T. T.; Li, F.; Levene, H.; Hall, T.; Libby, B.; Fan, N.; Walcott, D. J.; Ranken, R.; Pear, M.; Schink, A.; Gutierrez, J.; Drader, J.; Moore, D.; Metzgar, D.; Addington, L.; Rothman, R.; Gaydos, C. A.; Yang, S.; St. George, K.; Fuschino, M. E.; Dean, A. B.; Stallknecht, D.; Goekjian, G.; Yingst, S.; Monteville, M.; Saad, M. D.; Whitehouse, C. A.; Baldwin, C.; Rudnick, K. H.; Hofstadler, S. A.; Lemon, S. M.; Ecker, D. J. PLoS ONE 2007, 2, e489, DOI: 10.1371/journal.pone.0000489.
(35) Oberacher, H.; Parson, W.; Muehlmann, R.; Huber, C. G. Anal. Chem. 2001, 73, 5109–5115.
(36) Pusch, W.; Wurmbach, J. H.; Thiele, H.; Kostrzewa, M. Pharmacogenomics 2002, 3, 537–548.
(37) Sobrino, B.; Brion, M.; Carracedo, A. Forensic Sci. Int. 2005, 154, 181–194.

Table 1. Primer Pairs and Triplex Arrangement for the mtDNA Tiling Assay
(Target coordinates are given as the span including both primers, followed by the amplified coordinates between the primers.)

triplex  pair  primer span   amplified     forward primer                     reverse primer
1        2892  16231..16338  16254..16305  TCACACATCAACTGCAACTCCAA            TGCTATGTACGGTAAATGGCTTTATGTACTATG
1        2901  15893..16012  15924..15985  TGGGGTATAAACTAATACACCAGTCTTGTAA    TTAAATTAGAATCTTAGCTTTGGGTGC
1        2906  154..290      178..267      TCCTTTATCGCACCTACGTTCAAT           TGGTTGTTATGATGTCTGTGTGG
2        2891  16256..16366  16283..16344  TCACCCCTCACCCACTAGGATACCAAC        TGGGACGAGAAGGGATTTGACT
2        2908  204..330      234..313      TGTGTTAATTAATTAATGCTTGTAGGACAT     TCTGTGGCCAGAAGCGG
2        2925  15937..16041  15963..16017  TCCTTTTTCCAAGGACAAATCAGAGA         TGCTTCCCCATGAAAGAACAGAGA
3        2890  16318..16402  16342..16381  TGCCATTTACCGTACATAGCACAT           TGGTCAAGGGACCCCTATCTG
3        2899  15985..16073  16015..16051  TGCACCCAAAGCTAAGATTCTAATTTAAAC     TGGTGAGTCAATACTTGGGTGG
3        2907  239..363      263..340      TAACAATTGAATGTCTGCACAGCC           TGTTTTTGGGGTTTGGCAGAGAT
4        2898  16025..16119  16048..16098  TCTTTCATGGGGAAGCAGATTTG            TCATGGTGGCTGGCAGTAATG
4        2889  16357..16451  16377..16428  TCTCGTCCCCATGGATGACC               TCGAGGAGAGTAGCACTCTTGTG
4        2923  262..390      289..367      TGCTTTCCACACAGACATCATAACAAA        TCTGGTTAGGCTGGTGTTAGGGT
5        2893  16154..16268  16182..16250  TAGTACATAAAAACCCAATCCACATCAA       TGGTGAGGGGTGGCTTTG
5        2910  331..425      355..401      TCTTAAACACATCTCTGCCAAACC           TAAAAGTGCATACCGCCAAAAGAT
5        2902  5..97         31..76        TCAGGTCTATCACCCTATTAACCACT         TGTCTCGCAATGCTATCGCGT
6        2897  16055..16155  16078..16129  TCCAAGTATTGACTCACCCATCA            TACAGGTGGTCAAGTATTTATGGTAC
6        2916  367..463      389..437      TACCCTAACACCAGCCTAACCA             TGGAGGGGAAAATAATGTGTTAGTTG
6        2903  20..139       41..114       TATTAACCACTCACGGGAGCT              TTTCAAAGACAGATACTGCGACATA
7        2904  83..187       103..162      TAGCATTGCGAGACGCTGGA               TGCCTGTAATATTGAACGTAGGTGC
7        2896  16102..16224  16124..16201  TACTGCCAGCCACCATGAATAT             TGGGTTGATTGCTGTACTTGCTT
7        2913  464..603      493..576      TCTCCCATACTACTAATCTCATCAATACA      TGCTTTGAGGAGGTAAGCTACATAAAC
8        2912  409..521      431..501      TGCGGTATGCACTTTTAACAGT             TGTGTGTGCTGGGTAGGATG
8        2895  16130..16224  16157..16201  TTTCCATAAATACTTGACCACCTGTAG        TGGGTTGATTGCTGTACTTGCTT
8        2905  113..245      138..217      TCTATGTCGCAGTATCTGTCTTTGA          TGGGTTATTATTATGTCCTACAAGCATT

Approximately 94% of the differentiating information of full sequencing is retained by this approach when comparing regions of identical length. However, because a wider range of nucleotides is interrogated than the minimum HV1 and HV2 sequence coordinates for a typical forensic analysis, more discriminating information is obtained with this assay. Since each amplicon is interrogated as an individual component, the technique resolves length heteroplasmy and is capable of quantitatively analyzing heteroplasmic and mixed samples. The mtDNA profiling assay described herein is produced as a preassembled kit in a 96-well plate format to which DNA extract is added to each well. A total of 12 samples may be run on a single 96-well plate. The post-PCR process is fully automated through data processing and presentation to an analyst for final analysis, quality control (QC) evaluation, and interpretation. A graphical user interface provides functions for data interrogation, registration of profiles into a database, and lookup, comparison, and searching of measured profiles. Concordance with theoretical base compositions has been demonstrated with 163 samples, and reproducibility and measurement precision have been assessed with 3 331 independent runs of the same positive
control template performed over a period of 28 months as part of routine system operation.

EXPERIMENTAL SECTION

From the addition of DNA sample to a prefabricated assay kit, all steps through thermocycling (amplification), desalting, mass spectrometry, data processing, mass assignment, and initial profile development were automated. A schematic overview of the assay setup and analysis process is shown in Figure S-1 in the Supporting Information.

PCR Primer Design and Assay Layout. A total of 12 primer pairs were selected to interrogate the human mtDNA HV1 coordinates 15924-16428 (primers span 15893-16451) and 12 primer pairs for HV2 coordinates 31-576 (primers span 5-603) (Table 1). With the exception of the three highly conserved positions 16251, 16252, and 16253, all positions are amplified at least once and a subset of positions are amplified twice by
overlapping primer pairs. Primers were selected using an alignment of 4839 human mtDNA sequences from the Scientific Working Group on DNA Analysis Methods (SWGDAM) public forensic population database38 supplemented with 685 full or nearly full mtDNA genome sequences from GenBank and 53 full mtDNA control region sequences generated from in-house samples. Sequences were aligned to the revised Cambridge Reference Sequence (rCRS),6,7 and primer pairs were selected to maximize the conservation of bases on the 3′ end of each primer over the alignment. Primers were grouped into eight triplex groupings that maximize the separation between the members of each triplex as shown in Table 1. A general layout of the assay is outlined in Figure S-1 in the Supporting Information. Assay Plates and PCR. Prefabricated PCR plates (BioRad hard-shell 96-well PCR plates) were prepared such that only the addition of 5 µL of prepared template per reaction was required (summarized in Figure S-1 in the Supporting Information). Each well contained 35 µL of master mix that, when brought to 40 µL by template addition, contained 10 mM Tris-HCl, 50 mM KCl, 1.5 mM MgCl2, 400 mM betaine, 200 µM each dATP, dCTP, dTTP (BioLine, Boston, MA), 200 µM 13C-enriched deoxyguanosine triphosphate (dGTP) (Spectra Stable Isotopes, Columbia, MD), 5 U Amplitaq Gold (ABI, Foster City, CA), and primer pairs at various concentrations. All primers were synthesized by Integrated DNA Technologies (Coralville, IA). Thermocycling was performed in an Eppendorf model no. 5345 under the following conditions: 96 °C, 10 min, then 36 cycles at 96 °C, 20 s, 50 °C, 1.5 min, 72 °C, 5 s (with 5% of max ramp speed between 96 and 50 °C steps), 72 °C, 4 min, and a 4 °C hold. Mass Spectrometry Analysis. The analysis process that follows PCR thermocycling is completely automated on the Ibis T5000 system39 and consists of amplicon purification followed by direct analysis of desalted analytes in an ESI-TOF mass spectrometer. 
(38) Monson, K. L.; Miller, K. W. P.; Wilson, M. R.; DiZinno, J. A.; Budowle, B. Forensic Sci. Commun. 2002, 4, April.
(39) Ecker, D. J.; Drader, J.; Gutierrez, J.; Gutierrez, A.; Hannis, J.; Schink, A.; Sampath, R.; Ecker, J. A.; Blyn, L. B.; Eshoo, M. W.; Hall, T. A.; Tobarmosquera, M.; Jiang, Y.; Sannes-Lowery, K.; Cummins, L.; Libby, B.; Walcott, D. J.; Massire, C.; Ranken, R.; Manalili, S. M.; Ivy, C.; Melton, R.; Levene, H.; Harpin, V.; Li, F.; White, N.; Pear, M.; Samant, V.; Knize, D.; Robbins, D.; Rudnick, K.; Hajjar, F.; Hofstadler, S. A. JALA 2006, 11, 341–351.
(40) Jiang, Y.; Hofstadler, S. A. Anal. Biochem. 2003, 316, 50–57.

Removal of salts, excess dNTPs, polymers, and other constituents of the PCR reaction that can form adducts or add to the chemical noise background in the mass spectra is required prior to injecting the amplicons into the mass spectrometer. Briefly, PCR products are desalted using an in-house magnetic bead/anion-exchange protocol modified from Jiang.40 After the sample is desalted, negative-mode mass spectrometry is carried out in a Bruker Daltonics (Billerica, MA) MicroTOF equipped with an ESI source with a glass capillary and source potential of -6000 V using a flow rate of 100 µL/h to the ESI source. A countercurrent of dry nitrogen gas is used to aid PCR product desolvation. Two internal peptide standards that bracket the most informative region of the spectrum (mass-to-charge range (m/z) of ∼726-1346) are incorporated into the final elution buffer for desalting and in the injection line buffer to aid in subsequent data processing. The buffer system and electrospray interface conditions employed promote separation of duplex DNA strands into
multiply charged single stranded molecules but do not result in fragmentation. After mass spectrometry, data processing is automatically triggered by the T5000 runtime controller. Data processing employs custom software to (1) remove general background noise, (2) postcalibrate the raw mass spectrum, using multiple isotope peaks of the internal peptide standards, and (3) “deconvolve” the raw spectrum to produce a simplified spectrum consisting of mass on the X-axis and signal intensity on the Y-axis. The resulting deconvolved mass spectrum is traversed to determine mass peak positions and report final masses in a list that is imported directly into a database for storage and retrieval during sample analysis. Base Composition Profile Analysis. All masses calculated from a mass spectrum are compared to a database of theoretical base compositions derived from known mtDNA sequences. Base compositions were predicted for each primer pair using a database of 4839 human mtDNA HV1 and HV2 sequences obtained from the Scientific Working Group on DNA Analysis Methods (SWGDAM) public forensic database,38 685 human mtDNA genome sequences obtained from GenBank, 4856 human mtDNA HV1 and HV2 sequences obtained from AFDIL (Ted Anderson, personal communication), and 53 mtDNA control region sequences generated from in-house samples. Theoretical base compositions for existing sequences are used to bracket the biological range of A, G, C, and T counts expected for each primer pair considering that human mtDNA is the known amplification target. Theoretical base compositions are assigned as consistent with any mass within a measurement uncertainty window of 70 parts per million (ppm) (this is equivalent to about 2.1 Da for a 30 000 Da piece of DNA or a 100 base single strand of DNA). 
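The parts-per-million window quoted above can be checked with a short calculation. The following is a minimal sketch, not the Ibis software: the function names `ppm_window_da` and `within_tolerance` are hypothetical, and only the 70 ppm tolerance and the 30 000 Da example mass come from the text.

```python
def ppm_window_da(mass_da: float, tolerance_ppm: float) -> float:
    """Half-width of the acceptance window, in daltons, for a ppm tolerance."""
    return mass_da * tolerance_ppm / 1e6


def within_tolerance(measured_da: float, theoretical_da: float,
                     tolerance_ppm: float = 70.0) -> bool:
    """True if the measured mass is within the ppm window of the theoretical mass."""
    error_ppm = abs(measured_da - theoretical_da) / theoretical_da * 1e6
    return error_ppm <= tolerance_ppm


# A 30 000 Da single strand (about 100 bases) at 70 ppm: roughly a 2.1 Da window.
window = ppm_window_da(30000.0, 70.0)
```

A measurement 1.5 Da away from a 30 000 Da theoretical mass (50 ppm) would be accepted under this rule, while one 3 Da away (100 ppm) would not.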
The mass tolerance was set 10 ppm below the theoretical maximum of 80 ppm to avoid ambiguous assignments for the largest expected base composition (about 150 bp, for primer pair 2923) when employing 13C-enriched dGTP in place of dGTP in the PCR amplification step (see the next section and Figure S-2 in the Supporting Information for an explanation of the maximum theoretical error tolerance). Final base composition assignment for a double-stranded PCR product requires Watson-Crick base pairing between the candidate base compositions for both the forward and reverse single stranded DNA products (the number of A’s, G’s, C’s, and T’s of one strand must equal the number of T’s, C’s, G’s, and A’s of the complementary strand, respectively). An additional constraint is imposed that limits the sum of the absolute value of measured mass errors of both single strands of a product to 80 ppm. The rationale is that slight variations in spectral calibration can cause a small systematic shift in apparent measurement errors, but measurement deviations for the two masses for a double-stranded product should not deviate substantially in opposite directions. The maximum difference in errors is therefore set equal to the theoretical uncertainty tolerance for each mass in our current system. A final constraint requires the signal intensities of both strands of a product to be within a ratio of 2.5. Each of the above constraints is encoded as configurable parameters in an automated analysis interface. In the event that no base composition is assigned for a primer pair, the detected masses are automatically analyzed for any pair of masses that could produce a double-stranded product within any combination of ±1 A, G, C, and/or T from a previously characterized
mtDNA product. If no product is assigned for two masses of similar intensity, and those masses appear to potentially be a pair associated with a double-stranded DNA product, the software interface offers a mechanism to manually interrogate the mass spectrometry data for the presence of complementary products. Once base compositions are assigned for each primer pair in each triplex reaction, a base composition profile is assembled for the sample. The base composition profile for a sample consists of all base compositions assigned for all primer pairs, sorted by the 5′ sequence coordinate of each forward primer binding site, relative to the rCRS.6,7 In routine operation, after the automated development of an initial profile, the interface is accessible to the user for a visual verification.

Use of 13C-Enriched dGTP As a Mass Tag. The average mass accuracy observed in the system described herein was 10.12 ± 8.04 ppm for 159 688 measurements (see below). To achieve unambiguous base composition determinations for products up to about 130 bp, a mass assignment accuracy of approximately 25 ppm is required.40,41 However, 7 712 of 159 688 correct assignments (4.8%) were outside of the required 25 ppm cutoff. Alternative base compositions will have forward and reverse strand masses that are similar to each other. For example, products with base compositions of [A46 G20 C32 T39] and [A47 G19 C31 T40] (which differ by a G → A + T → C) have masses that differ by approximately 1 Da for each forward and reverse strand. With sufficient mass accuracy and precision, which is normally achievable with ESI-TOF-MS analysis, these possibilities can be distinguished (albeit not when present simultaneously within the same spectrum). However, because biochemical reactions contain noise and signals are not always perfect in distribution, ambiguity at this level is possible.
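The strand-pairing constraints and the near-isobaric example above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the residue masses are approximate textbook averages rather than values from the article, the per-G shift of about 10 Da assumes all ten carbons of the labeled nucleotide are 13C, and the names `BaseComp`, `complement`, `residue_mass_sum`, and `is_consistent_pair` are hypothetical.

```python
from collections import namedtuple

BaseComp = namedtuple("BaseComp", ["A", "G", "C", "T"])

# Approximate average residue masses (Da) of nucleotides within a DNA strand.
# Textbook values used for illustration; they are not taken from the article.
RESIDUE_MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}

# Assumption: a fully 13C-labeled G is about 10 Da heavier (ten carbons, ~1 Da each).
G_TAG_SHIFT = 10.0


def complement(bc):
    """Base composition of the Watson-Crick complementary strand."""
    return BaseComp(A=bc.T, G=bc.C, C=bc.G, T=bc.A)


def residue_mass_sum(bc, tagged_g=False):
    """Sum of residue masses; end-group terms cancel when two strands are compared."""
    g_mass = RESIDUE_MASS["G"] + (G_TAG_SHIFT if tagged_g else 0.0)
    return (bc.A * RESIDUE_MASS["A"] + bc.G * g_mass
            + bc.C * RESIDUE_MASS["C"] + bc.T * RESIDUE_MASS["T"])


def is_consistent_pair(fwd, rev, fwd_err_ppm, rev_err_ppm, fwd_signal, rev_signal,
                       max_sum_ppm=80.0, max_ratio=2.5):
    """Apply the three constraints: complementarity, summed error, intensity ratio."""
    if rev != complement(fwd):
        return False
    if abs(fwd_err_ppm) + abs(rev_err_ppm) > max_sum_ppm:
        return False
    high, low = max(fwd_signal, rev_signal), min(fwd_signal, rev_signal)
    return low > 0 and high / low <= max_ratio


first = BaseComp(A=46, G=20, C=32, T=39)
second = BaseComp(A=47, G=19, C=31, T=40)

# Mass difference between the two candidate forward strands, without and with the tag.
plain_diff = residue_mass_sum(first) - residue_mass_sum(second)
tagged_diff = residue_mass_sum(first, tagged_g=True) - residue_mass_sum(second, tagged_g=True)
```

Under these assumed masses the two candidate compositions differ by about 1 Da with natural dGTP and by about 11 Da once each G carries the assumed 10 Da shift, because the two candidates differ by exactly one G.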
The use of 13C-enriched dGTP offsets the masses of the two strands by ∼11 Da, rather than 1 Da, making base composition assignment unambiguous. Figure S-2 in the Supporting Information displays this concept graphically. The use of this isotopically enriched nucleotide analogue does not appear to impact PCR efficiency or the incorporation of GTP.

(41) Muddiman, D. C.; Anderson, G. A.; Hofstadler, S. A.; Smith, R. D. Anal. Chem. 1997, 69, 1543–1549.

DNA Samples. DNA from 21 saliva samples from Ibis, 25 blood spot samples from the Armed Forces DNA Identification Laboratory (AFDIL), and 25 buccal swabs from the Federal Bureau of Investigation (FBI) DNA Analysis Unit II was extracted at Ibis using a Qiagen QIAamp DNA mini spin kit (Qiagen no. 51304, Valencia, CA). The DNA from the Ibis saliva samples (0.5-1 mL) was extracted using Qiagen supplementary protocol QA19s.doc Sep-01 for saliva and mouthwash. Blood samples from AFDIL were received as three filter paper punches each and were prepared according to the Qiagen QIAamp mini kit dried blood spot spin protocol. Buccal swabs from the FBI were extracted according to the Qiagen QIAamp mini kit buccal swab spin protocol. Human nuclear DNA was quantified for each sample (excluding the 21 Ibis saliva samples, which were used unquantified) using the real-time PCR-based Quantifiler kit (AB no. 4343895, Foster City, CA). For concordance studies, Ibis saliva-derived samples were used unquantified. Blinded samples from the FBI and AFDIL were used at 500 pg of template per PCR reaction. Pre-extracted and quantified dried DNA also was kindly provided by John Butler and Peter Vallone of the National Institute
of Standards and Technology (NIST) which was resuspended in water and used at 500 pg per reaction. Assay Reproducibility. The Ibis mtDNA analysis system is generally operated by analyzing 10 samples, 1 negative control, and 1 positive control per 96-well plate. The positive control sample utilized for the period of August 2006 through December 2008 was a blood-derived sample from SeraCare (San Diego, CA). Automatically derived analysis results for sample SC35495 were extracted from analysis logs for 3 331 trials from 3 329 assay plates analyzed during this 28 month period using custom software. Each trial was assessed for the number of expected product assignments that were missed or incorrect, the number of unexpected product assignments observed, and the mass measurement error for all product assignments. Sensitivity. Assay sensitivity was assessed using five templates in dilution-to-extinction using prefabricated assay plates. Three templates were selected from the blinded samples sent by AFDIL (AF-6, AF-18, and AF-25, received as blood spots on filter paper) and two templates were selected from the blinded samples sent by the FBI (FBI-58 and FBI-67, received as buccal swabs). Chosen templates exhibited no heteroplasmy. Human DNA concentrations were determined with the Quantifiler kit (AB no. 4343895, Foster City, CA). Template concentrations were examined over the range of 100-0.098 pg per well in 2-fold serial dilutions. The lowest concentration at which the correct product was automatically detected for each primer pair and the lowest concentration at which a full correct profile was automatically detected for each sample were noted. Quantification of Mixtures. To test the ability to interrogate mixtures of mtDNA templates, pairs of known mtDNA types were mixed at specific ratios, amplified by PCR, and analyzed on an Ibis T5000 system in the FBI DNA Analysis Unit II laboratory. 
DNA samples used in this study were obtained from a subset of the same individuals who provided DNA for blinded samples from the FBI (FBI-57, FBI-61, FBI-65, FBI-70, and FBI-72). The number of mtDNA copies in each sample was estimated with a real-time PCR quantification method developed at the FBI (personal communication) and mixtures were constructed on the basis of estimated mtDNA copies. Mixtures were made at 99:1, 95:5, 90:10, 75:25, 25:75, 10:90, 5:95, and 1:99. The total number of amplification products expected to generate different base compositions for each mixture is shown in Table 5. Each quantification estimate was calculated as the average of relative signal outputs from at least 13 different primer pairs. For each primer pair, the relative signal ratio was the average of three replicate trials, with the exception of one 75:25 mixture of FBI-65 + FBI-70, which was based upon two replicates because of a single reaction that was lost during analysis (Table 5). In cases where one template or the other displayed poly-C length heteroplasmy in HV1 or HV2, affected regions were generally not used in the quantification due to the splitting of total signal over several heteroplasmic products for one template.

RESULTS AND DISCUSSION

Information Content Relative to Sequencing of HV1 and HV2. On the basis of analysis of human mtDNA sequences from GenBank containing the full range of sequence covered by the 24 primer pairs in Table 1, 94.2% of 1 266 distinguishable control region sequences were differentiated by one or more base
composition differences. The few percent loss of information comes from reciprocal SNPs within a single amplified region. For example, if a sequence contains a C → T SNP and a T → C SNP within the same amplicon, it will yield the same base composition in that amplicon as another sequence without the C → T and T → C SNPs, as such compensating SNPs are “mass silent” in a given amplicon. The condition that all differences between two sequences are compensated by reciprocal SNPs is uncommon, and in some cases where it occurs, the sequences can still be resolved by overlapping primer pairs. Thus, over the span of the 24 tiled regions, most profile-to-profile differentiation power is retained. Although the assay retains 94.2% of individually discriminating information relative to sequencing, the assay covers a larger coordinate space than that of the originally defined minimal HV1/HV2 regions (rCRS coordinates HV1 16024-16365 and HV2 73-340).4,12,38,42 For the same 1 266 sequences that are distinct in sequence over the range of the tiling assay coordinates (Table 1), only 90.2% are differentiated by sequencing of the minimal HV1 and HV2 coordinates. When base composition data obtained with this assay are compared to existing sequence databases consisting largely of sequences containing the minimum HV1/HV2 regions, there is also a small additional loss of information due to the inability to predict PCR product base compositions for primer pairs that encompass a 5′ or 3′ end of the sequenced range.

Considerations for Profile Comparison to Existing Databases. Because each PCR product is generated independently and analyzed directly, there is a deterministic relationship between the underlying sequence of the template and the base composition of each PCR product.
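The deterministic sequence-to-composition relationship, and the "mass silent" reciprocal SNP case discussed earlier in this section, can be illustrated with a toy example. This is a minimal sketch: the short `reference` sequence and the helper names `base_composition` and `apply_snps` are hypothetical and do not correspond to any real amplicon or to the assay software.

```python
from collections import Counter


def base_composition(seq):
    """Return the (A, G, C, T) counts of a DNA sequence."""
    counts = Counter(seq.upper())
    return tuple(counts.get(base, 0) for base in "AGCT")


def apply_snps(seq, snps):
    """Return a copy of seq with substitutions applied; snps maps 0-based index to new base."""
    bases = list(seq)
    for index, new_base in snps.items():
        bases[index] = new_base
    return "".join(bases)


# Hypothetical 13-base amplicon, for illustration only.
reference = "ACCTGATCGTTCA"

# A single C -> T substitution changes the base composition and is detected.
single = apply_snps(reference, {1: "T"})

# A C -> T and a T -> C substitution in the same amplicon cancel out ("mass silent").
reciprocal = apply_snps(reference, {1: "T", 9: "C"})

detected = base_composition(single) != base_composition(reference)
silent = base_composition(reciprocal) == base_composition(reference)
```

In this toy case the single substitution shifts one count from C to T, whereas the reciprocal pair leaves all four counts unchanged even though the underlying sequences differ.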
A profile represented as a list of base compositions associated with rCRS coordinates (defined by the primer target sites) can therefore be compared directly to any sequence for all primer pairs where there is sequence data for all amplified coordinates. A base composition profile generated in this assay can thus be searched against a sequence database by converting the sequence database to base compositions, and an inference can be conveyed about the rarity of a base composition profile in a given reference population. Counting method statistics with an upper bound frequency estimate on an evidence profile that matches a particular reference profile within a population can be performed in the same manner as for sequence data.12,47 Rather than specific positional differences, the assay described here reports the minimum number of differences that must exist between two profiles according to the base compositions associated with each primer pair, taking primer pair overlap into account. In this way, sequence data can be compared to profiles generated in this assay. Likewise, sequence queries (converted to base composition profiles) can be searched directly against a base composition profile database.

(42) Scientific Working Group for DNA Analysis Methods. Forensic Sci. Commun. 2003, 5.

Concordance with Fully Sequenced Templates. Mitochondrial DNA sequences from 21 individuals (3 blood samples and 18 saliva samples) were obtained by amplifying three overlapping regions of the control region (15778-16403, 16350-374, and 281-777) and sequencing the PCR products. Sequence profiles for the ranges 15778-777 were stored in a database and converted to theoretical base compositions according to the primer-defined
coordinates in Table 1. All measured base compositions were consistent with theoretically predicted base compositions for the 24 regions. Concordance with Sequence Data for Blinded Samples. The ability of the assay system to generate base composition profiles that can be directly compared to theoretical profiles based on sequence data was tested with 50 blinded samples from two sources (25 blood spots on filter paper received from AFDIL and 25 buccal swabs received from the FBI DNA Analysis Unit II). For all blinded samples, base composition profiles were developed and communicated back to collaborators prior to receiving mtDNA sequence data corresponding to each sample. Samples from the FBI were selected such that each sample had primer/template mismatches in at least one primer pair. The FBI samples also included one blank swab and two swabs each containing two contributors (this information was not disclosed to Ibis until after results were communicated to the FBI). Each of the 47 single-source samples produced a full profile containing products for all 24 primer pairs (There were no PCR failures caused by the primer/target mismatches). In all cases, base composition profiles were consistent with theoretical predictions over the range of available sequence data. For the 25 samples from AFDIL, and 1 sample from the FBI (FBI-22), available sequence data ranged from 16024-16365 (HV1) and 73-340 (HV2), allowing prediction of base compositions for 12 of the 24 regions. For the remaining 21 FBI samples with sequence data, sequenced coordinates were 15998-576 (including the origin of replication), allowing prediction of base compositions for 22 of the 24 regions. The blank FBI swab produced no profile, and the two mixed swabs were correctly identified as mixtures. 
Moreover, in each case of mixed templates, it was correctly predicted that the profile of one contributor was consistent with that from one of the donors within the blinded sample set while the other contributor profile was not included in the blinded sample set. The included contributor was correctly identified in each case (sample FBI-33 for mixed sample FBI-101 and sample FBI-47 for mixed sample FBI-102).

Additional Population Reference Samples. An additional 95 samples obtained from NIST were analyzed using 500 pg of total nuclear DNA per PCR reaction. This set consisted of 31 Caucasian, 32 African-American, and 32 Hispanic population samples. Full base composition profiles were obtained for all 95 samples. Subsequently, sequence data ranging from 16024 to 577 for 94 of the samples were provided by NIST. This allowed prediction of theoretical base compositions for 21 of 24 primer pairs for each sample. All predicted compositions were consistent with observed profiles.

Heteroplasmy Detection. Sequence heteroplasmy is readily resolved using mass spectrometry. Two species of the same region differing at a single base (e.g., C ↔ T) are detected independently and appear as two double-stranded products; the masses of the forward strands are separated by approximately 15 Da and the masses of the reverse strands are separated by approximately 26 Da (because a 13C-enriched dGTP is used in this assay as a mass tag, the reverse strand has a molecular mass approximately 11 Da heavier than if using unenriched dGTP). Figure 1 shows an example of a C ↔ T heteroplasmy detected in sample FBI-37 from the FBI blinded sample set. The sequence profile for this sample has a “Y” designation at position 16298, indicating the presence of both cytosine and thymine. Two primer
Figure 1. SNP heteroplasmy detection. A C ↔ T SNP heteroplasmy was detected in blinded sample FBI-37 in products from overlapping primer pairs 2892 (amplifies 16254-16305, panel A) and 2891 (amplifies 16283-16344, panel B). In panel A, the ratio of average peak heights suggested 76.6% [A40 G9 C41 T18] and 23.4% [A40 G9 C40 T19] (corresponding to 76.6% C and 23.4% T). In panel B, the ratio of average peak heights suggested 77.8% [A37 G9 C42 T23] and 22.2% [A37 G9 C41 T24] (corresponding to 77.8% C and 22.2% T). The overlapping region between the primer pairs suggested a polymorphism within the range of 16283-16305. The sequence data contained a “Y” designation at 16298.
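The approximate 15 Da and 26 Da strand separations quoted in the Heteroplasmy Detection discussion can be checked with simple arithmetic. The sketch below is illustrative only: it uses standard average nucleotide residue masses (textbook values, not taken from this paper) and assumes a mass-tag shift of about 10 Da per G residue, in line with the “just under 10 Da” stated in the Figure 2 footnote. The function and parameter names are hypothetical.

```python
# Average masses (Da) of nucleotide residues within a DNA strand.
# Standard textbook values; an assumption, not data from the paper.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def residue_mass(base, g_tag_da):
    """Mass of one residue; every G carries the assumed 13C mass-tag shift."""
    return RESIDUE_MASS[base] + (g_tag_da if base == "G" else 0.0)


def snp_strand_separations(base_1, base_2, g_tag_da=10.0):
    """Return (forward, reverse) mass separations in Da for two products
    that differ at a single position (base_1 versus base_2 on the forward strand)."""
    forward = abs(residue_mass(base_1, g_tag_da) - residue_mass(base_2, g_tag_da))
    reverse = abs(residue_mass(COMPLEMENT[base_1], g_tag_da)
                  - residue_mass(COMPLEMENT[base_2], g_tag_da))
    return forward, reverse


forward_da, reverse_da = snp_strand_separations("C", "T")
print(f"forward: {forward_da:.2f} Da, reverse: {reverse_da:.2f} Da")
```

Under these assumptions a C versus T difference separates the forward strands by about 15 Da, and the complementary G versus A difference separates the reverse strands by about 26 Da once the tag shift on G is included.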
pairs overlap this position: primer pair 2892 amplifies positions 16254-16305 and primer pair 2891 amplifies positions 16283-16344 (see Table 1). As shown in Figure 1, both of these primer pairs produce two products that differ from each other by a C ↔ T difference. Moreover, with this assay it is possible to quantify the relative ratios of each component as approximately 77% of the “C” containing species and 23% of the “T” containing species. Two independent PCR amplicons corroborate the relative signal intensities for the two variants (Figure 1).

This assay offers a distinct advantage over sequencing when confronted with a sample containing length heteroplasmy, which is fairly common in the poly-C tracts found in HV1 (16184-16193), HV2 (303-309/310-315), and downstream of HV2 (568-573, referred to as HV3).10-13,43 For example, when a T to C transition occurs at 16189, a poly-C tract is created that results in the generation of multiple length variants within an individual; the resulting sequence data appear out of register and rapidly degrade after the homopolymeric C stretch. The mass spectrometer simply measures each product independently, so multiple products that differ in length are clearly differentiated. Figure 2 shows an example of analysis of length heteroplasmy in HV1 (Figure 2A) compared to a nonheteroplasmic HV1 sample (Figure 2B). Whereas the sequence data are not interpretable after the poly-C stretch in Figure 2A, mass spectrometry reveals four distinct length variants while Figure 2B shows an example without multiple length variants. The four distinct length variants seen in the sample in Figure 2A were detectable in each of three overlapping regions that span the HV1 poly-C tract (data not shown). There is also a “CA” repeat found at positions 514-523 that occasionally presents CA dinucleotide-length heteroplasmy.44-46 Three samples contained two variant species in this region that differed in length by a CA repeat (sample FBI-61 from the FBI set and samples NIST-WT51354 and NIST-WT51386 from the NIST set, data not shown). A total of 73 of 163 single-source samples showed varying degrees of length heteroplasmy. Table 2 summarizes the overall frequency of occurrence of specific numbers of length variants in HV1, HV2, or HV3. The most common overall length heteroplasmy patterns appear to be two length variants in HV2 and four length variants in HV1. Note that the poly-C stretches in HV1 and HV2 are amplified by three overlapping primer pairs each (see the coordinates in Table 1). The distribution of relative abundances of length variants is highly reproducible between the three overlapping regions within a sample and between replicates of the same sample. An overview of the relative abundances of length variants observed can be found in Table S-1 in the Supporting Information.

For the 163 single-source samples described above (21 in-house samples, 22 samples from the FBI, 25 samples from AFDIL, and 95 samples from NIST), there were 14 cases of C ↔ T SNP heteroplasmy and two cases of A ↔ G SNP heteroplasmy detected. A total of 11 cases of C ↔ T heteroplasmy had a “Y” designation and two cases had an “N” designation in the sequence within the amplified coordinates of each product showing SNP heteroplasmy. Sequence data did not reflect the detected C ↔ T heteroplasmy for one sample (NIST sample NIST-PT84224). One sample with an A ↔ G heteroplasmy had an “N” noted in the sequence while the other (AFDIL sample AF-15) did not have a heteroplasmy noted in the sequence data. Table 3 summarizes SNP heteroplasmies observed in the 163 single-source samples.

(43) Calloway, C. D.; Reynolds, R. L.; Herrin, G. L., Jr.; Anderson, W. W. Am. J. Hum. Genet. 2000, 66, 1384–1397.
(44) Szibor, R.; Plate, I.; Heinrich, M.; Michael, M.; Schoning, R.; Wittig, H.; Lutz-Bonengel, S. Int. J. Legal Med. 2007, 121, 207–213.
(45) Bodenteich, A.; Mitchell, L. G.; Polymeropoulos, M. H.; Merril, C. R. Hum. Mol. Genet. 1992, 1, 140.
(46) Chung, U.; Lee, H. Y.; Yoo, J. E.; Park, M. J.; Shin, K. J. Int. J. Legal Med. 2005, 119, 50–53.
(47) National Research Council. NRC Report II: The Evaluation of Forensic Evidence; National Academy Press: Washington, DC, 1996.

Assay Reproducibility. Reproducibility was assessed by examining 3 331 separate assay trials of the same template (blood-derived sample SC35495, currently the positive control used in-house) run at 500 pg of total DNA per well over a period of 28 months. In this set, 3 298 (99%) of the trials yielded a full profile and 33 (1%) were missing one or more regions (generally due to a plugged injector or marginal cleanup). There were initially 277
Figure 2. Effect of length heteroplasmy upon mitochondrial DNA sequencing and the Ibis mitochondrial tiling assay. Sequence analysis is severely challenged by multiple templates that differ in length. Panel A shows, on top, the forward strand sequence electropherogram of a sequencing reaction developed by direct PCR-product sequencing from in-house saliva sample CS0033. The sequence electropherogram becomes unreadable after the poly-C region due to a mixture of templates of varying length. The mass spectrometry result for primer pair 2896 (rCRS amplified coordinates 16124..16201), one of the three primer pairs that encompass the HV1 poly-C stretch, is shown on the bottom. There are four clearly resolved products that differ by single C residues in the poly-C tract. Panel B shows equivalent data obtained for the nonheteroplasmic sample CS0016. A single double-stranded product obtained for primer pair 2896 (bottom) correlates with homoplasmic sequence data (top). *13C-enriched dGTP was used in these reactions, adding just under 10 Da to the mass of each G residue.

Table 2. Length Heteroplasmies Observed in 163 DNA Samples
no. of observed variants heteroplasmy type
2
3
4
5
HV1 HV2 HV3 HV3
1 43 1 4
4 10 5
21 1
1
C-length C-length C-length CA-length
unassigned products (0.35%) in 126 samples in the automated first pass. Most of these unassigned products were due to slight miscalibration of individual spectra (caused by an overabundance of signal interfering with the peptide standards) and were correctly assigned with manual assistance. Ultimately, there were 85 missed assignments (no base composition was assigned for a primer pair, leaving a gap in the profile) out of 79 944 expected products (0.11%). Remaining missed assignments were due to low or absent signal caused by a plugged injector or failure to add template in the PCR setup. There were 21 misassignments (an incorrect base composition was assigned) in 14 samples in the automated first pass, of which all but 12 assignments (0.015%) in 6 samples were easily corrected during normal manual inspection. The remaining misassignments were due to salt adduction that was recognized during manual inspection. There were 113 extra artifact assignments (primarily due to salt adducts or products with unusually high adenylation). All but 10 artifacts were corrected during manual inspection. Those that were not unambiguously corrected during manual inspection were recognized as problematic and could be filtered. To assess the precision
and accuracy of profile assignment and mass measurements, the observed mass measurement deviation in parts-per-million was plotted versus the frequency of occurrence. As shown in Figure 3, the plot is centered around 0 ppm mass measurement deviation and indicates that there is not a bias in the mass measurement uncertainty. The average measurement deviation magnitude for 159 688 measurements was 10.12 ± 8.04 ppm. Sensitivity. The lowest concentration at which the product was automatically assigned for each region and for each sample is shown in Table 4. The average sensitivity for all regions over the five templates was 0.90 ± 1.09 pg/reaction. A full profile was obtained at 3.1 pg/reaction for four of five and at 6.3 pg/reaction for one of five templates. In addition, more than 60 dilution trials of the standard positive control (blood sample SC35495) run in a routine system validation format (data not shown) have been used to set a conservative limit of sensitivity of 25 pg/reaction for the mtDNA assay. This is 4× the limit observed in this trial and is defined as the sensitivity that must be achieved for a system validation plate to pass the system sensitivity quality control criteria. Quantification of Mixtures. Mixtures were made at 99:1, 95: 5, 90:10, 75:25, 25:75, 10:90, 5:95, and 1:99 (Table 5). Mixtures of 99:1 and 1:99 produced a single, clean profile of the predominant sample in all cases. There are two implications of this result: (1) The Ibis mtDNA tiling assay is currently not capable of detecting and resolving mixtures where the minor contributor is present at 1% or lower of the total DNA in a sample and (2) the presence of a contaminant at a level of 1% or below the total DNA input will not interfere with simple, automated analysis of mitochondrial
Table 3. SNP Heteroplasmies Observed in 163 DNA Samples (a)

contributor | sample | heteroplasmy | replicates in region | variant 1 | percentage of variant 1 (%) | variant 2 | percentage of variant 2 (%) | primer pairs | observed amplified overlap | sequence data
in-house | CS0021 | C ↔ T | 4 | T | 14.5 ± 0.6 | C | 85.5 ± 0.6 | 2898, 2897 | 16078-16098 | 16093 Y
in-house | CS0038 | C ↔ T | 1 | T | 25.4 | C | 74.6 | 2891 | 16283-16344 | 16325 Y
in-house | CS0040 | C ↔ T | 2 | T | 35.7 ± 0.3 | C | 64.3 ± 0.3 | 2898, 2897 | 16078-16098 | 16093 Y
AFDIL | AF-2 | C ↔ T | 4 | T | 30.0 ± 2.2 | C | 70.0 ± 2.2 | 2892, 2891 | 16283-16305 | 16293 N
AFDIL | AF-4 | C ↔ T | 4 | T | 48.6 ± 2.2 | C | 51.4 ± 2.2 | 2896, 2895 | 16157-16201 | 16176 N
AFDIL | AF-15 | A ↔ G | 4 | A | 21.1 ± 2.0 | G | 78.9 ± 2.0 | 2897, 2896 | 16124-16129 | ND
AFDIL | AF-19 | A ↔ G | 4 | G | 44.8 ± 1.1 | A | 55.2 ± 1.1 | 2899, 2898 | 16048-16051 | 16051 N
FBI | FBI-37 | C ↔ T | 4 | T | 22.9 ± 0.8 | C | 77.1 ± 0.8 | 2892, 2891 | 16283-16305 | 16298 Y
FBI | FBI-47 | C ↔ T | 4 | T | 17.3 ± 0.6 | C | 82.7 ± 0.6 | 2898, 2897 | 16078-16098 | 16093 Y
FBI | FBI-51 | C ↔ T | 2 | T | 45.9 ± 4.2 | C | 54.1 ± 4.2 | 2902, 2903 | 41-76 | 64 Y
FBI | FBI-66 | C ↔ T | 4 | C | 48.5 ± 1.1 | T | 51.5 ± 1.1 | 2905, 2906 | 178-217 | 195 Y
FBI | FBI-72 | C ↔ T | 4 | T | 39.8 ± 4.1 | C | 60.2 ± 4.1 | 2905, 2906 | 178-217 | 217 Y
NIST | NIST-JT51499 | C ↔ T | 4 | T | 41.4 ± 0.4 | C | 58.6 ± 0.4 | 2905, 2906 | 178-217 | 198 Y
NIST | NIST-JT52076 | C ↔ T | 2 | C | 34.9 ± 0.1 | T | 65.1 ± 0.1 | 2892 | 16254-16305 | 16260 Y
NIST | NIST-PT84224 | C ↔ T | 4 | T | 48.3 ± 0.8 | C | 51.7 ± 0.8 | 2898, 2897 | 16078-16098 | ND
NIST | NIST-PT84231 | C ↔ T | 4 | T | 19.1 ± 0.5 | C | 80.9 ± 0.5 | 2905, 2906 | 178-217 | 204 Y

(a) “ND” means that no heteroplasmy was noted in the sequence data received for the indicated sample.
Table 4. Sensitivity of All Primer Pairs Assessed with Five Different Templates (Picogram/Reaction) template
Figure 3. Distribution of measurement errors for 159 688 mass measurements. Errors in parts-per-million (ppm) were assessed for all product assignments from 3 311 trials of the same template over the course of 28 months. Measurements were taken with three different T5000 instruments during this time.
DNA using the Ibis mtDNA tiling assay. The point at which a minor contributor became detectable was ∼5% of the total DNA input (Table 5), but mixtures were not reliably resolved for a minor contributor present at 5% of the total DNA input. With manual interrogation of the data, results from a single experiment involving mixtures at the 10% level suggest that mixtures at this level may be resolvable (Table 5). Mixtures containing 25% minor contributor were easily identified with the automated analysis processing. Relative mass spectrometry signal outputs for mixed products compared well with template input ratios at minor contributor levels of 25% and higher (Table 5). Figure 4 displays the detection of specific product peaks from a 25:75 mixture of two templates (FBI-65 and FBI-70). There is an elevated quantification error in regions with length heteroplasmy, presumably because some length variants fall below detection limits and the total signal for the variants from a heteroplasmic template becomes diluted by splitting the signal across
primer pair | amplified coordinates | AF-6 | AF-18 | AF-25 | FBI-58 | FBI-67
2889 | 16377..16428 | 0.195 | 0.391 | 0.391 | 0.391 | 0.195
2890 | 16342..16381 | 0.195 | 0.391 | 1.563 | 0.391 | 0.195
2891 | 16283..16344 | 0.391 | 0.195 | 0.391 | 0.391 | 0.195
2892 | 16254..16305 | 0.391 | 0.391 | 0.781 | 0.195 | 0.781
2893 | 16182..16250 | 0.195 | 3.125 | 0.781 | 0.195 | 0.195
2895 | 16157..16201 | 0.781 | 0.781 | 3.125 | 0.781 | 0.781
2896 | 16124..16201 | 0.195 | 0.195 | 0.781 | 0.391 | 0.391
2897 | 16078..16129 | 0.391 | 1.563 | 1.563 | 0.781 | 0.391
2898 | 16048..16098 | 0.391 | 0.781 | 0.781 | 3.125 | 6.250
2899 | 16015..16051 | 0.781 | 0.391 | 0.391 | 0.098 | 0.098
2925 | 15963..16017 | 0.391 | 0.195 | 0.391 | 0.391 | 0.098
2901 | 15924..15985 | 0.391 | 0.391 | 0.195 | 1.563 | 0.781
2902 | 31..76 | 0.391 | 0.195 | 0.781 | 0.195 | 0.098
2903 | 41..114 | 0.391 | 1.563 | 1.563 | 0.391 | 0.391
2904 | 103..162 | 0.195 | 0.195 | 0.781 | 0.391 | 0.195
2905 | 138..217 | 1.563 | 3.125 | 3.125 | 0.781 | 0.781
2906 | 178..267 | 0.781 | 0.391 | 0.781 | 1.563 | 0.781
2907 | 263..340 | 0.781 | 0.391 | 0.781 | 0.195 | 3.125
2908 | 234..314 | 0.391 | 0.391 | 3.125 | 0.391 | 0.781
2923 | 289..371 | 3.125 | 0.781 | 0.781 | 3.125 | 6.250
2910 | 355..402 | 0.195 | 3.125 | 0.781 | 0.391 | 0.098
2916 | 389..437 | 0.391 | 1.563 | 1.563 | 0.781 | 0.391
2912 | 431..501 | 0.391 | 1.563 | 3.125 | 0.781 | 0.781
2913 | 493..576 | 0.195 | 0.391 | 1.563 | 0.391 | 0.195
complete profile | | 3.125 | 3.125 | 3.125 | 3.125 | 6.250
multiple products. All relative quantifications were based upon the average of at least 13 different primer pairs.

Strengths and Limitations of the Ibis Forensics Platform. The Ibis platform provides several notable advantages over conventional assays. The assay is capable of resolving mixtures of mtDNA from two contributors. When template concentrations are similar, it is possible to determine that a sample is a mixture but not to unravel a mixed profile into two independent contributor profiles. However, when template inputs are asymmetric, the approach is conceptually capable of quantifying the relative contribution of each template, which enables resolution of both profiles. There are three primary caveats to mixed
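Table 5. Observed Profiles and Signal Output Ratios for Mixtures of Templates in the Ibis Mitochondrial Control Region Tiling Assay

template 1 | template 2 | no. of products (out of 24) that differ between templates | template 1/template 2 ratio | observed profile | no. of PCR products used in quantification (a) | percent estimate for template 1 (%) | relative signal quantification estimate (c)
FBI-65 | FBI-70 | 19 | 99:1 | FBI-65 | | 100 | ND
FBI-65 | FBI-70 | 19 | 95:5 | FBI-65 | | 100 | ND
FBI-65 | FBI-70 | 19 | 90:10 | FBI-65 + FBI-70 | 14 | 86.3 ± 3.0 | 86:14
FBI-65 | FBI-70 | 19 | 75:25 | FBI-65 + FBI-70 | 15 | 71.2 ± 3.1 | 71:29
FBI-65 | FBI-70 | 19 | 25:75 | FBI-65 + FBI-70 | 16 | 26.2 ± 4.6 | 26:74
FBI-65 | FBI-70 | 19 | 10:90 | FBI-65 + FBI-70 | 18 | 14.1 ± 5.1 | 14:86
FBI-65 | FBI-70 | 19 | 5:95 | FBI-70 | | 0 | ND
FBI-65 | FBI-70 | 19 | 1:99 | FBI-70 | | 0 | ND
FBI-65 | FBI-70 | 19 | 99:1 | FBI-65 | | 100 | ND
FBI-65 | FBI-70 | 19 | 99:1 | FBI-65 | | 100 | ND
FBI-65 | FBI-70 | 19 | 95:5 | FBI-65 | | 100 | ND
FBI-65 | FBI-70 | 19 | 95:5 | FBI-65 | | 100 | ND
FBI-65 | FBI-70 | 19 | 75:25 | FBI-65 + FBI-70 | 14 (b) | 79.7 ± 6.5 | 80:20
FBI-65 | FBI-70 | 19 | 75:25 | FBI-65 + FBI-70 | 16 | 77.9 ± 6.5 | 78:22
FBI-65 | FBI-70 | 19 | 25:75 | FBI-65 + FBI-70 | 16 | 37.9 ± 4.3 | 38:62
FBI-65 | FBI-70 | 19 | 25:75 | FBI-65 + FBI-70 | 16 | 33.8 ± 4.5 | 34:66
FBI-65 | FBI-70 | 19 | 5:95 | FBI-70 | | 0 | ND
FBI-65 | FBI-70 | 19 | 5:95 | FBI-65 + FBI-70 | 15 | 12.0 ± 4.4 | 12:88
FBI-65 | FBI-70 | 19 | 1:99 | FBI-70 | | 0 | ND
FBI-65 | FBI-70 | 19 | 1:99 | FBI-70 | | 0 | ND
FBI-61 | FBI-72 | 20 | 99:1 | FBI-61 | | 100 | ND
FBI-61 | FBI-72 | 20 | 99:1 | FBI-61 | | 100 | ND
FBI-61 | FBI-72 | 20 | 95:5 | FBI-61 | | 100 | ND
FBI-61 | FBI-72 | 20 | 95:5 | FBI-61 | | 100 | ND
FBI-61 | FBI-72 | 20 | 75:25 | FBI-61 + FBI-72 | 17 | 66.6 ± 7.6 | 67:33
FBI-61 | FBI-72 | 20 | 75:25 | FBI-61 + FBI-72 | 14 | 72.5 ± 5.4 | 73:27
FBI-61 | FBI-72 | 20 | 25:75 | FBI-61 + FBI-72 | 17 | 21.9 ± 3.4 | 22:78
FBI-61 | FBI-72 | 20 | 25:75 | FBI-61 + FBI-72 | 17 | 30.7 ± 4.3 | 31:69
FBI-61 | FBI-72 | 20 | 5:95 | FBI-72 | | 0 | ND
FBI-61 | FBI-72 | 20 | 5:95 | FBI-61 + FBI-72 | 16 | 10.0 ± 3.1 | 10:90
FBI-61 | FBI-72 | 20 | 1:99 | FBI-72 | | 0 | ND
FBI-61 | FBI-72 | 20 | 1:99 | FBI-72 | | 0 | ND
FBI-65 | FBI-72 | 16 | 99:1 | FBI-65 | | 100 | ND
FBI-65 | FBI-72 | 16 | 95:5 | FBI-65 | | 100 | ND
FBI-65 | FBI-72 | 16 | 75:25 | FBI-65 + FBI-72 | 16 | 75.2 ± 7.5 | 75:25
FBI-65 | FBI-72 | 16 | 25:75 | FBI-65 + FBI-72 | 16 | 26.7 ± 6.7 | 27:73
FBI-65 | FBI-72 | 16 | 5:95 | FBI-72 | | 0 | ND
FBI-65 | FBI-72 | 16 | 1:99 | FBI-72 | | 0 | ND
FBI-57 | FBI-70 | 16 | 99:1 | FBI-57 | | 100 | ND
FBI-57 | FBI-70 | 16 | 95:5 | FBI-57 | | 100 | ND
FBI-57 | FBI-70 | 16 | 75:25 | FBI-57 + FBI-70 | 13 | 64.9 ± 2.1 | 65:35
FBI-57 | FBI-70 | 16 | 25:75 | FBI-57 + FBI-70 | 13 | 20.6 ± 4.4 | 21:79
FBI-57 | FBI-70 | 16 | 5:95 | FBI-70 | | 0 | ND
FBI-57 | FBI-70 | 16 | 1:99 | FBI-70 | | 0 | ND

(a) Not all amplification products were useful for quantification in all cases. Primer pairs spanning the HV1 or HV2 poly-C region were generally not considered in the quantification estimates when one template was heteroplasmic due to distribution of the total signal over several independent length variants in the heteroplasmic sample.
(b) This analysis was based upon duplicates due to a single well loss during analysis. All other analyses were based upon triplicate experiments.
(c) ND = not detected.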
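The quantification estimates in Table 5 are described as averages over primer pairs of per-primer-pair relative signal ratios, each of which is itself the average of replicate trials. A minimal sketch of that two-level averaging follows. The function name, the input layout, and the peak values are hypothetical; treating the reported ± value as the standard deviation across primer pairs is an assumption, and only three primer pairs are used here for brevity whereas the study used at least 13.

```python
from statistics import mean, stdev


def template1_percent(peaks_by_primer_pair):
    """Estimate the template 1 share (%) of a two-template mixture.

    peaks_by_primer_pair maps a primer pair ID to a list of replicate
    (signal_template1, signal_template2) tuples.
    Returns (mean percent, standard deviation across primer pairs).
    """
    per_pair = []
    for replicates in peaks_by_primer_pair.values():
        # Relative signal for template 1 in each replicate, then averaged.
        fractions = [s1 / (s1 + s2) for s1, s2 in replicates]
        per_pair.append(mean(fractions))
    spread = stdev(per_pair) if len(per_pair) > 1 else 0.0
    return 100 * mean(per_pair), 100 * spread


# Invented peak heights for illustration (arbitrary units, three replicates each).
example_peaks = {
    2902: [(25, 75), (24, 76), (26, 74)],
    2910: [(30, 70), (28, 72), (32, 68)],
    2889: [(20, 80), (22, 78), (18, 82)],
}
percent, sd = template1_percent(example_peaks)
print(f"template 1 estimate: {percent:.1f} ± {sd:.1f} %")
```

Primer pairs affected by poly-C length heteroplasmy would be left out of the input dictionary, in keeping with footnote a.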
profile resolution: (1) the template concentrations must be sufficiently asymmetric to associate products to a given contributor, (2) it is not possible to conclude that a single product for a primer pair means that both contributors have the same product (if the minor contributor drops out or the signal is split by heteroplasmy, for example), and (3) in the event of sequence heteroplasmy in the major contributor, it may not be possible to assign products to one contributor or the other. On the other hand, the Ibis mtDNA assay is not compromised by the presence of length heteroplasmy, which causes significant analytical challenges during sequence analysis (see Figure 2). Because each DNA species is measured independently as a separate entity, length variants do not hinder interpretation of assay results. Fully automated desalting followed by direct mass spectrometry and automated data
processing provides benefit both in terms of throughput and convenience relative to the traditional sequence-based workflow. Additionally, this approach has the potential to offer considerable cost savings to the forensics practitioner, as both data acquisition and analysis require less hands-on time relative to conventional approaches. Despite these advantages, there is a trade-off. Compensating SNPs within an amplified region (e.g., a C → T and T → C within the same amplicon) will be “mass silent” and undetected in the Ibis system. For example, the common variations at nucleotide positions 150 and 152 will be masked when both occur in unison for two samples being directly compared. Furthermore, it is not possible with the mtDNA tiling approach to identify the exact position of a variant within a region. Thus, the approach is not useful for phylogenetic studies. However,
Figure 4. Deconvolved spectra from a 25:75 mixture of FBI-65 and FBI-70. Detailed view of mass spectra zoomed in to specific regions of triplex reactions performed on a 25:75 mixture of FBI-65 and FBI-70 using a total of approximately 1000 mtDNA copies. (A) Products from two primer pairs (2902 and 2910) are shown demonstrating the resolution of distinct products for the two input templates at signal output ratios corresponding approximately to the intended template input ratios. (B) Products from primer pair 2896. FBI-70 has multiple C-length heteroplasmies in the HV1 poly-C stretch, which is encompassed by primer pair 2896. Three distinct length-variant products are displayed here for template FBI-70 (a fourth is also present but not in zoomed view). A separate distinct product [A47 G11 C41 T24] is present that corresponds to the product generated from template FBI-65. The signal amplitude observed for the FBI-65 product [A47 G11 C41 T24] is ∼23% of the combined signal amplitude for the four FBI-70 products [A44 G13 C42/43/44/45 T23].
assays can be developed on the same platform that can identify specific SNPs and their positions within a sequence, although that would reduce the discrimination power compared with this mitochondrial profiling assay.

The ESI-MS platform is currently limited by the upper size of an amplicon that can be analyzed with high precision. For the unambiguous determination of product base compositions, PCR products should be under 150 bp in length. This limit is imposed by three factors: (1) the mass measurement deviations observed in mass spectrometry analyses are proportional to the mass of the analyte being measured, (2) spectral congestion increases with product size, as the number of discrete charge states present is proportional to the mass of the analyte, and (3) the number of possible nucleotide combinations consistent with a given mass (and mass measurement uncertainty) increases with increasing mass. When potential base compositions are highly constrained, the upper size can be extended to amplicons of 200 bp or greater because an unambiguous, de novo calculation of base composition is not required (rather, the masses can be compared directly to an allele database). We are currently applying an analogous ESI-MS approach to characterize STRs and will report on that work in detail elsewhere.48 The ESI-MS platform is currently limited in capacity for multiplexing compared to CE-based methods. This is due primarily to product size and spectral congestion limitations. The addition of a separation step such as HPLC between PCR and mass spectrometry can increase the multiplexing capacity.24

(48) Hall, T. A.; Lowery, K.; Budowle, B. B.; Planz, J.; Eisenberg, A.; Butler, J. M.; Hofstadler, S. A. Manuscript in preparation.
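Factor (3) above, the growth in the number of nucleotide combinations consistent with a measured mass, can be explored with a brute-force sketch. This is a deliberate simplification and not the assay's algorithm: it considers a single strand, ignores terminal groups and the mass tag, uses standard average residue masses (textbook values, not from the paper), and does not apply the strand-complementarity constraint that measuring both strands would provide. The function names and the 25 ppm tolerance are assumptions; the 108-residue counts are borrowed from Figure 1 only as a realistic size.

```python
# Average nucleotide residue masses in Da (assumed textbook values).
MASS = {"A": 313.21, "G": 329.21, "C": 289.18, "T": 304.20}


def strand_mass(a, g, c, t):
    """Summed residue mass of a composition (no terminal groups)."""
    return a * MASS["A"] + g * MASS["G"] + c * MASS["C"] + t * MASS["T"]


def compositions_within(target_mass, ppm, max_len):
    """List (A, G, C, T) counts, up to max_len residues in total, whose
    summed mass lies within +/- ppm of target_mass."""
    tol = target_mass * ppm * 1e-6
    hits = []
    for a in range(max_len + 1):
        for g in range(max_len + 1 - a):
            for c in range(max_len + 1 - a - g):
                partial = a * MASS["A"] + g * MASS["G"] + c * MASS["C"]
                # The tolerance is far smaller than one T residue, so only
                # the nearest integer T count can possibly match.
                t = round((target_mass - partial) / MASS["T"])
                if t < 0 or a + g + c + t > max_len:
                    continue
                if abs(partial + t * MASS["T"] - target_mass) <= tol:
                    hits.append((a, g, c, t))
    return hits


short_comp = (10, 5, 10, 5)    # hypothetical 30-residue strand
long_comp = (40, 9, 41, 18)    # 108 residues, counts as in Figure 1, panel A
short_hits = compositions_within(strand_mass(*short_comp), 25, sum(short_comp))
long_hits = compositions_within(strand_mass(*long_comp), 25, sum(long_comp))
print(len(short_hits), "candidate compositions for the 30-residue strand")
print(len(long_hits), "candidate compositions for the 108-residue strand")
```

Comparing the two printed counts gives a feel for how ambiguity grows with product size at a fixed ppm tolerance; constraining candidates to an allele database, as described above for larger amplicons, removes the need for such de novo enumeration.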
CONCLUSIONS

PCR/ESI-TOF-MS offers an efficient method for profiling the control region of mtDNA that identifies differences between individuals without targeting specific nucleotide positions. With the assay described here, approximately 94% of the discrimination provided by sequencing the same coordinate range is retained. The approach described here provides resolution exceeding that obtained by sequencing the minimum HV1 and HV2 coordinates (16024-16365 and 73-340) because a greater area is covered with the tiling primers (15924-16428 and 31-576). Because no a priori knowledge of polymorphic sites is required, this assay is capable of detecting polymorphisms without any postamplification targeting. The assay is produced as a prefabricated kit that requires only the addition of template, and all post-PCR sample handling, mass spectrometry, and raw data processing are automated. The assay is not compromised by the presence of length heteroplasmy, is capable of identifying mixed contributor samples, and has the potential to assign profiles to major and minor contributors in asymmetrically mixed samples. Profiles may be compared directly to sequence-based profiles and vice versa by converting sequences to theoretical base compositions. Moreover, database comparisons and counting statistics can be performed in the same manner they are today, making the output of the assay inherently compatible with existing population databases.

ACKNOWLEDGMENT

This is publication number 09-24 of the Laboratory Division of the Federal Bureau of Investigation. The opinions and assertions contained herein are solely those of the authors and are not to be construed as official or as views of the U.S.
Department of Defense, the U.S. Department of the Army, the Armed Forces Institute of Pathology, the U.S. Department of Justice, or the Federal Bureau of Investigation. Names of commercial manufacturers are provided for identification only and inclusion does not imply endorsement by the Federal Bureau of Investigation. We would like to thank John Butler and Peter Vallone for supplying population reference samples and mitochondrial DNA sequence profiles for them. We would like to thank Arthur Eisenberg and John Planz for helpful
discussions. This work was funded in part by National Institute of Justice Award No. 2006-DN-BX-K011.

SUPPORTING INFORMATION AVAILABLE

Additional information as noted in text. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review June 4, 2009. Accepted July 28, 2009. AC901222Y