Anal. Chem. 1997, 69, 1543-1549

Length and Base Composition of PCR-Amplified Nucleic Acids Using Mass Measurements from Electrospray Ionization Mass Spectrometry

David C. Muddiman, Gordon A. Anderson, Steven A. Hofstadler, and Richard D. Smith*

Macromolecular Structure and Dynamics, Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington 99352

A generally applicable algorithm has been developed to allow the base composition of polymerase chain reaction (PCR) products to be determined from mass spectrometrically measured molecular weights and the complementary nature of DNA. For single-stranded DNA, a mass measurement of any given finite precision is compatible with an increasingly large number of possible base compositions as molecular weight increases. For example, the number of base compositions consistent with a molecular weight of 35 000 is ∼6000 at a mass measurement precision of 0.01%. However, mass measurement of both of the complementary single strands produced in the PCR reduces the number of possibilities to fewer than 100 at 0.01% mass precision, and the base composition is uniquely defined at 0.001% mass precision. Taking into account the low misincorporation rate of standard DNA polymerases and the fact that the final PCR product also contains primers of known sequence (generally 15- to 20-mers flanking the targeted region) reduces the number of possible base combinations to only ∼3 at MW = 35 000. In addition, the number of base pairs (i.e., the length of the DNA molecule) is uniquely defined. We show that the use of modified bases in PCR or post-PCR modification chemistry allows unique solutions for the base composition of the PCR product with only modest mass measurement precision.

The ability to characterize large intact macromolecules (e.g., proteins and oligonucleotides) by mass spectrometry (MS) has resulted from advances in “soft” ionization techniques such as electrospray ionization (ESI)1,2 and matrix-assisted laser desorption/ionization (MALDI).3,4 Biomolecules of over 1 MDa have been measured by ESI-MS approaches, and ions of 110 MDa (double-stranded T4 coliphage DNA) have been detected using ESI-FTICR.5

(1) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64.
(2) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Mass Spectrom. Rev. 1990, 9, 37.
(3) Karas, M.; Hillenkamp, F. Anal. Chem. 1988, 60, 2299.
(4) Karas, M.; Bahr, U.; Giessmann, U. Mass Spectrom. Rev. 1991, 10, 335.
(5) Chen, R. D.; Cheng, X. H.; Mitchell, D. W.; Hofstadler, S. A.; Wu, Q. Y.; Rockwood, A. L.; Sherman, M. G.; Smith, R. D. Anal. Chem. 1995, 67, 1159.

S0003-2700(96)01134-1 CCC: $14.00

© 1997 American Chemical Society

To determine the base composition of nucleic acids (i.e., RNA or DNA), one can exploit not only the molecular weight information but also the masses of the limited set of individual building blocks (e.g., A, C, G, and T for DNA). McCloskey and co-workers have previously demonstrated that the number of compositional possibilities for RNA becomes extremely large as molecular weight increases if no constraints are imposed on base composition.6 However, they also showed that restricting the number of a particular base (e.g., by determining the count of any one residue via chemical modification or enzymatic methods) significantly reduced the number of base compositions. For RNA in which the abundance of one base is known, compositions up to the 14-mer level could be uniquely defined at a mass measurement precision of 0.01% (100 ppm), with similar results noted for DNA. The polymerase chain reaction (PCR)7 allows the enzymatic synthesis of large oligonucleotides from targeted regions of genomic DNA with high fidelity. The typical PCR exponentially amplifies a targeted nucleic acid sequence, producing a double-stranded (ds) product. An improved understanding of the limitations associated with the mass spectrometric analysis of nucleic acids8-14 has recently allowed ionization and precise measurement of intact PCR products exceeding the 100-mer level by ESI Fourier transform ion cyclotron resonance (FTICR) MS.15,16 Denaturing a PCR product prior to MS analysis allows the mass of each individual strand to be obtained. A number of new, as well as established, applications of PCR produce a range of products for which rapid and accurate identification and characterization schemes are needed. For example, PCR amplification and characterization of nucleic acids has proven to be a viable method for monitoring microbial communities in soil and aquatic environments, allowing the progress of intrinsic or accelerated bioremediation efforts to be monitored on relatively short time scales.17,18 Accurate mass measurements from ESI-MS can even identify single-base substitutions, owing to much higher mass accuracy and precision than is possible with conventional electrophoretic methods.16 In a recent study, we found that each of four PCR products examined had a polymorphism (single- or double-base substitutions) compared to the reported sequence.16 While the molecular weight determinations unambiguously indicated that a polymorphic site was present, the relatively high mass of these PCR products (∼27 to 35 kDa per strand) precluded the assignment of a unique base composition solely from the molecular weight, because of the large number of possible base compositions at the mass accuracy obtained (∼50 ppm, not much larger than the possible variability due to isotopic content19). Inspection of the mass differences (∆) between the expected (calculated) single-strand masses and the measured masses indicated that the differences were due to simple single- or double-base substitutions.

We have developed a simple mathematical strategy to assist the characterization of dsDNA products from PCRs that integrates one or more of the following: the molecular weight measurement, the mass precision, and primer composition constraints. The number of base compositions consistent with a mass measurement of finite precision obviously increases with size. For example, a 16-mer has 4.3 × 10^9 possible base sequences, but far fewer unique base compositions (969). In real situations, the base composition of PCR products is substantially constrained, and using this information is of considerable benefit.

(6) Pomerantz, S. C.; Kowalak, J. A.; McCloskey, J. A. J. Am. Soc. Mass Spectrom. 1992, 4, 204.
(7) Mullis, K.; Faloona, F. Methods Enzymol. 1987, 155, 335.
(8) Greig, M. J.; Gaus, H. J.; Griffey, R. H. Rapid Commun. Mass Spectrom. 1996, 10, 47.
(9) Greig, M.; Griffey, R. H. Rapid Commun. Mass Spectrom. 1995, 9, 97.
(10) Stults, J. T.; Marsters, J. C. Rapid Commun. Mass Spectrom. 1991, 5, 359.
(11) Muddiman, D. C.; Cheng, X. H.; Udseth, H. R.; Smith, R. D. J. Am. Soc. Mass Spectrom. 1996, 7, 697.
(12) Doktycz, M. J.; Hurst, G. B.; Habibi-Goudarzi, S.; McLuckey, S. A.; Tang, K.; Chen, C. H.; Uziel, M.; Jacobson, K. B.; Woychik, R. P.; Buchanan, M. V. Anal. Biochem. 1995, 230, 205.
(13) Altman, E. I.; Colton, R. J. Surf. Sci. 1993, 295, 13.
(14) Bleicher, K.; Bayer, E. Biol. Mass Spectrom. 1994, 23, 320.
(15) Wunschel, D. S.; Fox, K. F.; Fox, A.; Bruce, J. E.; Muddiman, D. C.; Smith, R. D. Rapid Commun. Mass Spectrom. 1996, 10, 29.
(16) Muddiman, D. C.; Wunschel, D. S.; Liu, C.; Pasa-Tolic, L.; Fox, K. F.; Fox, A.; Anderson, G. A.; Smith, R. D. Anal. Chem. 1996, 68, 3705.

Analytical Chemistry, Vol. 69, No. 8, April 15, 1997 1543
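The 16-mer counts quoted above follow from elementary combinatorics: a strand of length n has 4^n possible sequences but only C(n + 3, 3) possible base compositions (the "stars and bars" count of ways to split n bases among four base types). A minimal Python check; the function names are our own and are not part of DNA-2LS:

```python
from math import comb


def n_sequences(length: int, alphabet: int = 4) -> int:
    """Number of distinct ordered sequences of the given length."""
    return alphabet ** length


def n_compositions(length: int, alphabet: int = 4) -> int:
    """Number of distinct base compositions (order ignored) of the given length.

    Stars and bars: ways to split `length` bases among `alphabet` base types,
    i.e. C(length + alphabet - 1, alphabet - 1).
    """
    return comb(length + alphabet - 1, alphabet - 1)


if __name__ == "__main__":
    print(n_sequences(16))     # 4294967296, i.e. ~4.3 x 10^9
    print(n_compositions(16))  # 969
```

Because the composition count grows only polynomially (roughly n^3/6) while the sequence count grows as 4^n, composition is a far more tractable target for mass-based methods than sequence.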
In addition, knowledge of the chain length is important when characterizing PCR products, since polymorphisms can be base deletions or insertions, and our approach unambiguously defines this parameter. The algorithm exploits the complementary nature (i.e., Watson-Crick base pairing) of DNA to determine the base composition of the individual strands. The approach addresses the finite accuracy of mass measurements and is compared and contrasted with the recent work of McLafferty and co-workers.20 While this paper targets the characterization of polymorphisms produced in a normal PCR scheme (i.e., using matched primer pairs), variations of the algorithm should also prove useful for determining the base composition of other PCR products (e.g., from arbitrarily primed PCR). Provision is made for exploiting modified-base PCR and post-PCR chemical modification methods, and the algorithm (or a variation thereof) should prove useful as more PCR-MS-based detection schemes are developed.

(17) Chandler, D. P.; Brockman, F. J. Appl. Biochem. Biotechnol. 1996, 57/58, 971.
(18) Brockman, F. J. Mol. Ecol. 1995, 4, 567.
(19) Zubarev, R. A.; Demirev, P. A.; Hakansson, P.; Sundqvist, B. U. R. Anal. Chem. 1995, 67, 3793.
(20) Aaserud, D. J.; Kelleher, N. L.; Little, D. P.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1996, 7, 1266.

EXPERIMENTAL SECTION

Polymerase Chain Reaction and Mass Spectrometry Measurements. The methods used to generate the PCR products, desalt them, and characterize them by ESI mass spectrometry have been described in detail elsewhere.16 In this work we use data from two previously measured PCR products with the following calculated and determined molecular weights:16

Bacillus subtilis (expected base composition A39G23C18T34, coding strand)
  coding strand       35 272.98 (calcd)    35 311.6 ± 3 (obsd)
  noncoding strand    35 027.80 (calcd)    34 986.6 ± 3 (obsd)

Bacillus thuringiensis (expected base composition A23G22C10T34, coding strand)
  coding strand       27 618.94 (calcd)    27 603.8 ± 2 (obsd)
  noncoding strand    27 237.80 (calcd)    27 253.3 ± 2 (obsd)

where “calcd” is the average molecular weight based on the reported sequence and “obsd” is the mass spectrometrically determined molecular weight. The average molecular weights of the four constituent nucleotide residues used in the algorithm are A, 313.2096; G, 329.2089; T, 304.1959; and C, 289.1843. It is important to note that the electrosprayed ions were detected using an FTICR mass analyzer capable of providing high-precision mass measurements (25 kDa); therefore, average masses are calculated.

DNA-Tools. The computations presented here are a subset of a software package, DNA-2LS, developed at Pacific Northwest National Laboratory,22 which runs on an IBM-compatible Pentium processor under Microsoft Windows 95, using a Visual Basic graphical interface with a dynamic link library written in Borland C++.

RESULTS AND DISCUSSION

The ESI-MS analysis of PCR products >100 bp in size, with mass measurement accuracy better than 100 ppm (0.01%), has recently been demonstrated.16 It is important to note that, for DNA mass measurements, double-stranded masses (i.e., the sum of the two single strands) are only slightly sequence-dependent. This is because the average residue mass of A and T is 308.703, whereas that of G and C is 309.197. Thus, even extreme variations in sequence for dsDNA lead to mass differences of only 0.15%. This highlights the importance of mass measurements of single-stranded (ss) DNA for elucidating polymorphisms. Our recent report indicated that the double-stranded DNA mass was consistent with the predicted mass (based on sequence); however, the single-stranded species did not have the expected masses.16 This raises the question of how the molecular weight information can be used most effectively to determine rapidly the base composition of PCR products by mass spectrometry. Since PCR produces complementary dsDNA (when using primer pairs), constraints can reasonably be introduced into the algorithm that reduce the number of base compositions possible for a given molecular weight. Species-specific nucleic acid biomarkers of closely related species are often similar in mass and base composition and difficult to resolve using conventional techniques, unless one resorts to complete (e.g., Sanger) sequencing. The ability to determine rapidly and accurately the size and base composition (and, ideally, the sequence) of PCR-amplified nucleic acids by mass spectrometric methods would be an enabling technique in a number of bioremediation and biomedical applications. A recent report by McLafferty and co-workers demonstrated that, by using the complementary nature of DNA in conjunction with very high mass accuracy data (0.5 Da at 39 kDa), a single base composition could be delineated.20 In this approach, the number of base pairs is determined from the double-stranded mass measurement; the number of base pairs, combined with the high mass accuracy, then allows a unique base composition to be determined. While this approach leads to results similar to ours (see below), it is limited in its applicability, since the mass accuracy needed is not routinely obtainable with conventional mass spectrometers. It is important to note that it is unnecessary to measure the double-stranded mass to determine the number of base pairs in the PCR product when both individual strands can be measured. In this work, we have conducted calculations to define the usefulness (and, perhaps more importantly, the limitations) of mathematical approaches for determining the length and base composition of PCR products from mass spectrometrically measured molecular weights.

(21) Winger, B. E.; Hofstadler, S. A.; Bruce, J. E.; Udseth, H. R.; Smith, R. D. J. Am. Soc. Mass Spectrom. 1993, 4, 566.
(22) Muddiman, D. C.; Anderson, G. A. DNA-2LS; Pacific Northwest National Laboratory, Richland, WA, 1996 (all rights reserved).

Nucleotide Compositions of ssDNA as a Function of Molecular Weight.
The number of base combinations increases nearly exponentially as a function of molecular weight for DNA or RNA if one places no constraints on the base composition.6 The possible base compositions are the nonnegative integer solutions (w, x, y, z) of

$$ wG + xC + yA + zT \pm D = \mathrm{MW} \pm \text{uncertainty} \qquad (1) $$

Thus, the integers w, x, y, and z multiplied by their respective residue masses (i.e., the masses of G, C, A, and T), plus a constant mass deficit or gain D (such as that due to dephosphorylation), must fall within the molecular weight range defined by the right-hand side of eq 1. McCloskey and co-workers6 have previously noted that placing reasonable constraints on base composition can dramatically reduce the number of combinations consistent with a given mass measurement, and they have demonstrated the utility of such an approach for small oligonucleotides.23 Building on this work, we have developed a more constrained method to determine the base composition of much larger complementary DNA fragments (vide infra).

(23) Kowalak, J. A.; Pomerantz, S. C.; Crain, P. F.; McCloskey, J. A. Nucleic Acids Res. 1993, 21, 4577.

Figure 1. Plot of the number of base compositions for single-stranded DNA as a function of molecular mass and mass precision. The resonances in the number of base compositions are a result of the quantized nature of the individual bases.

Figure 1 shows the number of possible base compositions as a function of molecular weight (0-40 000 Da in 100 Da steps) at three different levels of mass precision. The number of possible base combinations depends on both the magnitude of the molecular weight and the accuracy of its determination. The resonances apparent in Figure 1 are a direct result of the quantized masses of the four individual bases; single-stranded species of >7-mer size actually overlap in their range of possible masses. Obviously, if a mass precision of 0.001% (10 ppm) could be obtained (certainly feasible using FTICR), the number of possible base compositions would be substantially reduced. However, without additional constraints on the possible base compositions, exact determination of the base composition of the individual strands of PCR products (i.e., without using their complementary nature) appears intractable at first glance, even at 0.001% mass accuracy. For example, the PCR product generated from the 16S/23S rRNA spacer region of B. subtilis is 114 base pairs in length, each strand having a mass of ∼35 000 Da.16 If the mass precisions are 100, 50, and 10 ppm, the numbers of possible base compositions that fall within the window

$$ \mathrm{MW}_{\text{measured}} \pm \text{precision} $$

(precision scales with the measured value) are 5837, 3045, and 465, respectively. It is important to note that these candidate compositions have widely varying lengths. Thus, interpretation methods that exploit reasonable constraints should greatly assist the elucidation of base composition.

Characteristics of the Algorithm. Our algorithm uses constraints on the composition of large DNA segments to significantly reduce the number of possible base compositions, and it uses the information obtained for complementary strands (e.g., those produced in PCR) to eliminate many combinations. The only assumption built into the approach is Watson-Crick base pairing for blunt-ended DNA. In our strategy (Figure 2), MW(1) must first be satisfied within a user-defined mass precision (e.g., 0.01%); the mass tolerance (in Da) is obtained by multiplying the measured molecular weight by the user-defined precision, so it automatically scales with molecular weight. Any base composition consistent with MW(1) must, assuming Watson-Crick base pairing, also be consistent with MW(2), or the composition is discarded (e.g., the number of G's in the coding strand must equal the number of C's in the noncoding strand). It was shown above that, using only single-strand molecular weight information, a large number of base compositions are possible. However, PCR produces dsDNA, one strand being complementary to the other. Denaturing the dsDNA into two complementary single-stranded species followed by mass measurement provides the information that is the basis of our algorithm.

Figure 2. Schematic representation depicting the general characteristics of the algorithm used in this work. MW(1) must first be satisfied based on the measured molecular weight and the user-defined mass precision (%). Each base composition that satisfies MW(1) must then satisfy MW(2), with the only constraint being Watson-Crick base pairing between the complementary strands (indicated by the arrows).

While our algorithm is applicable to any mass-measured complementary DNA that is detected as ssDNA, this article focuses on the characterization of polymorphisms in PCR products. The examples described (see Experimental Section) are PCR products produced using primer pairs flanking a targeted region of a bacterial genome. Thus, the sequence (and therefore the base composition) is known for unmutated strains, allowing this information to be used for ranking purposes (vide infra). For cases where the base composition of completely unknown complementary DNA is sought (e.g., arbitrarily primed PCR), alternative rankings (such as mass error) can be invoked. An additional facet of the program accounts for the use of different DNA polymerases in the PCR, owing to their differing properties (e.g., 3′→5′ exonuclease activity). For example, the Thermus aquaticus (Taq) polymerase lacks proofreading ability and is known to add a nontemplated adenosine to the 3′ end of each strand.24 However, some enzymes, such as Pyrococcus furiosus polymerase, have proofreading ability, and 3′ adenylation does not occur.25 Thus, knowledge of the polymerase used modifies the algorithm slightly. Essentially, if a polymerase known to adenylate the 3′ end of a blunt-ended PCR product is used, the number of A's in the noncoding strand must equal the number of T's in the coding strand plus one additional A. Since 3′ adenylation is not always quantitative, the algorithm simultaneously considers both the 3′-adenylated product and the blunt-ended product.
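The two-step filter of Figure 2 can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the DNA-2LS implementation: the terminal correction D = -61.9646 Da (loss of HPO3 plus gain of H2O for a strand with 5′-OH and 3′-OH termini) is our assumption, chosen because it reproduces the calculated strand masses in the Experimental Section to within about 0.02 Da, and the precision is applied to each strand separately as mw × ppm × 10^-6. All function names are hypothetical.

```python
import math

# Average residue masses (Da) as listed in the Experimental Section.
RESIDUE = {"A": 313.2096, "G": 329.2089, "C": 289.1843, "T": 304.1959}

# ASSUMPTION: per-strand terminal correction D for a linear strand with
# 5'-OH and 3'-OH ends (minus HPO3, plus H2O); not given numerically in the text.
D_TERMINI = -61.9646


def strand_mass(a, g, c, t, d=D_TERMINI):
    """Average mass of a strand with composition A_a G_g C_c T_t (cf. eq 1)."""
    return (a * RESIDUE["A"] + g * RESIDUE["G"]
            + c * RESIDUE["C"] + t * RESIDUE["T"] + d)


def compositions_in_window(mw, ppm, d=D_TERMINI):
    """Every (a, g, c, t) whose strand mass lies within mw +/- mw * ppm * 1e-6."""
    tol = mw * ppm * 1e-6
    lo, hi = mw - tol - d, mw + tol - d  # window on the summed residue masses
    found = []
    for a in range(int(hi // RESIDUE["A"]) + 1):
        m_a = a * RESIDUE["A"]
        for g in range(int((hi - m_a) // RESIDUE["G"]) + 1):
            m_ag = m_a + g * RESIDUE["G"]
            for c in range(int((hi - m_ag) // RESIDUE["C"]) + 1):
                m_agc = m_ag + c * RESIDUE["C"]
                # Solve for the admissible T counts instead of looping over them.
                t_min = max(0, math.ceil((lo - m_agc) / RESIDUE["T"]))
                t_max = math.floor((hi - m_agc) / RESIDUE["T"])
                for t in range(t_min, t_max + 1):
                    found.append((a, g, c, t))
    return found


def complementary_candidates(mw_coding, mw_noncoding, ppm, d=D_TERMINI,
                             extra_a_options=(0,)):
    """Figure 2-style filter: keep coding compositions whose Watson-Crick
    complement (A<->T, G<->C) also fits the noncoding strand mass."""
    tol_nc = mw_noncoding * ppm * 1e-6
    hits = []
    for a, g, c, t in compositions_in_window(mw_coding, ppm, d):
        for extra_a in extra_a_options:
            # Noncoding A = coding T (+1 if a nontemplated A is assumed),
            # noncoding G = coding C, C = coding G, T = coding A.
            nc = (t + extra_a, c, g, a)
            if abs(strand_mass(*nc, d=d) - mw_noncoding) <= tol_nc:
                hits.append(((a, g, c, t), nc))
    return hits


if __name__ == "__main__":
    # Calculated B. thuringiensis strand masses from the Experimental Section.
    single = compositions_in_window(27618.94, 100)
    paired = complementary_candidates(27618.94, 27237.80, 100)
    print(len(single), "single-strand compositions;", len(paired), "complementary pairs")
    print("lengths:", sorted({sum(coding) for coding, _ in paired}))
```

Passing extra_a_options=(0, 1) applies the adenylation rule stated in the text (one additional A on the noncoding strand) while keeping the blunt-ended interpretation as well. The candidate counts this sketch prints need not match the values reported in the text for B. thuringiensis (2179 single-strand and 39 paired compositions) exactly, because D and the handling of window boundaries are assumptions.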
PCR with mass spectrometric detection can be used to detect a point mutation or a deletion in a gene, such as the loss of a codon in the cystic fibrosis transmembrane conductance regulator gene, which corresponds to a 3 bp deletion.26 The examples presented herein illustrate the feasibility of elucidating the most likely base compositions for a particular PCR product; in addition, the number of base pairs is uniquely determined (up to nearly kilobase size) from the complementary nature of the two single-stranded species. For example, from knowledge of the two single-stranded masses (and the fact that both strands must have the same length if they are complementary), the number of base pairs is uniquely defined, even at 0.01% mass precision (see Figure 2). Thus, it is unnecessary to use dsDNA mass measurements to calculate the number of base pairs of a PCR product.

(24) Zhou, M. Y.; Clark, S. E.; Gomez-Sanchez, C. E. Biotechniques 1995, 19, 34.
(25) Scott, B.; Nielson, K.; Cline, J.; Kretz, K. Strategies 1994, 7, 62.
(26) Ch’ang, L.-Y.; Tang, K.; Schell, M.; Ringelberg, C.; Matteson, K. J.; Allman, S. L.; Chen, C. H. Rapid Commun. Mass Spectrom. 1995, 9, 772.

If we assume a mass precision of 100 ppm (0.01%), which can readily be obtained for desalted samples using a quadrupole or ion trap mass spectrometer, the number of base compositions is significantly reduced by our algorithm (see Figure 2). For our two examples, the numbers of combinations obtained using only the calculated coding strand mass and a mass precision of 0.01% are 2179 for B. thuringiensis and 5994 for B. subtilis (see eq 1 and Figure 1). However, using the complementary nature of the coding and noncoding strands, with no additional constraints, reduces the number of combinations to 39 and 92 for B. thuringiensis and B. subtilis, respectively, almost 2 orders of magnitude fewer. Table 1 shows the output for both B. thuringiensis and B. subtilis up to 4 base pair changes (with respect to the reported sequence in GenBank), along with the theoretical mass, the computed base composition corresponding to that mass, the number of base pair changes (relative to the expected sequence), the length in base pairs, the merit, and the error. For B. thuringiensis, the algorithm indicates that either a T-to-C (coding) and A-to-G (noncoding) or a G-to-A (coding) and C-to-T (noncoding) substitution has occurred. Similarly, the B. subtilis results indicate that either a C-to-G (coding) and G-to-C (noncoding) substitution or a 2C-to-1A,1T (coding) and 2G-to-1A,1T (noncoding) change has occurred. The output is ranked first by the number of base changes and second by mass error.
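Because both strands have the same length n, the sum of the two single-strand masses equals the residue mass of n base pairs plus two terminal corrections, and each pair contributes either A + T = 617.4055 Da or G + C = 618.3932 Da (twice the per-residue averages 308.703 and 309.197 quoted earlier). The Python sketch below is our own shortcut rather than the authors' enumeration: it lists every n for which some integer mix of A·T and G·C pairs fits the summed strand masses within the combined tolerance. The per-strand terminal correction D = -61.9646 Da and the blunt-ended assumption are ours; the function name is hypothetical.

```python
import math

# Average residue masses (Da) from the Experimental Section.
A, G, C, T = 313.2096, 329.2089, 289.1843, 304.1959
AT, GC = A + T, G + C  # residue mass of one A.T pair and one G.C pair
DELTA = GC - AT        # mass gained per A.T -> G.C swap (~0.99 Da)

# ASSUMPTION: per-strand terminal correction (5'-OH and 3'-OH ends).
D_TERMINI = -61.9646


def possible_lengths(mw_coding, mw_noncoding, tol_coding, tol_noncoding, max_bp=2000):
    """Duplex lengths n (bp) consistent with the two single-strand masses,
    assuming blunt-ended, fully complementary strands."""
    total = mw_coding + mw_noncoding - 2 * D_TERMINI  # summed residue masses
    lo = total - (tol_coding + tol_noncoding)
    hi = total + (tol_coding + tol_noncoding)
    lengths = []
    for n in range(1, max_bp + 1):
        base = n * AT  # all A.T pairs: the lightest possible n-bp duplex
        # k = number of G.C pairs; keep n if some integer 0 <= k <= n fits.
        k_lo = max(0, math.ceil((lo - base) / DELTA))
        k_hi = min(n, math.floor((hi - base) / DELTA))
        if k_lo <= k_hi:
            lengths.append(n)
    return lengths


if __name__ == "__main__":
    print(possible_lengths(35311.6, 34986.6, 3, 3))  # B. subtilis -> [114]
    print(possible_lengths(27603.8, 27253.3, 2, 2))  # B. thuringiensis -> [89]
```

With the observed masses and uncertainties from the Experimental Section, the sketch returns a single length for each product (114 and 89 bp), consistent with Table 1. Larger molecules or wider tolerances eventually let neighboring values of n overlap; where that happens depends on the tolerances chosen.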
As illustrated in Figure 3, the “merit value”, which is related to the number of base changes, is determined by comparing the predicted base composition with the calculated (expected) composition using only the coding strand (since a change in one strand must result in a change in the other). The merit value is useful for ranking because one generally does not expect a significant number of base changes when the genomic sequence being amplified is already known. However, when there is no expected sequence, ranking is probably best done using mass error alone. The expected base composition of the coding strand for the 114 bp PCR product amplified from B. subtilis is A39T34C18G23, as shown in the left-hand column of Figure 3. The next column shows the base composition with the highest merit (i.e., the fewest base pair changes) determined by the algorithm. The last column in Figure 3 is an accumulator obtained from the minimum number in each row. The merit is then calculated as the ratio of the total score (113 in this case) to the expected total (114 in this case), giving a value of 0.9912 (see the first entry in the B. subtilis output in Table 1). Thus, the merit value always lies between 0 and 1, with 1 being an exact fit to the predicted composition. The merit value is chosen as the first ranking criterion because, in general, increasing numbers of base changes are unlikely for PCR products. This assumption should be valid when using matched primer pairs in PCR; owing to the low misincorporation rate of DNA polymerases, several base pair changes are not expected. An exception might occur if a gene having an allele that differs in a large number of bases is amplified. The misincorporation rate in PCR using Taq is about 1 in 9000 bases,27 with the rates of other enzymes being much lower.
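Our reading of the Figure 3 procedure, sketched in Python: the accumulator keeps the smaller of the expected and candidate counts for each base, the merit is that sum divided by the expected length, and the shortfall is taken as the number of base pair changes. This reproduces the merit and change counts in Table 1, but the function is hypothetical and not taken from DNA-2LS. Compositions are (A, G, C, T) tuples for the coding strand.

```python
def merit(expected, candidate):
    """Merit and base pair changes for (A, G, C, T) coding-strand counts.

    Our reading of Figure 3: the accumulator keeps min(expected, candidate)
    for each base; merit = accumulated score / expected length, and the
    shortfall (expected length - score) is the number of base pair changes.
    """
    score = sum(min(e, c) for e, c in zip(expected, candidate))
    total = sum(expected)
    return score / total, total - score


if __name__ == "__main__":
    expected = (39, 23, 18, 34)  # B. subtilis coding strand, A G C T
    best = (39, 24, 17, 34)      # first B. subtilis entry in Table 1
    m, changes = merit(expected, best)
    print(f"{m:.4f}", changes)   # -> 0.9912 1
```

Because the score can never exceed the expected total, the merit is bounded by 1, matching the statement above that 1 corresponds to an exact fit.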

Table 1. Algorithm Output for Two PCR Products

strand      mass (Da)^a  base composition  no. of bp changes^b  length (bp)  merit^c  sqrt(SOS)^d

Output for B. thuringiensis: expected base composition A23G22C10T34 (coding)
coding      27 604.0     A23G22C11T33      1                    89           0.9888   0.5484
noncoding   27 253.8     A33G11C22T23
coding      27 603.0     A24G21C10T34      1                    89           0.9888   0.9563
noncoding   27 252.8     A34G10C21T24
coding      27 604.9     A22G23C12T32      3                    89           0.9663   1.894
noncoding   27 254.8     A32G12C23T22
coding      27 602.0     A25G20C9T35       3                    89           0.9663   2.331
noncoding   27 251.9     A35G9C20T25
coding      27 606.0     A26G21C11T31      4                    89           0.9551   3.3094
noncoding   27 250.8     A31G11C21T26

Output for B. subtilis: expected base composition A39G23C18T34 (coding)
coding      35 313.0     A39G24C17T34      1                    114          0.9912   1.867
noncoding   34 987.8     A34G17C24T39
coding      35 312.0     A40G23C16T35      2                    114          0.9825   0.4911
noncoding   34 986.8     A35G16C23T40
coding      35 314       A38G25C18T33      2                    114          0.9825   3.261
noncoding   34 988.8     A33G18C25T38
coding      35 315.1     A42G23C17T32      3                    114          0.9737   3.911
noncoding   34 984.8     A32G17C23T42
coding      35 311.1     A41G22C15T36      4                    114          0.9649   0.9514
noncoding   34 985.8     A36G15C22T41
coding      35 309.0     A38G23C15T38      4                    114          0.9649   3.408
noncoding   34 988.8     A38G15C23T38
coding      35 310.0     A37G24C16T37      4                    114          0.9649   3.594
noncoding   34 989.8     A37G16C24T37
coding      35 314.1     A43G22C16T33      4                    114          0.9649   3.745
noncoding   34 983.8     A33G16C22T43
coding      35 315.0     A37G26C19T32      4                    114          0.9649   4.657
noncoding   34 989.8     A32G19C26T37

a The average mass is determined from the base composition that satisfies the algorithm (Figure 2), accounting for the lack of a 3′-terminal phosphate.
b The number of base pair changes is determined by comparing the reported base composition (from sequence information) with that determined by the algorithm. The length (in base pairs) is determined solely from the algorithm and the mass spectrometrically measured molecular weights.
c Refer to Figure 3 and the text for an explanation.
d The second ranking criterion uses the mass error from both strands. The absolute difference between the measured mass of each strand and the mass calculated from the candidate base composition (column 2 in this table) is determined; these two values (one for the coding strand and one for the noncoding strand) are squared and summed, and the square root of the sum is reported.
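The error metric of footnote d and the two-key ranking described in the text (merit first, then error) can be sketched as follows. The function names are hypothetical, and the example rows are the first three B. subtilis candidates from Table 1, deliberately listed out of order.

```python
import math


def sqrt_sos(calc_coding, calc_noncoding, obs_coding, obs_noncoding):
    """Footnote d: square root of the summed squared strand mass errors (Da)."""
    return math.hypot(calc_coding - obs_coding, calc_noncoding - obs_noncoding)


def rank(candidates):
    """Order candidates by merit (highest first), then by sqrt(SOS) (lowest first)."""
    return sorted(candidates, key=lambda row: (-row["merit"], row["sqrt_sos"]))


# First three B. subtilis candidates of Table 1, deliberately shuffled.
EXAMPLE_ROWS = [
    {"coding": "A38G25C18T33", "merit": 0.9825, "sqrt_sos": 3.261},
    {"coding": "A39G24C17T34", "merit": 0.9912, "sqrt_sos": 1.867},
    {"coding": "A40G23C16T35", "merit": 0.9825, "sqrt_sos": 0.4911},
]

if __name__ == "__main__":
    print([row["coding"] for row in rank(EXAMPLE_ROWS)])
    # -> ['A39G24C17T34', 'A40G23C16T35', 'A38G25C18T33']
    print(sqrt_sos(35313.0, 34987.8, 35311.6, 34986.6))  # strand errors of 1.4 and 1.2 Da
```

Applied to the rounded masses printed in Table 1, sqrt_sos gives about 1.84 Da for the top B. subtilis candidate rather than the tabulated 1.867, presumably because the printed masses are rounded.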

While our algorithm (see Figure 2) did not always give a unique solution, given modest mass measurement accuracies, it decreased the number of possibilities to a manageable value (