Anal. Chem. 2004, 76, 2438-2445
Top-Down Characterization of Nucleic Acids Modified by Structural Probes Using High-Resolution Tandem Mass Spectrometry and Automated Data Interpretation
Katherine A. Kellersberger,† Eizadora Yu,† Gary H. Kruppa,‡ Malin M. Young,‡ and Daniele Fabris*,†
Department of Chemistry and Biochemistry, University of Maryland Baltimore County, Baltimore, Maryland 21250, and Sandia National Laboratories, Livermore, California 94551-0969
A top-down approach based on sustained off-resonance irradiation collision-induced dissociation (SORI-CID) has been implemented on an electrospray ionization (ESI) Fourier transform mass spectrometer (FTMS) to characterize nucleic acid substrates modified by structural probes. Solvent accessibility reagents, such as dimethyl sulfate (DMS), 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT), and β-ethoxy-α-ketobutyraldehyde (kethoxal, KT), are widely employed to reveal the position of single- vs double-stranded regions and obtain the footprint of bound proteins onto nucleic acid structures. Established methods require end-labeling of the nucleic acid constructs, probe-specific chemistry to produce strand cleavage at the modified nucleotides, and analysis by polyacrylamide gel electrophoresis to determine the position of the susceptible sites. However, these labor-intensive procedures can be avoided when mass spectrometry is used to identify the probe-induced modifications from their characteristic mass signatures. In particular, ESI-FTMS can be directly employed to monitor the conditions of probe application to avoid excessive alkylation, which could induce unwanted distortion or defolding of the substrate of interest. The sequence position of the covalent modifications can be subsequently obtained from classic tandem techniques, which allow for the analysis of individual target adducts present in complex reaction mixtures with no need for separation techniques. Selection and activation by SORI-CID has been employed to reveal the position of adducts in nucleic acid substrates in excess of 6 kDa. The stability of the different covalent modifications under SORI-CID conditions was investigated. Multiple stages of isolation and activation were employed in MSn experiments to obtain the desired sequence information whenever the adduct stability was not particularly favorable, and SORI-CID induced the facile loss of the modified base.
* To whom correspondence should be addressed. Phone: (410) 455-3053. Fax: (410) 455-2608. † University of Maryland Baltimore County. ‡ Sandia National Laboratories.
Analytical Chemistry, Vol. 76, No. 9, May 1, 2004
A new program called MS2Links was developed for the automated reduction and interpretation of fragmentation data
obtained from modified nucleic acids. Based on an algorithm that searches for plausible isotopic patterns, the data reduction module is capable of discriminating legitimate signals from noise spikes of comparable intensity. The fragment identification module calculates the monoisotopic mass of ion products expected from a certain sequence and user-defined covalent modifications, which are finally matched with the signals selected by the data reduction program. Considering that MS2Links can generate similar fragment libraries for peptides and their covalent conjugates with other peptides or nucleic acids, this program provides an integrated platform for the structural investigation of protein-nucleic acid complexes based on cross-linking strategies and top-down ESI-FTMS.

Protein-nucleic acid interactions play key roles in the replication, transcription, translation, and repair of the genome of all organisms, from viruses to humans. The framework necessary to understand the nature and mechanism of these vital functions is provided by the complete 3D structures of the protein-nucleic acid complexes formed in the cellular environment during such processes. The focus of recent structural genomics initiatives has been directed to the elucidation of target protein structures,1,2 but little emphasis has been put specifically on protein-nucleic acid complexes. Very frequently, due to their sheer size or inadequate crystallization, these complexes represent a challenge for the established high-resolution techniques, NMR and X-ray crystallography. The structural determinants of protein-nucleic acid interactions can be investigated by using chemical reagents, which introduce structure-specific modifications of either the nucleic acids or the proteic components of the complex of interest.
(1) Service, R. F. Science 2002, 298, 948-950.
(2) Terwilliger, T. C. Nat. Struct. Biol. 2000 (Suppl. 7), 935-939.
(3) Peattie, D. A.; Gilbert, W. Proc. Natl. Acad. Sci. U.S.A. 1980, 77, 4679-4682.
(4) Walker, T. A.; Johnson, K. D.; Olsen, G. J.; Peters, M. A.; Pace, N. R. Biochemistry 1982, 21, 2320-2329.
10.1021/ac0355045 CCC: $27.50 © 2004 American Chemical Society. Published on Web 04/03/2004
For example, alkylating probes are generally applied to obtain the footprint of proteins onto nucleic acids, or to determine the location of double-stranded vs single-stranded regions in the nucleic acid moiety.3,4 Bifunctional and photoactivated cross-linkers are commonly employed to reveal the exact position of molecular
contacts between the different components or to identify the tertiary interactions responsible for making adjacent domains fold together into the final form of a complex.5-8 Currently, established protocols for chemical probing rely heavily on polyacrylamide gel electrophoresis (PAGE) to obtain the position of modified nucleotides. These methods often require labeling of the nucleic acid substrates and probe-specific chemistry to induce strand cleavage prior to PAGE analysis. Not only are these protocols time- and labor-intensive, but they are also limited to the nucleic acid components of the complex: no equivalent procedures are available to obtain the location of modified residues on the proteic regions. In contrast, mass spectrometry (MS) enables the sequence characterization of covalent adducts of both nucleic acids and proteins with sensitivity, accuracy, and speed that equal or exceed those achieved by other established techniques. These favorable features constitute the driving force behind the development of new approaches combining chemical probing with MS detection for the investigation of 3D structures of proteins9-13 and their complexes with nucleic acids.14,15 In general, these approaches can follow either a bottom-up strategy, in which the probed species are hydrolyzed to obtain products of smaller size for mass mapping and tandem mass spectrometry,16 or a top-down strategy, in which probed samples are submitted directly to tandem mass spectrometry with no prior cleavage.17 In either case, sequencing experiments are carried out to correctly identify the location of probed residues on the structure of interest. 
The ability of gas-phase activation techniques to provide characteristic series of fragments, which are diagnostic of the precursor ion sequence, was realized early on and applied to singly as well as multiply charged oligonucleotides.18-25 Almost immediately, these methods were successfully employed to characterize nucleic acid adducts with carcinogens and drugs,26-33 or posttranscriptional modifications of natural RNA.34,35 In this report, we show that high-resolution tandem mass spectrometry on a Fourier transform mass spectrometer (FTMS)36,37 can be employed in a top-down approach to characterize oligonucleotides modified by solvent accessibility probes that are commonly utilized in structural investigations of protein-nucleic acid complexes. This technique allowed us to investigate the base specificity and reactivity of the different probes under various conditions optimized for direct mass spectrometric analysis.
(5) Krol, A.; Carbon, P. Methods Enzymol. 1989, 180, 212-227.
(6) Ji, T. H.; Ji, I. Pharmacol. Ther. 1989, 43, 421-432.
(7) Mattson, G.; Conklin, E.; Desai, S.; Nielander, G.; Savage, M. D.; Morgensen, S. Mol. Biol. Rep. 1993, 17, 167-183.
(8) Wilms, C.; Noah, J. W.; Zhong, D.; Wollenzien, P. RNA 1997, 3, 602-612.
(9) Young, M. M.; Tang, N.; Hempel, J. C.; Oshiro, C. M.; Taylor, E. W.; Kuntz, I. D.; Gibson, B. W.; Dollinger, G. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 5802-5806.
(10) Kruppa, G. H.; Schoeniger, J.; Young, M. M. Rapid Commun. Mass Spectrom. 2003, 17, 155-162.
(11) Pearson, K. M.; Pannell, L. K.; Fales, H. M. Rapid Commun. Mass Spectrom. 2002, 16, 149-159.
(12) Back, J. W.; de Jong, L.; Muijsers, A. O.; de Koster, C. G. J. Mol. Biol. 2003, 331, 303-313.
(13) Lanman, J.; Lam, T. T.; Barnes, S.; Sakalian, M.; Emmett, M. R.; Marshall, A. G.; Prevelige, P. E., Jr. J. Mol. Biol. 2003, 325, 759-772.
(14) Yu, E.; Fabris, D. J. Mol. Biol. 2003, 330, 211-223.
(15) Kvaratskhelia, M.; Miller, J. T.; Budihas, S. R.; Pannell, L. K.; Le Grice, S. F. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 15988-15993.
(16) McLafferty, F. W. Science 1981, 214, 280-287.
(17) Kelleher, N. L.; Lin, H. Y.; Valaskovic, G. A.; Aaserud, D. J.; Fridriksson, E. K.; McLafferty, F. W. J. Am. Chem. Soc. 1999, 121, 806-812.
Although a large collection of software tools has been introduced over the years to assist in the prediction and interpretation of peptide fragmentation for protein characterization, to the best of our knowledge, the Mongo Oligo Mass Calculator (http://medlib.med.utah.edu/masspec/) developed by McCloskey’s group and its SOS extension38 appear to be the only publicly available tools designed specifically for nucleic acids and their covalent adducts. Furthermore, there are no programs that can cover the fragmentation of both peptides and nucleic acids in an integrated platform. In preparation for the utilization of chemical probes in the investigation of large protein-nucleic acid complexes, we introduce here a new program called MS2Links. Based on the software MS2Assign, which was proposed earlier for automated interpretation of tandem mass spectra of cross-linked peptides,39 the algorithm included in MS2Links has been extended to perform the rapid data reduction and interpretation of high-resolution tandem mass spectra of native and chemically modified oligonucleotides.
(18) Cerny, R. L.; Tomer, K. B.; Gross, M. L.; Grotjahn, L. Anal. Biochem. 1987, 165, 175-182.
(19) McLuckey, S. A.; Van Berkel, G. J.; Glish, G. L. J. Am. Soc. Mass Spectrom. 1992, 3, 60-70.
(20) McLuckey, S. A.; Habibi-Goudarzi, S. J. Am. Chem. Soc. 1993, 115, 12085-12095.
(21) Little, D. P.; Chorush, R. A.; Speir, J. P.; Senko, M. W.; Kelleher, N. L.; McLafferty, F. W. J. Am. Chem. Soc. 1994, 116, 4893-4897.
(22) Little, D. P.; Thannhauser, T. W.; McLafferty, F. W. Proc. Natl. Acad. Sci. U.S.A. 1995, 92, 2318-2322.
(23) Nordhoff, E.; Kirpekar, F.; Roepstorff, P. Mass Spectrom. Rev. 1996, 15, 67-138.
(24) Ni, J.; Pomerantz, C.; Rozenski, J.; Zhang, Y.; McCloskey, J. A. Anal. Chem. 1996, 68, 1989-1999.
(25) Limbach, P. A. Mass Spectrom. Rev. 1996, 15, 297-336.
(26) Barry, J. P.; Vouros, P.; Van Schepdael, A.; Law, S.-J. J. Mass Spectrom. 1995, 30, 993-1006.
(27) Wickham, G.; Iannitti, P.; Boschenok, J.; Sheil, M. M. FEBS Lett. 1995, 360, 231-234.
(28) Deforce, D. L.; Ryniers, F. P.; van den Eeckhout, E. G.; Lemiere, F.; Esmans, E. L. Anal. Chem. 1996, 68, 3575-3584.
(29) Chaudhary, A. K.; Reddy, G. R.; Blair, I. A.; Marnett, L. J. Carcinogenesis 1996, 17, 1167-1170.
(30) Iannitti, P.; Sheil, M. M.; Wickham, G. J. Am. Chem. Soc. 1997, 119, 1490-1491.
(31) Marzilli, L. A.; Wang, D.; Kobertz, W. R.; Essigmann, J. M.; Vouros, P. J. Am. Soc. Mass Spectrom. 1998, 9, 676-682.
(32) Ni, J.; Liu, T.; Kolbanovskiy, J.; Krzeminski, J.; Amin, S.; Geacintov, N. E. Anal. Biochem. 1998, 264, 222-229.
(33) Glover, R. P.; Lamb, J. H.; Farmer, P. B. Rapid Commun. Mass Spectrom. 1998, 12, 368-372.
(34) Kowalak, J. A.; Pomerantz, S. C.; Crain, P. F.; McCloskey, J. A. Nucleic Acids Res. 1993, 21, 4577-4585.
(35) McCloskey, J. A.; Graham, D. E.; Zhou, S.; Crain, P. F.; Ibba, M.; Konisky, J.; Soll, D.; Olsen, G. J. Nucleic Acids Res. 2001, 29, 4699-4706.
(36) Comisarow, M. B.; Marshall, A. G. Chem. Phys. Lett. 1974, 25, 282-283.
(37) Hendrickson, C. L.; Emmett, M. R.; Marshall, A. G. Annu. Rev. Phys. Chem. 1999, 50, 517.
(38) Rozenski, J.; McCloskey, J. A. J. Am. Soc. Mass Spectrom. 2002, 13, 200-203.
(39) Schilling, B.; Row, R. H.; Gibson, B. W.; Guo, X.; Young, M. M. J. Am. Soc. Mass Spectrom. 2003, 14, 834-850.

EXPERIMENTAL SECTION
Materials. DNA oligonucleotides were purchased from the W. M. Keck Foundation Biotechnology Resource Laboratory at Yale University (New Haven, CT), and RNA oligonucleotides were purchased from Dharmacon Research Inc. (La Fayette, CO). Extensive desalting was performed to reduce sodium cation adduction. Dimethyl sulfate (DMS) and 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT) were purchased from Sigma-Aldrich (St. Louis, MO) and used without further purification. Kethoxal (KT) was purchased from ICN Biomedicals, Inc. (Costa Mesa, CA) and used without further purification. All solvents were of HPLC grade, and RNase-free water was obtained by treatment of RO grade water through a Millipore Milli-Q (Billerica, MA) purification system.

Chemical Probing. Prior to probe application, the nucleic acid substrates were heat-renatured to ensure proper folding. Probing was performed according to established protocols; however, the ratio between probe and substrate, the reaction temperature, and the incubation time were varied to obtain subsaturating as well as saturating conditions.14 In general, alkali-containing buffers were replaced with ammonium equivalents to dispense with desalting and make the reaction products immediately amenable to direct infusion electrospray. Alternatively, the modified products were purified by ethanol precipitation overnight at -20 °C. In the case of DMS, ∼10 µg of substrate was dissolved in appropriate volumes of 50 mM NH4OAc (pH 7.0) to reach an ∼30 µM final concentration (e.g., ∼60 µL for Top17) and incubated with 1 µL of DMS at room temperature for 30 min. Reactions were quenched with 1.5 M NH4OAc and 1 M β-mercaptoethanol. The products were immediately precipitated by adding 3 vol of cold ethanol. Pellets were resuspended in appropriate volumes of 10 mM NH4OAc (pH 7.0) prior to mass spectrometry analysis. For the CMCT reaction, ∼10 µg of substrate was dissolved in appropriate volumes of 50 mM NH4OAc (pH 8.0) to a final ∼30 µM concentration and treated with 1 µL of CMCT (42 mg/mL) for 45 min at room temperature. No ethanol precipitation or other purification procedures were necessary for this reagent.
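The "appropriate volumes" quoted above follow from simple unit arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the molecular weight used for Top17 (5134.91) is the value listed for the unmodified construct in Table 2. Under those assumptions the result is roughly 65 µL, in line with the ∼60 µL quoted for Top17.

```python
def dissolution_volume_ul(mass_ug: float, mol_weight: float, conc_um: float) -> float:
    """Volume in µL that brings mass_ug of substrate to conc_um (µM).

    Hypothetical helper for illustration; not part of any published protocol.
    """
    amount_umol = mass_ug / mol_weight   # µg / (g/mol) = µmol
    volume_l = amount_umol / conc_um     # µmol / (µmol/L) = L
    return volume_l * 1e6                # L -> µL


# ~10 µg of Top17 (MW 5134.91, from Table 2) at a ~30 µM final concentration
top17_volume = dissolution_volume_ul(mass_ug=10.0, mol_weight=5134.91, conc_um=30.0)
print(f"{top17_volume:.1f} µL")
```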
The kethoxal reaction was carried out by treating 10 µg of substrate in 50 mM (NH4)3BO3 (pH 7.0) to a final ∼30 µM concentration with 2 µL of freshly prepared reagent (20 mg/mL in 20% ethanol) for 45 min at room temperature. Also in this case, no further purification was necessary.

Mass Spectrometry. All experiments were performed on a Bruker Apex III (Billerica, MA) 7.0 T Fourier transform ion cyclotron resonance mass spectrometer (FT-ICR MS) equipped with an Apollo electrospray ion source and a home-built heated metal capillary interface. Samples were analyzed in negative ion mode with a desolvation temperature of 150 °C. Reaction mixtures were resuspended or diluted in a solution of 10 mM ammonium acetate (pH 7.0) and 10% isopropyl alcohol to provide a final analyte concentration of ∼10 to 20 µM. Analyte solutions were infused directly into the mass spectrometer by a syringe pump at a flow rate of 2 µL/min. For tandem experiments, precursor ions of interest were isolated in the ICR cell using correlated rf sweeps (CHEF),40 followed by activation through sustained off-resonance irradiation (SORI).41-43 Initially, frequency offsets above and below the frequency of the selected precursor ion were explored to avoid possible “blind spots” in the product spectra.41 Optimal results were provided by irradiation frequencies that were typically 600-1500 Hz below that of the precursor ion. Argon was used as collision gas in all experiments. Selection of first-generation
products and further activation in consecutive fragmentation procedures (MSn)44 were performed in the same fashion. Thirty-five to 50 scans were typically averaged for each spectrum.

Data Reduction and Interpretation. MS2Links was developed as an extension of the previously introduced MS2Assign software,39 which was expanded to cover the fragmentation of native and modified nucleic acids. Prior to assignment by MS2Links, a data reduction module was used to filter the raw data obtained by a high-resolution mass spectrometer to retain only the signals with plausible isotopic patterns. Using the standard data processing software, which is similar to that included in any of the various manufacturers’ platforms, users can set an intensity threshold to discard signals that are considered too weak for further processing. Starting from this selected data set, the data reduction module can further discriminate the actual signal from any noise that may be above the initial intensity threshold (e.g., spikes or aliases) by searching for possible M + 1 and M + 2 isotopes with m/z spacing compatible with the original charge state of the precursor ion. The intensity of each putative isotopic signal is checked against a theoretical intensity calculated from a typical isotopic distribution of a peptide with the same size, on the basis of the mass of the hypothetical average amino acid known as “averagine”.45 The error limits (tolerance) employed for the intensity check are quite large to allow for distortion of the isotopic envelope by the SORI-CID process so that the use of a protein-based averagine does not result in rejection of any legitimate peaks from the data set.
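The search for M + 1 and M + 2 partners described above can be sketched as follows. This is a minimal illustration and not the MS2Links code: the function names, the tolerance, and the choice to try charge states from the precursor charge downward are assumptions, and the averagine-based intensity check is omitted entirely.

```python
C13_C12 = 1.003355  # Da, mass difference between 13C and 12C
PROTON = 1.007276   # Da, mass of a proton


def find_isotope_clusters(peaks, max_charge, tol=0.005):
    """Return (m/z, charge) for peaks that have M + 1 and M + 2 partners.

    peaks: list of (m/z, intensity) tuples already above the intensity threshold.
    max_charge: absolute charge of the precursor ion (upper limit for fragments).
    """
    mzs = sorted(mz for mz, _ in peaks)
    used = set()  # peaks already claimed as isotopes of a lighter peak

    def partner(target):
        for mz in mzs:
            if abs(mz - target) <= tol:
                return mz
        return None

    clusters = []
    for mz in mzs:
        if mz in used:
            continue
        # Highest charge first, so a z = 2 envelope is not misread as z = 1.
        for z in range(max_charge, 0, -1):
            step = C13_C12 / z
            m1 = partner(mz + step)
            m2 = partner(mz + 2 * step)
            if m1 is not None and m2 is not None:
                clusters.append((mz, z))
                used.update((m1, m2))
                break
    return clusters


def neutral_monoisotopic_mass(mz, z):
    """Neutral mass of an [M - zH]z- ion detected in negative ion mode."""
    return z * (mz + PROTON)


# Synthetic example: one triply charged envelope plus an isolated noise spike.
peaks = [(1000.0000, 50.0), (1000.3345, 80.0), (1000.6689, 60.0), (1200.5000, 70.0)]
clusters = find_isotope_clusters(peaks, max_charge=4)
print(clusters)
```

The isolated spike at m/z 1200.5 has no correctly spaced partners and is discarded, whereas the envelope at m/z 1000 is retained with its inferred charge state.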
We are, however, in the process of calculating an analogous “averagebasine” (reflecting the mass and isotopic distribution of the hypothetical average nucleotide) to be used by the macro in place of averagine, which may provide a closer fit to data obtained by MS/MS methods where the isotope distributions are more reliable (e.g., Q-FTMS). At this point, a table containing the actual monoisotopic masses of each fragment is calculated taking into account the charge state inferred from the respective isotopic spacing. It should be noted that this data reduction macro uses concepts taken from other procedures developed for the automated assignment of complex multiply charged spectra, including the THRASH46 and Zscore47 algorithms. However, the data reduction module does not try to assign a most probable monoisotopic mass based on a detailed fit of the isotopic distribution. Instead, the macro indicates that either the monoisotopic mass was certainly observed (predicted monoisotopic intensity should have been >3 times the noise level), possibly observed (monoisotopic intensity between 1 and 3 times the noise level), or definitely not observed (predicted monoisotopic intensity less than the noise level). Only the monoisotopic masses falling in the first category are selected for further processing by MS2Links. The fragment identification module MS2Links requires the input of the known sequence of the nucleic acid construct under investigation and a user-defined set of monoisotopic incremental masses corresponding to the applied probe(s) or cross-linking reagents. Additional required information includes the base
(40) de Koning, L. J.; Nibbering, N. M. M.; van Orden, S. L.; Laukien, F. H. Int. J. Mass Spectrom. Ion Processes 1997, 165/166, 209-219.
(41) Senko, M. W.; Speir, J. P.; McLafferty, F. W. Anal. Chem. 1994, 66, 2801-2808.
(42) Shukla, A. K.; Futrell, J. H. J. Mass Spectrom. 2000, 35, 1069-1090.
(43) Gauthier, J. W.; Trautman, T. R.; Jacobson, D. B. Anal. Chim. Acta 1991, 246, 211-225.
(44) Solouki, T.; Pasa-Tolic, L.; Jackson, G. S.; Guan, S.; Marshall, A. G. Anal. Chem. 1996, 68, 3718-3725.
(45) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229-233.
(46) Horn, D. M.; Zubarev, R. A.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 2000, 11, 320-332.
(47) Zhang, Z.; Marshall, A. G. J. Am. Soc. Mass Spectrom. 1998, 9, 225-233.
Scheme 1
specificity of the probe and the maximal number of adducts per fragment to be calculated from the mass of the precursor ion. The program then calculates a combinatorial fragment library consisting of the expected masses for the classic ion series observed in tandem spectra of nucleic acids,19,20,23,24 including the 5′-series (a - B) and (d - H2O), the 3′-series w and y, and possible internal fragments, along with related peaks arising from the loss of water, user-defined modifications and cross-linking reagents. The final peak assignment is performed by matching the observed masses provided by the data reduction module with those of the fragments predicted from the substrate sequence within the error limits specified by the user. If the output from the data reduction module indicates that the true monoisotopic mass was between 1 and 3 times the noise level, meaning it may not have been observed, the MS2Links assignment module also matches a mass lower by the difference between 13C and 12C, and includes this in the output with a special annotation. For cases in which the expected monoisotopic intensity is below the noise level, only the mass lower by a difference between 13C and 12C is matched. The emphasis in this procedure is not to miss any possible assignments because the software makes an incorrect fit to an isotopic distribution, but rather to assign all reasonable possibilities and to let the user decide on the most reasonable assignment on the basis of the appearance of the peak and the assignments themselves. This may mean that for weaker peaks, the process may not be fully automated and will require some level of intervention on the part of the user for a reliable assignment. The importance of such weak peaks cannot be discounted, because they often provide key information about the position of a modification or offer additional redundancy, which can greatly increase the level of confidence in the final interpretation. 
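The matching step, including the fallback to a mass lower by the 13C/12C difference, might be sketched as below. This is a hedged illustration rather than the MS2Links implementation: the function names, the ppm tolerance, and the two library entries (with made-up masses) are hypothetical.

```python
C13_C12 = 1.003355  # Da, mass difference between 13C and 12C


def classify_monoisotopic(predicted_intensity, noise_level):
    """Three-way classification of the predicted monoisotopic signal."""
    ratio = predicted_intensity / noise_level
    if ratio > 3:
        return "observed"
    if ratio >= 1:
        return "possible"
    return "not observed"


def assign(observed_mass, status, library, tol_ppm=10.0):
    """Match one reduced mass against a fragment library.

    library: dict mapping a fragment label to its theoretical monoisotopic mass.
    Returns (label, annotation) pairs; shifted matches carry a special annotation.
    """
    if status == "observed":
        candidates = [(observed_mass, "")]
    elif status == "possible":
        candidates = [(observed_mass, ""), (observed_mass - C13_C12, "13C-shifted")]
    else:
        candidates = [(observed_mass - C13_C12, "13C-shifted")]

    hits = []
    for mass, note in candidates:
        for label, theoretical in library.items():
            if abs(mass - theoretical) / theoretical * 1e6 <= tol_ppm:
                hits.append((label, note))
    return hits


# Hypothetical library entries with made-up masses, for illustration only.
library = {"fragment_A": 1500.2500, "fragment_B": 1499.246645}
print(assign(1500.2500, "possible", library))
```

Returning every match, rather than only the best one, mirrors the stated emphasis on listing all reasonable possibilities and leaving the final choice to the user.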
(48) Christiansen, J.; Garrett, R. Methods Enzymol. 1988, 164, 456-468.

RESULTS AND DISCUSSION
Probe Application. Chemical probes selected for this study included dimethyl sulfate (DMS), 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT), and β-ethoxy-α-ketobutyraldehyde (kethoxal, KT) (Scheme 1). The N7-guanine is highly susceptible to methylation by DMS in the absence of steric hindrance from base stacking or tertiary interactions.3,5 DMS also induces methylation of N1-adenine and N3-cytidine wherever such positions are not protected by base-pairing or protein binding,3,5 but with considerably lower affinity than for N7-guanine. Kethoxal (N1- and N2-guanine) and CMCT (N3-uridine or thymine and N1-guanine) are generally used in complementary roles to complete the full spectrum of information on base-pairing and steric protection for the four ribonucleotides.48 In general, these reagents are widely employed to discriminate single-stranded vs double-stranded regions and obtain the footprint of ligands onto nucleic acid substrates. Following their application to the structure
of interest, the established PAGE strategies require probe-specific treatments to induce strand scission at the modified nucleotide, thus revealing the sequence position of alkylated/cleaved bases. However, we have recently demonstrated that a common hydrolytic procedure can be employed with any of the three reagents when ESI-FTMS is included in a bottom-up approach, which involves mass mapping and, if necessary, sequencing of the modified RNA products.14 Replacing PAGE with ESI-FTMS enables us to immediately assess the extent of alkylation over the entire substrate before any hydrolysis or sequencing procedure is carried out, thus facilitating the optimization of the conditions used for probe application. A summary of the constructs employed in this study and the products observed upon treatment with solvent-accessibility reagents is provided in Table 1. With the exception of kethoxal, which is strictly G-specific, solvent accessibility probes present variable degrees of reactivity for the different nucleotides; thus, the ratio between substrate and reagent could be carefully tuned to selectively hit only the most reactive targets. Consistent with a U > G relative scale of reactivity, CMCT was found to readily alkylate the two uridines present in the unstructured control uRNA (see the Experimental Section) when a 1:5 substrate-to-reagent ratio was applied.14 However, significantly larger amounts of CMCT (up to 1:30) induced the modification of all susceptible bases in the construct, which includes two Us and three Gs. The number of alkylated nucleotides is dependent not only on intrinsic chemical reactivity, but also on the degree of protection of target nucleotides in the structural context of a substrate, which is the purpose of these solvent accessibility probes.
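The upper limit on the number of adducts follows directly from the base specificity of each probe. As a small illustration, assuming only the specificities stated in the text (DMS: G, A, C; CMCT: U or T, G; kethoxal: G) and using hypothetical names, the susceptible bases in a sequence can be counted as follows. The count ignores any structural protection, so it is a ceiling rather than a prediction.

```python
# Base specificity of each probe as stated in the text (no reactivity ranking).
PROBE_TARGETS = {"DMS": "GAC", "CMCT": "UTG", "KT": "G"}


def count_susceptible(sequence: str, probe: str) -> int:
    """Count bases in `sequence` that the given probe can modify in principle."""
    seq = sequence.replace(" ", "").upper()
    return sum(seq.count(base) for base in PROBE_TARGETS[probe])


urna = "CAG UCA GCU CAG"  # unstructured control uRNA
for probe in PROBE_TARGETS:
    print(probe, count_susceptible(urna, probe))
```

For uRNA, the CMCT count of five corresponds to the two Us and three Gs mentioned above.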
Monitoring the number of hits as a function of reagent concentration should also serve to test whether initial alkylation may cause partial defolding of the target structure, which could expose new (secondary) sites that were previously protected. The effects of defolding on reactivity patterns can be immediately detected by ESI-FTMS, as shown by the comparison between the reaction products observed after applying CMCT to HIV-1 stem-loop 2 (SL2) at room temperature or at 85 °C (Figure 1). A maximum of four adducts were detected at room temperature (even after protracted incubation to ensure completion), which is consistent with the presence of four susceptible sites on the single-stranded loop of the hairpin structure (one U and three Gs). This number increased significantly at higher temperature, as expected from the melting of the double-stranded stem and the consequent loss of steric protection. Finally, it should be also pointed out that the gentle nature of electrospray ionization enables the direct observation of the effects of structural probes on the state of association of noncovalent complexes. In previous work, ESI-FTMS was successfully used to follow the reaction of either CMCT or kethoxal with a 1:1 complex formed by HIV-1 nucleocapsid protein and RNA stem-loop 2 (SL2),14 showing that the binding interactions induced complete protection of the loop nucleotides, which were efficiently alkylated in the absence of protein under like conditions. These experiments allowed us to conclude also that the addition of the probes in solution did not induce the dissociation of the noncovalent complex, which was detected intact by ESI-FTMS.

Top-Down Identification of Probed Sites. The position of modified bases was obtained by a top-down approach, which
Table 1. Summary of Nucleic Acid Constructs Employed in This Study a

                                              adducts detected by ESI-FTMS b
name      sequence (5′-3′)                    DMS c     CMCT d    KT e
uRNA      CAG UCA GCU CAG                     6 (10)    3 (5)     2 (2)
Top17     TAA TAC GAC TCA CTA TA              3         2         1 (1)
Comp17    ATT ATG CTG AGT GAT AT              3         4         4 (4)
SL2 RNA   GGC GAC UGG UGA GUA CGC C           4 (8)     4 (8)     3 (8)
SL3 RNA   GGA CUA GCG GAG GCU AGU CC          4 (7)     3         2
SL4 RNA   GGA GGU GCG AGA GCG UCU CC          6 (9)     2         1
SK7       TTT TGA CTT TTT AAA GAC ATA         2 (10)    3 (9)     2 (2)
SK8       TAT GTC TTT AAA AAG TCA AAT A       2 (11)    4 (10)    2 (2)
a Probe application was performed under different conditions to verify the base specificity of the different reagents and to recognize possible denaturing effects induced by covalent modification of the initial substrate (see the Experimental Section). b Numbers in parentheses indicate adducts observed after significantly longer incubation time (see the Experimental Section). c DMS relative reactivity scale: G > A > C; incremental monoisotopic mass 14.0087 Da. d CMCT relative reactivity scale: U(T) > G; incremental monoisotopic mass 251.1992 Da. e KT: G-specific; incremental monoisotopic mass 130.0630 Da.
Figure 1. Effects of heat denaturation on the base-protection pattern in HIV-1 stem-loop 2 (SL2). (a) ESI-FTMS spectrum of the substrate treated with CMCT at room temperature (see Experimental Section). (b) ESI-FTMS spectrum of reaction carried out at 85 °C under otherwise identical conditions. Both reactions were allowed to proceed until no further increase in the number of hits was detected.
dispenses with strand hydrolysis and calls for the direct gas-phase activation by SORI-CID. For example, isolation and activation of the precursor ion corresponding to the CMCT monoadduct of Top17 (see the Experimental Section) provided abundant ion series, which were used to verify the sequence of the precursor ion and infer the position of modified bases (Figure 2). The majority of structurally informative fragments yielded by this experiment consisted of (a - B)- and w-type ions, according to the accepted nomenclature;19,20 however, a large number of internal fragments were also observed. These nearly complete sequence series and several internal fragments (CMCT-modified [6-10] and [7-11], in particular) served to identify the sites of alkylation with T4, T10, and T14. The observation of multiple position isomers for the monoadduct species, which are isobaric, is not surprising in light of the fact that Top17 includes a fairly large number of susceptible bases (i.e., five Ts and one G, although the latter is not expected to compete for alkylation under conditions optimized for monoadduct formation)14 and that the
construct does not assume any stable 3D structure that could result in significant protection of specific sites. It should be noted also that the large variety of fragments obtained from multiple monoadduct isomers manifests itself as partially overlapping ion series, which would make data interpretation very arduous without adequate software aids. Adduct stability plays a very important role in the success of a top-down approach. In this study, CMCT adducts proved to be very stable under the conditions employed for gas-phase activation. In fact, CMCT-modified oligonucleotides (Table 2) followed the typical fragmentation patterns. A characteristic mass shift corresponding to the incremental mass of the reagent (251.1992 Da) could be observed for specific sequence ions consistent with the position of the modified bases. On the contrary, kethoxal adducts were found to dissociate readily upon activation to regenerate the initial unmodified substrate, thus losing any “memory” of the location of probed sites. While further investigation will be necessary to elucidate the causes and hopefully avoid
Figure 2. SORI-CID of the CMCT monoadduct of Top17 (precursor ion m/z 1345.47, charge state 4-). The included structure provides a summary of the most abundant ion series, which reveal the sequence position of the modified nucleotides. Symbols in the figure mark (a - B) ions, w ions, and fragments that include the incremental mass of CMCT (251.1992 Da).

Table 2. Summary of Constructs/Adducts Investigated by SORI-CID in the Top-Down Approach

name                   sequence                         MW
uRNA                   CAGUCAGCUCAG                     3792.56
uRNA, DMS (mono-)                                       3806.58
uRNA, DMS (tri-)                                        3834.61
Top17                  TAA TAC GAC TCA CTA TA           5134.91
Top17, DMS (mono-)                                      5148.92
Top17, CMCT (mono-)                                     5386.11
Top17, KT (mono-)                                       5263.96
SL2                    GGC GAC UGG UGA GUA CGC C        6128.89
SL2, DMS (mono-)                                        6142.91
SL2, CMCT (mono-)                                       6380.09
SL3                    GGA CUA GCG GAG GCU AGU CC       6457.91
SL3, DMS (mono-)                                        6471.93
SK8                    TAT GTC TTT AAA AAG TCA AAA T    6737.18
SK8, DMS (mono-)                                        6751.19
SK8, CMCT (mono-)                                       6988.38
SK8, KT (mono-)                                         6866.23
a complete loss of the modifying group, this effect may constitute a major limitation for the application of kethoxal in a top-down strategy based on SORI-CID. Fortunately, kethoxal adducts have proved to be very stable to the nuclease digestion and mass mapping procedures employed in our previous bottom-up investigation.14 In the case of DMS, alkylation of N1-adenine and N3-cytidine resulted in stable adducts, but methylation of N7-guanine induced the facile gas-phase cleavage of the N-glycosidic bond between N9 and C1′. In fact, the loss of methylguanine was the predominant process observed in the SORI-CID spectrum of the triply methylated substrate Me3-uRNA (Figure 3), which showed also a reduced number of characteristic sequence ions. Although the loss of nucleobases has been occasionally observed in CID spectra of unmodified oligonucleotides with intensities that are comparable to those of regular series ions,23 the cleavage of methylguanine appears here to be much more favorable than the fragmentation of the nucleic acid phosphodiester chain. In light of the strict specificity for methylguanine, this process could be compared to the elimination of N7-modified guanine in solution, which constitutes one of the mechanisms of nucleic acid damage induced by carcinogens.49 Although incomplete, the fragmentation pattern confirmed that no significant methylation of A or C was induced
Figure 3. SORI-CID of Me3-uRNA (precursor ion m/z 1277.62, charge state 3-) with interpretation of the observed fragmentation pattern. The ion series shown are a - B, w, and d - H2O. Flagged fragments either include the methyl incremental mass (14.0087 Da) or contain an abasic site, which is formed by elimination of methyl-G independent of the strand cleavage mechanism typical of a - B fragments.
by the selected reaction conditions, which resulted only in guanine alkylation (Figure 3). These results are consistent with the known relative scale of reactivity toward DMS.3,5 In some instances, when the loss of modified base is the predominant event and backbone fragmentation is not nearly as favorable, the information provided by a single stage of SORI-CID may not be sufficient to infer the sequence position of the adduct. Fortunately, FTMS offers the possibility of performing consecutive steps of ion isolation and activation in MSn experiments, which can be employed to obtain the missing information. For example, the initial activation of the monomethylated RNA hairpin SL3 from HIV-1 resulted in the observation of abundant loss of the modified base, but very sparse backbone fragmentation (MS2, Figure 4a). Selection of the first-generation product with loss of methylguanine and subsequent SORI activation produced more extensive fragmentation of the oligonucleotide backbone (MS3, Figure 4b), although still far from the abundant fragmentation observed for nonmodified SL3. Isolation and activation of a second-generation product, which corresponds to the loss of a nonmodified guanine in the second stage (MS3), showed that loss of base was no longer the predominant event, and the more informative backbone fragments were detected with comparable intensities (MS4, Figure 4c). Similar experiments could be performed on different first- or second-generation products following alternative schemes (e.g., (M → M - MeG) → (M - MeG - G) vs (M → M - MeG - G) → (y19 - G), etc.) to achieve the desired sequence coverage. In the case of monomethyl-SL3, the alkylation pattern was found to be consistent with possible alkylation of the exposed guanines in the tetraloop region of the stem-loop structure. 
As expected, base-pairing protected the nucleotides located in the double-stranded stem, save for the 5′ guanine (G1), which could become exposed during transient “breathing” of the stem structure.

MS2Links for Automated Data Reduction and Interpretation. As observed in this investigation of nucleic acids treated with structural probes, the complexity of tandem mass spectra is significantly increased by the possible presence of multiple

(49) Burrows, C. J.; Muller, J. C. Chem. Rev. 1998, 98, 1109-1151.
Analytical Chemistry, Vol. 76, No. 9, May 1, 2004
Figure 5. SORI-CID of Me-SK8 (precursor ion m/z 1683.24, charge state 4-). Only the signals for which a 13C isotope could be detected were selected for subsequent data interpretation; the others were filtered out by the data reduction module (e.g., peaks labeled with an asterisk (*) in the inset). The threshold marker in the inset indicates the initial noise threshold, which was set using the standard software for data processing.

Figure 4. Multiple stages of SORI-CID for the characterization of the monomethylated adduct of HIV-1 stem-loop 3 (SL3): (a) products obtained by isolation/activation of monomethyl SL3 (MS2, precursor ion m/z 1617.02); (b) isolation/activation of the first-generation fragment produced by loss of methylguanine (MS3, m/z 1617.02 → m/z 1576.55 → products); (c) isolation/activation of the second-generation fragment induced by consecutive losses of methylguanine and guanine (MS4, m/z 1617.02 → m/z 1576.55 → m/z 1538.70 → products). All precursor ions selected at each stage had a 4- charge state.
position isomers induced by the alkylation of alternative susceptible sites under nonsaturating conditions. Although ion populations selected as the precursors for gas-phase activation present homogeneous elemental compositions, their structural heterogeneity translates into partially overlapping sequence series that may be very difficult to assign. The application of robust algorithms for data reduction and interpretation could not only expedite the processing of the experimental results, but would also reduce the incidence of erroneous assignments and help to highlight the presence of patterns that may escape expert visual inspection. This need was the driving force behind the development of MS2Links, which was designed for both modified and unmodified oligonucleotides and tested with the samples listed in Table 2. The data reduction function has proven to be particularly helpful in discriminating actual signals from noise spikes, thus allowing for the assignment of particularly weak signals, which could be easily overlooked by normal visual processing. The principle can be illustrated by examining the SORI-CID spectrum of monomethylated SK8, which displays a typical combination of signals with very different intensities (Figure 5). In the enlarged section, a handful of peaks are shown to exceed the initial noise threshold set by using the standard software provided by the manufacturer for data processing (indicated by the threshold marker in the inset). However, only one of them was found to have a corresponding putative 13C isotope, which is not observed for the signals marked with asterisks. Consequently, only the former was used to search the list of predicted fragments generated by the fragment prediction module, and the others were discarded. The ability to
include low-abundance fragments in the interpretation of complex tandem mass spectra results in a greater degree of information redundancy, which in turn increases the level of confidence in the assigned structures. The fragment prediction module has proven to be rugged: for any given precursor, the list of possible products including the expected ion series was found to be in full agreement with that provided by Mongo for the same species. In addition to the expected sequence products, the MS2Links ion product library included all ions expected for fragments modified by the set of user-defined probes. Matching the peaks selected during data reduction with the predicted fragments can be performed with a user-controlled level of tolerance, which can take full advantage of the accurate mass measurements achievable by FTMS.

CONCLUSIONS

Solvent accessibility probes can be employed in a top-down strategy based on high-resolution tandem mass spectrometry and designed to obtain information about base pairing and steric protection in nucleic acid structures or their protein complexes. ESI-FTMS constitutes a very powerful tool for the optimization of probe application necessary to avoid possible artifacts due to excessive alkylation and structural distortion. Such nonsaturating conditions are necessarily bound to produce complex isomeric mixtures,14 which can be effectively investigated by the top-down approach. The direct gas-phase fragmentation of modified substrates can offer specific and unambiguous information concerning the position of isomeric adducts, with no need for hydrolysis and mass mapping. This feature is particularly advantageous considering that stable secondary structures or the presence of modified bases is a frequent cause of loss of activity by the majority of the nucleases, which are generally employed as cleavage reagents in bottom-up strategies. 
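The two MS2Links steps described in the text (retaining only signals that show a putative 13C isotope, then matching the retained signals to predicted fragments within a user-controlled tolerance) can be illustrated with a minimal sketch. This is not the MS2Links code: the function names, the default tolerances, the charge range, and the toy spectrum are all assumptions chosen for illustration, and the fragment labels are hypothetical.

```python
from dataclasses import dataclass

C13_C12_SPACING = 1.003355  # Da, mass difference between 13C and 12C


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float  # arbitrary units


def has_13c_partner(peak, peaks, charges=(4, 3, 2, 1), tol_mz=0.01):
    """True if some peak lies one 13C spacing above `peak` for any assumed charge."""
    for z in charges:
        expected = peak.mz + C13_C12_SPACING / z
        if any(abs(other.mz - expected) <= tol_mz for other in peaks):
            return True
    return False


def reduce_peaks(peaks, noise_threshold, **kwargs):
    """Keep peaks above the noise threshold that also show a putative 13C isotope."""
    return [
        p for p in peaks
        if p.intensity >= noise_threshold and has_13c_partner(p, peaks, **kwargs)
    ]


def match_fragments(peaks, predicted, tol_ppm=10.0):
    """Pair each predicted m/z with every observed peak within `tol_ppm`."""
    matches = []
    for label, predicted_mz in predicted.items():
        for p in peaks:
            if abs(p.mz - predicted_mz) / predicted_mz * 1e6 <= tol_ppm:
                matches.append((label, p))
    return matches


# Toy spectrum (assumed values): a real signal with its 13C partner at z = 4,
# an isolated noise spike above threshold, and a weak peak below threshold.
spectrum = [
    Peak(1000.0000, 50.0),
    Peak(1000.2508, 40.0),  # partner of the first peak; dropped itself, since no third isotope peak is listed
    Peak(1003.3000, 30.0),  # noise spike: no isotope partner
    Peak(1010.0000, 1.0),   # below the noise threshold
]
predicted = {
    "fragment A (hypothetical)": 1000.0040,
    "fragment B (hypothetical)": 1003.3000,
}

kept = reduce_peaks(spectrum, noise_threshold=10.0)
matches = match_fragments(kept, predicted, tol_ppm=10.0)
```

In this toy case the noise spike coincides exactly with a predicted m/z and would be assigned if matching were run on the raw peak list; the isotope filter removes it before matching, which is the kind of misassignment the data reduction step is meant to prevent.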
With the exception of kethoxal, covalent adducts produced by the different probes tested in this study were sufficiently stable under typical SORI-CID conditions to allow for the correct identification of the sequence location of susceptible sites. RNA
constructs with molecular mass in excess of 6000 Da provided excellent sequence coverage by the fragmentation process. Multiple consecutive stages of selection and gas-phase activation (MSn) can be employed to overcome the effects of the loss of modified bases and reduced backbone fragmentation, to the point where the desired level of coverage is reached. The MS2Links software proved to be a very useful tool for the interpretation of tandem spectra of increased complexity due to the possible presence of multiple position isomers among the reaction products. The ability to filter signals according to their putative isotopic patterns can greatly reduce ambiguities and misassignments, thus expediting the subsequent matching with predicted fragments. This feature is particularly desirable in the investigation of progressively larger constructs, in which the probability of error is greater. The fact that MS2Links also contains an algorithm developed for the interpretation of tandem mass spectra of cross-linked peptides (originally included in MS2Assign) will allow for the seamless handling of conjugated hybrids formed by cross-linking of proteins to nucleic acids in structural investigations of large macromolecular complexes. We plan to make the program available to the public as soon as possible.

ACKNOWLEDGMENT

D.F., K.K., and E.Y. thank the National Institutes of Health (R01-GM643208) for financial support. M.Y. and G.K.’s work was supported by the LDRD program at Sandia National Laboratories, which is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract DE-AC04-94AL85000.

Received for review December 18, 2003. Accepted February 24, 2004. AC0355045