Anal. Chem. 2009, 81, 3723–3730
De Novo Sequence Determination of Modified Oligonucleotides

Julie Farand* and Francis Gosselin

Department of Process Research, Merck Research Laboratories, 126 East Lincoln Avenue, Rahway, New Jersey 07065

We present the combined application of chemical and enzymatic digestions toward de novo sequence determination of a modified oligonucleotide. The unknown of interest, consisting of a random mixture of 2′-deoxy, 2′-fluoro, 2′-O-methyl, abasic and/or ribonucleotides, is a representative oligonucleotide used as a component of synthetic short interfering RNAs (siRNAs). The sequence is initially determined using chemical degradation, electrospray ionization (ESI), time-of-flight (TOF), and tandem (MS/MS) mass spectrometry. A nucleoside composition analysis is then performed via enzymatic digestion of the oligonucleotide to complement the chemical sequencing method. The identity and experimental count of each nucleoside within the oligonucleotide are determined by ultra performance liquid chromatography (UPLC) analysis. This additional analysis allows unambiguous sequence assignment of an unknown chemically modified oligonucleotide.

Proteins, peptides, and oligonucleotides can offer numerous analytical challenges rarely encountered with their small molecule counterparts. One such challenge involves the complete and unambiguous sequence determination of unknown biomolecules (de novo).
Oligonucleotides are rapidly gaining importance as target validation tools and are presently emerging as diagnostic and therapeutic agents.1,2 The incorporation of chemically modified nucleotides and modifications to the phosphate backbone have improved physical properties of oligonucleotides toward their use as novel therapeutic agents.3,4 As a result, conventional sequencing methodologies often fail to provide complete characterization data on such complex structures.5 To address this problem and provide a robust characterization method, we recently reported a novel sequence confirmation methodology for modified oligonucleotides.6 In brief, oligonucleotides are chemically degraded, and the resulting fragment masses are determined via electrospray ionization (ESI), time-of-flight (TOF), and ion-trap mass spectrometry (ITMS). The sequence is confirmed when all expected fragment masses, each sequentially separated by the mass of a known nucleotide, are observed by mass spectrometry. With the recognized success of this method, chemical/mass spectrometry sequencing was put to the test using a chemically modified oligonucleotide of unknown sequence.

Among the variety of sequencing tools, mass spectrometry (MS) is increasingly becoming the method of choice, particularly via energy or collision-induced dissociation (CID), because of its ability to rapidly generate valuable sequencing information according to mass differences and dissociation chemistry.7-12 However, the size and intrinsic structural nature of proteins, peptides, and nucleic acids can prevent this technique from yielding the entire sequence of an unknown. For example, sequencing of thymidine-containing oligonucleotides can be difficult because of the fragmentation inertness of thymidine, which in turn hinders phosphate backbone cleavage during MS/MS.13 Without instruments of high resolving power and high mass accuracy, MS/MS spectral interpretation of biomolecules is notoriously complex and time-consuming (e.g., ion charge state assignment, low signal intensity of fragments).14 Even with the use of high performance hybrid instruments, such as an ion-trap coupled with a Fourier-transform ion-cyclotron resonance (FTICR) mass spectrometer, de novo protein sequencing can require both the aid of computer algorithms and manual interpretation.15,16 Tremendous developments in bioinformatics have recently emerged to accelerate protein and peptide sequencing.17-19 However, such advances are scarce in the field of nucleic acids, and even more so for oligonucleotides containing chemically modified nucleotides.20-23 De novo sequence determination of highly modified oligonucleotides will likely be plagued with inherent limitations.24 To address these limitations and complement the chemical/MS sequencing method, a nucleoside composition analysis was developed to identify and quantify nucleosides within the unknown oligonucleotide.

Enzymatic digestion of nucleic acids into single nucleosides (or nucleotides) can give insight into nucleoside composition and the presence of modified nucleosides (e.g., C5 methylation of cytosine), and can be used as a method for DNA/RNA quantitation.25-27 Seminal work by Crain has demonstrated the use of nuclease P1 (endonuclease), followed by phosphodiesterase I (5′-exonuclease), to digest DNA/RNA and release their 5′-mononucleotides (Figure 1). A third incubation with alkaline phosphatase is then required to hydrolyze the 5′ phosphate to afford free nucleosides.

Figure 1. Phosphodiester hydrolysis using nuclease P1, phosphodiesterase I, and alkaline phosphatase.

Table 1. Natural and Chemically Modified Nucleosides Used During Oligonucleotide Synthesis

* To whom correspondence should be addressed. E-mail: julie_farand@merck.com.
10.1021/ac802452p CCC: $40.75 © 2009 American Chemical Society. Published on Web 04/21/2009. Analytical Chemistry, Vol. 81, No. 10, May 15, 2009.

(1) Opalinska, J. B.; Gewirtz, A. M. Nat. Rev. Drug Discovery 2002, 1, 503–514.
(2) Goodchild, J. Curr. Opin. Mol. Ther. 2004, 6, 120–128.
(3) Morrissey, D. V.; Zinnen, S. P.; Dickinson, B. A.; Jensen, K.; McSwiggen, J. A.; Vargeese, C.; Polisky, B. Pharm. Discovery 2005, 5, 16–20.
(4) Prakash, T. P.; Bhat, B. Curr. Top. Med. Chem. 2007, 7, 641–649.
(5) Limbach, P. A. Mass Spectrom. Rev. 1996, 15, 297–336.
(6) Farand, J.; Beverly, M. Anal. Chem. 2008, 80, 7414–7421.
(7) For a review of protein and peptide sequence determination, see Standing, K. G. Curr. Opin. Struct. Biol. 2003, 13, 595–601.
(8) McLuckey, S. A.; Berkel, G. J.; Glish, G. L. J. Am. Soc. Mass Spectrom. 1992, 3, 60–70.
(9) Ni, J.; Pomerantz, S. C.; Rozenski, J.; Zhang, Y.; McCloskey, J. A. Anal. Chem. 1996, 68, 1989–1999.
(10) Premstaller, A.; Huber, C. Rapid Commun. Mass Spectrom. 2001, 15, 1053–1060.
(11) Little, D. P.; Chorush, R. A.; Speir, J. P.; Senko, M. W.; Kelleher, N. L.; McLafferty, F. J. Am. Chem. Soc. 1994, 116, 4893–4897.
(12) Barry, J. P.; Vouros, P.; Schepdael, A. V.; Law, S.-J. J. Mass Spectrom. 1995, 30, 993–1006.
(13) Wan, K. X.; Gross, J.; Hillenkamp, F.; Gross, M. L. J. Am. Soc. Mass Spectrom. 2001, 12, 193–205.
(14) Hernandez, P.; Müller, M.; Appel, R. D. Mass Spectrom. Rev. 2006, 25, 235–254.
(15) Branca, R. M. M.; Bodó, G.; Bagyinka, C.; Prokai, L. J. Mass Spectrom. 2007, 42, 1569–1582.
(16) Zhang, W.; Krutchinsky, A. N.; Chait, B. T. J. Am. Soc. Mass Spectrom. 2003, 14, 1012–1021.
(17) Pevtsov, S.; Fedulova, I.; Mirzaei, H.; Buck, C.; Zhang, X. J. Proteome Res. 2006, 5, 3018–3028.
(18) Fernandez-De-Cossio, J.; Gonzalez, J.; Satomi, Y.; Shima, T.; Okumura, N.; Beseda, V.; Betancourt, L.; Padron, G.; Shimonishi, Y.; Takao, T. Electrophoresis 2000, 21, 1694–1699.
(19) Taylor, J. A.; Johnson, R. S. Anal. Chem. 2001, 73, 2594–2604.
(20) Oberacher, H.; Wellenzohn, B.; Huber, C. G. Anal. Chem. 2002, 74, 211–218.
(21) Oberacher, H.; Mayr, B. M.; Huber, C. G. J. Am. Soc. Mass Spectrom. 2004, 15, 32–42.
(22) Rozenski, J.; McCloskey, J. A. J. Am. Soc. Mass Spectrom. 2002, 13, 200–203.
(23) Meng, Z.; Limbach, P. A. Eur. J. Mass Spectrom. 2005, 11, 221–229.
(24) Identical molecular weights of ribo-adenosine and deoxy-guanosine (329.20594 u); increasing variety of chemically modified nucleotides with weights similar to other nucleotides (e.g., MW of 5′-phosphothiolated deoxythymidine = 320.25878 u vs MW of 2′-O-methyl uridine = 320.19258 u); 1 u mass difference between nucleotides uridine and cytosine.
(25) Crain, P. F. Methods Enzymol. 1990, 193, 782–790.
(26) Friso, S.; Choi, S.-W.; Dolnikowski, G. G.; Selhub, J. Anal. Chem. 2002, 74, 4526–4531.
(27) Shimelis, O.; Giese, R. W. J. Chromatogr. A 2006, 1117, 132–136.
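The near-isobaric pairs listed in footnote 24 can be put in perspective with a short calculation. The sketch below is an illustration only, not part of the published method: it takes the residue masses quoted in the footnote and estimates the relative mass accuracy, in parts per million, that would be needed to tell two residues apart on a fragment of a given mass. The function name `ppm_to_resolve` and the choice of 6912 u as the fragment mass (the average mass reported later in the article for strand 1) are assumptions made for the example.

```python
# Residue masses (u) as quoted in footnote 24 of the article.
PHOSPHOTHIOLATED_DT = 320.25878   # 5'-phosphothiolated deoxythymidine
OME_URIDINE = 320.19258           # 2'-O-methyl uridine
RIBO_ADENOSINE = 329.20594
DEOXY_GUANOSINE = 329.20594       # identical to ribo-adenosine


def ppm_to_resolve(mass_a: float, mass_b: float, fragment_mass: float) -> float:
    """Relative mass difference (ppm) that two residues produce on a fragment.

    A measurement must be at least this accurate to tell the two residues
    apart from the fragment mass alone. A result of 0 means they cannot be
    distinguished by mass at any accuracy.
    """
    return abs(mass_a - mass_b) / fragment_mass * 1e6


if __name__ == "__main__":
    fragment = 6912.0  # assumed fragment mass (u), taken from the strand 1 average mass
    print(ppm_to_resolve(PHOSPHOTHIOLATED_DT, OME_URIDINE, fragment))
    print(ppm_to_resolve(RIBO_ADENOSINE, DEOXY_GUANOSINE, fragment))
```

Under these assumptions the phosphothiolated deoxythymidine/2′-O-methyl uridine pair calls for a mass accuracy of a little under 10 ppm at 6912 u, while ribo-adenosine and deoxy-guanosine differ by 0 u and cannot be separated by mass alone, which is the case the complementary composition analysis is meant to cover.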
Many tailored versions of the method have been described using DNA and RNA.28-30 However, chemically modified oligonucleotides often resist enzymatic digestion.31-33 Optimal reaction conditions were therefore needed to ensure complete digestion of oligonucleotides containing an unknown mixture of 2′-deoxy, 2′-fluoro, 2′-O-methyl, abasic and/or ribonucleosides (Table 1). The identity and experimental count of each nucleoside found within the unknown oligonucleotide can be determined using commercially available nucleoside standards and UPLC analysis.34 The data are then cross-referenced to the proposed de novo sequence obtained via chemical/mass spectrometry sequencing to ensure the highest level of accurate assignment. Herein, the combination of chemical sequencing and nucleoside composition analysis is described as a powerful de novo sequence determination method for chemically modified oligonucleotides.

(28) Eadie, J. S.; McBride, L. J.; Efcavitch, J. W.; Hoff, L. B.; Cathcart, R. Anal. Biochem. 1987, 165, 442–447.
(29) Donald, C. E.; Stokes, P.; O’Connor, G.; Woolford, A. J. J. Chromatogr. B 2005, 817, 173–182.
(30) Quinlivan, E. P.; Gregory, J. F., III Anal. Biochem. 2008, 373, 383–385.
(31) Shaw, J.-P.; Kent, K.; Bird, J.; Fishback, J.; Froehler, B. Nucleic Acids Res. 1991, 19, 747–750.
(32) Kawasaki, A.; Casper, M. D.; Freier, S. M.; Lesnik, E. A.; Zounes, M. C.; Cummins, L. L.; Gonzalez, C.; Cook, P. D. J. Med. Chem. 1993, 36, 831–841.

EXPERIMENTAL SECTION

Chemical Synthesis and Purification of Oligonucleotides 1 and 2. Oligonucleotide 1, containing an unknown mixture of ribo-, deoxy-, 2′-fluoro, 2′-O-methyl and/or abasic nucleotides, was prepared for de novo sequencing and purified using previously published conditions.6 The final sequence was known only to the synthetic chemist. Oligonucleotide 2 was prepared as above and was used for method development of the base composition analysis. The sequence was known prior to enzymatic digestion.35

Sample Preparation for Chemical Sequencing/ESI-TOF and ITMS Analysis. The lyophilized oligonucleotide 1 was diluted in water and was analyzed by ESI-TOF MS to determine the molecular weight. A 100 µM solution of the unknown was prepared in water and was subjected to all chemical reactions,36 ESI-TOF MS and ITMS analysis.6 ITMS analysis was performed using a sample from the hydroxylamine/piperidine reaction to confirm the sequence of the N-21 fragment.

Enzyme Preparation for Base Composition Analysis. Nuclease P1. A solution of 20 mM aqueous NaOAc/50 mM aqueous NaCl/5 mM aqueous ZnCl2 (adjusted with 1% HOAc in water to pH 5.3) (400 µL) was added to a vial of nuclease P1 (Penicillium citrinum, ≥200 Units/mg, Sigma Aldrich). The solution was vortexed for 10 s. Phosphodiesterase I. A solution of 15 mM aqueous MgCl2/50 mM aqueous Tris (adjusted with 1 N aqueous NaOH to pH 8.9) (1 mL) was added to a vial of phosphodiesterase I (Crotalus adamanteus venom, USB Corp., ≥20 Units/mg). The solution was vortexed for 10 s. Alkaline Phosphatase. Water (980 µL) was added to alkaline phosphatase (calf intestine, Roche Diagnostics Corp., 1000 U).

General Method for Nucleoside Composition Analysis. Purified and lyophilized oligonucleotides 1 and 2 (∼300 µg) were directly weighed in a 1.5 mL Eppendorf tube. A solution of nuclease P1 (40 µL) was added; the solutions were vortexed and then heated at 70 °C for 2 h. The solutions were concentrated to dryness using a Savant SpeedVac (∼30 min at 65 °C). The samples were reconstituted in 15 mM aqueous MgCl2/50 mM aqueous Tris, pH 8.9 (160 µL).
Phosphodiesterase I (40 µL) and alkaline phosphatase (40 µL) solutions were added to each Eppendorf tube; the samples were vortexed and heated at 45 °C for 2 h. The solutions were then directly transferred to a UPLC vial. Note: total reaction volume used for calculations is 240 µL. Oligonucleotides containing abasic nucleotides (e.g., 1) must be incubated at room temperature for an additional 18 h prior to UPLC analysis. This step ensures complete digestion of the oligonucleotide.

Nucleoside Standards and UPLC Analysis. All nucleosides used during oligonucleotide synthesis were accurately weighed (∼10 mg, ChemGenes) and diluted to 100 mL in water to obtain a final concentration of ∼100 µg/mL (Table 1, excluding abasic nucleosides). A series of dilutions was performed to obtain five calibration points at ∼1, 5, 10, 50, and 100 µg/mL. All five standards were analyzed by UPLC for each nucleoside. The retention time was determined, and a calibration curve was prepared for all 12 nucleosides to determine their response factors. Digested oligonucleotides 1 and 2 and the nucleoside standards were analyzed (3 µL injections) using reverse phase UPLC (Waters Acquity) and separated using an HSS C18 column (2.1 × 100 mm, 1.8 µm, Waters Acquity) at 40 °C with buffers A = 10 mM aqueous ammonium formate, pH 4.7 and B = MeOH. Gradients of 1 → 17% B were applied over 9 min, and all spectral data were collected at 260 nm.

(33) Egli, M.; Minasov, G.; Tereshko, V.; Pallan, P. S.; Teplova, M.; Inamati, G. B.; Lesnik, E. A.; Owens, S. R.; Ross, B. S.; Prakash, T. P.; Manoharan, M. Biochemistry 2005, 44, 9045–9057.
(34) Calibration curves were experimentally prepared to determine the response factor of each nucleoside (where m = response factor in y = mx + b).
(35) Sequence of 2: 5′ rA-rU-rG-OMeG-OMeA-OMeA-OMeG-OMeG-fluC-OMeA-fluU-fluC-OMeG-fluC-fluC-fluC-fluU-OMeG-OMeG-OMeU-OMeU 3′.
(36) Seven chemical reactions were performed with the following key reagents: aniline, piperidine, dimethyl sulfate/piperidine, diethyl pyrocarbonate/piperidine, hydroxylamine/piperidine, sodium hydroxide and formic acid.

RESULTS AND DISCUSSION

Our attempt to sequence an unknown oligonucleotide began by determining the molecular weight of purified strand 1 using ESI-TOF mass spectrometry. With the average mass known (6912 u), a 100 µM aqueous solution of lyophilized strand 1 was prepared and subjected to seven chemical digestion reactions: aniline, piperidine (C5H11N), dimethyl sulfate (C2H6O4S)/piperidine, diethyl pyrocarbonate (DEPC)/piperidine, hydroxylamine (NH2OH)/piperidine, sodium hydroxide and formic acid (see Experimental Section). The specificity, reactivity, and mechanistic trends of phosphate backbone cleavage under such reaction conditions have previously been discussed.6 All seven reaction samples were subsequently analyzed by ESI-TOF MS to identify the masses of all fragments generated during the digestion of the full length product (FLP) 1.

Our data analysis strategy began by examining fragment masses near the average mass of 1 (MW = 6912 u) (Figure 2). Since the majority of the nucleotides used during the synthesis of our modified oligonucleotides weigh approximately 320 u, we expected to see key fragments, decreasing in mass from 1, at such mass intervals. The precise mass difference between key fragments indicated the identity of the digested nucleotide and allowed the creation of a sequence ladder. However, the potential presence of abasic nucleotides (MW = 198 u) incorporated within the unknown could not be ignored (see Table 1).
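The ladder-reading step described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' software: the lookup table holds only masses quoted in this article (footnote 24, plus the ≈180 u loss that the text goes on to associate with an abasic residue under NaOH digestion), the 1 u tolerance is an arbitrary assumption, and the names `KNOWN_STEPS`, `candidates`, and `ladder_steps` are invented for the example. It also assumes that neighboring fragments share the same end groups, so that their mass difference corresponds to a single residue.

```python
# Mass steps (u) quoted in the article. The table is deliberately incomplete;
# a real analysis would list every residue from Table 1.
KNOWN_STEPS = {
    "rA": 329.20594,              # footnote 24
    "dG": 329.20594,              # footnote 24, identical to rA
    "OMeU": 320.19258,            # footnote 24
    "abasic (NaOH loss)": 180.0,  # approximate loss reported for abasic residues
}


def candidates(parent_mass, fragment_mass, tolerance=1.0):
    """Names of residues whose mass matches the parent-to-fragment difference.

    More than one name means the step is ambiguous by mass alone;
    an empty list means no residue in the table explains the step.
    """
    delta = parent_mass - fragment_mass
    return sorted(
        name for name, mass in KNOWN_STEPS.items()
        if abs(delta - mass) <= tolerance
    )


def ladder_steps(masses, tolerance=1.0):
    """Walk fragment masses from heaviest to lightest and annotate each step.

    Returns tuples of (heavier mass, lighter mass, difference, candidate names).
    """
    ordered = sorted(masses, reverse=True)
    return [
        (hi, lo, round(hi - lo, 3), candidates(hi, lo, tolerance))
        for hi, lo in zip(ordered, ordered[1:])
    ]


if __name__ == "__main__":
    # Full-length product and two fragment masses reported for strand 1.
    for step in ladder_steps([6912, 6732, 6505]):
        print(step)
```

Applied to the full-length mass of 6912 u and the fragment at 6732 u, the 180 u step maps to the abasic entry, whereas a step of 329.2 u returns both rA and dG, reproducing the ambiguity noted in footnote 24. Steps that match nothing in this partial table, such as the 227 u difference between 6732 u and 6505 u, come back with an empty candidate list and would need the full residue table to interpret.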
We previously found that abasic nucleotides were most efficiently digested under basic conditions, such as aqueous NaOH, to release the 3′ terminal hydroxyl-containing fragment without the 5′ terminal phosphate. Therefore, a mass loss of approximately 180 u from the parent fragment would indicate the digestion of an abasic nucleotide.6 All fragments >4000 u were deconvoluted to obtain their average masses.37 Fragment masses appearing in more than one chemical reaction were highlighted as possible key fragments. For example, masses in Figure 2 appeared more than once among the following reactions: aniline, dimethyl sulfate/piperidine, diethyl pyrocarbonate/piperidine, hydroxylamine/piperidine, and sodium hydroxide. Upon examination of fragments generated during the hydroxylamine/piperidine digestion, the mass difference between many fragments corresponded to potential nucleotides/nucleosides. In Figure 2b, the mass difference between fragment 6505 u and 6912 u (FLP) was much larger than the average mass of a nucleotide (∼320 u). Therefore, the presence of an abasic nucleotide was suspected, and the digestion via sodium hydroxide was closely analyzed (Figure 3). As anticipated, loss of 180 u from FLP 1 afforded fragment N-1 (6732 u), which indicated the presence of an abasic nucleotide. A mass difference of 227 u (37) Because of software limitations, only masses >4000 u could be deconvoluted. Masses 4000 u: 5′ iB-fluC-dA-dG-dA-fluC-dA-dGdG...3′. A similar strategy was utilized for fragments