Sulfonic Acid Derivatives for Peptide Sequencing by MALDI MS
A new fast, simple, and water-compatible derivatization strategy improves protein identification.
©JOHN D. SHAFFER FROM PROCTER AND GAMBLE
ANALYTICAL CHEMISTRY / APRIL 1, 2003
Protein sequence information is fundamentally important for understanding many physiological processes at the molecular level. Sequence data are typically used to identify proteins after one- or two-dimensional gel separations, localize and identify chemical or posttranslational modifications, and design oligonucleotide probes and polymerase chain reaction (PCR) primers for gene cloning. MS has become an essential tool for protein and peptide sequencing because of its speed, sensitivity, and applicability to complex mixtures. Much of the peptide sequence information comes from the fragmentation of peptides within mass spectrometers. Many derivatization strategies have been developed over the past 20 years to increase sensitivity and the amount of sequence information that can be obtained. Most of those derivatives are cationic or contain a very basic group because such groups improve peptide detection sensitivity. Recently, an entirely new derivatization strategy was developed specifically to enhance de novo sequencing of peptides under low internal energy conditions. This approach uses anionic sulfonic acid groups instead of basic or cationic groups, and that change dramatically improves the quality of sequence data obtained with MALDI MS. The new chemistry is water-compatible, fast, and simple, and it can be multiplexed and automated. Sulfonic acid derivatives fragment well under both electrospray ionization (ESI) and MALDI conditions, and easily interpreted fragmentation patterns are obtained with a variety of mass analyzers. These derivatives will facilitate automated peptide sequencing on instruments designed for high-throughput applications.
Cationic derivatization
Current strategies for the cationic derivatization of peptides had their origins in the early days of desorption MS, in which analytes were sampled directly from the condensed phase into the mass spectrometer. Important desorption MS methods include static secondary ion MS (SIMS), fast atom bombardment, ESI, and MALDI. In the mid-1970s, it was well known that field desorption MS provided much more intense signals for quaternary ammonium salts than for any other class of organic compounds (1, 2). That fact was exploited to dramatically improve field desorption MS sensitivity for important classes of biological molecules, such as amino acids and phosphatidyl cholines. Those zwitterionic compounds were converted to cations simply by adding p-toluenesulfonic acid prior to loading analyte solutions onto field desorption emitters (3). Cooks and co-workers understood that the mechanisms of various desorption ionization approaches were very similar and that derivatization to form ionic species represented a new strategy to increase the utility of all desorption MS methods (4). Kidwell and co-workers first applied these ideas to peptides by adding a quaternary ammonium group to the N-terminus of two tripeptides before performing sequence analysis with single-stage static SIMS (5).
Thomas Keough • R. Scott Youngquist • Martin P. Lacey Procter & Gamble Co.
[Chemical structures in three panels: N-terminal, C-terminal, and side chain and backbone derivatives]
FIGURE 1. Representative modification groups to facilitate detection and sequencing of peptides.
Tandem MS is much more versatile than single-stage MS for peptide sequencing because it is directly applicable to complex peptide mixtures (6, 7). The first MS stage selects a particular peptide molecular ion from a mixture; all the other ions in the mixture are discarded. The stable, isolated molecular ion is then excited via collision(s) with neutral gas molecules and fragments. The resulting product ions are mass analyzed in the second MS stage. Unfortunately, peptide fragmentation patterns produced by this method are often difficult or impossible to interpret de novo. Biemann and co-workers showed that peptide fragmentation patterns produced at high-collision energies were strongly dependent on the presence and location of the basic amino acids Lys and Arg (8). Peptides lacking basic residues fragmented extensively along the backbone, producing simple fragmentation patterns. Peptides with one basic residue mainly showed fragment ions that contained the basic amino acid, which sequestered the ionizing proton. If the basic group was located at the N-terminus of the peptide, the resulting fragment ions were N-terminal an, bn, and dn ions (9). If the basic residue was at the C-terminus, the resulting fragment ions were mainly C-terminal wn, vn, zn, and yn ions. A basic residue in the center of the peptide yielded a more complex fragmentation pattern that was often impossible to interpret accurately. Peptides containing more than one basic group usually showed limited sequence-specific fragmentation. Peptides containing a single basic residue at the N- or C-terminus are preferable for sequencing purposes because the resulting fragmentation patterns are relatively simple compared with the spectra obtained from peptides containing an internal basic residue. Peptides containing a C-terminal basic residue are readily generated via trypsin digestion of proteins.
The dramatic effect of charge localization on peptide fragmentation patterns and the large increase in sensitivity resulting from the use of fixed-charge derivatives led to the investigation of cationic derivatives for peptide fragmentation under high-collision energy conditions (10, 11). For example, Stults and co-workers attached a quaternary ammonium group to the N-terminus of tryptic peptides (12). This derivative produced relatively simple MS/MS spectra that were easier to interpret and showed greater fragment ion intensities than those obtained from native tryptic peptides. Representative N-terminal, C-terminal, backbone, and side chain derivatives are summarized in Figure 1. Recent reviews of charged derivatives for peptide sequencing have been published (11, 13).

Theoretical basis for a new approach
Our initial experiments using MALDI postsource decay (PSD) sequencing of tryptic peptides were disappointing because the spectra typically showed weak and irreproducible fragmentation patterns (13). This was in contrast to sequencing results obtained on the same peptides using ESI tandem MS. The spectrum in Figure 2a was obtained from a tryptic peptide using ESI MS/MS on a triple quadrupole mass spectrometer. The doubly protonated form of the peptide was fragmented because ESI produces abundant doubly protonated molecules from tryptic peptides. The spectrum exhibits extensive fragmentation and a single series of C-terminal y-type ions. Almost the entire sequence of the peptide can be deduced from this single MS/MS spectrum. Figure 2b is the MALDI PSD spectrum of the same peptide. MALDI produces mainly singly protonated tryptic peptides. No significant level of fragmentation was observed from the singly protonated molecule. PSD spectra often show more fragmentation than is evident in Figure 2b; however, the extent of fragmentation is peptide dependent and extremely variable, and the spectra are very difficult or impossible to interpret de novo.
Our first attempts to improve the quality of MALDI PSD spectra of tryptic peptides involved the use of cationic fixed-charge derivatives. Quaternary ammonium groups were attached either to the N-terminus or to the ε-amino group in Lys-terminated tryptic peptides. The derivatized peptides did not show enhanced fragmentation or simplified fragmentation patterns as had been previously observed under high-energy collision conditions. Instead, very low product ion yields were observed. The spectra were often dominated by cleavages adjacent to aspartic acids, which are known to be among the lowest-energy fragmentation channels (14).
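As a rough aid to comparing the doubly protonated (ESI) and singly protonated (MALDI) forms of the same peptide described above, the standard relation between a peptide's neutral monoisotopic mass and the m/z of its protonated ions can be written out. The symbols M (neutral mass), z (number of added protons), and m_p (proton mass) are notation introduced here for illustration only; they are not taken from the original figures.

```latex
\[
  \frac{m}{z} \;=\; \frac{M + z\,m_{p}}{z},
  \qquad m_{p} \approx 1.00728\ \text{Da}
\]
\[
  z = 1:\quad \frac{m}{z} = M + m_{p}
  \qquad\qquad
  z = 2:\quad \frac{m}{z} = \frac{M}{2} + m_{p}
\]
```

Under this relation, a doubly protonated precursor appears at roughly half the m/z of the singly protonated molecule, although both ions come from the same peptide.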
Soon, we began to suspect that there simply was not enough internal energy available to fragment cationic derivatives efficiently under MALDI PSD conditions. This hypothesis was bolstered by the work of Gaskell and co-workers, who showed that the formation of cationic derivatives of peptides actually hinders peptide sequencing under low-energy collision conditions (15). The added cationic group promoted proximate fragmentations, and structural information remote from the charged group was lost. Interestingly, structure-specific fragmentation of the backbone could be recovered simply by adding a proton to the precharged derivative to form a doubly charged ion. The added proton is free to protonate anywhere along the peptide backbone, and it induces fragmentation at the backbone amide bonds.
Wysocki and co-workers showed that it required more internal energy to fragment peptides containing a basic amino acid than it took to fragment the same peptide without it (16, 17). The trimethylammonium acetyl derivative of a simple peptide (YGGFL) also required more energy to dissociate than the corresponding native peptide. Furthermore, it took much more internal energy to fragment the singly charged N-terminal triphenylphosphonium derivative of LDIFSDF than it took to fragment the protonated cationic derivative (doubly charged ion).
Gaskell and Wysocki’s studies indicate that sequestering the ionizing proton on a basic residue or fixing the location of the positive charge by derivatization is not optimal for sequencing peptides under low-collision-energy conditions. These conditions require protonation of backbone sites to induce efficient structure-specific fragmentation. It is not known whether dissociation starts with proton association with the amide oxygen atom or with the less-basic amide nitrogen atom (17, 18). However, MNDO bond order calculations have shown that protonation of amide nitrogen atoms significantly lowers backbone amide bond strengths, which should favor cleavage of amide bonds under low energy conditions (19).
Singly charged tryptic peptides formed either by ESI or MALDI do not fragment readily because the ionizing proton is localized on the basic side chain of the C-terminal Lys or Arg. There is not enough internal energy available to move the ionizing proton from the basic side chain to the peptide backbone. This problem is easily remedied with ESI, which is capable of producing very abundant, doubly protonated peptides. Most electrospray tandem MS sequencing experiments use doubly protonated tryptic peptides because they fragment readily. Unfortunately, doubly protonated molecules are not formed in high yield under MALDI conditions, and therefore, these labile ions are not available for sequencing studies with this technique.
To create a tryptic peptide with an extra proton, but not a double charge, we intentionally added a sulfonic acid group to the N-terminus. This strategy concedes that the side chain of the basic C-terminal residue will be protonated under MALDI conditions. The strong acid group was chosen so it would remain deprotonated under MALDI conditions, counter-balancing the C-terminal positive charge. One additional proton, which is required to ionize the peptide for MS analysis, is more or less free to randomly protonate backbone amide bonds, because the most basic site in the molecule is already occupied. The ionized derivatized peptides mimic doubly protonated peptides that are formed directly by ESI, and they were expected to fragment as readily as the electrospray-derived ions (Figure 2a).
FIGURE 2. Comparison of (a) the electrospray tandem mass spectrum of a doubly protonated peptide with (b) the MALDI PSD spectrum of the singly protonated peptide. (Adapted with permission from Ref. 20.)
An added feature of this derivatization strategy is that the resulting PSD fragmentation patterns should be quite simple.
Fragments containing the original N-terminus of the molecule will be suppressed in the positive ion mode because they will be neutral as a result of the added anionic group. A typical example of a PSD spectrum of a derivatized tryptic peptide (NTATSLGSTNLYGSGLVNAEAATR) is shown in Figure 3. As expected, it is dominated by a single, predictable series of C-terminal y-type ions because they contain the protonated basic residue. Interpretation of the spectrum de novo is trivial because the sequence-reading direction is known with certainty. Peptide sequences are determined by simply measuring mass differences between adjacent members of the y-ion series. The measured mass differences are correlated to the known residue masses of the common amino acids. This spectrum, obtained from the same peptide that produced the unacceptable results in Figure 2b, demonstrates the increase in fragmentation resulting from the use of sulfonic acid derivatives.

Evaluating peptide sulfonation procedures
At the start of this project, we wanted to quickly test whether N-terminal sulfonation would promote and direct the fragmentation of tryptic peptides under low energy conditions. This was accomplished using the commercially available peptide CDPGYIGSR as a model. This peptide contains a C-terminal Arg, which is characteristic of many tryptic peptides, and an N-terminal Cys. The Cys residue contains a sulfhydryl group on its side chain, and the –SH can be readily converted to –SO3H via performic acid oxidation. This simple derivatization protocol yielded a model tryptic peptide with a sulfonic acid group on the N-terminal end of the molecule.
As expected, the PSD mass spectra of the native peptide and the oxidized peptide showed significant differences. The spectrum of the native peptide was dominated by a fragment ion resulting from cleavage on the C-terminal side of the Asp along with a few other relatively weak N- and C-terminal fragments. It was not possible to deduce the sequence of the peptide on the basis of this spectrum. Following oxidation, the spectrum obtained showed increased S/N and a clean series of y-ions that enabled the derivation of the entire sequence of the model peptide (20).
After our initial success with the model peptide, we evaluated several reagents for the sulfonation of tryptic peptides. Performic acid oxidation of Cys residues provided an easy validation of the initial concept; however, that approach did not represent a useful general strategy for peptide sequencing because most tryptic peptides do not contain Cys. Various reagents to add sulfur groups (–SH, –SS–, CH3S–) to the N-terminus of tryptic peptides were tried. The modified peptides could then be oxidized with performic acid to yield sulfonic acid derivatives. Commercially available compounds such as 3,3´-dithiobis[sulfosuccinimidylpropionate], which is used for protein cross-linking experiments, Traut’s reagent, and S-acetyl mercaptosuccinic anhydride, were evaluated. We synthesized dithioglycolic anhydride, but this strategy was ultimately abandoned because of poor derivatization yields, unwanted side chemistry, and slow reaction kinetics. The fundamental problem was unwanted oxidation of other labile amino acid residues, which made the resulting mass spectra more difficult to interpret.
Direct addition of sulfonic acid groups to the N-terminus of tryptic peptides eliminated the problem peptide oxidation step. Initially, a commercially available reagent, chlorosulfonylacetyl chloride (CSAC), was used. CSAC turned out to be a robust reagent that allowed direct sulfonation and characterization of subpicomole quantities of tryptic peptides following in-gel digestion of proteins. The resulting PSD spectra were extremely accurate for identifying proteins using database searches and for identifying chemically modified proteins not found in databases (21).
CSAC was used to prove that derivatization PSD sequencing was a viable method for protein identification. However, the acid chloride was too reactive for use in water, and the coupling reaction had to be performed under nonaqueous conditions. Using nonaqueous conditions for the sulfonation reaction is a problem because derivatized peptides need cleanup prior to MALDI analyses. Peptide cleanup is typically done on C18 mini-columns,
FIGURE 3. Typical MALDI PSD spectrum of a derivatized tryptic peptide (the same peptide used to produce the PSD spectrum in Figure 2b). –Deriv. indicates that the ion is formed by loss of the added derivative group. (Adapted with permission from Ref. 20.)
Optimized protocol
All of the direct sulfonation reagents discussed earlier react with amino groups. Arg-terminated tryptic peptides add a single sulfonate group to the N-terminus, producing the desired monosulfonate derivative. Unprotected Lys-terminated peptides often form disulfonate derivatives, with one sulfonate group attached to the N-terminus and the second group attached to the ε-amino group of the Lys side chain. Disulfonate derivatives are undesirable because they exhibit poor sensitivity in the positive-ion mode and relatively poor fragmentation behavior under negative-ion conditions.
Negative-ion PSD spectra of several Lys-containing disulfonate derivatives showed both low product ion yields, especially at low mass (< 500 Da), and complex fragmentation patterns containing N-terminal (b-type) and C-terminal (y-type) ions linked to the sulfonate group. Some of the negative-ion PSD spectra did provide accurate protein identifications when searched against protein sequence databases. However, negative-ion PSD of disulfonate derivatives was not as reliable as positive-ion PSD of Arg-
room temperature, or it can be completed in 5–10 min using high concentrations of reagents and elevated temperature (25). This reaction is important: It enables the sulfonation method to be applied with equal facility to both Lys- and Arg-terminated tryptic peptides.
The optimized protocol for peptide derivatization involves Lys protection, solid-phase sulfonation, and cleanup. The Lys protection step is optional, depending on the situation. Lys protection is often not needed to identify proteins separated by two-dimensional gel electrophoresis, particularly if the MALDI spectrum shows many tryptic peptides. It is likely that several of the observed peptides are Arg-terminated and amenable to direct sulfonation. The protein probably can be identified using only the Arg-terminated peptides, because definitive identification generally requires only one or two derivatization PSD spectra.
On the other hand, the MALDI spectra from some protein digests show only a few tryptic peptides, occasionally only one, and protein identification hinges on obtaining sequence data from a very limited number of candidate peptides. In situations like this, researchers generally are able to determine if the peptide(s) of interest is Lys- or Arg-terminated by PSD analyses of the sodium-cationized form of the native peptide (MNa+). Alkali metal ion-containing tryptic peptides generally show prominent loss of the C-terminal residue with MS/MS (26). In practice, it is easier to perform the guanidination reaction prior to N-terminal sulfonation.
The N-terminal sulfonation reaction using 3-sulfopropionic acid NHS ester was initially done in aqueous solution for 30 min at room temperature. The sulfonation step is somewhat slow, in part, because of the low analyte concentrations (