Anal. Chem. 2006, 78, 1235-1241

SearchXLinks. A Program for the Identification of Disulfide Bonds in Proteins from Mass Spectra

Stephan Wefing,* Volker Schnaible,† and Daniel Hoffmann‡

Center of Advanced European Studies and Research (caesar), Ludwig-Erhard-Allee 2, 53175 Bonn, Germany

We present the computer program SearchXLinks, which analyzes mass spectra with the aim of identifying disulfide bonds and other modifications in proteins of known amino acid sequence. Disulfide bonds can be intra- or intermolecular. To decrease the number of false positives, the analysis of in-source decay and tandem mass spectra is integrated into the program. The steps taken during a SearchXLinks run are outlined, and the computational costs are discussed. The application of the program is illustrated by the analysis of data from recent studies on bovine ribonuclease A and bovine serum albumin. The software can be used free of charge on the Internet at http://www.searchxlinks.de.

Disulfide bonds play an important role during protein folding1 and stabilize the final protein structure.2 If the three-dimensional structure of a protein is unknown, the elucidation of its disulfide bond pattern may provide valuable hints on the structure. In particular, the corresponding distance restraints may be used in a subsequent modeling step. In addition, disulfide bond elucidation may be helpful for assessing the quality of recombinant proteins.3 Consequently, numerous experimental methods have been developed to determine the disulfide bonds of proteins; cf. ref 4 and the references therein. Modern mass spectrometry (MS) methods5,6 have further fostered methodological development in this area.4,7-16

* Corresponding author. E-mail: [email protected]. Fax: +49 228 9656 118.
† Current address: Binzener Strasse 7g, 79539 Lörrach, Germany.
‡ Current address: Bingen University of Applied Sciences, Berlinstr. 109, 55411 Bingen, Germany.

(1) Wedemeyer, W. J.; Welker, E.; Narayan, M.; Scheraga, H. A. Biochemistry 2000, 39, 4207-4216.
(2) Creighton, T. E. Proteins, 2nd ed.; W. H. Freeman and Co.: New York, 1992.
(3) Baneyx, F.; Mujacic, M. Nat. Biotechnol. 2004, 22, 1399-1408.
(4) Gorman, J. J.; Wallis, T. P.; Pitt, J. J. Mass Spectrom. Rev. 2002, 21, 183-216.
(5) Karas, M.; Hillenkamp, F. Anal. Chem. 1988, 60, 2299-2301.
(6) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Science 1989, 246, 64-71.
(7) Zhou, J.; Ens, W.; Poppe-Schriemer, N.; Standing, K. G.; Westmore, J. B. Int. J. Mass Spectrom. Ion Processes 1993, 126, 115-122.
(8) Patterson, S. D.; Katta, V. Anal. Chem. 1994, 66, 3727-3732.
(9) Gorman, J. J.; Ferguson, B. L.; Speelman, D.; Mills, J. Protein Sci. 1997, 6, 1308-1315.
(10) Qin, J.; Chait, B. T. Anal. Chem. 1997, 69, 4002-4009.
(11) Jones, M. D.; Patterson, S. D.; Lu, H. S. Anal. Chem. 1998, 70, 136-143.
(12) Wallis, T. P.; Pitt, J. J.; Gorman, J. J. Protein Sci. 2001, 10, 2251-2271.
(13) Yen, T.-Y.; Yan, H.; Macher, B. A. J. Mass Spectrom. 2002, 37, 15-30.
(14) Schnaible, V.; Wefing, S.; Resemann, A.; Suckau, D.; Bücker, A.; Wolf-Kümmeth, S.; Hoffmann, D. Anal. Chem. 2002, 74, 4980-4988.

10.1021/ac051634x CCC: $33.50 Published on Web 01/12/2006

© 2006 American Chemical Society

All approaches to disulfide bond detection entail cleaving the protein backbone while the disulfide bonds are still present. With the notable exception of ref 16, all protocols identify disulfide bonds directly by comparing measured masses with the masses of predicted digest peptides that contain disulfide bonds. Most of these latter protocols use the classical bottom-up approach, in which backbone cleavage is carried out enzymatically or chemically in solution. This digestion step is often facilitated by a partial reduction of disulfide bonds.17 The resulting peptide mixture is separated by high-performance liquid chromatography (HPLC), and the fractions are analyzed by mass spectrometry. This contrasts with the top-down approach,15 in which the native protein is transferred directly into the mass spectrometer and selectively cleaved there.

SearchXLinks is a program for analyzing bottom-up experiments. It aims at identifying disulfide bonds in proteins of known amino acid sequence. This is achieved by enumerating all possible peptides, including those carrying disulfide bonds, calculating their masses, and searching for matching measured masses. A frequent problem in bottom-up experiments is the large number of theoretically possible peptides, which may lead to many false positives. Hence, to increase selectivity, the basic mass spectrometric experiment must usually be complemented by additional data, e.g., in-source decay (ISD) data8,11,14 or MS/MS data.7,9,11-14 SearchXLinks integrates ISD and MS/MS data analysis directly into its search for matching peptides, without creating intermediate files. For historical reasons, SearchXLinks labels all MS/MS data as postsource decay (PSD) data, although PSD is only a special type of MS/MS data; the program can nevertheless analyze other types of MS/MS data as well.14,18

Several programs have previously been written for the analysis of disulfide bond patterns from MS data.
One of the first programs made publicly available via the World Wide Web was the disulfide mapping tool of the PROWL environment,19,20 whose underlying algorithm is also used by the disulfide bond modeler.21,22 Another program with a freely accessible web interface is the MS-Bridge tool of the ProteinProspector program suite.23,24 Each of the existing programs has its own strengths and weaknesses. Among the strengths of SearchXLinks are the combination of normal MS data with ISD and MS/MS analysis and its considerable flexibility in generating putative peptidic isomers and their MS/MS fragments. A weakness is that the MS/MS scoring scheme has not been calibrated.

SearchXLinks may also be used to analyze MS data of cross-linked and digested proteins. As for disulfide bonds, the aim is to identify pairs of amino acid residues connected by a cross-linker. From such pairs, one can derive distance restraints25 that can be used to obtain low-resolution protein models.26,27 However, SearchXLinks’ current support for cross-linkers is not as sophisticated as its support for disulfide bonds. In particular, since ISD is usually not observed for cross-linkers, one has to resort to other methods of eliminating false positive structural candidates. One such method consists of using isotopically labeled cross-linkers.28,29 However, SearchXLinks currently cannot exploit the spectral correlations provided by such cross-linkers. Conversely, programs written for the analysis of MS cross-linking experiments30,31 have also been used for elucidating disulfide bonds.18

In two previous studies on bovine ribonuclease A (RNAseA)32 and bovine serum albumin (BSA),14 we applied prototypes of SearchXLinks to the mass spectrometric elucidation of disulfide bonds without providing details on the methods as such. The purpose of the present work is to discuss the method and its performance comprehensively. To this end, we reanalyze two previously published sets of mass spectra on RNAseA32 and on BSA.14

(15) Ge, Y.; Lawhorn, B. G.; ElNaggar, M.; Strauss, E.; Park, J.-H.; Begley, T. P.; McLafferty, F. W. J. Am. Chem. Soc. 2002, 124, 672-678.
(16) Qi, J.; Wu, W.; Borges, C. R.; Hang, D.; Rupp, M.; Torng, E.; Watson, J. T. J. Am. Soc. Mass Spectrom. 2003, 14, 1032-1038.
(17) Gray, W. R. Protein Sci. 1993, 2, 1732-1748.
(18) Lowe, E. K.; Anema, S. G.; Bienvenue, A.; Boland, M. J.; Creamer, L. K.; Jiminez-Flores, R. J. Agric. Food Chem. 2004, 52, 7669-7680.
(19) Fenyö, D. Comput. Appl. Biosci. 1997, 13, 617-618.
(20) http://prowl.rockefeller.edu/prowl/proteininfo.html.
(21) Craig, R.; Krokhin, O.; Wilkins, J.; Beavis, R. C. J. Proteome Res. 2003, 2, 657-661.
(22) http://www.proteome.ca/x-bang/DisulphideModeler/disulphide.html.

Analytical Chemistry, Vol. 78, No. 4, February 15, 2006 1235

COMPUTATIONAL METHODS

Description of a SearchXLinks Run. SearchXLinks attempts to model accurately all available knowledge about the protein(s) to be investigated, including modification and digestion.
Modeling is controlled by appropriate rules. For example, via a link rule the user can specify the set of cysteine residues that are eligible for disulfide bond formation. Given these rules, a mass spectrometric peak list (possibly containing peaks resulting from ISD fragmentation), one or more MS/MS spectra, and the amino acid sequence of one or more proteins to be investigated, SearchXLinks exhaustively enumerates all peptides that are consistent with the input.

Note that the term “accurate modeling” refers only to the result of a SearchXLinks run, i.e., to the final set of consistent peptides output by the program. It does not mean that SearchXLinks internally simulates all experimental steps one-to-one and in the same order as in the experiment. For example, partial digestion is modeled by a two-step process within SearchXLinks: we first remove all peptide bonds at all potential cleavage sites, corresponding to a complete digestion, and then reinsert some of the bonds (missed cleavages) at a later stage.

(23) Clauser, K. R.; Baker, P.; Burlingame, A. L. Anal. Chem. 1999, 71, 2871-2882.
(24) http://prospector.ucsf.edu.
(25) Green, N. S.; Reisler, E.; Houk, K. N. Protein Sci. 2001, 10, 1293-1304.
(26) Young, M. M.; Tang, N.; Hempel, J. C.; Oshiro, C. M.; Taylor, E. W.; Kuntz, I. D.; Gibson, B. W.; Dollinger, G. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 5802-5806.
(27) Rappsilber, J.; Siniossoglou, S.; Hurt, E. C.; Mann, M. Anal. Chem. 2000, 72, 267-275.
(28) Müller, D. R.; Schindler, P.; Towbin, H.; Wirth, U.; Voshol, H.; Hoving, S.; Steinmetz, M. O. Anal. Chem. 2001, 73, 1927-1934.
(29) Pearson, K. M.; Pannell, L. K.; Fales, H. M. Rapid Commun. Mass Spectrom. 2002, 16, 149-159.
(30) Schilling, B.; Row, R. H.; Gibson, B. W.; Guo, X.; Young, M. M. J. Am. Soc. Mass Spectrom. 2003, 14, 834-850.
(31) http://roswell.ca.sandia.gov/~mmyoung/index.html.
(32) Schnaible, V.; Wefing, S.; Bücker, A.; Wolf-Kümmeth, S.; Hoffmann, D. Anal. Chem. 2002, 74, 2386-2393.

Figure 1. Flowchart of a SearchXLinks run. Explanations are given in the text.

The optional first step of a SearchXLinks run, cf. Figure 1, filters the peaks of the conventional mass spectrum. As described in ref 14, the ISD fragmentation of disulfide bonds can be used to focus on those peaks that are probably due to peptides with at least one disulfide bond. To this end, we search for peak triplets that fulfill the following equation:

$$ m/z(\text{peak A}) + m/z(\text{peak B}) - m/z(\mathrm{H_2} + \mathrm{H^+}) = m/z(\text{peak C}) \tag{1} $$

Here, peak C corresponds to a putative parent ion, whereas peaks A and B correspond to two putative fragment ions. Hence, only ions corresponding to peaks of type C are considered for peptide mass matching; cf. stage A of the peptide search below. This allows us to prune the search tree at a very early stage (“ISD single filter”). If peak A or peak B is absent, ISD peak screening cannot be carried out. However, during the ISD analysis of the peptidic isomers at a later stage, the theoretical m/z values of both fragment candidates are known, and we can try to find a matching peak for at least one of them. Hence, even in such a case, ISD filtering is possible, albeit in a less efficient and less restrictive way (“ISD half-filter”). On the other hand, if we are searching for peptides containing two disulfide bonds, the peak list may be screened for a single peak C that fulfills eq 1 for two pairs of peaks, giving rise to a more restrictive “ISD double filter”.

Making use of correlations between peaks is similar to using isotope labeling12,28,29,33 or reporter groups34,35 for the identification of peptides containing disulfide bonds or cross-links.

(33) Back, J. W.; Notenboom, V.; de Koning, L. J.; Muijsers, A. O.; Sixma, T. K.; de Koster, C. G.; de Jong, L. Anal. Chem. 2002, 74, 4417-4422.
(34) Back, J. W.; Hartog, A. F.; Dekker, H. L.; Muijsers, A. O.; de Koning, L. J.; de Jong, L. J. Am. Soc. Mass Spectrom. 2001, 12, 222-227.
(35) Tang, X.; Munske, G. R.; Siems, W. F.; Bruce, J. E. Anal. Chem. 2005, 77, 311-318.
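A minimal Python sketch of the ISD single and double filters based on eq 1 is given below. It is an illustration under stated assumptions, not SearchXLinks code: the function names are hypothetical, the ppm tolerance is measured relative to the candidate parent peak C (the text does not state which mass the ppm value refers to), and the hydrogen-atom and proton masses are standard monoisotopic values. Peak A is allowed to coincide with peak B, because both fragments of a single exocyclic disulfide bond can have identical masses.

```python
from itertools import combinations_with_replacement

# Hypothetical helper names; this is not part of SearchXLinks.
H_ATOM = 1.00782503   # monoisotopic mass of a hydrogen atom (Da)
PROTON = 1.00727646   # mass of a proton (Da)
H2_PLUS_PROTON = 2 * H_ATOM + PROTON  # m/z(H2 + H+) in eq 1, about 3.0229


def isd_parent_peaks(peaks, tol_ppm=150.0):
    """Map each candidate parent peak C to the (A, B) peak pairs satisfying eq 1.

    peaks: monoisotopic m/z values of singly protonated ions.
    tol_ppm: tolerance relative to peak C (an assumption of this sketch).
    A and B may be the same peak (degenerate fragment masses).
    """
    peaks = sorted(peaks)
    hits = {}
    for a, b in combinations_with_replacement(peaks, 2):
        predicted_c = a + b - H2_PLUS_PROTON
        for c in peaks:
            if abs(c - predicted_c) / c * 1e6 <= tol_ppm:
                hits.setdefault(c, []).append((a, b))
    return hits


def isd_single_filter(peaks, tol_ppm=150.0):
    """Peaks of type C: at least one fragment pair fulfills eq 1."""
    return sorted(isd_parent_peaks(peaks, tol_ppm))


def isd_double_filter(peaks, tol_ppm=150.0):
    """Peaks of type C for which eq 1 holds for at least two different pairs."""
    return sorted(c for c, pairs in isd_parent_peaks(peaks, tol_ppm).items()
                  if len(pairs) >= 2)
```

Applied to the three RNAseA values listed in Table 1 (m/z 2760.35, 1444.80, and 1318.66), the single filter keeps m/z 2760.35; the implied deviation is about 32 ppm, so the triplet passes at 150 ppm but would fail at 20 ppm. The brute-force search is cubic in the number of peaks, which is unproblematic for the short peak lists of single HPLC fractions.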

Tang et al.35 used a cross-linker that is easily cleaved in MS/MS experiments, thereby releasing a reporter group of known mass. These authors used relations similar to eq 1 to identify the peaks that were due to cross-linked peptides. In contrast to the current study, however, these relations were applied to MS/MS data. It may also be possible to observe ISD fragmentation for their cross-linker; since ISD analysis uses normal MS data, this would make the identification of cross-linked peptides more efficient.

Note that eq 1 applies only to interchain disulfide bonds whose rupture releases two fragments. Hence, intrachain disulfide bonds without an actual cleavage site between the two linked cysteine residues are not detected. Similarly, the rupture of only one of two interchain disulfide bonds connecting the same pair of chains, cf. insulin,36 cannot be detected. In general, eq 1 is valid only for exocyclic disulfide bonds and singly protonated ions. For ISD peak filtering, SearchXLinks takes possible degeneracies of fragment masses into account: for peptides with a single exocyclic disulfide bond, both fragments may have identical masses, so peak A of eq 1 may be identical to peak B. Similarly, for peptides with two exocyclic disulfide bonds, the minimum number of different fragment masses is two.

The next step of a SearchXLinks run consists of the complete in silico digestion (no missed cleavages) of the “bare” protein, yielding a set of so-called base peptides. Here, “bare” means that the protein carries no modifications or disulfide bonds; that is, its cysteine residues are in the reduced state. Together with the disulfide bonds, modifications, and missed cleavages specified in the input, the base peptides form the set of available objects from which matching peptides are constructed.
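The complete in silico digestion into base peptides can be sketched as follows. The rule format used here (a set of residues after which the chain is cut, plus a set of residues that suppress the cut when they follow the site) and the name `complete_digest` are assumptions for illustration; real enzyme specificities, and the rules SearchXLinks actually applies, may be more detailed. Because no missed cleavages are allowed, every qualifying site is cut, and cysteines are treated as ordinary residues of the bare protein.

```python
def complete_digest(sequence, cleave_after, blocked_before=frozenset()):
    """Cut a bare protein at every cleavage site (no missed cleavages).

    sequence: one-letter sequence; cysteines are plain 'C' (reduced, unmodified).
    cleave_after: residues C-terminal to which the backbone is cut
        (hypothetical rule format, not the SearchXLinks input syntax).
    blocked_before: residues that suppress a cut when they follow the site.
    Returns base peptides as 1-based inclusive (start, end) ranges, e.g. (38, 49).
    """
    if not sequence:
        return []
    base_peptides = []
    start = 1
    # The last residue is never a cleavage site: there is nothing after it.
    for pos in range(1, len(sequence)):
        residue, following = sequence[pos - 1], sequence[pos]
        if residue in cleave_after and following not in blocked_before:
            base_peptides.append((start, pos))
            start = pos + 1
    base_peptides.append((start, len(sequence)))
    return base_peptides


# Hypothetical trypsin/GluC-like rule: cut after K, R, or E, but not before P.
print(complete_digest("AKCDREPGK", frozenset("KRE"), frozenset("P")))
# [(1, 2), (3, 5), (6, 9)]
```

In this toy sequence, the E-P bond between positions 6 and 7 is left intact by the blocking rule, so three base peptides result. Their number N is the quantity that drives the combinatorics discussed under Computational Resources.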
Searching for peptides that are consistent with the input of SearchXLinks proceeds in two stages: stage A deals mainly with the mass of a candidate peptide, whereas the second, more sophisticated stage B focuses on the topology of the candidate.

In stage A, we construct all relevant subsets of the available objects, i.e., of base peptides, disulfide bonds, modifications, and missed cleavages. Such a subset is called a “peptide” in the following. Note that the connections between the constituent objects of a peptide are still undefined at this stage. Nevertheless, the total mass of a peptide can be calculated, which is done in the next step. Subsequently, we scan the peak list for masses that match the total mass of a peptide. This step can be regarded as a mass filter. Matching peptides are passed on to stage B. As indicated by the term “relevant”, SearchXLinks avoids constructing peptides that cannot lead to mass matches in the first place. For example, it makes no sense to construct peptides whose m/z lies above the highest m/z occurring in the experimental peak list. This principle is easy to exploit if peptides are built up from smaller objects.

In stage B, we try to connect all the objects that constitute a peptide in order to obtain a “valid” peptidic isomer. Here, “valid” refers to several properties, of which we mention the two most important. First, a valid structure must represent a single molecule; in the language of graph theory, interpreting atoms as nodes and bonds as edges, the molecule must correspond to a connected graph, i.e., a graph with a single connected component. We use standard graph algorithms to check this property.37 Second, apart from the terminal residues of a protein chain, each amino acid residue can be involved in at most one disulfide bond or can carry at most one modification; cf. Figure 2. Hence, modifications of different types, or modifications and cross-links, are no longer independent of each other. These correlations are neglected in stage A and are checked in stage B.

Figure 2. Protein model underlying SearchXLinks. The figure depicts a sample amino acid chain consisting of three residues that are connected by peptide bonds b. Each residue carries a single side-chain connector s. A connector represents a reaction site, which can form one end of a disulfide bond or carry a single modification. The two terminal residues of a chain each provide an additional connector: connector n at the N-terminus models the terminal NH2 group, and connector c at the C-terminus models the terminal COOH group.

(36) Lin, M.; Campbell, J. M.; Mueller, D. R.; Wirth, U. Rapid Commun. Mass Spectrom. 2003, 17, 1809-1814.

The isomers obtained thus far may optionally be subjected to ISD or MS/MS analysis, both of which can be used as a filter. To this end, an isomer is fragmented according to rules specified by the user, and the fragment masses are searched for in the normal peak list (ISD fragments) or in an MS/MS spectrum. The currently available fragmentation rules comprise fragmentation of disulfide bonds for ISD and MS/MS analysis, and simple chain fragmentations leading to ion series a, b, y, etc., for MS/MS analysis. In general, SearchXLinks cannot handle fragment ions that require the “simultaneous” rupture of more than one bond. Thus, it cannot propose internal or immonium ions30 or ions that result from the rupture of two disulfide bonds, as observed by Lin et al.36 for insulin.

SearchXLinks supports so-called “conditional” assignments in the context of MS/MS analysis. These are assignments that depend on other, related assignments. For example, the rupture of a disulfide bond can give rise to eight different fragment ions, four for each side. Of these four, two are due to asymmetric fission (-SH, +SH), while the other two are due to symmetric fission (-H, +H).7 In our experiments, the peaks assigned to asymmetric fissions were always significantly more intense than the peaks assigned to symmetric fissions; cf. Figure 2 of ref 14. Thus, we accept assignments involving symmetric fissions only if both peaks of the corresponding asymmetric fission can be assigned as well.

MS/MS Rating.
To each isomer proposed, MS/MS analysis assigns several figures of merit, one of which the user can choose for setting up the final ranking list of isomers. The available figures of merit comprise, in particular, the number of assigned peaks, the sum of the intensities of the assigned peaks (specified as a percentage of the total intensity), the number of assigned fragments, and an MS/MS score. The first three ratings are parameter-free, whereas the MS/MS scoring scheme provides some parameters that can be changed by the user. In our MS/MS scoring scheme, the score of an isomer is the sum of individual nonnegative scores that are assigned to each matching fragment derived from the isomer. For these contributions, SearchXLinks provides three types of scoring functions: constant, linear, and exponential. The latter two types can be used to assess contiguous series of, for example, b- or y-ions. With these functions, the contribution of each additional adjacent matching ion can be increased. Thus, three adjacent b-ions may contribute more to the score than three isolated b-ions. Furthermore, one can assign bonuses to fragments that are more likely to be observed. Such fragments may be due to the rupture of labile bonds, e.g., Xxx-Pro bonds,38 or they may contain certain residues (e.g., arginine) or modifications.

The deviation of a calculated m/z from an experimental m/z is assessed only in a very crude fashion: for a match to be detected, this deviation must be below a user-defined threshold. For normal MS data, all peaks fulfilling this condition are assigned to a proposed peptide, whereas in MS/MS analysis only the best-matching peak is assigned to a proposed fragment. Thus, an MS/MS fragment can be assigned to at most one MS/MS peak, but an MS/MS peak may be assigned to several MS/MS fragments. Peak intensities are ignored completely for conventional MS data, whereas they may be taken into account in a simple way in MS/MS analysis, for example, by using the sum of the intensities of the assigned peaks for isomer ranking (see above).

(37) Cormen, T. H.; Leiserson, C. E.; Rivest, R. L.; Stein, C. Introduction to Algorithms, 2nd ed.; MIT Press: Boston, 2001.

Computational Resources. The CPU time required by a SearchXLinks run is dominated by the number of peptides proposed, while the amount of memory and the amount of output are dominated by the number of matching peptides. Neglecting missed cleavages, disulfide bonds, modifications, and upper bounds for the mass of a proposed peptide, the number of proposed peptides is identical to the number N of base peptides obtained from the complete digestion of the protein.
If we admit any number of missed cleavages, the number of proposed peptides rises to N(N + 1)/2, which is the number of different subsequences of adjacent base peptides.39 If we allow for an additional single cross-link, the number of different peptides increases to the order of N^4. Here, we assume that each base peptide can take part in one disulfide bond. Hence, the number of peptides to be constructed increases significantly with the introduction of disulfide bonds. To keep the computational resources manageable, it is thus important to make as much experimental data as possible available to SearchXLinks. If resource consumption is nonetheless too high, for example, if the output comprises thousands of matching peptides, the experiment as such may lack discriminative power.

It should be noted that SearchXLinks uses a general enumeration algorithm that can handle any number of disulfide bonds and makes no assumptions about the topology of the matching peptides. This is in contrast to the algorithms presented by Chen et al.,39 which assume an H-like peptide structure, that is, two amino acid subsequences, each consisting of a linear sequence of adjacent base peptides, connected by a single disulfide bond or cross-link.

(38) Breci, L. A.; Tabb, D. L.; Yates, J. R., III; Wysocki, V. H. Anal. Chem. 2003, 75, 1963-1971.
(39) Chen, T.; Jaffe, J. D.; Church, G. M. J. Comput. Biol. 2001, 8, 571-583.

Practical Considerations. SearchXLinks is an ANSI C batch program (ASCII input/output) that is called from the command line. Hence, SearchXLinks is highly suitable for automatic data processing. Output file formats comprise simple text, HTML, and XML. The program currently runs under UNIX (Linux, Sun) and Windows operating systems and can be ported easily to other platforms. SearchXLinks is freely accessible via input masks at http://www.searchxlinks.de. At the same web site, copies of the program are available under license.

Data Analysis. For this study, data analysis by SearchXLinks was carried out on the basis of monoisotopic masses of singly positively charged ions. The maximal deviation used for matching peptides in the normal mass spectrum was 150 ppm, the maximal deviation used for matching MS/MS fragments was 1.5 Da, and the maximal deviation used for ISD peak screening (cf. eq 1) was 150 ppm. During the generation of peptidic isomers, the number of missed cleavages was not bounded. For each cysteine residue of RNAseA,32 SearchXLinks considered four possible states: unmodified, linked to another cysteine, modified by N-ethylmaleimide (NEM), and modified by hydrolyzed NEM. For BSA,14 cyanylation was taken into account as an additional, fifth state. For both RNAseA and BSA, an optional deamidation of asparagine residues preceding a glycine residue was taken into account, as was an optional formation of a pyroglutamate residue from glutamine residues at peptidic N-termini. With respect to MS/MS analysis, we searched for b- and y-ions and for ions due to the rupture of disulfide bonds. Conditional assignment rules were used for some of the disulfide bond-related fragments, as described above. Since the MS/MS scoring scheme has not yet been calibrated, we used the sum of the intensities of the assigned peaks for the MS/MS ranking of isomers. The complete set of parameters used in the current study is provided in the SearchXLinks parameter files submitted as Supporting Information (S-1, S-2). The analysis was carried out on a Pentium III (550 MHz) personal computer under Linux.
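The counting argument from Computational Resources and the peptide mass filter with the 150 ppm tolerance used in this study can be combined in a small sketch. It covers only linear peptides (runs of adjacent base peptides joined by reinserted peptide bonds, i.e., missed cleavages) and ignores disulfide bonds and modifications. The function names and the base-peptide masses in the example are hypothetical; each reinserted peptide bond is assumed to remove one water molecule, and the ppm deviation is taken relative to the calculated m/z of the singly protonated ion.

```python
WATER = 18.0105647   # monoisotopic mass of H2O (Da)
PROTON = 1.00727646  # mass of a proton (Da)


def linear_peptides(base_masses):
    """Yield (i, j, neutral_mass) for each run base_masses[i..j] (0-based, inclusive).

    Joining k adjacent base peptides reinserts k - 1 peptide bonds (missed
    cleavages), each of which removes one H2O. With no bound on missed
    cleavages, N base peptides give N(N + 1)/2 runs.
    """
    n = len(base_masses)
    for i in range(n):
        mass = 0.0
        for j in range(i, n):
            mass += base_masses[j] - (WATER if j > i else 0.0)
            yield i, j, mass


def mass_filter(base_masses, peaks, tol_ppm=150.0):
    """Hypothetical stage-A-like filter for linear peptides (singly protonated).

    Returns (i, j, peak) for every peak within tol_ppm of the run's [M+H]+;
    as for normal MS data in the text, all peaks within tolerance are kept.
    """
    limit = max(peaks) * (1 + tol_ppm * 1e-6)
    matches = []
    for i in range(len(base_masses)):
        mass = 0.0
        for j in range(i, len(base_masses)):
            mass += base_masses[j] - (WATER if j > i else 0.0)
            mz = mass + PROTON
            if mz > limit:
                # Every base peptide weighs more than H2O, so longer runs
                # starting at i are heavier still: prune them.
                break
            for peak in peaks:
                if abs(peak - mz) / mz * 1e6 <= tol_ppm:
                    matches.append((i, j, peak))
    return matches
```

With three hypothetical base peptides of 500, 700, and 900 Da, `linear_peptides` yields 3(3 + 1)/2 = 6 runs, and a peak list containing 1182.9967 and 901.0073 matches the first two base peptides joined by one missed cleavage and the third base peptide alone. For the 24 base peptides of RNAseA, the same enumeration gives 300 linear peptides before any disulfide bonds or modifications are added. The early `break` is valid only for this simplified model, in which extending a run always increases its mass.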
EXPERIMENTAL METHODS

Details of the sample preparation and of the MS measurements are given in ref 32 for RNAseA and in ref 14 for BSA; here, we give only brief outlines. RNAseA was partially reduced with tris(2-carboxyethyl)phosphine (TCEP),17 modified with NEM,40 and cleaved by a mixture of trypsin and endoproteinase GluC. BSA was partially reduced with TCEP, modified with 1-cyano-4-dimethylaminopyridinium tetrafluoroborate41 and NEM, and cleaved by a mixture of trypsin and chymotrypsin. In both cases, the reaction products were separated by HPLC, and the fractions were subjected to matrix-assisted laser desorption/ionization (MALDI) MS. MS/MS spectra were recorded for selected parent ions using MALDI PSD for RNAseA and MALDI tandem time-of-flight (LIFT TOF/TOF) MS for BSA. The MS spectra analyzed in this study were calibrated externally.

(40) Bures, E. J.; Hui, J. O.; Young, Y.; Chow, D. T.; Katta, V.; Rohde, M. F.; Zeni, L.; Rosenfeld, R. D.; Stark, K. L.; Haniu, M. Biochemistry 1998, 37, 12172-12177.
(41) Wu, J.; Watson, J. T. Protein Sci. 1997, 6, 391-398.

RESULTS AND DISCUSSION

Table 1. RNAseA Peptidic Isomers Proposed by SearchXLinks for m/z 2760.35

rank/isomer | peptide             | disulfide bond | matching ISD fragments: formula (m/z exp)  | assigned MS/MS intensity (%)
1           | (38-49)-S-S-(87-98) | 40-95          | (38-49)-SH (1444.80); (87-98)-SH (1318.66) | 29.3
2           | (34-49)-S-S-(54-61) | 40-58          | -                                          | 13.9
3           | 67(deamidated)-91   | 72-84          | -                                          | 10.7

Table 2. Results of the Three SearchXLinks Runs Carried out for BSA To Analyze m/z 3557.77 and the Associated MS/MS Spectrum(a)

run | ISD filter | no. of matching peptides | no. of consistent isomers | MS/MS ranks of final solutions | assigned intensity of top isomer (%) | amount of output (MB) | CPU time (min) | memory (MB)
1   | none       | 6395                     | 196783                    | 8/9                            | 31.0                                 | 691.00                | 61             | 15
2   | single     | 147                      | 2527                      | 5/6                            | 30.9                                 | 12.00                 | 13             | 1
3   | double     | 1                        | 2                         | 1/2                            | 30.2                                 | 0.03                  | 12             | 1

(a) The two MS/MS ranks specified in column 5 refer to the two isomers considered the final solution. One of these isomers is depicted in Figure 4.

In this section, we illustrate and discuss MS data analysis by SearchXLinks for two selected HPLC fractions, one for RNAseA and one for BSA. For the complete analysis, we refer to ref 32 for RNAseA and to ref 14 for BSA. The RNAseA example represents a simple standard case of analysis, whereas the BSA example
marks the other end of the complexity scale: it can be solved only by employing the full range of SearchXLinks features. The main reason for this difference is protein size. RNAseA has 124 amino acid residues, 8 of which are cysteine residues, giving rise to 24 base peptides upon complete in silico digestion with trypsin/GluC. By contrast, BSA has 583 amino acid residues, 35 of which are cysteine residues, yielding 186 base peptides after complete in silico digestion with trypsin/chymotrypsin. Consequently, the number of peptides that pass the mass filter, i.e., that match a peak of the conventional mass spectrum, increases significantly in the case of BSA, resulting in a high number of false positives if no additional measures such as ISD filtering are taken.

RNAseA. Using the 24 base peptides, SearchXLinks generates all possible peptides up to m/z 3446.22, the highest m/z value observed in the MS spectrum of the selected HPLC fraction. The spectrum comprises a total of 12 peaks, to which SearchXLinks assigns 13 peptides giving rise to 14 isomers; no ISD or MS/MS filtering was applied. To two peaks, m/z 2300.19 and 2760.35, SearchXLinks assigns two and three isomers, respectively. To resolve these ambiguities, MS/MS spectra were recorded and analyzed for these two peaks. For the peak at m/z 2760.35, SearchXLinks proposes three peptidic isomers, which are listed in Table 1 in order of decreasing MS/MS rating. ISD disulfide bond fragmentation was analyzed, but the results were not used for peak or isomer filtering. Since the assigned MS/MS intensities of the proposed isomers differ markedly, one may trust the ranking and consider isomer 1 the final solution. This assessment is supported by the fact that ISD and MS/MS fragments due to disulfide bond rupture can be identified only for isomer 1.
This finding strengthens the case against isomer 2, whereas it is less conclusive with respect to isomer 3, which contains an intrachain disulfide bond; for isomer 3, we would not expect any ISD fragments to be detected in the first place. However, m/z 1444.80 and 1318.66 of the normal mass spectrum can be explained only by the two fragments (38-49)-SH and (87-98)-SH. Thus, we assign m/z 2760.35 to isomer 1; cf. Figure 3. According to the known crystal structure of RNAseA,42 isomer 1 has its disulfide bond placed correctly, whereas the two other isomers feature incorrectly placed disulfide bonds. The analysis of m/z 2300.19 proceeds along similar lines, leading to the assignment of peptide (38-49)-S-S-(92-98) with a disulfide bond between Cys40 and Cys95. Thus, the assignments for m/z 2760.35 and 2300.19 are consistent. The SearchXLinks output holding the complete analysis is provided as Supporting Information (S-3).

Figure 3. RNAseA peptidic isomer 1 assigned to m/z 2760.35. Matching fragment ions of the MS/MS spectrum are indicated by bars (b- and y-ions, disulfide bond fragments). For each ion, the measured m/z is provided. The numbers in brackets represent the intensity ranks of the matching peaks.

(42) http://www.rcsb.org/pdb, code 3RN3.

Figure 4. One of the two final peptidic isomers proposed by SearchXLinks for m/z 3557.77. The numbers shown have the same meaning as in Figure 3.

BSA. We focus on a peak at m/z 3557.77 and the associated MS/MS spectrum. Without ISD filtering, SearchXLinks identifies 6395 peptides matching this single peak, resulting in 196 783 different isomers; cf. run 1 of Table 2.
Since the assigned intensities of the top-ranking isomers are closely spaced, the MS/MS ranking does not provide clear evidence for the correct assignment. Hence, we resort to ISD filtering and repeat the SearchXLinks run with an ISD single filter; that is, we accept only isomers with at least one fragmenting disulfide bond whose two ISD fragment ions can both be matched by peaks in the peak list. This reduces the number of possible solutions considerably (2527 isomers of 147 peptides; run 2 of Table 2), but the MS/MS ranking still does not yield a clear answer with respect to the correct isomer. Since the amino acid sequence of BSA contains eight pairs of adjacent cysteine residues, peptides with two disulfide bonds may occur as well. Hence, we try to reduce the number of possible solutions even further by employing an ISD double filter: only those isomers pass that show at least two fragmenting disulfide bonds for which all four fragment ions can be identified in the peak list. It turns out that there is indeed a single peptide, with two possible isomers, that is consistent with this boundary condition (run 3 of Table 2). We accept these two isomers as the final solution; see Figure 4 for one of them. The complete output of run 3 is provided as Supporting Information (S-4).

The two isomers proposed by SearchXLinks for m/z 3557.77 differ with respect to the location of the two disulfide bonds: in isomer 1, they are located at Cys123-Cys167 and Cys168-Cys176; in isomer 2, which is depicted in Figure 4, they are located at Cys123-Cys168 and Cys167-Cys176. To distinguish between these two isomers, the backbone must be cleaved between Cys167 and Cys168. This can be accomplished chemically, for example, with the methodology developed by Watson and co-workers,16 or by backbone fragmentation in MS/MS experiments.
Since we used a mixture of trypsin and chymotrypsin to digest BSA, neither of which usually cleaves between cysteine residues, we rely on MS/MS fragmentation in the current study. As shown in Figure 4, SearchXLinks indeed assigns m/z 2081.3 to a y6-ion derived from fragmentation between Cys167 and Cys168. However, for the other isomer, SearchXLinks assigns m/z 2167.2 to a b7-ion derived from fragmentation at the same site. We think that one of these two assignments is accidental. In addition, there is no pronounced difference in assigned intensity between the two isomers (30.2% for isomer 1, 29.9% for isomer 2). Thus, we currently cannot distinguish between these two isomers. Given that none of the isomers proposed in run 1 of Table 2 accounts for more than 31% of the total intensity or, in terms of assigned peaks, for more than 19 of 55 MS/MS peaks, implementing additional fragmentation pathways may make a distinction possible. In particular, internal ions released after two bond fissions may be helpful in this respect if their intensity is not too low.9,11,13,30

We also tried to improve the MS/MS ranking using different figures of merit. For example, we ran tests using the MS/MS scoring scheme with an ad hoc parameter set. However, the results deteriorated significantly in almost all cases. One reason for this behavior is the following: Due to the cleavage specificity of the enzymes used for digestion, only a few types of amino acid residues occur at the C-termini of the linear subsequences held together by disulfide bonds. If such a C-terminal residue is cleaved off during MS/MS fragmentation, and if the resulting fragments are observed in the MS/MS spectrum, SearchXLinks will detect a match for each linear subsequence ending in that residue. Figure 4 demonstrates this for Lys173 and Lys180. Note that SearchXLinks' standard MS/MS fragmentation rules exclude the Pro179-Lys180 bond because of its low fragmentation probability.38 However, if Pro179 were replaced by another amino acid, m/z 3411.5 and 148.2 would each be assigned twice. In an extreme case, two identical linear subsequences are linked by a single disulfide bond at equivalent cysteine residues; then every matching backbone fragmentation is counted twice. This phenomenon favors peptides containing several linear subsequences.
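The double counting described above can be illustrated by contrasting a fragment-oriented rating, in which every matched fragment adds the intensity of its peak, with a peak-oriented rating, in which each peak contributes at most once. The Python sketch below is a toy illustration under stated assumptions: the function names, the 0.5 m/z tolerance, and the normalization by total ion intensity are hypothetical, and SearchXLinks' actual scoring is not reproduced here.

```python
def matching_peaks(fragment_mz: list[float], peaks: list[tuple[float, float]],
                   tol: float) -> list[list[int]]:
    """For each predicted fragment m/z, the indices of peaks within +/- tol."""
    return [[i for i, (mz, _) in enumerate(peaks) if abs(mz - f) <= tol]
            for f in fragment_mz]


def fragment_oriented_rating(fragment_mz: list[float],
                             peaks: list[tuple[float, float]],
                             tol: float = 0.5) -> float:
    """Each matched fragment adds its peak's intensity, so a peak explained by
    two identical fragments is counted twice."""
    total = sum(intensity for _, intensity in peaks)
    assigned = sum(peaks[i][1]
                   for hits in matching_peaks(fragment_mz, peaks, tol)
                   for i in hits)
    return assigned / total


def peak_oriented_rating(fragment_mz: list[float],
                         peaks: list[tuple[float, float]],
                         tol: float = 0.5) -> tuple[float, int]:
    """Each peak contributes at most once. Returns (fraction of total intensity
    assigned, number of assigned peaks)."""
    total = sum(intensity for _, intensity in peaks)
    assigned = {i for hits in matching_peaks(fragment_mz, peaks, tol) for i in hits}
    return sum(peaks[i][1] for i in assigned) / total, len(assigned)


# Toy spectrum of (m/z, intensity) pairs with made-up values.
peaks = [(150.0, 10.0), (500.0, 30.0), (3400.0, 20.0), (900.0, 40.0)]
# Two linear subsequences ending in the same residue predict the same
# fragment masses twice (150.0 and 3400.0 each appear twice).
fragments = [150.0, 3400.0, 150.0, 3400.0, 500.0]
print(fragment_oriented_rating(fragments, peaks))  # 0.9
print(peak_oriented_rating(fragments, peaks))      # (0.6, 3)
```

In the toy case the fragment-oriented rating credits 90% of the intensity although the three matched peaks carry only 60%; with enough duplicated subsequences it could even exceed 100%, which is the bias toward peptides with several linear subsequences.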
The effect disappears if we switch from a fragment-oriented rating to a peak-oriented rating, as is done in the current study. In a peak-oriented rating, the peaks assigned to m/z 3411.5 and 148.2 contribute only once to the rating of the isomer. Overall, a careful calibration of parameters, possibly including peak intensities,43,44 seems to be mandatory for a successful application of our MS/MS scoring scheme.

(43) Havilio, M.; Haddad, Y.; Smilansky, Z. Anal. Chem. 2003, 75, 435-444.
(44) Zhang, Z. Anal. Chem. 2004, 76, 3908-3922.

CONCLUSIONS AND PERSPECTIVES

In this study, we presented the program SearchXLinks and used it to elucidate the structure of complex peptides containing more than one disulfide bond. A key element is the combined analysis of ISD and MS/MS data, including a careful consideration of potential correlations hidden in the data. These correlations help to separate true structural candidates from a large number of false positives. Thus, structural complexity does not always represent a hurdle to elucidation19 but can also be an asset, because it allows for highly selective filters. It may be possible to transfer the strategies demonstrated here directly to the analysis of peptides carrying cross-linkers with labile bonds,35 including cross-linkers that contain disulfide bonds. There are many ways in which SearchXLinks can be improved. In particular, MS/MS analysis should be complemented by fragmentation rules that allow for the rupture of more than one bond.30 In addition, a calibration of the MS/MS scoring scheme is desirable. SearchXLinks should also be able to analyze experiments in which the protein is digested in a mixture of 16O- and 18O-water.12,33

All these enhancements would increase SearchXLinks' power to discriminate between true and false positive structural candidates. At the same time, these additional features would make SearchXLinks more useful for the analysis of cross-linking experiments. One may also think of a rating scheme that takes both ISD and MS/MS data into account, thus replacing the independent and strictly binary decisions of ISD filtering. However, such a scheme may be demanding in terms of computational resources, cf. run 1 of Table 2: While output and memory can be limited by keeping only the N top-ranking solutions, limiting CPU time may prove difficult.
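The asymmetry between memory and CPU time can be made concrete with a bounded min-heap, a standard way to retain only the N best-scoring items of a stream. The Python sketch below is a generic illustration, not SearchXLinks' implementation; the function name `keep_top_n` and the candidate labels and scores are hypothetical placeholders.

```python
import heapq
from collections.abc import Iterable


def keep_top_n(candidates: Iterable[tuple[str, float]],
               n: int) -> tuple[list[tuple[str, float]], int]:
    """Keep only the n best-scoring candidates while streaming over all of them.

    The heap never holds more than n entries, but every candidate is still
    visited (and, in a real program, scored), so run time grows with the
    total number of candidates. Returns (best-first list, candidates seen).
    """
    heap: list[tuple[float, str]] = []  # min-heap: weakest retained solution on top
    seen = 0
    for label, score in candidates:
        seen += 1
        if len(heap) < n:
            heapq.heappush(heap, (score, label))
        elif heap and score > heap[0][0]:
            heapq.heapreplace(heap, (score, label))  # evict the weakest solution
    best = sorted(heap, reverse=True)
    return [(label, score) for score, label in best], seen


ranked, seen = keep_top_n([("a", 0.1), ("b", 0.5), ("c", 0.3), ("d", 0.9)], n=2)
print(ranked, seen)  # [('d', 0.9), ('b', 0.5)] 4
```

Memory stays bounded by N regardless of how many isomers are generated, but the loop cannot skip a candidate without scoring it first, which is why capping output and memory is easy while capping CPU time is not.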

SUPPORTING INFORMATION AVAILABLE

S-1. SearchXLinks input for the analysis of RNase A MS data, cf. Table 1.
S-2. SearchXLinks input for the analysis of BSA MS data, cf. run 3 of Table 2.
S-3. SearchXLinks output corresponding to S-1.
S-4. SearchXLinks output corresponding to S-2.

This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review September 13, 2005. Accepted December 14, 2005. AC051634X
