Letter pubs.acs.org/ac
To Be or Not to Be? Five Guidelines to Avoid Misassignments in Cross-Linking/Mass Spectrometry
Claudio Iacobucci and Andrea Sinz*
Institute of Pharmacy, Martin Luther University Halle-Wittenberg, Wolfgang-Langenbeck-Strasse 4, D-06120 Halle (Saale), Germany
ABSTRACT: The number of publications in the field of chemical cross-linking/mass spectrometry (MS) for deriving protein 3D structures and for probing protein/protein interactions has increased considerably in recent years. MS analysis of large cross-linking data sets requires automated data analysis by dedicated software tools, but applying scoring procedures with statistical methods does not eliminate the fundamental problem of misassigned cross-linked products. In fact, we have observed a significant rate of misassigned cross-links in a number of publications, mainly due to the presence of isobaric cross-linked species, incomplete fragmentation of cross-linked products, and fragment ion data of low mass accuracy. These false assignments will eventually lead to wrong conclusions on the structural information derived from chemical cross-linking/MS experiments. In this contribution, we examine the most common sources for misassigning cross-linked products. We propose and discuss rational criteria and suggest five guidelines that might be followed for a reliable and unambiguous identification of cross-links, independent of the software used for data analysis. In the interest of the cross-linking/MS approach, it should be ensured that only high-quality data enter the structural biology literature. Clearly, there is an urgent need to define common standards for data analysis and reporting formats of cross-linked products.
In chemical cross-linking, 3D structural information on a protein or a protein complex is obtained by the insertion of a chemical cross-linker between two functional groups within a protein.1,2 The cross-linker has a defined length and is connected via covalent bonds to functional groups of amino acid side chains. It therefore acts as a kind of “molecular ruler”3 and imposes a distance constraint on the structure of a protein or a protein assembly. Cross-linked amino acids are subsequently identified by MS, usually after enzymatic digestion of the protein or the protein complex.4−7 The rapid development in mass spectrometric instrumentation and protocols as well as the availability of relatively easy-to-use software for analyzing large amounts of cross-linking data has boosted the interest in the cross-linking/MS approach, which is now becoming accepted in structural biology.
In line with the steadily increasing interest in the technique, cross-linking/MS is currently moving toward a large-scale analysis of cross-linking data. This is an important development and will bring the cross-linking/MS field to maturity. Nevertheless, the community should be aware of the risk of accepting unsubstantiated results if this important step forward is made without adequate precautions. With the need for a fast and automated assignment of the maximum number of cross-links, we should avoid repeating the mistakes that were made in the proteomics field at the dawn of the new millennium.8,9 The main obstacle on the path to a complete affirmation of the cross-linking/MS technique remains the lack of generally accepted guidelines regarding cross-linking reagents, experimental conditions, data analysis, and reporting formats. The call for contributing to this difficult harmonization process is open and contributions from the entire community will be highly valuable.
Here, we address the challenges involved in the automated data analysis of cross-linked products. Although automated data analysis is crucial for analyzing large cross-linking data sets, MS/MS data of putative cross-linked peptides must still be approached with caution. Applying filtering, scoring, and threshold parameters with statistical methods does not eliminate the fundamental problem of misassigned cross-linked products. In particular, the availability of a large number of different software tools for cross-link assignment and the lack of universally accepted identification criteria pose significant challenges for authors, reviewers, and readers in the validation and reproduction of results. This does not imply that the situation is running out of control. Indeed, most of the assignments of cross-linked peptides are likely to be correct and the resulting geometrical constraints have proven highly valuable for deriving meaningful structural data. Nevertheless, in a number of publications we found a non-negligible percentage of misassigned cross-links, often beyond a given false discovery rate (FDR), putting the reliability of the overall outcome into question. These misassignments arise from a misinterpretation of data, mainly due to the presence of isobaric cross-linked species, incomplete fragmentation of cross-linked products, and fragment ion data of low mass accuracy. These false assignments will eventually lead to wrong conclusions regarding
Received: June 15, 2017 Accepted: July 19, 2017 Published: July 19, 2017
DOI: 10.1021/acs.analchem.7b02316 Anal. Chem. XXXX, XXX, XXX−XXX
Analytical Chemistry
induce two misleading conclusions: (i) If the two putatively cross-linked amino acids are separated by several residues, an artificial constraint will be introduced. This is a shaky foundation for subsequent modeling and will lead to wrong structural assumptions. (ii) If the two putatively cross-linked amino acids are separated by only a few residues, the resulting geometrical constraint would have a low impact on modeling anyway. In that case, the two residues are close to each other and their distance will be consistent with the length of common cross-linkers. Case ii may nevertheless have a detrimental effect in fundamental protein structural studies, such as the evaluation of a possible perturbation of the protein 3D structure induced by cross-linking. In these cases, the cross-linking experiments are performed using a protein with a well-known 3D structure, measuring the expected distances between cross-linked residues. Subsequently, the overlap of the distance distributions of false positive cross-links, overlength cross-links, and decoys is evaluated. The resulting distributions may be dramatically affected by “dead-end” products, leading to an underestimation of the FDR or the presence of misfolded protein. As high-throughput data analysis may not efficiently distinguish between isobaric species as outlined in Figure 1B, it is crucial to carefully re-examine the respective cross-links. Our first suggested guideline (guideline 1) states that the assignment of cross-links should always be supported by at least one characteristic diagnostic fragment ion for both cross-linked peptides. One example is presented in Figure 2A, where a cross-link is assigned between the consecutive amino acid sequences 5−15 and 16−19 of the bMunc 13-2 protein. The cross-link was automatically assigned between Lys-14 and Ser-17, but there are no indicative fragment ions that unambiguously prove the existence of a true cross-link.
In fact, all fragment ions are also in perfect agreement with a peptide composed of amino acids 5−19, modified at Lys-14 with a partially hydrolyzed cross-linker (“dead-end” or type 0 cross-link). From our experience and from screening published cross-linking data, we assume that this type of misassignment is the most frequent one. According to guideline 1, the cross-link reported in Figure 2A should be discarded. The problem of generating isobaric species as outlined in Figure 1B, observed for the currently most commonly used NHS cross-linkers BS3 and BS2G, can be overcome by the recently developed MS-cleavable cross-linkers. These novel linkers, such as disuccinimidyl sulfoxide (DSSO)11 and disuccinimidyl dibutyric urea (DSBU or BuUrBu),12 are now commercially available and drastically reduce the potential of identifying false-positive cross-links. MS-cleavable linkers take the cross-linking/MS strategy to a new level by creating characteristic marker ions upon fragmentation in the gas phase. As the identification of cross-linked products relies on unique signatures in the fragment ion mass spectra, the search space of cross-links is reduced from n² to 2n (n being the number of candidate peptides). Therefore, MS-cleavable linkers are paving the way for a fully automated analysis of cross-linked products, even for whole proteomes.7,13
Incomplete Fragmentation of Cross-Linked Products. It is frequently observed in cross-linking experiments that only one of the connected peptides is thoroughly sequenced, while there are no fragmentation data available for the other peptide. Since different combinations of cross-linked peptides might yield isobaric species, it is essential to determine the intact masses of cross-linked peptides and to thoroughly sequence both peptides.
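One way to make "thoroughly sequenced" operational is to count, for each of the two peptides separately, how many backbone bonds are explained by at least one matched b or y ion. The following is a minimal sketch of such a check, not the procedure of any particular software: the `FragmentMatch` record, the function names, and the `min_coverage` threshold of 0.5 are all illustrative assumptions and are not taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentMatch:
    """One annotated fragment ion (hypothetical record layout)."""
    peptide: str  # "alpha" or "beta"
    series: str   # "b" or "y"
    index: int    # ion number, 1 .. len(sequence) - 1


def backbone_coverage(length, matches, peptide):
    """Fraction of the (length - 1) backbone bonds covered by a b or y ion."""
    if length < 2:
        return 0.0
    covered = set()
    for m in matches:
        if m.peptide != peptide or not 1 <= m.index < length:
            continue
        # b_i and y_(length - i) report on the same backbone bond.
        bond = m.index if m.series == "b" else length - m.index
        covered.add(bond)
    return len(covered) / (length - 1)


def both_peptides_sequenced(len_alpha, len_beta, matches, min_coverage=0.5):
    """Accept only if BOTH peptides reach the (assumed) coverage threshold."""
    cov_a = backbone_coverage(len_alpha, matches, "alpha")
    cov_b = backbone_coverage(len_beta, matches, "beta")
    return (cov_a >= min_coverage and cov_b >= min_coverage), cov_a, cov_b


# Toy case: an 11-residue peptide with good fragmentation, and a 4-residue
# partner that delivers no fragment ions at all.
matches = [FragmentMatch("alpha", "b", i) for i in range(2, 7)]
matches += [FragmentMatch("alpha", "y", i) for i in range(3, 9)]
ok, cov_alpha, cov_beta = both_peptides_sequenced(11, 4, matches)
print(ok, round(cov_alpha, 2), round(cov_beta, 2))
```

In this toy case the first peptide is well covered while the second has no fragment ions, so the candidate is rejected regardless of how high a combined score might be. The threshold would need to be chosen and justified for each study.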
the structural data derived from chemical cross-linking/MS experiments. In the following, we describe the most common sources of cross-link misassignments that everyone working in the field of chemical cross-linking should be aware of. To this end, we propose five guidelines that we believe deserve serious consideration in every cross-linking study. Any improvement of these initially suggested guidelines would be highly appreciated by the structural proteomics community. Clearly, there is an urgent need to define standards of data analysis in cross-linking mass spectrometry, paving the way toward more reliable and reproducible results.
Isobaric Cross-Linked Species. Currently, the most commonly used cross-linking reagents are N-hydroxysuccinimide (NHS) esters, such as bissulfosuccinimidylsuberate (BS3) or bissulfosuccinimidylglutarate (BS2G). The advantages of NHS esters are their simple application and their reactivity mainly toward lysine residues. However, it is often overlooked that cross-links composed of consecutive amino acid sequences are isobaric to peptides that are modified with a partially hydrolyzed cross-linker (type 0 or “dead-end” cross-links) involving the same amino acid sequences (Figure 1). From this,
Figure 1. (A) Types of cross-linked species. The nomenclature of cross-linked products is not unified yet although a systematic nomenclature (type 0, 1, and 2 cross-links) has been proposed by Schilling et al.10 (B) Cross-linked products composed of consecutive sequences cannot be distinguished from a peptide that is modified by a partially hydrolyzed cross-linker (type 0 or “dead end” cross-link) involving the identical amino acids. Both species are isobaric.
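The isobaric relationship in Figure 1B follows from simple bookkeeping: joining two consecutive peptides into one uncleaved peptide removes one H2O, and the hydrolyzed linker carries exactly one H2O more than the bridging linker. The sketch below works this out for a suberate linker such as BS3 (bridge C8H10O2, hydrolyzed form C8H12O3). The two peptide sequences are hypothetical and chosen only for illustration, and the residue table covers only the residues used here.

```python
# Monoisotopic element masses in u.
C, H, N, O, S = 12.0, 1.00782503207, 14.0030740048, 15.99491461956, 31.97207100


def formula_mass(c=0, h=0, n=0, o=0, s=0):
    """Monoisotopic mass of an elemental composition."""
    return c * C + h * H + n * N + o * O + s * S


WATER = formula_mass(h=2, o=1)

# Residue compositions as (C, H, N, O, S); only residues used below.
RESIDUES = {
    "A": (3, 5, 1, 1, 0),
    "E": (5, 7, 1, 3, 0),
    "G": (2, 3, 1, 1, 0),
    "K": (6, 12, 2, 1, 0),
    "L": (6, 11, 1, 1, 0),
    "R": (6, 12, 4, 1, 0),
    "S": (3, 5, 1, 2, 0),
    "V": (5, 9, 1, 1, 0),
}

XL_BRIDGE = formula_mass(c=8, h=10, o=2)      # linker bridging two residues
XL_HYDROLYZED = formula_mass(c=8, h=12, o=3)  # one end reacted, one hydrolyzed


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return WATER + sum(formula_mass(*RESIDUES[aa]) for aa in seq)


def type2_mass(pep_a, pep_b):
    """Two separate peptides connected by the linker (type 2 cross-link)."""
    return peptide_mass(pep_a) + peptide_mass(pep_b) + XL_BRIDGE


def type0_mass(pep):
    """One peptide carrying a partially hydrolyzed linker ("dead-end")."""
    return peptide_mass(pep) + XL_HYDROLYZED


pep_a, pep_b = "AGLKVR", "ESGK"  # hypothetical consecutive sequences
cross_link = type2_mass(pep_a, pep_b)
dead_end = type0_mass(pep_a + pep_b)
print(f"type 2: {cross_link:.5f} u")
print(f"type 0: {dead_end:.5f} u")
print(f"difference: {abs(cross_link - dead_end):.1e} u")
```

Because the two species have the same elemental composition, no improvement in precursor mass accuracy can separate them; only fragment ions specific to one of the two structures can.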
one of the most common sources of misassignment arises: Cross-linked consecutive amino acid sequences in a protein cannot be discriminated from “dead-end” cross-links unless respective characteristic and indicative fragment ions are identified. When checking published cross-link assignments, we found that in many cases an assigned cross-link, composed of consecutive amino acid sequences, is in fact most likely just a peptide that is modified by a partially hydrolyzed cross-linker, a “dead-end” cross-link. These products are usually extensively sequenced and therefore their scores are overestimated, independently of the software used. These products are also frequently misassigned when employing photoactivatable cross-linkers, such as sulfo-NHS-diazirine (sulfo-SDA) and L-photomethionine. It is therefore crucial to closely examine cross-linked products that are composed of consecutive amino acid sequences. As outlined above, there is a high probability that these putative cross-links are in fact just “dead-end” products. If one uses such a falsely assigned cross-link for a subsequent structural modeling of a protein or a protein complex, this may
Figure 2. (A) Cross-links composed of consecutive amino acid sequences cannot be distinguished from peptides that are modified by a partially hydrolyzed cross-linker (type 0 or “dead end” cross-link) as they are isobaric species. (B) One of the cross-linked peptides does not deliver any fragment ions. (C) Incomplete fragmentation does not allow pinpointing the exact cross-linking site. All spectra have been assigned by the StavroX software.19
An example of this type of misassignment is shown in Figure 2B, where fragment ions are obtained from one of the cross-linked peptides, while there are no fragments available from the other peptide. Therefore, this cross-linked product has to be considered ambiguous, as determining the accurate mass of the cross-linked product alone is not sufficient to unambiguously derive the amino acid composition of the cross-linked peptides. Another example where the connected amino acid could not be unambiguously assigned is presented in Figure 2C. Here, the fragment ions observed do not allow pinpointing the exact cross-linking sites of the NHS ester, which can potentially react with lysine, serine, or threonine residues in the peptide. Therefore, we derive as guideline 2 that both peptides have to be thoroughly sequenced and that cross-linked products should be discarded if no reliable sequence information is available from one of the connected peptides.
Low Mass Accuracy Fragment Ion Data. Rapid technological advances have made high-resolution MS and MS/MS measurements available on a routine basis, which has had a major impact on cross-linking/MS. The presence of various cross-linked species significantly increases the complexity of the peptide mixture generated by enzymatic digestion. Therefore, coeluting species are more frequently observed than in conventional proteomics studies. Moreover, nearly all cross-linked products exceed the mass limit of 500−600 u below which the amino acid composition can be determined unambiguously from accurate mass measurements.14−16 Also, the isomeric amino acids leucine and isoleucine cannot be distinguished. The resolution requirement at the MS/MS level is less stringent than that at the MS level due to the limited number of possible amino acid masses for a given precursor ion. In the case of cross-linked peptides, however, increased sample complexity and the variable number of peptide modifications require higher mass measurement accuracies at the MS/MS level compared to simple enzymatic peptide mixtures in proteomics.17,18 Therefore, recording MS and MS/MS data with high resolving power and high mass measurement accuracy is of pivotal importance to reduce the number of false positive cross-links. We suggest that MS data should be recorded with a resolving power of at least 30 000 for a measured m/z. For MS/MS data, the resolving power should be at least 10 000 for the most intense fragment ion (guideline 3). These requirements are fulfilled by different mass analyzers, such as orbitrap or time-of-flight. As far as the data analysis is concerned, a mass measurement accuracy of 1 ppm in MS mode can exclude 99% of possible peptides, even in m/z intervals with a high density of candidates, and therefore ensures a high degree of confidence.14 Guideline 3 allows sufficiently stringent parameters to be set for the software tools that are currently used for analyzing cross-linking data. We propose to allow mass tolerances of up to 5 ppm for precursor ions (MS level) and 10 ppm for product ions (MS/MS level) (guideline 4). Another aspect to be considered is a false precursor mass determination based on a misassignment of monoisotopic ions. Often, M + 1 or M + 2 peaks will be selected, especially for higher charge states. One should keep in mind that precursor masses might be wrong in uncorrected mgf files. This latter aspect cannot be properly addressed by guideline 4, but the user should carefully check the automatic peak picking procedure to rule out a potential misassignment of monoisotopic ions.
Unassigned Peaks and Signal-to-Noise Ratio. The spectrum quality and the number of unassigned signals are also important issues that require special attention. If only a few
(8) Carr, S.; Aebersold, R.; Baldwin, M.; Burlingame, A.; Clauser, K.; Nesvizhskii, A. Mol. Cell. Proteomics 2004, 3, 531.
(9) Bradshaw, R. A.; Burlingame, A. L.; Carr, S.; Aebersold, R. Mol. Cell. Proteomics 2006, 5, 787.
(10) Schilling, B.; Row, R. H.; Gibson, B. W.; Guo, X.; Young, M. M. J. Am. Soc. Mass Spectrom. 2003, 14, 834.
(11) Kao, A. H.; Chiu, C. L.; Vellucci, D.; Yang, Y. Y.; Patel, V. R.; Guan, S. H.; Randall, A.; Baldi, P.; Rychnovsky, S. D.; Huang, L. Mol. Cell. Proteomics 2011, 10, M110.002212.
(12) Muller, M. Q.; Dreiocker, F.; Ihling, C. H.; Schafer, M.; Sinz, A. Anal. Chem. 2010, 82, 6958.
(13) Arlt, C.; Götze, M.; Ihling, C. H.; Hage, C.; Schäfer, M.; Sinz, A. Anal. Chem. 2016, 88, 7930.
(14) Zubarev, R. A.; Hakansson, P.; Sundqvist, B. Anal. Chem. 1996, 68, 4060.
(15) Smith, R. D. Int. J. Mass Spectrom. 2000, 200, 509.
(16) Spengler, B. J. Am. Soc. Mass Spectrom. 2004, 15, 703.
(17) Zubarev, R.; Mann, M. Mol. Cell. Proteomics 2007, 6, 377.
(18) Mann, M.; Kelleher, N. L. Proc. Natl. Acad. Sci. U. S. A. 2008, 105, 18132.
(19) Götze, M.; Pettelkau, J.; Schaks, S.; Bosse, K.; Ihling, C. H.; Krauth, F.; Fritzsche, R.; Kühn, U.; Sinz, A. J. Am. Soc. Mass Spectrom. 2012, 23, 76.
signals in the tandem mass spectrum are assigned to a specific cross-link, while the majority of signals are not assigned, this cross-link should not be trusted. This point seems to be quite obvious, but when we screened published cross-linking data, we often encountered automated cross-link assignments based on a few minor signals in the fragment ion mass spectra. We therefore propose as guideline 5 that a cross-linked product should only be accepted if the majority of fragment ions can be assigned and that spectra with low signal-to-noise ratios should be discarded. This aspect is even more important than for single peptide identifications in standard proteomics workflows due to the usually low abundance of cross-linked species.
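The tolerance of guideline 4 and the "majority assigned" criterion of guideline 5 can be combined into a simple post hoc check on any reported spectrum. The sketch below is only an illustration under stated assumptions: the peak list and theoretical fragment m/z values are invented, the function names are hypothetical, and reading "majority" as more than half of the peaks and of the summed intensity is our interpretation. Signal-to-noise filtering is not shown.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def explained_fraction(peaks, theoretical_mzs, tol_ppm=10.0):
    """Return (fraction of peaks assigned, fraction of intensity assigned).

    peaks: list of (m/z, intensity) tuples from one MS/MS spectrum.
    theoretical_mzs: fragment m/z values predicted for the candidate cross-link.
    tol_ppm: product ion tolerance; 10 ppm follows guideline 4.
    """
    total_intensity = sum(intensity for _, intensity in peaks)
    n_assigned = 0
    intensity_assigned = 0.0
    for mz, intensity in peaks:
        if any(abs(ppm_error(mz, t)) <= tol_ppm for t in theoretical_mzs):
            n_assigned += 1
            intensity_assigned += intensity
    return n_assigned / len(peaks), intensity_assigned / total_intensity


def passes_majority_rule(peaks, theoretical_mzs, tol_ppm=10.0):
    """Assumed reading of guideline 5: more than half assigned on both measures."""
    peak_frac, intensity_frac = explained_fraction(peaks, theoretical_mzs, tol_ppm)
    return peak_frac > 0.5 and intensity_frac > 0.5


# Invented spectrum: the candidate explains only two minor signals.
peaks = [
    (300.1550, 1000.0),
    (415.2200, 800.0),
    (530.2900, 50.0),
    (612.3301, 40.0),
    (701.4000, 30.0),
]
theoretical = [530.2903, 612.3298]
peak_frac, intensity_frac = explained_fraction(peaks, theoretical)
accepted = passes_majority_rule(peaks, theoretical)
print(peak_frac, intensity_frac, accepted)
```

Reporting both fractions is a deliberate choice: a candidate that explains many low-intensity peaks but none of the dominant signals is just as suspicious as one that explains very few peaks.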
■ CONCLUSIONS
To avoid frequent misassignments of cross-links in automated data analysis, everyone working in the field of cross-linking/MS should carefully re-examine cross-links composed of consecutive amino acid sequences. Only if the presence of a “dead-end” (type 0) cross-link can definitely be ruled out based on characteristic and specific fragment ions should the respective cross-link be included. Also, MS and MS/MS data should be recorded with high mass measurement accuracy and high resolution, and nearly complete fragment ion series should be obtained from both cross-linked peptides. Here, we propose a first set of guidelines for a reliable data analysis, and we advise validating the automatically assigned cross-links, independent of the software employed. The next step should be to define common standards for acquiring, interpreting, and reporting data in cross-linking/MS.
■ AUTHOR INFORMATION
Corresponding Author
*Phone: +49-345-5525170. Fax: +49-345-5527026. E-mail: [email protected].
ORCID
Andrea Sinz: 0000-0003-1521-4899
Author Contributions
The manuscript was written through contributions of all authors. All authors have given approval to the final version of the manuscript.
Funding
A.S. is funded by the DFG (Project Si 867/15-2) and the state of Saxony-Anhalt. C.I. is funded by a postdoctoral fellowship from the Alexander von Humboldt Foundation.
Notes
The authors declare no competing financial interest.
■ REFERENCES
(1) Sinz, A. Mass Spectrom. Rev. 2006, 25, 663.
(2) Rappsilber, J. J. Struct. Biol. 2011, 173, 530.
(3) Green, N. S.; Reisler, E.; Houk, K. N. Protein Sci. 2001, 10, 1293.
(4) Kalisman, N.; Adams, C. M.; Levitt, M. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 2884.
(5) Leitner, A.; Joachimiak, L. A.; Bracher, A.; Monkemeyer, L.; Walzthoeni, T.; Chen, B.; Pechmann, S.; Holmes, S.; Cong, Y.; Ma, B. X.; Ludtke, S.; Chiu, W.; Hartl, F. U.; Aebersold, R.; Frydman, J. Structure 2012, 20, 814.
(6) Herzog, F.; Kahraman, A.; Boehringer, D.; Mak, R.; Bracher, A.; Walzthoeni, T.; Leitner, A.; Beck, M.; Hartl, F. U.; Ban, N.; Malmstrom, L.; Aebersold, R. Science 2012, 337, 1348.
(7) Liu, F.; Rijkers, D. T. S.; Post, H.; Heck, A. J. R. Nat. Methods 2015, 12, 1179.