Xilmass: A New Approach toward the Identification of Cross-Linked


Article

Xilmass: a new approach towards the identification of cross-linked peptides
Şule Yılmaz, Friedel Drepper, Niels Hulstaert, Maša Černič, Kris Gevaert, Anastassios Economou, Bettina Warscheid, Lennart Martens, and Elien Vandermarliere
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b01585 • Publication Date (Web): 19 Sep 2016
Downloaded from http://pubs.acs.org on September 20, 2016


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Xilmass: a new approach towards the identification of cross-linked peptides

Şule Yılmaz1,2,3, Friedel Drepper4,5, Niels Hulstaert1,2,3, Maša Černič6,7, Kris Gevaert1,2, Anastassios Economou8,9, Bettina Warscheid4,5, Lennart Martens1,2,3*, Elien Vandermarliere1,2,3

1 Medical Biotechnology Center, VIB, 9000 Ghent, Belgium
2 Department of Biochemistry, Ghent University, 9000 Ghent, Belgium
3 Bioinformatics Institute Ghent, Ghent University, 9000 Ghent, Belgium
4 Department of Biochemistry and Functional Proteomics, Institute of Biology II, Faculty of Biology, University of Freiburg, 79104 Freiburg, Germany
5 BIOSS Centre for Biological Signaling Studies, University of Freiburg, 79104 Freiburg, Germany
6 Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Jamova cesta 39, 1000 Ljubljana, Slovenia
7 Faculty of Medicine, University of Ljubljana, 1000 Ljubljana, Slovenia
8 KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, 3000 Leuven, Belgium
9 Institute of Molecular Biology and Biotechnology-FoRTH and Department of Biology-University of Crete, Iraklio, 71100 Crete, Greece

* Corresponding author: Prof. Dr. Lennart Martens, A. Baertsoenkaai 3, 9000 Ghent, Belgium. E-mail: [email protected] Tel: +3292649358 Fax: +32-92649484


ABSTRACT

Chemical cross-linking coupled with mass spectrometry plays an important role in unravelling protein interactions, especially weak and transient ones. Moreover, cross-linking complements several structure determination approaches such as cryo-EM. Although several computational approaches are available for the annotation of spectra obtained from cross-linked peptides, there remains room for improvement. Here, we present Xilmass, a novel algorithm to identify cross-linked peptides that introduces two new concepts: (i) the cross-linked peptides are represented in the search database such that the cross-linking sites are explicitly encoded, and (ii) the scoring function derived from the Andromeda algorithm was adapted to score against a theoretical MS/MS spectrum that contains the peaks from all possible fragment ions of a cross-linked peptide pair. The performance of Xilmass was evaluated against the recently published Kojak and the popular pLink algorithms on a calmodulin-plectin complex data set, as well as three additional, published data sets. The results show that Xilmass typically had the highest number of identified distinct cross-linked sites and also the highest number of predicted cross-linked sites.


INTRODUCTION

Chemical cross-linking coupled with mass spectrometry (XL-MS) has gained importance in the study of protein structures and their interactions1,2,3,4. It makes it possible to study proteins that are inaccessible to conventional structural biology techniques1,5,6, and it also complements other approaches, especially cryo-electron microscopy (cryo-EM)7–10. As such, XL-MS is an important player in the integrative approach of modern structural biology1,11,12.

A typical XL-MS experiment starts by covalently linking residues either within or between proteins with a chemical cross-linker that has known reactive groups at its ends and a known length, which provides further analyses13,14 with a distance constraint. The cross-linked protein sample is subsequently proteolytically digested, and the resulting peptide mixture, which is composed of cross-linked and non-cross-linked peptides, can be further purified chromatographically. The peptides are finally analyzed by liquid chromatography tandem mass spectrometry (LC-MS/MS)14,15.

The identification of cross-linked peptides from the acquired MS/MS spectra is a computationally challenging task for several reasons13. First, the computational search space increases quadratically because each of the possible peptide-linked-to-peptide combinations has to be represented in the database13,14,16. Therefore, the majority of studies focus primarily on the analysis of purified protein complexes rather than entire proteomes17. Second, MS/MS spectra of cross-linked peptides are typically more complex than MS/MS spectra of single peptides13: cross-linked spectra contain fragment ions derived from (combinations of) both peptides16, and tend to have higher precursor charge states18. Third, cross-linking efficiency is low, and as a result the peptide mixture obtained from cross-linking experiments is generally composed of only a few, low-abundance cross-linked peptides among many unlinked peptides13, which makes it harder to identify cross-linked peptides reliably6. To overcome this low abundance, size exclusion or ion exchange chromatography techniques are often applied to enrich for cross-linked peptides19,20,21. Fourth, cross-linking is not a uniform chemical process. A typical bifunctional cross-linker is a chemical entity with two reactive groups, one on either end, that may be identical (“homo”) or different (“hetero”) in their chemical reactivity. Only one reactive group might bind to a residue with the second group failing to react (mono-linked peptide), or both reactive groups of the cross-linker might bind to residues within the same peptide (loop-linked or


cyclic peptide). The cross-linking might also occur between residues of the same protein (intra-protein cross-linked peptides), or between residues of different proteins (inter-protein cross-linked peptides). Moreover, cross-linking might even occur between two cross-linked peptides (higher order cross-linked peptides)14.

Given the complexity described above, several algorithms have been introduced in the last decade to facilitate the automated annotation of MS/MS spectra derived from cross-linking experiments22–28. One approach is based on the in silico linearization of cross-linked peptide pairs17,29, i.e., cross-linked peptides are introduced in the search database as linear sequences concatenated in pairs with all other potential cross-linkable peptides in all possible permutations. This curated cross-linked database is then searched with traditional search algorithms17. In this approach, the two (linearized) permutations derived from a pair of cross-linked peptides provide the entire set of fragment ions only when considered together, which implies that each permutation on its own misses some fragment ion types13.

Another approach is based on stable isotope labeling to facilitate the detection of cross-linked peptides by MS30,31. This strategy has been implemented in several algorithms23,27,28,32,33. A commonly used version of this approach is to introduce light and heavy isotope-labeled cross-linkers, and a popular algorithm for this type of analysis is xQuest23. The xQuest algorithm first detects isotopic pairs at the MS1 level that show a mass shift corresponding to the mass difference between the heavy and light cross-linkers. MS/MS spectra from such a pair of labeled cross-linked peptides are then processed to sort ions into common and cross-linked ones, followed by scoring. This approach thus relies heavily on the experimental detection of a pair of cross-linker-labeled peptides. Another labeling approach is to perform the enzymatic cleavage of the cross-linked proteins in a buffer that contains either 16O or 18O labeled water molecules31,34. A drawback of this method is that complete and stable 18O atom incorporation is not easy to achieve13,35.

Despite extensive studies on labeling strategies, Giese et al.18 recently suggested that labeling is not necessary in cross-linking studies because cross-linked fragments are distinct in the spectra in terms of mass and charge.

Lastly, the use of a cleavable cross-linker has been introduced in order to improve identifications16,28,32,36,37. One type of such cross-linkers is chemically cleavable28,32, whereas another type is only cleavable during the fragmentation in the mass spectrometer16,36,37 (MS-cleavable cross-linker). Chemical cleavage allows the


detection of modified peptides, whereas MS-cleavable cross-linkers allow the identification of peptide-specific reporter ions during MS/MS. These cleavable cross-linking approaches can overcome the issue of the increased search database size in cross-linking studies5. Recently, Liu et al.16 described an interaction study within the whole human proteome with the use of MS-cleavable cross-linkers; however, their approach relies on the combination of two different fragmentation technologies, which is not available on every mass spectrometer.

In this work, we present Xilmass, a novel, freely available and open-source algorithm for the identification of cross-linked peptides from MS/MS spectra. Xilmass starts with the generation of a search database that is based on a novel representation of cross-linked peptides in a modified FASTA format. Xilmass then generates a theoretical spectrum for each cross-linked peptide pair that contains all possible fragment ions. While the StavroX25 algorithm also aims to consider all possible fragment ions, Xilmass additionally takes into account extra ions derived from a linked residue (see Figure S6). Next, Xilmass employs a probability-derived scoring function, adapted from the Andromeda algorithm38, to assign the peptide-to-spectrum matches. The output of Xilmass is an easy-to-read, tab-delimited file that contains the best ranked cross-linked peptide-to-spectrum matches (XPSMs). Optionally, further validation can be carried out through the computation of the false discovery rate (FDR) of the XPSMs22,39.

We illustrate the performance of Xilmass on a data set of the protein complex of calmodulin (CaM) and the actin-binding domain (ABD) of plectin40 and compare Xilmass with the recently published Kojak24 and the popular pLink22 algorithms. The results showed that Xilmass outperformed the other algorithms on this data set. Xilmass was further tested on three published data sets.
On these data sets too, Xilmass showed better performance than the other algorithms on both higher-energy collisional dissociation (HCD) and collision induced dissociation (CID) data sets.


EXPERIMENTAL SECTION

Sample preparation

Human CaM and plectin-ABD isoform 1a, expressed and purified as described40, were kindly provided by Kristina Dijnović-Carugo and Jae-Geun Song (Vienna, Austria). The protein complex was cross-linked by isotopically light (non-labeled) and heavy (labeled) disuccinimidyl suberate (DSS-d0 and DSS-d12), subsequently digested with trypsin and analyzed by LC-MS/MS using fragmentation by higher-energy collisional dissociation (HCD). A detailed description of sample preparation and MS analysis is given in S1 in the Supporting Information (SI).

Search settings

The raw data were converted with msconvert (part of ProteoWizard release 3.0.7692)41 to mgf for both Xilmass and pLink, and to mzML42 for Kojak. A database that contains the two target proteins and their reversed sequences (decoy) was prepared by DBToolKit43 (version 4.2.4). The decoy sequences were later used to compute the FDR44. In silico digestion was performed and a cross-linked peptide database was built for each possible peptide-pair combination (see S2.2 in SI). This resulted in 27,414 possible cross-linked peptides from the four proteins (the two target proteins and their reversed sequences), without considering any modification. The database search settings were carbamidomethylation of cysteine as a fixed modification and oxidation of methionine as a variable modification. The fragmentation mode was set to HCD, with the precursor and the fragment mass tolerances set at 15 ppm and 0.03 Da, respectively. In addition, four isotopic precursor mass tolerance windows at 10 ppm were allowed. See S2.2 in SI for the parameters to run Xilmass, Kojak24 (version 1.3.6) and pLink22 (version date: 27.01.2015). The XPSMs identified by Xilmass were validated by the built-in FDR specific for cross-linked peptides; Kojak identifications were validated with Percolator45 (version 2.07), and the validated XPSMs for pLink were obtained straight from pLink. The performance of the algorithms was compared for the validated XPSMs at 5% FDR. These


validated XPSM lists were then inspected for the occurrence of common contaminants (see S3 in SI), and any contaminant-derived spectrum was removed.

Verification of the identified cross-linking sites

Identified cross-linking sites were further verified with the aid of known structures: PDB entry 4Q57 (ref 40) contains the actin-binding domain (ABD) of plectin in complex with the N-terminal domain of calmodulin, and PDB entry 2F3Y (ref 46) contains calmodulin (complexed with a subunit of the voltage-dependent L-type calcium channel, which was not of interest in this work). The maximum allowed distance for a DSS cross-link was a Euclidean Cα-Cα distance of 30 Å19,47. The identified sites were visualized in PyMOL (The PyMOL Molecular Graphics System, Version 1.1r1 Schrödinger, LLC.) and compared against the predicted sites. Xwalk (version v0.6), an algorithm that predicts XL sites by the computation of the solvent accessible surface distance (SASD)48, was run in the production mode allowing a SASD of 34 Å, which is the default Xwalk value for lysine-to-lysine XL sites with DSS as cross-linker. The prediction mode was set to both intra- and inter-protein XL sites for 4Q57 and only the intra-protein XL sites of calmodulin for 2F3Y. In total, 86 XL sites were predicted.
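The Euclidean part of this verification can be sketched as follows. This is only an illustration, assuming the Cα coordinates of the two linked residues have already been read from the PDB entry; the function names and coordinates are hypothetical and are not part of Xilmass, PyMOL or Xwalk. The SASD computed by Xwalk follows the protein surface and is not reproduced here.

```python
import math

MAX_CA_CA_DISTANCE = 30.0  # Å, Euclidean Cα-Cα cutoff for DSS quoted in the text


def ca_distance(a, b):
    """Euclidean distance in Å between two Cα coordinates given as (x, y, z)."""
    return math.dist(a, b)


def within_constraint(a, b, cutoff=MAX_CA_CA_DISTANCE):
    """True if a candidate XL site satisfies the Cα-Cα distance constraint."""
    return ca_distance(a, b) <= cutoff


# Hypothetical coordinates, for illustration only.
print(within_constraint((0.0, 0.0, 0.0), (10.0, 20.0, 15.0)))
```

A site that fails this check is not necessarily wrong: as discussed in the results, it may stem from a link between two copies of a protein that belong to different complexes.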

Xilmass: an algorithm to identify cross-linked peptides

Xilmass is designed to identify cross-linked peptides, irrespective of the labeling strategy. Xilmass is implemented in Java, and Apache Lucene (https://lucene.apache.org/) is used for database indexing. The XPSM score calculations are multithreaded. Xilmass is open-source and freely available under the permissive Apache2 license and works on Windows, Linux and OS-X operating systems. Xilmass is available as a graphical user interface (GUI) tool and can also be run on the command line. The Xilmass software and its source code can be downloaded from https://github.com/compomics/xilmass.git, and documentation is also available on this site.

Data availability

The data set and the cross-linked peptides database are available in ProteomeXchange49 through PRIDE50, under accession number PXD003880.




RESULTS AND DISCUSSION

The Xilmass algorithm

Xilmass runs through three steps to identify cross-linked peptides: (i) construction of a cross-linked peptide search database, (ii) generation of theoretical MS/MS spectra for the cross-linked peptides, and (iii) scoring of theoretical spectra against experimental MS/MS spectra.

Xilmass starts with an in silico tryptic digestion of a given protein database. The resulting peptides are then combined to build all possible cross-linked peptide pairs. These cross-linked peptides are subsequently written to a modified FASTA database with the “.fastacp” extension. This modified FASTA format allows all cross-linking related information for each peptide pair to be stored as a single database entry. Each entry header contains the accession number of each protein from which the linked peptides were derived, the start and end positions of these peptides within their protein, and the position within each peptide where the cross-linker binds. Each sequence contains the two linked peptide sequences, separated by a vertical bar and with linked residues indicated by an appended asterisk (see S4). It should be noted that these modifications to the standard FASTA format prevent such a database from being read by other search engines. The cross-linked peptide database is then indexed through the open-source Lucene library for fast ranged lookup of possible matches within the specified precursor mass tolerance.

The matched entries from the database are then used to generate the theoretical XL-MS/MS spectra. Current algorithms typically do not build one single theoretical spectrum that contains all possible fragment ions of a cross-linked peptide pair. Instead, either individual theoretical spectra are considered for each peptide in a cross-linked peptide pair with the introduction of a mass shift, or theoretical spectra are generated for linearized peptides with the introduction of a variable modification.
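As a rough illustration of the database construction step described above, the sketch below pairs up linkable sites and writes each pair as two sequences separated by a vertical bar, with an asterisk appended to each linked residue. It assumes that the peptides have already been produced by a digestion step, that every lysine is linkable, and it omits the entry headers, whose exact layout is given in S4 and is not reproduced here. The function names and example peptides are hypothetical and are not taken from the Xilmass source code.

```python
from itertools import combinations_with_replacement


def linkable_sites(peptides, residue="K"):
    """List (peptide, index) for every occurrence of the linkable residue."""
    return [(pep, i) for pep in peptides for i, aa in enumerate(pep) if aa == residue]


def mark(peptide, index):
    """Append an asterisk to the linked residue, e.g. ('AKDR', 1) -> 'AK*DR'."""
    return peptide[: index + 1] + "*" + peptide[index + 1:]


def cross_linked_sequences(peptides):
    """All unordered pairs of linkable sites, written as 'PEPTIDE_A|PEPTIDE_B'."""
    sites = linkable_sites(peptides)
    return [
        mark(*a) + "|" + mark(*b)
        for a, b in combinations_with_replacement(sites, 2)
    ]


entries = cross_linked_sequences(["AKDR", "GKLKR"])
for entry in entries:
    print(entry)
```

With n linkable sites this enumeration produces n(n + 1)/2 entries, which is the quadratic growth of the search space mentioned in the introduction.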
Because fragmentation can happen anywhere on a cross-linked peptide, we instead rely on a single theoretical XL-MS/MS spectrum that contains all possible fragment ions derived from both linked peptides. Figure 1 shows an example of a high-scoring XPSM. The longer and shorter peptides are shown as “” and “”, respectively. The theoretical spectrum contains some regular


fragment ions from each peptide. For example, the  ion is a regular 4 fragment ion from peptide  (see Figure S4). However, the  ion of the same peptide has a regular  fragment ion plus the entire linked peptide (peptide ) with the attached cross-linker (see Figure S5). Moreover, we also propose fragment ions that are derived from the cross-linked residue. For example, a fragmentation event may remove the entire linked peptide but the linked residue may remain attached to the cross-linker. An example of this is seen in Figure 1 in the form of the   fragment ion, where the  ion of peptide  is attached to the cross-linker (see Figure S6). In other words, the amide bond of the lysine in peptide B to the cross-linker is broken during fragmentation, which leaves the cross-linker attached to peptide A. Lastly, the linked residue can remain attached to a fragment ion from the other peptide. For example,   is the  ion of peptide  linked to the  ion of peptide , including the attached cross-linker (see Figure S7). Additionally, our theoretical spectrum generation is compatible with both labeled and unlabeled strategies, which allows the detection of a cross-linked peptide in a labeling experiment even if its paired cross-linked peptide (which is supposedly labeled) remains undetected.

The fragment ion type is based on the fragmentation mode in which the experimental spectra are generated and must be provided to Xilmass by the user. CID fragmentation mode yields predominantly b- and y-ions, whereas HCD spectra also contain a-ions51, with a signature a2/b2 pair52. We therefore introduce b- and y-ions for both CID and HCD fragmentation, but also the a2 ion and its possible linked derivatives for the HCD fragmentation mode. To keep the theoretical spectrum as sparse as possible, we did not introduce all a-ions. Lastly, if the precursor charge state is higher than two, only singly and doubly charged theoretical fragment ions are considered.
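The first two ion classes described above (regular fragments, and fragments that carry the whole linked peptide plus the cross-linker) can be sketched as follows. This is a simplified illustration and not the Xilmass implementation: it assumes unmodified peptides, a DSS-type linker that adds 138.06808 Da (C8H10O2), and b- and y-ions only. The ions derived from the linked residue itself, the a2 ion, and neutral losses are left out, and all names are hypothetical.

```python
WATER = 18.010565
PROTON = 1.007276
DSS = 138.06808  # mass added by a reacted DSS linker (C8H10O2), assumed here

# Monoisotopic residue masses in Da.
MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MASS[aa] for aa in seq) + WATER


def fragment_mz(pep, link_idx, partner, charge=1):
    """b/y m/z values of `pep`, linked at 0-based index `link_idx` to `partner`.

    A fragment that contains the linked residue also carries the linker
    and the complete partner peptide; the complementary fragment is regular.
    """
    extra = DSS + peptide_mass(partner)
    ions = {}
    n = len(pep)
    for i in range(1, n):
        b = sum(MASS[aa] for aa in pep[:i]) + (extra if link_idx < i else 0.0)
        y = sum(MASS[aa] for aa in pep[i:]) + WATER + (extra if link_idx >= i else 0.0)
        ions[f"b{i}"] = (b + charge * PROTON) / charge
        ions[f"y{n - i}"] = (y + charge * PROTON) / charge
    return ions


ions = fragment_mz("AKDR", 1, "GKLKR")
for name in sorted(ions):
    print(name, round(ions[name], 4))
```

Calling the function a second time with the roles of the two peptides swapped, and with `charge=2`, gives the remaining b- and y-type peaks of a single combined theoretical spectrum under these assumptions.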
Experimental XL-MS/MS spectra are matched against the generated theoretical spectra within a given precursor tolerance. Cross-linked peptides tend to be larger, and this might result in the selection of the C-13 peak instead of the C-12 peak by peak picking algorithms. To overcome this, Xilmass follows the same approach as pLink and allows the introduction of several peptide mass tolerance windows, which allows identification even if the peak picking algorithm selected the C-13 peak instead of the C-12 peak. After the precursor ion-based selection, the experimental peaks derived from the precursor ion are removed and subsequently, fragment ion deisotoping and then charge state deconvolution are applied.
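The idea of several precursor mass windows can be sketched as follows. This is an assumption-laden illustration rather than the Xilmass or pLink code: it works on neutral masses, uses the 15 ppm main tolerance and the 10 ppm isotopic windows from the search settings as defaults, and simply shifts the observed mass down by multiples of the 13C-12C mass difference. The function names are hypothetical.

```python
C13_SHIFT = 1.0033548  # mass difference between 13C and 12C, in Da


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def matched_isotope(observed, theoretical, tol_ppm=15.0,
                    max_isotope=4, isotope_tol_ppm=10.0):
    """Return the isotope offset (0 = monoisotopic) that matches, or None.

    Offset k means the peak picker is assumed to have selected the peak
    that lies k 13C shifts above the monoisotopic (C-12) peak.
    """
    if abs(ppm_error(observed, theoretical)) <= tol_ppm:
        return 0
    for k in range(1, max_isotope + 1):
        if abs(ppm_error(observed - k * C13_SHIFT, theoretical)) <= isotope_tol_ppm:
            return k
    return None


# The return value 0 is a valid match, so compare against None explicitly.
print(matched_isotope(3000.0 + C13_SHIFT, 3000.0) is not None)
```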


During this process, fragment ions with the same mass but different charges (up to the precursor charge) are converted to singly charged ions in the presence of one isotopic peak. The preprocessed spectrum is then scored by the probabilistic scoring function adapted from Andromeda38. For this, the experimental spectrum is divided into several bins of user-defined size, and a certain number of experimental peaks is retained per bin based on their intensities. Then, this experimental spectrum is dynamically scored against the theoretical spectrum. Moreover, we have verified that the Xilmass score allows true hits to be well separated from false hits (see S5 in SI). Furthermore, neutral losses can also be considered in Xilmass, in one of three ways: (i) no neutral losses; (ii) water and ammonia neutral losses for selected residues (as implemented in Andromeda38); and (iii) all water and ammonia losses for every residue as singly and doubly charged.

Cross-linked peptide validation

The output from Xilmass is a list of the highest-scored PSM per experimental spectrum; a spectrum could be matched to either a cross-linked peptide (XPSM) or to a mono-linked peptide (when searchForAlsoMonoLink is set to true). The inclusion of mono-linked peptides is crucial because the mass of a cross-linked peptide derived from two adjacent peptides may sometimes be equal to that of a mono-linked peptide that contains an additional missed cleavage site, and due to the presence of shared fragment ions, the algorithm could assign the spectrum to a cross-linked peptide instead of the mono-linked peptide. The output from Xilmass contains target PSM and decoy PSM results. The decoy XPSMs are composed of either one target and one decoy sequence (half decoy), or two decoy sequences (full decoy). Moreover, Trnka et al.53 showed an improvement in the validation by separating PSMs into inter- and intra-protein cross-linking sites (improved FDR) when compared to a global FDR calculation. This strategy was used in Kojak, and is an option in Xilmass. Our built-in FDR calculation computes the FDR as explained by Yang et al.22 with the following equation:

$$\mathrm{FDR} = \frac{\sum \text{half decoy} - \sum \text{full decoy}}{\sum \text{target}} \qquad (1)$$
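Read as the number of half-decoy hits minus the number of full-decoy hits, divided by the number of target hits, Equation 1 can be written as a short function. This is a minimal sketch under that reading; clamping a negative numerator to zero and returning 0.0 for an empty target list are assumptions made here for robustness, and the function name is hypothetical.

```python
def xpsm_fdr(n_target, n_half_decoy, n_full_decoy):
    """FDR estimate for XPSMs above a score threshold, following Equation 1.

    n_target     : XPSMs with two target sequences
    n_half_decoy : XPSMs with one target and one decoy sequence
    n_full_decoy : XPSMs with two decoy sequences
    """
    if n_target == 0:
        return 0.0  # assumption: no target hits means nothing to report
    # Assumption: a negative numerator is reported as an FDR of zero.
    return max(n_half_decoy - n_full_decoy, 0) / n_target


# Hypothetical counts, for illustration only.
print(xpsm_fdr(200, 12, 2))
```

For the improved FDR, the same function would be applied separately to the inter-protein and the intra-protein XPSMs.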


Analysis of the cross-linked calmodulin-plectin complex

The performance of Xilmass was evaluated on the purified calmodulin-plectin complex. In total, 21,405 MS/MS spectra were analyzed by Xilmass, Kojak, and pLink, and their identifications were compared at an XPSM-FDR of 5%. For Xilmass, we used the improved FDR, instead of the global FDR (Table S1 shows the results for both the global and the improved FDR calculations). In order to assess the relative performance of each algorithm, we compared the number of XPSMs, and the number of distinct cross-linking (XL) sites. It is of note that an XL site is a pair of residues that link two peptides. Therefore, two different inter-linked peptides might cover the same distinct XL site. This can happen because of missed cleavage sites.

The validated XPSM list contains 942 XPSMs for Xilmass, whereas there are 284 XPSMs for Kojak and 756 XPSMs for pLink. The higher number of XPSMs for both Xilmass and pLink as compared to Kojak can be explained by the use of several peptide mass windows during the identification. These peptide mass windows allow different isotopic peaks to be covered. 385 out of the 942 Xilmass-XPSMs (41%) and 263 out of the 756 pLink-XPSMs (35%) had a mass difference equal to or less than the given precursor tolerance of 15 ppm.

To verify the obtained results, the identified XL sites were mapped onto the available structures (see Figure 2 for Xilmass, and Figure S8 for each algorithm) and then compared to the predictions from Xwalk (Table 1 and Table S1). We divided the identified distinct XL sites into four groups: (i) those within the distance constraint of DSS (30 Å); (ii) those exceeding 30 Å; (iii) homo-multimeric inter-protein links (the XL site is linked to the same site on the same protein, which is presumably part of a different complex); and (iv) those for which structural information is missing.
The results show complementarity between the three algorithms as only about one fourth of the total distinct XL sites are shared. In total 46 shared XL sites were obtained from 653 matched XPSMs out of the 942 Xilmass-XPSMs (69%), 202 out of the 284 Kojak-XPSMs (71%), and 647 out of the 756 pLink-XPSMs (85%) (Figure 3, Figure S3, Figure S9, and Table S2). 20 out of these 46 shared XL sites were within the distance constraint and 17 out of these 20 XL sites were predicted by Xwalk. Manual verification on the structure of the


remaining three XL sites indicates that intra-protein cross-linking is likely impossible. In addition, ten out of the remaining 26 XL sites exceeded the distance constraint. Seven XL sites that exceed the distance constraint and the three remaining XL sites within the distance constraint are located on the exposed surface of the protein. They are therefore most likely due to inter-protein cross-linking, i.e., linkages between two of the same proteins that are part of different complexes. Lastly, 14 XL sites lack structural information, and two XL sites are homo-multimeric.

Overall, Xilmass resulted in the highest number of XPSMs as well as the highest number of distinct XL sites, including the highest number of predicted XL sites (Table 1). In total, Xilmass identified 130 distinct XL sites, with 94 distinct XL sites identified by at least two XPSMs (72% of the total distinct Xilmass sites). Kojak identified the second largest number of distinct XL sites: 98 in total, but only 61 of these XL sites had at least two XPSMs (62% of total distinct Kojak sites). pLink identified 87 distinct XL sites, with 68 identified by at least two XPSMs (78% of total distinct pLink sites) (Table 1 and Figure 3). Overall, Xilmass identified 10% more distinct XL sites with at least two XPSMs compared to Kojak, which finds more “one-hit wonders” (sites that are identified by only a single XPSM), but 4% fewer distinct XL sites with at least two XPSMs than pLink. 58 distinct XL sites identified by Xilmass were observed within the distance constraint (Figure 4). This value is 39 for Kojak and 32 for pLink. Comparison against the Xwalk predictions shows that 40 Xilmass sites were predicted versus 26 Kojak sites and 22 pLink sites. For predicted sites with at least two XPSMs, 27 Xilmass sites remain versus 18 sites for both Kojak and pLink.

The N-terminus of the proteins may be of functional importance and cross-linking can show this.
Therefore, the N-terminus has to be included in the analysis. Furthermore, because recombinant proteins were used, reactions of the protein N-terminus frequently reflect the flexible nature of this terminus and might be an artifact of protein production. Because of this importance, Xilmass reports such XL sites as well. Xilmass and pLink follow a similar approach and assume that the protein N-terminus can be cross-linked due to the presence of a primary amine group. The second residue is also taken into account because of the removal of the N-terminal methionine. However, this analysis showed that Kojak identified some novel N-terminal sites that were found neither by Xilmass nor by pLink. Kojak seems to always take the first two residues of the N-terminus into account, regardless of the N-terminal methionine. Moreover, Kojak also reports cross-links that are chemically very unlikely. Our data set contains an example that illustrates this. The plectin ABD domain starts with a Gly-Pro motif. Here, Kojak assumes that the Pro can be linked, which is chemically very unlikely. This resulted in 10 Kojak-specific XL sites with a linked proline (21 XPSMs) (Table S3).

Xilmass also has the highest number of algorithm-specific XL sites predicted by Xwalk (Figure S10). As seen in Figure 3, there are 50 XL sites specific for Xilmass, 23 XL sites for Kojak and 20 XL sites for pLink. There are 10 Xilmass-specific XL sites that were predicted by Xwalk, two for Kojak and only one for pLink (Table S4, Table S5, Table S6). In addition to these sites, some XL sites within the distance constraint were most likely impossible based on the available structure: seven for Xilmass, three for Kojak and two for pLink. However, these linked residues are on the surface of the protein and the links might therefore be between proteins that originate from different complexes. Additionally, some XL sites that exceed the distance constraint are most likely not possible based on the available structures, but the residues are located on the surface. Xilmass has five such XL sites, Kojak six and pLink two.
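The site counts quoted in the text can be cross-checked against each other. The following is a minimal sketch in Python that uses only numbers stated in this manuscript (the totals of 130, 98 and 87 distinct XL sites, the 50, 23 and 20 algorithm-specific sites, the 46 sites shared by all three algorithms and the overall total of 181 from the Figure 3 legend). The pairwise-only overlaps it derives are inferred from these totals and are an assumption of this sketch; they are not values read from Figure 3, and all variable names are illustrative.

```python
# Counts reported in the text: totals per algorithm (Table 1),
# algorithm-specific sites and the three-way overlap (Figure 3).
totals = {"Xilmass": 130, "Kojak": 98, "pLink": 87}
specific = {"Xilmass": 50, "Kojak": 23, "pLink": 20}
shared_by_all = 46
reported_union = 181

# Sites that an algorithm shares with exactly one other algorithm.
pair_sum = {
    name: totals[name] - specific[name] - shared_by_all for name in totals
}

# Every two-way-only region is counted once by each of its two algorithms,
# so the sum over the three algorithms counts each region twice.
two_way_total = sum(pair_sum.values()) // 2

# Inferred two-way-only overlaps (not taken from the figure itself).
xilmass_kojak = two_way_total - pair_sum["pLink"]
xilmass_plink = two_way_total - pair_sum["Kojak"]
kojak_plink = two_way_total - pair_sum["Xilmass"]

reconstructed_union = sum(specific.values()) + two_way_total + shared_by_all

print("Xilmass and Kojak only:", xilmass_kojak)
print("Xilmass and pLink only:", xilmass_plink)
print("Kojak and pLink only:", kojak_plink)
print("Reconstructed union:", reconstructed_union, "reported:", reported_union)
print("Share found by all three: %.1f%%" % (100 * shared_by_all / reported_union))
```

If the reconstructed union matches the reported total, the per-algorithm totals, the algorithm-specific counts and the three-way overlap are mutually consistent.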

Analysis of publicly available data sets

Human serum albumin: To illustrate the versatility of Xilmass and to evaluate its performance on a CID data set, a publicly available data set from PRIDE was selected (PXD002142; refs 18 and 54). In this data set, human serum albumin (HSA) was cross-linked by both light- and heavy-labeled BS3, and MS/MS spectra were generated by CID fragmentation. We performed an analysis analogous to that of the calmodulin-plectin complex (see S2.3 and S3). Unlike the calmodulin-plectin protein complex, the available HSA structure is that of a homo-dimeric protein which is rich in lysines (PDB entry: 1AO6, ref 55). Xwalk predicted 899 possible lysine-to-lysine linking sites with a maximum SASD of 34 Å. Identifications from each algorithm, including the XiQ (ref 54) results from the original study (ref 54), were compared (Figure S11 and Table S4). Xilmass identified the most XL sites: 232 XL sites, compared to 78 XL sites by Kojak, 61 XL sites by pLink and 43 XL sites by XiQ. Of the Xilmass sites, 91 were predicted by Xwalk. Xilmass thus identified the highest absolute number of distinct XL sites as well as of Xwalk-predicted sites. Furthermore, filtering the results by their number of XPSMs improved the relative ratio of Xwalk-predicted sites (Table S7).

HOP2-MND1 complex: Xilmass was further tested using the PRIDE data set (PXD001538, ref 21) on the Arabidopsis thaliana HOP2-MND1 complex. This data set was generated in four replicates using several cross-linkers including DSS, with HCD fragmentation. The original study used the standard pLink search settings and then applied more stringent filtering: an FDR of 1%, a precursor mass filter of 5 ppm and an e-value of 10^-2. The pLink results for distinct XL sites were taken from the Supplemental Table of that study (ref 21). Xilmass was run on each replicate, restricting the FDR to 1%. Two different precursor tolerances (5 ppm and 10 ppm) were selected to see their effect at this lower FDR, because Xilmass computes only a PSM score and, unlike pLink, does not apply any separate filtering based on precursor tolerance during the validation step. Xilmass identified 53 distinct XL sites at 5 ppm precursor tolerance, whereas pLink identified 67 distinct XL sites, with 29 shared between Xilmass and pLink. However, Xilmass identified more XL sites than pLink when the precursor tolerance was increased to 10 ppm, which also increased the number of overlapping XL sites to 34 (Figure S12 and Table S8).

UTP-B complex: The last data set that was used to evaluate Xilmass was one of the original data sets used in the pLink paper (ref 22). This data set was previously compared against another algorithm, Protein Prospector (ref 53). The results of these two algorithms were taken directly from the Supplementary Information of that study (ref 53). The UTP-B complex is composed of six proteins, and this sample was cross-linked by both light- and heavy-labeled BS3. The Xilmass search was performed according to S2.5.1. Because the pLink results were filtered for at least three spectral counts, the same XPSM-based filtering was also applied for Xilmass. The results at 5% FDR show that Xilmass identified 71 XL sites, with 42 shared with both other algorithms, compared to 78 pLink sites and 84 Protein Prospector sites (Figure S13-a). However, if the number of spectral counts required for filtering is decreased to two for Xilmass, it identified 88 XL sites, with 46 sites shared between the three algorithms (Figure S13-b and Table S9). Xilmass identified more intra-protein sites than the other algorithms.
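The spectral-count filtering used for the UTP-B comparison (keeping only XL sites supported by at least three, or at least two, XPSMs) can be sketched as follows. This is a minimal illustration, not the filtering code of Xilmass or pLink: the function name `filter_sites_by_xpsm_count`, the site labels and the example XPSM list are all hypothetical.

```python
from collections import Counter


def filter_sites_by_xpsm_count(xpsm_sites, min_xpsms):
    """Return the set of XL sites supported by at least `min_xpsms` XPSMs.

    `xpsm_sites` holds one entry per XPSM, naming the XL site it supports.
    """
    counts = Counter(xpsm_sites)
    return {site for site, n in counts.items() if n >= min_xpsms}


# Hypothetical XPSM list: three spectra for the first site,
# two for the second and a single spectrum for the third.
example_xpsms = [
    "protA:K10-protB:K25",
    "protA:K10-protB:K25",
    "protA:K10-protB:K25",
    "protA:K40-protA:K52",
    "protA:K40-protA:K52",
    "protB:K7-protC:K90",
]

for threshold in (3, 2, 1):
    kept = filter_sites_by_xpsm_count(example_xpsms, threshold)
    print(f"at least {threshold} XPSMs: {len(kept)} distinct XL sites")
```

Lowering the threshold can only keep the same sites or add more, which is the direction of the change reported above when the cut-off is relaxed from three to two spectral counts.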


CONCLUSION

MS-based cross-linking experiments are being successfully applied to study protein structures and complexes and are frequently part of integrated structural biology approaches. In addition to the optimization of the experimental set-up, the success of a cross-linking experiment relies heavily on the accurate annotation of the obtained experimental spectra. To support this, we developed Xilmass, a novel algorithm to identify cross-linked peptides. Xilmass encodes the cross-linked peptides in a novel way for the search database and uses the scoring function from Andromeda to score against a theoretical spectrum that contains the peaks from all the possible fragment ions of a cross-linked peptide pair. Xilmass can handle data from both labeled and unlabeled experiments, and provides a GUI that allows researchers to easily adopt Xilmass in their research, while the availability of a command-line interface also allows automation. Moreover, Xilmass is inherently cross-platform, working on Windows, Linux and OS X, and is open source under the permissive Apache 2.0 license, rather than closed source with a restrictive license, as is the case for pLink.

We illustrated that Xilmass can be applied to isolated protein complexes. To this end, we compared Xilmass against Kojak and pLink with the aid of the calmodulin-plectin ABD complex. Xilmass had the best performance in terms of the number of XPSMs, distinct XL sites, and Xwalk-predicted XL sites. We also evaluated Xilmass on several published data sets, including one that was generated by CID. Xilmass also had the best performance on the CID data set, but it should be noted that many of the Xilmass-specific observations were single hits. For the UTP-B data set, Xilmass performed slightly worse than pLink and Protein Prospector when filtered for at least three XPSMs per site, but when filtered for two XPSMs per site, it outperformed the other algorithms.
A problem that still remains largely unaddressed is the analysis of interactions within a full proteome with the aid of cross-linkers. Like most of the available identification algorithms, Xilmass mainly focuses on the analysis of single complexes. Analysis of large complexes is possible but takes much more processing time due to the vastly increased complexity of the search space. Moreover, these much larger search spaces are also more likely to yield false positive identifications, which results in either less reliability or less sensitivity of the obtained results (refs 44 and 56). Because these issues affect all currently available search tools for cross-linked peptides, we believe that even if the combinatorial explosion of the search space were computationally overcome, application of the current algorithms to whole proteomes would still lack the required robustness.
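The growth of the search space can be made concrete with a simple count. The sketch below is an illustration under a simplifying assumption that is not taken from this manuscript: every linkable peptide is treated as a single candidate, and every unordered pair of candidates (including a peptide paired with itself) is one cross-linked entry. The function name `candidate_pairs` is hypothetical, and the count ignores multiple linkable residues per peptide, modifications and labeled linkers, all of which enlarge a real search database further.

```python
def candidate_pairs(n_peptides: int) -> int:
    """Number of unordered peptide pairs, including self-pairs."""
    if n_peptides < 0:
        raise ValueError("n_peptides must be non-negative")
    return n_peptides * (n_peptides + 1) // 2


# Pair counts for increasingly large sets of linkable peptides.
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} peptides -> {candidate_pairs(n):>12,} candidate pairs")
```

Because the number of pairs grows quadratically with the number of peptides, moving from a single complex to a whole proteome inflates the candidate list far faster than the number of proteins itself.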

Supporting Information

The Supporting Information is available free of charge via the Internet at http://pubs.acs.org.

- Sample preparation, algorithm parameters used, contaminant-derived spectra search, database entry information, and figures for different theoretical ions matched on an experimental spectrum
- Additional figures including XL sites mapped onto the structure and Venn diagrams for the identified distinct XL sites by each algorithm and algorithm-specific XL sites
- An extended version of Table 1 and the tables for distinct XL sites in the protein N-terminus, distinct unique XL sites for each algorithm, and overlapping distinct XL sites, in addition to distinct XL sites for the analysis of published data sets

The authors declare no competing financial interest.

ACKNOWLEDGEMENT

We thank Kristina Djinović-Carugo and Jae-Geun Song for providing protein samples, Meng-Qiu Dong for providing the data set, and Rebecca Beveridge for providing details about the data set. Ş.Y. would like to thank S. Gupta, D. Maddelein, P.C. Masuzzo, A. Staes and A. Sticker for helpful discussions. E.V. is a postdoctoral research fellow of the Research Foundation Flanders. A.E. acknowledges support from KUL-Spa (Onderzoekstoelagen 2013; Bijzonder Onderzoeksfonds; KU Leuven) and RiMembR (Vlaanderen Onderzoeksprojecten; #G0C6814N; FWO). This work was funded by Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”, Concerted Research Action BOF12/GOA/014), the PRIME-XS project (Grant 262067) funded by the European Union 7th Framework Program (to L.M.), the Deutsche Forschungsgemeinschaft (FOR 1905 and FOR 1352 to B.W.) and the Excellence Initiative of the German Federal and State Governments (EXC 294 BIOSS to B.W.).


REFERENCES

(1) Sinz, A.; Arlt, C.; Chorev, D.; Sharon, M. Prot. Sci. 2015, 24, 1193–1209.
(2) Tabb, D. L. Nat. Methods 2012, 9, 879–881.
(3) Bruce, J. E. Proteomics 2012, 12, 1565–1575.
(4) Petrotchenko, E. V.; Borchers, C. H. Mass Spectrom. Rev. 2010, 29, 862–876.
(5) Vandermarliere, E.; Stes, E.; Gevaert, K.; Martens, L. Mass Spectrom. Rev. 2014, n/a, 1–13.
(6) Sinz, A. J. Mass Spectrom. 2003, 38, 1225–1237.
(7) Benda, C.; Ebert, J.; Scheltema, R. A.; Schiller, H. B.; Baumgärtner, M.; Bonneau, F.; Mann, M.; Conti, E. Mol. Cell 2014, 56, 43–54.
(8) Greber, B. J.; Boehringer, D.; Leibundgut, M.; Bieri, P.; Leitner, A.; Schmitz, N.; Aebersold, R.; Ban, N. Nature 2014, 515, 283–286.
(9) Greber, B. J.; Bieri, P.; Leibundgut, M.; Leitner, A.; Aebersold, R.; Boehringer, D.; Ban, N. Science 2015, 348, 303–308.
(10) Lasker, K.; Forster, F.; Bohn, S.; Walzthoeni, T.; Villa, E.; Unverdorben, P.; Beck, F.; Aebersold, R.; Sali, A.; Baumeister, W. Proc. Natl. Acad. Sci. U.S.A. 2012, 109, 1380–1387.
(11) Ward, A. B.; Sali, A.; Wilson, I. A. Science 2013, 339, 913–915.
(12) Politis, A.; Stengel, F.; Hall, Z.; Hernández, H.; Leitner, A.; Walzthoeni, T.; Robinson, C. V.; Aebersold, R. Nat. Methods 2014, 11, 403–406.
(13) Leitner, A.; Walzthoeni, T.; Kahraman, A.; Herzog, F.; Rinner, O.; Beck, M.; Aebersold, R. Mol. Cell. Proteomics 2010, 9, 1634–1649.
(14) Rappsilber, J. J. Struct. Biol. 2011, 173, 530–540.
(15) Sinz, A. Mass Spectrom. Rev. 2006, 25, 663–682.
(16) Liu, F.; Rijkers, D. T. S.; Post, H.; Heck, A. J. R. Nat. Methods 2015, 12, 1179–1184.
(17) Maiolica, A.; Cittaro, D.; Borsotti, D.; Sennels, L.; Ciferri, C.; Tarricone, C.; Musacchio, A.; Rappsilber, J. Mol. Cell. Proteomics 2007, 6, 2200–2211.
(18) Giese, S. H.; Fischer, L.; Rappsilber, J. Mol. Cell. Proteomics 2016, 15, 1094–1104.
(19) Leitner, A.; Walzthoeni, T.; Aebersold, R. Nat. Protoc. 2014, 9, 120–137.
(20) Leitner, A.; Reischl, R.; Walzthoeni, T.; Herzog, F.; Bohn, S.; Forster, F.; Aebersold, R. Mol. Cell. Proteomics 2012, 11, M111.014126, 1–12.
(21) Rampler, E.; Stranzl, T.; Orbán-Németh, Z.; Hollenstein, D. M.; Hudecz, O.; Schloegelhofer, P.; Mechtler, K. J. Proteome Res. 2015, 14, 5048–5062.


(22) Yang, B.; Wu, Y.-J.; Zhu, M.; Fan, S.-B.; Lin, J.; Zhang, K.; Li, S.; Chi, H.; Li, Y.-X.; Chen, H.-F.; Luo, S.-K.; Ding, Y.-H.; Wang, L.-H.; Hao, Z.; Xiu, L.-Y.; Chen, S.; Ye, K.; He, S.-M.; Dong, M.-Q. Nat. Methods 2012, 9, 904–906.
(23) Rinner, O.; Seebacher, J.; Walzthoeni, T.; Mueller, L. N.; Beck, M.; Schmidt, A.; Mueller, M.; Aebersold, R. Nat. Methods 2008, 5, 315–318.
(24) Hoopmann, M. R.; Zelter, A.; Johnson, R. S.; Riffle, M.; MacCoss, M. J.; Davis, T. N.; Moritz, R. L. J. Proteome Res. 2015, 14, 2190–2198.
(25) Götze, M.; Pettelkau, J.; Schaks, S.; Bosse, K.; Ihling, C. H.; Krauth, F.; Fritzsche, R.; Kühn, U.; Sinz, A. J. Am. Soc. Mass Spectrom. 2012, 23, 76–87.
(26) Lima, D. B.; de Lima, T. B.; Balbuena, T. S.; Neves-Ferreira, A. G. C.; Barbosa, V. C.; Gozzo, F. C.; Carvalho, P. C. J. Proteomics 2015, 129, 51–55.
(27) Gao, Q.; Xue, S.; Doneanu, C. E.; Shaffer, S. A.; Goodlett, D. R.; Nelson, S. D. Anal. Chem. 2006, 78, 2145–2149.
(28) Petrotchenko, E. V.; Borchers, C. H. BMC Bioinf. 2010, 11, 1–10.
(29) Panchaud, A.; Singh, P.; Shaffer, S. A.; Goodlett, D. R. J. Proteome Res. 2010, 9, 2508–2515.
(30) Muller, D. R.; Schindler, P.; Towbin, H.; Wirth, U.; Voshol, H.; Hoving, S.; Steinmetz, M. O. Anal. Chem. 2001, 73, 1927–1934.
(31) Zelter, A.; Hoopmann, M. R.; Vernon, R.; Baker, D.; MacCoss, M. J.; Davis, T. N. J. Proteome Res. 2010, 9, 3583–3589.
(32) Petrotchenko, E. V.; Olkhovik, V. K.; Borchers, C. H. Mol. Cell. Proteomics 2005, 4, 1167–1179.
(33) Seebacher, J.; Mallick, P.; Zhang, N.; Eddes, J. S.; Aebersold, R.; Gelb, M. H. J. Proteome Res. 2006, 5, 2270–2282.
(34) Back, J. W.; Notenboom, V.; de Koning, L. J.; Muijsers, A. O.; Sixma, T. K.; de Koster, C. G.; de Jong, L. Anal. Chem. 2002, 74, 4417–4422.
(35) Miyagi, M.; Rao, K. C. S. Mass Spectrom. Rev. 2007, 26, 121–136.
(36) Götze, M.; Pettelkau, J.; Fritzsche, R.; Ihling, C. H.; Schäfer, M.; Sinz, A. J. Am. Soc. Mass Spectrom. 2014, 26, 83–97.
(37) Burke, A. M.; Kandur, W.; Novitsky, E. J.; Kaake, R. M.; Yu, C.; Kao, A.; Vellucci, D.; Huang, L.; Rychnovsky, S. D. Org. Biomol. Chem. 2015, 13, 5030–5037.
(38) Cox, J.; Neuhauser, N.; Michalski, A.; Scheltema, R. A.; Olsen, J. V.; Mann, M. J. Proteome Res. 2011, 10, 1794–1805.
(39) Walzthoeni, T.; Claassen, M.; Leitner, A.; Herzog, F.; Bohn, S.; Förster, F.; Beck, M.; Aebersold, R. Nat. Methods 2012, 9, 901–903.
(40) Song, J.-G.; Kostan, J.; Drepper, F.; Knapp, B.; de Almeida Ribeiro, E.; Konarev, P. V.; Grishkovskaya, I.; Wiche, G.; Gregor, M.; Svergun, D. I.; Warscheid, B.; Djinović-Carugo, K. Structure 2015, 23, 558–570.
(41) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. Bioinformatics 2008, 24, 2534–2536.

(42) Martens, L.; Chambers, M.; Sturm, M.; Kessner, D.; Levander, F.; Shofstahl, J.; Tang, W. H.; Römpp, A.; Neumann, S.; Pizarro, A. D.; Montecchi-Palazzi, L.; Tasman, N.; Coleman, M.; Reisinger, F.; Souda, P.; Hermjakob, H.; Binz, P.-A.; Deutsch, E. W. Mol. Cell. Proteomics 2011, 10, R110.000133, 1–7.
(43) Martens, L.; Vandekerckhove, J.; Gevaert, K. Bioinformatics 2005, 21, 3584–3585.
(44) Colaert, N.; Degroeve, S.; Helsens, K.; Martens, L. J. Proteome Res. 2011, 10, 5555–5561.
(45) Granholm, V.; Noble, W. S.; Kall, L. J. Proteome Res. 2012, 10, 2671–2678.
(46) Fallon, J. L.; Halling, D. B.; Hamilton, S. L.; Quiocho, F. A. Structure 2005, 13, 1881–1886.
(47) Merkley, E. D.; Rysavy, S.; Kahraman, A.; Hafen, R. P.; Daggett, V.; Adkins, J. N. Prot. Sci. 2014, 23, 747–759.
(48) Kahraman, A.; Malmström, L.; Aebersold, R. Bioinformatics 2011, 27, 2163–2164.
(49) Vizcaíno, J. A.; Deutsch, E. W.; Wang, R.; Csordas, A.; Reisinger, F.; Ríos, D.; Dianes, J. A.; Sun, Z.; Farrah, T.; Bandeira, N.; Binz, P.-A.; Xenarios, I.; Eisenacher, M.; Mayer, G.; Gatto, L.; Campos, A.; Chalkley, R. J.; Kraus, H.-J.; Albar, J. P.; Martinez-Bartolome, S.; Apweiler, R.; Omenn, G. S.; Martens, L.; Jones, A. R.; Hermjakob, H. Nat. Biotechnol. 2014, 32, 223–226.
(50) Martens, L.; Hermjakob, H.; Jones, P.; Adamski, M.; Taylor, C.; States, D.; Gevaert, K.; Vandekerckhove, J.; Apweiler, R. Proteomics 2005, 5, 3537–3545.
(51) Frese, C. K.; Altelaar, A. F. M.; Hennrich, M. L.; Nolting, D.; Zeller, M.; Griep-Raming, J.; Heck, A. J. R.; Mohammed, S. J. Proteome Res. 2011, 10, 2377–2388.
(52) Michalski, A.; Neuhauser, N.; Cox, J.; Mann, M. J. Proteome Res. 2012, 11, 5479–5491.
(53) Trnka, M. J.; Baker, P. R.; Robinson, P. J. J.; Burlingame, A. L.; Chalkley, R. J. Mol. Cell. Proteomics 2014, 13, 420–434.
(54) Fischer, L.; Chen, Z. A.; Rappsilber, J. J. Proteomics 2013, 88, 120–128.
(55) Sugio, S.; Kashima, A.; Mochizuki, S.; Noda, M.; Kobayashi, K. Protein Eng. 1999, 12, 439–446.
(56) Muth, T.; Kolmeder, C. A.; Salojärvi, J.; Keskitalo, S.; Varjosalo, M.; Verdam, F. J.; Rensen, S. S.; Reichl, U.; de Vos, W. M.; Rapp, E.; Martens, L. Proteomics 2015, 15, 3439–3453.


TABLES

Table 1. Number of XPSMs and distinct XL sites identified by Xilmass, Kojak and pLink at a 5% XPSM-FDR, for the calmodulin-plectin complex data set. Please note that the total also accounts for distinct XL sites that are homo-multimeric or lack structural information.

                                                  XPSMs                      Distinct XL sites
Distance  XPSMs per site  Xwalk prediction        Xilmass  Kojak  pLink      Xilmass  Kojak  pLink
≤ 30 Å    At least two    Predicted               324      75     243        27       18     18
≤ 30 Å    At least two    No prediction filter    439      110    317        42       25     25
≤ 30 Å    One or more     Predicted               337      83     247        40       26     22
≤ 30 Å    One or more     No prediction filter    455      124    324        58       39     32
> 30 Å    At least two    -                       226      57     134        24       15     13
> 30 Å    One or more     -                       234      67     139        32       25     18
TOTAL     At least two    -                       906      247    737        94       61     68
TOTAL     One or more     -                       942      284    756        130      98     87
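The percentages of distinct XL sites supported by at least two XPSMs that are quoted in the Results can be recomputed from the TOTAL rows of Table 1. The snippet below is a minimal sketch of that arithmetic; the dictionary layout and the helper name `percent_supported` are illustrative choices, and only the counts themselves come from the table.

```python
# (sites with at least two XPSMs, all distinct sites), from the TOTAL rows.
total_sites = {
    "Xilmass": (94, 130),
    "Kojak": (61, 98),
    "pLink": (68, 87),
}


def percent_supported(at_least_two: int, one_or_more: int) -> int:
    """Share of distinct XL sites with at least two XPSMs, in whole percent."""
    return round(100 * at_least_two / one_or_more)


for algorithm, (at_least_two, one_or_more) in total_sites.items():
    share = percent_supported(at_least_two, one_or_more)
    print(f"{algorithm}: {at_least_two}/{one_or_more} = {share}%")
```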


FIGURE LEGENDS

Figure 1. An example of an annotated high-scoring XPSM. The MS/MS spectrum is derived from a triply charged precursor ion observed at m/z 760.46 Th. The structure (PDB entry: 4Q57, ref 40) displays the complex between plectin (light grey) and calmodulin (dark grey). One cross-link is between EKLLLWSQR (green, shown as A) and QVKLVNIR (blue, shown as B) by DSS-d12 (orange). The Euclidean Cα-Cα distance between the linked residues is 20.07 Å. Matched fragment ions are colored according to the color of the linked peptides.

Figure 2. Cross-linking sites of the calmodulin-plectin complex identified by Xilmass were mapped to the crystal structure of the complex between plectin (light grey) and calmodulin (dark grey) (PDB entry: 4Q57, ref 40). The structure on the right shows the mapped cross-linking sites when the complex is turned 180° around the vertical axis. Blue lines show cross-links within 30 Å, whereas red lines show cross-links exceeding this distance.

Figure 3. A Venn diagram comparison of distinct cross-linking sites from the calmodulin-plectin complex shows that about one quarter of the total 181 XL sites are shared by all three algorithms (in total 46 XL sites). Xilmass identifies the highest number of algorithm-specific XL sites, followed by Kojak and pLink. See main text for a detailed analysis of the shared and the algorithm-specific XL sites.

Figure 4. Distinct XL sites for each algorithm within the distance constraint of 30 Å, based on Xwalk prediction and further filtering for at least two XPSMs per site. In each category, Xilmass identified the highest number of distinct XL sites, followed by Kojak and pLink.
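The legends above refer to a Euclidean Cα-Cα distance and to a 30 Å distance constraint. As a minimal sketch of how a single cross-link could be classified against such a constraint, the Python snippet below computes the straight-line distance between two 3D coordinates. The coordinates are made up for illustration and are not taken from PDB entry 4Q57; the function names `ca_distance` and `within_constraint` are hypothetical. Note that the Xwalk predictions in this manuscript rely on a solvent accessible surface distance (SASD), which this sketch does not compute.

```python
import math


def ca_distance(ca_a, ca_b):
    """Euclidean distance (in Å) between two Cα coordinates given as (x, y, z)."""
    return math.dist(ca_a, ca_b)


def within_constraint(ca_a, ca_b, max_distance=30.0):
    """True if the Cα-Cα distance does not exceed the distance constraint."""
    return ca_distance(ca_a, ca_b) <= max_distance


# Hypothetical Cα coordinates in Å, chosen only to give round distances.
residue_a = (0.0, 0.0, 0.0)
residue_b = (12.0, 16.0, 0.0)
residue_c = (30.0, 40.0, 0.0)

for label, other in (("A-B", residue_b), ("A-C", residue_c)):
    d = ca_distance(residue_a, other)
    status = "within" if within_constraint(residue_a, other) else "exceeding"
    print(f"{label}: {d:.2f} Å, {status} the 30 Å constraint")
```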

