Parsing and Quantification of Raw Orbitrap Mass Spectrometer Data


Parsing and Quantification of Raw Orbitrap Mass Spectrometer Data using RawQuant Kevin A. Kovalchik, Sophie Moggridge, David D. Y. Chen, Gregg B. Morin, and Christopher S. Hughes J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.8b00072 • Publication Date (Web): 23 Apr 2018 Downloaded from http://pubs.acs.org on April 23, 2018


Journal of Proteome Research

Parsing and Quantification of Raw Orbitrap Mass Spectrometer Data using RawQuant

Kevin A. Kovalchik1, Sophie Moggridge2, David D. Y. Chen1,*, Gregg B. Morin2,3,*, Christopher S. Hughes2

1 – Department of Chemistry, University of British Columbia, Vancouver, British Columbia, Canada
2 – Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, British Columbia, Canada
3 – Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada

* To whom correspondence should be addressed: David Chen ([email protected], 1-604-822-0878) and Gregg Morin ([email protected], 1-604-675-8154)


Abstract

Effective analysis of protein samples by mass spectrometry (MS) requires careful selection and optimization of a range of experimental parameters. As the output from the primary detection device, the ‘raw’ MS data file can be used to gauge the success of a given sample analysis. However, the closed-source nature of the standard raw MS file can complicate effective parsing of the data contained within. To simplify and broaden the range of possible analyses, the RawQuant tool was developed to enable parsing of raw MS files derived from Thermo Orbitrap instruments to yield meta and scan data in an openly readable text format. RawQuant can export user-friendly files containing MS1, MS2, and MS3 meta data, as well as matrices of quantification values based on isobaric tagging approaches. In this study, the utility of RawQuant is demonstrated in several scenarios: 1. Re-analysis of shotgun proteomics data for the identification of the human proteome; 2. Re-analysis of experiments utilizing isobaric tagging for whole-proteome quantification; 3. Analysis of a novel bacterial proteome and synthetic peptide mixture for assessing quantification accuracy when using isobaric tags. Together, these analyses successfully demonstrate RawQuant for the efficient parsing and quantification of data from raw Thermo Orbitrap MS files acquired in a range of common proteomics experiments. In addition, the individual analyses using RawQuant highlight parametric considerations in the different experimental sets, and suggest targetable areas to improve depth of coverage in identification-focused studies, and quantification accuracy when using isobaric tags.


Keywords TMT, isobaric labeling, Orbitrap, SPS-MS3, quantitative proteomics, iTRAQ


Introduction

In recent years, mass spectrometry (MS) methods have been successfully applied to study the proteomes of a variety of organisms 1. Achieving optimal performance in these types of studies requires thorough examination of MS methods to maximize the quality of the output data. During typical MS data acquisition, survey (MS1) scans are used to identify precursor ions suitable for fragmentation and tandem MS/MS (MS2) analysis. As a metric to gauge the performance of this type of shotgun proteomics approach, readouts of the numbers of peptide spectral matches (PSMs), unique peptides, and protein identifications derived from analysis of the generated MS2 spectra are often used. When measuring a complex sample, the number of identifications generally provides an excellent metric by which to assess MS performance, as it indirectly provides information on acquisition rate (number of MS2 scans) and the quality of the obtained data (identification rate). As a result, a growing number of tools designed for quality control tracking of MS performance are also based on identification metrics 2,3. In addition to identification metrics, direct assessment of MS performance can be achieved through examination of the raw scan data to determine parameters such as: 1. topN – number of dependent MS2 scans triggered after an MS1 event; 2. Duty cycle length – time required for an MS1 and all triggered dependent MS2 scans; 3. Scan rate – number of MS2 scans acquired per second. However, these data can be difficult to compile for the average MS user as they require direct access to the raw data.
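To make the three parameters above concrete, the following minimal Python sketch derives them from an ordered list of scan records. The `Scan` record and the `acquisition_metrics` function are hypothetical names chosen for illustration and are not taken from any vendor library; retention times are assumed to be in minutes, and the final, incomplete cycle after the last MS1 scan is ignored.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Scan:
    ms_order: int          # 1 = MS1 survey scan, 2 = dependent MS2 scan
    retention_time: float  # minutes (assumed unit for this sketch)


def acquisition_metrics(scans):
    """Summarise topN, duty cycle and MS2 scan rate for scans in acquisition order."""
    top_n = []        # MS2 scans triggered after each MS1 scan
    duty_cycles = []  # seconds from one MS1 scan to the next
    last_ms1_time = None
    ms2_count = 0
    for scan in scans:
        if scan.ms_order == 1:
            if last_ms1_time is not None:
                # A new MS1 scan closes the previous cycle.
                top_n.append(ms2_count)
                duty_cycles.append((scan.retention_time - last_ms1_time) * 60.0)
            last_ms1_time = scan.retention_time
            ms2_count = 0
        elif scan.ms_order == 2 and last_ms1_time is not None:
            ms2_count += 1
    if not duty_cycles:
        raise ValueError("need at least two MS1 scans to define a cycle")
    return {
        "mean_top_n": mean(top_n),
        "mean_duty_cycle_s": mean(duty_cycles),
        # MS2 scans per second over all complete cycles
        "ms2_scan_rate_hz": sum(top_n) / sum(duty_cycles),
    }
```

Called on a list of `Scan` objects in acquisition order, the function returns a small dictionary; only complete MS1-to-MS1 cycles contribute, which is one possible convention among several.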


A foundational challenge to the extraction of data suitable for identification or further analysis is the closed-source format of the MS raw data file. Software capable of parsing and converting raw formats from individual MS vendors (e.g. MSFileReader) has resulted in substantial progress in the ability to access and process the enclosed data 4–6. However, the rapid development and increasing complexity of MS analysis approaches mean the composition and structure of a raw file can be substantially different between vendor MS or software iterations, potentially compromising the compatibility with, and ultimately the functionality of a given tool. As an example, the introduction of synchronous precursor selection tandem MS/MS/MS (SPS-MS3) scanning for isobaric tag reporter ion acquisition on the Orbitrap Fusion 7 and Orbitrap Fusion Lumos MS platforms resulted in the storage of an additional scan per precursor ion in a non-sequential manner. Implementation of analysis workflows that integrate SPS-MS3 quantification with MS2 identification results has been relatively slow, with the commercial tools Proteome Discoverer (PD) and PEAKS 8, freeware MaxQuant 9, and open-source Trans-Proteomic Pipeline (TPP) 10 offering varying levels of compatibility with raw files generated using this acquisition type.

To improve the accessibility of data contained within raw Thermo Orbitrap MS data files, this work describes the development, validation, and application of a new open-source tool: RawQuant. RawQuant supports simple parsing of meta and scan data from all combinations of MS1, MS2, and SPS-MS3 acquisition modes. RawQuant offers the functionality to output standard text format tables of scan meta data, acquisition characteristics (e.g. topN, duty cycle), and Mascot Generic Files (MGF) suitable for MS2 spectral identification. In addition, RawQuant provides the capability to extract isobaric tag quantification data (e.g. isobaric tags for relative and absolute quantification - iTRAQ and tandem mass tags - TMT) across the Q-Exactive and Orbitrap Fusion instrument families. The functionality of RawQuant was demonstrated using a combination of individual applications that serve to highlight the capability of the tool to provide a user-friendly method to capture the information contained within raw data files acquired on Thermo Orbitrap MS instruments.
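As a rough illustration of what extracting isobaric tag quantification data from centroided peaks can involve, the sketch below applies the matching rule given in the Experimental Section (a ± 0.003 Da window around the expected reporter ion mass, keeping the peak with the lowest ppm mass error). It is not RawQuant's code: the function names and the `TMT_SUBSET` table are hypothetical, and only three TMT reporter masses are listed for brevity.

```python
# Illustrative subset of TMT reporter ion m/z values (not a full tag set).
TMT_SUBSET = {
    "126": 126.127726,
    "127N": 127.124761,
    "127C": 127.131081,
}


def match_reporter_ion(peaks, expected_mz, tol_da=0.003):
    """Pick the centroid closest to expected_mz within +/- tol_da.

    peaks is an iterable of (mz, intensity) tuples. Returns a tuple
    (mz, intensity, ppm_error), or None when no peak is in the window.
    """
    candidates = [(mz, inten) for mz, inten in peaks
                  if abs(mz - expected_mz) <= tol_da]
    if not candidates:
        return None
    # For a single expected mass, the lowest ppm error is the lowest Da error.
    mz, inten = min(candidates, key=lambda peak: abs(peak[0] - expected_mz))
    ppm_error = (mz - expected_mz) / expected_mz * 1e6
    return mz, inten, ppm_error


def quantify_scan(peaks, reporters=TMT_SUBSET):
    """Return {label: (mz, intensity, ppm_error) or None} for one scan."""
    peaks = list(peaks)
    return {label: match_reporter_ion(peaks, mz) for label, mz in reporters.items()}
```

A missing channel is reported as `None` here rather than as zero intensity; that is a design choice of this sketch, made so that absent reporter ions stay distinguishable from measured values.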


Experimental Section

Access of deposited data

For benchmarking RawQuant, a collection of previously published data obtained from ProteomeXchange was examined:

1. Re-analysis of data from a study focused on examination of the HeLa cell proteome with MS analysis on a Q-ExactiveHF instrument (PXD004452) 11.
2. Re-analysis of data from a study focused on examination of the HeLa cell proteome with MS analysis on a Q-ExactiveHF-X instrument (PXD006932) 12.
3. Re-analysis of data from a study focused on examination of the HeLa cell proteome with MS analysis on a Q-ExactiveHF instrument (PXD001305) 13.
4. Re-analysis of data from a study of TMT 10-plex labeled ‘triple-knockout’ genetic mutant Saccharomyces cerevisiae strains (PXD008009) 14.
5. Re-analysis of data files from a study of an iTRAQ 8-plex two-proteome mixture model of human and E. coli peptides performed on a Q-Exactive MS (PXD003640) 15.
6. Re-analysis of data from a study of TMT 10-plex labeled Saccharomyces cerevisiae grown in different carbon sources (PXD002875) 16. Supplementary tables of expression values for this study were obtained through direct contact with the authors.


Detailed explanations of re-analysis of the above data sets for identification results are provided in the Supplementary Information.

RawQuant processing

RawQuant was implemented with the Python programming language and utilizes MSFileReader (Thermo Scientific) to gain access to raw Thermo Orbitrap MS files. For this work, the freely available versions of Python (version 3.6.1, 64-bit) and MSFileReader (version 3.0.29, 64-bit) were used. RawQuant accesses raw data files utilizing François Allain’s Python bindings for MSFileReader (https://github.com/frallain/MSFileReader-Python-bindings). RawQuant and its dependencies are available in the Python Package Index (https://pypi.python.org/pypi/RawQuant/0.1.0). In addition, detailed information describing installation and use is freely available on GitHub: https://github.com/kevinkovalchik/RawQuant.

RawQuant has two primary modes of operation: parse and quant. In parse mode, the user provides the input raw file, the orders (e.g. MS1, MS2, MS3) of the MS scans for extraction, and whether or not an MGF file should be created for subsequent database matching. If specified, RawQuant can return standard meta data for all selected orders, including: scan index, retention time, injection time, and other linked scan events (e.g. MS1 scan from which an MS2 scan was triggered). In quant mode, RawQuant requires input of the raw file and reporter ion design (e.g. TMT 10-plex, iTRAQ 8-plex). Quant mode can utilize standard isobaric tagging methods (e.g. iTRAQ 2 – 8-plex, and TMT 0 – 11-plex), as well as custom, user-defined tag sets. Given an isotope impurity table, corrections can
be applied for both TMT and iTRAQ experimental designs. The impurities are corrected using linear algebra and Cramer’s rule. Reporter ions are quantified based on the centroid data calculated by the instrument firmware. RawQuant performs no peak picking, and thus relies entirely on vendor-provided centroid values contained within the raw files themselves. Fortunately, centroided versions of every spectrum where the Orbitrap detector is used are stored in the raw file independent of the data mode used (e.g. profile or centroid), so these values are available. For extracting reporter ions, RawQuant assumes that the highest order MSn scans contain the data of interest (e.g. MS3 for SPS-MS3). However, this behavior can be overridden by the user. Quantification is performed by searching a window of ± 0.003 Da around the expected reporter ion mass. If multiple ion peaks are found, the ion with the lowest ppm mass error is chosen. The reporter ion’s mass, mass error, and intensity are available from both ion trap and Orbitrap data. Additionally, when the data were acquired in the Orbitrap, the resolution, baseline, and noise are automatically output to the quantification matrix. The generated quantification matrix is automatically saved to disk as a tab-delimited text file.

If MS1 interference quantification is desired, it is assumed that the MS1 scan is acquired in the Orbitrap. Interference can be calculated for both profile and centroid scans, with the calculations based on area or intensity, respectively. The isolation width is automatically acquired from the raw file and is used to extract the relevant mass list centered on the precursor mass. Any peak found at the precursor’s m/z is designated a non-interference. In all scans, appropriate
carbon isotope peaks are searched for at higher m/z values than the precursor and are designated as non-interferences. If the precursor’s mass is greater than 1,000 Da, isotope peaks are also searched for at lower m/z values. To increase the script’s speed, RawQuant does not perform curve-fitting to determine area under profile scans. The area is determined directly from the mass list using the composite trapezoidal rule. The MS1 interference is calculated as a percent ratio of interference area or intensity to the total area or intensity in the isolation window.

In both parse and quant modes, the option to output standard MGF files that contain centroided peaks from MS2 scans is available. In addition, in both modes, a ‘metrics’ table containing information such as number of MS1 scans, number of MS2 scans, mean topN, and mean scan rate and duty cycle can be generated.

E. coli cell culture, protein isolation, reduction, and alkylation

E. coli cultures grown in Luria Broth under standard conditions (a total of ~1e9 cells were harvested) were prepared using the following protocol. Pellets (~1e7 cells each) were thawed on ice and periodically vortexed. To each pellet, 900µL of lysis buffer (50mM HEPES pH 8, 1% SDS (Thermo Fisher, CAT#BP1311-1), 1% Triton X-100 (Sigma, CAT#T8787), 1% NP-40 (Sigma, CAT#NP-40), 50mM NaCl (Sigma, CAT#S7653), 10mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP) (Sigma, CAT#C4706), 40mM chloroacetamide (CAA) (Sigma, CAT#C0267), 1X cOmplete protease inhibitor, EDTA free (Sigma, CAT#11836170001)) was added. Lysis mixtures were transferred to 2mL
FastPrep-compatible tubes containing Lysing Matrix B (MP Biomedicals; CAT#116911050). Lysis mixtures were vortexed on the FastPrep-24 instrument (6 M/s, 40 seconds, 2 cycles, 120 second rest between cycles). Lysates were then centrifuged at 20,000g for 5 minutes, and the supernatant recovered. Resultant lysates were heated at 90°C for 15 minutes, and cooled to room temperature for a further 15 minutes. Protein concentrations were approximated using A280 readings from a NanoDrop instrument (Thermo Scientific).

Protein clean-up with SP3, and protease digestion

Proteins were purified using the SP3 method, as described previously 17–19. A total of 200µg of protein was prepared in a final volume of 100µL in a standard 1.5mL tube. To the lysate, 20µL (400µg) of a 1:1 combination of two different types of carboxylate-functionalized beads, both with a hydrophilic surface (Sera-Mag Speed Beads, GE Life Sciences, CAT#45152105050350 and CAT#65152105050350), was added. Beads were rinsed in water prior to addition to the lysate. The pH of the bead-lysate mixture was maintained at basic conditions (HEPES pH 8) to ensure optimal binding to beads 19,20. To promote binding to the beads, 100µL of 90% ethanol was added to achieve a final concentration of 50% by volume (i.e. 45% ethanol final concentration). Tubes were mixed on a ThermoMixer unit (Eppendorf) at 1000rpm for 10 minutes at room temperature. Tubes were placed in a magnetic rack and incubated for 2 minutes. The supernatant was discarded, and the beads rinsed 3x with 180µL of 90% ethanol by removing the tubes from the magnetic rack and gently re-suspending the beads by pipette mixing. For elution, tubes were removed from the magnetic
rack, and beads were re-suspended in 100µL of 50mM HEPES, pH 8 containing an appropriate amount of trypsin/rLysC mix (1:25 enzyme to protein ratio) (Promega, CAT#V5071) and incubated for 14 hours at 37°C in a ThermoMixer with mixing at 1000rpm. After incubation, the tubes were sonicated briefly (~30 seconds) in a bath sonicator, placed on a magnetic rack, and the supernatant recovered for further processing.

Synthetic peptide mix preparation

The set of standard peptides was taken as a subset from the collection analyzed in the ProteomeTools initiative 21. Peptides were selected for a panel of 61 genes, resulting in a set of 444 total candidates that were synthesized in a ‘SpikeMix’ format (JPT Peptide Technologies) (Table S-1). Upon delivery, dried peptides were reconstituted in 100µL of DMSO, vortexed briefly (~15 seconds), and sonicated in a water bath for 5 minutes. Reconstituted peptides were spiked into real samples based on an acquired signal response curve for the mixture.

Tandem mass tag labeling of peptides

TMT 11-plex labeling kits were obtained from Pierce. Each TMT label (5mg per vial) was reconstituted in 500µL of acetonitrile and refrozen. A maximum of 100µg of combined peptide was present in any single channel. Labeling reactions were carried out through addition of 300µg of TMT label in two volumetrically equal steps of 15µL (150µg per addition), 30 minutes apart. Reactions were quenched through addition of 10µL of glycine (1M stock solution) (Sigma). Labeled peptides were concentrated on a SpeedVac centrifuge (Thermo Scientific) to remove excess acetonitrile, acidified to 1% (v/v)
trifluoroacetic acid (TFA), and purified with a C18 TopTip (100 - 1000µL TopTip, Glygen Corp., CAT#TT3C18). Peptides were dried in a SpeedVac centrifuge, and reconstituted at 1µg/µL in 1% DMSO, 1% formic acid in water.

Peptide clean-up procedures

Peptides were desalted and concentrated using TopTip treatment. For TopTip clean-up, 1mL TopTips (Glygen, CAT#TT3C18) were rinsed twice with 0.6mL of acetonitrile with 0.1% TFA. Cartridges were then rinsed twice with 0.6mL of water with 0.1% TFA prior to sample loading. Loaded samples were rinsed three times with 0.1% formic acid (0.6mL per rinse) and eluted with 1.2mL of 90% acetonitrile containing 0.1% formic acid. All TopTip processed samples were concentrated in a SpeedVac centrifuge and subsequently reconstituted in 1% formic acid with 1% DMSO in water.

Chromatographic separation prior to MS analysis

For all runs, samples were introduced to the MS using an Easy-nLC 1000 system (Thermo Scientific). Columns used for trapping and analytical separations were packed in-house in fritted capillaries prepared using a combination of formamide and Kasil (1:3 ratio, Next Advance, CAT#FRIT-KIT). Trapping columns were packed in 100µm internal diameter capillaries to a length of 3cm with C18 core-shell beads (Aeris PEPTIDE XB-C18, Phenomenex, 1.7µm particle size, CAT#04A-4506). Prior to injection, the pre- and analytical columns were equilibrated at 400 bar for 10µL and 3µL, respectively. After injection, trapping was carried out for a total volume of 15µL at a pressure of 400 bar. After trapping, gradient elution of peptides was performed on a core-shell C18 (Aeris
PEPTIDE XB-C18, Phenomenex, 1.7µm particle size) column packed in 100µm internal diameter capillaries to a length of 25cm and heated to 50°C using AgileSLEEVE column ovens (Analytical Sales & Service). Elution in 60-minute runs was performed with a gradient of mobile phase A (water and 0.1% formic acid) from 3 – 8% B (acetonitrile and 0.1% formic acid) over 2 minutes, 8 – 25% B over 40 minutes, and to 40% B over 11 minutes, with final elution (80% B) using a further 7 minutes at a flow rate of 400nL/min.

MS analysis of peptide samples on the Orbitrap Fusion

Data acquisition with Orbitrap Fusion (control software version 3.0.2041) was carried out using a data-dependent method with MS2 in the Orbitrap, or multi-notch synchronous precursor selection (SPS)-MS3 scanning for TMT tags. The Orbitrap Fusion was operated with a positive ion spray voltage of 2200 V and a transfer tube temperature of 275°C. MS1 scans were acquired in the Orbitrap at a resolution of 120K, across a mass range of 350 – 1500 m/z, with an RF lens setting of 60, an AGC target of 2e5, a max injection time of 50ms, for 1 microscan in profile mode. For dependent scans, monoisotopic precursor selection was enabled with the ‘Peptide’ setting, an intensity threshold of 5e3, charge state selection of 2 – 4 charges, and dynamic exclusion for 30 seconds after 1 appearance with 20ppm low and high tolerances. Isotopes were excluded from repeat analysis, and the dependent scan on a single charge state per precursor setting was disabled. For MS2 acquisition in the Orbitrap, quadrupole isolation using a 1.4m/z window with no offset was used prior to HCD fragmentation with a setting of 35%.
Data acquisition was carried out in the Orbitrap using a resolution of 50K, a first mass of 110m/z, an AGC target of 2e5, and a max injection time of 120ms for 1 microscan in profile mode. For SPS-MS3 acquisition, MS2 scans were acquired in the ion trap after quadrupole isolation with a window of 1.4m/z. Activation was by CID with an energy of 30%, a 10ms activation time, and 0.25 activation Q. The ion trap was set to scan in ‘Rapid’ mode, with a 1e4 AGC target, and 30ms max inject time. MS2 scans were acquired in centroid mode. Ions for MS3 scans were selected based on a precursor mass range of 350 – 1200, a relative intensity filter of 10%, precursor ion exclusion of 20 and 5ppm (low, high), and isobaric tag loss of TMT. A total of 10 precursors were set for SPS using MS1 and MS2 isolation windows of 2m/z with no offset. Ions were fragmented with HCD at an energy of 60%. Scans were acquired in the Orbitrap at a resolution of 50K and scan range of 120 – 750m/z for 1 microscan in profile mode. SPS-MS3 AGC targets and maximum injection times were altered across windows of 8e4 – 8e5 and 60 – 120ms as indicated.

Mass spectrometry data analysis

All acquired data were processed using SearchCLI (version 3.2.30) and PeptideShakerCLI (version 1.16.11) 22,23. All searches used a combination of XTandem (version 2015.12.15.2), Myrimatch (version 2.2.140), MS-GF+ (version 10282), and Comet (version 2016.01 rev. 3) algorithms. MGF files generated with RawQuant were used in all searches.
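Because all searches start from MGF files, it may help to see how a centroided MS2 spectrum maps onto that format. The sketch below writes one spectrum as a Mascot Generic Format block; `write_mgf_spectrum` and its parameters are hypothetical names for illustration and do not reproduce RawQuant's writer, and the number formatting is an arbitrary choice.

```python
import io


def write_mgf_spectrum(handle, title, precursor_mz, charge, rt_seconds, peaks):
    """Write a single centroided MS2 spectrum as one MGF 'IONS' block.

    handle is any text file-like object; peaks is an iterable of
    (mz, intensity) tuples, ideally sorted by m/z.
    """
    handle.write("BEGIN IONS\n")
    handle.write(f"TITLE={title}\n")
    handle.write(f"PEPMASS={precursor_mz:.5f}\n")
    handle.write(f"CHARGE={charge}+\n")
    handle.write(f"RTINSECONDS={rt_seconds:.2f}\n")
    for mz, intensity in peaks:
        handle.write(f"{mz:.5f} {intensity:.1f}\n")
    handle.write("END IONS\n\n")


def spectra_to_mgf(spectra):
    """Serialise a list of spectrum dicts to one MGF string (in memory)."""
    buffer = io.StringIO()
    for spectrum in spectra:
        write_mgf_spectrum(buffer, **spectrum)
    return buffer.getvalue()
```

Each spectrum dict passed to `spectra_to_mgf` carries the keys `title`, `precursor_mz`, `charge`, `rt_seconds`, and `peaks`; for real files the same writer can be pointed at an open text file instead of the in-memory buffer.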


For the analysis of the MS data in SearchCLI, centroided MS2 spectra (MGF files from RawQuant) were searched against a UniProt E. coli proteome database (release 2017_10) containing common contaminants and the synthetic peptide sequences, which was appended with reversed sequences generated using the -decoy tag of FastaCLI in SearchCLI (8,710 total sequences, 4,355 target). Identification parameter files were generated using IdentificationParametersCLI in SearchCLI, specifying precursor and fragment tolerances of 20ppm and 0.5 Da (ion trap MS2) or 0.05 Da (Orbitrap MS2), carbamidomethyl of cysteine, TMT 10-plex of peptide N-term, and TMT 10-plex of lysine as fixed modifications, and oxidation of methionine and acetylation of protein N-term as variable modifications. The msgf_instrument, msgf_fragmentation, and msgf_protocol tags were set to 0, 1, and 4 for ion trap MS2, and 3, 3, and 4 for Orbitrap MS2. All SearchCLI results were processed into PSM, peptide, and protein sets using PeptideShakerCLI. Error rates were controlled in PeptideShakerCLI using the target-decoy search strategy to determine false-discovery rates (FDR). Hits from multiple search engines were unified using posterior error probabilities determined from the target-decoy search strategy. Results were exported from PeptideShakerCLI using ReportCLI. All results were filtered to provide a final FDR at the PSM, peptide, and protein level of