
Article

Optimization of acquisition and data-processing parameters for improved proteomic quantification by SWATH mass spectrometry
Shanshan Li, Qichen Cao, Weidi Xiao, Yufeng Guo, Yunfei Yang, Xiaoxiao Duan, and Wenqing Shui
J. Proteome Res., Just Accepted Manuscript • Publication Date (Web): 20 Dec 2016
Downloaded from http://pubs.acs.org on December 20, 2016


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

Optimization of acquisition and data-processing parameters for improved proteomic quantification by SWATH mass spectrometry

Shanshan Li1,#, Qichen Cao2,#,*, Weidi Xiao3, Yufeng Guo2, Yunfei Yang2, Xiaoxiao Duan3, Wenqing Shui1,*

1 iHuman Institute, ShanghaiTech University, Shanghai 201210, China
2 Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
3 College of Life Sciences, Nankai University, Tianjin 300071, China

#These authors contributed equally to this work

*To whom correspondence should be addressed:

Qichen Cao, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China; Tel: 86-22-24828768; email: [email protected]

Wenqing Shui, iHuman Institute, ShanghaiTech University, Shanghai 201210, China; Tel: 86-21-20685595; email: [email protected]


Abstract

Proteomic analysis with data-independent acquisition (DIA) approaches represented by the SWATH technique has gained intense interest in recent years because DIA is able to overcome the intrinsic weakness of conventional data-dependent acquisition (DDA) methods and afford higher throughput and reproducibility for proteome-wide quantification. Although the raw mass spectrometry (MS) data quality and the data-mining workflow conceivably influence the throughput, accuracy and consistency of SWATH-based proteomic quantification, a systematic evaluation and optimization of the acquisition and data-processing parameters for SWATH MS analysis has been lacking. Herein, we evaluated the impact of major acquisition parameters such as the precursor mass range, isolation window width and accumulation time, as well as data-processing variables including peak extraction criteria and spectral library selection, on SWATH performance. Fine-tuning these interdependent parameters can further improve the throughput and accuracy of SWATH quantification compared to the original settings adopted in most SWATH proteomic studies. Furthermore, we compared the effectiveness of two widely used peak extraction software tools, PeakView and Spectronaut, in the discovery of differentially expressed proteins in a biological context. We expect our work to contribute to a deeper understanding of the critical factors in SWATH MS experiments and to help researchers optimize their SWATH parameters and workflows depending on the sample type, available instrument and software.


Key words: SWATH, DIA, acquisition parameters, data processing, spectral library, proteomic quantification


Introduction

The ultimate goal of proteomics is qualitative and quantitative profiling of the full repertoire of proteins with sufficient accuracy and consistency. As technologies in mass spectrometry continue advancing, thousands of protein constituents in complex biological samples can be identified and quantified unambiguously.1-3 Proteomic analysis with data-independent acquisition (DIA) approaches represented by the sequential window acquisition of all theoretical fragment ion spectra (SWATH) technique has gained intense interest in recent years because DIA is able to overcome the intrinsic weakness of conventional data-dependent acquisition (DDA) methods and afford higher throughput and reproducibility for proteome-wide quantification.4, 5

SWATH mass spectrometry (MS) analysis records the complete fragment ion traces for all peptides detectable within specific precursor mass windows, thereby overcoming the stochastic, intensity-driven selection of precursors in DDA-based MS analysis. In fact, SWATH MS maintains the major advantages of multiple reaction monitoring (MRM)-based targeted approaches such as a high degree of specificity, reproducibility and sensitivity,6 yet it substantially improves throughput and coverage of protein quantification compared to MRM.5 An increasing number of studies have demonstrated the great potential of SWATH MS in large-scale quantitative proteomic research including interrogation of dynamics of the human interactome,7, 8 comprehensive mapping of mouse tissue proteome9, 10 and human plasma proteome11 as well as determination of genome-wide absolute protein concentrations.12, 13

SWATH MS operates within a space of interdependent acquisition parameters, including precursor mass range, precursor isolation window width, accumulation time of the product ion scan, and total duty cycle, which collectively influence the intensity and specificity of fragment ion peaks as well as the throughput, accuracy and consistency of peak quantification. In the original publication of the SWATH technique, the authors provided an acquisition parameter set, listed as condition 1 in Table 1, mainly based on their extensive experience in DDA and MRM experiments as well as computational simulation of fragment ion interferences.5 Ever since then, these instrument settings have been adopted in many proteomic studies using the SWATH approach.4, 7, 8, 10, 11, 13-15 Because of the interconnections between these acquisition variables, such as the obvious tradeoff between the isolation window width and the accumulation time/cycling rate,5 we consider it necessary to explore the acquisition parameters in detail and understand their distinct influences on SWATH MS performance.

Fragment ion chromatogram extraction against a reference spectral library is an essential step in most SWATH MS data processing workflows. The depth and quality of the spectral library have a significant impact on the outcome of proteomic quantification. Very recently, Wu JX et al. evaluated locally generated and online repository-based libraries for their effects on SWATH quantification using the commercial software PeakView for SWATH peak extraction.16 In addition to PeakView, an array of open-access software tools such as Spectronaut, OpenSWATH, DIA-Umpire, and Skyline have been developed to process SWATH and other types of DIA datasets with more flexibility and openness than the commercial software.4, 12, 17-21 Among them, Spectronaut has emerged as a popular DIA data mining tool because it can utilize spectral libraries generated from raw data acquired on various instrument platforms and achieve accurate retention time (RT) calibration using defined spike-in peptide standards.22 However, the variables in fragment ion peak extraction and spectral library construction have not been extensively investigated.

In this work, we systematically evaluated the impact of major acquisition parameters such as the precursor mass range, isolation window width and accumulation time, as well as data-processing variables including peak extraction criteria and spectral library selection, on SWATH performance. By analyzing yeast proteomic samples serially diluted in a complex digestion background, we assessed the throughput and accuracy of proteins and peptides quantified by SWATH MS analysis under different conditions. Furthermore, we compared two workflows using PeakView or Spectronaut for fragment ion peak extraction and assessed their performance in the discovery of differentially expressed proteins in yeast cells upon heat shock stress. We anticipate our work will contribute to a deeper understanding of the critical factors in SWATH MS experiments and help researchers optimize their SWATH parameters and workflows depending on the sample type, available instrument and software.


MATERIALS AND METHODS

Yeast Culture, Protein Extraction, Digestion and Fractionation

Cell culture media and media supplements were all purchased from Invitrogen (Carlsbad, CA, USA). All the other chemical materials were purchased from Sigma (St. Louis, MO, USA). Saccharomyces cerevisiae BY4743 strain was grown at 30 °C in YPD medium until it reached mid-exponential phase. The yeast culture was centrifuged at 1500 × g for 5 min at 4 °C. The cell pellets were washed 3 times with cold PBS to remove the medium. Then the cell pellets were resuspended in lysis buffer of 8 M urea, 100 mM NH4HCO3, 5 mM DTT and protease inhibitor cocktail (Roche, Mannheim, Germany). The cells were disrupted with glass beads and centrifuged at 15000 × g for 20 min at 4 °C. The protein concentration of the supernatant was determined by Bradford Protein Assay Kit. Yeast proteins (~1 mg) were reduced with 10 mM DTT at 37 °C for 4 h and alkylated with 40 mM iodoacetamide at room temperature in darkness for 40 min. An additional 10 mM DTT was added to quench excess iodoacetamide, followed by incubation at 37 °C for 30 min. Samples were then diluted with 100 mM NH4HCO3 to a final concentration of 1.0 M urea and proteins were digested with sequencing grade modified trypsin (Promega, Madison, USA) at an enzyme:protein ratio of 1:100 (w/w) at 37 °C for 4 h. The same amount of trypsin was added again and the mixture was incubated at 37 °C overnight. Digestion was terminated by adding 1% FA and then the peptides were desalted with C18 cartridges (Waters, Milford, USA) and dried by speed vacuum.

To build a reference spectral library for SWATH-MS analysis, the yeast peptide sample was pre-fractionated by high-pH RPLC with a Durashell-C18 column (C18, 3 µm resin, 4.6 mm × 250 mm, Agela, China) on the Nexera UHPLC system (SHIMADZU, Japan). The peptides were dissolved in mobile phase A (water-acetonitrile-NH4OH = 98:2:0.014, v/v/v). The peptides were eluted by mobile phase B (water-acetonitrile-NH4OH = 2:98:0.014, v/v/v) using a gradient of 5-40 min, 8-18%; 40-62 min, 18-32%; 62-64 min, 32-95%. The fractions were collected and pooled into 15 fractions before lyophilization under vacuum. Before LC-MS analysis, all peptide samples were spiked with the retention time standard peptides iRT-Kit (Biognosys, Schlieren, Switzerland)22 according to the manufacturer's instructions.

Dilution of Yeast Proteomic Samples with E. coli Total Digests

To assess the SWATH-MS quantification accuracy, the yeast total cell digest was mixed with Escherichia coli total cell digest, which simulates a highly complex proteomic matrix. The E. coli K-12 strain was grown to 1.0 OD in M9 minimal medium. Protein extraction and digestion were conducted with the same procedure as described above for yeast total digest preparation. The yeast peptide samples were spiked into E. coli total digest with 2X, 5X and 10X dilution factors, which gave rise to expected fold changes of 0.5, 0.2 and 0.1 for the serially diluted yeast samples vs the undiluted sample. Each diluted and undiluted yeast proteomic sample was injected into 1D nanoLC-MS in duplicate for SWATH MS analysis.
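The expected fold changes are simply the reciprocals of the dilution factors. A minimal sketch of that arithmetic follows; the function name `expected_fold_change` is hypothetical and is not part of the study's workflow.

```python
def expected_fold_change(dilution_factor):
    """Expected yeast signal ratio (diluted / undiluted) for a dilution factor."""
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    return 1.0 / dilution_factor

# Dilution factors named in the text: 2X, 5X and 10X.
expected = {factor: expected_fold_change(factor) for factor in (2, 5, 10)}
```

Measured protein ratios can then be compared against these theoretical values to judge quantification accuracy.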

NanoLC-MS/MS Setup

All the nanoLC-MS/MS runs in this work were performed on an Eksigent NanoLC connected to a TripleTOF 5600 mass spectrometer (AB SCIEX, Concord, Ontario) with a nano-electrospray ionization source. Each proteomic sample (~2 µg) was loaded onto a C18 trap column (10 mm × 100 µm, 5 µm, C18 resin) using an isocratic 98% Buffer A (2% acetonitrile, 0.1% FA) and 2% Buffer B (98% acetonitrile, 0.1% FA) at a flow rate of 2 µL/min. Then peptides were separated on a nanoLC column (150 mm × 75 µm) packed with 3 µm C18-AQ resin (Dr. Maisch, GmbH, Germany) at a flow rate of 300 nL/min with an elution gradient of 0-1 min, 5% Buffer B; 1-55 min, 5-24% B; 55-70 min, 24-36% B; 70-85 min, 36-80% B. The mass spectrometer was operated in the positive ion mode. In the shotgun experiment of pre-fractionated yeast digest samples, information-dependent acquisition (IDA) was implemented using a “top 40” method. Specifically, a 250 ms survey scan was performed in the m/z range of 350-1500, and the top 40 ions above the intensity threshold of 120 counts were selected for subsequent MS/MS scans with an accumulation time of 50 ms. In the SWATH experiment, a 100 ms survey scan was performed in the m/z range of 350-1500, followed by a series of consecutive SWATH scans. Key parameters such as the SWATH scan mass range, isolation window width and accumulation time, specified in Table 1, were individually evaluated.

Database Search and Spectral Library Generation

For extensive spectral library generation, 15 off-line RPLC fractions from yeast protein digestion were analyzed on the TripleTOF 5600 mass spectrometer in the IDA mode. Each IDA file was searched with Mascot (v2.5.1, Matrix Science), MaxQuant (v1.5.0.30),23 ProteinPilot (v4.5, AB SCIEX)24 or the X!Tandem25 search engine built into the SearchGUI software (v2.1.4)26 against the SGD protein sequence database (release 09 Nov. 2011, containing 6771 entries). Trypsin was set as the specific enzyme and up to two missed cleavages per peptide were allowed. Carbamidomethylation on cysteine was set as a fixed modification; oxidation on methionine and acetylation on the protein N-terminus were set as variable modifications. For Mascot, MaxQuant and X!Tandem searches, precursor ion mass tolerance was set to 20 ppm and fragment ion tolerance was 0.05 Da. For the ProteinPilot search, the “Thorough ID” mode was selected, which automatically adjusts the mass tolerance to fit the high-resolution MS and MS/MS data. The corresponding database search result files were imported into the Skyline software19 to generate the spectral library with a cut-off score of 0.99 to ensure confident spectral assignment.
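A relative tolerance in ppm scales with m/z, whereas the fragment tolerance above is a fixed 0.05 Da. The sketch below converts a ppm tolerance into absolute m/z bounds; the helper name `ppm_window` and the example precursor at m/z 500 are illustrative assumptions, not values from the study.

```python
def ppm_window(mz, tol_ppm):
    """Return (low, high) m/z bounds for a symmetric tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6  # absolute half-width in m/z units
    return mz - delta, mz + delta

# Hypothetical precursor at m/z 500 with the 20 ppm tolerance from the text.
low, high = ppm_window(500.0, 20.0)
half_width = (high - low) / 2.0  # 0.01 m/z for this example
```

At m/z 500, 20 ppm therefore corresponds to a half-width of 0.01, five times tighter than the 0.05 Da fragment tolerance.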

SWATH MS Data Extraction and Statistical Analysis

Peak extraction of the SWATH data was performed using either the Spectronaut software (ver 8.0, Biognosys, Switzerland) or the SWATH micro App embedded in PeakView (ver 2.0, AB SCIEX, USA).27 SWATH data were processed with default settings in Spectronaut.4 Reference peptides from the iRT-kit (Biognosys) spiked into each sample were used to calibrate the retention time of extracted peptide peaks using Spectronaut. Peptide identification results were filtered with a q-value < 0.01, which controlled the estimated peptide FDR below 1% using the error rate algorithm originally from mProphet.28 PeakView was also used for peptide peak extraction with the following parameters: 75 ppm m/z tolerance for the targeted transition, six peptides selected per protein, six transitions selected per peptide, peptide identification FDR < 1%, and shared peptides excluded. RT calibration was also performed based on iRT peptide elution profiles in PeakView using the SWATH App module (ver 2.0).

After peak extraction with either Spectronaut or PeakView, the sum of MS2 ion peak areas of SWATH-quantified peptides for each individual protein was exported to calculate the protein peak area. In the data analysis for diluted vs undiluted yeast proteomic samples, the protein fold change was calculated based on protein peak areas from the pair of samples in comparison. For statistical analysis of the SWATH dataset from the yeast heat shock experiment, the peak extraction output data matrix from either Spectronaut or PeakView was imported into MSstats (v2.3.5) for data normalization and relative protein quantification.29 Proteins with a fold change > 1.5 and statistical p-value < 0.05 estimated by MSstats were regarded as differentially expressed under the heat shock condition vs control.
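The protein-level roll-up and thresholding described above can be sketched as follows. This is an illustration under stated assumptions, not the MSstats procedure: normalization and p-value estimation are left to MSstats, the function names and peptide areas are hypothetical, and the fold-change cutoff is assumed to apply symmetrically (a ratio below 1/1.5 also counts), which the text does not spell out.

```python
from collections import defaultdict

def protein_areas(peptide_rows):
    """Sum MS2 peak areas over the quantified peptides of each protein.

    peptide_rows: iterable of (protein_id, peptide_id, ms2_peak_area) tuples.
    """
    totals = defaultdict(float)
    for protein_id, _peptide_id, area in peptide_rows:
        totals[protein_id] += area
    return dict(totals)

def fold_changes(numerator, denominator):
    """Protein-level ratio for proteins quantified in both samples."""
    return {
        protein: numerator[protein] / denominator[protein]
        for protein in numerator
        if denominator.get(protein, 0.0) > 0.0
    }

def is_differential(fold_change, p_value, fc_cutoff=1.5, alpha=0.05):
    """Apply the fold change > 1.5 and p-value < 0.05 thresholds.

    Assumption: the cutoff is symmetric, so ratios below 1/fc_cutoff pass too.
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude > fc_cutoff and p_value < alpha

# Hypothetical peptide-level areas, for illustration only.
undiluted = [("P1", "pepA", 1000.0), ("P1", "pepB", 3000.0), ("P2", "pepC", 500.0)]
diluted_2x = [("P1", "pepA", 500.0), ("P1", "pepB", 1500.0), ("P2", "pepC", 250.0)]

ratios = fold_changes(protein_areas(diluted_2x), protein_areas(undiluted))
```

With these toy inputs both proteins come out at a ratio of 0.5, matching the expected value for a 2X dilution.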

Yeast Heat Shock Experiment

An industrial yeast strain ScY01 was first grown at 30 °C to early exponential phase in YPD medium. An equal volume of pre-warmed YPD medium (70 °C) was added to the culture, resulting in an instantaneous shift of culture temperature from 30 °C to 50 °C, and cells were exposed to the heat stress at 50 °C for 30 min. A control experiment was carried out by culturing yeast cells at 30 °C constantly. Cells were collected by centrifugation at 1500 × g for 5 min at 4 °C and the cell pellets were washed with PBS three times. The subsequent protein extraction and digestion procedure was the same as described above. Two biological replicates, each with two process replicates, were prepared for both the control and heat shock samples. Protein digests from individual samples were separately analyzed on the AB 5600 TripleTOF MS in SWATH acquisition mode. The optimized instrument parameters specified as “condition 5” in Table 1 were applied. SWATH peak extraction was separately performed using Spectronaut and PeakView, both with the criteria of peptide transition FDR 1 unique peptide per protein.


RESULTS AND DISCUSSION

Impact of SWATH MS Major Acquisition Parameters

We first evaluated the influence of tuning major SWATH MS acquisition parameters on the total number of proteins and peptides quantified as well as the precision of quantification. The possible influence of several instrument parameters on general SWATH MS performance was briefly discussed in a recent publication.30 We anticipate that the best identification and quantification results can be obtained only when the major acquisition parameters (i.e. precursor mass range, Q1 isolation window width, accumulation time for product ion scans, and cycle time) are well adapted to both the targeted analytes and the instrument performance. To this end, six different combinations of these parameters listed in Table 1 were applied in SWATH analysis of a yeast proteomic sample using a TripleTOF 5600 mass spectrometer (complete quantification results are summarized in Table S1). In this study, we relied on protein identification results from DDA experiments on the pre-fractionated peptide samples from the same yeast total digest to construct a spectral library for SWATH data extraction.

Condition 1 represents the conventional SWATH MS settings, which divide the mass range from 400 to 1200 m/z into 32 consecutive Q1 isolation windows with 25 amu per window.4, 5 Each product ion scan takes 100 ms and the total cycle time is 3.35 s, which allows collection of enough data points across most peptide chromatographic peaks. Under condition 2, the precursor mass range was narrowed to 350-1000 m/z and the accumulation time of each product ion scan was increased while the cycle time was kept constant. These changes resulted in quantification of 14% more proteins and peptide precursors compared to the initial setting (Figure 1A). We attributed the gain of quantification throughput mostly to the increased accumulation time, which directly improves the MS/MS spectral quality. The increase of accumulation time in the product ion scan is related to the reduced number of isolation windows due to the narrowed mass range. It is noteworthy that the adjusted mass range did not compromise the total number of protein and peptide identifications in the SWATH experiment, as we found that the majority of identifications from a DDA experiment of the yeast total digest using the same instrument were concentrated in this region (Figure S1A). In other words, skipping the high mass range of 1000-1200 m/z allowed for an enhanced product ion scan for each isolation window while not affecting the cycling rate, leading to increased quantification coverage as well as slightly improved precision (Figure 1A and 1B). However, when we analyzed DDA data from mouse brain tissue or HeLa cells acquired on two instruments from other vendors,31, 32 many more identifications were based on precursors detected in the high mass range (Figure S1B, S1C). Therefore, optimization of the precursor mass range is sample- and instrument-specific. According to our DDA data, we employed this new precursor mass range of 350-1000 m/z in the subsequent experiments and condition 2 was regarded as a benchmark for performance comparison.
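The interplay between mass range, window width, accumulation time and cycle time can be illustrated with an idealized model: one survey scan plus one product ion scan per window, with no per-scan overhead and no window overlap. These simplifications and all function names are assumptions for illustration; the settings actually used for conditions 2-6 are those in Table 1, not the outputs of this sketch.

```python
import math

def n_windows(mz_low, mz_high, width):
    """Number of consecutive fixed-width Q1 windows needed to cover the range."""
    return math.ceil((mz_high - mz_low) / width)

def cycle_time_s(windows, accumulation_ms, survey_ms=100.0):
    """Idealized cycle time: one survey scan plus one product ion scan per window."""
    return (survey_ms + windows * accumulation_ms) / 1000.0

def accumulation_ms_for_cycle(windows, target_cycle_s, survey_ms=100.0):
    """Accumulation time per window that fills a target cycle time."""
    return (target_cycle_s * 1000.0 - survey_ms) / windows

# Condition 1 as described in the text: 400-1200 m/z, 25 amu windows, 100 ms scans.
cond1_windows = n_windows(400, 1200, 25)            # 32, matching the text
cond1_cycle = cycle_time_s(cond1_windows, 100.0)    # 3.3 s; the text reports 3.35 s

# Narrowing the range to 350-1000 m/z at the same width needs fewer windows,
# so each window can accumulate longer within the same idealized cycle.
narrow_windows = n_windows(350, 1000, 25)
narrow_accumulation = accumulation_ms_for_cycle(narrow_windows, cond1_cycle)
```

The small gap between the modeled 3.3 s and the reported 3.35 s is consistent with instrument overhead that the model ignores. Under the same model, the narrower range needs 26 windows and leaves roughly 123 ms per product ion scan.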

We next assessed the impact of Q1 isolation window width on SWATH MS performance. In principle, large isolation windows allow for faster cycling rates across a defined precursor scan range. However, large isolation windows increase the number of precursors concurrently fragmented in the respective window, introducing more ion interference. In addition, the presence of fragment ion interference could influence the mass accuracy, resolution and signal intensity of targeted fragment ions in SWATH spectra.27 When we narrowed the isolation window from the original 25 amu to 15 amu or 10 amu while still fixing the cycle time at around 3.3 s, many more isolation windows were needed across the mass range of 350-1000 m/z and each one was allotted a shorter accumulation time (conditions 3 and 4 in Table 1). Interestingly, these adjustments further increased the total number of quantified proteins by 7.1% and 10.4% for conditions 3 and 4 respectively compared to condition 2. Notably, these new conditions did not impair the precision of quantification, mainly because of the constant cycle time (Figure 1B). These results indicate that the reduced interference in the product ion spectra due to smaller isolation windows outweighed the reduction of accumulation time, causing a net enhancement of SWATH data quality. Recently, the technique of variable Q1 isolation windows has been introduced to SWATH MS; it allows more flexible optimization of the SWATH isolation window within different segments of the precursor mass range so as to acquire deeper coverage for proteome-wide quantification.33

Considering that the cycle time significantly affects data point sampling across chromatographic peaks and thus the precision of quantification, we then modified this parameter to achieve faster cycling rates (conditions 5 and 6). It turned out that a cycle time of 2.7 s, which is shorter than the conventional setting (3.3-3.4 s), afforded the best precision among the six conditions examined here (Figure 1B). Compared to condition 2, the median CV of peptide peak quantification in condition 5 dropped from 8.3% to 6.1%, and the median number of data points across peaks increased from 8.6 to 11.0. However, the even faster cycling rate in condition 6 deteriorated the precision of SWATH quantification, as shown by the median CV of peak area rising to 10.7% (Figure 1A, 1B). It should also be noted that in our study the shorter cycle time is connected to the reduced accumulation time of product ion scans (80 or 50 ms), as the isolation window setting was not much changed. Interestingly, the reduced accumulation time in condition 5 did not have a negative effect on the quantification throughput (Figure 1A). Instead, slightly more proteins and peptides were measured in condition 5 vs condition 2 (Figure 1A). However, the unmatched combination of shorter accumulation time and wider isolation window in condition 6 would affect fragment ion spectral quality and peptide transition extraction, which ultimately lowered both quantification throughput and precision (Figure 1). Scatter plots of CV relative to ion intensity under different conditions are shown in Figure S2.
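The link between cycle time and sampling density can be checked with simple arithmetic: the number of data points across a peak is roughly the peak width divided by the cycle time. The sketch below assumes a cycle time of 3.35 s for condition 2 (the text only states it was kept constant relative to condition 1) and uses hypothetical function names.

```python
def points_per_peak(peak_width_s, cycle_time_s):
    """Approximate number of data points sampled across a chromatographic peak."""
    return peak_width_s / cycle_time_s

def implied_peak_width_s(points, cycle_time_s):
    """Peak width implied by an observed number of points at a given cycle time."""
    return points * cycle_time_s

# Median of 8.6 points reported for condition 2; 3.35 s cycle time is an assumption.
width = implied_peak_width_s(8.6, 3.35)

# Predicted sampling at the 2.7 s cycle time of condition 5.
predicted = points_per_peak(width, 2.7)
```

This gives an implied median peak width of about 29 s and a predicted value of about 10.7 points at 2.7 s, in reasonable agreement with the reported median of 11.0.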

Taken together, we obtained an optimal combination of major acquisition parameters (condition 5) in which the precursor mass range, accumulation time and cycle time are significantly modified compared to the conventional settings adopted in previous studies using the TripleTOF 5600 mass spectrometer (condition 1). These modifications led to an increase of the total number of quantified proteins from 1983 to 2297, and a reduction of the median CV of peptide peak area from 10.5% to 6.1%. Our strategy of combined adjustment of interdependent instrument parameters can be extended to other mass spectrometry platforms with faster scan speed, better sensitivity and higher mass resolving power to accomplish optimal SWATH MS performance.
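The precision metric used throughout, the median coefficient of variation (CV) of peptide peak areas across replicate injections, can be sketched as below. The text does not state whether a sample or population standard deviation was used; this sketch assumes the sample standard deviation, and the peptide areas are hypothetical.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean peak area must be non-zero")
    return 100.0 * statistics.stdev(values) / mean

def median_cv(peptide_areas):
    """Median CV over peptides; peptide_areas maps peptide -> replicate areas."""
    return statistics.median(cv_percent(areas) for areas in peptide_areas.values())

def percent_change(before, after):
    """Relative change in percent."""
    return 100.0 * (after - before) / before

# Hypothetical duplicate-injection peak areas, for illustration only.
toy_areas = {"pepA": [100.0, 110.0], "pepB": [200.0, 200.0], "pepC": [50.0, 70.0]}
toy_median_cv = median_cv(toy_areas)

# Gain in quantified proteins reported in the text: 1983 -> 2297.
protein_gain = percent_change(1983, 2297)
```

The reported increase from 1983 to 2297 proteins corresponds to a gain of about 15.8%.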

SWATH MS Analysis of Yeast Proteomic Samples Diluted in E. coli Digest Background

To assess the capability of SWATH MS in profiling various proteins in a complex matrix, we diluted a proteomic sample of the yeast total digest in an E. coli digest background by a factor of 2, 5 and 10. The initial (undiluted) proteomic sample and three diluted samples were analyzed by SWATH MS using the optimal instrument parameter set discussed above (i.e. condition 5 in Table 1). Each sample was injected with equal loading in technical duplicate. The spectral library generated earlier from pre-fractionated yeast proteomic samples was used here for detection of proteins and peptides from the SWATH dataset.

The number of yeast proteins and peptides that could be identified and quantified by SWATH analysis decreased gradually from the initial sample to the diluted samples with increasing dilution factor (Figure 2A). In total, 14007 peptide precursors corresponding to 2297 proteins were measured in the initial yeast total digest, whereas only 1901 precursors corresponding to 535 yeast proteins were measured in the 10-fold diluted sample. It becomes increasingly challenging to extract and identify less abundant yeast peptides as they are serially diluted with E. coli total digest. Surprisingly, according to the copy number estimation given by Ghaemmaghami et al.,34 the cellular abundances of yeast proteins detected by SWATH MS in the initial sample and in the serially diluted samples all spanned a wide dynamic range of four orders of magnitude (from 100 to 1E6 copies/cell) (Figure 2B). Furthermore, the quantification precision for yeast proteomic samples was not much affected by the E. coli digest background. More than 70% of yeast proteins were quantified with a CV below 20% in both the initial and diluted samples (Figure 2C). We then used this reference dataset to assess how changes in data processing parameters affect the quantification throughput and accuracy of SWATH MS analysis.
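The precision criterion used here (the share of proteins with a CV below 20%) can be illustrated with a minimal Python sketch. It assumes that the CV is the sample standard deviation of replicate peak areas divided by their mean. The protein names and peak areas are hypothetical placeholders, not values from this study, and the function names are illustrative.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (sample SD / mean) in percent."""
    m = mean(values)
    if len(values) < 2 or m == 0:
        raise ValueError("need at least two replicates with a non-zero mean")
    return 100.0 * stdev(values) / m


def fraction_below(cvs, threshold=20.0):
    """Fraction of CV values strictly below the threshold."""
    return sum(cv < threshold for cv in cvs) / len(cvs)


# Hypothetical replicate peak areas per protein (not data from this study).
areas = {
    "protA": [100.0, 110.0],
    "protB": [50.0, 90.0],
    "protC": [200.0, 205.0],
}

cvs = {name: cv_percent(v) for name, v in areas.items()}
share = fraction_below(list(cvs.values()))

for name, cv in cvs.items():
    print(f"{name}: CV = {cv:.1f}%")
print(f"share of proteins with CV < 20%: {share:.2f}")
```

In this toy input two of the three proteins fall below the 20% cutoff; with real data the same calculation would be run per protein in each dilution sample.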

Impact of Peptide Peak Extraction Criteria

We investigated the influence of peptide extraction criteria in Spectronaut software on the total number of yeast proteins quantified in the E. coli digest background as well as on the quantification accuracy of SWATH analysis. By design, quantitative comparison between the diluted yeast proteomic samples and the initial sample should yield theoretical ratios of 0.5, 0.2 and 0.1, respectively. Using default settings in Spectronaut with a single filter of q-value < 0.01 for control of peptide transition FDR, 1598, 999 and 618 proteins were quantified in the yeast proteomic samples diluted 2-, 5- and 10-fold with E. coli digest (complete quantification results summarized in Table S1). Although the median protein ratio determined for each sample was close to the theoretical value, a number of outliers far from the expected ratio were observed (Figure 3A). In fact, 22.2%-43.9% of proteins from the three diluted samples showed a relative error above 50% (Table S2). These outliers are mostly derived from peptides with poor quantification reproducibility or with low-quality extracted ion spectra, as reflected by the Cscore given by Spectronaut (Figure S3). Because peptides carrying variable modifications are routinely removed when developing a quantification assay (e.g. S/MRM), we then filtered out these modified peptides extracted from the SWATH dataset. Surprisingly, this treatment did not significantly reduce the number of outliers for protein quantification (Figure 3B). Next we changed the criteria to select at least two unique peptides per protein for quantification, a requirement adopted in many proteomic studies. As a result, 12.8%-20.5% of proteins from the three diluted samples were found to have a relative error above 50%, indicating a substantial increase in SWATH quantification accuracy (Figure 3C, Table S2). Combining the filters to exclude modified peptides and retain proteins with more than one peptide assignment did not further improve the accuracy of quantification (Figure 3D, Table S1). Although extraction of at least two peptides per protein reduced the coverage of quantifiable proteins and peptide precursors to a varying extent (Figure 3E), we still recommend implementing this stringent criterion for the benefit of quantification accuracy.
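The accuracy assessment in this section can be sketched as follows. The sketch assumes that relative error is defined as |observed ratio - expected ratio| / expected ratio (the text does not spell out the formula) and that the expected ratio is the reciprocal of the dilution factor. The (ratio, peptide count) pairs are hypothetical and serve only to show how a two-peptide filter can change the share of outliers.

```python
def expected_ratio(dilution_factor):
    """Theoretical diluted/initial ratio for a given fold dilution."""
    return 1.0 / dilution_factor


def relative_error(observed, expected):
    """Absolute deviation from the expected ratio, as a fraction of it."""
    return abs(observed - expected) / expected


def outlier_fraction(proteins, dilution_factor, min_peptides=1, max_rel_error=0.5):
    """Share of retained proteins whose ratio deviates by more than max_rel_error.

    proteins: iterable of (observed_ratio, n_unique_peptides) tuples.
    """
    expected = expected_ratio(dilution_factor)
    kept = [ratio for ratio, n_pep in proteins if n_pep >= min_peptides]
    if not kept:
        raise ValueError("no proteins pass the peptide-count filter")
    flagged = sum(relative_error(r, expected) > max_rel_error for r in kept)
    return flagged / len(kept)


# Hypothetical (ratio, unique peptide count) pairs for a 10-fold dilution.
toy = [(0.11, 4), (0.09, 2), (0.30, 1), (0.12, 3), (0.02, 1), (0.16, 2)]

all_proteins = outlier_fraction(toy, dilution_factor=10)
two_peptides = outlier_fraction(toy, dilution_factor=10, min_peptides=2)

print(f"outliers, all proteins:       {all_proteins:.0%}")
print(f"outliers, >= 2 peptides only: {two_peptides:.0%}")
```

In this toy input the single-peptide proteins carry most of the deviating ratios, so the filter lowers the outlier share at the cost of fewer retained proteins, which mirrors the trade-off described above.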

Impact of Spectral Library and Data Processing Software

To ensure specificity of fragment ion extraction from SWATH datasets, a previously generated spectral library is needed to provide the information matrix of peptide sequence, retention time, peptide-fragment transitions and fragment ion intensity. In a detailed protocol for building high-quality spectral libraries by Schubert et al.,35 the authors suggested pre-fractionation of peptide samples and combining the output of multiple search engines to maximize the number of PSMs and enhance discrimination between true and false assignments. In the recently published work by Wu et al., the spectral library generated on the same sample using the same type of instrument led to detection of the highest number of differential proteins with the lowest false positive rate.16 However, that study did not investigate the value of using multiple search engines to build extended spectral libraries.


In this work, we first constructed two separate spectral libraries based on protein identifications for the pre-fractionated yeast proteomic samples, either from Mascot search alone or by pooling results from four search engines (Mascot, MaxQuant, ProteinPilot and X!Tandem). The third spectral library, available in Spectronaut, was built on the public data repository of yeast proteomes. It is noteworthy that all three spectral libraries contained data from iRT peptide standards spiked into every sample for accurate RT calibration, which warrants the specificity of fragment ion peak extraction.4 In total, 27,462 peptides corresponding to 4145 proteins were included in the Mascot library; 33,097 peptides corresponding to 4315 proteins were in the multi-engine library; and 34,029 peptides corresponding to 3421 proteins were available in the public data library. The aforementioned yeast dilution reference dataset was processed against the three different spectral libraries using Spectronaut, with extraction of at least two peptides per protein. Use of either the multi-engine library or the public data library increased the total number of quantified proteins to a different extent for the three diluted yeast samples compared to use of the Mascot library (Figure 4). However, the quantification accuracy was impaired when using the multi-engine and public data libraries, given that more outliers with abnormal ratios were present (Figure 4A-C). In particular, the percentage of yeast proteins in the 10-fold diluted sample that had fold changes with > 50% relative error increased from 20% to over 26% when changing from the Mascot library to the multi-engine or public repository derived library (Table S3). We speculated that combining identification outputs from multiple engines expanded the search space and may have introduced more errors in SWATH spectral annotation as well as in the subsequent fragment ion peak extraction. Thus a stringent FDR control method such as Mayu36 and BiblioSpec37 could be exploited to improve the quality of the multi-engine and public data libraries and minimize false assignment of SWATH spectra. In addition, Wu et al. reported that extending the assay library by integrating spectra from a public data repository largely compromises the accuracy and precision of SWATH quantification.16 Other software tools for creating combined spectral libraries from different sources may be further evaluated to increase the depth and quality of assay libraries.12,35,38 Nevertheless, the inferior quantification performance with the public data library also observed in our study emphasizes the importance of building the library from shotgun proteomic data acquired on the same instrument system as the SWATH experiment, even though precursor RT can be strictly calibrated with peptide standards.

Apart from optimizing data processing parameters in Spectronaut, we also tested PeakView, a widely used commercial software, in analysis of the same SWATH dataset for the yeast dilution samples. Fragment ion peaks were extracted using SWATH micro App 2.0 built in PeakView software with criteria similar to the Spectronaut settings (see Methods for details). The spectral library was constructed on the identification output from ProteinPilot, which contained 150791 peptides corresponding to 3673 proteins. Surprisingly, SWATH peak extraction with PeakView against this single library gave rise to substantially fewer proteins (Figure 4E). Only 361, 210 and 117 proteins were quantified in the 2-, 5- and 10-fold diluted yeast samples respectively, compared to 1082, 609 and 337 proteins quantified by Spectronaut against the Mascot library. Swapping the spectral libraries between the two software tools led to the same conclusion that a greater number of yeast proteins were quantified when extracting SWATH peaks by Spectronaut than by PeakView (Figure 4E).
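To put the difference in quantification throughput in proportion, the protein counts quoted above can be compared directly. The short Python sketch below uses only those counts; the variable and function names are illustrative.

```python
DILUTIONS = (2, 5, 10)
PEAKVIEW = (361, 210, 117)       # proteins quantified by PeakView (from the text)
SPECTRONAUT = (1082, 609, 337)   # Spectronaut against the Mascot library (from the text)


def fraction_recovered(counts, reference):
    """Element-wise ratio of two equally long count sequences."""
    return [c / r for c, r in zip(counts, reference)]


fractions = fraction_recovered(PEAKVIEW, SPECTRONAUT)
for dilution, frac in zip(DILUTIONS, fractions):
    print(f"{dilution}-fold dilution: PeakView quantified {frac:.1%} of the Spectronaut count")
```

For all three dilutions PeakView quantified roughly one third of the number of proteins quantified by Spectronaut with the Mascot library.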

Notably, higher accuracy of protein quantification was obtained from SWATH data processed by PeakView than by Spectronaut (Figure 4, Table S3). Thus the two software tools showed complementary features in terms of quantification throughput and accuracy for SWATH data mining.

Quantification of Expression Changes in the Yeast Proteome upon Heat Shock

We next employed the SWATH MS technique to quantify the expression changes in the S. cerevisiae proteome induced by heat stress. The heat shock response of yeast cells, which activates multiple fundamental stress response programs, has been characterized at the proteomic level using stable isotope labeling-based quantification techniques.39,40 Herein we analyzed the control and heat-stressed proteome of an industrial yeast strain by SWATH MS with the optimal instrument and data-processing parameters discovered above (see Methods for details). Given that the two software tools, Spectronaut and PeakView, displayed complementary strengths in SWATH quantification performance, we processed this SWATH dataset using both for peak extraction under the same criteria (i.e. FDR < 1%, > 1 peptide/protein). Ion peak areas exported from Spectronaut and PeakView were normalized and subjected to statistical analysis by MSstats.29

Using the two data-mining workflows, we measured a total of 1323 and 663 proteins across all four replicates of the heat-shock and control samples based on peak extraction by Spectronaut and PeakView, respectively. Among the quantified proteins, 100 were found to be differentially expressed with a fold change > 1.5 (p-value < 0.05) in heat shock response vs control with the Spectronaut workflow, whereas 43 differential proteins were discovered by the PeakView workflow. The overlap of up- and down-regulated proteins revealed by the two workflows is summarized in Figure 5. Although Spectronaut is more sensitive in detection of differentially expressed proteins, PeakView still captures a handful of differential proteins that would have been overlooked if relying on Spectronaut alone. Considering the relatively high accuracy of PeakView in quantification of expected protein fold changes in the diluted yeast samples (Figure 4), we pooled the differential proteins reported by the two software tools that met our selection criteria. Yet it should be noted that the FDR of quantification could be increased by combining results from two workflows. Among 117 non-redundant differential proteins revealed by our SWATH MS analysis, only 27 were reported in the previous study to be also significantly changed under heat shock stress.39 This considerable difference in proteomic profiles may be attributed to multiple factors including distinct strain background, different stress conditions and different MS quantification techniques employed.
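The selection and pooling step can be sketched in Python as below. The sketch assumes that the fold-change cutoff of 1.5 is applied symmetrically to up- and down-regulation and that pooling means taking the union of the two hit lists; the protein identifiers and values are hypothetical. The last line only restates arithmetic implied by the reported counts, under the assumption that the 117 non-redundant proteins are the union of the 100 and 43 hits.

```python
def is_differential(fold_change, p_value, min_fold=1.5, alpha=0.05):
    """True if the change exceeds min_fold in either direction and p < alpha."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude > min_fold and p_value < alpha


def pool(hits_a, hits_b):
    """Return the union and the overlap of two sets of protein identifiers."""
    return hits_a | hits_b, hits_a & hits_b


# Hypothetical results per workflow: protein -> (fold change, p-value).
spectronaut = {"P1": (2.1, 0.01), "P2": (0.5, 0.03), "P3": (1.2, 0.001)}
peakview = {"P2": (0.6, 0.04), "P4": (1.8, 0.02), "P5": (3.0, 0.20)}

hits_s = {p for p, (fc, pv) in spectronaut.items() if is_differential(fc, pv)}
hits_p = {p for p, (fc, pv) in peakview.items() if is_differential(fc, pv)}
union, overlap = pool(hits_s, hits_p)

print("pooled hits:", sorted(union))
print("found by both workflows:", sorted(overlap))

# Counts reported in the text: 100 and 43 hits, 117 non-redundant in total.
implied_overlap = 100 + 43 - 117
print("overlap implied by the reported counts:", implied_overlap)
```

Taking the union maximizes coverage, but, as noted above, it can also raise the overall false discovery rate because false positives from either workflow are retained.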

Functional classification was then performed on the differentially expressed proteins detected by both workflows (Figure 5). A panel of proteins involved in protein folding/degradation was up-regulated and proteins involved in electron transport/energy generation were down-regulated during the heat shock response of the industrial yeast strain. The complete dataset and in-depth characterization of differential proteins will be presented elsewhere. In summary, we recommend pooling results given by the two widely used SWATH data extraction software tools to cover as many differentially regulated proteins as possible for functional studies.


Conclusions

The present study has evaluated the impact of multiple parameters in raw data acquisition and the data processing workflow on SWATH MS performance so as to strike a balance between the throughput, accuracy and reproducibility of proteomic quantification. Our work indicates that several acquisition variables, including precursor mass range, MS2 accumulation time, isolation window width and cycle time, affect quantification performance in an interdependent manner. In addition, special attention needs to be paid to SWATH peak extraction criteria and the software chosen for SWATH data processing so as to acquire complete and reliable quantification results. For SWATH experiments performed on the ABSciex TripleTOF 5600 instrument, we provided an optimal set of acquisition parameters and recommended two complementary workflows for data mining and differential protein selection. However, it should be noted that the optimized parameter set is sample and instrument specific. Our study mainly provides experimental evidence to show that combined optimization of these parameters is able to improve SWATH quantification performance, yet we would recommend that researchers optimize their own parameters and workflows depending on the sample type, available instrument and software.


Acknowledgement

This work was supported by grants from the Bairenjihua Program of the Chinese Academy of Sciences, the National Natural Science Foundation of China (No. 31401150 and 21505151), and the Key Projects in Tianjin Science & Technology Pillar Program (No. 14ZCZDSY00062).

Conflict of Interest Disclosure

The authors declare no competing financial interests.


REFERENCES

1. Chick, J.M.; Munger, S.C.; Simecek, P.; Huttlin, E.L.; Choi, K.; Gatti, D.M.; Raghupathy, N.; Svenson, K.L.; Churchill, G.A.; Gygi, S.P. Defining the consequences of genetic variation on a proteome-wide scale. Nature 2016, 534, 500-505.

2. Mertins, P.; Mani, D.R.; Ruggles, K.V.; Gillette, M.A.; Clauser, K.R.; Wang, P.; Wang, X.; Qiao, J.W.; Cao, S.; Petralia, F.; Kawaler, E.; Mundt, F.; Krug, K.; Tu, Z.; Lei, J.T.; Gatza, M.L.; Wilkerson, M.; Perou, C.M.; Yellapantula, V.; Huang, K.L.; Lin, C.; McLellan, M.D.; Yan, P.; Davies, S.R.; Townsend, R.R.; Skates, S.J.; Wang, J.; Zhang, B.; Kinsinger, C.R.; Mesri, M.; Rodriguez, H.; Ding, L.; Paulovich, A.G.; Fenyo, D.; Ellis, M.J.; Carr, S.A.; Nci, C. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 2016, 534, 55-62.

3. Kusebauch, U.; Campbell, D.S.; Deutsch, E.W.; Chu, C.S.; Spicer, D.A.; Brusniak, M.Y.; Slagel, J.; Sun, Z.; Stevens, J.; Grimes, B.; Shteynberg, D.; Hoopmann, M.R.; Blattmann, P.; Ratushny, A.V.; Rinner, O.; Picotti, P.; Carapito, C.; Huang, C.Y.; Kapousouz, M.; Lam, H.; Tran, T.; Demir, E.; Aitchison, J.D.; Sander, C.; Hood, L.; Aebersold, R.; Moritz, R.L. Human SRMAtlas: A Resource of Targeted Assays to Quantify the Complete Human Proteome. Cell 2016, 166, 766-778.

4. Selevsek, N.; Chang, C.Y.; Gillet, L.C.; Navarro, P.; Bernhardt, O.M.; Reiter, L.; Cheng, L.Y.; Vitek, O.; Aebersold, R. Reproducible and consistent quantification of the Saccharomyces cerevisiae proteome by SWATH-mass spectrometry. Mol Cell Proteomics 2015, 14, 739-749.

5. Gillet, L.C.; Navarro, P.; Tate, S.; Rost, H.; Selevsek, N.; Reiter, L.; Bonner, R.; Aebersold, R. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 2012, 11, O111 016717.

6. Lange, V.; Picotti, P.; Domon, B.; Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol Syst Biol 2008, 4, 222.

7. Collins, B.C.; Gillet, L.C.; Rosenberger, G.; Rost, H.L.; Vichalkovski, A.; Gstaiger, M.; Aebersold, R. Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat Methods 2013, 10, 1246-1253.

8. Lambert, J.P.; Ivosev, G.; Couzens, A.L.; Larsen, B.; Taipale, M.; Lin, Z.Y.; Zhong, Q.; Lindquist, S.; Vidal, M.; Aebersold, R.; Pawson, T.; Bonner, R.; Tate, S.; Gingras, A.C. Mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition. Nat Methods 2013, 10, 1239-1245.

9. Bruderer, R.; Bernhardt, O.M.; Gandhi, T.; Miladinovic, S.M.; Cheng, L.Y.; Messner, S.; Ehrenberger, T.; Zanotelli, V.; Butscheid, Y.; Escher, C.; Vitek, O.; Rinner, O.; Reiter, L. Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol Cell Proteomics 2015, 14, 1400-1410.


10. Guo, T.; Kouvonen, P.; Koh, C.C.; Gillet, L.C.; Wolski, W.E.; Rost, H.L.; Rosenberger, G.; Collins, B.C.; Blum, L.C.; Gillessen, S.; Joerger, M.; Jochum, W.; Aebersold, R. Rapid mass spectrometric conversion of tissue biopsy samples into permanent quantitative digital proteome maps. Nat Med 2015, 21, 407-413.

11. Liu, Y.; Buil, A.; Collins, B.C.; Gillet, L.C.; Blum, L.C.; Cheng, L.Y.; Vitek, O.; Mouritsen, J.; Lachance, G.; Spector, T.D.; Dermitzakis, E.T.; Aebersold, R. Quantitative variability of 342 plasma proteins in a human twin population. Mol Syst Biol 2015, 11, 786.

12. Rost, H.L.; Rosenberger, G.; Navarro, P.; Gillet, L.; Miladinovic, S.M.; Schubert, O.T.; Wolski, W.; Collins, B.C.; Malmstrom, J.; Malmstrom, L.; Aebersold, R. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat Biotechnol 2014, 32, 219-223.

13. Schubert, O.T.; Ludwig, C.; Kogadeeva, M.; Zimmermann, M.; Rosenberger, G.; Gengenbacher, M.; Gillet, L.C.; Collins, B.C.; Rost, H.L.; Kaufmann, S.H.; Sauer, U.; Aebersold, R. Absolute Proteome Composition and Dynamics during Dormancy and Resuscitation of Mycobacterium tuberculosis. Cell Host Microbe 2015, 18, 96-108.

14. Loke, M.F.; Ng, C.G.; Vilashni, Y.; Lim, J.; Ho, B. Understanding the dimorphic lifestyles of human gastric pathogen Helicobacter pylori using the SWATH-based proteomics approach. Sci Rep 2016, 6, 26784.

15. Tang, X.; Meng, Q.; Gao, J.; Zhang, S.; Zhang, H.; Zhang, M. Label-free Quantitative Analysis of Changes in Broiler Liver Proteins under Heat Stress using SWATH-MS Technology. Sci Rep 2015, 5, 15119.

16. Wu, J.X.; Song, X.; Pascovici, D.; Zaw, T.; Care, N.; Krisp, C.; Molloy, M.P. SWATH Mass Spectrometry Performance Using Extended Peptide MS/MS Assay Libraries. Mol Cell Proteomics 2016, 15, 2501-2514.

17. Tsou, C.C.; Avtonomov, D.; Larsen, B.; Tucholska, M.; Choi, H.; Gingras, A.C.; Nesvizhskii, A.I. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods 2015, 12, 258-264, 257 p following 264.

18. Rardin, M.J.; Schilling, B.; Cheng, L.Y.; MacLean, B.X.; Sorensen, D.J.; Sahu, A.K.; MacCoss, M.J.; Vitek, O.; Gibson, B.W. MS1 Peptide Ion Intensity Chromatograms in MS2 (SWATH) Data Independent Acquisitions. Improving Post Acquisition Analysis of Proteomic Experiments. Mol Cell Proteomics 2015, 14, 2405-2419.

19. MacLean, B.; Tomazela, D.M.; Shulman, N.; Chambers, M.; Finney, G.L.; Frewen, B.; Kern, R.; Tabb, D.L.; Liebler, D.C.; MacCoss, M.J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966-968.

20. Teleman, J.; Rost, H.L.; Rosenberger, G.; Schmitt, U.; Malmstrom, L.; Malmstrom, J.; Levander, F. DIANA--algorithmic improvements for analysis of data-independent acquisition MS data. Bioinformatics 2015, 31, 555-562.


21. Wang, J.; Tucholska, M.; Knight, J.D.; Lambert, J.P.; Tate, S.; Larsen, B.; Gingras, A.C.; Bandeira, N. MSPLIT-DIA: sensitive peptide identification for data-independent acquisition. Nat Methods 2015, 12, 1106-1108.

22. Escher, C.; Reiter, L.; MacLean, B.; Ossola, R.; Herzog, F.; Chilton, J.; MacCoss, M.J.; Rinner, O. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 2012, 12, 1111-1121.

23. Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 2008, 26, 1367-1372.

24. Shilov, I.V.; Seymour, S.L.; Patel, A.A.; Loboda, A.; Tang, W.H.; Keating, S.P.; Hunter, C.L.; Nuwaysir, L.M.; Schaeffer, D.A. The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics 2007, 6, 1638-1655.

25. Craig, R.; Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20, 1466-1467.

26. Vaudel, M.; Barsnes, H.; Berven, F.S.; Sickmann, A.; Martens, L. SearchGUI: An open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 2011, 11, 996-999.

27. Nesvizhskii, A.I. Protein identification by tandem mass spectrometry and sequence database searching. Methods Mol Biol 2007, 367, 87-119.

28. Reiter, L.; Rinner, O.; Picotti, P.; Huttenhain, R.; Beck, M.; Brusniak, M.Y.; Hengartner, M.O.; Aebersold, R. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat Methods 2011, 8, 430-435.

29. Choi, M.; Chang, C.Y.; Clough, T.; Broudy, D.; Killeen, T.; MacLean, B.; Vitek, O. MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics 2014, 30, 2524-2526.

30. Simburger, J.M.; Dettmer, K.; Oefner, P.J.; Reinders, J. Optimizing the SWATH-MS-workflow for label-free proteomics. J Proteomics 2016, 145, 137-140.

31. Sharma, K.; Schmitt, S.; Bergner, C.G.; Tyanova, S.; Kannaiyan, N.; Manrique-Hoyos, N.; Kongi, K.; Cantuti, L.; Hanisch, U.K.; Philips, M.A.; Rossner, M.J.; Mann, M.; Simons, M. Cell type- and brain region-resolved mouse brain proteome. Nat Neurosci 2015, 18, 1819-1831.

32. Beck, S.; Michalski, A.; Raether, O.; Lubeck, M.; Kaspar, S.; Goedecke, N.; Baessmann, C.; Hornburg, D.; Meier, F.; Paron, I.; Kulak, N.A.; Cox, J.; Mann, M. The Impact II, a Very High-Resolution Quadrupole Time-of-Flight Instrument (QTOF) for Deep Shotgun Proteomics. Mol Cell Proteomics 2015, 14, 2014-2029.

33. Zhang, Y.; Bilbao, A.; Bruderer, T.; Luban, J.; Strambio-De-Castillia, C.; Lisacek, F.; Hopfgartner, G.; Varesio, E. The Use of Variable Q1 Isolation Windows Improves Selectivity in LC-SWATH-MS Acquisition. J Proteome Res 2015, 14, 4359-4371.


34. Ghaemmaghami, S.; Huh, W.K.; Bower, K.; Howson, R.W.; Belle, A.; Dephoure, N.; O'Shea, E.K.; Weissman, J.S. Global analysis of protein expression in yeast. Nature 2003, 425, 737-741.

35. Schubert, O.T.; Gillet, L.C.; Collins, B.C.; Navarro, P.; Rosenberger, G.; Wolski, W.E.; Lam, H.; Amodei, D.; Mallick, P.; MacLean, B.; Aebersold, R. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc 2015, 10, 426-441.

36. Reiter, L.; Claassen, M.; Schrimpf, S.P.; Jovanovic, M.; Schmidt, A.; Buhmann, J.M.; Hengartner, M.O.; Aebersold, R. Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Mol Cell Proteomics 2009, 8, 2405-2417.

37. Frewen, B.; MacCoss, M.J. Using BiblioSpec for creating and searching tandem MS peptide libraries. Curr Protoc Bioinformatics 2007, Chapter 13, Unit 13 17.

38. Zi, J.; Zhang, S.; Zhou, R.; Zhou, B.; Xu, S.; Hou, G.; Tan, F.; Wen, B.; Wang, Q.; Lin, L.; Liu, S. Expansion of the ion library for mining SWATH-MS data through fractionation proteomics. Anal Chem 2014, 86, 7242-7246.

39. Nagaraj, N.; Kulak, N.A.; Cox, J.; Neuhauser, N.; Mayr, K.; Hoerning, O.; Vorm, O.; Mann, M. System-wide perturbation analysis with nearly complete coverage of the yeast proteome by single-shot ultra HPLC runs on a bench top Orbitrap. Mol Cell Proteomics 2012, 11, M111 013722.

40. Shui, W.; Xiong, Y.; Xiao, W.; Qi, X.; Zhang, Y.; Lin, Y.; Guo, Y.; Zhang, Z.; Wang, Q.; Ma, Y. Understanding the Mechanism of Thermotolerance Distinct From Heat Shock Response Through Proteomic Analysis of Industrial Strains of Saccharomyces cerevisiae. Mol Cell Proteomics 2015, 14, 1885-1897.


Table 1. SWATH MS acquisition parameters evaluated in this study

Condition  Mass range (m/z)  SWATH window width (amu)  SWATH window number  Accumulation time (ms)  Cycle time (s)
1          400-1200          25                        32                   100                     3.35
2          350-1000          25                        26                   125                     3.4
3          350-1000          15                        44                   70                      3.23
4          350-1000          10                        65                   50                      3.38
5          350-1000          20                        33                   80                      2.7
6          350-1000          20                        33                   50                      1.8
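The interdependence of the parameters in Table 1 can be checked with a short Python sketch. It assumes that the window number is the mass range divided by the window width, rounded up (ignoring any window overlap), and that the MS2 portion of a cycle is the window number times the accumulation time. Whatever remains of the reported cycle time is treated here as MS1 scan time plus overhead, which is an interpretation rather than a value stated in the table.

```python
import math

# (condition, m/z low, m/z high, window width in amu, window number,
#  accumulation time in ms, reported cycle time in s), copied from Table 1.
TABLE_1 = [
    (1, 400, 1200, 25, 32, 100, 3.35),
    (2, 350, 1000, 25, 26, 125, 3.40),
    (3, 350, 1000, 15, 44, 70, 3.23),
    (4, 350, 1000, 10, 65, 50, 3.38),
    (5, 350, 1000, 20, 33, 80, 2.70),
    (6, 350, 1000, 20, 33, 50, 1.80),
]


def windows_needed(mz_low, mz_high, width):
    """Number of fixed-width windows needed to tile the mass range (assumed: no overlap)."""
    return math.ceil((mz_high - mz_low) / width)


def ms2_time_s(n_windows, accumulation_ms):
    """Total MS2 accumulation time per cycle, in seconds."""
    return n_windows * accumulation_ms / 1000.0


for cond, lo, hi, width, n_win, acc_ms, cycle_s in TABLE_1:
    ms2 = ms2_time_s(n_win, acc_ms)
    remainder = cycle_s - ms2  # assumed to be MS1 scan time plus overhead
    print(
        f"condition {cond}: windows computed {windows_needed(lo, hi, width)}, "
        f"tabulated {n_win}; MS2 time {ms2:.2f} s; remainder {remainder:.2f} s"
    )
```

Under these assumptions the computed window numbers match the tabulated ones for all six conditions, and the MS2 accumulation accounts for nearly the whole cycle, which is why narrowing the windows forces either shorter accumulation times or longer cycles.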


Figure Legends

Figure 1. Impact of major acquisition parameters on proteomic quantification by SWATH analysis. (A) The total number of proteins and peptide precursors quantified by SWATH analysis of a yeast total digest. Different conditions of data acquisition are specified in Table 1. (B) The median number of data points across each fragment ion peak (DP/peak) and the median CV% of peak quantification under different data acquisition conditions.

Figure 2. SWATH MS analysis of yeast proteomic samples diluted in E. coli digest background. (A) The total number of proteins and peptide precursors quantified by SWATH MS analysis of the initial (undiluted) yeast proteomic sample and the same sample serially diluted with E. coli total digest by 2, 5 and 10 fold. The optimal condition 5 in Table 1 was employed for data acquisition. (B) Estimated cellular abundance distribution of yeast proteins measured in the initial and diluted samples. (C) The cumulative curve of CV% of yeast protein quantification in the initial and diluted samples across triplicate measurement.

Figure 3. Impact of peptide peak extraction criteria on proteomic quantification by SWATH MS analysis. Distribution of relative ratios determined on individual yeast proteins in a specific diluted sample vs the initial sample is shown in the boxplots. Median protein ratios are indicated next to the boxplots. Fragment ion peaks were extracted by Spectronaut with different criteria: (A) peptide transition q-value < 0.01; (B) q-value
50% in diluted yeast proteomic samples vs the initial sample.

Table S3 Percentage of proteins with fold changes of relative error > 50% in diluted yeast proteomic samples vs the initial sample.


for TOC only


Image courtesy of Shanshan Li (author), Copyright 2016
