Biomarker Discovery and Verification of Esophageal Squamous Cell Carcinoma Using Integration of SWATH/MRM

Jul 30, 2015 - Biomarker Discovery and Verification of Esophageal Squamous Cell Carcinoma Using Integration of SWATH/MRM. Guixue Hou†‡§, Xiaomin ...

Journal of Proteome Research

Biomarker discovery and verification of esophageal squamous cell carcinoma using integration of SWATH/MRM

Guixue Hou1,2,3, Xiaomin Lou1, Yulin Sun4, Shaohang Xu3, Jin Zi3, Quanhui Wang1,3, Baojin Zhou3, Bo Han4, Lin Wu1, Xiaohang Zhao4*, Liang Lin3*, Siqi Liu1,3*

1 CAS Key Laboratory of Genome Sciences and Information, Beijing Institutes of Genomics, Chinese Academy of Sciences, Beijing, 100101, China

2 University of Chinese Academy of Sciences, Beijing, 100049, China

3 Proteomics Division, BGI-Shenzhen, Shenzhen, Guangdong, 518083, China

4 National Laboratory of Molecular Oncology, Cancer Institute & Hospital, Chinese Academy of Medical Sciences, 17 Panjiayuan, Chaoyangqu, Beijing, 100021, China

*To whom correspondence should be addressed:

Siqi Liu, Ph.D., Professor Phone: 86-10-80485325 Fax: 86-10-80485324

ACS Paragon Plus Environment

E-mail: [email protected] Address: Airport Industrial Zone B-6, Shunyi, Beijing 101318, China

Liang Lin, Ph.D. Associate Professor Phone: 86-755-82637021 Fax: 86-755-25274284 E-mail: [email protected] Address: Beishan Industrial Zone, Yantian, Shenzhen 518083, China

Xiaohang Zhao, Ph.D. Professor Phone: 86-10-67709015 Fax: 86-10-67709015 E-mail: [email protected] Address: 17 Panjiayuan, Chaoyangqu, Beijing, 100021, China

Abstract

We propose an efficient integration of SWATH with MRM for biomarker discovery and verification when the corresponding ion library is well established. We strictly controlled the false positive rate associated with SWATH MS signals and carefully selected the target peptides for coupling SWATH with MRM. We collected 10 paired samples of esophageal squamous cell carcinoma (ESCC) tumors and adjacent tissues and quantified 1758 unique proteins at a protein-level FDR of 1% using SWATH, of which 467 showed ESCC-dependent abundance changes. After carefully evaluating the SWATH MS signals of the up-regulated proteins, 120 proteins were selected for MRM verification. MRM analysis of the pooled and individual esophageal tissues yielded 116 proteins whose abundance responses to ESCC were similar to those acquired with SWATH. Because the ESCC-related proteins included a high percentage of secreted proteins, we conducted the MRM assay on patient sera collected pre- and post-operation. Of the 116 target proteins, 42 were identified in the ESCC sera, including 11 with lowered abundances post-operation. Coupling SWATH and MRM is thus feasible and efficient for the discovery and verification of cancer-related protein biomarkers.

Keywords SWATH, MRM, ESCC, biomarker

Introduction

In discovery proteomics, tandem mass spectrometry (MS/MS) data are usually collected for peptide identification through automated data-dependent acquisition (DDA). In DDA, mass information on intact peptides in a full-scan mass spectrum (MS1) is used to determine which subset of mass signals should be targeted for further acquisition of fragmentation (MS/MS) spectra to identify peptide sequences. Despite enabling unbiased global protein identification, DDA-based shotgun proteomics is not fully satisfactory because it tends to miss many ionized peptides and yields less quantitative information. Recently, data-independent acquisition (DIA) has emerged as another important method in proteomic analysis. It does not require the selection of precursor ions from MS1 for peptide fragmentation and allows all of the ionized peptides in a sample to be fragmented. Sequential window acquisition of all theoretical spectra (SWATH) is a novel DIA technique that combines the strengths of regular shotgun proteomics with the reproducibility of MRM signals to detect and accurately quantify a large number of analytes1. The application of SWATH has achieved significant progress in a short time. For instance, Liu et al. compared the quantitative results achieved from SWATH and MRM for 41 N-glycoproteins that were enriched from human plasma and demonstrated that SWATH exhibited comparable sensitivity and reproducibility to MRM2. Collins et al. combined affinity purification with SWATH (AP-SWATH) to study the interactome dynamics of the 14-3-3β scaffold protein after stimulation of the insulin-PI3K-AKT pathway and observed that the data derived from AP-SWATH were sufficient for identifying interacting proteins3. The current study of SWATH is primarily focused on acquiring quantitative information from ionized peptides at low densities. Based on the original intent for SWATH, the SWATH acquisition mode (SWATH MS) can select broad m/z windows and fragment all precursors within them, allowing the mass spectrometer to collect the MS/MS spectra of all peptide ions. SWATH therefore likely acquires plentiful peptide information that is not only favorable for quantitative proteomics but also beneficial for discovery proteomics. Recently, Haverland et al. successfully employed SWATH MS to identify the host proteins in macrophages that respond to HIV-1 infection and demonstrated that this method was capable of profiling proteins and characterizing differential proteins after quantitative evaluation of the profiling data4. Nevertheless, SWATH is still not generally employed in discovery proteomics.
A technical bottleneck for SWATH users is the generation of mass spectrometric reference maps (ion libraries) with maximized peptide spectra, from which SWATH can acquire and identify differentially expressed proteins. Generally, two approaches have been taken to construct ion libraries, through either in-house shotgun proteomics or chemically synthesized peptides1,5. Because of current instrument and algorithm limitations, constructing ion libraries in-house by coalescing replicate peptide spectra into a consensus spectrum remains a feasible and economical approach for most SWATH users. Although several groups have developed such construction approaches, there remains considerable room for improvement, particularly in how to build an ion library for biomarker discovery.

Proteomics has long been implemented in biomarker discovery but seldom in clinical applications. The confirmation of a useful biomarker involves three key steps: discovery, verification and validation. A rate-limiting step is the effective verification of biomarker candidates identified from discovery proteomics. Traditional verification of protein biomarkers often relies on antibody-based assays, such as immunohistochemistry (IHC), western blotting or enzyme-linked immunosorbent assay (ELISA). The verification efficiencies of these traditional approaches depend on qualified antibodies obtained from commercial or laboratory sources. For example, Pawar et al. attempted to identify potential biomarkers for esophageal squamous cell carcinoma (ESCC) through the iTRAQ approach and discovered 257 ESCC-related proteins; however, only 3 candidates were further verified by IHC with tissue microarrays6. Zhang et al. analyzed ESCC and corresponding adjacent tissues by two-dimensional electrophoresis (2DE) and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS). Of 33 proteins up-regulated in the ESCC tissues, only two were selected for verification in a subsequent study of ESCC serum samples and tissue-array slides using immunoassay7. Clearly, improving verification efficiency is critical for exploring more disease-related protein biomarkers.
Recent progress in MRM has made it a powerful technology in this field. In contrast to antibody-based techniques, MRM can readily verify multiple protein targets, potentially hundreds of candidates. Based on informative quantitation signals, such as multiple transitions per peptide and multiple peptides per protein, MRM can provide accurate and reproducible data for evaluating the abundances of peptides and proteins. The performance characteristics of the MRM assay for the measurement of protein biomarkers in specimens such as plasma have been systematically and rigorously evaluated8,9. Are there any technical challenges in using MRM to verify the protein candidates acquired from discovery proteomics? The answer is certainly yes. First, most reports related to this topic have adopted a two-step quantification strategy, in which the techniques of protein identification and quantification in the discovery stage differ from those of the verification stage6,7. Because the criteria used to derive the differential protein candidates differ so much from those of MRM, this strategy leaves it unclear whether the corresponding peptide MS signals are suitable for the MRM assay. Although computational resources such as MRMaid10 and PeptideAtlas11 provide a partial solution, a barrier between theoretical prediction and experimental evidence remains. Because SWATH MS data possess physical features similar to those of MRM data, peptides and fragments extracted from SWATH are likely to be suitable for developing MRM assays1,2. Second, in verification with MRM assays, quality control of MS/MS data is critical to ensure that the MRM MS signals truly represent the target peptides. Traditionally, this is done with synthesized peptides that are applied to the MRM assay under the same conditions12,13. In a proteomic analysis coupling SWATH and MRM, however, large-scale peptide synthesis is expensive and time-consuming. A computational tool to evaluate the signal quality of peptides extracted from SWATH or MRM is therefore necessary.

In this communication, we selected tissues from ESCC and the corresponding adjacent regions and aimed to discover and verify ESCC-related protein biomarkers with a combined pipeline of SWATH and MRM. After protein extraction from the pooled tissues, we constructed an ion library based on DDA shotgun proteomics and retention time (RT) correction between the DDA and SWATH data. A total of 1758 proteins were quantified from the esophageal tissues, including 467 proteins that exhibited significant quantitative differences between the ESCC and adjacent regions. After careful analysis of the SWATH MS data and strict evaluation of the MRM signal qualities, the ESCC-related protein candidates were further verified by MRM in the esophageal tissues and the ESCC patient sera. Our data revealed that many protein candidates identified at the discovery phase with SWATH could be confirmed at the verification phase using MRM and that the quantitative information derived from SWATH and MRM was highly and positively correlated. Therefore, the combination of SWATH and MRM is feasible and efficient for the discovery and verification of candidate protein biomarkers.

Materials and Methods

1. Materials

Ten paired tissues were collected from the Cancer Institute, CAMS, Beijing, China. All cases with pathologic diagnoses for tumor-node-metastasis (TNM) stages were evaluated based on the Cancer Staging Manual, 7th Edition, issued in 2009 by the American Joint Committee on Cancer. The matched adjacent normal esophageal tissues (NETs) were used as controls. After surgery, the fresh paired tissues, tumor and adjacent, were washed with cold phosphate-buffered saline, immediately snap frozen in liquid nitrogen and stored at −80 °C until proteomic analysis. The corresponding sera from these patients were collected before and one week after surgery. Written informed consent was signed by the patients, and the sample collection procedure was approved by the ethics committee of the Cancer Institute, CAMS.

2. Sample preparation of the esophageal tissues

The frozen tissues were ground in liquid nitrogen, and the powdered tissues were homogenized by sonication in lysis buffer containing 2% SDS, 7 M urea, 10 mM EDTA, 10 mM PMSF, 10 mM DTT and 0.1 M Tris-HCl, pH 7.6. The homogenized tissues were centrifuged at 20,000 g, and the resulting supernatants were reduced with 10 mM DTT and alkylated with 55 mM iodoacetamide. The treated proteins were digested with trypsin following the FASP protocol as described by Mann14. Total protein content was measured using the Bradford assay. To profile the esophageal proteins with SWATH, equal amounts of protein from each ESCC or adjacent tissue were pooled and digested for proteomic analysis. To verify the ESCC-related proteins, the pooled and individual samples were digested for the MRM assay.

3. Serum preparation for MRM

Serum preparation was performed following the instructions of the ProteoMiner Kit15 (Bio-Rad, Hercules, CA, USA). Briefly, equilibrated ProteoMiner resin was mixed with 100 µl serum and incubated for two hours at room temperature with rotation. After the unbound proteins were removed by washing three times, all bound proteins were eluted with an elution buffer containing 8 M urea and 2% CHAPS. The eluted proteins were reduced with 10 mM DTT and alkylated with 55 mM iodoacetamide, and the treated proteins were precipitated with acetone at −20 °C followed by centrifugation at 20,000 g. The pellet was redissolved in 8 M urea, and the dissolved proteins
were diluted to a urea concentration of 1 M. After serum protein levels were quantified by the Bradford assay, approximately 50 µg of protein was taken for tryptic digestion using the FASP protocol.

4. Proteomic analysis with SWATH-MS

The tryptic peptides generated from the ESCC and adjacent tissues were directly delivered to a nano-HPLC system, Eksigent400 (Eksigent, AB SCIEX, USA), fitted with a 50 cm analytical column (75 µm × 50 cm, 3 µm, 100 Å, Dionex). The peptides were eluted with a linear gradient of solvent B (98% acetonitrile, 0.1% FA) from 8% to 32% over 208 min at a flow rate of 300 nl/min, and the eluted peptides were analyzed on a 5600 TripleTOF™ (AB SCIEX, Framingham, MA, USA). For ion library generation, peptide mixtures containing equal amounts of peptides from the pooled ESCC and adjacent tissues were first analyzed by QTOF MS. Data acquisition was implemented in DDA mode on the 5600 TripleTOF™, in which a 250-ms survey scan (TOF MS) was acquired, and the top 20 multiply charged ions in the TOF MS with intensities higher than 120 counts/s were selected for 50-ms MS/MS scans. The pooled peptides of the ESCC or adjacent tissues were then separately delivered to the 5600 TripleTOF™. Data acquisition was implemented in SWATH mode, a DIA mode of the 5600 TripleTOF™, in which a 50-ms TOF-MS scan was set, and the MS/MS scan windows followed the regular SWATH approach of consecutive 25 Da windows from m/z 400 to 1200.

5. Proteomic analysis with MRM-MS

The MRM assay was performed on a QTRAP 5500 mass spectrometer (AB SCIEX, Framingham, MA, USA) equipped with a nanoAcquity UPLC system (Waters, Milford, MA) and a nanocapillary column (ID 75 µm × 20 cm, 1.7 µm particles, 100 Å pore size, Waters, Milford, MA). The peptides prepared from the pooled or individual tissue lysates or sera were eluted with
a non-linear gradient program at 300 nl/min. The mobile phases consisted of solvent A, 2% acetonitrile with 0.1% aqueous formic acid, and solvent B, 98% acetonitrile with 0.1% formic acid. Peptides were separated with a gradient of 5-35% solvent B over 40 min followed by 35-90% solvent B over 5 min. The MS parameters for all the MRM experiments were set as follows: ionspray voltage (IS), 2500 V; curtain gas (CUR), 35.00; ion source gas 1 (GS1), 20.00; collision gas (CAD), high; interface heater temperature (IHT), 150; declustering potential (DP), 100.00; entrance potential (EP), 10.00; Q1 and Q3, unit resolution. A total of 15 target peptides of β-gal at transition/peptide were assessed daily as QC for the LC-MS system. In the MRM mode, the digested peptides were scanned with the collision energy (CE) calculated from a set of linear equations, CE = a × (m/z) + b, based on the m/z and the charge states of the precursor ions, in which the parameter pairs (a, b) were set to (0.044, 6) for ions of unknown charge, (0.036, 8.857) for doubly charged ions and (0.0544, −2.4099) for triply charged ions. For the global evaluation of peptide abundances, three external unique peptides (IPDFILQR, NLASGDLIAYANESIR and GGILGITVR) were added to each sample for normalization of the MRM signal. For MRM verification in tissues, the transition list was divided into 12 transition lists with at most 300 transitions per MRM method. For the final MRM quantification in serum, the Scheduled MRM™ Algorithm was used with the retention times acquired from the previous test: the MRM detection window was 120 seconds, and the target scan time was 1 second.

6. Data processing

Based on the MS data acquired in the DDA mode of the 5600 TripleTOF™, ProteinPilot (Version 4.5), set to "thorough mode", was used for peptide searching against the Swiss-Prot human database (20,204 sequences). The search results of the three DDAs were used as reference spectral
libraries for targeted extraction separately. The PeakView SWATH Processing Micro App (AB SCIEX, USA) was used to identify the correct peak group in a set of fragment chromatograms with peaks at the same retention time, with the parameters set as follows: 1) 5000 proteins and 1000 peptides/protein, 2) 75 ppm m/z tolerance and 30 min extraction window, 3) confidence setting of 50%, 4) FDR (false discovery rate) of 1%, 5) shared peptides excluded from SWATH analysis and 6) modified peptides included. Peak-group scoring was similar to that described previously17 and used a combination of chromatographic correlation (related peaks should have the same shape, width and retention time), mass error and additional predicted fragment ions; a decoy strategy was used to select the most likely peak groups for FDR control. For RT correction, the RTs from the different data sources were aligned by linear regression with the equation y = ax + b, in which y represents the RTs in the DDA data and x represents the RTs from the SWATH data. All the retention times of the identified peptides from DDA were globally corrected, and a new ion library consisting of the peptides with corrected RTs was constructed, against which the SWATH data were analyzed using the PeakView SWATH Processing Micro App (AB SCIEX, Framingham, MA, USA). The MRM methods and raw data were processed by Skyline16, an open-source software (http://skyline.maccosslab.org/), in which the modified mProphet scoring algorithm was implemented for MRM peak selection17,18. The transitions extracted from SWATH MS were used as an MS/MS spectral library to select peptides and transitions for the MRM assays. After data acquisition, all raw data were imported into Skyline. Skyline used a scoring model for peak selection with q-values under 0.05 (corresponding to an estimated false discovery rate of 0.05 or lower). For peptides for which no detected peak met this criterion, Skyline did not identify any peaks.
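The decoy-based FDR control and the q-value cutoff of 0.05 mentioned above can be illustrated with a minimal sketch. This is not the mProphet, PeakView or Skyline implementation, which use more elaborate scoring models; it assumes only that each target and decoy peak group already carries a single score (higher is better) and that the target and decoy sets are of equal size. The function name `decoy_q_values` and the toy scores are hypothetical.

```python
def decoy_q_values(target_scores, decoy_scores):
    """Return one q-value per target score from a plain target-decoy count."""
    decoys = sorted(decoy_scores, reverse=True)
    # Indices of targets ordered from best to worst score.
    order = sorted(range(len(target_scores)),
                   key=lambda i: target_scores[i], reverse=True)

    fdr_at_rank = []
    n_decoys = 0
    for rank, idx in enumerate(order, start=1):
        # Count decoys scoring at least as well as this target.
        while n_decoys < len(decoys) and decoys[n_decoys] >= target_scores[idx]:
            n_decoys += 1
        fdr_at_rank.append(min(1.0, n_decoys / rank))

    # q-value = lowest FDR reachable at this score or any more permissive cut.
    q_values = [0.0] * len(order)
    running_min = 1.0
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[pos])
        q_values[order[pos]] = running_min
    return q_values


scores_target = [9.0, 8.0, 3.0, 2.0]   # toy peak-group scores (hypothetical)
scores_decoy = [3.5, 1.0, 0.5, 0.2]    # toy decoy scores (hypothetical)
q = decoy_q_values(scores_target, scores_decoy)
accepted = [i for i, qi in enumerate(q) if qi < 0.05]  # cutoff quoted in the text
```

In this toy case only the two best-scoring targets pass the cutoff, because one decoy outscores the remaining targets.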

SRMstats19, developed by Aebersold's group, was employed to evaluate protein abundances in SWATH and MRM. The qualified transitions obtained from SWATH or MRM were imported into SRMstats and processed following its manual, which covers definition of the biological populations of interest and of the desired scope of conclusions, exploratory data analysis to control the quality of MS runs, joint representation of the quantitative measurements of proteins using flexible linear mixed-effects models, and model-based determination of proteins that change in abundance from one condition to another. All figures were constructed using scripts written in the R language. For instance, boxplots were generated using the bwplot function of the lattice package, and cluster plots were constructed using the heatmap.2 function of the gplots package.
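The charge-dependent collision-energy rule quoted in the MRM-MS method, CE = a × (m/z) + b, can be written out as a short sketch. The parameter pairs are the ones given in the text; the function name `collision_energy` and the fallback of any charge other than 2 or 3 to the "unknown" pair are assumptions made for illustration.

```python
# (a, b) pairs as stated in the MRM-MS method; None stands for unknown charge.
CE_PARAMS = {
    None: (0.044, 6.0),
    2: (0.036, 8.857),
    3: (0.0544, -2.4099),
}


def collision_energy(mz, charge=None):
    """Collision energy from precursor m/z and charge, CE = a * m/z + b."""
    # Assumption: charges without a listed pair use the "unknown" parameters.
    a, b = CE_PARAMS.get(charge, CE_PARAMS[None])
    return a * mz + b


ce_doubly = collision_energy(500.0, 2)   # 0.036 * 500 + 8.857
ce_triply = collision_energy(500.0, 3)   # 0.0544 * 500 - 2.4099
```

For a precursor at m/z 500, the doubly charged rule gives about 26.9 and the triply charged rule about 24.8.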

Results

1. Identification of proteins in ESCC tissues by SWATH

Peptide extraction from SWATH MS requires a priori generation of an ion library that includes the essential coordinates of peptides, such as precursor ion masses, fragment ion masses, fragment ion intensities and retention time. The pooled lysates from ESCC or adjacent tissues were digested with trypsin, and the digested peptides were divided into two parts: one for DDA acquisition, in which equal amounts of the digested peptides were mixed and loaded on the MS, and the other for SWATH acquisition, in which the peptides from ESCC or adjacent tissues were loaded separately. Three data sets were thus generated, DDAmix, SWATHESCC and SWATHadjacent, with three injection replicates for each. Although the MS/MS signals of peptides were acquired in different modes, the peptides were eluted under the same chromatographic conditions. It was thus reasonable to expect that corresponding MS/MS signals from the different data sets could be matched through RT correction.

Generation of XICs (eXtracted Ion Chromatograms) from TICs (Total Ion Chromatograms) is important for peptide extraction from SWATH MS and requires selecting proper extraction windows around the target peptides. Generally, when a larger elution time tolerance is used, more peptide candidates are extracted, but expanding the extraction window also increases the number of extraction errors. Selecting the optimal extraction windows is a time-consuming and data-dependent process. An alternative to this traditional approach is to extract peptides with a fixed extraction window from an ion library that is generated by RT correction between DDA and SWATH MS. We adopted this strategy for the DDA and SWATH MS data from the ESCC and adjacent tissues. The search results of DDAmix with three replicates were used to extract peptides from the SWATH MS data, and the retention times of the shared peptides obtained from DDA and SWATH MS were regressed against each other. All peptides extracted from each SWATH MS run were used to calculate linear regression models of retention time between the individual SWATH and DDA runs. Across the regression parameters obtained from the six parallel SWATH acquisitions, the coefficient of variation was only 0.73%, indicating that the regression curves of the SWATH RTs versus the DDA RTs were highly reproducible. For RT correction, the average RTs of both SWATH and DDA were thus acceptable for generating an averaged linear regression model, as shown in Figure 1A with a regression coefficient of 0.98.
Based on the averaged linear regression of the peptides extracted from SWATH MS, the RT values of all identified peptides from DDA were globally corrected, and all peptides with corrected RTs formed a new ion library, termed the “target ion library”.
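The regression-based RT correction can be sketched as follows. The fit uses ordinary least squares with y as the DDA retention time and x as the SWATH retention time, as defined in the Methods. The source does not state how the fitted line was applied to the library, so mapping a DDA retention time onto the SWATH scale by inverting the line is an assumption here; the function names and the toy retention times are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def dda_to_swath_rt(rt_dda, a, b):
    """Map a DDA retention time onto the SWATH scale (assumed inversion)."""
    return (rt_dda - b) / a


# Toy retention times (min) of peptides shared by both acquisitions.
rt_swath = [10.0, 20.0, 30.0, 40.0]   # x
rt_dda = [25.0, 45.0, 65.0, 85.0]     # y
slope, intercept = fit_line(rt_swath, rt_dda)

# Correct every library entry identified by DDA, shared with SWATH or not.
corrected = [dda_to_swath_rt(rt, slope, intercept) for rt in rt_dda]
```

Because the line is fitted on shared peptides only but applied to all DDA identifications, peptides never seen in the SWATH alignment set still receive a corrected RT, which is what makes a fixed extraction window workable.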

Furthermore, during the extraction of SWATH data by PeakView, the scoring algorithm was modified from mProphet to construct an error model that was automatically adapted to each data set and assigned a confidence to each peak group for quality control17,18. In short, the peptides from the SWATH MS data were extracted against the target ion library and its corresponding decoy ion library. The extracted results were evaluated by the scoring system in PeakView for FDR control. As illustrated in Figure 1B, a distribution of the frequencies of transition scores from the target or decoy ion library was generated. The decoy distribution is roughly Gaussian, indicating that the statistical scoring results were valid for quality control of all extracted peptides. The model assigned high confidence to target peptides that were to the far right of the decoy curve, indicating that these peptides extracted from SWATH had been identified at an acceptable FDR. Using the ion library generated by DDAmix1_2_3 with an FDR of 5%, 1758 unique proteins with 9452 unique peptides were extracted in at least three SWATH injections. Two recent reports on proteomic analyses of ESCC and adjacent tissues using LC-MS identified a total of 802 and 687 unique proteins in the tissues, respectively. We compared our SWATH data with these published results. The Venn plot in Figure 1C reveals that approximately 64% of the proteins from Fan's report and 57% from Pawar's report were included in our SWATH data, whereas the overlap rates between Fan's and Pawar's reports were relatively diverse, from 49% to 57%6,20. Because SWATH profiling of the esophageal tissues yielded a relatively large pool of identified proteins that compared well with the published ESCC data sets, the SWATH approach appears acceptable for discovery proteomics.
Moreover, the overlap ratio between the SWATH and previously reported data was approximately 60% (64% and 57%), suggesting that the proteins identified by SWATH were representative of the proteins in esophageal tissues.
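The overlap rates quoted for the Venn comparison are fractions of a published protein list that also appear in the SWATH identifications. A minimal sketch of that arithmetic is shown below; the function name and the accession-like labels are hypothetical placeholders and are not taken from any of the three data sets.

```python
def overlap_fraction(reported, observed):
    """Fraction of the reported protein set that is also in the observed set."""
    reported, observed = set(reported), set(observed)
    if not reported:
        raise ValueError("reported set is empty")
    return len(reported & observed) / len(reported)


# Placeholder identifiers for illustration only.
published = {"P1", "P2", "P3", "P4"}
swath = {"P2", "P3", "P4", "P9"}
fraction = overlap_fraction(published, swath)
```

Note that the fraction is not symmetric: it depends on which list is used as the denominator, which is why a pairwise comparison of two reports yields a range of two values.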

2. Analysis of differentially expressed proteins between ESCC and adjacent tissues

We next investigated the differences in protein abundance between ESCC and its adjacent tissues using the SWATH data. There is no commonly accepted approach for estimating protein significance from MS/MS signals. Choi et al. developed an algorithm, SRMstats, that estimates protein abundances from MRM data by converting the differing peak areas of peptides to protein fold changes19. By employing SRMstats, a total of 1758 proteins were defined with quantitative information in at least three MS replicates, and the quantification results in tissues by SWATH MS are listed in Supporting Table 1. The fold changes of the 1758 proteins are plotted against their corresponding p-values to generate a volcano plot (Figure 2A). Stringent criteria defining differentially expressed proteins between ESCC and its adjacent tissues were set at fold change >= 2, peptide number for a protein >= 2 and p-value