Biomarker Development for Intraductal Papillary Mucinous

Article

Biomarker Development for Intraductal Papillary Mucinous Neoplasms Using Multiple Reaction Monitoring Mass Spectrometry Yikwon Kim, MeeJoo Kang, Dohyun Han, Hyunsoo Kim, KyoungBun Lee, SunWhe Kim, Yongkang Kim, Taesung Park, Jin-Young Jang, and Youngsoo Kim J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b00553 • Publication Date (Web): 12 Nov 2015 Downloaded from http://pubs.acs.org on November 13, 2015

Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Journal of Proteome Research

Biomarker Development for Intraductal Papillary Mucinous Neoplasms Using Multiple Reaction Monitoring Mass Spectrometry

Yikwon Kim1,#, MeeJoo Kang2,#, Dohyun Han1, Hyunsoo Kim1, KyoungBun Lee3, Sun-Whe Kim2, Yongkang Kim4, Taesung Park4,5, Jin-Young Jang2,*, and Youngsoo Kim1,*

Departments of Biomedical Engineering1, Surgery and Cancer Research Institute2, and Pathology3, Seoul National University College of Medicine, 28 Yongon-Dong, Seoul 110-799 Korea; Department of Statistics4, and Interdisciplinary program in Bioinformatics5, Seoul National University, Daehakdong, Seoul 151-742 Korea

# These authors contributed equally to this work.

* Address correspondence to: Youngsoo Kim or Jin-Young Jang

Dr. Jin-Young Jang Department of Surgery, Seoul National University College of Medicine, 28 Yongon-Dong, Chongno-Ku, Seoul 110-744, Korea; (Email) [email protected]; (Tel) +82-2-2072-2194; (Fax) +82-2-741-2194 or Dr. Youngsoo Kim Department of Biomedical Engineering, Seoul National University College of Medicine, 28 Yongon-Dong, Chongno-Ku, Seoul 110-799, Korea; (Email) [email protected]; (Tel) +82-2-740-8073; (Fax) +82-2-741-0253

ACS Paragon Plus Environment

Abstract

Intraductal papillary mucinous neoplasm (IPMN) is a common precursor of pancreatic cancer (PC). Much clinical attention has been directed toward IPMNs due to the increase in the prevalence of PC. The diagnosis of IPMN depends primarily on radiological examination, but the diagnostic accuracy of this tool is not satisfactory, necessitating the development of accurate diagnostic biomarkers for IPMN to prevent PC. Recently, high-throughput targeted proteomic quantification methods have accelerated the discovery of biomarkers, rendering them powerful platforms for the development of IPMN diagnostic biomarkers. In this study, a robust multiple reaction monitoring (MRM) pipeline was applied to discover and verify IPMN biomarker candidates in a large cohort of plasma samples. Through highly reproducible MRM assays and a stringent statistical analysis, 11 proteins were selected as IPMN marker candidates with high confidence in 184 plasma samples, comprising a training set (n = 84) and a test set (n = 100). To improve the discriminatory power, a 6-protein panel was constructed by combining marker candidates. The multimarker panel had high discriminatory power in distinguishing between IPMN and controls, including other benign diseases. Consequently, the diagnostic accuracy of IPMN can be improved dramatically with this novel plasma-based panel in combination with a radiological examination.

Keywords: IPMN, Plasma, Biomarker development, LC-MRM, Targeted proteomics

Introduction

Intraductal papillary mucinous neoplasm (IPMN) is a frequent precursor of pancreatic cancer (PC) that progresses from low-grade dysplasia to invasive IPMN and ductal adenocarcinoma. PC is one of the most lethal cancers, with a 5-year survival rate of approximately 7%. There are no specific symptoms or signs that suggest PC, and a late diagnosis of advanced-stage disease is one of the reasons for the poor treatment outcomes. However, when diagnosed in the early stages, the 5-year survival rate of PC exceeds 58%.1 Thus, it is critical to detect PC in the early stages, and much effort has been made in examining its precursor lesions, especially IPMN. Most IPMNs are benign, but the prognosis of invasive IPMN is poorer than that of stomach and colon cancer, with a 5-year survival rate of 49%.2 If IPMN progresses to pancreatic ductal adenocarcinoma, this rate decreases to 29%,3 necessitating the early detection of patients with IPMN before it advances to invasive IPMN or PC. IPMN is diagnosed primarily by radiological examination, and its incidence and rate of detection have risen more than 14-fold in the past 2 decades. However, IPMN has no specific symptoms or signs, and those who do not undergo radiologic surveillance cannot be diagnosed. Thus, accurate and effective screening tools, such as protein biomarkers, are needed to identify patients with IPMN. There are few studies on biomarkers with regard to the diagnosis of IPMN. Serum tumor markers, such as CEA and CA19-9, have been introduced, but their accuracy is only approximately 70%. Recently, GNAS and KRAS mutations in pancreatic tissue4 and cyst fluid5 have been implicated as promising biomarkers for the detection of IPMN, but these biomarkers are evaluated in pancreatic tissue that can only be obtained after surgery, and cyst fluid that is acquired invasively. The ideal biomarker must be specific, sensitive, and predictive.
But importantly, the samples that are used to test for it should be taken noninvasively through simple, rapid, accurate, and inexpensive methods. As a result, we need biomarkers for IPMN that can be detected in samples of blood, which is more accessible than cyst fluid. The current limitations of diagnostic biomarkers are that they entail labor-intensive
procedures and require significant amounts of time for analysis, necessitating sensitive and robust analysis tools to mitigate these bottlenecks. For example, multiple reaction monitoring (MRM) is a multiplex method that uses a triple-quadrupole (QQQ) mass spectrometer that quantifies several proteins in a single MS run. MRM performs highly specific detection of numerous target proteins without background interference,6-8 which increases throughput and limits the loss of samples. Further, the sensitivity and accuracy of MRM have been improved through the development of the mass spectrometer and stable isotope standard (SIS) peptides as references. The sensitivity of MRM is similar to that of ELISA, which is the traditional diagnostic method for protein markers (ng/ml). Also, the quantitative results of MRM show high correlation (R2 > 0.9) with ELISA.9 Thus, we adopted MRM to perform a precise and robust quantification of the IPMN proteome. Based on the clinical significance of IPMN, the easy accessibility of blood samples for biomarkers, and the high-throughput nature of MRM, we performed MRM quantification to identify plasma-based biomarkers of IPMN. We discovered several preliminary candidates by comparing their abundance using strict statistical methods and verified them in 2 independent plasma sample sets. To improve the discriminatory power, a multiprotein panel was constructed by combining several marker candidates. Our panel showed high discriminatory power in differentiating IPMN from the heterogeneous control group, demonstrating its value as a diagnostic marker of IPMN and potential in preventing PC.

Materials and methods

Materials

Sequencing-grade modified trypsin was purchased from Promega (Madison, WI, USA). Multiple Affinity Removal System (MARS-6, 5185-5984) columns, Buffer A (5185-5987), and Buffer B (5185-5988) were obtained from Agilent (Santa Clara, CA, USA). Amicon Ultra 3K units were purchased from Millipore (Bedford, MA, USA). The SIS peptides (isotope-labeled with 13C and 15N) were synthesized at crude purity by JPT (Berlin, Germany). The OASIS HLB 1 cc (30 mg) extraction cartridges were acquired from Waters (Milford, MA, USA).

Sample collection

All plasma samples were collected by the Department of Surgery, Seoul National University College of Medicine, and the study was approved by the institutional review board of Seoul National University Hospital (No: 1412-051-632). A total of 184 plasma samples were used for this study. All sample characteristics are presented in Table 1. Plasma samples were centrifuged for 10 min at 3000 g and 4°C, and the supernatant fractions were stored at -80°C before analysis.

Global data mining

Data mining was performed using previous reports and a public database to select PC-related proteins. First, the public database Oncomine10 was searched per the following criteria: pancreatic cancer and p-value less than 0.05. Consequently, 235 proteins were selected. Then, 4 mutant forms of KRAS, GNAS, and AGER were added, based on their significance in PC.11-13 Next, 18 studies that were related to the pancreatic cancer proteome were examined by manual inspection14-31 and filtered using the Plasma Proteome Database (PPD, http://www.plasmaproteomedatabase.org)32 to improve the detection rate by mass spectrometry; 161 proteins were selected as a result. Thus, a total of 260 proteins were selected as initial MRM target proteins through these steps (Figure 1A); they are listed in Supplementary Table 1.

Sample preparation

Six of the most abundant human plasma proteins were depleted on an HPLC instrument (Shimadzu Co., Kyoto, Japan) that was equipped with a MARS-6 LC column as described.33 The crude human plasma samples were diluted 5X with Buffer A and then passed through 0.22-μm centrifugal filter units. Then, 200 μl of each diluted plasma sample was loaded onto the column, and flowthrough fractions were collected. The depleted plasma samples were concentrated on Amicon Ultra 3K units. Next, 100 μg of each protein sample was denatured in 8 M urea, 20 mM DTT, and 100 mM Tris pH 8.0 for 60 min at 37°C and alkylated with 50 mM IAA for 40 min at room temperature in the dark. The samples were diluted 10-fold with 100 mM Tris, pH 8.0, and trypsin was added at a 1:50 (w/w) enzyme-to-protein ratio for overnight digestion at 37°C. The digested samples were acidified with 100% formic acid and desalted using OASIS HLB 1 cc (30 mg) extraction cartridges.34 Briefly, the cartridges were activated with methanol and acetonitrile and equilibrated with 0.1% formic acid. The digested samples were loaded onto the cartridge and washed with 0.1% formic acid. Finally, peptides were eluted sequentially with 40% and 60% acetonitrile in 0.1% formic acid. The eluted samples were lyophilized and stored at -80°C before MRM analysis. For the MRM analysis, the samples were reconstituted in 50 μl of 0.1% formic acid (2 μg/μl), and 50 fmol of β-galactosidase peptide (GDFQFNISR) was spiked into each sample as an external standard for preliminary MRM quantification measurements.

LC-MRM/MS

MRM analysis was performed on a 6490 triple-quadrupole mass spectrometer that was coupled to an Agilent 1260 Infinity HPLC system (Agilent Technologies, Santa Clara, CA).35 Five microliters of each sample, dissolved in 100 μl of solution A (0.1% formic acid in water), was loaded onto an analytical column (Agilent, Zorbax SB-C18, 150 mm × 0.5 mm i.d., 3.5-μm particle size). The peptides were separated at a flow rate of 10 μL/min using a 70-min gradient, consisting of 3% solution B (0.1% formic acid in acetonitrile) for 5 min, 3% to 35% solution B for 45 min, 35% to 70% solution B for 1 min, 70% solution B for 10 min, and 70% to 3% solution B for 9 min. The instrument settings were: 2500 V ion spray capillary voltage, 2000 V nozzle voltage, 250°C drying gas temperature at a gas flow rate of 15 L/min, and 350°C sheath gas temperature at a gas flow rate of 12 L/min. Additional parameters were set as follows: 30 psi nebulizer, 380 V fragmentor voltage, 5 V cell accelerator voltage, 200 V delta EMV, 0.7 FWHM quadrupole resolution, and approximately 2-ms cycle time. Collision energies (CEs) for each transition were optimized in ramping collision energy mode. CE was ramped around the predicted value in 2-V steps, with 5 increments in each direction. Of the resulting 11 CE values, the best value was selected, based on the peak intensity of each transition.
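The collision-energy optimization described above can be sketched in a few lines. This is a minimal illustration rather than the instrument vendor's procedure: it assumes a symmetric ramp of 5 steps of 2 V on either side of the predicted value (11 candidates in total), and the function names `ce_ramp` and `best_ce`, as well as the predicted CE of 20 V, are hypothetical.

```python
def ce_ramp(predicted_ce, step=2.0, increments=5):
    """Return candidate collision energies centred on the predicted value.

    Assumption: the ramp is symmetric, i.e. `increments` steps below and
    above the predicted CE, giving 2 * increments + 1 candidates.
    """
    return [predicted_ce + k * step for k in range(-increments, increments + 1)]


def best_ce(intensity_by_ce):
    """Pick the CE whose transition gave the highest peak intensity."""
    return max(intensity_by_ce, key=intensity_by_ce.get)


# Hypothetical predicted CE of 20 V for one transition.
candidates = ce_ramp(20.0)
```

In practice, `best_ce` would be called once per transition with the peak intensities measured at each candidate CE.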

Generation of target peptides by in silico digestion

The FASTA file of target proteins was imported into Skyline (MacCoss Lab Software, ver 1.4.0), and in silico digestion was performed, based on the following criteria: 2+ charge state for fragment ion; MS spectrum range, 300-1400 m/z; and peptide length, 7-24 amino acids. The N-terminal signal peptides and peptides that contained Met, His, an NXT/NXS motif, or RP/KP were excluded. Approximately 5-6 transitions per peptide were generated, and those that were detected with at least 3 transitions were chosen as signature peptides. Further, transition patterns of the detected peptides were compared with the NIST library (National Institute of Standards and Technology, http://peptide.nist.gov/), which includes over 200,000 mass spectra from large proteomic datasets. The comparison between the query and library spectra is represented as a dot-product score, based on the scale of peak intensity. The threshold of dot-product score was set to 0.5, based on an earlier study,36 and most target peptides in this study had a dot-product score >0.6. The removal of interference and poor chromatograms was processed using the Protein Blast database (http://blast.ncbi.nlm.nih.gov/),37 AUDIT,38, 39 and SRM Collider.40
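The sequence-level exclusion rules above can be illustrated with a small filter. This is a sketch under stated assumptions and does not reproduce Skyline's implementation: digestion is assumed to cleave after K or R except before P, with no missed cleavages; X in the NXT/NXS motif is taken to be any residue; signal-peptide removal and the transition-level checks are not covered. The toy sequence and the function names are hypothetical.

```python
import re

# N, then any residue, then S or T (assumed reading of "NXT/NXS").
GLYCO_MOTIF = re.compile(r"N.[ST]")


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def is_candidate(peptide, min_len=7, max_len=24):
    """Apply the length, Met/His, NXT/NXS, and RP/KP exclusion rules."""
    if not min_len <= len(peptide) <= max_len:
        return False
    if "M" in peptide or "H" in peptide:
        return False
    if GLYCO_MOTIF.search(peptide):
        return False
    if "RP" in peptide or "KP" in peptide:
        return False
    return True


# Hypothetical toy sequence, not a real protein.
toy_protein = "SLDEAVTGFLKMHEELVRNGTALDVSERAVLKPSDTLLEGKAGELVSDFTRTFGK"
kept = [p for p in tryptic_peptides(toy_protein) if is_candidate(p)]
```

Of the six peptides produced from the toy sequence, only the two that satisfy every rule remain in `kept`.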

Data analysis and statistics

The raw MRM analysis data were analyzed in Skyline. Savitzky-Golay smoothing was applied to increase the quality of the chromatograms. The peak integration was processed by manual inspection to correct for false assignments. In the preliminary quantification, endogenous peptides were normalized to the β-galactosidase peptide (GDFQFNISR) to correct for experimental variation. In a subsequent, more precise quantification, the ratios of the areas of endogenous/corresponding SIS peptides (L/H) were calculated to measure protein abundance. Based on intensity and consistency, the best transition was selected and used for the quantification. The statistical analyses were performed using SPSS 21 (IBM, NY, USA)41, 42 and MSstats43, 44 to determine the significance of target proteins. Independent t-test, Mann-Whitney U test, logistic regression, and classification analysis were performed using SPSS 21. LMM (linear mixed model) analysis and sample size calculations were performed in MSstats. ROC (receiver operating characteristic) analysis was conducted using MedCalc (MedCalc, Mariakerke, Belgium, v10.0.1.0).45, 46 The scatterplots were drawn by PCA (principal component analysis) using R, version 3.2.0. The heatmaps were constructed with the “gplots” package in R, version 3.2.0, using Euclidean distance and complete linkage to perform clustering.

Evaluation of linear response curves

To examine the quantitative reliability and determine the approximate quantity of each individual SIS peptide to be spiked as a reference peptide against the corresponding endogenous peptide, a mixture of 40 SIS peptides was spiked into 100 μg of pooled plasma matrix and serially diluted to 100, 33.3, 11.1, 3.7, 1.2, 0.4, and 0 (blank) fmol/μl. The normalized intensity (ratio of H/L areas) was used to draw linear response curves. All experiments were analyzed in triplicate. The standard errors between triplicates and R2 values are presented on each linear response curve.
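A linear response curve of this kind amounts to an ordinary least-squares fit of the H/L area ratio against the spiked concentration, with R2 as the goodness-of-fit measure. The sketch below uses the dilution levels given above; the area ratios are made-up illustrative numbers, not study data, and the function name `linear_fit_r2` is hypothetical.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares line through (x, y); returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Dilution series from the text (fmol/ul), including the blank.
conc = [100.0, 33.3, 11.1, 3.7, 1.2, 0.4, 0.0]
# Hypothetical H/L area ratios for one peptide (illustrative only).
ratio = [2.05, 0.66, 0.23, 0.07, 0.03, 0.01, 0.0]

slope, intercept, r2 = linear_fit_r2(conc, ratio)
```

With triplicate measurements, the same fit would typically be applied to the mean ratio at each level, with the standard error shown as error bars.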

Spiking of SIS peptides

The concentration of the stock solution for each SIS peptide was 200 pmol/μl, in 30% ACN with 0.1% formic acid. Based on the linear response curves, SIS peptides were spiked into plasma samples at amounts within a 10-fold range (one order of magnitude) of the corresponding endogenous peptides.47

Results

Clinicopathologic characteristics of the study subjects

The 184 study subjects were divided into 2 groups: a training set and a test set. In the training set, plasma samples from 34 IPMN patients [low- and intermediate-grade dysplasia (LGD and IGD), n = 17; high-grade dysplasia (HGD) and invasive IPMN, n = 17] and 50 heterogeneous controls (healthy control, n = 25; chronic cholecystitis, n = 25) were used to identify significantly changed proteins in IPMN. To demonstrate the specificity of the candidates for IPMN, chronic cholecystitis samples were included in the control group as benign controls. In the test set, plasma samples from 50 IPMN patients (LGD and IGD, n = 25; HGD and invasive IPMN, n = 25) and 50 healthy controls were used to verify and determine IPMN markers. Overall, the IPMN plasma samples were distributed uniformly with regard to clinicopathologic characteristics (LGD, n = 21; IGD, n = 21; HGD, n = 21; invasive IPMN, n = 21). The clinical characteristics of the plasma samples, including age, sex, CEA, and CA19-9, are listed in Table 1.

Overall workflow of MRM analysis

A total of 260 PC-related proteins were selected by data mining of previous reports and a public database (Figure 1A). Of these proteins, 104 unique candidates were detected in the pooled plasma sample by MRM analysis (40% detection rate). Next, MRM analysis of individual plasma samples (training set, n=84) was performed in 2 steps. In the first step, a preliminary quantification was conducted to generate a list of synthetic peptides as reference peptides. By pairwise comparison, the abundance of 29 proteins changed significantly (p-value < 0.05). Thus, SIS peptides that corresponded to the endogenous peptides of these 29 proteins were synthesized. In the second step, the 29 proteins were quantified more precisely with SIS peptides in triplicate to select confident and quantifiable candidates in the same individual samples (training set, n=84). As a result, 2 of 29 proteins were filtered out, based on coefficient of variation (CV) values (median CV > 20%). Ultimately, 22 proteins were differentially expressed in IPMN, based on LMM (adjusted p-value < 0.05), and selected as candidates for further verification. To verify our candidates, MRM analysis of the 22 proteins was performed in duplicate using independent plasma samples (test set, n=100). Prior to this step, sample randomization and a blind test were conducted to remove proteins with a batch effect and exclude any subjective bias by the experimenter. As a result, 11 proteins in the test set changed in abundance similarly to those in the training set. Thus, these verified proteins were determined as marker candidates of IPMN and used to construct a multiprotein panel. The proteins retained at each filtering step are listed in Supplementary Table 1. Subsequently, a multimarker panel was established in the training set by combining candidates using logistic regression and verified in the test set. Further, we performed 5-fold cross-validation with 100 replicates to determine whether the discriminatory power of the multimarker panel was overestimated. The overall scheme is presented in Figure 1B.

Preliminary quantification

Based on the preliminary MRM quantification, 29 proteins were significantly altered in IPMN by pairwise comparison (p-value < 0.05): AGT, GSTP1, ICAM1, LDHB, CLU, TXN, CFI, CFH, PPBP, P4HB, THBS1, BTD, CPN2, SERPINC1, PKM2, FSTL1, SPARC, IGFBP2, IGFBP3, KLKB1, LRG1, C4BPA, APOH, APOC1, ITIH4, C4BPB, C5, IFRD1, and CO7. To assess their normality, the quantified peptides were analyzed by Kolmogorov-Smirnov test. Peptides that did not show normality were subjected to Mann-Whitney U test, whereas those that did were analyzed by independent t-test. For the independent t-test, Levene's test was used to assess homoscedasticity, and p-values were calculated under the assumption of equal or unequal variances accordingly. The 40 SIS peptides that corresponded to the 29 proteins were synthesized for further quantification.

Quantitative reliability

To determine the quantitative reliability and the approximate quantity of SIS peptides to spike, linear response curves of the 40 peptides, corresponding to the 29 proteins, were generated using the H/L area ratio of the best transition. The R2 values of the 40 peptides ranged from 0.9525 to 1, reflecting the high quantitative reliability of our MRM analysis (Figure 2A). The approximate quantity of each SIS peptide to spike was decided, based on these linear response curves, to fall within a 10-fold range (one order of magnitude) of the corresponding endogenous peptide.47 In addition, CVs between triplicates were evaluated using the L/H area ratio of the best transition (Supplementary Table 2). The overall median and average CVs of the target peptides were 7.6% and 10.6%, respectively. The median CVs for each target peptide ranged from 3.8% to 31.9%. Biomarker candidates must show good precision (CV < 20%) to discriminate diseases;48 thus, 2 proteins (FSTL1 and PKM2), with median CVs of more than 20%, were excluded from further verification. The CV plots for each peptide are shown in Figure 2B.
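The precision filter can be made concrete as follows. This is a minimal sketch assuming that the CV is computed per sample from the triplicate L/H ratios (sample standard deviation divided by the mean) and then summarized per peptide by its median; the replicate values and function names are hypothetical.

```python
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def median_cv(replicates_by_sample):
    """Median of the per-sample CVs for one peptide."""
    return statistics.median([cv_percent(r) for r in replicates_by_sample])


def passes_precision(replicates_by_sample, max_cv=20.0):
    """Keep a peptide only if its median CV does not exceed the threshold."""
    return median_cv(replicates_by_sample) <= max_cv


# Hypothetical triplicate L/H ratios for one peptide in three samples.
peptide_ratios = [[0.9, 1.0, 1.1], [2.0, 2.0, 2.0], [0.8, 1.0, 1.2]]
peptide_median_cv = median_cv(peptide_ratios)
```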

Quantification of 29 proteins with SIS peptides

The precise quantification of 29 proteins by MRM was performed using SIS peptides in the training set (n=84). The SIS peptides were spiked into individual plasma samples to normalize peak intensities of the corresponding endogenous peptides between MS runs. The changes in protein levels between IPMN and the control group were calculated by LMM analysis using MSstats. The entire dataset was normalized by equalization of the median intensities of the reference SIS peptides between all MS runs. By LMM analysis, differentially expressed proteins were selected, based on an adjusted p-value < 0.05. In total, 24 proteins were significantly altered in IPMN. Twelve proteins each were upregulated (LDHB, TXN, PPBP, THBS1, IGFBP2, LRG1, C5, CFI, PKM2, FSTL1, C4BPA, and C4BPB) and downregulated (AGT, CPN2, IGFBP3, KLKB1, APOC1, GSTP1, CLU, P4HB, BTD, SERPINC1, IFRD1, and CO7) in IPMN. However, PKM2 and FSTL1 were excluded from further verification, based on the CV values, as described above. Consequently, 22 proteins were selected as preliminary candidates for further verification. All quantification data for the 29 proteins are listed in Supplementary Table 2.
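The median-equalization step mentioned above can be sketched as a per-run shift on the log2 scale. This is an illustration of the idea only and makes no claim to match the MSstats implementation; the run names, intensities, and the function name `equalize_medians` are hypothetical.

```python
import statistics


def equalize_medians(log2_reference, log2_endogenous):
    """Shift each run so its median log2 reference intensity matches the
    median of the per-run medians.

    Both arguments map a run name to a list of log2 intensities.
    Returns the shifted endogenous intensities per run.
    """
    run_medians = {run: statistics.median(v) for run, v in log2_reference.items()}
    target = statistics.median(list(run_medians.values()))
    return {
        run: [x + (target - run_medians[run]) for x in values]
        for run, values in log2_endogenous.items()
    }


# Hypothetical log2 intensities for two MS runs.
reference = {"run1": [10.0, 12.0, 14.0], "run2": [11.0, 13.0, 15.0]}
endogenous = {"run1": [8.0], "run2": [9.0]}
normalized = equalize_medians(reference, endogenous)
```

After the shift, a difference between runs that is shared by all reference peptides no longer appears as a difference in the endogenous signal.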

Verification of preliminary candidates in the test set

We performed MRM analysis using independent plasma samples (test set, n=100) to verify the significantly changed proteins in the training set. The appropriate sample size of the test set was calculated using the design sample size module in MSstats, based on data on the training set. Consequently, approximately 50 biological replicates were required to show a statistical power of 0.9, based on 22 preliminary candidates (Supplementary Figure 1). Thus, the test set comprised 50 healthy control and 50 IPMN plasma samples. All sample preparation methods were identical to those of the training set, except for the sample randomization and blindedness to the sample information. MRM analysis for verification was performed in duplicate. All mass data are shown in Supplementary Table 3. Based on the LMM analysis results, volcano plots were drawn to show the distribution of proteins as quantitative changes between the IPMN and control groups for the training and test sets (Figure 3). The volcano plots showed that 17 of 22 proteins were also differentially expressed in the test set, 11 of which (KLKB1, IGFBP2, THBS1, PPBP, TXN, LDHB, IGFBP3, LRG1, C5, AGT, and CPN2) had similar expression changes between IPMN and control samples as in the training set. We hypothesized that these proteins are marker candidates of IPMN with consistency and stability. The remaining 6 proteins (C4BPA, C4BPB, CO7, P4HB, APOC1, and GSTP1) underwent disparate quantitative changes versus the training set and were thus rejected as marker candidates of IPMN. The information on the 11 IPMN marker candidates is presented in Table 2, and their expression patterns are represented as interactive plots (Figure 4), which show their discrimination between samples. Based on the interactive plots, 7 markers (LDHB, TXN, THBS1, IGFBP2, LRG1, PPBP, and C5) were upregulated, and 4 (IGFBP3, KLKB1, AGT, and CPN2) were downregulated in IPMN. With the exception of AGT and C5, the remaining 9 proteins could clearly discriminate between samples.

Further, to determine the quantitative differences in candidate markers, based on their signature peptides, the levels of 5 proteins that were identified with 2 peptides were compared in scatter plots and by fold-change: THBS1, LRG1, C5, IGFBP3, and LDHB. In the scatter plots, the R2 values between signature peptides ranged from 0.5891 to 0.9426 (Supplementary Figure 2), and fold-change values, based on signature peptides, were similar between the disease and control groups: LDHB, 1.80-fold versus 1.79-fold; IGFBP3, 0.76-fold versus 0.80-fold; THBS1, 2.20-fold versus 2.28-fold; LRG1, 1.13-fold versus 1.21-fold; and C5, 1.07-fold versus 1.11-fold (Supplementary Table 4).

Construction of the multiprotein panel ROC analysis was performed to show the discriminatory power of the 11 candidate markers (Table 2). As a result, 5 proteins had AUC values that exceeded 0.7 in the training and test sets: LDHB, TXN, THBS1, PPBP, and KLKB1. The AUC values of IGFBP2, IGFBP3, and LRG1 ranged from 0.6 to 0.7 and were greater than 0.7 for each independent set. C5, AGT, and CPN2 had AUC values of less than 0.7 the training and test sets. Notably, the differences in AUC values for LDHB, TXN, THBS1, IGFBP2, and CPN2 exceeded 0.1 between the training and test sets (Table 2). For example, the AUC value for LDHB was 0.710 in the training set versus 0.937 in the test set. These results might have been caused by a batch effect or composition of control group, based on the assay design. As described above, the training set contained benign control samples, in contrast to the test set. Also, the blind test with sample randomization was performed only for the test set. When IPMN samples were compared with only healthy control samples in the training set, the AUC values of LDHB, TXN, and THBS1 rose dramatically from 0.71, 0.78, and 0.711 to 1, 0.964, and 0.938, respectively (Table 2), indicating that they are less specific for IPMN. Conversely, the AUC values of IGFBP2 and CPN2 did not change significantly, suggesting that the batch effect leads to difference in AUC values between the training and test sets. To circumvent the limitations of single-protein markers with regard to specificity and consistency, a multiprotein panel was constructed by combining candidates to improve their 14

ACS Paragon Plus Environment

Page 15 of 39

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Journal of Proteome Research

specificity, stability, and discriminatory power. The panel was constructed by multiple logistic regression, using the stepwise method to select, from the combinations of the 11 selected markers, the combination that formed the most significant model in the training set. We used SPSS to perform the stepwise selection with the likelihood ratio test. As a result, IGFBP3, PPBP, C5, CPN2, IGFBP2, and LDHB were combined into a multimarker panel for IPMN in the training set. To measure the discriminatory power of the fixed panel, it was applied to discriminate between heterogeneous controls and IPMN samples in the training set; the panel had an AUC value of 0.957 (Figure 5A). To determine its performance in distinguishing IPMN from other benign diseases, the multimarker panel was used to compare IPMN with benign controls (chronic cholecystitis), excluding healthy controls. The discriminatory power of the multiprotein panel was good, as evidenced by its AUC value of 0.928 in the training set (Figure 5B). The classifier constructed in the training set was then applied to discriminate between control and disease groups in the test set as a verification step. In the test set, the panel retained high discriminatory power, with an AUC value of 0.984, verifying its performance (Figure 5A). The performance of the multiprotein panel is summarized as a classification table, error rates, and AUC values (Table 3). The percentage of correctly classified cases ranged from 83.1% to 93.0%, and the error rates ranged from 1.1% to 3.4%, across the training and test sets. Based on these values, this panel could clearly discriminate between IPMN and heterogeneous controls, including healthy controls and other benign diseases.

Further, multicollinearity was inspected using SPSS 21 to assess the independence of the 6 proteins in the panel. The VIF (variance inflation factor) was calculated by logistic regression as a criterion of multicollinearity; a VIF of less than 10 reflects almost no multicollinearity between variables, and a value of more than 10 indicates multicollinearity. The VIFs of the 6 proteins ranged from 1.142 to 4.926 in the training set and from 1.474 to 2.445 in the test set (Supplementary Table 5). Thus, there was little multicollinearity between components of the multimarker panel.
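The VIF criterion above can be illustrated with a minimal sketch. It uses the conventional definition, in which the VIF of predictor j is the j-th diagonal element of the inverse of the predictor correlation matrix (equivalently 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the others). Whether SPSS computed the reported values in exactly this way is an assumption; the function names and the example columns are hypothetical and are not study data.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def flag_multicollinearity(vifs, threshold=10.0):
    """Apply the rule of thumb quoted in the text (VIF of 10 or more is a concern)."""
    return [v >= threshold for v in vifs]


# Hypothetical log2 area ratios for three peptides (illustration only).
example = [
    [0.1, 0.4, 0.3, 0.9, 1.2, 0.8],
    [1.0, 0.7, 1.1, 0.2, 0.5, 0.4],
    [0.3, 0.2, 0.6, 0.5, 0.9, 0.1],
]
print(vif(example), flag_multicollinearity(vif(example)))
```

Under this definition a VIF of 1 means a predictor is uncorrelated with the others, so the reported range of 1.142 to 4.926 sits well below the threshold of 10.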

Comparison with current pancreatic cancer markers

Because CEA and CA19-9 are currently used as biomarkers for pancreatic cancer49, 50 and are occasionally elevated in IPMN,51 we compared our multiprotein panel with CEA and CA19-9 by ROC analysis. The AUC values of CEA were 0.568 and 0.647 between IPMN and controls in the training and test sets, respectively, versus 0.628 and 0.551 for CA19-9 (Figure 5C). Thus, the discriminatory power of our multimarker panel (AUC values of 0.957 and 0.984 in the training and test sets, respectively) was markedly higher than that of CEA and CA19-9 in differentiating IPMN from non-IPMN.
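To make the AUC comparison concrete, the following minimal sketch computes an AUC as the probability that a randomly chosen case has a higher marker value than a randomly chosen control (the Mann-Whitney interpretation). The function name and the input values are hypothetical; the study's own AUC values were obtained with MedCalc, and this sketch is not a reproduction of that software.

```python
def auc(case_values, control_values):
    """Fraction of case/control pairs in which the case ranks higher (ties count 0.5)."""
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker readings (not study data).
ipmn = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.0]
print(round(auc(ipmn, controls), 3))
```

An AUC of 0.5 corresponds to chance-level ranking, which is why values such as 0.551 to 0.647 indicate weak discrimination compared with values above 0.9.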

Further verification of the multimarker panel

To evaluate the 6-protein panel more accurately and to examine whether its performance had been overestimated, we performed 5-fold cross-validation with 100 replicates using all 184 plasma samples. The resulting AUC values ranged from 0.8975 to 0.9306 across the 100 replicates. The median and mean AUC values were 0.9156 and 0.9160, respectively. The accuracy, sensitivity, and specificity of the multimarker panel were 0.838, 0.814, and 0.865, respectively. Based on these results, the classifier showed reliable and consistent performance, regardless of sample composition. Detailed information on the cross-validation of the multimarker panel is presented in Figure 6A.

Further, PCA was performed to show the discrimination of sample groups using 6 peptides, corresponding to the 6 proteins in the multimarker panel. Scatterplots were drawn in 2 and 3 dimensions, based on the log2 scale (Figure 6B). The 2-dimensional scatterplot was drawn using the first and second principal components, calculated by PCA from the correlation structure of the 6 selected peptides. The 3-dimensional scatterplot was generated in the same way using the first, second, and third principal components. Although some samples overlapped in the midrange of the scatterplots, most separated well. The cumulative proportions ranged from 0.341 to 0.962 across the principal components (Figure 6B). Based on the cumulative proportions, a single principal component would not have been sufficient to analyze the case samples precisely. Thus, the PCA demonstrated that the multimarker panel model performed better than the single-peptide model.

Finally, the expression of the 6 peptides in our proposed multimarker panel was compared on heat maps. The heat maps were drawn using the log2-transformed area ratio in the training and test sets (Supplementary Figure 3). Based on the heat maps, there were 2 groups of proteins in the multiprotein panel that appeared consistently on the heat maps of training samples alone and test samples alone. Thus, the composition of peptides in the multiprotein panel is homogeneous across all sample sets.
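The 5-fold cross-validation with 100 replicates described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the folds are stratified by class, the logistic model is refit by plain gradient descent within each fold, the marker set is held fixed, and one AUC per replicate is computed from the pooled out-of-fold scores. None of these details is given in the text, and all function names are hypothetical.

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, ..., wp]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def linear_score(w, xi):
    """Linear predictor; it ranks samples exactly as the fitted probability does."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def auc(scores, labels):
    """Rank-based AUC: fraction of case/control pairs ordered correctly (ties = 0.5)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def stratified_folds(y, k, rng):
    """Shuffle each class separately and deal its indices into k folds."""
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def repeated_cv_auc(X, y, k=5, repeats=100, seed=0):
    """Return one pooled out-of-fold AUC per replicate."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        scores = [0.0] * len(y)
        for fold in stratified_folds(y, k, rng):
            held_out = set(fold)
            train = [i for i in range(len(y)) if i not in held_out]
            w = fit_logistic([X[i] for i in train], [y[i] for i in train])
            for i in fold:
                scores[i] = linear_score(w, X[i])
        aucs.append(auc(scores, y))
    return aucs
```

Summarizing the returned list with its minimum, maximum, median, and mean gives the kind of spread reported for the 100 replicates. Because the marker set is fixed before cross-validation in this sketch, it does not remove any optimism introduced by marker selection itself.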

Discussion

MRM is a high-throughput method that quantifies numerous proteins in a single-shot MS run. This property can limit the loss of samples and shorten the analysis time, rendering MRM advantageous for discovering biomarkers in large cohorts. In this study, 11 plasma protein candidate markers that discriminated between IPMN and controls were discovered and verified by MRM in a large cohort (184 plasma samples). Further, a powerful multiprotein panel was constructed by combining several of these candidates. These proteins were secretory proteins that resided in the extracellular space and had functions that were related to the regulation of metabolism. Of these factors, the combination of C5, CPN2, IGFBP2, IGFBP3, LDHB, and PPBP distinguished IPMN patients from healthy people and those with chronic cholecystitis. The molecular functions of the 6 proteins were related to binding, catalytic activity, enzyme regulatory activity, and receptor activity.

Several of the 11 candidate markers are associated with pancreatic cancer. For example, LDHB is ubiquitously expressed in normal and tumor tissues, and its levels are considerably higher in breast cancer and bladder cancer than in normal tissue. Expression of LDHB is a key tumorigenic driver in many cancers, acting as a downstream mTOR effector. Recently, LDHB was suggested to be a metabolic marker of the response to neoadjuvant chemotherapy in breast cancer.52 LDHB has also been reported to suppress the progression of pancreatic cancer.53 Interestingly, in our study, LDHB was elevated 1.60-fold in the training set and 2.27-fold in the test set.

IGFBP3 inhibits cellular proliferation and induces apoptosis. Several cancers, such as breast, colon, prostate, cervical, lung, liver, renal, and esophageal cancer, have an inverse relationship with IGFBP3.
In particular, IGFBP3 has recently been implicated in pancreatic cancer with regard to its prognosis and in differentiating it from nonpancreatic cancer.54, 55 In this study, IGFBP3 was downregulated, with fold changes of 0.78 in the training set and 0.81 in the test set.

TXN participates in various redox reactions, the pathways of which are upregulated in tumors.56, 57 TXN regulates tumor growth in breast cancer, prostate cancer, and colon cancer. TXN has also been reported to be upregulated in pancreatic cancer via the AKT/mTOR pathway, and a phase II randomized study has been completed using TXN as a target.58 TXN rose by more than 5-fold in our IPMN patients.

THBS1 is an adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions and functions in platelet aggregation, angiogenesis, and tumorigenesis. It has been proposed as a marker and therapeutic target in gastric, breast, ovarian, and thyroid cancer. THBS1 is expressed in cells that are derived from a high-metastatic variant cell line and has been suggested as a prognostic indicator in pancreatic cancer.59 Notably, THBS1 is a marker of invasiveness in IPMN.60 In this study, expression levels of THBS1 were significantly increased, by 1.96-fold and 5.74-fold in the training and test sets, respectively.

TXN and THBS1 clearly discriminated between IPMN and the healthy control group, with AUC values of 0.919 and 0.947, respectively. However, despite the high fold-change, these two proteins discriminated poorly between IPMN and the heterogeneous control group, which included chronic cholecystitis, with AUC values of 0.780 and 0.711, respectively. Because of this poor discriminatory power for heterogeneous samples, TXN and THBS1 were excluded from the multiprotein panel.

This study is the largest-scale project to apply MRM to discover plasma-based biomarkers for improving the diagnostic accuracy of IPMN. Our multimarker panel had high discriminatory power, with a mean AUC value of more than 0.9 and an accuracy, specificity, and sensitivity of over 0.8 in all randomly split sample sets. Thus, once fully validated in the clinic, this multimarker panel may allow IPMN patients to be diagnosed without undergoing invasive procedures, such as biopsy and endoscopy. Further, because IPMN is a significant precursor of pancreatic cancer, several of the identified markers are related to pancreatic cancer.
These markers were detectable in plasma, potentially guiding the development of more effective diagnostic tools for the early detection of IPMN, which increases the risk of pancreatic cancer. When these markers and the multimarker panel are applied to discriminate between pancreatic cancer and IPMN, they will have greater clinical significance in terms of preventing the progression of pancreatic cancer (PC).

Despite its significance, our study has some limitations. To address possible overestimation of the performance of the multiprotein panel, the classifier was tested in randomly composed sample sets using 184 plasma samples, based on 5-fold cross-validation with 100 replicates. The AUC values were consistent across all replicates, indicating that the overestimation was mitigated. However, the best method of determining performance is to test our marker panel in independent samples that were not used in the selection of individual markers.

Another limitation of this study was the low number of study subjects. Eleven candidate markers were suggested to have discriminatory power between IPMN and normal healthy controls, of which LDHB,61 TXN,62 THBS1,63 and IGFBP3,55, 64 are related to diagnosis, cancer development, tumor progression, and prognosis in pancreatic cancer, consistent with IPMN being a precancerous lesion of pancreatic cancer. However, there is no direct evidence of their relevance to the progression of PC, because the limited number of study subjects made it difficult to perform a subgroup analysis according to the degree of dysplasia. Further, the control samples consisted solely of a limited set of benign disease controls, such as chronic cholecystitis. Therefore, further studies with more plasma samples from IPMN patients and benign disease control groups, such as liver disease, pancreatitis, and other cystic lesions of the pancreas, should be performed before addressing the value of the markers that we have identified. Only when these classifiers are applied to larger sample sets will their true performance be determined.

Conclusion

Targeted quantification based on mass spectrometry was successfully applied to discover potential biomarkers for IPMN. This robust method detected and verified IPMN candidate markers in a large set of plasma samples. A total of 11 proteins were identified without an enrichment step and demonstrated significant changes in abundance in the plasma samples of IPMN patients. Further, 6 of the 11 candidates were combined into a powerful multiprotein panel for IPMN. This panel discriminated between IPMN patients and a heterogeneous control population with high accuracy. These results were verified in independent plasma samples. Thus, our multimarker panel can improve the diagnostic accuracy of IPMN when used in combination with a radiological examination.

Acknowledgments

This work was supported by the Multi-omics Research Program through the National Research Foundation and a National Research Foundation grant (No. 2011-0030740), funded by the Korean government (MSIP, Korea). This work was also supported by the Industrial Strategic Technology Development Program (#10045352), funded by the Ministry of Knowledge Economy (MKE, Korea), and a grant from the Korean Health Technology R&D Project, Ministry of Health & Welfare, Republic of Korea (Nos. HI14C2640 and HI14C1277).

Notes

The authors declare no competing financial interest.

Supporting Information

Six supplementary files are available: (1) Supplementary Figures; (2) Supplementary Table 1; (3) Supplementary Table 2; (4) Supplementary Table 3; (5) Supplementary Table 4; (6) Supplementary Table 5.

This material is available free of charge at http://pubs.acs.org.

Figure legends

Figure 1. Venn diagram of data mining and overall flowchart. (A) A Venn diagram of the data mining. A total of 235 proteins were retrieved from the public database Oncomine, and 4 proteins with mutant forms were added to the Oncomine-based proteins: KRAS_G12D, AGER_G82S, GNAS_R201C, and GNAS_R201H. The sum of these proteins is shown in the "ONCOMINE + Mutant" category of the Venn diagram. Then, 161 proteins were retrieved from previous papers and the PPD. These proteins are shown in the "previous papers and PPD" category of the Venn diagram. A total of 260 proteins were selected as initial target proteins and processed further. (B) Overall flowchart of the quantitative analysis of IPMN. The samples were divided into two groups: a training set and a test set. The training set consisted of 34 IPMN and 50 heterogeneous control plasma samples, including benign controls. The test set comprised 50 IPMN and 50 healthy control plasma samples, after sample randomization and blinded testing. MRM quantification for the discovery of marker candidates was performed in the training set, and 22 proteins were discovered as IPMN candidate markers for further verification; 11 of the 22 proteins were verified in an independent sample set (test set) and were selected as IPMN candidate markers. For the fixed marker candidates, multivariate analysis was performed by combining candidates by logistic regression. Consequently, a 6-protein panel was constructed in the training set and verified in the test set. This panel had strong discriminatory power against benign controls and healthy controls. In a further verification step by cross-validation, the classifiers performed consistently and reliably, regardless of sample composition.

Figure 2. Linear response curves and CV distribution of target peptides. (A) To evaluate the linearity of the MRM analysis, linear response curves were drawn for 7 amounts of SIS peptides, spiked into 100 μg of plasma matrix and injected into the mass spectrometer: 0 (blank), 2.1, 6.2, 18.5, 55.6, 166.7, and 500 fmol. The x- and y-axes indicate the amounts of injected SIS peptides and the H/L area ratio, respectively. All samples were analyzed in triplicate. The R² values ranged from 0.9525 to 1 for 40 peptides. Error bars are presented for each spiked amount. (B) The CVs of 40 target peptides were calculated to demonstrate the reliability of this quantification. The median and average CVs of the entire dataset were 7.6% and 10.6%, respectively. The median CVs for each target peptide ranged from 3.8% to 31.9%. All CVs were obtained based on the L/H area ratio of the best transition. Ultimately, 38 target peptides, excluding those of FSTL1 and PKM2, had a median CV of less than 20%. The x- and y-axes indicate the type of peptide and the median CV, respectively.
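The two quantities reported in this legend, the R² of a linear response curve and the coefficient of variation (CV) of replicate measurements, can be sketched as below. The spiked amounts are those listed in the legend; the response values and replicate ratios are hypothetical, and the use of the sample standard deviation (n - 1 denominator) for the CV is an assumption, since the legend does not state the formula.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation in percent: sample SD divided by the mean."""
    return 100.0 * stdev(replicates) / mean(replicates)


def r_squared(x, y):
    """R^2 of a simple linear fit, equal to the squared Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Spiked SIS amounts (fmol) from the legend: a blank plus a 3-fold dilution series.
spiked_fmol = [0, 2.1, 6.2, 18.5, 55.6, 166.7, 500]

# Hypothetical mean H/L area ratios at each amount (illustration only).
mean_ratio = [0.00, 0.02, 0.07, 0.19, 0.54, 1.70, 4.95]

# Hypothetical triplicate L/H area ratios for one peptide (illustration only).
triplicate = [0.92, 1.00, 1.08]

print(r_squared(spiked_fmol, mean_ratio), cv_percent(triplicate))
```

In this reading, a peptide passes the 20% criterion mentioned in the legend when the median of its per-sample CVs is below 20.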

Figure 3. Volcano plots. Volcano plots were drawn to reflect the expression patterns of target proteins between IPMN and controls, based on LMM. Eleven proteins had similar changes in abundance between the training and test sets: IGFBP2, KLKB1, PPBP, THBS1, C5, TXN, LDHB, LRG1, IGFBP3, CPN2, and AGT. The x- and y-axes indicate the log2 fold-change values and the -log10 adjusted p-values, respectively. The horizontal dotted line marks an adjusted p-value of 0.05. Black dots indicate proteins with adjusted p-values of more than 0.05; red dots indicate upregulated proteins with adjusted p-values of less than 0.05; and blue dots indicate downregulated proteins with adjusted p-values of less than 0.05.
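The axis transformations and color rule in this legend can be written out as a short sketch. The function name is hypothetical, the adjusted p-values in the usage lines are invented for illustration, and the treatment of the boundary cases (an adjusted p-value of exactly 0.05, or a fold change of exactly 1) is an assumption because the legend does not specify them.

```python
import math


def volcano_point(fold_change, adj_p, alpha=0.05):
    """Return (log2 fold change, -log10 adjusted p, color) for one protein."""
    x = math.log2(fold_change)
    y = -math.log10(adj_p)
    if adj_p >= alpha:
        color = "black"   # not significant (boundary treated as not significant)
    elif x > 0:
        color = "red"     # significant and upregulated
    else:
        color = "blue"    # significant and downregulated
    return x, y, color


# Height of the horizontal dotted line for an adjusted p-value of 0.05.
threshold_y = -math.log10(0.05)

# Fold changes as in Table 2 (training set); adjusted p-values are hypothetical.
print(volcano_point(1.60, 0.01))   # LDHB-like point
print(volcano_point(0.78, 0.01))   # IGFBP3-like point
```

A fold change below 1 maps to a negative x value, so downregulated proteins such as IGFBP3 fall to the left of zero.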

Figure 4. Interactive plots of 11 IPMN candidate markers. Interactive plots were drawn to show the expression patterns of the 11 IPMN candidate markers between IPMN and the control group. The plots represent the training and test sets. The 11 candidates had similar changes in expression between the training and test sets. The x- and y-axes indicate the group name and the L/H area ratio, respectively. The sensitivity, specificity, and cutoff value for each marker are shown on the plots.

Figure 5. Comparative ROC analysis between the IPMN multiprotein panel, CEA, and CA19-9. A comparative ROC analysis was performed for the IPMN multiprotein panel against CEA and CA19-9. The multimarker panel comprised C5, CPN2, IGFBP2, IGFBP3, LDHB, and PPBP. (A) ROC curves of the IPMN multiprotein panel and its 6 components between IPMN and the control group. AUC values were 0.957 and 0.984 for the training and test sets, respectively. (B) ROC curves of the IPMN multiprotein panel and the 6 proteins between IPMN and chronic cholecystitis. The panel had an AUC value of 0.928. (C) ROC curves of current PC biomarkers between IPMN and control samples. CEA had AUC values of 0.568 and 0.647 in the training and test sets, respectively, versus 0.628 and 0.551 for CA19-9. Based on these AUC values, current PC biomarkers cannot reliably discriminate between IPMN and the control group.

Figure 6. Five-fold cross-validation and PCA for the multimarker panel. Five-fold cross-validation and PCA were performed to examine overfitting and the discriminatory power of the 6-protein panel. (A) Box plot of the distribution of AUCs across 100 replicates of the 5-fold cross-validation (top). The minimum and maximum AUC values were 0.8975 and 0.9306, respectively, and AUC values were consistent between replicates. The performance of the multimarker panel in all replicates, including the accuracy, specificity, sensitivity, and mean AUC, is presented in a table (bottom). "Qu" represents quantile. (B) Scatterplots in 2 dimensions using the first and second principal components (left) and in 3 dimensions using the first, second, and third principal components (right). The cumulative proportions for each component are given in a table (bottom). Although some samples overlap in the scatterplots, most are well separated. "comp" represents components.
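The cumulative proportions mentioned in panel B follow from the PCA eigenvalues by a running sum. The sketch below shows that arithmetic only; the eigenvalues are hypothetical, chosen so that they sum to 6 as they would for a PCA on the correlation matrix of 6 peptides, and they are not the study's values.

```python
from itertools import accumulate


def cumulative_proportion(eigenvalues):
    """Cumulative share of total variance carried by the leading components."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
eigenvalues = [2.1, 1.4, 1.0, 0.8, 0.5, 0.2]
print(cumulative_proportion(eigenvalues))
```

When the first entry is small, as with the reported 0.341, a single component carries only about a third of the total variance, which is consistent with the point that one component alone would not separate the samples well.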

Table 1. Cohort characteristics

Group / characteristic          Training set (N=84)    Test set (N=100)

Chronic cholecystitis           N=25                   -
  Age (mean±SD)                 49.4±8.5               -
  Sex (M:F)                     8:17                   -
  CA19-9 (U/ml, mean±SD)        9.2±8.1                -
  CEA (μg/l, mean±SD)           1.5±0.6                -

Healthy control                 N=25                   N=50
  Age (mean±SD)                 55.6±11.1              54.2±7.0
  Sex (M:F)                     11:14                  25:25
  CA19-9 (U/ml, mean±SD)        22.8±55.2              9.0±6.0
  CEA (μg/l, mean±SD)           1.4±0.7                1.3±0.6

IPMN                            N=34                   N=50
  Age (mean±SD)                 64.4±7.7               67.1±8.5
  Sex (M:F)                     23:11                  31:19
  CA19-9 (U/ml, mean±SD)        21.6±43.6              48.7±120.6
  CEA (μg/l, mean±SD)           1.5±0.6                2.2±2.9

* Abbreviations: M, male; F, female; SD, standard deviation; IPMN, intraductal papillary mucinous neoplasms; CA19-9, carbohydrate antigen 19-9.

Table 2. Difference between IPMN and the control group for 11 IPMN marker candidates

Columns: Protein name; UniProt accession number; Gene symbol; Training set fold change; Training set AUC value (heterogeneous control); Training set AUC value (healthy control); Test set fold change; Test set AUC value.

Protein name                         Accession   Gene     Train FC   Train AUC (heterogeneous)   Train AUC (healthy)   Test FC   Test AUC
Lactate dehydrogenase B              P07195      LDHB     1.60       0.710 (0.704)               1 (0.999)             2.27      0.937 (0.910)
Thioredoxin                          P10599      TXN      2.03       0.780                       0.964                 2.29      0.919
Thrombospondin I                     P07996      THBS1    1.96       0.711 (0.691)               0.938 (0.925)         5.74      0.947 (0.944)
IGF binding protein 2                P18065      IGFBP2   1.49       0.674                       0.647                 2.27      0.869
IGF binding protein 3                P17936      IGFBP3   0.78       0.739 (0.698)               0.794 (0.746)         0.81      0.678 (0.633)
Leucine rich alpha 2 glycoprotein 1  P02750      LRG1     1.18       0.634 (0.599)               0.642 (0.529)         1.38      0.732 (0.708)
Beta thromboglobulin                 P02775      PPBP     3.24       0.801                       0.986                 1.48      0.773
Kallikrein B                         P03952      KLKB1    0.79       0.749                       0.854                 0.79      0.716
Complement component 5               P01031      C5       1.09       0.571 (0.570)               0.623 (0.607)         1.08      0.559 (0.550)
Angiotensin I                        P01019      AGT      0.90       0.594                       0.535                 0.88      0.614
Carboxypeptidase N polypeptide 2     P22792      CPN2     0.89       0.694                       0.754                 0.95      0.576

* Fold changes and AUC values were calculated by MSstats and MedCalc, respectively. Values in parentheses are AUC values calculated using another signature peptide.

Table 3. Classification table of the IPMN multiprotein panel

Training set (IPMN versus Control)
Actual group    Predicted 0    Predicted 1    Percent correct
Y=0             47             3              94.0%
Y=1             5              29             85.3%
Error rate: 2.1%
Percent of cases correctly classified: 90.5%

Training set (IPMN versus Chronic cholecystitis)
Actual group    Predicted 0    Predicted 1    Percent correct
Y=0             20             5              80.0%
Y=1             5              29             85.3%
Error rate: 3.4%
Percent of cases correctly classified: 83.1%

Test set (IPMN versus Control)
Actual group    Predicted 0    Predicted 1    Percent correct
Y=0             48             2              96.0%
Y=1             5              45             90.0%
Error rate: 1.1%
Percent of cases correctly classified: 93.0%

* Predicted group: sample condition predicted by multi-protein panel, Actual group: actual sample condition, 0: control, 1: IPMN.
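The percentages in Table 3 can be recomputed directly from the counts, as in the short sketch below. The mapping of cells to true negatives, false positives, false negatives, and true positives follows the table footnote (0: control, 1: IPMN); the function and dictionary names are hypothetical. The "Error rate" entries of the table are not reproduced here, because the text does not state how they were calculated.

```python
def summarize(tn, fp, fn, tp):
    """Percent correct per actual group and overall, from a 2 x 2 classification table."""
    return {
        "specificity": 100.0 * tn / (tn + fp),            # percent correct for Y=0
        "sensitivity": 100.0 * tp / (tp + fn),            # percent correct for Y=1
        "accuracy": 100.0 * (tn + tp) / (tn + fp + fn + tp),
    }


# Counts from Table 3, ordered as (tn, fp, fn, tp).
tables = {
    "training, IPMN vs control": (47, 3, 5, 29),
    "training, IPMN vs chronic cholecystitis": (20, 5, 5, 29),
    "test, IPMN vs control": (48, 2, 5, 45),
}

for name, counts in tables.items():
    result = summarize(*counts)
    print(name, {key: round(value, 1) for key, value in result.items()})
```

For the training set against all controls, for example, (47 + 29) / 84 gives the 90.5% of cases correctly classified that is listed in the table.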

Figure 1.

Figure 2.

Figure 3.

Figure 4.

Figure 5.

Figure 6.

References 1. Egawa, S.; Takeda, K.; Fukuyama, S.; Motoi, F.; Sunamura, M.; Matsuno, S., Clinicopathological aspects of small pancreatic cancer. Pancreas 2004, 28, 235-40. 2. Kang, M. J.; Jang, J. Y.; Lee, K. B.; Chang, Y. R.; Kwon, W.; Kim, S. W., Long-term prospective cohort study of patients undergoing pancreatectomy for intraductal papillary mucinous neoplasm of the pancreas: implications for postoperative surveillance. Ann Surg 2014, 260, 356-63. 3. Kang, M. J.; Lee, K. B.; Jang, J. Y.; Kwon, W.; Park, J. W.; Chang, Y. R.; Kim, S. W., Disease spectrum of intraductal papillary mucinous neoplasm with an associated invasive carcinoma invasive IPMN versus pancreatic ductal adenocarcinoma-associated IPMN. Pancreas 2013, 42, 1267-74. 4. Dal Molin, M.; Matthaei, H.; Wu, J.; Blackford, A.; Debeljak, M.; Rezaee, N.; Wolfgang, C. L.; Butturini, G.; Salvia, R.; Bassi, C.; Goggins, M. G.; Kinzler, K. W.; Vogelstein, B.; Eshleman, J. R.; Hruban, R. H.; Maitra, A., Clinicopathological correlates of activating GNAS mutations in intraductal papillary mucinous neoplasm (IPMN) of the pancreas. Ann Surg Oncol 2013, 20, 3802-8. 5. Wu, J.; Matthaei, H.; Maitra, A.; Dal Molin, M.; Wood, L. D.; Eshleman, J. R.; Goggins, M.; Canto, M. I.; Schulick, R. D.; Edil, B. H.; Wolfgang, C. L.; Klein, A. P.; Diaz, L. A., Jr.; Allen, P. J.; Schmidt, C. M.; Kinzler, K. W.; Papadopoulos, N.; Hruban, R. H.; Vogelstein, B., Recurrent GNAS mutations define an unexpected pathway for pancreatic cyst development. Sci Transl Med 2011, 3, 92ra66. 6. Picotti, P.; Rinner, O.; Stallmach, R.; Dautel, F.; Farrah, T.; Domon, B.; Wenschuh, H.; Aebersold, R., High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 2010, 7, 43-6. 7. Addona, T. A.; Abbatiello, S. E.; Schilling, B.; Skates, S. J.; Mani, D. R.; Bunk, D. M.; Spiegelman, C. H.; Zimmerman, L. J.; Ham, A. J.; Keshishian, H.; Hall, S. C.; Allen, S.; Blackman, R. K.; Borchers, C. H.; Buck, C.; Cardasis, H. 
L.; Cusack, M. P.; Dodder, N. G.; Gibson, B. W.; Held, J. M.; Hiltke, T.; Jackson, A.; Johansen, E. B.; Kinsinger, C. R.; Li, J.; Mesri, M.; Neubert, T. A.; Niles, R. K.; Pulsipher, T. C.; Ransohoff, D.; Rodriguez, H.; Rudnick, P. A.; Smith, D.; Tabb, D. L.; Tegeler, T. J.; Variyath, A. M.; Vega-Montoto, L. J.; Wahlander, A.; Waldemarson, S.; Wang, M.; Whiteaker, J. R.; Zhao, L.; Anderson, N. L.; Fisher, S. J.; Liebler, D. C.; Paulovich, A. G.; Regnier, F. E.; Tempst, P.; Carr, S. A., Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 2009, 27, 633-41. 8. Kuzyk, M. A.; Smith, D.; Yang, J.; Cross, T. J.; Jackson, A. M.; Hardie, D. B.; Anderson, N. L.; Borchers, C. H., Multiple reaction monitoring-based, multiplexed, absolute quantitation of 45 proteins in human plasma. Mol Cell Proteomics 2009, 8, 1860-77. 9. Fortin, T.; Salvador, A.; Charrier, J. P.; Lenz, C.; Lacoux, X.; Morla, A.; Choquet-Kastylevsky, G.; Lemoine, J., Clinical quantitation of prostate-specific antigen biomarker in the low nanogram/milliliter range by conventional bore liquid chromatography-tandem mass spectrometry (multiple reaction monitoring) coupling and correlation with ELISA tests. Mol Cell Proteomics 2009, 8, 1006-15. 10. Rhodes, D. R.; Yu, J.; Shanker, K.; Deshpande, N.; Varambally, R.; Ghosh, D.; Barrette, T.; Pandey, A.; Chinnaiyan, A. M., ONCOMINE: a cancer microarray database and integrated data-mining platform. Neoplasia 2004, 6, 1-6. 11. Krechler, T.; Jachymova, M.; Mestek, O.; Zak, A.; Zima, T.; Kalousova, M., Soluble receptor for advanced glycation end-products (sRAGE) and polymorphisms of RAGE and glyoxalase I genes in patients with pancreas cancer. Clin Biochem 2010, 43, 882-6. 12. 
Furukawa, T.; Kuboki, Y.; Tanji, E.; Yoshida, S.; Hatori, T.; Yamamoto, M.; Shibata, N.; Shimizu, K.; Kamatani, N.; Shiratori, K., Whole-exome sequencing uncovers frequent GNAS mutations in intraductal papillary mucinous neoplasms of the pancreas. Sci Rep 2011, 1, 161. 13. Ling, J.; Kang, Y.; Zhao, R.; Xia, Q.; Lee, D. F.; Chang, Z.; Li, J.; Peng, B.; Fleming, J. B.; Wang, H.; Liu, J.; Lemischka, I. R.; Hung, M. C.; Chiao, P. J., KrasG12D-induced IKK2/beta/NF-kappaB activation by IL1alpha and p62 feedforward loops is required for development of pancreatic ductal adenocarcinoma. Cancer Cell 2012, 21, 105-20. 14. Hwang, T. L.; Liang, Y.; Chien, K. Y.; Yu, J. S., Overexpression and elevated serum levels of phosphoglycerate kinase 1 in pancreatic ductal adenocarcinoma. Proteomics 2006, 6, 2259-72. 15. Lu, Z.; Hu, L.; Evers, S.; Chen, J.; Shen, Y., Differential expression profiling of human pancreatic adenocarcinoma and healthy pancreatic tissue. Proteomics 2004, 4, 3975-88. 16. Chen, R.; Yi, E. C.; Donohoe, S.; Pan, S.; Eng, J.; Cooke, K.; Crispin, D. A.; Lane, Z.; Goodlett, D. R.; Bronner, M. P.; Aebersold, R.; Brentnall, T. A., Pancreatic cancer proteome: the proteins that underlie invasion, metastasis, and immunologic escape. Gastroenterology 2005, 129, 1187-97. 17. Turtoi, A.; Musmeci, D.; Wang, Y.; Dumont, B.; Somja, J.; Bevilacqua, G.; De Pauw, E.; Delvenne, P.; 35

ACS Paragon Plus Environment

Journal of Proteome Research

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 36 of 39

Castronovo, V., Identification of novel accessible proteins bearing diagnostic and therapeutic potential in human pancreatic ductal adenocarcinoma. J Proteome Res 2011, 10, 4302-13.
18. Chen, R.; Brentnall, T. A.; Pan, S.; Cooke, K.; Moyes, K. W.; Lane, Z.; Crispin, D. A.; Goodlett, D. R.; Aebersold, R.; Bronner, M. P., Quantitative proteomics analysis reveals that proteins differentially expressed in chronic pancreatitis are also frequently involved in pancreatic cancer. Mol Cell Proteomics 2007, 6, 1331-42.
19. Zhao, J.; Qiu, W.; Simeone, D. M.; Lubman, D. M., N-linked glycosylation profiling of pancreatic cancer serum using capillary liquid phase separation coupled with mass spectrometric analysis. J Proteome Res 2007, 6, 1126-38.
20. Yu, K. H.; Rustgi, A. K.; Blair, I. A., Characterization of proteins in human pancreatic cancer serum using differential gel electrophoresis and tandem mass spectrometry. J Proteome Res 2005, 4, 1742-51.
21. Bloomston, M.; Zhou, J. X.; Rosemurgy, A. S.; Frankel, W.; Muro-Cacho, C. A.; Yeatman, T. J., Fibrinogen gamma overexpression in pancreatic cancer identified by large-scale proteomic analysis of serum samples. Cancer Res 2006, 66, 2592-9.
22. Matsubara, J.; Ono, M.; Honda, K.; Negishi, A.; Ueno, H.; Okusaka, T.; Furuse, J.; Furuta, K.; Sugiyama, E.; Saito, Y.; Kaniwa, N.; Sawada, J.; Shoji, A.; Sakuma, T.; Chiba, T.; Saijo, N.; Hirohashi, S.; Yamada, T., Survival prediction for pancreatic cancer patients receiving gemcitabine treatment. Mol Cell Proteomics 2010, 9, 695-704.
23. Yu, K. H.; Barry, C. G.; Austin, D.; Busch, C. M.; Sangar, V.; Rustgi, A. K.; Blair, I. A., Stable isotope dilution multidimensional liquid chromatography-tandem mass spectrometry for pancreatic cancer serum biomarker discovery. J Proteome Res 2009, 8, 1565-76.
24. Makawita, S.; Smith, C.; Batruch, I.; Zheng, Y.; Ruckert, F.; Grutzmann, R.; Pilarsky, C.; Gallinger, S.; Diamandis, E. P., Integrated proteomic profiling of cell line conditioned media and pancreatic juice for the identification of pancreatic cancer biomarkers. Mol Cell Proteomics 2011, 10, M111.008599.
25. Gronborg, M.; Kristiansen, T. Z.; Iwahori, A.; Chang, R.; Reddy, R.; Sato, N.; Molina, H.; Jensen, O. N.; Hruban, R. H.; Goggins, M. G.; Maitra, A.; Pandey, A., Biomarker discovery from pancreatic cancer secretome using a differential proteomic approach. Mol Cell Proteomics 2006, 5, 157-71.
26. Pan, S.; Chen, R.; Crispin, D. A.; May, D.; Stevens, T.; McIntosh, M. W.; Bronner, M. P.; Ziogas, A.; Anton-Culver, H.; Brentnall, T. A., Protein alterations associated with pancreatic cancer and chronic pancreatitis found in human plasma using global quantitative proteomics profiling. J Proteome Res 2011, 10, 2359-76.
27. Farina, A.; Dumonceau, J. M.; Frossard, J. L.; Hadengue, A.; Hochstrasser, D. F.; Lescuyer, P., Proteomic analysis of human bile from malignant biliary stenosis induced by pancreatic cancer. J Proteome Res 2009, 8, 159-69.
28. Gronborg, M.; Bunkenborg, J.; Kristiansen, T. Z.; Jensen, O. N.; Yeo, C. J.; Hruban, R. H.; Maitra, A.; Goggins, M. G.; Pandey, A., Comprehensive proteomic analysis of human pancreatic juice. J Proteome Res 2004, 3, 1042-55.
29. Chen, R.; Pan, S.; Yi, E. C.; Donohoe, S.; Bronner, M. P.; Potter, J. D.; Goodlett, D. R.; Aebersold, R.; Brentnall, T. A., Quantitative proteomic profiling of pancreatic cancer juice. Proteomics 2006, 6, 3871-9.
30. Schroder, C.; Jacob, A.; Tonack, S.; Radon, T. P.; Sill, M.; Zucknick, M.; Ruffer, S.; Costello, E.; Neoptolemos, J. P.; Crnogorac-Jurcevic, T.; Bauer, A.; Fellenberg, K.; Hoheisel, J. D., Dual-color proteomic profiling of complex samples with a microarray of 810 cancer-related antibodies. Mol Cell Proteomics 2010, 9, 1271-80.
31. Dai, L.; Li, C.; Shedden, K. A.; Lee, C. J.; Li, C.; Quoc, H.; Simeone, D. M.; Lubman, D. M., Quantitative proteomic profiling studies of pancreatic cancer stem cells. J Proteome Res 2010, 9, 3394-402.
32. Muthusamy, B.; Hanumanthu, G.; Suresh, S.; Rekha, B.; Srinivas, D.; Karthick, L.; Vrushabendra, B. M.; Sharma, S.; Mishra, G.; Chatterjee, P.; Mangala, K. S.; Shivashankar, H. N.; Chandrika, K. N.; Deshpande, N.; Suresh, M.; Kannabiran, N.; Niranjan, V.; Nalli, A.; Prasad, T. S.; Arun, K. S.; Reddy, R.; Chandran, S.; Jadhav, T.; Julie, D.; Mahesh, M.; John, S. L.; Palvankar, K.; Sudhir, D.; Bala, P.; Rashmi, N. S.; Vishnupriya, G.; Dhar, K.; Reshma, S.; Chaerkady, R.; Gandhi, T. K.; Harsha, H. C.; Mohan, S. S.; Deshpande, K. S.; Sarker, M.; Pandey, A., Plasma Proteome Database as a resource for proteomics research. Proteomics 2005, 5, 3531-6.
33. Darde, V. M.; de la Cuesta, F.; Dones, F. G.; Alvarez-Llamas, G.; Barderas, M. G.; Vivanco, F., Analysis of the plasma proteome associated with acute coronary syndrome: does a permanent protein signature exist in the plasma of ACS patients? J Proteome Res 2010, 9, 4420-32.
34. Keshishian, H.; Addona, T.; Burgess, M.; Kuhn, E.; Carr, S. A., Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics 2007, 6, 2212-29.
35. Martinez-Marquez, A.; Morante-Carriel, J.; Selles-Marchart, S.; Martinez-Esteso, M. J.; Pineda-Lucas, J. L.; Luque, I.; Bru-Martinez, R., Development and validation of MRM methods to quantify protein isoforms of polyphenol oxidase in loquat fruits. J Proteome Res 2013, 12, 5709-22.
36. Ahrne, E.; Ohta, Y.; Nikitin, F.; Scherl, A.; Lisacek, F.; Muller, M., An improved method for the construction of decoy peptide MS/MS spectra suitable for the accurate estimation of false discovery rates. Proteomics 2011, 11, 4085-95.
37. Zhang, Q.; Spellman, D. S.; Song, Y.; Choi, B.; Hatcher, N. G.; Tomazela, D.; Beaumont, M.; Tabrizifard, M.; Prabhavalkar, D.; Seghezzi, W.; Harrelson, J.; Bateman, K. P., Generic automated method for liquid chromatography-multiple reaction monitoring mass spectrometry based monoclonal antibody quantitation for preclinical pharmacokinetic studies. Anal Chem 2014, 86, 8776-84.
38. Abbatiello, S. E.; Mani, D. R.; Keshishian, H.; Carr, S. A., Automated detection of inaccurate and imprecise transitions in peptide quantification by multiple reaction monitoring mass spectrometry. Clin Chem 2010, 56, 291-305.
39. Burgess, M. W.; Keshishian, H.; Mani, D. R.; Gillette, M. A.; Carr, S. A., Simplified and efficient quantification of low-abundance proteins at very high multiplex via targeted mass spectrometry. Mol Cell Proteomics 2014, 13, 1137-49.
40. Pavlou, M. P.; Dimitromanolakis, A.; Martinez-Morillo, E.; Smid, M.; Foekens, J. A.; Diamandis, E. P., Integrating meta-analysis of microarray data and targeted proteomics for biomarker identification: application in breast cancer. J Proteome Res 2014, 13, 2897-909.
41. Poon, T. C.; Sung, J. J.; Chow, S. M.; Ng, E. K.; Yu, A. C.; Chu, E. S.; Hui, A. M.; Leung, W. K., Diagnosis of gastric cancer by serum proteomic fingerprinting. Gastroenterology 2006, 130, 1858-64.
42. Jung, S.; Kim, O. Y.; Kim, M.; Song, J.; Lee, S. H.; Lee, J. H., Age-related increase in alanine aminotransferase correlates with elevated levels of plasma amino acids, decanoylcarnitine, Lp-PLA2 Activity, oxidative stress, and arterial stiffness. J Proteome Res 2014, 13, 3467-75.
43. Choi, M.; Chang, C. Y.; Clough, T.; Broudy, D.; Killeen, T.; MacLean, B.; Vitek, O., MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments. Bioinformatics 2014, 30, 2524-6.
44. Surinova, S.; Huttenhain, R.; Chang, C. Y.; Espona, L.; Vitek, O.; Aebersold, R., Automated selected reaction monitoring data analysis workflow for large-scale targeted proteomic studies. Nat Protoc 2013, 8, 1602-19.
45. Jang, J. Y.; Park, T.; Lee, S.; Kang, M. J.; Lee, S. Y.; Lee, K. B.; Chang, Y. R.; Kim, S. W., Validation of international consensus guidelines for the resection of branch duct-type intraductal papillary mucinous neoplasms. Br J Surg 2014, 101, 686-92.
46. Buhimschi, C. S.; Bhandari, V.; Dulay, A. T.; Nayeri, U. A.; Abdel-Razeq, S. S.; Pettker, C. M.; Thung, S.; Zhao, G.; Han, Y. W.; Bizzarro, M.; Buhimschi, I. A., Proteomics mapping of cord blood identifies haptoglobin "switch-on" pattern as biomarker of early-onset neonatal sepsis in preterm newborns. PLoS One 2011, 6, e26111.
47. Kennedy, J. J.; Abbatiello, S. E.; Kim, K.; Yan, P.; Whiteaker, J. R.; Lin, C.; Kim, J. S.; Zhang, Y.; Wang, X.; Ivey, R. G.; Zhao, L.; Min, H.; Lee, Y.; Yu, M. H.; Yang, E. G.; Lee, C.; Wang, P.; Rodriguez, H.; Kim, Y.; Carr, S. A.; Paulovich, A. G., Demonstrating the feasibility of large-scale development of standardized assays to quantify human proteins. Nat Methods 2014, 11, 149-55.
48. Percy, A. J.; Chambers, A. G.; Yang, J.; Hardie, D. B.; Borchers, C. H., Advances in multiplexed MRM-based protein biomarker quantitation toward clinical utility. Biochim Biophys Acta 2014, 1844, 917-26.
49. Duffy, M. J.; Sturgeon, C.; Lamerz, R.; Haglund, C.; Holubec, V. L.; Klapdor, R.; Nicolini, A.; Topolcan, O.; Heinemann, V., Tumor markers in pancreatic cancer: a European Group on Tumor Markers (EGTM) status report. Ann Oncol 2010, 21, 441-7.
50. Ni, X. G.; Bai, X. F.; Mao, Y. L.; Shao, Y. F.; Wu, J. X.; Shan, Y.; Wang, C. F.; Wang, J.; Tian, Y. T.; Liu, Q.; Xu, D. K.; Zhao, P., The clinical value of serum CEA, CA19-9, and CA242 in the diagnosis and prognosis of pancreatic cancer. Eur J Surg Oncol 2005, 31, 164-9.
51. Fritz, S.; Hackert, T.; Hinz, U.; Hartwig, W.; Buchler, M. W.; Werner, J., Role of serum carbohydrate antigen 19-9 and carcinoembryonic antigen in distinguishing between benign and invasive intraductal papillary mucinous neoplasm of the pancreas. Br J Surg 2011, 98, 104-10.
52. Dennison, J. B.; Molina, J. R.; Mitra, S.; Gonzalez-Angulo, A. M.; Balko, J. M.; Kuba, M. G.; Sanders, M. E.; Pinto, J. A.; Gomez, H. L.; Arteaga, C. L.; Brown, R. E.; Mills, G. B., Lactate dehydrogenase B: a metabolic marker of response to neoadjuvant chemotherapy in breast cancer. Clin Cancer Res 2013, 19, 3703-13.
53. Cui, J.; Quan, M.; Jiang, W.; Hu, H.; Jiao, F.; Li, N.; Jin, Z.; Wang, L.; Wang, Y.; Wang, L., Suppressed expression of LDHB promotes pancreatic cancer progression via inducing glycolytic phenotype. Med Oncol 2015, 32, 589.
54. Pan, S.; Chen, R.; Tamura, Y.; Crispin, D. A.; Lai, L. A.; May, D. H.; McIntosh, M. W.; Goodlett, D. R.; Brentnall, T. A., Quantitative glycoproteomics analysis reveals changes in N-glycosylation level associated with pancreatic ductal adenocarcinoma. J Proteome Res 2014, 13, 1293-306.
55. Rohrmann, S.; Grote, V. A.; Becker, S.; Rinaldi, S.; Tjonneland, A.; Roswall, N.; Gronbaek, H.; Overvad, K.; Boutron-Ruault, M. C.; Clavel-Chapelon, F.; Racine, A.; Teucher, B.; Boeing, H.; Drogan, D.; Dilis, V.; Lagiou, P.; Trichopoulou, A.; Palli, D.; Tagliabue, G.; Tumino, R.; Vineis, P.; Mattiello, A.; Rodriguez, L.; Duell, E. J.; Molina-Montes, E.; Dorronsoro, M.; Huerta, J. M.; Ardanaz, E.; Jeurnink, S.; Peeters, P. H.; Lindkvist, B.; Johansen, D.; Sund, M.; Ye, W.; Khaw, K. T.; Wareham, N. J.; Allen, N. E.; Crowe, F. L.; Fedirko, V.; Jenab, M.; Michaud, D. S.; Norat, T.; Riboli, E.; Bueno-de-Mesquita, H. B.; Kaaks, R., Concentrations of IGF-I and IGFBP-3 and pancreatic cancer risk in the European Prospective Investigation into Cancer and Nutrition. Br J Cancer 2012, 106, 1004-10.
56. Lu, Y.; Zhao, X.; Luo, G.; Shen, G.; Li, K.; Ren, G.; Pan, Y.; Wang, X.; Fan, D., Thioredoxin-like protein 2b facilitates colon cancer cell proliferation and inhibits apoptosis via NF-kappaB pathway. Cancer Lett 2015, 363, 119-26.
57. Wang, L.; Song, G.; Chang, X.; Tan, W.; Pan, J.; Zhu, X.; Liu, Z.; Qi, M.; Yu, J.; Han, B., The role of TXNDC5 in castration-resistant prostate cancer-involvement of androgen receptor signaling pathway. Oncogene 2015, 34, 4735-45.
58. Han, H.; Bearss, D. J.; Browne, L. W.; Calaluce, R.; Nagle, R. B.; Von Hoff, D. D., Identification of differentially expressed genes in pancreatic cancer cells using cDNA microarray. Cancer Res 2002, 62, 2890-6.
59. McElroy, M. K.; Kaushal, S.; Tran Cao, H. S.; Moossa, A. R.; Talamini, M. A.; Hoffman, R. M.; Bouvet, M., Upregulation of thrombospondin-1 and angiogenesis in an aggressive human pancreatic cancer cell line selected for high metastasis. Mol Cancer Ther 2009, 8, 1779-86.
60. Okada, K.; Hirabayashi, K.; Imaizumi, T.; Matsuyama, M.; Yazawa, N.; Dowaki, S.; Tobita, K.; Ohtani, Y.; Tanaka, M.; Inokuchi, S.; Makuuchi, H., Stromal thrombospondin-1 expression is a prognostic indicator and a new marker of invasiveness in intraductal papillary-mucinous neoplasm of the pancreas. Biomed Res 2010, 31, 13-9.
61. Cui, J.; Quan, M.; Jiang, W.; Hu, H.; Jiao, F.; Li, N.; Jin, Z.; Wang, L.; Wang, Y.; Wang, L., Suppressed expression of LDHB promotes pancreatic cancer progression via inducing glycolytic phenotype. Med Oncol 2015, 32, 143.
62. Nie, S.; Lo, A.; Wu, J.; Zhu, J.; Tan, Z.; Simeone, D. M.; Anderson, M. A.; Shedden, K. A.; Ruffin, M. T.; Lubman, D. M., Glycoprotein biomarker panel for pancreatic cancer discovered by quantitative proteomics analysis. J Proteome Res 2014, 13, 1873-84.
63. Nie, S.; Yin, H.; Tan, Z.; Anderson, M. A.; Ruffin, M. T.; Simeone, D. M.; Lubman, D. M., Quantitative analysis of single amino acid variant peptides associated with pancreatic cancer in serum by an isobaric labeling quantitative method. J Proteome Res 2014, 13, 6058-66.
64. Hirakawa, T.; Yashiro, M.; Murata, A.; Hirata, K.; Kimura, K.; Amano, R.; Yamada, N.; Nakata, B.; Hirakawa, K., IGF-1 receptor and IGF binding protein-3 might predict prognosis of patients with resectable pancreatic cancer. BMC Cancer 2013, 13, 392.

Table of Contents

Intraductal papillary mucinous neoplasm (IPMN) is a frequent precursor of pancreatic cancer (PC). In this study, a robust multiple reaction monitoring (MRM) pipeline was applied to discover and verify IPMN biomarker candidates in a large cohort of plasma samples. Eleven proteins were selected as IPMN marker candidates with high confidence. Further, a 6-protein panel was constructed by combining marker candidates. The diagnostic accuracy for IPMN can be improved dramatically by using this novel plasma-based panel in combination with a radiological examination.
