Article pubs.acs.org/jpr

A Panel of Regulated Proteins in Serum from Patients with Cervical Intraepithelial Neoplasia and Cervical Cancer

Alexander P. Boichenko,† Natalia Govorukhina,† Harry G. Klip,‡ A.G.J. van der Zee,‡ Coşkun Güzel,§ Theo M. Luider,§ and Rainer Bischoff*,†

† Department of Analytical Biochemistry, University of Groningen, Antonius Deusinglaan 1, 9713 AV Groningen, The Netherlands
‡ University Medical Centre Groningen, Hanzeplein 1, 9713 GZ Groningen, The Netherlands
§ Department of Neurology, Erasmus University Medical Center, P.O. Box 1738, 3000 DR Rotterdam, The Netherlands

Supporting Information

ABSTRACT: We developed a discovery−validation mass-spectrometry-based pipeline to identify a set of proteins that are regulated in the serum of patients with cervical intraepithelial neoplasia (CIN) and squamous cell cervical cancer using iTRAQ, label-free shotgun, and targeted mass-spectrometric quantification. In the discovery stage, we used a “pooling” strategy for the comparative analysis of immunodepleted serum and revealed 15 up- and 26 down-regulated proteins in patients with early-stage (CES) and late-stage (CLS) cervical cancer. The analysis of nondepleted serum samples from patients with CIN, CES, and CLS and from healthy controls showed significant changes in the abundance of alpha-1-acid glycoprotein 1, alpha-1-antitrypsin, serotransferrin, haptoglobin, alpha-2-HS-glycoprotein, and vitamin D-binding protein. We validated our findings using a fast UHPLC/MRM method in an independent set of serum samples from patients with cervical cancer or CIN and healthy controls, as well as serum samples from patients with ovarian cancer (more than 400 samples in total). The panel of six proteins showed 67% sensitivity and 88% specificity for discriminating patients with CIN from healthy controls, a stage of the disease at which current protein-based biomarkers, for example, squamous cell carcinoma antigen (SCCA), fail to show any discrimination. Additionally, combining the six-protein panel with SCCA improves the discrimination of patients with CES and CLS from healthy controls.

KEYWORDS: cervical cancer, cervical intraepithelial neoplasia, protein panel, LC−MS/MS, alpha-1-antitrypsin, alpha-1-acid-glycoprotein, haptoglobin, serotransferrin, vitamin D-binding protein, alpha-2-HS-glycoprotein



Special Issue: Proteomics of Human Diseases: Pathogenesis, Diagnosis, Prognosis, and Treatment
Received: June 15, 2014
© XXXX American Chemical Society
Journal of Proteome Research, dx.doi.org/10.1021/pr500601w | J. Proteome Res. XXXX, XXX, XXX−XXX

INTRODUCTION

Squamous cell cervical cancer (termed cervical cancer from now on) is the most frequent carcinoma in women in developing countries1 and the second most frequent carcinoma and one of the leading causes of death in women worldwide.2 The presently used cytomorphological assessment is the most reliable screening method, although it produces high numbers of false-positive and false-negative cervical smears, especially for the detection of early-stage cancer.3 Adding screening for human papillomavirus (HPV) infection increases sensitivity and specificity to 91 and 88%, respectively, for discriminating cervical intraepithelial neoplasia (CIN, the precancerous stage) grade 2 or worse from healthy individuals.4 However, HPV screening, like cytomorphology, requires vaginal samples, which reduces participation. Screening for HPV in self-collected vaginal samples has significantly lower sensitivity/specificity (76/86%) and is currently not recommended for routine application.4

Blood and its liquid components (plasma and serum) are body fluids that are widely used for disease diagnostics. Plasma or serum should be ideally suited for biomarker discovery because they sample the entire body and may, for example, contain proteins secreted by cancer cells. However, biomarker discovery and validation in serum or plasma is a challenging task because (i) it is not clear how regulation of protein expression in tissue is translated into blood;5 (ii) the up- or down-regulation of proteins in body fluids is often not specifically related to the disease under study; and (iii) the extensive concentration range of proteins in plasma or serum makes detection and quantification of proteins at the ng/mL level or below difficult.6,7 During the last two decades, there have been numerous attempts to discover and validate serum-based protein biomarkers for diagnostic and prognostic purposes, notably related to cancer.2 Cancer antigens were isolated from tissue (e.g., squamous cell carcinoma antigen (SCCA),8 carcinoembryonic antigen,9 tissue polypeptide antigen,10 or cancer antigen 125 (CA 125)11,12) and measured in plasma or serum from patients by immunoassays (most often ELISA). These antigens usually circulate in blood at the ng/mL level, a range that standard shotgun proteomics approaches do not reach unless extensive sample preparation is employed. 2D gel electrophoresis (GE) combined with mass-spectrometric (MS) identification of selected candidates has been used to profile serum samples from squamous cell cervical cancer patients versus healthy controls, yielding a set of regulated proteins.13,14 However, the results obtained by different authors are contradictory; for example, up-regulation of alpha-1-antitrypsin (A1AT) in the serum of cervical cancer patients was reported by Bhattacharyya and Chaudhuri,15 whereas down-regulation was reported by Abdul-Rahman et al.13 The discrepancy may be related to the low number of individual samples analyzed but could also be due to different classification criteria and different times of sampling. In general, currently known biomarker-based approaches for cervical cancer diagnostics are less specific and sensitive than cytomorphological assessment. Thus, there is an urgent need for a robust workflow to discover a new set of serum biomarkers, notably for the detection of CIN or cervical cancer.

To address this problem, we developed a mass-spectrometry-based pipeline (Figure 1) that combined stable-isotope labeling and label-free quantitative proteomics approaches for the discovery and validation of a panel of proteins that are regulated in patients with cervical cancer or CIN.16 The initial findings were verified by label-free nanoChip LC−MS/MS profiling of individual samples and validated by quantitative LC−MS/MS of the selected candidates in MRM mode. More than 400 samples were analyzed in total.

MATERIALS AND METHODS

Description of Patients and Serum Samples

Blood was collected into glass tubes (Becton Dickinson #3679953) containing a separation gel and micronized silica to accelerate clotting. Serum was prepared by letting freshly collected blood coagulate at room temperature for at least 2 h (but no longer than 8 h), followed by centrifugation at room temperature for 10 min at 3000 rpm. Samples were stored in a local biobank at −80 °C and delivered on dry ice. After receipt, samples were stored in our laboratory at −80 °C until analysis. Additionally, a set of serum samples from healthy female controls who had been verbally screened as having no prior medical conditions was obtained from SeraLab (West Sussex, U.K.). All intermediate fractions generated during sample preparation were stored at −20 °C.

At the Department of Gynaecological Oncology (UMCG), all newly referred patients are routinely asked to give written informed consent for the collection and storage of pretreatment and follow-up serum samples in a serum bank for future research. Relevant patient data, including SCCA values measured by ELISA and follow-up data, are retrieved and transferred into an anonymous, password-protected database. Patients’ identity is protected by study-specific, unique patient codes, and their true identity is known only to two dedicated data managers. According to Dutch regulations, these precautions mean that no further institutional review board approval is needed (http://www.federa.org).

Details of the groups of serum samples used for pooling and iTRAQ labeling are shown in Table 1. The demographic and disease characteristics of the sample groups used for label-free shotgun analysis (verification set) and targeted validation (validation set) of the protein panel are shown in Table 2. Information on individual serum samples can be found in Tables S1−S3 in the Supporting Information.

Serum Desalting and Total Protein Determination by Macroporous Reversed-Phase Chromatography

The total protein concentration of individual serum samples was determined with a previously described macroporous reversed-phase (mRP) chromatographic method (Appendix S1 in the Supporting Information), which also desalts and denatures serum proteins prior to label-free quantification.17

Figure 1. Scheme of the discovery-to-validation pipeline used to reveal a cervical-cancer-related serum protein panel.

Depletion of High-Abundance Proteins

Sample preparation prior to iTRAQ labeling included immunoaffinity depletion of abundant serum proteins using a multiple affinity removal (MAR) column (Agilent, 4.6 × 50 mm, #5185−5984). Twenty μL of serum was mixed with 80 μL of buffer A (Agilent, Santa Clara, CA) and filtered through a 0.22 μm spin filter (#5185−5990) at 13 000g and 4 °C for 10 min to remove particulates. Then 80 μL of the filtrate was injected onto the MAR column. The MAR column, designed for human serum samples, uses immobilized antibodies to remove six proteins in a single step: serum albumin, immunoglobulins G and A, A1AT, serotransferrin (TRFE), and haptoglobin (HPT). Removal of abundant proteins was performed on an AKTA FPLC system (Appendix S1 in the Supporting Information).

Table 1. Demographic and Disease Characteristics for Cervical Cancer and Control Groups Used for Serum Pooling and iTRAQ Labeling: Discovery Study

group | no. of samples | average age ± conf. interval (min−max), years (a) | cancer stage | SCCA ± conf. interval (min−max), ng/mL (a)
healthy set 1, H1S | 10 | 51 ± 9 (38−70) | not applicable | not available
healthy set 2, H2S | 7 | 62 ± 7 (45−73) | not applicable | not available
cancer early stage, CES | 9 | 51 ± 11 (34−77) | Ib−IIa | 0.78 ± 0.6 (0.3−2.5)
cancer late stage, CLS | 7 | 64 ± 10 (41−78) | IIb−IVb | 93 ± 63 (20−222)

(a) Confidence intervals were calculated for normally distributed data at a significance level of 0.05.

Table 2. Demographic and Disease Characteristics for Cervical Cancer and Control Groups Used for Label-Free NanoChip-LC/MS Quantification (Verification Set) and Targeted MRM Analysis (Validation Set)

set | group | no. of samples | average age ± conf. interval (min−max), years (a) | cancer stage/CIN grade (b) | SCCA ± conf. interval (min−max), ng/mL (a)
verification | healthy controls | 84 | 43 ± 3 (22−73) | not applicable |
verification | CIN | 16 | 45 ± 6 (29−68) | 1−3 |
verification | cancer early stage, CES | 23 | 45 ± 5 (31−77) | Ib−IIa | 4.6 ± 2.9 (0.3−20.6)
verification | cancer late stage, CLS | 20 | 57 ± 8 (26−83) | IIb−IVb | 60 ± 28 (1.2−220)
validation | healthy controls | 49 | 43 ± 3 (28−59) | not applicable | 1.4 ± 0.2 (0.18−2.9) (c)
validation | CIN | 48 | 36 ± 2 (21−55) | 2−3 | 1.7 ± 0.2 (0.07−3.0) (c)
validation | cancer early stage, CES | 49 | 50 ± 4 (29−82) | Ib−IIa | 7.1 ± 3 (0.6−40)
validation | cancer late stage, CLS | 34 | 62 ± 5 (31−92) | IIb−IVb | 27 ± 11 (2.2−177)
validation | ovarian cancer, OV | 49 | 60 ± 4 (27−84) | Ia−IV |

(a) Confidence intervals were calculated for normally distributed data at a significance level of 0.05. (b) CIN grade, for patients with cervical intraepithelial neoplasia (CIN). (c) Predicted from a normal distribution using published average values (healthy controls 1.5 ng/mL, CIN 1.7 ng/mL)26,27 and a standard deviation of 0.6.27
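Tables 1 and 2 report group means as "average ± confidence interval", computed for normally distributed data at a significance level of 0.05. The Python sketch below shows one way to compute such an interval. It assumes a two-sided interval built from the normal quantile (z ≈ 1.96). The text does not say whether a normal or a Student t quantile was used. The function name `mean_with_ci` and the example ages are hypothetical, not study data.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_with_ci(values, alpha=0.05):
    """Return (mean, half_width) of a two-sided (1 - alpha) confidence interval.

    Assumes approximately normally distributed data and uses the normal
    quantile z(1 - alpha/2); a Student t quantile would widen small-group intervals.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard deviation")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * stdev(values) / sqrt(len(values))
    return mean(values), half_width


# Hypothetical ages, not study data
ages = [38, 45, 47, 50, 52, 55, 70]
m, hw = mean_with_ci(ages)
print(f"{m:.0f} +/- {hw:.0f}")  # 51 +/- 7
```

For groups as small as those in Table 1 (7 to 10 samples), the choice of quantile matters. With the t quantile for 6 degrees of freedom (about 2.45), the same hypothetical ages would give roughly ± 9 instead of ± 7.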

Figure 2. Experimental design including two full analytical replicates with two iTRAQ 4-plex kits per replicate, strong-cation exchange (SCX) fractionation of labeled peptides, and nanoChip LC−MS/MS analysis of the collected fractions. Two separate LC−MS/MS runs were performed for the first replicate to assess the repeatability of the calculated protein ratios. CES1, CES2, CES3: pooled immunodepleted serum samples from a set of patients with early stage cervical cancer; CLS1 and CLS2: pooled serum samples from a set of patients with late stage cervical cancer; H1S1: pooled serum sample from set 1 of healthy controls; H2S1: pooled serum sample from set 2 of healthy controls. (See Table 1 for details.)

iTRAQ Labeling, Peptide Fractionation, and LC−MS/MS Analysis

Immunodepleted individual serum samples were pooled, ultrafiltered using spin filters with a 10 kDa cutoff, reduced, and alkylated according to the manufacturer’s recommendations (Appendix S1 in the Supporting Information). The total protein concentration was determined with the micro BCA protein assay (Pierce, Rockford, USA, #23235) according to the manufacturer’s instructions. Equal amounts of depleted serum protein were combined into one “pooled” sample per group: cancer early stage (CES), cancer late stage (CLS), healthy set 1, and healthy set 2. After washing, the samples were analyzed by SDS-PAGE and inspected visually to make sure that the protein amount and composition of the depleted serum samples were comparable. The proteins were then digested (16 h) with trypsin (sequencing grade modified trypsin, Promega, USA, #V5111) at a 1:15 enzyme to total protein ratio at 37 °C. Digests were labeled with stable-isotope-containing reagents (iTRAQ 4-plex) according to the experimental design shown in Figure 2. Two full analytical replicates (referred to as replicate 1 and replicate 2 below), each starting from immunodepletion of the high-abundance proteins, were analyzed to assess the reproducibility of the analytical procedure (Figure 2). To reduce sample complexity, iTRAQ-labeled peptides were fractionated by strong cation exchange (SCX) liquid chromatography on a PolySulfoethyl A column (200 × 4.6 mm, PolyLC, USA) (Appendix S1 in the Supporting Information). Each SCX fraction of iTRAQ-labeled peptides collected in replicate 1 was analyzed twice by nanoChip LC qTOF MS/MS (Figure 2) to estimate the repeatability of the LC−MS/MS and data analysis steps. These analyses are referred to as replicate 1.1 and replicate 1.2 below.

NanoChip LC qTOF MS/MS Analysis

A quadrupole time-of-flight mass spectrometer (qTOF, Agilent 6510) with a liquid chromatography chip cube (#G4240) electrospray ionization interface was coupled to a nanoLC system (Agilent 1200) composed of a nanopump (#G2226A), a capillary loading pump (#G1376A), and a solvent degasser (#G1379B). Injections were performed with an autosampler (#G1389A) equipped with a 40 μL injection loop and a thermostated cooler that kept the samples at 4 °C during the analysis (G1377A Micro WPS). The instrument was operated with MassHunter Data Acquisition software (B.04.00, B4033.3) (Appendix S1 in the Supporting Information). A chip (ProtID-Chip-150 II 300A, #G4240−62006) with a 40 nL trap column and a 75 μm × 150 mm analytical column filled with Zorbax 300SB-C18, 5 μm (Agilent Technologies) was used to separate the iTRAQ-labeled peptides with the gradient described in Appendix S1 in the Supporting Information.

Database Search, Protein Quantification, and Statistical Analysis

Tandem mass spectra were extracted, charge-state-deconvoluted, and deisotoped with MassHunter Qualitative Analysis software version B.05.00 (Agilent) and saved as .mgf files. All MS/MS data were analyzed using Phenyx (GeneBio, Geneva, Switzerland) and X!Tandem (The GPM, thegpm.org; version CYCLONE (2010.12.01.1)). Phenyx was set up to search the UniProt Swiss-Prot database (Homo sapiens, release 2012_08 of 05-Sep-12, 20231 sequences) with trypsin as the digestion enzyme. The database search used a fragment ion mass tolerance of 0.30 Da and a parent ion tolerance of 50 ppm. Iodoacetamide derivatization of cysteine and 4-plex iTRAQ derivatization of lysine and the N-terminus were specified as fixed modifications; oxidation of methionine was specified in Phenyx as a variable modification. The X!Tandem search was run in the Scaffold 3.6.2 environment (Proteome Software, Portland, OR) against a reduced database containing the proteins identified in the Phenyx search. This extended the search space to the following variable modifications: amidation of the C-terminus; deamidation of asparagine and glutamine; oxidation of histidine, methionine, and tryptophan; acetylation of lysine and the N-terminus; and phosphorylation of serine, threonine, and tyrosine. Scaffold 3.6.2 was used to validate MS/MS-based peptide and protein identifications. Peptide identifications were accepted if they had a probability greater than 95.0% as specified by the Peptide Prophet algorithm.18 Protein identifications were accepted if they had a probability greater than 95.0% as specified by the Protein Prophet algorithm.19 Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony (a minimum set of proteins that accounts for all detected peptides is reported). Scaffold Q+ 3.6.2 was used for relative quantification of iTRAQ-labeled peptides and proteins.

In all samples, reporter channels were corrected for contamination from the other reporter reagents using the correction values provided by the manufacturer (AB SCIEX), according to the algorithm described in i-Tracker.20 Intensities of each identified peptide were median-normalized within the assigned protein. The repeatability of the iTRAQ results was evaluated by analysis of variance (Appendix S2 in the Supporting Information). The p values, estimated on the basis of the Cauchy−Lorentz distribution, were combined using Fisher’s combined probability test.21

Label-Free Profiling of the Individual Nondepleted Serum Samples Used in the iTRAQ Labeling Experiment (Confirmation Step)

Individual nondepleted serum samples were desalted by mRP chromatography, reduced, alkylated, and digested for 16 h with 1 μg of trypsin. The digested samples were separated on a chip (G4240−62030) with a 360 nL trap column and a 150 mm × 75 μm analytical column filled with Polaris C18-A, 3 μm sorbent (Agilent Technologies) (Appendix S1 in the Supporting Information). Label-free quantification was based on the areas of manually extracted ion chromatograms (0.1 m/z tolerance) of the selected peptides. Details of the selected peptides and their average retention times are shown in Table S8 in the Supporting Information. In the data analysis, peptide levels in individual samples were normalized to a randomly selected healthy control sample. The log2 ratios calculated for each sample were then corrected to a median of zero. A t test was applied to identify the proteins regulated in serum samples from patients with CES and CLS versus healthy controls.

Label-Free Relative and Absolute Quantification of the Selected Proteins in Individual Nondepleted Serum Samples (Verification Set)

Individual nondepleted serum samples were used for absolute quantification of alpha-1-acid glycoprotein 1 (A1AG1) and relative quantification of A1AT, HPT, alpha-1-antichymotrypsin (AACT), TRFE, alpha-2-HS-glycoprotein (FETUA), vitamin D-binding protein (VTDB), and kininogen-1 (KNG1). The total serum protein concentration was determined during protein desalting by mRP chromatography. After desalting, an aliquot of the collected fraction containing 25 μg of total protein was spiked with 10 pmol of two stable-isotope-labeled synthetic peptides: SDVVYTDWK (heavy lysine, 13C6,15N2; monoisotopic mass 1119.52 Da) and TEDTIFLR (heavy arginine, 13C6,15N4; monoisotopic mass 1003.51 Da) (Thermo Fisher, AQUA Ultimate grade). The samples were then resuspended in 10 μL of 100 mM NH4HCO3, reduced with DTT, alkylated with IAA, and digested for 16 h with 1 μg of trypsin. The results were exported in mzData format with MassHunter Qualitative Analysis software (Agilent). The TOPPAS software (version 1.9.0)22,23 was used to convert the data from mzData to mzML format and for feature finding with OpenMS (open-ms.sourceforge.net). Further data processing was done in Microsoft Excel. Quantification of A1AG1 was based on the ratio between the peak areas of the doubly charged ions of the heavy synthetic and the endogenous SDVVYTDWK peptide. A1AT, HPT, AACT, VTDB, TRFE, KNG1, and FETUA were quantified relative to one another on the basis of their tryptic peptides. To reduce variability, the peptide areas were normalized to the area of the spiked synthetic heavy peptide from A1AG1. The values obtained per peptide were log2-transformed and further normalized to a randomly selected reference sample. The log2 protein level was finally calculated as the average level of the peptides assigned to a protein. Statistical analysis was performed with Statistica 10 (StatSoft), and dot distributions and box plots were constructed with SigmaPlot (Systat Software).

UHPLC MS/MS Targeted Quantification of Proteins in Nondepleted Serum (Validation Set)

Fifty μg of desalted serum proteins was dissolved in 50 μL of 100 mM NH4HCO3, reduced with 10 μL of DTT (1.5 μg/μL) for 30 min, and alkylated with 25 μL of IAA (5.5 μg/μL in 100 mM NH4HCO3) for 30 min in the dark. Excess IAA was quenched by adding 100 μL of DTT (1.5 μg/μL) and incubating for 30 min at room temperature. Samples were digested (16 h) with 10 μg of trypsin at 37 °C. After digestion, samples were spiked with a mixture of synthetic, stable-isotope-labeled heavy peptides (Thermo; Table S9 in the Supporting Information) and dried completely. The dried peptides were dissolved in 10 μL of acetic acid, after which 5 μL of dimethyl sulfoxide and 85 μL of 3% (v/v) ACN, 0.1% FA solution were added.
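The verification-set quantification of A1AG1 relies on stable-isotope dilution. The endogenous amount of SDVVYTDWK is the light-to-heavy peak-area ratio multiplied by the amount of heavy peptide spiked into an aliquot containing 25 μg of total protein. The Python sketch below shows that arithmetic and the relative log2 normalization described for the other proteins. It is a sketch under stated assumptions, not the authors' Excel workflow. It assumes that the light and heavy peptides give equal MS response, that digestion is complete, and that each protein molecule contains one copy of the peptide. The class and function names, the molar mass and all example numbers are hypothetical.

```python
from dataclasses import dataclass
from math import log2
from statistics import mean


@dataclass
class IsotopeDilutionInput:
    light_area: float                     # peak area of the endogenous peptide (e.g., SDVVYTDWK, 2+)
    heavy_area: float                     # peak area of the spiked heavy peptide, same charge state
    spiked_pmol: float                    # heavy peptide added to the aliquot (10 pmol in the text)
    aliquot_protein_ug: float             # total protein in the aliquot (25 ug in the text)
    serum_total_protein_mg_per_ml: float  # serum total protein from the mRP step
    protein_molar_mass_g_per_mol: float   # assumed molar mass used for the mass conversion


def protein_conc_ug_per_ml(x: IsotopeDilutionInput) -> float:
    """Serum protein concentration (ug/mL) by stable-isotope dilution.

    Assumes equal MS response for light and heavy peptide, complete digestion
    and one copy of the peptide per protein molecule.
    """
    endogenous_pmol = x.light_area / x.heavy_area * x.spiked_pmol
    # Serum volume represented by the aliquot: ug / (mg/mL) = uL
    serum_ul = x.aliquot_protein_ug / x.serum_total_protein_mg_per_ml
    # pmol/uL equals umol/L; times g/mol gives ug/L; divide by 1000 for ug/mL
    umol_per_l = endogenous_pmol / serum_ul
    return umol_per_l * x.protein_molar_mass_g_per_mol / 1000.0


def relative_log2_level(peptide_areas, heavy_area, reference_log2_levels):
    """Relative log2 protein level (hypothetical helper).

    Each peptide area is divided by the heavy A1AG1 peptide area, log2-transformed,
    referenced to the same peptide in the reference sample, and the peptide values
    are averaged.
    """
    per_peptide = [
        log2(area / heavy_area) - ref
        for area, ref in zip(peptide_areas, reference_log2_levels)
    ]
    return mean(per_peptide)


# Illustrative numbers only (not study data)
x = IsotopeDilutionInput(2.0, 1.0, 10.0, 25.0, 70.0, 23500.0)
print(round(protein_conc_ug_per_ml(x)))  # 1316
```

Expressing the aliquot as an equivalent serum volume (25 μg divided by the serum total protein concentration from the mRP step) links the amount measured per aliquot back to a concentration in serum. The text does not state whether the polypeptide or the glycosylated molar mass of A1AG1 was used, so the sketch leaves that value as an input.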

dx.doi.org/10.1021/pr500601w | J. Proteome Res. XXXX, XXX, XXX−XXX

Journal of Proteome Research

Article

Table 3. Linearity of the Area Response Measured with Heavy Peptides Spiked into Trypsin-Digested Serum Samples (Linearity Was Estimated with 11 Calibration Points; Coefficient of Variation Was Calculated for Four Replicates)

protein short name (accession number) | peptide sequence | slope | R² | range (μg/mL) (a) | CV (%)
HPT/HPTR (P00738/P00739) | VTSIQDWVQK | 0.98 | 0.999 | 67−3700 | 6.7
HPT/HPTR (P00738/P00739) | S[C]AVAEYGVYVK | 0.84 | 0.996 | 13−750 | 9.5
HPT/HPTR (P00738/P00739) | DIAPTLTLYVGK | 0.91 | 0.996 | 67−3700 | 6.5
VTDB (P02774) | YTFELSR | 1.07 | 0.998 | 10−560 | 3.4
VTDB (P02774) | VPTADLEDVLPLAEDITNILSK | 0.92 | 0.996 | 24−1400 | 5.6
FETUA (P02765) | HTLNQIDEVK | 1.05 | 0.997 | 25−1400 | 8.3
FETUA (P02765) | FSVVYAK | 1.09 | 0.999 | 11−600 | 7.9
A1AT (P01009) | LSITGTYDLK | 0.98 | 0.998 | 114−6300 | 6.0
A1AT (P01009) | SASLHLPK | 1.05 | 0.998 | 54−3000 | 4.0
TRFE (P02787) | FDEFFSEG[C]APGSK | 1.00 | 0.999 | 46−2600 | 7.8
TRFE (P02787) | SAGWNIPIGLLY[C]DLPEPR | 1.03 | 0.999 | 190−6400 | 7.9
A1AG1/A1AG2 (P02763/P19652) | SDVVYTDWK | 1.03 | 0.999 | 72−4000 | 8.9
A1AG1/A1AG2 (P02763/P19652) | TEDTIFLR | 0.99 | 0.998 | 25−3900 | 5.0

(a) Protein concentration in serum.

Table 4. Repeatability and Reproducibility of Relative Protein Quantification Based on Data from the iTRAQ (Discovery Study)

component | repl.# | std. (a) | av. std. (b)
intrakit repeatability | 1.1 / 1.2 / 2 | 0.41 / 0.43 / 0.28 | 0.37
interkit repeatability | 1.1 / 1.2 / 2 | 0.47 / 0.53 / 0.68 | 0.56
repeatability of LC−MS/MS analysis | 1.1 vs 1.2 | 0.57 / 0.41 | 0.49
overall repeatability | 1 vs 2 | 0.78 |

(a) Standard deviation of log2 protein ratios. (b) Average standard deviation of the log2 protein ratios.
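Table 4 reports standard deviations of log2 protein ratios and their averages. The reported averages match a plain arithmetic mean of the listed standard deviations rather than a root-mean-square. For example, (0.41 + 0.43 + 0.28)/3 ≈ 0.37, whereas the RMS would round to 0.38. The sketch below assumes that each standard deviation is the sample SD, across proteins, of log2 ratios between two channels expected to carry equal amounts. The helper names and input intensities are hypothetical.

```python
from math import log2
from statistics import mean, stdev


def log2_ratio_sd(channel_a, channel_b):
    """Sample standard deviation, across proteins, of log2(channel_a / channel_b).

    Hypothetical helper: the two lists hold per-protein reporter intensities for
    channels expected to be equal, so the ideal log2 ratio is 0 and the spread
    reflects analytical variability.
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("need two equally long lists with at least two proteins")
    ratios = [log2(a / b) for a, b in zip(channel_a, channel_b)]
    return stdev(ratios)


def average_sd(sds):
    """Arithmetic mean of standard deviations, as the 'av. std.' column appears to use."""
    return mean(sds)


# Standard deviations taken from Table 4
print(round(average_sd([0.41, 0.43, 0.28]), 2))  # intrakit: 0.37
print(round(average_sd([0.47, 0.53, 0.68]), 2))  # interkit: 0.56
```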

Trypsin-digested proteins were labeled, SCX-fractionated, and analyzed by nanoChip LC qTOF MS/MS following an experimental design that allows analytical variability (including data processing) to be distinguished from biological variability (Figure 2). Two analytical replicates, each starting from the immunodepletion of serum samples, were used to select regulated proteins with high confidence. Two LC−MS/MS runs of each of the 66 SCX fractions collected in the first replicate were performed to evaluate the robustness of the MS and data analysis steps (Figure 2). In total, 125 serum proteins were identified with high confidence (false discovery rate at the peptide level