The Shotgun Proteomic Study of the Human ... - ACS Publications

Mar 19, 2013 - The Shotgun Proteomic Study of the Human ThinPrep Cervical Smear Using iTRAQ Mass-Tagging and 2D LC-FT-Orbitrap-MS: The Detection ...

Article

The shotgun proteomic study of the human ThinPrep cervical smear using iTRAQ Mass-Tagging and 2D LC-FT-Orbitrap-MS: The detection of the human papillomavirus at the protein level

Evaggelia K. Papachristou, Theodoros I. Roumeliotis, Argyro Chrysagi, Chrysanthi Trigoni, Ekatherina Charvalos, Paul A. Townsend, Kitty Pavlakis, and Spiros Dennis Garbis

J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/pr301067r • Publication Date (Web): 19 Mar 2013

Downloaded from http://pubs.acs.org on March 28, 2013


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

The shotgun proteomic study of the human ThinPrep cervical smear using iTRAQ Mass-Tagging and 2D LC-FT-Orbitrap-MS: The detection of the human papillomavirus at the protein level

Evaggelia K. Papachristou1,4,#, Theodoros I. Roumeliotis1,#, Argyro Chrysagi2, Chrysanthi Trigoni2, Ekatherina Charvalos2, Paul A. Townsend5,&, Kitty Pavlakis3,& and Spiros D. Garbis1,6,&,*

1 Institute for Life Sciences, Centre for Proteomic Research, University of Southampton, Highfield Campus, Southampton, UK
2 Molecular Diagnosis Department, Central Laboratories, IASO Maternity Hospital, Athens, Greece
3 Department of Pathology, University of Athens, School of Medicine
4 Center for Basic Research, Division of Biotechnology, Biomedical Research Foundation, Academy of Athens, Athens, Greece
5 Institute for Cancer Research, University of Manchester, Manchester, UK
6 Cancer Sciences and CES Units, School of Medicine, University of Southampton, Southampton General Hospital, Southampton, UK

#: these two authors contributed equally to this work
&: joint senior authors
*: to whom correspondence should be addressed:
Spiros D. Garbis, Ph.D.
Faculty of Medicine, Cancer Sciences and CES Units, School of Medicine, Institute for Life Sciences, University of Southampton
3001, Life Sciences Building 85, Highfield Campus, Southampton, SO17 1BJ, UK
Email: [email protected]

Keywords: Proteomics, iTRAQ, Cervical smear, Cervical cancer, ThinPrep, HPV, Phosphorylation, Orbitrap Elite, CID, HCD, Sample size, Power analysis.

Running Title: Papachristou, E. K., et al., The iTRAQ FT-Orbitrap LC-MS Proteomics of the Human ThinPrep Smear


Abstract

The ThinPrep cervical smear is widely used in clinical practice for the cytological and molecular screening for abnormal cells and human papillomavirus (HPV) infection. Current advancements in LC-MS proteomics include the use of stable isotope labeling for the in-depth analysis of proteins in complex clinical specimens. Such approaches have yet to be realized for ThinPrep clinical specimens. In this study, an LC-MS method based on isobaric (iTRAQ) labeling and high-resolution FT-Orbitrap mass spectrometry was used for the proteomic analysis of 23 human ThinPrep smear specimens. Tandem mass spectrometry analysis was performed with both nitrogen High-energy Collisional Dissociation (HCD MS/MS) and helium Collision Induced Dissociation (CID MS/MS) peptide fragmentation modes. The analysis of three 8-plex sample sets yielded the identification of over 3,200 unique proteins at FDR < 1 %, of which over 2,300 were quantitatively profiled in at least one of the three experiments. The inter-individual variability served to define the sample size needed to identify significant protein expression differences. The depth of proteome coverage allowed the detection of 6 HPV-derived proteins, including those of the high-risk HPV16 type, in the specimens tested. These HPV identifications were also confirmed with PCR-hybridization molecular methods. This proof-of-principle study constitutes the first report on the non-targeted analysis of HPV proteins in human ThinPrep clinical specimens with high-resolution mass spectrometry. A further testament to the sensitivity and selectivity of the proposed method was the confident detection of a significant number of phosphopeptides in these specimens.


Introduction

The effective assessment of proteome-wide perturbations in response to physiological versus pathological stimuli in clinical specimens is a hallmark of basic biology and translational research. In this capacity, the “shotgun” proteomics approach, which hyphenates multidimensional liquid chromatography with high-resolution tandem mass spectrometry1, has gained widespread recognition for the global and in-depth analysis of proteomes in a diverse range of biological matrices, including tissue2, 3, plasma4, 5, other biofluids6, cell lines7, cell secretomes8 and virus-host cell interactions9-12. In particular, concurrent quantitative profiling of viral and host proteomes13 provides a more causal understanding of the mechanisms by which viral gene products modulate the host environment.

The Human Papillomavirus (HPV) is one of the known tumour-associated viruses14-16. Depending on the type, HPV can cause transient or persistent cervical infections leading to reversible innocuous or pre-cancerous intraepithelial lesions17. To date, tens of different HPV strains have been identified and classified as either low- or high-risk depending on their potential for cervical cancer induction after prolonged infection. Notably, 70 % of cervical malignancies are associated with the high-risk HPV types 16 and 18. Based on the current evidence, HPV types 16 and 18 exert their oncogenic action in a lengthy process involving the recruitment of the E6 and E7 viral genes, whose products impair the canonical function of the host p53 and Rb tumour suppressor proteins, respectively, and induce transformation18.

The ThinPrep cervical smear specimen is routinely used for cytological and molecular screening for the detection of abnormal cells and HPV DNA in the cervix. Such screening has resulted in the early risk assessment for cervical cancer with concomitant efficient therapeutic intervention19-23. Immunological tests of clinical specimens for HPV infection have proven to be inadequate. The HPV screening tool currently in routine use is instead based on hybridization and polymerase chain reaction (PCR) assays24. Quantitative RT-PCR (qRT-PCR) measurements have shown that the HPV DNA copy number positively correlates with the degree of the cervical disease25. This trend highlights the need for a sensitive, high-throughput quantitative assay for viral load determination in the cervical smear.

The microenvironment innate to the whole cervical smear specimen, potentially containing secreted, shed, vesicle-derived and HPV proteins, serves as an ideal matrix for the proteomics-driven discovery of potential biomarkers for the assessment of cancer risk. Based on this precept, proteomic analysis is well suited to a protein-level understanding of the host cell HPV integration process, which is currently restricted to cell line models26. Proteomic studies have nevertheless shown limited clinical utility so far, principally because of the absence of robust protein extraction protocols and, among other factors, the over-interpretation of early data. Even so, successful proteome comparisons between high-grade dysplastic and normal cervical swab specimens have been demonstrated using the 2-D DIGE27 approach, or in Laser Capture Microdissected ThinPrep slides using LC-MS28. To date, the proteomic analysis of the whole cervical smear remains largely unexplored.

Our present proof-of-concept study involves the in-depth comparative proteomic analysis of ThinPrep smear specimens procured from HPV positive and HPV negative females with no cervical dysplasia, using high-precision Multidimensional Protein Identification Technology (MudPIT). Bottom-up relative protein quantification between these two groups was based on labeling at the tryptic peptide level with the commercially available isobaric tags for relative and absolute quantification (iTRAQ™, AB Sciex, San Jose, CA, USA)29. Three 8-plex iTRAQ experiments, designated A, B and C, were performed using the workflow shown in Figure 1. Experiment A included 8 different HPV positive samples to examine the feasibility of their proteomic identification using the proposed method. Experiments B and C were intended to include HPV negative control samples. Specifically, experiments B and C included three HPV negative smears as biological replicates for the estimation of the baseline biological variability. In experiment B, one protein extract derived from a single biological specimen was labeled with two different iTRAQ reagents (115 and 116) and served as a technical replicate for the estimation of the endogenous and experimental variability. These technical replicates also served to define significant relative protein concentration thresholds and to calculate the sample size needed to make statistically significant quantitative proteome inferences30, 31. This information will be of generic importance to future iTRAQ-based biomarker discovery studies.

In brief, the experimental procedure involved the preprocessing of the protein extracts derived from the ThinPrep smear specimens followed by trypsin proteolysis. The resultant tryptic peptides were separately labeled with the iTRAQ reagents, combined and then subjected to off-line Reverse Phase (RP) C18 HPLC fractionation at high pH. Each peptide fraction was analysed on-line with nano-RP C18 UPLC at low pH hyphenated to nano-electrospray ionization (nESI) and high-resolution mass spectrometry with the hybrid LTQ-FT-Orbitrap Elite system (referred to as LC-FT-Orbitrap MS henceforth). Two complementary peptide fragmentation techniques were used for all peptide fractions in order to examine both qualitative and quantitative aspects of their resulting product ion MS/MS spectra and to maximize proteome coverage. These peptide fragmentation techniques were (A) the helium collision induced dissociation MS/MS process performed in the linear ion trap region (referred to as CID MS/MS henceforth), and (B) the nitrogen High-energy collisional dissociation process performed in the octopole collision cell adjacent to the Orbitrap (referred to as HCD MS/MS henceforth). Owing to the low-mass cutoff of resonant excitation, CID was used principally for the more sensitive qualitative detection of the lower-abundance tryptic peptides, thus increasing the number of proteins identified.

The combined use of CID and HCD LC-FT-Orbitrap MS analysis identified > 3,200 unique proteins at < 1 % FDR across the three experiments combined, including 6 HPV-derived proteins (confirmed with PCR and hybridization techniques). The HCD LC-FT-Orbitrap MS analysis quantitatively profiled a total of 2,310 proteins in at least one of the three experiments. Moreover, a total of 303 unique phosphoserine peptides were identified from the HCD and CID LC-FT-Orbitrap MS analyses together. To our knowledge, this is the first reported iTRAQ LC-FT-Orbitrap MS proteomic study of ThinPrep smear specimens.

Materials and Methods

Cervical Smear Specimens. This study was approved by the ethical committees of the Athens University Medical School and the IASO Hospital. Written informed consent was obtained for all samples tested. A total of 23 cervical smear specimens were obtained from 17 high- and/or low-risk HPV positive females and 6 HPV negative females using the ThinPrep Pap Test kit under the standard specimen collection procedures of the IASO Maternity Hospital of Athens. The ThinPrep smear procurement procedures were approved by the IASO Hospital. The detection and genotyping of HPV strains was accomplished with the Linear Array (LA) HPV genotyping test (Roche Diagnostics GmbH, D-68298 Mannheim), following the PCR-based standard methods of the Molecular Diagnosis Department of IASO Maternity Hospital of Athens. HPV positive samples were selected to include as many different HPV types as possible, so that the performance of the proteomic method in determining viral protein content could be evaluated.

Protein Purification. A 5 mL aliquot from each ThinPrep fluid specimen was transferred into a 15 mL Falcon tube and lyophilized to dryness with a centrifugal concentrator (Eppendorf Concentrator 5301). These sample aliquots originated from the remainder of the original specimens used for the molecular and cytological diagnostic tests. The dry residue was dissolved in 200 µL of a solution containing 0.5 M triethylammonium bicarbonate (TEAB, Sigma-Aldrich) and 2 % SDS, assisted by pulsed probe sonication. The homogenized proteins were subjected to 10 M trichloroacetic acid (TCA) precipitation (1:2 TCA/sample volume ratio) with vortex mixing and 1 h storage at 4 °C. The resulting precipitate was centrifuged at 13,000 rpm, 4 °C for 25 min. The supernatant was discarded and the protein pellets were washed twice by the addition of 200 µL ice-cold tetrahydrofuran (THF) and centrifugation at 13,000 rpm, 4 °C. The residual protein content was air dried for 10 min and re-suspended in 200 µL of 0.5 M triethylammonium bicarbonate, 1 % sodium deoxycholate (SDC, Thermo Scientific) buffer (dissolution buffer) with 5 min heating at 90 °C and probe sonication.

Protein Digestion and iTRAQ Labeling. For up to eight different samples, a total protein amount of 50 µg per sample, measured with the Bradford assay (Bio-Rad Protein Assay) according to the manufacturer’s instructions, was brought to a final volume of 20 µL with dissolution buffer (0.5 M TEAB, 1 % SDC) so that the protein concentrations were equalized. For the reduction of cysteine disulfide bonds, 2 µL of the reducing agent 50 mM tris(2-carboxyethyl)phosphine (TCEP) was added, followed by 1 h incubation in a heating block at 60 °C. Reduced cysteine residues were methylthiolated by the addition of 1 µL of 200 mM methyl methanethiosulfonate (MMTS) in isopropanol and 10 min incubation at room temperature. Samples were diluted with 14 µL ultrapure water, and 6 µL of a freshly prepared aqueous solution of proteomics grade trypsin (Roche Diagnostics) at 500 ng/µL was added for a 16 h proteolytic reaction at 37 °C. A 50 µL volume of isopropanol was added to each iTRAQ reagent vial and, after vortex mixing, the content of each iTRAQ vial was transferred to the corresponding sample tube. The detailed labeling scheme for each of the three iTRAQ 8-plex experiments is shown in Table 1. Samples were incubated for 2 h at room temperature and, after complete reaction, the labeled peptide samples were pooled and the whole mixture was dried with a speedvac concentrator. For SDC removal, the residue was reconstituted in 200 µL of 1 % formic acid and the precipitated SDC was filtered out using a PVDF (Millex, 0.22 µm) syringe filter. Finally, the purified peptide mixture was dried and stored at -20 °C until the high-pH reverse phase fractionation.
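The reagent volumes quoted above imply a digest volume and an enzyme-to-substrate ratio that the text does not state explicitly. The short sketch below simply adds up the stated additions; the dictionary keys and the function name `digest_summary` are illustrative, and the TCEP concentration is computed only for the moment right after its addition, assuming ideal mixing.

```python
# Per-sample additions (µL) in the order given in the text.
ADDITIONS_UL = {
    "protein in dissolution buffer": 20.0,
    "50 mM TCEP": 2.0,
    "200 mM MMTS": 1.0,
    "ultrapure water": 14.0,
    "trypsin, 500 ng/µL": 6.0,
}

PROTEIN_UG = 50.0          # protein per sample, from the text
TRYPSIN_NG_PER_UL = 500.0  # trypsin stock concentration, from the text


def digest_summary():
    """Derive totals implied by the stated volumes (illustrative arithmetic only)."""
    total_ul = sum(ADDITIONS_UL.values())
    trypsin_ug = ADDITIONS_UL["trypsin, 500 ng/µL"] * TRYPSIN_NG_PER_UL / 1000.0
    # TCEP right after it is added to the 20 µL sample (2 µL into 22 µL total).
    tcep_mm = 50.0 * 2.0 / (20.0 + 2.0)
    return {
        "total_volume_ul": total_ul,
        "trypsin_ug": trypsin_ug,
        "protein_to_trypsin_w_w": PROTEIN_UG / trypsin_ug,
        "tcep_mM_after_addition": tcep_mm,
    }


if __name__ == "__main__":
    for key, value in digest_summary().items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions the digest contains 3 µg of trypsin for 50 µg of protein, i.e. a protein-to-enzyme ratio of roughly 17:1 (w/w) in a 43 µL reaction volume.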

High-pH Reverse Phase (RP) Peptide Fractionation. High-pH RP C18 fractionation of the iTRAQ labeled peptides was performed on the Dionex P680 pump equipped with a PDA-100 photodiode array detector, using the Waters XBridge C18 column (150 x 4.6 mm, 3.5 µm particle). Mobile phase (A) was composed of 2 % acetonitrile, 0.05 % ammonium hydroxide and mobile phase (B) was composed of 100 % acetonitrile, 0.05 % ammonium hydroxide. The dry peptide pellet was dissolved in 200 µL mobile phase (A) with bath sonication and extensive vortex mixing. The sample was centrifuged at 13,000 rpm for 5 min and the supernatant was injected into the flow stream through a 200 µL sample loop. The separation program was as follows: for 10 min isocratic 5 % (B), for 55 min gradient up to 60 % (B), for 10 min gradient up to 70 % (B), for 10 min gradient up to 95 % (B), at a flow rate of 0.4 mL/min. Signal response was monitored at 215, 254 and 280 nm at a column temperature of 30 °C. The greatest peptide signal-to-noise ratio was observed at the 254 and 280 nm wavelengths. Although the highest absolute signal response was observed at 215 nm, this wavelength was not chosen to monitor peptide elution because of the prevalence of contaminants and background signal absorbing there. Fractions were collected every minute during the entire gradient elution phase to ensure the capture of all possible tryptic peptides. The peptide fractions were finally dried with a speedvac concentrator for 4-5 h and stored at -20 °C until the LC-MS analysis.
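The separation program above is a piecewise-linear gradient, so the mobile phase composition at any time point follows by interpolation. The sketch below encodes the four stated steps; the names `HIGH_PH_PROGRAM` and `percent_b` are illustrative, and the assumption that each ramp is strictly linear (with no pump delay volume) is ours, not the authors'.

```python
# (duration in min, %B reached at the end of the step), as listed in the text.
HIGH_PH_PROGRAM = [(10.0, 5.0), (55.0, 60.0), (10.0, 70.0), (10.0, 95.0)]
START_PERCENT_B = 5.0   # the first step is isocratic at 5 % B
FLOW_ML_PER_MIN = 0.4   # flow rate from the text


def percent_b(t_min, program=HIGH_PH_PROGRAM, start_b=START_PERCENT_B):
    """Return %B at time t_min, assuming linear ramps between step end points."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    t0, b0 = 0.0, start_b
    for duration, b_end in program:
        t1 = t0 + duration
        if t_min <= t1:
            return b0 + (b_end - b0) * (t_min - t0) / duration
        t0, b0 = t1, b_end
    return b0  # hold the final composition after the program ends


def total_minutes(program=HIGH_PH_PROGRAM):
    """Total programmed run time in minutes."""
    return sum(duration for duration, _ in program)


if __name__ == "__main__":
    for t in (0, 10, 37.5, 65, 75, 85):
        print(f"t = {t:5.1f} min -> {percent_b(t):5.1f} % B")
    print("run time (min):", total_minutes())
    print("fraction volume at 1 min/fraction (mL):", FLOW_ML_PER_MIN * 1.0)
```

With one-minute collection at 0.4 mL/min, each fraction corresponds to 0.4 mL of eluent, which is consistent with the 4-5 h drying time reported for the fractions.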

LC-FT-Orbitrap MS Analysis. The LC-MS experiments were performed on the Dionex Ultimate 3000 UHPLC system coupled with the high-resolution nano-ESI LTQ-Velos Orbitrap-Elite mass spectrometer (Thermo Scientific). Individual peptide fractions were reconstituted in 30 µL loading solution (2 % acetonitrile, 0.1 % formic acid) and a 2 µL volume was loaded on the Acclaim PepMap 100, 100 µm × 2 cm C18, 5 µm particle trapping column with the ulPickUp injection mode, using the loading pump at a 5 µL/min flow rate for 5 min. Each collected fraction was analysed twice, once with HCD and once with CID fragmentation. For the analytical separation, the Acclaim PepMap RSLC, 75 µm × 25 cm, nanoViper, C18, 2 µm particle column retrofitted to a PicoTip emitter (FS360-20-10-D-20-C7) was used for multi-step gradient elution. Mobile phase (A) was composed of 2 % acetonitrile, 0.1 % formic acid and mobile phase (B) was composed of 100 % acetonitrile, 0.1 % formic acid. The gradient elution method at a flow rate of 300 nL/min was as follows: for 80 min gradient up to 40 % (B), for 5 min gradient up to 85 % (B), for 5 min isocratic 85 % (B), for 2 min down to 3 % (B), for 8 min isocratic equilibration at 3 % (B), at 35 °C. Separated peptides were transferred to the gaseous phase with positive ion electrospray ionization, applying a voltage of 2.5 kV. For the HCD experiments, the top 10 multiply charged precursor isotopic clusters with m/z values between 350 and 1900 and an intensity threshold of 500 counts were selected at an FT mass resolution of 120,000 and isolated for HCD fragmentation within a mass window of 1.2 Da. Tandem mass spectra were acquired at an FT resolution of 15,000 within the m/z range of 100-1900. For the CID experiments, the top 20 precursors were selected at an FT mass resolution of 240,000 within a mass window of 2 Da. Normalized collision energy was set to 35, and already targeted precursors were dynamically excluded from further isolation and activation for 30 s with 5 ppm mass tolerance, for both types of analysis.
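The data-dependent acquisition rules described above (top-N selection, charge and m/z filters, intensity threshold, and dynamic exclusion with a ppm tolerance) can be illustrated with a small simulation. This is only a simplified sketch of the general technique: the actual selection is performed by the instrument control software, and the `Peak` class, the function names and the exact treatment of boundaries are assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    charge: int
    intensity: float


def within_ppm(mz_a, mz_b, tol_ppm):
    """True if mz_a lies within tol_ppm (parts per million) of mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6 <= tol_ppm


def select_precursors(peaks, now_s, excluded, top_n=10,
                      mz_min=350.0, mz_max=1900.0, min_intensity=500.0,
                      exclusion_s=30.0, tol_ppm=5.0):
    """Pick up to top_n precursors from one survey scan.

    `excluded` is a list of (mz, time_selected_s) pairs that is updated in
    place, so that it can be carried from scan to scan.
    """
    # Forget exclusion entries older than the exclusion window.
    excluded[:] = [(mz, t) for mz, t in excluded if now_s - t < exclusion_s]
    candidates = [
        p for p in peaks
        if p.charge >= 2                      # multiply charged only
        and mz_min < p.mz < mz_max            # m/z window from the text
        and p.intensity >= min_intensity      # intensity threshold (counts)
        and not any(within_ppm(p.mz, mz, tol_ppm) for mz, _ in excluded)
    ]
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]
    excluded.extend((p.mz, now_s) for p in chosen)
    return chosen


if __name__ == "__main__":
    scan = [Peak(500.25, 2, 1000.0), Peak(300.00, 2, 90000.0),
            Peak(700.40, 1, 5000.0), Peak(800.50, 3, 400.0),
            Peak(900.45, 3, 2000.0)]
    exclusion_list = []
    print(select_precursors(scan, now_s=0.0, excluded=exclusion_list))
    print(select_precursors(scan, now_s=10.0, excluded=exclusion_list))
```

Calling the function with `top_n=20` would mimic the CID setting; the HCD default of 10 is used here.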

Database searching. Approximately 1.01×10^6 HCD and 1.05×10^6 CID tandem mass spectra were collected and submitted to the Sequest search engine implemented in the Proteome Discoverer software version 1.3.0.339 for peptide and protein identifications. All spectra were searched against a UniProt Fasta file containing 20,200 reviewed human entries and a total of 7,180 human papillomavirus sequences. The Sequest node for the HCD spectra included the following parameters: Precursor Mass Tolerance 10 ppm; Fragment Mass Tolerance 20 mmu; Dynamic Modifications were Oxidation of M (+15.995 Da), Deamidation of N, Q (+0.984 Da) and Phosphorylation of S (+79.966 Da); Static Modifications were iTRAQ8plex at any N-Terminus, K, Y (+304.205 Da) and Methylthio at C (+45.988 Da). The level of confidence for peptide identifications was estimated using the Percolator node with decoy database searching. Strict FDR was set to 0.01, relaxed FDR was set to 0.05 and validation was based on q-Value. The Reporter Ion Quantifier node included a custom iTRAQ 8plex (Thermo Scientific Instruments) Quantification Method, an integration window tolerance of 20 ppm and the integration method Most Confident Centroid. Protein ratios were normalized to the protein median, and peptides with missing iTRAQ values were rejected from protein quantification. For the CID experiments, Fragment Mass Tolerance was 0.5 Da, while dynamic and static modifications were kept the same as for the high-resolution HCD spectra. Phosphorylation localization probability was estimated with the phosphoRS node in both cases.
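The two quantification rules stated above, rejecting peptides with missing reporter values and normalizing protein ratios to the protein median, can be sketched as follows. This is not the Proteome Discoverer implementation, whose internals are not described in the text; summarizing a protein by the median of its peptide ratios, the data layout and the function names are all assumptions made for illustration.

```python
from statistics import median


def protein_ratios(peptides, channel, reference):
    """Median peptide ratio (channel / reference) per protein.

    `peptides` is a list of (protein_id, {channel_name: intensity or None}).
    Peptides with any missing or non-positive reporter value are rejected.
    """
    per_protein = {}
    for protein_id, reporters in peptides:
        if any(v is None or v <= 0 for v in reporters.values()):
            continue  # incomplete reporter set: skip this peptide
        ratio = reporters[channel] / reporters[reference]
        per_protein.setdefault(protein_id, []).append(ratio)
    return {pid: median(ratios) for pid, ratios in per_protein.items()}


def normalize_to_median(ratios):
    """Divide every protein ratio by the median protein ratio of the channel."""
    centre = median(ratios.values())
    return {pid: r / centre for pid, r in ratios.items()}


if __name__ == "__main__":
    # Hypothetical reporter intensities for channels 113 and 114.
    peptides = [
        ("P1", {"113": 100.0, "114": 200.0}),
        ("P1", {"113": 100.0, "114": 400.0}),
        ("P1", {"113": 100.0, "114": None}),   # rejected: missing value
        ("P2", {"113": 50.0, "114": 100.0}),
        ("P3", {"113": 10.0, "114": 10.0}),
    ]
    raw = protein_ratios(peptides, channel="114", reference="113")
    print(raw)
    print(normalize_to_median(raw))
```

Median normalization of this kind assumes that most proteins do not change between channels, so that the median ratio reflects unequal sample loading rather than biology.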

All HCD data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository32 with the dataset identifier PXD000109. Supporting Information Sections 01-03 list the peptides identified in experiments A-C with both HCD and CID methods. Supporting Information Section 04 lists the unique proteins identified from all experiments at ≤ 1 % FDR. For these lists, both the q-value and the Posterior Error Probability (PEP) show the confidence of the PSMs. A q-value is defined as the minimal false discovery rate at which the identification is considered correct. These q-values are estimated using the distribution of scores from the decoy database search. The PEP is the probability that the observed PSM is incorrect.
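The q-value definition given above can be made concrete with a plain target-decoy calculation. The sketch below is a generic illustration and not the Percolator algorithm, which additionally re-scores PSMs with a semi-supervised classifier; the function name, the input layout and the simple decoys/targets FDR estimate are assumptions.

```python
def q_values(psms):
    """Target-decoy q-values for a list of (score, is_decoy) PSMs.

    Higher scores are better. Returns (score, q_value) for target PSMs only,
    ordered from best to worst score.
    """
    ordered = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR if the threshold were placed at this PSM.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the minimal FDR over this and all more permissive thresholds.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()
    return [(score, qv) for (score, is_decoy), qv in zip(ordered, q)
            if not is_decoy]


if __name__ == "__main__":
    # Hypothetical scores; True marks a decoy hit.
    example = [(10, False), (9, False), (8, True),
               (7, False), (6, False), (5, True)]
    for score, qv in q_values(example):
        print(score, qv)
```

Filtering the returned list at `q_value <= 0.01` would correspond to the strict 1 % FDR setting used in the search.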

Bioinformatics and Statistics. Preliminary evaluation of the acquired spectra was performed with the Preview software (Protein Metrics Inc.). The free online DAVID (Database for Annotation, Visualization and Integrated Discovery) software (http://david.abcc.ncifcrf.gov/) was used for the classification of the identified proteins based on their Gene Ontology (GO). Histograms were plotted using the online Interactivate histogram software (http://www.shodor.org/interactivate/activities/Histogram/) and sample size calculations were performed with the online Power & Sample Size Calculator (http://www.statisticalsolutions.net/pss_calc.php).
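The formula used by the online calculator is not given in the text. As a hedged illustration of what such a sample size calculation typically involves, the sketch below uses the standard normal-approximation formula for comparing two group means, n = 2σ²(z₁₋α/₂ + z_power)² / δ² per group. The function name and the example inputs are hypothetical and are not taken from the study.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Samples per group for a two-sided, two-sample comparison of means.

    sigma: assumed standard deviation (e.g. of log2 protein ratios)
    delta: smallest difference in means worth detecting (same units)
    Uses the normal approximation, so it slightly underestimates the
    requirement of a t-test at small n.
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * (sigma ** 2) * (z_alpha + z_power) ** 2 / (delta ** 2)
    return ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: variability and effect size on a log2 scale.
    for sigma in (0.3, 0.5, 1.0):
        print(sigma, samples_per_group(sigma=sigma, delta=1.0))
```

The required sample size grows with the square of the biological variability, which is why the inter-individual variance estimated from the replicate channels drives the study design.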

Molecular Biology Methods. The Linear Array (LA) HPV genotyping test (Roche Diagnostics GmbH, D-68298 Mannheim), which was used for the molecular analysis of the cervical smear samples, is a qualitative in vitro test for the determination of 37 anogenital HPV DNA genotypes [6, 11, 16, 18, 26, 31, 33, 35, 39, 40, 42, 45, 51, 52, 53, 54, 55, 56, 58, 59, 61, 62, 64, 66, 67, 68, 69, 70, 71, 72, 73 (MM9), 81, 82 (MM4), 83 (MM7), 84 (MM8), IS39 and CP6108]33 in cervical cells collected in Cobas PCR Cell Collection Media. The whole procedure includes DNA extraction, PCR amplification of target DNA, hybridization of amplified products to oligonucleotide probes and colorimetric detection.

PCR Amplification of HPV DNA. The LA HPV test uses biotinylated PGMY primers to amplify a 450-bp fragment within the polymorphic L1 region of the HPV genome as previously described34. Briefly, the master mix containing buffer, PGMY primers, nucleotides [dATP, dCTP, dGTP, and dUTP], MgCl2, and < 0.02 % AmpliTaq Gold DNA polymerase amplified HPV DNA as described above33. Amplicons incorporated dUTP, allowing the use of the AmpErase enzyme (uracil N-glycosylase), which is included in the master mixture to prevent PCR carryover contamination. Capture probe sequences are located in polymorphic regions of L1 bounded by these primers. In the same PCR reaction, an additional primer pair targeted the human β-globin gene (268-bp amplicon) to provide a control for cell adequacy, extraction, and amplification. PCR was performed in a final reaction volume of 100 µL, containing 50 µL HPV master mixture and 50 µL of isolated DNA. The PCR mixture was incubated for 2 min at 50 °C and for 9 min at 95 °C, followed by 40 cycles of 30 s at 95 °C, 1 min at 55 °C, and 1 min at 72 °C, with a final extension at 72 °C lasting from 10 min to a maximum of 1 h. Commercially provided HPV-positive and -negative controls were used with each set of samples to assess the performance of the reaction.
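The thermal profile above fixes the nominal hold time of the amplification, which can be useful when planning a run. The sketch below adds up the stated steps; it ignores temperature ramp times, which depend on the thermal cycler, and the names used are illustrative.

```python
# Nominal hold times in minutes, taken from the thermal profile in the text.
PRE_INCUBATION_MIN = 2.0        # 50 °C
INITIAL_DENATURATION_MIN = 9.0  # 95 °C
CYCLE_STEPS_MIN = (0.5, 1.0, 1.0)  # 95 °C, 55 °C, 72 °C
N_CYCLES = 40


def program_minutes(final_extension_min):
    """Total nominal hold time, excluding ramping between temperatures."""
    if not 10.0 <= final_extension_min <= 60.0:
        raise ValueError("final extension is stated as 10 min up to 1 h")
    cycling = N_CYCLES * sum(CYCLE_STEPS_MIN)
    return (PRE_INCUBATION_MIN + INITIAL_DENATURATION_MIN
            + cycling + final_extension_min)


if __name__ == "__main__":
    print("shortest program (min):", program_minutes(10.0))
    print("longest program (min):", program_minutes(60.0))
```

Depending on the final extension, the nominal program therefore runs for about 2 h to just under 3 h before ramp times are added.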

Hybridization and Detection. Following amplification, the HPV and human β-globin amplicons were chemically denatured to form single-stranded DNA by immediately adding 100 µL denaturation solution to each PCR tube. Hybridization and HPV genotyping were performed as described by the manufacturer (Roche Diagnostics GmbH, D-68298 Mannheim). The strips were interpreted using the Linear Array HPV reference guide. Supporting Information Section 05 shows the PCR-hybridization images used for the confirmation of HPV presence or absence in the cervical smear specimens.


Results and Discussion

Combined screening for abnormal cervical cells and HPV infection in ThinPrep specimens constitutes the basis for successful risk assessment for cervical cancer. The clinical utility of ThinPrep specimens can be expanded further by their in-depth and global proteomic analysis with state-of-the-art LC-MS technologies. Such an approach may lead to the discovery of novel protein markers that could help explain HPV-mediated initiation and progression of cervical cancer. Such markers could form the basis for an even earlier diagnosis of cervical cancer. Taken together, traditional and clinically accepted techniques in conjunction with state-of-the-art mass spectrometry and proteomics-based techniques have the potential to increase the sensitivity and specificity of current screening approaches. The effective application of proteomic discovery approaches, however, must account for the specimen quality (biological heterogeneity) of the cervical smear along with the innate phenotypic differences between the patients tested. In this proof-of-principle study, we utilized iTRAQ mass tagging and orthogonal two-dimensional liquid chromatography coupled with high-resolution mass spectrometry for the analysis of HPV positive and HPV negative control ThinPrep smears, to assess the above considerations along with the potential clinical relevance of the proteins identified therein.

The two-dimensional liquid chromatographic approach used in this study is a technique of choice for reducing the complexity of the peptide content typical of bottom-up proteomic approaches. This reduction in complexity is accomplished via the higher degree of separation made possible by the orthogonal use of multiple chromatographic chemistries, which typically results in an improvement in the quality of the mass spectrometry analysis. In our particular methodological approach, an initial high-pH RP separation followed by a low-pH RP separation was used to achieve a higher degree of separation capacity and orthogonality and to improve the depth of proteomic analysis35.

Two key determinants of a successful iTRAQ LC-MS study are the efficiencies of solution-phase proteolysis and peptide labeling. These become especially important when complex and highly heterogeneous biomedical specimens are analyzed. To quantify the biological and experimental variance introduced by these factors, the Preview software (36) was used for a preliminary evaluation of the HCD-derived high-resolution spectra. The Preview program also calculated the percentage of spectra matching each of 60 different modifications, including iTRAQ 8-plex labeling. The average labeling efficiency was 96 % for N-termini and 98.8 % for K residues, while the average percentage of missed cleavages was 5.5 %. The most frequent peptide modifications were Deamidation (NQ) and Oxidation (M), with average percentages of 12.9 % and 7.8 % respectively; both were subsequently included in the main database search. Regarding mass accuracy, the median precursor and fragment errors, expressed as absolute values, averaged 3.6 ppm and 5.2 ppm, respectively.

The 2D LC-FT-Orbitrap MS proteomic analysis of the ThinPrep smear identified a total of 17,580 unique peptides at FDR < 1 %, traceable to 3,217 unique proteins. A total of 2,310 proteins were quantitatively profiled in at least one of the three iTRAQ 8-plex experiments based on HCD fragmentation. A comparison between the different experiments showed that 900 and 1,266 proteins were reproducibly identified in all three experiments performed with HCD and CID, respectively. Globally, 1,945 proteins and 6,495 peptides were
common between the two different fragmentation methods.

In terms of post-translational modifications, a total of 303 peptides phosphorylated at serine residues were identified, of which 93 % were captured with CID fragmentation, most likely owing to the improved S/N of their tandem spectra relative to those obtained with HCD. This observation exemplifies the potential benefit of dual sequential triggering of the same precursor mass for HCD and CID activation in each LC-FT-Orbitrap MS run. In this study, however, we purposely used CID and HCD independently so as to maximize the number of targeted precursors per duty cycle for each fragmentation approach. The comparative advantage of the CID LC-FT-Orbitrap-MS approach makes it a technique of choice for labeling approaches that quantify at the MS level, such as heavy water and dimethyl labeling (37). Under this premise, the MS-level quantitative analysis is conducted under ultra-high-resolution conditions while the qualitative analysis takes place with CID at high scan speed (and low resolution). A detailed overview of the number of protein identifications, along with the overlaps between the different LC-FT-Orbitrap-MS experimental protocols, is illustrated as Venn diagrams in Figure 2.

In any proteomic study involving samples with high cellular heterogeneity, defining the main cell types reflected by the obtained proteome is crucial for data interpretation. For this purpose, we classified the top 50 highest-scoring proteins, which represent highly abundant proteins, from all three HCD experiments according to tissue expression using the DAVID software, as shown in Figure 3-I. Based on the available literature, a significant number of the proteins identified in our study may play a significant role in cervical cancer. Their cellular and molecular functional validation, however, was beyond the scope of this proof-of-principle study.
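
The overlap counts and the top-50 selection described above amount to simple set and ranking operations. The following Python sketch is illustrative only: the identifiers, the scores and the helper names `shared_across` and `top_n` are hypothetical, and it is not a reproduction of the software pipeline used in this study.

```python
from typing import Dict, List, Set


def shared_across(experiments: Dict[str, Set[str]]) -> Set[str]:
    """Return the identifiers found in every experiment (the Venn core)."""
    sets = list(experiments.values())
    if not sets:
        return set()
    return set.intersection(*sets)


def top_n(scores: Dict[str, float], n: int = 50) -> List[str]:
    """Return the n highest-scoring identifiers, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [protein for protein, _ in ranked[:n]]


# Hypothetical protein identifications per experiment (not data from the study).
hcd_runs = {
    "A": {"P1", "P2", "P3", "P4"},
    "B": {"P2", "P3", "P4", "P5"},
    "C": {"P3", "P4", "P6"},
}
core = shared_across(hcd_runs)

# Hypothetical identification scores for a single experiment.
scores_a = {"P1": 12.0, "P2": 85.5, "P3": 40.2, "P4": 63.1}
top_two = top_n(scores_a, n=2)
```

With real data, the sets would hold protein accessions per experiment and `n` would be 50, as in the classification shown in Figure 3-I.
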
The most statistically significant cell type associated with the
highly abundant proteins included Keratins 6B, 7, 10, 16 and 19, proteins S100 A8 and A9, Envoplakin, Periplakin and Involucrin. Involucrin in particular is classified as a marker of epithelial differentiation, which is linked with the productive life cycle of HPV (38, 39). Since virion production is restricted to differentiated cells, it is evident from the observed proteome that the cervical smear can be useful for studying HPV-host cell interactions at the protein level. Classification of the whole proteome according to Gene Ontology terms was performed with the DAVID software and is shown in pie chart form in Figure 3-II. A broad range of biological processes and molecular functions, for proteins of diverse cellular component origin, was determined.

The advantages of analyzing the whole cervical smear specimen include the potential to identify extracellular proteins, the secretion of which may be controlled by disease-specific mechanisms. In particular, the successful quantitative profiling of cytokines can help elucidate the local behaviour of the immune system during papillomavirus infection. In this study we obtained quantitative information for five interleukins, namely IL-1β, IL-18, IL-19, IL-36α and pro-IL-16, of which the IL-1 family members IL-18 and IL-36α were the most abundant based on their reproducible identification in the majority of the HCD experiments. IL-18 is a proinflammatory cytokine involved in the regulation of innate and acquired immune responses, with a key role in autoimmune, inflammatory and infectious diseases (40). Cells expressing the E6 oncogene can evade immune surveillance by direct down-modulation of IL-18 (41), and such evasion is one of the prerequisites for HPV infections to progress to intraepithelial neoplasia (42).

The Kallikrein family of secreted proteins, which is of relevance to cancer biology, was successfully analysed in this study. Namely, we report the identification of KLK6, KLK7, KLK10, KLK11, KLK12, KLK13 and KLK14 in the cervical smears examined
and this is consistent with previous proteomic studies of cervical-vaginal fluid (43, 44), with the exception of KLK14. These proteolytic enzymes have been associated with several forms of cancer and likely assist progression and metastasis through their ability to cleave extracellular matrix proteins (45).

The high-resolution, product-ion-rich spectra at high S/N obtained in this study make it possible to unequivocally identify tryptic peptides uniquely traceable to one protein. This is further supported by the orthogonal multidimensional chromatographic technique, which separates complex tryptic peptide mixtures so as to maximize the MS signal sampling needed to generate good spectral quality. As a result, the collective features of the proposed LC-FT-Orbitrap MS methods make them well suited to heterogeneous biomedical specimens such as ThinPrep cervical smears derived from HPV-infected patients. Such specimens may contain virions or integrated HPV DNA products expressed within the host cell.

One objective of the study was the detection of HPV infection at the protein level. Although PCR-based detection and genotyping is the gold standard for HPV infection screening, the study of HPV DNA products may provide greater functional insight into disease mechanisms. Our study resulted in the identification of six HPV proteins from a respective number of HPV strains in the ThinPrep sample sets tested, which had previously been confirmed with PCR and hybridization. This number represents approximately 28 % (6 out of 21) of the total number of confirmed HPV strains present as single infections or coinfections. Additionally, we provide proteomic evidence for the confident detection of the low-risk HPV43 type, which is not otherwise detectable by the LA genotyping test. In particular, the HPV43 protein E4 was found in experiment A, in the smear specimen generating the iTRAQ 116 reporter ion, from two different peptides
uniquely traceable to this protein. This unexpected outcome constitutes proof-positive evidence for the global and unbiased screening of multiple HPV types using our proteomic approach. The detailed list of the identified HPV peptide sequences per identified protein is shown in Table 2 and includes the high-risk HPV types 16, 52 and 39 and the low-risk HPV types 6, 67, 42 and 43. Four of the identified protein sequences are defined as E4 proteins while the other three are L1 or L2 proteins. E4 proteins are expressed in the late phase of the papillomavirus life cycle and are considered the most highly expressed of all HPV proteins (39), a fact that explains their frequent identification by the presented proteomic approach. L1 and L2 proteins are assembled late in order to form viral capsids, with subsequent release of mature viruses (39). Notably, based on UniProt annotation, evidence of existence at the protein level is recorded only for the Minor capsid protein L2 of HPV16 out of all the HPV proteins identified. All the other HPV protein sequences were annotated as inferred from homology or predicted, indicating either that their existence is probable because clear orthologs exist in closely related species or that there is no evidence at the protein, transcript or homology level. These observations further testify to the ability of our proposed methodology to provide first-ever protein-level evidence for these particular gene products.

The implementation of the iTRAQ-based quantitative proteomic method allowed the estimation of relative concentration levels for four of the identified HPV types, which testifies to the method's specificity since the HPV presence was type- and individual-specific. Figure 4-I illustrates the high-resolution annotated HCD MS/MS signature for the tryptic peptide T(iTRAQ8plex)VTSDGTTVEVR, uniquely surrogate to the Probable protein E4 originating from HPV42. In experiment B the protein extract uniquely containing this low-risk HPV type was labeled with the iTRAQ
reagent 114. The reporter ion region of the HCD MS/MS spectrum reveals a high signal response at m/z 114.1107. This high-resolution qualitative and quantitative MS/MS evidence resulted in the novel, non-targeted protein-level detection of HPV42. The CID MS/MS version of this tryptic peptide was identified in both experiments A and B, in which the inclusion of an HPV42-infected individual was confirmed. Figure 4-II illustrates the annotated CID MS/MS signature, with sub-ppm precursor mass accuracy, for the tryptic peptide Q(iTRAQ8plex)AGTC(Methylthio)PPDIIPK(iTRAQ8plex), uniquely surrogate to the minor capsid protein L2 originating from the most prevalent high-risk type, HPV16. This protein was reproducibly identified with this same peptide in experiments A and C.

Such non-targeted, high-resolution LC-FT-Orbitrap-MS discovery findings can form the basis for the development of novel targeted selected reaction monitoring (SRM) LC-MS assays offering high analysis throughput with high sensitivity and absolute quantification efficiency (when spiked with stable isotope internal standards). For example, the SRM LC-MS assay can be used for the routine measurement of the HPV16 viral load in ThinPrep cervical smears for the early screening of cervical disease (46).

Differential phosphorylation events of key HPV proteins such as E6 and E7 are involved in the regulatory interactions that drive several cellular functions within virally infected cells (47, 48). Additionally, an early study on HPV1 E4-encoded proteins with 32P radiolabeling showed that these proteins were phosphorylated in vivo and in vitro (49). Such findings suggest a potentially critical functional role for the phosphorylated forms of HPV proteins in host cell transformation. To maximize the yield of phosphoprotein identifications, protein precipitation, chaotrope/detergent solubilisation and heating at 90 °C were used in the ThinPrep sample preparation procedure, as described. These steps
sufficiently restricted phosphatase activity without the need to use kinase/phosphatase inhibitors. In this study we identified two phosphopeptides for the E4-encoded proteins of HPV67 and HPV43, both with CID fragmentation. Despite their lower electrospray ionization potentials in positive ion mode (50), diphosphorylated peptides derived from the E4-encoded protein of HPV43 were nevertheless detected. This may suggest that they occur in significant amounts, making their detection possible in the positive ESI mode used for this study. Figure 5-I illustrates the annotated CID MS/MS spectrum (produced with the assistance of the Scaffold software program) for the tryptic diphosphopeptide R(iTRAQ8plex)LES(phospho)EC(methylthio)DS(phospho)TPTLR, uniquely surrogate to the HPV43 E4 protein. This annotation illustrates the mass differences between the native and the phosphorylated forms of this peptide. The non-phosphorylated native form of this peptide was also identified under high-resolution HCD MS/MS conditions (with precursor ion charge states +2 and +3), giving additional analytical confidence that the observed phosphorylated form is a non-random in vivo modification. The observed peptide sequence coverage for the E4 protein of HPV43 is shown in Figure 5-II.

Effective global proteomic measurements are requisite to a biomarker discovery program and must adhere to well-defined statistical rules. In this study, the iTRAQ approach facilitated the establishment of the baseline biological variation of ThinPrep smears obtained from healthy individuals. For this purpose, the protein ratios over a selected low-risk HPV-infected sample for experiments B and C were transformed to Log2Ratios. For the three biological normal replicates, the mean and the standard deviation (SD) values were calculated for each one of the proteins
identified. Figure 6-I depicts the scatter plots between the Log2Ratios of the different biological replicates in experiment B, with the plot of the technical replicates superimposed. The distribution of the SD values (Figures 6-II, 6-III) showed that 95 % of the protein measurements yielded an SD value of less than 0.9 and 1.3 for experiments B and C, respectively. Based on these values, sample sizes and regulation thresholds were calculated for a test power of 80 % at 95 % confidence. The minimum significant regulation thresholds for a minimum sample size of 3 were 1.5 and 2.2 on the Log2 scale. In other words, an average minimum threshold of 1.85 (3.6-fold change) would require the analysis of at least 3 biological replicates and would cover 95 % of the quantified proteins. By the same rationale, covering 66 % of the proteins would require at least 4 replicates for a significance cutoff of 0.6 (1.5-fold change), which would be more suitable for iTRAQ 8-plex experimental designs. In future studies, the inclusion of dysplastic cervical cells in smear specimens may alter the suggested values to some extent; however, these indicative numbers may be used as a starting point for robust experimental designs.
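
The fold changes quoted above follow from the Log2 thresholds as fold = 2^threshold. The Python sketch below checks that conversion. It also illustrates, purely as an assumption that the text does not confirm, one common normal-approximation formula for the smallest detectable mean difference, delta = (z_(1-alpha/2) + z_power) x SD / sqrt(n). The function names are hypothetical, and the formula is not claimed to be the calculation used in this study, so its output need not match the reported thresholds exactly.

```python
from math import sqrt
from statistics import NormalDist


def log2_to_fold(log2_ratio: float) -> float:
    """Convert a Log2 ratio threshold to a linear fold change."""
    return 2.0 ** log2_ratio


def min_detectable_log2(sd: float, n: int, alpha: float = 0.05,
                        power: float = 0.80) -> float:
    """Smallest mean Log2 difference detectable with n replicates.

    Assumed two-sided normal approximation for a one-sample comparison;
    illustrative only, not the confirmed procedure of the study.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sd / sqrt(n)


fold_95 = log2_to_fold(1.85)  # average threshold covering 95 % of proteins
fold_66 = log2_to_fold(0.6)   # cutoff covering 66 % of proteins
delta_b = min_detectable_log2(sd=0.9, n=3)  # SD bound reported for experiment B
```

Rounded to one decimal place, `fold_95` and `fold_66` give the 3.6-fold and 1.5-fold values stated in the text. Under the assumed formula, increasing `n` lowers the detectable difference, which is the trade-off between replicate number and regulation threshold discussed above.
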

Conclusions. This proof-of-principle study successfully led to the development and application of a robust iTRAQ 2D LC-FT-Orbitrap-MS approach for the analysis of human HPV-positive and HPV-negative ThinPrep smear specimens. Its application yielded the most in-depth, global, high-resolution qualitative and quantitative proteome coverage of the human ThinPrep smear to date. The use of state-of-the-art FT-Orbitrap mass spectrometry combined with two-dimensional liquid chromatography achieved very low detection limits, as demonstrated by the confident identification, for the first time and without targeted enrichment, of low-abundance viral proteins in human-derived clinical specimens. The
HPV protein discovery findings of the present study can lead to the deployment of high-throughput targeted SRM LC-MS based methods for the absolute measurement of viral loads, towards improved risk assessment that extends the standard HPV detection methods at the genome level. The significant number of phosphorylated peptides identified also testified to the method's sensitivity. In our study we identified approximately 30 % of the HPV types included in the sample sets. This indicates a major limitation of proteomic approaches for the study of non-model species such as HPV, a limitation that stems from the lack of appropriate protein databases. Proteogenomic approaches such as the PIT (Proteomics Informed by Transcriptomics) method can further improve the sensitivity of proteomics methods in capturing HPV proteins within the human cervical smear (51). The estimate of the biological variability of cervical smear specimens presented here can be considered in future studies for cervical cancer biomarker discovery, as well as in functional studies of HPV-host cell interactions. For meaningful protein concentration differences, we suggest that at least four well-characterized ThinPrep smears per biological state would be required to determine disease-specific protein changes above a 1.5-fold modulation for 66 % of the measured proteins. The development of robust sample procurement and processing methods, as part of the proteomic analysis workflow, should further improve these metrics. The state-of-the-art 2D LC-FT-Orbitrap MS method used in this study for the analysis of the relatively non-invasive ThinPrep smear specimen offers new avenues for the screening of protein biomarkers of potential preventive and treatment utility in cervical cancer.


Acknowledgements

We would like to thank all of the families and patients who took part in our study. We also acknowledge funding from the Wessex Medical Research Trust, Wessex Medical Research (grants N11 and N12) and Hope, which made it possible to establish our proteomics infrastructure and to undertake this study. We thank the PRIDE team for their data processing support. Moreover, we are indebted to Mr Roger Allsopp and Mr Derek Coates for their enthusiasm and vision in promoting this work, and to the director of the Institute for Life Sciences, Professor Peter J. Smith, for his continual help, support and guidance.


FIGURE LEGENDS

Figure 1. Schematic representation of the experimental workflow.

Figure 2. Proportional Venn diagrams summarizing the number of identified proteins and peptides. I) Comparison of the three different experiments conducted with either HCD or CID fragmentation. II) Global comparison of the two different activation methods at protein and peptide level.

Figure 3. Gene Ontology (GO) annotation of the identified proteome as computed by the DAVID software. I) Classification of the top 50 high scoring proteins per experiment based on their tissue expression consensus. II) Classification of the total proteome based on Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) terms.

Figure 4. I) Annotated FTMS HCD MS/MS spectrum of T(iTRAQ8plex)VTSDGTTVEVR with m/z: 784.92615 Da (+1.06 ppm), z=+2, traceable to Probable protein E4 of HPV42 found in Experiment B. II) Annotated ITMS CID MS/MS spectrum of Q(iTRAQ8plex)AGTC(Methylthio)PPDIIPK(iTRAQ8plex) with m/z: 632.01813 Da (+0.6 ppm), z=+3, traceable to Minor capsid protein L2 of HPV16 found in Experiment C.

Figure 5. I) Annotated ITMS CID MS/MS spectrum of the phosphopeptide R(iTRAQ8plex)LES(phospho)EC(methylthio)DS(phospho)TPTLPR with m/z: 705.30310 Da (-2.61 ppm), z=+3, traceable to Protein E4 of HPV43 found in Experiment A, along with the annotated ITMS CID MS/MS spectrum of the native phosphopeptide. II) The peptide sequence coverage of Protein E4 of HPV43.

Figure 6. I) Plot of the log2Ratios (relative to sample 113) of biological replicate 118 against biological replicates 119 and 121, and of technical replicate 115 against technical replicate 116, as calculated in experiment B. II) Histogram of the distribution of the Standard Deviations (SD) calculated for each individual protein between the three biological replicates in experiment B. III) Histogram of the distribution of the Standard Deviations (SD) calculated for each individual protein between the three biological replicates in experiment C.
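
The ppm values given in the legends of Figures 4 and 5 are relative mass errors. As a minimal sketch, assuming the conventional definition ppm = (observed - theoretical) / theoretical x 10^6 and protons as the only charge carriers, the hypothetical helpers below recover the theoretical m/z implied by a reported error and the corresponding neutral mass of the labeled peptide.

```python
PROTON_MASS = 1.007276  # Da, approximate mass of a proton


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def theoretical_mz_from_ppm(observed_mz: float, ppm: float) -> float:
    """Invert ppm_error: theoretical m/z implied by an observed m/z and error."""
    return observed_mz / (1.0 + ppm * 1e-6)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a positive ion carrying `charge` protons."""
    return charge * (mz - PROTON_MASS)


observed = 784.92615  # m/z reported for the HPV42 E4 peptide in Figure 4-I
theoretical = theoretical_mz_from_ppm(observed, 1.06)
absolute_error = observed - theoretical  # deviation in m/z units
mass = neutral_mass(observed, 2)         # z = +2 as stated in the legend
```

For this peptide, an error of about 1 ppm corresponds to a deviation of less than 0.001 m/z units, which is why such matches can be treated as high-confidence precursor assignments.
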


TABLE LEGENDS

Table 1. HPV infection characteristics of the cervical smear specimens.

Table 2. The list of HPV proteins and peptides identified per experiment and fragmentation method.


SUPPORTING INFORMATION

Definitions for Supporting Information Sections 1-5:

Supporting information 01. Peptides identified in experiment A with both HCD and CID methods.

Supporting information 02. Peptides identified in experiment B with both HCD and CID methods.

Supporting information 03. Peptides identified in experiment C with both HCD and CID methods.

Supporting information 04. List of unique proteins identified from all experiments.

Supporting information 05. The PCR-hybridization images for the confirmation of HPV presence or absence in the cervical smear specimens.


References

1. Cox, J.; Mann, M., Quantitative, high-resolution proteomics for data-driven systems biology. Annual review of biochemistry 2011, 80, 273-99.
2. Bouchal, P.; Roumeliotis, T.; Hrstka, R.; Nenutil, R.; Vojtesek, B.; Garbis, S. D., Biomarker discovery in low-grade breast cancer using isobaric stable isotope tags and two-dimensional liquid chromatography-tandem mass spectrometry (iTRAQ-2DLC-MS/MS) based quantitative proteomic analysis. Journal of proteome research 2009, 8, (1), 362-73.
3. Garbis, S. D.; Tyritzis, S. I.; Roumeliotis, T.; Zerefos, P.; Giannopoulou, E. G.; Vlahou, A.; Kossida, S.; Diaz, J.; Vourekas, S.; Tamvakopoulos, C.; Pavlakis, K.; Sanoudou, D.; Constantinides, C. A., Search for potential markers for prostate cancer diagnosis, prognosis and treatment in clinical tissue specimens using amine-specific isobaric tagging (iTRAQ) with two-dimensional liquid chromatography and tandem mass spectrometry. Journal of proteome research 2008, 7, (8), 3146-58.
4. Garbis, S. D.; Roumeliotis, T. I.; Tyritzis, S. I.; Zorpas, K. M.; Pavlakis, K.; Constantinides, C. A., A novel multidimensional protein identification technology approach combining protein size exclusion prefractionation, peptide zwitterion-ion hydrophilic interaction chromatography, and nano-ultraperformance RP chromatography/nESI-MS2 for the in-depth analysis of the serum proteome and phosphoproteome: application to clinical sera derived from humans with benign prostate hyperplasia. Analytical chemistry 2011, 83, (3), 708-18.
5. Addona, T. A.; Shi, X.; Keshishian, H.; Mani, D. R.; Burgess, M.; Gillette, M. A.; Clauser, K. R.; Shen, D.; Lewis, G. D.; Farrell, L. A.; Fifer, M. A.; Sabatine, M. S.; Gerszten, R. E.; Carr, S. A., A pipeline that integrates the discovery and verification of plasma protein biomarkers reveals candidate markers for cardiovascular disease. Nature biotechnology 2011, 29, (7), 635-43.
6. Pavlou, M. P.; Kulasingam, V.; Sauter, E. R.; Kliethermes, B.; Diamandis, E. P., Nipple aspirate fluid proteome of healthy females and patients with breast cancer. Clinical chemistry 2010, 56, (5), 848-55.
7. Nagaraj, N.; Wisniewski, J. R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Paabo, S.; Mann, M., Deep proteome and transcriptome mapping of a human cancer cell line. Molecular systems biology 2011, 7, 548.
8. Sardana, G.; Marshall, J.; Diamandis, E. P., Discovery of candidate tumor markers for prostate cancer via proteomic analysis of cell culture-conditioned medium. Clinical chemistry 2007, 53, (3), 429-37.
9. Naji, S.; Ambrus, G.; Cimermancic, P.; Reyes, J. R.; Johnson, J. R.; Filbrandt, R.; Huber, M. D.; Vesely, P.; Krogan, N. J.; Yates, J. R., 3rd; Saphire, A. C.; Gerace, L., Host cell interactome of HIV-1 Rev includes RNA helicases involved in multiple facets of virus production. Molecular & cellular proteomics : MCP 2012, 11, (4), M111 015313.
10. Munday, D. C.; Surtees, R.; Emmott, E.; Dove, B. K.; Digard, P.; Barr, J. N.; Whitehouse, A.; Matthews, D.; Hiscox, J. A., Using SILAC and quantitative proteomics to investigate the interactions between viral and host proteomes. Proteomics 2012, 12, (4-5), 666-72.
11. Dove, B. K.; Surtees, R.; Bean, T. J.; Munday, D.; Wise, H. M.; Digard, P.; Carroll, M. W.; Ajuh, P.; Barr, J. N.; Hiscox, J. A., A quantitative proteomic analysis of lung epithelial (A549) cells infected with 2009 pandemic influenza A virus using stable isotope labelling with amino acids in cell culture. Proteomics 2012, 12, (9), 1431-6.
12. Feng, X.; Zhang, J.; Chen, W. N.; Ching, C. B., Proteome profiling of Epstein-Barr virus infected nasopharyngeal carcinoma cell line: identification of potential biomarkers by comparative iTRAQ-coupled 2D LC/MS-MS analysis. Journal of proteomics 2011, 74, (4), 567-76.
13. Munday, D. C.; Emmott, E.; Surtees, R.; Lardeau, C. H.; Wu, W.; Duprex, W. P.; Dove, B. K.; Barr, J. N.; Hiscox, J. A., Quantitative proteomic analysis of A549 cells infected with human respiratory syncytial virus. Molecular & cellular proteomics : MCP 2010, 9, (11), 2438-59.
14. Moore, P. S.; Chang, Y., Why do viruses cause cancer? Highlights of the first century of human tumour virology. Nature reviews. Cancer 2010, 10, (12), 878-89.
15. Butt, A. Q.; Miggin, S. M., Cancer and viruses: A double-edged sword. Proteomics 2012, 12, (13), 2127-38.
16. zur Hausen, H., Papillomaviruses and cancer: from basic studies to clinical application. Nature reviews. Cancer 2002, 2, (5), 342-50.
17. Crow, J. M., HPV: The global burden. Nature 2012, 488, (7413), S2-3.
18. Martin, D.; Gutkind, J. S., Human tumor-associated viruses and new insights into the molecular mechanisms of cancer. Oncogene 2008, 27 Suppl 2, S31-42.
19. Alameda, F.; Marinoso, M. L.; Bellosillo, B.; Muset, M.; Pairet, S.; Soler, I.; Romero, E.; Larrazabal, F.; Carreras, R.; Serrano, S., Detection of HPV by in situ hybridization in thin-layer (ThinPrep) cervicovaginal samples. Tumour biology : the journal of the International Society for Oncodevelopmental Biology and Medicine 2011, 32, (3), 603-9.
20. Bai, H.; Sung, C. J.; Steinhoff, M. M., ThinPrep Pap Test promotes detection of glandular lesions of the endocervix. Diagnostic cytopathology 2000, 23, (1), 19-22.
21. Khan, M. J.; Castle, P. E.; Lorincz, A. T.; Wacholder, S.; Sherman, M.; Scott, D. R.; Rush, B. B.; Glass, A. G.; Schiffman, M., The elevated 10-year risk of cervical precancer and cancer in women with human papillomavirus (HPV) type 16 or 18 and the possible utility of type-specific HPV testing in clinical practice. Journal of the National Cancer Institute 2005, 97, (14), 1072-9.
22. Schorge, J. O.; Hossein Saboorian, M.; Hynan, L.; Ashfaq, R., ThinPrep detection of cervical and endometrial adenocarcinoma: a retrospective cohort study. Cancer 2002, 96, (6), 338-43.
23. Whitlock, E. P.; Vesco, K. K.; Eder, M.; Lin, J. S.; Senger, C. A.; Burda, B. U., Liquid-based cytology and human papillomavirus testing to screen for cervical cancer: a systematic review for the U.S. Preventive Services Task Force. Annals of internal medicine 2011, 155, (10), 687-97, W214-5.
24. Burd, E. M., Human papillomavirus and cervical cancer. Clinical microbiology reviews 2003, 16, (1), 1-17.
25. Swan, D. C.; Tucker, R. A.; Tortolero-Luna, G.; Mitchell, M. F.; Wideroff, L.; Unger, E. R.; Nisenbaum, R. A.; Reeves, W. C.; Icenogle, J. P., Human papillomavirus (HPV) DNA copy number is dependent on grade of cervical disease and HPV type. Journal of clinical microbiology 1999, 37, (4), 1030-1034.
26. Alazawi, W.; Pett, M.; Arch, B.; Scott, L.; Freeman, T.; Stanley, M. A.; Coleman, N., Changes in cervical keratinocyte gene expression associated with integration of human papillomavirus 16. Cancer research 2002, 62, (23), 6959-65.
27. Rader, J. S.; Malone, J. P.; Gross, J.; Gilmore, P.; Brooks, R. A.; Nguyen, L.; Crimmins, D. L.; Feng, S.; Wright, J. D.; Taylor, N.; Zighelboim, I.; Funk, M. C.; Huettner, P. C.; Ladenson, J. H.; Gius, D.; Townsend, R. R., A unified sample preparation protocol for proteomic and genomic profiling of cervical swabs to identify biomarkers for cervical cancer screening. Proteomics. Clinical applications 2008, 2, (12), 1658-69.
28. Gu, Y.; Wu, S. L.; Meyer, J. L.; Hancock, W. S.; Burg, L. J.; Linder, J.; Hanlon, D. W.; Karger, B. L., Proteomic analysis of high-grade dysplastic cervical cells obtained from ThinPrep slides using laser capture microdissection and mass spectrometry. Journal of proteome research 2007, 6, (11), 4256-68.
29. Unwin, R. D.; Griffiths, J. R.; Whetton, A. D., Simultaneous analysis of relative protein expression levels across multiple samples using iTRAQ isobaric tags with 2D nano LC-MS/MS. Nature protocols 2010, 5, (9), 1574-82.




Table 1 iTRAQ labeling

iTRAQ reporter channels: 113, 114, 115, 116, 117, 118, 119, 121

Sample groups in each of Experiments A, B and C: Most Prevalent High Risk (16,18); High Risk; Low Risk

[Table body: ✓ / - sample assignments per channel, not recoverable from the extracted layout.]

Table 2 HPV types confirmed with Linear Array (LA) genotyping test

HCD

UniProt Accession # Q84295

HPV Type

Protein

Peptides (FDR2 peptides, not detectable by (LA) genotyping test Q705H6

43

E4 protein

TTTLEGTTVEVTLR

LESECDSTPTLPR

LLNLTPDQRPPSQIPR

Predicted

RLESECDSTPTLPR ✓ ✓

RLES(phos)ECDS(phos)TPTLPR

✓ ✓ ✓
