Characterization of Biomarkers in Polycystic Ovary Syndrome (PCOS) Using Multiple Distinct Proteomic Platforms†
B. Matharoo-Ball,‡ C. Hughes,§ L. Lancashire,‡ D. Tooth,O G. Ball,‡ C. Creaser,‡ M. Elgasim,§ R. Rees,‡ R. Layfield,*,| and W. Atiomo§
School of Biomedical and Natural Sciences, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, United Kingdom, Department of Obstetrics and Gynaecology, School of Human Development, and School of Biomedical Sciences, University of Nottingham, Nottingham, NG7 2UH England, and Novozymes-Delta Limited, Nottingham, NG7 1FD England
Received March 7, 2007
Abstract: A variety of prefractionation methods (including a novel reversed-phase solid-phase-extraction (RP-SPE) combined with SDS-PAGE) and proteomic-based approaches (e.g., 2-dimensional gel electrophoresis (2DE) and MALDI-TOF mass spectrometry combined with Artificial Neural Network (ANN) bioinformatic tools) were used to investigate the protein/peptide signatures in patients with Polycystic Ovary Syndrome (PCOS). Four potential PCOS biomarkers were identified (complement C4α3c and C4γ and haptoglobin α and β chains).
Keywords: Polycystic Ovary Syndrome • 2DE • MALDI-TOF MS • Complement Proteins • Haptoglobin
† Presentations: A related abstract was presented at the 1st Beijing International Conference of Obstetrics and Gynaecology, held in Beijing, China, October 7-10, 2005, and at the XVIII FIGO World Congress of Gynaecology & Obstetrics, held in Kuala Lumpur, Malaysia, November 5-10, 2006.
* To whom correspondence should be addressed. Dr. Robert Layfield, School of Biomedical Sciences, University of Nottingham Medical School, Queen’s Medical Centre, Nottingham NG7 2UH, U.K. Tel, +44 (0)115 8230107; fax, +44 (0)115 98230142; e-mail, [email protected].
‡ Nottingham Trent University.
§ Department of Obstetrics and Gynaecology, School of Human Development, University of Nottingham.
| School of Biomedical Sciences, University of Nottingham.
O Novozymes-Delta Limited.
10.1021/pr070124b CCC: $37.00 © 2007 American Chemical Society

Introduction
Proteomic analyses have recently become an important technology in the quest to identify disease biomarkers for both diagnostic and research purposes.1-4 They provide a comprehensive, unbiased overview of the functional entities in an individual’s cell, tissue, or serum sample, which often vary both qualitatively and quantitatively in the presence/progression of disease. PCOS (Polycystic Ovary Syndrome), as a complex, multi-system disease, is clearly an ideal candidate for proteomic analysis.5 There is a body of work on intrinsic defects in thecal cells.6-8 Although these defects relate to cellular proteins, they are relevant to the aetiology of PCOS. In similar biomarker studies in cancer, changes in intracellular proteins are reflected to an extent in the serum proteome.

Current methodological challenges in applying proteomic approaches to biomarker identification in complex disease states such as PCOS include some of the known problems associated with molecular analyses of complex biological fluids such as serum/plasma and highlight the need for the development of sensitive and reproducible protocols.9 The dynamic range of serum proteins exceeds 10 orders of magnitude, posing a severe challenge to traditional proteomic methods for detecting low-abundance analytes such as possible biomarkers.10 Removal of high-abundance proteins such as albumin is usually the first step in serum proteome analysis, to reduce the complexity of the sample. A number of approaches have been reported for protein removal or depletion, including affinity separation,11-13 protein precipitation,14,15 ultrafiltration,13,16,17 and chromatographic separation.18 However, the major drawback of these techniques is the concomitant removal of potentially important LMW (low molecular weight) proteins and peptides from the sample as a result of their association with larger species such as albumin.13,19 Albumin has been shown to act as a carrier and transport protein which binds hormones, cytokines, and lipoproteins.20 The ideal depletion method to allow the analysis of LMW proteins and peptides would therefore remove the high molecular weight proteins of high abundance, while releasing the LMW species.

None of the above technologies is capable of analyzing the whole proteome at once. However, the study of a subset of the proteome is feasible and could be applied to study biological systems. The ultimate goal is to be able to use complementary proteomic-based techniques to gain a comprehensive understanding of the serum/plasma proteome/peptidome. The aim of our study was to identify novel biomarkers for the diagnosis and management of women with PCOS using proteomics.
Given the potential of proteomics for identifying biomarkers (despite the methodological challenges highlighted above), such a study could generate new leads for research into the pathogenic mechanisms of PCOS, improve diagnosis, and facilitate a more coherent management strategy. In this study, we have used a variety of prefractionation methods (including a novel reversed-phase solid-phase-extraction (RP-SPE) combined with SDS-PAGE) and proteomic-based approaches (e.g., 2-dimensional gel electrophoresis (2DE) and MALDI-TOF mass spectrometry combined with Artificial Neural Network (ANN) bioinformatic tools) to investigate the protein/peptide signatures in patients with PCOS. Alternative fractionation by RP-SPE enables a comprehensive overview of the proteome to be performed by SDS-PAGE without the need for a depletion step.

Journal of Proteome Research 2007, 6, 3321-3328. Published on Web 06/29/2007.
Materials and Methods
1. Materials. Precast 4-12% acrylamide bis-Tris SDS-PAGE gels and Colloidal Blue Staining Kit were from Invitrogen (Paisley, U.K.). Mass spectrometry-grade trypsin was from Promega (Southampton, U.K.). C18 ZipTips were from Millipore Corp. (Bedford, MA), and BioLyte carrier ampholytes 3-10 and low-melt agarose were from Bio-Rad (Hercules, CA). All other reagents were from Sigma (Poole, Dorset, U.K.) or Fisher Scientific Corp. (Loughborough, Leics, U.K.).

2. Sample Preparation. Blood samples were collected, after written informed consent, from women with PCOS and from age- and BMI-matched controls. Ethical approval for the study was granted by the Nottingham Local Research Ethics Committee. An initial cohort of five women with PCOS and their five age- and BMI-matched controls was used for the 2DE study. For the RP-SPE study, samples from these women and an additional PCOS/control pair were used (n = 6 vs 6). Finally, for the MALDI study, samples from 12 PCOS women and 12 controls were used (including the samples used for the earlier proteomic studies).

All five PCOS women in the 2DE study were Caucasian, and in the control group, there were three Caucasians and two Afro-Caribbeans. In the RP-SPE study, the additional PCOS woman was of South Asian origin, and the additional control was Afro-Caribbean. In the MALDI study, nine PCOS women were Caucasian, one was of South Asian origin, one was Afro-Caribbean, and one was recorded as “other” ethnicity. Nine of the controls were Caucasian, and three were of Afro-Caribbean origin.

Women in the PCOS group were diagnosed using the Rotterdam criteria21 if they presented with two or more of the following: oligo-/anovulation, polycystic ovaries, and clinical and/or biochemical signs of hyperandrogenism. Control subjects had regular 21-35 day menstrual cycles and no more than one of ultrasound evidence of polycystic ovaries or evidence of hyperandrogenism.
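The inclusion rules stated above can be written out as a minimal sketch. This only illustrates the logic as worded in the text and is not a clinical tool; the function and argument names are hypothetical, and each clinical finding is assumed to have already been reduced to a yes/no value.

```python
def meets_rotterdam(oligo_anovulation: bool,
                    polycystic_ovaries: bool,
                    hyperandrogenism: bool) -> bool:
    """PCOS group: at least two of the three Rotterdam features present."""
    features = [oligo_anovulation, polycystic_ovaries, hyperandrogenism]
    return sum(features) >= 2


def eligible_control(cycle_length_days: int,
                     polycystic_ovaries: bool,
                     hyperandrogenism: bool) -> bool:
    """Control group: regular 21-35 day cycles and at most one of
    polycystic ovaries on ultrasound or hyperandrogenism."""
    regular_cycles = 21 <= cycle_length_days <= 35
    return regular_cycles and (polycystic_ovaries + hyperandrogenism) <= 1
```

Under these rules, a woman with regular cycles and a single isolated feature remains eligible as a control.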
One control patient (C4) had a raised ovarian volume in one ovary and a raised total testosterone level (3.0 nmol/L), but she was not clinically thought to have PCOS as she had regular menstrual periods, a normal Ferriman-Gallwey score, and did not have the classic ultrasound features of polycystic ovaries. Whole blood was allowed to clot at room temperature, and the serum was collected by centrifugation at 4000 rpm for 10 min at 4 °C, aliquoted into smaller fractions, and stored at -80 °C until required.

3. 2-DE. 3.1. Serum Depletion and Purification. Antibody-based columns (Amersham Biosciences, Little Chalfont, Bucks, U.K.) were used for serum depletion of albumin and immunoglobulin G (IgG), and the resultant samples (∼500 µL) were subjected to protein precipitation (Amersham Biosciences, Little Chalfont, Bucks, U.K.) according to the manufacturer’s protocol. The protein pellet was resuspended in solubilization buffer (7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 10 mM DTT, and 1% (v/v) BioLyte carrier ampholytes 3-10) to a final concentration of 5 µg/µL, as determined by modified Bradford assay (Quick Start Protein Assay; Bio-Rad).

3.2. Isoelectric Focusing (IEF) and Second Dimension SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE). IEF was essentially as described previously by Hopkinson et al.,22 except that 8-20% acrylamide resolving gels were used, and SDS-PAGE was performed according to Laemmli.23 Proteins in the gels were stained at 4 °C to improve spot resolution using a modified silver staining protocol (Plus One Silver Stain Kit; Amersham Biosciences) compatible with mass spectrometry.24 Imaging of the stained gels was performed using Agfa Fotolook v3.0 and a Duoscan T1200 flatbed scanner, followed by qualitative and quantitative analysis using DELTA 2D software (v3.3; Decodon, Greifswald, Germany).

4. RP-SPE and SDS-PAGE. 4.1. Serum Fractionation. Serum fractionation was performed by SPE using large-pore (1000 Å) polystyrene-divinylbenzene (PDVB) resin (25 mg; International Sorbent Technologies, mid-Glamorgan, U.K.). The column was conditioned under vacuum with MS-grade mobile phase (Riedel-de Haën, Sigma), 70% (v/v) ACN (acetonitrile)/0.1% (v/v) TFA (trifluoroacetic acid), and then equilibrated with 0.1% TFA. Serum samples (100 µL, ∼7 mg of total protein) were then applied and washed through with 0.1% (v/v) TFA. Bound proteins were eluted using a stepwise 5-100% (v/v) ACN gradient. Eluates were dried using a SpeedVac, resuspended in loading buffer, and denatured prior to SDS-PAGE analysis. Samples were analyzed using precast 4-12% acrylamide bis-Tris SDS-PAGE gels (Invitrogen, Paisley, U.K.) as described by Hopkinson et al.22 Gels were stained with Coomassie blue. Imaging of the stained gels was performed as described above.

5. Peptide Mass Fingerprinting and Database Searching. Individual 2DE spots and RP-SPE/SDS-PAGE bands showing expression differences in PCOS versus controls were excised. Proteins were trypsin-digested in-gel and subjected to MALDI-TOF MS to obtain peptide mass fingerprints (PMF). The monoisotopic m/z values of the tryptic peptide ions were submitted to the Aldente software (http://www.expasy.org/tools/aldente/) and searched against the Human Swiss-Prot and TrEMBL databases to afford protein identification.

6. Western Blotting.
Samples (50 µg) were mixed in a ratio of 1:1 with loading buffer, denatured prior to loading on 5-20% acrylamide SDS-PAGE gels, and separated by electrophoresis at 35 mA per gel for 2.5 h. Proteins were then transferred to nitrocellulose membrane by electroblotting at 40 mA. The membrane was blocked using 5% skimmed milk (Marvel, Premier International Foods Ltd, U.K.) in TBS-T (10 mM Tris·HCl (pH 7.5), 150 mM sodium chloride, and 0.05% (v/v) Tween-20) and probed with a polyclonal primary antibody raised against the complement component C4α chain (Santa Cruz, Heidelberg, Germany) followed by a horseradish peroxidase-conjugated secondary antibody. Antibody binding was detected by chemiluminescence using an ECL detection kit (Amersham Biosciences).

7. MALDI Mass Spectrometry. 7.1. Sample Preparation. 7.1.1. Protein and Peptide Analysis. Sample order was randomized prior to sample handling and analysis. The same aliquot of serum, diluted 1 in 10 with 0.1% TFA, was used for protein and tryptic peptide analysis. Diluted serum (25 µL) was C18 ZipTip-fractionated according to the manufacturer’s instructions using the Xcise robotic system (Proteome Systems, Shimadzu, U.K.), and the eluted proteins/peptides were spotted together with sinapinic acid (SA, 10 mg/mL) onto the MALDI target plate by the robotic system. The remaining eluate was carried forward for tryptic digestion. Briefly, the fractionated serum sample was combined with ammonium bicarbonate (16.6 µL of 100 mM), water (7.6 µL), and trypsin (1.3 µL of 0.5 µg/µL) and incubated at 37 °C overnight. The reaction was quenched, and the sample was cleaned using a C18 ZipTip according to the manufacturer’s instructions and spotted onto the MALDI target using the dried droplet method with CHCA (10 mg/mL solution in 50% ACN + 0.1% TFA; LaserBio Labs, Cedex, France). The target plate was analyzed using the AXIMA-CFR+ MALDI-TOF-MS (Shimadzu, Manchester, U.K.) in linear mode using the raster option for proteins and in reflectron and autoquality modes for tryptic peptides. A bovine serum albumin (BSA) control was used to check the efficiency of the digestion procedure, and a 0.1% TFA blank was used to confirm that there was no contamination from the reagents or plate. Close external calibration was performed using Proteomix3 for proteins and Proteomix2 for peptides (LaserBio Labs, Cedex, France). The resultant mass spectra were all examined visually, and outliers (spectra with poor signal-to-noise) were removed from the data set before bioinformatic analysis.

8. ANN Analysis. 8.1. Data Processing. The raw mass spectral data (m/z, intensities) were exported as ASCII files and smoothed to yield rounded masses and intensities for the mass range of interest (m/z 1000-25 000). These intensity values were subsequently used as inputs to the ANN models, which were developed using Statistica 7.0 (StatSoft, Inc., Tulsa, OK). The models were used to predict the membership of each sample in one of two output classes: control (1) or PCOS (2).

8.2. Model Architecture. Model architecture and network parameters are described in Lancashire et al.25 Prior to training, samples were randomly divided into three subsets: training (60%, n = 14), test (20%, n = 5), and validation (20%, n = 5). Training was conducted using 50 randomly extracted data splits, with the validation subset used each time to validate the model independently on blind data and so ensure that overfitting did not occur. Sensitivity analysis was conducted to identify those inputs having the greatest influence upon the model system with respect to classification performance. These inputs were then subjected to further training.
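The repeated random partitioning described in section 8.2 can be sketched as follows. This is a minimal illustration only, not the Statistica procedure used in the study: the sample identifiers, the seeding scheme, and the function name `split_samples` are assumptions introduced here, and only the subset sizes (14/5/5) and the number of splits (50) come from the text.

```python
import random


def split_samples(sample_ids, seed, n_train=14, n_test=5, n_valid=5):
    """Shuffle the sample IDs and cut them into training, test,
    and validation subsets of the requested sizes."""
    if n_train + n_test + n_valid != len(sample_ids):
        raise ValueError("subset sizes must sum to the number of samples")
    rng = random.Random(seed)          # seeded so each split is reproducible
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


# Hypothetical IDs: 12 controls (class 1) and 12 PCOS samples (class 2).
samples = [f"C{i}" for i in range(1, 13)] + [f"P{i}" for i in range(1, 13)]

# 50 random splits; a model would be trained on each one, with the
# validation subset held back as blind data for that split.
splits = [split_samples(samples, seed=s) for s in range(50)]
```

Because every split is drawn afresh, each sample appears in the blind validation subset in some splits and in the training subset in others, which is what allows performance to be averaged over the 50 models.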
This involved using a stepwise approach which enabled the identification of the optimal subset of biomarkers for class differentiation. The process was stopped when there was no further significant (p )