Article pubs.acs.org/jpr
Spanish Human Proteome Project: Dissection of Chromosome 16
V. Segura,† J. A. Medina-Aunon,‡ E. Guruceaga,† S. I. Gharbi,‡ C. González-Tejedo,‡ M. M. Sánchez del Pino,§ F. Canals,∥ M. Fuentes,⊥ J. Ignacio Casal,# S. Martínez-Bartolomé,‡ F. Elortza,¶ J. M. Mato,¶ J. M. Arizmendi,□ J. Abian,● E. Oliveira,△ C. Gil,▼ F. Vivanco,○ F. Blanco,& J. P. Albar,‡,$ and F. J. Corrales†,$,#
† ProteoRed-ISCIII, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain
‡ ProteoRed-ISCIII, Centro Nacional de Biotecnología - CSIC, Madrid, Spain
§ ProteoRed-ISCIII, Biochemistry Department, University of Valencia, Valencia, Spain
∥ ProteoRed-ISCIII, Proteomics Laboratory and Medical Oncology Research Program, Vall d’Hebron Institute of Oncology, Vall d’Hebron University Hospital Research Institute, Barcelona, Spain
⊥ ProteoRed-ISCIII, Centro de Investigación del Cáncer/IBMCC (USAL/CSIC), Departamento de Medicina and Servicio General de Citometría, University of Salamanca, IBSAL, 37007 Salamanca, Spain
# ProteoRed-ISCIII, Functional Proteomics, Department of Cellular and Molecular Medicine, Centro de Investigaciones Biológicas (CIB-CSIC), Madrid, Spain
¶ ProteoRed-ISCIII, Proteomics Platform, CIC bioGUNE, CIBERehd, ProteoRed, Bizkaia Technology Park, Derio, Spain
□ ProteoRed-ISCIII, Department of Biochemistry and Molecular Biology, University of the Basque Country, UPV/EHU, Spain
● ProteoRed-ISCIII, CSIC/UAB Proteomics Laboratory, Instituto de Investigaciones Biomédicas de Barcelona-CSIC/IDIBAPS, Bellaterra, Spain
△ ProteoRed-ISCIII, Plataforma de Proteomica, Parc Científic de Barcelona, Universitat de Barcelona, Barcelona, Spain
▼ ProteoRed-ISCIII, Departamento de Microbiología II, Facultad de Farmacia, Universidad Complutense de Madrid, Madrid, Spain
○ ProteoRed-ISCIII, Department of Immunology, IIS-Fundacion Jimenez Diaz, Madrid, Spain
& ProteoRed-ISCIII, Osteoarticular and Aging Research Lab, Proteomics Unit, ProteoRed/ISCIII, Rheumatology Division, INIBIC−CHU A Coruña, As Xubias 84, 15006 A Coruña, Spain
Supporting Information
ABSTRACT: The Chromosome 16 Consortium forms part of the Human Proteome Project that aims to develop an entire map of the proteins encoded by the human genome following a chromosome-centric strategy (C-HPP) to make progress in the understanding of human biology in health and disease (B/D-HPP). A Spanish consortium of 16 laboratories was organized into five working groups: Protein/Antibody microarrays, protein expression and Peptide Standard, S/MRM, Protein Sequencing, Bioinformatics and Clinical healthcare, and Biobanking. The project is conceived on a multicenter configuration, assuming the standards and integration procedures already available in ProteoRed-ISCIII, which is encompassed within HUPO initiatives. The products of the 870 protein-coding genes in chromosome 16 were analyzed in Jurkat T lymphocyte cells, MCF-7 epithelial cells, and the CCD18 fibroblast cell line, as it is theoretically expected that most chromosome 16 protein-coding genes are expressed in at least one of these. The transcriptome and proteome of these cell lines were studied using gene expression microarray and shotgun proteomics approaches, indicating an ample coverage of chromosome 16. With regard to the B/D section, the main research areas have been adopted and a biobanking initiative has been designed to optimize methods for sample collection, management, and storage under normalized conditions and to define QC standards. The general strategy of the Chr-16 HPP and the current state of the different initiatives are discussed.
KEYWORDS: Human Proteome Project, chromosome 16, proteomics, transcriptomics
Special Issue: Chromosome-centric Human Proteome Project
Received: September 24, 2012 Published: December 12, 2012
© 2012 American Chemical Society
dx.doi.org/10.1021/pr300898u | J. Proteome Res. 2013, 12, 112−122
■
INTRODUCTION
The sequencing of the human genome1,2 has provided the first level of complexity of human biology. Despite this undoubted success, there is still a vast territory to be explored before a complete understanding of our own biology is achieved. Proteins are the tools used by the cells to perform most of their processes: gene expression, splicing events, metabolic reactions, signaling pathways, cell shape, differentiation. Proteins therefore decide the cell fate. Knowledge of their specific functions, regulatory mechanisms, networks of interaction, abundance, and isoform patterns is thus essential for understanding human physiology in health and disease. Unravelling the human proteome is a project that, despite the obvious a priori analogies with the sequencing of the human genome, represents a task that is far more challenging and whose boundaries still remain to be defined. As Amos Bairoch stated in a recent interview, the proteome is a fractal system; the deeper you go, the more you have to do.3 The proteomic universe generated from the information encrypted in the genome is massive, as the 20300 protein-coding human genes give rise to an estimated 1 million different protein species derived from DNA recombination, alternative splicing of mRNAs, processing events and a myriad of covalent modifications of many types that display a dynamic behavior resulting in different profiles with time, location, association with other proteins and biological, pathological and pharmacological perturbations. These observations must also be made considering that the human body comprises about 230 cell types4 with different gene expression profiles and, therefore, different proteomes. Moreover, the vast heterogeneity, wide dynamic range and different ionization efficiencies of proteins, among other reasons, restrict detection and quantification capacity on a large-scale omics level even using state-of-the-art technology. Accordingly, it has been estimated that close to 35% of predicted proteins have yet to be observed reliably by mass spectrometry.5
The Human Proteome Organization (HUPO) has coordinated the efforts of the international community, promoting several initiatives6−11 to describe the human proteome in a systematic manner during the last twelve years (http://www.hupo.org). In September 2010, during the annual HUPO conference in Sydney, Australia, the Human Proteome Project (HPP) was officially launched.4 The HPP is designed to map the entire human proteome in a systematic effort using currently available and emerging techniques. With the aim of providing a comprehensive map of human proteins in their biological context, the HPP rests on three pillars: shotgun and targeted mass spectrometry (MS), polyclonal and monoclonal antibodies (Ab), and an integrated knowledge base (KB: Ensembl, neXtProt (gold), GPMDB (green), and PeptideAtlas (1% FDR at protein level)). The project is organized according to a chromosome-centric strategy (C-HPP) where scientific groups from different nationalities agree to characterize the proteome of a selected chromosome following the guidelines of the international consortium and an open access policy.12,13 All 24 chromosomes have already been adopted by as many teams from 21 different countries. Knowledge and technical resources generated within the C-HPP initiative are expected to contribute to progress in the understanding and treatment of diseases by the integration and coordination of specific research initiatives in the Biology and Disease (B/D)-HPP initiative.12
Chromosome 16 (Chr16) has been adopted by a Spanish consortium belonging to the Spanish Proteomics Institute, ProteoRed-ISCIII. These teams combine scientists with recognized research and clinical skills, which ensures an efficient joint development and integration of the C-HPP and B/D-HPP. The main general objectives of the Spanish HPP (Chr-16 SpHPP) project are: (1) development of analytical methods based on MS and protein capture reagents to detect and quantify chromosome 16 proteins, with special interest in those proteins with weak experimental evidence; (2) definition of changes in the levels of specific proteins that may explain pathogenic mechanisms and provide novel diagnostic, prognostic and therapeutic approaches to improving the management of patients afflicted with diseases that represent a social burden worldwide and particularly in Spain, including cancer, obesity, and neurologic, rheumatoid, cardiovascular, and infectious disorders; (3) creation of a computational environment to analyze, integrate and share data in line with the C-HPP consortium; and (4) promotion of the development of prototypic devices as precursors of preclinical and clinical instruments, through collaborative efforts with industrial stakeholders. In the present manuscript we describe the progress in Chr16 investigation. Annotations and data analysis of Chr16 genes, selection of cell lines to cover the chromosome 16 proteome, transcriptomic and proteomic shotgun characterization of the selected cell lines, progress in informatics resources, B/D design and the working plan for SRM/MRM analysis are discussed.
■
MATERIALS AND METHODS
Chromosome 16 Annotation
The information about genes, transcripts and proteins, and the relationships between accession numbers of most of the databases used to describe Chr16, was extracted from the Ensembl database (http://www.ensembl.org), release 68. We used the biomaRt package of Bioconductor to query the database, and R functions for processing and graphical representation of the results. The information retrieved from Ensembl includes data from the eGenetics, GNF Gene Expression Atlas, OMIM and UniProt public repositories. In addition, we considered GPMDB (01−10−2012 release), HPA (version 10.0) and our protein expression vector database as additional sources of biological and experimental knowledge.
Microarray Hybridization and Data Analysis
Experiments were performed in triplicate with three selected cell lines, MCF7, CCD18 and Jurkat. Cells were harvested in TRIzol Reagent (Invitrogen) and the RNA was extracted according to the manufacturer’s instructions. As a last step of the extraction procedure, the RNA was purified with the RNeasy Mini-kit (Qiagen, Hilden, Germany). Prior to cDNA synthesis, RNA integrity from each sample was confirmed on Agilent RNA Nano LabChips (Agilent Technologies). The sense cDNA was prepared from 300 ng of total RNA using the Ambion WT Expression Kit. The sense strand cDNA was then fragmented and biotinylated with the Affymetrix GeneChip WT Terminal Labeling Kit (PN 900671). Labeled sense cDNA was hybridized to the Affymetrix HuGene 1.0 ST array according to the manufacturer’s protocols and using the GeneChip Hybridization, Wash and Stain Kit. GeneChips were scanned with the Affymetrix GeneChip Scanner 3000. Both background correction and normalization were performed using the RMA (Robust Multichip Average) algorithm.14,15 R/Bioconductor14 was used for preprocessing and statistical analysis. After normalization, an expression threshold for each cell line was calculated to eliminate low intensity probe sets that can be considered technical noise. First, probe sets were sorted by increasing expression value. For each probe set, a t test was performed to evaluate the differential expression between this probe set and the median value of the probe sets with lower expression levels. The p-values obtained were corrected for multiple hypothesis testing using the FDR
method16, and an FDR > 0.95 (background signal) was taken as the criterion to calculate the corresponding intensity threshold. Microarray data files were submitted to the GEO (Gene Expression Omnibus) database and are available under accession number GSE40168.
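The noise-threshold step described for the microarray data can be sketched as follows. This is a minimal illustration and not the authors' R code: it assumes that the FDR correction is the Benjamini-Hochberg procedure and that the threshold is the highest intensity among probe sets whose adjusted p-value exceeds 0.95. The function names and the input values are hypothetical, and the per-probe-set t-test p-values are taken as given.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def intensity_threshold(intensities, pvalues, fdr_cutoff=0.95):
    """Highest intensity among probe sets flagged as background (adjusted p > cutoff)."""
    adjusted = benjamini_hochberg(pvalues)
    background = [x for x, q in zip(intensities, adjusted) if q > fdr_cutoff]
    return max(background) if background else None


# Hypothetical log2 intensities and t-test p-values for five probe sets.
intensities = [3.1, 4.2, 5.0, 7.8, 9.5]
pvalues = [0.99, 0.98, 0.97, 0.002, 0.001]
threshold = intensity_threshold(intensities, pvalues)
print(threshold)
```

With these made-up inputs the three low-intensity probe sets are classified as background, so probe sets at or below their highest intensity would be filtered out.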
Protein Sample Preparation
Cells were grown in three laboratories of the Chr-16 HPP consortium under standard growth conditions. At exponential growth, cells were collected and lysed in a CHAPS/Urea lysis buffer (7 M urea, 2 M thiourea, 4% CHAPS, protease and phosphatase inhibitors). 100 μg of protein from each cell line were digested in solution. Briefly, cell lysates were precipitated with methanol/chloroform, as described elsewhere,17 and precipitated proteins were resuspended in denaturing and reducing buffer (8 M Urea, 25 mM ammonium bicarbonate, 10 mM DTT) for 1 h at 37 °C, and cysteine residues were alkylated with 50 mM iodoacetamide for 45 min in the dark. Samples were diluted with 25 mM ammonium bicarbonate to a final concentration of 2 M Urea. Proteomics grade Trypsin (Sigma Aldrich) was then added at a 1:50 w:w ratio (enzyme:protein) and the reaction was left for 18 h at 37 °C. Samples were dried in a vacuum centrifuge (SpeedVac, Savant, Inc.) and stored at −20 °C until off-line peptide fractionation.
Basic pH-RP-HPLC
Tryptic peptides were fractionated off-line on a 2.1 × 100 mm C18, 5 μm XBridge column (BEH Technology, Waters), connected to a Smartline HPLC system (KNAUER). Solvent A was 10 mM NH4OH, pH 9.4 and solvent B was 10 mM NH4OH, pH 9.4, 80% Methanol. Peptides were separated at a flow rate of 150 μL/min with a linear gradient from 2 to 25% solvent B in 15 min, from 25 to 70% B in 40 min, and from 70 to 100% B in 5 min, followed by 5 min at 100% B; 15 min of equilibration (98% A:2% B) was allowed prior to the next sample injection. Blanks were run between samples to avoid carryover. Thirty fractions were collected during the 80 min total chromatographic run. To maximize orthogonal separation, fractions were mixed throughout the gradient (e.g., FR 1 with FR 16, FR 2 with FR 17, and so on, or FR 1 with FR 11 and FR 21, FR 2 with FR 12 and FR 22, etc.). After pooling, we were left with 15 or 10 fractions, respectively, that were dried in a speed-vac dryer and stored at −20 °C until LC−MS/MS acquisition.
Liquid Chromatography and Mass Spectrometry Analysis
The second dimension of the 2D-LC tandem MS analysis was performed using a nano liquid chromatography system (Eksigent Technologies nanoLC Ultra 1D plus, AB SCIEX, Foster City, CA) coupled to a TripleTOF 5600 mass spectrometer (AB SCIEX, Foster City, CA) with a nanospray ionization source. The analytical column was a silica-based reversed phase C18 ChromXP column, 75 μm × 15 cm, 3 μm particle size and 120 Å pore size (Eksigent Technologies, AB SCIEX, Foster City, CA). The trap column was a C18 ChromXP, 3 μm particle diameter, 120 Å pore size (Eksigent Technologies, AB SCIEX, Foster City, CA), switched online with the analytical column. The loading pump delivered a solution of 0.1% formic acid in water at 2 μL/min. The nanopump provided a flow rate of 300 nL/min and was operated under gradient elution conditions, using 0.1% formic acid in water as mobile phase A, and 0.1% formic acid in acetonitrile as mobile phase B. Gradient elution was performed according to the following scheme: isocratic conditions of 98% A:2% B for 1 min, a linear increase to 30% B in 110 min, a linear increase to 40% B in 10 min, a linear increase to 90% B in 5 min, isocratic conditions of 90% B for 5 min and return to initial conditions in 2 min. Generally, 1/5th of the sample was run by nanoLC−MS; the injection volume was 5 μL. Data acquisition was performed with a TripleTOF 5600 System (AB SCIEX, Foster City, CA). Data were acquired using an ionspray voltage floating (ISVF) of 2800 V, curtain gas (CUR) 20, interface heater temperature (IHT) 150, ion source gas 1 (GS1) 30, and declustering potential (DP) 85 V. All data were acquired in information-dependent acquisition (IDA) mode with Analyst TF 1.5 software (AB SCIEX, Foster City, CA). For IDA parameters, a 0.25 s MS survey scan in the mass range of 350−1250 m/z was followed by 50 MS/MS scans of 50 ms in the mass range of 100−1500 m/z (total cycle time: 2.8 s). Switching criteria were set to ions greater than mass to charge ratio (m/z) 350 and smaller than m/z 1250 with a charge state of 2−5 and an abundance threshold of more than 90 counts (cps). Former target ions were excluded for 20 s. The IDA rolling collision energy (CE) parameters script was used for automatically controlling the CE.
Data Analysis
MS and MS/MS data obtained for each sample fraction were processed using Analyst TF 1.5.1 Software (AB SCIEX, Foster City, CA). Raw data were translated to Mascot generic format (mgf) and searched against the UniProtKB/Swiss-Prot human database (release 2012_06, June 13), which contains 36852 proteins and their corresponding reversed sequences, using an in-house Mascot Server v. 2.4 (Matrix Science, London, U.K.). Search parameters were set as follows: carbamidomethyl cysteine as fixed modification, and oxidized methionines and acetylation of the peptide amino termini as variable ones. Peptide mass tolerance was set to 50 ppm, both in MS and MS/MS mode, and 2 missed cleavages were allowed. Typically, an accuracy of ±10 ppm was found both for MS and MS/MS spectra. False Discovery Rates (FDR ≤ 1% at the protein level) for protein identification were manually calculated.18 For standard reporting and comparison analysis, MS mgf files and their corresponding Mascot results, formatted as mzIdentML, were first submitted to the ProteoRed MIAPE web repository19 to create both the MIAPE MS and MSI reports, as the ProteoRed MIAPE web toolkit20 usage guide recommends. Afterward, the MIAPEs were compared through the MIAPE Extractor Software v. 2.92 (http://www.proteored.org/miapeextractor). Finally, to adhere to the C-HPP reporting guidelines, MIAPE MS and MSI compliant reports were translated to PRIDE XML and, together with the raw MS file for each sample fraction, were submitted to the ProteomeXchange repository (http://www.proteomexchange.org/) following the ProteomeXchange submission guidelines.
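The database search described under Data Analysis uses reversed sequences as decoys and reports identifications at FDR ≤ 1%. The sketch below illustrates one common way to derive a score cutoff from such a target-decoy search. It is an assumption-laden illustration, not the manual procedure of ref 18: the FDR estimate used here (decoy hits divided by target hits above the cutoff), the function name and the example scores are all hypothetical choices.

```python
def score_cutoff_at_fdr(hits, max_fdr=0.01):
    """Find the most permissive score cutoff whose estimated FDR stays within max_fdr.

    hits: iterable of (score, is_decoy) pairs, one per identified protein.
    Returns (cutoff_score, accepted_targets), or None if no cutoff qualifies.
    FDR is estimated as decoys / targets among hits at or above the cutoff
    (an assumed estimator; other formulas are also in use).
    """
    ranked = sorted(hits, key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        if decoys / targets <= max_fdr:
            best = (score, targets)  # keep extending while the estimate holds
    return best


# Hypothetical scores: 200 target proteins and 3 reversed-sequence decoys.
hits = [(float(s), False) for s in range(101, 301)]
hits += [(150.5, True), (120.5, True), (105.5, True)]
result = score_cutoff_at_fdr(hits)
print(result)
```

In this toy data set the second decoy pushes the estimate above 1%, so the cutoff stops just before it and 180 target proteins are retained.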
■
RESULTS AND DISCUSSION
Structure of the Consortium and Main Goals
The Spanish Chr16 Consortium forms part of the global initiative Chromosome-based Human Proteome Project (C-HPP) that aims to develop an entire map of the proteins encoded following a chromosome-centric strategy to make progress in the understanding of human biology in health and disease (B/D-HPP). After several preliminary meetings, the kick-off workshop was held in Madrid, Spain, on the second of April 2012. Adopting the general rules established for HPP,12 the Spanish initiative is constructed on a multidisciplinary basis with 16 scientific groups organized into five working sections, namely, Protein/Antibody microarrays, protein expression and Peptide Standard, S/MRM, Protein Sequencing, Bioinformatics and Clinical healthcare, and Biobanking. The C-HPP initiative is based on the ProteoRed-ISCIII platform, a proteomics consortium integrating 21 proteomics laboratories with more than 7 years of experience in the coordination of multicenter activities,21 sharing state-of-the-art technology, data standardization,20,22 bioinformatics19 and research.23−27 Our experience in these areas paves the way for the efficient progress of the Chr-16 HPP in the short term. The B/D-Chr-16 HPP has been initially launched focusing on the particular areas of expertise of the participating laboratories, although the initiative has been conceived as being open in two different directions: first, to collaboration with other chromosome initiatives in order to cope with the biological complexity of human diseases, and second, to the involvement of scientists interested in joining the project who can benefit from the knowledge and tools generated by the HPP community. The main structure and principal end points of the Chr-16 HPP initiative are summarized in Figure 1.
Figure 1. Executive diagram of Chr16 SpHPP. The SpHPP is governed by a steering committee and rests on three main pillars, analytical resources, bioinformatics and research, under the supervision of the Spanish Ministry for Innovation and Competitivity (MINECO) and the National Institute of Health Carlos III (ISCIII). Protein mapping and the quantitative methods developed will promote a better understanding of human biology in health and disease, leading to the discovery of novel biomarkers and therapeutic targets and to the development of devices with clinical applications for the benefit of patients. In addition to MINECO and ISCIII, Biotech, Pharmaceutical companies and other stakeholders have already enrolled in the project. The research institutions currently participating in the SpHPP are shown: CIB (Centro de Investigaciones Biológicas), CIC bioGUNE, CIMA (Centro para la Investigación Médica Aplicada), CNB (Centro Nacional de Biotecnología), FJD (Fundación Jiménez Díaz), INIBIC (Instituto de Investigación Biomédica A Coruña), PCB (Parc Cientific de Barcelona), UAB (Universidad Autónoma de Barcelona), UCM (Universidad Complutense de Madrid), USAL (Universidad de Salamanca), UV (Universidad de Valencia), VHIO (Vall D’Hebron Instituto de Oncología).
The specific aims include:
(1) Annotation and data analysis of Chr16 to generate a theoretical definition of the Chr16 proteome according to Ensembl, UniprotKB and GPMDB, as well as a tissue gene expression pattern of Chr16 genes using the eGenetics and GNF Atlas (both accessed through the Ensembl Web site). This information will allow selection of tissues/cell types for optimum Chr16 coverage and may prove its value mostly in the case of those gene products with no or faint experimental observations.
(2) Development of SRM/MRM assays for the quantification of Chr16 proteins. The setting up of quantitative SRM/MRM assays is the core of the project and requires attention to all gene products of Chr16 and their variants, especially those proteins which as yet remain elusive. Protein expression systems are already on hand to produce light and heavy versions of some of the challenging proteins, largely enhancing our capacity for the development of efficient SRM assays with optimal results in biological matrices. A library with more than 11000 expression vectors is currently available (M. Fuentes, personal communication, www.dnasu.org, www.plasmid.med.harvard.edu) that might also be of interest for other chromosome initiatives. As recommended in the HPP guidelines, detection and quantification of Chr16 proteins will be initially performed in 3 different cell lines that were selected according to the results of the bioinformatics analysis performed on the existing annotations and public data related to Chr16.
(3) Quantification of Chr16 proteins in control and disease samples. The definition of quantitative alterations of specific proteins related to pathogenic processes is of major interest and hence a priority for this project. Although special attention will be dedicated to Chr16 proteins, the study will not be restricted solely to these proteins, as most of the diseases also involve genes and/or gene products located in different chromosomes. Of special interest are proteins measurable in biofluids, as the consortium is aware of the relevance of developing diagnostic tests based on noninvasive procedures. These research avenues will be explored in close coordination with national clinical and biomedical research initiatives, including the Spanish National Biobank Network, CIBER, RETICS and other networked entities from the Carlos III National Health Institute (ISCIII) and MINECO, as well as other International Biobank Platforms (BBMRI).
(4) Development of antibody-based protein measurement procedures. This objective will be pursued in close collaboration with the Human Protein Atlas initiative that currently accounts for more than 14079 genes with protein expression profiles based on 17,298 specific antibodies (HPA V10.0). The combination of these resources with our expertise on protein arrays28 ensures the rapid progress of this objective and the capacity for the construction of prototypic devices for preliminary verification of potential biomarkers. Moreover, capture reagents will complement mass spectrometry data relative to protein abundance and tissue and subcellular distribution.
(5) Definition of standardized protocols (SOPs), data formats and bioinformatics pipelines for data submission to the public repositories. This is a central issue that must be carefully considered to enable data sharing under a common quality criterion. Samples must be collected, stored and analyzed following a common protocol that ensures the traceability and the reliable comparison of the results from different laboratories. Moreover, data require complex statistical analysis, integration with other sources of biological information (transcriptomics, metabolomics, etc.) and generation of curated data sets in standardized formats that allow deposition in public repositories. It is worth mentioning that ProteoRed-ISCIII has been deeply involved in the development of methods for data standardization within the Proteomics Standards Initiative of HUPO (HUPO-PSI), with significant contributions to the definition of the Minimum Information About a Proteomics Experiment (MIAPE) documents29 and the PSI-XML formats (http://www.psidev.info/).
(6) Implementation of prototypic devices as precursors of preclinical and clinical instruments. The identification of biomarker panels for stratification of diseased populations will bring clear benefits for society and for the biotech sector in particular. In a preliminary preclinical phase, taking advantage of our own experience, we propose the design of testing devices based on antibodies (ELISA, arrays) for a proof of concept; however, wide population screening and follow-up will require the participation of industrial partners with the capacity to transform the developed prototypes into commercial products with applications for patient care.
Bioinformatics
An essential element within the Chr-16 SpHPP consortium is bioinformatics. The combination of well-known and traditional issues regarding protein and peptide identification, statistical significance or data analysis with current trends in data synchronization and public deposition is pivotal to establishing a comprehensive environment where the resulting data can be shared and studied in depth by the proteomics community. To accomplish this goal, a cross-sectional workgroup has been founded, recruiting experienced computational scientists from across the different laboratories that currently make up the Chr-16 SpHPP consortium. This working group started from the HPP bioinformatics initiative launched in Beijing in May 2012 and has since held teleconferences every two weeks to track progress in the different lines of work. In addition to the leading focus of this bioinformatics team, the use of proteomics data standards for analyzing, storing and reporting the experimental data, the following items summarize the main activities developed so far. (1) Central database of proteomics associated with Chromosome 16. In accordance with the C-HPP’s aims, the information regarding the detection and characterization of the proteins of Chr16 has been deposited in a centralized database. A secure connection was provided to the participating laboratories both to check the reported data and to introduce new experimental evidence for this subset of proteins. The stored data will also be linked to the C-HPP’s reference resources such as neXtProt (http://www.nextprot.org), PeptideAtlas (http://www.peptideatlas.org/) or GPMDB (http://www.thegpm.org) through the existing Application Programming Interfaces (APIs) or public URLs. This database also contains protein data and annotations regarding biological function, cellular location, metabolic and cellular pathways or disease relationships. (2) MIAPE-compliant repository for experimental data.
The storage of data under consensus formats ensuring traceability and compliance with QC rules is pivotal in collaborative projects, most importantly if massive amounts of data are being generated, to guarantee efficient global analysis and biological outcomes. In this sense, the HUPO-PSI’s standard data formats and MIAPE guidelines29 are strictly followed and, accordingly, the ProteoRed MIAPE web repository19 is one of the mainstays of the project. Starting from the PSI’s XML-based standards mzML30 and mzIdentML31 and through the ProteoRed MIAPE web toolkit,20 users are able to store all the information derived from the experiments in a straightforward manner, ensuring the compliance of the deposited data with the widely accepted principles gathered in the MIAPE Mass Spectrometry32 and Protein Informatics guidelines33 (Figure 2a). (3) Global analysis of the experimental data. From the data deposited in the ProteoRed MIAPE web repository, a global analysis will be performed with special emphasis on the biological interpretation (Figure 2b). As part of the ProteoRed MIAPE web toolkit, the ProteoRed MIAPE extractor has been developed to provide the users with a user-friendly graphical environment where all generated data can be analyzed and integrated from different experimental and functional perspectives. This software is freely available at http://proteored.org/miape-extractor. (4) Data sharing and reporting. Reporting the results is another key issue in the project. In this regard, assuming that all data are compliant with the MIAPE reporting guidelines, this software environment allows the generation of Mass Spectrometry and Protein and Peptide identification files in formats recommended in the EBI PRIDE XML specifications. As an example, data can be exported using this XML format and stored in the centralized public data repository for protein and peptide identifications, the EMBL-EBI PRIDE database34 (Figure 2c), and consequently shared through the ProteomeXchange Web site (http://www.proteomexchange.org/).35
Figure 2. Data management flow-chart. Data will be uploaded in the ProteoRed MIAPE web repository. The ProteoRed MIAPE extractor will allow data calculations, data set comparisons and general analyses in a friendly and efficient way, a pivotal aspect to integrate and evaluate the information provided by different laboratories. Moreover, through the ProteoRed MIAPE web toolkit, the MIAPE compliant data can be exported into different formats, including those compatible with ongoing HUPO PSI initiatives, and finally deposited in the PRIDE database.
Annotation of Chr16
The gene and protein sets of chromosome 16 were analyzed using the information available in Ensembl, UniprotKB and GPMDB (Figure 3) and the results are summarized in Supplementary Table 1 (Supporting Information). The work plans of both C-Chr-16 HPP and B/D-Chr-16 HPP are designed according to the biochemical information resulting from this analysis. Chromosome 16 spans about 89 million base pairs, representing almost 3% of the total DNA in human cells. More than 2300 genes (Ensembl V68) have so far been identified, although the actual figure is still under debate, as reflected by the differences found between databases. A total of 870 protein-coding genes have been proposed so far on Chr16, 866 of them with a Uniprot reference. Based on the mass spectrometry data in GPMDB, the number of unknown proteins was set at 305; we arbitrarily included all proteins with log(e) values above −15, assuming that their observation might be constrained in complex matrices.
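The log(e) criterion can be made concrete with a small sketch. GPMDB log(e) values are base-10 logarithms of expectation values, so more negative numbers indicate stronger evidence and values above −15 indicate weaker evidence. The code below is an illustration under stated assumptions: the accessions and values are invented, and treating proteins with no GPMDB record as unknown is an assumption rather than something the text specifies.

```python
LOG_E_CUTOFF = -15.0  # threshold quoted in the text


def is_unknown(log_e):
    """True when evidence is weak: no GPMDB record (assumed) or log(e) above the cutoff."""
    return log_e is None or log_e > LOG_E_CUTOFF


# Hypothetical accessions and log(e) values, for illustration only.
gpmdb_log_e = {
    "PROT_A": -120.4,  # strong evidence
    "PROT_B": -15.0,   # exactly at the cutoff, not counted as unknown
    "PROT_C": -3.2,    # weak evidence
    "PROT_D": None,    # no record
}

unknown = sorted(acc for acc, value in gpmdb_log_e.items() if is_unknown(value))
print(unknown)
```

Applied to the real annotation table, the same filter would yield the list of proteins that the consortium prioritizes for targeted assay development.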
been associated with obesity and cachexia.40,41 Nevertheless, we are aware that the multigenic nature of these diseases will require collaboration with other chromosome initiatives for a complete understanding of their molecular pathogenesis.
Transcriptomics and Proteomics Profiling of MCF7, CCD18 and Jurkat Cell Lines
To characterize the proteome of Chr16, three cell lines were selected: MCF7 human breast cancer epithelial cells, CCD18 human colon fibroblasts, and Jurkat human T lymphocytes. Transcriptomic and shotgun proteomics experiments were conducted to define in detail the molecular background of these cell lines. For transcriptomics, the set of expressed genes was selected by defining an intensity threshold for each cell line under study, to eliminate low-signal probe sets that are considered technical noise (Supporting Information Figure 1). The resulting thresholds were 5.10, 4.83, and 5.37 for MCF7, CCD18 and Jurkat cells, respectively. Upon filtering, 81.01, 78.54, and 83.03% of the microarray probe sets were retained for further analysis in MCF7, CCD18 and Jurkat cells, respectively. Accordingly, a total of 19878 genes were expressed in at least one of the three cell lines: 86.13% were common, 0.75% were specifically detected in MCF7 cells, 2.06% in CCD18 and 1.23% in Jurkat cells (Figure 4A). Up to 1533 genes from Chr16 were not detected, most of these being nonprotein coding genes (Figure 4B). However, 84.6% (736) of the Chr16 protein coding genes were detected, among a total of 18465 genes homogeneously distributed across chromosomes, with roughly 75% coverage in all cases and the only exception being chromosome Y (Supporting Information Figure 2), likely due to the low proportion of protein coding genes in this particular chromosome. The high coverage of protein coding genes was expected, as they are preferentially represented in the cDNA microarray used in our analysis. This may also explain the undetected 1533 genes of Chr16, most of which (91%) correspond to nonprotein coding genes. Shotgun proteomic analyses were also conducted for MCF7, CCD18 and Jurkat cell lines.
Assuming an FDR below 1% at the protein level, 6608 proteins were identified: 3355, 3156, and 5892 in MCF7, CCD18 and Jurkat cells, respectively, 29% of them found in all three cell lines (Supporting Information Table 2). The distribution of identified proteins across chromosomes is very dissimilar with chromosome coverages that were about 30% (Supporting Information Figure 2), in clear contrast with the transcriptomic results that showed coverages above 75%, with the exception of ChrY (Supporting Information Figure 3A). This discrepancy most likely results from the limitations of proteomics technology in coping with the vast complexity and very high dynamic range of the proteome, compared with gene expression microarray technology. This pilot study is being used to determine protein detection thresholds that will provide important hints for our proteomic workflows, particularly to establish SRM/MRM methods. Three aspects are central to extending our proteome coverage: optimization of sample preparation to enhance the extraction of membrane proteins, fractionation procedures to enrich low abundance proteins, and the definition of additional cell lines and environmental conditions to detect proteins specifically expressed under certain stimuli or pathological conditions. It is worth mentioning that these ongoing shotgun analyses are being performed on a multicenter basis encompassing nine laboratories and that the data will be analyzed following the procedures mentioned above. This will allow not only the generation of a full collection of MIAPE compliant results but also inter- and
Figure 3. Bioinformatic analysis of Chr16. Biological information from ENSEMBL, SWISSPROT and GPM was integrated to define the sets of genes and protein coding genes of Chr16. Disease-related information was extracted from OMIM. The availability of HPA antibodies for Chr16 gene products, including those for unknown proteins (log(e) > −15 in GPMDB), and of expression vectors was estimated. MCF7, CCD18 and Jurkat cell lines are estimated to theoretically cover about 80% of the Chr16 proteome. Transcriptomic and proteomic analyses were designed to define in detail their biological background and to plan the strategy for the SRM/MRM experiments.
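The theoretical coverage figures quoted for a combination of cell types amount to a set union over per-cell-type expression lists, intersected with the Chr16 gene set. A minimal Python sketch of that calculation follows; the gene identifiers and expression lists are toy values invented for illustration, and the actual eGenetics and GNF Atlas data are not reproduced here.

```python
from typing import Dict, Set


def coverage_percent(target: Set[str], *expressed: Set[str]) -> float:
    """Percentage of `target` genes present in at least one expression set."""
    if not target:
        return 0.0
    covered = target & set().union(*expressed)
    return 100.0 * len(covered) / len(target)


# Hypothetical gene sets (not real Chr16 annotations).
chr16_genes: Set[str] = {"g1", "g2", "g3", "g4", "g5"}
missing_genes: Set[str] = {"g4", "g5"}  # subset lacking protein evidence

# Hypothetical expression lists per cell type; "x9" lies outside Chr16.
expression: Dict[str, Set[str]] = {
    "epithelial": {"g1", "g2", "x9"},
    "fibroblast": {"g2", "g3"},
    "lymphoid": {"g4"},
}

total_coverage = coverage_percent(chr16_genes, *expression.values())
missing_coverage = coverage_percent(missing_genes, *expression.values())
print(f"all Chr16 genes: {total_coverage:.1f}%, missing subset: {missing_coverage:.1f}%")
```

The same function can be called with different combinations of cell types to compare candidate panels before committing to a set of cell lines.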
This log(e) > −15 threshold provides a slightly larger cluster of missing proteins than the EC < 4 (nongreen coded) proteins in GPMDB (250 missing proteins). In addition, to select three cell lines in which to identify and quantify proteins from chromosome 16, tissue- and cell-specific expression patterns were evaluated using eGenetics and the GNF Atlas. A combination of fibroblasts, lymphoid and epithelial cells might theoretically provide up to 71% coverage of chromosome 16 proteins and 39.7% of the missing species. Finally, the availability of HPA antibodies and protein expression resources for studying Chr16 proteins was evaluated. HPA antibodies (612, of which 512 are considered high quality or supportive for at least one application) are already available for 486 Chr16 genes, including 121 for proteins within the unknown group. Moreover, expression vectors are available for 260 proteins, 58 of them in the unknown group. On the one hand, these tools guarantee the availability of methods for protein detection and quantification and, on the other, the ability to produce nonobserved proteins to optimize mass spectrometry methods. Functional analysis with Gene Ontology (GO) and Ingenuity Pathway Analysis (IPA) revealed the implication of chromosome 16 genes in most of the principal cell functions, as might be expected from a search with a large number of genes, including, among others, metabolism, cell proliferation, cell signaling and cell death. It is hardly surprising that 110 of these genes are involved in human diseases (OMIM) such as cancers, neurodegenerative syndromes, obesity and inflammation, a pathological condition commonly involved in the onset of many diseases. Proteins encoded by chromosome 16 genes have already been identified in this context, such as cardiotrophin 1, a protein involved in the maintenance of the cellular energy balance36 and liver protection37−39 that is located in the 16p11.2 locus, which has
Figure 4. (A) Venn diagrams representing transcriptomics data. Gene expression data from MCF7, CCD18 and Jurkat cells, and their intersection with Chr16 genes, are represented. Different subgroups can be easily followed using the distinctive colored lines under each corresponding figure. Of the 2316 genes on chromosome 16, expression of 783 (33.8%) was detected, 95% of them in all three cell lines. (B) Intersection of expressed genes in all cell lines with total and Chr16 specific protein coding genes. Most Chr16 genes whose expression is detected are protein coding genes, 736 (93.99%), representing 85% of the protein coding genes of this chromosome. Ensembl version 68 was used as the reference database.
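The filtering and Venn partition summarized in Figure 4 can be sketched in a few lines of Python. The per-cell-line thresholds are those reported in the text; everything else is an assumption made for illustration: signals are given per gene rather than per probe set, a gene is kept when its signal reaches the threshold, and the signal values are toy numbers.

```python
from typing import Dict, Set

# Intensity thresholds reported in the text for each cell line.
THRESHOLDS: Dict[str, float] = {"MCF7": 5.10, "CCD18": 4.83, "Jurkat": 5.37}


def expressed_genes(signal: Dict[str, float], threshold: float) -> Set[str]:
    """Genes whose signal reaches the threshold (assumed inclusive)."""
    return {gene for gene, value in signal.items() if value >= threshold}


def venn3(a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, Set[str]]:
    """Regions of a three-set Venn diagram used in the summary statistics."""
    return {
        "common": a & b & c,
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
        "union": a | b | c,
    }


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Hypothetical gene-level signals, invented for illustration.
signals: Dict[str, Dict[str, float]] = {
    "MCF7": {"g1": 7.2, "g2": 5.0, "g3": 6.1},
    "CCD18": {"g1": 6.8, "g2": 5.5, "g3": 4.0},
    "Jurkat": {"g1": 8.0, "g2": 4.9, "g3": 5.2},
}
sets = {line: expressed_genes(signals[line], THRESHOLDS[line]) for line in THRESHOLDS}
regions = venn3(sets["MCF7"], sets["CCD18"], sets["Jurkat"])
print({name: sorted(genes) for name, genes in regions.items()})

# Percentages recomputed from the counts given in the caption and text.
print(percent(783, 2316))  # detected Chr16 genes over all Chr16 genes
print(percent(736, 870))   # detected protein coding genes over proposed ones
```

Recomputing the two ratios from the reported counts gives values consistent with the 33.8% and roughly 85% quoted in the caption.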
Figure 5. Venn diagram representing shotgun proteomic results. Protein identifications from MCF7, CCD18 and Jurkat experiments and the intersection with Chr16 genes are represented. Taken together, the results from the three cell lines detected 292 proteins encoded by Chr16 genes (31.9%). Ensembl version 68 was used as the reference database.
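The 292 Chr16 proteins represented in Figure 5 are the selected set for the GO enrichment analysis, which the text describes as a standard hypergeometric test with the Chr16 protein coding genes as the gene universe. The sketch below shows one common form of that test, the upper-tail probability of seeing at least the observed number of annotated genes among the selected ones. It is an illustration using only the Python standard library, not the authors' implementation; the category counts (40 annotated genes, 25 hits) are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe: int, annotated: int, selected: int, hits: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= hits).

    universe  : number of genes in the gene universe
    annotated : genes in the universe carrying the GO category
    selected  : number of selected genes
    hits      : selected genes carrying the GO category
    """
    if not (0 <= annotated <= universe and 0 <= selected <= universe):
        raise ValueError("annotated and selected must lie within the universe")
    if hits < 0:
        raise ValueError("hits must be non-negative")
    upper = min(annotated, selected)
    total = comb(universe, selected)
    # Exact integer arithmetic; math.comb returns 0 for impossible draws.
    tail = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Universe and selected-set sizes from the text; category counts are hypothetical.
p_value = hypergeom_enrichment_p(universe=870, annotated=40, selected=292, hits=25)
print(f"p = {p_value:.3g}")
```

When many GO categories are tested, the resulting p-values would normally be corrected for multiple testing; the correction procedure is not specified in this section and is left out of the sketch.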
intralaboratory evaluations using the different tools included in the ProteoRed MIAPE web toolkit. These MIAPE compliant results are available in PRIDE (accessions 27330−27369). Among the identified proteins, 292 are encoded by Chr16 genes: 82 were common to the three cell lines, whereas 22 were found only in MCF7, 10 in CCD18 and 100 in Jurkat cells, and 160 were identified in at least two cell lines (Figure 5). Notably, 9 of the identified proteins belong to the group of missing proteins: Putative 3-phosphoinositide-dependent protein kinase 2, Pyridoxal-dependent decarboxylase domain-containing protein 2, Putative Rab-43-like protein, Calpain small subunit 2, Nodal modulator 3, Chromosome transmission fidelity protein 8 homologue isoform 2, L-fucose kinase, Uncharacterized protein C16orf59, and tRNA-specific adenosine deaminase 1. The identified proteins represent 33.56% coverage of Chr16 protein coding genes and, as might be expected, correspond in all cell lines to proteins encoded by highly expressed genes, with some exceptions (Figure 6A). It is worth noting that 17 proteins were detected at the protein level, 13 in Jurkat, 7 in CCD18 and 10 in MCF7 cells
while the transcript signal was below the accepted threshold (Figure 6B). Whether these findings reflect different half-lives of mRNA and protein, limited detection of particular transcripts due to the design of the gene expression array, or a need to revise the expression threshold defined in this study are open questions that should be investigated further. All the protein and gene expression results are summarized in Supporting Information Table 3, and a heat map representation of the specific data for Chr16 is provided (Supporting Information Figure 4). Additionally, functional enrichment analysis of Gene Ontology (GO) categories was carried out using standard hypergeometric tests.16,42 The set of protein coding genes of chromosome 16 was considered the gene universe and the proteins identified in this chromosome in the three cell lines (292 genes) the selected genes (Supporting Information Figure 4).
B/D-SpHPP
The Biology and Disease section of the HPP was first conceived in a meeting held in June 2012 and the structure of the project and
Figure 6. (A) Comparison of transcriptomic and proteomic data relative to Chr16 protein coding genes. Genes and proteins were ranked according to their expression level and then plotted against the gene expression values for both the genes (blue) and the proteins (red). The gene expression threshold is represented by the blue line. (B) Overlap of proteins and transcripts in MCF7, CCD18 and Jurkat cell lines. The total number of proteins from shotgun experiments (open bars) and the corresponding transcripts detected by gene expression microarray analysis (gray bars) are represented.
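The comparison in panel B reduces to asking, for each cell line, which proteins identified by shotgun proteomics have a transcript signal below that cell line's expression threshold. A minimal Python sketch of this check follows. The gene names and signal values are invented, the threshold shown is the one reported for Jurkat cells, and treating genes with no probe on the array as below threshold is an assumption.

```python
from typing import Dict, Set


def proteins_below_transcript_threshold(
    detected_proteins: Set[str],
    transcript_signal: Dict[str, float],
    threshold: float,
) -> Set[str]:
    """Detected proteins whose transcript signal is under the threshold.

    Assumption: a gene absent from `transcript_signal` (no probe on the
    array) is treated as below threshold.
    """
    return {
        gene
        for gene in detected_proteins
        if transcript_signal.get(gene, float("-inf")) < threshold
    }


# Toy data for one cell line; values are illustrative only.
jurkat_threshold = 5.37
jurkat_proteins = {"g1", "g2", "g3", "g4"}
jurkat_signal = {"g1": 9.4, "g2": 5.1, "g3": 5.37}  # g4 has no probe

discordant = proteins_below_transcript_threshold(
    jurkat_proteins, jurkat_signal, jurkat_threshold
)
print(sorted(discordant))
```

Listing these discordant genes per cell line gives a starting point for the follow-up questions raised in the text about transcript detection limits and threshold choice.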
Organizations for clinical/biomedical research (Spanish Biobanking Network, CIBER, RETICS) that have already expressed their interest in participating in the HPP project. Although the final structure will be decided in a meeting to be held in the last quarter of 2012, initial proposals have already been drawn up, mostly focusing on the application of the resources developed by the C-HPP to the investigation of acute coronary syndrome and aortic stenosis; Parkinson’s disease and muscular dystrophies; the involvement of cardiotrophin-1 and prohibitin-1 in obesity and nonalcoholic liver diseases; osteoarthritis and rheumatoid arthritis; and the characterization of the innate and adaptive immune response to Candida albicans infection.
the areas of interest (cancer, obesity, and infectious, neurodegenerative, cardiovascular and rheumatoid disorders) were agreed on in a second meeting in June (Supporting Information Figure 5). Three main driving criteria for the selection process were established: social impact in Spain, the experience of the participating groups, and the involvement of Chr16 proteins. The initial objective is to launch the initiative taking advantage of the valuable experience of the scientists in the consortium and the financial support already available for their own research activities in the above-mentioned areas, and to integrate the contributions from the rest of the scientific community in a second phase, in coordination with National Institutes and
Conclusion and Perspectives
Subramanian, A.; Wyman, D.; Rogers, J.; Sulston, J.; Ainscough, R.; Beck, S.; Bentley, D.; Burton, J.; Clee, C.; Carter, N.; Coulson, A.; Deadman, R.; Deloukas, P.; Dunham, A.; Dunham, I.; Durbin, R.; French, L.; Grafham, D.; Gregory, S.; Hubbard, T.; Humphray, S.; Hunt, A.; Jones, M.; Lloyd, C.; McMurray, A.; Matthews, L.; Mercer, S.; Milne, S.; Mullikin, J. C.; Mungall, A.; Plumb, R.; Ross, M.; Shownkeen, R.; Sims, S.; Waterston, R. H.; Wilson, R. K.; Hillier, L. W.; McPherson, J. D.; Marra, M. A.; Mardis, E. R.; Fulton, L. A.; Chinwalla, A. T.; Pepin, K. H.; Gish, W. R.; Chissoe, S. L.; Wendl, M. C.; Delehaunty, K. D.; Miner, T. L.; Delehaunty, A.; Kramer, J. B.; Cook, L. L.; Fulton, R. S.; Johnson, D. L.; Minx, P. J.; Clifton, S. W.; Hawkins, T.; Branscomb, E.; Predki, P.; Richardson, P.; Wenning, S.; Slezak, T.; Doggett, N.; Cheng, J. F.; Olsen, A.; Lucas, S.; Elkin, C.; Uberbacher, E.; Frazier, M.; Gibbs, R. A.; Muzny, D. M.; Scherer, S. E.; Bouck, J. B.; Sodergren, E. J.; Worley, K. C.; Rives, C. M.; Gorrell, J. H.; Metzker, M. L.; Naylor, S. L.; Kucherlapati, R. S.; Nelson, D. L.; Weinstock, G. M.; Sakaki, Y.; Fujiyama, A.; Hattori, M.; Yada, T.; Toyoda, A.; Itoh, T.; Kawagoe, C.; Watanabe, H.; Totoki, Y.; Taylor, T.; Weissenbach, J.; Heilig, R.; Saurin, W.; Artiguenave, F.; Brottier, P.; Bruls, T.; Pelletier, E.; Robert, C.; Wincker, P.; Smith, D. R.; Doucette-Stamm, L.; Rubenfield, M.; Weinstock, K.; Lee, H. M.; Dubois, J.; Rosenthal, A.; Platzer, M.; Nyakatura, G.; Taudien, S.; Rump, A.; Yang, H.; Yu, J.; Wang, J.; Huang, G.; Gu, J.; Hood, L.; Rowen, L.; Madan, A.; Qin, S.; Davis, R. W.; Federspiel, N. A.; Abola, A. P.; Proctor, M. J.; Myers, R. M.; Schmutz, J.; Dickson, M.; Grimwood, J.; Cox, D. R.; Olson, M. V.; Kaul, R.; Raymond, C.; Shimizu, N.; Kawasaki, K.; Minoshima, S.; Evans, G. A.; Athanasiou, M.; Schultz, R.; Roe, B. A.; Chen, F.; Pan, H.; Ramser, J.; Lehrach, H.; Reinhardt, R.; McCombie, W. 
R.; la Bastide, de, M.; Dedhia, N.; Blöcker, H.; Hornischer, K.; Nordsiek, G.; Agarwala, R.; Aravind, L.; Bailey, J. A.; Bateman, A.; Batzoglou, S.; Birney, E.; Bork, P.; Brown, D. G.; Burge, C. B.; Cerutti, L.; Chen, H. C.; Church, D.; Clamp, M.; Copley, R. R.; Doerks, T.; Eddy, S. R.; Eichler, E. E.; Furey, T. S.; Galagan, J.; Gilbert, J. G.; Harmon, C.; Hayashizaki, Y.; Haussler, D.; Hermjakob, H.; Hokamp, K.; Jang, W.; Johnson, L. S.; Jones, T. A.; Kasif, S.; Kaspryzk, A.; Kennedy, S.; Kent, W. J.; Kitts, P.; Koonin, E. V.; Korf, I.; Kulp, D.; Lancet, D.; Lowe, T. M.; McLysaght, A.; Mikkelsen, T.; Moran, J. V.; Mulder, N.; Pollara, V. J.; Ponting, C. P.; Schuler, G.; Schultz, J.; Slater, G.; Smit, A. F.; Stupka, E.; Szustakowski, J.; Thierry-Mieg, D.; Thierry-Mieg, J.; Wagner, L.; Wallis, J.; Wheeler, R.; Williams, A.; Wolf, Y. I.; Wolfe, K. H.; Yang, S. P.; Yeh, R. F.; Collins, F.; Guyer, M. S.; Peterson, J.; Felsenfeld, A.; Wetterstrand, K. A.; Patrinos, A.; Morgan, M. J.; de Jong, P.; Catanese, J. J.; Osoegawa, K.; Shizuya, H.; Choi, S.; Chen, Y. J.; Szustakowki, J.; International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001, 409, 860−921. (2) Venter, J. C.; Adams, M. D.; Myers, E. W.; Li, P. W.; Mural, R. J.; Sutton, G. G.; Smith, H. O.; Yandell, M.; Evans, C. A.; Holt, R. A.; Gocayne, J. D.; Amanatides, P.; Ballew, R. M.; Huson, D. H.; Wortman, J. R.; Zhang, Q.; Kodira, C. D.; Zheng, X. H.; Chen, L.; Skupski, M.; Subramanian, G.; Thomas, P. D.; Zhang, J.; Gabor Miklos, G. L.; Nelson, C.; Broder, S.; Clark, A. G.; Nadeau, J.; McKusick, V. A.; Zinder, N.; Levine, A. J.; Roberts, R. 
J.; Simon, M.; Slayman, C.; Hunkapiller, M.; Bolanos, R.; Delcher, A.; Dew, I.; Fasulo, D.; Flanigan, M.; Florea, L.; Halpern, A.; Hannenhalli, S.; Kravitz, S.; Levy, S.; Mobarry, C.; Reinert, K.; Remington, K.; Abu-Threideh, J.; Beasley, E.; Biddick, K.; Bonazzi, V.; Brandon, R.; Cargill, M.; Chandramouliswaran, I.; Charlab, R.; Chaturvedi, K.; Deng, Z.; Di Francesco, V.; Dunn, P.; Eilbeck, K.; Evangelista, C.; Gabrielian, A. E.; Gan, W.; Ge, W.; Gong, F.; Gu, Z.; Guan, P.; Heiman, T. J.; Higgins, M. E.; Ji, R. R.; Ke, Z.; Ketchum, K. A.; Lai, Z.; Lei, Y.; Li, Z.; Li, J.; Liang, Y.; Lin, X.; Lu, F.; Merkulov, G. V.; Milshina, N.; Moore, H. M.; Naik, A. K.; Narayan, V. A.; Neelam, B.; Nusskern, D.; Rusch, D. B.; Salzberg, S.; Shao, W.; Shue, B.; Sun, J.; Wang, Z.; Wang, A.; Wang, X.; Wang, J.; Wei, M.; Wides, R.; Xiao, C.; Yan, C.; Yao, A.; Ye, J.; Zhan, M.; Zhang, W.; Zhang, H.; Zhao, Q.; Zheng, L.; Zhong, F.; Zhong, W.; Zhu, S.; Zhao, S.; Gilbert, D.; Baumhueter, S.; Spier, G.; Carter, C.; Cravchik, A.; Woodage, T.; Ali, F.; An, H.; Awe, A.; Baldwin, D.; Baden, H.; Barnstead, M.; Barrow, I.; Beeson, K.; Busam, D.; Carver, A.; Center, A.; Cheng, M. L.; Curry, L.; Danaher, S.; Davenport, L.; Desilets, R.; Dietz, S.; Dodson, K.; Doup, L.;
The C-HPP Chr16 program is fully active in all the activities described above and a first timeline of deliverables has been proposed (Supporting Information Figure 6). The SOP and bioinformatics activities are planned to cover the whole process, from the generation of raw data to their upload to open access repositories, by the end of 2013. SRM/MRM experiments (assuming one gene, one protein) are expected to extend up to the end of 2014, programming the assays for roughly one-third of Chr16 proteins yearly (including known and unknown proteins). Each assay will be validated by three independent laboratories and by alternative analysis with capture reagents when available. Intervalidation experiments with other European chromosome-centric HPP teams are under evaluation. This activity will span up to the end of 2016. The first phase is already ongoing and a follow-up meeting will take place in December. Isoform detection and posttranslationally modified species will be tackled in parallel, once the assay for the most predominant species is optimized. As for the B/D-SpHPP, collaboration has been started with the Spanish National Biobank Network to define the SOPs for sample collection and storage. The proposed time frame fits well with the schedule of the global C-HPP project, with a particular milestone in 2014, when the HUPO congress will be hosted in Madrid and the outcomes concerning protein mapping and the contribution to the understanding of human biology in health and disease will be discussed.
■
ASSOCIATED CONTENT
Supporting Information
Supplemental tables and figures. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
#Corresponding author: Prof. Fernando J. Corrales, Center for Applied Medical Research (CIMA), University of Navarra, Pío XII, 55, 31008 Pamplona, Spain. Tel: +34948194700. Fax: +34948194717. E-mail: [email protected].
Author Contributions
$These authors contributed equally to this manuscript.
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
All participating laboratories are members of ProteoRed-ISCIII. This work was supported by: ProteoRed and the Carlos III National Health Institute Agreement, ProteoRed-ISCIII; the agreement between FIMA and the “UTE project CIMA”; grants SAF2011-29312 from Ministerio de Ciencia e Innovación and ISCIII-RETIC RD06/0020 to F.J.C. and EU FP7 grant ProteomeXchange (grant number 260558); BBVA Foundation for its support to HUPO initiatives.
■
REFERENCES
(1) Lander, E. S.; Linton, L. M.; Birren, B.; Nusbaum, C.; Zody, M. C.; Baldwin, J.; Devon, K.; Dewar, K.; Doyle, M.; FitzHugh, W.; Funke, R.; Gage, D.; Harris, K.; Heaford, A.; Howland, J.; Kann, L.; Lehoczky, J.; LeVine, R.; McEwan, P.; McKernan, K.; Meldrim, J.; Mesirov, J. P.; Miranda, C.; Morris, W.; Naylor, J.; Raymond, C.; Rosetti, M.; Santos, R.; Sheridan, A.; Sougnez, C.; Stange-Thomann, N.; Stojanovic, N.;
Ferriera, S.; Garg, N.; Gluecksmann, A.; Hart, B.; Haynes, J.; Haynes, C.; Heiner, C.; Hladun, S.; Hostin, D.; Houck, J.; Howland, T.; Ibegwam, C.; Johnson, J.; Kalush, F.; Kline, L.; Koduru, S.; Love, A.; Mann, F.; May, D.; McCawley, S.; McIntosh, T.; McMullen, I.; Moy, M.; Moy, L.; Murphy, B.; Nelson, K.; Pfannkoch, C.; Pratts, E.; Puri, V.; Qureshi, H.; Reardon, M.; Rodriguez, R.; Rogers, Y. H.; Romblad, D.; Ruhfel, B.; Scott, R.; Sitter, C.; Smallwood, M.; Stewart, E.; Strong, R.; Suh, E.; Thomas, R.; Tint, N. N.; Tse, S.; Vech, C.; Wang, G.; Wetter, J.; Williams, S.; Williams, M.; Windsor, S.; Winn-Deen, E.; Wolfe, K.; Zaveri, J.; Zaveri, K.; Abril, J. F.; Guigó, R.; Campbell, M. J.; Sjolander, K. V.; Karlak, B.; Kejariwal, A.; Mi, H.; Lazareva, B.; Hatton, T.; Narechania, A.; Diemer, K.; Muruganujan, A.; Guo, N.; Sato, S.; Bafna, V.; Istrail, S.; Lippert, R.; Schwartz, R.; Walenz, B.; Yooseph, S.; Allen, D.; Basu, A.; Baxendale, J.; Blick, L.; Caminha, M.; Carnes-Stine, J.; Caulk, P.; Chiang, Y. H.; Coyne, M.; Dahlke, C.; Mays, A.; Dombroski, M.; Donnelly, M.; Ely, D.; Esparham, S.; Fosler, C.; Gire, H.; Glanowski, S.; Glasser, K.; Glodek, A.; Gorokhov, M.; Graham, K.; Gropman, B.; Harris, M.; Heil, J.; Henderson, S.; Hoover, J.; Jennings, D.; Jordan, C.; Jordan, J.; Kasha, J.; Kagan, L.; Kraft, C.; Levitsky, A.; Lewis, M.; Liu, X.; Lopez, J.; Ma, D.; Majoros, W.; McDaniel, J.; Murphy, S.; Newman, M.; Nguyen, T.; Nguyen, N.; Nodell, M.; Pan, S.; Peck, J.; Peterson, M.; Rowe, W.; Sanders, R.; Scott, J.; Simpson, M.; Smith, T.; Sprague, A.; Stockwell, T.; Turner, R.; Venter, E.; Wang, M.; Wen, M.; Wu, D.; Wu, M.; Xia, A.; Zandieh, A.; Zhu, X. The sequence of the human genome. Science 2001, 291, 1304−1351. (3) Perkel, J. M. The human proteome project takes shape down under. BioTechniques 2011, 50, 149−155. (4) Legrain, P.; Aebersold, R.; Archakov, A.; Bairoch, A.; Bala, K.; Beretta, L.; Bergeron, J.; Borchers, C. H.; Corthals, G. L.; Costello, C. E.; Deutsch, E. 
W.; Domon, B.; Hancock, W.; He, F.; Hochstrasser, D.; Marko-Varga, G.; Salekdeh, G. H.; Sechi, S.; Snyder, M.; Srivastava, S.; Uhlen, M.; Wu, C. H.; Yamamoto, T.; Paik, Y.-K.; Omenn, G. S. The Human Proteome Project: current state and future direction. Mol. Cell. Proteomics 2011, 10, M111.009993. (5) Nilsson, T.; Mann, M.; Aebersold, R.; Yates, J. R.; Bairoch, A.; Bergeron, J. J. M. Mass spectrometry in high-throughput proteomics: ready for the big time. Nat. Methods 2010, 7, 681−685. (6) Orchard, S.; Hermjakob, H.; Taylor, C. F.; Potthast, F.; Jones, P.; Zhu, W.; Julian, R. K.; Apweiler, R. Second proteomics standards initiative spring workshop. Expert Rev. Proteomics 2005, 2, 287−289. (7) Omenn, G. S. Exploring the human plasma proteome. Proteomics 2005, 5, 3223−3225. (8) He, F. Human liver proteome project: plan, progress, and perspectives. Mol. Cell. Proteomics 2005, 4, 1841−1848. (9) Uhlen, M.; Björling, E.; Agaton, C.; Szigyarto, C. A.-K.; Amini, B.; Andersen, E.; Andersson, A.-C.; Angelidou, P.; Asplund, A.; Asplund, C.; Berglund, L.; Bergström, K.; Brumer, H.; Cerjan, D.; Ekström, M.; Elobeid, A.; Eriksson, C.; Fagerberg, L.; Falk, R.; Fall, J.; Forsberg, M.; Björklund, M. G.; Gumbel, K.; Halimi, A.; Hallin, I.; Hamsten, C.; Hansson, M.; Hedhammar, M.; Hercules, G.; Kampf, C.; Larsson, K.; Lindskog, M.; Lodewyckx, W.; Lund, J.; Lundeberg, J.; Magnusson, K.; Malm, E.; Nilsson, P.; Odling, J.; Oksvold, P.; Olsson, I.; Oster, E.; Ottosson, J.; Paavilainen, L.; Persson, A.; Rimini, R.; Rockberg, J.; Runeson, M.; Sivertsson, A.; Sköllermo, A.; Steen, J.; Stenvall, M.; Sterky, F.; Strömberg, S.; Sundberg, M.; Tegel, H.; Tourle, S.; Wahlund, E.; Waldén, A.; Wan, J.; Wernérus, H.; Westberg, J.; Wester, K.; Wrethagen, U.; Xu, L. L.; Hober, S.; Pontén, F. A human protein atlas for normal and cancer tissues based on antibody proteomics. Mol. Cell. Proteomics 2005, 4, 1920−1932. (10) Hamacher, M.; Marcus, K.; Stephan, C.; Klose, J.; Park, Y. M.; Meyer, H. E. 
HUPO Brain Proteome Project: toward a code of conduct. Mol. Cell. Proteomics 2008, 7, 457. (11) Yamamoto, T.; Langham, R. G.; Ronco, P.; Knepper, M. A.; Thongboonkerd, V. Towards standard protocols and guidelines for urine proteomics: a report on the Human Kidney and Urine Proteome Project (HKUPP) symposium and workshop, 6 October 2007, Seoul, Korea and 1 November 2007, San Francisco, CA, USA. Proteomics 2008, 8, 2156−2159.
(12) Paik, Y.-K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H.-J.; Na, K.; Jeong, S.-K.; He, F.; Binz, P.-A.; Nishimura, T.; Keown, P.; Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S. Standard guidelines for the Chromosome-centric Human Proteome Project. J. Proteome Res. 2012, 11 (4), 2005−2013. (13) Paik, Y.-K.; Jeong, S.-K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H.-J.; Na, K.; Choi, E.-Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J.-Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E.-Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S. The Chromosome-centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol. 2012, 30, 221−223. (14) Gentleman, R. C.; Carey, V. J.; Bates, D. M.; Bolstad, B.; Dettling, M.; Dudoit, S.; Ellis, B.; Gautier, L.; Ge, Y.; Gentry, J.; Hornik, K.; Hothorn, T.; Huber, W.; Iacus, S.; Irizarry, R.; Leisch, F.; Li, C.; Maechler, M.; Rossini, A. J.; Sawitzki, G.; Smith, C.; Smyth, G.; Tierney, L.; Yang, J. Y.; Zhang, J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5, R80. (15) Irizarry, R. A.; Bolstad, B. M.; Collin, F.; Cope, L. M.; Hobbs, B.; Speed, T. P. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31, e15. (16) Storey, J. D.; Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 9440−9445. (17) Marcilla, M.; Alpizar, A.; Paradela, A.; Albar, J.-P. A systematic approach to assess amino acid conversions in SILAC experiments. Talanta 2011, 84, 430−436. (18) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4, 207−214. 
(19) Martínez-Bartolomé, S.; Medina-Aunon, J. A.; Jones, A. R.; Albar, J. P. Semi-automatic tool to describe, store and compare proteomics experiments based on MIAPE compliant reports. Proteomics 2010, 10, 1256−1260. (20) Medina-Aunon, J. A.; Martínez-Bartolomé, S.; Lopez-Garcia, M. A.; Salazar, E.; Navajas, R.; Jones, A. R.; Paradela, A.; Albar, J. P. The ProteoRed MIAPE web toolkit: a user-friendly framework to connect and share proteomics standards. Mol. Cell. Proteomics 2011, 10, M111.008334. (21) Bech-Serra, J.-J.; Borthwick, A.; Colomé, N.; ProteoRed Consortium; Albar, J.-P.; Wells, M.; Sánchez del Pino, M.; Canals, F. A multi-laboratory study assessing reproducibility of a 2D-DIGE differential proteomic experiment. J. Biomol. Tech. 2009, 20, 293−296. (22) Martínez-Bartolomé, S.; Blanco, F.; Albar, J.-P. Relevance of proteomics standards for the ProteoRed Spanish organization. J. Proteomics 2010, 73, 1061−1066. (23) Babel, I.; Barderas, R.; Diaz-Uriarte, R.; Moreno, V.; Suarez, A.; Fernandez-Aceñ ero, M. J.; Salazar, R.; Capellá, G.; Casal, J. I. Identification of MST1/STK4 and SULF1 proteins as autoantibody targets for the diagnosis of colorectal cancer by using phage microarrays. Mol. Cell. Proteomics 2011, 10, M110.001784. (24) Pitarch, A.; Nombela, C.; Gil, C. Prediction of the clinical outcome in invasive candidiasis patients based on molecular fingerprints of five anti-Candida antibodies in serum. Mol. Cell. Proteomics 2011, 10, M110.004010. (25) Calamia, V.; Fernández-Puente, P.; Mateos, J.; Lourido, L.; Rocha, B.; Montell, E.; Vergés , J.; Ruiz-Romero, C.; Blanco, F. J. Pharmacoproteomic study of three different chondroitin sulfate compounds on intracellular and extracellular human chondrocyte proteomes. Mol. Cell. Proteomics 2012, 11, M111.013417. (26) la Cuesta, de, F.; Alvarez-Llamas, G.; Maroto, A. S.; Donado, A.; Zubiri, I.; Posada, M.; Padial, L. R.; Pinto, A. G.; Barderas, M. G.; Vivanco, F. 
A proteomic focus on the alterations occurring at the human atherosclerotic coronary intima. Mol. Cell. Proteomics 2011, 10, M110.003517. (27) Sánchez-Quiles, V.; Mora, M. I.; Segura, V.; Greco, A.; Epstein, A. L.; Foschini, M. G.; Dayon, L.; Sánchez, J.-C.; Prieto, J.; Corrales, F. J.; Santamaria, E. HSV-1 Cgal+ infection promotes quaking RNA binding protein production and induces nuclear-cytoplasmic shuttling of
dx.doi.org/10.1021/pr300898u | J. Proteome Res. 2013, 12, 112−122