The PeptideAtlas of the Domestic Laying Hen - ACS Publications

Feb 7, 2017 - and David C. Muddiman*,†. †. W.M. Keck FTMS Laboratory for Human Health Research, Department of Chemistry, North Carolina State ...
Article pubs.acs.org/jpr

The PeptideAtlas of the Domestic Laying Hen

James McCord,† Zhi Sun,‡ Eric W. Deutsch,‡ Robert L. Moritz,‡ and David C. Muddiman*,†

† W.M. Keck FTMS Laboratory for Human Health Research, Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
‡ Institute for Systems Biology, Seattle, Washington 98109, United States

Supporting Information

ABSTRACT: Proteomics-based biological research is greatly expanded by high-quality mass spectrometry studies, which are themselves enabled by access to quality mass spectrometry resources, such as high-quality curated proteome data repositories. We present a PeptideAtlas for the domestic chicken, containing an extensive and robust collection of chicken tissue and plasma samples with substantial value for the chicken proteomics community for protein validation and design of downstream targeted proteome quantitation. The chicken PeptideAtlas contains 6646 canonical proteins at a protein FDR of 1.3%, derived from ∼100 000 peptides at a peptide level FDR of 0.1%. The rich collection of readily accessible data is easily mined for the purposes of data validation and experimental planning, particularly in the realm of developing proteome quantitation workflows. Herein we demonstrate the use of the atlas to mine information on common chicken acute-phase proteins and biomarkers for cancer detection research, as well as their localization and polymorphisms. This wealth of information will support future proteome-based research using this highly important agricultural organism in pursuit of both chicken and human health outcomes.

KEYWORDS: biomarkers, chicken, ovarian cancer, PeptideAtlas



INTRODUCTION

The domestic laying hen (Gallus gallus) is the most abundant avian species on the planet1 and a cornerstone species in food production and classical embryological study.2 This importance was underscored by its selection as the first classical “production” organism to be fully genome sequenced, in 2004.3 With the establishment of functional genomic tools, the chicken has officially entered the postgenomic era dominated by proteomic studies.4 Mass spectrometry (MS)-based proteomics has reached widespread acceptance as a powerful tool to study organisms and protein products through multiple stages of disease, development, and degradation.5 While the experimental techniques are well developed and robust, the application of proteomic methods to nonclassical model organisms remains nascent. This is particularly true for organisms of predominantly agricultural interest, such as chickens, whose investigators frequently lack access to the advanced MS resources available to human health researchers.6 The result is that only ∼10% of the chicken genome is well annotated,7,8 in spite of the enormous potential power of the chicken as a biological model system and its substantive importance as an agricultural product. The chicken was a classic model organism for embryological development in the pregenomic era, and the development of genetic tools has returned it to the forefront.2,9 Furthermore, the hen ovary shares many genetic and physiological traits with that of humans, and the hen is the only nonhuman animal known to spontaneously develop ovarian cancer.10,11 Clearly there is enormous medical and economic potential in proteomics studies involving Gallus gallus that is limited by access to MS resources and the incomplete nature of chicken proteomic/genomic databases.

Successful mass spectrometry experiments are greatly enabled by access to publicly available databases of validated data, such as those found in PeptideAtlas12 or the ProteomeXchange13 consortium database, PRIDE;14,15 however, these databases are currently devoid of substantial information regarding the domestic hen. The completion of a reference database for chicken proteomics would substantially advance experimental planning and data validation for global and targeted analysis in this model system.

PeptideAtlas (http://www.peptideatlas.org) curates and compiles mass spectrometry data derived from a variety of experiments through the reprocessing of available MS data using the Trans-Proteomic Pipeline (TPP), a freely available open source suite of tools for tandem MS experiments.16−18 The resulting proteome builds contain high stringency (70% proteome coverage,21 and includes classical agriculturally important organisms, such as the pig (Sus scrofa domesticus)22 and the cow (Bos taurus).23

Here we present an initial build of a chicken PeptideAtlas, containing a variety of structurally complex ovarian tissue types, as well as diagnostically relevant plasma samples. Currently, the chicken PeptideAtlas is directed toward the hen as an ovarian cancer model organism, but data from liver tissue and plasma are included as a useful baseline, and additional tissue types will be incorporated into later builds. We demonstrate the current utility of the atlas through the mining of reported ovarian cancer biomarkers, as well as general inflammatory response proteins, and the selection of candidate peptides for quantitative SRM analysis.



LC-MS/MS Analyses

Peptide samples were analyzed by LC-MS/MS using one of three sets of conditions:

(1) Separation was performed on a nanoLC-2D (Eksigent Technologies) coupled to an LTQ-FT-Ultra with a 7T FT-ICR Mass Analyzer (Thermo Scientific) in a vented column configuration.29 The column consisted of a 75 μm × 5 cm Integrafrit trap and 75 μm × 15 cm Picofrit column (New Objective) self-packed with 5 μm Magic C18AQ stationary phase. Mobile phases A and B consisted of 98/2/0.2 and 2/98/0.2 water/ACN/formic acid, and the separation was performed as a 60 min gradient from 2 to 60% B at a flow rate of 350 nL/min. Data was collected in Top-8 data-dependent mode over a mass range of 400−1600 m/z with a full scan resolving power of 100 000 FWHM @ 400 m/z using IT-CID and an AGC target of 1e6. Dynamic exclusion windows of 3 min were applied to each MS/MS precursor mass.

(2) Separation was conducted with an EASY nLC II (Thermo Scientific) coupled to a Q Exactive benchtop mass spectrometer (Thermo Scientific). The trap and column were arranged in the vented column configuration as in (1), and used a mobile phase gradient of 2%−5% B (2 min), 5%−40% B (200 min) at a flow rate of 300 nL/min. Parameters for the mass spectrometer were set according to optimums derived for proteomics experiments on the Q Exactive.30 Briefly, data was collected in Top-12 data-dependent MS/MS mode using HCD with a full scan mass range of 400−1600 m/z, a full scan resolving power of 70 000 and MS/MS resolving power of 17 500 FWHM @ 200 m/z, and a dynamic exclusion window of 30 s. AGC targets were set at 1e6 and 2e5 for full scan and MS/MS scan, respectively, with corresponding maximum injection times of 30 ms and 250 ms.

(3) Chromatographic separation was carried out with an EASY nLC-1000 (Thermo Scientific) coupled to a Q Exactive HF (Thermo Scientific). A 75 μm × 5 cm trap was prepared with a frit synthesized as described by Meiring et al.31 and coupled to a 75 μm × 30 cm Picofrit emitter (New Objective). Both were self-packed with 2.6 μm, 100 Å particle size C18 stationary phase (Phenomenex), and assembled in the same column setup as in (1) and (2). Peptides were separated by a gradient of 2−20% B (100 min), 20−32% B (30 min) at 300 nL/min. MS parameters were independently optimized for proteomics experiments on the Q Exactive HF and are reported in Hecht et al.28 The data was collected in Top-20 data-dependent MS/MS mode using HCD with a 15 s dynamic exclusion window, a full scan mass range of 375−1500 m/z, and resolving power of 120 000 and 15 000 FWHM @ 200 m/z for the MS1 and MS2 scans, respectively. The AGC target and maximum injection time were set to 1e6 and 30 ms for the MS1 scan and 1e5 and 30 ms for the MS2 scans.
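As a reading aid, the three acquisition conditions can be collected into a small lookup keyed by the protocol number used in the "instrument protocol" column of Table 1. The Python sketch below is illustrative only: the class and field names are hypothetical, and any value the text does not state (the MS/MS resolving power for the ion-trap scans of condition 1) is left as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AcquisitionMethod:
    """One LC-MS/MS condition. Field names are illustrative, not from the paper."""
    instrument: str
    flow_nl_per_min: int
    top_n: int
    fragmentation: str
    scan_range_mz: Tuple[int, int]
    ms1_resolving_power: int
    ms2_resolving_power: Optional[int]  # None where the text gives no value
    resolving_power_ref_mz: int
    dynamic_exclusion_s: int


# Values transcribed from conditions (1)-(3); 3 min exclusion is stored as 180 s.
METHODS = {
    1: AcquisitionMethod("LTQ-FT-Ultra", 350, 8, "IT-CID",
                         (400, 1600), 100_000, None, 400, 180),
    2: AcquisitionMethod("Q Exactive", 300, 12, "HCD",
                         (400, 1600), 70_000, 17_500, 200, 30),
    3: AcquisitionMethod("Q Exactive HF", 300, 20, "HCD",
                         (375, 1500), 120_000, 15_000, 200, 15),
}


def summarize(protocol: int) -> str:
    """Return a one-line description of the given protocol number."""
    m = METHODS[protocol]
    low, high = m.scan_range_mz
    return (f"({protocol}) {m.instrument}: Top-{m.top_n} {m.fragmentation}, "
            f"{low}-{high} m/z, dynamic exclusion {m.dynamic_exclusion_s} s")
```

For example, `summarize(3)` describes the Q Exactive HF condition, which is the protocol listed for the most recent samples in Table 1.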

EXPERIMENTAL PROCEDURES

Sample Collection and Processing

The biological samples included in the chicken PeptideAtlas were collected from two cohorts of Bovan’s white leghorn commercial laying hens in the Poultry Science department at North Carolina State University. Birds were managed in accordance with the Institute for Laboratory Animal Research Guide, with all husbandry practices approved by and under the oversight of the North Carolina State University Institutional Animal Care and Use Committee (IACUC). Plasma and tissue from the initial cohort of B-strain hens were obtained in a previous study,24 and samples from a second cohort of 300 C-strain hens were collected in the same manner. In brief, ∼2 mL of blood was collected from each bird every 4 months beginning at an age of 26 weeks. The collected blood was centrifuged at 3000g at 4 °C, and the plasma was collected in cryogenic tubes for long-term storage at −80 °C. At an age of 136 weeks, the birds were sacrificed by cervical dislocation and dissected to ascertain gross internal pathology. Tissue was collected from all birds presenting neoplastic lesions on the ovary, oviduct, duodenum, pancreas, GI tract, and/or liver as well as from a selected number of “healthy” birds with no visible distress. Tissues were preserved for proteomic study by snap freezing in liquid nitrogen and stored at −80 °C. Whole tissue samples were lysed using a 1:5 ratio of tissue to buffer containing 50 mM Tris (pH 7.8), 8 M urea, 2 M thiourea, 10 mM EDTA, 10 mM DTT, and 0.001% sodium azide and homogenized using an OMNI TIP Homogenizing kit, as described previously.25 A subset of liver and ovary tissues were instead subjected to laser microdissection to isolate yolk, ovarian stroma, and follicular wall tissue, which were subsequently lysed in a buffer of 100 mM Tris-HCl, pH 7.5, 100 mM dithiothreitol (DTT), and 4% sodium dodecyl sulfate.26 Plasma samples were used as collected.
Cell lysates and plasma were processed for LC-MS/MS analyses using either a 1D SDS-PAGE in-gel digestion protocol24,25 or filter aided sample preparation (FASP)27 as described previously.26 A subset of the FASP prepared samples were deglycosylated on-filter using peptide-N-glycosidase F

Data Processing and PeptideAtlas Assembly

The processing of collected MS data followed the general workflow of Farrah et al.32 with the Trans-Proteomic Pipeline18 as the main component (Figure 1).

DOI: 10.1021/acs.jproteome.6b00952 J. Proteome Res. XXXX, XXX, XXX−XXX

Article

Journal of Proteome Research

Figure 1. Schematic workflow of the PeptideAtlas build process.

Table 1. Summary of Sample Types Included in the Chicken PeptideAtlas, Including Disease Status of Epithelial Ovarian Cancer (EOC) and/or Oviductal Cancer (OVD)^a

| ID | tissue | disease state | instrument protocol | no. spectra ID’d | no. distinct peptides | no. proteins |
|------|------------------------|--------------------|---|--------|-------|------|
| 6757 | liver | healthy | 2 | 659875 | 34645 | 3616 |
| 6773 | ovary | healthy | 1 | 163185 | 15048 | 2725 |
| 6758 | ovary | late-stage EOC | 1 | 170087 | 17160 | 3149 |
| 6762 | ovary | early-stage EOC | 1 | 200708 | 18906 | 3446 |
| 6754 | ovary | late-stage EOC-OVD | 1 | 1234 | 671 | 317 |
| 6772 | oviduct | healthy | 1 | 55924 | 8888 | 1232 |
| 6761 | oviduct | early-stage EOC | 1 | 64427 | 10978 | 1665 |
| 6765 | oviduct | late-stage EOC | 1 | 43347 | 10418 | 1633 |
| 6764 | oviduct | late-stage EOC-OVD | 1 | 24018 | 8716 | 1566 |
| 6774 | ovarian granulosa | healthy | 2 | 69787 | 16457 | 2779 |
| 6770 | ovarian stroma | healthy | 2 | 77159 | 19709 | 2915 |
| 6756 | small white follicle | healthy | 2 | 4938 | 2329 | 576 |
| 6771 | ovary | healthy | 2 | 91301 | 19561 | 2928 |
| 6760 | ovarian follicle yolk | healthy | 2 | 56365 | 13006 | 2178 |
| 6766 | plasma | early-stage EOC | 1 | 161840 | 3594 | 228 |
| 6767 | plasma | late-stage EOC | 1 | 111422 | 3849 | 245 |
| 6753 | plasma | late-stage EOC-OVD | 1 | 57269 | 3150 | 206 |
| 6763 | plasma | healthy | 1 | 188913 | 4277 | 239 |
| 6755 | plasma | healthy | 3 | 133106 | 6751 | 226 |
| 6768 | plasma | late-stage EOC | 3 | 144047 | 7121 | 279 |
| 6801 | ovary | healthy | 3 | 269263 | 29960 | 4149 |
| 6797 | ovary | late-stage EOC | 3 | 292654 | 35261 | 4310 |
| 6795 | oviduct | late-stage EOC | 3 | 331682 | 36078 | 4728 |
| 6799 | oviduct | healthy | 3 | 187123 | 18221 | 2968 |
| 6798 | plasma | healthy | 3 | 255303 | 9250 | 248 |
| 6796 | ovary | healthy | 3 | 482706 | 32718 | 3990 |
| 6800 | ovary | late-stage EOC | 3 | 507992 | 33611 | 4151 |
| 6794 | oviduct | healthy | 3 | 381373 | 26717 | 3033 |
| 6802 | oviduct | late-stage EOC | 3 | 379049 | 36305 | 4461 |

^a A complete list of included files can be found in Table S-1.

First, vendor .RAW files were converted to the mzML format33 using the ProteoWizard34 msConvert tool version 3.0.794. Next, the mzML files were searched using both Comet35 and X!Tandem36 with the hrk-score plugin. The chicken search database was compiled as a nonredundant union of 24 231 chicken sequences from UniProtKB/TrEMBL, 16 354 sequences from Ensembl release 82,37 and the cRAP database. For Q Exactive files, parent mass errors were set to ±20 ppm with product ion mass tolerances of ±20 ppm for X!Tandem and ±0.03 Da for Comet. LTQ-FT data used a parent mass error of ±50 ppm and product mass tolerance of ±0.4 Da for both X!Tandem and Comet. Searching allowed semitryptic peptides and a maximum of two missed cleavages. A static modification of carbamidomethyl on C and variable modifications for oxidation on M, acetylation of protein N-termini, and pyro-glu on Q, E, and C were also used. For samples processed with deglycosylation, a variable modification of deamidation on N was also searched.

The search results were processed with the TPP18,38 version 4.8.1 tools. The pepXML17 output for Comet was used directly, and the TPP tool Tandem2XML was used to convert the native X!Tandem output to pepXML. For each experiment, the set of pepXML files from the two search outputs were processed together with PeptideProphet.39 The two output files were then combined via iProphet.40 The iProphet output was further processed using the reSpect algorithm20 to access spectral chimeras. During reSpect searching, the precursor mass tolerance was increased to match the MS isolation window. For LTQ-FT data, the parent mass tolerance was increased to 2.0 Da and likewise increased to 1.4 Da for Q Exactive data. The new set of mzML files generated by reSpect were searched and processed using the TPP as for the initial files. For Q Exactive data, this process was repeated an additional time, for two complete rounds of reSpect in total. The spectra used for reSpect were filtered for a minimum iProphet probability ≥0.0 during the initial round, and ≥0.5 for the second round of analysis. Using the PeptideAtlas processing pipeline, all the iProphet results from standard and reSpect searches were filtered at a variable probability threshold to maintain a constant FDR threshold for each experiment. The filtered data was assessed with the MAYU software41 to calculate decoy-based FDRs at the peptide-spectrum match (PSM), distinct peptide, and protein levels. All results were collated in the Chicken PeptideAtlas, available at http://www.peptideatlas.org.

with an FDR threshold of 0.01% and over one hundred thousand distinct peptides at a false-discovery rate of 1%. This translates into 6646 canonical proteins at a protein FDR of 1.3%. A substantial number of additional proteins are at least partially distinguishable, meaning that they possess at least one unique, well-detected peptide within the chicken PeptideAtlas; the total number of observed proteins is 11 370, corresponding to 47% of 23 997 predicted gene products of Gallus gallus based on the UniProtKB/TrEMBL database. A sizable fraction of these detected species exist at the single-peptide level, which is to be expected based on the reproductive system focus of the current data sets. Future inclusion of additional sample types is expected to increase the overall gene product coverage of the chicken PeptideAtlas.
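The idea of filtering at a variable probability threshold to hold the FDR constant can be illustrated with a simple target-decoy count. The Python sketch below is not the PeptideAtlas pipeline and not MAYU: it assumes each PSM carries a probability and a decoy flag, estimates FDR as decoys divided by targets among all PSMs at or above a cutoff, and returns the most permissive cutoff that meets the requested FDR. The function name and input layout are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def probability_threshold_for_fdr(
    psms: Iterable[Tuple[float, bool]], max_fdr: float
) -> Optional[float]:
    """Return the lowest probability cutoff whose decoy-estimated FDR <= max_fdr.

    psms: (probability, is_decoy) pairs.
    FDR estimate (an assumption of this sketch): decoys / targets at or above
    the cutoff. Returns None if no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best = None
    i = 0
    while i < len(ranked):
        prob = ranked[i][0]
        # Consume all PSMs tied at this probability before evaluating the cutoff.
        while i < len(ranked) and ranked[i][0] == prob:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = prob
    return best


# Toy data: five target PSMs and one decoy.
example = [(0.99, False), (0.98, False), (0.95, False),
           (0.90, True), (0.85, False), (0.80, False)]
strict_cutoff = probability_threshold_for_fdr(example, 0.0)
relaxed_cutoff = probability_threshold_for_fdr(example, 0.25)
```

Because the cutoff is chosen per data set, two experiments held to the same FDR can end up with different probability thresholds, which is the behavior the text describes.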



Chimeric MS/MS reSpect Analysis

One of the basic assumptions of MS/MS data processing is that the fragment ion spectra obtained are derived from an individual precursor molecule isolated by MS. In spite of continued advancement in MS, precursor isolation windows remain large relative to the achievable mass accuracy of the analyzers. Further, the complexity of whole proteomes remains well beyond the separation capacity of current chromatographic systems. The end result is that fragment spectra are frequently “chimeras” containing peptide fragments from multiple precursors. While this is sometimes an intentional byproduct of the acquisition method, as in data-independent analyses, the collection of chimeras can significantly reduce identifications in data-dependent sampling of complex samples because of limitations in algorithmic MS/MS interpretation.44,45 The build of this initial chicken PeptideAtlas incorporates data processed with the newly developed reSpect tool,20 which enables multiple rounds of processing of complex MS/MS spectra to isolate cofragmented chimeric peptides and increase overall peptide/protein IDs. Examining the output of reSpect reveals the effect of its inclusion on the chicken PeptideAtlas coverage. New peptide identifications derived from the reSpect output account for 825 135 of the 5 125 800 total PSMs (16.1%) included in the chicken PeptideAtlas, only five specific instances of which corresponded to the identification that would originally have been assigned by a standard search. This frequency of spectral chimeras is in accordance with previously reported values.45 Within the chimeric group, 653 874 (79.2%) of the identifications have precursor masses falling outside the m/z tolerances of the initial search and are therefore unintentional precursor inclusions as the result of wide precursor isolation windows. 
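The distinction drawn above, between chimeric identifications that fall inside or outside the precursor tolerance of the initial search, can be made concrete with a short calculation. The Python sketch below is an illustration under stated assumptions, not the reSpect implementation: the function names are hypothetical, the default ±20 ppm window is the Q Exactive search setting from the methods, and the two m/z values are assumed to be compared at the charge state of the identified peptide. The constants reproduce the percentages quoted in the text.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observed m/z relative to a theoretical m/z, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def outside_initial_tolerance(selected_mz: float, peptide_mz: float,
                              tol_ppm: float = 20.0) -> bool:
    """True if the identified peptide could not have matched in the initial search.

    selected_mz: m/z of the precursor the instrument selected for MS/MS.
    peptide_mz:  theoretical m/z of the peptide identified from the spectrum.
    """
    return abs(ppm_error(selected_mz, peptide_mz)) > tol_ppm


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100.0 * part / whole, 1)


# Counts quoted in the text.
TOTAL_PSMS = 5_125_800
RESPECT_PSMS = 825_135
OUTSIDE_TOLERANCE_PSMS = 653_874

respect_share = percent(RESPECT_PSMS, TOTAL_PSMS)                # share of all PSMs
outside_share = percent(OUTSIDE_TOLERANCE_PSMS, RESPECT_PSMS)    # share of reSpect PSMs
```

A co-isolated peptide 0.7 m/z away from a precursor at 500 m/z is about 1400 ppm off, far outside a ±20 ppm search window yet well within a typical isolation window, which is why such peptides are only recovered once the precursor tolerance is widened.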
Interestingly, 378 194 (45.8%) of the PSMs were obtained from spectra that were assigned a ≥ 90% probability in the initial search, while a roughly equivalent number, 387 714 (47.0%), were from spectra whose assignment was given a probability of