Proteogenomic Analysis of Candida glabrata using ... - ACS Publications

Nov 30, 2011 - Candida glabrata is a common opportunistic human pathogen leading to significant mortality in immunosuppressed and immunodeficient ...
Article pubs.acs.org/jpr

Proteogenomic Analysis of Candida glabrata using High Resolution Mass Spectrometry

T. S. Keshava Prasad,*,†,‡,§,∥ H. C. Harsha,† Shivakumar Keerthikumar,†,⊥ Nirujogi Raja Sekhar,†,‡ Lakshmi Dhevi N. Selvan,†,∥ Praveen Kumar,†,∥ Sneha M. Pinto,†,§ Babylakshmi Muthusamy,†,‡ Yashwanth Subbannayya,†,# Santosh Renuse,†,∥,□,¶ Raghothama Chaerkady,†,□,¶ Premendu P. Mathur,‡ Raju Ravikumar,■ and Akhilesh Pandey□,¶,○,▲

† Institute of Bioinformatics, International Technology Park, Bangalore -560 066, India
‡ Centre of Excellence in Bioinformatics, Bioinformatics Centre, School of Life Sciences, Pondicherry University, Puducherry -605 014, India
§ Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India
∥ Amrita School of Biotechnology, Amrita University, Kollam -690 525, India
# Rajiv Gandhi University of Health Sciences, Jayanagar, Bangalore −560 041, India
■ Department of Neuromicrobiology, National Institute of Mental Health and Neuro Sciences, Bangalore -560029, India
□ McKusick-Nathans Institute of Genetic Medicine and ¶ Departments of Biological Chemistry, ○ Pathology and ▲ Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, United States

S Supporting Information *

ABSTRACT: Candida glabrata is a common opportunistic human pathogen leading to significant mortality in immunosuppressed and immunodeficient individuals. We carried out proteomic analysis of C. glabrata using high resolution Fourier transform mass spectrometry with MS resolution of 60000 and MS/MS resolution of 7500. On the basis of 32453 unique peptides identified from 118815 peptide−spectrum matches, we validated 4421 of the 5283 predicted protein-coding genes (83%) in the C. glabrata genome. Further, searching the tandem mass spectra against a six frame translated genome database of C. glabrata resulted in identification of 11 novel protein coding genes and correction of gene boundaries for 14 predicted gene models. A subset of novel protein-coding genes and corrected gene models were validated at the transcript level by RT-PCR and sequencing. Our study illustrates how proteogenomic analysis enabled by high resolution mass spectrometry can enrich genome annotation and should be an integral part of ongoing genome sequencing and annotation efforts.

KEYWORDS: medical mycology, clinical proteomics, candidiasis, candidemia, molecular diagnostics, fungal infection, genome annotation



Special Issue: Microbial and Plant Proteomics
Received: August 26, 2011
Published: November 30, 2011
© 2011 American Chemical Society
dx.doi.org/10.1021/pr200827k | J. Proteome Res. 2012, 11, 247−260

INTRODUCTION

Annotation of protein-coding genes is one of the primary tasks after sequencing of any genome. Conventional approaches used to identify protein-coding genes are mainly dependent on computational algorithms that predict genes. Less commonly, transcriptomic data is used to complement this approach for genome annotation. However, the protein-coding potential of these genes can be confirmed only by investigating the proteome directly. Also, predicted protein start sites and exon boundaries can be ambiguous. Comparative genomic methodologies that are sometimes employed may also miss genes that are unique to an organism because of the lack of similar genes in other species. Thus, mass spectrometry-derived data is invaluable for accurate genome annotation because, in addition to providing evidence for the protein-coding potential of genes, it can help identify novel genes and correct erroneous annotation of known genes.1,2 Proteogenomics, the use of peptide data for annotation of genomes, has been gaining importance in recent times.3−5

Candida glabrata has gained importance as an opportunistic human pathogen owing to its resistance to broad-spectrum antimycotic drugs.6,7 C. glabrata, otherwise considered a nonpathogenic saprophytic organism, has been increasingly reported from patients suffering from acquired immunodeficiency syndrome or undergoing immunosuppressive therapy.8 It causes higher mortality than other non-albicans Candida species, with a reported mortality rate ranging from 50 to 100% among patients with cancer or septicaemia and in bone marrow transplant patients.9−14 In contrast, C. glabrata exhibits low virulence in animal models.15,16 The virulence factors of C. glabrata are not as well understood at the molecular level as those of C. albicans.17,18 Aspartyl proteinases and phospholipases, which are associated with the virulence of other Candida species, have not been observed to play a role in the pathogenesis of C. glabrata infections.19−26

In a recent study, we described a comparative proteomic strategy to identify differentially expressed proteins between proteomes of C. albicans and C. glabrata, with the goal of developing a discovery platform for diagnostics.27 In contrast to C. albicans, which is the most studied organism among Candida species, only limited proteomic studies have been carried out in C. glabrata.7,28 Significantly, although the complete genome sequence of C. glabrata was published in 2004,29 the large majority of protein-coding genes annotated in its genome are still labeled as hypothetical. Therefore, we employed an unbiased global proteomics approach to characterize the proteome of C. glabrata using an LTQ-Orbitrap Velos mass spectrometer (Thermo Scientific, Bremen, Germany). In addition to protein databases, a six-frame translated genome database was also used to search mass spectrometry-derived data for identification of peptides. Mass spectrometry-based proteomics is a reliable method to confirm protein expression in any given system.3−5 Therefore, we used a proteogenomic approach to analyze the proteome and genome annotation of C. glabrata. Peptides identified from the protein database search were used to validate the expression of predicted proteins. In addition, peptides uniquely identified from the translated genome database, termed genome search-specific peptides (GSSPs), were used to discover novel protein-coding regions in the C. glabrata genome.

EXPERIMENTAL PROCEDURES

Yeast Strain and Culture

C. glabrata (MTCC 3814) culture was obtained from the Microbial Type Culture Collection and Gene Bank resource in Chandigarh, India. The cells were cultured in 2% YPD broth at 30 °C with shaking for 8 h. Cells were harvested when the OD600 of the culture was 1. Approximately 10 billion cells were pelleted by centrifuging at 2000 × g for 10 min and the pellets were washed repeatedly using phosphate-buffered saline to remove any proteins derived from the growth medium. After 10 washes, we confirmed the lack of any residual protein/peptide in the supernatant by Lowry’s assay30 and SDS-PAGE. Cell pellets were stored at −80 °C until further analysis. No biological replicates were used in this study.

Protein Extraction

Cell pellets were dissolved in 0.5% SDS and 8 M urea lysis buffer and homogenized for 5 min (Ultra Turrax T8, IKA Works Inc., Wilmington, NC). The samples were subsequently sonicated for 5 min using an ultrasonicator (Microson XL, NY). The cells were then subjected to disruption using glass beads in a cell disruptor (Disruptor Genei SI-D267, Scientific Industries Inc. NY) for 30 min at 4 °C. The samples were centrifuged at 10000 × g for 10 min at 4 °C to obtain a clear supernatant, which was collected in fresh vials. Protein estimation was carried out using Lowry’s assay.30

Trypsin Digestion and Protein/Peptide Fractionation

Approximately 300 μg of protein was resolved by SDS-PAGE. The SDS-PAGE gel was stained with colloidal Coomassie blue and the entire lane was divided into twenty gel bands. The gel bands were excised and subjected to trypsin digestion as described earlier.31 Briefly, reduction and alkylation were carried out using 5 mM dithiothreitol (60 °C for 45 min) and 20 mM iodoacetamide (RT for 10 min), respectively. Trypsin (modified sequencing grade; Promega, Madison, WI, US) was added at an enzyme:substrate ratio of 1:20 and the digestion was carried out at 37 °C for 16 h. After the digestion, the peptides were extracted from the gel bands twice using 0.4% formic acid in 3% acetonitrile, followed by sequential extractions using (i) 0.4% formic acid in 50% acetonitrile and (ii) 100% acetonitrile. The extracted peptides were dried and stored at −80 °C until LC−MS/MS analysis.

Proteins extracted in 8 M urea lysis buffer (∼600 μg protein) were subjected to in-solution digestion using trypsin. Reduction and alkylation of proteins were carried out as described above. Overnight digestion was carried out using trypsin at an enzyme:substrate ratio of 1:20 at 37 °C. The digest was then purified using Sep-pak C18 columns (WAT051910, Waters Corporation, MA) and lyophilized at −52 °C (Operon, Gyeonggi-do, Korea). Subsequently, the sample was split into two equal halves and fractionated by strong cation exchange (SCX) chromatography32 or OFFGEL electrophoresis.33

SCX fractionation was carried out on a PolySulfethyl A column (PolyLC, Columbia, MD; 200 Å, 5 μm, 200 × 2.1 mm) using an Agilent 1200 series HPLC system containing a binary pump, autosampler, UV detector and a fraction collector. For this, the pH of the in-solution digest was adjusted to 2.85 using 1 M phosphoric acid and the digest was diluted to 1 mL with SCX solvent A (10 mM potassium phosphate buffer in 20% ACN, pH 2.85). Fractionation of peptides (0.2 mL fractions) was carried out by a linear gradient of solvent B (10 mM KH2PO4, 350 mM KCl, 20% acetonitrile, pH 2.85) for 70 min. The fractions were completely dried, reconstituted in 40 μL of 0.1% TFA, desalted using C18 stage tips and stored at −80 °C until LC−MS/MS analysis.34

pI-based OFFGEL electrophoresis was carried out using an OFFGEL fractionator (Agilent 3100) as per the manufacturer’s instructions. Briefly, peptides were separated using an IPG strip (pH 3−10) by focusing for 50 kVh with the maximum current set to 50 μA and the maximum voltage set to 4000 V. After the fractionation was complete, a total of 12 fractions were collected and acidified using 1% TFA, desalted using C18 stage tips and stored at −80 °C until LC−MS/MS analysis.34

Mass Spectrometry and Data Analysis

The peptides obtained in a total of 52 fractions (from SDS-PAGE, SCX and OFFGEL fractionation protocols) were analyzed on an LTQ-Orbitrap Velos mass spectrometer interfaced with an Agilent 1200 Series nanoflow liquid chromatography system (Agilent Technologies). The RP-LC system consisted of an enrichment column (75 μm × 2 cm, C18 material 5 μm, 100 Å) and an analytical column (75 μm × 10 cm, C18 material 5 μm, 100 Å) packed in-house. The peptides were loaded on the enrichment column using 97% solvent A (0.1% formic acid) and resolved on the 10 cm C18 reverse phase analytical column using a linear gradient of 10−30% solvent B (90% acetonitrile, 0.1% formic acid) for 60 min at a constant flow rate of 0.4 μL/min. Data was acquired using Xcalibur 2.1 (Thermo Fisher Scientific, Bremen, Germany). The spray voltage and heated capillary temperature were set to 2.0 kV and 225 °C, respectively. The data was acquired in a data dependent manner. From each MS survey scan, the 20 most intense precursor ions with a charge state of ≥2 were selected for fragmentation. The peptides were fragmented by HCD with a normalized collision energy of 38. MS scans were acquired at a resolution of 60000 at 400 m/z while MS/MS scans were acquired at a resolution of 7500. The automatic gain control (AGC) for full FT MS was set to 1 million ions and for FT MS/MS was set to 0.1 million ions, with maximum accumulation times of 750 and 100 ms, respectively.

We searched mass spectrometry-derived data using two search algorithms, Sequest and X!Tandem, separately against (i) a protein database composed of protein sequences of C. glabrata from NCBI and Genolevures (http://genolevures.org/download.html) and (ii) a database of the six-frame translated genome of C. glabrata. The protein database comprised 5192 protein sequences from RefSeq build 37, 5283 from Genolevures (release 102) and 491 from GenBank (release 170). We obtained 889568 translated sequences from the C. glabrata genome (Genolevures release 3). We used Proteome Discoverer software (version 1.2.0.208, Thermo Fisher Scientific) to search the MS/MS data using Sequest. For the MS/MS search, oxidation of methionine, phosphorylation at serine, threonine and tyrosine, and protein N-terminal acetylation were set as variable modifications and carbamidomethylation of cysteine residues as a fixed modification. One missed cleavage and a mass deviation of 20 ppm at the MS level and 0.1 Da at the MS/MS level were allowed. Peptides identified with a 1% FDR threshold were considered for further analysis.

Workflow for Genome Annotation

Peptides identified only in the search against the six-frame translation of the C. glabrata genome, but not present in the protein database, were used for genome annotation. We refer to these peptides as GSSPs. GSSPs were mapped to the C. glabrata genome using TBLASTN and their corresponding coordinate positions were noted. GSSPs whose coordinates overlapped existing gene model annotations were set aside for revising the annotation of those gene models. GSSPs that mapped to regions of the genome where there were no gene annotations were used to predict novel protein-coding genes in the C. glabrata genome. An approximately 100 kb region of the genome flanking the GSSPs was retrieved from NCBI and used to determine likely gene models. We analyzed these genomic regions using a comparative genomics approach against other yeast genomes, including Saccharomyces cerevisiae and C. albicans. We also subjected the same genomic sequences to in silico gene prediction using GeneMark,35 ORF Finder and Augustus36 to determine candidate novel gene models. For GeneMark, we used GeneMark.hmm 2.4 with S. cerevisiae as the species. We used the web-based version of Augustus and selected S. cerevisiae and C. albicans as organisms. Peptides, GSSPs, novel and revised genes and RT-PCR derived transcripts obtained in this study were graphically depicted against the genome of C. glabrata using Circos software.37 For this, we provided genomic coordinates for each category of sequences in the Circos configuration file and exported the figure in SVG format.

RT-PCR Validation

Reverse transcription-polymerase chain reaction (RT-PCR) is one of the widely used methods for validating the expression of any gene at the level of transcripts.38,39 RT-PCR was carried out to provide transcript level evidence for novel genes and revised gene models identified by GSSPs. The primers were designed using Primer 3 (v.0.4.0) software for a subset of gene models that were supported by gene prediction programs and/or homologous proteins in related species. Gene specific primers were designed to span the regions that were revised using proteomics data. The amplicon size was chosen to be 100−350 bp to ensure good sequencing quality. Total RNA was isolated from C. glabrata using RNeasy Mini Kits (Qiagen, Valencia, CA; Catalog No. 74104) and treated with RNase-free DNase to remove any contaminating genomic DNA. About 1 μg of total RNA was used for cDNA synthesis with the QuantiTect Reverse Transcription Kit (Qiagen, Valencia, CA; Catalog No. 205311). The resulting cDNA was used as a template for the PCR reactions. PCR was performed with 2 μL of cDNA, 100 nM of forward and reverse primers, 1.5 mM MgCl2, 0.2 mM dNTP mix, 1.5 U of Taq polymerase and PCR buffer in 50 μL reaction volumes. PCR cycling consisted of initial denaturation at 95 °C for 3 min, 30 cycles of 95 °C for 30 s, 60 °C for 30 s and 72 °C for 30 s, and a final extension at 72 °C for 5 min. A PCR reaction carried out with isolated RNA as template was used as a negative control to rule out genomic DNA contamination. The size of the amplified products was determined by running 10 μL of the PCR reaction volume on a 1.5% agarose gel with the StepUp 100 bp DNA ladder (Genei, Bangalore). The PCR products were purified using the Qiaquick PCR purification kit (Qiagen, Valencia, CA; Catalog No. 28104). The purified PCR products were subjected to automated DNA sequencing using an Applied Biosystems 3730xl DNA Analyzer with Big Dye Terminator Version 3.1. The cDNA sequences were submitted to GenBank.

Figure 1. Proteomic analysis of Candida glabrata. (A) The Venn diagram illustrates the number of peptides found to be unique or common to the fractionation strategies that were employed: OFFGEL, SDS-PAGE and SCX. (B) Relative contribution of the different fractionation methods used to the total number of proteins identified. (C) Proteome landscape of C. glabrata: An overview of peptide data against the genome was generated using Circos software. Diverse data features analyzed in this study were mapped against the C. glabrata genome. The concentric circles from the periphery to the center represent (i) C. glabrata chromosomes, (ii) proteins encoded by the genome, (iii) peptides identified in this study, genome search-specific peptides, (iv) novel proteins discovered in this study, (v) revised gene models, and (vi) cDNA sequences obtained through validation experiments. This diagram provides an overall idea of the depth of proteomic information obtained in this study and its uniform coverage across the genome.

Figure 2. Proteomic evidence for gene models. A view of peptides annotated in the Genolevures genome browser. Mass spectrometry-derived peptide data were uploaded to the C. glabrata genome browser provided in the Genolevures resource using the remote annotation option. In the screenshot, these peptides can be viewed with respect to the genome and predicted genes. In this example, 27 peptides confirm the protein-coding potential of the predicted gene model for CAGL0B03047g.
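The 1% FDR threshold applied to peptide identifications in the Mass Spectrometry and Data Analysis section can be illustrated with a short sketch. The text does not state how the FDR was estimated, so this example assumes a target-decoy strategy, in which spectra are also matched against decoy (for example, reversed) sequences and the FDR at a score cutoff is estimated as the number of decoy matches divided by the number of target matches above that cutoff. The `PSM` record, its fields and the scores are hypothetical and do not represent output from Sequest, X!Tandem or Proteome Discoverer.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PSM:
    """A peptide-spectrum match (hypothetical record for illustration)."""
    peptide: str
    score: float    # assumed: higher score means a better match
    is_decoy: bool  # True if the match is to a decoy sequence


def filter_at_fdr(psms: Iterable[PSM], max_fdr: float = 0.01) -> List[PSM]:
    """Keep target PSMs above the lowest score cutoff whose estimated
    FDR (decoys / targets among the accepted PSMs) is <= max_fdr."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    targets = 0
    decoys = 0
    best_cut = 0  # number of top-ranked PSMs to accept
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # With no targets yet, treat the estimate as 100% FDR.
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            best_cut = i
    return [p for p in ranked[:best_cut] if not p.is_decoy]


if __name__ == "__main__":
    toy = [
        PSM("A", 10.0, False),
        PSM("B", 9.0, False),
        PSM("C", 8.0, True),
        PSM("D", 7.0, False),
    ]
    print([p.peptide for p in filter_at_fdr(toy, max_fdr=0.01)])
```

In this toy input the first decoy appears at rank three, so at a 1% threshold only the two matches ranked above it are retained. Real pipelines usually estimate FDR separately per search engine and often at both the PSM and peptide level; that refinement is omitted here.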



RESULTS AND DISCUSSION

We carried out an in-depth proteomic profiling of C. glabrata using multiple protein/peptide fractionation strategies coupled to high resolution Fourier transform mass spectrometry (LTQ-Orbitrap Velos). By employing SDS-PAGE for protein fractionation and strong cation exchange chromatography and isoelectric focusing for peptide level fractionation, a total of 52 samples were generated for LC-MS/MS analysis. Mass spectrometry-derived data was searched using two search algorithms, Sequest and X!Tandem, against (i) a protein database comprising protein sequences of C. glabrata from the Genolevures (http://genolevures.org/download.html) and NCBI RefSeq databases and (ii) a database of the six-frame translated genome of C. glabrata.

Previously, we had described a comparative proteomic analysis to identify differentially expressed proteins between proteomes of C. albicans and C. glabrata, particularly to provide a discovery platform for diagnostics.27 For this, we used iTRAQ reagents for in vitro labeling, SCX chromatography for fractionation, a quadrupole time-of-flight mass spectrometer (QSTAR/Pulsar, Applied Biosystems) for MS/MS and ProteinPilot software for MS/MS data analysis. While analyzing the proteomic data for the previous study, we realized that most of the predicted proteins in the C. glabrata genome are hypothetical. Therefore, we performed a deep proteomic analysis of the C. glabrata proteome using multiple strategies in protein/peptide fractionation and data analysis, the results of which are described in this manuscript.

Multipronged Proteomic Approaches Maximize Protein Discovery in C. glabrata

Database searches using 460693 MS/MS spectra resulted in 118815 peptide-spectrum matches corresponding to 32453 unique peptides. Peptides identified from the whole proteome analysis of C. glabrata are listed in Supplementary Table S1 (Supporting Information). Of the 32453 unique peptides identified, 13494, 3831, and 7045 peptides were obtained from SDS-PAGE, SCX and OFFGEL fractions, respectively (Figure 1A). The use of SDS-PAGE, SCX and OFFGEL fractionation strategies individually contributed to the identification of 3657, 2463, and 3443 proteins, respectively (Figure 1B). Of these proteins, 1816 were identified in common by the SDS-PAGE, SCX and OFFGEL methods, whereas 634 (SDS-PAGE), 156 (SCX) and 305 (OFFGEL) proteins were identified exclusively by one of these fractionation methods. Sequest and X!Tandem searches separately provided identification of 29057 and 15904 peptides, which resulted in the identification of a set of 32453 unique peptides. Sequest and X!Tandem commonly identified 12508 peptides and exclusively identified 16549 and 3396 peptides, respectively. These data show that our approach of employing more than one method in protein isolation and multiple strategies in protein/peptide fractionation resulted in an increased number of peptide and protein identifications from the same sample. In total, we identified peptides from ∼83% of the protein-coding genes annotated for C. glabrata.

We also searched MS/MS spectra against the six-frame translated genome database. We compared peptide sequences obtained in this search with those present in the protein database. We identified a set of unique peptides that were not represented among the predicted proteins of C. glabrata. These peptides are designated as GSSPs. In this study, we identified 51 unique GSSPs that resulted in the identification of 11 novel protein-coding regions and in the modification of the annotation of 14 existing gene models.

A graphical representation of all the peptides identified in this study, generated using Circos software,37 is shown in Figure 1C. C. glabrata has 5283 predicted genes, whose sequences are available in the Genolevures database.29,40 These protein sequences will remain hypothetical ORFs until their expression is confirmed at the level of transcripts and/or proteome. We identified peptides from 4421 proteins, which represents >83% of the total predicted protein-coding genes. This is valuable information as there is no other evidence in the form of ESTs/mRNAs for most genes in C. glabrata. For example, the CAGL0B03047g gene is predicted to be protein-coding. However, there was no transcript or proteomic evidence to support this gene model thus far. We obtained 27 peptides that validated and confirmed the CAGL0B03047g gene, as illustrated in Figure 2. More than 20% protein coverage was obtained for 1352 proteins, of which 364 proteins had peptide evidence for more than 50% of their sequences. Single peptides supported the identification of 714 proteins, and most of these peptides were represented by multiple tandem mass spectra. A complete list of proteins is given in Supplementary Table S2 (Supporting Information).

A similarly high identification rate has been achieved in S. cerevisiae through a proteomics approach. Shevchenko et al. and Nagaraj et al. have provided peptide evidence for >90% of the proteome in S. cerevisiae.41,42 Using an approach similar to the one described in this study, our group has recently completed proteogenomic analyses of Anopheles gambiae43 and Mycobacterium tuberculosis.44 We achieved >80% proteome coverage in M. tuberculosis and identified 41 novel protein-coding genes.44

The proteome of C. glabrata has not been previously investigated using high resolution mass spectrometry. Previous proteomic efforts on C. glabrata are summarized here. Schmidt et al. identified 180 proteins of C. glabrata while investigating pH response using a 2-DE platform.28 Stead and co-workers have used proteomic approaches to identify proteins that respond to the inactivation of the transcription factor Ace2, which increases virulence in C. glabrata.45 In this study, they identified 123 proteins, of which 32 proteins were overexpressed and 29 were downregulated in response to suppression of Ace2 activity. As most of these proteins belonged to the extracellular component, Stead et al. reinvestigated the effects of Ace2 suppression on the secretome of C. glabrata and identified 31 proteins in the extracellular proteome.46 In separate studies, Rogers et al.47 and Seneviratne et al.7 have also carried out proteomic studies to investigate molecular level differences among drug resistant and susceptible forms of C. glabrata. These studies have documented 25 and 24 proteins from C. glabrata, respectively.

Table 1. Novel Genes Identified in C. glabrata

no. | protein identifier | chromosome/accession | genome coordinates | genome search-specific peptides | evidence of transcript | orthologs | putative function
1 | IOB_Cglpro001 | Chr B NC_005968.1 | 360774−360983 | APPTPVIDGIPAEASR; SLTQYHCAPNTQGQGVHK; DMFVCIPFTR | HS410788.1 | NP_058154.1 (S. cerevisiae); NP_983417.1 (Ashbya gossypii); XP_002497225.1 (Zygosaccharomyces rouxii) | Putative mitochondrial inner membrane peptidase
2 | IOB_Cglpro002 | Chr L NC_006035.2 | 871204−871419 | ESMLEVFGAIDDSE; ATAAEIDELLK | HS410792.1 | XP_002499102.1 (Zygosaccharomyces rouxii); XP_001644267.1 (Vanderwaltozyma polyspora) |
3 | IOB_Cglpro003 | Chr K NC_006034.2 | 1060349−1060567 | EACAIQDCLQSNGYNEDR | HS410791.1 | XP_002547152.1 (Candida tropicalis); XP_002770269.1 (Debaryomyces hansenii) |
4 | IOB_Cglpro004 | Chr K NC_006034.2 | 530872−531141 | ANSQVVDEEMIKR; LNKLDESAVVSDEEEEQEVDGYR | HS410790.1 | XP_002545738.1 (Candida tropicalis) |
5 | IOB_Cglpro005 | Chr L NC_006035.2 | 102565−102344 | SFLLVLDCETGGVGSLK | | |
6 | IOB_Cglpro006 | Chr L NC_006035.2 | 59112−59032 | NNQTVISSMKNLTTLR | | |
7 | IOB_Cglpro007 | Chr B NC_005968.1 | 341387−341127 | DLEEIPGTIGAELLWAIK | | |
8 | IOB_Cglpro008 | Chr C NC_006026.1 | 361807−361586 | SAELALLASSVPVLVSSTSSVTK | | |
9 | IOB_Cglpro009 | Chr D NC_006027.1 | 518289−518864 | ELEVLYVMCLGGVVGTAR | | |
10 | IOB_Cglpro010 | Chr D NC_006027.1 | 563690−563304 | KIVLQMPHPSGMASLK | | |
11 | IOB_Cglpro011 | Chr I NC_006032.2 | 541501−541617 | YPLPSRIESLK | | |

The following entries could not be assigned to a row: evidence of transcript, HS410789.1 and JK317949.1; putative function, Component of the RSC chromatin remodeling complex.

Figure 3. Identification of novel protein-coding genes using peptide data from genome database search. (A) Three GSSPs mapped to an intergenic region located between the CAGL0B03597g and CAGL0B03619g genes. Blastx of the novel coding region showed similarity to the predicted genes ACR014C (NP_983417.1) in Ashbya gossypii ATCC 10895 and ZYRO0F00528 (XP_002497225.1) in Zygosaccharomyces rouxii CBS 732. (B) Validation of novel and revised gene models by an RT-PCR approach. Targeted cDNA sequencing by an RT-PCR based approach confirmed the expression of new mRNAs for the subset of novel and revised proteins. RT-PCR products were sequenced on both strands and the resulting sequences were submitted to GenBank.
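The identification counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers quoted in the text; the function and variable names are illustrative. It confirms that the peptides shared by and exclusive to the two search engines add up to the per-engine totals and to the 32453 unique peptides, and it computes the fraction of predicted genes with peptide evidence and the fraction of MS/MS spectra that yielded a peptide-spectrum match.

```python
def summarize_identifications() -> dict:
    """Recompute summary figures from counts quoted in the text."""
    shared = 12508         # peptides identified by both Sequest and X!Tandem
    sequest_only = 16549   # peptides identified only by Sequest
    xtandem_only = 3396    # peptides identified only by X!Tandem

    validated_genes = 4421
    predicted_genes = 5283
    psms = 118815
    spectra = 460693

    return {
        "sequest_total": shared + sequest_only,
        "xtandem_total": shared + xtandem_only,
        "unique_peptides": shared + sequest_only + xtandem_only,
        "gene_coverage_pct": 100.0 * validated_genes / predicted_genes,
        "psm_rate_pct": 100.0 * psms / spectra,
    }


if __name__ == "__main__":
    for name, value in summarize_identifications().items():
        print(f"{name}: {value:.1f}" if isinstance(value, float) else f"{name}: {value}")
```

The totals match the 29057 (Sequest), 15904 (X!Tandem) and 32453 (combined) peptides given above, and the gene coverage works out to roughly 83.7%, consistent with the ">83%" stated in the text.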

High Resolution Proteomic Data Enriches Genomic Information

Use of the “six-frame translated genome database” for the MS/MS search resulted in the identification of 51 peptides that were not previously predicted from the C. glabrata genome. We used these GSSPs to identify novel protein-coding regions in the genome. GSSPs with N-terminal acetylation, found upstream of predicted protein-coding genes in the genome, enabled us to assign novel protein start sites in addition to validating these predicted genes. GSSPs also supported the extension of predicted proteins and exons. In addition to identifying novel protein-coding gene models, these peptides also changed the status of a pseudogene to that of a protein-coding gene. We used comparative genomic strategies to investigate conservation of revised gene models across related species. The presence of orthologous genes in other species provides further support for the revised gene structures. Conversely, the absence of conserved homologous protein-coding regions in other genomes suggests that these genes or gene regions may be unique to C. glabrata.
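The idea behind a six-frame translated database and the selection of GSSPs can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes the standard genetic code, translates straight through stop codons (written as `*`), and treats a peptide as "already predicted" if it is an exact substring of any annotated protein. The function names and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna: str) -> dict:
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def genome_search_specific(genome_peptides, annotated_proteins) -> set:
    """Peptides from the genome search that occur in no annotated protein."""
    return {
        pep for pep in genome_peptides
        if not any(pep in protein for protein in annotated_proteins)
    }


if __name__ == "__main__":
    print(six_frame_translate("ATGGCCTAA"))
    print(genome_search_specific({"PEPTIDEK", "NOVELPEPK"}, ["MAPEPTIDEKR"]))
```

A production database would typically split each frame at stop codons and discard very short stretches before searching; that step is left out to keep the sketch short.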


Figure 4. Evidence of translation for a pseudogene using mass spectrometry data. (A) Five GSSPs mapped to the genomic region corresponding to the pseudogene CAGL0M00110g. These peptides also mapped to the protein derived from the conceptual translation of the pseudogene. A cDNA (HO864402.2) was also obtained for the region by targeted sequencing using an RT-PCR approach. (B) The novel protein was found to be similar to the predicted gene CAGL0K00110g (XP_448233.1) of C. glabrata. However, substantial sequence differences exist between these two proteins in the regions corresponding to GSSPs (shown in red vertically extended fonts). The region of the protein that corresponds to the HO864402.2 sequence is underlined.

Identification of 11 Novel Genes on the Basis of GSSPs

We identified 11 novel protein-coding genes, which had not been reported in the Genolevures annotation. Representative MS/MS spectra are shown in Supplementary Figure S1 (Supporting Information). These ORFs, supported by peptides derived in this study, are also supported by gene models predicted by gene prediction programs including GeneMark,35 Augustus36 and ORF Finder (from NCBI).

Three GSSPs, APPTPVIDGIPAEASR, SLTQYHCAPNTQGQGVHK and DMFVCIPFTR, were clustered in the intergenic region between two predicted genes, CAGL0B03597g and CAGL0B03619g, on chromosome B (Figure 3A). ORF analysis of this region revealed a novel protein of 69 amino acids with peptides covering over 65% of the protein length. The novel protein is also highly conserved in other yeast species including Ashbya gossypii (NP_983417.1), Lachancea thermotolerans (XP_002552714.1), Kluyveromyces lactis (XP_453111.1), Zygosaccharomyces rouxii (XP_002497225.1) and Vanderwaltozyma polyspora (XP_001646269.1). Further, we also validated expression of this gene (IOB_Cglpro001) at the level of transcript (Figure 3B).

Another novel protein of 72 amino acids (IOB_Cglpro003) was discovered in the region spanning 1.059−1.060 Mb on chromosome K. The novel protein was similar to predicted proteins in Candida tropicalis and Debaryomyces hansenii (XP_002547152.1 and XP_002770269.1, respectively). This protein was identified based on a single long peptide of 18 amino acids, EACAIQDCLQSNGYNEDR, and subsequent ORF analysis. Another genome search-specific peptide, SFLLVLDCETGGVGSLK, revealed a novel gene (IOB_Cglpro005) encoded on the strand opposite to the CAGL0L00803g gene. This GSSP was found to be part of a gene model, predicted by ORF Finder, that encodes a protein of 73 amino acids. A complete list of novel genes proposed in this study is provided in Table 1.
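Statements such as "peptides covering over 65% of the protein length" refer to sequence coverage, the percentage of residues in a protein that fall inside at least one identified peptide. The sketch below shows one straightforward way to compute it, assuming exact string matches and counting overlapping peptides only once. The protein and peptide sequences in the example are made up for illustration and are not taken from the study.

```python
from typing import Iterable


def sequence_coverage(protein: str, peptides: Iterable[str]) -> float:
    """Percentage of protein residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        # Mark every occurrence of the peptide, including repeated ones.
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


if __name__ == "__main__":
    toy_protein = "MKTAYIAKQRQISFVK"  # hypothetical 16-residue sequence
    print(sequence_coverage(toy_protein, ["TAYIAK", "QISFVK"]))
```

Search engines often treat isoleucine and leucine as indistinguishable; a fuller implementation might normalize I and L before matching.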

Amendment of Current Gene Models by GSSPs

Five GSSPs were found to be clustered in the genomic region of an annotated pseudogene, CAGL0M00110g, located on chromosome M. Conceptual translation of this pseudogene yielded a protein sequence of 186 amino acids. Interestingly, all five GSSPs matched in-frame with the ORF encoded by this pseudogene (Figure 4A), indicating that CAGL0M00110g is in fact a protein-coding gene. Using Blastp analysis, we observed that this protein is highly similar to the CAGL0I11011g and CAGL0K00110g proteins of C. glabrata, which belong to hyphally regulated cell wall protein families with acetylneuraminyl hydrolase or exo-alpha-sialidase activities. We also obtained a cDNA sequence (GenBank accession # HO864402.2) by RT-PCR corresponding to this novel protein. These protein sequences did not have well conserved homologous proteins in other related yeasts, which explains the difficulty in locating these protein-coding regions by comparative genomic analysis.

dx.doi.org/10.1021/pr200827k | J. Proteome Res. 2012, 11, 247−260

Journal of Proteome Research

Article

Figure 5. Revision of existing gene models by extension of protein-coding regions. Examples of GSSPs that were used to extend the predicted regions of CAGL0G01892g and CAGL0H10472g. (A) Genome search-specific peptide DQFSTLQNHMFPMFFK overlapped with the N-terminal 4 amino acids of XP_446443.1 (CAGL0G01892g gene). Augustus and ORF Finder also predicted an extended gene in this region. The revised protein sequence, with an N-terminal extension of 42 amino acids, was proposed through comparative genomic analysis and is supported by homologous proteins in other yeasts (not shown in the figure). A cDNA was also obtained (HO864405.1), which supports the extended region of the revised gene model. (B) Genome search-specific peptide TVVSTDQLNKDETTTPLLK corresponded to the 5′ region of the CAGL0H10472g gene. The revised protein sequence, with an N-terminal extension of 109 amino acids, was proposed through comparative genomic analysis and is supported by homologous proteins in other yeasts (not shown in the figure). ORF Finder also predicted the extension of this gene. For these examples, (i) the extended region of the protein is boxed; (ii) the region of the protein for which a cDNA sequence was obtained is underlined; (iii) GSSPs are highlighted in red, vertically extended font.

A genome search-specific peptide, DQFSTLQNHMFPMFFK, enabled the N-terminal extension of the predicted protein XP_446443.1 encoded by CAGL0G01892g, located on chromosome G. This GSSP overlapped with the initial 7 amino acids of the XP_446443.1 sequence and covered an additional 9 amino acids upstream. Comparative genomic analysis to detect homologous proteins in related species led us to NP_015423.2 in S. cerevisiae and XP_001644960.1 in Vanderwaltozyma polyspora, both of which were longer at their N-termini. In addition, both the Augustus and ORF Finder gene prediction algorithms predicted gene models that extended the gene in the 5′ direction, in agreement with the newly annotated start codon (Figure 5A).
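The overlap arithmetic in this example (7 residues shared with the annotated protein, 9 residues upstream) can be reproduced with a small sketch. It takes the 7-residue overlap stated in the text at face value, so the annotated protein is assumed to begin MFPMFFK; the rest of the toy protein sequence is invented, and the function names are hypothetical.

```python
def nterm_overlap(peptide: str, protein: str) -> int:
    """Length of the longest peptide suffix that equals a prefix of protein."""
    for k in range(min(len(peptide), len(protein)), 0, -1):
        if peptide[-k:] == protein[:k]:
            return k
    return 0


def describe_extension(peptide: str, protein: str) -> dict:
    """Split a GSSP into residues shared with the annotation and residues upstream of it."""
    k = nterm_overlap(peptide, protein)
    return {"overlap": k, "upstream_residues": len(peptide) - k}


gssp = "DQFSTLQNHMFPMFFK"
# First 7 residues follow from the stated overlap; "GAVL" is a made-up tail.
annotated_start = "MFPMFFK" + "GAVL"
print(describe_extension(gssp, annotated_start))
```

A peptide with zero overlap, such as the GSSP reported 5′ of CAGL0H10472g, lies entirely outside the annotated protein, so the extension then has to be bridged by homology or gene prediction rather than by the peptide alone.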

Another example is a GSSP spanning the genomic region 5′ of the CAGL0H10472g gene without any overlap with the protein annotated for this gene (XP_447253.1). Comparative genomic analysis and ORF Finder supported an N-terminal extension of XP_447253.1 by 109 amino acids. Using sequence similarity with protein orthologs (NP_010168.4 in S. cerevisiae and XP_002497545.1 in Zygosaccharomyces rouxii), we extended this protein (Figure 5B). GSSPs also provided evidence for exon extension in two gene models. We found 18 GSSPs upstream of the second exon of the CAGL0D02090g gene. Six of these peptides also overlapped with up to 27 amino acids encoded by the second exon of the annotated CAGL0D02090g gene (Figure 6A). We also found conservation of the extended exon in a homologue


Figure 6. Revision of existing gene model by extension of exon region. Around 14 unique GSSPs mapped to the N-terminus of the second exon of the CAGL0D02090g gene and provided evidence for the extension of the exon. Comparative genomics analysis showed similarity to protein ASC1p (NP_013834.1) of Saccharomyces cerevisiae, which also supported the revision of the exon structure. Targeted cDNA sequencing by an RT-PCR approach also provided transcript-level evidence for the extended exon. The revised gene model extended the exon and protein XP_445505.1 by 41 amino acids, which added an additional WD40 domain to the protein. The protein sequences of revised XP_445505.1 of C. glabrata and NP_013834.1 of Saccharomyces cerevisiae were aligned, and (i) the extended region of the protein is boxed; (ii) the region of the protein for which a cDNA sequence was obtained is overlined; (iii) protein sequence covered by GSSPs and by other peptides is highlighted in red and blue vertically extended font, respectively.

of CAGL0D02090g protein in S. cerevisiae called Asc1 (Figure 6B). Thus, a combination of GSSPs and homology-based analysis allowed us to extend the second exon by 121 nucleotides upstream. The revised gene model in this case, which includes an extended second exon, now encodes a protein of 318 amino acids, which is 41 amino acids longer than the predicted protein XP_445505.1. We also obtained a cDNA sequence including the exon junction by designing a forward primer from exon 1 and a reverse primer from exon 2 (GenBank accession # HO864404.1). The cDNA sequence thus obtained confirmed the extended exon and the novel exon junction. Similarly, a single GSSP, EQLGLPTGAIMNCADNSGAR, extended the second exon of CAGL0G03575g by 2 amino acids. Revised gene models are listed in Table 2. Additionally, revised protein sequences proposed for these genes are provided in Table S3 (Supporting Information).

Proteomic Evidence for Translational Start Sites from N-Terminally Acetylated Peptides

The majority of protein start sites are assigned based on the longest open reading frame conceptually translated from predicted or experimentally derived transcripts. There are no other straightforward experimental methodologies for assigning translational start sites. Homology-based comparative genomic approaches help to support the predicted start sites to a certain extent.48 It is known that most eukaryotic proteins are N-terminally modified with an acetyl group, often after removal of the initiator methionine.49,50 This modification occurs even before translation of the entire protein is complete. Therefore, identification of N-terminally acetylated peptides can help in determining the true start sites of proteins. Mass spectrometry methods have been successfully employed to reliably characterize N-terminally acetylated peptides.43,44,51,52 Among the GSSPs identified in this study, six peptides were found to be N-terminally acetylated (Table 2). Representative mass spectra for these N-terminally acetylated GSSPs are provided in Supplementary Figure S2 (Supporting Information). When mapped to the genome, these peptides were found to be upstream of predicted proteins and in-frame with predicted transcripts. We also found an initiator codon upstream of the codon that encodes the acetylated residue. Using


Table 2. Revised Gene Models in C. glabrata with Proteomic Evidence

| no. | revision type | protein identifier | protein length/revised protein length | chromosome/accession | genome coordinates | genome search-specific peptides | evidence of transcript | homologues/putative function |
|---|---|---|---|---|---|---|---|---|
| 1 | Pseudogene to protein-coding gene | CAGL0M00110g | 186 | Chr M NC_006036.2 | 559−2 | GTSSSKPQQLK; SAHAFTVPFVFR; GFSISSAHAFTVPFVFR; GGALYYVNNNMK; VDKGGALYYVNNNMK | HO864402.2 | XP_448233.1 (C. glabrata)/Hyphally regulated cell wall protein |
| 2 | Protein extension | CAGL0G01892g XP_446443.1 | 117/162 | Chr G NC_006030.1 | 169566−170054 | DQFSTLQNHMFPMFFK | HO864405.1 | NP_015423.2 (S. cerevisiae); XP_001644960.1 (Vanderwaltozyma polyspora)/Putative inner membrane protein translocase |
| 3 | Protein extension | CAGL0D06314g XP_445680.1 | 990/1012 | Chr D NC_006027.1 | 595789−598827 | TEQEANAIKNEGSNDSIQTTK | | |
| 4 | Protein extension | CAGL0H08415g XP_447165.1 | 100/110 | Chr H NC_006031.1 | 824601−824933 | SFCPLCNNMLLVATSDNGVYNLSCR; CPLCNNMLLVATSDNGVYNLSCR | | NP_010330.1 (S. cerevisiae); XP_001644857.1 (Vanderwaltozyma polyspora); XP_002496229.1 (Zygosaccharomyces rouxii)/Putative transcription elongation factor |
| 5 | Protein extension | CAGL0J05940g XP_447964.1 | 486/525 | Chr J NC_006033.2 | 562570−564147 | NHLAATTTATTNDTSQYPNASNVR | | NP_014245.1 (S. cerevisiae); AAO32440.1 (Saccharomyces bayanus)/Putative plasma membrane-bound casein kinase |
| 6 | Protein extension | CAGL0H10472g XP_447253.1 | 248/357 | Chr H NC_006031.1 | 1020270−1021343 | TVVSTDQLNKDETTTPLLK | | NP_010168.4 (S. cerevisiae); XP_002497545.1 (Zygosaccharomyces rouxii)/Putative nucleocytoplasmic transport protein |
| 7 | Protein extension by N-terminally acetylated peptides | CAGL0L00517g XP_448796.1 | 1000/1007 | Chr L NC_006035.2 | 61048−58025 | ac-(S)SAQDNMVQHNNTK | HO864403.2 | |
| 8 | Protein extension by N-terminally acetylated peptides | CAGL0I00264g XP_447264.1 | 161/163 | Chr I NC_006032.2 | 19312−18821 | ac-(S)MSPFHQLRPVDGQGNIYPFEMLK | a | NP_012303.1 (S. cerevisiae)/Putative glutathione peroxidase |
| 9 | Protein extension by N-terminally acetylated peptides | CAGL0H03267g XP_446935.1 | 775/796 | Chr H NC_006031.1 | 306087−308477 | ac-(S)DTGSVESEKSDFQDSR | a | NP_015134.1 (S. cerevisiae)/Putative ribonucleoprotein |
| 10 | Protein extension by N-terminally acetylated peptides | CAGL0C03399g XP_445324.1 | 478/479 | Chr C NC_006026.1 | 341752−343191 | ac-(M)MQSMNVQHQVMPGHEQMMPQR | a | NP_014450.1 (S. cerevisiae)/Putative CAF1 ribonuclease |
| 11 | Protein extension by N-terminally acetylated peptides | CAGL0G04213g XP_446546.1 | 335/356 | Chr G NC_006030.1 | 402047−400977 | ac-(M)MNVTAEEHHK; FDDERHHMK | a | NP_011696.1 (S. cerevisiae)/Putative ribonucleotide-diphosphate reductase |
| 12 | Protein extension by N-terminally acetylated peptides | CAGL0D04834g XP_002999531.1 | 49/51 | Chr D NC_006027.1 | 471218−471373 | ac-(M)DMFNMGQGESEEEKK | a | NP_015459.1 (S. cerevisiae)/Putative mitochondrial import receptor subunit |
| 13 | Exon extension | CAGL0D02090g XP_445505.1 | 277/318 | Chr D NC_006027.1 | (214357−214890); (215438−215860) | IMLWNSAAKVPMYTLSAGDEVYALSFSPNR; IEADFVGHNSNVNTVTASPDGSLIASAGK; FVGHNSNVNTVTASPDGSLIASAGK; DGEIMLWNSAAK; ASPDGSLIASAGK; and more (Refer to Table S4, Supporting Information) | HO864404.1 | NP_013834.1 (S. cerevisiae)/G-protein beta subunit and guanine nucleotide dissociation inhibitor |
| 14 | Exon extension | CAGL0G03575g XP_446519.1 | 137/139 | Chr G NC_006030.1 | (346786−346827); (37489−37866) | EQLGLPTGAIMNCADNSGAR | | NP_009466.1 (S. cerevisiae)/60S ribosomal protein L23 |

a Transcript accessions HO864408.2 and HO864407.1 belong to two of entries 8−12 (HO864408.2 to the lower-numbered entry); the exact entries cannot be determined from this copy of the table.
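As a quick sanity check on Table 2, the genome coordinates can be compared with the revised protein lengths. The sketch below assumes that coordinates are 1-based and inclusive and that each span includes the stop codon, so a span should equal 3 × (revised length + 1) nucleotides; that convention is an assumption, not something the table states. The function names are hypothetical, and only four entries are shown.

```python
def span_nt(segments):
    """Total nucleotides covered by inclusive (start, end) segments on either strand."""
    return sum(abs(end - start) + 1 for start, end in segments)


def consistent(segments, revised_len_aa, stop_included=True):
    """True if the span matches the revised protein length, optionally plus a stop codon."""
    expected = 3 * (revised_len_aa + (1 if stop_included else 0))
    return span_nt(segments) == expected


# (genome coordinates, revised protein length) copied from Table 2.
table2_rows = {
    "CAGL0G01892g": ([(169566, 170054)], 162),
    "CAGL0H10472g": ([(1020270, 1021343)], 357),
    "CAGL0L00517g": ([(61048, 58025)], 1007),  # reverse strand
    "CAGL0D02090g": ([(214357, 214890), (215438, 215860)], 318),  # two exons
}

for gene, (segments, length_aa) in table2_rows.items():
    print(gene, span_nt(segments), consistent(segments, length_aa))
```

Under these assumptions all four entries are internally consistent; for example, the two exons of CAGL0D02090g total 534 + 423 = 957 nucleotides, which is 3 × 319.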


Figure 7. Use of N-terminally acetylated peptides to assign protein start sites. (A) Example of GSSP ac-SSAQDNMVQHNNTK, which overlapped with the N-terminus of XP_448796.1 (CAGL0L00517g). The revised gene model extended the protein by 7 amino acids. A cDNA sequence, HO864403.2, obtained by a targeted RT-PCR approach supported the revised gene model. (B) Two GSSPs, ac-MNVTAEEHHK and FDDERHHMK, corresponded to the 5′ genomic region of the CAGL0G04213g gene, the latter overlapping with 2 N-terminal amino acids of XP_446546.1. The revised gene model extended the protein by 21 amino acids, of which 17 amino acids were supported by the GSSPs (red, vertically extended font) and 4 amino acids (blue, vertically extended font) were conceptually annotated from the genome. For all these examples, (i) the extended region of the protein is boxed; (ii) the region of the protein for which a cDNA sequence was obtained is underlined; (iii) GSSPs are highlighted in red, vertically extended font.
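The start-site reasoning behind N-terminally acetylated peptides can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the DNA is a toy coding-strand sequence that encodes M-S-S-A-Q-D, the function name `infer_start` is hypothetical, and positions are 0-based. If the acetylated residue is itself encoded by ATG, the initiator methionine was retained; if the codon immediately upstream is ATG, the initiator methionine was removed before acetylation.

```python
def infer_start(dna: str, first_residue_pos: int):
    """Infer the start codon position from the codon of the acetylated residue.

    dna: coding-strand sequence (upper-case ACGT).
    first_residue_pos: 0-based index of the first base of the codon that
    encodes the peptide's acetylated N-terminal residue.
    """
    codon = dna[first_residue_pos:first_residue_pos + 3]
    if codon == "ATG":
        return first_residue_pos, "initiator Met retained"
    upstream = dna[first_residue_pos - 3:first_residue_pos] if first_residue_pos >= 3 else ""
    if upstream == "ATG":
        return first_residue_pos - 3, "initiator Met removed"
    return None, "no adjacent ATG"


# Toy sequence (hypothetical): GG + ATG TCT TCT GCT CAA GAT
toy = "GGATGTCTTCTGCTCAAGAT"
print(infer_start(toy, 5))  # acetylated Ser directly after the ATG
print(infer_start(toy, 2))  # acetylated residue is the Met itself
```

A result of `None` would mean that the acetylated peptide does not sit next to an ATG codon, in which case the peptide alone does not pin down the start site.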

these peptides, we were able to extend the proteins at the N-terminus and assign translational start sites for six proteins: CAGL0L00517g, CAGL0I00264g, CAGL0H03267g, CAGL0C03399g, CAGL0G04213g and CAGL0D04834g. Two of these examples are illustrated in Figure 7. In the case of CAGL0L00517g, the peptide Ac-(S)SAQDNMVQHNNTK overlapped with the N-terminal sequence of protein XP_448796.1 (Figure 7A). In another example, the GSSPs Ac-(M)NVTAEEHHK and FDDERHHMK mapped to the N-terminal region of CAGL0G04213g, with the former peptide showing no overlap with the protein sequence of XP_446546.1 and the latter peptide sharing two N-terminal amino acids of this annotated protein sequence (Figure 7B). The revised protein sequences proposed for these 6 examples of novel protein start sites are given in Supplementary Table S3 (Supporting Information).

Targeted RT-PCR-based Validation of Novel Proteogenomic Discoveries

In this study, using GSSPs and gene prediction programs, we proposed 11 novel genes and 14 revised gene models in C. glabrata. Further, we carried out RT-PCR and sequencing for a subset of these novel and revised gene models to provide an additional level of validation (Tables 1 and 2). We validated 6 out of 11 novel protein-coding genes by RT-PCR. The list of primers designed for these genes is provided in Table S4 (Supporting Information). These genes are IOB_Cglpro001 (GenBank accession # HS410788.1), IOB_Cglpro002 (GenBank accession # HS410792.1), IOB_Cglpro003 (GenBank accession # HS410791.1), IOB_Cglpro004 (GenBank accession # HS410790.1), IOB_Cglpro005 (GenBank accession # JK317949.1) and IOB_Cglpro009 (GenBank accession # HS410789.1). We also validated 4 revised gene models with N-terminal extension of protein sequences (GenBank accessions HO864405.1, HO864403.2, HO864408.2 and HO864407.1) by RT-PCR. In addition, we validated exon extension in a two-exon gene, CAGL0D02090g, by RT-PCR and sequencing (GenBank accession # HO864404.1). A cDNA was also obtained for the pseudogene (GenBank accession # HO864402.2), which was observed to be protein coding. The sequence analysis of RT-PCR products revealed that they corresponded to the revised gene models, thus confirming the validity of proteogenomic approaches for modifying the annotation of existing gene models. Amplified cDNA products obtained from 12 such RT-PCR experiments are shown in Figure 3B.

Public Availability of Mass Spectrometry-derived Data

We have uploaded the mass spectrometry data (.raw files) generated from this study to the Tranche server (http://proteomecommons.org/tranche). The raw data files used for genome annotation can be retrieved using the stable URL https://proteomecommons.org/tranche/data-downloader.jsp?h=aI3uNjuZ7wvqYIRjGjto36mPu%2FqXqehx2k8WTJ343QopA2VggBrM4rtRW%2Fuy%2FFmTpHtnIQ8Y1htj64e1p8ZGwOVltD0AAAAAAAAsgg%3D%3D.

CONCLUSIONS

This study demonstrates how high-resolution proteomic technologies can complement genome annotation strategies through identification of missed genes, missed exons and splice isoforms. Because a large majority of eukaryotic proteins are known to be acetylated, the acetylome of an organism can help in efficiently assigning the real start sites of proteins. We suggest that proteogenomics powered by high-resolution mass spectrometry needs to be pursued as an integral part of future genome sequencing efforts in order to provide improved information to the biomedical community.

ASSOCIATED CONTENT

Supporting Information

Supplemental figures and tables. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*Dr. Keshava Prasad, Institute of Bioinformatics, International Technology Park, Bangalore -560 066, India. E-mail: [email protected]. Phone: 91-8028416140. Fax: 91-8028416132.

Present Address

⊥Center for Molecular and Biomolecular Informatics and Nijmegen Center for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Nijmegen, The Netherlands

ACKNOWLEDGMENTS

We thank the Department of Biotechnology (DBT), Government of India for research support to the Institute of Bioinformatics, Bangalore. T.S.K.P. is a recipient of a Young Investigator award from the DBT, Government of India. T.S.K.P. is also supported by a research grant on “Development of Infrastructure and a Computational Framework for Analysis of Proteomic Data” from DBT. H.G. is a Wellcome Trust-DBT India Alliance Early Career Fellow. B.M. is a recipient of a Senior Research Fellowship from the Council of Scientific and Industrial Research (CSIR), Government of India. Y.S. and S.R. are recipients of a Senior Research Fellowship from the University Grants Commission (UGC), Government of India. We thank Agilent Technologies for access to instrumentation.

ABBREVIATIONS

SCX, strong cation exchange chromatography; HCD, higher-energy collision dissociation; FDR, false discovery rate; GSSP, genome search-specific peptides

REFERENCES

(1) Mann, M.; Pandey, A. Use of mass spectrometry-derived data to annotate nucleotide and protein sequence databases. Trends Biochem. Sci. 2001, 26 (1), 54−61. (2) Pandey, A.; Mann, M. Proteomics to study genes and genomes. Nature 2000, 405 (6788), 837−46. (3) Castellana, N.; Bafna, V. Proteogenomics to discover the full coding content of genomes: a computational perspective. J. Proteomics 2010, 73 (11), 2124−35. (4) Krug, K.; Nahnsen, S.; Macek, B. Mass spectrometry at the interface of proteomics and genomics. Mol. Biosyst. 2010, 7 (2), 284−91. (5) Renuse, S.; Chaerkady, R.; Pandey, A. Proteogenomics. Proteomics 2011, 11 (4), 620−30. (6) Jacobsen, I. D.; Brunke, S.; Seider, K.; Schwarzmuller, T.; Firon, A.; d’Enfert, C.; Kuchler, K.; Hube, B. Candida glabrata persistence in mice does not depend on host immunosuppression and is unaffected by fungal amino acid auxotrophy. Infect. Immun. 2010, 78 (3), 1066−77. (7) Seneviratne, C. J.; Wang, Y.; Jin, L.; Abiko, Y.; Samaranayake, L. P. Proteomics of drug resistance in Candida glabrata biofilms. Proteomics 2010, 10 (7), 1444−54. (8) Bodey, G. P.; Mardani, M.; Hanna, H. A.; Boktour, M.; Abbas, J.; Girgawy, E.; Hachem, R. Y.; Kontoyiannis, D. P.; Raad, I. I. The epidemiology of Candida glabrata and Candida albicans fungemia in immunocompromised patients with cancer. Am. J. Med. 2002, 112 (5), 380−5. (9) Goodman, J. L.; Winston, D. J.; Greenfield, R. A.; Chandrasekar, P. H.; Fox, B.; Kaizer, H.; Shadduck, R. K.; Shea, T. C.; Stiff, P.; Friedman, D. J.; et al. A controlled trial of fluconazole to prevent fungal infections in patients undergoing bone marrow transplantation. N. Engl. J. Med. 1992, 326 (13), 845−51. (10) Gugic, D.; Cleary, T.; Vincek, V. Candida glabrata infection in gastric carcinoma patient mimicking cutaneous histoplasmosis. Dermatol. Online J. 2008, 14 (2), 15. (11) Hajjeh, R. A.; Sofair, A. N.; Harrison, L. H.; Lyon, G. M.; Arthington-Skaggs, B. A.; Mirza, S. A.; Phelan, M.; Morgan, J.; Lee-Yang, W.; Ciblak, M. A.; Benjamin, L. E.; Sanza, L. T.; Huie, S.; Yeo, S. F.; Brandt, M. E.; Warnock, D. W. Incidence of bloodstream infections due to Candida species and in vitro susceptibilities of isolates collected from 1998 to 2000 in a population-based active surveillance program. J. Clin. Microbiol. 2004, 42 (4), 1519−27. (12) Klevay, M. J.; Ernst, E. J.; Hollanbaugh, J. L.; Miller, J. G.; Pfaller, M. A.; Diekema, D. J. Therapy and outcome of Candida glabrata versus Candida albicans bloodstream infection. Diagn. Microbiol. Infect. Dis. 2008, 60 (3), 273−7. (13) Krcmery, V. Jr.; Oravcova, E.; Spanik, S.; Mrazova-Studena, M.; Trupl, J.; Kunova, A.; Stopkova-Grey, K.; Kukuckova, E.; Krupova, I.; Demitrovicova, A.; Kralovicova, K. Nosocomial breakthrough fungaemia during antifungal prophylaxis or empirical antifungal therapy in 41 cancer patients receiving antineoplastic chemotherapy: analysis of aetiology risk factors and outcome. J. Antimicrob. Chemother. 1998, 41 (3), 373−80. (14) Viscoli, C.; Girmenia, C.; Marinus, A.; Collette, L.; Martino, P.; Vandercam, B.; Doyen, C.; Lebeau, B.; Spence, D.; Krcmery, V.; De Pauw, B.; Meunier, F. Candidemia in cancer patients: a prospective, multicenter surveillance study by the Invasive Fungal Infection Group (IFIG) of the European Organization for Research and Treatment of Cancer (EORTC). Clin. Infect. Dis. 1999, 28 (5), 1071−9. (15) Brun, S.; Dalle, F.; Saulnier, P.; Renier, G.; Bonnin, A.; Chabasse, D.; Bouchara, J. P. Biological consequences of petite mutations in Candida glabrata. J. Antimicrob. Chemother. 2005, 56 (2), 307−14.


(16) Krcmery, V.; Barnes, A. J. Non-albicans Candida spp. causing fungaemia: pathogenicity and antifungal resistance. J. Hosp. Infect. 2002, 50 (4), 243−60. (17) Li, L.; Redding, S.; Dongari-Bagtzoglou, A. Candida glabrata: an emerging oral opportunistic pathogen. J. Dent. Res. 2007, 86 (3), 204− 15. (18) Silva, S.; Negri, M.; Henriques, M.; Oliveira, R.; Williams, D. W.; Azeredo, J. Adherence and biofilm formation of non-Candida albicans Candida species. Trends Microbiol. 2011, 19 (5), 241−7. (19) Banno, Y.; Yamada, T.; Nozawa, Y. Secreted phospholipases of the dimorphic fungus, Candida albicans; separation of three enzymes and some biological properties. Sabouraudia 1985, 23 (1), 47−54. (20) Barrett-Bee, K.; Hayes, Y.; Wilson, R. G.; Ryley, J. F. A comparison of phospholipase activity, cellular adherence and pathogenicity of yeasts. J. Gen. Microbiol. 1985, 131 (5), 1217−21. (21) Gilfillan, G. D.; Sullivan, D. J.; Haynes, K.; Parkinson, T.; Coleman, D. C.; Gow, N. A. Candida dubliniensis: phylogeny and putative virulence factors. Microbiology 1998, 144 (Pt 4), 829−38. (22) Ibrahim, A. S.; Mirbod, F.; Filler, S. G.; Banno, Y.; Cole, G. T.; Kitajima, Y.; Edwards, J. E. Jr.; Nozawa, Y.; Ghannoum, M. A. Evidence implicating phospholipase as a virulence factor of Candida albicans. Infect. Immun. 1995, 63 (5), 1993−8. (23) Magee, B. B.; Hube, B.; Wright, R. J.; Sullivan, P. J.; Magee, P. T. The genes encoding the secreted aspartyl proteinases of Candida albicans constitute a family with at least three members. Infect. Immun. 1993, 61 (8), 3240−3. (24) Monod, M.; Togni, G.; Hube, B.; Sanglard, D. Multiplicity of genes encoding secreted aspartic proteinases in Candida species. Mol. Microbiol. 1994, 13 (2), 357−68. (25) Takahashi, M.; Banno, Y.; Nozawa, Y. Secreted Candida albicans phospholipases: purification and characterization of two forms of lysophospholipase-transacylase. J. Med. Vet. Mycol. 1991, 29 (3), 193− 204. 
(26) Zaugg, C.; Borg-Von Zepelin, M.; Reichard, U.; Sanglard, D.; Monod, M. Secreted aspartic proteinase family of Candida tropicalis. Infect. Immun. 2001, 69 (1), 405−12. (27) Prasad, T. S. K.; Keerthikumar, S.; Chaerkady, R.; Kandasamy, K.; Renuse, S.; Marimuthu, A.; Venugopal, A.; Thomas, J.; Jacob, H.; Goel, R.; Pawar, H.; Sahasrabuddhe, N.; Krishna, V.; Nair, B.; Gucek, M.; Cole, R.; Ravikumar, R.; Harsha, H.; Pandey, A. Comparative Proteomic Analysis of Candida albicans and Candida glabrata. Clin. Proteomics 2010, 6 (4), 163−73. (28) Schmidt, P.; Walker, J.; Selway, L.; Stead, D.; Yin, Z.; Enjalbert, B.; Weig, M.; Brown, A. J. Proteomic analysis of the pH response in the fungal pathogen Candida glabrata. Proteomics 2008, 8 (3), 534−44. (29) Dujon, B.; Sherman, D.; Fischer, G.; Durrens, P.; Casaregola, S.; Lafontaine, I.; De Montigny, J.; Marck, C.; Neuveglise, C.; Talla, E.; Goffard, N.; Frangeul, L.; Aigle, M.; Anthouard, V.; Babour, A.; Barbe, V.; Barnay, S.; Blanchin, S.; Beckerich, J. M.; Beyne, E.; Bleykasten, C.; Boisrame, A.; Boyer, J.; Cattolico, L.; Confanioleri, F.; De Daruvar, A.; Despons, L.; Fabre, E.; Fairhead, C.; Ferry-Dumazet, H.; Groppi, A.; Hantraye, F.; Hennequin, C.; Jauniaux, N.; Joyet, P.; Kachouri, R.; Kerrest, A.; Koszul, R.; Lemaire, M.; Lesur, I.; Ma, L.; Muller, H.; Nicaud, J. M.; Nikolski, M.; Oztas, S.; Ozier-Kalogeropoulos, O.; Pellenz, S.; Potier, S.; Richard, G. F.; Straub, M. L.; Suleau, A.; Swennen, D.; Tekaia, F.; Wesolowski-Louvel, M.; Westhof, E.; Wirth, B.; Zeniou-Meyer, M.; Zivanovic, I.; Bolotin-Fukuhara, M.; Thierry, A.; Bouchier, C.; Caudron, B.; Scarpelli, C.; Gaillardin, C.; Weissenbach, J.; Wincker, P.; Souciet, J. L. Genome evolution in yeasts. Nature 2004, 430 (6995), 35−44. (30) Lowry, O. H.; Rosebrough, N. J.; Farr, A. L.; Randall, R. J. Protein measurement with the Folin phenol reagent. J. Biol. Chem. 1951, 193 (1), 265−75. (31) Harsha, H. C.; Molina, H.; Pandey, A. 
Quantitative proteomics using stable isotope labeling with amino acids in cell culture. Nat. Protoc. 2008, 3 (3), 505−16. (32) Chaerkady, R.; Harsha, H. C.; Nalli, A.; Gucek, M.; Vivekanandan, P.; Akhtar, J.; Cole, R. N.; Simmers, J.; Schulick, R. D.; Singh, S.; Torbenson, M.; Pandey, A.; Thuluvath, P. J. A quantitative proteomic approach for identification of potential biomarkers in hepatocellular carcinoma. J. Proteome Res. 2008, 7 (10), 4289−98. (33) Fang, Y.; Robinson, D. P.; Foster, L. J. Quantitative analysis of proteome coverage and recovery rates for upstream fractionation methods in proteomics. J. Proteome Res. 2010, 9 (4), 1902−12. (34) Rappsilber, J.; Mann, M.; Ishihama, Y. Protocol for micropurification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2007, 2 (8), 1896−906. (35) Lukashin, A. V.; Borodovsky, M. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 1998, 26 (4), 1107−15. (36) Stanke, M.; Diekhans, M.; Baertsch, R.; Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 2008, 24 (5), 637−44. (37) Krzywinski, M.; Schein, J.; Birol, I.; Connors, J.; Gascoyne, R.; Horsman, D.; Jones, S. J.; Marra, M. A. Circos: an information aesthetic for comparative genomics. Genome Res. 2009, 19 (9), 1639−45. (38) Wu, J. Q.; Shteynberg, D.; Arumugam, M.; Gibbs, R. A.; Brent, M. R. Identification of rat genes by TWINSCAN gene prediction, RT-PCR, and direct sequencing. Genome Res. 2004, 14 (4), 665−71. (39) Yandell, M.; Bailey, A. M.; Misra, S.; Shu, S.; Wiel, C.; Evans-Holm, M.; Celniker, S. E.; Rubin, G. M. A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome. Proc. Natl. Acad. Sci. U.S.A. 2005, 102 (5), 1566−71. (40) Sherman, D. J.; Martin, T.; Nikolski, M.; Cayla, C.; Souciet, J. L.; Durrens, P. Genolevures: protein families and synteny among complete hemiascomycetous yeast proteomes and genomes. Nucleic Acids Res. 2009, 37 (Database issue), D550−4. (41) Shevchenko, A.; Jensen, O. N.; Podtelejnikov, A. V.; Sagliocco, F.; Wilm, M.; Vorm, O.; Mortensen, P.; Boucherie, H.; Mann, M.
Linking genome and proteome by mass spectrometry: large-scale identification of yeast proteins from two dimensional gels. Proc. Natl. Acad. Sci. U.S.A. 1996, 93 (25), 14440−5. (42) Nagaraj, N.; Kulak, N. A.; Cox, J.; Neuhaus, N.; Mayr, K.; Hoerning, O.; Vorm, O.; Mann, M. Systems-wide perturbation analysis with near complete coverage of the yeast proteome by single-shot UHPLC runs on a bench-top Orbitrap. Mol. Cell. Proteomics 2011, DOI: 10.1074/mcp.M111.013722. (43) Chaerkady, R.; Kelkar, D. S.; Muthusamy, B.; Kandasamy, K.; Dwivedi, S. B.; Sahasrabuddhe, N. A.; Kim, M. S.; Renuse, S.; Pinto, S. M.; Sharma, R.; Pawar, H.; Sekhar, N. R.; Mohanty, A. K.; Getnet, D.; Yang, Y.; Zhong, J.; Dash, A. P.; Maccallum, R. M.; Delanghe, B.; Mlambo, G.; Kumar, A.; Prasad, K. T.; Okulate, M.; Kumar, N.; Pandey, A. A proteogenomic analysis of Anopheles gambiae using high-resolution Fourier transform mass spectrometry. Genome Res. 2011, 21, 1872−81. (44) Kelkar, D. S.; Kumar, D.; Kumar, P.; Balakrishnan, L.; Muthusamy, B.; Yadav, A. K.; Shrivastava, P.; Marimuthu, A.; Anand, S.; Sundaram, H.; Kingsbury, R.; Harsha, H. C.; Nair, B.; Prasad, T. S. K.; Chauhan, D. S.; Katoch, K.; Katoch, V. M.; Kumar, P.; Chaerkady, R.; Ramachandran, S.; Dash, D.; Pandey, A. Proteogenomic analysis of Mycobacterium tuberculosis by high resolution mass spectrometry. Mol. Cell. Proteomics 2011, DOI: 10.1074/mcp.M111.011445. (45) Stead, D.; Findon, H.; Yin, Z.; Walker, J.; Selway, L.; Cash, P.; Dujon, B. A.; Hennequin, C.; Brown, A. J.; Haynes, K. Proteomic changes associated with inactivation of the Candida glabrata ACE2 virulence-moderating gene. Proteomics 2005, 5 (7), 1838−48. (46) Stead, D. A.; Walker, J.; Holcombe, L.; Gibbs, S. R.; Yin, Z.; Selway, L.; Butler, G.; Brown, A. J.; Haynes, K. Impact of the transcriptional regulator, Ace2, on the Candida glabrata secretome. Proteomics 2010, 10 (2), 212−23. (47) Rogers, P. D.; Vermitsky, J. P.; Edlind, T. D.; Hilliard, G. M. 
Proteomic analysis of experimentally induced azole resistance in Candida glabrata. J. Antimicrob. Chemother. 2006, 58 (2), 434−8. (48) Peri, S.; Pandey, A. A reassessment of the translation initiation codon in vertebrates. Trends Genet. 2001, 17 (12), 685−7.


(49) Goetze, S.; Qeli, E.; Mosimann, C.; Staes, A.; Gerrits, B.; Roschitzki, B.; Mohanty, S.; Niederer, E. M.; Laczko, E.; Timmerman, E.; Lange, V.; Hafen, E.; Aebersold, R.; Vandekerckhove, J.; Basler, K.; Ahrens, C. H.; Gevaert, K.; Brunner, E. Identification and functional characterization of N-terminally acetylated proteins in Drosophila melanogaster. PLoS Biol. 2009, 7 (11), e1000236. (50) Polevoda, B.; Sherman, F. N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol. 2003, 325 (4), 595−622. (51) Kalume, D. E.; Peri, S.; Reddy, R.; Zhong, J.; Okulate, M.; Kumar, N.; Pandey, A. Genome annotation of Anopheles gambiae using mass spectrometry-derived data. BMC Genomics 2005, 6, 128. (52) Molina, H.; Bunkenborg, J.; Reddy, G. H.; Muthusamy, B.; Scheel, P. J.; Pandey, A. A proteomic analysis of human hemodialysis fluid. Mol. Cell. Proteomics 2005, 4 (5), 637−50.
