ARTICLE pubs.acs.org/jpr

Bioinformatics Analysis of a Saccharomyces cerevisiae N-Terminal Proteome Provides Evidence of Alternative Translation Initiation and Post-Translational N-Terminal Acetylation

Kenny Helsens,†,‡,^ Petra Van Damme,†,‡,^ Sven Degroeve,†,‡ Lennart Martens,†,‡ Thomas Arnesen,§,|| Joël Vandekerckhove,†,‡ and Kris Gevaert*,†,‡

† Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
‡ Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
§ Department of Molecular Biology, University of Bergen, N-5020 Bergen, Norway
|| Department of Surgery, Haukeland University Hospital, N-5021 Bergen, Norway



ABSTRACT: Initiation of protein translation is a well-studied fundamental process, but high-throughput, comprehensive determination of exact translation initiation sites (TIS) only recently became possible with the introduction of positional proteomics techniques that target protein N-termini. Precise translation initiation is of crucial importance, as truncated or extended proteins might fold, function, and localize erroneously. Still, as already shown for some proteins, alternative translation initiation can also serve as a regulatory mechanism. By applying N-terminal COFRADIC (combined fractional diagonal chromatography), we isolated N-terminal peptides of a Saccharomyces cerevisiae proteome and analyzed both annotated and alternative TIS. Analysis of this S. cerevisiae N-terminome identified 650 unique N-terminal peptides corresponding to database annotated TIS. Furthermore, 56 unique Nα-acetylated peptides were identified that suggest alternative TIS (MS/MS-based), while MS-based evidence of Nα-acetylation led to an additional 33 such peptides. To improve the overall sensitivity of the analysis, we also included the 5′ UTR (untranslated region) in-frame translations together with the yeast protein sequences in UniProtKB/Swiss-Prot. To ensure the quality of the individual peptide identifications, peptide-to-spectrum matches were only accepted at a 99% probability threshold and were subsequently analyzed in detail by the Peptizer tool to automatically ascertain their compliance with several expert criteria. We also identified 60 MS/MS-based and 117 MS-based Nα-acetylated peptides that point to Nα-acetylation as a post-translational modification, since these peptides neither started with nor were preceded (in their corresponding protein sequence) by a methionine residue. Next, we evaluated consensus sequence features of nucleic acids and amino acids across each of these groups of peptides and evaluated the results in the context of publicly available data. Taken together, we present a list of 706 annotated and alternative TIS for yeast proteins and found that under normal growth conditions alternative TIS might (co)occur in S. cerevisiae in roughly one tenth of all proteins. Furthermore, we found that the nucleic acid and amino acid features proximate to these alternative TIS favor either guanine or adenine nucleotides following the start codon or acidic amino acids following the initiator methionine. Finally, we also observed an unexpectedly high number of Nα-acetylated peptides that could not be related to TIS and therefore suggest events of post-translational Nα-acetylation.

KEYWORDS: Mass spectrometry, peptide-centric proteomics, COFRADIC, proteomics bioinformatics, alternative translation initiation, proteogenomics, Nα-acetylome, Nα-acetylation

INTRODUCTION

The general mechanism of protein translation has been documented extensively. Following transcription and processing, a mature mRNA molecule encodes an amino acid sequence that can be decoded by the cell’s translation machinery via three subsequent steps: initiation, elongation, and termination.1 Briefly, eukaryotic translation initiation starts when the ternary complex (Met-tRNA, GTP, and the eukaryotic translation initiation factor eIF2) combines with the 40S ribosomal subunit and other eukaryotic translation initiation factors (eIFs) to form the 43S preinitiation complex. This complex then interacts with the eIF4 complex, which also functions as a scaffold for the 5′-capped mRNA, together forming the 48S initiation complex. The latter complex then starts scanning the mRNA molecule until the appropriate initiation codon is located, followed by recruitment of the 60S ribosomal subunit, and ultimately leading to the formation of the 80S ribosome which initiates protein translation (reviewed in ref 2). The numerous factors involved in translation initiation suggest a complex regulatory mechanism, yet until now the determination of the initiation codon has been considered fairly straightforward: the first start codon encountered when scanning the processed transcript is the most likely translation initiation site.3 This is further strengthened by the presence of nucleic acid context features like the Kozak motif4 or preceding adenine stretches in AT-rich genomes such as that of yeast.5

Received: March 14, 2011
Published: May 30, 2011
© 2011 American Chemical Society

dx.doi.org/10.1021/pr2002325 | J. Proteome Res. 2011, 10, 3578–3589

Recognition of the initiation codon is of crucial importance because distinct translation initiation codons will result in multivalent translation or protein variants, which possibly hold functions that differ from the wild-type protein function. Moreover, an uncontrolled extension or reduction of a protein’s N-terminal sequence could, for example, interfere with protein localization by altering signal or transit sequences.6 Yet, alternative translation initiation occurs, and some principal causative mechanisms are known. One mechanism is leaky scanning, during which translation starting at the first initiation codon is circumvented in favor of a second (or third) initiation codon.7 Ribosome shunting is another mechanism, in which the first initiation codon is localized within a structured mRNA region and is therefore bypassed.8 Translation reinitiation, yet another mechanism, might occur when the 40S ribosomal subunit remains attached to the mRNA and recommences scanning until a second initiator codon is encountered and translation is restarted.9,10 Recapping the 5′ end of an mRNA is another post-transcriptional mechanism reported in Homo sapiens, Mus musculus, and Drosophila melanogaster and might also be an additional cause of alternative translation initiation in Saccharomyces cerevisiae.11,12 Furthermore, non-AUG codons have also been reported as Met-coding translation initiation sites if they are favorably positioned within a Kozak motif and thereby provide yet another route to alternative translation initiation.13,14 After translation initiation, the translation machinery recruits elongation factors that enable further translation of the protein sequence, but as soon as 30–50 N-terminal residues protrude from the ribosome, the nascent protein can become a target for various cotranslational modification events.
Here, the amino acid composition of a protein’s N-terminus greatly influences whether a protein is subjected to initiator Met removal by Met aminopeptidases (MAPs)15 and to events such as Nα-acetylation by Nα-acetyltransferases (NATs).16,17 Since these modifications were mostly considered to take place during translation,18 the identification of in vivo acetylated N-terminal peptides was here considered as a direct lead to genuine translation initiation sites (TIS). We therefore applied N-terminal COFRADIC (combined fractional diagonal chromatography),19 a positional proteomics methodology that strongly enriches for protein N-terminal peptides, which thus serve as proxies for translation initiation sites. In addition, we also determined and quantified the in vivo Nα-acetylation state20 of these peptides and considered in vivo Nα-acetylation as an additional proxy for translation initiation events. An overview of our general workflow is shown in Figure 1, which illustrates how we have generated a highly reliable proteome map composed of both genuine, database annotated and alternative S. cerevisiae TIS. This map can serve as a useful resource for further elaboration of the mechanisms of (alternative) protein translation initiation as well as protein N-terminal modifications.

EXPERIMENTAL SECTION

Sample Preparation and Proteome Analysis

S. cerevisiae proteomes were prepared as described.20 LC-MS/MS analysis, using an Ultimate 3000 HPLC system (Dionex, Amsterdam, The Netherlands) in-line connected to an LTQ Orbitrap XL mass spectrometer (Thermo Electron, Bremen, Germany), was performed as described.20


Figure 1. Overview of the workflow used to define the alternative TIS. A proteome sample was prepared from a S. cerevisiae lysate. This was subsequently separated by N-terminal COFRADIC to isolate fractions of protein N-terminal peptides. Following LTQ-Orbitrap LC-MS/MS analysis of these fractions, N-terminal peptides can be identified from the MS/MS spectra by the Mascot database search algorithm. Furthermore, Peptizer was used to assess orthogonal quality criteria to all peptide identifications and ensure the reliability of further downstream analysis. One important protocol step in the N-terminal COFRADIC methodology involves complete blocking of free amines by in vitro 13C2D3-acetylation, which thus allows us to separate in vivo and in vitro Nα-acetylated peptides. Taken together, we have gathered a list of in vivo Nα-acetylated protein N-termini that we could, relying on the cotranslational nature of Nα-acetyltransferases, further split into annotated and alternative translation initiation sites that, respectively, start at expected and unexpected protein coordinates.

Peptide Identification by Mascot

Mascot server version 2.2 from Matrix Science was used to identify the MS/MS spectra in the S. cerevisiae content of UniProtKB/Swiss-Prot (version 15.10) concatenated with the 5′ UTR peptide-centric database derived from the S. cerevisiae genome database SGD (additional data file 3, Supporting Information). A shuffled version of this concatenated database was created to estimate the false discovery rate in the results, which was 0.64%.21,22 The precursor ion tolerance was set to 10 ppm, and the fragment ion tolerance was set to 0.5 Da. Semispecific Arg-C/P was used as enzyme specificity, and no missed cleavages were allowed. The fixed modifications were 13C2D3-acetylation (+47 Da) on Lys, carbamidomethylation (+57 Da) on Cys, and oxidation (+16 Da) on Met, and the variable modifications were acetylation (+42 Da) and 13C2D3-acetylation (+47 Da) on the N-terminus. The charge state was set to allow singly, doubly, and triply charged peptides. All peptide identifications were subsequently processed, stored, and managed by ms-lims.23 Finally, all peptide identifications have been made publicly available via PRIDE [pride project: 16442].
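The shuffled-database strategy mentioned above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the simple ratio of decoy to target hits is an assumption for illustration; the text does not state which estimator was used or how the decoy search was configured.

```python
import random


def shuffle_database(proteins, seed=0):
    """Build a decoy database by shuffling the residues of every sequence.

    `proteins` maps accession -> sequence. Amino acid composition and
    length are preserved; only the residue order changes.
    """
    rng = random.Random(seed)
    decoys = {}
    for accession, sequence in proteins.items():
        residues = list(sequence)
        rng.shuffle(residues)
        decoys["DECOY_" + accession] = "".join(residues)
    return decoys


def estimate_fdr(n_target_hits, n_decoy_hits):
    """Assumed estimator: decoy hits divided by target hits at one threshold."""
    if n_target_hits == 0:
        return 0.0
    return n_decoy_hits / n_target_hits
```

Under this assumed estimator, a reported rate of 0.64% would correspond to roughly 64 decoy matches per 10000 target matches above the same score threshold.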


Figure 2. Nonrandom amino acid usage at S. cerevisiae and H. sapiens N-termini. The iceLogos show the percentage difference of amino acid frequencies between protein N-termini and random positions in the theoretical S. cerevisiae (A) and H. sapiens (B) proteome. Percentage differences are only shown if they are more than 99.9% significant based on Monte Carlo sampling. In S. cerevisiae (A), Ser occurs with a frequency of 23% at the first position following the initiator Met, which is 14% more than the random Ser frequency of 9%. This increased Ser frequency is also observed at positions further C-terminal along yeast protein N-termini. Interestingly, Met residues occur less frequently along N-termini than randomly expected. In H. sapiens (B), Ala occurs with a frequency of 23% at the first position following the initiator Met, which is 16% more than the random Ala frequency of 7%. This increased frequency of Ala continues along human protein N-termini, together with an increased frequency of leucine residues.

Validation of Peptide Identifications by Peptizer

Different quality-related rules were applied through Peptizer Agents. Peptides were required to be longer than 8 amino acids. One MS/MS spectrum could only yield one confident peptide identification (thus excluding all spectra that had ambiguous identifications). At least 40% of all b-ions and 40% of all y-ions needed to be found in the MS/MS spectrum, and fragment ions needed to be present from the first three N-terminal peptide bonds to increase the reporter quality of the N-terminal acetylation. Finally, peptides could not contain His residues, because pre-enrichment using SCX at low pH was performed.24 The XML-based agent profile can be downloaded from http://sites.google.com/site/peptizer/Home/profiles-1 and can then be loaded into the freely available Peptizer application (http://peptizer.googlecode.com).

Sequence Analysis by IceLogo

The annotated N-termini and the alternative N-termini that started with Met were aligned at the initiator Met, while the alternative N-termini that did not start with Met were aligned at their preceding amino acid to include the context of the peptides. Each peptide sequence was then mapped to its coding sequence as extracted from SGD, and the nucleic acids surrounding the peptide sequence were analyzed for sequence features by iceLogo.25 Using iceLogo, a reference and a target set of peptide sequences are tailored to the analysis, and the application then reports positional differences between these two sets by probability-based methods. For instance, iceLogo estimates the mean and standard deviation of the serine frequency at a single given position from 30 reference sets of 100 random peptides each. By comparing a target set of, for instance, 100 in vivo Nα-acetylated peptides to these estimates, iceLogo can report positional amino acid differences between the reference and target sets. The reference sets for the iceLogos were generated by random sampling of amino acids in the S. cerevisiae sequences of UniProtKB/Swiss-Prot 15.10 and by random sampling of nucleic acids in the ORF-Genomic-1000 fasta database from SGD. The sampling size was equal to the number of peptides in each group.
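The reference-set comparison described above can be sketched in Python as follows. This is not the iceLogo implementation: the function names are hypothetical, the background is assumed to be a pool of equal-length peptide strings at least as large as the target set, and the z-score is used here only as one plausible stand-in for the "probability-based methods" mentioned in the text.

```python
import random
import statistics


def residue_frequency(peptides, position, residue):
    """Fraction of peptides that carry `residue` at 0-based `position`."""
    hits = sum(1 for p in peptides if len(p) > position and p[position] == residue)
    return hits / len(peptides)


def positional_z_score(target, background, position, residue,
                       n_reference_sets=30, seed=0):
    """Compare a target set against randomly sampled reference sets.

    Each reference set has the same size as the target set, mirroring the
    statement that the sampling size equals the number of peptides per group.
    """
    rng = random.Random(seed)
    frequencies = []
    for _ in range(n_reference_sets):
        reference = rng.sample(background, len(target))
        frequencies.append(residue_frequency(reference, position, residue))
    mean = statistics.mean(frequencies)
    sd = statistics.stdev(frequencies)
    observed = residue_frequency(target, position, residue)
    if sd == 0:
        # Degenerate case: all reference sets gave the same frequency.
        return 0.0 if observed == mean else float("inf")
    return (observed - mean) / sd
```

A strongly positive score at a position would correspond to an overrepresented residue in the target set, such as Ser directly after the initiator Met.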

RESULTS AND DISCUSSION

In Silico Analysis of Database Annotated Protein N-Termini Reveals Nonrandom Usage of Amino Acids in S. cerevisiae

Before studying experimentally identified N-terminal peptides, we evaluated the theoretical amino acid composition of N-terminal peptides in the baker’s yeast proteome using the iceLogo application.25 We therefore compared S. cerevisiae amino acid frequencies from database annotated protein N-termini with general amino acid frequencies in yeast by random sampling. We observed that the random frequency of Ser residues is 9%, while the frequency of Ser at the second position in the protein sequence (following the initiator Met) is 23% (Figure 2A). At this position Ala and Thr are also overrepresented, although not as strongly as Ser. Interestingly, Met residues are significantly underrepresented at all but the first position in protein N-termini. In contrast, a similar in silico analysis on the theoretical human proteome revealed an increased frequency of Ala residues of 23% at the second position compared to the random frequency of 7% (Figure 2B), indicating that this bias may not be evolutionarily conserved.

Identification of Known and Alternative N-Termini by Positional Proteomics

The experimental data underlying this manuscript have been generated by the N-terminal COFRADIC positional proteomics approach.19 Briefly, prior to tryptic digestion, all primary amines are modified by 13C2D3-acetylation, which allows us to differentiate between in vivo acetylated and free N-termini (in vitro 13C2D3-acetylated) by introducing a spacing of 5 Da between these types of N-terminal peptides. Furthermore, this also allows a straightforward calculation of the extent of Nα-acetylation. After tryptic digestion, all protein N-terminal peptides will thus be blocked, while all other internal peptides will have a newly generated primary α-amine. We subsequently make use of this attribute to isolate N-terminal peptides from internal peptides in a diagonal chromatography setup, and this method thus allows us to select for protein N-termini followed by LC-MS/MS analysis for their identification. If protein translation initiation occurs upstream of existing database annotations, the observed N-terminal peptides will not be found in the protein sequence database(s) used to identify the MS/MS spectra, making it impossible to identify such N-terminal peptides. To allow identification of such events, we set up a proteogenomics approach for S. cerevisiae similar to prior endeavors aimed at validating and correcting TIS annotations in two Mycobacterium species26 and in D. melanogaster.27 We created

Figure 3. Creation of the 5′ UTR in-frame peptide-centric database. First, (A) each UniProtKB/Swiss-Prot entry was linked to its S. cerevisiae genome database identifier (SGD) using the PICR service, such that the N-terminal peptide of the protein could be aligned on its corresponding nucleic acid sequence. Second, (B) the algorithm extracts 1000 bp upstream of the annotated TIS and subsequently maps in-frame start and stop codons within the coding sequence. Third, (C) potential alternative TIS are translated, and the algorithm subsequently maps the closest downstream arginine residue to mimic Arg-C specificity. Fourth, (D) the potential alternative TIS sites are stored in the 5′ UTR in-frame peptide-centric database along with a distance index to the annotated TIS. This database is subsequently concatenated to the UniProtKB/Swiss-Prot yeast fraction for the actual sequence database searches.
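Steps B to D of the Figure 3 caption can be sketched in Python as below. This is an illustrative reading of the caption, not the authors' code: the function name, the choice to stop at the first in-frame upstream stop codon, and the distance index expressed in codons are all assumptions.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_tis_peptides(upstream, cds, max_upstream=1000):
    """Return (peptide, distance_in_codons) for in-frame upstream ATG codons.

    `upstream` is the DNA directly preceding the annotated start codon and
    `cds` starts at the annotated ATG. Each peptide runs from the upstream
    ATG to the first downstream Arg, mimicking Arg-C cleavage.
    """
    upstream = upstream[-max_upstream:]
    upstream = upstream[len(upstream) % 3:]  # keep the annotated frame
    full = upstream + cds
    tis_index = len(upstream)
    peptides = []
    # Walk backwards, codon by codon, from the annotated TIS.
    for start in range(tis_index - 3, -1, -3):
        codon = full[start:start + 3]
        if codon in STOP_CODONS:
            break  # assumption: an in-frame stop ends the usable upstream region
        if codon != "ATG":
            continue
        residues = []
        for pos in range(start, len(full) - 2, 3):
            aa = CODON_TABLE.get(full[pos:pos + 3], "X")
            if aa == "*":
                break
            residues.append(aa)
            if aa == "R":
                break
        if residues and residues[-1] == "R":
            peptides.append(("".join(residues), (tis_index - start) // 3))
    return peptides
```

Each returned peptide could then be written to a FASTA-style file and concatenated with the protein database before searching.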

a 5′ UTR extended protein sequence database to search the MS/MS spectra of a yeast proteome preparation from which protein N-terminal peptides were enriched.19 We first collected all S. cerevisiae protein entries from the UniProtKB/Swiss-Prot database and then obtained the corresponding 5′ UTR in-frame sequences from the yeast genome sequence in the S. cerevisiae genome database (SGD) using PICR.28 The algorithm that was used to extract peptides from these 5′ UTR derived sequences is schematically shown in Figure 3. This algorithm first locates in-frame upstream start codons that could potentially lead to alternative upstream TIS. Then, starting from these sites, the first downstream Arg codon is located to create a 5′ UTR, Arg-C specific N-terminal peptide (the Arg-C specific C-terminal cleavage is expected from an N-terminal COFRADIC analysis19). When applied to all S. cerevisiae UniProtKB entries, the algorithm produced

Figure 4. N-terminal coverage inspection by Peptizer. Nα-acetylated peptides were categorized as annotated N-termini (group A) and alternative N-termini (group B), located at annotated protein start positions or internal protein positions, respectively. The matrix plot shows sequence coverage values as observed by Peptizer. First, the ion coverage of the N-terminus is calculated as the percentage of single and/or double charged fragmentation ions found for its three first peptide bonds (b1, b2, b3, y(n-1), y(n-2), y(n-3)), and this metric is set along the vertical axis. The three major vertical categories reflect the distance to the N-terminus, given as the location of the fragment ion closest to the N-terminus (position 1, 2, or 3). Within these three vertical categories, the peptides are binned by total fragment ion coverage (minor horizontal axis). The size of each data point reflects the percentage of peptides found at that location. Well-substantiated peptides are thus located to the lower right on each row, indicating the presence of many fragment ions, including those that cover the N-terminal part of the peptide sequence.
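The two N-terminal metrics named in the Figure 4 caption can be sketched in Python as follows. The function names and the ion-label format ("b1", "y9") are hypothetical, and the sketch ignores charge state: an ion counts as found if it was observed at any charge, which simplifies the "single and/or double charged" wording of the caption.

```python
def n_terminal_ion_coverage(observed_ions, peptide_length):
    """Percentage of the six ions that report on the first three peptide bonds.

    For a peptide of n residues these are b1, b2, b3 and y(n-1), y(n-2), y(n-3).
    `observed_ions` is a set of labels such as {"b2", "y9"}.
    """
    n = peptide_length
    expected = [f"b{i}" for i in (1, 2, 3)] + [f"y{n - i}" for i in (1, 2, 3)]
    found = [ion for ion in expected if ion in observed_ions]
    return 100.0 * len(found) / len(expected)


def closest_covered_bond(observed_ions, peptide_length):
    """Return 1, 2, or 3 for the covered bond nearest the N-terminus, else None."""
    for i in (1, 2, 3):
        if f"b{i}" in observed_ions or f"y{peptide_length - i}" in observed_ions:
            return i
    return None
```

For a 10-residue peptide with b2, y9, and y8 observed, the coverage is 50% and the closest covered bond is position 1, since y9 spans the first peptide bond.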

a peptide-centric database (n = 134) encompassing all potential 5′ UTR alternative TIS generated upon Arg-C specific cleavage. The Mascot database searches were performed in a custom database consisting of the S. cerevisiae UniProtKB/Swiss-Prot database sequences combined with the potential 5′ UTR alternative TIS peptides. MS/MS spectra (n = 10466) were identified at 99% probability, with decoy database searches providing a peptide false discovery rate estimate of 0.64% (additional data file 1, Supporting Information). To further ensure the quality of the peptide identifications used in subsequent analyses, we complemented the database searches with additional identification validation using Peptizer.29 For this, a novel Peptizer Agent was created that specifically inspected the in vivo Nα-acetylation state by scanning for adequate fragment ion coverage (threshold of >40%). This Agent was complemented by a second Agent that inspected the peptide bond most proximal to the N-terminus for which either b- or y-ions were detected. A third Agent then inspected MS/MS spectra for the presence of conflicting secondary (or tertiary) peptide hits exceeding the 99% probability threshold, as these are indicative of ambiguous identification. As a result, 1023 peptide identifications (i.e., about 10%) were discarded from the original list, while the remaining 9443 peptide identifications were shown to be of superior reliability (additional data file 1, Supporting Information). These high-quality peptide identifications were then grouped by their TIS context, for which we defined two distinct parameters: the protein start position as annotated in UniProtKB/Swiss-Prot and the N-terminal modification status (additional data file 2, Supporting Information). First, group A (n = 650) includes all unique peptide sequences from database annotated TIS (starting at protein position one or two in UniProtKB/Swiss-Prot) that were identified as in vivo Nα-acetylated and/or in vivo unmodified (note that for several proteins Nα-acetylation is partial, implying that both Nα-acetylated and free peptides can be encountered in vivo). Second, group B contains all in vivo Nα-acetylated peptides with start positions upstream or downstream of the database annotated TIS (n = 116). Relying on the cotranslational nature of in vivo acetylation20 and the requirement of a Met-encoding initiator codon, 56 peptide sequences within this group hint toward protein N-termini originating from alternative TIS (“yes” in column “iMet/Ace-compliant” in additional data file 2, sheet 2, Supporting Information). Both peptide groups thus provide stringent and direct sequence information to study translation initiation by combining existing TIS annotations and direct evidence for Nα-acetylation. Supporting this claim, Figure 4 illustrates that the overall ion coverage and N-terminal ion coverage parameters assessed by Peptizer are similar in groups A and B, and we therefore argue that peptide identification quality is equally high in both groups of Nα-acetylated peptides.

Heterogeneous Translation Initiation

Altogether, we identified 56 in vivo Nα-acetylated peptides that started with or were preceded by a Met residue but that did not map to a known TIS at protein position 1 or 2 (Figure 5). To determine whether these constitute possible erroneous TIS annotations for the identified gene products or whether these

Figure 5. Alternative yeast protein N-termini with a proximal Met. The 56 in vivo Nα-acetylated peptides that start with or follow Met and point to alternative TIS (panel A) or possibly wrongly annotated TIS (panel B) are shown as orange bars and are aligned on the x-axis by their start site in the parent protein. Furthermore, database annotated N-acetylated N-terminal peptides identified in our experiment are shown as blue bars; peptides identified in the study by de Godoy et al. are represented by gray bars; and all peptides stored in PRIDE for S. cerevisiae are shown as green bars. Panel A displays the alternative Nα-acetylated peptides for which we have found preceding peptide sequence evidence, and we therefore suggest that these are examples of alternative TIS. Panel B displays all alternative Nα-acetylated peptides for which we have not found preceding peptide sequence evidence, although we have calculated that the annotated TIS peptide should be detectable by mass spectrometry in the [600 Da:4000 Da] mass interval. As these annotated TIS have never been reported in other studies although they do have masses detectable by mass spectrometry, it is probable that these annotated TIS are wrongly annotated, and the alternative TIS identified here might represent corrections to current TIS annotations.

are true cases of alternative TIS, we gathered public information on these proteins. First, we mapped all peptides from group A that were identified in our analysis as annotated TIS, thus starting at protein position 1 or 2. Additionally, we also mapped all peptides identified in the most comprehensive yeast proteome analysis performed to date.30 Furthermore, we used the BIOMART31 service to extract all identified yeast peptides that are stored in the PRIDE database32 (version of January 21, 2010). This allowed us to search for any identified peptides that preceded the TIS peptides identified in our analysis. If such preceding peptides were found, we propose that the Nα-acetylated peptides identified in our analysis resulted from alternative TIS. When no preceding sequence evidence was found, we further calculated whether the database annotated TIS would generate a detectable (mass between 600 and 4000 Da) Arg-C peptide in our analysis or a detectable Lys-C/P peptide with a maximum of two allowed missed cleavages according to the analysis performed by de Godoy et al.30 This analysis allowed us to further split up these 56 in vivo Nα-acetylated peptides into two categories. A first category consists of 39 Nα-acetylated peptides for which our experiment, the PRIDE database, or the experiment by de Godoy et al. found preceding peptide sequence evidence, thus providing examples of alternative TIS (Figure 5A). This group also encompasses the only two peptides that were identified using the 5′ UTR translation extensions (note that for both we also identified the annotated TIS). Our numbers therefore suggest that the annotated yeast TIS are highly reliable but that a few proteins have multiple TIS that might co-occur within or across experiments. Some of these alternative TIS are located quite close to the annotated TIS and potentially result from leaky scanning. Yet, even such small deviations are interesting, as various studies have reported on the drastic influence of minimal N-terminal extensions and/or modifications.33,34 Other alternative TIS, however, are located several residues downstream of the annotated TIS, and these might point to regulatory mechanisms mediated by translation initiation.35 The second category consists of 17 Nα-acetylated peptides for which we have not found preceding sequence evidence, although we estimated that the annotated N-terminal peptide should have been detectable by mass spectrometry (Figure 5B). It is of course important to realize that a missing signal from proteomic analyses cannot be used to rule out the presence of that peptide in a sample. Indeed, although we attempted to provide ample in silico evidence to ensure the quality of these peptide identifications, it is important to stress that further (biological) validation is necessary to ascertain these alternative TIS in general. Such a validation study however falls outside the scope of the current bioinformatics-oriented analysis, and this collection of 56 alternative TIS should therefore primarily be considered as a pointer for further studies on the corresponding transcripts and gene products and their regulated mechanisms of expression, similar to the studies undertaken in refs 35–38.
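The detectability check on the annotated Arg-C N-terminal peptide (mass between 600 and 4000 Da) can be sketched in Python as below. The function names are hypothetical, and the sketch deliberately ignores initiator Met removal, N-terminal acetylation, and the fixed modifications used in the search, so the masses are those of unmodified peptides only.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def arg_c_n_terminal_peptide(protein):
    """N-terminal peptide up to and including the first Arg (whole protein if none)."""
    cut = protein.find("R")
    return protein if cut == -1 else protein[:cut + 1]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def is_detectable(peptide, low=600.0, high=4000.0):
    """True if the peptide mass falls inside the stated detection window."""
    return low <= peptide_mass(peptide) <= high
```

For a hypothetical protein starting with MSEQRKLA, the Arg-C N-terminal peptide MSEQR has an unmodified mass of about 649.29 Da and would fall inside the window.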


Figure 6. Nucleic and amino acid sequence features of annotated TIS. Amino acid and nucleic acid iceLogos of peptides starting at protein position 1 or 2, thus agreeing with database annotated TIS, are shown. The upper panel shows the amino acid features, while the lower panel shows the nucleic acid features, with the trapezoid linking amino acid and nucleic acid indices. These annotated TIS show a strongly increased preference for Ser at protein position two and also show nucleic acid traits known to influence translation initiation, such as the Kozak sequence and the preceding adenine stretch. These features are in agreement with the theoretical calculations (Figure 2) and indicate that our experimental methodology provides a representative sample of protein TIS.

Annotation by Public Information

It has previously been shown that translation initiation can serve as a regulatory mechanism for protein localization and protein function.39–42 In this study, two alternative TIS of the CDC9 gene were identified that were previously reported to yield two DNA ligases involved in maintaining either the mitochondrial or the nuclear genome.43 Likewise, the UniProtKB/Swiss-Prot entry of TUP1 [Swiss-Prot:P16649] reports an erroneous initiation event matching the SFL2 protein variant reported by Fujita et al.44 The matching putative alternative TIS, which started at position 45, was also identified in this study, indicating that the SFL2 protein is generated by alternative translation initiation (additional data file 2, sheet 2, Supporting Information). Similarly, other alternative TIS reported in this study might confer different functionalities to their resulting protein isoforms, and some of these can be hypothesized following inspection of public information from UniProtKB/Swiss-Prot protein annotations. For instance, the FAS1 domain-containing protein YDR262W [Swiss-Prot:Q12331] identified here has a predicted signal peptide from position 1 to 26, and we identified an alternative translation initiation site at position 19 which might interfere with protein localization by creating a variant protein that can no longer be localized to vacuoles. The 54S ribosomal protein L36 (MrpL36p, [Swiss-Prot:P36531]) represents an essential nuclear encoded protein residing in mitochondria. Its transit peptide ranges from position 1 to 14.45 MrpL36p can be split into three domains by evolutionary traits: a nonconserved N-terminal domain and conserved central and C-terminal domains. MrpL36p is part of the large mitochondrial ribosome subunit and was linked to mRNA recognition and translation initiation.46 Interestingly, we have identified an alternative TIS at position 49 which likely interferes with the mitochondrial localization of MrpL36p, thus potentially allowing it to exert its function elsewhere. The ATP-dependent RNA helicase eIF4A [Swiss-Prot:P10081] is another example of a yeast protein with an alternative translation

Figure 7. Nucleic and amino acid sequence features of internal in vivo Nα-acetylated peptides. This figure shows iceLogos of in vivo Nα-acetylated peptides starting at protein positions beyond position 2, none of which therefore correspond to database annotated protein start positions. The upper panels show the amino acid features, while the lower panels show the nucleic acid features, with the trapezoid linking amino acid and nucleic acid indices. Sequences derived from peptides that start with or follow a Met residue, and that given their in vivo Nα-acetylation status most likely represent alternative TIS, are shown on the left side (A). For these peptides, a strong overrepresentation of Glu, Asp, and Asn is found. Of further note is that the overrepresentation of Ser seen for annotated TIS is absent in peptides resulting from alternative TIS. The nucleic acid features show overrepresentation of guanine (+4) and adenine (+5), in agreement with the coding part of the Kozak motif and/or the codons for Glu and Asp. On the right side (B), all sequences are shown from peptides that do not start with or follow Met and therefore point to post-translational acetylation events. Here, the amino acid features show an overrepresentation of Ser at the Nα-acetylation site, while the nucleic acid features only show modest differences.

initiation site. Our study showed that for this protein translation might start at position 27, thereby bypassing the functionally important N-terminal protein sequence (an F22A mutant was shown to abolish growth and ATPase activity as this amino acid is part of the Q motif and thereby the protein’s helicase function47). Analysis of Sequence Features

In support of the alternative TIS peptides directly identified by MS/MS spectra as in vivo Nα-acetylated peptides, we further inspected all MS spectra of MS/MS-identified in vitro Nα-13C2D3-acetylated peptides [48] for the presence of their in vivo Nα-acetylated counterparts (additional data file 2, sheet 3, Supporting Information). If this light "complement peptide" was found upon inspection of the MS spectra, we considered this as indirect evidence for in vivo Nα-acetylation [48]. The alternative TIS identified using this strategy were considered as extra groups in addition to the above-mentioned peptide groups A and B and are analyzed as supplementary evidence in the following sections.
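The mass relationship behind this "complement peptide" check can be sketched numerically. The example below is only an illustration and not the procedure used in this study, which relied on manual inspection of the MS spectra. It derives the expected spacing between an in vivo Nα-acetylated peptide (light acetyl group) and its in vitro Nα-13C2D3-acetylated counterpart from standard monoisotopic masses. The function names and the 0.01 m/z tolerance are assumptions chosen for the example.

```python
# Monoisotopic masses (Da) of the isotopes involved.
C12, C13 = 12.0, 13.0033548
H, D = 1.00782503, 2.01410178
O = 15.9949146

# Net mass added to a free alpha-amine by acetylation:
# the acetyl group (C2H3O) replaces one amine hydrogen.
LIGHT_ACETYL = 2 * C12 + 3 * H + O - H   # natural-isotope acetyl
HEAVY_ACETYL = 2 * C13 + 3 * D + O - H   # 13C2 D3 acetyl

def label_spacing_mz(charge: int) -> float:
    """m/z distance between heavy and light forms at a given charge state."""
    return (HEAVY_ACETYL - LIGHT_ACETYL) / charge

def expected_light_mz(heavy_mz: float, charge: int) -> float:
    """Where the light (in vivo acetylated) form should appear."""
    return heavy_mz - label_spacing_mz(charge)

def has_light_counterpart(heavy_mz, charge, observed_mz, tol=0.01):
    """True if any observed peak lies within tol of the expected light m/z.

    The tolerance is a hypothetical value, not taken from the study.
    """
    target = expected_light_mz(heavy_mz, charge)
    return any(abs(mz - target) <= tol for mz in observed_mz)

if __name__ == "__main__":
    print(f"mass difference: {HEAVY_ACETYL - LIGHT_ACETYL:.4f} Da")
    print(f"spacing at 2+:   {label_spacing_mz(2):.4f} m/z")
```

Under these masses the two forms differ by about 5.0255 Da, so a doubly charged pair is separated by roughly 2.51 m/z units.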

We subsequently examined amino and nucleic acid consensus features of the Nα-acetylated peptides by iceLogo [25]. The iceLogo of peptides resulting from translation initiation at annotated TIS and of the alternative TIS identified here is shown in Figure 6. The increased Ser frequency following the initiator Met observed in our theoretical analysis (Figure 1) clearly stands out and thus confirms that the N-terminal COFRADIC methodology provides a representative sample of protein TIS. Of further note is that in AT-rich genomes such as the yeast genome, translation initiation is strengthened following an adenine-rich nucleotide stretch, which is also observed here upon mapping the identified peptides onto the nucleic acid sequence of their corresponding genomic locations (Figure 6). In addition, the Kozak motif (−3A, +4G) is also present. In contrast to peptides resulting from translation initiation at annotated TIS, the iceLogo of peptides resulting from translation starting at alternative TIS (in vivo Nα-acetylated, upstream or downstream of the annotated TIS, and directly proximal to a Met residue) shows different features (Figure 7a). Glu, Asp, and Asn are highly preferred amino acids following Met, while the increased preference for Ser observed at annotated TIS is lacking. One explanation for the presence of these alternative TIS could be that the codons for both Glu and Asp start with guanine, which is in agreement with the +4 guanine position of the Kozak motif. On the other hand, the −3 adenine position of the Kozak motif as well as the preceding adenine-rich stretch are absent and therefore appear to be nonessential features for these alternative TIS (Figure 7a). It is important to mention that our Nα-acetylated alternative TIS are biased by consensus features steered by protein Nα-acetylation: for instance, alternative TIS starting with (Met-)Pro, which are never Nα-acetylated, are not represented in our data, and the nucleotide features matching such alternative TIS will be missing from this sequence analysis [27]. Still, the difference in Ser frequency between annotated and alternative TIS suggests the existence of discriminating factors for annotated and alternative TIS. The analogous group with indirect alternative TIS provides 33 further internal Met-starting Nα-acetylated peptides (additional data file 2, sheet 3, Supporting Information), and the corresponding sequence features are in perfect agreement with those deduced from the direct alternative TIS (Figure 8a).

Figure 8. Nucleic and amino acid sequence features of internal in vivo Nα-acetylated peptides (indirect via MS). This figure is analogous to Figure 7 and shows iceLogos of in vivo Nα-acetylated peptides starting at protein positions beyond 2, all of which therefore disagree with annotated TIS. The difference from Figure 7 is that these peptides were identified as in vitro Nα-acetylated peptides, yet the presence of the in vivo Nα-acetylated form could be established indirectly via manual inspection of the MS data. The upper panels illustrate amino acid features, while the lower panels illustrate nucleic acid features, with the trapezoid linking amino acid and nucleic acid indices. On the left-hand side (A), all sequences are derived from peptides that start with or follow a Met residue, and combined with their in vivo Nα-acetylation status, these most likely represent alternative TIS. The amino acid features show a strong overrepresentation of Glu, Asp, and Asn. Note that the overrepresentation of Ser that was observed with the annotated TIS is again absent among alternative TIS. The nucleic acid features have a strong overrepresentation of guanine (+4) and adenine (+5), in agreement with the coding part of the Kozak motif and/or the codons for Glu and Asp. On the right-hand side (B), all sequences are shown that are derived from peptides that do not start with or follow Met and therefore cannot spring from TIS. As a consequence, the in vivo Nα-acetylation state of these peptides cannot be an effect of cotranslational acetylation. The amino acid features show a remarkable overrepresentation of Ser at the Nα-acetylation site, while the nucleic acid features only show modest differences. Taken together, both the amino acid and nucleic acid features support the observations derived from the direct in vivo Nα-acetylated peptides in Figure 7.

The sequence features of the Nα-acetylated peptides that are upstream or downstream of annotated TIS, but do not start with or directly follow a Met residue, are shown in Figure 7b. These peptides were rather unexpected since in vivo Nα-acetylation is known to occur primarily as a cotranslational modification and not as a post-translational modification. However, some cases of post-translational Nα-acetylation are known [49–51]. A recent paper by Helbig et al. reports a similar population of post-translationally Nα-acetylated peptides, and these authors hypothesize that such peptides are associated with yet undetermined in vivo proteolytic events [52]. The majority of Nα-acetylated peptides in this group begin with Ser and therefore point to NatA-mediated Nα-acetylation, but it remains to be determined whether the NatA complex, one of its catalytic subunits (Naa10p or Naa50p), or yet another enzymatic activity is responsible for these acetylation events. Of further note is that three Pro-starting peptides were found here to be post-translationally acetylated, while such N-termini are never acetylated during cotranslational N-terminal acetylation [20,48]. Additionally, the analogous indirect group provides an extra 117 internal non-Met-starting Nα-acetylated peptides (additional data file 2, sheet 3, Supporting Information) of which the corresponding sequence features, including the overrepresented Ser, are nearly indistinguishable from those of the directly identified peptides in this group (Figure 8b).

Figure 9. Nucleic and amino acid sequence features of internal in vitro Nα-acetylated peptides that start with Met. This figure shows iceLogos of Met-starting in vitro Nα-acetylated peptides starting at protein positions beyond 2, for which we have found no evidence of in vivo Nα-acetylation. Given that these Met-starting peptides comprise 9.1% of all in vitro Nα-acetylated peptides, we suggest from this overrepresentation that a large fraction of the peptides in this group can also originate from alternative TIS. The upper panels illustrate amino acid features, while the lower panels illustrate nucleic acid features, with the trapezoid linking amino acid and nucleic acid indices. While the amino acid features are modest, the nucleic acid features are again in agreement with the coding part of the Kozak motif (guanine at +4).

Another supplementary group contains the remaining Met-starting in vitro Nα-13C2D3-acetylated peptides for which we observed no in vivo acetylated counterpart upon inspecting the raw MS data (additional data file 2, sheet 4, Supporting Information) but for which the presence of a putative alternative iMet residue might be indicative of alternative translation initiation. Although these peptides are representative of Nα-free N-termini, we consider this group a possible resource for alternative TIS since 9.1% of all in vitro 13C2D3-acetylated peptides begin with or are preceded by a Met residue, which is a roughly 4-fold increase compared to the random occurrence of Met in S. cerevisiae (2.1%). Taking into account that the N-terminal COFRADIC approach selects for protein N-termini, we suggest that a large fraction of these internal, Nα-free Met-preceded/starting peptides originate from alternative TIS. An iceLogo analysis of these 235 peptides revealed no particular amino acid features, yet the recurring +4 guanine again agrees with the above-mentioned findings (Figure 9).
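The grouping criterion used throughout this section (an internal peptide either starts with or directly follows a Met residue, or it does not) and the reported 9.1% versus 2.1% comparison can be illustrated with a short sketch. This is a minimal example under stated assumptions and not the authors' pipeline: the function names and the toy protein sequence are hypothetical, and positions are taken as 1-based as in the text.

```python
def is_met_proximal(protein_seq: str, start: int) -> bool:
    """True if the peptide starting at 1-based position `start`
    begins with Met or is directly preceded by Met."""
    if not 1 <= start <= len(protein_seq):
        raise ValueError("start position outside the protein sequence")
    first = protein_seq[start - 1]
    previous = protein_seq[start - 2] if start >= 2 else ""
    return first == "M" or previous == "M"

def fold_enrichment(observed_fraction: float, background_fraction: float) -> float:
    """Ratio of an observed frequency to its background frequency."""
    return observed_fraction / background_fraction

if __name__ == "__main__":
    toy_protein = "MSTKMEDLA"  # hypothetical sequence for illustration only
    for pos in (3, 5, 6):
        print(pos, is_met_proximal(toy_protein, pos))
    # Values quoted in the text: 9.1% of peptides versus 2.1% Met background.
    print(round(fold_enrichment(0.091, 0.021), 1))
```

With the percentages quoted above, the ratio comes out at about 4.3, consistent with the roughly 4-fold increase stated in the text.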

■ CONCLUDING REMARKS

A simple comparison between the general and the N-terminal amino acid composition illustrated that Ser, Thr, and Ala are overrepresented at yeast protein N-termini. One explanation for this could be that these three amino acid residues have small side-chain radii of gyration, which most often leads to removal of the initiator Met by MAPs. As a consequence, the increased presence of these amino acids at the second position in proteins might be driven by a pressure to recycle the energetically expensive Met for subsequent rounds of protein synthesis [53].

N-Terminal COFRADIC was employed to isolate protein N-termini as an experimental means to validate database-annotated TIS in yeast. Moreover, we also provide experimental evidence for the existence of 56 alternative TIS via direct identification of Nα-acetylated peptides and 33 further alternative TIS via indirect means, which is a significant increase over the few examples described in yeast to date. These numbers suggest that existing TIS annotations in yeast are highly reliable but that a fair number of proteins have multiple TIS that can co-occur within or across experiments. In this respect, our results might form a basis for various follow-up experiments (e.g., localization studies, protein–protein interaction assays, absolute quantification, functional assays) to study the biological consequences of alternative TIS for each of these protein variants. We have further found a significant preference for guanine and adenine immediately following the start codon of alternative TIS, while other nucleic acid traits known to promote translation initiation were absent in this analysis. As such, we suggest that the nucleic acid features immediately downstream of a start codon are major contributing factors to alternative TIS of Nα-acetylated protein N-termini.
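The position numbering used for these nucleic acid features can be made concrete with a small sketch. It assumes the usual convention in which the A of the start codon is +1 and there is no position 0, so −3 lies three bases upstream of that A and +4 is the first base after the start codon. The function name and the example sequence are hypothetical and serve only to illustrate the indexing; the codon assignments for Glu and Asp come from the standard genetic code.

```python
# Standard genetic code: all Glu and Asp codons begin with "GA",
# which places G at +4 and A at +5 when they directly follow ATG.
GLU_ASP_CODONS = {"GAA": "Glu", "GAG": "Glu", "GAT": "Asp", "GAC": "Asp"}

def start_codon_context(cdna: str, atg_index: int) -> dict:
    """Return the bases at -3, +4 and +5 around a start codon.

    `atg_index` is the 0-based index of the A of ATG in `cdna`.
    Positions that fall outside the sequence are reported as None.
    """
    seq = cdna.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")

    def base(offset: int):
        i = atg_index + offset
        return seq[i] if 0 <= i < len(seq) else None

    # +1 is the A of ATG, so +4 is offset 3 and +5 is offset 4.
    return {"-3": base(-3), "+4": base(3), "+5": base(4)}

if __name__ == "__main__":
    example = "GCCACCATGGAA"  # hypothetical sequence: ATG followed by a Glu codon
    print(start_codon_context(example, example.index("ATG")))
```

In this toy sequence the start codon is followed by GAA (Glu), which yields G at +4 and A at +5, the pattern described for the alternative TIS.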

■ ASSOCIATED CONTENT

Supporting Information

Additional data file 1 contains the original peptide identification list. These identifications are also available via PRIDE accession number 16442. Additional data file 2 contains four sheets corresponding to the four groups of Nα-acetylated peptides that have been defined throughout the manuscript. The first sheet contains all unique peptides starting at annotated protein positions 1 or 2 and represents the annotated TIS. The second sheet contains all in vivo Nα-acetylated identified peptides starting at internal protein positions (>2). The third sheet contains all in vitro Nα-acetylated identified peptides starting at internal protein positions (>2) for which we have found indirect evidence of in vivo Nα-acetylation via manual inspection of raw MS data. The fourth sheet contains all internal methionine-starting peptides for which we found no evidence of in vivo Nα-acetylation via manual inspection of raw MS data. Additional data file 3 is the FASTA database that was used to identify the MS/MS spectra. It consists of all S. cerevisiae entries from UniProtKB 15.10 and the 5′ UTR in-frame translations as described in the Experimental Section. This material is available free of charge via the Internet at http://pubs.acs.org.

■ AUTHOR INFORMATION

Corresponding Author

*Department of Medical Protein Research and Biochemistry, VIB and Faculty of Medicine and Health Sciences, Ghent University, A. Baertsoenkaai 3, B-9000 Ghent, Belgium. Tel.: +32-92649274. Fax: +32-92649496. E-mail: [email protected].

Author Contributions

^These authors contributed equally to this work.

■ ACKNOWLEDGMENT

K.H. and P.V.D. are Postdoctoral Fellows of the Research Foundation - Flanders (FWO-Vlaanderen). K.G. acknowledges support by research grants from the Fund for Scientific Research – Flanders (Belgium) (project numbers 3.G0042.07 and 3.G0440.10), the Concerted Research Actions (project BOF07/GOA/012) from Ghent University, and the Inter University Attraction Poles (IUAP06). K.G. and L.M. further acknowledge support of the Ghent University Multidisciplinary Research Partnership "Bioinformatics: from nucleotides to networks". The Norwegian Research Council and the Norwegian Cancer Society are thanked for financial support to T.A.

■ REFERENCES

(1) Thornton, S.; Anand, N.; Purcell, D.; Lee, J. Not just for housekeeping: protein initiation and elongation factors in cell growth and tumorigenesis. J. Mol. Med. 2003, 81 (9), 536–548.
(2) Van Der Kelen, K.; Beyaert, R.; Inze, D.; De Veylder, L. Translational control of eukaryotic gene expression. Crit. Rev. Biochem. Mol. Biol. 2009, 44 (4), 143–68.
(3) Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 1986, 44 (2), 283–292.
(4) Kozak, M. At least six nucleotides preceding the AUG initiator codon enhance translation in mammalian cells. J. Mol. Biol. 1987, 196 (4), 947–950.
(5) Nakagawa, S.; Niimura, Y.; Gojobori, T.; Tanaka, H.; Miura, K.-i. Diversity of preferred nucleotide sequences around the translation initiation codon in eukaryote genomes. Nucleic Acids Res. 2008, 36 (3), 861–871.
(6) Kim, G.; Cole, N. B.; Lim, J. C.; Zhao, H.; Levine, R. L. Dual sites of protein initiation control the localization and myristoylation of methionine sulfoxide reductase A. J. Biol. Chem. 2010, 285 (23), 18085–94.
(7) Kozak, M. Structural features in eukaryotic mRNAs that modulate the initiation of translation. J. Biol. Chem. 1991, 266 (30), 19867–70.
(8) Fütterer, J.; Kiss-Laszlo, Z.; Hohn, T. Nonlinear ribosome migration on cauliflower mosaic virus 35S RNA. Cell 1993, 73 (4), 789–802.
(9) Gaba, A.; Wang, Z.; Krishnamoorthy, T.; Hinnebusch, A. G.; Sachs, M. S. Physical evidence for distinct mechanisms of translational control by upstream open reading frames. EMBO J. 2001, 20 (22), 6453–63.
(10) Kozak, M. Effects of intercistronic length on the efficiency of reinitiation by eucaryotic ribosomes. Mol. Cell. Biol. 1987, 7 (10), 3438–45.
(11) Plessy, C.; Bertin, N.; Takahashi, H.; Simone, R.; Salimullah, M.; Lassmann, T.; Vitezic, M.; Severin, J.; Olivarius, S.; Lazarevic, D.; Hornig, N.; Orlando, V.; Bell, I.; Gao, H.; Dumais, J.; Kapranov, P.; Wang, H.; Davis, C. A.; Gingeras, T. R.; Kawai, J.; Daub, C. O.; Hayashizaki, Y.; Gustincich, S.; Carninci, P. Linking promoters to functional transcripts in small samples with nanoCAGE and CAGEscan. Nat. Methods 2010, 7 (7), 528–34.
(12) Schoenberg, D. R.; Maquat, L. E. Re-capping the message. Trends Biochem. Sci. 2009, 34 (9), 435–42.
(13) Geoghegan, K. F.; Feng, X.; Chang, J. S.; Kelleher, K.; Wu, P. W.; Lin, L.; Rajamohan, F. Initiation of translation at an upstream non-AUG codon accounting for N-terminally extended minor forms of recombinant proteins expressed in insect cells. Protein Expression Purif. 2011, 76 (1), 72–8.
(14) Chang, K. J.; Wang, C. C. Translation initiation from a naturally occurring non-AUG codon in Saccharomyces cerevisiae. J. Biol. Chem. 2004, 279 (14), 13778–85.
(15) Li, X.; Chang, Y. H. Amino-terminal protein processing in Saccharomyces cerevisiae is an essential function that requires two distinct methionine aminopeptidases. Proc. Natl. Acad. Sci. U.S.A. 1995, 92 (26), 12357–61.
(16) Driessen, H. P.; de Jong, W. W.; Tesser, G. I.; Bloemendal, H. The mechanism of N-terminal acetylation of proteins. CRC Crit. Rev. Biochem. 1985, 18 (4), 281–325.
(17) Polevoda, B.; Sherman, F. N-terminal acetyltransferases and sequence requirements for N-terminal acetylation of eukaryotic proteins. J. Mol. Biol. 2003, 325 (4), 595–622.
(18) Kendall, R. L.; Yamada, R.; Bradshaw, R. A. Cotranslational amino-terminal processing. Methods Enzymol. 1990, 185, 398–407.
(19) Gevaert, K.; Goethals, M.; Martens, L.; Van Damme, J.; Staes, A.; Thomas, G. R.; Vandekerckhove, J. Exploring proteomes and analyzing protein processing by mass spectrometric identification of sorted N-terminal peptides. Nat. Biotechnol. 2003, 21 (5), 566–9.
(20) Arnesen, T.; Van Damme, P.; Polevoda, B.; Helsens, K.; Evjenth, R.; Colaert, N.; Varhaug, J. E.; Vandekerckhove, J.; Lillehaug, J. R.; Sherman, F.; Gevaert, K. Proteomics analyses reveal the evolutionary conservation and divergence of N-terminal acetyltransferases from yeast and humans. Proc. Natl. Acad. Sci. U.S.A. 2009, 106 (20), 8157–62.
(21) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4 (3), 207–214.
(22) Martens, L.; Vandekerckhove, J.; Gevaert, K. DBToolkit: processing protein databases for peptide-centric proteomics. Bioinformatics 2005, 21, 3584–5.
(23) Helsens, K.; Colaert, N.; Barsnes, H.; Muth, T.; Flikka, K.; Staes, A.; Timmerman, E.; Wortelkamp, S.; Sickmann, A.; Vandekerckhove, J.; Gevaert, K.; Martens, L. ms_lims, a simple yet powerful open source LIMS for mass spectrometry-driven proteomics. Proteomics 2010, DOI: 10.1002/pmic.200900409.
(24) Staes, A.; Van Damme, P.; Helsens, K.; Demol, H.; Vandekerckhove, J.; Gevaert, K. Improved recovery of proteome-informative, protein N-terminal peptides by combined fractional diagonal chromatography (COFRADIC). Proteomics 2008, 8, 1362–70.
(25) Colaert, N.; Helsens, K.; Martens, L.; Vandekerckhove, J.; Gevaert, K. Improved visualization of protein consensus sequences by iceLogo. Nat. Methods 2009, 6 (11), 786–7.
(26) Gallien, S.; Perrodou, E.; Carapito, C.; Deshayes, C.; Reyrat, J.-M.; Van Dorsselaer, A.; Poch, O.; Schaeffer, C.; Lecompte, O. Orthoproteogenomics: multiple proteomes investigation through orthology and a new MS-based protocol. Genome Res. 2009, 19 (1), 128–35.
(27) Goetze, S.; Qeli, E.; Mosimann, C.; Staes, A.; Gerrits, B.; Roschitzki, B.; Mohanty, S.; Niederer, E. M.; Laczko, E.; Timmerman, E.; Lange, V.; Hafen, E.; Aebersold, R.; Vandekerckhove, J.; Basler, K.; Ahrens, C. H.; Gevaert, K.; Brunner, E. Identification and functional characterization of N-terminally acetylated proteins in Drosophila melanogaster. PLoS Biol. 2009, 7 (11), e1000236.
(28) Côté, R. G.; Jones, P.; Martens, L.; Kerrien, S.; Reisinger, F.; Lin, Q.; Leinonen, R.; Apweiler, R.; Hermjakob, H. The Protein Identifier Cross-Referencing (PICR) service: reconciling protein identifiers across multiple source databases. BMC Bioinf. 2007, 8, 401.
(29) Helsens, K.; Timmerman, E.; Vandekerckhove, J.; Gevaert, K.; Martens, L. Peptizer: A tool for assessing false positive peptide identifications and manually validating selected results. Mol. Cell. Proteomics 2008, 7, 2363–72.
(30) de Godoy, L. M. F.; Olsen, J. V.; Cox, J.; Nielsen, M. L.; Hubner, N. C.; Fröhlich, F.; Walther, T. C.; Mann, M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 2008, 455 (7217), 1251–4.
(31) Smedley, D.; Haider, S.; Ballester, B.; Holland, R.; London, D.; Thorisson, G.; Kasprzyk, A. BioMart--biological queries made easy. BMC Genomics 2009, 10, 22.
(32) Martens, L.; Hermjakob, H.; Jones, P.; Adamski, M.; Taylor, C.; States, D.; Gevaert, K.; Vandekerckhove, J.; Apweiler, R. PRIDE: the proteomics identifications database. Proteomics 2005, 5 (13), 3537–45.
(33) Meinnel, T.; Serero, A.; Giglione, C. Impact of the N-terminal amino acid on targeted protein degradation. Biol. Chem. 2006, 387 (7), 839–51.
(34) Hwang, C. S.; Shemorry, A.; Varshavsky, A. N-terminal acetylation of cellular proteins creates specific degradation signals. Science 2010, 327 (5968), 973–7.
(35) Erhardt, M.; Wegrzyn, R. D.; Deuerling, E. Extra N-terminal residues have a profound effect on the aggregation properties of the potential yeast prion protein Mca1. PLoS One 2010, 5 (3), e9929.
(36) Welch, E. M.; Jacobson, A. An internal open reading frame triggers nonsense-mediated decay of the yeast SPT10 mRNA. EMBO J. 1999, 18 (21), 6134–45.
(37) Outten, C. E.; Culotta, V. C. Alternative start sites in the Saccharomyces cerevisiae GLR1 gene are responsible for mitochondrial and cytosolic isoforms of glutathione reductase. J. Biol. Chem. 2004, 279 (9), 7785–91.
(38) Antunez de Mayolo, A.; Lisby, M.; Erdeniz, N.; Thybo, T.; Mortensen, U. H.; Rothstein, R. Multiple start codons and phosphorylation result in discrete Rad52 protein species. Nucleic Acids Res. 2006, 34 (9), 2587–97.
(39) Elgersma, Y.; van Roermund, C. W.; Wanders, R. J.; Tabak, H. F. Peroxisomal and mitochondrial carnitine acetyltransferases of Saccharomyces cerevisiae are encoded by a single gene. EMBO J. 1995, 14 (14), 3472–9.
(40) Wolfe, C. L.; Lou, Y. C.; Hopper, A. K.; Martin, N. C. Interplay of heterogeneous transcriptional start sites and translational selection of AUGs dictate the production of mitochondrial and cytosolic/nuclear tRNA nucleotidyltransferase from the same gene in yeast. J. Biol. Chem. 1994, 269 (18), 13361–6.
(41) Gillman, E. C.; Slusher, L. B.; Martin, N. C.; Hopper, A. K. MOD5 translation initiation sites determine N6-isopentenyladenosine modification of mitochondrial and cytoplasmic tRNA. Mol. Cell. Biol. 1991, 11 (5), 2382–90.
(42) Boguta, M.; Hunter, L. A.; Shen, W. C.; Gillman, E. C.; Martin, N. C.; Hopper, A. K. Subcellular locations of MOD5 proteins: mapping of sequences sufficient for targeting to mitochondria and demonstration that mitochondrial and nuclear isoforms commingle in the cytosol. Mol. Cell. Biol. 1994, 14 (4), 2298–306.
(43) Willer, M.; Rainey, M.; Pullen, T.; Stirling, C. J. The yeast CDC9 gene encodes both a nuclear and a mitochondrial form of DNA ligase I. Curr. Biol. 1999, 9 (19), 1085–94.
(44) Fujita, A.; Matsumoto, S.; Kuhara, S.; Misumi, Y.; Kobayashi, H. Cloning of the yeast SFL2 gene: its disruption results in pleiotropic phenotypes characteristic for tup1 mutants. Gene 1990, 89 (1), 93–9.
(45) Grohmann, L.; Graack, H. R.; Kruft, V.; Choli, T.; Goldschmidt-Reisin, S.; Kitakawa, M. Extended N-terminal sequencing of proteins of the large ribosomal subunit from yeast mitochondria. FEBS Lett. 1991, 284 (1), 51–6.
(46) Williams, E. H.; Perez-Martinez, X.; Fox, T. D. MrpL36p, a highly diverged L31 ribosomal protein homolog with additional functional domains in Saccharomyces cerevisiae mitochondria. Genetics 2004, 167 (1), 65–75.
(47) Tanner, N. K.; Cordin, O.; Banroques, J.; Doere, M.; Linder, P. The Q motif: a newly identified motif in DEAD box helicases may regulate ATP binding and hydrolysis. Mol. Cell 2003, 11 (1), 127–38.
(48) Van Damme, P.; Van Damme, J.; Demol, H.; Staes, A.; Vandekerckhove, J.; Gevaert, K. A review of COFRADIC techniques targeting protein N-terminal acetylation. BMC Proc. 2009, 3 (Suppl 6), S6.
(49) Chang, H. H.; Falick, A. M.; Carlton, P. M.; Sedat, J. W.; DeRisi, J. L.; Marletta, M. A. N-terminal processing of proteins exported by malaria parasites. Mol. Biochem. Parasitol. 2008, 160, 107–15.
(50) Gordiyenko, Y.; Deroo, S.; Zhou, M.; Videler, H.; Robinson, C. V. Acetylation of L12 increases interactions in the Escherichia coli ribosomal stalk complex. J. Mol. Biol. 2008, 380, 404–14.
(51) Wang, Z.; Obidike, J. E.; Schey, K. L. Posttranslational Modifications of Bovine Lens Beaded Filament Proteins Filensin and CP49. Invest. Ophthalmol. Visual Sci. 2009, DOI: 10.1167/iovs.09-4565.
(52) Helbig, A. O.; Rosati, S.; Pijnappel, P. W.; van Breukelen, B.; Timmers, M. H.; Mohammed, S.; Slijper, M.; Heck, A. J. Perturbation of the yeast N-acetyltransferase NatB induces elevation of protein phosphorylation levels. BMC Genomics 2010, 11, 685.
(53) Frottin, F.; Martinez, A.; Peynot, P.; Mitra, S.; Holz, R. C.; Giglione, C.; Meinnel, T. The proteomics of N-terminal methionine cleavage. Mol. Cell. Proteomics 2006, 5 (12), 2336–49.


dx.doi.org/10.1021/pr2002325 |J. Proteome Res. 2011, 10, 3578–3589