Identification of Phosphorylation Sites in Protein Kinase A Substrates

Identification of Phosphorylation Sites in Protein Kinase A Substrates Using Artificial Neural Networks and Mass Spectrometry Majbrit Hjerrild,*,† Allan Stensballe,‡ Thomas E. Rasmussen,† Christine B. Kofoed,† Nikolaj Blom,§ Thomas Sicheritz-Ponten,§,| Martin R. Larsen,‡ Søren Brunak,§ Ole N. Jensen,‡ and Steen Gammeltoft† Department of Clinical Biochemistry, Glostrup Hospital, Nordre Ringvej 57, DK-2600 Glostrup, Denmark, Department of Biochemistry and Molecular Biology, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark, Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, Anker Engelunds Vej 1, DK-2800 Kgs. Lyngby, Denmark, and Department of Medicinal Chemistry, Division of Pharmacognosy, Uppsala University, BMC, 751 23 Uppsala, Sweden Received November 7, 2003

Protein phosphorylation plays a key role in cell regulation and identification of phosphorylation sites is important for understanding their functional significance. Here, we present an artificial neural network algorithm: NetPhosK (http://www.cbs.dtu.dk/services/NetPhosK/) that predicts protein kinase A (PKA) phosphorylation sites. The neural network was trained with a positive set of 258 experimentally verified PKA phosphorylation sites. The predictions by NetPhosK were validated using four novel PKA substrates: Necdin, RFX5, En-2, and Wee 1. The four proteins were phosphorylated by PKA in vitro and 13 PKA phosphorylation sites were identified by mass spectrometry. NetPhosK was 100% sensitive and 41% specific in predicting PKA sites in the four proteins. These results demonstrate the potential of using integrated computational and experimental methods for detailed investigations of the phosphoproteome. Keywords: protein kinase A • phosphorylation site prediction • neural network analysis • mass spectrometry

* To whom correspondence should be addressed: Tel.: +45 43 23 24 75. Fax: +45 43 23 39 29. E-mail: [email protected].
† Department of Clinical Biochemistry, Glostrup Hospital.
‡ Department of Biochemistry and Molecular Biology, University of Southern Denmark.
§ Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark.
| Department of Medicinal Chemistry, Division of Pharmacognosy, Uppsala University.

Journal of Proteome Research 2004, 3, 426-433

Published on Web 02/28/2004

10.1021/pr0341033 CCC: $27.50 © 2004 American Chemical Society

Introduction Protein phosphorylation catalyzed by protein kinases is a ubiquitous intracellular post-translational modification found in eukaryotes as well as prokaryotes. Reversible protein phosphorylation is involved in the regulation of diverse biological processes such as cell proliferation, differentiation, apoptosis, and metabolism.1-3 Eukaryotic protein kinases can be classified into two categories based on their target amino acid, i.e., protein tyrosine kinases and protein serine/threonine kinases.4 Modification of serine and threonine is far more abundant than tyrosine phosphorylation. Within the large family of serine/threonine kinases, protein kinase A (PKA), also known as cAMP-dependent protein kinase, is one of the best understood.5 Accordingly, it often serves as a prototype for the whole family. PKA is ubiquitous and is involved in signal transduction and regulation of many cellular processes including metabolism, secretion, contraction, motility, division, and differentiation. PKA was the first protein kinase to be analyzed by X-ray crystallography, which defined the protein kinase fold that is conserved in about 500 kinases in the human genome.6 Even though the number of identified phosphoproteins is rapidly increasing, it is believed that only a small fraction of physiological phosphorylation sites has yet been assigned.7 It has been estimated that 30-50% of the proteome consists of phosphoproteins.8 A challenge is to characterize the phosphoproteome in order to provide structural and functional information about all phosphorylated proteins in cells. Today, a variety of experimental methods are available for characterization of phosphoproteins. Commonly, techniques such as immunoblotting or immunoprecipitation with phosphospecific antibodies,9-11 phosphopeptide mapping on TLC plates,12 mutagenesis, and Edman degradation13,14 are performed. Recently, functional proteomics using bioinformatics or mass spectrometry has been applied to identify phosphoproteins and map the localization of the phosphorylation sites.15-17 Bioinformatics-based methods have been introduced for recognition of phosphorylation sites in proteins from primary structure. Prosite is based on a search for the consensus motif of a given kinase in a protein substrate.18 The consensus motif has been established by studying the specificity of the kinase with the aid of substituted peptide substrates. It includes the target amino acid (P0) and specifies the positions relative to it (P+1, P+2.../P-1, P-2, etc.). Although some kinases such as PKA, PKC, CK2, and CaM kinase II have been studied thoroughly, little or no information is available about many other

research articles

Mapping of PKA Phosphorylation Sites

kinases.8 Phosphobase 2.0 is a database of experimentally verified phosphorylation sites.7 It contains more than 400 protein entries, i.e., phosphoproteins, with more than 1400 experimentally determined phosphorylation sites. Entries are compiled and revised from the literature and from major protein sequence databases such as Swissprot and PIR. NetPhos is an algorithm in which phosphorylation sites are identified by neural network analysis of a protein substrate.19 The neural network is trained with a set of positive sites, i.e., experimentally verified phosphorylation sites, and a set of negative sites, i.e., phosphoacceptor amino acids that have not been reported to be phosphorylated. In its first version, NetPhos is general for each of the three classes of kinases and not specific for individual kinases. NetPhos predicts phosphorylation sites of either serine/threonine or tyrosine kinases in a potential protein substrate. Scansite is a search algorithm based on experimental phosphorylation of oriented peptide libraries.20 The optimal protein kinase substrate motif is determined by enriching for and sequencing phosphorylated peptides following the kinase reaction. Scansite predicts phosphorylation sites for selected kinases that have been analyzed with respect to phosphorylation of peptide libraries. It may be combined with identification of sequence motifs that are likely to bind to specific phosphorylated protein domains such as 14-3-3, SH2, and PTB domains. This will increase the specificity of the prediction. Finally, Predikin has been developed for structure-based search for kinase substrates.21 Using the available crystal structures, molecular modeling, and sequence analyses of kinases and substrates, a set of rules was developed that governs the binding of a heptapeptide substrate motif (surrounding the phosphorylation site) to the kinase. These rules were implemented in a program for automated prediction of optimal substrate peptides taking only the amino acid sequence of the protein kinase as input. Predikin can locate phosphorylation sites in a single protein sequence with high reliability. However, its key utility is to predict novel putative substrates through searching protein databases.

Mass spectrometry is a sensitive analytical technique for the detection of phosphopeptides derived from proteins and for the exact localization of phosphorylated amino acid residues. Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) and electrospray tandem mass spectrometry (MS/MS) have been established as powerful tools for investigation of post-translational modifications in proteins, including phosphoproteins.16 MALDI-TOF MS is a sensitive and specific analytical method for phosphoprotein characterization because phosphorylation of polypeptides is measured by an increase in protein or peptide molecular mass.16 Recent advances in MALDI MS/MS technology enable phosphopeptide detection and sequencing in a single experiment.22 Furthermore, phosphate release from phosphorylated molecules can be induced by MS/MS to produce highly specific and diagnostic reporter ion signals.16,23 However, femtomole-level phosphopeptide mapping by mass spectrometry is still an analytical challenge for a number of reasons. First, phosphopeptides are usually detected with low efficiency or not at all by mass spectrometry, probably due to their acidic character. Second, the phosphorylation may not be stoichiometric at any given site in a protein, i.e., a phosphopeptide may be of low abundance compared to the unmodified peptide. Third, phosphorylated residues may interfere with the tryptic digestion of the phosphoprotein, resulting in peptide fragments that are too large for analysis by mass spectrometry. Finally, phosphopeptides are often relatively hydrophilic and they may elute in the void volume during chromatography for peptide purification. Some of these limitations were circumvented by use of chromatography to purify and enrich the phosphoproteins in the mixture. Reversed-phase liquid chromatography (LC) in combination with electrospray MS/MS was applied for the separation, selective detection, and sequencing of phosphopeptides.10,11,24 Furthermore, affinity-based techniques with either antiphosphotyrosine precipitation or immobilized metals, in particular Fe(III)-immobilized metal affinity chromatography (IMAC) columns, were used prior to LC-MS/MS for phosphopeptide purification and identification.10,17,24,25 Overall, mass spectrometry in combination with optimized sample preparation methods has resulted in significant progress in the characterization of the phosphoproteome.

In the present study, we have investigated the value of neural network analysis for prediction of PKA phosphorylation sites. Four proteins including the transcription factors Necdin, RFX5, En-2, and the tyrosine kinase Wee 1 were selected as putative PKA substrates. The phosphorylation sites were predicted using the PKA-specific neural network NetPhosK. The four proteins were phosphorylated by the catalytic subunit of PKA in vitro. Thirteen in vitro PKA sites were subsequently identified by MALDI-TOF MS or LC-MS/MS in the four proteins. The neural network was able to predict PKA phosphorylation sites with a sensitivity of 100% and a specificity of 41%.
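As an illustration of how the positive and negative example sets described above for neural network training can be assembled, the following Python sketch enumerates every serine and threonine in a sequence, labels it by membership in a set of experimentally verified sites, and extracts a symmetric sequence window around it. The function name, the padding character, the default flank of 10 residues, and the toy sequence are assumptions made for this example; they are not taken from the NetPhos or NetPhosK implementation.

```python
from typing import List, Set, Tuple

# Phosphoacceptor residues considered here (serine/threonine kinases only).
PHOSPHOACCEPTORS = {"S", "T"}


def extract_windows(
    sequence: str, known_sites: Set[int], flank: int = 10, pad: str = "X"
) -> List[Tuple[int, str, bool]]:
    """Return (1-based position, window, is_positive) for every Ser/Thr.

    `flank` residues are taken on each side of the candidate residue;
    positions beyond the termini are filled with `pad` (an assumption).
    """
    padded = pad * flank + sequence + pad * flank
    examples = []
    for i, residue in enumerate(sequence):
        if residue in PHOSPHOACCEPTORS:
            # Residue i of `sequence` sits at index i + flank of `padded`,
            # so this slice is centred on the candidate residue.
            window = padded[i : i + 2 * flank + 1]
            examples.append((i + 1, window, (i + 1) in known_sites))
    return examples


# Hypothetical toy sequence with one "verified" site at position 5.
for position, window, positive in extract_windows("MKRRSSLTA", {5}, flank=3):
    print(position, window, "positive" if positive else "negative")
```

Residues labelled negative in this way are only assumed not to be phosphorylated, which is the same caveat that applies to the negative sets described in the text.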

Experimental Procedures Development of NetPhosK. The dataset used for development of NetPhosK contained 258 experimentally verified PKA phosphorylation sites extracted from PhosphoBase 7, the SwissProt database, and literature. The PKA sequence logo was used for displaying the position specific features of complex sequence alignments as described earlier.26 An artificial neural network was trained on a set of PKA phosphorylation sites and cross-validated using the same principles as described earlier for a general phosphorylation site prediction method.19 The NetPhosK module used here for prediction of PKA sites is part of a larger package for kinase-specific phosphorylation site prediction (Sicheritz-Ponten et al. 2004, manuscript in preparation). Prediction of PKA Sites. PKA phosphorylation sites were predicted in Necdin, RFX5, En-2, and Wee 1 by the newly developed neural network NetPhosK. The neural network prediction server is available on the Internet at http://www.cbs.dtu.dk/services/NetPhosK. For comparison, the www-accessible prediction servers Scansite (http://scansite.mit.edu/) and Prosite (http://www.expasy.ch/prosite/) were used for prediction of PKA sites in the four proteins. Cloning. Plasmids encoding murine En-2, human Wee 1, and human RFX5 were generously given by Alexandra Joyner, Department of Cell Biology, New York, USA; Tony Hunter, The Salk Institute for Biological Studies, San Diego, USA; and Walter Reith, Department of Genetics and Microbiology, University of Geneva Medical School, Switzerland, respectively. Human Necdin was amplified by PCR using chromosomal DNA as template. Necdin was subsequently cloned into a pMT2 vector using XhoI and KpnI cloning sites. En-2, Wee 1, and RFX5 were sub-cloned into a pMT2-MCS vector using SalI and BsiWI cloning sites. The pMT2 and pMT2-MCS vector were used for expression of HA-tagged proteins in mammalian cells. Cell Culture and Transfection. 
COS-7 cells were grown in Dulbecco’s modified Eagle’s medium supplemented with 10% fetal bovine serum and antibiotics. Cells were incubated at 37 °C in a humidified atmosphere containing 5% CO2. The four putative PKA substrates were transfected into COS-7 cells using Lipofectamine Reagent (www.invitrogen.com) according to the protocol of the manufacturer, and the cells were harvested after 48 h.

Immunoprecipitation. Cells were lysed in lysis buffer (150 mM NaCl, 50 mM Tris-HCl, 0.5% Triton X-100, 5 mM EDTA, 1 mM Na3VO4, 50 mM NaF, 10 mM Na-β-glycerophosphate, 1 mM PMSF, 10 nM calyculin, 10 µM leupeptin, 5 µM pepstatin and 1000 U/mL Aprotinin) at 4 °C for 15 min. The lysate was centrifuged at 4 °C for 15 min at 10 000 × g, and the supernatant was incubated with HA antibody (12CA5) for 2 h at 4 °C. Protein G sepharose beads were added to the immune complexes and the mixture was incubated for 30 min. The immunoprecipitated complexes were subsequently washed three times with lysis buffer (without inhibitors) and twice with 1.5× kinase buffer (45 mM Tris-HCl, 15 mM MgCl2 and 1.5 mM DTT).

In vitro PKA Kinase Assay. Prior to the PKA kinase assay, immunoprecipitated HA-tagged Necdin, RFX5, En-2, and Wee 1 were incubated with alkaline phosphatase (New England Biolabs, www.neb.com) at 37 °C for 30 min in order to dephosphorylate the proteins and inactivate co-immunoprecipitated protein kinases that phosphorylated the proteins in vitro (M. Frödin, personal communication). The samples were washed three times with kinase buffer. Necdin, RFX5, En-2, and Wee 1 were incubated with or without 10 U of PKA catalytic subunit (Sigma-Aldrich, www.sigmaaldrich.com), 6000 µM 0.5 µL [γ-32P]ATP, 30 mM Tris-HCl, 10 mM MgCl2 and 1 mM DTT at 30 °C for 20 min. The resins were washed twice with 1.5× kinase buffer to remove excess [γ-32P]ATP. Reactions were stopped by the addition of 20 µL 1× Laemmli sample buffer prior to separation by SDS-PAGE. Finally, phosphorylated proteins were visualized by autoradiography.
For mass spectrometry analysis, the proteins were phosphorylated in vitro by PKA in kinase buffer with nonradioactive ATP and isolated by SDS-PAGE and Coomassie staining.

In-gel Digestion. HA-tagged proteins were excised from Coomassie-stained gels. The proteins were reduced with 10 mM DTT at 56 °C for 45 min and subsequently alkylated with 55 mM iodoacetamide at room temperature for 30 min.27 The proteins were digested overnight at 37 °C with an excess of sequencing grade trypsin (www.promega.com). Peptides were extracted from the gel by addition of 5% formic acid and acetonitrile. The resulting supernatant was subsequently lyophilized and resuspended in 20 µL 5% formic acid.

Enrichment of Phosphopeptides by Fe(III)-IMAC. Fe(III)-loaded NTA silica resins were prepared and loaded into a GELoader tip as previously described.24 The Fe(III)-IMAC column was slowly loaded with 0.25 µL of the peptide mixture diluted into 30 µL of 0.1 M acetic acid. The column was then washed in turn with 10 µL 0.1 M acetic acid, 10 µL 0.1 M acetic acid in acetonitrile (3:1 v/v), and 10 µL 0.1 M acetic acid. Peptides were eluted with 1 µL of 2,5-dihydroxybenzoic acid matrix in 50% acetonitrile/2.5% formic acid in a series of droplets on the MALDI MS probe.

Peptide Mapping by MALDI-TOF MS. 1 µL of crude peptide mixture in 5% formic acid was desalted and concentrated using a nanoscale Poros R2 microcolumn.28 Peptides were eluted with 1 µL of a saturated matrix solution of 4-hydroxy-α-cyanocinnamic acid in 50% acetonitrile/2.5% formic acid. The eluate was deposited directly on the MALDI MS probe in a series of small droplets. The peptides were subsequently

Journal of Proteome Research • Vol. 3, No. 3, 2004

Hjerrild et al.

Figure 1. PKA sequence logo. Sequence logo of PKA phosphorylation sites aligned at the phosphoacceptor residue (position 11) showing the Shannon information (in units of bits). The experimental dataset contained 258 PKA phosphorylation sites.

analyzed by MALDI-TOF (www.bdal.com). Data analysis was performed using moverz software (www.genomicsolutions.com).

Peptide Sequencing by Tandem MS. Phosphopeptide candidates identified by MALDI-TOF MS were sequenced by nanoelectrospray (ESI) tandem mass spectrometry (QTOF1 quadrupole time-of-flight mass spectrometer, Waters/Micromass, www.waters.com) or MALDI-QTOF MS/MS (Ultima Global HT, Waters/Micromass). In addition, automated nanoflow liquid chromatography/tandem mass spectrometric analysis was performed using a QTOF Ultima mass spectrometer (Waters/Micromass, www.waters.com). A nano-HPLC system (Ultimate; www.lcpackings.com) delivering a flow of 175 nL/min was used to separate the peptide mixture prior to mass spectrometry analysis. Five µL of peptide sample was loaded on the reversed-phase chromatographic column (Agilent Zorbax C18 3.5 µm internal diameter) and separated using a gradient of 5-38% acetonitrile. Data were analyzed by MassLynx software and the Mascot search engine (www.matrixscience.com).
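The peptide mapping steps in this section rest on simple mass arithmetic, which the following Python sketch illustrates under stated assumptions. It performs an in-silico tryptic digestion (cleavage C-terminal to Lys or Arg, except before Pro, with no missed cleavages) and computes neutral monoisotopic peptide masses with an optional phosphate group. The function names and the toy protein sequence are hypothetical, the residue masses are standard monoisotopic values, and carbamidomethylation of cysteine by iodoacetamide is not modeled; this is not the software used in the study.

```python
from typing import List

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
HPO3 = 79.96633    # mass gained on phosphorylation
H3PO4 = 97.97690   # mass lost on neutral loss of phosphoric acid


def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start : i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str, n_phospho: int = 0) -> float:
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphates."""
    return sum(MONO[aa] for aa in peptide) + WATER + n_phospho * HPO3


# Hypothetical toy protein: the RP bond is not cleaved.
print(tryptic_digest("AKRPSLKGR"))

# En-2 tryptic peptide 265-280 as reported in this study.
peptide = "RQSLAQELSLNESQIK"
unmodified = peptide_mass(peptide)
phosphorylated = peptide_mass(peptide, n_phospho=1)
after_neutral_loss = phosphorylated - H3PO4
print(round(phosphorylated - unmodified, 3), round(after_neutral_loss - unmodified, 3))
```

The two differences printed at the end correspond to the nominal 80 Da increase on phosphorylation and to the nominal 18 Da deficit, relative to the unmodified peptide, that remains after loss of phosphoric acid (98 Da) from the phosphopeptide.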

Results Development of NetPhosK. We trained the neural network with a positive set of 258 experimentally verified PKA phosphorylation sites and a negative set of 1309 serine or threonine motifs that were assumed not to be phosphorylated. On the basis of the positive training set, a sequence logo was generated (Figure 1). The sequence logo shows the amino acid residues that are frequently found in positions from P-10 to P+10 of the phosphoacceptor site. The PKA sequence logo shows the consensus motif recognized by PKA at positions P-3 and P-2: arginine or lysine was found in 78% and 72% of the sites at positions P-3 and P-2, respectively. In addition, it was observed that PKA is able to phosphorylate either serine or threonine in the same sequence context and that serine is most frequently modified by PKA among the selected 258 phosphorylation sites.

Selection of PKA Substrates. The Swiss-Prot database was screened for putative PKA substrates using Scansite.19 From the list of substrates we selected four proteins, Necdin, RFX5, En-2, and Wee 1, which have not previously been reported as PKA substrates. Necdin is a transcription factor predominantly expressed in postmitotic neurons and is involved in growth suppression, differentiation, and development of neurons in the brain.29 RFX5 is a transcription factor that transactivates major histocompatibility complex class II genes.30 En-2 is a homeobox-containing transcription factor involved in neural


Table 1. Prediction of PKA Phosphorylation Sites in Necdin, RFX5, En-2, and Wee 1 Using NetPhosK, Scansite, and Prosite (a)

protein   total Ser and Thr   predicted PKA sites
Necdin    34                  Ser-9, Ser-138, Ser-144, Ser-159, Ser-226, Ser-317
RFX5      79                  Ser-131, Ser-157, Thr-168, Ser-221, Ser-264, Ser-308, Ser-339, Ser-356, Ser-357, Ser-473
En-2      50                  Ser-120, Ser-267, Thr-294
Wee 1     94                  Ser-78, Ser-211, Ser-212, Thr-257, Thr-285, Ser-330, Ser-383, Ser-438, Ser-471, Ser-559, Ser-567, Ser-576, Ser-640, Ser-642

Number of listed sites marked + in each column: NetPhosK, 32; Scansite, 9; Prosite, 11; PKA sites identified by mass spectrometry, 13 (a further 8 predicted sites are marked with an asterisk in the mass spectrometry column).

(a) PKA phosphorylation sites were predicted with NetPhosK with a score > 0.5.19 Prediction with Scansite was performed with medium stringency.20 Asterisks mark predicted sites lying outside the sequence covered by mass spectrometry.

development.31 Wee 1 is a tyrosine kinase that blocks entry into mitosis by phosphorylation and inhibition of cdc2.32

Prediction of PKA Phosphorylation Sites. The neural network NetPhosK, trained for specific identification of PKA phosphorylation sites, was used for prediction of PKA phosphorylation sites in Necdin, RFX5, En-2, and Wee 1. For comparison, PKA phosphorylation sites were predicted using the www-accessible algorithms Scansite and Prosite. NetPhosK predicted 32 PKA phosphorylation sites in Necdin, RFX5, En-2, and Wee 1, whereas Scansite (medium stringency) and Prosite predicted 9 and 11 PKA sites, respectively. The predicted PKA sites are listed in Table 1. The phosphorylation sites verified by mass spectrometry (see below) are also shown in Table 1. The sequence coverage by mass spectrometry was not complete, and the predicted sites that are not covered are marked with asterisks in Table 1.

PKA Phosphorylation in Vitro. Expression constructs encoding HA-tagged Necdin, RFX5, En-2, or Wee 1 were transfected into COS-7 cells and the proteins were immunoprecipitated using an antibody against the HA-epitope. To determine whether Necdin, RFX5, En-2, and Wee 1 could be phosphorylated by PKA in vitro, immunoprecipitated proteins were incubated with purified catalytic subunit of PKA in the presence of [γ-32P]ATP. The phosphorylated proteins were resolved by SDS-PAGE and visualized by autoradiography. As shown in Figure 2, the catalytic subunit of PKA clearly catalyzed the

Figure 2. In vitro PKA phosphorylation of Necdin, RFX5, En-2, and Wee 1. Immunoprecipitated Necdin, RFX5, En-2, and Wee 1 were incubated with [γ-32P]ATP in the presence (+) or absence (-) of PKA. The proteins were separated by SDS-PAGE. Incorporation of phosphate was visualized by autoradiography and the protein amount was evaluated by Coomassie staining.

incorporation of radioactive phosphate into Necdin, RFX5, En-2, and Wee 1, implying that the proteins are substrates of PKA in vitro.

Identification of in Vitro PKA Phosphorylation Sites by Mass Spectrometry. PKA-specific phosphorylation sites were identified by a combination of mass spectrometry-based methods. In vitro phosphorylated Necdin, RFX5, En-2, and Wee 1 were separated by SDS-PAGE. Protein bands were subjected to in-gel digestion using trypsin and the resultant peptide mixtures were analyzed by MALDI-TOF MS. MALDI-TOF MS allowed identification of phosphopeptides by comparing spectra from in vitro phosphorylated samples with spectra from control samples. Phosphopeptides were identified in two ways: by observation of an 80 Da (HPO3) mass increase compared to untreated samples or, alternatively, by neutral loss or metastable loss of phosphoric acid corresponding to a 98 Da (H3PO4) mass decrease. Furthermore, immobilized Fe(III) affinity chromatography (Fe(III)-IMAC) columns were used for enrichment of phosphopeptides from crude peptide mixtures prior to MALDI-TOF MS analysis. MALDI-QTOF MS/MS or nanoelectrospray-QTOF MS/MS was used for sequencing and exact determination of phosphorylation sites in the phosphopeptides identified by the MALDI-TOF MS analysis.22,24 Finally, LC-ESI-MS/MS experiments were performed to locate specific serine or threonine phosphorylation site(s). On the basis of the MS/MS data, specific residues were identified as phosphorylation sites in ten of the thirteen phosphopeptides. Peptide fragment ions of the b- and y-type33 allowed exact localization of phosphorylation sites. β-Elimination of phosphoric acid and conversion of phosphoserine to dehydroalanine and phosphothreonine to 2-aminodehydrobutyric acid confirmed the phosphoamino acid assignments. Fe(III)-IMAC prior to MALDI-TOF analysis resulted in isolation of a phosphopeptide from Necdin (amino acids 306-321).
The fragment ions obtained by electrospray-QTOF MS/MS analysis showed that Necdin is modified by PKA at Ser-316 or Ser-317. In addition, LC-ESI-MS/MS experiments were performed to locate specific serine or threonine phosphorylation site(s). The PKA site at Ser-316 or Ser-317 in Necdin was confirmed by LC-ESI-MS/MS and two additional sites at Ser-144 (or Thr-143) and Ser-226 were found. MALDI-TOF MS revealed two phosphopeptides in RFX5 corresponding to amino acids 339-354 and amino acids 166-181. Fe(III)-IMAC prior to MALDI-TOF MS also isolated peptide 339-354 from RFX5. MALDI-QTOF MS/MS or nanoelectrospray-QTOF MS/MS identified phosphorylation sites at Ser-339 and Thr-168 in RFX5. Finally, Thr-168, Ser-308, and Ser-473 in RFX5 were identified by LC-ESI-MS/MS as phosphorylated by PKA in vitro. A phosphopeptide in En-2 was detected by MALDI-TOF MS corresponding to a single phosphorylation site in the region from amino acid 265 to 280 (Figure 3a). Fe(III)-IMAC also resulted in isolation of this phosphopeptide from En-2 (amino acids 265-280) (Figure 3b). The PKA site was determined as Ser-267 in En-2 by sequencing with MALDI-QTOF MS/MS or nanoelectrospray-QTOF MS/MS. LC-ESI-MS/MS analysis confirmed the PKA phosphorylation of Ser-267 in En-2 (Figure 3c). Five PKA sites in Wee 1 were determined by LC-ESI-MS/MS, namely Ser-211, Ser-212, Ser-559, Ser-642, and Ser-438 (or Thr-437). In conclusion, the combined mass spectrometry methods revealed PKA phosphorylation of Necdin at Ser-144 (or Thr-143), Ser-226, and Ser-317 (or Ser-316); of RFX5 at Thr-168, Ser-308, Ser-339, and Ser-473; of En-2 at Ser-267; and finally of Wee 1 at Ser-211, Ser-212, Ser-559, Ser-642, and Ser-438 (or Thr-437). The tryptic peptides phosphorylated by PKA and the identified phosphorylation sites are summarized in Table 2.

Table 2. Identification of PKA Phosphorylation Sites in Necdin, RFX5, En-2, and Wee 1 by Mass Spectrometry

protein   PKA site         tryptic phosphopeptide              MS method
Necdin    S144 (or T143)   142-RTpSLILAR-149                   LC-MS/MS
Necdin    S226             225-HpSTFGDVR-232                   LC-MS/MS
Necdin    S317 (or S316)   306-EANPTAHYPRSpSVSED-321           IMAC + MALDI-TOF (aa 306-321); electrospray-qTOF MS/MS; LC-MS/MS
RFX5      T168             166-RKpTLVSMPPLPGLDLK-181           MALDI-TOF (aa 166-181); MALDI-qTOF MS/MS; LC-MS/MS
RFX5      S308             306-KKpSVVESSAPGANNLQVNALVAR-328    LC-MS/MS
RFX5      S339             339-pSLIPPIPVSPPILAPR-354           MALDI-TOF (aa 339-354); IMAC + MALDI-TOF (aa 339-354); MALDI-qTOF MS/MS
RFX5      S473             473-pSGGSGERNSTPLK-485              LC-MS/MS
En-2      S267             265-RQpSLAQELSLNESQIK-280           MALDI-TOF (aa 265-280); IMAC + MALDI-TOF (aa 265-280); MALDI-qTOF MS/MS; LC-MS/MS
Wee 1     S211             208-LRGpSSLFMDTEK-219               LC-MS/MS
Wee 1     S212             210-GSpSLFMDTEK-219                 LC-MS/MS
Wee 1     S438 (or T437)   437-TpSIPNAASEEGDEDDWASNK-456       LC-MS/MS
Wee 1     S559             557-RPpSAMALVK-565                  LC-MS/MS
Wee 1     S642             637-MNRSVpSLTIY-646                 LC-MS/MS

Figure 3. Identification of phosphorylation of Ser-267 in En-2 by PKA using mass spectrometry. Phosphorylation of Ser-267 in En-2 was determined by combining MALDI-TOF MS (A), IMAC (B), and LC-ESI-MS/MS (C). MALDI-TOF and IMAC analysis revealed that amino acids 265 to 280 (RQSLAQELSLNESQIK) form a phosphopeptide. The fragment ions generated by LC-ESI-MS/MS analysis showed that Ser-267 is phosphorylated by PKA. The fragmentation pattern of the peptide is indicated (only b and y ions are indicated for simplicity). Ions labeled with an asterisk were generated from the peptide in which the phosphoserine had been converted to dehydroalanine by β-elimination (loss of phosphoric acid = 98 Da).

Discussion Analysis of post-translational modifications is a major challenge in proteomics research. Protein phosphorylation is one of the most abundant post-translational modifications in mammalian cells, and it is an important general mechanism controlling diverse intracellular processes. To understand the role of phosphorylation in signal transduction and cellular regulation, the exact location of phosphorylation sites must be determined. Substrate recognition by kinases is in part determined by the amino acid sequence surrounding the phosphorylated residue. Previous studies of phosphorylation kinetics of synthetic peptides or screening of oriented peptide libraries suggested that a region of 7-12 amino acids around the phosphoacceptor residue interacts with the substrate binding site of the kinase. However, recent studies of computer modeling of the protein kinase-substrate complex as well as compilation of phosphoproteins suggest that more than 20 residues interact with the catalytic domain.7,34 These experimental approaches for determination of substrate specificity of protein kinases are expensive and laborious.8,35 Identification of in vivo substrates is even more difficult.14


Previously, we have developed an artificial neural network that predicts either serine, threonine, or tyrosine phosphorylation [NetPhos; http://www.cbs.dtu.dk/services/NetPhos].19 A neural network is capable of recognizing longer, more complex, and nonlinear sequence patterns. This means that similar but nonidentical patterns are recognized by this approach.19,36 In the present study, we have taken this type of analysis a step further and developed a kinase-specific neural network algorithm that can predict PKA phosphorylation sites. The neural network was trained using 258 experimentally verified phosphorylation sites of PKA in various proteins. Because Prosite and Scansite are widely used approaches for prediction of kinase-specific phosphorylation sites, we compared their predictions with those of our neural network approach with respect to sensitivity and specificity. Prosite (http://www.expasy.ch/prosite/) scans protein sequences for the PKA consensus sequence [R/K]-[R/K]-X-[S/T].18,37 Scansite (http://scansite.mit.edu/) can be used to search for motifs within proteins that are likely to be phosphorylated by specific kinases.20 Scansite is based on experimental data obtained by phosphorylation of an oriented library of dodecapeptides with selected protein kinases.35 In all three approaches, the prediction of phosphorylation sites is based exclusively on the primary structure of the protein. However, kinases recognize the tertiary structure around the phosphoacceptor residue and not the primary structure. Furthermore, the prediction servers cannot take intracellular compartmentalization of protein kinases and substrates into account. Hence, a combination of bioinformatics and experimental analysis is necessary for identification of phosphorylation sites.
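To make the consensus-based approach concrete, the following Python sketch scans a sequence for the PKA consensus [R/K]-[R/K]-X-[S/T] quoted above and reports the position of each candidate phosphoacceptor residue. It is a minimal illustration and not the Prosite implementation: the function name, the `offset` parameter for numbering sequence fragments, and the first toy sequence are assumptions introduced here.

```python
import re
from typing import List, Tuple

# [R/K]-[R/K]-X-[S/T]; the lookahead lets overlapping motifs all be reported.
PKA_CONSENSUS = re.compile(r"(?=([RK][RK].[ST]))")


def scan_pka_consensus(sequence: str, offset: int = 0) -> List[Tuple[int, str]]:
    """Return (1-based position, residue) of each Ser/Thr closing the motif.

    `offset` is added to positions so that a fragment can be numbered
    according to its location in the full-length protein.
    """
    hits = []
    for match in PKA_CONSENSUS.finditer(sequence):
        index = match.start() + 3  # the phosphoacceptor is the fourth residue
        hits.append((offset + index + 1, sequence[index]))
    return hits


# Hypothetical toy sequence containing three consensus matches.
print(scan_pka_consensus("AARRASLKKGTPRKSS"))

# Wee 1 tryptic peptide 208-219 reported in this study: Ser-211 and Ser-212
# are not preceded by two basic residues at P-3 and P-2, so nothing matches.
print(scan_pka_consensus("LRGSSLFMDTEK", offset=207))
```

A strict pattern of this kind cannot match a site whose flanking residues deviate from the consensus, which is the limitation that motivates training a network on complete sequence windows.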
Today, mass spectrometry combined with bioinformatic techniques is often used for high-throughput identification of proteins.38 However, the identification of phosphorylation sites by mass spectrometry analysis is still a challenge. Numerous mass spectrometry-based methods have been applied for analysis of phosphoproteins, including MALDI-TOF peptide mapping. However, detection of phosphopeptides by MALDI-TOF in complex peptide mixtures can be difficult due to suppression of ionization of the phosphopeptide in the presence of an excess of nonphosphorylated peptides. Enrichment of phosphopeptides by immobilized metal affinity chromatography (IMAC) or fractionation of the mixture are ways to deal with this problem.17,24 One drawback of IMAC is that not only phosphopeptides bind to the chelated metal on the IMAC column; some nonphosphorylated peptides are also captured by the column. Furthermore, MALDI-TOF MS analysis is often combined with treatment of peptides with alkaline phosphatase, leading to an 80 Da shift to lower mass for the previously phosphorylated peptides.24,39 However, alkaline phosphatase treatment or observation of post-source decay can be difficult in a complex peptide mixture. Alternatively, electrospray mass spectrometry has successfully been used for mapping of phosphorylation sites. In this report, we have analyzed PKA-phosphorylated Necdin, RFX5, En-2, and Wee 1 by MALDI-TOF MS, MALDI-QTOF MS/MS, nanoelectrospray MS/MS, and LC-ESI-MS/MS analysis. Three phosphopeptides were identified by MALDI-TOF analysis and the phosphoresidues were subsequently mapped by MALDI or ESI tandem mass spectrometry. Enrichment of phosphopeptides using IMAC combined with tandem mass spectrometry analysis revealed three phosphorylation sites. Interestingly, twelve of the thirteen determined PKA sites were identified by LC-ESI-MS/MS

Table 3. Sensitivity and Specificity of Prediction by NetPhosK, Scansite, and Prosite of PKA phosphorylation sitesa

sensitivity specificity

NetPhosK

Scansite

Prosite

100% 40.6%

46.2% 66.7%

53.8% 63.6%

a The sensitivity and specificity of prediction by NetPhosK, Scansite, and Prosite is calculated for all PKA phosphorylation sites in Necdin, RFX5, En2, and Wee1 identified by mass spectrometry.

analysis. We were able to determine the exact position of the phosphorylation site for 10 of the thirteen identified PKA sites, whereas the fragment ions generated by tandem mass spectrometry did not allow precise determination of the phosphorylation site in the remaining three phosphopeptides. The mass spectrometry analysis showed that Ser-144 (or Thr143), Ser-226 and Ser-317 (or Ser-316) in Necdin are phosphorylated by PKA. PKA modification of Ser-317 in Necdin is predicted by NetPhosK but not by Scansite and Prosite. NetPhosK and Prosite but not Scansite predicts modification of Ser-226. All three prediction servers suggested PKA phosphorylation of Ser-144. We identified four PKA phosphorylation sites in human RFX5. Scansite and Prosite suggested three of the experimentally identified sites namely Thr-168, Ser-308, and Ser-473, whereas NetPhosK predicted phosphorylation of all identified sites. One PKA phosphorylation site was identified by mass spectrometry in the mouse En-2 sequence namely Ser267. All three search algorithms predicted PKA phosphorylation of Ser-267 in En-2 in agreement with our experimental results. Scansite only suggested PKA phosphorylation of this site in En2, whereas NetPhosK and Prosite predicted additional sites. Finally, our mass spectrometry experiments revealed that Ser211, Ser-212, Ser-438 (or Thr-437), Ser-559, and Ser-642 in Wee 1 was phosphorylated by PKA in vitro. Prosite and Scansite predicted only phosphorylation of Ser-559. NetPhosK predicted 14 PKA phosphorylation sites in Wee 1 including all five experimentally identified sites. Overall, NetPhosK suggested 32 PKA phosphorylation sites in the four proteins including all the sites determined by mass spectrometry. In contrast, Scansite only predicted nine PKA sites and six of those corresponded to our experimental results. Thus, Scansite did not predict seven of the sites that we detected by mass spectrometry. 
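The percentages in Table 3 can be reproduced from the site counts reported in the text: 13 sites identified by mass spectrometry, with NetPhosK predicting 32 sites (13 confirmed), Scansite 9 (6 confirmed), and Prosite 11 (7 confirmed). The sketch below is only a worked check of that arithmetic; the function and variable names are hypothetical. It assumes that sensitivity is the fraction of experimentally identified sites that were predicted and that the reported specificity is the fraction of predicted sites that were confirmed (a quantity often called positive predictive value), which is the reading that matches the published numbers.

```python
# Counts taken from the text: (sites predicted, predicted sites confirmed by MS).
TOTAL_MS_SITES = 13
COUNTS = {
    "NetPhosK": (32, 13),
    "Scansite": (9, 6),
    "Prosite": (11, 7),
}


def summarize(counts, total_sites):
    """Return {method: (sensitivity %, fraction of predictions confirmed %)}."""
    rows = {}
    for method, (predicted, confirmed) in counts.items():
        sensitivity = 100.0 * confirmed / total_sites
        confirmed_fraction = 100.0 * confirmed / predicted
        rows[method] = (round(sensitivity, 1), round(confirmed_fraction, 1))
    return rows


for method, (sens, spec) in summarize(COUNTS, TOTAL_MS_SITES).items():
    print(f"{method:<9} sensitivity {sens:5.1f}%   specificity {spec:5.1f}%")
```

Under these assumptions the output agrees with Table 3 for all three methods.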
Prosite suggested eleven PKA sites in the four proteins, and seven of these sites were experimentally determined. Prosite failed to identify six of the sites verified by mass spectrometry analysis. Accordingly, NetPhosK is superior to the other algorithms in sensitivity, with 100% compared to 46.2% and 53.8% for Scansite and Prosite, respectively. However, the specificity of predictions using NetPhosK, Scansite, and Prosite is 40.6%, 66.7%, and 63.6%, respectively (Table 3). It is concluded that NetPhosK is most efficient for prediction of PKA phosphorylation. One reason for the higher sensitivity of NetPhosK compared with Prosite and Scansite in predicting PKA phosphorylation sites may be that NetPhosK captures the interaction of the catalytic domain of the kinase with a longer stretch of sequence. Prosite searches only the four residues of the consensus motif, and Scansite screens 12 residues surrounding the phosphorylation site symmetrically.18,20 In contrast, the individual neural nets of the NetPhos method scan between 13 and 19 residues centered on the phosphoacceptor site.19 This argument is supported by a recent study of the substrate specificity of AMP-activated kinase (AMPK) using structural modeling of the interaction between the substrate and the catalytic domain of AMPK and phosphorylation of synthetic peptide mutants with 34 residues around the phosphoacceptor site. The findings showed that the binding of the substrate to activated AMPK was dependent on an amphipathic helix from P-16 to P-5, basic residues at P-6 and P-4, as well as residues at P+3 and P+4.34 On the other hand, Prosite and Scansite show higher specificity in predicting the PKA phosphorylation sites than NetPhosK. Several reasons may account for this discrepancy. First, the prediction methods are statistical evaluations, and the specificity may depend on the stringency of analysis. However, increasing the cutoff score of NetPhosK did not increase the specificity of prediction significantly (data not shown). Likewise, the specificity of Scansite was not significantly different at low and medium stringency. Second, the prediction methods take only the linear sequence into account and do not include the three-dimensional structure of the protein. Accordingly, some of the predicted phosphorylation sites may be located in the hydrophobic core of the protein and not on the hydrophilic surface that is accessible to the kinase. Finally, the experimental analysis of phosphorylation sites in tryptic peptides by mass spectrometry is not complete, as some sequences in the phosphorylated proteins are not covered. Accordingly, we might not have identified all PKA sites by mass spectrometry due to limitations of this method. The sequence coverage of En-2, Wee 1, Necdin, and RFX5 was 78%, 44%, 53%, and 72%, respectively. Peptides that were not detected by mass spectrometry could also contain true phosphorylation sites. Predicted phosphorylation sites lying outside the covered sequence are marked with asterisks in Table 1. If we look only at the sequence covered by mass spectrometry, the specificity of NetPhosK, Scansite, and Prosite increases to 54.2%, 85.7%, and 77.8%, respectively.
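The effect of restricting the evaluation to the sequence covered by mass spectrometry can be sketched as follows. Predicted sites that fall outside the covered regions are dropped before the fraction of confirmed predictions is computed, so predictions that could not have been observed no longer count against the method. The function names and all positions below are hypothetical toy values, not data from Table 1.

```python
def in_covered_region(position, covered):
    """True if a 1-based residue position lies in any (start, end) interval."""
    return any(start <= position <= end for start, end in covered)


def restricted_fraction_confirmed(predicted, confirmed, covered):
    """Fraction of predicted sites confirmed, counting only covered positions.

    Returns None when no predicted site lies in the covered sequence.
    """
    visible = [pos for pos in predicted if in_covered_region(pos, covered)]
    if not visible:
        return None
    hits = sum(1 for pos in visible if pos in confirmed)
    return hits / len(visible)


# Toy example (hypothetical): five predicted sites, two confirmed by MS,
# and one prediction (position 60) outside the peptides that were detected.
covered_regions = [(1, 50), (80, 120)]
predicted_sites = [10, 30, 60, 90, 110]
confirmed_sites = {10, 90}

overall = len(confirmed_sites & set(predicted_sites)) / len(predicted_sites)
restricted = restricted_fraction_confirmed(
    predicted_sites, confirmed_sites, covered_regions
)
print(f"all predictions: {overall:.2f}, covered sequence only: {restricted:.2f}")
```

In this toy case the fraction rises from 0.40 to 0.50 once the uncovered prediction is excluded, which is the same direction of change as reported for the three methods.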
However, dephosphorylation during sample preparation or low stoichiometry of a phosphorylation could also explain why we might have failed to identify all phosphopeptides by mass spectrometry. In addition, the signal from the phosphopeptide is usually less intense than that of the nonphosphorylated counterpart because the phosphate group carries a negative charge. This can result in complete suppression of the phosphopeptide signal. Finally, phosphorylation can lead to miscleavages, resulting in larger peptides that are more difficult to ionize. We are currently investigating proteases other than trypsin in order to improve the amino acid sequence coverage of the mass spectrometry experiments.

Although the identification of proteins in complex mixtures by mass spectrometry is becoming routine, identification of post-translational modifications, including protein phosphorylation, is still a challenging task. Here, we present an artificial neural network method that predicts PKA phosphorylation sites with a sensitivity of 100% and a specificity of 41%. We suggest that the high sensitivity provided by bioinformatic prediction of phosphorylation sites using neural network analysis, in combination with the high specificity afforded by advanced tandem mass spectrometry, could be used for hypothesis-driven identification of phosphorylation sites at a proteomic scale. We are currently setting up a hypothesis-driven tandem mass spectrometry approach for targeted identification of in vitro and in vivo phosphorylation sites.

Figure 4. Schematic illustration of the structure of En-2, RFX5, Necdin, and Wee 1. PKA phosphorylation sites identified by mass spectrometry are indicated.

The functional role of the identified PKA phosphorylation sites in En-2, RFX5, Necdin, and Wee 1 has not yet been characterized. Three of the substrates, Necdin, RFX5, and En-2, are transcription factors. In general, transcription factors are substrates of a number of protein kinases that phosphorylate serine, threonine, and tyrosine residues.40 Protein phosphorylation and dephosphorylation of transcription factors is the most widely described regulation process affecting their structure, subcellular localization, activity, and DNA binding.41 There are several examples in the literature in which PKA-mediated protein phosphorylation has been shown to either enhance or decrease the DNA-binding activity of transcription factors such as CREB, transcription enhancer factor-1, and hepatocyte nuclear factor 4.42-44 The PKA site identified in En-2, Ser-267, is located in the DNA binding homeodomain (amino acids 235 to 294) (Figure 4), and the phosphorylation by PKA may affect the binding to DNA. One of the identified PKA sites in RFX5 is found in the DNA binding domain (amino acids 98 to 168), whereas the other sites are located on the C-terminal side (Figure 4). Necdin contains a MAGE domain from amino acid 98 to 297, where two of the identified PKA sites (Ser-144 and Ser-226) are located; the last site (Ser-317) is found outside this domain near the C-terminus (Figure 4). However, it has been shown that phosphorylation of transcription factors outside the DNA binding domain can also affect the DNA binding affinity.45 Hence, we speculate that PKA could be involved in regulation of the DNA binding of these transcription factors. It is also possible that these phosphorylation sites could affect the subcellular localization of the transcription factors, leading to regulation of transcription of specific target genes. The last PKA substrate, Wee 1, is a tyrosine kinase that blocks entry into mitosis by phosphorylating and inhibiting the activity of cdc2.32 Two of the PKA sites (Ser-438 and Ser-559) are located in the kinase domain (amino acids 299 to 569), and three PKA sites are found outside this domain (Figure 4). Phosphorylation of Wee 1 by PKA may regulate kinase activity or affect its interaction with other proteins.
However, it is not known whether these proteins are substrates of PKA in vivo, and full characterization of the biological functions of these PKA sites requires additional studies.

Abbreviations. DTT, dithiothreitol; En-2, Engrailed-2; HA-tag, haemagglutinin-tag; IMAC, immobilized metal affinity chromatography; LC-ESI-MS/MS, liquid chromatography electrospray ionization tandem mass spectrometry; MALDI-TOF MS, matrix-assisted laser desorption ionization time-of-flight mass spectrometry; PKA, protein kinase A; QTOF, quadrupole time-of-flight; RFX5, regulatory factor X 5.


Acknowledgment. Plasmids encoding murine En-2, human Wee 1, and human RFX5 were generously given by Alexandra Joyner, Department of Cell Biology, New York, USA; Tony Hunter, The Salk Institute for Biological Studies, San Diego, USA; and Walter Reith, Department of Genetics and Microbiology, University of Geneva Medical School, Switzerland, respectively. N.B. and S.B. were supported by a grant from the Danish National Research Foundation and T.S.P. was supported by a grant from the Wallenberg Foundation.

References

(1) Cohen, P. Nat. Cell. Biol. 2002, 4, E127.
(2) Lawlor, M. A.; Alessi, D. R. J. Cell. Sci. 2001, 114, 2903.
(3) Hunter, T. Philos. Trans. R Soc. Lond. B Biol. Sci. 1998, 353, 583.
(4) Hanks, S. K.; Hunter, T. Faseb J. 1995, 9, 576.
(5) Taylor, S. S.; Knighton, D. R.; Zheng, J.; Ten Eyck, L. F.; Sowadski, J. M. Ann. Rev. Cell. Biol. 1992, 8, 429.
(6) Knighton, D. R.; Zheng, J. H.; Ten Eyck, L. F.; Ashford, V. A.; Xuong, N. H.; Taylor, S. S.; Sowadski, J. M. Science 1991, 253, 407.
(7) Kreegipuu, A.; Blom, N.; Brunak, S. Nucleic Acids Res. 1999, 27, 237.
(8) Pinna, L. A.; Ruzzene, M. Biochim. Biophys. Acta 1996, 1314, 191.
(9) Zhang, H.; Zha, X.; Tan, Y.; Hornbeck, P. V.; Mastrangelo, A. J.; Alessi, D. R.; Polakiewicz, R. D.; Comb, M. J. J. Biol. Chem. 2002, 277, 39379.
(10) Pandey, A.; Podtelejnikov, A. V.; Blagoev, B.; Bustelo, X. R.; Mann, M.; Lodish, H. F. Proc. Natl. Acad. Sci. U.S.A. 2000, 97, 179.
(11) Gronborg, M.; Kristiansen, T. Z.; Stensballe, A.; Andersen, J. S.; Ohara, O.; Mann, M.; Jensen, O. N.; Pandey, A. Mol. Cell. Proteomics 2002, 1, 517.
(12) Hjerrild, M.; Milne, D.; Dumaz, N.; Hay, T.; Issinger, O. G.; Meek, D. Biochem. J. 2001, 355, 347.
(13) Dalby, K. N.; Morrice, N.; Caudwell, F. B.; Avruch, J.; Cohen, P. J. Biol. Chem. 1998, 273, 1496.
(14) Campbell, D. G.; Morrice, N. A. J. Biomol. Tech. 2002, 13, 119.
(15) Aebersold, R.; Goodlett, D. R. Chem. Rev. 2001, 101, 269.
(16) Mann, M.; Jensen, O. N. Nat. Biotechnol. 2003, 21, 255.
(17) Nuhse, T. S.; Stensballe, A.; Jensen, O. N.; Peck, S. C. Mol. Cell. Proteomics 2003.
(18) Bairoch, A.; Bucher, P.; Hofmann, K. Nucleic Acids Res. 1997, 25, 217.
(19) Blom, N.; Gammeltoft, S.; Brunak, S. J. Mol. Biol. 1999, 294, 1351.
(20) Yaffe, M. B.; Leparc, G. G.; Lai, J.; Obata, T.; Volinia, S.; Cantley, L. C. Nat. Biotechnol. 2001, 19, 348.

(21) Brinkworth, R. I.; Breinl, R. A.; Kobe, B. Proc. Natl. Acad. Sci. U.S.A. 2003, 100, 74.
(22) Bennett, K. L.; Stensballe, A.; Podtelejnikov, A. V.; Moniatte, M.; Jensen, O. N. J. Mass Spectrom. 2002, 37, 179.
(23) Mann, M.; Ong, S. E.; Gronborg, M.; Steen, H.; Jensen, O. N.; Pandey, A. Trends Biotechnol. 2002, 20, 261.
(24) Stensballe, A.; Andersen, S.; Jensen, O. N. Proteomics 2001, 1, 207.
(25) Ficarro, S. B.; McCleland, M. L.; Stukenberg, P. T.; Burke, D. J.; Ross, M. M.; Shabanowitz, J.; Hunt, D. F.; White, F. M. Nat. Biotechnol. 2002, 20, 301.
(26) Blom, N.; Hansen, J.; Blaas, D.; Brunak, S. Protein Sci. 1996, 5, 2203.
(27) Shevchenko, A.; Wilm, M.; Vorm, O.; Mann, M. Anal. Chem. 1996, 68, 850.
(28) Gobom, J.; Nordhoff, E.; Mirgorodskaya, E.; Ekman, R.; Roepstorff, P. J. Mass Spectrom. 1999, 34, 105.
(29) Takazaki, R.; Nishimura, I.; Yoshikawa, K. Exp. Cell. Res. 2002, 277, 220.
(30) Villard, J.; Peretti, M.; Masternak, K.; Barras, E.; Caretti, G.; Mantovani, R.; Reith, W. Mol. Cell. Biol. 2000, 20, 3364.
(31) Simon, H. H.; Saueressig, H.; Wurst, W.; Goulding, M. D.; O’Leary, D. D. J Neurosci. 2001, 21, 3126.
(32) Okamoto, K.; Nakajo, N.; Sagata, N. Embo J. 2002, 21, 2472.
(33) Roepstorff, P.; Fohlman, J. Biomed. Mass Spectrom. 1984, 11, 601.
(34) Scott, J. W.; Norman, D. G.; Hawley, S. A.; Kontogiannis, L.; Hardie, D. G. J. Mol. Biol. 2002, 317, 309.
(35) Songyang, Z.; Blechner, S.; Hoagland, N.; Hoekstra, M. F.; Piwnica-Worms, H.; Cantley, L. C. Curr. Biol. 1994, 4, 973.
(36) Wu, C. H. Comput. Chem. 1997, 21, 237.
(37) Feramisco, J. R.; Glass, D. B.; Krebs, E. G. J. Biol. Chem. 1980, 255, 4240.
(38) Kuster, B.; Mortensen, P.; Andersen, J. S.; Mann, M. Proteomics 2001, 1, 641.
(39) Larsen, M. R.; Sorensen, G. L.; Fey, S. J.; Larsen, P. M.; Roepstorff, P. Proteomics 2001, 1, 223.
(40) Hunter, T. Cell 2000, 100, 113.
(41) Hunter, T.; Karin, M. Cell 1992, 70, 375.
(42) Gupta, M. P.; Kogut, P.; Gupta, M. Nucleic Acids Res. 2000, 28, 3168.
(43) Viollet, B.; Kahn, A.; Raymondjean, M. Mol. Cell Biol. 1997, 17, 4208.
(44) Ray, A.; Ray, P.; Guthrie, N.; Shakya, A.; Kumar, D.; Ray, B. K. J. Biol. Chem. 2003, 278, 22 586.
(45) Bourbon, H. M.; Martin-Blanco, E.; Rosen, D.; Kornberg, T. B. J. Biol. Chem. 1995, 270, 11 130.

PR0341033

Journal of Proteome Research • Vol. 3, No. 3, 2004 433