Characterization of the Human Plasma Phosphoproteome Using Linear Ion Trap Mass Spectrometry and Multiple Search Engines
Montserrat Carrascal, Marina Gay, David Ovelleiro, Vanessa Casas, Emilio Gelpí, and Joaquin Abian*
LP-CSIC/UAB, IIBB-CSIC, IDIBAPS, Rosellón 161, 7ª Planta, 08036 Barcelona, Spain
Received September 2, 2009
Major plasma protein families play different roles in blood physiology, hemostasis, and immunodefense. Other proteins in plasma can be involved in signaling as chemical messengers or can serve as biological markers of the status of distant tissues. In this respect, the plasma phosphoproteome holds potentially relevant information on the mechanisms that modulate these processes through the regulation of protein activity. In this work we describe for the first time a collection of phosphopeptides identified in human plasma. The workflow combined immunoaffinity separation of the seven major serum protein families from the other plasma proteins, SCX fractionation, and TiO2 purification prior to LC-MS/MS analysis. One hundred and twenty-seven phosphosites in 138 phosphopeptides mapping 70 phosphoproteins were identified with FDR < 1%. A high-confidence collection of phosphosites was obtained using a combined search with the OMSSA, SEQUEST, and Phenyx search engines.
Keywords: phosphoproteomics • human plasma • biomarkers • phosphopeptide purification • combined search engines
* To whom correspondence should be addressed. E-mail: [email protected].
Journal of Proteome Research 2010, 9, 876–884. Published on Web 11/27/2009. DOI: 10.1021/pr900780s. © 2010 American Chemical Society.
Introduction
Protein phosphorylation is the most common reversible PTM in mammals, and it plays a primary role in modulating protein function and in signal transmission. Pathological conditions can also alter protein phosphorylation profiles. The phosphoproteome is therefore seen as a source of knowledge both on cell function and on disease biomarkers.1,2 Many efforts have been directed toward describing the phosphoproteome of cells and tissues.3,4 In recent years, several developments have facilitated this work: effective purification methods (IMAC, TiO2), new fragmentation methods (ECD, ETD, HCD), and more sensitive and precise mass analyzers such as the Orbitrap and FT-ICR instruments.5-7
Biological fluids are currently the preferred samples for health monitoring and diagnosis because they are relatively easy to obtain and can reflect processes in organs and tissues that are not easily accessible. The analysis of phosphoproteins or free phosphopeptides in biological fluids such as saliva, cerebrospinal fluid (CSF), or human serum has recently been reported.8-11 Using IMAC and Q-Trap tandem mass spectrometry, Cirulli et al. described four free phosphopeptides derived from fibrinogen in serum, and eight phosphopeptides derived from four phosphoproteins in saliva.9 Li et al. used cerium ion-chelated magnetic silica microspheres for serum phosphopeptide enrichment and identified the same set of phosphorylated fibrinogen peptides described in that work.11 Hu et al. reported a quantitative study of these fibrinogen peptides by MALDI-TOF MS profiling.10 They used titanium-immobilized mesoporous silica particles to purify the phosphopeptides from cancer patients and healthy controls and showed differences between the two groups. In CSF, Heegaard's group described 59 phosphopeptides corresponding to 44 different phosphoproteins, using ultrafiltration for sample concentration and TiO2 enrichment prior to LC-MS/MS analysis.8 The detection of phosphopeptides or phosphoproteins in urine has not yet been reported. However, Gonzales et al. described 14 phosphoproteins and 19 phosphosites (p-sites) in urinary exosomes, using the Pierce phosphopeptide purification kit followed by LC-MS/MS analysis on an FT instrument.12
One of the main problems in the proteomic study of fluids is the need to analyze protein markers that may be present at very low concentrations alongside other protein families present at high concentrations. The relative concentrations of proteins in blood serum and plasma can differ by 9 orders of magnitude.13,14 A similar situation exists for CSF.15 This problem is commonly addressed by depleting the most abundant serum protein families with affinity or immunoaffinity methods as a general sample preparation step.16-18 A major issue with these procedures is the concomitant loss of protein and peptide molecules that circulate in blood bound to albumin or to other depleted proteins. Consequently, the depleted fraction is now often studied as a new source of potential biomarkers.19,20
In this work we describe a collection of phosphopeptides and p-sites identified in human plasma using a procedure applied to 100 µL of plasma. The seven major families of plasma proteins were separated from the other protein components by immunoabsorption. We then analyzed both the nonretained (low-abundance protein, LAP) and the immunoabsorbed (high-abundance protein, HAP) fractions using SCX fractionation and titanium dioxide purification coupled to liquid chromatography-linear ion trap tandem mass spectrometry (LC-µESI-ITMSn). One hundred thirty-eight phosphopeptides and 127 p-sites mapping 70 phosphoproteins were identified at FDR < 1%.
Material and Methods
Sample Preparation. Plasma samples were obtained from the Blood Bank of the Hospital Clinic and Hospital Vall d'Hebron (Barcelona, Spain). Samples were stored in 1-mL aliquots at -80 °C for ca. 1-3 months until analysis. Three independent analyses were carried out: one on a plasma pool from 3 individuals, and two on identical aliquots of a different plasma pool prepared from 6 individuals. Plasma was delipidated by centrifugation at 25 000 × g for 15 min at 4 °C. The most abundant plasma protein families were then depleted using the MARS-7 system (Multiple Affinity Removal System, HPLC column 4.6 mm i.d. × 100 mm, Agilent Technologies, Palo Alto, CA) following the manufacturer's instructions. This procedure eliminates approximately 88-92% of total protein by removing albumin, IgG, IgA, transferrin, haptoglobin, antitrypsin, and fibrinogen from the plasma sample. Briefly, the delipidated plasma was diluted 4-fold with MARS-7 buffer A and filtered through a 0.22-µm cellulose membrane by centrifugation at 16 000 × g for 10 min. The equivalent of 50 µL of plasma was injected into the system. Immunofractionation was monitored by UV absorbance at 214 nm. The depleted protein sample was recovered in the flow-through (LAP) fraction, and the immunoabsorbed proteins (HAP fraction) were eluted with MARS-7 buffer B. Both fractions were collected, and the material from two consecutive injections was pooled. A phosphatase inhibitor cocktail (Sigma, St. Louis, MO) was added to the samples before storage at -80 °C. LAP samples, containing about 1 mg of protein, were boiled for 5 min in the presence of 10 mM DTT and alkylated with 55 mM iodoacetamide (IAA) (30 min, 20 °C). They were then digested with TPCK-treated trypsin (Sigma, St. Louis, MO; 2% final concentration) for 20 h at 37 °C. Digestion was stopped by adding TFA to a final concentration of 1%, and the tryptic extract was desalted using a tC18 cartridge (Waters, Milford, MA).
The eluate was evaporated to near dryness and redissolved in 400 µL of SCX solvent A. For the HAP samples, a volume equivalent to 16% of the total material in the pooled HAP fractions (approximately 1 mg of protein) was precipitated with 10% TCA. The precipitate was washed with acetone and dissolved in 6 M urea, 25 mM Tris-HCl pH 7.8. This protein solution was then reduced with 10 mM DTT and alkylated with 55 mM IAA. The sample was diluted 3-fold with ammonium bicarbonate, digested with trypsin, and processed as described for the LAP samples.
SCX Chromatography. SCX separations were carried out on a Polysulfoethyl A™ column (50 × 2.1 mm, 5 µm, 200 Å). Separation was performed at 200 µL/min with a linear gradient from 0 to 25% solvent B in 35 min, followed by a gradient to 100% B in 20 min (solvent A: 30% ACN, 0.1% formic acid; solvent B: 30% ACN, 0.1% formic acid, and 500 mM NH4Cl). Because the extracts contained a large amount of material, each sample was divided into two 200-µL aliquots that were injected separately. Sixteen 3-min fractions were collected from each injection, and fractions corresponding to the same time window were pooled. The same procedure was performed for both the HAP and LAP samples.
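As a worked illustration of the SCX program described above, the nominal NH4Cl concentration delivered during each of the sixteen 3-min fractions can be computed from the stated gradient. This is a sketch only. It assumes that fraction collection starts at injection and ignores the pump's dwell volume and the column void time, none of which is given in the text. The function names are hypothetical.

```python
# Sketch of the SCX salt gradient described above (0 -> 25% B in 35 min,
# then 25 -> 100% B in 20 min; solvent B contains 500 mM NH4Cl).
# Assumptions not stated in the text: fraction collection starts at t = 0,
# and dwell volume / void time are ignored.
from typing import List, Tuple

NH4CL_IN_B_MM = 500.0  # NH4Cl concentration in solvent B (mM)


def percent_b(t_min: float) -> float:
    """Nominal percent solvent B at time t (minutes after injection)."""
    if t_min <= 0.0:
        return 0.0
    if t_min <= 35.0:
        return 25.0 * t_min / 35.0                    # first, shallow segment
    if t_min <= 55.0:
        return 25.0 + 75.0 * (t_min - 35.0) / 20.0    # second, steep segment
    return 100.0                                      # hold at 100% B


def nh4cl_mm(t_min: float) -> float:
    """Nominal NH4Cl concentration (mM) delivered at time t."""
    return NH4CL_IN_B_MM * percent_b(t_min) / 100.0


def fraction_windows(n_fractions: int = 16, width_min: float = 3.0) -> List[Tuple[float, float]]:
    """Start/end times (min) of consecutive fixed-width fractions."""
    return [(i * width_min, (i + 1) * width_min) for i in range(n_fractions)]


if __name__ == "__main__":
    for k, (start, end) in enumerate(fraction_windows(), start=1):
        print(f"fraction {k:2d}: {start:4.1f}-{end:4.1f} min, "
              f"{nh4cl_mm(start):5.1f}-{nh4cl_mm(end):5.1f} mM NH4Cl")
```

Under these assumptions, fraction 12 (33-36 min) spans the change to the steeper gradient segment at 35 min. The last fraction ends at 48 min, before the gradient reaches 100% B at 55 min.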
research articles
Phosphopeptide Enrichment Using TiO2. SCX fractions were evaporated to 10 µL and diluted to 50 µL with 1 M glycolic acid, 5% TFA, 80% acetonitrile. They were then loaded into a TiO2 (Titansphere 5 µm, GL Sciences Inc., Tokyo, Japan) minicolumn prepared in a GelLoader tip as described21 and previously conditioned with 1 M glycolic acid. The samples were loaded into the tip with a syringe and washed consecutively with 10 µL of 1 M glycolic acid, 5% TFA, 80% ACN; 20 µL of 80% ACN, 1% TFA; and, finally, 5 µL of water. Phosphopeptides were eluted from the tip with 20 µL of 0.5% NH4OH followed by 1 µL of 30% ACN. The eluate was acidified with 2 µL of formic acid and stored at -80 °C until analysis.
LC-MS/MS Analysis. Each extract was concentrated to about 5 µL and diluted to 40 µL with 1% formic acid. It was then analyzed by LC-MS/MS using an LTQ linear ion trap equipped with a microESI ion source (ThermoFisher, San Jose, CA). The HPLC system consisted of an Agilent 1200 capillary pump, a binary pump, a thermostatted microinjector, and a microswitch valve. Separation was carried out using a C18 preconcentration cartridge (Agilent Technologies, Barcelona, Spain) connected to a 10-cm-long, 150-µm-i.d. Vydac C18 column (Vydac, IL). Separation was performed at 1 µL/min using a linear ACN gradient from 0 to 40% in 60 min (solvent A, 0.1% formic acid; solvent B, ACN, 0.1% formic acid). The LTQ instrument was operated in positive ion mode with a spray voltage of 2 kV. The scan range of each full MS scan was m/z 400-1800. Spectra were acquired in data-dependent mode, with each full MS scan followed by 8 MS/MS scans of the most abundant signals. An additional MS3 scan was triggered when a neutral loss of 98, 49, or 32.7 m/z units (loss of H3PO4 from +1, +2, and +3 charged ions, respectively) was detected among the 5 most intense fragment ions. Dynamic exclusion was enabled (repeat count, 1; duration, 3 min).
Database Search and Phosphopeptide Validation.
Fragmentation spectra were analyzed in parallel using three different search engines: SEQUEST22 (Bioworks v3.3, ThermoFisher, San Jose, CA), Phenyx (version 2.6, GeneBio, Geneva, Switzerland), and OMSSA (version 2.1.4).23 In all cases, confident identifications were filtered using a target/decoy database strategy. For SEQUEST and OMSSA, searches were performed against the Swiss-Prot database (Human Swiss-Prot release 14.8, 20 332 entries) combined with its reversed copy. The Phenyx server required separate target and decoy databases. The search parameters for SEQUEST were as follows:
- peptide mass tolerance, 2 Da;
- fragment tolerance, 0.8 Da;
- enzyme, trypsin, allowing up to three missed cleavages;
- static modification, carbamidomethylated cysteine (+57 Da);
- variable modifications, methionine oxidation (+16 Da), phosphorylation on Ser, Thr, and Tyr (+80 Da), and loss of water from Ser and Thr due to β-elimination of phosphoric acid from the corresponding phospho-amino acid.
All searches considered precursor charge states up to +3, and both MS2 and MS3 fragmentation spectra were used. Thermo binary .raw files were searched with SEQUEST using the Bioworks software. Correct peptide sequence identifications (FDR < 1% and 5%) were evaluated independently for the three experiments. This evaluation used both the SEQUEST Xcorr score and the PeptideProphet D value,24 as described previously.21 Thermo .raw files were converted to .dta files using the Bioworks tools. These files were submitted to the Phenyx and OMSSA engines for database search, using the same parameters as described for SEQUEST. At FDR < 1%, the cutoff values calculated for the different experiments were in the range 7.2-7.5 (z-score) for Phenyx and 1.6-2 (-log(expectation value)) for OMSSA. At FDR < 5%, the corresponding cutoff values were in the range 6.6-6.8 (z-score) and 0.06-0.84 (-log(expectation value)). For each spectrum, only the highest-scoring assignment from each engine was considered. The final data sets included only sequences identified by a single engine or identically by two or more engines. Spectra for which different engines proposed conflicting sequences were discarded. The confidence of the p-sites in the identified phosphopeptides was further evaluated by Ascore25 analysis, performed as described elsewhere.26
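The score cutoffs above are derived from a target/decoy search. The text does not state which FDR estimator was used. This sketch therefore assumes the simple ratio of decoy hits to target hits at or above a score threshold. Other estimators are also common, for example 2 × decoy/(target + decoy) for a concatenated target-decoy database. Scores are assumed to be "higher is better", as for the Phenyx z-score and the OMSSA -log(expectation value). The name `score_cutoff` is hypothetical.

```python
# Sketch: most permissive score threshold whose estimated FDR stays below a limit.
# Assumed estimator: FDR(s) = (#decoy hits with score >= s) / (#target hits with score >= s).
import math
from bisect import bisect_left
from typing import Iterable, Optional, Sequence


def n_at_or_above(sorted_scores: Sequence[float], threshold: float) -> int:
    """Number of scores >= threshold in an ascending-sorted sequence."""
    return len(sorted_scores) - bisect_left(sorted_scores, threshold)


def score_cutoff(target_scores: Iterable[float],
                 decoy_scores: Iterable[float],
                 max_fdr: float = 0.01) -> Optional[float]:
    """Lowest observed target score s with estimated FDR(s) <= max_fdr, or None."""
    targets = sorted(target_scores)
    decoys = sorted(decoy_scores)
    cutoff = None
    for s in sorted(set(targets), reverse=True):
        fdr = n_at_or_above(decoys, s) / n_at_or_above(targets, s)
        if fdr <= max_fdr:
            cutoff = s  # remember the lowest threshold seen so far that meets the limit
    return cutoff


if __name__ == "__main__":
    # Toy OMSSA-like E-values, converted with -log10 so that higher is better.
    target_e = [1e-6, 1e-4, 1e-3, 0.01, 0.02, 0.2]
    decoy_e = [0.015, 0.5]
    print(score_cutoff([-math.log10(e) for e in target_e],
                       [-math.log10(e) for e in decoy_e],
                       max_fdr=0.2))
```

A cutoff computed this way is specific to one engine and one experiment. This is consistent with the text reporting a range of cutoff values per engine across the three experiments.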
Results and Discussion
For a typical analysis, 100 µL of human plasma was depleted with the MARS-7 system. The flow-through (LAP) fraction contains the low-abundance proteins, and the retained (HAP) fraction contains seven of the most abundant families of plasma proteins. Both fractions were digested, and the tryptic peptides were fractionated by SCX. Phosphopeptides were isolated from each peptide fraction using TiO2. The purified fractions were then analyzed by LC-µESI-ITMSn on an LTQ linear ion trap mass spectrometer. Efficient isolation of phosphopeptides from the highly complex mixture of nonphosphorylated molecules in a biological sample is a major issue in phosphoproteome analysis. In the procedure reported here, phosphopeptides are purified through a combination of SCX fractionation and TiO2 purification. In previous work, we described a method for enriching T-lymphocyte phosphopeptides from 1D gel-separated, in-gel digested proteins using sequential IMAC and TiO2 purification. We showed that this sequential enrichment increases the number of identifications.21 Thingholm et al. reported similar conclusions in the analysis of human mesenchymal stem cells.27 When SCX fractionation is used instead of gel separation, the higher salt concentration in the sample requires desalting before IMAC purification. TiO2 tolerates high salt concentrations better than IMAC, so in most cases TiO2 can be used as the only purification step.4,28 In preliminary experiments using SCX, fractions submitted directly to TiO2 yielded slightly more phosphopeptides than fractions that were desalted and then purified sequentially with IMAC and TiO2. Direct TiO2 enrichment gave 87 unique phosphopeptides, compared with 68 for sequential IMAC and TiO2 (Supporting Information, Table 1). Consequently, we adopted this strategy for the analysis of the plasma phosphoproteome.
Overall, this strategy proved very efficient: as described here, it allowed us to identify 138 nonredundant phosphopeptides (FDR < 1%) in plasma from healthy individuals. To increase the number and confidence of the identified sequences, three different search engines were used: SEQUEST, Phenyx, and OMSSA. These tools use different algorithms and scores to match a mass spectrum to sequences in the database. Identification of the same sequence by more than one search engine therefore greatly increases confidence in the match.29,30 To select the best matches, we used the Phenyx z-score and the OMSSA expectation value. Using a target/decoy database strategy, the cutoff values of these parameters were determined for two false discovery rates (1% and 5%). The same procedure was applied to the SEQUEST data. In this case, however, two parameters, the Xcorr and the D value, were used in combination, as described previously.21
Table 1. Summary of the Reported Human Plasma Phosphopeptide Set Filtered at Two Different Values of FDR
| | FDR = 1% | FDR = 5% |
| Spectra identified | 3344 | 4014 |
| Spectra identified as nonphosphorylated peptides | 2502 | 2847 |
| Spectra identified as phosphopeptides | 842 | 1167 |
| Unique phosphopeptides | 138 | 293 |
| Unique nonphosphorylated peptides | 718 | 833 |
| Selectivity (%) (a) | 18 | 26 |
| Total p-sites | 127 | 356 |
| High-confidence p-sites (b) | 71 | 193 |
| Phosphoproteins (Swiss-Prot) (c) | 70 | 223 |
| Genes (c) | 65 | 221 |
| Identifications common to more than one search engine | | |
| Spectra identified | 2409 | 2906 |
| Spectra identified as nonphosphorylated peptides | 1910 | 2218 |
| Spectra identified as phosphopeptides | 551 | 688 |
| Unique phosphopeptides | 85 | 92 |
| Unique nonphosphorylated peptides | 537 | 603 |
| Selectivity (%) (a) | 13 | 13 |
| Total p-sites | 70 | 74 |
| High-confidence p-sites (b) | 43 | 47 |
| Phosphoproteins (Swiss-Prot) (c) | 40 | 44 |
| Genes (c) | 40 | 43 |
(a) Ratio between the number of phosphopeptides and the total number of identified sequences. (b) Those with an unequivocal p-site or with Ascore > 18. (c) Minimal set of proteins/genes defined by the identified phosphopeptides.
A total of 159 640 spectra were submitted to database search. On average, each engine identified about 2600 spectra at FDR < 1%. The final set consisted of 842 spectra matching phosphorylated sequences by at least one engine and 2502 spectra of nonphosphorylated peptides (Figure 1). When a sequence had more than one possible phosphorylation site, the Ascore algorithm25 was used to evaluate the best location,26 and the p-site was assigned to the candidate with the highest score. P-site assignments from SEQUEST and OMSSA largely agreed with the Ascore evaluation (85% and 87% agreement, respectively). Phenyx, however, performed less well: 69% of its assignments were reassigned by Ascore. On the other hand, SEQUEST yielded the highest number of phosphopeptide identifications (Figure 1). At FDR < 1%, SEQUEST identified 26% more phosphorylated sequences than OMSSA and 3% more than Phenyx. In a comparison of different search engines using the target/decoy strategy, Balgley et al.29 found that OMSSA produced twice as many hits as SEQUEST (48 328 vs 24 575) at a similar FDR. In the analysis of the yeast phosphoproteome, Gygi's group also reported that OMSSA identified more peptides at 1% FDR, whereas SEQUEST produced 20% more phosphopeptide identifications.5 Under our conditions, SEQUEST consistently performed better than OMSSA, also producing 2% more identifications of nonphosphorylated peptides. For each engine, increasing the FDR to 5% resulted in similar increases in the total number of identifications passing the filter. This increase, however, differed between phosphopeptides and nonphosphorylated peptides. When the FDR was raised from 1% to 5%, the average increase was 44% for phosphopeptides but only 12% for nonphosphopeptides. This difference could result from the different score distributions observed for phosphopeptides and nonphosphopeptides. It could also reflect a larger contribution of phosphopeptide sequences to the set of false positives. Inspecting the distribution of peptide matches against the search engine score clearly shows that, at low FDR values, a false identification is highly likely to correspond to a phosphorylated sequence. For example, at Phenyx z-scores below 7, the number of phosphopeptide assignments in the decoy database is about 7-10-fold higher than that for nonphosphorylated peptides (Figure 2). The distributions of identifications common to 2 or more engines were clearly different when comparing phosphorylated and nonmodified sequences. At FDR
Figure 1. Spectra identified as phosphopeptides and nonphosphorylated peptides using the search engines SEQUEST, Phenyx, and OMSSA at FDR < 1% and 5%.
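The engine-combination rule from the Methods can be sketched as follows: keep a spectrum's sequence if one engine reported it or several engines reported it identically, and discard the spectrum if the engines disagree. The text does not specify how agreement was judged. This sketch therefore assumes that two hits agree when they have the same residue sequence and the same number of phosphate groups, ignoring site localization, which is re-evaluated with Ascore. Lowercase s/t/y as the phosphorylation marker is also an assumed convention, and the function names are hypothetical.

```python
# Sketch of combining the top, FDR-filtered hit of each search engine for one spectrum.
# Assumptions (not from the text): lowercase s/t/y mark phosphorylated residues, and
# "agreement" means same residues + same number of phosphate groups.
from typing import Dict, Optional, Tuple

PHOSPHO_RESIDUES = "sty"  # hypothetical marker convention


def sequence_key(peptide: str) -> Tuple[str, int]:
    """Identity used for agreement: unmodified residues plus phosphate count."""
    n_phospho = sum(1 for res in peptide if res in PHOSPHO_RESIDUES)
    return peptide.upper(), n_phospho


def combine_engines(best_hits: Dict[str, Optional[str]]) -> Optional[Tuple[str, int, int]]:
    """Return (sequence, n_phospho, n_engines_agreeing), or None.

    None means either no engine identified the spectrum or the engines
    proposed conflicting sequences (the spectrum is discarded).
    """
    keys = [sequence_key(p) for p in best_hits.values() if p]
    if not keys or len(set(keys)) > 1:
        return None
    sequence, n_phospho = keys[0]
    return sequence, n_phospho, len(keys)


if __name__ == "__main__":
    # Two engines agree on the peptide (site localization differs), the third has no hit.
    print(combine_engines({"SEQUEST": "VTEPIsAESGEQVER",
                           "OMSSA": "VTEPISAEsGEQVER",
                           "Phenyx": None}))
```

The third element of the result distinguishes single-engine identifications from those common to several engines. Table 1 reports the latter separately.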
Table 2. High-Confidence p-Sites Identified by Two or More Search Engines (Ascore > 18)
| protein | acc. no. | sequence | aai (a) | aaf (a) | p-site (b) | peptide counts | experiment |
| Albumin | P02768 | TCVADEsAENCDK | 76 | 88 | S82 (1) | 4 | 1 |
| Apolipoprotein A-I | P02647 | DSGRDYVsQFEGSALGK | 48 | 64 | S55 | 2 | 1, 2 |
| | | THLAPYsDELR | 185 | 195 | S191 | 2 | 1 |
| Apolipoprotein A-II | P02652 | SYFEKsKEQLTPLIK | 63 | 77 | S68 | 7 | 1, 2, 3 |
| | | VKsPELQAEAK | 52 | 62 | S54 | 1 | 1 |
| Apolipoprotein A-IV | P06727 | ARISAsAEELR | 254 | 264 | S259 | 1 | 3 |
| Apolipoprotein A-V | Q6Q788 | EPATLKDsLEQDLNNMNK | 52 | 69 | S59 | 1 | 1 |
| Apolipoprotein B-100 | P04114 | VREsDEETQIK | 4045 | 4055 | S4048 | 9 | 1, 3 |
| Apolipoprotein L1 | O14791 | VTEPIsAEsGEQVER | 306 | 320 | S311 (1), S314 | 2 | 1 |
| | | VTEPIsAESGEQVER | 306 | 320 | S311 (1) | 19 | 1, 2, 3 |
| | | VTEPISAEsGEQVER | 306 | 320 | S314 | 6 | 1, 2, 3 |
| Complement Component 3 | P01024 | RIPIEDGsGEVVLSR | 290 | 304 | S297 | 2 | 1 |
| Complement Component 8 | P07358 | RLLCNGDNDCGDQsDEANCR | 137 | 156 | S150 | 1 | 1 |
| Complement Component 9 | P02748 | QCVPtEPCEDAEDDCGNDFQCSTGR | 87 | 111 | T91 | 2 | 1, 2 |
| Complement Factor I | P05156 | KVTYTsQEDLVEK | 19 | 31 | S24 | 1 | 1 |
| Coagulation Factor V | P12259 | LLSLGAGEFKsQEHAK | 849 | 864 | S859 | 1 | 3 |
| Alpha-2-HS-Glycoprotein | P02765 | CDSSPDsAEDVR | 132 | 143 | S138 (1,2,4) | 20 | 1, 2, 3 |
| | | CDSSPDsAEDVRK | 132 | 144 | S138 (1,2,4) | 42 | 1, 2, 3 |
| | | FSVVYAKCDSSPDsAEDVR | 125 | 143 | S138 (1,2,4) | 5 | 1, 2 |
| | | FSVVYAKCDSSPDsAEDVRK | 125 | 144 | S138 (1,2,4) | 7 | 1 |
| | | HTFMGVVSLGSPsGEVSHPR | 318 | 337 | S330 (2) | 74 | 1, 3 |
| Fetuin B | Q9UGM5 | GSVQYLPDLDDKNsQEK | 302 | 318 | S315 | 1 | 1 |
| Fibronectin 1 | P02751 | RPGGEPSPEGTTGQSYNQYsQR | 2335 | 2356 | S2354 | 14 | 1 |
| | | TNTNVNCPIECFMPLDVQADREDsRE | 2361 | 2386 | S2384 (1) | 2 | 1 |
| IGF-binding protein 5 | P24593 | IERDsREHEEPTTsEMAEETYSPK | 112 | 135 | S116 (1), S125 | 1 | 1 |
| ITI heavy chain H1 | P19827 | KAAIsGENAGLVR | 125 | 137 | S129 | 1 | 1 |
| ITI heavy chain H4 | Q14624 | MNFRPGVLsSR | 658 | 668 | S666 | 1 | 1 |
| Kininogen 1 | P01042 | ETTCSKEsNEELTESCETK | 325 | 343 | S332 | 176 | 1, 2, 3 |
| | | ETTCSKEsNEELTESCETKK | 325 | 343 | S332 | 67 | 1, 2, 3 |
| | | ETTCsKESNEELTESCETKK | 325 | 343 | S329 | 10 | 1, 2, 3 |
| | | IGEIKEETTsHLR | 393 | 405 | S402 | 1 | 1 |
| | | IGEIKEETTSHLRsCEYK | 393 | 410 | S406 | 1 | 3 |
| PCSK9 | Q8NBP7 | HLAQAsQELQ | 683 | 692 | S688 (1,2) | 2 | 1 |
| | | SRHLAQAsQELQ | 681 | 692 | S688 (1,2) | 2 | 1 |
| Plasminogen | P00747 | AFQYHsKEQQCVIMAENR | 63 | 80 | S68 | 3 | 1 |
| | | KCSGTEASVVAPPPVVLLPDVETPsEEDCMFGNGK | 453 | 487 | S477 | 1 | 1 |
| | | KQLGAGsIEECAAK | 39 | 52 | S45 | 1 | 1 |
| | | KQLGAGsIEECAAKCEEDEEFTCR | 39 | 52 | S45 | 2 | 3 |
| | | VQsTELCAGHLAGGTDSCQGDSGGPLVCFEKDK | 739 | 771 | S741 | 1 | 2 |
| Proteoglycan 4 | Q92954 | GRCFEsFER | 72 | 80 | S77 | 1 | 1 |
| | | TTSAKETQsIEK | 304 | 315 | S312 | 1 | 3 |
| Selenoprotein P | P49908 | DMPAsEDLQDLQK | 262 | 274 | S266 (3) | 2 | 1 |
| | | DMPAsEDLQDLQKK | 262 | 275 | S266 (3) | 2 | 1 |
| Serpin A10 | Q9UK55 | VVQAPKEEEEDEQEAsEEKAsEEEK | 41 | 65 | S56, S61 | 6 | 1, 3 |
| | | VVQAPKEEEEDEQEAsEEKASEEEK | 41 | 65 | S56 | 6 | 1, 3 |
| Antithrombin-III | P01008 | ATEDEGsEQKIPEATNR | 61 | 78 | S68 (3) | 17 | 1, 2, 3 |
| | | KATEDEGsEQKIPEATNR | 62 | 78 | S68 (3) | 23 | 1, 2, 3 |
| Alpha-2-antiplasmin | P08697 | EQQDsPGNKDFLQSLK | 446 | 461 | S470 | 2 | 1, 2 |
| Heparin Cofactor 2 | P05546 | GGETAQsADPQWEQLNNK | 31 | 48 | S37 | 2 | 1 |
| SPARC-like protein 1 | Q14515 | DQGNQEQDPNIsNGEEEEEKEPGEVGTHNDNQER | 187 | 220 | S198 | 2 | 1, 2 |
| | | HIQETEWQsQEGK | 287 | 299 | S295 | 8 | 1, 2, 3 |
| Trans-Golgi network protein 2 | O43493 | DSPSKSSAEAQtPEDTPNK | 65 | 83 | T76 | 1 | 1 |
| | | DSPSKSsAEAQTPEDTPNK | 65 | 83 | S71 (1,2) | 1 | 1 |
| MASP1 protein | P48740 | KNEIDLESELKsEQVTE | 364 | 380 | S375 | 2 | 1 |
| Secreted phosphoprotein 24 | Q13103 | KDsGEDPATCAFQR | 94 | 107 | S96 | 2 | 2 |
(a) aai, aaf: position in the protein sequence of the initial and final peptide amino acids. (b) p-sites described in (1) Phosphosite database, (2) UniProt, (3) Bahl et al.8 and (4) Plasma database (http://www.plasmaproteomedatabase.org).
of the matches in a data set but also the agreement in the assignment provided by different algorithms. Thus, a valid strategy for selecting high-confidence identifications could rely on choosing the identifications common to data sets filtered at relatively high FDR values. This procedure is especially suitable for groups working with conventional ion traps, as in this case, or with other low-resolution mass spectrometers. The number of false positives in the data set of identifications supported by several engines is minimized, resulting in an FDR lower than that set for each search engine individually.31 Recently, Jones et al. described a method for evaluating combined FDR scores for data sets produced in this way.30
Phosphopeptide enrichment is a key step in phosphoproteome analysis. TiO2 supports show affinity for phosphopeptides but also for other acidic peptides and for glycopeptides. Adding glycolic acid to the sample decreases the binding of acidic nonphosphorylated peptides. In fact, in our data set, the percentage of acidic residues (E and D) in the nonphosphorylated sequences (18%) was only slightly higher than that expected from the amino acid distribution in the UniProt database (12%). On average, phosphopeptides made up 18% of the TiO2-enriched extracts (nonredundant data set, FDR < 1%). In previous work, we reported a selectivity of 41% for these columns in the purification of phosphopeptides from the T-cell phosphoproteome. That value certainly underestimated TiO2 selectivity, because it was calculated from the purification of the flow-through extract of an IMAC purification. TiO2 selectivity as high as 90% has been reported.32 However, Bahl et al. reported that TiO2 purification selectivity fell to 50% in CSF, a fluid similar to plasma in protein composition and relative concentration ranges.8 Gilar et al. reported a similar result when comparing the yield of phosphopeptides identified from a yeast digest and from human serum using an MOAC support.33 Thus, the apparently low selectivity obtained in the analysis of the plasma extracts could be at least partially due to the particular composition of this biological fluid. Most of the nonphosphorylated peptides found in the plasma samples were derived from high-abundance plasma proteins (data not shown). These high-abundance peptides probably reduce the recovery of low-abundance phosphopeptides. In addition, glycosylated proteins and peptides in plasma could compete with phosphopeptides for binding to TiO2, further reducing phosphopeptide recovery.
Of the total number of p-sites described in our collection, 47 can be considered of very high confidence (Table 2). These p-sites corresponded to sequences validated by two or more engines, with low (Ascore > 18) or no ambiguity regarding the p-site location. Of these, only seven p-sites were included in the Phosphosite database (www.phosphosite.org), currently probably the largest available repository of p-sites. In addition, Bahl et al. described p-sites at S198 and S295 of SPARC-like protein 1, S266 of selenoprotein P, and S68 of antithrombin-III in CSF proteins.8 S330 of alpha-2-HS-glycoprotein was already annotated in the Swiss-Prot database. Cirulli,9 Li11 and Hu10 independently described a single fibrinogen p-site from 4 different nested peptide sequences in the analysis of free serum phosphopeptides; our collection did not include this p-site. None of these sequences was tryptic, so they were probably missed in our searches, which had to be performed with enzymatic restriction. (Searching low-resolution data with many dynamic modifications is highly computer-intensive, and in our case SEQUEST crashed the computer when this load was further increased by a nonrestricted search.) Nor did we detect the corresponding 3075-Da tryptic phosphopeptide that may be derived from the fibrinogen molecule. Compared with the proportions expected from the UniProt/Swiss-Prot database, the amino acid distribution in our phosphopeptide data set is enriched in serine and glutamic acid (Figure 4). In fact, 58% of the high-confidence p-sites showed an acidophilic SXE motif (Motif-X algorithm34), which is specific to the ATM, calmodulin-dependent kinase II, and casein kinase families.35 The ratio of serine to threonine in high-confidence p-sites was 45/2. Only one sequence phosphorylated at Tyr was identified by two different engines. It had a poor Ascore value, however, and is therefore not included in Table 2.
Figure 4. Amino acid composition of the reported high-confidence phosphorylated sequences compared with the compositions calculated from the UniProt/Swiss-Prot and Plasma databases. Inset: major phosphorylation motif from the data set of high-confidence p-sites (Motif-X, p < 10 × 10⁻⁶; motif score, 16; fold increase, 8.92).
Phosphoproteins. The full set of phosphopeptides (FDR < 1%) pointed to a total of 70 source proteins in the UniProt/Swiss-Prot database (223 at FDR < 5%). This figure corresponds to the minimal protein set defined by the identified phosphopeptides. All protein matches are annotated in the Supporting Information tables, but when a peptide sequence was common to more than one protein sequence, only one hit was counted. Even in the more extensive data set (FDR < 5%), only 7 peptide sequences produced ambiguous protein identifications due to sequence homology in the UniProt/Swiss-Prot database. One of these sequences was common to four proteins, two were common to three proteins, and the other four were common to 2 proteins each. Only two of these latter peptides passed the filter for FDR < 1%. Forty-four of the phosphoproteins in the data set were identified with at least one peptide sequence validated by two or more engines (see Supporting Information Table IIId). The following discussion considers only this protein data set, and the description of new phosphosites from these proteins is restricted to those with Ascore > 18 (Table 2). Six phosphoproteins were detected in the HAP fraction. Of these, only 2 (albumin and fibrinogen alpha) corresponded to proteins specifically targeted by the affinity support. Six other proteins targeted by the MARS columns were identified in the HAP fraction only from nonphosphorylated peptides: fibrinogen beta and gamma, IgG gamma, haptoglobin, alpha-1-antitrypsin, and transferrin. It is well known that depletion systems remove many proteins in addition to those the affinity support is directed at.36,37 Data from phosphorylated and nonphosphorylated peptides were both considered, and the several entries for Ig heavy chains, Ig gammas, and keratins were not counted separately. On this basis, a total of 21 proteins were present in the HAP fraction. As indicated, only 8 of these proteins were targeted by the affinity support. Phosphopeptides from nontargeted proteins in the HAP fraction could therefore be derived from proteins or protein fragments that bound nonspecifically to the support. Alternatively, they could have been captured because of their affinity for one of the specifically absorbed proteins. Interestingly, of the 14 nontargeted proteins in the HAP fractions, 12 have already been reported as part of the albuminome19 or as copurifying with albumin.20
Twelve of the identified phosphoproteins were related to the complement and coagulation cascade (Figure 5). Four new p-sites were identified on 4 different peptide fragments from plasminogen. These p-sites are located in the plasmin heavy chain (S45, S68, S477) and light chain (S741), which plasminogen activators produce from this zymogen. Spectra for the plasminogen phosphopeptides are given in the Supporting Information.
Figure 5. Complement and coagulation cascades (adapted from the KEGG pathways database). Proteins marked with a filled star are high-confidence identifications confirmed by several search engines. The protein marked with a hollow star was identified with only one engine at FDR < 5%.
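In Table 2, each p-site position can be recovered from the peptide sequence and its initial residue (aai). The positions listed imply that lowercase letters mark the phosphorylated residues; assuming that convention, a site at zero-based offset i in the peptide lies at protein position aai + i. The sketch below applies this to the four plasminogen peptides discussed above. It also flags whether each site has glutamic acid two residues downstream (the SXE pattern). It inspects only the peptide, not the protein-level context used by Motif-X, so it is an illustration rather than a motif analysis. The function names are hypothetical.

```python
# Sketch: protein-level p-site positions and a simple S/T-X-E check from Table 2 rows.
# Assumption: lowercase s/t/y in a sequence mark phosphorylated residues.
from typing import List

PHOSPHO_RESIDUES = "sty"


def psite_positions(peptide: str, aai: int) -> List[str]:
    """p-site labels such as 'S68' for a peptide whose first residue is at position aai."""
    return [f"{res.upper()}{aai + i}"
            for i, res in enumerate(peptide) if res in PHOSPHO_RESIDUES]


def sxe_flags(peptide: str) -> List[bool]:
    """For each marked site, True if the residue two positions downstream is E.

    Only the peptide is inspected, so a site within two residues of the peptide's
    C-terminus is reported as False even if the protein has E at that position.
    """
    return [i + 2 < len(peptide) and peptide[i + 2].upper() == "E"
            for i, res in enumerate(peptide) if res in PHOSPHO_RESIDUES]


if __name__ == "__main__":
    # Plasminogen (P00747) peptides with their initial residue (aai) from Table 2.
    plasminogen = [
        ("KQLGAGsIEECAAK", 39),
        ("AFQYHsKEQQCVIMAENR", 63),
        ("KCSGTEASVVAPPPVVLLPDVETPsEEDCMFGNGK", 453),
        ("VQsTELCAGHLAGGTDSCQGDSGGPLVCFEKDK", 739),
    ]
    for peptide, aai in plasminogen:
        print(peptide, psite_positions(peptide, aai), sxe_flags(peptide))
```

Applied to these four peptides, the computed positions reproduce the plasminogen sites listed in Table 2 (S45, S68, S477, S741).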
Plasmin is a key enzyme in the fibrinolytic system, where its primary function is to degrade fibrin-rich thrombi.38 The plasminogen activator system is also involved in tissue proliferation and cellular adhesion.39 Studies of several types of cancer have revealed that increased urokinase plasminogen activator levels are associated with aggressive tumor behavior and poor prognosis.40-42 Several studies have aimed to elucidate plasminogen structure-function relationships and interactions with other molecules. However, the only PTMs described so far are N- and O-linked glycosylations and a phosphorylation at Ser578.43,44 The functional relevance of the new p-sites described here has yet to be elucidated. They could, however, influence plasminogen activation or plasmin enzymatic activity. Phosphorylation regulates the urokinase-type plasminogen activator45 and other enzymes of the coagulation cascade46 in this way. Other identified phosphoproteins related to the complement and coagulation cascades include complement components 3 and 8, complement factor I, coagulation factor V, fibrinogen alpha, alpha-2-macroglobulin, kininogen, and mannan-binding lectin serine peptidase 1. They also include the serpins antithrombin-III, alpha-2-antiplasmin, and heparin cofactor 2 (Figure 5). Thirteen new high-confidence p-sites were identified in this group of proteins. Six additional phosphopeptides corresponding to two proteins of this group (alpha-2-macroglobulin and fibrinogen alpha chain) were also identified, but their p-site locations could not be validated (Ascore < 18). Six different phosphorylated apolipoproteins were detected. One new phosphorylation site was described for apolipoprotein L1 (S314), and the p-site at S311 reported by Mancone et al.47 was confirmed. The other newly described p-sites were S55 and S191 in ApoA-I, S68 and S54 in ApoA-II, S259 in ApoA-IV, S59 in ApoA-V, and S4048 in ApoB. Most of the phosphoproteins identified are plasma proteins.
The data set includes protease inhibitors, carrier proteins and complement precursors. These proteins are involved in relevant biological functions such as blood circulation and clotting, immunity and lipid transport. The new phosphorylation sites reported here provide a collection of candidate sites for the regulation of protein activity. These sites could be important for understanding the mechanisms involved in blood physiology and homeostasis.
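The Ascore cutoff mentioned above comes from the probability-based site-localization approach of Beausoleil et al.25 The following is a simplified sketch of its core idea, not the reference implementation. It makes three assumptions:

- A candidate site assignment has N site-determining fragment ions, and n of them are matched in a spectrum filtered to keep `peak_depth` peaks per 100 m/z window.
- The chance of a random match is therefore p = peak_depth/100.
- Each assignment is scored as -10·log10 P(X ≥ n) for X ~ Binomial(N, p).

The "Ascore-like" value is the score difference between the two best-ranked assignments. The published method also chooses the peak depth that best separates the candidates and uses weighted peptide scores; both steps are omitted here. All function names are hypothetical.

```python
import math


def binomial_tail(n_matched: int, n_total: int, p: float) -> float:
    """Return P(X >= n_matched) for X ~ Binomial(n_total, p)."""
    return sum(
        math.comb(n_total, k) * p**k * (1.0 - p) ** (n_total - k)
        for k in range(n_matched, n_total + 1)
    )


def site_score(n_matched: int, n_total: int, peak_depth: int) -> float:
    """-10*log10 of the chance that n_matched of n_total site-determining ions
    match at random, assuming p = peak_depth / 100 (peaks kept per 100 m/z)."""
    p = peak_depth / 100.0
    return -10.0 * math.log10(binomial_tail(n_matched, n_total, p))


def ascore_like(best, runner_up, peak_depth):
    """Score difference between the two top-ranked site assignments, each
    given as (matched, total) counts of site-determining ions."""
    return site_score(*best, peak_depth) - site_score(*runner_up, peak_depth)


# Hypothetical example: all 4 site-determining ions matched for the best site,
# none for the alternative, with 4 peaks kept per 100 m/z (p = 0.04).
score = ascore_like((4, 4), (0, 4), peak_depth=4)
print(round(score, 1))  # 55.9
print(score >= 18)      # True: the site would pass the Ascore cutoff used in the text
```

A site whose best and second-best assignments explain the spectrum about equally well gets a difference near zero. Such a site is reported as unlocalized, as for the six alpha-2-macroglobulin and fibrinogen alpha chain phosphopeptides above.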
Conclusions
Using a procedure based on SCX fractionation, TiO2 enrichment and LC-µESI-ITMSn, we have reported phosphoproteins and p-sites from human plasma for the first time. Analysis of the phosphoproteome of biological fluids such as serum, plasma and CSF imposes specific requirements on phosphopeptide isolation because of the complexity of the biological matrix. In addition, conventional ion traps have limited mass accuracy, which increases the rate of false identifications. However, running several search engines in parallel allows the selection of high-confidence data sets with a reduced proportion of false positives. Phosphorylation is a common and functionally important protein modification. Characterizing phosphopeptides and phosphoproteins in blood is a first step toward a better understanding of two kinds of process. The first are the many physiological processes involving blood cells and vascular wall cells. The second are processes originating in other tissues and organs whose state is affected or reflected by compounds transported in blood.
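The benefit of combining search engines can be illustrated with a small sketch. It is not the authors' pipeline, and it assumes a generic two-step scheme in the spirit of the target-decoy comparisons cited earlier.29,30

1. Each engine's peptide-spectrum matches are filtered at a decoy/target FDR estimate of at most 1%.
2. Only peptides accepted by at least two engines are kept.

Function names, engine names and scores below are hypothetical toy values.

```python
from collections import Counter


def accepted_at_fdr(psms, max_fdr=0.01):
    """Return target peptides accepted at the deepest score cutoff whose
    decoy/target FDR estimate is <= max_fdr.

    psms: list of (peptide, score, is_decoy); a higher score is better.
    """
    ranked = sorted(psms, key=lambda psm: psm[1], reverse=True)
    targets = decoys = cutoff = 0
    for i, (_, _, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i  # deepest rank that still satisfies the FDR limit
    return {pep for pep, _, is_decoy in ranked[:cutoff] if not is_decoy}


def consensus(per_engine, min_engines=2, max_fdr=0.01):
    """Keep peptides accepted (at max_fdr) by at least min_engines engines."""
    votes = Counter()
    for psms in per_engine.values():
        votes.update(accepted_at_fdr(psms, max_fdr))  # one vote per engine
    return {pep for pep, n in votes.items() if n >= min_engines}


# Toy, hypothetical search results; engine names and scores are invented.
results = {
    "engine_a": [("PEPTIDEA", 90, False), ("PEPTIDEB", 80, False),
                 ("DECOY1", 70, True), ("PEPTIDEC", 60, False)],
    "engine_b": [("PEPTIDEA", 50, False), ("PEPTIDEC", 45, False),
                 ("DECOY2", 40, True), ("PEPTIDEB", 30, False)],
    "engine_c": [("PEPTIDEB", 12, False), ("DECOY3", 11, True),
                 ("PEPTIDEA", 10, False)],
}
print(sorted(consensus(results)))  # ['PEPTIDEA', 'PEPTIDEB']
```

Requiring agreement between engines trades some sensitivity for specificity. PEPTIDEC, accepted by only one engine, is dropped. Setting `min_engines=1` would instead return the union of the per-engine lists.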
Acknowledgment. This work was supported by grant BIO2008-03369 from the Spanish Ministerio de Ciencia e Innovación. The LP-CSIC/UAB is a member of ProteoRed (http://www.proteored.org), funded by Genoma Spain, and follows the quality criteria set up by ProteoRed standards.
Supporting Information Available: Supplementary Table 1. Comparison of sequential purification using IMAC and TiO2 with prior desalting and direct purification with TiO2. Supplementary Tables IIa-d. Peptides, phosphopeptides and phosphoproteins at FDR < 1%. Supplementary Tables IIIa-d. Peptides, phosphopeptides and phosphoproteins at FDR < 5%. Supplementary Figure 1. Annotated MS/MS spectra of the four identified plasminogen phosphopeptides. This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) Geoerger, B.; Gaspar, N.; Opolon, P.; Morizet, J.; Devanz, P.; Lecluse, Y.; Valent, A.; Lacroix, L.; Grill, J.; Vassal, G. EGFR tyrosine kinase inhibition radiosensitizes and induces apoptosis in malignant glioma and childhood ependymoma xenografts. Int. J. Cancer 2008, 123 (1), 209–216.
(2) Yamada, M.; Ikeda, Y.; Yano, M.; Yoshimura, K.; Nishino, S.; Aoyama, H.; Wang, L.; Aoki, H.; Matsuzaki, M. Inhibition of protein phosphatase 1 by inhibitor-2 gene delivery ameliorates heart failure progression in genetic cardiomyopathy. FASEB J. 2006, 20 (8), 1197–1199.
(3) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127, 635–648.
(4) Zanivan, S.; Gnad, F.; Wickstrom, S. A.; Geiger, T.; Macek, B.; Cox, J.; Fassler, R.; Mann, M. Solid Tumor Proteome and Phosphoproteome Analysis by High Resolution Mass Spectrometry. J. Proteome Res. 2008, 7, 5314–5326.
(5) Bakalarski, C. E.; Haas, W.; Dephoure, N. E.; Gygi, S. P. The effects of mass accuracy, data acquisition speed, and search algorithm choice on peptide identification rates in phosphoproteomics. Anal. Bioanal. Chem. 2007, 389, 1409–1419.
(6) Domon, B.; Bodenmiller, B.; Carapito, C.; Hao, Z.; Huehmer, A.; Aebersold, R. Electron Transfer Dissociation in Conjunction with Collision Activation To Investigate the Drosophila melanogaster Phosphoproteome. J. Proteome Res. 2009, 8, 2633–2639.
(7) Mann, M.; Kelleher, N. L. Precision proteomics: the case for high resolution and high mass accuracy. Proc. Natl. Acad. Sci. U.S.A. 2008, 105 (47), 18132–18138.
(8) Bahl, J. M. C.; Jensen, S. S.; Larsen, M. R.; Heegaard, N. H. H. Characterization of the human cerebrospinal fluid phosphoproteome by titanium dioxide affinity chromatography and mass spectrometry. Anal. Chem. 2008, 80, 6308–6316.
(9) Cirulli, C.; Chiappetta, G.; Marino, G.; Mauri, P.; Amoresano, A. Identification of free phosphopeptides in different biological fluids by a mass spectrometry approach. Anal. Bioanal. Chem. 2008, 392, 147–159.
(10) Hu, L.; Zhou, H.; Li, Y.; Sun, S.; Guo, L.; Ye, M.; Tian, X.; Gu, J.; Yang, S.; Zou, H. Profiling of Endogenous Serum Phosphorylated Peptides by Titanium (IV) Immobilized Mesoporous Silica Particles Enrichment and MALDI-TOF MS Detection. Anal. Chem. 2009, 81 (1), 94–104.
(11) Li, Y.; Qi, D.; Deng, C.; Yang, P.; Zhang, X. Cerium Ion-Chelated Magnetic Silica Microspheres for Enrichment and Direct Determination of Phosphopeptides by Matrix-Assisted Laser Desorption Ionization Mass Spectrometry. J. Proteome Res. 2008, 7 (4), 1767–1777.
(12) Gonzales, P. A.; Pisitkun, T.; Hoffert, J. D.; Tchapyjnikov, D.; Star, R. A.; Kleta, R.; Wang, N. S.; Knepper, M. A. Large-Scale Proteomics and Phosphoproteomics of Urinary Exosomes. J. Am. Soc. Nephrol. 2009, doi: 10.1681/ASN.2008040406.
(13) Anderson, N. L.; Anderson, N. G. The Human Plasma Proteome: History, Character, and Diagnostic Prospects. Mol. Cell. Proteomics 2002, 1, 845–867.
(14) Turner, M. W.; Hulme, B. The Plasma Proteins: An Introduction; Pitman Medical & Scientific Publishing Co., Ltd.: London, 1970.
(15) Pan, S.; Zhu, D.; Quinn, J. F.; Peskind, E. R.; Montine, T. J.; Lin, B.; Goodlett, D. R.; Taylor, G.; Eng, J.; Zhang, J. A combined dataset of human cerebrospinal fluid proteins identified by multidimensional chromatography and tandem mass spectrometry. Proteomics 2007, 7, 469–473.
(16) Adkins, J. N.; Varnum, S. M.; Auberry, K. J.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. G. Toward a Human Blood Serum Proteome: Analysis by multidimensional separation coupled with mass spectrometry. Mol. Cell. Proteomics 2002, 1, 947–955.
(17) Linke, T.; Doraiswamy, A.; Harrison, E. H. Rat plasma proteomics: Effects of abundant protein depletion on proteomic analysis. J. Chromatogr. B 2007, 849, 273–281.
(18) Roche, S.; Tiers, L.; Provansal, M.; Seveno, M.; Piva, M.-T.; Jouin, P.; Lehmann, S. Depletion of one, six, twelve or twenty major blood proteins before proteomic analysis: The more the better? J. Proteomics 2009, doi: 10.1016/j.jprot.2009.03.008.
(19) Gundry, R. L.; Cotter, R. J. The Albuminome as a Tool for Biomarker Discovery. In Clinical Proteomics: From Diagnosis to Therapy; Van Eyk, J., Dunn, M. J., Eds.; Wiley: London, 2007; pp 263–278.
(20) Gay, M.; Carrascal, M.; Gorga, M.; Parés, A.; Abian, J. Characterization of peptides and proteins in commercial human serum albumin solutions. Proteomics 2009, doi: 10.1002/pmic.200900182.
(21) Carrascal, M.; Ovelleiro, D.; Casas, V.; Gay, M.; Abian, J. Phosphorylation analysis of primary human T lymphocytes using sequential IMAC and titanium oxide enrichment. J. Proteome Res. 2008, 7, 5167–5176.
(22) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(23) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass spectrometry search algorithm. J. Proteome Res. 2004, 3 (5), 958–964.
(24) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, 5383–5392.
(25) Beausoleil, S. A.; Villen, J.; Gerber, S. A.; Rush, J.; Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24 (10), 1285–1292.
(26) Ovelleiro, D.; Carrascal, M.; Casas, V.; Abian, J. LymPHOS: Design of a phosphosite database of primary human T cells. Proteomics 2009, 9, 3741–3751.
(27) Thingholm, T. E.; Jensen, O. N.; Robinson, P. J.; Larsen, M. R. SIMAC - A phosphoproteomic strategy for the rapid separation of monophosphorylated from multiply phosphorylated peptides. Mol. Cell. Proteomics 2008, 7, 661–671.
(28) Pan, C.; Olsen, J. V.; Daub, H.; Mann, M. Global effects of kinase inhibitors on signaling networks revealed by quantitative phosphoproteomics. Mol. Cell. Proteomics 2009, 8, 2796–2808.
(29) Balgley, B. M.; Laudeman, T.; Yang, L.; Song, T.; Lee, C. S. Comparative Evaluation of Tandem MS Search Algorithms Using a Target-Decoy Search Strategy. Mol. Cell. Proteomics 2007, 6, 1599–1608.
(30) Jones, A. R.; Siepen, J. A.; Hubbard, S. J.; Paton, N. W. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines. Proteomics 2009, 9, 1220–1229.
(31) Searle, B. C.; Turner, M.; Nesvizhskii, A. I. Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies. J. Proteome Res. 2008, 7, 245–253.
(32) Larsen, M. R.; Thingholm, T. E.; Jensen, O. N.; Roepstorff, P.; Jørgensen, T. J. D. Highly Selective Enrichment of Phosphorylated Peptides from Peptide Mixtures Using Titanium Dioxide Microcolumns. Mol. Cell. Proteomics 2006, 4, 873–876.
(33) Gilar, M.; Yu, Y.-Q.; Ahn, J.; Fournier, J.; Gebler, J. C. Mixed-mode chromatography for fractionation of peptides, phosphopeptides, and sialylated glycopeptides. J. Chromatogr., A 2008, 1191, 162–170.
(34) Schwartz, D.; Gygi, S. P. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol. 2005, 23 (11), 1391–1398.
(35) Amanchy, R.; Periaswamy, B.; Mathivanan, S.; Reddy, R.; Tattikota, S. G.; Pandey, A. A curated compendium of phosphorylation motifs. Nat. Biotechnol. 2007, 25 (3), 285–286.
(36) Yocum, A.; Yu, K.; Oe, T.; Blair, I. Effect of immunoaffinity depletion of human serum during proteomic investigations. J. Proteome Res. 2005, 4 (5), 1722–1731.
(37) Liu, T.; Qian, W. J.; Mottaz, H. M.; Gritsenko, M. A.; Norbeck, A. D.; Moore, R. J.; Purvine, S. O.; Camp II, D. G.; Smith, R. D. Evaluation of Multiprotein Immunoaffinity Subtraction for Plasma Proteomics and Candidate Biomarker Discovery Using Mass Spectrometry. Mol. Cell. Proteomics 2006, 5, 2167–2174.
(38) Waisman, D. Plasminogen: Structure, Activation, and Regulation; Kluwer Academic/Plenum Publishers: New York, 2000.
(39) McMahon, B.; Kwaan, H. The Plasminogen Activator System and Cancer. Pathophysiol. Haemost. Thromb. 2007, 8 (36), 184–194.
(40) Kwaan, H. C.; McMahon, B. The role of plasminogen-plasmin system in cancer. Cancer Treat. Res. 2009, 148, 43–66.
(41) Zorio, E.; Gilabert-Estellés, J.; España, F.; Ramón, L.; Cosín, R.; Estellés, A. Fibrinolysis: the key to new pathogenetic mechanisms. Curr. Med. Chem. 2008, 15 (9), 923–929.
(42) Shariat, S.; Roehrborn, C.; McConnell, J.; Park, S.; Alam, N.; Wheeler, T.; Slawin, K. Association of the circulating levels of the urokinase system of plasminogen activation with the presence of prostate cancer and invasion, progression, and metastasis. J. Clin. Oncol. 2007, 25 (4), 349–355.
(43) Wang, H.; Prorok, M.; Bretthauer, R. K.; Castellino, F. J. Serine578 Is a Major Phosphorylation Locus in Human Plasma Plasminogen. Biochemistry 1997, 36, 8100–8106.
(44) Castellino, F. J.; Ploplis, V. A. Human Plasminogen: Structure, Activation and Function. In Plasminogen: Structure, Activation, and Regulation; Waisman, D. M., Ed.; Kluwer Academic/Plenum Publishers: New York, 2000; pp 3–11.
(45) Franco, P.; Iaccarino, C.; Chiaradonna, F.; Brandazza, A.; Iavarone, D.; Mastronicola, M.; Nolli, M.; Stoppelli, M. Phosphorylation of Human Pro-Urokinase on Ser138/303 Impairs Its Receptor-dependent Ability to Promote Myelomonocytic Adherence and Motility. J. Cell Biol. 1997, 137 (3), 779–791.
(46) Barlati, S.; De Petro, G.; Bona, C.; Paracini, F.; Tonelli, M. Phosphorylation of human plasminogen activators and plasminogen. FEBS Lett. 1995, 363, 170–174.
(47) Mancone, C.; Amicone, L.; Fimia, G.; Bravo, E.; Piacentini, M.; Tripodi, M.; Alonzi, T. Proteomic analysis of human very low-density lipoprotein by two-dimensional gel electrophoresis and MALDI-TOF/TOF. Proteomics 2007, 7, 143–154.
Journal of Proteome Research • Vol. 9, No. 2, 2010