Characterization of the Human Plasma Phosphoproteome Using Linear Ion Trap Mass Spectrometry and Multiple Search Engines

Montserrat Carrascal, Marina Gay, David Ovelleiro, Vanessa Casas, Emilio Gelpí, and Joaquin Abian*

LP-CSIC/UAB, IIBB-CSIC, IDIBAPS, Rosellón 161, 7ª Planta, 08036 Barcelona, Spain

Received September 2, 2009

Major plasma protein families play different roles in blood physiology and hemostasis and in immunodefense. Other proteins in plasma can be involved in signaling as chemical messengers or constitute biological markers of the status of distant tissues. In this respect, the plasma phosphoproteome holds potentially relevant information on the mechanisms modulating these processes through the regulation of protein activity. In this work we describe for the first time a collection of phosphopeptides identified in human plasma using immunoaffinity separation of the seven major serum protein families from other plasma proteins, SCX fractionation, and TiO2 purification prior to LC-MS/MS analysis. One hundred and twenty-seven phosphosites in 138 phosphopeptides mapping 70 phosphoproteins were identified with FDR < 1%. A high-confidence collection of phosphosites was obtained using a combined search with the OMSSA, SEQUEST, and Phenyx search engines.

Keywords: phosphoproteomics • human plasma • biomarkers • phosphopeptide purification • combined search engines

Introduction

Protein phosphorylation is the most common reversible PTM in mammals, and it plays a primary role in protein function modulation and signal transmission. Alterations of protein phosphorylation profiles can also result from pathological conditions. The phosphoproteome is therefore seen as a source of knowledge both on cell function and on disease biomarkers.1,2 Many efforts have been directed toward the description of the phosphoproteome of cells and tissues.3,4 In recent years this has been facilitated by the development of effective purification methods (IMAC, TiO2), new fragmentation methods (ECD, ETD, HCD) and more sensitive and precise mass spectrometric analyzers such as the Orbitrap and FT-ICR instruments.5-7

Biological fluids are currently the preferred samples for health monitoring and diagnosis due to their relative availability and potential to reflect processes in organs and tissues that cannot be easily accessed. The analysis of phosphoproteins or free phosphopeptides in biological fluids such as saliva, cerebrospinal fluid (CSF) or human serum has recently been reported.8-11 Cirulli et al. described four free phosphopeptides derived from fibrinogen in serum and 8 derived from 4 phosphoproteins in saliva using IMAC and Q-Trap tandem mass spectrometry.9 Li et al. used cerium ion-chelated magnetic silica microspheres for serum phosphopeptide enrichment and identified the same set of phosphorylated fibrinogen peptides described in the above work.11 A quantitative study by MALDI-TOF MS profiling of these fibrinogen peptides has been reported by Hu et al.10 They used titanium-immobilized mesoporous silica particles to purify the phosphopeptides from cancer patients and healthy controls and showed differences between the two groups. On the other hand, 59 phosphopeptides corresponding to 44 different phosphoproteins were described in human CSF by Heegaard's group using ultrafiltration for sample concentration and TiO2 enrichment prior to LC-MS/MS analysis.8 The detection of phosphopeptides or phosphoproteins in urine has not yet been reported. However, Gonzales et al. described 14 phosphoproteins and 19 phosphosites (p-sites) in urinary exosomes using the Pierce phosphopeptide purification kit followed by LC-MS/MS analysis in an FT instrument.12

One of the main problems in the study of fluids by proteomic approaches is the need to analyze protein markers that may be present at very low concentrations together with other protein families that are present at high concentrations. Blood serum and plasma contain proteins with relative concentrations that can differ by 9 orders of magnitude.13,14 A similar situation exists for CSF.15 This problem is commonly addressed by depleting the most abundant serum protein families using affinity or immunoaffinity methods as a general sample preparation step.16-18 A major issue with these procedures is the concomitant loss of protein and peptide molecules that exist in blood bound to albumin or to others of the depleted proteins. Consequently, the depleted fraction is now often studied as a new source of potential biomarkers.19,20

In this work we describe a collection of phosphopeptides and p-sites identified in human plasma using a procedure executed on 100 µL of plasma. The seven major families of plasma proteins were separated from the other protein components by immunoabsorption. Then we analyzed both the nonretained (low-abundance protein, LAP) and immunoabsorbed (high-abundance protein, HAP) fractions using SCX fractionation and titanium dioxide purification coupled to liquid chromatography-linear ion trap tandem mass spectrometry (LC-µESI-ITMSn). One hundred thirty-eight phosphopeptides and 128 p-sites mapping 70 phosphoproteins were identified at FDR < 1%.

* To whom correspondence should be addressed. E-mail: [email protected].

876 Journal of Proteome Research 2010, 9, 876–884 Published on Web 11/27/2009

10.1021/pr900780s © 2010 American Chemical Society

Material and Methods

Sample Preparation. Plasma samples were obtained from the Blood Bank of the Hospital Clinic and Hospital Vall d'Hebron (Barcelona, Spain). Samples were stored in 1-mL aliquots at -80 °C for ca. 1-3 months until analysis. Three independent analyses were carried out: one on a plasma pool from 3 individuals and two on identical aliquots of a different plasma pool prepared from 6 individuals. Plasma was delipidated by centrifugation at 25 000 × g for 15 min at 4 °C and the most abundant plasma protein families were depleted using the MARS-7 system (Multiple Affinity Removal System, HPLC column 4.6 mm i.d. × 100 mm, Agilent Technologies, Palo Alto, CA) following the manufacturer's instructions. This procedure eliminates approximately 88-92% of total protein by removing albumin, IgG, IgA, transferrin, haptoglobin, antitrypsin, and fibrinogen from the plasma sample. Briefly, the delipidated plasma was diluted 4-fold with MARS-7 buffer A and filtered through a 0.22-µm cellulose membrane by centrifugation at 16 000 × g for 10 min. The equivalent of 50 µL of plasma was injected into the system. Immunofractionation was monitored by UV at 214 nm. The depleted protein sample was recovered in the flow-through (LAP) fraction while the immunoabsorbed proteins (HAP fraction) were eluted with MARS-7 buffer B. Both fractions were collected and the material from two consecutive injections was pooled. A phosphatase inhibitor cocktail (Sigma, St. Louis, MO) was added to the samples before storage at -80 °C.

LAP samples, containing about 1 mg protein, were boiled for 5 min in the presence of 10 mM DTT, alkylated with 55 mM iodoacetamide (IAA) (30 min, 20 °C) and digested with TPCK-treated trypsin (Sigma, St. Louis, MO, 2% final concentration) for 20 h at 37 °C. Digestion was stopped by adding TFA to 1% and the tryptic extract was desalted using a tC18 cartridge (Waters, Milford, MA). The eluate was evaporated to near dryness and redissolved in 400 µL of SCX solvent A. For the HAP samples, a volume equivalent to 16% of the total material in the pooled HAP fractions (approximately 1 mg protein) was precipitated with 10% TCA, washed with acetone and dissolved in 6 M urea, 25 mM Tris-HCl pH 7.8. This protein solution was then reduced with 10 mM DTT and alkylated with 55 mM IAA. The sample was diluted 3-fold with ammonium bicarbonate and then digested with trypsin and processed as described for the LAP samples.

SCX Chromatography. SCX separations were carried out using a Polysulfoethyl A™, 50 × 2.1 mm, 5 µm, 200 Å column. Separation was performed at 200 µL/min using a linear gradient from 0 to 25% solvent B in 35 min and then to 100% B in 20 min (solvent A: 30% ACN, 0.1% formic acid; solvent B: 30% ACN, 0.1% formic acid and 500 mM NH4Cl). Due to the high amount of material in the extracts, the total sample was divided into two aliquots of 200 µL that were injected separately. Sixteen 3-min fractions from each injection were collected. Fractions corresponding to the same time window were pooled. The same procedure was performed for both the HAP and LAP samples.
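The SCX gradient and fraction windows described above can be sketched numerically. This is a minimal illustration only: it assumes that fraction collection starts at the same time as the gradient (t = 0) and ignores any dwell volume, neither of which is stated in the text. The function names are hypothetical.

```python
def percent_b(t_min: float) -> float:
    """Percent solvent B at time t (min) for the two-segment linear gradient:
    0 -> 25% B in 35 min, then 25 -> 100% B in the following 20 min."""
    if t_min <= 0:
        return 0.0
    if t_min <= 35:
        return 25.0 * t_min / 35.0
    if t_min <= 55:
        return 25.0 + 75.0 * (t_min - 35.0) / 20.0
    return 100.0


def salt_mM(t_min: float) -> float:
    """Nominal NH4Cl concentration (mM); solvent B contains 500 mM NH4Cl."""
    return 500.0 * percent_b(t_min) / 100.0


def fraction_windows(n_fractions: int = 16, width_min: float = 3.0):
    """(start, end) times in minutes of consecutive fixed-width fractions.
    Assumption: collection starts at t = 0 together with the gradient."""
    return [(i * width_min, (i + 1) * width_min) for i in range(n_fractions)]


if __name__ == "__main__":
    for start, end in fraction_windows():
        mid = (start + end) / 2.0
        print(f"{start:4.0f}-{end:<4.0f} min  {percent_b(mid):5.1f}% B  {salt_mM(mid):6.1f} mM NH4Cl")
```

Under these assumptions the sixteen 3-min fractions span 48 min, so the last fractions fall within the steep 25-100% B segment.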

Phosphopeptide Enrichment Using TiO2. SCX fractions were evaporated to 10 µL, diluted to 50 µL with 1 M glycolic acid, 5% TFA, 80% acetonitrile and loaded into a TiO2 (Titansphere 5 µm, GL Sciences Inc., Tokyo, Japan) minicolumn prepared in a GelLoader tip as described21 and previously conditioned with 1 M glycolic acid. The samples were loaded into the tip with a syringe and were washed consecutively with 10 µL of 1 M glycolic acid, 5% TFA, 80% ACN; 20 µL of 80% ACN, 1% TFA; and, finally, 5 µL of water. Phosphopeptides were eluted from the tip with 20 µL of 0.5% NH4OH followed by 1 µL of 30% ACN. The eluate was acidified with 2 µL of formic acid and stored at -80 °C until analysis.

LC-MS/MS Analysis. Each extract was concentrated to about 5 µL, diluted to 40 µL with 1% formic acid and analyzed by LC-MS/MS using a linear LTQ ion trap equipped with a microESI ion source (ThermoFisher, San Jose, CA). The HPLC system consisted of an Agilent 1200 capillary pump, a binary pump, a thermostatted microinjector and a microswitch valve. Separation was carried out using a C18 preconcentration cartridge (Agilent Technologies, Barcelona, Spain) connected to a 10-cm-long, 150-µm-i.d. Vydac C18 column (Vydac, IL). Separation was performed at 1 µL/min using a linear ACN gradient from 0 to 40% in 60 min (solvent A, 0.1% formic acid; solvent B, ACN, 0.1% formic acid). The LTQ instrument was operated in the positive ion mode with a spray voltage of 2 kV. The scan range of each full MS was m/z 400-1800. The spectrometric analysis was performed in an automatic data-dependent mode. A full MS scan followed by 8 MS/MS scans of the most abundant signals was acquired. A subsequent MS3 scan was performed when a neutral loss of 98, 49, or 32.7 (loss of H3PO4 from the +1, +2 and +3 charged ions, respectively) was detected among the 5 most intense ions. Dynamic exclusion was enabled (repeat count, 1; duration, 3 min).

Database Search and Phosphopeptide Validation. Fragmentation spectra were analyzed in parallel using three different search engines: SEQUEST22 (Bioworks v3.3, ThermoFisher, San Jose, CA), Phenyx (version 2.6, GeneBio, Geneva, Switzerland) and OMSSA (version 2.1.4).23 In all cases, confident identifications were filtered using a target/decoy database strategy. For SEQUEST and OMSSA, searches were performed against the Swiss-Prot database (Human Swiss-Prot release 14.8, 20332 entries) combined with its reversed copy. The Phenyx server required separate target and decoy databases. Search parameters for SEQUEST were: peptide mass tolerance, 2 Da; fragment tolerance, 0.8 Da; enzyme set to trypsin, allowing up to three missed cleavages; static modification, carbamidomethylated cysteine (+57 Da); variable modifications, methionine oxidation (+16 Da), phosphorylation on Ser, Thr and Tyr (+80 Da) and loss of water from Ser and Thr due to the β-elimination of phosphoric acid from the corresponding phospho-amino acid. All searches were performed considering a maximum of +3 charges for the precursor ion, and both MS2 and MS3 fragmentation spectra were used. Thermo binary .raw files were used for the SEQUEST search using the Bioworks software. Correct peptide sequence identifications (FDR < 1% and 5%) were evaluated independently for the three experiments using both the Xcorr score from SEQUEST and the peptide prophet D value24 as described previously.21 Thermo .raw files were converted to .dta files using the Bioworks tools. These files were submitted for database search to the Phenyx and OMSSA engines using the same parameters as described for SEQUEST. The cutoff values calculated for the different experiments for FDR < 1% were in the range 7.2-7.5 (z-score) and 1.6-2 (-log (expectation value)) for Phenyx and OMSSA, respectively. For FDR < 5% the corresponding cutoff values were in the range 6.6-6.8 (z-score) and 0.06-0.84 (-log (expectation value)). For each spectrum, only the highest-scoring assignation from each engine was considered. In the final data sets, only those sequences obtained from a single engine or common to two or more engines were included (i.e., conflicting results were discarded). The confidence of the p-sites from identified phosphopeptides was evaluated with an additional Ascore25 analysis performed as described elsewhere.26
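The target/decoy filtering and the rule for merging results from several engines can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the FDR estimate used here (decoy hits divided by target hits above the cutoff) is an assumption, since the text does not give the formula, and the function names and toy data are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def score_cutoff_for_fdr(psms: Iterable[Tuple[float, bool]],
                         max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: (score, is_decoy) pairs, higher score = better match.
    Assumed estimate: FDR = decoys / targets among matches at or above the cutoff.
    Returns None when no cutoff satisfies the limit.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Count every match tied at this score before evaluating the FDR.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


def combine_engines(assignments: Dict[str, Dict[str, str]]) -> Dict[str, Tuple[str, int]]:
    """Merge per-engine best assignments (already filtered at the chosen FDR).

    assignments: engine name -> {spectrum id: peptide sequence}.
    A spectrum is kept when it was assigned by a single engine or when all
    engines that assigned it agree; conflicting spectra are discarded.
    Returns spectrum id -> (sequence, number of supporting engines).
    """
    by_spectrum: Dict[str, list] = {}
    for hits in assignments.values():
        for spectrum, peptide in hits.items():
            by_spectrum.setdefault(spectrum, []).append(peptide)
    return {spectrum: (peptides[0], len(peptides))
            for spectrum, peptides in by_spectrum.items()
            if len(set(peptides)) == 1}


if __name__ == "__main__":
    toy = [(9.1, False), (8.7, False), (8.0, False), (7.4, True), (7.1, False)]
    print(score_cutoff_for_fdr(toy, max_fdr=0.01))
    print(combine_engines({
        "SEQUEST": {"s1": "AAsK", "s2": "BBK"},
        "OMSSA": {"s1": "AAsK", "s2": "CCK", "s3": "DDK"},
        "Phenyx": {},
    }))
```

Entries supported by two or more engines (second tuple element of 2 or more) correspond to the higher-confidence subset described in the text.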

Results and Discussion

For a typical analysis, 100 µL of human plasma was depleted with the MARS-7 system. The flow-through (LAP) and the retained (HAP) fractions (containing the low-abundance proteins and seven of the most abundant families of plasma proteins, respectively) were digested and the tryptic peptides were fractionated by SCX. For each peptide fraction, phosphopeptides were isolated using TiO2. Purified fractions were subjected to LC-µESI-ITMSn using an LTQ linear ion trap mass spectrometer. Efficient isolation of phosphopeptides from the highly complex mixtures of nonphosphorylated molecules in the biological sample is a major issue in phosphoproteome analysis. In the procedure reported, phosphopeptides are purified through a combination of SCX fractionation and TiO2 purification. In previous work we described a method for enrichment of T-lymphocyte phosphopeptides from 1D gel-separated and in-gel digested proteins using sequential IMAC and TiO2 purification and showed that this sequential enrichment increases the number of identifications.21 Similar conclusions were reported by Thingholm et al. in the analysis of human mesenchymal stem cells.27 When using SCX fractionation instead of gel separation, the higher concentration of salts in the sample requires that it be desalted before IMAC purification. TiO2 tolerates high salt concentrations better than IMAC, so in most cases TiO2 can be used as the only purification step.4,28 In preliminary experiments using SCX, we found that the number of phosphopeptides obtained was higher when fractions were submitted directly to TiO2 than when they were desalted and the phosphopeptides were then purified sequentially with IMAC and TiO2 (68 unique phosphopeptides for sequential IMAC and TiO2 and 87 for direct TiO2 enrichment, Supporting Information, Table 1). Consequently, we tested this strategy for the analysis of the plasma phosphoproteome.
Overall, this strategy proved very efficient and, as described here, it allowed us to identify 138 nonredundant phosphopeptides (FDR < 1%) in human plasma from healthy individuals. To increase the number and confidence of the identified sequences, three different search engines were used: SEQUEST, Phenyx and OMSSA. These tools make use of different algorithms and scores for the identification of sequences in the database matching a mass spectrum. Thus, identification of the same sequence by more than one search engine greatly increases confidence in the match.29,30 To select the best matches we used the Phenyx z-score parameter and the OMSSA expectation value. The cutoff value of these parameters was determined for two different values of false discovery rate (1% and 5%) using a target/decoy database strategy. The same procedure was applied to SEQUEST data but in this case, two different parameters, the Xcorr and the D value, were used in combination as described previously.21 A total of 159 640 spectra were submitted to database search. About 2600 spectra were identified on average by each engine at FDR < 1%. The final set of spectra was composed of 842 spectra matching phosphorylated sequences by at least one engine, and 2502 spectra of nonphosphorylated peptides (Figure 1).

Table 1. Summary of the Reported Human Plasma Phosphopeptide Set Filtered at Two Different Values of FDR

                                                    FDR = 1%   FDR = 5%
All identifications
Spectra identified                                  3344       4014
Spectra identified as nonphosphorylated peptides    2502       2847
Spectra identified as phosphopeptides               842        1167
Unique phosphopeptides                              138        293
Unique nonphosphorylated peptides                   718        833
Selectivity (a) (%)                                 18         26
Total p-sites                                       127        356
High-confidence p-sites (b)                         71         193
Phosphoproteins (Swiss-Prot) (c)                    70         223
Genes (c)                                           65         221

Identifications common to more than one search engine
Spectra identified                                  2409       2906
Spectra identified as nonphosphorylated peptides    1910       2218
Spectra identified as phosphopeptides               551        688
Unique phosphopeptides                              85         92
Unique nonphosphorylated peptides                   537        603
Selectivity (a) (%)                                 13         13
Total p-sites                                       70         74
High-confidence p-sites (b)                         43         47
Phosphoproteins (Swiss-Prot) (c)                    40         44
Genes (c)                                           40         43

(a) Ratio between the number of phosphopeptides and the total number of identified sequences. (b) Those with an unequivocal p-site or with Ascore > 18. (c) Minimal set of proteins/genes defined by the identified phosphopeptides.

Figure 1. Spectra identified as phosphopeptides and nonphosphorylated peptides using the search engines SEQUEST, Phenyx and OMSSA at FDR < 1% and 5%.

When there was more than one possible phosphorylation site for a sequence, the Ascore algorithm25 was used to evaluate the best location26 and the p-site was assigned to the candidate with the highest score. P-site assignations from SEQUEST and OMSSA analyses agreed to a large extent with the Ascore evaluation (85% and 87% agreement, respectively). However, Phenyx did not perform as well and 69% of its assignations were reassigned by Ascore. On the other hand, SEQUEST yielded the highest number of phosphopeptide identifications (Figure 1). At an FDR < 1% SEQUEST identified 26% more phosphorylated sequences than OMSSA and 3% more than Phenyx. In a comparison of different search engines using the target/decoy strategy, Balgley et al.29 found that OMSSA produced twice as many hits as SEQUEST (48 328 vs 24 575) for a similar FDR. In the analysis of the yeast phosphoproteome, Gygi's group also reported a higher number of peptides identified by OMSSA at 1% FDR, whereas for phosphopeptides, SEQUEST produced 20% more identifications.5 Under our conditions, SEQUEST always performed better than OMSSA, also producing 2% more identifications of nonphosphorylated peptides.

Increasing the value of FDR to 5% resulted in similar increases in the total number of identifications passing the filter for each engine. This increase was, however, different for phosphopeptides and nonphosphorylated peptides. While the average increase when FDR was raised from 1 to 5% was 44% for phosphopeptides, for nonphosphopeptides the increase was 12%. This difference could be the result of the different score distributions observed for phosphopeptides and nonphosphopeptides. It could also reflect the higher contribution of phosphopeptide sequences to the data set of false positives. The high probability that a false identification could correspond to a phosphorylated sequence at low FDR values is clearly seen by inspecting the distribution of peptide matches with the search engine score. Thus, in the case of Phenyx, at z-scores below 7, the number of phosphopeptide assignations in the decoy database is about 7-10 fold higher than that for nonphosphorylated peptides (Figure 2). The distributions of identifications common to 2 or more engines were clearly different when comparing phosphorylated and nonmodified sequences. At FDR

Table 2.

sequence                             aai(a) aaf(a) p-site(b)        peptide counts  experiment  protein                        acc. no.
TCVADEsAENCDK                        76     88     S82 (1)          4               1           Albumin                        P02768
DSGRDYVsQFEGSALGK                    48     64     S55              2               1, 2        Apolipoprotein A-I             P02647
THLAPYsDELR                          185    195    S191             2               1
SYFEKsKEQLTPLIK                      63     77     S68              7               1, 2, 3     Apolipoprotein A-II            P02652
VKsPELQAEAK                          52     62     S54              1               1
ARISAsAEELR                          254    264    S259             1               3           Apolipoprotein A-IV            P06727
EPATLKDsLEQDLNNMNK                   52     69     S59              1               1           Apolipoprotein A-V             Q6Q788
VREsDEETQIK                          4045   4055   S4048            9               1, 3        Apolipoprotein B-100           P04114
VTEPIsAEsGEQVER                      306    320    S311 (1), S314   19              1, 2, 3     Apolipoprotein L1              O14791
VTEPIsAESGEQVER                      306    320    S311 (1)         2               1
VTEPISAEsGEQVER                      306    320    S314             6               1, 2, 3
RIPIEDGsGEVVLSR                      290    304    S297             2               1           Complement Component 3         P01024
RLLCNGDNDCGDQsDEANCR                 137    156    S150             1               1           Complement Component 8         P07358
QCVPtEPCEDAEDDCGNDFQCSTGR            87     111    T91              2               1, 2        Complement Component 9         P02748
KVTYTsQEDLVEK                        19     31     S24              1               1           Complement Factor I            P05156
LLSLGAGEFKsQEHAK                     849    864    S859             1               3           Coagulation Factor V           P12259
CDSSPDsAEDVR                         132    143    S138 (1,2,4)     20              1, 2, 3     Alpha-2-HS-Glycoprotein        P02765
CDSSPDsAEDVRK                        132    144    S138 (1,2,4)     42              1, 2, 3
FSVVYAKCDSSPDsAEDVR                  125    143    S138 (1,2,4)     5               1, 2
FSVVYAKCDSSPDsAEDVRK                 125    144    S138 (1,2,4)     7               1
HTFMGVVSLGSPsGEVSHPR                 318    337    S330 (2)         74              1, 3
GSVQYLPDLDDKNsQEK                    302    318    S315             1               1           Fetuin B                       Q9UGM5
RPGGEPSPEGTTGQSYNQYsQR               2335   2356   S2354            14              1           Fibronectin 1                  P02751
TNTNVNCPIECFMPLDVQADREDsRE           2361   2386   S2384 (1)        2               1
IERDsREHEEPTTsEMAEETYSPK             112    135    S116 (1), S125   1               1           IGF-binding protein 5          P24593
KAAIsGENAGLVR                        125    137    S129             1               1           ITI heavy chain H1             P19827
MNFRPGVLsSR                          658    668    S666             1               1           ITI heavy chain H4             Q14624
ETTCSKEsNEELTESCETK                  325    343    S332             176             1, 2, 3     Kininogen 1                    P01042
ETTCSKEsNEELTESCETKK                 325    343    S332             67              1, 2, 3
ETTCsKESNEELTESCETKK                 325    343    S329             10              1, 2, 3
IGEIKEETTsHLR                        393    405    S402             1               1
IGEIKEETTSHLRsCEYK                   393    410    S406             1               3
HLAQAsQELQ                           683    692    S688 (1,2)       2               1           PCSK9                          Q8NBP7
SRHLAQAsQELQ                         681    692    S688 (1,2)       2               1
AFQYHsKEQQCVIMAENR                   63     80     S68              3               1           Plasminogen                    P00747
KCSGTEASVVAPPPVVLLPDVETPsEEDCMFGNGK  453    487    S477             1               1
KQLGAGsIEECAAK                       39     52     S45              1               1
KQLGAGsIEECAAKCEEDEEFTCR             39     52     S45              2               3
VQsTELCAGHLAGGTDSCQGDSGGPLVCFEKDK    739    771    S741             1               2
GRCFEsFER                            72     80     S77              1               1           Proteoglycan 4                 Q92954
TTSAKETQsIEK                         304    315    S312             1               3
DMPAsEDLQDLQK                        262    274    S266 (3)         2               1           Selenoprotein P                P49908
DMPAsEDLQDLQKK                       262    275    S266 (3)         2               1
VVQAPKEEEEDEQEAsEEKAsEEEK            41     65     S56, S61         6               1, 3        Serpin A10                     Q9UK55
VVQAPKEEEEDEQEAsEEKASEEEK            41     65     S56              6               1, 3
ATEDEGsEQKIPEATNR                    61     78     S68 (3)          17              1, 2, 3     Antithrombin-III               P01008
KATEDEGsEQKIPEATNR                   62     78     S68 (3)          23              1, 2, 3
EQQDsPGNKDFLQSLK                     446    461    S470             2               1, 2        Alpha-2-antiplasmin            P08697
GGETAQsADPQWEQLNNK                   31     48     S37              2               1           Heparin Cofactor 2             P05546
DQGNQEQDPNIsNGEEEEEKEPGEVGTHNDNQER   187    220    S198             2               1, 2        SPARC-like protein 1           Q14515
HIQETEWQsQEGK                        287    299    S295             8               1, 2, 3
DSPSKSSAEAQtPEDTPNK                  65     83     T76              1               1           Trans-golgi network protein 2  O43493
DSPSKSsAEAQTPEDTPNK                  65     83     S71 (1,2)        1               1
KNEIDLESELKsEQVTE                    364    380    S75              2               1           MASP1 protein                  P48740
KDsGEDPATCAFQR                       94     107    S96              2               2           Secreted phosphoprotein 24     Q13103

(a) aai, aaf: position in the protein sequence of the initial and final peptide amino acids. (b) p-sites described in (1) Phosphosite database, (2) UniProt, (3) Bahl et al.8 and (4) Plasma database (http://www.plasmaproteomedatabase.org).

of the matches in a data set but also the agreement in the assignation provided by different algorithms. Thus, a valid strategy for selection of high-confidence identifications could rely on the selection of common identifications from data sets filtered at relatively high values of FDR. This procedure is especially suitable for groups working with conventional traps, as in this case, or with other low-resolution mass spectrometers. The number of false positives in the data set of identifications supported by several engines is minimized, resulting in an FDR value lower than that set for each search engine individually.31 Recently, Jones et al. described a method for evaluating combined FDR scores for data sets produced in this way.30

Phosphopeptide enrichment is a key step for phosphoproteome analysis. TiO2 supports show affinity for phosphopeptides but also for other acidic peptides and glycopeptides. The addition of glycolic acid to a sample decreases the binding of acidic nonphosphorylated peptides and, in fact, in our data set, the percentage of acidic residues (E and D) in the nonphosphorylated sequences (18%) was only slightly higher than that expected from the amino acid distribution in the Uniprot database (12%). The percentage of phosphopeptides in the TiO2-enriched extracts was 18% on average (nonredundant data set, FDR < 1%). In previous work, we reported a selectivity of 41% for these columns in the purification of phosphopeptides from the T-cell phosphoproteome. This value was certainly an underestimate of TiO2 selectivity, as it was calculated from the purification of the flow-through extract from an IMAC purification. TiO2 selectivity as high as 90% has been obtained.32 However, Bahl et al. reported that TiO2 purification selectivity fell to 50% in CSF (a fluid that shows similarities with plasma in protein composition and relative concentration ranges).8 A similar result was reported by Gilar et al. when comparing the yield of phosphopeptides identified from a yeast digest and from human serum using an MOAC support.33 Thus, the apparent low selectivity obtained in the analysis of the plasma extracts could be at least partially due to the particular composition of this biological fluid. Most of the nonphosphorylated peptides found in the plasma samples were derived from high-abundance plasma proteins (data not provided). These high-abundance peptides probably diminish the recovery of other low-abundance phosphopeptides. Additionally, glycosylated proteins and peptides in plasma could compete with phosphopeptide binding to TiO2, further reducing their recovery.

Of the total number of p-sites described in our collection, 47 can be considered of very high confidence as they corresponded to sequences validated by two or more engines and with low (Ascore > 18) or no ambiguity regarding the p-site location (Table 2). Of these, only seven p-sites were included

in the Phosphosite database, currently probably the largest available repository of p-sites (www.phosphosite.org). In addition, p-sites at S198 and S295 from SPARC-like protein and S266 of selenoprotein P and S68 of antithrombin III were described by Bahl et al. in CSF proteins8 and the S330 from alpha-2-HS-glycoprotein was already annotated in the SwissProt database. Our collection did not include the only fibrinogen p-site described independently by Citrulli,9 Li11 and Hu10 from 4 different nested peptide sequences in the analysis of free serum phosphopeptides. None of these sequences was a tryptic sequence so they were probably missed in our searches, which had to be performed with enzymatic restriction. (Searching from low-resolution data and many dynamic modifications is a highly computer-intensive task and, in our case, SEQUEST crashes the computer when further overloaded with a nonrestricted search). Neither did we detect the corresponding 3075 Da tryptic phosphopeptide that may be derived from the fibrinogen molecule. The amino acid distribution in our phosphopeptide data set shows enrichment in serine and glutamic acid relative to the proportions expected from the Uniprot-Swiss-Prot database (Figure 4). In fact, 58% of the high-confidence p-sites showed an acidophilic SXE motif (Motif-x algorithm34), which is specific to ATM, calmodulin-dependent II and casein kinase families.35 The ratio of serine to threonine in high-confidence p-sites was

Figure 4. Amino acid composition of the high-confidence phosphorylated sequences reported and that calculated from UniprotSwiss-Prot and Plasma databases. Insert: major phosphorylation motif from the data set of p-sites of high confidence (Motif-X, p < 10 × 10-6; Motif score, 16; fold increase, 8.92). Journal of Proteome Research • Vol. 9, No. 2, 2010 881

research articles

Carrascal et al.

Figure 5. Complement and coagulation cascades (adapted from KEGG pathways database). Proteins marked with a filled star are high-confidence identifications confirmed by several search engines. The identification marked with a hollow star was identified with only one engine at FDR < 5%.

of 45/2. The only sequence phosphorylated at Tyr that was identified by two different engines, showed however a poor Ascore value, and therefore is not included in Table 2. Phosphoproteins. The full set of phosphopeptides (FDR < 1%) pointed to a total of 70 source proteins in the UniprotSwiss-Prot database (223 at FDR < 5%). This figure corresponds to the minimal protein set defined by the identified phosphopeptides. Thus, although all protein matches are annotated in the Supporting Information tables, when a peptide sequence was common to more than one protein sequence, only one hit was counted. Nevertheless, even considering the more extense data set (FDR < 5%), only 7 peptide sequences produced ambiguous protein identifications due to sequence homology in the Uniprot Swiss-Prot database. One of these sequences was common to four proteins, two sequences to 882

Journal of Proteome Research • Vol. 9, No. 2, 2010

three proteins and the other four to 2 proteins each. Only two of these latter peptides passed the filter for FDR < 1%. Forty-four of the phosphoproteins in the data set were identified with at least one peptide sequence validated by two or more engines (see Supporting Information Table IIId). In the following discussion we only consider this protein data set. Description of new phosphosites from these proteins is also restricted to those of Ascore > 18 (Table 2). Six phosphoproteins were detected in the HAP fraction of which only 2 corresponded to proteins specifically targeted by the affinity support (albumin and fibrinogen alpha). Fibrinogen beta and gamma, IgG gamma, haptoglobin, alpha-1-antitrypsin and transferrin, six of the targeted proteins by the MARS columns, were only identified from nonphosphorylated peptides in the HAP fraction. It is well-known that depletion

research articles

Characterization of the Human Plasma Phosphoproteome systems remove many other proteins in addition to those the affinity support is directed at.36,37 Taking into account data from phosphorylated and nonphosphorylated peptides and not considering as different several entries for Ig heavy chains, Ig gammas and keratins, a total of 21 proteins, were present in the HAP fraction. As indicated, only 8 of these proteins were targeted by the affinity support. Thus, phosphopeptides from nontargeted proteins in the HAP fraction could be derived from proteins or protein fragments nonspecifically bound to the support or captured due to their affinity to one of the specifically absorbed proteins. It is interesting to note that of the 14 nontargeted proteins in the HAP fractions, 12 were already reported as forming part of the albuminome19 or being copurified with albumin.20 Twelve of the identified phosphoproteins were related to the complement and coagulation cascade (Figure 5). Four new p-sites on 4 different peptide fragments from plasminogen were identified that were situated in the plasmin heavy (S45, S68, S477) and light (S741) chains which are produced from this zymogen by the action of plasminogen activators (Spectra for plasminogen phosphopeptides are given in Supporting Information). 
Plasmin is a key enzyme in the fibrinolytic system, where its primary function is to degrade fibrin-rich thrombi.38 The plasminogen activator system is also involved in tissue proliferation and cellular adhesion.39 Studies of several types of cancer have revealed that increased urokinase plasminogen activator levels are related to aggressive tumor behavior and poor prognosis.40-42 Several studies have aimed to elucidate plasminogen structure-function relationships and interactions with other molecules, but the only PTMs described are N- and O-linked glycosylations and a phosphorylation at Ser 578.43,44 The functional relevance of the new p-sites described here has yet to be elucidated, but they could influence plasminogen activation or plasmin enzymatic activity, as occurs with the urokinase-type plasminogen activator45 and other enzymes involved in the coagulation cascade,46 which are regulated by phosphorylation.
Other identified phosphoproteins related to the complement and coagulation cascades include the complement components 3 and 8, the complement factor I, the coagulation factor V, fibrinogen alpha, alpha-2 macroglobulin, kininogen, the mannan-binding lectin serine peptidase 1 and the serpins antithrombin-III, alpha-2-antiplasmin and heparin cofactor 2 (Figure 4). Thirteen new high-confidence p-sites were identified in this group of proteins. Six additional phosphopeptides corresponding to two proteins of this group (alpha-2 macroglobulin, fibrinogen alpha chain) were also identified, although the correct p-site locations could not be validated (Ascore < 18).
Six different phosphorylated apolipoproteins were detected. One new phosphorylation site was described for apolipoprotein L1 (S314), and the p-site at S311 reported by Mancone et al.47 was confirmed. Other newly described p-sites were S55 and S191 in ApoA-1, S68 and S54 in ApoA-II, S259 in ApoA-IV, S59 in APO-V and S4048 in ApoB.
Most of the phosphoproteins identified are plasma proteins. The data set includes protease inhibitors, carrier proteins, and precursors of the complement, which are involved in relevant biological functions such as those related to blood circulation and clotting, immunity and lipid transport. The new phosphorylation sites reported here provide a collection of new tentative sites for protein activity regulation that could be important in understanding the mechanisms involved in blood physiology and homeostasis.
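The selection rule applied in this discussion (proteins retained when at least one peptide sequence was validated by two or more engines, and new sites reported only when Ascore > 18) can be illustrated with a minimal Python sketch. The record layout, the function names and the example entries are hypothetical and serve only as illustration; they are not the software or the data used in this work.

```python
from dataclasses import dataclass

ASCORE_CUTOFF = 18.0  # site-localization threshold quoted in the text
MIN_ENGINES = 2       # "two or more engines"


@dataclass(frozen=True)
class PhosphoPeptide:
    """Hypothetical record for one identified phosphopeptide."""
    protein: str
    sequence: str
    site: str
    engines: frozenset  # search engines that validated this sequence
    ascore: float


def consensus_proteins(peptides):
    """Proteins with at least one peptide validated by >= MIN_ENGINES engines."""
    return {p.protein for p in peptides if len(p.engines) >= MIN_ENGINES}


def confident_sites(peptides):
    """Sites on consensus proteins whose Ascore exceeds the cutoff."""
    keep = consensus_proteins(peptides)
    return sorted({(p.protein, p.site) for p in peptides
                   if p.protein in keep and p.ascore > ASCORE_CUTOFF})


# Invented example records (not results from the study).
example = [
    PhosphoPeptide("PROT_A", "PEPTIDEsK", "S9",
                   frozenset({"OMSSA", "SEQUEST"}), 25.3),
    PhosphoPeptide("PROT_A", "AAsLLR", "S40", frozenset({"Phenyx"}), 21.0),
    PhosphoPeptide("PROT_A", "TTsGGK", "S77", frozenset({"OMSSA"}), 9.5),
    PhosphoPeptide("PROT_B", "GGsVVK", "S7", frozenset({"SEQUEST"}), 30.1),
]

print(consensus_proteins(example))
print(confident_sites(example))
```

In this sketch PROT_B is dropped despite its high Ascore because none of its peptides was confirmed by a second engine, while site S77 of PROT_A is dropped on localization confidence alone.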

Conclusions Using a procedure based on SCX fractionation, TiO2 purification and LCµESI-ITMSn, we have reported phosphoproteins and p-sites from human plasma for the first time. Analysis of the phosphoproteome of biological fluids such as serum, plasma and CSF imposes specific conditions on phosphopeptide isolation due to the complexity of the biological matrix. In addition, the use of conventional ion traps limits the accuracy of mass measurements and increases the rate of false identifications. However, the use of several search engines in parallel allows the selection of high-confidence data sets with a reduced proportion of false positives. Phosphorylation is a common and functionally important modification of proteins. Characterization of these peptides and proteins in blood is a first step toward a better understanding of the many physiological processes in which blood cells and cells of the vascular walls are involved, as well as of those processes originating in other tissues and organs whose state is affected or reflected by compounds transported in blood.
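The effect of requiring agreement between engines on the proportion of false positives can be sketched as follows. This example assumes a target-decoy estimate in which the FDR is taken as decoy hits divided by target hits. That convention, the dictionary layout and the example matches are assumptions made for illustration and are not specified in this section.

```python
def decoy_fdr(psms):
    """Estimate FDR as (decoy hits) / (target hits); assumed convention."""
    decoys = sum(1 for p in psms if p["is_decoy"])
    targets = len(psms) - decoys
    return decoys / targets if targets else 0.0


def agreed(psms, min_engines=2):
    """Keep only matches reported by at least `min_engines` search engines."""
    return [p for p in psms if len(p["engines"]) >= min_engines]


# Invented peptide-spectrum matches (not data from the study).
psms = [
    {"peptide": "PEP1", "engines": {"OMSSA", "SEQUEST", "Phenyx"}, "is_decoy": False},
    {"peptide": "PEP2", "engines": {"OMSSA", "SEQUEST"}, "is_decoy": False},
    {"peptide": "PEP3", "engines": {"SEQUEST"}, "is_decoy": False},
    {"peptide": "PEP4", "engines": {"Phenyx"}, "is_decoy": True},
    {"peptide": "PEP5", "engines": {"OMSSA"}, "is_decoy": False},
]

print(decoy_fdr(psms))          # all matches
print(decoy_fdr(agreed(psms)))  # matches confirmed by two or more engines
```

In this toy set the single decoy hit is reported by one engine only, so the agreement filter removes it at the cost of also discarding two single-engine target matches.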

Acknowledgment. This work was supported by grant BIO2008-03369 from the Spanish Ministerio de Ciencia e Innovación. The LP-CSIC/UAB is a member of ProteoRed (http://www.proteored.org), funded by Genoma Spain, and follows the quality criteria set up by ProteoRed standards.
Supporting Information Available: Supplementary Table 1. Comparison of sequential purification using IMAC and TiO2 with previous desalting and direct purification with TiO2. Supplementary Tables IIa-d. Peptides, phosphopeptides and phosphoproteins at FDR < 1%. Supplementary Tables IIIa-d. Peptides, phosphopeptides and phosphoproteins at FDR < 5%. Supplementary Figure 1. Annotated MS/MS spectra from the four identified phosphopeptides of Plasminogen. This material is available free of charge via the Internet at http://pubs.acs.org.
References
(1) Geoerger, B.; Gaspar, N.; Opolon, P.; Morizet, J.; Devanz, P.; Lecluse, Y.; Valent, A.; Lacroix, L.; Grill, J.; Vassal, G. EGFR tyrosine kinase inhibition radiosensitizes and induces apoptosis in malignant glioma and childhood ependymoma xenografts. Int. J. Cancer 2008, 123 (1), 209–216.
(2) Yamada, M.; Ikeda, Y.; Yano, M.; Yoshimura, K.; Nishino, S.; Aoyama, H.; Wang, L.; Aoki, H.; Matsuzaki, M. Inhibition of protein phosphatase 1 by inhibitor-2 gene delivery ameliorates heart failure progression in genetic cardiomyopathy. FASEB J. 2006, 20 (8), 1197–1199.
(3) Olsen, J. V.; Blagoev, B.; Gnad, F.; Macek, B.; Kumar, C.; Mortensen, P.; Mann, M. Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 2006, 127, 635–648.
(4) Zanivan, S.; Gnad, F.; Wickstrom, S. A.; Geiger, T.; Macek, B.; Cox, J.; Fassler, R.; Mann, M. Solid Tumor Proteome and Phosphoproteome Analysis by High Resolution Mass Spectrometry. J. Proteome Res. 2008, 7, 5314–5326.
(5) Bakalarski, C. E.; Haas, W.; Dephoure, N. E.; Gygi, S. P. The effects of mass accuracy, data acquisition speed, and search algorithm choice on peptide identification rates in phosphoproteomics. Anal. Bioanal. Chem. 2007, 389, 1409–1419.
(6) Domon, B.; Bodenmiller, B.; Carapito, C.; Hao, Z.; Huehmer, A.; Aebersold, R. Electron Transfer Dissociation in Conjunction with Collision Activation To Investigate the Drosophila melanogaster Phosphoproteome. J. Proteome Res. 2009, 8, 2633–2639.
(7) Mann, M.; Kelleher, N. L. Precision proteomics: high mass accuracy. Proc. Natl. Acad. Sci. U.S.A. 2008, 105 (47), 18132–18138.
(8) Bahl, J. M. C.; Jensen, S. S.; Larsen, M. R.; Heegaard, N. H. H. Characterization of the human cerebrospinal fluid phosphoproteome by titanium dioxide affinity chromatography and mass spectrometry. Anal. Chem. 2008, 80, 6308–6316.
(9) Cirulli, C.; Chiappetta, G.; Marino, G.; Mauri, P.; Amoresano, A. Identification of free phosphopeptides in different biological fluids by a mass spectrometry approach. Anal. Bioanal. Chem. 2008, 392, 147–159.
(10) Hu, L.; Zhou, H.; Li, Y.; Sun, S.; Guo, L.; Ye, M.; Tian, X.; Gu, J.; Yang, S.; Zou, H. Profiling of Endogenous Serum Phosphorylated Peptides by Titanium (IV) Immobilized Mesoporous Silica Particles Enrichment and MALDI-TOFMS Detection. Anal. Chem. 2009, 81 (1), 94–104.
(11) Li, Y.; Qi, D.; Deng, C.; Yang, P.; Zhang, X. Cerium Ion-Chelated Magnetic Silica Microspheres for Enrichment and Direct Determination of Phosphopeptides by Matrix-Assisted Laser Desorption Ionization Mass Spectrometry. J. Proteome Res. 2008, 7 (4), 1767–1777.
(12) Gonzales, P. A.; Pisitkun, T.; Hoffert, J. D.; Tchapyjnikov, D.; Star, R. A.; Kleta, R.; Wang, N. S.; Knepper, M. A. Large-Scale Proteomics and Phosphoproteomics of Urinary Exosomes. J. Am. Soc. Nephrol. 2009; doi: 10.1681/ASN.2008040406.
(13) Anderson, N. L.; Anderson, N. G. The Human Plasma Proteome. History, character, and diagnostic prospects. Mol. Cell. Proteomics 2002, 1, 845–867.
(14) Turner, M. W.; Hulme, B. The Plasma Proteins: An Introduction; Pitman Medical & Scientific Publishing Co., Ltd.: London, 1970.
(15) Pan, S.; Zhu, D.; Quinn, J. F.; Peskind, E. R.; Montine, T. J.; Lin, B.; Goodlett, D. R.; Taylor, G.; Eng, J.; Zhang, J. A combined dataset of human cerebrospinal fluid proteins identified by multidimensional chromatography and tandem mass spectrometry. Proteomics 2007, 7, 469–473.
(16) Adkins, J. N.; Varnum, S. M.; K.J., A.; Moore, R. J.; Angell, N. H.; Smith, R. D.; Springer, D. L.; Pounds, J. L. Toward a Human Blood Serum Proteome: Analysis by multidimensional separation coupled with mass spectrometry. Mol. Cell. Proteomics 2002, 4, 7–955.
(17) Linke, T.; Doraiswamy, A.; Harrison, E. H. Rat plasma proteomics: Effects of abundant protein depletion on proteomic analysis. J. Chromatogr. 2007, 849, 273–281.
(18) Roche, S.; Tiers, L.; Provansal, M.; Seveno, M.; Piva, M.-T.; Jouin, P.; Lehmann, S. Depletion of one, six, twelve or twenty major blood proteins before proteomic analysis: The more the better? J. Proteomics 2009; doi:10.1016/j.jprot.2009.03.008.
(19) Gundry, R. L.; Cotter, R. J. The Albuminome as a Tool for Biomarker Discovery. In Clinical Proteomics. From Diagnosis to Therapy; Van Eyk, J., Dunn, M. J., Eds.; Wiley: London, 2007; pp 263–278.
(20) Gay, M.; Carrascal, M.; Gorga, M.; Parés, A.; Abian, J. Characterization of peptides and proteins in commercial human serum albumin solutions. Proteomics 2009, DOI: 10.1002/pmic.200900182.
(21) Carrascal, M.; Ovelleiro, D.; Casas, V.; Gay, M.; Abian, J. Phosphorylation analysis of primary human T lymphocytes using sequential IMAC and titanium oxide enrichment. J. Proteome Res. 2008, 7, 5167–5176.
(22) Eng, J. K.; McCormack, A. L.; Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976–989.
(23) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.; Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass spectrometry search algorithm. J. Proteome Res. 2004, 3 (5), 958–964.
(24) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, 5383–5392.
(25) Beausoleil, S. A.; Villen, J.; Gerber, S. A.; Rush, J.; Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 2006, 24 (10), 1285–1292.
(26) Ovelleiro, D.; Carrascal, M.; Casas, V.; Abian, J. LymPHOS: Design of a phosphosite database of primary human T cells. Proteomics 2009, 9, 3741–3751.
(27) Thingholm, T. E.; Jensen, O. N.; Robinson, P. J.; Larsen, M. R. SIMAC - A phosphoproteomic strategy for the rapid separation of monophosphorylated from multiply phosphorylated peptide. Mol. Syst. Biol. 2008, 7, 661–671.
(28) Pan, C.; Olsen, J.; Daub, H.; Mann, M. Global effects of kinase inhibitors on signaling networks revealed by quantitative phosphoproteomics. Mol. Cell. Proteomics 2009, 8, 2796–2808.
(29) Balgley, B. M.; Laudeman, T.; Yang, L.; Song, T.; Le, C. S. Comparative Evaluation of Tandem MS Search Algorithms Using a Target-Decoy Search Strategy. Mol. Cell. Proteomics 2007, 6, 1599–1608.
(30) Jones, A. R.; Siepen, J. A.; Hubbard, S. J.; Paton, N. W. Improving sensitivity in proteome studies by analysis of false discovery rates for multiple search engines. Proteomics 2009, 9, 1220–1229.
(31) Searle, B. C.; Turner, M.; Nesvizhskii, A. I. Improving Sensitivity by Probabilistically Combining Results from Multiple MS/MS Search Methodologies. J. Proteome Res. 2008, 7, 245–253.
(32) Larsen, M. R.; Thingholm, T. E.; Jensen, O. N.; Roepstorff, P.; Jørgensen, T. J. D. Highly Selective Enrichment of Phosphorylated Peptides from Peptide Mixtures Using Titanium Dioxide Microcolumns. Mol. Cell. Proteomics 2006, 4, 873–876.
(33) Gilar, M.; Ying-Qing, Y.; Ahn, J.; Fournier, J.; Gebler, J. C. Mixed-mode chromatography for fractionation of peptides, phosphopeptides, and sialylated glycopeptides. J. Chromatogr., A 2008, 1191, 162–170.
(34) Schwartz, D.; Gygi, S. P. An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat. Biotechnol. 2005, 23 (11), 1391–1398.
(35) Amanchy, R.; Periaswamy, B.; Mathivanan, S.; Reddy, R.; Tattikota, S. G.; Pandey, A. A curated compendium of phosphorylation motifs. Nat. Biotechnol. 2007, 25 (3), 285–286.
(36) Yocum, A.; Yu, K.; Oe, T.; Blair, I. Effect of immunoaffinity depletion of human serum during proteomic investigations. J. Proteome Res. 2005, 4 (5), 1722–1731.
(37) Liu, T.; Qian, W. J.; Mottaz, H. M.; Gritsenko, M. A.; Norbeck, A. D.; Moore, R. J.; Purvine, S. O.; Camp II, D. G.; Smith, R. D. Evaluation of Multiprotein Immunoaffinity Subtraction for Plasma Proteomics and Candidate Biomarker Discovery Using Mass Spectrometry. Mol. Cell. Proteomics 2006, 5, 2167–2174.
(38) Waisman, D. Plasminogen. Structure, Activation, and Regulation; Kluwer Academic/Plenum Publishers: New York, 2000.
(39) McMahon, B.; Kwaan, H. The Plasminogen Activator System and Cancer. Pathophysiol. Haemost. Thromb. 2007, 8 (36), 184–194.
(40) Kwaan, H. C.; McMahon, B. The role of plasminogen-plasmin system in cancer. Cancer Treat Res. 2009, 148, 43–66.
(41) Zorio, E.; Gilabert-Estellés, J.; España, F.; Ramón, L.; Cosín, R.; Estellés, A. Fibrinolysis: the key to new pathogenetic mechanisms. Curr. Med. Chem. 2008, 15 (9), 923–929.
(42) Shariat, S.; Roehrborn, C.; McConnell, J.; Park, S.; Alam, N.; Wheeler, T.; Slawin, K. Association of the circulating levels of the urokinase system of plasminogen activation with the presence of prostate cancer and invasion, progression, and metastasis. J. Clin. Oncol. 2007, 25 (4), 349–355.
(43) Wang, H.; Prorok, M.; Bretthauer, R. K.; Castellino, F. J. Serine578 Is a Major Phosphorylation Locus in Human Plasma Plasminogen. Biochemistry 1997, 36, 8100–8106.
(44) Castellino, F. J.; Ploplis, V. A. Human Plasminogen: Structure, Activation and Function. In Plasminogen. Structure, Activation, and Regulation; Waisman, D. M., Ed.; Kluwer Academic/Plenum Publishers: New York, 2000; pp 3–11.
(45) Franco, P.; Iaccarino, C.; Chiaradonna, F.; Brandazza, A.; Iavarone, D.; Mastronicola, M.; Nolli, M.; Stoppelli, M. Phosphorylation of Human Pro-Urokinase on Ser138/303 Impairs Its Receptor-dependent Ability to Promote Myelomonocytic Adherence and Motility. J. Cell Biol. 1997, 137 (3), 779–791.
(46) Barlati, S.; De Petro, G.; Bona, C.; Paracini, F.; Tonelli, M. Phosphorylation of human plasminogen activators and plasminogen. FEBS Lett. 1995, 363, 170–174.
(47) Mancone, C.; Amicone, L.; Fimia, G.; Bravo, E.; Piacentini, M.; Tripodi, M.; Alonzi, T. Proteomic analysis of human very low-density lipoprotein by two-dimensional gel electrophoresis and MALDI-TOF/TOF. Proteomics 2007, 7, 143–154.
Journal of Proteome Research • Vol. 9, No. 2, 2010

PR900780S