Proteomics of Multigenic Families from Species Underrepresented in Databases: The Case of Loquat (Eriobotrya japonica Lindl.) Polyphenol Oxidases

Susana Sellés-Marchart,† Ignacio Luque,‡ Juan Casado-Vela,§ Maria José Martínez-Esteso,† and Roque Bru-Martínez*,†

Grupo de Proteómica y Genómica Funcional de Plantas, Departamento de Agroquímica y Bioquímica, Facultad de Ciencias, Universidad de Alicante, Spain, Instituto de Bioquímica Vegetal y Fotosíntesis (CSIC/Universidad de Sevilla), Avenida Américo Vespucio 49, 41092 Sevilla, Spain, Protein Technology Unit, Biotechnology Programme, Spanish National Cancer Centre (CNIO), Madrid, Spain

Received October 23, 2007

Here, we approach the problem of obtaining accurate and reliable information about the gene origin of a protein belonging to a multigenic family, polyphenol oxidase, from an underrepresented species, Eriobotrya japonica. De novo sequencing was a key approach to obtain broad sequence coverage. Alignment of the peptides with their most similar homologous protein revealed divergent amino acid positions that led us to hypothesize the minimal number of genes encoding the proteins analyzed.

Keywords: de novo sequencing • underrepresented species in databases • multigenic families • polyphenol oxidase • phylogenetic analysis

1. Introduction

Protein identification is a central issue in proteomics that can be routinely achieved by comparing experimental mass spectral peak lists with theoretical ones calculated from a peptide database.1,2 Real protein identification thus involves the assignment of experimental MS data from the protein to a gene product. Therefore, real identification is only possible when the exact gene or protein sequence is present in the database.3 Otherwise, the information from mass spectrometry (MS) data may merely help to discover the protein function by homology to sequences sharing peptides with the protein under study. Moreover, the criterion of peptides shared between orthologs may not be enough to assign a function to the protein, as proteins with different functions may share one or more peptides, which have been referred to as degenerate peptides.4 In the case of proteins encoded by a multigenic family of paralogous genes, the number of shared peptides is expected to increase, thus making real identification even more complex. Delalande et al.5 proposed a strategy to identify proteins at the paralog level by searching a genome database to genuinely identify a protein by means of discriminating peptides. Since only a few eukaryotic species have fully sequenced genomes, the use of this strategy has been relatively restricted to date.

* To whom correspondence should be addressed. Prof. Dr. Roque Bru Martínez, Grupo de Proteómica y Genómica Funcional de Plantas, Departamento de Agroquímica y Bioquímica, Facultad de Ciencias, Universidad de Alicante, Campus de San Vicente del Raspeig, Apdo. 99, E-03080 Alicante, Spain. Phone: +34 965903400. Fax: +34 965903880. E-mail: [email protected].
† Universidad de Alicante.
‡ Instituto de Bioquímica Vegetal y Fotosíntesis (CSIC/Universidad de Sevilla).
§ Spanish National Cancer Centre (CNIO).

10.1021/pr700687c CCC: $40.75 © 2008 American Chemical Society

Nonetheless, discovering the gene origin of a protein is a key milestone in the detailed understanding of the biological processes in which it functions, thus making it possible to address its relevance as a biomarker, a drug target, and so forth. Given the above rationale, it is clear that the unequivocal identification of a protein from a species underrepresented in databases is not feasible using standard approaches. However, the combination of a series of proteomic and bioinformatic resources may yield quite accurate information about the gene origin of such a protein. Here, we deal with the case of polyphenol oxidase (PPO) from the fruit of loquat (Eriobotrya japonica Lindl.), a species of the Rosaceae family, to which other important tree fruit crops belong, such as apple, pear, peach, apricot, plum, almond, and so forth. Polyphenol oxidases (PPOs) (EC 1.14.18.1 or EC 1.10.3.2) are ubiquitous plant enzymes that catalyze the O2-dependent oxidation of mono- and o-diphenols to o-quinones.
The oxidation of phenolic substrates by PPO causes enzymatic browning of many fruits and vegetables during ripening as well as during handling, storage and processing,6–9 which is a problem of considerable importance to the agri-food industry.6,7 Multigenic families encoding different PPO polypeptides have been reported for tomato,10 potato,11 banana,12 and hybrid poplar.13 In Rosaceae, Southern experiments provide evidence of the existence of several PPO genes,14 while two different PPO genes have been cloned and sequenced for apple.15 The expression of PPO paralogs has been demonstrated to be regulated both developmentally and following a variety of stress conditions in a number of species,11–13,15–18 although the coexpression of two or more PPO genes in the same tissue at the same developmental stage has also been reported.15,16 Moreover, in apricot fruit, the PPO protein and its catalytic activity are present whatever the fruit age, although

Journal of Proteome Research 2008, 7, 4095–4106. Published on Web 07/12/2008

research articles

Sellés-Marchart et al.

Figure 1. Diversity of PPO polypeptides in loquat fruit. (A) SDS-PAGE of the purified latent PPO from the particulate fraction, (B) SDS-PAGE (left) and Western blot (right) of the particulate fraction protein extract, and (C) SDS-PAGE (left) and Western blot (right) of the soluble fraction protein extract. The antiserum immunoreacted with two bands of 59.2 (P1) and 66.0 (P2) kDa in the particulate fraction (B), and three bands of 59.2 (S1), 62.0 (S2) and 66.0 (S3) kDa in the soluble fraction (C).

its coding gene, PA-PPO, is transcriptionally active only in the early immature-green fruits.19 In addition, an antibody raised against a purified preparation of apple PPO was shown to cross-react with several bands ranging from 55 to 65 kDa in fruit and leaf extracts of plants from the Rosaceae family, including apple, Japanese pear, pear, Chinese quince, peach, persimmon and loquat.14 There is both a genetic basis10–14 and direct experimental evidence11–18 supporting the occurrence of PPO isoforms in the same tissue at the same developmental stage, although conclusive data on their uni- or multigenic origin, that is, isoform-specific information, have never been reported. In addition to the proposed role of PPO in plant resistance against bacterial diseases20,21 and insect pests,13 the occurrence of paralogs and their spatial-temporal expression pattern point to other, still undisclosed biological functions of PPO.22 Determining the genic origin of PPO isoforms, a challenging task, would provide unique and valuable information for elucidating the physiological roles of this enzyme. Several strategies can be used for this purpose. One is the use of isoform-specific antibodies, an approach that relies both on prior knowledge of the specific protein sequences, needed to select specific antigenic regions, and on postproduction antibody validation. Another is N-terminal sequencing. Compared to the former, it is a more cost-effective alternative, but this technique requires highly pure protein samples that usually cannot be prepared by SDS-PAGE of crude extracts.
Instead, the application of LC-MS/MS within a proteomic workflow (1-D or 2-D electrophoretic separation, tryptic digestion, MS analysis, database search) is a strategy that provides a huge amount of information on peptide sequences from protein mixtures that, after proper assembly,23 might lead to the identification of proteins present in the mixture even at the paralog level. The information obtained is so protein-specific that the tobacco mosaic virus (TMV) coat protein MS data have been used to differentiate between tobamovirus strains,24 while the expression of isoforms of phenylalanine ammonia lyase, mitochondrial chaperonin and fructose bisphosphate aldolase has been assessed in rice.5 The loquat tree (E. japonica) is a very underrepresented species in public databases, which register fewer than 40 entries in the NCBI nucleotide database, fewer than 30 in the NCBI protein database, and only one partial cDNA sequence of PPO (Acc. 4519439). The simple automated use of MS/MS spectra for database search is anticipated to be quite an unfruitful strategy, as most of the good quality spectra would be left unassigned.25 Instead, the

Journal of Proteome Research • Vol. 7, No. 9, 2008

use of de novo peptide sequences, deduced by the interpretation of MS/MS spectra26 to search the databases, allows homology-based protein identification, thus enhancing the validation rate of spectral data. In this work, we have used a combination of Western blot, N-terminal sequencing and LC-MS/MS analysis to conclusively determine that at least two paralogs, highly similar to the homologous apple PPO2 (ref 15) and apricot PPO (ref 19), respectively, coexist in loquat fruit, and that these two are different from the only loquat PPO fragment known in the public databases (ref 14). Preprotein forms of the paralog homologous to apple PPO2 could also be identified.

2. Materials and Methods

2.1. Plant Material. Loquat fruits (E. japonica) were obtained from the experimental orchards of Cooperativa Agrícola de Callosa d’En Sarriá, Alicante, Spain, harvested between 10 and 12 weeks after fruit set.

2.2. Reagents. Triton X-114 (TX-114) was obtained from Fluka and condensed three times as described by Bordier27 but using 50 mM potassium phosphate buffer, pH 7.0. The TX-114 concentration of the third condensation detergent phase was 21% (w/v), and was used as the stock solution of detergent for enzyme extraction. Protease Inhibitor cocktail for general use (4-(2-aminoethyl) benzenesulfonyl fluoride (AEBSF) 2.0 mM; EDTA 1.0 mM; bestatin 130 µM; E-64 14 µM; leupeptin 1.0 µM, and aprotinin 0.3 µM) was obtained from Sigma. Trypsin was purchased from Promega (Madison WI). All the other reagents were of analytical grade.

2.3. Preparation of Loquat Fruit Soluble and Particulate Fractions. The PPO-containing soluble and particulate fractions of loquat fruit flesh were prepared as described by Sellés et al.28 with minor modifications. Briefly, 50 g of loquat flesh was homogenized in 150 mL of cold 0.1 M sodium phosphate pH 7.0 buffer containing 50 mM ascorbic acid. The homogenate was filtered through 8 layers of gauze and supplemented with a cocktail of protease inhibitors (4 mL/L), then constituting the crude extract. The clear supernatant from the crude extract centrifuged at 60 000g for 30 min at 4 °C constituted the soluble fraction. The pellet was resuspended with 0.8 vol of cold 10 mM sodium phosphate buffer, pH 7.0, by probe sonication for 5 min, supplemented with cold TX-114 to a final concentration of 1.5% (w/v) and centrifuged at 5000g for 15 min at 20 °C. The lower dense

Table 1. Hit List of Proteins for P1 and Purified Latent PPOa

| access number NCBI/Uniprot | protein name | Mr (Da) (mature) | pI (mature) | species | MS/MS peptides number | de novo peptides number | sequence coverage (%) (mature) |
|---|---|---|---|---|---|---|---|
| 14194273/Q93XM8 | Polyphenol oxidase 2 precursor | 55425.60 | 5.38 | Malus domestica | 4c | 10 | 37.3 |
| 1172586/Q06215 | Polyphenol oxidase A1, chloroplast precursor | 58604.45 | 6.00 | Vicia faba | 1 | 1 | 4.86 |
| 3282505/O81103 | Polyphenol oxidase precursor | 56189.53 | 5.64 | Prunus armeniaca | 0 | 2 | 4.44 |
| 7209776/Q5ENY2 | Polyphenol oxidase | 65655.50b | 6.27 | Ipomoea batatas | 0 | 2 | 3.57 |
| 1172582/Q08307 | Polyphenol oxidase E, chloroplast precursor | 57124.55 | 5.88 | Lycopersicon esculentum | 0 | 1 | 2.20 |
| 1346774/Q08303 | Polyphenol oxidase A, chloroplast precursor | 61593.07 | 6.33 | L. esculentum | 0 | 1 | 2.03 |
| 1172580/Q08305 | Polyphenol oxidase C, chloroplast precursor | 61678.97 | 5.87 | L. esculentum | 0 | 1 | 2.03 |

a Protein list retrieved from the public protein databases using MS/MS spectra, N-terminal sequence tag and sequence tags deduced by de novo interpretation of MS/MS spectra. Entries have been sorted according to sequence coverage. b Mr of the complete sequence. Signal peptides are not known. c Including the N-terminal sequence.
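The sequence coverage column can be illustrated with a minimal sketch. It assumes that coverage is the percentage of residues in the mature homologous sequence spanned by at least one aligned peptide; the exact procedure is not detailed in the text, and the protein string and peptides below are hypothetical placeholders rather than PPO data.

```python
def sequence_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of protein residues covered by at least one peptide.

    Assumption: each peptide is mapped by exact substring match onto the
    (mature) homologous sequence; overlapping peptides are counted once.
    """
    covered = [False] * len(protein)
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 22-residue sequence and two matched peptides (10 residues covered).
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
toy_peptides = ["TAYIAK", "SFVK"]
print(f"{sequence_coverage(toy_protein, toy_peptides):.1f}%")
```

Because loquat peptides carry substitutions relative to the homolog, a real calculation would map the aligned homolog stretches rather than the loquat strings themselves.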

Table 2. Peptide List of Purified Latent PPO and P1 (*) Bands and Those of Their Most Similar Homologous Protein

| 14194273/Q93XM8 polyphenol oxidase 2 precursor (PPO 2) M. domestica: peptides | 59 kDa purified latent PPO and P1 (*) band E. japonica: peptides | score | MH+ | ΔM (MH+ theoretical − MH+ P1) | coincidences P1 E. japonica/PPO 2 M. domestica |
|---|---|---|---|---|---|
| APVSAPDLTTCGP | APVSAPDLTTXGP | | N-terminal | | 12/13 |
| (R)DPLFYSHHSNVDR | *DPLFYSHHSNVDR | 20.45a | 1587.1 | -0.4 | 13/13 |
| (K)SEFAGSFVHVPHK | SEFAGSFVHVPHK | 12.63a | 1442.5 | -0.8 | 13/13 |
| (K)GPITIGGFSIELINTT | GPITIGGFSIELINTT | 18.86a | 1633.2 | -0.4 | 16/16 |
| (K)LLDLNYGGTDDDVDDATR | c[LL]DLNYSGTDDDVDDA[TR]c | 293b | 1998.2 | -0.3 | 17/18 |
| (K)FDVYVNDDAESLAGK | *c[FD]VYVNDDADSLA[GK]c | 258b | 1629.2 | -0.4 | 14/15 |
| ...IFTDTSSSLYDQNR | *c[IF]TDTSSSLYDQYR | 238b | 1696.1 | -0.3 | 13/14 |
| ...TDWLDAEFLFYDEK | c[TD]WLDTEFLFYDEK | 232b | 1822.3 | -0.5 | 13/14 |
| ...LLEELDAETDSSLVV... | d[483.51]ELEMLGAETDSSLVV[538.65]d | 248b | 2615.4 | -0.5 | 12/15 |
| ...PTQTNGEDMGTFYSAGR | *c[PT]QTNGEDFGAFYSA[GR]c | 244b | 1817.1 | 0.7 | 15/17 |
| (K)LPDRGPLR | LPDSGPLR | 117b | 856.1 | -1.6 | 7/8 |
| (K)GIEFAGNEPVK | *c[GI]EFAGNETVK | 182b | 1164.9 | -0.0 | 10/11 |
| (K)AHDEVLVIK | AQDEVLVIK | 158 | 1015.1 | -0.5 | 8/9 |
| (K)LGYVYDEKVPIPWLK | *c[LGY]VYDENVSIP[331.16]dK | 170b | 1828.7 | -0.6 | 10/12 |

a Scoring of peptide sequences determined by MS/MS database search using Spectrum Mill Proteomics Workbench. b Scoring of peptide sequences determined by de novo sequencing with Sherenga’s algorithm (Dancik et al.).26 Amino acids in brackets could not be interpreted de novo, but the uninterpreted mass can be. c Di- or tripeptide tags from the tryptic peptides of the most similar homologous protein whose mass matches the uninterpreted stretch in the spectrum. d Uninterpreted mass stretch in the spectrum.
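The ΔM column (theoretical MH+ minus observed MH+) can be reproduced with a short sketch. It assumes monoisotopic residue masses and a singly protonated peptide; the text does not state which mass type was used, but under this assumption the value for DPLFYSHHSNVDR (observed 1587.1) comes out at the tabulated -0.4.

```python
# Monoisotopic residue masses (Da); standard values, rounded to 5 decimals.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the residue sum
PROTON = 1.00728   # charge carrier for the singly charged ion


def mh_plus(peptide: str) -> float:
    """Theoretical monoisotopic MH+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def delta_m(peptide: str, observed_mh: float) -> float:
    """Delta M as defined in the table header: theoretical minus observed."""
    return mh_plus(peptide) - observed_mh


print(round(mh_plus("DPLFYSHHSNVDR"), 2))
print(round(delta_m("DPLFYSHHSNVDR", 1587.1), 1))
```

Peptides with uninterpreted mass stretches or with carbamidomethylated cysteine would need additional terms that this sketch deliberately leaves out.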

Table 3. Hit List of Proteins for P2a

| access number NCBI/Uniprot | protein name | Mr (Da) (mature) | pI (mature) | species | MS/MS peptides number | de novo peptides number | sequence coverage (%) (mature) |
|---|---|---|---|---|---|---|---|
| 3282505/O81103 | Polyphenol oxidase precursor | 56189.53 | 5.64 | P. armeniaca | 4 | 9 | 30.9 |
| 14194273/Q93XM8 | Polyphenol oxidase 2 precursor | 55425.60 | 5.38 | M. domestica | 0 | 4 | 9.53 |
| 1172587/P43311 | Polyphenol oxidase, chloroplast precursor | 56718.23 | 5.65 | Vitis vinifera | 0 | 2 | 4.36 |
| 1172583/Q08296 | Polyphenol oxidase F, chloroplast precursor | 66183.2 | 6.27 | L. esculentum | 0 | 2 | 4.21 |
| 15487290/Q948S3 | Polyphenol oxidase | 65774.87b | 8.22 | Pyrus pyrifolia | 0 | 2 | 3.88 |
| 1346774/Q08303 | Polyphenol oxidase A, chloroplast precursor | 61593.07 | 6.33 | L. esculentum | 0 | 2 | 3.87 |
| 1172580/Q08305 | Polyphenol oxidase C, chloroplast precursor | 61678.97 | 5.87 | L. esculentum | 0 | 2 | 3.87 |
| 27902363/Q84YH9 | Polyphenol oxidase | 67737.78b | 7.60 | Trifolium pratense | 0 | 2 | 3.67 |
| 1785613/P93622 | Polyphenol oxidase | 67389.6b | 6.39 | V. vinifera | 0 | 2 | 3.62 |

a Protein list retrieved from the public protein databases using MS/MS spectra, and sequence tags deduced by de novo interpretation of MS/MS spectra. Entries have been sorted according to sequence coverage. b Mr of the complete sequence. Signal peptides are not known.

phase formed was discarded and the upper light phase was submitted to detergent phase partitioning twice by repeating the following process: addition of cold TX-114 to a final concentration of 4% (w/v), 30 min incubation on ice with gentle shaking, 30 min incubation at 37 °C to induce phase separation and centrifugation at 5000g for 15 min at 25 °C. The upper light phase from the last centrifugation constituted the solubilized particulate fraction.

Table 4. Hit List of Proteins for S1, S2 and S3. Protein List Retrieved from the Public Protein Databases Using MS/MS Spectra, and Sequence Tags Deduced by de novo Interpretation of MS/MS Spectra. Entries Have Been Sorted According to Sequence Coverage

| band (E. japonica) | access number NCBI/Uniprot | protein name | Mr (Da) (mature) | pI (mature) | species | MS/MS peptides number | de novo peptides number | sequence coverage (%) (mature) |
|---|---|---|---|---|---|---|---|---|
| S1 | 14194273/Q93XM8 | Polyphenol oxidase 2 precursor | 55425.60 | 5.38 | M. domestica | 3 | 4 | 20.7 |
| S1 | 3282505/O81103 | Polyphenol oxidase precursor | 56189.53 | 5.64 | P. armeniaca | 0 | 3 | 7.25 |
| S1 | 13559508/Q9AU63 | Polyphenol oxidasea | 67008.70 | 8.30 | Ananas comosus | 0 | 1 | 1.99 |
| S2 | 14194273/Q93XM8 | Polyphenol oxidase 2 precursor | 55425.60 | 5.38 | M. domestica | 1 | 3 | 11.6 |
| S3 | 14194273/Q93XM8 | Polyphenol oxidase 2 precursor | 55425.60 | 5.38 | M. domestica | 1 | 4 | 16.8 |
| S3 | 3282505/O81103 | Polyphenol oxidase precursor | 56189.53 | 5.64 | P. armeniaca | 1 | 1 | 5.24 |

a Mr of the complete sequence. Signal peptides are not known.

Table 5. Peptide List of Band P2 and Those of Its Most Similar Homologous Protein

| 3282505/O81103 polyphenol oxidase precursor (PPO) P. armeniaca: peptides | P2 band E. japonica: peptides | score | MH+ | ΔM (MH+ theoretical − MH+ P2) | coincidences P2 E. japonica/PPO P. armeniaca |
|---|---|---|---|---|---|
| (R)MYLYFYER | MYLYFYER | 13.14a | 1182.9 | 1.6 | 8/8 |
| (R)IDENLAIMYR | IDENLAIMYR | 16.23a | 1238.4 | -0.8 | 10/10 |
| (K)FDVFINDDAESLSR | FDVFINDDAESLSR | 22.00a | 1628.2 | -0.4 | 14/14 |
| (R)VSNSPITIGGFKIEYSS | VSNSPITIGGFKIEYSS | 13.74a | 1799.1 | -0.2 | 17/17 |
| ...EDMGNFYSAGR | c[ED]MGNFYSA[GR]c | 149b | 1246.8 | -0.3 | 11/11 |
| (K)TPDLFFGHAYR | TPDLFFGHE[YR]c | 196b | 1381.9 | -0.2 | 10/11 |
| (K)TFKPDLSIPLR | c[TF]TPDLSI[PLR]c | 140b | 1260.3 | -1.2 | 10/11 |
| ...TDWLDAEFLFYEN | c[TD]WLDTEFLFYDEK | 224b | 1822.1 | -0.3 | 12/14 |
| (K)SEFAGSFVHVPQG | c[SE]FAGSFVHVPHN[256.32]d | 207b | 1683.9 | 0.1 | 11/13 |
| (R)DPLFYAHHANVDR | DPLFYAHHC[NV]cDR | 165b | 1644.9 | -1.2 | 12/13 |
| ...YEPVSLPWLFTK... | d[264.34]YEPVSVPWLFTK | 210b | 1730.0 | 1.1 | 11/12 |
| (R)AGNLNTGKYPGTIENMPH.. | Ad[272.90]TTGTHPGTIENT[PH]c | 227b | 1806.8 | 0.2 | 11/15 |
| MWNIWK | MWNIWK | 100b | 877.9 | -0.4 | 6/6 |

a Scoring of peptide sequences determined by MS/MS database search using Spectrum Mill Proteomics Workbench. b Scoring of peptide sequences determined by de novo sequencing with Sherenga’s algorithm (Dancik et al.).26 Amino acids in brackets could not be interpreted de novo, but the uninterpreted mass can be. c Di- or tripeptide tags from the tryptic peptides of the most similar homologous protein whose mass matches the uninterpreted stretch in the spectrum. d Uninterpreted mass stretch in the spectrum.

2.4. Purification of 59 kDa PPO. Homogeneous latent PPO was prepared as described by us elsewhere.28 The preparation displayed a single band of 59.2 kDa in SDS-PAGE.

2.5. Sample Preparation and SDS-PAGE. Quantitative protein precipitation was performed as described by Bensadoun and Weinstein.29 Protein samples were supplemented with ultrapure water to a final volume of 750 µL, and then 8.5 µL of 2% sodium deoxycholate was added. The mixture was vortexed and incubated for 15 min on ice, and proteins were precipitated by adding 250 µL of 24% TCA in water, followed by 30 min incubation on ice and centrifugation at 11 000 rpm for 10 min in a benchtop microcentrifuge. The pellet was washed three times in chilled acetone, left to dry and solubilized in 20 µL of SDS-PAGE sample buffer, followed by boiling for 3 min. If the sample was not clean enough, the acetone-washed precipitate was processed according to Wang et al.30 with some modifications. Briefly, the precipitate was washed twice with 1 mL of cold ethyl acetate/ethanol 1:2 (v/v), twice with 1 mL of cold 10% TCA in acetone, twice with 1 mL of 10% TCA in water and finally three times with 1 mL of chilled 80% acetone. The final pellet was solubilized in a dense buffer (0.7 M sucrose, 0.1 M KCl, 0.5 M Tris and 50 mM EDTA, pH 7.5), mixed with 500 µL of Tris-saturated phenol, and shaken vigorously for 30 min at 4 °C. The mixture was phase-separated by centrifugation and the upper phenol phase was collected. Proteins were


precipitated by addition of 5 vol of cold 0.1 M ammonium acetate in methanol, incubated at -20 °C overnight and collected by centrifugation at 11 000 rpm in a benchtop microcentrifuge. The pellet was washed twice with 1 mL of 0.1 M ammonium acetate in methanol and three times with 1 mL of chilled 80% acetone, left to dry and solubilized in 20 µL of SDS-PAGE sample buffer followed by boiling for 3 min. SDS-PAGE was performed according to Laemmli31 in a Hoefer miniVe cell (GE Healthcare). Proteins were resolved in a 12.5% polyacrylamide gel and visualized with colloidal CBB staining.32

2.6. Antibody Production and Western Blot. Primers EjapPPO-F (5′-AATGAATTCCCGACGGCGCGTACGACCAA-3′) and EjapPPO-R (5′-CGTCTAGAGTTAGTCAGTTCTCTTACCTCC-3′) were designed based on the partial sequence of a loquat polyphenol oxidase gene available in the GenBank database (Acc. 4519439). A 614 bp fragment was amplified by PCR using loquat genomic DNA as a template, digested with EcoRI and XbaI, and cloned in the pPROEX-Htb plasmid (Promega Corporation, Madison, WI), in frame with the hexahistidine coding sequence, generating plasmid pPROEX:PPO. Escherichia coli cultures containing this plasmid were induced with isopropyl β-D-1-thiogalactopyranoside (IPTG) at a final concentration of 1 mM for 3 h, harvested and disrupted by sonication. The ca. 27 kDa recombinant His-tagged protein accumulated

Table 6. Peptide List of Bands S1, S2, and S3 and Those of Their Most Similar Homologous Proteins

S1 band

| 14194273/Q93XM8 polyphenol oxidase 2 precursor (PPO 2) M. domestica: peptides | S1 band E. japonica: peptides | score | MH+ | ΔM (MH+ theoretical − MH+ S1) | coincidences S1 E. japonica/PPO 2 M. domestica |
|---|---|---|---|---|---|
| (R)DPLFYSHHSNVDR | DPLFYSHHSNVDR | 19.45a | 1587.7 | -1.0 | 13/13 |
| (K)SEFAGSFVHVPHK | SEFAGSFVHVPHK | 13.85a | 1442.2 | -0.5 | 15/15 |
| (K)GPITIGGFSIELINTT | GPITIGGFSIELINTT | 19.38a | 1633.4 | -0.5 | 16/16 |
| (K)LLDLNYGGTDDDVDDATR | c[LL]DLNYSGTDDDVDDA[TR]c | 296b | 1998.2 | -0.3 | 17/18 |
| (K)FDVYVNDDAESLAGK | c[FD]VYVNDDADSLA[GK]c | 263b | 1628.9 | -0.2 | 14/15 |
| ...TDWLDAEFLFYDEK | c[TD]WLDTEFLFYD[EK]c | 211b | 1822.2 | -0.4 | 13/14 |
| (K)GIEFAGNEPVK | c[GI]EFAGNETVK | 171b | 1164.9 | 0.7 | 10/11 |

S2 band

| 14194273/Q93XM8 polyphenol oxidase 2 precursor (PPO 2) M. domestica: peptides | S2 band E. japonica: peptides | score | MH+ | ΔM (MH+ theoretical − MH+ S2) | coincidences S2 E. japonica/PPO 2 M. domestica |
|---|---|---|---|---|---|
| (R)DPLFYSHHSNVDR | DPLFYSHHSNVDR | 18.87a | 1587.6 | -0.8 | 13/13 |
| (K)LLDLNYGGTDDDVDDATR | [LL]DLNYSGTDDDVDDA[TR]c | 288b | 1998.3 | -0.4 | 17/18 |
| (K)FDVYVNDDAESLAGK | c[FD]VYVNDDADSLAGK | 251b | 1629.0 | -0.3 | 14/15 |
| (K)GIEFAGNEPVK | c[GI]EFAGNETVK | 148b | 1164.9 | 0.7 | 10/11 |

S3 band

| 14194273/Q93XM8 polyphenol oxidase 2 precursor (PPO 2) M. domestica: peptides | 3282505/O81103 polyphenol oxidase precursor (PPO) P. armeniaca: peptides | S3 band E. japonica: peptides | score | MH+ | ΔM (MH+ theoretical − MH+ S3) | coincidences S3 E. japonica/PPO 2 M. domestica or PPO P. armeniaca |
|---|---|---|---|---|---|---|
| (R)DPLFYSHHSNVDR | | DPLFYSHHSNVDR | 18.80a | 1588.0 | -1.2 | 13/13 |
| | (R)VSNSPITIGGFK | (R)VSNSPITIGGFK | 13.96a | 1220.4 | -0.7 | 12/12 |
| (K)LLDLNYGGTDDDVDDATR | | c[LL]DLNYSGTDDDVDDA[TR]c | 304b | 1998.1 | -0.2 | 17/18 |
| (K)FDVYVNDDAESLAGK | | c[FD]VYVNDDADSLA[GK]c | 211b | 1628.8 | -0.1 | 14/15 |
| ...TDWLDAEFLFYDEK | ...TDWLDAEFLFYDEN | c[TD]WLDTEFLFYD[EK]c | 224b | 1823.0 | -1.1 | 13/14-12/14 |
| (K)GIEFAGNEPVK | | c[GI]EFAGNETVK | 164b | 1164.9 | 0.7 | 10/11 |

a Scoring of peptide sequences determined by MS/MS database search using Spectrum Mill Proteomics Workbench. b Scoring of peptide sequences determined by de novo sequencing with Sherenga’s algorithm (Dancik et al.).26 c Amino acids in brackets could not be interpreted de novo, but the uninterpreted masses are di- or tripeptide tags from the tryptic peptides of the most similar homologous protein whose mass matches the uninterpreted stretch in the spectrum.
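The "coincidences" entries, including the double entry 13/14-12/14 for the S3 peptide compared against both homologs, can be reproduced with a small sketch. It assumes a position-by-position comparison of equal-length, already aligned peptides, with bracketed residues taken at face value; the function names are illustrative and not part of any tool used in the paper.

```python
def coincidences(loquat: str, homolog: str) -> tuple[int, int]:
    """Return (identical positions, compared positions) for aligned peptides."""
    if len(loquat) != len(homolog):
        raise ValueError("peptides must be aligned to the same length")
    identical = sum(a == b for a, b in zip(loquat, homolog))
    return identical, len(loquat)


def rank_homologs(loquat: str, homologs: dict[str, str]) -> list[tuple[str, int, int]]:
    """Sort candidate homologous peptides by number of identical positions."""
    scored = [(name, *coincidences(loquat, seq)) for name, seq in homologs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# S3 peptide against the corresponding apple and apricot tryptic peptides.
s3_peptide = "TDWLDTEFLFYDEK"
candidates = {
    "PPO 2 M. domestica": "TDWLDAEFLFYDEK",
    "PPO P. armeniaca": "TDWLDAEFLFYDEN",
}
for name, same, total in rank_homologs(s3_peptide, candidates):
    print(f"{name}: {same}/{total}")
```

For this peptide the apple sequence scores 13/14 and the apricot sequence 12/14, matching the tabulated pair.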


Figure 2. Alignment of the PPO sequences of M. domestica and P. armeniaca with the peptides obtained from bands P1, P2, S1, S2 and S3. Mutations in the loquat peptide positions with regard to their most similar homologous sequence are shaded, and the divergent positions between peptides for bands P1, P2, S1, S2 and S3 are indicated by (*). The two copper-binding motifs inside the catalytic domain of PPO (CuA and CuB domains) are boxed.

mostly in the inclusion bodies that were isolated by centrifugation at 8000g, washed thoroughly with 2× Triton X-100 Tris-buffered saline (TTBS) (40 mM Tris, pH 7.5, 1 M NaCl, 0.5% Triton X-100) and 1× TTBS (20 mM Tris, pH 7.5, 0.5 M NaCl,

Table 7. Percentage of Sequence Identities between PPO Sequences of E. japonica (P1, P2 and PPO (4519439)) and Their Top Hit Orthologs in the Public Databases

| access number NCBInr | protein name/species | 14194273a PPO 2 M. domestica | b PPO P1 E. japonica | 3282505a PPO P. armeniaca | b PPO P2 E. japonica | 15487290a PPO P. pyrifolia | 4519439a PPO E. japonica |
|---|---|---|---|---|---|---|---|
| 14194273 | Polyphenol oxidase 2 precursor (PPO 2)/M. domestica | 100 | 92 | 58 | 66 | 54 | 63 |
| 3282505 | Polyphenol oxidase precursor (PPO)/P. armeniaca | 58 | 68 | 100 | 92 | 56 | 63 |
| 4519439 | Polyphenol oxidase (PPO)/E. japonica | 63 | 17 | 64 | 32 | 97 | 99 |
| 15487290 | Polyphenol oxidase (PPO)/P. pyrifolia | 54 | 56 | 56 | 60 | 100 | 96 |
| 1172584 | Polyphenol oxidase, chloroplast precursor (PPO1)/M. domestica | 55 | 57 | 56 | 60 | 95 | 93 |
| 7209776 | Polyphenol oxidase I/I. batatas | 48 | 52 | 48 | 51 | 51 | 59 |
| 1172586 | Polyphenol oxidase A1, chloroplast precursor (PPO)/V. faba | 49 | 55 | 49 | 49 | 58 | 66 |

a For full-length sequences, the % sequence identities were taken from the BlastP report against the NCBInr database. b For partial amino acid sequences, the % sequence identities were determined from the number of identical positions after manual alignment without considering the unknown sequence gaps of loquat PPO.
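Footnote b describes identities computed over aligned positions only, skipping the stretches of loquat PPO for which no peptide was obtained. A minimal sketch of that calculation follows; it assumes the alignment is given as two equal-length strings with "-" marking unknown positions, and the two-residue filler "AA" in the example is a hypothetical placeholder, not real PPO sequence.

```python
def percent_identity(homolog_aln: str, loquat_aln: str, gap: str = "-") -> float:
    """Percent identity over positions where both sequences have a residue."""
    if len(homolog_aln) != len(loquat_aln):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for h, l in zip(homolog_aln, loquat_aln):
        if h == gap or l == gap:
            continue  # unknown loquat stretch or alignment gap: not counted
        compared += 1
        identical += h == l
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


# Two loquat peptides from Table 2 placed on a toy homolog segment.
homolog = "LPDRGPLRAAGIEFAGNEPVK"   # "AA" is a hypothetical linker
loquat = "LPDSGPLR--GIEFAGNETVK"
print(f"{percent_identity(homolog, loquat):.1f}%")
```

Here 17 of the 19 compared positions are identical, so the two unsequenced positions do not lower the score.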

Figure 3. Alignment of the loquat PPO sequences of P1 and P2 with the loquat PPO fragment present in the NCBI database (Acc. 4519439). The divergent positions in the peptides of P1 or P2 in relation to the PPO fragment are shaded.

0.25% Triton X-100), resuspended in 50 mM Tris, pH 7.5, 50 mM NaCl, and 8 M urea, and purified by affinity chromatography on HisTrap columns (GE Healthcare). Protein preparations were lyophilized and sent to Genosphere Biotechnologies (Paris, France) to be injected into rabbits for antibody production. For Western blots, proteins were resolved by SDS-PAGE and transferred to a Hybond-P PVDF membrane (GE Healthcare) using a semidry Hoefer Semiphor device (GE Healthcare). Membranes were blocked in TBS (20 mM Tris, pH 7.5, 500 mM NaCl) containing 5% (w/v) nonfat dry milk and 0.05% (v/v) Igepal (Sigma, St Louis, MO). Filters were incubated in the presence of antisera in TBS and subsequently with a goat antirabbit horseradish peroxidase conjugate (GE Healthcare). Immunoreactive bands were detected using the ECL Plus Western Blotting Detection Kit (GE Healthcare) and exposing Hyperfilm ECL photographic plates for 5 to 10 min.

2.7. N-Terminal Sequencing of 59 kDa PPO. A homogeneous latent PPO preparation was electrophoresed in SDS-PAGE and electrotransferred to a PVDF membrane as described above. The membrane was stained in a 0.1% (w/v) CBB R-250 solution in 50% (v/v) methanol and destained in methanol/acetic acid/water 5:1:4 (v/v/v). The 59.2 kDa band was excised and subjected to N-terminal sequencing using standard methods with an Applied Biosystems Procise 494 automatic sequencer (Foster City, CA), at the Protein Chemistry facility of CIB-CSIC, Madrid, Spain.

2.8. Tryptic in-Gel Digestion. Protein bands of interest were excised from the gels and digested in-gel with trypsin in a ProGest (Genomic Solutions, Cambridgeshire, U.K.) automatic in-gel protein digestor according to the manufacturer's recommendations for colloidal CBB-stained samples. Briefly, gel plugs were extensively washed with 25 mM ammonium bicarbonate to remove dye and SDS impurities, in-gel reduced with 60 mM DTT and S-alkylated with excess iodoacetamide, followed by digestion with porcine trypsin (Promega, Madison WI) at 37 °C for 6 h.33 Peptides were extracted in ammonium bicarbonate, then in 70% acetonitrile (ACN) and finally in 1% formic acid. Extracted peptides were dried down in a Speed-Vac benchtop centrifuge and resuspended in 0.1% formic acid (typically 10 µL). The peptide sample was then ready for LC-MS/MS analysis.

2.9. Liquid Chromatography-Tandem Mass Spectrometry. LC-MS/MS analyses were performed using an Agilent 1100 series nano-HPLC system coupled to an XCTplus ion trap mass spectrometer (Agilent) equipped with a nano-ESI source. After sample concentration and desalting on a Zorbax 300SB-C18 trap column (0.3 mm × 5 mm, 5 µm) at 0.3 µL/min, peptide separation was achieved on a Zorbax 300SB-C18 analytical column (75 µm × 15 cm, 3.5 µm) using a 30 min linear gradient of 5-35% ACN containing 0.1% (v/v) formic acid at a constant flow rate of 0.3 µL/min. MS and MS/MS spectra were acquired in the standard enhanced mode (26 000 m/z per second) and the ultrascan mode (8100 m/z per second), respectively. Mass spectrometer settings for MS/MS analyses included an ionization potential of 1.8 kV and an ICC smart target (number of ions in the trap before scan out) of 400 000 or 150 ms of accumulation. MS/MS analyses were performed using automated switching with a preference

Table 8. Comparative Intersample Peptide Tablea

a The 27 unique peptides found among all the samples are clustered into 19 groups according to the alignment of the best hit homologous sequences shown in Figure 2. Peptides belong to the same group if they overlap, at least in part, in the alignment. This arrangement facilitates the detection of divergent positions (indicated as * symbols) and readily displays the discriminant peptides (nos. 2, 9, 10, 13, 16, 17 and 19) that determine a minimal number of two PPO gene products. Those mutations related to the best homologous hit are highlighted in bold. Peptides nos. 10 and 16 demonstrate that sample S3 contains the same gene product as P1, but peptide no. 19 demonstrates that it also contains the same gene product as P2.
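The grouping rule in this footnote (peptides fall in the same group if they overlap, at least in part, in the alignment) amounts to merging overlapping intervals. The sketch below illustrates the idea under the assumption that each peptide is described by inclusive start and end columns in the alignment; the peptide names and coordinates are hypothetical.

```python
def group_overlapping(peptides: list[tuple[str, int, int]]) -> list[list[str]]:
    """Cluster peptides whose alignment intervals overlap, directly or in a chain.

    Each peptide is (name, start, end) with inclusive alignment coordinates.
    """
    groups: list[list[str]] = []
    current_end = -1
    for name, start, end in sorted(peptides, key=lambda p: (p[1], p[2])):
        if groups and start <= current_end:
            groups[-1].append(name)          # overlaps the running group
            current_end = max(current_end, end)
        else:
            groups.append([name])            # starts a new group
            current_end = end
    return groups


# Hypothetical coordinates: P1 and S1 peptides overlap; the P2 peptide stands alone.
toy = [("P1_pepA", 101, 113), ("S1_pepA", 105, 118), ("P2_pepB", 240, 253)]
print(group_overlapping(toy))
```

Within each group, positions at which the member peptides disagree are the candidate divergent positions that distinguish gene products.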

for doubly charged ions and a threshold of 10^5 counts and a 1.3 V fragmentation amplitude.

2.10. MS/MS Spectrum-Based Peptide and Protein Identification. Each MS/MS spectral data set (ca. 1200 spectra/run) was processed to determine monoisotopic masses and charge states, to merge MS/MS spectra with the same precursor (∆m/z < 1.4 Da and chromatographic ∆t < 15 s) and to select high quality spectra with the Extraction tool of SpectrumMill Proteomics Workbench (Agilent). The reduced data set was searched against the NCBInr Viridiplantae subset in the identity mode with the MS/MS Search tool of SpectrumMill Proteomics Workbench using the following parameters: trypsin, up to 2 missed cleavages, fixed modification S-carbamidomethyl and a mass tolerance of 2.5 Da for the precursor and 0.7 Da for product ions. Peptide hits were validated first in the peptide mode and then in the protein mode according to the score settings recommended by the manufacturer. Validated files were summarized in the protein mode to assemble peptides into proteins.

2.11. De Novo Sequencing and Sequence-Based Protein Identification. The remaining unassigned MS/MS spectra were filtered according to the maximum tag length feature of the spectrum quality check tool of SpectrumMill Proteomics Workbench. Those of good spectral quality were submitted to de novo sequencing using the corresponding SpectrumMill Proteomics Workbench tool, which uses the Sherenga de novo sequencing algorithm.26 The Sherenga-interpreted sequences were BlastP34 searched against NCBInr. Protein hits with up to four substitutions in the searched sequence tag were considered. Sequence tags achieving the top matches to protein entries previously identified by MS/MS search were added to

Journal of Proteome Research • Vol. 7, No. 9, 2008

the set of MS/MS interpreted peptide, thus increasing the sequence coverage of the already validated protein. 2.12. Phylogenetic Tree. Full-length PPO sequences were retrieved from GenBank and aligned using CLUSTALX.35 The ambiguously aligned regions were eliminated with GBLOCKS.36 Distance matrices (JTT and WAG) were constructed using TREE-PUZZLE (1000 puzzling quartets, JTT or WAG matrix and constant rate in all sites)37 or PHYML (1000 bootstrap replicates, JTT or WAG matrix and ratio of invariable sites estimated from sample).38 Trees were visualized with TREEVIEW.39 Accession numbers of the sequences used for the analysis are indicated in Figure 4.
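The precursor-merging criterion of Section 2.10 (∆m/z < 1.4 and ∆t < 15 s) can be illustrated with a minimal sketch. It is not the SpectrumMill Extraction tool, whose internals are not described here: the `Spectrum` class, the `group_spectra` and `merge_group` functions, the greedy comparison against the first spectrum of each group and the 0.1 m/z binning are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    precursor_mz: float   # precursor m/z
    retention_s: float    # chromatographic retention time, in seconds
    peaks: dict = field(default_factory=dict)  # fragment m/z -> intensity


def group_spectra(spectra, mz_tol=1.4, rt_tol=15.0):
    """Greedily group spectra whose precursor m/z and retention time both
    fall within the tolerances of the first spectrum of a group."""
    groups = []
    for sp in sorted(spectra, key=lambda s: s.retention_s):
        for group in groups:
            ref = group[0]
            if (abs(sp.precursor_mz - ref.precursor_mz) < mz_tol
                    and abs(sp.retention_s - ref.retention_s) < rt_tol):
                group.append(sp)
                break
        else:
            # No compatible group was found: start a new one.
            groups.append([sp])
    return groups


def merge_group(group, decimals=1):
    """Sum fragment intensities over a group, binning m/z by rounding."""
    merged = defaultdict(float)
    for sp in group:
        for mz, intensity in sp.peaks.items():
            merged[round(mz, decimals)] += intensity
    return dict(merged)


# Hypothetical spectra, for illustration only.
spectra = [
    Spectrum(600.3, 100.0, {300.1: 50.0, 450.2: 20.0}),
    Spectrum(601.0, 110.0, {300.1: 30.0}),
    Spectrum(600.4, 200.0, {300.1: 10.0}),  # same m/z, but eluting 100 s later
]
groups = group_spectra(spectra)
merged_first = merge_group(groups[0])
```

Summing intensities over a group is one plausible way to obtain the enhanced peak intensities and more complete ion series expected from merging; other combination rules would fit the same grouping step.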

3. Results

3.1. PPO Diversity in Loquat Fruit. In a previous work, we purified to homogeneity a latent PPO from the particulate fraction of loquat fruit flesh and demonstrated that it is a 59 kDa monomer.28 Figure 1A shows an SDS-PAGE analysis of the purified protein, which we term P1 in this work. That PPO was shown to have differential enzymatic properties as compared to the PPO activity found in the soluble fraction, thus providing some evidence of the existence of a diversity of PPOs in loquat fruit. This would be in line with the existence of several PPO isoforms in other members of the Rosaceae family like Prunus sp. or Malus sp. Using the unique partial loquat PPO nucleotide sequence found in the public databases,14 we have amplified a 614 bp DNA fragment that encodes conserved copper binding regions using loquat genomic DNA as a template. The corresponding polypeptide was overexpressed in E. coli as a His-tagged fusion, purified by affinity


Figure 4. Neighbor-joining phylogenetic tree based on PPO sequences. The accession numbers of the sequences used in the analysis are indicated. Arrows point to the best hit in sequence comparisons for loquat PPO-P1, PPO-P2 and the loquat PPO fragment from the database.

chromatography and used to raise polyclonal antibodies in rabbits. In Western blots, the anti-PPO antiserum immunoreacted with several bands from loquat fruit flesh whole extracts (data not shown). When the whole extracts were split into particulate (Figure 1B) and soluble fractions (Figure 1C), the antiserum immunoreacted with two bands of 59.2 (P1) and 66.0 (P2) kDa in the particulate fraction and with three bands of 59.2 (S1), 62.0 (S2) and 66.0 (S3) kDa in the soluble fraction. Although not conclusive, this result supports the existence of several PPO isoforms in loquat fruit flesh.

3.2. Partial Sequence Determination of a Purified 59 kDa Latent PPO. To obtain sequence information on the purified 59 kDa latent PPO, the corresponding band was analyzed both by N-terminal Edman microsequencing after transfer to a PVDF membrane and by liquid chromatography-tandem mass spectrometry after tryptic in-gel digestion and extraction of the peptides. The NCBInr database was used both for MS/MS spectra searches with the SpectrumMill Proteomics Workbench search engine and for sequence similarity searches with the BlastP program.34 Table 1 shows the protein hit list, indicating the number of matching peptides found either by N-terminal sequencing, MS/MS spectrum search or de novo sequencing, and the percentage of sequence coverage with respect to the mature protein, if the thylakoid signal peptide was known. Table 2 shows the sequences determined by the above procedures and the corresponding sequences of the top protein hit, the apple polyphenol oxidase 2 precursor (Acc. 14194273), thus providing strong evidence that the 59 kDa loquat fruit flesh latent PPO shares a very high degree of similarity with it.

3.3. Partial Sequence Determination of anti-PPO Immunoreactive Bands. All of the anti-PPO immunoreactive bands present in partially purified soluble and particulate PPO preparations were excised from replicate SDS-PAGE stained

gels, processed by tryptic in-gel digestion, and analyzed by liquid chromatography-tandem mass spectrometry, and peptide sequences were determined as above. A star symbol in Table 2 indicates peptides found in immunoreactive band P1, thus providing evidence that this band contains the purified 59 kDa latent PPO. Table 3 shows the protein hit list for the band P2 and Table 4 shows that for the bands S1, S2 and S3, indicating the number of matching peptides found either by MS/MS spectrum search or de novo sequencing, and the percentage of sequence coverage with respect to the mature protein, if the thylakoid signal peptide was known. Table 5 shows the sequences determined by the above procedures and the corresponding sequences of the top protein hit for the band P2, the P. armeniaca polyphenol oxidase precursor (Acc. 3282505). All the peptide sequences searched, with the exception of [TD]WLDTEFLFYDEK (MH+ = 1822.11), returned P. armeniaca PPO as the top hit. The latter peptide returned apple PPO2 as the top hit with one substitution, while P. armeniaca PPO was returned when two substitutions were considered. This result suggests that loquat PPO from band P2 is more similar to apple PPO2 in this particular sequence tag but that it shares more identity with P. armeniaca PPO over the whole protein sequence. Table 6 shows the sequences determined from immunoreactive bands of the soluble fraction and the corresponding sequences of the top protein hit for each of them, which was the apple PPO2 precursor (Acc. 14194273) for bands S1, S2 and S3. Since all the sequences deduced from the S1 band (Table 6) are also found in the P1 band (Table 2) and both bands have the same electrophoretic mobility, it can be proposed that they are likely to be the same polypeptide, S1 resulting from the artifactual release of P1 from the particulate fraction during the tissue homogenization step. 
Peptide sequences of S2 (a low-intensity band with a lower electrophoretic mobility than S1) and S3 (a more prominent band with the lowest electrophoretic mobility) are also found in the P1 band. Since PPOs are well-known plastid targeted proteins through an N-terminal signal peptide,40 a possible scenario would be that S2 (62.0 kDa) and S3 (66.0 kDa) were immature forms of the polypeptide banding at P1 (59.2 kDa). A tomato 67 kDa PPO precursor prepared by in vitro translation was demonstrated to be imported by pea chloroplasts in an energy-dependent two-step process, thus producing a 62 kDa intermediate and a 59 kDa final mature form.41 Since peptides from the N-terminal immature PPO were not found, the only evidence supporting this interpretation is the Mr of S2 and S3 with respect to P1. A detailed look at the PPO signal sequences revealed that there are few and unevenly distributed cationic residues, thus producing very short or very long tryptic peptides that are difficult to detect by ESI-MS. A unique precursor ion [MH+ = 1220.39] (VSNSPITIGGFK) found in S3 (Table 6) but not in any of the other bands matched P. armeniaca PPO (Acc. 3282505). A closer look at the P2 peptide sequences (Table 5) reveals the presence of a larger precursor ion [MH+ = 1799.07] (VSNSPITIGGFKIEYSS) that contains the above sequence (underlined) and is the result of a trypsin miscleavage. Among the peptide sequences of P1 and S1 (Tables 2 and 6), there is a precursor ion [MH+ = 1633.23 in P1 and MH+ = 1633.35 in S1] (GPITIGGFSIELINTT) that aligns with the above two but shows divergent positions. In addition, the peptide [TD]WLDTEFLFYD[EK] found in S3 and also in P1 and P2 can be assigned to either the P. armeniaca PPO or apple PPO2. The simplest interpretation of these data is that the S3 band contains two different polypeptides: one is the immature


polypeptide that is highly similar to the homologous apple PPO2, the precursor of that found in P1; the other is similar to that found in P2 and is highly similar to the homologous P. armeniaca PPO, and probably occurs by artifactual release from the particulate fraction during the tissue homogenization.

3.4. Sequence Alignments and Phylogenetic Inference. As the entries returned by the database search are homologous proteins in each case, the identification level achieved is only tentative; that is, the PPO function is clearly identified for each band, but not their respective coding genes. To go more deeply into the interpretation of the results obtained, the sequences of the two proteins homologous to loquat PPO bands, that is, apple PPO2 (Acc. 14194273) and P. armeniaca PPO (Acc. 3282505), were aligned, and the peptides identified for each band were then aligned on them, as shown in Figure 2. Many peptide sequences were found to be exclusive to either the group of bands P1, S1 and S2, or to the P2 band. This is evidence that they are likely encoded by different genes. However, the fact that band S3 contains not only one peptide common to all the bands ([TD]WLDTEFLFYD[EK]), but also peptides exclusive to the P1, S1, S2 group and one exclusive to P2, means that the possibility of an undisclosed loquat PPO gene reconciling both sets of peptide sequences remains open. For instance, alleles might explain some gene polymorphism, that is, a few divergent positions, while the presence of different isoforms might actually be due to heterozygous genotypes for the PPO locus of the sampled individuals. Since very little is known about loquat genetics, there is no means to assess the likelihood of the above possibilities. On the other hand, the occurrence of splice variants of a single gene should be considered as well. 
If that were the case, the overlapping regions of peptides supposedly encoded by different mRNAs should have exactly the same sequence, without any divergent position. However, many divergent positions have been detected. Moreover, known PPO genes of plants, including Rosaceae, are continuous; thus, splice variants should be very rare. The only exception is banana, in which a PPO gene fragmented by one 85 bp intron has been reported.12 Since altogether we have detected up to 21 divergent positions in overlapping peptides (indicated with * in Figure 2), the allele and splice-variant hypotheses are unlikely, and the data support the interpretation that each set of bands is the product of a different paralog gene and not of a different allele or splice variant. In fact, in other plants, including members of the Rosaceae family, the PPO gene-pool is represented by three or more known paralogs.10–15 Thus, the simplest hypothesis to explain these results is that there are PPOs encoded by at least two paralog genes in loquat fruit at harvest time. Table 7 shows the percentage sequence identities between paralog sequences of loquat PPOs (P1, P2 and the already known fragment) and their top hit orthologs in the public databases. Apple PPO2 shares at most 63% identity with other PPOs, but identity rises to ca. 92% for the fragments of sequence deduced from the P1 band. P. armeniaca PPO shares at most 58% identity with complete sequences of other PPOs, and 64% identity with the unique loquat PPO partial sequence in the public databases. Also in this case, identity rises to 92% when comparing P. armeniaca PPO with the fragments of the sequence deduced from the P2 band. These results suggest that if the respective loquat PPO genes encoding the polypeptides banding in P1 and P2 were known, the deduced sequence of their products would probably share the highest similarity with apple PPO2 and P. armeniaca PPO,

respectively. Interestingly, the polypeptide encoded by the unique loquat PPO partial sequence in the public databases (Acc. 4519439) that we used to raise our antibodies shows identities of 96% with the corresponding region in P. pyrifolia PPO (Acc. 15487290). As Figure 3 shows, the alignment of the deduced amino acid sequence encoded in the loquat PPO fragment from the database with sequences from P1 and P2 displays multiple divergences in all overlapping peptides, thus providing strong evidence that such a fragment belongs to a third paralog PPO gene of loquat. To confirm our interpretations and to obtain further information on the phylogenetic relationships of loquat PPOs, a phylogenetic tree was constructed using a set of full-length PPO sequences from the public databases, as shown in Figure 4. The apple PPO2 and P. armeniaca PPO proteins were found to cluster together, thus indicating that P1 and P2 are most probably closely related proteins. However, P. pyrifolia PPO, the top hit for the loquat PPO fragment, belongs to a different cluster in the tree in which other PPOs from Rosaceae species are found (i.e., apple PPO1). Therefore, the phylogenetic analysis supports the existence of at least three PPO paralog genes in loquat, two of them expressed in ripe fruits and the third silent one, at least at harvest time.
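The counting of divergent positions in overlapping peptides can be sketched with the two peptides quoted in Section 3.3, VSNSPITIGGFKIEYSS (band P2) and GPITIGGFSIELINTT (bands P1 and S1). The column offsets below are an assumption: they are obtained by anchoring both peptides on their shared PITIGGF motif, not read from Figure 2. The function name `divergent_positions` and the decision to treat I and L as equivalent (the two residues are isobaric) are likewise illustrative choices.

```python
def divergent_positions(pep_a, start_a, pep_b, start_b, merge_il=True):
    """Compare two peptides placed on a common alignment coordinate.

    Returns (overlap_length, diffs), where diffs is a list of
    (column, residue_a, residue_b) for columns in which the residues differ.
    With merge_il=True, I and L are treated as the same residue.
    """
    lo = max(start_a, start_b)
    hi = min(start_a + len(pep_a), start_b + len(pep_b))
    diffs = []
    for col in range(lo, hi):
        res_a = pep_a[col - start_a]
        res_b = pep_b[col - start_b]
        key_a = "I" if merge_il and res_a == "L" else res_a
        key_b = "I" if merge_il and res_b == "L" else res_b
        if key_a != key_b:
            diffs.append((col, res_a, res_b))
    return max(0, hi - lo), diffs


P2_PEPTIDE = "VSNSPITIGGFKIEYSS"  # from band P2 (Table 5)
P1_PEPTIDE = "GPITIGGFSIELINTT"   # from bands P1 and S1 (Tables 2 and 6)

# Assumed offsets: P1 peptide starts three columns after the P2 peptide,
# so that the PITIGGF motifs of both peptides coincide.
overlap, diffs = divergent_positions(P2_PEPTIDE, 0, P1_PEPTIDE, 3)
for col, res_p2, res_p1 in diffs:
    print(f"column {col}: P2={res_p2} P1={res_p1}")
```

Under these assumed offsets the two peptides overlap over 14 columns and differ at 5 of them, which is the kind of signal that argues against a common coding sequence.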

4. Discussion

As compared to other experimental approaches, LC-MS/MS used in the frame of a proteomic analysis workflow may potentially provide abundant and highly specific information on the chemical nature of proteins (sequence tags), in a cost-effective and fast manner. The use of antibodies is an apt guide to find proteins of interest in a complex mixture and confers precision and efficiency on the proteomic workflow with regard to the selection of a gel band or a spot for analysis. In fact, most of the peptides validated in the samples of our immunoreactive bands belonged to PPOs, although some valid peptides belonged to different proteins of a close Mr. Nevertheless, such peptides were insufficient to validate another protein with the minimal criteria. For the purpose of protein function identification, it can be asserted that this technology surpasses alternatives such as immunological detection or N-terminal sequencing. Apart from the function, the sequence information obtained is so specific that it can be used to unambiguously discriminate between the proteins encoded by paralog genes,5 and even allelic variants,24 by means of discriminant peptides. A broad sequence coverage is critical in order to increase the likelihood of obtaining discriminant peptides; therefore, strategies aimed at increasing the sequence coverage need to be developed. As not every member of a multigenic family is present in the public protein databases, their discriminating peptides would remain unassigned, while the common ones would serve merely to identify the function or would lead to an erroneous gene origin assignment. For genome-wide-sequenced organisms, using MS/MS search against the genomic database rather than public protein databases is a fruitful strategy designed by Delalande et al.5 to identify proteins belonging to multigenic families at the paralog level in rice. 
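The distinction between discriminant and common peptides can be made concrete with a small sketch that uses only the band assignments stated in Section 3.3. The peptide lists are deliberately partial, the bracketed residues of [TD]WLDTEFLFYD[EK] are dropped so that only the unambiguous core WLDTEFLFYD is compared, and the names `matches` and `classify` as well as the substring rule used to tolerate missed cleavages are assumptions for illustration.

```python
# Peptides attributed to each reference band in Section 3.3 (partial lists).
REFERENCE_BANDS = {
    "P1": ["GPITIGGFSIELINTT", "WLDTEFLFYD"],
    "P2": ["VSNSPITIGGFKIEYSS", "WLDTEFLFYD"],
}

# Peptides reported for band S3 in Section 3.3 (partial list).
S3_PEPTIDES = ["VSNSPITIGGFK", "WLDTEFLFYD"]


def matches(peptide, band_peptides):
    """True if the peptide equals, contains, or is contained in a band peptide.

    Containment tolerates missed cleavages, where one band yields a longer
    version of the same sequence.
    """
    return any(peptide in other or other in peptide for other in band_peptides)


def classify(peptide, reference_bands):
    """Label a peptide by the number of reference bands that support it."""
    hits = sorted(name for name, peps in reference_bands.items()
                  if matches(peptide, peps))
    if len(hits) == 1:
        return "discriminant", hits
    if hits:
        return "shared", hits
    return "unassigned", hits


report = {pep: classify(pep, REFERENCE_BANDS) for pep in S3_PEPTIDES}
for pep, (label, hits) in report.items():
    print(pep, label, hits)
```

Exact string containment is stricter than the alignment-based grouping used for the tables, so this sketch would miss peptides that overlap a reference only partially.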
Unfortunately, this strategy is not applicable to most eukaryotic species whose genomes are not sequenced. Our strategy is not restrictive as searches use the public protein databases, and is based on the sequence similarity of orthologs. As peptide sequences can barely be determined by the MS/MS search for underrepresented species, it is necessary to reanalyze the data sets after

an automated search in identity mode in order to deduce the sequence information contained in them with the expectation that they would correspond to novel peptides of loquat PPOs. For large sets of spectra, Nesvizhskii et al.25 developed a statistical classifier of good quality spectra based on several spectrum features such as the length of the longest sequence tag that can be extracted from the spectrum, taking into account the knowledge of the fragmentation process. Then, high quality unassigned MS/MS spectra were reanalyzed in a number of additional searches, including large mass tolerance, high charge states of the parent ions or variable modifications such as Pyro-Glu, to increase the number of identified peptides that may contain modifications, be sequence polymorphisms or novel peptides from predicted splice forms. Since in this sort of search the increase in computation time often becomes prohibitive, selected spectra are usually searched using subset databases. In a different approach, Searle et al.42 used de novo sequencing results to identify proteins and unanticipated sequence modifications using a mass-based alignment. Often de novo sequencing algorithms report ambiguous regions where the order or identity of two or more amino acids is uncertain or a mass gap is unresolved. Thus, the monoisotopic masses of the amino acids or a combination of two or more amino acids rather than an amino acid sequence is used for a local alignment against a protein database. A mismatch is then interpreted as a sequence modification. Obviously, the length of the correctly interpreted peptide sequence by the de novo approach depends on the quality of the spectrum. We have combined features of both strategies above to increase the sequence coverage of proteins whose representation in the database is by homology and not by identity: the high spectrum quality and the de novo sequencing.
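One source of the ambiguous residue identities mentioned above is that some amino acids have identical or nearly identical residue masses. The sketch below lists such pairs from the standard monoisotopic residue masses; the function name `near_isobaric_pairs` and the 1.0 Da illustration tolerance are assumptions, not parameters taken from this work.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def near_isobaric_pairs(tol):
    """Return residue pairs whose masses differ by at most tol (Da)."""
    pairs = []
    for a, b in combinations(sorted(RESIDUE_MASS), 2):
        if abs(RESIDUE_MASS[a] - RESIDUE_MASS[b]) <= tol:
            pairs.append((a, b))
    return pairs


# A tight tolerance leaves only truly isobaric residues.
pairs_tight = near_isobaric_pairs(0.01)

# A loose, illustrative tolerance also groups residues about 1 Da apart.
pairs_loose = near_isobaric_pairs(1.0)
for a, b in pairs_loose:
    delta = abs(RESIDUE_MASS[a] - RESIDUE_MASS[b])
    print(f"{a}/{b}: {delta:.5f} Da")
```

A substitution between residues that are far apart in mass falls outside any such list, so it cannot be explained by instrument accuracy alone.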
The spectrum quality had been previously enhanced by a spectrum extraction step in which raw spectra are improved by merging MS/MS spectra with the same precursor ion, which results in enhanced peak intensities and more complete ion series. In this work, we used the spectrum quality classifier tool implemented in SpectrumMill to select good quality spectra with tag lengths greater than 5 among those unassigned to peptides after a first pass of automated MS/MS search in identity mode. Then, de novo sequencing was performed on those spectra to obtain high quality (10 consecutive amino acids on average) peptide sequences which were then used to interrogate the protein database by classical alignment with BlastP.34 A mismatch is then interpreted as a sequence modification of the analyte protein in relation to its homolog in the database. The uninterpreted regions of a spectrum, usually the N- and C-termini, are not used for the alignment, although they can be interpreted later if they match the monoisotopic mass of amino acid combinations flanking the database matched sequence. Although spectra were acquired on a low mass accuracy instrument, which introduces ambiguity in peptide identification because sequence alternatives such as I/L, D/N and E/Q/K cannot be resolved, the amino acid substitutions detected are beyond any ambiguity due to the instrument accuracy (see Tables 2, 5 and 6). To compare the sample bands, the top hit orthologs are extracted from the database and aligned. Finally, the peptide sequences of each sample band are aligned on them to determine the minimal number of paralog sequences present in the biological sample on the basis of discriminant peptides. Although we have manually implemented this strategy in this work, it could also be automated through a software application

by inputting the experimentally determined peptide sequences and receiving a report such as that outlined in Table 8 after an internal BlastP search of the public protein databases and alignment of candidate homologues. This strategy led us to discover two PPO gene products in ripe loquat fruit, one highly similar to the homologous apple PPO in the form of the precursor and the mature polypeptide, while the other was highly similar to the homologous apricot PA-PPO, but only in the mature form. In a previous study, we showed that loquat PPO from the soluble fraction displayed differential enzymatic properties with regard to that found in the particulate fraction, namely, it was less reactive against the natural substrate chlorogenic acid, not sensitive to the activating agent SDS, and more strongly inhibited by the copper chelating agent diethyldithiocarbamate.24 Thus, the soluble and the particulate loquat PPOs were assumed to be different forms. The results obtained in this work provide experimental evidence for this hypothesis and suggest that the properties of the soluble PPO can be assigned to the precursor polypeptide while those of the particulate PPO can be assigned to the mature 59.2 kDa latent form. Moreover, the presence of precursors suggests that the gene encoding P1 is being actively expressed in the ripe loquat, whereas the gene encoding P2 would have been expressed at earlier stages, with the protein remaining stable thereafter. It is interesting to note that the gene most similar to that of P1, encoding apple PPO2, is expressed in young fruits,15 and that the gene most similar to that of P2, encoding apricot PA-PPO, is only expressed in very early fruit development stages, but the protein is present in all the development and ripening stages.19 Finally, the best hits of a third PPO gene product in loquat, which we partially cloned and expressed for antibody production, were Japanese pear PPO (Acc. BAB64530) and apple PPO1 (Acc. 
P43309), but it was not detected in ripe loquat fruits. Apple PPO1 was found to be expressed in very young fruits, in the skin of harvested ripe fruits and in flesh upon fruit bruising.18 Thus, the third loquat PPO could be found in fruits in other developmental stages or under certain physiological conditions. Following the identification strategy developed in this work, we are currently studying the evolution of loquat PPOs in different physiological conditions such as development, ripening and postharvest life.

Abbreviations: IPTG, isopropyl β-D-1-thiogalactopyranoside; PPO, polyphenol oxidase; PA-PPO, gene encoding Prunus armeniaca polyphenol oxidase; TTBS, Triton X-100 Tris-buffered saline; TX-100, Triton X-100; TX-114, Triton X-114.

Acknowledgment. We thank Mrs. Maria Teresa Vilella Antón for her technical support and the Cooperativa Agrícola Callosa D'En Sarria for supplying freshly cut loquat fruits. This work has been supported by grants from the Spanish Ministry of Education, Science and Sports (AGF99-0396), (BIO2002-03100) and European funds for Regional development (FEDER).

Note Added after ASAP Publication. This article was published ASAP on July 12, 2008 with part of Section 3 appearing out of order. The correct version was published on July 18, 2008.


References

(1) Henzel, W. J.; Billeci, T. M.; Stults, J. T.; Wong, S. C. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 5011.
(2) Jensen, O. N.; Mortensen, P.; Vorm, O.; Mann, M. Anal. Chem. 1997, 69, 1706.
(3) Rappsilber, J.; Mann, M. Trends Biochem. Sci. 2002, 27, 74.
(4) Nesvizhskii, A. I.; Keller, A.; Kolker, E.; Aebersold, R. Anal. Chem. 2003, 75, 4646.
(5) Delalande, F.; Carapito, C.; Brizard, J. P.; Burgidou, C.; Van Dorselaer, A. Proteomics 2005, 5, 450.
(6) Mayer, A. M.; Harel, E. In Food Enzymology; Fox, P. F.; Elsevier: New York, 1991; Vol. 373.
(7) Vamos-Vigyazo, L. CRC Crit. Rev. Food Sci. 1981, 15, 49.
(8) Vaughn, K. C.; Lax, A. R.; Duke, S. O. Physiol. Plant 1988, 72, 659.
(9) Walker, J. R. L.; Ferrar, P. H. Biotechnol. Genet. Rev. 1998, 15, 457.
(10) Newman, S. M.; Eannetta, N. T.; Yu, H.; Prince, J. P.; de Vicente, M. C.; Tanksley, S. D.; Steffens, J. C. Plant Mol. Biol. 1993, 21, 1035.
(11) Thygesen, P. W.; Dry, I. B.; Robinson, S. P. Plant Physiol. 1995, 109, 525.
(12) Gooding, S. P.; Bird, C.; Robinson, P. S. Planta 2001, 213, 748–57.
(13) Wang, J.; Constabel, C. P. Planta 2004, 220, 87.
(14) Haruta, M.; Murata, M.; Kadokura, H.; Homma, S. Phytochemistry 1999, 50, 1021.
(15) Kim, J. Y.; Seo, Y. S.; Kim, J. E.; Sung, S. K.; Song, K. J.; An, G.; Kim, W. T. Plant Sci. 2001, 161, 1145.
(16) Thipyapong, P.; Steffens, J. Plant Physiol. 1997, 115, 409.
(17) Thipyapong, P.; Joel, D. M.; Steffens, J. C. Plant Physiol. 1997, 113, 707.
(18) Boss, P. K.; Gardner, R. C.; Janssen, B. J.; Ross, G. S. Plant Mol. Biol. 1995, 27, 429.
(19) Chevalier, T.; De Rigal, D.; Mbeguié, D.; Gauillard, F.; Richard-Forget, F.; Lycaon, F. R. B. Plant Physiol. 1999, 119, 1261.
(20) Li, L.; Steffens, J. Planta 2002, 215, 239.
(21) Thipyapong, P.; Hunt, M. D.; Steffens, J. Planta 2004, 220, 105.
(22) Lee, C. Y.; Whitaker, J.; Enzymatic Browning and Its Prevention; ACS Symposium Series 600; American Chemical Society: Washington, DC, 1995.
(23) Nesvizhskii, A. I.; Aebersold, R. Mol. Cell. Proteomics 2005, 4, 1419.
(24) Casado-Vela, J.; Sellés, S.; Bru, R. Proteomics 2006, 6, S-196.
(25) Nesvizhskii, A. I.; Roos, F. F.; Grossmann, J.; Vogelzang, M.; Eddes, J. S.; Gruissem, W.; Baginsky, S.; Aebersold, R. Mol. Cell. Proteomics 2006, 5, 652.
(26) Dancik, V.; Clauser, K. R.; Addona, T. A.; Vath, J. E.; Pevzner, P. A. J. Comp. Biol. 1999, 6, 327.
(27) Bordier, C. J. Biol. Chem. 1981, 256, 1604.
(28) Sellés, S.; Casado-Vela, J.; Bru, R. Arch. Biochem. Biophys. 2006, 446, 175.
(29) Bensadoun, A.; Weinstein, D. Anal. Biochem. 1975, 70, 241.
(30) Wang, W.; Scali, M.; Vignani, R.; Spadafora, A.; Sensi, E.; Mazzuca, S.; Cresti, M. Electrophoresis 2003, 24, 2369.
(31) Laemmli, U. K. Nature 1970, 227, 680.
(32) Neuhoff, V.; Arold, N.; Taube, D.; Ehrhardt, W. Electrophoresis 1988, 9, 255.
(33) Shevchenko, A.; Wilm, M.; Vorm, O.; Mann, M. Anal. Chem. 1996, 68, 850.
(34) Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Nucleic Acids Res. 1997, 25, 3389.
(35) Thompson, J. D.; Higgins, D. G.; Gibson, T. J. Nucleic Acids Res. 1994, 22, 4673.
(36) Castresana, J. Mol. Biol. Evol. 2000, 17, 540.
(37) Schmidt, H. A.; Strimmer, K.; Vingron, M.; von Haeseler, A. Bioinformatics 2002, 18, 502–04.
(38) Guindon, S.; Gascuel, O. Syst. Biol. 2003, 52, 696.
(39) Page, R. D. M. Comp. Appl. Biosci. 1996, 12, 357.
(40) Koussevitzky, S.; Ne'eman, E.; Sommer, A.; Steffens, J.; Harel, E. J. Biol. Chem. 1998, 273, 27064.
(41) Sommer, A.; Ne'eman, E.; Steffens, J.; Mayer, A. M.; Harel, E. Plant Physiol. 1994, 105, 1301.
(42) Searle, B. C.; Dasari, S.; Turner, M.; Reddy, A. P.; Choi, D.; Wilmarth, P. A.; McCormack, A. L.; David, L. L.; Nagalla, S. D. Anal. Chem. 2004, 76, 2220.
