ARTICLE pubs.acs.org/jpr

MAPA Distinguishes Genotype-Specific Variability of Highly Similar Regulatory Protein Isoforms in Potato Tuber

Wolfgang Hoehenwarter,*,† Abdelhalim Larhlimi,‡ Jan Hummel,§ Volker Egelhofer,† Joachim Selbig,‡ Joost T. van Dongen,§ Stefanie Wienkoop,† and Wolfram Weckwerth†

† Department of Molecular Systems Biology, University of Vienna, Faculty of Life Sciences, Althanstrasse 14, A-1090 Vienna, Austria
‡ Institute of Biochemistry and Biology, University of Potsdam, Karl-Liebknecht-Strasse 24–25, 14476 Potsdam, Germany
§ Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, D-14476 Potsdam-Golm, Germany

ABSTRACT: Mass Accuracy Precursor Alignment (MAPA) is a fast and flexible method for comparative proteome analysis that allows the comparison of unprecedented numbers of shotgun proteomics analyses on a personal computer in a matter of hours. We compared 183 LC-MS analyses comprising more than 2 million MS/MS spectra and could define and separate the proteomic phenotypes of field-grown tubers of 12 tetraploid cultivars of the crop plant Solanum tuberosum. Protein isoforms of patatin as well as of other major gene families that regulate tuber development, such as lipoxygenase and cysteine protease inhibitor, were found to be the primary source of variability between the cultivars. This suggests that differentially expressed protein isoforms modulate genotype-specific tuber development and the plant phenotype. We assigned the measured abundance of tryptic peptides to different protein isoforms that share extensive stretches of primary structure and thus inferred the abundance of the isoforms. Peptides unique to different protein isoforms were used to classify the remaining peptides, which could be assigned to the entire subset of isoforms, based on a common abundance profile using multivariate statistical procedures. We identified nearly 4000 proteins, which we used for quantitative functional annotation, making this the most extensive study of the tuber proteome to date.

KEYWORDS: comparative proteomics, mass accuracy, protein isoforms, potato tuber, lipoxygenase, protease inhibitor, phenotype, genetic variability

INTRODUCTION

The analysis of naturally occurring genetic variability and molecular phenotype determinants in crops for selective breeding is currently a field of intense research. It is aided by comprehensive experimental approaches that determine the dynamics of an entire class of biomolecules at one time and by large-scale sampling for statistical results so that researchers can reach conclusions with confidence. Proteins are of special interest because in most cases they are the functional end products of gene expression. Proteomics, the science of the dynamic total protein complement of a biological system, has recently achieved the quantitative description of all of an organism's gene products.1 In theory, this makes it a good candidate for the unbiased search for molecular markers. However, the large-scale comparative implementation of this achievement, that is, the comparison of hundreds of analyses of several different proteomes, is lacking. The same holds true for the differentiation of highly similar proteins such as isoforms or post-translationally modified proteins in several proteomics phenotypes.

The difficulty lies in finding a quantitative description of the proteome that is comprehensive but simple and uniform. It is possible to separate the heterogeneous proteome into 10 000 individual proteins, including isoforms and post-translationally modified species, with two-dimensional gel electrophoresis (2-DE),2 but it is time-consuming and the subtractive analysis of many gel patterns is hard even for an expert. Shotgun proteomics approaches3 digest proteins into peptides prior to analysis with liquid chromatography (LC) and mass spectrometry (MS). They previously required either identification of a peptide's primary structure4 or at least two parameters for peptide definition: the ionized mass to charge ratio (m/z) measured in the mass spectrometer and the retention time (RT) in liquid chromatography. A third parameter, the mass spectrometric signal intensity, was necessary for quantification. Comparison of such three-dimensional peptide arrays was hindered by their complexity and by the fuzziness incurred by technical inaccuracies.5–10 Very precise work has alleviated this problem, but it is still limited in the number of biological conditions that can be compared and is mainly applicable in the laboratory.11

We have developed a solution called Mass Accuracy Precursor Alignment (MAPA), which defines the proteome by a single parameter, the accurately measured peptide m/z. We applied it to the mature, field-grown tubers of 12 cultivars of Solanum tuberosum, the fourth most important food crop worldwide. We compared 183 shotgun proteomics analyses, each comprising between 6000 and 14 000 peptide mass spectra (MS/MS spectra), for a total of around 2 million MS/MS spectra. Advanced multivariate statistical procedures were used to decompose the data. A total of 123 highly significant features that distinguish the proteomic phenotypes of all 12 cultivars were detected. Most of these could be assigned to protein isoforms of the major gene families expressed during tuberization as well as to enzymes involved in sucrose mobilization in the later stages of tuber development and in starch degradation. The results indicate that expressed polymorphism, which is prevalent in the major gene families that regulate tuberization, may have a differential impact on tuber development and the plant phenotype.

We attempted a novel approach to quantify distinct, highly homologous protein isoforms from the measured abundance of tryptic peptides. Peptides that were identified as primary structure constituents of one or more protein isoforms were assigned to a single isoform based on common abundance profiles in the investigated cultivars. With this or a similar statistical method, the mixture of peptides generated by digesting the proteome may be better resolved and the protein inference problem12 overcome even for very similar proteins such as isoforms or post-translationally modified proteins.

Received: November 4, 2010
Published: May 13, 2011
© 2011 American Chemical Society

dx.doi.org/10.1021/pr101109a | J. Proteome Res. 2011, 10, 2979–2991

EXPERIMENTAL SECTION

Plant Cultivation, Tuber Harvest, and Sample Preparation

Twelve commercially available potato cultivars were chosen for proteomics analyses: Agria, Alliance, Arnika, Festien, Goldika, Kuras, Lady Claire, Marabel, Milva, Omega, Red Fantasy, and Topas. The cultivars were grown according to standard agricultural practice in 2006 on two fields in Germany. One lies near the village of Ebstorf in central Germany, the other near the village of Böhlendorf in northern Germany near the Baltic Sea. The soil of the first field is composed of loess; the second field near the Baltic Sea is mainly made up of clay. The weather and climate on the two fields also differ substantially. In Ebstorf, the median maximum temperature during summer was 30 °C; in Böhlendorf it was 25 °C. Total precipitation for the entire cultivation period was 231 mm of rain plus an additional 50 mm of watering on the first field and 350 mm of rain with no watering on the second. Each cultivar was grown independently on six randomly located plots on each field. Twenty plants were grown on each plot, from which all tubers were harvested. Tubers 5–8 cm in diameter were sliced in the middle along the medial axis. Slices from the middle section that included epidermal tissue and cork cells were snap frozen in liquid nitrogen and ground to a fine powder. The powder was stored at −80 °C until protein extraction. Proteins were extracted from at least two, in most cases three, individual tubers per cultivar per field (four to six tubers per cultivar).

Protein Extraction, Digestion, and LC-MS/MS

Shotgun proteomics experiments were described in detail previously.13 Frozen, pulverized tuber tissue was incubated with MetOH/CHCl3/H2O to solubilize small polar and nonpolar organic compounds. Proteins were then extracted with a combination of a Tris-based, SDS-containing buffer and phenol, for phase partitioning from nucleic acids. Proteins were precipitated from the phenol phase with acetone and digested with Lys-C and trypsin in urea (4 and 2 M, respectively) and NH4CO3. Peptides were desalted with solid phase extraction cartridges (SPEC) and dissolved in 0.1% FA/H2O for online LC-MS/MS. Each sample was measured three to five times. For every shotgun proteomics analysis, 20 μg of peptides was loaded onto a C18 reverse phase monolithic column and separated online by gradient elution with 0.1% FA/MetOH (0–90% in 95 min) into an ESI source. The column was washed after every sample application with 0.1% FA/MetOH. The spray voltage and capillary temperature were set to achieve an optimal spray current of around 0.1 μA. Gas phase ions were analyzed in an LTQ/Orbitrap mass spectrometer (Thermo Scientific) in data-dependent mode. FT full scans were carried out with a maximum injection time of 2000 ms, a resolution of 30 000, and a maximum ion load (AGC) of 2 × 10^6 (a compromise between space charge effects and signal intensity). Collision induced dissociation (CID) was performed with a normalized collision energy of 35%. IT MS/MS scans were carried out with a maximum injection time of 50 ms and an AGC of 1 × 10^4. Dynamic exclusion of 15 s, charge state screening with rejection of single or unassigned charges, and monoisotopic precursor selection were enabled.

Mass Accuracy Precursor Alignment

The Thermo .raw files were converted to mzXML format with the ReAdW program (version 2006Nov01) from the Seattle Proteome Center, which can be downloaded from SourceForge.net as a .zip archive. The ProtMax program was then used to build a matrix with the precursor m/z in the rows and the mzXML file identifiers in the columns. Each cell holds the spectral count, that is, the number of recorded MS/MS spectra of the respective mass to charge ratio subjected to CID (precursor m/z), rounded to the second decimal, in the respective mzXML file. ProtMax is a Windows Forms application for the Common Language Runtime (CLR) environment, written in C# with the .NET 3.5 framework and Microsoft Visual Studio 2008. It can be installed using the ClickOnce technology from http://xeml.mpimp-golm.mpg.de/ClickOnce/ProtMax/default.htm.
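The matrix construction described above can be illustrated with a minimal Python sketch. It is not the ProtMax implementation (which is a C# application); the function name `build_count_matrix` and the input layout, a dictionary mapping each file identifier to the list of precursor m/z values of its MS/MS spectra, are assumptions made for illustration, and parsing of the mzXML files is deliberately left out.

```python
from collections import Counter


def build_count_matrix(precursors_by_file, decimals=2):
    """Align precursor m/z across analyses and count MS/MS spectra.

    precursors_by_file: hypothetical input, {file_id: [precursor m/z, ...]},
    one entry per recorded MS/MS spectrum.
    Returns (mz_bins, file_ids, matrix) where matrix[i][j] is the spectral
    count of mz_bins[i] in file_ids[j].
    """
    file_ids = sorted(precursors_by_file)
    # Spectral count per rounded precursor m/z, separately for each file.
    counts = {
        fid: Counter(round(mz, decimals) for mz in precursors_by_file[fid])
        for fid in file_ids
    }
    # Rows: every rounded m/z observed in at least one file.
    mz_bins = sorted({mz for c in counts.values() for mz in c})
    # A Counter returns 0 for m/z values not observed in a file.
    matrix = [[counts[fid][mz] for fid in file_ids] for mz in mz_bins]
    return mz_bins, file_ids, matrix


if __name__ == "__main__":
    runs = {
        "A.mzXML": [602.804, 602.796, 744.371],
        "B.mzXML": [602.801, 869.442],
    }
    mz_bins, file_ids, matrix = build_count_matrix(runs)
    for mz, row in zip(mz_bins, matrix):
        print(mz, row)
```

Rounding to the second decimal is what makes the m/z value usable as a row key across all files; the sketch ignores charge state and retention time, in line with the single-parameter definition of MAPA.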

Peptide and Protein Identification

Peptides and proteins were identified with the Mascot software v2.2 from Matrix Science14 and the Inspect software v2009.02.02 from the University of California, San Diego,15 using a six-frame translation of a database of Solanum tuberosum EST sequences (Solanum_tuberosum_release_2.fasta) available from the J. Craig Venter Institute. A maximum mass deviation (MMD) of 5 ppm for precursor m/z and of 0.4 Da for MS/MS fragment m/z was tolerated, as well as one missed cleavage. Oxidation of methionine was allowed as a variable modification; the enzyme was set to trypsin. For the Mascot results, the peptide mass accuracy was recalibrated internally with the MSQuant software v1.4.2 using peptides with Mascot scores of 30 or greater.

Multivariate Data Mining

All procedures were done with R v2.8.0 (R Development Core Team, 2008), the Bioconductor package CMA,16 the TM4 software v4.3,17 and MetageneAlyse.18 The raw data matrix that contained all of the shotgun proteomics analyses of the 12 tuber cultivars had 20 997 rows (peptide m/z) and 183 columns (analyses). For independent component analysis (ICA), rows with a cumulative spectral count of less than 75 were removed to eliminate noise, resulting in a reduced matrix of 3840 rows. This threshold approximately corresponds to a spectral count of 4 in each measurement of one cultivar, when three samples per field were each measured three times (3 × 2 × 3; 18 measurements in total), and 0 in all other cultivars. The columns were vector normalized and scaled to the mean total spectral count or, alternatively, the rows were scaled to unit variance. All values of 0 were replaced with 0.5, as they lie below the limit of detection (LOD), and the values were log10 transformed.
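The preprocessing steps for ICA can be sketched in a few lines of Python. This is an illustrative reading of the description, not the authors' R code: the function name `preprocess_for_ica` is hypothetical, and "vector normalized and scaled to the mean total spectral count" is interpreted here as dividing each column by its total and multiplying by the mean column total, which is an assumption. The alternative row scaling to unit variance is not shown.

```python
import math


def preprocess_for_ica(counts, min_row_total=75, lod_fill=0.5):
    """Filter, normalize, and log10 transform a spectral count matrix.

    counts: list of rows (one per precursor m/z), each a list of spectral
    counts (one per analysis). Returns (kept_row_indices, transformed_rows).
    """
    # 1. Drop rows whose cumulative spectral count is below the threshold.
    kept = [i for i, row in enumerate(counts) if sum(row) >= min_row_total]
    rows = [counts[i] for i in kept]
    n_cols = len(counts[0])

    # 2. Scale every column to the mean total spectral count
    #    (assumed interpretation; a column total of 0 would fail here).
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    mean_total = sum(col_totals) / n_cols

    transformed = []
    for row in rows:
        scaled = [row[j] / col_totals[j] * mean_total for j in range(n_cols)]
        # 3. Replace zeros (below the LOD) with 0.5, then 4. log10 transform.
        transformed.append(
            [math.log10(v if v > 0 else lod_fill) for v in scaled]
        )
    return kept, transformed


if __name__ == "__main__":
    demo = [[80, 0], [1, 0], [20, 100]]
    kept, transformed = preprocess_for_ica(demo)
    print(kept)         # indices of the rows that passed the filter
    print(transformed)  # log10 values; zeros became log10(0.5)
```

With the default threshold of 75, a peptide observed about four times in each of the 18 measurements of a single cultivar (and nowhere else) sits just below the cutoff, which matches the rationale given in the text.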


For supervised classification, rows with more than 90% missing values were removed to eliminate noise, resulting in a reduced matrix of 6135 rows. The columns were vector normalized and scaled to the mean total spectral count. All values of 0 were replaced with 0.5, as they lie below the limit of detection (LOD); the values were then log10 transformed and the rows were mean subtracted. Six supervised classification methods were used to predict the class labels (cultivars) of the analyses using the peptide m/z. The data set was split into training and test sets for model selection and evaluation 500 times for each of four different splitting rules: (i) 5-fold cross-validation, that is, splitting into 5 equal parts where in each iteration i the test set is given by the i-th part while the remaining parts define the training set; (ii) leave-one-out cross-validation; (iii) Monte Carlo cross-validation; and (iv) bootstrap sampling. The proportion of class labels was also required to be the same in the test sets as in the entire data set. To cope with overfitting, the peptide m/z were ranked using empirical Bayes moderated F statistics, generating an independent ranking for every test set for each splitting rule, for a total of 2000 independent rankings. The peptides that were ranked highest in all rankings for all four splitting rules, 123 in all, were used as predictors for four classification methods: linear discriminant analysis (LDA), support vector machine (SVM),19 k-nearest neighbor classification (KNN),20 and random forest (RF).21 To include possible correlations between peptides that are ignored by the univariate peptide selection, partial least-squares (PLS) was applied to the data set, and LDA and RF were performed with the highest ranked PLS components. Where applicable, the classification methods were tuned for optimal performance using cross-validation. The misclassification rate was calculated for all six methods to assess their performance. To evaluate the significance of the results, the data set was permuted 50 000 times.

KNN was also used to analyze all of the peptides assigned to four patatin isoforms based on their primary structure in a reduced data set of six cultivars and 107 analyses. This resulted in a reduced matrix of 73 rows. Values that were 1 order of magnitude or more greater or smaller than the respective cultivar mean were replaced by the mean. The rows were normalized to the root-mean-square instead of being log transformed. Two rounds of KNN were performed to increase the size of the original training set, with p-value thresholds of 0.05 and 0.01, respectively, for correlation to the training set.
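Splitting rule (i), combined with the requirement that the class proportions in the test sets match those of the whole data set, amounts to stratified k-fold cross-validation. The following Python sketch shows one way to generate such splits and to compute a misclassification rate. It is not the CMA implementation used in the study; the function names `stratified_folds`, `train_test_splits`, and `misclassification_rate` are hypothetical, and the round-robin assignment of samples to folds is an assumed design choice.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold mirrors the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the samples of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def train_test_splits(labels, n_folds=5, seed=0):
    """Yield (train_indices, test_indices); fold i is the i-th test set."""
    folds = stratified_folds(labels, n_folds, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


def misclassification_rate(true_labels, predicted_labels):
    """Fraction of predictions that differ from the true class label."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


if __name__ == "__main__":
    cultivars = ["Agria"] * 10 + ["Kuras"] * 5
    for train, test in train_test_splits(cultivars, n_folds=5, seed=1):
        print(len(train), [cultivars[i] for i in test])
```

Repeating the split with different seeds (500 times in the study) and ranking the features on the training part of each split only keeps feature selection independent of model evaluation.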

RESULTS

Mass Accuracy Precursor Alignment

Measurement of a peptide m/z with an error in the low parts per million (ppm) range specifies its primary structure composition.22,23 This is an unambiguous definition with the exception of peptides containing isobaric amino acid residues (AA) and peptides with the same primary structure composition but different sequence. The LTQ/Orbitrap mass spectrometer is capable of this accuracy routinely, making it applicable to thousands of peptides in shotgun proteomics analyses. The abundance of each measured peptide can be inferred by spectral counting,6,24,25 that is, counting the number of times a peptide is subjected to dissociation and its fragment ions are recorded in an MS/MS spectrum. Thus, a proteome is accurately defined by two parameters, the m/z for peptide identity and the MS/MS spectral count for peptide abundance. For comparison, a set of proteomes can be entered into a matrix as vectors of the fixed dimensionality of all of the peptides in the entire set. The row vectors quantify each peptide because the accuracy of the m/z measurement traces peptides throughout all of the analyses. We have previously shown the technical feasibility of MAPA, especially the sustainment of m/z measurement and quantification accuracy over weeks of measurements. Reserpine was added to each sample as an internal standard so that 800 fmol were present per shotgun proteomics analysis. Over all of the analyses, the coefficient of variation of the chromatographic retention time of the reserpine peak was 3% and that of the peak area was 39%. This indicates that the different peptide matrices do not substantially influence quantification. Further details can be found in our earlier publication.13

Figure 1. Independent component analysis (ICA) of all of the 183 shotgun proteomics analyses of the 12 cultivars of potato grown on the two fields (Böhlendorf and Ebstorf) in the 2006 growing season. Six principal components were used for the ICA, which captured 54.9% of the total variance of the data. Independent components 2 and 3 are plotted, both with a negative kurtosis, indicating a significant difference in parameter values between two or more classes.

Protein Isoforms Discriminate Tuber Proteomes of All Investigated Cultivars

With MAPA, we aligned all of the peptides recorded in a total of 183 shotgun proteomics analyses of 12 cultivars of potato. Each cultivar was grown on two fields in Germany with distinct agricultural conditions (soil consistency, climate, and weather). The agronomic practices employed, including fertilization and crop protection, were the standard in Germany and the same on both fields. Four to six tubers per cultivar were used for independent protein extraction and proteomics measurements. Each of these protein samples was measured three or more times. Preliminary analysis of the data with independent component analysis (ICA) could only detect a very small influence of the conditions prevalent on the two fields on the tuber proteomes (Figure 1). On the other hand, eight of the 12 cultivars were each clearly separated from the others on a different independent component in ICA (data not shown). Phenotyping of the tubers performed in another study showed strong variability of susceptibility to black spot disease and of chip quality, dependent on the cultivar genotypes as well as on the agricultural conditions on the two fields in the 2006 growing season.26 Metabolite profiles acquired in the same study were distinct for each cultivar but very similar on both fields. This indicates that the cultivar genotypes had a stronger impact on the molecular and especially the proteomic phenotypes than the environment in the experiment.

To further mine the data, we employed a series of uni- and multivariate statistical techniques fulfilling good-practice standards. The most significant components for which the cultivar showed an influence on protein abundance, and which would thus most likely be different between the cultivars, were selected with Bayes moderated F statistics and, alternatively, partial least-squares (PLS) to consider possible correlations. The univariate selection was performed on different subsets of the data that were generated in four different ways (splitting rules) 500 times each (2000 selections in total) and that functioned as training sets to build linear models for prediction of the cultivar labels in the remaining data. The 123 components that consistently were the most significant and whose importance was evenly spread (Figure 2, top left) were then used to test the models, or in other words, for the actual predictions. This ensures that the selected peptides are independent of the model evaluation and are truly meaningful as predictors for the entire data set, and that spurious correlations occurring by chance for a very large number of components in comparatively few analyses are avoided.

Figure 2. Prediction of class labels of shotgun proteomics analyses and identification of 123 highly significant features for cultivar discrimination. (Top left) Relative importance of the 123 predictors for classification. (Top right) Distribution of the misclassification rate for four supervised classification techniques employing the 123 univariate selected predictors as well as multivariate selected PLS components. (Center) Confusion matrix showing 18 300 predictions with the best performing method, random forest (RF). Cultivar names are abbreviated: Ag, Agria; Al, Alliance; Ar, Arnika; F, Festien; G, Goldika; K, Kuras; LC, Lady Claire; Ma, Marabel; Mi, Milva; O, Omega; ReF, Red Fantasy; T, Topas. (Bottom) Misclassification rate for the prediction and distribution of the misclassification rate for predictions for 50 000 permutations of the data, showing that the original prediction is highly significant.

A total of 18 300 predictions were made in 500 subsets that consisted of one-fifth of the data generated according to 5-fold cross-validation for each of four supervised classification methods, linear discriminant analysis (LDA), support vector machine (SVM), k-nearest neighbor classification (KNN), and random forest (RF), using the 123 univariate and in two cases the PLS multivariate selected components (Figure 2, top right). The average misclassification rate was less than 0.5% with the univariate selected predictors for each approach except LDA. For KNN and RF, even outlier misclassification rates were below 3%, showing the high performance of these methods with the univariate selections. RF slightly outperformed KNN, correctly classifying 99.9% of the shotgun proteomics analyses (Figure 2, center). Only 19 of the total 18 300 predictions were misclassified; in all cases, Red Fantasy was confused with Agria. This almost perfect result demonstrates not only the power of our approach, MAPA, in conjunction with multivariate statistics but, perhaps more importantly, the ability of the selected peptides to discriminate the cultivars. The significance of the result was evaluated by permuting the data set 50 000 times and performing the same number of predictions on each permutation. The average misclassification rate for the permutations was 91% and the minimum 76%; the p-value for the actual prediction was 1.0388 × 10^−254 (Figure 2, bottom). Because the order in which the specimens of the cultivars were measured was shuffled, a technical cause for the significance of the prediction is excluded.

Figure 3. Hierarchical clustering of the 123 highly significant peptides used for the predictions. Peptide abundance is given in log10 transformed raw spectral counts with 0 (meaning a spectral count of 1) as the limit of detection. Sequences that are highly specific for a cultivar are boxed. The identities of the peptides and their significance are listed in Table 1. Cultivar names are abbreviated as in Figure 2. * Indicates the misclassification of two analyses of Agria. § Indicates the misclassification of three analyses of Red Fantasy.

All of the shotgun proteomics analyses except five, again due to confusion between Red Fantasy and Agria, were also correctly classified in hierarchical clustering of the predictor peptides (Figure 3). These peptides clearly define the unique proteomic phenotype of each cultivar. Some of them are found in multiple cultivars with different abundance while others are exclusive markers (boxed). Primary structure and protein annotation could be assigned to 88 of the 123 predictors by matching


Table 1. The 123 Peptides Used for the Prediction of the Cultivar Labels of the Shotgun Proteomics Analyses^a

^a Peptides are shown in the same order as in Figure 3, left to right. Colored blocks correspond to boxed blocks in Figure 3. P-values and corrected p-values are shown. V_IMP, importance of the variable as in Figure 2; BH, Benjamini-Hochberg; NI, not identified.


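Table 2. Peptides That Discriminate Four Patatin Isoforms

accession number | isoform name | precursor m/z | peptide sequence^a | total spectral count
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 602.8 | AQEDPAFASIR | 338
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 869.44 | VQENALTGTTTKADDASEANMELLAQVGENLLK | 1022
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 680.85 | RAQEDPAFASIR | 237
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 744.37 | ADDASEANMELLAQVGENLLK | 648
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 894.45 | MLLLSLGTGTTSEFDKTHTAEETAK | 536
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 1158.92 | VQENALTGTTTKADDASEANMELLAQVGENLLK | 775
TA23358_4113 | Patatin protein group A-3 [Solanum tuberosum (Potato)] | 1116.05 | ADDASEANMELLAQVGENLLK | 993
STRNA01 | Patatin precursor [Solanum tuberosum (Potato)] | 1075.18 | VQENALTGTTTEMDDASEANMELLVQVGEK | 15988
TA23310_4113 | Patatin T5 precursor [Solanum tuberosum (Potato)] | 919.22 | VQENALTGTTTELDDASEANMQLLVQVGEDLLKK | 294
TA23357_4113 | Patatin [Solanum tuberosum (Potato)] | 1775.87 | VQENALTGTTTEMDDASEANMELLVQVGETLLK | 3559

^a Isomorphic peptides are underlined.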

MS/MS spectra to translated expressed nucleic acid sequences (Table 1). Most of them are tryptic peptides of the isoforms of the major gene families expressed during and/or controlling tuberization: the patatins, lipoxygenases (LOXs), and proteinase inhibitors. Some could be assigned to enzymes involved in carbohydrate metabolism. Fructokinase and sucrose synthase are two enzymes involved in sucrose unloading and mobilization during tuber enlargement and starch accumulation. The amyloplastic alpha-1,4-glucan phosphorylase, also known as the L1 isozyme, catalyzes the phosphorolytic degradation of starch.

Quantification of Protein Isoforms from Measured Abundance of Tryptic Peptides

Protein isoforms have extensive sequence similarity. A tryptic peptide can be a unique primary structure constituent of a single isoform (proteotypic). In most cases, however, it will be common to many of its homologues in the proteome. Considering medium to large size gene families such as the patatins (64–72 genes with >80% AA sequence similarity per tetraploid genome), the picture of peptides that are common to the entire set of expressed isoforms, shared by a subset of these, or unique to a single isoform is confounding. This makes it difficult to infer the abundance of protein isoforms from the measured abundance of their tryptic peptides with conventional strategies that consider the mean or median of all tryptic peptides or the three most intense tryptic peptides.27 We assume here that, if a set of peptides can be assigned to a specific isoform based on a common abundance profile as well as primary structure in a large data set, then a correct estimate should be possible purely as a matter of probabilities.

As a test case for this assumption, we applied multivariate statistical classification to all of the tryptic peptides of four highly similar patatin isoforms that were identified in a reduced data set consisting of the shotgun proteomics measurements of six of the potato cultivars. Several peptides that could be uniquely assigned to the four isoforms were among those with the most pronounced difference in abundance in the cultivars (data not shown). The set of peptides included one peptide with highly similar primary structure for each isoform (Table 2, underlined). This indicates that these isoforms are highly discriminatory features of the cultivar proteomes. The range of the total spectral counts of the peptides of the patatin TA23358_4113 can be explained by variable ESI response and selective suppression of ions due to insufficient resolution of the complex peptide matrix in the shotgun proteomics technology. The range of the ionized species of the same peptide, however, is small, and altogether the abundance of the peptides was highly correlated (Figure 4, top), suggesting that in this case the abundance of the individual peptides reflects the abundance of the respective isoform.

We tested whether protein isoforms can be quantified using a set of peptides that are assigned to a particular isoform based on their primary structure and additionally feature a common abundance profile in a differential proteomics data set. We classified all of the peptides in the whole data set that belonged to the four patatin isoforms in Table 2 based on their abundance in the cultivars. These were a total of 73 precursor m/z. We used k-nearest neighbor classification (KNN), a multivariate, nonparametric, supervised classification method that assigns an unclassified sample to the class of its k nearest neighbors in a set of a priori correctly classified samples, the training set, in a multidimensional space. The peptides in Table 2, which were all proteotypic for the respective patatin isoforms, were used as a training set. More than half of the total peptides (including the training set and the unclassified test set) could be classified as belonging to one of the isoforms based on the correlation of their abundance with the training set with >99% confidence (Figure 4, left). The corresponding primary structure assignments were found to be reasonably correct (Figure 4, right). More than half of the primary structure assignments of peptides classified as belonging to TA23358_4113 based on their abundance in the cultivars were correct, and 100% of the assignments were correct for the isoforms STRNA01 and TA23310_4113. Two-thirds of the primary structure assignments for TA23357_4113 were also correct. These results indicate that statistical procedures can extract, from a host of otherwise convoluted data, a set of peptides that may be used for quantification of protein isoforms with high confidence.

Peptide and Protein Identification from EST Database Search

To achieve deep coverage of the tuber proteome, we applied the most stringent criteria to peptide and protein identification 2986

dx.doi.org/10.1021/pr101109a |J. Proteome Res. 2011, 10, 2979–2991

Journal of Proteome Research

ARTICLE

Figure 5. Increased identification of peptides and proteins with repeated sample analysis exemplified by analyses in six cultivars. (Top) Number of identified peptides and proteins. (Bottom) Percent increase to the previous number of analyses. The panels show analyses of, from left to right, top to bottom, Goldika, Kuras, Lady Claire, Marabel, Milva and Topas, the cultivar names are abbreviated as in Figure 2. Dark gray, peptides; light gray, proteins with at least one unique peptide assignment; black, proteins with at least two unique peptide assignments. Only the most significant peptide identities from the Mascot software are plotted. Figure 4. Quantification of patatin isoforms. (Top) The abundance of peptides assigned to the isoform TA23358_4113 is highly correlated. (Bottom left) Abundance profiles for each of the four isoforms determined by the correlation of the abundance of all precursor m/z assigned to the four isoforms based on their primary structure in the entire data set with a p-value threshold of 0.01. Cultivar names are abbreviated as in Figure 2. (Right) Primary structure assignments of the peptides with significant correlation colored accordingly. One indicates the peptide can, 0 it cannot, be assigned to the particular isoform; peptides that could be assigned to multiple isoforms are colored parsimoniously.

and took advantage of the large number of analyses performed for statistical comparison. The repeated measurement of the same proteome has been shown to be as effective as the complementary use of different mass spectrometry platforms and more effective than the complementary use of different database search software for increasing peptide and protein identification.28 A logarithmic increase of nonredundant identified peptides and proteins was observed (R² > 0.98 for the fit to y = y0 + a ln x) (Figure 5, top), as well as an increase of around 40% for peptide identities and protein identities with one unique peptide assignment and well over 50% for protein identities with two unique peptide assignments after just three analyses compared to a single analysis (Figure 5, bottom). Clearly this strategy invites exploitation. The potato genome is not fully sequenced (for information see potatogenome.net), so a six-frame translation of an EST database containing assembled contigs was used to identify peptides and proteins. Each EST is represented six times and the 3′ and at times also the 5′ UTR are incorporated into the protein primary structure, so the database is artificially inflated (81 072 sequences and 116 221 416 AA). The Mascot software that we employed for database search calculates a probabilistic score for peptide identities taking into account the total size of the database and the molecular weight, that is, the primary structure of each protein entry.14 The scores for 99% confidence for a database of all proteins encoded by the Arabidopsis thaliana genome, which is completely sequenced and has around 25 000 ORFs (33 018 sequences and 13 232 619 AA), and for the Swiss-Prot database, which is one of the leading collections of protein sequences and contains entries from all kingdoms of life (265 950 sequences and 97 521 944 AA), were >17 and >33, respectively. We selected a score of >34 for searches with the EST database with the same search parameters to obtain higher than 99% confidence for each peptide identity. Furthermore, an m/z measurement with an error of 1 ppm is 100 times more significant than one with nominal mass accuracy23,29,30 so, because the average statistical mass accuracy of all identified peptides was 1.4 ppm (standard deviation 1 ppm), we estimate less than 1 in 10 000 false positives for 127 670 peptides (around 10% of all recorded MS/MS spectra) and 1172 proteins identified with Mascot. To increase the number of identities, we employed a second database search software called Inspect. A false discovery rate (FDR) was determined with the EST database additionally


containing all entries in reverse order31 to interpret the results in a probabilistic framework while controlling the family-wise error rate. A p-value threshold of 0.01 was selected to identify 553 188 peptides (around 37%) with higher than 99% confidence. Together, the two search programs identified 3829 proteins with a p-value threshold of 0.01 and at least one unique peptide assignment. The annotated proteins and MS/MS spectra were deposited in Promex, a public library for reference spectra at the Department of Molecular Systems Biology at the University of Vienna, and are accessible under the URL http://promex.pph.univie.ac.at/promex/.

Quantitative Description of the Tuber Proteome

A linear correlation between spectral count and the relative abundance of not only the same peptide or protein, but also of different peptides and proteins in terms of the fraction of the total spectral count of all detected proteins, is well established.32 Therefore, we were able to use the highest confidence identities from the Mascot software (essentially all true positives) to quantify the tuber proteome. The total spectral count of each protein and accordingly the dynamic range of quantification for the entire data set are plotted in Figure 6, top left and right. The dynamic range is four orders of magnitude, with a linear region of approximately two orders, and follows a power law (R² > 0.98 for the fit to y = y0 + (a/x) + (b/x²)). This is essentially the same as in two recent large-scale studies where it was 3 orders of magnitude;4,33 the exponential slope into the fourth order reflects the particular makeup of the tuber proteome (approximately 100 protein isoforms comprise more than half of the entire molar abundance). The proteins were annotated with the UniProt protein knowledgebase, a curated collection of functional information on proteins (Figure 6, bottom). About 40% of proteins are patatins, as is well-known. The other abundant tuber proteins, protease inhibitors (primarily KPI and serine, cysteine, and aspartic acid protease inhibitors) and LOXs (assigned to Fatty acid metabolism), were also detected with a high number of MS/MS spectra. The class primary metabolism contains enzymes of glycolysis/gluconeogenesis, the citric acid cycle, and oxidative phosphorylation; the enzymes of starch metabolism were entered into carbohydrate metabolism. Proteases contain strictly proteases/peptidases. Ubiquitin and subunits of the proteasome were classified as Proteasome. 
Stress response and pathogen resistance are underrepresented because proteins with these functions, such as protease inhibitors, LOXs, electron scavengers, antioxidants, and patatins,34,35 were assigned to their own classes or to classes better reflecting their biochemical nature. We have identified and quantified several hundred previously undetected proteins, including low-abundance proteins, making this the most comprehensive description of the tuber proteome to date. The list of proteins and their relative abundance is available upon request.
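The spectral-count quantification described above can be illustrated with a minimal sketch. It assumes that relative abundance is taken as each protein's fraction of the total spectral count and that the dynamic range is expressed as the base-10 logarithm of the ratio between the largest and smallest nonzero counts. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def relative_abundance(spectral_counts):
    """Return each protein's share of the total spectral count."""
    total = sum(spectral_counts.values())
    if total == 0:
        raise ValueError("no spectra counted")
    return {protein: n / total for protein, n in spectral_counts.items()}


def dynamic_range_orders(spectral_counts):
    """Orders of magnitude between the largest and smallest nonzero count."""
    nonzero = [n for n in spectral_counts.values() if n > 0]
    return math.log10(max(nonzero) / min(nonzero))


# Hypothetical counts chosen only for illustration (not data from the study).
counts = {"patatin_A": 4000, "LOX_1": 900, "KPI_2": 99, "enolase": 1}
fractions = relative_abundance(counts)
orders = dynamic_range_orders(counts)
```

In this toy input one protein dominates the total count, which mirrors in spirit the situation described for the tuber proteome, where a small number of isoforms account for most of the measured abundance.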

DISCUSSION

We have developed a powerful yet simple concept for large-scale comparative proteome analysis that we used to quantify expressed genetic variability in tubers of 12 cultivars of potato. The basic principle of MAPA, high mass accuracy m/z measurement to define and spectral counting to quantify peptides, should prove adaptable to other researchers' specific needs. Because MAPA reduces proteomics data to two dimensions, it can overcome the problem of congestion with minimal computational power and time. Indeed, the fully quantitative comparative matrix that contained the spectral counts of all peptides measured in all of the 183 shotgun proteomics analyses was produced

Figure 6. Characterization of the potato tuber proteome. (Top left) Abundance of total proteins identified with the highest confidence from the Mascot software. (Top right) Dynamic range of quantification. Spectral count index is an index of all unique spectral count parameter values in the data set. (Bottom) Biochemical/functional classification of the proteome.

in about 1 h on a quad core personal computer. MAPA is more flexible than solutions employing stable isotope labeling because it can be applied to any number of proteomes occurring in natura. Unlike other bottom-up shotgun proteomics approaches, it does not require peptide identification for comparison, so it can be of use even when there is little sequence information available for database search. It is particularly attractive for analyzing expressed genetic variability or post-translational modification because discriminating stretches of primary structure (tryptic peptides) of highly similar proteins can be readily extracted from the data matrix as quantitative features. We therefore believe MAPA is ideal for preliminary screening in the search for molecular determinants and biomarkers. We have found that the most significant differences in the proteomes of the tubers of 12 tetraploid cultivars of S. tuberosum are protein isoforms of the patatins, LOXs, cysteine proteinase inhibitors, KPI, and type 1 and 2 proteinase inhibitors, the major gene families expressed during tuberization. While there are previous indications of this,36,37 ours is the first study that clearly distinguishes, in such detail, the proteomic phenotypes as well as the abundance of different protein isoforms of so many cultivars. The gene families are medium to small in size (64–72 genes for patatins;38 around 20 for LOXs,39 Kunitz,40 and type 2


proteinase inhibitors;41 and 10 or fewer for type I proteinase inhibitors42 per tetraploid genome). Patatins and KPI each map to a single locus with extensive sequence similarity (90% for coding regions and >80% for AA for patatins and around 80% and >77%, respectively, for two of three homology groups of KPI);40,43 LOX coding sequences in tuber are more than 96% identical.39 Our results show extensive intra- and intercultivar variability in the expression of these genes, which reflects the rapid molecular evolution of the coding and especially the promoter regions due to breeding as well as allelic polymorphism.40,43,44 The functional significance of many of these protein isoforms, particularly the effect of LOXs on tuber size, yield, and morphology, is well documented.39,45–53 Their expression is developmental and organ specific and involved in induction of tuberization and development, pathogen defense, protein storage, and JA pathways. Considering the differential intercultivar abundance, it is therefore likely that they contribute individually to the cultivar plant phenotype and to cultivar-specific traits. This is underscored by recent genetic studies in Arabidopsis. Extensive intraspecific polymorphism (the genome comprises approximately 120 megabases with a SNP density of 1 per 500 bp) and an impact on protein coding genes (the integrity of 10% of protein coding genes is strongly affected) were uncovered and could be mapped to protein classes and different plant phenotypes in genome-wide association (GWA) studies.54,55 The picture is more far-reaching in crop species such as barley and maize; for the latter it has been shown that over 50% of the genome is divergent between strains.56 The importance of natural genetic variation for phenotypic traits cannot be overstated. The effect of the environmental conditions on the two different fields during the growing season of 2006 on the tuber proteomes was slight. 
Others found that only different fertilization regimes affected tuber protein profiles, while crop protection and crop rotation had no effect.57 In our study, the agronomic practices on both fields were the same. The metabolite profiles of the tubers were also not affected.26 Similar results were reported for wheat.58 Phenotypic characteristics, however, varied depending on the fields on which the tubers were grown.26 The features that we found to define the cultivar proteomes are highly significant. It remains to determine their expression in subsequent growing seasons. Nevertheless, it seems the markers we describe are robust and may be useful as a minimal definition of the specific cultivar proteomes. To further characterize the protein isoforms and their effects on the plant phenotype, it would be of interest to measure their hydrophobicity, analyze the occurrence of possible post-translational modifications, and implement reverse genetic approaches. Measurement of tissue-specific protein abundance and depletion of the abundant protein isoforms we describe here would complete the picture of the tuber proteome; however, the latter may introduce additional experimental bias. In addition to the isoforms of the abundant tuber proteins, several lower abundance proteins were found to be discriminatory for the cultivar proteomes. Most of these are involved in carbohydrate metabolism, a process that is central to potato tuber development.50 Using four patatin isoforms as an example, we show that it is possible to determine the relative quantity of highly similar proteins that share the same peptides in tryptic digests of distinct proteomes, based on the correlation of measured peptide abundance. This is especially effective if the common peptide, and thus protein, abundance profiles are dissimilar from those of


other homologous isoforms and the fold changes in the different proteomes are pronounced; hence, there were no primary structure assignments of peptides assigned to the patatin isoform TA23357_4113 (blue) to the other three isoforms colored red. It should be noted that the abundance of these three isoforms is very similar, which illustrates the problem. However, the statistical selection of tryptic peptides can differentiate them. Conversely, even very small fold changes can be elucidated from the data with high confidence, as is the case for the former, blue-colored isoform. These results indicate that protein isoforms may be quantified from the abundance of peptides if statistical pattern recognition is applied stringently to filter and disseminate large amounts of shotgun proteomics data. We propose that a statistically determined subset of peptides should be used to infer protein abundance rather than simply the mean or median abundance of all proteotypic peptides or the three most abundant peptides. If a higher resolution of the protein species is desired, top-down techniques such as 2-DE that separate the proteins prior to MS analysis can be employed. We have compiled the largest set of tuber proteins to date. The mass spectra we used to identify them have been deposited in Promex, a public library that contains reference mass spectra of several organisms. It can be used by researchers as a database to search for peptide and protein identities with their own mass spectrometric data for added confidence in identification. We have also produced the largest high quality quantitative annotation of the tuber proteome thus far. In addition to the well-known tuber protein families, it includes a large number of previously unidentified low-abundance proteins. 
This is evident when one considers the quantitative dynamic range of the data set, which is the same as in previous large-scale shotgun proteomics studies and was achieved by exhaustive repeated sampling of the tuber proteome. The potato genome remains to be fully sequenced. Therefore, our collection of several thousand proteins representing expressed open reading frames should be an invaluable resource for further potato research.
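The idea discussed above, using peptides unique to an isoform to classify peptides shared among isoforms by their common abundance profile, can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain Pearson correlation against the mean profile of each isoform's unique peptides and a fixed correlation cutoff (`min_r`, a hypothetical parameter), whereas the study itself applied multivariate statistical procedures with a p-value threshold of 0.01. All names and numbers are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def mean_profile(profiles):
    """Element-wise mean of several profiles measured on the same samples."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def assign_shared_peptides(unique, shared, min_r=0.9):
    """Assign each shared peptide to the best-correlated isoform, or None."""
    references = {iso: mean_profile(p) for iso, p in unique.items()}
    assignment = {}
    for peptide, profile in shared.items():
        best_iso, best_r = None, min_r
        for iso, ref in references.items():
            r = pearson(profile, ref)
            if r >= best_r:
                best_iso, best_r = iso, r
        assignment[peptide] = best_iso
    return assignment


# Hypothetical spectral counts across four cultivars (illustration only).
unique = {
    "isoform_A": [[10, 20, 30, 40], [12, 18, 33, 41]],
    "isoform_B": [[40, 30, 20, 10]],
}
shared = {
    "pep_1": [5, 11, 14, 21],  # rises across cultivars, like isoform_A
    "pep_2": [22, 15, 9, 6],   # falls across cultivars, like isoform_B
    "pep_3": [7, 7, 8, 7],     # nearly flat, matches neither profile
}
result = assign_shared_peptides(unique, shared)
```

Shared peptides that do not follow any isoform profile closely enough remain unassigned, which reflects the proposal to infer protein abundance from a statistically selected subset of peptides rather than from all of them.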

AUTHOR INFORMATION

Corresponding Author

*Dr. Wolfgang Hoehenwarter, Department of Molecular Systems Biology, University of Vienna, Althanstr. 14, A-1090, Vienna. Telephone: +43-1-4277-577-02. Fax: +43-1-4277-9577. E-mail: [email protected].

ACKNOWLEDGMENT

We thank Waltraud Schulze for help with mass spectrometry and Muhammad Waqar Hameed and Stefan Kempa for insightful discussion. We thank Silke Ulrich, Ines Fehrle, and others for help with the sample preparation. We thank the BMBF/PTJ, INNOX-Quantpro, and ERANET-Plant Genomics for financial support.

REFERENCES

(1) de Godoy, L. M. F.; Olsen, J. V.; Cox, J.; Nielsen, M. L.; Hubner, N. C.; Frohlich, F.; Walther, T. C.; Mann, M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 2008, 455 (7217), 1251–U60.
(2) Klose, J.; Kobalz, U. 2-Dimensional electrophoresis of proteins: An updated protocol and implications for a functional-analysis of the genome. Electrophoresis 1995, 16 (6), 1034–1059.
(3) McCormack, A. L.; Schieltz, D. M.; Goode, B.; Yang, S.; Barnes, G.; Drubin, D.; Yates, J. R. Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the low-femtomole level. Anal. Chem. 1997, 69 (4), 767–776.
(4) Kislinger, T.; Cox, B.; Kannan, A.; Chung, C.; Hu, P. Z.; Ignatchenko, A.; Scott, M. S.; Gramolini, A. O.; Morris, Q.; Hallett, M. T.; Rossant, J.; Hughes, T. R.; Frey, B.; Emili, A. Global survey of organ and organelle protein expression in mouse: Combined proteomic and transcriptomic profiling. Cell 2006, 125 (1), 173–186.
(5) Li, X.; Yi, E.; Zhang, H.; Aebersold, R. Non-labeling liquid chromatography-mass spectrometry-based quantitative proteomics. Mol. Cell. Proteomics 2005, 4 (8), S324–S324.
(6) Old, W. M.; Meyer-Arendt, K.; Aveline-Wolf, L.; Pierce, K. G.; Mendoza, A.; Sevinsky, J. R.; Resing, K. A.; Ahn, N. G. Comparison of label-free methods for quantifying human proteins by shotgun proteomics. Mol. Cell. Proteomics 2005, 4 (10), 1487–1502.
(7) Ono, M.; Shitashige, M.; Honda, K.; Isobe, T.; Kuwabara, H.; Matsuzuki, H.; Hirohashi, S.; Yamada, T. Label-free quantitative proteomics using large peptide data sets generated by nanoflow liquid chromatography and mass spectrometry. Mol. Cell. Proteomics 2006, 5 (7), 1338–1347.
(8) Patwardhan, A. J.; Strittmatter, E. F.; David, G. C.; Smith, R. D.; Pallavicini, M. G. Quantitative proteome analysis of breast cancer cell lines using O-18-labeling and an accurate mass and time tag strategy. Proteomics 2006, 6 (9), 2903–2915.
(9) Wiener, M. C.; Sachs, J. R.; Deyanova, E. G.; Yates, N. A. Differential mass spectrometry: A label-free LC-MS method for finding significant differences in complex peptide and protein mixtures. Anal. Chem. 2004, 76 (20), 6085–6096.
(10) Zhang, H.; Yi, E. C.; Li, X. J.; Mallick, P.; Kelly-Spratt, K. S.; Masselon, C. D.; Camp, D. G.; Smith, R. D.; Kemp, C. J.; Aebersold, R. High throughput quantitative analysis of serum proteins using glycopeptide capture and liquid chromatography mass spectrometry. Mol. Cell. Proteomics 2005, 4 (2), 144–155.
(11) Cox, J.; Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008, 26 (12), 1367–1372.
(12) Nesvizhskii, A. I.; Aebersold, R. Interpretation of shotgun proteomic data: The protein inference problem. Mol. Cell. Proteomics 2005, 4 (10), 1419–1440.
(13) Hoehenwarter, W.; van Dongen, J. T.; Wienkoop, S.; Steinfath, M.; Hummel, J.; Erban, A.; Sulpice, R.; Regierer, B.; Kopka, J.; Geigenberger, P.; Weckwerth, W. A rapid approach for phenotype-screening and database independent detection of cSNP/protein polymorphism using mass accuracy precursor alignment. Proteomics 2008, 8 (20), 4214–4225.
(14) Perkins, D. N.; Pappin, D. J. C.; Creasy, D. M.; Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20 (18), 3551–3567.
(15) Tsur, D.; Tanner, S.; Zandi, E.; Bafna, V.; Pevzner, P. A. Identification of post-translational modifications by blind search of mass spectra. Nat. Biotechnol. 2005, 23 (12), 1562–1567.
(16) Slawski, M.; Daumer, M.; Boulesteix, A. L. CMA: a comprehensive Bioconductor package for supervised classification with high dimensional data. BMC Bioinf. 2008, 9, 439.
(17) Saeed, A. I.; Sharov, V.; White, J.; Li, J.; Liang, W.; Bhagabati, N.; Braisted, J.; Klapa, M.; Currier, T.; Thiagarajan, M.; Sturn, A.; Snuffin, M.; Rezantsev, A.; Popov, D.; Ryltsov, A.; Kostukovich, E.; Borisovsky, I.; Liu, Z.; Vinsavich, A.; Trush, V.; Quackenbush, J. TM4: A free, open-source system for microarray data management and analysis. BioTechniques 2003, 34 (2), 374–+.
(18) Daub, C. O.; Kloska, S.; Selbig, J. MetaGeneAlyse: analysis of integrated transcriptional and metabolite data. Bioinformatics 2003, 19 (17), 2332–2333.
(19) Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46 (1–3), 389–422.
(20) Theilhaber, J.; Connolly, T.; Roman-Roman, S.; Bushnell, S.; Jackson, A.; Call, K.; Garcia, T.; Baron, R. Finding genes in the C2C12 osteogenic pathway by k-nearest-neighbor classification of expression data. Genome Res. 2002, 12 (1), 165–176.
(21) Breiman, L. Random forests. Mach. Learn. 2001, 45 (1), 5–32.
(22) Clauser, K. R.; Baker, P.; Burlingame, A. L. Role of accurate mass measurement (±10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem. 1999, 71 (14), 2871–2882.
(23) Zubarev, R. A.; Hakansson, P.; Sundqvist, B. Accuracy requirements for peptide characterization by monoisotopic molecular mass measurements. Anal. Chem. 1996, 68 (22), 4060–4063.
(24) Liu, H. B.; Sadygov, R. G.; Yates, J. R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004, 76 (14), 4193–4201.
(25) Wienkoop, S.; Larrainzar, E.; Niemann, M.; Gonzalez, E. M.; Lehmann, U.; Weckwerth, W. Stable isotope-free quantitative shotgun proteomics combined with sample pattern recognition for rapid diagnostics. J. Sep. Sci. 2006, 29 (18), 2793–2801.
(26) Steinfath, M.; Strehmel, N.; Peters, R.; Schauer, N.; Groth, D.; Hummel, J.; Steup, M.; Selbig, J.; Kopka, J.; Geigenberger, P.; van Dongen, J. T. Discovering plant metabolic biomarkers for phenotype prediction using an untargeted approach. Plant Biotechnol. J. 2010, 8 (8), 900–911.
(27) Silva, J. C.; Gorenstein, M. V.; Li, G. Z.; Vissers, J. P. C.; Geromanos, S. J. Absolute quantification of proteins by LCMSE - A virtue of parallel MS acquisition. Mol. Cell. Proteomics 2006, 5 (1), 144–156.
(28) Elias, J. E.; Haas, W.; Faherty, B. K.; Gygi, S. P. Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods 2005, 2 (9), 667–675.
(29) Olsen, J. V.; Ong, S. E.; Mann, M. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteomics 2004, 3 (6), 608–614.
(30) Zubarev, R.; Mann, M. On the proper use of mass accuracy in proteomics. Mol. Cell. Proteomics 2007, 6 (3), 377–381.
(31) Elias, J. E.; Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 2007, 4 (3), 207–214.
(32) Ishihama, Y.; Oda, Y.; Tabata, T.; Sato, T.; Nagasu, T.; Rappsilber, J.; Mann, M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 2005, 4 (9), 1265–1272.
(33) de Godoy, L. M. F.; Olsen, J. V.; de Souza, G. A.; Li, G. Q.; Mortensen, P.; Mann, M. Status of complete proteome analysis by mass spectrometry: SILAC labeled yeast as a model system. Genome Biol. 2006, 7 (6).
(34) Ryan, C. A. Protease inhibitors in plants: Genes for improving defenses against insects and pathogens. Annu. Rev. Phytopathol. 1990, 28, 425–449.
(35) Strickland, J. A.; Orr, G. L.; Walsh, T. A. Inhibition of diabrotica larval growth by patatin, the lipid acyl hydrolase from potato-tubers. Plant Physiol. 1995, 109 (2), 667–674.
(36) Bauw, G.; Nielsen, H. V.; Emmersen, J.; Nielsen, K. L.; Jorgensen, M.; Welinder, K. G. Patatins, Kunitz protease inhibitors and other major proteins in tuber of potato cv. Kuras. FEBS J. 2006, 273 (15), 3569–3584.
(37) Lehesranta, S. J.; Davies, H. V.; Shepherd, L. V. T.; Nunan, N.; McNicol, J. W.; Auriola, S.; Koistinen, K. M.; Suomalainen, S.; Kokko, H. I.; Karenlampi, S. O. Comparison of tuber proteomes of potato varieties, landraces, and genetically modified lines. Plant Physiol. 2005, 138 (3), 1690–1699.
(38) Twell, D.; Ooms, G. Structural diversity of the patatin gene family in potato Cv desiree. Mol. Gen. Genet. 1988, 212 (2), 325–336.
(39) Kolomiets, M. V.; Hannapel, D. J.; Chen, H.; Tymeson, M.; Gladon, R. J. Lipoxygenase is involved in the control of potato tuber development. Plant Cell 2001, 13 (3), 613–626.
(40) Heibges, A.; Glaczinski, H.; Ballvora, A.; Salamini, F.; Gebhardt, C. Structural diversity and organization of three gene families for Kunitz-type enzyme inhibitors from potato tubers (Solanum tuberosum L.). Mol. Genet. Genomics 2003, 269 (4), 526–534.
(41) Keil, M.; Sanchezserrano, J. J.; Willmitzer, L. Both Wound-Inducible and Tuber-Specific Expression Are Mediated by the Promoter of a Single Member of the Potato Proteinase Inhibitor-II gene family. EMBO J. 1989, 8 (5), 1323–1330.
(42) Cleveland, T. E.; Thornburg, R. W.; Ryan, C. A. Molecular Characterization of a Wound-Inducible Inhibitor-I Gene from Potato and the Processing of Its Messenger-Rna and Protein. Plant Mol. Biol. 1987, 8 (3), 199–207.
(43) Ganal, M. W.; Bonierbale, M. W.; Roeder, M. S.; Park, W. D.; Tanksley, S. D. Genetic and physical mapping of the patatin genes in potato and tomato. Mol. Gen. Genet. 1991, 225 (3), 501–509.
(44) Stupar, R. M.; Beaubien, K. A.; Jin, W. W.; Song, J. Q.; Lee, M. K.; Wu, C. C.; Zhang, H. B.; Han, B.; Jiang, J. M. Structural diversity and differential transcription of the patatin multicopy gene family during potato tuber development. Genetics 2006, 172 (2), 1263–1275.
(45) Koda, Y.; Kikuta, Y.; Tazaki, H.; Tsujino, Y.; Sakamura, S.; Yoshihara, T. Potato tuber-inducing activities of jasmonic acid and related-compounds. Phytochemistry 1991, 30 (5), 1435–1438.
(46) Castro, G.; Kraus, T.; Abdala, G. Endogenous jasmonic acid and radial cell expansion in buds of potato tubers. J. Plant Physiol. 1999, 155 (6), 706–710.
(47) Jackson, S. D. Multiple signaling pathways control tuber induction in potato. Plant Physiol. 1999, 119 (1), 1–8.
(48) Hendriks, T.; Vreugdenhil, D.; Stiekema, W. J. Patatin and 4 serine proteinase-inhibitor genes are differentially expressed during potato-tuber development. Plant Mol. Biol. 1991, 17 (3), 385–394.
(49) Weeda, S. M.; Mohan Kumar, G. N.; Richard Knowles, N. Developmentally linked changes in proteases and protease inhibitors suggest a role for potato multicystatin in regulating protein content of potato tubers. Planta 2009, 230 (1), 73–84.
(50) Fernie, A. R.; Willmitzer, L. Molecular and biochemical triggers of potato tuber development. Plant Physiol. 2001, 127 (4), 1459–1465.
(51) Bachem, C. W.; van der Hoeven, R. S.; de Bruijn, S. M.; Vreugdenhil, D.; Zabeau, M.; Visser, R. G. Visualization of differential gene expression using a novel method of RNA fingerprinting based on AFLP: analysis of gene expression during potato tuber development. Plant J. 1996, 9 (5), 745–753.
(52) Hughes, R. K.; West, S. I.; Hornostaj, A. R.; Lawson, D. M.; Fairhurst, S. A.; Sanchez, R. O.; Hough, P.; Robinson, B. H.; Casey, R. Probing a novel potato lipoxygenase with dual positional specificity reveals primary determinants of substrate binding and requirements for a surface hydrophobic loop and has implications for the role of lipoxygenases in tubers. Biochem. J. 2001, 353 (Pt. 2), 345–355.
(53) Royo, J.; Vancanneyt, G.; Perez, A. G.; Sanz, C.; Stormann, K.; Rosahl, S.; Sanchez-Serrano, J. J. Characterization of three potato lipoxygenases with distinct enzymatic activities and different organ-specific and wound-regulated expression patterns. J. Biol. Chem. 1996, 271 (35), 21012–21019.
(54) Atwell, S.; Huang, Y. S.; Vilhjalmsson, B. J.; Willems, G.; Horton, M.; Li, Y.; Meng, D. Z.; Platt, A.; Tarone, A. M.; Hu, T. T.; Jiang, R.; Muliyati, N. W.; Zhang, X.; Amer, M. A.; Baxter, I.; Brachi, B.; Chory, J.; Dean, C.; Debieu, M.; de Meaux, J.; Ecker, J. R.; Faure, N.; Kniskern, J. M.; Jones, J. D. G.; Michael, T.; Nemri, A.; Roux, F.; Salt, D. E.; Tang, C. L.; Todesco, M.; Traw, M. B.; Weigel, D.; Marjoram, P.; Borevitz, J. O.; Bergelson, J.; Nordborg, M. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature 2010, 465 (7298), 627–631.
(55) Clark, R. M.; Schweikert, G.; Toomajian, C.; Ossowski, S.; Zeller, G.; Shinn, P.; Warthmann, N.; Hu, T. T.; Fu, G.; Hinds, D. A.; Chen, H. M.; Frazer, K. A.; Huson, D. H.; Schoelkopf, B.; Nordborg, M.; Raetsch, G.; Ecker, J. R.; Weigel, D. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science 2007, 317 (5836), 338–342.
(56) Morgante, M. Plant genome organisation and diversity: the year of the junk! Curr. Opin. Biotechnol. 2006, 17 (2), 168–173.
(57) Lehesranta, S. J.; Koistinen, K. M.; Massat, N.; Davies, H. V.; Shepherd, L. V. T.; McNicol, J. W.; Cakmak, I.; Cooper, J.; Luck, L.; Karenlampi, S. O.; Leifert, C. Effects of agricultural production systems and their components on protein profiles of potato tubers. Proteomics 2007, 7, 597–604.
(58) Zorb, C.; Langenkamper, G.; Betsche, T.; Niehaus, K.; Barsch, A. Metabolite profiling of wheat grains (Triticum aestivum L.) from organic and conventional agriculture. J. Agric. Food Chem. 2006, 54 (21), 8301–8306.
