
Article

A targeted proteomics approach for precision plant breeding
Aakash Chawade, Erik Alexandersson, Therese Bengtsson, Erik Andreasson, and Fredrik Levander
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.5b01061 • Publication Date (Web): 25 Dec 2015
Downloaded from http://pubs.acs.org on January 1, 2016


Journal of Proteome Research is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Journal of Proteome Research

A targeted proteomics approach for precision plant breeding

Aakash Chawade1, Erik Alexandersson2, Therese Bengtsson2#, Erik Andreasson2§, Fredrik Levander1,3§*

1 Department of Immunotechnology, Lund University, Sweden
2 Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Alnarp, Sweden
3 Bioinformatics Infrastructure for Life Sciences (BILS), Lund University, Sweden

# Current address: Department of Plant Breeding, Swedish University of Agricultural Sciences, Alnarp, Sweden
§ These authors contributed equally.
* Corresponding author

Email: [email protected]; Phone: +46 46 222 38 35
Running title: Targeted proteomics for plant breeding
Key words: Marker Assisted Selection, MAS, SRM, breeding, proteomics, potato, secretome, Solanum tuberosum, PASS00692

1 ACS Paragon Plus Environment


Abstract

Selected reaction monitoring (SRM) is a targeted mass spectrometry technique that enables precise quantitation of hundreds of peptides in a single run. This technique provides new opportunities for multiplexed protein biomarker measurements. For precision plant breeding, DNA-based markers have been used extensively, but the potential of protein biomarkers has not been exploited. In this work, we developed an SRM marker panel with assays for 104 potato (Solanum tuberosum) peptides selected using univariate and multivariate statistics. Thereafter, using random forest classification, prediction markers were identified for Phytophthora infestans resistance in leaves, P. infestans resistance in tubers, and plant yield in potato leaf secretome samples. The results suggest that the marker panel has predictive potential for three traits, two of which have no commercial DNA markers so far. Furthermore, the marker panel was tested and found to be applicable to potato clones not used during marker development. The proposed workflow is thus a proof-of-concept for targeted proteomics as an efficient readout in accelerated breeding for complex and agronomically important traits.

Introduction

Targeted proteomics using SRM is a method of choice for studies that require quantification of many proteins with high accuracy in large sample cohorts.1 SRM is a highly sensitive mass spectrometry technique that enables detection and quantification of pre-selected peptides in complex biological samples. The triple quadrupole mass spectrometer used for SRM allows mass-based separation of pre-selected precursor ions (peptides) and quantification of the desired transitions (fragments) generated from the precursor ions. Thus, an SRM assay involves quantification of the desired transitions, which act as a proxy for the measurement of the corresponding precursor ions.
At least two, and optimally four to six, transitions from each peptide


are quantified to estimate the intensity of the corresponding peptide. SRM assays are highly reproducible within and across laboratories and between different instrument platforms.2 The limit of quantification (LOQ) of the assays can be as low as 0.66 fmol/µL in plasma samples2 and the detection levels for various proteins as low as 50 copies/cell in total protein extracts of S. cerevisiae.3 The high accuracy and sensitivity of SRM, together with high reproducibility and short data-interpretation time, make it the technology of choice for high-throughput quantitative analysis of proteins in hundreds of samples. The analytical performance of SRM has even been shown to be adequate for multiplexed biomarker validation in plasma, exemplified by triplicate measurements of 88 peptides in 80 plasma samples.4

SRM can also be used for identification and analysis of protein-based quantitative trait loci (QTL), leading to an understanding of molecular mechanisms that operate at the post-transcriptional level. In a previous study, 48 proteins were quantified using SRM in 76 samples from a cross between two S. cerevisiae strains.5 Of the 48 proteins in the assay, 28 were found to be regulated by at least one protein quantitative trait locus.5 In another yeast study, it was shown that QTL influencing protein abundance differed from those influencing transcript abundance.6 Thus, identification of protein-based QTL may allow selection for traits not easily identifiable using traditional DNA-based assays.

For marker-assisted plant breeding, QTL at the genomic level are frequently used to predict phenotypes.7 However, molecular mechanisms underlying important agronomic traits are complex and involve several layers of regulation, including gene regulation, post-translational modifications and protein interactions. Therefore, for many traits, robust and simple DNA-based markers do not exist.
Gene transcription and translation are intermediate steps between genetic perturbations and phenotypic variation, and transcriptome analysis has thus been extensively used


for predicting phenotypes in mapping populations.8 Quantitative proteomics facilitates analysis of protein abundance, post-translational modification, protein-protein interaction and cellular localization. Thus, for complex traits involving protein modifications and changes in protein abundance, quantitative protein measurements can serve as markers in breeding. Proteomics based on 2D gel electrophoresis allowed the development of protein linkage maps in maize and wheat, thus demonstrating the applicability of protein-based markers in plant breeding.9-12

Some proteins, together with other molecules, are secreted into the apoplast and are collectively referred to as the secretome.13 The secreted proteins in the apoplast play a key role in cell signaling and defense, as they are usually the first point of contact between a plant cell and the invading pathogen.14 Upon detecting stress, secreted proteins propagate the signal to the membrane receptors and eventually to the nucleus through a battery of different signaling pathways. The effectors introduced into the plant cell by the pathogens are recognized by the R proteins, resulting in a hypersensitive response and programmed cell death.15 In the nucleus, regulation of gene expression leads to rebalancing of the levels of mRNA and proteins required for addressing the stress. The stress signaling also causes post-translational modifications (PTM) of various proteins, thereby either activating them or marking them for degradation. The changes occurring at the protein level in most cases do not correlate with the mRNA or miRNA levels.16 Thus, for complex traits involving protein modification and changes in abundance, quantitative protein estimation plays an important role.

In this proof-of-concept study, we evaluated the feasibility of peptide-based marker-assisted plant breeding by identifying putative peptide markers based on secretome proteomics. The primary


trait of interest was resistance to the oomycete P. infestans, the pathogen causing late blight, which leads to global financial losses of over €5 billion per annum.17

Experimental Procedures

Plant growth and treatment

SW93-1015 is used for resistance breeding and is resistant to late blight, while Desirée is susceptible.18, 19 In our previous work, the two potato clones were infected with P. infestans, and samples in biological triplicates were collected before infection and 6, 24 and 72 hours post infection.20 Thereafter, the apoplastic secretome from the collected leaf samples was extracted and processed for shotgun mass spectrometry as described earlier.18 In this work, the raw data from that study were re-analyzed as described in the subsection “Selection of putative peptide markers”.

A segregating population of 59 progeny clones from a cross between Desirée and SW93-1015 was developed19 and used as a training set. The population was grown in the field in South Sweden (Borgeby) according to Good Experimental Practice consistent with EU directive 93/71, KIFS 2004:4, STAFS 2001:1 and SOP SLU 2004, including weekly fungicide treatments (Ranman, Infinito or Revus at different times over the season). Leaf secretome samples were collected21 in the field with a minimum of two biological replicates from four-week-old plants with no visible symptoms of P. infestans infection.

Sample preparation for mass spectrometry was done by first purifying the apoplastic fluid by ultrafiltration in Amicon Ultra 0.5 mL 10K (Millipore) units and then, after trypsin digestion, desalting with UltraMicro spin columns (Silica C18, SUM SS18 V, Nest group). First, 30 µl apoplastic fluid was diluted with 120 µl wash solution (8 M urea, 100 mM ammonium carbonate), applied to the Amicon filter, and centrifuged for 30 minutes at 14,000 g. The filter was washed with 120 µl wash solution and centrifuged again. To elute the


retentate, the Amicon filter device was placed upside down in a clean tube and centrifuged for 2 min at 1000 g. After trypsin digestion, UltraMicro Spin Columns were first cleaned with 50 µl ACN, then washed twice with 50 µl 70% ACN, 5% FA and, before adding the sample, equilibrated twice with 50 µl 5% FA. All washes and elutions were done for 1 min at 1200 g. The acidified sample (FA to a final concentration of 5%) was added to the column and washed 4 times with 50 µl 5% FA. Elution was done twice with 50 µl 70% ACN, 5% FA. Finally, 25 µl of water was added to the sample and the samples were concentrated in a SpeedVac to remove ACN, but not to complete dryness.

Phenotypic analysis of the segregating population

Analyses of five different phenotypic traits were performed for the segregating population as follows. P. infestans resistance in leaves was scored as described earlier.18 P. infestans resistance in tubers was screened by incubating tuber halves with P. infestans 88069 (400 spores/tuber half) for 9 days at 18 °C, and thereafter measuring the diameter of 8 inoculations per genotype. A score of zero indicates no mycelium growth on the tubers and a score of 3 indicates tubers covered with mycelium. Yield was measured as kilograms per plant from ten plants. Tuber cooking type was scored with classification “A” for firm potato and “D” for more mealy potato, with the intermediate classifications “AB”, “BC”, “C”, “BD” and “CD”, as described earlier.22 Time of emergence was scored in the field as the number of visible shoots per plant and then as the percentage of tubers that showed shoots.

Selection of putative peptide markers

Putative peptide markers were selected based on the shotgun proteomics data published earlier.20 The raw shotgun data files were processed in Proteios Software Environment23 as described


earlier.20 The data were then normalized using the Loess regression method24 (Loess-G) selected using Normalyzer.25 Peptide markers were selected if they fulfilled any of the following criteria: i) peptides statistically significantly differentially expressed (adjusted p-value < 0.05) in SW93-1015, Desirée or Sarpo Mira upon infection at any of the three post-infection time-points compared to the respective controls; ii) peptides detected in only one of the clones at a given time-point; iii) peptides from proteins with molecular function relevant to biotic stress; iv) peptides among the top candidates in the comparison of uninfected samples from Desirée and SW93-1015 by Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) in SIMCA version 13.0.3 (Umetrics, Sweden). The OPLS-DA model was auto-fit and had three significant components (2 orthogonal and 1 predictive) with R2X 0.83 and Q2 0.95. The peptides with a Variable Importance in Projection (VIP) score above 1.0 were selected.

Thereafter, SRM assays were set up with a preliminary set of peptides selected based on the above criteria to identify peptides which could be reliably measured. SRM assays were initially developed using an unscheduled method and on samples pooled from pre- and post-infection time-points from each potato clone. SRM results were manually curated in Skyline26 and peptides were retained if at least four transitions could be reliably measured for each peptide and if the transitions showed single co-eluting peaks with minimum interference and background noise. A list of 104 peptides was thereafter finalized for the main study.

Mass spectrometry

Synthetic crude tryptic peptides (SpikeTides™ proteotypic peptides) for 104 candidates were obtained from JPT Peptide Technologies GmbH. The lyophilized peptides (~50 nmol/well) were dissolved by adding 5 µL concentrated formic acid (JT Baker) to each well, followed by 5 µL LiChrosolv water (Merck Millipore) and 300 µL 30% acetonitrile.
Upon incubation in a sonication


bath for 5 min, peptides were pooled and dissolved to a final concentration of 2 pmol/µL in 0.1% formic acid. LC separation for both shotgun and targeted proteomics was performed using an Eksigent nanoLC system (Eksigent Technologies, Dublin, CA) with mobile phases A) 0.1% formic acid and B) acetonitrile with 0.1% formic acid. The mass spectra for the synthetic peptides were obtained using an LTQ Orbitrap XL ETD mass spectrometer (Thermo). For each run, 2 µL of sample was injected using an LC mobile phase gradient as described earlier.27 Targeted acquisition by SRM was performed on a TSQ Quantum Vantage (Thermo-Fisher Scientific, Waltham, MA) triple quadrupole instrument using the same gradient and LC separation conditions as for the Orbitrap instrument. Up to four transitions for each peptide were selected in the Skyline software version 2.5. Scheduled runs were performed based on the retention times from the pooled synthetic peptides spiked into potato secretome background. Any shift in retention time imposed by the complex background in the samples was adjusted based on the scheduled runs of the spiked-in synthetic peptides. The SRM dataset is deposited in PASSEL28 with ID PASS00692.

Data analysis and model prediction

The generated raw SRM data files were analyzed in Skyline version 2.5.26 For each peptide and sample, the total area was estimated by summing the peak areas of all transitions for the given peptide and was exported from Skyline. The data were then normalized using Loess regression24 applied to replicates (Loess-R) using Normalyzer.25 The arithmetic mean of the peptide intensity from replicates was used as a representative value for each clone in the segregating population. Samples were removed from further analysis if identified as outliers in a PCA plot or if they had no biological replicates. Late blight resistance and cooking type phenotypes were of qualitative nature.
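The per-peptide summation and replicate averaging described under data analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the function names and the example values are hypothetical, and the Loess-R normalization step that the study applies between summation and averaging is deliberately omitted.

```python
from collections import defaultdict
from statistics import mean


def total_peptide_areas(transition_areas):
    """Sum transition peak areas into one total area per (sample, peptide).

    transition_areas maps (sample, peptide, transition) -> peak area.
    The layout is an assumption, not the Skyline export format.
    """
    totals = defaultdict(float)
    for (sample, peptide, _transition), area in transition_areas.items():
        totals[(sample, peptide)] += area
    return dict(totals)


def clone_means(totals, sample_to_clone):
    """Average total areas over the replicate samples of each clone."""
    grouped = defaultdict(list)
    for (sample, peptide), area in totals.items():
        grouped[(sample_to_clone[sample], peptide)].append(area)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical example: two replicates of one clone, one peptide, two transitions.
example_areas = {
    ("rep1", "Pm6", "y4"): 10.0,
    ("rep1", "Pm6", "y5"): 20.0,
    ("rep2", "Pm6", "y4"): 30.0,
    ("rep2", "Pm6", "y5"): 40.0,
}
example_totals = total_peptide_areas(example_areas)
example_means = clone_means(example_totals, {"rep1": "clone_A", "rep2": "clone_A"})
```

In a real analysis the normalized values, not the raw totals, would be passed to the averaging step.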
The late blight resistance trait was recorded in two groups (“S” and “R”) and was used as


such for further analysis. The cooking type trait was recorded in seven groups; however, only groups “A” and “B” had at least 20 samples each, and thus only samples from these two groups were used for classification. The phenotypes yield, emergence and P. infestans resistance in tubers were of quantitative nature and were thus divided into two groups based on the mean of the overall distribution of each phenotype. Selection of a suitable classification algorithm and the number of features was done using the nlcv package29 in R/Bioconductor version 3.1.2.30 The nlcv function was run using the following parameters: nRuns:1000, fsMethod:randomForest, classdist:unbalanced, ntree:20,000. The package performs model training and testing using five different classifiers, namely Diagonal Linear Discriminant Analysis (dlda), randomForest (RF), bagging (bagg), pam and Support Vector Machines (svm). The nlcv package implements nested cross-validation, wherein the inner cross-validation is used to tune the model parameters and the outer cross-validation is used to estimate the misclassification rate and the top features. For model training, the training data set was randomly split 1000 times (nRuns:1000) into 2/3 training and 1/3 test sets. The R scripts used in this project will be provided upon request.

Results

The primary requirement for SRM analysis is the selection of peptides, and in the case of plant breeding, the most relevant peptides are those with differing expression levels in cultivars with contrasting phenotypes. In this study, late blight (caused by P. infestans) resistance (both in tubers and foliage) was the primary trait of interest, and we hypothesized that peptides or proteins with differing levels between resistant and susceptible clones were the most promising markers.
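The two-group split of quantitative phenotypes and the repeated 2/3 training, 1/3 test partitioning described under data analysis can be sketched as follows. This is a minimal Python illustration, not the nlcv implementation used in the study: the function names are hypothetical, the handling of values exactly equal to the mean is an assumption, and the inner cross-validation loop for parameter tuning is left out.

```python
import random
from statistics import mean


def binarize_by_mean(values):
    """Label each clone "high" or "low" relative to the overall mean.

    Assumption: values equal to the mean are labelled "low".
    """
    cutoff = mean(values.values())
    return {clone: "high" if v > cutoff else "low" for clone, v in values.items()}


def repeated_splits(clones, n_runs, seed=0):
    """Yield n_runs random (train, test) splits of roughly 2/3 and 1/3."""
    rng = random.Random(seed)
    ordered = sorted(clones)
    n_train = round(len(ordered) * 2 / 3)
    for _ in range(n_runs):
        shuffled = ordered[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Hypothetical yield values (kg/plant) for three clones.
example_labels = binarize_by_mean({"c1": 1.0, "c2": 2.0, "c3": 6.0})
example_splits = list(repeated_splits(["a", "b", "c", "d", "e", "f"], n_runs=3))
```

Each outer split would then be used to train a classifier on the training part and to record the misclassification rate on the test part.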
Thus, a new strategy was devised which involved a) infection of resistant and susceptible plants, b) secretome isolation, c) shotgun proteomics of the secretome, d) peptide marker selection by univariate and multivariate analysis, e) SRM assay development with synthetic peptides, f)


development of a bi-parental crossing population and a machine learning classifier model for phenotypic prediction based on the generated SRM data (Figure 1). Steps a-c were performed previously,20 wherein secretome proteome analyses were performed at four different time-points (before infection, and 6, 24 and 72 hours post-infection) on P. infestans infected leaves of Desirée (susceptible to late blight) and SW93-1015 (resistant to late blight).19, 20 In this work, the shotgun proteomics raw data were re-analyzed to identify potential peptide markers which showed differential expression under the studied conditions.

As the development of an SRM marker panel requires pre-selection of the peptides to be measured, we employed several parallel methods to generate a candidate peptide list. The peptide selection was done by pair-wise comparison of samples from pre- and post-infection time points and by multivariate analysis of uninfected samples from Desirée and SW93-1015 by Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA). Peptides were also selected based on the molecular function of the corresponding proteins. This resulted in the selection of over 500 peptides as potential candidates for marker selection. The peptides were initially quality controlled by performing SRM assays of the corresponding peptides in biological samples, and peptides which were unreliable for SRM analysis were removed by manual curation. Peptides were considered unreliable if they produced double peaks, showed background interference or had fewer than four clearly detectable transitions. Finally, 104 peptides were selected (Table S-1), the corresponding SRM assays were developed and the retention times were estimated using synthetic peptides, resulting in a peptide marker panel with 104 peptides (Table S-2).

The effectiveness of the peptide marker panel was first tested using SW93-1015 and Desirée secretome samples from plants infected with P. infestans under controlled conditions.
The principal component analysis (PCA) plot shows a good separation between the untreated and


treated samples from a given clone and between samples from the susceptible and resistant clones (Figure 2a). This suggests that the marker panel is efficient at grouping samples from the susceptible and resistant clones into separate clusters irrespective of their infection stage. It could be inferred from the heatmap that the peptide intensities differed both between the time-points and between clones, and that most of the selected peptides had lower intensity in SW93-1015 controls compared to Desirée controls (Figure 2b).

As the main aim of this assay was to assess its suitability in breeding, we next analyzed the marker panel in a bi-parental segregating population. Segregating lines from a cross between SW93-1015 and Desirée were grown in a field in South Sweden. Leaf secretome samples were collected from four-week-old plants. No plants showed any symptoms of late blight at the time of sampling. The samples were thereafter analyzed by SRM using the marker panel with 104 peptides.

A suitable classification model for P. infestans resistance in leaves was selected based on the evaluation of the Mis-Classification Rate (MCR) from five different models estimated by the nlcv package29 (Figure S-1). For model training, the data were split into a 2/3 training and a 1/3 validation set. This was iterated 1000 times and the performance of the trained model was estimated based on its prediction of the validation dataset during each iteration. The random forest (RF) model for feature selection had the lowest MCR of 0.24 for eight peptide markers, an odds ratio (OR) of 16.2 and an area under the curve (AUC) of 0.84 (Table 1 and Table S-3). The model was considered significant as the 99% confidence interval (CI) [3.1-85.5] (denoted as ORCI) of the odds ratio did not include the null value of 1.0.31 All of the eight peptide markers identified by the model had lower mean expression values in the resistant lines compared to the susceptible ones (Figure 3a).
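To make the significance criterion concrete, the sketch below computes a misclassification rate and an odds ratio with a confidence interval from a 2x2 table of predicted versus observed classes. It is a hedged illustration only: the interval uses the standard log-odds (Woolf) approximation, which is an assumption here because the text does not state which method produced the reported interval, and the counts in the example are hypothetical rather than taken from the study.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def misclassification_rate(tp, fp, fn, tn):
    """Fraction of samples assigned to the wrong class."""
    return (fp + fn) / (tp + fp + fn + tn)


def odds_ratio_ci(tp, fp, fn, tn, level=0.99):
    """Odds ratio with a Woolf (log-odds) confidence interval.

    Assumption: all four cells are non-zero; a zero cell raises
    ZeroDivisionError and would need a continuity correction.
    """
    odds_ratio = (tp * tn) / (fp * fn)
    se_log_or = sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical confusion-table counts, for illustration only.
example_mcr = misclassification_rate(tp=20, fp=5, fn=5, tn=20)
example_or, example_lower, example_upper = odds_ratio_ci(tp=20, fp=5, fn=5, tn=20)
# A model would be called significant here if example_lower > 1.0.
```

The decision rule mirrors the one in the text: the model counts as significant when the lower bound of the interval stays above the null value of 1.0.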
Although the difference in the levels of the eight markers between the susceptible and resistant lines was small, it was nevertheless statistically


significant (Table S-4) and the overall pattern as used by the machine learning model is clearly discriminative between the groups. The eight markers map to ten different proteins since one peptide mapped to three homologous proteins (Table S-4).

To further characterize these eight peptide markers, we performed the SRM assays on the secretome samples collected during a previous study.20 The samples were obtained from the clones Desirée, SW93-1015 and Sarpo Mira at the infection time-points 0 h, 6 h, 24 h and 3 d as described earlier.20 The markers for P. infestans resistance in leaves showed differing intensity patterns in the pre- and post-infection time-points from the three potato clones (Figure 3b). Marker Pm6 (pathogenesis related protein - PR-1) was highly induced 3 days post-infection in the susceptible clone Desirée but only moderately induced after 3 days in the resistant clones Sarpo Mira and SW93-1015. Pm89 (endochitinase), Pm101 (Serine protease), Pm16 (Serine-threonine protein kinase) and Pm15 (glycosyl hydrolase) had lower constitutive levels in SW93-1015 compared to Desirée and were up-regulated in SW93-1015 upon infection. Also, the expression profiles of these four peptides correlated across the four infection time-points in SW93-1015. However, the expression patterns differed in Desirée, indicating differences in the upstream genetic pathways regulating the underlying genes. Markers Pm49 (acetyl esterase) and Pm52 (Leucine-rich repeat family protein) were both induced upon infection in SW93-1015 but were not induced in Desirée.

The eight peptide markers selected for the prediction of P. infestans resistance in leaves are from genes located on four different chromosomes (Figure S-2). Interestingly, several of these peptides come from protein families with pathogenesis-related function and they were expressed at higher levels in the susceptible clones, suggesting that these proteins could be susceptibility factors.
We recently cloned a classic R gene with a nucleotide binding (NB) domain and a leucine-rich


repeat (LRR) domain conferring resistance to leaf late blight in SW93-1015.32 From our SRM results, it could be hypothesized that, in the field, even without any visible infection symptoms in the susceptible lines, this resistance gene is presumably partly activated and that this resistance signaling involves repression of the marker peptides.

Apart from P. infestans resistance in leaves, four other phenotypic traits were measured in the segregating population (Table 1). RF modeling for feature selection was then implemented on the remaining four traits to evaluate the feasibility of developing markers based on the existing data for these traits (Table 1). Of the four traits, only two, P. infestans resistance in tubers and plant yield (kg/plant), were significant, as the 99% ORCI of the two models did not include the null value of 1.0 (Table 1). RF prediction models for both P. infestans resistance in tubers and yield were obtained with eight peptides each (Figure 4a-4b, Table S-5 and Table S-6). For P. infestans resistance in tubers, four peptides were in close proximity on chromosome 6, indicating a putative hotspot for tuber resistance (Figure S-2), a trait for which no robust DNA resistance markers are available today. Out of the eight peptide markers for P. infestans resistance in tubers, two peptides (Pm46 and Pm34) belonged to Kunitz trypsin inhibitors and had lower levels in the resistant lines. Peptides Pm64 (Beta-D-glucan exohydrolase) and Pm57 (peroxidase) had higher levels in the resistant lines. Pm22 (cystatin), Pm40 (Non-specific lipid-transfer protein), and Pm74 (Serine carboxypeptidase III) all had lower levels in the resistant lines (Table S-5). Among the yield markers, Pm93 (Reticuline oxidase), Pm13 (Reticuline oxidase), Pm46 (Kunitz trypsin inhibitor) and Pm67 (Peroxidase 40) had higher levels in the high yield lines.

In contrast, Pm86 (GDSL-like lipase), Pm83 (Subtilase), Pm35 (Xyloglucan endotransglycosylase) and Pm58 (Endo-1,4-beta-glucanase) had lower levels in the high yield lines (Table S-6).


It is intriguing that the plant yield trait had a high AUC, albeit with a higher MCR, especially since yield was not a criterion for marker selection. It can be hypothesized that plants with higher resistance might be less stressed and thus yield more. We therefore estimated the phenotypic correlation between P. infestans resistance in leaves and yield in the segregating population. The results, however, showed that there was no correlation (0.05) between the two phenotypes (Figure S-3). The highest correlation (0.44) was seen between P. infestans leaf resistance and P. infestans tuber resistance, but there was only one peptide marker in common between these two phenotypes (Figure 4c). In the clone SW93-1015, P. infestans resistance in tubers and in leaves is not well correlated,18 and the cloned R gene-type in SW93-1015 has been shown to be foliage specific.32, 33 This agrees well with the minor overlap of peptide markers for leaf and tuber blight resistance found in this study. The fact that we were able to predict two tuber traits by analysis of peptides in the leaf secretome opens up new possibilities for prediction of important tuber traits early in the breeding process, such as at the seedling stage.

A concern with all markers is that their efficiency could be limited to the clones they were developed from. Thus, to test the efficiency of these markers across different potato clones, the SRM marker panel was tested on a new dataset with samples from nine different potato clones grown in the greenhouse. The PCA plot from the SRM data showed that the replicate samples cluster closer together than samples from different clones (Figure 5a). Toluca was the most distantly spaced, suggesting a different expression profile of the peptides. It is indeed known that Toluca is resistant to P.
infestans17 and as the selection of the peptides in this work was based on the resistance characteristics, it could be possible that the separation seen for Toluca is due to the resistance trait. To quantify the efficiency of the peptide assays in different clones, we

estimated the standard deviation of the mean (sd) across all replicates for a given peptide and potato clone. The mean of sd (mean-sd) was then obtained by averaging the sd of all peptides for a given clone and was compared to the mean-sd obtained across all peptides and clones (Figure 5b). The mean-sd was lower for replicates than for samples across all clones. This strongly indicates that the developed SRM assays have high sensitivity and that the SRM-based markers could be used for trait prediction across different clones. Thus, the results suggest that the markers are applicable to samples from different genetic backgrounds grown in the field or in the greenhouse.

Discussion

The results from this work show that SRM is a promising technology for Marker Assisted Selection (MAS) in plant breeding. The most important criterion for successful development of an SRM marker panel is the selection of peptides as putative markers. In this work, the primary trait of interest was P. infestans leaf resistance; we therefore selected peptides based on expression levels before and after P. infestans infection in plants. The final analysis was done on unchallenged field material in order to facilitate prediction of several phenotypes at the same time from relevant plant material. We implemented a peptide selection strategy based on both univariate and multivariate analysis: peptides were retained if they were either significantly differentially expressed upon infection or identified as highly important in the un-induced samples of susceptible and resistant clones (OPLS-DA; VIP ≥ 1.0). This strategy allowed us to select peptides that are induced upon infection as well as those that differ in their constitutive levels in un-induced samples. We also included peptides selected based on the functional annotation of the corresponding proteins.
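The either/or selection rule described above can be written down compactly. The following is only a minimal sketch in Python: the function name `select_peptides`, the peptide identifiers, the example values, and the significance threshold `alpha=0.05` are assumptions made for illustration, whereas the VIP cutoff of 1.0 is taken from the text.

```python
def select_peptides(adj_p_values, vip_scores, alpha=0.05, vip_cutoff=1.0):
    """Keep a peptide if it passes the univariate OR the multivariate filter.

    adj_p_values: peptide -> adjusted p-value for the response to infection
    vip_scores:   peptide -> OPLS-DA VIP score in un-induced samples
    alpha is an assumed significance threshold; vip_cutoff follows the text.
    """
    univariate = {pep for pep, p in adj_p_values.items() if p < alpha}
    multivariate = {pep for pep, vip in vip_scores.items() if vip >= vip_cutoff}
    return sorted(univariate | multivariate)


# Hypothetical peptides and values, for illustration only.
adj_p_values = {"PEP_A": 0.01, "PEP_B": 0.40, "PEP_C": 0.20}
vip_scores = {"PEP_A": 0.8, "PEP_B": 1.3, "PEP_C": 0.5}
selected = select_peptides(adj_p_values, vip_scores)
print(selected)  # ['PEP_A', 'PEP_B']
```

Taking the union rather than the intersection keeps both infection-induced peptides and peptides that differ constitutively between clones, which is the stated purpose of combining the two analyses.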
The selection of peptides has to be based on the aim of the SRM assay and is thus the most critical step in the entire workflow. As it is possible to include 100-200 peptides in a single scheduled SRM run, there is a good margin for error in peptide selection. One limitation of the pre-selection process is that the selected peptides are mainly presumed to be associated with a given phenotypic trait. This could make it more difficult to predict other traits that were not considered during peptide selection. However, with a basic strategy of targeting peptides that differ between the parents under standard growth conditions, one may be able to target several traits, as was also shown in the present work.

In this work, the developed assays were peptide centric because of concerns about protein inference in a tetraploid potato with an un-sequenced genome, and because apoplastic proteins may undergo multiple processing events, which further complicates peptide-to-protein mapping. We think that in a sample from a complex polyploid genome, individual peptides could produce more discriminative markers than the sum or average of multiple peptides per protein. As the assay development was peptide centric, some proteins were represented by just one peptide and others by several selected peptides (Table S-2), and we also included peptides that mapped to multiple protein accessions. Thus, in this work, peptides were not filtered out even if several of them belonged to the same protein. It is likely that established protein markers could be measured by multiple peptides to increase robustness, but this was not attempted here.

Upon selection of the putative peptide markers, the next step is to test their suitability for SRM. In a high-throughput assay, it is important to automate the final analysis of samples as much as possible. Thus, peptides with uniform transitions and minimal interference are preferred. After selecting the preliminary list of peptides, we obtained synthetic peptides, as suggested earlier,34 and performed un-scheduled SRM runs.
Peptides producing, for example, double peaks or uneven-looking peaks, or those with fewer than four valid transitions, were removed. The remaining synthetic peptides were then tested and SRM assays developed in the secretome samples. Thus, when selecting peptides for the SRM assay panel, it is important to account for further loss of peptides at each step until the assay is finalized.

In this work, we worked with a segregating population from a cross between SW93-1015 and Desirée. SW93-1015 has good P. infestans resistance but is poor in other agronomic traits. Thus, in a cross involving SW93-1015 as a parental donor, it is important to reduce linkage drag using markers, especially in a non-inbred tetraploid crop like potato. A marker panel combining markers for the trait of interest (for example resistance to late blight in the tuber) with markers for other agronomic traits (for example yield) would therefore facilitate selection of individual plants with multiple traits at a very early stage of development. Consequently, a successful SRM assay panel with markers for multiple traits would not only save time but could also become a feasible alternative or complement to DNA-based selection methods.

Several different classifiers are available for building prediction models from the obtained SRM data. In this work, we used the random forest classifier, as it is suitable for both two-class and multiclass problems.35 Furthermore, there are only a few parameters to fine-tune, the most important being the number of variables to try and the number of trees to grow for each forest.35 It has also been shown to work efficiently with relatively small datasets, with as few as 38 samples in one study.35

In summary, we have developed a new marker panel for selection of resistant lines from a crossing population and showed that the peptide-based marker panel can also be used to predict other important phenotypes that currently lack good DNA-based markers. The results indicate that the assay works efficiently with both field-grown and greenhouse-grown material.
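To make the two random forest tuning parameters mentioned above more concrete, the following toy sketch builds a forest of one-split trees in plain Python. It is a simplified illustration and not the implementation used in this work: the argument names `ntree` and `mtry` mirror those of the R randomForest package, while `fit_stump`, `fit_forest`, `predict`, and all intensity values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Find the single threshold split with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback if no candidate feature varies: always predict the majority class.
    best = (len(y) + 1, features[0], float("inf"), majority, majority)
    for f in features:
        values = sorted({row[f] for row in X})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = Counter(lab for row, lab in zip(X, y) if row[f] <= threshold)
            right = Counter(lab for row, lab in zip(X, y) if row[f] > threshold)
            left_label, left_hits = left.most_common(1)[0]
            right_label, right_hits = right.most_common(1)[0]
            errors = len(y) - left_hits - right_hits
            if errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def fit_forest(X, y, ntree=200, mtry=2, seed=0):
    """Grow ntree one-split trees, each on a bootstrap sample and mtry random variables."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(ntree):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of plants
        feats = rng.sample(range(p), mtry)           # variables tried at the split
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = Counter(
        left_label if row[f] <= threshold else right_label
        for f, threshold, left_label, right_label in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical log2 intensities for three peptides (third one uninformative).
X = [
    [20.1, 15.2, 18.0], [19.8, 15.9, 17.1], [20.5, 15.5, 18.4], [19.9, 15.1, 17.5],
    [17.2, 18.3, 17.9], [17.8, 18.9, 18.2], [17.0, 18.1, 17.3], [17.5, 18.6, 17.7],
]
y = ["resistant"] * 4 + ["susceptible"] * 4
forest = fit_forest(X, y, ntree=200, mtry=2)
print(predict(forest, [20.3, 15.0, 17.8]))
print(predict(forest, [17.1, 18.5, 18.0]))
```

Because each tree here has only one split, drawing `mtry` variables per tree is equivalent to drawing them per split; a full random forest grows deeper trees and repeats the draw at every node.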
Furthermore, meticulously selected peptide markers for different traits could be integrated into a single assay, thereby facilitating the screening and selection of multiple agronomic traits in a single sample.

Conclusion

This work demonstrated that SRM is a promising technology for marker-assisted selection in plant breeding and that it opens up new possibilities for protein-based QTL analyses. The proposed workflow could be applied to other crops and to the development of markers for different traits. Continuous improvements in mass spectrometry instruments will further facilitate high-throughput analysis of samples using SRM-based markers, thereby allowing screening of samples with hundreds of markers representing multiple phenotypic traits in a single run.

Acknowledgements

We thank Mats Mågård and Karin Hansson for assistance with the LC-MS runs and Mia Mogren, Fredrik Reslow, Per Mühlenbock, Dharani Dhar Burra, Kibrom Abreha, Tewodros Mulugeta and Liying Wang for assistance with phenotyping and sample processing. This work was funded by the Swedish Foundation for Environmental Strategic Research (Mistra Biotech) and the Swedish Foundation for Strategic Research (RBb08-0006).

Author contributions

AC, EAn and FL designed the research. AC performed the mass spectrometry experiments and analyzed the data. EAl and TB grew plants and performed phenotyping and protein extraction. EAn and FL coordinated the project. AC, EAn and FL drafted the manuscript. All authors contributed to the text and approved the final manuscript.

References

(1) Domon, B.; Aebersold, R., Options and considerations when selecting a quantitative proteomics strategy. Nat Biotechnol 2010, 28 (7), 710-21.
(2) Addona, T. A.; Abbatiello, S. E.; Schilling, B.; Skates, S. J.; Mani, D. R.; Bunk, D. M.; Spiegelman, C. H.; Zimmerman, L. J.; Ham, A. J.; Keshishian, H.; Hall, S. C.; Allen, S.; Blackman, R. K.; Borchers, C. H.; Buck, C.; Cardasis, H. L.; Cusack, M. P.; Dodder, N. G.; Gibson, B. W.; Held, J. M.; Hiltke, T.; Jackson, A.; Johansen, E. B.; Kinsinger, C. R.; Li, J.; Mesri, M.; Neubert, T. A.; Niles, R. K.; Pulsipher, T. C.; Ransohoff, D.; Rodriguez, H.; Rudnick, P. A.; Smith, D.; Tabb, D. L.; Tegeler, T. J.; Variyath, A. M.; Vega-Montoto, L. J.; Wahlander, A.; Waldemarson, S.; Wang, M.; Whiteaker, J. R.; Zhao, L.; Anderson, N. L.; Fisher, S. J.; Liebler, D. C.; Paulovich, A. G.; Regnier, F. E.; Tempst, P.; Carr, S. A., Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol 2009, 27 (7), 633-41.
(3) Picotti, P.; Bodenmiller, B.; Mueller, L. N.; Domon, B.; Aebersold, R., Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 2009, 138 (4), 795-806.
(4) Whiteaker, J. R.; Lin, C.; Kennedy, J.; Hou, L.; Trute, M.; Sokal, I.; Yan, P.; Schoenherr, R. M.; Zhao, L.; Voytovich, U. J.; Kelly-Spratt, K. S.; Krasnoselsky, A.; Gafken, P. R.; Hogan, J. M.; Jones, L. A.; Wang, P.; Amon, L.; Chodosh, L. A.; Nelson, P. S.; McIntosh, M. W.; Kemp, C. J.; Paulovich, A. G., A targeted proteomics-based pipeline for verification of biomarkers in plasma. Nat Biotechnol 2011, 29 (7), 625-34.
(5) Picotti, P.; Clement-Ziza, M.; Lam, H.; Campbell, D. S.; Schmidt, A.; Deutsch, E. W.; Rost, H.; Sun, Z.; Rinner, O.; Reiter, L.; Shen, Q.; Michaelson, J. J.; Frei, A.; Alberti, S.; Kusebauch, U.; Wollscheid, B.; Moritz, R. L.; Beyer, A.; Aebersold, R., A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature 2013, 494 (7436), 266-70.
(6) Foss, E. J.; Radulovic, D.; Shaffer, S. A.; Ruderfer, D. M.; Bedalov, A.; Goodlett, D. R.; Kruglyak, L., Genetic basis of proteome variation in yeast. Nat Genet 2007, 39 (11), 1369-75.
(7) Xie, C.; Xu, S., Efficiency of multistage marker-assisted selection in the improvement of multiple quantitative traits. Heredity (Edinb) 1998, 80 (Pt 4), 489-98.
(8) Cubillos, F. A.; Coustham, V.; Loudet, O., Lessons from eQTL mapping studies: non-coding regions and their role behind natural phenotypic variation in plants. Curr Opin Plant Biol 2012, 15 (2), 192-8.
(9) Amiour, N.; Merlino, M.; Leroy, P.; Branlard, G., Proteomic analysis of amphiphilic proteins of hexaploid wheat kernels. Proteomics 2002, 2 (6), 632-41.
(10) Amiour, N.; Merlino, M.; Leroy, P.; Branlard, G., Chromosome mapping and identification of amphiphilic proteins of hexaploid wheat kernels. TAG Theoretical and Applied Genetics 2003, 108 (1), 62-72.
(11) Merlino, M.; Leroy, P.; Chambon, C.; Branlard, G., Mapping and proteomic analysis of albumin and globulin proteins in hexaploid wheat kernels (Triticum aestivum L.). Theoretical and Applied Genetics 2009, 118 (7), 1321-1337.
(12) Damerval, C.; Maurice, A.; Josse, J. M.; de Vienne, D., Quantitative trait loci underlying gene product variation: a novel perspective for analyzing regulation of genome expression. Genetics 1994, 137 (1), 289-301.
(13) Tjalsma, H.; Bolhuis, A.; Jongbloed, J. D.; Bron, S.; van Dijl, J. M., Signal peptide-dependent protein transport in Bacillus subtilis: a genome-based survey of the secretome. Microbiol Mol Biol Rev 2000, 64 (3), 515-47.
(14) Alexandersson, E.; Ali, A.; Resjö, S.; Andreasson, E., Plant secretome proteomics. Front Plant Sci 2013, 4, 9.
(15) Jones, J. D.; Dangl, J. L., The plant immune system. Nature 2006, 444 (7117), 323-9.
(16) Maier, T.; Guell, M.; Serrano, L., Correlation of mRNA and protein in complex biological samples. FEBS Lett 2009, 583 (24), 3966-73.
(17) Haverkort, A. J.; Struik, P. C.; Visser, R. G. F.; Jacobsen, E., Applied Biotechnology to Combat Late Blight in Potato Caused by Phytophthora Infestans. Potato Research 2009, 52 (3), 249-264.
(18) Ali, A.; Moushib, L. I.; Lenman, M.; Levander, F.; Olsson, K.; Carlson-Nilson, U.; Zoteyeva, N.; Liljeroth, E.; Andreasson, E., Paranoid potato: phytophthora-resistant genotype shows constitutively activated defense. Plant Signal Behav 2012, 7 (3), 400-8.
(19) Lenman, M.; Ali, A.; Mühlenbock, P.; Carlson-Nilsson, U.; Liljeroth, E.; Champouret, N.; Vleeshouwers, V. A. A.; Andreasson, E., Effector-driven marker development and cloning of resistance genes against Phytophthora infestans in potato breeding clone SW93-1015. Theoretical and Applied Genetics 2015, doi:10.1007/s00122-015-2613-y.
(20) Ali, A.; Alexandersson, E.; Sandin, M.; Resjö, S.; Lenman, M.; Hedley, P.; Levander, F.; Andreasson, E., Quantitative proteomics and transcriptomics of potato in response to Phytophthora infestans in compatible and incompatible interactions. BMC Genomics 2014, 15 (1), 497.
(21) Andreasson, E.; Abreha, K. B.; Resjö, S., The Isolation of Plant Organelles and Structures: Isolation of apoplast. (In Press). Methods in Molecular Biology 2016.
(22) Tiemens-Hulscher, M.; Delleman, J.; Eisinger, E.; Lammerts Van Bueren, E., Potato breeding: a practical manual for the potato chain. Aardappelwereld: Netherlands, 2013.
(23) Häkkinen, J.; Vincic, G.; Månsson, O.; Wårell, K.; Levander, F., The proteios software environment: an extensible multiuser platform for management and analysis of proteomics data. J Proteome Res 2009, 8 (6), 3037-43.
(24) Smyth, G. K., limma: Linear Models for Microarray Data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor, Statistics for Biology and Health, 2005; pp 397-420.
(25) Chawade, A.; Alexandersson, E.; Levander, F., Normalyzer: a tool for rapid evaluation of normalization methods for omics data sets. Journal of Proteome Research 2014, 13 (6), 3114-20.
(26) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J., Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26 (7), 966-968.
(27) Chawade, A.; Sandin, M.; Teleman, J.; Malmstrom, J.; Levander, F., Data processing has major impact on the outcome of quantitative label-free LC-MS analysis. Journal of Proteome Research 2015, 14 (2), 676-87.
(28) Farrah, T.; Deutsch, E. W.; Kreisberg, R.; Sun, Z.; Campbell, D. S.; Mendoza, L.; Kusebauch, U.; Brusniak, M. Y.; Huttenhain, R.; Schiess, R.; Selevsek, N.; Aebersold, R.; Moritz, R. L., PASSEL: the PeptideAtlas SRM experiment library. Proteomics 2012, 12 (8), 1170-5.
(29) Talloen, W.; Verbeke, T., Nested Loop Cross Validation for Classification using nlcv. In Bioconductor: 2011.
(30) Team, R. C., R: A Language and Environment for Statistical Computing. In 2014.
(31) Glas, A. S.; Lijmer, J. G.; Prins, M. H.; Bonsel, G. J.; Bossuyt, P. M., The diagnostic odds ratio: a single indicator of test performance. J Clin Epidemiol 2003, 56 (11), 1129-35.
(32) Lenman, L.; Ali, A.; Mühlenbock, P.; Carlson-Nilsson, U.; Liljeroth, E.; Vleeshouwer, W.; Andreasson, E., Effector-driven marker development and cloning of resistance genes against Phytophthora infestans in potato breeding clone SW93-1015. Theor Applied Genetics 2015.
(33) Park, T.-H.; Vleeshouwers, V. G. A. A.; Kim, J.-B.; Hutten, R. C. B.; Visser, R. G. F., Dissection of foliage and tuber late blight resistance in mapping populations of potato. Euphytica 2005, 143 (1-2), 75-83.
(34) Picotti, P.; Rinner, O.; Stallmach, R.; Dautel, F.; Farrah, T.; Domon, B.; Wenschuh, H.; Aebersold, R., High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat Methods 2010, 7 (1), 43-6.
(35) Díaz-Uriarte, R.; Alvarez de Andrés, S., Gene selection and classification of microarray data using random forest. BMC Bioinformatics 2006, 7 (1), 3.
(36) Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J. C.; Muller, M., pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011, 12, 77.

Table 1: Performance of the RF model for feature selection.

                    P. infestans      P. infestans      Yield        Cooking     Emergence
                    leaf resistance   tuber resistance  (Kg/plant)   type
No. of peptides     8                 8                 8            10          3
No. of samples      58                58                59           42          59
AUC                 0.84              0.81              0.80         0.72        0.70
AUC CI (99%)        0.70-0.99         0.66-0.95         0.64-0.95    0.52-0.88   0.51-0.92
Accuracy            0.78              0.76              0.78         0.64        0.66
Sensitivity         0.79              0.75              0.78         0.57        0.77
Specificity         0.76              0.77              0.78         0.71        0.55
PPV                 0.70              0.75              0.81         0.67        0.64
Odds ratio (OR)     16.2              12                8.4          2.6         3.5
ORCI (99%)          3.1-85.5          2.5-58.3          1.6-42.6     0.5-14.0    0.8-15.2
MCR                 0.24              0.29              0.31         0.39        0.40

AUC: Area Under Curve; AUC CI: Confidence Interval for AUC estimated at 99% CI using 10,000 bootstrap replicates36; PPV: Positive Predictive Value; ORCI: Odds Ratio CI estimated at 99% CI. MCR: Mis-Classification Rate. For quantitative traits, two groups were made based on the data median. Performance estimates are based on the prediction results from the validation set obtained during each iteration.
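The quantities reported in Table 1 can be illustrated with a short, self-contained sketch. It assumes that samples above the median form the positive group (how ties at the median were handled is not stated) and uses a plain percentile bootstrap for the AUC interval, which may differ from the resampling scheme of the cited pROC package.36 All function names and example values are hypothetical.

```python
import random
from statistics import median


def median_split(values):
    """Dichotomize a quantitative trait: True if the value is above the median."""
    cutoff = median(values)
    return [v > cutoff for v in values]


def binary_metrics(truth, predicted):
    """Confusion-matrix summaries for two equally long lists of booleans."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    n = len(pairs)
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "mcr": (fp + fn) / n,
        # Diagnostic odds ratio; reported as infinite if fp or fn is zero.
        "odds_ratio": (tp * tn) / (fp * fn) if fp * fn else float("inf"),
    }


def auc(truth, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(truth, scores) if t]
    neg = [s for t, s in zip(truth, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(truth, scores, level=0.99, n_boot=10000, seed=1):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(truth)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        t = [truth[i] for i in idx]
        if all(t) or not any(t):
            continue  # AUC is undefined unless both groups are present
        stats.append(auc(t, [scores[i] for i in idx]))
    stats.sort()
    low = stats[int((1 - level) / 2 * (n_boot - 1))]
    high = stats[int((1 + level) / 2 * (n_boot - 1))]
    return low, high


# Hypothetical yields (kg/plant) and classifier vote fractions for six plants.
yields = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
scores = [0.1, 0.2, 0.6, 0.7, 0.9, 0.4]
truth = median_split(yields)
predicted = [s >= 0.5 for s in scores]
metrics = binary_metrics(truth, predicted)
ci = bootstrap_auc_ci(truth, scores, n_boot=2000)
print(metrics, auc(truth, scores), ci)
```

In this toy example two of three high-yield plants and two of three low-yield plants are classified correctly, giving an odds ratio of 4 and an MCR of one third; with so few samples the bootstrap interval is very wide, which is consistent with the broad intervals seen for the smaller datasets in Table 1.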

Figure Legends

Figure 1: Schematic for marker selection and trait prediction. a) Susceptible and resistant plants are infected; b) Secretome sampled at different time points; c) Shotgun proteomics is done from the secretome to identify expressed proteins; d) Univariate and/or multivariate statistics to identify putative markers which are significantly differently expressed in the two groups; e) Development of SRM assays for the selected markers using synthetic peptides; f) Model development from the generated data.

Figure 2: Results based on the generated SRM assay. a) Samples from the two clones Desirée (Des) and SW93-1015 (1015) both from uninfected (0h) and three post-infection time points (6h, 24h and 3d); b) Heatmap of the peptide intensities from SRM marker panel in Desirée and SW93-1015 clones.

Figure 3: a) Boxplots of 8 peptide markers for P. infestans resistance in leaves selected by the randomForest classifier for feature selection. For each peptide, log2 intensities are averaged over all susceptible and resistant samples in the population. Whiskers range from the hinge to the value that is within 1.5 times inter-quartile range of the hinge. b) Plots of the eight peptide markers with the intensities (log2) estimated by SRM in Desirée, SW93-1015 and Sarpo Mira at four different time-points.

Figure 4: Boxplots of the selected peptide markers. a) Boxplots of 8 peptide markers for P. infestans resistance in tuber; b) Boxplots of 8 peptide markers for yield. Whiskers range from the hinge to the value that is within 1.5 times inter-quartile range of the hinge; c) Venn diagram showing the common and unique peptide markers for three phenotypic traits. Markers were compared based on their peptide sequence.

Figure 5: Analysis of all markers in the SRM marker panel for their efficiency in various potato clones with different genetic backgrounds. a) PCA plot obtained from the SRM data from all peptides in the marker panel and secretome samples from nine potato clones; b) Mean standard deviation (mean-sd) estimated by averaging sd from all peptide intensities in replicates from each clone.

Supplementary Tables

Table S-1: Table with peptide level differences upon P. infestans infection in the three clones SW93-1015, Desirée and Sarpo Mira.
Table S-2: Table with 104 peptides used in the final assay.
Table S-3: Performance metrics for randomForest classifier for feature selection for P. infestans resistance.
Table S-4: Eight features selected for the prediction of resistance to P. infestans.
Table S-5: Putative markers identified from randomForest classification model for Phytophthora resistance in tubers.
Table S-6: Putative markers identified from randomForest classification model for Yield (kg/plant).

Supplementary Figures

Figure S-1: Mis-classification rate (MCR) for various classifiers.
Figure S-2: Chromosomal locations of various peptides and proteins.
Figure S-3: Correlation plot of various phenotypes.
