
Bioinformatic Challenges in Targeted Proteomics

Daniel Reker and Lars Malmström*
ETH Zurich, Wolfgang-Pauli-Strasse 16, 8093 Zurich, Switzerland

ABSTRACT: Selected reaction monitoring mass spectrometry is an emerging targeted proteomics technology that allows for the investigation of complex protein samples with high sensitivity and efficiency. It requires extensive knowledge about the sample so that the many parameters needed to carry out the experiment can be set appropriately. Most studies today rely on parameter estimation from prior studies, public databases, or from measuring synthetic peptides. This is efficient and sound, but in the absence of prior data, de novo parameter estimation is necessary. Computational methods can be used to create an automated framework to address this problem. However, the number of available applications is still small. This review aims at giving an orientation on the various bioinformatic challenges. To this end, we state the problems in classical machine learning and data mining terms, give examples of implemented solutions, and provide some room for alternatives. This will hopefully lead to an increased momentum for the development of algorithms and serve the needs of the community for computational methods. We note that the combination of such methods in an assisted workflow will ease both the usage of targeted proteomics in experimental studies and the further development of computational approaches.

KEYWORDS: targeted proteomics, selected reaction monitoring, computational prediction, data mining

Received: March 21, 2012



INTRODUCTION

Mass spectrometry (MS) has proven useful for efficiently identifying and quantifying proteins in complex samples and has become a workhorse of proteomics.1 Proteins in the sample under investigation are typically enzymatically digested into peptides, which are subsequently separated using liquid chromatography, ionized, and introduced into the MS. Two types of spectra are commonly collected: MS1 (or survey scans), in which the intensity and m/z are measured for intact peptides, and MS2, where one of the ions detected in the MS1 is isolated, fragmented, and measured. Ions in the MS1 are selected using simple rule-based algorithms in which highly abundant ions are preferentially selected and temporarily excluded from further selection in order to increase the sampling depth. Peptide identities are commonly inferred from the MS2 spectra using search engines2−4 followed by postsearch filtering,5,6 and their abundances are estimated using applications such as SuperHirn7 or OpenMS,8 which rely on the MS1 spectra. The resulting data set is biased to high-abundance proteins, and undersampling sometimes leads to incomplete expression profiles for low-abundance proteins, especially if many samples are measured. Many variants of this technology have been developed that address some of these issues; see Yates et al.9 for an overview.

Targeted proteomics represents a fundamentally different mass spectrometry approach that relies on knowing which peptides to measure and how to measure these peptides.10 This leads to increased efficiency and sensitivity. Among several proposed implementations of this approach, selected reaction monitoring mass spectrometry (SRM) represents an emerging and probably the most popular technology for targeted proteomics.10,11 Therefore, we will focus this review on this technology.

In SRM, the instructions for how to measure peptides are called assays, and each assay consists of one or more transitions, where a transition consists of a pair of numbers (Q1, Q3). The first, Q1, refers to the isolation window of the precursor ion, and the second, Q3, is the isolation window of a single fragment produced in the collision cell. As such, a transition corresponds to a single peak in an MS2 spectrum, which is the result of isolating an ion (Q1) and scanning the resulting fragments (Q3). Hence, the goal is to measure only preselected, informative (i.e., indicating the presence of a specific protein) and robustly measurable (i.e., subject only to small experimental fluctuations) peaks instead of scanning the entire mass range. Less data is produced, but the data contains a higher fraction of useful information, assuming the selection of transitions was optimal. This method makes accurate qualitative and quantitative analysis possible for less abundant proteins whose intensity in the MS1 might be too low to be selected for MS2 in shotgun approaches.

SRM requires in-depth knowledge about what to measure and how to measure it, making these types of experiments challenging. A review by Lange et al.12 summarized the important parameters from an experimentalist's point of view. Most studies use transitions estimated from former experiments or large online databases.13 However, in the absence of known transitions, these have to be generated using, for example, synthetic peptides, previously collected data of the proteins under investigation, or de novo estimation. For the latter, the need for sophisticated computational methods is strong in the community,14 and this review provides an orientation on the topic for bioinformaticians. We express the tasks in terms of classical data mining and machine learning problems to reveal their nature from a computational point of view (see Table 1 for a summary and Figure 1 for an overview of the discussed challenges) while also giving examples of current solutions and possible alternatives. We explain how computational evaluation and prediction can support the identification of interesting proteins (Section 1) and the filtering of peptides (Section 2) and fragments (Section 3), as well as making sense of the measured data qualitatively (Section 4) and quantitatively (Section 5). Afterward, the usage of scheduling to increase experimental efficiency is explained (Section 6). The review concludes with an outlook on how the described methods can be combined to provide an efficient framework for utilizing targeted proteomics as a tool (Section 7).
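To make the (Q1, Q3) notation concrete, the following minimal Python sketch computes one transition for a peptide from standard monoisotopic residue masses; the example peptide, the charge states, and the choice of the y4 fragment are illustrative assumptions, not values from the review.

```python
# Minimal sketch: derive the (Q1, Q3) pair of one SRM transition from a
# peptide sequence. Residue masses are standard monoisotopic values in Da;
# the peptide "ELVISK" and the charge states are arbitrary examples.
MONO = {"E": 129.04259, "L": 113.08406, "V": 99.06841, "I": 113.08406,
        "S": 87.03203, "K": 128.09496}
H2O, PROTON = 18.010565, 1.007276

def precursor_mz(peptide, charge=2):
    """m/z of the intact peptide ion isolated in Q1."""
    return (sum(MONO[aa] for aa in peptide) + H2O + charge * PROTON) / charge

def y_ion_mz(peptide, n, charge=1):
    """m/z of the y_n fragment (C-terminal n residues) isolated in Q3."""
    return (sum(MONO[aa] for aa in peptide[-n:]) + H2O + charge * PROTON) / charge

q1 = precursor_mz("ELVISK", charge=2)   # ~344.72
q3 = y_ion_mz("ELVISK", n=4, charge=1)  # ~446.30
print(f"transition (Q1, Q3) = ({q1:.2f}, {q3:.2f})")
```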


Table 1. Overview of the Bioinformatic Challenges

sec | instance of challenge | data mining task | input | output | current solutions | shortcomings | future directions
1 | target protein selection | data mining | all proteins from sample (e.g., proteome) | target proteins | access single database | neglect data | web-service
2 | choosing peptide | classification: sequence-based prediction | set of peptides | specific peptide sequences (peptide to investigate) | empirical database lookup/prediction | data not available/bias to training data | more mechanistic models
3 | choosing fragment | classification: sequence-based prediction | set of fragments | fragment ion names | mobile-proton concept | empirical, qualitative | quantitative model, combination with peptide selection
4 | peak validation | classification: peaks as signal or noise | measured peaks | classification of peaks | additional experiments, decoy transitions, synthetic peptide | additional time and expertise | automated approaches, more data-driven development of algorithms
5 | quantification from relative peak height | data normalization | peak intensities | quantities | global shift | only single type of fluctuation | modeling of fluctuations, synthetic peptide proposal
6 | measurement scheduling | optimization | transitions | time-schedule for m/z filtering | combinatorial and approximative solutions | heuristics, long computation time | increase applicability
7 | putting everything together | pipeline implementation | algorithms and databases | framework | guidance, plug-in | no accepted standards, poor interoperability | "closing the loop"

1. AUTOMATED SELECTION OF PROTEINS

Targeted proteomics is a nonscanning mass spectrometry approach (also known as hypothesis-driven proteomics); that is, it requires us to actively target one or more proteins. Therefore, it is crucial to correctly preselect a set of proteins that are of interest to the question under investigation from all possible proteins, which might form a very complex mixture such as the whole proteome of an organism. Targeted proteomics is commonly applied in biomarker validation,15 that is, to keep track of the occurrence of certain proteins that are associated with specific cell states (e.g., apoptosis or proliferation). These state-specific proteins are typically identified in former biological studies, for example, by nontargeted discovery MS. In such a scenario, the selection of targeted proteins is trivially the set of hypothetical biomarkers. However, other applications of targeted proteomics may require more sophisticated selection methods: for example, SRM has been applied in the derivation of protein-interaction network dynamics.16 Potential network nodes need to be identified by utilizing system biological approaches in combination with SRM data. In such scenarios, the list of investigated proteins will dynamically change during the study according to results from previous data. Furthermore, certain house-keeping proteins should be taken into account. These are proteins that occur stably over various states and usually in high concentration. Measuring these allows for later result normalization in order to account for potential experimental fluctuations during quantitative studies (see Section 5). After the proteins of interest have been identified, there might still be a need to include further proteins: for example, investigating the activity of a certain enzyme but excluding some inhibitor might result in wrong conclusions. We might predict the activity of the enzyme based on its concentration although the inhibitor could modulate the enzyme activity. Similarly, the best choice for the house-keeping proteins might not be obvious. In general, proteins are chosen that are present in high and stable concentrations across various experimental perturbations relevant for the study, for example, cytoskeletal proteins, enzymes involved in the cell's metabolism, or ribosomal proteins.17 However, there is no guarantee that the selected proteins will perform as expected. Therefore, it is often wise to choose more proteins than actually necessary to guarantee useful data acquisition.





Figure 1. Schematic overview of an SRM study. The numbers indicate steps that can be supported with computational approaches and point to the sections in which they are discussed.


It has been proposed to use information derived from system biological studies to infer a complete and correct set of relevant proteins for problematic cases.12 This data is most efficiently integrated from databases providing information in the form of interaction networks18,19 and functional groups.20 A single database is most commonly used to search for meaningful proteins. This step is often carried out in an automated fashion.21 However, a single query into one of the databases forgoes the opportunity to benefit from all available, relevant information. Instead, data from various databases can be integrated using data mining techniques. This is expected to improve accuracy and has already been demonstrated in interaction prediction studies.22 It is challenging to determine which databases to include and how to weight the contribution from each data source. Organism-specific data will generally provide useful information, but data from closely related organisms can also provide insights. While we want to include as much data as possible to increase accuracy, including all possible databases is not always feasible. The vast amount of data that is integrated would exceed the limits of computability, and a lot of irrelevant data would be included, increasing the level of noise. Also, accessing a database makes it necessary to implement a data transformation that provides a uniform view of the data source for further processing. This additional effort is necessary for every database. For both reasons, the selection and integration of data sources requires expertise from both the biological and the computational side. Note that, although highly curated databases exist, most data will be inferred from biological experiments, some of which are associated with high noise levels. In this case, data cleaning might be a necessary preprocessing step.23 Finally, a valid selection approach could easily be centralized in the form of a web service to make the improvements accessible to a broad audience.
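As a minimal sketch of such an integration, the following Python snippet combines per-database evidence scores with source-specific trust weights; the database dumps, scores, and weights are hypothetical stand-ins (real studies would query resources such as IntAct,18 MINT,19 or KEGG20 through their own interfaces).

```python
# Hedged sketch of weighted evidence integration across several databases.
# All names, scores, and weights below are invented for illustration.
from collections import defaultdict

# evidence[source] maps a candidate protein to an evidence score in [0, 1]
evidence = {
    "intact_dump":  {"P12345": 0.9, "P67890": 0.4},
    "mint_dump":    {"P12345": 0.7, "Q11111": 0.8},
    "kegg_pathway": {"Q11111": 0.6, "P67890": 0.5},
}
weights = {"intact_dump": 1.0, "mint_dump": 0.8, "kegg_pathway": 0.5}

def integrate(evidence, weights, threshold=0.5):
    """Weighted mean of per-source scores; keep proteins above threshold."""
    total, norm = defaultdict(float), defaultdict(float)
    for source, scores in evidence.items():
        for protein, score in scores.items():
            total[protein] += weights[source] * score
            norm[protein] += weights[source]
    combined = {p: total[p] / norm[p] for p in total}
    return {p: s for p, s in combined.items() if s >= threshold}

print(integrate(evidence, weights))  # candidate target proteins with scores
```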

2. COMPUTATIONAL SUPPORT IN SELECTING PEPTIDES

The filtering of peptides according to their m/z ratio represents the first selection step. It is carried out in the first mass analyzer, Q1. From the set of all peptides generated through proteolysis of all the proteins in the sample, we want to measure the most informative peptides for each protein selected in the previous step. The necessary m/z ratios for selecting a peptide can be derived directly by calculating the mass from its sequence. The challenging part is determining which peptides are the most informative among all produced from the protein under investigation. The following properties should be fulfilled by an optimal peptide; we first describe these properties and then discuss possibilities to investigate them.

• Uniqueness: Peptides should be unambiguously associable with a single protein. This means that no other protein should produce an identical peptide by proteolysis, as otherwise it is not possible to determine which protein was present in the sample using this peptide alone. The amino acid sequence of the peptide determines its mass and preferred charge states, among other properties important for the measurement. Sample preparation might reduce sample complexity to facilitate protein identification.24 A prominent example is the usage of antibodies that dock certain peptides with high affinity to filter them from the rest of the mixture.25 Nevertheless, a single peptide will in practice not always be sufficient to identify one among all possible proteins from complex samples. In that case, multiple peptides are used that are in combination capable of identifying the protein by distinguishing it from other proteins.14

• Robust occurrence: It is possible that amino acids of the peptide are irregularly modified chemically or post-translationally.12 While some targeted proteomics studies investigate exactly these modifications, they can in general make the setup of the experiment more difficult. This is because they will lead to variations in properties that are important for the measurement. In particular, variations in the m/z ratio of certain peptides will influence whether the molecule passes the Q1 filter or not. This will lead to a dampened signal that can cause wrong qualitative and quantitative results. Therefore, we should avoid choosing peptides with sites expected to be susceptible to such chemical or post-translational modification. Similarly, the enzymatic proteolysis reaction (where usually trypsin is used as the protease) is not always complete. Sometimes two peptides expected to be created are not proteolyzed but stay covalently linked through an error in the digestion.26 This radically changes the properties of both peptides so that we cannot observe either of them any more. A chosen peptide should be linked to adjacent peptides in the original protein in such a way that it is efficiently proteolyzed.

• Robust observability: Not every peptide is regularly visible in MS-based measurements. This is not only due to variation in occurrence but also because of other intrinsic properties of the peptide,27 like ionizability and total mass. Choosing a peptide with properties such that it is robustly observable will improve the overall robustness of the acquired results.



Uniqueness is comparatively easy to investigate. We need to compare the sequences of peptides generated from the proteins of interest with all possible peptide sequences that could theoretically be generated from the sample. Depending on the number of all possible peptides, we can use either some heuristic or even brute-force enumeration to find a uniquely identifying but small subset of peptide sequences to search for experimentally. It is worthwhile to spend computation time on this, as a smaller search set will reduce measurement time. If the peptide sequences are not available, they can be generated from the sequences of the proteins in the sample using an in silico digestion approach.28 The sequences of the proteins are stored in large, well-established databases like UniProt,29 Ensembl,30 and NCBI's RefSeq.31
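For illustration, the following sketch performs such an in silico digestion and uniqueness check; the protein sequences are toy examples, and the simplified trypsin rule (cleave after K or R, but not before P) deliberately ignores the missed cleavages and modifications discussed above.

```python
# Sketch of in silico tryptic digestion and a uniqueness check against the
# sample background; sequences are toy examples.
import re

def digest(sequence):
    """Cleave after K or R unless followed by P (idealized trypsin)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

proteome = {  # hypothetical sample background
    "PROT1": "MKTAYIAKQRQISFVKSHFSR",
    "PROT2": "MKWVTFISLLLLFSSAYSR",
}
target = "PROT1"

# map every peptide to the set of proteins that can produce it
origins = {}
for protein, seq in proteome.items():
    for peptide in digest(seq):
        origins.setdefault(peptide, set()).add(protein)

# keep peptides produced only by the target (and of measurable length)
unique_peptides = [p for p in digest(proteome[target])
                   if origins[p] == {target} and len(p) >= 6]
print(unique_peptides)  # peptides that identify PROT1 unambiguously
```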
Unfortunately, the observability and occurrence of a peptide are more difficult to determine. Most studies today use data from former studies32 stored in databases like PeptideAtlas33 to estimate these two. It is sound and efficient to rely on peptides observed in similar experiments before. However, data may not be available for all proteins, especially because of the high sensitivity of the optimal parameters to the experimental setup and sample-specific variations.34 It is worth noting that laboratories operating on similar samples and experimental setups build up in-house databases of peptide observability. This stresses the value of such data. On the other hand, such private databases will compete with public repositories; in particular, many laboratories will have little interest in sharing their data. This might further contribute to the dearth of data for new projects.

In cases where no data is available, prediction algorithms are a means of supporting peptide selection. Current prediction algorithms are empirical, as they have been trained on observability data (i.e., just listing the sequence and how often it occurred, without additional knowledge about the reasons for occurrence). They thereby include aspects from both the observation probability and the robustness of occurrence, because the data gives no possibility to distinguish between the reasons why a peptide has not been detected. Mallick et al.35 reached very high ROC AUC values by deriving attributes such as the total charge of the peptide from its sequence and used a hierarchical hill-climbing approach to combine these attributes into a probabilistic observability classifier. In a more recent study, Fusaro et al.36 used a similar approach by training a random forest on averaged chemical properties of the peptides. Labeling their training data as highly versus low/nonoccurring, they report high predictive sensitivity and outperform other approaches when comparing the predictive results on diverse sets of MS discovery data. The latter have been used to identify the strongest occurring peptides. They convincingly argue that their approach is better at identifying the strongly occurring peptides, as their class labels force the classifier to separate them from all other peptides, whereas former approaches only attempt to distinguish between occurring and nonoccurring peptides. However, it is known that discovery MS suffers from undersampling and slow saturation.37 Using plain MS data for the evaluation of abundance prediction, as done in all predictions reported here, might therefore be misleading. More thorough investigations will have to show the impact of this undersampling on the evaluation and propose further test set refinements or additional retrospective evaluations. A more reliable evaluation makes use of targeted proteomics data, which can be generated more rapidly using recent advances in SRM technology development.

There is still room for improvement of the prediction algorithms following from a conceptual argument: properties used in such studies are strongly connected to the known, described mechanisms that prohibit occurrence or observability. However, approaches to combine them into a classifier might be approximative or, even more importantly, biased to the training data used. For example, imagine a training set which is missing peptides that can be methylated: the trained classifier will underestimate the importance of methylation for peptide observability. To avoid these pitfalls, one can use more mechanistic models that predict the probabilities of observability and occurrence separately by computing probabilities for (or classifying according to) the individual aspects of these two problems. For example, Siepen et al.26 proposed a sequence-based prediction algorithm for missed cleavage sites; they have so far only applied it to shotgun studies, but its usage for peptide selection in targeted proteomics is straightforward. Similarly, prediction algorithms for sites of chemical modifications have been proposed. Monigatti et al.38 developed a Hidden Markov Model-based approach to predict sulfation of tyrosine residues and reached accuracy values of up to 98%. Each of these individual events will prevent the peptide from occurring or being observed. Therefore, we can combine these predictions into a classifier that predicts the observability and the occurrence of a peptide.

Using such classifiers for observability and occurrence, we can reduce the huge set of selectable peptides to a robustly occurring subset. Because of the large number of selectable peptides, it might not be practical to filter the complete set beforehand. A heuristic would then be necessary to preselect the peptides. From this set of robustly observable peptides, we need to find a combination that allows us to unambiguously identify the protein of interest using the prediction of measurement properties from the sequence. Uniqueness and observability might be conflicting criteria; certain peptides might be easily observable but highly ambiguous, while other peptides could be unique but difficult to observe. The importance of optimizing for either of the two criteria will depend on the character of the sample: while observability can be more difficult to achieve in a set of peptides that are easily chemically modified, uniqueness is more of an issue in a set of peptides generated from protein isoforms. The latter arise from expressing homologous genes or alternative splicing. We might be able to decrease the amount of redundant calculations by starting the filtering with the more difficult criterion first. Again, a heuristic could be used to estimate which is the more difficult part for a given set of peptides and then decide with which property to start.
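A minimal sketch of the mechanistic combination idea follows; the two per-event predictors are hypothetical stand-ins (not the published methods), and the independence assumption behind the product is itself an approximation.

```python
# Sketch of combining per-mechanism predictions: each predictor returns the
# probability that a disturbing event (missed cleavage, modification, ...)
# hits the peptide; assuming independence, the product of the complements
# approximates the probability that the peptide occurs as expected.
def p_missed_cleavage(peptide):   # stand-in for a missed-cleavage predictor
    return 0.10 if peptide.endswith("K") else 0.15

def p_modification(peptide):      # stand-in for a modification-site predictor
    return 0.30 if "Y" in peptide else 0.05  # e.g., tyrosine sulfation risk

def occurrence_probability(peptide):
    """Probability that no disturbing event hits the peptide."""
    prob = 1.0
    for p in (p_missed_cleavage(peptide), p_modification(peptide)):
        prob *= (1.0 - p)
    return prob

candidates = ["TAYIAK", "QISFVK", "SHFSR"]
ranked = sorted(candidates, key=occurrence_probability, reverse=True)
print([(pep, round(occurrence_probability(pep), 3)) for pep in ranked])
```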

3. PREDICTING OPTIMAL FRAGMENTS

The second selection step is choosing a second m/z ratio that will select from all fragments that have been created by dissociation of the peptides passing the first filter.



This is carried out in the last mass analyzer, Q3. Equivalently to the peptide selection just described, the fragment is supposed to identify the peptide of interest and therefore has to be unique, robustly occurring, and observable. However, all three properties are more difficult to investigate for fragments than for peptides, as the dissociation of a peptide is significantly more stochastic than the proteolytic reactions used to create peptides from proteins.39 Nevertheless, the factors influencing the abundance of fragments have been identified and suggested to be used in predictive models.40

Many existing models trying to explain the dissociation process of peptides utilize the mobile proton concept.41,42 Note that, although this concept has already been applied successfully, there is a pitfall that might lead to problems in some studies. The mobile proton concept is an empirical, rule-based model that is per se only qualitative; that is, it only predicts whether the fragment occurs at all but does not allow for quantitative estimates.43 This might not be sufficient when we want to ensure that we select a fragment occurring with high abundance, allowing for robust observability. In such a case, new empirical or biophysical models will need to be developed. For example, Zhang44 tackled this problem by proposing a model derived from the mobile-proton concept that still allows quantitative analysis. High similarity values between experimental and simulated fragment spectra were reported. It is nevertheless worth noticing that the author understands his model to be oversimplifying some aspects. This is visible through the wrong estimation of other properties that can explicitly be derived from the model. These simplifications might therefore represent points at which to attack the problem further and improve the quantitative prediction of peptide dissociation.

It is worth noting that fragmentation data can nowadays be generated rapidly using high-throughput setups of discovery MS experiments. Therefore, it has been suggested to use discovery MS prescans to estimate fragment abundances instead of relying on predictive models. Using real data is beneficial in order to avoid errors through approximations within the model. At the same time, we have to realize that even the experimental data might not be perfect. Undersampling effects or high sensitivity to the experimental setup can misguide the study: recent investigations have shown that even multiple discovery scans of one sample converge only slowly to a complete data set.37 For these reasons, prescans are considered with due care as guidelines for fragment selection but are often not sufficient by themselves.45 Frequently, they are validated using measurements of artificial peptides. To avoid such costly additional measurements in large-scale studies, predictive models might represent a useful companion to complement prescan techniques. Such data-augmented predictions will give more accurate results than purely predictive models, while keeping additional measurements and costs to a minimum.

Similarly, targeted prescans are used to get empirical data on fragment observability.46 Thereby, we avoid most pitfalls of using undersampled discovery data for the derivation of targeted proteomics parameters. Nevertheless, problems can occur because of low protein abundance or peptide modifications.47 Moreover, targeted prescans are even more costly, as they require many individual measurements and analysis of these data in order to choose the most promising fragments. This makes them impractical for large-scale studies. Using individual prescans can nevertheless be an option for proteins with low prediction confidence.

Finally, the prediction of peptide fragmentation could be included in the peptide scoring to also account for the dissociation behavior as another important factor that influences the peptide selection. It is important to understand which fragments are generated from a peptide: selecting a peptide that only dissociates into ambiguous fragments, which do not allow for the discrimination of the peptide from similar peptides, is worthless. Because of this, the literature often refers to whole transitions, that is, pairs of m/z values, as the important parameter instead of selecting the two ratios sequentially. In this way, we choose peptides and fragments at the same time.12 Combining all the attributes of peptides and fragments specified here into a transition selection method will allow for automated SRM setup.
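As a toy sketch of such a transition selection step, the following code ranks candidate fragments of one peptide by predicted or library intensity and discards fragments whose m/z would collide with co-isolated background fragments; all m/z values, intensities, and the tolerance are invented for illustration.

```python
# Sketch of fragment selection: prefer intense, interference-free fragments.
candidates = [  # (fragment name, m/z, relative intensity) for the target
    ("y4", 446.30, 0.95), ("y5", 559.38, 0.80), ("b3", 342.18, 0.30),
]
background = [447.29, 784.41]  # fragment m/z of potentially interfering peptides
TOLERANCE = 0.7                # Q3 isolation half-width in Th (instrument-dependent)

def is_unique(mz, background, tol=TOLERANCE):
    """True if no background fragment falls inside the isolation window."""
    return all(abs(mz - other) > tol for other in background)

ranked = sorted((f for f in candidates if is_unique(f[1], background)),
                key=lambda f: f[2], reverse=True)
print(ranked[:2])  # the two most intense, interference-free transitions
```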

4. COMPUTER ASSISTED PEAK PICKING AND PEAK VALIDATION

Several challenges are present when extracting the information of interest from the collected data. The generic approach can be broken into three parts: reducing noise, detecting peaks, and validating the peaks. While noise reduction and peak detection are challenging, they are part of more generic signal processing algorithms and hence will not be reviewed extensively here; instead, we refer the reader to two reviews by Bauer et al.48 and Bertsch et al.49 and focus on peak validation, as there are SRM-specific aspects to it.

We are not able to ensure that a given transition (two m/z ratios for peptide and fragment selection) will result in a single peak that represents the signal from the targeted peptide, that is, showing the signal from a selected peptide and fragment.50 Both filtering steps might be leaky to other ions. This happens in the case of homology, but also very distinct proteins might result in equivalent peptides or fragments according to their m/z ratios with respect to the measurement resolution. It is important to select the peak that corresponds to the intended measurement among all the detected peaks to allow for correct qualitative and quantitative signal interpretation. This can be formulated as a binary classification problem:51 interpreting every peak as an example, we need to classify whether this peak is generated from the peptide or whether it is interference. We will have to incorporate additional knowledge to make the correct peaks separable from the rest of the spectra.

The most commonly applied techniques to include more knowledge use additional or extended experiments. Additional MS/MS studies can be incorporated to identify the origin of an ambiguous signal,52 or labeling with heavy isotopes or antibodies is used to keep track of the peaks.25 Not only can the well-understood spectrum interpretation techniques help to understand the additional MS/MS spectra, but the labeling can also be supported with computational methods by predicting the shift of signals after labeling or by proposing informative labels. However, the additional experiments and the labeling are time-consuming and expensive. Therefore, techniques avoiding these were sought to identify the correct peak. mProphet53 was developed, which uses additional decoy transitions. These are signals from artificial peptides with similar mass and ionizability acquired within the normal experimental setup. These assays are used to estimate a model for the background noise of the experiment. This helps us to classify the peaks according to their shape (including amplitude) and retention time as noise or real signal. To this end, the authors of mProphet estimate numerous attributes like correlation values according to shape and retention time. Combining them in a linear fashion with experimentally calibrated weighting, we can create a classifier. The authors showed how the linear combination of the individual attributes leads to an improved classification accuracy compared to just using individual attributes. Therefore, it might be worthwhile to try other combination functions or to replace the attributes by classifiers trained on them and combine these using a metalearning approach (sometimes also known as stacked generalization). This represents a well-studied concept in machine learning that combines the results of individual classifiers using another classifier. This makes it possible to learn an arbitrary combination function and thereby efficiently compensate for individual weak spots of the attributes.54 Even more importantly, the calibration experiment for the weighting and the additional decoy transitions require a certain expertise. It might be sufficient, especially for the calibration of the weighting, to infer the necessary parameters from former experiments stored in some kind of centralized database. To this end, the data from PeptideAtlas33 or similar studies might be used.
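The following sketch illustrates the linear scoring idea (not mProphet's actual implementation); the attribute values, weights, and decoy-derived cutoff are invented for illustration.

```python
# Sketch of a linear discriminant over peak-group attributes, calibrated so
# that decoy transitions score low. All numbers are illustrative only.
import numpy as np

# rows = peak groups; columns = [shape correlation, co-elution, intensity]
features = np.array([
    [0.95, 0.90, 0.80],   # candidate peak group for a real transition
    [0.40, 0.30, 0.20],   # peak group picked for a decoy transition
    [0.85, 0.75, 0.10],   # low-intensity but well-shaped candidate
])
weights = np.array([0.5, 0.3, 0.2])  # would be calibrated on decoys in practice

scores = features @ weights          # linear combination of the attributes
cutoff = scores[1] + 0.2             # hypothetical margin above the decoy score
for score in scores:
    print(f"score={score:.2f} -> {'signal' if score > cutoff else 'noise'}")
```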




5. USING COMPUTATIONAL APPROACHES FOR QUANTITATIVE STUDIES

The factors influencing signal intensity in mass spectrometry have been well-known for many years.55 Intensities do not solely depend on the molecule abundance but also on environmental conditions like pressure, temperature, and the complexity of the sample. Molecule-specific properties, for example, ionization efficiency, also affect the strength of the signal. Nevertheless, the relative comparison between two signals of molecules with similar ionization efficiency measured under the same experimental conditions can be instructive for the relative abundance of the measured species: the abundance remains the only influencing factor when all the other conditions are held constant.

Different approaches have been proposed to use the relative signal intensity for a quantitative study. Most prominently, labeling is used to unambiguously identify a reference peptide and compare its signal with another peak.56 Absolute quantification is possible if the exact amount of the reference is known. Although this represents a very experiment-based approach, computational methods might be able to assist by proposing artificial reference peptides with robust observability and similar measurement behavior compared to the protein under investigation. This is a complex problem, as it requires the accurate prediction of certain properties while at the same time being able to propose a stable synthetic peptide that fits the requirements. In spite of being challenging, inhibitor design is already successfully used in drug discovery studies;57 thus, an approach to propose reference peptides dealing with the much more restricted properties of MS studies compared to biomedical studies could be equally successful. However, it is not clear whether targeted proteomics studies will actually benefit from such developments, as molecule design is often a complex and time-consuming process. Therefore, it might not be practical to apply artificial peptide design in such a scenario. On the other hand, to the authors' knowledge, no artificial peptide proposal algorithm has been published so far in the context of proteomics, so this represents a completely new opportunity to approach targeted proteomics experiment setup.

Label-free approaches have been proposed more recently.12 In this case, the measurements of the same protein from two related samples (e.g., normal and disease) are compared. However, some kind of normalization is needed to account for experimental variations that will occur over the two distinct measurements. Most studies take a global shift of measured intensity into account by normalizing with the overall signal strength of invariant house-keeping proteins. This deals only with the simplest kind of fluctuation and will not be able to explain protein-specific variations. Therefore, further refinements have been proposed. It was suggested to learn fluctuations for every peptide from training data and use these to estimate the differential protein abundance.58 However, this training data may not always be available in a sufficient amount, making it necessary for such models to be capable of using specific data from former experiments. Richard et al.59 took a step in this direction and proposed to include data cleaning and statistical tests for result validation in their automated workflow for protein quantification. Note that most proposals only consider one particular type of fluctuation, while real biological fluctuations are highly stochastic effects probably emerging from multiple underlying processes. A combination of different, complementary principles might lead to a better estimation of such fluctuations.
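As a minimal sketch of such a global-shift normalization, the following code estimates the shift between two hypothetical runs from house-keeping proteins and removes it before computing a fold change; all intensities and protein names are invented.

```python
# Sketch of global-shift normalization between two label-free runs using
# house-keeping proteins: the median log2 ratio of their signals estimates
# the shift, which is then subtracted from target log2 fold changes.
import math
import statistics

housekeeping = {"ACTB": (1.0e6, 1.3e6), "GAPDH": (8.0e5, 1.0e6)}  # (run A, run B)
targets = {"BIOMARKER1": (2.0e5, 6.5e5)}

# global shift = median log2 ratio over the invariant proteins
shift = statistics.median(math.log2(b / a) for a, b in housekeeping.values())

for protein, (a, b) in targets.items():
    fold_change = math.log2(b / a) - shift  # corrected log2 fold change
    print(f"{protein}: log2 fold change = {fold_change:.2f}")
```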

6. SCHEDULING OF MEASUREMENTS

Many transitions are usually measured during a single study. This is not only because we have to measure multiple transitions to investigate a single protein (as described in Section 2), but usually we are also interested in investigating multiple proteins. As every transition requires instrument time, the sequential investigation of all transitions becomes time-consuming and rate-limiting. While classical shotgun approaches can deal with thousands of proteins at the same time, SRM usually only allows us to measure less than 100 proteins per experiment.60 However, most of the time for measuring a single transition is spent on waiting for the actual signal to show up: the retention time. More precisely, this is the time it takes the peptide to travel through the whole apparatus and finally reach the detector. It is mainly governed by the amount of time needed to pass through the chromatographic unit.

The retention time of a peptide can be learned from former experiments. However, it is well-known that the retention time is very sensitive to the experimental setup, making it difficult to find appropriate data. Therefore, approaches have been developed that use a few prescans of reference peptides.61 The outcome of these measurements is informative enough to extrapolate the retention time of a wide range of peptides. The fact that the retention time is so nicely predictable through extrapolation motivated studies using a completely predictive approach: there have been proposals for predicting the retention time using either knowledge about the process62 or simply training a neural network on plain training data.63 It is, however, questionable whether these will be able to capture the details of all experimental setups. At the same time, it has been shown that measuring reference peptides within the particular setup is only a small additional step leading to high-quality retention time estimates. These experiment-augmented predictions are in that case the most promising direction.
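A sketch of this extrapolation idea, in the spirit of the iRT concept,61 follows; the reference values and observed retention times are invented.

```python
# Sketch of retention-time calibration: a handful of reference peptides
# measured in the current setup anchor a linear map from setup-independent
# scores to observed retention times. All numbers are illustrative.
import numpy as np

irt_scores = np.array([0.0, 25.0, 50.0, 100.0])  # library reference values
observed_rt = np.array([5.1, 12.0, 18.8, 32.5])  # minutes, this setup

slope, intercept = np.polyfit(irt_scores, observed_rt, 1)  # least squares fit

def predict_rt(irt):
    """Extrapolate the retention time of an unmeasured peptide."""
    return slope * irt + intercept

print(f"peptide with reference score 70 elutes around {predict_rt(70.0):.1f} min")
```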


Once the retention time is known for every peptide, we can use this knowledge to schedule the measurements and thereby increase the number of transitions in one experiment.64 It has been suggested that scheduling can bring the number of measurements back into the order of magnitude of shotgun experiments.65 Scheduling can be seen as a parallelization of measurements. We need to ensure that we can still measure the correct peak without too much interference from other signals; that is, we need to know when we can measure each signal and need to distribute the measurements in such a way that they do not interfere.66 This represents an interesting optimization problem of setting the starting time of each measurement and the time intervals for the different filters to achieve maximal occupancy without overlap. Very rough approximations to this problem have been proposed and are implemented in many targeted proteomics pipelines. These rough approximations forego the opportunity to use more complex computational approaches to push the capacities for targeted proteomics studies even further. To this end, the problem has been formalized in terms of Integer Linear Programming,66 and exact as well as approximative solutions have been computed.64,66 However, even the approximative solutions need computation time on the order of minutes. This might not be fast enough for approaches where we need to reschedule our transitions dynamically, and therefore, further work has to be invested here.

In what case would we need dynamic transition rescheduling? Intelligent SRM (iSRM) has recently been proposed;67,68 it allows for increased efficiency in measuring SRM transitions: we distinguish between primary transitions that are measured for the quantitative analysis and additional, complementary transitions that are measured for the unambiguous identification of the protein. The idea is that the complementary transitions will only be measured when the peaks from the primary transitions are present, that is, suggest the presence of the searched peptide. Such data-dependent transition triggering minimizes the number of unnecessary and redundant transitions but complicates the process of measurement scheduling. The rejection of certain transitions makes it possible to decrease the experiment time by dynamically rescheduling the whole measurement. However, this rescheduling will require the scheduling algorithm to be very fast so that it can operate on the fly.
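To make the optimization problem concrete, the following sketch checks whether acquisition windows derived from predicted retention times ever exceed a hypothetical concurrency limit; a real scheduler, such as the ILP formulations cited above, would additionally rearrange the windows, and a dynamic iSRM setup would rerun such a check on the fly. All numbers are invented.

```python
# Sketch of a scheduled-SRM feasibility check: each transition is measured
# only inside a window around its predicted retention time, and a sweep over
# window edges counts how many windows overlap at any time point.
WINDOW = 2.0      # minutes measured around each predicted retention time
CAPACITY = 2      # max transitions the instrument can cycle through at once

transitions = {"t1": 10.0, "t2": 10.5, "t3": 11.0, "t4": 25.0}  # predicted RTs

def max_concurrency(transitions, window=WINDOW):
    """Sweep over window edges and return the peak number of overlaps."""
    events = []
    for rt in transitions.values():
        events += [(rt - window / 2, +1), (rt + window / 2, -1)]
    load = peak = 0
    for _, delta in sorted(events):
        load += delta
        peak = max(peak, load)
    return peak

peak = max_concurrency(transitions)
print("feasible" if peak <= CAPACITY else f"overloaded: {peak} concurrent")
```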

7. INTEGRATING PREDICTION TOOLS INTO PIPELINES

Combining techniques for the prediction of all relevant parameters into a large automated workflow enables any experimentalist to use targeted proteomics, even in the absence of data or prior knowledge about the organism and proteins under investigation. Skyline21 is a Windows-based application with an increasing user count. The main goal of Skyline is to guide the user through the process of setting all the different parameters necessary to set up and analyze an SRM study. For example, an in silico digestion of the protein under investigation is implemented to visualize all possible peptides, from which the user can then choose the appropriate peptides for the study. Furthermore, Skyline allows for rapid access to well-known databases like PeptideAtlas, so that the easy integration of prior knowledge is possible. However, Skyline does not yet include sophisticated prediction algorithms to make a de novo proposal for the necessary parameters.

A step in that direction was taken by the authors of ATAQS,69 a web-based application published recently. Although the focus again lies on integrating knowledge from former experiments, computational approaches are included at multiple steps in the workflow: the list of proteins to investigate is completed using system biological information, and prediction algorithms are used in peak validation to train classifiers and to propose decoy transitions. Although peptide selection is once again supported by looking for data from former experiments, simple prediction algorithms are implemented as fallback options to estimate the observability probability for peptides in the absence of such data. Even more notably, the authors put some effort into the extensibility of their application. By providing extensive documentation for bioinformaticians and implementing interfaces that facilitate the integration of new algorithms, they seek to benefit from new developments in the prediction of the parameters.

Applications such as ATAQS and Skyline are still young projects, as is the whole field of targeted proteomics. They are moving in a promising direction by supporting collaboration between researchers in one project, allowing cross-platform usage of the software, and coming with integrations of frequently used databases and prediction algorithms. Nevertheless, the number of proposed extensions is still small. This is probably because the projects are still relatively new. Another reason could be that many researchers work with the tools shipped with their instruments, like MRMPilot (AB Sciex) and Pinpoint (Thermo Fisher Scientific, Inc.). These do not allow for extensions by external developers. Nevertheless, it would be a big success for a research team to incorporate their method within such a framework, thereby reaching a broad audience. It is worth noting also that the commercial vendors have identified the importance of predictive tools in the experimental setup: similar to ATAQS, they include prediction algorithms for the setup of transitions as a fallback option in the absence of data. In addition, simple scheduling techniques are implemented occasionally. Therefore, they might be interested in cooperations with theorists to pursue these topics further.

In the future development of automated workflows, we see the following main opportunities: the extensive usage of predictive tools in the decision process will assist during the setup of the experiment and data interpretation, making targeted proteomics accessible to a broader audience. At the same time, it has to be kept in mind that the predictions might be erroneous, and we will need additional experiments to verify the significance of acquired results. A reliability score (for example, an average certainty of the different incorporated predictions) might be necessary to indicate the relevance of acquired results. Even more importantly, the automated workflows could make it possible to "close the loop" and make validated results immediately accessible for new studies as additional knowledge, or they could be used in the continuous improvement and validation of the computational tools. This already happens on a small scale in many pipeline implementations, allowing for a so-called "iterative setup". However, errors in the predictions could be propagated through the process. Mechanisms will have to be developed that allow for the detection of mistakes and the exclusion of such results from further usage. Developing standards for the input and output data of prediction algorithms will make it easier to integrate new methods into existing tools. Such steps need to be taken by the creators of integrated solutions if they want their tools to garner broader attention from method developers. Conversely, method developers should put effort into integrating their methods within existing frameworks. The possibility of integrating a new method into already existing pipelines will greatly increase the usage of the method.






8. CONCLUSION

We have identified and discussed SRM-related bioinformatic challenges in this review. The eventual goal of approaching these challenges is to support the SRM process in the absence of experimentally derived information about the protein, organism, or experimental setup, or when the scale of the study restricts the usage of premeasurements for empirical parameter estimation. For all the necessary parameters, either prediction algorithms have already been proposed or at least computational approaches to equivalent problems exist. Many of the algorithms show promising results on the test data provided by the authors. Nevertheless, the number of proposed algorithms is still small, while the community keeps emphasizing the need for sophisticated prediction algorithms that would eventually open the door to targeted proteomics for a broader audience. The request for computational methods is not only expressed explicitly in influential papers related to the topic,12,14 but is also visible through recent developments, for example, the usage of prediction algorithms in some studies70 and possibilities to include new algorithms in large pipelines.69

In particular, most proposals so far suffer from two shortcomings that new algorithm proposals should address. First, it is worth noting that most algorithms proposed so far plainly utilize classical supervised approaches like neural networks trained on annotated training data. However, for most problems described here, for example, the observability and retention time of a peptide, the underlying processes are well understood. It is generally assumed that including such knowledge in the training will increase predictive accuracy. Knowledge can be included, for example, by adding specific attributes to the training data or by modeling certain properties explicitly. Second, strengthening the mathematical foundation of the approaches can increase the reliability of results and may open new forms of access to these studies, even for experts. For example, increasing the usage of statistical tests to validate results can help to determine the significance of a prediction and reveal the reliability of a result.

The development of new methods is supported by the vast amount of available data from former experiments in the form of centralized databases like PeptideAtlas,33 PRIDE,71 and SRMAtlas.72 They provide theoreticians with curated training and test data for algorithm development. The performance assessment of new approaches can be done in a straightforward way by comparing against the performance of the methods described in this review: they represent the state of the art for the corresponding problems and are often publicly available. The introduction to the problems in computational terms hopefully provides access to the topic for bioinformaticians. Understanding the mechanisms of the experiments as well as the mathematics needed to construct sound models, they can develop new algorithms that can be integrated into existing or new workflows to support targeted proteomics research. The hope is that this will further increase the general accessibility of SRM, a field which is enjoying a recent upswing in usage, as indicated by several recent reviews.11,45,50,73,74

AUTHOR INFORMATION

Corresponding Author
*E-mail: [email protected].

Notes
The authors declare no competing financial interest.



ACKNOWLEDGMENTS

This article started as an assignment for the graduate course "Reviews in Computational Biology" (263-5151-00 L) at ETH Zurich.



REFERENCES

(1) Domon, B.; Aebersold, R. Mass spectrometry and protein analysis. Science 2006, 312, 212−217. (2) Eng, J.; McCormack, A.; Yates, J., III. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5, 976−989. (3) Craig, R.; Beavis, R. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 2004, 20, 1466−1467. (4) Perkins, D.; Pappin, D.; Creasy, D.; Cottrell, J. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 1999, 20, 3551−3567. (5) Keller, A.; Nesvizhskii, A.; Kolker, E.; Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002, 74, 5383−5392. (6) Nesvizhskii, A.; Keller, A.; Kolker, E.; Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003, 75, 4646−4658. (7) Mueller, L.; Rinner, O.; Schmidt, A.; Letarte, S.; Bodenmiller, B.; Brusniak, M.; Vitek, O.; Aebersold, R.; Müller, M. SuperHirn, a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 2007, 7, 3470−3480. (8) Sturm, M.; Bertsch, A.; Gropl, C.; Hildebrandt, A.; Hussong, R.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Zerck, A.; Reinert, K.; Kohlbacher, O. OpenMS, an open-source software framework for mass spectrometry. BMC Bioinf. 2008, 9, 163. (9) Yates, J.; Ruse, C.; Nakorchevsky, A. Proteomics by mass spectrometry: approaches, advances, and applications. Annu. Rev. Biomed. Eng. 2009, 11, 49−79. (10) Doerr, A. Targeted proteomics. Nat. Methods 2010, 8, 43. (11) Elschenbroich, S.; Kislinger, T. Targeted proteomics by selected reaction monitoring mass spectrometry: applications to systems biology and biomarker discovery. Mol. BioSyst. 2011, 7, 292−303. (12) Lange, V.; Picotti, P.; Domon, B.; Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 2008, 4, 222. (13) Kuster, B.; Schirle, M.; Mallick, P.; Aebersold, R. Scoring proteomes with proteotypic peptide probes. Nat. Rev. Mol. Cell Biol. 2005, 6, 577−583. (14) Yocum, A.; Chinnaiyan, A. Current affairs in quantitative targeted proteomics: multiple reaction monitoring-mass spectrometry. Briefings Funct. Genomics Proteomics 2009, 8, 145. (15) Smith, R. Mass spectrometry in biomarker applications: from untargeted discovery to targeted verification, and implications for platform convergence and clinical application. Clin. Chem. 2012, 58, 528−530. (16) Bisson, N.; James, D.; Ivosev, G.; Tate, S.; Bonner, R.; Taylor, L.; Pawson, T. Selected reaction monitoring mass spectrometry reveals the dynamics of signaling through the GRB2 adaptor. Nat. Biotechnol. 2011, 29, 653−658. (17) Yoshida, Y.; Miyazaki, K.; Kamiie, J.; Sato, M.; Okuizumi, S.; Kenmochi, A.; Kamijo, K.; Nabetani, T.; Tsugita, A.; Xu, B.; Zhang, Y.; Yaoita, E.; Osawa, T.; Yamamoto, T. Two-dimensional electrophoretic profiling of normal human kidney glomerulus proteome and construction of an extensible markup language (XML)-based database. Proteomics 2005, 5, 1083−1096.


(18) Aranda, B.; et al. The IntAct molecular interaction database in 2010. Nucleic Acids Res. 2010, 38, D525. (19) Ceol, A.; Chatr Aryamontri, A.; Licata, L.; Peluso, D.; Briganti, L.; Perfetto, L.; Castagnoli, L.; Cesareni, G. MINT, the molecular interaction database: 2009 update. Nucleic Acids Res. 2010, 38, D532. (20) Kanehisa, M.; Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28, 27. (21) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26, 966−968. (22) Srinivasan, B.; Novak, A.; Flannick, J.; Batzoglou, S.; McAdams, H. Integrated protein interaction networks for 11 microbes. Res. Comput. Mol. Biol. 2006, 1−14. (23) Rahm, E.; Do, H. Data cleaning: problems and current approaches. IEEE Data Eng. Bull. 2000, 3. (24) Ahmed, F. Sample preparation and fractionation for proteome analysis and cancer biomarker discovery by mass spectrometry. J. Sep. Sci. 2009, 32, 771−798. (25) Anderson, N.; Anderson, N.; Haines, L.; Hardie, D.; Olafson, R.; Pearson, T. Mass spectrometric quantitation of peptides and proteins using Stable Isotope Standards and Capture by Anti-Peptide Antibodies (SISCAPA). J. Proteome Res. 2004, 3, 235−244. (26) Siepen, J.; Keevil, E.; Knight, D.; Hubbard, S. Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics. J. Proteome Res. 2007, 6, 399−408. (27) Tang, H.; Arnold, R.; Alves, P.; Xun, Z.; Clemmer, D.; Novotny, M.; Reilly, J.; Radivojac, P. A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 2006, 22, e481. (28) Librando, V.; Gullotto, D.; Minniti, Z. Automated molecular library generation of proteic fragments by virtual proteolysis for molecular modelling studies. In Silico Biol. 2006, 6, 449−457. (29) Jain, E.; Bairoch, A.; Duvaud, S.; Phan, I.; Redaschi, N.; Suzek, B.; Martin, M.; McGarvey, P.; Gasteiger, E. Infrastructure for the life sciences: design and implementation of the UniProt website. BMC Bioinf. 2009, 10, 136. (30) Flicek, P.; et al. Ensembl 2011. Nucleic Acids Res. 2011, 39, D800. (31) Pruitt, K.; Tatusova, T.; Maglott, D. NCBI Reference Sequence (RefSeq): a curated nonredundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33, D501. (32) Craig, R.; Cortens, J.; Beavis, R. The use of proteotypic peptide libraries for protein identification. Rapid Commun. Mass Spectrom. 2005, 19, 1844−1850. (33) Deutsch, E. W.; Lam, H.; Aebersold, R. PeptideAtlas: a resource for target selection for emerging targeted proteomics workflows. EMBO Rep. 2008, 9, 429−434. (34) Cham Mead, J.; Bianco, L.; Bessant, C. Free computational resources for designing selected reaction monitoring transitions. Proteomics 2010, 10, 1106−1126. (35) Mallick, P.; Schirle, M.; Chen, S.; Flory, M.; Lee, H.; Martin, D.; Ranish, J.; Raught, B.; Schmitt, R.; Werner, T.; Kuster, B.; Aebersold, R. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 2006, 25, 125−131. (36) Fusaro, V.; Mani, D.; Mesirov, J.; Carr, S. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 2009, 27, 190−198. (37) Malmström, J.; Lee, H.; Aebersold, R. Advances in proteomic workflows for systems biology. Curr. Opin. Biotechnol. 2007, 18, 378−384. (38) Monigatti, F.; Gasteiger, E.; Bairoch, A.; Jung, E. The Sulfinator: predicting tyrosine sulfation sites in protein sequences. Bioinformatics 2002, 18, 769. (39) Bafna, V.; Edwards, N. SCOPE: a probabilistic model for scoring tandem mass spectra against a peptide database. Bioinformatics 2001, 17, S13.

(40) Barton, S.; Whittaker, J. Review of factors that influence the abundance of ions produced in a tandem mass spectrometer and statistical methods for discovering these factors. Mass Spectrom. Rev. 2009, 28, 177−187. (41) Wysocki, V.; Tsaprailis, G.; Smith, L.; Breci, L. Mobile and localized protons: a framework for understanding peptide dissociation. J. Mass Spectrom. 2000, 35, 1399−1406. (42) Boyd, R.; Somogyi, A. The mobile proton hypothesis in fragmentation of protonated peptides: a perspective. J. Am. Soc. Mass Spectrom. 2010, 21, 1275−1278. (43) Wysocki, V.; Cheng, G.; Zhang, Q.; Herrmann, K.; Beardsley, R.; Hilderbrand, A. Peptide fragmentation overview; John Wiley and Sons: Hoboken, NJ, 2006; Vol. 10, pp 277−300. (44) Zhang, Z. Prediction of low-energy collision-induced dissociation spectra of peptides. Anal. Chem. 2004, 76, 3908−3922. (45) Malmström, L.; Malmström, J.; Aebersold, R. Quantitative proteomics of microbes: principles and applications to virulence. Proteomics 2011, 11, 2947−2956. (46) Stergachis, A.; MacLean, B.; Lee, K.; Stamatoyannopoulos, J.; MacCoss, M. Rapid empirical discovery of optimal peptides for targeted proteomics. Nat. Methods 2011, 8, 1041−1043. (47) Shuford, C.; Li, Q.; Sun, Y.; Chen, H.; Wang, J.; Shi, R.; Sederoff, R.; Chiang, V.; Muddiman, D. Comprehensive quantification of monolignol-pathway enzymes in Populus trichocarpa by protein cleavage isotope dilution mass spectrometry. J. Proteome Res. 2012, 11, 3390−3404. (48) Bauer, C.; Cramer, R.; Schuchhardt, J. Evaluation of peak-picking algorithms for protein mass spectrometry. Methods Mol. Biol. 2011, 696, 341−352. (49) Bertsch, A.; Gröpl, C.; Reinert, K.; Kohlbacher, O. OpenMS and TOPP: open source software for LC-MS data analysis. Methods Mol. Biol. 2011, 696, 353−367. (50) Gallien, S.; Duriez, E.; Domon, B. Selected reaction monitoring applied to proteomics. J. Mass Spectrom. 2011, 46, 298−312. (51) Mitchell, T. M. Machine Learning; McGraw-Hill: New York, 1997. (52) Unwin, R.; Griffiths, J.; Leverentz, M.; Grallert, A.; Hagan, I.; Whetton, A. Multiple reaction monitoring to identify sites of protein phosphorylation with high sensitivity. Mol. Cell. Proteomics 2005, 4, 1134. (53) Reiter, L.; Rinner, O.; Picotti, P.; Hüttenhain, R.; Beck, M.; Brusniak, M. Y.; Hengartner, M. O.; Aebersold, R. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 2011, 8, 430−435. (54) Witten, I.; Frank, E. Data Mining: Practical Machine Learning Tools and Techniques; Morgan Kaufmann: Boston, MA, 2005. (55) Wiley, W.; McLaren, I. Time-of-flight mass spectrometer with improved resolution. Rev. Sci. Instrum. 1955, 26, 1150−1157. (56) Ong, S.; Mann, M. Mass spectrometry-based proteomics turns quantitative. Nat. Chem. Biol. 2005, 1, 252−262. (57) Jorgensen, W. The many roles of computation in drug discovery. Science 2004, 303, 1813. (58) Vogel, C.; Marcotte, E. Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data. Nat. Protoc. 2008, 3, 1444−1451. (59) Richard, E.; Knierman, M.; Gelfanova, V.; Butler, J.; Hale, J. Comprehensive label-free method for the relative quantification of proteins from biological samples. J. Proteome Res. 2005, 4, 1442−1450. (60) Domon, B.; Aebersold, R. Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 2010, 28, 710−721. (61) Escher, C.; Reiter, L.; MacLean, B.; Ossola, R.; Herzog, F.; Chilton, J.; MacCoss, M.; Rinner, O. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 2012, 12, 1111−1121. (62) Krokhin, O.; Craig, R.; Spicer, V.; Ens, W.; Standing, K.; Beavis, R.; Wilkins, J. An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC. Mol. Cell. Proteomics 2004, 3, 908.


(63) Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Use of artificial neural networks for the accurate prediction of peptide liquid chromatography elution times in proteome analyses. Anal. Chem. 2003, 75, 1039−1048. (64) Stahl-Zeng, J.; Lange, V.; Ossola, R.; Eckhardt, K.; Krek, W.; Aebersold, R.; Domon, B. High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol. Cell. Proteomics 2007, 6, 1809. (65) Röst, H.; Malmström, L.; Aebersold, R. A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol. Cell. Proteomics 2012, 11, 540−549. (66) Bertsch, A.; Jung, S.; Zerck, A.; Pfeifer, N.; Nahnsen, S.; Henneges, C.; Nordheim, A.; Kohlbacher, O. Optimal de novo design of MRM experiments for rapid assay development in targeted proteomics. J. Proteome Res. 2010, 9, 2696−2704. (67) Kiyonami, R.; Schoen, A.; Prakash, A.; Nguyen, H.; Peterman, S.; Selevsek, N.; Zabrouskov, V.; Huhmer, A.; Domon, B. Increased quantitative throughput and selectivity for triple quadrupole mass spectrometer-based assays using intelligent SRM (iSRM). 2009. (68) Kiyonami, R.; Schoen, A.; Prakash, A.; Peterman, S.; Zabrouskov, V.; Picotti, P.; Aebersold, R.; Huhmer, A.; Domon, B. Increased selectivity, analytical precision, and throughput in targeted proteomics. Mol. Cell. Proteomics 2011, 10, No. M110.002931. (69) Brusniak, M. Y.; Kwok, S. T.; Christiansen, M.; Campbell, D.; Reiter, L.; Picotti, P.; Kusebauch, U.; Ramos, H.; Deutsch, E. W.; Chen, J.; Moritz, R. L.; Aebersold, R. ATAQS: A computational software tool for high throughput transition optimization and validation for selected reaction monitoring mass spectrometry. BMC Bioinf. 2011, 12, 78. (70) Bantscheff, M.; Schirle, M.; Sweetman, G.; Rick, J.; Kuster, B. Quantitative mass spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 2007, 389, 1017−1031. (71) Vizcaino, J.; Cote, R.; Reisinger, F.; Foster, J. M.; Mueller, M.; Rameseder, J.; Hermjakob, H.; Martens, L. A guide to the Proteomics Identifications Database proteomics data repository. Proteomics 2009, 9, 4276−4283. (72) Picotti, P.; Bodenmiller, B.; Mueller, L. N.; Domon, B.; Aebersold, R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 2009, 138, 795−806. (73) Calvo, E.; Camafeita, E.; Fernández-Gutiérrez, B.; López, J. Applying selected reaction monitoring to targeted proteomics. Expert Rev. Proteomics 2011, 8, 165−173. (74) Chiu, C.; Randall, S.; Molloy, M. Recent progress in selected reaction monitoring MS-driven plasma protein biomarker analysis. Bioanalysis 2009, 1, 847−855.
