Article pubs.acs.org/jpr
Qualis-SIS: Automated Standard Curve Generation and Quality Assessment for Multiplexed Targeted Quantitative Proteomic Experiments with Labeled Standards
Yassene Mohammed,†,‡ Andrew J. Percy,† Andrew G. Chambers,† and Christoph H. Borchers*,†,§
† University of Victoria - Genome British Columbia Proteomics Centre, University of Victoria, Vancouver Island Technology Park, #3101-4464 Markham Street, Victoria, British Columbia V8Z 7X8, Canada
‡ Center for Proteomics and Metabolomics, Leiden University Medical Center, 2333 ZA Leiden, The Netherlands
§ Department of Biochemistry and Microbiology, University of Victoria, Petch Building Room 207, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
Supporting Information
ABSTRACT: Multiplexed targeted quantitative proteomics typically utilizes multiple reaction monitoring and allows the optimized quantification of a large number of proteins. One challenge, however, is the large amount of data that needs to be reviewed, analyzed, and interpreted. Different vendors provide software for their instruments, which determines the recorded responses of the heavy and endogenous peptides and performs the response-curve integration. Bringing multiplexed data together and generating standard curves is often an off-line step accomplished, for example, with spreadsheet software. This can be laborious, as it requires determining the concentration levels that meet the required accuracy and precision criteria in an iterative process. We present here a computer program, Qualis-SIS, that generates standard curves from multiplexed MRM experiments and determines analyte concentrations in biological samples. Multiple level-removal algorithms and acceptance criteria for concentration levels are implemented. When used to apply the standard curve to new samples, the software flags each measurement according to its quality. From the user’s perspective, the data processing is instantaneous due to the reactivity paradigm used, and the user can download the results of the stepwise calculations for further processing, if necessary. This allows for more consistent data analysis and can dramatically accelerate the downstream data analysis.
KEYWORDS: software, multiple reaction monitoring, selected reaction monitoring, parallel reaction monitoring, multiplexed targeted proteomics, reactive analysis
© 2014 American Chemical Society
1. INTRODUCTION
Targeted quantitative proteomics is increasingly being used to quantify the protein contents of biological samples. It is being used to address biological questions and to provide protein profiling, which can be used in personalized medicine. Experimentally, targeted proteomics can be performed with a relative or absolute quantitative technique that is based on a “bottom-up” liquid chromatography (LC)−mass spectrometry (MS) workflow.1,2 In the preferred “absolute” quantitation strategy (i.e., where results are reported as concentrations instead of fold-changes), labeled standards (be it peptides or proteins) that are isotopic analogues and heavy surrogates of the endogenous targets are utilized.3,4 For improved sensitivity, the mass spectrometer (typically a triple quadrupole but also hybrid quadrupole/ion trap or hybrid quadrupole−Orbitrap) is operated in the targeted ion mode wherein the peptide LC eluate is measured by multiple/selected reaction monitoring (MRM or SRM5,6) or parallel reaction monitoring (PRM7−9). In these dynamic monitoring modes, specific precursor ions are selected in the first mass analyzer and fragmented by collision-induced dissociation in the collision cell. From there, specific
Received: October 21, 2014 Published: December 29, 2014
DOI: 10.1021/pr5010955 J. Proteome Res. 2015, 14, 1137−1146
Article
Journal of Proteome Research
Figure 1. Schematic of the two modes of using Qualis-SIS for targeted and multiplexed, quantitative proteomic analysis. (A) Standard curve generation and assay attribute determination for the reference sample. The curve comprises a series of levels with constant NAT and variable SIS and requires the acceptance of a number of criteria (e.g., average precision minimally below 20% CV) to qualify. (B) Concentration determination in the new samples based on regression information from the curve. The quality of the derived results is additionally assessed and displayed. The presented results here are acceptable (noted by the green tag) as they lie within the assay’s range of linearity.
product ions are selected in the second mass analyzer in MRM or entirely transmitted into the second mass analyzer for the acquisition of complete product ion mass spectra in PRM. Regardless of the mode, quantitation of the endogenous proteins in sample matrices should be achieved through linear regression analysis of peptide standard curves, particularly for Tier 1 assays where accurate quantitation is the ultimate goal,10 although quantitation based on the relative ratio of the labeled and unlabeled peptide signal is considered acceptable under certain circumstances. It is important to bear in mind, however, that the latter method assumes that the determined concentrations lie within the linear range of the assay, which may not always be true.
Over the years, there has been an increasing need for highly multiplexed peptide panels to help expedite the verification/validation of putative protein biomarkers and to help improve the diagnostic/prognostic accuracy of disease assessment.11 Using the targeted technology previously described, hundreds of peptides with three transitions per peptide form (i.e., light, endogenous, natural, or native, abbreviated NAT; heavy or stable isotope-labeled standard, abbreviated SIS) can theoretically be monitored in a single analytical run. The analysis of such data sets, however, is laborious. Currently, the data analysis can be partially accomplished with vendor-dependent software, such as MassHunter Quantitative Analysis (Agilent), MultiQuant (AB Sciex), and Pinpoint (Thermo Scientific), or with vendor-independent programs, such as Skyline.12,13 Overall, these software packages are generally dedicated to a preliminary analysis of the mass spectrometric spectral data and the transitions and enable the user to verify and edit the peak selection/integration. However, determination of the concentration levels using standard curves is rather limited and manually intensive. Furthermore, none of the available tools can rapidly generate curves for multiplexed analysis using different concentration level removal strategies and variable thresholds for precision and accuracy. The latter is required for adherence to the FDA guidelines for bioanalytical measurement, whereby 20% deviation in precision/accuracy is tolerated for the lower limit of quantitation (LLOQ) and 15% deviation for all other qualified levels.14 In addition, the calculation of the endogenous peptide concentration from the standard curve is a slow and tedious process when offered as it is in Skyline plug-ins (e.g., QuaSAR15). We here present Qualis-SIS, software that automates the generation of peptide standard curves as well as the calculation of
Figure 2. Flowchart illustrating the steps of the processing pipeline of multiplexed quantitative data as implemented in Qualis-SIS. The inputs are specified for each peptide and viewed independently on screen. The reactive processes (shown in orange) are retriggered whenever the user changes any of the inputs, with the new results presented on the screen in real time.
assay attributes (e.g., limit of quantitation, dynamic range) and endogenous protein concentrations of hundreds of target peptides in reference samples (also referred to as control) and samples (also referred to as new or patient samples). For enhanced customization enabling adherence to FDA guidelines, it allows the user to select the regression weighting, level removal algorithm (i.e., low-to-high or end-to-end), and precision/accuracy threshold on a global and local level at the LLOQ. In addition to using the regression equation to calculate the endogenous concentration of a sample, the tool assesses the quality of the sample measurement and displays the results in a color-coded table. Compared with available alternatives, Qualis-SIS centralizes all computation into a single tool, which reduces the need for multiple software packages and dramatically reduces the manual processing needed for curve interpretation and results extraction.
2. MATERIALS AND METHODS
The software is written in R and is based on reactive programming. This is a development paradigm based on the propagation of change during data processing to the rest of the analysis. The data processing involves multiple steps, starting with reading in the response data and ending with the calculation of the endogenous peptide concentrations. The final results will change depending on several parameters (e.g., regression weighting, analytical precision), each of which is an input to a different part of the algorithm. The user can select these parameters based on their knowledge of the analytical experiment or adjust the analysis method to see how it affects the final results. Thus, reactive programming is ideal for allowing real-time updates of the results according to interactions with the user. We used R16 to write the computational part of the software, which encompasses the calculations and generation of figures and result files. The Shiny17 library was used to implement the reactivity part of the software, while ShinyDash18 was used for the interactive graphical user interface. The data sets used are from our previous studies.19,20 We also used data sets from new LC−MRM/MS experiments using pooled human plasma purchased from Bioreclamation (catalog no. HMPLEDTA2; Westbury, NY) obtained from whole blood collected from 15 male and 15 female race-matched donors, between the ages of 18 and 50.
3. RESULTS AND DISCUSSION
A multiplexed, targeted proteomic experiment with labeled standards includes the measurement of tens to hundreds of peptides in a single LC−MS analysis toward absolute quantitation. Enforcing a standardized data analysis strategy is important to reduce human error and enhance the reproducibility of analysis between laboratories. Apart from the automation and speed, an additional goal of our developed software tool is to change the user experience from performing a stepwise analysis to interactively examining the data. (See Figures 1 and 3.) Whenever the user changes the analysis parameters (be it from a change in the concentration level removal strategy or precision/accuracy threshold; see flowchart in Figure 2 for a broad overview), the final results are browsable in real time. This streamlines the comparison, enabling the final analysis parameters for a given assay to be more rapidly defined. The following outlines the software and its implementation, highlights its use and application, and discusses the test results.
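The change-propagation idea behind the reactive design described in the Materials and Methods can be illustrated with a minimal sketch. It is written in Python rather than R/Shiny, and the class names (`Input`, `Derived`) and the toy two-level data set are hypothetical illustrations, not part of Qualis-SIS. The point is only that changing one input recomputes the steps that depend on it and nothing else.

```python
class Input:
    """A user-adjustable parameter; changing it invalidates its dependents."""
    def __init__(self, value):
        self._value = value
        self._dependents = []

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for dep in self._dependents:
            dep.invalidate()


class Derived:
    """A cached computation that is rerun only after one of its sources changed."""
    def __init__(self, func, *sources):
        self._func = func
        self._sources = sources
        self._dependents = []
        self._cache = None
        self._valid = False
        self.runs = 0  # counts how often the step was actually recomputed
        for src in sources:
            src._dependents.append(self)

    def invalidate(self):
        self._valid = False
        for dep in self._dependents:
            dep.invalidate()

    def get(self):
        if not self._valid:
            self._cache = self._func(*(s.get() for s in self._sources))
            self._valid = True
            self.runs += 1
        return self._cache


def level_cv(responses):
    """Percent CV of the replicate SIS/NAT ratios at each concentration level."""
    out = {}
    for conc, reps in responses.items():
        mean = sum(reps) / len(reps)
        sd = (sum((r - mean) ** 2 for r in reps) / (len(reps) - 1)) ** 0.5
        out[conc] = 100.0 * sd / mean
    return out


# Toy pipeline: replicate ratios -> per-level CV -> levels passing the threshold.
responses = Input({1.0: [0.9, 1.1, 1.0], 10.0: [9.0, 14.0, 10.0]})
cv_threshold = Input(20.0)
cvs = Derived(level_cv, responses)
accepted = Derived(
    lambda cv, thr: sorted(c for c, v in cv.items() if v < thr),
    cvs, cv_threshold,
)
```

Calling `accepted.get()`, then `cv_threshold.set(25.0)`, then `accepted.get()` again recomputes only the acceptance step; the per-level CVs are served from the cache because the response data did not change.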
3.1. Generation and Qualification of Peptide Standard Curves
The generation of a standard curve is the principal step in many quantitative methods. Here the goal is to construct a linear (or quadratic) relationship between the instrument’s response and the concentration of an analyte. In our MRM analyses, we add a mixture of SIS peptides to the tryptic matrix digest and use the relative response (i.e., SIS/NAT) instead of the instrument’s absolute response from the individual SIS or NAT transition measurements. Standard curves are sometimes constructed by spiking variable concentrations of a SIS peptide mixture into a buffer or by spiking a fixed concentration of SIS peptides into a serially diluted matrix, typically covering a dilution series of several 100-fold.3 This approach, however, can lead to incorrect LOQs because in the first scenario the buffer will not contain all of the potentially interfering substances in the matrix, and the second scenario will lead to the dilution of potentially interfering components in the sample. Although this will certainly reduce the well-known suppression effects due to the dilution of the matrix,21 it is an artificial benefit because the authentic samples will not be similarly diluted. It is for these reasons that holding the NAT concentration (and the rest of the matrix components) constant when preparing the standard curve is the more accurate, and preferred, method (see schematic in Figure 1). We have previously demonstrated this approach in a large number of quantitative proteomics method development projects using multiplexed panels of SIS peptides and a variety of sample matrices.19,22−24 By using this approach, the artificial removal of interferences by matrix dilution is avoided, while the upper LOQ is not unnecessarily limited.
Using this curve-generation technique (i.e., holding the matrix unchanged and spiking-in multiple SIS concentrations), Qualis-SIS generates plots of relative response (i.e., SIS/NAT peak area ratio) as a function of either the SIS peptide concentration (in the case of the reference) or relative concentration (toward the sample analysis). In our laboratory, these typically consist of six to eight concentration levels, each of which has a minimum of three to five replicates. The software, however, allows enforcing the minimum number of replicates and consecutive levels with a lower limit of three and a maximum limit determined by the input data. To qualify for curve generation, the analytical replicates within a given level must be both precise (typically 1000-fold change between standards), while an end-to-end strategy is favored if detector saturation is observed at the upper limits. In the end, only those levels that pass the precision and accuracy thresholds are considered to be usable in the linear regression. This final curve is then used to determine the lower/upper LOQs and dynamic range and to quantify the endogenous protein concentration in the reference and patient samples.
3.2. Single-Point Measurement and Standard Curves
An MRM analysis is based on the premise that the heavy labeled peptide behaves similarly to the target peptide in all parts of the analysis system. Therefore, the peak area ratio of the SIS to the NAT peptide equals the ratio of their concentrations, that is
$$\frac{A_{SIS}}{A_{NAT}} = \frac{C_{SIS}}{C_{NAT}} \qquad (1)$$
where $A_{SIS}$ and $A_{NAT}$ are the SIS and the NAT peak areas, respectively, and $C_{SIS}$ and $C_{NAT}$ are the SIS and NAT concentrations, respectively. By knowing the spiked-in SIS concentration we can readily compute the NAT concentration. By single-point measurement, we mean calculating the NAT concentration directly from this equation without the standard curve. Generating a standard curve using replicates at different concentration levels is essential when we intend to collect statistics about how confident we are in our measurement and to determine the LOQs. Here either $C_{SIS}$ or $C_{NAT}$ is held constant, and the other is varied over a range of concentrations. In our studies, we keep $C_{NAT}$ constant for the reasons previously mentioned, and linear regression analysis is applied to determine a relationship of the type
$$\frac{A_{SIS}}{A_{NAT}} = a\,C_{SIS} + b \qquad (2)$$
where $a$ is the slope of the curve and $b$ is the y-intercept. In this case, a confidence interval can be determined and the $C_{NAT}$ in the (reference) sample can be determined with
$$C_{NAT} = \frac{1 - b}{a} \qquad (3)$$
given that the dynamic range covers the region where $A_{SIS}/A_{NAT} = 1$. Crucially, and as a result of generating a standard curve with replicates at each level, the calculated $C_{NAT}$ can be reported with a confidence interval of 95%, for instance. It is important to note that the reported confidence interval must be calculated by reverse regression,25 considering the peak area ratio as the dependent variable or regressand and the $C_{SIS}$ as the independent variable or regressor. It is incorrect to flip these around and construct the linear regression by inverse regression, that is, considering the concentration as the dependent variable and the peak area ratio as the independent variable.26 The generated standard curve is specific to the reference sample used; that is, the curve is specific to the $C_{NAT}$ in the reference sample. (See Figure 1.) Once the standard curve is generated, it can be used to calculate the endogenous concentration in new samples. One important piece of information the curve provides is an estimate of the confidence interval of a new single-point measurement. This, however, requires that we do a new single-point measurement with a concentration of the spiked-in heavy labeled peptide that is very close or identical to that of the endogenous peptide in the reference sample, for which the curve was generated. (See Figure 1.) In this case, the statistics from the curve can be applied to the new single-point measurement, and we can infer from the curve by reverse regression the confidence interval of the new measurement. If the concentration of the spiked-in SIS lies far from the curve-specific NAT concentration, all statistics based on the curve, including the dynamic range and LOQs, are not applicable, and we have no estimate of the error in that new measurement. Here our best estimate of the concentration can be obtained from eq 1.
Figure 3. Overview of the Qualis-SIS user interface. The interface consists of four tabs. The File page allows the user to upload the input files (for the reference sample to generate the standard curve and for new samples to be evaluated using the standard curve) and download the output as well as example input files. The Results from Reference and Results from Samples pages illustrate the peptide standard curve and a summary of the determined metrics for the reference and new sample based on a set of user settings (e.g., regression weighting, precision, accuracy). The Quality Assessment page provides an overview of the quality of the sample measurements. (See Figures 4 and 5.)
3.3. Overview of the Functionalities and User Interface
The software runs in the browser in a client-server mode and can be run locally. It has four main tabs, as illustrated by the screenshot in Figure 3. The page accessed via the first File tab allows uploading of the input files and downloading of the analysis results as well as example inputs. Once the input files have successfully been uploaded, the other pages linked to the remaining tabs (Results from the Reference, Results from the Samples, Quality Assessment) are activated according to the provided inputs. The Results from the Reference page illustrates and summarizes the results from regression analysis of the reference sample. Here the user can choose the weighting factor for the linear regression (either none, 1/x, or 1/x²; 1/x² is the default), the concentration level removal method (either low-to-high or end-to-end; low-to-high is the default), and the thresholds for average precision (adjustable CV between 5 and 25%) and accuracy (adjustable between 5 and 25%; see Figure 2 for checkpoint flowchart). With respect to the level removal strategy, in low-to-high, the lowest concentration level is permanently removed in a sequential manner if the accuracy requirement is violated in any of the remaining levels. In end-to-end, the lowest and highest concentration levels are independently and temporarily removed, the coefficient of determination (R²) values of the resulting curves are compared, and the level that yields the lower R² value is permanently removed. This process continues inward from “end-to-end” until no violations in the remaining levels occur. The precision and accuracy criteria can be user-specified for the lower LOQ and for all other qualified levels to allow adherence to the FDA requirements (20% deviation at LLOQ and 15% for the remaining, qualified levels). In addition, the number of acceptable replicates (“all” being the default) within a given concentration level and the number of consecutive, qualified levels (three being the default) can also be user-specified. Standard curves (relative ratio vs SIS concentration) are displayed in normal and log−log scales and can be toggled between peptides using a dropdown list. (See Figure 3b.) Included in the plots is the 95% confidence interval, which is calculated from the reverse regression.25 For quick evaluation, a brief summary of the results (in terms of lower and upper LOQs, dynamic range, and endogenous concentration) is shown in this tab. The Results from the Samples page is activated if the user uploads the peptide and protein input files of the samples into the File tab. The concentrations of the endogenous proteins are then computed from the regression analysis of the control curve as well as from relative ratios (i.e., single-point measurement, SPM) in the absence of the curve as in eq 1, for comparison. Here new curves are constructed using relative response as a function of relative concentration (RC), as defined and implemented in our recent studies.19,20 This is plotted in the normal and log−log scale on a per-peptide basis. On the curve, the sample protein concentration (denoted as green asterisk from SPM or purple asterisk from regression analysis) is plotted to provide a quick visualization and assessment of its relationship with the assay’s dynamic range. All determined concentrations of the proteolytic samples are provided in this page in a summary table.
Figure 4. Three examples of quantitative analysis of a sample using regression information from the standard curve based on the reference (control) sample. The different colors refer to the quality of the measurement: green implies that the results can be trusted; yellow means the measurement needs careful consideration; and red means that further investigation is required. The latter case arises when there is a large difference between the concentration of the spiked-in SIS in the sample and the NAT concentration from which the curve was generated. This suggests that the calibration curve is not applicable to the new samples, and a reconsideration of the measurement with a SIS concentration closer to the curve-specific one is needed. (See Sections 3.1 and 3.2.) Yellow appears mainly when the new measurement suggests that the dynamic range of the generated standard curve does not cover the actual dynamic range. (See Section 3.5 for details.)
The Quality Assessment tab provides access to an overview of the quality of the sample measurements. Specifically, it provides a visual indication (through a color-coded table) of the deviation in determined concentration between SPM and regression analysis
as well as the concentration’s relation to the assay’s range of linearity. The results are highlighted in green to denote acceptable, yellow for caution, and red to imply unacceptable. Further to the latter, an unacceptable flag is returned if, for instance, the derived concentration is outside the range of linearity. In this case, the user can repeat the measurement with a different SIS peptide concentration or accept the SPM value, for which there are no statistics available. The coding provides a rapid visual indication of which measurements can be trusted and which ones need more careful consideration. (See Figure 4 and Supporting Information Figure 1.)
3.4. Input and Output Formats
For ease of use and to keep our implementation compatible with most other software packages, the input and result files are in the comma-separated value (CSV) format. Two input files (peptide and protein level) are required to generate peptide standard curves and to quantify NAT in the reference and new samples. The peptide level file minimally requires the responses from the SIS and NAT peptides for all replicates of all concentration levels. (See Supporting Information Table 1.) Additional optional categories can include peak widths and retention times, for instance. The protein level file contains information regarding the concentration levels (i.e., dilution ratio, number of levels, spiked-in SIS peptide, or protein concentration) and the protein (i.e., molecular weight; see Supporting Information Table 2) for the reference curves. The same protein-level file can, however, be used when applying the standard curve to the samples. Here only the SIS peptide (or protein) concentration and protein molecular weight are needed and used. This is collectively tabulated on a per-compound basis, which is adaptable (we indicate the protein, peptide, precursor, and product ion) but must be consistent between file types. In most cases (e.g., MassHunter Quantitative Analysis), vendor software provides the processed data in the Qualis-SIS supported format. This shortens the time between the initial verification of peak selection/integration with user-selected software and the final analysis with Qualis-SIS. Because of the simple CSV format, it is also possible to import data that have been exported from other commonly used software programs, for instance, from Skyline, for which we have an import option. The results can be exported as CSV files. These files include both detailed and summarized tables of attributes and metrics from the reference and sample quantitative analyses. (See Supporting Information Table 3 for an example.) The files can be subsequently imported into spreadsheet software for viewing or for additional computation or imported into other scientific processing tools, such as R or MATLAB, for further statistical analysis.
3.5. Application to New Samples
The generated standard curves are specific to the matrix or reference sample that is used to generate them. (See Figure 1.) This is because of the experimental design that keeps the matrix or reference sample unchanged and spikes-in the heavy labeled peptide at different concentration levels. (See the Materials and Methods.) This means that the generated standard curves are not generally applicable, and using them to estimate the concentrations from later single-point measurements in samples is limited. Ideally, the concentration of the heavy labeled peptide used in the samples should be as close as possible to the concentration of the endogenous peptide in the matrix or reference sample from which the standard curve was generated. This ensures that the generated curve and related statistics, including the confidence interval and LOQs, are applicable to the new samples. In our algorithm, once the reference curves are generated and the user has adjusted all of the parameters, they can be used to estimate the concentrations in the samples. Here only the responses from the heavy and light peptides are needed, and neither replicates nor concentration levels are required. (See the Supporting Information.) The software also performs a quality assessment by comparing the concentration values calculated from the standard curve with concentrations calculated by applying eq 1. As previously mentioned, eq 1 is based on the premise that the NAT and SIS peptides behave similarly in the measuring system and is used throughout the generation of the curve. Therefore, it is also valid for new measurements, and the closer the value estimated by eq 1 is to that determined from the corresponding standard curve, the more applicable the statistics collected from the curve are to that measurement. The software requires the difference to be less than a threshold (with 5% as the default), determines whether the values are within the linear dynamic range, and classifies the results into three categories according to the comparison results: good measurement, user approval is required, and remeasurement is required. The three categories are reported visually in “traffic-light” colors: green, yellow, and red, respectively, to allow a fast inspection of the results. (See Section 3.3.) In general, the yellow category indicates the measurement of samples that lie beyond the original standard curve; that is, the values fell below the original LLOQ or above the original ULOQ. Here the user can decide whether to accept the extrapolation or to repeat the reference experiment to extend the dynamic range appropriately.
3.6. Testing and Validation
A variety of data sets were assessed and compared with manual processing techniques to evaluate the software’s utility and verify the algorithm’s correctness. The test data sets included condensed and complete panels from previous developments in our laboratory that used targeted protein quantitation methodologies.19,20 Specifically, the data were obtained from bottom-up LC−MRM/MS analyses of human plasma and of cerebrospinal fluid (CSF). The human plasma data set contains 92 peptides, each measured in triplicate at 7 SIS concentration levels spanning a 10 000-fold range in concentration. Comparing the automatically calculated values with the manually calculated values (i.e., the NAT concentration, the LLOQ, and the ULOQ), we sometimes noticed a difference of 0.01 fmol. This was because our software rounds only in the final step before displaying the results on the screen and is therefore more accurate. We compared the time spent to manually generate standard curves for the 92 peptide data set with the time required for our software to perform the calculation. This included enforcing all of the level-acceptance criteria and applying one of the concentration-level removal methods. The average time required to perform the calculations using off-the-shelf software (i.e., MS Excel and Sigma Plot) was about 5−10 min per peptide. Using our software tool, it took ∼2 s to calculate the concentrations of all 92 peptides on a laptop running an Intel i5-4200U processor (a reduction in time by a factor of ∼150−300). In an entire analysis workflow, including the generation of standard curves from a reference data set and using these to calculate protein concentrations in new samples in a server-client setting, we measured 15 s for the whole analysis workflow. This included the user “clicking” to upload the four needed input files
Figure 5. Example of an input data set with incorrect peak integration/picking. When generating a standard curve in an experiment with replicates, Qualis-SIS shows any in correct peak picking or integration from previous steps (performed in external software) as in panel A. This happens often at the lower concentration levels of the internal standard. Using our software the user can easily see these, which are shown in red on the screen (A). He can then go back to the original data (B) using the inspection software of his choice (for instance, SkyLine or MassHunter Quantitative Analysis), correct those specific peaks (C), and run the data again in Qualis-SIS (D).
virtualized server that can be adjusted to such situations by adding more virtual resources. Alternatively, users who do not have access to powerful machines can always choose to divide any input data set into smaller input files and run these in series or in parallel on multiple copies of Qualis-SIS and then combine the results. This is possible because of the simple CSV input format used.. However, because of the very fast data processing compared with the time spent on data acquisition, the users will likely choose to spend most of the time manipulating the results and trying multiple parameters on the results from the whole data set. We tested how our algorithm handles incorrect peak picking and interferences in the input data. For this test, we measured 101 peptides in human plasma, each measured in quintuplicate replicates at six SIS concentration levels spanning a 500-fold range in concentration. We performed peak inspection on all of the data obtained, manually corrected any incorrect peak picking, and monitored the data for interferences. We then fed the corrected data as well as the raw data into our software to test its performance and behavior. Out of the 3030 data points (i.e., 5 repeats of 101 peptides at 6 concentration levels), we were able to use Qualis-SIS to identify 16 peptides for which incorrect peak picking/integration was performed at one concentration level; that is, the software flagged that level as red in the plots and removed it from the data that is included for generating the standard curves. However, 63 other removed levels in Qualis-SIS were extreme data points and manual correction was needed but did not lead to improvements in the curve. This is a massive
from a client machine in Europe to a server on the west coast of Canada. Most of the time was spent in opening the upload window and choosing the files. This, however, did not include the time required for online inspection and manipulation of the results by the user. We also manually evaluated the quality assessment matrix obtained by the software and verified the different assessments made by the software. To simulate future-use cases and to test the software on larger multiplexed MRM experiments, we used our CSF data set with 375 peptides measured in quintuplicate at 7 SIS concentration levels, also spanning a 10 000-fold range. The entire analysis took 3.5 s when running the software on the command line. Unlike running the software from the user interface, running it from the command line does not involve any extra overhead for, for instance, visualization on the screen or preparing the results for download as CSV files (i.e., this is the pure processing time). This can be useful when the user wants to use the software in a bigger data-processing pipeline and to apply fixed parameters (i.e., the same accuracy, precision, concentration removal method, etc.). When the software was used from the user interface, it needed ∼6.5 s for the final results to be ready for download. Regarding the processing of even larger amounts of data, for example, thousands of peptides as in PRM experiments, we believe that our software can easily be scaled up to run such analyses due to the technology used, that is, R and reactivity. For this, running the software on a powerful machine with large amounts of memory and a fast processor should help. In our production environment, we currently run the software on a
DOI: 10.1021/pr5010955 J. Proteome Res. 2015, 14, 1137−1146
navigate and explore the results almost immediately. The input values are simple comma-separated values and are uploaded as two files that contain the information about the endogenous and heavy labeled peptide responses as well as other necessary information, such as the spiked-in SIS concentration levels and protein molecular weights. The software also allows the standard curves generated from a reference sample to be used to calculate the concentrations in new samples. Here the reference sample is used to generate standard curves for the different peptides (i.e., the same reference sample is measured with different concentration levels of the heavy labeled peptide added). The standard curves are then used to estimate the concentrations and the confidence intervals in the new samples. (See Figure 1.) In this case the software also assists in assessing the quality of the measurements in the new samples: the user can get a quick overview by looking at a “traffic-light” coded matrix, where each cell corresponds to one peptide in one sample. Compared with manual processing of this part of the analysis pipeline, which can take up to several hours for one multiplexed targeted experiment with hundreds of peptides, our software requires only seconds to finish all calculations. Therefore, rather than investing time in performing intensive calculations, the user can rapidly review the results from Qualis-SIS and modify any default parameters to better suit the experiment, for instance, to enforce different accuracy or precision criteria. This changes the user experience and opens the door to using MRM, or PRM, for applications where standardized calculations and quality assessment are essential, as in clinical applications. The software described here is available under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA) on bioinformatics.proteincentre.com.
Access to the online version of the software is available at http://bioinformatics.proteincentre.com:3838/qualis-sis/.
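As a rough illustration of how a standard curve from a reference sample might be applied to a new sample, the following Python sketch fits a line to response ratio versus concentration, inverts it for a new measurement, and assigns a traffic-light flag. It is not the Qualis-SIS implementation (which is written in R). The function names, the unweighted least-squares fit, the example numbers, and the flagging rule (green inside the qualified range of the curve, yellow within a tolerance outside it, red otherwise) are all assumptions made for illustration, and confidence intervals are omitted.

```python
def fit_line(conc, ratio):
    """Unweighted least-squares fit of ratio = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(ratio) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, ratio))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_conc(ratio, slope, intercept):
    """Invert the fitted line to estimate a concentration from a response ratio."""
    return (ratio - intercept) / slope


def traffic_light(conc, lowest, highest, tolerance=0.2):
    """Hypothetical flagging rule; the real criteria may differ."""
    if lowest <= conc <= highest:
        return "green"  # inside the qualified range of the curve
    if lowest * (1 - tolerance) <= conc <= highest * (1 + tolerance):
        return "yellow"  # slightly outside the qualified range
    return "red"  # far outside the qualified range


# Made-up reference curve: three concentration levels and their response ratios.
slope, intercept = fit_line([1.0, 10.0, 100.0], [0.02, 0.2, 2.0])
estimate = predict_conc(0.5, slope, intercept)
print(round(estimate, 3), traffic_light(estimate, 1.0, 100.0))
```

Evaluating `traffic_light` for every peptide in every sample would give one cell per peptide and sample, in the spirit of the coded matrix described above.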
reduction in the number of integrations that the user has to visually inspect, from 3030 to 395 (i.e., (63 + 16) × 5). As previously described and illustrated in Figure 5, the algorithm will remove a level according to the precision and accuracy constraints defined by the user. Furthermore, the different controls available in the software for specifying the minimum number of data points per level allow the user to fine-tune the behavior of the software toward noisy data, uninspected data, or incorrect peak picking or interferences. To compare Qualis-SIS with similar software, we considered QuaSAR, which is a Skyline plug-in that offers calculation of the limits of detection and quantification along with other statistics. The main difference between our tool and the Skyline plug-in is the application: QuaSAR considers data obtained by diluting the samples and having the internal standard spiked at the same concentration level in the entire sample dilution series. As previously described, this approach is not recommended. (See Generation and Qualification of Peptide Standard Curves.) Another major difference that is crucial for our use is the possibility to adhere to the FDA requirements regarding standard curves, which is possible using our software as previously described. Regarding the usage pattern, we built our software with reactivity in mind, so that the user interacts with the processing workflow before exporting the final results. In QuaSAR the user only sees the final results when the whole processing pipeline is finished, which takes almost 20 min for 125 peptides on an Intel Xeon E5 processor with 16 GB of RAM. This is faster than manual calculation but, in terms of processing time per peptide, more than 100 times slower than our software. An advantage of QuaSAR is that it is available as a plug-in in the widely used Skyline platform.
We chose instead to make our software an independent module that runs with minimal input data from any software including Skyline.
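The precision and accuracy check discussed in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the algorithm implemented in Qualis-SIS: the function names, the 20% thresholds, the example data, and the reading of low-to-high removal as "drop failing levels from the low end until the lowest remaining level passes" are all hypothetical. The sketch also works on already back-calculated concentrations and does not refit the curve after each removal, which a full implementation would likely do.

```python
from statistics import mean, stdev


def level_ok(nominal, measured, max_cv=0.20, max_bias=0.20):
    """Check one concentration level against precision (CV) and accuracy (bias)."""
    m = mean(measured)
    if m <= 0:
        return False  # cannot compute a meaningful CV
    cv = stdev(measured) / m              # precision across replicates
    bias = abs(m - nominal) / nominal     # accuracy relative to the nominal level
    return cv <= max_cv and bias <= max_bias


def low_to_high(levels, **criteria):
    """Drop failing levels from the low end; return the nominal levels that remain."""
    kept = sorted(levels.items())
    while kept and not level_ok(kept[0][0], kept[0][1], **criteria):
        kept.pop(0)
    return [nominal for nominal, _ in kept]


# Made-up back-calculated concentrations: five replicates per nominal level.
levels = {
    1.0: [0.4, 1.9, 1.1, 0.2, 2.5],      # noisy lowest level
    10.0: [9.5, 10.2, 10.8, 9.9, 10.1],
    100.0: [98.0, 101.0, 103.0, 99.0, 100.0],
}
print(low_to_high(levels))
```

Tightening `max_cv` or `max_bias` mimics a user enforcing stricter criteria; only the levels that are dropped would then need visual inspection in the peak-picking software.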
■
4. CONCLUSIONS
A targeted, multiplexed MRM or PRM experiment contains hundreds of peptides measured in three or more replicates at multiple concentration levels that span a wide range of concentrations. Generating the standard curves and determining which concentration levels conform to specific accuracy and precision criteria can be a cumbersome process, in which researchers have to perform several steps manually and repeatedly. Here we present software for the automated generation of standard curves in multiplexed MRM or PRM analyses. The software has been developed specifically for the case where the matrix is left unchanged and the heavy labeled peptides are spiked in at different concentrations. Despite this tailored application, we note that it can also be applied to the PSAQ27,28 and QconCAT29,30 techniques, wherein the isotopically labeled standards (protein in PSAQ or winged peptides in QconCAT) are spiked into the matrix at the beginning of the analytical workflow in the former or directly prior to proteolysis in the latter, because the products are ultimately peptide-based. When targeted MRM proteomics is applied with the light channel as reference, for example, in cultured-cell experiments, it is still possible to use Qualis-SIS by simply exchanging the columns in the input file. In determining the dynamic range and limits of quantification, our software supports two strategies for concentration-level removal: end-to-end and low-to-high. The user interface is reactive and changes instantly according to the user’s selections; once the input data have been uploaded, all calculations are done in the background, and the user can
ASSOCIATED CONTENT
Supporting Information
Supporting Information Table 1: An example of a peptide input file for the reference curves. Supporting Information Table 2: An example of a protein-level input file. Supporting Information Table 3: Example of the output of the software. Supporting Information Figure 1: The results shown on the Quality Assessment page. Supporting Information Software Tutorial: A tutorial on how to use the software with multiple screenshots. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
*Phone: +1-250-483-3221. Fax: +1-250-483-3238. E-mail: [email protected].
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
We are grateful to Genome Canada, Genome British Columbia, and Western Economic Diversification of Canada for providing Science and Technology Innovation Centre funding and support to the UVic-Genome British Columbia Proteomics Centre. We are thankful to Linghong Lu (mathematics graduate student) and Mary L. Lesperance (mathematics professor), both statistical
(16) Ihaka, R.; Gentleman, R. R: A Language for Data Analysis and Graphics. Journal of Computational and Graphical Statistics 1996, 5, 299−314.
(17) RStudio Shiny: a Web Application Framework for R. http://shiny.rstudio.com/ (July 19, 2014).
(18) ShinyDash: Dashboard Implementation for Shiny. https://github.com/trestletech/ShinyDash (July 19, 2014).
(19) Percy, A. J.; Yang, J.; Chambers, A. G.; Simon, R.; Hardie, D. B.; Borchers, C. H. Multiplexed MRM with Internal Standards for Cerebrospinal Fluid Candidate Protein Biomarker Quantitation. J. Proteome Res. 2014, 13 (8), 3733−3747.
(20) Percy, A. J.; Chambers, A. G.; Yang, J.; Hardie, D. B.; Borchers, C. H. Advances in Multiplexed MRM-based Protein Biomarker Quantitation Toward Clinical Utility. Biochim. Biophys. Acta 2014, 1844 (5), 917−926.
(21) Stahnke, H.; Kittlaus, S.; Kempe, G.; Alder, L. Reduction of matrix effects in liquid chromatography-electrospray ionization-mass spectrometry by dilution of the sample extracts: how much dilution is needed? Anal. Chem. 2012, 84 (3), 1474−1482.
(22) Chambers, A. G.; Percy, A. J.; Yang, J.; Camenzind, A. G.; Borchers, C. H. Multiplexed Quantitation of Endogenous Proteins in Dried Blood Spots by Multiple Reaction Monitoring Mass Spectrometry. Mol. Cell. Proteomics 2013, 12 (3), 781−791.
(23) Chen, Y.-T.; Chen, H.-W.; Domanski, D.; Smith, D. S.; Liang, K.-H.; Wu, C.-C.; Chen, C.-L.; Chung, T.; Chen, M.-C.; Chang, Y.-S.; Parker, C. E.; Borchers, C. H.; Yu, J.-S. Multiplexed Quantification of 63 proteins in Human Urine by Multiple Reaction Monitoring-based Mass Spectrometry for Discovery of Potential Bladder Cancer Biomarkers. J. Proteomics 2012, 75 (12), 3529−3545.
(24) Chambers, A. G.; Percy, A. J.; Hardie, D. B.; Borchers, C. H. Comparison of proteins in whole blood and dried blood spot samples by LC/MS/MS. J. Am. Soc. Mass Spectrom. 2013, 24 (9), 1338−1345.
(25) Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; De Jong, S.; Lewi, P. J.; Smeyers-Verbeke, J. Handbook of Chemometrics and Qualimetrics, Part A; Elsevier: Amsterdam, The Netherlands, 1997.
(26) Parker, P. A.; Geoffrey, V. G.; Wilson, S. R.; Szarka, J. L.; Johnson, N. G. The Prediction Properties of Inverse and Reverse Regression for the Simple Linear Calibration Problem; NASA Technical Report LF99-9222; NASA Langley Research Center: Hampton, VA, 2010.
(27) Huillet, C.; Adrait, A.; Lebert, D.; Picard, G.; Trauchessec, M.; Louwagie, M.; Dupuis, A.; Hittinger, L.; Ghaleh, B.; Le Corvoisier, P.; Jaquinod, M.; Garin, J.; Bruley, C.; Brun, V. Accurate quantification of cardiovascular biomarkers in serum using Protein Standard Absolute Quantification (PSAQ) and selected reaction monitoring. Mol. Cell. Proteomics 2012, 11 (2), M111.008235.
(28) Picard, G.; Lebert, D.; Louwagie, M.; Adrait, A.; Huillet, C.; Vandenesch, F.; Bruley, C.; Garin, J.; Jaquinod, M.; Brun, V. PSAQ standards for accurate MS-based quantification of proteins: from the concept to biomedical applications. J. Mass Spectrom. 2012, 47 (10), 1353−1363.
(29) Pratt, J. M.; Simpson, D. M.; Doherty, M. K.; Rivers, J.; Gaskell, S. J.; Beynon, R. J. Multiplexed absolute quantification for proteomics using concatenated signature peptides encoded by QconCAT genes. Nat. Protoc. 2006, 1 (2), 1029−1043.
(30) Simpson, D. M.; Beynon, R. J. QconCATs: design and expression of concatenated protein standards for multiplexed protein quantification. Anal. Bioanal. Chem. 2012, 404 (4), 977−989.
consultants at the University of Victoria, for stimulating discussions. We also acknowledge Carol E. Parker (editorial scientist at the UVic-Genome BC Proteomics Centre) for her assistance in editing the manuscript.
■
REFERENCES
(1) Altelaar, A. F.; Frese, C. K.; Preisinger, C.; Hennrich, M. L.; Schram, A. W.; Timmers, H. T.; Heck, A. J.; Mohammed, S. Benchmarking stable isotope labeling based quantitative proteomics. J. Proteomics 2013, 88, 14−26.
(2) Rodríguez-Suárez, E.; Whetton, A. D. The application of quantification techniques in proteomics for biomedical research. Mass Spectrom. Rev. 2013, 32 (1), 1−26.
(3) Kuzyk, M. A.; Smith, D.; Yang, J.; Cross, T. J.; Jackson, A. M.; Hardie, D. B.; Anderson, N. L.; Borchers, C. H. Multiple reaction monitoring-based, multiplexed, absolute quantitation of 45 proteins in human plasma. Mol. Cell. Proteomics 2009, 8 (8), 1860−1877.
(4) Barnidge, D. R.; Goodmanson, M. K.; Klee, G. G.; Muddiman, D. C. Absolute Quantification of the Model Biomarker Prostate-Specific Antigen in Serum by LC-MS/MS Using Protein Cleavage and Isotope Dilution Mass Spectrometry. J. Proteome Res. 2004, 3 (3), 644−652.
(5) Domon, B. Considerations on selected reaction monitoring experiments: implications for the selectivity and accuracy of measurements. Proteomics Clin. Appl. 2012, 6 (11−12), 609−614.
(6) Lange, V.; Picotti, P.; Domon, B.; Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 2008, 4, 222.
(7) Peterson, A. C.; Russell, J. D.; Bailey, D. J.; Westphall, M. S.; Coon, J. J. Parallel reaction monitoring for high resolution and high mass accuracy quantitative, targeted proteomics. Mol. Cell. Proteomics 2012, 11 (11), 1475−88.
(8) Kim, Y. J.; Gallien, S.; van Oostrum, J.; Domon, B. Targeted proteomics strategy applied to biomarker evaluation. Proteomics Clin. Appl. 2013, 7, 11−12.
(9) Gallien, S.; Bourmaud, A.; Kim, S. Y.; Domon, B. Technical considerations for large-scale parallel reaction monitoring analysis. J. Proteomics 2014, 100, 147−159.
(10) Carr, S. A.; Abbatiello, S. E.; Ackermann, B. L.; Borchers, C.; Domon, B.; Deutsch, E. W.; Grant, R. P.; Hoofnagle, A. N.; Hüttenhain, R.; Koomen, J. M.; Liebler, D. C.; Liu, T.; Maclean, B.; Mani, D. R.; Mansfield, E.; Neubert, H.; Paulovich, A. G.; Reiter, L.; Vitek, O.; Aebersold, R.; Anderson, L.; Bethem, R.; Blonder, J.; Boja, E.; Botelho, J.; Boyne, M.; Bradshaw, R. A.; Burlingame, A. L.; Chan, D.; Keshishian, H.; Kuhn, E.; Kinsinger, C.; Lee, J.; Lee, S. W.; Moritz, R.; Oses-Prieto, J.; Rifai, N.; Ritchie, J.; Rodriguez, H.; Srinivas, P. R.; Townsend, R. R.; Van Eyk, J.; Whiteley, G.; Wiita, A.; Weintraub, S. Targeted Peptide Measurements in Biology and Medicine: Best Practices for Mass Spectrometry-based Assay Development Using a Fit-for-Purpose Approach. Mol. Cell. Proteomics 2014, 13 (3), 907−917.
(11) Paulovich, A. G.; Whiteaker, J. R.; Hoofnagle, A. N.; Wang, P. The interface between biomarker discovery and clinical validation: the tar pit of the protein biomarker pipeline. Proteomics Clin. Appl. 2008, 2 (10−11), 1386−1402.
(12) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 2010, 26 (7), 966−968.
(13) Skyline. Skyline Targeted Proteomics Environment. https://skyline.gs.washington.edu/labkey/project/home/software/Skyline/begin.view (July 19, 2014).
(14) U.S. Food and Drug Administration; U.S. Department of Health and Human Services; Food and Drug Administration. Guidance for Industry: Bioanalytical Method Validation, 2001. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070107.pdf.
(15) The Broad Institute. GenePattern. http://www.broadinstitute.org/cancer/software/genepattern/ (Sept. 3, 2014).