
Anal. Chem. 2010, 82, 5060–5068

ChromEval: A Software Application for the Rapid Evaluation of HPLC System Performance in Proteomic Applications

Ian Sigmon,† Lik Wee Lee,† Deborah K. Chang,† Nicolle Krusberski,† Daniella Cohen,† Jimmy K. Eng,†,‡ and Daniel B. Martin*,†

Institute for Systems Biology, 1441 N. 34th St. Seattle, Washington 98103, and University of Washington, Seattle, Washington

Mass spectrometry-based proteomics is typically performed using high performance liquid chromatography (HPLC) to introduce peptides into the instrument via electrospray ionization. A variety of configurations exist with varying degrees of precision and cost, but the ultimate goal is the reproducible delivery of peptides in well-separated elution peaks. It is well-known that the quality of chromatography can have a dramatic effect on sample identification as well as run-to-run reproducibility, which is especially important for quantitative analyses. Despite the importance of the HPLC system for both shotgun and targeted proteomics, there are few tools available to monitor HPLC performance. In this paper, we describe a new open-source software application, named ChromEval, to allow rapid assessment of HPLC performance, as well as to provide other metrics of mass spectrometer performance, including mass accuracy calibration. ChromEval permits the user to visually monitor the elution of a set of standard peptides in quality control runs interspersed among a regular workflow. To perform these tasks, ChromEval searches mzXML files using Tandem and presents the peptide results in a graphical user interface (GUI) that allows fast assessment of chromatography by visualization of superimposed elution peaks. This tool facilitates the identification and troubleshooting of chromatography problems such as retention time shifts and variance in sample loading due to autosampler error.
It also provides crude but consistent metrics of instrument performance including mass accuracy calibration and number of peptides identified from the standard mixture. ChromEval generates easily interpretable data quickly and thereby enables go/no-go decision making during intensive instrument operation.

* To whom correspondence should be addressed. E-mail: dmartin@systemsbiology.org.
† Institute for Systems Biology.
‡ University of Washington.
(1) Yates, J. R.; Ruse, C. I.; Nakorchevsky, A. Annu. Rev. Biomed. Eng. 2009, 11, 49–79.
(2) Pan, S.; Aebersold, R.; Chen, R.; Rush, J.; Goodlett, D. R.; McIntosh, M. W.; Zhang, J.; Brentnall, T. A. J. Proteome Res. 2009, 8, 787–797.

Analytical Chemistry, Vol. 82, No. 12, June 15, 2010

Mass spectrometry (MS)-based proteomics has emerged as a mainstream tool in molecular and systems biology.1,2 Because the majority of users interface a high performance liquid chromatography (HPLC) system with a mass spectrometer to facilitate the delivery of peptides generated by enzymatic protein digestion, the HPLC system plays a key role in the performance of the system. The most commonly used HPLC systems in LC-MS employ a packed bed of C18 resin to which a gradient of aqueous and organic solvents (typically water and ACN) is applied (reviewed in refs 3 and 4). The C18 resin retains peptides until a peptide-specific, critical concentration of ACN is reached, after which the given peptide spends increasing time in the mobile phase and travels through the column into the mass spectrometer. The net effect is to deliver each peptide in a small elution volume at a high concentration for maximum signal in the instrument. Reversed-phase separation by HPLC spreads the population of peptides produced by enzymatic digestion across time and allows the mass spectrometer, which has a limited duty cycle, the greatest chance of sampling each individual peptide. For shotgun proteomics, it is desirable to deliver peptides as tall, narrow peaks; shorter, broader peaks can lead to a failure to trigger MS/MS or to the oversampling of a peptide if its elution time exceeds the dynamic exclusion window set on the instrument. Hence most users spend considerable time developing a set of chromatographic conditions optimal for their expected sample types5 (e.g., simple or complex mixtures) and expect constant performance to ensure the generation of high-quality data. Recent studies demonstrating differences among the MS results from identical standard samples analyzed by different laboratories have illustrated that many challenges exist to generating consistent MS results, one of which is chromatography.6-8 Besides shotgun proteomics, other quantitative applications depend heavily on the performance of the HPLC system.
Analyses that focus on identifying peptides through an accurate measurement of their m/z values combined with their presence at expected retention times (LC-MS without MS/MS)9-11 require highly reproducible chromatography. While some alignment of elution times across runs is typically performed to overcome the problem of retention time shifting, the more reproducible the retention times of peptides are to begin with, the more data can be confidently obtained through these workflows. In addition, if these analyses are to be quantitative and comparable, the HPLC system must also load the correct amount of sample onto the column in order to ensure accurate measurement. This is the function of the autosampler and, to a lesser extent, the valves that direct flow in the system. When these components of a system are working improperly, there will be increased variance in the intensity of peptide elution peaks without any alteration in peak retention time.

(3) Sandra, K.; Moshir, M.; D’Hondt, F.; Verleysen, K.; Kas, K.; Sandra, P. J. Chromatogr., B Anal. Technol. Biomed. Life Sci. 2008, 866, 48–63.
(4) Mitulovic, G.; Mechtler, K. Briefings Funct. Genomics Proteomics 2006, 5, 249–260.
(5) Xu, P.; Duong, D. M.; Peng, J. J. Proteome Res. 2009, 8, 3944–3950.
(6) Aebersold, R. Nat. Methods 2009.
(7) Bell, A. W.; Deutsch, E. W.; Au, C. E.; Kearney, R. E.; Beavis, R.; Sechi, S.; Nilsson, T.; Bergeron, J. J.; Beardslee, T. A.; Chappell, T.; Meredith, G.; Sheffield, P.; Gray, P.; Hajivandi, M.; Pope, M.; Predki, P.; Kullolli, M.; Hincapie, M.; Hancock, W. S.; Jia, W.; Song, L.; Li, L.; Wei, J.; Yang, B.; Wang, J.; Ying, W.; Zhang, Y.; Cai, Y.; Qian, X.; He, F.; Meyer, H. E.; Stephan, C.; Eisenacher, M.; Marcus, K.; Langenfeld, E.; May, C.; Carr, S. A.; Ahmad, R.; Zhu, W.; Smith, J. W.; Hanash, S. M.; Struthers, J. J.; Wang, H.; Zhang, Q.; An, Y.; Goldman, R.; Carlsohn, E.; van der Post, S.; Hung, K. E.; Sarracino, D. A.; Parker, K.; Krastins, B.; Kucherlapati, R.; Bourassa, S.; Poirier, G. G.; Kapp, E.; Patsiouras, H.; Moritz, R.; Simpson, R.; Houle, B.; Laboissiere, S.; Metalnikov, P.; Nguyen, V.; Pawson, T.; Wong, C. C.; Cociorva, D.; Yates III, J. R.; Ellison, M. J.; Lopez-Campistrous, A.; Semchuk, P.; Wang, Y.; Ping, P.; Elia, G.; Dunn, M. J.; Wynne, K.; Walker, A. K.; Strahler, J. R.; Andrews, P. C.; Hood, B. L.; Bigbee, W. L.; Conrads, T. P.; Smith, D.; Borchers, C. H.; Lajoie, G. A.; Bendall, S. C.; Speicher, K. D.; Speicher, D. W.; Fujimoto, M.; Nakamura, K.; Paik, Y. K.; Cho, S. Y.; Kwon, M. S.; Lee, H. J.; Jeong, S. K.; Chung, A. S.; Miller, C. A.; Grimm, R.; Williams, K.; Dorschel, C.; Falkner, J. A.; Martens, L.; Vizcaino, J. A. Nat. Methods 2009.
(8) Turck, C. W.; Falick, A. M.; Kowalak, J. A.; Lane, W. S.; Lilley, K. S.; Phinney, B. S.; Weintraub, S. T.; Witkowska, H. E.; Yates, N. A. Mol. Cell Proteomics 2007, 6, 1291–1298.
(9) Piening, B. D.; Wang, P.; Bangur, C. S.; Whiteaker, J.; Zhang, H.; Feng, L. C.; Keane, J. F.; Eng, J. K.; Tang, H.; Prakash, A.; McIntosh, M. W.; Paulovich, A. J. Proteome Res. 2006, 5, 1527–1534.
(10) Rudnick, P. A.; Clauser, K. R.; Kilpatrick, L. E.; Tchekhovskoi, D. V.; Neta, P.; Blonder, N.; Billheimer, D. D.; Blackman, R. K.; Bunk, D. M.; Cardasis, H. L.; Ham, A. J.; Jaffe, J. D.; Kinsinger, C. R.; Mesri, M.; Neubert, T. A.; Schilling, B.; Tabb, D. L.; Tegeler, T. J.; Vega-Montoto, L.; Variyath, A. M.; Wang, M.; Wang, P.; Whiteaker, J. R.; Zimmerman, L. J.; Carr, S. A.; Fisher, S. J.; Gibson, B. W.; Paulovich, A. G.; Regnier, F. E.; Rodriguez, H.; Spiegelman, C.; Tempst, P.; Liebler, D. C.; Stein, S. E. Mol. Cell Proteomics 2010, 9, 225–241.
(11) Tabb, D. L.; Vega-Montoto, L.; Rudnick, P. A.; Variyath, A. M.; Ham, A. J.; Bunk, D. M.; Kilpatrick, L. E.; Billheimer, D. D.; Blackman, R. K.; Cardasis, H. L.; Carr, S. A.; Clauser, K. R.; Jaffe, J. D.; Kowalski, K. A.; Neubert, T. A.; Regnier, F. E.; Schilling, B.; Tegeler, T. J.; Wang, M.; Wang, P.; Whiteaker, J. R.; Zimmerman, L. J.; Fisher, S. J.; Gibson, B. W.; Kinsinger, C. R.; Mesri, M.; Rodriguez, H.; Stein, S. E.; Tempst, P.; Paulovich, A. G.; Liebler, D. C.; Spiegelman, C. J. Proteome Res. 2010, 9, 761–776.

10.1021/ac100043x © 2010 American Chemical Society. Published on Web 05/26/2010

Problems with chromatography can be difficult to identify. In a facility setting, operators typically do not know the details of what is in an individual sample or how it was prepared. Thus, it is usually not possible to assess a client’s sample chromatographically; instead, performance is typically evaluated using the strength of the total ion chromatogram trace or the number of high-confidence peptide assignments identified in a subsequent search. Hence, many users perform quality control analyses between analytical runs using a single peptide or the tryptic digest of a mix of purified proteins, either made in-house or purchased commercially as a premixed standard. However, even with standard samples, evaluation of HPLC system performance is a challenge. First, the use of a single peptide as a standard, while quick, cheap, and easy, will only detect a narrow range of chromatography problems specific to the elution time of the peptide and possibly to the physical properties of the peptide. When abnormalities are noted, having only a single peak to evaluate the issue often does not provide enough information to troubleshoot the problem. It is clearly more desirable to have many peptide data points representing a variety of peptides spread across elution time to evaluate the chromatography and sample loading as well as to diagnose the cause of any problems. Unfortunately, while most mass spectrometry control systems allow the assessment of single ion traces across multiple analyses, these tools are not designed to facilitate the analyses of multiple peptides across multiple runs. Further, the software tools that exist on the mass spectrometer itself typically require the user to enter by hand the particular m/z values to be analyzed. This creates a substantial barrier to the routine evaluation of HPLC performance. A number of performance metrics have been reported that can be used to determine LC-MS performance;9-11 however, these metrics are not part of any available open-source visual curation software package. A subscription-based commercial package called MassQC (https://www.massqc.com) is available from Proteome Software. This package is an online service that analyzes, displays, and compares standard LC-MS/MS runs and includes many of the performance metrics developed by NIST.10 NIST also distributes NISTMSQC (http://peptide.nist.gov/metrics), a command-line tool limited to analysis of data acquired on Thermo mass spectrometers that produces a text file output. At this time there are no open-source quality control (QC) software packages available for day-to-day direct visual assessment of LC-MS performance, and none that operates locally. To address these issues, we have developed an open-source tool called ChromEval that allows the rapid assessment of HPLC performance using a user-defined peptide set that is intended to be run regularly as part of an instrument quality control program.
ChromEval is designed to be platform-independent, using input files in the generic mzXML format.12 This tool provides the user with both quantitative and qualitative visual data that can be generated in a matter of minutes to assess retention time reproducibility, sample loading, and mass spectrometry system performance. With ChromEval, operators can quickly determine the overall performance of their systems and determine whether to continue operations or remediate problems.

MATERIALS AND METHODS

Software Implementation. ChromEval allows easy and rapid visual evaluation of the performance of an automated LC separation system. ChromEval is open-source and provided under the Apache 2.0 license at http://martin.systemsbiology.net/tools.php. It includes all required files for operation as well as a sequence database used in earlier work describing the ISB standard mix12 and test data for demonstration purposes. The program plots chromatograms for selected peptide ions across multiple mass spectrometry runs. It is written in C# using Microsoft Visual Studio 2008 based on Microsoft’s .NET framework 3.5 and WinForms. It uses Tandem13 to perform protein/peptide identifications, the ProteoWizard14 library and its dependencies to read the mzXML data format, and ZedGraph (available at https://sourceforge.net/projects/zedgraph/) for drawing and displaying chromatograms.

ChromEval currently supports two types of data files: Thermo RAW files (with properly preinstalled Thermo libraries) and mzXML files converted in centroid mode. If RAW files are specified as input, ChromEval converts them to mzXML by calling the ReAdW converter.15 The mass spectra of the mzXML files are searched with Tandem using a user-specified protein database and parameters file, and the Tandem output files are then converted to the pep.xml15 format using Tandem2XML.15 If the corresponding pep.xml files already exist, the Tandem search will either be skipped or the existing files will be overwritten, a decision that is controlled by a user setting. ChromEval also calls the XPressPeptideParser,15 which extracts a chromatogram and calculates an area under the curve (AUC) for each identified peptide using m/z windows defined in the “mass tolerances” file. After all conversions and searches are performed, the program parses the pep.xml files to load relevant peptide and protein data for display, including peptide mass, precursor charge state, and scan retention time.

Tandem expectation scores are used to evaluate the peptide sequence assignments, and those below a user-adjustable expectation score cutoff (default 0.01) are retained. A second filtering parameter can be applied to remove peptides assigned to proteins listed in the database with a decoy identification string. The results of these two filtering steps are presented to the user as the unique peptide count. For purposes of display, the search results can be further filtered by the contents of the protein filter text file, which contains a list of expected protein standards or a subset thereof that the user wishes to monitor. If a list is provided, filtering occurs; otherwise all proteins are shown in a tree format in the “protein pane” (Figure 1). The user can then select one or more proteins of interest to populate the “peptide grid” with all associated peptides that pass the expectation value threshold.

Figure 1. ChromEval primary window. Data is shown for four back-to-back analyses on an Orbitrap equipped with a nanopump. (A) Protein Pane: Shows a list of selectable proteins identified by Tandem. (B) Search Results Pane: Shows a count of uniquely identified peptides passing the filtering criteria for each loaded data file. (C) Control Panel: Collection of buttons and a checkbox used to operate the program. (D) Chromatogram Window: Allows viewing, zooming, saving, and printing of chromatograms. (E) Peptide Grid: Displays peptide identification information.

(12) Klimek, J.; Eddes, J. S.; Hohmann, L.; Jackson, J.; Peterson, A.; Letarte, S.; Gafken, P. R.; Katz, J. E.; Mallick, P.; Lee, H.; Schmidt, A.; Ossola, R.; Eng, J. K.; Aebersold, R.; Martin, D. B. J. Proteome Res. 2008, 7, 96–103.
(13) Craig, R.; Beavis, R. C. Bioinformatics 2004, 20, 1466–1467.
(14) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. Bioinformatics 2008, 24, 2534–2536.
(15) Keller, A.; Eng, J.; Zhang, N.; Li, X. J.; Aebersold, R. Mol. Syst. Biol. 2005, 1, 2005–0017.

The display of extracted ion chromatograms can be performed in an automated fashion or by selection of individual entries. For automated operation, ChromEval populates the chromatography pane with the extracted chromatograms for peptides on a list specified by the user, which can be easily created at any time through the GUI. This feature allows the user to quickly get data for the most reliable peptides in their own quality control mixture. When the peptide list is used, the peptide grid is left unpopulated to maximize the size of the chromatography pane. If so desired, the user can choose individual instances of peptides for which to plot chromatograms, or the user can choose to plot chromatograms for all identification instances of that peptide. Chromatogram plots are generated by extracting precursor ion intensities from the MS1 scans of the associated mzXML files across a time window that begins 1.5 min before the first MS2 spectrum assigned to that peptide and ends 1.5 min after the last. ChromEval extracts data from the MS1 scans using an m/z tolerance specified for each instrument type in a configuration file available from the tool menu. ChromEval determines instrument type from the mzXML file.
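The chromatogram extraction described above can be illustrated with a short Python sketch. It is not ChromEval's C# code: the MS1 scans are plain tuples rather than data read from mzXML, the tolerance is assumed to be an absolute m/z value (the text does not state the unit), intensities inside the m/z window are assumed to be summed, and the names `extract_chromatogram`, `Scan`, and `PAD_MIN` are hypothetical.

```python
from typing import List, Sequence, Tuple

# One MS1 scan: (retention time in minutes, m/z values, intensities).
Scan = Tuple[float, Sequence[float], Sequence[float]]

PAD_MIN = 1.5  # padding before the first and after the last MS2 scan, per the text


def extract_chromatogram(
    ms1_scans: Sequence[Scan],
    target_mz: float,
    ms2_times_min: Sequence[float],
    mz_tol: float,
) -> List[Tuple[float, float]]:
    """Return (retention time, intensity) points for one precursor ion."""
    if not ms2_times_min:
        return []
    start = min(ms2_times_min) - PAD_MIN
    end = max(ms2_times_min) + PAD_MIN
    trace = []
    for rt, mzs, intensities in ms1_scans:
        if start <= rt <= end:
            # Assumption: sum every peak that falls inside the m/z window.
            signal = sum(
                inten
                for mz, inten in zip(mzs, intensities)
                if abs(mz - target_mz) <= mz_tol
            )
            trace.append((rt, signal))
    return trace


# Made-up scans; the scan at 15.0 min lies outside the 9.5 to 12.7 min window.
example_scans = [
    (10.0, [500.25, 600.0], [100.0, 5.0]),
    (11.0, [500.26, 700.0], [250.0, 7.0]),
    (12.0, [500.24], [80.0]),
    (15.0, [500.25], [999.0]),
]
example_trace = extract_chromatogram(example_scans, 500.25, [11.0, 11.2], 0.05)
print(example_trace)
```

Overlaying the traces returned for the same peptide from several runs gives the kind of superimposed view that the chromatography pane provides.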

Instrument calibration is determined by extracting the “calculated_neutral_mass,” “mass_diff,” and “charge_state” attributes from the pep.xml file and calculating measurement error in parts per million (ppm). A mean is calculated for peptides within a user-specified threshold (default 50 ppm) and presented as a line on a plot showing the ppm error for all identified peptides.

Sample Preparation. A mixture of 18 proteins (Supporting Information (SI) Table 1) was prepared as described.12 Briefly, one nanomole of each protein was dissolved in 20 mM ammonium bicarbonate (pH 8.0) with 0.05% SDS to a final concentration of 1 µM, reduced with 2.5 mM TCEP at 50 °C for 30 min, and alkylated for 1 h with 10 mM iodoacetamide. The proteins were then digested by overnight incubation at 37 °C with sequencing-grade trypsin (Promega, Madison, WI) at a 1:40 (w/w) ratio. Samples were dried in a Speed Vac and cleaned using a Waters (Milford, MA) Oasis MCX cartridge per the manufacturer’s instructions. The final eluate was evaporated and resuspended in 1 mL of 0.1% formic acid (FA), 1% ACN, in HPLC-grade water (VWR, West Chester, PA). For analysis using a complex mixture, yeast was prepared and fractionated into 16 fractions using an Off-Gel fractionator (Agilent) as described.16

Liquid Chromatography-MS/MS Analyses with Gradient and Loading Perturbations. Unless otherwise specified, LC-MS/MS was performed on a Thermo Scientific (Waltham, MA) LTQ linear ion trap coupled to an Agilent (Santa Clara, CA) HP 1100 series LC system through an electrospray source using a split-flow system as described.17 Peptides were trapped on a fused silica fritted capillary precolumn packed with 2 cm Magic C18Aq RP spherical silica (75 µm ID, 5 µm, 200 Å; Michrom Bioresources, Auburn, CA) and separated over a 10 cm Magic C18Aq RP analytical column (75 µm ID, 5 µm, 100 Å). A binary solvent system was used, consisting of Buffer A (0.1% FA in water) and Buffer B (0.1% FA in ACN).
All gradient programs used in this analysis are detailed in SI Table 2. For analysis of the standard samples in back-to-back injections (Figure 1), a Thermo LTQ Orbitrap instrument equipped with an Agilent nanopump was used. The chromatography columns and buffers were as described above, and the gradient program was identical to that described above. For analysis of the complex samples using fractionated whole yeast samples, a Thermo Orbitrap Velos instrument equipped with an Eksigent nanopump was used. The chromatography columns and buffers were as described above. For the samples of chromatography standards, the gradient program was identical to that described above. For the yeast samples, the gradient was extended to 60 min.

(16) Martin, D. B.; Holzman, T.; May, D.; Peterson, A.; Eastham, A.; Eng, J.; McIntosh, M. Mol. Cell Proteomics 2008, 7, 2270–2278.
(17) Yi, E. C.; Lee, H.; Aebersold, R.; Goodlett, D. R. Rapid Commun. Mass Spectrom. 2003, 17, 2093–2098.

RESULTS AND DISCUSSION

ChromEval was written to allow easy visualization of selected ion chromatograms across multiple runs to facilitate rapid visual evaluation of the quality of HPLC system performance. It is designed to be incorporated as part of a quality control program in a mass spectrometry laboratory in which machine performance is assessed by regularly running an aliquot of a standard peptide mixture. Ideally, it is used on a local computer so that the operator can quickly assess system performance without having to transfer files or log onto search queues on remote servers. To satisfy these design criteria, ChromEval was configured to contain all the components necessary to permit the user to search files as well as visually evaluate chromatographic traces. The ReAdW file converter for converting Thermo RAW files to mzXML files is also included with the ChromEval software and will work on computers on which the appropriate version of Xcalibur is installed.

ChromEval begins with mzXML files, which are searched using Tandem. Tandem was chosen for this application because it provides free, fast, and sensitive search capabilities; however, other search programs could easily be substituted. Spectral assignments with a Tandem expectation score below 0.01, a user-adjustable threshold, are retained for analysis in ChromEval. This expectation score cutoff value was chosen in a brief pilot experiment because it yielded good sensitivity and specificity in our test mixture analysis (data not shown). The search results are converted to a pep.xml file and saved in a user-specified local directory to speed loading of future instances. In benchmark testing, searches of mzXML files containing ∼4500 and ∼9500 scans took 13 and 27 s, respectively, on a computer with an Intel Core 2 Duo Processor, E6600 CPU, and 3 GB of RAM. Default search parameters include full trypsin enzyme specificity, 2 allowed missed cleavages, no variable modifications, carbamidomethylated cysteine as the only static modification, and the use of two process threads. The database used for this testing and provided in our download contains the expected 18 proteins in our standard mix appended to 1709 decoy proteins from Haemophilus influenzae and 92 contaminant proteins seen in prior analyses with this mixture.
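The filtering logic described in this section (expectation score cutoff, decoy removal, unique peptide count, and the optional protein list) can be sketched in Python as follows. This is an illustration under stated assumptions rather than ChromEval's implementation: identifications are plain dictionaries instead of parsed pep.xml records, the decoy marker `"DECOY"` and the accession strings are hypothetical, and a peptide is counted as unique by its sequence alone.

```python
from typing import Dict, List, Optional, Sequence, Set

Hit = Dict[str, object]  # keys used here: "peptide", "expect", "protein"


def filter_identifications(
    hits: Sequence[Hit],
    expect_cutoff: float = 0.01,   # default cutoff quoted in the text
    decoy_tag: str = "DECOY",      # hypothetical decoy identification string
) -> List[Hit]:
    """Keep hits scoring below the cutoff that are not assigned to a decoy."""
    return [
        hit
        for hit in hits
        if hit["expect"] < expect_cutoff and decoy_tag not in hit["protein"]
    ]


def unique_peptide_count(hits: Sequence[Hit]) -> int:
    """Count distinct peptide sequences (assumption: charge state is ignored)."""
    return len({hit["peptide"] for hit in hits})


def restrict_to_protein_list(
    hits: Sequence[Hit], protein_list: Optional[Set[str]] = None
) -> List[Hit]:
    """Apply the optional protein list; without a list, every hit is shown."""
    if not protein_list:
        return list(hits)
    return [hit for hit in hits if hit["protein"] in protein_list]


# Made-up identifications for illustration only.
example_hits = [
    {"peptide": "PEPTIDEK", "expect": 0.001, "protein": "STD_P1"},
    {"peptide": "PEPTIDEK", "expect": 0.005, "protein": "STD_P1"},
    {"peptide": "LOWSCOREK", "expect": 0.2, "protein": "STD_P1"},
    {"peptide": "REVERSEDK", "expect": 0.0001, "protein": "DECOY_H1"},
    {"peptide": "OTHERK", "expect": 0.009, "protein": "CONTAM_P2"},
]
passing = filter_identifications(example_hits)
print(unique_peptide_count(passing))
print(len(restrict_to_protein_list(passing, {"STD_P1"})))
```

Keeping the protein list as a separate display step mirrors the text: the unique peptide count reflects the expectation score and decoy filters only.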
ChromEval further eliminates distracting decoy protein identifications by retaining only proteins on a user-defined list of the “house standard” proteins used in quality control runs.

Visualization of Chromatographic and Tabular Data. ChromEval presents the user with an easily navigated GUI that provides search results for each file as well as the ability to quickly assess the quality of separation and sample loading in quality control standard runs. The ChromEval GUI is organized into multiple interactive panes (Figure 1). The protein pane (panel A) displays those proteins identified in the Tandem search that survived filtration by expectation score and decoy string and were included on the user-specified protein list. The number of unique peptides identified in the database search with the applied filtration of expectation score and decoy string (but not protein list) is presented as a metric of HPLC/MS performance and reproducibility in the search results panel, which also plots the results for each run (panel B). Panel C is the “control panel,” a collection of buttons and a checkbox used to operate the program. Buttons on the control panel allow one to clear all plots, plot all entries for a selected peptide, or save selected peptides or proteins as a text file. The protein list refines the filtration of the proteins the next time data is loaded. Entries on the peptide list are automatically displayed in the chromatography pane at the time of data loading to streamline the analysis process. Multiple charge states for an individual peptide may be treated together or separately using a selection box. Chromatography results are visually displayed in the chromatography pane (panel D), which can host as many individual traces as is visually tolerable, each representing the overlaid chromatograms for a single peptide for all loaded data. It is in this pane that the user can quickly assess the reproducibility of a set of quality control runs. ChromEval provides the user with a color-coded legend as well as the peptide string, m/z target, charge state, and mass tolerance used within each individual extracted ion chromatogram window. In addition to the plotted extracted ion chromatograms, ChromEval calculates and plots the ppm error for data acquired on a high mass accuracy instrument. Each peptide assignment is plotted as a point color-coded by run along with a mean ppm error calculated using entries within a specified ppm window (default 50). Two other plots can be displayed in the chromatography pane if desired (SI Figure 1). The first is a plot of retention time vs calculated AUC for each peptide on the peptide list. The second is the mass error for all identifications in the analysis. The peptide grid (panel E) displays peptide identification information and is populated by selecting proteins from the protein pane. Data displayed in the peptide grid include the peptide sequence, charge state, retention time, spectrum, protein identifier, and protein description. Checkboxes in the left-hand column of the data grid allow the user to specify which peptides to plot in the chromatography panel.

Identification of HPLC and MS System Malfunction with ChromEval. We demonstrate the utility of ChromEval using a series of back-to-back technical replicates using our standard method (control) as well as a set of runs that intentionally deviate from our standard method in ways that emulate commonly observed HPLC problems. For the testing we used a mix of 18 commercially available proteins (SI Table 1) that is our in-house quality control mixture. Any single protein or mixture of proteins can function for this purpose as long as the sequences are included in the database used for searching.
The chromatography pane in Figure 1 illustrates results that one might expect with a well functioning HPLC system. This data was obtained in back-to-back runs on an LTQ Orbitrap using a nanopump.12 It is clear from the three presented extracted ion chromatograms that the peptide retention times and ion intensities are very consistent across the four runs. The four chromatograms span a time window of approximately 14 min, which is 60% of the duration of the gradient from 10 to 25% ACN. Also indicative of good system performance is the consistency of the number of peptides identified by the Tandem search in the search results pane. This value provides the operator with an orthogonal indicator which, when altered, could indicate physical or method problems with the mass spectrometer that may not be apparent in overlaid chromatograms. To demonstrate the utility of ChromEval for troubleshooting HPLC issues, we generated a series of intentionally perturbed runs on a Thermo LTQ using a split-flow pump that emulate commonly observed HPLC problems (Figure 2 and SI Table 2). Using our standard mix of proteins, we began by adjusting our HPLC program to advance the entire gradient by three minutes. This type of problem can occur through an error in programming as well as through changes in dead volume that might occur when capillary tubing is changed. The results of this analysis are shown in Figure 3, which clearly illustrates that peptides appear earlier in the advanced run compared to the reference by about the same amount at all time points in the run. A second program altered the slope of the HPLC gradient, causing a delay in elution time 5064

Analytical Chemistry, Vol. 82, No. 12, June 15, 2010

Figure 2. Gradient perturbations. A gradient was programmed to deliver all solvent concentrations above 10% ACN three minutes earlier than the standard gradient (left). A second gradient was programmed with an increased slope delivering the standard ACN concentration at the start of the run with an increasing difference between the perturbed and standard gradients as the run progressed (right).

that increased during the course of the run (Figure 4). This type of error could possibly originate from programming problems as well as mechanical problems with solvent mixing in a pump system. The results displayed by ChromEval for this type of error show a diagnostic change where the earliest peptides align quite nicely with the reference run, while peptides observed later in the gradient are seen to be eluting increasingly earlier in the run. We tested two other perturbations of HPLC program that also produced dramatic results that would be easily identified using ChromEval. In these two runs, the programmed flow rate was changed from the baseline of 0.20 µL/min to 0.25 and 0.15 µL/ min respectively (see SI Table 2 for details). These changes in flow through the column, either upward or downward, can easily occur in split systems when the backpressure from the split changes due to either a nonocclusive obstruction (increasing flow through the column) or a small leak (decreasing flow through the column). In these three runs, ChromEval immediately shows the operator that peptides appear earlier in the run with higher flow rates and later in runs with lower flow rates (SI Figure 2). ChromEval also allows operators to evaluate another aspect of HPLC QC: autosampler reproducibility. We sought to emulate a failure in run-to-run consistency that can occur because of a

Figure 3. Emulated gradient shift error, as viewed by ChromEval. When the gradient is advanced (blue), peptide elution time is displaced by approximately the same amount of time at all points in the run compared to the reference run (red).

Figure 4. Emulated gradient slope error, as viewed by ChromEval. When the slope of the gradient is increased (blue), retention time differences compared to the reference gradient (red) increase as the run progresses. Analytical Chemistry, Vol. 82, No. 12, June 15, 2010

5065

Figure 5. Autosampler error is easily spotted with ChromEval. Consecutive runs were performed loading sample volumes of 1, 2, 4, and 8 µL, corresponding to the blue, green, purple, and red traces, respectively. The ratio of peak height values for the four runs matches the ratio of loading volumes.

variety of autosampler errors, for example, air in the system, leaky seals, needle clogging, etc. To demonstrate the utility of ChromEval for spotting issues attributable to autosampler failure, we deliberately loaded 1, 2, 4, and 8 µL of our standard protein mixture in back-to-back runs. As expected, the peak height varies dramatically among these runs with the ratio of heights closely correlated to the ratio of load volume (Figure 5). The relative intensities of the elution peaks for each load volume are consistent across nearly all the peptides at all elution times. This pattern should quickly alert the operator to the presence of an autosampler problem. Note that the retention times do shift in this set of back to back analyses due to the fact that HPLC system employed flow splitting which leads to some inherent retention time wobble. Finally, ChromEval allows a rapid evaluation of ppm error measurement using the ppm scatter plot. Looking at Figure 1, it is apparent that at the time of acquisition, the instrument mass calibration was off by approximately 2 ppm. This feature can alert the operator that the instrument needs recalibration. An example of ppm scatter plots before and after instrument calibration is shown in Figure 6, which clearly demonstrates an improvement in mass accuracy improvement. Examples of ChromEval Use in Every Day Facility Operation. In addition to the simulated HPLC problems, we have documented the performance of two systems available at our facility in “real world” use, one performing well, and one in need of service. An example of a system performing well is given in Figure 7A where the elution of a two selected peptides from our standard mixture is reliable over a span of 16 injections of a 5066


complex mixture. This experiment was performed using a Thermo LTQ Orbitrap Velos equipped with an Eksigent nanopump. The standard mixture sample was run initially and then after every eight injections of fractions of a whole yeast lysate (in which hundreds to thousands of peptides were identified in each run). As can be seen in the figure, the two selected peptides, eluting over 15 min apart in the gradient, had reproducible intensity and peak width, and their retention times differed by only a few seconds. In contrast, a QC analysis on the identical nano-HPLC system that produced the nearly superimposable extracted ion chromatograms in Figure 1 showed considerable irreproducibility after a period of extensive use (Figure 7B). In this analysis, the standard injections were run with only a solvent blank in between. The data for the two peptides shown indicate a retention time variance of over one minute early in the run and nearly half a minute later in the same run. Based on these results, the HPLC pump was serviced, and retention time reproducibility improved dramatically (data not shown).

These examples demonstrate the utility of ChromEval for broad HPLC system evaluation. The software is open source and well annotated; a help file is also included to support new users. The files used to generate the figures for this article are provided in the software download. We have included a table that indicates the source and product number for the 18 commercially available purified proteins used here as a reference standard (see SI Table 1). While our standard mixture contains all 18 proteins, a subset of them would suffice for monitoring chromatography. We have identified a set of reliable peptides that span

Figure 6. PPM scatter plot. Scatter plots of ppm error for identified peptides before (blue) and after (red) mass calibration.
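The quantity plotted in a ppm scatter plot such as Figure 6 is the relative mass error in parts per million. A minimal sketch of that calculation follows, using the standard definition ppm = (observed − theoretical) / theoretical × 10^6. The m/z values are hypothetical, and the use of the median as a summary of the systematic offset is an assumption of this sketch rather than a description of how ChromEval reports it.

```python
from statistics import median

def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million of an observed m/z versus its theoretical value."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

# Hypothetical (observed, theoretical) precursor m/z pairs for identified peptides.
pairs = [
    (500.0010, 500.0000),
    (750.0015, 750.0000),
    (1000.0020, 1000.0000),
]

errors = [ppm_error(obs, theo) for obs, theo in pairs]

# A median well away from zero suggests a systematic calibration offset,
# whereas a wide spread around zero suggests poor precision instead.
calibration_offset = median(errors)
```

With these illustrative values every peptide is off by about +2 ppm, the kind of uniform shift that points to a calibration problem rather than to noise.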

Figure 7. Good and bad HPLC performance in actual operation. (A) Retention time is maintained in a nano-HPLC system throughout the analysis of 16 complex samples. (B) Retention time problems experienced in sequential injections using a nanopump in need of maintenance.
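The contrast between panels A and B of Figure 7 can also be expressed as a simple retention time spread per peptide across repeated QC injections. The sketch below is illustrative only: the function names, the peptide labels, the retention times, and the 15 s threshold are all hypothetical choices, not values or routines taken from ChromEval.

```python
def rt_spread(retention_times):
    """Range (max - min) of retention time in seconds for each peptide.

    retention_times maps peptide -> list of retention times (s), one per injection.
    """
    return {pep: max(times) - min(times) for pep, times in retention_times.items()}

def drifting_peptides(retention_times, max_spread_s=15.0):
    """Peptides whose retention time range exceeds `max_spread_s` seconds."""
    spreads = rt_spread(retention_times)
    return sorted(pep for pep, spread in spreads.items() if spread > max_spread_s)

# Hypothetical QC injections on a well-behaved system: shifts of a few seconds.
good_system = {
    "early_peptide": [1500.0, 1502.0, 1499.0],
    "late_peptide": [2450.0, 2453.0, 2451.0],
}

# Hypothetical QC injections on a pump needing service: over a minute of drift
# early in the gradient and about half a minute later on.
worn_system = {
    "early_peptide": [1500.0, 1545.0, 1570.0],
    "late_peptide": [2450.0, 2465.0, 2478.0],
}

good_flags = drifting_peptides(good_system)
worn_flags = drifting_peptides(worn_system)
```

An appropriate threshold depends on the gradient length and on whether the system uses flow splitting, so any fixed value should be treated as a starting point to be tuned per instrument.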

the range of retention times for the entire mix, which is included in the download. A user can update this list simply by pushing the “save peptides” button in the control panel while having the desired set of peptides selected for viewing. Use of a preselected list of peptides can dramatically speed the evaluation of a set of runs by reducing peptide selection time. The current version of


ChromEval is capable of analyzing any data converted to the mzXML format and thus can support nearly all instruments used in the community. Support for the mzML format (18, 19) will be included in a future release. The download includes the ReAdW converter for local conversion of Thermo RAW files on a computer running Xcalibur, which is the primary platform used at our facility. As noted above, the current version of ChromEval uses Tandem; however, another search engine could easily be substituted with a few simple adjustments to the source code.

CONCLUSIONS

We have provided the proteomics community with an open-source quality control tool that allows the rapid evaluation of HPLC performance. It includes searching and visualization capabilities in a compact package to allow quick and easy assessment of the performance of LC-MS runs using a user-specified protein digest.

(18) Deutsch, E. W. Methods Mol. Biol., 604, 319–331.
(19) Orchard, S.; Albar, J. P.; Deutsch, E. W.; Binz, P. A.; Jones, A. R.; Creasy, D.; Hermjakob, H. Proteomics 2008, 8, 4168–4172.


With this tool, the user has near-real-time feedback regarding HPLC operation and is thereby empowered to make on-the-go decisions during LC-MS operation to minimize instrument downtime and optimize LC-MS performance.

ACKNOWLEDGMENT

We acknowledge helpful discussions with James Eddes in planning this work as well as the proofreading skills of Carly Holstein. This work was supported by grants P50GM076547 and 5R21CA126216 (to D.B.M.).

SUPPORTING INFORMATION AVAILABLE

Two additional figures and two additional tables. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review January 7, 2010. Accepted May 17, 2010. AC100043X