Interactive Feature Finding in Liquid Chromatography Mass

Oct 13, 2006 - We propose a method for finding features in liquid chromatography mass spectrometry data that is based on the isotopic pattern of peaks...
Interactive Feature Finding in Liquid Chromatography Mass Spectrometry Data

David Clifford,*,†,§ Michael Buckley,†,§ Kim Y. C. Fung,‡,§ and Leah Cosgrove‡,§

CSIRO Division of Mathematical and Information Sciences, North Ryde, Australia, CSIRO Division of Molecular and Health Technologies, Parkville and Adelaide, Australia, and CSIRO Preventative Health Flagship, Australia

Received May 11, 2006

Abstract: We propose a method for finding features in liquid chromatography mass spectrometry data that is based on the isotopic pattern of peaks. Our interactive approach to feature finding is carried out across many samples simultaneously and aligns features concurrently. Our scale-independent approach prioritises potential features and is easily adaptable to look for features of a particular mass and charge, paired features in isotopically labeled samples, or differentially expressed features. We demonstrate this by identifying features from normal human adult plasma. We highlight properties of plasma data that illustrate the need to visually check the quality of features found prior to further statistical analysis. Keywords: liquid chromatography mass spectrometry • bioinformatics • proteomics • biomarker • algorithms

Introduction

The application of bioinformatics research to proteomics data relies on the availability of high-quality data sets and the relevance of the problem(s) to be solved. To date, the majority of work has focused on the qualitative aspects of proteomics such as the evaluation and/or development of robust protein identification algorithms,1-5 improved data visualization packages,6-8 or on fundamental data manipulations such as spectrum filtering or de-noising to remove low-quality spectra.9-11 More recently, efforts have also been directed toward the development of common data formats to improve the reliability and transparency of data interpretation.12 The application of proteomics to investigate cellular processes or for disease biomarker discovery has contributed to the increased interest in quantitative protein measurements and has spurred the development of numerous in-house software tools for the high-throughput analysis of mass spectrometry data.13-18 Even though some attention has been directed toward the development of robust algorithms for the quantitative measurement of protein expression, these software packages are still focused primarily on the use of proprietary isotope tagging methods such as ICAT or iTRAQ.19-20

* Corresponding author. David Clifford, CSIRO Mathematical and Information Sciences, Locked Bag 17, North Ryde NSW 1670, Australia. E-mail, [email protected]; phone, +61 2 9325 3210; fax, +61 2 9323 3100.
† CSIRO Division of Mathematical and Information Sciences.
‡ CSIRO Division of Molecular and Health Technologies.
§ CSIRO Preventative Health Flagship.
10.1021/pr060226m CCC: $33.50 Published 2006 by the Am. Chem. Soc.

Proteomics research incorporating either protein array or mass spectrometry (ESI-LC/MS or MALDI-MS) data to measure protein expression produces a long list of features that represent proteins and peptides, with their associated intensities. When employing either ESI-LC/MS or high-resolution MALDI-MS, a feature is any collection of peaks that exhibit the correct isotopic distribution and relative heights. The distance between consecutive isotope peaks depends on the charge state of the peptide and is equal to 1/c Da for a peptide of charge c. To measure protein expression, stable isotope tags are often utilized as internal standards. As they are structural analogues of the native peptide, they possess identical chemical and physical properties, making them ideal for comparative proteomic studies. When these tags are used, the term “paired” or “two-sided” feature is used to describe a pair of peptides that coelute with a known difference in mass-to-charge ratio (m/z) that is equivalent to the number of isotopes in the tag. Statistical methods are then used to reduce this list to a smaller group of features that either represent differentially expressed proteins, or can be combined into a classifier to separate diseased from normal populations. This part of the data analysis is entirely dependent on the quality of the original list of features and on the experimental design. At present, data analysis software for mass spectrometry/proteomics applications is limited in its ability to efficiently and accurately detect global changes in protein/peptide expression and to quantify these differences.
Commercially available software packages are highly specific for instrument platform (i.e., dependent on the mass spectrometry instrument employed for analysis) or are limited to vendor-specific chemistries for peptide/protein tagging (i.e., users are restricted to purchasing specific chemistries for isotope tagging, for example, SILAC, ICAT, or iTRAQ).19-21 Despite the clear advantages offered by these strategies, these approaches do not necessarily suit all applications or provide an adequate solution to all biological problems. Additionally, the use of these chemistries only allows the researcher to focus on a subset of the available data; for example, when employing ICAT chemistry, only those peptides/proteins containing cysteine residues are captured and analyzed. To overcome these limitations, the use of generic chemical labeling strategies that are not amino acid residue-dependent and the development of more flexible data analysis methods are desirable.

The long-term goal of our research is biomarker discovery for colorectal cancer through the use of isotopically labeled plasma samples. As a preliminary step, we have analyzed noncancer plasma samples to develop an understanding of the characteristics of the “normal” population, so that we can clearly describe differentially expressed peptides and/or proteins that would accurately distinguish a normal from a diseased population. The complex nature of plasma data has led us to develop novel bioinformatics methods for information extraction from LC/MS data. The aim of this report is to outline the steps of our method. We illustrate its use through several examples.

Current standard approaches for feature finding bring in measures of quality as a secondary step and use them to reduce the number of features found.22 The approach we advocate searches for features using a measure of quality. Our approach is not scale-dependent and naturally leads to a list of high-quality features. An interactive approach involving the user is taken to delineate the clusters of each feature. This ensures that only the highest quality features are included in subsequent analyses. It can also be carried out on many samples simultaneously, thus combining spectral alignment with the feature finding stage from the beginning of the analysis. We highlight some of the difficulties we have encountered during our analysis of LC/MS data. We outline our solutions to these problems, including data processing steps we feel are essential for accurate feature finding.

technical notes
Journal of Proteome Research 2006, 5, 3179-3185
Published on Web 10/13/2006

Materials and Methods

Sample Preparation and Mass Spectrometry Methods. 1. Collection and Preparation of Plasma/EDTA Samples. Plasma/EDTA samples (n = 22) were collected from volunteer blood donors at the Australian Red Cross Blood Service (ARCBS) following informed consent. Plasma samples were then stored in 1 mL aliquots at -77 °C until required. An aliquot (100 µL) from each plasma sample was precipitated by the addition of an equal volume of acidified acetonitrile (0.1% CF3COOH in CH3CN) at 4 °C to remove abundant proteins. Samples were then centrifuged (13 000g, 15 min, 20 °C), and the supernatant was collected and passed through a 3 kDa centrifugal molecular filter according to manufacturer’s specifications (Millipore, Concord, MA). The low molecular weight fraction was then lyophilised, reconstituted in 0.1% CF3COOH, and then desalted by solid-phase extraction according to the manufacturer’s specifications (C18 seppak, Millipore, Concord, MA). The desalted peptides were then lyophilised and stored frozen (-80 °C) until required. Additionally, a reference standard was created by combining an aliquot (100 µL) of each plasma sample to create a “pooled normal” sample. An aliquot (100 µL) of each individual donor and the pooled normal sample were prepared in parallel to minimize variability that may occur during the sample preparation procedure. Chemical mass labeling with either acetic anhydride (C4H6O3) or deuterated acetic anhydride (C4D6O3) was performed by incubation (37 °C, 2 h) with the respective anhydride prior to being combined. The “pooled normal” reference standard was labeled with deuterated acetic anhydride (C4D6O3).

2. Mass Spectrometry Analysis of Plasma Peptides. Samples were analyzed by ESI-LC/MS using a QSTAR mass spectrometer (Applied Biosystems, Framingham, MA) fitted with a nanospray source (Protana). An electrospray voltage of 2500 V was applied for all analyses, and data were acquired over the mass range of 300-2000 Da.
Peptides were separated by on-line chromatography using a C18 reverse-phase column (100 µm × 100 mm) packed in-house. Chromatography was performed with a binary solvent system (Agilent Technologies, Palo Alto, CA) with the flow rate set to 0.75 µL/min. The column was equilibrated in Buffer A (aqueous 0.1% CF3COOH), and peptides eluted using an increasing linear gradient to 50% Buffer B (0.1% CF3COOH in CH3CN) over 100 min. Data were acquired using the Analyst QS v1.1 software (Applied Biosystems, Framingham, MA). Data file conversions (wiff to CDF) were performed using the File Translator Utility supplied by the manufacturer (Applied Biosystems, Framingham, MA).

Journal of Proteome Research • Vol. 5, No. 11, 2006

Results and Discussion

1. Conversion of LC/MS Data to Image. Prior to feature finding and alignment, the raw MS data are converted into a format more amenable to statistical manipulation. The raw data in LC/MS experiments consist of consecutive mass spectrometry scans over a defined chromatographic time period. For each scan, the mass-to-charge ratio (m/z) for each eluting peptide is recorded along with its intensity. We transform the data by binning observations based on time and m/z, and this converts the raw LC/MS data to an image or matrix with each pixel corresponding to a bin. The choice of bin size depends on the shortest elution time one is interested in detecting, the computing power available, and the resolution of the mass spectrometer. Our default bin sizes are δt = 30 s and δm = 1/12 Da. Coarser resolutions hide the true relative shapes and heights of peaks, whereas finer resolutions result in higher computation times in subsequent image analysis. A mass-charge increment of δm = 1/12 Da clearly shows the peaks of singly, doubly, and triply charged peptides (Figure 1). Time resolution can be increased to δt = 10 s in order to find peptides that elute over shorter time spans. Despite summarizing the raw data into a more computationally friendly format, the resulting images of intensity values are quite large. This forces the user to almost exclusively use heuristic methods for finding, classifying, and quantifying features.

2. Feature Finding. Our general strategy for finding features is based on the isotopic pattern of peptides. One can easily spot a feature based on the plot of its isotopic pattern for a specific scan (or aggregation of scans) as a function of m/z value. The key points of the pattern are the relative heights of each isotope peak and the distance between successive isotope peaks within a cluster (Figure 2). Figure 2 shows a pattern of intensity values collected for a single LC/MS analysis.
We can see several features here that elute over the course of 1 to 2 min. The image is created using a bin size of 10 s.

Figure 1. This series of plots illustrates how the m/z bin size affects what a triply charged feature looks like. For δm = 1/3 Da, one cannot distinguish the individual peaks; for δm = 1/6 Da, one can distinguish peaks, but each peak is literally one pixel; for δm = 1/12 Da (our standard), the peaks are better defined, and for 1/24 Da, even more so. Increasing the resolution, however, will increase the computational needs when working with the data files.

Figure 2. Image of intensity values in LC/MS data that includes several sets of paired features (three of which are highlighted in green) over the range m/z 700-725. Each paired feature consists of two clusters of three or four vertical lines (representing isotope peaks) that are separated slightly in m/z value. The isotopic clusters are present at the same elution time, and the clusters persist for approximately the same length of time.

3. Isotopic Pattern. The isotopic pattern for a feature in LC/MS data can be approximated based on the Poisson distribution and the rate at which 13C occurs in nature. We compare the observed heights of each isotope within a cluster with this model as a means of finding features. The average chemical formula of amino acids, known as averagine, is C4.938H7.758N1.358O1.477S0.042 and has an average molecular mass of 111.1254 Da.23 On the basis of this, a peptide of mass M is comprised of approximately M/111.1 averagine amino acids and 0.0444M carbon atoms. Of these, approximately 1.07% will be 13C atoms and 98.93% will be 12C atoms. As the natural occurrence of 13C atoms is rare, and assuming the carbon atoms are independent, the total number of 13C atoms in a peptide of mass M has (approximately) a Poisson distribution with parameter $\lambda = 4.8 \times 10^{-4} M$. We also examine intensity values that lie between peaks. Ideally, there are valleys between the peaks, and we improve our feature finding by including the valleys in our search for features. We search for a feature based on three peak heights and the two intermediate valley heights (or depths, as ideally they should be zero). The advantage of using five pixel values in this comparison becomes evident as the noise levels increase. With five reference points, noise is less likely to match the Poisson pattern by chance than when three reference points are used. Peak shape, as well as the relative peak heights, has been used as a quality control check after feature finding has taken place.22,24 In morphological image analysis, structuring elements are binary operators used to pick out particular parts of images.25
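The Poisson model above can be turned into a five-point reference pattern in a few lines. The following is a minimal sketch in Python, not the authors' implementation: the function names are hypothetical, the rate is computed from the averagine carbon content and the 1.07% 13C abundance quoted in the text, and normalising the three peak probabilities so that the vector sums to one is an assumption made here so that the pattern can be compared with observed proportions.

```python
import math

# Carbon atoms per dalton of averagine (4.938 carbons per 111.1254 Da, about 0.0444).
CARBONS_PER_DALTON = 4.938 / 111.1254
# Natural abundance of 13C as quoted in the text.
C13_FRACTION = 0.0107


def poisson_rate(mass_da):
    """Expected number of 13C atoms in a peptide of the given mass (Da)."""
    return CARBONS_PER_DALTON * C13_FRACTION * mass_da


def expected_pattern(mass_da):
    """Five-point reference vector: peak, valley, peak, valley, peak.

    The peaks are the Poisson probabilities of 0, 1 and 2 13C atoms,
    rescaled to sum to one (an assumption of this sketch); the valleys
    are ideally zero.
    """
    lam = poisson_rate(mass_da)
    peaks = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(3)]
    total = sum(peaks)
    return [peaks[0] / total, 0.0, peaks[1] / total, 0.0, peaks[2] / total]


if __name__ == "__main__":
    for mass in (500.0, 1000.0, 2000.0):
        pattern = [round(v, 3) for v in expected_pattern(mass)]
        print(mass, round(poisson_rate(mass), 3), pattern)
```

Because the rate grows with mass, the second and third peaks gain height relative to the first as the mass increases, which is why the reference pattern has to be recomputed at each mass.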

Figure 3. A doubly charged feature showing the ideal isotopic pattern when the bin size is set to δm = 1/12 Da. The binary structuring element SE(m) for a doubly charged feature is also shown. The intensity values marked in black are those used to evaluate the doubly charged metric at the point m. No values are given on the y-axis, as in the end it is the relative heights that are important.
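How a structuring element of this kind maps onto pixels depends on the charge state and the m/z bin size. The sketch below is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, and it assumes the five positions (peak, valley, peak, valley, peak) are spaced at half the isotope spacing, that is 1/(2c) Da for charge c.

```python
from fractions import Fraction


def structuring_offsets(charge, bin_width=Fraction(1, 12)):
    """Offsets, in m/z bins, of peak, valley, peak, valley, peak.

    Isotope peaks of a peptide with the given charge are 1/charge Da apart,
    so valleys sit half that distance from each peak. A ValueError is raised
    when the bin width does not place peaks and valleys on whole bins.
    """
    half_spacing = Fraction(1, 2 * charge)
    step = half_spacing / bin_width
    if step.denominator != 1:
        raise ValueError("bin width does not divide half the isotope spacing")
    return [int(step) * i for i in range(5)]


if __name__ == "__main__":
    for charge in (1, 2, 3):
        print(charge, structuring_offsets(charge))
```

With δm = 1/12 Da this gives offsets of 0, 6, 12, 18, 24 bins for singly charged, 0, 3, 6, 9, 12 for doubly charged, and 0, 2, 4, 6, 8 for triply charged peptides. A charge of 4 does not fit on this grid and would need a finer bin width.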

Figure 3 shows the structuring element for finding doubly charged peptides. This structuring element uses five pixel values which coincide with the three peaks and their two intermediate valleys when the structuring element is positioned at a doubly charged peptide. As this is moved across each MS scan, the five intensities are compared to the values from the Poisson model. As the mass m increases, the structuring element SE(m) remains the same, but the Poisson model we compare the intensities to changes. As such, this is a nonstationary comparison across mass.

4. Choice of Metric for Identifying Multiply Charged Peptides. At each pixel in the image, we wish to compare the pattern of intensity values at this m/z location and at four equally spaced locations to the right with those from an ideal feature of a certain charge. If the two patterns match well, then there is evidence that this m/z location is the first peak in the isotopic pattern for a peptide. If the two patterns match poorly, then there is little evidence. Specifically, when we are considering singly charged peptides, the five locations (i.e., the structuring element) used are (m/z, m/z + 0.5, m/z + 1.0, m/z + 1.5, m/z + 2.0), as these positions would, if z = 1, correspond to the first three peaks of the isotopic pattern and the two intermediate troughs. The five intensity values $(I_1, \ldots, I_5)$ observed at these five locations are transformed into proportions $p = (p_1, \ldots, p_5)$, where $p_i = I_i / \sum_{j=1}^{5} I_j$. It is this vector of proportions p that we wish to compare to a vector q based on the Poisson model with mean $\lambda = 4.8 \times 10^{-4}\, m/z$. To quantify the similarity of the two patterns p and q, we use a metric, or measure of distance between discrete distributions. A number of such metrics are known in the statistical literature. We use the metric of Bhattacharyya,26 which is essentially the angle between the 5-dimensional vectors p and q. For a particular charge, we evaluate this metric at each pixel. This results in an image of goodness-of-fit values of the same size as the original image. A region of this image where low metric values are found indicates the presence of a feature in the corresponding part of the LC/MS image. Because a peptide elutes over a time span of 30 s to several minutes, one expects to see streaks of low metric values in the image of metric values when a peptide is present in the corresponding part of the LC/MS image. Morphological dilations and erosions are used to draw attention to appropriate regions in the data.25 These operations are just local minima and maxima, which enhance the streaks of low metric values. One can search for a wider feature by combining consecutive values of our metric in an appropriate manner (mean or max of both, for example). One can also search for two-sided or paired features by combining the other metric values further along the m/z axis for a particular scan. A list of such locations is readily available for each image, and features can be subsequently ranked based on the metric values.
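The per-pixel comparison described above can be sketched for a single scan as follows. This is a hedged illustration rather than the authors' implementation: it assumes the metric is the Bhattacharyya angle, arccos of the sum of sqrt(p_i q_i), and the function names, the `reference_for_index` callback (which supplies the reference vector q for each starting bin, since q changes with mass) and the toy scan are all hypothetical.

```python
import math


def bhattacharyya_angle(p, q):
    """Angle between the square-root vectors of two discrete distributions."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(min(1.0, coefficient))  # guard against rounding above 1


def metric_for_scan(intensities, offsets, reference_for_index):
    """Goodness-of-fit value at every bin where the five-point window fits.

    Low values mean the observed proportions resemble the reference pattern.
    Windows with no signal receive the worst possible value, pi / 2.
    """
    span = offsets[-1]
    values = []
    for start in range(len(intensities) - span):
        window = [intensities[start + o] for o in offsets]
        total = sum(window)
        if total <= 0:
            values.append(math.pi / 2)
            continue
        p = [v / total for v in window]
        values.append(bhattacharyya_angle(p, reference_for_index(start)))
    return values


if __name__ == "__main__":
    # Toy scan: a doubly charged cluster starting at bin 5 (peaks 6 bins apart).
    scan = [0.0] * 30
    scan[5], scan[11], scan[17] = 60.0, 30.0, 10.0
    q = [0.6, 0.0, 0.3, 0.0, 0.1]  # made-up reference pattern for the demo
    values = metric_for_scan(scan, [0, 3, 6, 9, 12], lambda index: q)
    print(values.index(min(values)))
```

Repeating this for every scan yields the image of metric values described in the text; the offsets would be chosen to suit the charge state being searched for.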
Figure 4A shows how the contours from the metric image are laid over the LC/MS image to clearly highlight all the features present. Contours are included based on metric values of less than 0.2 and between 0.2 and 0.4. These highlight 15 regions of the image and outline all the features present. Up to this point, the method is automatic. Next, the metric images are averaged across all samples (see Figure 4). Consistently placed features are presented to the user and flagged across all samples. Flagged features are aligned, and the quantity of each is extracted in a single step by outlining a rectangle around the feature. This feature extraction process is continued for different contour values until the quality of features found is deemed to be too low. This results in a list of features for this collection of data files. For each feature, we record its position in each data file, the quantity of the feature in each data file, and also a measure of the time shift at each location.

Figure 4. This pair of images represents the same region across two different plasma data sets (A and B). The contours in panel A are based on metric values computed from that image. The red contours outline regions with metric scores below 0.2. The blue contours outline regions with metric scores at most 0.4. The same blue contours are transposed onto panel B, highlighting the shift in the chromatographic elution profile. Also evident are features that appear to be unique to each data set (highlighted in green).

5. Comparison of Many Data Files. Features found in one data file will not be located in exactly the same position in another data file due to potentially variable chromatographic conditions. Well calibrated MS systems have little or no alignment problems in terms of m/z, but significant problems can exist with the chromatographic retention times. Figure 4 illustrates this issue. In this image, we compare the same region across two different plasma data files (Figure 4A,B). The same contours are laid over both LC/MS images. For this portion of the data, there is clear evidence of a time shift in the chromatographic elution profile. Note also that there are some features unique to each data file. Our approach for picking out the parts of images with low metric values is to apply a morphological erosion to the image that essentially spreads the low metric values appropriately; in our case, an erosion size corresponding to 5 min has proved adequate. When one examines the overlap of such images for various samples, one can easily pick out features that are consistently located (allowing for alignment). Once all features have been flagged, it is a simple task of outlining the features. After this, the position and quantity of each is automatically computed before the next analysis steps can be taken.

Figure 5 consists of four panels which together illustrate the benefit of examining the average metric values across many data files for finding features. The first panel (Figure 5A) shows a region of data for a single data file (elution time 110-160 min, m/z 665-690). The metrics for singly, doubly, and triply charged features are evaluated on 22 data files. The erosion operations described earlier are applied in both the time and mass-charge directions, and the metric values are averaged across all 22 data files. Figure 5B-D represents images of the average metric values across the 22 data sets. The colors used in these images are analogous to topographical colors used in maps; blue corresponds to low metric values, and white indicates high metric values. Regions of blue correspond to regions where consistently low metric values are recorded, i.e., regions where we expect to find features.
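The effect of eroding each metric image before averaging can be seen in a small example. The sketch below is a pure-Python illustration under stated assumptions, not the authors' code: the function names are hypothetical, the erosion uses a rectangular window, and the two toy metric images (rows are time bins, columns are m/z bins) stand in for two samples whose feature is shifted in retention time.

```python
def erode(image, half_rows, half_cols):
    """Grey-scale erosion: each pixel becomes the minimum of its neighbourhood."""
    n_rows, n_cols = len(image), len(image[0])
    eroded = []
    for r in range(n_rows):
        r0, r1 = max(0, r - half_rows), min(n_rows, r + half_rows + 1)
        row = []
        for c in range(n_cols):
            c0, c1 = max(0, c - half_cols), min(n_cols, c + half_cols + 1)
            row.append(min(image[i][j] for i in range(r0, r1) for j in range(c0, c1)))
        eroded.append(row)
    return eroded


def average_images(images):
    """Pixel-wise mean of equally sized images."""
    n_rows, n_cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / len(images) for c in range(n_cols)]
            for r in range(n_rows)]


if __name__ == "__main__":
    # Two toy metric images; the same feature appears two time bins apart.
    sample_a = [[1.0] * 3 for _ in range(5)]
    sample_b = [[1.0] * 3 for _ in range(5)]
    sample_a[1][1] = 0.1
    sample_b[3][1] = 0.1

    raw = average_images([sample_a, sample_b])
    spread = average_images([erode(sample_a, 1, 0), erode(sample_b, 1, 0)])
    print(min(min(row) for row in raw), min(min(row) for row in spread))
```

Without erosion the misaligned low values never coincide, so the average stays high everywhere; after spreading each low value by one time bin in either direction, the two samples overlap and the average shows a clear minimum. With 30 s time bins, a window corresponding to 5 min would span about ten bins.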


Figure 5. The image illustrates the idea of spreading out the low metric values in time and mass and averaging them across many data files. Panel A shows the log intensity values for a single data file. The other panels are the averaged metric values for singly (B), doubly (C), and triply (D) charged features, with the blue shaded areas indicating a high probability of a feature eluting. In panels B-D, the metrics indicate multiple features coeluting around 160 min; however, it is not clear if these are singly, doubly, or triply charged peptides. In panel C, we can also see a clearly defined doubly charged feature (approx. elution time 140 min, approx. m/z 670). In panel D, a triply charged feature is readily apparent (approx. elution time 145 min, approx. m/z 675-680).

In the metric image for singly charged features (Figure 5B), there are low metric values recorded in four separate parts of the image. In the metric image for doubly charged features (Figure 5C), there are low metric values recorded in three of the same parts of the image. In the metric image for triply charged features (Figure 5D), there are relatively low metric values recorded in two parts of the image. Looking a little further at each of these regions will highlight consistently located features across all 22 datasets. The metric images show either a singly or doubly charged feature in the top left-hand corner of this window. However, in the LC/MS image (Figure 5A), it is not clear if such a feature is present. It may not be present in this image, or perhaps the intensities of this feature are too low for it to stand out in comparison to other features in the image.

6. Detection of Nonideal Features. In this section, we outline some characteristics of comparative proteomics data that we have encountered during our preliminary investigations. An ideal feature is characterized by a clear isotopic pattern which closely resembles the predicted pattern, and since these data represent a comparison of differentially (isotope) labeled plasma samples, the paired features are separated by the correct distance that is dependent on the number of isotope labels attached to each peptide. Figure 6 shows some nonideal features. For example, three coeluting features are outlined by rectangles and marked A-C. The three clusters of peaks that make up the feature are not easily explained by the labeling procedure alone (the labeling procedure should result in a paired feature, not a triplicate). It is not clear how one would record such features for subsequent analysis; many different interpretations are possible, and each is questionable. Also highlighted are peptides of the same m/z value that appear to elute closely together (areas marked C-E). Should these be considered as different peptides or should they be considered as the same peptide and their information combined prior to subsequent analysis and interpretation?

Figure 6. Focused image of intensity values in LC/MS data for several triply charged peptides showing nonideal features. (a) Three coeluting features are indicated (denoted by A-C). (b) A feature with the same apparent m/z value appears to elute over a wide chromatographic time window (denoted by C-E). (c) At ‘X’, there seem to be two different features with very similar m/z values coeluting within the same chromatographic time window. The green line indicates the location of the highest recorded intensity value in successive scans. At ‘X’, the position of the highest peak moves right by 1/3 Da, indicating that these paired features are separated by 1/3 Da.

Finally, Figure 6 also highlights the problem of overlapping features. Indicated with an X, it is clear from the isotopic patterns that there are two separate features eluting within the same chromatographic time window (120-130 min). The green line joins the position of the highest recorded intensity for each aggregated scan (10 s). At X, the position of this highest peak shifts to the right. Before this time point, the dominant peak is at 456.67 Da, and afterward, the peak is at 457 Da. An important point that needs to be raised here is that when one takes the time to visually examine each of the features, one can determine easily if these scenarios are occurring and avoid potential misinterpretation of the data that may result in a biomarker being erroneously identified. In our interactive approach, a feature is included in subsequent data analysis only after it has been examined visually. The end result of this interactive process is that the quality of features identified is no longer a concern.
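The green line in Figure 6, which follows the highest intensity in each aggregated scan, suggests a simple aid for spotting overlapping features before visual inspection. The sketch below is an illustration only and is not described in the text as part of the method: the function names, the tolerance, and the toy data are hypothetical, and it merely reports scans where the dominant m/z jumps.

```python
def dominant_mz(scans, mz_axis):
    """m/z value of the highest intensity in each aggregated scan."""
    track = []
    for row in scans:
        best = max(range(len(row)), key=row.__getitem__)
        track.append(mz_axis[best])
    return track


def jumps(track, tolerance):
    """Indices of scans whose dominant m/z differs from the previous scan
    by more than the tolerance (in Da)."""
    return [i for i in range(1, len(track)) if abs(track[i] - track[i - 1]) > tolerance]


if __name__ == "__main__":
    # Toy window on a 1/12 Da grid starting at m/z 456.67.
    mz_axis = [456.67 + k / 12 for k in range(6)]
    early = [9.0, 1.0, 0.0, 0.0, 3.0, 0.0]   # dominant peak in the first bin
    late = [3.0, 0.0, 0.0, 0.0, 9.0, 1.0]    # dominant peak 4 bins (1/3 Da) higher
    scans = [early, early, early, late, late, late]
    track = dominant_mz(scans, mz_axis)
    print(jumps(track, tolerance=0.1))
```

A flagged scan is only a prompt to look at the region; whether the jump reflects two coeluting peptides still has to be judged from the isotopic patterns themselves.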

Conclusions

Our approach to data analysis is being applied to biomarker discovery but can also be adapted for the global analysis of ESI-LC/MS data to identify differentially expressed proteins that may be of biological interest. Approaches utilizing commercially available reagents, e.g., ICAT or iTRAQ, generally rely on proprietary software packages for measuring relative protein abundances in addition to interpretation of MS/MS spectra. When using these methods, tandem mass spectrometry is usually a necessary step in the analytical procedure, and the user is reliant on unique attributes of mass spectrometry platforms such as “data dependence” capabilities for data collection. Proprietary software packages, as well as others that are freely available over the Internet, are typically not transparent in outlining the methods used for data extraction or do not easily enable the user to reexamine the raw data for potential misinterpretations.13-18 During our extensive analysis of differentially labeled plasma samples, we have developed a novel bioinformatics approach to extract information from ESI-LC/MS data and have also identified characteristics of the data that should provide caution against blindly using automated feature finding methods. The use of ESI-LC/MS data to measure protein expression presents unique challenges to researchers where finding and classifying features is often difficult and subject to errors. Some confounding factors can include misalignment of features across multiple sample sets; missing features due to analytical, experimental, or biological variability; overlapping of isotopic clusters; coelution of features that cannot be readily explained by experimental procedures; or the detection of spurious spikes in the data that represent noise or background ions. These artifacts in the data can be easily overlooked when employing automated feature finding methods and can mislead the investigator into making invalid conclusions.
We advocate a more interactive approach to data extraction, one that is based on the isotopic pattern of the features themselves. Our scale-independent approach allows one to find and align features across many samples. The user is then assured that all data used at later stages of analysis come from high quality features present across most of the analyzed samples. The applicability of any method for feature finding depends greatly on the resolution of the mass spectrometer in use and the mass of the peptide being detected. The practical mass range of most LC/MS platforms is limited to 2000 Da, and the majority of instruments employed for proteomic studies enable up to five isotopic peaks per peptide to be resolved. Since a majority of peptides enable at least three isotopic peaks to be resolved and detected regardless of mass, we focus on finding features based on the first three isotopic peaks within each cluster. Our method takes a more directed and targeted approach to identifying those peptides/proteins that are potentially more biologically important by utilizing a well-characterized global isotope labeling strategy and relying solely on LC/MS information without the need for tandem mass spectrometry. Because the user plays a critical role in the selection of features, this approach enables the researcher to examine all data points, rather than just a selected subset. This approach also allows low-abundance peptides to be included in the analysis and eliminates the use of low quality or noisy data in the analysis. Our approach is slower than automated methods, but we feel that the resulting quality of the feature list and its implications for downstream validation experiments far outweigh this problem. User intervention also introduces a degree of subjectivity into the process. The user is presented with views of different regions of the LC/MS data and decides to accept or reject them as features. We recommend keeping track not only of the accepted features but also of those that are rejected. This enables reiterative analysis of the data to be performed, and provides a mechanism for the user to provide examples of what may/may not have constituted a feature should questions arise.

References

(1) Eng, J. K.; McCormack, A. L.; Yates, J. R. J. Am. Soc. Mass Spectrom. 1994, 5, 976-989.
(2) Kapp, E. A.; Schutz, F.; Connolly, L. M.; Chakel, J. A.; Meza, J. E.; Miller, C. A.; Fenyo, D.; Eng, J. K.; Adkins, J. N.; Omenn, G. S.; Simpson, R. J. Proteomics 2005, 5, 3475-3490.
(3) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Anal. Chem. 2002, 74, 5383-5392.
(4) Clauser, K. R.; Baker, P. R.; Burlingame, A. L. Anal. Chem. 1999, 71, 2871-2882.
(5) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S. Electrophoresis 1999, 20, 3551-3567.
(6) Katajamaa, M.; Oresic, M. BMC Bioinformatics 2005, 6, 179.
(7) Bellew, M.; Coram, M.; Fitzgibbon, M.; Igra, M.; Randolph, T.; Wang, P.; May, D.; Eng, J.; Fang, R.; Lin, C.; Chen, J.; Goodlett, D.; Whiteaker, J.; Paulovich, A.; McIntosh, M. Bioinformatics 2006, 22, 1902-1909.
(8) Gaspari, M.; Verhoeckx, K. C.; Verheij, E. R.; van der Greef, J. Anal. Chem. 2006, 78, 2286-2296.
(9) Andreev, V. P.; Rejtar, T.; Chen, H. S.; Moskovets, E. V.; Ivanov, A. R.; Karger, B. L. Anal. Chem. 2003, 75, 6314-6326.
(10) Salmi, J.; Moulder, R.; Filen, J. J.; Nevalainen, O. S.; Nyman, T. A.; Lahesmaa, R.; Aittokallio, T. Bioinformatics 2006, 22, 400-406.
(11) Wang, X.; Zhu, W.; Pradhan, K.; Ji, C.; Ma, Y.; Semmes, O. J.; Glimm, J.; Mitchell, J. Proteomics 2006, 6, 2095-2100.
(12) Pedrioli, P. G.; Eng, J. K.; Hubley, R.; Vogelzang, M.; Deutsch, E. W.; Raught, B.; Pratt, B.; Nilsson, E.; Angeletti, R. H.; Apweiler, R.; Cheung, K.; Costello, C. E.; Hermjakob, H.; Huang, S.; Julian, R. K.; Kapp, E.; McComb, M. E.; Oliver, S. G.; Omenn, G.; Paton, N. W.; Simpson, R.; Smith, R.; Taylor, C. F.; Zhu, W.; Aebersold, R. Nat. Biotechnol. 2004, 22, 1459-1466.
(13) Palagi, P. M.; Walther, D.; Quadroni, M.; Catherinet, S.; Burgess, J.; Zimmermann-Ivol, C. G.; Sanchez, J. C.; Binz, P. A.; Hochstrasser, D. F.; Appel, R. D. Proteomics 2005, 5, 2381-2384.
(14) Radulovic, D.; Jelveh, S.; Ryu, S.; Hamilton, T. G.; Foss, E.; Mao, Y.; Emili, A. Mol. Cell. Proteomics 2004, 3, 984-997.
(15) Li, X. J.; Yi, E. C.; Kemp, C. J.; Zhang, H.; Aebersold, R. Mol. Cell. Proteomics 2005, 4, 1328-1340.
(16) Wang, W.; Zhou, H.; Lin, H.; Roy, S.; Shaler, T. A.; Hill, L. R.; Norton, S.; Kumar, P.; Anderle, M.; Becker, C. H. Anal. Chem. 2003, 75, 4818-4826.
(17) Shadforth, I. P.; Dunkley, T. P.; Lilley, K. S.; Bessant, C. BMC Genomics 2005, 6, 145.
(18) von Haller, P. D.; Yi, E.; Donohoe, S.; Vaughn, K.; Keller, A.; Nesvizhskii, A. I.; Eng, J.; Li, X. J.; Goodlett, D. R.; Aebersold, R.; Watts, J. D. Mol. Cell. Proteomics 2003, 2, 428-442.
(19) Gygi, S. P.; Rist, B.; Gerber, S. A.; Turecek, F.; Gelb, M. H.; Aebersold, R. Nat. Biotechnol. 1999, 17, 994-999.
(20) Ross, P. L.; Huang, Y. N.; Marchese, J. N.; Williamson, B.; Parker, K.; Hattan, S.; Khainovski, N.; Pillai, S.; Dey, S.; Daniels, S.; Purkayastha, S.; Juhasz, P.; Martin, S.; Bartlet-Jones, M.; He, F.; Jacobson, A.; Pappin, D. J. Mol. Cell. Proteomics 2004, 3, 1154-1169.
(21) Ong, S. E.; Blagoev, B.; Kratchmarova, I.; Kristensen, D. B.; Steen, H.; Pandey, A.; Mann, M. Mol. Cell. Proteomics 2002, 1, 376-386.
(22) Piening, B. D.; Wang, P.; Bangur, C. S.; Whiteaker, J.; Zhang, H.; Feng, L.-C.; Keane, J. F.; Eng, J. K.; Tang, H.; Prakash, A.; McIntosh, M. W.; Paulovich, A. J. Proteome Res. 2006, 5, 1527-1534.
(23) Senko, M. W.; Beu, S. C.; McLafferty, F. W. J. Am. Soc. Mass Spectrom. 1995, 6, 229-233.
(24) Stein, S. E. J. Am. Soc. Mass Spectrom. 1999, 10, 770-781.
(25) Soille, P. Morphological Image Analysis: Principles and Applications; Springer: New York, 2002.
(26) Bhattacharyya, A. Sankhya 1946, 7, 401-406.
