UniQua: A Universal Signal Processor for MS ... - ACS Publications

Article pubs.acs.org/ac

UniQua: A Universal Signal Processor for MS-Based Qualitative and Quantitative Proteomics Applications

Wei-Hung Chang,†,‡,§ Chi-Ying Lee,†,§ Chih-Yu Lin,† Wei-Yun Chen,† Meng-Chieh Chen,† Wen-Shyong Tzou,‡ and Yet-Ran Chen*,†,‡

† Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan 11529
‡ Institute of Bioscience and Biotechnology, National Taiwan Ocean University, Keelung, Taiwan 20224



*S Supporting Information

ABSTRACT: Recent developments in high resolution mass spectrometry (HR-MS) technology have ushered proteomics into a new era. However, the importance of using a common, open data platform for signal processing of HR-MS spectra has not been sufficiently addressed. In this study, an MS signal processor was developed to facilitate data integration from different instruments and different proteomics approaches into a unified platform without compromising protein identification and quantitation performance. This processor supports parallel processing, which allows full utilization of computing resources and speeds up signal processing to >1 gigabyte/min. The storage space occupied by the processed MS data can be reduced to ∼10%, which helps the analysis and management of large quantities of data from comprehensive proteomics studies. For quantitation at the MS level, processing accuracy is improved and the processing time for ASAPRatio is reduced to ∼50%. For quantitation at the MS/MS level, accurate reporter ion ratios from different instruments can be determined directly from the processed MS/MS spectra and reported in the Mascot search result without using specialized iTRAQ software.

Analytical Chemistry

dx.doi.org/10.1021/ac302281j | Anal. Chem. 2013, 85, 890−897

Received: August 13, 2012 Accepted: December 13, 2012 Published: December 13, 2012

© 2012 American Chemical Society

Recent developments in MS-based proteomics enable sensitive, efficient, and comprehensive global proteome analysis.1,2 MS-based proteomics generally uses a bottom-up approach that analyzes total peptides generated from the digestion of complex protein mixtures using tandem mass spectrometry (MS/MS). In bottom-up approaches, MS is used to identify and quantify protein expression using isotopic labeling or a label-free approach.3,4 Therefore, the accuracy and sensitivity of MS-based proteomics rely both on reproducible sample preparation and on the data quality of MS spectra.5 High resolution tandem mass spectrometry (HR-MS/MS) is considered to be a successful technology for the analysis of complicated proteomes or low abundance proteins because it generates high-quality spectra with high resolving power and accurate mass measurements of peptide precursor and fragment ions.

Currently, several types of HR-MS/MS are used in proteomics research. The different MS instrument types in common use are complementary in function; thus, efficient integration of results from different instruments can provide more comprehensive analyses. The format of the raw data generated from MS instruments differs according to vendor and analyzer type. Each vendor uses its own software for processing/deconvolution of raw data. Most of the software developed by MS vendors supports only their own data formats and a limited set of proteomics approaches. These proprietary data formats make it difficult to develop proteomics software and to integrate data into a single data platform. In recent years, open XML-based formats have been developed to store mass spectrometry data, making the integration of data from various instruments into a unified data analysis platform possible.6−8

Although the open data format facilitates the development of a unified data analysis platform, data processing methods still need to be optimized for different HR-MS/MS instruments. This is because HR-MS/MS raw data contains mainly noninformative data, and spectra characteristics vary according to instrument type, tuning condition, and analysis approach. The raw data from HR-MS/MS needs to undergo appropriate processing according to the instrument spectra characteristics and analysis approaches to ensure that the final processed data has minimal signal interference, high mass accuracy, and high quantitation accuracy.9 Since not all general protein identification and quantitation software can process MS raw data according to spectral properties or approaches, appropriate MS data signal handling can improve the performance of most general software platforms and thus minimize the effort required to optimize software for different instrument types.

Some software packages are designed for processing high resolution mass spectrometry spectra and picking information from them, and they can accept mzXML data from different instruments. Hardklor10 and DeconMSn11,12 are designed for more accurate determination of the monoisotopic mass and charge state of each precursor. Both of these packages can also calculate overlapping peptide isotopic patterns, which can provide better identification performance than the direct extraction of instrument-supplied information. When proteins are identified using HR-MS/MS, raw MS/MS spectra from high resolution MS are processed to obtain fragment peak lists of peptides, which are then subjected to a protein database search. To obtain a fragment peak list from an MS/MS spectrum, the spectrum noise is removed, and then the fragment signal is centroided and deisotoped. Isotopic peaks are removed because some of the common database search algorithms utilize monoisotopic fragment signals for database matching, and the removal of isotopic signals can improve the confidence of peptide identification.13−15 On the other hand, for MS-based quantitation, it is recommended that the peptide precursor signal not be centroided and deisotoped,16,17 as this preserves more peptide abundance information for MS-level quantitation and facilitates the removal of spectrum noise by quantitation software. In addition, for MS/MS-level quantitation such as the iTRAQ approach,18 the quantitation information of the reporter ions should be preserved before processing the spectra for the peptide fragment peak list.19,20 In addition to the above considerations, it is also important that the processed results be in an open format that is compatible with general data analysis platforms to allow integration of data analysis results. OpenMS21,22 combines internal mass calibration of the precursor with a wavelet transformation method for MS/MS fragments, which provides better support for the processing of high resolution MS/MS spectra. The fragment peaks in MS/MS spectra are extracted and stored in several formats that are compatible with different protein identification engines. However, if the spectra extraction and the protein quantitation software are not supported by the same data pipeline, the retention time information in the search result may not be recognized by the quantitation software.

In this study, a high throughput universal MS signal processing platform (UniQua) was developed to directly process high resolution MS data in an open format. The processed result is also stored in the standardized mzXML/mzML format; thus, the protein identification and quantification information can be determined directly by an open proteomics pipeline. The performance of the UniQua processor is demonstrated and discussed.

EXPERIMENTAL SECTION

Materials. Triethylammonium bicarbonate (TEABC), sodium cyanoborohydride, ammonium hydroxide, formaldehyde (37% solution in H2O), formaldehyde-13C,d2 (20% solution in D2O), tris(2-carboxyethyl)phosphine hydrochloride (TCEP), ammonium persulfate (APS), methyl methanethiosulfonate (MMTS), Triton X-100 (TX-100), 4-(2-hydroxyethyl)piperazine-1-ethanesulfonic acid (HEPES), sodium bicarbonate, N,N,N′,N′-tetramethylethylenediamine (TEMED), potassium chloride, sodium pyruvate, potassium phosphate monobasic, 40% acrylamide/bis-acrylamide (37.5:1) solution, trifluoroacetic acid (TFA), formic acid (FA), bovine serum albumin (BSA), and bovine beta casein were purchased from Sigma-Aldrich (St. Louis, MO). Ethanol, methanol, acetonitrile (ACN), and sodium dodecyl sulfate (SDS) were purchased from J. T. Baker (Phillipsburg, NJ). RPMI-1640 medium, penicillin, streptomycin, neomycin, and fetal bovine serum (FBS) were purchased from Invitrogen (Carlsbad, CA). Trypsin (modified, sequencing grade) was from Promega (Madison, WI). The phosphatase inhibitor cocktail was purchased from Hoffmann-La Roche (Basel, Switzerland). The hemoglobin subunit beta (hae-β, Homo sapiens) and hemoglobin subunit alpha (hae-α, Homo sapiens) were purchased from Waters (Milford, MA). Deionized water (18.1 MΩ cm resistivity) from a Milli-Q system (Millipore, Bedford, MA) was used throughout this work.

Cell Culture and Protein Extraction. Jurkat cell line clone E6−1 (BCRC 60424) was purchased from the Bioresource Collection and Research Center (Hsinchu, Taiwan). All cells were initially maintained in RPMI 1640 medium supplemented with 10% heat-inactivated fetal bovine serum, 50 μg/mL penicillin, 50 μg/mL streptomycin, and 100 μg/mL neomycin in an incubator with 5% CO2 at 37 °C. Cells were then reconstituted to a concentration of 5 × 10^6 cells/mL in RPMI 1640 medium for 2 h at 37 °C in a CO2 incubator and collected by centrifugation at 700g at 4 °C for 10 min. Jurkat cell proteins were extracted with lysis buffer (50 mM HEPES, 0.1% SDS, and 0.02% TX-100 at pH 8.0), and the extracted protein concentration was evaluated with the Pierce BCA Protein Assay Kit (Thermo Scientific, Rockford, IL).

Protein Reduction, Alkylation, and Digestion. Jurkat total proteins were further diluted to 1 μg/μL with 50 mM TEABC and reduced with 5 mM TCEP for 1 h at 37 °C, followed by alkylation using 2 mM MMTS for 45 min at room temperature. For the proteolytic digestion, the modified tubegel digestion protocol was applied, and the detergent residue was checked using the method described previously.23

Labeling of Digested Peptides with iTRAQ Reagent. To label tryptic peptides with iTRAQ, 4-plex iTRAQ labeling reagents (Applied Biosystems, Foster City, CA) were individually reconstituted with ethanol, and the labeling procedure was conducted according to the manufacturer’s protocol.

Strong Cation Exchange Chromatography for Peptide Fractionation. For strong cation exchange (SCX) fractionation, the buffers SCX-A (5 mM KH2PO4 in 25% ACN at pH 3) and SCX-B (5 mM KH2PO4 and 350 mM KCl in 25% ACN at pH 3) were used as the mobile phase. The peptide mixtures were reconstituted in buffer SCX-A and then loaded onto a PolySULFOETHYL A column (200 × 2.1 mm, 5 μm, 300 Å, PolyLC, Columbia, MD) for 10 min at a flow rate of 0.2 mL/min. Peptides were fractionated using a 75 min gradient from 0% to 100% of buffer SCX-B. Fractions were collected every 3 min from the retention time of 10 to 55 min using a fraction collector (BioFrac Fraction Collector, BioRad Laboratories, Hercules, CA).

Reductive Methylation Labeling Reaction. Two 100 μg proteolytic Jurkat protein mixtures were first dissolved in 100 μL of 100 mM TEABC (pH 8.5) and mixed with 10 μL of formaldehyde and formaldehyde-13C,d2 (4%, diluted with H2O), respectively. After vortexing (5 min) and centrifugation, each of the sample solutions was mixed with 10 μL of 600 mM sodium cyanoborohydride solution. The sample solutions were vortexed (10 min) and centrifuged again, and then they were allowed to react for 30 min at 25 °C. To quench the reaction, ammonium hydroxide (7% in water, 5 μL) was added to each sample solution. Finally, 8 μL of formic acid was added to acidify each of the sample solutions, and the two samples were then combined for LC-MS analysis.

LC-MS/MS Analysis. LC-MS/MS analysis was performed with a nanoUHPLC system (nanoACQUITY UPLC, Waters, Milford, MA) coupled online to the nanoelectrospray source of a hybrid quadrupole time-of-flight mass spectrometer (Q-TOF-MS) with a time-to-digital converter (TDC) detector system (SYNAPT HDMS G1, Waters, Manchester, U.K.), a Q-TOF-MS with an analog-to-digital converter (ADC) detector system (SYNAPT HDMS G2, Waters, Manchester, U.K.), or an ion trap-orbitrap mass spectrometer (IT-OT-MS) with a Fourier transform detector system (LTQ Orbitrap XL, Thermo Fisher Scientific, Bremen, Germany). The conditions and settings for the LC-MS analysis are described in the Supporting Information.

Spectra Processing. The data generated from SYNAPT HDMS G1 and G2 was processed by either ProteinLynx GlobalServer (PLGS version 2.4, Waters, Milford, MA) or UniQua. The data generated from LTQ Orbitrap XL was processed by either Proteome Discoverer (version 1.3, Thermo Fisher Scientific, San Jose, CA) or UniQua. The parameters for the spectra processing are listed in the Supporting Information. The UniQua processed spectra were converted into Mascot generic format (.mgf) using mzXML2Search in the Trans-Proteomic Pipeline (TPP)24 version 4.4 rev. 1.

Protein Qualification. The spectra processed by PLGS, Proteome Discoverer, or UniQua were searched against the IPI human protein database version 3.61 (82635 total protein entries, with horse heart myoglobin, bovine serum albumin, bovine alpha-casein, and bovine beta-casein added to the IPI protein database) using the Mascot version 2.3 (Matrix Science, London, U.K.) search engine. The search parameters are described in the Supporting Information. For the data analyzed by TPP, the Mascot search results were further validated using PeptideProphet25 and ProteinProphet,26 which are included in the TPP package. The probability threshold settings for PeptideProphet and ProteinProphet were 0.7 and 0.95, respectively. The overall false discovery rate for the above settings was determined to be lower than 1% with the use of ProteinProphet.

Figure 1. UniQua signal processing workflow.

Protein Quantitation. For the dimethyl labeling experiment, the identification results obtained from Mascot were imported into TPP and quantified by Automated Statistical Analysis on Protein Ratio (ASAPRatio).27 The criteria for ASAPRatio protein quantitation were ProteinProphet probability ≥0.95, unique peptide hits ≥2, and quantitative peptides ≥2. For the isobaric labeling experiment, the reporter ion ratio for each identified peptide was determined by PLGS, Proteome Discoverer, Libra,28 or Mascot directly.
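The protein quantitation criteria stated in the Experimental Section (ProteinProphet probability ≥0.95, unique peptide hits ≥2, quantitative peptides ≥2) can be illustrated with a minimal Python sketch. The `ProteinResult` record and the function name are hypothetical and exist only for illustration; they are not part of TPP, ASAPRatio, or UniQua.

```python
from dataclasses import dataclass


@dataclass
class ProteinResult:
    """Hypothetical summary of one identified protein (not a TPP data type)."""
    accession: str
    probability: float        # ProteinProphet probability
    unique_peptides: int      # number of unique peptide hits
    quantified_peptides: int  # number of peptides with a quantitative ratio


def passes_quantitation_criteria(protein,
                                 min_probability=0.95,
                                 min_unique=2,
                                 min_quantified=2):
    """Return True when a protein meets all three thresholds from the text."""
    return (protein.probability >= min_probability
            and protein.unique_peptides >= min_unique
            and protein.quantified_peptides >= min_quantified)


def filter_quantifiable(proteins):
    """Keep only the proteins that satisfy the quantitation criteria."""
    return [p for p in proteins if passes_quantitation_criteria(p)]
```

Under these assumptions, a protein with probability 0.99 and three unique, two quantified peptides is retained, while one with probability 0.94 or a single unique peptide is dropped.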



RESULTS AND DISCUSSION

UniQua Spectra Processor System Architecture. The UniQua spectra processor was written in the C language, which is supported on multiple operating systems (OS) and requires no additional runtime environment. UniQua has four major characteristics: dynamic mass calibration, flexible signal processing functions, parallel processing, and the possibility of development into different user interfaces. Figure 1 shows the signal processing workflow of UniQua. The native raw data generated from an instrument is converted into mzXML/mzML format. The processor first calibrates the MS/MS precursor ion according to the dynamic shift of the mass-to-charge ratio (m/z) during the analysis. For high resolution MS/MS (HR-MS/MS), in which data is normally acquired in profile mode, each of the MS/MS spectra in mzXML or mzML is smoothed and noise filtered, and the fragment peaks are centroided. For qualitative analysis, in addition to the further deisotoping of MS/MS spectra for better database searching performance, the MS spectra can be centroided to save storage space. For MS-based quantitation, the MS/MS spectra are centroided but the MS spectra are preserved in profile format to retain more quantitative information. However, for MS/MS-based quantitation such as iTRAQ, all of the fragment peaks except reporter ion signals are deisotoped, and MS spectra centroiding can be selected to save data storage space. After signal processing, the processed spectra are written back to replace the spectra in the original mzXML/mzML, thus keeping the processed data compatible with a wide variety of open proteomics data pipelines and software.

Table 1. UniQua Calibration Performance^a

| calibration method | no calibration | background | lockspray | search result |
| --- | --- | --- | --- | --- |
| search tolerance^b | 0.1 | 0.1 | 0.1 | 0.1 |
| identified protein | 127 (80.3 ± 11.4) | 128 (80.7 ± 11.2) | 129 (80.7 ± 11.2) | 128 (82.3 ± 9.1) |
| average protein score | 371.6 (186.9 ± 20.6) | 369.7 (186.7 ± 22.2) | 367.4 (186.5 ± 21.9) | 368.7 (191.3 ± 20.6) |
| average peptide score | 53.60 (53.20 ± 3.11) | 53.66 (53.21 ± 3.14) | 53.64 (53.21 ± 3.14) | 53.52 (53.83 ± 2.58) |
| matched unique peptides | 563 (358.0 ± 45.1) | 565 (361.0 ± 44.5) | 568 (361.0 ± 44.5) | 567 (365.0 ± 44.2) |
| number of matched scans | 1689 (507 ± 62.4) | 1698 (510 ± 65.0) | 1702 (510 ± 65.0) | 1711 (520 ± 64.6) |
| average peptide mass error (Da) | 0.032 (0.031 ± 0.006) | 0.022 (0.022 ± 0.005) | 0.017 (0.017 ± 0.005) | 0.012 (0.012 ± 0.001) |

^a Proteolyzed Jurkat cell lysate was analyzed by nanoUHPLC-Q-TOF-MS (TDC). Identification results were obtained by performing a Mascot search using MS/MS spectra with or without the use of different precursor mass calibration functions supported by UniQua. The results were obtained by merging three replicate analyses, and the values in parentheses are the average of three individual replicates. ^b Mass tolerance for MS and MS/MS (Da). ^c The detailed analysis result for each protein/peptide is listed in Supplemental Table S1.

Dynamic Precursor Calibration. During LC-MS/MS analysis, m/z accuracy is affected not only by environmental temperature but also by the stability of the electronics. To obtain accurate measurements of molecular weight (MW), in addition to periodic external calibration of the mass analyzer, dynamic mass calibration can further correct short-term variations in mass accuracy. UniQua supports several dynamic calibration functions for precursor mass calibration. Dynamic calibration is applied according to the mass shift of the nearest available reference spectrum or the identified peptide. Table 1 shows the calibration performance of UniQua for an LC-MS analysis of proteolytic Jurkat cell lysate. Without precursor mass calibration, the average precursor mass error obtained by the Mascot search was 0.032 Da. With background signal calibration, the (Si(CH3)2O)6H+ signal (445.120025) was selected during the retention time without peptide elution (60−600 s). The average mass error after the background calibration was 0.022 Da. For the LockSpray calibration, the average precursor mass error was reduced to 0.017 Da when the Glu-fibrinopeptide B (GFP) signal ([M + 2H]2+ = 785.8426) was used as the reference mass. For search result calibration, a higher mass error tolerance (0.3 Da) was used to identify peptides, and identified peptides with a false positive identification rate 1000 (Supplemental Figure S2).
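The idea of correcting a precursor by "the mass shift of the nearest available reference spectrum" can be sketched in Python as follows. This is only an illustration under stated assumptions, not UniQua's actual C implementation: the function names and the input layout (a time-sorted list of retention time and observed reference m/z pairs) are hypothetical, and the shift is assumed to be a constant m/z offset rather than a relative (ppm) correction. The two reference m/z values are the ones quoted in the text.

```python
from bisect import bisect_left

# Reference m/z values quoted in the text.
BACKGROUND_MZ = 445.120025   # (Si(CH3)2O)6H+ background ion
LOCKSPRAY_MZ = 785.8426      # Glu-fibrinopeptide B, [M + 2H]2+


def nearest_reference_shift(references, rt, reference_mz):
    """Return the m/z shift (observed - theoretical) of the reference
    spectrum closest in retention time to rt.

    references: list of (retention_time_s, observed_reference_mz) tuples,
    sorted by retention time (hypothetical input layout).
    """
    times = [t for t, _ in references]
    i = bisect_left(times, rt)
    # Only the neighbours on either side of rt can be the nearest one.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(references)]
    nearest = min(candidates, key=lambda j: abs(times[j] - rt))
    return references[nearest][1] - reference_mz


def calibrate_precursor(precursor_mz, rt, references,
                        reference_mz=BACKGROUND_MZ):
    """Subtract the nearest reference shift from a precursor m/z.

    Assumption: the shift is applied as a constant offset in m/z.
    """
    if not references:
        return precursor_mz  # nothing to calibrate against
    return precursor_mz - nearest_reference_shift(references, rt, reference_mz)
```

For example, with background readings of 445.130025 at 60 s and 445.140025 at 120 s, a precursor measured at m/z 500.0 and 70 s takes the 60 s shift (+0.01) and is corrected to approximately 499.99.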
The smoothing and centroiding function of UniQua can improve the ratio accuracy for low abundance reporter ions. For MS/MS without processing, the ratio variation was >50% with a reporter intensity range of 10−30 counts (Supplemental Figure S3A). For MS/MS spectra smoothed and centroided by UniQua, the ratio variation was reduced to