
Identification of Protein Fragments as Pattern Features in MALDI-MS Analyses of Serum

Lisa J. Zimmerman,*,†,§ Gregory R. Wernke,§ Richard M. Caprioli,†,‡,§ and Daniel C. Liebler†,‡,§

Departments of Biochemistry and Pharmacology and Mass Spectrometry Research Center, Vanderbilt University School of Medicine, Nashville, Tennessee 37232

Received May 10, 2005

The use of matrix-assisted laser desorption ionization mass spectrometry (MALDI-MS) to acquire spectral profiles has become a common approach to detect proteomic biomarkers of disease. MALDI-MS signals may represent both intact proteins and proteolysis products. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis can tentatively identify the corresponding proteins. Here, we describe the application of a data analysis utility called FragMint, which combines MALDI-MS spectral data with LC-MS/MS-based protein identifications to generate candidate protein fragments consistent with both types of data. This approach was used to identify protein fragments corresponding to spectral signals in MALDI-MS analyses of unfractionated human serum. The serum was also analyzed by one-dimensional SDS-PAGE, and bands corresponding to the MALDI-MS signal masses were excised and subjected to in-gel digestion and LC-MS/MS analysis. Database searches mapped all of the identified peptides to abundant blood proteins larger than the observed MALDI-MS signals. FragMint identified fragments of these proteins that contained the MS/MS-identified sequences and were consistent with the observed MALDI-MS signals. This approach should be generally applicable to identify protein species corresponding to MALDI-MS signals.

Keywords: serum • MALDI • LC-MS/MS • bioinformatics • proteolysis

Mass spectrometry (MS)-based proteomics analyses offer exciting new approaches to identify biomarkers for detection of disease and for monitoring of therapeutic and toxic outcomes. MALDI-based proteome profiling of serum, other biofluids, and tissue sections has been widely and aggressively employed for pattern-based diagnostics and biomarker discovery.1-10 MALDI spectral features correspond to a subset of proteins present in the sample and collectively constitute proteomic patterns that represent different biological states. In this approach, the patterns themselves are considered to embody information, even in the absence of identification of the proteins detected.

A rapidly growing body of literature describes the use of proteome profiles generated by MALDI-MS to classify serum samples on the basis of disease states, particularly cancers.1,3-6,8,11-14 In this approach, serum samples from patients diagnosed with disease and from matched controls are analyzed separately, and bioinformatics algorithms and software are used to identify spectral features that distinguish the two sample sets. Subsequent analyses use these spectral features to assign the samples either a disease or control classification.

* To whom correspondence should be addressed. Mass Spectrometry Research Center, Vanderbilt University School of Medicine, 9160 Medical Research Building III, 465 21st Avenue South, Nashville, TN 37232-8575. Tel: (615) 343-8431. Fax: (615) 343-8372. E-mail: lisa.j.zimmerman@vanderbilt.edu.
† Department of Biochemistry.
‡ Department of Pharmacology.
§ Mass Spectrometry Research Center.


Journal of Proteome Research 2005, 4, 1672-1680

Published on Web 08/06/2005

Although numerous studies indicate that this approach can distinguish between defined groups in analyses conducted in the same laboratory, the approach has several major problems. First, overfitting of data in classification models limits the generalizability and predictive value of patterns between laboratories or sample sets.15,16 Indeed, replication of the elements of predictive patterns between laboratories has been an elusive goal.14,15 Second, proteomic patterns appear to be highly dependent on methods of sample preparation and analysis, which have not yet been standardized in this field.10 Finally, the proteins or protein fragments that make up the proteome patterns have been identified in only a few studies.17 This greatly limits interpretation of proteome patterns in a biochemical context and makes it difficult to compare markers for specific disease states or therapeutic responses. In the few cases where discriminatory peaks have been identified, many of the putative biomarkers were highly abundant serum proteins such as apolipoprotein A1, transthyretin, inter-α-trypsin inhibitor, and haptoglobin-α subunits.18,19 Since serum preparation involves activation of proteolytic cascades for blood clotting, many of the small proteins and peptides that form these patterns may be fragments of larger proteins.

Identification of the proteins or protein fragments that produce signals in MALDI spectra is difficult, as even accurate mass measurements cannot unambiguously identify a protein. Although tandem MS (MS/MS) on MALDI-equipped instruments can provide peptide sequence information, this is generally possible only with relatively small peptides of less than about 30 residues. Purification of proteins or protein fragments from complex mixtures to isolate species that account for specific MALDI signals can yield semi-purified samples for tryptic digestion and analysis by MS/MS. However, this approach typically generates multiple peptide and protein matches to MS/MS spectra in the dataset, and unambiguous assignment of the species accounting for the MALDI signal is difficult. Many candidate protein identifications may map to proteins of greater mass than the signal observed in MALDI, which suggests that fragments, rather than intact proteins, were the observed markers. The problem of choosing between multiple possible fragments from several candidate marker proteins greatly complicates the assignment of marker identities. Although the speed and convenience of MALDI proteome profiling offer an attractive means of biomarker discovery, the difficulty of associating MALDI signals with protein species remains a significant roadblock to more widespread acceptance of the approach.

Here we describe a general strategy to identify proteins and protein fragments that account for protein markers observed in MALDI analyses of serum. We combine molecular weight-based protein separation (SDS-PAGE), liquid chromatography-tandem mass spectrometry (LC-MS/MS), and a new software tool called FragMint to identify partial sequences originating from intact proteins that correspond to features in MALDI-MS spectra. The key to our approach is FragMint, which reconciles MS/MS-derived protein identifications with observed MALDI mass information to facilitate the selection of candidate markers. This approach should be generally applicable to identifying unknown proteins and protein fragments from MALDI-MS patterns in serum and tissue samples.

10.1021/pr050138m CCC: $30.25 © 2005 American Chemical Society

Materials and Methods

Serum Preparation. Blood was collected by venipuncture in 10 mL glass collection tubes without any additives (red top) and allowed to clot for 45-60 min. The tubes were centrifuged at 1300 × g for 10 min within 2 h of collection. The serum was aliquoted into 100 µL portions and immediately frozen at -80 °C. All patients provided informed consent according to an IRB-approved study.

MALDI-MS Analysis. Crude serum was prepared by a dried-droplet protocol with sinapinic acid (SA) as the matrix. The SA matrix was prepared as a saturated solution that contained 60% acetonitrile and 0.1% trifluoroacetic acid (TFA). For MALDI-MS analysis of crude serum, 1 µL of serum diluted 10-fold in 0.1% TFA was mixed with an equal volume of SA matrix and spotted onto a MALDI sample plate. All mass spectra were acquired on a Voyager-DE STR instrument (Applied Biosystems) in the linear mode using a nitrogen laser (337 nm). Porcine insulin (5777.6 Da), bovine cytochrome c (12232.0 Da), equine apomyoglobin (16952.27 Da), and bovine trypsinogen (23976.0 Da) were used as external standards for MALDI-TOF mass spectra calibration.

SDS-PAGE. Proteins from crude serum were separated on a precast 10-20% gradient Tricine gel (Invitrogen) for 90 min at 125 V. Gels were fixed with 50% methanol, 10% acetic acid for 15 min and then stained overnight with Colloidal Blue (Invitrogen), followed by destaining with water.

Figure 1. Flowchart for FragMint algorithm. FragMint combines (a) target masses observed from MALDI mass spectra and (b) Sequest-identified peptides and protein accession numbers from the SDS-PAGE/LC-MS/MS analysis of a sample. Using these three parameters, FragMint retrieves the intact sequence of the proteins, the algorithm is applied, and candidate fragments matching the defined criteria are displayed.

In-Gel Digestion. Regions from the gel of crude serum that contained bands corresponding to molecular weights of the peaks observed in the MALDI-MS spectra of the crude serum were excised and subjected to in-gel tryptic digestion. Briefly, the gel bands of interest were excised and washed twice with 100 mM ammonium bicarbonate for 15 min. The liquid was discarded and replaced with fresh 100 mM ammonium bicarbonate, and the proteins were reduced with 3 mM DTT/100 mM ammonium bicarbonate for 20 min at 55 °C. The sample was cooled to room temperature, supplemented with iodoacetamide to 6 mM final concentration, and placed in the dark for 15 min at room temperature. The solution was discarded and replaced with 50% acetonitrile/100 mM ammonium bicarbonate, and the gel pieces were washed for 20 min and then dehydrated by treatment with 100% acetonitrile. The gel pieces were dried in vacuo, reswelled with 0.8 µg of modified porcine trypsin (Trypsin Gold, Promega) in 25 mM ammonium bicarbonate, and digested overnight at 37 °C. Peptides were extracted using three 100 µL portions of 60% acetonitrile/0.1% TFA, which were pooled and evaporated in vacuo. The residue was redissolved in 0.1% TFA and desalted using a C18 ZipTip (Millipore). The peptides were eluted from the ZipTip using 10 µL of 60% acetonitrile/0.1% TFA, and the sample volumes were adjusted to 100 µL using 0.1% formic acid prior to LC-MS/MS analysis.

LC-MS/MS Analysis. LC-MS/MS analyses were performed on a Thermo LTQ ion trap mass spectrometer equipped with a Thermo Surveyor LC pump and a microelectrospray source (Thermo Electron, San Jose, CA). Reversed-phase separation of peptide digests was performed using fused silica capillary tips (Polymicro Technologies, 100 µm i.d., 360 µm o.d.) packed with Monitor C18 (5 µm, Column Engineering) at flow rates of 700 nL - 1000 µL min⁻¹. Mobile phase A consisted of 0.1% formic acid and mobile phase B consisted of 0.1% formic acid in acetonitrile.
After equilibrating the column with 100% A, the peptides were eluted from the column with 5% B for 5 min, followed by 50% B for 50 min, which was then increased to 80% B by 52 min, and to 90% B by 55 min and held for 1 min. The mobile phase was then returned to 5% B over the next 5 min and continued at that composition until the end of the run at 71 min. MS/MS spectra were acquired using data-dependent scanning with one full MS scan (m/z 400-2000) followed by one MS/MS scan of the most intense precursor mass. Select samples were reanalyzed using one full MS scan followed by three MS/MS scans of the three most intense ions. MS/MS spectra from LC-MS/MS analysis were searched against the human database using SEQUEST (Thermo Electron, San Jose, CA), and only tryptic cleavages were allowed. Sequest search outputs were filtered using a custom-designed software tool called CHIPS (Complete Hierarchical Integration of Protein Searches) with the following filtering criteria: cross-correlation (Xcorr) value of >1.0 for singly charged ions, >1.8 for doubly charged ions, and >2.5 for triply charged ions. In addition, RSp (ranking of preliminary score) values of 350 also were required for positive peptide identifications.

Journal of Proteome Research • Vol. 4, No. 5, 2005

Figure 2. FragMint graphical user interface. (A) FragMint utilizes three parameters: a target mass, protein accession numbers, and Sequest-identified peptides, which are entered into the designated positions. (B) Protein modifications can be selected from a list of available pre-defined or custom-designed modifiers to be applied to the FragMint analysis.

Figure 3. MALDI spectrum of unfractionated human serum. Serum was diluted 10-fold in 0.1% TFA and 1 µL was mixed with an equal volume of saturated sinapinic acid matrix before depositing on a MALDI sample plate.

FragMint Algorithm. FragMint is a software tool that generates candidate protein fragment sequences consistent with observed MALDI-MS data and with peptide and protein identifications from MS/MS data. The general workflow for FragMint analysis is shown in Figure 1. The FragMint graphical user interface is displayed in Figure 2. The user enters the protein fragment target mass, which is the neutral mass corresponding to the signal of interest in the MALDI-MS spectrum (corrected for charge state, if necessary) (Figure 2A). The user also enters the database accession numbers and corresponding peptide sequences matched to MS/MS data derived from the same protein sample. Peptide sequences may be specified as either required or optional, depending on the quality of the peptide sequence or other considerations (see below). The user also has the option to select from the pre-defined or custom modifications shown in Figure 2B, which can be applied to the calculations.
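As an illustration of the kind of search this section describes, the following Python sketch converts an observed m/z value to a neutral target mass and then scans a protein sequence for contiguous fragments whose mass falls within a tolerance of that target and that contain every required peptide. It is a minimal sketch under stated assumptions and is not FragMint's implementation: the function names, the use of average residue masses (chosen here because the spectra were acquired in linear mode), the user-supplied tolerance, and the omission of modifications and optional peptides are all assumptions made for the example.

```python
# Illustrative sketch only; names and design are hypothetical, not FragMint's code.

# Average residue masses (Da) for the 20 standard amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153   # average mass of H2O (free N- and C-termini)
PROTON = 1.00728  # mass of a proton


def neutral_mass(mz, charge=1):
    """Convert an observed [M + zH]z+ m/z value to a neutral mass."""
    return charge * (mz - PROTON)


def candidate_fragments(sequence, target_mass, tolerance, required=()):
    """Return (start, end, mass) tuples, 1-based and inclusive, for every
    contiguous fragment whose unmodified average mass lies within
    target_mass +/- tolerance and that contains all required peptides.
    Non-standard residue letters raise KeyError (assumption: none present)."""
    hits = []
    n = len(sequence)
    for start in range(n):
        mass = WATER
        for end in range(start, n):
            mass += AVG_RESIDUE_MASS[sequence[end]]
            if mass > target_mass + tolerance:
                break  # extending further only makes the fragment heavier
            if mass >= target_mass - tolerance:
                fragment = sequence[start:end + 1]
                if all(peptide in fragment for peptide in required):
                    hits.append((start + 1, end + 1, round(mass, 2)))
    return hits


if __name__ == "__main__":
    # Toy sequence and values chosen for illustration, not data from the study.
    target = neutral_mass(204.207, charge=1)
    print(candidate_fragments("GAGAG", target, tolerance=0.5, required=("GAG",)))
```

Because each added residue increases the mass, the inner loop can stop as soon as a fragment exceeds the upper bound of the mass window, which keeps the scan fast even for long sequences. Fixed or variable modifications could be handled in such a sketch by adding their mass offsets to the affected residues before the comparison.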
FragMint retrieves the complete protein sequences for the specified database accession numbers. Starting at the N-terminal amino acid of each protein, FragMint scrolls through the sequence to identify fragments that fall within the target mass window. If protein modifications are specified (as in Figure 2B), these are incorporated as specified, depending on the occurrence of the possible amino acid targets for modification. The display output indicates the sequences and masses of the selected fragments and graphically represents the

Figure 4. Colloidal Blue stained gel of unfractionated human serum. Serum proteins were separated using SDS-PAGE on a pre-cast 10-20% Tricine gel prior to staining. On the basis of the migration of the molecular weight markers, four bands from serum corresponding to the approximate molecular weights of selected peaks in the MALDI-MS spectra were excised from the gel and subjected to in-gel digestion. Each gel band corresponds to an approximate mass range: Band 1 (