Isolation of N-Terminal Protein Sequence Tags from Cyanogen Bromide Cleaved Proteins as a Novel Approach to Investigate Hydrophobic Proteins

Karsten Kuhn,†,| Andrew Thompson,‡,| Thorsten Prinz,† Jörg Müller,† Christian Baumann,† Günter Schmidt,‡ Thomas Neumann,† and Christian Hamon*,†

Proteome Sciences,§ Coveham House, Downside Bridge Road, Cobham, Surrey, KT11 3EP, United Kingdom, and Xzillion GmbH, Industriepark Höchst, Building G865a, 65929 Frankfurt am Main, Germany

Received March 26, 2003

A novel method for the isolation of protein sequence tags to identify proteins in a complex mixture of hydrophobic proteins is described. The PST (Protein Sequence Tag) technology deals with the isolation and MS/MS based identification of one N-terminal peptide from each polypeptide fragment generated by cyanogen bromide cleavage of a mixture of proteins. PST sampling takes place after sub-cellular fractionation of a complex protein mixture to give enrichment of mitochondrial proteins. The method presented here combines effective sample preparation with a novel peptide isolation protocol involving chemical and enzymatic cleavage of proteins coupled to chemical labeling and selective capture procedures. The overall process has been very successful for the analysis of complex mixtures of hydrophobic proteins, particularly membrane proteins. This method substantially reduces the complexity of a protein digest by “sampling” the peptides present in the digest. The sampled digest is amenable to analysis by liquid chromatography tandem mass spectrometry (LC-MS/MS). Methods of “sampling” protein digests have great value1 if they can provide sufficient information to identify substantially all of the proteins in the sample while reducing the complexity of the sample to maximize the efficient usage of LC-MS/MS capacity. The validity of the process is demonstrated for mitochondrial samples from S. cerevisiae. The proteins identified by the PST technology are compared to the proteins identified by the conventional technology 2-D gel electrophoresis as a control. Keywords: PST technology • protein sequence tag • N-terminal fragment • hydrophobic proteins • mass spectrometry • yeast mitochondrion • proteomics.

* To whom correspondence should be addressed. Fax: 0049 69 30544302. E-mail: [email protected]. † Xzillion GmbH. ‡ Proteome Sciences. § Company contact: [email protected]. | Equally contributed.

Journal of Proteome Research 2003, 2, 598-609. Published on Web 07/26/2003. 10.1021/pr034026b CCC: $25.00 © 2003 American Chemical Society

Introduction

Conventional techniques for determining protein expression such as 2-D gel electrophoresis (2DE) do not adequately represent hydrophobic proteins. This is a major drawback of 2-D gel electrophoresis even when optimized for hydrophobic proteins,2,3 in particular, as the hydrophobic protein class includes membrane proteins which represent 50% of known drug targets.4 With the completion of the human genome project, membrane proteins are expected to provide many new targets for drug discovery projects.5 Mass spectrometric detection coupled to liquid chromatography analysis of peptide digests is emerging as a powerful gel-free alternative tool for the analysis of complex protein mixtures, which has the potential to overcome the limitations of 2-D gel electrophoresis with respect to hydrophobic proteins. Mann and co-workers have shown that in theory the mass of a single peptide along with partial sequence information, which can be determined through collision induced dissociation of the peptide, can be sufficient to identify the parent protein,6 as long as a peptide of an appropriate size that is well-represented in mass spectra can be isolated from the parent protein. Consequently, new methods are being developed in which peptides are analyzed to represent each protein in a mixture, which allows proteins to be identified by MS/MS of their peptide fragments.

Conceptually, the simplest approach to analyzing complex polypeptide mixtures is seen in the multidimensional protein identification technology (MudPIT) approach in which a mixture of polypeptides is digested with a protease and all of the digested peptides are analyzed by liquid chromatography tandem mass spectrometry (LC-MS/MS).7,8 MudPIT has also been used to analyze proteins from insoluble protein fractions by cleavage of the proteins in the insoluble pellets with cyanogen bromide (CNBr), illustrating the importance of appropriate sample preparation. The disadvantage of the MudPIT technique, however, is that the total peptide digest contains many more components than that of the starting mixture as a typical enzymatic digestion gives rise to many peptides per protein, for example tryptic digestion of the yeast proteome gives approximately 340 000 peptides from about 6300 proteins, an average of over 50 peptides per protein. The MudPIT approach reduces the problem of the complexity of the sample by attempting to separate all of these peptides with high-resolution 2-dimensional chromatography, but the very high degree of redundancy in this approach means that many more peptides are identified per protein than are strictly necessary for protein identification. Although redundancy increases the confidence in protein identification, an overly high degree of redundancy reduces the overall capacity of the system to detect proteins as the throughput of any LC-MS/MS based peptide identification process is somewhat limited due to the time required for each CID analysis. Thus, to maximize efficient use of MS/MS capacity, there is a need to reduce the complexity of the digest mixture so that only a few peptides are isolated for each protein.

“Sampling” techniques are emerging as useful methods to reconcile the need to analyze small populations of peptides for maximally efficient use of mass spectrometer instrumentation, while retaining sufficient information about the original sample to identify all of its components. The ICAT procedure9 uses “isotope encoded affinity tags”, a pair of isotopic biotin linkers, which are reactive to thiols, for the capture of peptides with cysteine in them. It was reported that on average 5 cysteine containing peptides are available per protein in yeast and that 92% of proteins have at least one cysteine residue (of 6113 proteins in the database that was analyzed), although not all of these peptides will be useful for analysis, for example peptides of 3 amino acids or less will provide little information. A related procedure, that isolates peptides via cysteine by “covalent chromatography” and quantifies the peptides by tagging of lysine10 is also being developed.
Assuming that cysteine distributions are consistent across species, these processes should theoretically provide good coverage of a typical proteome while achieving a 10-fold reduction in complexity compared to the raw digest, but it is less clear that this sort of approach is optimal for the isolation of hydrophobic proteins. The ICAT approach, in particular, stresses the importance of labeling cysteine with the biotinylated isotopic labels as early as possible in the sample handling process. For hydrophobic fractions, this may be difficult to achieve prior to digestion of proteins, which is the preferred approach for ICAT, because many hydrophobic proteins may not be accessible to the reagents after extraction from the raw sample as they form insoluble precipitates. However, ICAT has been used to analyze microsomal fractions from human myeloid leukaemia (HL-60) cells.11

As discussed above, these peptide analysis techniques do permit analysis of proteins that are not amenable to analysis by conventional approaches such as 2-D gel electrophoresis, particularly hydrophobic proteins. At a practical level, however, there are a number of challenges to overcome to allow hydrophobic proteins to be readily analyzed using these peptide sampling procedures. Other techniques such as 1D SDS-PAGE followed by MALDI-MS or LC-MS and CE-LCMS are also being developed to facilitate the analysis of difficult samples that are poorly soluble in aqueous solvents.12-15

The work presented here demonstrates the need for considerable process optimization to achieve the goal of isolating hydrophobic proteins and rendering them in a form that is accessible to further analysis. This paper provides a novel process that combines effective sample preparation techniques with a chemical method for sampling a subset of peptides from each protein in a complex mixture, which greatly increases the peptide identification rate from hydrophobic proteins. The effectiveness of this approach is demonstrated by analysis of mitochondrial proteins isolated from S. cerevisiae, as this is an established model biological system whose proteome has been well-characterized.
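The complexity figures quoted in the Introduction can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text (approximately 340 000 tryptic peptides, about 6300 yeast proteins, and an average of 5 cysteine-containing peptides per protein); the variable names are illustrative and not taken from the original work.

```python
# Numbers quoted in the Introduction (approximate values from the text).
tryptic_peptides = 340_000        # tryptic peptides from the yeast proteome
yeast_proteins = 6_300            # approximate number of yeast proteins
cys_peptides_per_protein = 5      # average Cys-containing peptides per protein

# Average number of tryptic peptides per protein ("over 50" in the text).
peptides_per_protein = tryptic_peptides / yeast_proteins

# Approximate complexity reduction when only Cys-containing peptides are kept
# (the text describes this as roughly 10-fold).
fold_reduction = peptides_per_protein / cys_peptides_per_protein

print(f"{peptides_per_protein:.0f} peptides per protein")
print(f"{fold_reduction:.1f}-fold reduction")
```

Under these assumptions the averages come out at about 54 peptides per protein and a reduction of a little over 10-fold, in line with the figures given in the text.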

Experimental Protocols

Biological Sample Preparation. Isolation of mitochondria from yeast: the S. cerevisiae W303A yeast strain16 (Genotype: MATa trp1-1 his3-11,15 can1-100 ura3-1 leu2-3,112 ade2-1) was grown in YPD (yeast peptone dextrose) medium at 30 °C. Mitochondria were isolated by differential centrifugation essentially as described17,18 with the following changes: bovine serum albumin was left out of the homogenization buffer, and mitochondria were finally taken up in a buffer containing 20 mM Na2HPO4/NaH2PO4 (pH 7.2) and 250 mM saccharose.

PST Procedure. (a) Sample Preparation. A 500-µL portion of the isolated protein suspension was diluted with 4.5 mL of formic acid. After vigorous shaking, cyanogen bromide (CNBr; 5 M in acetonitrile) was added to the solution to achieve a final concentration of 1 mg CNBr per mL. This mixture was shaken for 24 h in the dark and then was diluted to 40 mL with distilled water, frozen, and lyophilized. The remaining residue was dissolved in 1 mL of a denaturing buffer (4 M urea, 2 M thiourea, 4 M guanidine hydrochloride). The solution was purified by size exclusion centrifugation (Centricon devices with a molecular weight cutoff (MWCO) of 5000, Millipore), whereby the highly denaturing buffer was used to recover the polypeptide fragments from the Centricon device. The protein concentration in the recovered solution (1 mL, 2 mg protein per mL) was determined by micro Bradford assay (BioRad).

(b) Labeling. Disulfide bridges were reduced and free thiols were blocked as follows: 1 mg in 500 µL of the recovered polypeptide solution was diluted with 250 µL borate buffer (200 mM, pH 7.2), to which 250 µL water and 0.28 mg tris(2-carboxyethyl)phosphine (TCEP; Pierce) were added. After shaking for 1 h, 1.3 mg iodoacetamide (Sigma) was added, and the reaction mixture was shaken for an additional 2 h.
After blocking thiols, free amino groups were blocked as follows: 10 mg of the basic mass tag N,N-dimethylglycine (DMG) N-hydroxysuccinimide ester, dissolved in 333 µL dimethylformamide (DMF), was added to the polypeptide mixture and incubated at room temperature. After 3 h, a further 10-mg portion of the DMG reagent was dissolved in 35 µL DMF, added to the protein mixture, and incubated overnight.

(c) Digest. To remove all of the excess labeling reagents, the guanidine hydrochloride, and the DMF, the polypeptide mixture was purified by size exclusion centrifugation using Centricon devices with an MWCO of 5000. Borate buffer (50 mM, pH 7.5) containing urea (66 mM) and thiourea (33 mM) was used for repeated dilution and recovery (700 µL of solution was recovered, 50% recovery estimated). A 30-µg portion of trypsin (seq. grade modified trypsin, Promega) was added, the pH was adjusted to 7.8, and the solution was incubated at 37 °C for 24 h.

(d) Sorting. The scavenger beads were prepared directly before use as follows: 2 × 200 mg of Polystyrene AM COOH resin (Rapp Polymere) was placed in two 5 mL syringes equipped with a filter membrane, and the resin was swollen and washed several times with DMF. A 3-mL portion per syringe of a solution of 175 mg N-hydroxysuccinimide in DMF was added to the freshly swollen beads, followed by 233 µL per syringe of N,N′-diisopropylcarbodiimide. The mixture was incubated for 3 h at room temperature with occasional stirring. The beads were washed vigorously several times with DMF and dichloromethane and dried for 1 h in vacuo. Then, the first 200-mg portion of the activated beads was re-swollen in DMF and suspended in 2400 µL DMF. The digest peptide mixture (approximately 500 µg digested peptides) was adjusted to pH 7.2, and a 600-µL portion was added to the beads suspended in DMF. The mixture was incubated overnight at room temperature with vigorous shaking. After 18 h, the filtered peptide-containing solution was transferred to the second 200-mg portion of activated beads (re-swollen in DMF before use). After 6 additional hours of incubation, the solution was filtered off, and the solvent was evaporated. The remaining residue was dissolved in 1500 µL of water:acetonitrile 95:5 + 0.1% trifluoroacetic acid.

(e) Sample Preparation for LC-MS Analyses. The recovered and diluted solution was loaded onto a strong cation exchange (SCX) column (self-packing cartridge filled with S-sepharose “fast flow”, Fluka). After washing the column with water:acetonitrile 95:5 + 0.1% trifluoroacetic acid, the peptides were eluted with a salt buffer (ammonium acetate in water:acetonitrile 95:5, pH 2.0) using a step gradient (50-250 mM salt) to obtain 6 fractions. After lyophilization, each residue was redissolved in 100 µL water:acetonitrile 95:5 and 0.1% trifluoroacetic acid.

LC-MS Analysis and SEQUEST Search. (a) LC-MS Runs. For most analyses, a Surveyor HPLC System (ThermoFinnigan) and an LCQ Deca (ThermoFinnigan) were used to perform the LC-MS analyses. A 20-µL sample (∼1/5 portion of each SCX fraction) was loaded per run onto a 1 mm-inner diameter × 10 mm trapping column at a flow rate of 100 µL/min of solvent B (water, 0.2% formic acid).
Then, by switching the valve, the sample was loaded onto a reverse phase C18 column (ThermoHypersilKeystone BioBasic C18, 100 × 1 mm, 5 µm particle) and a binary solvent gradient (solvent A: methanol, 0.2% formic acid; solvent B: water, 0.2% formic acid) was applied to elute the peptides from the column. The overall time of the LC method was 90 min, including the conditioning step, elution gradient, and wash step. The solvent delivery system was run at a constant flow of 100 µL/min. MS/MS analysis for peptide identification was performed in dynamic exclusion mode. Some additional MS and MS/MS analyses were performed on a QTOF2 mass spectrometer (Micromass, Manchester, UK). HPLC analysis was performed with a CAP-LC HPLC system (Waters Corporation, Milford, MA) (column: PepMap C18 HPLC column from Dionex with a 75 µm inner diameter and a length of 150 mm; solvents: 95% water to 95% acetonitrile, both with 0.2% formic acid).

(b) SEQUEST Analyses. The raw data of each LC-MS/MS run were subjected to two different sets of SEQUEST analysis parameters: 1st Run: Cys +57 (static), Lys and His +85 (differential), Nter peptide +85. 2nd Run: Cys +57 (static), Lys and His +85 (differential), Nter protein +42. Cleavage sites for the in silico digest were defined after M, K, and R. The allowed mass deviation was set to 3.0 Da (MS) and 0.5 Da (MS/MS). The yeast subset of the nonredundant NCBI database was used for all analyses. Only triply charged peptides with Xcorr > 3.0, doubly charged peptides with Xcorr > 1.9, and singly charged peptides with Xcorr > 1.5, all having a dCn of ≥ 0.08, were subjected to detailed analysis: the suggested peptide sequences had to match the labeling criteria (alkylation of Cys, acylation of Lys and of N termini generated by CNBr cleavage, no acylation of His) and the digest criteria (cleavage after Met or Arg at the N-terminal end, cleavage after Arg at the C-terminal side). If a protein was represented by only one peptide, the MS/MS spectrum was manually inspected for agreement of the major product ions with the theoretically predicted product ions of the database-matched peptide.

Membrane 2-D Gel Electrophoresis and Peptide Mass Fingerprinting. (a) Membrane 2-D Gels. Total mitochondria from yeast cells were solubilized in a rehydration solution (6 M urea, 2 M thiourea, 4% (w/v) CHAPS, 75 mM DTT, 0.5 mM EDTA, 5 mM pefabloc, 0.01% (v/v) orange G, 1% (v/v) ampholines pH 3.5-9.5 (Amersham Biosciences), and 1% (v/v) pharmalytes pH 3-10 (Amersham Biosciences)) reaching a total amount of 500 µg. This whole mixture was applied to immobilized pH gradient (IPG) gels (18 cm, 3.3% polyacrylamide, pHs 4-9L (Xzillion)) during the rehydration procedure for 19 h at 50 V and 17 °C. Iso-electric focusing (IEF) of the first dimension was then performed for 79 000 Vh at 17 °C. The equipment for the rehydration and for the running of the IPG gels (IPGPhor and Multiphor II) was purchased from Amersham Biosciences. For the second dimension, 12% SDS-PAGE gels (19 × 23 cm) were run in a DALT-1 electrophoresis apparatus (Amersham Biosciences) at 10 °C with 764 mAh/gel. The gels were stained with Coomassie brilliant blue.19

(b) Spot Isolation and Protein Digestion. The whole procedure of spot picking, washing, destaining, digesting, extraction, and MALDI preparation was carried out automatically by an Ettan spot handling platform (Amersham Biosciences).

(c) MALDI-TOF Mass Spectrometry for Peptide Mass Fingerprinting.
A MALDI-TOF mass spectrometer with delayed extraction (Voyager STR, Applied Biosystems) was used for the automatic generation of spectra from protein digests.20 The matrix solution was 10 g/L alpha-cyano-4-hydroxycinnamic acid in 49.9% water, 49.9% acetonitrile, and 0.2% trifluoroacetic acid (by volume). For MALDI target preparation, 0.5 µL of analyte solution was allowed to dry, followed by addition of 0.5 µL of the matrix solution. After a two-point internal calibration, the mass accuracy was in most cases better than 10 ppm (1/2 peak width/peak height), so the mass tolerance for the database search (MS Fit; Applied Biosystems) was set to 15 ppm. On average, our protein identifications by PMF were based on a protein coverage of 30% and 51% matched peptides.

Protein Sequence Analysis. (a) Characterization of Proteins Concerning Membrane Relationships. A protein was defined as a “membrane protein” if it met one of three criteria: (1) the feature “transmem” or the keyword “transmembrane” was present in the entry for that protein in any of the SWISSPROT, TrEMBL,21 or PIR databases;22 (2) the protein has been reported in the primary literature as being an integral or membrane-spanning protein; (3) the protein was predicted to be a membrane protein using the TransMembrane Hidden Markov Model (TMHMM) described by Krogh and Larsson, which is reported to be able to discriminate between soluble and membrane proteins with both a specificity and a sensitivity of better than 99%.23 TMHMM is accessible via www.cbs.dtu.dk/services/TMHMM-2.0/. Similarly, a protein was defined as a “membrane associated protein” if it met one of the following criteria: (1) the feature “lipid” or the keywords “membrane”, “prenylation”, “lipoprotein”, “myristate”, “palmitate”, or “GPI anchor” were present in the entry for that protein in any of the SWISSPROT, TrEMBL, or PIR databases; (2) the protein has been reported to be a membrane associated protein in the primary literature. Proteins were otherwise assumed to be soluble unless no database entries or highly homologous annotated proteins, identified by BLAST searches,24 were available for them, in which case the proteins were marked as unknown.

(b) Characterization of Proteins Concerning Subcellular Localization. The subcellular localization information was extracted from database annotations in SWISSPROT, TrEMBL, PIR, Refseq,25 SGD, or Yeast Protein Localization Server.26 Where no annotation was found in any of these databases, the protein was considered as unknown.

(c) Protein Statistics. Molecular weight and pI values were calculated by the pepstats program from the EMBOSS package.27 CAI values were calculated by CodonW (http://www.molbiol.ox.ac.uk/cu/codonW.html).
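The score thresholds given under "SEQUEST Analyses" amount to a simple charge-dependent filter. The following is a minimal sketch of that filter only, assuming each peptide-spectrum match is available as a charge state, an Xcorr value, and a dCn value; the function and constant names are hypothetical and are not part of SEQUEST or of the authors' software. The subsequent checks described in the text (labeling criteria, digest criteria, manual inspection) are not modeled.

```python
# Minimum Xcorr per precursor charge state, as stated in the text
# (the comparison is strict: Xcorr must exceed the value).
XCORR_MIN = {1: 1.5, 2: 1.9, 3: 3.0}

# Minimum dCn for all charge states (dCn >= 0.08 in the text).
DCN_MIN = 0.08


def passes_score_filter(charge: int, xcorr: float, dcn: float) -> bool:
    """Return True if a match meets the stated Xcorr and dCn thresholds.

    Charge states other than 1, 2, or 3 are rejected, because the text
    gives no threshold for them (an assumption of this sketch).
    """
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False
    return xcorr > threshold and dcn >= DCN_MIN


# Hypothetical matches: (charge, Xcorr, dCn).
matches = [(2, 2.4, 0.12), (3, 2.9, 0.20), (1, 1.6, 0.05)]
accepted = [m for m in matches if passes_score_filter(*m)]
print(accepted)  # only the first match passes
```

Keeping the thresholds in a dictionary keyed by charge state makes it straightforward to swap in the stricter values used later for assessing the labeling reaction.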

Results and Discussion

Outline of the PST Process Used to Identify Hydrophobic Proteins. This paper combines efficient membrane protein sample preparation with a novel peptide sampling protocol to effect high-efficiency isolation of hydrophobic proteins. The rationale for this process came about as a result of work in our laboratories on N-terminal peptide isolation to develop the PST technology. Earlier work (not shown) demonstrated that it is possible to isolate peptides from the N-termini of proteins in a complex protein mixture as a method of sampling the proteins present to generate an expression profile. Our initial attempts to apply this procedure to membrane proteins gave poor results because of the intrinsic insolubility of the pellets of centrifuged membrane fractions. The proteins in the pellets were inaccessible to the reagents needed for the sampling process. As a result, it became necessary to pre-solubilize the membrane protein fractions. Pre-solubilization with cyanogen bromide proved to be most effective.28,29 Application of the N-terminal peptide isolation procedure to the CNBr cleavage polypeptides results in the isolation of the N-terminal fragment of each CNBr fragment up to the first arginine residue. Consequently, a few protein sequence tags per parent protein can be isolated.

A flowchart of the protein sequence tag (PST) isolation procedure is presented in Figure 1. The combined sample preparation and peptide sampling process is illustrated. A schematic of the chemical protocol for isolation of the Peptide Sequence Tags is shown in Figure 2. In this simplified view of the application of the PST process to a single polypeptide, only the reactive functionalities of the relevant amino acids in the polypeptide are shown. The first step in the PST isolation process involves chemical cleavage of the protein mixture with cyanogen bromide (CNBr).
The combination of organic solvents and chemical cleavage used in this step facilitates solubilization of the proteins in a pellet of proteins and lipids obtained from a typical membrane fractionation process. This step is followed by reduction of disulfides and capping of free thiols with iodoacetamide. Thiol-capping is followed by blocking free primary amino groups in the CNBr cleavage polypeptides with an amine-reactive Basic Mass Tag (BMT), activated via an active ester group, that will protect the blocked amino group from further reaction with the “scavenger beads” used in a later step. BMT labeling has been optimized for completeness and selectivity of the reaction of the tag with amino groups in proteins (unpublished results). To assess the labeling reaction, we set the parameters of the SEQUEST search to consider modifications of each peptide at Lys, at His, and at the N terminus as differential and to consider the Lys residue as a cleavage site. Each peptide sequence with a good SEQUEST score (Xcorr > 4.5 for triply charged peptides and Xcorr > 3.5 for doubly charged peptides, with a dCn of ≥ 0.1 for all) was then manually inspected, and full labeling of amino groups (Lys and N terminus) was observed. No labeling of histidine residues was detected. Moreover, it is well-known30 that no labeling of arginine residues occurs under the reaction conditions used in the PST procedure. Modification of lysine with the BMT group has a further advantage in that it allows the modified lysine to be distinguished from the isobaric asparagine residue in CID-based sequencing. It is also known that introducing an easily protonated group at the alpha position in a peptide can enhance the formation of b-ions.31 This should help to improve the quality of the MS/MS spectra obtained from all of the internal PST peptides that will be blocked at the alpha position. In summary, the BMT replaces amino groups with a functionality that can still protonate but is no longer nucleophilic.

Figure 1. Flowchart to illustrate the PST process.

After the amine capping reaction has taken place, the CNBr digest is further digested with trypsin. Capping of the epsilon amino groups with BMTs renders lysine inaccessible to trypsin, and so cleavage will take place only at arginine residues. The cleavage by trypsin generates new alpha amino groups in all of the non-N-terminal cleavage peptides. These can be selectively reacted with a capture reagent, which, in the final step of the schematic, is an amine-reactive scavenger resin. The scavenger resin will react with any peptides with a free amino group. The end result of the process is a pool of peptides that represents the N-terminal fragments from each CNBr cleavage peptide. Capture of the non-N-terminal peptides leaves the N-terminal peptides free in solution. The N-terminal fragments of the CNBr peptides can then be processed in the usual ways for analysis by mass spectrometry. In particular, liquid chromatography tandem mass spectrometry is ideally suited to the analysis of complex mixtures of peptides, as the peptides can be identified by the sequence information that can be obtained by collision induced dissociation of the peptides.

Figure 2. Schematic of the chemical protocol used for PST isolation, shown for a single protein. (BMT - Basic Mass Tag; IA - Iodoacetamide; hS - homoSerine).

The Basic Mass Tags used in the capping procedure retain a functionality that will protonate readily. This means that the BMT-labeled peptide can be purified by strong cation-exchange chromatography prior to further analysis of the peptide mixture by mass spectrometry. In addition, retention of the ability to protonate has obvious advantages for mass spectrometric ionization techniques that depend on protonation such as electrospray ionization (ESI) or matrix-assisted laser desorption ionization (MALDI).
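The cleavage logic described above (CNBr cleavage after methionine, followed by tryptic cleavage only after arginine because lysine is capped, with only the N-terminal piece of each CNBr fragment retained) can be illustrated with a short in silico sketch. This is a simplified illustration, not the authors' software: it assumes complete cleavage at every Met and Arg, keeps the letter M in the fragment although the residue is converted to homoserine, and ignores any sequence-dependent suppression of cleavage. The function names and the example sequence are hypothetical.

```python
def cleave_after(sequence: str, residues: str) -> list[str]:
    """Split a sequence C-terminal to every occurrence of the given residues."""
    fragments = []
    start = 0
    for index, amino_acid in enumerate(sequence):
        if amino_acid in residues:
            fragments.append(sequence[start:index + 1])
            start = index + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # trailing piece with no cleavage site
    return fragments


def sequence_tags(protein: str, min_length: int = 4) -> list[str]:
    """Return the N-terminal Arg-terminated peptide of each CNBr fragment.

    min_length=4 mirrors the text's restriction to peptides longer than
    three amino acids.
    """
    tags = []
    for cnbr_fragment in cleave_after(protein, "M"):
        # Lys is blocked, so trypsin is modeled as cleaving after Arg only;
        # the first piece carries the capped N terminus and escapes capture.
        n_terminal_peptide = cleave_after(cnbr_fragment, "R")[0]
        if len(n_terminal_peptide) >= min_length:
            tags.append(n_terminal_peptide)
    return tags


# Made-up sequence for illustration only.
print(sequence_tags("MKTAYRAGMLLKRSTMAR"))  # ['KTAYR', 'LLKR']
```

In this example the four CNBr fragments are M, KTAYRAGM, LLKRSTM, and AR; two of them yield tags longer than three residues, and the internal peptides AGM and STM correspond to those that would be captured by the scavenger resin.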

Journal of Proteome Research • Vol. 2, No. 6, 2003

In silico analysis of the yeast proteome, using the 6355 predicted open reading frames in a release of the Saccharomyces Genome Database from January 2003, revealed that there are on average 7 occurrences of methionine per protein. Further analysis of the predicted CNBr peptides that would be obtained after cleavage of BMT-capped proteins with trypsin shows that nearly 99% of proteins have at least one potentially available peptide and that an average of 8 peptides with a length of greater than three amino acids will be obtained from each of these proteins. This average represents a useful but modest degree of redundancy in the sampling process. This is an advantageous feature of the PST process, as it means that more than one peptide can be identified per protein, increasing confidence in protein identifications. Moreover, it is well-known that not all the peptides obtained after an enzymatic digestion of a protein sample are compatible with identification in MS and MS/MS analyses. Thus, the redundancy also increases the chance that a peptide with good mass spectrometric properties will be obtained from each protein. However, the figure of 8 peptides per protein represents a theoretical upper limit on the numbers of peptides that are likely to be obtained by the PST process. There are various reasons why this number will not be obtained in practice, such as incomplete cleavage at methionine residues by CNBr or the fact that some peptides that are released are uninformative, as they share the same sequence with peptides from other proteins. For this reason, the moderate redundancy is a very useful feature of the PST process, as it ensures that at least one peptide can be obtained from most proteins.

Figure 3. Coomassie-stained 2-D gel loaded with the mitochondrial fraction from Saccharomyces cerevisiae. Protein spots that were selected for PMF analysis are indicated by dots.

Comparison of PST Protocol with 2-D Gel Electrophoresis for Mitochondrial Protein Identification. To demonstrate the validity of the PST approach to study hydrophobic proteins, it was decided to test the technology on a well-studied biological model system, namely yeast mitochondria. The yeast mitochondrion represents a useful model as it comprises a well-defined population of about 770 proteins, of which 18% (139 proteins) are membrane proteins.26 Additionally, mitochondria provide an ideal model system for studying membrane proteins because they contain both α-helical and β-barrel proteins in their inner and outer membrane, respectively. Mitochondrial protein fractions were subjected to the PST protocol described in the “Experimental Section”, and the resulting peptides were analyzed by reverse-phase liquid chromatography tandem mass spectrometry (RPLC-MS/MS) on an ion trap instrument.
The resultant MS/MS data were analyzed with the SEQUEST algorithm.32 2-D gel electrophoresis, which had been optimized for membrane proteins, was applied to the same samples as a control. One PST analysis, which comprises six RP-HPLC runs on six fractions obtained by SCX chromatography, identified 147 gene products, whereas 2DE (depicted in Figure 3) identified 112 different gene products among 412 MS-identified proteins coming from 501 gel spots (a complete list of the proteins identified by the PST procedure and by 2DE is shown in Tables 1 and 2). Comparison of identification results obtained from the 2 different techniques revealed that 58 of the proteins were found by both methods.

The results of a survey of the cellular localization of the proteins identified by each procedure are shown in Figure 4. It can be seen that the two processes identified proteins with similar distributions of localization. One notable difference, however, was the identification of a number of plasma membrane proteins by the PST process that were not seen on the 2-D gels. In the case of the PST, about 50% of proteins identified were mitochondrial proteins, while for 2DE about 60% of the proteins identified were mitochondrial proteins, indicating that the enrichment procedures undertaken prior to analysis had been successful.

The bar chart in Figure 5 compares the numbers of hydrophobic (membrane and membrane associated) proteins and soluble proteins obtained by each analytical method. On the basis of the criteria described in the Experimental Section, which define proteins as “membrane protein” and “membrane associated protein”, it can be clearly seen that nearly 4 times more known membrane proteins are identified by the PST approach when compared to 2DE that had been optimized for membrane proteins. Indeed, 50 membrane proteins were identified by PST, whereas only 13 proteins could be identified by 2DE. Looking at the overlap of the identified membrane proteins, 9 of 13 hydrophobic proteins identified by 2DE were found by the PST technology as well. Furthermore, the number of α-helical transmembrane domains in the membrane proteins identified by the PST approach ranges from 1 to 12, of which 42% have more than 5 predicted transmembrane domains, whereas 2DE found hydrophobic membrane proteins with only 1-2 α-helical transmembrane domains.

Comparison of the Range of Proteins Accessible to PST and 2DE. Figure 6 illustrates the functional range of proteins accessible to the PST technology.
It can be clearly seen that there is no particular bias in the proteins isolated by this procedure when compared to the corresponding 2DE analysis. This gives us confidence that the PST procedure is a global sampling process. Unlike in 2DE, hydrophobic proteins are clearly well represented in the PST procedure (Figure 5), and thus many carrier and transporter proteins were identified. The data in Table 1 also reveal that proteins with molecular weights ranging from 6.0 to 214.9 kDa were identified by the PST process, in contrast to 2DE, for which proteins ranging from 15.9 to 127.1 kDa were obtained. In addition, proteins with isoelectric points ranging from 3.7 to 12.0 were obtained by the PST process, whereas a range of only 4.1 to 9.8 was obtained by the membrane protein optimized 2DE. These results show that the PST process can allow for isolation of proteins outside the typical range for 2DE. Besides covering the normal range of proteins identified by 2DE, the PST technology represents a valuable and straightforward method to identify proteins with very high pI and with very low or very high molecular weights. Moreover, the data in Table 1 show that 59 proteins identified by PST (34% of the total identified proteins) and 35 proteins identified by 2DE (31%) have a codon adaptation index (CAI) of