Fine Tuning of Proteomic Technologies to Improve Biological Findings

Nov 5, 2013 - Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth. Road, Ott...
Review pubs.acs.org/ac

Fine Tuning of Proteomic Technologies to Improve Biological Findings: Advancements in 2011−2013

Janice Mayne, Amanda E. Starr, Zhibin Ning, Rui Chen, Cheng-Kang Chiang, and Daniel Figeys*

Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Road, Ottawa, ON, Canada K1H8M5



CONTENTS

Sample Preparation
Protein Extraction
Secreted Proteins
Exosomes
Membrane Proteins
Protein Stabilization
Miniaturizing and Automating Sample Preparation
Protein Digestion
Multiple Enzymes
Enzyme Immobilization
Decreasing Sample Complexity
Serum/Plasma Strategies to Improve Dynamic Range
Depletion Strategies
Enrichment Strategies
Enriching Post-Translationally Modified Proteomes
Phosphopeptide Enrichment
Glycopeptide Enrichment
Enrichment of Other PTMs
Methods to Improve Coverage
Filter-Aided Sample Preparation (FASP)
SDS Spin Columns
Detergent Clean-up Methods for MS-Deleterious Agents
Integrated Approaches
Monolithic Columns
Quantitative Proteomics
Gel Staining
Label-Free Quantitation
Metabolic Labeling/SILAC
Chemical Labeling
Targeted Quantitative Proteomics
Selected Reaction Monitoring
Characterizing Post-Translationally Modified Proteomes by Mass Spectrometry
Protein Interactions
Technologies
Methods
Applications
Bioinformatics
Proteomic Analyses/Bioinformatics
Databases
Search Engines
Software
Label-Free Software
Extended Analysis
From Proteomic to Biological Applications
Profiling Disease States
From Genes to Proteins
Biomarker Development
Identification of Novel Biomarkers
Applying Proteomics to Known Biomarkers
Conclusions
Author Information
Corresponding Author
Notes
Biographies
Acknowledgments
References

We have witnessed tremendous growth in proteomics. In the early days of proteomics, we were in awe when individual proteins were identified using mass spectrometry data. Now, thousands of proteins can be routinely identified and quantified using the latest proteomic technologies. These achievements are linked to improvements in analytical technologies whose applications have led to advancement in the understanding of biological processes. The mapping of protein−protein interactions in human1 and other species, as well as efforts to systematically map post-translational modifications [PTMs: PhosphoSitePlus reports over 290000 experimental PTMs2], have changed the way we approach biological questions. Moreover, the applications of technologies to study the temporal dynamics of these interactions and PTMs using mass spectrometry are invaluable resources to the biological community. At the same time, lessons have been learned in proteomics from some of its growing pains. The first lesson was that new technical developments, although exciting, might not be sufficient on their own for discovery from complex samples such as serum and plasma. For example, one can remember the hype around surface-enhanced laser desorption−ionization (SELDI) mass spectrometry,3 which from the early to mid 2000s was going to solve the biomarker conundrum; however, because of issues of reproducibility and sample preparation, it has been mired in controversy. At its peak in 2006, SELDI technology was utilized in 24% of biomarker mass spectrometry papers, whereas in 2012 it was represented in only 6%. It was followed by a series of other technologies that have also


Special Issue: Fundamental and Applied Reviews in Analytical Chemistry 2014


dx.doi.org/10.1021/ac403551f | Anal. Chem. XXXX, XXX, XXX−XXX

Analytical Chemistry

Review

underperformed for the discovery of true biomarkers. The mismatch between proteomic technologies and biomarker discovery has been a major disappointment. The second lesson was that technical difficulties are not a reason to take biological shortcuts, especially if biological conclusions are to be derived from the experiment. Technical difficulties in proteomics have led to numerous papers based solely on one biological sample, with no or insufficient controls. These articles often inferred biological conclusions that were not supported. Fortunately, recent technology improvements allow extensive and detailed proteomics studies within a few days to a few weeks. In addition, researchers are much more aware of sample complexity and the need to further improve technologies. In the past few years, we have seen a decrease in the number of journals that accept proteomic papers based on a single biological sample (n = 1). Likewise, the number of proteomic papers that include orthogonal validation methods, such as Western blotting and microscopy, has increased. Clearly, continuing technological developments remain essential for proteomics. Excitingly, the number and quality of applications have increased drastically in the past few years. Here, we will review some of the technologies and applications in the field of proteomics since our last review in 2011.

Figure 1. Schematic of the availability of proteins for detection by mass spectrometry based upon sample preparation and extraction efficiency; highly available proteins are represented in green, less available proteins in red. Membrane proteins are difficult to isolate due in part to the high degree of hydrophobicity (in red), whereas cytoplasmic, nuclear, and secreted proteins (in green) are easily isolated for mass spectrometry analysis. Exosomes are represented in lighter red as they have so far been under-represented in total secretome analyses. DNA-bound proteins are also more difficult to analyze than soluble nuclear protein fractions.



diagnostics, biomarker discovery, and the development of therapeutic interventions. As serum/plasma proteins have been extensively utilized in biomarker studies and are important proteomes to define clinically, strategies to decrease their complexity through depletion and enrichment are discussed below in more detail. As an alternative to liquid blood products, Martin et al. recently described an approach to extract and analyze proteins from dried blood spots (DBS) for mass spectrometry, identifying >100 proteins that ranged in concentration from μmol/L to nmol/L.5 Although DBS are not as complete as whole blood and serum/plasma, they have potential as a screening material for biomarkers, with the advantage that they are easier to store and transport than whole blood. Saliva, also a source for proteomic and biomarker research, has 40−50% of its total protein content as low molecular weight proteins/peptides. Vitorino et al. cautioned that preclearing saliva by centrifugation is ill-advised as protein components are lost and outlined efficient extraction protocols for salivary protein/peptide analyses.6,7 As well, they showed that addition of urea to the sample upon collection followed by sonication increased protein and peptide identification.7

Exosomes. Biofluids contain cell-secreted proteins in membrane-bound vesicles, termed exosomes. These small vesicles (50−100 nm in diameter), so far under-represented in body fluid proteome studies, originate from multivesicular bodies/endosomes and have garnered increasing attention in secretome proteomic research as reservoirs of low-abundant and membrane-bound proteins important for cell-to-cell communication. For instance, urinary exosomes derived from epithelial cells facing the urinary tract are particularly enriched in renal proteins to the exclusion of plasmatic proteins.8 Typically, exosomes are isolated from the bulk secretome by ultracentrifugation at 100000g for 1 h, following a 1500g spin to remove cellular debris and a 16000g spin to remove large membrane fragments. However, the commercially available reagent ExoQuick has been shown to precipitate greater quantities of exosomes from ascites,9 human plasma/serum,9 and urine.10 For ascites fluid, Taylor et al. showed that total protein yields were more than 20% higher in ExoQuick preparations versus exosome isolation by ultracentrifugation, size-exclusion chromatography, or magnetic beads.9 As well, the yield of a

SAMPLE PREPARATION

Our genes contain the basic information to encode up to 30 × 10³ proteins, while the human proteome is estimated to contain from 25 to 100 × 10⁴ proteins.4 Complexity is added first by the RNA editing machinery and then by co- and post-translational modifications (PTMs) such as glycosylation, sulfation, phosphorylation, ubiquitination, acetylation, and regulated proteolysis. The proteome complexity grows and expands with temporal variations in protein turnover, protein modification, and protein−protein interaction. It varies between cell types and spatially with changes in protein subcellular localization. These dynamic processes maintain cellular homeostasis (health) and are altered by disease and its progression. Mass spectrometry data can be used to define cellular pathways/proteomes under given environmental stimuli and in health and disease, and this information can be applied to design diagnostics and accelerate therapeutic strategies for a disease state. To accomplish this, improved strategies are required that can confidently profile and quantify not only the most abundant but also lower abundant proteins within a given proteome. In biological samples such as serum and plasma, this range can be a billion-fold. While more powerful mass spectrometers are one key to improved proteome coverage, equally necessary are improvements to technologies and strategies for sample preparation, including extraction and enrichment strategies. These strategies must be reproducible and facilitate increased throughput if we are to generate statistically powered, quantitative proteomic data sets of dynamic physiological and pathophysiological processes. In the following sections, we review advances and additions to sample extraction and preparation, including enrichment strategies that not only improve proteome coverage (Figure 1) and dynamic range but also improve our understanding of the dynamic nature of PTMs in cells and tissues.

Protein Extraction. Secreted Proteins. Secreted proteins are important mediators of intercellular communication that account for up to 10% of the proteins encoded by the human genome. Body fluids, including serum/plasma, urine, saliva, and cerebrospinal fluid (CSF), provide accessible source material for disease


alkylation, and digestion in one single capillary column with a strong cation exchange (SCX) monolith matrix. Hua et al. described using a mixed cationic and hydrophobic butyl methacrylate-based monolithic porous polymer photopolymerized in 6- or 8-channel microfluidic sample pretreatment devices for proteomic analyses.16 They multiplexed in-space sample fractionation, collection, preconcentration, and peptide elution via sheath-flow-assisted electrokinetic pumping.

Protein Digestion. Multiple Enzymes. In 2012, Wisniewski et al. showed that the versatility and sequence coverage of the filter-aided sample preparation protocol (FASP; discussed below) could be improved by implementing sequential multiple enzymes for sample digestion (MED).17 They took advantage of the fact that a portion of the sample remains on the filter membrane after a single enzyme digestion and elution, thus allowing sequential digestion of the remaining sample with two or three enzymes. With the use of LysC and trypsin on low microgram levels of protein, their MED-FASP procedure identified 40% more proteins and phosphorylation sites than trypsin digestion alone with FASP.17

Enzyme Immobilization. Enzyme-immobilized reactors allow for rapid digestion of proteomic samples (minutes vs hours for in-solution and in-gel digestion) while minimizing autodigestion products and increasing sequence coverage, and they can be adapted to handle small volumes in microchannels.18 A good overview of the advantages of enzyme-immobilized sample digestion coupled with microfluidic chip technology for high-throughput proteomics is provided by Yamaguchi et al.19 They also note that enzyme-immobilized reactors can be used in tandem with, for example, phosphatase reactors to determine PTMs as well.
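To illustrate why combining enzymes can raise sequence coverage, the sketch below runs a simplified in silico digestion. It is a toy model under stated assumptions, not the MED-FASP procedure itself: the protein sequence is invented, the cleavage rules are the commonly used textbook ones (Lys-C cleaves C-terminal to K; trypsin cleaves C-terminal to K or R unless the next residue is P), missed cleavages are ignored, the two digests are treated as independent rather than sequential on the same filter, and the 6 to 30 residue "detectable" window is a hypothetical choice.

```python
# Toy in silico digestion: compare sequence coverage from trypsin, Lys-C,
# and the union of both. All parameters below are illustrative assumptions.

def cleaves(enzyme, residue, next_residue):
    """Return True if `enzyme` cuts C-terminal to `residue`."""
    if enzyme == "lysc":
        return residue == "K"
    if enzyme == "trypsin":
        # Textbook rule: cut after K or R, but not when followed by P.
        return residue in "KR" and next_residue != "P"
    raise ValueError(f"unknown enzyme: {enzyme}")

def digest(sequence, enzyme):
    """Complete digestion with no missed cleavages; peptides in sequence order."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if cleaves(enzyme, residue, next_residue):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def covered_positions(sequence, enzyme, min_len=6, max_len=30):
    """Residue indices covered by peptides inside the assumed length window."""
    covered, offset = set(), 0
    for peptide in digest(sequence, enzyme):
        if min_len <= len(peptide) <= max_len:
            covered.update(range(offset, offset + len(peptide)))
        offset += len(peptide)
    return covered

# Invented sequence: an R-rich stretch followed by a long K-free stretch.
PROTEIN = "AGRLDRSERVTRK" "LSPEDAGTVNLQR" "TGEAVLNDSPFYR" "AVDSGLEELTPK"

tryptic = covered_positions(PROTEIN, "trypsin")
lysc = covered_positions(PROTEIN, "lysc")
combined = tryptic | lysc

for label, positions in (("trypsin", tryptic), ("Lys-C", lysc), ("both", combined)):
    print(f"{label}: {100 * len(positions) / len(PROTEIN):.1f}% coverage")
```

In this constructed case trypsin shreds the R-rich stretch into peptides too short to fall inside the window, while Lys-C leaves the K-free stretch as one peptide that is too long; each enzyme recovers what the other misses.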

specific exosome protein, PLAP, standardized to initial sample volume was several fold higher in ExoQuick preparations as assessed by immunoblotting.9 They state a similar outcome from plasma preparation. Alvarez et al. compared ultracentrifugation and ultrafiltration with the manufacturer's or an in-house-modified ExoQuick precipitation protocol for urinary exosome preparation.10 They obtained the highest number of exosome particles per milliliter of urine using their in-house-modified ExoQuick precipitation protocol versus ultracentrifugation (2500 × 10⁶ vs 1500 × 10⁶, respectively) and also the highest yield of two standard exosome protein markers (Alix, apoptosis-linked gene 2-interacting protein X, and TSG101, tumor susceptibility gene 101 protein). Overall, precipitation of exosomes by ExoQuick is faster, more efficient, and lends itself to higher throughput applications than classical methods requiring multiple ultracentrifugation steps.

Membrane Proteins. The analysis of the membrane proteome, estimated at 20−30% of the total proteome, is challenging because of the hydrophobicity of the proteins to be extracted. As such, the membrane proteome is often under-represented in proteomic analyses, a critical oversight given that over 50% of drugs target membrane proteins.4 Several recent papers describe improvements for membrane-extraction strategies and efficiencies. Shevchenko et al. carried out a large comparative study of 16 different extraction approaches on the mouse brain proteome.11 Keeping all downstream digestion and analysis steps constant, they demonstrated that detergent-based buffers outperformed organic solvents, aqueous buffers, and buffers with high formic acid concentrations. Moreover, detergent-based buffers gave the highest percentage of membrane protein coverage.11 To improve the recovery of membrane proteins, Lin et al. developed an entirely solution-based combinative sample preparation (CSP) method that included using the detergent sodium dodecyl sulfate (SDS) for effective membrane extraction, an optimized precooled acetone precipitation for more efficient SDS removal and protein recovery, and sodium deoxycholate (SDC) for improved protein solubilization and trypsin activity/digestion.12 Alternatively, Ning et al. showed that polymer-based amphipol detergents are as effective as the traditional detergent-based lysis buffer, RIPA (radioimmunoprecipitation assay buffer), for extracting and stabilizing membrane proteins in the aqueous phase but with the distinct advantages that the amphipols do not interfere with trypsin activity and can easily be removed by acidification and centrifugation.13 This one-tube method does not require protein precipitation, centrifugation, and desalting steps that can result in sample loss. Amphipol-extracted samples yielded 20% more protein and 40% more peptide identifications than the conventional in-solution digestion from RIPA-extracted tissues.13

Protein Stabilization. Applicable to all biological samples but especially those rich in proteases, thermal stabilization of samples was demonstrated by Kennedy et al. to effectively and rapidly inactivate enzymes, reduce protein degradation, and preserve high molecular weight proteins better than the traditional addition of protease inhibitors.14

Miniaturizing and Automating Sample Preparation. The Rare Cell Proteomic Reactor (RCPR) developed by Tian et al. carried out proteomic analyses on 50000 cells, making it amenable to the analysis of limited samples such as those encountered when studying stem cells.15 They demonstrate an integrated platform for protein preconcentration, reduction,



DECREASING SAMPLE COMPLEXITY

Serum/Plasma Strategies to Improve Dynamic Range. Plasma, arguably the most complex biological sample, has a dynamic range upward of 10 orders of magnitude, with 99% of the concentration represented by fewer than 10%20 of the curated 1929 known plasma proteins.21 While high-abundant proteins (HAPs) can be affected by, and thus have utility in the study of, metabolic disorders,22,23 moderate- and low-abundant proteins (LAPs) are considered to represent greater potential as biomarkers. Approaches to reduce the complexity of plasma include depletion of HAPs or enrichment of LAPs (Figure 2).

Depletion Strategies. Immunoaffinity provides an effective means for the removal of HAPs from plasma, serum,24 and other complex mammalian proteomes such as CSF.25 There are a number of commercially available kits, in the form of chromatographic or spin columns, which enable the removal of a single HAP (e.g., albumin) or up to 20 of the most highly abundant plasma proteins from biological fluids. In general, increasing the number of depleted HAPs corresponds to an increase in the number of identified plasma proteins. Smith et al. identified 159 proteins in complete plasma but nearly double that, at 301 proteins, in plasma depleted of the top 20 HAPs.26 This can be further enhanced by repeated depletion with the same column25,27 or multiple columns in tandem.28 To further enrich for LAPs, the next 50−300 most abundant proteins can be removed using a second immunoaffinity column in tandem (SuperMix, Sigma-Aldrich).29,30 With the use of this method, 40 proteins with concentrations below 100 ng/mL were identified from depleted plasma; for 2 of the 40 proteins, the accuracy of quantification was confirmed by


a proteome to enhance the findings of shotgun proteomics. Peptides, bound individually to a solid matrix, are capable of binding proteins with different affinities. Utilizing a library of millions of solid-state peptides under overloading conditions, HAPs saturate binding partners and are readily washed away, while LAPs are concentrated on binding partners, thus reducing the dynamic range of the eluted proteome. Notably, the exact mechanism of action of solid-state CPLL recently came under discussion with indication that the interactions were hydrophobicity-dependent; various biochemical studies refuted this and are reviewed by Righetti and Boschetti.37 In a direct comparison of plasma depletion (ProteoPrep20) with LAP enrichment (ProteoMiner, Bio-Rad), Millioni et al. identified 25% more proteins using depletion than using LAP enrichment.38 However, the authors noted the increased ease and reduced sample manipulation of the enrichment protocol, which eliminated the identification of contaminants, including keratin. In a study utilizing swine plasma as a test proteome, the total number of proteins identified following ProteoMiner enrichment was 2657 as compared with the 1708 proteins identified in nonenriched plasma, with a 27% overlap in protein identifications.39 Notably, this study utilized both trypsin and Glu-C digestions to probe the plasma proteome. Utilizing CPLL in combination with iTRAQ for quantification, Shetty et al. compared plasma from healthy individuals with plasma from patients with HCV, HIV, or coinfection.40 While five proteins (APOA2, APOC2, APOE, C3, HRG) were upregulated in the infected patients, these are all moderately abundant proteins.
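An overlap percentage such as the 27% quoted above for the swine plasma study depends on which denominator is chosen, and the text does not say which definition was used. The sketch below shows, with invented placeholder accessions, how the same pair of identification lists gives different overlap values relative to the union (Jaccard index) and relative to each individual list.

```python
# Overlap between two protein identification lists under different denominators.
# The accession strings are invented placeholders, not data from any study.

def overlap_stats(ids_a, ids_b):
    """Return shared count and overlap percentages for two ID collections."""
    a, b = set(ids_a), set(ids_b)
    shared = a & b
    union = a | b

    def pct(part, whole):
        # Guard against empty inputs to avoid division by zero.
        return 100.0 * part / whole if whole else 0.0

    return {
        "shared": len(shared),
        "pct_of_union": pct(len(shared), len(union)),  # Jaccard index as a percentage
        "pct_of_a": pct(len(shared), len(a)),
        "pct_of_b": pct(len(shared), len(b)),
    }

enriched = ["P1", "P2", "P3", "P4", "P5", "P6"]   # hypothetical enriched sample
untreated = ["P4", "P5", "P6", "P7"]              # hypothetical untreated sample

stats = overlap_stats(enriched, untreated)
for name, value in stats.items():
    print(name, round(value, 1))
```

When comparing enrichment and depletion studies, it therefore helps to state the denominator explicitly alongside any overlap figure.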
Similarly, when CPLL was used to enrich plasma, moderately to highly abundant proteins were found to be pathognomonic of Mycobacterium avium infection of cattle41 or indicative of poor treatment outcome in lung cancer.42 LAPs and low-abundant isoforms of moderately abundant proteins, which represent potential biomarkers of atherosclerosis, were enriched by CPLL treatment of atherosclerotic plaque lysates.43 A significant issue in identifying moderate- to low-abundant proteins from processed plasma is the initial amount of plasma required. Notably, Millioni et al. indicated that a benefit of CPLL is that the amount of working material after CPLL exceeds that of depletion-treated plasma.38 Further to this, von Toerne et al. successfully scaled down the CPLL protocol to utilize only 10 μL of murine plasma, from which they then identified and quantified three proteins that were differentially expressed in diabetic versus nondiabetic animals.44 While not feasible for shotgun proteomics, positive selection of proteins is advantageous for targeted studies. As outlined below, physicochemical characteristics such as PTMs offer enrichment strategies for subproteomes, including those within plasma. In instances when specific plasma proteins are to be analyzed by mass spectrometry, immunoaffinity provides high specificity toward target proteins, enabling significant enrichment of LAPs (reviewed in Madian et al.45). Directed toward either proteins or peptides, immunopurification products can then be evaluated in further liquid chromatography−tandem mass spectrometry (LC−MS/MS) analyses. Recent work by Krastins et al. utilized monolithic microcolumns (Thermo) coupled with commercially available antibodies to enrich for 16 different clinically relevant proteins, including those of high, moderate, and low abundance,46 and then coupled this to selected reaction monitoring (SRM) analysis. The authors proposed utilizing the columns in tandem

Figure 2. Only a few proteins within biofluids represent the most abundant in terms of concentration, while the greatest number of proteins are very low in abundance. Without enrichment or depletion, the high and some moderately abundant proteins are detected. Depletion of high-abundant proteins (HAPs) or enrichment of low-abundant proteins (LAPs) enables improved MS detection of medium- and low-abundant proteins. Each circle represents a protein, with its size indicating abundance.

enzyme-linked immunosorbent assay (ELISA).30 However, when a column removing albumin + immunoglobulins (Igs; Qiagen) was compared with the combined Ig + SuperMix (Sigma) approach, the latter method resulted in better evaluation of LAPs but reduced accuracy in quantification of HAPs.31 In an effort to increase the number of antibodies included in immunoaffinity depletion, a recent study evaluated the capacity for antibody development from different fractions of plasma. Human plasma was fractionated by dual-ion-exchange columns and each fraction then used for antibody production in chickens. Purified antibodies were combined and used in immunodepletion columns; of the 165 unique proteins identified, 23% were detected only in the depleted plasma.32 Physicochemical strategies applied to the depletion of HAPs include precipitation and hydrophobic, ionic, or alternative interactions such as binding to immobilized dyes. Due to the inherently reduced specificity of hydrophobic and dye binding, there is an apparent reduction in efficiency when compared with immunoaffinity.24,33 Similarly, while a high level of reproducibility was achieved with ammonium sulfate precipitation, the overall depletion of albumin from plasma, at ∼40%, was not comparable to the >80% that was achieved by immunoaffinity columns.33 As with tandem immunoaffinity, sequential depletion utilizing dithiothreitol (DTT) precipitation of disulfide-bond-rich proteins followed by acetonitrile precipitation of high molecular weight proteins significantly reduced the amount of HAPs and revealed previously unidentifiable moderately abundant proteins.34 A considerable limitation of all of these negative selection techniques is the concurrent depletion of low-abundant (nontarget) proteins. In a direct comparison, both physicochemical (dye) and immunoaffinity methods for removal of albumin, or multiple HAPs, removed nontarget proteins.35 Among the proteins retained by immunoaffinity depletion with a 6-, 14-, or 20-target cartridge, Yadav et al. identified over 100 proteins,36 highlighting the implications of utilizing this technology. While depletion methods may reveal previously uncharacterized proteins in shotgun proteomic analyses of plasma, the entire proteome will not be available. Moreover, quantitative analysis is subject to significant bias based upon the method of depletion that is utilized.

Enrichment Strategies. While not a new technology, combinatorial peptide ligand libraries (CPLL) have recently been adapted and show utility in reducing the dynamic range of


protein digests.51 Alternative techniques include the application of immobilized metal affinity chromatography (IMAC)52 for the isolation of phosphopeptides, with different affinities for acidic or basic peptides depending on the IMAC metal ion. Notably, a complementary relationship between Ti4+ and Fe3+ has been found.53 On the basis of this discovery, a poly(ethylene glycol) (PEG) methacrylate brush decorated Fe3O4@SiO2 core−shell nanoparticle (denoted as Fe3O4@SiO2@PEG−Ti4+ IMAC) was synthesized and showed superior performance in specificity and binding capacity over conventional IMAC materials.54 A surface-blocked, nanoprobe-based IMAC (NB-IMAC) method was developed and proved to enrich more multiply phosphorylated peptides than commonly used methods.54 While less commonly used than TiO2 or Ti-IMAC, novel materials involving metal oxides have also shown improved performance in phosphopeptide enrichment, including TiO2/graphene composites,54 Fe3O4@mTiO2 microspheres,54 SiO2/TiO2,55 and magnetic TiO2-coated carbon-encapsulated iron nanoparticles.56 Despite the great ability of these materials to capture low-abundant phosphopeptides from protein digests, they lack the capability to isolate specific forms of phosphopeptides. The combination of these materials with chromatographic fractionation techniques has been applied in efforts to overcome this shortcoming. TiSH is a strategy using a TiO2 phosphopeptide pre-enrichment step, followed by postfractionation using sequential elution from IMAC (SIMAC) to separate multi- and monophosphorylated peptides and hydrophilic interaction chromatography (HILIC) to separate the monophosphorylated peptides. With this strategy, 6600 unique phosphopeptides were identified from 300 μg of peptides/condition (22 unique phosphopeptides per microgram) in a duplex dimethyl-labeling experiment, with an enrichment specificity >94%.57 Strong anion exchange (SAX) was shown to deplete acidic phosphopeptides from a Ti-IMAC-enriched phosphopeptide mixture, thereby improving coverage for the detection of basophilic kinase substrates from mouse liver.58 Conversely, SCX was shown to improve the enrichment of basic phosphopeptides under ultra-acidic conditions.59

Glycopeptide Enrichment. Improvements to conventional methods of glycopeptide enrichment, including lectin affinity chromatography, HILIC, and hydrazide chemistry, are well-established. Ahn et al. utilized a lectin-based method to enrich glycoproteins from plasma and to identify potential liver cancer biomarkers.60 The performance of HILIC was improved by adding acid as an ion-pairing reagent to reduce nonspecific adsorption, providing the possibility to identify intact glycopeptides by mass spectrometry.61 The specificity of HILIC can also be improved by utilizing a novel sorbent, wherein sugars are conjugated to silica beads by click chemistry.62 These click maltose beads can be packed as a proteomics reactor that integrates protein digestion, glycopeptide enrichment, and deglycosylation. This glycoproteomic reactor simplified the tedious procedure for glycoproteome analyses and improved the sensitivity of N-glycosite identification.63 A miniaturized protein reactor has been developed, containing monolithic sorbent within a capillary tube for the integrated processing of cell lysate and enrichment of glycopeptides.64 A recent development in glycopeptide enrichment is the isolation of glycopeptides with specific glycan structures. Changes in temperature of the periodate oxidation in hydrazide chemistry

to multiplex the enrichment of multiple proteins from a single sample. With the use of antipeptide antibodies, multiplexing mass spectrometry immunoaffinity by grouping antibodies to 50 targets, or by sequentially enriching up to 30 peptides from plasma, was shown to be feasible.47 Further to this, solid-phase extraction after immunoprecipitation enabled the enriched peptides to be directly injected into the mass spectrometer, reducing the cycle time from 20 min to 7 s and permitting analysis of 96 samples in 15 min.48 Notably, immunoaffinity strategies require extensive production of antibodies, whereas independence from antibodies is often touted as the advantage of proteomic approaches over conventional methodologies such as ELISA. Indeed, Whiteaker et al. evaluated 89 target proteins (including those at nanogram per milliliter concentrations), using 220 peptide-based antibodies, but this required the injection of ∼80 animals with a total of 403 peptides.49

Enriching Post-Translationally Modified Proteomes. Post-translational modifications are often important in the functionality of a protein and interaction with its partners in a given biological state. As new information is compiled on how these PTMs are altered by environmental stimuli, experimental pressures, or disease status, we may gain further understanding of their biological relevance. Enrichment strategies that identify the state of PTMs can thus contribute to defining the mechanism, functioning, and/or malfunctioning of proteins within a given condition (Figure 3).
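As a quick arithmetic check of the throughput figures quoted earlier for direct injection after solid-phase extraction (cycle time reduced from 20 min to 7 s, 96 samples in 15 min), the sketch below computes the instrument time for one 96-sample plate. It assumes samples are analyzed strictly one after another with no overhead between injections, which the text does not specify.

```python
# Instrument time for a 96-sample plate at two per-sample cycle times.
# Assumes strictly sequential analysis with no overhead between samples.

def batch_minutes(n_samples, cycle_seconds):
    """Total run time in minutes for n_samples at a fixed cycle time."""
    return n_samples * cycle_seconds / 60

N_SAMPLES = 96
lc_minutes = batch_minutes(N_SAMPLES, 20 * 60)  # 20 min per sample
spe_minutes = batch_minutes(N_SAMPLES, 7)       # 7 s per sample

print(f"20 min cycle: {lc_minutes:.0f} min ({lc_minutes / 60:.0f} h)")
print(f"7 s cycle: {spe_minutes:.1f} min")
print(f"speed-up: {lc_minutes / spe_minutes:.0f}x")
```

Under these assumptions the 7 s cycle needs about 11 min of pure acquisition for 96 samples, which is consistent with the 15 min figure once some per-plate overhead is allowed for, whereas the 20 min cycle would occupy the instrument for 32 h.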

Figure 3. A graphical representation of a literature survey of reports of large-scale identification of post-translational modification sites suggesting the progress and the difficulty in the mass spectrometric analyses of different types of PTMs: phosphorylation,130 ubiquitylation,217 acetylation,129 N-glycosylation,218 and methylation.74

Phosphopeptide Enrichment. Serine and threonine are the major sites of phosphorylation in the total phosphoproteome, while tyrosine phosphorylation is less abundant despite its vital role in signal transduction. Currently, the most successful isolation of tyrosine-phosphorylated peptides relies on anti-phosphotyrosine antibodies; immunoaffinity purification at the peptide and protein levels has proved complementary.50 However, by screening phosphopeptide motif binding proteins using pTyr-containing peptide libraries, Christofk et al. showed the utility of phosphopeptide binding domains, namely the SH2 domain for tyrosine phosphorylation, to enrich tyrosine-phosphorylated peptides.50 Phosphoproteomics has made great progress in recent years with the implementation of titanium dioxide (TiO2) for the specific isolation of acidic phosphopeptides from complex


Filtered Aided Sample Preparation (FASP). Filtered aided sample preparation was introduced in 2009 by Wisniewski and colleagues.76 It uses the principle of protein retention and buffer exchange by ultracentrifugation on molecular weight cutoff (MWCO) spin columns as a means to carry out protein denaturation, reduction, alkylation, and digestion without sample transfer. Since our last review, the FASP protocol has been made more versatile by technology development of multienzyme approaches for improved sequence coverage (and discussed above), by quantification and by multiplexing. In 2011, Wisniewski et al. modified FASP to efficiently analyze small amounts of material from formalinfixed paraffin embedded (FFPE) laser capture microdissected (LCM) samples. This improved method resulted in up to a 10fold increase in the number of proteins identified by up to 10 times when compared to traditional lysis with lithium dodecyl sulfate (LDS), SDS−PAGE separation, and in-gel identification.77 As well, they illustrate that addition of carriers such as polyethylene glycol or dextran to low protein samples improved the yield of peptides to submicrogram ranges. Combination of FFPE-FASP with SAX chromatography (FFPE-FASP-SAX) further improved peptide coverage and protein yields.77 This procedure is especially applicable to the analyses of archived fixed samples of disease tissues and is important in the discovery of new biomarkers of disease or finding new drug targets against a particular disease state. Sun et al. showed that the FASP procedure could be combined with a fully automated two-dimensional (2D)-LC−MS/MS technique.78 In their study, 75% of the proteins identified from rat zymogen granules had not been previously reported in studies using traditional in-gel digestion from one-dimensional (1D) or 2D gel electrophoresis. 
FASP can be combined with mass tagging of proteins, as first illustrated by Lu et al., wherein quantitative data were obtained with dimethyl labeling.79 However, one disadvantage of tandem mass labeling is that samples are typically labeled late in the sample processing protocol. McDowell et al. (2013) overcame this issue by combining isobaric mass tagging with FASP (iFASP),80 wherein tryptic peptides are labeled prior to elution. In comparison with a standard protocol of in-solution digestion followed by labeling and quantification, iFASP had comparable labeling efficiency and false discovery rates (99% and 1%, respectively) but twice the number of quantified peptide spectral matches.80 More recently, Switzer et al. showed that FASP could be efficiently multiplexed using a 96-well filter plate format.81 This extension of FASP allows multiple samples to be prepared simultaneously, improving reproducibility and making the method amenable to robotic automation. Samples (e.g., cells or tissues) can be lysed on a 96-well plate and then transferred directly to the 96-well MWCO filter plates for simultaneous processing. This setup is well-suited for large sample sets or for fractionation of large proteomes since it is less laborious than processing large numbers of samples individually.
SDS Spin Columns. Bereman et al. compared the traditional FASP procedure to a sample preparation method that uses SDS spin removal columns to perform shotgun proteomics.82 They demonstrated that the SDS spin column reduced sample preparation time, increased reproducibility, and identified 30−107% more peptides than the FASP procedure. This corresponded to a 50% increase in proteins identified with two or more peptides. The identified species were also more hydrophobic according to GRAVY scoring. Hengel et al. further evaluated the utility of the SDS spin columns in terms of percent protein

can target the sialic acid-containing glycopeptides and facilitate the quantitative characterization of sialylation.65 By combining the chemical enrichment method with a core-fucosylation-favoring endoglycosidase, aberrant N-glycosylation carrying both terminal sialylation and core-fucosylation has been identified. The difference in aberrant N-glycosylation was compared quantitatively between patients with liver cancer and those with cirrhosis.66 As with phosphopeptides, acidic sialylated glycopeptides can be isolated by metal oxides like TiO2 and by IMAC. Technologies targeting galactose residues have recently been developed with the aid of galactose oxidase, which specifically oxidizes the terminal galactose and N-acetylgalactosamine residues.67 In vivo labeling with azide-containing sugars and click chemistry have also been applied to capture either O-linked or N-linked glycoproteins from secreted media, with the advantage that cells are cultured in regular fetal bovine serum-containing media rather than serum-free media.68 The N-glycoproteome represents only a small portion of the total proteome. As such, recent efforts have focused on developing technologies that will increase the coverage and sensitivity of this subproteome. An N-glyco-FASP approach was developed which replaced the column in conventional lectin affinity chromatography with ultrafiltration units; it has been applied to the extensive mapping of 6367 N-glycosylation sites in four mouse tissues and plasma and to quantitative analyses of glycosylated proteins from secreted media.69 These developments in glycopeptide and glycoprotein enrichment not only increased the depth of glycoproteome coverage but also helped solve biological problems. For instance, glycoprotein enrichment helped identify novel substrates of the Alzheimer protease BACE1 in primary neurons.70
Enrichment of Other PTMs. Due to their inert chemical properties, other PTMs are usually enriched with antibodies that specifically recognize the modification. Antibodies specific to the diglycine remnant (K-ε-GG) left on lysine after tryptic digestion have dramatically improved the ability to enrich and identify ubiquitination sites from cellular lysates.71 Immunoaffinity enrichment of acetylated peptides could be improved by using a cocktail of monoclonal antibodies.72 As an alternative to immunoaffinity, HILIC, which can simultaneously enrich glycopeptides and phosphopeptides,73 was also shown to isolate highly basic and hydrophilic, methylated arginine-containing peptides.74 SCX fractionation can also isolate acetylated peptides as well as phosphopeptides.75 These results demonstrate that immunoaffinity enrichment has better selectivity, while chromatographic enrichment can help increase the coverage of peptides with PTMs.
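The diglycine remnant mentioned above is detected in the mass spectrometer as a fixed mass addition on the modified lysine. As a rough illustration, the sketch below computes that shift from the glycine residue mass and applies it to a peptide; the residue masses are standard monoisotopic values rounded to five decimals, while the function name, the site-indexing convention, and the example peptide are assumptions made only for this example.

```python
# Standard monoisotopic residue masses in Da (rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini

# K-epsilon-GG remnant: two glycine residues attached to the lysine side chain.
DIGLY_SHIFT = 2 * RESIDUE_MASS["G"]  # about 114.0429 Da


def peptide_mass(sequence: str, diglycine_sites: tuple[int, ...] = ()) -> float:
    """Monoisotopic neutral mass of a peptide.

    `diglycine_sites` holds 0-based indices of lysines carrying the remnant
    (a hypothetical convention chosen for this sketch).
    """
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    for pos in diglycine_sites:
        if sequence[pos] != "K":
            raise ValueError(f"position {pos} is {sequence[pos]!r}, not lysine")
        mass += DIGLY_SHIFT
    return mass


# Hypothetical peptide with an internal modified lysine (index 3).
unmodified = peptide_mass("LIFKGR")
modified = peptide_mass("LIFKGR", diglycine_sites=(3,))
print(f"mass shift: {modified - unmodified:.4f} Da")
```

A search engine configured for ubiquitination-site analysis treats this shift as a variable modification on lysine; the sketch only shows the arithmetic behind it.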



METHODS TO IMPROVE COVERAGE
Sample preparation remains the critical first step to overcoming the complexity and dynamic range problems associated with analyses of complex biological samples. Ongoing improvements in protein processing prior to mass spectrometry analyses will ensure continued growth in the numbers of proteins identified, specifically by increasing protein extraction efficiency, improving protein purification methods, and decreasing the complexity of the samples as they enter the mass spectrometer. Our end goal remains to study all proteins in a given sample, including isoforms and PTMs. Pre-mass spectrometry strategies should be designed to minimize sample loss, allowing the mass spectrometer to detect the maximum number of proteins in our complex biological samples.


recovery and effectiveness of SDS removal.83 They showed that sample recovery depended not only on the concentration of SDS in the sample but also on the protein concentration: at equal SDS concentration, recovery was reduced for samples with lower protein concentrations. They attribute this to competition between SDS and protein/peptide for binding sites on the column. They therefore caution that sample recovery should be optimized based on both sample and SDS concentrations.83 Tanca et al. carried out a comparative evaluation of detergent-based sample preparation workflows for MS analyses of the bacterial proteome of Escherichia coli.84 SDS-based buffers performed best in terms of protein extraction efficiency, especially for high molecular weight and membrane proteins. They then tested the SDS extracts in five different MS sample preparation workflows: (1) spin column detergent removal followed by in-solution digestion, (2) protein precipitation followed by in-solution digestion in ammonium carbonate, (3) protein precipitation followed by in-solution digestion in the urea buffer, (4) FASP, and (5) 1DE separation followed by in-gel digestion.84 The number of proteins identified by the spin column was 1104, comparable with ∼1000 for the other four methods. However, more peptides were identified, and on average, there was greater sequence coverage with the FASP method. For instance, FASP identified 7.7 unique peptides on average per protein versus 4.6 for the SDS spin column. Notably, all five procedures identified similar distributions of proteins in terms of subcellular localization.
Detergent Clean-up Methods for MS-Deleterious Agents. Detergent removal or cleanup prior to MS analyses is commonly accomplished using trichloroacetic acid (TCA) precipitation, chloroform/methanol/water (CMW) extraction, commercially available detergent removal spin (DRS) columns, or FASP columns. Sharma et al.
compared the efficiency of these options and found that at high protein concentrations all performed similarly, while at lower protein concentrations the FASP procedure outperformed the other three.85
Integrated Approaches. Yu et al. combined gel-eluted fraction entrapment electrophoresis (GELFrEE), FASP, and microwave-assisted on-filter enzymatic digestion, calling this integrated workflow GOFAST for efficient identification of membrane fractions.86 Workflows of this type can be integrated to include other separation technologies, such as HILIC and SCX. Further integration will continue to increase the dynamic range of the proteins identified.
Monolithic Columns. These columns, made of fine mesopores and dense networks of macropores, provide higher separation efficiencies for complex mixtures than traditional column chromatography and have been employed to determine the proteome of complex mixtures like milk proteins.85 Composed of organic polymers such as polymethacrylates and polystyrenes or of inorganic polymers such as silica, they can be miniaturized for chip-based mass spectrometry applications. In contrast to miniaturization, Yamana et al. employed a 100 μm i.d., 2 m long monolithic silica C-18 column and an 8 h gradient for profiling the proteome of human induced pluripotent stem cells (iPSCs) and fibroblast lysates.87 They identified 9510 proteins (98977 peptides), showing that long monolithic columns are also useful for sensitive and deep proteome analyses. Such columns can be adapted and applied for simultaneous depletion of immunoglobulin G and albumin from human plasma.88 Vaast et al. utilized polymer-monolithic poly(styrene-co-divinylbenzene) capillary columns at ultrahigh pressure to perform fast, efficient gradient separations, making

them applicable in high-throughput screening applications.89 Their analysis time was reduced more than 4-fold when compared with conventional flow rates and gradient conditions. They also noted good retention time repeatability with relative standard deviations of