
Review

Bottom-Up Proteomics (2013-2015): Keeping Up in the Era of Systems Biology

Janice Mayne, Zhibin Ning, Xu Zhang, Amanda E Starr, Rui Chen, Shelley Deeke, Cheng-Kang Chiang, Bo Xu, Ming Wen, Kai Cheng, Deeptee Seebun, Alexandra Star, Jasmine I Moore, and Daniel Figeys

Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.5b04230 • Publication Date (Web): 11 Nov 2015


Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology and Immunology, University of Ottawa, 451 Smyth Rd., Ottawa, ON, Canada, K1H8M5

Corresponding author: Daniel Figeys. Phone: 613-562-5800 ext. 8674. Fax: 613-562-5655. E-mail: [email protected]


CONTENTS
Introduction
Advances in Proteomic Related Techniques
General Sample Preparation
One Pot Strategies
High Throughput Strategies
Optimizing Digestion
Sample Preparation Considerations for Membrane Proteins
Mass Spectrometry-Based Approaches for Detecting Protein Interactions
Affinity Purification Coupled to Mass Spectrometry
Proximity Biotinylation Approach to AP-MS
Data Independent Analysis of AP-MS Generated Spectra
Enriching for Post-Translational Modifications
Phosphorylation Enrichment
Ubiquitination Enrichment
Acetylation Enrichment
Glycosylation Enrichment
Methylation Enrichment
Terminal Peptide Enrichment
Liquid Chromatography Mass Spectrometry Advances
Single Shot Long gradient Versus Fractionation
Two-Column Configuration to Boost Sample Throughput
Protein Quantification in Bottom-Up Proteomics
NeuCode SILAC Strategy
MS2 Level Quantification
Label Free Quantification
Absolute Quantification
Software Tools for Quantification
Mass Spectrometry Acquisition Mode
Transitions of MS Data Acquisition Modes
Bioinformatics and Data Analysis
New Search Engines
Bioinformatics Tools for Post-Translational Modifications
Statistical Analysis: Options for Downstream Analyses of MS Data
Statistical Analysis
Data Filtering
Functional Analyses: Enrichment Analysis
Next Leap of Proteomics
Proteogenomics
Call for Proteogenomics: Issues with Peptide Spectra Matching
Customized Databases for Proteogenomics
Applications of Proteogenomics
Metaproteomics: Unveiling the Functions of the Microbial Community
Sample Preparation Considerations for Metaproteomic Studies
Challenges in Microbial Peptide Identification and Protein Inference
Taxonomic Assignment of Peptides and Proteins
Functional Interpretation
Quantitative Metaproteomics
Proteomics for Understanding Host-Microbe Interactions
Proteomics and Clinical Applications
Biomarkers
Drug Design
Decision Making in Health Care
The Future and Personalized Medicine
Perspectives


INTRODUCTION

Over 17,000 papers related to proteomics have been published since our 2011-2013 review. A subset are true gems reporting amazing applications of proteomics to better understand protein interactions, pathways, dynamic signaling and the human proteome, to name a few. While we have clearly seen more attention paid to data quality and proper experimental design in manuscripts reporting biological applications of proteomics, more improvements are needed. Technological development remains a strong driving force of proteomics, making up a large portion of these papers. As in any maturing field, some areas of technological development in proteomics are saturating. However, many new areas in proteomics are burgeoning. Examples include the development of enrichment materials to study different protein modifications, approaches for protein quantification, data independent acquisition (DIA) methods, bioinformatics and the rapid developments in metaproteomics. The acceptance of proteomics in the biological community depends on its applications. For instance, protein interaction mapping, post-translational modification (PTM) analyses and protein Atlas are routinely used by the biological community. However, other aspects of proteomics are still underappreciated, including biomarker discovery and studies of diseases from tissues. In part, this is due to previous and, to some extent, still ongoing overstatements from underpowered studies that reported unsubstantiated biological conclusions. Very large-scale, good quality, quantitative proteomic studies are now technically feasible and ongoing in a few laboratories around the world. However, few bioinformatics tools are capable of handling these large studies. Moreover, proteomics is increasingly integrated in experiments involving multiple -omics datasets obtained across large numbers of biological replicates. “Big data” challenges are clearly appearing in proteomics. This biennial review is focused on technological trends that we believe are important in proteomics. We continue to foresee that technology and bioinformatics development will remain important areas of research in proteomics.

ADVANCES IN PROTEOMIC RELATED TECHNIQUES

Bottom-up proteomics, whereby enzymatically digested peptides from complex samples/mixtures are used to identify proteins and their post-translational modifications (PTMs), is the mainstream proteomics approach. This is largely due to the ease and efficiency of detecting enzymatically generated peptides by liquid chromatography electrospray ionization mass spectrometry (LC-ESI-MS) versus top-down MS analysis of intact proteins. Bottom-up proteomic analyses have growing applications in translational and clinical research. They are not bound by the limitations of the classical reductionist approach in biology, where complex interactions are dissected into their constituent parts for simplification, study and description, thus ignoring the larger picture and failing to see the forest for the trees. Biology is instead filled with intra- and interconnections, branches, redundancies and feedback loops that allow us to compensate and adapt to changes in our systems and environments. Therefore, studying a model as a whole gives greater appreciation and understanding of the interplay of different components, uncovering complexities that are not measurable by reductionist approaches. For proteomics, studying the whole means being able to identify and quantify as many proteins as possible. The identification and analysis of all expressed proteins in a given experimental system is a herculean task and is not yet possible. Fortunately, new developments in technologies are allowing for deeper probing of the proteome so that complex biological systems can be fully defined at the protein level, increasing our understanding of their roles in health and disease (Figure 1). In this section we will review recent advances in general sample preparation and enrichment strategies, which help to reveal the post-translational proteome, as well as strategies that improve quantification of proteome changes.

GENERAL SAMPLE PREPARATION: The success of any proteomic workflow begins with and depends upon optimal, consistent and appropriate sample preparation. Protocols and technologies to advance sample preparation should strive to minimize preparation time, the number of preparation steps, and sample discrimination while maximizing automation, reproducibility and efficiency.

One-pot strategies: So-called ‘one-pot’ strategies for sample preparation are attractive in these regards, in that confining the process to one ‘reactor’ can reduce sample loss while increasing yield and decreasing preparation times. For these reasons, offshoots of the filter assisted sample preparation (FASP) protocols continue to appear in the literature. These repurposed ultrafiltration units function as one-pot proteome reactors and prove very effective in bottom-up proteomics analyses. In our last review we reported on the success of multi-enzyme digestion (MED) strategies in combination with FASP, coined MED-FASP, for improved proteomic coverage [1]. Wisniewski et al. have further increased peptide identification by 30%, protein identification by 17% and protein sequence coverage by 10-30% by combining reversible PEGylation with MED-FASP [2]. They first derivatize thiol-reduced proteins with thiol-activated polyethylene glycol (TAPEG) prior to sequential enzymatic cleavage with LysC and trypsin. Following collection of the LysC- and trypsin-generated peptide fractions, they then collect a third fraction, representing the cysteine-containing peptides, released by thiol-reduction. They demonstrated TAPEG-FASP using tissue lysates from mouse brain, liver and muscle as well as the cell line CaCo-2.

Other recently published techniques focusing on one-pot developments employ paramagnetic beads [3], StageTips [4] and nanoparticles [5]. Hughes et al. introduced a paramagnetic bead based strategy that they coined single-pot solid-phase-enhanced sample preparation (SP3) [3]. The carboxylate-coated paramagnetic beads trap proteins and peptides in a hydrophilic aqueous surface layer formed by the addition of an organic solvent. With samples immobilized, interfering reagents can be removed by washes, accomplishing concentration and cleanup of samples for downstream proteomic analyses. SP3 is adaptable both to sample labeling for quantification and to a high-throughput 96 well format [3]. Using yeast or human HeLa cells, Kulak et al. reported a highly reproducible and quantitative method that combines multiple sample processing steps in an in-StageTip reactor (micropipette solid phase extraction tips), which consisted of a pipette tip and C18 disc [4]. All steps were done on tip, with cells or samples added directly to the tip, significantly reducing preparation time. Sample lysis, reduction and alkylation were combined in one step by replacing dithiothreitol (DTT) and iodoacetamide (IAA), which are used sequentially, with the reducing agent tris(2-carboxyethyl)phosphine (TCEP) and the alkylating reagent chloroacetamide, which are used in concert. They adapted this methodology to multi-sample formats such as 96 well plates.

Nanoparticles are an attractive tool for protein isolation and enrichment, due in part to their large protein-binding surface area relative to their size. In addition, their capacity to bind subsets of proteins according to physicochemical properties makes them especially useful in bottom-up proteomics for protein identification, as well as for identification of peptides/proteins that are uniquely expressed when comparing samples (for example, control versus disease). This is because nanoparticle functional groups increase the range of proteins identified, allowing for detection of lower abundance proteins [5]. Zaccaria et al. used nanoparticles with three different functional groups (namely strong anionic, cationic and hydrophobic) to capture and identify proteins found in red blood cells [5]. This is a challenge since the proteome is saturated by one protein (hemoglobin at 98%). They demonstrated that, without prior fractionation and using 200 µl of starting material, they could identify 893 proteins versus 92 in crude lysate. Moreover, they demonstrated that each type of nanoparticle reproducibly captured a subset of proteins and that approximately 22% of proteins were captured by all three bead chemistries.


Typical in-gel fractionation procedures can probe deeper into the proteome than in-solution sampling because proteins are fractionated by molecular weight prior to in-gel digestion and MS/MS analyses. However, in-gel bottom-up proteomics is disadvantaged by its multiple processing steps, which make it more laborious. Fischer et al. reimagined the typical in-gel sample preparation method as a one-pot gel assisted sample preparation (GASP) method, demonstrating it to be as effective as FASP and five times more effective than in-solution digestion in terms of peptide intensities [6]. Protein samples and even intact cells (without further clarification) can be extracted in the presence of reducing agent and then co-polymerized with the monomeric form of acrylamide. During this process cysteine residues react with acrylamide to form Cys-S-β-propionamide (PAM-Cys), which eliminates the need for alkylation before further in-gel digestion and extraction. They show scalability of this procedure, in terms of efficiency and sensitivity, from 1 to 1000 µg.

High-throughput strategies: Modern proteomics procedures, especially those moving toward clinical applications, need to be high-throughput, meaning that they are fast, require limited sample processing steps/fractionation, and are able to manage large numbers of samples. These considerations are also important when many conditions are measured at once. To handle sample preparation/automation for large scale and high throughput proteomics, new technologies such as 96-well plates and StageTips used on-line as pre-columns are useful [7,8], but have limited demonstrated usage so far [9,10]. Hosp et al. used 96 well plates to grow yeast in parallel, and then processed the chromatin complexes pulled down from those yeast extracts. By combining parallel processing with two analytical columns in tandem, each sample could be processed in 15 min [7]. The second column allowed for parallel processing related to pre-MS analysis, including washing, sample loading and equilibration. Binai et al. reduced sample turnover time by utilizing an autosampler equipped with two sets of 96 StageTips [8]. They combined their pre-column StageTips with a 6.5 min gradient on a reverse phase analytical column. This type of configuration would perform well on low to medium complexity samples. Falkenby et al. utilized disposable StageTips to avoid sample carryover, which allowed for analysis of 192 replicates of E. coli samples in 30 hours [10].

Optimizing digestion: To minimize preparation time it is essential to consider protein digestion, which is often the most time consuming step in the proteomic workflow. Li et al. utilized a high trypsin-to-protein ratio of 1:1 to digest proteins in 1 hr at 37°C (instead of the typical 12-24 hrs of incubation), then used C18 packed pipette tips to separate the peptides from the trypsin using reverse phase liquid chromatography (RPLC) [11]. Two sequential elutions allowed the recovery of the trypsin and its reuse up to four times, thereby reducing costs. They used this methodology successfully in their multiple reaction monitoring mass spectrometry (MRM-MS) assay development to avoid the back exchange of ¹⁸O labeled to ¹⁶O labeled peptides that occurs with longer digestion times. Jiang et al. designed a monolithic material based immobilized enzyme reactor (IMER) conjugated with either trypsin and Lys-C, or Lys-N [12]. They showed comparable protein identification to in-solution digestions but with the advantage that enzymatic digestions took less than one hour.


Fang et al. sought to optimize trypsin digestion conditions in terms of minimizing and controlling for nonspecific trypsin cleavages [13]. They manipulated variables such as the denaturing reagent (sodium dodecyl sulfate (SDS) versus urea), trypsin type/grade, storage time, enzyme to substrate ratio and sample concentration to produce optimized conditions that decreased nonspecific trypsin cleavages from 28.4% to 2.8%. The recipe they recommended included protein storage of less than 1 month, the use of SDS rather than urea, MS-modified grade trypsin at a 1:50 ratio and samples at protein concentrations of 0.05 μg/μl.
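A proportion of nonspecific cleavages such as the one quoted above can be estimated from a list of identified peptides. The following Python sketch is illustrative only and is not the procedure used by Fang et al.: it assumes the simplified rule that trypsin cleaves C-terminal to K or R except before P, it maps each peptide to its first occurrence in a single protein sequence, and the function names, sequence and peptide list are hypothetical.

```python
def is_tryptic_site(protein: str, pos: int) -> bool:
    """Return True if the bond before protein[pos] is an expected trypsin site.

    Positions 0 and len(protein) are the protein termini and count as valid.
    """
    if pos == 0 or pos == len(protein):
        return True
    # Simplified rule (assumption): cleave after K or R, but not before P.
    return protein[pos - 1] in "KR" and protein[pos] != "P"


def nonspecific_fraction(protein: str, peptides: list[str]) -> float:
    """Fraction of mapped peptides with at least one non-tryptic terminus."""
    nonspecific = 0
    counted = 0
    for pep in peptides:
        start = protein.find(pep)  # first occurrence only (simplification)
        if start == -1:
            continue  # peptide does not map to this protein; skip it
        counted += 1
        end = start + len(pep)
        if not (is_tryptic_site(protein, start) and is_tryptic_site(protein, end)):
            nonspecific += 1
    return nonspecific / counted if counted else 0.0


# Made-up example: "CCR" starts inside a tryptic fragment, so it has a
# non-tryptic N-terminus; the other four peptides are fully tryptic.
protein = "AAKCCCRDDDKEEE"
peptides = ["AAK", "CCCR", "DDDK", "EEE", "CCR"]
print(f"nonspecific fraction: {nonspecific_fraction(protein, peptides):.1%}")
```

In practice the peptide-to-protein mapping would come from the search engine output, and the cleavage rule would be matched to the enzyme definition used in the database search.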

SAMPLE PREPARATION CONSIDERATIONS FOR MEMBRANE PROTEINS: Membrane proteins remain a challenge for MS analysis due to their high hydrophobicity and low abundance. It is estimated that 20-30% of genes in genomes encode membrane proteins [14]. Membrane proteins carry out important functions such as cell-cell interaction, cell signaling via cell surface receptors and transport of ions and molecules. Membrane proteins are implicated in the pathogenesis of many diseases including cancer, Alzheimer’s disease and diabetes, and it has become increasingly important to study this sub-proteome. In recent years, new approaches were reported to improve the extraction and analysis of membrane proteins. Lin et al. outline an enhanced sodium deoxycholate (ESDC) assisted digestion method for shotgun analysis of the membrane proteome [15]. Proteins are extracted/solubilized in 5% SDC and digested in buffer containing 1% SDC, in contrast to their previous method [16], in which SDC was used at 1%. They claim that using 5% SDC improves the extraction and solubilization of proteins, especially very hydrophobic ones, over both 1% SDC methods and standard FASP protocols, and that diluting from 5% to 1% SDC improves trypsin digestion efficiency [15]. Compared with their original protocol and standard FASP protocols, they were able to increase the identified number of membrane and transmembrane proteins by 13.2% and 17.9%, respectively.

Smolders et al. reported the biotinylation of small tissue samples for proteomic profiling of the plasma membrane proteome, addressing the poor extraction efficiency, weak enrichment, contamination and large sample consumption of the typical ultracentrifugation method for membrane protein studies [17]. They carried out an acute slice biotinylation assay (ASBA) on mouse coronal brain slices, where the sections were biotin labeled prior to mechanical homogenization and streptavidin pull-down. Approximately 26% of Ingenuity Pathway Analysis (IPA) annotated proteins were classified as plasma membrane proteins. In fact, using IPA and DAVID (Database for Annotation, Visualization and Integrated Discovery), approximately 62% of identified proteins were annotated as plasma membrane proteins, peripheral membrane proteins, cell surface proteins or extracellular space proteins. This method can be extended to different tissues and is promising for the plasma membrane proteomics field [17].

Ning et al. explored a new application of amphipols, showing that the polymer could be used for protein enrichment. Amphipols have been widely used as surfactants in protein structure analysis for membrane protein trapping; in solution they readily precipitate when the pH is decreased, leading to the co-precipitation of proteins [18]. In this report the precipitation characteristics of amphipol A8-35 were further explored using total cell lysates as well as dilute protein samples such as spent media from cells in culture. From total cell lysates, 1650 proteins were found co-precipitated with the amphipols, of which 452 (27%) were Gene Ontology (GO)-annotated as integral membrane proteins [18]. With this newly discovered property, amphipols represent a promising alternative for membrane protein studies.

MASS SPECTROMETRY-BASED APPROACHES FOR DETECTING PROTEIN INTERACTIONS: Of the more than 20,000 proteins encoded by the human genome, 80% are estimated to interact with partners, participating in over 100,000 different interactions [19]. These interactions are dynamic and affect cellular behavior including the function of multi-enzyme complexes, the cross-talk between cells and tissues and the function of enzymes. Proteomics has become an invaluable technique for the mapping of protein-protein interactions.

Affinity Purification Coupled to Mass Spectrometry: Systematic identification of protein interaction networks in a high-throughput fashion is often performed by affinity purification coupled to mass spectrometry (AP-MS). In general, in AP-MS, a bait protein is engineered with an epitope tag and is stably or transiently expressed in a cell. Following cell lysis, the soluble bait with its interacting partners (prey) is purified from the cell extract using an anti-tag antibody that is conjugated to a solid support. The isolated proteins are then analyzed by mass spectrometry. Recently, Morris et al. provided a straightforward pipeline to help extract and score the useful AP-MS data in network-based analyses [20]. They recommend first filtering contaminants from interactors in the MS data, then scoring the potential binders with several reliable AP-MS scoring algorithms, and only then constructing the protein interaction networks. They further point out that one should be careful to optimize each affinity purification step, including the bait, affinity tags, affinity purification methodologies, proper controls and MS instruments. Marcon et al. set up an AP-MS like workflow that uses large-scale MS readout data to select a highly reliable antibody towards a prey protein [21]. Since there is no universal rule for choosing the most adequate antibody for a targeted protein, and especially since commercial antibodies are highly condition-dependent (that is, one may work well for western blotting (WB) but not for immunoprecipitation (IP)), they synthesized a pool of recombinant antibodies toward a prey protein and performed multiple IP-MS experiments in parallel [21]. Based upon the MS results, the antibody that yielded the greatest abundance of the prey protein and a higher relative abundance score than the others was considered to be the best and was termed the IP gold standard; those candidates were further evaluated by WB, immunofluorescence or ChIP to cross-validate their feasibility. Although this MS-based standard operating procedure (SOP) provides a faster way to select an antibody, those candidates might not be applicable for the detection of denatured and/or lower abundance proteins in cells.

Proximity Biotinylation Approach to AP-MS: Alternative approaches to conventional AP-MS have also been developed; the proximity biotinylation approach (BioID), initially developed by Roux et al. [22], has been reported in combination with AP to explore possible bait-prey interactions in different cells and organisms. In BioID, a plasmid encoding the bait protein fused to a mutant E. coli biotin ligase (BirA*) tag is made. Once expressed, the chimeric bait protein will not only interact with its partners but also covalently add a biotin tag to both its interaction partners and proteins within its proximity. The protein complexes can be purified by streptavidin beads and analyzed by MS. The presence of biotin tags reinforces the likelihood of bona fide interactors. Compared with the modified chromatin immunopurification (mChIP) method [23] coupled to AP-MS [24], the BioID approach [25] provided larger data sets of significantly interacting partners, and could identify lower abundance complexes associated with chromatin. BioID not only identifies direct binding partners of the bait, but also discovers proteins that were in the vicinity of the bait in a specific cell environment, and can therefore be utilized for cross-validation of the MS results from antibody-based methodologies [25].

Data Independent Analysis of AP-MS Generated Spectra: Recently, a SWATH-based approach revealed that generating AP-MS data in data-independent acquisition (DIA) mode provides better reproducibility for extracting target peptides, with superior sample coverage, than data dependent acquisition (DDA) methods [26]. Information on target peptides from DDA samples is used to build the reference spectral library, while further data acquired in DIA mode in SWATH can be used to extract the quantitative information from samples. By correlating those two datasets with a clear pipeline, including (i) peak area normalization, (ii) a reproducibility check in each sample measurement, (iii) determination of the confidence of fold-change between different samples, (iv) scoring the interactors to get the true binding partners, and (v) data visualization, this method has been successfully applied to detect changes in protein-protein interactions for the human kinase CDK4, showing its power for detecting/identifying the drug-regulated interactome [26].

These aforementioned tools can be combined with quantitative proteomics to distinguish true protein-protein interactors from experimental artifacts. They can also be designed to study protein-protein interactions under different physiological conditions, and to evaluate how interactions change with post-translational modifications (PTMs). Moreover, these techniques are amenable to 96 well plate high throughput analyses.
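The first three steps of the DIA AP-MS pipeline described above (peak area normalization, a reproducibility check and a fold-change estimate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published workflow: normalization to the bait's peak area, a coefficient of variation (CV) cutoff of 20% and all function names and peak areas are hypothetical choices made for the example, and the statistical confidence of the fold-change, interactor scoring and visualization are not covered.

```python
import math
import statistics


def normalize_to_bait(run, bait):
    """Step (i), assumed variant: divide each peak area by the bait's area."""
    return {protein: area / run[bait] for protein, area in run.items()}


def coefficient_of_variation(values):
    """Step (ii): sample standard deviation relative to the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def compare_conditions(control_runs, treated_runs, bait, max_cv=0.2):
    """Apply steps (i)-(iii) to every prey quantified in all runs."""
    control = [normalize_to_bait(run, bait) for run in control_runs]
    treated = [normalize_to_bait(run, bait) for run in treated_runs]
    shared = set.intersection(*(set(run) for run in control + treated)) - {bait}
    summary = {}
    for prey in sorted(shared):
        c = [run[prey] for run in control]
        t = [run[prey] for run in treated]
        summary[prey] = {
            # Step (iii): log2 ratio of the mean normalized peak areas.
            "log2_fc": math.log2(statistics.mean(t) / statistics.mean(c)),
            # Flag preys whose replicate CV exceeds the (assumed) cutoff.
            "reproducible": max(coefficient_of_variation(c),
                                coefficient_of_variation(t)) <= max_cv,
        }
    return summary


# Made-up peak areas for a CDK4 bait and two hypothetical preys, P1 and P2.
control_runs = [{"CDK4": 200, "P1": 100, "P2": 100},
                {"CDK4": 100, "P1": 50, "P2": 90}]
treated_runs = [{"CDK4": 100, "P1": 200, "P2": 100},
                {"CDK4": 50, "P1": 100, "P2": 50}]
for prey, stats in compare_conditions(control_runs, treated_runs, "CDK4").items():
    print(prey, stats)
```

Each condition needs at least two replicate runs for the CV to be defined. A real analysis would replace the simple cutoff with a statistical test on the fold-change and pass the surviving preys to a dedicated interaction scoring algorithm.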

ENRICHING FOR POST-TRANSLATIONAL MODIFICATIONS: Post-translational modifications include proteolytic cleavage events and covalent modifications added to the protein post translation. The study of PTMs remains a challenge due to the substoichiometric abundance of PTMs27,28, their transience in signal transduction pathways29, and the introduction of experimentally-induced PTMs30. Enrichment strategies for PTMs are often applied to aid in increasing their identification; the advancements in these strategies are discussed below. Phosphorylation Enrichment: Reversible protein phosphorylation is a ubiquitous PTM that regulates almost all aspects of cellular processes in both prokaryotes and eukaryotes, including cell cycle control, cell differentiation, signal transduction, transformation, proliferation and metabolism31. For MS-based shotgun proteomics study, the substoichiometric abundance of protein phosphorylation and the low

ACS Paragon Plus Environment

Page 16 of 104

Page 17 of 104

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

ionization efficiency of phosphopeptides have promoted the development of specific enrichment and separation strategies. Affinity materials including immobilized metal ion affinity chromatography (IMAC) such as Ti4+-IMAC32, and metal oxide affinity chromatography (MOAC) such as TiO233 have gained extensive popularity for phosphopeptide enrichment. Despite these established approaches, research to further improve enrichment selectivity and sensitivity through the development of new enrichment materials with stronger hydrophilicity, larger surface area and more active centers remains strong (Table 1). Xiong et al fabricated a Fe3O4@SiO2@(HA/CS)10-Ti4+ nanoparticle, which showed high selectivity (β-casein/BSA at a molar ratio of 1/2000) for the enrichment of phosphopeptides by improving the hydrophilic properties and binding capacity of titanium ion34. The detection limit was 0.5 fmol for enrichment of a β-casein tryptic digest, while the binding capacity and enrichment recovery reached up to 100 mg g-1 and 85.45%, respectively34. Besides spherical nanoparticles, He et al developed a fibrous substrate for IMAC. Based on carboxyl cotton chelator (CCC), a fibrous CCC-Ti4+ IMAC sorbent was synthesized and applied for rapid (3 min) phosphopeptide enrichment. The CCC-Ti4+ fibers showed good selectivity (β-casein/BSA at a molar ratio of 1/1000) and sensitivity (10 fmol) in standard sample enrichment and resulted in the identification of ~4000 unique phosphopeptides from 1 mg of rat brain lysate. In addition, lab-in-syringe solid phase extraction (SPE) approaches can greatly simplify the enrichment procedure35. Being versatile materials, functionalized metal-organic frameworks (MOFs) have also been used in the field. Zhao et al synthesized a magnetic microsphere material, Fe3O4@PDA@Zr-MOF, with a core-shell-shell structure and applied this IMAC material to capture phosphopeptides; it exhibited strong magnetic responsiveness, great hydrophilicity, high sensitivity (1 fmol) and high selectivity (β-casein/BSA at a molar ratio of 1/500) from standard sample and human serum36. Another magnetic MOF nanoparticle termed Fe3O4@MIL-100 (Fe) developed by Chen et al also showed high selectivity (β-casein/BSA at a molar ratio of 1/500), large enrichment capacity (60 mg g-1), low detection limit (0.5 fmol) and high enrichment recovery (84.47%) for phosphopeptide enrichment37. New advances have been made in the development of metal oxide based materials, especially in their components and morphology. Yan et al developed G@PD@TiO2 nanohybrids by grafting TiO2 onto graphene coated by dopamine polymerization, which can improve the hydrophilicity and biological compatibility of the material38. The material showed a low detection limit (5 fmol) and high selectivity (β-casein/BSA at a molar ratio of 1/1000) for phosphopeptide enrichment in β-casein tryptic digests38. Wan et al reported magnetic yolk-shell Fe3O4@mTiO2@mSiO2 nanocomposites for highly sensitive (detection limit of 3 fmol) enrichment, which were applied to endogenous phosphopeptide enrichment from human serum39. Two lanthanide based magnetic core-shell nanomaterials, Fe3O4@SiO2-La2O3 and Fe3O4@SiO2-Sm2O3, were synthesized and showed ultrahigh selectivity (β-casein/BSA at a molar ratio of 1/8500) and sensitivity (detection limit of 1 amol)40. Wang et al studied magnetic binary metal oxides as affinity probes and found that the synthesized Fe3O4/Graphene/(Ti−Sn)O4 performed better than single metal oxide affinity probes or a simple physical mixture of them41. Wijeratne
et al studied the phosphopeptide enrichment performance of aligned titanium dioxide nanotubes on titanium wire and concluded that titan sphere beads were as efficient as titanium dioxide beads42. In terms of topography, a ternary nanocomposite of magnetite/ceria-codecorated titanoniobate nanosheet (MCTiNbNS) was developed by Min et al, then applied to phosphopeptide enrichment and programmed dephosphorylation, by which the site count of multiphosphopeptides can be reflected by MS1 spectra43. Chen et al fabricated 2D titanoniobate nanosheets termed Fe3O4-TiNbNS, and they emphasized the function of in situ isotope labeling of phosphopeptides, which enables more convenient relative quantification of phosphopeptides44. Although many novel materials have been reported, no comprehensive studies have been performed comparing these different materials. There is a need for broader availability of these materials; otherwise it is very unlikely that they will have an impact in the field of proteomics. Despite all these developments, phosphoproteomics is still predominantly performed with Ti4+/Fe3+-IMAC and TiO2 enrichment materials as they are readily available and have been extensively compared by many laboratories. Enrichment approaches based on phosphatase and substrate interaction have also been demonstrated. Trentini et al developed a YwlE mutant (C9A) trap-based method for selective enrichment of arginine-phosphorylated proteins, and the substrate-trapping efficiency was further improved by impeding the oligomerization of the mutant45.

Chromatographic techniques based on the physicochemical properties of phosphopeptides have also been developed, including separation based on high pH reversed-phase chromatography (RP)46,47, strong cation exchange chromatography (SCX)48-50, strong anion exchange chromatography (SAX)51,52, electrostatic repulsion-hydrophilic interaction chromatography (ERLIC)53,54 and hydrophilic interaction liquid chromatography (HILIC)55,56. Recently, Loroch et al established an ERLIC-SCX/RP-LC-MS strategy for highly sensitive phosphoproteomics studies. With 100 µg of HeLa cell tryptic digests and 45 h LC-MS analysis time, more than 7500 unique phosphorylation sites could be identified in a single run. By using this platform, the starting protein amount can be reduced to the microgram range, which is essential for analysis of limited biological samples57. The combination of enrichment materials and chromatographic technologies has been used to reduce sample complexity and to improve protein identification coverage. An optimized HILIC-IMAC platform using a 96-well filter plate has been shown to reduce sample loss during multiple sample processing steps and to increase identification efficiency and recovery of proteins. With ~500 µg of rat liver tryptic digest and 28 h LC-MS analysis time, more than 16,000 unique phosphorylation sites could be identified in a single experiment58.

Ubiquitination Enrichment: Ubiquitination is one of the major PTMs that regulate biological processes. Most large-scale studies for the identification of protein ubiquitination sites rely upon enrichment of modified peptides following tryptic digestion. Branched peptides with GG attached at the lysine residue where the ubiquitin was attached are generated during digestion. These branched peptides
are readily purified using a recombinant antibody against K-ε-GG and identified by MS analysis59. Although very successful, this approach does not differentiate mono- from polyubiquitinated sites, or the different branched structures for polyubiquitinated sites. Alternative approaches for the detection of cellular ubiquitination events in vivo have been evaluated. For the analysis of EGF signaling, Akimov et al first knocked down endogenous ubiquitin using an RNAi system and then re-expressed a hexahistidine and FLAG dual-tagged ubiquitin construct at physiological levels60. They further utilized a dual purification methodology through those tags (nickel sepharose bead purification followed by FLAG-agarose bead enrichment), resulting in efficient enrichment of ubiquitinated protein conjugates. Min et al utilized a DNA construct containing six ubiquitin moieties fused with BirA in cells to generate in vivo-biotinylated ubiquitin on proteins for streptavidin-conjugated bead purification61. Both methods claimed that the expression level of tagged ubiquitin is comparable to endogenous ubiquitin concentrations in the cell, thus eliminating over-expression bias, and provide robust purification workflows for ubiquitinated proteins under harsh denaturing conditions. Alternatively, Porras-Yakushi et al found that compared to traditional fragmentation MS techniques, namely collision induced dissociation (CID) and higher-energy collisional dissociation (HCD), electron transfer dissociation (ETD) results in approximately a 2-fold increase in the number of identified ubiquitination sites for tryptic peptides after di-glycine remnant (K-ε-GG) antibody enrichment62. This is because branched peptides (K-ε-GG peptides) possess higher precursor charge (3+)
states than regular tryptic peptides in the 300–1200 m/z range. ETD can provide better fragmentation efficiency and sequence coverage than CID or HCD for the comprehensive profiling of protein ubiquitination. Valkevich et al also applied limited trypsin proteolysis to investigate the structure of poly-ubiquitin (polyUb) chains present on target proteins through a middle-down MS approach coupled with ETD63. They performed a restricted trypsin digestion in the characterization of polyUb chains in the presence of E. coli O157:H7. Under minimal proteolysis conditions, the branched ubiquitin conjugates remained mainly in an intact form and several single ubiquitin moieties could be easily resolved and identified by ETD-MS. Both methodologies implied that ETD might provide more information than CID/HCD for ubiquitylome analyses.

Acetylation Enrichment: The enrichment of acetylated peptides/proteins has primarily relied on antibodies against the acetyl moiety. A new mixture of anti-lysine acetylation antibodies was developed and showed a two-fold increase in the number of lysine-acetylated peptides identified, compared with conventional antibodies. Further application of this mixture of antibodies enabled the quantification of 10,000 lysine-acetylated sites from 3000 proteins from Jurkat cells64. The enrichment of lysine-acetylated peptides with this mixture of antibodies showed a specificity of 41%, compared to 16% with the single antibody approach that is widely used in acetylome analysis. Bromodomains, which also recognize lysine acetylation, have also been utilized for the enrichment of lysine-acetylated peptides65. However, the 14 proteins engineered with yeast bromodomains showed varied affinity towards lysine-acetylated
peptides and peptides with other modifications like lysine methylation. Although this method showed promising specificity in terms of acetylated peptide enrichment, more investigation of the specificity of each bromodomain towards acetylated peptides is needed for better performance, which would expand the application of this method to biological samples. Unlike lysine acetylation, N-terminal acetylation can be enriched by chemical methods that enrich protein N-termini. A charge reversal method was recently developed to block the protein N-terminal and lysine with dimethylation, and the neo-termini of internal peptides with sodium 4-formylbenzene-1,3-disulfonate66. Naturally modified and dimethylated peptides in gel-free and gel-based approaches can be subsequently enriched using strong cation exchange (SCX) chromatography. This method showed higher specificity towards N-terminal acetylated peptides, as over 60% of identified peptides were N-terminal acetylated. Thus, it would be interesting to see if chemical approaches could be developed for isolation of acetylated peptides and whether they would increase the efficiency and specificity of the enrichment.

Glycosylation Enrichment: Specific and sensitive enrichment of glycopeptides/glycoproteins from complex proteome samples has always been a hot topic in the field of glycoproteomics. Although no new methods have been developed to isolate glycopeptides/glycoproteins, adaptations have been made to existing methods and new materials were synthesized to increase the yield of glycopeptides/glycoproteins. However, most of these methods and materials have not been tested in complex biological samples, limiting their apparent application.

Lectin affinity chromatography is the traditional technique for glycoprotein enrichment but has limited specificity. To improve the performance of lectin-based glycan purification and detection, a bead-based lectin microarray was developed by coupling lectins to magnetic beads, detecting 50 pg/mL of asialofetuin with Ricinus communis agglutinin 120 (RCA120)67. Lectins were also conjugated to magnetic nanoparticles to reduce non-specific adsorption68. However, the lectin-based approach still suffers from weak binding between the lectin and the glycan structures it recognizes. This issue could be partially solved by implementing an additional hydrophilic interaction chromatography SPE (HILIC SPE) step to provide extra selectivity for glycoproteins with particular glycan linkages69. The performance of lectin chromatography was compared with another popular method, namely hydrazide chemistry, by Zhang et al70. Although the results from each method were complementary, hydrazide chemistry still surpassed lectin chromatography in terms of sensitivity and selectivity. Solid phase capture of glycopeptides and glycoproteins by hydrazide chemistry is achieved by covalent coupling of oxidized cis-diol groups in glycans to agarose beads bearing hydrazide groups. Although this approach has the highest specificity among methods for glycopeptide enrichment, non-glycopeptides will also bind to the agarose beads through hydrophilic interactions. To eliminate this non-specific binding, magnetic nanoparticles were synthesized on a new matrix71,72. Another approach to improve the performance of hydrazide chemistry was to block, by dimethylation, the N-termini of glycopeptides that begin with serine or threonine, eliminating the non-specific oxidation of the vicinal amino alcohol73. Moreover, the
performance of this classic approach was further improved by a hydroxylamine-assisted PNGase F deglycosylation to increase the yield of released glycopeptides and improve its accuracy in quantitative glycoproteomics73. Amine chemistry was also applied as an alternative to hydrazide chemistry, but the process was even more sophisticated than conventional hydrazide chemistry74. However, a fundamental flaw of these chemistry-based solid phase extraction methods is that the oxidation, coupling and deglycosylation steps destroy the glycan structure, and thus no site-specific N-glycosylation information can be obtained. Instead, other reports have focused on the development of technologies that enable the enrichment of intact glycopeptides. For example, a promising approach is HILIC SPE, which was shown to enrich intact N-glycopeptides from a cell membrane protein digest; the intact glycopeptides could be characterized by HCD MS75. New stationary phases and nanoparticles were synthesized to improve the performance of HILIC by incorporating various functional groups like arginine76, phenol-formaldehyde77 and glycopeptide dendrimers78. Although HILIC SPE has no specificity towards any particular monosaccharide or oligosaccharide, a few of those novel materials exhibited limited selectivity for disaccharides79, which suggested that modified HILIC SPE could potentially replace lectin-based approaches with better performance in the analysis of particular glycans. However, these new materials need to be tested with complex samples to assess biological applicability. Besides enrichment prior to mass spectrometry analysis, glycopeptides can be selected during MS by prior metabolic labeling with two bromine atoms, which
gives the glycopeptide a unique isotopic pattern that distinguishes it from non-glycopeptides. This isoglycoTag technique provides an integrated solution to characterize both N- and O-glycosylation from cells80. However, the application of this technique was limited to cell lines, and O-glycopeptides are more likely to be identified with this technique. Thus, technologies with wider sample applicability are still needed to tackle the problems in both N-glycosylation and O-glycosylation analyses at the proteome level. In addition, glycopeptide enrichment techniques should be coupled with MS to identify sites of glycosylation, the glycan structures attached to those sites, and site occupancy, and to enable quantitative analysis of site-specific glycosylation and its implications in biology.

Methylation Enrichment: Methylation is a prominent experimentally observed PTM with relatively few sites known to date81. This was originally due to the difficulty of identification, owing to its small size (14 Da) and lack of charge. With the advancement in MS, however, these difficulties are no longer a problem. Due to the substoichiometric abundance of methylation, enrichment is needed prior to MS analysis to increase site identification. Recently, various affinity and chemical enrichment approaches have been applied to study methylation, accelerating the number of methylation identifications. This is particularly true for lysine residues that, historically, have been more difficult to enrich by antibody capture82,83. These new enrichment approaches are usually applied alongside metabolic labeling with heavy methionine (SILAC; stable isotope labeling by amino acids in cell culture) to increase identification confidence84.
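As a worked illustration of the mass arithmetic behind methyl-site identification, the following minimal sketch computes the monoisotopic mass added by mono-, di- and tri-methylation and the spacing expected between light and heavy forms of a methylated peptide. It assumes a heavy methyl design in which the heavy methionine carries a 13CD3 methyl group that is transferred to the substrate; the text above only specifies "heavy methionine", so that labeling scheme, the function names and the example charge state are illustrative assumptions rather than details from the cited studies.

```python
# Monoisotopic masses (Da) of the relevant isotopes.
H = 1.00782503
D = 2.01410178
C12 = 12.0
C13 = 13.00335484

# Methylation replaces one H with CH3, so the net addition is CH2.
LIGHT_METHYL = C12 + 3 * H - H   # about 14.01565 Da
# Assumed heavy methyl group (13CD3) donated via heavy methionine.
HEAVY_METHYL = C13 + 3 * D - H   # about 18.03784 Da


def methyl_shift(n_methyl: int, heavy: bool = False) -> float:
    """Mass added to a residue carrying n_methyl methyl groups."""
    per_group = HEAVY_METHYL if heavy else LIGHT_METHYL
    return n_methyl * per_group


def pair_spacing_mz(n_methyl: int, charge: int) -> float:
    """m/z spacing between the light and heavy forms of one peptide."""
    delta = methyl_shift(n_methyl, heavy=True) - methyl_shift(n_methyl)
    return delta / charge


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(methyl_shift(n), 4), round(pair_spacing_mz(n, 2), 4))
```

Under these assumptions each methyl group adds about 14.0157 Da in the light form and about 18.0378 Da in the heavy form, so a mono-methylated peptide at charge 2+ would appear as a pair separated by roughly 2.011 m/z.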

Two different affinity enrichment approaches have been applied: the methyl-lysine binding domain and the pan-specific methylation antibody. Liu et al used the heterochromatin protein 1 (HP1β) domain to pull down and enrich for methylation sites. Although HP1β was originally described as a trimethylated H3K9 reader, it appears to also bind to other lysine-methylated proteins85. Using the HP1β enrichment approach, Liu et al identified 40 di- and tri-methylated peptides from a HEK293T lysate85. Similarly, Moore et al identified 102 methylated peptides using three repeats of the MBT domain of the lethal malignant brain tumor like protein 1 (L3MBTL1)86. Twenty-six of these methylated peptides were also identified in a heavy methionine SILAC population. Bremang et al used a combination of 11 commercial and in-house pan-specific methylation antibodies on a histone-depleted nuclear fraction and cytosolic fraction to identify a total of 308 methylation sites in the HeLaS3 cell line. They compared this approach to samples with increased fractionation in their in-gel and off-gel separation protocols to simplify their protein and peptide mixtures before MS analysis. The in-gel approach yielded 242 methylation sites while their off-gel approach yielded 112 additional sites, leading to the conclusion that antibody enrichment was not significantly advancing the field for site identification87. In 2013, Guo et al developed a cocktail of pan-specific methylation antibodies to enrich for methylated peptides in HCT116 cells. They were able to identify 165 lysine methylation sites and over 1000 arginine methylation sites88. Similarly, Cao et al developed their own pan-specific mono-, di-, and tri-methyl antibodies and were able to identify 552 lysine methylation sites in the HeLaS3 cell line after subcellular
fractionation89. In 2014, Sylvestersen et al identified 1027 mono-methyl arginine sites using commercial antibodies. Of these sites, 798 were identified in HEK293T cells90. They employed SILAC in human osteosarcoma (U2OS) cells in order to quantify the changes of mono-methyl arginine sites in response to ActD, an inhibitor of transcription. They found that 42% of proteins with a down-regulated mono-methyl arginine site over the first 8 hours of treatment with ActD are involved in transcriptional regulation. This study further supports the involvement of methylated proteins in transcriptional regulation90. To overcome the problems encountered when enriching for lysine methylation with pan-specific antibodies, Wu et al developed a new approach for affinity enrichment of lysine mono-methylation91. They chemically modified the ε-amino group of mono-methylated lysine by derivatization with propionic anhydride. They subsequently developed an antibody against propionyl methyl-lysine and used this antibody for affinity enrichment prior to MS/MS analysis. They identified 446 lysine mono-methylation sites using this technique in the HeLaS3 whole cell lysate91. Compared to the field of phosphoproteomics, the field of methylation is relatively young and still requires more technological development for its enrichment. Overall, the field of protein methylation is now heading in the right direction toward increasing the number of experimentally observed methylation sites.

Terminal peptide enrichment: The protein termini are important for protein functions and interactions, which can change as a consequence of PTMs, particularly acetylation, methylation, formylation, proteolytic processing, and C-methyl esterification. While standard proteomics methods that evaluate tryptic peptides
are useful in protein identification, the single semi-tryptic amino (N)- or carboxy (C)-terminal peptides are often not measured or overlooked. Methodologies for positive and negative enrichment of terminal peptides have been developed in the last decade, with major advances in the last 5 years. Recently, positive enrichment approaches using immunoaffinity were developed for both N- and C-terminomics. Bland et al enriched for 2,4,6-trimethoxyphenyl phosphonium (TMPP)-labeled N-termini by capture with anti-TMPP antibodies coupled to magnetic beads92. Application of their methodology to the bacterium Roseobacter denitrificans resulted in enrichment of two-fold more N-termini when compared with the non-enriched lysate. Similarly, Liu et al utilized a biotin-arginine tag to label the C-termini of Thermoanaerobacter tengcongensis proteins prior to chymotryptic digestion, and enriched these peptides with streptavidin93. In addition to the identification of five-fold more C-terminal peptides than in the non-enriched sample, the labeling had the added advantage of improved ionization of the terminal peptides. In contrast, Lai et al developed a negative enrichment N-terminomic approach wherein N-termini were dimethyl-labeled prior to tryptic digestion, and internal peptides depleted by charge reversal. In addition to enriching for acetylated N-termini (described above), this method proved useful in identification of over 3000 unmodified N-termini from MEF cells66. In a similar vein, Nika et al developed a multi-step enrichment strategy for C-terminal peptides, utilizing a series of chemical
modifications at both the protein and peptide level to maximize the adsorption of non-target peptides to a solid state after cyanogen bromide cleavage94. The proof-of-concept experiment was successful on a purified protein, but the utility in a complex system has not yet been shown.

LIQUID CHROMATOGRAPHY MASS SPECTROMETRY (LC-MS) TECHNOLOGICAL ADVANCES: One of the holy grails of proteomics is the deep sequencing of the whole proteome95. Confident and comprehensive ‘large scale’ profiling, now frequently called ‘deep proteomics’, is the basis for further quantitative analysis95,96. Although astonishing developments have occurred over the last 15 years in LC-MS technology for high throughput protein identification, there are still many more peptides present than identified at any given elution time in any MS run97. Therefore there is still a need for instrument improvements or different strategies to further maximize the readout from LC-MS experiments. The introduction of faster and higher-performance mass spectrometers such as the Q Exactive HF98, Orbitrap Fusion99, and the Bruker Impact II Q-TOF100, in combination with different chromatographic approaches, is rendering deep profiling possible. The identification of up to ten thousand protein groups in a sample is now an achievable goal101. For example, the yeast proteome (with > 4000 proteins) can be nearly mapped within an hour99.

Single Shot Long Gradient Versus Fractionation: Improvements in proteomics are not only due to better mass spectrometers but also to advances in HPLC, mainly in the form of long column separations. For example, the heated long column/long gradient configuration is increasingly used for both profiling102 and
quantification103. Long gradient analysis without fractionation offers greater reproducibility when retention times are properly aligned. It also helps to maximize the MS utilization time, with relatively fixed loading and re-equilibration time, and sufficient HPLC resolution. Although one-dimensional HPLC provides better reproducibility and easier data analysis, fractionation of peptides can also increase the dynamic range of peptide identification and quantification when properly and efficiently performed. In order to increase the peak capacity, a new mode of 2D separation, coupling high and low pH reversed-phase (RP) chromatography, was demonstrated to be better than the traditional SCX-RP combination in terms of separation resolution and MS identification. SCX and SAX are theoretically orthogonal to RP in 2D separation. The orthogonality of high pH RP-low pH RP is not comparable to that of SCX-RP; however, a modified fraction mixing method, called fraction concatenation, compensates for the low orthogonality104,105. Though the high scanning speed and increased sensitivity of mass spectrometers make comprehensive fractionation less and less necessary for regular proteomic profiling, it is still a good choice if extremely deep sequencing is needed, for example for library construction for SWATH analysis106.

Two-Column Configuration to Boost Sample Throughput: Nano-scale HPLC has unparalleled performance over regular flow LC in terms of sensitivity for proteomics analysis. However, due to the nano-scale flow rate and the often high pressure caused by the small resin size and long column, sample loading and column regeneration occupy a disproportionate portion of the LC run time. A double column configuration was developed to reduce the system idle time. Briefly, while one
column is being used for analysis, the other column is regenerated and the sample loaded7,107. The two-column configuration also applies to long gradient configurations, because loading is time consuming without a pre-column. For best proteomics profiling, users can choose between fractionation followed by a short gradient7,8,108 or a long gradient without fractionation as discussed above. However, it should be noted that the variation and instability introduced by fractionation are not favorable for quantification, especially for label free quantification, because the resolution and retention time of the two columns are difficult to match exactly.
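For the fractionation route, the fraction concatenation scheme mentioned above for high pH RP-low pH RP separations can be sketched in a few lines. The sketch assumes the commonly described interleaved pooling, in which first-dimension fractions collected at regular intervals are combined so that each pool contains early-, middle- and late-eluting peptides; the function name and the fraction and pool counts are hypothetical and are not taken from the cited reports.

```python
from typing import List


def concatenate_fractions(n_fractions: int, n_pools: int) -> List[List[int]]:
    """Assign first-dimension fractions (1-based) to pools by interleaving.

    Pool k receives fractions k, k + n_pools, k + 2 * n_pools, ... so that
    each pool mixes fractions from across the first-dimension gradient.
    """
    if n_pools <= 0 or n_fractions < n_pools:
        raise ValueError("need 1 <= n_pools <= n_fractions")
    pools: List[List[int]] = [[] for _ in range(n_pools)]
    for fraction in range(1, n_fractions + 1):
        pools[(fraction - 1) % n_pools].append(fraction)
    return pools


if __name__ == "__main__":
    # Hypothetical example: 12 collected fractions combined into 4 pools.
    for index, pool in enumerate(concatenate_fractions(12, 4), start=1):
        print(index, pool)
```

With 12 fractions and 4 pools, this assigns fractions 1, 5 and 9 to the first pool, so the peptides in each second-dimension run are drawn from well-separated regions of the high pH gradient.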

PROTEIN QUANTIFICATION IN BOTTOM-UP PROTEOMICS: Very few applications in proteomics provide biologically useful information based solely on protein identification. Protein quantification is key in deciphering biological mechanisms. Multiple relative quantification approaches have been developed including label free, metabolic and chemical labeling approaches. Metabolic labeling methods allow early sample mixing, resulting in lower variation, better precision and reproducibility compared to chemical labeling strategies109. However, due to the limited combination of isotopes for labeling, only a few (usually a maximum of three) samples can be analyzed simultaneously and quantified at the MS1 level. This limitation can be overcome by implementing a super-SILAC approach wherein a heavy reference is spiked into all samples. In addition, all metabolic labeling approaches rely on MS1 level quantification; MS1 complexity is therefore increased accordingly, which leads to redundant MS2 sampling of the same peptide sequence from different isotopic forms, wasting MS time. In contrast, the MS2
level chemical labeling strategies, e.g. iTRAQ or TMT, do not complicate the MS1 and MS2 spectra for peptide identification and can handle multiple samples including 12-plex110 and 18-plex111.

NeuCode SILAC: Recently, metabolic labeling approaches based on neutron encoding (namely NeuCode SILAC) were reported, allowing MS1 level quantification with higher multiplexing ability without increasing spectral complexity, by either metabolic labeling or chemical labeling112,113. The NeuCode strategy is a fine-tuned version of SILAC, utilizing the milliDalton mass differences resulting from neutron signatures between elemental combinations, rather than between isotopes. The mass difference can only be resolved at extremely high resolution (usually 200,000–960,000), and therefore the spectral complexity is not increased at regular analysis resolution.

MS2 Level Quantification: Unfortunately, MS2 level quantification methods using isobaric chemical tagging have an inherent, overlooked flaw. Due to the large isolation window, fragmentation of co-isolated interfering precursor ions can distort the measured MS2 ratio114. Currently the precursor ion selection is done either by quadrupole (Q-TOF platform) or ion trap (Orbitrap platform). The isolation window is usually at the unit (m/z) scale and the precursor ion selection specificity is limited in complex samples. Several reports have tried to address this issue by further rounds of selection, termed synchronous precursor selection MS3 (SPS-MS3) or MultiNotch MS3115, but this only partially addresses the problem. Therefore, particular attention needs to be paid to the results from this strategy, and the results should be carefully reviewed and extensively validated.
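The distortion caused by co-isolation can be illustrated with simple arithmetic. The sketch below assumes that reporter-ion signals from the target peptide and from co-isolated interfering peptides simply add within each channel; the function name and the intensity values are hypothetical, chosen only to show how a true ratio is compressed toward the ratio of the background.

```python
from typing import Tuple


def measured_ratio(target: Tuple[float, float],
                   interference: Tuple[float, float]) -> float:
    """Reporter-ion ratio (channel A / channel B) after co-isolation.

    Both arguments are (channel_a, channel_b) reporter intensities. Reporter
    ions released by co-fragmented precursors cannot be told apart from those
    of the target, so the intensities are summed per channel (an assumption
    of this simplified model).
    """
    channel_a = target[0] + interference[0]
    channel_b = target[1] + interference[1]
    if channel_b == 0:
        raise ValueError("channel B has no signal")
    return channel_a / channel_b


if __name__ == "__main__":
    target = (1000.0, 100.0)        # hypothetical true ratio of 10:1
    background = (200.0, 200.0)     # hypothetical interference at 1:1
    print(measured_ratio(target, (0.0, 0.0)))
    print(measured_ratio(target, background))
```

In this example a true 10:1 ratio is measured as 4:1 once an interfering background at 1:1 contributes to both channels.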

Label Free Quantification (LFQ): LFQ strategies are still extensively used in proteomics because of their simplicity and low cost. However, spectral counting should be abandoned in favor of intensity-based quantification that measures the area under the curve (AUC) or the MS signal intensity. Accurate and efficient normalization is especially important for LFQ because there is no internal standard. A recent publication on MaxLFQ introduced a series of algorithms for data normalization, alignment, imputation, etc. that facilitate the use of LFQ [116]. The MaxLFQ algorithm can be extended to all types of protein quantification from peptides, for both labeled and label-free strategies.

Absolute Quantification: Absolute quantification is no longer solely based on labeling strategies. Several methods have been developed, such as intensity-based absolute quantification (iBAQ) [117] and absolute protein expression measurements (APEX) [118]. iBAQ is so far the most reasonably calculated index for quantification and, in combination with known concentrations of an internal standard, can estimate the absolute concentration of proteins. One overlooked advantage of these methods is that expression levels can be compared between proteins within a sample, and comparisons are therefore not limited to those across samples. However, iBAQ by itself is still a relative index for a particular sample analysis. Internal standards need to be spiked in as an absolute quantification reference to transform this index into an absolute expression value. The iBAQ value itself can be used for large-scale quantification annotation if properly normalized. Histones can also be used as a “ruler” in place of the spiked-in internal standard [119], because most cells have relatively
stable histone numbers per cell and, very importantly, histone proteins are easy to extract and usually appear as top hits on the protein list.

Software Tools for Quantification: Besides these conceptual advances, new tools are continually contributed to broaden the application of proteomic techniques. Software and pipelines published in the past two years include ICPL_ESIQuant and O18Quant which, as their names suggest, are specially designed for quantification of ICPL-labeled and 18O-labeled proteins, respectively [120,121]. An R package called aLFQ supports the commonly used absolute label-free protein abundance estimation methods (TopN, iBAQ, APEX, NSAF and SCAMPI) [122]. Integrative pipelines such as APP and Sipros/ProRata provide both label-free quantification (spectral count and intensity-based estimations) and labeled quantification (support for TMT- and iTRAQ-labeled peptides) [123,124]. Despite these intensive efforts in computational proteomics, there is still plenty of room to improve existing tools and to establish new ones. For example, most of the available techniques and algorithms were developed for ease of implementation in existing software or workflows. In other words, many of the new techniques depend on a particular data-processing pipeline, which makes it challenging to compare and validate quantitative results from different software or pipelines. For instance, to our knowledge, there are currently no tools that quantify a given protein list independently of the search engine or quantification pipeline. Thus, choosing the right tools is crucial to take full advantage of the data and, to some extent, to obtain the expected results. From a bioinformatics
perspective, making proteomics quantification tools more general is surely appealing and will benefit the popularization of proteomics techniques.
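
The conversion of an iBAQ index into an absolute amount with spiked-in standards, as discussed above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of any of the cited tools: iBAQ is taken as the summed peptide intensity divided by the number of theoretically observable tryptic peptides (here 6 to 30 residues, cleavage after K or R, ignoring the proline rule), and the calibration is an ordinary least-squares line in log-log space. All function names and the example numbers are hypothetical.

```python
import math
import re


def count_observable_peptides(sequence, min_len=6, max_len=30):
    """Count fully tryptic peptides (cleavage after K or R) in a length range.

    Simplification: cleavage before proline is not suppressed.
    """
    peptides = re.findall(r"[^KR]*[KR]|[^KR]+$", sequence)
    return sum(min_len <= len(p) <= max_len for p in peptides)


def ibaq(sequence, peptide_intensities):
    """Summed peptide intensity divided by the number of observable peptides."""
    n_observable = count_observable_peptides(sequence)
    if n_observable == 0:
        raise ValueError("no observable peptides for this sequence")
    return sum(peptide_intensities) / n_observable


def fit_log_calibration(standard_ibaq, standard_amounts):
    """Least-squares fit of log10(amount) = slope * log10(iBAQ) + intercept."""
    xs = [math.log10(v) for v in standard_ibaq]
    ys = [math.log10(v) for v in standard_amounts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def absolute_amount(ibaq_value, slope, intercept):
    """Apply the spike-in calibration to the iBAQ value of another protein."""
    return 10 ** (slope * math.log10(ibaq_value) + intercept)


if __name__ == "__main__":
    # Hypothetical spike-in standards: iBAQ values and known amounts (fmol).
    slope, intercept = fit_log_calibration([1e4, 1e5, 1e6], [1.0, 10.0, 100.0])
    print(absolute_amount(1e7, slope, intercept))
```

The same calibration step applies whether the reference points come from spiked-in standards or from a histone-based “ruler”; only the source of the known amounts changes.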

MASS SPECTROMETRY ACQUISITION MODE: Real time peptide readout and decision making for directed MS: Along with the development of MS hardware, more versatile acquisition “software” has been developed. Thanks to improvements in computing performance, on-the-fly peptide sequencing is now achievable, for example with inSeq [125] and MaxQuant Real-Time [126]. The practical use of on-the-fly peptide sequence identification is not to save time, but to direct the MS without preset inclusion or exclusion lists. For example, the instrument can be focused on a peptide of a particular biological function for specific PTM analysis [126]. We expect further application in multiplexed quantification, wherein quantification is extracted and analyzed on-the-fly and MS2 identification is only triggered for differentially expressed peptides.

However, this area is still relatively novel and remains underdeveloped. Real-time decision making to direct MS acquisition can also be performed without sequence identification. For example, DDA and DIA can be blended in a single run to increase the reproducibility of target peptides without interfering with the profiling of non-targeted peptides [127]. Real-time decision triggered MS2 has also been used for glycopeptide identification [80]. In that work, 79Br and 81Br were introduced to label intact glycopeptides. The dibromide motif can then be detected on-the-fly and used to trigger an MS2 event for glycopeptide identification [128]. We expect that the same real-time isotope pattern recognition strategy has potential
usage for other applications. For example, all isotope spike-in experiments (SILAC etc.) increase the MS1 complexity, and roughly half of the MS2 events are redundant. The isotope pattern of the spike-in could be recognized on-the-fly and used to eliminate unnecessary MS2 events.
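
The isotope pattern recognition described above can be sketched as a simple peak-list filter. This is an illustrative sketch only and not the algorithm of the cited work. It assumes that the near-equal natural abundance of 79Br and 81Br makes a dibromide tag appear as a roughly 1:2:1 triplet spaced by about 1.998 Da divided by the charge, and it ignores the overlapping 13C envelope of a real peptide. The function names, tolerances and peak values are hypothetical.

```python
BR_SPACING = 1.99795  # approximate mass difference between 81Br and 79Br, in Da


def _intensity_near(peaks, target_mz, mz_tol):
    """Highest intensity among peaks within mz_tol of target_mz, or None."""
    best = None
    for mz, intensity in peaks:
        if abs(mz - target_mz) <= mz_tol and (best is None or intensity > best):
            best = intensity
    return best


def find_dibromide_triplets(peaks, charge=1, mz_tol=0.01, ratio_tol=0.3):
    """Return m/z values of peaks that start a ~1:2:1 dibromide-like triplet.

    peaks: list of (mz, intensity) tuples from one MS1 scan.
    """
    step = BR_SPACING / charge
    hits = []
    for mz0, i0 in sorted(peaks):
        if i0 <= 0:
            continue
        middle = _intensity_near(peaks, mz0 + step, mz_tol)
        upper = _intensity_near(peaks, mz0 + 2 * step, mz_tol)
        if middle is None or upper is None:
            continue
        # Expect the middle peak at about twice, and the upper peak at about
        # the same intensity as the first peak of the triplet.
        if abs(middle / (2 * i0) - 1) <= ratio_tol and abs(upper / i0 - 1) <= ratio_tol:
            hits.append(mz0)
    return hits


if __name__ == "__main__":
    scan = [(1000.0, 100.0), (1001.99795, 200.0), (1003.9959, 95.0),
            (1200.0, 50.0), (1201.0, 30.0)]
    print(find_dibromide_triplets(scan))  # precursors that would trigger MS2
```

In a real-time setting the returned m/z values would be passed to the instrument as MS2 triggers; the inverse logic (skipping the heavy partner of a recognized SILAC pair) would follow the same pattern with a different spacing and intensity rule.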

TRANSITIONS OF MS DATA ACQUISITION MODES

To date, most proteomic data sets have been produced in DDA mode. DDA is a well-balanced strategy that has boosted proteomic development. With intelligent real-time decision-making, the DDA mode can be adapted to multiple purposes, as discussed above [80]. Indeed, DDA-based discovery proteomics has an unmatched capability to identify and quantify proteins at the same time, in a high-throughput, non-targeted manner. However, the DDA method is semi-stochastic, which compromises reproducibility and accuracy for large cohorts of samples, especially for relatively low-abundance proteins, and leads to the missing value issue during quantification of large cohorts [116]. Notably, all targeted analyses require prior knowledge for effective identification and quantification, namely a library has to be set up and curated. MRM/SRM has been the gold standard for reproducible, targeted quantification because of the fast transitions and large dynamic range of the triple quadrupole instrument. Similar operations can be performed on ion-trap MS, but with severely compromised speed and dynamic range. However, selectivity is the main drawback of MRM because of the low resolution and low mass accuracy of the quadrupole MS, which is a great concern when dealing with very complex samples. A new targeted
quantification mode, parallel reaction monitoring (PRM), was first introduced in 2012 [129,130]. Technically it is a simple conceptual improvement in which all product ions are recorded without selection, which does not compromise the dwell time on Q-TOF or Q-Exactive platforms. Moreover, PRM eliminates the need to preselect and optimize product ions, as is required for MRM. Simply put, MRM selects both precursor and product ions; PRM takes the full picture of product ions for selected precursor ions; and the SWATH approach records a panorama of both precursor and product ions [131]. SWATH has been shown to give reproducible quantification of nearly the whole yeast proteome across more than 78 samples [132]. SWATH now dominates most DIA workflows, thanks to the fast MS2 acquisition of the TOF and the square quadrupole, which makes the unbiased selection of a large window of ions possible. The Q-Exactive can technically perform the same experiment, but at a slower speed [133]. There are discussions and publications on DIA performed on the Q-Exactive platform, although not under the name SWATH [134]. The conventional DDA method clearly discriminates against low-abundance peptides [97], while the MRM/SRM mode only focuses on a targeted list. In contrast, the SWATH-like DIA mode provides a panorama with, in theory, “everything” there. This has changed the concept of experimental design from targeted to untargeted (Figure 2). After MS analysis, researchers are able to test multiple hypotheses without the need to reacquire data. With comprehensive information “encrypted” in the DIA data, the question now becomes how to extract the signal from the noise with high confidence.
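
The unbiased window selection that distinguishes SWATH-like DIA from MRM and PRM can be illustrated with a short sketch. The m/z range (400 to 1200) and the fixed 25 m/z window width used here are assumptions chosen for illustration and are not specified in this review; the function names are hypothetical.

```python
def build_windows(start=400.0, stop=1200.0, width=25.0):
    """Consecutive, non-overlapping isolation windows as (low, high) tuples."""
    windows = []
    low = start
    while low < stop:
        windows.append((low, min(low + width, stop)))
        low += width
    return windows


def window_index(precursor_mz, windows):
    """Index of the window that co-fragments this precursor, or None."""
    for index, (low, high) in enumerate(windows):
        if low <= precursor_mz < high:
            return index
    return None


if __name__ == "__main__":
    windows = build_windows()
    # Every precursor inside the scanned range is fragmented in some window,
    # without any inclusion list; precursors outside the range are missed.
    for mz in (512.3, 987.6, 1300.0):
        print(mz, window_index(mz, windows))
```

In MRM each entry would instead be a (precursor, product) pair chosen in advance, and in PRM a list of precursors; here the only design choice is the window layout, which is why the same data can later be interrogated for peptides that were not anticipated.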

A critical step in the SWATH strategy is the availability of a good spectral library for peptide extraction. Public libraries are available for general purposes; however, users usually need to construct their own sample- and platform-specific libraries [135]. Generally, more comprehensive libraries lead to more efficient extraction of information from SWATH data. Extensive fractionation is usually needed to increase the number of identifications [106]. A discovery MS2 library can also be used across platforms [136]. A library constructed in-house on the same platform in DDA mode is usually the best choice, because it contains retention time information, which is critical for correct peptide matching and filtering. There are two ways of examining SWATH data: de novo MS2 spectral extraction and library search [137]. The de novo strategy extracts MS2 spectra by grouping fragment ions with the precursor ions that share the same elution profile. The grouped fragments, together with the precursor m/z, can then be used for peptide sequence identification by database search. Because of the complexity of the SWATH acquisition mode, the quality of the extracted MS2 spectra is compromised, and proper control of false positives is therefore the main concern. The library search strategy is more conservative, because it only looks for peptides that are in the library. Spectral quality is no longer a major concern if retention time and mass accuracy are well controlled. The iRT strategy makes good use of the conserved elution order of peptides, using one dimensionless score to express retention relative to a set of standard peptides [138].
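
The use of a dimensionless retention score can be sketched as a linear calibration between the retention times observed for a set of standard peptides in one run and their reference values. This is a minimal sketch assuming a linear relationship; the reference values, tolerance and function names below are hypothetical and are not taken from the cited work.

```python
def calibrate_irt(observed_rt, reference_irt):
    """Least-squares line so that irt = slope * rt + intercept for this run."""
    n = len(observed_rt)
    mean_x = sum(observed_rt) / n
    mean_y = sum(reference_irt) / n
    sxx = sum((x - mean_x) ** 2 for x in observed_rt)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(observed_rt, reference_irt))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_to_irt(rt, slope, intercept):
    """Convert a retention time from this run to the dimensionless scale."""
    return slope * rt + intercept


def irt_to_rt(irt, slope, intercept):
    """Predict where a library peptide should elute in this run."""
    return (irt - intercept) / slope


def within_rt_window(observed_rt, library_irt, slope, intercept, tol_min=2.0):
    """Accept a candidate peak group only if it elutes near the prediction."""
    return abs(observed_rt - irt_to_rt(library_irt, slope, intercept)) <= tol_min


if __name__ == "__main__":
    # Hypothetical standards: observed RT (min) and reference scores.
    slope, intercept = calibrate_irt([10.0, 20.0, 30.0], [0.0, 50.0, 100.0])
    print(rt_to_irt(25.0, slope, intercept), irt_to_rt(25.0, slope, intercept))
```

Because the library stores the dimensionless score rather than minutes, the same library entry can be reused after a gradient or column change by refitting only the two calibration parameters.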

BIOINFORMATICS AND DATA ANALYSIS

NEW SEARCH ENGINES: Although several classical search engines can accommodate virtually all types of data, new search engines are still being developed, such as Morpheus for high-resolution MS data, MS-GF (a universal database search tool) [139] and MSAmanda for high-accuracy data [140]. The improvements include higher speed and lower FDR, especially in PTM analyses. Recently Gygi et al. extended the idea of searching with a large allowed mass deviation to look for unknown PTMs [141].

BIOINFORMATIC TOOLS FOR PTMS: Because most PTMs occur at substoichiometric levels, modified peptides are less likely than their unmodified counterparts to be selected for fragmentation in most DDA experiments. Thus, the first step to improve mass spectrometric analysis of PTMs was to increase the selectivity of MS for modified peptides, or to dig deeper into the MS data to find evidence of PTMs. To achieve that, PTMeta was developed to find the mass difference between an unidentified precursor and a precursor identified by database search. Once the difference matched a known PTM in UniMod, the MS/MS spectrum of the unidentified precursor was selected for a further search with looser criteria [142]. In a similar way, Pascal et al. developed the Proteomics Workbench software to generate an inclusion list of peaks with potential modifications and used this list in a second LC-MS run for targeted identification [143]. Although they only demonstrated the approach for protein phosphorylation, it is promising for finding new modifications, as any PTM in UniMod could be included. For a particular modification, Iso-PeptidAce was developed to deconvolute spectra of
acetylated peptides mixed with isomeric peptides, based on their differences in fragmentation pattern [144]. The same principle should also work for other modifications, which would increase the confidence of PTM identification and quantification. All of these tools were developed for post-acquisition analysis, so information on PTMs may already be missing from the LC-MS data because of the lower abundance and ionization efficiency of modified peptides. It is therefore promising to select modified precursors during the LC-MS run, using tools for real-time decision making. As mentioned in the previous section, glycopeptide precursors can be selected when bromine atoms are incorporated into the glycans, because this gives the glycopeptides a unique isotopic pattern [80]. The same strategy might be helpful for other PTMs, particularly those without an efficient enrichment method.

Software has also been developed to reduce the ambiguity in site localization of PTMs, especially for those with strong neutral loss such as phosphorylation. LuciPHOr was developed to estimate the false localization rate of phosphorylation by using decoy phosphopeptides, which are generated by placing artificial phosphorylation(s) on non-candidate residues [145]. Characteristic fragments have also been used to improve the accuracy of PTM identification. PTM MarkerFinder extracts “marker” ion information from the MS/MS spectra of modified peptides identified by a search engine and uses these marker ions to validate the identified peptides [146]. The sensitivity of PTM identification might be increased with this approach by detecting the “marker” ions in spectra that were not identified directly by the search engine because of a lower score.
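
The mass-difference idea described above for finding candidate modified precursors can be sketched as a lookup of the precursor mass difference in a table of known modification masses. This is an illustration of the principle only and not the implementation of PTMeta or any other cited tool; the short table holds commonly quoted monoisotopic mass shifts, and the function name and tolerance are hypothetical.

```python
# Monoisotopic mass shifts (Da) for a few common modifications.
KNOWN_PTMS = {
    "Methyl": 14.015650,
    "Oxidation": 15.994915,
    "Acetyl": 42.010565,
    "Phospho": 79.966331,
}


def candidate_ptms(unidentified_mass, identified_mass, tol_da=0.01):
    """Names of modifications whose mass shift explains the mass difference.

    Both arguments are neutral precursor masses: one from a spectrum that the
    database search could not identify, one from an identified peptide.
    """
    delta = unidentified_mass - identified_mass
    return [name for name, shift in KNOWN_PTMS.items()
            if abs(delta - shift) <= tol_da]


if __name__ == "__main__":
    print(candidate_ptms(1079.9663, 1000.0000))
    print(candidate_ptms(1005.0000, 1000.0000))
```

A spectrum whose precursor yields a non-empty list would then be re-searched with that modification allowed and with looser criteria, or added to an inclusion list for a second, targeted LC-MS run.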

Since the identification of intact glycopeptides is not available in popular search engines such as MaxQuant, Mascot and SEQUEST, the development of software for site-specific analysis of glycosylation is attracting the interest of bioinformaticians. Although only a few software packages are commercially available for automatic identification of intact glycopeptides (Byonic, Simglycan, etc.), many in-house tools are being developed to improve the throughput and accuracy of intact glycopeptide identification. Recent progress has mostly been based on the development of new fragmentation techniques (ETD and HCD). In HCD, glycopeptides give a unique pattern in which the Y1 ion, which contains the peptide backbone and one N-acetylglucosamine, can be easily detected. Since the m/z is recorded with high mass accuracy, the molecular weight of the Y1 ion can be used to match deglycosylated peptides, which can be selected either from an in-silico digested protein database (GlycoMaster DB) [147] or from a dataset identified by database search [148-151]. These tools share similar procedures, including library building, spectra selection and matching, with small variations in the matching algorithm. Although this approach has enabled identification of up to 2000 intact glycopeptides, it has a major drawback: the only connection between the intact glycopeptides and the deglycosylated peptides is the molecular weight of Y1. When there are multiple potential matches in the list of deglycosylated peptides, it is difficult to exclude wrong matches. Thus, it is more accurate to match the Y1 ion to the list of deglycosylated peptides identified from the same sample with the same enrichment method and LC-MS system [150]. Moreover, the accuracy of the matching could be further increased by adding retention time as
a filtering criterion, because intact glycopeptides and deglycosylated peptides with the same peptide backbone have similar hydrophobicity. Besides the peptide sequence, the glycans of intact glycopeptides can be identified by a peptide-like database search against either an existing glycan database or an in-house library. The FDR of this approach has been evaluated, and an overall FDR of 3% was calculated [152]. However, since the matching of Y1 is not performed through a conventional database search, the calculated FDR might not be accurate, and an efficient approach to control the FDR is still needed, especially in larger scale studies of more complex samples.
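
The Y1-based matching and the additional retention time filter can be sketched as follows. This is a simplified illustration, not the algorithm of any cited tool. It assumes that the Y1 mass has already been converted to a neutral mass, that the candidate list stores neutral masses of the unmodified peptide backbone, and that 203.079373 Da is the monoisotopic residue mass of one N-acetylhexosamine. The tolerances, peptide names and function name are hypothetical.

```python
HEXNAC = 203.079373  # monoisotopic residue mass of N-acetylhexosamine, Da


def match_y1(y1_mass, y1_rt, candidates, ppm_tol=10.0, rt_tol=5.0):
    """Return candidate sequences consistent with a Y1 ion.

    y1_mass: neutral mass of the Y1 ion (peptide backbone + one HexNAc).
    y1_rt: retention time of the intact glycopeptide, in minutes.
    candidates: iterable of (sequence, neutral_mass, retention_time).
    """
    backbone_mass = y1_mass - HEXNAC
    hits = []
    for sequence, mass, rt in candidates:
        ppm_error = (backbone_mass - mass) / mass * 1e6
        if abs(ppm_error) <= ppm_tol and abs(y1_rt - rt) <= rt_tol:
            hits.append(sequence)
    return hits


if __name__ == "__main__":
    # Hypothetical deglycosylated peptides from the same sample and LC-MS run.
    peptides = [("PEPTIDEA", 1500.7000, 30.0),
                ("PEPTIDEB", 1500.7050, 55.0),
                ("PEPTIDEC", 1600.8000, 31.0)]
    y1 = 1500.7000 + HEXNAC
    print(match_y1(y1, 32.0, peptides, rt_tol=1e9))  # mass only
    print(match_y1(y1, 32.0, peptides, rt_tol=5.0))  # mass and retention time
```

In this example two candidates fall within the mass tolerance, and only the retention time filter separates them, which is the ambiguity discussed above when the Y1 mass is the sole link between the two peptide lists.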

STATISTICAL ANALYSIS: OPTIONS FOR DOWNSTREAM ANALYSES OF MS DATA: Statistical analysis, data filtering (error rate evaluation) and FDR/PEP/E-pvalue for identification: After generating a list of peptide spectral matches (PSMs) from any database search engine, the reliability of the dataset must be evaluated. We need to know which peptide identifications are significant and how credible they are. The false discovery rate (FDR) is the most commonly used concept and reflects the overall credibility of the identified PSMs [153]. It should be noted that the FDR cannot assess the credibility of a specific PSM that has been accepted as correct. If we focus on a specific peptide or protein for further biological validation, other metrics that evaluate a single PSM are needed. The posterior error probability (PEP) and the q-value are statistical scores associated with a specific PSM [154]. Superficially they are similar, because both relate to the likelihood that a PSM is incorrect. However, the PEP is an inherent feature of a PSM and represents the probability that the PSM is a random match, whereas the q-value depends on the entire dataset and indicates
the minimal FDR required to retain a particular PSM in the identification list [154]. Therefore, the appropriate statistical score depends on the specific experimental question. Validation of peptide identifications is a process of classifying PSMs as correct or incorrect and of reducing false positive and false negative identifications. The original scores provided by the search engine for each PSM usually cannot efficiently distinguish correct from incorrect matches, which results in false positive and false negative identifications. To obtain more identifications at a specific FDR, a more efficient scoring system is generally constructed. Gorshkov et al. constructed a scoring scheme with a multi-parameter score calculated from various features of the PSM (for example, mass difference, retention time difference, modifications and number of missed cleavages), and this score was subsequently used to filter PSMs that did not pass the identity threshold of the search engine [155]. Since classification is a common task in machine learning, various machine-learning algorithms have been used to address this problem [156]. Link et al. developed a support vector machine (SVM)-based algorithm called De-Noise, which uses a continuous refining process to distinguish incorrect from correct PSMs [157]. Compared with the well-established tool Percolator [158], this method used fewer features in the learning procedure and demonstrated improved performance. Degroeve et al. proposed a classifier named Nokoi as well as a decoy-free approach to evaluate the FDR [159]. In this method all the target and decoy PSMs come from the target database and no decoy database is needed. Considering that search engines usually provide multiple PSMs for one spectrum that are ranked by their scores,
basically only the top ranked PSM has a chance to be correct, so the lower ranked PSMs can be collected as a decoy dataset. A prebuilt model was used to distinguish correct PSMs at high speed. Noble et al. developed a strategy for computing exact p-values for XCorr, and the results from this method were complementary to those from Percolator [160]. Therefore, applying multiple validation methods may help to obtain more PSMs at a specific FDR.

Functional Analyses: Enrichment analysis: Thanks to the advent of high-throughput -omics technology, the scope of quantitative proteomics studies has broadened from single gene regulation to the entire biological system. Enrichment analysis is still under active study, with diverse outcomes in recent years, including algorithm refinement for statistical tests such as the Bayesian extension of the hypergeometric test [161], integrative platforms for multi-omics (including transcriptomics, proteomics, metabolomics and GWAS data) enrichment analysis such as iPEAP [162], and customizable result visualization tools such as FunRich [163]. However, it is unclear which method provides the most accurate results, and several limiting factors should be considered for proteomics. First, the quality of any enrichment result is highly dependent on the quality of the input. Thus, building an informative and fully annotated gene list is the first step towards an informative test result. For example, an appropriate FDR cut-off should be chosen for the protein identification filter [164]. Then, proteomics results should be fully annotated using functional annotations (e.g. GO terms [165] and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways [166]). Typically, protein identification results from MaxQuant are reported at the protein group level, while the
enrichment test is more informative when performed at the gene level, so it is advisable to expand the protein list into a gene list. An appropriate tool such as GOParGenPy [167] should then be used to complete the mapping process. Whether protein isoforms should be included in the annotation list should also be considered; otherwise annotations may be missed because of the gene identifier preferences of each tool. Second, because the statistical test for the enrichment of a particular functional term uses a background gene list, an unbiased and ‘universal’ background gene list is one of the critical factors affecting the statistical significance of an enrichment test. A general guideline for setting up the population background is to use the pool of genes that show expression activity within the scope of the user’s particular study [168]. However, a primary hurdle for this application is the high variability of proteomics data, which may arise from sample heterogeneity, experimental variance and inaccuracies in proteome discovery introduced by proteomics software [169]. These so-called ‘sampling biases’ render the background gene list ‘non-universal’, and the results are then over- and/or under-estimated. With a free tool such as topGO or DAVID, it is possible to correct the sampling bias by carefully setting the ‘universal’ background gene list manually [170,171]. Another possible solution is to use a statistical method that accounts for the non-uniformity of gene annotations, for instance Annotation Enrichment Analysis [172]. However, because this issue is mostly case dependent, we strongly advocate making the effort to generate an appropriate ‘universal’ background gene list for each enrichment analysis.
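
The dependence of an enrichment result on the background gene list can be made concrete with the standard one-sided hypergeometric test. The sketch below is a generic illustration and does not reproduce the behavior of topGO, DAVID or any other cited tool; the function name and all counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: number of genes in the background list.
    K: background genes annotated with the functional term.
    n: number of genes in the study list (drawn from the background).
    k: study-list genes annotated with the term.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(n, K) + 1))
    return favourable / total


if __name__ == "__main__":
    # Same study list (200 genes, 20 annotated) tested against two backgrounds.
    whole_genome = enrichment_pvalue(k=20, n=200, K=400, N=20000)
    expressed_only = enrichment_pvalue(k=20, n=200, K=300, N=5000)
    print(f"whole-genome background:  {whole_genome:.3g}")
    print(f"expressed-gene background: {expressed_only:.3g}")
```

With these hypothetical counts the term appears far more significant against the whole-genome background than against the pool of expressed genes, which illustrates why the background list has to match the scope of the study.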

In conclusion, functional enrichment analysis provides a systems level interpretation of high-throughput -omics data and has been widely used; however, the methodology for this kind of analysis is still under development. For example, besides the limiting factors of proteomics platforms, other uncertainties such as the incompleteness of gene annotation in the GO database may also lead to unexpected interpretations. Thus, we suggest careful examination of the data quality and of the suitability of the bioinformatic tools used, rather than applying different bioinformatics approaches until statistical significance is obtained.
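
Returning to the identification-level filtering described at the start of this section, the relationship between decoy counts, FDR and q-value can be sketched with a simple target-decoy calculation. This is a minimal sketch under common simplifying assumptions (a concatenated target-decoy search, FDR estimated as decoys divided by targets above each score threshold) and is not the method of any specific tool cited here; the function name and example scores are hypothetical.

```python
def target_decoy_qvalues(psms):
    """Assign a q-value to every target PSM.

    psms: list of (score, is_decoy) tuples, higher score = better match.
    Returns (score, q_value) for target PSMs, ordered from best to worst.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the minimal FDR at which this PSM is still retained, i.e. the
    # lowest FDR at this threshold or at any more permissive one.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(score, q) for (score, is_decoy), q in zip(ranked, qvalues)
            if not is_decoy]


if __name__ == "__main__":
    example = [(10, False), (9, False), (8, True), (7, False), (6, True)]
    for score, q in target_decoy_qvalues(example):
        print(score, round(q, 3))
```

The q-value is a property of the PSM within this particular dataset, whereas a PEP would require modeling the score distributions of correct and incorrect matches; this is the distinction drawn above between the two scores.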

NEXT LEAP OF PROTEOMICS

PROTEOGENOMICS: Proteogenomics, a term coined by Jaffe et al. [173], is a relatively new research area that combines genomic, transcriptomic and proteomic information for a systems level understanding of cellular physiology and pathophysiology. In a proteogenomic workflow, a customized database containing potential novel protein sequences is generated from genomic and transcriptomic sequence information, and mass spectrometry data are then searched against this database to identify novel peptides in addition to well annotated proteins or peptides. Integrating databases from genomics and transcriptomics with proteomics advances the goal of defining all proteins present, how they interact, and whether they are modified, which in turn provides information leading toward functional understanding. Genomics and transcriptomics alone cannot do this, as there is a well-established disconnect between gene expression, mRNA expression and its
correlation to protein levels. This has been documented in many studies, including recently by Zhang et al. [174]. Owing to the dramatic development of deep, high-throughput proteomics and next generation sequencing, proteogenomics is becoming available to more researchers and is being applied in many research areas, including cancer studies.

Call for Proteogenomics: Issues with Peptide Spectrum Matching (PSM): Despite tremendous advances in proteomic technologies, the persistent ‘Achilles heel’ of proteomics is that the MS/MS spectra generated by peptides are matched to theoretical spectra of peptides found in a given reference protein database. Therefore PSM is only as good as the database and software used for searching. The assumption is that the database is annotated correctly and contains a complete complement of all protein-coding products. However, this assumption is incorrect, because some proteins, for example protein variants and undiscovered proteins, may not be present in any database. Thus spectra inevitably remain unmatched and unaccounted for during analyses. These unmatched spectra affect downstream analyses such as quantification. Proteogenomics, by using a customized database that contains predicted or potential novel transcripts from DNA or RNA sequencing, can overcome this issue by rescuing the spectra or peptides that remain unmatched when searching against well annotated databases.

Customized Databases for Proteogenomics: In contrast to traditional proteomic PSM, proteogenomic analysis compares MS/MS spectra against a customized protein sequence database that consists of a proteomic-based reference protein sequence database with additional input from genomics and transcriptomics,
thereby accounting for the ‘missing’ novel proteins and variants. Advances in transcriptomics using RNA-seq technologies have increased the number of transcripts annotated and available in databases such as RefSeq and GENCODE. Databases with information on variants include NCBI’s dbSNP database [175], disease variants from the Online Mendelian Inheritance in Man database (OMIM) [176], and the Protein Mutant database [177]. Several proteogenomic databases are available, including dasHPPboard [178], OryzaPG-DB [179] and GenomewidePDB [180]. Some researchers create a customized database from genomic and transcriptomic data obtained from their own samples, following a dedicated protein-coding gene prediction. Packages available to generate customized databases include customProDB [181] and MSProgene [182]. Vermillion et al. used Galaxy-P, a multi-omic platform, to merge their RNA-seq and iTRAQ-based proteomic data in a study that identified proteins and pathways involved in cardiac tissue adaptation during hibernation in ground squirrels [183]. This approach allowed them to identify and quantify 2007 cardiac proteins, 350 of which were previously uncharacterized. Peppy is another software package, designed to perform each step of proteogenomics automatically, including construction of a peptide database from a genome, database search, and reporting of peptide identifications at a designated FDR [184]. The ability to identify novel proteins means that proteogenomics is particularly useful for less well characterized organisms. However, since proteogenomic databases are very large, they increase both the likelihood of erroneously matched spectra and the computation time. In addition, to fully comprehend the enormity of the data compiled from proteogenomics, further software development and integration of
bioinformatics from genomic, transcriptomic and proteomic studies are still required. Multiple platforms exist to deal with the multi-omics information flow including PGP185, NextSearch186, PGTools187, PPLine188, Proteoannotator189 and PROTEOFORMER190. Applications of Proteogenomics: Quest for missing proteins: One of the important applications of proteogenomics is to find novel peptides/proteins, including those result from 1) short open reading frames, 2) non-AUG transcription initiation starts, 3) alternative splicing, 4) sequence variants, 5) RNA editing and 6) previously considered non-coding RNA191. Along with the deep proteome development, more and more attention has been directed to dig for new proteins or missing proteins. Chromosome-Centric Human Proteome Project (C-HPP) is the main organization in charge of looking for missing proteins to fully characterize the human proteome, and the proteogenomic data integration is the most important approach they used for this purpose192,193. In a proteogenomic study by Chang et al under the framework of CHPP, they utilized transciptomics, translatomics, and proteomics to analyses three hepatocellular carcinoma cell lines194. While demonstrating that only 50.2% of the protein-coding genes with translation evidence in proteomics, they identified 324 new proteins with MS evidence when compare to three public databases (GPMDB, PeptideAtlas, and HPA). This study thus suggests the need for comprehensive survey of missing proteins and also the promising application of proteogenomics in this task. Cancer Research: Currently cancer research relies extensively on genomic and transcriptomic data. Cancer researchers find utility of proteogenomics since cancers

arise and progress through a series of gene variations and modifications. Because these alterations translate to the proteome level, they result in aberrant protein pathways, cellular signaling and function, ending in pathological proteotypes; without the integration of genetic information into proteomic database searches, however, these novel proteins, PTMs and variants would not be revealed. The Clinical Proteomic Tumor Analysis Consortium (CPTAC) strives to better understand and treat cancers by characterizing their unique proteomic signatures174. Zhang et al. used 95 colorectal cancer specimens from The Cancer Genome Atlas for proteogenomic characterization174. To identify novel amino acid variants, they searched against a customized database created from RNA-seq data from the same samples. This proteogenomic exercise allowed them to identify 796 single amino acid variations, 162 of which were novel. Using proteogenomics, their proteomic study defined unique features of cancer subtypes better than genomics alone. This illustrates how such an approach yields unique information about a given disease state that is not currently found in some proteomic databases. Some of these novel protein variants may serve as future targets in the development of diagnostic and prognostic tools.

Microbiota: Microbial communities can be studied with proteogenomics, termed metaproteomics or community proteogenomics (reviewed below), since the microorganisms in the community, particularly in complex communities such as the human gut microbiota, are poorly annotated and studied.
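
The customized-database strategy described above for variant discovery can be illustrated with a minimal Python sketch. It assumes a simplified trypsin rule (cleavage after K or R except before P, no missed cleavages) and uses a made-up reference sequence and a made-up variant; the function names are hypothetical and do not correspond to customProDB, MSProGene or any other tool cited here. The point is only that a sample-specific variant yields a peptide that is absent from the reference digest, so it can be matched only if the variant sequence is added to the search database.

```python
def tryptic_digest(sequence, min_length=6):
    """Split a protein into tryptic peptides (simplified rule, no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R unless the next residue is P.
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


def apply_variant(sequence, position, ref, alt):
    """Apply a single amino acid variant; position is 1-based."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def variant_peptides(reference, variants):
    """Return, per variant, the tryptic peptides absent from the reference digest."""
    reference_peptides = set(tryptic_digest(reference))
    novel = {}
    for position, ref, alt in variants:
        mutated = apply_variant(reference, position, ref, alt)
        label = f"{ref}{position}{alt}"
        novel[label] = [p for p in tryptic_digest(mutated)
                        if p not in reference_peptides]
    return novel


# Hypothetical reference protein and one hypothetical variant (G3D).
REFERENCE = "MAGTELVKDSSWQRLPEAGTIYK"
custom_entries = variant_peptides(REFERENCE, [(3, "G", "D")])
print(custom_entries)  # {'G3D': ['MADTELVK']}
```

In practice the variant list would come from the sample's own sequencing data, and the resulting entries would be appended to the reference database before the search; both steps are outside this sketch.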

METAPROTEOMICS, UNVEILING THE FUNCTIONS OF MICROBIAL COMMUNITY: The first metaproteomic study was published in 2004 by Wilmes and Bond195, who used two-dimensional gel electrophoresis to identify proteins from a sludge system for biological phosphorus removal. The field has since evolved to use affinity chromatography or ion exchange for fractionation, enabling increased throughput. The number of identified proteins has increased dramatically, from 3 proteins in the initial metaproteomic study to several thousand. Since 2004, community-level proteomics has investigated a plethora of sources, ranging from tooth root canals to seawater and from soils to the human gut. The human gut is a complex ecosystem involving the host, bacteria, archaea, and fungi. The importance of the gut microbiota is becoming increasingly apparent, since accumulating evidence has shown its close association with many diseases, both intestinal (including irritable bowel syndrome, ulcerative colitis and Crohn’s disease) and non-intestinal (including obesity, type 2 diabetes, and cardiovascular diseases)196,197. However, the mechanisms by which the gut microbiota interacts with the host remain unclear. Proteomics, with its ability to examine the functional properties of both host and microbiota, can help to better understand the complex host-microbe interactions in the gut. While proteomics can be readily used to study the host proteome, its application to the gut microbiota, namely gut metaproteomics, is still challenged by several issues, including high protein diversity, individual variability and dynamic range198. Fortunately, recent progress has significantly improved metaproteomic methodologies, facilitating the functional study of the gut microbiota.

Sample Preparation Considerations for Metaproteomic Studies: Microbiota compositional and functional profiling by metaproteomics is influenced by sample preparation methods, including pre-protein extraction processing and the protein extraction method itself199-201, as illustrated in Figure 3. The biological question under assessment and the microbiota source dictate the sample preparation method utilized. There is no single standard protein extraction method for metaproteomics for any given microbiota source. Several studies have compared protein extraction methods and their impact on metaproteomic analysis and biological interpretation, as described below199-201. Pre-protein extraction methods, such as sample enrichment by differential centrifugation, have been applied to deplete host cells and thereby increase microbial protein identification. Xiong et al. proposed an alternative microbial enrichment method for infant stool samples, for use when the amount of starting material is limited202. This approach increased the number of microbial protein groups identified two-fold. Briefly, the method entails enrichment of the microbial biomass in a first step by size exclusion, followed by homogenization to lyse the human cells (as the bacterial cell walls are more robust), and a final step of bacterial cell capture on a filter. The biological question under investigation is an important consideration when designing the sample processing methodology; for example, the preceding approach would not be suitable for assessing the host-microbe interaction, as the host proteome would be incomplete due to the depletion of human proteins. Host depletion could be applicable in investigating host-microbe interactions if another sample is prepared in parallel in which the host proteins are

not depleted. Pre-protein extraction processing can also influence the

metaproteomic taxonomic and functional identification, as exemplified by Tanca et al., who either subjected human stool samples directly to protein extraction or processed them by differential centrifugation (DC) prior to protein extraction201. This differential processing of stool resulted in significant differences in abundance for all microbial phyla, and in the Firmicutes/Bacteroidetes ratio. Alterations in this ratio have been associated with metabolic disorders203, and therefore an accurate representation of the abundance of these microbes is essential to understand the host-microbe interaction. Alternative pre-protein extraction methods should be tested prior to their experimental implementation. Another important consideration when performing metaproteomics is the choice of protein extraction method199,200. Given the wide range of sample sources under metaproteomic investigation (e.g. biological fluid/tissue, soil, marine biofilm), the protein extraction method will depend on the nature of the sample. Leary et al. illustrated the importance of this choice by comparing three commonly used protein extraction methods for marine biofilm (guanidine hydrochloride, bacterial protein extraction reagent (B-PER) and sequential citrate-phenol)204. They observed a negligible overlap in protein sequence and function among the three methods. In addition, different protein families were enriched depending on the extraction method used. Guanidine hydrochloride enriched for proteins related to photosynthesis, carbohydrate metabolism, protein translation and carbon fixation; B-PER enriched for membrane transport and oxidative stress proteins; while extraction by sequential citrate-phenol enriched for

calcium binding and structural proteins. Therefore, the extraction method chosen can greatly influence the community function inferred, potentially resulting in an inaccurate representation. Differences due to extraction method are most likely to be observed when the protein extraction yield is low. When designing a metaproteomic study, several protein extraction methods should be tested and the protein extraction yield calculated.

In the case of low protein extraction yield, several methods may be necessary to avoid inaccurate biological interpretation. Taken together, these

studies demonstrate the importance of sample preparation methods for metaproteomics, and show that these procedures should be carefully considered during experimental design and are dictated by the biological question under investigation.

Challenges in Microbial Peptide Identification and Protein Inference: Current bottom-up proteomics is based on proteolytic digestion of proteins into peptides, followed by LC separation and tandem mass spectrometry detection. The peptide sequences can then be predicted from the MS/MS spectra and the proteins inferred from the identified peptide sequences. A pre-built database is required for peptide identification through peptide-spectrum matching (PSM) at a designated false discovery rate (FDR). However, for gut metaproteomics, database selection and FDR estimation remain important challenges. The human gut microbiota represents one of the world’s most complex microbial communities and contains >1,000 species and ~9.9 million genes205. More importantly, the individual variability of the human gut microbiome is extensive206, which makes database selection more difficult, mainly because of the resulting database size. Currently, there are several online databases available that can be used for metaproteomic

database search, including human/mouse gut microbial gene catalogs205,207,208, HMP reference genomes (http://www.hmpdacc.org/), or even the NCBI and UniProtKB proteome databases. A matched metagenome has also been proposed as a suitable reference database for metaproteomic studies209; however, it is still too large to suit a target-decoy database search strategy210 and also requires expensive metagenomic sequencing. To overcome the problems related to database size, a two-step database search approach was recently proposed, which largely improves the sensitivity of peptide identification for metaproteomics211. Briefly, in the first step, all of the spectra are searched against a large database that covers all the potential proteins (for example, the gut microbial gene catalogs mentioned above) to generate a refined, smaller database. The second search is then performed against a target-decoy database derived from the refined one. Using this strategy, two-fold more high-confidence peptide sequences can be obtained; moreover, it can be applied in both metaproteomic and proteogenomic analyses211. As an alternative, de novo sequencing, which is database independent (peptide sequences are inferred directly from MS/MS spectra)212, could also be used for metaproteomics, although it is not widely used in metaproteomic studies. Muth et al. reported that de novo sequencing could reach an identification rate of up to 60%, compared to less than 10% for a database-based approach. Only ~25% of the peptides identified by de novo sequencing overlapped with those from database searching213, indicating that the quality of de novo sequence inference requires further improvement. However, de novo sequencing can be performed as a complementary

method to database search, since it could potentially retrieve sequences not included in the database, which is a common issue in metaproteomics. One outcome of proteomics is a list of identified proteins and their expression levels for

comparison between different samples. Alternatively, peptide-centric

approaches have been proposed for better interpretation of shotgun proteomic data214. Protein inference is difficult in bottom-up proteomics, and the issue is accentuated in metaproteomics, since different bacteria may have different isoforms of similar proteins that still share many peptides. This renders the quantification of proteins less accurate, and thus more severely affects taxonomy-level quantitative analysis, as described below.

Taxonomic Assignment of Peptides and Proteins: A complex mixture of multiple organisms, including different species (for example, the gut microbiota), can be reliably interrogated by DNA sequencing techniques. While metagenomics and metatranscriptomics provide the potential functional capacity, metaproteomics reveals the actual functional expression; however, taxonomic assignment by metaproteomics remains challenging. This is because a peptide may be shared by different proteins and, furthermore, a protein may be shared by multiple taxa215. Peptides that are unique to a specific taxon have been used for taxonomic analysis of metaproteome data by tools such as Unipept216 and MEGAN217. Briefly, the occurrences of a tryptic peptide are first looked up in a database such as UniProtKB, and its taxon specificity is then determined from these occurrences using a lowest common ancestor algorithm. The relative abundance of a specific taxon is usually estimated by summing all matched peptides or spectral counts.
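
The lowest common ancestor step and the spectral-count summation just described can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation used by Unipept or MEGAN: the lineage table is a hand-written, simplified four-rank taxonomy, the peptide-to-taxon lookup stands in for a real database query, and all peptide names and counts are invented.

```python
from collections import Counter

# Toy lineages (root to species); a real analysis would use a full taxonomy.
LINEAGES = {
    "Bacteroides fragilis": ["Bacteria", "Bacteroidetes", "Bacteroides",
                             "Bacteroides fragilis"],
    "Bacteroides vulgatus": ["Bacteria", "Bacteroidetes", "Bacteroides",
                             "Bacteroides vulgatus"],
    "Faecalibacterium prausnitzii": ["Bacteria", "Firmicutes", "Faecalibacterium",
                                     "Faecalibacterium prausnitzii"],
}


def lowest_common_ancestor(taxa):
    """Return the deepest rank shared by the lineages of all given taxa."""
    lineages = [LINEAGES[taxon] for taxon in taxa]
    lca = None
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break  # lineages diverge at this rank
        lca = ranks[0]
    return lca


def taxon_spectral_counts(peptide_hits, spectral_counts):
    """Sum spectral counts per LCA taxon; peptides without hits are skipped."""
    totals = Counter()
    for peptide, count in spectral_counts.items():
        taxa = peptide_hits.get(peptide)
        if not taxa:
            continue
        totals[lowest_common_ancestor(taxa)] += count
    return totals


# Hypothetical database occurrences and spectral counts for three peptides.
peptide_hits = {
    "PEPTIDE_A": ["Bacteroides fragilis"],
    "PEPTIDE_B": ["Bacteroides fragilis", "Bacteroides vulgatus"],
    "PEPTIDE_C": ["Bacteroides fragilis", "Faecalibacterium prausnitzii"],
}
spectral_counts = {"PEPTIDE_A": 5, "PEPTIDE_B": 3, "PEPTIDE_C": 2}

totals = taxon_spectral_counts(peptide_hits, spectral_counts)
print(dict(totals))
# {'Bacteroides fragilis': 5, 'Bacteroides': 3, 'Bacteria': 2}
```

Only PEPTIDE_A contributes to a species-level count here; the two shared peptides are pushed up to the genus and kingdom levels, which shows in miniature how little signal can remain at the species level.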

However, as demonstrated by a study of a lab-assembled microbial mixture218, none of these methods can obtain accurate relative abundance information for taxa. This may be because of the small number of unique peptides for a specific taxon, particularly at the species level, and because shared peptides are not considered in the calculation of relative abundance. Thus, further work, particularly at the bioinformatics level, is required to rectify such problems and achieve accurate relative abundance information for different taxa.

Functional Interpretation: Although gut microbiota studies have been largely promoted by next-generation sequencing techniques, the functions of most of the bacteria are still unknown. Among the 9.9 million predicted human gut microbial genes, only 40% and 60% could be mapped to the KEGG and the evolutionary genealogy of genes: Non-supervised Orthologous Groups (eggNOG) databases, respectively205. This makes the functional annotation of gut metaproteomic data more difficult. Currently available methods and software for functional annotation have been described and reviewed previously215. Since most gut microbial proteins are not included in the reference databases of pathway analysis tools, the identified protein sequences first have to be mapped to databases such as Clusters of Orthologous Groups (COG)219 and eggNOG220. It is worth noting that the COG database was released in 2003 and updated in 2014 to a version that includes all the known genera of Bacteria and Archaea221. In addition, the STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) database is a well maintained, commonly used and regularly updated database and web resource, allowing

protein-protein interaction analysis222. The best matches can then be used for pathway analysis in tools such as DAVID223 and iPath224.

Quantitative Metaproteomics: The first gut metaproteomic study was published in 2007 and used 2-D gel electrophoresis for quantitation followed by protein identification with MALDI-TOF225. In 2009, Verberkmoes et al.226 reported the first shotgun metaproteomic study of the gut microbiota, using label-free quantification (LFQ) based on the normalized spectral abundance factor (NSAF)227. The accuracy of LFQ suffers greatly from separate sample preparation (e.g., proteolytic digestion, desalting, fractionation) and separate mass spectrometry runs. However, LFQ is currently the method of choice for gut metaproteomics because of its ease of execution and, more importantly, because there are currently no applicable labeling-based techniques. A recently proposed algorithm, MaxLFQ116, largely improved the LFQ approach and can thus be applied to gut metaproteomic studies. SILAC and SILAM (stable isotope labeling in mammals) are currently the most widely used quantitative strategies in proteomics; however, they are difficult to apply to metaproteomic studies because bacteria can synthesize most amino acids228. Instead, full labeling of bacteria (such as 15N or 13C labeling) represents a promising approach for bacterial studies229,230. However, the gut microbiota is a very complex microbial community and most gut microbes are not cultivable231, so it would be difficult to generate a SILAC- or SILAM-like representative reference for gut metaproteomic studies. SILAM mice should have a well-labeled gut microbiota that could be used as a reference for

quantitative studies232, though further testing is still required. In addition, the extent to which the SILAM diet changes the gut microbiota is unknown, which may largely influence its performance as a spike-in reference. Moreover, to obtain an appropriate reference for human microbiota studies, it may be possible to transplant the human gut microbiota into germ-free animals232, followed by labeling through SILAM diet feeding233. However, more work is required to test the feasibility and performance of this approach. Chemical labeling methods such as dimethyl labeling have the potential to be applied in gut metaproteomic studies. For example, in dimethyl labeling234, a spike-in reference can be generated by mixing all the samples followed by heavy dimethyl labeling, and the pooled reference can then be spiked into individual samples labeled with either light or intermediate dimethyl labels. With this procedure, the variance introduced by the mass spectrometry analysis is largely reduced, thus improving quantification accuracy. While this might be a good choice for small-scale experiments with a limited number of samples, it is still challenging for large-scale clinical studies because of the huge individual variability in human gut microbiota composition. In addition, dimethyl labeling experiments should be performed carefully to avoid exposure to the toxic materials involved234.

Proteomics for Understanding Host-Microbe Interactions: Although the microbiome has emerged as an important player in the development of several different types of human diseases, the mechanisms that lead to these aberrant host-microbe interactions remain mostly unknown235. Uncovering the complex interplay between the host and its associated microbiota entails combining several layers of biological

information, namely metagenomics, metatranscriptomics, metaproteomics and metabolomics for the characterization of the microbiota, in addition to host-centric assessment of this biological information. During the past few years, researchers have begun to integrate multi-omic strategies, including proteomics, to study the host-microbe interaction. Pérez-Cobas et al. studied the effects of β-lactam therapy on the human fecal microbiota in what is, to our knowledge, the most comprehensive multi-omic study, including 16S rRNA and 16S rDNA sequencing, metagenomics, metatranscriptomics, metametabolomics and metaproteomics236. In this proof-of-concept study, they explored the effects of antibiotics on the gut microbiota and also the host response to the antibiotic-induced gut microbial changes. The study demonstrated that antibiotic treatment resulted in significant alterations at the level of both host metabolism and the microbial community, and that these effects were much more dramatic than anticipated. In a mouse study, Deatherage Kaiser et al. also explored the complex molecular interactions between host, pathogen, and the microbiome during Salmonella infection by integrating proteomics (host and microbiota), metabolomics, glycomics, and metagenomics237. They revealed that Salmonella thrived in the gut through increased fucose utilization, induced intestinal inflammation, disrupted the microbial community, and altered the intestinal metabolites that nourish commensal bacteria in the gut. Another study of the host-microbe interaction comes from Lassek et al. in 2015, who used a metaproteomic strategy combined with 16S rRNA gene sequencing to investigate the interplay between the virulence of specific microbes in catheter biofilm-associated urinary tract infections and the

host immune response238. The authors observed protein differences that reveal how the host restricts iron (increased lactoferrin expression), thereby hindering pathogen survival, and how the pathogen counters with iron-acquisition strategies to overcome the limited iron source (a lactoferrin-degrading protease). Most current multi-omic studies are a simple combination of the results observed from different omic data. Integrating multi-omic datasets remains a challenge, although several integrative tools have already been implemented, such as Taverna239, Galaxy240, and KNIME241, which are all platforms that enable tools to be combined into complex analysis pipelines. Additional software tools and pipelines are needed that are easy to use, accessible and practical, and that facilitate combining multi-omic data in order to shed light on host-microbe interactions. In summary, more attention should be given to the interactions or associations between host and microbe by performing trans-kingdom analysis. As also mentioned in a few of the above studies, we propose a proteomic-centric workflow for understanding host-microbe interactions (Figure 4), since it enables the evaluation of the biological functional unit from both the human and the microbes. Briefly, samples including stool, biopsy or mucosal-luminal interface (MLI) aspirates242,243 could be collected and subjected to further sample pre-processing for enrichment of either host or microbial cells/proteins. The pre-processed samples are then used for protein extraction, proteolytic digestion and peptide fractionation/separation, and analyzed by mass spectrometry to generate

metaproteome or host proteome datasets. Several bioinformatic software packages or platforms are available for proteome or metaproteome data processing and interpretation, including the widely used MaxQuant and Perseus. In the meantime, all other omic strategies (including metagenomics, metatranscriptomics, and metametabolomics) could also be incorporated into the framework to provide a more comprehensive overview, since the current metaproteomic approach lacks the deep community profiling of low-abundance gut microbial species that is afforded by sequencing techniques. Metagenomes and metatranscriptomes can also provide the matched database for metaproteomic data analysis, which enables improved protein identification209. Finally, and most importantly, a trans-kingdom co-variation analysis should be performed to identify co-varying human or bacterial proteins, genes, pathways or metabolites, which might represent the potential key players linking the host and microbes. Unraveling the mechanisms of host-microbe interactions will guide therapeutic intervention strategies and aid in treating diseases associated with alterations of the microbiota.
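
One simple way to picture the trans-kingdom co-variation analysis proposed above is a pairwise correlation between host and microbial protein abundances across the same set of samples. The Python sketch below is an assumption-laden illustration only: the text does not specify a statistic, so Pearson correlation with an arbitrary threshold is used here as a stand-in, and the protein names and abundance values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long abundance vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / (sd_x * sd_y)


def covarying_pairs(host, microbe, threshold=0.9):
    """List (host, microbial, r) pairs whose |r| meets the threshold."""
    pairs = []
    for host_name, host_values in host.items():
        for microbe_name, microbe_values in microbe.items():
            r = pearson(host_values, microbe_values)
            if abs(r) >= threshold:
                pairs.append((host_name, microbe_name, round(r, 3)))
    return pairs


# Hypothetical abundances measured in the same five samples.
host_proteins = {
    "host_protein_A": [1, 2, 3, 4, 5],
    "host_protein_B": [5, 1, 4, 2, 3],
}
microbial_proteins = {
    "microbial_protein_X": [2, 4, 6, 8, 10],
    "microbial_protein_Y": [3, 3, 1, 2, 2],
}

pairs = covarying_pairs(host_proteins, microbial_proteins)
print(pairs)  # [('host_protein_A', 'microbial_protein_X', 1.0)]
```

With thousands of proteins on each side, such a screen generates a very large number of comparisons, so a real analysis would need multiple-testing correction and probably a rank-based or compositional statistic; those choices are beyond this sketch.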

PROTEOMICS AND CLINICAL APPLICATIONS: Proteomics has contributed greatly to understanding disease pathogenesis. Continuing advances in proteomics (as reviewed above) also hold great promise to revolutionize how we detect disease, design and best apply therapies, and monitor disease progression or remission. The ability to reliably identify biomarkers to detect and monitor a particular disease, and to choose the protein target(s) for therapy, would significantly and positively impact health care costs, both human and monetary. Below we review

progress in these clinically relevant proteomics applications, of which biomarker discovery continues to lead in terms of publications and, as discussed, is being vetted more and more vigorously. These tools, although not yet commonplace in the clinic, are now more tangible than ever.

Biomarkers: Proteins found in bodily fluids or tissues that are indicative of changes due to disease or treatment represent an opportunity as biomarkers for disease prediction, diagnosis, prognosis and monitoring of therapy. Because of the ability to assess thousands of proteins, and post-translational modifications, in multiple samples in a relatively short time, the application of proteomic approaches to identifying disease-specific biomarkers has rapidly increased; in the last 3 years, ‘biomarker’ is represented in 16% of published proteomic studies. However, this is an over- or misrepresentation of the number of clinically relevant biomarkers identified by proteomics. Many studies have simply profiled proteomes from different disease states and suggested that a relative change in expression of a given protein represents a biomarker candidate. These bold claims have prematurely inflated expectations and left a negative impression of the positive role that proteomics can play in biomarker identification when executed with appropriate design (Figure 5) at both the discovery and verification stages, including (1) the use of appropriate numbers and types of patients and controls, (2) specified sampling and processing methods, (3) robust analysis, and (4) an independent population for biomarker validation244,245. In contrast to previous years, there is a rise in the number of proteomics studies that are performed in this manner and that have greater clinical potential.

An important first step in experimental design is the selection of a relevant control population for comparison. A number of candidate biomarkers show overlap between multiple diseases246, for example because the protein is elevated in response to general inflammation. Despite high sensitivity, the specificity of such markers is reduced. While a comparable disease counterpart may be difficult to obtain (due to ethical considerations or sample availability), comparing diseased samples with samples from healthy controls will yield an abundance of altered proteins. Several groups have employed model systems at the discovery stage, wherein multiple controls can be utilized, prior to verification of candidate biomarkers in clinical samples. To further overcome the issue of low sensitivity, there is a trend toward identification of biomarker panels/signatures rather than a single protein. In an initial study247, Chaker et al. compared the secretomes of non-aggressive and aggressive thyroid carcinoma

cell lines. Initially, candidate biomarkers were verified by

immunoblotting and ELISA of patient serum, and were then validated in a follow-up study248 with an increased number of patients by immunostaining, wherein a panel of two proteins (PGK1 and Galectin-3) was sufficient for distinguishing benign from malignant thyroid nodules, as well as a subset of indeterminate nodules. In an effort to identify diagnostic biomarkers of lung cancer, Birse et al.249 performed discovery experiments using lung tissue biopsies (paired malignant vs. normal), lysates of non-cancerous vs. cancerous lung cell lines, and the secretome from cell lines. Verification was performed on a subset of candidate biomarkers and on three reference markers, characterized elsewhere, by ELISA using serum from patients with lung

cancer and from smoker controls. Ultimately, a panel of 8 markers was used to develop a multi-marker model, which was then verified in an independent cohort of patients and controls. The biological tissue or fluid in which biomarkers are assessed depends on the disease. Biopsy samples are sometimes used as a source of biomarkers since they are representative of the local diseased tissue rather than a systemic response. Recognizing the variation that can occur with biopsy sampling, Shipitsin et al.250 designed their study to simulate sampling error. By combining data from areas with both high- and low-grade prostate cancer, they developed a panel of 12 proteins that could predict aggressiveness and lethal outcome of disease. While serum offers a relatively non-invasive source of biomarkers, shotgun proteomics has not been able to identify biomarkers of disease because of the dynamic range of proteins in blood. To overcome this, post-translationally modified proteins have been targeted at the discovery stage, with later successful verification. Surinova et al. utilized glycoprotein-enriched tumor lysates to identify diagnostic251 and prognostic252 markers of colon cancer, which were later verified in plasma. Mebazaa et al.253 applied the combined fractional diagonal chromatography (COFRADIC) approach to analyze the plasma N-terminome of patients with acute heart failure, identifying quiescin Q6 in addition to three established markers. Quiescin Q6 was validated in independent patient cohorts and in a rat model253, and is now thought to contribute to tumor invasion254. Given the extensive amount of data acquired from proteomic analysis, winnowing data to a subset of biomarker candidates is a significant challenge for which there is

currently no standardized approach. In the search for thyroid cancer biomarkers, Chaker et al selected candidate biomarkers for further verification based upon tumorigenic pathways and known biological functions247. Similarly, Shipitsin et al prioritized prostate cancer candidates by tumorigenic relevance and by antibody availability; ultimately, univariate and multivariate analyses were applied for the final selection of biomarkers250. While pathway analysis of biomarker candidates is a mechanism to incorporate biological relevance, the underlying assumption is that the pathogenesis of disease is fully understood and that the potential roles of all proteins in both homeostatic and pathological states are known; thus a bias is placed upon the discovery-based proteomics approach. In the case of heart failure, the biomarker quiescin Q6 would have been eliminated from further investigation had this approach been taken instead of statistical analysis, namely Significance Analysis of Microarrays253. Univariate analysis (Student’s t-test) has been applied to identify candidate biomarkers from urine for pancreatic ductal adenocarcinoma255 and idiopathic nephrotic syndrome256. More robust analyses have utilized univariate approaches as a first filter, followed by multivariate approaches to identify panels of proteins with prognostic251,257-259 or diagnostic potential252,255,260. For example, De Marchi et al257 utilized Student’s t-test to identify and rank proteins differentially expressed in patients with good vs. poor outcome after tamoxifen treatment of breast cancer, and then performed step-down multivariate analysis (Cox regression) to identify a panel of 4 proteins that was predictive of tamoxifen treatment outcome in a separate cohort of patients. To compare univariate, semi-multivariate, and
multivariate approaches for biomarker discovery, Christin et al261 utilized a spiked urine data set and found that the best approach depends upon sample size. However, with the intent of transitioning into the clinic, a multivariate approach that limits the number of false positives and provides high precision is best applied. An important aspect of each of these studies is that, following candidate biomarker identification, the performance characteristics of proteins/panels (namely receiver operating characteristics) were evaluated in a second, independent patient cohort262.

The transition from discovery and validation to the clinic is not an easy one, nor one readily tracked. VeriStrat is a clinically utilized 8-peak MALDI signature of serum with utility in predicting outcome for patients with metastatic non-small-cell lung cancer263, which, more than 8 years after its identification, continues to be evaluated for utility in both prediction and prognosis264-267. Similarly, OVA1 took over 10 years to be applied as a diagnostic test for ovarian cancer268. Following the rapid growth of proteomics studies that have profiled diseases, and with improved study design, it is anticipated that more proteomic-identified biomarkers will be validated in prospective, blinded multicentre studies, transitioning candidate biomarker proteins/panels269 toward the clinic.
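
The biomarker selection workflow discussed in this section (a univariate first filter, then evaluation of receiver operating characteristics in an independent cohort) can be illustrated with a minimal Python sketch. It ranks candidate proteins by a pooled-variance Student's t statistic and computes the area under the ROC curve for a score. The protein names, abundance values and function names are hypothetical and are not taken from the cited studies, and the multivariate step (for example Cox regression) is deliberately omitted.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def rank_candidates(cases, controls):
    """Order proteins by |t|, largest first (the univariate first filter)."""
    t = {p: student_t(cases[p], controls[p]) for p in cases}
    return sorted(t, key=lambda p: abs(t[p]), reverse=True)


def roc_auc(case_scores, control_scores):
    """Area under the ROC curve: chance that a case outscores a control (ties count 0.5)."""
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in case_scores
        for n in control_scores
    )
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical discovery cohort: protein -> abundance per patient.
discovery_cases = {"P1": [5.0, 6.0, 7.0], "P2": [1.0, 2.0, 3.0]}
discovery_controls = {"P1": [1.0, 2.0, 3.0], "P2": [1.0, 2.0, 3.5]}
ranked = rank_candidates(discovery_cases, discovery_controls)

# Hypothetical independent cohort: scores of the top-ranked protein.
auc = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
print(ranked, round(auc, 3))
```

Computing the AUC on a cohort that played no part in ranking the candidates is the point of the second step: a protein selected and evaluated on the same patients would look better than it is.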

Drug Design: Proteomic platforms can be designed to identify aberrant proteins and pathways in disease, not solely as biomarkers but also as potential drug targets, to monitor and validate mechanism of action during drug design, to monitor ‘off-target’ effects that warn of potential unwanted side effects, and to screen drugs for efficacy. Knowing which proteins and pathways a drug affects can provide
invaluable insight into drug effectiveness and toxicity. Proteomic-based monitoring, when carried out early and thoroughly, can warn of potential negative side effects, saving the industry and health care systems considerable costs. To search for a therapeutic target, Subbannayya et al carried out differential, quantitative MS analyses of four gallbladder cancer (GBC) cell lines that range from non-invasive to invasive270. Of the 3653 proteins they identified, 31 were upregulated and 61 were downregulated in the three invasive cell lines. One highly upregulated protein, macrophage migration inhibitory factor (MIF), was chosen for further analysis due to its reported role in tumor cell proliferation. In immunohistochemical validation, 72% of gallbladder adenocarcinomas stained strongly for MIF, while 62% of cholecystitis tissues stained weakly. They also examined its contribution to invasiveness in GBC cells and concluded that MIF is a potential therapeutic target whose inhibition would slow the progression of GBC.

Decision Making in Health Care: The gap between proteomics and clinical medicine is still present but will decrease as more rigorously designed clinical studies are coupled with proteomics. Surinova et al252 asked the question: can we use proteomics to decide whether or not to aggressively treat stage-2 and stage-3 colorectal cancers (CRC) with chemotherapy? Presently, the decision to treat is based on staging, but not all patients at a particular stage respond to the treatment, which is aggressive and costly. If one can predict outcome, in this case who will benefit from chemotherapy, better stratification of treatment can be developed. Surinova et al used a well-documented cohort of stage 1-4 CRC patients in a proteomic screen of plasma glycoproteins to identify candidate proteins that were predictive
of clinical outcome. They then validated their markers in a clinical cohort for which records of outcomes were available, using the bioinformatics tools described above in the biomarker section. They found that they could accurately predict outcome in 70% of the cases when using 6 biomarkers together with other clinical characteristics, which was significantly better than using clinical characterization only.

The Future and Personalized Medicine: Personalized medicine is based upon the recognition that ethnic groups, and indeed individuals, respond differently to given drugs or drug doses. We are now reaching an exciting phase in which we can make individualized predictions of risk for disease or, alternatively, protection from disease. These individualized medical fingerprints can affect treatment evaluation and options; indeed, if an individual’s proteo-fingerprint in relation to disease is ignored, treatment success can be compromised. Proteomics and proteogenomics, as discussed above, hold great promise in personalized medicine. Also called precision medicine, this health care revolution is on the horizon. We are now beginning to understand, and have the capacity to compile, the necessary information on an individual’s phenotype that determines prediction and risk of disease, and personalized treatment. However, such individualized care does not come without costs, and it will be a challenge for health care systems to determine how to cover these costs and how to deal with the large data sets that the proteomic and proteogenomic fields now generate. Hospitals and health care centers will need to invest in this technology; that means building and staffing multi-omics facilities or contracting out, and investing in software
and information technology facilities. Data management will probably be the biggest hurdle, but continued technology developments will reduce costs. We expect omics data to teach us about a disease: how it began, how it progressed, and then how we should treat it. Moreover, we want to predict who is at risk for that disease and prevent it. The ability to delay or prevent disease, and the ability to determine the best treatment options, will no doubt offset the costs of personalizing medicine.

PERSPECTIVES

One of the challenges present in proteomics, which we have previously pointed to, is experimental design and data quality. Many publications are still based on too few biological replicates, leaving them underpowered to determine significance, which we believe is not acceptable and leads to dubious biological conclusions. These poorly designed proteomic reports are one of the causes of the poor acceptance of proteomics in the biological sciences. Assessing data quality remains an important challenge in proteomics, in particular for post-translational modifications. Anecdotally, we have reviewed several papers that reported a false discovery rate lower than 1% for peptide identification, but for which manual verification of the MS/MS spectra revealed a much higher rate of incorrect matches, including MS/MS spectra that are clearly not from peptides. Clearly, more research needs to be done to develop tools to properly assess data quality and the accuracy of search tools. However, the field of bottom-up proteomics is maturing. Many groups are doing very well-designed, large-scale omics studies. These continue to ‘raise the bar’ and set
new standards for others in the field. With this comes a growing appreciation of proteomics in mainstream biology/biochemistry. And despite the challenges above, data generation is faster and more reliable than ever. One of the bottlenecks now is data integration: how best to bring together these large data sets from multi-omics studies, what analyses should be done, and how should we interpret our findings? Bioinformatics will help bottom-up proteomics keep up in the era of systems biology.
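
The false discovery rates for peptide identification mentioned in this section are commonly estimated with a target-decoy strategy. This Perspectives text does not say how the reviewed papers estimated theirs, so target-decoy is an assumption here. The following minimal Python sketch shows how a score cutoff for a nominal 1% threshold might be chosen under that strategy; the scores, the decoy flags and the function names are hypothetical.

```python
def estimated_fdr(psms, threshold):
    """Estimate FDR among target matches scoring >= threshold as decoys / targets."""
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def score_cutoff(psms, max_fdr=0.01):
    """Lowest observed score whose estimated FDR is <= max_fdr, or None."""
    passing = [s for s in {score for score, _ in psms} if estimated_fdr(psms, s) <= max_fdr]
    return min(passing) if passing else None


# Hypothetical peptide-spectrum matches: (search score, matched a decoy sequence?)
psms = [(95, False), (90, False), (85, False), (80, True), (75, False), (70, True)]
cutoff = score_cutoff(psms)
accepted = [score for score, is_decoy in psms if score >= cutoff and not is_decoy]
print(cutoff, accepted)
```

A cutoff chosen this way is only a statistical estimate over the whole data set. It says nothing about whether an individual accepted spectrum is correctly assigned, which is consistent with the observation above that manual verification can still reveal incorrect matches.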

BIOGRAPHIES

Janice Mayne completed a B.Sc (Hon) and PhD in Biochemistry from Memorial University of Newfoundland and carried out post-doctoral studies at the Ottawa Hospital Research Institute. She is currently a Research Associate with Professor Daniel Figeys at the Ottawa Institute of Systems Biology and Department of Biochemistry, Microbiology and Immunology at the University of Ottawa. Her studies focus on the role of endoproteolyses in health and disease using a combination of biochemistry, cell biology and proteomic techniques.

Zhibin Ning obtained his B.S. degree in life science at Shandong Normal University, China in 2003. He received his Ph.D. degree in biotechnology and biochemistry from Shanghai Institutes for Biological Sciences in 2008 for the development and applications of liquid-based separation strategies for proteomics. He is presently a postdoctoral fellow at the Ottawa Institute of Systems Biology, University of Ottawa, under the guidance of Professor Daniel Figeys. He is focusing on technology development and applications in proteomics and lipidomics.

Xu Zhang completed a B.Sc degree in Biotechnology (2006) and M.Sc. degree in Microbiology (2009) at Lanzhou University, China. He obtained his Ph.D. degree in Microbiology from the Shanghai Jiao Tong University in 2013. He is currently a postdoctoral fellow at the Ottawa Institute of Systems Biology, University of Ottawa, under the supervision of Professors Daniel Figeys and Alain Stintzi. His research focuses on host-gut microbe interactions using multi-omic strategies
including proteomics/metaproteomics, metagenomics and metatranscriptomics, in the context of human disease including metabolic syndrome and inflammatory bowel disease.

Amanda E. Starr completed a B.Sc. degree in biology and M.Sc. degree in biomedical sciences both at the University of Guelph. She obtained her Ph.D. in biochemistry from the University of British Columbia in 2010 before working as a postdoctoral fellow under the supervision of Professor Daniel Figeys. Currently, she is a Research Associate at the Ottawa Institute of Systems Biology, University of Ottawa, where she is developing and applying proteomics techniques to study inflammatory disease.

Rui Chen completed a B.S. degree in chemistry at Wuhan University, China in 2006 and received his Ph.D in Analytical Chemistry from the Dalian Institute of Chemical Physics, Chinese Academy of Sciences in 2012 for the development of technologies for protein glycosylation analysis with mass spectrometry-based proteomics. He is currently a postdoctoral fellow at the Ottawa Institute of Systems Biology, University of Ottawa, under the guidance of Professor Daniel Figeys. He is focusing on technology development and applications of glycoproteomics.

Bo Xu completed a B.Sc. degree in Pharmacy at Liaoning Normal University, China in 2009. She obtained her Ph.D. degree in Analytical Chemistry from Dalian Institute of Chemical Physics, Chinese Academy of Sciences in 2015. She is currently working
as a postdoctoral fellow at the Ottawa Institute of Systems Biology, University of Ottawa, under the supervision of Professor Daniel Figeys. Her studies focus on technology development in PTM proteomics and its application in circadian biology.

Ming Wen completed an M.Sc. degree and his Ph.D. degree in bioinformatics, both at Sun Yat-sen University, China, in 2012. He is currently working as a postdoctoral fellow at the Ottawa Institute of Systems Biology, University of Ottawa, under the supervision of Professor Daniel Figeys, where he is focusing on software development for proteomics identification and quantification.

Kai Cheng completed a B.S. degree at Fudan University, China in 2008 and then obtained his Ph.D. degree in Analytical Chemistry from Dalian Institute of Chemical Physics, Chinese Academy of Sciences in 2015. He is currently working as a postdoctoral fellow at the Ottawa Institute of Systems Biology, University of Ottawa, under the supervision of Professor Daniel Figeys. His studies focus on bioinformatics.

Cheng-Kang Chiang received his B.Sc degree in the Department of Chemistry, National Dong-Hwa University in 2005 and obtained his PhD in the Department of Chemistry, National Taiwan University in 2010. He currently holds a postdoctoral position at the Ottawa Institute of Systems Biology, University of Ottawa, under the supervision of Professor Daniel Figeys. His research focuses on the development and application of microfluidic technologies in proteomics.

Shelley Deeke completed her B.Sc degree and M.Sc. degree in biochemistry both at the University of Ottawa. She is currently a PhD candidate at the Ottawa Institute of Systems Biology, University of Ottawa under the supervision of Professor Daniel Figeys where she is applying proteomic techniques to elucidate the proteomic alterations that occur with the onset of inflammatory bowel disease.

Deeptee Seebun completed a B.Sc degree in Biochemistry/Biotechnology at Carleton University in 2008. She is currently working as a Research Technician at the Ottawa Institute of Systems Biology, University of Ottawa with Professor Daniel Figeys. Her work focuses partly on technology development and applications in proteomics.

Alexandra Star completed her Honours B.Sc. in Science with Specialization in Biochemistry from the University of Ottawa in 2013. She is currently a M.Sc. Candidate under the supervision of Dr. Daniel Figeys in the Ottawa Institute of Systems Biology (OISB) at the University of Ottawa. Her work focuses on the application and optimization of a novel methylation enrichment technique in biological samples.

Jasmine I. Moore received an Advanced Diploma for Biotechnology technologist from Algonquin College in 2012. She then continued her education at Lakehead
University where she received an Honors B.Sc degree in Applied Biomolecular Sciences in 2014. Jasmine's honors thesis was completed at the Thunder Bay Regional Research Institute. Currently, Jasmine is a research technician in Professor Daniel Figeys' laboratory at the Ottawa Institute of Systems Biology, University of Ottawa.

Daniel Figeys is Head of and professor in the Department of Biochemistry, Microbiology and Immunology at the University of Ottawa and a Tier-1 Canada Research Chair in Proteomics and Systems Biology. Daniel obtained a B.S. and a M.Sc. in chemistry from the Université de Montréal. He obtained a Ph.D. in chemistry from the University of Alberta and did his postdoctoral studies at the University of Washington. Daniel’s research involves developing proteomics technologies and their applications in systems biology.

ACKNOWLEDGEMENTS

J.M. and Z.N. are co-first authors of this Review. D.F. acknowledges a Canada Research Chair in Proteomics and Systems Biology and funding from the Natural Sciences and Engineering Research Council of Canada (NSERC), Canadian Institutes of Health Research (CIHR), Genome Canada, the Province of Ontario and the J.L. Levesque Foundation.

REFERENCES

1. Wisniewski, J. R. & Mann, M. Analytical chemistry 84, 2631-2637, doi:10.1021/ac300006b (2012).
2. Wisniewski, J. R. & Prus, G. Analytical chemistry 87, 6861-6867, doi:10.1021/acs.analchem.5b01215 (2015).
3. Hughes, C. S., Foehr, S., Garfield, D. A., Furlong, E. E., Steinmetz, L. M. & Krijgsveld, J. Molecular systems biology 10, 757, doi:10.15252/msb.20145625 (2014).
4. Kulak, N. A., Pichler, G., Paron, I., Nagaraj, N. & Mann, M. Nature methods 11, 319-324, doi:10.1038/nmeth.2834 (2014).
5. Zaccaria, A., Roux-Dalvai, F., Bouamrani, A., Mombrun, A., Mossuz, P., Monsarrat, B. & Berger, F. Int J Nanomedicine 10, 1869-1883, doi:10.2147/IJN.S70503 (2015).
6. Fischer, R. & Kessler, B. M. Proteomics 15, 1224-1229, doi:10.1002/pmic.201400436 (2015).
7. Hosp, F., Scheltema, R. A., Eberl, H. C., Kulak, N. A., Keilhauer, E. C., Mayr, K. & Mann, M. Molecular & cellular proteomics : MCP 14, 2030-2041, doi:10.1074/mcp.O115.049460 (2015).
8. Binai, N. A., Marino, F., Soendergaard, P., Bache, N., Mohammed, S. & Heck, A. J. Journal of proteome research 14, 977-985, doi:10.1021/pr501011z (2015).
9. Horning, O. B., Kjeldsen, F., Theodorsen, S., Vorm, O. & Jensen, O. N. J Proteome Res 7, 3159-3167, doi:10.1021/pr700865c (2008).
10. Falkenby, L. G., Such-Sanmartin, G., Larsen, M. R., Vorm, O., Bache, N. & Jensen, O. N. Journal of proteome research 13, 6169-6175, doi:10.1021/pr5008575 (2014).
11. Li, J., Zhou, L., Wang, H., Yan, H., Li, N., Zhai, R., Jiao, F., Hao, F., Jin, Z., Tian, F., Peng, B., Zhang, Y. & Qian, X. The Analyst 140, 1281-1290, doi:10.1039/c4an02092h (2015).
12. Jiang, S., Zhang, Z. & Li, L. Journal of chromatography. A 1412, 75-81, doi:10.1016/j.chroma.2015.07.121 (2015).
13. Fang, P., Liu, M., Xue, Y., Yao, J., Zhang, Y., Shen, H. & Yang, P. The Analyst 140, 7613-7621, doi:10.1039/c5an01505g (2015).
14. Stevens, T. J. & Arkin, I. T. Proteins 39, 417-420 (2000).
15. Lin, Y., Wang, K., Liu, Z., Lin, H. & Yu, L. Journal of chromatography 1002, 144-151, doi:10.1016/j.jchromb.2015.08.019 (2015).
16. Lin, Y., Zhou, J., Bi, D., Chen, P., Wang, X. & Liang, S. Analytical biochemistry 377, 259-266, doi:10.1016/j.ab.2008.03.009 (2008).
17. Smolders, K., Lombaert, N., Valkenborg, D., Baggerman, G. & Arckens, L. Scientific reports 5, 10917, doi:10.1038/srep10917 (2015).
18. Ning, Z., Hawley, B., Seebun, D. & Figeys, D. J Membr Biol 247, 941-947, doi:10.1007/s00232-014-9668-6 (2014).
19. Berggard, T., Linse, S. & James, P. Proteomics 7, 2833-2842, doi:10.1002/pmic.200700131 (2007).

20. Morris, J. H., Knudsen, G. M., Verschueren, E., Johnson, J. R., Cimermancic, P., Greninger, A. L. & Pico, A. R. Nature protocols 9, 2539-2554, doi:10.1038/nprot.2014.164 (2014).
21. Marcon, E., Jain, H., Bhattacharya, A., Guo, H., Phanse, S., Pu, S., Byram, G., Collins, B. C., Dowdell, E., Fenner, M., Guo, X., Hutchinson, A., Kennedy, J. J., Krastins, B., Larsen, B., Lin, Z. Y., Lopez, M. F., Loppnau, P., Miersch, S., Nguyen, T., Olsen, J. B., Paduch, M., Ravichandran, M., Seitova, A., Vadali, G., Vogelsang, M. S., Whiteaker, J. R., Zhong, G., Zhong, N., Zhao, L., Aebersold, R., Arrowsmith, C. H., Emili, A., Frappier, L., Gingras, A. C., Gstaiger, M., Paulovich, A. G., Koide, S., Kossiakoff, A. A., Sidhu, S. S., Wodak, S. J., Graslund, S., Greenblatt, J. F. & Edwards, A. M. Nature methods 12, 725-731, doi:10.1038/nmeth.3472 (2015).
22. Roux, K. J., Kim, D. I., Raida, M. & Burke, B. The Journal of cell biology 196, 801-810, doi:10.1083/jcb.201112098 (2012).
23. Lambert, J. P., Mitchell, L., Rudner, A., Baetz, K. & Figeys, D. Molecular & cellular proteomics : MCP 8, 870-882, doi:10.1074/mcp.M800447-MCP200 (2009).
24. Lambert, J. P., Tucholska, M., Pawson, T. & Gingras, A. C. Journal of proteomics 100, 55-59, doi:10.1016/j.jprot.2013.12.022 (2014).
25. Lambert, J. P., Tucholska, M., Go, C., Knight, J. D. & Gingras, A. C. Journal of proteomics 118, 81-94, doi:10.1016/j.jprot.2014.09.011 (2015).
26. Lambert, J. P., Ivosev, G., Couzens, A. L., Larsen, B., Taipale, M., Lin, Z. Y., Zhong, Q., Lindquist, S., Vidal, M., Aebersold, R., Pawson, T., Bonner, R., Tate, S. & Gingras, A. C. Nature methods 10, 1239-1245, doi:10.1038/nmeth.2702 (2013).
27. Zhang, Y., Fonslow, B. R., Shan, B., Baek, M. C. & Yates, J. R., 3rd. Chemical reviews 113, 2343-2394, doi:10.1021/cr3003533 (2013).
28. Olsen, J. V. & Mann, M. Molecular & cellular proteomics : MCP 12, 3444-3452, doi:10.1074/mcp.O113.034181 (2013).
29. Gajadhar, A. S. & White, F. M. Current opinion in biotechnology 28, 83-87, doi:10.1016/j.copbio.2013.12.009 (2014).
30. Zhang, Y., Muller, M., Xu, B., Yoshida, Y., Horlacher, O., Nikitin, F., Garessus, S., Magdeldin, S., Kinoshita, N., Fujinaka, H., Yaoita, E., Hasegawa, M., Lisacek, F. & Yamamoto, T. Proteomics 15, 2568-2579, doi:10.1002/pmic.201400454 (2015).
31. Hunter, T. Cell 100, 113-127, doi:10.1016/S0092-8674(00)81688-8 (2000).
32. Zhou, H. J., Ye, M. L., Dong, J., Corradini, E., Cristobal, A., Heck, A. J. R., Zou, H. F. & Mohammed, S. Nat Protoc 8, 461-480, doi:10.1038/nprot.2013.010 (2013).
33. Larsen, M. R., Thingholm, T. E., Jensen, O. N., Roepstorff, P. & Jorgensen, T. J. D. Molecular & Cellular Proteomics 4, 873-886, doi:10.1074/mcp.T500007-MCP200 (2005).
34. Xiong, Z. C., Zhang, L. Y., Fang, C. L., Zhang, Q. Q., Ji, Y. S., Zhang, Z., Zhang, W. B. & Zou, H. F. J Mater Chem B 2, 4473-4480, doi:10.1039/c4tb00479e (2014).
35. He, X. M., Chen, X., Zhu, G. T., Wang, Q., Yuan, B. F. & Feng, Y. Q. Acs Appl Mater Inter 7, 17356-17362, doi:10.1021/acsami.5b04572 (2015).

36. Zhao, M., Deng, C. H. & Zhang, X. M. Chem Commun 50, 6228-6231, doi:10.1039/c4cc01038h (2014).
37. Chen, Y. J., Xiong, Z. C., Peng, L., Gan, Y. Y., Zhao, Y. M., Shen, J., Qian, J. H., Zhang, L. Y. & Zhang, W. B. Acs Appl Mater Inter 7, 16338-16347, doi:10.1021/acsami.5b03335 (2015).
38. Yan, Y., Sun, X., Deng, C., Li, Y. & Zhang, X. Anal Chem 86, 4327-4332, doi:10.1021/ac500047p (2014).
39. Wan, H., Li, J. N., Yu, W. G., Liu, Z. Y., Zhang, Q. Q., Zhang, W. B. & Zou, H. F. Rsc Adv 4, 45804-45808, doi:10.1039/c4ra08692a (2014).
40. Jabeen, F., Najam-ul-Haq, M., Rainer, M., Guzel, Y., Huck, C. W. & Bonn, G. K. Anal Chem 87, 4726-4732, doi:10.1021/ac504818s (2015).
41. Wang, M. Y., Deng, C. H., Li, Y. & Zhang, X. M. Acs Appl Mater Inter 6, 11775-11782, doi:10.1021/am502530c (2014).
42. Wijeratne, A. B., Wijesundera, D. N., Paulose, M., Ahiabu, I. B., Chu, W. K., Varghese, O. K. & Greis, K. D. Acs Appl Mater Inter 7, 11155-11164, doi:10.1021/acsami.5b00799 (2015).
43. Min, Q. H., Li, S. Y., Chen, X. Q., Abdel-Halim, E. S., Jiang, L. P. & Zhu, J. J. Acs Appl Mater Inter 7, 9563-9572, doi:10.1021/acsami.5b01006 (2015).
44. Chen, X. Q., Li, S. Y., Zhang, X. X., Min, Q. H. & Zhu, J. J. Nanoscale 7, 5815-5825, doi:10.1039/c4nr07041k (2015).
45. Trentini, D. B., Fuhrmann, J., Mechtler, K. & Clausen, T. Molecular & cellular proteomics : MCP, doi:10.1074/mcp.O113.035790 (2014).
46. Song, C. X., Ye, M. L., Han, G. H., Jiang, X. N., Wang, F. J., Yu, Z. Y., Chen, R. & Zou, H. F. Anal Chem 82, 53-56, doi:10.1021/ac9023044 (2010).
47. Batth, T. S., Francavilla, C. & Olsen, J. V. J Proteome Res 13, 6176-6186, doi:10.1021/pr500893m (2014).
48. Villen, J. & Gygi, S. P. Nat Protoc 3, 1630-1638, doi:10.1038/nprot.2008.150 (2008).
49. Beausoleil, S. A., Jedrychowski, M., Schwartz, D., Elias, J. E., Villen, J., Li, J. X., Cohn, M. A., Cantley, L. C. & Gygi, S. P. P Natl Acad Sci USA 101, 12130-12135, doi:10.1073/pnas.0404720101 (2004).
50. Zarei, M., Sprenger, A., Metzger, F., Gretzmeier, C. & Dengjel, J. J Proteome Res 10, 3474-3483, doi:10.1021/pr200092z (2011).
51. Han, G. H., Ye, M. L., Zhou, H. J., Jiang, X. N., Feng, S., Jiang, X. G., Tian, R. J., Wan, D. F., Zou, H. F. & Gu, J. R. Proteomics 8, 1346-1361, doi:10.1002/pmic.200700884 (2008).
52. Alpert, A. J., Hudecz, O. & Mechtler, K. Anal Chem 87, 4704-4711, doi:10.1021/ac504420c (2015).
53. Alpert, A. J. Anal Chem 80, 62-76, doi:10.1021/ac070997P (2008).
54. Gan, C. S., Guo, T. N., Zhang, H. M., Lim, S. K. & Sze, S. K. J Proteome Res 7, 4869-4877, doi:10.1021/pr800473j (2008).
55. McNulty, D. E. & Annan, R. S. Molecular & Cellular Proteomics 7, 971-980, doi:10.1074/mcp.M700543-MCP200 (2008).
56. Boersema, P. J., Mohammed, S. & Heck, A. J. R. Anal Bioanal Chem 391, 151-159, doi:10.1007/s00216-008-1865-7 (2008).

57. Loroch, S., Zahedi, R. P. & Sickmann, A. Anal Chem 87, 1596-1604, doi:10.1021/ac502708m (2015).
58. Zappacosta, F., Scott, G. F., Huddleston, M. J. & Annan, R. S. J Proteome Res 14, 997-1009, doi:10.1021/pr501025e (2015).
59. Udeshi, N. D., Mertins, P., Svinkina, T. & Carr, S. A. Nature protocols 8, 1950-1960, doi:10.1038/nprot.2013.120 (2013).
60. Akimov, V., Henningsen, J., Hallenborg, P., Rigbolt, K. T., Jensen, S. S., Nielsen, M. M., Kratchmarova, I. & Blagoev, B. Journal of proteome research 13, 4192-4204, doi:10.1021/pr500549h (2014).
61. Min, M., Mayor, U., Dittmar, G. & Lindon, C. Molecular & cellular proteomics : MCP 13, 2411-2425, doi:10.1074/mcp.M113.033498 (2014).
62. Porras-Yakushi, T. R., Sweredoski, M. J. & Hess, S. Journal of the American Society for Mass Spectrometry 26, 1580-1587, doi:10.1007/s13361-015-1168-0 (2015).
63. Valkevich, E. M., Sanchez, N. A., Ge, Y. & Strieter, E. R. Biochemistry 53, 4979-4989, doi:10.1021/bi5006305 (2014).
64. Svinkina, T., Gu, H., Silva, J. C., Mertins, P., Qiao, J., Fereshetian, S., Jaffe, J. D., Kuhn, E., Udeshi, N. D. & Carr, S. A. Molecular & cellular proteomics : MCP 14, 2429-2440, doi:10.1074/mcp.O114.047555 (2015).
65. Bryson, B. D., Del Rosario, A. M., Gootenberg, J. S., Yaffe, M. B. & White, F. M. Proteomics 15, 1470-1475, doi:10.1002/pmic.201400401 (2015).
66. Lai, Z. W., Gomez-Auli, A., Keller, E. J., Mayer, B., Biniossek, M. L. & Schilling, O. Proteomics 15, 2470-2478, doi:10.1002/pmic.201500023 (2015).
67. Wang, H., Li, H., Zhang, W., Wei, L., Yu, H. & Yang, P. Proteomics 14, 78-86, doi:10.1002/pmic.201200544 (2014).
68. Cova, M., Oliveira-Silva, R., Ferreira, J. A., Ferreira, R., Amado, F., Daniel-da-Silva, A. L. & Vitorino, R. Methods in molecular biology 1243, 83-100, doi:10.1007/978-1-4939-1872-0_5 (2015).
69. Ruiz-May, E., Hucko, S., Howe, K. J., Zhang, S., Sherwood, R. W., Thannhauser, T. W. & Rose, J. K. Molecular & cellular proteomics : MCP 13, 566-579, doi:10.1074/mcp.M113.028969 (2014).
70. Zhang, H., Li, X. J., Martin, D. B. & Aebersold, R. Nature biotechnology 21, 660-666, doi:10.1038/nbt827 (2003).
71. Cao, Q., Ma, C., Bai, H., Li, X., Yan, H., Zhao, Y., Ying, W. & Qian, X. The Analyst 139, 603-609, doi:10.1039/c3an01532g (2014).
72. Huang, G., Sun, Z., Qin, H., Zhao, L., Xiong, Z., Peng, X., Ou, J. & Zou, H. The Analyst 139, 2199-2206, doi:10.1039/c4an00076e (2014).
73. Huang, J., Qin, H., Sun, Z., Huang, G., Mao, J., Cheng, K., Zhang, Z., Wan, H., Yao, Y., Dong, J., Zhu, J., Wang, F., Ye, M. & Zou, H. Scientific reports 5, 10164, doi:10.1038/srep10164 (2015).
74. Zhang, Z., Sun, D., Cong, Y., Mao, J., Huang, J., Qin, H., Liu, J., Huang, G., Wang, L., Ye, M. & Zou, H. Journal of proteome research 14, 3892-3899, doi:10.1021/acs.jproteome.5b00306 (2015).
75. Chen, R., Seebun, D., Ye, M., Zou, H. & Figeys, D. Journal of proteomics 103, 194-203, doi:10.1016/j.jprot.2014.03.040 (2014).

76. Wu, S., Li, X., Zhang, F., Jiang, G., Liang, X. & Yang, B. The Analyst 140, 3921-3924, doi:10.1039/c5an00570a (2015).
77. Wang, J., Wang, Y., Gao, M., Zhang, X. & Yang, P. ACS Appl Mater Interfaces 7, 16011-16017, doi:10.1021/acsami.5b04295 (2015).
78. Li, J., Wang, F., Liu, J., Xiong, Z., Huang, G., Wan, H., Liu, Z., Cheng, K. & Zou, H. Chemical communications 51, 4093-4096, doi:10.1039/c5cc00187k (2015).
79. Ding, P., Li, X., Qing, G., Sun, T. & Liang, X. Chemical communications 51, 16111-16114, doi:10.1039/c5cc06279a (2015).
80. Woo, C. M., Iavarone, A. T., Spiciarich, D. R., Palaniappan, K. K. & Bertozzi, C. R. Nature methods 12, 561-567, doi:10.1038/nmeth.3366 (2015).
81. Khoury, G. A., Baliban, R. C. & Floudas, C. A. Scientific reports 1, doi:10.1038/srep00090 (2011).
82. Levy, D., Liu, C. L., Yang, Z., Newman, A. M., Alizadeh, A. A., Utz, P. J. & Gozani, O. Epigenetics & chromatin 4, 19, doi:10.1186/1756-8935-4-19 (2011).
83. Moore, K. E. & Gozani, O. Biochimica et biophysica acta 1839, 1395-1403, doi:10.1016/j.bbagrm.2014.02.008 (2014).
84. Ong, S. E., Mittler, G. & Mann, M. Nature methods 1, 119-126, doi:10.1038/nmeth715 (2004).
85. Liu, H., Galka, M., Mori, E., Liu, X., Lin, Y. F., Wei, R., Pittock, P., Voss, C., Dhami, G., Li, X., Miyaji, M., Lajoie, G., Chen, B. & Li, S. S. Molecular cell 50, 723-735, doi:10.1016/j.molcel.2013.04.025 (2013).
86. Moore, K. E., Carlson, S. M., Camp, N. D., Cheung, P., James, R. G., Chua, K. F., Wolf-Yadlin, A. & Gozani, O. Molecular cell 50, 444-456, doi:10.1016/j.molcel.2013.03.005 (2013).
87. Bremang, M., Cuomo, A., Agresta, A. M., Stugiewicz, M., Spadotto, V. & Bonaldi, T. Molecular bioSystems 9, 2231-2247, doi:10.1039/c3mb00009e (2013).
88. Guo, A., Gu, H., Zhou, J., Mulhern, D., Wang, Y., Lee, K. A., Yang, V., Aguiar, M., Kornhauser, J., Jia, X., Ren, J., Beausoleil, S. A., Silva, J. C., Vemulapalli, V., Bedford, M. T. & Comb, M. J. Molecular & cellular proteomics : MCP 13, 372-387, doi:10.1074/mcp.O113.027870 (2014).
89. Cao, X. J., Arnaudo, A. M. & Garcia, B. A. Epigenetics : official journal of the DNA Methylation Society 8, 477-485, doi:10.4161/epi.24547 (2013).
90. Sylvestersen, K. B., Horn, H., Jungmichel, S., Jensen, L. J. & Nielsen, M. L. Molecular & cellular proteomics : MCP 13, 2072-2088, doi:10.1074/mcp.O113.032748 (2014).
91. Wu, Z., Cheng, Z., Sun, M., Wan, X., Liu, P., He, T., Tan, M. & Zhao, Y. Molecular & cellular proteomics : MCP 14, 329-339, doi:10.1074/mcp.M114.044255 (2015).
92. Bland, C., Bellanger, L. & Armengaud, J. J Proteome Res 13, 668-680, doi:10.1021/pr400774z (2014).
93. Liu, M., Fang, C., Pan, X., Jiang, H., Zhang, L., Zhang, L., Zhang, Y., Yang, P. & Lu, H. Anal Chem 87, 9916-9922, doi:10.1021/acs.analchem.5b02437 (2015).
94. Nika, H., Hawke, D. H. & Angeletti, R. H. J Biomol Tech 25, 1-18, doi:10.7171/jbt.14-2501-001 (2014).
95. Richards, A. L., Merrill, A. E. & Coon, J. J. Curr Opin Chem Biol 24, 11-17, doi:10.1016/j.cbpa.2014.10.017 (2015).


96 Wisniewski, J. R., Dus, K. & Mann, M. Proteomics Clin Appl 7, 225-233, doi:10.1002/prca.201200046 (2013).
97 Michalski, A., Cox, J. & Mann, M. J Proteome Res 10, 1785-1793, doi:10.1021/pr101060v (2011).
98 Scheltema, R. A., Hauschild, J. P., Lange, O., Hornburg, D., Denisov, E., Damoc, E., Kuehn, A., Makarov, A. & Mann, M. Mol Cell Proteomics 13, 3698-3708, doi:10.1074/mcp.M114.043489 (2014).
99 Hebert, A. S., Richards, A. L., Bailey, D. J., Ulbrich, A., Coughlin, E. E., Westphall, M. S. & Coon, J. J. Mol Cell Proteomics 13, 339-347, doi:10.1074/mcp.M113.034769 (2014).
100 Beck, S., Michalski, A., Raether, O., Lubeck, M., Kaspar, S., Goedecke, N., Baessmann, C., Hornburg, D., Meier, F., Paron, I., Kulak, N. A., Cox, J. & Mann, M. Mol Cell Proteomics 14, 2014-2029, doi:10.1074/mcp.M114.047407 (2015).
101 Deshmukh, A. S., Murgia, M., Nagaraj, N., Treebak, J. T., Cox, J. & Mann, M. Mol Cell Proteomics 14, 841-853, doi:10.1074/mcp.M114.044222 (2015).
102 Thakur, S. S., Geiger, T., Chatterjee, B., Bandilla, P., Frohlich, F., Cox, J. & Mann, M. Molecular & cellular proteomics : MCP 10, M110 003699, doi:10.1074/mcp.M110.003699 (2011).
103 Burgess, M. W., Keshishian, H., Mani, D. R., Gillette, M. A. & Carr, S. A. Mol Cell Proteomics 13, 1137-1149, doi:10.1074/mcp.M113.034660 (2014).
104 Wang, Y., Yang, F., Gritsenko, M. A., Wang, Y., Clauss, T., Liu, T., Shen, Y., Monroe, M. E., Lopez-Ferrer, D., Reno, T., Moore, R. J., Klemke, R. L., Camp, D. G., 2nd & Smith, R. D. Proteomics 11, 2019-2026, doi:10.1002/pmic.201000722 (2011).
105 Bian, Y., Song, C., Cheng, K., Dong, M., Wang, F., Huang, J., Sun, D., Wang, L., Ye, M. & Zou, H. J Proteomics 96, 253-262, doi:10.1016/j.jprot.2013.11.014 (2014).
106 Krisp, C., Yang, H., van Soest, R. & Molloy, M. P. Mol Cell Proteomics 14, 1708-1719, doi:10.1074/mcp.M114.046425 (2015).
107 Orton, D. J., Wall, M. J. & Doucette, A. A. J Proteome Res 12, 5963-5970, doi:10.1021/pr400738a (2013).
108 Baker, E. S., Livesay, E. A., Orton, D. J., Moore, R. J., Danielson, W. F., 3rd, Prior, D. C., Ibrahim, Y. M., LaMarche, B. L., Mayampurath, A. M., Schepmoes, A. A., Hopkins, D. F., Tang, K., Smith, R. D. & Belov, M. E. J Proteome Res 9, 997-1006, doi:10.1021/pr900888b (2010).
109 Lau, H. T., Suh, H. W., Golkowski, M. & Ong, S. E. J Proteome Res 13, 4164-4174, doi:10.1021/pr500630a (2014).
110 Frost, D. C., Greer, T. & Li, L. Anal Chem 87, 1646-1654, doi:10.1021/ac503276z (2015).
111 Everley, R. A., Kunz, R. C., McAllister, F. E. & Gygi, S. P. Anal Chem 85, 5340-5346, doi:10.1021/ac400845e (2013).
112 Hebert, A. S., Merrill, A. E., Stefely, J. A., Bailey, D. J., Wenger, C. D., Westphall, M. S., Pagliarini, D. J. & Coon, J. J. Mol Cell Proteomics 12, 3360-3369, doi:10.1074/mcp.M113.032011 (2013).


113 Hebert, A. S., Merrill, A. E., Bailey, D. J., Still, A. J., Westphall, M. S., Strieter, E. R., Pagliarini, D. J. & Coon, J. J. Nat Methods 10, 332-334, doi:10.1038/nmeth.2378 (2013).
114 Ow, S. Y., Salim, M., Noirel, J., Evans, C., Rehman, I. & Wright, P. C. J Proteome Res 8, 5347-5355, doi:10.1021/pr900634c (2009).
115 McAlister, G. C., Nusinow, D. P., Jedrychowski, M. P., Wuhr, M., Huttlin, E. L., Erickson, B. K., Rad, R., Haas, W. & Gygi, S. P. Anal Chem 86, 7150-7158, doi:10.1021/ac502040v (2014).
116 Cox, J., Hein, M. Y., Luber, C. A., Paron, I., Nagaraj, N. & Mann, M. Mol Cell Proteomics 13, 2513-2526, doi:10.1074/mcp.M113.031591 (2014).
117 Schwanhausser, B., Busse, D., Li, N., Dittmar, G., Schuchhardt, J., Wolf, J., Chen, W. & Selbach, M. Nature 473, 337-342, doi:10.1038/nature10098 (2011).
118 Lu, P., Vogel, C., Wang, R., Yao, X. & Marcotte, E. M. Nature biotechnology 25, 117-124, doi:10.1038/nbt1270 (2007).
119 Wisniewski, J. R., Hein, M. Y., Cox, J. & Mann, M. Mol Cell Proteomics 13, 3497-3506, doi:10.1074/mcp.M113.037309 (2014).
120 Brunner, A., Kellermann, J. & Lottspeich, F. Biochimica et biophysica acta 1844, 21-28, doi:10.1016/j.bbapap.2013.02.019 (2014).
121 Guo, Y., Miyagi, M., Zeng, R. & Sheng, Q. BioMed research international 2014, 971857, doi:10.1155/2014/971857 (2014).
122 Rosenberger, G., Ludwig, C., Rost, H. L., Aebersold, R. & Malmstrom, L. Bioinformatics 30, 2511-2513, doi:10.1093/bioinformatics/btu200 (2014).
123 Malm, E. K., Srivastava, V., Sundqvist, G. & Bulone, V. BMC Bioinformatics 15, 441, doi:10.1186/s12859-014-0441-8 (2014).
124 Wang, Y., Ahn, T. H., Li, Z. & Pan, C. Bioinformatics 29, 2064-2065, doi:10.1093/bioinformatics/btt329 (2013).
125 Bailey, D. J., Rose, C. M., McAlister, G. C., Brumbaugh, J., Yu, P., Wenger, C. D., Westphall, M. S., Thomson, J. A. & Coon, J. J. Proc Natl Acad Sci U S A 109, 8411-8416, doi:10.1073/pnas.1205292109 (2012).
126 Graumann, J., Scheltema, R. A., Zhang, Y., Cox, J. & Mann, M. Molecular & cellular proteomics : MCP 11, M111 013185, doi:10.1074/mcp.M111.013185 (2012).
127 Bailey, D. J., McDevitt, M. T., Westphall, M. S., Pagliarini, D. J. & Coon, J. J. J Proteome Res 13, 2152-2161, doi:10.1021/pr401278j (2014).
128 Palaniappan, K. K., Pitcher, A. A., Smart, B. P., Spiciarich, D. R., Iavarone, A. T. & Bertozzi, C. R. ACS Chem Biol 6, 829-836, doi:10.1021/cb100338x (2011).
129 Gallien, S., Duriez, E., Crone, C., Kellmann, M., Moehring, T. & Domon, B. Mol Cell Proteomics 11, 1709-1723, doi:10.1074/mcp.O112.019802 (2012).
130 Peterson, A. C., Russell, J. D., Bailey, D. J., Westphall, M. S. & Coon, J. J. Molecular & cellular proteomics : MCP 11, 1475-1488 (2012).
131 Gillet, L. C., Navarro, P., Tate, S., Rost, H., Selevsek, N., Reiter, L., Bonner, R. & Aebersold, R. Molecular & cellular proteomics : MCP 11, O111 016717 (2012).
132 Picotti, P., Clement-Ziza, M., Lam, H., Campbell, D. S., Schmidt, A., Deutsch, E. W., Rost, H., Sun, Z., Rinner, O., Reiter, L., Shen, Q., Michaelson, J. J., Frei, A., Alberti, S., Kusebauch, U., Wollscheid, B., Moritz, R. L., Beyer, A. & Aebersold, R. Nature 494, 266-270, doi:10.1038/nature11835 (2013).


133 Egertson, J. D., MacLean, B., Johnson, R., Xuan, Y. & MacCoss, M. J. Nat Protoc 10, 887-903, doi:10.1038/nprot.2015.055 (2015).
134 Bruderer, R., Bernhardt, O. M., Gandhi, T., Miladinovic, S. M., Cheng, L. Y., Messner, S., Ehrenberger, T., Zanotelli, V., Butscheid, Y., Escher, C., Vitek, O., Rinner, O. & Reiter, L. Mol Cell Proteomics 14, 1400-1410, doi:10.1074/mcp.M114.044305 (2015).
135 Zi, J., Zhang, S., Zhou, R., Zhou, B., Xu, S., Hou, G., Tan, F., Wen, B., Wang, Q., Lin, L. & Liu, S. Anal Chem 86, 7242-7246, doi:10.1021/ac501828a (2014).
136 de Graaf, E. L., Altelaar, A. F., van Breukelen, B., Mohammed, S. & Heck, A. J. J Proteome Res 10, 4334-4341, doi:10.1021/pr200156b (2011).
137 Ting, Y. S., Egertson, J. D., Payne, S. H., Kim, S., MacLean, B., Kall, L., Aebersold, R. H., Smith, R. D., Noble, W. S. & MacCoss, M. J. Mol Cell Proteomics, doi:10.1074/mcp.O114.047035 (2015).
138 Escher, C., Reiter, L., MacLean, B., Ossola, R., Herzog, F., Chilton, J., MacCoss, M. J. & Rinner, O. Proteomics 12, 1111-1121, doi:10.1002/pmic.201100463 (2012).
139 Kim, S. & Pevzner, P. A. Nature communications 5, 5277, doi:10.1038/ncomms6277 (2014).
140 Dorfer, V., Pichler, P., Stranzl, T., Stadlmann, J., Taus, T., Winkler, S. & Mechtler, K. J Proteome Res 13, 3679-3684, doi:10.1021/pr500202e (2014).
141 Chick, J. M., Kolippakkam, D., Nusinow, D. P., Zhai, B., Rad, R., Huttlin, E. L. & Gygi, S. P. Nature biotechnology 33, 743-749, doi:10.1038/nbt.3267 (2015).
142 Nahnsen, S., Sachsenberg, T. & Kohlbacher, O. Proteomics 13, 1042-1051, doi:10.1002/pmic.201200315 (2013).
143 Pascal, B. D., West, G. M., Scharager-Tapia, C., Flefil, R., Moroni, T., Martinez-Acedo, P., Griffin, P. R. & Carvalloza, A. C. Journal of the American Society for Mass Spectrometry, doi:10.1007/s13361-015-1229-4 (2015).
144 Abshiru, N., Caron-Lizotte, O., Rajan, R. E., Jamai, A., Pomies, C., Verreault, A. & Thibault, P. Nature communications 6, 8648, doi:10.1038/ncomms9648 (2015).
145 Fermin, D., Walmsley, S. J., Gingras, A. C., Choi, H. & Nesvizhskii, A. I. Molecular & cellular proteomics : MCP 12, 3409-3419, doi:10.1074/mcp.M113.028928 (2013).
146 Nanni, P., Panse, C., Gehrig, P., Mueller, S., Grossmann, J. & Schlapbach, R. Proteomics 13, 2251-2255, doi:10.1002/pmic.201300036 (2013).
147 He, L., Xin, L., Shan, B., Lajoie, G. A. & Ma, B. Journal of proteome research 13, 3881-3895, doi:10.1021/pr401115y (2014).
148 Lynn, K. S., Chen, C. C., Lih, T. M., Cheng, C. W., Su, W. C., Chang, C. H., Cheng, C. Y., Hsu, W. L., Chen, Y. J. & Sung, T. Y. Analytical chemistry 87, 2466-2473, doi:10.1021/ac5044829 (2015).
149 Liu, M., Zhang, Y., Chen, Y., Yan, G., Shen, C., Cao, J., Zhou, X., Liu, X., Zhang, L., Shen, H., Lu, H., He, F. & Yang, P. Journal of proteome research 13, 3121-3129, doi:10.1021/pr500238v (2014).
150 Cheng, K., Chen, R., Seebun, D., Ye, M., Figeys, D. & Zou, H. Journal of proteomics 110, 145-154, doi:10.1016/j.jprot.2014.08.006 (2014).


151 Toghi Eshghi, S., Shah, P., Yang, W., Li, X. & Zhang, H. Analytical chemistry 87, 5181-5188, doi:10.1021/acs.analchem.5b00024 (2015).
152 Goyallon, A., Cholet, S., Chapelle, M., Junot, C. & Fenaille, F. Rapid communications in mass spectrometry : RCM 29, 461-473, doi:10.1002/rcm.7125 (2015).
153 Nesvizhskii, A. I. J Proteomics 73, 2092-2123, doi:10.1016/j.jprot.2010.08.009 (2010).
154 Kall, L., Storey, J. D., MacCoss, M. J. & Noble, W. S. J Proteome Res 7, 40-44, doi:10.1021/pr700739d (2008).
155 Ivanov, M. V., Levitsky, L. I., Lobas, A. A., Panic, T., Laskay, U., Mitulovic, G., Schmid, R., Pridatchenko, M. L., Tsybin, Y. O. & Gorshkov, M. V. J Proteome Res 13, 1911-1920, doi:10.1021/pr401026y (2014).
156 Kelchtermans, P., Bittremieux, W., De Grave, K., Degroeve, S., Ramon, J., Laukens, K., Valkenborg, D., Barsnes, H. & Martens, L. Proteomics 14, 353-366, doi:10.1002/pmic.201300289 (2014).
157 Jian, L., Niu, X. N., Xia, Z. H., Samir, P., Sumanasekera, C., Mu, Z., Jennings, J. L., Hoek, K. L., Allos, T., Howard, L. M., Edwards, K. M., Weil, P. A. & Link, A. J. J Proteome Res 12, 1108-1119, doi:10.1021/pr300631t (2013).
158 Kall, L., Canterbury, J. D., Weston, J., Noble, W. S. & MacCoss, M. J. Nat. Methods 4, 923-925, doi:10.1038/nmeth1113 (2007).
159 Gonnelli, G., Stock, M., Verwaeren, J., Maddelein, D., De Baets, B., Martens, L. & Degroeve, S. J Proteome Res 14, 1792-1798, doi:10.1021/pr501164r (2015).
160 Howbert, J. J. & Noble, W. S. Mol. Cell. Proteomics 13, 2467-2479, doi:10.1074/mcp.O113.036327 (2014).
161 Cao, J. & Zhang, S. Biometrics 70, 84-94, doi:10.1111/biom.12122 (2014).
162 Sun, H., Wang, H., Zhu, R., Tang, K., Gong, Q., Cui, J., Cao, Z. & Liu, Q. Bioinformatics 30, 737-739, doi:10.1093/bioinformatics/btt576 (2014).
163 Pathan, M., Keerthikumar, S., Ang, C. S., Gangoda, L., Quek, C. Y., Williamson, N. A., Mouradov, D., Sieber, O. M., Simpson, R. J., Salim, A., Bacic, A., Hill, A. F., Stroud, D. A., Ryan, M. T., Agbinya, J. I., Mariadason, J. M., Burgess, A. W. & Mathivanan, S. Proteomics 15, 2597-2601, doi:10.1002/pmic.201400515 (2015).
164 Elias, J. E. & Gygi, S. P. Methods Mol Biol 604, 55-71, doi:10.1007/978-1-60761-444-9_5 (2010).
165 Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M. & Sherlock, G. Nat Genet 25, 25-29, doi:10.1038/75556 (2000).
166 Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27-30 (2000).
167 Kumar, A. A., Holm, L. & Toronen, P. BMC bioinformatics 14, 242, doi:10.1186/1471-2105-14-242 (2013).
168 Huang, D. W., Sherman, B. T. & Lempicki, R. A. Nature Protocols 4, 44-57, doi:10.1038/nprot.2008.211 (2009).


169 Wu, X., Hasan, M. A. & Chen, J. Y. J Theor Biol 362, 44-52, doi:10.1016/j.jtbi.2014.05.031 (2014).
170 Alexa, A., Rahnenfuhrer, J. & Lengauer, T. Bioinformatics 22, 1600-1607, doi:10.1093/bioinformatics/btl140 (2006).
171 Huang, D. W., Sherman, B. T., Tan, Q., Collins, J. R., Alvord, W. G., Roayaei, J., Stephens, R., Baseler, M. W., Lane, H. C. & Lempicki, R. A. Genome Biol 8, R183, doi:10.1186/gb-2007-8-9-r183 (2007).
172 Glass, K. & Girvan, M. Sci Rep 4, 4191, doi:10.1038/srep04191 (2014).
173 Jaffe, J. D., Berg, H. C. & Church, G. M. Proteomics 4, 59-77, doi:10.1002/pmic.200300511 (2004).
174 Zhang, B., Wang, J., Wang, X., Zhu, J., Liu, Q., Shi, Z., Chambers, M. C., Zimmerman, L. J., Shaddox, K. F., Kim, S., Davies, S. R., Wang, S., Wang, P., Kinsinger, C. R., Rivers, R. C., Rodriguez, H., Townsend, R. R., Ellis, M. J., Carr, S. A., Tabb, D. L., Coffey, R. J., Slebos, R. J., Liebler, D. C. & Nci, C. Nature 513, 382-387, doi:10.1038/nature13438 (2014).
175 Coordinators, N. R. Nucleic acids research 43, D6-17, doi:10.1093/nar/gku1130 (2015).
176 Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A. & McKusick, V. A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic acids research 33, D514-517, doi:10.1093/nar/gki033 (2005).
177 Kawabata, T., Ota, M. & Nishikawa, K. Nucleic acids research 27, 355-357 (1999).
178 Tabas-Madrid, D., Alves-Cruzeiro, J., Segura, V., Guruceaga, E., Vialas, V., Prieto, G., Garcia, C., Corrales, F. J., Albar, J. P. & Pascual-Montano, A. Journal of proteome research 14, 3738-3749, doi:10.1021/acs.jproteome.5b00466 (2015).
179 Helmy, M., Sugiyama, N., Tomita, M. & Ishihama, Y. Front Plant Sci 3, 65, doi:10.3389/fpls.2012.00065 (2012).
180 Jeong, S. K., Hancock, W. S. & Paik, Y. K. Journal of proteome research 14, 3710-3719, doi:10.1021/acs.jproteome.5b00541 (2015).
181 Wang, X. & Zhang, B. Bioinformatics 29, 3235-3237, doi:10.1093/bioinformatics/btt543 (2013).
182 Zickmann, F. & Renard, B. Y. Bioinformatics 31, i106-115, doi:10.1093/bioinformatics/btv236 (2015).
183 Vermillion, K. L., Jagtap, P., Johnson, J. E., Griffin, T. J. & Andrews, M. T. Journal of proteome research, doi:10.1021/acs.jproteome.5b00575 (2015).
184 Risk, B. A., Spitzer, W. J. & Giddings, M. C. Journal of proteome research 12, 3019-3025, doi:10.1021/pr400208w (2013).
185 Tovchigrechko, A., Venepally, P. & Payne, S. H. Bioinformatics 30, 1469-1470, doi:10.1093/bioinformatics/btu051 (2014).
186 Kim, H., Park, H. & Paek, E. Journal of proteome research 14, 2784-2791, doi:10.1021/acs.jproteome.5b00047 (2015).
187 Nagaraj, S. H., Waddell, N., Madugundu, A. K., Wood, S., Jones, A., Mandyam, R. A., Nones, K., Pearson, J. V. & Grimmond, S. M. Journal of proteome research 14, 2255-2266, doi:10.1021/acs.jproteome.5b00029 (2015).


188 Krasnov, G. S., Dmitriev, A. A., Kudryavtseva, A. V., Shargunov, A. V., Karpov, D. S., Uroshlev, L. A., Melnikova, N. V., Blinov, V. M., Poverennaya, E. V., Archakov, A. I., Lisitsa, A. V. & Ponomarenko, E. A. Journal of proteome research 14, 3729-3737, doi:10.1021/acs.jproteome.5b00490 (2015).
189 Ghali, F., Krishna, R., Perkins, S., Collins, A., Xia, D., Wastling, J. & Jones, A. R. Proteomics 14, 2731-2741, doi:10.1002/pmic.201400265 (2014).
190 Crappe, J., Ndah, E., Koch, A., Steyaert, S., Gawron, D., De Keulenaer, S., De Meester, E., De Meyer, T., Van Criekinge, W., Van Damme, P. & Menschaert, G. Nucleic acids research 43, e29, doi:10.1093/nar/gku1283 (2015).
191 Nesvizhskii, A. I. Nature methods 11, 1114-1125, doi:10.1038/nmeth.3144 (2014).
192 Horvatovich, P., Lundberg, E. K., Chen, Y. J., Sung, T. Y., He, F., Nice, E. C., Goode, R. J., Yu, S., Ranganathan, S., Baker, M. S., Domont, G. B., Velasquez, E., Li, D., Liu, S., Wang, Q., He, Q. Y., Menon, R., Guan, Y., Corrales, F. J., Segura, V., Casal, J. I., Pascual-Montano, A., Albar, J. P., Fuentes, M., Gonzalez-Gonzalez, M., Diez, P., Ibarrola, N., Degano, R. M., Mohammed, Y., Borchers, C. H., Urbani, A., Soggiu, A., Yamamoto, T., Salekdeh, G. H., Archakov, A., Ponomarenko, E., Lisitsa, A., Lichti, C. F., Mostovenko, E., Kroes, R. A., Rezeli, M., Vegvari, A., Fehniger, T. E., Bischoff, R., Vizcaino, J. A., Deutsch, E. W., Lane, L., Nilsson, C. L., Marko-Varga, G., Omenn, G. S., Jeong, S. K., Lim, J. S., Paik, Y. K. & Hancock, W. S. Journal of proteome research 14, 3415-3431, doi:10.1021/pr5013009 (2015).
193 Horvatovich, P., Vegvari, A., Saul, J., Park, J. G., Qiu, J., Syring, M., Pirrotte, P., Petritis, K., Tegeler, T. J., Aziz, M., Fuentes, M., Diez, P., Gonzalez-Gonzalez, M., Ibarrola, N., Droste, C., De Las Rivas, J., Gil, C., Clemente, F., Hernaez, M. L., Corrales, F. J., Nilsson, C. L., Berven, F. S., Bischoff, R., Fehniger, T. E., LaBaer, J. & Marko-Varga, G. Journal of proteome research 14, 3441-3451, doi:10.1021/acs.jproteome.5b00486 (2015).
194 Chang, C., Li, L., Zhang, C., Wu, S., Guo, K., Zi, J., Chen, Z., Jiang, J., Ma, J., Yu, Q., Fan, F., Qin, P., Han, M., Su, N., Chen, T., Wang, K., Zhai, L., Zhang, T., Ying, W., Xu, Z., Zhang, Y., Liu, Y., Liu, X., Zhong, F., Shen, H., Wang, Q., Hou, G., Zhao, H., Li, G., Liu, S., Gu, W., Wang, G., Wang, T., Zhang, G., Qian, X., Li, N., He, Q. Y., Lin, L., Yang, P., Zhu, Y., He, F. & Xu, P. Journal of proteome research 13, 38-49, doi:10.1021/pr4009018 (2014).
195 Wilmes, P. & Bond, P. L. Environ Microbiol 6, 911-920, doi:10.1111/j.1462-2920.2004.00687.x (2004).
196 Clemente, J. C., Ursell, L. K., Parfrey, L. W. & Knight, R. Cell 148, 1258-1270, doi:10.1016/j.cell.2012.01.035 (2012).
197 Zhao, L. The gut microbiota and obesity: from correlation to causality. Nat Rev Microbiol 11, 639-647, doi:10.1038/nrmicro3089 (2013).
198 Lichtman, J. S., Sonnenburg, J. L. & Elias, J. E. ISME J 9, 1908-1915, doi:10.1038/ismej.2015.93 (2015).
199 Huang, H. J., Chen, W. Y. & Wu, J. H. Int J Mol Sci 15, 10169-10184, doi:10.3390/ijms150610169 (2014).
200 Bastida, F., Hernandez, T. & Garcia, C. J Proteomics 101, 31-42, doi:10.1016/j.jprot.2014.02.006 (2014).


201 Tanca, A., Palomba, A., Pisanu, S., Addis, M. F. & Uzzau, S. Proteomics 15, 3474-3485, doi:10.1002/pmic.201400573 (2015).
202 Xiong, W., Giannone, R. J., Morowitz, M. J., Banfield, J. F. & Hettich, R. L. t. J Proteome Res 14, 133-141, doi:10.1021/pr500936p (2015).
203 Turnbaugh, P. J., Ley, R. E., Mahowald, M. A., Magrini, V., Mardis, E. R. & Gordon, J. I. Nature 444, 1027-1031, doi:10.1038/nature05414 (2006).
204 Leary, D. H., Hervey, W. J. t., Deschamps, J. R., Kusterbeck, A. W. & Vora, G. J. Reprint of "Which metaproteome? The impact of protein extraction bias on metaproteomic analyses". Mol Cell Probes 28, 51-57, doi:10.1016/j.mcp.2014.01.002 (2014).
205 Li, J., Jia, H., Cai, X., Zhong, H., Feng, Q., Sunagawa, S., Arumugam, M., Kultima, J. R., Prifti, E., Nielsen, T., Juncker, A. S., Manichanh, C., Chen, B., Zhang, W., Levenez, F., Wang, J., Xu, X., Xiao, L., Liang, S., Zhang, D., Zhang, Z., Chen, W., Zhao, H., Al-Aama, J. Y., Edris, S., Yang, H., Wang, J., Hansen, T., Nielsen, H. B., Brunak, S., Kristiansen, K., Guarner, F., Pedersen, O., Dore, J., Ehrlich, S. D., Meta, H. I. T. C., Bork, P., Wang, J. & Meta, H. I. T. C. Nat Biotechnol 32, 834-841, doi:10.1038/nbt.2942 (2014).
206 Costello, E. K., Lauber, C. L., Hamady, M., Fierer, N., Gordon, J. I. & Knight, R. Science 326, 1694-1697, doi:10.1126/science.1177486 (2009).
207 Qin, J., Li, R., Raes, J., Arumugam, M., Burgdorf, K. S., Manichanh, C., Nielsen, T., Pons, N., Levenez, F., Yamada, T., Mende, D. R., Li, J., Xu, J., Li, S., Li, D., Cao, J., Wang, B., Liang, H., Zheng, H., Xie, Y., Tap, J., Lepage, P., Bertalan, M., Batto, J. M., Hansen, T., Le Paslier, D., Linneberg, A., Nielsen, H. B., Pelletier, E., Renault, P., Sicheritz-Ponten, T., Turner, K., Zhu, H., Yu, C., Jian, M., Zhou, Y., Li, Y., Zhang, X., Qin, N., Yang, H., Wang, J., Brunak, S., Dore, J., Guarner, F., Kristiansen, K., Pedersen, O., Parkhill, J., Weissenbach, J., Bork, P. & Ehrlich, S. D. Nature 464, 59-65, doi:10.1038/nature08821 (2010).
208 Xiao, L., Feng, Q., Liang, S., Sonne, S. B., Xia, Z., Qiu, X., Li, X., Long, H., Zhang, J., Zhang, D., Liu, C., Fang, Z., Chou, J., Glanville, J., Hao, Q., Kotowska, D., Colding, C., Licht, T. R., Wu, D., Yu, J., Sung, J. J., Liang, Q., Li, J., Jia, H., Lan, Z., Tremaroli, V., Dworzynski, P., Nielsen, H. B., Backhed, F., Dore, J., Le Chatelier, E., Ehrlich, S. D., Lin, J. C., Arumugam, M., Wang, J., Madsen, L. & Kristiansen, K. Nat Biotechnol, doi:10.1038/nbt.3353 (2015).
209 Erickson, A. R., Cantarel, B. L., Lamendella, R., Darzi, Y., Mongodin, E. F., Pan, C., Shah, M., Halfvarson, J., Tysk, C., Henrissat, B., Raes, J., Verberkmoes, N. C., Fraser, C. M., Hettich, R. L. & Jansson, J. K. PLoS One 7, e49138, doi:10.1371/journal.pone.0049138 (2012).
210 Noble, W. S. Nat Methods 12, 605-608, doi:10.1038/nmeth.3450 (2015).
211 Jagtap, P., Goslinga, J., Kooren, J. A., McGowan, T., Wroblewski, M. S., Seymour, S. L. & Griffin, T. J. Proteomics 13, 1352-1357, doi:10.1002/pmic.201200352 (2013).
212 Dancik, V., Addona, T. A., Clauser, K. R., Vath, J. E. & Pevzner, P. A. J Comput Biol 6, 327-342, doi:10.1089/106652799318300 (1999).


213 Muth, T., Kolmeder, C. A., Salojarvi, J., Keskitalo, S., Varjosalo, M., Verdam, F. J., Rensen, S. S., Reichl, U., de Vos, W. M., Rapp, E. & Martens, L. Proteomics, doi:10.1002/pmic.201400560 (2015).
214 Suomi, T., Corthals, G. L., Nevalainen, O. S. & Elo, L. L. J Proteome Res, doi:10.1021/acs.jproteome.5b00363 (2015).
215 Muth, T., Benndorf, D., Reichl, U., Rapp, E. & Martens, L. Mol Biosyst 9, 578-585, doi:10.1039/c2mb25415h (2013).
216 Mesuere, B., Debyser, G., Aerts, M., Devreese, B., Vandamme, P. & Dawyndt, P. Proteomics 15, 1437-1442, doi:10.1002/pmic.201400361 (2015).
217 Huson, D. H., Auch, A. F., Qi, J. & Schuster, S. C. Genome research 17, 377-386, doi:10.1101/gr.5969107 (2007).
218 Tanca, A., Palomba, A., Deligios, M., Cubeddu, T., Fraumene, C., Biosa, G., Pagnozzi, D., Addis, M. F. & Uzzau, S. PLoS One 8, e82981, doi:10.1371/journal.pone.0082981 (2013).
219 Tatusov, R. L., Fedorova, N. D., Jackson, J. D., Jacobs, A. R., Kiryutin, B., Koonin, E. V., Krylov, D. M., Mazumder, R., Mekhedov, S. L., Nikolskaya, A. N., Rao, B. S., Smirnov, S., Sverdlov, A. V., Vasudevan, S., Wolf, Y. I., Yin, J. J. & Natale, D. A. BMC bioinformatics 4, 41, doi:10.1186/1471-2105-4-41 (2003).
220 Powell, S., Forslund, K., Szklarczyk, D., Trachana, K., Roth, A., Huerta-Cepas, J., Gabaldon, T., Rattei, T., Creevey, C., Kuhn, M., Jensen, L. J., von Mering, C. & Bork, P. Nucleic Acids Res 42, D231-239, doi:10.1093/nar/gkt1253 (2014).
221 Galperin, M. Y., Makarova, K. S., Wolf, Y. I. & Koonin, E. V. Nucleic acids research 43, D261-269, doi:10.1093/nar/gku1223 (2015).
222 Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., Simonovic, M., Roth, A., Santos, A., Tsafou, K. P., Kuhn, M., Bork, P., Jensen, L. J. & von Mering, C. Nucleic Acids Res 43, D447-452, doi:10.1093/nar/gku1003 (2015).
223 Huang da, W., Sherman, B. T. & Lempicki, R. A. Nucleic Acids Res 37, 1-13, doi:10.1093/nar/gkn923 (2009).
224 Letunic, I., Yamada, T., Kanehisa, M. & Bork, P. Trends in biochemical sciences 33, 101-103, doi:10.1016/j.tibs.2008.01.001 (2008).
225 Klaassens, E. S., de Vos, W. M. & Vaughan, E. E. Appl Environ Microbiol 73, 1388-1392, doi:10.1128/AEM.01921-06 (2007).
226 Verberkmoes, N. C., Russell, A. L., Shah, M., Godzik, A., Rosenquist, M., Halfvarson, J., Lefsrud, M. G., Apajalahti, J., Tysk, C., Hettich, R. L. & Jansson, J. K. ISME J 3, 179-189, doi:10.1038/ismej.2008.108 (2009).
227 Florens, L., Carozza, M. J., Swanson, S. K., Fournier, M., Coleman, M. K., Workman, J. L. & Washburn, M. P. Methods 40, 303-311, doi:10.1016/j.ymeth.2006.07.028 (2006).
228 Umbarger, H. E. Annu Rev Biochem 47, 532-606, doi:10.1146/annurev.bi.47.070178.002533 (1978).
229 Hui, S., Silverman, J. M., Chen, S. S., Erickson, D. W., Basan, M., Wang, J., Hwa, T. & Williamson, J. R. Molecular systems biology 11, 784, doi:10.15252/msb.20145697 (2015).


230 Oda, Y., Huang, K., Cross, F. R., Cowburn, D. & Chait, B. T. Proceedings of the National Academy of Sciences of the United States of America 96, 6591-6596 (1999).
231 Eckburg, P. B., Bik, E. M., Bernstein, C. N., Purdom, E., Dethlefsen, L., Sargent, M., Gill, S. R., Nelson, K. E. & Relman, D. A. Science 308, 1635-1638, doi:10.1126/science.1110591 (2005).
232 Turnbaugh, P. J., Ridaura, V. K., Faith, J. J., Rey, F. E., Knight, R. & Gordon, J. I. Science translational medicine 1, 6ra14, doi:10.1126/scitranslmed.3000322 (2009).
233 Rauniyar, N., McClatchy, D. B. & Yates, J. R., 3rd. Methods (San Diego, Calif.) 61, 260-268, doi:10.1016/j.ymeth.2013.03.008 (2013).
234 Boersema, P. J., Raijmakers, R., Lemeer, S., Mohammed, S. & Heck, A. J. Nat Protoc 4, 484-494, doi:10.1038/nprot.2009.21 (2009).
235 Waldor, M. K., Tyson, G., Borenstein, E., Ochman, H., Moeller, A., Finlay, B. B., Kong, H. H., Gordon, J. I., Nelson, K. E., Dabbagh, K. & Smith, H. PLoS Biol 13, e1002050, doi:10.1371/journal.pbio.1002050 (2015).
236 Perez-Cobas, A. E., Gosalbes, M. J., Friedrichs, A., Knecht, H., Artacho, A., Eismann, K., Otto, W., Rojo, D., Bargiela, R., von Bergen, M., Neulinger, S. C., Daumer, C., Heinsen, F. A., Latorre, A., Barbas, C., Seifert, J., Dos Santos, V. M., Ott, S. J., Ferrer, M. & Moya, A. Gut 62, 1591-1601, doi:10.1136/gutjnl-2012-303184 (2013).
237 Deatherage Kaiser, B. L., Li, J., Sanford, J. A., Kim, Y. M., Kronewitter, S. R., Jones, M. B., Peterson, C. T., Peterson, S. N., Frank, B. C., Purvine, S. O., Brown, J. N., Metz, T. O., Smith, R. D., Heffron, F. & Adkins, J. N. PLoS One 8, e67155, doi:10.1371/journal.pone.0067155 (2013).
238 Lassek, C., Burghartz, M., Chaves-Moreno, D., Otto, A., Hentschker, C., Fuchs, S., Bernhardt, J., Jauregui, R., Neubauer, R., Becher, D., Pieper, D. H., Jahn, M., Jahn, D. & Riedel, K. Mol Cell Proteomics 14, 989-1008, doi:10.1074/mcp.M114.043463 (2015).
239 Wolstencroft, K., Haines, R., Fellows, D., Williams, A., Withers, D., Owen, S., Soiland-Reyes, S., Dunlop, I., Nenadic, A., Fisher, P., Bhagat, J., Belhajjame, K., Bacall, F., Hardisty, A., Nieva de la Hidalga, A., Balcazar Vargas, M. P., Sufi, S. & Goble, C. Nucleic Acids Res 41, W557-561, doi:10.1093/nar/gkt328 (2013).
240 Boekel, J., Chilton, J. M., Cooke, I. R., Horvatovich, P. L., Jagtap, P. D., Kall, L., Lehtio, J., Lukasse, P., Moerland, P. D. & Griffin, T. J. Nature biotechnology 33, 137-139, doi:10.1038/nbt.3134 (2015).
241 Warr, W. A. J Comput Aided Mol Des 26, 801-804, doi:10.1007/s10822-012-9577-7 (2012).
242 Lettre, G., Palmer, C. D., Young, T., Ejebe, K. G., Allayee, H., Benjamin, E. J., Bennett, F., Bowden, D. W., Chakravarti, A., Dreisbach, A., Farlow, D. N., Folsom, A. R., Fornage, M., Forrester, T., Fox, E., Haiman, C. A., Hartiala, J., Harris, T. B., Hazen, S. L., Heckbert, S. R., Henderson, B. E., Hirschhorn, J. N., Keating, B. J., Kritchevsky, S. B., Larkin, E., Li, M., Rudock, M. E., McKenzie, C. A., Meigs, J. B., Meng, Y. A., Mosley, T. H., Newman, A. B., Newton-Cheh, C. H., Paltoo, D. N., Papanicolaou, G. J., Patterson, N., Post, W. S., Psaty, B. M., Qasim, A. N., Qu, L., Rader, D. J., Redline, S., Reilly, M. P., Reiner, A. P., Rich, S. S.,


Rotter, J. I., Liu, Y., Shrader, P., Siscovick, D. S., Tang, W. H., Taylor, H. A., Tracy, R. P., Vasan, R. S., Waters, K. M., Wilks, R., Wilson, J. G., Fabsitz, R. R., Gabriel, S. B., Kathiresan, S. & Boerwinkle, E. PLoS genetics 7, e1001300, doi:10.1371/journal.pgen.1001300 (2011).
243 Presley, L. L., Ye, J., Li, X., Leblanc, J., Zhang, Z., Ruegger, P. M., Allard, J., McGovern, D., Ippoliti, A., Roth, B., Cui, X., Jeske, D. R., Elashoff, D., Goodglick, L., Braun, J. & Borneman, J. Inflamm Bowel Dis 18, 409-417, doi:10.1002/ibd.21793 (2012).
244 Mischak, H., Allmaier, G., Apweiler, R., Attwood, T., Baumann, M., Benigni, A., Bennett, S. E., Bischoff, R., Bongcam-Rudloff, E., Capasso, G., Coon, J. J., D'Haese, P., Dominiczak, A. F., Dakna, M., Dihazi, H., Ehrich, J. H., Fernandez-Llama, P., Fliser, D., Frokiaer, J., Garin, J., Girolami, M., Hancock, W. S., Haubitz, M., Hochstrasser, D., Holman, R. R., Ioannidis, J. P., Jankowski, J., Julian, B. A., Klein, J. B., Kolch, W., Luider, T., Massy, Z., Mattes, W. B., Molina, F., Monsarrat, B., Novak, J., Peter, K., Rossing, P., Sanchez-Carbayo, M., Schanstra, J. P., Semmes, O. J., Spasovski, G., Theodorescu, D., Thongboonkerd, V., Vanholder, R., Veenstra, T. D., Weissinger, E., Yamamoto, T. & Vlahou, A. Sci Transl Med 2, 46ps42, doi:10.1126/scitranslmed.3001249 (2010).
245 Skates, S. J., Gillette, M. A., LaBaer, J., Carr, S. A., Anderson, L., Liebler, D. C., Ransohoff, D., Rifai, N., Kondratovich, M., Tezak, Z., Mansfield, E., Oberg, A. L., Wright, I., Barnes, G., Gail, M., Mesri, M., Kinsinger, C. R., Rodriguez, H. & Boja, E. S. J Proteome Res 12, 5383-5394, doi:10.1021/pr400132j (2013).
246 Gemoll, T., Epping, F., Heinrich, L., Fritzsche, B., Roblick, U. J., Szymczak, S., Hartwig, S., Depping, R., Bruch, H. P., Thorns, C., Lehr, S., Paech, A. & Habermann, J. K. Oncotarget 6, 16517-16526 (2015).
247 Chaker, S., Kashat, L., Voisin, S., Kaur, J., Kak, I., MacMillan, C., Ozcelik, H., Siu, K. W., Ralhan, R. & Walfish, P. G. Proteomics 13, 771-787, doi:10.1002/pmic.201200356 (2013).
248 Ralhan, R., Veyhl, J., Chaker, S., Assi, J., Alyass, A., Jeganathan, A., Somasundaram, R. T., MacMillan, C., Freeman, J., Vescan, A. D., Witterick, I. J. & Walfish, P. G. Thyroid, doi:10.1089/thy.2015.0114 (2015).
249 Birse, C. E., Lagier, R. J., FitzHugh, W., Pass, H. I., Rom, W. N., Edell, E. S., Bungum, A. O., Maldonado, F., Jett, J. R., Mesri, M., Sult, E., Joseloff, E., Li, A., Heidbrink, J., Dhariwal, G., Danis, C., Tomic, J. L., Bruce, R. J., Moore, P. A., He, T., Lewis, M. E. & Ruben, S. M. Clin Proteomics 12, 18, doi:10.1186/s12014-015-9090-9 (2015).
250 Shipitsin, M., Small, C., Choudhury, S., Giladi, E., Friedlander, S., Nardone, J., Hussain, S., Hurley, A. D., Ernst, C., Huang, Y. E., Chang, H., Nifong, T. P., Rimm, D. L., Dunyak, J., Loda, M., Berman, D. M. & Blume-Jensen, P. Br J Cancer 111, 1201-1212, doi:10.1038/bjc.2014.396 (2014).
251 Surinova, S., Radova, L., Choi, M., Srovnal, J., Brenner, H., Vitek, O., Hajduch, M. & Aebersold, R. EMBO Mol Med 7, 1153-1165, doi:10.15252/emmm.201404874 (2015).
252 Surinova, S., Choi, M., Tao, S., Schuffler, P. J., Chang, C. Y., Clough, T., Vyslouzil, K., Khoylou, M., Srovnal, J., Liu, Y., Matondo, M., Huttenhain, R., Weisser, H.,


Buhmann, J. M., Hajduch, M., Brenner, H., Vitek, O. & Aebersold, R. EMBO Mol Med 7, 1166-1178, doi:10.15252/emmm.201404873 (2015).

[253] Mebazaa, A., Vanpoucke, G., Thomas, G., Verleysen, K., Cohen-Solal, A., Vanderheyden, M., Bartunek, J., Mueller, C., Launay, J. M., Van Landuyt, N., D'Hondt, F., Verschuere, E., Vanhaute, C., Tuytten, R., Vanneste, L., De Cremer, K., Wuyts, J., Davies, H., Moerman, P., Logeart, D., Collet, C., Lortat-Jacob, B., Tavares, M., Laroy, W., Januzzi, J. L., Samuel, J. L. & Kas, K. Eur Heart J 33, 2317-2324, doi:10.1093/eurheartj/ehs162 (2012).

[254] Lake, D. F. & Faigel, D. O. Antioxid Redox Signal 21, 485-496, doi:10.1089/ars.2013.5572 (2014).

[255] Radon, T. P., Massat, N. J., Jones, R., Alrawashdeh, W., Dumartin, L., Ennis, D., Duffy, S. W., Kocher, H. M., Pereira, S. P., Guarner posthumous, L., Murta-Nascimento, C., Real, F. X., Malats, N., Neoptolemos, J., Costello, E., Greenhalf, W., Lemoine, N. R. & Crnogorac-Jurcevic, T. Clinical cancer research : an official journal of the American Association for Cancer Research 21, 3512-3521, doi:10.1158/1078-0432.CCR-14-2467 (2015).

[256] Suresh, C. P., Saha, A., Kaur, M., Kumar, R., Dubey, N. K., Basak, T., Tanwar, V. S., Bhardwaj, G., Sengupta, S., Batra, V. V. & Upadhyay, A. D. Clinical and experimental nephrology, doi:10.1007/s10157-015-1162-7 (2015).

[257] De Marchi, T., Liu, N. Q., Stingl, C., Timmermans, M. A., Smid, M., Look, M. P., Tjoa, M., Braakman, R. B., Opdam, M., Linn, S. C., Sweep, F. C., Span, P. N., Kliffen, M., Luider, T. M., Foekens, J. A., Martens, J. W. & Umar, A. Mol Oncol, doi:10.1016/j.molonc.2015.07.004 (2015).

[258] Campone, M., Valo, I., Jezequel, P., Moreau, M., Boissard, A., Campion, L., Loussouarn, D., Verriele, V., Coqueret, O. & Guette, C. Mol Cell Proteomics, doi:10.1074/mcp.M115.048967 (2015).

[259] Dunne, J. C., Lamb, D. S., Delahunt, B., Murray, J., Bethwaite, P., Ferguson, P., Nacey, J. N., Sondhauss, S. & Jordan, T. W. Clin Proteomics 12, 24, doi:10.1186/s12014-015-9096-3 (2015).

[260] Park, J., Yun, H. S., Lee, K. H., Lee, K. T., Lee, J. K. & Lee, S. Y. Cancer Res 75, 3227-3235, doi:10.1158/0008-5472.CAN-14-2896 (2015).

[261] Christin, C., Hoefsloot, H. C., Smilde, A. K., Hoekman, B., Suits, F., Bischoff, R. & Horvatovich, P. Mol Cell Proteomics 12, 263-276, doi:10.1074/mcp.M112.022566 (2013).

[262] Lin, H., Zhou, L., Peng, H. & Zhou, X.-H. Canadian Journal of Statistics 39, 324-343, doi:10.1002/cjs.10107 (2011).

[263] Taguchi, F., Solomon, B., Gregorc, V., Roder, H., Gray, R., Kasahara, K., Nishio, M., Brahmer, J., Spreafico, A., Ludovini, V., Massion, P. P., Dziadziuszko, R., Schiller, J., Grigorieva, J., Tsypin, M., Hunsucker, S. W., Caprioli, R., Duncan, M. W., Hirsch, F. R., Bunn, P. A., Jr. & Carbone, D. P. J Natl Cancer Inst 99, 838-846, doi:10.1093/jnci/djk195 (2007).

[264] Carbone, D. P., Ding, K., Roder, H., Grigorieva, J., Roder, J., Tsao, M. S., Seymour, L. & Shepherd, F. A. J Thorac Oncol 7, 1653-1660, doi:10.1097/JTO.0b013e31826c1155 (2012).

[265] Gautschi, O., Dingemans, A. M., Crowe, S., Peters, S., Roder, H., Grigorieva, J., Roder, J., Zappa, F., Pless, M., Brutsche, M., Baty, F., Bubendorf, L., Hsu


Schmitz, S. F., Na, K. J., Carbone, D., Stahel, R. & Smit, E. Lung Cancer 79, 59-64, doi:10.1016/j.lungcan.2012.10.006 (2013).

[266] Stinchcombe, T. E., Roder, J., Peterman, A. H., Grigorieva, J., Lee, C. B., Moore, D. T. & Socinski, M. A. J Thorac Oncol 8, 443-451, doi:10.1097/JTO.0b013e3182835577 (2013).

[267] Gregorc, V., Novello, S., Lazzari, C., Barni, S., Aieta, M., Mencoboni, M., Grossi, F., De Pas, T., de Marinis, F., Bearz, A., Floriani, I., Torri, V., Bulotta, A., Cattaneo, A., Grigorieva, J., Tsypin, M., Roder, J., Doglioni, C., Levra, M. G., Petrelli, F., Foti, S., Vigano, M., Bachi, A. & Roder, H. Lancet Oncol 15, 713-721, doi:10.1016/S1470-2045(14)70162-7 (2014).

[268] Zhang, Z. & Chan, D. W. Cancer Epidemiol Biomarkers Prev 19, 2995-2999, doi:10.1158/1055-9965.EPI-10-0580 (2010).

[269] Hathout, Y., Brody, E., Clemens, P. R., Cripe, L., DeLisle, R. K., Furlong, P., Gordish-Dressman, H., Hache, L., Henricson, E., Hoffman, E. P., Kobayashi, Y. M., Lorts, A., Mah, J. K., McDonald, C., Mehler, B., Nelson, S., Nikrad, M., Singer, B., Steele, F., Sterling, D., Sweeney, H. L., Williams, S. & Gold, L. Proc Natl Acad Sci U S A 112, 7153-7158, doi:10.1073/pnas.1507719112 (2015).

[270] Subbannayya, T., Leal-Rojas, P., Barbhuiya, M. A., Raja, R., Renuse, S., Sathe, G., Pinto, S. M., Syed, N., Nanjappa, V., Patil, A. H., Garcia, P., Sahasrabuddhe, N. A., Nair, B., Guerrero-Preston, R., Navani, S., Tiwari, P. K., Santosh, V., Sidransky, D., Prasad, T. S., Gowda, H., Roa, J. C., Pandey, A. & Chatterjee, A. BMC Cancer 15, 843, doi:10.1186/s12885-015-1855-z (2015).


Table 1. Comparison of the Selectivity and Sensitivity of Some Novel Materials for Phosphopeptide Enrichment.

Chromatography: IMAC; MOAC

Material: Fe3O4@SiO2@(HA/CS)10-Ti4+; CCC-Ti4+ fibers; Fe3O4@PDA@Zr-MOF; Fe3O4@MIL-100 (Fe); G@PD@TiO2; Fe3O4@mTiO2@mSiO2; Fe3O4@SiO2-La/Sm2O3; Fe3O4/Graphene/(Ti−Sn)O4; TiO2 nanotubes on Ti wire; MC-TiNbNS; Fe3O4-TiNbNS

Selectivity (β-casein/BSA): 1/2000; 1/1000; 1/500; 1/500; 1/1000; 1/1000; 1/8500; 1/1500; 1/2000; 1/100

Sensitivity (detection limit): 0.5 fmol; 10 fmol; 1 fmol; 0.5 fmol; 5 fmol; 3 fmol; 1 amol; 40 pmol; 0.2 nmol; 40 fmol

Ref: [34]; [35]; [36]; [37]; [38]; [39]; [40]; [41]; [42]; [43]; [44]


FIGURE LEGENDS

Figure 1: Schematic representation of the flow of biological information from DNA to RNA to protein and the proteomics workflow designed to study biological complexity at the functional level of proteins. The information encoded in our genes increases by many orders of magnitude following protein translation. To understand biological systems, protein function and dysfunction must be deciphered. Bottom-up proteomics and bioinformatics form a cycle: proteomics first recovers, identifies and quantifies post-translational modifications, protein-protein interactions, signalling events, etc., using advances in sample preparation and mass spectrometry, while software and bioinformatics packages reconstruct these pieces of information into a systems-level understanding of complex biological processes.

Figure 2: Schematic representation of the trend in MS acquisition modes. Discovery proteomics is becoming more directed, with high reproducibility and accuracy, while validation proteomics is reaching increasingly higher throughput, far beyond targeted analysis of selected transitions. SWATH-like MS acquisition is gaining popularity and is expected to provide a panoramic library for both protein identification and quantification.

Figure 3: Schematic representation of factors impacting methodology choice in metaproteomic sample preparation. A) Factors including the assessment of host-microbe interactions and the requirement for microbe or host enrichment are important considerations for experimental design in metaproteomics. B) Several extraction buffers should be tested and their corresponding protein extraction yields calculated. C) Downstream taxonomic and functional analysis should be performed for all extraction methods tested to evaluate their potential effect on biological interpretations.

Figure 4: Proteomics at the interface of host-microbe interactions. To study host-microbe interactions, samples such as stool, biopsies or mucosal-luminal interface aspirates are collected from each individual in the study. Biopsies can be used to examine host proteome changes, while stool and mucosal-luminal interface samples contain both host and microbial proteins. Ideally both proteomes would be examined in a single MS run, but for high coverage, sample pre-processing such as differential centrifugation is usually needed to enrich either the microbial or the host proteins before MS analysis. Following protein or peptide identification, quantitative information (L/H ratios or LFQ intensities) can be extracted from the MS data with software suites such as MaxQuant and Census. Trans-kingdom association studies should then be performed to identify covarying host and microbial proteins, which helps in understanding the potential mechanisms of host-microbe interactions and provides targets for validation studies.
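The trans-kingdom association step described in the Figure 4 legend can be illustrated with a minimal sketch, assuming that quantitative values (for example LFQ intensities) have already been exported per protein and per individual. The protein names, intensity values, the choice of Spearman rank correlation and the 0.9 threshold are all illustrative assumptions and are not taken from the review; a real analysis would also need multiple-testing correction.

```python
from itertools import product
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def covarying_pairs(host, microbe, threshold=0.9):
    """List (host protein, microbial protein, rho) with |rho| >= threshold."""
    pairs = []
    for (h_name, h_vals), (m_name, m_vals) in product(host.items(), microbe.items()):
        rho = spearman(h_vals, m_vals)
        if abs(rho) >= threshold:
            pairs.append((h_name, m_name, round(rho, 3)))
    return pairs


# Hypothetical intensities for five individuals (same order in every list).
host = {
    "host_protein_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "host_protein_2": [5.0, 3.0, 4.0, 1.0, 2.0],
}
microbe = {
    "microbial_protein_1": [10.0, 20.0, 40.0, 80.0, 160.0],
    "microbial_protein_2": [3.0, 1.0, 2.0, 5.0, 4.0],
}
pairs = covarying_pairs(host, microbe)
```

Rank correlation is used here only because it does not assume a linear relationship between host and microbial abundances; other association measures could be substituted without changing the overall structure.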


Figure 5: Schematic overview of the steps required for effective proteomic identification of clinically relevant biomarkers. After a clinical question has been defined, the study population should be reflective of the disease and balanced for age, gender and ethnicity. The appropriate type of sample should be chosen based upon the disease in question, obtained consistently in relation to treatments and body rhythms, and stored to ensure protein stability. Consistent processing methods must be clearly defined, including how protein is isolated, digested and analyzed. At the discovery stage, robust analysis should be used to prioritize and test candidate markers. Markers should be verified in an independent population and validated by orthogonal methods.
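As one illustration of the discovery-stage analysis named in the Figure 5 legend, the sketch below scores a single candidate marker by its area under the ROC curve, computed through the rank-based (Mann-Whitney) identity. The function name and the abundance values are hypothetical and serve only as an example; they are not data from the review.

```python
def roc_auc(disease_values, control_values):
    """Area under the ROC curve for one candidate marker.

    Equals the probability that a randomly chosen disease sample has a
    higher marker value than a randomly chosen control sample, with ties
    counted as one half.
    """
    wins = 0.0
    for d in disease_values:
        for c in control_values:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(disease_values) * len(control_values))


# Hypothetical marker abundances (arbitrary units).
disease = [8.1, 7.4, 9.0, 6.2]
control = [5.0, 6.2, 4.8, 7.0]
auc = roc_auc(disease, control)
```

An AUC near 0.5 indicates no discrimination between groups and values near 1.0 indicate strong discrimination; in keeping with the legend, a marker prioritized this way would still need verification in an independent population.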


FOR TABLE OF CONTENTS ONLY

Table of contents graphic. Recoverable labels: Healthy control; Symptomatic Non-disease; Disease; Tissues; Bodily fluids; Consistent sampling and processing; Protein Isolation; Enrichment; Digestion; Analysis; Buffer A; Buffer B; Protein Amount; Consistent identification and quantification of peptides; Present; Identified; Quantified; Bioinformatics Analysis; Pathways.

Figure 1 (graphic). Recoverable labels: Biological Information Flow (Storage, Processing, Modification, Execution, Function); Genome ~20,000; Transcriptome est. 100,000; Proteome est. 1,000,000; Alternative Promoter Usage; SNPs; Alternative Processing; mRNA Editing; Post-transcriptional Modifications; Enzymatic Activity; Trafficking; Signaling; Protein Turnover; Subcellular Compartmentalization; Protein-Protein Interaction; Temporal Variation; Spatial Variation; Proteomic Complexity; Biological Depth; Complexity. Bottom-Up Work-Flow: Protein Extraction; Covering the Proteome (Efficiency, Reproducibility, Automation, High-throughput); Retaining Functional Information (Protein-Protein Interaction, PTM Enrichment, Strategies to retain/detect low abundant proteins); Discrimination; Time; Steps; Mass Spectrometry Identification of Enzymatically Digested Peptides; Recovering Protein Identity, Form and Quantity; Database & Software Pipelines to Reconstruct & Understand Function.


Figure 3 (graphic). Recoverable labels: A. Pre-protein extraction considerations: Host-microbe interaction; Microbe enrichment. B. Protein extraction considerations: Choice of extraction buffer; Protein extraction yield (Buffer A, Buffer B); Protein; Debris; Enriches for calcium binding proteins; Enriches for membrane transport proteins. C. Evaluate effect of extraction methods on taxonomic assignment and function: Relative abundance (Buffer A, Buffer B) of Proteobacteria, Actinobacteria, Bacteroides and Firmicutes, and of Energy production, Lipid metabolism, Transcription, Translation, Amino acid metabolism, Cell motility and Defense mechanisms.


Figure 5 (graphic). Recoverable labels: Design: Identification of sample populations (Healthy control; Symptomatic Non-disease; Disease); Tissues; Bodily fluids. Consistent sampling and processing: Protein Isolation; Enrichment; Digestion; Analysis. Discovery: Pathways; Univariate; Multivariate; ROC; Standard. Validation: Immunohistochemistry; Immunoblot; ELISA; SRM.