LaCyTools – a targeted LC-MS data processing package for relative quantitation of glycopeptides
Bas Cornelis Jansen, David Falck, Noortje de Haan, Agnes L. Hipgrave Ederveen, Genadij Razdorov, Gordan Lauc, and Manfred Wuhrer
J. Proteome Res., Just Accepted Manuscript • DOI: 10.1021/acs.jproteome.6b00171 • Publication Date (Web): 07 Jun 2016



LaCyTools – a targeted LC-MS data processing package for relative quantitation of glycopeptides

Bas C. Jansen1, David Falck1, Noortje de Haan1, Agnes L. Hipgrave Ederveen1, Genadij Razdorov2, Gordan Lauc2, Manfred Wuhrer1,*

1 Center for Proteomics and Metabolomics, Leiden University Medical Center, 2300RC Leiden, The Netherlands

2 Department of Biochemistry and Molecular Biology, Faculty of Pharmacy and Biochemistry, University of Zagreb, A. Kovačića 1, HR10000, Zagreb, Croatia

* Corresponding author: Manfred Wuhrer, [email protected], Tel. +31-71-5268744


ABSTRACT Bottom-up glycoproteomics by liquid chromatography-mass spectrometry (LC-MS) is an established approach for assessing glycosylation in a protein- and site-specific manner. Consequently, tools are needed to automatically align, calibrate and integrate LC-MS glycoproteomics data. We developed a modular software package, called LaCyTools, designed to tackle the individual aspects of an LC-MS experiment. Targeted alignment is performed using user-defined m/z and retention time (tr) combinations. Subsequently, sum spectra are created for each user-defined analyte group. Quantitation is performed on the sum spectra, where each user-defined analyte can have its own tr and its own minimum and maximum charge states. Consequently, LaCyTools deals with multiple charge states, giving an output per charge state if desired, and offers various analyte and spectra quality criteria. We compared throughput and performance of LaCyTools to combinations of available tools that deal with individual processing steps. LaCyTools yielded relative quantitation of equal precision (relative standard deviation < 0.5%) and higher trueness, due to the use of MS peak area instead of MS peak intensity. In conclusion, LaCyTools is an accurate automated data processing tool for high-throughput analysis of LC-MS glycoproteomics data. Released under the Apache 2.0 license, it is freely available on GitHub (https://github.com/Tarskin/LaCyTools).

Keywords

Bioinformatics ● glycoproteomics ● profiling ● mass spectrometry ● liquid chromatography ● relative quantitation ● quality control ● alpha-1-antitrypsin ● immunoglobulin G


INTRODUCTION Protein glycosylation is known to be involved in many biological processes and to have a large influence on protein properties.1 Glycosylation is highly heterogeneous, and a single glycoprotein can carry many different glycoforms at one glycosylation site, as demonstrated by immunoglobulin G (IgG), on which at least 17 different glycan structures were found on the fragment crystallizable (Fc) region.2,3 When more than one glycosylation site is present on a protein, the number of possible glycoprotein isoforms rapidly increases, as is the case for alpha-1-antitrypsin (AAT) with three N-glycosylation sites, and even more so for viral envelope proteins (gp140 of the human immunodeficiency virus) and carcinoembryonic antigen (CEA), which both have more than 20 N-glycosylation sites.4-6 This variability makes protein glycosylation analysis demanding and has led to several approaches at the released glycan, glycopeptide or intact glycoprotein level.7 Analysis at the glycopeptide level offers protein- as well as site-specific information.6 Reverse-phase (RP)-liquid chromatography (LC)-mass spectrometry (MS) is the method of choice for glycopeptide-based glycosylation profiling, although other separation methods have likewise been shown to be valuable.8,9 In quantitative high-throughput analysis of glycopeptides using LC- or capillary electrophoresis (CE)-MS, data (pre-)processing represents a pronounced bottleneck due to the often high sample complexity. Ideally, a software package is capable of performing all steps required in LC-MS based glycoproteomics studies, including identification, alignment, calibration and quantitation. Recently, several software packages for the identification of glycosylation features were developed, such as GlycoWorkbench, GlycoMiner, Proteinscape and I-GPA.10-13 These developments facilitate the use of targeted data analysis approaches, i.e. extracting the same analytes from a set of samples.
The following steps are alignment, calibration and quantitation. There are many individual tools that can perform part(s) of this workflow, such as the freely available MZmine2, mMass, 3D Max Extractor (3D Max), msalign2 and MultiGlycan, and the commercially available DataAnalysis (DA, Bruker Daltonics, Bremen, Germany).13-18 However, while these packages are well suited to the tasks for which they were designed,
none offers a complete solution for targeted glycoproteomic analyses. For example, large-scale studies on IgG Fc N-glycosylation were hitherto done using a combination of separate tools, making sample processing laborious.19-21 Here, we developed an automated modular software package for repeated targeted LC- or CE-MS analysis of glycopeptides, called LaCyTools, based on a recently developed software package for MALDI-MS data.22 It is capable of performing targeted alignment, calibration and relative quantitation of the data. Unlike most other software tools in data pre-processing, it calculates several quality criteria (QC) that allow the researcher to curate the data obtained (see Figure 1). LaCyTools was tested on glycopeptide LC-MS data sets of both AAT and IgG. The tests showed robust results, including high quality alignment and relative quantitation with high precision and trueness. LaCyTools allowed the high-throughput (pre-)processing of LC-MS data. Additionally, it provided figures for efficient and objective spectra and analyte curation.

MATERIALS & METHODS Software & Hardware LaCyTools was programmed using Geany 0.21 as an integrated development environment. A 32- or 64-bit Python 2.7 installation with the numpy, scipy, matplotlib, tkinter and, optionally, pytables libraries is required to run the program.23-26 Anaconda, a Python distribution containing all the required libraries, was used to run LaCyTools.27 ProteoWizard was used to convert raw data files to the mzXML file format.28 All processing steps were performed on a server containing two Intel Xeon E5-2630 v3 processors and 384 GB of memory, running 64-bit Microsoft Server 2012 R2.

Materials Ultra-pure deionized water (MQ) was generated by a Gard 2 system (Millipore, Amsterdam, Netherlands) maintained at ≥ 18.2 MΩ. Ammonium bicarbonate (ABC; ≥ 99.5% purity), formic acid (FA; ≥ 98% purity) and TPCK-treated trypsin from bovine pancreas were purchased from Sigma-Aldrich (Steinheim, Germany). Trifluoroacetic acid (TFA; ≥ 98% purity), disodium hydrogen phosphate dihydrate (Na2HPO4∙2H2O; ≥ 99.5% purity), potassium dihydrogen phosphate (KH2PO4; ≥ 99.5% purity) and sodium chloride (NaCl; ≥ 99.5% purity) were purchased from Merck (Darmstadt, Germany) and HPLC SupraGradient acetonitrile (ACN; ≥ 99.97% purity) from Biosolve (Valkenswaard, Netherlands). Phosphate-buffered saline (PBS) was made in-house, containing 5.7 g/L Na2HPO4∙2H2O, 0.5 g/L KH2PO4 and 8.5 g/L NaCl. Polyclonal human plasma IgG was isolated from one human plasma sample, donated by a healthy volunteer. A purified AAT standard was purchased from Athens Research and Technology (Georgia, USA).

Sample preparation All AAT glycopeptides were generated from a purified AAT standard. Approximately 50 µg of standard was denatured with an acidification and drying step.18 Briefly, the sample was denatured with 100 µL 100 mM formic acid, followed by drying in a centrifugal vacuum concentrator at 60 °C. The lyophilized AAT was first dissolved in 20 µL 50 mM ABC containing 15% ACN, after which 20 µL of trypsin was added (enzyme/substrate ratio 1:167; 300 ng). The mixture was incubated on a shaker (1000 rpm, Heidolph Titramax 100; Heidolph, Kelheim, Germany) for 5 min, followed by overnight cleavage at 37 °C. Using this method, three glycosylated peptides are generated from AAT (UniProtKB entry P01009), namely QLAHQSN70STNIFFSPVSIATAFAMLSLGTK (N70), ADTHDEILEGLNFN107LTEIPEAQIHEGFQELLR (N107) and YLGN271ATAIFFLPDEGK (N271). The amino acids that can carry a glycan are marked with bold text.
All IgG subclasses, except IgG3, were isolated from human plasma by the use of protein A (ProtA) Sepharose beads (GE Healthcare, Uppsala, Sweden). 2 µL plasma was incubated with 15 µL beads and 100 µL PBS for 1 h on a plate shaker (1000 rpm; Heidolph Titramax 100). Subsequently, the sample was washed three times with 200 µL PBS and three times with 200 µL MQ. Elution of the sample was done with 100 µL 100 mM FA, using a centrifuge for 1 min at 100 g. The sample was dried using a vacuum concentrator at 60 °C and dissolved in 40 µL 25 ng/µL TPCK-treated trypsin in

25 mM ABC (pH 7.9). The isolated proteins were incubated overnight at 37 °C and stored at -20 °C until use. Three glycosylated peptides are generated using this method, namely EEQYN296YSTYR (IgG1), EEQFN296STYR (IgG2) and EEQN296STNFR (IgG4). The amino acids that can carry a glycan are marked with bold text.

Data acquisition AAT and IgG glycopeptides were separated and analyzed on an Ultimate 3000 RSLCnano system (ThermoFisher Scientific, CA, USA) coupled to a Maxis Impact HD quadrupole-time-of-flight (qTOF) mass spectrometer (Bruker Daltonics, Bremen, Germany). The LC was interfaced to the qTOF-MS via an electrospray ionization source equipped with CaptiveSpray™ and nanoBooster™ technologies (Bruker Daltonics). Prior to elution, the samples were concentrated on a C18 solid phase extraction (SPE) trap column (Dionex Acclaim PepMap100, 5 mm x 300 μm i.d.). Sample separation was achieved on a Supelco Ascentis Express C18 nano-liquid chromatography column (50 mm x 75 μm i.d., 2.7 μm HALO fused core particles, Sigma-Aldrich) conditioned at 900 nL min−1 with either 20% or 3% solvent B for AAT and IgG, respectively. Spectra were recorded from m/z 550 to m/z 1800, with a frequency of 0.5 and 1 Hz for AAT and IgG glycopeptides, respectively. The collision energy was set to 7.0 eV, the transfer time was 110 μs and the pre-pulse storage 21 μs. For the AAT sample, 1 μL was injected. Four different gradients (solvent A 0.1% TFA, solvent B 95% ACN), with a separation window of 4.5 minutes, were used. Gradient 1; start: 20% solvent B, 2 min: 25% solvent B and 4.5 min: 60% solvent B. Gradient 2; start: 15% solvent B, 2 min: 20% solvent B and 4.5 min: 60% solvent B. Gradient 3; start: 25% solvent B, 2 min: 28% solvent B and 4.5 min: 60% solvent B. Gradient 4; start: 20% solvent B, 2 min: 25% solvent B and 4.5 min: 70% solvent B. For the IgG sample, 1 μL was injected after five-fold dilution in water. A total of 221 glycopeptide measurements were collected over a period of five weeks, serving as a system suitability standard. The gradient (solvent A 0.1% TFA, solvent B 95% ACN) used for the IgG glycopeptides was as follows:

start: 3% solvent B, 2 min: 6% solvent B, 4.5 min: 18% solvent B, 5 min: 30% solvent B, 7 min: 30% solvent B, 8 min: 1% solvent B and 11 min: 1% solvent B. For structural elucidation of AAT glycopeptides, 0.2 μL of the sample was injected into a nano-LC-ESI-ion trap-MSn system for fragmentation analysis by collision induced dissociation (CID). Within the Ultimate 3000 RSLCnano system (Thermo Scientific), samples were first concentrated onto an Acclaim PepMap 100 C18 trap column (100 μm × 2 cm, C18 particle size 5 μm, pore size 100 Å, ThermoFisher Scientific) prior to separation on an Acclaim PepMap RSLC nano-column (75 μm × 15 cm, C18 particle size 2 μm, pore size 100 Å, ThermoFisher Scientific). A flow rate of 700 nL min−1 was applied with the following gradient (solvent A 0.1% FA; solvent B 95% ACN), start: 10% solvent B, 5 min: 10% solvent B, 35 min: 60% solvent B, 40 min: 70% solvent B, 45 min: 70% solvent B, 46 min: 10% solvent B and 65 min: 10% solvent B. The LC was interfaced to the AmazonSpeed ion trap (Bruker Daltonics) via an electrospray ionization source equipped with CaptiveSpray™ and nanoBooster™ technologies (Bruker Daltonics). MSn was automatically performed on the three highest precursors per MS spectrum, with ion detection from m/z 500 to m/z 3000.

Analyte detection and confirmation AAT analytes were detected by creating an extracted ion chromatogram (EIC) of the major glycoforms for each of the three glycosylated peptides that were previously described in literature.6 A sum spectrum was created per peptide moiety, using all spectra that were recorded within a ± 25 s retention time (tr) tolerance around the spectrum with the highest intensity for the most abundant glycoforms. A single sum spectrum was created, per peptide moiety, from all four AAT LC gradients. All sum spectra were manually searched for the presence of additional glycoforms. The identity of the main peak(s) in each sum spectrum was investigated using MS2. IgG glycoforms were also manually assigned as described above, with the following minor differences. The tr tolerance for creating sum spectra was 15 s and sum spectra were created per unique peptide moiety. A total of five sum spectra were combined into a single sum spectrum to manually search for additional glycoforms.

Analyte Preprocessing in LaCyTools Analyte preprocessing in LaCyTools was performed in a similar manner as described before for MassyTools.22 Briefly, the calibration and/or quantitation was done on the basis of a user-defined list of analytes. For the analytes in this list, the required information is the composition of the analyte and whether or not the analyte should be considered as a calibrant. Optionally, the m/z width, and the minimum and maximum charge states for calibration/quantitation can be specified per analyte. Compositions can only contain building blocks that are defined inside the program’s “BLOCKS” section. Building blocks must be listed in the [unit 1 identifier][unit 1 count][unit 2 identifier][unit 2 count] notation, e.g., IgGI1H4N4F1 for an immunoglobulin G1 glycopeptide carrying a glycan consisting of four hexoses, four N-acetylhexosamines and one fucose. Unit identifiers can only contain letters, the ones used in this article being IgGI, IgGII and IgGIV for immunoglobulin G peptides, NLT (N107), NAT (N271) and NST (N70) for AAT peptides, H for hexose, N for N-acetylhexosamine, F for fucose and S for N-acetylneuraminic acid. Additional building blocks can be added according to needs and should encompass the exact mass of a unit as well as its elemental composition (number of carbon, hydrogen, nitrogen, oxygen and sulfur atoms). The elemental composition is required by the program to be able to calculate the theoretical isotopic pattern of analytes. LaCyTools also offers the possibility to apply building blocks to all compositions in a user-defined analyte list. Any building block that is put in the “MASS_MODIFIERS” section of the script will be applied to all compositions at once. However, by default this parameter is empty. Furthermore, the charge carrier applied to all compositions can be changed by modifying the “CHARGE_CARRIER” section, protonation (H+) being the default.
Alternatively, the charge carrier can also be changed in the composition itself, e.g. by adding [old charge carrier]-1[new charge carrier]1. Potential salt formation can be handled in the same way, e.g. [proton]-1[sodium]1 for a protonated sodium salt.
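The composition notation described above can be illustrated with a short parsing sketch. The Python example below is not taken from the LaCyTools source: the names `parse_composition`, `glycan_mass` and `RESIDUE_MASS` are hypothetical, and the mass table holds only the standard monoisotopic residue masses of the four monosaccharides named in the text. Peptide blocks are left out because their masses are not given here.

```python
import re

# Hypothetical stand-in for the "BLOCKS" section: monoisotopic residue
# masses (Da) of hexose, N-acetylhexosamine, fucose and N-acetylneuraminic acid.
RESIDUE_MASS = {"H": 162.0528, "N": 203.0794, "F": 146.0579, "S": 291.0954}

# One token = identifier made of letters, followed by a (possibly negative) count.
_TOKEN = re.compile(r"([A-Za-z]+)(-?\d+)")


def parse_composition(text):
    """Split e.g. 'IgGI1H4N4F1' into {'IgGI': 1, 'H': 4, 'N': 4, 'F': 1}."""
    units = {}
    position = 0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError("unexpected characters in composition: %r" % text)
        name, count = match.group(1), int(match.group(2))
        units[name] = units.get(name, 0) + count
        position = match.end()
    if position != len(text) or not units:
        raise ValueError("malformed composition: %r" % text)
    return units


def glycan_mass(units):
    """Sum the residue masses of the monosaccharide units only (assumption:
    identifiers missing from RESIDUE_MASS, such as peptides, are skipped)."""
    return sum(RESIDUE_MASS[name] * count
               for name, count in units.items() if name in RESIDUE_MASS)
```

Negative counts are accepted so that a charge-carrier exchange written as an identifier with count -1 followed by another with count 1 parses in the same way as any other composition.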

Alignment List The alignment of IgG and AAT samples was compared between LaCyTools, msalign2 and MZmine2. The minimum number of features necessary for both methods was five or seven for AAT and IgG, respectively. For the alignment of AAT these included N107-H5N4F1S1 (m/z 1438.641), N107-H5N4S2 (m/z 1474.901), N70-H5N4S2 (m/z 1347.353), N70-H5N4F1S2 (m/z 1383.868) and N271-H5N4S2 (m/z 990.918), where all m/z values were calculated as [M+4H]4+. The AAT notation refers to the peptides around N70, N107 and N271 (Supporting Information, Table S-1). IgG features used for alignment were IgG1-H3N4F1 (m/z 878.687), IgG1-H4N4F1 (m/z 932.704), IgG1-H5N4F1 (m/z 986.722), IgG2-H3N4F1 (m/z 868.024), IgG2-H4N4F1 (m/z 922.041), IgG2-H5N4F1 (m/z 976.059), IgG4-H3N4F1 (m/z 873.355), IgG4-H4N4F1 (m/z 927.373) and IgG4-H5N4F1 (m/z 981.390), where all m/z values were calculated as [M+3H]3+.
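As a quick consistency check on the listed values, the m/z of a protonated ion follows from the neutral mass M and charge z as (M + z·m_proton)/z. The sketch below uses hypothetical helper names and assumes the standard proton mass and hexose residue mass. It shows that adding one hexose to a triply protonated glycopeptide shifts the m/z by roughly 54.018 Th, which matches the spacing between the H3N4F1, H4N4F1 and H5N4F1 features above.

```python
PROTON = 1.007276   # Da, mass of a proton (assumed charge carrier)
HEXOSE = 162.0528   # Da, monoisotopic mass of a hexose residue


def mz_from_mass(neutral_mass, charge):
    """m/z of an [M+zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz, charge):
    """Neutral mass of an [M+zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON)


# IgG1-H3N4F1 at m/z 878.687 (3+) plus one hexose should land near
# the listed IgG1-H4N4F1 value of m/z 932.704.
shifted = mz_from_mass(mass_from_mz(878.687, 3) + HEXOSE, 3)
```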

Alignment The alignment of LC-MS runs requires the accurate retention time, meaning the exactly measured retention time (tr), to be determined for all alignment features. Therefore, LaCyTools creates an EIC of each feature with the specified m/z value ± tolerance in Thomson (Th). Additionally, the tr and tr tolerance of the feature are used to indicate the outer bounds for identifying the accurate tr. The tr of the spectrum that yields the highest intensity for a given feature is used as the accurate tr. Subsequently, the signal-to-noise (S/N) ratio of the specified features has to be determined, to ensure that features with an S/N below the user-specified minimum are not used for alignment. Due to the difference in the type of data, we use a different method to determine the S/N in the chromatographic component than in the mass spectrometric component. In detail, our NOBAN (normal distribution based background and noise determination) algorithm assumes that noise is normally distributed. A decision workflow of the
algorithm is presented in the Supporting Information, Figure S-1. The algorithm first sorts all the data points by increasing intensity (Supporting Information, Figure S-2A), and then takes the lowest 25% of the data points to get an initial estimate for the background and noise (Supporting Information, Figure S-2B). Subsequently, the program determines whether the number of considered data points has to be reduced or expanded, based on the average and standard deviation (SD) of its initial estimate. There are two possible scenarios. In the first scenario, the first data point after the initial estimate falls within the average plus three times the SD. The algorithm then adds this data point to the initial estimate to get the current estimate and recalculates the average and SD. This process is repeated until the following data point falls outside three times the SD of the current estimate. In the second scenario, the first data point after the initial estimate falls outside three times the SD, an indication that the initial estimate was too greedy. The algorithm then shrinks the initial estimate by removing the last data point and recalculates the average and SD of the current estimate. This procedure is repeated until a data point is removed that does fall within three times the SD of the current estimate, in which case the final reduction is undone. The resulting average is used as the background. For the noise value, either the difference between the highest and lowest background data point or the SD can be used (SD being used in this manuscript; Supporting Information, Figure S-2C). While the signal intensity is a well-defined concept and does not differ per method, the noise determination can produce greatly varying results. Therefore, we have compared the S/N that is reported by the NOBAN algorithm with a manual approach as well as the S/N that is reported by DA.
The manual S/N determination was done by taking, as the noise, the difference between the highest and lowest data point in a region that showed no clear signals. The signal was determined by subtracting the average of a region that showed no clear signals from the maximum signal intensity. Finally, the manual S/N was calculated by dividing the signal by the noise. The S/N that is reported in DA was determined by selecting the peak of interest, using the APEX algorithm with a full width at half maximum (FWHM) of 0.1 Th.
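The two NOBAN scenarios described above can be sketched in a few lines of Python. This is a minimal illustration written from the description in the text, not the LaCyTools implementation: the function names, the `min_points` guard and the use of the population SD are assumptions, and the SD is used as the noise value as in this manuscript.

```python
import math


def _mean_sd(values):
    """Average and (population) standard deviation of a non-empty list."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, sd


def noban(intensities, start_fraction=0.25, n_sd=3.0, min_points=2):
    """Return (background, noise, n_points) following the NOBAN description."""
    data = sorted(intensities)
    if not data:
        raise ValueError("no data points")
    # Initial estimate: the lowest 25% of the sorted data points.
    n = min(len(data), max(min_points, int(len(data) * start_fraction)))
    mean, sd = _mean_sd(data[:n])
    if n < len(data) and data[n] <= mean + n_sd * sd:
        # Scenario 1: the next point fits, so keep expanding the estimate.
        while n < len(data) and data[n] <= mean + n_sd * sd:
            n += 1
            mean, sd = _mean_sd(data[:n])
    else:
        # Scenario 2: the estimate was too greedy, so shrink it point by point.
        while n > min_points:
            reduced_mean, reduced_sd = _mean_sd(data[:n - 1])
            if data[n - 1] <= reduced_mean + n_sd * reduced_sd:
                break  # removed point fits after all: undo this reduction
            n -= 1
            mean, sd = reduced_mean, reduced_sd
    return mean, sd, n


def signal_to_noise(signal, background, noise):
    """Background-subtracted signal divided by the noise."""
    return (signal - background) / noise
```

Recomputing the average and SD from scratch at every step keeps the sketch close to the wording of the text; a running-sum update would be the obvious optimization for long chromatograms.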


To continue with the alignment, the exact tr, meaning the tr that was specified by the user, and the accurate tr of features that passed the specified signal-to-noise cutoff are used. After plotting the accurate tr against the exact tr, a power law function ($a \cdot x^{b} + c$) is fitted through the time pairs, using the value 1 as an initial estimate for the a, b and c parameters. The program is then allowed to randomly adjust the a, b and c parameters until the fit yields the lowest least squares distance. However, a steep penalty is applied to the fit if the b term is below 0 or above 2, to prevent overfitting. Finally, the LC-MS run is aligned by adjusting the tr according to $t_{r,\mathrm{new}} = f(t_{r,\mathrm{old}})$. LaCyTools alignment was compared with msalign2 and MZmine2 alignment, using the AAT glycopeptides.16,17 For msalign2, the parameters were set as follows: a mass error of 20 ppm, a “standard deviation retention time” of 50 scans, a minimum of 50 features, a cost per breakpoint of 0.3, a fitness of 1 and a maximum of 1000 generations during the fit calculation. For MZmine2, peaks were detected using the exact mass detection in MS1, with a noise threshold of 1000 arbitrary units (a.u.). Subsequently, chromatograms were built using a “minimum time window” of 0.02 s, a minimum intensity of 1000 a.u. and an m/z tolerance of 0.05 Th. The tr were then normalized using a minimum intensity of 1000 a.u., an m/z tolerance of 0.05 Th and a tr tolerance of 60 s. Lastly, tr alignment was performed using an m/z tolerance of 0.05 Th, a weight for the m/z of 20, a tr tolerance of 60 s and a weight for the tr of 10, and all detected features had to be of the same charge state. For LaCyTools, the alignment was performed using a tr tolerance of 45 s, an m/z tolerance of 0.1 Th, a minimum S/N of 9 for alignment features and a minimum of five features to be used for the alignment.
Alternatively, for the IgG glycopeptides, LaCyTools alignment was performed using a tr tolerance of 10 s, an m/z tolerance of 0.1 Th, a minimum S/N of 9 for alignment features and a minimum of seven features to be used for the alignment.
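The power-law alignment fit can be illustrated with the following sketch. It deliberately swaps in a different optimization than the one described in the text: instead of a penalized iterative fit started at a = b = c = 1, it scans b on a grid and solves a and c in closed form by ordinary linear least squares for each b. The hard limit 0 < b ≤ 2 stands in for the penalty on b. The function names and the assumption that the fit maps observed retention times onto expected ones are illustrative choices, not details taken from LaCyTools.

```python
def fit_power_law(observed_rt, expected_rt, b_step=0.001):
    """Fit expected = a * observed**b + c with 0 < b <= 2.

    Retention times are assumed to be positive. For every candidate b the
    model is linear in a and c, so those two follow from simple regression.
    """
    n = len(observed_rt)
    mean_y = sum(expected_rt) / n
    best = None
    for i in range(1, int(round(2.0 / b_step)) + 1):
        b = i * b_step
        u = [x ** b for x in observed_rt]
        mean_u = sum(u) / n
        s_uu = sum((ui - mean_u) ** 2 for ui in u)
        if s_uu == 0.0:
            continue  # all time points identical: slope undefined
        a = sum((ui - mean_u) * (yi - mean_y)
                for ui, yi in zip(u, expected_rt)) / s_uu
        c = mean_y - a * mean_u
        sse = sum((a * ui + c - yi) ** 2 for ui, yi in zip(u, expected_rt))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("at least two distinct retention times are required")
    return best[1], best[2], best[3]


def align(rt, a, b, c):
    """Apply the fitted function to an observed retention time."""
    return a * rt ** b + c
```

With the five to seven alignment features used here, the grid scan over 2000 values of b costs only a few thousand arithmetic operations, so the simplicity of the approach comes at no practical cost.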

Sum Spectrum LaCyTools performs both calibration and quantitation on a single sum spectrum per analyte cluster (in our LC-MS method, e.g. IgG1-H3N4F1, IgG1-H4N4F1 and IgG1-H5N4F1 will elute at very similar tr).


The first step in creating a sum spectrum is generating an empty spectrum with equally sized m/z bins, where the number of bins per Th is specified in the script, with a default value of 100. Next, the tr region is calculated per analyte cluster (specified tr ± tr tolerance). Subsequently, all spectra that have a tr that falls within the tr region are selected. In detail, the sum spectrum bins are filled by summing the intensities of all data points that have an m/z value that is larger than the lower edge of the bin and smaller than or equal to the upper edge of the bin, using all the individual spectra that fulfilled the above-mentioned tr selection. For both the AAT and IgG glycopeptides, 100 data points per Th were used, as well as a tr tolerance of 8 s around the center of each peptide cluster.
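The binning rule above (lower edge exclusive, upper edge inclusive) can be sketched as follows. The function name, the argument names and the in-memory layout of a run as a list of (tr, data points) pairs are assumptions made for this illustration; they do not reflect how LaCyTools reads mzXML files.

```python
import math


def build_sum_spectrum(spectra, rt_center, rt_tolerance,
                       mz_min, mz_max, bins_per_th=100):
    """Sum all spectra within rt_center +/- rt_tolerance into m/z bins.

    spectra: iterable of (rt, [(mz, intensity), ...]) pairs (assumed layout).
    Bin i covers mz_min + i/bins_per_th < m/z <= mz_min + (i + 1)/bins_per_th.
    """
    n_bins = int(round((mz_max - mz_min) * bins_per_th))
    bins = [0.0] * n_bins
    for rt, points in spectra:
        if abs(rt - rt_center) > rt_tolerance:
            continue  # spectrum lies outside the tr region of this cluster
        for mz, intensity in points:
            index = int(math.ceil((mz - mz_min) * bins_per_th)) - 1
            if 0 <= index < n_bins:
                bins[index] += intensity
    return bins
```

Using `ceil(...) - 1` rather than `floor(...)` is what places a data point that sits exactly on a bin edge into the lower bin, as the text requires.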

Calibration & Quantitation List LaCyTools performs calibration and quantitation on the basis of a single user defined list as described in Analyte Preprocessing in LaCyTools. The following glycoforms were extracted for each of the glycosylated AAT peptide sequences: H5N4S2, H5N4F1S2, H6N5S3, H6N5F1S3, H5N4S1, H5N4F1S1, H6N5S2 and H6N5F1S2. Of these, the glycoforms H5N4S2, H5N4F1S2, H6N5S3 and H6N5F1S3 were used as potential calibrants, for each peptide sequence. For the IgG measurements, the following glycoforms were extracted for all unique Fc-peptide moieties derived from the IgG subclasses: H3N3, H5N2, H4N3, H3N4, H4N3F1, H3N4F1, H4N4, H4N4F1, H5N4, H4N3F1S1, H5N4F1, H5N5, H4N4F1S1, H5N5F1, H5N4F1S1, H5N4F1S2. However, for IgG4 the afucosylated glycoforms were removed due to overlap with IgG1 glycoforms. Calibration of each subclass of the IgG measurements was achieved using the following glycoforms: H3N4F1, H4N4F1, H5N4F1 and H5N4F1S1.

Calibration The accurate mass of each calibrant has to be determined for calibration. Therefore, LaCyTools calculates the exact mass for the most abundant isotope of each specified calibrant in all applicable charge states, e.g. m/z 1318.024 and m/z 879.017 for the second isotope of IgG1-H3N4F1 as a doubly and triply protonated molecule, respectively. The accurate mass is then determined by fitting a cubic spline
through all of the data points that fall within a charge state corrected mass region around the exact mass, e.g. a mass tolerance of ± 0.4 Da will yield a region of ± 0.13 Th around the exact mass of a triply charged analyte.22 Subsequently, the S/N of each potential calibrant is calculated as described below. Calibrants that do not pass the specified S/N are excluded from the calibration. The number of remaining calibrants must be equal to or greater than the minimum number of calibrants specified by the user, the absolute minimum being four. A second degree polynomial function is fitted through the remaining pairs of accurate and exact masses, minimizing the residual post-calibration mass error for all calibrants, reported in parts-per-million (ppm). The sum spectrum is then calibrated by applying the resulting second degree polynomial to all m/z values in the sum spectrum using the following function: $m/z_{\mathrm{new}} = f(m/z_{\mathrm{old}})$. For the calibration of the AAT measurements, a minimum of three calibrants was required per charge state ([M+4H]4+ and [M+5H]5+). Alternatively, for the IgG measurements a minimum of five calibrants was required per charge state ([M+2H]2+ and [M+3H]3+). Furthermore, all calibrants had to have an S/N above 9 and the feature had to be detected within a mass tolerance of 0.3 Da (charge state corrected) from the exact mass.
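The second degree polynomial calibration can be sketched without external libraries as shown below. This is an illustration under stated assumptions rather than the LaCyTools code: the function names are hypothetical, and for numerical stability the sketch fits the mass shift (exact minus accurate) against a centered and scaled m/z axis, which is mathematically equivalent to fitting a second degree polynomial through the pairs of accurate and exact masses.

```python
def _solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][k] * solution[k] for k in range(i + 1, n))
        solution[i] = (rows[i][n] - tail) / rows[i][i]
    return solution


def fit_calibration(accurate, exact):
    """Return a function mapping an uncalibrated m/z to a calibrated m/z."""
    if len(accurate) < 3:
        raise ValueError("a second degree fit needs at least three calibrants")
    center = sum(accurate) / len(accurate)
    scale = max(abs(m - center) for m in accurate) or 1.0
    u = [(m - center) / scale for m in accurate]
    shift = [e - m for m, e in zip(accurate, exact)]
    # Normal equations for shift ~ c0 + c1*u + c2*u**2.
    power_sums = [sum(ui ** k for ui in u) for k in range(5)]
    normal = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    rhs = [sum(si * ui ** i for ui, si in zip(u, shift)) for i in range(3)]
    c0, c1, c2 = _solve(normal, rhs)

    def calibrate(mz):
        t = (mz - center) / scale
        return mz + c0 + c1 * t + c2 * t * t

    return calibrate


def ppm_error(measured, exact):
    """Mass error in parts-per-million."""
    return (measured - exact) / exact * 1e6
```

The returned `calibrate` function would be applied to every m/z value of the sum spectrum, and `ppm_error` evaluated on the calibrants gives the residual post-calibration mass error.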

Background Determination in Sum Spectra LaCyTools uses the adaptive background determination reported before in MassyTools.22 However, there are some differences due to the effect of multiple charge states in LC-ESI-MS data. Briefly, the program integrates all m/z regions (for the definition, see Jansen et al.22) that fall within a specified m/z tolerance, the default being ± 10 m/z regions from the exact mass. Both width and spacing of the m/z regions are essential to accurately determine the background and noise of an analyte. Therefore, the spacing is charge state corrected, e.g. for a triply charged analyte all m/z regions are spaced at 0.33445 Th, which is 1.00335 Da (the mass difference between 12C and 13C) divided by 3. However, the width of the m/z regions has to be specified by the user in the script, using the “MASS_WINDOW” parameter. This value should be as large as the width of the MS peak of
interest. Across the whole range of m/z regions, the average intensity for all possibilities of five consecutive regions is calculated. The five consecutive m/z regions offering the lowest average intensity are considered as the background region. The area for each of these regions is then calculated as described below. The average area of the five m/z regions is the final background. Additionally, the background intensity and noise are calculated as the average and the SD for all the data points of the five consecutive m/z regions.
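The selection of the background region can be sketched as follows. This is an illustration under assumptions rather than the LaCyTools implementation: the function names are hypothetical, every region is assumed to hold at least one data point, the window average is taken over the region averages, and the sample SD is used for the noise.

```python
import statistics

C13_C12_DELTA = 1.00335  # Da, mass difference between 12C and 13C (from the text)


def region_centers(exact_mz, charge, n_regions=10):
    """Centers of the candidate m/z regions, spaced by 1.00335 / charge Th."""
    spacing = C13_C12_DELTA / charge
    return [exact_mz + k * spacing for k in range(-n_regions, n_regions + 1)]


def determine_background(regions, window=5):
    """Pick the `window` consecutive regions with the lowest average intensity.

    regions: list of lists holding the intensities of the data points inside
    each m/z region (each region is MASS_WINDOW wide), ordered by m/z.
    Returns (start_index, background_intensity, noise).
    """
    if len(regions) < window:
        raise ValueError("need at least `window` regions")
    means = [statistics.fmean(region) for region in regions]
    start = min(range(len(means) - window + 1),
                key=lambda i: sum(means[i:i + window]))
    points = [x for region in regions[start:start + window] for x in region]
    return start, statistics.fmean(points), statistics.stdev(points)
```

For a triply charged analyte, `region_centers(exact_mz, 3)` yields 21 centers spaced at 0.33445 Th; the intensities collected around each center are then passed to `determine_background`.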

Relative Quantitation and Quality Control

LaCyTools offers several methods of quantitation. One option, max quantitation, reports the highest intensity data point for each analyte (MS peak intensity). However, the default method is MS peak area integration as performed by MassyTools.22 Briefly, within the integration region (default ± 0.2 Th) the area is calculated as the sum of all data point intensities multiplied by the distance in Th between data points. The background area is calculated following the same principle. The software is able to report the area of an analyte, the background subtracted area of an analyte, the background area around an analyte and the average noise around an analyte. The (background subtracted) area of an analyte can also be listed as a relative value after total area normalization. Furthermore, the total intensity, meaning the sum of the intensities of all analytes, can be reported, as this is required for total area normalization. Finally, LaCyTools outputs the same QC as MassyTools, namely isotope pattern quality (IPQ) (previously called QC value), mass error and the S/N for the theoretically highest isotope.22 However, LaCyTools reports these values for each measured charge state separately.

LaCyTools was compared with both DA and 3D Max quantitation. For a fair comparison, all three methods were set up to quantify the first three isotopes of both quadruply and quintuply charged glycopeptides of AAT. Manual editing of the analyte reference file (see Analyte Preprocessing in LaCyTools) was required to allow LaCyTools to integrate only the first three isotopes, as LaCyTools normally integrates a percentage of the theoretical isotopic pattern. All glycopeptide signals were integrated using LaCyTools applying an m/z tolerance of ± 0.15 Th around the exact mass. For DA integration, summed EICs were created for each AAT glycopeptide, using the three main isotopes of both quadruply and quintuply protonated molecules with a width of 0.05 Th. Subsequently, the area was calculated by integrating the summed EICs using DA. The reported areas were then normalized to the total area. For 3D Max quantitation, a wider m/z region of 0.15 Th was used as this program cannot calibrate the data.18 3D Max then reports the maximum intensity measured in each m/z region, thus per isotope. These were then summed manually to get a single intensity value for each glycopeptide, followed by total area normalization. For IgG glycopeptides, in contrast, LaCyTools was set to integrate 99% of the theoretical isotopic pattern.

Lastly, the LaCyTools QC were used to curate the integrated AAT analytes. Briefly, the mass error, S/N and IPQ were all calculated per charge state. Analytes were considered detected if they had a mass error between -20 and 20 ppm, an S/N above 9 and a deviation from the theoretical isotopic pattern below 5% in at least one of their charge states. For quantitation, however, the S/N had to be above 27. An exception was made for the main glycoform from N70 of AAT, which failed the curation (mass error > 20 ppm) in the [M+5H]5+ charge state, due to a bad peak shape. Since the other QC passed the threshold and the influence of the interference on quantitation was judged to be minor, the bias incurred by excluding this glycoform was judged to be larger than the bias of retaining it. Other analytes that failed this curation were excluded from all quantitation methods.
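The MS peak area integration and the total area normalization described in this section can be sketched as follows. The function names are hypothetical, and weighting each data point by the distance to the next data point is an assumption of this sketch; the text only states that intensities are multiplied by the distance between data points.

```python
def integrate_region(mz, intensity, center, tolerance=0.2):
    """Area inside center ± tolerance as the sum of intensity * spacing.

    mz and intensity are parallel lists sorted by m/z. Each data point is
    weighted by the distance (in Th) to the next data point.
    """
    area = 0.0
    for i in range(len(mz) - 1):
        if abs(mz[i] - center) <= tolerance:
            area += intensity[i] * (mz[i + 1] - mz[i])
    return area


def background_subtracted(area, background_area):
    """Analyte area after subtracting the area of the background region."""
    return area - background_area


def total_area_normalize(areas):
    """Express each analyte area as a percentage of the summed area."""
    total = sum(areas.values())
    if total <= 0:
        raise ValueError("total area must be positive")
    return {name: 100.0 * area / total for name, area in areas.items()}
```

In this sketch the background area would be obtained by calling `integrate_region` on the background regions with the same tolerance, so that analyte and background are treated "following the same principle".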

Performance and Data Storage

The performance of LaCyTools was evaluated by using a set of 221 measurements of one IgG glycopeptide sample which had been acquired over the period of one month. A complete workflow was performed on these measurements, including alignment, calibration, integration and calculation of the QC. The time was tracked using the log file created by LaCyTools. For comparison purposes, we tracked the processing time of the four AAT measurements using the individual packages of MZmine2, DA and 3D Max. Notably, these require significantly more hands-on time than LaCyTools.

Furthermore, due to the input and output (IO) performance limitations and the size of the mzXML MS data format, an optional data encoding scheme, based on the BLOSC library, was included.29 BLOSC (blocking, shuffling, and compression) is a meta-compression library optimized for IO performance over compression efficiency. Data are written to disk as blocked, shuffled and compressed chunks. Chunks are transmitted to the CPU cache and decompressed faster than with traditional methods. The BLOSC library was first implemented as a filter in the PyTables library, a Python library that is built around the HDF5 library.30,31 The HDF5 library itself is a library for hierarchical multidimensional binary data, able to cope with extremely large and complex data sets. LaCyTools is able to encode all measurements in two extendable arrays, where one array stores the m/z values and the other array stores the intensities. Each array contains a single row for each scan that was measured during the LC-MS run. We compared the performance of LaCyTools using the standard mzXML reading methods and the HDF5 methods, as well as the file size, using the four AAT measurements.

RESULTS & DISCUSSION

LaCyTools was developed to allow the (pre)processing of glycoproteomic LC-MS data in a robust and high-throughput manner. LaCyTools is able to perform tr alignment, mass spectrum calibration, targeted data integration of all isotopes of a list of user defined analytes and QC calculation (for each analyte the mass error, S/N and IPQ; all per charge state). We compared LaCyTools with several other software packages at individual steps of an entire glycoproteomic workflow; an overview of the capabilities of the tested software packages is presented in Table 1. The LaCyTools package is released under the Apache 2.0 license and is freely available on GitHub (https://github.com/Tarskin/LaCyTools).

In order to demonstrate the potential of LaCyTools, we performed a complete analysis of two glycoproteomic measurement sets: 1) a single AAT sample measured using four different gradients and 2) a single IgG sample measured 221-fold over the period of one month. The analysis included tr alignment, sum spectra creation and calibration, after which targeted data and background integration took place and QC were calculated (Figure 1). The analyte list was created by manual inspection of sum spectra created per peptide moiety, using all spectra that were recorded around the spectrum with the highest intensity for the most abundant glycoform, based on published literature for both the AAT and IgG glycopeptide samples.2,6 The identity of the main glycopeptides of AAT was confirmed by MS2 (Supporting Information, Figure S-3). No glycoforms additional to the ones published before were detected on the basis of MS1 and MS2 data.

Alignment

The first part of the alignment requires the determination of the S/N of the features for alignment, using the NOBAN algorithm. We have compared the NOBAN algorithm with both a manual approach and the commercially available DA. The resulting S/N are generally 2.6 times higher than manually calculated values, the average of two features over four AAT measurement runs being 237 (LaCyTools) and 92 (manual). A full table comparing the different S/N determinations can be found in the Supporting Information, Table S-2. The large difference between DA and LaCyTools is caused by the DA method overestimating the noise, as illustrated by the peak eluting before the N107-H5N4S2 glycopeptide being considered noise by the DA method (Supporting Information, Figure S-2C). Furthermore, the smaller difference between the manual S/N determination and the S/N reported by LaCyTools, using the difference between the highest and lowest data point in a region as noise, is caused by slightly varying tr tolerances between the two methods.

The alignment function of LaCyTools was tested using a single AAT sample that was measured using four different LC gradients. Alignment was performed using five features, namely N107-H5N4F1S1 (m/z 1438.641), N107-H5N4S2 (m/z 1474.901), N70-H5N4S2 (m/z 1347.353), N70-H5N4F1S2 (m/z 1383.868) and N271-H5N4S2 (m/z 990.918), with all m/z values calculated as [M+4H]4+. The accurate tr of each feature was determined as the peak maximum observed in the automatically generated EIC, and coupled to its target tr, as manually estimated from gradient 1 (Figure 2). A power law function was used to align the features to the target tr (Figure 3). Before alignment, the unaligned time difference (Δutr) between exact and accurate tr varied between -37.7 s for N271-H5N4S2 in gradient 3 and 20.5 s for N271-H5N4S2 in gradient 2 (Figure 4). Following alignment, all features had a residual time difference (Δtr), between exact and accurate tr, of less than 1 s (Figure 4; Supporting Information, Table S-3). Furthermore, the total root mean square of all Δtr was below 0.5 s.

Additionally, we tested the alignment using 221 IgG glycopeptide runs of the same sample, measured over a 1 month period. A total of 9 features was used for the alignment of each LC-MS run (Supporting Information, Figure S-4). Before alignment, there was a maximum tr difference of 13.9 s between measurements (-6.8 s vs. 7.1 s), with 657 of 1989 (9 x 221) features showing a Δutr above 1 s and the maximum observed Δutr being 8.1 s (Supporting Information, Table S-4). After alignment, 95.4% of all features showed a Δtr below 1 s from the target tr. The remaining 92 of 1989 (9 x 221) features, which showed a Δtr of more than 1 s, all showed a Δtr of less than 2 s, except for 5 features that showed a poor peak shape (Supporting Information, Table S-5).

LaCyTools uses a power law function for alignment, as opposed to the polynomial used during calibration. The advantage of a power law over a polynomial fit is that the minimum or maximum will always be at x = 0. With the function minimum or maximum at x ≠ 0 in a polynomial fit, two scans could have the same tr after alignment (data not shown). However, a power law function performs similarly to a polynomial function for runs where the above mentioned problem does not occur (Supporting Information, Figure S-5).
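The argument for a power law over a polynomial can be illustrated with a short sketch. The three-parameter form a * t**b + c and all parameter values below are assumptions chosen for illustration; the text only states that a power law is used. The point is that a power law has no extremum at t > 0, so scan order is preserved, whereas a quadratic with its vertex inside the run maps two different scan times onto the same aligned time.

```python
def power_law(t, a, b, c):
    """Power law a * t**b + c (three-parameter form assumed for this sketch)."""
    return a * t ** b + c


def quadratic(t, p2, p1, p0):
    """Second degree polynomial p2 * t**2 + p1 * t + p0."""
    return p2 * t ** 2 + p1 * t + p0


def is_strictly_increasing(values):
    """True if every value is larger than its predecessor."""
    return all(x < y for x, y in zip(values, values[1:]))


scan_times = list(range(1, 2000, 10))  # seconds, hypothetical scan times

# Hypothetical fit results: the power law is monotonic for all t > 0 ...
aligned_power = [power_law(t, 1.02, 0.995, -3.0) for t in scan_times]
# ... while this quadratic has its maximum at t = 750 s, inside the run.
aligned_poly = [quadratic(t, -0.001, 1.5, 0.0) for t in scan_times]

power_keeps_order = is_strictly_increasing(aligned_power)
poly_keeps_order = is_strictly_increasing(aligned_poly)
```

With these invented parameters the quadratic assigns the same aligned time to scans recorded at 700 s and 800 s, which is the failure mode described above.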
LaCyTools alignment was compared to an existing alignment program, MSAlign2, which uses a genetic algorithm, i.e. a heuristic search that mimics natural selection.17,32 MSAlign2 requires a master file that it uses as the alignment target. Therefore, to ensure a fair comparison, the target times for LaCyTools alignment were taken from the same run as the specified master file for MSAlign2. However, MSAlign2 was unable to align the data, presumably due to the low number of features that were detected in our sample. This result clearly demonstrates the difference between untargeted proteomic and targeted glycoproteomic experiments, namely the data density. Untargeted alignment works well for proteomics studies, but the requirement for a large number of features hinders its applicability to targeted glycoproteomic study designs.

Furthermore, LaCyTools alignment was compared to untargeted peak detection and alignment using MZmine2.16 MZmine2 was able to detect 2000 peaks in the AAT measurements, which included the peaks of interest. The alignment of several analytes was evaluated, including the main glycopeptide H5N4S2 in each peptide cluster. After alignment, the Δtr of each analyte that was used for alignment was below 1 s (data not shown). LaCyTools also showed a Δtr of less than 1 s for all features that were used for alignment (Supporting Information, Table S-3). The main difference between the two methods is therefore the ease of use: MZmine2 requires the user to perform several sequential steps, each with its own set of parameters that need optimization. For instance, we spent several hours finding the exact parameters required to achieve an MZmine2 alignment where the Δtr for all features was below 1 s. Within LaCyTools, we simply set the tr tolerance to be the maximum observed Δtr ± 10%.

Sum Spectrum & Calibration

LaCyTools performs calibration and quantitation on a sum spectrum that is created by summing all spectra within a specified tr region. The main advantage of using a sum spectrum to calibrate a peptide cluster is that features that elute either at the beginning or towards the end of the peptide cluster are present in the sum spectrum and can be used as calibrants. For calibration, we use a spline fit combined with a polynomial function.22 To evaluate the quality of the calibration, we calibrated both AAT and IgG glycopeptide samples. AAT measurements were calibrated using a sum spectrum resolution of 100 data points per Th (Figure 5). Per peptide cluster of AAT, the H5N4S2, H5N4F1S2, H6N5S3 and H6N5F1S3 glycoforms were used as potential calibrants. The average residual mass error after calibration was below 1 ppm for all quadruply charged potential calibrants and the SD varied between 1.0 and 5.9 ppm. For the quintuply charged potential calibrants, the program showed an average residual mass error below 2 ppm for 4 of the 8 potential calibrants, while the remaining 4 potential calibrants showed average residual mass errors above 20 ppm (Supporting Information, Table S-6). The 4 calibrants that showed a higher residual mass error also showed an interference or a poor peak shape. The SD of the correctly identified calibrants was between 1.2 and 2.0 ppm.

Furthermore, for the 221 IgG glycopeptide runs of the same sample, measured over a 1 month period (Supporting Information, Figure S-6), an average residual mass error below 1 ppm was found for all potential calibrants (H3N4F1, H4N4F1, H5N4F1 and H5N4F1S1) in all IgG subclasses, in both the doubly and the triply charged form. The SD of the residual mass errors varied between 1.5 ppm for IgG2-H5N4F1S1 and 4.2 ppm for IgG1-H5N4F1 (Supporting Information, Table S-7). Furthermore, the average residual mass error of non-calibrant glycopeptides was between -3 and 6 ppm in either doubly or triply charged form (Supporting Information, Table S-8). The higher mass error for these peaks is caused by irregular peak shapes, as both glycopeptides have less than 0.5% relative abundance. These results demonstrate that calibration of peptide clusters using LaCyTools yields high quality calibration. Furthermore, the post-calibration residual mass errors can be used to identify potential contaminant peaks. High quality calibration is essential to allow the researcher to use a narrow m/z region during quantitation, leading to improved selectivity.

Relative Quantitation and Quality Control

Quantitation using LaCyTools was compared with quantitation using either DA or 3D Max. The 3D Max quantitation was performed using a wider quantitation region than used for LaCyTools and DA, as 3D Max Extractor does not have an option for calibration. The profiles reported by DA, 3D Max and the max quantitation option of LaCyTools are similar, as all use MS peak intensity. The values reported by MS peak area integration of LaCyTools differ slightly from both the DA and 3D Max values, as illustrated by the H5N4S2 glycopeptide of N70 being only 80.7% (SD 0.4%) with LaCyTools while it is 86.8% (SD 0.1%) and 86.7% (SD 0.1%) with DA and 3D Max, respectively (Figure 6). The discrepancy in relative quantities is partially caused by an increase of the peak width with increasing m/z, something that is ignored when looking only at the maximum intensity. Another reason for the discrepancy is that less of the isotope pattern is considered in the DA and 3D Max methods. Both biases result in an underestimation of larger glycopeptides and an overestimation of smaller glycopeptides in DA, 3D Max and the max quantitation option of LaCyTools. Furthermore, there was no difference in the variation observed for the four methods, as illustrated by the SD of the main AAT glycoform never exceeding 1% for all three peptide clusters (Supporting Information, Table S-9). We previously reported that MS peak area integration yielded a higher precision than MS peak intensity quantitation.22 We did not observe this for the AAT measurements, possibly due to the simplicity of AAT glycosylation with one dominant glycoform (> 80% relative abundance) for N271 and N70. However, for N107 we did notice a minor improvement of MS peak area integration when compared to MS peak intensity quantitation. Therefore, we also compared MS peak area integration with MS peak intensity quantitation using the 221 IgG measurements, as IgG has a more complex glycosylation than AAT. The major glycoform of IgG1, H4N4F1, showed a significant difference in precision between LaCyTools and 3D Max, with SDs of 1.2% and 3.9%, respectively (Brown-Forsythe test; p-value < 0.000001).33

We evaluated the effect of the peak width bias by several methods. Manual inspection of the spectra indeed confirmed that the main (4th) isotopic peak of H5N4S2 was narrower than the main (4th) isotopic peak of H6N5S3, as demonstrated by an average full width at half maximum (FWHM) of 0.0304 Th for H5N4S2 and an average FWHM of 0.0585 Th for H6N5S3.
Additionally, we quantified the quadruply charged glycoforms on N70 of AAT with LaCyTools using various integration m/z tolerances, from 0.20 Th to 0.05 Th, to compare with an MS peak intensity quantitation (Supporting Information, Figure S-7). The results clearly show that the observed difference between an MS peak area integration and an MS intensity based quantitation derives from the peak width.
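The size of the peak width effect can be estimated with a short worked example. It assumes Gaussian peak shapes, which the text does not state, and uses only the two FWHM values reported above; the helper name `gaussian_peak_area` is hypothetical. For a Gaussian, the area equals height x FWHM x sqrt(pi / (4 ln 2)), so two peaks of identical height differ in area by the ratio of their widths.

```python
import math

# Area of a Gaussian peak = height * FWHM * sqrt(pi / (4 ln 2)), about 1.0645.
GAUSSIAN_AREA_FACTOR = math.sqrt(math.pi / (4.0 * math.log(2.0)))


def gaussian_peak_area(height, fwhm):
    """Area of a Gaussian peak with the given height and FWHM (in Th)."""
    return height * fwhm * GAUSSIAN_AREA_FACTOR


fwhm_h5n4s2 = 0.0304  # Th, average FWHM reported for H5N4S2
fwhm_h6n5s3 = 0.0585  # Th, average FWHM reported for H6N5S3
height = 1.0          # identical, arbitrary peak height (hypothetical)

area_ratio = (gaussian_peak_area(height, fwhm_h6n5s3)
              / gaussian_peak_area(height, fwhm_h5n4s2))
print(f"area ratio at equal peak height: {area_ratio:.2f}")
```

Under this assumption an intensity based method would rate the two isotopic peaks as equal, while an area based method would report the wider peak as roughly 1.9 times larger, which is in line with the underestimation of larger glycopeptides by intensity based quantitation.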

Regarding the isotopic pattern, DA and 3D Max require the user to specify all the isotopes for quantitation, which was 3 isotopes in this study. The same number of isotopes will reflect a different percentage of the theoretical isotopic pattern for two different glycopeptides. For example, the 3 main isotopes of H5N4S2 on peptide N70 capture 57.2% of the theoretical isotopic pattern, while the 3 main isotopes of H6N5S3 on the same peptide capture 54.5% of the theoretical isotopic pattern (Supporting Information, Figure S-8). This will directly result in a 3% relative bias in DA and 3D Max, explaining a significant part of the observed differences. The bias becomes larger with a larger spread of glycopeptide sizes, as found in more complex samples. To reduce the biases described above, LaCyTools normally performs an MS peak area integration of a minimum percentage of the theoretical isotopic pattern. The data quality determines how many isotopes can actually be distinguished from the background. Though background subtraction significantly reduces the error in quantitation, data quality may suffer if an excessive number of low abundant isotopes is integrated. A compromise needs to be found, but as LaCyTools reports the theoretical percentage of the isotope pattern that has been integrated, the obtained quantitation results can be corrected for the remaining bias.

Additionally, we have quantified 221 runs of IgG glycopeptides, measured over a period of 1 month, with LaCyTools MS peak area integration. We used 99% of the theoretical isotopic pattern and obtained a profile highly similar to previously reported IgG glycopeptide profiles (Supporting Information, Figure S-9). However, while the SD we observed for the IgG measurements was low, it was higher than we expected. For instance, the main glycopeptide of IgG1 (H4N4F1) showed an average relative abundance of 37.4% (SD 1.2%), using a list of 16 features.
The SD decreased to 0.6% when renormalizing using only the main features, H3N4F1, H4N4F1 and H5N4F1. Visual inspection of the relative ratios over time (1 month) showed a decrease of the H4N4F1 glycopeptide with time (linear regression; p < 0.001) as well as an increase of the H5N4F1 glycopeptide (linear regression; p = 0.022), as presented in Supporting Information, Figure S-10. However, it is unclear whether this minor effect is caused by changing instrument performance or partial decay of the sample.

All AAT data were curated using the QC calculated by LaCyTools. For instance, the glycopeptide N107-H5N4F1S1, with a quadruply charged m/z value of 1438.641, showed an average mass error of -99.9 ppm and its isotopic pattern deviated 14.3% from the expected isotopic pattern. Examining the mass spectrum showed that the area or intensity reported by the other methods, 3D Max Extractor and DA, belonged to an interference at m/z 1437.179 (Supporting Information, Figure S-11). The use of the QC allowed us to quickly identify such contaminants, whereas 3D Max and DA require detailed manual inspection of all data, something that is not desirable for large clinical cohorts. The compositions of the five most abundant glycopeptides of AAT that passed curation were verified by fragmentation (Supporting Information, Figure S-3).
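The curation criteria given in the experimental section (mass error between -20 and 20 ppm, S/N above 9 for detection or above 27 for quantitation, and less than 5% deviation from the theoretical isotopic pattern in at least one charge state) can be sketched as a simple filter. The class and function names are hypothetical, the inclusive handling of the ± 20 ppm boundary is an assumption, and the S/N value in the usage example is invented because the text does not report it.

```python
from dataclasses import dataclass


@dataclass
class ChargeStateQC:
    mass_error_ppm: float
    signal_to_noise: float
    ipq: float  # deviation from the theoretical isotopic pattern, as a fraction


MAX_MASS_ERROR_PPM = 20.0
MAX_IPQ = 0.05
MIN_SN_DETECTION = 9.0
MIN_SN_QUANTITATION = 27.0


def passes(qc, min_sn):
    """True if a single charge state meets all three QC thresholds."""
    return (abs(qc.mass_error_ppm) <= MAX_MASS_ERROR_PPM
            and qc.signal_to_noise > min_sn
            and qc.ipq < MAX_IPQ)


def is_detected(charge_states):
    """An analyte counts as detected if any charge state passes the QC."""
    return any(passes(qc, MIN_SN_DETECTION) for qc in charge_states)


def is_quantifiable(charge_states):
    """Quantitation applies the same checks with the stricter S/N threshold."""
    return any(passes(qc, MIN_SN_QUANTITATION) for qc in charge_states)


# N107-H5N4F1S1, [M+4H]4+: mass error and IPQ from the text, S/N invented.
interfered = [ChargeStateQC(mass_error_ppm=-99.9, signal_to_noise=50.0, ipq=0.143)]
flagged_as_contaminant = not is_detected(interfered)
```

In this sketch the interfered analyte is rejected on both the mass error and the isotopic pattern, without any manual inspection of the spectrum.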

Performance and Data Storage

The performance of the entire workflow of alignment, calibration, integration and curation using LaCyTools was evaluated using a set of 221 IgG glycopeptide measurements. The entire workflow took 10 hours to complete using an in-house server, equating to roughly 3 minutes per sample. In comparison, using 3D Max, the extraction took more than 24 hours, and alignment was not possible. Furthermore, significant additional manual steps had to be performed to convert the 3D Max output into a format that was comparable to the LaCyTools output, e.g. summing the area of individual isotopes and normalizing the data.

Additionally, we evaluated the performance gain and size decrease of using a PyTables file format, using the four AAT glycopeptide measurements. A major benefit of the PyTables format is a significant reduction in the size of the data, with the PyTables file being 30% of the size of the regular mzXML files. Furthermore, processing from a PyTables file was significantly faster than from mzXML files. However, there is currently no way to go from the raw data directly to a PyTables file, making it necessary to first convert to the mzXML format and thus negating any performance gain for now (Supporting Information, Figure S-12).

Limitations and Future Prospects

Currently, LaCyTools requires other software, e.g. ProteoWizard, to be able to process raw data, as it only accepts mzXML input files. The reading efficiency of mzXML files was optimized here, and we intend to include a highly efficient mzML reading function in the near future. At the moment, LaCyTools ideally requires the RAM to be 25% larger than the size of the mzXML files, because it stores the entire mzXML file in memory. Future changes will focus on optimizing the performance of LaCyTools, reducing the memory usage of the reading functions and including support for different file formats. Afterwards, we envision coupling LaCyTools to an existing identification package to create a pipeline that includes data transformation, analyte identification, analyte quantitation and analyte curation.

CONCLUDING REMARKS

Here, we presented a complete software package for targeted high-throughput data processing of LC-MS glycosylation data. LaCyTools was applied to a large set of IgG glycopeptide measurements, as well as a small number of AAT measurements. The results show that the alignment of LaCyTools performs as well as a commonly used untargeted alignment module of MZmine2. Additionally, the algorithms used in LaCyTools have been previously shown to calibrate better than the commercially available DA.22 Quantitation using LaCyTools showed high precision, as demonstrated by low SDs. Trueness was different from the other methods, but arguably better due to the effects related to peak width and isotopic envelope. Importantly, LaCyTools showed much higher throughput when compared to all other packages, as demonstrated by the complete workflow of alignment, calibration, integration and curation being completed in 10 hours on a set of 221 IgG glycopeptide measurements. LaCyTools allows users to repeatedly align, calibrate and integrate LC-MS glycosylation data in a manner which offers easy access for unskilled users without sacrificing flexibility for advanced users. The users have access to a wide variety of QC that allow them to assess each analyte individually. The entire package offers an invaluable tool for the glycoproteomic analysis of large clinical cohorts. We imagine that LaCyTools will assist researchers in focusing on the biological meaning of their data rather than on the data processing itself.

ACKNOWLEDGEMENTS

This work was supported by the Horizon Programme Zenith project funded by the Netherlands Genomic Initiative (project number: 93511033), the European Union (Seventh Framework Programs HighGlycan, grant number 278535 and IBD-BIOM, grant number 305479) and INTEGRA-LIFE (grant number 315997).

AUTHOR CONTRIBUTIONS

All authors were involved in manuscript preparation. Furthermore, B.C.J. designed the software, wrote the software and analyzed the data. D.F. assisted in designing and testing the software and in preparing the IgG sample. N.H. measured the IgG sample, assisted with the IgG annotation and assisted in testing the software. A.H.E. measured the AAT sample. G.R. wrote the HDF5 part of the program and assisted in comparing HDF5 with other data formats. G.L. supervised the HDF5 part of the program. M.W. directed and advised the project with regard to software development and data processing, and edited the manuscript.

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

SUPPORTING INFORMATION AVAILABLE

Figure S-1 - NOBAN algorithm decision workflow, Figure S-2 - NOBAN example using N107-H5N4S2 glycopeptide of AAT, Figure S-3 - AAT Fragmentation mass spectra, Figure S-4 - EICs of major IgG glycoforms, Figure S-5 - Alignment of IgG runs, Figure S-6 - Sum spectra of IgG peptide clusters, Figure S-7 - Quantitation with varying m/z tolerance, Figure S-8 - Isotopic pattern bias in quantitation, Figure S-9 - IgG integration results, Figure S-10 - IgG relative abundance drift over time, Figure S-11 - AAT curation using calculated QC, Figure S-12 - PyTables size and performance. Table S-1 - AAT peptide sequences, Table S-2 - S/N determination using different methods, Table S-3 - AAT post-alignment Δtr, Table S-4 - IgG pre-alignment Δtr, Table S-5 - IgG post-alignment Δtr, Table S-6 - AAT post-calibration mass errors, Table S-7 - IgG post-calibration mass errors, Table S-8 - IgG quantitation using LaCyTools, Table S-9 - AAT quantitation comparison between LaCyTools, DataAnalysis and 3D Max Extractor.

References

1. Ohtsubo, K.; Marth, J. D. Glycosylation in cellular mechanisms of health and disease. Cell 2006, 126, 855-867.
2. Selman, M. H. J.; Derks, R. J. E.; Bondt, A.; Palmblad, M.; Schoenmaker, B.; Koeleman, C. A. M.; van de Geijn, F. E.; Dolhain, R. J. E. M.; Deelder, A. M.; Wuhrer, M. Fc specific IgG glycosylation profiling by robust nano-reverse phase HPLC-MS using a sheath-flow ESI sprayer interface. J. Proteomics 2012, 75, 1318-1329.
3. Pucic, M.; Knezevic, A.; Vidic, J.; Adamczyk, B.; Novokmet, M.; Polasek, O.; Gornik, O.; Supraha-Goreta, S.; Wormald, M. R.; Redzic, I.; Campbell, H.; Wright, A.; Hastie, N. D.; Wilson, J. F.; Rudan, I.; Wuhrer, M.; Rudd, P. M.; Josic, D.; Lauc, G. High throughput isolation and glycosylation analysis of IgG-variability and heritability of the IgG glycome in three isolated human populations. Mol. Cell. Proteomics 2011, 10, M111.010090.
4. Pabst, M.; Chang, M.; Stadlmann, J.; Altmann, F. Glycan profiles of the 27 N-glycosylation sites of the HIV envelope protein CN54gp140. Biol Chem 2012, 393, 719-730.
5. Boehm, M. K.; Perkins, S. J. Structural models for carcinoembryonic antigen and its complex with the single-chain Fv antibody molecule MFE23. FEBS Lett 2000, 475, 11-16.
6. Kolarich, D.; Weber, A.; Turecek, P. L.; Schwarz, H.-P.; Altmann, F. Comprehensive glycoproteomic analysis of human α1-antitrypsin and its charge isoforms. PROTEOMICS 2006, 6, 3369-3380.
7. Beck, A.; Wagner-Rousset, E.; Ayoub, D.; Van Dorsselaer, A.; Sanglier-Cianférani, S. Characterization of Therapeutic Antibodies and Related Products. Anal. Chem. 2013, 85, 715-736.
8. Wuhrer, M.; Deelder, A. M.; Hokke, C. H. Protein glycosylation analysis by liquid chromatography-mass spectrometry. J. Chromatogr. B Analyt. Technol. Biomed. Life. Sci. 2005, 825, 124-133.
9. Kolarich, D.; Jensen, P. H.; Altmann, F.; Packer, N. H. Determination of site-specific glycan heterogeneity on glycoproteins. Nat. Protoc. 2012, 7, 1285-1298.
10. Ozohanics, O.; Krenyacz, J.; Ludányi, K.; Pollreisz, F.; Vékey, K.; Drahos, L. GlycoMiner: a new software tool to elucidate glycopeptide composition. Rapid Commun. Mass Spectrom. 2008, 22, 3245-3254.
11. Ceroni, A.; Maass, K.; Geyer, H.; Geyer, R.; Dell, A.; Haslam, S. M. GlycoWorkbench: A Tool for the Computer-Assisted Annotation of Mass Spectra of Glycans. J. Proteome Res. 2008, 7, 1650-1659.
12. Hufnagel, P.; Resemann, A.; Jabs, W.; Marx, K.; Schweiger-Hufnagel, U. Automated Detection and Identification of N-and O-glycopeptides. Proceedings of the Beilstein glycobioinformatics symposium, Potsdam, Germany, June 10-14, 2013.
13. Park, G. W.; Kim, J. Y.; Hwang, H.; Lee, J. Y.; Ahn, Y. H.; Lee, H. K.; Ji, E. S.; Kim, K. H.; Jeong, H. K.; Yun, K. N.; Kim, Y. S.; Ko, J. H.; An, H. J.; Kim, J. H.; Paik, Y. K.; Yoo, J. S. Integrated GlycoProteome Analyzer (I-GPA) for Automated Identification and Quantitation of Site-Specific N-Glycosylation. Sci Rep 2016, 6, 21175.
14. Yu, C.-Y.; Mayampurath, A.; Hu, Y.; Zhou, S.; Mechref, Y.; Tang, H. Automated annotation and quantification of glycans using liquid chromatography–mass spectrometry. Bioinformatics 2013, 29, 1706-1707.
15. Strohalm, M.; Kavan, D.; Novák, P.; Volný, M.; Havlíček, V. mMass 3: A Cross-Platform Software Environment for Precise Analysis of Mass Spectrometric Data. Anal. Chem. 2010, 82, 4648-4651.
16. Pluskal, T.; Castillo, S.; Villar-Briones, A.; Oresic, M. MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 2010, 11, 395.


17. Nevedomskaya, E.; Derks, R.; Deelder, A. M.; Mayboroda, O. A.; Palmblad, M. Alignment of capillary electrophoresis-mass spectrometry datasets using accurate mass information. Anal. Bioanal. Chem. 2009, 395, 2527-2533.
18. Falck, D.; Jansen, B. C.; Plomp, R.; Reusch, D.; Haberger, M.; Wuhrer, M. Glycoforms of Immunoglobulin G Based Biopharmaceuticals Are Differentially Cleaved by Trypsin Due to the Glycoform Influence on Higher-Order Structure. J. Proteome Res. 2015, 14, 4019-4028.
19. Kapur, R.; Kustiawan, I.; Vestrheim, A.; Koeleman, C. A.; Visser, R.; Einarsdottir, H. K.; Porcelijn, L.; Jackson, D.; Kumpel, B.; Deelder, A. M.; Blank, D.; Skogen, B.; Killie, M. K.; Michaelsen, T. E.; de Haas, M.; Rispens, T.; van der Schoot, C. E.; Wuhrer, M.; Vidarsson, G. A prominent lack of IgG1-Fc fucosylation of platelet alloantibodies in pregnancy. Blood 2014, 123, 471-480.
20. Huffman, J. E.; Pucic-Bakovic, M.; Klaric, L.; Hennig, R.; Selman, M. H.; Vuckovic, F.; Novokmet, M.; Kristic, J.; Borowiak, M.; Muth, T.; Polasek, O.; Razdorov, G.; Gornik, O.; Plomp, R.; Theodoratou, E.; Wright, A. F.; Rudan, I.; Hayward, C.; Campbell, H.; Deelder, A. M.; Reichl, U.; Aulchenko, Y. S.; Rapp, E.; Wuhrer, M.; Lauc, G. Comparative performance of four methods for high-throughput glycosylation analysis of immunoglobulin G in genetic and epidemiological research. Mol. Cell. Proteomics 2014, 13, 1598-1610.
21. Gardinassi, L. G.; Dotz, V.; Hipgrave Ederveen, A.; de Almeida, R. P.; Nery Costa, C. H.; Costa, D. L.; de Jesus, A. R.; Mayboroda, O. A.; Garcia, G. R.; Wuhrer, M.; de Miranda Santos, I. K. Clinical severity of visceral leishmaniasis is associated with changes in immunoglobulin G Fc N-glycosylation. MBio 2014, 5, e01844.
22. Jansen, B. C.; Reiding, K. R.; Bondt, A.; Hipgrave Ederveen, A. L.; Palmblad, M.; Falck, D.; Wuhrer, M. MassyTools: A high throughput targeted data processing tool for relative quantitation and quality control developed for glycomic and glycoproteomic MALDI-MS. J. Proteome Res. 2015, 14, 5088-5098.
23. van der Walt, S.; Colbert, S. C.; Varoquaux, G. The NumPy Array: A Structure for Efficient Numerical Computation. Comput. Sci. Eng. 2011, 13, 22-30.
24. Oliphant, T. E. Python for Scientific Computing. Comput. Sci. Eng. 2007, 9, 10-20.
25. Hunter, J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9, 90-95.
26. Van Rossum, G.; Drake Jr, F. L. Python reference manual; Centrum voor Wiskunde en Informatica Amsterdam: 1995; Technical Report CS-R9526.
27. Anaconda. Retrieved 1-7-2014, from https://store.continuum.io/cshop/anaconda/.
28. Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 2008, 24, 2534-2536.
29. Alted, F. Why Modern CPUs Are Starving and What Can Be Done about It. Comput. Sci. Eng. 2010, 12, 68-71.
30. Alted, F.; Vilata, I.; Prater, S.; Mas, V.; Hedley, T. PyTables: hierarchical datasets in Python. Available from World Wide Web: http://www.pytables.org. 2002-2016.
31. The HDF Group. Hierarchical Data Format, version 5. Available from World Wide Web: http://ww.hdfgroup.org/HDF5. 1997-2016.
32. Palmblad, M.; Mills, D. J.; Bindschedler, L. V.; Cramer, R. Chromatographic alignment of LC-MS and LC-MS/MS datasets by genetic algorithm feature extraction. J. Am. Soc. Mass Spectrom. 2007, 18, 1835-1843.
33. Brown, M. B.; Forsythe, A. B. Robust Tests for the Equality of Variances. J. Am. Stat. Assoc. 1974, 69, 364-367.


Figure Legends

Figure 1. LaCyTools workflow. The workflow depicts every module of LaCyTools in the blue boxes, with the respective user input listed above in the orange boxes and all steps performed by LaCyTools listed below in the grey boxes. The alignment and quantitation steps are independent of each other, meaning that the user can select either or both to be performed. Additionally, calibration is an optional step during quantitation, which is why the reference file links to both the calibration and quantitation modules. Alignment and calibration both use a least squares (LSQ) approach to determine the optimal solution, using either a power law or a polynomial function. Furthermore, a detailed decision workflow of the NOBAN algorithm, which is used to determine the background and noise during the alignment, is included in the Supporting Information, Figure S-1.

Figure 2. Extracted ion chromatograms of main AAT glycopeptides. The EICs of the major glycopeptides are shown for each glycosylation site of AAT. The displayed chromatograms are extracted with a width of 0.05 Th, using the three main isotopes of the quadruply and quintuply charged glycopeptides. These results show that the dominant glycan on N70 and N271 is the fully sialylated diantennary glycan, while on N107 the fully sialylated triantennary glycan is also present, both in its fucosylated and afucosylated form. These results are in agreement with literature.6

Figure 3. Alignment functions for fitting of AAT, measured with different LC gradients. A total of four analyses of AAT using differing LC gradients were aligned using LaCyTools: (A) gradient 1, (B) gradient 2, (C) gradient 3 and (D) gradient 4. The features used for alignment are shown before and after alignment (blue and red markers, respectively). Furthermore, the figure shows the function that was used for the alignment (blue) as well as the target line (red, striated). The manually estimated tR values from gradient 1 were used as target values.

Figure 4. Alignment of AAT chromatograms. A total of four chromatograms, measured using differing LC gradients, were aligned using LaCyTools. (A) EIC of feature N70-H5N4S2 before alignment. (B) EIC of feature N271-H5N4S2 before alignment. (C) EIC of feature N70-H5N4S2 after alignment. (D) EIC of feature N271-H5N4S2 after alignment. LaCyTools was able to align both features to within 1 s of the desired tR in all runs.

Figure 5. Sum spectra of AAT glycopeptide clusters. (A) Sum spectrum of the glycopeptide cluster of QLAHQSN70STNIFFSPVSIATAFAMLSLGTK. (B) Sum spectrum of the glycopeptide cluster of ADTHDEILEGLNFN107LTEIPEAQIHEGFQELLR. (C) Sum spectrum of the glycopeptide cluster of YLGN271ATAIFFLPDEGK. All sum spectra were generated with a resolution of 100 data points per Th. Displayed glycopeptide structures are based on their accurate mass, previous literature and MS2 where possible.6 All non-annotated major peaks could not be assigned to an AAT glycopeptide composition.

Figure 6. AAT quantitation. Relative quantitation was compared for LaCyTools, 3D Max Extractor and Bruker DataAnalysis, using the three peptide clusters of AAT. (A) N70 peptide cluster. (B) N107 peptide cluster. (C) N271 peptide cluster. The glycopeptides that could not be detected with all methods are marked with a star. The 3D Max Extractor, Bruker DataAnalysis and LaCyTools max quantitation option methods show similar results. Compared with LaCyTools using MS peak integration, they yield higher values for the smallest and most abundant glycopeptide in each cluster. This is caused by the difference in quantitation methods, specifically MS peak intensity versus peak area.
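The difference between quantitation by peak intensity and by peak area noted for Figure 6 can be illustrated with a minimal Python sketch. It is not taken from LaCyTools: the function names, the trapezoidal integration and the two synthetic peaks are assumptions chosen only to show how the two read-outs can lead to different relative abundances.

```python
def max_intensity(trace):
    """Return the highest intensity in a list of (x, intensity) points."""
    return max(y for _, y in trace)


def trapezoid_area(trace):
    """Integrate a list of (x, intensity) points with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(trace, trace[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


def relative_abundance(values):
    """Normalize a dict of absolute values so that they sum to 1."""
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


# Hypothetical peaks with equal area but different widths
# (x in arbitrary units; these are not measured data).
narrow = [(0.0, 0.0), (1.0, 10.0), (2.0, 0.0)]
broad = [(0.0, 0.0), (1.0, 5.0), (2.0, 5.0), (3.0, 0.0)]

# Relative quantitation based on the maximum intensity of each peak.
by_max = relative_abundance(
    {"narrow": max_intensity(narrow), "broad": max_intensity(broad)}
)

# Relative quantitation based on the integrated area of each peak.
by_area = relative_abundance(
    {"narrow": trapezoid_area(narrow), "broad": trapezoid_area(broad)}
)

print(by_max)
print(by_area)
```

In this constructed case the narrow peak accounts for two thirds of the total when the maximum intensity is used, but for one half when the area is used, so the choice of read-out alone can shift relative values.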


Tables

Table 1: Summary of the software tested in this study. The capability of each package that was used in this study, regarding the processing of targeted glycoproteomic analyses, is shown. It should be noted that software packages may be capable of doing more than what is displayed below, such as automated identification of features. For example, MultiGlycan performs an automated peak identification prior to quantitation based on parameters such as mass accuracy. Therefore, while MultiGlycan does not calculate quality criteria itself, the quantified peaks have passed some basic curation.

| Software | Free | File Formats | Alignment | Calibration | Quantitation | Quality Criteria | Link |
|---|---|---|---|---|---|---|---|
| mzMine2 | Yes | mzML, mzXML, mzData, Thermo and Waters | Untargeted | No | No | No | http://mzmine.github.io |
| mMass | Yes | mzML, mzXML, MGF, ASCII and plain text | No | Yes | Untargeted (MS) | S/N | http://www.mmass.org |
| 3D Max Extractor | Yes | mzXML | No | No | Targeted (MS) | No | NA |
| msalign 2 | Yes | mzXML | Untargeted | No | No | No | http://ms-utils.org/msalign/ |
| MultiGlycan | Yes | mzXML and Thermo | No | No | Untargeted (MS)* | No** | http://darwin.informatics.indiana.edu/col/MultiGlycan/ |
| DataAnalysis | No | Bruker | No | Yes | Targeted (EIC, MS) | S/N | http://www.bruker.com |
| LaCyTools | Yes | mzXML | Targeted | Yes | Targeted (MS) | S/N, IPQ, Mass Accuracy | http://github.com/Tarskin/LaCyTools |

* Designed for released glycans
** Quantified peaks are based on automated identification
