Article
A straightforward and highly efficient strategy for hepatocellular carcinoma glycoprotein biomarker discovery using a nonglycopeptide-based mass spectrometry (NGP-MS) pipeline Weiqian Cao, Biyun Jiang, Jiangming Huang, Lei Zhang, Mingqi Liu, Jun Yao, Mengxi Wu, Lijuan Zhang, Siyuan Kong, Yi Wang, and Pengyuan Yang Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.9b03074 • Publication Date (Web): 27 Aug 2019 Downloaded from pubs.acs.org on August 28, 2019
Analytical Chemistry
A straightforward and highly efficient strategy for hepatocellular carcinoma glycoprotein biomarker discovery using a nonglycopeptide-based mass spectrometry (NGP-MS) pipeline
Wei-Qian Cao1,2,#,*, Bi-Yun Jiang1,#, Jiang-Ming Huang1,3, Lei Zhang1, Ming-Qi Liu1,3, Jun Yao1, Meng-Xi Wu1,3, Li-Juan Zhang1, Si-Yuan Kong1, Yi Wang1, Peng-Yuan Yang1,3,*
1. The Fifth People’s Hospital of Shanghai and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
2. NHC Key Laboratory of Glycoconjugates Research, Fudan University, Shanghai, China
3. Department of Chemistry, Fudan University, Shanghai, China
# These authors contributed equally to this work
* To whom correspondence should be addressed:
P.-Y.Y. ([email protected])
C.-W.Q. ([email protected])
ABSTRACT
Efficient detection of aberrant glycoproteins in serum is particularly important for biomarker discovery. However, direct quantitation of glycoproteins in serum remains technically challenging due to the extraordinary complexity of the serum proteome. In the current work, we propose a straightforward and highly efficient strategy that uses the nonglycopeptides released from specifically enriched glycoproteins for targeted glycoprotein quantification. With this so-called nonglycopeptide-based mass spectrometry (NGP-MS) strategy, a powerful and nondiscriminatory pipeline for HCC glycoprotein biomarker discovery, verification and validation has been developed. First, a dataset of 234 NGPs was rigorously established for MRM quantification in serum. Second, the NGPs enriched from 20 HCC serum mixtures and 20 normal serum mixtures were labeled with mTRAQ reagents (Δ0 and Δ8, respectively) to find the differentially expressed glycoproteins in HCC. A total of 97 glycoprotein candidates were preliminarily screened and submitted for absolute quantitation with NGP-based SID-MRM in individual samples from 38 HCC sera and 24 normal controls. Finally, 21 glycoproteins were absolutely quantified with high quality. The diagnostic sensitivity results showed that three glycoproteins, beta-2-glycoprotein 1 (APOH), alpha-1-acid glycoprotein 2 (ORM2) and complement C3 (C3), could be used to discriminate between HCC patients and healthy people. In this study, a novel glycoprotein biomarker panel (APOH, ORM2, C3 and AFP) outperformed AFP, the known HCC serum biomarker, alone. We believe that this strategy and the panel of glycoproteins might hold great clinical value for HCC detection in the future.
Glycosylation, as one of the most important posttranslational protein modifications, plays significant roles in various biological processes.1 Abnormal changes in protein glycosylation not only affect the biological functions of glycoproteins, but also are associated with a variety of diseases, including cancers.2,3 Approximately 25% of FDA-approved tumor markers are glycoproteins.4,5 Thus, efficient identification and quantification of abnormal glycoproteins between healthy and cancerous individuals would be useful for the study of the pathological mechanism of cancer and the development of specific cancer biomarkers.6
Human serum is a rich source of biomarkers and is generally considered crucial for disease diagnosis and therapeutic target discovery. Serum contains various proteins and presents the deepest version of the human proteome. Serological aberrant glycoproteins can reflect abnormal states of cancer patients, since aberrant glycoproteins can either be secreted into the bloodstream or shed from cell membranes via abnormally enhanced protease activity.7 Consequently, quantitative studies of aberrant changes of glycoproteins in serum are particularly important for biomarker discovery. Direct identification and quantitation of glycoproteins in serum based on mass spectrometry8,9 are technically challenging due to the extraordinary complexity of the serum proteome, the inherent low stoichiometry and microheterogeneity of glycosylation, as well as the serious signal suppression of glycopeptides by abundant nonglycopeptides during MS analysis.
In recent years, multiple reaction monitoring (MRM) MS has been introduced to compensate for the shortfall in biomarker development10 and recognized as a rapid and cost-effective measurement technology for highly specific detection of targeted proteins in extremely complex biological samples.11,12 Different strategies employing MRM have been designed for glycoprotein quantification.13 For example, Lebrilla C.B. et al. applied the MRM technique to quantify immunoglobulins G, A, and M and their site-specific glycans simultaneously and directly from human serum/plasma without protein enrichment.14 Qian X. et al. used MRM-MS to analyze the partially deglycosylated glycopeptides and obtain site-specific quantification information of core fucosylated peptides.15
Although MRM has shown vital value and great potential in both glycoproteome research and biomarker discovery, the broad application of this method has been impeded by the limited choice of internal references. MRM is based on the concept of dilution with stable isotope-labeled synthetic reference peptides, which precisely mimic the deglycosylated form of candidate glycopeptides as internal references, for the purpose of glycoprotein quantification. The deglycosylated peptides that are selected as internal references must meet several basic requirements, such as no missed cleavage sites and no methionine in the peptide sequence.16 However, since only 5% of the total number of tryptic peptides contain the N-X-S/T motif, the choice of internal references is very limited.17 Moreover, microheterogeneity exists at each N-glycosylation site and every glycoprotein has multiple glycosylation sites, which further increases the difficulty of choosing internal peptides and can lead to ambiguous results for glycoprotein quantification.18 To solve the abovementioned problems, in the current study, we proposed a promising targeted glycoproteomic MRM-MS strategy based on monitoring nonglycopeptides from N-glycoproteins instead, developed an efficient pipeline for high-throughput screening of differentially expressed glycoproteins and provided a dataset of nonglycopeptide references for glycoprotein quantification in clinical HCC serum.
EXPERIMENTAL SECTION
Chemicals and Reagents
Sequencing grade trypsin was purchased from HUA LISHI SCIENTIFIC Corporation (Beijing, China). PNGase F (glycerol free) was purchased from New England Biolabs (Ipswich, MA). Affi-Gel® Hz Hydrazide Gel was purchased from Bio-Rad Laboratories (Hercules, CA). An mTRAQ reagent 10 assay kit was purchased from SCIEX (Redwood, CA). The crude and isotope-labeled peptides were obtained from Guo Tai Biological Technology Corporation (Hefei, China). The synthetic peptides were assessed by MALDI-TOF MS and reversed-phase high-performance liquid chromatography (RP-HPLC). All other chemicals were purchased from Sigma-Aldrich (St. Louis, MO).
Human Serum Samples Collection
A total of 102 human serum samples, including 58 HCC samples and 44 normal controls, were collected at Shanghai Zhongshan Hospital and Huashan Hospital from April to December 2014. The collected blood samples were immediately placed on ice and allowed to stand for 30 min. The samples were then centrifuged at 2000 × g for 15 min. The supernatant was collected and stored at -80 °C until use.
Informed consent was obtained under protocols that were approved by an institutional review board. The research followed the tenets of the Declaration of Helsinki and was approved by the Ethics Committee of the Fudan University Shanghai Zhongshan Hospital and Huashan Hospital. HCC diagnoses were confirmed by histopathologic study. Normal controls were selected from those without a history of liver disease and hepatitis B or hepatitis C infection and with normal liver biochemical function. The clinical information pertaining to HCC samples and normal controls is included in Table S1.
LC-MS/MS Identification of Nonglycopeptides and Glycopeptides for Glycoproteins Enriched from Serum Sample
In the identification stage, a pooled 10 μL serum sample from 20 HCC and 20 normal serum mixtures or a mixture of five standard glycoproteins with E. coli proteins was used for subsequent analysis.
For identification, glycoproteins were first captured from samples using the hydrazide chemistry method.19 Then, the bound glycoproteins were reduced with 10 mM dithiothreitol in 8 M urea/100 mM ammonium bicarbonate buffer at 37 °C for 2 h, and alkylated with 20 mM iodoacetamide at room temperature in the dark for 0.5 h. The resins were washed twice with 100 mM ammonium bicarbonate containing 8 M urea, 1.5 M NaCl and 100 mM ammonium bicarbonate to remove unbound nonglycosylated proteins. Finally, the nonglycopeptides were released from captured glycoproteins by trypsin digestion. Released nonglycopeptides were collected, lyophilized with a SpeedVac, and stored at -80 °C for later analysis. Glycopeptides still bound to the resins were released by further incubating the resins with PNGase F at 37 °C overnight. Glycopeptides were also lyophilized with a SpeedVac and stored at -80 °C for later analysis.
The enriched nonglycopeptides and glycopeptides were then analyzed by LC-MS/MS. The analyses were carried out on nano-LC-ESI MS/MS. The peptides were suspended in 5% (v/v) ACN containing 0.1% (v/v) FA (phase A) and separated by a 15 cm reversed-phase column with a gradient of 5%–45% phase B (95% ACN with 0.1% formic acid) over 100 min at a constant column-tip flow rate of 500 nL/min. The peptides were analyzed using an LC-20AB system (Shimadzu, Tokyo, Japan) connected to an LTQ Orbitrap mass spectrometer (Thermo Electron, Bremen, Germany) equipped with an online nanoelectrospray ion source. The spray voltage was 2.3 kV and the heated capillary was set at 180 °C. The peptides were analyzed by MS and data-dependent MS/MS acquisition, selecting the 10 most abundant precursor ions for MS/MS with a dynamic exclusion duration of 60 s. The resolution was set to 60,000 at m/z 400. AGC was set to 1,000,000 for MS1 and 10,000 for MS2. The scan range was set from m/z 300 to m/z 1600.
The acquired MS/MS spectra from LC-MS/MS were searched by MASCOT (version 2.3) against the human protein Swiss-Prot database for human serum (including 20212 entries), or against the manually combined dataset of five standard glycoproteins from Uniprot for the standard glycoproteins. The search parameters were set as follows: fixed modification of cysteine residues (C, +57 Da), variable modification of methionine oxidation (M, +16 Da), a maximum of two missed tryptic cleavage sites, 20 ppm error tolerance in MS and 1 Da error tolerance in MS/MS. The cut-off false discovery rate for all peptide identification processes was controlled to below 1%. For the glycopeptide search, variable modification of deamidation (N, +0.98 Da for glycan release in H₂¹⁶O, or N, +2.98 Da for glycan release in H₂¹⁸O) was added. Only peptides with an N-X-S/T (X≠P) sequon were considered N-glycopeptides.
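
The sequon criterion above can be illustrated with a short Python sketch. This is only an illustration of the N-X-S/T (X≠P) rule, not the authors' search code; the function names `find_sequons` and `is_candidate_n_glycopeptide` are hypothetical, and the example peptide strings are made up for demonstration.

```python
import re

# N followed by any residue except P, then S or T.
# The lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(peptide: str) -> list:
    """Return 0-based positions of asparagines sitting in an N-X-S/T (X != P) sequon."""
    return [m.start() for m in SEQUON.finditer(peptide.upper())]


def is_candidate_n_glycopeptide(peptide: str) -> bool:
    """True if the peptide carries at least one N-X-S/T (X != P) sequon."""
    return bool(find_sequons(peptide))


if __name__ == "__main__":
    for seq in ("ANGTK", "ANPTK", "NNSTK"):
        print(seq, find_sequons(seq), is_candidate_n_glycopeptide(seq))
```

A peptide that passes this check is only a candidate; in the workflow described above the deamidation mass shift on asparagine is also required by the database search.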
MRM Analysis
All MRM experiments were carried out on a 6500 QTRAP hybrid triple quadrupole/linear ion trap mass spectrometer (SCIEX, CA) interfaced with an Eksigent nano 1D plus system (AB Sciex, CA). Peptides were separated by a 15 cm reversed-phase column (75-μm inner diameter; C18 3-μm silica beads) with a gradient of 5%–80% phase B (98% ACN with 0.1% formic acid) over 60 min at a constant column-tip flow rate of 300 nL/min. The ion spray voltage was set to 2300 V and the curtain gas pressure was 30 p.s.i. Q1/Q3 quadrupoles were set at unit resolution (0.7 FWHM). The dwell time for each MRM transition was set to 0.02 s. Skyline was used to generate the MRM method and analyze the data from the MRM assay.
MRM-MS optimization: A total of 234 nonglycopeptides from 105 N-glycoproteins of human serum were selected and synthesized for MRM-MS optimization according to the following basic principles: (1) no missed cleavage sites; (2) 2+ or 3+ charge states; (3) no methionine in the peptide sequence; and (4) a length of 5-25 amino acids. Transitions and collision energy (CE) were optimized to guarantee high sensitivity. The 2 or 3 most intense daughter ions of the signature peptides were selected for the MRM transitions. Singly charged y-ions, which yield the highest intensity and largest mass differences between the labeled and unlabeled peptides, were the preferred selection. The CE was optimized based on calculation and experiments. The default collision energies used for the 6500 QTRAP instrument were calculated according to the formulas CE = 0.057 × (precursor m/z) − 4.265 and CE = 0.031 × (precursor m/z) + 7.082, for doubly and triply charged precursor ions, respectively. The transitions for each peptide were measured for 11 different CE values (five steps and 1 V step size on either side of the default CE).
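
As a minimal sketch of the selection principles and the default CE formulas above, the following Python code filters candidate peptides and generates the 11-point CE ramp. The function names are hypothetical, the missed-cleavage check is a simplification (any internal K or R counts as a missed cleavage, ignoring the K/R-before-P exception), and the example sequence and m/z value are illustrative only.

```python
def passes_selection_rules(peptide: str, charge: int) -> bool:
    """Apply the four selection principles to one candidate peptide."""
    seq = peptide.upper()
    # (1) no missed cleavage: simplified as "no K or R before the C-terminus"
    no_missed_cleavage = all(aa not in "KR" for aa in seq[:-1])
    # (2) charge 2+ or 3+, (3) no methionine, (4) length 5-25 residues
    return (
        no_missed_cleavage
        and charge in (2, 3)
        and "M" not in seq
        and 5 <= len(seq) <= 25
    )


def default_ce(precursor_mz: float, charge: int) -> float:
    """Default collision energy (V) from the linear formulas quoted in the text."""
    if charge == 2:
        return 0.057 * precursor_mz - 4.265
    if charge == 3:
        return 0.031 * precursor_mz + 7.082
    raise ValueError("formulas are given only for 2+ and 3+ precursors")


def ce_ramp(precursor_mz: float, charge: int, steps: int = 5, step_size: float = 1.0) -> list:
    """CE values tested: `steps` values on either side of the default, plus the default."""
    centre = default_ce(precursor_mz, charge)
    return [centre + i * step_size for i in range(-steps, steps + 1)]


if __name__ == "__main__":
    print(passes_selection_rules("LVNEVTEFAK", 2))
    print([round(ce, 3) for ce in ce_ramp(500.0, 2)])
```

With the defaults `steps=5` and `step_size=1.0`, `ce_ramp` returns 11 values centred on the calculated CE, matching the scheme described in the text.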
mTRAQ labeling and MRM relative quantitation: Nonglycopeptides enriched from 20 HCC serum mixtures and 20 normal serum mixtures were labeled using the mTRAQ Reagent 10 Assay Kit (Δ0 and Δ8 reagent, respectively, SCIEX 4440014 and 4427697) according to the product manual. The labeled peptides were then desalted with C18 columns, lyophilized with a SpeedVac, and stored at -80 °C for later analysis. Proteins with a fold change greater than 1.20 or less than 0.83 were selected as differentially expressed proteins. Nonglycopeptides enriched from five standard glycoproteins were labeled using the mTRAQ Reagent 10 Assay Kit (Δ0 and Δ4 reagent, respectively, SCIEX 44440014 and 4427696).
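
The fold-change cut-offs stated above (greater than 1.20 or less than 0.83, the latter being roughly the reciprocal of the former) can be written as a small classifier. This is an illustrative sketch only: the function name is hypothetical, and taking the ratio as HCC signal over normal signal is an assumption, since the text does not state the direction of the ratio.

```python
def classify_fold_change(ratio: float, up: float = 1.20, down: float = 0.83) -> str:
    """Label a ratio (assumed HCC / normal) using the cut-offs from the text."""
    if ratio <= 0:
        raise ValueError("fold change must be positive")
    if ratio > up:
        return "up"
    if ratio < down:
        return "down"
    return "unchanged"


if __name__ == "__main__":
    for r in (1.5, 1.0, 0.5):
        print(r, classify_fold_change(r))
```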
Absolute quantitation with isotope dilution-based MRM (SID-MRM): For absolute quantitation, stable isotope-labeled (SIL) peptides of targeted proteins were synthesized (arginine or lysine of each peptide was labeled with 13C and 15N, Guopingyaoye, Ltd, China). First, SIL peptides were added to the digested serum proteins for MRM-MS to estimate the concentration of the peptides in the sample. Then, SIL peptides were added to the digested serum proteins and tested in triplicate to construct a standard curve. Finally, SIL peptides were spiked into each sample to a certain concentration (Table S2), and a total of 1 μg proteotypic peptides of each sample was applied for analysis.
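
The principle behind stable isotope dilution can be sketched as a single-point calculation: the endogenous (light) peptide amount is estimated from the light-to-heavy peak area ratio multiplied by the known spiked amount of SIL peptide. This is a simplified illustration that assumes equal response of the light and heavy forms; it does not reproduce the standard-curve procedure described above, and the function name and numbers are hypothetical.

```python
def endogenous_amount(light_area: float, heavy_area: float, spiked_heavy_amount: float) -> float:
    """Single-point isotope dilution estimate of the light peptide amount.

    The result has the same unit as `spiked_heavy_amount` (e.g. fmol).
    Assumes the light and heavy peptides respond identically in the MRM assay.
    """
    if heavy_area <= 0:
        raise ValueError("heavy (SIL) peak area must be positive")
    return light_area / heavy_area * spiked_heavy_amount


if __name__ == "__main__":
    # Illustrative numbers only: light peak twice as large as a 50 fmol SIL spike.
    print(endogenous_amount(2000.0, 1000.0, 50.0))
```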
Statistical Construction of a Diagnostic Model
The quantitative results from the SID-MRM analysis were processed and visualized using Prism 5.0 (GraphPad Software Inc., La Jolla, CA, USA). Binary logistic regression was performed to calculate the receiver operating characteristic (ROC) curves and to produce predictive models by comparing the area under the curve (AUC) using SPSS (v19.0, IBM, Armonk, NY, USA). Parameters of binary logistic regression using SPSS were set as follows: dependent variable was set as 1; covariates were set as the average concentration of the glycoproteins. The binary logistic regression results were saved as Probability (P) and are shown in Table S3.
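
To make the relationship between the regression probabilities and the AUC concrete, the sketch below computes a logistic-model probability and then the AUC as the fraction of (patient, control) pairs in which the patient receives the higher score, counting ties as one half. It is an illustration in plain Python, not the SPSS procedure used in the study; the function names, coefficients and scores are hypothetical.

```python
import math


def logistic_probability(intercept: float, coefficients: list, values: list) -> float:
    """Predicted probability from a fitted binary logistic model (coefficients assumed known)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores_pos: list, scores_neg: list) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    hcc = [0.9, 0.4]       # hypothetical predicted probabilities for patients
    controls = [0.5, 0.1]  # hypothetical predicted probabilities for controls
    print(roc_auc(hcc, controls))
```

Comparing the AUC obtained from a single marker's probabilities with that from a multi-marker model is the kind of comparison the text describes for evaluating a biomarker panel.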
Immunohistochemistry (IHC) Validation
Whole sections of a formalin-fixed and paraffin-embedded tissue microarray with 75 hepatocellular carcinoma tissue points and 75 para-cancer tissue points were purchased from Xin Chao Corporation (Beijing, China) and used for immunohistochemistry analysis. The pathological information pertaining to the tissue chip is included in Table S4. After dewaxing, rehydration and antigen retrieval, the sections were preincubated with 3% hydrogen peroxide to inactivate endogenous peroxidase and blocked with 5% bovine serum albumin (BSA, Sigma) for 1 h. The sections were then incubated with the primary antibody at 4 °C overnight followed by a secondary antibody at 37 °C for 30 min. Peroxidase activity was revealed by 3,3′-diaminobenzidine (DAB). After staining with hematoxylin, the sections were dehydrated in an alcohol gradient, cleared with xylene, mounted and imaged under a light microscope.
RESULTS AND DISCUSSION
Nonglycopeptide-based Mass Spectrometry (NGP-MS) Strategy
MRM-MS is a powerful tool for proteomic quantitation. However, glycoprotein quantitation is hampered by the limited choice of glycopeptides as internal references. In the current study, we proposed a straightforward nonglycopeptide-based MS (NGP-MS) strategy for targeted glycoproteomic quantitation. In this strategy, glycoproteins were first captured on solid beads through hydrazide chemistry. After thorough washing, the nonglycopeptides (NGPs) of captured glycoproteins were released through trypsin digestion and analyzed by LC-MS/MS. Finally, the selected NGPs were synthesized for the MRM assay (Fig. 1A).
The overall scheme of the NGP-MS-based pipeline for HCC biomarker development is shown in Figure 1B-1D. First, to ensure the feasibility and quantitative accuracy of the strategy, we quantitatively analyzed five standard glycoproteins, cytochrome c (Cyto C), immunoglobulin G (IgG), ovalbumin (OVA), horseradish peroxidase (HRP) and fetuin, with a defined amount of mixture using the NGP-MS strategy (Fig. 1B). The NGP-MS strategy was demonstrated to be an efficient strategy for glycoprotein quantitation, exhibiting high accuracy and good reproducibility. Then, NGP-MS was applied to human serum analysis (Fig. 1C). A total of 1924 NGPs from 259 glycoproteins were identified. Through peptide selection and optimization, a dataset, containing 234 NGPs of 105 glycoproteins with optimum parameters, was established for MRM quantitation. Based on the dataset, NGP-MS was ultimately used for the discovery of HCC candidate biomarkers (Fig. 1D). We found that 97 glycoproteins were significantly changed in HCC serum through primary screening by mTRAQ labeling and MRM relative quantitation (fold-change>1.20 or