Moving beyond the van Krevelen diagram: A new stoichiometric approach for compound classification in organisms. Albert Rivas-Ubach, Yina Liu, Thomas Stephen Bianchi, Nikola Tolić, Christer Jansson, and Ljiljana Paša-Tolić. Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 19 Apr 2018. Downloaded from http://pubs.acs.org on April 19, 2018.
Analytical Chemistry
Moving beyond the van Krevelen diagram: A new stoichiometric approach for compound classification in organisms.
Headline: Stoichiometric compound classification
Albert Rivas-Ubach1†*, Yina Liu1,2†, Thomas S. Bianchi3, Nikola Tolić1, Christer Jansson1, Ljiljana Paša-Tolić1
1. Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, 99354, WA, USA.
2. Geochemical and Environmental Research Group, Texas A&M University, College Station, 77845, TX, USA.
3. Department of Geological Sciences, University of Florida, Gainesville, 32611-2120, FL, USA.
† Authors contributed equally to this manuscript.
* Author of correspondence: Albert Rivas-Ubach, Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA, 99354. Tel: 971 319 5962. E-mail: [email protected] / [email protected]
Keywords: Compound Classification, Ecological Stoichiometry, High-resolution mass spectrometry, Mass Spectrometry, Metabolomics, van Krevelen
Abbreviations:
vK = van Krevelen
MSCC = Multidimensional Stoichiometric Constraints Classification
HRMS = High Resolution Mass Spectrometry
MS = Mass Spectrometry
FT-ICR = Fourier Transform Ion Cyclotron Resonance
ESI = Electrospray ionization
CIA = Compound Identification Algorithm
CRAM = Carboxyl-rich Alicyclic Molecules
Lipidsc = Lipids category
Proteinsc = Protein category
A-Sugarsc = Amino-Sugars category
Carbohydratesc = Carbohydrates category
Nucleotidesc = Nucleotides category
Phytochemicalc = Phytochemical compounds category
CM = Correctly Matched compounds
IM = Incorrectly Matched compounds
NM = Not Matched compounds
DM = Double Matches
IM+DM = Incorrectly Matched compounds considering double matches (DM) as incorrect
CM-(NM+DM) = Correctly Matched compounds without considering the not matched (NM) and double matched (DM) compounds
Abstract
van Krevelen diagrams (O:C vs. H:C ratios of elemental formulas) have been widely used to estimate the main compound categories present in environmental samples. However, the limits defining a specific compound category based solely on O:C and H:C ratios of elemental formulas have never been accurately listed or proposed to classify metabolites in biological samples. Furthermore, while O:C vs. H:C ratios of elemental formulas can provide an overview of the compound categories, such classification is inefficient because of the large overlap among different compound categories along both axes. We propose a more accurate compound classification for biological samples analyzed by high-resolution mass spectrometry, based on an assessment of the C:H:O:N:P stoichiometric ratios of over 130,000 elemental formulas of compounds classified in six main categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides, and phytochemical compounds (oxy-aromatic compounds). Our multidimensional stoichiometric compound classification (MSCC) constraints categorized elemental formulas into the main compound categories in biological samples with over 98% accuracy, a substantial improvement over any classification based on the classic van Krevelen diagram. This method represents a significant step forward in environmental research, especially ecological stoichiometry and eco-metabolomics studies, by providing a novel and robust tool to further our understanding of ecosystem structure and function through the chemical characterization of different biological samples.
Introduction
The role of nutrients in organisms, especially primary producers, has been a critical topic in ecosystem studies of biogeochemical cycles,1,2 trophic relationships,3 and ecological stoichiometry.4,5 The most abundant elements in living organisms are the macroelements C, H, O, N, P, S, K, and Ca,5 which typically behave as constituents of organic compounds.6 Understanding how these elements are selectively allocated into different categories of organic compounds remains a key challenge in many biogeochemical and ecological studies.4,6,7 Although the ratios between these key macroelements have proven useful in ecological studies, accurately characterizing the major compound categories in organisms remains a significant challenge.8 For example, the N:P ratio has been valuable as an index of the protein:nucleic-acid (e.g., DNA, RNA) ratio, due to the high content of N in proteins and P in nucleic acids.5,9–11 However, this approach has many limitations, since the same macroelements are also commonly found in other organic compound categories.
The metabolome of an organism, defined as the entire suite of low-molecular-weight compounds (metabolites; typically < 1200 Da) present at any given time and under specific conditions,12 provides valuable information for understanding the ecophysiology of organisms. Metabolomes include primary cellular products such as carbohydrates, small peptides, nucleotides, and lipids, as well as secondary metabolites that participate in diverse physiological processes. Most notably, changes in stoichiometric C:N:P ratios in organisms have been shown to be linked more with the overall metabolome structure than with specific compounds or groups of compounds.6,13 Several ecosystem studies have combined metabolomic, elemental, and stoichiometric data to examine linkages between C:N:P biomass stoichiometries and the overall metabolome composition,6,14–18 yet stoichiometry-metabolome relationships remain elusive. A key reason for this ambiguity is our inability to identify the majority of metabolites, in large part due to current technological limitations. For example, the number of identified metabolites in non-targeted studies is typically < 200 (refs. 8, 13, 15, 19–22), even when complementary analytical techniques are combined (e.g., mass spectrometry and nuclear magnetic resonance). This small proportion of characterized metabolome, combined with a lack of understanding of the dynamics of the major organic compound categories in organisms under different environmental conditions, remains a key obstacle to understanding the mechanisms linking the C:N:P stoichiometry of organisms and their metabolomes.
In the 1950s, van Krevelen developed a graphical representation of macroelemental ratios to evaluate the origin and chemical evolution of petroleum and kerogen samples.23 This graphical representation, the van Krevelen diagram (hereafter vK diagram), represents atomic O:C vs. H:C ratios of organic compounds,23,24 and has become an important tool for characterizing organic matter, extending well beyond petrochemical applications.25–28 vK diagrams have been used in numerous studies to elucidate chemical reactions25,27,28 as well as to assess the main compound categories in natural organic matter (NOM), commonly defined as lipid-like, protein-like, amino-sugar-like, cellulose-like, lignin-like, and condensed hydrocarbons.26,27,29 In addition, vK diagrams have been used to examine the chemical characteristics of carboxyl-rich alicyclic molecules (CRAM)30,31 and oxidized black carbon32 in NOM. However, the large overlap between compound categories in the vK diagram, which often leads to incorrect compound classification (Fig. 1), and the ambiguity of the “compound-like” term make compound classification based solely on O:C and H:C ratios inefficient for drawing robust conclusions in ecological and biogeochemical studies. For example, using the vK diagram alone to depict CRAM is problematic because these compounds overlap significantly with the traditionally defined “lignin-like” region,33 yet CRAM compounds may not be related to lignins derived from vascular plants. Decomposition of organic matter, largely driven by microorganisms (e.g., oxidation, methylation, and dehydration) and photo-oxidation, can result in substantial shifts of molecular O:C, H:C, and N:C ratios compared to the original composition. Therefore, using compound-classification methods based on O:C and H:C ratios alone when studying complex mixtures of organic matter in different matrices could lead to inconclusive and/or confounding results, largely due to significant compound transformation processes. Besides the large overlap between compounds and the transformation of the original molecular ratios, the O:C and H:C boundaries defining a specific compound category in a vK diagram differ substantially among published studies (see D’Andrilli et al. 2015, ref. 34) and have never been accurately defined for a robust overall classification of compounds. Many metabolites contain other heteroatoms, such as N or P; therefore, including these heteroatoms in addition to O in multidimensional compound classification approaches, i.e., using more than the two dimensions of the vK diagram (O:C and H:C ratios), should provide better performance in classifying compounds into their corresponding main category (e.g., lipids, proteins, etc.) (Fig. 2).
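As a minimal illustration of the stoichiometric variables discussed above, the following Python sketch derives the O:C, H:C, N:C, P:C, and N:P ratios from an elemental formula. The function names are hypothetical and the parser assumes simple formulas without parentheses, charges, or isotopes; it is not the script provided in the Supporting Information.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple elemental formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def stoichiometric_ratios(formula):
    """Return the atomic ratios used as classification dimensions."""
    atoms = parse_formula(formula)
    carbon = atoms["C"]
    if carbon == 0:
        raise ValueError("formula contains no carbon")
    ratios = {
        "O:C": atoms["O"] / carbon,
        "H:C": atoms["H"] / carbon,
        "N:C": atoms["N"] / carbon,
        "P:C": atoms["P"] / carbon,
    }
    # N:P is undefined when the formula contains no phosphorus.
    ratios["N:P"] = atoms["N"] / atoms["P"] if atoms["P"] else None
    return ratios


# Glucose and ATP occupy different positions once N and P are considered.
print(stoichiometric_ratios("C6H12O6"))
print(stoichiometric_ratios("C10H16N5O13P3"))
```

A classic vK diagram would use only the first two entries of each dictionary; the remaining entries are the additional dimensions that a multidimensional approach can exploit.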
N and P have been widely considered the pillars of the elemental ratios found in most ecological and ecophysiological studies, largely because of their significant roles in ecosystem structure and function, especially as they relate to the carbon cycle. For example, C:N, C:P, or N:P ratios are important factors in ecological processes such as nitrogen fixation, litter decomposition, trophic relationships, organism growth rate, biodiversity, the responsiveness capacity of organisms, and ecosystem responses to stressors.4–6,35 Lipids, proteins, carbohydrates, and nucleic acids represent the four major building blocks of life and occur in different proportions in all living systems. In addition to these core compound categories, plants, fungi, and bacteria also produce an assortment of secondary metabolites, commonly oxy-aromatic compounds, with diverse elemental composition that typically participate in specific functions.36–38
Here, we propose an optimized compound classification method based on the C:H:O:N:P stoichiometric ratios of more than 130,000 molecular formulas of known compounds classified in six main categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides, and phytochemical compounds. The MSCC represents a substantial improvement over the classic vK approach (i.e., O:C and H:C ratios) for overall compound classification. We demonstrate that the MSCC vastly improves the exploration and interpretation of the overall molecular composition in biological samples, as applied to ecology, ecophysiology, ecological stoichiometry, organic chemistry, organic geochemistry, and biogeochemistry. While this novel compound classification approach is tailored to plants, it can also be applied to other organisms. To demonstrate that our method can readily characterize the different metabolite compositions of different organisms, we analyzed three model organisms from different kingdoms (Plantae, Fungi, and Animalia (insect)) with high-resolution mass spectrometry and applied the proposed MSCC.
Materials & Methods
Metabolite databases.
For the MSCC determination, we explored the C:H:O:N:P molecular ratios of a total of 132,209 elemental formulas from compounds consisting of 30,729 lipids; 93,245 peptides (including phosphorylated peptides); 7,774 phytochemical compounds; 82 carbohydrates; 142 amino-sugars (including amino sugar phosphates); and 37 nucleotides (Table S-1). Elemental formulas of lipids, phytochemical compounds, carbohydrates, amino-sugars, and nucleotides were obtained from different compound databases: the LIPID MAPS Lipidomics Gateway database,39 KEGG COMPOUND,40,41 and the ChEBI database.42 Table S-1 details the compounds from each dataset that were included in each compound category. Elemental formulas of peptides were compiled from a FASTA file representation of the 2014 SwissProt snapshot. Over 78,000 peptide sequences within the mass range 50-1,200 Da were converted to molecular formulas, and H2O was added for peptide termini. Subsequently, 15,000 peptides that could be phosphorylated according to their sequences were randomly selected and computationally supplied with 1, 2 or 3 HPO4, depending on the size of the peptide, to generate a large dataset of possible phosphopeptides. See pages S-2 and S-3 in the Supporting Information for a detailed description of some considerations regarding the compound databases used for the determination of the MSCC. See page S-4 of the Supporting Information for validation of the MSCC determination.
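The conversion of a peptide sequence into an elemental formula can be sketched as follows. This is an illustrative Python example, not the procedure actually used to build the peptide database: the function names are hypothetical, the residue compositions are the standard values for the 20 proteinogenic amino acids (each free amino acid minus one H2O), and the phosphorylation step is deliberately left out.

```python
import re
from collections import Counter

# Residue compositions (free amino acid minus one H2O); standard values.
RESIDUE_FORMULAS = {
    "G": "C2H3NO", "A": "C3H5NO", "S": "C3H5NO2", "P": "C5H7NO",
    "V": "C5H9NO", "T": "C4H7NO2", "C": "C3H5NOS", "L": "C6H11NO",
    "I": "C6H11NO", "N": "C4H6N2O2", "D": "C4H5NO3", "Q": "C5H8N2O2",
    "K": "C6H12N2O", "E": "C5H7NO3", "M": "C5H9NOS", "H": "C6H7N3O",
    "F": "C9H9NO", "R": "C6H12N4O", "Y": "C9H9NO2", "W": "C11H10N2O",
}


def atom_counts(formula):
    """Count atoms in a simple elemental formula string."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def peptide_formula(sequence):
    """Sum the residue compositions and add one H2O for the termini."""
    total = Counter({"H": 2, "O": 1})  # H2O for the free N- and C-termini
    for residue in sequence.upper():
        total.update(atom_counts(RESIDUE_FORMULAS[residue]))
    return total


def hill_notation(counts):
    """Format atom counts with C first, H second, then alphabetical order."""
    others = sorted(e for e in counts if e not in ("C", "H"))
    parts = []
    for element in ["C", "H"] + others:
        n = counts.get(element, 0)
        if n:
            parts.append(element + (str(n) if n > 1 else ""))
    return "".join(parts)


print(hill_notation(peptide_formula("GA")))  # C5H10N2O3 (glycylalanine)
```

The resulting atom counts can then be turned into C:H:O:N:P ratios in the same way as for the formulas taken from the compound databases.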
Determination of new compound category constraints.
The proposed compound classification in the MSCC system includes the following six major categories, according to the aforementioned databases: lipids category (hereafter Lipidsc), proteins category (peptides; hereafter Proteinsc), amino-sugars category (hereafter A-Sugarsc), carbohydrates category (hereafter Carbohydratesc), nucleotides category (hereafter Nucleotidesc), and phytochemical compounds category (hereafter Phytochemicalc).
Pairwise scatterplots of the stoichiometric variables were obtained for all compound databases to explore the spatial distribution of molecules along all C:H:O:N:P combinations (Fig. S-1). The N, P, S, O, and molecular weight (MW) variables were also considered to complement the stoichiometric constraints of certain compound categories. Boundaries for compound classification were established using the distribution of compounds from the databases along all the examined stoichiometric and elemental variables. For compound databases showing overlap in all the examined variables, we took the variable showing the best separation (i.e., the lowest proportion of overlap) between the databases as the discriminant variable for their classification (Figs. S-2 to S-8). By keeping the overlap between compound categories to a minimum and using diverse stoichiometries (e.g., O:C, H:C, N:C, etc.), our MSCC ensures a wide recovery of compounds and minimal compound matching error. We therefore expect that the probability of matching compounds outside their category is minimal and that most of the detected features will match within their corresponding category boundaries (Fig. 4). Additional information regarding the established boundaries of the MSCC is detailed on pages S-2 and S-3 of the Supporting Information. An R (https://www.r-project.org/) script for classifying compounds from their stoichiometric ratios using the MSCC is included on pages S-5 to S-9 of the Supporting Information.
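The logic of classifying a formula against several simultaneous constraints can be sketched in Python as follows. This is only an illustration of the principle and does not reproduce the R script of the Supporting Information: the category names and all numeric boundaries below are hypothetical placeholders, not the values of Table 1, and treating every boundary as inclusive is an assumption.

```python
# Hypothetical boundaries for illustration only; NOT the values of Table 1.
# Each category maps a variable name to an inclusive (lower, upper) range.
EXAMPLE_CONSTRAINTS = {
    "CategoryA": {"O:C": (0.0, 0.6), "H:C": (1.3, 2.5), "N": (0, 0)},
    "CategoryB": {"O:C": (0.1, 1.2), "H:C": (0.9, 2.5), "N:C": (0.1, 0.7)},
}


def matching_categories(variables, constraints):
    """Return every category whose constraints are all satisfied."""
    matches = []
    for name, bounds in constraints.items():
        if all(lo <= variables[var] <= hi for var, (lo, hi) in bounds.items()):
            matches.append(name)
    return matches


# Variables for one assigned formula (ratios plus the raw N count).
feature = {"O:C": 0.2, "H:C": 1.8, "N:C": 0.0, "N": 0}
print(matching_categories(feature, EXAMPLE_CONSTRAINTS))  # ['CategoryA']
```

Returning a list rather than a single label keeps the three possible outcomes visible: an empty list corresponds to a feature that matches no category, one entry to a classified feature, and two or more entries to a feature that matches more than one category.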
Comparison of the classic vK compound classification with the MSCC.
We obtained the O:C and H:C boundaries defining the different compound categories from 21 published studies (citations shown in Table S-2) using GetData Graph Digitizer 2.24 to compare the performance of the MSCC against the classic compound classification based exclusively on O:C vs. H:C. Due to the vague definition of the “lignin-like” and “condensed hydrocarbon” categories typically represented in vK diagrams and the lack of specific databases of characterized compounds within those categories, we only considered the lipid, protein, amino-sugar, and carbohydrate (cellulose) databases to evaluate how the O:C and H:C boundaries of each compound category from all 21 studies performed in contrast to our MSCC.
The proportions of compounds from each compound database (lipids, proteins, amino-sugars, and carbohydrates) that were correctly matched (CM), not matched (NM), and incorrectly matched (IM) to their corresponding categories (e.g., the proportion of lipids from the database that were CM, NM, or IM with respect to Lipidsc) were calculated for the compound boundaries based solely on O:C and H:C ratios from the 21 studies and for our MSCC (Table S-2). The proportion of double matches (DM), i.e., compounds that fit into two different categories, was also calculated. We considered all amino-sugars matching Carbohydratesc as CM for those studies that presented Carbohydratesc but not A-Sugarsc (Table S-2). We did not consider the carbohydrate database for studies that presented A-Sugarsc but not Carbohydratesc. Additionally, for each compound category (lipids, proteins, amino-sugars, and carbohydrates), we calculated the proportion of IM considering the double matches as incorrect (IM+DM) and the CM without considering the NM and DM (CM-(NM+DM)) (Table S-2). To assess the number of correctly matched compounds per incorrectly matched compound, we calculated the CM/IM+DM ratio for each compound category. The CM/(IM+DM + NM) ratio was calculated and used to assess the efficiency of the stoichiometric constraints for the databases. Due to the large difference in the number of compounds between databases, the overall performance of each study was evaluated using two approaches: i) considering the total absolute number of compounds of all databases together (hereafter, “absolute total”), and ii) considering the relative number of compounds for each database (hereafter, “relative total”).
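The matching outcomes and the two ratios defined above can be tallied as in the following Python sketch. It rests on a simplifying assumption that is ours and not necessarily the bookkeeping behind Table S-2: every compound receives exactly one outcome, with any compound matching two or more categories counted as DM regardless of whether its true category is among them. Function names and the toy data are hypothetical.

```python
from collections import Counter


def match_outcome(true_category, matched):
    """Assign one outcome (CM, IM, NM or DM) to a single compound."""
    if not matched:
        return "NM"  # matched no category
    if len(matched) > 1:
        return "DM"  # matched two or more categories (assumption: always DM)
    return "CM" if matched[0] == true_category else "IM"


def summarize(true_category, all_matches):
    """Tally outcomes for one database and derive the two ratios."""
    counts = Counter(match_outcome(true_category, m) for m in all_matches)
    cm, im, nm, dm = (counts[k] for k in ("CM", "IM", "NM", "DM"))
    total = len(all_matches)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "CM%": 100 * cm / total,
        "IM+DM%": 100 * (im + dm) / total,
        "NM%": 100 * nm / total,
        "CM/(IM+DM)": ratio(cm, im + dm),
        "CM/(IM+DM+NM)": ratio(cm, im + dm + nm),
    }


# Toy lipid database of ten compounds (made-up matching results).
toy = [["Lipids"]] * 7 + [["Proteins"], [], ["Lipids", "Proteins"]]
print(summarize("Lipids", toy))
```

The ratios are returned as `None` when no compound is incorrectly matched, since the quotient is then undefined.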
MSCC performance.
The performance of the MSCC alone was also evaluated considering all compound categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides, and phytochemical compounds. For each category, the CM, NM, IM, CM-(NM+DM), CM/IM, and CM/(IM + NM) were calculated. The absolute total and relative total values were also calculated for each of the compound categories. Additional performance validation of the MSCC was performed for Lipidsc, Phytochemicalc, and Proteinsc, the main categories showing the largest overlap across their stoichiometries (see page S-5 of the Supporting Information).
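The difference between the absolute and relative totals can be illustrated with a short Python sketch. It assumes one plausible reading of the two definitions: the absolute total pools the compounds of all databases, whereas the relative total averages the per-database percentages so that small databases weigh as much as large ones. The function names and the numbers are hypothetical and do not come from Tables 2 or S-2.

```python
def absolute_total(cm_counts, db_sizes):
    """Percentage of CM compounds after pooling all databases."""
    return 100 * sum(cm_counts.values()) / sum(db_sizes.values())


def relative_total(cm_counts, db_sizes):
    """Mean of the per-database CM percentages (assumed definition)."""
    per_db = [100 * cm_counts[db] / db_sizes[db] for db in db_sizes]
    return sum(per_db) / len(per_db)


# Made-up example: one large and one small database.
sizes = {"large_db": 1000, "small_db": 10}
correct = {"large_db": 990, "small_db": 5}
print(absolute_total(correct, sizes))  # dominated by the large database
print(relative_total(correct, sizes))  # both databases weigh equally
```

Under this reading, a poor result in a small database barely moves the absolute total but lowers the relative total considerably, which is why reporting both values is informative when database sizes differ by orders of magnitude.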
Compound extraction and FT-ICR-MS analyses of model organisms.
To test the MSCC on real biological samples and show the contrasting profiles of organisms from different kingdoms, fresh material of Brachypodium distachyon (plant), Saccharomyces cerevisiae (fungus), and Drosophila melanogaster (insect) was flash-frozen in liquid nitrogen, lyophilized, and then ground for metabolite extraction43 (Fig. 3a). Briefly, 30 mg of lyophilized B. distachyon, S. cerevisiae, and D. melanogaster powder were added separately to 2 mL glass vials, followed by the addition of 1 mL of MeOH/water (80:20). All samples were then placed into a Thermomixer (Eppendorf, New York, NY, USA) and shaken for 30 min at 1,200 rpm at 12 °C. All vials were subsequently sonicated for 5 min and then centrifuged at 6,000 × g for 5 min. Supernatants were collected, placed into labeled 2 mL HPLC vials, and kept frozen at -80 °C until analysis.
Metabolic fingerprints of these organisms were obtained using an ultra-high-resolution 15 Tesla SolariX Fourier-transform ion-cyclotron resonance mass spectrometer (FT-ICR-MS; Bruker Daltonics Inc., Billerica, MA, USA) equipped with electrospray ionization (ESI) and operated in negative ionization mode (Fig. 3b). Each sample was directly infused at a flow rate of 3 µL/min. A total of 200 spectra at 4 Mword (corresponding to a resolution > 400,000 at m/z = 400) were averaged for each sample. Mass measurement accuracy was < 1 ppm with external calibration and < 0.2 ppm after internal calibration. Formula assignment was performed using the Formularity software44 based on the automated compound identification algorithm (CIA)45 (Fig. 3c). To avoid assignment ambiguity, formulas for ions with m/z > 700 (ref. 46) and assignments with > 0.2 ppm error were not considered.
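The two filters applied to the formula assignments can be expressed with the usual definition of the mass error in parts per million. The following Python sketch is an illustration only, with hypothetical function names; it is not part of Formularity or CIA, and the default thresholds simply restate the cut-offs given above.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in ppm: (measured - theoretical) / theoretical * 1e6."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def keep_assignment(measured_mz, theoretical_mz, max_mz=700.0, max_ppm=0.2):
    """Keep an assignment only if it passes both the m/z and ppm cut-offs."""
    within_mass_range = measured_mz <= max_mz
    within_tolerance = abs(ppm_error(measured_mz, theoretical_mz)) <= max_ppm
    return within_mass_range and within_tolerance


print(ppm_error(400.0004, 400.0))         # about 1 ppm
print(keep_assignment(400.00004, 400.0))  # about 0.1 ppm, kept
print(keep_assignment(800.00004, 800.0))  # above m/z 700, discarded
```

Because the tolerance is relative, the same 0.2 ppm cut-off corresponds to a narrower absolute mass window at low m/z than at high m/z.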
Results
Table 1 shows the determined stoichiometric and elemental boundaries for each compound category of the MSCC.
Contrasting vK vs. MSCC stoichiometric boundaries performance.
Considering the lipid, peptide, amino-sugar, and carbohydrate databases, the absolute total CM compounds for the compound classification limits derived from the O:C vs. H:C ratios of the 21 published studies varied from 14.63% to 63.64% (Fig. 5a; Table S-2), and the absolute total IM+DM compounds varied from 0.7% to 68.93%. The absolute total CM and IM+DM compounds for the proposed MSCC were 99.19% and 0.42%, respectively (Fig. 5a; Table S-2). The relative total CM ranged from 12.43% to 55.68% for the 21 published studies, in contrast to 98.5% using the MSCC (Fig. 5b).
The absolute total CM/IM+DM ratio ranged from 1.19 to 21.01 for the 21 published studies, while the MSCC yielded a ratio of 234.64 (Fig. 5c). The relative total CM/IM+DM ratio varied between 1.09 and 17.74 among the 21 studies, and the MSCC had a ratio of 234.05 (Table S-2). The larger relative matching error was generally attributed to lower proportions of NM compounds (Fig. 5d). Although NM compounds were not classified in any compound category, they did not introduce any error into the results. For the 21 published studies, the CM-(NM+DM) compounds ranged between 53.38% and 95.46% in absolute terms and between 56.74% and 96.18% in relative terms (Table S-2). The MSCC reached 99.58% and 99.57% of absolute and relative total CM-(NM+DM) compounds, respectively. The absolute and relative total CM/(IM+DM + NM) ratios varied between 0.17 and 1.75 and between 0.15 and 13.6, respectively, for the 21 published studies, while the MSCC showed ratios of 121.73 and 272.05, respectively (Figs. 5e-5f).
Evaluation of the MSCC performance considering all compound databases.
The evaluation of the MSCC considering the molecular formulas from the six databases (lipids, peptides, phytochemical compounds, nucleotides, amino-sugars, and carbohydrates) is shown in Table 2. We obtained 98.8%, 0.14%, and 0.9% of the absolute total compounds as CM, NM, and IM, respectively. The lipid, peptide, and phytochemical compound databases showed the highest overlap among them for all the elemental and stoichiometric variables. However, only 2.5% of lipid formulas were IM to other groups (1.5% matched Proteinsc, 0.9% matched Phytochemicalc, 0.08% matched Carbohydratesc, and 0.06% matched A-Sugarsc). For the peptide formulas, 0.06% were IM (0.01%, 0.05%, and 0.002% matched A-Sugarsc, Phytochemicalc, and Lipidsc, respectively). An estimated 3.3% of the phytochemical compound formulas were IM, as A-Sugarsc (0.33%), Lipidsc (2.7%), and Proteinsc (0.3%). Amino-sugar and carbohydrate formulas did not show any IM, although 1.4% (2 compounds) and 1.22% (1 compound), respectively, were NM according to the MSCC. All nucleotide formulas were CM to Nucleotidesc, although some also matched Proteinsc (40%) and A-Sugarsc (5%).
Due to the high recovery of compounds using the MSCC, the relative total proportion of CM (98.5%) did not differ greatly from the absolute total proportion of CM (99.8%) (Table 2). The absolute total CM/IM ratio was 108.3, so roughly 110 compounds from the databases were CM for each IM. The absolute total CM/(IM + NM) ratio was 95.82.
Proportions of compounds based on MSCC for model organisms.
Based on the metabolite extraction performed with MeOH:H2O (80:20) and the ESI-FT-ICR-MS analyses in negative ionization mode, we obtained very distinct compound profiles for the three organisms analyzed (Fig. 6). Only 9.6%, 7.5%, and 8.7% of the metabolic features of B. distachyon (plant), S. cerevisiae (yeast), and D. melanogaster (insect), respectively, did not match any of the compound categories defined by the MSCC in Table 1. In addition, 0.1%, 0.4%, and 0.3% of the variables from B. distachyon, S. cerevisiae, and D. melanogaster, respectively, matched two different categories. Phytochemical compounds and lipids were the most abundant compound categories for B. distachyon, representing 40.4% and 32.8% of the variables, respectively. S. cerevisiae also showed a large fraction of lipids (31%) and oxy-aromatic compounds (33.6%), followed by proteins (12.3%) and amino sugars (10.6%), while D. melanogaster showed a very distinct profile, with 25.0%, 24.3%, 17.8%, and 10.1% of lipids, proteins, amino sugars, and carbohydrates, respectively.
Discussion
Rationale and performance of the proposed MSCC.
Lipids, proteins, carbohydrates, and nucleic acids are the main building blocks of life and are present in all living organisms. Although the MSCC was focused on plants, it could be applied to other organisms (see the Extension of MSCC to different living systems section). For a more exhaustive chemical characterization of plant samples, we included the “Phytochemical compounds” category (Phytochemicalc), which has been delimited by the stoichiometry of thousands of molecular formulas of well-characterized plant secondary metabolites. These compounds are well known for the crucial roles they play in the physiological, developmental, and anti-stress processes of plants.47,48 We used thousands of elemental formulas of described compounds to accurately define the multiple stoichiometric boundaries for each compound category. By using large compound databases and multiple boundaries defining each compound category, the probability of matching metabolites outside their corresponding major category is low (Fig. 4; Table 2). The application of the proposed MSCC for the classification of elemental formulas thus rests on the premise that the vast majority of the features detected by high-resolution mass spectrometry (e.g., FT-ICR-MS) in samples should match the major compound categories (Figs. 4, S-3 to S-8), including Phytochemicalc in the case of plants.
The MSCC (Table 1) is primarily based on the examination of the H:C, O:C, N:C, P:C and N:P ratios of a substantial number of described metabolites covering different organic compound categories. The MSCC provides a powerful new approach for classifying different molecular compounds into their respective categories, with minimal error and exceptional coverage (Fig. 5; Tables 2 and S-2). Enlarging the areas delimited by the O:C and H:C ratios of any compound category in a vK diagram increases the coverage of compound matching, but also increases the compound matching error (Fig. 5d). As such, the use of MSCC allows for a more accurate classification of the different metabolites into six major compound categories, with reduced compound matching error. After the examination of large databases that included primary and secondary metabolites, we contend that the probability of incorrectly matching metabolites using MSCC is minimal (Figs. 4, 5c, 5e-5f) and that the MSCC thus provides a significant improvement over the vK method for assessment of the overall spectrum of compounds in biological samples.
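As a minimal sketch of how such a ratio-based classification could be applied to a single elemental formula, the Python snippet below computes the five elemental ratios and tests them against one category. The function names are hypothetical, the thresholds are our reading of the Lipidc row of Table 1 and should be verified against the published table, and skipping the N:P constraint when P = 0 is an assumption rather than a rule stated in the text.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a plain elemental formula such as 'C16H32O2'
    (no brackets, charges or isotope labels)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def elemental_ratios(formula):
    """Return H:C, O:C, N:C, P:C and N:P; a ratio is None when its denominator is zero."""
    n = parse_formula(formula)

    def ratio(num, den):
        return n[num] / n[den] if n[den] else None

    return {"H:C": ratio("H", "C"), "O:C": ratio("O", "C"), "N:C": ratio("N", "C"),
            "P:C": ratio("P", "C"), "N:P": ratio("N", "P")}

def is_lipid_like(formula):
    """Illustrative check against thresholds read from the Lipidc row of Table 1
    (assumption: verify the values before any real use)."""
    r = elemental_ratios(formula)
    if r["H:C"] is None:  # no carbon, so no C-based ratio is defined
        return False
    # Assumption: the N:P constraint is skipped when the formula contains no P.
    n_to_p_ok = r["N:P"] is None or r["N:P"] <= 5
    return (r["O:C"] <= 0.6 and r["H:C"] >= 1.32 and r["N:C"] <= 0.126
            and r["P:C"] < 0.35 and n_to_p_ok)

print(elemental_ratios("C16H32O2"))   # palmitic acid
print(is_lipid_like("C16H32O2"), is_lipid_like("C6H12O6"))
```

Checking several ratios at once, rather than only O:C and H:C, is what distinguishes this kind of test from a region drawn on a vK diagram.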
Practical limitations.
Advanced mass spectrometry offers high resolving power (up to 1,000,000) (Brown, Kruppa, & Dasseux, 2005; Denisov, Damoc, Lange, & Makarov, 2012). Ultrahigh mass-measurement accuracy (typically < 1 ppm after internal calibration) by Fourier transform mass spectrometry (FTMS) allows assignment of elemental formulas for the majority of the thousands of compounds detected in a given sample.49,50 The application of high-resolution FTMS (HR-FTMS) in environmental and biological research has allowed assessment of the diversity of molecular compounds present in a particular sample, assuming the presence of several different essential elements, such as O, S, N and P, and calculating their stoichiometric ratios. Assigning elemental formulas to HRMS data can thus be a valuable tool for simply comparing different stoichiometric metabolic profiles, or for exploring how organisms allocate different elements to different molecular compounds under certain environmental conditions, which is an important aspect of ecological stoichiometry studies.4–6
HRMS provides unprecedented means for understanding molecular ecological stoichiometry in complex samples; nevertheless, there are a few limitations. For instance, accurate formula assignments can be hindered by (1) the mass measurement accuracy achieved by the HRMS after routine external calibration, (2) the final mass measurement accuracy after internal calibration, and (3) the assignment accuracy of the formula assignment algorithm. Typically, a mass measurement accuracy of < 1 ppm error is expected using well-maintained, i.e. routinely cleaned and calibrated, HR-FTMS instruments, such as FT-ICR-MS.51 Less than 0.2 ppm of mass error is often achieved with HR-FTMS by employing internal calibration,52,53 which is a crucial step for accurate formula assignment by CIA.45 In fact, CIA correctly assigned 96.94% of the known molecular formulas from our compound databases when considering all formula types and the full mass range (70 to 1200 Da) (see page S-6 and Table S-3 in Supporting Information for a detailed assessment of the formula assignment error of the compound databases using CIA). Furthermore, it is important to note that as molecular mass increases, the number of possible formula assignments also increases disproportionately, especially when multiple heteroatoms are considered.45,46 Hence, only lower mass metabolites (e.g. < 500 Da) have typically been considered for formula assignment to reduce the number of false positives.46 Therefore, using formula assignments for compounds < 500 Da and mass measurement accuracy < 0.2 ppm is advised for more robust compound classification in samples; formula assignments for compounds over 500 Da should be treated with caution. It should be noted that, in addition to issues related to mass measurement accuracy, another inherent limitation of direct infusion FT-ICR-MS is the inability to distinguish between structural isomers. For example, a lignin compound may have the same elemental formula as a CRAM compound, while they have different structures and different origins in nature.33
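The advised limits above can be expressed as a simple filter on candidate assignments. The sketch below is illustrative only: the function names are hypothetical, the masses in the example are made-up values, and the theoretical mass is compared directly with the 500 Da cutoff (an assumption that ignores the distinction between ion m/z and neutral mass).

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def passes_advised_limits(measured, theoretical, max_abs_ppm=0.2, max_mass_da=500.0):
    """True when the assignment is below the mass cutoff and within the ppm tolerance.
    Defaults follow the limits advised in the text (< 500 Da, < 0.2 ppm)."""
    return theoretical < max_mass_da and abs(ppm_error(measured, theoretical)) < max_abs_ppm

# Hypothetical values, chosen only to exercise the three possible outcomes.
print(round(ppm_error(255.23303, 255.2330), 3))      # small error, low mass
print(passes_advised_limits(255.23303, 255.2330))    # within both limits
print(passes_advised_limits(255.2336, 255.2330))     # error above 0.2 ppm
print(passes_advised_limits(600.00006, 600.0))       # mass above 500 Da
```

Such a filter does not address the isomer limitation: two compounds with the same elemental formula pass or fail together.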
Extension of MSCC to different living systems.
The MSCC presented here can also be applied to the classification of lipids, proteins, carbohydrates and nucleotides in all other living systems. “Lignin-like” and “condensed hydrocarbon” compound categories are often depicted in vK diagrams.34 However, these categories are not directly applicable to metabolites in living systems. Any organic sample contains a significant proportion of molecules with low H:C ratios (< 1.3), typically represented by aromatic or unsaturated compounds. Those metabolites possess one or more aromatic rings and different functional groups (e.g. -OH, -C(O)OH, and -NO2), making up a large variety of compounds along the O:C axis, typically polyphenolics. Lignin consists of large polymers of phenolic compounds (C31H34O11)n and represents one of the main components of plant cell walls. Several plant phenolic secondary metabolites are lignin precursors (lignins), but other non-lignin related phenolics, such as flavonoids, are abundant secondary metabolites36 and cluster within the “lignin-like” category in a vK diagram (Fig. 1). Phytochemicalc, which includes all polyphenolic compounds, should only be used for plant samples. Another category of phenolics, formed by CRAM, that occurs within the “lignin-like” region of vK diagrams has also been described.29–31,54 However, not all compounds within this area in a vK diagram are necessarily carboxyl-rich. On the other hand, condensed hydrocarbons are usually considered to be derived from incomplete combustion or geo-condensation, such as char or petroleum, respectively.55 Since the condensed hydrocarbons category is not directly applicable to organisms and overlaps with our proposed Phytochemicalc (oxy-aromatics), this category was not considered in the MSCC. For the analysis of samples other than plants, we propose using the general term “oxy-aromatic compounds” to include all those compounds with low H:C, mainly polyphenolics, applying the same MSCC as for Phytochemicalc.
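As a worked check of the low H:C ratio described above, the snippet below computes the H:C and O:C ratios of the lignin repeating unit C31H34O11 quoted in the text. The helper name is hypothetical, and end groups of the polymer are ignored (an assumption), so the ratios do not depend on the number of repeating units.

```python
def unit_ratios(c, h, o, n_units=1):
    """H:C and O:C of a polymer built from n_units copies of a C_c H_h O_o unit
    (end groups ignored, which is an assumption of this sketch)."""
    carbon = c * n_units
    return (h * n_units) / carbon, (o * n_units) / carbon

h_to_c, o_to_c = unit_ratios(31, 34, 11, n_units=50)
print(round(h_to_c, 3), round(o_to_c, 3))  # 1.097 0.355
print(h_to_c < 1.3)                         # below the low-H:C limit quoted in the text
```

With H:C of about 1.10, the lignin unit sits in the low-H:C region where the text places aromatic and polyphenolic compounds.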
The compound profiles of the analyzed model organisms were obtained through an MeOH/H2O (80:20) extraction, and the extracted composition will vary if different solvents are used.56,57 Thus, it is important to use the same procedures for sample comparison. Additionally, the ionization method dictates the nature of the compounds observed with MS platforms. Electrospray ionization (ESI) is often used to detect polar and semi-polar compounds, while other ionization methods such as atmospheric pressure photoionization (APPI) or atmospheric pressure chemical ionization (APCI) could be considered for non-polar matrices.
As expected, the three model organisms studied herein display very different compound profiles (Fig. 6). Based on the ESI-FT-ICR-MS analyses of the MeOH/H2O (80:20) extracts, over 90% of the signals detected in all three cases were matched into one of the compound groups of the MSCC (Table 1), demonstrating a reliable compound classification. Furthermore, the proportion of double assignments was minimal, reaching its maximum value (0.4%) in D. melanogaster (insect).
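The proportions reported above (matched, unmatched and doubly assigned features) can be tallied from per-feature classification results as sketched below. The function name and the toy input are hypothetical, and counting a doubly assigned feature as matched is an assumption of this sketch.

```python
def summarize_matches(assignments):
    """assignments: one list of matched category names per detected feature."""
    total = len(assignments)
    if total == 0:
        raise ValueError("no features to summarize")
    matched = sum(1 for cats in assignments if len(cats) >= 1)
    double = sum(1 for cats in assignments if len(cats) == 2)
    return {
        "matched_pct": 100.0 * matched / total,    # at least one category
        "double_pct": 100.0 * double / total,      # exactly two categories
        "unmatched_pct": 100.0 * (total - matched) / total,
    }

# Toy example with four features (not data from this study).
toy = [["Lipid"], ["Protein"], [], ["Protein", "A-Sugar"]]
print(summarize_matches(toy))
```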
Insects are protein-rich organisms58 with relatively low content of polyphenolic compounds compared to plants or fungi. Our results clearly showed this trend, with 24.3% of peptides in D. melanogaster vs. 12.3% in S. cerevisiae (fungi) and 5.7% in B. distachyon (plant) (Fig. 6). Chitin, an abundant amino-sugar biopolymer,59 represents a major component in insects.60 The highest proportion of amino sugars was detected in D. melanogaster, especially compared to plants, which may directly relate to the high chitin content of the exoskeleton of insects. On the other hand, plants and fungi produce a large diversity of secondary metabolites, a major part of them with oxy-aromatic structures.37 Of all the detected features, 40.4% and 33.6% matched into Phytochemicalc (oxy-aromatic category) for B. distachyon and S. cerevisiae, respectively, further corroborating the high content of secondary metabolites in those organisms.
Conclusions
The proposed multidimensional stoichiometric constraints classification (MSCC) (Table 1) exhibited a substantial improvement over vK diagrams in classifying database compounds, with minimal error and large coverage (Table 2), and it can be applied to different organisms. Additionally, the classification method can serve as a strong starting point to further investigate other (i.e. nonliving) complex environmental matrices, and to start defining and optimizing multiple elemental ratios that allow for more robust classification of environmental samples. Inherent limitations originating from the mass measurement accuracy and formula assignment accuracy should be carefully considered when interpreting the MS data in terms of compound classifications and molecular level stoichiometric interpretations. This stoichiometric compound classification method represents a valuable tool for environmental research, especially in the fields of ecometabolomics and ecological stoichiometry.
Acknowledgments.
This research was performed at the Environmental Molecular Sciences Laboratory (EMSL), a DOE Office of Science User Facility sponsored by the Office of Biological and Environmental Research at the Pacific Northwest National Laboratory (PNNL). The research was funded in part by US Department of Energy (DOE) Contract DE-AC05-76RL01830 with PNNL.
References
(1) Benner, J. W.; Vitousek, P. M. Ecol. Lett. 2007, 10 (7), 628–636.
(2) Vitousek, P. M.; Aber, J. D.; Howarth, R. W.; Likens, G. E.; Matson, P. A.; Schindler, D. W.; Schlesinger, W. H.; Tilman, D. G. Ecol. Appl. 1997, 7 (3), 737–750.
(3) Andersen, T.; Hessen, D. O. Limnol. Oceanogr. 1991, 36 (4), 807–814.
(4) Sardans, J.; Rivas-Ubach, A.; Peñuelas, J. Biogeochemistry 2012, 111 (1–3).
(5) Sterner, R.; Elser, J. Ecological Stoichiometry: The Biology of Elements from Molecules to the Biosphere; Princeton University Press, 2002.
(6) Rivas-Ubach, A.; Sardans, J.; Pérez-Trujillo, M.; Estiarte, M.; Peñuelas, J. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (11).
(7) Peñuelas, J.; Sardans, J. Chem. Ecol. 2009, 25 (4), 305–309.
(8) Sardans, J.; Peñuelas, J.; Rivas-Ubach, A. Chemoecology 2011, 21 (4), 191–225.
(9) Bianchi, T. S.; Canuel, E. A. Chemical Biomarkers in Aquatic Ecosystems; Princeton University Press: New Jersey, 2011.
(10) Elser, J. J.; Dobberfuhl, D. R.; MacKay, N. A.; Schampel, J. H. BioScience 1996, pp 674–684.
(11) Matzek, V.; Vitousek, P. M. Ecol. Lett. 2009, 12 (8), 765–771.
(12) Fiehn, O. Plant Mol. Biol. 2002, 48 (1–2), 155–171.
(13) Rivas-Ubach, A.; Gargallo-Garriga, A.; Sardans, J.; Oravec, M.; Mateu-Castell, L.; Pérez-Trujillo, M.; Parella, T.; Ogaya, R.; Urban, O.; Peñuelas, J. New Phytol. 2014, 202 (3).
(14) Gargallo-Garriga, A.; Sardans, J.; Pérez-Trujillo, M.; Rivas-Ubach, A.; Oravec, M.; Vecerova, K.; Urban, O.; Jentsch, A.; Kreyling, J.; Beierkuhnlein, C. Sci. Rep. 2014, 4, 6829.
(15) Sardans, J.; Gargallo-Garriga, A.; Pérez-Trujillo, M.; Parella, T. J.; Seco, R.; Filella, I.; Peñuelas, J. Plant Biol. (Stuttg). 2014, 16 (2), 395–403.
(16) Rivas-Ubach, A.; Barbeta, A.; Sardans, J.; Guenther, A.; Ogaya, R.; Oravec, M.; Urban, O.; Peñuelas, J. Perspect. Plant Ecol. Evol. Syst. 2016, 21, 41–54.
(17) Rivas-Ubach, A.; Hódar, J. A.; Sardans, J.; Kyle, J. E.; Kim, Y.-M.; Oravec, M.; Urban, O.; Guenther, A.; Peñuelas, J. Ecol. Evol. 2016, 6 (13), 4372–4386.
(18) Rivas-Ubach, A.; Poret-Peterson, A. T.; Peñuelas, J.; Sardans, J.; Pérez-Trujillo, M.; Legido-Quigley, C.; Oravec, M.; Urban, O.; Elser, J. J. Acta Physiol. Plant. 2018, 40 (2), 28.
(19) Kim, H. K.; Choi, Y. H.; Verpoorte, R. Nat. Protoc. 2010, 5 (3), 536–549.
(20) Petersson, S. V.; Lindén, P.; Moritz, T.; Ljung, K. Metabolomics 2015, 11 (6), 1679–1689.
(21) Pluskal, T.; Nakamura, T.; Villar-Briones, A.; Yanagida, M. Mol. BioSyst. 2009, 6 (1), 182–198.
(22) Rivas-Ubach, A.; Sardans, J.; Hódar, J. A.; Garcia-Porta, J.; Guenther, A.; Paša-Tolić, L.; Oravec, M.; Urban, O.; Peñuelas, J. Ecol. Evol. 2017, 7 (21), 8976–8988.
(23) van Krevelen, D. Fuel 1950, 29, 269–284.
(24) Curiale, J. A.; Gibling, M. R. Org. Geochem. 1994, 21 (1), 67–89.
(25) Baldock, J. A.; Smernik, R. J. Org. Geochem. 2002, 33 (9), 1093–1109.
(26) D’Andrilli, J.; Foreman, C. M. Org. Geochem. 2013, 65, 19–28.
(27) Kim, S.; Kramer, R. W.; Hatcher, P. G. Anal. Chem. 2003, 75 (20), 5336–5344.
(28) Kracht, O.; Gleixner, G. Org. Geochem. 2000, 31 (7–8), 645–654.
(29) Minor, E. C.; Swenson, M. M.; Mattson, B. M.; Oyler, A. R. Environ. Sci. Process. Impacts 2014, 16 (9), 2064–2079.
(30) Hertkorn, N.; Benner, R.; Frommberger, M.; Schmitt-Kopplin, P.; Witt, M.; Kaiser, K.; Kettrup, A.; Hedges, J. I. Geochim. Cosmochim. Acta 2006, 70 (12), 2990–3010.
(31) Stubbins, A.; Spencer, R. G. M.; Chen, H.; Hatcher, P. G.; Mopper, K.; Hernes, P. J.; Mwamba, V. L.; Mangangu, A. M.; Wabakanghanzi, J. N.; Six, J. Limnol. Oceanogr. 2010, 55 (4), 1467–1477.
(32) Kim, S.; Kaplan, L. A.; Benner, R.; Hatcher, P. G. Mar. Chem. 2004, 92 (1–4), 225–234.
(33) Sleighter, R. L.; Hatcher, P. G. Mar. Chem. 2008, 110 (3–4), 140–152.
(34) D’Andrilli, J.; Cooper, W. T.; Foreman, C. M.; Marshall, A. G. Rapid Commun. Mass Spectrom. 2015, 29 (24), 2385–2401.
(35) Elser, J. J.; Fagan, W. F.; Denno, R. F.; Dobberfuhl, D. R.; Folarin, A.; Huberty, A.; Interlandi, S.; Kilham, S. S.; McCauley, E.; Schulz, K. L.; Siemann, E. H.; Sterner, R. W. Nature 2000, 408 (6812), 578–580.
(36) Bennett, R. N.; Wallsgrove, R. M. New Phytol. 1994, 127 (4), 617–633.
(37) Keller, N. P.; Turner, G.; Bennett, J. W. Nat. Rev. Microbiol. 2005, 3 (12), 937–947.
(38) Pietra, F. Nat. Prod. Rep. 1997, 14 (5), 453.
(39) Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E. A.; Glass, C. K.; Merrill, A. H.; Murphy, R. C.; Raetz, C. R. H.; Russell, D. W.; Subramaniam, S. Nucleic Acids Res. 2007, 35 (Database), D527–D532.
(40) Kanehisa, M.; Goto, S. Nucleic Acids Res. 2000, 28 (1), 27–30.
(41) Kanehisa, M.; Sato, Y.; Kawashima, M.; Furumichi, M.; Tanabe, M. Nucleic Acids Res. 2016, 44 (D1), D457–D462.
(42) Hastings, J.; de Matos, P.; Dekker, A.; Ennis, M.; Harsha, B.; Kale, N.; Muthukrishnan, V.; Owen, G.; Turner, S.; Williams, M.; Steinbeck, C. Nucleic Acids Res. 2012, 41 (D1), D456–D463.
(43) Rivas-Ubach, A.; Pérez-Trujillo, M.; Sardans, J.; Gargallo-Garriga, A.; Parella, T.; Peñuelas, J. Methods Ecol. Evol. 2013, 4 (5), 464–473.
(44) Tolić, N.; Liu, Y.; Liyu, A.; Shen, Y.; Tfaily, M. M.; Kujawinski, E. B.; Longnecker, K.; Kuo, L.-J.; Robinson, E. W.; Paša-Tolić, L.; Hess, N. J. Anal. Chem. 2017, 89 (23), 12659–12665.
(45) Kujawinski, E. B.; Behn, M. D. Anal. Chem. 2006, 78 (13), 4363–4373.
(46) Koch, B. P.; Dittmar, T.; Witt, M.; Kattner, G. Anal. Chem. 2007, 79 (4), 1758–1763.
(47) Gill, S. S.; Tuteja, N. Plant Physiol. Biochem. 2010, 48 (12), 909–930.
(48) Pietta, P.-G. J. Nat. Prod. 2000, 63, 1035–1042.
(49) Kujawinski, E. Environ. Forensics 2002, 3 (3), 207–216.
(50) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spectrom. Rev. 1998, 17, 1–35.
(51) Brown, S. C.; Kruppa, G.; Dasseux, J.-L. Mass Spectrom. Rev. 2005, 24 (2), 223–231.
(52) Stubbins, A.; Silva, L. M.; Dittmar, T.; Van Stan, J. T. Front. Earth Sci. 2017, 5 (5), doi: 10.3389/feart.2017.00022.
(53) Sleighter, R. L.; Mckee, G. A.; Liu, Z.; Hatcher, P. G. Limnol. Oceanogr. Methods 2008, 6 (6), 246–253.
(54) Lechtenfeld, O. J.; Hertkorn, N.; Shen, Y.; Witt, M.; Benner, R. Nat. Commun. 2015, 6, 6711.
(55) Podgorski, D. C.; Hamdan, R.; McKenna, A. M.; Nyadong, L.; Rodgers, R. P.; Marshall, A. G.; Cooper, W. T. Anal. Chem. 2012, 84 (3), 1281–1287.
(56) t’Kindt, R.; De Veylder, L.; Storme, M.; Deforce, D.; Van Bocxlaer, J. J. Chromatogr. B. Analyt. Technol. Biomed. Life Sci. 2008, 871 (1), 37–43.
(57) Tfaily, M. M.; Chu, R. K.; Tolić, N.; Roscioli, K. M.; Anderton, C. R.; Paša-Tolić, L.; Robinson, E. W.; Hess, N. J. Anal. Chem. 2015, 87 (10), 5206–5215.
(58) DeFoliart, G. R. Bull. Entomol. Soc. Am. 1975, 21 (3), 161–164.
(59) Kumar, M. N. V. R. React. Funct. Polym. 2000, 46, 1–27.
(60) Muthukrishnan, S.; Merzendorfer, H.; Arakane, Y.; Kramer, K. J. In Insect Molecular Biology and Biochemistry; Gilbert, L. I., Ed.; Elsevier: San Diego, 2012; pp 193–225.
Table 1. Proposed C:H:O:N:P stoichiometric and elemental constraints for each compound category. The protein category (Proteinc) is defined by two different series of constraints, which have to be used at the same time. A molecular mass range is also proposed for the nucleotide category (Nucleotidesc). Table cells with hyphens (-) indicate that the variable need not be used as a discriminant for that compound category.
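| Category | O:C | H:C | N:C | P:C | N:P | O | N | P | S | Mass |
|---|---|---|---|---|---|---|---|---|---|---|
| Lipidc | ≤ 0.6 | ≥ 1.32 | ≤ 0.126 | < 0.35 | ≤ 5 | - | - | - | - | - |
| Proteinc (Constraints 1) | > 0.12, ≤ 0.6 | > 0.9, < 2.5 | ≥ 0.126 | | - | - | ≥ 1 | - | - | - |
| Proteinc (Constraints 2) | > 0.6, ≤ 1 | > 1.2, < 2.5 | 0.2 | | - | - | ≥ 1 | - | - | - |
| A-Sugarc | | | 0.07, ≤ 0.2 | < 0.3 | ≤ 2 | ≥ 3 | ≥ 1 | - | - | - |
| Carbohydratesc | ≥ 0.8 | ≥ 1.65, < 2.7 | - | - | - | - | = 0 | - | - | - |
| Nucleotidesc * | ≥ 0.5, < 1.7 | > 1, < 1.8 | ≥ 0.2, ≤ 0.5 | ≥ 0.1, < 0.35 | > 0.6, ≤ 5 | - | ≥ 2 | ≥ 1 | = 0 | > 305, < 523 |
| Phytochemicalc | ≤ 1.15 | < 1.32 | < 0.126 | ≤ 0.2 | ≤ 3 | - | - | - | - | - |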
* Double matches in Nucleotidesc should be considered as nucleotides.
Table 2. Number of compounds from databases and proportions of database compounds that correctly matched (CM), did not match (NM), and incorrectly matched (IM) with each proposed compound category according to the proposed constraints from Table 1. The absolute number of compounds is shown in brackets. Correctly matched excluding the NM and double matchings (DM) (CM-(NM+DM)) and the CM/IM and CM/(IM + NM) ratios are also shown. The total absolute and total relative proportions are calculated from the absolute number of compounds in the databases and from the relative number of compounds, respectively. Percentages marked with asterisks indicate possible DM, which is discussed in the manuscript.
Lipids (30,729)
CM
NM
97.1% (29,823)
0.4% (122)
IM 2.5% (784)
CM-(NM+DM)
CM/IM
CM/(IM + NM)
· 17 A-Sugarc (0.06%) · 24 Carbohydratesc (0.08%) · 271 Phytochemicalc (0.9%) · 472 Proteinc (1.5%)
97.4%
38.0
32.91
99.9%
1502.3
1397.4
100%
70
70
98.8%
80.96
81.0
100%*
∞
∞
96.7%
29.3
27.3
0.06% (62)
Peptides (93,245)
99.9% (93,142 )
0.04% (41)
Amino sugars (142) Carbohydrates (82)
98.6% (140) 98.8% (81)
1.4% (2) 1.2% (1)
0% (0) 0% (0) 0%*
Nucleotides (37)
100%* (37)
0% (0)
Double matching: · 15 with Proteinc (40%) · 2 with A-Sugarc (5%)
· 10 A-Sugarc (0.01%) · 50 Phytochemicalc (0.05%) · 2 Lipids (0.002%)
3.3% (256)
Phytochemical compounds (7,774)
96.5% (7,499)
0.25% (19)
TOTAL Absolute (132,009) Total Relative
98.8% (130,401)
0.14% (390)
0.9% (1,204)
99.1%
108.3
95.82
98.5%
0.55%
0.97%
98.8%
-
-
· 26 A-Sugarc (0.33%) · 207 Lipidsc (2.7%) · 23 Proteinc (0.3%)
* Double matches in Nucleotidesc are not considered incorrect matches, since no peptide or amino sugar from the databases matched all the constraints of Nucleotidesc.
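The ratios reported in Table 2 can be recomputed from the absolute counts. The sketch below does so for the lipid database (CM = 29,823; NM = 122; IM = 784). The function name is hypothetical, and taking the database size as CM + NM + IM (i.e. ignoring double matches) is an assumption that holds for the lipid row, where the three counts sum to 30,729.

```python
def matching_metrics(cm, nm, im):
    """Percent correctly matched and the two ratios reported in Table 2.
    Ratios are infinite when the denominator is zero."""
    total = cm + nm + im  # assumption: no double matches in this row
    return {
        "CM_pct": 100.0 * cm / total,
        "CM/IM": cm / im if im else float("inf"),
        "CM/(IM+NM)": cm / (im + nm) if (im + nm) else float("inf"),
    }

lipids = matching_metrics(cm=29823, nm=122, im=784)
print({name: round(value, 2) for name, value in lipids.items()})
```

For the lipid row this gives approximately 97.1% correctly matched, CM/IM of about 38.0 and CM/(IM + NM) of about 32.9, in line with the tabulated values.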
Figure 1. van Krevelen diagram (O:C vs. H:C elemental ratios) of metabolite databases. Different compound categories are represented in different colors (blue, lipids; dark red, peptides; yellow, amino sugars; orange, carbohydrates; cyan, nucleotides; green, phytochemical compounds). Classic compound classification areas according to O:C and H:C elemental ratios are represented. The areas shown are approximations based on previous compound classifications in organic matter studies (citations shown in Table S-2) and do not exactly represent any specific area.
588 589 590 591 592 593 594 595 596 597 598 599
ACS Paragon Plus Environment
20
Page 21 of 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Analytical Chemistry
21
600 601 602 603
Figure 2. Examples of different molecular structures from different compound categories with their O:C, H:C, N:C, P:C, and N:P ratios. Nitrogen (N) and Phosphorus (P) are shown in red and green, respectively.
Figure 3. Diagram of the principal procedures to apply the multidimensional stoichiometric constraints classification (MSCC) on samples. a, b and c denote three of the procedures cited in the main text that have been performed before testing the MSCC on model organisms.
Figure 4. Example of a 2-dimensional (2D) density plot of H:C vs. O:C elemental ratios (vK plot) for the peptide database (93,245 elemental formulas). The color gradient indicates the number of peptides within each square area (red squares indicate the areas with higher density of peptides (up to 540-560 peptides); blue squares indicate the areas with lower density of peptides (1-20 peptides)). Contour lines enclosing 80% and 95% of the peptides are shown in black. Boundaries of the multidimensional stoichiometric constraints classification (MSCC) for O:C and H:C ratios are indicated with red dashed lines (see Table 1 for accurate stoichiometric thresholds). By probability, the major part of the peptides detected in samples will be found within the high-density area (95%). The boundaries of the MSCC (Table 1) extend substantially beyond the high-density area, and the classification is based on multiple stoichiometric constraints rather than only the two shown in the figure; the probability of matching compounds outside their defined stoichiometric constraints using the MSCC is therefore minimal. See Figures S-3 to S-8 for all stoichiometric constraints from all compound categories.
Figure 5. Matching results of lipid, protein, amino-sugar and carbohydrate databases to their corresponding categories according to the proposed multidimensional stoichiometric constraints classification (MSCC) (0; star) and the O:C and H:C thresholds provided by different studies (1 to 21). Proportion of total correctly matched compounds in absolute (a) and relative (b) terms. Correct-matched:Incorrect-matched ratio in absolute terms (c). Not-matched versus incorrect-matched compounds (d). Correct-matched:(Incorrect-matched+Not-matched) ratio in absolute terms (e) and relative terms (f). References for each of the studies are shown in Table S-2.
Figure 6. Pie diagrams representing the relative abundance (%) of the compound categories defined by the proposed multidimensional stoichiometric constraints classification (MSCC) (Table 1) for Brachypodium distachyon (plant), Saccharomyces cerevisiae (yeast) and Drosophila melanogaster (insect). Each compound category is represented by a different color. Metabolic variables that did not match any of the compound categories are shown in grey. Variables that matched two compound categories are represented in black. The number of variables representing the metabolic fingerprint of each organism is shown below each pie diagram. The phytochemical compounds category (Phytochemicalc) is shown only for B. distachyon. The oxy-aromatic compounds category is shown for S. cerevisiae and D. melanogaster.