
Moving beyond the van Krevelen diagram: A new stoichiometric approach for compound classification in organisms. Albert Rivas-Ubach, Yina Liu, Thomas Stephen Bianchi, Nikola Tolić, Christer Jansson, and Ljiljana Paša-Tolić. Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 19 Apr 2018. Downloaded from http://pubs.acs.org on April 19, 2018


Analytical Chemistry


Moving beyond the van Krevelen diagram: A new stoichiometric approach for compound classification in organisms.


Headline: Stoichiometric compound classification


Albert Rivas-Ubach1†*, Yina Liu1,2†, Thomas S. Bianchi3, Nikola Tolić1, Christer Jansson1, Ljiljana Paša-Tolić1

1. Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, 99354, WA, USA.
2. Geochemical and Environmental Research Group, Texas A&M University, College Station, 77845, TX, USA.
3. Department of Geological Sciences, University of Florida, Gainesville, 32611-2120, FL, USA.

† Authors contributed equally to this manuscript.


* Author of correspondence: Albert Rivas-Ubach Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA, 99354 Tel: 971 319 5962 e-mail: [email protected] / [email protected]

Keywords: Compound Classification, Ecological Stoichiometry, High-resolution mass spectrometry, Mass Spectrometry, Metabolomics, van Krevelen

Abbreviations:
vK = van Krevelen
MSCC = Multidimensional Stoichiometric Constraints Classification
HRMS = High Resolution Mass Spectrometry
MS = Mass Spectrometry
FT-ICR = Fourier Transform Ion Cyclotron Resonance
ESI = Electrospray ionization
CIA = Compound Identification Algorithm
CRAM = Carboxyl-rich Alicyclic Molecules
Lipidsc = Lipids category
Proteinsc = Protein category
A-Sugarsc = Amino-Sugars category
Carbohydratesc = Carbohydrates category
Nucleotidesc = Nucleotides category
Phytochemicalc = Phytochemical compounds category
CM = Correctly Matched compounds
IM = Incorrectly Matched compounds
NM = Not Matched compounds
DM = Double Matches
IM+DM = Incorrectly Matched compounds considering double matches (DM) as incorrect
CM-(NM+DM) = Correctly Matched compounds without considering the not matched (NM) and double matched (DM) compounds

ACS Paragon Plus Environment


Abstract

van Krevelen diagrams (O:C vs. H:C ratios of elemental formulas) have been widely used in studies to obtain an estimation of the main compound categories present in environmental samples. However, the limits defining a specific compound category based solely on O:C and H:C ratios of elemental formulas have never been accurately listed or proposed to classify metabolites in biological samples. Furthermore, while O:C vs. H:C ratios of elemental formulas can provide an overview of the compound categories, such classification is inefficient because of the large overlap among different compound categories along both axes. We propose a more accurate compound classification for biological samples analyzed by high-resolution mass spectrometry based on an assessment of the C:H:O:N:P stoichiometric ratios of over 130,000 elemental formulas of compounds classified in 6 main categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides and phytochemical compounds (oxy-aromatic compounds). Our multidimensional stoichiometric compound classification (MSCC) constraints showed a highly accurate categorization of elemental formulas to the main compound categories in biological samples, with over 98% accuracy, representing a substantial improvement over any classification based on the classic van Krevelen diagram. This method represents a significant step forward in environmental research, especially ecological stoichiometry and eco-metabolomics studies, by providing a novel and robust tool to further our understanding of ecosystem structure and function through the chemical characterization of different biological samples.


Introduction

The role of nutrients in organisms, especially primary producers, has been a critical topic in ecosystem studies of biogeochemical cycles [1,2], trophic relationships [3], and ecological stoichiometry [4,5]. The most abundant elements in living organisms are the macroelements C, H, O, N, P, S, K, and Ca [5], which typically behave as constituents of organic compounds [6]. Understanding how these elements are selectively allocated into different categories of organic compounds remains a key challenge in many biogeochemical and ecological studies [4,6,7]. Although the ratios between these key macroelements have proven useful in ecological studies, accurately characterizing the major compound categories in organisms remains a significant challenge [8]. For example, the N:P ratio has been valuable as an index of the protein:nucleic-acid (e.g., DNA, RNA) ratio, due to the high content of N in proteins and P in nucleic acids [5,9–11]. However, this approach has many limitations since the same macroelements are also commonly found in other organic compound categories.

The metabolome of an organism, defined as the entire suite of low-molecular-weight compounds (metabolites; typically < 1200 Da) present at any given time and under specific conditions [12], provides valuable information for understanding the ecophysiology of organisms. Metabolomes include primary cellular products such as carbohydrates, small peptides, nucleotides, and lipids, as well as secondary metabolites that participate in diverse physiological processes. Most notably, changes in stoichiometric C:N:P ratios in organisms have been shown to be linked more with the overall metabolome structure than with specific compounds or groups of compounds [6,13]. Several ecosystem studies have combined metabolomic, elemental, and stoichiometric data to examine linkages between C:N:P biomass stoichiometries and the overall metabolome composition [6,14–18], yet stoichiometry-metabolome relationships remain elusive. A key reason for this ambiguity is our inability to identify the majority of metabolites, in large part due to current technological limitations. For example, the number of identified metabolites in non-targeted studies is typically < 200 [8,13,15,19–22], even when complementary analytical techniques are combined (e.g., mass spectrometry and nuclear magnetic resonance). This small proportion of characterized metabolome, combined with a lack of understanding of the dynamics of the major organic compound categories in organisms under different environmental conditions, remains a key obstacle to understanding the mechanisms linking the C:N:P stoichiometry of organisms and their metabolomes.

In the 1950s, van Krevelen developed a graphical representation of macroelemental ratios to evaluate the origin and chemical evolution of petroleum and kerogen samples [23]. This graphical representation, the van Krevelen diagram (hereafter vK diagram), represents atomic O:C vs. H:C ratios of organic compounds [23,24], and has become an important tool for characterizing organic matter extending well beyond petrochemical applications [25–28]. vK diagrams have been used in numerous studies to elucidate chemical reactions [25,27,28] as well as to assess main compound categories in natural organic matter (NOM), commonly defined as lipid-like, protein-like, amino-sugar-like, cellulose-like, lignin-like, and condensed hydrocarbons [26,27,29]. In addition, vK diagrams have been used to examine the chemical characteristics of carboxyl-rich alicyclic molecules (CRAM) [30,31] and oxidized black carbon [32] in NOM. However, the large overlap between compound categories in the vK diagram, which often leads to incorrect compound classification (Fig. 1), and the ambiguity of the “compound-like” term make compound classification based solely on O:C and H:C ratios inefficient for obtaining robust conclusions in ecological and biogeochemical studies. For example, using the vK diagram alone to depict CRAM is problematic because these compounds significantly overlap with the traditionally defined “lignin-like” region [33], but CRAM compounds may not be related to lignins derived from vascular plants. Decomposition of organic matter, largely driven by microorganisms (e.g., oxidation, methylation, and dehydration) and photo-oxidation, can result in substantial shifts of molecular O:C, H:C and N:C ratios compared to the original composition. Therefore, using compound-classification methods based on O:C and H:C ratios alone when studying complex mixtures of organic matter in different matrices could lead to inconclusive and/or confounding results, largely due to significant compound transformation processes. Besides the large compound overlap and the transformation of original molecular ratios, the O:C and H:C boundaries defining a specific compound category in a vK diagram differ substantially among published studies (see D’Andrilli et al. 2015 [34]) and have never been accurately defined for a robust overall classification of compounds. Many metabolites contain other heteroatoms, such as N or P; therefore, including these heteroatoms in addition to O in multidimensional compound classification approaches, i.e., using more than the two dimensions of the vK diagram (O:C and H:C ratios), should provide better performance in classifying compounds into their corresponding main category (e.g., Lipids, Protein, etc.) (Fig. 2).
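To make the ratio axes discussed above concrete, the following minimal Python sketch computes O:C, H:C, N:C, P:C and N:P from an elemental formula. The function names and the simple formula parser are illustrative assumptions, not part of the published method, and the parser only handles plain formulas without parentheses, isotopes or charges.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a plain elemental formula such as 'C6H12O6'.

    Illustrative helper (an assumption of this sketch): no parentheses,
    isotopes or charges are supported.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def stoichiometric_ratios(formula: str) -> dict:
    """Return the atomic ratios used as vK axes plus N:C, P:C and N:P."""
    atoms = parse_formula(formula)
    carbon = atoms["C"]
    if carbon == 0:
        raise ValueError("ratios to C are undefined for a formula without carbon")
    ratios = {f"{element}:C": atoms[element] / carbon for element in ("O", "H", "N", "P")}
    # N:P is undefined when the formula contains no phosphorus.
    ratios["N:P"] = atoms["N"] / atoms["P"] if atoms["P"] else None
    return ratios


if __name__ == "__main__":
    for name, formula in [("glucose", "C6H12O6"),
                          ("palmitic acid", "C16H32O2"),
                          ("ATP", "C10H16N5O13P3")]:
        print(name, stoichiometric_ratios(formula))
```

In this sketch glucose (C6H12O6) and palmitic acid (C16H32O2) share the same H:C ratio and are separated only by O:C, while ATP (C10H16N5O13P3) differs from both mainly through its N:C and P:C ratios, which a two-dimensional O:C vs. H:C plot does not show.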

N and P have been widely considered the pillars of the elemental ratios used in most ecological and ecophysiological studies, largely because of their significant roles in ecosystem structure and function, especially as they relate to the carbon cycle. For example, C:N, C:P or N:P ratios are important factors in ecological processes such as nitrogen fixation, litter decomposition, trophic relationships, organism growth rate, biodiversity, the responsiveness capacity of organisms, and ecosystem responses to stressors [4–6,35]. Lipids, proteins, carbohydrates, and nucleic acids represent the four major building blocks of life and occur in different proportions in all living systems. In addition to these core compound categories, plants, fungi, and bacteria also produce an assortment of secondary metabolites, commonly oxy-aromatic compounds, with diverse elemental composition that typically participate in specific functions [36–38].

Here, we propose an optimized compound classification method based on the C:H:O:N:P stoichiometric ratios of more than 130,000 molecular formulas of known compounds classified in six main categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides, and phytochemical compounds. The MSCC represents a substantial improvement over the classic vK approach (i.e., O:C and H:C ratios) for overall compound classification. We demonstrate that the MSCC vastly improves the exploration and interpretation of the overall molecular composition in biological samples, as applied to ecology, ecophysiology, ecological stoichiometry, organic chemistry, organic geochemistry, and biogeochemistry. While this novel compound classification approach is tailored to plants, it can also be applied to other organisms. To demonstrate that our method can readily characterize the different metabolite compositions of different organisms, we analyzed three model organisms from different kingdoms (Plantae, Fungi, and Animalia (insect)) with high-resolution mass spectrometry and applied the proposed MSCC.


Materials & Methods


Metabolite databases.

For the MSCC determination, we explored the C:H:O:N:P molecular ratios of a total of 132,209 elemental formulas from compounds consisting of 30,729 lipids; 93,245 peptides (including phosphorylated peptides); 7,774 phytochemical compounds; 82 carbohydrates; 142 amino-sugars (including amino sugar phosphates); and 37 nucleotides (Table S-1). Elemental formulas from lipids, phytochemical compounds, carbohydrates, amino-sugars and nucleotides were obtained from different compound databases: the LIPID MAPS Lipidomics Gateway database [39], KEGG COMPOUND [40,41], and the ChEBI database [42]. Table S-1 details the compounds from each dataset included in each compound category. Elemental formulas from peptides were compiled from a FASTA file representation of a 2014 SwissProt snapshot. Over 78,000 peptide sequences within the mass range 50-1,200 Da were converted to molecular formulas, and H2O was added for the peptide termini. Subsequently, 15,000 peptides that could be phosphorylated according to their sequences were randomly selected and computationally supplied with 1, 2 or 3 HPO4, depending on the size of the peptide, to generate a large dataset of possible phosphopeptides. See pages S-2 and S-3 in the Supporting Information for a detailed description of some considerations regarding the compound databases used for the determination of the MSCC. See page S-4 of the Supporting Information for validation of the MSCC determination.
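The conversion of a peptide sequence to an elemental formula described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the residue compositions are the standard amino-acid formulas minus H2O, the monoisotopic masses are rounded textbook values, all function names are hypothetical, and the phosphorylation step is deliberately left out.

```python
# Atom counts (C, H, N, O, S) of each amino-acid residue, i.e. the free
# amino acid minus one H2O lost on peptide-bond formation.
ELEMENTS = ("C", "H", "N", "O", "S")
RESIDUE_ATOMS = {
    "G": (2, 3, 1, 1, 0),  "A": (3, 5, 1, 1, 0),  "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0),  "V": (5, 9, 1, 1, 0),  "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1),  "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0),  "D": (4, 5, 1, 3, 0),  "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0),  "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0),  "F": (9, 9, 1, 1, 0),  "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0),  "W": (11, 10, 2, 1, 0),
}
# Approximate monoisotopic masses in Da (rounded values, an assumption here).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "S": 31.9720707}


def peptide_atoms(sequence: str) -> dict:
    """Sum residue compositions and add one H2O for the peptide termini."""
    totals = dict.fromkeys(ELEMENTS, 0)
    for residue in sequence.upper():
        for element, count in zip(ELEMENTS, RESIDUE_ATOMS[residue]):
            totals[element] += count
    totals["H"] += 2  # H2O added for the N- and C-termini
    totals["O"] += 1
    return totals


def formula_string(atoms: dict) -> str:
    """Render atom counts as a formula string, e.g. 'C4H8N2O3'."""
    return "".join(f"{el}{atoms[el] if atoms[el] > 1 else ''}"
                   for el in ELEMENTS if atoms[el])


def monoisotopic_mass(atoms: dict) -> float:
    return sum(MONOISOTOPIC[el] * n for el, n in atoms.items())


def in_mass_range(sequence: str, low: float = 50.0, high: float = 1200.0) -> bool:
    """Apply the 50-1,200 Da window mentioned in the text."""
    return low <= monoisotopic_mass(peptide_atoms(sequence)) <= high


if __name__ == "__main__":
    for seq in ("GG", "GA", "MC"):
        atoms = peptide_atoms(seq)
        print(seq, formula_string(atoms), round(monoisotopic_mass(atoms), 4))
```

A lookup table of residue compositions keeps the conversion to a single pass over the sequence; an unknown residue letter raises a KeyError rather than being silently skipped.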

Determination of new compound category constraints.

The proposed compound classification in the MSCC system includes the following six major categories, according to the aforementioned databases: lipids category (hereafter Lipidsc), proteins category (peptides; hereafter Proteinsc), amino-sugars category (hereafter A-Sugarsc), carbohydrates category (hereafter Carbohydratesc), nucleotides category (hereafter Nucleotidesc) and phytochemical compounds category (hereafter Phytochemicalc).

Pairwise scatterplots of stoichiometric variables were obtained between all compound databases to explore the spatial distribution of molecules along all C:H:O:N:P combinations (Fig. S-1). N, P, S, O, and molecular weight (MW) variables were also considered to complement the stoichiometric constraints of certain compound categories. Boundaries for compound classification were established using the distribution of compounds from the databases along all the examined stoichiometric and elemental variables. For compound databases showing overlap in all the examined variables, we took the variable showing the best separation (lowest overlapping proportion) between the databases as the discriminant variable for their classification (Fig. S-2 to S-8). By keeping overlap between compound categories to a minimum and using diverse stoichiometries (e.g., O:C, H:C, N:C, etc.), our MSCC ensures a wide recovery of compounds and minimal compound matching error. We therefore expect that the probability of matching compounds outside their category is minimal and that most of the detected features will match within their corresponding category boundaries (Fig. 4). Additional specific information regarding the established boundaries of the MSCC is detailed on pages S-2 and S-3 of the Supporting Information. An R (https://www.r-project.org/) script is included on pages S-5 to S-9 of the Supporting Information for compound classification from stoichiometric ratios using the MSCC.
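The general shape of a boundary-based classifier of this kind can be sketched in a few lines of Python. This is not the R script from the Supporting Information, and the category names and numeric bounds below are hypothetical placeholders chosen only for illustration; they are not the Table 1 constraints.

```python
# Hypothetical (low, high) bounds per variable. These are NOT the MSCC
# values from Table 1; they only illustrate the data structure.
EXAMPLE_CONSTRAINTS = {
    "category_A": {"O:C": (0.0, 0.6), "H:C": (1.3, 3.0), "N:C": (0.0, 0.1)},
    "category_B": {"O:C": (0.5, 2.0), "H:C": (1.5, 2.5), "N:C": (0.0, 0.0)},
}


def matching_categories(ratios: dict, constraints: dict = EXAMPLE_CONSTRAINTS) -> list:
    """Return every category whose bounds are all satisfied by `ratios`.

    An empty list corresponds to a not-matched formula; a list with two
    entries corresponds to a double match.
    """
    matches = []
    for category, bounds in constraints.items():
        if all(low <= ratios[variable] <= high
               for variable, (low, high) in bounds.items()):
            matches.append(category)
    return matches


if __name__ == "__main__":
    print(matching_categories({"O:C": 0.125, "H:C": 2.0, "N:C": 0.0}))
    print(matching_categories({"O:C": 0.55, "H:C": 2.0, "N:C": 0.0}))
    print(matching_categories({"O:C": 0.3, "H:C": 0.5, "N:C": 0.0}))
```

Returning a list rather than a single label keeps double matches and non-matches visible, which is what the performance assessment in the following sections relies on.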

Comparison of the classic vK compound classification with the MSCC.

We obtained the O:C and H:C boundaries defining the different compound categories from 21 published studies (citations shown in Table S-2) using GetData Graph Digitizer 2.24 to compare the performance of the MSCC with the classic compound classification based exclusively on O:C vs. H:C. Due to the vague definition of the “lignin-like” and “condensed hydrocarbon” categories typically represented in vK diagrams and the lack of specific databases including characterized compounds within those categories, we only considered the lipid, protein, amino-sugar, and carbohydrate (cellulose) databases to evaluate how the O:C and H:C boundaries of each compound category from all 21 studies performed in contrast to our MSCC.

The proportions of compounds from each compound database (lipids, proteins, amino-sugars, and carbohydrates) that were correctly matched (CM), not matched (NM), and incorrectly matched (IM) with respect to their corresponding categories (e.g., the proportion of lipids from the database that were CM, NM or IM with respect to Lipidsc) were calculated for the compound boundaries based solely on O:C and H:C ratios from the 21 studies and for our MSCC (Table S-2). Double matches (DM), i.e., compounds that fit into two different categories, were also calculated. We considered all amino-sugars matching Carbohydratesc as CM for those studies that did not present A-Sugarsc but only Carbohydratesc (Table S-2). We did not consider the carbohydrate database for studies that did not show Carbohydratesc but only A-Sugarsc. Additionally, for each compound category (lipids, proteins, amino-sugars, and carbohydrates), we calculated the proportion of IM considering the double matches as incorrect (IM+DM) and the CM without considering the NM and DM (CM-(NM+DM)) (Table S-2). To assess the number of correctly matched compounds per incorrectly matched compound, we calculated the CM/IM+DM ratio for each compound category. The CM/(IM+DM + NM) ratio was calculated and used to assess the efficiency of the stoichiometric constraints for the databases. Due to the large difference between databases in number of compounds, the overall performance of each study was evaluated using two approaches: i) considering the total absolute number of compounds of all databases together (hereafter, “absolute total”), and ii) considering the relative number of compounds for each database (hereafter, “relative total”).
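The bookkeeping behind these metrics can be sketched as below. Several choices are assumptions of this sketch rather than definitions confirmed by the text: any compound matching more than one category is counted as DM regardless of whether its own category is among the matches, the "absolute total" is computed by pooling all compounds, and the "relative total" is taken as the unweighted mean of the per-database percentages. Function names are hypothetical.

```python
from collections import Counter, defaultdict


def outcome(true_category: str, matched: list) -> str:
    """Label one compound as CM, IM, NM or DM (assumed definitions)."""
    if not matched:
        return "NM"
    if len(matched) > 1:
        return "DM"
    return "CM" if matched[0] == true_category else "IM"


def performance(records: list) -> dict:
    """records: list of (true_category, list_of_matched_categories)."""
    counts = Counter(outcome(true, matched) for true, matched in records)
    total = sum(counts.values())
    percent = {key: 100.0 * counts[key] / total for key in ("CM", "NM", "IM", "DM")}
    errors = counts["IM"] + counts["DM"]
    misses = errors + counts["NM"]
    return {
        "percent": percent,
        # Correct matches per incorrect match, double matches counted as incorrect.
        "CM/(IM+DM)": counts["CM"] / errors if errors else float("inf"),
        # Correct matches per compound that was either wrong or unclassified.
        "CM/(IM+DM+NM)": counts["CM"] / misses if misses else float("inf"),
    }


def relative_total_cm(records: list) -> float:
    """Mean of per-database CM percentages, so small databases weigh equally."""
    by_database = defaultdict(list)
    for true, matched in records:
        by_database[true].append((true, matched))
    shares = [performance(group)["percent"]["CM"] for group in by_database.values()]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    toy = ([("lipid", ["lipid"])] * 8
           + [("lipid", ["protein"]), ("lipid", []),
              ("sugar", ["sugar"]), ("sugar", ["sugar", "lipid"])])
    print(performance(toy))
    print(relative_total_cm(toy))
```

With databases of very different sizes, as here (tens of thousands of lipids and peptides against fewer than 200 carbohydrates and amino-sugars), the pooled and per-database averages can diverge, which is why both are reported.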

MSCC performance.

The performance of the MSCC was also evaluated on its own considering all compound categories: lipids, peptides, amino-sugars, carbohydrates, nucleotides and phytochemical compounds. For each category, the CM, NM, IM, CM-(NM+DM), CM/IM, and CM/(IM + NM) were calculated. The absolute total and relative total values were also calculated for each of the compound categories. Additional performance validation of the MSCC was performed for Lipidsc, Phytochemicalc and Proteinsc, the main categories showing the largest overlap across their stoichiometry (see page S-5 of the Supporting Information).


Compound extraction and FT-ICR-MS analyses of model organisms.

To test the MSCC in real biological samples and show the contrasting profiles of organisms from different kingdoms, fresh material of Brachypodium distachyon (plant), Saccharomyces cerevisiae (fungus), and Drosophila melanogaster (insect) was flash-frozen in liquid nitrogen, lyophilized, and then ground for metabolite extraction [43] (Fig. 3a). Briefly, 30 mg of lyophilized B. distachyon, S. cerevisiae, and D. melanogaster powders were added separately to 2 mL glass vials followed by the addition of 1 mL of MeOH/water (80:20). All samples were then placed into a Thermomixer (Eppendorf, New York, NY, USA) and shaken for 30 min at 1,200 rpm at 12 °C. All vials were subsequently sonicated for 5 min and then centrifuged at 6,000 × g for 5 min. Supernatants were collected and placed into labeled 2 mL HPLC vials, and kept frozen at -80 °C until analysis.

Metabolic fingerprints from these organisms were obtained using an ultra-high resolution 15 Tesla SolariX Fourier-transform ion-cyclotron resonance mass spectrometer (FT-ICR-MS; Bruker Daltonics Inc., Billerica, MA, USA) equipped with electrospray ionization (ESI) and operated in negative ionization mode (Fig. 3b). Each sample was directly infused at a flow rate of 3 µL/min. A total of 200 spectra at 4 Mword (corresponding to resolution > 400,000 at m/z = 400) were averaged for each sample. Mass measurement accuracy was < 1 ppm with external calibration, and < 0.2 ppm after internal calibration. Formula assignment was performed using the Formularity software [44] based on the automated compound identification algorithm (CIA) [45] (Fig. 3c). To avoid assignment ambiguity, formulas for ions with m/z > 700 [46] and assignments with > 0.2 ppm error were not considered.
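The two filtering criteria stated above (m/z no greater than 700 and an assignment error no greater than 0.2 ppm) can be expressed as a short Python sketch. The function names and the idea of filtering (measured, theoretical) m/z pairs are illustrative assumptions; the actual filtering is done within the formula-assignment workflow described in the text.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def keep_assignment(measured_mz: float, theoretical_mz: float,
                    max_mz: float = 700.0, max_ppm: float = 0.2) -> bool:
    """Keep an assignment only if it passes both thresholds from the text."""
    within_mass_window = measured_mz <= max_mz
    within_error = abs(ppm_error(measured_mz, theoretical_mz)) <= max_ppm
    return within_mass_window and within_error


if __name__ == "__main__":
    # Hypothetical (measured, theoretical) m/z pairs for illustration only.
    candidates = [(400.00004, 400.0), (400.0004, 400.0), (800.00001, 800.0)]
    for measured, theoretical in candidates:
        print(measured, round(ppm_error(measured, theoretical), 3),
              keep_assignment(measured, theoretical))
```

At m/z 400, a 0.2 ppm tolerance corresponds to a window of only 0.00008 m/z units on either side of the theoretical value, which illustrates why sub-ppm accuracy is needed before elemental formulas can be assigned with confidence.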

Results

Table 1 shows the determined stoichiometric and elemental boundaries for each compound category of the MSCC.

Contrasting vK vs. MSCC stoichiometric boundaries performance.

When considering the lipid, peptide, amino-sugar, and carbohydrate databases, for compound classification limits derived from the O:C vs. H:C ratios as shown in the 21 published studies, absolute total CM compounds varied from 14.63% to 63.64% (Fig. 5a, Table S-2), and the absolute total IM+DM compounds varied from 0.7% to 68.93%. The absolute total CM and IM+DM compounds for the proposed MSCC were 99.19% and 0.42%, respectively (Fig. 5a; Table S-2). The relative total CM ranged from 12.43% to 55.68% for the 21 published studies, in contrast to 98.5% using the MSCC (Fig. 5b).

The absolute total CM/IM+DM ratio ranged from 1.19 to 21.01 for the 21 published studies, while the MSCC yielded a ratio of 234.64 (Fig. 5c). The relative total CM/IM+DM ratio varied between 1.09 and 17.74 among the 21 studies, and the MSCC had a ratio of 234.05 (Table S-2). The larger relative matching error was generally attributed to lower proportions of NM compounds (Fig. 5d). Although NM compounds were not classified in any compound category, they did not introduce any error into the results. For the 21 published studies, the CM-(NM+DM) compounds ranged between 53.38% and 95.46% in absolute terms, and between 56.74% and 96.18% in relative terms (Table S-2). The MSCC reached 99.58% and 99.57% of absolute and relative total CM-(NM+DM) compounds, respectively. The absolute and relative total CM/(IM+DM + NM) ratios varied between 0.17 and 1.75 and between 0.15 and 13.6, respectively, for the 21 published studies, while the MSCC showed ratios of 121.73 and 272.05, respectively (Figs. 5e-5f).

Evaluation of the MSCC performance considering all compound databases.

The results of the MSCC evaluation considering the molecular formulas from the six databases (lipids, peptides, phytochemical compounds, nucleotides, amino-sugars, and carbohydrates) are shown in Table 2. We obtained 98.8%, 0.14% and 0.9% of the absolute total compounds as CM, NM and IM, respectively. The lipid, peptide, and phytochemical compound databases showed the highest overlap among them for all the elemental and stoichiometric variables. However, only 2.5% of lipid formulas were IM to other groups (1.5% matched into Proteinsc, 0.9% into Phytochemicalc, 0.08% into Carbohydratesc and 0.06% into A-Sugarsc). For the peptide formulas, 0.06% were IM (0.01%, 0.05% and 0.002% matched into A-Sugarsc, Phytochemicalc and Lipidsc, respectively). An estimated 3.3% of the phytochemical compound formulas were IM as A-Sugarsc (0.33%), Lipidsc (2.7%) and Proteinsc (0.3%). Amino-sugar and carbohydrate formulas did not show any IM, although 1.4% (2 compounds) and 1.22% (1 compound) of compounds, respectively, were NM according to the MSCC. All nucleotide formulas were CM to Nucleotidesc, although a few also matched Proteinsc (40%) and A-Sugarsc (5%).

Due to the high recovery of compounds using the MSCC, the relative total proportion of CM (98.5%) did not differ greatly from the absolute total proportion of CM (99.8%) (Table 2). The absolute total CM/IM ratio was 108.3, i.e., approximately 110 compounds from the databases were CM for each IM. The absolute total CM/(IM + NM) ratio was 95.82.


Proportions of compounds based on MSCC for model organisms.

Based on the metabolite extraction performed with MeOH:H2O (80:20) and the ESI-FT-ICR-MS analyses in negative ionization mode, we obtained very distinct compound profiles for the three organisms analyzed (Fig. 6). Only 9.6%, 7.5% and 8.7% of the metabolic features for B. distachyon (plant), S. cerevisiae (yeast), and D. melanogaster (insect), respectively, did not match any of the compound categories defined by the MSCC in Table 1. In addition, 0.1%, 0.4% and 0.3% of variables from B. distachyon, S. cerevisiae and D. melanogaster, respectively, matched two different categories. Phytochemical compounds and lipids were the most abundant compound categories for B. distachyon, representing 40.4% and 32.8% of the variables, respectively. S. cerevisiae also showed a large fraction of lipids (31%) and oxy-aromatic compounds (33.6%), followed by proteins (12.3%) and amino sugars (10.6%), while D. melanogaster showed a very distinct profile with 25.0%, 24.3%, 17.8%, and 10.1% of lipids, proteins, amino sugars, and carbohydrates, respectively.


Discussion


Rationale and performance of the proposed MSCC.

Lipids, proteins, carbohydrates and nucleic acids are the main building blocks of life and are present in all living organisms. Although the MSCC was focused on plants, it could be applied to other organisms (see the Extension of MSCC to different living systems section). For a more exhaustive chemical characterization of plant samples, we included the “Phytochemical compounds” category (Phytochemicalc), which has been delimited by the stoichiometry of thousands of molecular formulas of well-characterized plant secondary metabolites. These compounds are well known for the crucial roles they play in the physiological, developmental, and anti-stress processes in plants [47,48]. We used thousands of elemental formulas from described compounds to accurately define the multiple stoichiometric boundaries for each compound category. By using large compound databases and multiple boundaries defining each compound category, the probability of matching metabolites outside their corresponding major category is low (Fig. 4; Table 2). The application of the proposed MSCC for the classification of elemental formulas thus rests on the premise that the vast majority of the features detected by high-resolution mass spectrometry (e.g., FT-ICR-MS) in samples should match the major compound categories (Figs. 4, S-3 to S-8), including Phytochemicalc in the case of plants.

The MSCC (Table 1) is primarily based on the examination of the H:C, O:C, N:C, P:C and N:P ratios of a substantial number of described metabolites covering different organic compound categories. The MSCC provides a powerful new approach for classifying molecular compounds into their respective categories, with minimal error and high coverage (Fig. 5; Tables 2 and S-2). Enlarging the areas delimited by the O:C and H:C ratios of any compound category in a vK diagram increases the coverage of compound matching, but also increases the compound matching error (Fig. 5d). As such, the use of the MSCC allows a more accurate classification of metabolites into six major compound categories, with reduced compound matching error. After examining large databases that included primary and secondary metabolites, we contend that the probability of incorrectly matching metabolites using the MSCC is minimal (Figs. 4, 5c, 5e-5f) and that the MSCC thus provides a significant improvement over the vK method for assessing the overall spectrum of compounds in biological samples.
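The five ratios on which the MSCC is based can be derived directly from an assigned elemental formula. The following is only a minimal sketch, assuming simple formulas without brackets, charges or isotope labels; the helper names `parse_formula` and `stoichiometric_ratios` are hypothetical and are not part of the published workflow. Returning `None` for a ratio with a zero denominator (e.g. N:P for a compound without P) is a choice made here for illustration.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple elemental formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def _ratio(numerator: int, denominator: int):
    # ASSUMPTION: an undefined ratio (zero denominator) is reported as None.
    return numerator / denominator if denominator else None


def stoichiometric_ratios(formula: str) -> dict:
    """Return the H:C, O:C, N:C, P:C and N:P ratios of a formula."""
    atoms = parse_formula(formula)
    c = atoms["C"]
    return {
        "H:C": _ratio(atoms["H"], c),
        "O:C": _ratio(atoms["O"], c),
        "N:C": _ratio(atoms["N"], c),
        "P:C": _ratio(atoms["P"], c),
        "N:P": _ratio(atoms["N"], atoms["P"]),
    }


# Example: ATP (C10H16N5O13P3) and glucose (C6H12O6).
print(stoichiometric_ratios("C10H16N5O13P3"))
print(stoichiometric_ratios("C6H12O6"))
```

For glucose the N:P entry is `None`, which illustrates why a classifier built on these ratios needs an explicit rule for compounds that lack one of the elements.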

Practical limitations.

Advanced mass spectrometry offers high resolving power (up to 1,000,000) (Brown, Kruppa, & Dasseux, 2005; Denisov, Damoc, Lange, & Makarov, 2012). Ultrahigh mass-measurement accuracy (typically < 1 ppm after internal calibration) by Fourier transform mass spectrometry (FTMS) allows assignment of elemental formulas for the majority of the thousands of compounds detected in a given sample.49,50 The application of high-resolution FTMS (HR-FTMS) in environmental and biological research has allowed assessment of the diversity of molecular compounds present in a particular sample, by assuming the presence of several different essential elements, such as O, S, N and P, and calculating their stoichiometric ratios. Assigning elemental formulas to HRMS data can thus be a valuable tool for simply comparing different stoichiometric metabolic profiles, or for exploring how organisms allocate different elements to different molecular compounds under certain environmental conditions, which is an important aspect of ecological stoichiometry studies.4–6

HRMS provides unprecedented means for understanding molecular ecological stoichiometry in complex samples; nevertheless, there are a few limitations. For instance, accurate formula assignment can be hindered by (1) the mass measurement accuracy achieved by the HRMS after routine external calibration, (2) the final mass measurement accuracy after internal calibration, and (3) the assignment accuracy of the formula assignment algorithm. Typically, a mass measurement accuracy of < 1 ppm error is expected using well-maintained, i.e. routinely cleaned and calibrated, HR-FTMS instruments, such as FT-ICR-MS.51 Less than 0.2 ppm of mass error is often achieved with HR-FTMS by employing internal calibration,52,53 which is a crucial step for accurate formula assignment by CIA.45 In fact, CIA correctly assigned 96.94% of the known molecular formulas from our compound databases (Table S-3) when considering all formula types and the full mass range (70 to 1200 Da) (see page S-6 and Table S-3 in the Supporting Information for a detailed assessment of the formula assignment error of the compound databases using CIA). Furthermore, it is important to note that as molecular mass increases, the number of possible formula assignments increases disproportionately, especially when multiple heteroatoms are considered.45,46 Hence, typically only lower-mass metabolites (e.g. < 500 Da) have been considered for formula assignment to reduce the number of false positives.46 Therefore, using formula assignments for compounds < 500 Da and a mass measurement accuracy < 0.2 ppm is advised for more robust compound classification in samples; formula assignments for compounds over 500 Da should be treated with caution. It should be noted that, in addition to issues related to mass measurement accuracy, another inherent limitation of direct-infusion FT-ICR-MS is its inability to distinguish between structural isomers. For example, a lignin compound may have the same elemental formula as a CRAM compound, although they have different structures and different origins in nature.33
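The two advisory limits discussed above (mass error < 0.2 ppm and molecular mass < 500 Da) can be expressed as a simple filter. This is a sketch under stated assumptions: the function names are hypothetical, the masses are taken to be neutral monoisotopic masses in Da, and the example measurements are invented for illustration rather than taken from the study.

```python
def ppm_error(measured_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


def is_robust_assignment(
    measured_mass: float,
    theoretical_mass: float,
    max_abs_ppm: float = 0.2,    # advisory limit from the text
    max_mass_da: float = 500.0,  # advisory limit from the text
) -> bool:
    """True when an assignment satisfies both advisory limits."""
    within_ppm = abs(ppm_error(measured_mass, theoretical_mass)) < max_abs_ppm
    within_mass = theoretical_mass < max_mass_da
    return within_ppm and within_mass


# Hypothetical measurements of C6H12O6 (theoretical mass about 180.06339 Da).
print(is_robust_assignment(180.06342, 180.06339))  # error of about 0.17 ppm
print(is_robust_assignment(180.06360, 180.06339))  # error of about 1.2 ppm
```

Assignments that fail either limit need not be discarded; consistent with the text, they could instead be flagged and interpreted with caution.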

Extension of MSCC to different living systems.

The MSCC presented here can also be applied to the classification of lipids, proteins, carbohydrates and nucleotides in all other living systems. “Lignin-like” and “condensed hydrocarbon” compound categories are often depicted in vK diagrams.34 However, these categories are not directly applicable to metabolites in living systems. Any organic sample contains a significant proportion of molecules with low H:C ratios (< 1.3), typically represented by aromatic or unsaturated compounds. Those metabolites possess one or more aromatic rings and different functional groups (e.g. -OH, -C(O)OH, and -NO2), making up a large variety of compounds along the O:C axis, typically polyphenolics. Lignin consists of large polymers of phenolic compounds (C31H34O11)n and represents one of the main components of plant cell walls. Several plant phenolic secondary metabolites are lignin precursors (lignins), but other non-lignin-related phenolics, such as flavonoids, are abundant secondary metabolites36 and cluster within the “lignin-like” category in a vK diagram (Fig. 1). Phytochemicalc, which includes all polyphenolic compounds, should only be used for plant samples. Another category of phenolics, formed by CRAM, that occurs within the “lignin-like” region of vK diagrams has also been described.29–31,54 However, not all compounds within this area of a vK diagram are necessarily carboxyl-rich. On the other hand, condensed hydrocarbons are usually considered to be derived from incomplete combustion or geo-condensation, such as char or petroleum, respectively.55 Since the condensed hydrocarbons category is not directly applicable to organisms and overlaps with our proposed Phytochemicalc (oxy-aromatics), this category was not considered in the MSCC. For the analysis of samples other than plants, we propose the use of the general term “oxy-aromatic compounds” to include all those compounds with low H:C, mainly polyphenolics, applying the same MSCC constraints as for Phytochemicalc.

The compound profiles of the analyzed model organisms were obtained through a MeOH/H2O (80:20) extraction, and the extracted composition will vary if different solvents are used.56,57 Thus, it is important to use the same procedures when comparing samples. Additionally, the ionization method dictates the nature of the compounds observed with MS platforms. Electrospray ionization (ESI) is often used to resolve polar and semi-polar compounds, while other ionization methods such as atmospheric pressure photoionization (APPI) or atmospheric pressure chemical ionization (APCI) could be considered for non-polar matrices.

As expected, the three model organisms studied herein display very different compound profiles (Fig. 6). Based on the ESI-FT-ICR-MS analyses of the MeOH/H2O (80:20) extracts, over 90% of the signals detected in all three cases were matched to one of the compound groups of the MSCC (Table 1), demonstrating a reliable compound classification. Furthermore, the proportion of double assignments was minimal, reaching its maximum value in D. melanogaster (insect) (0.4%).

Insects are protein-rich organisms58 with a relatively low content of polyphenolic compounds compared to plants or fungi. Our results clearly showed this trend, with 24.3% of peptides in D. melanogaster vs. 12.3% in S. cerevisiae (fungus) and 5.7% in B. distachyon (plant) (Fig. 6). Chitin, an abundant amino-sugar biopolymer,59 represents a major component in insects.60 The highest proportion of amino sugars was detected in D. melanogaster, especially compared to plants, which may directly relate to the high chitin content of the insect exoskeleton. On the other hand, plants and fungi produce a large diversity of secondary metabolites, a major part of them with oxy-aromatic structures.37 Of all the detected features, 40.4% and 33.6% matched Phytochemicalc (oxy-aromatic category) for B. distachyon and S. cerevisiae, respectively, further corroborating the high content of secondary metabolites in those organisms.

Conclusions

The proposed multidimensional stoichiometric constraints classification (MSCC) (Table 1) exhibited a substantial improvement over vK diagrams in classifying database compounds, with minimal error and large coverage (Table 2), and it can be applied to different organisms. Additionally, the classification method can serve as a strong starting point for investigating other (i.e. nonliving) complex environmental matrices, and for defining and optimizing multiple elemental ratios that allow a more robust classification of environmental samples. Inherent limitations originating from the mass measurement accuracy and formula assignment accuracy should be carefully considered when interpreting MS data in terms of compound classifications and molecular-level stoichiometry. This stoichiometric compound classification method represents a valuable tool for environmental research, especially in the fields of ecometabolomics and ecological stoichiometry.

Acknowledgments.

This research was performed at the Environmental Molecular Sciences Laboratory (EMSL), a DOE Office of Science User Facility sponsored by the Office of Biological and Environmental Research at the Pacific Northwest National Laboratory (PNNL). The research was funded in part by US Department of Energy (DOE) Contract DE-AC05-76RL01830 with PNNL.

References

(1) Benner, J. W.; Vitousek, P. M. Ecol. Lett. 2007, 10 (7), 628–636.
(2) Vitousek, P. M.; Aber, J. D.; Howarth, R. W.; Likens, G. E.; Matson, P. A.; Schindler, D. W.; Schlesinger, W. H.; Tilman, D. G. Ecol. Appl. 1997, 7 (3), 737–750.
(3) Andersen, T.; Hessen, D. O. Limnol. Oceanogr. 1991, 36 (4), 807–814.
(4) Sardans, J.; Rivas-Ubach, A.; Peñuelas, J. Biogeochemistry 2012, 111 (1–3).
(5) Sterner, R.; Elser, J. Ecological Stoichiometry: The Biology of Elements from Molecules to the Biosphere; Princeton University Press, 2002.
(6) Rivas-Ubach, A.; Sardans, J.; Pérez-Trujillo, M.; Estiarte, M.; Peñuelas, J. Proc. Natl. Acad. Sci. U. S. A. 2012, 109 (11).
(7) Peñuelas, J.; Sardans, J. Chem. Ecol. 2009, 25 (4), 305–309.
(8) Sardans, J.; Peñuelas, J.; Rivas-Ubach, A. Chemoecology 2011, 21 (4), 191–225.
(9) Bianchi, T. S.; Canuel, E. A. Chemical Biomarkers in Aquatic Ecosystems; Princeton University Press: New Jersey, 2011.
(10) Elser, J. J.; Dobberfuhl, D. R.; MacKay, N. A.; Schampel, J. H. BioScience 1996, pp 674–684.
(11) Matzek, V.; Vitousek, P. M. Ecol. Lett. 2009, 12 (8), 765–771.
(12) Fiehn, O. Plant Mol. Biol. 2002, 48 (1–2), 155–171.
(13) Rivas-Ubach, A.; Gargallo-Garriga, A.; Sardans, J.; Oravec, M.; Mateu-Castell, L.; Pérez-Trujillo, M.; Parella, T.; Ogaya, R.; Urban, O.; Peñuelas, J. New Phytol. 2014, 202 (3).
(14) Gargallo-Garriga, A.; Sardans, J.; Pérez-Trujillo, M.; Rivas-Ubach, A.; Oravec, M.; Vecerova, K.; Urban, O.; Jentsch, A.; Kreyling, J.; Beierkuhnlein, C. Sci. Rep. 2014, 4, 6829.
(15) Sardans, J.; Gargallo-Garriga, A.; Pérez-Trujillo, M.; Parella, T. J.; Seco, R.; Filella, I.; Peñuelas, J. Plant Biol. (Stuttg). 2014, 16 (2), 395–403.
(16) Rivas-Ubach, A.; Barbeta, A.; Sardans, J.; Guenther, A.; Ogaya, R.; Oravec, M.; Urban, O.; Peñuelas, J. Perspect. Plant Ecol. Evol. Syst. 2016, 21, 41–54.
(17) Rivas-Ubach, A.; Hódar, J. A.; Sardans, J.; Kyle, J. E.; Kim, Y.-M.; Oravec, M.; Urban, O.; Guenther, A.; Peñuelas, J. Ecol. Evol. 2016, 6 (13), 4372–4386.
(18) Rivas-Ubach, A.; Poret-Peterson, A. T.; Peñuelas, J.; Sardans, J.; Pérez-Trujillo, M.; Legido-Quigley, C.; Oravec, M.; Urban, O.; Elser, J. J. Acta Physiol. Plant. 2018, 40 (2), 28.
(19) Kim, H. K.; Choi, Y. H.; Verpoorte, R. Nat. Protoc. 2010, 5 (3), 536–549.
(20) Petersson, S. V.; Lindén, P.; Moritz, T.; Ljung, K. Metabolomics 2015, 11 (6), 1679–1689.
(21) Pluskal, T.; Nakamura, T.; Villar-Briones, A.; Yanagida, M. Mol. BioSyst. 2009, 6 (1), 182–198.

(22) Rivas-Ubach, A.; Sardans, J.; Hódar, J. A.; Garcia-Porta, J.; Guenther, A.; Paša-Tolić, L.; Oravec, M.; Urban, O.; Peñuelas, J. Ecol. Evol. 2017, 7 (21), 8976–8988.
(23) van Krevelen, D. Fuel 1950, 29, 269–284.
(24) Curiale, J. A.; Gibling, M. R. Org. Geochem. 1994, 21 (1), 67–89.
(25) Baldock, J. A.; Smernik, R. J. Org. Geochem. 2002, 33 (9), 1093–1109.
(26) D’Andrilli, J.; Foreman, C. M. Org. Geochem. 2013, 65, 19–28.
(27) Kim, S.; Kramer, R. W.; Hatcher, P. G. Anal. Chem. 2003, 75 (20), 5336–5344.
(28) Kracht, O.; Gleixner, G. Org. Geochem. 2000, 31 (7–8), 645–654.
(29) Minor, E. C.; Swenson, M. M.; Mattson, B. M.; Oyler, A. R. Environ. Sci. Process. Impacts 2014, 16 (9), 2064–2079.
(30) Hertkorn, N.; Benner, R.; Frommberger, M.; Schmitt-Kopplin, P.; Witt, M.; Kaiser, K.; Kettrup, A.; Hedges, J. I. Geochim. Cosmochim. Acta 2006, 70 (12), 2990–3010.
(31) Stubbins, A.; Spencer, R. G. M.; Chen, H.; Hatcher, P. G.; Mopper, K.; Hernes, P. J.; Mwamba, V. L.; Mangangu, A. M.; Wabakanghanzi, J. N.; Six, J. Limnol. Oceanogr. 2010, 55 (4), 1467–1477.
(32) Kim, S.; Kaplan, L. A.; Benner, R.; Hatcher, P. G. Mar. Chem. 2004, 92 (1–4), 225–234.
(33) Sleighter, R. L.; Hatcher, P. G. Mar. Chem. 2008, 110 (3–4), 140–152.
(34) D’Andrilli, J.; Cooper, W. T.; Foreman, C. M.; Marshall, A. G. Rapid Commun. Mass Spectrom. 2015, 29 (24), 2385–2401.
(35) Elser, J. J.; Fagan, W. F.; Denno, R. F.; Dobberfuhl, D. R.; Folarin, A.; Huberty, A.; Interlandi, S.; Kilham, S. S.; McCauley, E.; Schulz, K. L.; Siemann, E. H.; Sterner, R. W. Nature 2000, 408 (6812), 578–580.
(36) Bennett, R. N.; Wallsgrove, R. M. New Phytol. 1994, 127 (4), 617–633.
(37) Keller, N. P.; Turner, G.; Bennett, J. W. Nat. Rev. Microbiol. 2005, 3 (12), 937–947.
(38) Pietra, F. Nat. Prod. Rep. 1997, 14 (5), 453.
(39) Sud, M.; Fahy, E.; Cotter, D.; Brown, A.; Dennis, E. A.; Glass, C. K.; Merrill, A. H.; Murphy, R. C.; Raetz, C. R. H.; Russell, D. W.; Subramaniam, S. Nucleic Acids Res. 2007, 35 (Database), D527–D532.
(40) Kanehisa, M.; Goto, S. Nucleic Acids Res. 2000, 28 (1), 27–30.
(41) Kanehisa, M.; Sato, Y.; Kawashima, M.; Furumichi, M.; Tanabe, M. Nucleic Acids Res. 2016, 44 (D1), D457–D462.
(42) Hastings, J.; de Matos, P.; Dekker, A.; Ennis, M.; Harsha, B.; Kale, N.; Muthukrishnan, V.; Owen, G.; Turner, S.; Williams, M.; Steinbeck, C. Nucleic Acids Res. 2012, 41 (D1), D456–D463.

(43) Rivas-Ubach, A.; Pérez-Trujillo, M.; Sardans, J.; Gargallo-Garriga, A.; Parella, T.; Peñuelas, J. Methods Ecol. Evol. 2013, 4 (5), 464–473.
(44) Tolić, N.; Liu, Y.; Liyu, A.; Shen, Y.; Tfaily, M. M.; Kujawinski, E. B.; Longnecker, K.; Kuo, L.-J.; Robinson, E. W.; Paša-Tolić, L.; Hess, N. J. Anal. Chem. 2017, 89 (23), 12659–12665.
(45) Kujawinski, E. B.; Behn, M. D. Anal. Chem. 2006, 78 (13), 4363–4373.
(46) Koch, B. P.; Dittmar, T.; Witt, M.; Kattner, G. Anal. Chem. 2007, 79 (4), 1758–1763.
(47) Gill, S. S.; Tuteja, N. Plant Physiol. Biochem. 2010, 48 (12), 909–930.
(48) Pietta, P.-G. J. Nat. Prod. 2000, 63, 1035–1042.
(49) Kujawinski, E. Environ. Forensics 2002, 3 (3), 207–216.
(50) Marshall, A. G.; Hendrickson, C. L.; Jackson, G. S. Mass Spectrom. Rev. 1998, 17, 1–35.
(51) Brown, S. C.; Kruppa, G.; Dasseux, J.-L. Mass Spectrom. Rev. 2005, 24 (2), 223–231.
(52) Stubbins, A.; Silva, L. M.; Dittmar, T.; Van Stan, J. T. Front. Earth Sci. 2017, 5 (5), doi: 10.3389/feart.2017.00022.
(53) Sleighter, R. L.; Mckee, G. A.; Liu, Z.; Hatcher, P. G. Limnol. Oceanogr. Methods 2008, 6 (6), 246–253.
(54) Lechtenfeld, O. J.; Hertkorn, N.; Shen, Y.; Witt, M.; Benner, R. Nat. Commun. 2015, 6, 6711.
(55) Podgorski, D. C.; Hamdan, R.; McKenna, A. M.; Nyadong, L.; Rodgers, R. P.; Marshall, A. G.; Cooper, W. T. Anal. Chem. 2012, 84 (3), 1281–1287.
(56) t’Kindt, R.; De Veylder, L.; Storme, M.; Deforce, D.; Van Bocxlaer, J. J. Chromatogr. B. Analyt. Technol. Biomed. Life Sci. 2008, 871 (1), 37–43.
(57) Tfaily, M. M.; Chu, R. K.; Tolić, N.; Roscioli, K. M.; Anderton, C. R.; Paša-Tolić, L.; Robinson, E. W.; Hess, N. J. Anal. Chem. 2015, 87 (10), 5206–5215.
(58) DeFoliart, G. R. Bull. Entomol. Soc. Am. 1975, 21 (3), 161–164.
(59) Kumar, M. N. V. R. React. Funct. Polym. 2000, 46, 1–27.
(60) Muthukrishnan, S.; Merzendorfer, H.; Arakane, Y.; Kramer, K. J. In Insect Molecular Biology and Biochemistry; Gilbert, L. I., Ed.; Elsevier: San Diego, 2012; pp 193–225.

Table 1. Proposed C:H:O:N:P stoichiometric and elemental constraints for each compound category. The protein category (Proteinc) is defined by two different series of constraints, which have to be used together. A molecular mass range is also proposed for the nucleotide category (Nucleotidesc). Table cells with hyphens (-) indicate that the variable is not needed as a discriminant for that compound category.

Category                   O:C             H:C             N:C               P:C             N:P          O     N     P     S     Mass (Da)
Lipidc                     ≤ 0.6           ≥ 1.32          ≤ 0.126           < 0.35          ≤ 5          -     -     -     -     -
Proteinc (Constraints 1)   > 0.12, ≤ 0.6   > 0.9, < 2.5    ≥ 0.126           …               -            -     ≥ 1   -     -     -
Proteinc (Constraints 2)   > 0.6, ≤ 1      > 1.2, < 2.5    … 0.2             …               -            -     ≥ 1   -     -     -
A-Sugarc                   …               …               … 0.07, ≤ 0.2     < 0.3           ≤ 2          ≥ 3   ≥ 1   -     -     -
Carbohydratesc             ≥ 0.8           ≥ 1.65, < 2.7   -                 -               -            -     = 0   -     -     -
Nucleotidesc *             ≥ 0.5, < 1.7    > 1, < 1.8      ≥ 0.2, ≤ 0.5      ≥ 0.1, < 0.35   > 0.6, ≤ 5   -     ≥ 2   ≥ 1   = 0   > 305, < 523
Phytochemicalc             ≤ 1.15          < 1.32          < 0.126           ≤ 0.2           ≤ 3          -     -     -     -     -

* Double matches in Nucleotidesc should be considered as nucleotides.
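Applying constraints of this kind amounts to testing each formula against every category and collecting the matches. The sketch below is illustrative only and makes several assumptions: it covers just three categories (Lipidc, Carbohydratesc and Phytochemicalc), with thresholds as read from Table 1; the function names are hypothetical; and the N:P constraint is skipped when a formula contains no P, which is a choice made here and not a rule stated in the text.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple elemental formula such as 'C6H12O6'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def classify(formula: str) -> list:
    """Return the names of all categories whose constraints the formula meets."""
    atoms = parse_formula(formula)
    c, h, o, n, p = (atoms[element] for element in "CHONP")
    if c == 0:
        return []  # ratios to carbon are undefined
    oc, hc, nc, pc = o / c, h / c, n / c, p / c

    def np_at_most(limit: float) -> bool:
        # ASSUMPTION: the N:P constraint is skipped when the formula has no P.
        return p == 0 or n / p <= limit

    rules = {
        "Lipidc": oc <= 0.6 and hc >= 1.32 and nc <= 0.126
        and pc < 0.35 and np_at_most(5),
        "Carbohydratesc": oc >= 0.8 and 1.65 <= hc < 2.7 and n == 0,
        "Phytochemicalc": oc <= 1.15 and hc < 1.32 and nc < 0.126
        and pc <= 0.2 and np_at_most(3),
    }
    return [name for name, matched in rules.items() if matched]


# Glucose, palmitic acid and quercetin as example formulas.
for example in ("C6H12O6", "C16H32O2", "C15H10O7"):
    print(example, classify(example))
```

An empty list corresponds to a not-matched formula and a list with two names to a double match, so the same function can be used to tally the CM, NM, IM and DM counts reported in Table 2.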

Table 2. Number of compounds from databases and proportions of database compounds that were correctly matched (CM), not matched (NM), and incorrectly matched (IM) with each proposed compound category according to the proposed constraints from Table 1. The absolute number of compounds is shown in brackets. Correctly matched excluding the NM and double matchings (DM) (CM-(NM+DM)) and the CM/IM and CM/(IM + NM) ratios are also shown. The total absolute and total relative proportions are calculated from the absolute number of compounds in the databases and from the relative number of compounds, respectively. Percentages marked with asterisks indicate possible DM, which is discussed in the manuscript.

Category (n)                      CM                NM            IM             CM-(NM+DM)   CM/IM    CM/(IM + NM)
Lipids (30,729)                   97.1% (29,823)    0.4% (122)    2.5% (784)     97.4%        38.0     32.91
Peptides (93,245)                 99.9% (93,142)    0.04% (41)    0.06% (62)     99.9%        1502.3   1397.4
Amino sugars (142)                98.6% (140)       1.4% (2)      0% (0)         100%         70       70
Carbohydrates (82)                98.8% (81)        1.2% (1)      0% (0)         98.8%        80.96    81.0
Nucleotides (37)                  100%* (37)        0% (0)        0%*            100%*
Phytochemical compounds (7,774)   96.5% (7,499)     0.25% (19)    3.3% (256)     96.7%        29.3     27.3
TOTAL Absolute (132,009)          98.8% (130,401)   0.14% (390)   0.9% (1,204)   99.1%        108.3    95.82
Total Relative                    98.5%             0.55%         0.97%          98.8%        -        -

IM by category:
Lipids: · 17 A-Sugarc (0.06%) · 24 Carbohydratesc (0.08%) · 271 Phytochemicalc (0.9%) · 472 Proteinc (1.5%)
Peptides: · 10 A-Sugarc (0.01%) · 50 Phytochemicalc (0.05%) · 2 Lipids (0.002%)
Phytochemical compounds: · 26 A-Sugarc (0.33%) · 207 Lipidsc (2.7%) · 23 Proteinc (0.3%)
Nucleotides, double matching: · 15 with Proteinc (40%) · 2 with A-Sugarc (5%)

* Double matches in Nucleotidesc are not considered as incorrect matches since any peptide or amino sugar from the databases matched with all the constraints of the Nucleotidesc.
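The ratio columns of Table 2 follow directly from the CM, NM and IM counts. As a minimal sketch (the function name `matching_summary` is hypothetical, and returning `None` when a denominator is zero is a choice made here), the lipid and phytochemical rows can be recomputed as follows:

```python
def matching_summary(cm: int, nm: int, im: int) -> dict:
    """Summarize database matching from CM, NM and IM counts."""
    total = cm + nm + im
    return {
        "CM %": 100 * cm / total,
        # ASSUMPTION: a ratio with a zero denominator is reported as None.
        "CM/IM": cm / im if im else None,
        "CM/(IM+NM)": cm / (im + nm) if (im + nm) else None,
    }


# Counts taken from the Lipids and Phytochemical compounds rows of Table 2.
lipids = matching_summary(cm=29823, nm=122, im=784)
phytochemicals = matching_summary(cm=7499, nm=19, im=256)

for name, summary in (("Lipids", lipids), ("Phytochemicals", phytochemicals)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Both ratios grow as incorrect and missed matches become rarer, which is why they are useful for comparing the MSCC with boundaries based only on O:C and H:C.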

Figure 1. van Krevelen diagram (O:C vs. H:C elemental ratios) of metabolite databases. Different compound categories are represented in different colors (blue, lipids; dark red, peptides; yellow, amino sugars; orange, carbohydrates; cyan, nucleotides; green, phytochemical compounds). Classic compound classification areas according to O:C and H:C elemental ratios are represented. The areas shown are approximations based on previous compound classifications in organic matter studies (citations shown in Table S-2) and do not exactly represent any specific areas.

Figure 2. Examples of different molecular structures from different compound categories with their O:C, H:C, N:C, P:C, and N:P ratios. Nitrogen (N) and Phosphorus (P) are shown in red and green, respectively.

Figure 3. Diagram of the principal procedures to apply the multidimensional stoichiometric constraints classification (MSCC) on samples. a, b and c denote three of the procedures cited in the main text that have been performed before testing the MSCC on model organisms.

Figure 4. Example of a 2-dimensional (2D) density plot of H:C vs. O:C elemental ratios (vK plot) for the peptide database (93,245 elemental formulas). The color gradient indicates the number of peptides within each square area: red squares indicate the areas with the highest density of peptides (up to 540-560 peptides) and blue squares the areas with the lowest density (1-20 peptides). Contour lines enclosing 80% and 95% of the peptides are shown in black. Boundaries of the multidimensional stoichiometric constraints classification (MSCC) for O:C and H:C ratios are indicated with red dashed lines (see Table 1 for the exact stoichiometric thresholds). By probability, most peptides detected in samples will fall within the high-density area (95%). The MSCC boundaries (Table 1) extend substantially beyond the high-density area, and the classification is based on multiple stoichiometric constraints rather than only the two shown in the figure, which makes the probability of matching compounds outside their defined stoichiometric constraints minimal. See Figures S-3 to S-8 for all stoichiometric constraints of all compound categories.

Figure 5. Matching results of the lipid, protein, amino-sugar and carbohydrate databases to their corresponding categories according to the proposed multidimensional stoichiometric constraints classification (MSCC) (0; star) and the O:C and H:C thresholds provided by different studies (1 to 21). Proportion of total correctly matched compounds in absolute (a) and relative (b) terms. Correct-matched:Incorrect-matched ratio in absolute terms (c). Not-matched versus incorrect-matched compounds (d). Correct-matched:(Incorrect-matched+Not-matched) ratio in absolute terms (e) and relative terms (f). References for each of the studies are shown in Table S-2.

Figure 6. Pie diagrams representing the relative abundance (%) of the compound categories defined by the proposed multidimensional stoichiometric constraints classification (MSCC) (Table 1) for Brachypodium distachyon (plant), Saccharomyces cerevisiae (yeast) and Drosophila melanogaster (insect). Each compound category is represented by a different color. Metabolic variables that did not match any of the compound categories are shown in grey. Variables that match two compound categories are represented in black. The number of variables representing the metabolic fingerprint of each organism is shown below each pie diagram. The Phytochemical compounds category (Phytochemicalc) is shown only for B. distachyon. The oxy-aromatic compounds category is shown for S. cerevisiae and D. melanogaster.

For TOC only
