
Article

Influence of mass resolving power in orbital ion-trap mass spectrometry-based metabolomics
Lukáš Najdekr, David Friedecký, Ralf Tautenhahn, Tomáš Pluskal, Junhua Wang, Yingying Huang, and Tomáš Adam
Anal. Chem., Just Accepted Manuscript • Publication Date (Web): 03 Nov 2016
Downloaded from http://pubs.acs.org on November 3, 2016


Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Influence of mass resolving power in orbital ion-trap mass spectrometry-based metabolomics

Lukáš Najdekr1,2, David Friedecký1,2, Ralf Tautenhahn3, Tomáš Pluskal4, Junhua Wang3, Yingying Huang3, Tomáš Adam1,2

1 Laboratory of Metabolomics, Institute of Molecular and Translational Medicine, Palacký University in Olomouc, Hněvotínská 5, 775 15 Olomouc, Czech Republic

2 University Hospital Olomouc, I.P. Pavlova 185/6, 779 00 Olomouc, Czech Republic

3 Thermo Fisher Scientific, 355 River Oaks Parkway, San Jose, 95134 CA, USA

4 Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, MA 02142-1479, USA

Abstract

Modern separation methods in conjunction with high resolution accurate mass (HRAM) spectrometry can provide an enormous number of features characterized by exact mass and chromatographic behavior. Higher mass resolving power usually requires longer scanning times, and thus fewer data points are acquired across the target peak. This could cause difficulties for quantification, feature detection and deconvolution. The aim of this work was to describe the influence of mass spectrometry resolving power on profiling metabolomics experiments.

From metabolic databases (HMDB, LipidMaps, KEGG), a list of compounds (41 474) was compiled and potential adducts and isotopes were calculated (622 110 features). The number of distinguishable masses was calculated for up to 3840k resolution. To evaluate these models, human plasma samples were analyzed by LC-HRMS on an Orbitrap Elite hybrid mass spectrometer (Thermo Fisher Scientific, CA, USA) at resolving power settings from 15k (7.8 Hz) up to a maximum of 480k (1.2 Hz). XCMS 1.44, MZmine 2.13.1 and Compound Discoverer 2.0.0.303 were used for evaluation.

In plasma samples, the number of detected features increased sharply up to 60k in both positive and negative mode. However, beyond these values, it either flattened out or decreased owing to technical limitations.

In conclusion, the most effective mass resolving powers for profiling analyses of metabolite-rich biofluids on the Orbitrap Elite were around 60 000–120 000 FWHM in order to retrieve the highest amount of information. The region between 400 and 800 m/z was influenced the most by resolution.

Graphical Abstract


Introduction

Analysis of complex samples by modern separation methods in conjunction with high resolution accurate mass (HRAM) spectrometry can yield an enormous number of features characterized by exact mass and chromatographic behavior. High resolution mass spectrometry analyzers are usually based on FT-ICR, double-focusing magnetic sectors, reflectron time-of-flight mass analyzers or ion traps. The last two techniques are predominantly used in analyses of biological samples. A resolution of several tens of thousands FWHM (full width at half maximum) with high-speed data acquisition up to 100 Hz can be achieved with current time-of-flight (TOF) instruments, for which scan rate is independent of resolution. In contrast, mass spectrometers based on an orbital ion trap using fast Fourier transformation (FFT) allow a resolution of up to 500 000 FWHM (at 200 m/z) at the expense of lower acquisition rates. Hence, their higher mass resolving power usually requires longer scanning times, and consequently fewer data points are acquired across the studied peak. This could cause problems for feature detection, the deconvolution of peaks and quantification. Mass spectrometry measurement with a precision of four decimal places is crucial for molecular formula prediction. With increasing resolution, the number of compounds whose m/z values appear identical owing to isobaric matrix interferences decreases. In many analyses of highly complex samples (e.g., metabolomics, proteomics), the balance between speed of mass spectral acquisition and mass resolution is an issue.

Chromatographic separation of complex biological matrices is still a considerable challenge. The human serum metabolome is chemically highly variable and consists of many classes of metabolites, including lipids (e.g., glycerolipids, phospholipids), amino acids, hydroxycarboxylic acids, purines, etc. Analysis of such complex matrices is usually very difficult and requires several different separation techniques (liquid chromatography, gas chromatography, capillary electrophoresis) (1-2). Furthermore, concentration levels of metabolites can vary over six orders of magnitude. It has been reported in many studies that the right choice of separation methods may significantly improve the number of detected features (3-5). Despite the high efficiency and selectivity of available separation methods, HRAM spectrometry data is crucial.

The aim of this work was to describe this relationship by both theoretical and experimental methods. In the first part, we compiled a list of 41 474 metabolites available in public databases (HMDB, LipidMaps, KEGG) and calculated 622 110 potential adducts and isotopes. Values of the partition coefficient (LogP) for each metabolite were retrieved from the databases if available. The resulting lists were used for subsequent in silico calculations. In the second part of the study, human plasma was analyzed at different mass spectral resolutions and the experimental data were compared with the theoretically predicted behavior of a high resolution mass spectrometer.

Materials and Methods

Chemicals

The solvents acetonitrile, methanol and water (all LC-MS quality) and acetone (HPLC quality) as well as formic acid were purchased from Sigma-Aldrich (St. Louis, USA).

Samples

Plasma samples from healthy volunteers were collected at the University Hospital Olomouc (Czech Republic). The samples were pooled and then stored at -80°C until analysis. Written informed consent according to the Declaration of Helsinki by the World Medical Association (WMA) was obtained from the volunteers for all samples used in the analyses.

In silico calculations

To obtain a comprehensive list of compounds known to constitute the human metabolome, a list of positively ionizable metabolites was compiled from the HMDB, LipidMaps and KEGG databases (41 474 metabolites in total after removing duplicates). All calculations were performed using R software (6) in conjunction with the package Rdisop (7-10). For each metabolite, the isotopic pattern based on the chemical formula was generated. From the database-generated list, adducts ([M+H]+, [M+NH4]+, [M+Na]+, [M+K]+, [M+ACN+H]+) were calculated for the M, M+1 and M+2 isotopes (622 110 features). Mass distribution graphs for 15 000, 30 000, 60 000, 120 000, 240 000, 480 000, 960 000, 1 920 000 and 3 840 000 FWHM at 400 m/z were then plotted. The resolution in orbital ion trap based spectrometers is not constant across the mass range; thus, a correction was made for each m/z (Figure S-6). By removing isobars from the metabolite list (41 474) based on m/z, a list of unique m/z was generated (15 722). For each unique m/z in the list, the theoretical mass spectrometry peak width [m/z - x; m/z + x] was calculated, where x = (m/z)/(R*(400/(m/z))^(1/2)) and R is the resolving power at 400 m/z. Consequently, the entire final list of 622 110 features was searched against this interval to determine, for each unique m/z (15 722), the number of features not detectable due to isobaric matrix interferences within the calculated range.
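The window calculation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' R/Rdisop code: the function names are hypothetical, the adduct mass shifts are commonly tabulated monoisotopic values rather than numbers taken from the paper, and the overlap count is assumed to include the feature at the window centre itself.

```python
import bisect
import math

# Monoisotopic mass shifts (Da) for the positive-mode adducts named in the text.
# Commonly tabulated values; they are an assumption, not data from the paper.
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M+ACN+H]+": 42.033823,
}

# 41 474 metabolites x 5 adducts x 3 isotopes (M, M+1, M+2) = 622 110 features.
N_FEATURES = 41_474 * len(ADDUCT_SHIFT) * 3


def adduct_mz(neutral_mass, adduct):
    """m/z of a singly charged adduct of a neutral monoisotopic mass."""
    return neutral_mass + ADDUCT_SHIFT[adduct]


def half_window(mz, resolving_power_at_400):
    """Return x for the interval [m/z - x; m/z + x].

    The resolving power set at 400 m/z is scaled by sqrt(400 / mz),
    following the formula given in the text.
    """
    effective_r = resolving_power_at_400 * math.sqrt(400.0 / mz)
    return mz / effective_r


def count_overlaps(unique_mz, features, resolving_power_at_400):
    """For each unique m/z, count the features that fall inside its window."""
    features = sorted(features)
    counts = {}
    for mz in unique_mz:
        x = half_window(mz, resolving_power_at_400)
        lo = bisect.bisect_left(features, mz - x)
        hi = bisect.bisect_right(features, mz + x)
        counts[mz] = hi - lo
    return counts
```

Sorting the feature list once and using binary search keeps each lookup logarithmic, which matters when 15 722 windows are searched against 622 110 features.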

The influence of resolution on the number of detected peaks was calculated for m/z up to 2000. The list of generated in silico features (622 110) was filtered to give unique m/z values (227 060). The first value from the list of unique m/z was taken and the peak width based on the resolution and its m/z was calculated. All m/z values lying within the peak width were grouped and removed from the list. The final number of groups was considered to be the number of peaks detectable in the mass spectrum for the given resolution and mass range.
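The grouping procedure described above amounts to a greedy pass over the list of unique m/z values. The sketch below is one possible reading of it, under stated assumptions: the list is processed in ascending order, "within the peak width" is taken to mean between the seed value and the seed plus its width, and the function name is hypothetical.

```python
import math


def count_detectable_peaks(mz_values, resolving_power_at_400, mz_max=2000.0):
    """Greedy grouping of m/z values into resolvable peaks.

    Take the first remaining value, compute its peak width from the
    resolution and its m/z, absorb every value lying within that width,
    and repeat. The number of groups is returned.
    """
    # Unique values only, ascending order (assumed), limited to m/z <= mz_max.
    remaining = sorted({mz for mz in mz_values if mz <= mz_max})
    groups = 0
    i = 0
    while i < len(remaining):
        seed = remaining[i]
        width = seed / (resolving_power_at_400 * math.sqrt(400.0 / seed))
        i += 1
        # Skip every value that cannot be distinguished from the seed.
        while i < len(remaining) and remaining[i] <= seed + width:
            i += 1
        groups += 1
    return groups
```

Calling the function for each resolving power from 15 000 to 3 840 000 FWHM on the same list would give one point per setting on a curve of detectable peaks versus resolution.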

Sample preparation and LC-MS method

Samples were prepared using a method modified from Yuan et al. (11). A pooled human plasma sample (500 µL) was deproteinated with a mixture of acetonitrile, acetone and methanol (v/v 1:1:1, 1500 µL, -80°C), vortex mixed and incubated overnight at -80°C. Samples were centrifuged (24 400 x g, 15 min, 4°C), freeze-dried and re-suspended in 1 mL of 10% methanol and 90% water. Before analysis, samples were centrifuged again in order to remove debris and other solid particles.

The LC method followed that of Wang, J. et al. (12) using a Dionex UltiMate 3000 Rapid Separation LC system (Thermo Fisher Scientific, MA, USA). Samples were analyzed on an Acquity UPLC BEH C18, 2.1 x 100 mm, 1.7 µm column (Waters, MA, USA). The mobile phase consisted of water with 0.1% formic acid (mobile phase A) and methanol with 0.1% formic acid (mobile phase B). A flow rate of 0.35 mL/min was used with the following elution gradient: t=0.0, 0.5% B; t=4.0, 70% B; t=4.5, 98% B; t=10.4, 98% B; t=10.6, 0.5% B; t=15.0 min, 0.5% B. The column temperature was set at 40°C and the injection volume was 2 µL. Peaks in the retention window from 1 to 15 minutes were chosen for data processing. The same LC method was used for both ionization modes (13).

An Orbitrap Elite hybrid mass spectrometer (Thermo Fisher Scientific, MA, USA) was operated in either positive or negative mode at 15 000 (transient = 24 ms; 7.8 Hz), 30 000 (transient = 48 ms; 7.7 Hz), 60 000 (transient = 96 ms; 6.9 Hz), 120 000 (transient = 192 ms; 4 Hz), 240 000 (transient = 384 ms; 2.3 Hz) and 480 000 (transient = 768 ms; 1.2 Hz) FWHM at 400 m/z over the ranges 70–500 m/z and 300–2000 m/z (acquisition at 480 000 FWHM was possible owing to the use of a Tune Plus Developer’s Kit, kindly provided by Thermo Fisher Scientific, MA, USA). Two mass range regions were chosen in order to increase sensitivity and ensure one scan per spectrum (according to the Mathieu equation). To eliminate variances due to data acquisition, analyses of plasma samples were performed in sextuplicate for each mass spectrometry resolution. Settings of the electrospray ionization were as follows: heater temperature 250°C; sheath gas 35 arbitrary units; auxiliary gas 15 arbitrary units; capillary temperature 300°C and source voltage +3.0 kV. Thermo Tune Plus 2.7.0.1103 SP1 was used as instrument control software and data were acquired in centroid mode using Thermo Xcalibur 2.2 SP1.48 software (Thermo Fisher Scientific, MA, USA).

LC-MS data processing

The acquired dataset from the plasma samples was processed using the three most frequently used software packages, which are based on different feature detection algorithms, i.e., XCMS 1.44 (in the R software environment), Compound Discoverer 2.0.0.303 and MZmine 2.13.1. The centWave algorithm in XCMS was used to detect regions of interest (ROI) within the particular m/z value. The Continuous Wavelet Transform (CWT) was applied to the intensity values of the ROI and local maxima in the CWT coefficients for each scale were determined (14). Peak detection algorithms are mainly influenced by the parameters ppm mass error (ppm) and signal-to-noise ratio (snthresh). Various values of these parameters were tested (ppm = 2, 4, 6, 8, 10, 12, 14, 16, 18, 20; snthresh = 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30) and after detailed study of the results, “ppm = 8” and “snthresh = 20” were chosen as the best settings in order to obtain fewer false positive features (noisy features). These findings correspond to the recently published work by Glauser et al. (15). For details of the settings for each peak detection algorithm, peak grouping and retention time correction methods, see the Supporting Information. After processing, the number of peaks was counted for each data file individually. In the case of XCMS and MZmine, no deisotoping module or package was used.

Retention time correction in each software was performed for individual sextuplicates. The processed lists of features for the ranges 70–500 m/z and 300–2000 m/z for each resolving power were merged at m/z 400 in order to obtain the number of features in the spectra per resolving power. The coefficient of variation (CV) was calculated based on detected areas across six replicate injections. Peaks with CV > 30% were considered as noise and removed from further calculations.
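The merging and CV filtering steps can be illustrated with a short Python sketch. It rests on several assumptions that are not stated in the text: features are represented as dictionaries with hypothetical keys "mz" and "areas", features below m/z 400 are taken from the 70–500 m/z acquisition and the rest from the 300–2000 m/z acquisition, and the CV uses the sample standard deviation.

```python
from statistics import mean, stdev


def merge_ranges(low_range, high_range, split_mz=400.0):
    """Merge feature lists from the two acquisition ranges at split_mz.

    Features below split_mz are taken from the low range and all others
    from the high range, so the 300-500 m/z overlap is not counted twice.
    """
    return ([f for f in low_range if f["mz"] < split_mz]
            + [f for f in high_range if f["mz"] >= split_mz])


def cv_percent(areas):
    """Coefficient of variation (%) of replicate peak areas."""
    return 100.0 * stdev(areas) / mean(areas)


def drop_noisy(features, max_cv=30.0):
    """Keep features whose area CV across replicates does not exceed max_cv."""
    return [f for f in features if cv_percent(f["areas"]) <= max_cv]
```

With six replicate areas per feature, `drop_noisy(merge_ranges(low, high))` would return the features counted for one resolving power setting under these assumptions.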

Results and Discussion

In this work, we employed both theoretical and experimental methods to investigate the relation between mass spectrometry resolving power, scan speed and capability of feature detection in a metabolomics study.

The effect of separation could not be included in the calculations owing to the unpredictable behavior of compounds during separation (e.g., lipids with very similar exact mass but different formula and chromatographic behavior) and the extensive variability of chromatographic methods. Thus, the presented calculations are only valid for flow injection analysis metabolomics experiments and “worst case scenarios” in separation methods.

In silico calculations

In silico calculations were performed to examine distributions of overlaps of m/z representing 622 110 metabolites, isotopes and adducts over the range 50–2000 m/z. The first step was filtering the combined metabolite list to identify unique m/z values. These unique m/z values are plotted on the X axis in Figure 1, whereas the Y axis shows the number of m/z values that lie within the interval [m/z - x; m/z + x], as described in the Materials and Methods. Hence, the coordinates of each dot shown in Figure 1 represent a unique m/z value (X axis) and the number of features that are apparently identical at a given resolution and not recognizable within the curve of a mass spectrometry peak with a Gaussian profile (Y axis). The variance (sigma squared) of the mass spectral peak decreases with increasing resolving power. Thus, the number of indistinguishable features decreased with increasing resolving power. Two major regions with the highest number of m/z overlaps can be seen in Figure 1. The first is the region between 400 m/z and 600 m/z, which corresponds to short peptides (di-, tri-, tetra-), secosteroids and partially to lipids with lower m/z (e.g., glycerophosphocholines, glycerophosphoethanolamines, long chain fatty acids). The second region is between 750 m/z and 1050 m/z, which corresponds mainly to lipids. The three colored lines shown in Figure 1 indicate three different quantiles (0.99; 0.75; 0.50) of the dot density distribution.

Figure 2 depicts the maximum number and median of the calculated overlapping m/z values at a particular resolving power. The number of m/z masked by isobaric matrix interferences decreased according to a power function with a limit at one. Above a resolving power of 240 000 FWHM, the maximum number of indistinguishable features did not decrease. Evaluation of the structure of the data revealed that this was caused by isobaric compounds with high structural diversity. For example, m/z 244.1549 corresponds to a M+K ion of mass 205.1951 (C15H24), which applies to a group of sesquiterpenes and prenols with 130 possible overlaps. The other most abundant feature overlaps (m/z 205.1956, 298.2746, 322.2746, 450.3219) can mostly be attributed to various lipid classes and adducts corresponding to those lipids (see Figure 1). The overall abundance of lipids in the compiled metabolite list is 37.23% (Fatty acyls: 6.28%; Glycerolipids: 9.35%; Glycerophospholipids: 10.65%; Polyketides: 3.46%; Prenol lipids: 1.85%; Saccharolipids: 0.03%; Sphingolipids: 1.34%; Sterol lipids: 4.27%). The median values (dashed line) show that even with very high resolving power, it is not possible to separate all the features fully. At a resolving power of 3 840 000 FWHM, a maximum of 35.2% of features were represented by a specific m/z with no overlaps, whereas for a typical resolving power of 60 000 FWHM, only 3.63% of features could be separated.

A comparison of individual increments of features generated in silico was made (Figure 3). In the range 0 to 600 m/z (Figure 3a), there was a huge increase from 1.47 (0–100 m/z) up to 23.11 (501–600 m/z). In the range 600–1400 m/z, the opposite trend was observed, from 14.43 to 2.80 (Figure 3b). The curves in Figure 3c show similar trends in the range 1400–2000 m/z (ratios 2.85 to 3.67) but a different dependence than observed at lower m/z because the data plateaued at high resolution above 960 000 FWHM. Thus, in these theoretical calculations, a resolution of millions FWHM still had an effect on the calculated number of detectable unique masses.

LC-MS data

We analyzed plasma samples at different resolutions up to 480 000 FWHM to investigate the influence of resolution on the number of detected features. The analysis lasted 15 minutes with gradient elution and a peak capacity P=167 (N = 90 000 – 576 000 N/m). Total ion chromatograms and extracted ion chromatograms of selected isomeric compounds are provided in the Supporting Information (Figure S-1, Figure S-3). Three different software packages were used for processing the LC-MS data (data shown in Figure 4). XCMS, MZmine and Compound Discoverer yielded similar trends, i.e., a sharp increase in the number of detected features with a maximum at 60 000 FWHM in both positive and negative mode (120 000 FWHM for Compound Discoverer in positive mode). When all peaks are considered, regardless of the CV, the trends peak at 120 000 FWHM in positive and 60 000 FWHM in negative mode. This finding suggests that many noise peaks are detected during peak picking at a resolution of 120 000 FWHM in positive mode (see Supporting Information Figures S-7, S-8 and S-9). Each software is capable of producing different types of lists of features. XCMS detects all features without any further filtering, referred to as “raw features” (for isotope and/or adduct grouping, other software modules should be used). MZmine is capable of identifying “raw features” or, if the deisotoping module is applied, “isotopic features”. Thus, for ready comparison, XCMS raw feature and MZmine raw feature lists were used to generate the plots shown in Figure 4. The “Unknown Detector” module in Compound Discoverer is capable of detecting features only with the minimum number of isotopes set to one or more, generating an “isotopic feature” list. Numbers obtained by Compound Discoverer (Figure 4C) represent the sum of compounds present in the mass spectrum as several different ion species grouped as one (grouped isotopes and adducts). For the abovementioned reasons, the absolute numbers in Figure 4 are not strictly comparable and only the trends should be considered. All the software packages reported an approximately five times higher number of features in positive mode than in negative mode. This observation may originate from the fact that plasma metabolites are predominantly ionized in positive mode. The physical-chemical properties of the compounds and mobile phase composition may also contribute to this observed phenomenon (16). Due to the lower number of features observed in negative mode, the necessity for higher resolution is less crucial.

In plasma samples in positive mode at 60 000 FWHM, 6778 features (MZmine, Figure 4B) were detected (if all peaks are considered, regardless of the CV, 10 168 features were detected at 120 000 FWHM (MZmine, Supporting Information Figure S-8a)). The error bars at higher resolutions are likely the result of more individual ion signals, which present a challenge for the peak detection algorithms. In contrast, the number of features revealed by the in silico calculations at 60 000 FWHM was 49 529 (Figure S-4). Although both analyses took account of metabolites, isotopes and the most common adducts, the number of features for the plasma samples should in theory be even higher because it includes fragments, noise features and other features possibly generated by the electrospray ionization. The discrepancy in the number of features may arise for different reasons. A large number of compounds listed in the databases are present in biological samples at concentrations below the limit of detection of current profiling methods and non-targeted metabolite extraction (e.g., hormones, neurotransmitters); the databases also contain exogenous compounds (drugs, food metabolites, xenobiotics, etc.). Other fragments may be chemically and/or biologically unstable, and thus lost. Poor ionizability of certain compound classes may also decrease the number of detected features. Another limitation is that some compounds are not retained or are trapped on the column, and are thus undetectable. Further, isobars may show unpredictable behavior under the given separation modes (e.g., reverse phase, aqueous normal phase, HILIC).

Figure 5 presents histograms of m/z values (by XCMS) from plasma samples showing the distribution of individual data points in Figure 4. The overall trend in the curves mostly follows that observed in the in silico calculations (Figure 1). The region 300–800 m/z showed a strong dependence on resolution in positive mode (Figure 5a). In contrast, the region 800–1400 m/z showed almost the same number of features for 60 000 and 120 000 FWHM (Figure 5a), suggesting less need for high resolution in this region. The resolution of orbital ion trap detectors is not constant across the mass range (Figure S-6). This effect results in lower resolving power in the region with higher m/z values and thus fewer detected features. In the negative ionization mode (Figure 5b), all curves at resolutions from 15 000 to 120 000 FWHM showed similar profiles. The number of detected features with m/z above 400 at resolutions of 240 000 and 480 000 FWHM was significantly decreased due to insufficient scan frequency (data points). This issue may be overcome by using an Orbitrap mass spectrometer capable of higher scanning speed.

In mass spectrometers based on an orbital ion trap, a high resolving power is achieved by using longer acquisition of ions in the trap (17), thus lowering the frequency of data points (Figure S-2). It is generally accepted that there should be a minimum of four points per peak for automatic feature detection algorithms (14). In order to minimize the influence of this parameter, we conducted an experiment where the minimum number of data points per peak was set to 3 (centWave). Regardless of the resolving power, a higher number of features was detected (Figure S-5). However, detailed inspection of the data revealed that most of the reported peaks were false positive hits. This may suggest that 60 000–120 000 FWHM is a good compromise in terms of resolution and scan speed for metabolomics on the mass spectrometer used in this study.
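The trade-off between resolving power and data points per peak can be made concrete with a rough estimate. The sketch below uses the scan rates reported in the Materials and Methods; the chromatographic peak width is a hypothetical input chosen by the reader, not a value reported in this study, and the function names are illustrative.

```python
# Scan rates (Hz) reported for each resolving power setting (FWHM at 400 m/z).
SCAN_RATE_HZ = {
    15_000: 7.8,
    30_000: 7.7,
    60_000: 6.9,
    120_000: 4.0,
    240_000: 2.3,
    480_000: 1.2,
}


def points_per_peak(peak_width_s, scan_rate_hz):
    """Approximate number of spectra acquired across one chromatographic peak."""
    return peak_width_s * scan_rate_hz


def settings_meeting_minimum(peak_width_s, min_points=4):
    """Resolving powers that give at least min_points spectra per peak."""
    return [r for r, hz in SCAN_RATE_HZ.items()
            if points_per_peak(peak_width_s, hz) >= min_points]
```

For an assumed peak width of 1.5 s, for example, only the settings up to 120 000 FWHM would reach four points per peak, which is in line with the drop in detected features reported at 240 000 and 480 000 FWHM.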

Although very high resolution may not be suitable for a general untargeted metabolomics experiment, it can be very useful for defining the isotopic distribution and determining the elemental composition.

Our compiled list covered metabolites present in a given biological system, not taking into account differences in tissue/biofluid distribution. It also contains exogenous compounds (drugs, xenobiotics, food and plant metabolites), which may be present to varying degrees in biological samples depending on their nature. The in silico calculations in this study were focused on human plasma, and it would be interesting to see their application in plant metabolomics, where many metabolites are preferably ionized in negative mode. A different scenario may also appear in lipidomics or glycomics, which are heavily influenced by a high number of structural isomers.

Conclusion

The aim of this work was to address theoretically and experimentally the relation between mass spectrometry resolution and capability of feature detection in a metabolomics experiment. In silico calculations showed that with increasing resolution, more features can be detected (limited by the maximum number of features possible for the particular biological matrix). LC-MS data showed that the best resolution was 60 000–120 000 FWHM in positive and 60 000 FWHM in negative ionization mode for ESI. Thus, our findings suggest that in current metabolomic studies, a resolution above 60 000 FWHM is necessary to retrieve the highest amount of information.

Funding:

The infrastructural part of this project (Institute of Molecular and Translational Medicine) was supported by NPU I (LO1304) and Czech Science Foundation Grant 15-34613L. Tomáš Pluskal is a Simons Foundation Fellow of the Helen Hay Whitney Foundation.


Figure 1: In silico calculation of mass distribution at a resolution of 15 000 (A), 120 000 (B) and 960 000 (C) FWHM. The X axis shows the number of unique m/z values filtered from the compiled list of metabolites, whereas the Y axis shows the number of m/z values that fit into the interval [m/z - x; m/z + x], where x is based on the resolution. The lines denote different quantiles (from the top: 0.99, 0.75 and 0.50). The colors of the individual dots indicate polarity: green = polar, red = non-polar (based on their logP value, octanol/water).
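The interval count described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the caption only says that x is "based on the resolution", so the choice x = m/z / R (one FWHM peak width) is an assumption, and the function name `count_within_window` and the m/z list are hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_within_window(mz_values, resolving_power):
    """For each m/z, count the list entries (itself included) that fall
    inside [m/z - x, m/z + x], with x = m/z / resolving_power (assumed)."""
    sorted_mz = sorted(mz_values)
    counts = []
    for mz in sorted_mz:
        x = mz / resolving_power  # assumed window half-width: one FWHM
        lo = bisect_left(sorted_mz, mz - x)   # first index >= m/z - x
        hi = bisect_right(sorted_mz, mz + x)  # first index > m/z + x
        counts.append(hi - lo)
    return sorted_mz, counts


# Hypothetical m/z values, chosen only to show the effect of resolution.
example_mz = [180.0634, 180.0650, 180.0700, 255.2330, 255.2332]

for rp in (15_000, 120_000, 960_000):
    _, counts = count_within_window(example_mz, rp)
    print(rp, counts)
```

A count of 1 means the m/z value stands alone at that resolving power; larger counts mark values that would share a window with their neighbours.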

Figure 2: Regression of overlapping features based on resolution (in silico calculation). The full line represents the maximum value of indistinguishable features according to the resolving power. The dashed line shows the median value of indistinguishable features in the list of m/z for each resolving power. The percentage of m/z values represented in mass spectra by a single value is shown by the red line.


Figure 3: Relative increase of detected features from in silico calculations divided into 100 m/z bins. The Y axis represents the ratio of detected features at the specific resolution standardized by the value for 15 000 FWHM. The X axis represents the resolution from 15 000 to 3 840 000 FWHM.
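The normalization described in the Figure 3 caption can be sketched as below. This is an illustration under stated assumptions rather than the published procedure: bins are taken to be 100 m/z wide, a "detected feature" is taken to be a group of neighbouring m/z values separated by more than m/z / R, and all function names and m/z values are hypothetical.

```python
from collections import Counter


def resolvable_features_per_bin(mz_values, resolving_power, bin_width=100.0):
    """Count resolvable features per m/z bin. A value starts a new feature
    when it lies more than m/z / R above its lower neighbour (assumed rule)."""
    per_bin = Counter()
    previous = None
    for mz in sorted(mz_values):
        if previous is None or mz - previous > mz / resolving_power:
            per_bin[int(mz // bin_width)] += 1
        previous = mz
    return per_bin


def relative_increase(mz_values, resolving_powers, reference=15_000,
                      bin_width=100.0):
    """Ratio of resolvable features per bin relative to the reference
    resolving power. Bins empty at the reference setting are skipped."""
    base = resolvable_features_per_bin(mz_values, reference, bin_width)
    result = {}
    for rp in resolving_powers:
        counts = resolvable_features_per_bin(mz_values, rp, bin_width)
        result[rp] = {b: counts[b] / base[b] for b in base}
    return result


# Hypothetical m/z values; bin index 1 covers 100-200, index 2 covers 200-300.
example_mz = [180.0634, 180.0650, 180.0700, 255.2330, 255.2332, 255.2400]
ratios = relative_increase(example_mz, [15_000, 120_000, 3_840_000])
for rp, by_bin in ratios.items():
    print(rp, by_bin)
```

By construction every bin has a ratio of 1.0 at the reference setting, so the values at higher settings read directly as the relative gain in resolvable features.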

Figure 4: Number of detected features in plasma samples. Each panel shows results from different software in both positive (yellow line) and negative mode (blue line): A) XCMS (raw features), B) MZmine (raw features), C) Compound list (features grouped as compounds).


Figure 5: Histograms of m/z values in plasma samples in positive (A) and negative (B) mode (XCMS). Each point on the lines represents the frequency of m/z values within a 50 Da window. Numbers in the legend show resolution and scan speed, respectively.

Keywords: untargeted metabolomics, Orbitrap, high resolution, peak-picking, resolving power, mass spectrometry

Abbreviations:

CV: Coefficient of variation
HRAM: High resolution accurate mass
HMDB: Human Metabolome Database
KEGG: Kyoto Encyclopedia of Genes and Genomes
LC: Liquid chromatography
LC-HRMS: Liquid chromatography-high resolution mass spectrometry
LC-MS: Liquid chromatography-mass spectrometry
HPLC: High performance liquid chromatography
FWHM: Full width at half maximum
FFT: Fast Fourier transform
TOF: Time-of-flight
WMA: World Medical Association
CWT: Continuous wavelet transform
ROI: Regions of interest

Table of contents:

Sample preparation
Peak detection algorithm settings
Figure S-1: Total ion chromatograms
Figure S-2: Lower number of datapoints
Figure S-3: Separation of isomeric compounds on the column
Figure S-4: Number of detectable compounds based on the list of unique masses
Figure S-5: XCMS peak picking with 3 points/peak
Figure S-6: Dependency of m/z and resolution in Orbitrap based mass spectrometers
Figure S-7: All detected features by Compound Discoverer
Figure S-8: All detected features by MZmine
Figure S-9: All detected features by XCMS
