Using ROC Curves to Optimize Discovery-Based Software with Comprehensive Two-Dimensional Gas Chromatography with Time-of-Flight Mass Spectrometry

Brooke C. Reaser, Bob W. Wright, and Robert E. Synovec

Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b04991 • Publication Date (Web): 16 Feb 2017

Downloaded from http://pubs.acs.org on February 22, 2017

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Analytical Chemistry

Using ROC Curves to Optimize Discovery-Based Software with Comprehensive Two-Dimensional Gas Chromatography with Time-of-Flight Mass Spectrometry

Brooke C. Reaser,1 Bob W. Wright,2 Robert E. Synovec*,1

1 Department of Chemistry, Box 351700, University of Washington, Seattle, Washington, 98198, U.S.A.

2 Pacific Northwest National Laboratory, Battelle Boulevard, P.O. Box 999, Richland, Washington, 99352, U.S.A.

Revised version of manuscript AC-2016-049919, prepared for consideration to publish in Analytical Chemistry

February 10, 2017

* CORRESPONDING AUTHOR: Tel: +1-206-685-2328; fax: +1-206-685-8665 EMAIL: [email protected]

ACS Paragon Plus Environment



ABSTRACT

We report a quantitative approach to optimize implementation of discovery-based software for comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC × GC – TOFMS). The software performs a tile-based Fisher ratio (F-ratio) analysis, and facilitates a supervised non-targeted analysis based upon the experimental design to aid in the discovery of analytes with statistically different variances between sample classes. The quantitative approach for software optimization uses receiver operating characteristic (ROC) curves. The area under the curve (AUC) for each ROC curve serves as a quantitative metric to optimize two key algorithm parameters: the signal-to-noise ratio (S/N) threshold of the data prior to calculating F-ratios at each m/z mass channel, and the number of these F-ratios per m/z used to calculate the average F-ratio of a tile. A total of 25 combinations of S/N threshold by number of m/z were studied. Fifty analytes were spiked into a diesel fuel at two concentration levels to produce two sample classes that should in principle produce 50 positive instances in the ROC curves. The “sweet spot” for F-ratio analysis was determined to be a S/N threshold of 10 coupled with a maximum of the 10 most chemically selective m/z (requiring a minimum of 3 m/z), corresponding to a ~ 21% improvement in the discrimination of true positives relative to prior studies. This equates to an additional 9 true positives being discovered at a false positive probability of 0.2, and 5 additional true positives being found overall. Furthermore, optimization of these software parameters did not depend upon a priori determination of the statistically correct number of positive instances in the sample classes. The AUC metric appears to be suitable for the evaluation of all data analysis methods that utilize the proper experimental design.


INTRODUCTION

Comprehensive two-dimensional (2D) gas chromatography coupled with time-of-flight mass spectrometry (GC × GC – TOFMS) is a powerful technique for the analysis of compounds in complex mixtures such as those found in food quality and control,1–3 waste water,4,5 metabolomics,6–9 and petroleum products.10–13 The inherent complexity of GC × GC – TOFMS data often requires advanced chemometrics for an informative analysis. Analytical approaches and algorithms applied to GC × GC – TOFMS data sets generally fall into two categories: targeted and non-targeted, with the latter further divided into supervised and unsupervised. Targeted analyses focus on preselected, specific analytes of interest. In contrast, discovery-based, non-targeted approaches aim to discover any or all analytes that may be of interest without requiring a priori knowledge of their character or identity. Additionally, supervised non-targeted approaches include classification of samples with more emphasis on experimental design.14,15

We previously reported the development of discovery-based software for the analysis of GC × GC – TOFMS data. Tile-based Fisher ratio (F-ratio) analysis facilitates a supervised non-targeted method that uses information based upon the experimental design to aid in the discovery of analytes whose variances are statistically different between sample classes.15–18 Since its initial development,15,16 the tile-based F-ratio software has been applied to the analysis of yeast metabolites18 and chemically-altered diesel fuel,17 successfully elucidating statistically significant class-distinguishing analytes in complex sample matrices. Building upon these previous works, the present study serves to validate and optimize the tile-based F-ratio analysis.
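To make the quantity behind the hit list concrete, the following minimal Python sketch computes a Fisher ratio for one signal measured in replicate in two sample classes. It assumes the standard one-way ANOVA form (between-class variance divided by pooled within-class variance). The function name `fisher_ratio` and the replicate values are hypothetical, and the tiling, per-m/z calculation and averaging steps of the published software (refs 15–18) are not reproduced here.

```python
from statistics import mean


def fisher_ratio(class_a, class_b):
    """One-way ANOVA F-ratio for two classes of replicate signals.

    Assumes each class has at least two replicates and that the
    within-class variance is non-zero (otherwise division by zero).
    """
    groups = [class_a, class_b]
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Between-class variance: spread of the class means about the grand mean.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    between /= len(groups) - 1
    # Within-class variance: pooled spread of replicates about their class mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    within /= n_total - len(groups)
    return between / within


# Hypothetical peak signals for three replicates per class.
f_value = fisher_ratio([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A large value indicates that the class means differ by much more than the replicate scatter, which is why high F-ratio features rise to the top of a hit list.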

Method validation is an important part of all analytical technology developments. Several agencies, including the International Union of Pure and Applied Chemistry (IUPAC),19 Eurachem,20 and the Association of Analytical Communities (AOAC)21 have published guidelines on the performance characteristics required to properly validate a new analytical method. While each agency has its own specific recommendations, they have a few key requirements in common: sensitivity, selectivity, accuracy, precision, working range, limit of detection (LOD) and limit of quantification (LOQ). While these guidelines are well established for new analytical methods, no such clearly published guidelines exist for the validation of new chemometric algorithms and software.

Introduction of new chemometric algorithms and software should require rigorous validation and optimization; however, new software algorithms are often simply demonstrated and put into use without rigorous validation. Validation may include applying the method on standard mixes of known composition15,16 or employing previously analyzed data sets18 where the “correct” answer has previously been determined. In the case of employing a standard mixture, the analyst will often assume that every analyte in the mixture is in fact a true positive when the sample matrix or analysis conditions may prevent that from actually being the case. Quantitative metrics, such as standard errors, correlation coefficients, or statistical tests like F-tests or Student’s t-tests, are commonly utilized for validation. The thorough study and evaluation of a software platform should also go beyond the validation stage. Key input parameters should be evaluated to ensure algorithm optimization. Optimization of software algorithms and their various parameters is rarely studied in detail, likely because it can be a tedious and subjective process if one lacks the proper quantitative metric. To address this issue, herein we report the optimization of the tile-based F-ratio software15–18 using the area under the curve (AUC) of a receiver operating characteristic (ROC) curve as a quantitative metric.22,23 While the AUC for ROC curves has been applied for other purposes, implementation of the AUC as a quantitative metric for software optimization is presented as a new approach.


ROC curves were developed during World War II in an attempt to quantify the ability of a radar operator to differentiate between the signals of ally and enemy aircraft.24 Since then, ROC curves have been used extensively in medical diagnostics,25–27 evaluating analytical method performance,22,28,29 and machine learning,30 but ROC curves have remained under-utilized in analytical and chemometric analyses.23 While machine learning and chemometric analyses are inherently interdisciplinary and often share similar algorithms, they have distinct differences. Machine learning is predominately a computer science methodology that involves computers learning without being explicitly programmed, while chemometrics is the application of linear algebra coupled with statistics to extract meaningful information from chemical systems.

A ROC curve is a plot of the true positive probability (TPP) versus the false positive probability (FPP), with each point on the curve corresponding to a particular threshold defined by the analyst, usually relating to a minimum signal or concentration. The terms true positive rate (TPR) and false positive rate (FPR) are also used throughout the literature, interchangeably with TPP and FPP. For the purposes of this paper, TPP and FPP are used. The TPP, also known as the sensitivity, is equal to the number of true positives divided by the total number of positive instances. The FPP, also known as 1-specificity, is equal to the number of false positives divided by the total number of negative instances. The AUC is a popular metric for the quantitative, statistically-based evaluation of ROC curves. It is equivalent to the probability of stochastic domination in non-parametric statistics; that is, the AUC is the probability that a randomly selected positive instance will be ranked above a randomly selected negative instance. Values for the AUC range from 0.5 for a “useless” test to 1.0 for a perfect test.
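The definitions of TPP, FPP and AUC given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hit list is already ranked from highest to lowest score, each hit is already labeled as a true or false positive, and the function names `roc_curve` and `auc` and the example labels are hypothetical rather than taken from the software used in this work.

```python
def roc_curve(hit_labels, n_positive, n_negative):
    """Return (FPP, TPP) points for a ranked, labeled hit list.

    hit_labels: booleans in rank order, True for a true positive and
    False for a false positive.
    n_positive, n_negative: total positive and negative instances.
    """
    points = [(0.0, 0.0)]
    tp = fp = 0
    for is_true_positive in hit_labels:
        if is_true_positive:
            tp += 1
        else:
            fp += 1
        # Moving the threshold down one hit adds one point to the curve.
        points.append((fp / n_negative, tp / n_positive))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPP, TPP) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical hit list: three positives and two negatives in rank order.
example_points = roc_curve([True, True, False, True, False], 3, 2)
example_auc = auc(example_points)
```

In this toy list, five of the six possible positive/negative pairs are ranked correctly, so the area equals 5/6, which matches the pairwise-ranking interpretation of the AUC.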

In the initial tile-based F-ratio software development, a signal-to-noise (S/N) threshold of 3 and all of the mass channels (m/z) were utilized.15 The S/N threshold is used in the data reduction step prior to the calculation of F-ratios, and the number of m/z is used to calculate the average F-ratio value for each tile. Upon gaining more experience with the software, we have recognized that a rigorous evaluation of these two parameters is warranted. Accordingly, herein we employ the AUC metric for the ROC curve to evaluate and optimize these two key parameters. Each ROC curve quantitatively indicates how the S/N threshold and number of m/z both impact the sensitivity of the software to discover chemical features of interest. The hypothesis is that a higher S/N threshold (but not too high) coupled with use of fewer than all of the m/z would likely improve the capability of finding and more highly ranking true positives, while reducing the number of false positives.

Fifty analytes (30 native and 20 non-native compounds) were spiked into a diesel fuel at two similar concentration levels, to serve as a pair of suitably complex samples that make the software optimization using the AUC approach challenging. Thus, the theoretical maximum number of positive instances is 50. The two spiked diesel samples defined two classes for F-ratio comparison. The S/N threshold used in the data reduction step prior to the calculation of F-ratios, and the number of m/z used to calculate the average F-ratio values, were varied, and F-ratio hit lists were evaluated. A total of 25 combinations of S/N threshold by number of m/z were studied. Varying the S/N threshold and number of m/z enables optimization of the tile-based F-ratio software, to find the “sweet spot” in terms of S/N threshold, coupled with use of only the most chemically selective m/z (i.e., those with the highest F-ratios). Additionally, it is demonstrated that the number of positive instances does not need to be known prior to analysis, as the AUC metric still provides adequate information regarding the sensitivity of the F-ratio analysis.

EXPERIMENTAL

Diesel fuel (~ 1 gallon) was collected from a fueling station in the Seattle area. A non-native internal standard, 1,2,3-trichlorobenzene (0.5170 g), was spiked into 1 L of diesel fuel to create a stock solution at 608 ppm trichlorobenzene. This stock solution was used for all serial dilutions. Two neat spike solutions were made of 30 native compounds and 20 non-native compounds by adding ~ 0.1 g of each component to a scintillation vial. A list of each compound and its exact mass is provided in Tables S1 (natives) and S2 (non-natives) in Supporting Information. The native and non-native compounds were then quantitatively transferred using stock solution to an amber bottle of known mass. Stock solution was then added to each of the two neat spike samples until the final mass reached ~ 100 g, creating ~ 1000 ppm native and ~ 1000 ppm non-native solutions. Approximately 80 g of the native spike solution and 8 g of the non-native spike solution were then transferred to a third amber bottle. Stock solution was added to bring the total mass of the contents to ~ 100 g, creating a spike solution containing ~ 800 ppm native compounds and ~ 80 ppm non-native compounds in diesel. This solution was diluted by half in stock solution successively until two spike solutions were obtained at the following nominal concentrations: 200/20 and 100/10 ppm native/non-native. The exact concentrations of the analytes in these spike solutions are provided in Tables S1 and S2 in Supporting Information. A neat standards mix with no additional solvent was created in order to find accurate retention times for each analyte by adding 250 µL of each compound into a scintillation vial.
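The nominal spike levels follow from simple mass-based dilution arithmetic, sketched below in Python. The helper name `dilute` is hypothetical, the masses are the approximate values quoted above, and the calculation tracks only the spiked amounts: it ignores the native content already present in the diesel stock solution, so it reproduces the nominal rather than the exact concentrations.

```python
def dilute(conc_ppm, aliquot_g, final_g):
    """Nominal concentration after bringing an aliquot to a final mass."""
    return conc_ppm * aliquot_g / final_g


# ~80 g of the ~1000 ppm native solution and ~8 g of the ~1000 ppm
# non-native solution, brought to ~100 g with stock solution.
native = dilute(1000.0, 80.0, 100.0)
non_native = dilute(1000.0, 8.0, 100.0)

# Successive dilutions by half in stock solution.
levels = []
for _ in range(3):
    native, non_native = native / 2, non_native / 2
    levels.append((native, non_native))

# The last two levels are the nominal 200/20 and 100/10 ppm spike solutions.
spike_levels = levels[-2:]
```

Three halvings of the ~ 800/80 ppm solution are needed to reach the lower of the two spike levels.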

GC × GC–TOFMS data were collected using an Agilent 6890N GC (Agilent Technologies, Palo Alto, CA, USA) with a LECO Pegasus III TOFMS equipped with a 4D thermal modulator upgrade (LECO, St. Joseph, MI, USA). A reverse column configuration was employed with a polar 29.75 m × 250 µm × 0.25 µm Rxi-17Sil MS film primary column, ¹D, and a nonpolar 1.5 m × 180 µm × 0.18 µm Rxi-1ms film (Restek, Bellefonte, PA, USA) secondary column, ²D. The GC inlet was set to 275 °C and the transfer line was set to 280 °C. Ultra-high purity helium (Grade 5, 99.999%, Praxair, Seattle, WA, USA) was used as the carrier gas at a constant flow of 2.0 mL/min. An Agilent 7683 autosampler was used to inject 1 µL of diesel samples or 0.1 µL of neat standards mix with a 300:1 split ratio. The primary column was maintained at 40 °C for 1 min, then ramped at 5 °C/min to 300 °C where it was held for 2 min. The secondary column and the modulator block followed the same temperature program with a +15 °C and +60 °C offset, respectively. The modulation period was 2 s with 0.5 s hot and cold pulses for each stage. The ion source was set to 225 °C, the electron impact energy was 70 eV, and the detector voltage was 1740 V. Mass channels, m/z 35-300, were collected at 100 spectra/s after a 10 s acquisition delay. Six injection replicates of each spike sample (200/20 and 100/10 ppm native/non-native with internal standard) and the stock solution (diesel with internal standard), and two injection replicates of the neat standards were analyzed using the GC × GC–TOFMS method above. These latter two injection replicates were used to determine the retention times of the spiked standards. A lack of variability in retention times between replicates indicated that two replicates were sufficient to obtain average retention times of the spikes.

The GC × GC–TOFMS data were processed using the instrument software (ChromaTOF, version 3.32, LECO, St. Joseph, MI, USA), including baseline correction, peak finding and peak area calculation. Peak areas were calculated at unique m/z for the 30 native compounds in the stock solution and in each spike sample. Standard addition method (SAM) plots were prepared (see Figure S1 in Supporting Information) to determine the native concentration of these analytes in diesel as reported in Table S1. The GC × GC–TOFMS data were also imported from the instrument software into Matlab R2015b (Mathworks Inc., Natick, MA, USA) using an in-house data converting algorithm. The data were analyzed with the in-house developed tile-based F-ratio software, an algorithm that elucidates chemical features that distinguish between two classes of samples, such as two different spike concentration levels. We direct the reader to previous publications and their respective supplementary information for a thorough description of the tile-based F-ratio software.15–18 For all F-ratio software executions, a ¹D tile size of 6 s, a ²D tile size of 100 ms, a ¹D cluster size of 4 s, and a ²D cluster size of 60 ms were employed. All chromatograms were normalized to the sum of the total ion current (TIC), and retention time alignment was not required. The S/N threshold and number of m/z utilized for the calculation of the average F-ratio were varied to find the optimum values for software performance. The S/N threshold values examined were 3, 6, 10, 30 and 100, while the number of m/z examined were 3, 6, 10, 20, and all m/z, for a total of 25 S/N by m/z combinations. At least 3 m/z were required for a measurable F-ratio, and no F-ratio threshold minimum was employed.

The S/N threshold is applied by calculating the standard deviation of the noise on each tile region and multiplying that standard deviation by the factor given by the desired threshold (i.e., by a factor of 3, 6, 10, 30 or 100 per the S/N thresholds). Any m/z with a signal in the same tile falling below that S/N threshold is then excluded from the analysis. The number of m/z used for the analysis simply refers to the maximum number of m/z whose F-ratios are used to calculate the average F-ratio of the tile, with the minimum being 3 m/z. For example, if a tile has a feature containing 52 m/z occurring above the S/N threshold and the number of m/z to be examined was set to 10 m/z, only the 10 m/z with the highest F-ratio values would be included in calculation of the average F-ratio. However, if a tile has a feature having only 6 m/z, all 6 m/z would be utilized for the calculation of the average F-ratio even though the software was set to examine 10 m/z because the tile meets the criterion of having at least 3 m/z above the S/N threshold.
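The selection rules described above (S/N cutoff, at most N m/z with the highest F-ratios, at least 3 m/z) can be sketched as follows. This is a minimal Python illustration, not the in-house Matlab implementation: the function name `tile_average_f_ratio`, the dictionary-based data layout and the example numbers are all assumptions made for the sketch.

```python
def tile_average_f_ratio(mz_signals, mz_f_ratios, noise_sd,
                         sn_threshold, max_mz=None, min_mz=3):
    """Average F-ratio of one tile after S/N filtering and top-N selection.

    mz_signals, mz_f_ratios: dicts keyed by m/z (hypothetical layout).
    max_mz=None means all m/z passing the S/N threshold are used.
    Returns None when fewer than min_mz channels pass the threshold.
    """
    # The cutoff is the noise standard deviation times the S/N factor.
    cutoff = sn_threshold * noise_sd
    kept = [mz_f_ratios[mz] for mz, signal in mz_signals.items()
            if signal >= cutoff]
    if len(kept) < min_mz:
        return None
    # Keep only the most class-selective channels (highest F-ratios).
    kept.sort(reverse=True)
    if max_mz is not None:
        kept = kept[:max_mz]
    return sum(kept) / len(kept)


# 52 m/z above threshold, 10 m/z requested: only the top 10 are averaged.
signals = {mz: 100.0 for mz in range(35, 87)}
f_ratios = {mz: float(mz - 34) for mz in range(35, 87)}
top10_average = tile_average_f_ratio(signals, f_ratios, noise_sd=1.0,
                                     sn_threshold=10, max_mz=10)
```

With only 6 m/z above the threshold, the same call averages all 6, and with fewer than 3 the sketch returns `None`, mirroring the minimum requirement stated in the text.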

RESULTS AND DISCUSSION

The reverse column GC × GC configuration utilized a substantial portion of the 2D separation space, and was well suited for the separation presented in Figure 1(A) of the compound classes of interest in diesel fuel: alkanes, olefins, cyclic alkanes, and aromatics. The standards utilized for the spike study also encompass the 2D region of the chromatogram as shown in Figure 1(B), where each standard peak is numbered according to its respective analyte number in Tables S1 (native compounds) and S2 (non-native compounds).

The F-ratio software was employed for the sample class comparison of the six injection replicates of each spiked diesel sample (200/20 versus 100/10 ppm native/non-native compounds) to generate a hit list of class distinguishing analytes. While the software was evaluated for 25 S/N by m/z combinations, we focus initially on the previously employed parameters, a S/N threshold of 3 and all m/z,15,17,18 followed by a detailed focus on the combination of parameters that turned out to be optimal, and finish with a summary for all 25 combinations. The distribution of the average F-ratio values in Figure 2(A) corresponds to the class comparison using a S/N threshold of 3 and all m/z. The F-ratio distribution has its maximum at an F-ratio of about 0.5, and the sharp peak of the distribution drops nearly to baseline at an F-ratio of about 4. The tail of the distribution (shown in the inset figure) begins to be sparse from about 10 to 638.9 (hit #1, cyclooctane), and contains the class distinguishing chemical features (at higher F-ratio) that are most likely to be true positives and thus most likely to be the spiked analytes. Using the known 2D retention times of the spiked standards, the hit list was evaluated. Hits that corresponded to spiked analytes were designated “true positives,” while the rest were designated “false positives.” The hit list was investigated until the 50th false positive was identified. Figure 2(B) shows the F-ratio value versus hit number in the F-ratio hit list. The blue circles correspond to true positives (spiked standards), while the red circles correspond to false positives. The F-ratios range from 638.9 (hit #1, cyclooctane) to 3.2 (hit #83, false positive).
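The hit-list evaluation described above can be illustrated with a short Python sketch that labels each ranked hit by comparing its 2D retention times with those of the spiked standards and stops at a chosen number of false positives. The function name `label_hits`, the matching windows `tol_1d` and `tol_2d`, and the example retention times are hypothetical; the text does not state how closely a hit had to match a standard, and a real evaluation would also need to handle several hits matching the same standard.

```python
def label_hits(hits, standards, tol_1d=6.0, tol_2d=0.1, max_false=50):
    """Label ranked hits as true (True) or false (False) positives.

    hits: list of (f_ratio, rt1_s, rt2_s), sorted by decreasing F-ratio.
    standards: list of (rt1_s, rt2_s) for the spiked analytes.
    tol_1d, tol_2d: matching windows in seconds (assumed values).
    Evaluation stops once max_false false positives have been found.
    """
    labels = []
    n_false = 0
    for _, rt1, rt2 in hits:
        is_true = any(abs(rt1 - s1) <= tol_1d and abs(rt2 - s2) <= tol_2d
                      for s1, s2 in standards)
        labels.append(is_true)
        if not is_true:
            n_false += 1
            if n_false == max_false:
                break
    return labels


# Hypothetical standards and hit list (retention times in seconds).
standards = [(100.0, 1.0), (200.0, 0.5)]
hits = [(638.9, 100.0, 1.0), (50.0, 500.0, 0.2),
        (20.0, 201.0, 0.55), (3.2, 900.0, 1.5)]
labels = label_hits(hits, standards, max_false=2)
```

The resulting list of booleans is exactly the kind of ranked, labeled input from which TPP, FPP and the AUC are computed.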

Next, the results for the combination of parameters that turned out to be optimal (S/N threshold of 10 and 10 m/z) are provided in Figure 3(A). This class comparison distribution has its maximum at an F-ratio of about 1 and drops to baseline at an F-ratio of about 6.5. The area under the F-ratio distribution in Figure 3(A) is about 20% less than that of the curve in Figure 2(A). This indicates that use of the stricter parameters reduced the number of total hits returned by the software (4,854 for S/N threshold of 10 and 10 m/z versus 5,780 for S/N threshold of 3 and all m/z). The shape of the curve is also shifted down and to the right, indicating a shift to higher F-ratio values as a consequence of using the top 10 m/z for the average F-ratio calculation, decreasing the ability of noisy m/z to lower the average. The tail of the distribution (in the figure inset) begins to be sparse from about 20 to 2014 (hit #1, cyclooctane), and contains the class distinguishing chemical features most likely to be true positives, and thus spiked analytes. Figure 3(B) shows the F-ratio value versus hit number in the F-ratio hit list for this class comparison. Compared to the previous parameters utilized for Figure 2(B) (S/N threshold of 3 and all m/z), the true positives (blue) are shifted up and to the left, while the false positives (red) fall down and to the right due to the better combination of parameters (S/N threshold of 10 and 10 m/z). The F-ratios in Figure 3(B) range from 2014 (hit #1, cyclooctane) to 5.0 (hit #87, false positive), a much greater range of average F-ratio values compared to that observed in Figure 2(B). In Figure 3, the exclusion of non-class distinguishing m/z increases the overall average F-ratio values, prioritizes true positives, and therefore improves the sensitivity of the discovery-based method.

While the results in Figures 2(B) and 3(B) were readily interpreted to gain insight into the F-ratio software performance, the interpretation is qualitative and not analytically rigorous. Therefore, using ROC curves, an AUC quantitative metric was investigated as a means to evaluate and optimize these two key parameters: S/N threshold and number of m/z required. The ROC curves for these two F-ratio analyses are presented in Figure 4(A), where the red curve corresponds to the parameters S/N threshold of 3 and all m/z, while the blue curve corresponds to the parameters S/N threshold of 10 and 10 m/z. An explanation of how the ROC curves are created can be found in Supporting Information. For the ROC curves shown in Figure 4(A), it was assumed that all 50 analytes spiked into the diesel fuel would be discovered as true positives such that the number of positive instances used for calculating the TPP was 50. For symmetry, 50 negative instances were evaluated for the FPP, effectively minimizing the length of the hit list evaluated while sufficiently investigating the ROC curves as they plateau.

As indicated by the shift upward and to the left of the true positives in Figure 3(B), the blue ROC curve in Figure 4(A) shows the increase in sensitivity, or TPP, for a given FPP, or 1-specificity, by using S/N 10 and 10 m/z. Increasing the S/N threshold from 3 to 10 decreased the influence of overly noisy m/z signals. Using 10 m/z rather than all m/z ensures that the 10 largest F-ratios for a given tile are used to determine the average F-ratio, thus decreasing the influence of smaller F-ratios at m/z with spurious covariance. As a quantitative metric, the AUC can aid in the comparison of ROC curves generated by changing the algorithm parameters. For example, the AUC is 0.61 for the ROC curve using a S/N threshold of 3 and all m/z, while the AUC is 0.74 for the ROC curve using a S/N threshold of 10 and 10 m/z. This means that the combination of a S/N threshold of 10 and 10 m/z is quantitatively superior to using a S/N threshold of 3 and all m/z, with a ~ 21% improvement in software performance.
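The ~ 21% figure appears consistent with the relative change in AUC between the two parameter sets, taking the earlier parameters as the baseline. This reading is an assumption, since the calculation is not spelled out in the text, and the subscripts below are introduced here only for illustration:

```latex
\frac{\mathrm{AUC}_{10,\,10} - \mathrm{AUC}_{3,\,\mathrm{all}}}{\mathrm{AUC}_{3,\,\mathrm{all}}}
  = \frac{0.74 - 0.61}{0.61}
  \approx 0.21
```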

One might presume that, upon software optimization, the AUC would approach the ideal value of 1.00. However, the experimental design provided very challenging datasets due to the number of native compounds naturally occurring at a high concentration in the diesel. Keeping in mind that native concentrations are not known a priori, some of these compounds, when spiked in at the 200 ppm and 100 ppm levels, were not apt to be discovered due to the lack of a statistically sufficient increase in the total concentration given by the spike. This is a major reason why the ROC curves level off at a sensitivity well below 1.00. For Figure 4(A), it was assumed that all 50 spiked analytes had a statistically sufficient concentration difference between the two spike levels. In a sense, the number of positive instances was not actually 50, but rather was likely much lower. Despite this issue, the ROC curve and the AUC metric were able to differentiate the increase in sensitivity using a S/N threshold of 10 and 10 m/z compared to using a S/N threshold of 3 and all m/z.
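One way to see why an overestimated number of positive instances does not change which parameter set wins is the small Python sketch below. It assumes that the labeled hit lists themselves stay the same and only the assumed total of positive instances changes (that is, the analytes dropped from the total never appeared as hits). The function name `auc_from_labels` and the two toy hit lists are hypothetical.

```python
def auc_from_labels(hit_labels, n_positive, n_negative):
    """Area under the ROC curve for a ranked, labeled hit list."""
    tp = 0
    area = 0.0
    for is_true_positive in hit_labels:
        if is_true_positive:
            tp += 1
        else:
            # Each false positive advances FPP by 1/n_negative
            # at the current TPP of tp/n_positive.
            area += (tp / n_positive) / n_negative
    return area


# Two toy hit lists, each evaluated until the 4th false positive.
hits_a = [True, False, True, False, False, False]
hits_b = [True, True, False, True, False, False, False]

results = {}
for n_positive in (5, 4):
    auc_a = auc_from_labels(hits_a, n_positive, n_negative=4)
    auc_b = auc_from_labels(hits_b, n_positive, n_negative=4)
    results[n_positive] = (auc_a, auc_b)
```

Under this assumption every TPP value is divided by the same constant, so both AUC values scale by the same factor (here 5/4) and their ordering is unchanged.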

In order to further investigate this issue, for the purpose of determining the number of spiked compounds that could (or should) serve as positive instances, the native spiked analyte concentrations were determined using the SAM (standard addition method) with the two spike levels, nominally 200 ppm and 100 ppm, and the stock solution, nominally 0 ppm. The concentrations of the native analytes calculated from the SAM plots, as demonstrated in Figure S1, are provided in Table S1 in Supporting Information. Worth noting are the three analytes with 0.0 ppm concentration in the diesel: 2,2,4-trimethylpentane, cyclohexene, and cyclooctane. Although these compounds had negligible concentrations in this investigation and so were not detected, they can be found regularly in other blends of diesel. For each native compound, the native concentration was added to the actual spike concentration (at both spike levels), and the concentration ratio was determined as shown in Table S1 in Supporting Information. We have previously shown that tile-based F-ratio analysis is capable of discovering analytes at a concentration ratio between sample classes down to ~ 1.06.16 Accordingly, the native compounds that have a concentration ratio less than 1.06 are shaded in grey in Table S1. In order to statistically determine which of the 30 spiked native compounds were not legitimate positive instances, a t-test was performed as described in Harris,31 with the results provided in the last column of Table S1. The t-test values were calculated using the peak areas of a selective m/z for each native analyte in all six injection replicates of the 200 ppm, 100 ppm and the stock solution (that is, nominally 0 ppm). The peak areas were normalized to the internal standard. At the 95% confidence level, the t-table value is 2.57 for a two-tailed test with 5 degrees of freedom, so all t-values less than that correspond to analytes that have no statistically significant difference in their concentration between the 200 ppm and 100 ppm spike levels, as designated by yellow highlighting in Table S1. Corresponding p-values for the t-tests were also calculated (Table S1 in Supporting Information with discussion). Nine analytes were determined not to be statistically significant positive instances. There is good agreement between the concentration ratio and the t-test metric for the discovery of true positives. Other than hexane, which for unknown reasons appears to be an outlier, the minimum concentration ratio corresponding to a true positive appears to be ~ 1.04, in good agreement with a previous study.16 Using the list of native compounds corrected using the t-test metric, ROC curves were re-made (Figure S2 in Supporting Information) with a TPP out of 41 total positive instances instead of the previously assumed 50 for Figure 4(A). While the shape of each curve does not change, the AUC value obviously increases, from 0.61 to 0.74 for a S/N threshold of 3 and all m/z, and from 0.74 to 0.90 for a S/N threshold of 10 and 10 m/z. It is important to note that the analyst can draw the same conclusions from Figure 4(A) or Figure S2, which has important ramifications for implementing the ROC

296

curve-based AUC metric. Essentially, optimization of the software parameters does not require a

297

priori determination of the statistically correct number of positive instances in the sample

298

classes.
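The t-test screening described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a paired comparison across the six injection replicates (one reading that is consistent with the quoted 5 degrees of freedom), and the function name and the normalized peak areas are hypothetical.

```python
import math
import statistics

# Two-tailed t-table value at 95% confidence, 5 degrees of freedom (quoted in the text).
T_CRITICAL = 2.57


def paired_t_value(areas_high, areas_low):
    """Return |t| for replicate peak areas, assuming pairing by injection number."""
    if len(areas_high) != len(areas_low):
        raise ValueError("replicate lists must have equal length")
    diffs = [high - low for high, low in zip(areas_high, areas_low)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return abs(mean_diff) / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical internal-standard-normalized peak areas, six replicates per level.
areas_200ppm = [1.52, 1.49, 1.55, 1.50, 1.53, 1.51]
areas_100ppm = [1.01, 1.03, 0.99, 1.02, 1.00, 1.04]

t_value = paired_t_value(areas_200ppm, areas_100ppm)
# An analyte counts as a legitimate positive instance only if t exceeds the table value.
is_positive_instance = t_value > T_CRITICAL
print(f"t = {t_value:.2f}, legitimate positive instance: {is_positive_instance}")
```

An analyte whose t-value falls below 2.57 would be treated as showing no statistically significant difference between the two spike levels, which is the criterion the text uses to exclude nine analytes.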

To more fully explore the effects of S/N threshold and number of m/z on F-ratio software performance, ROC curves were generated for the 25 combinations of S/N threshold and number of m/z defined in the Experimental Section. The area under the ROC curve (AUC) was calculated for each combination of S/N threshold and number of m/z, whereby the AUC provided a quantitative metric to assess the effect of these parameters on the discrimination of the F-ratio software. The AUC values for all 25 parameter combinations are shown as a heat map in Figure 4(B), where the AUCs were determined assuming all 50 spiked analytes correspond to positive instances. Most notably, in Figure 4(B) the optimum parameters remain a S/N threshold of 10 coupled with 10 m/z, producing an AUC of 0.74 assuming 50 positive instances, while the previously employed parameters (S/N 3 coupled with all m/z) produced an AUC of 0.61. The difference between these two AUC values (0.13) relative to the prior value (0.61) is 0.21, or a 21% improvement in the F-ratio software performance. Hence, optimization of the S/N threshold and number of m/z used provided a 21% increase in the discrimination of the true positives from the false positives. A heat map of AUCs was also calculated using only the 41 statistically significant positive instances (Figure S3 in Supporting Information), but the conclusions remain the same: no a priori knowledge of the statistically different analytes is necessary.
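The AUC comparison in this paragraph can be illustrated with a short sketch. It is not the authors' implementation (their ROC construction is described in Supporting Information); it assumes the hit list is ranked by decreasing average F-ratio, that FPP is the running count of false positives divided by the total number of false positives in the hit list, and that TPP is the running count of true positives divided by an assumed number of positive instances. The function names and the toy hit list are hypothetical.

```python
def roc_points(ranked_hits, total_positives):
    """Build (FPP, TPP) points from a ranked hit list.

    ranked_hits: list of booleans ordered by decreasing F-ratio,
    True for a true positive (spiked analyte), False for a false positive.
    """
    total_negatives = sum(1 for hit in ranked_hits if not hit)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for is_true_positive in ranked_hits:
        if is_true_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / total_negatives, tp / total_positives))
    return points


def auc_trapezoid(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy hit list: four spiked analytes found, interleaved with four false positives.
toy_hits = [True, True, False, True, False, False, True, False]

auc_assumed = auc_trapezoid(roc_points(toy_hits, total_positives=5))
auc_corrected = auc_trapezoid(roc_points(toy_hits, total_positives=4))
print(f"AUC with 5 assumed positives: {auc_assumed:.2f}")
print(f"AUC with 4 corrected positives: {auc_corrected:.2f}")

# Relative improvement quoted in the text: (0.74 - 0.61) / 0.61.
relative_improvement = (0.74 - 0.61) / 0.61
print(f"Relative AUC improvement: {relative_improvement:.0%}")
```

Under these assumptions, changing the number of positive instances only rescales the TPP axis, so the ranking of parameter sets by AUC is unchanged. The reported values are consistent with such a rescaling: 0.61 × 50/41 ≈ 0.74 and 0.74 × 50/41 ≈ 0.90.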

Beyond interpretation of the AUC, perhaps most illustrative of the increase in software performance is the comparison of the number of true positives discovered at different FPP thresholds, which can serve as alternative metrics of ROC performance (see Supporting Information). It may be advantageous to combine insight provided by more than one ROC curve metric when one set of parameters is not clearly superior, so as to select the best set of parameters. For example, the previously utilized parameters (S/N 3; all m/z) found a total of 33 true positives, while the optimized parameters (S/N 10; 10 m/z) found a total of 38 true positives of the 41 possible, statistically significant, positive instances. Not only did the optimized parameters find 5 additional hits in total (i.e., at an FPP of 1.0), but they also allowed the F-ratio algorithm to discover the vast majority of those true positives at much lower FPP thresholds than those required by the previously utilized parameters. For example, the greatest distance between the two ROC curves shown in Figure 4 occurs at an FPP of about 0.2, which corresponds to the point at which the F-ratio hit list has reached 10 false positives. At this FPP threshold, the optimized parameters (S/N 10; 10 m/z) have discovered 37 of the 38 true positives that were discovered overall, which corresponds to 9 more true positives than the 28 true positives that the software discovered using the previous parameters (S/N 3; all m/z). The optimization of the parameters (using S/N 10 and 10 m/z) allowed the method to discover nearly all of the true positives at the cost of only 10 false positives.
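Counting true positives at a fixed FPP threshold can be sketched as below. This is an illustrative reading of the metric rather than the authors' code: it assumes the FPP threshold is converted to an allowed number of false positives (for example, 0.2 × 50 = 10) and that every true positive ranked ahead of the next false positive beyond that allowance is counted. The function name and the toy hit list are hypothetical.

```python
def true_positives_at_fpp(ranked_hits, fpp_threshold):
    """Count true positives found before exceeding the allowed false positives.

    ranked_hits: list of booleans ordered by decreasing F-ratio,
    True for a true positive, False for a false positive.
    """
    total_negatives = sum(1 for hit in ranked_hits if not hit)
    allowed_false_positives = round(fpp_threshold * total_negatives)
    tp = fp = 0
    for is_true_positive in ranked_hits:
        if is_true_positive:
            tp += 1
        elif fp == allowed_false_positives:
            break  # the next false positive would exceed the allowance
        else:
            fp += 1
    return tp


# Toy hit list with five false positives, so an FPP of 0.2 allows one of them.
toy_hits = [True, True, False, True, False, False, True, False, False]

tp_at_threshold = true_positives_at_fpp(toy_hits, 0.2)
tp_overall = true_positives_at_fpp(toy_hits, 1.0)
print(f"True positives at FPP 0.2: {tp_at_threshold}")
print(f"True positives overall (FPP 1.0): {tp_overall}")
```

Applied to two hit lists produced with different parameter sets, the difference between the returned counts gives the kind of comparison quoted in the text (37 versus 28 true positives at an FPP of 0.2, and 38 versus 33 overall).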


CONCLUSION

The AUC is a quantitative metric for comparing ROC curves in order to study and optimize chemometric software performance, and to demonstrate how various input parameters employed by the software can be tuned for better analysis. We report the implementation of such an approach with the tile-based F-ratio analysis software for GC × GC–TOFMS data, using diesel fuel spiked with 30 native and 20 non-native compounds at two nominal concentrations, 200/20 ppm and 100/10 ppm (native/non-native, respectively). By varying the S/N threshold and the number of F-ratios per m/z used to calculate the average F-ratio of each tile, the AUC quantitatively determined the best parameters, found to be a S/N threshold of 10 and 10 m/z. Compared to the previously employed parameters of a S/N threshold of 3 and all m/z, the use of these parameters provided a ~21% improvement in the ability of the software to discriminate between true positives and false positives in the hit list. It also provided the method with the ability to find 9 additional true positives at an FPP of 0.2 (10 allowed false positives) and 5 additional true positives overall. Furthermore, we demonstrate that the AUC metric can be used effectively without prior knowledge of whether or not the spiked analytes (i.e., the intended true positives) were in fact statistically significant positive instances.

For practical application of the ROC curve-based AUC metric method to optimize other sample matrices and chemometric software algorithms, a few comments of merit remain. First, while the F-ratio software optimization used two sample classes fabricated by spiking standards into a diesel matrix, for other software algorithms the research goals and experimental design should guide the preparation of suitable samples for optimization. For example, a software algorithm designed for isotopic labeling investigations of metabolomes should utilize isotopically labeled metabolites in a biological matrix. Second, while the software optimization method was relatively time-consuming, the optimization need only be performed once prior to application to numerous experiments for a given experimental design and sample type. Lastly, this software optimization method should be broadly applicable to many other software methods and investigative questions. For example, peak detection and quantification with results presented in peak tables, commonly applied in many software packages, could be compared (ref 15) and optimized using the ROC curve-based AUC metric approach.


ACKNOWLEDGEMENTS

This work was supported by the Internal Revenue Service (IRS) under an Interagency Agreement with the U.S. Department of Energy (DOE) under Contract DE-AC-5-76RLO with the Pacific Northwest National Laboratory. The authors thank Dr. Brendon Parsons for his assistance throughout this investigation.




SUPPORTING INFORMATION

Table S1: Table of native spike components
Statistical metrics for Table S1
Table S2: Table of non-native spike components
Figure S1: Standard addition method plot
Calculation of native concentration of spiked analytes
Explanation of ROC curve creation
Figure S2: ROC curves with 41 positive instances
Figure S3: Heat map of AUC with 41 positive instances
Other metrics of ROC curve performance
Figure S4: Heat map of TP at FPP of 0.2

REFERENCES

(1) Zhang, L.; Zeng, Z.; Zhao, C.; Kong, H.; Lu, X.; Xu, G. J. Chromatogr. A 2013, 1313, 245–252.
(2) Cordero, C.; Liberto, E.; Bicchi, C.; Rubiolo, P.; Schieberle, P.; Reichenbach, S. E.; Tao, Q. J. Chromatogr. A 2010, 1217 (37), 5848–5858.
(3) Humston, E. M.; Knowles, J. D.; McShea, A.; Synovec, R. E. J. Chromatogr. A 2010, 1217 (12), 1963–1970.
(4) Prebihalo, S.; Brockman, A.; Cochran, J.; Dorman, F. L. J. Chromatogr. A 2015, 1419, 109–115.
(5) Ochiai, N.; Ieda, T.; Sasamoto, K.; Takazawa, Y.; Hashimoto, S.; Fushimi, A.; Tanabe, K. J. Chromatogr. A 2011, 1218 (39), 6851–6860.
(6) Castillo, S.; Mattila, I.; Miettinen, J.; Orešič, M.; Hyötyläinen, T. Anal. Chem. 2011, 83 (8), 3058–3067.
(7) Welthagen, W.; Shellie, R. A.; Spranger, J.; Ristow, M.; Zimmermann, R.; Fiehn, O. Metabolomics 1 (1), 65–73.
(8) Mohler, R. E.; Dombek, K. M.; Hoggard, J. C.; Young, E. T.; Synovec, R. E. Anal. Chem. 2006, 78 (8), 2700–2709.
(9) Mohler, R. E.; Tu, B. P.; Dombek, K. M.; Hoggard, J. C.; Young, E. T.; Synovec, R. E. J. Chromatogr. A 2008, 1186 (1–2), 401–411.
(10) Jennerwein, M. K.; Eschner, M.; Gröger, T.; Wilharm, T.; Zimmermann, R. Energy Fuels 2014, 28 (9), 5670–5681.
(11) Nizio, K. D.; McGinitie, T. M.; Harynuk, J. J. J. Chromatogr. A 2012, 1255, 12–23.
(12) von Mühlen, C.; Zini, C. A.; Caramão, E. B.; Marriott, P. J. J. Chromatogr. A 2006, 1105 (1–2), 39–50.
(13) Kehimkar, B.; Hoggard, J. C.; Marney, L. C.; Billingsley, M. C.; Fraga, C. G.; Bruno, T. J.; Synovec, R. E. J. Chromatogr. A 2014, 1327, 132–140.
(14) Pierce, K. M.; Kehimkar, B.; Marney, L. C.; Hoggard, J. C.; Synovec, R. E. J. Chromatogr. A 2012, 1255, 3–11.
(15) Parsons, B. A.; Marney, L. C.; Siegler, W. C.; Hoggard, J. C.; Wright, B. W.; Synovec, R. E. Anal. Chem. 2015, 87 (7), 3812–3819.
(16) Marney, L. C.; Siegler, W. C.; Parsons, B. A.; Hoggard, J. C.; Wright, B. W.; Synovec, R. E. Talanta 2013, 115, 887–895.
(17) Parsons, B. A.; Pinkerton, D. K.; Wright, B. W.; Synovec, R. E. J. Chromatogr. A 2016, 1440, 179–190.
(18) Watson, N. E.; Parsons, B. A.; Synovec, R. E. J. Chromatogr. A 2016, 1459, 101–111.
(19) Thompson, M.; Ellison, S. L. R.; Wood, R. Pure Appl. Chem. 2002, 74 (5), 835–855.
(20) Magnusson, B.; Ornemark, U. 2014.
(21) Official Methods of Analysis of AOAC International, 6th ed.; AOAC International, 1995.
(22) Fraga, C. G.; Melville, A. M.; Wright, B. W. Analyst 2007, 132 (3), 230–236.
(23) Brown, C. D.; Davis, H. T. Chemom. Intell. Lab. Syst. 2006, 80 (1), 24–38.
(24) Green, D. M.; Swets, J. A. Signal Detection Theory and Psychophysics; John Wiley & Sons, Inc., 1966.
(25) Metz, C. E. J. Am. Coll. Radiol. 2006, 3 (6), 413–422.
(26) Baker, S. G. J. Natl. Cancer Inst. 2003, 95 (7), 511–515.
(27) Carter, J. V.; Pan, J.; Rai, S. N.; Galandiuk, S. Surgery 2016, 159 (6), 1638–1645.
(28) Blaise, B. J. Anal. Chem. 2013, 85 (19), 8943–8950.
(29) Duraipandian, S.; Zheng, W.; Ng, J.; Low, J. J. H.; Ilancheran, A.; Huang, Z. Anal. Chem. 2012, 84 (14), 5913–5919.
(30) Bradley, A. P. Pattern Recognit. 1997, 30 (7), 1145–1159.
(31) Harris, D. C. Quantitative Chemical Analysis, 6th ed.; W. H. Freeman and Company: New York, NY, 2003.

Figure Captions

Figure 1. (A) Total ion current (TIC) chromatogram of the stock solution containing diesel fuel with a ~600 ppm internal standard, 1,2,3-trichlorobenzene, using a reverse column GC × GC configuration. (B) TIC chromatogram of all 50 components in the standards mixture, including 30 native and 20 non-native compounds, numbered corresponding to the first column of Table S1 (natives) and Table S2 (non-natives) in Supporting Information. The “IS” indicates the internal standard, 1,2,3-trichlorobenzene.

Figure 2. (A) The F-ratio distribution for the preliminary hit list of the 200/20 ppm versus 100/10 ppm comparison using a S/N threshold of 3 and all m/z. The tail of the distribution is shown in the inset figure, with the average F-ratio on a log scale so the entire tail can be viewed. (B) Log of average F-ratio versus hit number based upon the F-ratio distribution in (A), with true positives (spiked analytes) shown in blue and false positives shown in red.

Figure 3. (A) The F-ratio distribution for the preliminary hit list of the 200/20 ppm versus 100/10 ppm comparison at optimized conditions of the tile-based F-ratio algorithm, a S/N threshold of 10 and 10 m/z. The tail of the distribution is shown in the inset figure, with the average F-ratio on a log scale so the entire tail can be viewed. (B) Log of average F-ratio versus hit number based upon the F-ratio distribution in (A), with true positives (spiked analytes) shown in blue and false positives shown in red. Relative to Figure 2(B), the true positives are shifted up, to higher F-ratio values, and to the left, to hit numbers toward the top of the hit list.

Figure 4. (A) ROC curves for the two sets of parameters studied: (red) S/N threshold of 3 and all m/z (based upon the results in Figure 2), and (blue) S/N threshold of 10 and 10 m/z (based upon the results in Figure 3). TPP is calculated assuming all 50 spiked analytes were positive instances. The greatest difference between the ROC curves can be seen at an FPP of about 0.2, which corresponds to the allowance of 10 false positives. This difference equates to the ability of the software using the parameters S/N 10; 10 m/z to find 9 more true positives than with the previously used parameters of S/N 3; all m/z. (B) A heat map showing the area under the curve (AUC) at the 25 combinations of S/N threshold and number of m/z studied. The AUC was calculated by assuming all 50 spiked standards were positive instances. The optimum combination of F-ratio software parameters was determined to be a S/N threshold of 10 with 10 m/z for both (A) and (B).





