
Article

Parameter Optimization for Feature and Hit Generation in a General Unknown Screening Method – Proof of Concept Study Using a Design of Experiment Approach for a High Resolution Mass Spectrometry Procedure after Data Independent Acquisition Marco Pius Elmiger, Michael Poetzsch, Andrea E Steuer, and Thomas Kraemer Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.7b05387 • Publication Date (Web): 03 Feb 2018 Downloaded from http://pubs.acs.org on February 3, 2018

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.


Analytical Chemistry

Parameter Optimization for Feature and Hit Generation in a General Unknown Screening Method – Proof of Concept Study Using a Design of Experiment Approach for a High Resolution Mass Spectrometry Procedure after Data Independent Acquisition

Marco P. Elmiger, Michael Poetzsch, Andrea E. Steuer, Thomas Kraemer*

Department of Forensic Pharmacology and Toxicology, Zurich Institute of Forensic Medicine, University of Zurich, Zurich, Switzerland

Abstract

High resolution mass spectrometry and modern data independent acquisition (DIA) methods enable the creation of general unknown screening (GUS) procedures. However, even when DIA is used, its potential is far from being exploited, because the untargeted acquisition is often followed by a targeted search. Applying an actual GUS (including untargeted screening) produces an immense amount of data that must be dealt with. Optimizing the parameters that regulate the feature detection and hit generation algorithms of the data processing software could significantly reduce the amount of unnecessary data and thereby the workload. Design of experiment (DoE) approaches allow the simultaneous optimization of multiple parameters. In a first step, the parameters are evaluated (crucial or non-crucial); in a second step, the crucial parameters are optimized. The aim of this study was to reduce the number of hits without missing analytes. The parameter settings obtained from the optimization were compared to the standard settings by analyzing a test set of blood samples spiked with 22 relevant analytes as well as 62 authentic forensic cases. The optimization led to a marked reduction of the workload (from 12.3% to 1.1% and from 3.8% to 1.1% hits for the test set and the authentic cases, respectively) while simultaneously increasing the identification rate (from 68.2% to 86.4% and from 68.8% to 88.1%, respectively). This proof of concept study emphasizes the great potential of DoE approaches for mastering the data overload resulting from modern DIA methods used for general unknown screening procedures by optimizing software parameters.

Keywords: proof of concept study, design of experiment approach, general unknown screening, parameter optimization, high resolution mass spectrometry

ACS Paragon Plus Environment


Nowadays, tandem mass spectrometry (MS/MS) using data dependent acquisition (DDA) is the gold standard for drug screening in clinical and forensic toxicology.1,2 Recently, a gradual shift towards data independent acquisition (DIA) methods (e.g., SWATH, SONAR, MS^E) has been observed.3 High resolution mass spectrometry (HRMS), which these new methods require, is likely to become the gold standard for non-targeted screening in the coming years.4 Non-targeted screening with these DIA methods offers, for the first time, the possibility to detect all ionized analytes in a sample and additionally allows retrospective data analysis. However, most DIA methods currently in use simply combine untargeted acquisition with targeted data processing.5-7 To establish an actual general unknown screening (GUS) approach, untargeted acquisition must be followed by untargeted data processing. This approach generates an immense amount of data that must be assessed by an experienced user, which often takes hours and ties up considerable human resources. The peak finding and assignment algorithms of the data processing software generate a list of thousands of features, which in turn lead to hundreds of so-called hits, including many false positive results. The number and quality of features and hits are controlled by a combination of different software parameters, and optimizing these parameters can significantly reduce the workload. The typical strategy for this purpose is to vary one parameter after another in no particular order. A more systematic strategy is a design of experiment (DoE) approach, in which multiple parameters are evaluated simultaneously. DoE approaches are already widely applied in bioanalytical science8, emergency toxicology9, environmental analytics10, food analytics11,12, metabolomics13-15, pharmaceutical analytics16-18 and water analytics19, mostly for optimization of the chromatographic separation.
The aim of this proof of concept study was to use a DoE approach to optimize the parameters regulating the feature detection and hit generation algorithms of the data processing software by analyzing test sets of whole blood samples spiked with substances common in forensic toxicology. The optimized parameters were then to be applied to the analysis of authentic blood samples, and the results compared to those of accredited routine methods. Ideally, the workload for data analysis should be significantly reduced while the number of positively identified analytes increases.


Material and methods

Chemicals and reagents

Methanolic or acetonitrile solutions (1 mg/mL) of 6-acetylmorphine, amphetamine, benzoylecgonine, cocaethylene, cocaine, codeine, dihydrocodeine, ecgonine methyl ester, hydrocodone, hydromorphone, methadone, methamphetamine, 3,4-methylenedioxyamphetamine (MDA), 3,4-methylenedioxyethylamphetamine (MDEA), 3,4-methylenedioxymethylamphetamine (MDMA), methylphenidate, morphine, O-desmethyltramadol (ODMT), oxycodone, oxymorphone, tramadol and zolpidem were obtained from Cerilliant (delivered by Sigma-Aldrich, Buchs, Switzerland) and Lipomed (Arlesheim, Switzerland). Water and acetonitrile of high-performance liquid chromatography (HPLC) mass spectrometry (MS) grade were obtained from VWR International GmbH (Dietikon, Switzerland) and Merck (Zug, Switzerland), respectively. All other chemicals were obtained from Merck (Zug, Switzerland) and were of the highest grade available.

Biosamples

For the preparation of the test sets, blank blood samples were obtained from healthy volunteers at the Zurich Institute of Forensic Medicine (ZIFM), University of Zurich, Switzerland, after written informed consent. After anonymization, 62 authentic blood samples from routine analysis at the ZIFM were used for applicability studies. The samples had been submitted to the authors’ laboratory by the local police and state attorneys, mainly in the context of driving under the influence of drugs (DUID) cases. All biosamples were stored at -18 °C.

Spiking solutions

Commercially available 1 mg/mL methanolic or acetonitrile solutions of each analyte were used as stock solutions for the low and high spiking solutions (SpMix-low and SpMix-high). Both spiking solutions were prepared in methanol by combining appropriate volumes of the stock solutions and diluting to a final volume of 5 mL. The spiking solutions had concentrations 10 times higher than the corresponding final blood concentrations, which are detailed in Table S-1.

Sample preparation

A test set was prepared using blank blood samples from ten different people. Each blank blood sample was divided into three aliquots: one aliquot was spiked with SpMix-low, one with SpMix-high, and the third with methanol only (blank). The test set thus comprised ten groups of blood samples, each consisting of a blank, a SpMix-low and a SpMix-high spiked blood sample; their respective final concentrations are listed in Table S-1. The samples were extracted by protein precipitation (PP) as described previously, with slight modifications.20 For the test set, 20 µL methanol (MeOH) was placed in a vial before 200 µL whole blood was added, followed by 20 µL SpMix-low or SpMix-high, respectively. For the blank blood samples and the authentic cases, 40 µL MeOH was placed in a vial before 200 µL blood was added. PP was performed by slowly adding 600 µL acetonitrile while vortexing. The mixture was shaken and centrifuged, and the supernatant was transferred into an autosampler vial. After addition of formic acid, the supernatant was evaporated to dryness under a gentle stream of nitrogen at room temperature. The residue was reconstituted in 250 µL of a mixture of eluent A and eluent B. The extracts were analyzed by liquid chromatography electrospray ionization quadrupole time-of-flight mass spectrometry (LC-ESI-QTOF-MS) as described in the following section.
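The spiking scheme can be checked with simple amount arithmetic. The Python sketch below assumes a hypothetical target blood concentration of 50 ng/mL (the actual concentrations are listed in Table S-1) and shows why a spiking solution ten times more concentrated than the target works when 20 µL is added to 200 µL of blood. As in the text, the concentration is referred to the blood volume only; the additional methanol is not counted.

```python
"""Amount arithmetic for the spiking scheme (hypothetical target value)."""

STOCK_NG_PER_ML = 1_000_000.0  # 1 mg/mL stock solution = 1,000,000 ng/mL
SPIKE_FACTOR = 10.0            # spiking solution is 10x the blood concentration
SPIKE_SOLUTION_ML = 5.0        # final volume of SpMix-low or SpMix-high
SPIKE_ADDED_UL = 20.0          # spiking solution added per sample
BLOOD_UL = 200.0               # whole blood per sample


def stock_volume_ul(target_blood_ng_per_ml: float) -> float:
    """Volume of 1 mg/mL stock (uL) per analyte for 5 mL of spiking solution."""
    spike_conc = SPIKE_FACTOR * target_blood_ng_per_ml      # ng/mL
    stock_ml = spike_conc * SPIKE_SOLUTION_ML / STOCK_NG_PER_ML
    return stock_ml * 1000.0                                # mL -> uL


def resulting_blood_conc(target_blood_ng_per_ml: float) -> float:
    """Concentration (ng/mL) obtained when 20 uL spike is added to 200 uL blood."""
    spike_conc = SPIKE_FACTOR * target_blood_ng_per_ml      # ng/mL
    amount_ng = spike_conc * SPIKE_ADDED_UL / 1000.0        # ng delivered
    return amount_ng / (BLOOD_UL / 1000.0)                  # per mL of blood


if __name__ == "__main__":
    target = 50.0  # hypothetical target blood concentration, ng/mL
    print(f"stock per analyte: {stock_volume_ul(target):.1f} uL")            # 2.5 uL
    print(f"blood concentration: {resulting_blood_conc(target):.1f} ng/mL")  # 50.0
```

The scheme works because the spike volume (20 µL) is one tenth of the blood volume (200 µL), so the tenfold concentration of the spiking solution cancels out exactly for any target value.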

Liquid chromatography and mass spectrometry conditions

HPLC separation was performed using an UltiMate 3000 rapid separation liquid chromatography (RSLC) system (Thermo Fisher Scientific, San Jose, CA), configured in binary high pressure gradient mode and controlled by Chromeleon 6.80 software (Thermo Fisher Scientific). Mobile phase A consisted of 10 mM aqueous ammonium formate buffer with 0.1% formic acid, and mobile phase B of acetonitrile with 0.1% formic acid. The gradient profile for eluent B was as follows: 0.0−1.0 min, 10%; 1.0−20 min, increase to 60%; 20−22 min, increase to 95%; 22−23.5 min, hold at 95%; 23.5−24 min, decrease to 10%. The HPLC column was then re-equilibrated at 10% eluent B for 1.0 min. The column oven was set to 40 °C and the autosampler was cooled to 10 °C. The flow rate was 0.5 mL min−1 throughout. A volume of 10 µL of the sample was injected onto a Synergi Polar RP column (100 mm × 2.0 mm i.d.; 2.5 µm particle size, 100 Å) (Phenomenex, Torrance, CA) protected by a C18 guard column (4.0 mm × 2.0 mm i.d.; Phenomenex). Mass spectrometric detection was performed using a QTOF MS instrument (TripleTOF 6600, Sciex, Concord, Ontario, Canada) with the resolving power (full width at half maximum, fwhm, at m/z 400) set to 30,000 in MS and 30,000 in Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH) MS/MS (high resolution mode). The automated calibration device system (CDS) performed an external calibration every five samples. The Turbo V ion drive source, equipped with a stainless steel electrode (100 µm internal diameter), was operated with the following MS conditions: gas 1, nitrogen (40 psi); gas 2, nitrogen (40 psi); ion spray voltage, 5500 V; ion source temperature, 450 °C; curtain gas, nitrogen (35 psi); collision energy, 10 eV. The MS was operated in SWATH acquisition mode, in which one complete cycle consists of a survey scan followed by a Q1 isolation strategy.
The survey scan covered m/z 100 to 1000 with an accumulation time of 50 ms. The Q1 isolation strategy covered m/z 140−800 using 25 Da SWATH windows for Q1 isolation (overlap 1 u). In each SWATH window, a collision energy of 35 eV with a spread of ±15 eV and an accumulation time of about 55 ms in high resolution mode were used. The total cycle time was 1.6 s. All MS parameters were controlled by AnalystTF 1.7 software (Sciex). Data were processed with PeakView® 2.2 (Sciex) and MasterView™ 1.1 (Sciex), which allow screening with an in-house library or a linkage to ChemSpider.21 MODDE Go version 11.0.2 (MKS Umetrics, now Sartorius Stedim Data Analytics AB, Umeå, Sweden) was used for the DoE approach, and ChemSpider was used as a free-of-charge chemical structure database with access to structures, properties and associated information.
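The LC gradient and the SWATH window scheme described above can be expressed as data. The Python sketch below assumes that %B changes linearly within each gradient segment, that the SWATH windows have a fixed width and are stepped so that neighbouring windows overlap by 1 u with the last window clipped at m/z 800, and that instrument overheads are ignored in the cycle-time estimate. The actual window table used by the instrument software may differ.

```python
# (time in min, %B) breakpoints of the eluent B program, incl. re-equilibration
GRADIENT = [(0.0, 10.0), (1.0, 10.0), (20.0, 60.0), (22.0, 95.0),
            (23.5, 95.0), (24.0, 10.0), (25.0, 10.0)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate %B at time t (min) from the gradient table."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    return GRADIENT[-1][1]


def swath_windows(low=140.0, high=800.0, width=25.0, overlap=1.0):
    """Fixed-width Q1 windows covering low..high m/z with the given overlap."""
    windows, start = [], low
    while True:
        end = min(start + width, high)   # clip the last window at the upper limit
        windows.append((start, end))
        if end >= high:
            return windows
        start = end - overlap            # next window overlaps the previous one


def cycle_time_ms(n_windows, survey_ms=50.0, swath_ms=55.0):
    """Survey scan plus one accumulation per SWATH window (no overheads)."""
    return survey_ms + n_windows * swath_ms


if __name__ == "__main__":
    print(percent_b(10.5))               # midway through the 10 -> 60 %B ramp: 35.0
    wins = swath_windows()
    print(len(wins), wins[0], wins[-1])  # 28 (140.0, 165.0) (788.0, 800.0)
    print(cycle_time_ms(len(wins)))      # 1590.0
```

Under these assumptions, 28 windows of about 55 ms plus the 50 ms survey scan add up to roughly 1.59 s, which agrees with the reported cycle time of 1.6 s.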

Data processing software (PeakView)

For data processing of the SWATH data resulting from DIA, PeakView® with the MasterView™ module was used. PeakView® can be used for screening purposes without any compound list (i.e., no suspect screening) and without any library search (i.e., no target screening). For that purpose, PeakView® offers a Non-Targeted Peak Finder approach that generates features from extracted ion chromatograms (XICs) and displays them in a feature table sorted by accurate mass. In addition, a Formula Finder can generate all theoretically possible chemical formulas for the features within predefined settings. The chemical formulas for a feature are listed in an additional table, sorted by their Formula Finder score, which combines information from the mass spectrometric and tandem mass spectrometric data. If at least one chemical formula exceeds the threshold value (a score >30 in the present study), the corresponding feature becomes a positive hit (highlighted in green in the feature table). The Non-Targeted Peak Finder and Formula Finder parameters affecting feature detection and hit generation are listed in Table 1 together with their descriptions. The parameters for the ‘intensity’ and ‘signal-to-noise’ definition are used by default but were set to zero for optimization. By default, the ‘minimum and maximum Retention Time’ parameter is not used. The ‘Peak Detection Sensitivity’ regulator was set to the second highest option (out of seven options ranging from fast to exhaustive). The ‘Max Element’ settings were as follows: 30 carbons, 50 hydrogens, 5 nitrogens, 10 oxygens, 2 sulfurs, 2 chlorines, 3 fluorines, 2 bromines, 2 iodines and 2 phosphorus atoms. The remaining parameters are the ‘default XIC width’, the ‘default retention time width’, the ‘default threshold’, the ‘default threshold (ratio of control)’ and the ‘mass tolerance’.
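The hit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not PeakView's implementation: it assumes a hypothetical data structure in which each feature comes with a list of (formula, score) candidates, applies the ‘Max Element’ limits as a filter, and marks a feature as a hit when at least one admissible formula scores above 30. In PeakView® the limits constrain formula generation itself; filtering afterwards has the same effect on the hit decision.

```python
import re

# 'Max Element' limits used in this study (element symbol -> maximum count)
MAX_ELEMENTS = {"C": 30, "H": 50, "N": 5, "O": 10, "S": 2,
                "Cl": 2, "F": 3, "Br": 2, "I": 2, "P": 2}
SCORE_THRESHOLD = 30.0  # a feature is a hit if a formula scores > 30


def element_counts(formula: str) -> dict:
    """Parse a simple neutral formula such as 'C17H21NO4' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def within_limits(formula: str) -> bool:
    """True if only allowed elements occur and none exceeds its maximum count."""
    return all(el in MAX_ELEMENTS and n <= MAX_ELEMENTS[el]
               for el, n in element_counts(formula).items())


def is_hit(candidates) -> bool:
    """candidates: iterable of (formula, Formula Finder score) pairs (hypothetical)."""
    return any(score > SCORE_THRESHOLD
               for formula, score in candidates if within_limits(formula))


if __name__ == "__main__":
    # Hypothetical feature table: feature id -> candidate formulas with scores
    features = {
        "feature_1": [("C17H21NO4", 78.5), ("C16H25N3O3", 41.0)],
        "feature_2": [("C35H60O2", 95.0)],   # exceeds the C and H limits
        "feature_3": [("C10H15N", 22.0)],    # admissible but scores too low
    }
    for fid, cands in features.items():
        print(fid, "hit" if is_hit(cands) else "no hit")
```

Only feature_1 becomes a hit: feature_2 is excluded by the element limits despite its high score, and feature_3 stays below the threshold.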

Parameter optimization using design of experiment approach

MODDE Go software was used for the DoE approach to evaluate the importance of the above parameters and their potential to influence each other. The ‘Screening’ objective was selected, as recommended by MODDE Go when little is known about the parameters to be evaluated and optimized. Because MODDE Go requires numerical values for the parameters (including their ranges) and for the responses, a score formula (Formula 1) was designed to translate the effect of the different parameter settings into a number.

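Formula 1. Score formula

$$\text{Score} = 0.6 \times \frac{\text{ranking sum}}{\text{total analytes of test set found}} \times 22 + 0.4 \times \text{number of hits}$$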

An ideal combination of parameters leads to a low score, i.e., all 22 analytes are identified while the number of hits remains low. The factors 0.6 and 0.4 in Formula 1 are weights, meaning that the actual identification of the 22 spiked substances is considered slightly more important than a smaller number of hits. The ‘ranking sum’ is derived from the ranks of the 22 analytes in the Formula Finder table. A rank of 1 was assigned if the analyte was listed on top, and ranks 2 to 5 were assigned accordingly. An analyte listed outside the top 5 was assigned rank 6, and an analyte not listed at all was assigned rank 7. Consequently, the best possible ranking sum is 22 (i.e., all 22 analytes on rank 1). ‘Total analytes of test set found’ is the number of spiked analytes actually found in the sample. If all spiked analytes are found, the whole fraction reduces to the ranking sum; finding fewer spiked analytes leads to a higher score. ‘Number of hits’ is the total number of hits in a sample as reported by the PeakView® software.
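Formula 1 and the ranking rules translate directly into a small scoring function. The Python sketch below assumes that, for each spiked analyte, the 1-based position in the Formula Finder table is known (None if the analyte is not listed), and that the number of analytes found and the number of hits are supplied as reported by PeakView®; the function and argument names are illustrative only.

```python
N_SPIKED = 22  # analytes in the test set


def analyte_rank(position):
    """Rank used in the ranking sum for a 1-based table position (None = absent)."""
    if position is None:
        return 7              # analyte not in the list at all
    if position <= 5:
        return position       # ranks 1 to 5 as listed
    return 6                  # listed, but outside the top 5


def doe_score(positions, n_found, n_hits, w_rank=0.6, w_hits=0.4):
    """Formula 1: w_rank * (ranking sum / analytes found) * 22 + w_hits * hits."""
    if n_found == 0:
        raise ValueError("score is undefined when no spiked analyte is found")
    ranking_sum = sum(analyte_rank(p) for p in positions)
    return w_rank * ranking_sum / n_found * N_SPIKED + w_hits * n_hits


if __name__ == "__main__":
    # Ideal case: all 22 analytes on rank 1 with 100 hits (hypothetical)
    print(round(doe_score([1] * 22, n_found=22, n_hits=100), 2))               # 53.2
    # Two analytes missing (rank 7 each), same number of hits
    print(round(doe_score([1] * 20 + [None] * 2, n_found=20, n_hits=100), 2))  # 62.44
```

With the same number of hits, the two missing analytes raise the score from 53.2 to 62.44, because they increase the ranking sum and decrease the denominator at the same time; the optimizer is therefore pushed towards settings that keep all analytes.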


MODDE’s DoE approach starts by identifying those parameters that actually have crucial effects on the result. For that purpose, simple models (linear or linear with interactions) are used. In our case, ‘Frac Fac Res IV’ (a two-level fractional factorial design of resolution IV) was selected, as proposed by MODDE Go. MODDE Go then generated a worksheet with 19 experiments, each with a different combination of parameter settings. Thus, the test set (ten groups, each including SpMix-low, SpMix-high and blank blood) had to be processed 19 times in the PeakView® software. The results of these experiments for the low and high concentrations (average score of the ten groups in each case) were entered in the response columns of the worksheet. The analysis wizard guides the user through the main steps and is the recommended way of changing and adjusting the model. It allows the raw data to be reviewed and fitted and the model to be diagnosed and refined. The statistics are described by four values (R², Q², model validity and reproducibility). R² shows the model fit (green bar in the right part of Figure 1) and Q² an estimate of the future prediction precision (blue bar in the right part of Figure 1). A model with an R² of 0.5 still has rather low significance. In contrast, a Q² of 0.1 already indicates a significant model, and a Q² of 0.5 represents a good model. Q² is the best and most sensitive indicator. R² and Q² should be close in size, and their difference should not exceed 20%. Non-crucial parameters can be detected visually (small bars) and removed to refine the model with respect to its fit and its estimated future prediction precision. The optimizer tool helps to find the optimal combination of parameter settings. When searching for a solution with more than one response (two in our case, i.e., low and high concentration), the result is a compromise between these responses. The Dynamic profile window shows the effect of the parameters over the investigated range.
By default, the parameter settings shown are those calculated by the optimizer. The window allows the settings to be changed graphically and numerically. A change in the effect curve of one parameter versus one response (represented in one of the four graphs in Figure 2) triggers changes in the other effect curves (i.e., the three other graphs).
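The 19-run screening worksheet can be illustrated with a sketch of a two-level fractional factorial design of resolution IV. Assumption, not stated in the text: the six evaluated parameters are coded as -1/+1 in a 2^(6-2) design with the common generators E = ABC and F = BCD, and three centre points (coded 0) are appended, giving 16 + 3 = 19 runs. MODDE Go's actual worksheet, generators and real parameter ranges may differ, and the factor identifiers below are hypothetical labels for the parameters named in the text.

```python
from itertools import product
from math import prod

# Hypothetical identifiers for the six parameters evaluated in the screening step
FACTORS = ["intensity", "signal_to_noise", "xic_width",
           "rt_width", "threshold", "threshold_ratio_control"]


def frac_fac_res_iv(n_center: int = 3) -> list:
    """Coded runs of a 2^(6-2) design (E = ABC, F = BCD) plus centre points."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):   # full 2^4 design in A-D
        e = a * b * c                                # generator E = ABC
        f = b * c * d                                # generator F = BCD
        runs.append((a, b, c, d, e, f))
    runs += [(0,) * 6 for _ in range(n_center)]     # centre points
    return runs


def column_sum(runs, cols) -> int:
    """Sum over runs of the product of the given columns (0 = not confounded)."""
    return sum(prod(run[c] for c in cols) for run in runs)


if __name__ == "__main__":
    design = frac_fac_res_iv()
    factorial = design[:16]                          # exclude centre points
    print(len(design))                               # 19
    print(column_sum(factorial, (0, 1)))             # A vs B: 0 (orthogonal)
    print(column_sum(factorial, (0, 1, 2)))          # A vs BC: 0 (not aliased)
    print(column_sum(factorial, (0, 1, 2, 4)))       # AB vs CE: 16 (aliased)
```

Resolution IV means that main effects are not confounded with two-factor interactions, whereas two-factor interactions are confounded with one another (here, for example, AB with CE). This is acceptable for a screening step whose purpose is only to separate crucial from non-crucial parameters.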

DoE optimized parameters and application to authentic samples

Blood samples from authentic cases (n = 62) covering different analytes and concentration ranges were analyzed with the described method. The analytical results (i.e., features and hits) obtained with the PeakView® software using the optimized parameter settings were compared to those obtained using the standard parameters. In addition, these results were checked against the results obtained for the same samples with routinely used, validated methods. For this comparison, all identified compounds of interest taken together (not only the test set compounds) were set as 100% to calculate the identification rates.


Results and discussion

Modern DIA methods allow the creation of general unknown screening procedures. However, they produce an immense amount of data that must be assessed by an experienced user. Optimizing the parameters that regulate the feature detection and hit generation algorithms of the data processing software can significantly reduce the amount of unnecessary data and thereby the workload. In this proof of concept study, a DoE approach was used for this optimization.

Data processing software (PeakView)

The Non-Targeted Peak Finder approach combined with the Formula Finder option was used to detect features in a blood sample and to assign theoretical formulas to them based on their accurate mass, without an in-house library. Several parameters can be varied to define the Non-Targeted Peak Finder approach. Following the recommendation of the PeakView® software manufacturer, the parameters for the ‘intensity’ and ‘signal-to-noise’ definition were set to zero for the optimization, so that any peak from an extracted ion chromatogram is forwarded directly to the next step of feature detection without filtering at this stage. However, these two parameters were included in the parameter evaluation. The ‘minimum and maximum Retention Time’ parameter is not used by default. For the optimization, the ‘minimum Retention Time’ was set to one minute to exclude signals from the dead volume and the injection peak. The ‘Peak Detection Sensitivity’ regulator was set to the second highest option (out of seven options ranging from fast to exhaustive). The highest option (i.e., exhaustive) would probably have given better feature detection results but was not used for computational reasons. While processing a sample with the second highest option took several minutes (typically 2 to 5 minutes), processing with the highest option took up to 3 times longer (computer specifications: Windows 7 Professional; Intel® Core™ i7-4770 CPU @ 3.40 GHz; 16 GB RAM). This was impractical given the number of samples and should be reconsidered in further investigations, especially as more powerful computers become available. The Formula Finder ‘Max Element’ settings (see above) were chosen to cover small molecules of interest in forensic toxicology while excluding larger (e.g., endogenous) molecules.
Combining the Formula Finder with the free chemical structure database ChemSpider21 allowed theoretical formulas to be linked to chemical structures.
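The computational argument can be made concrete with a back-of-envelope estimate. The Python sketch below assumes that each of the 30 test-set samples (ten groups of three) is processed once per experiment of the 19-run screening worksheet, at 2 to 5 minutes per sample with the second highest sensitivity option and up to three times as long with the exhaustive option; the later optimization runs are not included.

```python
N_SAMPLES = 10 * 3               # ten groups: blank, SpMix-low, SpMix-high
N_EXPERIMENTS = 19               # runs in the screening worksheet
MINUTES_PER_SAMPLE = (2.0, 5.0)  # second highest sensitivity option
EXHAUSTIVE_FACTOR = 3.0          # 'exhaustive' took up to 3x longer


def total_hours(minutes_per_sample: float, factor: float = 1.0) -> float:
    """Total processing time (h) for all samples across all experiments."""
    return N_SAMPLES * N_EXPERIMENTS * minutes_per_sample * factor / 60.0


if __name__ == "__main__":
    fast, slow = MINUTES_PER_SAMPLE
    print(f"second highest: {total_hours(fast):.1f} to {total_hours(slow):.1f} h")
    print(f"exhaustive: up to {total_hours(slow, EXHAUSTIVE_FACTOR):.1f} h")
```

Under these assumptions, the screening step alone requires about 19 to 48 hours of processing with the second highest option and up to about 140 hours with the exhaustive option.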

Parameter optimization using design of experiment approach

The advantage of a DoE approach is that multiple parameters can be evaluated simultaneously by means of statistically designed experiments, as opposed to the traditional approach of changing only one parameter at a time in no particular order and observing the changes in the results run by run. DoE approaches are widely applied in bioanalytical science8, emergency toxicology9, environmental analytics10, food analytics11,12, metabolomics13-15, pharmaceutical analytics16-18 and water analytics19, mostly for optimization of chromatographic separation. However, the use of a DoE approach to optimize the software parameters for feature detection and hit generation, in order to master the data overload resulting from modern DIA methods used for general unknown screening procedures, has not yet been described. For this proof of concept study, a DoE approach (software MODDE Go) was used to detect and optimize those parameters that most affect feature detection and hit generation. By varying all these relevant parameters simultaneously (different experiments, listed in the worksheet), the optimum combination of parameter settings can be found. The samples from the test set, although they contained known (spiked) analytes, were treated as unknown samples. This allowed the framework conditions of the optimization to be defined, because the maximum possible number of analytes in a sample was already known. It should be clearly stated that this approach also works for truly unknown substances, for which no retention time or structural information is available and consequently no XIC list for a suspect screening can be created. The aim here was to reduce the number of hits without missing an analyte. For the parameter evaluation, the simple goal was to identify those parameters that actually have crucial effects on the result (i.e., feature detection and hit generation). The analysis wizard suggested the ‘Frac Fac Res IV’ model, which is a balanced subset of the full two-level factorial design. Although the parameters ‘intensity’ and ‘signal-to-noise’ can be neglected by setting them to zero (according to the manufacturer), they were still included in the evaluation to test the potential of the DoE approach: if the approach worked as intended, these parameters should be recognized as non-crucial in the software’s assessment. Indeed, the analysis wizard showed small bars (non-crucial) for the parameters ‘intensity’ and ‘signal-to-noise’ in the upper left part of Figure 1. In addition, the ‘default XIC width’ parameter was identified as non-crucial. Removing these non-crucial parameters and adding a square term (as recommended by the analysis wizard) increased both R² and Q² and reduced their difference to well under 20% (lower right part of Figure 1). In the next step (i.e., parameter optimization), the non-crucial parameters were removed and another logically relevant parameter, ‘mass tolerance’, was added.
As a consequence, MODDE Go suggested a new statistical model (i.e., ‘Onion D-Optimal’). The first optimization run indicated that the ranges for 3 of the 4 parameters had been chosen too narrowly, because the optimized values lay at the lower or upper limit of the range. Therefore, the worksheet was extended by three additional experiments to broaden these ranges (Table 2). A second optimization run (including these new experiments, shown in Table S-2) yielded the final prediction, which was chosen as the optimal parameter combination (Table 3). Small values for the parameters ‘default XIC width’ and ‘default retention time width’ and a high value for the ‘default threshold (ratio of control)’ are easily understandable. However, it was quite surprising that the software suggested a large ‘mass tolerance’ of 13 ppm. To check that this was not caused by inadequate calibration during the experiments, a new series was measured with automatic calibration after every experiment (mass error certainly below 5 ppm), but 13 ppm still turned out to be the optimal value. One possible explanation is that a wider tolerance is more forgiving when a feature is to be turned into a hit, especially at low concentrations.
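To put the 13 ppm mass tolerance into perspective, the Python sketch below converts a ppm tolerance into an absolute m/z window. It assumes a symmetric tolerance around the theoretical m/z and uses protonated cocaine (C17H21NO4 + H+), one of the spiked analytes, as an example ion. How PeakView® applies the tolerance internally is not described here, so the sketch only illustrates the arithmetic; the measured value is hypothetical.

```python
# Monoisotopic masses (u) of the elements needed for the example ion
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # mass of a proton (u)


def mz_protonated(counts: dict) -> float:
    """Theoretical m/z of the singly protonated molecule [M+H]+."""
    return sum(MONO[el] * n for el, n in counts.items()) + PROTON


def ppm_error(measured: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def within_tolerance(measured: float, theoretical: float, tol_ppm: float) -> bool:
    """True if the measured m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm


if __name__ == "__main__":
    theo = mz_protonated({"C": 17, "H": 21, "N": 1, "O": 4})  # cocaine [M+H]+
    for tol in (5.0, 13.0):
        print(f"{tol:>4} ppm -> +/- {theo * tol * 1e-6:.4f} Da around {theo:.4f}")
    measured = 304.1570  # hypothetical measured m/z (about 8.8 ppm high)
    print(within_tolerance(measured, theo, 5.0), within_tolerance(measured, theo, 13.0))
```

At m/z of about 304, a 13 ppm tolerance corresponds to roughly ±0.004 Da instead of about ±0.0015 Da at 5 ppm, so a signal about 8.8 ppm off would be rejected at 5 ppm but accepted at 13 ppm.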

DoE optimized parameters and application to authentic samples

The analytical results (i.e., features and hits) obtained in the PeakView® software by processing the data with the optimized parameter settings were compared to those obtained with the standard parameters. Again, the aim was to reduce the number of hits without missing an analyte. With the standard settings, as much as 12.3% of all features were hits, yet on average only 68.2% of the spiked analytes in the test set were identified. The optimized parameters reduced the proportion of hits to as little as 1.1% while increasing the average proportion of identified analytes to 86.4%.


Furthermore, the usefulness of the optimization was tested on 62 authentic forensic cases by comparing the results of data processing with the two parameter settings. The standard settings generated 3.8% hits on average and led to an identification rate of 68.8% of all analytes. In contrast, the optimized parameters led to 1.0% hits while achieving a much higher identification rate of 88.1%. Again, this marked reduction of workload still allowed for an increase in the identification rate, here by about 28% (19.3 percentage points).
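The changes reported for the test set and the authentic cases can be recomputed from the percentages given in this section. In the Python sketch below, the identification rate is assumed to be the share of all identified compounds of interest (set as 100%) that a processing run found; the compound sets passed to `identification_rate` are hypothetical inputs, and the percentages are copied from the text.

```python
# Percentages reported in the text: (standard settings, optimized settings)
RESULTS = {
    "test set": {"hits": (12.3, 1.1), "id_rate": (68.2, 86.4)},
    "authentic cases": {"hits": (3.8, 1.0), "id_rate": (68.8, 88.1)},
}


def identification_rate(found: set, all_identified: set) -> float:
    """Share (%) of all identified compounds of interest found by one run."""
    return 100.0 * len(found & all_identified) / len(all_identified)


def change(before: float, after: float) -> tuple:
    """Absolute change (percentage points) and relative change (%)."""
    return after - before, (after - before) / before * 100.0


if __name__ == "__main__":
    for name, res in RESULTS.items():
        hits_std, hits_opt = res["hits"]
        points, rel = change(*res["id_rate"])
        print(f"{name}: hits {hits_std / hits_opt:.1f}-fold lower, "
              f"identification +{points:.1f} points ({rel:+.1f}% relative)")
```

For the test set, the proportion of hits drops about 11-fold while identification improves by 18.2 percentage points (about 27% relative); for the authentic cases, hits drop 3.8-fold and identification improves by 19.3 percentage points (about 28% relative).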

Conclusion

In this proof of concept study, a design of experiment (DoE) approach was used to optimize the parameters of a Non-Targeted Peak Finder approach combined with a Formula Finder. The parameter optimization led to a marked workload reduction with a simultaneous increase in identified analytes, both in test samples containing spiked analytes and in authentic samples. This demonstrates the great potential of DoE approaches for mastering the data overload resulting from modern data independent acquisition methods used for general unknown screening procedures by optimizing software parameters for feature detection and hit generation.

Acknowledgment

The authors express their gratitude to Emma Louise Kessler, MD, for the generous legacy she left to the Institute of Forensic Medicine at the University of Zurich, Switzerland, for research purposes, and to Lana Brockbals for proofreading.

Associated content

Supporting information: Additional information as noted in the text.

Author Information

Corresponding Author
* E-mail: [email protected]

Authors Contribution

The authors declare that they have no potential conflict of interest.

Notes

The authors declare no competing financial interest.


References

(1) Maurer, H. H. Journal of Chromatography A 2013, 1292, 19-24.
(2) Steuer, A. E.; Poetzsch, M.; Koenig, M.; Tingelhoff, E.; Staeheli, S. N.; Roemmelt, A. T.; Kraemer, T. Journal of Chromatography A 2015, 1381, 87-100.
(3) Arnhard, K.; Gottschall, A.; Pitterl, F.; Oberacher, H. Analytical and Bioanalytical Chemistry 2015, 407, 405-414.
(4) Pasin, D.; Cawley, A.; Bidny, S.; Fu, S. Analytical and Bioanalytical Chemistry 2017, 409, 5821-5836.
(5) Roemmelt, A. T.; Steuer, A. E.; Kraemer, T. Analytical Chemistry 2015, 87, 9294-9301.
(6) Sundstrom, M.; Pelander, A.; Ojanpera, I. Journal of Analytical Toxicology 2017, 41, 623-630.
(7) Sauvage, F. L.; Picard, N.; Saint-Marcoux, F.; Gaulier, J. M.; Lachatre, G.; Marquet, P. Journal of Separation Science 2009, 32, 3074-3083.
(8) Dawes, M. L.; Bergum, J. S.; Schuster, A. E.; Aubry, A. F. Journal of Pharmaceutical and Biomedical Analysis 2012, 70, 401-407.
(9) Hlozek, T.; Bursova, M.; Coufal, P.; Cabala, R. Journal of Pharmaceutical and Biomedical Analysis 2015, 114, 16-21.
(10) Asati, A.; Satyanarayana, G. N.; Patel, D. K. Analytical and Bioanalytical Chemistry 2017, 409, 2905-2918.
(11) Zheng, H.; Clausen, M. R.; Dalsgaard, T. K.; Mortensen, G.; Bertram, H. C. Analytical Chemistry 2013, 85, 7109-7116.
(12) Gionfriddo, E.; Naccarato, A.; Sindona, G.; Tagarelli, A. Analytica Chimica Acta 2012, 747, 58-66.
(13) Pan, L.; Qiu, Y.; Chen, T.; Lin, J.; Chi, Y.; Su, M.; Zhao, A.; Jia, W. Journal of Pharmaceutical and Biomedical Analysis 2010, 52, 589-596.
(14) Huang, Q.; Wang, G. J.; Sun, J. G.; A, J. Y.; Zha, W. B.; Zhang, Y.; Zhang, J. W.; Yan, B.; Gu, S. H.; Ren, H. C.; Liu, L. S. Journal of Pharmaceutical and Biomedical Analysis 2008, 46, 728-736.
(15) A, J.; Trygg, J.; Gullberg, J.; Johansson, A. I.; Jonsson, P.; Antti, H.; Marklund, S. L.; Moritz, T. Analytical Chemistry 2005, 77, 8086-8094.
(16) Perrenoud, A. G.; Farrell, W. P.; Aurigemma, C. M.; Aurigemma, N. C.; Fekete, S.; Guillarme, D. Journal of Chromatography A 2014, 1360, 275-287.
(17) Jadhav, S. B.; Kumar, C. K.; Bandichhor, R.; Bhosale, P. N. Journal of Pharmaceutical and Biomedical Analysis 2016, 118, 370-379.
(18) Poceva Panovska, A.; Acevska, J.; Stefkov, G.; Brezovska, K.; Petkovska, R.; Dimitrovska, A. Journal of Chromatographic Science 2016, 54, 103-111.
(19) Fauvelle, V.; Mazzella, N.; Morin, S.; Moreira, S.; Delest, B.; Budzinski, H. Environmental Science and Pollution Research International 2015, 22, 3988-3996.
(20) Elmiger, M. P.; Poetzsch, M.; Steuer, A. E.; Kraemer, T. Analytical and Bioanalytical Chemistry 2017, 409, 6495-6508.
(21) Royal Society of Chemistry. ChemSpider; Royal Society of Chemistry, 2015.


Tables

Table 1. Parameters affecting feature detection and hit generation and their respective descriptions (modified from the MasterView software manual). Parameters used for optimization are given in bold letters.

Parameter (unit) | Description

Do not calculate details for XIC with Intensity < counts or S:N