Artificial Neural Network for Probabilistic Feature Recognition in Liquid

Dec 20, 2016 - In this work, a novel probabilistic untargeted feature detection algorithm for liquid chromatography coupled to high-resolution mass ...
0 downloads 0 Views 2MB Size
Subscriber access provided by University of Newcastle, Australia

Article

Artificial Neural Network for Probabilistic Feature Recognition in Liquid Chromatography Coupled to High Resolution Mass Spectrometry Michael Woldegebriel, and Eduard Derks Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.6b03678 • Publication Date (Web): 20 Dec 2016 Downloaded from http://pubs.acs.org on December 21, 2016

Just Accepted “Just Accepted” manuscripts have been peer-reviewed and accepted for publication. They are posted online prior to technical editing, formatting for publication and author proofing. The American Chemical Society provides “Just Accepted” as a free service to the research community to expedite the dissemination of scientific material as soon as possible after acceptance. “Just Accepted” manuscripts appear in full in PDF format accompanied by an HTML abstract. “Just Accepted” manuscripts have been fully peer reviewed, but should not be considered the official version of record. They are accessible to all readers and citable by the Digital Object Identifier (DOI®). “Just Accepted” is an optional service offered to authors. Therefore, the “Just Accepted” Web site may not include all articles that will be published in the journal. After a manuscript is technically edited and formatted, it will be removed from the “Just Accepted” Web site and published as an ASAP article. Note that technical editing may introduce minor changes to the manuscript text and/or graphics which could affect content, and all legal disclaimers and ethical guidelines that apply to the journal pertain. ACS cannot be held responsible for errors or consequences arising from the use of information contained in these “Just Accepted” manuscripts.

Analytical Chemistry is published by the American Chemical Society. 1155 Sixteenth Street N.W., Washington, DC 20036 Published by American Chemical Society. Copyright © American Chemical Society. However, no copyright claim is made to original U.S. Government works, or works produced by employees of any Commonwealth realm Crown government in the course of their duties.

Page 1 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Artificial Neural Network for Probabilistic Feature Recognition in Liquid Chromatography Coupled to High Resolution Mass Spectrometry Michael Woldegebriel*1, Eduard Derks2 1

Analytical Chemistry, Van’t Hoff Institute for Molecular Sciences, University of Amsterdam P.O. Box 94720, 1090 GE Amsterdam, The Netherlands 2

*

Dept. of Analytics and Statistics, DSM Resolve, 6167 RD Geleen, The Netherlands

Corresponding Author:

E-mail: [email protected] Tel: +31 20 5257040

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

ABSTRACT In this work, novel probabilistic untargeted feature detection algorithm for liquid chromatography coupled to high resolution mass spectrometry (LC-HRMS) using artificial neural network (ANN) is presented. The feature detection process is approached as a pattern recognition problem, and thus, ANN was utilized as an efficient feature recognition tool. Unlike most existing feature detection algorithms, with this approach, any suspected chromatographic profile (i.e. shape of a peak) can easily be incorporated by training the network, avoiding the need to perform computationally expensive regression methods with specific mathematical models. In addition, with this method, we have shown that the high resolution raw-data can be fully utilized without applying any arbitrary thresholds or data-reduction, and therefore improving the sensitivity of the method for compound identification purposes. Furthermore, opposed to existing deterministic (binary) approaches, this method rather estimates the probability of a feature being present/absent at a given point of interest, thus giving chance for all data-points to be propagated down the data-analysis pipeline, weighed with their probability. The algorithm was tested with datasets generated from spiked samples in forensic and foodsafety context, and has shown promising results by detecting features for all compounds in a computationally reasonable time. KEYWORDS: Neural-Network; Bayesian Statistics; LC-HRMS; Untargeted Screening;

ACS Paragon Plus Environment

Page 2 of 31

Page 3 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

INTRODUCTION In toxicology screening, compound identification can be approached either as targeted or untargeted screening process1-3. For both approaches, in most laboratories, highly sensitive liquid chromatography coupled to high resolution mass spectrometer (LC-HRMS) is commonly used as the state-of-the-art technique4-9. In recent years, there has been a growing interest and realization to analyze compounds beyond target compound lists, and therefore a shift towards non-targeted screening is on its way. In general, the process of untargeted compound screening for LC-HRMS can be summarized in to two main steps. (i) Feature detection, and (ii) feature matching. The first step (i) is efficient detection of all possible peak features present within the raw-data, taking in to account the two-dimensional separation space (LC and MS). The main aim of this step is to basically identify interesting regions within the separation space consisting of signal generated by an actual compound. These interesting regions can be selected by identifying features that could resemble predefined peak shapes. For LC-HRMS, taking into account the two-dimensions of separation (LC and MS) have their own unique feature characteristics; the feature detection process should take this into account. In the chromatographic domain, the elution process of a given compound retained by the column of the system is usually summarized as having a Gaussian like shape with a long tail, and thus, it's a common approach to use a Gaussian function as a representative mathematical model for feature detection10-12. However, all factors affecting the peak shapes have not yet been full characterized, so therefore other mathematical models have also been proposed to represent chromatographic peaks13. In most cases, the feature detection process are approached either one dimensionally; by taking the chromatographic and MS space separately, or two-dimensionally; using three-dimensional models by taking into

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

account both separation space simultaneously14-15. In our previous work, we have highlighted the benefit of using a multidimensional model to capture all common characteristics of an LCHRMS data governed by probabilistic framework16-17. In the second step of untargeted screening (ii), given the detected peak features from the first step (i), an exhaustive probabilistic assessment to identify possible match between the detected features and any candidate compound that could explain the observed pattern should follow. Due to the complexity and large size of high resolution datasets, almost all currently existing algorithms approach the above mentioned steps in overly simplistic manner; by first reducing the data-size using centroiding approach, and/or using a high cutoff value for the intensity (threshold). Such an approach is usually not only vulnerable to loss of important chromatographic profiles (i.e. centroiding results in loose of peak shape in the mass/charge domain), but also discards low abundant peak features with chemical relevance, too early in the data analysis pipeline. In addition, the methods are limited by the non-exhaustiveness of the compound database utilized for matching the detected features, and thus the term untargeted screening appears to be misused18-21. Therefore the weakness in both steps (i and ii) need to be addressed. For this work, taking into account the first step (feature detection) can be considered as a pattern recognition problem, the practicality of artificial neural network (ANN) for this particular purpose has been explored. Ever since ANN has emerged in the 1940's, it has been shown to be a very useful tool for pattern recognition in several fields22-23. Similarly, ANN and other machine learning tools have also found more application areas in the analytical chemistry domain24-30. For this work, the novelty of the proposed feature recognition algorithm based on ANN lies in its capability to avoid the limitation of predefined peak shapes based on mathematical models, in which most

ACS Paragon Plus Environment

Page 4 of 31

Page 5 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

existing algorithms are strictly dependent-on. In addition, the method also efficiently utilizes the high resolution LC-MS raw-data by avoiding regression methods commonly applied in most algorithms for peak model least-squares-fitting, involving computationally expensive matrixinversion process. Lastly, opposed to deterministic approaches governed by stringent criteria, here a probabilistic method based on Bayesian framework, capable of incorporating any prior knowledge to estimate the presence/absence of feature is presented. In comparison to commercial/open-source software developed for screening purposes, to our knowledge, the proposed algorithm is the first method developed to strictly handle LC-HRMS data by detecting all possible features without any preprocessing or threshold in a computationally reasonable time.

Theory The basis for applying ANN for the purpose of peak detection in an LC-HRMS data lies on the fact that a peak feature generated from a given compound of interest (COI) has a specific feature that a certain algorithm can be trained to recognize. That way, once the algorithm has been trained, a quick scan over the entire dataset in search of similar peak features can allow efficient peak detection. To further elaborate on this notion, a peak generated by a random compound is assumed. The data-points of the peak signal generated can be represented as t={t1, t2, …, tt, …, tT}, representing all acquisition times (tR) in the chromatographic domain, mt={mt,1, mt,2, …, mt,m,…, mt,M} representing all mass to charge (m/z) values at all acquisition times, and It={It,1, It,2, …, It,M} representing all intensities values of all m/z values at all retention times. With this data structure, the entire peak feature can be divided into different compartments; each possessing a specific characteristic of the overall feature, regardless of which compound

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

generated the signal. More specifically, any given peak in an LC-HRMS data has a certain recognizable feature that can be picked up by a pattern recognition algorithm. The pattern recognition process can be initiated by first training each compartment feature to the algorithm. For this process, as mentioned earlier, a further split of the peak into different compartments is necessary. Based on the data description given above, a given LCHRMS peak can be split into data-points generated at each tR. All data-points falling under each tR can be represented as; mt={mt,1, mt,2, …, mt,K}, It={It,1, It,2, …, It, K} where K represents the maximum number of data-points registered at that specific tR. Here it should be noted that, the variation in m/z values of the data-points generated at each tR is the result of the MS detector error, resulting in a Gaussian like distribution. Intuitively, the maximum intensity of each feature at each tR depends solely on which tR we are looking into. In other words, this information implies the rise and fall of the signal in the chromatographic domain. Given these patterns expected from any multivariate peak feature, the following key characteristics can be learned by a pattern recognition algorithm: i. The shape of the intensity distribution of all data-points at each tR (expected to be Gaussian). ii. The location of the most intense m/z value from all data-points at a given tR (expected to be at the center). iii. The location of the first moment of the m/z value (weighed average of m/z value with its corresponding intensity) at each tR (expected to be at the center). iv. The maximum intensity values of all data-points at each tR (expected to create a Gaussian chromatographic peak profile).

ACS Paragon Plus Environment

Page 6 of 31

Page 7 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Given the above expected characteristics, ANN has been chosen to be an efficient pattern recognition algorithm, trained to specifically detect the characteristics of not only an ideal but also a non-ideal multivariate peak shapes generated by a compound.

Neural Network Architecture The architecture of the neural network consists of three layers; the input layer, hidden layer, and output layer. The depth of the network can be generally defined by how many hidden layers the entire network consists of. For this particular work, only one hidden layer was found to be sufficient to model the data of such complexity. This approach is in accordance with the notion that a single layer can be efficiently utilized by adjusting the number of hidden neurons31. The entire architecture is built by connecting specific neurons (nodes) in a feed-forward fashion where input signal is only carried in forward direction during the prediction (pattern recognition) phase. However, during the training process, a back-propagation approach is applied (details below). Fig.1 (top) depicts the neural network architecture. The total number of inputs of the network is 504; including a bias term, all normalized intensity values (adjusted between 0 and 1) registered for each data point, the maximum intensity value and the m/z first moment at each tR. The number of inputs defined is based on the assumption that at-least 21 data-points for each dimension (m/z and tR) has been registered (Fig.1, bottom), resulting in a total of 441 datapoints. This assumption governs the coarseness of the grid used to transform the raw-data in to gridded-data for pattern recognition purposes (details below). Each input value contributes information regarding either the shape of a given component of the peak feature, or its location. In addition, the interdependence of the feature components is also taken into account by exchanging information between each tR in the network. Each input value has its own corresponding weighing factor () which is adjusted during the training phase. Given the inputs

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 8 of 31

and their corresponding , each neuron in the hidden layer receives a weighed sum of specific number of inputs with the following pre-activation function.  =  + ∑    ∗ 

(1)

On Eq.(1),  represents the bias term added to the pre-activation function of each hidden neuron, where the sub-index  represents to which hidden neuron its added to. Similarly  represents the weights of each input where  and  representing the sub-index of the hidden neuron and input, respectively. In the hidden layer, there are 21 neurons obtaining information from each component of the feature (each tR). However, each neuron receives only 64 inputs. These 64 inputs directed towards each hidden neuron can be categorized as follows; 21 normalized (adjusted between 0 and 1) intensity values of each data-point from the corresponding acquisition time (21 inputs), the first moment of the m/z values of the corresponding acquisition time (1 input), the maximum intensity of the corresponding acquisition time (1 input), and a bias term (1 input). In addition, each hidden neuron also receives the first moment of the m/z values and the maximum intensity values of the other 20 components at each acquisition times (40 inputs). Each input value is weighed by its corresponding weight ( ), summed and passed on to its corresponding hidden neuron using the pre-activation function on Eq.(1). Given this sum of weighed inputs ( ), each hidden layer applies the following sigmoid non-linear model as an activation (transformation) function: 

 = ( ) =  

(2)

On Eq.(2),  refers to the transformed output of a given node. Similarly, the transformed outputs from the hidden layers are once again assigned a weight, summed and passed on to the single node in the output layer. However, it should be noted that a bias term is added to the output neuron making it a total of 22 inputs into the output layer. The weighed sum of input is

ACS Paragon Plus Environment

Page 9 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

then transformed using the same sigmoid function described on Eq.(2). In general, the above described neural network architecture performs in two modes: training mode, which allows to adjust the weights of each input value in all layers, and then a predication mode, which gives a predictive (classification) output based on the training features used. Both modes will be described below.

Training Phase Using Back-Propagation The learning step can be achieved by what is referred to as back-propagation process. For a given LC-HRMS raw-data, the algorithm training process is governed by having a realistic representation of all expected features in an LC-HRMS data, as well as sufficient number of data-points for those features in both dimensions (time and m/z). For this work, both a simulated and real-data features has been used. However, it should be noted that in theory, there is no limit to what type of feature is to be used to train the algorithm. In-fact, any suspected profile (i.e. peak shape) can be employed. For this study, Fig.S1A-H (in supporting information) represents all the features used for training the ANN; a chromatographic peak (S1A-D), a chemical noise (S1E-F), and an electronic noise (S1H), where only Fig.S1A, S1E, and S1H are simulated, while as the rest of the depictions are obtained from an actual data. Here, it’s worth mentioning that these features expected to generalize different case scenarios were chosen by visual inspection. For detailed explanation in how the simulations were generated, back-propagation training, and iterative weight adjustment process, refer to supporting information.

Prediction Phase Using Trained Neural Network Given that the weight optimization for the neural networks corresponding to each feature has been performed independently and saved (details in supporting information), the feature detection (prediction) process for the raw LC-HRMS data can be performed. It should be noted

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 10 of 31

that, the number of data-points expected at any given time from any given random region of the raw LC-HRMS data should be the same. However, taking into account LC-HRMS data does not have an equally spaced data-structure in both dimensions (time and m/z), first, a data transformation step is necessary. Grid Based Data Transformation Grid based approach can generally allow transformation of the raw unstructured LCHRMS data into equally spaced structured dataset. However, in order not to lose any essential features present in the original data, this process has to be performed efficiently. Here the transformation process will be described. First, the raw LC-HRMS data can be represented as follows: t={t1, t2, …, tt, …, tT} is a vector representing the tR covering the full LC-HRMS separation space; m={mt,1, mt,2, …, mt,m,…, mt,M} and I={It,m,1, It,m,2, …, It,m,M} are vectors representing the m/z values and their corresponding intensities for the tth acquisition time, respectively. The vector length of m and I can differ based on which acquisition time (tR) is in question, resulting in unequally spaced dataset. Therefore, for the data transformation process, first, σ (¼ of the average peak width in the retention time dimension) and σ (the average standard deviation of the Gaussian chromatographic peak in the m/z dimension), parameters already determined during the training phase can be used to create an equally spaced grid. Here taking in to account there is a slight variation on σ based on the m/z value and instrument, as a rule of thumb we advice to take the central point of the m/z range (e.g. for 100-1000 m/z range, σ of 500 m/z ). The grid to be generated for both dimensions can be mathematical described as follows: Ṫ = t ,!" + (# × t)

(3)

Ṁ = m!" + (# × m)

(4)

ACS Paragon Plus Environment

Page 11 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

In Eq.(3) and (4), # is a vector taking values {1,2,3…} referring to the index of the gridpoint, while as (t and m), (t ,!" and m!" ), and (Ṫ and Ṁ) represent scalar values representing the increment of the grid, scalar values representing minimum observed values form the rawdata, and vectors representing the generated equally spaced grids, for the time and m/z dimensions respectively, for all cases. The values for t and m are defined as follows: t=

'(

(5)

)

m=

'*

(6)

)

Eq.(5) and (6) are based on the fact that the training of the algorithm is performed by assuming a minimum of 21 data-points for a given peak feature in both dimensions, i.e. within +(0 ± 2/0 ) × (0 ± 2/1 )2 window assuming the peak feature is centered at the origin in both time and m/z dimensions, resulting in approximately 5 data-points for every σ or σ window. However, it should be noted that any value that can result in sufficient number of data-points is possible. For this particular work, given the fact that an optimal grid system that avoids over sampling, requiring large memory, or down sampling, that sacrifices features observability is to be avoided, and therefore such an optimal value was chosen. Once the grid has been generated, the interpolation process to transform the raw-data into a grid based data takes two main steps. On the first step, given the generated grid in the m/z domain, and the observed m/z values with their corresponding intensities at each tR from the raw-data, intensity values for each m/z value of the grid are obtained using a classical linear interpolation process. That way, the twodimensional vectors (m/z and intensity) from all tR values will have the same length, and will be equally spaced in the m/z dimension as expected. In the second step, further iterative interpolation in the time dimension takes place. The number of iterations in the time domain is equivalent to the number of elements in the Ṫ vector. Given that the original spectrum at each tR

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 12 of 31

has already been transformed into equally spaced grid in the first step, the aim of this second step is to assign the appropriate spectrum for the equally spaced grid in the time domain (Ṫ). The iteration starts by assigning the spectrum of the closest tR value to the first (smallest) time point of the time grid (Ṫ). The rest of the spectrum assignment process follows an average-nearestneighbor approach. Detailed description of the data transformation can be found in the supporting information. That way, as can be seen on Fig.S2A-C (in supporting information), the shapes of the features from the original data have been efficiently retained to its corresponding gridded data depicted on Fig.S2D-F. Feature Recognition of Gridded Data Any originally existing peak feature falling within +(τ ± 2/0 ) × (4 ± 2/1 )2 window in the original untransformed data, where τ and 4 are the peak centers in the time and m/z dimensions, respectively, is assumed to fall within the [21×21] data-points window in the gridded two-dimensional space. That way, the feature detection process can be conducted efficiently by a moving window. In order to proceed with the feature detection, first, sufficient number of points in the two-dimensional separation space can be selected for hypothesis testing. These points will be referred to as points of interest (POI). For a similar approach based on point-by-point hypothesis testing, refer to reference32. Here, for a single POI, two competing hypothesis can be formulated as follows: H1: There is peak feature present centered at this point. H2: There is no peak feature present centered at this point. Given these points and the defined hypothesis, an attempt to answer the probability of which hypothesis is true can be approached from a Bayesian perspective as follows: P(H |D) = +9(:|;

9(:|;< )×9(;< ) < )×9(;< )2+9(:|;= )×9(;= )2

ACS Paragon Plus Environment

(7)

Page 13 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

P(H> |D) = +9(:|;

9(:|;= )×9(;= ) < )×9(;< )2+9(:|;= )×9(;= )2

(8)

In Eq.(7) and (8), P(H |D) and P(H> |D) are the posterior probabilities, P(D|H) and P(D|H> ) refer to the likelihoods, where D represents the data corresponding to all data-points within the vicinity of POI that will be utilized for the hypothesis testing (details below). On the other hand, P(H ) and P(H> ) refer to the prior probabilities reflecting the prior belief about the presence/absence of any peak feature at the designated hypothetical points (POI) in the 2D space. Here it should be noted that any prior knowledge in regards to the presence/absence of a peak feature based on the LC-HRMS system can be applied. Further discussion on this can be found on reference32. The estimation of the likelihoods for a given POI can be performed by first extracting [21×21] data-points centered at the POI giving a total of 441 data-points. The number of data-points extracted at a given time is equivalent to the number of data-points used to train the ANN. That way, for prediction, the number of points will be sufficient to accommodate an entire peak feature, as well as the number of data-points expected as an input to the neural network will not be violated. However, before prediction step, the information extracted around the POI should be adjusted to be zeros-centered in both time and m/z dimension; this can easily be achieved by taking the POI coordinates (τ and 4) as the mean value. Similarly, the intensity values should be normalized (adjusted between 0 and 1) making the extracted data for the feedforward prediction step compatible with the training dataset. Given the adjusted extracted data, in a feed-forward process, all the information described in the training phase of the algorithm is introduced to all three independently trained neural networks (one network supporting H , and two networks supporting H> ), and an output of the neural network is obtained. Given the three outputs for POI indicating the classification outcome of the data pattern, the likelihoods P(D|H)

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 14 of 31

and P(D|H> ) of the formulation presented on Eq.(7) and (8) can be expanded to accommodate the three case-scenarios by introducing a new parameter ? and marginalizing over it as follows: P(D|H ) = ∑A!  P(D|p! , H )P(p! |H )

(9)

P(D|H> ) = ∑A!  P(D|p! , H>)P(p! |H>)

(10)

On Eq.(9) and (10), ? refers to the three neural networks trained to distinguish the three case-scenarios (peak, chemical noise, electronic noise), with index  referring to a given case. P(D|p! , H ) and P(D|p! , H> ) refers to the probability of observing such a feature pattern under the supposition that the hypothesis H and H> are true with a given neural network, respectively. Similarly, P(p! |H) and P(p! |H> ) refer to the probability of a given neural network under the supposition that H and H> is true. However, under the supposition that H is true, the only logical network expected to be valid is the one trained to give positive output with features in Fig.S1A-D. On the other hand, for the case of H> , the networks trained to recognize chemical/electronic noise are the only logically valid networks expected to give a positive output, and thus Eq.(9) and (10) can be simplified as follows: P(D|H ) = P(D|p , H )P(p|H)

(11)

P(D|H> ) = P(D|p> , H>)P(p> |H>) + P(D|pA , H> )P(pA|H> )

(12)

For Eq.(11), taking into account there is only one network in the formulation considered, P(p |H ) automatically obtains a value of 1, while as P(p>|H ) = P(pA|H) = 0. For the case of H> described on Eq.(12), the only two valid neural networks considered under the supposition H> is true are assumed equally likely, thus, P(p>|H> ) = P(pA |H> ) = 0.5, while as P(p|H> ) = 0. The probabilities, P(D|p! , H ) and P(D|p! , H> ) on the other hand can be calculated from the output obtained from the neural networks. Here it should be noted that, for a given POI, the feature classification output from the three neural networks are assumed to be mutually

ACS Paragon Plus Environment

Page 15 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

exclusive. This assumption is based on the logic that a given feature is assumed to be only one of the predefined features at a given time. In order to make the output from the neural network compatible with Eq.(11) and (12), the classification scores can be converted to a probability. For that, the most commonly applied method in multiclass classification, naive Bayes’ classifiers or artificial neural network to convert any outputs (scores) into normalized probabilities is to implement softmax function33. If the three scores obtained from the neural networks are to be labeled as {E , E> , EA }, score to probability transformation using softmax function can be expressed as follows: P(SH ) = ∑K

 IJ

(13)

I JL<  J

In Eq.(13), P(SH ) refers to the probability transformation for a given score, where SH refers to {E , E> , EA }. Thus by substituting the output from Eq.(13), into Eq.(11) and (12), we obtain the following: P(D|H ) = P(S ) × 1

(14)

P(D|H> ) = P(S> ) × 0.5 + P(SA ) × 0.5

(15)

Finally by applying the outputs from Eq.(14) and (15) into Eq.(7) and (8), we obtain the final posterior probability of whether peak feature is present/absent. The advantage of the Bayesian formulation in this case is for its capability to allow us to incorporate any prior knowledge possessed in regards to whether a random ROI may/may-not be consisting of a peak feature, as a prior probability into Eq.(7) and (8). Details on the implementation of statistical overlap theory for such purpose can be found on reference32. However, for this work, a simple flat prior was assumed.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Experimental Materials and Methods Datasets In order to demonstrate the performance of the proposed algorithm, two datasets known to contain 35 Xenobiotics and 170 pesticides, with different concentrations, and analyzed with two different LC-HRMS systems (LC-TOF/MS and LC-Orbitrap), respectively, were selected. That way, the performance of the algorithm can be tested in different case scenarios (instrument type and concentration variation). The first dataset was obtained from a sample consisting of a human whole blood spiked with 35 known Xenobiotics with minimum required performance level (MRLP) concentration for each compound. To obtain the data, UPLC-MS (ToF) system was used. During analysis, 2 µl was injected on the UPLC-ToF system, and profile spectra were obtained at an acquisition rate of 2.5 spectra/sec for a mass range of 75 to 1100 AMU, in positive electrospray mode. The second dataset was obtained from a sample consisting of a vegetable (Lettuce) spiked with 170 pesticides, at levels corresponding to 50 µg/kg for each compound. To obtain the data, an ultimate 3000 UHPLC coupled to a Q-Exactive Orbitrap mass spectrometer system was used. The mass spectrometer was operated in positive electrospray mode with an acquisition at 70,000 resolving power. Further details can be found on materials and methods section of reference 34 and 35, for the two datasets, respectively.

ACS Paragon Plus Environment

Page 16 of 31

Page 17 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Algorithm All functions for the algorithm were developed on MATLAB Version: 8.2.0.701 (R2016a) and implemented on an Alienware AURORA_R4, Intel(R) Core(TM) i7-3820 CPU, 3.60GHZ, 32 GB RAM, 64-bit Windows 7 Professional Operating System Desktop computer.

Results and Discussion Given the datasets described in experimental section, in order to apply the feature recognition algorithm proposed, first, training the neural network based on both instrument types was necessary. Thus, the average σ and σ for both instruments were determined by observation, and the weight adjustment process as described in the Training Phase Using BackPropagation section was performed (~10 min for this case, time could vary as random weight initialization is involved). By using the trained networks for both instrument types independently, fully untargeted feature detection for the entire dataset was performed (~55 min for 1.5 GB LC-TOF MS data with 1.7×108 data-points and ~67 min for 1 GB LC-Orbitrap data with 2.7×107 data-points). Obviously the computation time is dependent on the number of points selected as POI. The output of the algorithm is a two-dimensional coordinate (tR and m/z) in the LC-HRMS separation space, and their corresponding posterior probability of whether that specific coordinate (POI) is the center of a peak feature. Given the probabilistic output of the algorithm for both dataset with the method described in theory section, and by adopting 0.5 as the logical probability borderline for the two competing hypothesis (H and H> ), all points that obtained a probability of >0.5 based on Eq.(7) were assumed to be in favor of H , indicating the presence of feature (peaks), so therefore were saved, while the rest of the results were discarded. For the dataset generated with an LC-TOF/MS system known to contain 35 Xenobiotics, a total

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 18 of 31

of 3549 peak features were detected (Fig.2). On the other hand, for the data produced with LCOrbitrap/MS system known to contain 170 pesticides, a total of 10,747 peak features were detected (Fig.3). Both Fig.2 and Fig.3, depicts four-dimensional information; the intensity (concentration) of the identified peak is proportional to the relative size of the dot representing it; color depiction refer to the posterior probability of an actual peak feature being centered at that point; the other two dimensions represent the time and m/z coordinates of the features. From the figures, it can be noted that there appears to be no correlation between concentrations (intensity) and probability of feature detection, highlighting the insensitiveness of the proposed method to concentration, but rather feature pattern, fulfilling the main of this work. In addition, sensitiveness of the method to different set of conditions is also proof that the over-fitting problem that is usually observed in ANN was overcome with the noise introduction training method discussed in the supporting information, thus the neural network was generalizing sufficiently.

Targeted Match of Detected Features For a fully untargeted compound screening, in theory, as described in the introduction section, the second step should be to identify any logical pattern out of the detected features, to identify the compound it corresponds to. Taking into account the uncertainty of the instrument involved, this step in theory should be approached in a fully probabilistic manner. However, since such an approach will go beyond the scope of this paper, here, in an attempt to validate the result, an in-house library containing expected retention time (τ ), expected m/z value (4 ), standard deviation of retention-time-shift (/0N ), and the most probable adduct (i.e. [H+], [Na+], [NH4+], [K+] ) for the compounds used to spike the samples was utilized. Given this library a direct search of the compounds from the list of detected features was conducted by using

ACS Paragon Plus Environment

Page 19 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

OPτ ± (/0 + 4/0N )R × (4 ± /1 )S criteria for a minimum of 1 and a maximum of 3 isotopes per compound. In reality, when using such criteria, from the detected features list, more than one possible candidate for each isotope per COI could be observed, however, in the final step, the peak feature with the minimum Euclidian-distance in terms of retention time and m/z from the expected value is chosen with supervision. Once these compounds were confirmed from a subset of the detected features, the rest of the peak features were labeled as peaks from unknown compounds which should further be investigated for identification. Using the above described criteria, from the first dataset (LC-TOF/MS) spiked with 35 know Xenobiotics (details of the dataset can be found on reference34), all 35 compounds were indentified, with only ~105 peak features being attributed to the compounds of interest (Xenobiotics). As an example, Fig.S3A-B (in supporting information), depicts 3 peak features perfectly matched with isotopes of broomperidol and its isotopic distribution. On the other hand, the rest of the detected peak features are assumed to have been generated from different charge states, adduct formation, compounds naturally present in human blood sample, and/or unknown compounds which needs further investigation. Similarly, for the second dataset (LCOrbitrap/MS), spiked with 170 known pesticides (details of the dataset can be found on reference35), out of the 10,747 peak features detected, by using the same criteria as described above, all 170 compound were identified with only ~600 peak features being attributed to the compounds of interests (pesticides). As an example, Fig.S4A-B (in supporting information) depicts 3 detected peak features in perfect match with the isotopes and isotopic distribution of acetamiprid. The rest of the detected peak features were labeled as unknown, and attributed to adduct formation, different charge states, or unknown compounds possibly present in the food sample itself, which requires further probabilistic assessment. Here it should be noted that in the

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 20 of 31

feature recognition step, when the algorithm encounters partially resolved peaks, the advantage of estimating the probability for several data-points (POIs) becomes apparent; i.e. the chance of missing the entire peak feature due to unexpected peak shape is significantly minimized. This is one of the biggest advantages of the proposed method in comparison to classical deterministic methods were an entire region is assessed only once by fitting a single peak model, which are vulnerable to give false negative results if the parameter of the model happen to slightly miss the actual configuration of the data (e.g. if the center of the peak model and the data point don’t coincide). Fig.S5 (in supporting information) depicts such an ambiguous scenario observed for the most abundant isotope of carbarly in the second dataset. For this compound, there is a clear co-elution occurring distorting the peak shape. However, the assessment of several data-points within that region resulted in four data-points obtaining a higher posterior probability (0.9) with respect to the designated threshold (0.5). As can be seen on Fig.S5A-B each co-eluting peak obtained two significant data-points (peak-center and its vicinity) that were utilized to identify the isotope of the compound with OPτ ± (/0 + 4/0N )R × (4 ± /1 )S criteria described above. In general, this particular example is a good indication that the neural network is generalizing efficiently, and no over-fitting during training has occurred. However, in such scenarios, even though the proposed method accomplishes the critical step of identifying a peak region, further de-convolution steps are obviously necessary. In order to check the accuracy of the algorithm further, visual inspection of the detected features labeled as from unknown sources was also conducted. Fig.4 and Fig.5 depicts 10 features each randomly selected for display, from dataset one and two, respectively. As can be seen, the detected features show variation in shapes (including symmetric and asymmetric), width, and amplitude, indicating the algorithm has managed to efficiently generalize variety of peak profiles from both datasets.

ACS Paragon Plus Environment

Page 21 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

It’s critical to understand that, in assumption there was no a priori knowledge about the compounds present in the datasets, in a fully untargeted approach, the probability of every possible, chemically and physically stable compound on the planet, should be a suspect, which makes the list of possible hypothesis enormously high, but yet again the appropriate statistical approach. However, we have limited this work to just the first step focused on developing a robust feature recognition algorithm. The second part aimed at using these features for exhaustive compound identification will be addressed in our future work.

Conclusion In a strict definition, untargeted screening refers to compound identification process fully dependent on the algorithms capability to identify all existing peak features in the raw-data. Given these peak features, a probabilistic assessment in an attempt to understand which of the known or unknown compounds could possibly result in the detected peak features should follow. Unfortunately, most currently existing open-source/commercial software packages are not equipped to efficiently handle the large, high resolution datasets, thus usually applying some sort of data-reduction or pre-processing, vulnerable to discard valuable information too early in the data analysis pipeline, resulting in a propagated error affecting the compound identification step. In our previous work28 we had demonstrated that probabilistic peak detection is superior to deterministic algorithms and have shown that it outperforms widely used existing algorithms. In this work, a novel, fully untargeted, faster and efficient probabilistic feature recognition algorithm based on artificial neural network (ANN) for LC-HRMS data has been developed and its usefulness in forensic and food-safety toxicology context has been demonstrated.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 22 of 31

Associated Content Supporting Information Training Phase Using Back-Propagation; Iterative (on-line/batch mode) Weight Adjustment; Raw LC-HRMS Data Architecture; Grid Based Data Transformation; Pseudo Code for Interpolation in the m/z Dimension; Pseudo Code for Interpolation in the tR Dimension. This material is available free of charge via the internet at http://pubs.acs.org.

Author Information Corresponding Author *E-mail: [email protected]. Phone: +31 20 5257040. Notes: The authors declare no competing financial interests.

Acknowledgments This work was funded by NWO and project partners (DSM Resolve, RIKILT, and NFI) as part of COAST project no. 053.21.105.

ACS Paragon Plus Environment

Page 23 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

References 1. Herrera-Lopez, S.; Hernando, M.D.; García-Calvo, E.; Fernandez-Alba, A.R.; Ulaszewska, M.M. J. Mass. Spectrom. 2014, 49, 878-893. 2. Milman, B.L. Trends Anal. Chem. 2005, 24, 6. 493-508. 3. Berendsen, B.J.; Stolker, L.A.; Nielen, M.W. J. Am. Soc. Mass Spectrom. 2013, 24, 154163. 4. Richardson, S.D. Anal Chem. 2010, 82, 4742-4774. 5. Makarov; A, Denisov, E; Kholomeev, A; Balschun, W; Lange, O; Strupat, K; Horning, S; Anal Chem.2006, 78, 2113-2120. 6. Krauss, M. Anal Bioanal Chem. 2010, 397,943-951. 7. Kind, T; Fiehn, O. Bioanal Rev.2010, 2, 23–60. 8. Marshall, A.G.; Rodgers, R.P. Proc Natl Acad Sci USA.2008, 105, 18090-18095. 9. Kim, S; Rodgers, R.P; Marshall, A.G.; Int J Mass Spectrom.2006, 251,260-265. 10. Dose, E.V.; Jacobson, S.; Guiochon, G. Anal Chem. 1991, 63,833-839. 11. Li, J. Anal Chem. 1997, 69, 4452-4462. 12. Pai, S.C. J Chromatogr A. 2003,988, 233-260. 13. Grushka E. J. Phys. Chem. 1972, 76, 2586-2593. 14. Katajamaa, M.; Oresic, M. BMC Bioinformatics. 2005,6,179. 15. Hermansson, M.; Uphoff, A; Kakela, R.; Somerharju, P. Anal Chem. 2005, 77, 21662175. 16. Woldegebriel, M.; Gonsalves, J.; Asten, A.V; Vivó-Truyols, G. Anal Chem. 2016, 88, 2421-2430.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Page 24 of 31

17. Woldegebriel, M.; Zomer, P.; Mol, H.G.; Vivó-Truyols, G. Anal Chem. 2016, 88, 77057714. 18. Herrera-Lopez, S.; Hernando, M.D.; García-Calvo, E.; Fernández-Alba, A.R.; Ulaszewska, M.M. J Mass Spectrom. 2014,49,878-893. 19. Wu, H.; Li, X.; Yan, X.; An, L.; Luo, K.; Shao, M.; Jiang, Y.; Xie, R.;Feng, F. J Pharm Biomed Anal. 2015,115,315-322. 20. Wehrens, R.; Weingart, G.; Mattivi F. J Chromatogr B Analyt Technol Biomed Life Sci. 2014, 966,109-116. 21. Wu, A.H.; Colby, J. Methods Mol Biol. 2016, 1383, 153-166. 22. McCulloch, W.S.; Pitts, W. Bull Math Biol. 1990, 52, 99-115. 23. Baczek, T.; Bucinski, A.; Ivanov, A.R.; Kaliszan, R. Anal. Chem. 2004, 76, 1726-1732. 24. Svozil, D.; Kvasnieka, V.; Pospichal, J. Chemometr. Intell. Lab. 1997, 39, 43-62. 25. Petritis, K.; Kangas, L. J.; Ferguson, P. L.; Anderson, G. A.; Pasa-Tolic, L.; Lipton, M. S.; Auberry, K. J.; Strittmatter, E. F.; Shen, Y.; Zhao, R.; Smith, R. D. Anal. Chem. 2003, 75, 1039-1048. 26. Zupan, J.; Gasteiger, J. Anal. Chim. Acta 1991, 248, 1-30. 27. Ekins,S.; Perryman, A.L.; Clark, A.M.; Reynolds, R.C.; Freundlich, J.S. J Chem Inf Model. 2016, 56, 1332-1343. 28. Falchi, F.; Bertozzi, S.M.; Ottonello, G.; Ruda, G.F.; Colombano, G.; Fiorelli, C.; Martucci, C.; Bertorelli, R.; Scarpelli, R.; Cavalli, A.; Bandiera, T.; Armirotti, A. Anal Chem. 2016, 88, 9510-9517. 29. Bade, R.; Bijlsma, L; Miller, T.H.; Barron, L.P.; Sancho, J.V.; Hernández, F. Sci Total Environ. 2015, 538, 934-941.

ACS Paragon Plus Environment

Page 25 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

30. Miller, T.H.; Musenga, A.; Cowan, D.A.; Barron, L.P. Anal Chem. 2013, 85, 1033010337. 31. Hornik, K. Neural Netw. 1991 4, 251-257. 32. Woldegebriel, M.; Vivó-Truyols, G. Anal. Chem. 2015, 87, 7345-7355. 33. Costa, M. Int J Neural Syst. 1996, 7,627-637. 34. Woldegebriel, M.; Gonsalves, J.; Asten, A.V; Vivó-Truyols, G. Anal Chem. 2016, 88, 2421-2430. 35. Woldegebriel, M.; Zomer, P.; Mol, H.G.; Vivó-Truyols, G. Anal Chem. 2016, 88, 77057714.

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

FIGURE CAPTIONS Figure 1. Top. Depiction of the neural network topology implemented in this work. Bottom. Zoomed-in representation of a single component of the network. Figure 2. Depiction of all peak features detected from two LC-HRMS (LC-TOF MS) datasets. Dot-size represents maximum intensity of the peak, color represent posterior probability. Figure 3. Depiction of all peak features detected from two LC-HRMS (LC-Orbitrap MS) datasets. Dot-size represents maximum intensity of the peak, color represent posterior probability Figure 4. Depiction of 10 randomly selected detected peak features from LC-TOF MS dataset generated from unknown sources. Figure 5. Depiction of 10 randomly selected detected peak features from LC-Orbitrap MS dataset generated from unknown sources.

ACS Paragon Plus Environment

Page 26 of 31

Page 27 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Figure 1

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 2

Figure 3

ACS Paragon Plus Environment

Page 28 of 31

Page 29 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

Figure 4

ACS Paragon Plus Environment

Analytical Chemistry

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Figure 5

ACS Paragon Plus Environment

Page 30 of 31

Page 31 of 31

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60

Analytical Chemistry

For TOC only

ACS Paragon Plus Environment