Chapter 4
Statistical Approaches for LC-HRMS Data To Characterize, Prioritize, and Identify Transformation Products from Water Treatment Processes

Jennifer E. Schollée,*,1,2 Emma L. Schymanski,1 and Juliane Hollender1,2

1Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
2Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, 8092 Zürich, Switzerland
*E-mail: [email protected].
Studying the formation of unknown transformation products (TPs) from water treatment processes can be a daunting task due to the high volume of information generated with modern analytics such as non-targeted liquid chromatography high-resolution mass spectrometry. To disentangle and select those unknown compounds, including TPs, a variety of statistical methods can be applied. Significance testing and fold changes can provide an overview of those non-target features in post-treatment samples that are both statistically significant and large in magnitude. Time trend analysis can select non-target features that follow expected intensity trends. Finally, multivariate analyses such as principal component analysis, hierarchical clustering, and partial least squares can cope with co-varying features to help characterize and group unknown non-targets. With proper sampling and pre-processing, these tools can help to prioritize and identify potential TPs that may be relevant in the environment. In this review, different approaches are presented using examples from the literature and our own research.
© 2016 American Chemical Society
Introduction

The search for transformation products (TPs) in the aquatic environment often focuses on known compounds, especially TPs of contaminants of emerging concern (CECs), such as pharmaceuticals, pesticides, artificial sweeteners, X-ray contrast media, flame retardants, etc. While target and suspect screening approaches are very successful at detecting known compounds, non-target screening methods are increasingly being applied to detect unexpected TPs (1–3). Since non-target refers to anything not included in target and suspect screening lists (4), this can constitute a large number of peaks, especially when using high-resolution mass spectrometry (HRMS). With the inclusion of steps such as solid-phase extraction (SPE) for enrichment and chromatography for separation, large numbers of compounds can be measured simultaneously (5–7). Therefore, in many cases measurement or detection of these TPs is no longer the bottleneck; the focus has turned to analyzing these large datasets and prioritizing which of these non-target peaks may be TPs and therefore interesting for identification.

This chapter will explore the various statistical approaches that have been employed so far to find TPs generated from water treatment processes. The statistical tools presented here cover a variety of possible methodologies, from univariate to multivariate analyses (MVA), from unsupervised to supervised methods, and the inclusion of additional data for prioritization. Many of these tools are implemented in open-source or vendor software.

Treatment processes encompass different matrices and applications and can be applied for many reasons, e.g., wastewater treatment, drinking water treatment, and river bank filtration. During these treatments, various physical, chemical, and biological processes such as sorption, advanced oxidation, and biodegradation remove compounds from the water matrix. Occasionally multiple steps are applied in series, e.g., in wastewater treatment, where filtration may act as a primary treatment, followed by a secondary treatment such as conventional activated sludge (Figure 1). Within one treatment (step), multiple processes may also occur, e.g., in powdered activated carbon, where both sorption and biodegradation can lead to the removal of compounds. When studying these treatment processes, researchers sample the pre-treated water (also referred to as the influent) and the post-treated water (also known as the effluent) of either the entire treatment train or of individual steps. In this way, a comparison can be made of the changes that have occurred within the treatment step(s) in question.
Figure 1. Example of a wastewater treatment train with multiple types of treatment processes applied in combination.
From these natural and engineered systems, it is expected that TPs behave in many cases in predictable ways that are therefore amenable to confirmatory data analysis such as hypothesis testing. In the simplest form, it is expected that a parent compound is removed and in its place a TP is formed by a specific process such as oxidation. Of course, the real situation is much more complicated, and one always needs to keep this in mind when applying statistics to finding TPs. For example, there is not necessarily a one-to-one relationship between parent and TP. One parent compound can form multiple TPs (8), but the same TP can also come from multiple parent compounds (9). Especially when dealing with mixtures, it can be a daunting task to unravel these relationships. Additionally, newly formed TPs can themselves become “parent” compounds, leading to more TPs (10). Depending on the rate of the reaction mechanism, sometimes these first-generation TPs are not even detected or observed (8). Reactions are also not only one-directional. A TP may transform back to the parent compound under the right conditions, completely reversing the expected trend (11). These situations can prove challenging for statistical hypothesis tests.

It can therefore be worthwhile to consider exploratory data analysis. This type of analysis may be more useful initially to visualize trends in the data, after which more specific hypotheses can be constructed. These techniques can include simple univariate visualizations, such as box plots or scatter plots, or multidimensional methods, e.g., principal component analysis (PCA).

The application of statistical tools to finding non-target TPs that result from these processes is still in its early stages. In contrast, within the metabolomics community, extensive time and effort has already been dedicated to investigating and evaluating various statistical approaches to select unknown metabolites. Metabolomics is the study of small molecules that result from specific cellular processes; the metabolites studied are therefore biological molecules such as amino acids, sugars, vitamins, etc., which likely perform a cellular function. Metabolomics studies often compare control and treated samples to identify differences in the presence of a stressor such as disease. Hence there is some comparability between studies searching for TPs and those that seek to identify metabolomic changes. Therefore, a short introduction to the tools and workflows common in metabolomics will be given, with an eye on how these may be applied to studies searching for TPs. This chapter focuses on studies that used liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS), since CECs and their TPs, especially those relevant in water samples, are best detected with this method.

Considerations Prior to Statistical Analysis

Prior to any statistical analysis, sampling, measurement, and pre-processing are crucial steps to obtaining useful data. Challenges in sampling include accounting for and, if possible, removing variability. For example, inputs to wastewater treatment plants (WWTPs) vary with time of day, day of the week, and location (12). In contrast, effluents are more averaged due to mixing and the retention time of the wastewater plant. Therefore, it is important to select a proper sampling strategy that takes this variability into account (13).
Analytics start with selecting an appropriate method for measurement to adequately detect the compounds of interest. In the case of LC-HRMS measurement, this includes proper chromatographic separation, correct ionization mode, sufficient MS resolution and accuracy, and a limit of detection (LOD) low enough to detect compounds in the expected concentration range. Since no method will cover all compounds, during data analysis the researcher should be aware of the types of chemicals that can be detected with the selected analytical method. Additionally, sampling and measurement may introduce their own biases into the data, which can affect the data analysis.

When applying univariate metrics to compare signal intensity between peaks in different samples, one needs to be aware of potential matrix effects arising from different sample matrices. For example, the influent and effluent of a WWTP, or even different surface waters, can have very distinct matrices. These different matrices can lead to ion suppression, which in turn makes it appear that there are differences in concentration where in fact there are none, or vice versa. These matrix effects can be partially reduced during sample preparation; for example, it has been shown that diluting influent and effluent wastewater samples reduces matrix effects, but this can lead to the loss of compounds (14). To account for possible matrix effects, quality control samples can be included in the study design to calculate matrix factors for each sample type. Additionally, the behavior of internal standards in different sample types can be used to correct for matrix effects, but this only applies to target compounds. Matrix effects are not consistent across a sample (Figure 2), and extrapolation of this phenomenon to unknown peaks is difficult (14–17).

After measurement comes pre-processing. The conversion of raw measurement data to a dataset that can then be analyzed is a crucial and non-trivial step. As demonstrated by many, for example Nürenberg et al. (14), a proper pre-processing workflow is necessary to obtain reliable data. This generally includes the following four major areas:
(a) Peak picking, where distinct chromatographic and mass peaks are detected and separated;
(b) Feature building, where detected peaks are combined across samples;
(c) Feature grouping, where peaks belonging to the same compound, such as isotopes and adducts, are grouped together;
(d) Blank and blind subtraction, where peaks belonging to matrix, contamination, or background are removed from the sample data.
Peak picking is the process of identifying unique chromatographic features measured by the HRMS. Good peak picking starts with a proper method, since peaks that are not chromatographically separated will remain challenging for an algorithm to detect. Broad peaks or very low intensity peaks may also not be detected during peak picking, depending on the algorithm and settings applied. Additionally, without sufficient mass accuracy, different isotopic peaks may not
be resolved. The user must select the analytical method with the highest peak capacity for the relevant compounds, but this remains a challenge when the compounds of interest are unknown before analysis, such as in non-target screening (18).
Figure 2. Ratio of areas of each internal standard for matrix factor correction. While a slight downward trend could be observed, the correlation was not strong enough to justify a blanket matrix factor correction of all non-target peaks, as demonstrated by the low R²-value of the linear regression. The outlier at 4.78 minutes is the compound Ritalin.
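To illustrate the kind of internal standard check underlying Figure 2, the following is a minimal sketch in R, assuming a small table of internal standard areas measured in a solvent standard and in a matrix sample; all column names and values are illustrative placeholders, not data from the original study.

```r
# Hypothetical internal standard (IS) areas; values are placeholders.
is_areas <- data.frame(
  rt        = c(2.1, 3.4, 4.78, 6.0),         # retention time (min)
  area_std  = c(1.0e6, 8.5e5, 9.0e5, 1.2e6),  # IS peak area in solvent standard
  area_samp = c(7.2e5, 6.0e5, 1.1e6, 8.0e5)   # IS peak area in matrix sample
)

# Matrix factor per IS: ratio of area in matrix to area in standard
is_areas$matrix_factor <- is_areas$area_samp / is_areas$area_std

# Regress matrix factor against RT; a low R^2 (as in Figure 2) argues against
# applying a blanket matrix factor correction to all non-target peaks.
fit <- lm(matrix_factor ~ rt, data = is_areas)
summary(fit)$r.squared
```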
Feature building consists of comparing peaks detected in each sample in order to construct a data matrix that includes all peaks detected in all samples. While in theory each compound should have a defined mass-to-charge ratio (m/z) and retention time (RT) associated with it, in reality shifts may occur during measurement. Drift can occur on an instrument, both in RT and m/z, which needs to be either corrected for with an algorithm or accounted for with a tolerance window. Many strategies have been suggested for tackling these drifts, but this remains a source of error in constructing data matrices. Additionally, after constructing the data matrix, there is the issue of missing values. In most cases a missing value is not unimportant; a non-detect may be just as valuable as a detection. The researcher must determine how best to represent this in the dataset, depending on the study question. Many different strategies exist, including replacing the missing values with zero, half the LOD, the mean, or more complicated imputations. The technique selected can have large implications for the downstream data analysis (19, 20).
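As a minimal sketch of these simple imputation strategies, the following R function replaces missing values in a feature matrix (features in rows, samples in columns); the function name and the LOD value are assumptions for illustration only.

```r
# Replace missing values (NA) in a feature matrix X with a chosen placeholder.
# 'lod' is an assumed limit of detection on the intensity scale.
impute_missing <- function(X, method = c("zero", "half_lod", "mean"), lod = 1000) {
  method <- match.arg(method)
  for (i in seq_len(nrow(X))) {
    miss <- is.na(X[i, ])
    if (!any(miss)) next
    X[i, miss] <- switch(method,
      zero     = 0,
      half_lod = lod / 2,
      mean     = mean(X[i, ], na.rm = TRUE))  # mean of the detected values
  }
  X
}

# Example: a 3-feature x 4-sample matrix with two non-detects
X <- matrix(c(5e4,  NA, 4e4, 6e4,
               NA, 2e5, 1e5, 9e4,
              3e4, 3e4, 2e4, 5e4), nrow = 3, byrow = TRUE)
impute_missing(X, method = "half_lod")
```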
Feature grouping involves grouping together detected peaks that result from the same compound (also termed componentization). For example, in LC-HRMS, isotopes, adducts, and/or in-source fragments are detected in addition to the monoisotopic peak of a compound. Grouping these peaks together into a feature is an often overlooked but nonetheless important aspect of pre-processing. In many statistical methods, the presence of co-varying variables may lead to bias or skew in the data analysis. Since these peaks result from the same compound, they should inherently co-vary; therefore, their removal is important for the construction of a data matrix. As these peaks provide information about the molecular formula, which is in turn important for structure elucidation, this information must be retained separately. Feature grouping faces similar challenges as feature building, e.g., the mass accuracy of the measurement may require an m/z tolerance window. Additionally, depending on the resolution of the measurement instrument, isotope peaks may not be completely resolved, also leading to a shift of the measured m/z.

Finally, selecting a proper blank and/or blind sample is critical (21) but can be a challenge. The more representative a blank/blind is of the true background, the more noise can be eliminated from the data analysis. For example, some researchers have used the pre-treatment sample as a background and subtracted it from the post-treatment sample, resulting in a new chromatogram showing only those peaks higher after the treatment, which are therefore likely to be TPs (22, 23). However, this background sample can be difficult if not impossible to obtain. Prioritization based on mass defect to select compounds that contain halogens (and are therefore likely to be anthropogenic) has also been applied successfully to separate out, e.g., relevant non-targets from sediment matrix (24). In the case of experiments that involve spiking, an unspiked sample can easily serve as the reference for blank subtraction (14, 25), but this is not an option when working with unspiked samples. Two methods for blank correction have been suggested: blank exclusion, where any feature detected in a blank is removed, and blank subtraction, where the intensity of a feature in a blank is subtracted from the intensity of the same feature in the sample. Blank exclusion is the more conservative approach, although it has been shown that there was not a significant difference between the two (14). A safety factor can also be included, where features that are, for example, ten times more intense in the sample than in the blank are still retained; a sketch of both approaches is shown below.
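The following is a minimal R sketch of the two blank correction strategies plus the safety factor just described; the function and argument names are illustrative assumptions, not from any of the cited studies.

```r
# Blank correction for a vector of feature intensities.
# 'exclusion' removes features detected in the blank unless the sample
# intensity exceeds 'safety' times the blank intensity; 'subtraction'
# subtracts the blank intensity (floored at zero).
blank_correct <- function(sample_int, blank_int,
                          mode = c("exclusion", "subtraction"), safety = 10) {
  mode <- match.arg(mode)
  if (mode == "exclusion") {
    keep <- blank_int == 0 | sample_int >= safety * blank_int
    ifelse(keep, sample_int, NA)   # NA marks excluded features
  } else {
    pmax(sample_int - blank_int, 0)
  }
}

sample_int <- c(1e5, 5e4, 2e6, 8e4)
blank_int  <- c(0,   4e4, 1e5, 8e4)
blank_correct(sample_int, blank_int, "exclusion")    # keeps features 1 and 3
blank_correct(sample_int, blank_int, "subtraction")
```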
Metabolomics Workflows

Within the metabolomics community, a number of tools have established themselves as the standard for non-target screening, generally known in metabolomics as untargeted screening. For example, XCMS Online (xcmsonline.scripps.edu) (26, 27) is a powerful online platform that includes many features for data visualization and statistical analysis. The latest version of the software includes pre-processing of the data as described above, using the XCMS approach (28, 29), as well as many univariate and multivariate data
visualization techniques. In the classical study design, a two-group comparison is made (e.g., control vs. treated). Relevant features can then be selected based on the significance of the difference in peak intensity with the selected statistical test, as well as on their fold change. This can then be further visualized in a cloud plot. In a recent update to XCMS Online, tests are implemented not only for two-group comparisons but also for multigroup comparisons.

MVA is also prevalent in metabolomics research, since it provides an opportunity to search for interesting spectral features that differ between sample types while being robust against intra-class variability. The two main tools used for this are PCA and partial least squares projection to latent structures (PLS) (30). XCMS Online provides PCA to explore the relationship between sample types and which features may be characteristic of a specific group. Another common software for MVA in metabolomics is SIMCA (Umetrics Inc.) (31). This software includes two additional multivariate techniques, PLS and orthogonal projection to latent structures (OPLS). PLS has the advantage over PCA that it is a supervised method; a response vector can be included to provide information about the classes to which the samples belong. For example, samples can be assigned to their respective sample types, or a more complicated response vector can be used, such as time or concentration. The method then seeks to find the variables that best explain the differences indicated by the response vector. This method can be especially useful if there are large variations within sample types that PCA would otherwise seek to explain (e.g., variations arising from changing inputs or fluctuating treatment efficiencies). However, the user should be cautious, since PLS and OPLS assume that there is a difference between sample types and can on occasion over-fit the data (32). Generally, it is advised to use an unsupervised method first to investigate the inherent structure of the data, for example to visualize the intra-class vs. inter-class variability and to identify possible outliers. After this, supervised methods can be applied to answer more specific research questions.

Compared with environmental analyses, statistical analysis in metabolomics still has some advantages. Matrix effects generally differ less between control and treated groups in a metabolomics study than between, for instance, the influent and effluent of a wastewater treatment process. Additionally, the chemical space of interest in metabolomics is significantly smaller, e.g., few structures contain halogens. Extensive time and effort has been put into developing and sharing spectral libraries of metabolites (33). This has made it possible for metabolomics to start to move away from non-target studies and toward quantitative analysis (34). In environmental analysis, this trend is actually reversed. For many years, analyses focused on the set of known environmental pollutants, using those on a regulatory list. But it has become clear that these known pollutants do not explain the effects observed in biological communities exposed to environmental samples. Therefore, with the advent of new analytical techniques, especially highly resolved and accurate MS, environmental analysis has moved towards non-target analyses (3).
And while there has been progress in developing spectral libraries of environmentally relevant compounds (35–37), the number of chemicals necessary to include is considerably larger and the task therefore much more challenging.
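As a minimal sketch of the unsupervised-first strategy recommended above, the following R code runs a PCA on a simulated feature matrix and plots the first two principal components colored by sample type; all data here are simulated placeholders, not results from any cited study.

```r
set.seed(42)
# Simulated feature matrix: 20 samples (rows) x 100 features (columns)
X <- matrix(rnorm(20 * 100), nrow = 20)
group <- factor(rep(c("influent", "effluent"), each = 10))

# Autoscaled PCA (center and scale each feature before decomposition)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Score plot of the first two components to inspect intra- vs. inter-class
# variability and spot potential outliers
plot(pca$x[, 1], pca$x[, 2], col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group), col = 1:2, pch = 19)
```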
Temporal and Spatial Trend Analysis

The simplest system in which to investigate TPs formed during water treatment is a batch experiment. In this case, a sample from the treatment plant (such as activated sludge) is spiked with the parent compound(s) of interest, and the treatment process under investigation is applied for a certain length of time. Samples are collected after treatment either once or multiple times and then compared to the initial t=0 sample to observe the behavior of the parent compound and the formation of any TPs.

To study biotransformation in a WWTP, Helbling et al. spiked 12 micropollutants into batch reactors with activated sludge and observed the removal of those compounds over a number of days (10). To find unknown TPs, two samples (t=0 and t>0) from a single biotransformation experiment were compared and a series of logical filters were then applied. For example, since TPs are generally smaller and more polar than their parent compounds (38), they should have a lower m/z and elute at an earlier RT. Additionally, intensities in the t>0 sample were required to be 1.5-fold higher than in the t=0 sample to account for TP formation. With this method, 26 TPs were identified from the spiked parent compounds. This workflow was further developed by Gulde et al. (8). There, in addition to the logical filters, a series of time points was used to establish a trend of increasing concentrations using Sieve (Thermo Scientific) software, which led to the identification of 101 TPs from 19 parent compounds. A similar method was used by Li et al. to find 11 TPs from 9 parent compounds in water/sediment batch experiments (39).

These types of experiments have several advantages. As mentioned, using spiked samples means unspiked samples can serve as a relevant blank. Also, using a set of spiked parent compounds makes subsequent structure elucidation more straightforward, since the unknown TP is related in some way to the known parent compound. Finally, these experiments can also provide additional information such as reaction rates and may therefore also help prioritize TPs that are more likely to be persistent. Unfortunately, testing all parent compounds in this manner is unfeasible; therefore, researchers have moved toward looking for TPs in real-world samples.

For the investigation of non-target peaks within the Rhine River, a novel method was applied by Ruff et al. (2). Samples were collected at six points along the stretch of the river from Switzerland to the Netherlands. A list was generated of the 150 most intense features measured at all stations. Then, at each sampling location, site-specific unknown compounds were found by selecting those features detected at all downstream sampling locations but not at upstream sampling locations. This systematic comparison of upstream and downstream samples resulted in 57 unknown substances for further investigation. While this study was not directly focused on TPs, since WWTP effluents are the main anthropogenic input into the river system (40, 41), it is likely that a number of relevant TPs were present.

Gago Ferrero et al. incorporated temporal information into the identification evidence for supposed TPs found during non-target analysis of their samples, rather than into the selection of potential TPs (42). They postulated that pharmaceuticals and their TPs would follow similar weekly and/or diurnal trends.
While this approach worked well for selected examples, e.g., clarithromycin, N-desmethyl clarithromycin, and hydroxyclarithromycin (Figure 3 in Gago Ferrero et al.), the trends were less obvious for other cases such as venlafaxine and its TPs, as well as nicotine and its TPs (Figure S5, Gago Ferrero et al.). In the end, this information provided some additional evidence but was not as clear as might have been anticipated.
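As a rough illustration of the logical filters used by Helbling et al. and Gulde et al. above, the following R sketch selects candidate TP features from a batch experiment; the data frame columns (m/z, RT, intensities at t=0 and t>0) and the helper name are assumptions for illustration.

```r
# Candidate TPs: lower m/z and earlier RT than the spiked parent, and at least
# a 1.5-fold intensity increase from t = 0 to t > 0.
find_tp_candidates <- function(features, parent_mz, parent_rt, fold = 1.5) {
  subset(features,
         mz < parent_mz &
         rt < parent_rt &
         int_t1 >= fold * pmax(int_t0, 1))  # pmax avoids comparing against zero
}

# Illustrative feature table from a hypothetical batch experiment
features <- data.frame(
  mz     = c(250.12, 180.07, 300.15, 166.05),
  rt     = c(8.2, 5.1, 9.0, 4.3),
  int_t0 = c(1e6, 0,   2e4, 5e3),
  int_t1 = c(2e5, 8e4, 2e4, 6e4)
)
find_tp_candidates(features, parent_mz = 250.12, parent_rt = 8.2)
```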
Univariate Statistics

Depending on the question at hand, simple methods can be used to select peaks of interest effectively. Zonja et al. used the frequency of detection in real-world samples to select TPs that were environmentally relevant (43). They conducted batch experiments with six X-ray contrast media, then prioritized unknown masses by comparing them to a list of predicted TP masses. These masses were then selected by their presence in real-world samples, such that identification focused only on those TPs that had been detected in surface waters. This efficient method led to the detection of 11 TPs. While the workflow was not truly non-target because of the incorporation of a suspect screening step with predicted TPs, it demonstrates a method for directly translating batch experiment results to environmental samples.

In a study on 10 Swiss wastewaters, non-target substances of interest were selected based on their presence in all samples, prioritized by highest average intensities (1). On the one hand, this captured omnipresent substances, which are thus potentially highly relevant for the environment, such as an exposure-relevant TP of the industrial solvent benzothiazole. On the other hand, matrix substances and natural matter that are present everywhere were also prioritized. An interesting consequence was that certain groups of substances that degraded differently in various WWTPs were missed. The glycol ether sulfate (GES) surfactant series found in Gago Ferrero et al. (42) was also found in Swiss wastewater influents (Figure 3a) and retrospectively in some effluents from Schymanski et al. (shown here in Figure 3b), but was well removed in other effluents (Figure 3c, where several members of the series are close to or below detection limits). As a result of the intensity prioritization, this series was missed in the original investigation, despite very high signals in some samples.

One of the first examples of applying statistical methods to prioritize non-target peaks was reported by Müller et al. (44). Their strategy focused on using mathematical operators to analyze how features changed in different sample types. In this case, they selected features that were detected in both a landfill leachate sample and a downstream groundwater sample, while excluding those in the blank. In contrast to time-trend analyses, no specific trend was needed here, merely the presence of the feature in the relevant sample type; hence, only presence/absence information was necessary for the features. For this reason, one data restriction was the detection of a feature in multiple samples, to reduce false positives. With this method, they selected three non-target features that appeared in the drinking water after ozonation and identified the source of the contamination. This type of approach, sketched below, could be easily adapted for different study questions, provided the correct samples are collected.
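A minimal R sketch of this kind of presence/absence logic, assuming feature identifiers encoded as hypothetical "m/z_RT" strings detected in each sample type (the values are invented for illustration):

```r
# Features (as "mz_rt" identifiers) detected in each sample type
leachate    <- c("201.0512_5.2", "275.1432_8.1", "310.0771_9.4")
groundwater <- c("201.0512_5.2", "310.0771_9.4", "118.0863_2.3")
blank       <- c("118.0863_2.3")

# Keep features present in both leachate and downstream groundwater,
# excluding anything also found in the blank
candidates <- setdiff(intersect(leachate, groundwater), blank)
candidates  # "201.0512_5.2" "310.0771_9.4"
```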
Figure 3. The glycol ether sulfate (GES) surfactant series in (a) Swiss wastewater influent and (b) wastewater effluent where the series is poorly removed and (c) Swiss wastewater effluent where the series is well removed (note the intensity differences). Inset: series members and the structure. Data from Schymanski et al. (1) and Gago Ferrero et al. (42)
Other studies have applied more metabolomics-like workflows to identify potential TPs. Negreira et al. used a combination of Student's t-test and fold change to select those features that were significantly different between control and spiked samples in a chlorination experiment (45). They detected 19 TPs from the parent pharmaceutical. In another example of batch experiments, Singh et al. selected TPs formed during advanced oxidation based on those features with at least a 2-fold intensity change between the pre-treatment and post-treatment samples (25). Means of the treatment types were compared with one-way ANOVA, Tukey's honestly significant difference (HSD), and the Benjamini-Hochberg false discovery rate (FDR). Using this method, a 75% reduction in unknown peaks could be achieved. As already discussed, the presence of relevant blanks in both these studies was likely crucial for the successful implementation of the workflow. Since spiked wastewater was used for these batch experiments, an unspiked wastewater sample could be used for blank subtraction, which is particularly useful in a complex matrix such as wastewater.

Fold change and univariate statistics have also been evaluated and applied in our own work. Samples were collected from the influent and effluent of a WWTP equipped with anaerobic and aerobic conventional activated sludge treatment; further details are given in Schollée et al. (15). In a validation test, a number of parent compounds and TPs were spiked at environmental concentrations into the influent and the effluent, respectively, and then compared with fold change and p-value
based on a combination of Welch's t-test and Mann-Whitney test, implemented with the package 'muma' (46) in the R statistical software (47). These results are shown in a volcano plot, a common type of scatter plot used to visualize changes that are both large in magnitude and statistically significant (Figure 4). It can be seen that all compounds were found to be significant (p-value
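A minimal sketch of such a volcano plot in R, using simulated influent/effluent intensity matrices and Welch's t-test (one of the tests named above); the data and cutoffs are placeholders, not those of the original study.

```r
set.seed(1)
# Simulated intensities: 200 features (rows) x 4 replicates per sample type
inf_int <- matrix(rlnorm(200 * 4, meanlog = 10), nrow = 200)
eff_int <- matrix(rlnorm(200 * 4, meanlog = 10), nrow = 200)

# Fold change (effluent vs. influent) and Welch's t-test p-value per feature
log2fc <- log2(rowMeans(eff_int) / rowMeans(inf_int))
pvals  <- sapply(seq_len(nrow(inf_int)),
                 function(i) t.test(eff_int[i, ], inf_int[i, ])$p.value)

# Volcano plot: effect size on x, significance on y
plot(log2fc, -log10(pvals), pch = 19,
     xlab = "log2 fold change (effluent/influent)",
     ylab = "-log10(p-value)")
abline(h = -log10(0.05), v = c(-1, 1), lty = 2)  # example cutoffs
```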