Downloaded by UNIV OF FLORIDA on January 12, 2017 | http://pubs.acs.org Publication Date (Web): December 12, 2016 | doi: 10.1021/bk-2016-1242.ch006
Chapter 6
A Comprehensive Workflow for Target, Suspect, and Non-Target Screening by LC/MS Demonstrated for the Identification of CECs in Effluents from Waste Water Treatment Plants
Thomas Glauner,*,1 Bernhard Wüst,1 and Thierry Faye2
1Agilent Technologies Sales & Services GmbH, Hewlett-Packard-Strasse 8, 76337 Waldbronn, Germany
2Agilent Technologies France S.A.S., 3 avenue du Canada, CS 90263, Les Ulis, Essonne 91978, France
*E-mail: [email protected].
A comprehensive strategy for target, suspect, and non-target screening of chemicals of emerging concern (CECs) in environmental samples using Agilent Q-TOF instrumentation and software is described. The workflow is demonstrated for effluent samples from waste water treatment plants (WWTPs), and important experimental parameters are explained along the way. With the highly sensitive Q-TOF instrumentation, target screening for regulated compounds in the WWTP effluents was possible below the regulatory limits. Suspect screening using the All Ions MS/MS workflow, based on the Agilent Water Screening Personal Compound Database and Library (PCDL), provided high-confidence information on additional CECs that could be expected in the samples. The use of statistical software for data reduction in non-target screening is demonstrated. The goal is to limit the time and money spent on the identification of CECs to the most relevant and emerging contaminants. Finally, the identification of compounds based on library searching and in-silico fragment comparison is shown, and the connection of MassHunter programs to open source databases and libraries is discussed.
© 2016 American Chemical Society Drewes and Letzel; Assessing Transformation Products of Chemicals by Non-Target and Suspect Screening Strategies and ... ACS Symposium Series; American Chemical Society: Washington, DC, 2016.
Introduction
Environmental regulations throughout the world, such as EU directive 2013/39/EU, currently focus on monitoring a limited number of well-known contaminants that are assumed to be responsible for significant ecological and human health related risks (1, 2). However, as these priority pollutants represent only a small fraction of the anthropogenic chemicals in use, there is growing interest in collecting occurrence data for further CECs and in identifying new and emerging risks. In the context of water reuse, it is critical to identify the most relevant CECs for further monitoring (3). Traditional targeted analytical methods are increasingly complemented by untargeted acquisition methods using high resolution accurate mass Q-TOF GC/MS and LC/MS. Using both techniques makes it possible to screen for thousands of compounds over the full polarity range with high sensitivity and at very fast data acquisition rates. This is essential to obtain information on molecular ions, isotope patterns, and fragments with a single injection and minimal sample preparation. High resolution Q-TOF mass spectrometry provides mass accuracy sufficient for chemical formula generation and is well suited to resolving compounds with similar molecular weights (4). With modern Q-TOF LC/MS instruments, method detection limits in the low ng/L range can be achieved with a large volume direct injection of the water sample for most polar compounds regulated by the EU Water Framework Directive, the EU Drinking Water Directive, or the EPA Clean Water Act (5). A comprehensive screening strategy for environmental contaminants combines a target and a broad suspect screening, using commercial and open source databases, with an untargeted data analysis, ideally using the same samples or even the same data files. The untargeted data analysis, in turn, is a multistep approach starting with feature detection, followed by the grouping of adducts and isotope distribution analysis.
With good mass accuracy and a coherent isotope pattern, the correct molecular formula can be generated, which in turn allows searching in established databases (6, 7). However, even the correct molecular formula can correspond to many structures, and fragment information might be required to identify the correct one. To handle the complexity of the data, statistical methods help to identify the chemical differences between two or more sample groups. The compounds which are responsible for the differentiation of samples or sample groups are called molecular differentiators. The goal is to focus on those molecular differentiators rather than identifying all compounds in a given sample type (4). Most anthropogenic contaminants enter the aquatic environment as a result of incomplete removal in wastewater treatment plants (WWTPs). Moreover, degradation and transformation products are formed during chemical and biological wastewater treatment. These compounds are typically not monitored and often not even known (8, 9). In this chapter, an efficient strategy for target, suspect and non-target screening is introduced using Agilent Q-TOF LC/MS and MassHunter software. Advantages of the workflow are demonstrated for a comprehensive set of WWTP effluent samples. Emphasis will be put on the experimental design, and the essential parameter set required to identify CECs
and distinguishing molecular differentiators present at high, medium, and low abundance in the sample set. Strategies for the identification of significant compounds using databases, spectral libraries and in-silico fragmentation tools such as Agilent MassHunter Molecular Structure Correlator (MSC) will conclude the discussion.
Experimental
Effluents of four different WWTPs in central Europe were collected as 14-day composite samples over 3.5 months (March to June). The catchment areas of three of the WWTPs are agriculturally dominated (AG, AI, AL), whereas the other one is located in an urban area (AZ). One of the WWTPs also receives some industrial wastewater (AL). Efficient nitrification-denitrification is observed in three of the WWTPs (AG, AL, AZ) and, thus, a better elimination of trace contaminants can be expected. Samples were filtered using glass fiber filters. Ultrapure water was filtered using the same glass fiber filters and was used as the procedural blank.
All reagents and solvents were HPLC or LC/MS grade. Acetonitrile, methanol, and acetic acid were purchased from Fluka (Sigma-Aldrich, Buchs, Switzerland). Ammonium acetate was purchased from VWR International (Darmstadt, Germany). Ultrapure water was produced using a Milli-Q Integral system equipped with a 0.22-μm point-of-use membrane filter cartridge (EMD Millipore, Billerica, MA, USA). For calibration and spiking experiments, the mixed standard solutions of pesticides, pharmaceuticals, and drugs of abuse, which are available as part of the Agilent LC/MS application kits, were combined into a multi-analyte working solution with a concentration of 1 µg/mL. Calibration samples were prepared by dilution of the working solution with tap water.
The analysis was carried out using an Agilent 1290 Infinity UHPLC system consisting of a binary pump (G4220A), a high performance autosampler (G4226A) equipped with a large volume injection kit, and a thermostatted column compartment (G1316C). A volume of 100 µL of the filtered WWTP effluent was injected directly onto an Agilent Zorbax SB-Aq separation column (150 × 2.1 mm, 1.8 µm) thermostatted to 40 °C. The flow rate was 0.4 mL/min, and the mobile phases were 1 mM ammonium acetate plus 0.1% acetic acid in water (A) and 0.1% acetic acid in acetonitrile (B).
The UHPLC system was coupled to an Agilent G6550A iFunnel Quadrupole Time-of-Flight LC/MS System equipped with a Dual Spray Agilent Jet Stream electrospray ionization source. MassHunter Acquisition software for TOF/Q-TOF B.06.01 Service Pack 1 was used for data acquisition. The Q-TOF LC/MS instrument was operated in positive and negative electrospray ionization (ESI) with three different methods: All Ions MS/MS acquisition at 3 scans/sec with three discrete collision energies (0, 20, and 40 V); TOF-only mode at 3 scans/sec; and data-dependent MS/MS mode at 5 scans/sec in MS and 5 scans/sec in MS/MS with a fixed collision energy of 20 V to aid compound identification. More detailed experimental information can be found elsewhere (10).
For target and broad suspect screening, data was evaluated using MassHunter Qualitative and Quantitative Analysis B.07.00 in combination with the MassHunter Water Screening PCDL. For non-target screening, data was evaluated using the MassHunter Profinder software B.06.00 with Batch Recursive Feature Extraction. Statistical data analysis was performed using MassHunter Mass Profiler for single sample comparison and MassHunter Mass Profiler Professional (MPP) for multivariate statistics. Molecular differentiators were tentatively identified using the ID Browser, which can be called directly from Mass Profiler and MPP. Accurate mass MS/MS spectra were searched against the MassHunter Water Screening PCDL and customized databases, including spectra acquired from reference standards and spectra downloaded from open source libraries like Norman MassBank. Non-matched spectra were finally evaluated using MassHunter Molecular Structure Correlator, which combines a search in open source databases like ChemSpider and PubChem with an in-silico fragment prediction.
Results and Discussion
Workflow Overview
Within the scientific community there is agreement that for a comprehensive risk assessment of environmental samples, targeted analytical methods need to be complemented by untargeted acquisition using high resolution accurate mass LC/MS and GC/MS. Ideally, the same sample or even the same data file used for target analysis of regulated and commonly found contaminants is evaluated in a comprehensive data analysis workflow to reveal what else can be found in the sample. Figure 1 outlines the recommended data analysis workflow for target, suspect and non-target screening using an Agilent Q-TOF LC/MS and the MassHunter data analysis software. The identification levels mentioned in Figure 1 are based on Schymanski et al. (11). It should be noted that confidence in identification for tentatively identified compounds (level 2A to 3) in both workflows can be increased to level 1 by acquiring data for reference standards and comparing spectra and retention times. Compounds finally identified with a reference standard can be added to the target screening approach for absolute quantitation. While the workflow for target and suspect screening generally starts with a hypothesis of potential contaminants which finally has to be confirmed, non-target screening aims to identify the truly unknown substances. This is a very difficult and time-consuming task with no guarantee of success (12, 13). The value of software tools and algorithms therefore lies in their ability to condense the countless compounds detected in an environmental sample to a manageable number of relevant substances which justify the additional investment of time and money for a final identification.
Figure 1. Recommended data analysis workflow for target, suspect and non-target screening using an Agilent Q-TOF LC/MS and the MassHunter data analysis software.
Target and Suspect Screening Using UHPLC/Q-TOF MS
The target and suspect screening workflow starts with an All Ions MS/MS acquisition of the environmental sample in positive and negative polarity. In All Ions MS/MS, precursor and fragment information is acquired in succession at a very high speed, as described before. A key element of the target and suspect screening strategy is the definition of a compound database containing expected contaminants, known and probable transformation products, and even theoretically predicted reaction products. The Agilent Personal Compound
Database and Library (PCDL) manager software provides the infrastructure for tailored databases which contain compound names, formulas and accurate masses as well as retention times and links to open source databases like ChemSpider, PubChem and others. Moreover, it contains accurate mass MS/MS spectra for different polarities and precursor species. For environmental analysis Agilent offers a Water Screening PCDL which contains a relevant list of more than 1,400 environmental contaminants including all compounds currently regulated in the US, EU, and China, and further compounds that have either been previously detected in the environment or are likely to be detected due to their production amount or widespread use. For more than 1,000 compounds accurate mass MS/MS spectra are included, which were acquired under standardized conditions and curated for their theoretical formula mass. The PCDL is the source of compound formulas for the “Find by Formula” (FBF) data mining algorithm which automatically pulls extracted ion chromatograms (EICs) for all potential adducts and isotope signals, and bins them together as one compound entry. The extracted peak spectra are compared against the theoretical isotope distribution and scored based on accurate masses, the isotope pattern match and the retention time if stored in the PCDL. In the All Ions MS/MS data analysis workflow the most abundant fragment ions from the accurate mass PCDL spectra are extracted for the targeted compound and chromatographic elution profiles of the EICs of all fragments are correlated for a more confident identification. What differentiates target from suspect screening is the availability of reference standards for a final confirmation (identification level 1) and absolute quantitation. For the analysis of the WWTP effluents a subset of the Agilent Water Screening PCDL with 390 entries was created for which reference standards were available. 
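To illustrate the kind of input a formula-based extraction needs, the following minimal Python sketch computes the monoisotopic neutral mass of a formula, the m/z of the protonated and deprotonated species, and a ± 5 ppm extraction window. It is not the MassHunter "Find by Formula" implementation; the function names, the small element table, and the choice of carbamazepine (C15H12N2O) as the example are assumptions made for this illustration.

```python
import re

# Monoisotopic masses (u) for a few common elements; assumed table for this sketch.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146,
        "S": 31.9720707, "Cl": 34.9688527, "F": 18.9984032}
PROTON = 1.00727646  # mass of a proton (u)


def monoisotopic_mass(formula):
    """Sum monoisotopic element masses for a simple formula such as 'C15H12N2O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def extraction_window(mz, ppm=5.0):
    """Return the lower and upper m/z limits for a symmetric ppm window."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


neutral = monoisotopic_mass("C15H12N2O")  # carbamazepine, used as an example
mz_pos = neutral + PROTON                 # [M+H]+
mz_neg = neutral - PROTON                 # [M-H]-
low, high = extraction_window(mz_pos)
print(f"M = {neutral:.4f}, [M+H]+ = {mz_pos:.4f}, window = {low:.4f} to {high:.4f}")
```

A real extraction would repeat this for every adduct and isotope signal of every database entry; the sketch only shows the arithmetic for one species.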
Data of a medium level calibration sample, corresponding to 1 µg/L, was initially evaluated with the MassHunter Qualitative Analysis Software using the FBF data mining algorithm with a mass error of ± 5 ppm and a retention time window of ± 0.5 minutes compared to the expected retention time. Fragment confirmation with the five most specific ions from the PCDL library spectra was applied to identify suitable qualifier ions and the relative ion ratios for compound confirmation. Information was automatically passed on to the MassHunter Quantitative Analysis software for quantitation and batch review using accurate mass for molecular ions and fragments as well as isotope pattern matching as criteria for compound identification. Figure 2 shows the chromatograms, spectra and calibration curves for ibuprofen and quinoxyfen as examples acquired with a 100 µL direct injection. A large variety of environmental contaminants could be quantified in the WWTP effluents at concentrations ranging from a few ng/L to several µg/L. Residues from pharmaceuticals dominated in the effluents of the WWTPs with urban catchment areas, while most pesticides were found in the effluent of AI, which has an agriculturally dominated catchment area. Highest concentrations were detected for azoxystrobin, flufenacet, linuron, metamitron, methomyl, metribuzin, propamocarb, spiroxamine, and terbuthylazine, which are used for the major crops grown in that region (cereals, vegetables, corn, beetroot, and potatoes).
Figure 2. EIC chromatograms of molecular ion and fragments, peak spectra and calibration curves for ibuprofen (neg. mode) and quinoxyfen (pos. mode).
In all WWTP effluents the X-ray contrast media iopromide and iomeprol were found in concentrations of up to 2 μg/L and 7 μg/L, respectively. Other common pharmaceuticals were amisulpride (up to 500 ng/L), atenolol (up to 1.7 μg/L), metoprolol (up to 470 ng/L), and tramadol (up to 2 μg/L) as well as carbamazepine, diclofenac, ibuprofen, naproxen, and sulfamethoxazole. In total 33 pharmaceuticals and metabolites as well as 46 pesticides could be quantified. The insect repellent diethyltoluamide (DEET) was found in all WWTPs in concentrations between 14 and 770 ng/L. Also present in samples of all WWTPs were the herbicides metolachlor (up to 1.1 μg/L) and isoproturon (up to 450 ng/L). The All Ions MS/MS data analysis workflow was also used for the broad suspect screening, and the same data files used for target screening were evaluated for an extended analytical scope. The availability of accurate mass MS/MS information is key for the identification of potential candidates. In the FBF data mining algorithm, the EICs for precursors and the most specific fragment ions from the PCDL spectra are correlated based on their chromatographic profile. Identification with high confidence (level 2A) is achieved when the EICs of the molecular ion and typically two to three fragment ions show perfect co-elution (expressed by a coelution score of > 90 out of 100), and if the mass accuracy for both molecular ions and fragments is better than 5 ppm. The number of fragment ions available for fragment confirmation depends on the molecular structure and the precursor mass. In some cases only one specific fragment was observed.
The coelution score takes into account factors such as abundance, peak shape (symmetry), peak width, and retention time. The normalized intensity ratios across the peak are plotted in a coelution plot. Figure 3 shows the identification results for the angiotensin receptor blocker valsartan in a WWTP effluent sample. All 5 evaluated fragment ions from the accurate mass spectrum were successfully correlated with the chromatographic profile of the [M+H]+ ion.
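The idea of correlating fragment and precursor elution profiles can be sketched with a plain Pearson correlation between two EIC traces. This is a simplified stand-in chosen for illustration, not the MassHunter coelution score, which also weighs abundance, peak shape, peak width, and retention time; the intensity values below are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally sampled intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical EIC traces sampled at the same scans
precursor = [0, 10, 50, 100, 50, 10, 0]
fragment_coeluting = [0, 2, 10, 20, 10, 2, 0]  # same profile, lower intensity
fragment_shifted = [0, 0, 2, 10, 20, 10, 2]    # apex one scan later

r_coeluting = pearson(precursor, fragment_coeluting)
r_shifted = pearson(precursor, fragment_shifted)
print(f"co-eluting: {r_coeluting:.2f}, shifted: {r_shifted:.2f}")
```

A fragment with the same profile as the precursor correlates almost perfectly regardless of its absolute intensity, whereas a trace whose apex is shifted by a single scan scores much lower, which is why co-elution discriminates well against interfering ions.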
Figure 3. Overlay of precursor and fragment ion traces for valsartan in a WWTP effluent (A), coelution plot (B) and compound identification results including the coelution score (C).
Data used for target screening was re-evaluated in this way, using another subset of the Agilent Water Screening PCDL containing all compounds except those used in target screening. Based on the identification criteria specified above, a number of pharmaceuticals were identified in the WWTP effluent samples (level 2A), e.g. candesartan, irbesartan, losartan, clarithromycin, venlafaxine and its metabolite desmethyl venlafaxine, citalopram, cetirizine, and ritonavir. Moreover, nine further pesticides (napropamide, pyrimethanil, fenamidone, lenacil, dimethenamid, boscalid, dinoseb, fludioxonil, penconazole),
the UV filter phenylbenzimidazole sulfonic acid, perfluorooctanoic acid (PFOA) and 4 organophosphates (triethylphosphate, tris(2-chloroethyl)phosphate, tributylphosphate, and triphenyl phosphate) were found. In a few cases, when contaminants were present only at extremely low concentrations and, thus, signal intensities of the fragment ions were low, fragment confirmation was not possible and only a level 3 identification was achieved. This was also true if interferences hampered the detection of low mass fragment ions, or if no accurate mass MS/MS spectra were available in the PCDL. In all of these cases further information is required to increase the confidence in identification (see “Identification of suspects and unknowns”).
Non-Target Screening
Non-Target Feature Finding
Innovative software algorithms for the untargeted finding of molecular features in unknown samples have revolutionized the workflow and the efficiency of non-target screening. The MassHunter Molecular Feature Extractor (MFE) maps signals in the three-dimensional space of time, mass, and abundance to create a list of ions. Groups of ions with very close retention times and with a chemical relation (i.e. isotope and adduct signals, dimers) are combined into a “Molecular Feature”, which is basically a compound. A compound is recorded by the software as the uncharged mass at a retention time and with an abundance value which is the sum of the intensities of all ions belonging to the feature. The Q-Score in MFE provides a measure of the quality of a feature. It takes into account the number of ions and species contributing to the feature, the signal-to-noise ratio, the consistency of the accurate mass, and the symmetry of the elution profile for each ion in the feature. By merging multiple ions into one feature and by providing a measure for the feature quality, the data set is already reduced and the number of false positives in the feature list is largely decreased. MFE works with data files acquired with full spectrum acquisition (TOF mode), with targeted and data-dependent MS/MS, or with All Ions MS/MS. If All Ions MS/MS data is used, MFE uses only the low energy trace to collect information on the molecular ions. The typically low concentrations of emerging contaminants in environmental samples make the definition of reasonable ion intensity and peak area thresholds a challenging task. Lower thresholds increase the chance of detecting a potential contaminant, but at the same time the overall number of detected features and the time for data processing increase dramatically.
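The grouping of related ions into one feature described above can be sketched as follows. This is a deliberately simplified illustration and not the MFE algorithm: it assumes that the most intense ion is [M+H]+, considers only three positive-mode adducts, ignores isotopes and dimers, and uses invented ion data and tolerances.

```python
# Mass shifts (u) from the neutral monoisotopic mass to the singly charged ion
ADDUCT_SHIFT = {
    "[M+H]+": 1.00727646,
    "[M+NH4]+": 18.03382554,
    "[M+Na]+": 22.98922070,
}


def group_feature(ions, rt_tol=0.05, ppm=10.0):
    """Collect adducts of one compound, assuming the most intense ion is [M+H]+."""
    anchor = max(ions, key=lambda ion: ion["intensity"])
    neutral = anchor["mz"] - ADDUCT_SHIFT["[M+H]+"]
    members = []
    for ion in ions:
        if abs(ion["rt"] - anchor["rt"]) > rt_tol:
            continue  # not co-eluting with the anchor ion
        for name, shift in ADDUCT_SHIFT.items():
            expected = neutral + shift
            if abs(ion["mz"] - expected) / expected * 1e6 <= ppm:
                members.append((name, ion))
    return {
        "neutral_mass": neutral,
        "rt": anchor["rt"],
        # feature abundance = sum of the intensities of all grouped ions
        "abundance": sum(ion["intensity"] for _, ion in members),
        "ions": [name for name, _ in members],
    }


ions = [
    {"mz": 237.1022, "rt": 7.42, "intensity": 90000},  # [M+H]+
    {"mz": 259.0842, "rt": 7.42, "intensity": 15000},  # [M+Na]+ of the same compound
    {"mz": 301.1410, "rt": 7.43, "intensity": 20000},  # unrelated co-eluting ion
    {"mz": 237.1020, "rt": 9.10, "intensity": 5000},   # same m/z, different retention time
]
feature = group_feature(ions)
print(feature)
```

The four ions collapse into one feature with two assigned species, which shows how grouping alone reduces the length of the feature list before any statistical filter is applied.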
A useful strategy to identify suitable thresholds for the MFE algorithm is to analyze quality control (QC) samples in replicates within a sample sequence. A good QC sample would be a sample matrix which has been spiked with several indicator compounds covering the whole polarity and mass range of the analysis. Indicator compounds should be spiked at low, medium and high concentrations or should cover substantial intensity differences in ESI.
As a QC sample, a WWTP effluent was spiked with a mixed standard solution containing 390 compounds. Several peak intensity thresholds were tested until 95% of the spiked compounds were identified by database search in the MFE results. Final settings were an MFE height filter of 50,000 counts and a Q-Score of > 85 in combination with a Find-by-Ion (FBI) height filter of 1,000 counts with an FBI score of > 65. Even for very low threshold settings, some features might be missed in one sample or another. In order to reliably find the relevant differences in the subsequent statistical analysis, it is recommended to use the batch recursive feature extraction. This algorithm is available in the MassHunter Profinder software. Moreover, the recursive analysis can be triggered in MassHunter Qualitative Analysis with a compound list from MPP. In the recursive workflow, MFE features found in multiple data files are aligned by mass and retention time and a composite feature list across all samples is created. For each composite feature a composite spectrum is created which contains all ions that have been detected for this feature across all samples. Based on the composite feature list and spectra, ions are regrouped to reduce the number of “single ion compounds” and wrong charge carrier assignments. Subsequently, composite features are searched again in all samples using a targeted data mining algorithm (“Find-by-Ion”), possibly using a lower or no intensity threshold.
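The alignment step of the recursive workflow can be sketched as a greedy match by mass and retention time. The sketch below is an assumption-laden illustration rather than the Profinder algorithm: tolerances, sample names, and feature values are invented, and the first feature seen defines the composite entry.

```python
def align(samples, ppm=10.0, rt_tol=0.1):
    """Build a composite feature list from per-sample (mass, rt) features."""
    composites = []
    for sample_name, features in samples.items():
        for mass, rt in features:
            for comp in composites:
                same_mass = abs(mass - comp["mass"]) / comp["mass"] * 1e6 <= ppm
                same_rt = abs(rt - comp["rt"]) <= rt_tol
                if same_mass and same_rt:
                    comp["found_in"].add(sample_name)
                    break
            else:
                # no existing composite matched: start a new one
                composites.append({"mass": mass, "rt": rt, "found_in": {sample_name}})
    return composites


# Hypothetical untargeted results: neutral mass (u) and retention time (min)
samples = {
    "A": [(236.0950, 7.42), (403.1168, 11.05)],
    "B": [(236.0952, 7.44)],
    "C": [(236.0949, 7.41), (403.1170, 11.03)],
}
composites = align(samples)

# Samples in which each composite feature still has to be searched in a targeted way
to_recheck = [(c["mass"], sorted(set(samples) - c["found_in"])) for c in composites]
print(to_recheck)
```

The second composite feature was not reported for sample B by the untargeted step, so it is the one a targeted re-extraction with a lower threshold would look for in that file.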
Mass Profiler – Single Sample Comparison
Inherent molecular differences between two samples or two sample groups are most effectively investigated using the MassHunter Mass Profiler software. After recursive feature extraction with Profinder, or after running MFE with recursive grouping directly in Mass Profiler, data can be filtered using different statistical tools, including the Q-Score described before as a post-alignment filter. Filtering the data is feasible when multiple replicates have been analyzed for each sample. Two samples from different times in the growing season (beginning of March vs. end of June) from the WWTP AI were selected for comparison. Samples were analyzed in triplicate and processed using MFE with recursive grouping in conjunction with post-alignment filtering via Q-Score. From the raw feature list of 37,727 features detected in all samples, the post-alignment filter resulted in 9,178 composite features. Filtering features on frequency, requiring a feature to be present in 100% of all samples within one condition, reduced the number of composite features to 7,869. Applying a Q-Score of 80 (out of 100) reduced the number of compounds moderately to 5,849. Increasing the Q-Score to 95 resulted in 5,094 remaining features, and a further increase to 100 finally decreased the number to 2,445 compounds. A Q-Score of 100 significantly reduces the false positive rate but at the same time increases the chance that relevant but low level features are missed. In order to reduce the data to a manageable number of molecular differentiators, the false negative rate and the total number of compounds put forward for further identification need to be balanced against each other.
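The two filters discussed above, frequency within a condition and a minimum quality score, can be combined as in the following minimal sketch. The data structures, replicate names, and feature entries are hypothetical and only illustrate the logic; they do not reflect Mass Profiler internals.

```python
def passes_frequency(detected, groups, min_fraction=1.0):
    """True if the feature was found in at least min_fraction of the
    replicates of at least one condition."""
    return any(
        sum(sample in detected for sample in members) / len(members) >= min_fraction
        for members in groups.values()
    )


def filter_features(features, groups, min_q):
    """Keep feature ids that pass both the quality and the frequency filter."""
    return [
        f["id"]
        for f in features
        if f["q_score"] >= min_q and passes_frequency(f["detected"], groups)
    ]


# Two conditions, measured in triplicate (replicate names are invented)
groups = {"March": ["M1", "M2", "M3"], "June": ["J1", "J2", "J3"]}
features = [
    {"id": "f1", "q_score": 97, "detected": {"J1", "J2", "J3"}},
    {"id": "f2", "q_score": 99, "detected": {"M1", "J1", "J2"}},
    {"id": "f3", "q_score": 72, "detected": {"M1", "M2", "M3", "J1", "J2", "J3"}},
]

kept_at_80 = filter_features(features, groups, min_q=80)
kept_at_70 = filter_features(features, groups, min_q=70)
print(kept_at_80, kept_at_70)
```

Feature f2 has a high quality score but is not reproducibly found in either condition, whereas f3 is reproducible but only survives the more permissive score threshold; this is the trade-off between false positives and false negatives described in the text.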
Features were further filtered on abundance, and a fold change of > 10 was specified as the cut-off value for the differential analysis to identify unique or significantly more abundant compounds. In the sample from the end of June (AI-8), 165 features were either unique (133) or significantly increased (32). Once filtered, the remaining features were identified using the ID Browser, which is called directly from Mass Profiler. A database search using the Agilent Water Screening PCDL and a Molecular Formula Generation (MFG) was applied for identification. Figure 4 shows a bubble diagram of the molecular differentiators of sample AI-8. The size of the bubbles corresponds to the overall abundance of the feature. Compounds shown in black are fully identified based on a database search including retention times (level 3), while for compounds in white a molecular formula has been assigned (level 4). There were 4 compounds (shown in grey) for which no formula evidence was available. Within the 67 fully identified molecular differentiators there were 31 pesticides or pesticide metabolites and 26 pharmaceuticals and metabolites. Several pesticides like napropamide, terbuthylazine, S-metolachlor, propamocarb, pyrimethanil, metalaxyl, metribuzin, and flufenacet were amongst the 10 most abundant compounds. Major crops grown in the catchment area of this WWTP are cereals, vegetables, corn, beetroot and potatoes, and all compounds mentioned above are commonly used as plant protection products on these crops. For both the compounds identified by database search and the 94 compounds for which molecular formulas could be assigned, confidence in identification can be increased by adding MS/MS information (see “Identification of suspects and unknowns”).
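The differential analysis above amounts to a simple classification per feature, which the following sketch illustrates with invented abundances. The function name and category labels are assumptions; only the fold change cut-off of 10 and the reported counts are taken from the text.

```python
def classify(ref_abundance, test_abundance, fold_cutoff=10.0):
    """Label a feature by comparing the test sample with the reference sample."""
    if test_abundance <= 0:
        return "not in test sample"
    if ref_abundance <= 0:
        return "unique"
    if test_abundance / ref_abundance > fold_cutoff:
        return "increased"
    return "unchanged"


# Hypothetical mean abundances (reference = March, test = end of June)
features = {"x1": (0, 40000), "x2": (3000, 45000), "x3": (20000, 60000)}
result = {name: classify(ref, test) for name, (ref, test) in features.items()}
print(result)

# Consistency check of the counts reported in the text
unique, increased = 133, 32
identified, formula_only, no_formula = 67, 94, 4
total = unique + increased
print(total, identified + formula_only + no_formula)
```

The last lines confirm that the 165 differentiators split consistently into 67 identified compounds, 94 compounds with an assigned formula, and 4 without formula evidence.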
Profiling Using MPP – Multivariate Statistics
In many cases the comparison of two conditions is not sufficient and multivariate statistics is required. Examples would be the comparison of several related sampling points, possibly even over a time course, or the analysis of a treatment process with several consecutive treatment steps. In these cases, the chemometric software package MassHunter Mass Profiler Professional (MPP) would be used. In MPP, samples can be grouped based on different experimental parameters, and each parameter can have several values. The data shown here were grouped based on the four different WWTPs (AG, AI, AL and AZ) and on the nine 14-day composite samples, sampled over 3.5 months (March to June). Each of the 36 samples was measured in triplicate in random order within the sample sequence. In addition, procedural blanks and QC samples were measured and added to the MPP project. Data import settings included a minimum abundance filter of 5,000 counts and a minimum of 2 ions to be associated with each feature. Signals were baselined to the median of all samples. By requiring a feature to be present in all three replicates, the total number of features was reduced by more than 50%. This filter is also available during recursive feature finding in the Profinder software for initial feature reduction. Molecular similarities and differences can be visualized by principal component analysis (PCA). Figure 5 shows the PCA plot for the comparison of the four WWTPs across the sampling period.
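Baselining to the median can be illustrated for a single feature as follows. The sketch assumes that abundances are log2-transformed first and that the median across all samples is then subtracted; this is a common convention and is stated here as an assumption, not as a description of the exact MPP implementation. The abundance values are invented.

```python
from math import log2
from statistics import median


def baseline_to_median(abundances):
    """Log2-transform the abundances of one feature and centre them on the median."""
    logged = [log2(a) for a in abundances]
    center = median(logged)
    return [value - center for value in logged]


# Hypothetical abundances of one feature in five samples
values = baseline_to_median([1000, 2000, 4000, 8000, 16000])
print([round(v, 3) for v in values])
```

After this step each feature varies around zero, so that features with very different absolute intensities contribute on a comparable scale to PCA and clustering.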
Figure 4. Bubble diagram of molecular differentiators in sample AI-8. Bubble sizes correspond to feature abundances, black compounds are fully identified (level 3), for white bubbles molecular formulas could be assigned (level 4).
The differences between the WWTPs were larger than the differences over the sampling period. AI and AZ showed the biggest differences, whereas AG and AL were more similar to AI. This can also be visualized by cluster analysis using a hierarchical clustering algorithm. In the hierarchical condition tree, compounds are grouped together based on their presence or absence in a specific sample group. Relationships are formed based on the similarity of the compound distribution. In the cluster analysis, AI and AG showed the closest relationship. AL was more closely connected to AI and AG, and AZ showed the largest differences. The relationships can be explained by the catchment areas with similar land use, the population equivalents connected to the WWTP, and the treatment technology. While most of the WWTPs had a very similar chemical inventory over the sampling period, the effluents from WWTP AI showed larger variability. Therefore, the samples from AI were analyzed in more detail.
Figure 5. PCA of entities detected in the 4 treatment plants over all time points (Component 1: 30.8%, Component 2: 17.3%, Component 3: 14.4%).
Further data reduction was achieved by subtracting the entities found in the blank samples, by filtering compounds on a fold change of > 10 and by applying significance analysis using a Student's t-test with asymptotic p-value calculation and Benjamini-Hochberg multiple testing correction. The remaining features can be displayed in a profile plot to recognize compounds with interesting time courses, e.g. compounds which increase significantly over time. This was observed, for example, for a compound identified by database search as azoxystrobin. The data were visualized in a box-and-whisker plot showing the raw intensities over the sampling period in AI. The relative concentration rose clearly, by more than a factor of 10, from the beginning of March to the end of June. Searching for entities similar to azoxystrobin with a Euclidean similarity metric helped to identify further compounds with a similar distribution in the samples. As the similarity cut-off value was lowered, an increasing number of compounds with similar behavior was flagged. For a cut-off value > 0.7, 4 additional compounds were flagged, including pyrimethanil and trifloxystrobin. For a cut-off value > 0.5, a total of 46 additional compounds were flagged as similar.
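The combination of a fold-change filter with a Benjamini-Hochberg correction can be sketched as follows. The function names, the feature dictionaries and the significance level of 0.05 are assumptions for illustration; the text states only the fold-change threshold of 10 and the type of correction, and the p-values are taken as given rather than computed from a t-test.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotonic in the rank.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def select_features(features, min_fold_change=10.0, alpha=0.05):
    """Keep features exceeding the fold change and passing the corrected test.

    alpha is a hypothetical significance level, not taken from the text.
    """
    adjusted = benjamini_hochberg([f["p"] for f in features])
    return [
        f["id"]
        for f, q in zip(features, adjusted)
        if f["fold_change"] > min_fold_change and q <= alpha
    ]


# Invented toy data.
toy_features = [
    {"id": "F1", "fold_change": 12.0, "p": 0.01},
    {"id": "F2", "fold_change": 3.0, "p": 0.04},
    {"id": "F3", "fold_change": 50.0, "p": 0.03},
    {"id": "F4", "fold_change": 11.0, "p": 0.005},
]
print(select_features(toy_features))
```

Here F2 is rejected by the fold-change filter alone, although its corrected p-value would pass the assumed significance level.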
The molecular differentiators from the PCA and the entities similar to azoxystrobin were identified using the ID browser. A database search against the Agilent Water Screening PCDL and Molecular Formula Generation (MFG) were applied for identification. Confidence in the identification can be further increased by adding MS/MS information (see “Identification of Suspects and Unknowns”).
Identification of Suspects and Unknowns

The result of a non-target screening is a list of molecular differentiators which need further identification. This is especially true if only a molecular formula could be calculated or if a compound name has been tentatively assigned based on a database search. Without further structural evidence or retention time information, the risk of reporting a false positive result is high. One way to add structural evidence is to add the compound to the suspect list and to follow the All Ions MS/MS workflow described earlier. If accurate mass library spectra are available in the PCDL format or in the literature, the matching of co-eluting fragment ions increases the confidence in identification to level 2A. However, where contaminants are present only at trace concentrations, or where low mass fragment ions suffer interference from matrix ions, verification of compounds by the All Ions MS/MS workflow is hampered. Under these circumstances an accurate mass MS/MS library comparison after precursor isolation needs to be performed. MassHunter Qualitative Analysis as well as the two statistical software packages Mass Profiler and MPP enable the export of MS/MS target lists of selected compounds, including m/z values and retention times, for the creation of a target MS/MS acquisition method. In a consecutive run using the same chromatography, accurate mass MS/MS spectra for the targeted precursors are acquired. Data analysis for the targeted MS/MS run starts with data mining by the FBF algorithm, using the list of tentatively identified compounds and assigned molecular formulas as the formula source. Accurate mass MS/MS spectra are automatically extracted for each compound and can be compared to PCDL spectra from comprehensive MS/MS libraries.
The first choice is the comparison against the MassHunter Water Screening PCDL, but searching further PCDL products and open source libraries can also help to identify potential candidates. Figure 6 shows the identification of metformin by MS/MS library searching in an effluent sample of WWTP AZ. Due to the low mass of the molecular ion and the even lower masses of the fragments, All Ions MS/MS verification was not successful: the fragments were not sufficiently specific and the chromatograms suffered interference from the sample matrix. With data dependent MS/MS acquisition, identification was achieved by good mass accuracy (Δ m/z -0.7 ppm) and isotope pattern matching in MS1 mode and by matching the acquired MS/MS spectrum with the library spectrum from the Agilent Water Screening PCDL. All major fragment ions from the library spectrum were found in the measured spectrum within a narrow mass extraction window and in a ratio similar to that in the reference spectrum (CE 20 V). Reverse searching with a modified Dot Product search algorithm (14) resulted in a score of 95.4 (out of 100) and verified the presence of metformin in the sample (level 2A).
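The two numerical criteria used above, the mass error in ppm and a reverse library match, can be illustrated with a simplified sketch. It is not the modified Dot Product algorithm of the Agilent software, whose weighting is not specified in the text: the score below is a plain cosine on square-root-scaled intensities, the 20 ppm window is a hypothetical value, and the spectra are invented. "Reverse" here means that only peaks present in the library spectrum are scored, so additional matrix peaks in the measured spectrum are ignored.

```python
import math


def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def reverse_match_score(measured, library, tol_ppm=20.0):
    """Cosine score (0 to 100) computed over the library peaks only.

    measured, library: lists of (m/z, intensity) tuples.
    tol_ppm: hypothetical mass extraction window.
    """
    pairs = []
    for lib_mz, lib_intensity in library:
        # Most intense measured peak inside the window, or 0 if none is found.
        hits = [i for mz, i in measured if abs(ppm_error(mz, lib_mz)) <= tol_ppm]
        pairs.append((math.sqrt(lib_intensity), math.sqrt(max(hits, default=0.0))))
    dot = sum(a * b for a, b in pairs)
    norm = math.sqrt(sum(a * a for a, _ in pairs)) * math.sqrt(sum(b * b for _, b in pairs))
    return 100.0 * dot / norm if norm else 0.0


# Invented spectra: the measured spectrum contains one extra matrix peak.
toy_library = [(50.0, 100.0), (75.0, 50.0), (100.0, 25.0)]
toy_measured = [(50.0005, 100.0), (60.0, 500.0), (75.0, 50.0), (100.0, 25.0)]
print(round(reverse_match_score(toy_measured, toy_library), 1))
```

Because the intense peak at m/z 60 has no counterpart in the library spectrum, it does not lower the reverse score, which is the property that makes this search mode useful for spectra acquired in complex matrices.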
Figure 6. Compound chromatogram (A) and cleaned peak spectrum (B) of metformin in a WWTP effluent sample and comparison of the acquired accurate mass MS/MS spectrum with the reference spectrum from the Agilent MassHunter Water Screening PCDL (C).
In cases where no library spectrum is available, for example for newly identified compounds or suspected transformation products, accurate mass MS/MS spectra can be compared to the theoretical fragmentation of a compound in the Agilent MassHunter Molecular Structure Correlator (MSC) software. The software is based on an automated assignment of fragment formulas to MS/MS signals, using a systematic bond disconnection of the precursor structure as the criterion for ranking the resulting substructures (15). MSC allows batch searching of multiple accurate mass MS/MS spectra from one or multiple data dependent or target MS/MS data files. Structure sources are either ChemSpider and PubChem, or a local collection of PCDL libraries and .mol files. Results are scored based on the mass accuracy and isotope pattern fit for the assigned structure and on a penalty-based scoring system for potential fragments, which also takes into account the relative abundance of the fragments and the coverage of explained fragments. Moreover, the results can be sorted not only by score but also by the number of references for the compound, e.g. in PubMed. Figure 7 shows the MSC results for a compound which was identified as a molecular differentiator for WWTP AZ. Based on the accurate mass, a molecular formula of C11H9NO4S2 was assigned, resulting in a high MFG score (96.4 out of 100). The accurate mass MS/MS spectra were transferred to MSC and ChemSpider was used as the structure source. Of the 50 structures found, the one with the highest score (92.8 out of 100) and the highest number of references was (Benzothiazol-2-ylthio)succinic acid. For this compound, 93.8% of the observed ions in the MS/MS spectra could be assigned with structural evidence. (Benzothiazol-2-ylthio)succinic acid is used as part of a corrosion inhibitor in industrial coatings. No information about annual production rates was available in the REACH database. An accurate mass MS/MS library spectrum for this compound was not available for further identification, so the next step would be final confirmation using a reference standard.
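The link between an assigned molecular formula such as C11H9NO4S2 and its accurate mass can be made explicit with a short calculation. The sketch below sums monoisotopic atomic masses for a formula string; the parser is a hypothetical helper that handles only simple formulas without brackets or charges, and the element table is limited to the five elements needed here. It does not reproduce the MFG scoring, which also evaluates the isotope pattern.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C11H9NO4S2' (no brackets)."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Monoisotopic mass of the neutral molecule."""
    return sum(
        MONOISOTOPIC_MASS[element] * count
        for element, count in parse_formula(formula).items()
    )


# Formula assigned to the molecular differentiator discussed above.
print(round(monoisotopic_mass("C11H9NO4S2"), 4))  # 282.9973
```

Comparing this theoretical value, after adding or subtracting the mass of the charge carrier for the ion actually observed, with the measured m/z gives the mass error that enters the formula score.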
Figure 7. MSC results for (Benzothiazol-2-ylthio)succinic acid based on an accurate mass MS/MS spectrum and a structure search in ChemSpider. Suggestions for major fragment structures are shown.
Conclusions and Outlook

The availability of a comprehensive strategy for target, suspect and non-target screening, along with the required hardware and software tools, is critical for today’s analytical challenges, not only in environmental research. There are trends towards harmonization of screening workflows in adjacent scientific disciplines like food, forensics, and metabolomics. Highly sensitive and robust Q-TOF instrumentation enables routine target screening below regulatory limits and at the same time provides information about “what else is in the sample”. The goal of the software workflows demonstrated above is the effective reduction of data to enable the chemist to focus on the identification of the most promising new CECs. This is mainly done by the chemometric evaluation of data, but other strategies like mass defect filters, spectral similarity searches or metabolite identification can also be used successfully to find relevant compounds. Another strategy, referred to as “de-replication”, is the collection of common compounds in a database. Even if the identity of these compounds is not finally confirmed, it can help to reduce the analysis time. Having access to comprehensive databases and accurate mass libraries of environmental contaminants is key for the identification of potential CECs. Agilent provides a variety of PCDL products which contain entries for relevant compounds in several markets. The accurate mass MS/MS spectra are acquired under quality controlled conditions and fragment masses are curated to their theoretical formula mass. In addition, the connection of the data analysis software with open source databases and libraries is becoming increasingly important. The Agilent MassHunter software programs allow the export of a human readable compound exchange file (.cef) which can be used to connect MassHunter results to databases and open source software tools. This has been successfully demonstrated by a converter program enabling the export and import of accurate mass MS/MS spectra to and from NORMAN MassBank.
References

1. Directive 2013/39/EU of the European Parliament and of the Council of 12 August 2013 amending Directives 2000/60/EC and 2008/105/EC as regards priority substances in the field of water policy. Off. J. Eur. Union L. Brussels 2013, 8, 24.
2. Gago-Ferrero, P.; Schymanski, E. L.; Bletsou, A. A.; Aalizadeh, R.; Hollender, J.; Thomaidis, N. S. Extended Suspect and Non-Target Strategies to Characterize Emerging Polar Organic Contaminants in Raw Wastewater with LC-HRMS/MS. Environ. Sci. Technol. 2015, 49, 12333–12341.
3. Rodriguez, C.; Van Buynder, P.; Lugg, R.; Blair, P.; Devine, B.; Cook, A.; Weinstein, P. Indirect Potable Reuse: A Sustainable Water Supply Alternative. Int. J. Environ. Res. Public Health 2009, 6, 1174–1209.
4. Knolhoff, A. M.; Zweigenbaum, J. A.; Croley, T. R. Nontargeted Screening of Food Matrices: Development of a Chemometric Software Strategy To Identify Unknowns in Liquid Chromatography−Mass Spectrometry Data. Anal. Chem. 2016, 88, 3617–3623.
5. Yang, D.-H. D.; Murphy, M. A.; Song, Y.; Chan, J. Sensitive Screening of Pharmaceuticals and Personal Care Products (PPCPs) in Water Using an Agilent 6545 Q-TOF LC/MS System. Agilent Application Note, 5991-5954EN; 2015, Santa Clara, California.
6. Knolhoff, A. M.; Callahan, J. H.; Croley, T. R. Mass Accuracy and Isotopic Abundance Measurements for HR-MS Instrumentation: Capabilities for Non-Targeted Analyses. J. Am. Soc. Mass Spectrom. 2014, 25, 1285–1294.
7. Kind, T.; Fiehn, O. Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinf. 2007, 8, 105.
8. Gómez, M. J.; Gómez-Ramos, M. M.; Malato, O.; Mezcua, M.; Fernández-Alba, A. R. Rapid automated screening, identification and quantification of organic micro-contaminants and their main transformation products in wastewater and river waters using liquid chromatography–quadrupole-time-of-flight mass spectrometry with an accurate-mass. J. Chromatogr. A 2010, 1217, 7038–7054.
9. Letzel, T.; Bayer, A.; Schulz, W.; Heermann, A.; Lucke, T.; Greco, G.; Grosse, S.; Schüssler, W.; Sengl, M.; Letzel, M. LC–MS screening techniques for wastewater analysis and analytical data handling strategies: Sartans and their transformation products as an example. Chemosphere 2015, 137, 198–206.
10. Berset, J. D.; Rennie, E.; Glauner, T. Screening and Identification of Emerging Contaminants in Wastewater Treatment Plant Effluents Using UHPLC/Q-TOF MS and an Accurate Mass Database and Library. Agilent Application Note, 5991-6627EN; 2016, Santa Clara, California.
11. Schymanski, E. L.; Jeon, J.; Gulde, R.; Fenner, K.; Ruff, M.; Singer, H. P.; Hollender, J. Identifying Small Molecules via High Resolution Mass Spectrometry: Communicating Confidence. Environ. Sci. Technol. 2014, 48, 2097–2098.
12. Schymanski, E. L.; Singer, H. P.; Slobodnik, J.; Ipolyi, I. M.; Oswald, P.; Krauss, M.; Schulze, T.; Haglund, P.; Letzel, T.; Grosse, S.; Thomaidis, N. S.; Bletsou, A.; Zwiener, C.; Ibáñez, M.; Portolés, T.; de Boer, R.; Reid, M. J.; Onghena, M.; Kunkel, U.; Schulz, W.; Guillon, A.; Noyon, N.; Leroy, G.; Bados, P.; Bogialli, S.; Stipaničev, D.; Rostkowski, P.; Hollender, J. Non-target screening with high-resolution mass spectrometry: critical review using a collaborative trial on water analysis. Anal. Bioanal. Chem. 2015, 407, 6237–6255.
13. Zedda, M.; Zwiener, C. Is nontarget screening of emerging contaminants by LC-HRMS successful? A plea for compound libraries and computer tools. Anal. Bioanal. Chem. 2012, 403, 2493–2502.
14. Stein, S. E.; Scott, D. R. Optimization and Testing of Mass Spectral Library Search Algorithms for Compound Identification. J. Am. Soc. Mass Spectrom. 1994, 5, 859–866.
15. Hill, A. W.; Mortishire-Smith, R. J. Automated assignment of high-resolution collisionally activated dissociation mass spectra using a systematic bond disconnection approach. Rapid Commun. Mass Spectrom. 2005, 19, 3111–3118.