Prioritizing Unknown Transformation Products from Biologically-Treated Wastewater using High-Resolution Mass Spectrometry, Multivariate Statistics, and Metabolic Logic
Jennifer E. Schollée, Emma Louise Schymanski, Sven Erik Avak, Martin Loos, and Juliane Hollender
Anal. Chem., Just Accepted Manuscript • DOI: 10.1021/acs.analchem.5b02905 • Publication Date (Web): 17 Nov 2015
Analytical Chemistry
Jennifer E. Schollée,1,2 Emma L. Schymanski,1 Sven E. Avak,1,3 Martin Loos,1,2 and Juliane Hollender*,1,2
1 Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
2 Institute of Biogeochemistry and Pollutant Dynamics, ETH Zürich, 8092 Zürich, Switzerland
3 Department of Chemistry, University of Zürich, 8057 Zürich, Switzerland
*Corresponding author: [email protected]. Phone: +41-58-765-5493. Fax: +41-58-765-5893
ACS Paragon Plus Environment
ABSTRACT: Incomplete micropollutant elimination in wastewater treatment plants (WWTPs) results in transformation products (TPs) that are released into the environment. Improvements in analytical technologies have allowed researchers to identify several TPs from specific micropollutants, but an overall picture of nontarget TPs is missing. In this study, we addressed this challenge by applying multivariate statistics to data collected with liquid chromatography coupled to high-resolution mass spectrometry (LC-HRMS) and subsequent tandem HRMS (MS/MS) in order to characterize peaks detected in the influent and effluent of a WWTP. Known biotransformation reactions were used to link potential parent compounds and TPs, while the structural similarity of these pairs, as inferred from MS/MS spectral similarity, was used for further prioritization. The methodology was validated with a set of spiked compounds, which included 25 parent/TP pairs for which analytical standards were available. This procedure was then applied to nontarget data, and 20 potential parent and TP pairs were selected for identification. In summary, a surfactant homologue series, with associated TPs, was detected. Some obstacles still remain, including spectral interferences from coeluting compounds and the identification of TPs, whose structures are less likely to be present in compound databases. The workflow was developed using openly accessible tools and, after parameter adjustment, could be applied to any dataset with before-and-after information about various biological or chemical processes.
INTRODUCTION
Studies concerning the fate and occurrence of emerging contaminants in the environment have been increasing in recent years. Consequently, much time and effort has been invested in the development and optimization of high-resolution mass spectrometry (HRMS) coupled to liquid chromatography (LC) to detect the small, polar organic compounds that represent many emerging contaminants.1,2,3,4,5,6 In particular, solid-phase extraction (SPE) and large-volume injection have made it possible to detect these compounds at environmentally relevant concentrations, even in complex matrices like wastewater.7,8,9 Screening methodologies fall into one of three categories: conventional target screening, where a compound is confirmed through a match with an analytical reference standard measured under the same conditions; suspect screening, where a compound is screened for based on prior information and then confirmed by exact mass and isotopic pattern; and nontarget screening, which encompasses all other compounds in a sample that have a detected m/z and retention time (RT) but are not yet identified.10 The improved detection capabilities of HRMS have also increased the amount of data generated, most of which is nontarget, as was demonstrated in Schymanski et al.9 Therefore, to make the most of these new techniques, it is necessary to develop methods of data processing, specifically for the selection of nontarget peaks of interest.
Different strategies have been applied to pick nontarget peaks for identification. The simplest (and most often applied) method is to select the nontarget compounds with the most intense peaks in one or more samples.9 But, since high intensity does not always translate to the most concentrated or
highest risk substances, relevant lower intensity peaks may be missed. Approaches such as Toxicity Identification Evaluation (TIE) or Effect-Directed Analysis (EDA) can be used to prioritize compounds based on potential toxicity,11,12,13 but these investigations are often time consuming with no guarantee of success. Other strategies include prioritizing nontarget compounds based on the presence of certain elements in isotopic signals or mass defects, since this additional information can increase the odds of a successful nontarget identification.14,15 But this type of selection is often biased towards compounds containing sulfur, chlorine, or bromine. Statistical approaches, including multivariate analysis (MVA), have increasingly been used in metabolomics for the identification of new metabolites but have only been applied sparingly in environmental investigations,16,17,18 although MVA is now incorporated in many open-source and vendor software packages (e.g., XCMS, XCMSOnline, Umetrics SIMCA, Thermo Sieve, Bruker ProfileAnalysis, Waters MarkerLynx, Agilent Mass Profiler). Unfortunately, many of these programs were designed for metabolomics data, and the workflows and parameters are not always suitable for environmental studies.
Of particular interest in the field of emerging contaminants is the formation of transformation products (TPs).5,19,20 In wastewater treatment, it is estimated that 50% of micropollutants are transformed but not fully removed from the treatment process.21 Many strategies exist for discovering TPs, including batch experiments to simulate transformation processes,22,23,24 prediction of TP formation through software such as the Eawag-Pathway Prediction System (Eawag-PPS, formerly the University of Minnesota-PPS) or manually through metabolic logic,25,26 or use of “fragmentation-degradation” relationships to search for TPs which share
MS/MS fragment(s) with the parent compounds.27 Specifically for this last point, it has been postulated that the MS/MS spectra of a parent and TP should be similar due to their similar structures.22,27,28 While these strategies have each had their successes, they are all restricted to known parent compounds, and the number of compounds that can be investigated is often limited by time and resources.
The aim of the present study was to develop a workflow that uses a new approach to obtain a more comprehensive picture of the formation of nontarget TPs in biological wastewater treatment. For this purpose, a workflow for feature prioritization was developed that incorporates (i) MVA, (ii) difference analysis using known information about biotransformation in wastewater treatment plants (WWTPs), and (iii) spectral similarity matching based on the assumption that parent compounds and TPs share fragments. Although some of the methods have been applied previously, they have to our knowledge not been combined and applied to the broad-scale investigation of micropollutants in environmental samples. For validation, a known set of parent compounds and their TPs was used, and the optimized workflow was applied to samples from the influent and effluent of a WWTP near Zürich, Switzerland.
METHODS
Sampling. This method was adapted from Schymanski et al.9 Adjustments to the method included the addition of an activated carbon layer in the SPE cartridge to capture more polar compounds, more sample replicates to increase statistical robustness, and a focus on full-scan HRMS data in a first run for an initial prioritization. In summary, 24-hr flow-weighted composites were collected in Dübendorf, Switzerland, from a WWTP influent and effluent for four consecutive weekdays (i.e., Monday to Thursday). The first two days and the last two days were combined, respectively, to form two 48-hr flow-weighted composites, which were then filtered (47 mm GF/F filters, Whatman). From each sample type, 200 mL was subsampled nine times. Six replicates were used to perform nontarget screening, while three were used for method validation (done with both target and nontarget screening). Ammonium acetate buffer was added and the pH was adjusted to approximately 6.8, after which samples were filtered again. All samples were spiked with 100 ng absolute of a mix of 116 internal standards (Supporting Information (SI), Table S-1). Offline sample enrichment was carried out with SPE, using manually-packed mixed-mode cartridges containing 200 mg EnviCarb, 350 mg Strata X-AW:Strata X-CW:Isolute ENV+ (1:1:1.5), and 200 mg Oasis HLB (Waters AG, US). Samples were enriched with EnviCarb as the bottom layer in the cartridge, then eluted in the opposite flow direction to prevent compounds eluting from the Oasis and ENV+ mixture from adsorbing to the activated carbon. The addition of EnviCarb showed some recovery improvement compared to the original mixed-mode cartridge,25 especially for small polar compounds such as TPs (data not shown). Conditioning was done with 5 mL methanol, followed by 10 mL of nanopure water. After
enrichment and drying of the cartridges, samples were eluted with 6 mL of ethyl acetate:methanol (1:1) with 2.5% ammonia and 3 mL of ethyl acetate:methanol (1:1) with 2% formic acid. The basic and acidic fractions were collected in one glass vial, and the combined eluate was checked to confirm that it was at neutral pH. The samples were then dried to 100 µL under nitrogen and diluted to 500 µL with nanopure water.
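As a rough orientation, the nominal enrichment implied by the stated volumes can be worked out as below. This is plain arithmetic on the numbers given in the text; it assumes complete recovery through SPE and evaporation, which the text does not claim, and the symbols are introduced here only for illustration.

```latex
% Nominal enrichment factor from 200 mL of sample to 500 uL of final extract
\[
  \mathrm{EF} = \frac{V_{\text{sample}}}{V_{\text{extract}}}
              = \frac{200\ \text{mL}}{0.5\ \text{mL}} = 400
\]
% Nominal internal standard concentration in the final extract
% (100 ng absolute per sample, assuming 100% recovery)
\[
  c_{\text{IS}} = \frac{100\ \text{ng}}{0.5\ \text{mL}} = 200\ \text{ng/mL}
\]
```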
Validation Samples. For the evaluation of the MVA and the difference analysis, 25 parent-TP pairs for which standards were available were selected and spiked into validation samples after SPE. Seventeen parent compounds were spiked into the influent samples, while 25 TPs were spiked into the effluent samples (some TPs matched to the same parent) for a final spiked concentration of 20 ng/mL for each compound. Target screening of these compounds in the validation samples was done with XCalibur QuantBrowser software.
Measurement. For the initial analysis, only full scan acquisition (LC-HRMS) was performed to maximize information for the statistical analysis. Chromatographic separation was done on 20 µL of sample extract on an HPLC system with a PAL Autosampler (CTC Analytics, Zwingen, Switzerland), an Accela 1250 mixing pump (Thermo Fisher Scientific, San Jose, United States), and an Xbridge C18 column (2.1 × 50 mm, 3.5 µm) from Waters (Milford, United States) using water and methanol (both acidified with 0.1% formic acid) as solvents. The gradient program was 90:10 at 0 minutes, to 50:50 at 4 minutes, to 5:95 at 17 minutes, then held until 25 minutes, and back to 90:10 from 25.1 to 29 minutes, at a flow rate of 200 µL/min and a column temperature of 30°C. HRMS was performed on a Q-Exactive (Thermo Fisher Scientific, San Jose, US) with positive and negative electrospray ionization in separate runs, with spray voltages of +4 kV and -4 kV, respectively. Capillary temperature was 300°C, the m/z range was 100 to 1000, instrument resolution was 140,000 at 200 m/z, and mass accuracy was less than 5 ppm. After statistical prioritization, HR-MS(/MS) information was obtained using a similar chromatographic set-up as above, except that an UltiMate 3000 RS mixing pump (Dionex, Sunnyvale, United States) was used.
HR-MS(/MS) was performed on a Thermo Orbitrap Q-Exactive Plus, with similar conditions as described above, except that the full scan resolution was decreased to 70,000 at 200 m/z to allow more cycle time for MS/MS. For MS/MS experiments, resolution was 17,500, isolation window was 1.0 Da, mass accuracy was less than 5 ppm, and N2 was used as collision gas for higher energy collision-induced dissociation (HCD). A data-dependent approach was used with an inclusion list of the suspected parents and TPs based on the prioritization described below. Normalized collision energies (NCE) for HCD of the unknown compounds were calculated based on mass; the range was between 10 and 90. Further information about MS/MS measurement can be found in the SI.
Workflow Steps. All elements of this workflow were implemented with openly accessible software. A flowchart of the workflow is presented in Figure 1. RAW files were
Figure 1. Final workflow for the prioritization of peaks from nontarget screening, with delineation between the data acquisition, data processing, and data mining steps. Black, solid arrows connect the main parts of the workflow. Red, dashed arrows relate to a componentization step which was not incorporated into the whole analysis but selectively applied in structure elucidation of the prioritized links.
converted to .mzXML files with ProteoWizard (version 3.0).29 All further analysis was done with the R statistical software. Each file was peak picked with the R package ‘enviPick’30 (settings in SI, Table S-3). Profiles of each feature across all samples were constructed with the R package ‘enviMass’31 (settings in SI, Table S-4). Features that were detected in a blank or blind sample were then removed from the feature list. A table was constructed with a row for each sample and a column for each feature, with peak intensities as entries. Each column was mean-centered and scaled, and the unsupervised MVA principal component analysis (PCA) was carried out with the R function ‘prcomp’.32 Characterization of the features was based on manual interpretation of the loading and scores plots. After PCA, the inclusion of a supervised MVA method such as partial least squares projection to latent structures discriminant analysis (PLS-DA) was also considered, as were additional statistical tools such as significance testing, intensity fold change, and indicator species analysis; in the end these tools were not incorporated into the final workflow for reasons discussed below. The PLS-DA and univariate tools were implemented with the R package ‘muma’,33 while indicator species analysis was done with the R package ‘indicspecies’.34 After classification of the nontarget peaks into influent and effluent groups with PCA, links between the two groups were explored through difference analysis based on metabolic logic, i.e., known biotransformation reactions. For those features classified as likely parent compounds, theoretical TP masses were calculated for a variety of biotransformation reactions
(Table 1). These masses were then screened in those features classified as potential TPs. When matches were found, a “link” was established between these two features. For simple structural modification reactions (i.e., reactions 1-7 in Table 1), a filter was applied so that the RT of the TP had to be less than that of the parent compound, since TPs are often more polar than their parent compounds.19,22 This restriction was not included for (de)conjugation reactions as they follow this general rule only occasionally. Further prioritization of the links was done using peak shape, intensity, and the significance of the intensity difference between the two sample types as assessed with the t-test. Samples were remeasured to collect MS/MS information for the prioritized peaks, as described above. Links were finally selected based on the MS/MS spectral similarity of the parent compound and TP as calculated with the R package ‘OrgMassSpecR’.35 Similarity is calculated as the dot product of the aligned intensity vectors, similar to, though not the same as, that provided in the NIST MS Search program.36 Extraction of MS/MS information and visualization of the extracted ion chromatograms (EICs) was done with the R package ‘RMassBank’.37 Presence of isotopic information was determined with the R package ‘nontarget’.38 Formula generation was performed with Thermo XCalibur and/or GenForm (formerly MOLGEN-MS/MS)39 and visually crosschecked with the predicted isotopic patterns from enviPat.40 MS/MS fragment annotation was done with GenForm, and candidate structures were retrieved from MetFrag41 and MetFusion,42 complemented by manual interpretation of MS/MS spectra. Confidence of the identification of any nontarget
peaks is based on the confidence levels described in Schymanski et al.43
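The difference analysis described above (shift a parent mass by a known reaction, search for that mass among the potential TPs, and require an earlier RT for simple modifications) can be sketched in Python as follows. This is an illustrative sketch, not the authors' R implementation. The three reactions and their monoisotopic mass shifts are common textbook examples chosen here and are not taken from Table 1; the 5 ppm window, the `Feature` class, and the `find_links` function are likewise assumptions, and both features are assumed to be singly charged with the same adduct so that the m/z shift equals the mass shift.

```python
from dataclasses import dataclass

# Illustrative monoisotopic mass shifts in Da (assumed examples, NOT the paper's Table 1).
REACTIONS = {
    "hydroxylation (+O)": 15.994915,
    "demethylation (-CH2)": -14.015650,
    "dehydrogenation (-H2)": -2.015650,
}


@dataclass(frozen=True)
class Feature:
    mz: float  # measured m/z (same adduct and charge assumed for parent and TP)
    rt: float  # retention time in minutes


def find_links(parents, products, ppm=5.0, require_earlier_rt=True):
    """Return (parent, product, reaction) tuples whose m/z difference matches a reaction."""
    links = []
    for parent in parents:
        for name, shift in REACTIONS.items():
            target = parent.mz + shift
            tolerance = target * ppm * 1e-6  # absolute window derived from the ppm tolerance
            for product in products:
                if abs(product.mz - target) > tolerance:
                    continue
                # RT filter for simple modifications: the TP should elute before the parent.
                if require_earlier_rt and not product.rt < parent.rt:
                    continue
                links.append((parent, product, name))
    return links


# Hypothetical features: one influent-associated parent and three effluent-associated peaks.
parents = [Feature(mz=237.1022, rt=10.5)]
products = [
    Feature(mz=253.0972, rt=8.2),   # +O, elutes earlier: linked
    Feature(mz=253.0972, rt=12.0),  # +O, elutes later: rejected by the RT filter
    Feature(mz=223.0866, rt=9.0),   # -CH2, elutes earlier: linked
]
for parent, product, reaction in find_links(parents, products):
    print(f"{parent.mz:.4f} -> {product.mz:.4f} via {reaction}")
```

Setting `require_earlier_rt=False` corresponds to the relaxed rule that the text applies to (de)conjugation reactions, where the RT ordering is not enforced.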
RESULTS AND DISCUSSION
This section consists of three parts. The first part describes the results from the target screening of the validation compounds. Included here is a discussion of the application of PCA and additional statistical methods for the classification of these peaks into likely parent compounds or TPs. The second part details the results from the nontarget screening of the validation set. In addition to the PCA classification, the application of the difference analysis to recover known pairs is discussed here. Finally, in the third part, the results from the nontarget screening of all peaks are described, including the PCA classification, difference analysis, and final prioritization and structure elucidation of selected nontarget pairs.
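Since all three parts rely on PCA to separate influent-associated from effluent-associated features, a minimal Python sketch of that classification step may be useful. It is not the authors' procedure: the study used R's `prcomp` and manual interpretation of the scores and loadings plots, whereas this sketch extracts only the first component by power iteration and applies an automatic sign rule, both of which are assumptions made here. It also assumes that the first component is the one separating the two sample types, and the intensity table is invented for illustration.

```python
import math


def autoscale(table):
    """Mean-center each column (feature) and scale it to unit standard deviation."""
    n = len(table)
    scaled_columns = []
    for column in zip(*table):
        mean = sum(column) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        scaled_columns.append([(x - mean) / sd for x in column])  # assumes sd > 0
    return [list(row) for row in zip(*scaled_columns)]


def first_pc(X, iterations=200):
    """Approximate PC1 loadings and scores by power iteration on X^T X."""
    n_features = len(X[0])
    # Start from the first unit vector (assumes feature 0 has a nonzero PC1 loading).
    w = [1.0] + [0.0] * (n_features - 1)
    for _ in range(iterations):
        scores = [sum(x * wj for x, wj in zip(row, w)) for row in X]
        w_new = [sum(row[j] * s for row, s in zip(X, scores)) for j in range(n_features)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    scores = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    return w, scores


def classify_features(table, is_influent):
    """Label each feature by the sample group its PC1 loading points towards."""
    loadings, scores = first_pc(autoscale(table))
    influent_mean = sum(s for s, flag in zip(scores, is_influent) if flag) / sum(is_influent)
    if influent_mean < 0:  # the sign of a PC is arbitrary; orient it towards the influent
        loadings = [-v for v in loadings]
    return ["influent" if v > 0 else "effluent" for v in loadings]


# Invented intensity table: rows are samples, columns are features.
table = [
    [1000, 800, 50, 10],    # influent replicate 1
    [1100, 750, 60, 20],    # influent replicate 2
    [100, 90, 500, 300],    # effluent replicate 1
    [120, 70, 550, 280],    # effluent replicate 2
]
is_influent = [True, True, False, False]
labels = classify_features(table, is_influent)
print(labels)
```

Features labelled as influent-associated would be treated as potential parents and those labelled as effluent-associated as potential TPs. A sign rule of this kind cannot flag poorly removed compounds whose loadings are close to zero, which is where manual inspection of the loadings plot is still needed.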
Validation results from target screening. The use of PCA and other statistical tools was validated initially using target screening data of the 42 spiked target compounds measured in positive ESI mode (17 parents / 25 TPs) (SI, Table S-5). In the PCA, it was confirmed that separation of the influent and effluent samples could be achieved and attributed to a single PC (Figure 2a) and that peaks could be characterized based on the loadings (Figure 2b; PC1 loadings in SI, Table S-5). In particular, the second point was critical for using PCA to classify nontarget peaks. As shown in Figure 2a and 2b, the parent compounds corresponded well to the influent samples, while the TPs corresponded well to the effluent samples. Furthermore, N4-Acetyl-sulfamethoxazole was correctly associated with the influent as expected, since it is a human metabolite of sulfamethoxazole which enters the WWTP and then through deconjugation is transformed back to the parent pharmaceutical. It was also observed that classification can be difficult for compounds which are poorly removed during treatment. Here, carbamazepine, venlafaxine, and irgarol were classified as effluent peaks and the data indeed revealed higher effluent intensities for these compounds, with two likely explanations; either these compounds are being formed through the degradation of TPs entering the WWTP (possible for the pharmaceuticals carbamazepine and venlafaxine but less likely for the pesticide irgarol), or the matrix suppression in the effluent samples is less, thereby enabling higher ionization efficiency and resulting in higher intensities even at lower concentrations. Calculating a correction factor for the matrix suppression was considered, by comparing the intensities of internal standards in influent and effluent samples. 
But, although there was more ion suppression in the influent samples (average ratio of the area of the internal standard in effluent to influent was 1.36), no clear trend could be drawn and therefore no correction factor could be applied (SI, Figure S-1). Furthermore, the amount of validation compounds spiked was approximately two times the amount present in the samples in order to remain within environmentally-relevant concentrations. But these concentrations were seemingly not different enough to always be captured by the PCA. While PCA could adequately classify the validation compounds, additional statistical tools were considered to improve the classification and increase the specificity. The supervised MVA PLS-DA was able to correctly classify carbamazepine and venlafaxine (SI, Table S-5) but the
implementation of PLS-DA was cumbersome since it was not available as a stand-alone R function but built into a pre-existing workflow in the R ‘muma’ package.33 Therefore, this type of MVA was not incorporated into the workflow. The second statistical method considered was a “weight of evidence” approach, which incorporated information from the PCA loading, intensity fold change, significance testing, and indicator species analysis. A fold change cutoff of 10 was used, i.e., influent peaks (potential parents) had to have at least 10 times higher intensity in the influent than in the effluent. Irgarol,
Figure 2. (a) Scores plot (which displays the observations, here the samples) of the principal component analysis (PCA) using target screening of the validation pairs; (b) loadings plot (which displays the variables, here the validation compounds) of the PCA.
carbamazepine, and venlafaxine were not classified with either the influent or the effluent, and 16 additional target compounds that had been correctly characterized by the PCA were eliminated because their fold changes were not large enough (SI, Table S-5). While significance testing added additional information since all target compounds were determined to be significant at p