Article pubs.acs.org/est
Multicriteria Approach To Select Polyaromatic River Mutagen Candidates
Christine M. J. Gallampois,†,⊥ Emma L. Schymanski,‡ Martin Krauss,† Nadin Ulrich,§ Mahmoud Bataineh,∥ and Werner Brack*,†
† UFZ - Helmholtz Centre for Environmental Research, Department of Effect-Directed Analysis, Permoserstr. 15, D-04318 Leipzig, Germany
‡ Eawag − Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, CH-8600 Dübendorf, Switzerland
§ Institute for Prevention and Occupational Medicine of the German Social Accident Insurance, Institute of the Ruhr-Universität Bochum (IPA), Bürkle-de-la-Camp-Platz 1, 44789 Bochum, Germany
∥ Abu Dhabi Men’s College, P. O. Box 25035, Abu Dhabi, United Arab Emirates
Supporting Information
ABSTRACT: The identification of unknown compounds remains one of the most challenging tasks in linking observed toxic effects in complex environmental mixtures to the responsible toxicants in effect-directed analysis (EDA). Here, a workflow is presented based on nontarget liquid chromatography-high resolution mass spectrometry (LC-HRMS), starting with molecular formulas determined in a previous study. A compound database search (ChemSpider) was performed to retrieve candidates for each formula. Subsequently, the number of candidates was reduced by applying MS-, physical-chemical, and chromatography-based selection criteria, including HRMS/MS fragmentation and plausibility, ionization efficiency with different ion sources and detection modes, acid/base behavior, octanol/water partitioning, retention time prediction, and finally toxic effects (mutagenicity caused by aromatic amines). The workflow strongly decreased the number of possible candidates and resulted in the tentative identification of possible mutagens and the positive identification of the nonmutagen benzyl(diphenyl)phosphine oxide in a mutagenic fraction. The positive identification of mutagens was hampered by a lack of commercially available standards. The workflow is an innovative and promising approach and forms an excellent basis for possible further advancements.
■ INTRODUCTION
Mutagenic compounds may enter the aquatic environment via domestic, agricultural, and industrial effluents and pose a risk to aquatic organisms1 and to human consumers of drinking waters.2 Due to the complexity of chemical mixtures found in typical water samples, with tens of thousands of individual compounds of anthropogenic and natural origin, the identification of those chemicals responsible for measurable mutagenic effects is challenging. Effect-directed analysis (EDA) is a promising approach to address this challenge by progressive fractionation to reduce the complexity of the mixture and by subsequently focusing chemical analysis on those fractions exhibiting bioactivity.3,4 The structure elucidation of unknowns in bioactive fractions, particularly mutagenic compounds, remains one of the most challenging steps. This holds especially for chemicals that require liquid chromatography−mass spectrometry (LC−MS) based methods, such as aromatic amines or nitroaromatics, which would otherwise require derivatization before analysis by gas chromatography−mass spectrometry (GC−MS). LC−MS-based methods offer a wide coverage of chemicals and are the first choice for polar and
water-soluble compounds (e.g., aromatic amines), but do not yet have the extensive spectral libraries that are available for GC−MS.5,6 While GC-electron impact (EI) MS spectra are generally very reproducible and commercial libraries exist with several hundred thousand compounds (e.g., NIST, Wiley), the soft ionization techniques used in LC−MS are much less reproducible between instruments (such that spectra are less comparable), and the available open and commercial libraries cover only a few thousand compounds (generally all measured with electrospray ionization). This severely limits the applicability of spectral library searching for mutagen identification in EDA studies. Due to these limitations, compound database searching has become very popular in nontarget screening involving LC−MS, particularly combined with in silico fragmentation techniques, but the correct structural candidate is rarely ranked in the top place using this information alone,7−9 especially if large compound databases (with several million substances) are used. Thus, LC−MS-based structure elucidation of mutagenic unknowns in EDA requires extensive refinements of compound database searches to allow for quicker candidate selection based on optimized use of the analytical and toxicological information available. Nontarget screening involving LC-HRMS, where no information about the analyte is available in advance, typically involves noise removal, automated deconvolution, and peak picking.10 Next, the most relevant peaks (m/z of interest) are selected by comparing different samples and blanks. Elemental compositions are then assigned to each peak of interest, where rules such as those from Kind and Fiehn can be applied.11 A previous study on the isolation and selection of polyaromatic mutagens12 provided molecular formulas for 17 candidate mutagens, which were not present in blank and nonmutagenic neighboring fractions. The bioactivity of concern in this fraction was mutagenicity, which was assessed with the Ames test.13,14 The results indicated a predominance of aromatic amines.12 The molecular formulas of 13 or 14 unknown compounds contained at least one N, such that aromatic amine functionality was potentially present in these unknowns. Based on these results, the present paper focuses on a workflow to address candidate selection and the tentative identification of potential mutagens matching the molecular formulas of interest.
Received: July 26, 2014 Revised: January 28, 2015 Accepted: January 30, 2015
DOI: 10.1021/es503640k Environ. Sci. Technol. XXXX, XXX, XXX−XXX
Assuming that after decades of chemical research, the probability of a chemical structure to occur in the environment is greater if this structure has been found and described before, it is viable to start elucidation efforts by searching large structure databases such as PubChem and ChemSpider, each with over 25 000 000 compounds.15,16 As mentioned above, combining this search with an in silico fragmenter such as MetFrag6,9 allows a ranking of the candidates according to the match between the calculated and experimental fragments, but does not often provide sufficient ranking power to distinguish only a few candidate structures from the often several thousand candidates retrieved, even using the best fragmenters published to date.7,17 Further criteria such as retention time correlation with the octanol−water partition coefficients (log Kow) and linear solvation-energy relationships (LSERs) for retention prediction18,19 have been already used in EDA studies to eliminate candidates in GC-MS20,21 and LC−MS.22 Since EDA studies focus on the identification of structures that may cause the observed test response of concern, quantitative structure activity relationships (QSARs)23 and structural alerts24 are promising approaches for exclusion of candidate toxicants.25 The mutagenicity of aromatic amines (which was shown to occur in the sample12) has been demonstrated to be related to the stability of nitrenium ions as the ultimate electrophilic metabolite that covalently binds to DNA.26−29 The nitrenium calculation is quick and simple enough to be broadly applicable to several thousand candidates and as such is more amenable to the selection of potentially mutagenic candidates of interest than, for example, structural alerts or more extensive toxicity calculations. Such information has not yet been incorporated into candidate selection for nontarget identification of toxic components. 
Thus, the aim of the present study was to develop a multicriteria methodology incorporating all available analytical and toxicological information for the selection of potentially mutagenic polyaromatic substances measured in surface waters to prioritize and expedite identification efforts. The ionization and fragmentation
behavior of 47 reference standards was investigated with both ESI and APCI in positive and negative mode to characterize diagnostic fragments and include ionization behavior as a criterion for candidate selection. The structure elucidation strategy relies on database search followed by sequential reduction of the number of candidate compounds for each molecular formula by exploiting mass spectrometric (MS/MS fragments (both present and absent) plus ionization behavior), chromatographic (retention time using partitioning behavior and linear solvation energy relationships) and toxicological information (mutagenicity prediction) that was accessible from experiments and models. The fraction N2−8 from a previous investigation12 was selected as an example to demonstrate this workflow for structure elucidation.
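The sequential reduction described above can be pictured as a chain of filters, each applied only to the candidates that survived the previous one. The following Python sketch is an illustration under stated assumptions, not the software used in the study: the `Candidate` fields, the filter labels, the example records, and all threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """Hypothetical record for one database hit (field names are assumptions)."""
    name: str
    metfrag_score: float   # 0 (poor match) to 1 (best match)
    log_kow: float         # estimated octanol-water partition coefficient
    chi_predicted: float   # predicted chromatographic hydrophobicity index


# A filter is a label plus a predicate that returns True for candidates to keep.
Filter = Tuple[str, Callable[[Candidate], bool]]


def reduce_candidates(
    candidates: List[Candidate], filters: List[Filter]
) -> Tuple[List[Candidate], List[Tuple[str, int]]]:
    """Apply each filter in order and record how many candidates survive."""
    remaining = list(candidates)
    history = [("database search", len(remaining))]
    for label, keep in filters:
        remaining = [c for c in remaining if keep(c)]
        history.append((label, len(remaining)))
    return remaining, history


# Illustrative thresholds only; they are placeholders, not values from the study.
filters: List[Filter] = [
    ("MetFrag score cutoff", lambda c: c.metfrag_score >= 0.85),
    ("log Kow window", lambda c: 0.5 <= c.log_kow <= 4.2),
    ("retention (CHI) match", lambda c: abs(c.chi_predicted - 80.0) <= 10.0),
]

# Made-up candidates used only to exercise the pipeline.
candidates = [
    Candidate("cand_A", 0.95, 2.1, 82.0),
    Candidate("cand_B", 0.90, 5.3, 79.0),
    Candidate("cand_C", 0.40, 1.8, 81.0),
    Candidate("cand_D", 0.88, 3.0, 40.0),
]

survivors, history = reduce_candidates(candidates, filters)
for label, count in history:
    print(f"{label}: {count} candidate(s) remaining")
```

Recording the count after every step mirrors the way the candidate numbers are reported per criterion, and makes it easy to see which criterion contributes most to the reduction for a given peak.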
■ MATERIAL AND METHODS
Standards and Samples. The solvents, reagents, and chemical standards used are given in the Supporting Information (SI), Table S1. The 47 reference standards (prepared in methanol) were mainly compounds containing at least one aromatic ring or a larger polyaromatic structure, with a wide range of functionalities (nitro-, keto-, hydroxyl-, nitro-keto-PAHs, quinones, hydroxyl-quinones, amino-compounds, and azaarenes), covering a wide range of octanol−water partitioning coefficients (log Kow 0−5.6, using EPISuite and ACD/Laboratories within ChemSpider16). The study involved mass spectrometric signals observed in positive and negative mode using electrospray ionization (ESI) and atmospheric pressure chemical ionization (APCI), tandem MS/MS fragmentation, and relative detection limits in the various modes. The studied fraction N2−8 was based on water from the River Elbe collected with a passive sampler (blue rayon) and fractionated as described previously.12 The passive sampler was extracted using methanol and, as it is extremely selective for aromatic compounds, little to no matrix effect was expected. The fraction was obtained as the result of an ion-exchange based separation into acidic, neutral, and basic compounds and two consecutive reversed-phase fractionation steps according to log Kow. It is characterized by compounds with a log Kow within the range [0.5−4.2]12 (including an estimation error of ±1).20 The seventeen compounds of interest were numbered from 1 to 17 according to their elution order (chromatographic information for these peaks is given in the SI, Table S2).
LC-MS System. Separation of standards and active fraction N2−8 was performed on an Agilent LC system series 1200 HPLC (SI, Description S1) controlled by the ChemStation software, as described in ref 12.
This system was coupled to an LTQ-Orbitrap hybrid instrument (Thermo Fisher Scientific, Bremen, Germany), equipped with an APCI or ESI source and controlled by the Xcalibur software. Further details regarding MS spectra and data-dependent MS/MS acquisition are given in the SI, S1. For chemical analysis, the standards and active fraction N2−8 were separated on a C18 reversed-phase column (LC-PAH, 250 × 2.1 mm, 5 μm particle size, Supelco, CA). Volumes of 5 and 10 μL were injected for standards and fraction, respectively. Lowest detectable concentrations of the compounds in positive and negative APCI and ESI modes were determined by injecting 5 μL of standard solutions at different concentrations (from 1 ng/mL to 2 μg/mL). For the LSER experiments, a C18 RP column (Kinetex C18, 150 × 3.0 mm, 2.6 μm particle size, Phenomenex, Aschaffenburg, Germany) was calibrated with the standards listed in Ulrich et al.19 to obtain a relationship between retention times and CHI values. The fraction investigated, N2−8, was then separated on the same column.17 The gradient used was as described in Hug et al.22 and is given in SI S1.
Identification Procedure. The identification procedure on the basis of the molecular formulas obtained previously12 is given in Figure 1 and summarized below.
Figure 1. Strategy for the database dependent identification of unknown compounds

(i) High resolution-MS/MS: MetFrag9 was used to retrieve candidates from ChemSpider16 based on the determined molecular formula and to rank the candidates according to the match of in silico-generated and experimental fragments. This search is performed automatically once the formula, the MS/MS fragments (exact mass and intensity), and the database to be searched are specified. The MS search parameters “mzabs” (absolute mass deviation) and “mzppm” (mass accuracy measured in ppm) were set to 0.001 and 5, respectively. Candidates are returned with a score between 0 (poor match) and 1 (best matching candidate(s)), as explained in Wolf et al.9 To retain candidates with a sufficient spectral match, a cutoff was chosen based on the score distributions (described below), eliminating all candidates with a lower score.

(ii) The remaining candidates were then selected using the available analytical information: ionization efficiency, MS/MS interpretation (including eliminating candidates where further fragment(s) were expected but not observed in the experimental data), log Kow, and acid/base properties (pKa). Candidates that did not match the analytical information were eliminated. The ionization efficiency-based selection (presence or absence of the signals in positive and/or negative ionization mode in ESI and APCI, plus ion type and ratios observed) relies on the observations for the reference standards, see below. The observed detection limits were also used to define ionization rules for the different functional groups. The MS/MS interpretation also involved fragments from candidates that were expected but not observed, based on the fragmentation pattern of the reference standards. This was applied for the first time in candidate selection to complement the in silico fragmentation results, which use only the observed fragments to rank candidates. The estimated log Kow values of the candidates (EPISuite and ACD/Laboratories within ChemSpider16) were used to exclude all candidates outside the log Kow range for fraction N2−8 ([0.5−4.5], as calculated previously12 using EPISuite30). The ion-exchange separation applied to the fraction12 occurred at loading pH values of 2.5 and 11; for compounds with pKa values three pH units above and below the respective loading pH values, the neutral species would not be expected to be seen in significant concentration in the neutral fraction.

(iii) A retention prediction using linear solvation energy relationships (LSERs) was performed as described in Ulrich et al.19 using the chromatographic hydrophobicity index (CHI) as the solute property to describe the retention of an analyte. In brief, standards with known CHI (Table 1 in ref 19) were measured on the LC system described above to derive a linear relationship for CHI by correlating the measured retention times (Rt) with the literature CHI values (eq 1). This equation was used to calculate the CHI value from the respective Rt of each of the 17 peaks of interest in the chromatographic system. These CHI values were compared to those calculated with the LSER model for the prediction of CHI (eq 2) using predicted solute descriptors A, B⁰, S, E, and V for all candidate structures. Solute descriptors were obtained from ACD/ADME Suite 5.0.7 Absolv (ACD/Laboratories, Toronto, Canada). Prediction of CHI included a prognosis interval (PI), which resulted from the development of the LSER model and was set to 95%. Each predicted value therefore includes a PI {(CHI − PI) − (CHI + PI)} that accounts for the prediction error of the model.

$$\mathrm{CHI} = 3.55\,R_t - 4.86 \quad (r^2 = 0.9543) \tag{1}$$

$$\mathrm{CHI} = -6.75(\pm 0.88)A - 34.05(\pm 1.13)B^{0} - 9.47(\pm 0.98)S + 1.71(\pm 0.58)E + 35.20(\pm 0.91)V + 62.80(\pm 0.87) \tag{2}$$

(iv) Mutagenicity prediction was based on the stability of the nitrenium ion, as described by Bentzien et al.26 and given in eq 3. The stability for the mono- or polycyclic unknown (Ar) is calculated relative to the nonmutagenic compound aniline (PhNH2).

$$\Delta\Delta E = \Delta E_{\mathrm{ArNH^{+}}} + \Delta E_{\mathrm{PhNH_2}} - \Delta E_{\mathrm{ArNH_2}} - \Delta E_{\mathrm{PhNH^{+}}} \tag{3}$$

ArNH2 represents the parent amine, while ArNH+ is the corresponding nitrenium ion. Accordingly, PhNH2 is aniline with PhNH+ as the corresponding nitrenium ion. ΔΔE represents the overall enthalpy change and the ΔE values correspond to the heat of formation (ΔHf) of the compounds. Candidates were classified (according to ref 26) as follows: compounds with ΔΔE < −5 kcal/mol are considered as likely positive in the Ames test, compounds with ΔΔE within the range [−5, +5] kcal/mol are
Table 1. Diagnostic Losses for the Different Functional Groups and Ionization Rules for the Presence of Functional Groups in ESI and APCI (±) Using CID Fragmentation with the Collision Energy Set at 35%a
[Table body not recoverable from the extracted layout; full details are given in Table S3 (SI).]
Columns: class of compounds/compounds; typical fragment losses (APCI +, APCI −, ESI +, ESI −); ionization rules (APCI +, APCI −, ESI +, ESI −).
Rows: azaarenes; heterocyclic amines; keto-PAHs; benzophenone (ketone); nitro-keto-PAHs; quinones; hydroxyl-quinones; amino-compounds; amides; nitro-compounds; hydroxyl-PAHs; 2-naphtoic acid (carboxylic acid-PAH); substituents on aromatic rings (e.g., CH3); N-substituents.
a +, ionizable/detected; −, non ionizable/not detected; I+, maximum intensity associated with one ionization source vs the other source; nd, not determined (ionization is compound dependent).
unclassified, and compounds with ΔΔE > +5 kcal/mol are considered as likely negative in the Ames test. This approach is applicable to mono- and polycyclic aromatic amines. The ΔE calculations were performed using MOPAC;21 the ΔHf of aniline (ΔEPhNH2) was 20.41 kcal/mol and the ΔHf of the corresponding nitrenium ion of aniline (ΔEPhNH+) was 126.27 kcal/mol.
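Steps (iii) and (iv) reduce to simple arithmetic once the retention time, the solute descriptors, and the heats of formation are available. The Python sketch below restates eqs 1−3 with the coefficients and aniline reference values given in the text; it is a minimal illustration, not the authors' implementation. The function names, the assumption that Rt is given in minutes, the `pi_half_width` parameter (the numerical width of the prognosis interval is not stated here), and the example heats of formation are all hypothetical.

```python
from typing import Dict

# Eq 2: central LSER coefficients (the reported +/- uncertainties are not propagated).
LSER_COEFFICIENTS: Dict[str, float] = {
    "A": -6.75, "B0": -34.05, "S": -9.47, "E": 1.71, "V": 35.20,
}
LSER_INTERCEPT = 62.80

# Eq 3 reference values in kcal/mol, as reported for aniline and its nitrenium ion.
DHF_ANILINE = 20.41
DHF_ANILINE_NITRENIUM = 126.27


def chi_from_retention_time(rt: float) -> float:
    """Eq 1: CHI of a peak from its retention time (unit assumed to be minutes)."""
    return 3.55 * rt - 4.86


def chi_from_descriptors(descriptors: Dict[str, float]) -> float:
    """Eq 2: LSER-predicted CHI from the solute descriptors A, B0, S, E, and V."""
    return LSER_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in LSER_COEFFICIENTS.items()
    )


def chi_matches(chi_measured: float, chi_predicted: float, pi_half_width: float) -> bool:
    """Keep a candidate if the measured CHI lies within [CHI - PI, CHI + PI]."""
    return chi_predicted - pi_half_width <= chi_measured <= chi_predicted + pi_half_width


def delta_delta_e(dhf_amine: float, dhf_nitrenium: float) -> float:
    """Eq 3: nitrenium ion stability of a candidate relative to aniline (kcal/mol)."""
    return dhf_nitrenium + DHF_ANILINE - dhf_amine - DHF_ANILINE_NITRENIUM


def classify_ames(dde: float) -> str:
    """Classification thresholds of -5 and +5 kcal/mol, as stated in the text."""
    if dde < -5.0:
        return "likely positive"
    if dde > 5.0:
        return "likely negative"
    return "unclassified"


# Made-up heats of formation for a hypothetical aromatic amine and its nitrenium ion.
example_dde = delta_delta_e(dhf_amine=50.0, dhf_nitrenium=145.0)
print(round(example_dde, 2), classify_ames(example_dde))
```

Keeping the aniline reference values as named constants makes the dependence on the semiempirical method explicit: heats of formation computed with a different method would need their own reference values before the same thresholds could be applied.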
■ RESULTS AND DISCUSSION
Ionization of Standard Compounds by ESI and APCI Sources. The 47 standards, consisting of nitro-, keto-, hydroxyl-, carboxylic acid-, nitro-keto-PAHs, amides, quinones, hydroxyl-quinones, amino-compounds, and azaarenes, were measured in positive and negative ESI and APCI modes. These results are summarized in Table 1 and the full details are in Table S3 (SI). The MS/MS spectra are available on the NORMAN MassBank database31 with the reference numbers given in Table S3 (SI). Chemicals ionizable in solution are generally assumed to provide larger signals in ESI.32 This assumption was confirmed for azaarenes, heterocyclic amines, amides, and amino-compounds. However, hydroxyl-quinones and hydroxyl-PAHs, which are acidic in solution, were detected better with APCI. Nonpolar compounds lacking a site for protonation or deprotonation were expected to be barely ionizable with ESI33,34 and this was confirmed for the nitro-compounds, which were only detected in APCI measurements.35 More fragment-rich MS/MS spectra were obtained from ionization in positive mode (APCI and ESI), while compounds observed in negative APCI mode did not generate many fragments. The molecular or deprotonated ion ([M]− or [M−H]−) was often the most abundant ion in negative mode. As expected, the weakest bonds, including substituents on PAH rings, bonds in β position of the oxygen in a carbonyl group, and heteroatom−substituent bonds (e.g., nitrogen atom − functional group), were the first bonds to be broken and release
Table 2. Multi-Criteria Parameters Used to Eliminate Candidate Structures from Left to Righta

| peak | molecular formula | candidates in data set | MetFrag: candidates selection score | MetFrag: MS/MS fitb | MetFrag: high score candidates | analytical information (remaining candidates): ionisation mode | analytical information: MS/MS interpretation | analytical information: log Kow and pKa | LSER: cand | mutagenicity prediction: candidates name | mutagenicity prediction: Ames |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | C12H13NO2 | 1345 | 0.89−1 | 2 or 3 (5) | 20 | 19 | 7 | 7 | 1 | 7-amino-4-propyl-2H-chromen-2-one | Ames − |
| 2 | C18H13NO2 | 417 | 0.749−1 | 3 (6) | 7 | 5 | 5 | 1 | 1 | 3-amino-2,5-diphenylcyclohexa-2,5-diene-1,4-dione | inconclusive |
| 3 | C19H11NO3 | 21 | 0.925 | 3 to 5 (6) | 3 | 3 | 3 | 1 | 1 | 3-(2-hydroxybenzoyl)-5H-indeno[1,2-b]pyridin-5-one | na |
| 4 | C13H10O2N2S | 293 | 0.778−1 | 1 or 2 (2) | 4 | 4 | 4 | 3 | 1 | methyl 3-amino-5-(4-cyanophenyl)thiophene-2-carboxylate | inconclusive |
| 5 | C16H11NO2 | 346 | 0.879−1 | 6 or 7 (7) | 5 | 5 | 2 | 2 | 1 | 9-ethynyl-9H-fluoren-1-yl carbamate | na |
| 6 | C12H12N2 | 500 | 0.802−1 | 2 (6) | 59 | 47 | 47 | 15 | 0 | | |
| 7 | C21H11NO3 | 8 | 1 | 2 to 4 (5) | 1 | 1 | 1 | 0 | | | |
| 8 | C17H19NO4 | 2257 | 0.722−1 | 3 to 5 (6) | 10 | 9 | 3 | 3 | 0 | | |
| 9 | C18H21N3O5 | 1520 | 0.833−1 | 3 (4) | 8 | 8 | 7 | 4 | 0 | | |
| 10 | C17H12O | 113 | 0.862−1 | 1 or 2 (2) | 5 | 5 | 5 | 0 | | | |
| 11 | C15H16N2O3 | 2900 | 0.873−1 | 1 (1) | 18 | 18 | 15 | 14 | 4 | 2-methoxy-N′-[1-(5-methylfuran-2-yl)ethylidene]benzohydrazide | na |
| | | | | | | | | | | 1,3-bis(2-methoxyphenyl)urea | na |
| | | | | | | | | | | 1,3-bis(3-methoxyphenyl)urea | na |
| | | | | | | | | | | 1-(2-methoxyphenyl)-3-(3-methoxyphenyl)urea | na |
| 12 | C19H17OP | 20 | 1 | 3 (3) | 4 | 4 | 4 | 4 | # | | na |
| 13 | C21H20N4O3 | 1635 | 0.856−1 | 4 or 5 (6) | 7 | 4 | 4 | 1 | 0 | | |
| 14 | C15H16N2O | 1767 | 0.869−1 | 1 or 2 (2) | 116 | 101 | 51 | 35 | 9 | 2-amino-3-methyl-N-(2-methylphenyl)benzamide | inconclusive |
| | | | | | | | | | | 2-amino-3-methyl-N-(3-methylphenyl)benzamide | inconclusive |
| | | | | | | | | | | 2-amino-3-methyl-N-(4-methylphenyl)benzamide | inconclusive |
| | | | | | | | | | | 2-amino-6-methyl-N-(2-methylphenyl)benzamide | inconclusive |
| | | | | | | | | | | 2-amino-6-methyl-N-(3-methylphenyl)benzamide | inconclusive |
| | | | | | | | | | | 1-[2-amino-5-(benzylamino)phenyl]ethanone | Ames + |
| | | | | | | | | | | 2-amino-N-benzyl-3-methylbenzamide | inconclusive |
| | | | | | | | | | | 2-amino-N-benzyl-6-methylbenzamide | inconclusive |
| | | | | | | | | | | 2-amino-N-benzyl-5-methylbenzamide | inconclusive |
| 15 | C14H8O3 | 44 | 0.77−0.956 | 1 or 2 (4) | 6 | 4 | 1 | 1 | 0 | | |
| 16 | C12H9N | 71 | 1 | 1 (1) | 71 | 68 | 30 | 4 | 2 | 3H-benz[e]indole | na |
| | | | | | | | | | | 1H-benz[f]indole | na |
| 17 | C20H9N3O2 | 5 | | | | 0 | 0 | 0 | | | |
| | C22H10O3 | 2 | | | | 0 | 0 | 0 | | | |

a Numbers in bold represent the number of candidates remaining after each filtration step. b (number): Number of experimental MS/MS fragments. Cand: remaining candidates. na: not applicable. # Standard was available for purchase and confirmed as present in the fraction.
fragments (e.g., nitro-, amino-, hydroxyl-groups, methyl), while the ring structure of polyaromatic compounds remained intact. As a larger range of compound groups (particularly nitro compounds) could be ionized in APCI, this technique was used to identify unknowns of interest in original sample screening. The information from ESI was used to confirm the presence or absence of certain functional groups in the molecular structure of the unknown compounds according to the criteria presented in Table 1. The study of the 47 standards enabled the identification of typical neutral losses associated with functional groups (Table 1 and Table S3, SI), which can be used as diagnostic fragments to select candidates. For instance, azaarenes in positive mode released CHN and CH2N, while CO and CHO neutral losses are typical for keto-PAHs and NO for nitro compounds. Lowest detectable concentrations of the compounds in positive and negative APCI and ESI (SI Table S3) were used to interpret and apply relative efficiencies of different ionization modes to different classes of compounds. For instance, azaarenes, heterocyclic amines, amino-compounds, and amides were detectable at concentrations up to 10 times lower in ESI positive than in APCI. Nitro-compounds are only detected in APCI, mainly in negative mode, so if the same compound (same exact mass and retention time) is also detected in ESI mode, it will not contain a nitro group. This information is therefore useful for the selection of candidates.

Figure 2. Illustrative example for the selection of the best candidates for Peak 14. (a) MS2 spectrum, (b) MetFrag fragmentation prediction for the highest scored candidates, and (c) procedure using MetFrag scoring, MS interpretation, analytical properties and LSER and mutagenicity predictions.

Identification Procedure. The formulas for the different unknown compounds were previously determined12 using the accurate mass, the isotope pattern, and the elements C, H, N, O, P, and S (no halogens were observed). The error margin was set to 5 ppm and the formulas provided by the software Xcalibur (Thermo Scientific, Bremen) and MOLGEN-MSMS36 were verified applying the Seven Golden Rules for heuristic filtering of molecular formulas.11 The multicriteria identification strategy presented in Figure 1 was used to select candidate structures as follows, with each step generally reducing the number of candidates. The number of candidates was drastically decreased (in some cases by orders of magnitude) for most peaks by using a cutoff value of the MetFrag score chosen case-by-case according to the score distribution (in all cases >0.7). The cutoff was chosen where the distribution showed a clear break between the top candidates and the next candidates. The candidate reduction is shown in Table 2 and the reasoning is explained in detail per peak in the SI, Table S4. For example, for Peak 1, the
candidates were reduced from 1345 to 20 using a score cutoff of 0.89 in MetFrag. Thus, the cutoffs applied were more dynamic than the strict cutoff of 0.9 applied in Hug et al.,22 which would have resulted in the elimination of several interesting candidates here. Peak 16 (Table 2) is an exception to the candidate reduction, as only one fragment was recorded (hydrogen loss), resulting in a score of 1 for all the candidates. The second step involved the interpretation of analytical information. The ionization efficiency criteria are summarized in Table 1 and were applied in the selection of candidates. For instance, Peak 6, for which 59 candidates were retrieved from ChemSpider using MetFrag, was detected only in APCI positive mode. Among these 59 candidates, 12 had an amino group in their structure (details in Table S4, SI) and could be removed from the list, as it was shown that amino compounds are better ionizable in ESI positive mode (Table 1 and Table S3, SI). Using the ionization efficiency for Peak 8, which was not detected in ESI negative mode, one of the candidates with a hydroxyl group in its structure was discarded, as such compounds are also detected in ESI negative mode (Table 1 and S3, SI). Table 2 shows that candidates could be removed for Peaks 6, 8, 13, 14, 15, and 16 in this manner. MS/MS interpretation of fragments that are observed but not expected, or vice versa, allowed for further reduction of candidates, as seen for Peak 5 (Table S4, SI), for which two candidates among five have a benzyl group in their structure that is expected to be lost during fragmentation (substituent on aromatic rings for candidate 3, and benzyl at β position of carbonyl for candidate 2). Table 1 shows that substituents on aromatic rings are easily lost and carbonyl bonds are breakable, as seen for benzophenone. Using the MS/MS interpretation of fragments, it was possible to decrease the number of candidates in many cases, as seen in Table 2 for Peaks 1, 14, and 16, for which more than half of the candidates were removed. The combination of log Kow and pKa criteria enabled the removal of further candidate structures for some peaks (e.g., 6, 14, and 16) but none for others (Table 2). In a third step, LSER was applied. The CHI values of all peaks of interest were between 78 and 88, except for the unknown 6 (CHI = 49, SI Table S5 for details). For instance, Peak 11 required 81% of methanol to elute and among the 14 candidates only 4 had a predicted CHI matching this unknown (SI Table S5). Applying the comparison of measured and predicted CHI values for all peaks resulted in the reduction of 130 candidates in total to 20 candidates. In the last step, mutagenicity prediction was performed and the calculated values of ΔHf for the candidates containing an amino group are given in the SI Table S5. Among the 20 candidates remaining to explain Peaks 1, 2, 3, 4, 5, 11, 14, and 16, amino groups were present in 12. For instance, for Peak 1 (SI Table S5), only one candidate remained on the list after the LSER selection; it did not possess an amino group in its structure (4,5,6,7-tetramethyl-1H-indole-2,3-dione) and was removed, as it was clearly shown that the mutagenicity in this fraction was caused by aromatic amines.12 Table 2 shows that one compound was identified as a possible mutagen, one as a nonmutagen, and ten were inconclusive. Although most candidates were inconclusive (ΔΔE values within the range [−5, +5] kcal/mol), all but two of these are highly similar substitutional isomers (see SI Figure S3) and thus would be expected to have similar results.
Illustrative Example of the Procedure. Figure 2 explains the selection procedure for Peak 14. The APCI MS2 spectrum of the unknown showed two fragments (C8H8NO and C7H9N), representing a splitting of the molecule (Figure 2a). The 1767 candidates were ranked by MetFrag, where 116 candidates had a score > 0.865 (Figure 2c) and one or two predicted fragments (Figure 2b). This peak was detected only in positive mode, which does not support the presence of hydroxylated compounds or an aromatic ketone. As seen in Table 1, the signal is expected to be more intense in APCI negative than in APCI positive mode for these two types of compounds. Fifteen candidates could be removed based on the assessment of ionization behavior. As both fragments can be explained by breaking only one bond, candidates with only one fragment predicted were removed from the list. Furthermore, for some candidates, the loss of a methyl group (N − methyl bond, Table 1) and/or the loss of CxHyN2 (bond breakage in β position of carbonyl, Table 1) would be expected but was not observed in the experimental spectrum. Thus, these candidates were excluded. The match of physical-chemical properties of the candidates with the analytical information on the unknowns (log Kow range and pKa) led to the exclusion of another 16 structures. Matching predicted and experimental retention (LSER) enabled a further decrease in the number of candidates to nine. Among these nine candidates, one was predicted as a possible mutagen, while the others were inconclusive. However, the reference standards are not commercially available and thus it was not possible to confirm the presence or potential mutagenicity of this compound. Using this procedure for Peak 12, four structures remained after selection: benzyl(diphenyl)phosphine oxide, (3-methylphenyl)(diphenyl)phosphine oxide, (2-methylphenyl)(diphenyl)phosphine oxide, and (4-methylphenyl)(diphenyl)phosphine oxide. Benzyl(diphenyl)phosphine oxide was available as a standard, and the chromatogram and MS2 spectrum of the unknown Peak 12 and of the chemical standard are presented in the SI Figure S1a and Figure S1b, respectively. The retention times and the MS2 fragmentation pattern are identical for both unknown and standard. The fragments show a difference of 1 ppm or less, and the relative intensities show similar trends. Thus, benzyl(diphenyl)phosphine oxide is confirmed to be present in the blue rayon sample, but was not responsible for the mutagenicity observed in the sample (see SI section S2, Figure S2a and b). Peaks 2, 4, 12, and 14 were the only peaks for which candidate structures remained after all steps and these are given in SI Figure S3. All but the confirmed structure mentioned above have aromatic amine groups. Of these, the bottom left structure (1-[2-amino-5-(benzylamino)phenyl]ethanone) is predicted to be Ames positive (all others inconclusive or negative) and is thus most favorable in terms of explaining the sample mutagenicity; however, none of the remaining standards are commercially available.
Critical Assessment of the Candidate Selection Procedure. The candidate selection for unknown identification presented here is a promising method for the isolation and investigation of mutagenic compounds in water matrices and shows how candidate reduction can be achieved using a vast range of analytical information. The successful nontarget identification of benzyl(diphenyl)phosphine oxide (Peak 12) supports the potency of the workflow for candidate selection. The procedure proved to be very successful in reducing the number of possible candidates from thousands to very few. However, information on sources, environmental fate, and toxicity is not available for these compounds, as they have never been reported in the environment and are not commercially
DOI: 10.1021/es503640k Environ. Sci. Technol. XXXX, XXX, XXX−XXX
Article
available. Although this prevented analytical and toxicological confirmation in this study, further investigation and monitoring of aromatic amines in the environment will be performed as a result of this study, including the tentatively identified candidate mutagens in suspect screening.37 The use of MetFrag greatly improved the searches through thousands of compounds with the same molecular formula and was helpful in the selection of the best candidates due to the ranking power of the score. The applied combination of mass spectrometric, physicochemical, and chromatographic information provided valuable additional selection criteria and helped to drastically reduce the number of candidates. Together with a study on EDA of androgen and arylhydrocarbon receptor binding compounds, which used an in silico model to confirm receptor-binding potential,38 the present investigation is one of the first to include effect prediction in candidate selection. Mutagenicity prediction for aromatic amines is straightforward to use and does not require further experiments. Although in this study in silico mutagenicity modeling did not greatly reduce the number of candidates (one compound for Peak 1 was predicted nonmutagenic, Table 2), it may be very promising in other cases where more candidates are left after filtering. However, even though the presented workflow clearly showed its efficiency in decreasing the number of candidates and tentatively identifying possible structures, no mutagens could be positively identified, and there is a need for further advancement. (i) Aromatic amines and other chemicals of concern might exhibit adverse bioactivity (here mutagenicity) at concentrations below the detection limit of many nontarget methods and thus could not be identified as the cause of the observed effects. 
Some studies have reported ng/L levels of aromatic amines in surface waters.14,39 It cannot be excluded that, at such low concentrations, aromatic amines responsible for the mutagenicity in this fraction might not be detected. However, in a previous study,40 freely dissolved effect concentrations (DEC) along the entire concentration−response relationship were measured on the same equipment and associated with the mutagenicity of several known mutagens. Analyzed DECs for the extremely mutagenic 3-nitrobenzanthrone went down to a few ng/L, with a DEC50 of 10 ng/L (5 pg on column). For 1,3-diaminopyrene, a typical mutagenic amine, DECs down to 2 μg/L were clearly above the limit of quantification, with a DEC50 of about 100 μg/L (50 ng on column). With our analytical method, pg levels of 1,8-diaminopyrene, 1-nitropyrene, and 2-nitro-9-fluorenone can be quantified (SI Table S3). Thus, we believe that the described method should be able to detect mutagens present in the fractions. However, quadrupole-time-of-flight (QToF) and triple quadrupole instruments,41 but also next-generation Orbitraps, are more sensitive, and thus lower limits of detection can be achieved. An alternative may be the use of derivatization to enhance the stability and detectability of aromatic amines, as shown by Fekete et al.,42 where ng/L levels were achieved for different aromatic amines using different derivatization agents. However, such tailor-made analytical approaches for a specific class of chemicals may only be helpful in a few EDA studies,
while the typical approach needs to screen for the whole range of possible chemicals in a biologically active fraction. (ii) Since the present approach was based on database searching and databases are always limited, the compounds behind the detected peaks may be absent from the database. An alternative approach to database searching is the use of structure generation to deliver all possible structures for a given molecular formula and substructure information.21 For larger molecules, this approach may produce several million possible structures, which cannot be evaluated with typical computer capacities. As shown for GC-EI-MS, the involvement of mass spectral substructure information and calculated properties may be effective in limiting the generated structures to a manageable number.43 In this study, insufficient information from fragmentation patterns was available to limit the number of candidates sufficiently. This can be demonstrated for Peak 17 (SI Table S4), where the occurrence of only one fragment corresponding to a loss of carbonyl ([M+H-CO]+) results in the generation of more than 10 000 000 possible structures for the formula C20H9N3O2. (iii) All models used for candidate exclusion suffer from uncertainties, which are propagated if several models are used sequentially. Thus, the correct structure might fail to fulfill all criteria and be excluded. A consensus approach that scores candidates according to the criteria rather than filtering them may reduce the risk of false exclusion.43 (iv) Although more analytical and toxicological information was exploited here than ever before in LC-MS-based EDA, there is still more information that might be used. This includes, for example, fragments formed under different collision energies, particularly in the context of expected but nonobserved fragments. 
Further investigation using deuterium exchange could also provide some information on labile hydrogens, as performed in Hug et al.22 The major challenge for all these additional approaches is a lack of applicable in silico prediction tools and the reliance on manual MS/MS interpretation and expert judgment, which are time-consuming and not always available. (v) In this study, it was assumed that the masses in the chromatograms represented the most commonly observed protonated or deprotonated molecular ions. Although this is a reasonable approach to limit the number of possible structures, other ion species may also occur, such as [M]+ for quaternary ammonium compounds or adducts such as [M+Na]+, [M+NH4]+, and many more.44 Involving these adduct/ion species may reduce the risk of overlooking relevant structures in future work. It is also possible that in-source fragmentation of the molecule occurs during the ionization process, such that the masses correspond to fragments and not to parent molecules. To summarize, in the present work a promising and innovative approach for the LC-MS-based tentative identification of mutagens in fractions of river water extracts has been presented. This workflow has the potential to be translated to other toxicological end points and environmental matrices and may be a basis for future EDA studies. Possibilities for further
(3) Brack, W. Effect-directed analysis: A promising tool for the identification of organic toxicants in complex mixtures? Anal. Bioanal. Chem. 2003, 377 (3), 397−407.
(4) Weiss, J. M.; Hamers, T.; Thomas, K. V.; van der Linden, S.; Leonards, P. E. G.; Lamoree, M. H. Masking effect of anti-androgens on androgenic activity in European river sediment unveiled by effect-directed analysis. Anal. Bioanal. Chem. 2009, 394 (5), 1385−1397.
(5) Kind, T.; Fiehn, O. Advances in structure elucidation of small molecules using mass spectrometry. Bioanal. Rev. 2010, 2 (1−4), 23−60.
(6) Scheubert, K.; Hufsky, F.; Bocker, S. Computational mass spectrometry for small molecules. J. Cheminformatics 2013, 5.
(7) Schymanski, E. L.; Neumann, S. CASMI: And the winner is. Metabolites 2013, 3 (2), 412−39.
(8) Nishioka, T.; K., T.; Kinumi, T.; Makabe, H.; Matsuda, F.; Miura, D.; Miyashita, M.; Nakamura, T.; Tanaka, K.; Yamamoto, A. Winners of CASMI2013: Automated tools and challenge data. Mass Spectrom. 2014, 3 (2), dx.doi.org/10.5702/massspectrometry.S0039.
(9) Wolf, S.; Schmidt, S.; Muller-Hannemann, M.; Neumann, S. In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinform. 2010, 11, 148.
(10) Zedda, M.; Zwiener, C. Is nontarget screening of emerging contaminants by LC-HRMS successful? A plea for compound libraries and computer tools. Anal. Bioanal. Chem. 2012, 403 (9), 2493−2502.
(11) Kind, T.; Fiehn, O. Seven Golden Rules for Heuristic Filtering of Molecular Formulas Obtained by Accurate Mass Spectrometry. BMC Bioinform. 2007, 8, 105−124.
(12) Gallampois, C. M. J.; Schymanski, E. L.; Bataineh, M.; Buchinger, S.; Krauss, M.; Reifferscheid, G.; Brack, W. Integrated biological-chemical approach for the isolation and selection of polyaromatic mutagens in surface waters. Anal. Bioanal. Chem. 2013, 405 (28), 9101−9112.
(13) Ames, B. N.; McCann, J.; Yamasaki, E. Methods for detecting carcinogens and mutagens with salmonella-mammalian-microsome mutagenicity test. Mutat. Res. 1975, 31 (6), 347−363.
(14) Ohe, T.; Watanabe, T.; Wakabayashi, K. Mutagens in surface waters: A review. Mutat. Res., Rev. Mutat. Res. 2004, 567 (2−3), 109−149.
(15) National Center for Biotechnology Information. PubChem Compound Database. http://pubchem.ncbi.nlm.nih.gov (accessed 2014).
(16) Royal Society of Chemistry. ChemSpider. www.chemspider.com (accessed 2014).
(17) Schymanski, E. L.; Neumann, S. The critical assessment of small molecule identification (CASMI): Challenges and solutions. Metabolites 2013, 3 (3), 517−538.
(18) Abraham, M. H.; Roses, M.; Poole, C. F.; Poole, S. K. Hydrogen bonding 0.42. Characterization of reversed-phase high-performance liquid chromatographic C-18 stationary phases. J. Phys. Org. Chem. 1997, 10 (5), 358−368.
(19) Ulrich, N.; Schuurmann, G.; Brack, W. Linear solvation energy relationships as classifiers in non-target analysis: A capillary liquid chromatography approach. J. Chromatogr., A 2011, 1218 (45), 8192−8196.
(20) Schymanski, E. L.; Meinert, C.; Meringer, M.; Brack, W. The use of MS classifiers and structure generation to assist in the identification of unknowns in effect-directed analysis. Anal. Chim. Acta 2008, 615 (2), 136−147.
(21) Schymanski, E. L.; Meringer, M.; Brack, W. Automated strategies to identify compounds on the basis of GC/EI-MS and calculated properties. Anal. Chem. 2011, 83 (3), 903−912.
(22) Hug, C.; Ulrich, N.; Schulze, T.; Brack, W.; Krauss, M. Identification of novel micropollutants in wastewater by a combination of suspect and nontarget screening. Environ. Pollut. 2014, 184, 25−32.
(23) Benigni, R.; Giuliani, A.; Franke, R.; Gruska, A. Quantitative structure-activity relationships of mutagenic and carcinogenic aromatic amines. Chem. Rev. 2000, 100 (10), 3697−3714.
improvements have been discussed. Predictive in silico tools have the potential to support EDA significantly by enhancing throughput and success rate. Reducing the uncertainty of predictive tools and integrating them in a way that balances extensive reduction of false candidates with a minimum of false exclusions remains a challenge to be addressed. Automation of structure elucidation by online coupling of different software is required to make such workflows applicable to a larger community of scientific and monitoring laboratories. It is evident that nontarget screening relies heavily on databases, as the application of structure generation is limited in this case by the small number of informative fragments in the MS/MS spectra obtained. Thus, additional database search development and improvement in the number of spectra available for the identification of polar unknown compounds will be beneficial5,10 as well as contribution of spectra to open databases, as performed here, which is now possible with automated software such as RMassBank for the MassBank database.45
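The trade-off between removing false candidates and avoiding false exclusions can be illustrated by comparing sequential filtering with consensus scoring. The following Python sketch is a minimal illustration under stated assumptions: the candidate records, the CHI tolerance of 5 units, the equal weights, and the function names are all hypothetical and do not reproduce the consensus method of ref 43 or the procedure used in this study.

```python
def sequential_filter(candidates, criteria):
    """Hard exclusion: keep only candidates that pass every criterion."""
    return [c for c in candidates if all(test(c) for test in criteria.values())]


def consensus_rank(candidates, criteria, weights=None):
    """Soft ranking: score each candidate by the weighted fraction of criteria passed."""
    weights = weights or {key: 1.0 for key in criteria}
    total = sum(weights.values())
    scored = []
    for c in candidates:
        passed = sum(weights[key] for key, test in criteria.items() if test(c))
        scored.append((passed / total, c["name"]))
    # Highest score first; ties are broken alphabetically by name.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


CHI_MEASURED = 81.0    # hypothetical measured CHI of the unknown
CHI_TOLERANCE = 5.0    # hypothetical acceptance window

criteria = {
    "fragment_score": lambda c: c["score"] >= 0.89,
    "chi_match": lambda c: abs(c["chi_pred"] - CHI_MEASURED) <= CHI_TOLERANCE,
    "aromatic_amine": lambda c: c["has_amino"],
}

# Hypothetical candidates: Y narrowly misses the score cutoff only.
candidates = [
    {"name": "X", "score": 0.95, "chi_pred": 80.0, "has_amino": True},
    {"name": "Y", "score": 0.88, "chi_pred": 82.0, "has_amino": True},
    {"name": "Z", "score": 0.93, "chi_pred": 60.0, "has_amino": False},
]

survivors = sequential_filter(candidates, criteria)
ranking = consensus_rank(candidates, criteria)
print([c["name"] for c in survivors])
print(ranking)
```

In this toy case, sequential filtering discards candidate Y because of a marginal failure on a single criterion, whereas the consensus ranking keeps it in second place for manual inspection.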
■
ASSOCIATED CONTENT
Supporting Information
Additional tables (Tables S1−S5), figures (Figures S1−S3), and text (SI S1−S2) supporting the experimental section and the results, detailing the ionization and fragmentation behavior of standards, chromatographic information for the compounds to be identified, candidate selection, prediction of retention time and mutagenicity, MS spectra for candidate confirmation, and candidate similarities. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
*Phone: +49 341 235 1531; fax: +49 341 235 2401; e-mail: [email protected].
Present Address
⊥(C.M.J.G.) Department of Chemistry, Umeå University, 90187 Umeå, Sweden.
Author Contributions
The manuscript was written through contributions of all coauthors. All authors have given approval to the final version of the manuscript.
Notes
The authors declare no competing financial interest.
■
ACKNOWLEDGMENTS
This work was supported by the European Commission through the Marie Curie Research Training Network KEYBIOEFFECTS (contract number MRTN-CT-2006-035695) and the EU FP7 Project SOLUTIONS (603437). We thank C. Hug for performing the LC-MS run for the LSER application and T. Schulze for the MOPAC calculations.
■
REFERENCES
(1) Jha, A. N. Genotoxicological studies in aquatic organisms: An overview. Mutat. Res., Fundam. Mol. Mech. Mutagen. 2004, 552 (1−2), 1−17.
(2) Richardson, S. D.; Plewa, M. J.; Wagner, E. D.; Schoeny, R.; DeMarini, D. M. Occurrence, genotoxicity, and carcinogenicity of regulated and emerging disinfection by-products in drinking water: A review and roadmap for research. Mutat. Res., Rev. Mutat. Res. 2007, 636 (1−3), 178−242.
(24) Kazius, J.; McGuire, R.; Bursi, R. Derivation and validation of toxicophores for mutagenicity prediction. J. Med. Chem. 2005, 48 (1), 312−320.
(25) Liu, H. X.; Papa, E.; Gramatica, P. QSAR prediction of estrogen activity for a large set of diverse chemicals under the guidance of OECD principles. Chem. Res. Toxicol. 2006, 19 (11), 1540−1548.
(26) Bentzien, J.; Hickey, E. R.; Kemper, R. A.; Brewer, M. L.; Dyekjaer, J. D.; East, S. P.; Whittaker, M. An in silico method for predicting ames activities of primary aromatic amines by calculating the stabilities of nitrenium ions. J. Chem. Inf. Model. 2010, 50 (2), 274−297.
(27) Borosky, G. L. Carcinogenic carbocyclic and heterocyclic aromatic amines: A DFT study concerning their mutagenic potency. J. Mol. Graph. Model. 2008, 27 (4), 459−465.
(28) Knize, M. G.; Hatch, F. T.; Tanga, M. J.; Lau, E. Y.; Colvin, M. E. A QSAR for the mutagenic potencies of twelve 2-aminotrimethylimidazopyridine isomers: Structural, quantum chemical, and hydropathic factors. Environ. Mol. Mutagen. 2006, 47 (2), 132−146.
(29) McCarren, P.; Bebernitz, G. R.; Gedeck, P.; Glowienke, S.; Grondine, M. S.; Kirman, L. C.; Klickstein, J.; Schuster, H. F.; Whitehead, L. Avoidance of the Ames test liability for aryl-amines via computation. Bioorg. Med. Chem. 2011, 19 (10), 3173−3182.
(30) USEPA. Estimation Program Interface (EPI) Suite™, V3.20; United States Environmental Protection Agency, 2007.
(31) NORMAN Association. NORMAN MassBank. www.massbank.eu (accessed 15/06/2014).
(32) Reemtsma, T. Liquid chromatography-mass spectrometry and strategies for trace-level analysis of polar organic pollutants. J. Chromatogr., A 2003, 1000 (1−2), 477−501.
(33) Lanina, S. A.; Toledo, P.; Sampels, S.; Kamal-Eldin, A.; Jastrebova, J. A. Comparison of reversed-phase liquid chromatography-mass spectrometry with electrospray and atmospheric pressure chemical ionization for analysis of dietary tocopherols. J. Chromatogr., A 2007, 1157 (1−2), 159−170.
(34) Straube, E. A.; Dekant, W.; Volkel, W. Comparison of electrospray ionization, atmospheric pressure chemical ionization, and atmospheric pressure photoionization for the analysis of dinitropyrene and aminonitropyrene LC-MS/MS. J. Am. Soc. Mass Spectrom. 2004, 15 (12), 1853−1862.
(35) Bataineh, M.; Lubcke-von Varel, U.; Hayen, H.; Brack, W. HPLC/APCI-FTICR-MS as a tool for identification of partial polar mutagenic compounds in effect-directed analysis. J. Am. Soc. Mass Spectrom. 2010, 21 (6), 1016−1027.
(36) Meringer, M.; R., S.; Zhang, J.; Muller, A. MS/MS data improves automated determination of molecular formulas by mass spectrometry. Match 2011, 65 (2), 259−290.
(37) Krauss, M.; Singer, H.; Hollender, J. LC-high resolution MS in environmental analysis: From target screening to the identification of unknowns. Anal. Bioanal. Chem. 2010, 397 (3), 943−51.
(38) Radovic, J. R.; Thomas, K. V.; Parastar, H.; Diez, S.; Tauler, R.; Bayona, J. M. Chemometrics-assisted effect-directed analysis of crude and refined oil using comprehensive two-dimensional gas chromatography-time-of-flight mass spectrometry. Environ. Sci. Technol. 2014, 48 (5), 3074−3083.
(39) Bornick, H.; Grischek, T.; Worch, E. Determination of aromatic amines in surface waters and comparison of their behavior in HPLC and on sediment columns. Fresenius J. Anal. Chem. 2001, 371 (5), 607−13.
(40) Bougeard, C.; Gallampois, C.; Brack, W. Passive dosing: An approach to control mutagen exposure in the Ames fluctuation test. Chemosphere 2011, 83 (4), 409−414.
(41) Kim, H.; Huh, I. A.; Jung, M.; Shin, H. S. Ultra-trace level determination of benzidine and dichlorobenzidine residues in surface water by liquid chromatography-electrospray ionization tandem mass spectrometry. Int. J. Environ. Anal. Chem. 2013, 93 (9), 999−1007.
(42) Fekete, A.; Malik, A. K.; Kumar, A.; Schmitt-Kopplin, P. Amines in the environment. Crit. Rev. Anal. Chem. 2010, 40, 102−121.
(43) Schymanski, E. L.; Gallampois, C. M. J.; Krauss, M.; Meringer, M.; Neumann, S.; Schulze, T.; Wolf, S.; Brack, W. Consensus structure elucidation combining GC/EI-MS, structure generation, and calculated properties. Anal. Chem. 2012, 84 (7), 3287−3295.
(44) Holcapek, M.; Jirasko, R.; Lisa, M. Basic rules for the interpretation of atmospheric pressure ionization mass spectra of small molecules. J. Chromatogr., A 2010, 1217 (25), 3908−3921.
(45) Stravs, M. A.; Schymanski, E. L.; Singer, H. P.; Hollender, J. Automatic recalibration and processing of tandem mass spectra using formula annotation. J. Mass Spectrom. 2013, 48 (1), 89−99.