Letter to the Editor Cite This: Environ. Sci. Technol. 2019, 53, 5531−5533
pubs.acs.org/est
Response to “Letter to the Editor: Optimism for Nontarget Analysis in Environmental Chemistry”
We thank Samanipour et al.1 for their comments on our Viewpoint.2 We appreciate their optimism that state-of-the-art mass spectrometry and informetric techniques will provide the information with which pollutants contaminating our environment can be identified. We share this optimism, and we encourage our colleagues to view the discussion of questionable aspects of nontargeted methods as an opportunity to promote continual improvements in the field.

We suggested that the gold standard of reproducibility would be for other researchers to independently identify and quantify the same compound(s),2 but even when the objective is restricted to qualitative identification, there is still room for improvement. Rostkowski et al. led the most recent interlaboratory study of nontargeted screening, involving 21 participating laboratories in Europe, North America, and Australia.3 They reported (tentative) identifications of about 2350 compounds in house dust. This effort can be considered a success, and Rostkowski et al. should be commended for leading this rigorous study.3 Their findings underline the fact that no single analytical technique is suitable for the analysis of all contaminants, and that successful nontargeted screening will require the development of multiplatform approaches. In this case, only 37% of all compounds were reported by more than one participant, and an even smaller fraction (5%) was detected using both gas and liquid chromatography.3 This small overlap in reporting thus raises the question: Is nontargeted screening reproducible?

The success and reproducibility of nontargeted screening hinge upon access to open, accurate, and representative chemical databases, consisting of structures, mass spectra, metadata, and other identifiers such as retention indices and collisional cross sections. An impressive array of open databases now exists, and these databases are unquestionably useful, but they are incomplete.
Spectral libraries, such as those compiled by NIST and Wiley, will always be small compared to structure libraries (for example, EPA’s CompTox Chemistry Dashboard),4,5 in part because of the need for authentic standards. Herein lies a cyclical problem: If one objective of nontargeted screening is to identify previously unknown pollutants, and by virtue of being unknown, these pollutants are not registered in any database, then one may question whether incomplete databases are valuable for nontargeted screening. Even suspect screening may be frustrated by databases that report incorrect or unrepresentative structures. This is especially problematic for mixtures; mixed halogenated paraffins are a prime example. Proposing representative structures based on mass spectrometric characterization of standards or technical mixtures is critical to the success of suspect screening and as a first step toward determining environmental significance.

1,8-Dibromo-3,6-dichlorocarbazole is a useful example of a previously unknown compound identified in sediment from Ontario.6 At the time of its discovery, neither its structure nor its mass spectrum was registered in any database. Consequently, human intervention was required to propose a structure by manual interpretation of its mass spectrum. Authentic standards were not commercially available and needed to be synthesized. Of course, both steps are time-consuming and may be rate-limiting in the identification process.

Samanipour et al.1 raise an intriguing point that machine learning may help expedite these steps by enabling automated annotation7 and prediction of fragmentation patterns.8−11 One implication is that existing spectral libraries could be expanded, potentially capturing a large portion of “chemical space”. In practice, the approach9,10 does not always yield high-quality results, and critical evaluation is still important.11 For example, the predicted mass spectrum of 1,8-dibromo-3,6-dichlorocarbazole (Figure 1a) is a poor match compared to the experimental spectrum (Figure 1b), presumably because this compound was unlike the compounds used to train the machine learning algorithm. In contrast, the theoretical spectrum shown in Figure 1c is an excellent match. The latter was produced using the quantum chemical method developed by Grimme,8 which to our knowledge has not yet been applied to contaminant identification. This computationally demanding calculation required a few hours on a supercomputing cluster. However, the excellent match raises the tantalizing possibility that an unknown contaminant can be confirmed in hours, as opposed to the days or months required to obtain an authentic standard. Ultimately, one hopes that a computational method will eventually be developed that enables automated interpretation of a mass spectrum, even if the corresponding structure is absent from existing databases. Until then, we may still have to rely on the skill of old-school mass spectrometrists.

We also continue to caution that important concepts and terms (jargon) in this field need to be standardized.
Indeed, our Viewpoint2 focused on “nontargeted screening,” and Samanipour et al.1 focused on “non-target analysis.” In line with the opinion of others in the field, we suggest that the term “screening” better captures the need to prioritize environmentally significant chemicals, which is often accomplished using a scripted filter (or sieve). Rostkowski et al.3 note in their recent collaborative study that not even established users of untargeted analytical techniques share a common definition of the terms “target,” “suspect,” and “nontarget,” in part due to inconsistency between liquid chromatography and gas chromatography users. Another term that has various meanings to different people working in this field is “feature.” Samanipour et al.1 define this term as “a data tuple consisting of three aspects of the analytical instrument output (a chromatographic peak, and m/z of adducts and isotopes) . . .” One view is that this definition is not general enough. For example, chromatographic separation is not always necessary for the identification of a previously unknown compound.12 In addition, this definition misses key elements. Does a “feature” include fragment ions as well as adduct ions? Depending on the ionization method, both of these types of ions may be essential for a compound’s identification. The most important part of a “feature” is the abundance of the ions. Indeed, relative ion abundances can lead the interpreter (or script) to a quick and easy understanding of the presence and number of chlorine or bromine atoms in a particular compound. Clearly, a “feature” should include the abundances of all the ions.

Some of these terminological problems could be solved by making the mass spectral data publicly available. In this way, users could determine for themselves what constitutes a “feature” and verify the authors’ interpretations. In general, the need for data transparency and openness is obvious, and of course, a committee exists to promote this need.13 In many fields, archiving the data in a publicly available site has become the accepted standard. For example, genomic data are often placed in public databases, but even here, “policies have not adequately resolved a critical dilemma, regarding how data are to be used once made publicly available”.14 In the case of nontargeted screening, it is easy to imagine putting all the mass spectral data from such an experiment in a public database. In principle, this would be simple, consisting of lists of exact masses and relative abundances of all ions for all chromatographic peaks under the various ionization conditions. Of course, all of the metadata would need to be included as well. In practice, these lists (mass spectra) may morph into hundreds of thousands of entries, making archiving these data problematic. The publisher of Environmental Science & Technology, the American Chemical Society, has no policy on such data retention and storage. There are good reasons for going slowly,15 but in the meantime, those of us involved in nontargeted analysis (broadly defined) need to consider how to make the data and the related informatic tools more publicly available.16

Finally, we cautiously agree that nontargeted screening is accelerating the rate of contaminant discovery, but we nevertheless wonder: What are the criteria for success? According to the most recent United Nations Global Chemicals Outlook,17 global chemical production is projected to rapidly outpace population growth during the next 15 years. We feel that the word “optimism” is not quite the right term to describe this scenario. Instead, we hope that our colleagues in the field will be imbued with a sense of determination to continually improve nontargeted screening and to stay ahead of the curve.

Figure 1. Mass spectrum of 1,8-dibromo-3,6-dichlorocarbazole obtained from (a) machine learning (CFM-EI9); (b) experiment;6 and (c) QCEIMS.8

Received: April 23, 2019 Accepted: May 1, 2019 Published: May 10, 2019
DOI: 10.1021/acs.est.9b02473
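The remark in this letter that relative ion abundances can lead an interpreter (or script) to the number of chlorine or bromine atoms can be illustrated with a minimal sketch. It is not the authors' method. The function names `halogen_pattern` and `guess_halogens` are hypothetical, the isotopic abundances are approximate natural values, and contributions from carbon-13 and other elements are deliberately ignored.

```python
from math import comb

# Approximate natural abundances of the heavier isotopes (assumed values).
P_37CL = 0.2424  # chlorine-37
P_81BR = 0.4931  # bromine-81


def _binomial(n, p):
    """Probability of finding k heavy isotopes among n atoms, for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def halogen_pattern(n_cl, n_br):
    """Relative intensities (base peak = 100) of the M, M+2, M+4, ... peaks.

    Simplified sketch: only Cl and Br isotopes are considered.
    """
    cl = _binomial(n_cl, P_37CL)
    br = _binomial(n_br, P_81BR)
    # Convolve the two distributions; the index is the total number of
    # heavy halogen isotopes, i.e. the peak at M + 2 * index.
    pattern = [0.0] * (n_cl + n_br + 1)
    for i, a in enumerate(cl):
        for j, b in enumerate(br):
            pattern[i + j] += a * b
    top = max(pattern)
    return [100.0 * x / top for x in pattern]


def guess_halogens(observed, max_atoms=4):
    """Return the (n_cl, n_br) pair whose pattern is closest to `observed`.

    `observed` is a list of relative intensities (base peak = 100) for
    M, M+2, M+4, ...; closeness is a plain sum of squared differences.
    Returns None if no candidate has the same number of peaks.
    """
    best, best_err = None, float("inf")
    for n_cl in range(max_atoms + 1):
        for n_br in range(max_atoms + 1):
            theory = halogen_pattern(n_cl, n_br)
            if len(theory) != len(observed):
                continue
            err = sum((t - o) ** 2 for t, o in zip(theory, observed))
            if err < best_err:
                best, best_err = (n_cl, n_br), err
    return best


if __name__ == "__main__":
    # Two bromines and two chlorines, as in 1,8-dibromo-3,6-dichlorocarbazole.
    for k, intensity in enumerate(halogen_pattern(2, 2)):
        print(f"M+{2 * k}: {intensity:.1f}")
```

Calling `halogen_pattern(2, 2)` gives the five-peak cluster expected for a compound with two bromine and two chlorine atoms, and `guess_halogens` reverses the calculation for a measured cluster. A real script would also need to handle noise, overlapping ions, and other elements.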
Ronald A. Hites*
O’Neill School of Public and Environmental Affairs, Indiana University, Bloomington, Indiana 47405, United States
Karl J. Jobst
Department of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario L8S 4M1, Canada
■ AUTHOR INFORMATION
Corresponding Author
*E-mail: [email protected].
ORCID
Ronald A. Hites: 0000-0003-0975-5058
Karl J. Jobst: 0000-0002-7687-6629
Notes
The authors declare no competing financial interest.
■ REFERENCES
(1) Samanipour, S.; Martin, J. W.; Lamoree, M. H.; Reid, M. J.; Thomas, K. V. Letter to the Editor: Optimism for non-target analysis in environmental chemistry. Environ. Sci. Technol. 2019, 53, DOI: 10.1021/acs.est.9b01476.
(2) Hites, R. A.; Jobst, K. J. Is nontargeted screening reproducible? Environ. Sci. Technol. 2018, 52, 11975−11976.
(3) Rostkowski, P.; et al. The strength in numbers: Comprehensive characterization of house dust using complementary mass spectrometric techniques. Anal. Bioanal. Chem. 2019, 411, 1957−1977.
(4) Williams, A. J.; Grulke, C. M.; Edwards, J.; McEachran, A. D.; Mansouri, K.; Baker, N. C.; Patlewicz, G.; Shah, I.; Wambaugh, J. F.; Judson, R. S.; Richard, A. The CompTox Chemistry Dashboard: A community data resource for environmental chemistry. J. Cheminf. 2017, 9, 61−88.
(5) McEachran, A. D.; Sobus, J. R.; Williams, A. J. Identifying known unknowns using the U.S. EPA’s CompTox Chemistry Dashboard. Anal. Bioanal. Chem. 2017, 409, 1729−1735.
(6) Pena-Abaurrea, M.; Jobst, K. J.; Ruffolo, R.; Shen, L.; McCrindle, R.; Helm, P. A.; Reiner, E. J. Identification of potential novel bioaccumulative and persistent chemicals in sediments from Ontario (Canada) using scripting approaches with GCxGC-TOF MS analysis. Environ. Sci. Technol. 2014, 48, 9591−9599.
(7) Blaženović, I.; et al. Structure annotation of all mass spectra in untargeted metabolomics. Anal. Chem. 2019, 91, 2155−2162.
(8) Grimme, S. Towards first principles calculation of electron impact mass spectra of molecules. Angew. Chem., Int. Ed. 2013, 52, 6306−6312.
(9) Allen, F.; Pon, A.; Greiner, R.; Wishart, D. Computational prediction of electron ionization mass spectra to assist in GC/MS compound identification. Anal. Chem. 2016, 88, 7689−7697.
(10) Wei, J. N.; Belanger, D.; Adams, R. P.; Sculley, D. Rapid prediction of electron-ionization mass spectrometry using neural networks. ACS Cent. Sci. 2019, DOI: 10.1021/acscentsci.9b00085.
(11) Schymanski, E. L.; Ruttkies, C.; Krauss, M.; Brouard, C.; Kind, T.; Duhrkop, K.; Allen, F.; Vaniya, A.; Verdegem, D.; Bocker, S.; Rousu, S.; Shen, H.; Tsugawa, H.; Sajed, T.; Fiehn, O.; Ghesquiere, B.; Neumann, S. Critical assessment of small molecule identification 2016: Automated methods. J. Cheminf. 2017, 9, 22.
(12) Venier, M.; Stubbings, W. A.; Guo, J.; Romanak, K.; Nguyen, L. V.; Jantunen, L.; Melymuk, L.; Arrandale, V.; Diamond, M.; Hites, R. A. Tri(2,4-di-t-butylphenyl) phosphate: A previously unrecognized, abundant, ubiquitous pollutant in the built and natural environment. Environ. Sci. Technol. 2018, 52, 12997−13003.
(13) Nosek, B. A.; et al. Promoting an open research culture. Science 2015, 348, 1422−1425.
(14) Amann, R. I.; et al. Toward unrestricted use of public genomic data. Science 2019, 363, 350−352.
(15) Sweedler, J. V. Where is the data? Anal. Chem. 2018, 90, 8721−8721.
(16) Schymanski, E. L.; Williams, A. J. Open science for identifying “known unknown” chemicals. Environ. Sci. Technol. 2017, 51, 5357−5359.
(17) United Nations Global Chemicals Outlook, 2nd ed.; United Nations Environment Programme, 2019.