Signature-Discovery Approach for Sample Matching of a Nerve-Agent

Apr 21, 2010 - This report demonstrates the use of bioinformatic and chemometric tools on liquid chromatography−mass spectrometry (LC−MS) data for...
Anal. Chem. 2010, 82, 4165–4173

Signature-Discovery Approach for Sample Matching of a Nerve-Agent Precursor Using Liquid Chromatography-Mass Spectrometry, XCMS, and Chemometrics

Carlos G. Fraga,* Brian H. Clowers, Ronald J. Moore, and Erika M. Zink

Pacific Northwest National Laboratory, 902 Battelle Boulevard, Richland, Washington 99352

This report demonstrates the use of bioinformatic and chemometric tools on liquid chromatography-mass spectrometry (LC-MS) data for the discovery of trace forensic signatures for sample matching of ten stocks of the nerve-agent precursor known as methylphosphonic dichloride (dichlor). XCMS, a software tool primarily used in bioinformatics, was used to comprehensively search and find candidate LC-MS peaks in a known set of dichlor samples. These candidate peaks were down-selected to a group of 34 impurity peaks. Hierarchical cluster analysis and factor analysis demonstrated the potential of these 34 impurity peaks for matching samples based on their stock source. Only one pair of dichlor stocks was not differentiated from one another. An acceptable chemometric approach for sample matching was determined to be variance scaling and signal averaging of normalized duplicate impurity profiles prior to classification by K-nearest neighbors. Using this approach, the seven dichlor samples in a test set were all correctly matched to their source stock. The sample preparation and LC-MS method permitted the detection of dichlor impurities estimated to be present at parts-per-trillion (w/w) levels. The detection of a common impurity in all dichlor stocks that were synthesized over a 14-year period and by different manufacturers was an unexpected discovery. Our described signature-discovery approach should be useful in the development of a forensic capability to assist investigations following chemical attacks.

Methylphosphonic dichloride (dichlor) is a commercially available toxic organophosphorus compound and nerve-agent precursor.
It is listed as a Schedule 2 compound by the Organisation for the Prohibition of Chemical Weapons because it is a known chemical weapon (CW) precursor with limited nonweapon uses.1 In this report, we use commercial dichlor to help develop a chemical forensic capability to address potential chemical attacks involving CW agents or other similar toxicants. In the event of a chemical attack, it is important to have a capability to obtain useful information from collected material evidence and aid criminal investigations and court proceedings.

* To whom correspondence should be addressed. E-mail: [email protected].
(1) OPCW, Ed. Annex on Chemicals; Organisation for the Prohibition of Chemical Weapons (OPCW): The Netherlands.
10.1021/ac1003568 © 2010 American Chemical Society. Published on Web 04/21/2010.

Herein, we investigate liquid chromatography-mass spectrometry (LC-MS) with bioinformatic and chemometric tools for sample matching using commercial dichlor as a model toxicant. Sample matching involves connecting a chemical sample back to its source (e.g., stock bottle) or to samples from the same source based on some intrinsic characteristics. Previous work with dimethyl methylphosphonate (DMMP) demonstrated the feasibility of using impurity profiling for sample matching of a CW-related compound.2 Impurity profiling of DMMP was performed using comprehensive two-dimensional gas chromatography with time-of-flight mass spectrometry (GC × GC-TOFMS), and the feasibility for sample matching was demonstrated using chemometric analysis. The selected impurities were those having signals strong enough to be visible in the total ion current (TIC) chromatogram or believed to be important based on manufacturer information. Herein, we complement this previous work by demonstrating a more comprehensive approach for locating forensic impurities or signatures in chromatographic-mass spectrometric data that has the advantages of being automated, previously tested, and less likely to overlook low-signal signatures. The forensic-signature discovery approach demonstrated in this paper relies on preexisting bioinformatic and chemometric tools that together, to our knowledge, have not been openly applied outside of metabolomic applications. Here, we use the bioinformatic tool known as XCMS (an acronym for various forms (X) of chromatography mass spectrometry) to find numerous trace-impurity signals in LC-MS data from several dichlor stocks.
XCMS is an open-source tool developed by the Scripps Research Institute for locating biomarkers in LC-MS data.3 XCMS incorporates nonlinear retention time correction, peak detection, and peak matching which permits the comprehensive discovery of chromatographic peaks that may be markers for a particular condition or for what is being studied. According to our literature search, XCMS and similar biomarker tools have been used almost exclusively for metabolomic studies;4-13 however, in this report, we have used XCMS for nonmetabolomic work in order to look (2) Hoggard, J. C.; Wahl, J. H.; Synovec, R. E.; Mong, G. M.; Fraga, C. G. Anal. Chem. 2010, 82, 689–698. (3) Smith, C. A.; Want, E. J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. Anal. Chem. 2006, 78, 779–787. (4) Lommen, A. Anal. Chem. 2009, 81, 3079–3086. (5) Katajamaa, M.; Miettinen, J.; Oresic, M. Bioinformatics 2006, 22, 634– 636.

Analytical Chemistry, Vol. 82, No. 10, May 15, 2010


Table 1. Stocks of Dichlor (Methylphosphonic Dichloride, ≥97% Purity)

stock (a)   supplier (b)   manufacturer (c)   date (d)
A           I              yes                made June 2004
B           II             yes                made Dec 1995
C           II             yes                made June 2004
D           II             no                 acquired May 2004
E           II             yes                made Sept 1993
F           II             yes                made Dec 2007
G           III            no                 tested Oct 2005
H           III            yes                tested Jul 1998
I           III            yes                tested May 2007
J           IV             no                 unknown

(a) Letter represents a specific dichlor stock with a unique lot number. The letter is the first term in the sample labeling scheme: stock-hydrolyzate-aliquot. (b) Roman numeral represents a specific supplier. (c) Yes indicates the supplier was the manufacturer of the dichlor stock. (d) Date when stock was acquired by the supplier, made by the supplier, or analyzed by the supplier for purity assurance.

for LC-MS peaks that may be markers for associating or differentiating various dichlor samples based on source. XCMS was used in generating a peak table that listed the signal intensities of several LC-MS peaks pertaining to dichlor impurities. The peak table was then analyzed by the chemometric tools known as hierarchical cluster analysis (HCA), principal component analysis (PCA), and non-negative matrix factorization (NNMF) to reveal intrinsic sample clustering based on stock source. The supervised classification method known as K-nearest neighbor (KNN) was then used to correctly classify samples from a test set to validate the discovery of forensic signatures. While HCA, PCA, and KNN have been used for similar sample-matching studies in drug profiling14-16 and NNMF has been used in a previous CW-precursor study,2 our work is the first to combine these types of chemometric tools with a bioinformatic tool for the comprehensive discovery of chemical forensic signatures.

EXPERIMENTAL SECTION

Dichlor. Ten dichlor stocks (A-J) with a listed purity of at least 97% were obtained from four chemical suppliers (I-IV). According to the suppliers, these dichlor stocks were all synthesized by the chlorination of commercial DMMP using thionyl chloride. Each of these dichlor stocks was determined to be from a different lot based on supplier information. Pertinent information regarding each of the dichlor stocks is provided in Table 1.

(6) Wong, J. W. H.; Cagney, G.; Cartwright, H. M. Bioinformatics 2005, 21, 2088–2090.
(7) Jankevics, A.; Liepinsh, E.; Liepinsh, E.; Vilskersts, R.; Grinberga, S.; Pugovics, O.; Dambrova, M. Chemom. Intell. Lab. Syst. 2009, 97, 11–17.
(8) Kind, T.; Tolstikov, V.; Fiehn, O.; Weiss, R. H. Anal. Biochem. 2007, 363, 185–195.
(9) Nordstrom, A.; Want, E. J.; Northen, T.; Lehtiö, J.; Siuzdak, G. Anal. Chem. 2008, 80, 421–429.
(10) Rijk, J. C. W.; Lommen, A.; Essers, M. L.; Groot, M. J.; Van Hende, J. M.; Doeswijk, T. G.; Nielen, M. W. F. Anal. Chem. 2009, 81, 6879–6888.
(11) Tikunov, Y.; Lommen, A.; de Vos, C. H. R.; Verhoeven, H. A.; Bino, R. J.; Hall, R. D.; Bovy, A. G. Plant Physiol. 2005, 139, 1125–1137.
(12) Want, E. J.; O’Maille, G.; Smith, C. A.; Brandon, T. R.; Uritboonthai, W.; Qin, C.; Trauger, S. A.; Siuzdak, G. Anal. Chem. 2006, 78, 743–752.
(13) Wikoff, W. R.; Gangoiti, J. A.; Barshop, B. A.; Siuzdak, G. Clinical Chem. 2007, 53, 2169–2176.
(14) Buchanan, H. A. S.; Daeid, N. N.; Meier-Augenstein, W.; Kemp, H. F.; Kerr, W. J.; Middleditch, M. Anal. Chem. 2009, 80, 3350–3356.
(15) Daéid, N. N.; Waddell, R. J. H. Talanta 2005, 67, 280–285.
(16) Waddell-Smith, R. J. H. J. Forensic Sci. 2007, 52, 1297–1304.


Safety. Dichlor is considered highly toxic and reacts vigorously with water. It hydrolyzes in water to form methylphosphonic acid (MPA) and HCl. All work with nonhydrolyzed dichlor took place in a laboratory fume hood while wearing a lab coat, eye protection, and nitrile gloves. Hydrolyzates. In order to accommodate the sample requirements of LC-MS, hydrolyzate solutions for each of the ten dichlor stocks were made by carefully mixing dichlor with water. Up to four hydrolyzate solutions were made per dichlor stock depending on the availability of a dichlor stock at the time of preparation. A dichlor hydrolyzate solution was created by the slow dropwise addition of 0.3 mL of liquid dichlor into a 20 mL borosilicate glass vial containing 1.2 mL of deionized water from a Milli-Q water purifier (Millipore, Billerica, MA). Each stock bottle of dichlor was slightly heated to ensure the dichlor was in a liquid state (melting point 32-34 °C) prior to being transferred using borosilicate glass pipettes. The weight of each vial was recorded before and after each addition of water and dichlor. The mean mass of the 1.2 mL of water in each vial was 1.269 g (0.292% relative standard deviation or RSD), and the mean increase in mass from the addition of 0.3 mL of dichlor was 1.696 g (0.719% RSD) for 31 hydrolyzate solutions made over a 3 month period. After initial mixing, the 20 mL vial containing hydrolyzed dichlor was agitated in a tray shaker for at least 5 days and then stored at room temperature. Starting on February 23, 2009, 300 µL of each dichlor hydrolyzate was then mixed with 200 µL of aqueous concentrated ammonium hydroxide (Fisher Chemicals product #A669-500). (In this paper, exact dates are provided for sample preparation and LC-MS analysis in order to best illustrate how sample storage times were factored into this study.) 
Ammonium hydroxide was added to each hydrolyzate to raise the pH to an approximate value of 2 by neutralizing some of the HCl(aq) produced by the hydrolysis of dichlor. This pH adjustment was originally required as the first step of a solid-phase extraction (SPE) procedure. The purpose of the SPE procedure was to remove most of the MPA prior to LC-MS analysis. This initial procedure proved ineffective because, while it removed most of the MPA, no LC-MS peaks unique to the dichlor samples were found when compared to the method blanks. Nevertheless, pH adjustment was required for a solvent-extraction procedure that favorably resulted in numerous LC-MS peaks that were detected in the dichlor samples and not in the method blanks. This procedure was used in this study, and it is discussed below. Solvent Extraction. The solvent-extraction procedure was applied to all dichlor hydrolyzate solutions, method blanks, and spiked dichlor composites. The solvent-extraction procedure involved adding 15 µL of concentrated ammonium hydroxide to 200 µL of the pH-adjusted hydrolyzate (or method blank or composite). Previous experiments determined that 15 µL of concentrated ammonium hydroxide was sufficient to raise the pH to at least 3.0, which was above MPA’s pKa of 2.38 at 25 °C.17 A pH at or above 3.0 ensured that most of the MPA would be in its anionic form during solvent extraction, thereby limiting the amount of MPA that was extracted from the aqueous phase. Solvent extraction of the aqueous solution was done three (17) Munro, N. B.; Talmage, S. S.; Griffin, G. D.; Waters, L. C.; Watson, A. P.; King, J. F.; Hauschild, V. Environ. Health Perspect. 1999, 107, 933–974.
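The pH reasoning above can be checked with the Henderson-Hasselbalch relation. The following is a minimal sketch, assuming MPA can be treated as a monoprotic acid (only the first dissociation, with the cited pKa of 2.38, is considered); the function name is illustrative and not part of the original work.

```python
def anionic_fraction(ph: float, pka: float) -> float:
    """Fraction of a monoprotic acid in its deprotonated (anionic) form.

    Uses the Henderson-Hasselbalch relation: [A-]/[HA] = 10**(pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# First pKa of MPA at 25 degrees C, as cited in the text.
MPA_PKA = 2.38

if __name__ == "__main__":
    # Compare the pH before (about 2) and after (at least 3.0) the second
    # ammonium hydroxide addition.
    for ph in (2.0, 3.0, 4.0):
        print(f"pH {ph:.1f}: {anionic_fraction(ph, MPA_PKA):.1%} anionic")
```

Under this simplification, roughly 81% of the MPA is anionic at pH 3.0, compared with about 29% at pH 2, which is consistent with the statement that most of the MPA stays in the aqueous phase during extraction.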

Table 2. Hydrolyzates from Stock

hydrolyzate (a)   date/time when made
1                 November 14, 2008/14:09-15:19
2                 November 14, 2008/15:41-16:43
3                 February 10, 2009/10:50-12:24
4                 February 10, 2009/13:24-14:47

(a) Number represents a date and time when a dichlor hydrolyzate was made. The number is the second term in the sample labeling scheme: stock-hydrolyzate-aliquot.

Table 3. Dichlor Samples (Labeling Scheme: Stock-Hydrolyzate-Aliquot) (a)

A-1-1   A-2-1   A-3-1   A-4-1
B-1-1   B-2-1   B-3-1   B-4-1
C-1-1   C-2-1   C-3-1   C-4-1
D-1-1   D-2-1   D-3-1   D-3-2
E-3-1   E-3-2   E-3-3   E-3-4
F-3-1   F-3-2   F-4-1   F-4-2
G-1-1   G-2-1   G-3-1   G-4-1
H-1-1   H-2-1   H-3-1   H-4-1
I-3-1   I-3-2   I-4-1   I-4-2
J-1-1   J-2-1   J-3-1   J-3-2

(a) A sample’s label is deciphered using Tables 1 and 2 and knowing that aliquot (third term in labeling scheme) represents one of four possible aliquots from a hydrolyzate.

separate times with fresh 0.5 mL methylene chloride (Chromasolv, 99.9% purity, Sigma-Aldrich). Each of the three solvent extracts was combined in a 16 × 100 mm flint-glass culture tube (VWR, product #60825-425). Solvent Exchange. The methylene chloride extract in the culture tube was evaporated to dryness using a Zymark TurboVap LV Evaporator (Caliper Life Sciences, Hopkinton, MA) with nitrogen as the drying gas. The culture tube containing the extract was kept at a temperature of 4-8 °C. Extracts required approximately 2.5 h to reach dryness. Once an extract was dry, the residue in the culture tube was reconstituted with 15 µL of an aqueous solution of 0.2% acetic acid and 0.05% trifluoroacetic acid. Each reconstituted sample was then transferred into a 200 µL deactivated glass insert in a 2 mL borosilicate vial capped with a silicone/polytetrafluoroethylene (PTFE) bonded septa screw cap (Microsolv, Eatontown, NJ). The 2 mL sample vial was placed in a chilled autosampler tray for LC-MS analysis. Dichlor Samples. A total of four dichlor samples were created per dichlor stock for a total of 40 dichlor samples. For a given stock, each of the four dichlor samples could have originated from up to four different hydrolyzate solutions. In this report, the source for each of the dichlor samples is determined using Tables 1 and 2 and the following labeling scheme: stock-hydrolyzate-aliquot. For example, samples J-3-1 and J-3-2 were separate aliquots taken from the same hydrolyzate that was created by mixing water with dichlor stock J on February 10, 2009, between 10:50 and 12:24. Table 3 provides a list of all 40 dichlor samples. Method Blanks. Method blanks were prepared with each batch of dichlor samples. A total of eight method blanks were generated. A method blank consisted of the water used to hydrolyze the dichlor stocks plus a known amount of concentrated HCl(aq) to mimic the HCl produced by dichlor hydrolysis.
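The stock-hydrolyzate-aliquot labeling scheme described above lends itself to a small parser. The sketch below is illustrative only: the names `SampleLabel`, `parse_label`, and `same_hydrolyzate` are hypothetical, and the allowed ranges (stocks A-J, hydrolyzates 1-4, aliquots 1-4) are taken from Tables 1-3.

```python
from typing import NamedTuple


class SampleLabel(NamedTuple):
    stock: str        # dichlor stock letter, A-J (Table 1)
    hydrolyzate: int  # hydrolyzate number, 1-4 (Table 2)
    aliquot: int      # aliquot number, 1-4


def parse_label(label: str) -> SampleLabel:
    """Split a dichlor sample label such as 'J-3-2' into its three terms."""
    parts = label.strip().split("-")
    if len(parts) != 3:
        raise ValueError(f"expected stock-hydrolyzate-aliquot, got {label!r}")
    stock, hydrolyzate, aliquot = parts[0], int(parts[1]), int(parts[2])
    if len(stock) != 1 or stock not in "ABCDEFGHIJ":
        raise ValueError(f"unknown stock in {label!r}")
    if not (1 <= hydrolyzate <= 4 and 1 <= aliquot <= 4):
        raise ValueError(f"hydrolyzate or aliquot out of range in {label!r}")
    return SampleLabel(stock, hydrolyzate, aliquot)


def same_hydrolyzate(label_a: str, label_b: str) -> bool:
    """True when two samples are aliquots of the same hydrolyzate solution."""
    a, b = parse_label(label_a), parse_label(label_b)
    return (a.stock, a.hydrolyzate) == (b.stock, b.hydrolyzate)
```

For instance, `same_hydrolyzate("J-3-1", "J-3-2")` is true, matching the example in the text, whereas J-1-1 and J-3-1 come from the same stock but from hydrolyzates made on different dates.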
The method blanks came from two different batches of water. One batch was used for hydrolysis work on November 14, 2008, and the other was used for hydrolysis work on February 10, 2009. The method blanks from November 14 were labeled BLK-1-x and those for February 10 were labeled BLK-3-x where x = 1, 2, 3, or 4. The method blanks were treated the same way as the dichlor samples beginning with solvent extraction. Spiked Dichlor Composites. A dichlor composite was made by mixing equal volumes of ten dichlor hydrolyzate solutions from ten different dichlor stocks. The dichlor composite was then mixed with a known volume of concentrated aqueous ammonium hydroxide to obtain an approximate pH of 2. Three portions of the pH-adjusted dichlor composite were then spiked with different volumes of an aqueous solution containing equal concentrations of six analyte standards including triethylphosphate (TEPO). The three spiked composite samples were treated the same way as the dichlor samples beginning with solvent extraction. Solvent Standards. Four solvent standards having analyte concentrations of 5, 10, 25, and 50 ng/mL were made using the LC-MS mobile phase as the solvent. The analytes were the same six compounds spiked into the dichlor composite. These solvent standards were used for initial LC-MS method development. They were also used to measure instrument performance just prior to the dichlor sample analyses. LC-MS. The samples described above were analyzed by an LC-MS system consisting of an Agilent 1100 Series Capillary HPLC (Agilent Technologies Inc., Santa Clara, CA) coupled by an electrospray ionization (ESI) interface to an LTQ Orbitrap hybrid mass spectrometer (Thermo Fisher Scientific Inc., San Jose, CA). The capillary HPLC column was manufactured in-house by slurry packing 5 µm Pursuit C18 stationary phase (Varian, Lake Forest, CA) into a 20 cm length of 360 µm outer diameter × 150 µm inner diameter fused silica capillary tubing (Polymicro Technologies Inc., Phoenix, AZ). A 2 µm screen in a 1/16 in. capillary-bore union (Valco Instruments Co., Houston, TX) was used to retain the column packing.
Mobile phase A consisted of 0.2% acetic acid and 0.05% trifluoroacetic acid in water, and mobile phase B consisted of 0.1% trifluoroacetic acid and 90% acetonitrile in water. At a flow rate of 3 µL/min, 1 µL of sample was injected onto the column at 98% A/2% B, held for 1 min, then subjected to a linear gradient to 90% A/10% B from 1-15 min, a hold at 90% A/10% B from 15-26 min, and finally returned to starting conditions at 27 min allowing for a 33 min column re-equilibration period prior to another injection. The LTQ-Orbitrap mass spectrometer was outfitted with a custom-built, ion funnel-based atmospheric pressure ionization (API) source and ESI interface. Electrospray emitters were custom-made using 150 µm outer diameter × 20 µm inner diameter chemically etched fused silica.18 The heated capillary temperature and spray voltage were 200 °C and 2.2 kV, respectively. Orbitrap spectra (automatic gain control [AGC] 1 × 10^6) were collected from 50-500 m/z at a resolution of 100k followed by data-dependent ion trap MS/MS spectra (AGC 1 × 10^4) of the six most abundant ions using a normalized collision energy of 35%. In summary, the above LC-ESI-MS method produced both LC-ESI-high resolution mass spectrometry (LC-ESI-HRMS) data and LC-ESI-low resolution tandem mass spectrometry (LC-ESI-LRMS/MS) data for each sample analysis.

(18) Kelly, R. T.; Page, J. S.; Luo, Q.; Moore, R. J.; Orton, D. J.; Tang, K.; Smith, R. D. Anal. Chem. 2006, 78, 7796–7801.

Analysis Run Order. Each of the 40 dichlor samples and eight method blanks were analyzed once, in random order, by an


automated LC-MS sequence starting on March 4, 2009. The four solvent standards, three spiked dichlor composites, and a solvent blank (i.e., mobile phase A) were run multiple times. The run order and number of runs (in parentheses) are provided as follows: solvent blank (1), solvent standards from low to high concentration (4), solvent blank (1), spiked dichlor composites (3 in random order), solvent blank (1), dichlor samples and method blanks (48 in random order), solvent blank (1), and spiked dichlor composites (3 in random order). Data Conversion and XCMS Software. The data from the LC-ESI-HRMS analyses were preprocessed prior to XCMS analysis. The proprietary data format (*.raw) generated by the MS instrument control software was converted to the open mzXML format using ReAdW, provided by the Seattle Proteome Center (http://tools.proteomecenter.org/). Following data conversion, data were processed using a custom graphical user interface (GUI) built on the XCMS R module developed and maintained by the Scripps Research Institute.3 The GUI program, which we named pyXCMS, was written in Python (version 2.5), which provides a direct link to the XCMS R processing algorithms through the rpy2 Python extension. The source code and binaries can be obtained at http://sourceforge.net/projects/pyxcms/ or by contacting the author. In an effort to extend the XCMS module through a GUI and provide a readily available set of tools to graphically examine processing results, each XCMS processing experiment could be saved in the HDF5 file format (http://www.hdfgroup.org/HDF5/). Compressed HDF5 files produced by the pyXCMS GUI can be loaded for future exploration. While not all of the features provided in the XCMS module are directly accessible in pyXCMS, the primary functionality of the R module is enabled through the GUI and additional features can be readily added. Peak Table Generation.
The XCMS software generated an initial peak table from the LC-ESI-HRMS data collected from 40 dichlor samples and 8 method blanks. The “matchedFilter” method from XCMS was used for peak detection. The optimized XCMS parameters were as follows: full width at half-maximum of model peak (fwhm) = 7.0; standard deviation of model peak (sigma) = 3; signal-to-noise cutoff (snthresh) = 5.0; m/z difference (mzdiff) = 0.01; step size for profile generation (step) = 0.01; Gaussian smoothing-function width (bw) = 10.00; minimum number of samples (minsamp) = 4; width of overlapping m/z slices (mzwid) = 0.01. XCMS analysis was done with and without retention-time correction. The XCMS tables generated with and without retention-time correction were practically indistinguishable. This was because the retention-time deviations in our data were small enough, under the given XCMS parameters, to have no apparent effect on the outcome of peak grouping by XCMS. This likely would not be the case for LC-MS data collected using a lower resolution spectrometer on more complex samples because some peaks representing different impurities would be erroneously grouped together without the use of retention-time correction. In our study, the XCMS table that was generated without retention-time correction was arbitrarily chosen for further processing. The XCMS table consisted of a list of extracted-ion chromatogram (EIC) peaks designated by a number. Each EIC peak was a single LC-MS peak with a unique mass (i.e., m/z) and retention time that was present in some fraction of the dichlor


Figure 1. (A) Total-ion-current (TIC) chromatogram from the LC-ESI-HRMS analysis of a representative dichlor sample. The huge, broad peak is the hydrolyzed dichlor, i.e., methylphosphonic acid. The actual sample is A-1-2 (see Tables 1-3). (B) Extracted-ion chromatogram from A-1-2 using the summed signals from four different m/z values to illustrate four impurity peaks (see Table 4) that are masked in the TIC chromatogram.

samples and method blanks. In addition, the XCMS peak table included an EIC peak’s median retention time, median m/z, and area value for each of the dichlor samples and method blanks that presumably contained that specific peak. The XCMS table was then processed in Excel (Microsoft, Redmond, WA) and only those EIC peaks that appeared in at least four dichlor samples and none of the eight method blanks were kept. For each of those peaks, the EIC was visually inspected to validate the detected peak. With this final XCMS peak table, the median retention time and median m/z for each EIC peak were then entered into the Thermo Fisher Xcalibur 2.0.7 software for targeted data analysis. The original LC-ESI-HRMS data for the dichlor samples and method blanks were then processed again to generate a peak table using the Xcalibur software. The Xcalibur peak-detection and integration algorithm called Genesis was used. The peak detection parameters were set to highest-peak detection, a minimum signal-to-noise ratio (S/N) of 2, and a retention-time window width of 90 s. The mass tolerance for peak detection was set to 5.0 parts-per-million. The integration parameters were a S/N threshold of 0.1 and no point smoothing. The EICs from the Xcalibur software were visually inspected to ensure that every detected peak was actually a chromatographic peak, that is, had at least two points across its width. Those that had just one point were considered noise spikes and given an area of zero. The final Xcalibur table was a 48 × 34 data matrix containing the areas for 34 impurity peaks in 48 samples (40 dichlor samples and 8 method blanks). The matrix can be found in the Supporting Information. Chemometric Analysis. HCA (PLS Toolbox 5.0, Eigenvector Research, Inc., Manson, WA) and factor analysis using PCA (PLS Toolbox 5.0) and NNMF (Matlab Statistics toolbox, The MathWorks, Inc., Natick, MA) were performed on the data matrix with

Table 4. Mass and Retention Times of Impurity Peaks

impurity peak   mass [M + H]+   retention time (min)
1               187.0287        9.00
2               139.1228        9.85
3               145.1336        12.60
4               146.1368        12.80
5               161.0997        20.70
6               239.0776        21.70
7               139.1228        21.72
8               187.1997        22.00
9               187.0950        22.06
10              172.9248        22.32
11              253.0923        22.47
12 (a)          118.9983        22.70
13 (a)          175.0608        22.90
14              197.0724        23.00
15              249.0208        23.00
16 (b)          254.9682        23.04
17 (b)          217.0772        23.10
18              192.1743        23.20
19 (c)          181.0988        23.30
20 (c)          73.0643         23.30
21 (c)          91.0307         23.30
22              254.1905        23.50
23 (d)          105.0693        23.65
24 (d)          208.1522        23.70
25              211.0280        23.70
26              219.1870        23.76
27              249.0388        23.90
28              225.0443        24.40
29              231.1506        24.46
30 (e)          118.9984        24.50
31 (e)          73.0644         24.50
32 (e)          168.9809        24.55
33              171.1379        25.80
34              177.0564        23.00

(a-e) Each letter group represents a set of peaks that are believed to belong to the same chemical component because they appear together, perfectly coelute, and have the same peak shapes.
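The 5.0 parts-per-million mass tolerance used for targeted peak detection can be illustrated with the masses in Table 4. This is a minimal sketch, not the Xcalibur implementation; the function names are hypothetical, and computing the difference relative to the mean of the two masses is an assumption (a reference mass could be used instead).

```python
def ppm_difference(mz_a: float, mz_b: float) -> float:
    """Absolute m/z difference in parts-per-million, relative to the mean m/z."""
    return abs(mz_a - mz_b) / ((mz_a + mz_b) / 2.0) * 1e6


def within_tolerance(mz_a: float, mz_b: float, tol_ppm: float = 5.0) -> bool:
    """True when two m/z values agree within the given ppm tolerance."""
    return ppm_difference(mz_a, mz_b) <= tol_ppm


if __name__ == "__main__":
    # Peaks 12 and 30 in Table 4: nearly identical masses.
    print(round(ppm_difference(118.9983, 118.9984), 2))
    # Peaks 15 and 27 in Table 4: same nominal mass, clearly different exact mass.
    print(round(ppm_difference(249.0208, 249.0388), 1))
```

Peaks 12 and 30 (and likewise 20 and 31) agree in mass to within about 1 ppm, so they are distinguished in Table 4 by retention time rather than by mass, whereas peaks 15 and 27 differ by roughly 72 ppm and are separated by mass alone.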

a personal computer running MATLAB 7.0 (The MathWorks). In addition, classification of a test set of dichlor samples was performed using KNN (PLS Toolbox 5.0).

RESULTS AND DISCUSSION

The first step of the signature discovery was the untargeted LC-ESI-HRMS analysis of trace impurities in dichlor. The main constituent in all dichlor samples was the MPA produced by the hydrolysis of dichlor during sample preparation. Figure 1A depicts the MPA signal in the TIC chromatogram from the LC-ESI-HRMS analysis for a representative dichlor sample known as A-1-2. The MPA’s [M + H]+ signal with an m/z of 97 (unit

Table 5. Test Set Prediction Using KNN Classification of Signal-Averaged Samples

known sample   prediction               prediction           prediction
               (no variance scaling)    (variance scaling)   (variance scaling and NNMF)
BLK            BLK                      BLK                  BLK
A              A                        A                    A
B              B                        B                    B
C              C                        C                    C
D              G                        D                    D
G              G                        G                    G
H              H                        H                    H
J              A                        J                    J
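The classification summarized in Table 5 combines signal averaging of duplicate profiles, variance scaling, and K-nearest neighbors. The sketch below shows one way those steps fit together using only the Python standard library. It is not the PLS Toolbox implementation used in the study: the function names are hypothetical, variance scaling is assumed to mean dividing each variable by its standard deviation over the training set, and Euclidean distance with k = 1 is an assumed default.

```python
import math
from collections import Counter
from statistics import pstdev


def average_profiles(profile_a, profile_b):
    """Signal-average two duplicate (normalized) impurity profiles."""
    return [(a + b) / 2.0 for a, b in zip(profile_a, profile_b)]


def column_scales(rows):
    """Standard deviation of each variable; constant columns get a scale of 1."""
    scales = []
    for column in zip(*rows):
        s = pstdev(column)
        scales.append(s if s > 0 else 1.0)
    return scales


def scale(row, scales):
    """Variance-scale one profile with precomputed column scales."""
    return [value / s for value, s in zip(row, scales)]


def knn_predict(train_rows, train_labels, query, k=1):
    """Predict the stock label of `query` by majority vote of its k nearest neighbors."""
    scales = column_scales(train_rows)
    scaled_query = scale(query, scales)
    distances = sorted(
        (math.dist(scale(row, scales), scaled_query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy three-peak profiles for two hypothetical stocks (not data from the study).
    train = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.1, 0.1, 0.8], [0.2, 0.0, 0.8]]
    labels = ["A", "A", "B", "B"]
    query = average_profiles([0.85, 0.15, 0.0], [0.9, 0.1, 0.0])
    print(knn_predict(train, labels, query))
```

Scaling each variable by its spread keeps a few large impurity peaks from dominating the distance, which is one plausible reason the unscaled predictions in Table 5 misassign samples D and J while the scaled ones do not.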

resolution) dominates the TIC chromatogram such that the impurity peaks detected by XCMS are completely masked. Figure 1B illustrates the impurity peaks found in sample A-1-2 using the EIC created by the summation of the m/z channels pertaining to the [M + H]+ masses of the four impurity peaks (see Table 4). The removal of all m/z channels pertaining to MPA (e.g., MPA dimer) and all other noise-containing m/z channels was necessary in order to reveal these impurity peaks. Some of the chemical compounds responsible for these impurity peaks are believed to be present in dichlor stock A at ultratrace levels. For example, Figure 2 depicts the [M + H]+ signal for TEPO that was spiked into the dichlor composite. The [M + H]+ signal for TEPO in Figure 2 actually includes the summation of the m/z channels pertaining to the [M + H]+ masses for impurity peaks 3, 25, and 28 (see Table 4). The TEPO concentration relative to neat dichlor was approximately 2 parts-per-billion or ppb (w/w). The TEPO concentration was determined by dividing the known mass of spiked TEPO by the known mass of dichlor in the dichlor composite. TEPO was not detected in any of the nonspiked dichlor samples. However if TEPO were present at 2 ng per g of dichlor stock, then TEPO in dichlor could be readily detected by the analytical approach described in this study as shown in Figure 2. Assuming similar levels of ionization as TEPO, it is likely that the chemical compounds pertaining to the impurity peaks were present at the low ppb (w/w) range because the areas of the impurity peaks in Figure 2 are of the same magnitude as the area of the 2 ppb TEPO peak. Furthermore, one can also assume that many of the impurities revealed in this study were in the double-digit ppt given that impurity peaks depicted in Figure 2 have areas that were 2 orders of magnitude larger than several impurity peaks in this study. 
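The concentration reasoning above amounts to two small calculations: a spike concentration obtained as mass of analyte per mass of dichlor, and a single-point estimate that scales a reference concentration by a ratio of peak areas. The sketch below makes both explicit. The function names and the numerical peak areas are hypothetical, and the estimate rests on the same assumption stated in the text, namely that an impurity ionizes about as efficiently as TEPO.

```python
def spike_concentration_ppb(spike_mass_ng: float, dichlor_mass_g: float) -> float:
    """Concentration in ppb (w/w); 1 ng of analyte per g of dichlor equals 1 ppb."""
    return spike_mass_ng / dichlor_mass_g


def estimate_ppb(peak_area: float, ref_area: float, ref_ppb: float) -> float:
    """Single-point estimate assuming the same response factor as the reference."""
    return ref_ppb * peak_area / ref_area


if __name__ == "__main__":
    tepo_ppb = spike_concentration_ppb(spike_mass_ng=2.0, dichlor_mass_g=1.0)
    # Hypothetical areas: an impurity peak 100 times smaller than the TEPO peak.
    impurity_ppb = estimate_ppb(peak_area=1.0e4, ref_area=1.0e6, ref_ppb=tepo_ppb)
    print(f"{impurity_ppb * 1000:.0f} ppt")
```

With these illustrative numbers, a peak two orders of magnitude smaller than the 2 ppb TEPO peak corresponds to about 20 ppt, which is the double-digit ppt range mentioned above.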
In addition to the above-mentioned impurity peaks, the XCMS software was used to obtain a total of 34 impurity peaks from an

Figure 2. Extracted-ion chromatogram (EIC) from the LC-ESI-HRMS analysis of the dichlor composite that was spiked with TEPO. The concentration of TEPO relative to neat dichlor prior to sample preparation and analysis is listed with the retention time (RT) and peak area (A). Impurity peaks 25 and 28 are each clusters of chromatographic peaks that were treated as one peak in this study.


Figure 3. Overlay of the extracted-ion chromatograms for the complete sample set of 40 dichlor samples and 8 method blanks for 139.11-139.13 m/z. This plot was generated by XCMS for impurity peak 7. Peak 7 was found in 80% of the dichlor samples and in none of the method blanks. One chromatographic peak is present for each of the 32 dichlor samples having impurity peak 7 and only baseline for all others. Some of the peak 7 signals are labeled using their respective sample names. XCMS retention-time correction was used in generating this plot.
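Peak 7 in Figure 3 is an example of a peak that passes the selection rule used to build the impurity peak table: present in at least four dichlor samples and absent from every method blank. A minimal sketch of that rule follows; the function name and the representation of a peak as a list of areas with a parallel list of blank flags are assumptions made for illustration, and a zero area is taken to mean "not detected".

```python
def keep_peak(areas, is_blank, min_samples=4):
    """Apply the selection rule to one candidate peak.

    areas    -- peak area in each analysis (0 means not detected)
    is_blank -- parallel flags, True where the analysis was a method blank
    """
    in_samples = sum(1 for a, blank in zip(areas, is_blank) if not blank and a > 0)
    in_blanks = sum(1 for a, blank in zip(areas, is_blank) if blank and a > 0)
    return in_samples >= min_samples and in_blanks == 0


if __name__ == "__main__":
    # 40 dichlor samples followed by 8 method blanks, as in this study.
    flags = [False] * 40 + [True] * 8
    # A peak detected in 32 of 40 samples and no blanks (made-up areas).
    areas = [1500.0] * 32 + [0.0] * 8 + [0.0] * 8
    print(keep_peak(areas, flags))
```

The visual inspection of each EIC described in the text is a separate, manual step that this filter does not replace.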

initial XCMS peak table of 600 EICs. The 600 EICs were reduced to 34 after preserving only EICs (i.e., peaks) that were present in at least four of the dichlor samples and not present in any of the eight method blanks. This step also included visually inspecting the EICs to ensure that those selected actually contained a chromatographic peak or clusters of peaks. Some of the selected impurity peaks were actually peak clusters of partially resolved isomers (see peaks 25 and 28 in Figure 2). Figure 3 illustrates a representative set of EICs provided by the XCMS software. The depicted EICs are for impurity peak 7. As seen in Figure 3, impurity peak 7 is present in a multitude of EICs. In fact, there are 32 chromatographic peaks depicted in the figure, one for each of the dichlor samples having impurity peak 7. Moreover, peak 7 and the other 33 impurity peaks were clearly not detected in any of the eight method blanks as depicted in the impurity matrix (see Supporting Information). The impurity matrix consists of 48 rows and 34 columns. Each row is an impurity profile for a given dichlor sample. An impurity profile consists of areas for all 34 impurity peaks. Those impurity peaks not detected (i.e., a flat signal baseline) have an area of zero. Each impurity profile or row was normalized to unit area by dividing each element in a row by the sum of all elements in that row. Area normalization makes an impurity profile resistant to the effects of sample dilution. The mean normalized profile for each dichlor stock is depicted in Figure 4. Each mean normalized area was obtained by averaging the nonzero normalized areas from

Figure 4. Bar graphs representing the mean normalized impurity profiles for dichlor stocks A-J. Each bar represents the mean value of the normalized peak areas for an impurity peak detected in samples from a given dichlor stock.


Figure 5. Dendrogram using the peak area measurements from each sample’s normalized impurity profile. Each dichlor stock (A-J) generated four samples for a total of 40 dichlor samples (see Tables 1-3 for sample labeling scheme). Also included are eight method-blank samples (BLK). The dichlor samples cluster according to stock. Stocks E and F are not reliably distinguishable.

the four samples of a given stock. The error bars represent one standard deviation of the mean. Impurity peaks that were detected in fewer than three of four dichlor samples were given a mean area of zero for a dichlor stock. The exception was impurity peak 7 in stocks A and B, where it was detected in two of four samples. An exception was made for peak 7 because it was the only impurity peak found in all dichlor stocks and, therefore, may have some forensic significance. The main purpose of Figure 4 is to visually illustrate that each dichlor stock has an intrinsically different profile. The only exceptions were stocks E and F. These two lots had only one robust impurity peak, that is, peak 7. The robustness of these normalized impurity profiles for possible sample matching was addressed by HCA. HCA was performed on the normalized impurity matrix. The dendrogram obtained from HCA is displayed in Figure 5, which clearly shows that dichlor samples cluster based on stock. The only exceptions are stocks E and F, which are not reliably distinguishable because two of the F samples are grouped with the E samples. In contrast, the method blanks are well grouped. Interestingly, the dendrogram shows that the method blanks are closer in multidimensional space to the J samples than to the samples of any other stock. This is easy to understand because the values for the normalized peak areas in the J samples were the smallest compared to the other samples (see Figure 4). Therefore, the Euclidean distance between the J samples and the method blanks, which had peak areas of zero, was also the smallest. (The Euclidean distance between a method blank and a sample was simply the square root of the sum of the squares of the sample’s normalized peak areas.) Overall, the HCA results support that the impurity profiles of dichlor are stock specific and robust for the given analytical protocols.
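The area normalization and the distance-to-blank reasoning above can be illustrated with a minimal Python sketch. The peak areas are invented for illustration and are not values from the impurity matrix; the function names are hypothetical, and leaving an all-zero blank row at zero (rather than dividing by a zero sum) is an assumption of this sketch, not a step stated in the text.

```python
import math

def normalize_profile(areas):
    """Scale a row of peak areas to unit total area.

    Assumption: an all-zero row (e.g., a method blank) is returned as zeros.
    """
    total = sum(areas)
    if total == 0:
        return [0.0] * len(areas)
    return [a / total for a in areas]

def euclidean(p, q):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical areas for three impurity peaks (illustrative values only).
sample = [200.0, 600.0, 1200.0]
diluted = [a / 10 for a in sample]   # same sample after a tenfold dilution
blank = [0.0, 0.0, 0.0]              # no peaks detected

norm_sample = normalize_profile(sample)
norm_diluted = normalize_profile(diluted)
norm_blank = normalize_profile(blank)

# Distance from a normalized profile to an all-zero blank.
dist_to_blank = euclidean(norm_sample, norm_blank)

# A profile spread over several peaks lies closer to the blank than one
# dominated by a single peak, even though both sum to one.
dist_spread = euclidean(normalize_profile([1.0, 1.0, 1.0]), norm_blank)
dist_dominated = euclidean(normalize_profile([3.0, 0.0, 0.0]), norm_blank)
```

In this sketch the diluted and undiluted samples give the same normalized profile, which is the sense in which area normalization resists dilution. The last two distances show why a stock whose normalized areas are individually small sits nearest the blanks.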
It must also be noted that the detected impurities have some resistance to loss or degradation given that one set of dichlor samples (except for those from stocks E, F, and I) was hydrolyzed with water three months prior to final

preparation and analysis. The results from HCA illustrated the potential for sample matching using impurity profiles. Actual validation was made using a test set. The validity of using impurity profiles for sample matching was assessed using a test set made from 7 of the 10 dichlor stocks and consisting of 14 dichlor samples (2 samples per stock) and 2 method blanks. These dichlor samples and method blanks were initially prepared on November 14, 2008. This test set was then subjected to final preparation and LC-ESI-HRMS analysis a month before the set of 40 dichlor samples. The original purpose for this test set was method development of the analytical protocols. Fortunately, these test samples were prepared and analyzed in a similar fashion to the set of 40. The main difference was that the drying procedure used for the test set was harsher than that used on the dichlor set of 40. The drying procedure for the test set involved a faster nitrogen flow and a higher temperature for solvent volatilization. It was decided that these differences would further test the robustness of the analytical protocols and dichlor impurities for sample matching. Sample matching of the test set (n = 16) was performed using KNN (K = 3). KNN involved measuring the Euclidean distance between the normalized impurity profile of a test sample and each of the normalized profiles from the set of 40 (known set). The three closest samples from the known set to the test sample determined the stock classification of the test sample. K = 3 was selected because it was the largest odd number (in order to avoid ties with nearest neighbors) that did not exceed the number of known samples in each stock class (n = 4). Typically, the optimal K is the largest K (for more confidence in classification) selected by applying a cross-validation procedure on the known sample set.
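The KNN matching step described above can be sketched in Python as follows. This is an illustrative sketch only: the profiles and stock labels are hypothetical and not data from the study, the function names are invented here, and the tie-breaking rule (falling back to the closest neighbor's label) is an assumption that the text does not specify.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Euclidean distance between two impurity profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def knn_classify(test_profile, known_profiles, known_labels, k=3):
    """Return the majority stock label among the k nearest known profiles."""
    ranked = sorted(
        zip(known_profiles, known_labels),
        key=lambda pair: euclidean(test_profile, pair[0]),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    # Counter orders equal counts by first appearance, so a tie falls back
    # to the label of the closest neighbor (an assumption of this sketch).
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical normalized profiles: 3 peaks, 4 known samples per stock.
known_profiles = [
    [0.70, 0.20, 0.10], [0.68, 0.22, 0.10],   # stock "A"
    [0.72, 0.18, 0.10], [0.69, 0.21, 0.10],   # stock "A"
    [0.10, 0.30, 0.60], [0.12, 0.28, 0.60],   # stock "B"
    [0.09, 0.31, 0.60], [0.11, 0.29, 0.60],   # stock "B"
]
known_labels = ["A"] * 4 + ["B"] * 4

# Classify a hypothetical test profile with K = 3.
predicted = knn_classify([0.66, 0.24, 0.10], known_profiles, known_labels, k=3)
```

With four known samples per stock, K = 3 lets a single stock supply all three neighbors, which is consistent with the choice of K described in the text.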
Using K = 3, the number of test samples that were correctly classified, including correct classification of the two method blanks, was 13 (out of 16). This result seemed reasonable given the differences between the analytical protocols for the test and


Figure 6. Score plots from the non-negative matrix factorization of the impurity profiles for the 40 dichlor samples (o). Each sample is labeled with the letter corresponding to one of 10 stock sources (A-J). Also included are the locations and labels for eight signal-averaged test samples (+) that were mathematically projected into the above score space.

known sets. However, an inspection of the impurity profiles for the mismatched samples indicated that variance scaling and signal averaging might improve classification. Variance scaling is a data preprocessing step commonly used in chemometrics to give all variables, that is, impurity peaks, equal footing by weighting variables based on their standard deviations. In this study, this meant giving smaller-intensity impurity peaks, which generally had smaller standard deviations, more significance and larger-intensity impurity peaks less significance. This was done by dividing all areas for a given impurity peak by the area standard deviation of that peak across the 40 samples of the known set, that is, the normalized impurity matrix. Prior to KNN classification, each normalized test sample was variance-scaled using the standard deviations from the known set. This improved classification such that 14 of the 16 test samples were correctly classified. The effect of signal averaging of duplicate samples was then tested. Signal averaging involved averaging the two normalized impurity profiles from each pair of samples from a given stock in order to lessen the inaccuracy associated with a single dichlor sample. Table 5 provides the KNN classification results for the signal-averaged samples with and without variance scaling. The combination of variance scaling and signal averaging resulted in 100% correct classification of the eight averaged test samples. Clearly, of the methods examined, this was the best approach for dichlor sample matching. PCA and NNMF were then each independently used prior to KNN classification in order to determine which impurity peaks were the most useful for sample matching. HCA does not provide that type of information. Using PCA, the original 34 variables of the variance-scaled normalized impurity matrix (known set) were compressed into 10 factors or principal components. The number of factors was


Figure 7. Loading plots from the non-negative matrix factorization of the 40 impurity profiles. Each loading value is the weight given to a specific impurity peak for that specific factor.

determined by cross validation.19 Classification by KNN of the test set was then performed using the 10 factors after projection of the test set into the 10-dimensional (10-D) space defined by the factors. The same procedure was also done with NNMF using 10 factors. Only the results from NNMF are provided; however, the PCA results are remarkably similar to those for NNMF. The score plots in Figure 6 depict the locations of the known and test

(19) Malinowski, E. R. Factor Analysis in Chemistry, 2nd ed.; John Wiley & Sons, Inc.: New York, 1991.

Table 6. Tentative Molecular Formulas for Select Impurity Peaks

peak   mass [M + H]+   retention time (min)   tentative molecular formula of M   NIST isomers
3      145.1336        12.60                  C7H16ON2                           6
5      161.0997        20.70                  C8H16OS                            28
7      139.1228        21.72                  C8H14N2                            21

samples in the 10-D space created by NNMF. The classification results obtained using NNMF were exactly the same as those obtained using the original 34 impurity peaks (see Table 5). More importantly, the score plots reveal which factor was the most important for discriminating a dichlor stock. For example, factor 2 and factor 10 were the key discriminating factors for stocks J and C, respectively, because the samples for those stocks were clustered away from all others along their respective factor. The loadings (see Figure 7) for these factors tell which of the 34 impurity peaks were the most important for dichlor discrimination. For example, the loadings for factors 2 and 10 indicate that impurity peaks 6, 11, 14, 19, 20, and 21 are the most important for distinguishing stock J while impurity peak 9 is the one most important for stock C. Not surprisingly, impurity peak 7 was the most important for distinguishing stocks E and F (see scores and loadings for factor 9). It was the only robust impurity peak in stocks E and F. Interestingly, impurity peak 7 was the only impurity peak detected in all 10 dichlor stocks. Impurity peak 7 was detected in all dichlor stocks but not necessarily in all four dichlor samples from a given stock. All dichlor stocks except A and B had detections for peak 7 in at least three of four dichlor samples. Stocks A and B had detections for peak 7 in two of four dichlor samples. It is quite possible that the impurity associated with peak 7 was lost just enough during the sample drying procedure to not be detected in two of the four samples for stocks A and B. Nevertheless, impurity peak 7 was clearly detected in all dichlor stocks even if not present in all four dichlor samples from a given stock. It was not detected in any of the eight method blanks. Therefore, the impurity associated with impurity peak 7 was common to all dichlor stocks and may be a forensic marker unique to commercial dichlor.
This assertion is still under investigation; however, the impurity has not been detected in any of the commercial DMMP stocks we possess. The commercial dichlor samples were all made from commercial DMMP produced by one manufacturer. On the basis of the chemicals involved in the analytical protocols and the molecular complexity of peak 7, this impurity appears not to be synthesized during sample preparation or analysis. The tentative molecular formulas for peak 7 and two other impurity peaks are provided in Table 6 to give an indication of

the types of trace impurities present in dichlor. Table 6 lists molecular formulas including the number of isomers for that formula as listed in the NIST Chemistry Webbook (www.webbook.nist.gov). The isotopes used for generating the molecular formulas were 14N, 16O, 12C, 1H, 32S, 35Cl, 37Cl, 31P, 127I, 79Br, and 81Br. A low-resolution tandem mass spectrum was obtained for each of these impurity peaks, but it was not sufficient to deduce chemical structure or down select the number of possible isomers.

CONCLUSION

This report demonstrated the use of bioinformatic and chemometric tools on LC-MS data for the discovery of trace forensic signatures for sample matching of various stocks of the model toxicant known as dichlor. Specifically, it was shown that 8 out of 10 dichlor stocks had unique impurity profiles obtained by untargeted LC-MS analysis. These impurity profiles were used with KNN to correctly match dichlor samples from a test set back to their respective sources. Further work may involve determining the molecular structures of some of the detected impurities and the inclusion of other forensic impurities to improve confidence in the sample-matching results. Feature-selection methods and state-of-the-art classification methods such as support vector machines will also be investigated. In addition, statistical criteria will be incorporated to flag test samples whose impurity profiles do not match those in a known sample set.

ACKNOWLEDGMENT

Funding for this work was provided by the Science and Technology Directorate, U.S. Department of Homeland Security.

SUPPORTING INFORMATION AVAILABLE

Impurity data matrix generated in this study. This material is available free of charge via the Internet at http://pubs.acs.org.

Received for review February 8, 2010. Accepted March 29, 2010. AC1003568
