

Anal. Chem. 1987, 59, 1853-1859

Classification of Polychlorinated Biphenyl Residues: Isomers vs. Homologue Concentrations in Modeling Aroclors and Polychlorinated Biphenyl Residues

D. L. Stalling* and T. R. Schwartz
National Fisheries Contaminant Research Center, U.S. Fish and Wildlife Service, Route 1, Columbia, Missouri 65201

W. J. Dunn III
University of Illinois at Chicago, Department of Medicinal Chemistry and Pharmacognosy, Chicago, Illinois 60612

Svante Wold
University of Umeå, Department of Chemometrics, Umeå, Sweden

SIMCA (soft independent modeling by class analogy), a principal components chemometric modeling program, was used to examine complex mixtures of polychlorinated biphenyl residues (PCBs) in fish and turtles. Individual PCB isomers were measured by electron capture capillary gas chromatography. We calculated PCB (Cl1-10) congener concentrations by summing 105 isomer concentrations into homologue subgroups. Information theory was used to estimate the maximum information content of the two data sets. We compared the results from principal components modeling of samples and Aroclors by using both isomer and Cl1-10 homologue concentrations. Modeling of normalized data from Aroclors or their mixtures gave similar sample score plots for both data sets. However, modeling environmental sample congener concentrations gave erroneous classification results when compared to results from modeling isomer data. Although the Cl1-10 sums accurately reflect the concentration of PCBs in the sample, calculations to determine PCB profiles as Aroclor mixtures should be made by using individual PCB isomers.

The complexity of certain residues in environmental samples challenges analysts in many ways. Because polychlorinated biphenyls (PCBs) consist of complex mixtures of a possible 209 individual isomers, investigators require a powerful analytical approach that provides not only separation and detection, but data reduction and interpretation as well. These capabilities should facilitate characterization and description of residue profile similarities in environmental and technical mixtures (1). Chemometrics, as defined by Kowalski (2), includes the application of multivariate statistical methods to the study of chemical problems. The use of chemometrics in evaluating the performance of analytical methods can improve data quality, identify the groups of samples that are similar, and provide a classification method for identifying the group or class to which a sample belongs (3). Pattern recognition is an important chemometric method. Two important analytical questions can be addressed by pattern recognition: What is the quality of the data analyses? What information about the problem under study is contained in the data? Increased use of chemometrics has been facilitated by the wider availability of less costly and more powerful microcomputers

Table I. Matrix Representation of Sample Analysis for Three Classes, p Peaks and n Samples (Chromatography Data Matrix)

(objects)             (variables) peak number
sample no.            1    2    3    ...    p    ...    P

class 1      1
             2
             3
class 2      ...
class 3      ...
             N

and multivariate statistical software (3, 4) that can be executed on these microcomputers. The power of principal components modeling of multivariate data, such as those encountered in Aroclors and PCB residues, provides a basis for graphical presentations of sample similarity, as well as results that parallel more traditional statistical analyses (4, 5). In principal components modeling, sample data are treated as points in multidimensional space. These data are projected onto lower dimensional space (generally two or three dimensions) in a way that preserves the maximum amount of variance and relations among samples and variables (4). This technique is especially useful in visualizing sample results having more than three dimensions. The method of principal components analysis (PCA), to which SIMCA (soft independent modeling of class analogy) belongs, makes no a priori assumptions of similarity to Aroclors. The SIMCA pattern recognition technique, developed by Wold and co-workers, has been described in detail elsewhere (4-6); hence only a short presentation of pertinent features will be given here. This pattern recognition technique is based on derivation of disjoint principal component models for classification of objects (samples). The primary objective of PCA is to get an overview of similarity among samples represented by data tables. The data tables discussed here have the matrix format shown in Table I. In this matrix, the notation x_pn denotes a data point, index n denotes an object upon which a chemical measurement has been made (a sample), and index p denotes a measured

This article not subject to U.S. Copyright. Published 1987 by the American Chemical Society


ANALYTICAL CHEMISTRY, VOL. 59, NO. 14, JULY 15, 1987

variable (a PCB isomer). Therefore the element x_pn represents the value of variable p in object n. The matrix containing the data is called X in matrix notation. The SIMCA class models are bilinear projection models obtained by decomposing the class data matrix X into a score matrix T (n × F), a loading matrix P (F × p), and a residual matrix E. The calculations involved in principal components are summarized in the following equation.

$$X = \mathbf{1}\bar{x} + TP + E \qquad (1)$$

The objective was to derive a model of the isomer and congener sums from the data set presented in Table I through a data matrix, X, having N objects (27 samples) and P variables (105 isomers or 10 congener sums) from which the concentration value of the PCB isomer, x_pn, could be calculated. The diagonal matrix x̄ is the mean of variables x(p) in all samples. The (n × F) score matrix, T, describes the projection of the n sample points onto the F-dimensional hyperplane defined by the (F × p) loading matrix. If the residuals, E (the unexplained part of the measurement not modeled), are small when compared with the variation in X, then the model is a good representation of X.

SIMCA has been applied to a variety of chemical problems. By this method, Stalling et al. (1) and Dunn et al. (3) examined similarities in the composition of PCB mixtures and Aroclors. These investigators demonstrated that three-term principal components models of Aroclors and Aroclor mixtures formed a tetrahedron-like volume in concentration space in which mixtures of any two Aroclors formed the edge boundaries and mixtures of any three Aroclors formed the surface planes of the tetrahedron. Mixtures of four Aroclors were contained in the interior space of the tetrahedron. Dunn et al. (3) demonstrated that partial least squares (PLS) in latent variables is a suitable method for determining the composition of Aroclor mixtures in samples composed of or derived from technical Aroclor mixtures. Because statistical evaluation of large sets of sample data composed of many individual PCB concentration measurements is difficult if the data are not readily available in machine-readable form, Onuska (7) used SIMCA to examine PCB residue profiles composed of Cl1-10 congener sums derived from the analysis of Aroclors and their mixtures. Onuska's study focused on characterizing Aroclor mixtures by using Cl1-10 congener profiles and obtained principal components score plots that were similar to those obtained by Stalling et al. (1) from modeling 69 isomer concentrations. Schwartz et al. (8) used SIMCA to assess the similarity of residue profiles in fish and turtles to Aroclors based on class models derived from isomer-specific PCB residue profiles (105 peaks). These investigations determined that the environmental PCB residues could not be described by an Aroclor or Aroclor mixture and that it would be inappropriate to report the PCB residue profiles as such.

In the present report, we further examine the data reported by Schwartz et al. (8) (1) to determine the relevance of modeling Cl1-10 congener sums, (2) to determine the decrease in information content resulting from the congener summation into individual subgroups, and (3) to explore the use of three-dimensional (3-D) graphics for viewing sample scores from principal components modeling.
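As an illustration of the bilinear decomposition in eq 1, the following Python sketch extracts principal components one at a time with a NIPALS-style iteration, a common way to compute such models, and returns the variable means, scores T, loadings P, and residuals E. It is a minimal sketch and not the SIMCA-3B implementation: the function name `nipals_pca`, the tolerance and iteration limit, and the small data matrix are all hypothetical and chosen only for illustration.

```python
import math

def nipals_pca(X, n_components, tol=1e-12, max_iter=1000):
    """Decompose X (n rows, p columns) as means + T P + E (cf. eq 1)."""
    n, p = len(X), len(X[0])
    # Mean of each variable over all samples (the x-bar term).
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    # Residuals start as the mean-centered data.
    E = [[row[j] - xbar[j] for j in range(p)] for row in X]
    T = [[0.0] * n_components for _ in range(n)]  # scores, n x F
    P = []                                        # loadings, F x p
    for f in range(n_components):
        # Start from the residual column with the largest sum of squares.
        j0 = max(range(p), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        load = [0.0] * p
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            if tt == 0.0:  # nothing left to model
                break
            # Loading vector: project residuals on the scores, then normalize.
            load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            load = [v / norm for v in load]
            # Updated scores: project each sample on the loading vector.
            t_new = [sum(E[i][j] * load[j] for j in range(p)) for i in range(n)]
            change = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        for i in range(n):
            T[i][f] = t[i]
        P.append(load)
        # Deflate: remove the modeled part before the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
    return xbar, T, P, E

# Hypothetical 4-sample, 3-variable matrix (not data from this study).
X = [[1.0, 2.0, 0.5],
     [2.0, 4.1, 0.4],
     [3.0, 5.9, 0.6],
     [4.0, 8.0, 0.5]]
xbar, T, P, E = nipals_pca(X, n_components=2)
```

By construction, adding the means, the product of scores and loadings, and the residuals reproduces X exactly; how small E is relative to the variation in X indicates how well the chosen number of components represents the data.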

EXPERIMENTAL SECTION

Analysis of Aroclors and Their Mixtures in Fish and Turtles. The data examined originated from analyses of PCB residues in composite fish samples, snapping turtles (Chelydra serpentina; from Darby Creek, Tinicum National Environmental Center, Philadelphia, PA), and Aroclors 1242, 1248, 1254, and 1260, and their mixtures. Chemical preparation, analysis, and modeling of the PCB constituents in these samples have been reported by Schwartz et al. (8). These procedures involved extraction and sample enrichment by gel permeation chromatography with

Table II. Sample Designation and Identification Numbers for Aroclors, Their Mixtures, Fish, and Turtles^a

no.    designation

Class I. Aroclors and Their Mixtures
1      Aroclor 1242:1248:1254:1260 (1:1:1:1)
3      Aroclor 1248
4      Aroclor 1254
5      Aroclor 1242
6      Aroclor 1260
7      Aroclor 1242:1248:1254:1260 (1:1:1:1)
21     Aroclor 1254:1260 (5:1)
22     Aroclor 1254:1260 (3:1)
23     Aroclor 1254:1260 (2:1)
24     Aroclor 1254:1260 (1:1)
25     Aroclor 1254:1260 (1:2)
26     Aroclor 1254:1260 (1:3)
27     Aroclor 1254:1260 (1:3)
28     Aroclor 1254:1260 (1:5)

Class II. Turtles
8      turtle-A
9      turtle-B
10     turtle-C
11     turtle-D rep 1
15     turtle-D rep 2
16     turtle-D rep 3
14     turtle-E

Class III. Fish
12     fish-A rep 1
17     fish-A rep 2
19     fish-A rep 3
13     fish-B rep 1
18     fish-B rep 2
20     fish-B rep 3

^a Ratios of Aroclor mixtures are in parentheses (weight ratios).

Biobeads SX-3 and 1:1 (v/v) hexane/methylene chloride. PCBs were separated from other coextractives with silica gel column chromatography. Fused silica capillary gas chromatography combined with electron capture detection was used to separate and detect PCB constituents. The concentration of PCB isomers was determined by regression calculations, which were done by using individual peak area responses for various injected concentrations of an equal weight mixture of Aroclors 1242, 1248, 1254, and 1260. The results of the gas chromatographic analyses were calculated and retrieved in a SIMCA data file. Data transfers were made to an IBM-AT computer by using the program Cyber (Department of Linguistics, University of Illinois at Champaign-Urbana, Urbana, IL) from the data base of the National Fisheries Contaminant Research Center (9) by way of an RS-232 link. For principal components modeling, the samples were grouped in three classes (Aroclors, turtles, and fish), having identification numbers as presented in Table II. In addition to the sample data reported by Schwartz et al. (8), eight additional mixtures of Aroclors 1254 and 1260 were analyzed and included in the Aroclors (class I). In the present study the data matrix comprises 27 objects and 105 isomers, or 2835 elements (27 × 105). In addition, the Cl1-10 congener data derived from this matrix added another 270 elements (27 × 10) to be evaluated.

Principal Components Modeling. The concentration data obtained from each analysis were expressed as fractional parts and normalized to sum to 100. The normalized data were analyzed statistically by calculating principal components sample scores (theta's) and variable loading terms (1) with the programs CLOAD and CPRIN, Version 3-X, from the SIMCA-3B software package (Principal Data Components, Columbia, MO).
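The normalization just described, and the summation of isomer concentrations into chlorine-number groups described in the next paragraph, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' BASIC program: the function names `normalize_to_100` and `homologue_sums`, the five-peak sample, and its chlorine numbers are hypothetical.

```python
def normalize_to_100(concentrations):
    """Express each peak as a percentage of the sample total (sums to 100)."""
    total = sum(concentrations)
    if total <= 0:
        raise ValueError("sample has no measured PCB to normalize")
    return [100.0 * c / total for c in concentrations]

def homologue_sums(concentrations, chlorine_numbers, max_cl=10):
    """Total the peaks that share a chlorine number into Cl1..Cl10 sums."""
    sums = [0.0] * max_cl
    for conc, n_cl in zip(concentrations, chlorine_numbers):
        if not 1 <= n_cl <= max_cl:
            raise ValueError("chlorine number must lie between 1 and max_cl")
        sums[n_cl - 1] += conc  # index 0 holds Cl1, index 9 holds Cl10
    return sums

# Hypothetical sample with five peaks (not data from this study).
peak_concentrations = [2.0, 3.0, 5.0, 6.0, 4.0]
peak_chlorines = [3, 4, 4, 5, 6]   # assumed chlorine number of each peak

normalized = normalize_to_100(peak_concentrations)
profile = homologue_sums(normalized, peak_chlorines)
```

For this hypothetical sample the two tetrachloro peaks collapse into a single value of 40, so the five-variable isomer profile becomes a ten-variable homologue profile with only four nonzero entries. The same reduction applied to the study's matrix takes 105 variables per sample down to 10.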
In addition to modeling the data having 105 individual isomers, we calculated the Cl1-10 congener sums for each sample, using a BASIC program that reads a file containing the chlorine number of each isomer. Concentrations of isomers having the same number of chlorines were totaled to Cl1-10 sums and the congener sums were written to a SIMCA data file for use in principal components modeling. The statistical technique of cross-validation (6) was used to determine the number of significant principal components in each class. Geometrically, a data table with P variables can be interpreted as a P-dimensional space with each object represented as a point. This feature is especially helpful in visualizing data having more than three dimensions. In the present study, projections were made from 105-dimensional space into either two-dimensional (2-D) or 3-D space. The sample scores determined from principal components modeling are used as sample coordinates in 2-D or 3-D plots.

Quality Control. Information on the precision of the analyses can be obtained from principal components modeling. If the precision of the method is high in relation to differences between sample groups, samples representing duplicate analyses should be located at nearly identical points in the plots. To address the reproducibility of the analyses, two replicate samples of a 1:1:1:1 mixture were analyzed and three samples were analyzed in triplicate: turtle-D (no. 11, 15, and 16); fish-A (no. 12, 17, and 19); and fish-B (no. 13, 18, and 20) (Table II).

Representation of Sample Scores in Three-Dimensional Plots. Three-dimensional graphics are especially useful in presenting sample score plots if there are three or more principal components. Because it has been shown that principal components models of a data set derived from four Aroclors and their mixtures are represented by elements of a tetrahedron, 3-D plots of sample scores rotated about score axes were needed to examine the results of this study. To extend the 3-D plotting capacity of the SIMCA-3X package, we wrote a BASIC program, SDPG.BAS, that reads a SIMCA score data file and transfers these data to an IBM Professional Graphics Adapter for display and rotation in 3-D. Each sample is represented as a point in color on a high-resolution monitor. Coordinates for a sample point to be plotted in 3-D were scaled to the display coordinates (400 × 600).
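The rotation and scaling steps can be illustrated with a short Python sketch. It is not a reconstruction of SDPG.BAS: the function names, the orthographic projection onto the first two rotated axes, the independent scaling of each axis, and the reading of 400 × 600 as 600 pixels wide by 400 pixels high are all assumptions made for illustration.

```python
import math

def rotate_about_axis(points, axis, degrees):
    """Rotate (t1, t2, t3) score triplets about one score axis (0, 1, or 2)."""
    i, j = [(1, 2), (2, 0), (0, 1)][axis]  # the two coordinates that change
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    rotated = []
    for point in points:
        q = list(point)
        q[i] = c * point[i] - s * point[j]
        q[j] = s * point[i] + c * point[j]
        rotated.append(tuple(q))
    return rotated

def scale_to_display(points, width=600, height=400):
    """Drop the third coordinate and map the first two onto a pixel grid."""
    xs = [pt[0] for pt in points]
    ys = [pt[1] for pt in points]

    def to_pixel(value, low, high, size):
        if high == low:               # all points share this coordinate
            return (size - 1) // 2
        return round((value - low) / (high - low) * (size - 1))

    return [(to_pixel(x, min(xs), max(xs), width),
             to_pixel(y, min(ys), max(ys), height)) for x, y, _ in points]

# Hypothetical score triplets for four samples (not data from this study).
scores = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
view = scale_to_display(rotate_about_axis(scores, axis=1, degrees=30))
```

Repeating the call with a new angle corresponds to specifying another rotation increment about an axis. Scaling each axis independently fills the display but distorts the aspect ratio, so a program meant for judging geometric shapes such as the tetrahedron would more likely use one common scale factor.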
The color of the point representing each sample may be determined by a character match to a letter position in the sample's identity code. After the image is viewed, another rotation increment can be specified about an axis. The identity of a sample can be ascertained by striking another key to sequentially address the sample list, change the color of the sample point, and display the sample's assigned name on the screen. An additional program, HPPLOT.BAS, was written that provides a lower resolution image by using the color graphics adapter mode of the Professional Graphics Adapter. This latter program can also construct a hard copy plot of the data on a Hewlett-Packard multipen plotter. Copies of these programs can be obtained upon request to the authors.

Information Content of Data Sets. To obtain quantitative information about the information content of each data set, we used the information and communication theory developed by Shannon (10). This approach was also applied to qualitative chemical analyses by Cleij and Dijkstra (11). Marlen and Dijkstra (12) applied information theory to measure the information content and evaluate selection of mass spectral peaks for retrieval of mass spectra, and Scott (13) used it in feature selection to select ions from mass spectral data to binary encode for calculating class models with SIMCA. Because the normalized data from capillary gas chromatography analyses are analogous to mass spectra, we have used the approach reported by Scott (13) to calculate the maximum information content of the PCB data; we calculated it from the observed probability distribution of isomers and congener data with a computer program written in BASIC. We used this program to calculate the probability distribution for each of the PCB constituents for each data set by applying the following equation:

$$P_i = \frac{\text{no. of times peak } i \text{ has concn} > 0 \text{ in sample data set}}{\text{no. of samples in data set}} \qquad (2)$$

The probability distribution (P_i = probability of occurrence of a given GC peak i) for the PCB constituents and the Cl1-10 homologues are shown in Figure 1. The maximum information content (ignoring correlations between isomers), I_i, of peak i was calculated from the probability distribution data by using the following equation:

$$I_i = -P_i \log_2 (P_i) - (1 - P_i) \log_2 (1 - P_i) \qquad (3)$$

where 0 log2 (0) = 0. The probability distribution and information content were then plotted for each data set (Figure 1). The unit for inf