Quantitative Component Analysis of Mixtures for Risk Assessment


Chem. Res. Toxicol. 1999, 12, 1050-1056

Quantitative Component Analysis of Mixtures for Risk Assessment: Application to Eye Irritation

Hitesh C. Patel, José S. Duca, and A. J. Hopfinger*

Laboratory of Molecular Modeling and Design (M/C-781), College of Pharmacy, The University of Illinois at Chicago, 833 South Wood Street, Chicago, Illinois 60612-3479

Cheryl D. Glendening and Edward D. Thompson

Miami Valley Laboratories, The Procter & Gamble Company, P.O. Box 538707, Cincinnati, Ohio 45253-8707

Received June 4, 1999

A methodology called quantitative component analysis of mixtures (QCAM) was used to analyze an existing set of product formulation data to determine if the irritating ingredients in the mixtures could be identified. Eye irritation scores, based on a rat model, for 18 mixtures having a composite total of 37 components, were analyzed by QCAM. QCAM relates a net toxicity measure of a mixture to the toxicities of the individual components of the mixture through linear, quadratic, and pairwise cross-component concentration-dependent interactions. A correlation model is established using a particular genetic algorithm employing either multidimensional linear regression or partial least-squares regression fitting. Cornea eye irritation and average eye irritation are well-explained in terms of a linear model of, at most, three components over the set of mixtures. Moreover, extensive cornea and average eye irritations are due to only one of these three components of the mixtures. Also, one of the three significant components was predicted to decrease the extent of eye irritation, and subsequently identified as an “anti-irritant” in contact lens solutions. A reasonable linear correlation model could also be developed for conjunctiva irritation, but no significant iris irritation model could be constructed. The addition of quadratic and/or cross-component concentration terms to a linear correlation model did not statistically improve the overall resultant model. The QCAM models permit estimation of the intrinsic (self) toxicity of each of the components of a mixture, and may aid in the reduction, and ultimate elimination, of the need for animal eye irritation studies.

Introduction

A relatively large number of modeling and chemometric methods for establishing structure-activity models with which to forecast biological end points have been developed (1, and references cited therein). Many of these methods have been applied to specific problems in toxicology. A variety of toxicological end points, ranging from “mild” inflammatory responses such as eye and skin irritation to measures of carcinogenicity, have been investigated (2-5). Clearly, such mathematical paradigms have the potential, and in some cases have shown the benefit, of reducing the use of animal testing as well as providing substantial time and cost savings in risk assessment. However, the very large majority of statistical and computational risk assessment and toxicity applications have been for single chemical entities. Mixtures of chemicals have rarely been considered. Yet, most commercial chemical products are mixtures. Thus, there is a pressing need to develop a reliable modeling and chemometric method for evaluating and forecasting the fate and effect of chemical mixtures in the environment. This paper describes such a method for product formulations. The methodology is termed quantitative component analysis of mixtures (QCAM).¹ QCAM permits the construction of quantitative models for extracting the individual toxicity profile of each of the components of a mixture from only knowledge of their relative concentrations and the net toxicity of the entire mixture. The application of a QCAM model can be extended to predicting the individual component toxicity profiles of new formulations provided the model has been validated. Thus, QCAM has the potential for reducing, or eliminating, the need for animal testing in making an assessment of a toxicity end point.

¹ Abbreviations: QCAM, quantitative component analysis of mixtures; GAs, genetic algorithms; GFA, genetic function approximation; MLR, multidimensional linear regression; PLS, partial least squares; QSAR, quantitative structure-activity relationship; LOF, lack of fit; MV, mean value; MES, molar eye scores.

10.1021/tx990098z CCC: $18.00 © 1999 American Chemical Society. Published on Web 10/09/1999. Chem. Res. Toxicol., Vol. 12, No. 11, 1999.

Materials and Methods

Mixtures (formulations) can be thought of as presenting three distinct problems in terms of toxicological analysis. First, the intrinsic toxicity (the toxicity of a chemical tested by itself) of one, or more, of the mixture’s components may be unknown. Second, the toxicity of any component of the mixture may not be linear with respect to its relative concentration because of the toxicity mechanism and/or interactions with other components of the mixture. Third, interactions among the components can lead to cross-component contributions, that is, either positive or negative synergy, to the net toxicity of the entire mixture. These three complicating features can be represented for a set of mixtures by a set of equations, each of the form

$$T_n(j) = \sum_i C_{ij}\, t(i) + \sum_i C_{ij}^{2}\, t(i) + \sum_i \sum_{k>i} C_{ij}\, C_{kj}\, t(i,k) \qquad (1)$$

where Tn(j) is the net observed toxicity of the jth mixture in the training data set of M mixtures, Cij is the concentration of the ith component of mixture j, t(i) is the intrinsic toxicity of component i, and t(i,k) is the cross-component toxicity between components i and k. The second set of terms in eq 1 takes into account the possible nonlinear behavior of toxicity for each of the components of the mixture with respect to each of their concentrations. The third set of terms in eq 1 are the representations for pairwise toxicity interactions between components of the mixture. In those cases where only the first set of linear terms are significant in the expression of the biological activity, BA(j), of the jth mixture, an easy to understand, group additive model for the biological activity of the mixture results

$$BA(j) = \sum_i ba(i,j) \qquad (2)$$

where ba(i,j) is the biological activity of the ith component of the jth mixture. Further, ba(i,j), in terms of eq 1, can be expressed as

$$ba(i,j) = C_{ij}\, \langle ba(i) \rangle \qquad (3)$$

where 〈ba(i)〉 is an inherent property of component i which is called the intrinsic biological activity of this component. The intrinsic biological activity, in turn, should be close to the measured biological activity of the component when tested by itself, assuming complications due to concentration are minimal. The number of unique terms, N, on the right-hand side of eq 1 over the training data set is, in most applications, not equal to M, with N > M called an overdetermined problem, and N < M an underdetermined problem. In both of these situations, an optimum fit (as opposed to the solution) is sought between the left- and right-hand sides of the set of equations defined in eq 1.

Evolutionary algorithms and, in particular, genetic algorithms (GAs) are proving to be very effective tools in determining the global optimum fits for both over- and underdetermined relational data sets (6). Rogers has developed a particular formulation of a GA called the genetic function approximation (GFA), which has useful features beyond establishing the efficient large-scale global optimization fit in relational data sets (7). The GFA permits the automatic introduction of nonlinear relationships into correlation models for a relational data set. The generation of nonlinear relationships also leads to intrinsic testing of the relational data set for subpartitioned fittings with respect to the set of observations (mixtures). That is, the GFA will explore and determine if the relational data set should be broken into (clustered) subsets, and individual relational models (correlations) constructed for each of the subsets. This subpartitioning capability leads to the identification and removal of outliers as an inherent feature of the optimization process. The GFA also permits using different types of statistical fittings, including multidimensional linear regression (MLR) and partial least-squares (PLS) regression (8). Thus, model optimization can be explored in terms of the vehicle used to establish the correlation relationship. One other useful feature of the GFA is the identification of the complete set of unique correlation models fitting the relational data set and their relative significance to one another. However, perhaps the most important feature of using a GA in the construction of a toxicity quantitative structure-activity relationship (QSAR) is the ability to monitor descriptor usage as a function of model optimization pathway (the GA crossover operation number). Regardless of the “noise” in the dependent variable (toxicity end point measures) of the training set, the GFA approach will develop the best-fit family of models.

Table 1. Mean Value (MV) Eye Irritation Scores and Corresponding Standard Deviations (SD) for the 18 Mixtures Considered in the QCAM

            cornea         iris        conjunctiva    average score
mixture    MV     SD     MV     SD     MV     SD      MV     SD
   1      36.7   25.6    4.2    2.0    3.7    0.8    44.5   27.7
   2       7.5    6.1    4.2    2.0    4.0    0.0    15.7    6.8
   3       5.8    2.0    5.0    0.0    3.3    1.0    14.2    1.6
   4       6.7    2.6    4.2    2.0    2.0    0.0    12.8    3.8
   5      11.7   16.6    3.3    2.6    4.0    1.3    19.0   18.9
   6      18.3   14.4    2.5    2.7    6.0    1.3    26.8   16.0
   7      17.5   17.8    0.8    2.0    4.7    2.4    23.0   19.1
   8       7.5    2.7    0.8    2.0    2.3    0.8    10.7    4.8
   9       5.8    2.0    0.0    0.0    2.0    0.0     7.8    2.0
  10      31.7   25.4    3.3    2.6    4.0    2.8    39.0   30.2
  11       5.0    0.0    1.7    2.6    3.3    1.0    10.0    3.2
  12       5.0    0.0    0.0    0.0    2.3    0.8     7.3    0.8
  13       5.8    2.0    0.0    0.0    4.7    1.6    10.5    2.4
  14       4.2    2.0    0.8    2.0    2.7    1.0     7.7    3.9
  15       0.8    2.0    0.0    0.0    2.0    1.3     2.8    2.4
  16      18.3   16.9    3.5    2.4    4.7    1.0    28.8   16.3
  17       7.5    2.7    5.0    0.0    3.3    1.0    15.8    3.5
  18      40.8   29.7    3.3    2.4    5.0    1.7    49.2   33.8

Further, the usage of descriptors over the corresponding evolution to the best-fit family of models provides a basis for assessing the relative importance of the descriptors (the set of possible independent variables) in the best QSAR models irrespective of the measures of fit (R2, Q2, or LOF) (7). Thus, important and significant information regarding descriptor selection can be gleaned from a QSAR model having marginal measures of statistical fit because of high “noise” in the dependent variables of the training set. This is often the case for toxicity end point measures, and is true of the training set studied in the work reported here. In this study, 18 mixtures having a composite total of 37 possible components were investigated. Four scoring measures of rat eye irritation were determined for each mixture using a protocol analogous to that reported for the ECETOC eye irritation data set (9) which are, in turn, based on the Draize test (9). Individual irritation scores are recorded for the cornea, iris, and conjunctiva. A composite net eye irritation score is constructed from these three scores. Each mixture is evaluated using multiple animals. Five rats were used to establish the scores for each mixture in the data set. The average values of each of the rat eye irritation scores for each of the 18 mixtures, which are used as the dependent variables in the QCAM, are reported in Table 1. By convention, the extent of eye irritation increases with increasing score value in an assumed linear fashion. Table 1 also contains the standard deviations in each of the four eye irritation scores for each mixture. It is clear from a comparison of the standard deviation to the mean value score values that the rat eye irritation scores are rich in noise. Consequently, a QCAM study of this “noisy” data will particularly benefit from the use of a genetic algorithm in establishing correlation relationships. 
Table 2 contains the relative concentrations of each of the possible 37 components of each mixture in the training set. The concentration of water in the mixture, and the pH of the mixture, were considered as components of the mixture. An entry of zero in Table 2 means that specific component was not part of the particular mixture. The explicit chemicals used as components in the mixtures are not given for proprietary reasons. However, these mixtures were considered as trial lotion formulations that contain moisturizers, surfactants, perfumes, etc. A novel feature of QCAM is that the chemical structures of the components are not required to perform a study. However, to apply a QCAM model to predict the toxicity of a new formulation, this new formulation must be composed of components contained in the training set of formulations used to build the QCAM model.
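As an illustration, eq 1 can be evaluated for a single mixture in a few lines of Python. The sketch below is not part of the original study. The function name `net_toxicity`, the use of a separate coefficient list for the squared-concentration terms, and all numerical values are assumptions made purely for illustration.

```python
from itertools import combinations


def net_toxicity(conc, t_lin, t_quad=None, t_cross=None):
    """Evaluate eq 1 for one mixture.

    conc    -- concentrations C_ij of the components of mixture j
    t_lin   -- coefficient of the linear term for each component
    t_quad  -- optional coefficients of the squared-concentration terms
               (kept separate from t_lin here; this is an assumption)
    t_cross -- optional dict {(i, k): coefficient} for pairs with k > i
    """
    total = sum(c * t for c, t in zip(conc, t_lin))
    if t_quad is not None:
        total += sum(c * c * t for c, t in zip(conc, t_quad))
    if t_cross is not None:
        # Pairs absent from the dict contribute nothing.
        for i, k in combinations(range(len(conc)), 2):
            total += conc[i] * conc[k] * t_cross.get((i, k), 0.0)
    return total


# Hypothetical three-component mixture (values invented for illustration).
conc = [8.0, 0.5, 1.0]
t_lin = [3.0, -30.0, -5.0]

linear_only = net_toxicity(conc, t_lin)
with_cross = net_toxicity(conc, t_lin, t_cross={(0, 1): 0.5})
print(linear_only, with_cross)
```

Passing only `conc` and `t_lin` corresponds to the purely linear, group additive case; the optional arguments switch on the second and third sets of terms of eq 1.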

Table 2. Aqueous Concentrations, pH, and the 35 Component Concentrations for Each of the 18 Mixtures in the QCAMᵃ

0 0 0 0 0 0 0 0 0 0 0 0.2 0 0 0.3 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0.1 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0.2 0 0 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0.3 0 0 0.1 0 0 0 0

0 0 0 0.5 0 0 0 0 0 0 0.4 0 0 0 0.3 0 0 0

0 0 0 0 0 0 0 0 0 0 0.3 0 0 0 0 0 0 0

0 0 0 1.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0.5 0 0 0 0 0 0 0 0 0 0.3 0 0 0 0

0 0 0 0 0 0 0 0 0 0 0.4 0 0 0.2 0 0 0 0

0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0.4 0 0.4 0.4 0.3 0 0.4 0.4 0.4

0.2 0.2 0.2 0.1 0.2 0.2 0.1 0.1 0 0.2 0 0.2 0.2 0 1.3 0.2 0.1 0.2

0.1 0.1 0.1 0.4 0.1 0.1 0.1 0.1 0.1 0.1 0 0.1 0.1 1.3 0 0.1 0.1 0.1

0 0 0 0.5 0 0 0 0 0 0 0 0 0 2 0 0 0 0

0 0 0 7.5 0 0 0 0 0 0 0 0 0 0 7.5 0 0 0

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.4 0

0 0 0 0 0 0 0 0 0 0 1.3 0 0 0 0 0 0 0

0 0 0 2.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 1 0 0 0 0 0 0 0 0 0 0.5 0 0 0 0

0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.3 0.2 0.2 0 0.2 0.2 0.2 0 0.5 0.5 0.5 0 0.5 0.5 0.5 0.5 0.5 0.5 1.5 0.5 0.5 0.7 0 0.5 0.5 0.5 0.6 0.6 0 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.7 0.6 0.6 0.5 0 0.6 0.6 0.6 1.5 1.5 1.5 0 1.5 1.5 1.5 1.5 1.9 1.5 1.5 1.5 1.5 1.5 0 1.5 1.5 1.5 1.5 1.5 1.5 2 1.5 1.5 1.5 1.5 0 1.5 0 1.5 1.5 0 1 1.5 1.5 1.5 1.5 1.5 0 1.5 1.5 1.5 1.5 1.5 1.5 1.5 2.3 1.5 1.5 1.5 1 1.5 1.5 1.5 0 0 0 0 0 0 0 0 0 1 0.2 0 0 2.3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0.5 0 0 0 0 0 3 0 0 3 0 0 3 0 0 1.5 0 10 3 0 0 3 0 0 3 0 0 3 0 0 3 0 0 3 0 0 10 0 0 3 0 0 3 0 0 3 0.2 0 1 0 0 3 0 0 3 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 5.7 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 4 8 0 0 0 0 0 0 0 0 0.2 0 0.3 0.2 0 0 0 0 0 0 0.1 0.3 0.3 0 0 2 2 2 1.5 0.5 2 0 0 0 2 0 2 2 0 0 2 0 2 8 4 0 0 0 8 8 0 0 8 0 0 0 0 0 8 0 8 2.9 2.5 2.4 2.3 2.9 2.8 7.3 7.2 8.3 6.6 0 0 0 0 0 0 2.1 3.0 80.5 84.5 84.5 79.2 80.0 80.5 82.6 90.6 92.2 79.5 77.5 84.5 80.5 78.1 78.4 80.5 90.1 80.7 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

mixture water pH co-1 co-2 co-3 co-4 co-5 co-6 co-7 co-8 co-9 co-10 co-11 co-12 co-13 co-14 co-15 co-16 co-17 co-18 co-19 co-20 co-21 co-22 co-23 co-24 co-25 co-26 co-27 co-28 co-29 co-30 co-31 co-32 co-33 co-34 co-35

ᵃ Concentrations are expressed as a percentage of the total weight of the mixture.

Results

Table 3. Linear Correlation Analyses of the Eye Irritation Data Given in Table 1

(A) Correlation Matrix of the Four Eye Irritation Scores

               cornea    iris    conjunctiva    average
cornea          1.00
iris            0.38     1.00
conjunctiva     0.49     0.29       1.00
average         0.98     0.50       0.57         1.00

(B) Linear Correlation Equations between Standard Deviations (SD) and Mean Values (MV) for Each of the Four Eye Irritation Scores

cornea         SD = 0.81 × MV - 1.2     R2 = 0.91
iris           SD = 0.17 × MV - 1.1     R2 = 0.08
conjunctiva    SD = -0.03 × MV + 0.3    R2 = 0.26
average        SD = 0.77 × MV - 3.8     R2 = 0.90
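The cornea entry in part B of Table 3 can be checked directly from the cornea MV and SD columns of Table 1 with an ordinary least-squares line. The following Python sketch is an illustrative check only, not the authors' procedure; the helper name `linear_fit` is hypothetical.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, plus R2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# Cornea mean values and standard deviations for mixtures 1-18 (Table 1).
cornea_mv = [36.7, 7.5, 5.8, 6.7, 11.7, 18.3, 17.5, 7.5, 5.8,
             31.7, 5.0, 5.0, 5.8, 4.2, 0.8, 18.3, 7.5, 40.8]
cornea_sd = [25.6, 6.1, 2.0, 2.6, 16.6, 14.4, 17.8, 2.7, 2.0,
             25.4, 0.0, 0.0, 2.0, 2.0, 2.0, 16.9, 2.7, 29.7]

slope, intercept, r2 = linear_fit(cornea_mv, cornea_sd)
print(f"SD = {slope:.2f} x MV + ({intercept:.2f}); R2 = {r2:.2f}")
```

With the Table 1 values as transcribed here, the fitted slope, intercept, and R2 should come out close to the cornea entry in part B of Table 3.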

A correlation analysis of the eye irritation scores reported in Table 1 was carried out. The results of correlating each of the four eye irritation scores to one another are presented in a correlation matrix in part A of Table 3. It is readily seen that the cornea irritation score and the average eye irritation score are very highly correlated with one another. This is to an appreciable extent due to the cornea scores being highly weighted, as compared to the iris and conjunctiva scores, in computing the average eye irritation score in the Draize eye irritation paradigm (9). From a statistical point of view, cornea and average eye irritation scores are equivalent with respect to information content. Each of the standard deviations (SD) in the eye irritation scores was correlated with its respective mean value (MV). The linear correlation equations are given in part B of Table 3. It is seen that the cornea and average SD correlate significantly with MV. The larger the MV (greater eye irritation score), the larger the SD. Thus, the data in Table 1 suggest the cornea and average eye irritation scores effectively define compounds that are nonirritants, but do not effectively resolve the extent of irritation of compounds that are irritating.

QCAM correlation models were constructed by first including only the linear set of terms in eq 1

$$T_n(j) = \sum_i C_{ij}\, t(i) \qquad (4)$$

In this case, the training set consists of 18 observations (mixtures) for 37 degrees of freedom (components), which is a moderately overdetermined training set. However, the GFA method easily identifies the unique set of best correlation models as a function of the number of significant independent variables (components) allowed in a model. The best one-, two-, and three-term linear models, using eq 4, for each eye irritation score, are given in Table 4. These models are derived using MLR fitting.

Figure 1. Crossover vs variable (mixture component) usage during the GFA/MLR cornea irritation QCAM model optimization process. Only the six most often used mixture components in model optimization are plotted.

Table 4. Best, As Measured by the Correlation Coefficient R, One-, Two-, and Three-Term QCAM Models Based on a Linear Relationship between an Eye Irritation Score and Component Concentration (eq 4)ᵃ

Cornea Irritation Scores (CIS)
CIS = 5.93 + 2.61 (co-1)                                     R2 = 0.71; Q2 = 0.60
CIS = 7.38 + 2.94 (co-1) - 35.3 (co-3)                       R2 = 0.83; Q2 = 0.74
CIS = 17.87 + 3.41 (co-1) - 56.14 (co-3) - 19.13 (co-14)     R2 = 0.88; Q2 = 0.83

Iris Irritation Scores (IIS)
IIS = 0.53 + 1.47 (co-12)                                    R2 = 0.23; Q2 = 0.09
IIS = 0.85 + 1.43 (co-12) - 9.88 (co-33)                     R2 = 0.42; Q2 = 0.29
IIS = 0.94 + 1.42 (co-2) - 0.42 (co-4) + 10.16 (co-22)       R2 = 0.62; Q2 = 0.10

Conjunctiva Irritation Scores (CoIS)
CoIS = 2.91 + 0.22 (co-1)                                    R2 = 0.52; Q2 = 0.41
CoIS = 2.65 + 0.26 (co-1) + 0.18 (co-2)                      R2 = 0.63; Q2 = 0.36
CoIS = 2.46 + 0.28 (co-1) + 0.21 (co-4) + 0.15 (co-7)        R2 = 0.71; Q2 = 0.38

Average Eye Irritation Scores (AEIS)
AEIS = 10.47 + 3.02 (co-1)                                   R2 = 0.74; Q2 = 0.65
AEIS = 12.36 + 3.37 (co-1) - 37.34 (co-3)                    R2 = 0.84; Q2 = 0.77
AEIS = 21.87 + 3.75 (co-1) - 55.64 (co-3) - 6.77 (co-13)     R2 = 0.88; Q2 = 0.81

ᵃ The number of observations (mixtures) is 18, and Q is the cross-correlation coefficient.

An inspection of the QCAM correlation equations in Table 4 leads to the following observations.

(1) No significant QCAM model, as measured by R2 and Q2, can be constructed for the iris score using eq 4. Conversely, reasonably good and nearly equally significant (R2 = 0.71-0.88) QCAM correlation equations can be constructed for the other three eye irritation scores.

(2) Component co-1 is present in all significant QCAM models for cornea, conjunctiva, and average eye irritation scores. Moreover, a comparison of the one-, two-, and three-term models for each of these three eye irritation measures indicates that component co-1 dominates in the correlation model. For example, the single-term QCAM correlation model for the average eye irritation score has an R2 of 0.74, while the best three-term model has an R2 of 0.88. Thus, the two additional components (terms) only provide a gain of 0.14 in the accounting of the overall variance of the training data. The dominating role of component co-1 in the average eye irritation score can be discerned by plotting the crossover number in the GFA optimization process versus component (independent variable) usage. This has been done in Figure 1 for the cornea irritation score correlation models. It is quite clear from Figure 1 that component co-1 is the dominant independent variable being used in optimizing the QCAM correlation equations. In addition, variable usage of component co-1 plateaus to a constant usage of near 100% after about 4000 crossovers. This means that model optimization has occurred and secondary independent variables (the other mixture component concentrations) are fluctuating in modest usage for continued crossover operations. Stated another way, the constant usage of component co-1 indicates the best QCAM models have been realized and are given in Table 4.

(3) The regression coefficient for component co-1 is positive and nearly the same in each of the correlation models of the respective three types of eye irritation scores for which good QCAM models can be built. Thus, the level of eye irritation is predicted to increase as the relative concentration of component co-1 increases in a mixture. Component co-3, however, has a negative regression coefficient in each of the QCAM models where it is found, which, in turn, suggests that this component in a mixture acts to decrease the level of eye irritation.

Figure 2 contains plots of mixture number (as adopted in Table 1) versus the cornea irritation scores. The observed scores are plotted as well as the predicted cornea irritation scores using eq 4 by including (a) only the most significant component in the QCAM correlation and (b) the three most significant components in the QCAM correlation model. An inspection of Figure 2a reveals that the one-component QCAM correlation model gives reasonable predictions except for mixtures 6, 7, 15, and 16. In all four of these mixtures, the predicted cornea irritation score is greater than the observed score. The inclusion of components co-3 and co-14 brings the predicted cornea irritation scores into good agreement with the observed values for all four of these mixtures.
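The one-term cornea model of Table 4 can be approximated with a plain least-squares fit. The sketch below is illustrative and rests on three assumptions that are not confirmed by the text: the co-1 concentrations are read from the Table 2 listing as extracted (the 18 values that immediately precede the pH values), ordinary least squares is used in place of the GFA, and Q2 is taken to be the leave-one-out cross-validated R2. The helper names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r2_and_q2(x, y):
    """Return (R2, leave-one-out Q2) for a one-term linear model."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((b - mean_y) ** 2 for b in y)
    intercept, slope = fit_line(x, y)
    ss_resid = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    press = 0.0
    for i in range(n):
        # Refit without mixture i, then predict the held-out mixture.
        c, s = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (c + s * x[i])) ** 2
    return 1.0 - ss_resid / ss_total, 1.0 - press / ss_total


# Assumed co-1 concentrations (wt %) for mixtures 1-18, from Table 2 as extracted.
co1 = [8.0, 4.0, 0.0, 0.0, 0.0, 8.0, 8.0, 0.0, 0.0,
       8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 8.0, 0.0, 8.0]
# Cornea mean irritation scores for mixtures 1-18 (Table 1).
cis = [36.7, 7.5, 5.8, 6.7, 11.7, 18.3, 17.5, 7.5, 5.8,
       31.7, 5.0, 5.0, 5.8, 4.2, 0.8, 18.3, 7.5, 40.8]

intercept, slope = fit_line(co1, cis)
r2, q2 = r2_and_q2(co1, cis)
print(f"CIS = {intercept:.2f} + {slope:.2f} (co-1); R2 = {r2:.2f}; Q2 = {q2:.2f}")
```

Under these assumptions the slope and both fit statistics should land close to the one-term CIS entry in Table 4; the intercept need not match exactly.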


Figure 2. Mixture number and corresponding component concentrations vs cornea irritation scores (CIS). The solid lines are the recorded scores, and the dotted lines are the predicted cornea irritation scores using (a) the best single-mixture component linear equation for CIS which is given in Table 4 and (b) the best linear equation for CIS having three mixture components which is given in Table 4.

This can be seen by comparing panels a and b of Figure 2. The concentrations of components co-1, co-3, and co-14 for each mixture are reported below the mixture number on the x-axes of panels a and b of Figure 2. The contribution of component co-14 in the three-component QCAM model is to establish a good fit between observed and predicted eye irritation scores for some mixtures exhibiting low to moderate eye irritation scores (mixtures 8, 11, and 14). The role of component co-14 in the QCAM can be discerned by again comparing panels a and b of Figure 2.

The possibility of a nonlinear dependence (represented by a quadratic term) of eye irritation scores on component concentration, the second set of terms on the right side of eq 1, was next explored. No significant quadratic term could be identified in any QCAM correlation model for the four types of eye irritation scores given in Table 1. The most significant QCAM correlation equation for the cornea irritation score involving a quadratic term is given in part A of Table 5. The corresponding linear QCAM correlation model is also given for comparison at the top of Table 5. The consideration of the second set (quadratic concentration dependence) and/or third set (cross-component) of terms on the right side of eq 1 introduces many additional trial independent variables, leading, in turn, to a higher level of overdetermining of the training set of mixtures. The inclusion of the set of quadratic concentration terms yields a data structure problem of 18 observations and 74 possible independent variables. This level of an overdetermined problem is easily handled by GFA analysis.

Table 5. Best Nonlinear Cornea Irritation Score (CIS) QCAM Models, Based on the Terms in eq 1, and the Corresponding Best Linear CIS QCAM Model for Referenceᵃ

Best Linear Model
CIS = 5.93 + 2.61 (co-1)                                          R2 = 0.70; Q2 = 0.60

(A) Quadratic Concentration Dependence
CIS = 5.69 + 2.72 (co-1) - 0.02 (co-1)^2                          R2 = 0.73; Q2 = 0.59

(B) Cross-Component Dependence
CIS = 5.95 + 2.83 (co-1) - 0.39 (co-6)(co-19)                     R2 = 0.76; Q2 = 0.67

(C) Quadratic and Cross-Component Dependence
CIS = 5.92 + 2.91 (co-1) - 0.02 (co-1)^2 - 0.38 (co-6)(co-19)     R2 = 0.77; Q2 = 0.65

ᵃ Q is the cross-correlation coefficient.

A combination of linear and cross-component concentration terms (the first and third sets of terms in eq 1) was also used to develop QCAM correlation models. Again, no significant independent variable terms, other than linear concentration component terms, were found for any of the eye irritation scores. The top (but not statistically significant) cross-component concentration term QCAM model is given in part B of Table 5 for cornea irritation. The corresponding best linear QCAM correlation model is the same linear model given at the top of Table 5. This representation of the QCAM problem has 18 observations and 703 possible independent variables. Obviously, this is a very highly overdetermined problem. However, it is well within the dimensions successfully treated by GFA (10). Moreover, the large majority of cross-component terms are zero, or at least very small relative to non-cross-component terms. This architecture of the independent variable matrix enhances the model optimization potency of the GFA.

Finally, all three sets of terms on the right side of eq 1 were simultaneously considered in the development of a QCAM correlation model. No statistically significant quadratic and/or cross-component term was found in the GFA analysis of the four eye irritation scores. The best “forced” QCAM model with both a quadratic and pairwise cross-component interaction term is presented in part C of Table 5. The optimized QCAM model realized from this representation of the QCAM problem is the same linear model at the top of Table 5. The dimensional form of this QCAM problem has 18 observations and 740 independent variables. Overall, this representation of the QCAM problem is about the same as the linear and cross-component interaction form of the problem with respect to the extent of overdetermination. This representation of the QCAM problem is once again handled by GFA model optimization using MLR fitting.

All of the GFA analyses reported above for MLR fitting were re-run using PLS fitting. In particular, the GFA/PLS experiments were carried out as a function of the number of principal components used in the PLS fit. The GFA/PLS studies can be viewed as providing an assessment of the theoretical dimensionality of a data set, as well as the corresponding upper limit in correlation variance that can be “explained” in a statistical model of the data set. The “payment” for these useful measures is that no real-world model can be gleaned from PLS fitting. PLS models use orthogonal independent variables (the principal PLS components) constructed from linear combinations of the actual independent variables (mixture concentrations). The R2 PLS values as a function of the number of included principal components are given in Table 6. A comparison of the R2 values of the best one-, two-, and three-term MLR models of Table 4 to the respective R2 values of the one-, two-, and three-principal component PLS models (see Table 6) reveals the following.

(1) Cornea and average eye irritation scores are governed by no more than three mixture components. There is very little difference in the R2 values of the best three-term MLR and three-principal component PLS models. Moreover, these R2 values are the effective maximum values as a function of the number of independent variables used in a correlation model.

(2) No significant QCAM model is possible for the iris score. The best PLS fit has an R2 of only 0.63.

(3) The mixture components found in the one-, two-, and three-term MLR models “explain” nearly the same amount of variance in cornea, conjunctiva, and average eye scores as the respective number of principal component term PLS models.
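The numbers of trial independent variables quoted above for the extended forms of eq 1 (74, 703, and 740) follow from counting one linear term and one quadratic term per component and one cross term per unordered pair of components. A minimal Python sketch of that count, with the hypothetical helper name `term_counts`:

```python
from math import comb


def term_counts(n_components):
    """Count the trial independent variables for each form of eq 1."""
    linear = n_components            # one C_ij term per component
    quadratic = n_components         # one C_ij squared term per component
    cross = comb(n_components, 2)    # one C_ij * C_kj term per pair with k > i
    return {
        "linear": linear,
        "linear + quadratic": linear + quadratic,
        "linear + cross": linear + cross,
        "linear + quadratic + cross": linear + quadratic + cross,
    }


counts = term_counts(37)
for form, n_terms in counts.items():
    print(f"{form}: {n_terms}")
```

In every case the 18 observations are far fewer than the number of candidate terms, which is why a variable-selection scheme such as the GFA is needed rather than a direct regression on all terms.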

Table 6. R2 Values from GFA/PLS Analyses as a Function of the Number of PLS Principal Components and the Type of Eye Irritation Score

                   number of PLS principal components
                 1       2       3       4       5
cornea          0.69    0.85    0.89    0.83    0.77
iris            0.27    0.46    0.63    0.60    0.53
conjunctiva     0.66    0.71    0.72    0.73    0.73
average         0.75    0.85    0.89    0.81    0.75

Discussion

The principal finding from this QCAM is that one component, co-1, is largely responsible for the eye irritation scores over the set of mixtures. Only very modest contributions to eye irritation scores come from other components in the mixtures. The second most significant finding is that component co-3 decreases eye irritation scores. That is, component co-3 is an “anti-eye irritant”. An investigation of the uses of component co-3 subsequent to this QCAM study has revealed that component co-3 is used commercially in contact lens solutions “to moderate eye sensitivity”. Thus, the QCAM was able to provide the unexpected added benefit of identifying anti-eye irritation components, as well as the eye irritation mixture components. The ability to fully implement eq 1 in this QCAM study using GFA analysis has established that neither nonlinear concentration effects, nor pairwise cross-component interactions, play any meaningful role in the manifestation of eye irritation in the test set of mixtures. It should be noted that if nonlinear concentration effects are significant in a QCAM model, then it may be possible to optimize (minimize) toxicity with respect to the component(s) exhibiting nonlinear concentration behavior.

GFA analysis is crucial to the reliable and complete statistical investigation of the behavior of highly overdetermined data sets such as mixtures and their properties. GFA provides one of the very few ways currently available for testing various representations of statistical models for describing data sets without fear of compromising reliability by overfitting. The other way in which concerns regarding overfitting of the data arise is due to the noise (as measured by standard deviations) in the dependent variable measures (eye irritation scores). Once again, GFA provides a tool for coping with this possible problem. A crossover versus variable usage optimization plot, of the type shown in Figure 1, can be constructed as part of a GFA analysis irrespective of the noise in a data set. GFA optimization can be performed regardless of the “quality” of the data set and/or functional fit between dependent and independent variables. Thus, a best assessment of the relative significance of the trial set of independent variables can always be performed regardless of the lack of “integrity” of the data set.

The large standard deviations in Table 1 are strong indicators that highly fit QCAM models should not be expected. Thus, the MLR QCAM models constructed by GFA optimization having R2 values in the range of 0.7-0.8 are likely the best that can be achieved for this data set. However, for cornea and average eye irritation, the SDs are highly correlated to the MV. Thus, nonirritating compounds can be scored with high accuracy (small SD), while increasingly irritating compounds are scored with decreasing accuracy (large SD).
Both the MLR and PLS “average” the fit over the range in scores and do not consider the distribution of SD values over the range of molar eye scores (MES) (9). If the QCAM models of Table 4 are considered in terms of eq 3, then the intrinsic average eye irritation score of component co-1 is 3.02. Generalizing on this idea of combining a QCAM model, and eq 3, suggests that QCAM can be used as a preprocessor tool to generate the individual biological end point measures for each of the significant bioactive components of a set of mixtures. These individual biological end point measures can then, in turn, be used as the dependent variables in subsequent QSAR studies.
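As a worked illustration of eq 3 with the one-term AEIS model of Table 4, consider a hypothetical mixture containing 8% co-1 by weight (the concentration is chosen here only for illustration). The contribution of co-1 to the average eye irritation score would then be

$$ba(\text{co-1}, j) = C_{\text{co-1},\,j}\, \langle ba(\text{co-1}) \rangle = 8 \times 3.02 = 24.16$$

and adding the intercept of the same one-term model gives a predicted score of

$$\text{AEIS} = 10.47 + 24.16 = 34.63$$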


This is a preliminary investigation of an existing set of data carried out to determine if the QCAM methodology could identify the irritating chemical component(s) of a set of formulations. The fact that the irritant was a major component of these formulations might suggest that the “answer” could have been found by direct inspection of the data set. However, the identification of the “anti-irritant” (a minor component of the formulations) indicates the methodology does work. Still, further validation of the QCAM approach using more extensive data sets is needed.

Acknowledgment. The UIC group is pleased to acknowledge the financial support of the Procter & Gamble Company and The Chem21 Group, Inc. Resources of the Laboratory of Molecular Modeling and Design were used to perform these studies.

References

(1) Kubinyi, H., Folkers, G., and Martin, Y. C., Eds. (1998) 3D-QSAR in Drug Design, Vol. 1-3, Kluwer/Escom, Dordrecht, The Netherlands.

(2) Chamberlain, M., and Barratt, M. D. (1995) Practical applications of QSAR to in vitro toxicology illustrated by consideration of eye irritation. Toxicol. in Vitro 9, 543-547.

(3) Wilschut, A., ten Berge, W. F., Robinson, P. J., and McKone, T. E. (1995) Estimating skin permeation. The validation of five mathematical skin permeation models. Chemosphere 30, 1275-1296.

(4) Klopman, G. (1998) The multi CASE program II. Baseline Activity Identification algorithm (BAIA). J. Chem. Inf. Comput. Sci. 38, 78-81.

(5) Kawakami, Y., and Hopfinger, A. J. (1990) Prediction of initial reduction potentials of compounds related to anthracyclines and implications for estimating cardiotoxicity. Chem. Res. Toxicol. 3, 244-247.

(6) Hasegawa, K., Miyashita, Y., and Funatsu, K. (1997) GA strategy for variable selection in QSAR studies: GA-Based PLS analysis of calcium channel antagonists. J. Chem. Inf. Comput. Sci. 37, 306-310.

(7) Rogers, D., and Hopfinger, A. J. (1994) Application of genetic function approximation to quantitative structure-activity relationships and quantitative structure-property relationships. J. Chem. Inf. Comput. Sci. 34, 854-866.

(8) Glen, W. G., Dunn, W. J., III, and Scott, D. R. (1989) Principal components analysis and partial least squares regression. Tetrahedron Comput. Methods 2, 349-376.

(9) Draize, J. H., Woodward, G., and Calvery, H. O. (1944) Methods for the study of irritation and toxicity of substances applied to the skin and mucous membranes. J. Pharmacol. 82, 377-389.

(10) Hopfinger, A. J., Wang, S., Tokarski, J. S., Jin, B., Albuquerque, M., Madhav, P. J., and Duraiswami, C. (1997) Construction of 3D-QSAR models using the 4D-QSAR analysis formalism. J. Am. Chem. Soc. 119, 10509-10524.
