Environ. Sci. Technol. 1997, 31, 337-345

A Procedure for Correlation of Chemical and Sensory Data in Drinking Water Samples by Principal Component Factor Analysis

AN-KUO MENG AND IRWIN H. SUFFET*

Department of Chemistry and Environmental Studies Institute, Drexel University, Philadelphia, Pennsylvania 19104

A statistical method utilizing principal component factor analysis (PCFA) was developed for chemical/sensory taste and odor correlation. With this method, specific flavor descriptors can be correlated with specific chromatographic peaks. A background odor mechanism was assumed to explain the odors perceived when chemicals are at or below their odor threshold concentrations. PCFA was applied to a series of simulated data sets and to chemical/sensory data obtained from drinking water samples. The simulated data sets were used to evaluate six types of chemical/sensory response equations. The response equation giving satisfactory correlation results was then used to evaluate the drinking water sample data sets. After the chemical and sensory data were merged, the covariance between items was calculated, and PCFA was applied to the covariance structure, followed by a target transformation of the PCFA factors. Quality assurance evaluations of both sensory and chemical data were an integral part of the correlation procedure. The correlation study using simulated data showed that the PCFA correlation method using linear-additive (e.g., log-additive) data yielded better results (i.e., fewer type I and type II errors) than nonlinear, nonadditive data. The quality of the drinking water sample correlation results was highly dependent on the sensory data quality.

Introduction

In the analysis of taste and odor problems in drinking water, causal relationships between the chemical constituents and the organoleptic characteristics reported by a sensory panel need to be defined. However, especially for drinking water samples from river sources, there may be 200-300 volatile organic compounds present at concentrations greater than 1 ng/L as measured by capillary gas chromatography (GC). Any of these chemicals, alone or in combination, can cause water sample odor. The odor-causing compounds are often unknown, and the concentrations of individual compounds are usually below their odor threshold concentrations (OTCs). Since the identification of all the GC peaks of a typical drinking water sample by GC/MS is usually not possible, and the consideration of all compounds is desirable, statistical correlation methods can be used to describe mathematically the relationships between volatile chemicals and tastes and odors.

This paper presents a statistical method, principal component factor analysis (PCFA), for chemical/sensory taste and odor correlation. Mallevialle and Suffet (1) introduced background information on chemical/sensory correlation methods and showed general results from correlation studies. Suffet et al. (2) used a similar statistical method, factorial correspondence analysis, to correlate chemical and sensory data. In the present discussion, a background odor mechanism is assumed to explain the odors perceived from drinking waters in which most of the individual constituents are at or below their odor threshold concentrations. The quality assurance evaluation of both sensory and chemical data is emphasized as an important part of the evaluation procedure.

* Corresponding author present address: Environmental Science and Engineering Program, UCLA, School of Public Health, 10833 Le Conte Avenue, Los Angeles, CA 90095; telephone: (310)206-8230; fax: (310)206-3358; e-mail: [email protected].

S0013-936X(95)00776-0 CCC: $14.00 © 1997 American Chemical Society

Gross-Sensory Response Function for the Odors of Chemical Mixtures. When an odor is produced by a chemical compound dissolved in water, the relationship between the odor intensity and the chemical concentration is often described by the Weber-Fechner law (3):

$$OI_i = a_{ij} \log C_j \qquad (1)$$

where $a_{ij}$ is the response constant between the intensity of odor $i$ ($OI_i$) and the logarithmic concentration of chemical $j$ ($\log C_j$) dissolved in the water. Mathematically, eq 1 represents a semilogarithmic “specific-sensory response” function between one particular chemical and one odor. When an odor is produced by a mixture of chemicals, its intensity may be the integration of all the individual “specific responses” into a “gross-sensory response”.

Diverse models have been developed for the gross-sensory response function. These models fall into two categories: (1) additive (4-8) and (2) nonadditive (9). The additive models describe the gross-sensory response of chemical mixtures as the simple sum of the individual chemical intensity attributes (i.e., the odors of each chemical). In contrast, the nonadditive models propose that the gross-sensory response is not a simple sum of the chemical attributes. While both categories of gross-sensory response models have been developed for chemicals at levels above their odor threshold concentrations, there is currently no model that describes the gross-sensory response of chemicals below their OTCs (see Figure 1).

Sub-threshold odor characteristics (SubOCs) are the predominant odor characteristics of chemical compounds below their odor threshold concentrations. The above-threshold, or supra-threshold, odor characteristics (SupOCs) of a chemical compound may differ from its SubOCs. The SupOCs of some chemical compounds vary with concentration. For example, at low concentrations 2-methylisoborneol (MIB) causes a musty odor, while much higher concentrations are associated with a camphor odor.

Figure 1 shows the possible relationships between odor intensities and chemical attributes, using the idea of a gross-sensory response for chemicals above and below their OTCs. Cases I-IV show four chemicals (A-D) possessing a similar odor characteristic (e.g., musty).
For chemicals A-C, the musty odor is the SubOC, while for chemical D it is the SupOC. Case I shows that when the integration of the SubOC attributes is below the sensory perception threshold, no odor is discernible; case II shows that odor is perceived when the integration of the SubOC attributes exceeds the sensory perception threshold; case III shows that odor is perceived from a SupOC attribute; and case IV presents the odor intensity as the integration of both SubOC and SupOC attributes from the water sample. The reported total odor attributes are described by the gross-sensory response function.

FIGURE 1. Odor intensity and background odor mechanism. A, B, and C, sub-threshold odorants; D, supra-threshold odorant. Case I, no odor perceived; case II, odor produced by sub-threshold odorants; case III, odor produced by supra-threshold odorant; case IV, odor produced by mixture of sub- and supra-threshold odorants.

VOL. 31, NO. 2, 1997 / ENVIRONMENTAL SCIENCE & TECHNOLOGY 337

Background Odor Mechanism. Most previous studies on sensory/chemical relationship modeling were based on odor-causing chemical components well above their threshold (9-12). In this study, a working assumption proposed by Guadagni et al. (13) is utilized. According to Guadagni et al., the relationships between the chemical composition and organoleptic properties of water samples can be understood as an integration of sub-threshold odor characteristics that produces the overall gross-sensory response characteristics.

In this study, an attempt is made to resolve the relationships between the chemicals and the tastes and odors observed in drinking water samples. As shown by eq 1, the correlation is best described by the response constants ($a_{ij}$) between odor intensity and chemical concentration, regardless of the gross-sensory response function. Therefore, the $a_{ij}$ values must be calculated by statistical correlation methods while minimizing the effect caused by the modeling of the gross-sensory response function (i.e., to be as independent of the gross-sensory response function as possible).

Since chemical concentrations vary at trace level (i.e., µg/L to ng/L) in drinking water samples, the variation of an odor intensity may be the integration of many infinitesimal changes of chemical attributes. From the SubOC assumption, the odors perceived from a water sample may be the gross-sensory response of many SubOCs or SupOCs of the chemical attributes that exceed the olfactory threshold and are more intense than the other odors. This assumption attempts to explain the frequently perceived “background odors” of water samples, even though the chemical constituents are at concentrations below their OTCs.
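The four cases of Figure 1 can be illustrated with a short sketch. It is not the authors' model: it assumes that "integration" is a plain sum, that every attribute is expressed on the same scale as the sensory perception threshold, and that an odorant counts as supra-threshold when its own attribute exceeds that threshold. The function name `classify_case` and all numeric values are hypothetical.

```python
from typing import Sequence


def classify_case(attributes: Sequence[float], threshold: float = 1.0) -> str:
    """Classify one odor descriptor into case I-IV of Figure 1.

    attributes: hypothetical intensity attributes, one per odorant, on the
                same scale as the sensory perception threshold.
    Assumptions: an odorant is supra-threshold when its own attribute
    exceeds the threshold, and integration is a plain sum.
    """
    sub = [a for a in attributes if a <= threshold]
    supra = [a for a in attributes if a > threshold]
    if supra and sub:
        return "IV"   # mixture of sub- and supra-threshold odorants
    if supra:
        return "III"  # odor from a supra-threshold odorant alone
    if sum(sub) > threshold:
        return "II"   # sub-threshold attributes integrate above the threshold
    return "I"        # integrated attributes stay below the threshold


# Hypothetical attribute values for odorants A-C (sub) and D (supra).
print(classify_case([0.2, 0.3, 0.1]))        # case I
print(classify_case([0.4, 0.5, 0.3]))        # case II
print(classify_case([1.5]))                  # case III
print(classify_case([0.4, 0.5, 0.3, 1.5]))   # case IV
```

Case II is the situation the background odor mechanism is meant to explain: no single odorant is above the threshold, yet the integrated attribute is.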

Experimental Section

Figure 2 outlines the procedures used for the taste and odor correlation study. Flavor profile analysis (FPA) (14), as adapted to the drinking water field (15, 16), was utilized to examine the organoleptic properties, and closed-loop stripping analysis (CLSA) (17) was used for chemical analysis. The analytical data were processed separately using techniques specifically developed in this study. The normalization of chemical data and the compilation of a chemical data base were discussed by Meng and Suffet (18). The assessment of sensory data quality and the compilation of a sensory data base were discussed by Meng and Suffet (19). This paper describes the screening of the analytical data for correlation analysis and the correlation process of the sensory and chemical data bases.

FIGURE 2. Procedure for taste and odor correlation study.

Simulated Data Sets. The PCFA correlation method is examined by applying it to a series of computer-generated data sets. Since the gross-sensory response function (G) is poorly understood, the applicability of the PCFA method to a variety of simulation equations was tested. The chemical/sensory equations utilized in simulation can be classified into two categories: additive and nonadditive. The additive equations include (1) linear additive, (2) logarithmic additive, (3) adjusted linear additive, and (4) adjusted logarithmic additive; the nonadditive equations include (1) logarithmic additive with intercept and (2) a set of nonlinear log sum equations. The mathematical definitions of these equations are listed in Table 1, where the odor intensity of a certain descriptor ($OI_i$) is related to the concentration of a certain chemical compound ($C_j$) with a statistically defined response constant ($a_{ij}$).

To test the effects of experimental error on the correlation results, simulated error is added separately to the simulated chemical and sensory data. A tolerance level is set for each of the data sets, e.g., a maximum of 40% error is added to the chemical data, and a maximum of 80% error is added to the sensory data. Since flavors are not recognized by the sensory panel in some situations, a flavor recognition factor is also utilized to test the effect of missing flavor descriptors. In the simulation, flavor descriptors are set to be recognized at a maximum of 50% of the time. Error simulation conditions are also shown in Table 1.

Drinking Water Samples.
The relationships between the chromatographic peaks of chemical profiles and the taste and odor descriptors of sensory profiles of drinking water samples were examined. Sixty-five drinking water samples were collected at Philadelphia Water Department’s Baxter Treatment Plant (PWD), and 51 samples were collected at Philadelphia Suburban Water Company’s Neshaminy Plant (PSWC) (20).

TABLE 1. Correlation Analysis of Simulated Data

no.   type^a   no. of samples   no. of peaks   no. of odors
1     a        35               7              3
2     b        35               7              3
3     b        35               7              3
4     b        35               7              3
5     b        35               7              3
6     c        35               7              3
7     d        35               7              3
8     d        35               7              3
9     d        35               7              3
10    d        35               7              3
11    e        35               15             7
12    e        35               15             7
13    e        35               15             7
14    e        35               15             7
15    f        35               15             7
16    f        35               15             7
17    f        35               15             7

correlation error (%), type I (entries in source order): 14, 5, 14, 12, 13, 12, 17, 17, 17, 19
correlation error (%), type II (entries in source order): 0, 1, 3, 20, 20, 21, 4
tolerance condition (entries in source order): E, E, E, E, E, R, E, (E, R), R

^a Type of chemical/sensory response equation used in simulation: a, linear additive, $OI_i = \sum (a_{ij} C_j)$; b, log additive, $OI_i = \sum (a_{ij} \log C_j)$; c, adjusted linear additive, $dOI_i = \sum (a_{ij}\, dC_j)$; d, adjusted log additive, $dOI_i = \sum (a_{ij}\, d(\log C_j))$; e, log additive w/intercept, $OI_i = \sum (\log k_{ij} + a_{ij} \log C_j)$; and f, nonlinear log sum, $OI_i = \log (\sum k_{ij} C_j^{a_{ij}})$. $OI_i$, intensity of odor $i$; $C_j$, concentration of chemical $j$; $a_{ij}$, response constant of odor $i$ to chemical $j$. Simulated tolerance conditions: E, with simulated chemical error ≤40%, simulated sensory error ≤80%; R, with a simulated flavor recognition factor ≤50%.

Procedure for Taste and Odor Correlation. PCFA (21, 22) was used to correlate the data bases for the two analytical measurements. The expected result of the correlation analysis is a series of chemical response constants (CRCs). Assuming that the Weber-Fechner law of sensory/chemical relationship (2) is valid for mixtures, the CRCs correspond to the $a_{ij}$ values presented in eq 1. However, due to modeling and experimental errors, there may be some differences between the CRCs and the $a_{ij}$ values. Hence, a chemical “correlating” with an odor does not necessarily “cause” the odor. The goal of correlation analysis is to use the most reliable analytical and statistical techniques to derive a set of CRCs that best estimates the $a_{ij}$ values.

The experimental procedures of the correlation study include the following steps: (1) preparation of analytical data, (2) normalization of analytical data, (3) screening of analytical data, (4) merging of chemical and sensory data, (5) calculation of covariance between data items, (6) PCFA of the covariance structure, and (7) target transformation of PCFA factors. The sensory and chemical data bases are prepared, screened, and then analyzed by the PCFA method. Preprocessing of the raw data items includes transformation and normalization of each data point before statistical analysis can be performed. A computer program, TAOCA, was developed for PCFA multivariate statistical analysis. This program has the capability of handling a large data matrix on an IBM-compatible personal computer and is available from the authors.

Preparation of Analytical Data. Sensory and chemical measurements were made separately. These two analyses produce different types of systematic and experimental errors. The error associated with a specific measurement must be reduced before correlation can be made.
In order to monitor a number of chemical compounds and flavors in the water samples, data bases were established for each of the measurements. The sensory data base was constructed using the method described by Meng and Suffet (19). The chemical data base was established utilizing the total chromatogram method described by Meng and Suffet (18). Separate quality control procedures for these two data bases were discussed in these articles.

Normalization of Analytical Data. The Weber-Fechner law (eq 1) assumes that flavor intensity approaches zero as chemical concentration approaches unity. Under certain experimental conditions, therefore, adjustments become necessary. For example:
(a) Since there is no definition for zero flavor intensity, and zero flavor intensity cannot be measured, the term “threshold” is used to represent conditions in which flavors are just perceived by a sensory panel.
(b) The lowest chemical concentration at which a flavor can be perceived or recognized (i.e., the odor threshold concentration) varies from chemical to chemical.
(c) The flavors and OTCs of most chemical components in the samples are unknown.
When the sample size, n, is large enough, the population center of analytical measurements can be estimated by the sample mean. Hence, for f flavors and p chemicals, the following terms are defined:

$$OI_{i\cdot} = \frac{\sum_{k=1}^{N} OI_{ik}}{n_i} \quad \text{for } i = 1 \text{ to } f \text{ flavors} \qquad (2)$$

$$C_{j\cdot} = \frac{\sum_{k=1}^{N} C_{jk}}{n_j} \quad \text{for } j = 1 \text{ to } p \text{ chemicals} \qquad (3)$$

where $OI_{i\cdot}$ is the mean intensity of flavor $i$; $OI_{ik}$ is the intensity of flavor $i$ in sample $k$; $n_i$ is the number of occurrences of flavor $i$; $C_{j\cdot}$ is the mean concentration of chemical $j$; $C_{jk}$ is the concentration of chemical $j$ in sample $k$; $n_j$ is the number of occurrences of chemical $j$; and $N$ is the total number of samples. Then, eq 1 can be modified to express the relationship between flavor $i$ and chemical $j$ in sample $k$:

$$OI_{ik} - OI_{i\cdot} = a_{ij}\,(\log C_{jk} - \log C_{j\cdot}) = a_{ij} \log\left(\frac{C_{jk}}{C_{j\cdot}}\right) \qquad (4)$$

By replacing $(OI_{ik} - OI_{i\cdot})$ with $OI'_{ik}$ and $(\log C_{jk} - \log C_{j\cdot})$ with $\log C'_{jk}$, eq 4 becomes

$$OI'_{ik} = a_{ij} \log C'_{jk} \qquad (5)$$
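The centering in eqs 2-5 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' TAOCA code: samples in which a flavor or chemical does not occur are represented by `None` and left uncentered, the logarithm is taken as base 10, and the helper names (`occurrence_mean`, `center_flavor`, `center_log_conc`) and all data values are hypothetical.

```python
import math
from typing import List, Optional


def occurrence_mean(values: List[Optional[float]]) -> float:
    """Mean over the samples in which the item occurs (eqs 2 and 3)."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("item does not occur in any sample")
    return sum(present) / len(present)


def center_flavor(intensities: List[Optional[float]]) -> List[Optional[float]]:
    """OI'_ik = OI_ik - OI_i. for one flavor i across samples k."""
    mean = occurrence_mean(intensities)
    return [None if v is None else v - mean for v in intensities]


def center_log_conc(concs: List[Optional[float]]) -> List[Optional[float]]:
    """log C'_jk = log(C_jk / C_j.) for one chemical j (base 10 assumed)."""
    mean = occurrence_mean(concs)
    return [None if c is None else math.log10(c / mean) for c in concs]


# Hypothetical data for four samples; None marks "not reported/detected".
print(center_flavor([2.0, None, 4.0, 6.0]))
print(center_log_conc([5.0, 50.0, None, 95.0]))
```

After this step a sample at the mean intensity or mean concentration maps to zero, which is the role the threshold plays in the unmodified eq 1.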

Equation 5 represents a modified Weber-Fechner relationship and is utilized in this study.

Screening of Analytical Data. When chemical and sensory data bases are properly prepared, specific elements (i.e., the chemicals and odors) in the two data bases are selected for correlation study. The selection criteria are experimentally determined based upon data quality. The reliability of a data item is also considered, especially in sensory data selection. The selection of chemical data is based on (1) occurrence (sample size), (2) variability (sample variance; coefficient of variation relative to internal standards), and (3) detectability (occurrence above a concentration threshold). The selection of sensory data is based on (1) occurrence (sample size), (2) frequency of descriptor usage (familiarity), (3) qualitative reproducibility, (4) selectivity (note/descriptor ratio), (5) variability (sample variance), and (6) reproducibility (replicate variance).

Despite the selection criteria stated above, the overall limiting factor is the sample size of the data set. The maximum number of resolvable factors is the smaller of the sample size and the total number of variables. When f odors from n samples are correlated with chemicals, the maximum number of chemicals p that can be analyzed by PCFA is

$$p = n - f - 1 \qquad (6)$$

where experimental error accounts for 1 degree of freedom. Thus, the chemical data must be screened to obtain a maximum number of chemicals to correlate with the screened sensory data.

Merging of Chemical and Sensory Data. After the two data sets are prepared, screened, and normalized, they are put into separate data matrices. The screened FPA sensory data, which include the odor qualities and their corresponding intensities, are stored in a matrix $\mathrm{FPA}_{n\times f}$, where $n$ is the number of samples and $f$ is the number of odors selected. The screened CLSA data, which comprise the chemical information (the chemical compounds, i.e., the peaks at certain retention indices, and their corresponding concentrations, i.e., the normalized peak areas), are stored in a matrix $\mathrm{CLSA}_{n\times p}$, where $p$ is the number of chemicals selected. The two matrices, $\mathrm{CLSA}_{n\times p}$ and $\mathrm{FPA}_{n\times f}$, are then merged into a matrix $D$ with dimensions $n \times (f + p)$:

$$
D =
\left[
\underbrace{\begin{array}{ccccc}
C_{11} & C_{12} & C_{13} & \cdots & C_{1p} \\
C_{21} & C_{22} & C_{23} & \cdots & C_{2p} \\
\vdots & \vdots & C_{ki} & & \vdots \\
C_{n1} & C_{n2} & C_{n3} & \cdots & C_{np}
\end{array}}_{\text{chemicals}}
\;\;
\underbrace{\begin{array}{ccccc}
O_{11} & O_{12} & O_{13} & \cdots & O_{1f} \\
O_{21} & O_{22} & O_{23} & \cdots & O_{2f} \\
\vdots & \vdots & O_{kj} & & \vdots \\
O_{n1} & O_{n2} & O_{n3} & \cdots & O_{nf}
\end{array}}_{\text{flavors}}
\right]
\qquad (7)
$$

where $C_{ki}$ is the logarithmic concentration of compound $i$ in sample $k$ and $O_{kj}$ is the intensity of flavor $j$ in sample $k$.

Calculation of Covariance between Data Items. From the merged data matrix $D$, the covariance between the chemicals and the flavors is calculated and stored in a covariance matrix Cov. By definition, a covariance matrix can be calculated by multiplying the transposed data matrix by the data matrix itself:

$$\mathrm{Cov}_{(p+f)\times(p+f)} = D^{T}_{(p+f)\times n} \times D_{n\times(p+f)} \qquad (8)$$

where $D^{T}_{(p+f)\times n}$ is the transposed data matrix and $\mathrm{Cov}_{(p+f)\times(p+f)}$ is the covariance matrix, whose elements are the covariances between any two variables (rows and columns ordered as chemicals, then flavors):

$$
\mathrm{Cov} =
\left[
\begin{array}{ccc|ccc}
\mathrm{Cov}(C_1,C_1) & \cdots & \mathrm{Cov}(C_1,C_p) & \mathrm{Cov}(C_1,O_1) & \cdots & \mathrm{Cov}(C_1,O_f) \\
\mathrm{Cov}(C_2,C_1) & \cdots & \mathrm{Cov}(C_2,C_p) & \mathrm{Cov}(C_2,O_1) & \cdots & \mathrm{Cov}(C_2,O_f) \\
\vdots & & \vdots & \vdots & & \vdots \\
\mathrm{Cov}(C_p,C_1) & \cdots & \mathrm{Cov}(C_p,C_p) & \mathrm{Cov}(C_p,O_1) & \cdots & \mathrm{Cov}(C_p,O_f) \\ \hline
\mathrm{Cov}(O_1,C_1) & \cdots & \mathrm{Cov}(O_1,C_p) & \mathrm{Cov}(O_1,O_1) & \cdots & \mathrm{Cov}(O_1,O_f) \\
\vdots & & \vdots & \vdots & & \vdots \\
\mathrm{Cov}(O_f,C_1) & \cdots & \mathrm{Cov}(O_f,C_p) & \mathrm{Cov}(O_f,O_1) & \cdots & \mathrm{Cov}(O_f,O_f)
\end{array}
\right]
\qquad (9)
$$

It should be noted that the diagonal elements of the covariance matrix (i.e., Cov(i,i)) represent the variances of each variable (i.e., Var(i)).

PCFA of the Covariance Structure. In matrix terms, PCFA extracts the information from a data set of large dimensions and condenses it into a set of factors with smaller dimensions. This extraction is expressed as

$$\text{data} = \text{score} \times \text{factor} + \text{error} \qquad (10)$$

where factor is the loadings of the variables onto the factors, score is the attributes of the factors to the samples, and error is the residual random error of the raw data set. In general, PCFA is capable of extracting the underlying information from a much more complicated environment while reducing the random error associated with the experimental data. The PCFA extracted factors are sorted by their relative importance (i.e., their magnitude of variability). Each of the principal factors is

$$P_j = v_j^{1/2} \times F_j \qquad (11)$$

where $P_j$ is the $j$th principal factor calculated from the corresponding eigenvalue $v_j$ and eigenvector $F_j$.

Target Transformation of the PCFA Factors. When factor analysis is applied to correlate sensory data with chemical data, the extracted factors represent the underlying variables that explain all the covariability among chemical and sensory variables. Each of the factors isolates the sources that cause the covariation of measurements and represents the composition of the chemical and sensory data (which was determined by the underlying gross-sensory response function in eq 1). Therefore, the variation of chemical composition will cause subsequent variations of the factors and thus affect the composition of the sensory data. However, PCFA results are difficult to interpret. For correlation analysis, a target transformation technique is utilized to transform the factors into a predefined structure. By suitable transformation, the relationship between chemicals and sensory responses can be resolved if each factor represents the sensory response composition of a specific chemical. For instance, after finding the principal factors, $P$, of the data space (see eq 11), a transformation matrix, $T_p$, can be defined to perform the target transformation:

$$T_p = P_{p\times p}^{-1} \qquad (12)$$

where $p$ is the number of chemicals and $P_{p\times p}^{-1}$ is the inverse matrix of the chemical loadings of the first $p$ principal factors. Then, applying the transformation matrix to the principal factors results in a target-transformed matrix, TT:

$$P_{p\times(p+f)}\, T_p = P_{p\times(p+f)}\, P_{p\times p}^{-1} = TT \qquad (13)$$

where $f$ is the number of flavors and the contents of the matrix TT are

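$$
TT =
\begin{array}{cc|cccccc}
 & & f_1 & f_2 & f_3 & f_4 & \cdots & f_p \\ \hline
\text{chemicals} & a & 1 & 0 & 0 & 0 & \cdots & 0 \\
 & b & 0 & 1 & 0 & 0 & \cdots & 0 \\
 & c & 0 & 0 & 1 & 0 & \cdots & 0 \\
 & d & 0 & 0 & 0 & 1 & \cdots & 0 \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
 & p & 0 & 0 & 0 & 0 & \cdots & 1 \\ \hline
\text{flavors} & 1 & a_{11} & \cdots & \cdots & \cdots & \cdots & a_{1p} \\
 & 2 & \vdots & & & & & \vdots \\
 & \vdots & \vdots & & a_{ij} & & & \vdots \\
 & f & a_{f1} & \cdots & \cdots & \cdots & \cdots & a_{fp}
\end{array}
\qquad (14)
$$

TABLE 2. Data Quality in Chemical/Sensory Correlation Study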

Upon transformation, subsequent changes are made in the sensory-factor loadings so that each transformed factor represents a specific chemical and the corresponding sensory loading represents the response constants ($a_{ij}$).

Analytical error (i.e., type I and type II errors) is utilized to examine the applicability of the PCFA correlation method. By statistical definition, a type I error occurs when a significant $a_{ij}$ is represented by a nonsignificant chemical response constant (CRC). A type II error occurs when a nonsignificant $a_{ij}$ is resolved with a significant CRC. (In this study, if the $a_{ij}$ and the CRC are of different sign, they are considered unresolved, i.e., a type I error.)
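The covariance, PCFA, and target transformation steps (eqs 8 and 11-14), together with the type I/type II check, can be sketched with NumPy on error-free log-additive data. This is a minimal sketch, not the TAOCA program. Its assumptions: loadings are stored one column per factor so that the matrix products conform and the result has the layout of eq 14, the covariance is the unscaled cross-product of eq 8, and the significance `cutoff`, the function names, and the response constants in `A` are all hypothetical.

```python
import numpy as np


def pcfa_target_transform(D: np.ndarray, p: int) -> np.ndarray:
    """Target-transformed loadings, shape (p + f, p), laid out as in eq 14.

    D is the centered merged matrix (n samples x (p + f) variables) with the
    p chemical columns first. Loadings are stored one column per factor,
    an orientation assumed here so that the products conform.
    """
    cov = D.T @ D                                    # eq 8, cross-product form
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    keep = np.argsort(eigvals)[::-1][:p]             # indices of the p largest
    factors = eigvecs[:, keep] * np.sqrt(eigvals[keep])  # eq 11
    chem_loadings = factors[:p, :]                   # p x p chemical block
    return factors @ np.linalg.inv(chem_loadings)    # eqs 12 and 13


def error_counts(a_true: np.ndarray, crc: np.ndarray, cutoff: float) -> tuple:
    """Count type I and type II errors; the significance cutoff is hypothetical."""
    sig_true = np.abs(a_true) >= cutoff
    sig_crc = np.abs(crc) >= cutoff
    sign_flip = sig_true & sig_crc & (np.sign(a_true) != np.sign(crc))
    type_1 = sig_true & (~sig_crc | sign_flip)       # significant a_ij not resolved
    type_2 = ~sig_true & sig_crc                     # spurious significant CRC
    return int(type_1.sum()), int(type_2.sum())


# Error-free log-additive data (equation type b): 35 samples, 7 peaks, 3 odors.
rng = np.random.default_rng(0)
n, p, f = 35, 7, 3
log_c = rng.normal(size=(n, p))
log_c -= log_c.mean(axis=0)                          # center the log concentrations
A = np.zeros((f, p))                                 # hypothetical response constants
A[0, 0], A[0, 3], A[1, 1], A[2, 5] = 1.0, 0.5, -0.8, 1.2
odor = log_c @ A.T                                   # OI'_ik = sum_j a_ij log C'_jk
D = np.hstack([log_c, odor])

TT = pcfa_target_transform(D, p)
crc = TT[p:, :]                                      # flavor block of eq 14
print(np.allclose(TT[:p, :], np.eye(p)))             # is the chemical block the identity?
print(np.allclose(crc, A))                           # do the CRCs reproduce a_ij?
print(error_counts(A, crc, cutoff=0.1))
```

With noise-free additive data the merged matrix has rank p, so the first p factors span it exactly and the flavor block of TT equals the response constants used to generate the data. Adding simulated error to `log_c` or `odor` before merging is one way to explore how the error counts respond.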

Results and Discussion

Correlation Results of the Simulated Data. Table 1 summarizes the correlation analysis results of the simulated data sets together with the parameters and chemical/sensory response equations used to generate the simulated data matrices.

Additive Model. The simulated data sets using additive equations (cases 1-10) gave satisfactory correlation results. (1) The CRCs of the data sets without simulated error are identical to the $a_{ij}$ values both in sign and magnitude. (2) When simulated error is included, primary $a_{ij}$ values (i.e., the ones with significant values) are reproduced with no type II error. (3) Unresolved $a_{ij}$ values (i.e., the portion specified by type I error) are all of relatively small magnitude (