Anal. Chem. 1988, 60, 569-573

Selection of Calibration Samples for Near-Infrared Spectrometry by Factor Analysis of Spectra

Gerd Puchwein

Landwirtschaftlich-chemische Bundesanstalt, Institut Linz, Wieningerstrasse 8, A-4025 Linz, Austria

Near-infrared (near-IR) spectra of samples are measured at 19 fixed wavelengths. After a factor analysis of absorbances, the factor scores of samples can be used to delimit a region of factor space. The extension of this region and the distance of data points from each other are used as criteria to iteratively remove samples as redundant. The samples left over are representative of the complete original set and are used for calibration by linear regression modeling. The potential of the selection algorithm to obtain calibrations for protein, moisture, and oil was tested with corn and rapeseed samples. The effort for laboratory analyses can be substantially reduced with no appreciable loss of prediction accuracy. The same principles also allow one to decide if a new sample will be covered by the calibration.

As near-infrared instrumentation has to be calibrated indirectly with a training set of samples, the proper choice of samples is decisive for a good calibration. If the variation in structure and composition of future unknown samples is not well covered by the training set, unexpectedly large errors may occur. This can be avoided by selecting samples from a large pool of known composition of all the major constituents, but this usually requires a lot of cumbersome analyses by standard laboratory methods. Selection procedures based on spectral data alone, which can be collected with little labor and cost, have been presented (1-3). The proposed selection algorithm also relies on spectral data but was developed to meet the following requirements: to allow a reduction of an initially large set of samples to a given level, retaining only the most important ones; to signal when the number of samples becomes too low for a good calibration; and to enable the user to examine any future unknown sample by the same standards as applied in sample reduction. The basic concept of the algorithm is to iteratively eliminate samples from a large set. The samples finally left over form the selected training set for calibration. Its practical performance is tested with different data sets.

If the absorbance, A (defined in this context as A = log(1/R), where R is reflectance), of n samples is measured at q fixed wavelengths in the near-infrared region, a raw data matrix of n rows and q columns is obtained. It is well-known that the columns of such a data matrix are usually nearly collinear (4, 5). For further analysis a principal component extraction is performed first, a standard technique in chemical data analysis in general, which has become increasingly popular in near-infrared spectroscopy work (4-8). After extraction of p principal components (p ≤ q) the original raw data matrix can be transformed into a new one of n rows and p columns. In contrast to the raw data, the columns of the transformed data matrix are orthogonal to each other. Because of particle size effects the variance of the first component usually accounts for more than 90% of the total variance but often contains little chemical information. To give due statistical weight to the chemically relevant components, the values of each column are normalized to unit variance. These normalized, orthogonal variables are called factors. Each sample can be viewed as a data point in p-dimensional factor space and is characterized by its factor scores fac(i,k) (i = 1 to n, k = 1 to p).

Samples close to each other in factor space are considered similar. The squared Euclidean distance between any two data points i and j, D(i,j), is chosen as a measure of similarity

D(i,j) = Σ_{k=1}^{p} [fac(i,k) − fac(j,k)]²    (1)

Due to the properties of factor space, the squared Euclidean distance of the ith data point from the origin is identical with its Mahalanobis distance, MAHAL (9):

D(i,origin) = MAHAL(i)
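The computation just described is compact enough to sketch directly. The following is a minimal illustration, not the author's original software: principal component scores of the absorbance matrix are normalized to unit variance, after which the squared Euclidean distance of a point from the origin coincides with its Mahalanobis distance. Array shapes and function names are illustrative assumptions.

```python
import numpy as np

def factor_scores(A, p):
    """A: (n, q) matrix of absorbances log(1/R); returns (n, p) unit-variance factor scores."""
    X = A - A.mean(axis=0)                      # center each wavelength column
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :p] * s[:p]                   # principal component scores
    return scores / scores.std(axis=0, ddof=1)  # normalize each factor to unit variance

def sq_dist(fac, i, j):
    """Squared Euclidean distance D(i,j) of eq 1."""
    d = fac[i] - fac[j]
    return float(d @ d)

def mahal(fac):
    """MAHAL(i): squared distance of every data point from the origin."""
    return (fac ** 2).sum(axis=1)
```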

In the following, the symbol D is used for the distance of a data point from another one, while the term MAHAL is reserved for the distance of a data point from the origin. Samples with large MAHAL thus form the limit of the region of factor space populated by data points. As these samples can be expected to exert a strong influence in any regression equation of constituent concentration against factor scores, they have to be retained preferentially during sample reduction. Samples close to the origin, and therefore with low MAHAL, on the other hand carry only little weight in a regression equation. Hence MAHAL of a data point constitutes the first criterion in the algorithm. Before reduction starts, MAHAL of all data points is calculated and the samples are sorted by MAHAL in decreasing order.

The distance of a data point from neighboring ones serves as the second criterion. To begin with, a limiting distance, D(1), is fixed. The data point with the largest MAHAL (i.e., the first of the set) is chosen as the reference point (ref). If the distance, D(i,ref), of any data point i from the reference point satisfies the condition

D(i,ref) < D(1)

the sample is considered redundant and removed. After all samples are checked against the first and all redundant ones are removed, the sample with the second largest MAHAL among those remaining is made the point of reference next, and all others (except the first) are examined for elimination again. This is continued until there are no data points left closer than D(1) to each other. The first cycle is thus completed. The factor space region and the elimination procedure can be visualized by a two-dimensional analogue (Figure 1): labeled points (1-8) represent samples that are made points of reference in this order. The radius of the circles corresponds to the limiting distance, and unlabeled points lying inside symbolize redundant samples that are removed in this cycle. Points 7 and 8 are the two closest neighbors among the samples left over (minimum distance = d).


Figure 1. Two-dimensional analogue of factor space.

The extreme factor scores (points 2, 3, 5, and 6), the maximum distance from the center (point 1), and circles of radius d around data points together delimit an area which represents the factor space region of this subset. The heavy line of Figure 1 forms the boundary of this region. All original data points lie within this area.

A second cycle using a new limiting distance (D(2) = 2D(1)) is run with the samples surviving the first cycle. This is repeated in the same manner, the distance of the mth cycle being D(m) = mD(1). Thus the elimination proceeds stepwise and can be interrupted after each cycle. The samples finally left over are used for calibration.

The degree of reduction achieved in each cycle depends, of course, on the choice of D(1). If D(1) is made very small, the computations become cumbersome, as many cycles are needed before a substantial reduction is reached. If D(1) is too large, on the other hand, the graduation of successive cycles may be too coarse. It was found empirically that a good starting value is

D(1) = 0.2(p − 2)

(It can be shown that p − 2 is the most frequent value of MAHAL when the fac(i,k) values are distributed normally and p > 2.)
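The elimination cycles can be implemented along the following lines, reusing factor_scores, sq_dist, and mahal from the sketch above; the loop structure and names are assumptions, not the author's code.

```python
import numpy as np

def select_samples(fac, n_cycles, d1=None):
    """fac: (n, p) unit-variance factor scores; returns indices of retained samples."""
    n, p = fac.shape
    if d1 is None:
        d1 = 0.2 * (p - 2)                  # empirical starting value for D(1)
    order = np.argsort(mahal(fac))[::-1]    # sort by MAHAL in decreasing order
    kept = list(order)
    for cycle in range(1, n_cycles + 1):
        limit = cycle * d1                  # D(m) = m * D(1)
        survivors, candidates = [], list(kept)
        while candidates:
            ref = candidates.pop(0)         # largest remaining MAHAL becomes the reference
            survivors.append(ref)
            # drop every point closer to the reference than the limiting distance
            candidates = [i for i in candidates if sq_dist(fac, i, ref) >= limit]
        kept = survivors
    return kept
```

Because each cycle starts from the survivors of the previous one, interrupting after any cycle yields a valid, larger subset; this nesting property is exploited later in the paper.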

When the concentrations of the desired constituents determined by a standard method are available, the regression coefficients can be computed by any of the established methods (multiple linear regression, principal component regression, or partial least-squares modeling) (10, 11). All results given in this paper were calculated by using linear regression equations with the factors as independent variables.

How far can the reduction of the samples be carried before calibrations derived from the reduced set deteriorate? This is discussed below in connection with real data, but two plots can serve as useful guidelines: a plot of the number of samples eliminated in each cycle against the limiting distance, and a plot of leverages of remaining samples against the number of samples eliminated. The leverage, h(i), of a sample is a measure of its influence in a regression equation (12). Its average value is p/n if all factors are independent variables of the regression equation and there is no constant term (13). Thus the sum of leverages is equal to p. (The h(i) values can be computed as h(i) = MAHAL(i)/(n − 1) (13).)

When one has settled upon a particular number of samples, a kind of cross check can be made: the raw data of the reduced number of samples are submitted to factor analysis, and the factor scores are recalculated for the whole data set using the transformation matrix derived from the subset. This corresponds to a rescaling operation. If the redefined subset region contains all or at least most of the original samples, it is assumed that the subset can still represent the original set. This constitutes the final test for fixing the size of a representative subset.
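The leverage bookkeeping behind the second guideline plot reduces to a few lines; a sketch under the same assumptions as above, where history is a hypothetical list of the retained-index lists after each cycle (e.g., collected by running select_samples cycle by cycle):

```python
def leverages(fac):
    """h(i) = MAHAL(i)/(n - 1); with unit-variance factors the h(i) sum to p."""
    return mahal(fac) / (len(fac) - 1)

def leverage_partition(fac, history):
    """Observed sum of leverages of the samples left over after each cycle."""
    h = leverages(fac)
    return [h[idx].sum() for idx in history]
```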

If the subset forms the calibration set, any predicted values of samples, either new or of the starting set, can be accepted with some confidence if they lie inside the subset region, while results of samples far outside will have to be viewed with caution (5).

When large sets of data are dealt with, an important point has to be kept in mind too: the danger of a mix-up of samples or of a grave error in sample preparation. Such an outlier sample is usually detrimental to calibration in any case, but its negative effect will be felt more strongly when there are fewer samples. On the other hand, such a sample is almost bound to show up in the reduced set; it may even grab a factor just by itself in factor analysis. Hence, before any selection algorithm is run, the data set should be inspected for outliers. First, if p < q, an approximate F test can be performed to check if all q absorbance values of each sample can be regenerated adequately by the p-factor model (14). Second, an outlier has to be suspected when the factor scores of a sample are way off all the other ones. As there is no clear-cut rule to decide between an outlier and a very important sample, there often will be a trade-off between robustness and accuracy of a calibration.
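One plausible form of such a spectral screen is sketched below; the exact F test of ref 14 is not reproduced here, and the simple residual ratio should be read as an illustration of the idea rather than the prescribed statistic.

```python
import numpy as np

def residual_ratios(A, p):
    """Ratio of each sample's spectral residual to the mean residual under the p-factor model."""
    X = A - A.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :p] * s[:p]) @ Vt[:p]      # regenerate the spectra from p factors
    rss = ((X - X_hat) ** 2).sum(axis=1)     # residual sum of squares per sample
    return rss / rss.mean()                  # samples with large ratios are suspect
```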

EXPERIMENTAL SECTION

Near-IR spectra were measured with a fixed-filter instrument (Technicon InfraAlyzer 400) at 19 wavelengths. All data were transferred on-line to an IBM-compatible personal computer and stored on diskettes. All computations were performed with this computer using specially written software.

A total of 73 corn samples of different breeds and years of harvest were ground with a Cyclotec mill to pass a 1-mm sieve. Protein content was determined by standard Kjeldahl methods. Moisture of another set of 46 corn samples was determined by oven drying. Rapeseed samples (104) were oven dried and milled as described previously (15). Oil content was determined by petroleum ether extraction.

RESULTS AND DISCUSSION

The potential of the selection procedure was evaluated with three different sets of samples for three constituents. The content of the constituent of interest of each sample, as determined by a standard reference method, was available. Thus the selection cycles could be followed in terms of actual content of constituent. For each subset a factor analysis was performed, and the number of factors required to regenerate the raw data matrix within measurement error was retained (16). Experience with different data had shown that seldom more than about 10 factors are necessary. As a rule they account for more than 99.99% of the total variance of absorbances.

First the constituent concentration was regressed against the scores of each factor individually. For the final regression model, factors were introduced into the regression equation by the absolute t values of their regression coefficients in decreasing order, and a standard error was estimated by cross validation, SECV (17)

SECV = [Σ_{i=1}^{m} (y_i − ŷ_(i))² / m]^{1/2}

where y_i is the laboratory value of the ith calibration sample, ŷ_(i) is the near-IR predicted value of the ith calibration sample using regression coefficients calculated without the ith observation, and m is the number of calibration samples. The inclusion of a factor was omitted when it failed to lower SECV. As the factors are orthogonal to each other, the regression coefficient of a factor is independent of all the other ones. After back-converting these regression coefficients into coefficients of absorbances, a near-IR-predicted concentration was calculated for all samples of the complete starting set.
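The factor selection by cross validation can be sketched as follows. The leave-one-out loop refits explicitly for clarity; the original software may well have exploited the orthogonality of the factors instead, and the use of |coefficient| as a stand-in for |t| rests on the factors having equal (unit) variance. y is assumed centered, matching the no-constant-term convention above.

```python
import numpy as np

def secv(F, y):
    """Leave-one-out standard error for the no-intercept regression of y on the columns of F."""
    m = len(y)
    resid = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i
        coef, *_ = np.linalg.lstsq(F[keep], y[keep], rcond=None)
        resid[i] = y[i] - F[i] @ coef
    return np.sqrt((resid ** 2).mean())

def select_factors(F, y):
    """Add factors in order of decreasing |coefficient|; keep one only if it lowers SECV."""
    coef, *_ = np.linalg.lstsq(F, y, rcond=None)
    order = np.argsort(-np.abs(coef))        # proxy for |t| with equal-variance factors
    chosen, best = [], np.inf
    for k in order:
        err = secv(F[:, chosen + [k]], y)
        if err < best:
            chosen, best = chosen + [k], err # keep the factor; otherwise omit it
    return chosen, best
```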


Table I. Protein Content of 73 Corn Samples

step    m    range, %      mean, %   SD, %   f   R       SEC, %   SECV, %   RS, %
0       73   7.25-11.21    9.37      0.93    8   0.989   0.14     0.15      0.14
4       40   7.25-11.21    9.36      0.96    6   0.984   0.18     0.20      0.16
9       18   7.37-11.21    9.41      1.13    7   0.989   0.21     0.26      0.19
10      15   7.37-11.21    9.36      1.14    6   0.989   0.22     0.24      0.22
12      10   7.68-11.21    9.59      1.16    4   0.993   0.19     0.21      0.40

Table II. Moisture of 46 Corn Samples

step    m    range, %      mean, %   SD, %   f   R       SEC, %   SECV, %   RS, %   OUT
0       46   3.63-19.39    11.14     4.41    6   0.997   0.39     0.42      0.36    0
6       21   3.63-19.10    10.08     4.42    6   0.998   0.33     0.39      0.38    2
7       18   3.63-19.10    10.28     4.76    5   0.998   0.40     0.43      0.42    7
9       13   3.63-19.10    9.62      4.60    4   0.999   0.29     0.39      0.54    10

The estimated error of calibration samples is given by the standard error of calibration (SEC)

SEC = [Σ_{i=1}^{m} (y_i − ŷ_i)² / (m − f)]^{1/2}

while the residuals of the samples not in the calibration can be used to calculate a standard error of prediction (SEP)

SEP = [Σ_{i=1}^{r} (y_i − ŷ_i)² / r]^{1/2}

where r is the number of samples not used for calibration (r = n − m), f is the number of factors in the regression equation, and ŷ_i is the near-IR predicted value of the ith sample. As SEP is based on a different number and combination of samples depending on the subset chosen for calibration, the square root of the mean of squares of all residuals of the complete set, RS, is used instead to compare different calibrations

RS = [Σ_{i=1}^{n} (y_i − ŷ_i)² / n]^{1/2}
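The three error measures translate directly into code; a sketch, with the calibration/remainder split and variable names assumed, and the SEC denominator following the no-constant-term reading of the text:

```python
import numpy as np

def sec(y_cal, yhat_cal, f):
    """Standard error of calibration: m samples, f factors, no constant term."""
    m = len(y_cal)
    return np.sqrt(((y_cal - yhat_cal) ** 2).sum() / (m - f))

def sep(y_rest, yhat_rest):
    """Standard error of prediction over the r samples not used for calibration."""
    return np.sqrt(((y_rest - yhat_rest) ** 2).mean())

def rs(y_all, yhat_all):
    """Root mean square of all residuals; the figure of merit used to compare calibrations."""
    return np.sqrt(((y_all - yhat_all) ** 2).mean())
```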

Naturally, RS is minimal when all samples are used for factor analysis and regression. A subset representative of all samples should give an RS not much inferior.

Protein Content of Corn. Figure 2 shows the stepwise elimination of samples from a starting set of 73 corn samples. In the first cycle only three samples were found redundant, indicating that the original set does not contain very similar samples. In cycles 2-5 an average of 10 samples could be eliminated, while only zero to three samples per cycle were removed after the seventh cycle. It seems reasonable to assume that about 15-20 samples are necessary to represent the whole set. Figure 3 shows that the sum of leverages is partitioned unequally between eliminated and remaining samples. The difference of curves B and A increases at first, is more or less constant down to about 20 samples left over, and finally drops off sharply, as very influential samples are removed at last. The number of samples remaining at the stage of the drop-off corresponds well to the range suggested by Figure 2. By use of transformation matrices of different subsets, all rescaled data points fall inside the subset region down to a subset size of 18. Below this limit a rapidly increasing number of data points lie outside. Thus 18 samples are considered to be the minimum required for good representation of the complete set.

The regression statistics of the original set (step = 0) and of four subsets (obtained after 4, 9, 10, and 12 reduction steps) are summarized in Table I (R is the correlation coefficient). Range and mean of protein content are conserved very well down to 15 samples.

Figure 2. Stepwise elimination of corn samples. The total number of samples was 73. Curve A shows the number of samples eliminated in each step, while curve B corresponds to the total number of samples left over.

Figure 3. Partition of the sum of leverages between samples eliminated and left over. The total number of corn samples was 73. Curves A and B correspond to the theoretical sum (proportional to the number of samples) and to the actually observed sum of leverages, respectively. Curve C is the difference of curves B and A.

Due to the preferential removal of samples close to the center, the standard deviation of the subset samples, SD, increases during reduction. The original 73 samples can be reduced to 18 with no substantial loss in accuracy of prediction, but RS soars when the calibration set becomes smaller still. This behavior is paralleled by the number of samples outside (OUT) the subset region after rescaling (Figure 4). Hence 18 samples are both sufficient and necessary for a calibration which is able to cover all 73 samples adequately.

Moisture of Corn. Forty-six samples comprising normal, very moist, and partly dried specimens form the starting set. Plots similar to Figures 2 and 3 suggest that about 15-25 representative samples would be required. The transition from a good to a weak representation of the complete set could be narrowed down to 18-21 samples by rescaling all data points and counting those outside the subset region. Table II shows that a low RS is concomitant with a good representation, as expressed by a low number of samples outside the subset region.


Table III. Oil Content of 104 Rapeseed Samples

step    m     range, %       mean, %   SD, %   f   R       SEC, %   SECV, %   RS, %   OUT
0       104   40.30-48.40    44.94     1.72    7   0.968   0.45     0.46      0.43    0
4       53    40.30-48.40    44.93     1.91    6   0.972   0.48     0.50      0.45    0
8       26    40.30-48.40    44.85     2.17    5   0.981   0.47     0.54      0.49    6
9       21    40.30-48.20    44.51     2.16    5   0.980   0.49     0.57      0.52    17
10      16    40.30-48.20    44.71     2.20    4   0.988   0.39     0.46      0.62    41
14      11    40.30-48.20    45.04     2.37    4   0.973   0.71     0.79      0.61    55

Table IV. Influence of Selection Method on Error Estimate of Oil Content of Rapeseed

m            algorithm, %   random A, %   random B, %
53    SEC    0.48           0.42          0.38
      SEP    0.46           0.57          0.55
      RS     0.45           0.49          0.46
26    SEC    0.47           0.56          0.45
      SEP    0.51           0.50          0.66
      RS     0.49           0.50          0.60
21    SEC    0.49           0.33          0.68
      SEP    0.54           0.76          0.49
      RS     0.52           0.69          0.52
16    SEC    0.39           0.25          0.43
      SEP    0.66           0.67          0.78
      RS     0.62           0.63          0.74
11    SEC    0.71           0.23          0.14
      SEP    0.62           0.60          0.90
      RS     0.61           0.57          0.85

Figure 4. Prediction accuracy and subset region of different calibration sets: curve A, square root of mean square of residuals, RS, of 73 corn samples; curve B, samples not in the calibration set, which lie outside after rescaling, OUT.

It is interesting that although there are far fewer samples than in the first example, almost an equal number of samples is required for a good calibration. This is probably due to the wide range of moisture of the samples.

Oil Content of Rapeseed. A plot analogous to Figure 2 indicated a region of 29-16 samples, the plot of leverages a region of 35-21. With a cross check and computation of six different regression models, it was confirmed that 26 out of 104 samples are representative of the whole set (Table III).

All three examples confirm that a calibration derived from a subset is capable of predicting the content of all samples of the initial population very well as long as the subset region in factor space extends to contain almost all data points. As the region shrinks and an increasing number of data points fall outside, the prediction accuracy declines. For comparison, two sets of samples were also randomly selected for each subset level. The corresponding RS values show that calibrations based on a random choice may perform almost equally well, but that it is just as likely to give much worse results (Table IV). Also, it is difficult to decide on a sufficient number of samples. While the SEC of samples selected by the algorithm usually agrees quite well with the SEP of the remaining samples, even below the minimum of good representation, SEC and SEP of random samples often differ considerably.

It was not possible to compare this selection procedure directly with other spectral selection methods (1-3). Application of the method of Honigs et al. (2) has shown, however, that very prominent samples are usually selected by both methods. In the course of development of this method, some experimentation with cluster analysis was carried out, to which this algorithm is somewhat related. Hence this algorithm and the cluster analytical approach of Naes (3) can be expected to turn out similar sets of samples.

It is the intention of this work not only to select samples but also to arrive at an optimal subset size and to classify any future unknown samples. It is not yet possible to formulate an unambiguous rule which allows the sample reduction to be stopped automatically when the minimum subset that still gives a good calibration is reached. But on the basis of the data presented and of experience gathered with other sample sets of varying sizes,

the following mode of operation can be recommended at present for sample sets of unknown reference laboratory values: The complete set is subjected to the selection algorithm, which turns out subsets of decreasing size. Plots analogous to Figures 2 and 3 are drawn. Experience has shown that it is quite safe to reduce samples down to the region where the number of samples removed per step begins to level off and/or the difference of leverages has a maximum. Especially when the original set is large, however, such an estimate of the minimum subset size is rather conservative, and further reduction may still be possible. To close in on the limit, a factor analysis is performed individually for successive subsets, each of which can be considered a potential candidate for the desired minimum calibration set. After the data points of all samples are rescaled to the factor space as defined by a given subset, the positions of the data points of samples not in the subset are checked. The covering of eliminated samples by subset ones can be measured by various quantities (e.g., F ratio, MAHAL). A simple count of samples whose data points lie outside the factor score range of subset samples was found to be useful. With decreasing subset size, this count increases only slowly at first, but from a certain size downward the number of samples outside rises sharply. A plot of sample count against subset size serves well to recognize this behavior. After calibration, any predicted values of samples outside will be obtained by extrapolation instead of interpolation. Hence it is not advisable to carry the reduction to a point where a substantial proportion of samples lies outside the subset range. Thus, while a minimum subset size cannot be fixed unequivocally, it can be narrowed to a few choices.

It is an attractive feature of this algorithm that each subset of samples is also a subset of the samples of all previous levels; thus it is most economical to analyze at first only a minimum number of samples. Depending on the desired accuracy, samples from preceding cycles of reduction can be added to improve the calibration or to test prediction accuracy. Thus the effort for laboratory analysis can be tailored to the analytical problem.
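The recommended cross check lends itself to a short routine: refit the factor model on a candidate subset, rescale every sample into that subset's factor space, and count the samples falling outside the subset's score range. A sketch under the same conventions as the earlier code; the truncation level p would be chosen per subset as described above.

```python
import numpy as np

def count_outside(A, subset, p):
    """Count samples whose rescaled scores exceed the factor score range of the subset."""
    mean = A[subset].mean(axis=0)
    U, s, Vt = np.linalg.svd(A[subset] - mean, full_matrices=False)
    # loadings scaled so that the subset's own scores have unit variance per factor
    load = Vt[:p].T * (np.sqrt(len(subset) - 1) / s[:p])
    scores = (A - mean) @ load                 # rescale the complete set
    lo = scores[subset].min(axis=0)
    hi = scores[subset].max(axis=0)
    outside = (scores < lo) | (scores > hi)    # subset samples lie inside by construction
    return int(outside.any(axis=1).sum())
```

Plotting count_outside against subset size reproduces the behavior described above: a slow initial increase followed by a sharp rise once the subset no longer spans the population.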


It has to be kept in mind, though, that the reliability of the determination of the constituent by a reference method becomes all the more important when there are fewer calibration samples.

These three practical examples show that the selection algorithm is capable of substantially reducing the effort for reference analyses for near-infrared calibration. Much larger data sets (300-1100 samples) were also subjected to it. By application of the above criteria, the number of samples which had to be analyzed by the reference method could be cut down to about 10-13%, sometimes even to less than 5%.

The same principles that form the basis of the selection algorithm are also suited to classify future samples. A sample very different from the calibration samples can usually be detected by the failure of the factor model to regenerate the absorbances of this particular sample. This can be checked by an F test, as mentioned above (14). Apart from such outliers, any sample can also be examined by the position of its data point in factor space with respect to the region of the calibration set. If it lies well inside the limits or if it is close to a calibration sample, it is probably safe to use the regression equation for prediction. The farther a sample is away from the calibration region, however, the less reliable are the predicted results. A detailed elaboration of this concept will be the subject of a forthcoming paper.

ACKNOWLEDGMENT A. Eibelhuber is thanked for stimulating discussions on methodology and assistance in software development.

Registry No. H2O, 7732-18-5.

LITERATURE CITED

(1) Hruschka, William R.; Norris, Karl H. Appl. Spectrosc. 1982, 36, 261-265.
(2) Honigs, D. E.; Hieftje, G. M.; Mark, H. L.; Hirschfeld, T. B. Anal. Chem. 1985, 57, 2299-2303.
(3) Naes, Tormod J. Chemom. 1987, 1, 121-134.
(4) Robert, P.; Bertrand, D. Sci. Aliments 1985, 5, 501-517.
(5) Mandel, John J. Res. Natl. Bur. Stand. (U.S.) 1985, 90, 465-476.
(6) Mark, Howard Anal. Chem. 1986, 58, 2814-2819.
(7) Cowe, Ian A.; McNicol, James W.; Cuthbertson, D. Clifford Analyst (London) 1985, 110, 1227-1232.
(8) Cowe, Ian A.; McNicol, James W.; Cuthbertson, D. Clifford Analyst (London) 1985, 110, 1233-1240.
(9) Flury, Bernhard; Riedwyl, Hans Angewandte Multivariate Statistik; Gustav Fischer Verlag: Stuttgart-New York, 1983; pp 124-127.
(10) Beebe, Kenneth R.; Kowalski, Bruce R. Anal. Chem. 1987, 59, 1007A-1017A.
(11) Martens, H. A. Thesis, Technical University of Norway, Trondheim, 1985.
(12) Belsley, David A.; Kuh, Edwin; Welsch, Roy E. Regression Diagnostics; Wiley: New York, 1980; Chapter 2.
(13) Velleman, Paul F.; Welsch, Roy E. Am. Stat. 1981, 35, 234-242.
(14) Massart, D. L.; Dijkstra, A.; Kaufman, L. Evaluation and Optimization of Laboratory Methods and Analytical Procedures; Elsevier Scientific: Amsterdam-Oxford-New York, 1978; Chapter 19.
(15) Puchwein, G.; Eibelhuber, A. Proceedings of the International NIR/NIT Conference, Budapest, Hungary, 1986; Akademiai Kiado: Budapest, 1987; pp 201-206.
(16) Malinowski, Edmund R. Anal. Chem. 1977, 49, 606-612.
(17) Puchwein, G.; Eibelhuber, A. Mikrochim. Acta 1986, II, 43-51.

RECEIVED for review July 22, 1987. Accepted November 13, 1987.

Dual On-Column Fluorescence Detection Scheme for Characterization of Chromatographic Peaks

Christine E. Evans and Victoria L. McGuffin*

Department of Chemistry, Michigan State University, East Lansing, Michigan 48824

A novel detection system is presented that allows the accurate measurement of zone variance exclusive of extracolumn contributions. This system employs a single laser together with parallel detection optics and electronics to collect fluorescent emission at several points along the chromatographic column. With one detector positioned near the column inlet and the other near the outlet, the difference in peak characteristics (velocity, area, variance, etc.) between the detectors may be measured. Because both measurements are performed directly on the column, extracolumn effects may be successfully eliminated. Results of varying extracolumn injection variance and detector temporal variance are reported for both single and dual detector modes. Verification of system performance with Golay theory has been established, and to date variances less than 10 nL² have been measured accurately. This detection scheme is not limited to liquid chromatographic applications, as described herein, but should be compatible with gas and supercritical fluid chromatography as well as capillary zone electrophoresis.

The effectiveness of a separation in chromatography is determined by the differential migration rate of solutes as well as by the broadening, or variance, of the solute zones. In the analysis of complex mixtures, altering the migration rate merely transposes the order of solute elution while not improving the separation to any significant extent. However, minimizing the spreading of solute zones on the chromatographic column allows a greater number of species to be separated simultaneously. Thus, it is of major importance, particularly for the analysis of complex mixtures, to understand how various separation parameters affect solute zone dispersion.

The development and experimental verification of theoretical equations to describe the dispersion of solute zones have been among the major challenges in modern chromatography. The correlation between theory and experiment becomes particularly difficult for microcolumn and high-speed liquid chromatography, where volumetric variances may be extremely small. Such correlations frequently are hindered by extracolumn sources of dispersion, which may be either volumetric or temporal in origin. Volumetric dispersion may arise from laminar flow or mixing phenomena in the injection, detection, or connection devices, while temporal dispersion may result from the finite rate of response of electronic circuitry in detectors, amplifiers, chart recorders, etc. The effect of extracolumn dispersion on chromatographic performance has been examined comprehensively by Sternberg (1) and by Guiochon and co-workers (2).

THEORY

Extracolumn volumetric and temporal dispersion combine with the column dispersion to yield an overall system dispersion. If the individual sources of variance are independent, their variances add.
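A minimal statement of this additivity, assuming independent contributions; the grouping of the extracolumn terms into injection, detection, and temporal contributions follows the sources named above and is illustrative:

$$
\sigma^{2}_{\text{sys}} = \sigma^{2}_{\text{col}} + \sigma^{2}_{\text{inj}} + \sigma^{2}_{\text{det}} + \sigma^{2}_{\text{time}}
$$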
