Use of Mahalanobis distances to evaluate sample preparation

Mar 1, 1987 - Howard. Mark. Anal. Chem. , 1987, 59 (5), pp 790–795. DOI: 10.1021/ac00132a024. Publication Date: March 1987. ACS Legacy Archive...

Anal. Chem. 1987, 59, 790-795

Use of Mahalanobis Distances To Evaluate Sample Preparation Methods for Near-Infrared Reflectance Analysis

Howard Mark

Technicon Instrument Corporation, Industrial Systems Division, 511 Benedict Avenue, Tarrytown, New York 10591

Root mean square groupsizes computed from Mahalanobis distances were investigated as a means of determining, before the calibration step is executed, the proper sample preparation to use for performing near-infrared reflectance analysis. This multivariate mathematical technique appears to be a viable and useful method of predetermining which of several sample preparation methods to use. The advantage of this technique is that it can be applied to optical data alone, without the need to measure reference laboratory values and perform calibration calculations before determining how the samples should be prepared.

Since the initial development of the field of near-infrared reflectance analysis (NIRA) (1-3) the techniques involved have been applied to a wide variety of materials (4). A distinguishing characteristic of the technology is the need for calibration of the instrument against reference laboratory results. The most common calibration technique is multiple regression analysis of the optical data against the reference values, although recently, principal component regression (5), partial least-squares (6), and curve-fitting (7) algorithms have been investigated as calibration techniques. A limitation of all these calibration techniques is that no results are available until calibration samples have been collected, analyzed by the reference laboratory, and measured by the instrument and the calibration calculations have been performed. This complicated procedure makes it difficult to compare different approaches to calibration, particularly when it is desired to compare the effects of different physical phenomena.

By contrast, instrumental analysis is fast, easy, and inexpensive. It is therefore desirable, although more difficult, to obtain useful information from instrumental data only, before doing a full calibration. Attempts at this are very scarce. Principal component analysis (5) and Fourier transforms (8) have been used to reduce the amount of data required to contain the spectral information. Preanalysis of the optical data has been used to select those samples from a sample set that contain the most information (9).

As with many other techniques, NIRA depends upon a combination of several considerations, each of which must be optimum in order to obtain best results. A combination of high-quality instrumentation, proper data treatment, and adequate methods of sample preparation is required for best results.
While many papers dealing with data handling exist in the literature, and instrument manufacturers continually expend large amounts of effort to improve the instruments, sample preparation techniques have been treated like a poor relation. Indeed, they have been largely ignored. Aside from the classic paper of Norris and Williams (10) there is little in the literature specifically dealing with the effect of sample preparation on the accuracy of analysis via NIRA.

Here we investigate a method of evaluating different sample preparation methods that requires only optical data collected from aliquots of a single sample subjected to the various preparation treatments under test. The mathematical technique used is the calculation of root-mean-square (RMS) groupsize in multidimensional space, a quantity based on the concept of Mahalanobis distance. We use this multidimensional method of data analysis to estimate the normal univariate measure of the precision of rereading the samples. This is analogous to the use of multiple regression to correlate the multivariate spectral data with the univariate measurement of a constituent during the calibration process. This analogy is the rationale for expecting the Mahalanobis distance measures to estimate the precision.

THEORY

When the reflectance of a powdered solid is measured, the measurements vary depending upon the details of the presentation of the surface grains of the specimen. It is well-known that a coarse grind will result in greater variation from this source than a finer grind of the same material. Figure 1A shows this phenomenon for the case of ground hard red spring wheat; the upper set of spectra results from grinding 25 aliquots of wheat with a coffee grinder. The lower set results from grinding with a Udy Cyclotec grinder, which is known to result in a much more uniform grind. The smaller spread of the data due to use of the Udy grinder is apparent in Figure 1A.

We have previously used the concept of measuring Mahalanobis distance (11, 12) to quantitate the dispersion of data in multidimensional space and to distinguish different materials (13). Briefly, Mahalanobis distance is a multidimensional distance D defined by the matrix equation

$$D^2 = (X - \bar{X})' M (X - \bar{X})$$

where D is the Mahalanobis distance, X is a vector consisting of optical readings at several wavelengths (this vector describes the position in multidimensional space corresponding to the spectrum of a given sample), $\bar{X}$ is a vector describing the position of a reference point in space (e.g., the position of a known material), and M is the pooled inverse covariance matrix describing distance measures in the multidimensional space of interest.

The distances calculated this way have the desirable property that they are based on the spread of the data in multidimensional space. In cases such as that shown in Figure 1A, it is clear that one group of data is spread more than the other. The comparative spread between the two sets of spectra in multidimensional space is even greater than that shown by the spectra in Figure 1A. Figure 1B is a two-dimensional plot of the same data. This plot was formed by plotting the readings at one wavelength (1680 nm) against the readings at a different wavelength (2100 nm) for each specimen. The greatest spread of the data, following an approximate 45° line, is due to the uniform, correlated scatter variations due to the repacking of the sample. This spread is only somewhat greater for the coffee-mill-ground material than for the Udy-ground material. The spread in the orthogonal direction is due to other sources of variation, such as moisture differences, the contribution of nonuniform scatter variations, etc. Examination of Figure 1B shows that the data obtained from the coffee-mill-ground specimens give rise to a multidimensional

0003-2700/87/0359-0790$01.50/0 © 1987 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 59, NO. 5, MARCH 1, 1987

Figure 1. (A) Spectra of wheat. The higher absorbance set of spectra represents 25 aliquots of a sample ground with a coffee mill. The lower absorbance set represents 25 aliquots ground with a Udy grinder. (B) A log-log plot of the data in part A. The greater width of the group of data points from the coffee-mill-ground wheat causes those data to have a much larger spread in multidimensional space than the data points from the Udy-ground material.

pattern in which the width of the group is much larger, relative to the length, than the Udy-ground material. Thus these anomalous phenomena cause the coffee-mill data to have a much larger spread in multidimensional space than the Udy data, and the increased spread is also much greater than would be suspected directly from the spectra in Figure 1A.

The amount that a given material is spread through multidimensional space can be calculated as the RMS groupsize, which has been previously used for improving the accuracy of classification by compensating for differences in the spread of different materials (14). The RMS groupsize is calculated as the root mean square of the distances $D_{ij}$ within a group, where $D_{ij}$ is the Mahalanobis distance from the ith sample observation belonging to the jth group to the center of the jth group, the summation is taken over all observations within the group, and n is the number of observations in the jth group.

Here we are concerned with the relationship between the RMS groupsize and the behavior of a calibration applied to data containing variation of the X data. To the extent that variation of the X values (the optical data, in NIRA) can be accounted for by the multiple regression model, there is no harm in the fact that these data are subject to electronic noise, repack variation, etc. However, perfect compensation via the regression is usually not possible. Inasmuch as there is not yet a rigorous, generally applicable theory of diffuse reflectance in existence, it has not been possible to compensate perfectly for the differential changes in sample reflectance at different wavelengths by mathematical means. Attempts to eliminate


these variations through instrumental design variations have been, and seem fated to be, unsuccessful, because the changes are due to real differences in reflectance on the part of the sample.

Previous work (15) has shown that averaging several readings of each sample can improve results, due to mathematical reduction of variation in the optical data. A good method of preparing samples will also improve results by physically reducing variation of the optical data. The more uniform the sample, the more reproducible the optical readings will be when the sample is measured repeatedly; this results in smaller error in the X variables with a resulting decrease in the bias of the regression coefficients. The results in Figure 1 show that the spread of data in multidimensional space does indeed depend upon the method of sample preparation used, and Norris and Williams (10) have shown that use of the Udy grinder is the preferred method of preparing wheat for near-IR analysis.

A "perfect" method of preparing the sample will create such a homogeneous material that there will be no variation beyond that due to electronic noise, and thus there would be no effect on the final calculated answer. When some inhomogeneity exists, the effect of the optical variations on the final answer will depend on both the calibration used and the covariance structure of the data. For perfectly homogeneous samples, the RMS groupsize will be reduced to very small values (to zero in the limit of absolutely no change of readings from any cause). For real samples, which are not so completely homogeneous, the spread of the data will cause the RMS groupsize to increase; the less homogeneous the sample is, the more the data will spread.
Thus, we expect that both the RMS groupsize and the repack standard deviation will decrease with improved sample preparation methods. Calculation of the RMS groupsize, which can be performed on data from a single sample and without the need for a calibration equation, can therefore be used to determine the method of sample preparation that will provide homogeneous samples for instrumental measurement, and thus improved NIRA results.

For the beef data, and for the first wheat results, the standard deviation of repack was computed by calculating the ordinary standard deviation of the predicted values of the several readings of each sample, using a calibration equation appropriate to the material. The standard deviations (SD) due to the several components of variation that were measured for the wheat data were calculated by using the equation

where d is the difference between the paired predicted values (using the appropriate equation) due to the phenomenon and n is the number of readings of the specimen.
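The Mahalanobis distance and RMS groupsize defined in this section can be illustrated with a short numerical sketch. The function names, the synthetic readings, and the n - 1 divisor inside the root mean square are assumptions made for illustration only; they are not taken from the discriminant analysis software used in this work. The matrix M is pooled over all groups, as in the definition above; if each group used only its own covariance matrix, every group would come out the same size.

```python
import numpy as np


def pooled_inverse_covariance(groups):
    """Inverse of the within-group covariance pooled over all groups (the matrix M)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n_wavelengths = groups[0].shape[1]
    scatter = np.zeros((n_wavelengths, n_wavelengths))
    dof = 0
    for g in groups:
        centered = g - g.mean(axis=0)      # deviations from the group center
        scatter += centered.T @ centered
        dof += g.shape[0] - 1
    return np.linalg.inv(scatter / dof)


def rms_groupsize(group, inv_cov):
    """Root mean square of the Mahalanobis distances to the group center.

    Assumption: the mean of D^2 is taken with an n - 1 divisor.
    """
    g = np.asarray(group, dtype=float)
    centered = g - g.mean(axis=0)
    # D^2 = (X - Xbar)' M (X - Xbar), evaluated for every reading (row) at once
    d_squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return float(np.sqrt(d_squared.sum() / (g.shape[0] - 1)))


# Synthetic stand-in for 25 repacks read at 3 wavelengths (not measured data).
rng = np.random.default_rng(0)
coarse = 0.50 + rng.normal(scale=0.03, size=(25, 3))   # hypothetical coarse grind
fine = 0.45 + rng.normal(scale=0.01, size=(25, 3))     # hypothetical fine grind

M = pooled_inverse_covariance([coarse, fine])
size_coarse = rms_groupsize(coarse, M)
size_fine = rms_groupsize(fine, M)
print(f"coarse grind: {size_coarse:.2f}   fine grind: {size_fine:.2f}")
```

In this sketch the more widely scattered group receives the larger groupsize, which is the property the method relies on when ranking sample preparations.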

EXPERIMENTAL SECTION

Two types of samples were used to examine the hypotheses presented in the previous section: wheat and beef. The beef was obtained from a local supermarket and eight different methods of preparation were examined: (1) meat grinder at room temperature (RT) and then blender together with dry ice; sample maintained at 5 °C and run cooled in open cup; (2) meat grinder at RT and then blender at RT; run in open cup at RT; (3) meat grinder, stored in refrigerator 1 day, and then blender at RT; sample maintained at 5 °C and run in cooled cup at 5 °C; (4) meat grinder precooled with dry ice, beef ground in cold meat grinder and then blender with dry ice; sample maintained at 5 °C and run in open cup; (5) meat grinder at RT and then food processor for 90 s, scraping every


Table I. Comparison of Repack Standard Deviation with RMS Groupsize for Beef Data

(A) Results for Individual Sample Preparation Methods

sample   fat                     protein                 solids                  19-wavelength
prep     SD repack  RMS grpsz(a) SD repack  RMS grpsz(a) SD repack  RMS grpsz(a) RMS groupsize
1        0.348      2.02         0.209      1.50         0.715      2.52         4.63
2        0.543      2.01         0.299      2.27         0.587      1.94         4.51
3        0.484      2.02         0.156      2.07         0.384      1.84         4.32
4        1.25       2.66         0.628      3.06         0.874      2.79         6.07
5        0.248      1.65         0.154      1.54         0.375      1.34         3.15
6        0.532      2.00         0.114      1.69         0.598      1.81         4.03
7        0.332      1.91         0.110      1.87         0.664      1.65         4.00
8        0.660      2.09         0.188      1.83         0.846      1.97         4.13

(B) Statistics for Comparison of Repack SD with RMS Groupsize

                 fat       protein   solids
Results Using Only Calibration Wavelengths
corr coeff       0.9556    0.8896    0.7165
t test           7.95      4.77      2.52
Results Using RMS Groupsize from all 19 Wavelengths
corr coeff       0.8663    0.8818    0.6505
t test           4.25      4.58      2.09

(a) RMS groupsize calculated by using the same wavelengths as used in the corresponding constituent calibration.

30 s; sample maintained at 3 °C and run in open cup; (6) meat grinder and then food processor at RT, stored refrigerated 1 day; run in open cup; (7) meat grinder, then frozen and thawed, and then blender at RT for 60 s, scraping every 5 s; (8) meat grinder and then blender at RT for 60 s, scraping every 5 s. Ten to 15 repacks of each sample preparation method were used for the beef samples.

The data for wheat were collected by using a somewhat more complex procedure. A large sample (approximately 5 lb) of hard red spring wheat was mixed with a Boerner divider and then divided into 100 aliquots. Four methods of grinding the aliquots were used (16); 25 aliquots were ground by each method. The methods used were as follows: (1) Wiley mill fitted with a 1-mm screen; (2) Udy Cyclotec mill fitted with a 1-mm screen; (3) Mitey mill (knife mill), 30-s grind; (4) Mitey mill, 10-s grind. Each aliquot was packed twice into a sample cup, each pack was read in two orientations, and for each orientation two sets of instrument readings were collected. This procedure is an extension of the one previously used for one of the sample preparation methods (17) to four methods.

The optical data for the wheat samples were collected on a Technicon InfraAlyzer 500, and an InfraAlyzer 400R was used for the beef samples. RMS groupsizes for the various data sets were calculated by using discriminant analysis software previously described (14); all results other than the RMS groupsize calculations were ignored.
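The nested reading scheme for wheat (25 aliquots, 2 packs per aliquot, 2 orientations per pack, 2 readings per orientation) maps naturally onto a five-dimensional array. The sketch below shows one way such data could be arranged and differenced to isolate the noise, rotation, and repack contributions, following the averaging and subtraction steps described under Results and Discussion. The array layout, the variable names, and the random placeholder values are assumptions for illustration, not the original data or software.

```python
import numpy as np

# Hypothetical layout: aliquot x pack x orientation x reading x wavelength.
n_aliquots, n_packs, n_orientations, n_readings, n_wavelengths = 25, 2, 2, 2, 19
rng = np.random.default_rng(1)
readings = rng.normal(
    size=(n_aliquots, n_packs, n_orientations, n_readings, n_wavelengths)
)  # placeholder values standing in for one sample preparation method

# Noise: difference between the two readings of the same orientation and pack.
noise_diffs = (readings[:, :, :, 0, :] - readings[:, :, :, 1, :]).reshape(
    -1, n_wavelengths
)

# Rotation: average the two readings, then difference the two orientations of each pack.
orientation_means = readings.mean(axis=3)
rotation_diffs = (
    orientation_means[:, :, 0, :] - orientation_means[:, :, 1, :]
).reshape(-1, n_wavelengths)

# Repack: average the two orientations, then difference the two packs of each aliquot.
pack_means = orientation_means.mean(axis=2)
repack_diffs = pack_means[:, 0, :] - pack_means[:, 1, :]

print(noise_diffs.shape, rotation_diffs.shape, repack_diffs.shape)
```

With this layout the noise component yields 100 difference spectra per sample preparation, the rotation component 50, and the repack component 25; each set can then be passed to an RMS groupsize calculation.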

RESULTS AND DISCUSSION

In order to test the theory, and determine if there is a relationship between the calculated value of the RMS groupsize and the effect of sample preparation, the value of the standard deviations of the NIRA analyses for the several repacks of each of the sample preparation methods was calculated. This value is compared with the RMS groupsize of the optical readings for the corresponding set of samples. A difficulty arises in doing this, however: in normal usage the calculations described in this paper are intended to be used before a calibration is performed. Since the wavelengths will not yet have been chosen, it may not be clear which wavelengths to use to calculate the RMS groupsize. Thus, in this work we calculated the RMS groupsize based on two sets of wavelengths: the set used in the calibration we are comparing, and the set of 19 wavelengths contained in the InfraAlyzer 450R.

Table I displays the results obtained for the beef samples. In part A the RMS groupsizes based on the wavelengths used for a given constituent are presented next to the standard deviation of the results from the several repacks for that constituent and sample preparation method. The RMS groupsizes based on the set of 19 wavelengths are presented at the right of Table IA; this set of groupsizes is the same for all constituents with which it is being compared.

In order to determine whether the RMS groupsize is, indeed, an estimator of the repack variation or not, the correlation coefficient between the two statistics was computed for the set of eight sample preparations. In order to determine if the results are statistically significant or merely due to random variation of the data, the Student t test was also computed. These statistics are presented in part B. Since the critical value of t(0.95, 4) = 2.19, we find that almost all the correlations are statistically significant; i.e., a real phenomenon exists. If these data are plotted on a graph, we find that most of the sample preparation methods give approximately the same results. Only methods 4 and 5 are appreciably different from the others; nevertheless, these differences are sufficient to demonstrate the efficacy of the method.

In practical use we would want to select the sample preparation method that provides the smallest RMS groupsize, and therefore the smallest repack SD. Therefore, of more note is the fact that the RMS groupsize for all three sets of wavelengths corresponding to the constituent calibrations, as well as the RMS groupsize determined for the full set of 19 wavelengths, is smallest for sample preparation no. 5.
The repack standard deviation is also smallest for sample preparation method 5 for both fat and solids. It is not smallest for the protein, although we note that the values for the smallest few repack SD are sufficiently close as not to matter too much which was used. Therefore, even though the correlation coefficient for the RMS groupsize based on 19 wavelengths is not as high as for the RMS groupsize based on the wavelengths used in the corresponding calibration, the use of that calculation arrives at the same result, i.e., the selection of the same sample preparation method.
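The correlation and t statistics of Table I(B) can be recomputed from the columns of Table I(A). The following is a minimal sketch for the fat columns, assuming the ordinary Pearson correlation coefficient and the usual statistic t = r·sqrt((n − 2)/(1 − r²)); the function and variable names are illustrative only.

```python
import math

# Fat columns of Table I(A): repack SD and calibration-wavelength RMS groupsize
# for sample preparation methods 1 through 8.
fat_sd_repack = [0.348, 0.543, 0.484, 1.25, 0.248, 0.532, 0.332, 0.660]
fat_rms_groupsize = [2.02, 2.01, 2.02, 2.66, 1.65, 2.00, 1.91, 2.09]


def pearson_r(x, y):
    """Ordinary product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing a correlation coefficient against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


r = pearson_r(fat_sd_repack, fat_rms_groupsize)
t = t_statistic(r, len(fat_sd_repack))
print(f"r = {r:.4f}, t = {t:.2f}")
```

Under these assumptions the computed values agree with the entries reported for fat in Table I(B) (0.9556 and 7.95).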


Table II. Standard Deviation of Readings at Individual Wavelengths, for the Different Sample Preparation Methods for the Beef Data (all values multiplied by 100)

wavelength, nm  method 1  method 2  method 3  method 4  method 5  method 6  method 7  method 8
1445            1.919     2.919     2.482     1.685     1.470     2.151     1.383     1.830
1680            1.988     2.795     3.075     1.841     1.334     2.412     1.806     1.583
1722            2.145     3.012     3.272     1.775     1.445     2.541     1.871     1.703
1734            2.042     3.012     3.217     1.722     1.401     2.481     1.809     1.654
1759            2.150     3.000     3.288     1.794     1.471     2.540     1.880     1.751
1778            2.154     2.960     3.290     1.876     1.480     2.544     1.908     1.777
1818            2.147     2.927     3.315     1.951     1.512     2.567     1.972     1.831
1940            2.807     1.588     1.258     1.573     1.412     1.621     1.145     1.834
1982            2.194     1.867     1.327     1.457     1.387     1.598     1.113     1.722
2100            1.791     2.735     2.317     1.383     1.477     2.044     1.254     1.726
2139            1.942     2.805     2.604     1.401     1.510     2.222     1.381     1.778
2180            2.005     2.826     2.720     1.538     1.573     2.317     1.462     1.867
2190            2.037     2.832     2.785     1.600     1.581     2.352     1.515     1.866
2208            2.046     2.827     2.848     1.602     1.592     2.388     1.544     1.883
2230            2.040     2.804     2.847     1.524     1.610     2.400     1.536     1.890
2270            1.898     2.639     2.529     1.203     1.635     2.249     1.346     1.865
2310            1.813     2.213     1.842     1.319     1.555     1.909     1.280     1.732
2336            1.788     2.307     2.072     1.233     1.616     2.026     1.239     1.809
2348            1.836     2.146     1.847     1.314     1.587     1.912     1.259     1.785

Table III. Correlation Coefficients between Individual Optical Reading SD's and Repack SD's for the Various Constituents from the Beef Data, for the Eight Sample Preparation Methods

wavelength, nm  fat       protein   solids
1445             0.0268    0.0132   -0.2494
1680             0.0039   -0.0635   -0.3540
1722             0.0859   -0.1441   -0.4006
1734             0.0779   -0.1347   -0.4045
1759             0.0821   -0.1458   -0.4004
1778            -0.0483   -0.1193   -0.3823
1818            -0.0238   -0.1089   -0.3718
1940            -0.0519    0.0717    0.3918
1982            -0.0267    0.0966    0.3045
2100            -0.1060   -0.1207   -0.3369
2139            -0.1661   -0.1997   -0.4011
2180            -0.1293   -0.1801   -0.3924
2190            -0.1149   -0.1685   -0.3940
2208            -0.1243   -0.1879   -0.4097
2230            -0.1697   -0.2384   -0.4343
2270            -0.2987   -0.3597   -0.4780
2310            -0.2442   -0.2500   -0.2882
2336            -0.2856   -0.3365   -0.4084
2348            -0.2624   -0.2847   -0.2843

Thus, these data show that selecting a sample preparation method based on the RMS groupsize will not lead us astray and will indeed allow us to select the appropriate method. A further question arises: could we have done as well with a univariate method of data analysis? The univariate statistic corresponding to the RMS groupsize is the standard deviation; it behooves us to investigate the relationship between the standard deviation of the optical readings and the standard deviation of repack. These results are shown in Table II. Table III shows that the correlation of readings at individual wavelengths to the repack SD is small and not significant; thus individual wavelength readings could not be used to predetermine the appropriate sample preparation method.

Table IV. Comparison of Total Standard Deviation with RMS Groupsize for Wheat Data

(A) Results for Individual Sample Preparation Methods

sample   protein              moisture             19-wavelength
prep     SD      RMS grpsz(a) SD      RMS grpsz(a) RMS groupsize
1        1.860   2.078        0.408   1.569        4.300
2        0.559   1.181        0.534   1.931        3.422
3        0.648   1.432        0.163   0.832        4.161
4        0.672   2.057        0.246   1.056        5.337

(B) Statistics for Comparison of Total SD with RMS Groupsize

                 protein   moisture
Results Using Only Calibration Wavelengths
corr coeff       0.6201    0.9997
t test           1.1       57.9
Results Using RMS Groupsize from all 19 Wavelengths
corr coeff       0.0670    0.6210
t test           0.095     1.1

(a) RMS groupsize calculated using the same wavelengths as used in the corresponding constituent calibration.

Surprisingly, the wheat data presented a much more complicated situation. The initial calculations using the wheat data were performed by using already-existing universal calibrations. Table IV presents the wheat results corresponding to the beef results of Table I. For this table, the data of the first reading of the first orientation of the first pack of each aliquot were used, to simulate taking different aliquots for testing. While the t value for the comparison of total standard deviation of moisture with RMS groupsize shows a superb relationship, the other three comparisons show essentially no relationship at all. The puzzlement caused by this rather unexpected result was magnified by the fact that, for the protein and moisture data separately, the optical data corresponding to the sample preparation method giving the smallest standard deviation also gave the smallest RMS groupsize. However, the sample preparation method giving the smallest standard deviation for moisture was not the same as the one giving the smallest standard deviation for protein; similarly, different sample preparations minimized the RMS groupsize of the two constituents. The apparent failure of the sample selection algorithm appears to be due, not to an inherent fault of the algorithm, but to the fact that the best sample preparation is not necessarily unique!

This result perhaps should not be entirely unexpected. While for most constituents the optimum sample preparation method is the one producing the most homogeneous and repeatable material, the optimum one for moisture is the one that minimizes the exposure of the sample to air. As long as the sample treatment is consistent, exposure to air causes a systematic change in the amount of moisture in the sample rather than a random change. Matters could

not be otherwise, else moisture measurement, one of the most successful near-IR methods, would not be possible. Nevertheless, one does not wish to prepare the sample twice in order to measure different constituents; what is needed is the method of preparation that produces satisfactory results for all constituents, even if that method is a compromise.

An extensive analysis of one of the data sets, which has been previously reported (17), shows that, for the wheat data, the largest source of variance is the aliquot-to-aliquot variability caused by the initial sampling of the unground wheat. Since this sampling error was introduced into the data even before any sample preparation was performed, it clearly has no relationship to the effect of the sample preparation method used. Thus, a better comparison can be made by considering the individual sources of error separately.

Table V. Comparison of Repack Component of Standard Deviation with RMS Groupsize for Wheat Data

(A) Results for Individual Sample Preparation Methods

sample   protein                 moisture                19-wavelength
prep     SD repack  RMS grpsz(a) SD repack  RMS grpsz(a) RMS groupsize
1        1.492      2.051        0.359      1.597        4.444
2        0.198      0.749        0.088      0.678        2.683
3        0.804      1.400        0.218      1.586        4.271
4        0.981      2.295        0.312      1.571        5.550

(B) Statistics for Comparison of Repack Component of SD with RMS Groupsize

                 protein   moisture
Results Using Only Calibration Wavelengths
corr coeff       0.8472    0.8740
t test           2.25      2.54
Results Using RMS Groupsize from all 19 Wavelengths
corr coeff       0.6961    0.8375
t test           1.37      2.16

(a) RMS groupsize calculated by using the same wavelengths as used in the corresponding constituent calibration.

Table VI. Comparison of Rotation Component of Standard Deviation with RMS Groupsize for Wheat Data

(A) Results for Individual Sample Preparation Methods

sample   protein                 moisture                19-wavelength
prep     SD rotat   RMS grpsz(a) SD rotat   RMS grpsz(a) RMS groupsize
1        0.207      1.952        0.077      1.750        4.456
2        0.070      0.565        0.030      0.495        2.686
3        0.188      1.390        0.056      1.142        4.215
4        0.363      2.437        0.091      1.841        5.582

(B) Statistics for Comparison of Rotation SD with RMS Groupsize

                 protein   moisture
Results Using Only Calibration Wavelengths
corr coeff       0.9519    0.9875
t test           4.39      8.86
Results Using RMS Groupsize from all 19 Wavelengths
corr coeff       0.9820    0.9691
t test           7.36      5.56

(a) RMS groupsize calculated by using the same wavelengths as used in the corresponding constituent calibration.

Considering the sources of error separately is possible with this data set due to the way the data were collected. As discussed in the Experimental Section, each of the 25 aliquots of wheat that were subjected to each of the grinding techniques was measured eight times by using an experimental design which nested noise contribution within rotations, rotations within repacks, and repacks within aliquots. Thus, noise, cup orientation sensitivity, and repack error can all be estimated separately. The RMS groupsizes corresponding to these components of variance can also be estimated separately.

The RMS groupsize due to the noise component was calculated by subtracting the optical data at each wavelength of one reading of a given orientation for a given pack from the set of optical data of the other reading corresponding to the same orientation and pack. This gave 100 sets of differences for each sample preparation that were used in the RMS groupsize. To obtain the RMS groupsizes corresponding to orientation, the two sets of readings corresponding to each orientation were averaged; the averaged readings corresponding to the two orientations of each pack were then subtracted from each other in a manner similar to the above, and these differences were used. Similarly, data from the two orientations measured for each pack were averaged, and the differences between the two packs obtained from each aliquot of wheat were used to calculate the RMS groupsize corresponding to the repack effect for each sample preparation.

These results are reported in Tables V, VI, and VII for the three components of variance discussed above. The critical

Table VII. Comparison of Noise Standard Deviation with RMS Groupsize for Wheat Data

(A) Results for Individual Sample Preparation Methods

sample   protein                 moisture                19-wavelength
prep     SD noise   RMS grpsz(a) SD noise   RMS grpsz(a) RMS groupsize
1        0.078      1.339        0.015      1.125        4.112
2        0.053      1.338        0.018      1.299        3.342
3        0.080      1.428        0.017      1.156        4.146
4        0.112      2.525        0.016      1.925        5.543

(B) Statistics for Comparison of Noise SD with RMS Groupsize

                 protein   moisture
Results Using Only Calibration Wavelengths
corr coeff       0.8800    0.0853
t test           2.62      0.012
Results Using RMS Groupsize from All 19 Wavelengths
corr coeff       0.9932    0.0522
t test           12.08     0.0865

(a) RMS groupsize calculated by using the same wavelengths as used in the corresponding constituent calibration.

value of t(0.95, 2) = 2.9; thus we find that for repack and noise, few of the relationships in Tables V, VI, and VII are statistically significant. However, a noteworthy point is that in all cases, the sample preparation method giving the smallest standard deviation due to each component of variation also has the smallest value of RMS groupsize. Taken individually for each case, this result is not too impressive. If chance alone were operating, we would expect this result only one time in four. However, for the three cases taken together (i.e., repack, rotation, and noise), the probability of arriving at this result from chance alone is one in 64 (0.015). If we include the original measure of total standard deviation (even though we have seen that it is due to an uninteresting effect), the probability drops to one in 256 (0.0039). These low probabilities are well below the limit usually used for statistical testing (0.05) and thus are statistically significant; in other words, they indicate that the relation between the two measures

of determining optimum sample preparation is real. For the rotational sensitivity, the relationship between RMS groupsize and the standard deviation due to rotational sensitivity is seen to be statistically significant; thus we can conclude that for this component of variation, the RMS groupsize is, indeed, an indicator of the effect of rotation. We also note from Table VI that the smallest SD for rotation for both protein and moisture occurs for the same sample preparation method. The largest values for SD due to rotation also occur for the same sample preparation method. However, for the noise component of variation, the minimum value of SD corresponds to different sample preparations.

One last set of calculations was performed. For this set of calculations, the computations of the components of variation for the protein were performed using calibrations generated from the same instrument and at approximately the same time as the prediction samples were run. Two sets of calibration samples were used: one set was ground on the Udy grinder, the other was ground on the Mitey Mill for 10 s. These results are presented in Tables VIII and IX, respectively. An important point to note here is the effect of the calibration on the relative rankings of the SD due to repack. Although the best sample preparation method did not change with different calibrations, the worst and second-worst sample preparation methods switch places depending upon which calibration is used to determine these SD's. This change of ranking, together with the previous observation that the optimum sample preparation seems to differ for protein and moisture, indicates that the relationships of the sample preparation methods for wheat are somewhat unstable. This seems to coincide with the observation that only the one component of variance that showed stable behavior (the rotational component) showed a statistically significant relationship between the SD due to sample preparation and the RMS groupsize.

Table VIII. Comparison of Components of Variance for Protein with RMS Groupsize from Wheat Data (Udy Grind Calibration)

(A) Results for Individual Sample Preparation Methods

sample   noise                 rotation              repack
prep     SD      RMS grpsz(a)  SD      RMS grpsz(a)  SD      RMS grpsz(a)
1        0.085   1.339         0.258   1.952         0.755   2.051
2        0.058   1.338         0.101   0.565         0.257   0.749
3        0.089   1.428         0.326   1.390         1.449   1.400
4        0.122   2.525         0.568   2.437         4.151   2.295

(B) Statistics for Comparison of Components of SD with RMS Groupsize

                 noise     rotation  repack
Results Using Only Calibration Wavelengths
corr coeff       0.8721    0.8769    0.7075
t test           2.52      2.58      1.42
Results Using RMS Groupsize from all 19 Wavelengths(b)
corr coeff       0.9901    0.9494    0.8549
t test           9.99      4.28      2.33

(a) RMS groupsize calculated by using the same wavelengths as used in the corresponding constituent calibration. (b) RMS groupsizes using all 19 wavelengths taken from Tables V, VI, and VII, as appropriate.

Table IX. Comparison of Components of Variance for Protein with RMS Groupsize from Wheat Data (Mitey Mill Grind Calibration)

(A) Results for Individual Sample Preparation Methods

sample   noise                 rotation              repack
prep     SD      RMS grpsz(a)  SD      RMS grpsz(a)  SD      RMS grpsz(a)
1        0.071   1.339         0.144   1.952         0.409   2.051
2        0.048   1.338         0.062   0.565         0.254   0.749
3        0.073   1.428         0.132   1.390         0.348   1.400
4        0.102   2.525         0.247   2.437         0.387   2.295

(B) Statistics for Comparison of Components of SD with RMS Groupsize

                 noise     rotation  repack
Results Using Only Calibration Wavelengths
corr coeff       0.8775    0.9443    0.9419
t test           2.58      4.05      3.97
Results Using RMS Groupsize from all 19 Wavelengths(b)
corr coeff       0.9925    0.9750    0.8653
t test           11.5      6.22      2.44

(a) RMS groupsize calculated by using the same wavelengths as used in the corresponding constituent calibration. (b) RMS groupsizes using 19 wavelengths taken from Tables V, VI, and VII, as appropriate.

CONCLUSIONS

The use of RMS groupsize to determine the optimum sample preparation for near-infrared reflectance analysis seems

to be a viable approach to determining which sample preparation method to use before a calibration is performed, as long as certain restrictions are met. The main restriction is that the different constituents in the sample behave similarly for the different sample preparation methods, in order to ensure reliability. In the current study, the smallest RMS groupsize did indeed correspond to the smallest SD in all cases, which would seem to give this technique practical value despite the irregular results. But for those cases (particularly with wheat) in which the relationship was not statistically significant, we do not know whether this correspondence was merely fortuitous. It is also not entirely clear at this point why different sample preparations sometimes appear better for different constituents.
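The chance argument used in the discussion of Tables V, VI, and VII (one in four for a single case, one in 64 for three cases, one in 256 for four) is simple enough to check directly. This sketch assumes that the cases are independent and that, under chance alone, each of the four wheat preparations is equally likely to show both the smallest SD and the smallest RMS groupsize; the variable names are illustrative.

```python
from fractions import Fraction

n_methods = 4                       # four wheat grinding methods
p_single = Fraction(1, n_methods)   # chance that the two minima coincide in one case

p_three = p_single ** 3   # repack, rotation, and noise taken together
p_four = p_single ** 4    # also including the total standard deviation

print(p_three, float(p_three))
print(p_four, float(p_four))
```

Both products fall below the 0.05 level referred to in the text.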

LITERATURE CITED

(1) Massie, D. R.; Norris, K. H. Trans. Am. Soc. Agric. Eng. 1965, 8(1), 596-600.
(2) Ben-Gera, I.; Norris, K. H. J. Food Sci. 1968, 33(1), 64-67.
(3) Ben-Gera, I.; Norris, K. H. Isr. J. Agric. Res. 1968, 18(3), 117-124.
(4) Honigs, D. E. Ph.D. Thesis, University of Indiana, Bloomington, IN, 1984.
(5) Cowe, I. A.; McNichol, J. W. Appl. Spectrosc. 1985, 39(2), 257-266.
(6) Frank, I. E.; Kalivas, J. H.; Kowalski, B. R. Anal. Chem. 1983, 55, 1800-1804.
(7) Hruschka, W. R.; Norris, K. H. Appl. Spectrosc. 1982, 36(3), 261-265.
(8) Giesbrecht, F. G.; McClure, W. F.; Hamid, A. Appl. Spectrosc. 1981, 35(2), 210-214.
(9) Honigs, D. E.; Hieftje, G. M.; Mark, H. L.; Hirschfeld, T. M. Anal. Chem. 1985, 57, 2299-2303.
(10) Norris, K. H.; Williams, P. C. Cereal Chem. 1984, 61(2), 158-165.
(11) Mahalanobis, P. C. Proc. Natl. Inst. Sci. India 1936, 2, 49-55.
(12) Gnanadesikan, R. Methods for Statistical Data Analysis of Multivariate Observations; Wiley: New York, 1977; Chapter 4.
(13) Mark, H.; Tunnell, D. Anal. Chem. 1985, 57, 1449-1458.
(14) Mark, H. Anal. Chem. 1986, 58, 379-384.
(15) Mark, H.; Workman, J. Anal. Chem. 1986, 58, 1454-1459.
(16) Williams, P. C. Cereal Chem. 1975, 52, 561-576.
(17) Mark, H. Anal. Chem. 1986, 58, 2814-2819.

RECEIVED for review July 15, 1986. Accepted November 12, 1986.