Multiple Factor Experiments in Analytical Chemistry
W. J. YOUDEN¹
Boyce Thompson Institute for Plant Research, Inc., Yonkers, N. Y.
This paper gives a description of factorial experiments and attempts to show that statistical computations for these experiments are not necessarily like turning the handle on a sealed black box. The grouping of the data in the numerical operations corresponds intimately to the physical contrasts that are under investigation. A properly designed experiment makes it possible to set up in a valid manner the numerical combinations appropriate for the questions for which the data are expected to provide answers. The paper concludes with a reexamination of a published study of the precision of analytical data. A portion of the data was used to illustrate the effectiveness of statistical techniques in such studies. A series of laboratories analyzing the same two compounds disagreed in the difference found between the compounds far more than would have been anticipated from the agreement of the duplicate analyses on the same compound. If a single analysis for the same element is run on each of two compounds and the difference between the two analyses is tabulated for a series of such pairs (using the same two compounds), an estimate of the precision may be based upon the deviations of these differences from their average.
IN MANY researches there are several variables of interest, if only to determine whether or not these variables exert any consequential effect upon the outcome. Sometimes a variable or factor is either present or absent-for example, a catalyst. Other variables such as temperature, pressure, and time may be investigated over a range so that two or more levels are selected for study. Many of these multiple factor experiments are of the type which statisticians call factorial experiments. Once the variables and the number of levels of each are chosen, a factorial experiment consists in performing an experiment for every possible combination of the levels chosen. Thus if it is planned to study an analytical procedure, it may be desirable to study the effect of pH, of the kind of acid used, and of a possible interfering element. If one interfering element is chosen, three acids, and four different pH levels, then there are 2 × 3 × 4 or 24 possible combinations to be tried: each acid at each pH value and with and without the interfering element. This is not a new type of experiment, although very often not all the possible combinations are tried. The thing that is new about factorial experiments is the recent technique of the analysis of variance for making a systematic appraisal of the data. The application of the analysis of variance technique usually requires, if it is to be applied at all conveniently, the data for every combination of the factors. When every combination of the factors has been tried it is possible to set down a simple statistical analysis so that, for each of the comparisons which the experimenter would naturally look for in the data, there exists a corresponding numerical computation. These computations make possible an objective evaluation of the data. This is done by computing for each of the various comparisons the probability that the differences observed could have arisen by chance because of the experimental errors in the measurements.
The simplest experiment consists of repeated measurements of the same quantity-that is, all factors are held constant. Two pieces of information can be extracted from a series of measurements of this character: an estimate of the average value, and an estimate of the precision of the measurements. Table I shows a set of eight measurements of the same quantity, and formulas for the standard deviation and average deviation of these measurements. The second formula is usually more convenient, especially if a constant value is first deducted from all the observations. The standard deviation, s, is a more convenient quantity to use in probability calculations than the average deviation. Taking the standard deviation as 0.0705, the standard deviation for the average is obtained by dividing by the square root of 8 (the number of measurements). This gives 0.025 for the standard deviation of the average. If it is desired to determine confidence limits for the average, the relatively small number of measurements upon which the value of s is based must be taken into consideration. If the number of measurements is large it is customary to take a factor based on the normal error curve.
Table I. Set of Eight Replicate Measurements and Computations

    a1      9.78
    a2      9.84
    a3      9.75
    a4      9.65
    a5      9.78
    a6      9.79
    a7      9.69
    a8      9.86
    Total   78.14
    Av. ā   9.7675

    S.D. = √[Σ(a - ā)² / (n - 1)] = 0.0705

    or

    S.D. = √[(Σa² - (Σa)²/n) / (n - 1)] = 0.0705

    Average deviation = Σ|a - ā| / n = 0.0531

Table II. Comparison of Two Items, Each Run in Quadruplicate

            a1      a2      a3      a4      Total   Av. ā
            9.78    9.84    9.75    9.65    39.02   9.755

            b1      b2      b3      b4      Total   Av. b̄
            9.78    9.79    9.69    9.86    39.12   9.780

    Σ(a - ā)² = 0.0189        Σ(b - b̄)² = 0.0146

    s² = [Σ(a - ā)² + Σ(b - b̄)²] / (na + nb - 2) = (0.0189 + 0.0146)/6 = 0.005583,   s = 0.0747

    t = (b̄ - ā) / [s √(1/na + 1/nb)] = 0.474
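As a quick check on these figures, the following short Python sketch (not part of the original paper) reproduces the computations of Tables I and II using only the standard library; the quoted t factors 2.37 and 3.50 are taken from the text rather than computed.

```python
import math

a = [9.78, 9.84, 9.75, 9.65]          # first set of quadruplicates
b = [9.78, 9.79, 9.69, 9.86]          # second set of quadruplicates
x = a + b                             # Table I treats all eight as replicates

n = len(x)
mean = sum(x) / n

# Standard deviation s = sqrt(sum((x - mean)^2) / (n - 1))     -> about 0.0705
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# Average deviation = sum(|x - mean|) / n                      -> about 0.0531
avg_dev = sum(abs(xi - mean) for xi in x) / n

# Standard deviation of the average                            -> about 0.025
se_mean = s / math.sqrt(n)

# Confidence limits use t factors for n - 1 = 7 degrees of freedom
# (2.37 and 3.50 as quoted)      -> close to the ±0.059 and ±0.088 in the text
limits_95 = 2.37 * se_mean
limits_99 = 3.50 * se_mean

# Table II: two-sample t with a pooled variance
mean_a, mean_b = sum(a) / 4, sum(b) / 4
ss_a = sum((ai - mean_a) ** 2 for ai in a)                     # 0.0189
ss_b = sum((bi - mean_b) ** 2 for bi in b)                     # 0.0146
s_pooled = math.sqrt((ss_a + ss_b) / (len(a) + len(b) - 2))    # 0.0747
t = (mean_b - mean_a) / (s_pooled * math.sqrt(1 / len(a) + 1 / len(b)))  # about 0.474

print(round(s, 4), round(avg_dev, 4), round(se_mean, 3))
print(round(limits_95, 3), round(limits_99, 3), round(t, 3))
```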
STATISTICAL TECHNIQUE OF EXAMINING DATA
In order to appreciate the statistical analysis which is appropriate for complex experiments-of which the factorial is the most common-a start is made with an elementary situation and then, step by step, the experiment is made more and more complex. The emphasis is placed on the exposition of the ideas involved rather than a description of particular complex factorial experiments. The purpose here is to show that, very often, by the exercise of a little ingenuity, an experiment may be designed which will be especially adapted for the situation at hand.

¹ Present address: National Bureau of Standards, Washington, D. C.
For 95 and 99% confidence limits the factors are 1.96 and 2.58 when s is computed from a large number of measurements. When less than 30 measurements are available the factors may be obtained from a table of t (1, 2, 4), which must be entered with n - 1-that is, one less than the number of measurements. The factors are 2.37 and 3.50 for 95 and 99% confidence limits. The confidence limits are therefore t × S.D., or ±0.059 and ±0.088 for these probability levels. It is obvious that the wider the limits set, the greater the confidence that these limits will bracket the average given by a very large number of measurements.

A more usual situation arises in the comparison of two quantities. The eight measurements may be divided into two sets as shown in Table II. The difference between the averages for the two sets is appraised in terms of the variation among the measurements within each set. The formulas show how to compute the statistic t, provision being made for different numbers of measurements in the two sets. It is clear that if the difference between the two averages is large, t will be large. How large may t become by chance when the two sets of measurements are really on the same quantity? This is answered by consulting a table of t (1, 2, 4) which lists the magnitude of t for various probabilities, taking into consideration the total number of measurements on which s is based. The table is here entered with n = 6, since, with two sets, one is subtracted for each set. There is a 70% probability that t will be as large as 0.404, a 50% probability that t will reach 0.718, and a 5% probability that t will be as large as 2.447 through purely chance variations when s is computed from two sets of four replicate measurements made on the same quantity. In this instance the two sets are simply an arbitrary division of eight homogeneous measurements. The value 0.474 found for t is, as would be expected, small and far short of 2.447. If 2.447 is taken as a criterion for deciding that a real difference exists between the two averages, then once in 20 times it would be mistakenly concluded that there was a real difference when in fact there was none.

It is possible to make a detailed analysis of the variation among the eight measurements, if the data are tabulated as shown in Table III with the seven rows of coefficients listed beneath the experimental values. The first line of coefficients indicates that the sum of the b measurements is to be subtracted from the sum of the a measurements. The result, or algebraic sum, is tabulated under the heading Sum. The next column shows the sum of the squares of the coefficients, and this is divided into the square of the algebraic sum to give the entry 0.00125 in the next column. The next three lines of coefficients examine the variation within the four a measurements. The measurements are pictured as occurring in two pairs. The difference between the first two measurements is taken, squared, and divided by 2. A similar step is carried out for a3 and a4. Finally, the sum of the first pair is subtracted from the sum of the second pair, which is equivalent to testing how well the two pairs agree in their averages. The results of these three steps give, when summed, the total 0.0189. This agrees exactly, as it must, with Σ(a - ā)² as shown in Table II.

Table III. Partition of Variance into Single Degrees of Freedom
(coefficients not shown are zero)

Line    a1     a2     a3     a4     b1     b2     b3     b4      Sum    Σc²   Sum²/Σc²   Subtotal
        9.78   9.84   9.75   9.65   9.78   9.79   9.69   9.86
1       +1     +1     +1     +1     -1     -1     -1     -1     -0.10    8    0.00125    0.00125
2       +1     -1                                               -0.06    2    0.0018
3                     +1     -1                                  0.10    2    0.0050
4       +1     +1     -1     -1                                  0.22    4    0.0121     0.0189
5                                   +1     -1                   -0.01    2    0.00005
6                                                 +1     -1     -0.17    2    0.01445
7                                   +1     +1     -1     -1      0.02    4    0.0001     0.0146
                                                          Grand total               0.03475

The sum of the squares of the deviations of the a measurements from their average has by this means been partitioned into three parts, and each part corresponds to a comparison that is often made on such a set of measurements. If a1 and a2 had been run on one day and a3 and a4 on a later day, and it turned out that the duplicates for each day were in good agreement but that the two pairs differed in their averages, then the third of these lines would contain a disproportionate part of the total variance. In this case it does have the larger part but not significantly so. The statistical test consists of dividing 0.0121 by the average for the other two. This ratio is a statistic commonly denoted by F and has been tabulated. The F table (1, 2, 4, 7) is a double entry table and is entered with n1 = 1 and n2 = 2 (the number of lines involved in the comparison). The tabulated value of F for the 5% level of significance is 18.5.
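The partition of Table III amounts to evaluating seven contrast vectors; the sketch below (illustrative only, not from the article) reproduces the single-degree-of-freedom entries and the F ratio just discussed.

```python
# Eight analyses in the order a1..a4, b1..b4.
measurements = [9.78, 9.84, 9.75, 9.65, 9.78, 9.79, 9.69, 9.86]

contrasts = [
    [1,  1,  1,  1, -1, -1, -1, -1],   # a set vs. b set        -> 0.00125
    [1, -1,  0,  0,  0,  0,  0,  0],   # a1 vs. a2              -> 0.00180
    [0,  0,  1, -1,  0,  0,  0,  0],   # a3 vs. a4              -> 0.00500
    [1,  1, -1, -1,  0,  0,  0,  0],   # (a1+a2) vs. (a3+a4)    -> 0.01210
    [0,  0,  0,  0,  1, -1,  0,  0],   # b1 vs. b2              -> 0.00005
    [0,  0,  0,  0,  0,  0,  1, -1],   # b3 vs. b4              -> 0.01445
    [0,  0,  0,  0,  1,  1, -1, -1],   # (b1+b2) vs. (b3+b4)    -> 0.00010
]

def contribution(coeffs, data):
    """(algebraic sum)^2 divided by the sum of the squared coefficients."""
    total = sum(c * x for c, x in zip(coeffs, data))
    return total ** 2 / sum(c * c for c in coeffs)

parts = [contribution(c, measurements) for c in contrasts]
print([round(p, 5) for p in parts])          # the seven single-df entries
print(round(sum(parts), 5))                  # 0.03475, the total variance

# F test quoted in the text: line 4 against the mean of lines 2 and 3
f = parts[3] / ((parts[1] + parts[2]) / 2)   # about 3.6, far below 18.5
print(round(f, 2))
```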
There are other ways in which the coefficients could have been assigned to the a measurements. The following set shows how a1 could be compared with the average of the other three (3a1 - a2 - a3 - a4), a2 with the average of a3 and a4, and a3 compared with a4.
        a1     a2     a3     a4      Sum    Σc²   Sum²/Σc²
I        3     -1     -1     -1      0.10   12    0.000833
II       0      2     -1     -1      0.28    6    0.013067
III      0      0      1     -1      0.10    2    0.005000
                                     Total        0.018900
It is a necessary requirement, in order that the total of the individual comparisons equal the total variance, that the comparisons be independent-in statistical terminology, orthogonal. It is not necessary to carry out the full arithmetic to test whether a set of orthogonal partitions has been assigned. The products of the corresponding coefficients must add up to zero for each pair of rows:

I × II: 0 - 2 + 1 + 1 = 0        I × III: 0 + 0 - 1 + 1 = 0        II × III: 0 + 0 - 1 + 1 = 0
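A brief sketch of that check, again only illustrative: the pairwise dot products of the coefficient rows vanish, and the three contributions still total 0.0189.

```python
a = [9.78, 9.84, 9.75, 9.65]

rows = {
    "I":   [3, -1, -1, -1],   # a1 against the mean of a2, a3, a4
    "II":  [0,  2, -1, -1],   # a2 against the mean of a3, a4
    "III": [0,  0,  1, -1],   # a3 against a4
}

# Orthogonality: the dot product of every pair of coefficient rows is zero.
names = list(rows)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        dot = sum(x * y for x, y in zip(rows[names[i]], rows[names[j]]))
        print(names[i], "x", names[j], "=", dot)                 # all zero

# Contributions: 0.000833 + 0.013067 + 0.005000 = 0.018900
for name, coeffs in rows.items():
    s = sum(c * x for c, x in zip(coeffs, a))
    print(name, round(s ** 2 / sum(c * c for c in coeffs), 6))
```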
A similar process for the b measurements gives a total for the three parts which agrees with the Σ(b - b̄)² of Table II. The total for all seven lines is 0.03475, which is equal to the sum of the squares of the deviations shown in Table I, where all the measurements are regarded as a homogeneous set. This reveals that the total variation or variance of all eight items about their common mean is accounted for by the difference between the two sets plus the combined variation within the sets. In making an analysis of variance the ratio, F (1, 2, 7), is computed. F is the ratio of the entry for the contrast between the sets divided by the mean of the residual entries for error.

    F = 0.00125 / [(0.0189 + 0.0146)/6] = 0.00125/0.005583 = 0.224
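For a contrast carrying a single degree of freedom, F is simply the square of the corresponding t; the following lines (added here for illustration) confirm the value 0.224 against the t of 0.474 found from Table II.

```python
between_sets = 0.00125                       # entry for the contrast a's vs. b's
error_mean_square = (0.0189 + 0.0146) / 6    # pooled within-set variance
f = between_sets / error_mean_square         # about 0.224
# For one degree of freedom this equals t^2: 0.473^2 is also about 0.224.
print(round(f, 3))
```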
If there is no difference between the sets, the expected values are the same for numerator and denominator. In particular examples the computed value may depart considerably from unity through the chance combinations of the errors of measurements. When the numerator is less than the denominator, it is usual practice to dispense with the computation, since the values listed in the F table are all greater than unity. A not uncommon case arises when a number of materials are compared and duplicate analyses are run for each material. Using the same data the results may be set forth as shown.
    a1   9.78      A1   9.75      b1   9.78      B1   9.69
    a2   9.84      A2   9.65      b2   9.79      B2   9.86
The four materials are identified as a, A, b, and B, and the duplicates are indicated by subscripts. The estimate of the error or standard deviation is found by summing the values for the second, third, fifth, and sixth lines of coefficients shown in Table III-that is, each pair of duplicates furnishes an estimate of the error. The mean of the four is 0.005325. The contrasts shown in lines 4 and 7, formerly available for estimating the error, are now expended on revealing the difference between a and A, and b and B. A little consideration must be given as to whether the partition of variance as given in lines 1, 4, and 7 corresponds to the interest of the experimenter.
In the event that there is no relationship among the four materials a, A, b, and B and it is merely desired to test whether differences exist among the materials, then the average for lines 1, 4, and 7 is found and the ratio of this to the average of the four lines for error is computed.
    F = 0.00448/0.005325 = 0.84        (n1 = 3; n2 = 4)
The ratio does not approach the 5% value of F, 6.59, taken from the F table. There is, then, no evidence in these data of differences between the averages for the four pairs. Now suppose the two letters correspond to two different sources of material and the small letters refer to the materials as received and the capital letters after recrystallization. Then the first line of Table III compares the two sources of material, the values for the material as received and after recrystallization being aggregated. Lines 4 and 7 test for each material separately the effect of recrystallization. An F value may be computed for each of these lines:
    F = 0.00125/0.00532 = 0.24;        F = 0.0121/0.00532 = 2.27;        F = 0.0001/0.00532 = 0.02

With but 4 degrees of freedom to estimate the error variance, the minimum F ratio at the 5% level of significance (n1 = 1, n2 = 4) is 7.71 and all three ratios are below this. No differences are expected, since the data are a homogeneous set.
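The F ratios just quoted can be verified with a few lines; the sketch below (not from the paper) uses the single-degree-of-freedom entries of Table III and the pooled error of 0.005325.

```python
error_lines = [0.0018, 0.0050, 0.00005, 0.01445]   # within-pair contrasts
error_ms = sum(error_lines) / len(error_lines)     # 0.005325, 4 degrees of freedom

# Overall test of differences among the four materials (lines 1, 4, and 7)
material_lines = [0.00125, 0.0121, 0.0001]
f_overall = (sum(material_lines) / 3) / error_ms   # about 0.84 vs. F(3, 4) = 6.59
print(round(f_overall, 2))

# Separate single-degree-of-freedom tests (n1 = 1, n2 = 4; 5% value 7.71)
for entry in material_lines:
    print(round(entry / error_ms, 2))              # close to the 0.24, 2.27, 0.02 quoted
```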
We have here the most rudimentary factorial experiment: two different materials have been analyzed, as received and after recrystallization. A more common arrangement of the coefficients treats the two factors (materials and recrystallization) symmetrically:

                     a1     a2     A1     A2     b1     b2     B1     B2      Sum    Σc²   Sum²/Σc²
Materials            +1     +1     +1     +1     -1     -1     -1     -1     -0.10    8    0.00125
Recrystallization    -1     -1     +1     +1     -1     -1     +1     +1     -0.24    8    0.0072
Interaction          -1     -1     +1     +1     +1     +1     -1     -1     -0.20    8    0.0050
                                                                       Total               0.01345
Note that the former values 0.00125, 0.0121, and 0.0001 also sum up to 0.01345. The first line is the same as before, a contrast of the two materials. The second line reveals the over-all effect-i.e., on both materials-of recrystallization. The meaning of the last line may be gathered by observing that it may be written as:

    (-a1 - a2 + A1 + A2) - (-b1 - b2 + B1 + B2)
Within each bracket there is shown the difference between the material as received and after recrystallization. Taken as a whole the expression sets up the difference between these two differences. It inquires whether the effect of recrystallizing is different for the two materials. This is identical with the query as to whether the difference between the two materials remains the same after recrystallization as it was prior to recrystallization. That this is a property of the coefficients may be seen by rearranging the order of writing to permit the grouping in brackets:

    (-a1 - a2 + b1 + b2) - (-A1 - A2 + B1 + B2)
This is the type of question which statisticians term an interaction. If the effect of recrystallization is the same on all materials, there is no interaction between the two factors: materials and recrystallization. This example displays a general characteristic of factorial experiments. In answering the questions as to a difference between materials, as to the effect of recrystallizing, and whether these effects are consistent, all the data are used each time. In each instance a different set of four is balanced against the remainder. The proper choice of these sets requires some thought. It is generally sought to have the partitions orthogonal-that is, independent-and to avoid overlapping questions. There are many ways in which the data may be divided orthogonally. The method chosen is properly determined at the time the experiment is planned and not subsequent to securing the data.
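As an illustration (not part of the original article), the three symmetric contrasts can be evaluated directly from the eight analyses; each entry is again Sum²/Σc², and each may be referred to the pooled duplicate error with 4 degrees of freedom.

```python
#            a1    a2    A1    A2    b1    b2    B1    B2
values = [9.78, 9.84, 9.75, 9.65, 9.78, 9.79, 9.69, 9.86]

rows = {
    "materials":         [ 1,  1,  1,  1, -1, -1, -1, -1],  # a, A vs. b, B
    "recrystallization": [-1, -1,  1,  1, -1, -1,  1,  1],  # received vs. recrystallized
    "interaction":       [-1, -1,  1,  1,  1,  1, -1, -1],  # difference of differences
}

error_ms = 0.005325   # pooled from the four duplicate contrasts, 4 degrees of freedom
for name, coeffs in rows.items():
    total = sum(c * v for c, v in zip(coeffs, values))
    entry = total ** 2 / sum(c * c for c in coeffs)
    print(name, round(entry, 5), "F =", round(entry / error_ms, 2))
# Entries: 0.00125, 0.00720, 0.00500 (sum 0.01345); none approaches F(1, 4) = 7.71.
```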
It may also give some substance to the term degrees of freedom to note that each line corresponds to one degree of freedom. Introducing the factor of recrystallization permitted two further questions to be answered by the experiment: first, as to the general effect of recrystallizing, and second, as to whether the effect was the same for both materials. The price paid is two of the degrees of freedom previously available for estimating the error.

It is not necessary in making a statistical analysis of a factorial experiment to break down the total variance of the observations into every individual degree of freedom. It is generally sufficient to partition the variance into several groupings which are indicated by the design of the experiment. All the degrees of freedom which may be properly assigned to the estimation of error are grouped without troubling to compute each individual component. The above examples were intended to show how the statistical analysis may be made to correspond in a precise way to the questions the experiment was expected to answer.

In the next section some published data are examined from the viewpoint of the analysis of variance. These data result from duplicate determinations of carbon on each of two presumably pure compounds, there being available results from ten different laboratories. The two factors are compounds and laboratories. The data will be examined to see if the two compounds respond differently, if the laboratories agree or disagree among themselves, and, if there is a difference between the compounds, whether the laboratories are consistent in reporting this difference. The duplicates make available an estimate of the precision of the analytical work.

One characteristic of chemical data sometimes causes confusion in discussions of the errors of measurement. The theoretical composition of a pure substance may be computed from its chemical formula. The results of the analytical work are often compared with the theoretical composition. Observed deviations from theory may be due to impurities in the compound taken for analysis. If it can be shown by some means that the compound is free from impurities in amounts which would explain the discrepancy, then the analytical procedure and the analyst are considered responsible for the deviation between observation and theory, and this deviation reveals the accuracy of the procedure in the hands of the analyst. The precision or reproducibility of the analytical process is revealed by the deviations of a series of individual analyses on the same compound from their common average. Chemists are chiefly concerned with the accuracy of their work. It is also important to examine the precision of the measurements. Consider two analytical procedures, both of which show an average deviation from theory of 0.20, and suppose that the average deviation of the analyses from their own average-that is, the precision-is in the first case 0.15 and in the other 0.05. It can be demonstrated that, in the first case, the accuracy will be greatly improved if the precision of 0.15 can be brought down to the value 0.05, since most of the deviation from theory can be accounted for by a lack of precision. In the second case most of the deviations from theory cannot be ascribed to a lack of precision and efforts to improve the precision would not help much.

The carbon analyses will be discussed from the viewpoint of the precision of the analyses. The precision is always revealed by agreement, or lack of it, between duplicates. This is not the only way in which the precision may be estimated, as the following experiment will show.
Suppose part of a stock of a chemical which runs 1% high in carbon is put through a purification process. Now suppose a series of paired analyses is run, each pair consisting of a sample from the original stock and a sample from the purified lot. The difference between these two analyses is tabulated for a series of such pairs. The values should hover around 1%, and it should be clear that the deviations of these figures from their average reveal the precision. A little consideration will show that this method of examining the precision is independent of the amount of original impurity or the success of the purification. It assumes only that the two lots are homogeneous and that the impurity is not
Table VI. Analysis of Variance for Data in Table V

                                              Degrees of   Sum of      Mean
                                              Freedom      Squares     Square      F       S.D.
Between compounds (over theory)                   1        0.59536     0.59536    43.46    ...
Between analysts                                  9        1.36057     0.15117    11.03    ...
Interaction between analysts and compounds        9        0.77097     0.08566     6.25    0.293
Duplicates                                       20        0.27395     0.01370     ...     0.117
Total                                            39        3.00085
The analysis of variance for these data is shown in Table VI. There are available (2, 3, 6, 7) numerous expositions on the methods of computation for the analysis of variance. The generalized formulas for analysis of variance computations will not be given here. The sums of squares in this example may be obtained by the following steps: The sum of squares for duplicates is equal to one half the sum of the squares of the 20 differences listed in Table IV, and the sum of squares for interaction is equal to the sum of the squares of the ten deviations in the last column of Table V. The sum of the squares for compounds has been computed to test whether the observed difference in carbon content between the compounds departs significantly from the theoretical difference of 9.29%. The observed value is 9.534, or 0.244% more than theory. The value 0.244 is squared and multiplied by 10, since this is the average result obtained by ten analysts. The sum of squares for analysts is obtained by adding for each analyst the two percentages shown in Table V. The sum of the squares of the deviations of these ten sums from their own average gives directly the value 1.36057. Any tendency for an individual analyst to run consistently high or low will inflate this sum of squares.

The numerical steps described above are somewhat different from the general formulas described in statistical texts, which are designed to eliminate computation errors due to rounding off of the entries in the columns of differences. The particular numerical operations performed above apply to duplicate analyses on each of two compounds.

The four entries in the column of mean squares are of special interest. The entry opposite compounds would approximate that for duplicates if the difference found experimentally between the carbon contents for the two compounds (after allowing for the theoretical difference) was no greater than would normally arise between two averages, each based on 20 analyses made with the precision shown by the difference between duplicates. The mean square is over 40 times that for duplicates, while the critical value of F at the 5% level of probability is 4.35. The results therefore are not in keeping with what would have been expected if the compounds had really possessed the theoretical composition. Similarly, the F ratio (n1 = 9; n2 = 20) for analysts is 11.03, whereas the 5 and 1% critical values are 2.40 and 3.45, indicating that greater differences existed between the laboratories than would have been anticipated if the only sources of variation were those that apply to duplicates conducted in the same laboratory. The interaction between analysts and compounds is equivalent to an estimate of precision based on the consistency of the difference in carbon content found in the ten laboratories. There is a significant disagreement in the two estimates (duplicates and interaction) of precision, as the computed F exceeds 2.40, the 5% value. The precision estimate based on interaction is the proper one to use in evaluating analyses done in different laboratories. Judged by this standard the analysts have not been proved to disagree among themselves, since the ratio 0.15117/0.08566 is less than 3.18, the 5% value (n1 = 9; n2 = 9).

The standard deviation is obtained by taking the square root of the mean square. It seems realistic to base the estimate of precision on the mean square for interaction-that is, the agreement among the ten differences in carbon content reported by the ten laboratories. This gives a standard deviation of 0.293% or about 5 parts per thousand. This is larger than the estimate that Power obtained for the accuracy by throwing all the results into one group and ignoring the distinction between analyses from the same laboratory and those from different laboratories. His estimate is therefore intermediate between that for duplicates (0.117) and that for interaction (0.293). If each laboratory had reported only one analysis for each compound, his computation would have yielded approximately 0.29. If each laboratory had made several analyses on each compound, the result of his computation would have been closer to that given here for duplicates.
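The arithmetic just described can be summarized in a short sketch. The carbon percentages of Tables IV and V are not reproduced above, so the `results` dictionary below holds purely hypothetical duplicate pairs and serves only to show the shape of the computation; with the actual data it would return the sums of squares of Table VI.

```python
theoretical_difference = 9.29   # % carbon, compound 2 minus compound 1 (from the text)

# results[laboratory] = ((compound1_dup1, compound1_dup2), (compound2_dup1, compound2_dup2))
# Hypothetical values, for illustration only.
results = {
    lab: ((55.10 + 0.01 * lab, 55.14 + 0.01 * lab),
          (64.45 + 0.02 * lab, 64.40 + 0.02 * lab))
    for lab in range(10)
}
n_labs = len(results)

# Duplicates: one half the sum of squares of the 2 * n_labs within-pair
# differences, one degree of freedom per pair.
dup_diffs = [pair[0] - pair[1] for compounds in results.values() for pair in compounds]
ss_duplicates = 0.5 * sum(d * d for d in dup_diffs)
ms_duplicates = ss_duplicates / (2 * n_labs)

# Each laboratory's observed difference between the compounds (means of duplicates).
lab_diffs = [sum(c2) / 2 - sum(c1) / 2 for c1, c2 in results.values()]
mean_diff = sum(lab_diffs) / n_labs

# Compounds (over theory): 10 * (observed mean difference - theoretical difference)^2.
ss_compounds = n_labs * (mean_diff - theoretical_difference) ** 2

# Interaction: squared deviations of the laboratory differences from their mean.
ss_interaction = sum((d - mean_diff) ** 2 for d in lab_diffs)
ms_interaction = ss_interaction / (n_labs - 1)

# F ratios against the duplicate mean square, as in Table VI.
print(round(ss_compounds / ms_duplicates, 2), round(ms_interaction / ms_duplicates, 2))
```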
A valid estimate of the precision must not depend on the particular apportionment of analyses between laboratories and duplicates. The technique of the analysis of variance clearly differentiates between the two estimates of precision. It extracts from the data the precision based on interaction, and that is the one which would more accurately predict the dispersion of reported differences between two compounds if these were sent to a number of different laboratories for analysis and analyzed in duplicate. This approach also confirms, by the contrast in magnitude of the two estimates of precision, that the reproducibility of the conditions as achieved between different laboratories is much less satisfactory than that maintained for duplicates run in one laboratory.

LITERATURE CITED
(1) Brownlee, K. A., "Industrial Experimentation," 2nd ed., Brooklyn, N. Y., Chemical Publishing Co., 1947.
(2) Fisher, R. A., "Statistical Methods for Research Workers," 8th ed., London, Oliver & Boyd, 1941.
(3) Fisher, R. A., "The Design of Experiments," 2nd ed., London, Oliver & Boyd, 1935.
(4) Fisher, R. A., and Yates, Frank, "Statistical Tables for Biological, Agricultural and Medical Research," 2nd ed., London, Oliver & Boyd, 1943.
(5) Power, F. W., Ind. Eng. Chem., Anal. Ed., 11, 660-73 (1939).
(6) Smallwood, H. M., J. Chem. Education, 23, 352 (1946).
(7) Snedecor, G. W., "Statistical Methods Applied to Experiments in Agriculture and Biology," 4th ed., Ames, Iowa, Iowa State College Press, 1946.
(8) Yates, F., "Design and Analysis of Factorial Experiments," Technical Communication 35, Harpenden, England, Imperial Bureau of Soil Science, 1937.
RECEIVED September 10, 1948.
Teaching Students How to Evaluate Data

THE very idea of introducing more subjects into an already overcrowded curriculum fills the teacher of chemistry with dread. However, the subject of statistics is so important in industry and is becoming so increasingly important in the development of analytical and testing methods, that some introduction to the subject should be given in undergraduate courses. Furthermore, in graduate courses in analytical chemistry, sufficient material on the application of statistics should be introduced so that prospective research chemists will realize that they can shorten their research work to an appreciable extent by the proper design of experiments and the proper use of the data obtained from their experiments. The present authors do not believe that sufficient material could be introduced in the usual courses in analytical chemistry to give the students sufficient background to handle statistical techniques competently, or even fully to understand statistical implications. However, the students can be made to realize the importance of the statistical approach and they can be taught a few items of elementary statistical manipulation. At present, introductory textbooks in quantitative analysis