Statistical methods in analytical chemistry

STATISTICAL METHODS IN ANALYTICAL CHEMISTRY*

JOHN MANDEL
National Bureau of Standards, Washington, D. C.

INTRODUCTION

Statistical methods, as used in the design and interpretation of chemical experimentation, are not a substitute for common sense or for what scientists refer to as scientific judgment; they rather constitute an objective aid to judgment. In view of the meticulousness exhibited by scientific workers in the purely technical aspects of their experiments, it seems appropriate to devote some thought to two further and no less important aspects of experimentation: its design and the final interpretation of its outcome. Very often the experimenter designs the experiment as it proceeds, acting on a moment's intuition. Similarly, the interpretation of results seldom goes beyond a graphical representation or a rudimentary study of data, the latter consisting in many cases in ranking studied effects in accordance with their observed experimental approximations, thus ignoring the possible disturbances caused by experimental and systematic errors. Naturally, even the most experienced worker is subject to an occasionally misleading intuition. The assumption is therefore natural, and actually borne out by the facts, that many experiments could have resulted in more conspicuous and sharper conclusions, had they been designed according to a carefully constructed plan.

In taking cognizance of the unavoidable experimental errors, rather than in ignoring them or dismissing them as negligible, and in attempting a mathematical study of the avoidance or the correction of systematic errors, statisticians have succeeded, to a certain extent, in offering some objective criteria which are invaluable in these situations. One of the important by-products of these methods is an estimate of the quantity of experimental work that is necessary for obtaining sufficient factual proof for a scientific hypothesis, such an estimate being more objective than the guessing technique often used. But among the most important contributions of statistical methodology to scientific experimentation is the possibility of clearly separating the effects of the various variables under study, as well as the interactions of these variables with regard to the measured quantities, from the data resulting from a complex experiment. It is clear that this requires careful planning. Even before the experiment is started, the various possible types of results must be hypothetically considered from the point of view of the questions under study.

No pretense is made to treat exhaustively or even to illustrate all these points in this discussion. In examining two sets of data, selected from work recently done at the National Bureau of Standards in connection with research on analytical test methods for rubber, we will be able, however, to note many important facts relating to the design and the interpretation of experiments. The first example selected illustrates the principles of design, while the second is more concerned with the final interpretation of data. It must be emphasized, however, that design and interpretation are, in the view of the modern statistician, but different phases of one fundamental methodology, as will be illustrated in the discussion of the examples.

* Presented before the Third Annual Microchemical Symposium, sponsored by the Metropolitan Microchemical Society of New York, on February 28, 1948, in New York City.

WATER ABSORPTION TEST ON SYNTHETIC RUBBER

The test here discussed consists in cutting a specimen from a sheet of rubber, dipping it in freshly distilled water, blotting it with filter paper, and weighing it. The specimen is then submerged in a beaker containing distilled water and the beaker is placed in an oven maintained at a constant temperature for a period of 20 hours. After this period the specimen is placed in distilled water at room temperature for 10 minutes, blotted with filter paper, and reweighed in a weighing bottle. This is certainly not an ideal example of precise chemical analysis. We will see, however, that the principles involved do not at all depend on the particular nature of this test and will apply with the same effectiveness to a wide variety of situations common in the analytical laboratory.

Previous results obtained by this test, which is of practical importance for GR-S used in wire and cable insulation, were very erratic. It was planned, therefore, to investigate the factors causing variability, the most important of which were considered to be:

(1) Batch-to-batch variation arising in the vulcanizing process of the raw rubber.
(2) Sheet-to-sheet variation within batches.
(3) Specimen-to-specimen variation within sheets.
(4) The temperature of the oven used in the test.

Consideration of the first three sources of variability, which reflect actual heterogeneity of the material, is necessary, because the application of the test to the vulcanized product requires a preliminary treatment of the raw rubber, viz., compounding and vulcanizing.

The design as well as the results of the experiment are shown in Table 1. Four batches were prepared, from each of which four sheets of vulcanized material, A, B, C, and D,


OCTOBER, 1949

TABLE 1
Water Absorption of GR-S for Wire and Cable Insulation, in mg. per cm.²

Columns: Batch; Test temp., °C.; Sheet and specimen (A1, A2, B1, B2, C1, C2, D1, D2)

were obtained. Each sheet was cut in half, thus giving two test specimens. In every batch the two specimens of the first sheet were tested at 70°C., the two specimens of the second sheet at 75°C., while from each of the remaining two sheets one specimen was tested at 70°C. and the other at 75°C.

The highly compact nature of this design should be noted. In contrast to many experiments carried out in analytical laboratories, this experiment did not consist in the study of one factor at a time; the design rather allows all of the variables to vary simultaneously. It will be seen that this circumstance in no way impairs the value of the experiment. On the contrary, the range of validity of the effect attributed to each variable is extended by the fact that the effect is observed for several values of each of the remaining variables. This fact will emerge with greater clarity from the statistical analysis of the data of Table 1, to which we now proceed.

The first step of the analysis consists in the coding of the data, by subtracting a constant, conveniently chosen, from all values and multiplying the resulting numbers by a power of 10, so as to eliminate the decimal point. The first transformation obviously does not affect the general pattern exhibited by the data, and the second transformation is merely a change in scale or unit. We have chosen 1.30 as the subtractive constant and 10² as the multiplicative factor. This elementary treatment of the data leads to Table 2, which already reveals some remarkable facts to a careful observer.

TABLE 2
Coded Data from Table 1

Columns: Batch; Test temp., °C.; Sheet and specimen (A1, A2, B1, B2, C1, C2, D1, D2)

In each batch the data relating to the sheets designated A and B are especially interesting from the point of view of specimen-to-specimen variability. The differences between the results of specimens cut from the same sheet and tested at the same temperature are tabulated in Table 3. Thus, for example, the entry 3 in this table is the difference between the results 13 and 16 for the two specimens, both tested at 75°C., of sheet B in batch I. No remarkable pattern is exhibited by these eight numbers, the average of which is 1.5 or, in the original unit, 0.015 mg./cm.² This number represents the average difference between duplicate measurements on the same sheet and could be taken as a measure of specimen-to-specimen variability. However, a statistically more satisfactory number is obtained by computing, from these eight numbers, the specimen-to-specimen standard deviation, $s_t$, as follows:

$$ s_t^2 = \frac{\tfrac{1}{2}\left[2^2 + 1^2 + \dots + 3^2 + \dots + 1^2\right]}{8} = 1.38 $$

The weighting factor 1/2 is necessary because each number is the difference of two original observations.
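As a rough check on the arithmetic above, the following Python sketch recomputes the average duplicate difference and the test-error variance from the eight coded differences. The function and variable names are illustrative assumptions and do not come from the paper; the differences are those listed in Table 3 for sheets A and B.

```python
# Coded specimen-to-specimen differences from Table 3 (sheets A and B):
# the 70 deg C column for batches I-IV, followed by the 75 deg C column.
duplicate_diffs = [2, 1, 1, 2, 3, 1, 1, 1]


def duplicate_variance(diffs):
    """Variance of a single observation estimated from duplicate differences.

    Each difference carries two test errors, hence the factor 1/2. The divisor
    is the number of pairs, because the expected difference is zero and no
    mean has to be estimated.
    """
    return 0.5 * sum(d * d for d in diffs) / len(diffs)


def decode(coded_value, scale=100.0):
    """Convert a coded difference or standard deviation back to mg./cm.^2.

    Only the scale factor matters here: the subtractive constant 1.30
    cancels in differences and does not affect a standard deviation.
    """
    return coded_value / scale


mean_diff = sum(duplicate_diffs) / len(duplicate_diffs)   # average difference
s_t_squared = duplicate_variance(duplicate_diffs)         # test variance (coded)
s_t = s_t_squared ** 0.5                                  # test standard deviation

print(f"average difference: {mean_diff} coded, {decode(mean_diff)} mg./cm.^2")
print(f"test variance:      {s_t_squared:.2f} coded")
print(f"test std deviation: {s_t:.1f} coded, {decode(s_t):.3f} mg./cm.^2")
```

Working in coded units keeps the sums of squares as small integers; the conversion back to the original unit is a single division at the end.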

TABLE 3
Specimen-to-Specimen Differences; Sheet, Batch, and Temperature Effects

        Specimen-to-specimen
        differences, test temp.                   Test temp.                     Average difference
Batch   70°C.    75°C.          Batch   Sheet   70°C.    75°C.    Difference   per batch
I       2        3              I       C        2       15       13           11.5
                                        D        5       15       10
II      1        1              II      C       -1       17       18           17.5
                                        D        0       17       17
III     1        1              III     C       12       24       12           13.0
                                        D       11       25       14
IV      2        1              IV      C        9       23       14           12.5
                                        D        8       19       11

Average difference for all batches: 13.6
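The "Difference" column of Table 3 can be partitioned numerically into batch and sheet components. The Python sketch below is an illustration under stated assumptions: the names are hypothetical, the weighting factors 1/2 and 1/4 reflect the number of test errors entering each quantity, and the 5 per cent point of F is quoted from a standard F-table for 3 and 4 degrees of freedom.

```python
# Coded "Difference" values from Table 3 (75 deg C result minus 70 deg C result),
# keyed by batch, in the order (sheet C, sheet D).
temperature_diffs = {
    "I": (13, 10),
    "II": (18, 17),
    "III": (12, 14),
    "IV": (14, 11),
}


def corrected_sum_of_squares(values):
    """Sum of squares about the mean: sum(x^2) - (sum x)^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


all_diffs = [d for pair in temperature_diffs.values() for d in pair]
batch_means = [sum(pair) / 2 for pair in temperature_diffs.values()]

# Total: each difference carries two test errors, hence the factor 1/2.
ss_total = 0.5 * corrected_sum_of_squares(all_diffs)
# Between batches: averaging two differences cancels the factor 2.
ss_batch = corrected_sum_of_squares(batch_means)
# Sheets within batches: each C-minus-D contrast involves four test errors
# (factor 1/4); no mean is subtracted because its expected value is zero.
ss_sheet = 0.25 * sum((c - d) ** 2 for c, d in temperature_diffs.values())

ms_total = ss_total / 7   # 8 differences, 7 degrees of freedom
ms_batch = ss_batch / 3   # 4 batches, 3 degrees of freedom
ms_sheet = ss_sheet / 4   # 4 contrasts, 4 degrees of freedom

f_ratio = ms_batch / ms_sheet
F_CRIT_5PCT = 6.59  # tabulated 5% point of F for 3 and 4 degrees of freedom

print(f"mean temperature effect: {sum(all_diffs) / len(all_diffs):.3f}")
print(f"total: SS={ss_total:.2f}  MS={ms_total:.2f}")
print(f"batch: SS={ss_batch:.2f}  MS={ms_batch:.2f}")
print(f"sheet: SS={ss_sheet:.2f}  MS={ms_sheet:.2f}")
print(f"F = {f_ratio:.1f}, significant at 5%: {f_ratio > F_CRIT_5PCT}")
```

Because the batch and sheet sums of squares add up to the total, any one of the three could be obtained by subtraction from the other two.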


JOURNAL OF CHEMICAL EDUCATION

The subscript t refers to the fact that the variability thus calculated is the best available measure for the test error. The square of the standard deviation is known as "variance"; therefore, $s_t^2$ is the test variance, or, more accurately, the variance of the test error. It is seen that $s_t = \sqrt{1.38} = 1.2$, or, in the original unit, 0.012 mg./cm.²

Table 3 also contains a rearrangement of the data relating to sheets C and D in Table 2, together with some elementary combinations of these data. The column labeled "Difference" contains differences between the results obtained on specimens cut from the same sheet but tested at different temperatures. Obviously, these differences can only be affected by two factors, specimen-to-specimen variability and temperature effect. If one remembers that the average specimen-to-specimen difference, as previously derived, is 1.5, it becomes apparent from examination of these eight numbers that a substantial increase in water absorption is obtained by raising the test temperature by 5°C. It is seen that, on the average, this increase is 13.6, i.e., 0.136 mg./cm.² Is this increase consistent between sheets of the same batch and between batches? An answer to these questions is obtained as follows:

(a) The over-all variance of these numbers is:

$$ \frac{\tfrac{1}{2}\left[13^2 + 10^2 + \dots + 11^2 - (13 + 10 + \dots + 11)^2/8\right]}{8 - 1} = 3.85 $$

The weighting factor 1/2 is used to compensate for the fact that each of the eight numbers, as a difference of two original observations, is subject to two test errors. Except for this weighting factor, the quantity corresponds to the well-known formula:¹

$$ \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{or} \quad \frac{\sum x^2 - (\sum x)^2/n}{n - 1} $$

(b) The batch-to-batch variance of the differences is:

$$ \frac{11.5^2 + 17.5^2 + 13.0^2 + 12.5^2 - (11.5 + 17.5 + 13.0 + 12.5)^2/4}{4 - 1} = 7.06 $$

Here the weighting factor is omitted, because averages of two differences are used. This averaging decreases the variance in the proportion 1:2, thus canceling the initial increase in variance in the proportion 2:1.

(c) The sheet-to-sheet variance is obtained by computing:

$$ \frac{\tfrac{1}{4}\left[(13 - 10)^2 + (18 - 17)^2 + (12 - 14)^2 + (14 - 11)^2\right]}{4} = 1.44 $$

The value of the weighting factor is readily explained, since each difference in the numerator involves four original observations, and therefore four test errors. The denominator, in this case, is n rather than n - 1, because the true average of the differences in the numerator, i.e., the average that would be obtained if a large number of such differences were available, must necessarily be zero. (In the long run, for every C exceeding D there must be a D exceeding C by the same amount.) Thus, no adjustment is made for the mean in the sum of squares in the numerator, and consequently no "degree of freedom"² is lost in the denominator.

The computations just outlined, as well as their interpretation, are summarized in the "Analysis of Variance" shown in Table 4.

TABLE 4
Analysis of Variance of Temperature Effects

Source of variability             Degrees of freedom   Sum of squares   Mean square
Total                             7                    26.94            3.85
Batch-to-batch                    3                    21.19            7.06
Sheet-to-sheet within batches     4                     5.75            1.44

It is interesting to note that the "sums of squares" for sheets and batches add up to the over-all sum of squares. This additivity is a general property of the sums of squares as well as of the degrees of freedom in any analysis of variance table. In this case its application would have allowed us to dispense with one of the steps (b) or (c).

The mean square for sheet variability is 1.44 as compared to 1.38 for test error. This close agreement indicates that no sheet-to-sheet variability of the temperature effect is detectable in the data. The batch-to-batch mean square, 7.06, on the other hand, exceeds the sheet-to-sheet mean square in the ratio 7.06/1.44 or 4.9 to 1. A statistical table, known as the F-table (see references 2, 3, 4), shows this value to be not quite significant. It is, however, sufficiently large to indicate the possible existence of some variation, from batch to batch, in the difference in water absorption at 70° and 75°C.

The analysis just described by no means exhausts the information contained in the data of Table 1. Thus, for example, columns 6 and 7 in Table 3 could be treated by analysis of variance,³ and several other facts could be extracted by a thorough analysis. For our purpose, however, which is purely illustrative, the computations outlined are believed to suffice.

ULTRAVIOLET ABSORPTION OF GR-S IN METHYLCYCLOHEXANE

The second example is a study of the ultraviolet absorption of a sample of GR-S synthetic rubber dissolved in methylcyclohexane. Measurements of optical density were made on this sample, at a wave length of 288 mμ, on two consecutive days, for ten different concentrations. The total number of measurements thus consists of 20 values, shown in columns 2 and 3 of Table 5. The instrument used was a Beckman ultraviolet spectrophotometer.

¹ See references 2 and 4.
² The concept "degree of freedom" is more fully explained in the next section of this paper.
³ Such an analysis would reveal that the water absorption value, at constant temperature, varies appreciably from batch to batch.


TABLE 5
Ultraviolet Absorption of GR-S in Methylcyclohexane: Observed Data and Least Squares Fit

Columns: Relative concentration; Optical density, observed (Day One, Day Two, Average); Optical density, calculatedᵃ; Residuals × 10⁵

ᵃ The method of computation is indicated in Table 6. The measurements corresponding to zero relative concentration, i.e., pure solvent in both cells, were made after a preliminary statistical analysis of the other data had indicated the existence