REPORT FOR ANALYTICAL CHEMISTS
The Statistical Approach in Analytical Chemistry: Why It Is Important
by Allan B. Calder, Rutherford College of Technology, Newcastle, England.
Statistics receives scant attention during the education of the average chemist and, as a result, most practicing chemists are ill-equipped to use statistical methods with discretion. But each observation in analytical chemistry, no less than in any other branch of scientific investigation, is inaccurate in some degree, and while the accurate value for the concentration of some particular constituent in the analysis material cannot be determined, it is reasonable to assume that the accurate value exists, and it is important to estimate the limits between which this value lies. It is therefore desirable, nay imperative, that the chemist should be familiar with the elements of statistical method, not only from the point of view of consistency in the general presentation of analytical results but in order to derive reliable estimates from the observational data. This is not meant to imply that the analytical chemist is not interested in the "accuracy" of his results but that he tends to interpret these results intuitively and therefore often rather inefficiently.

It must, of course, be understood that the statistical approach is concerned with the appraisal of experimental design and data whereas the analytical approach concerns only the analytical process. In other words, statistical techniques can neither detect nor evaluate constant errors or bias; the detection and elimination of inaccuracy are analytical problems. Nevertheless statistical techniques can assist considerably in determining whether or not inaccuracies exist and in indicating when procedural modifications have reduced them. It is also desirable that the chemist should look beyond the confines of the analytical laboratory and attempt to answer the question "What are we trying to do?" where "we" includes not only the analysts but those to whom the final results are to be reported.

Ideally the chemist should be his own statistician. He should be familiar with the fundamental concepts and should be a general practitioner unto himself. Only when relatively complicated issues are at stake, or the problem involves higher mathematical statistics, should the specialist be called in. It would be wrong, however, for the research worker to assume that
ALLAN B. CALDER is senior lecturer in inorganic chemistry at the Rutherford College of Technology, Newcastle upon Tyne, England. Previously (1947-1956) he was in charge of the spectrographic unit in the Edinburgh and East of Scotland College of Agriculture, where he dealt with the use of spectrographic equipment both for research and advisory purposes in agriculture. He has undertaken a special study of the normal variation in trace element content of hill soils and herbage and has devised a sampling technique suitable for advisory purposes and for a survey of soils and upland pastures in the East of Scotland area. Dr. Calder obtained the B.Sc. in chemistry at St. Andrews University (1943) and the Ph.D. in applied statistics at Edinburgh University (1955). He is the author of several papers dealing with trace elements, spectrochemical analysis, nomography, and the application of statistical methods in instrumental analysis and biology.

VOL. 36, NO. 9, AUGUST 1964
mathematical calculations by themselves can obviate the need for common sense and sound analytical technique in the chemical laboratory, and again in the use of statistical methods, as in the application of other scientific disciplines, there are pitfalls to be avoided. To quote from Chambers (5): ". . . statistical methods are merely tools for a research worker. They enable him to describe, relate, and assess the value of his observations. They cannot make amends for incorrect observations nor can they of themselves provide a single fact of psychology, biology, or any other subject of research." In what follows, I have selected by way of illustration various analytical problems to which statistical logic and methodology have been applied with advantage.
REDUCING OPERATIONAL TIME AND FATIGUE
Figure 1. Schematic Presentation of Breakdown of Process of Determination Into Component Steps
ANALYTICAL CHEMISTRY
Let us consider the Lundegardh flame-spectrographic method of analysis (7), which I used from 1951 to 1955 to determine the elements potassium, K, and manganese, Mn, in samples of hill herbage (4). This analytical method is of the multi-stage type, involving a chemical pretreatment (hydrochloric-acid extraction of the ashed-herbage aliquot) prior to an instrumental (spectrographic) assay. Figure 1 depicts the necessary steps in the entire process of determination from field to final laboratory evaluation. The word "sample" is used here in the chemical sense; that is, $P_1, P_2, \ldots, P_n$ in Figure 1 refer to small cuttings of herbage taken at random from a relatively large area of hillside, uniform within itself with regard to vegetation and soil type. Let us now suppose that we have drawn $n$ random samples ($P_1, \ldots, P_n$) from one such area and that each sample has been subjected to the analytical treatment (as for $P_1$ in Figure 1). In an imperfect world it would indeed be surprising if all $n$ final results were identical; in other words we should expect a certain amount of scatter. The question arises, however, "How is this scatter or error distributed
Figure 2. Schematic Presentation of Flame Spectra

Table 1. Errors of Determination (per cent)

Element    $v_T$    $v_{anal.}$    $v_{instr.}$
K           20        6              3
Mn          35        7              5
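As a rough sketch of how the coefficients of variation in Table 1 separate into stage-by-stage components, the snippet below assumes that independent error sources add in variance, so that the chemical and sampling contributions follow by subtraction. The function and dictionary names are illustrative only and do not come from the original article.

```python
# Sketch: split the Table 1 errors (percentage coefficients of variation)
# into variance components, ASSUMING independent errors add in variance:
#   v_T^2 = v_instr^2 + v_chem^2 + v_sampl^2,  v_anal^2 = v_instr^2 + v_chem^2

TABLE_1 = {
    # element: (v_T, v_anal, v_instr), all in per cent
    "K": (20, 6, 3),
    "Mn": (35, 7, 5),
}


def variance_components(v_total, v_anal, v_instr):
    """Return the variance (v squared) attributable to each stage."""
    var_instr = v_instr ** 2
    var_anal = v_anal ** 2
    var_total = v_total ** 2
    return {
        "instr": var_instr,
        "chem": var_anal - var_instr,    # chemical pretreatment alone
        "anal": var_anal,                # instrumental + chemical
        "sampl": var_total - var_anal,   # field sampling alone
        "total": var_total,
    }


for element, errors in TABLE_1.items():
    print(element, variance_components(*errors))
```

Working in variances rather than in the errors themselves is what makes the subtraction legitimate; the coefficients of variation are not additive.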
among the several stages of the entire process?" Before proceeding further with this problem it is necessary to introduce a certain amount of elementary statistical arithmetic. Using the notation employed by Davies (6), we express an error or variation as a percentage coefficient of variation ($v$), which is derived from the standard deviation ($s$) according to the relation $v = 100s/\bar{X}$, where $\bar{X}$ is the mean of $n$ observational data. The standard deviation is given by

$$s = \left[\frac{\sum (X - \bar{X})^2}{n - 1}\right]^{1/2}$$

with the usual notation. Further, the square of the standard deviation [i.e., the variance ($s^2$)] has the advantage of additivity in the sense that if there be a number of independent causes of variation operating on a system with variances $s_1^2, s_2^2, \ldots, s_r^2$, the total variance is $s_T^2 = \sum s_r^2$, and since by definition $v \propto s$, we can put $v_T^2 = \sum v_r^2$. Returning to our original theme, the normal variation or total error of determination is therefore distributed among the several stages of the process according to the relation

$$v_T^2 = v_{instr.}^2 + v_{chem.}^2 + v_{sampl.}^2 \qquad [1]$$

How do we determine the magnitude of these quantities? $v_T^2$ is obtained by computing $s_T^2$ from the analytical data for $n$ samples (i.e., $P_1, P_2, \ldots, P_n$); $v_{anal.}^2$, i.e., $v_{instr.}^2 + v_{chem.}^2$, is obtained by eliminating the effect of $v_{sampl.}^2$ and computing $s_{anal.}^2$ from the data for several similar aliquots of the same herbage sample; i.e., the complete analysis is performed on each of $n$ similar portions of a carefully-mixed single ground sample, e.g., $P_1$. Similarly $v_{instr.}^2$ is derived via $s_{instr.}^2$, which is calculated from the analytical results obtained for $n$ aliquots of the final extract from (say) $P_1$. In this investigation (4) I found $v_T$ for any particular element to be generally of the same order of magnitude for areas uniform within themselves but differing in vegetation and soil type. The values for $v_T$, $v_{anal.}$, and $v_{instr.}$ are shown in Table 1. From the above data and relation [1] we find the errors to be distributed in the manner shown in Table 2.

Table 2. Distribution of Errors
Variance (as $v^2$; $P = 0.68$)

Element    $v_{instr.}^2$    $v_{chem.}^2$    $v_{anal.}^2$    $v_{sampl.}^2$    $v_T^2$
K             9                27               36               364              400
Mn           25                24               49              1176             1225

(The statistical results in Tables 1 and 2 have been expressed in terms of probability $P$ (= 0.68 here). What does this signify? Simply that when an analyst derives his error in the normal way via the standard deviation and quotes it as 20 per cent (for K, say) he is working at a level of $P = 0.68$. This means that, on the average, in the general run of analysis, one out of
three measurements may be expected to deviate from the mean by more than one standard deviation. It is more usual to accept a $P$ level of 0.95, which means that one measurement in twenty, on the average, will deviate from the mean by more than two standard deviations).

It is important to note that the errors and therefore variances refer to single herbage samples; i.e., according to the above scheme of analysis the analytical results obtained for K and Mn in a single (chemical) sample drawn at random from the field will be reproducible to within ± 20 per cent and ± 35 per cent, respectively. We could reduce this error in one of two ways: (i) we could draw $n_1$ random samples (i.e., $P_1, P_2, \ldots, P_{n_1}$) and submit each to an analysis: the average of the $n_1$ analytical results would be accurate to within $v_T/\sqrt{n_1}$, i.e., the total variance would now become $400/n_1$ for K and $1225/n_1$ for Mn; or (ii) we could select $n_2$ random herbage samples, form a homogeneous composite from these, and submit only one aliquot from this composite to analysis: the analytical results obtained would be precise to within $\pm [v_{anal.}^2 + v_{sampl.}^2/n_2]^{1/2}$ per cent,

Table 3. Comparison of Precisions for Single and Duplicate Spectra ($P = 0.68$; $n_2 = 25$)

Spectra      Element    $v_{instr.}^2$    $v_{chem.}^2$    $v_{sampl.}^2/n_2$    $v_T^2$    $v_T$
Duplicate    K           9                 27               364/25                50.56       7.1
Duplicate    Mn          25                24               1176/25               96.04       9.8
Single       K           9 × 2             27               364/25                59.56       7.7
Single       Mn          25 × 2            24               1176/25              121.04      11.0

Table 4. Background Readings

$\theta_B$(Mn)    $\theta_B$(K)    $\theta_B$(Mn) − $\theta_B$(K)
25.9              26.3             −0.4
26.9              25.9             +1.0
25.9              26.0             −0.1
26.7              26.1             +0.6
26.5              26.4             +0.1
25.8              26.0             −0.2
26.1              27.4             −1.3
27.5              26.3             +1.2
26.2              26.1             +0.1
25.8              25.8              0.0
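The entries of Table 3 can be checked with a few lines of arithmetic. The sketch below assumes, as the "× 2" entries in the table suggest, that a single spectrum carries twice the instrumental variance of a duplicate pair, and that compositing $n_2$ samples divides only the sampling variance by $n_2$. Names such as `composite_error` are hypothetical and are not taken from the article.

```python
import math

# Variance components (as v squared) for results from duplicate spectra,
# taken from the variance table for K and Mn.
COMPONENTS = {
    "K": {"instr": 9, "chem": 27, "sampl": 364},
    "Mn": {"instr": 25, "chem": 24, "sampl": 1176},
}


def composite_error(components, n_samples, duplicate=True):
    """Total variance and error for one aliquot of an n_samples composite.

    ASSUMPTION: recording a single spectrum instead of a duplicate pair
    doubles the instrumental variance; the other components are unchanged.
    """
    var_instr = components["instr"] if duplicate else 2 * components["instr"]
    var_total = var_instr + components["chem"] + components["sampl"] / n_samples
    return var_total, math.sqrt(var_total)


for element, comp in COMPONENTS.items():
    for duplicate in (True, False):
        var_total, v_total = composite_error(comp, n_samples=25, duplicate=duplicate)
        label = "duplicate" if duplicate else "single"
        print(f"{element:2s} {label:9s} v_T^2 = {var_total:6.2f}  v_T = {v_total:4.1f}")
```

Changing `n_samples` shows how quickly the sampling term stops dominating the total, which is the consideration that governs the choice of replication at the sampling stage.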
i.e., the total variance would become $[36 + 364/n_2]$ for K and $[49 + 1176/n_2]$ for Mn. The choice of procedure must be left to the analyst, and that choice will be governed by the relative magnitudes of the analytical and sampling errors, the time factor in analysis and sampling, and the degree of reproducibility desired. These considerations will therefore determine the values for $n_1$ and $n_2$. In the original investigation I used a 25-times replication at the sampling stage (i.e., $n_2 = 25$) so that the limits of reproducibility in the final result were those shown in the upper half of Table 3.

Now in the instrumental assay employed for determining cations in plant extracts, as depicted in Figure 1, it has always been specified that duplicate spectra be recorded for each portion of solution to be examined. The idea behind this is to increase the reproducibility of the instrumental performance. But however much this may be justified in practice from the purely analytical point of view, is there really any justification for such replication when we view the process of determination as a whole? In Table 2 the instrumental variances for K and Mn are respectively 9 and 25, and these refer to results evaluated from duplicate spectra. Suppose we were to record only one spectrum per portion of solution to be examined; this would be equivalent to doubling the instrumental variance. Hence if we evaluate our final result from a single spectrum only, but still using a 25-times replication at the sampling stage, we should then obtain the precisions given in the lower half of Table 3. It is seen that the $v_T$ differences as between single and duplicate instrumental assays are not sufficiently large to justify a preference
for either alternative. The operational time, however, is substantially reduced when only single spectra are taken, being approximately half of that normally expended.

Let us now investigate more closely the instrumental stage of the analysis (1). In flame-spectrographic methods we plot $\theta_E/\theta_B$ against concentration, where $\theta_E$ = galvanometer deflection (i.e., the "blackening") for the analysis line; $\theta_B$ = corresponding deflection for the background (i.e., the background spectrum of the carbon monoxide flame from an air-acetylene source) measured at a convenient specified point in the immediate neighborhood of the analysis line concerned, as shown in Figure 2. The background varies in density throughout the length of the spectrum, but at any particular wavelength it should be reproducible for a series of spectra. Its introduction in the method therefore compensates for fluctuations in the light source or small variations in total emission from one spectrum to the next. It therefore serves as an internal standard. Each photographic emulsion carries, in addition to the sample spectra, a number of standard spectra from which the $\theta_E/\theta_B$ vs. concentration relationship is derived. In determining the elements K and Mn we must obtain from our photometer four readings: