Statistical Methods in Chemistry - Analytical Chemistry (ACS

REVIEW OF FUNDAMENTAL DEVELOPMENTS IN ANALYSIS

Statistical Methods in Chemistry

JOHN MANDEL and FREDERIC J. LINNIG, Testing and Specifications Section, National Bureau of Standards, Washington 25, D. C.

Considerable progress, both in theory and in the number of applications, has been made since the last review of statistics in chemistry by Hader and Youden (89), in this journal. Because the authors felt that the present review should concentrate on general developments rather than present a comprehensive bibliography, many of the papers described here in detail have been selected because they are convenient illustrations of the ideas and methods discussed in the text. Inclusion or omission of a particular paper in this bibliography should not be considered as an attempt to judge its merit. Furthermore, while an effort was made to cite as many articles as possible from the chemical literature, it has been indispensable, in view of the intended nature of this review, to refer frequently to statistical journals. In doing this, priority was given to those articles in which the underlying ideas, rather than mathematical proofs, are presented.

The selection of topics for discussion is necessarily somewhat arbitrary and subjective. It is based, not so much on logical order, as on a consideration of the current trends in statistical methodology. Here again, as in the listing of individual papers, completeness has not been attempted. Thus, a variety of topics which can legitimately be considered as statistical and useful in some chemical applications were nevertheless omitted. The topics that are considered deal, broadly speaking, with the drawing of inferences from measurements obtained in planned experiments.

ACCURACY AND PRECISION OF METHODS AND EQUIPMENT

The concepts of accuracy and precision are examined in detail by Eisenhart (65), who proposes definitions based on an operational viewpoint. With regard to the definition of precision, Eisenhart points out that the reproducibility of a method depends on the conditions under which the repeated measurements are carried out. If, in the sequence of repeated measurements, more than one instrument, or operator, or laboratory is used, the variability due to these factors will affect the precision of the method. The operational definition of precision would therefore specify the exact nature of the sequence of repeated measurements. In regard to accuracy, Eisenhart remarks that unbiasedness, i.e., complete absence of systematic errors, does not yield its full benefit unless enough replicate measurements are made to reduce the standard error of their average to a negligible value. Thus, in cases where only a few replicate measurements are averaged, a biased but reproducible method is often preferable to one with no systematic errors but less reproducibility.

Youden in an earlier paper (213) and in his book (216) described a method involving linear regression techniques for resolving relative and constant type errors in studying the accuracy of a procedure, using the “standard error of estimate” as a measure of precision. Recently, this approach has been used in the study of test methods (92, 116, 126). Linnig, Mandel, and Peterson (125) describe a plan for studying accuracy and precision based on this concept and on analysis of variance techniques. A flexible relationship between the concepts of accuracy and precision as applied to a given analytical procedure is indicated, along with types of factors affecting each. Lark (116) has indicated the necessity, in studies of this type involving regression analysis, of using joint tests of significance on both the slope and intercept in order to allow for the interdependence of their errors in making “confidence statements.” Mood’s book (147) constitutes an excellent source on joint confidence tests, including computational details. This subject is of great general importance and is dealt with in more detail in the section on the interpretation of data.

Mandel and Stiehler (137) have introduced the concept of sensitivity to compare the merits of alternative methods of test. Sensitivity, as defined by these authors, relates to the ability of a method of measurement to detect small differences in the property under study and involves, besides the standard deviation of the measurement, the relationship between the property and the measured quantity. The use of various recently developed statistical techniques for the study of analytical procedures is effectively described in two expository articles by Youden (814) and Box (20). In methods requiring standard materials, the difficulties due to chemical instability, such as in the case of photographic materials, have been dealt with by K e s t (20;) by means of statistical techniques.

The precision of results of microanalyses is appreciably influenced by the observer’s unconscious preferences for certain fractions in estimating the last place on the pointer scale of the balance. Gysel (58) has shown that these preferences differ from person to person but are remarkably constant for a given individual. This phenomenon may well deserve consideration in the calibration and use of other types of equipment. A number of articles (J.3, 62, 66, 74, 101, 107, 116, 138, 160, 163-165, 180, 1P4, 210) involve the study and calibration of equipment of various types.

Many articles have appeared, covering a wide range of problems involving accuracy and precision (6, 7, 16, 19, 26, 31, 32, 42, 61, 67, 75-77, 81, 92, 96, 100, 102, 114, 118, 182, 127, 133, 242-144, 146, 166, 163, 178, 179, 188, 183, 185, 187, 189, 138, 202, 218).

DESIGN OF EXPERIMENTS

Numerous applications of the principles and procedures of statistical design are recorded in the recent literature, with particular emphasis on factorial designs and latin squares (1, 8, 28, 23, 38, 45, 61, 56, 79, 82, 95, 103, 112, 170, 13D! 145, 150, 151, 162, 166, 167, 171, 174, 175, 1 9, 197, 203, 212). At the same time, new trends are emerging, which are of particular interest in the physical sciences. A number of these are discussed below.

Fractional factorial designs are a class of designs that have been considered of particular usefulness in preliminary or exploratory investigations of the effects of many controlled variables on a measured quantity. In these designs, a selection is made of the factorial combinations, instead of the complete coverage required in full factorials. The selection is such that only effects that are deemed of secondary importance (generally high-order interactions) are sacrificed. A lucid description of these designs is given in an expository article by Brownlee (28). Other articles of interest have appeared in the literature (43, 108, 196). The texts by Davies (47), Bennett and Franklin (16), and Cochran and Cox (37) contain well-written chapters dealing with fractional factorials. Briefer accounts are given by Brownlee (29), Wilson (211), and Youden (216).

VOLUME 28, NO. 4, APRIL 1956

There has been a greater utilization of a class of designs useful in scheduling experiments, in the sense of making the most efficient use of available equipment, personnel, time, and similar factors. Compensation for the systematic errors introduced by variation of these factors is accomplished through the device of partitioning the measurements into statistical “blocks.” A block is a group of measurements characterized by some constancy of experimental conditions, such as a single day, the same oven, or the same batch, or sheet, or roll of material. The application of block designs to chemical experimentation is by now well established. In “randomized block designs,” treated in a number of texts (5, 16, 29, 37, 47, 54, 147), all the treatments occur in each block in a randomized order, the term “treatments” referring to the categories that are to be compared in the test, e.g., different materials, different concentrations, different temperatures, different solvents, or different brands of a commercial product.

Further progress was accomplished through the introduction of “incomplete blocks,” which are required in situations where the blocks are too small to include one measurement for each treatment. The bias due to variations among blocks is eliminated in these designs by proper overlapping of some of the treatments in all blocks. How this is accomplished is clearly described in nonmathematical terms by Youden (216), and some applications are described in the literature (82, 94, 121, 193). Certain incomplete block designs have the maximum possible symmetry, in which case they are called “balanced” (5, 16, 29, 37, 47, 216) and lead to a constant precision for all pairwise comparisons of treatments.
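The balance property described above can be checked mechanically. The following is a minimal sketch, not taken from any of the cited texts: the layout of 7 treatments in 7 blocks of size 3 is an illustrative assumption, and the function names are hypothetical. A design is treated as balanced when every pair of treatments occurs together in the same number of blocks.

```python
from collections import Counter
from itertools import combinations

# Illustrative layout (an assumption): 7 treatments, labeled 0..6,
# arranged in 7 blocks of 3 measurements each.
BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]


def pair_counts(blocks):
    """Count how many blocks contain each unordered pair of treatments."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def is_balanced(blocks, treatments):
    """Return True if every pair of treatments shares the same number of blocks."""
    counts = pair_counts(blocks)
    all_pairs = combinations(sorted(treatments), 2)
    # A Counter returns 0 for pairs that never occur together.
    return len({counts[pair] for pair in all_pairs}) == 1


if __name__ == "__main__":
    print(is_balanced(BLOCKS, range(7)))
```

In this layout each treatment is measured three times and each pair of treatments meets in exactly one block, which is why all pairwise comparisons have the same precision.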
However, in order to achieve balance, it is often necessary to include a rather large number of measurements for each treatment. Designs that allow for proper scheduling with fewer measurements per treatment are now available and are known as “partially balanced incomplete blocks” (37). In applications to chemistry, the disadvantage of having slightly different precisions for the various treatment comparisons appears of secondary importance because of the relatively high precision of measurements in the physical sciences and is generally more than compensated for by the economy of these designs. An interesting class of partially balanced incomplete block designs, known as “chain blocks,” has been introduced by Youden and Connor (217). Mandel (135) describes a flexible class of designs, related to the chain blocks and requiring no more than two measurements per treatment. An application of the latter is described by Rothman and coworkers (172). An elementary discussion of chain block designs and their usefulness in scheduling experiments is available (149). Balanced incomplete block designs are tabulated in a number of places (37, 47, YO), and a compilation of an important class of partially balanced incomplete block designs, with instructions for analysis, has been made available by Bose, Clatworthy, and Shrikhande (18).

Designs based on blocks of size two, i.e., containing only two measurements per block, have been extensively studied, and an up-to-date account is given by Clatworthy (36). They have been used by Youden (216) for the detection and measurement of instrumental drift. Page (152) explores their possible use for the calibration of meter bars.

Sequential designs are schemes of stepwise experimentation in which each step depends on the results of the previous experiments. Gore (78) describes a sequential plan based on 2 X 2 latin squares.
Box and his coworkers (21, 2 ~ i) have developed extensive schemes of sequential experimentation for the exploration of “response surfaces,” a name given by these authors to the result of a measurement such as yield, considered as a function of a number of controlled variables (geometrically, a surface in a higher-dimensional space). Box’s efforts are aimed more specifically at the determination of conditions leading to a maximum of the measured quantity. Good expository discussions are given by Box (20) and by Read (161), and a comprehensive chapter is included by Davies (47).

The problem of selecting from among several populations the one that is optimum with respect to some property has also been treated by Bechhofer (10, 12, 13), with particular reference to the determination of the minimum number of measurements that are needed to make this selection with a risk of error not to exceed a preassigned value. The author describes his method in an easily read expository article (11). The techniques introduced by Bechhofer are also noteworthy in another respect discussed in the section on the interpretation of data.

INTERPRETATION OF EXPERIMENTAL RESULTS

Most catalogued statistical designs are such that the “least squares” solutions (6, 48, 54, 147) for the effects of factors or blocks or for the comparisons among treatments are obtained with very little arithmetic. They also lend themselves readily to treatment by the “analysis of variance” technique (5, 16, 29, 48, 64, 147, ? I S), which provides an estimate of experimental error and serves as a guide in ascertaining the significance of the effects of the various factors on the measured properties. It has been recognized, however, that analysis of variance is generally only a first step in the interpretation of data.

Eisenhart (62) has pointed out that the interpretation of the analysis of variance depends on the nature of the factors involved. Thus, the factor “day of experimentation” is of a random type, whereas a factor such as “catalyst” is termed “fixed.” Eisenhart defines certain “models” based on the nature of the factors in the experiment. The distinction is particularly important when it is desired to derive from the mean squares measures known as “components of variance” to estimate the effects of certain factors, since the steps in the calculation depend on the model involved. The present state of knowledge in this field is reviewed in a comprehensive article by Crump (41). Pertinent discussions are given in the books by Mood (147), Anderson and Bancroft (5), and Bennett and Franklin (16). Some applications of components of variance appear in the literature (119, 165, 165).

Hamaker (93), in an expository article dealing with industrial applications, emphasizes the usefulness of a technique combining analysis of variance and regression analysis, for factors susceptible of continuous variation on a scale, such as temperature or concentration.
By this technique, the mean square corresponding to the effect of such a factor is decomposed into components relating to linear, quadratic, cubic, and, if necessary, higher order terms. Thus, it may be judged whether, within the limits of experimental error, the relation between the measured quantity and this factor is linear, quadratic, etc. This technique is also described by Brownlee (29) and in greater detail by Davies (47).

A number of recent investigations are of particular interest from the viewpoint of the validity and the scope of inferences drawn from experiments. They emphasize the efforts made in recent years to clarify the basic principles underlying the application of statistics. These developments are already reflected in some applications and are likely to affect to an even greater extent future work in applied statistics. It is natural that a variety of approaches have been proposed, differing in fundamental concepts. In practice, however, this diversity of viewpoints permits the experimenter to select that approach which he considers most appropriate for the problem at hand.

Especially noteworthy is the fundamental work of Scheffé (176) and Tukey (199), by which the heretofore unsolved problem of the reliability of “multiple comparisons” receives a rigorous solution. The difficulty resulted from the effect of the interdependence of conclusions drawn from the same set of data on the joint reliability of these conclusions. In solving the problem, Scheffé and Tukey have taken the viewpoint that probability statements concerning the reliability of conclusions drawn from a set of data should concern the experiment as a whole, i.e., the totality of all conclusions drawn from it, rather than each conclusion individually.

An important practical aspect of this problem deals with the construction of confidence intervals (16, 91, 147, 211), i.e., intervals within which the true values that are sought can be said to lie with a given probability. If the intervals corresponding to these various values are not statistically independent, it may sometimes be advisable to evaluate their reliability jointly rather than separately for each interval. This joint procedure leads to ellipsoidal confidence regions from which the individual confidence intervals can be derived, and the probability measuring the confidence of the conclusions (the “confidence coefficient”) applies to the entire region, rather than to individual intervals. An excellent and not too mathematical account of the problem is given by Durand (59). This author points out that joint confidence regions are more particularly useful when a multiplicity of conclusions are likely to be drawn from the experiment and the choice of comparisons must be deferred until after the data are obtained.

Duncan (57), in dealing with the problem of grouping a set of treatments on the basis of their observed averages, uses a different approach to the problem of multiple comparisons. If a significance level $\alpha$ is adopted, e.g., $\alpha = 0.05$, Duncan associates with each comparison of treatments a probability of error $\alpha'$, such that $1 - \alpha' = (1 - \alpha)^{p-1}$, where $p$ is the number of treatments involved in the comparison. It is seen that only comparisons involving two treatments ($p = 2$) are made at the specified significance level, $\alpha$. As the number of treatments involved in a comparison increases, the level $\alpha'$ increases and the reliability of the comparison decreases. Thus, Duncan’s method is less stringent than Scheffé’s, which requires that the probability of even a single error in the totality of all comparisons among treatments be at most $\alpha$.
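To make the growth of Duncan's error probability concrete, the relation between the nominal level and the level attached to a comparison of p treatments can be evaluated directly. The short sketch below only evaluates that relation; the function name is a hypothetical label chosen for illustration.

```python
def duncan_level(alpha, p):
    """Error probability alpha' for a comparison involving p treatments,
    from 1 - alpha' = (1 - alpha) ** (p - 1)."""
    if p < 2:
        raise ValueError("a comparison involves at least two treatments")
    return 1.0 - (1.0 - alpha) ** (p - 1)


if __name__ == "__main__":
    # Tabulate alpha' for a nominal level of 0.05.
    for p in range(2, 7):
        print(p, round(duncan_level(0.05, p), 4))
```

For a nominal level of 0.05, a comparison spanning three treatments is made at a level of 0.0975, nearly twice the nominal value.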
An application of Duncan’s method to a problem of taste testing has been described (68).

A striking illustration of the importance of the interdependence of confidence statements derived from the same data is given by Lark (116). In a situation requiring a decision as to whether an observed set of data fits a straight line with slope equal to unity and intercept equal to zero, Lark shows that separate 95% confidence intervals for the slope and intercept lead to an obviously erroneous conclusion, while the conclusions based on a joint confidence ellipse for slope and intercept are entirely reasonable.

Joint confidence intervals arise also in experiments involving the simultaneous measurement of several properties. Daniel and Riblett (49) evaluate the effects of several controlled variables in the preparation of a catalyst by means of measurements of its activity and selectivity. They apply the results of multivariate analysis (147) for the construction of a joint confidence region, in this case an ellipse in a plane whose coordinates are activity and selectivity. The joint analysis shows that one of the factors, which on the basis of separate tests of significance for selectivity and activity was apparently without effect, is actually clearly significant, while two other factors, which on the basis of the individual tests appeared definitely significant, are actually of doubtful significance.

Joint confidence regions are also appropriate in the quality control of materials or consumers’ products. Beall and Pascoe (9) present an interesting procedure for the quality control of paper based on two properties, such as basis weight and burst strength. Their method, which also involves multivariate analysis, permits the elimination of material that is deficient in either one or in both properties, with due consideration of the correlation that exists between the two tests.
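The joint test on slope and intercept discussed above in connection with Lark's example can be sketched with ordinary least-squares theory. The code below is an illustration under stated assumptions, not Lark's own computation: it fits a straight line, then forms the statistic that is compared with a tabulated F value for 2 and n - 2 degrees of freedom. The hypothesized line lies inside the joint confidence ellipse when the statistic is below that critical value. The function name and the data are hypothetical.

```python
def joint_f_statistic(x, y, slope0=1.0, intercept0=0.0):
    """F statistic (2 and n - 2 degrees of freedom) for testing jointly
    that the fitted line has slope `slope0` and intercept `intercept0`."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))

    # Ordinary least-squares estimates of slope and intercept.
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n

    # Residual mean square about the fitted line.
    resid_ss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    s2 = resid_ss / (n - 2)

    # Quadratic form in the deviations from the hypothesized values; the
    # cross term carries the interdependence of the two estimates.
    d0 = intercept - intercept0
    d1 = slope - slope0
    quad = n * d0 * d0 + 2 * sx * d0 * d1 + sxx * d1 * d1
    return quad / (2 * s2)


if __name__ == "__main__":
    added = [1.0, 2.0, 3.0, 4.0]   # hypothetical amounts added
    found = [1.1, 1.9, 3.2, 3.8]   # hypothetical amounts found
    print(joint_f_statistic(added, found))
```

The cross term in the quadratic form is what separate intervals for slope and intercept ignore, which is the source of the discrepancy described in the text.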
A somewhat different application of multivariate analysis is made by Fisher, Hansen, and Norton (68) in the simultaneous analysis of glucose and galactose by a spectrocolorimetric method. Observations are made at two wave lengths and the results are used to solve two simultaneous quadratic equations in the two unknowns. The absorbance at each wave length can be considered as a function of the concentrations of glucose and galactose. Tested separately, these functions for the two wave lengths appear to be linear, while a joint analysis reveals that quadratic terms must be included. The reason for the inadequacy of separate statistical tests for the two wave lengths is the high correlation between the errors of the two measurements.

The usefulness of experimental results depends to a large extent on their precision, which, in turn, is a function of the measuring process. Considerations of precision, accuracy, and sensitivity, discussed in an earlier section, will generally be used to select the better measuring process from among alternate choices; but even after a particular method of measurement has been selected, the precision of the final results still depends on the number of replicate measurements and on the statistical procedure used in the processing of the data. One way of specifying the precision of the final results is to require that effects of a given, preassigned magnitude shall be statistically significant and therefore detected. The ability of a statistical significance test or estimation process to satisfy such a requirement is expressed by the “power” concept, described in a number of texts (5, 64, 91, 147, 211). Greater power is characterized by shorter confidence intervals. Clearly, an increase in the number of replicate measurements leads to higher power. Conversely, decisions regarding the number of determinations to be made, or the number of samples to be tested, can be made on the basis of power requirements. In the chemical literature, a number of papers describe applications of the power concept (71, 125, 184, 136, 263, 158). Power considerations form an integral part of the sequential test procedures developed by Wald, whose book (202) is an excellent source, even for nonmathematical readers.
Shorter accounts are given by Mood (147) and Davies (47). Churchman (%), in a paper already mentioned in a previous review (B06), and Kasagi (104) apply sequential methods to decide between two possible empirical formulas for a chemical compound. In Churchman’s paper, the decision is based on the observed percentages of bromine and sulfur. Here, the power concept relates to the risk of choosing the incorrect formula. As successive samples are analyzed, the combined results, though affected by experimental error, tend toward the percentage values for the correct formula. The advantage of the sequential procedure results from the likelihood that this tendency will be conclusive, in terms of the tolerated risk, in relatively few determinations. Theory shows that the saving in the number of determinations is often of the order of 50% as compared to experiments having a predetermined number of determinations and the same risk value.

The theory developed by Bechhofer (11), which has already been mentioned, is also noteworthy in that it introduces considerations of power, in the form of the “smallest difference worth detecting,” at the outset rather than as a secondary consideration. The choice of the “smallest difference worth detecting” is considered by Bechhofer as an extra-statistical decision to be made by the experimenter, possibly on the basis of economic considerations. Somerville (188), in a theoretical investigation dealing with the problem of selecting, from among several categories, that which is optimum in some specified respect, incorporates into the statistical procedure the cost of sampling and the economic losses resulting from an erroneous selection. The work of both Bechhofer and Somerville is based on a “decision-theoretic” approach, a recent development in which statistical procedures are viewed as rules for making decisions, the value of each decision being determined by its economic consequences.
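The sequential decision between two empirical formulas described earlier in this section can be illustrated with a sketch of a sequential probability ratio test of the kind developed by Wald. The sketch assumes normally distributed determinations with a known standard deviation and a single measured percentage; the numerical values, the function name, and the labels "H0" and "H1" are illustrative assumptions and are not taken from Churchman's paper.

```python
import math


def sprt_two_means(observations, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Sequential test between two hypothesized means, mu0 and mu1.

    alpha is the tolerated risk of choosing mu1 when mu0 is correct;
    beta is the tolerated risk of choosing mu0 when mu1 is correct.
    Returns the decision and the number of determinations used.
    """
    upper = math.log((1 - beta) / alpha)   # crossing it favors mu1
    lower = math.log(beta / (1 - alpha))   # crossing it favors mu0
    log_ratio = 0.0
    for n, x in enumerate(observations, start=1):
        # Log likelihood ratio contribution of one normal observation.
        log_ratio += (mu1 - mu0) / sigma ** 2 * (x - (mu0 + mu1) / 2)
        if log_ratio >= upper:
            return "H1", n
        if log_ratio <= lower:
            return "H0", n
    return "undecided", len(observations)


if __name__ == "__main__":
    # Hypothetical case: formula A predicts 30.0% and formula B 32.0%
    # of an element; the analytical standard deviation is taken as 1.0%.
    print(sprt_two_means([32.1, 31.8, 32.3], mu0=30.0, mu1=32.0, sigma=1.0))
```

In this hypothetical case the test stops after two determinations, which illustrates how the accumulated evidence can become conclusive, in terms of the tolerated risk, before a fixed-size experiment would have ended.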
A popular exposition of decision theory is given by Bross ($7). This approach has been found most useful in industrial applications. In more fundamental work, different approaches have been used. Thus, in a study of the accuracy and precision of an analytical method, the power of the test has been related to the standard deviation of the analytical method (125). Fisher (69), in a stimulating recent article of a philosophical nature, opposes the injection of cost considerations or of decision-making in applications to fundamental research and concludes that, in such applications, “We aim, in fact, at methods of inference which should be equally convincing to all rational minds, irrespective of any intentions they may have in utilizing the knowledge inferred.”

Building “power” requirements, whether based on economic or other considerations, into the design of an experiment appears to be a first, though significant, step in the growing trend of tailoring each design exactly to the particular requirements of the experiment, with only selective use of catalogued schemes. This is particularly true for investigations that have progressed well beyond the exploratory stages and are concerned with the study of detailed relationships, such as those discussed in the following section.

STUDY OF CHEMICAL REACTIONS AND LAWS

A few papers have appeared in the literature concerning the use of statistical methods in the study of chemical reactions and laws. Box and Youle (25) have related the parameters in the equations of a fitted surface to parameters describing the kinetics of the reaction. The surface results from the measurement of the end product of the reaction obtained under different conditions of temperature, time, or concentration of reactants. The technique of “exploration and exploitation of response surfaces” has been mentioned in the section dealing with the design of experiments. Another paper, mathematical and theoretical in nature, by Singer (186) investigates the factors causing irreproducibility in chemical reactions. For example, chain reactions with a very slow rate of initiation followed by a very rapid chain propagation are irreproducible because of the very nature of the mechanism involved. A method is proposed for distinguishing between this type of irreproducibility and that which is caused by the introduction of extraneous impurities or other similar factors.

Grohskopf (84) discusses the use of factorial designs in the study of chemical reactions, stressing the importance of interactions, often dismissed by investigators, for the detection and measurement of physical and chemical relationships. He suggests the use of factorials for the study of reaction velocities. Liebhafsky, Pfeiffer, and Zemany (123) verify experimentally that x-ray emission is a random process with a standard deviation equal to the square root of the count per unit of time. This standard deviation increases, however, when operating conditions are not ideal. The theoretical standard deviation may therefore be used as a control in x-ray emission spectrography. Linnig and coworkers (125) observe that variation in the quantity of end product in equilibrium reactions may result in a curvilinear relationship between material found and added.
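The use of the theoretical standard deviation as a control in x-ray counting, mentioned above, can be sketched as a comparison between the observed scatter of repeated counts and the square root of their mean. The function name and the counts are hypothetical, and the choice of a ratio as the summary is an assumption made for illustration rather than the procedure of the cited authors.

```python
import math
import statistics


def counting_ratio(counts):
    """Ratio of the observed standard deviation of repeated counts to the
    theoretical value, the square root of the mean count."""
    mean_count = statistics.mean(counts)
    observed_sd = statistics.stdev(counts)    # sample standard deviation
    theoretical_sd = math.sqrt(mean_count)    # expected for a random process
    return observed_sd / theoretical_sd


if __name__ == "__main__":
    # Hypothetical repeated counts taken over equal time intervals.
    print(counting_ratio([10000, 10100, 9900]))
```

A ratio near unity is consistent with purely random emission, while a ratio well above unity suggests that operating conditions are contributing additional variability.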
The presence of curvature can be detected and measured by statistical means. Whitman and Whitney (208) have used regression analysis to determine the order of a reaction.

These and some other similar papers (39, 40, 44, l l S, 131, 140, 168) represent a step in the direction of the practical application of statistics to the study of fundamental chemical problems. But the possibilities of relating statistically derived parameters to natural laws are only now being exploited. The value of this approach over the classical approach is due to a great extent to the statistical practice of varying a number of factors at a time, thus not losing sight of important interrelationships between these factors.

MISCELLANEOUS APPLICATIONS

Nonquantitative or semiquantitative data, such as arise in the taste-testing of food (34, 60, 67, 88, 98, 99, 117, 121, 141, 155, Zla), have effectively been treated by statistical techniques known as “nonparametric” or “distribution-free.” As these methods are based entirely on the ranking of the observations, they require no assumptions regarding the underlying distribution of the data and have therefore also been very valuable in the analysis of quantitative measurements when the assumptions underlying other statistical methods are of questionable validity. Among the most useful nonparametric tests, of wide applicability, are the sign test (65) and the rank-correlation coefficient. These, as well as other nonparametric methods, are described in books by Mood (147), Dixon and Massey (54), and Bennett and Franklin (16). Especially noteworthy is the little book by Moroney (148), in which the rationale and practical application of many nonparametric tests are lucidly described without mathematical developments. An interesting example of the use of rank-correlation methods is provided by Schmidt (180) in a study of the interchangeability of spectrophotometric absorption cells, with comments on other applications. Convenient tables for the use of the rank-correlation coefficient are given by Litchfield and Wilcoxon (126).

The old and vexing problem of the “rejection of outlying observations” has received recent discussion by Dixon (68), Proschan (l b 7), King (I l l), Grubbs (86), and Blaedel, Meloche, and Ramsay (17).

Kase (106, 1 0 6) has applied the theory of “extreme values” to problems in polymer research and rubber technology. This theory deals with observations which, by their very nature, lie in the tail end of a frequency distribution, e.g., the largest flood a bridge is intended to withstand, or the strength of the weakest fiber in a rubber tensile specimen. A very readable and comprehensive treatment is given by Gumbel (86) and a newer method of analysis by Lieblein (124). These authors have also compiled a list of recent applications in a short, expository article (8 ?). Practical applications are discussed by Kimball in a recent article (110).
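The rank-correlation coefficient mentioned earlier in this section depends only on the ranking of the observations, as the following sketch illustrates. It uses the familiar formula based on squared rank differences and assumes that no two observations in a series are tied; the function names and the absorbance readings are hypothetical.

```python
def ranks(values):
    """Rank the values from 1 (smallest) to n (largest); ties are rejected."""
    if len(set(values)) != len(values):
        raise ValueError("tied values are not handled in this sketch")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def rank_correlation(x, y):
    """Rank-correlation coefficient computed from squared rank differences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired series of length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical absorbance readings of four samples in two cells.
    cell_a = [0.51, 0.48, 0.55, 0.50]
    cell_b = [0.52, 0.47, 0.53, 0.54]
    print(rank_correlation(cell_a, cell_b))
```

Because only the ranks enter the calculation, any monotonic change of scale in either series leaves the coefficient unchanged, which is why no assumption about the underlying distribution is needed.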
A number of papers (4, 177, 206) deal with the problem of devising efficient designs for large scale interlaboratory studies of test methods or equipment and with methods of analysis of the resulting data. An appreciable number of interlaboratory tests are described in the literature (30, 52, 60, 67, 114, 128, 159, 166, 178, 181, 190, 191).

The problem of estimating the useful life of manufactured products has been the object of statistical studies, many of which are in the form of unpublished technical reports. Among the published work, there are papers by Weibull (S o d), Freudenthal and Gumbel (72, 73), McClintock (125, 130), Epstein and Sobel (64, 66), and McElwee (132). The American Society for Testing Materials has devoted two special technical publications to the subject (2, 3). Davis (49) has written a paper of an expository type, including the analysis of 13 systems encompassing a wide range of applications.

The increasing availability of high-speed computing equipment to many research workers will undoubtedly be reflected in a more widespread use of multiple regression methods for the analysis of large amounts of data. The application of Box’s techniques (20, 47, 161), mentioned in an earlier section, often involves extensive computations. Electronic calculators have been used in a study of infrared absorption and emission spectra by Plyler, Blaine, and Tidwell (166) and in a factorial experiment described by Rowell (173). A bibliography of recent articles and books dealing with computing machines, including also other statistical and mathematical references, is given by Rose, Johnson, and Heiny (170).

BOOKS, TABLES, AND ABSTRACTS

The following is a partial listing of the numerous books that have recently appeared, which cover statistical theory on various levels of mathematical difficulty and with varying emphasis on applications. The basic theory is well covered by Mood (147), Hoel (97), Hald (91), and Anderson and Bancroft (5). More elementary expositions of statistical theory are given by Wilks (209), Tippett (194), and Dixon and Massey (54).

Among the books written more specifically for chemists, that by Bennett and Franklin (16) is the most comprehensive, while the books by Youden (216), Brownlee (69), and Davies (48) provide good introductory treatments. Other books in this area have been written by Tippett (195), Villars (200), Gore (80), David (46), and Beers (14). A book deserving special mention, because of its unique approach, is that written by Wilson (211). It contains many interesting discussions on the rational basis and the practice of the statistical method. The design of experiments has been comprehensively treated by Kempthorne (109), with considerable mathematical detail. An excellent practical book on the design and analysis of experiments has appeared under the editorship of Davies (47). Cochran and Cox (38) provide a comprehensive catalog of designs, with explanatory chapters on their characteristics and analysis. Finally, a casual, yet clear and detailed exposition of a large number of useful statistical techniques is contained in the book written by Moroney (148).

Statistical tables are appended to most statistical texts. A particularly complete set is that found in the book by Dixon and Massey (54). Hald has issued, in conjunction with his book, a compilation of many useful tables (90). Fisher and Yates’ tables (70) have served statisticians and research workers for many years. A newly revised and very comprehensive version of the well known Biometrika tables, with detailed explanatory notes, has been made available by Pearson and Hartley (154).

Many articles dealing with statistical methods appear in the chemical and related literatures. In addition, such journals as Biometrics, Industrial Quality Control, the Journal of the American Statistical Association, the Journal of the Royal Statistical Society, Series B, and Applied Statistics occasionally contain articles useful in chemical work.
A column devoted to discussions on statistical design appears on a bimonthly basis in Industrial and Engineering Chemistry, under the editorship of W. J. Youden. Abstracts of statistical articles, both theoretical and applied, are regularly compiled by G. E. Nicholson, Jr., in the Journal of the American Statistical Association, and by Joseph Movshin in Industrial Quality Control. A further source is the International Journal of Abstracts on Statistical Methods in Industry, edited by G. I. Butterbaugh.

ACKNOWLEDGMENT

It is a pleasure for the authors to acknowledge their indebtedness to Grant Wernimont, author of the 1949 review on statistics in this journal (206), for supplying a collection of more than 400 abstracts of references, from which the major portion of papers listed in the present bibliography has been taken.

LITERATURE CITED

(1) Ames, S. R., Risley, H. A., Harris, P. L., ANAL. CHEM. 26, 1378 (1954). Extraction and determination of vitamin A in liver.

(2) Am. Soc. Testing Materials, Philadelphia, Pa., Spec. Tech. Pub. 121 (1952). Statistical aspects of fatigue.
(3) Ibid., 137 (1953). Fatigue with emphasis on statistical approach.
(4) Am. Soc. Testing Materials, Philadelphia, Pa., “Standards,” P VII, D990-54 T (1954). Tentative recommended practices for interlaboratory testing of textile materials.
(5) Anderson, R. L., Bancroft, T. A., “Statistical Theory in Research,” McGraw-Hill, New York, 1952.
(6) Asaka, T., Japan Analyst 2, 378 (1953). Statistical test and estimation.
(7) Asperger, S., Murati, I., ANAL. CHEM. 26, 543 (1954). Determination of mercury in atmosphere, submicroanalytical determination of mercuric ion in bromine and chlorine water.
(8) Asselman, T. H., Muller, F. M., Smolder, P. M., Netherlands Expt. Sta. for Utilization of Straw, Treatise 2 (1953). Digestion of rye straw with caustic soda.
(9) Beall, G., Pascoe, T. A., Tappi 38, 74 (1955). Simultaneous process control of two tests.
(10) Bechhofer, R. E., Ann. Math. Stat. 25, 16 (1954). Ranking means of normal populations with known variances.

(11) Bechhofer, R. E., Trans. Am. Soc. Quality Control, Natl. Conv., 1955, 513. Multiple decision procedures for ranking means.
(12) Bechhofer, R. E., Dunnett, C. W., Sobel, M., Biometrika 41, 170 (1954). Ranking means of normal populations with common unknown variance.
(13) Bechhofer, R. E., Sobel, M., Ann. Math. Stat. 25, 273 (1954). Ranking variances of normal populations.
(14) Beers, Y., “Introduction to Theory of Error,” Addison-Wesley Publishing Co., Cambridge, Mass., 1953.
(15) Bennett, C. A., Am. Soc. Quality Control Conf. Papers 337 (1952). Effect of measurement error on chemical process control.
(16) Bennett, C. A., Franklin, N. L., “Statistical Analysis in Chemistry and the Chemical Industry,” Wiley, New York, 1954.
(17) Blaedel, W. J., Meloche, V. W., Ramsay, J. A., J. Chem. Educ. 28, 643 (1951). Criteria for rejection of measurements.
(18) Bose, R. C., Clatworthy, W. H., Shrikhande, S. S., Agr. Expt. Sta., Chapel Hill, N. C., Tech. Bull. 107 (1953). Tables of partially balanced designs with two associate classes.
(19) Bowman, H. M., Tarpley, W. B., Appl. Spectroscopy 7, 57 (1953). Statistical studies of absorption intensity reproducibility.
(20) Box, G. E. P., Analyst 77, 879 (1952). Statistical design in study of analytical methods.
(21) Box, G. E. P., Biometrics 10, 16 (1954). Exploration and exploitation of response surfaces.
(22) Box, G. E. P., Biometrika 39 (1 and 2), 49 (1953). Multifactor designs of first order.
(23) Box, G. E. P., Hay, W. A., Biometrics 9, 304 (1953). Efficient removal of trends occurring in a comparative experiment.
(24) Box, G. E. P., Wilson, K. B., J. Roy. Stat. Soc. B, 13, 1 (1951). Experimental attainment of optimum conditions.
(25) Box, G. E. P., Youle, P. V., Biometrics 11, 287 (1955). Exploration and exploitation of response surfaces. Link between fitted surface and basic mechanism of system.
(26) Bright, Harry, ANAL. CHEM. 23, 1544 (1951). Standard sample program of National Bureau of Standards.
(27) Bross, I. D. J., “Design for Decision,” Macmillan, New York, 1954.
(28) Brownlee, II., J. Am.

Leather Chemists’ Assoc. 48, 60 (1953). Statistical methods applied to analysis and testing of leather.
(129) McClintock, F. A., J. Appl. Mechanics 22, 421 (1955). Statistical theory of size and shape effects in fatigue.
(130) Ibid., p. 427. Criterion for minimum scatter in fatigue.
(131) McCoy, R. D., Swinehart, D. F., J. Am. Chem. Soc. 76, 4708 (1954). Ionization constant of metanilic acid from 0° to 50° by e.m.f. measurements.
(132) McElwee, E. M., Proc. I.R.E. 39, 137 (1951). Statistical evaluation of life expectancy of vacuum tubes designed for long life operation.
(133) Macurdy, L. B., Alber, H. K., Benedetti-Pichler, A. A., Carmichael, H., Corwin, A. H., Fowler, R. M., Huffman, E. W. D., Kirk, P. L., Lashof, T. W., ANAL. CHEM. 26, 1190 (1954). Terminology for describing performance of analytical and other precise balances.
(134) Maeser, L., J. Am. Leather Chemists’ Assoc. 48, 187 (1953). Revision of physical testing methods.
(135) Mandel, J., Biometrics 10, 251 (1954). Chain block designs with two-way elimination of heterogeneity.
(136) Mandel, John, Mann, C. W., J. Research Natl. Bur. Standards 46, 99 (1951). Statistical solution of problem in sampling leather.
(137) Mandel, J., Stiehler, R. D., Ibid., 53, 155 (1954). Sensitivity criterion for comparison of methods of test.
(138) Marteret, J., Chim. anal. 34, 149 (1952). Precision of measurements in analytical chemistry.
(139) Martin, G. M., Mandel, J., Stiehler, R. D., J. Research Natl. Bur. Standards 53, 383 (1954). Aerological sounding balloons.
(140) Martin, L., Analyst 77, 892 (1952). Statistical methods in radiochemistry.
(141) Matchett, J. R., von Loesecke, H. W., ANAL. CHEM. 27, 623 (1955). Food.
(142) Mendelowitz, A., Riley, J. P., Analyst 78, 704 (1953). Spectrophotometric determination of long-chain fatty acids containing ketonic groups with particular reference to licanic acid.
(143) Michonsniky, M., Rev. mét. 50, 341 (1953). Spectrographic analysis of steel during manufacture and routine laboratory control.
(144) Miranda, H. de, Chem. Weekblad 47, 1046 (1951). Varying standards for reproducibility of chemical analytical results.
(145) Mitchell, J. A., Am. Soc. Quality Control Conference Papers, 29 (1952). Experiments in chemical plant.
(146) Mitsui, S., Arakawa, M., Japan Analyst 2, 6 (1953). Application of statistics to errors of chemical analysis.
(147) Mood, A. McF., “Introduction to Theory of Statistics,” McGraw-Hill, New York, 1950.
(148) Moroney, M. J., “Facts from Figures,” Penguin Books, Baltimore, Md., 1953.
(149) Natl. Bur. Standards U. S., Tech. News Bull. 39, 90 (1955). Experimental designs for duplicate measurements.
(150) Oehler, R., Davis, J. H., Kinmonth, R. A., J. Am. Leather Chemists’ Assoc. 50, 17 (1955). Pilot plant study of process for treating heavy leather with polyisobutylene and other polymers.
(151) Olsson, I., Pihl, L., Tappi 37, 42 (1954). Printing studies at Swedish graphic arts research laboratory.
(152) Page, B. L., J. Research Natl. Bur. Standards 54, 1 (1955). Calibration of meter line standards of length at National Bureau of Standards.
(153) Peake, R. E., Appl. Statistics 2, 184 (1953). Planning experiment in cotton spinning mill.
(154) Pearson, E. S., Hartley, H. O., “Biometrika Tables for Statisticians,” Vol. I, Cambridge University Press, Cambridge, 1954.
(155) Peryam, D. R., Shapiro, R., Ind. Quality Control 11, 15 (1955). Perception, preference, judgment-clues to food quality.

(161) Read, D. R., Biometrics 10, 1 (1954). Design of chemical experiments.
(162) Reed, M. C., Klemm, H. F., Schulz, E. F., Ind. Eng. Chem. 46, 1344 (1954). Plasticizers in vinyl chloride resins, removal by oil, soapy water, and dry powders.
(163) Reichman, E. J., Tappi 38, 119A (1955). Structural fiber-pulp with its own problems.
(164) Reilley, C. N., Crawford, C. M., ANAL. CHEM. 27, 716 (1955). Principles of precision colorimetry, general approach to photoelectric spectrophotometry.
(165) Reinhardt, F. W., Mandel, J., Am. Soc. Testing Materials, Bull. 206, 50 (1955). Comparison of methods for measuring tensile and tear properties of plastic films.
(166) Ricciuti, C., Coleman, J. E., Willits, C. O., ANAL. CHEM. 27, 405 (1955). Statistical comparison of three methods for determining organic peroxides.
(167) Rickard, R. R., Ball, F. L., Harris, W. W., Ibid., 23, 919 (1951). Microdetermination of fluorine in solid halocarbons.
(168) Riedel, O., Nucleonics 12, 64 (1954). Statistical purity in nuclear counting.
(169) Roe, J. E., Edelson, H., J. Assoc. Offic. Agr. Chemists 37, 849 (1954). Loss of fat during souring of cream.
(170) Rose, A., Johnson, R. C., Heiny, R. L., Ind. Eng. Chem. 47, 626 (1955). Computers, statistics, and mathematics.
(171) Rosenberg, D., Eunerson, F., Ind. Quality Control 8, 94 (1952). Production research in manufacture of hearing aid tubes.
(172) Rothman, S., Mandel, J., McCann, E. R., Weissberg, S. G., J. Colloid Sci. 10, 338 (1955). Sorption of macromolecules to solid surfaces. Sorption of dextran to cellulose nitrate membranes in presence of serum albumin and surface-active agents.
(173) Rowell, J. G., J. Roy. Stat. Soc. 16, 242 (1954). Analysis of a factorial experiment (with confounding) on an electronic calculator.
(174) Rudnick, E. S., Text. Div. Suppl., Ind. Quality Control, 1, 67 (1952). Analysis of variance applied to textile processes.
(175) Rushton, S., Davies, D. R. G., J. Iron Steel Inst. (London) 167, 247 (1951). Variation in electrical properties of silicon-iron transformer sheet.
(176) Scheffé, H., Biometrika 40, 87 (1953). Judging all contrasts.
(177) Scheuer, E., Smith, F. H., Light Metals 15, 80 (1952). Analytical comparison scheme.
(178) Schlecht, W. G., ANAL. CHEM. 23, 1568 (1951). Cooperative investigation of precision and accuracy.
(179) Schmidt, R., Appl. Spectroscopy 6, 7 (1952). Reliability of chemical analysis.
(180) Schmidt, R., Photoelectric Spectrometry Group Bull., No. 5, 115 (1952). Application of parameter-free statistical methods to problems of absorption spectrophotometry.
(181) Schoenberg, W., Tappi 37, 196A (1954). Effect of moisture variation upon standard corrugated test procedures.
(182) Scott, T. A., Jr., Melvin, E. H., ANAL. CHEM. 25, 1656 (1953). Determination of dextran with anthrone.
(183) Scribner, B. F., Corliss, C. H., Ibid., 23, 1548 (1951). Emission spectrographic standards.
(184) Shah, J. N., Worthington, O. J., Food Technol. 8, 121 (1954). Methods and instruments for specifying color of frozen strawberries.
(185) Shapiro, L., Brannock, W. W., ANAL. CHEM. 27, 725 (1955). Automatic photometric titrations of calcium and magnesium in carbonate rocks.
(186) Singer, K., J. Roy. Stat. Soc. B, 15, 92 (1953). Application of theory of stochastic processes to study of irreproducible chemical reactions and nucleation processes.
(187) Snesarev, K. A., Trudy Komissii Anal. Khim. Akad. Nauk S.S.S.R., Otdel. Khim. Nauk 4, 282 (1952). Increasing precision of methods of quantitative analysis and application to photoelectric colorimetric method of optical compensation.
(188) Somerville, P. N., Biometrika 41, 420 (1954). Problems of optimum sampling.
(189) Stevens, S. S., Science 121, 113 (1955). Averaging of data.
(190) Swift, C. E., Hankins, O. G., Food Technol. 8, 323 (1954). Variance in chemical determinations of added moisture.

VOLUME 28, NO. 4, APRIL 1956

(191) Switzer, M. H., Am. Soc. Testing Materials, Bull. No. 197, 60 (1954). Development of hiding power test method.
(192) Terry, M. E., Bradley, R. A., Virginia Agr. Exp. Sta. Bi-annual Rept. No. 2, Appendix B (1951). Designs and techniques for tests of quality.
(193) Thayer, F. D., Jr., Bianco, E. G., Wilcoxon, F., J. Am. Leather Chemists’ Assoc. 46, 670 (1951). Application of statistics in tanning laboratory.
(194) Tippett, L. H. C., “Methods of Statistics,” 4th ed., Wiley, New York, 1952.
(195) Tippett, L. H. C., “Technological Applications of Statistics,” Wiley, New York, 1950.
(196) Tischer, R. G., Jerger, E. W., Kempthorne, O., Carlin, A. F., Zoellner, A. J., Food Technol. 7, 223 (1953). Influence of variety on quality of dehydrated sweet corn.
(197) Tischer, R. G., Kempthorne, O., Ibid., 5, 200 (1951). Influence of variations in techniques and environment on determination of consistency of canned sweet corn.
(198) Tryon, M., Horowitz, E., Mandel, J., J. Research Natl. Bur. Standards 55, 219 (1955). Determination of natural rubber vulcanizates by infrared spectroscopy.
(199) Tukey, J. W., Princeton University, Princeton, N. J., unpublished work.
(200) Villars, D. S., “Statistical Design and Analysis of Experiments for Development Research,” Wm. C. Brown Co., Dubuque, Iowa, 1951.
(201) Volk, W., ANAL. CHEM. 26, 1771 (1954). Precision of mass spectrometer analyses of carbureted water gas.
(202) Wald, A., “Sequential Analysis,” Wiley, New York, 1947.
(203) Watson, W. R., Sylvania Technologist 5, 100 (1952). Measurement of relative efficiencies of commercial television picture tubes.

(204) Weibull, W., J. Appl. Mechanics 18, 293 (1951). Statistical distribution function of wide applicability.
(205) Wernimont, G., ANAL. CHEM. 23, 1572 (1951). Design and interpretation of interlaboratory studies of test methods.
(206) Wernimont, G., Ibid., 21, 115 (1949). Statistics applied to analysis.
(207) West, L. E., Ibid., 23, 1562 (1951). Standards of unstable materials.
(208) Whitman, D. W., Whitney, R. McL., Ibid., 25, 1523 (1953). Catalytic activity of cysteine and related compounds in iodine-azide reaction.
(209) Wilks, S. S., “Elementary Statistical Analysis,” Princeton University Press, Princeton, N. J., 1948.
(210) Willits, C. O., ANAL. CHEM. 23, 1565 (1951). Standardization of microchemical methods and apparatus.
(211) Wilson, E. B., Jr., “Introduction to Scientific Research,” McGraw-Hill, New York, 1952.
(212) Wood, E. R., Olson, R. L., Nutting, M. D., Food Technol. 9, 164 (1955). Comparison of consistency in potato granule samples appraised at different times.
(213) Youden, W. J., ANAL. CHEM. 19, 946 (1947). Technique for testing accuracy of analytical data.
(214) Youden, W. J., Analyst 77, 874 (1952). Statistical aspects of analytical determinations.
(215) Youden, W. J., Science 120, 627 (1954). Instrumental drift.
(216) Youden, W. J., “Statistical Methods for Chemists,” Wiley, New York, 1951.
(217) Youden, W. J., Connor, W. S., Biometrics 9, 127 (1953). Chain block design.
(218) Zimmerman, E. W., Hart, V. E., Horowitz, E., ANAL. CHEM. 27, 1606 (1955). Determination of sulfur in rubber vulcanizates.

REVIEW OF FUNDAMENTAL DEVELOPMENTS IN ANALYSIS

Characterization of Organic Compounds

CLARK W. GOULD
Research Laboratory, General Electric Co., Schenectady, N. Y.

The term “characterization,” according to Peck and Gale, the authors of the previous review on this subject (SS), involves the following steps: establishment of purity of sample; determination of physical properties; determination of elementary composition, functional groups, and empirical formula; and elucidation of structural formula and spatial relationships. Craig (16) gives the following steps as being involved in a complete structural study of large naturally occurring molecules: isolation and characterization of a single chemical individual, including the accumulation of proof that such an objective has been reached; determination of molecular size; determination of functional groups; degradation to smaller fragments which can be identified with known substances or, if not, which can be characterized and in turn degraded; derivation of the manner in which the known fragments are connected; and synthesis.

A difference in terminology is evident: Peck and Gale’s characterization means the whole story of qualitative organic analysis, purification techniques, spectroscopy, etc. Craig’s seems to imply those techniques useful for distinguishing pure samples. The reviewer has a preference for Craig’s viewpoint, and considers the term characterization to be too broad to be useful unless chemists can agree on a precise meaning. To escape being limited by the dictionary definition as “a description by a set of characteristics” the reviewer must assume that characterization means the sum of all the steps in both lists above.

This review consists of five of the cases of noteworthy work completed from November 1951 to November 1955 on the chemical constitution of some natural products. These case