September 1951

INDUSTRIAL AND ENGINEERING CHEMISTRY

1.90, respectively. For these degrees of freedom (135), the first two are highly significant (P < 0.001) and the last is almost significant at the 0.05 level. It will be noted that including the third independent variable $x_z$ has made little difference to the values obtained for $x_a$ and $x_w$ ($b_{ya.w} = 1.186$, $b_{ya.wz} = 1.072$; $b_{yw.a} = 3.297$, and $b_{yw.az} = 3.307$). However, the value for $b_{yz.wa}$ (= 0.209) is markedly different from $b_{yz}$ (= 0.537). The effect of acid concentration on the logarithm of the stack loss is much less than would have been deduced from the simple regression coefficient. It is clear that variation in acid concentration over the range encountered here is relatively unimportant compared with the effects of variation in cooling water temperature and air temperature.

To apply the methods of multiple regression we need sets of independent observations on the dependent and independent variables. In a batch process these can refer to successive batches and will presumably be independent. For a continuous process, if the successive observations are sufficiently close together in time so that the effects of any random disturbance present at the time of one set of observations are still present at the time of the succeeding set of observations, then there are theoretical difficulties in the application of regression methods which are at present unsolved. However, these difficulties vanish if the time interval is sufficiently large so that the observations at the succeeding time are not affected by the disturbances which may be present at the preceding observations. In the present example the interval between successive observations was 1 day, which was considered long enough to be satisfactory. Intervals of 1 or 2 hours, on the other hand, clearly would have been too small.

This example shows several typical features of the way the application of regression methods frequently works out in practice.
First, it shows the desirability of finding suitable transformations of the variables, particularly when they vary over a wide range, in order to stabilize the variance and to make the relationships approximately linear. As an illustration of the gain obtained by using the logarithm of the stack loss, the multiple correlation coefficient, defined as the ratio of the variance explained or accounted for by the terms fitted to the total variance, is 0.556; the corresponding figure for a similar analysis on the simple stack loss, without the transformation to logarithms, was 0.123. The gain by using the logarithm, though not sensational, is clear.

Secondly, it shows the possibility of intercorrelation between pairs of independent variables producing fictitious values for the simple regression coefficients. This can be avoided by including all possible relevant variables in the multiple regression equation, and it can be studied by considering the various simple and partial correlation coefficients.

This example serves to emphasize the distinction between observation and experiment. In observation one takes a set of figures displaying variation which has already occurred, or if still in the process of occurring is varying apparently at random without the conscious intervention of the man who is going to make the analysis. In a valid experiment, on the other hand, a schedule of desired "treatment combinations" (or sets of values for the independent variables) is drawn up, then put into random order, and then executed. In particular cases such an experiment may be inconvenient or impossible to perform. Here, for example, control of the concentration of acid circulating in the absorption tower would be easy, control of the cooling water temperature would be, in practice, very difficult but not theoretically impossible, and control of the atmospheric temperature would be more or less impossible. Where the treatment combinations are executed in accordance with a deliberate schedule of randomization, concealed correlations with unsuspected variables must tend to zero, and under these conditions the simple regression coefficients will be valid estimates of the true relationship. For these reasons a properly planned and randomized experiment is generally to be preferred to a correlation analysis of existing data.
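The second point made above, that intercorrelation between independent variables can produce fictitious simple regression coefficients, can be illustrated with a small numerical sketch. The figures below are synthetic and purely hypothetical (they are not the stack-loss data), the names `x_w`, `x_z`, `centered_sum`, `simple_slope`, and `partial_slopes` are introduced only for this illustration, and the response is constructed without random error so that the true coefficients (2.0 and 0.2) are known in advance.

```python
# Illustrative sketch only: synthetic numbers, not the stack-loss data.

def centered_sum(u, v):
    """Sum of products of deviations from the means of u and v."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def simple_slope(x, y):
    """Simple regression coefficient of y on a single variable x."""
    return centered_sum(x, y) / centered_sum(x, x)


def partial_slopes(x1, x2, y):
    """Partial regression coefficients of y on x1 and x2 fitted together,
    found by solving the two normal equations directly."""
    s11 = centered_sum(x1, x1)
    s22 = centered_sum(x2, x2)
    s12 = centered_sum(x1, x2)
    s1y = centered_sum(x1, y)
    s2y = centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical readings: x_w (strong influence) and x_z (weak influence)
# tend to rise and fall together, as they might in observational data.
x_w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x_z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Response built from known coefficients 2.0 (on x_w) and 0.2 (on x_z).
y = [2.0 * w + 0.2 * z for w, z in zip(x_w, x_z)]

b_simple = simple_slope(x_z, y)          # ignores x_w entirely
b_w, b_z = partial_slopes(x_w, x_z, y)   # fits both variables together

print(f"simple coefficient of y on x_z:  {b_simple:.3f}")
print(f"partial coefficient of y on x_z: {b_z:.3f}")
print(f"partial coefficient of y on x_w: {b_w:.3f}")
```

In this constructed case the simple coefficient of y on `x_z` comes out near 1.86, roughly nine times the value of 0.2 actually built into the data, because `x_z` borrows the effect of the correlated variable `x_w`. Fitting both variables together recovers 2.0 and 0.2. The same mechanism, on real and noisy data, is what makes the simple coefficient for acid concentration misleading in the example discussed here.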
The uncritical use of such correlation analysis without the check of a confirmatory experiment is liable to lead to error, but if properly applied it can be invaluable for indicating fields for experiment which are likely to be profitable.

BIBLIOGRAPHY

(1) Brownlee, K. A., "Industrial Experimentation," p. 61 et seq., New York, Chemical Publishing Co., 1949.
(2) Fisher, R. A., "Statistical Methods for Research Workers," p. 120 et seq., Edinburgh, Oliver and Boyd, 1941.
(3) Fisher, R. A., and Yates, F., "Statistical Tables," p. 42, Table VI, Edinburgh, Oliver and Boyd, 1943.
(4) Snedecor, G. W., "Statistical Methods," pp. 138-68, 340-99, Ames, Iowa, Iowa State College Press, 1946.

RECEIVED May 26, 1951.

Statistics Training f

HUGH M. SMALLWOOD
United States Rubber Co., Passaic, N. J.

The means now available to chemists for acquiring familiarity with statistics are definitely unsatisfactory. Concrete improvement in these means may be expected as a result of the National Research Council textbook as well as the other texts due to appear in the near future. It is to be hoped that university departments of chemistry will make a conscientious investigation of the desirability of offering courses in applied statistics and that the results of these investigations will lead to provision of better means of training student chemists in the methods of statistics.

The object of this paper is to present a review of the means now available for training chemists in statistics and to make some recommendations for future improvement. The desirability for a general discussion of this subject has been borne in on me by several experiences which, I am sure, are far from unique. Whenever I have addressed audiences of chemists on the value of statistical methods in various phases of chemistry, a few would-be converts have said at the end of the meeting: "This sounds very fine; what can we do about it?" Unfortunately, I have not had a very good answer to this question. The only response possible has been to suggest a number of textbooks, none of which is entirely suited to the immediate purpose.

It is no longer necessary to present arguments in favor of applying statistical methods to the broad field of chemistry. The collective experience of the last 5 to 10 years amply demonstrates the utility of these methods. These tools have proved useful in various applications including factory scale research, pilot plant research, topics possessing large experimental errors (such as natural products and corrosion), analytical chemistry, and control of production and product quality. Two fields in which statistical methods appear at

present to have little if any application are preparative work on a laboratory scale and exploratory research concerned primarily with large effects. In addition, there are a number of important fields of physicochemical research where little or no attempt has been made to evaluate the utility of statistical methods, for example, molecular structure, reaction kinetics, spectroscopy, and the like. Serious application of modern statistical methods to the problems of these fields might yield very valuable results. There are, therefore, many chemists who have a real need for knowledge of statistical methods.

Before turning to consideration of the means by which they may obtain this knowledge in the future, it should be pointed out that many practicing chemists are now familiar with statistical methods and are making extensive use of them. In most cases these chemists have entered the statistical field by the back door of self-study. This means of entry has often been like the old-fashioned method of teaching a child to swim, namely, throw him overboard. On the whole, the disadvantages of this method outweigh the advantages: There is a strong tendency to obtain an incomplete and biased knowledge of the field and, particularly during one's early days, serious mistakes may be perpetrated.

In addition, some companies have sponsored training courses for their employees. These have been given with varying degrees of formality. Training of this type has, of course, been limited to those concerns large enough to have among their employees or consultants someone with considerable training in statistics. Generally, this type of training is on a comparatively elementary level.

Many chemists have benefited by the numerous courses on quality control that have been sponsored by various universities throughout the country. These courses have furnished an intensive introduction to one of the important phases of statistical applications, but they have not been designed to provide a broad knowledge of the subject. Valuable as they have been to industry in general, they have not furnished the type of training needed by most chemists.

In considering the future, it is desirable to differentiate between training of practicing chemists on the one hand and, on the other, the university training of students of chemistry. Undoubtedly, self-study and training within industry will continue to be the chief means by which practicing chemists acquire familiarity with these methods. In the past the principal liability to this type of training has been the lack of a suitable text. There are many textbooks on the applications of statistics to subjects other than chemistry. Recently a few texts have appeared in which special attention is paid to the problems of chemical industry. These have the advantage of familiar examples, and do not require wrestling with the unfamiliar terminology of other subjects. Study of statistical methods is sufficiently difficult without the additional hazard of examples involving heterozygous twins or the various types of fertilizers with which the agriculturalists concern themselves.

None of the current books on applied statistics, however, gives enough of the background and underlying assumptions to enable the average physical chemist to avoid the feeling that he is being talked down to. This is because these texts have been written for professions in which mathematical training is the exception rather than the rule and in which the subject matter is largely empirical. These circumstances have led to the appearance of numerous texts on applied statistics that tell the reader what to do, but fail to develop the underlying theory of the methods. In order to acquire information on this phase of the subject the student must either go to the texts on mathematical statistics or to the original literature. These expedients are not entirely satisfying, as there is rather more emphasis on rigor than is desirable.

Vol. 43, No. 9

Several years ago the National Research Council Committee on Applied Mathematical Statistics recognized the desirability of a text that would obviate these objections. With the benefit of grants from the Office of Naval Research and through the hospitality of Princeton University, it has recently been possible to have such a manuscript prepared. Carl A. Bennett of the General Electric Co., Hanford, Wash., and Norman L. Franklin of the University of Leeds are the authors. This manuscript is now in the process of revision. It will constitute an advanced, self-contained presentation of those methods of value to the research chemist, with most of the underlying theory; it is hoped that this book will be published within the next 12 to 18 months.

Two other texts will soon appear. One, by W. J. Youden of the National Bureau of Standards, will be called "Statistical Methods for Chemists"; the other, by D. S. Villars of U. S. Naval Ordnance Test Station, Inyokern, Calif., will have the title "Statistical Design and Analysis of Experiments for Development and Research." These books will be more introductory than the Bennett-Franklin text. Appearance of these three texts should facilitate both self-study and other methods of training.

To obtain a rough picture of university training in statistics, the author directed a survey of the catalogs of 54 representative universities in this country. This was not a random sample. The significance of the conclusions drawn from it, therefore, will be about the same as the significance of those drawn by Kinsey in his recent survey. All but three of the institutions included in the survey give courses on statistics in the mathematics department or, occasionally, in a separate department of statistics. These courses, for the most part, present the theory of statistics without special attention to the applications.

In one respect the subject of statistics is certainly unique, namely, in the number and variety of courses given on its applications. No less than fifteen different fields of application are listed in the catalogs that have been examined. On the average, each institution has three fields of application. The fact that so many different departments give elementary courses in applied statistics seems to me undesirable. This situation may mean that in these subjects the use of statistics is compartmentalized, and that there are those who feel that statistics constitutes no more than a specialized tool which may or may not be needed by a practitioner of the subject.

Among the 54 institutions whose catalogs were examined, only one offers a course in statistics for chemical engineers. There is no course available for chemists. This almost complete lack of attention to the preparation of chemists in statistics by the universities is undoubtedly a serious condition. The only logical conclusion is that departments of chemistry and chemical engineering have failed to keep up to date with the needs of their students. It is to be hoped that in the near future alterations in university curricula will enable the students of chemistry and chemical engineering to obtain thorough familiarity with these important tools.

It is possible that the situation is not quite as bad as these catalogs indicate. Conceivably, some faculties might encourage chemistry students to take courses in statistics and might introduce statistical topics in the regular chemistry courses. Such contacts as the author has had with universities, however, have led to the belief that such emphasis on statistics is extremely rare, if indeed it exists at all.

Probably the only way that suitable alterations could be effected in a reasonable time would be by instituting more courses in applied statistics for chemists and chemical engineers. I have already criticized the tendency to have a separate course in statistics for each field of application. Further courses of this type could only be justified by reason of expediency. Industry needs chemists trained in statistics. This appears to be the readiest way of filling this need, since it is generally admitted that a

successful course in applied statistics can only be given by one who has specialized in the field of application. Certainly the average student of chemistry would have difficulty in obtaining much of practical value from the usual course in mathematical statistics. College and university schedules are undoubtedly already overloaded, but it would be well for the various faculties to compare the value of some of the material now given with that of a course in applied statistics and so determine the advisability of making room for the statistics. It can certainly be concluded that graduates with statistical training will be of more value in certain fields than those without it.

Introduction of more courses on applied statistics for chemists, however, would be only a temporary solution. The change that is to be desired is more far-reaching than this. Universities do not offer a dozen or more courses in applied mathematics or applied English. Why should it be necessary to have so many courses in applied statistics?

Eventually, it should be possible to teach statistics in a manner similar to that in which mathematics is now taught. The student takes two or more courses in mathematics given by a mathematician. He learns his applied mathematics incidentally to his courses in physics and physical chemistry. Why cannot statistics be handled in the same way? A basic course in mathematical statistics given by a statistician should lay the foundations for future applications. This type of training has been discussed by Snedecor (1) and by Wilks (2). It should be equivalent to the general training in mathematics given as a prerequisite to all advanced scientific study. But such training would fail in its purpose unless its principles were exhaustively applied. The applications should permeate all subsequent courses of a quantitative nature; for example, the course in quantitative analysis should make use of control charts in studying experimental error. Experiments in physical chemistry should be used to illustrate the design of experiments and statistical analysis of results as well as the laws and phenomena of physical chemistry. Courses in industrial chemistry should include study of control charts and some of the statistical methods of trouble shooting.

Statistical methods offer a bag of tricks that is almost always useful and is sometimes an absolute necessity. But there is more to it: Modern statistics is a formalization of much that is instinctive to a first-class experimenter. The explicit expression of these principles of sound experimentation is helpful to experimenters of all grades and can be counted on to decrease their mistakes and to increase their efficiency. The far-reaching importance of the subject justifies its inclusion in the education of all experimentalists.

It is too much to hope that a revolutionary change in chemical curricula can be rapidly effected. For the present it will be a major advance for representative chemistry departments to install courses in applied statistics. However, there is hope that the time will come when statistics and statistical methodology are as much a part of our common way of thinking as are the algebra, elementary calculus, and grammar learned at an early age.

It appears that industrial chemists are now more aware of the importance of statistical training for chemists than are the university chemists who give the training. In order to inform the faculties of the extent to which training of this type is needed, each of us who has found it useful must appeal to appropriate members of his Alma Mater to recognize the need and to do something about it. If such an appeal is made in sufficient volume, favorable action will result in the not too distant future.

ACKNOWLEDGMENT

The author is indebted to Irene Montz and &Lie 1'. Hildebrandt for tabulating the courses in statistics listed in the various catalogs.

LITERATURE CITED

(1) Snedecor, George W., J. Am. Stat. Assoc., 43, 53 (1948).
(2) Wilks, S. S., Ibid., 46, 1 (1951).

RECEIVED April 21, 1951. Contribution No. 114 from General Laboratories, United States Rubber Co.

(END OF SYMPOSIUM)

Organic Derivatives of Alginic Acid

ARNOLD H. STEINER AND WILLIAM H. McNEELY
Kelco Co., San Diego, Calif.

Previous attempts to prepare organic derivatives of alginic acid adaptable to commercialization have met with little success. Drastic conditions required to accomplish substitution by the usual reagents largely destroy the colloidal properties of the polymer. As a part of a research program designed to extend the range of usefulness of algin, alginic acid has been found to react under mild conditions with various alkylene and substituted alkylene oxides to give a unique series of algin products. These water-soluble alkylene glycol alginates give viscous solutions at relatively low concentrations. In contrast to sodium alginate, these esters are soluble in acid solutions. The algin derivative, propylene glycol alginate, is now available commercially. Its pronounced thickening and emulsifying powers in acidic solutions have resulted in extensive commercial application in such uses as French dressing, salad dressing, flavor emulsions, meringues, pharmaceutical jellies, and foam stabilization.

In 1881 while attempting to find a use for the seaweeds which were abundant around the British Isles, Stanford (20, 21) discovered a colloid that he named algin. He then spent the next few years establishing the properties of this water-soluble gum which he obtained by an alkaline digestion of several species of brown algae. The alkali salts such as sodium and potassium alginate as well as ammonium alginate were found to give viscous aqueous solutions at remarkably low algin concentrations. These water-soluble algins were precipitated from