Introduction to Experiment Design for Chemists

Ronald S. Strange
Fairleigh Dickinson University, Florham-Madison Campus, Madison, NJ 07940

While conducting a course in statistics and graduate-level experiment design (ED) for several PhD chemists, the author became convinced that the education of many chemists is deficient, at both the undergraduate and graduate levels, in basic techniques of ED. It is not generally known how ED can lead to a greatly reduced number of experiment runs with much simplified analysis of data. Chemists who become involved in process development often learn these techniques quickly enough, and engineers are more likely to have been exposed to ED while in school, but chemists are often not trained in design either as undergraduates or in their graduate research programs.

One purpose of this paper is to review basic concepts and terminology of ED, analysis of data, and optimization. A subsequent paper will introduce an undergraduate chemistry experiment, suitable for advanced general chemistry students or chemistry majors, illustrating these principles. In addition, it is hoped that others will be encouraged to develop similar experiments or modify existing ones and thus to begin including ED in the undergraduate curriculum. It is not necessary to make room in an already crowded lecture syllabus since these methods can be introduced quite naturally through suitably modified standard experiments.

Two-Level Factorial Designs

Suppose it is desired to study the dependence of a process yield on three factors, here labeled X, Y, and Z, representing temperature, choice of a catalyst, and the concentration of some reactant. For each run, the temperature will be either 120 °C or 140 °C, the catalyst either type A or type B, and the concentration 0.25 M or 0.50 M. A two-level factorial design requires 2³ = 8 runs with each factor at low and high settings as shown in Table 1.
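The eight runs described above can be enumerated with a short Python sketch. It is only an illustration: the function name full_factorial, the dictionary labels, the choice of the first-listed setting as "low," and the ordering (X changing fastest) are assumptions, not taken from the paper.

```python
from itertools import product

# Low and high settings for each factor, as given in the text.
# Treating the first-listed setting as "low" is an assumption.
levels = {
    "X: temperature (deg C)": (120, 140),
    "Y: catalyst": ("A", "B"),
    "Z: concentration (M)": (0.25, 0.50),
}

def full_factorial(levels):
    """Return every combination of settings, first factor changing fastest."""
    names = list(levels)
    # product() varies its LAST argument fastest, so feed the factors in
    # reverse order and flip each tuple back to (X, Y, Z).
    combos = product(*(levels[name] for name in reversed(names)))
    return [tuple(reversed(combo)) for combo in combos]

runs = full_factorial(levels)
for number, run in enumerate(runs, start=1):
    print(number, run)
```

With three factors at two levels each, the list holds 2³ = 8 distinct runs; adding a fourth factor to the dictionary would double it to 16.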
Also shown in the table are "coded" values, -1 and +1, of the factor settings. The coding of variables is extremely important for the analysis of data and here is equivalent to replacing each variable by its Z score (unit normal deviate). The 2³-by-3 matrix of coded factor settings is referred to as the "design matrix," and this design is a "full" or "complete" factorial design at two levels, since each of the eight possible combinations of factor settings appears exactly once. The rows of the design matrix are listed here in standard "Yates" (1) order, which is the basis of a simple algorithm for the analysis of results. Before actually carrying out the experiment runs, it is important to determine a randomized sequence of runs. The statistical validity of much of what follows is dependent on randomization and on the assumption that all the runs are carried out identically, except for the factor settings. The factors may affect the process yield in several ways
Table 1. X (coded settings, runs in Yates order): -1 +1 -1 +1 -1 +1 -1 +1

mean: (28 + 17 + 41 + 34 + 56 + 51 + 42 + 36)/8 = 38.1
X main effect: (-28 + 17 - 41 + 34 - 56 + 51 - 42 + 36)/4 = -7.25
Z*X interaction: (28 - 17 + 41 - 34 - 56 + 51 - 42 + 36)/4 = 1.75
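The sample calculations above can be checked with a short Python sketch. It assumes that the eight yields are listed in Yates order (X changing fastest) and that each effect is the signed sum of the yields divided by half the number of runs, as in the calculations. The names design_matrix and effect are illustrative, and the fixed random seed is there only to make the example repeatable.

```python
import random

# Yields read off the sample calculations, one per run in Yates order.
yields = [28, 17, 41, 34, 56, 51, 42, 36]

def design_matrix(k):
    """Coded (-1/+1) settings for a 2**k full factorial, first factor fastest."""
    return [[1 if (run >> j) & 1 else -1 for j in range(k)]
            for run in range(2 ** k)]

def effect(design, y, columns):
    """Effect of one factor (one column) or an interaction (several columns)."""
    total = 0
    for row, value in zip(design, y):
        sign = 1
        for c in columns:
            sign *= row[c]          # product of the coded settings
        total += sign * value
    return total / (len(y) / 2)     # divide by half the number of runs

design = design_matrix(3)
X, Y, Z = 0, 1, 2

mean = sum(yields) / len(yields)
x_effect = effect(design, yields, [X])
zx_effect = effect(design, yields, [Z, X])

# A randomized sequence in which to carry out runs 1 through 8.
run_order = random.Random(0).sample(range(1, 9), k=8)

print(mean, x_effect, zx_effect)    # 38.125 -7.25 1.75
print(run_order)
```

The same effect function handles any main effect or interaction by changing the list of columns, for example [Y] or [X, Y, Z].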
and the two-level design, if carried out properly, provides a tremendous amount of information, although there are certain important restrictions. The s