Construction and Use of Statistical Control Charts on Continuous Variables

K. A. Brownlee
The Squibb Institute for Medical Research, New Brunswick, N. J.

In the control of production processes it is necessary to discriminate between random variation in yield or quality and definite variation assignable to identifiable causes. Attempts to control the former will probably be fruitless, whereas it should be possible to control the latter. The statistical quality control chart, due to Shewhart, is a simple means of making this distinction. Typical random variability of chemical parameters can be approximately represented by the so-called Gaussian curve. The range of a group of observations can be used to give estimates of standard deviation. The reasonable fluctuation of averages can be predicted from a knowledge of the standard deviation of the individuals. Nitrocellulose viscosities and streptomycin assays are used as examples of the procedure. The occasional occurrence of variables with skewed distribution can often be corrected for by the use of the transformation to logarithms.

In all industrial processes the product has variability. It may be in its quality, as measured, for example, by its density, color, breaking strength, and chemical composition, or it may be in the yield of finished product expressed as a percentage of the raw materials. These are variables which can be measured on a continuous scale and are dependent on the state of the process. The statistical quality control chart (4) is a simple means of distinguishing between that part of the variability which is probably inherent in the process, and that part which is due to abnormal discrete events. In the language of the control chart, these abnormal discrete events are called "assignable causes"; that is, they can be assigned to particular causes. More accurately, it should be possible to assign them to particular identifiable causes. In practice, in particular instances it may be very difficult to be certain what was really responsible for a particular episode of abnormal variability. All that the control chart will do is to tell when this abnormality occurred, not why it occurred.

As one example of the typical form of variability, Figure 1 shows the distribution of the percentage of a minor ingredient in a three-component plastic mixture. The graph, or "histogram" as it is called, shows that there were 3 batches with this component at 1.9%, 26 batches at 2.0%, and so on.

Figure 1. Distribution of Percentage of Minor Ingredient in Three-Component Plastic Mixture (horizontal axis: percentage)

As another example, Figure 2 gives the acidity of a material, originally treated in acid, after several washes of weak alkali and water. Figure 3 shows the moisture of a fibrous paste material after a period in a drying oven. Figure 4 shows the yield of an antibiotic from 459 consecutive fermentations.

Figure 2. Residual Acidity of Washed Batches (horizontal axis: acidity after repeated washing)

These examples show that most distributions have common features, namely, a single central hump, trailing off to zero on both sides, approximately symmetrically. These distributions are often close approximations to the so-called Gaussian, or normal, distribution, which can be represented by the formula

$$y = P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-m)^2/2\sigma^2}$$

Here y is the proportion of observations occurring with a particular value x. The constant m is merely the midpoint of the curve, measured on the x-axis (i.e., the average of the x's), and σ is the so-called standard deviation and determines the width of the curve about its midpoint: the larger σ, the more spread out along the x-axis is the distribution.
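As a rough illustration of the Gaussian formula, the following Python sketch evaluates the curve and checks two of its properties: the total proportion under the curve is 1, and doubling σ halves the height at the midpoint, which is what "more spread out" amounts to. The function names and the values m = 2.0 and σ = 0.1 or 0.2 are illustrative assumptions, not data read from the figures.

```python
import math

def gaussian_density(x, m, sigma):
    """Height y of the Gaussian curve at x, for midpoint m and standard deviation sigma."""
    z = (x - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def area_under_curve(m, sigma, half_width_in_sigmas=8.0, steps=20000):
    """Trapezoidal estimate of the total area between m - 8 sigma and m + 8 sigma."""
    lo = m - half_width_in_sigmas * sigma
    hi = m + half_width_in_sigmas * sigma
    h = (hi - lo) / steps
    total = 0.5 * (gaussian_density(lo, m, sigma) + gaussian_density(hi, m, sigma))
    for i in range(1, steps):
        total += gaussian_density(lo + i * h, m, sigma)
    return total * h

# Illustrative (assumed) values: midpoint 2.0, standard deviation 0.1 versus 0.2.
area = area_under_curve(m=2.0, sigma=0.1)
narrow_peak = gaussian_density(2.0, m=2.0, sigma=0.1)
wide_peak = gaussian_density(2.0, m=2.0, sigma=0.2)
```

The peak height depends only on σ, so a larger standard deviation gives a lower, wider curve over the same total area.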

The Gaussian distribution has a central importance in control chart theory. No one imagines that any actual distribution is exactly Gaussian, but it can be assumed that the approximation is close enough to be satisfactory for the following reasons:

Fundamentally the Gaussian distribution represents the type of phenomena with which we hope to be dealing. It can be shown that if there are present a very large number of sources of variability, all small and functioning independently, these give rise to a Gaussian distribution.

Even where the actual distribution is appreciably non-Gaussian, the distribution of the average of 2 observations is more nearly Gaussian, and as the number of observations in each average increases, the distribution of these averages tends to become closer and closer to the Gaussian. For this reason in the control chart averages are plotted rather than single observations.
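The tendency of averages toward the Gaussian form can be checked by a small simulation. The sketch below is an illustration under stated assumptions, not part of the original procedure: it draws from an exponential parent distribution (chosen here only because it is strongly skewed) and measures the skewness of averages of 1, 4, and 16 observations. For this particular parent the theoretical skewness of an average of n values is 2/√n, so it should fall toward zero as n grows.

```python
import random

def skewness(values):
    """Moment coefficient of skewness; zero for a symmetrical distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def skewness_of_averages(group_size, groups=20000, seed=0):
    """Skewness of the averages of `group_size` draws from a skewed (exponential) parent."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = [
        sum(rng.expovariate(1.0) for _ in range(group_size)) / group_size
        for _ in range(groups)
    ]
    return skewness(averages)

skew_single = skewness_of_averages(1)   # single observations
skew_of_4 = skewness_of_averages(4)     # averages of 4
skew_of_16 = skewness_of_averages(16)   # averages of 16
```

The three values decrease in that order, which is the behaviour the text relies on when it recommends plotting averages rather than single observations.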

INDUSTRIAL AND ENGINEERING CHEMISTRY, Vol. 43, No. 6, June 1951

Technically the best measure of variability is the variance, or the square of the standard deviation,

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

In words, this is the sum of the squares of the deviation (x - x̄) of each observation from the mean, divided by one less than the number of observations. For routine use, however, this is tedious, and for many purposes it is good enough to use the range, that is, the difference between the largest and smallest observations in a set. For any size of group the range is related to the standard deviation by a simple proportionality constant, d_n (Table I); for example, with samples of size 3 on the average the range is 1.69 times the standard deviation.

TABLE I. FACTORS FOR CONSTRUCTION OF CONTROL CHARTS USING MEAN RANGE

n     d_n     A'0.025   A'0.001   D'0.975   D'0.999
2     1.13    1.229     1.937     2.81      4.12
3     1.69    0.668     1.054     2.17      2.98
4     2.06    0.476     0.750     1.93      2.57
5     2.33    0.377     0.594     1.81      2.34
6     2.53    0.316     0.498     1.72      2.21
7     2.70    0.274     0.432     1.66      2.11
8     2.85    0.244     0.384     1.62      2.04
9     2.97    0.220     0.347     1.58      1.99
10    3.08    0.202     0.317     1.56      1.93

Abridged from Tables 10, 13, and 13A of (2).

Figure 3. Moisture of Fibrous Paste after Drying (horizontal axis: percentage moisture)

Figure 4. Distribution of Antibiotic Yields

CONSTRUCTION OF A CONTROL CHART

The function of a control chart is to discriminate between the usual variability and the abnormal variability. As a measure of the former, the variability between observations close together in time or in space can be used. The following example relates to the production of nitrocellulose, 3 batches per day, for which the variation between the 3 batches within each day is taken as a measure of the inevitable variability. Accordingly, the observations (which are viscosities) are written down in rows of 3 (see Table II) corresponding to each day, and for each row the smallest is subtracted from the largest to give the range. When we have about 15, 20, or 25 such groups we can take the average range, R̄, which here, based on the first 15, is 9.867. We can convert this into standard deviation by dividing by the value of d_n (see Table I) appropriate to a sample of size 3, 1.69, giving 5.838. The limits within which the average can reasonably fluctuate can be found from the fact that if individual observations have a standard deviation σ, then the average of n will be distributed with a standard deviation σ/√n. Here σ = 5.838 and n = 3, so the averages will have a standard deviation of 5.838/√3 = 3.371.

TABLE II. ILLUSTRATIVE DATA FOR CONSTRUCTION OF A CONTROL CHART
(Nitrocellulose viscosities for 3 batches per day)

Day   Viscosities      Total   Range   Average
1     58   70   63     191     12      63.7
2     65   68   59     192      9      64.0
3     71   63   76     210     13      70.0
4     62   75   69     209     16      69.7
5     71   65   63     199      8      66.3
6     68   59   58     185     10      61.7
7     51   51   58     160      7      53.3
8     60   57   56     173      4      57.7
9     63   66   65     194      3      64.7
10    60   73   63     196      7      65.3
11    60   63   63     186      3      62.0
12    58   57   67     182     10      60.7
13    57   50   56     163      7      54.3
14    54   61   63     178      9      59.3
15    43   73   49     165     30      55.0
16    60   56   71     187     15      62.3
17    64   47   53     164     17      54.7
18    49   58   50     157      9      52.3
19    48   48   41     137      7      45.7

Now it is a property of the Gaussian distribution that for a variable distributed with a standard deviation σ, 95% of all observations will lie in the range ±1.96σ on either side of the mean. Similarly, 99.8% of all observations will lie in the range ±3.09σ. [For a complete table of the Gaussian distribution see (3).] Accordingly, the averages of the groups of 3 (column 7 in Table II) are calculated and the average is found of these averages, X̄, here 61.8. Inner control lines at 61.8 ± 1.96 × 3.371 = 61.8 ± 6.6, or 55.2 and 68.4, are drawn on a chart. Outer control lines go at 61.8 ± 3.09 × 3.371 = 61.8 ± 10.4, or 51.4 and 72.2. The inner lines are drawn dotted and the outer lines heavy, and from left to right the daily averages are plotted as they occur (Figure 5).

Now in the absence of abnormal sources of variability 95%, or 19 out of 20 points, will be inside the inner control limits; that is, 1 out of 40 above the inner upper control line and 1 out of 40 below the inner lower control line. Similarly, 99.8% of all points should lie within the outer control lines; that is, 1 out of 1000 above the outer upper control line and 1 out of 1000 below the outer lower control line. Thus, because there is only a 1 in 1000 chance of a point's falling beyond, e.g., the upper outer control line on the hypothesis that abnormal sources of variability are not present, if any particular point does so fall, it can be said that since such an event is so unlikely on the stated hypothesis it is more reasonable to suppose that this hypothesis is wrong; therefore, it is concluded that an abnormal source of variability is present.

As regards the inner control lines, for any single point to fall outside one of the limits, say the upper one, is a 1 in 40 chance. This is not so extreme as to be a basis for an immediate decision, but is reasonable grounds for suspicion. Accordingly, we wait for the next point. If it also falls outside the same limit, we have two 1 in 40 probabilities occurring consecutively, which amounts to a 1/40 × 1/40 = 1/1600 chance. We then conclude that the hypothesis of no abnormal variability is untenable, and admit that abnormal variability is present. Similarly, if 2 out of 3 consecutive points fall outside the inner limits we would conclude that abnormal variability is present. However, if the 1 point outside the inner limit was followed by a sequence of points within the limits, we would regard it as the 1 in 40 probability which does occur once in forty times and forget about it.
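The arithmetic of this section can be retraced in a few lines of Python. The sketch below is only an illustration: it takes the Range and Average columns of Table II as tabulated for the first 15 days (rather than recomputing them from the individual viscosities), and the variable names are assumptions of this example. It follows the route described in the text: mean range, then σ by way of d_n, then σ/√n, then the 1.96 and 3.09 multipliers.

```python
import math

# Range and Average columns of Table II, first 15 days, as tabulated.
RANGES = [12, 9, 13, 16, 8, 10, 7, 4, 3, 7, 3, 10, 7, 9, 30]
AVERAGES = [63.7, 64.0, 70.0, 69.7, 66.3, 61.7, 53.3, 57.7,
            64.7, 65.3, 62.0, 60.7, 54.3, 59.3, 55.0]
N = 3        # batches per day
D_N = 1.69   # Table I constant relating mean range to sigma for n = 3

mean_range = sum(RANGES) / len(RANGES)        # about 9.867
sigma = mean_range / D_N                      # about 5.838
sigma_of_average = sigma / math.sqrt(N)       # about 3.371
grand_average = sum(AVERAGES) / len(AVERAGES) # about 61.8

inner_half_width = 1.96 * sigma_of_average    # about 6.6 (1 in 40 each side)
outer_half_width = 3.09 * sigma_of_average    # about 10.4 (1 in 1000 each side)

inner_limits = (grand_average - inner_half_width, grand_average + inner_half_width)
outer_limits = (grand_average - outer_half_width, grand_average + outer_half_width)

# Chance of two consecutive points beyond the same inner limit, if nothing is abnormal.
chance_two_in_a_row = (1 / 40) ** 2           # 1 in 1600
```

Carrying full precision gives limits within a few hundredths of the rounded figures quoted in the text (55.2 and 68.4, 51.4 and 72.2).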

Figure 5. Control Chart for Nitrocellulose Viscosities (horizontal axis: days)

In the chart for nitrocellulose in Figure 5 there are 2 points outside the upper inner control limit at days 3 and 4, so it is concluded that the system was significantly high compared to its over-all average at this point. At day 7 there was a point below the lower inner control limit, but this is followed by points within the limits, so this episode can be forgotten. At day 13 the point falls below the lower inner control limit; day 14 is normal, but day 15 is again below the lower inner control limit. Thus the suspicions which were aroused on day 13 are confirmed, and it is concluded that the average of the process has dropped significantly. Further confirmation of this occurs on days 17 and 18, and on day 19 a point falls well below the lower outer limit.
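The reading of the chart just given can be expressed as a simple rule and applied mechanically. The following Python sketch is one possible encoding, offered as an assumption rather than as the author's stated algorithm: a day is flagged if its average lies beyond an outer limit, or if it lies beyond an inner limit and at least one of the two preceding days lay beyond the same inner limit. It uses the rounded center line and half-widths quoted in the text and the 19 daily averages of Table II.

```python
# Daily averages from Table II (days 1 to 19).
AVERAGES = [63.7, 64.0, 70.0, 69.7, 66.3, 61.7, 53.3, 57.7, 64.7, 65.3,
            62.0, 60.7, 54.3, 59.3, 55.0, 62.3, 54.7, 52.3, 45.7]
CENTER = 61.8   # average of the averages, as quoted in the text
INNER = 6.6     # half-width of the inner (1 in 40) limits
OUTER = 10.4    # half-width of the outer (1 in 1000) limits

def side(average, half_width):
    """+1 above the upper limit, -1 below the lower limit, 0 inside."""
    if average > CENTER + half_width:
        return 1
    if average < CENTER - half_width:
        return -1
    return 0

def find_signals(averages):
    """Return (day, reason) pairs where abnormal variability would be concluded."""
    signals = []
    for index, average in enumerate(averages):
        day = index + 1
        if side(average, OUTER) != 0:
            signals.append((day, "beyond outer limit"))
            continue
        here = side(average, INNER)
        previous_two = averages[max(0, index - 2):index]
        # Assumed reading of the "2 out of 3" rule: both points on the same side.
        if here != 0 and any(side(a, INNER) == here for a in previous_two):
            signals.append((day, "2 of 3 beyond same inner limit"))
    return signals

signals = find_signals(AVERAGES)
signal_days = [day for day, _ in signals]
```

Under this encoding day 7 raises no signal because neither neighbor supports it, while days 4 and 15 do, in agreement with the discussion above.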

There is a slight difference between British and American practice in the choice of control limits. The procedure of using inner and outer limits at defined probability levels, which is described above, is the British practice; the American practice is to use a single set of limits closely corresponding to the British outer limits. Whereas the British outer limits go at 3.09σ, the American single limits go at 3.00σ. For an exactly Gaussian distribution these limits include 99.73% as compared to 99.8% for the 3.09σ limits. Clearly, there is no practical difference between these, as in actuality other approximations of all kinds are being made.

The use of inner limits was outlined above, as in the author's opinion they can add considerably to the sensitivity of the control chart. In the example just discussed, for example, they enabled the drift to be picked up at day 15. If only the outer limits were used, the drift would not have been determined till day 19.

There are a number of ways in which the construction of the chart can be slightly simplified:

There is no literal need to go through the process of calculating σ from R̄, then calculating σ/√n, and then multiplying by 1.96 and 3.09. All these operations can be run together. Table I gives constants denoted by A'0.025 for the inner limit, and A'0.001 for the outer limit, tabulated for different sample sizes from 2 to 10; for example, for samples of size 3 these constants read 0.668 and 1.054, respectively. The mean range R̄, 9.867, is multiplied by these to give 6.6 and 10.4 as the ± limits for 95 and 99.8% probabilities, the same results as obtained from the direct approach earlier.

There is no explicit need to calculate the average for every group. It generally saves time merely to obtain the group totals, which are then plotted on the chart. The average of the totals, rather than the average of the averages, is obtained, and the control lines go at ±nR̄A'0.025 and ±nR̄A'0.001, where n is the number in the group on the chart. The left-hand vertical axis can be marked off in a scale appropriate to group totals, while the right-hand vertical axis can be marked off in averages. This latter provision will allow the average level at which the process is running at any particular point to be read off without actually having to perform the division of the observed total to get the average.

ABNORMAL VARIABILITY WITHIN GROUPS

So far the detection of abnormal variability in the group averages has been discussed. There is in addition a simple method of detecting abnormal variability within the groups. The distribution of the range is known for samples drawn from a Gaussian population, and so grossly excessive ranges can be identified. The mean range is multiplied by constants D'0.975 and D'0.999 (Table I), which have been tabulated for different group sizes from 2 to 10. Here for n = 3, the values of these constants are 2.17 and 2.98; hence the 1 in 40 upper limit for range is 2.17 × 9.867 and the 1 in 1000 is 2.98 × 9.867, or 21.4 and 29.4, respectively. The ranges are plotted consecutively, and these two control limits are drawn in. The results are interpreted in an analogous manner to the interpretation of the charts for averages. Here it appears that the range on day 15 exceeds the 1 in 1000 limit.

Figure 6. Control Chart for Streptomycin Assays

Figure 7. Distribution of Logarithms of Moisture Content Compared with Original Parameter (solid line: log percentages; dotted line: percentages; horizontal axis: logarithm of percentage moisture)

As another example, Figure 6 shows a control chart kept on an antibiotic assay. A large number of streptomycin samples are handled by the customary serial dilution technique (1), and to keep a running check on the accuracy of the assay, 6 samples with a potency of 200 units per ml. are included in every day's work. These are treated exactly like all the other samples, and the observed potencies for the 6 samples for each day are assembled on a form and the total and the range are plotted on appropriate charts. The range chart shows whether the within-day error is running in a stable manner; the totals chart shows if the over-all average for any particular day is high or low. Figure 6 shows one month's working with control lines based on the average range for the previous month. Here the totals chart is not centered on the observed average, but on the known figure of 1200 (6 times 200 units per ml.), because this is what the true average should be. Inspection of Figure 6 shows that the range chart and the totals chart are both in very good control for the whole month, and one can be confident that the assay was running consistently satisfactorily. The chart has the further use that we can take the average range (37.1), and divide it by the value of d_n appropriate to a sample of size 6, 2.53, to give the standard deviation of a single assay, 14.7 units per ml. This is on an average of 200 units per ml., which as a percentage is 7.35%.
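The factor arithmetic of the last two sections is short enough to check directly. The Python sketch below is illustrative only; the dictionary keys and variable names are assumptions of this example, while the numbers are the n = 3 and n = 6 entries of Table I, the Range column of Table II, and the mean ranges quoted in the text. It applies the A' factors to the mean range to get the limits for averages, applies the D' factors to get the limits for ranges, and then repeats the assay calculation, rounding the standard deviation to one decimal as the text does.

```python
MEAN_RANGE = 9.867  # nitrocellulose mean range, from the text

# Table I factors for groups of 3.
FACTORS_N3 = {"A_inner": 0.668, "A_outer": 1.054, "D_inner": 2.17, "D_outer": 2.98}

# Range column of Table II (days 1 to 19).
RANGES = [12, 9, 13, 16, 8, 10, 7, 4, 3, 7, 3, 10, 7, 9, 30, 15, 17, 9, 7]

# Limits for averages, as +/- distances from the center line.
average_inner = FACTORS_N3["A_inner"] * MEAN_RANGE   # about 6.6
average_outer = FACTORS_N3["A_outer"] * MEAN_RANGE   # about 10.4

# Upper limits for the range chart.
range_inner = FACTORS_N3["D_inner"] * MEAN_RANGE     # about 21.4 (1 in 40)
range_outer = FACTORS_N3["D_outer"] * MEAN_RANGE     # about 29.4 (1 in 1000)
days_beyond_range_outer = [
    day for day, r in enumerate(RANGES, start=1) if r > range_outer
]

# Streptomycin assay: 6 standards of 200 units per ml. each day.
ASSAY_MEAN_RANGE = 37.1
D_N_FOR_6 = 2.53
assay_sigma = round(ASSAY_MEAN_RANGE / D_N_FOR_6, 1)  # units per ml., rounded as in the text
assay_sigma_percent = 100 * assay_sigma / 200         # as a percentage of 200 units per ml.
```

Only day 15, with a range of 30, lies beyond the 1 in 1000 range limit, as the text observes.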
This percentage standard deviation is, of course, of use in interpreting assay figures, as we know that the 95% limits of error for a single assay will be ±1.96σ, or ±14.4%, and the 95% limits of error for the difference between any two assays will be √2 times this, or 20.4%. We thus know when to accept a difference between two assays as probably genuine and when to regard it as possibly fortuitous.

Often when the dependent variable has a definite lower limit, and a high standard deviation, it is appreciably skewed. For example, a percentage moisture cannot be less than zero, and if the average is at 1.5%, and the standard deviation about 1.0%, the distribution must be compressed inward on the lower side, as it is impossible to go down to, for instance, two times the standard deviation on the lower side of the mean. Under these conditions, though the distribution of the variable itself is appreciably skewed and non-Gaussian, yet the logarithm of the variable is


often much closer to the Gaussian form. One reason for this is that there is no lower limit to what the logarithm can be; it can go to minus infinity if need be. An example of this was the percentage moisture in Figure 3, which is reproduced in Figure 7 as the dotted line. Superimposed is the distribution of the logarithm of the percentage moisture. It can be seen that the distribution of this latter is much more symmetrical. The short tail on the left is pushed out somewhat, and the long tail on the right is markedly pulled in. In this case, therefore, a control chart would be better kept on the logarithms of the percentage moisture.

CONCLUSION

Though the control chart is a useful tool, it is a very simple one, being only capable of analyzing variability into two parts, and that only roughly. In practice one meets circumstances where one would like numerical estimates of the relative magnitude of the two variabilities, in order to know on which section of the process one should concentrate one's attention; this information the control chart cannot provide, and the more advanced statistical technique known as the analysis of variance is necessary. Again, situations arise in which there may be three, four, or more different sources of variability, all superimposed on each other. These more complicated systems can also be well handled with the analysis of variance.

ACKNOWLEDGMENT

The author is indebted to D. K. Lapedes for the control chart in Figure 6.

LITERATURE CITED

(1) Donovick, R., Hamre, D., Kavanagh, F., and Rake, G., J. Bact., 50, 623-8 (1945).
(2) Dudding, B. P., and Jennett, W. J., "Quality Control Charts," British Standard 600 R, London, British Standards Institution, 1942.
(3) Hodgman, C. D., ed., "Handbook of Chemistry and Physics," p. 200, Cleveland, Ohio, Chemical Rubber Publishing Co., 1945.
(4) Shewhart, W. A., "Economic Control of Quality of Manufactured Product," New York, Macmillan Co., 1931.

RECEIVED May 24, 1950. Presented before the Division of Industrial and Engineering Chemistry, Symposium on Statistics in Quality Control in the Chemical Industry, at the 117th Meeting of the AMERICAN CHEMICAL SOCIETY, Detroit, Mich.

INDUSTRIAL AND ENGINEERING CHEMISTRY plans to publish two groups of papers this Fall which will add to the now scant literature on the use of statistical methods in chemical engineering. The first of these, a Symposium on the use of computer machines, covers a wide range of applications in specific fields such as heat transfer, fluid flow, distillation, absorption, vapor-liquid equilibria, chemical equilibria, and thermodynamics. The work reported was done on both analog and digital computers; a paper by G. W. King of Arthur D. Little, Inc., discusses the solution of a gaseous diffusion problem by the Monte Carlo method, using a digital computer. The Symposium on Statistical Methods in Chemical Production, which was presented at the Spring Meeting of the AMERICAN CHEMICAL SOCIETY in Boston, April 1-5, brought forth some valuable information on the use of statistics in "trouble-shooting" (locating sources of variability) and methods for the treatment of various production data. Included are discussions on the science of production trouble-shooting; locating sources of variability; statistical design in chemical experimentation; correlation methods applied to production data; and application of tests for randomness to chemical problems.