Ind. Eng. Chem. Fundam., Vol. 17, No. 4, 1978
Fitting Data to Dimensionless Groups
Russell R. Krug
Chevron Research Company, Richmond, California 94802
Simulations by Rowe (1974) demonstrate that false correlations using dimensionless groups may result when a measured variable appears in more than one group. The statistical analysis presented here reveals the structure of such false correlations. Statistically unbiased empirical correlations are obtained by a transformation of the original dimensionless groups without loss of the advantage gained by dimensional analysis of reducing the number of variables to a minimum.
The exact solutions to the equations of change are not known for many problems of interest in chemical engineering. Dimensional analyses of the relevant mass and energy balances, however, reveal the minimum number of parameters that affect the response of the system under consideration. These parameters are dimensionless groupings of measurable physical quantities such as characteristic size, velocity, fluid properties, etc. A statistical data-fitting problem occurs when the same measurable physical quantity contributes to more than one dimensionless group. If several physical quantities are measured with different precisions or accuracies, the correlations that result from plotting these groups may be seriously biased by the propagation of measurement errors of the least precisely measured physical quantities. Rowe (1974) demonstrated by simulations that physically meaningless correlations can be observed when the variations of measurements are sufficiently large and random, as might occur for extreme measurement errors masking true physical variations. Entirely false correlations have also been previously uncovered by Krug et al. (1976, 1977) in another area of chemical analysis. Such poor data would probably not be present in dimensional analysis studies, however. The effect of measurement errors propagating through multiple dimensionless groups is shown below to bias observed correlations away from those that would be observed in the absence of measurement errors. Thus, to obtain a statistically unbiased fit to dimensionally grouped data, a fundamental understanding of the error structures associated with the groups is necessary to determine how to perform the regression of one dimensionless group onto another. This paper is concerned with determining the functionality displayed by this propagation of errors and the transformations necessary to allow the data to be fit in an unbiased manner by the usual statistical methods.
The analysis reveals that the dimensionality advantage gained from the dimensional analysis can be retained and is not inconsistent with proper statistical data treatment.
Theory
Consider the case for which a physically measurable quantity $x_2$ is common to two dimensionless groups and assume this quantity is measured much less precisely than the combinations of all other measurable quantities, $x_1$ and $x_3$, that are relevant to a particular dimensional analysis.
$$N_1 = x_1 x_2^{p_1} \tag{1}$$
$$N_2 = x_3 x_2^{p_2} \tag{2}$$
When looking for a functional relationship between two dimensionless groups, the experimenter might start by considering the familiar power-law model
$$N_2 = K N_1^{p} \tag{3}$$
where $p$ is the value of the slope in $\ln N_2$-$\ln N_1$ coordinates. The experimenter would actually be plotting $\ln(x_3 x_2^{p_2})$ vs. $\ln(x_1 x_2^{p_1})$. It is readily shown that if the values of $N_1$ and $N_2$ change only from variations of $x_2$, the expected value of $p$ is $p_2/p_1$.
$$p = \frac{d \ln N_2}{d \ln N_1} = \frac{d \ln (x_3 x_2^{p_2})}{d \ln (x_1 x_2^{p_1})} = \frac{d \ln x_2^{p_2}}{d \ln x_2^{p_1}} = \frac{p_2}{p_1} \tag{4}$$
Since the equations of change predict that $x_1$ and $x_3$ must vary with $x_2$, the result in eq 4 happens only when the variation of $x_2$ is due to measurement errors, not due to real physical variations. Such a trivial and false correlation should never be seriously reported, except to demonstrate that entirely false correlations are possible due to the nature of dimensionless groups (cf. Rowe, 1974). A more realistic case is treated below for which the error variance is smaller than the variance due to physical effects yet contributes a bias to the observed correlation.
The more complete analysis is obtained by considering the expected value of the slope in $\ln N_2$-$\ln N_1$ coordinates by least squares as was done by Berkson (1950) in his classic analysis of regression with error in both variables. For this analysis we must take the errors to be additive to $z_i = \ln x_i$ where the errors have zero mean and constant variance. Taking $y = \ln N_2 = (z_3 + \epsilon_3) + p_2(z_2 + \epsilon_2)$ and $x = \ln N_1 = (z_1 + \epsilon_1) + p_1(z_2 + \epsilon_2)$, the least-squares value of the slope in $\ln N_2$-$\ln N_1$ coordinates is the power law exponent
$$p = \frac{\sum (y - \bar{y})(x - \bar{x})}{\sum (x - \bar{x})^2} \tag{5}$$
Expanding gives
$$\begin{aligned}
p &= \frac{\sum (z_3 + \epsilon_3 + p_2 z_2 + p_2 \epsilon_2 - \bar{z}_3 - p_2 \bar{z}_2)(z_1 + \epsilon_1 + p_1 z_2 + p_1 \epsilon_2 - \bar{z}_1 - p_1 \bar{z}_2)}{\sum (z_1 + \epsilon_1 + p_1 z_2 + p_1 \epsilon_2 - \bar{z}_1 - p_1 \bar{z}_2)^2} \\
&= \frac{\sum [(z_3 - \bar{z}_3) + p_2 (z_2 - \bar{z}_2) + \epsilon_3 + p_2 \epsilon_2][(z_1 - \bar{z}_1) + p_1 (z_2 - \bar{z}_2) + \epsilon_1 + p_1 \epsilon_2]}{\sum [(z_1 - \bar{z}_1) + p_1 (z_2 - \bar{z}_2) + \epsilon_1 + p_1 \epsilon_2]^2}
\end{aligned} \tag{6}$$
After multiplying out, several special cases become apparent. If the errors $\epsilon_1$, $\epsilon_2$, $\epsilon_3$ are assumed to be independent, the expected value of the slope as determined by least squares is very nearly
$$E[p] \approx \frac{\mathrm{Cov}(z_1, z_3) + p_1 \mathrm{Cov}(z_2, z_3) + p_2 \mathrm{Cov}(z_1, z_2) + p_1 p_2 \mathrm{Var}(z_2) + p_1 p_2 \mathrm{Var}(\epsilon_2)}{\mathrm{Var}(z_1) + p_1^2 \mathrm{Var}(z_2) + \mathrm{Var}(\epsilon_1) + p_1^2 \mathrm{Var}(\epsilon_2) + 2 p_1 \mathrm{Cov}(z_1, z_2)} \tag{7}$$
This result is obviously biased by the variance of the measurement errors and is different from the fundamental answer sought for the power law equation. Considering the limiting case of invariant physical measures and the
correlation due just to the propagation of independent measurement errors, the expected value of the estimated power law exponent is
$$E[p] = \frac{p_1 p_2 \mathrm{Var}(\epsilon_2)}{\mathrm{Var}(\epsilon_1) + p_1^2 \mathrm{Var}(\epsilon_2)} \tag{8}$$

Figure 1. A false correlation generated by Rowe (1974) due to random variations of heat transfer coefficient and velocity. [Plot of Stanton number vs. Froude No. × Weber No. on logarithmic axes.]

Figure 2. A plot of the transformed dimensionless groups' data reveals that no physically meaningful correlation exists between St and WeFr for these data by Rowe (1974).
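The bias described by eq 7 can be checked numerically. The following is only a minimal sketch under stated assumptions: the exponents, variances, sample size, and the helper names `predicted_slope` and `fitted_slope` are hypothetical choices for illustration and do not come from the paper. It builds error-free log-quantities that obey an exact power law, adds independent errors, and compares the least-squares slope with the value eq 7 predicts.

```python
import numpy as np

def predicted_slope(z1, z2, z3, p1, p2, var_e1, var_e2):
    """Expected least-squares slope from eq 7 (independent errors assumed)."""
    c = np.cov(np.vstack([z1, z2, z3]))  # sample covariance of the error-free log-values
    num = c[0, 2] + p1 * c[1, 2] + p2 * c[0, 1] + p1 * p2 * c[1, 1] + p1 * p2 * var_e2
    den = c[0, 0] + p1**2 * c[1, 1] + var_e1 + p1**2 * var_e2 + 2 * p1 * c[0, 1]
    return float(num / den)

def fitted_slope(x, y):
    """Ordinary least-squares slope of y on x (eq 5)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sum(xc**2))

rng = np.random.default_rng(0)
n = 200_000
p1, p2 = 4.0, -1.0   # hypothetical exponents of the shared quantity x2
true_p = 0.5         # hypothetical error-free power law exponent

# Error-free log-quantities; z3 is chosen so that ln N2 = true_p * ln N1 exactly.
z1 = rng.normal(0.0, 1.0, n)
z2 = rng.normal(0.0, 0.3, n)
z3 = true_p * (z1 + p1 * z2) - p2 * z2

# Independent, zero-mean measurement errors; x2 is the least precise quantity.
sd_e1, sd_e2, sd_e3 = 0.05, 0.30, 0.05
e1 = rng.normal(0.0, sd_e1, n)
e2 = rng.normal(0.0, sd_e2, n)
e3 = rng.normal(0.0, sd_e3, n)

x = (z1 + e1) + p1 * (z2 + e2)   # ln N1 as measured
y = (z3 + e3) + p2 * (z2 + e2)   # ln N2 as measured

observed = fitted_slope(x, y)
expected = predicted_slope(z1, z2, z3, p1, p2, sd_e1**2, sd_e2**2)
print(f"true exponent {true_p}, observed {observed:.3f}, eq 7 predicts {expected:.3f}")
```

With these assumed values the fitted exponent falls well below the error-free exponent, and passing constant arrays for the `z` values with `var_e1 = 0` reduces `predicted_slope` to $p_2/p_1$.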
The observed power law exponent further reduces to the earlier observation $E[p] = p_2/p_1$ if the correlation is due entirely to measurement errors associated with $z_2$. The bias associated with the observed value of the power law exponent can be eliminated if the correlating variables, the dimensionless groups, have uncorrelated error structures. The error structures may be made uncorrelated by a proper transformation of the relevant dimensionless groups. The proper procedure, then, is to perform the regression with the errors in $y$ and $x$ independent of one another. This can be accomplished by combining the dimensionless groups to eliminate the common source of measurement errors. Consider two dimensionless groups for which, after inspecting relative errors, two physical measures $x_2$ and $x_4$ have significantly more measurement error than do the rest of the physical measures, which are grouped here as $x_1$ and $x_3$.
$$N_1 = x_1 x_2^{p_1} \tag{9}$$
$$N_2 = x_3 x_2^{p_2} x_4^{p_3} \tag{10}$$
The plot should then be $N'_2$ vs. $N'_1$ (or $\ln N'_2$ vs. $\ln N'_1$ for a power law model) where
$$N'_1 = N_1^{1/p_1}, \qquad N'_2 = \left(N_2 N_1^{-p_2/p_1}\right)^{1/p_3} \tag{11}$$
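The transformation of eq 11 can be sketched numerically for the Stanton vs. Froude-times-Weber case. This is an illustrative sketch, not Rowe's original simulation: the lognormal distributions, the constants, the helper names `transform_groups` and `loglog_slope`, and the definitions $Fr = u^2/(gL)$ and $We = \rho u^2 L/\sigma$ (so that the length scale cancels in the product) are all assumptions made here.

```python
import numpy as np

def transform_groups(n1, n2, p1, p2, p3):
    """Eq 11: N'1 = N1**(1/p1), N'2 = (N2 * N1**(-p2/p1))**(1/p3)."""
    n1_t = n1 ** (1.0 / p1)
    n2_t = (n2 * n1 ** (-p2 / p1)) ** (1.0 / p3)
    return n1_t, n2_t

def loglog_slope(x, y):
    """Least-squares slope of ln y on ln x."""
    lx, ly = np.log(x), np.log(y)
    lx = lx - lx.mean()
    return float(np.sum(lx * (ly - ly.mean())) / np.sum(lx**2))

rng = np.random.default_rng(1)
n = 50_000
# Assumed, physically unrelated inputs: u and h are drawn independently.
u = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # velocity, plays the role of x2
h = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # heat transfer coefficient, x4
rho, cp, sigma, g = 1000.0, 4180.0, 0.072, 9.81  # illustrative constants only

st = h / (rho * cp * u)            # N2 = x3 * u**(-1) * h**(1)
frwe = rho * u**4 / (sigma * g)    # N1 = x1 * u**4

raw = loglog_slope(frwe, st)
n1_t, n2_t = transform_groups(frwe, st, p1=4.0, p2=-1.0, p3=1.0)
transformed = loglog_slope(n1_t, n2_t)
print(f"raw slope {raw:.3f} (p2/p1 = -0.25), transformed slope {transformed:.3f}")
```

Under these assumptions the raw groups show a slope close to $p_2/p_1 = -1/4$ even though `u` and `h` are unrelated, while the transformed groups show a slope close to zero.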
The plot is just a weighted $x_2$ vs. a weighted $x_4$ (or a weighted $z_2$ vs. a weighted $z_4$ for the power law model). If a line is to be fitted to these data, the techniques of fitting lines to data with error in both variables may be used as detailed by Bartlett (1949), Cochran (1968), and Madansky (1959). If either $x_2$ or $x_4$ is the ratio of two measured quantities, Fieller's theorem (1940) may be used to determine the uncertainty of the ratio. If the error associated with the measurement of $x_4$ is much smaller than the error associated with $x_2$, then the usual methods of regression may be used for which there is no error in the independent variable $x$. After an unbiased fit of $N'_2$ to $N'_1$ is obtained, the relationship between $N_2$ and $N_1$ may then be determined using eq 11 and 12 without worry of introducing bias by the propagation of measurement errors.

An Example. To illustrate that false correlations might result from the propagation of measurement errors in dimensional analysis, Rowe (1974) performed several simulations. Taking randomly generated values of heat transfer coefficient, $h$, and velocity, $u$, he made dimensionless group plots that showed good correlations. Figure 1 shows his plot of Stanton number vs. Froude times Weber numbers as an example. Figure 2, which takes $x_2 = u$ and $x_4 = h$ and is the proper plot for data fitting, reveals that no physically meaningful correlation exists between St and FrWe. Also, the data in Table I show that the above analysis predicting an error slope of $p_2/p_1$ for the false power law exponent, $p$, is substantiated by Rowe's simulations.

Table I. The Simulated False Correlations of Rowe (1974) Are Compared with the Values Predicted by an Analysis of the Propagation of Errors. The Observed Power Law Exponents Agree for These False Correlations with the Values Predicted by the Error Analysis

plot           p (observed slope)    p2/p1 (error slope)    x2
St-Re          -1.12                 -1/1                   u
St-We          -0.50                 -1/2                   u
St-Fr          -0.496                -1/2                   u
Nu-Gr           0.56                  1/3                   d
Nu(d/D)-Gr      0.86                  2/3                   d
St-FrWe        -0.245                -1/4                   u

If a correlation were discernible in the transformed coordinates of Figure 2, a consideration of the relative magnitude of the uncertainties of the $y$ and $x$ variables would be necessary to obtain an unbiased fit. Berkson (1950) concluded that the slope, $b$, calculated by least squares is related to the “true” or functional value, $\beta$, by
$$b = \frac{\beta V(x)}{V(x) + V(\epsilon_x)} \tag{12}$$
where $V(x)$ is the variance of the $x$ values and $V(\epsilon_x)$ is the variance (square of the standard deviation) of the uncertainties of the $x$ values. The intercept at the mean of the experimental range is unbiased, however. A more thorough analysis was given by Bartlett (1949), and the confidence intervals were derived by Creasy (1956). These developments were reviewed by Madansky (1959) and Cochran (1968) and adapted to a chemical data fitting problem by Krug et al. (1976, 1977).

If the correlation in the transformed coordinates appears to be nonlinear, the best procedure is to multiply the $x$ ordinate by $\lambda^{1/2}$ where $\lambda = V(\epsilon_y)/V(\epsilon_x)$ so that the uncertainty of the $y$-measures is the same as that of the ($\lambda^{1/2} x$) measures. The residuals to be minimized for fitting the data are now the perpendiculars between the data and the fitted line. A rigorous statistical procedure to cover all such nonlinear data fitting problems probably can never be offered, but the experimentalist can obtain a good graphical fit by remembering that the residuals to be minimized are the perpendiculars, not the vertical displacements as is the case for ordinary least squares.
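The attenuation of eq 12 and the perpendicular-residual fit after scaling by $\lambda^{1/2}$ can be illustrated for the straight-line case. This is a sketch under assumptions, not the paper's procedure: $V(x)$ is taken here to be the variance of the error-free $x$ values, the error variances are treated as known, and the function names and numerical values are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (vertical residuals)."""
    xc = x - x.mean()
    return float(np.sum(xc * (y - y.mean())) / np.sum(xc**2))

def perpendicular_slope(x, y, lam):
    """Slope minimizing perpendicular residuals after scaling x by sqrt(lam),
    where lam = V(e_y) / V(e_x) is assumed known."""
    xs = np.sqrt(lam) * x
    centered = np.column_stack([xs - xs.mean(), y - y.mean()])
    # The first right-singular vector is the principal axis of the scaled cloud,
    # i.e. the line with the smallest sum of squared perpendicular distances.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    return float(dy / dx * np.sqrt(lam))  # convert back to the original x units

rng = np.random.default_rng(2)
n = 100_000
beta = 2.0                  # assumed "true" slope
var_ex, var_ey = 0.25, 1.0  # assumed error variances of x and y

x_true = rng.normal(0.0, 1.0, n)
x = x_true + rng.normal(0.0, np.sqrt(var_ex), n)
y = beta * x_true + rng.normal(0.0, np.sqrt(var_ey), n)

b = ols_slope(x, y)
b_eq12 = beta * x_true.var() / (x_true.var() + var_ex)
b_perp = perpendicular_slope(x, y, lam=var_ey / var_ex)
print(f"OLS {b:.3f}, eq 12 predicts {b_eq12:.3f}, perpendicular fit {b_perp:.3f}")
```

In this simulation the ordinary least-squares slope is pulled toward zero by about the factor eq 12 predicts, while the perpendicular fit in the scaled coordinates stays near the assumed `beta`.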
Lastly, if the errors associated with the $y$ or $x$ variables are due to a ratio of the least precisely measured physical quantities, Fieller's theorem may be used to calculate the variance of the ratio. For example, if both $x_i = h$ and $x_j = C_p$ from Figure 2 were measured with much less precision than $\rho$, $\sigma$, and $g$, the error variance of the ratio, $r = h/C_p$, may be estimated by a consideration of the confidence interval about $r$. The confidence interval for $r$ at any level of confidence for which the $t$ statistic is evaluated is given by Fieller (1940) as the roots of the quadratic
and the error variance associated with $y = N'_2$ is
because $[\rho/\sigma g]^{1/4}$ is taken as being very precisely measured relative to $h$, $C_p$, and $u$.

Without the insight from the statistical analysis presented above, Rowe (1974) unfortunately concluded that empirical correlations should be made primarily between the individual physical quantities whenever possible and between dimensionless groups or other such combinations of physical quantities only when absolutely necessary. Although this reasoning makes good sense for brute force empirical model building, many systems of interest to the chemical industry have their various physical quantities obviously interrelated by the equations of change. A dimensional analysis of those equations identifies the minimum number of parameters as combinations of physical quantities that functionally affect the response of the system. Since it also makes good sense in empirical model building to work with the minimum number of independent variables, this procedure for data fitting is desirable because it enables the researcher to work with only the minimum number of variables as determined by dimensional analysis. This procedure allowed the correlation analysis in Figure 2 to remain merely a two-dimensional problem as in Figure 1 instead of as many dimensions as there are measured physical quantities as promoted by Rowe (1974). If a functional relationship did exist, it would be displayed as a structured variation in Figure 2, which could be fit by the procedures mentioned above and transformed back to the original coordinates using eq 11 and 12. The usual dimensionless groups data fitting procedure has the risk of fitting a correlation that results from or is strongly influenced by the propagation of measurement errors rather than one that results solely from the physical interactions that are described by the equations of change.
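The use of Fieller's theorem for a ratio such as $r = h/C_p$ can be sketched as follows. The quadratic used here is the standard textbook form of Fieller's result, $(b^2 - t^2 v_b) r^2 - 2(ab - t^2 v_{ab}) r + (a^2 - t^2 v_a) = 0$ for a ratio $a/b$, which is not necessarily the notation of the paper. The function name, the numerical values, and the step of converting the interval half-width into a standard deviation are assumptions for illustration.

```python
import math

def fieller_interval(a, b, var_a, var_b, cov_ab=0.0, t=1.96):
    """Confidence limits for r = a/b as the roots of Fieller's quadratic
    (b^2 - t^2 var_b) r^2 - 2 (a b - t^2 cov_ab) r + (a^2 - t^2 var_a) = 0.
    Returns None when the denominator is not significantly different from
    zero, in which case the interval is not a bounded segment."""
    qa = b * b - t * t * var_b
    qb = -2.0 * (a * b - t * t * cov_ab)
    qc = a * a - t * t * var_a
    disc = qb * qb - 4.0 * qa * qc
    if qa <= 0.0 or disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa))

# Hypothetical measurements: h = 500 with sd 25, Cp = 4180 with sd 40.
h, cp = 500.0, 4180.0
t = 1.96
lo, hi = fieller_interval(h, cp, var_a=25.0**2, var_b=40.0**2, t=t)

# One way to turn the interval into an error variance for r (an assumption here):
# treat the half-width as t standard deviations.
sd_r = (hi - lo) / (2.0 * t)
var_r = sd_r**2
print(f"r = {h / cp:.4f}, interval = ({lo:.4f}, {hi:.4f}), var_r = {var_r:.3e}")
```

When the denominator is precisely measured, this interval is nearly symmetric about `h / cp`, and the resulting variance is close to the usual first-order propagation-of-error estimate.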
Conclusions
Biased and even false correlations between dimensionless groups may be observed when an imprecisely measured variable contributes to the value of more than one dimensionless group. When dimensionless groups contain common measured variables subject to substantial measurement error, the functional relationship between dimensionless groups must be determined in the transformed coordinates presented above to avoid biased or false correlations which appear to represent true physical effects. The transformed dimensionless groups may then be fit using the usual procedures for regression with errors in both variables.

Nomenclature
$b$ = slope calculated by least squares
Cov = covariance, $\mathrm{Cov}(y,x) = E[(y - \bar{y})(x - \bar{x})]$
$C_p$ = specific heat
$d$ = diameter
$D$ = reference diameter
$E[p]$ = expected value of $p$
Fr = Froude number
$g$ = gravitational constant
Gr = Grashof number
$h$ = heat transfer coefficient
$K$ = power law proportionality constant
$N_i$ = a dimensionless group
$N'_i$ = a transformed dimensionless group composed of one or more $N_i$
Nu = Nusselt number
$p$ = the exponent in a power law model
$p_i$ = the power to which a physical parameter is raised in a dimensionless group $N_i$
$r$ = ratio of $h/C_p$
Re = Reynolds number
St = Stanton number
$t$ = t-statistic
Var = variance, $\mathrm{Var}(x) = E[(x - \bar{x})^2]$
$u$ = velocity
We = Weber number
$x$ = variable plotted against x coordinate
$\bar{x}$ = the arithmetic average of various values of $x$
$x_i$ = a physical quantity such as $u$ or $h$ of which dimensionless groups $N_i$ are composed
$y$ = variable plotted against y coordinate
$\bar{y}$ = the arithmetic average of various values of $y$
$z_i$ = logarithm of $x_i$
$\bar{z}$ = the arithmetic average of various $z_i$

Greek Letters
$\beta$ = true value of a slope
$\rho$ = density
$\epsilon_i$ = error associated with variable $i$
$\lambda$ = ratio of variances, $V(\epsilon_y)/V(\epsilon_x)$
$\sigma$ = surface tension

Literature Cited
Bartlett, M. S., Biometrics, 5, 207 (1949).
Berkson, J., J. Am. Stat. Assoc., 45, 164 (1950).
Cochran, W. G., Technometrics, 10, 637 (1968).
Creasy, M. A., J. Roy. Stat. Soc., B, 18, 65 (1956).
Fieller, E. C., J. Roy. Stat. Soc., Suppl., 7, 1 (1940).
Krug, R. R., Hunter, W. G., Grieger, R. A., Nature (London), 261, 566 (1976).
Krug, R. R., Hunter, W. G., Grieger, R. A., J. Phys. Chem., 80, 2335, 2341 (1976).
Krug, R. R., Hunter, W. G., Grieger-Block, R. A., ACS Symp. Ser., 52, 192 (1977).
Madansky, A., J. Am. Stat. Assoc., 54, 173 (1959).
Rowe, P. N., CHEMTECH, 14, 9 (1974).
Received for review December 27, 1977 Accepted