Dale A. Brandreth' University of Toronto Toronto 5, Ontario Conada
II
lome Practical Aspects
M o s t experimentalists and some theoreticians are at some time in their careers faced with the problem of representing data by means of approximating functions. Sometimes the form of the function to be used is suggested by theoretical considerations; often it is not. Unfortunately, despite the extensive literature devoted to this subject, there has been only a token effort directed at developing a practical scheme for attacking the sort of curve fitting problems which arise so frequently in experimental work. Too few of the theoretical results known to mathematicians for a long time have been presented in an assimilable form to the engineers and scientists who could make good use of such results. I n what follows an attempt is made to extend the number of possible methods for solving curve fitting problems beyond the few very common methods, and also to point out several pitfalls which commonly arise. There is no pretense either of completeness or of rigor in this presentation. Completeness would necessitate a book; rigor a t this stage would impede progress. The sort of curve fitting problems considered here are those in which an experiment has yielded a set of points each of which has associated with it a certain experimental error. The problem is to find a reasonable representation of the points by some analytic function. The choice of best fit is guided by the experimenter's knowledge of the behavior of the system under study as well as by some measure of the differences between the curve and the points. Problems which deal with interpolation of "exact" data, as from tables of functions, or which deal with approximating a complicated function over a fairly long range by a simpler function with an easily computable form are not treated here, although such problems are quite similar. For the sake of simplicity only functions of a single variable will be considered here. There are three things to be decided before proceeding with the mechanics of the problem: the criterion of approximation, the form of the function, and the weighting of the points. These are dealt with in turn below, and an example is given for illustration. The Criterion of Approximation
Here the only criterion of approximation used will be the least squares criterion. The least squares criterion
'
Present address: Organic Chemicals DepartmenL, E. I. Do Pont,, Wilmington, Del. 21f the measurements are of meqnal precision, probability theory says t,he hestvalrteis that for which the somof t,heweighted sqnares is a minimum. a The residual3 are the vertical distances of the points fmm the fitted curve, assuming that the error in the variable plolted as the abscissa is negligible compared to the variable plolted as ordinate. The more general case can be allowed for.
01 curve Fitting
has a sound basis in probability theory (I) and is the most common choice. Given a set of n points determined with equal preci~ion,~ the least squares criterion says that the best or most probable value is that value for which the sum of the squares of the errors is a minimum. When the Gaussian error distribution does not apply, the criterion of best value will be different. The extension to curve fitting is simple. In curve fitting the best representative curve is that for which the sum of the squares of the residuals3 is a minimum. I t should be made clear that the criterion of approximation is only a measure of how closely the curve comes to the points; it is not the only criterion applied to the fitted curve, for it may happen that the curve passes very close to all the points and must still be rejected for its bad behavior between the points. The Form of the Function
The crux of the problem of curve fitting is choosing a function which can take on the same behavior as the set of points. There are few guidelines to facilitate the choice. As Rice (3) has pointed out: ''This is one of those areas where mathematics is an art rather than a science, and the only guides are experience, intuition, and experiment." The common use of polynomials is at least made plausible by a theorem of Weierstrass: An arbitrary continuous function can be approximated by means of a polynomial within any prescribed degree of accuracy. Of course, this theorem still leaves undecided the degree of the polynomial. If a polynomial of too high degree is chosen, the likelihood of introducing possibly undesirable inflection points becomes greater. A simple practical method of approximating the degree is to draw a reasonable approximating curve, and to then make a table of values of ,f(x) for equally spaced values of x. A differencetable (3) can then be constructed to see what order of differences are approximately constant. If the differencesof nth order are approximately constant, then a polynomial of degree n should be tried. Polynomials are greatly overworked in curve fitting probably because so little consideration has generally been given to other functions in the engineering and applied mathematics literature. Rice (4) has pointed out that the rational functions
have a much greater degree of flexibility inasmuch as a rational function of relatively low degree can take on a form which cannot be effectively approximated by polynomials of low or medium degree. This type of funcVolume 45, Number 10, October 1968
/
657
tion is more difficult than a polynomial to evaluate, but with digital computers so commonplace, that consideration is minor. Another artifice which is sometimes of use with polynomials is to attempt to transform the given set of points to another form more easily represented by polynomials. For example, the given set of points might be divided or multiplied by a function which would have the effect of linearizing or otherwise simplifying the set of points for polynomial representation. This again involves judging what a suitable form of function would be to effect the transformation, and again there is no simple mle to follow. Related to the algebraic polynomial fitting just described is snlinefittinr in which nieceuise ~olvnomialsof nth degreeLarejoined smoothly so that thky have n - 1 continuous derivatives. These spline functions are useful because of their relative ease of computation and wide flexibility. Price and Simonsen (5) have outlined the use of spline functions at more length. Before trying more complicated forms, it should be considered whether the data might be fitted using commou nonalgebraic functions such as the exponential, logarithmic, or trigonometric functions either by themselves or in combination with polynomials of low degree. A method of curve fitting which has seldom been used but which has much to recommend it is the use of a type of discontinuous orthogonal polynomials known as Vettin polynomials. These polyuomials are analogous to the Legendre polynomials, and in fact are defined in such a way that
-
where w is the equidistant length between points and Pi(x) is the Legendre polynomial. Like the Legendre polynomials they are defined in the range x = -1 to x = +l. The first four Vettin polynomials are
The restriction of mutual orthogonality takes the form
where N = l/w. It is then assumed that f(x) can be represented by a least squares approximation of kth degree:
The entire procedure is similar to finding the constants in a Fourier series expansion, except that with the use of a finite number of points, summations are made rather than integrations. This method has the advantages that the calculations are easy and that the values of the constants are independent of the point of truncation of the series. A considerable disadvantage is that equidistant points are required. However, a curve can be drawn by eye and equidistant values can he read off. It is sometimes advantageous to proceed in this manner because it is thereby possible for the experimenter to build into the fitted curve his knowledge of the system's behavior-knowledge which often cannot be easily expressed mathematically in the form of additional constraints. But there is a danger too in this method because it is possible to build preconceived ideas into the resulting fitted curve. Further details of the use of Vettin polynomials have been given by Jost and Rock (6). The use of similar orthogonal polynomials has been discussed in other places (5). A method using continued-fraction expansions has been discussed by Lapidus (7), but no evaluation of its usefulness has appeared. Weighting of the Points
The weighting of the points must be chosen on the basis of an error analysis for the experiment which generated the points. I n the absence of better information it is commonly assumed that equal weighting of all the points is satisfactory. One of the main points to be emphasized here is that unequal weighting may he introduced without being realized. Suppose that a suitable function has been chosen, and suppose that it is desired to weight equally a set of points distributed equally over the x-range. The choice of the funct,ion and the least squares criterion together already prescribe a certain weighting of the points vhich is not necessarily the desired equal weighting. An example will illustrate this point. * Assume that data have ~i~~~~ I . Aaumed p.rabo~o-~ike been obtained which look dots. like those in Figure 1. An obvious choice of a function is a parabola. Thus h = x(1 - x)A, where A is a constant. Then the least squares criterion says that: S = (L- z ~ ( l X ~ ) A ) ~( h ~ za(1 - z1)A)= . . .
Fi +
h
-1
+ A
+
(6)
for n points and that
The least squares restriction has the form with the result that Equations (2)-(4) are sufficient to fix the form of the polynomials and to determine the constants a;. The constants ai are given by Consider now the effect of the points at both extremes of the x-range in determining the value of the constant A. Clearly their contribution is minor com658
/
Journal o f Chernkol Educofion
pared to the mid-range values. The points are not equally weighted. I n this case a simple remedy would be to fit not h-values, but h/x(l - x)-values. Then if the above procedure is carried out, it is found that
I
"
"
"
"
'
I
where each point contributes equally to A. Similar situations arise with othcr functions, but with a larger number of constants the weighting is not so apparent. Mechanics of Solution
The mechanics of the solution of the least squares curve fitting problem will he considered very briefly. Having adopted a suitable function with m constants and a scheme of weighting, the least squares criterion indicates that the expression for the sum of the squares of the residuals should be differentiated with respect to each of the constants and that each of these partial derivatives should then be set equal to zero. Thus a set of m equations in m unknowns results. I t is very desirable that these equations be linear in the constants. If the rational functions described earlier are used, for example, a set of complicated nonlinear algebraic equations results. Methods for the solution of nonlinear equations do exist, but difficult problems can arise. Scarborough (8) gives an iterative procedure which can be used if a reasonably close first approximation to the solution is kno\vn. Even when the equations are linear it is quite common with algebraic polynomials for difficulties to arise because the coefficient matrix is ill-conditioned, i.e., the coefficients in the set of equations are in such a ratio to each other that round-off errors build up so rapidly that unusable solutions result unless a large number of decimal places is retained in the calculations. The use of double precision arithmetic (16 decimal places) instead of the usual single precision arithmet,ic (8 decimal places) in high-speed digital computers usually overcomes this difficulty, but of course the situation depends on the number of equations and on just how ill-conditioned the matrix is. A n Example
To illustrate some of the points mentioned here an example from thermodynamics will be considered. The problem is t,o find a thermodynamically suitable function to represent the excess enthalpy (heat of mixing), hE, for binary liquid mixtures involving an alcohol and a "simple" substance over the entire composition range, i.e., mole fraction alcohol, x, from x = 0 to z = 1. I n this case thereare additional restrictions placed on the function. Although thermodynamics gives no information as to the form of function to use, it does impose certain restrictions on such functions. Miller (9) has summarized some of the mathematical properties of a suitable function: (1) The f~mrtionis contin~ons (2) It possesses only one derivative a t each point, 13) I t is as free as possible from inflection points (4) It is singlovalued in composition ( 5 ) It is homogeneous of degree zetn in the mole fractions (6) I t admits of a total differential
Figure 2. A comporiron of various equotionr for Rtting excess entholpy doto for a binav rolution of o Rvorooicohol in chloroform.
= 0 f o r 5 = O and z = 1, where.fE reprexents nnv of the excess fnnct,ians (8) j8/z(l - I)mmi he finite a t z = O and z = 1.
(7)
fR
Figure 2 shows excess enthalpy data for the .50°C isotherm for the system chloroform-Ca fluoroalcohol (HCF2CF2CH20H)plotted as hE/x(l - x) versus x rather than as hE versus x because this way of plotting allows a more realistic appraisal of the regions near x = 0 and x = 1 (10). On an hE versus x plot large relative errors are scarcely detectable because hE is relatively small in those regions. Although several types of functions have been proposed in the literature (11) for representing hE data for solutions of alcohols, all of those functions have certain drawbacks. I t was therefore decided to try to find a function which would satisfactorily represent the data and at the same time conform xith other restrictions. After several types of functions were tried, one mas chosen of the form
which is a modification of one given previously (11). This function has some of its desirable features "builtin." First, assuming finite constants A;, hE must bc zero a t x = 0 and x = 1 because of the factors x(l - x) ; and hE/x(l - x) will always be finite. The sharp increase of h" near x = 0, characteristic of alcohol solutions, is accommodated by the x"? term. The least squares criterion yields a set of lc linear equations for determining the constants A;. For the five-constant form of eqn. (9) the set of five equations analogous to eqn. (7) is as follows for N points:
Volume 45, Number 10, October 1968
/
659
the proposed equation with five constants for this particular system
N ?=I
+ . . . + x (1 - z;)?Z,", N
1 -z ) z A
j=1
The final requirement is that, the equation fit the data nith a standard deviation consistent with the evident scatter of the point,s. Without special provision for weighting, the proposed function did not provide a satisfactory fit in t,he end regions. However, it was found that when a weighting function ~ r a sused which roughly equalized the percentage contribution of each point to the elements of both t,he coefficient matrix and constant-terms mat,rix, then a satisfactory fit was ohtained. Figure 2 shows the proposed equation
as well as five- and six-constant forms of the Guggenheim-Scatchard equation (12) and a five-constant Rtrazek-VanNess equation (IS), which have been used for systems vith an associated component. Those curves that exhibit a sudden up-turn or down-turn near x = 1 are rejected because experience vith many systems leads to the belief that such behavior is unlikely. T