The Use of Non-Linear Least Squares Analysis

Thomas G. Copeland
Middlebury College, Middlebury, VT 05753
A common problem is how to use experimental data to estimate the unknown parameters in a non-linear equation. For example, $y = x_1 x_2/(A x_1 + B x_2)$, where $x_1$ and $x_2$ are independent variables and A and B are unknown constants, is a non-linear equation. However, the constants A and B cannot be estimated by the normal polynomial least squares methods. In the past, these equations have been rearranged to linear form for ease of analysis. In many cases, the equations needed to be simplified by approximations before they could be rearranged. Rather than deal with the many special cases of rearranged equations, a non-linear least squares analysis can be used to estimate the parameters directly. For any equation that can be expressed in the general form $y = f(x_i)$, non-linear least squares allows the unknown parameters in the equation to be estimated from experimental data. The function need not be of linear or polynomial form. Once the computer program to do non-linear least squares is mastered (a fairly simple task), almost any equation can be analyzed. This paper demonstrates how this general method can be applied to several common problems.

Some statistical packages (such as BMDP (1)) contain sophisticated programs for doing non-linear least squares analysis without the need for explicit functional forms of the derivatives. The only input necessary for these programs (other than the data to be fit) is an initial estimate of the value of the parameters to be estimated and the functional form of the equation to be fit. Since the algorithms used work well even if the initial estimates are quite far off, any crude method for obtaining initial values is adequate. These statistical programs use simple, concise language to determine the specifics of each problem. The control language is typically only 10 to 20 lines long. New problems require only a minor change to a few lines of the control language. Writing appropriate control language is much simpler than writing an entire computer program.

These package programs are readily available for many medium-to-large computers. Although they are not presently available on microcomputers, there are less sophisticated programs that will run on micros. The programs that run on micros, in general, do not use the most current statistical methods and, in most cases, relatively little information about the methods of calculation is provided in the documentation. This is one situation where the minor inconvenience of dealing with a minicomputer or mainframe is overshadowed by the power and ease of use of the software available. In the near future, powerful statistical packages will be available on inexpensive small computers.

The case of first- and second-order chemical kinetics is a typical example of the need for non-linear least squares (2, 3). There are three parameters that need to be estimated. For first-order kinetics,
$$P = P_1\left(1 - e^{-rt}\right) + P_0\,e^{-rt}$$

where P is the measured physical quantity (e.g., absorbance), $P_1$ is the measured quantity at infinite time, $P_0$ is the measured quantity at time zero, t is the time, and r is the rate of the reaction. In cases where $P_1$ cannot be obtained experimentally, $P_0$, $P_1$, and r are all parameters that need to be estimated. This can be done directly using a non-linear least squares analysis (2). Houser (3) recently suggested an alternate, iterative method, but this method does not give estimates of the errors in all the parameters, and the general technique becomes slow for more complicated equations.

Moore (2) reported a non-linear least squares analysis for the first-order kinetics of a stopped-flow experiment. The same data were analyzed using the BMDP non-linear, derivative-free least squares program PAR (1), and essentially the same results were obtained (Table 1). The PAR control language used to analyze the data is given in Table 2. These commands are very short and quite easy to understand. (For a complete explanation see the BMDP manual.) It is certainly easier to learn to write these commands than it is to write a program to do the analysis.

Table 1. Comparison of Non-Linear Fits for First-Order Kinetics

           Moore (2)          BMDP PAR
  P1       0.301 ± 0.003      0.299 ± 0.0013ᵃ
  P0       0.544 ± 0.002      0.544 ± 0.001
  Rate     26.8 ± 0.7         26.4 ± 0.7
  Error    0.0000341          0.0000332

  ᵃ Error estimates are 1 standard deviation.

Table 2. Control Language for First-Order Kinetics Fit

  /PROBLEM    TITLE IS 'First Order Kinetics'.
  /INPUT      VARIABLES ARE 2.
              FORMAT IS '(2F15.0)'.
  /VARIABLE   NAMES ARE T, P.
  /REGRESS    DEPENDENT IS P.
              PARAMETERS ARE 3.
  /PARAMETER  INITIAL ARE .5, .2, 25.
              NAME = P0, P1, R.
  /FUNCTION   F = P1*(1.-EXP(-R*T))+P0*EXP(-R*T)
  /PLOT       RESIDUAL.
              VARIABLE IS T.
  /END
  data goes here

The analysis of the binding of small molecules to macromolecules can become quite complicated even if only one binding site is present and becomes very complicated for multiple binding sites (4). The general equation for small molecules binding to n identical binding sites is

$$v = \frac{n[A]}{K + [A]}$$
where v is the velocity of the reaction, [A] is the concentration of small molecules, n is the number of sites of the same type, and K is the binding constant. There are three commonly used linearizations of this equation: the Bjerrum plot, the reciprocal plot, and the Scatchard plot. All three of these methods work well for one binding site. In the multiple binding case, the equation is

$$v = \sum_i \frac{n_i[A]}{K_i + [A]},$$
where the sum is over the total number of different types of binding sites. However, for multiple-site binding, none of these analyses gives direct values for the unknown parameters (5). The limiting slopes and intercepts (for very large and very small [A]) give combinations of the various parameters, but the actual parameters can only be easily obtained in some special cases. However, these analyses can often be used to estimate the initial values for the parameters in the non-linear least squares. The four-parameter equation to fit a two-site binding problem is (4)

$$v = \frac{n_1[A]}{K_1 + [A]} + \frac{n_2[A]}{K_2 + [A]}$$

The sample PAR control language to find the four parameters is given in Table 3. Clearly, the control language to solve a very complicated problem is quite easy to write. Twenty simulated data points were calculated using the parameters given in Table 4. Using the above control language, PAR gives values of the four constants that are in excellent agreement with the true values (Table 5).

Table 3. Control Language for Two-Site Binding Problem

  /PROBLEM    TITLE IS '2 Site Binding'.
  /INPUT      VARIABLES ARE 2.
              FORMAT IS '(2F15.0)'.
  /VARIABLE   NAMES ARE A, V.
  /REGRESS    DEPENDENT IS V.
              PARAMETERS ARE 4.
  /PARAMETER  INITIAL ARE 10., 10., 1., .5.
              NAME = N1, N2, K1, K2.
  /FUNCTION   F = N1*A/(K1 + A) + N2*A/(K2 + A).
  /PLOT       RESIDUAL.
              VARIABLE IS A.
  /END
  data goes here

Table 4. Simulated Two-Site Binding Data

Table 5. Comparison of True and PAR-Calculated Values of Four Constants

           True      Calculated
  n1       100       100.0   ± 0.0009ᵃ
  n2       10        9.997   ± 0.002
  K1       1.0       0.99989 ± 0.00006
  K2       0.01      0.00998 ± 0.00002

  ᵃ Error estimates are 1 standard deviation.

The analysis of this problem would take a considerable amount of time using Klotz's method. By using the non-linear equation directly, the problem is simplified greatly.

The analysis for Michaelis-Menten (steady-state) kinetics leads to equations similar to the multi-site binding equations (6). Extensions of the steady-state analysis lead to equations that are inherently non-linear. These equations can be analyzed easily using a non-linear least squares program.

The estimates of the standard deviations of all the parameters in the equation are automatically produced by PAR. These values are necessary in order to establish the validity of the results. PAR will also use experimental errors in the dependent variable to weight the observations. If there are errors in the independent variables, the analysis is more difficult (7-9). In addition, PAR will automatically produce plots of the predicted and observed values of the dependent variable, as well as a plot of the residuals. The residual plot can be examined to ensure that the residuals are randomly distributed. Any systematic pattern to the residuals indicates that the equation does not adequately fit the data.

In conclusion, non-linear least squares computer programs are extremely valuable in fitting complicated equations to experimental data. Such programs are now available for most medium to large scale computers. They can be very easy to use and can free students, faculty, and researchers from the tedium of trying to derive linearized forms of complicated equations. Equations that cannot be linearized become as easy to handle as linear equations. Many equations that in the past have been "too complicated to analyze" can now be handled quite easily using derivative-free, non-linear least squares methods.
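The limiting behavior of the two-site equation can be checked numerically before any fit is attempted. The following Python sketch is an illustration only and is not part of the PAR analysis: it generates twenty noise-free points from the "True" constants of Table 5 over a hypothetical concentration range (the actual simulated concentrations are not reproduced here), then reads off the two limiting combinations of parameters, $n_1/K_1 + n_2/K_2$ at very small [A] and $n_1 + n_2$ at very large [A]. The names two_site, TRUE, conc, initial_slope and saturation are invented for this example.

```python
def two_site(a, n1, k1, n2, k2):
    """Two-site binding equation: v = n1*a/(k1 + a) + n2*a/(k2 + a)."""
    return n1 * a / (k1 + a) + n2 * a / (k2 + a)

# "True" constants from Table 5.
TRUE = dict(n1=100.0, k1=1.0, n2=10.0, k2=0.01)

# Hypothetical concentrations: twenty points, log-spaced from 1e-4 to 1e4.
conc = [10 ** (-4 + 8 * i / 19) for i in range(20)]
v = [two_site(a, **TRUE) for a in conc]

# Very small [A]: v/[A] approaches n1/k1 + n2/k2.
initial_slope = v[0] / conc[0]

# Very large [A]: v approaches n1 + n2.
saturation = v[-1]

print("limiting slope  ~ n1/k1 + n2/k2:", initial_slope)
print("limiting value  ~ n1 + n2      :", saturation)
```

The two limits give only two combinations of the four constants, which is why they can seed a non-linear fit with rough initial values but cannot replace it.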
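For readers without access to BMDP, the first-order kinetics fit specified in Table 2 can be imitated in a few dozen lines of Python. The sketch below is an illustration under stated assumptions and is not the PAR algorithm: it uses a Levenberg-Marquardt iteration with forward-difference derivatives (so, like PAR, it needs no analytic derivatives), synthetic noise-free data generated from the BMDP PAR estimates in Table 1 at hypothetical sampling times, and the starting values .5, .2, 25 from Table 2. The names model, solve, sum_squares and fit are invented for this example.

```python
import math

def model(t, params):
    """First-order kinetics: P = P1*(1 - exp(-r*t)) + P0*exp(-r*t)."""
    p0, p1, r = params
    decay = math.exp(-r * t)
    return p1 * (1.0 - decay) + p0 * decay

def solve(a, b):
    """Solve a small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(col + 1, n):
            factor = m[k][col] / m[col][col]
            for c in range(col, n + 1):
                m[k][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def sum_squares(f, xs, ys, params):
    """Residual sum of squares for model f at the given parameters."""
    return sum((y - f(x, params)) ** 2 for x, y in zip(xs, ys))

def fit(f, xs, ys, start, max_iter=100):
    """Levenberg-Marquardt least squares with forward-difference derivatives."""
    p = list(start)
    ss = sum_squares(f, xs, ys, p)
    lam = 1e-3
    n = len(p)
    for _ in range(max_iter):
        pred = [f(x, p) for x in xs]
        resid = [y - v for y, v in zip(ys, pred)]
        jac = []  # jac[j][i] approximates d f(x_i) / d p_j
        for j in range(n):
            h = 1e-7 * max(abs(p[j]), 1e-3)
            q = p[:]
            q[j] += h
            jac.append([(f(x, q) - v) / h for x, v in zip(xs, pred)])
        jtj = [[sum(u * w for u, w in zip(jac[i], jac[j])) for j in range(n)]
               for i in range(n)]
        jtr = [sum(u * e for u, e in zip(jac[i], resid)) for i in range(n)]
        accepted = False
        while lam < 1e10 and not accepted:
            damped = [[jtj[i][j] + (lam * jtj[i][i] if i == j else 0.0)
                       for j in range(n)] for i in range(n)]
            try:
                step = solve(damped, jtr)
                trial = [pi + si for pi, si in zip(p, step)]
                trial_ss = sum_squares(f, xs, ys, trial)
            except (ZeroDivisionError, OverflowError):
                trial_ss = float("inf")  # treat a failed trial as a rejected step
            if trial_ss < ss:
                p, ss, accepted = trial, trial_ss, True
                lam /= 10.0   # step helped: trust the Gauss-Newton direction more
            else:
                lam *= 10.0   # step failed: damp more heavily and retry
        if not accepted:
            break
    return p, ss

# Synthetic, noise-free data (not Moore's measurements); times are hypothetical.
times = [0.005 * i for i in range(40)]
observed = [model(t, (0.544, 0.299, 26.4)) for t in times]

# Starting values from Table 2, in the order P0, P1, R.
fitted, residual_ss = fit(model, times, observed, [0.5, 0.2, 25.0])
print("P0, P1, r =", fitted)
```

The fit routine takes the model as an argument, so a different equation (for example the two-site binding equation) can be tried by supplying another function and another list of starting values. Unlike PAR, this sketch does not report standard deviations of the parameters or weight the observations.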
Literature Cited

(1) Dixon, W. J., "BMDP Statistical Software 1981," Univ. California Press, 1981.
(2) Moore, P., J. Chem. Soc. Faraday Trans. 1, 68, 1890 (1972).
(3) Houser, J. J., J. CHEM. EDUC., 59, 776 (1982).
(4) Marshall, A. G., "Biophysical Chemistry," J. Wiley, New York, 1978, Chap. 3.
(5) Klotz, I. M., and Hunston, D. L., J. Biochem., 10, 3065 (1971).
(6) Ref. (4), Chap. 10.
(7) Christian, S. D., Lane, E. H., and Garland, F., J. CHEM. EDUC., 51, 475 (1974).
(8) Lane, E. H., Christian, S. D., and Garland, F., J. Phys. Chem., 80, 690 (1976).
(9) Orear, J., Amer. J. Phys., 50, 912 (1982).
Journal of Chemical Education, Volume 61, Number 9, September 1984, pp. 778-779