Information • Textbooks • Media • Resources

Computer Bulletin Board
edited by Steven D. Gammon
University of Idaho, Moscow, ID 83844

Estimating Parameter Precision in Nonlinear Least Squares with Excel’s Solver

Robert de Levie,† Department of Chemistry, Georgetown University, Washington, DC 20057; [email protected]

†Current address: Department of Chemistry, Bowdoin College, Brunswick, ME 04011.

In a recent contribution to this Journal, Harris (1) emphasized the power and convenience of Solver, the nonlinear least-squares routine based on the Levenberg–Marquardt algorithm (2, 3) that comes with the Microsoft Excel spreadsheet. As Harris pointed out, Solver yields fitting parameters but provides no estimates of the precision of those parameters. Separating estimates of the values of the parameters from those of their precision is often appropriate, because computing the parameter values relies much less on the assumption that all data fit a single Gaussian distribution than calculating their precision does. In the words of Press et al. (4), “χ2 minimization is a useful means for estimating parameters even if the measurement errors are not normally distributed.” The assumption that experimental data follow a Gaussian (a.k.a. “normal”) distribution is often made simply for convenience, leaving unanswered the question whether it is justified. Be that as it may, that assumption is often made specifically to allow one to compute a numerical estimate of the resulting parameter precision. Below we will assume that it has been made, for whatever reason and with whatever justification, and focus on the associated computation of parameter precision.

Harris describes a method for estimating parameter precision that is quite laborious, since it requires the user to rerun Solver manually as many times as there are data in the data set. Harris states that, for the 13-point data set used to illustrate his method, it took him about 10 minutes. Since the number of required operations is roughly proportional to the number of data points used, this corresponds to about 45 seconds per point. Clearly, such an approach becomes impracticable for all but rather small data sets. Moreover, such labor-intensive work is quite error prone. Finally, given the extensive macro facilities of Excel, such a manual procedure would seem to be contrary to the purpose of using a spreadsheet, a tool specifically designed to economize operator time and effort.

Fortunately, much more efficient methods exist, which do not require that Solver be rerun for every data point and which, moreover, are readily automated in the form of a macro. In this way, the parameter precision of any data set, large or small, can typically be computed in a matter of seconds. For example, a recent book by Billo (5) provides such a macro, SOLVSTAT, written in XLM, the macro language of Excel 4. Even though Microsoft no longer develops XLM, this macro should still run in Excel 5, Excel 7, Excel 97, Excel 98, and Excel 2000. Here I briefly describe how one can write such a macro in VBA (Visual Basic for Applications), the powerful macro programming language introduced with Excel 5, and illustrate its applicability. The macro is available from me on request.

Writing and debugging sophisticated programs is an activity best left to professional programmers. Fortunately, well-tested subroutines exist that can perform many often-used mathematical procedures, such as matrix inversion. The Numerical Recipes by Press et al. (6) provide an excellent source for such subroutines, which are available in a variety of computer languages, including Fortran, Pascal, C++, and Basic (7). The Basic subroutines are most useful in the present context, because Excel can incorporate them in its macros without modification.

The Algorithm

The approach described here is based on the usual assumption that the uncertainties in the N data pairs x, yexp are exclusively found in the dependent variable, yexp, and that these uncertainties follow a single Gaussian distribution. It is further assumed that one has already used Solver (once) to obtain a solution ycalc based on a mathematical model ycalc = F(x, a1, a2, …, aP) for the experimental behavior, that is, that Solver has provided a Levenberg–Marquardt estimate of the parameters ai by minimizing χ2 = Σ(yn,exp – yn,calc)2. The standard deviations σi of the P parameters ai are then given by (8, 9)

$$\sigma_i = \sqrt{\frac{m_{ii}^{-1}\,\chi^2}{N - P}} \qquad (1)$$

where χ2 = Σ(yn,exp – yn,calc)2, N is the number of experimental data pairs considered (i.e., n = 1, 2, 3, …, N), and P is the number of adjustable parameters ai used by Solver (hence i = 1, 2, 3, …, P). Finally, the term $m_{ii}^{-1}$ denotes the ith diagonal element of the inverse of a P × P matrix M containing the partial differentials ∂Fn/∂ai of the fitting function, where Fn = yn,calc. In other words, the matrix M contains the elements

$$m_{ij} = \sum_{n=1}^{N} \frac{\partial F_n}{\partial a_i}\,\frac{\partial F_n}{\partial a_j} \qquad (2)$$
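To make eqs 1 and 2 concrete, the following VBA fragment is a minimal sketch (not the macro described in this article) of how the standard deviations might be computed once the N × P array of partial derivatives dF(n, i) = ∂Fn/∂ai is available (e.g., evaluated numerically, as described below). The names ParamSigma, dF, and chiSq are illustrative only; matrix inversion is delegated to Excel's built-in MInverse worksheet function.

' Sketch only, not the author's macro: standard deviations per eqs 1 and 2,
' given dF(1..N, 1..P) = dFn/dai and chiSq = sum of squared residuals.
Function ParamSigma(dF() As Double, chiSq As Double, _
                    nData As Long, nPar As Long) As Variant
    Dim M() As Double, Minv As Variant, sigma() As Double
    Dim i As Long, j As Long, n As Long

    ' Eq 2: M(i, j) = sum over n of (dFn/dai)(dFn/daj)
    ReDim M(1 To nPar, 1 To nPar)
    For i = 1 To nPar
        For j = 1 To nPar
            For n = 1 To nData
                M(i, j) = M(i, j) + dF(n, i) * dF(n, j)
            Next n
        Next j
    Next i

    ' Invert M with Excel's built-in MInverse worksheet function
    Minv = Application.WorksheetFunction.MInverse(M)

    ' Eq 1: sigma(i) = sqrt( Minv(i, i) * chiSq / (N - P) )
    ReDim sigma(1 To nPar)
    For i = 1 To nPar
        sigma(i) = Sqr(Minv(i, i) * chiSq / (nData - nPar))
    Next i
    ParamSigma = sigma
End Function

In a full macro one would of course also guard against a singular or ill-conditioned matrix M, which MInverse cannot invert.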

Having to use partial differentials as well as matrix inversion may sound rather forbidding, but this is precisely where the spreadsheet macro comes in, to do the manual work for us. To compute the partial differentials ∂Fn/∂ai = ∂yn,calc/∂ai for each of the N input data, we return to the definition of the differential quotient as

$$\frac{\partial F_n}{\partial a_i} = \lim_{\Delta a_i \to 0} \frac{\Delta F_n}{\Delta a_i} \approx \frac{F_n\bigl(x_n,\,a_i(1+\delta),\,a_{j\neq i}\bigr) - F_n\bigl(x_n,\,a_i,\,a_{j\neq i}\bigr)}{a_i(1+\delta) - a_i} \qquad (3)$$


provided that δ is sufficiently small.
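As an illustration of eq 3, the VBA fragment below is again only a sketch under stated assumptions, not the macro described here: the model function ModelF(x, a), which returns ycalc for one x value and the current parameter array, and the value used for δ are hypothetical placeholders.

' Sketch only, not the author's macro: partial derivatives dFn/dai per eq 3.
' ModelF(x, a) is a hypothetical user-supplied fitting function F(x, a1..aP).
Function PartialDerivs(x() As Double, a() As Double, _
                       nData As Long, nPar As Long) As Variant
    Const delta As Double = 0.0001       ' relative increment; assumed small
    Dim dF() As Double, aPlus() As Double
    Dim n As Long, i As Long

    ReDim dF(1 To nData, 1 To nPar)
    For i = 1 To nPar
        aPlus = a                        ' copy the parameters, then perturb ai
        aPlus(i) = a(i) * (1 + delta)    ' (a zero-valued ai would need an absolute step)
        For n = 1 To nData
            ' Eq 3: difference quotient with increment ai * delta
            dF(n, i) = (ModelF(x(n), aPlus) - ModelF(x(n), a)) / (a(i) * delta)
        Next n
    Next i
    PartialDerivs = dF
End Function

Using a relative increment, as in eq 3, keeps the step scaled to the magnitude of each parameter ai; the resulting array dF can then be passed to a routine such as the ParamSigma sketch above.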