
edited by JAMES P. BIRK, Arizona State University, Tempe, AZ 85287-1604

Computer Series, 159. Bits and Pieces, 51. Guidelines for authors of Bits and Pieces appeared in July 1986; the number of Bits and Pieces manuscripts is expected to decrease in the future (see the July 1988 and March 1989 issues). Bits and Pieces authors who describe programs will make available listings and/or machine-readable versions of their programs. Please read each description carefully to determine compatibility with your own computing environment before requesting materials from any of the authors. Some programs described in this article and designated as such are available from Project SERAPHIM at $15/disk ($20 foreign and Canada). Make checks payable to Project SERAPHIM. To order, or to become a member of the SERAPHIM Clearinghouse and receive a Catalog ($20/year), write to: John Moore, Director, Project SERAPHIM, Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, WI 53706.

Straight Line Calibration: Something More Than Slopes, Intercepts, and Correlation Coefficients

R. Boqué and F. X. Rius*
Departament de Química, Universitat de Tarragona, Pl. Imperial Tarraco 1, 43005 Tarragona, Spain

D. L. Massart
Farmaceutisch Instituut, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussel, Belgium

*Author to whom correspondence should be addressed.

230

Journal of Chemical Education

Univariate regression using a straight line model is one of the most frequent mathematical operations in chemistry. The facilities given by pocket calculators to compute slope, intercept, and correlation coefficients have contributed to its widespread use. However, some misconceptions are widespread. For instance, it is often believed that the correlation coefficient constitutes a statistical test of lack of fit (1, 2). It is also often believed that, in analytical chemistry, a concentration value can readily be found free of error through reverse interpolation of the analyte response by using the fitted equation. D. C. Johnson has suggested that graphical display of nearly linear data should be considered for publication only if linearity is demonstrated by rigorous tests (3). Such testing is a weak point for many experimentalists. There are excellent reviews and textbooks with calibration chapters that explain least-squares regression analysis (see, for example, references 4-7). Even so, teaching this knowledge and applying it in actual practice remain somewhat awkward. This is mainly due to a lack of familiarity with statistical tests and the restricted distribution of the programs developed. Different kinds of software have been published to help scientists in this area. Massart et al. (8-10) have published different programs for method validation. Other programs are available either as listings (11) or on diskettes (12-15), although some of them are not readily accessible and most of them provide only partial solutions to the problem.

In the present report, a series of statistical tests for straight-line regression and method validation has been implemented in a computer program. The goal is to furnish an extended, unified, and friendly tool for teaching the calibration step at the undergraduate level.

Given n pairs of observed values $x_i, y_i$ ($i = 1, 2, \ldots, n$), the intercept ($b_0$) and slope ($b_1$) of the model $y = b_0 + b_1 x$ are first computed by the standard least-squares method. The calibration plot is displayed. The program then computes the estimated standard deviations of the intercept and slope, as well as the estimated standard deviation of the errors of the responses. The latter parameter is sometimes termed $s_{y/x}$. The correlation coefficient is also given, although it is not very meaningful in giving information about the straight line model (2, 16). The plot of the calibration line is followed by an analysis of the residuals,

$$e_i = y_i - \hat{y}_i, \qquad \hat{y}_i = b_0 + b_1 x_i$$

in which the residuals are calculated together with the sum of their squared values,

$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Figure 1. Plot of residuals. The straight line $y = -0.418 + 5.139x$ with $s = 1.111$ has been modeled considering the points $(x_i, y_i)$: (0, 0.1), (1, 3.8), (2, 10.0), (3, 14.4), (4, 20.7), (5, 26.9), (6, 29.1).
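The fitting and residual steps described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' program: the function name `fit_straight_line` is hypothetical, and the example reads the first point of the Figure 1 data as (0, 0.1). That reading reproduces the caption's slope 5.139 and $s = 1.111$, and gives an intercept of -0.418.

```python
import math


def fit_straight_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x (hypothetical helper).

    Returns intercept, slope, their estimated standard deviations,
    s_{y/x}, the correlation coefficient, the residuals, and their
    sum of squares.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    # Residuals e_i = y_i - yhat_i and their sum of squares.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in residuals)

    # s_{y/x}: standard deviation of the residuals with n - 2 degrees of freedom.
    s = math.sqrt(sse / (n - 2))
    sd_b1 = s / math.sqrt(sxx)
    sd_b0 = s * math.sqrt(sum(xi * xi for xi in x) / (n * sxx))
    r = sxy / math.sqrt(sxx * syy)

    return {"b0": b0, "b1": b1, "sd_b0": sd_b0, "sd_b1": sd_b1,
            "s": s, "r": r, "residuals": residuals, "sse": sse}


# Figure 1 data, reading the first point as (0, 0.1).
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.1, 3.8, 10.0, 14.4, 20.7, 26.9, 29.1]
fit = fit_straight_line(x, y)
print(round(fit["b0"], 3), round(fit["b1"], 3), round(fit["s"], 3))
# prints: -0.418 5.139 1.111
```

The residuals returned here are what a residual plot shows. Any residual whose absolute value exceeds `2 * fit["s"]` falls outside the 2s dashed lines.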

A plot of the residuals versus the original x-values can be obtained (Fig. 1). In this plot the limits of 1s and 2s, once and twice the standard deviation of the residuals, are indicated as dashed lines. Alternatively, a logarithmic plot of $\log\left|(y - b_0)/(b_1 x)\right|$ versus $\log x$ can be obtained (3). This plot suits data sets whose concentration values span more than one decade. In it, the dashed lines correspond to $\log|1 \pm 2\delta|$ and $\log|1 \pm \delta|$, where $\delta$ represents a given relative deviation (3).

Some applications place specific expectations on the intercept and slope. One example is checking for the presence of analyte in natural samples. This is done by plotting the amount found against the amount added at different concentration levels in spiked samples. In that case, a test on the intercept could show a significant difference from 0, whereas the slope should not differ significantly from 1. Random errors scatter the points around the least-squares line even in the absence of constant or proportional errors. The estimated intercept and slope are therefore uncertain, and confidence intervals for their true values must be considered. The program performs individual tests on the intercept and slope. It checks whether the values 0 (for the intercept) and 1 (for the slope) are included in the respective confidence intervals:

$$\beta_0:\; b_0 \pm t_{n-2}\,\mathrm{s.d.}(b_0)$$

$$\beta_1:\; b_1 \pm t_{n-2}\,\mathrm{s.d.}(b_1)$$
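The individual interval tests above can be sketched as follows. This is a minimal illustration under stated assumptions. The helper name `individual_tests` is hypothetical. The two-sided critical value $t_{n-2}$ is passed in rather than computed, so the sketch needs only the standard library; SciPy users could obtain it with `scipy.stats.t.ppf(1 - alpha / 2, n - 2)`. The numbers in the example are made up to mimic a recovery experiment.

```python
def individual_tests(b0, sd_b0, b1, sd_b1, t_crit):
    """Check whether 0 lies in b0 +/- t*sd(b0) and 1 lies in b1 +/- t*sd(b1).

    t_crit is the two-sided Student t value for n - 2 degrees of freedom
    (an input here, assumed to be looked up by the caller).
    """
    ci_b0 = (b0 - t_crit * sd_b0, b0 + t_crit * sd_b0)
    ci_b1 = (b1 - t_crit * sd_b1, b1 + t_crit * sd_b1)
    return {
        "ci_b0": ci_b0,
        "ci_b1": ci_b1,
        # True means no significant difference from the ideal value.
        "intercept_includes_0": ci_b0[0] <= 0.0 <= ci_b0[1],
        "slope_includes_1": ci_b1[0] <= 1.0 <= ci_b1[1],
    }


# Hypothetical found-vs-added data with n = 7 and alpha = 0.05: t(0.975; 5) = 2.571.
result = individual_tests(b0=0.30, sd_b0=0.05, b1=0.98, sd_b1=0.02, t_crit=2.571)
print(result["intercept_includes_0"], result["slope_includes_1"])
# prints: False True
```

Here the intercept differs significantly from 0 while the slope does not differ from 1, which is the pattern the text describes for samples that already contain some analyte.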

and t

band around the predicted concentration $x_0$ of the unknown sample giving a response $y_0$. This latter confidence band can be plotted for a given level of significance, as shown in Figure 3 for the data of the example used above. If nonuniform variance along the range of concentrations is detected, the simple least-squares procedure cannot be applied without obtaining inaccurate results. To overcome such heteroscedasticity problems (18), the program can perform a weighted least-squares procedure, consisting of the minimization of the weighted residuals,

$$\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2$$

The user is asked to enter at least three replicates of each experimental point in order to compute the weighting factors, $w_i = 1/s_i^2$ (4). Here $s_i$ is the standard deviation of the replicates at $x_i$. These weights influence the $b_0$ and $b_1$ parameters.

Influential points are points located mainly at the extremes of the calibration range, which therefore have a strong influence on the regression coefficients. They can be detected by means of Cook's distance, following the approach reported elsewhere by Rius et al. (9). Once any outlier has been detected, the program recalculates the parameters of the regression line without these outliers. The user can then perform any of the implemented tests on the new data set.

The lack of fit of the computed straight line model with respect to the experimental points used to build it can be checked by means of an analysis of variance (ANOVA) test. In this way, the user can assess the usefulness of the computed regression line as a predictor of unknown responses or input points. The test is conducted by performing a series of $n_i$ replications for each x-value and comparing the calculated F ratio:

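$$F = \frac{\sum_{i=1}^{k} n_i(\bar{y}_i - \hat{y}_i)^2/(k-2)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2/(N-k)}$$

where $k$ is the number of distinct x-values and $N = \sum n_i$ is the total number of measurements. $\bar{y}_i$ is the mean of the $n_i$ replicates $y_{ij}$ at $x_i$, and $\hat{y}_i = b_0 + b_1 x_i$.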
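The lack-of-fit ANOVA can be sketched with the standard textbook decomposition of the residual sum of squares into lack-of-fit and pure-error parts. This is assumed, not confirmed, to match the program's implementation. The helper name `lack_of_fit_f` and its input layout (a list of `(x, replicates)` pairs with distinct x-values) are hypothetical. The returned F is to be compared with the tabulated F value for `(k - 2, N - k)` degrees of freedom, for example `scipy.stats.f.ppf(1 - alpha, df_lof, df_pe)` if SciPy is available.

```python
def lack_of_fit_f(groups):
    """ANOVA lack-of-fit F ratio for a straight line fitted to replicated data.

    groups: list of (x_i, [y_i1, ..., y_in_i]) with distinct x_i values.
    Returns (F, df_lack_of_fit, df_pure_error).
    """
    xs = [x for x, reps in groups for _ in reps]
    ys = [y for _, reps in groups for y in reps]
    n_total = len(ys)
    k = len(groups)
    df_lof, df_pe = k - 2, n_total - k
    if df_lof < 1 or df_pe < 1:
        raise ValueError("need at least three x-values and some replication")

    # Ordinary least-squares line through all individual measurements.
    x_mean = sum(xs) / n_total
    y_mean = sum(ys) / n_total
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean

    ss_pe = 0.0   # pure error: scatter of replicates about their own mean
    ss_lof = 0.0  # lack of fit: distance of each group mean from the line
    for x, reps in groups:
        mean = sum(reps) / len(reps)
        ss_pe += sum((y - mean) ** 2 for y in reps)
        ss_lof += len(reps) * (mean - (b0 + b1 * x)) ** 2
    if ss_pe == 0.0:
        raise ValueError("replicates show no scatter; F is undefined")

    return (ss_lof / df_lof) / (ss_pe / df_pe), df_lof, df_pe


# Made-up duplicate measurements at three levels.
f_ratio, df1, df2 = lack_of_fit_f([(0, [0.0, 2.0]), (1, [4.0, 6.0]), (2, [4.0, 6.0])])
print(round(f_ratio, 3), df1, df2)
# prints: 2.667 1 3
```

When the group means lie exactly on the fitted line, the lack-of-fit sum of squares and hence F are zero. A large F relative to the critical value signals that a straight line does not describe the data.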

In other cases, such as comparing the responses of two different methods over a range of concentration values, individual tests on slope and intercept can mislead. It has been pointed out that such individual tests can lead to erroneous conclusions (17). They do not take into account the strong correlation between the slope and intercept estimated by least squares for a straight line. The joint confidence region for slope and intercept at a given level of significance, $\alpha$, is enclosed by the ellipse drawn in the plane spanned by the $b_0$ and $b_1$ values according to the expression:

$$n(\beta_0 - b_0)^2 + 2\left(\sum x_i\right)(\beta_0 - b_0)(\beta_1 - b_1) + \left(\sum x_i^2\right)(\beta_1 - b_1)^2 = 2s^2F_{2,\,n-2,\,\alpha}$$
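The ellipse expression above gives a direct membership test: a candidate pair $(\beta_0, \beta_1)$, such as (0, 1) when comparing two methods, lies inside the joint region when the quadratic form on the left does not exceed $2s^2F$. The sketch below is illustrative only. The function name is hypothetical, $s$ is taken to be $s_{y/x}$, and the critical value $F_{2,n-2,\alpha}$ is supplied by the caller (for example `scipy.stats.f.ppf(1 - alpha, 2, n - 2)`). The fitted values in the example are made up.

```python
def in_joint_confidence_region(x, b0, b1, s, beta0, beta1, f_crit):
    """True if (beta0, beta1) lies inside the joint confidence ellipse.

    Evaluates n*d0^2 + 2*(sum x)*d0*d1 + (sum x^2)*d1^2 with
    d0 = beta0 - b0 and d1 = beta1 - b1, and compares it with 2*s^2*f_crit.
    """
    n = len(x)
    sum_x = sum(x)
    sum_x2 = sum(xi * xi for xi in x)
    d0 = beta0 - b0
    d1 = beta1 - b1
    q = n * d0 ** 2 + 2 * sum_x * d0 * d1 + sum_x2 * d1 ** 2
    return q <= 2 * s ** 2 * f_crit


# Made-up fit: x = 0..6, b0 = 0, b1 = 1, s = 1; F(0.95; 2, 5) = 5.786.
x = [0, 1, 2, 3, 4, 5, 6]
print(in_joint_confidence_region(x, 0.0, 1.0, 1.0, 1.5, 1.4, 5.786))
print(in_joint_confidence_region(x, 0.0, 1.0, 1.0, 1.5, 0.6, 5.786))
# prints: False, then True
```

For these made-up values the individual 95% half-widths are about 1.75 for the intercept and 0.49 for the slope. Both test points therefore pass the individual tests, yet (1.5, 1.4) falls outside the ellipse. Because $\sum x_i > 0$, the estimates are negatively correlated, so moving both parameters upward together is penalized. This is the correlation effect the joint test accounts for.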

The boundary of the ellipse is determined by the magnitude of the e