Research: Science and Education

A Simplified Calculation of the Real Confidence Interval in Analytical Methods

Javier Galbán
GEAS, Analytical Chemistry Department, Faculty of Sciences, University of Zaragoza, Pza. San Francisco s/n, 50009-Zaragoza, Spain; [email protected]

Journal of Chemical Education, Vol. 81 No. 7, July 2004; www.JCE.DivCHED.org

Interpolation into a calibration line is one of the most commonly used quantification methods in analytical determinations (1). However, some degree of imprecision is normally associated with the construction of a calibration line, and this imprecision is carried over to the concentration values calculated from it. This source of imprecision is very important, yet it is rarely taken into account in the classroom because the mathematical models currently used for its calculation are complex and therefore difficult to understand. Students therefore do not become familiar with this cause of imprecision and, as a result, ignore it in their professional activities once their education is completed. The aims of this paper are (i) to present a simplified mathematical model for the calculation of this type of imprecision and to discuss the variables that really affect it, and (ii) to integrate the imprecision resulting from the interpolation into the calibration line with that caused by the other steps in the analytical method, in order to calculate a real confidence interval for an analytical determination.

Preliminaries

All analytical determinations contain a degree of random error (imprecision), giving rise to a confidence interval (see below) for the concentration value obtained, within which the real concentration value will lie with a certain degree of probability. This random error can be expressed by using different parameters:

• The variance of the replicates obtained ($s^2_{x,s}$; the x and s subscripts referring to concentration and sample, respectively).

• The variance of the mean of the replicates ($\bar{s}^2_{x,s}$), which is more convenient to use, defined as

$$\bar{s}^2_{x,s} = \frac{s^2_{x,s}}{m} \qquad (1)$$

m being the number of replicates used in its calculation.

• The relative standard deviation ($\mathrm{RSD}_s$), also called the variation coefficient (see below).

Each step performed in a quantitative analysis (from sampling to the presentation of results) increases the imprecision of the calculated concentration. The overall variance of the mean (and also of the replicates) of the analytical determination is calculated as the sum of the variances of the mean of all the steps. As this paper is focused on the variance resulting from the quantification step, $\bar{s}^2_{x,s}$ can be expressed as:

$$\bar{s}^2_{x,s} = \bar{s}^2_{m,x,s} + \bar{s}^2_{q,x,s} \qquad (2)$$

$\bar{s}^2_{q,x,s}$ being the variance of the mean due to the quantification step (subscript q referring to quantification) and $\bar{s}^2_{m,x,s}$ being the overall variance of the mean resulting from the other steps (to which subscript m refers). While $\bar{s}^2_{q,x,s}$ can be experimentally obtained, $\bar{s}^2_{m,x,s}$ has to be calculated.

In most analytical methods, the analyte concentration (x) is related to the analytical signal (y) by a first-order line. In these cases, the analyte determination in a sample is generally performed in accordance with the following steps:

1. n calibration standards are prepared (whose concentrations are represented as $x_1, \ldots, x_n$). From these values, two statistical parameters representing the data set can be calculated: the average concentration of the standards ($\bar{x}$) and the concentration variance of the standards ($s^2_x$). Both parameters prove useful later.

2. The corresponding analytical signals are obtained for the standards (represented as $y_1, \ldots, y_n$). The average signal ($\bar{y}$) and the signal variance ($s^2_y$) of the standards are also obtained. The $\bar{x}$ and $\bar{y}$ values define the centroid of the calibration line. When several replicates of each standard are prepared, the parameters defined above are calculated in the same way, n being the total number of replicates.

3. Using least-squares regression analysis, the regression coefficients (a and b) and the correlation coefficient (r) of the calibration line y = a + bx are obtained (2):

$$b = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2} \qquad (3)$$

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \qquad (4)$$

$$a = \bar{y} - b\bar{x} \qquad (5)$$

4. From the laboratory sample m aliquots are prepared (for the sake of simplicity, it is assumed that all the aliquots are prepared identically). The average signal

$\bar{y}_s$ and the signal variance ($s^2_{y,s}$) of the sample are calculated from the analytical signals obtained from the measurement of the solutions ($y_{s,i}$). Although $s^2_{y,s}$ is a good measurement of the imprecision, it is usual to replace this parameter by the relative standard deviation ($\mathrm{RSD}_{y,s}$), defined as:

$$\mathrm{RSD}_{y,s} = \frac{s_{y,s}}{\bar{y}_s} \qquad (6)$$

5. The $\bar{y}_s$ value is interpolated into the calibration line, and the average sample concentration ($\bar{x}_s$) is obtained. The concentration variance $s'^2_{x,s}$ is calculated as:

$$s'^2_{x,s} = \frac{s^2_{y,s}}{b^2} \qquad (7)$$

It is very important to emphasise that eq 7 is only correct when all of the sample aliquots are prepared in exactly the same way (which is usually the case); otherwise, it is necessary to obtain the concentration values corresponding to each aliquot signal ($x_{s,i}$), from which $\bar{x}_s$ and $s'^2_{x,s}$ are calculated.

Finally, $\bar{s}'^2_{x,s}$ is obtained from $s'^2_{x,s}$ (see eq 1). The confidence interval of the determination (strictly, the confidence interval of the average) is now calculated as:

$$\text{confidence interval} = \bar{x}_s \pm t\,\bar{s}'_{x,s} \qquad (8)$$

t being the Student-t statistic (for m - 1 degrees of freedom and the confidence level desired, generally 90% or 95%). This procedure, though commonly used, is not wholly correct. It can be seen that the value $\bar{s}'^2_{x,s}$ does not correspond to the total variance $\bar{s}^2_{x,s}$ shown in eq 2 (this is the reason why the superscript ´ is used), but to that part of the variance relating to the whole analytical process except for the quantification; in other words, $\bar{s}'^2_{x,s}$ is really $\bar{s}^2_{m,x,s}$. For this reason the confidence interval obtained according to eq 8 is not real. As is known (and as can be seen below in the example given in the Appendix), $\bar{s}^2_{q,x,s}$ can be much higher than $\bar{s}^2_{m,x,s}$, so that the real confidence interval will be much wider than that calculated by eq 8.

It might be thought that the calculation of $\bar{s}^2_{q,x,s}$ could be done with the least-squares line. To do this, it would be enough to substitute $\bar{y}_s$ into the line, giving

$$\bar{x}_s = \frac{\bar{y}_s - a}{b}$$

and, applying the error propagation rules, $\bar{s}^2_{q,x,s}$ could be calculated from the $\bar{y}_s$, a, and b variances. However, this procedure is statistically unacceptable because there is a correlation between a and b that would need to be taken into account (through their covariance).

Total Variance Calculation

In order to take $\bar{s}^2_{q,x,s}$ into account in the $\bar{s}^2_{x,s}$ calculation, the following equation is often used (3):

$$\bar{s}^2_{x,s} = \frac{s^2_{y,s}}{b^2 m} + \frac{s_e^2}{b^2}\left[\frac{1}{n} + \frac{(\bar{y}_s-\bar{y})^2}{b^2\sum_{i=1}^{n}(x_i-\bar{x})^2}\right] \qquad (9)$$

All the parameters appearing in this equation have been previously defined except $s_e^2$:

$$s_e^2 = \frac{1}{n-2}\left\{\sum_{i=1}^{n}(y_i-\bar{y})^2 - \frac{\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right\}$$

If $\bar{s}^2_{x,s}$ is calculated using eq 9, the Student-t statistic of eq 8 is taken for n - 2 degrees of freedom. In spite of its importance, eq 9 is not usually used in the classroom because of its mathematical complexity and the difficulty students have in understanding it. Consequently, the objectives at this stage are to:

• Consider what information is really offered by eq 9.

• Rewrite eq 9 in such a way that it can more easily be used.

• Clarify which parameters really affect the imprecision resulting from interpolation into the calibration line.

The two terms added in eq 9 actually correspond to the independent contributions of $\bar{s}^2_{m,x,s}$ and $\bar{s}^2_{q,x,s}$ to the confidence interval. In fact, from eqs 1 and 7 it can easily be seen that the first term is really $\bar{s}^2_{m,x,s}$:

$$\frac{s^2_{y,s}}{b^2 m} = \frac{s'^2_{x,s}}{m} = \bar{s}'^2_{x,s} = \bar{s}^2_{m,x,s}$$

Thus, if eqs 2 and 9 are compared, the following differentiation can be made:

$$\bar{s}^2_{m,x,s} = \frac{s^2_{y,s}}{b^2 m} \qquad (10a)$$

$$\bar{s}^2_{q,x,s} = \frac{s_e^2}{b^2}\left[\frac{1}{n} + \frac{(\bar{y}_s-\bar{y})^2}{b^2\sum_{i=1}^{n}(x_i-\bar{x})^2}\right] \qquad (10b)$$
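The split of eq 9 into eqs 10a and 10b can be checked numerically. The Python sketch below is an illustration only: the function name `variance_components` and the data in the usage lines are hypothetical and are not taken from the paper. It evaluates eq 3, the expression for $s_e^2$, and eqs 10a and 10b from the calibration points and the replicate sample signals, assuming all aliquots are prepared identically.

```python
from statistics import mean, variance

def variance_components(x, y, y_sample):
    """Return (s2_m, s2_q), the two variance-of-the-mean terms of eq 9.

    x, y      calibration concentrations and signals (n points)
    y_sample  replicate signals of the sample (m identically prepared aliquots)
    """
    n, m = len(x), len(y_sample)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx                            # slope, eq 3
    se2 = (syy - sxy ** 2 / sxx) / (n - 2)   # residual variance s_e^2

    ys_bar = mean(y_sample)
    # eq 10a: contribution of everything except the quantification step
    s2_m = variance(y_sample) / (b ** 2 * m)
    # eq 10b: contribution of the calibration line
    s2_q = se2 / b ** 2 * (1 / n + (ys_bar - y_bar) ** 2 / (b ** 2 * sxx))
    return s2_m, s2_q

# Hypothetical calibration and sample signals (not from the paper)
x = [0.0, 2.0, 4.0, 6.0, 8.0]
y = [0.01, 0.21, 0.39, 0.62, 0.80]
s2_m, s2_q = variance_components(x, y, [0.40, 0.41, 0.39])
print(s2_m, s2_q, s2_m + s2_q)  # the sum is the total variance of the mean (eq 2)
```

Returning the two terms separately, rather than their sum, makes it easy to see which step dominates the imprecision for a given data set.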

Variance Due to the Quantification, $\bar{s}^2_{q,x,s}$

Conceptually, this term would appear to depend solely on the imprecision due to the estimation of the calibration line (which is later transferred to the result interpolated in it). From this viewpoint, $\bar{s}^2_{q,x,s}$ can be expressed by a simpler equation when the two following considerations are taken into account:

1. On the basis of the least-squares regression, it can be deduced that:

$$\bar{y}_s - \bar{y} = b\,(\bar{x}_s - \bar{x}) \qquad (11)$$

2. Combining eqs 3 and 4 and the $s_e^2$ value yields:

$$\frac{s_e^2}{b^2} = \frac{1-r^2}{r^2}\,\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-2} \qquad (12)$$

The application of eqs 11 and 12 to eq 10b gives the following equation for $\bar{s}^2_{q,x,s}$:

$$\bar{s}^2_{q,x,s} = \frac{1-r^2}{r^2}\left[\frac{n-1}{n(n-2)}\,s_x^2 + \frac{(\bar{x}_s-\bar{x})^2}{n-2}\right] \qquad (13)$$

As can be seen, $\bar{s}^2_{q,x,s}$ depends on the number and concentration of the calibration standards used, the difference between the average calculated concentration and the centroid of the regression line, and the correlation coefficient (r), but is independent of the slope of the regression line (b). This is logical given that $\bar{s}^2_{q,x,s}$ indicates the imprecision in the estimation of the calibration line. When r = 1 there is no imprecision, since there is only one line fulfilling the condition, and therefore $\bar{s}^2_{q,x,s}$ = 0. The lower the value of r, the higher the number of point combinations that can produce the r value and therefore the more imprecise is the line. This result is very important from a practical point of view because it allows guidelines to be established to minimize the imprecision $\bar{s}^2_{q,x,s}$.

Equation 13 both simplifies the calculation and clarifies the meaning of $\bar{s}^2_{q,x,s}$. However, an even simpler equation can be deduced for the particular (but widely used) case in which (1) the calibration standards are equally spaced along a concentration range, and (2) the interpolation is close to the centroid of the calibration line. If condition (1) is fulfilled, then:

$$x_i = 2\bar{x}\,\frac{i-1}{n-1} \;\Rightarrow\; \sum_{i=1}^{n}\left(\frac{x_i}{\bar{x}}-1\right)^2 = \sum_{i=1}^{n}\frac{(2i-n-1)^2}{(n-1)^2} \qquad (14)$$

and under condition (2)

$$\bar{x}_s \approx \bar{x} \;\Rightarrow\; \frac{\bar{x}_s}{\bar{x}} - 1 = 0 \qquad (15)$$

Substituting eqs 14 and 15 in eq 13 yields:

$$\bar{s}^2_{q,x,s} = \underbrace{\frac{1-r^2}{r^2}}_{u}\;\bar{x}_s^2\;\underbrace{\frac{\sum_{i=1}^{n}(2i-n-1)^2}{(n-1)^2\,n\,(n-2)}}_{v} \qquad (16)$$

As can be seen, $\bar{s}^2_{q,x,s}$ depends only on the average concentration of the sample, a term that depends only on the correlation coefficient r (represented by u), and another term (represented by v) that depends only on the number of standards and not on their concentrations. As n takes only a few values in practice (generally lower than 10), the v term can be calculated in advance in most cases (Table 1).

Table 1. Values of v for Corresponding n Values

n       v
3       0.667
4       0.278
5       0.167
6       0.117
7       0.089
8       0.071
9       0.059
10      0.051

Variance Deriving from the Rest of the Method, $\bar{s}^2_{m,x,s}$

The parameter $\bar{s}^2_{m,x,s}$ can be calculated using eq 10a. When all the sample aliquots are prepared in the same way, it is common to calculate the parameter with the $\mathrm{RSD}_{y,s}$ (eq 6), thus avoiding having to interpolate each of the signal values obtained for all the aliquots. Again, when the interpolation is close to the centroid of the calibration line

$$\bar{y}_s = \bar{y} \quad\text{and}\quad \bar{y}_s \approx b\,\bar{x}_s \qquad (17)$$

and eq 10a is transformed into:

$$\bar{s}^2_{m,x,s} = \frac{\mathrm{RSD}^2_{y,s}\,\bar{x}_s^2}{m} \qquad (18)$$
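The values in Table 1 can be regenerated directly from the v term of eq 16. In the short Python sketch below, the function names `u_term` and `v_term` are hypothetical labels for the two factors; the code assumes equally spaced standards, as in eq 14.

```python
def u_term(r):
    """Factor of eq 16 that depends only on the correlation coefficient r."""
    return (1 - r ** 2) / r ** 2

def v_term(n):
    """Factor of eq 16 that depends only on the number of standards n (n >= 3)."""
    numerator = sum((2 * i - n - 1) ** 2 for i in range(1, n + 1))
    return numerator / ((n - 1) ** 2 * n * (n - 2))

# Tabulate v for n = 3 ... 10, the range covered by Table 1
for n in range(3, 11):
    print(n, round(v_term(n), 3))
```

The printed values should agree with Table 1 to within rounding in the third decimal place. Multiplying `u_term(r) * v_term(n)` by the squared sample concentration gives the quantification variance of eq 16.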

Calculation of Confidence Interval

In conclusion, the confidence interval can in any case be calculated according to the following expression:

$$\bar{x}_s \pm t\sqrt{\bar{s}'^2_{x,s} + \frac{1-r^2}{r^2}\left[\frac{n-1}{n(n-2)}\,s_x^2 + \frac{(\bar{x}_s-\bar{x})^2}{n-2}\right]} \qquad (19)$$

If the interpolation is close to the centroid of the calibration line and the standards are equally spaced, eq 19 is transformed into:

$$\text{confidence interval} = \bar{x}_s\left[1 \pm t\sqrt{\frac{\mathrm{RSD}^2_{y,s}}{m} + uv}\right] \qquad (20)$$
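Equations 19 and 20 translate almost line by line into code. The Python sketch below is a minimal illustration: the function names, argument names, and the numbers in the usage lines are hypothetical, and the Student-t value must be supplied by the user (for n - 2 degrees of freedom). The variance term inside eq 19 is written to be consistent with eqs 10b, 11, and 12.

```python
from math import sqrt

def ci_half_width_eq19(t, s2_prime_mean, r, n, s2_x, xs_bar, x_bar):
    """Half-width of the confidence interval of eq 19.

    s2_prime_mean  variance of the mean from the replicates (eqs 7 and 1)
    s2_x           concentration variance of the n calibration standards
    xs_bar, x_bar  interpolated sample concentration and mean of the standards
    """
    u = (1 - r ** 2) / r ** 2
    s2_q = u * ((n - 1) / (n * (n - 2)) * s2_x + (xs_bar - x_bar) ** 2 / (n - 2))
    return t * sqrt(s2_prime_mean + s2_q)

def ci_half_width_eq20(t, xs_bar, rsd_y, m, r, n):
    """Half-width of eq 20: equally spaced standards, interpolation near the centroid."""
    u = (1 - r ** 2) / r ** 2
    v = sum((2 * i - n - 1) ** 2 for i in range(1, n + 1)) / ((n - 1) ** 2 * n * (n - 2))
    return xs_bar * t * sqrt(rsd_y ** 2 / m + u * v)

# Hypothetical case: five standards at 0, 2, 4, 6, 8 (mean 4, variance 10),
# sample interpolated exactly at the centroid, three replicates with RSD 1%.
t, r = 3.182, 0.999                      # t for 3 degrees of freedom, 95%
s2_prime = (0.01 * 4.0) ** 2 / 3         # eq 18 gives the replicate term here
print(ci_half_width_eq19(t, s2_prime, r, 5, 10.0, 4.0, 4.0))
print(ci_half_width_eq20(t, 4.0, 0.01, 3, r, 5))
```

When both conditions of eq 20 hold exactly, as in this constructed case, the two functions return the same half-width; any difference on real data measures how far the data depart from those conditions.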

When can eq 20 be applied in practice? Empirically, the equidistant standards approximation can be applied even when the concentrations deviate by up to 50% of the equidistance. Under these conditions (fulfilling eq 14), the centroid approximation can be applied, in general terms, when the values of $\bar{x}_s$ and $\bar{x}$ do not differ by more than approximately 20%.

In analytical methods the linear response between signal and concentration is generally observed over a limited concentration interval (the linear response range of the method), which is obtained experimentally. Only those signals lying within this range can be interpolated into the calibration line. Often the limits of this range are uncertain. In such cases a statistical linearity test can be applied, but in general the r value is used as a decision criterion (a minimum acceptable r value of 0.995 is frequently quoted). Alternatively, eq 19 could be used to deduce the linear response range as a function of the imprecision. Equation 19 enables us to determine which fraction of the confidence interval is due to the quantification step and which is due to the rest of the method, and conclusions can thus be drawn about the experimental procedure to be followed. For example, when $\bar{s}^2_{m,x,s}$ is high, it is not very practical to use a calibration line with very high r values; the linear response range can be extended. Conversely, when $\bar{s}^2_{m,x,s}$ is low, it will be necessary to use the greatest possible number of calibration standards and a concentration range with as high an r value as possible.

Concluding Remarks

Due to its mathematical complexity, the degree of imprecision resulting from the quantification step is not usually taken into account when calculating the confidence interval. In this paper a mathematical model has been proposed that simplifies both its calculation and its understanding by students (and perhaps even by professionals). For a particular, widely used case, this imprecision depends only on the number of calibration standards, the effect of which can be calculated in advance (see Table 1), and on the correlation coefficient of the calibration line.

Acknowledgments

This work was supported by the Ministerio de Ciencia y Tecnología of Spain (BQU 2000-1162) and by the Diputación General de Aragón (P080/2000). The author wants to thank Fernando Plo (Departamento de Estadística e Ingeniería de Sistemas, Facultad de Ciencias, Universidad de Zaragoza) and the anonymous referees for their technical advice.

Literature Cited

1. Taylor, J. R. An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurement, 2nd ed.; University Science Books: Mill Valley, California, 1997; Chapter 8.
2. Miller, J. C.; Miller, J. N. Statistics for Analytical Chemistry, 2nd ed.; Ellis Horwood: Chichester, U.K., 1988; Chapter 5.
3. Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; de Jong, S.; Lewi, P. J.; Smeyers-Verbeke, J. Handbook of Chemometrics and Qualimetrics: Part A; Elsevier: Amsterdam, The Netherlands, 1999; Chapter 8.

Appendix

Example

The results shown in Table 2 correspond to an experimental calibration study of an analyte (A) using a spectrophotometric method.

Table 2. Experimental Results of Spectrophotometric Absorbance of Analyte A

[A]/mM    Absorbance
0         0.003
3         0.108
8         0.280
13        0.481
16        0.598
20        0.710


The method was applied to the determination of A in a sample. Four aliquots gave the following absorbance values: 0.348, 0.342, 0.350, 0.346. The equation of the calibration line obtained was (n = 6, $t_{0.05}(n-2)$ = 2.776):

$$\mathrm{Abs} = 0.0014 + 0.0362\,[\mathrm{A}], \qquad r = 0.99904 \qquad (A1)$$

The $\bar{x}_s$ obtained (interpolation) = 9.54
The u value (eq 16) = 0.00192
The v value (Table 1) = 0.117

The $\mathrm{RSD}_{y,s}$ obtained for the replicates (m = 4) was 0.00986. The confidence interval calculated by the simplified eq 20 gives:

$$\text{confidence interval} = 9.54\left[1 \pm 2.776\sqrt{\frac{0.00986^2}{4} + (0.001923)(0.117)}\right] = 9.54 \pm 0.40 \qquad (A2)$$

very similar to that obtained with eq 19:

$$9.54 \pm 0.41 \qquad (A3)$$

despite the fact that the interpolation was not performed at the centroid and the calibration standards were not equally spaced. If the confidence interval is calculated without considering the imprecision due to the interpolation into the calibration line, the result is significantly narrower than the real result:

$$9.54 \pm 0.13 \qquad (A4)$$
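The quantities that enter eq A2 can be recomputed from the raw data of the example. The Python sketch below is an illustration, not the author's calculation: the variable names are hypothetical, and the standard concentrations are taken as 0, 3, 8, 13, 16, and 20 mM, the values consistent with the fitted line of eq A1.

```python
from math import sqrt
from statistics import mean, stdev

# Calibration data (Table 2); concentrations assumed as stated in the lead-in
conc = [0, 3, 8, 13, 16, 20]                       # [A] / mM
absb = [0.003, 0.108, 0.280, 0.481, 0.598, 0.710]  # absorbance
sample = [0.348, 0.342, 0.350, 0.346]              # four sample aliquots

x_bar, y_bar = mean(conc), mean(absb)
sxx = sum((x - x_bar) ** 2 for x in conc)
syy = sum((y - y_bar) ** 2 for y in absb)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(conc, absb))

b = sxy / sxx                  # slope, eq 3
a = y_bar - b * x_bar          # intercept, eq 5
r = sxy / sqrt(sxx * syy)      # correlation coefficient, eq 4

xs_bar = (mean(sample) - a) / b        # interpolated sample concentration
rsd = stdev(sample) / mean(sample)     # RSD of the sample signals, eq 6
u = (1 - r ** 2) / r ** 2              # u term of eq 16

print(f"a = {a:.4f}, b = {b:.4f}, r = {r:.5f}")
print(f"xs_bar = {xs_bar:.2f}, RSD = {rsd:.5f}, u = {u:.5f}")
```

The slope, intercept, r, RSD, and u obtained this way can be compared with the values quoted in the example; small differences in the last digit are to be expected from rounding of intermediate results.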



Vol. 81 No. 7 July 2004



Journal of Chemical Education
