Anal. Chem. 1989, 61, 1058-1062


Application of the Equation of Error Propagation to Obtaining Nonstochastic Estimates for the Reproducibility of Chromatographic Peak Moments

David I. Eikens and Peter W. Carr*

Chemistry Department, University of Minnesota, 207 Pleasant Street SE, Minneapolis, Minnesota 55455

The statistical moments of an experimental chromatographic peak are commonly computed by using a summation procedure. In general the random errors in the observed signal far exceed those in the measurement of time. Under this condition it is possible to use the method of error propagation to derive simple equations for the uncertainty in the statistical moments and thereby circumvent the problems inherent in previous stochastic experiments. The equations can be used to design a data sampling program capable of achieving any desired precision.

INTRODUCTION

Precise and accurate measurement of peak moments is very important in many types of chromatographic studies. For example, such measurements are used to determine mass transport properties, activity coefficients, mixed virial coefficients, and diffusion properties (1-3). In size exclusion chromatography the peak centroid is used to determine the molecular weight and the peak width is used in the computation of the polydispersity (4, 5). Inaccurate results are common, especially in polydispersity measurements. In the past, many researchers have focused on optimizing the instrumentation (3, 6, 7), while others have tried to optimize the method of calculating the peak moments (1, 8-12). Baumann et al. used several digitized chromatographic peaks to compare the effect of data collection rates and integration window sizes by taking subsets of the raw data in order to determine the accuracy and precision as a function of these parameters (9). Simulations of chromatographic peaks and signal noise have been used for similar studies. Several replicates were used to determine the accuracy and precision for varying signal-to-noise ratios, integration window widths, and data collection rates (1, 8, 10, 11). The stochastic approaches mentioned above are time-consuming and limited to the conditions studied. In contrast, if propagation of error analysis is used to determine the precision of the moments measurements, the results allow the use of simple equations to calculate the variances under a broad range of conditions. The summation method of integration approximation was used in the derivation of the variance for each of the first three moments. The propagation of error analysis results were compared to stochastic experiment results.

The equations resulting from the propagation of error analysis show the dependence of the variances in the moments on the signal-to-noise ratio, the integration window width, the time interval between data points, and the asymmetry of the window about the peak centroid. The equations indicate the expected results of increasing variance for decreasing signal-to-noise ratio, increasing integration window width, decreasing number of data points per $\sigma$, and increasing asymmetry.

THEORY

Peak moments can be approximated by the summation method of integration. For the zeroth moment, $m_0$,

$$m_0 = \int_{T_1}^{T_2} C(t)\,dt = \sum_{i=1}^{N} C_i\,\Delta t \tag{1}$$

where $C(t)$ is a function of the time that defines the peak, $C_i$ is the digitized detector response of the $i$th data point, $\Delta t$ is the constant time interval between points, $T_1$ and $T_2$ denote the starting and ending times, respectively, for integration, and $N$ is the total number of digitized data points. For a Gaussian peak $T_1$ and $T_2$ should in principle be $-\infty$ and $+\infty$, respectively, in order to acquire all of the peak, but in practice $T_1$ and $T_2$ are finite values that essentially completely encompass the peak. The summation on the right side of eq 1 is an approximation of the definite integral which becomes exact as $N$ approaches infinity. For notational simplicity the integration limits, $T_1$ and $T_2$, and the summation index limits, $i = 1$ to $N$, will be dropped hereafter. Similarly for the first moment, $m_1$,

$$m_1 = \int C(t)\,t\,dt = \sum (C_i\,t\,\Delta t) \tag{2}$$

In the summation on the right-hand side of eq 2, time, $t$, can be written as a function of $T_1$, $\Delta t$, and the index, $i$

$$t = T_1 + \Delta t\,i \tag{3}$$

$$m_1 = \sum [C_i (T_1 + \Delta t\,i)\,\Delta t] = T_1\,\Delta t \sum C_i + \Delta t^2 \sum (C_i\,i) \tag{4}$$

Normalizing the equation, that is, dividing eq 4 by eq 1, reduces it to

$$m_1' = \frac{m_1}{m_0} = T_1 + \Delta t\,\frac{\sum (C_i\,i)}{\sum C_i} \tag{5}$$

The second moment, $m_2$, is derived in a similar fashion. The derivation of the normalized and centralized second moment, $m_2'$, follows:

$$m_2' = \frac{1}{m_0} \int C(t)\,(t - m_1')^2\,dt \tag{6}$$

$$= \frac{1}{m_0} \int C(t)\,t^2\,dt - (m_1')^2 \tag{7}$$

$$= \frac{1}{m_0} \sum [C_i (T_1 + \Delta t\,i)^2\,\Delta t] - (m_1')^2 \tag{8}$$

$$= T_1^2 + 2 T_1\,\Delta t\,\frac{\sum (C_i\,i)}{\sum C_i} + \Delta t^2\,\frac{\sum (C_i\,i^2)}{\sum C_i} - (m_1')^2 \tag{9}$$
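The summation method can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' program: the function names `peak_moments` and `gaussian_peak` and the test peak (unit height, centroid at 10, standard deviation 1) are assumptions chosen for the example. The only relations taken from the text are the sampling rule $t_i = T_1 + \Delta t\,i$ and the definitions of the zeroth moment, the normalized first moment, and the normalized central second moment as sums over the digitized points.

```python
import math


def peak_moments(signal, t1, dt):
    """Return (m0, m1', m2') of a digitized peak by the summation method.

    signal : detector responses C_1..C_N
    t1, dt : start time T1 and sampling interval, so that t_i = T1 + dt * i
    """
    s0 = sum(signal)                                             # sum of C_i
    s1 = sum(c * i for i, c in enumerate(signal, start=1))       # sum of C_i * i
    s2 = sum(c * i * i for i, c in enumerate(signal, start=1))   # sum of C_i * i^2
    m0 = dt * s0                                                 # peak area
    m1 = t1 + dt * s1 / s0                                       # centroid
    m2 = t1**2 + 2 * t1 * dt * s1 / s0 + dt**2 * s2 / s0 - m1**2  # central second moment
    return m0, m1, m2


def gaussian_peak(height, center, sigma, t1, dt, n):
    """Noise-free Gaussian sampled at t_i = t1 + dt * i, i = 1..n (hypothetical test signal)."""
    return [height * math.exp(-0.5 * ((t1 + dt * i - center) / sigma) ** 2)
            for i in range(1, n + 1)]


# Window of roughly +/- 5 standard deviations around the centroid, 20 points per sigma.
signal = gaussian_peak(1.0, 10.0, 1.0, 5.0, 0.05, 200)
m0, m1, m2 = peak_moments(signal, 5.0, 0.05)
print(f"m0 = {m0:.5f}, m1' = {m1:.5f}, m2' = {m2:.5f}")
```

For this noise-free peak the three values should be close to the analytical area $\sqrt{2\pi}$, the centroid 10, and the variance 1, which gives a quick check that the index-based sums are wired correctly.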

0003-2700/89/0361-1058$01.50/0 © 1989 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 61, NO. 10, MAY 15, 1989

Equations 1, 5, and 9 show that the values of the peak moments depend on the peak shape via the data points, $C_i$. The approach that will be used here to obtain estimates of the uncertainties in the peak moments does not require any definition of the shape of the peak. In contrast the stochastic method does require an explicit definition of the peak shape. Propagation of error analysis can be applied to eq 1, 5, and 9 in order to determine the variance for each moment. Two distinct sources of random error exist in the analysis: error in time measurement and noise in the detector response. Assuming the fluctuations in the variables $C_i$ and $t$ are truly random, i.e., not correlated, the variance in the $n$th moment, $\sigma_{m_n}^2$, is approximated by the propagation of error equations

(10)

$$\sigma_{m_n}^2 = \sum \left(\frac{\partial m_n}{\partial t_i}\right)^2 \sigma_t^2 + \sum \left(\frac{\partial m_n}{\partial C_i}\right)^2 \sigma_{C_i}^2 \tag{11}$$

where $\sigma_t^2$ is the variance in the time measurements and $\sigma_{C_i}^2$ is the variance in the detector response for the $i$th data point. We can make two simplifying assumptions. First, we assume that the timing error is negligible relative to error in the signal response. Second, as did Goedert and Guiochon (7, 10, 11), the variance in the signal is taken to be constant over the measurement period. This second assumption is quite crucial. It allows the term $\sigma_{C_i}^2$ to be factored outside the summation in eq 11 and ultimately leads to the conclusion that the variances of the moments are independent of the peak shape and fixed only by the data acquisition schedule, that is, by $T_1$, $T_2$, and $\Delta t$. This second assumption will be discussed in more detail below. For the present we note that if it is not made, one obtains much more complex equations and one must specify the peak shape. These assumptions reduce the propagation of error equation to

$$\sigma_{m_n}^2 = \sigma_C^2 \sum \left(\frac{\partial m_n}{\partial C_i}\right)^2 \tag{12}$$

Equation 12 is used to derive the variance in $m_0$, $m_1'$, and $m_2'$. In order to derive the variance in the zeroth moment, $\sigma_{m_0}^2$, we must evaluate the derivative of $m_0$ with respect to $C_i$ using eq 1. We expand $m_0$ and find the derivative with respect to each data point

$$m_0 = \Delta t\,C_1 + \Delta t\,C_2 + \Delta t\,C_3 + \dots \tag{13}$$

$$\frac{\partial m_0}{\partial C_i} = \frac{\partial m_0}{\partial C_1} = \frac{\partial m_0}{\partial C_2} = \frac{\partial m_0}{\partial C_3} = \dots = \Delta t \tag{14}$$

Upon substitution of eq 14 into eq 12 we find

$$\sigma_{m_0}^2 = \sigma_C^2 \sum \Delta t^2 = \sigma_C^2\,\Delta t^2 N \tag{15}$$

With the size of the integration window, $\beta$, defined as the difference between $T_1$ and $T_2$

$$\beta = T_2 - T_1 \tag{16}$$

we can replace $N$ with the following relation:

$$N \approx \beta / \Delta t \tag{17}$$

The exact relation, $N - 1 = \beta/\Delta t$, leads to a more complex final result. For the sake of simplicity and in view of the approximations made below, we feel that eq 17 is accurate enough for our purposes. When 100 data points are used, this approximation results in a 1% error in the variance of the zeroth moment. The exact solution is included in the Appendix. The final expression for the variance of the zeroth moment is achieved by substitution of eq 17 into eq 15

$$\sigma_{m_0}^2 = \sigma_C^2\,\Delta t\,\beta \tag{18}$$

In the derivation of the variance for the first and second moments we need alternative expressions for several terms. Equations 1, 5, and 9 are used to arrive at expressions for $\sum C_i$, $\sum C_i i$, and $\sum C_i i^2$. For the terms $\sum i$, $\sum i^2$, $\sum i^3$, and $\sum i^4$ we have opted for algebraic simplicity and made approximations for these terms. The exact forms for these sums follow:

$$\sum i = N^2/2 + N/2 \tag{19}$$

$$\sum i^2 = N^3/3 + N^2/2 + N/6 \tag{20}$$

$$\sum i^3 = N^4/4 + N^3/2 + N^2/4 \tag{21}$$

$$\sum i^4 = N^5/5 + N^4/2 + N^3/3 - N/30 \tag{22}$$

For simplicity we dropped all but the first term in each expression. This leaves the following generalized approximation:

$$\sum i^k \approx \frac{N^{k+1}}{k+1} \tag{23}$$

This approximation leads to an error of 1.4% in the final result for the variance in the second moment when as few as 50 data points are used. The exact solution is given in the Appendix. The variance in the first moment, $\sigma_{m_1'}^2$, is derived by using eq 5 and 12, which lead to

$$\sigma_{m_1'}^2 = \frac{\sigma_C^2\,\Delta t^2}{m_0^2} \left[ \Delta t^2 \sum i^2 - 2\,\Delta t\,(m_1' - T_1) \sum i + N (m_1' - T_1)^2 \right] \tag{24}$$

Equation 24 can be made much more compact by using eq 23 and introducing a term that we refer to as the integration window asymmetry factor, $f$

$$f = \frac{m_1' - T_1}{\beta} \tag{25}$$

This is termed the integration window asymmetry factor because $f$ will be exactly 1/2 when $m_1'$ is equidistant from the starting and stopping times. The above reduces eq 24 to final form

$$\sigma_{m_1'}^2 = \frac{\sigma_C^2\,\Delta t\,\beta^3}{m_0^2} \left( f^2 - f + \frac{1}{3} \right) \tag{26}$$

Similarly the variance of the second moment is derived by using eq 9 and 12

$$\sigma_{m_2'}^2 = \frac{\sigma_C^2\,\Delta t^2}{m_0^2} \sum \left[ (T_1 + \Delta t\,i - m_1')^2 - m_2' \right]^2 \tag{27}$$

Again eq 23 and 24 are used to arrive at the final expression for $\sigma_{m_2'}^2$

$$\sigma_{m_2'}^2 = \frac{\sigma_C^2\,\Delta t\,\beta}{m_0^2} \left( \beta^4 \left( f^4 - 2f^3 + 2f^2 - f + \frac{1}{5} \right) - 2 (m_2')\,\beta^2 \left( f^2 - f + \frac{1}{3} \right) + (m_2')^2 \right) \tag{28}$$

The exact solutions for $\sigma_{m_1'}^2$ and $\sigma_{m_2'}^2$, without the use of the approximations in eq 17 and 23, are included in the Appendix.
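As a numerical cross-check of the derivation, the sketch below evaluates the variances in two ways for a constant detector noise $\sigma_C$. The first function uses closed forms in terms of the window width $\beta$ and the asymmetry factor $f = (m_1' - T_1)/\beta$; these closed forms are written here as an assumption, obtained by differentiating the summation expressions for the moments and keeping only the leading power of $N$ with $N \approx \beta/\Delta t$. The second function sums the squared partial derivatives point by point. All function and variable names are illustrative.

```python
import math


def closed_form_variances(sigma_c, dt, beta, f, m0, m2c):
    """Approximate variances of m0, m1', m2' for constant noise sigma_c (assumed closed forms)."""
    var_m0 = sigma_c**2 * dt * beta
    rel = var_m0 / m0**2                         # variance of m0 relative to m0^2
    q2 = f**2 - f + 1.0 / 3.0                    # window integral of (x - f)^2 on [0, 1]
    q4 = f**4 - 2 * f**3 + 2 * f**2 - f + 0.2    # window integral of (x - f)^4 on [0, 1]
    var_m1 = rel * beta**2 * q2
    var_m2 = rel * (beta**4 * q4 - 2 * m2c * beta**2 * q2 + m2c**2)
    return var_m0, var_m1, var_m2


def direct_sum_variances(sigma_c, dt, n, t1, m0, m1, m2c):
    """Propagate constant noise by summing (d m_n / d C_i)^2 over every data point."""
    times = [t1 + dt * i for i in range(1, n + 1)]
    var_m0 = sigma_c**2 * sum(dt**2 for _ in times)
    var_m1 = sigma_c**2 * sum((dt * (t - m1) / m0) ** 2 for t in times)
    var_m2 = sigma_c**2 * sum((dt * ((t - m1) ** 2 - m2c) / m0) ** 2 for t in times)
    return var_m0, var_m1, var_m2


# Hypothetical case: unit-height Gaussian with sigma = 1 centered in a window of width 6.
sigma_c, dt, beta, f = 0.01, 0.01, 6.0, 0.5
m0, m1, m2c = math.sqrt(2 * math.pi), 3.0, 1.0
closed = closed_form_variances(sigma_c, dt, beta, f, m0, m2c)
direct = direct_sum_variances(sigma_c, dt, round(beta / dt), 0.0, m0, m1, m2c)
for name, a, b in zip(("m0", "m1'", "m2'"), closed, direct):
    print(f"{name}: closed form = {a:.4e}, direct sum = {b:.4e}")
```

With 600 points in the window the two routes should agree closely; the gap widens as the number of points falls, which is the price of the leading-term approximation.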

RESULTS AND DISCUSSION

The results clearly demonstrate the very strong influence of the choice of starting and stopping times on the variance of the peak moments. In particular for a fixed value of $\Delta t$ eq 18 indicates that the variance of the peak area, $\sigma_{m_0}^2$, will increase in proportion to the first power of $\beta$, the integration window. Similarly the variance in the peak centroid, $\sigma_{m_1'}^2$, and peak width, $\sigma_{m_2'}^2$, increase with the third and fifth power of the integration window, respectively, all else being held constant. The above observations, although virtually intuitive, are not in complete accord with the results of the empirical stochastic study of Goedert and Guiochon (11). In order to test the accuracy of the present approach and see if the discrepancies between it and that of Goedert and Guiochon are more apparent than real, a limited number of stochastic experiments were repeated. A Gaussian peak model was used as the basis for simulating a real chromatographic peak. The data points, $C_i$, representing the digitized signal were computed at regularly spaced time intervals. Random numbers chosen from a normal distribution with a mean value of zero simulated the noise in the detector response. They were scaled according to a predetermined signal-to-noise ratio and then added to the Gaussian generated data point. The signal-to-noise ratio was defined as the ratio of maximum peak height to 4 times the standard deviation of the random distribution. The standard deviation of each moment was calculated for 10 or 100 replicates as noted below. The variances from both the stochastic experiments and eq 18, 26, and 28 were compared. The parameters $\sigma_C$, $N$, and $\Delta t$ were varied independently. As shown in Figure 1 the results of the stochastic experiments based on ten replicates to generate the precision estimates are randomly scattered about the solid line based on the algebraically generated result.

Figure 1. Stochastic (points) and algebraic (solid line) results for varying integration window sizes: (A) percent relative standard deviation for $m_0$ vs $N$ (eq 18), (B) absolute standard deviation for $m_1'$ vs $N$ (eq 26), and (C) percent relative standard deviation for $m_2'$ vs $N$ (eq 28). Points per $\sigma$ = 21; S/N = 30; 10 replicates. $N$ = 127 for integration limits at $C_i$ = 1.00% of maximum peak height; $N$ = 156 for integration limits at $C_i$ = 0.10% of maximum peak height; $N$ = 180 for integration limits at $C_i$ = 0.01% of maximum peak height.

Figure 2. Stochastic (points) and algebraic (solid line) results for varying asymmetry: (A) percent relative standard deviation for $m_0$ vs $f$ (eq 18), (B) absolute standard deviation for $m_1'$ vs $f$ (eq 26), and (C) percent relative standard deviation for $m_2'$ vs $f$ (eq 28). $\beta = 3/f$; points per $\sigma$ = 21; S/N = 30; 100 replicates.
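The stochastic experiment described in the text can be imitated with the Python standard library. The sketch below is an assumption-laden illustration rather than the authors' code: it uses a unit-height, unit-$\sigma$ Gaussian centered in the window ($f = 1/2$), draws zero-mean normal noise with a standard deviation equal to the peak height divided by 4 times S/N, and compares the scatter of the area and centroid over many replicates with the propagated values $\sigma_C \sqrt{\Delta t\,\beta}$ and $(\sigma_{m_0}/m_0)\,\beta\,\sqrt{1/12}$. The function name, the seed, and the choice of 400 replicates are hypothetical.

```python
import math
import random
import statistics


def stochastic_vs_algebraic(n_points=127, points_per_sigma=21, snr=30.0,
                            replicates=400, seed=1):
    """Compare Monte Carlo and propagated standard deviations of m0 and m1'."""
    rng = random.Random(seed)
    dt = 1.0 / points_per_sigma            # peak sigma is taken as 1
    beta = n_points * dt                   # window width, using N ~ beta / dt
    t1 = -beta / 2.0                       # window centered on the centroid at t = 0
    times = [t1 + dt * i for i in range(1, n_points + 1)]
    clean = [math.exp(-0.5 * t * t) for t in times]
    noise_sd = 1.0 / (4.0 * snr)           # S/N = height / (4 * noise sd)

    areas, centroids = [], []
    for _ in range(replicates):
        noisy = [y + rng.gauss(0.0, noise_sd) for y in clean]
        total = sum(noisy)
        areas.append(dt * total)
        centroids.append(sum(c * t for c, t in zip(noisy, times)) / total)

    m0 = dt * sum(clean)                   # noise-free area
    pred_m0 = noise_sd * math.sqrt(dt * beta)
    pred_m1 = pred_m0 / m0 * beta * math.sqrt(1.0 / 12.0)
    return {
        "sd_m0": statistics.stdev(areas), "pred_m0": pred_m0,
        "sd_m1": statistics.stdev(centroids), "pred_m1": pred_m1,
    }


result = stochastic_vs_algebraic()
for key, value in result.items():
    print(f"{key}: {value:.5f}")
```

Rerunning with `replicates=10` and different seeds shows how widely a ten-replicate estimate of a standard deviation scatters around the propagated value, which is the behavior the text attributes to the points in Figure 1.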

A closely related set of experiments is shown in Figure 2 where both $f$ and $\beta$ were varied. In this case the points are the result of 100 replicates and fall much closer to the algebraic results, again defined by the solid curve. Good agreement was also found when $\sigma_C^2$ and $\Delta t$ were varied. We believe that this shows that the central tendency of the stochastic approach will be to fall on the curve defined by the above equation and thus there really is no difference between the present approach and that of Goedert and Guiochon. Previously Goedert and Guiochon concluded that "the reproducibility of the measurements of both the area and the mean of a Gaussian peak is independent of the width of the integration limits, at least within a reasonable range." (11). Our results clearly show the importance of the italicized qualifier. Goedert and Guiochon used windows where the start, $T_1$, and stop, $T_2$, times are chosen at the points where the signal was 1.00%, 0.10%, and 0.01% of the maximum peak height. As seen in Figure 1 the inherent stochastic nature of the experiment is such that the trends predicted by the above equations are obscured. A very large number of stochastic experiments must be done, otherwise the statistical fluctuations in the estimates of $\sigma_{m_n}^2$ can be excessively large. The error propagation method used in this work is not subject to the imprecision of a stochastic experiment. Of course the accuracy of an estimate of the precision is not an important issue per se. However the equations lead to the correct trend, which is the important point in terms of guiding the design of a data acquisition schedule.

Equations 18, 26, and 28 show that an increasing number of data points per $\sigma$ and a decreasing integration window size reduce the variances in the moments. Figures 3 and 4 illustrate these effects for an integration window which is symmetric about the peak centroid ($f$ = 1/2). The larger number of points per $\sigma$ resulted in better precision since the peak is better defined by more data. There is rapid improvement in precision up to about 20 points per $\sigma$. After this the improvement is not very great and perhaps not worth the cost of extra data storage and processing, at least for on-line calculations. The effect of the integration window width also shows the intuitive result. Figures 3 and 4 indicate that the variance of the zeroth moment (the area) is much less sensitive to the window width than is the first moment, which in turn is less sensitive than the second moment. This clearly illustrates the profound effect that noisy data far from the centroid have on the calculation of higher moments. Naturally the relative precision of the first moment improves as the peak centroid moves toward longer time.

Figure 3. Variance in the moments as a function of points per $\sigma$ for 1.00% ($\beta = 5.16\sigma$) (solid line), 0.10% ($\beta = 6.58\sigma$) (dashed line), and 0.01% ($\beta = 7.78\sigma$) (dash-dot line) accuracy in $m_0$: (A) percent relative standard deviation for $m_0$ vs points per $\sigma$ (eq 18), (B) absolute standard deviation for $m_1'$ vs points per $\sigma$ (eq 26), and (C) percent relative standard deviation for $m_2'$ vs points per $\sigma$ (eq 28). $f$ = 1/2; $\sigma_C$ = 1% of maximum peak height.

Figure 4. Variance in the moments as a function of points per $\sigma$ for 1.00% ($\beta = 6.68\sigma$) (solid line), 0.10% ($\beta = 8.04\sigma$) (dashed line), and 0.01% ($\beta = 9.16\sigma$) (dash-dot line) accuracy in $m_2'$: (A) percent relative standard deviation for $m_0$ vs points per $\sigma$ (eq 18), (B) absolute standard deviation for $m_1'$ vs points per $\sigma$ (eq 26), and (C) percent relative standard deviation for $m_2'$ vs points per $\sigma$ (eq 28). $f$ = 1/2; $\sigma_C$ = 1% of maximum peak height.
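The dependence on sampling density discussed above can be tabulated directly. The following sketch assumes a Gaussian peak of unit height and unit $\sigma$ (area $\sqrt{2\pi}$), constant noise expressed as a fraction of the peak height, an area variance of $\sigma_C^2\,\Delta t\,\beta$, and a centroid variance equal to the relative area variance times $\beta^2 (f^2 - f + 1/3)$. The function name `gaussian_precision` and the grid of sampling densities are illustrative choices, not values taken from the paper.

```python
import math


def gaussian_precision(points_per_sigma, beta_over_sigma, noise_fraction=0.01, f=0.5):
    """Return (%RSD of the area, SD of the centroid in units of sigma).

    Assumes a unit-height Gaussian with sigma = 1 and constant detector noise
    equal to noise_fraction times the peak height.
    """
    dt = 1.0 / points_per_sigma
    beta = beta_over_sigma
    area = math.sqrt(2.0 * math.pi)
    rel_sd_area = noise_fraction * math.sqrt(dt * beta) / area
    sd_centroid = rel_sd_area * beta * math.sqrt(f * f - f + 1.0 / 3.0)
    return 100.0 * rel_sd_area, sd_centroid


densities = (5, 10, 20, 40, 80)
print("beta/sigma  " + "  ".join(f"n={n:<3d}" for n in densities))
for beta in (5.16, 6.58, 7.78):
    row = [gaussian_precision(n, beta)[0] for n in densities]
    print(f"{beta:10.2f}  " + "  ".join(f"{r:5.3f}" for r in row))
```

Because the relative standard deviation of the area scales as the inverse square root of the number of points per $\sigma$, each halving of the uncertainty costs a fourfold increase in data rate, which is one way to read the diminishing returns beyond about 20 points per $\sigma$.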

In most of the preceding discussion and data displayed in Figures 1, 3, and 4 the integration window asymmetry factor, $f$, was set at one-half and the effect of $\beta$ and $\Delta t$ was examined. The effect of $f$ on the zeroth moment is nil. The effect of $f$ on $\sigma_{m_1'}^2$ and $\sigma_{m_2'}^2$ at constant $\beta$ and fixed data rate per $\sigma$ is shown in Figure 5. An optimum in precision occurs at $f$ equal to one-half and the results are perfectly symmetric about this point, i.e., both fronting peaks and tailing peaks have the same precision. Ideally then one should design the data acquisition scheme, whenever possible, so that the integration window is centered at the centroid of the peak to obtain the optimum precision shown in Figure 5.

A result of the above approach is that the equations of error propagation estimates are independent of the peak shape. This seems counterintuitive since there is no question that it is much more difficult to obtain precise measures of the peak centroid and second central moment when real peaks are asymmetric. This apparent contradiction is resolved as follows. When the measured peak is symmetric, one can locate the integration window symmetrically about the centroid ($f$ = 1/2). When a peak is tailed, one can start data acquisition at the same time as for a symmetric peak but one cannot terminate acquisition until well after the time which would be used for a symmetric peak. In this case $\beta$ must be larger and $f$ will be less than one-half. In view of Figure 5 it is evident that in order to include the whole of an asymmetric peak, the precision must be worse than would be observed for a symmetric peak. For example consider an exponentially modified Gaussian peak with a $\tau$-to-$\sigma$ ratio of 1.6. The start and stop times were chosen to be the points where the height is 1% of the height at the peak maximum and $\sigma_C$ = 1% of the peak height. For this case $\beta = 6.51\sigma$. If we chose to use a collection rate of 20 points per $\sigma$, the relative standard deviation for $m_0$ is 0.27%, the standard deviation for $m_1'$ is 0.026, and the relative standard deviation for $m_2'$ is 4.2%.

The estimates of precision here are optimistic. Significant errors arise in selecting the start and stop times, especially for asymmetric peaks and where sloping base-lines exist. The models used here are for the ideal case where these pitfalls are avoided. It is apparent that the greatest limitation of the present approach is not the arithmetic approximations made in the use of eq 17 and 23 but rather our assumption that the variance in $C_i$ is constant across the entire chromatographic peak (eq 12). In the absence of an exact knowledge of how $\sigma_{C_i}^2$ varies with $C_i$, one must assume some other limiting form, e.g., that the relative standard deviation is constant across the entire peak. In addition if $\sigma_{C_i}^2$ is not assumed to be constant, then the peak shape must be specified, ultimately resulting in far more complex equations. This in turn leads to a far less general result. It is perhaps best at this point to allow $\sigma_{C_i}^2$ to be some suitable average value over all concentrations explored.

Figure 5. Variance in the moments as a function of $f$ for $\beta/\sigma$ = 6, 8, 10, 12, 14, and 16: (A) absolute standard deviation for $m_1'$ vs $f$ (eq 26) and (B) percent relative standard deviation for $m_2'$ vs $f$ (eq 28). Points per $\sigma$ = 20; $\sigma_C$ = 1% of maximum peak height.

These results show the need for better data acquisition strategies in order to obtain precise and accurate data. It is very important to use integration windows that are narrow, just wide enough to encompass the whole peak so that the peak moments remain accurate, and symmetric about the centroid. These conditions are met only with narrow symmetric peaks. At least 20 points per $\sigma$ should be collected. For a Gaussian peak, for example, one needs 80 points per $\sigma$ to obtain 0.1% accuracy and 0.1% precision for the zeroth moment. For capillary gas chromatography this means that a 320 points per second data collection rate is needed for peaks with a base-line bandwidth ($4\sigma$) of 1 s.

APPENDIX

In order to simplify the final results, the approximations in eq 17 and 23 were applied in the derivations. For the zeroth moment, eq A1 results when the approximation of eq 17 is not used

$$\sigma_{m_0}^2 = \sigma_C^2\,\Delta t\,(\beta + \Delta t) \tag{A1}$$

For the first and second moments the approximations in eq 17 and 23 are used. Without those approximations eq A2 and A3 are derived, with $\sigma_{m_0}^2$ taken from eq A1

$$\sigma_{m_1'}^2 = \frac{\sigma_{m_0}^2}{m_0^2} \left( \frac{\Delta t^2 N^2}{3} - f\beta\,\Delta t\,N + f^2\beta^2 + \Delta t^2 \left( \frac{N}{2} + \frac{1}{6} \right) - f\beta\,\Delta t \right) \tag{A2}$$

$$\sigma_{m_2'}^2 = \frac{\sigma_{m_0}^2}{m_0^2} \Big\{ \Delta t^4 \left( \frac{N^4}{5} + \frac{N^3}{2} + \frac{N^2}{3} - \frac{1}{30} \right) - \Delta t^3\,4f\beta \left( \frac{N^3}{4} + \frac{N^2}{2} + \frac{N}{4} \right) + \Delta t^2 (6f^2\beta^2 - 2m_2') \left( \frac{N^2}{3} + \frac{N}{2} + \frac{1}{6} \right) - \Delta t\,4f\beta\,(f^2\beta^2 - m_2') \left( \frac{N}{2} + \frac{1}{2} \right) + \left( f^4\beta^4 - 2m_2' f^2\beta^2 + (m_2')^2 \right) \Big\} \tag{A3}$$

The approximations used for eq 17 and 23 result in underestimation of the variance of the moments. For an integration window width of $5.16\sigma$ (1% accuracy in $m_0$), an asymmetry factor of 1/2, and 100 data points, the approximations give the following errors: for the zeroth moment, use of eq 17 leads to a 1% error in the variance; for the first moment, eq 17 leads to an error of 2% and eq 23 gives an error of 0.08%, for a total of 2.08%; and for the second moment, eq 17 leads to an error of 4.9% and eq 23 gives an error of 0.4%, for a total of 5.3%.
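Two of the ingredients behind these error figures can be checked with exact rational arithmetic. The sketch below verifies the closed forms of the power sums $\sum i^k$ for $k$ = 1 to 4 against brute-force summation, reports how much is lost by keeping only the leading term $N^{k+1}/(k+1)$, and reproduces the 1% figure for the zeroth moment, where the exact count $N$ is replaced by $\beta/\Delta t = N - 1$. The function names are hypothetical, and the sketch does not attempt to reproduce the quoted percentages for the first and second moments.

```python
from fractions import Fraction


def power_sum_exact(n, k):
    """Closed forms of sum_{i=1}^{n} i^k for k = 1..4."""
    n = Fraction(n)
    forms = {
        1: n**2 / 2 + n / 2,
        2: n**3 / 3 + n**2 / 2 + n / 6,
        3: n**4 / 4 + n**3 / 2 + n**2 / 4,
        4: n**5 / 5 + n**4 / 2 + n**3 / 3 - n / 30,
    }
    return forms[k]


def power_sum_leading(n, k):
    """Leading-term approximation n^(k+1) / (k+1)."""
    return Fraction(n ** (k + 1), k + 1)


def relative_shortfall(n, k):
    """Fraction by which the leading term underestimates the exact power sum."""
    exact = power_sum_exact(n, k)
    return float((exact - power_sum_leading(n, k)) / exact)


def area_variance_shortfall(n):
    """Underestimate of the area variance when N is replaced by N - 1 = beta / dt."""
    return float(Fraction(n - (n - 1), n))


for k in range(1, 5):
    print(f"k={k}: shortfall at N=50 is {100 * relative_shortfall(50, k):.2f}%")
print(f"area variance shortfall at N=100: {100 * area_variance_shortfall(100):.2f}%")
```

Using `Fraction` keeps the comparison with the brute-force sums exact, so any mismatch would point to a transcription error in a closed form rather than to rounding.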

LITERATURE CITED

(1) Chesler, S. N.; Cram, S. P. Anal. Chem. 1971, 43, 1922-1933.
(2) Goto, M.; Goto, S. Sep. Sci. Technol. 1987, 22, 1503-1514.
(3) Oberholtzer, J. E.; Rogers, L. B. Anal. Chem. 1969, 41, 1234-1240.
(4) Kubin, M. J. Appl. Polym. Sci. 1985, 30, 2237-2252.
(5) Dawkins, J. V.; Yeadon, G. J. Chromatogr. 1980, 188, 333-345.
(6) Chesler, S. N.; Cram, S. P. Anal. Chem. 1972, 44, 2240-2243.
(7) Goedert, M.; Guiochon, G. Anal. Chem. 1970, 42, 962-968.
(8) Anderson, D. J.; Walters, R. R. J. Chromatogr. Sci. 1984, 22, 353-359.
(9) Baumann, F.; Herlicska, E.; Brown, A. C. J. Chromatogr. Sci. 1969, 7, 680-684.
(10) Goedert, M.; Guiochon, G. Chromatographia 1973, 6, 39-45.
(11) Goedert, M.; Guiochon, G. J. Chromatogr. Sci. 1973, 11, 326-334.
(12) Li, B. 0.; Siu, S.; Evans, J. W. J. Chromatogr. Sci. 1987, 25, 281-285.

RECEIVED for review April 25, 1988. Resubmitted January 30, 1989. Accepted February 2, 1989. This work was supported in part by a grant from the National Science Foundation.