
Anal. Chem. 1980, 52, 1141-1147

On Round-Off Error

Lowell M. Schwartz

Department of Chemistry, University of Massachusetts, Boston, Massachusetts 02125

The process of rounding-off or quantizing measured values to fewer significant figures distorts the information conveyed by the measurements. Statistical methods are used to show the effect of this distortion on the mean and variance of quantized values derived from a normal (Gaussian) parent population. Under certain circumstances it is possible to compensate statistically for round-off error.

The process of rounding-off a number to fewer digits is familiar to all. If the numbers being rounded are experimental measurements of some system and those measurements are to be treated statistically to gain quantitative information about the system, then the rounding process introduces distortions of the information gained. Although this assertion is obvious, an extreme example will illustrate the point and will also serve as an introduction to the quantitative treatment to follow. Suppose an antibiotic zone assay is done by measuring the diameter of a zone of activity with a digital zone reader device having a resolution of 0.5 mm, and the inherent mean and standard deviation of replicate zones are 14.70 and 0.01 mm, respectively. The zone reader is thus a rather crude device with respect to this particular measurement. Nevertheless, if, say, 100 or so replicate measurements are made, there is a good chance that the zone reader outputs for all 100 replications will be 14.5 mm, since the device, in effect, rounds off to the nearest 0.5 mm. If the analyst proceeds to calculate simple statistics from these measurements, he obtains a mean value of 14.5 mm and a standard deviation of zero. Thus the distortions due to round-off are -0.2 mm in the mean and -0.01 mm in the standard deviation with respect to the true properties of the zones. If, on the other hand, a hypothetical zone reader having 0.001-mm resolution had been used, he could expect the same experiment to have yielded a mean and standard deviation quite close to 14.70 and 0.01 mm, respectively.

It is the purpose of this paper to use statistical methods to show how these round-off errors depend on the round-off level, i.e., on the crudity of resolution with respect to the system being measured. The formulas presented will also be useful as a means of compensating for round-off errors in certain circumstances. Some discussion of this topic has appeared previously in this Journal (1, 2), but those papers focused on round-off effects on signals having a mean value of zero, whereas we will treat the more general case of a nonzero mean. However, we will adopt the same terminology as much as possible. It is explained in Reference 1 that digitization refers to the operation of converting a continuous variable having inherently infinitely many significant figures, an "analog" variable, to a value expressed by a finite number of significant figures, a "digital" variable. In this paper, the analog variable and its properties will be denoted by Greek letters: η for the variable, μ for its mean value, and σ for its standard deviation. Digitization involves both sampling from the continuum and quantization, the latter process commonly understood as round-off. We will deal exclusively with quantization and in particular with the two usual practices which, henceforth, will be called rounding and truncation: If we quantize any analog value falling between the cell boundaries η_0 - q/2 and η_0 + q/2 to the digital value y_0 = η_0, this is rounding. But if we quantize analog values between η_0 and η_0 + q to the digital value y_0 = η_0, this is truncation. In both options, the quantization levels are equally spaced by the cell width q, i.e., the digital variable can assume only the quantization levels y_0, y_0 + q, y_0 + 2q, .... Hence, rounding is quantization to the nearest level and truncation is quantization to the next lower level. The term round-off error that appears in the title is called quantization noise in Reference 1 but will be called quantization error here, and this is understood to refer to the deviation of y from η.
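To make the zone-reader example concrete, the following Python sketch (not part of the original paper; the helper names are ours and the numerical values are those of the zone-assay example) applies the rounding and truncation rules just defined to simulated analog readings and shows the mean collapsing to 14.5 mm with zero standard deviation.

```python
# Illustrative sketch only: the rounding and truncation quantizers defined above,
# applied to simulated zone-reader data (mu = 14.70 mm, sigma = 0.01 mm, q = 0.5 mm).
import numpy as np

rng = np.random.default_rng(0)

def quantize_round(x, q):
    """Rounding: quantize to the nearest level (nearest multiple of q)."""
    return q * np.round(x / q)

def quantize_truncate(x, q):
    """Truncation: quantize to the next lower level (largest multiple of q <= x)."""
    return q * np.floor(x / q)

mu, sigma, q, n = 14.70, 0.01, 0.5, 100
eta = rng.normal(mu, sigma, n)      # "analog" zone diameters
y = quantize_round(eta, q)          # what the digital reader reports

print(eta.mean(), eta.std(ddof=1))  # close to 14.70 and 0.01
print(y.mean(), y.std(ddof=1))      # 14.5 and 0.0 -- the round-off distortion
```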

NORMALLY DISTRIBUTED ANALOG VARIABLE

Fortunately, most physical and chemical quantitative variables are characterized by the normal (Gaussian) probability distribution and, hence, can be treated by the extensive statistical theoretical developments available for this distribution. For example, any statistical treatment utilizing a t or χ² or F table assumes normally distributed data. Yet, if the data at hand are quantized in the sense described above, they are no longer normally distributed because the quantization error is not normally distributed. A statistical theory of quantization as developed by electrical engineers for signal analysis exists and is summarized by Widrow (3). This theory will be presented in detail here, both because Reference 3 is not readily available to chemists and because that reference does not give the exact formulation which is necessary for a full understanding of the results. We hypothesize the existence of a normally distributed analog variable η with mean μ and variance σ², so that the probability distribution function (pdf) is

N(η) = [1/(σ√(2π))] exp[-(η - μ)²/(2σ²)]    (1)

The experimenter is interested in making estimates of the population parameters μ and σ, which are unknown. If random values from this population are taken and then quantized, the results are a new random variable y which has a pdf to be denoted as Q(y). The relationship between the function N(η) and Q(y) is given in both References 1 and 3, where it is made quite clear that N(η) is a continuous distribution but Q(y) is discrete since the variable y can assume only discrete values. For the option of rounding, the probability Q(y_k) of observing any particular y_k equals the probability of observing the analog variable between l_k = y_k - q/2 and u_k = y_k + q/2, and this probability is the area under the curve N(η) vs. η between those limits. For the truncation option, the limits are shifted to l_k = y_k and u_k = y_k + q. These areas can be expressed as integrals and so the pdf is

Q(y_k) = ∫ from l_k to u_k of N(η) dη    (2)

where u_k and l_k are the appropriate upper and lower limits, respectively. Typically the experimenter will make several (n) measurements y_k and calculate a mean ȳ = Σ_k y_k / n from these. This ȳ is itself a random variable because if several replicate


sets of n data values are taken to generate several mean values, these will scatter randomly. If the data were not quantized and so were samples of N(η), it is well known (4) that as n approaches infinity the mean approaches μ and the variance of the mean approaches σ²/n. The value approached by a statistic for infinitely many samples is called the expected value and will be noted here in the conventional way, i.e., the expected mean is E(η̄) = μ and the expected variance of the mean is E(var η̄) = σ²/n. For a finite size sample the mean will deviate from μ and this deviation is called sampling error. The mean of a finite set of y will also deviate from μ, but this deviation in general is due to both sampling error and quantization error.
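As an illustration of the discrete pdf Q(y_k), the following sketch (our own helper, not the paper's notation; scipy is assumed) evaluates the cell probabilities of Equation 2 with the normal cdf and sums them directly to obtain the expected mean and variance of the quantized variable. For the zone-assay numbers it reproduces the -0.2 mm bias and the vanishing variance.

```python
# Sketch: Q(y_k) from Equation 2 via the normal cdf, and the expected mean and
# variance of the quantized variable obtained by summing over the cells.
import numpy as np
from scipy.stats import norm

def quantized_pdf(mu, sigma, q, mode="round", kmax=2000):
    """Return quantization levels y_k and their probabilities Q(y_k)."""
    y = np.arange(-kmax, kmax + 1) * q           # levels 0, +/-q, +/-2q, ...
    if mode == "round":                          # cell (y_k - q/2, y_k + q/2)
        lo, hi = y - q / 2, y + q / 2
    else:                                        # truncation: cell (y_k, y_k + q)
        lo, hi = y, y + q
    return y, norm.cdf(hi, mu, sigma) - norm.cdf(lo, mu, sigma)

mu, sigma, q = 14.70, 0.01, 0.5
y, Q = quantized_pdf(mu, sigma, q, "round")
Ey = np.sum(y * Q)                               # expected quantized mean
Vy = np.sum((y - Ey) ** 2 * Q)                   # expected quantized variance
print(Ey - mu, Vy)                               # about -0.20 and essentially 0
```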

THE CHARACTERISTIC FUNCTION

The expected mean of a pdf is the same as the first moment about zero and the expected variance is the second moment about the first moment (4, Chapter 3). To find the effect of quantization on means and variances we are thus seeking to derive expressions for the first and second moments of Q(y) and to show how these moments relate to μ, σ², and q. Widrow's (3) approach is to use the characteristic function (cf) for this purpose and, although his cf is identical to the Fourier transform, we will adopt the practice in Kendall and Stuart (4) of defining the cf of the random variable z having pdf f(z) as

g_z(t) = ∫ exp(itz) f(z) dz        (z continuous)
g_z(t) = Σ_n exp(itz_n) f(z_n)     (z_n discrete)    (3)

which differs from Widrow's usage only by the sign of the imaginary exponent itz, where i² = -1. The cf leads to the rth moment M_r of f(z) about zero by differentiation with respect to the transform variable t and evaluation at t = 0:

M_r = i^(-r) [d^r g_z(t)/dt^r] evaluated at t = 0    (4)
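A short symbolic check of Equation 4 (our addition, assuming sympy is available): applying it to the cf of a normally distributed variable returns the familiar moments μ and μ² + σ².

```python
# Verification sketch for Equation 4 (not from the paper): M_r = i**(-r) d^r g/dt^r
# at t = 0, applied to the cf of a normal variable, gives M_1 = mu and
# M_2 = mu**2 + sigma**2, so that M_2 - M_1**2 = sigma**2.
import sympy as sp

t = sp.symbols("t", real=True)
mu, sigma = sp.symbols("mu sigma", real=True, positive=True)

g = sp.exp(sp.I * mu * t - sigma**2 * t**2 / 2)   # cf of a normal variable (step A below)

def moment(g, r):
    return sp.simplify(sp.I**(-r) * sp.diff(g, t, r).subs(t, 0))

M1, M2 = moment(g, 1), moment(g, 2)
print(M1, sp.expand(M2), sp.simplify(M2 - M1**2))  # mu, mu**2 + sigma**2, sigma**2
```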

The random variable whose moments we require is ȳ, and to set up the appropriate cf we will proceed in stages using properties of characteristic functions and Fourier integrals to be found in References 3, 4, and 5. (A) The cf of a normally distributed variable such as η with pdf N(η) is

G_N(t) = exp(iμt - σ²t²/2)

(3; or 4, pages 62 and 277; or 5, pair 729.1).

(B) Three general properties of characteristic functions are: (1) If g_1 is the cf of f_1 and g_2 is the cf of f_2, then g_1 - g_2 is the cf of f_1 - f_2 (5, pair 201). (2) If g(t) is the cf of f(z) and z is displaced by the fixed value z_0 so that the pdf is f(z - z_0), the cf of f(z - z_0) is g(t) exp(iz_0 t) (5, pair 206). (3) If g(t) is the cf of f(z), then the cf of the cumulative integral of f(z) from -∞ to z is -g(t)/(it) (5, pair 211). With these properties in mind, consider for the moment a version of y of Equation 2 (rounding option) which is continuous rather than discrete, so that u = y + q/2 and l = y - q/2 are continuously varying limits as well. Note that the integral of Equation 2 can be written as the difference

∫ from -∞ to y + q/2 of N(η) dη  -  ∫ from -∞ to y - q/2 of N(η) dη

Consequently, the cf of the continuous version of Equation 2 is

[exp(iqt/2) - exp(-iqt/2)] G_N(t)/(it) = (2/t) sin(qt/2) G_N(t)

The latter form results from using the Euler equation exp(iz) = cos z + i sin z. (C) If g(t) is the cf of the continuous pdf f(z) and a discrete pdf is derived from this by the operation of taking values of f(z) at a uniform spacing z = y_k, the cf of the resulting discrete pdf is (3)

(1/q) Σ from l = -∞ to ∞ of g(t + lφ)

Consequently, the cf of y_k, having pdf Q(y_k) as given by Equation 2, is

G_y(t) = (1/q) Σ from l = -∞ to ∞ of [exp(iq(t + lφ)/2) - exp(-iq(t + lφ)/2)] G_N(t + lφ)/[i(t + lφ)]

where, following Widrow (3), we let φ = 2π/q. (D) We are seeking the mean ȳ rather than an individual y_k and thus will require two additional general properties. (1) If g_1 is the cf of z_1 having pdf f(z_1) and g_2 is the cf of z_2 having the same pdf f(z_2) but z_1 and z_2 are statistically independent, the cf of z_1 + z_2 is the product g_1 g_2 (4, Chapter 4.11). (2) If g(t) is the cf of z having pdf f(z), the cf of z/n is g(t/n), where n is a constant (4, Chapter 4.11; or 5, pair 205). Calculating the mean ȳ involves the two operations of summation over n independent measurements y_k all having the same pdf and then dividing that sum by the constant n. The corresponding operations on the cf of y_k are thus (1) taking an n-fold product and (2) dividing the transform variable t by n wherever it appears. The result is G_ȳ(t), the cf of ȳ:

G_ȳ(t) = [G_y(t/n)]^n    (5)

which is the final cf we are seeking, but it is specifically for the rounding option which was built in at step B above. If the operations indicated in Equation 4 with r = 1 are applied to the cf of Equation 5, the first moment M_1 is found after lengthy but straightforward manipulations:

E(ȳ_R) = μ + (q/π) Σ from k = 1 to ∞ of [(-1)^k / k] X sin(2πkμ/q)    (6)

Here the subscript R denotes that the y_k are quantized with the rounding option, and the exponential function X = exp(-2k²π²σ²/q²) is defined for economy of notation. With r = 2, the result M_2 is obtained, but this is the second moment about zero. The variance we require is the second moment about M_1, and this is given by the relationship (4, Chapter 3)

E(var ȳ_R) = M_2 - M_1²


and following further manipulations we obtain

E(var ȳ_R) = (1/n) { σ² + q²/12 + Σ from k = 1 to ∞ of (-1)^k X [4σ² + q²/(π²k²)] cos(2πkμ/q) - (q²/π²) [Σ from k = 1 to ∞ of ((-1)^k / k) X sin(2πkμ/q)]² }    (7)

where the notation Σ² indicates that the entire sum is squared. To derive equations comparable to 6 and 7 but for the truncation option, the required modification at step B above is to replace the integration limits specified for rounding with u = y + q and l = y, which are appropriate for truncation. This changes the continuous cf at step B to

[1 - exp(-iqt)] G_N(t)/(it)

It is then convenient to factor out exp(-iqt/2) from the square-bracketed quantity to yield the equivalent expression

[exp(iqt/2) - exp(-iqt/2)] exp(-iqt/2) G_N(t)/(it)

which also can be written

[exp(iqt/2) - exp(-iqt/2)] G'_N(t)/(it)

where G'_N(t) = exp[-(t²σ²/2) + it(μ - q/2)]. Thus truncation differs from rounding only to the extent that G'_N(t) differs from G_N(t), and this difference is simply whether μ or μ - q/2 multiplies it in the exponent. Replacing μ in Equations 6 and 7 by μ - q/2, we obtain the corresponding equations for truncation

E(ȳ_T) = μ - q/2 + (q/π) Σ from k = 1 to ∞ of [(-1)^k / k] X sin[2πk(μ - q/2)/q]    (8)

and

E(var ȳ_T) = (1/n) { σ² + q²/12 + Σ from k = 1 to ∞ of (-1)^k X [4σ² + q²/(π²k²)] cos[2πk(μ - q/2)/q] - (q²/π²) [Σ from k = 1 to ∞ of ((-1)^k / k) X sin(2πk(μ - q/2)/q)]² }    (9)
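Because the series expressions above are reconstructions from a badly garbled original, a brute-force check is worthwhile. The sketch below (our own code and parameter choices, not the paper's) quantizes a large simulated normal sample by rounding and by truncation and compares the sample mean and variance with Equations 6-9 evaluated for n = 1.

```python
# Monte Carlo check of Equations 6-9 (as reconstructed above) for n = 1.
# series_mean_var with shift = 0 gives the rounding forms (Equations 6 and 7);
# shift = q/2 gives truncation (Equations 8 and 9, i.e., mu replaced by mu - q/2).
import numpy as np

rng = np.random.default_rng(1)

def series_mean_var(mu, sigma, q, shift=0.0, kmax=60):
    m = mu - shift
    k = np.arange(1, kmax + 1)
    X = np.exp(-2 * (np.pi * k * sigma / q) ** 2)   # X = exp(-2 k^2 pi^2 sigma^2 / q^2)
    sgn = (-1.0) ** k
    S = np.sum(sgn / k * X * np.sin(2 * np.pi * k * m / q))
    C = np.sum(sgn * X * (4 * sigma**2 + (q / (np.pi * k)) ** 2)
               * np.cos(2 * np.pi * k * m / q))
    Ey = m + (q / np.pi) * S
    Vy = sigma**2 + q**2 / 12 + C - (q * S / np.pi) ** 2
    return Ey, Vy

mu, sigma, q, N = 2.13, 0.40, 1.0, 2_000_000
eta = rng.normal(mu, sigma, N)
for name, y, shift in (("rounding", q * np.round(eta / q), 0.0),
                       ("truncation", q * np.floor(eta / q), q / 2)):
    Ey, Vy = series_mean_var(mu, sigma, q, shift)
    print(f"{name:10s}  sample {y.mean():.4f} {y.var():.4f}   series {Ey:.4f} {Vy:.4f}")
```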

LIMITING CASE BEHAVIOR OF EXPECTED MEANS AND VARIANCES

The quantization error in the mean ȳ is the difference between Equation 6 or 8 and the mean μ of the analog variable. We note that E(ȳ_R), like E(ȳ_T), is independent of n, the number of samples summed into the mean. This indicates that the quantization errors E(ȳ_R) - μ or E(ȳ_T) - μ are not random errors in the sense that they do not decrease to zero even if an unlimited number n is used. On the other hand, the quantization errors in the variance of ȳ are the difference between Equation 7 or 9 and E(var η̄) = σ²/n. These errors are inversely proportional to n, thus vanishing for unlimited n. The summations in Equations 6-9 are all dominated by the exponential factor X, which is a strong function of the ratio σ²/q². When the cell width q is much smaller than the spread σ of the intrinsic analog variable, X approaches zero as a limit

and all the summations approach zero. As this limit is approached, the quantization error for ȳ_R approaches zero faster than for ȳ_T because the operation of truncating all η between y_k and y_k + q down to y_k introduces an average bias of -q/2, while the rounding process introduces no such average bias. As q/σ approaches zero, the variance of the mean approaches (σ² + q²/12)/n regardless of the round-off option used. The quantization variance error q²/12, which adds to the inherent variance σ², is that of the rectangular or uniform or "flat-topped" distribution function. Figure 1 illustrates the source of this variance for the case of a single measurement mean, i.e., for n = 1. If q is small enough, the interval between quantization levels subtends a linear segment of the N(η) vs. η pdf curve, and such a single segment is illustrated by the sloping line in either Figure 1a or 1b. The dashed vertical line is the constant value y_k assumed by the quantized number within the range of one cell, and both rounding (Figures 1a, 1c) and truncation (Figures 1b, 1d) are shown. Recalling that the expected value is the average over an unlimited number of replications, any one such measurement has a quantization error equal to the difference between the analog variable η and the corresponding quantized value y_k. This error is represented by one of the horizontal arrows pointing to the N(η) sloping line. In the long run, after sufficiently many replications are represented by arrows in this manner, the triangular areas in Figures 1a and 1b become uniformly filled with arrows as shown. Now consider the distribution of arrow lengths. For rounding, these lengths vary uniformly from -q/2 to +q/2, and for truncation they vary uniformly from zero to q, as sketched in Figures 1c and 1d, respectively. These are sketches of probability distribution functions of quantization error for a single quantization cell, but it is clear that any and all cells yield the same uniform pdf. Therefore, the pdf for all quantization errors has the same rectangular form, and such a pdf of width q has a variance q²/12. (See for example Reference 4, p 54.)

Figure 1. A linear segment of N(η) vs. η subtended by one quantization cell of width q as q/σ → 0, for rounding (Figure 1a) and for truncation (Figure 1b). The dashed vertical lines are the quantized value y_k and the horizontal arrows pointing to N(η) are quantization errors. These errors are uniformly distributed as sketched in Figures 1c and 1d.

The other extreme case to be examined is when q is much larger than σ, so that the ratio σ/q approaches zero and the exponential function X approaches unity. An experimenter is likely to avoid such severe round-off, but it is interesting


to see how Equations 6-9 predict the anticipated outcome. Figures 2a and 2b are sketches of single cells with σ/q → 0. For rounding (Figure 2a), all η between y_k - q/2 and y_k + q/2 are measured as y_k. If N(η) is imagined to have infinitesimal width and if μ is located between, but not on, a cell boundary, all measurements will be y_k and so the mean ȳ_R = y_k. The quantization bias is y_k - μ, and this result was mentioned in connection with the antibiotic zone example in the introductory paragraph. The bias is zero if μ happens to coincide with y_k but otherwise increases in proportion to the separation between μ and the center of the cell. When μ happens to coincide exactly with the cell boundary, say at y_k + q/2, half the measurements are recorded as y_k and half as y_k + q because half the η values fall into the adjacent cell. Thus the mean ȳ_R is y_k + q/2 and the quantization bias is zero at this and all cell boundaries. The line labeled q/σ = ∞ in Figure 3a illustrates this limiting behavior.

Figure 2. Quantization of analog values η with pdf N(η) to y_k values when q >> σ. For rounding (Figure 2a) y_k is midway between the cell boundaries. For truncation (Figure 2b) y_k is on the lower boundary.

Different behavior is anticipated for truncation, which is sketched in Figure 2b. Since all η between cell boundaries are recorded as y_k, the bias again is y_k - μ. However, when μ coincides with the boundary at y_k + q, half the measurements are y_k and half are y_k + q in the adjacent cell. The mean is then y_k + q/2, the bias is -q/2, and we see that all positions of μ yield nonzero biases. To understand the variance behavior as q/σ → ∞, again consider the variance of a single measurement mean, i.e., n = 1. When μ falls between cell boundaries, all measurements yield the same y_k and so var ȳ = var y_k = 0, as mentioned in the introductory paragraph. But if μ falls exactly on a boundary, the measurements are divided in half into two groups separated by q. The mean of the two groups is half-way between the boundaries and so the variance about the mean is (q/2)². This variance behavior is the same for both round-off options.

Having anticipated limiting quantization behavior as q/σ → ∞, we now show that Equations 6-9 make the same predictions. When σ² is set to zero and n to 1, these equations take the forms

E(ȳ_R) = μ + (q/π) S_R    (10)

E(ȳ_T) = μ - q/2 + (q/π) S_T    (11)

E(var y_R) = q²/12 + (q²/π²) C_R - (q²/π²) S_R²    (12)

E(var y_T) = q²/12 + (q²/π²) C_T - (q²/π²) S_T²    (13)

where the infinite sums with X = 1 are

S_R = Σ from k = 1 to ∞ of [(-1)^k / k] sin(2πkμ/q)    and    C_R = Σ from k = 1 to ∞ of [(-1)^k / k²] cos(2πkμ/q)

and S_T and C_T are the same as S_R and C_R, respectively, but with μ replaced by μ - q/2. First we examine the behavior of these equations when μ falls directly on a cell boundary. For the rounding option, this occurs when, say, μ = q/2, so that sin(2πkμ/q) = 0 for all k and cos(2πkμ/q) = (-1)^k. Thus S_R = 0 and C_R = Σ (1/k²) = π²/6, which sum is well known (6). Equation 10, therefore, predicts E(ȳ_R) = μ, i.e., zero bias, while Equation 12 predicts E(var y_R) = q²/4 as anticipated. For the truncation option, a cell boundary is at μ = q, so that the sine in S_T is zero, the cosine in C_T is (-1)^k, and again S_T = 0 and C_T = π²/6. Equation 11 yields E(ȳ_T) = μ - q/2 with a bias of -q/2 and Equation 13 yields E(var y_T) = q²/4. The key to the behavior of these equations when μ falls between cell boundaries is the recognition that infinite sine and cosine sums are related to Fourier series representations of algebraic functions, and the most conveniently located tabulation for chemists is the "Handbook of Chemistry and Physics" (6). The relevant series are

Σ from k = 1 to ∞ of [(-1)^k / k] sin(kπz/q) = -πz/(2q)        (-q < z < q)

and

Σ from k = 1 to ∞ of [(-1)^k / k²] cos(kπz/q) = (π²/12)(3z²/q² - 1)        (-q ≤ z ≤ q)

Figure 3. Relative expected bias of the mean for rounding (Figure 3a, Equation 15) and for truncation (Figure 3b, Equation 16) plotted as functions of μ/q for various values of q/σ ranging from 2 to ∞.
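As a numerical spot check of the limiting forms (our reconstruction), the sketch below evaluates Equations 10 and 12 with X = 1 at a few values of μ/q; inside a cell the bias tends to -μ and the variance to zero, while on the boundary they tend to 0 and q²/4, consistent with the discussion above.

```python
# Sketch: Equations 10 and 12 (as reconstructed, X = 1, rounding option) evaluated
# numerically from the sums S_R and C_R truncated at a large k.
import numpy as np

def limiting_bias_var(m_over_q, kmax=20000):
    k = np.arange(1, kmax + 1)
    sgn = (-1.0) ** k
    ang = 2 * np.pi * m_over_q * k
    S = np.sum(sgn / k * np.sin(ang))                 # S_R
    C = np.sum(sgn / k**2 * np.cos(ang))              # C_R
    bias = S / np.pi                                  # [E(y_R) - mu] / q
    var = 1 / 12 + C / np.pi**2 - (S / np.pi) ** 2    # E(var y_R) / q^2
    return bias, var

for m in (0.0, 0.25, 0.4, 0.5):
    b, v = limiting_bias_var(m)
    print(f"mu/q = {m:4.2f}   bias/q = {b:+.4f}   var/q^2 = {v:.4f}")
# inside a cell the bias is -mu/q and the variance ~0; on the boundary (0.5): 0 and 0.25
```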

For values of q/σ greater than about 2, the sinusoidal character of the biases becomes evident, and as q/σ → ∞ these biases become proportional to μ between the cell boundaries, as described in the previous section. The effect of quantization on the variance is best illustrated by also writing the expected variance on a nondimensional basis:

V_R = σ²/q² + 1/12 + Σ from k = 1 to ∞ of (-1)^k X [4σ²/q² + 1/(π²k²)] cos(2πkμ/q) - (1/π²) [Σ from k = 1 to ∞ of ((-1)^k / k) X sin(2πkμ/q)]²

which is obtained by dividing Equation 7 by q² with n = 1. This function is plotted in Figure 4 where, as in Figure 3, each curve represents some particular value of q/σ. The effect of increasing q/σ is best perceived by regarding q as fixed and σ as decreasing relative to q. For large σ values, the first term σ²/q² dominates. Then as σ decreases, this term decreases relative to the constant term 1/12, and as σ decreases further, both summation terms subtract variance from the first terms. The function V_R, therefore, decreases steadily to zero everywhere except directly on the cell boundary at μ/q = 1/2 where, as explained in the previous section, V_R must approach 1/4. The picture for truncation is similar to V_R but with the μ/q axis shifted by 1/2 unit in either direction.

An experimenter, having measured a mean ȳ_R or ȳ_T and a sample variance s², would like to compensate for quantization error to make the mean a better estimate of μ and the variance a better estimate of σ². In principle this can be done by using pairs of Equations 6 and 7 or 8 and 9, but in practice the procedure works well only if q/σ is not too large and if the sample size is not too small. If the sample size is large enough, ȳ_R is a precise estimate of E(ȳ_R) of Equation 6 and s² is a precise estimate of E(var ȳ_R) of Equation 7 with n = 1. Equating ȳ_R to the right-hand side of Equation 6 and s² to the right-hand side of Equation 7 with n = 1 yields two equations from which μ and σ are solved as μ̂ and σ̂. These values are thus improved estimates of the analog parameters μ and σ in the sense that quantization errors are compensated. However, if q/σ is too large, the numerical solution for μ̂ and σ̂ tends to diverge, and if the sample size is not large enough, random sampling error in the mean may yield an observed


ȳ_R which is a poor estimate of E(ȳ_R), and similarly an observed s² which is a poor estimate of E(var ȳ_R). Thus, the quantization corrections themselves are random variables and are subject to sampling error. Equations 6 and 7 represent simultaneous nonlinear equations for the unknowns μ and σ and so may be solved by any of several numerical algorithms. We have chosen a generalized version of the well-known Newton-Raphson iteration (7) to make a preliminary investigation of this compensation procedure for the rounding option. Values μ̂ and σ̂² are sought which make the functions ψ_1 and ψ_2 vanish simultaneously. These functions are

ψ_1 = ȳ_R - E(ȳ_R)

from Equation 6 and

ψ_2 = s² - E(var ȳ_R)

from Equation 7 with n = 1. Initial guesses μ_1 and σ_1² are required, and these are used to calculate corrections Δμ and Δσ² which yield refined values μ_2 = μ_1 + Δμ and σ_2² = σ_1² + Δσ². The successive refinements hopefully converge to the solution μ̂, σ̂², which makes both ψ_1 and ψ_2 be zero. The corrections are calculated from the simultaneous linear equations

(∂ψ_1/∂μ) Δμ + (∂ψ_1/∂σ²) Δσ² = -ψ_1

(∂ψ_2/∂μ) Δμ + (∂ψ_2/∂σ²) Δσ² = -ψ_2
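To show how such a compensation might be carried out in practice, here is a sketch (our own code; scipy's fsolve stands in for the paper's generalized Newton-Raphson step, and all function names and test values are illustrative) that drives ψ_1 and ψ_2 to zero using the series expressions for E(ȳ_R) and E(var ȳ_R) reconstructed above.

```python
# Compensation sketch for the rounding option: solve psi1 = psi2 = 0 for
# (mu_hat, sigma_hat^2), with E(y_R) and E(var y_R) taken from the series
# expressions reconstructed above (n = 1). fsolve replaces the paper's
# hand-coded Newton-Raphson iteration.
import numpy as np
from scipy.optimize import fsolve

def expected_mean_var(mu, sigma, q, kmax=60):
    k = np.arange(1, kmax + 1)
    X = np.exp(-2 * (np.pi * k * sigma / q) ** 2)
    sgn = (-1.0) ** k
    S = np.sum(sgn / k * X * np.sin(2 * np.pi * k * mu / q))
    C = np.sum(sgn * X * (4 * sigma**2 + (q / (np.pi * k)) ** 2)
               * np.cos(2 * np.pi * k * mu / q))
    return mu + (q / np.pi) * S, sigma**2 + q**2 / 12 + C - (q * S / np.pi) ** 2

def compensate(ybar, s2, q):
    """Return (mu_hat, sigma_hat) that make psi1 and psi2 vanish."""
    def psi(p):
        mu, var = p
        Ey, Vy = expected_mean_var(mu, np.sqrt(max(var, 1e-12)), q)
        return [ybar - Ey, s2 - Vy]
    mu0, var0 = ybar, max(s2 - q**2 / 12, 1e-4 * q**2)   # crude initial guesses
    mu_hat, var_hat = fsolve(psi, [mu0, var0])
    return mu_hat, np.sqrt(var_hat)

# trial on simulated rounded data
rng = np.random.default_rng(2)
mu_true, sigma_true, q, n = 2.13, 0.40, 1.0, 20000
y = q * np.round(rng.normal(mu_true, sigma_true, n) / q)
mu_hat, sigma_hat = compensate(y.mean(), y.var(ddof=1), q)
print("raw quantized:", round(y.mean(), 4), round(y.std(ddof=1), 4))
print("compensated  :", round(mu_hat, 4), round(sigma_hat, 4))   # closer to 2.13, 0.40
```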

Table I. Examples of Compensation Calculations (n = 100, 500, 1000, 10000)

and σ_1² =

1 - 4X cos(2πȳ_R/q), where X = exp(-2π²s²/q²). (3) The infinite sums in the functions ψ_1, ψ_2, and their derivatives were all calculated to the same highest k, which was sufficiently large that the kth term was