P(ART2 300) = … dx = 0.0141 [integrand and limits not recoverable; the fragments "800" and "1280000" belong to them]
and (0.00353)(0.0141) = 5 × 10⁻⁵. The corresponding observed value from Table IV is 2/990 ≈ 2 × 10⁻³, which is considerably larger but still not very significant, and is a fair check on the independence of the retention index and mass spectral matches.
ACKNOWLEDGMENT I am grateful to Bruce G. Buchanan for inviting me to the Stanford University Computer Science Department where this work was started. I am further grateful to Patricia Anderson for supplying the GCMS data, to Alan M. Duffield for selecting the "garbage" spectra, to Mark J. Stefik for the fast input and output subroutines and for the suggestion of using the experiment name and spectrum number of first occurrence as a temporary identification of a library record, and to Dennis H. Smith for the suggestion of preparing the historical library.
RECEIVED for review March 29, 1976. Accepted September 27, 1976.
AIDS FOR ANALYTICAL CHEMISTS
Empirical Approximations to Equations Based on the Error Function
S. H. Algie
Department of Mining and Metallurgical Engineering, University of Queensland, St. Lucia, Q. 4067, Australia
Buys and de Clerk (1) have drawn attention to the interest which lies in the construction of simple functional approximations to relationships based on the error function. As they have pointed out, this function describes peak shapes generated in a variety of instrumentation. It also enters the solution of differential equations in heat and mass transfer and is the basis of the area under the Normal curve used in the statistical analysis of experimental measurements. The inverse of the area under the Normal curve is also related to the error function. This has particular application in the simulation of random error propagation, of the type dealt with by Schwartz (2).
The present author has developed and used a series of empirical equations for relationships based on the error function. The first is for the error function itself and has some advantages over the form given by Buys and de Clerk. The second is an approximation to areas under the Normal curve. The third is the inverse of the second, by which standardized normal deviates may be obtained from areas under the Normal curve.
EMPIRICAL RELATIONSHIPS Each of the equations contains two empirical constants which have to be specified. Two aims have been followed in determining the values of these constants given below. The first is that the constants should be expressed in a small number of significant figures. The second is that, within the first restriction, the extreme positive and negative values of either error or relative error (as appropriate) should be small in absolute magnitude and approximately equal over the range considered. Fitting was done by trial and error, and in each case greater accuracy can be achieved if the range of the equation is reduced. Other criteria could have been used. For example, Buys and de Clerk (1) chose to reduce the total sum of squares of the deviations. This will be preferable for some applications.

Error Function. The error function, erf(x), is defined by:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp(-t^{2})\,dt \tag{1}$$

and the complementary error function is defined by:

$$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} \exp(-t^{2})\,dt \tag{2}$$
ANALYTICAL CHEMISTRY, VOL. 49, NO. 1, JANUARY 1977
Table I. Comparison of Approximations to erf(x) for Positive x
Eq. 6 is evaluated with C = -15, D = -7 and Eq. 5 with A = 2, B = 3. "Dev." is the deviation from erf(x), × 10³; "Rel." is the deviation relative to erfc(x).

x      erf(x), ref. 3   Eq. 6       Dev.      Rel.     Eq. 5       Dev.      Rel.
0      0                0           0         0        0           0         0
0.1    0.1124629        0.1082696   -4.1933   -0.005   0.1124960   0.0331    0.000
0.2    0.2227026        0.2170786   -5.6240   -0.007   0.2229485   0.2459    0.000
0.4    0.4283924        0.4256282   -2.7642   -0.005   0.4298502   1.4578    0.003
0.6    0.6038561        0.6063476   2.4915    0.006    0.6067680   2.9119    0.007
0.8    0.7421010        0.7477103   5.6093    0.022    0.7451573   3.0563    0.012
1.0    0.8427008        0.8482836   5.5828    0.035    0.8441789   1.4781    0.009
1.2    0.9103140        0.9140695   3.7555    0.042    0.9095286   -0.7854   -0.009
1.4    0.9522851        0.9540453   1.7602    0.037    0.9498079   -2.4772   -0.052
1.6    0.9763484        0.9767854   0.4370    0.018    0.9732738   -3.0746   -0.130
1.8    0.9890905        0.9889431   -0.1474   -0.014   0.9863086   -2.7819   -0.255
2.0    0.9953223        0.9950548   -0.2675   -0.057   0.9932475   -2.0748   -0.444
2.2    0.9981372        0.9979355   -0.2017   -0.108   0.9967952   -1.3420   -0.720
2.4    0.9993115        0.9992019   -0.1096   -0.159   0.9985378   -0.7737   -1.124
2.6    0.9997640        0.9997172   -0.0468   -0.198   0.9993596   -0.4044   -1.714
2.8    0.9999250        0.9999092   -0.0158   -0.211   0.9997311   -0.1939   -2.586
3.0    0.9999779        0.9999740   -0.0039   -0.177   0.9998919   -0.0860   -3.893
3.2    0.9999940        0.9999935   -0.0005   -0.089   0.9999584   -0.0356   -5.931
3.4    0.9999985        0.9999986   +0.0001   0.062    0.9999847   -0.0138   -9.187
3.6    0.9999996        0.9999997   +0.0001   0.289    0.9999946   -0.0050   -14.06
3.8    0.9999999        1.0000000   +0.0001   0.523    0.9999982   -0.0017   -22.26
4.0    1.0000000        1.0000000   0.0000    0.734    0.9999994   -0.0006   -35.91

Table II. Comparison of Approximations to Normal Functions for Positive u
Approximation to Φ(u): Eq. 9 with E = 13.2, F = -9; "Dev." is the deviation from Φ(u), × 10³; "Rel." is the deviation relative to 1 - Φ(u). Approximation to u: Eq. 10 with G = -9.4, H = 14; "Dev. from u" is not scaled.

u      Φ(u), ref. 3   Eq. 9       Dev.      Rel.     Eq. 10   Dev. from u
0      0.5            0.5         0         0        0        0
0.1    0.5398278      0.5370108   -2.8170   -0.006   0.1060   0.0060
0.2    0.5792597      0.5744425   -4.8172   -0.011   0.2099   0.0099
0.3    0.6179114      0.6118683   -6.0431   -0.016   0.3120   0.0120
0.4    0.6554217      0.6488421   -6.5796   -0.019   0.4127   0.0127
0.5    0.6914625      0.6849189   -6.5436   -0.021   0.5123   0.0123
0.6    0.7257469      0.7196764   -6.0705   -0.022   0.6109   0.0109
0.8    0.7881446      0.7837754   -4.3692   -0.021   0.8064   0.0064
1.0    0.8413447      0.8388911   -2.4536   -0.015   1.0009   0.0009
1.2    0.8849303      0.8839898   -0.9403   -0.008   1.1955   -0.0045
1.4    0.9192433      0.9192039   -0.0394   -0.000   1.3913   -0.0087
1.6    0.9452007      0.9455279   0.3272    0.006    1.5889   -0.0111
1.8    0.9640697      0.9644288   0.3591    0.010    1.7885   -0.0115
2.0    0.9772499      0.9774988   0.2489    0.011    1.9902   -0.0098
2.2    0.9860966      0.9862190   0.1224    0.009    2.1936   -0.0064
2.4    0.9918025      0.9918374   0.0349    0.004    2.3984   -0.0016
2.6    0.9953388      0.9953327   -0.0061   -0.001   2.6038   0.0038
2.8    0.9974449      0.9974300   -0.0149   -0.006   2.8092   0.0092
3.0    0.9986501      0.9986415   -0.0086   -0.006   3.0136   0.0136
3.2    0.9993129      0.9993132   0.0003    0.000    3.2165   0.0165
3.4    0.9996631      0.9996694   0.0063    0.019    3.4169   0.0169
3.6    0.9998409      0.9998493   0.0084    0.053    3.6143   0.0143
3.8    0.9999277      0.9999353   0.0076    0.106    3.8082   0.0082
4.0    0.9999683      0.9999741   0.0058    0.181    3.9975   -0.0025

Analytical approximations exist for small and large values of x:

$$\operatorname{erf}(x) \approx \frac{2x}{\sqrt{\pi}} \quad (\text{small } x) \tag{3}$$

$$\operatorname{erf}(x) \approx 1 - \frac{\exp(-x^{2})}{x\sqrt{\pi}} \quad (\text{large } x) \tag{4}$$

However, a relationship which can be used over a wide range of x is desirable. Buys and de Clerk (1) proposed a semiempirical approximation of the form:

$$\operatorname{erf}(x) \approx \left(1 + \frac{\pi \exp(-A x^{2}/B)}{4x^{2}}\right)^{-1/2} \tag{5}$$

This is capable of producing rather accurate values (although it does not return the correct negative sign for negative values of x), but this is at the expense of using up to eight figures to express the constants. However, they have shown that the values A = 2 and B = 3 produce a generally satisfactory approximation. Their simplified equation was incorrectly printed in the original paper, and the form given here is in accord with the subsequently corrected version. The present author's empirical approximation has the form:

$$\operatorname{erf}(x) \approx 1 - \frac{2}{1 + \exp\left(\dfrac{Cx}{|x| + D}\right)} \tag{6}$$

where C and D are constants.
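As a rough numerical check, the two closed forms for erf(x) can be evaluated side by side. The sketch below assumes that Equation 5 has the form (1 + π exp(-Ax²/B)/(4x²))^(-1/2) and that Equation 6 is equivalent to tanh(Cx/(2(|x| + D))), which is algebraically the same as 1 - 2/[1 + exp(Cx/(|x| + D))]. Both forms are assumptions chosen because they reproduce the Table I entries; the function names and the `compare` helper are illustrative only.

```python
import math

# x values listed in Table I
GRID = [0.0, 0.1, 0.2] + [round(0.2 * k, 1) for k in range(2, 21)]


def erf_eq5(x, a=2.0, b=3.0):
    """Assumed form of Eq. 5; it is even in x, so the sign of erf is lost."""
    if x == 0.0:
        return 0.0  # limiting value; the expression itself is 0/0 at x = 0
    return (1.0 + math.pi * math.exp(-a * x * x / b) / (4.0 * x * x)) ** -0.5


def erf_eq6(x, c=-15.0, d=-7.0):
    """Assumed form of Eq. 6, written with tanh; singular where |x| = -d."""
    return math.tanh(c * x / (2.0 * (abs(x) + d)))


def compare(xs):
    """Return (x, erf(x), Eq. 6 deviation, Eq. 5 deviation) for each x."""
    rows = []
    for x in xs:
        exact = math.erf(x)
        rows.append((x, exact, erf_eq6(x) - exact, erf_eq5(x) - exact))
    return rows
```

Calling `compare(GRID)` gives the absolute deviations that Table I lists (there scaled by 10³); dividing each deviation by `math.erfc(x)` gives the relative columns.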
Equation 6 is simpler than Equation 5 and returns the correct sign for erf(x). The values of the constants used by the author are C = -15 and D = -7. Estimates based on these values are compared with standard values (3) in Table I, which also shows estimates based on Equation 5 using the one-figure constants listed above. This table also shows the error relative to the complementary error function (based on ten-figure values). This is a stringent test and was used in the selection of the constants. On this basis the present equation is more accurate than any of the equations produced by Buys and de Clerk for listed values of x greater than about 1.6. The absolute error in the present equation is symmetrical about x = 0. The listed range of x from 0 to 4 would generally be considered adequate. The present equation clearly cannot be used for values of x approaching 7, but this is not a limitation in practical terms.

Areas under the Normal Curve. The area under the normal curve is defined by:

$$\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} \exp(-t^{2}/2)\,dt \tag{7}$$

where in statistical analysis u represents the standardized normal variable (y - μ)/σ corresponding to the value of y, and Φ(u) is the probability that a standardized normal variable selected at random from a population having a mean value μ and standard deviation σ will not exceed the value u. The value Φ(u) is directly related to the error function by:

$$\Phi(u) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{u}{\sqrt{2}}\right)\right] \tag{8}$$

The empirical approximation has the form:

$$\Phi(u) \approx \left[1 + \exp\left(\frac{Eu}{|u| + F}\right)\right]^{-1} \tag{9}$$
where E and F are constants. These constants are strictly related to C and D above, but in the light of the criteria discussed above the author preferred to evaluate them independently over the range of u from 0 to 4, which is the range of interest in statistical analysis. The values so determined are E = 13.2 and F = -9. Calculated values are shown in Table II together with standard values (3) for positive values of u. The relative error in the complementary value 1 - Φ(u) is also given. Absolute error is symmetrical about u = 0.

Inverse of Areas under the Normal Curve. By means of this function the value of the standardized normal variable u corresponding to a specified value of Φ(u) can be determined. The recommended empirical function has the form:

$$u \approx \frac{G \ln\left(\dfrac{1}{\Phi(u)} - 1\right)}{H + \left|\ln\left(\dfrac{1}{\Phi(u)} - 1\right)\right|} \tag{10}$$
where G and H are constants. In this case the actual error in the estimate is considered to be the most important test of acceptability. Values calculated using G = -9.4 and H = 14 are listed for comparison with standard values in Table II. Absolute error is symmetrical about u = 0. This function is of particular use in Monte Carlo simulation. A single sample of a normally distributed population is simulated by generating a random number between zero and unity. Equation 10 above is then evaluated, setting Φ(u) equal to the random number. The output value u represents the standardized normal deviation of that sample from the population mean.
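The Monte Carlo procedure just described can be sketched in a few lines. The sketch assumes that Equation 9 has the form 1/[1 + exp(Eu/(|u| + F))] and that Equation 10 has the form G·L/(H + |L|) with L = ln(1/Φ(u) - 1); both are assumptions chosen because they reproduce the Table II entries. The function names and the seeded generator are illustrative and not part of the original note.

```python
import math
import random


def phi_eq9(u, e=13.2, f=-9.0):
    """Assumed form of Eq. 9: approximate area under the Normal curve."""
    return 1.0 / (1.0 + math.exp(e * u / (abs(u) + f)))


def u_eq10(phi, g=-9.4, h=14.0):
    """Assumed form of Eq. 10: approximate deviate u for an area 0 < phi < 1."""
    t = math.log(1.0 / phi - 1.0)
    return g * t / (h + abs(t))


def sample_normal(n, seed=None):
    """Draw n approximate standard normal deviates by inverting Eq. 10."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        r = rng.random()      # uniform on [0, 1)
        if 0.0 < r < 1.0:     # skip r == 0, where 1/r is undefined
            out.append(u_eq10(r))
    return out
```

A sample with mean μ and standard deviation σ then follows as μ + σu. Under the assumed form the output is bounded by |u| < |G| = 9.4, so the extreme tails are truncated; the fitted range in Table II is u from 0 to 4.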
DISCUSSION Some uses of the functions under consideration have been mentioned. However, the particular utility of the simple equations given here may be seen if they are contrasted with existing relationships such as the rational approximation functions listed in reference (3). These can be very accurate but are rather complicated and require the use of constants expressed in a large number of figures. They are acceptable for use with large computers; however, they are not suitable for use with the small calculators which are now in common use and which have greatly facilitated the application of many procedures in mathematical analysis. Calculators familiar to the author have a rather limited number of storage locations for constants and, if programmable, a limited number of program steps. If calculations involving error function relationships as a part of the process are to be performed, it is desirable to minimize the capacity taken up by such functions so as to leave the largest possible capacity for other calculations. The number of program steps is generally in proportion to the number of mathematical operations involved in the evaluation of the function, and from this point of view the forms of equations presented here are well suited to these requirements. The few and simple constants which have been determined to specify the equations are also appropriate in this context. Each constant, if stored, takes up one storage location and in addition requires program steps for its recall and use. If entered as part of the program, each digit and decimal marker takes up a program step. Either way the constants given make small demands on the available capacity. The equations presented here are recommended for their general utility. However, the balance which they achieve between accuracy and functional complexity makes them particularly suitable for use with small calculators.
LITERATURE CITED
(1) T. S. Buys and K. de Clerk, Anal. Chem., 48, 585 (1976).
(2) L. M. Schwartz, Anal. Chem., 47, 963 (1975).
(3) M. Abramowitz and I. A. Stegun, National Bureau of Standards, Applied Mathematics Series, 55, Washington, 1964.
RECEIVED for review July 26, 1976. Accepted September 28, 1976.