chromium, and nickel were not observed a t the 1000-ppm level. Further developments in photodiode array technology should help improve this situation. In addition, steps such as cooling the array to reduce dark current and allow longer integration times, averaging of multiple scans, and subtraction of fixed pattern noise should also prove beneficial. An analytical curve for the atomic absorption of calcium (4227-A line) in an 02-H2 flame is shown in Figure 5B. The hollow cathode lamp was imaged (1:l) onto the entrance slit of the monochromator using a glass lens (f = 10 cm) with the flame located between the lens and entrance slit. The resulting signal from the calcium 4227-A line was easily detected with the photodiode array. As for the emission studies, each data point is the difference between the averages of six measurements of the sample plus background and background with water aspirating for the background measurements. The clock rate was 30 KHz and the integration time was 610 msec. The per cent average deviation from the mean for the 20-ppm datum point was 3%. The multichannel nature of the photodiode array makes it particularly well suited to multielement flame emission or atomic absorption spectrometry. I t is necessary only that the analytical lines of the desired elements be within the spectral region covered by the photodiode array. Timing and measurement circuits similar to those presented in Figure 1B could be used to gate out and subtract or ratio the output signal a t several positions along the photodiode array. The photodiode array detector should also be quite useful for time resolved spectrometry as it can be repetitively scanned a t relatively rapid rates. At the clock rates typically used in this study (30 KHz), the spectral region cov-
ered by the photodiode array can be observed every 8.6 msec. However, it must be remembered that the observed signal level represents the integral of the light intensity falling on the photodiode array over the time interval since the array was last scanned and is, therefore, not the instantaneous value of the signal a t the time of readout. At the maximum specified clock rate of 10 MHz, repetitive scans could be made in a minimum time of 25.6 psec. However with such short integration times, only very strong signals could be observed. Thus these self-scanning linear silicon photodiode arrays provide a unique and versatile detector system for spectrometric measurements. Their use, of course, is not confined to line spectra but spectrometric measurements over the UV-VIS and near-IR regions can be carried out using these detectors. Future technical developments are sure to greatly expand and improve the capabilities presented here. Already Reticon has announced (21) an array consisting of 512 diodes with similar characteristics to those of the array used in this study. In addition, other integrated circuit technologies for imaging purposes are rapidly being developed. The more prominent of these include charge coupled devices (22, 23) and bucket brigade devices ( 2 4 ) . Received for review September 7 , 1972. Accepted February 5, 1973. Financial support by the University of Alberta and the National Research Council of Canada is gratefully acknowledged. (21) Electronics, 45 (15), 119 (1972). (22) G. F. Ameiio, W. J. Bertram, Jr., and M . F. Tompsett, / E € € Trans. Electron Devices, E D - I S , 986 (1971). (23) Eiectronics, 45 ( 7 ) , 29 (1972). (24) Laurence Altman, E l e c t r o n m . 45 ( 5 ) . 62 (1972).
Accurate Determination of Absorption Line Frequencies Using Simple Least Squares Convolution Techniques Lloyd J. Johnson and Marlin D. Harmony’ Department of Chemistry, The University of Kansas. Lawrence, Kansas 66044
The convolution curve-fitting procedures of Savitzky and Golay have been extended to permit more accurate determination of the frequencies of spectral absorption lines. This has been accomplished by considering the inherent error arising in cases of imperfect functional fit. The origin of this error has been illustrated for a very simple (quartic) functional form, and mathematical expressions for a correction factor have been given for the Lorentzian and Gaussian functions. Microwave spectral measurements of carbonyl sulfide have been used to show the utility of the method for experimental data conforming to the Lorentzian function. The computational techniques described here are sufficiently simple to permit their use in conjunction with small laboratory digital computers. 1494
ANALYTICAL CHEMISTRY, VOL. 45, NO. 8, JULY 1973
In recent years, the small digital computer has become an integral part of many research laboratories in chemistry and other disciplines (1). While the typical characteristics of these machines (small memory and word length, lack of hardware floating point arithmetic, etc.) dictate against their use for sophisticated, large-scale computations, they are extremely valuable for numerous data collection and equipment control functions (2-5). In addiA u t h o r t o whom r e p r i n t requests should b e addressed. (1) S.P. Perone,Ana/. Chem.. 43, 1288 (1971) (2) D. M. Hannon, D. E. Horne, and K. L. Foster, / E M J. Res. Develop. 13, 79 (1969). (3) Y . Okaya, / E M J . Res. Develop.. 13, 126 (1969) (4) M. F. Burke and R . G. Thurman, J. Chromatog. Sci.. 9, 77 (1971). (5) W. D. Gwinn. A. C. Luntz, C. H. Sederholm, and R. Millikan, J. Comput. Phys. 2, 439 (1968).
tion, the machines serve admirably for many simple, often preliminary, data work-up routines (6-8). One of the well-known techniques in this latter class is that of digital smoothing or curve fitting by means of polynomial least squares convolution principles (9). This well-documented method has found much use since it involves only simple numerical operations and requires very little machine time or memory. Consequently, it is ideal for small laboratory computers operating in a real-time, on-line mode. We have used this technique for smoothing and curve fitting of spectra obtained from our microwave spectrometer, which is equipped for automated operation with a Hewlett-Packard 2116B computer and associated peripherals ( I O ) . In this communication, we shall describe the use of the simple convolution technique for finding accurately the centers of spectral lines having Lorentzian or Gaussian shapes. The same procedures may be applied to finding the maxima or minima in any type of experimental data with appropriate modifications.
CONVOLUTION LEAST SQUARES METHOD Several years ago Savitzky and Golay (9) introduced the simplified convolution techniques needed for curve fitting, data smoothing and peak finding. According to this method the nth degree polynomial which best fits (in a least squares sense) a set of 2 m + 1points ( x t , y L )is, given by y, = b,,
+ b,,,i +
b,,$
+
...
b,,,,?' = x b n k i k (1) h -0
where
(2) The abscissa values of the data are equally spaced with interval A and the new variable i is defined by
. 1 =
-
x + 6
A
where 6 is the translation of the original axis system necessary to bring the central data point to the value i = 0. Savitzky and Golay (9) have tabulated the Ci(n,k,m) for m = 0 to 12 (0-25 points), for second through fifth degree (n = 2-5) polynomials, and for all values of K = n. In addition the normalization constants N n k m are listed. Using these tables (portions of which are easily stored in computer memory), the b n k values are trivially calculated by Equation 2. If the curve being fit has a maximum or minimum, the coefficients b n k lead directly to the location of this extremum. Thus for a 2nd degree polynomial fit,
(4) We are interested particularly in finding the value of the abscissa a t the maximum of symmetric absorption features having the Lorentzian or Gaussian shape ( 1 1 ) . Neither of these functions can be fit perfectly by a polynomial, although the inverse of the Lorentzian is a perfect quadratic function. Most spectral absorption or emission lines (6) M . P. Klein and G. W. Barton, Jr., Rev. Sci. Instrum., 34, 754 (1963). (7) W. F. Gutknecht and S. P. Perone, A n a / . Chem., 42, 906 (1970). (8) T. C. Farrar, A n a / . Chein.. 42, (4). 109A (1970). (9) A. Savitzky and M . Golay, Anal. Chem.. 36, 1627 (1964). (10) R . D. Suenram and M. D. Harmony, J . Chem. Phys., 56, 3837 (1972). (11) M. D. Harmony, "Introduction to Molecular Energies and Spectra," Holt. Rinehart and Winston. New York, N.Y., 1972, pp 102-103.
~
~
Table I. Data Points Satisfying the Function in Equation 5 X
-1
y i
9.38 -2
0 316
-1
2 2.44 1
1 ,0039
0
3 25.63
2
approximate one of these two functions, but seldom do they follow them exactly. Often the absorption features exhibit some asymmetry due to a variety of instrumental or more fundamental theoretical factors (modulation effects, unresolved fine structure, etc.) In any case, for the purpose of locating the maximum, a second degree function provides a suitable approximation for the symmetrical Lorentzian or Gaussian functions or for slightly perturbed versions of these functions. (To account for asymmetry in the line shape, a third degree function may be utilized). In the following section, we shall analyze the Lorentzian and Gaussian cases in more detail. Before proceeding, however, we provide a simple illustration which shows the general problems to be met. When the curve-fitting function does not exactly match the experimental points, the maximum will not be correctly determined in general, except for the unlikely occurrance that the central data point ( i = 0) lies exactly a t the extremum. Consider for example the data of Table I, which follow exactly (to within round-off errors) y =(x
-
;y
(5)
For these synthetic data, we know that the minimum is located a t x = 314, y = 0. Suppose we wished to obtain the best fit of these data to a second degree function. The five data points permit us to perform a five-point fit ( m = 2), and the points have been correspondingly labeled from - m to +m. (Note for this data set that A = 1 and 6 = -1). Using Equation 2 and the tabulated coefficients (9),CL(2,k,2),we find y = -2.054 3.462 i + 9.608 i2 as the second degree function which best fits the given data points. By Equation 4, we find i = -0.18 as the location of the minimum, and since 6 = -1, xmln = 0.82. Thus, our fitting procedure has located the minimum of the data with an absolute error of 0.07. This is a rather extreme example (fitting quartic data to a quadratic function), but it serves to illustrate the fact that an error will exist if the fitting polynomial differs in functional form from the true functional form of the data points. This will be the case for both the Lorentzian and Gaussian functions. Several means are available to decrease the error, such as: (a) using a polynomial which fits better; ( b ) using data point intervals which are small compared to the characteristic width of the function; (c) developing an appropriate correction term to apply after the fitting procedure has been performed. The first of these is entirely reasonable but leads to a more complicated computation, since more b n k need evaluating and the equation for computing the maximum in terms of the b n k may be more complicated also. The second alternative is one which must be carefully weighed; decreasing the data point intervals will usually require increasing the number of data points (particularly for imperfect experimental data), since otherwise too small a portion of the curve will be sampled. Increasing the number of data points nullifies to a great extent one of the advantages of digital data collection, uit., the option to sample only relatively few of the continuous range of points available in the analog experiment. The proper balance between these parameters is a matter which will probably vary from experiment to experiment, depending on such
+
A N A L Y T I C A L C H E M I S T R Y , VOL. 45, NO. 8, J U L Y 1973
1495
J W
E
I-
z
I
-1.6
- .8 1
1
1
l
0
l
I
.a
W
t
0 LT
I .6
uR
W
a
Figure 1. Error in fitting perfect Lorentzian data by convolution least squares procedure for case when N = 9 and p = 5. 6 is defined by Equation 9 and ecor is similarly defined using V R (corr) factors as duration of the experiment, data collection rate, and so forth (12). We shall not attempt any general analysis of these matters but shall turn now to the third alternative, which may be utilized in addition to, or in place of, the other considerations.
CORRECTION FACTOR FOR LORENTZIANS AND GAUSSIANS In general, the nonrandom, non-experimental error incurred in locating the maximum (peak) in a set of spectroscopic data is a function of four or more parameters: (1) the degree n of the fitting polynomial, (2) the number of data points N = 2m + 1, (3) a set of 1 parameters u1. . .u2. . .ul which describe the true functional form of the data, and (4) a parameter V R which describes the location of the central data point relative to the true maximum of the function described by the data. Our discussion will be limited to the case n = 2. Consider a Lorentzian distribution of data points,
&L 21
L
1
1
I
2
'
I
3 4
I
'
'
uo
- di
- v0 + s
= 0)
A
A
(7)
where the second equality follows from Equation 3. Using Equations 3 and 7 and taking p = @'/A, the Lorentzian form becomes Y = (i
-
4' UR)*
+ p'
In this coordinate system, the true maximum of a set of data points conforming to a Lorentzian function is given by imax = U R . A convolution least squares fit will produce an apparent maximum given by imax(app) = uR(app), with an error defined by E = "R
-
uR(app)
(9)
For the n = 2 case, t = ~ ( N , ~ , V R ) . We illustrate the error function in Figure 1 for a particular choice of A' and @. Note that 6 is an antisymmetric, oscillating function, and that 6 = 0 when U R = 0. Similar curves occur for other choices of N and p. I t is seen that a perfect fit ( t = 0) can be obtained for large V R ( e . g . , in Figure 1 a perfect fit occurs a t I V R ~ = (12) K Yamashita and S Minami, Jap J Appl Phys , 10, 1097 (1971)
1496
ANALYTICAL C H E M I S T R Y , VOL. 45, NO. 8, JULY 1973
I
Figure 2. Error in fitting perfect Lorentzian data as a function of line-width ( p ) and number of points ( N = 7, 9, 13, 17, 21). Computed for case when V R = 0.5
1.4), but the chances of obtaining a good fit become slim when the central point does not lie in the range -1 < V R $1. In this region Figure 1 shows that ~ ( v R can ) be represented fairly well by a straight line. This suggests that a Taylor's series expansion of ~ ( Y R )should converge rather rapidly, so that a good approximation might be obtained by neglecting terms beyond the first power in V R . This expansion has been carried out for the Lorentzian function, and has been used to obtain a correction factor y, which permits evaluation of a more accurate U R value. udcorrected) = yuR(app)
s 2
S?'-
CS,?+- p*s,
sas,. '
(10 1
i2
C!iz
UR =
I
BETA
=
Here LY is the maximum value of f ( u ) , p' is the half-width a t half-maximum intensity (HWHM) and v0 is the peak frequency of the absorption line. We define V R as the distance of the central data point from v0 in units of A ,
'
5 6 7 6 9 1 0
i?
(11)
+ 02~
where
s ~ =+xir+k. ~
( 12)
The derivation of Equation 11is given in the Appendix. For the case illustrated in Figure 1, y has the value 1.216 by use of Equation 11. Then uR(corr) from Equation 10 leads to the error t(corr) as given in Figure 1. In the region -0.6 < V R 7 +0.6, t(corr) shows that the error in vR(corr) has been reduced to less than 0.02 unit of A ; indeed, over much of this range the error is reduced by tenfold or more. Note, however, that in the vicinity of / V R I = 1 the value of t(corr) is comparable to the uncorrected result, which indicates that the approximations leading to Equation 11 are beginning to break down. In practice it should be possible in most experimental cases to utilize a data set having the central data point within one-half increment of the maximum. Thus if 15 data points were obtained, such that 9 were on the + V R side and 6 on the - V R side, a set of either the first 11 (counting from low to high frequency) or the first 13 points must have its central value in the range -Y2 to +1/2. I t is a relatively easy task to pick out the set having the central value in this range. Consequently, we need not worry about the unsatisfactory improvement in vR(corr) in the regions around I V R ~ = 1 or greater. In Figure 2, we have plotted the 70 error [lo0 x e(corr)/u~]for the case I V R ~ = 0.5 and for a range of values of N and p. I t is seen that except when @ = 1 and N is small, the per cent errors
are satisfyingly small. Over most of the ranges shown in the figure, the percentage errors in vR(corr) represent improvements of approximately a factor of 10 over the uncorrected values. Note also that the figure covers a very wide range of practical data collection options. For example, the 9-point curve represents data collection over the range of 8 halfwidths (/3 = 1) to 0.8 halfwidth (/3 = 10). Consequently, we feel that the correction factor of Equation 11 can be used confidently in the majority of cases to reduce the per cent error in the fitting procedure to a very small value ( = 1%of vH). Our practical experience indicates that, in general, data points should be collected over a range of at least two HWHM, which is equivalent to a minimum of 2/3 1 points. Subject to this constraint, and assuming also that the central data point is within Yz A of the maximum of the absorption curve, our computations show that the error in the uncorrected values vR(app) may be as large as 20% for 4 5 /3 5 10. Using the corrected values given by Equation 10, the error should never exceed 3% for the same range of /3; moreover, for /3 2 7, the error will be less than 1%. Consequently, it is advisable to use the correction factor described here for all applications requiring fitting accuracies better than 0.1 S. (20% of v~ is 0.1 A for Y R = 0.5 A). It is also advisable to avoid the use of data having /3 < 4, since even the corrected values may be subject to rather large errors, as shown in Figure 2. Entirely similar principles can be applied to an analysis of the error in fitting a Gaussian function. The Appendix presents the expression for y which is appropriate for this case.
+
APPLICATION TO MICROWAVE SPECTRA The correction factor discussed in the previous section was incorporated into a program for the calculation of the frequencies of the centers of microwave absorption lines. These lines theoretically follow a Lorentzian contour, and thus y may be calculated by Equation 11. The program was written in Fortran IV and was designed to run in a real-time environment so that the computer could calculate the u R ( corr) values immediately after collecting the data. As was stated in the previous section, V R must be well within the range -1 to 1 for the justifiable application of y. To do this, 15 data points are chosen over a frequency range of roughly 3 halfwidths of the line in such a way that there are definitely 1 or 2 more data points on the low side of y o than on the high side. vR(app) is then calculated using the first 11 data points (counting from low frequency). If the absolute value of this number is greater than 1, the calculation is repeated using points 2 through 12, 3 and 13, etc., until luR(app)l 5 1. Finally y and yR(corr) are calculated using an estimated halfwidth. This asymmetrical collection of the data was adopted because it was felt that it would be easier for the operator than centering the data about some approximate value of u,,. It should be mentioned that in our experimental procedure, the data set is observed on an oscilloscope so it is relatively easy to estimate /3 t o within an accuracy of ~ 0 . unit 2 of
A.
-
This technique was applied to the measurement of the OCS 2 1 microwave absorption line a t 24325.921 MHz. In Table 11, the results for several independent runs carried out on several different days have been listed. Rather than comparing the computed uo(app) and vo(corr) to the accepted value, we have chosen to compare the result for each run to that obtained by applying a nonlinear least squares fit (done off-line on the University HW-635) to each data set. Thus factors such as absolute frequency
Table I I . Experimental Measurements of the Microwave Absorption Frequency of the OCS 1 2 Rotational Transition +
Errora Run Uncorrectedb +0.0965d f0.0683 f 0.0794 -0.0054 f 0.0492 4-0.0680 f0.1400 -0.1322
ErrorU
CorrectedC -0.0107 -0.0007 -0.0021 -0.0579 -0.0261 -0.0717 -0.0377
+0.0222
Run
Uncorrectedb
Correctedc
9 10 11 12 13 14
-0.1503 -0.l 106 -0.1 107 -0.1200 -0.0988 0.1497 - 0 . 1 21 6 -0.1161 0.101
-0.0103 +0.0076 - 0 . 0 1 11 f0.0153
15 16 Mean
-
0.0000 -0.0320 -0 . 0 4 9 4 -0.0333 0.024
Difference between frequency determined by convolution techniques and that determined by a standard nonlinear least squares procedure. Determined by using frequency as computed by Equation 4 . Determined by using frequency computed from Equation 10. Units are those of A ; in this case, A = 0.03 MHz.
('
calibration and time delay effects are largely cancelled, and the comparison represents simply the difference between the corrected-convolution approach and a fullblown nonlinear least squares fit (which should be the best possible statistical result). It should be stressed that we are fitting real experimental data here, rather than perfect synthesized data as in the previous sections. The data are not perfect therefore; the S / N ratios were in the range of about 10-50, and no attempt was made to delete any bad data sets. From the average results of Table 11, it is clear that the use of the correction factor has a beneficial effect amounting to about a factor of four. For only two of the runs does the corrected value lead to a poorer result than the uncorrected value. The error units are those of A , the data point spacing, which was 0.030 MHz for these runs. The average error of 0.024 S. represents, therefore, an absolute relative frequency error of only 0.0007 MHz. Thus it appears that the simple convolution least squares method can be used effectively to obtain accurate values for the frequencies of spectroscopic absorption lines.
APPENDIX
+
Savitzky and Golay (9) have shown that the set of n 1 normal equations which give the best least squares fit of a set of data points y L to a n nth degree polynomial of the form of Equation 1is given by h=0
where
i=-In
(14)
m
i= -m
and the index r takes values 0,1,. . . n. Explicitly, for the n = 2 case, Equation 13 becomes
We consider now the Lorentzian case (Equation 8) in detail. As shown by Equation 4, the apparent maximum is given by A N A L Y T I C A L C H E M I S T R Y , VOL. 45, N O . 8, J U L Y 1973
1497
VdaPP) = -b,,/2b,2
(17)
where bzl and bzz are obtained from Equation 16 a s bn = Fi / S,
If a similar procedure is followed for a Gaussian function, y = f(u) = a exp[-(u - V ~ ) ~ P ' ~ I (21) in which @' = W W H M , the correction factor becomes
(18) We now expand t as defined in Equation 9 in a Taylor's series about U R = 0, t
=
€0
+ u*&)
+
'..
(19)
VR'0
Since-c" = -uR(app) by Equation 9: Equations 16, 17, and 18 show that t o vanishes for the symmetrical Lorentzian function. By tedious differentiation of t (as given by Equation 9, we get finally .^
J The same coordinate transformations have been used and @ is again the value of p' in units of A .
ACKNOWLEDGMENT Assistance in the early stages of this work from P. M. Fast and D. L. Nordlund is greatly appreciated.
1
(20)
Substitution of this expression into Equation 9 leads to the desired results as given in Equations 10 and 11.
Received for review October 24, 1972. Accepted February 5, 1973. We are grateful to the National Science Foundation for support of this work by grants GP-15127 and GJ32213, and to the University of Kansas Computation Center for computer time on the HW-635. One of us ( U J ) appreciates the support from an NDEA traineeship. Laboratory computer equipment was provided by NSF grant GJ 332.
Screening for Heroin-A Comparison bf Current Methods David Sohn Department of Pathology, New York Medical College Center for Chronic Disease, Welfare Island, New York, N. Y . 1001 7
Julius Simon, Moheeb A. Hanna, Guirguis V . Ghali, Ramadan A. Tolba, and Vahe Melkonian Laboratory for Chromatography, Bayside, N. Y. 11364
Heroin (diacetylmorphine) is excreted as morphine primarily in urine. Two-thirds or more of urinary morphine is glucuronide conjugated. Quinine is a major diluent of heroin. A study is made of 3009 samples positive for quinine among some 50,000 urine samples examined during the first half of 1972. They were analyzed by thin layer chromatography (TLC), spectrophotofluorometry (SPF), and automated morphine analyzer (AMA). Yields of morphine-positive samples were 19.9, 24.3, and 27.0%, respectively, by TLC, AMA, and SPF. Additional studies including efficacy of hydrolysis; gas-liquid chromatography (GLC): and Hemagglutination Inhibition ( H I ) immunoassay are presented. Results indicate the need for hydrolysis in techniques sensitive only to unconjugated morphine: that new columns, with shorter retention times for morphine, may permit GLC to be used for initial screening: and that H I which is also sensitive to conjugated morphine may represent a superior method of screening. Limits of sensitivity were on the order of 0.5, 0.2, 0.1, 0.1, and 0.03 Fg/ml, respectively, for TLC, SPF, AMA, GLC, and HI.
Addiction to heroin, or diacetylmorphine, is a major medical-social problem. In the body, heroin is rapidly deacetylated to morphine. In the human, urine is the major route of excretion. Morphine is excreted either free or conjugated, primarily to glucuronide. 1498
A N A L Y T I C A L CHEMISTRY, VOL. 4 5 , NO.
8, JULY 1973
Studies by Way and Adler ( I ) and others, in man, indicate that over 50% of the total dose is excreted in the urine within the first 8 hr and roughly 90% during the first 24 hr following injection. Free morphine represents between 5 and 20% of the total amount excreted in urine, primarily between 3 and 8 hr after parenteral administration. Measurable urinary free morphine persists for 48 hr or longer. Following the same dose of injected morphine, there are great individual variations in the ratios of free and conjugated morphine a t any given time as well as in the interval during which free morphine is present at detectable levels. The overall sensitivity of any method of analysis depends on whether it is free or total morphine that is identified. The time interval between administration of the drug and sampling is therefore an extremely critical determinant of overall sensitivity. In practice, among a large group of specimens containing morphine, administration of the drug may have taken place from several hours to several days prior to sampling. Therefore, this is a study of distribution within a population and many samples must be studied if different methods of analysis are to be compared. In the basic study presented here, a total of some 50,000 urine samples were reviewed. Current methods of bioanalysis detect either free morphine only or total morphine. Since total morphine levels (1) E. L. Way and T. K . Adler. "The Biological Disposition of Morphine and its Surrogates," W.H.O.. Geneva, 1962.