number of peaks that were eliminated in the first pass. This is caused by benzene having very broad, low-intensity absorptions. In Table III, feature selection was done by eliminating features having all combinations of weight vector components except IIIB and IV. Most classes would not train before the feature selection routine had completely reduced the features to a minimum. While the second method reduces the features to a bare minimum, only those classes with unique behavior will continue to train, making this method of elimination inapplicable to most classes of compounds. The advantages of this feature selection method over weight-sign feature selection are twofold. First, it allows the weight vector to be initialized to zero, thus removing any bias caused by the weight vector initialization step. The second advantage is that, with this feature selection method, only one weight vector need be computed, so the computer time needed to train for a class is decreased by a factor of two. Figures 2 through 4 show weight vector plots of the three functional groups that would train after feature selection was performed as in Table III. The abscissas are wavelength in microns and the ordinates are scaled arbitrarily. In the case of carbonyls and carboxylic acids, this feature selection was continued until no more features could be eliminated. By using the restraint that all features must be in classes IIIB or IV, all weight components must have a positive value, as is shown in Figures 2 and 3. This shows that there exists a large amount of unique or semi-unique behavior in these two classes. For alcohols, much less of this behavior is evident. To separate the alcohols from the other compounds, it is necessary to have negative components of the weight vector. Literature infrared spectral tables (6, 7) show a great deal of similarity with the IR weight vector component

(6) R. M. Silverstein and G. C. Bassler, "Spectrometric Identification of Organic Compounds," Wiley, London, 1967, p 73. (7) "Handbook of Chemistry and Physics," Robert C. Weast, Ed., The Chemical Rubber Co., Cleveland, Ohio, 1969, p F169.

maps. In the case of carbonyls, the tables show sharp peaks in the 5.8-6.1 μ and 7.6-9.1 μ ranges. With carboxylic acids, there are also close similarities between the weight components and the IR tables. Silverstein and Bassler show sharp peaks at 3.0-3.6, 5.8, and 7.7-8.0 μ and medium peaks at 3.7-3.9, 7.0, and 7.7-8.0 μ. The CRC tables show sharp peaks at 5.8-6.1 and 7.6-8.4 μ and moderate peaks at 3.1-3.4, 6.9-7.4, and 10.4-11.7 μ. With alcohols, the correlation is not as good at first glance. Since feature selection has been performed several times in reducing the spectra to only nine peaks, any feature remaining will have a high degree of correlation with the OH functional group. The IR tables list alcohols as having sharp peaks in the 2.8-3.1 and 8.2-10.0 μ ranges. All of the remaining nine peaks are found in these ranges.

CONCLUSIONS

A new error correction feedback training method, called the fractional correction method, has been developed and tested. The classifiers generated with the fractional correction method gave predictive abilities from 93 to 100% on complete unknowns. The average predictive ability for the six chemical classes investigated was higher than the predictive abilities attained by any of the classifiers generated by other methods. When the fractional correction training method was combined with an offset intensity scale for the IR absorptions, a new feature selection procedure could be used. Complete training and high predictive abilities were obtained for patterns containing as few as 20% of the original features. A second new feature selection approach attempts to identify which features can be expected to be useful for classification by breaking down the weight vectors into positive and negative components. Further studies are being pursued to investigate more fully the methods reported and to apply them to other sets of data.

Received for review June 20, 1974. Accepted August 14, 1974. The financial support of the National Science Foundation is gratefully acknowledged.

Bayesian Approach to Resolution with Comparisons to Conventional Resolution Techniques

P. C. Kelly¹ and Gary Horlick

Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada T6G 2G2

A Bayesian approach to resolution is presented in which a direct calculation is made of the posterior probability that the height of a second peak in a doublet is greater than zero. Since it makes full use of all available information, the Bayesian method should be an effective method for resolving peaks. This conclusion is borne out in a comparison of the Bayesian method to deconvolution, second-derivative, cross-correlation, and least-squares methods for the resolution of noisy Lorentzian doublets with various relative heights and separations. In addition, the meaning of results obtained by both least-squares and Bayesian methods in a low signal-to-noise ratio situation is discussed and illustrated.

¹ Present address: Research Centre, Canada Packers, 2211 St. Clair Avenue West, Toronto, Ontario, Canada M6N 1K4.

The term resolution refers to the decomposition of a system into its constituent parts. The problem of resolution is frequently encountered by analytical chemists, both instrumentally during acquisition of analytical data and interpretively while processing analytical data. This paper is concerned with resolution at the processing step, where the system is some type of recorded spectrum and the components are peaks. Resolution methods rely on the generation of a resolution statistic. For example, it is readily apparent that at least two peaks are present when a valley can be seen between two peak maxima. A suitable statistic might be the depth of the valley. Many of the common approaches to resolution involve techniques, such as deconvolution or second differentiation, which accentuate the depth of the valley between two peaks.

ANALYTICAL CHEMISTRY, VOL. 46, NO. 14, DECEMBER 1974

In addition to parameters such as separation, width, size, and shape, the resolvability of peaks is dependent on the type and intensity of any noise present on the signal. The presence of noise means that any resolution statistic will have some degree of uncertainty associated with it. Such a random statistic can be converted into the probability that a second peak is present in a suspected doublet. Least-squares and Bayesian methods (1, 2) of resolution rely on such a probability statistic. The specific resolution problem studied here is the detection of a spectral peak near a primary peak which is known to be present. In addition, it is assumed that the shape and width of the second peak are known in advance. A useful probability statistic for resolution in this case is the probability that the height of the secondary peak is greater than zero. The Bayesian method presented here makes full use of available information in a spectrum about the height of a second peak by converting the spectrum directly into the required probability statistic with few prior assumptions about the system. The principles of the Bayesian method are developed gradually by reviewing four other common methods of resolution. These include methods which rely on valley depth (deconvolution, second-derivative), peak detection (cross-correlation with subtraction), and probability (least-squares). A rough idea of the degree of effectiveness of the methods is obtained by analyzing several noisy doublets, and some insight on the theoretical and practical implications of the Bayesian method with respect to least-squares methods is provided. The list of resolution methods covered here is not exhaustive. Many methods reported in the literature are slight modifications of those used here. Fourth- and second-derivative with polynomial smoothing are examples (3). Pattern recognition (4) and principal component (5) methods may be applied to resolution problems also.
The last two, however, are more suitable for systems where the final result (resolution of components based on differences in physical properties) may be clearly defined but where the system is too complicated to formulate a statistic based on exact relations between observations and the physical property of interest. Also, it is important to note that in this presentation we are concerned primarily with methods used to generate a resolution statistic and not with decisions about resolvability based on the resolution statistic, as decision rules will depend on the subjective interests of the observer. With a valley depth resolution statistic, a valley depth near zero might be reason enough for a research chemist to do additional work, but a depth considerably greater than zero might be required by a forensic chemist to prove the presence of a poisonous substance. Or with a probability resolution statistic, a probability of 0.95 that the height of a secondary peak is greater than zero may satisfy one investigator while another may require a probability of 0.99.

TEST SPECTRA

The spectra used to test the resolution methods were prepared by computer simulation (IBM 360/67). Spectral doublets, s(t), with various relative heights and separations were added to a section of background noise, n(t),

(1) C. W. Helstrom, "Statistical Theory of Signal Detection," Pergamon Press, Oxford, 1968. (2) M. G. Lichtenstein and T. Y. Young, IEEE Trans. Information Theory, IT-14, 288 (1968). (3) R. R. Ernst, "Advances in Magnetic Resonance," Vol. 2, J. S. Waugh, Ed., Academic Press, New York, N.Y., 1966, p 1. (4) L. B. Sybrandt and S. P. Perone, Anal. Chem., 44, 2331 (1972). (5) D. Macnaughton, Jr., L. B. Rogers, and G. Wernimont, Anal. Chem., 44, 1421 (1972).


Figure 1. Typical simulated signals. The height and separation of the second peak in Figure 1A is 500 units and 20 sec; in Figure 1B, 500 units and 10 sec; and in Figure 1C, 100 units and 20 sec. The height and width of the primary peaks are 1000 units and 20 sec in all cases. Figures 1A, 1B, and 1C correspond to Spectra 2, 4, and 5 in Table I.

obtained by digitizing the dark current signal from a 1P28 photomultiplier tube. The final simulated signal, x(t), may be expressed as:

x(t) = s(t) + n(t)   (1)

The background noise was assumed to have a Gaussian probability density function with an unknown mean and a constant standard deviation of 61 units (6.1 nA). The noise was assumed to be white, and this was verified by the flat power-density spectrum of the section of noise used in this study. The spectral doublet may be written:

s(t) = A g[(t − t₀)/w] + B g[(t − t₀ − d)/w] + μ   (2)

where g is a Lorentzian peak-shape function:

g(t) = (1 + 4t²)⁻¹   (3)

The first term in Equation 2 represents the principal peak with height A (1000 units throughout this study), assumed to be unknown, and position, t₀, assumed to be known. The second term in Equation 2 represents the secondary peak with height B (variable and unknown), separated from the position of the principal peak by the distance d (variable and unknown). The width at half height, w in Equation 2, was assumed to be known (20 seconds throughout this study) and the same for both peaks. The last term in Equation 2, μ, represents the unknown mean of the noise. Since all calculations are carried out on a digital computer, the complete simulated signal (Equation 1) is digitized. The quantization interval for the ordinate was 0.001 ordinate unit, and samples were taken at 1-sec intervals along the abscissa. For the peaks studied here, quantization noise is negligible and sampling errors are well below 0.01% (6). Three of the simulated signals are shown in Figure 1. Noise-free signals are shown in the upper part of the figure and the actual signals used to test the resolution methods in the lower part.
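The simulated signals described above can be reproduced in a few lines. The Python/NumPy sketch below builds the doublet model and adds white Gaussian noise with the paper's parameters; the function names are ours, and the synthetic random-number generator stands in for the digitized photomultiplier dark-current record used in the original study.

```python
import numpy as np

def lorentzian(t):
    """Unit-height Lorentzian peak-shape function, g(t) = (1 + 4t^2)^-1."""
    return 1.0 / (1.0 + 4.0 * t ** 2)

def doublet(t, A, B, t0, d, w, mu):
    """Noise-free doublet s(t): principal peak, secondary peak, and base line."""
    return (A * lorentzian((t - t0) / w)
            + B * lorentzian((t - t0 - d) / w)
            + mu)

# Parameters from the paper: principal height 1000 units, width 20 sec,
# 1-sec sampling, noise standard deviation 61 units.
rng = np.random.default_rng(0)
t = np.arange(256.0)                          # 1-sec sampling intervals
s = doublet(t, A=1000.0, B=500.0, t0=100.0, d=20.0, w=20.0, mu=0.0)
x = s + rng.normal(0.0, 61.0, size=t.size)    # x(t) = s(t) + n(t)
```

With these parameters the noise-free signal corresponds to Spectrum 2 of the study (secondary peak of 500 units separated by 20 sec).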

RESOLUTION METHODS

Deconvolution. In principle, the peaks in a spectrum can be reduced to sharp spike-like delta functions by a deconvolution operation. Such a deconvolution is most easily pictured as a division in the Fourier domain (7, 8):

X′(f) = X(f)/G(f)   (4)

where X(f) is the Fourier transform of the observed spectrum,

(6) P. C. Kelly and Gary Horlick, Anal. Chem., 45, 518 (1973). (7) R. N. Bracewell and J. A. Roberts, Aust. J. Phys., 7, 616 (1954). (8) Gary Horlick, Appl. Spectrosc., 26, 395 (1972).

Figure 2. Effect of noise on deconvolution in the Fourier domain

G(f) is the Fourier transform of the Lorentzian peak-shape function, and f is frequency in hertz. The inverse Fourier transform of X′(f) will yield the desired narrow-line spectrum. The observed signal, however, contains noise as well as peaks. The effect of noise on deconvolution is shown in Figure 2 in the Fourier domain. The intensity of the Fourier transform of the peak-shape function (Figure 2A) approaches zero at high frequencies, while that of the noisy signal (Figure 2B) remains finite at high frequencies. Deconvolution amplifies high-frequency noise to such an extent that the desired delta functions are obliterated (Figure 2C). Bracewell (9) suggested that the deconvolved spectrum be multiplied by a factor Q(f) such that the average absolute square of the difference between the desired spectrum (delta functions) and the deconvolved spectrum, X′(f), be a minimum. The appropriate factor is:

X″(f) = Q(f) X(f)/G(f)   (5)

where

Q(f) = 1/[1 + Φ(f)/|S(f)|²]   (6)

and Φ(f) is the power-density spectrum of the noise and S(f) is the Fourier transform of the peaks (Equation 2). In Equation 6, S(f) requires estimates of the heights of peaks in the spectrum. Since Q(f) is insensitive to the actual peak heights used, the difference between the maximum of the doublet and the approximate base line of the raw data was taken as a crude estimate of the sum of peak heights. With the objective of maximizing signal-to-noise ratio, Ernst (3) derived an expression similar to Equation 6. In his expression, the 1 in the denominator of Equation 6 is replaced by a variable factor that allows the experimenter to make a trade-off between signal-to-noise ratio and resolution enhancement. The result of the application of this deconvolution method (Equations 5 and 6) to the simulated noisy spectrum in Figure 1A is shown in Figure 3A.

Second Derivative. The depth of the minimum between peaks is a natural and obvious statistic for resolution. Practically an infinity of rules have been based on the ratio of the depth of the minimum to the height of one or the other peak (10). If the negative second derivative of a spectrum is taken, peaks will be obtained corresponding to the original peaks but narrower. Taking the negative

(9) R. N. Bracewell, Proc. IRE, 46, 106 (1958). (10) S. G. Rautian, Sov. Phys.-Usp., 66 (1), 245 (1958).


Figure 3. Spectra resulting from the application of deconvolution (A), second derivative (B), and cross-correlation–subtraction (C) resolution methods to the noisy doublet shown in Figure 1A

second derivative of a spectrum, x(t), corresponds to multiplying its Fourier transform, X(f), by the square of the frequency. Thus, Equation 5 can be used to obtain the second derivative of a spectrum simply by replacing 1/G(f) with f². Since the desired signal as well as the noise is multiplied by f², high-frequency noise will cause a problem analogous to that depicted in Figure 2. The effect of high-frequency noise can be minimized by multiplying the Fourier transform of the negative second derivative by exactly the same Q factor (Equation 6) as used for deconvolution. The negative second derivative calculated as outlined above for the simulated noisy spectrum of Figure 1A is shown in Figure 3B.

Cross-Correlation. In the original definition of the resolution problem, it was stated that although the height of the principal peak is unknown, the position of the principal peak is known. This prior information can be used to minimize the influence of the principal peak by estimating its height and subtracting the estimated peak from the spectrum. The resolution problem then reduces to one of detecting the presence of the secondary peak in the noisy base line. This can be done by cross-correlating the remaining signal with the Lorentzian peak-shape function (11). It can be shown that, for white Gaussian noise as used in the simulation here, cross-correlation of the noisy secondary peak with the peak-shape function will maximize the signal-to-noise ratio (3, 12). A reasonable and easily made guess of the height of the principal peak is the maximum of the composite spectrum. The result of subtracting this approximation of the principal peak and cross-correlating with the peak-shape function is shown in Figure 3C. The bump in Figure 3C clearly indicates the presence of a second peak. The negative part of the curve occurred because the estimate of the height of the principal peak was too large.
(The maximum of the composite spectrum contains contributions from both peaks.) In practice, the estimate could be adjusted until the negative part of the curve just disappears. This modified cross-correlation procedure is similar to that used by commercial curve resolvers, which sum electronically generated peaks to match an image of the original spectrum (13). Here, however, peaks are subtracted from the spectrum and the residual is cross-correlated to obtain a more sensitive indication of the presence of as yet undetected peaks.

(11) Gary Horlick, Anal. Chem., 45, 319 (1973). (12) G. L. Turin, IRE Trans. Information Theory, IT-6, 311 (1960). (13) R. H. Muller, Anal. Chem., 38 (12), 121A (1966).
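The subtraction-plus-cross-correlation procedure can be sketched as follows; the function names and the demo doublet parameters are illustrative assumptions, and the cross-correlation is implemented as a convolution, which is equivalent here because the Lorentzian kernel is symmetric.

```python
import numpy as np

def lorentzian(t, w):
    """Lorentzian peak shape of unit height and full width w at half height."""
    return 1.0 / (1.0 + 4.0 * (t / w) ** 2)

def crosscorr_statistic(x, t, t0, w):
    """Modified cross-correlation resolution statistic: estimate the principal
    peak height from the composite maximum, subtract the estimated peak, and
    cross-correlate the residual with the peak-shape function (a matched
    filter for white Gaussian noise)."""
    A_hat = x.max()                 # crude estimate; biased high by the secondary peak
    residual = x - A_hat * lorentzian(t - t0, w)
    kernel = lorentzian(np.arange(-3.0 * w, 3.0 * w + 1.0), w)
    return np.convolve(residual, kernel / kernel.sum(), mode="same")

# Demo: a 1000/500-unit doublet separated by 20 sec.  The statistic shows a
# bump near the secondary-peak position (t0 + d = 120), with a depressed
# region where the overestimated principal peak was subtracted.
t = np.arange(256.0)
x = 1000.0 * lorentzian(t - 100.0, 20.0) + 500.0 * lorentzian(t - 120.0, 20.0)
stat = crosscorr_statistic(x, t, t0=100.0, w=20.0)
```

Iteratively reducing `A_hat` until the negative part of the statistic just disappears would mimic the adjustment described in the text.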


The performance of the deconvolution and second-derivative methods could be improved by subtraction, but this was not considered here because these methods are not normally used this way. Cross-correlation, on the other hand, is useless without subtraction.

Least Squares. An estimate of the height of a secondary peak can be provided by the well known method of least squares. In the least-squares method defined by Equation 7, the values of the unknown parameters are varied until the integral (or sum) of the square of the difference between the observed spectrum and the model of the spectrum is a minimum. The values of the parameters at the minimum are taken as the best estimates, α̂, in the least-squares sense:

ISD(α) = ∫ [x(t) − s(t, α)]²dt   (7)
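This minimization can be sketched numerically. The Python sketch below (function names ours) grid-searches the nonlinear separation parameter d, solving for the linear parameters A, B, and μ by linear least squares at each grid point, and returns the parameter set with the smallest integral square of deviations (ISD).

```python
import numpy as np

def lorentzian(t, w):
    return 1.0 / (1.0 + 4.0 * (t / w) ** 2)

def fit_at_separation(x, t, t0, w, d):
    """For a fixed separation d, solve for the linear parameters A, B, and mu,
    returning the estimates and the ISD."""
    M = np.column_stack([lorentzian(t - t0, w),        # principal peak
                         lorentzian(t - t0 - d, w),    # secondary peak
                         np.ones_like(t)])             # base-line level mu
    coef, *_ = np.linalg.lstsq(M, x, rcond=None)
    isd = float(np.sum((x - M @ coef) ** 2))
    return coef, isd

def least_squares_fit(x, t, t0, w, d_grid):
    """Grid-search the nonlinear parameter d, profiling out the linear
    parameters at each grid point."""
    results = [(d, *fit_at_separation(x, t, t0, w, d)) for d in d_grid]
    d_hat, coef, isd = min(results, key=lambda r: r[2])
    return d_hat, coef[0], coef[1], coef[2], isd

# Demo: recover the parameters of a noise-free doublet exactly.
t = np.arange(256.0)
x = (1000.0 * lorentzian(t - 100.0, 20.0)
     + 500.0 * lorentzian(t - 120.0, 20.0) + 50.0)
d_hat, A_hat, B_hat, mu_hat, isd = least_squares_fit(
    x, t, t0=100.0, w=20.0, d_grid=np.arange(5.0, 40.0))
```

On noisy data the same routine returns the minimum-ISD estimates; the grid over d plays the role of the search over the single nonlinear parameter discussed in the text.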

Figure 4. A typical sampling distribution for the least-squares estimator of the height of a secondary peak, B̂

Although only the estimate, B̂, of the height of the secondary peak is of interest, Equation 7 must be minimized with respect to all of the parameters simultaneously. The other parameters, A, d, and μ, are nuisances that merely complicate the mathematics of the minimization procedure. Provided the spectrum model is accurate and the minimization procedure is accurate, the nuisance parameters will not influence the value of B̂ on the average, although they will affect the precision of B̂. It is not enough to base a decision about the presence or absence of a secondary peak on the value of B̂ alone. One must also know something about the precision of the estimate. Due to the influence of noise, a positive value of B̂ could be obtained even if a secondary peak is not present. Conversely, a negative value of B̂ could be obtained even if a secondary peak is present. The frequency of making mistakes like these can be calculated from the statistical properties of the least-squares estimator. Imagine a large number of identical spectrometers simultaneously observing the same source. The spectra produced will be identical except for the details of the noise. Calculate B̂ for each spectrum and plot the relative frequency of values of B̂ vs. B̂. If the number of spectrometers is large enough, a continuous probability distribution function (pdf) will be obtained. This pdf, called the sampling distribution of the estimator, could be worked out mathematically from the distribution of the noise if the true values of the parameters were known. A typical sampling distribution, p(B̂), is shown in Figure 4. Of course, p(B̂) will depend on the true values of the parameters, α. The sampling distribution will shift to the right as B increases, and the distribution will usually broaden as the separation, d, decreases. The sampling distribution provides an objective and easily interpreted rule for deciding on the presence of a peak. If the relative frequency or probability of obtaining a positive value of B̂, which is given by the area below p(B̂) to the right of the point B̂ = 0, written Pr(B̂ > 0), is greater than, say, 0.95, then the secondary peak may be said to be present with 95 per cent confidence. Of course, it is impractical to carry out the large number of experiments needed to specify the sampling distribution. A more practical method is to approximate the sampling distribution with a Gaussian distribution having a mean equal to the current estimate and a variance equal to the approximate variance given by the Cramér–Rao inequality (14) and an estimate of the standard deviation of the noise. The required integral can then be obtained easily from statistical tables. This is the usual procedure in non-linear least-squares analysis, and this is the procedure referred to below as "least-squares" when the performance of the methods of resolution is compared. The least-squares method is accurate only when the sampling distribution is Gaussian, and this is true when all unknown parameters are linear functions of the model. In the problem considered here, the separation parameter, d, is non-linear, so we cannot expect the Gaussian assumption to be accurate. In general, though, the accuracy of the Gaussian assumption improves as the signal-to-noise ratio increases or, more precisely, as the amount of available information about the parameters increases. The least-squares method for the noisy doublet in Figure 1A gives Pr(B > 0) equal to unity within the limits of the computer's ability to calculate the number. The presence of a secondary peak is practically certain.

Bayesian Method. The least-squares method defined above has two limitations. The sampling distribution actually describes properties of the least-squares estimator, B̂, and does not provide information about the true value, B, which we are seeking. The limitation of practical interest, however, is the breakdown of the Gaussian approximation of the sampling distribution when the separation between peaks becomes small or when the height of the secondary peak becomes small. By introducing a prior pdf for the unknown parameters and by dispensing with the idea of estimation altogether (15), probabilities of true values of parameters may be calculated directly and exactly from the posterior pdf of the parameters given by Bayes' theorem:

pα|x(α|x) = pα(α) px|α(x|α)/px(x)   (8)

In Equation 8, pα(α), which is called the prior pdf of the parameters, expresses the information about the parameters at hand before the data are collected. Since nothing is known about the true value of the height of the secondary peak, B, it is reasonable to assume that all possibilities are equiprobable. Making the same equiprobable assumption for the other parameters, we can take pα(α) to be equal to a constant. On the other hand, t₀ and w are accurately known; thus the prior pdf's of these parameters are sharp spikes. The simple form of Equation 8 makes it easy to expand the problem to the case where t₀ and w are unknown or only partly known. It is simply necessary to multiply the right side of Equation 8 by the appropriate prior pdf. The second term in the numerator of Equation 8, px|α(x|α), is the pdf of the observations for given values of the parameters. The vertical solidus is read "given," and the vector x refers to the complete set of data. The pdf of a single point, x(t), is Gaussian with mean s(t, α) and variance σ². Since the noise is white, adjacent data points will be independent, and the joint pdf of all the data taken together is the product of the individual pdf's of the single points (15). Combining this product with Equation 8 gives:

pα|x(α|x) ∝ exp[−ISD(α)/2σ²]   (9)

(14) M. G. Kendall and A. Stuart, "The Advanced Theory of Statistics," Vol. II, Hafner, New York, N.Y., 1973, Chap. 18.

(15) S. A. Schmitt, "Measuring Uncertainty: An Elementary Introduction to Bayesian Statistics," Addison-Wesley, Reading, Mass., 1969, Chap. 6.


Figure 5. Posterior probability density along the crest of the “hog’s back” for the noisy signal shown in Figure 1A

The denominator in Equation 8, px(x), depends on the particular set of observations but is independent of the parameters, α; it is essentially a normalizing factor. Since px(x) and pα(α) are constants, Equation 8 is proportional to Equation 9. The posterior pdf of the parameters is proportional to the exponential of the integral square of deviations (ISD). Evidently the least-squares (Equation 7) and the Bayesian methods have an element in common, but Bayesian probabilities are more easily grasped and interpreted than the ISD. The ISD is significant only at its minimum in the least-squares approach, whereas it is the variation of the ISD as a function of the parameters that is significant in the Bayesian approach. The posterior pdf of B, which is needed to calculate the probability that the height of the second peak is greater than zero, Pr(B > 0), can be obtained from the joint posterior pdf of all the parameters by integrating pα|x(α|x) with respect to the nuisance parameters A, d, and μ between definite limits:

pB|x(B|x) = ∫ dA ∫ dμ ∫ pα|x(α|x) dd   (10)

This triple integral can be reduced to a single integral that can be evaluated numerically by considering the topology of the surface described by pα|x(α|x) (16). The parameters A, B, and μ in the posterior pdf are linear functions of the signal s(t, α) (Equation 2). If one of the linear parameters is varied while all other parameters are held constant, the posterior pdf will describe a Gaussian function. That is, a cut of the posterior pdf parallel to the axis of any one of the linear parameters will describe a Gaussian function. Therefore, the five-dimensional surface of pα|x(α|x) can be pictured as a ridge or hog's back that meanders through space in a complicated manner but with sides that are nice Gaussian functions. Consequently, the joint posterior pdf will have the form

pα|x(α|x) ∝ exp[−(A − Â)²/2σ_A² − (B − B̂)²/2σ_B² − (μ − μ̂)²/2σ_μ² − ⋯ − ψ(d)/2σ²]   (11)

where Â, B̂, and μ̂ represent the values of A, B, and μ at the top of the hog's back for a given value of the separation d. These values are obtained by solving linear simultaneous "normal" equations (16). At the same time, the marginal variances σ_A², σ_B², and σ_μ² are obtained. Both the means and the variances vary with d. The triple dots in Equation 11 represent covariance terms that drop out when the posterior pdf is integrated with respect to A and μ. The remaining dependence of the posterior pdf on d is contained in the function ψ(d) in Equation 11. This function is too complicated to express explicitly, but its value can be calculated relatively easily. On the crest of the hog's back (the maximum of the Gaussian curve describing the dependence of the posterior pdf on the linear parameters) we will have A = Â, B = B̂, and μ = μ̂. At the crest, for any given value of d, terms involving the linear parameters in Equation 11 vanish, leaving the function ψ(d). Therefore, since the right sides of Equations 9 and 11 are identical, ψ(d) at the crest of the hog's back is simply the ISD of Equation 7. That is, the value of ψ(d) at the crest is equal to Equation 7 minimized with respect to A, B, and μ. Along the crest, the posterior probability density will be proportional to the exponential of the ISD. Since this value is used frequently below, it is represented by R(d) in Equation 12.

(16) W. H. Lawton and E. A. Sylvestre, Technometrics, 13, 461 (1971).

pα|x(α|x) = R(d) = exp[−ψ(d)/2σ²]   (12)

for A = Â, B = B̂, and μ = μ̂. The function R(d), which will be called the crest factor, is the value of the posterior pdf along the hog's back. The crest factor behaves much like the posterior pdf for d, but it is not the actual pdf. The crest factor is used to simplify the calculation of the probability of the presence of a secondary peak in Equation 13:

Pr(B > 0) ∝ ∫ R(d) Pr(B > 0|d) dd,  where Pr(B > 0|d) = ∫₀^∞ pB|d,x(B|d, x) dB   (13)

Evaluation of the terms R(d), B̂, and σ_B is straightforward. The second integral in Equation 13 is the probability of the presence of a secondary peak for a given value of d, written Pr(B > 0|d). This is the normal probability integral, which may be found in statistical tables or, as was done here, calculated directly by use of a computer subroutine in a program library. The first integral in Equation 13 gives the overall probability averaged over all values of d. This integral was evaluated by summing the product of R(d) and Pr(B > 0|d) at equispaced intervals of d over the region of interest. Since Pr(B > 0) and Pr(B ≤ 0) must sum to unity, the proportionality constant in Equation 13 can be evaluated by determining Pr(B ≤ 0) also. The accuracy of this numerical integration depends on the number of values of d used. Usually, 10 to 20 values of d will be sufficient. Seldom is a decision so important that the probability is required with an accuracy of more than two significant figures. Figure 5 is a graph of R(d) for the noisy doublet in Figure 1A. The scale of R(d) in this and the following figures is arbitrary. Apart from a proportionality constant, R(d) is the joint posterior pdf of all the parameters maximized with respect to the linear parameters. The posterior pdf of d and R(d) are not the same, but they do have similar characteristics. In particular, the most probable value of d corresponds to the maximum of R(d). The most probable value of d in Figure 5 is 20.1, which is relatively near the true value of 20. The true value of d is indicated by a vertical line segment in Figure 5 and the following figures. Most of the posterior density lies between separations of 19 to 21 abscissa units. Throughout this range, Pr(B > 0|d) is near unity. Therefore, Pr(B > 0) is virtually unity and a secondary peak is almost certain to be present.
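The grid procedure just described can be sketched end to end. In the Python sketch below (function names ours), the normal-equation algebra is delegated to NumPy's least squares and the linear-model covariance σ²(MᵀM)⁻¹; R(d) and Pr(B > 0|d) are evaluated at equispaced values of d and combined as a weighted average, with R(d) rescaled in log space before exponentiating to avoid underflow.

```python
import numpy as np
from math import erf, sqrt

def lorentzian(t, w):
    return 1.0 / (1.0 + 4.0 * (t / w) ** 2)

def prob_secondary_peak(x, t, t0, w, sigma, d_grid):
    """Pr(B > 0) as the crest-factor-weighted average of the
    conditional probabilities Pr(B > 0 | d)."""
    log_R, pr_cond = [], []
    for d in d_grid:
        M = np.column_stack([lorentzian(t - t0, w),
                             lorentzian(t - t0 - d, w),
                             np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(M, x, rcond=None)
        isd = float(np.sum((x - M @ coef) ** 2))
        # Marginal variance of B from the normal equations.
        cov = sigma ** 2 * np.linalg.inv(M.T @ M)
        B_hat, sigma_B = coef[1], sqrt(cov[1, 1])
        log_R.append(-isd / (2.0 * sigma ** 2))      # log of the crest factor
        pr_cond.append(0.5 * (1.0 + erf(B_hat / (sigma_B * sqrt(2.0)))))
    R = np.exp(np.array(log_R) - max(log_R))         # rescale to avoid underflow
    return float(np.sum(R * np.array(pr_cond)) / np.sum(R))

# Demo: a clear noise-free doublet gives a probability of essentially one;
# with no secondary peak at all it falls to about one half.
t = np.arange(256.0)
grid = np.arange(5.0, 40.0, 2.5)
x1 = 1000.0 * lorentzian(t - 100.0, 20.0) + 500.0 * lorentzian(t - 120.0, 20.0)
x0 = 1000.0 * lorentzian(t - 100.0, 20.0)
p1 = prob_secondary_peak(x1, t, 100.0, 20.0, sigma=61.0, d_grid=grid)
p0 = prob_secondary_peak(x0, t, 100.0, 20.0, sigma=61.0, d_grid=grid)
```

The no-secondary case returning roughly 0.5 reflects the flat prior on B: with no information in the data, positive and negative heights are equally probable.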


Table I. Performance of Resolution Methods

Spectrum No.    B      d     Deconvolution   Second derivative   Cross correlation   Least squares   Bayesian
1              1000    20    Yes(a)          Yes                 Yes                 1(b)            1
2               500    20    Yes             Yes                 Yes                 1               1
3              1000    10    No              No                  Yes                 1               1
4               500    10    No              No                  Yes                 1               1
5               100    20    No              No                  Yes                 1               1
6               100    10    No              No                  No                  1               1
7              1000    2.5   No              No                  No                  0.986           0.998
8               500    2.5   No              No                  No                  0.86            0.97
9               100    2.5   No              No                  No                  0.67            0.77

(a) Yes means doublet obviously present to eye of observer. (b) Probability that a doublet is present. If greater than 0.95, may be taken to mean yes.


RESULTS AND DISCUSSION

Results of the five methods of resolution for the detection of a secondary peak in Lorentzian doublets are presented in Table I. For the deconvolution and second-derivative methods, a secondary peak was considered present when, in the subjective opinion of the observer, a valley between the two peaks was evident and the height of the secondary peak was significantly greater than the height of peaks in the noise. For the modified cross-correlation method, a doublet was considered present when a peak significantly higher than the noise could be found. Results obtained by the least-squares and Bayesian methods are expressed in Table I as probabilities for the presence of a secondary peak. A one in Table I refers to a probability greater than 0.999. If the probability of a secondary peak is greater than 0.95, one may safely say that the spectrum contains a doublet. The probability level at which this decision is made, of course, can be altered to suit the application of the result. The data in Table I indicate that the Bayesian method outperforms all of the other methods. This, however, should not be taken as a recommendation for the Bayesian method, for two reasons. First, practical factors, such as simplicity and computer requirements, play a part in the choice of a method. The Bayesian method fails in this regard. Second, the comparison is unfair. Although the methods using visual statistics are certainly subjective, the least-squares and Bayesian methods contain subjective elements also. The choice of a decision level is subjective. If the decision level were taken to be a number much closer to one than 0.95, the cross-correlation method could arbitrarily be made to appear superior. The choice of a prior pdf in the Bayesian method is subjective also. The least-squares method is subjective in assuming a Gaussian sampling distribution and in making inferences about true values from estimated values (17).
The data in Table I are unfair also in that better statistics could have been used for the deconvolution, second-derivative, and cross-correlation methods. For example, asymmetry of peaks could be considered (18). To ensure fairness in a comparison, only the best of the myriad possible statistics should be used. However, Table I does provide a useful comparison of the resolution methods as they are normally implemented.

Spectrum 9 in Table I presents an extremely difficult resolution problem. The height of the secondary peak for

(17) H. Jeffreys, "Theory of Probability," 3rd ed., Clarendon Press, Oxford, England, 1961.
(18) E. Grushka and G. C. Monacelli, Anal. Chem., 44, 484 (1972).

Figure 6. Crest factors R(d) (solid lines) and probability of B > 0 given d, Pr(B > 0|d) (dashed lines), for Spectrum 9 in Table I. See text for discussion

Spectrum 9 is %O the height of the principal peak and less than twice the standard deviation of the noise. The distance between the peaks is I/s of their widths. Nevertheless, Spectrum 9 provides the greatest understanding and insight into the meaning of the results obtained by both the least-squares and Bayesian methods. The solid line in Figure 6A represents the crest factor, R(d), and the dashed line represents the conditional probability of a positive peak, Pr(B > 0|d). The scale for the conditional probability curves in Figure 6 is such that the maximum of each curve is almost one. The area under the product of the solid and dashed lines in Figure 6A is proportional to the probability of the presence of a secondary peak located anywhere within the range of d in the graph (see Equation 13).

The graph of R(d) in Figure 6A is typified by a primary maximum and two small subsidiary maxima. The least-squares result in Table I for Spectrum 9 was calculated for the value of d at the primary maximum of R(d) in Figure 6A. If the least-squares method were applied blindly, however, it could have centered on either of the subsidiary maxima. As the range of d in Figure 6 is increased, the probability of eventually finding a peak-like section of noise giving a maximum of R(d) even greater than the primary maximum in Figure 6A would increase to certainty. In fact, it will be shown that most of the primary peak in Figure 6A is due to just such a noise peak.

The sharp drop of the dashed line in Figure 6A is typical of the behavior of the conditional distribution, Pr(B > 0|d), near d = 0. If there is an indication of a positive peak on one side of the principal peak [Pr(B > 0|d) near unity], there must be just as strong an indication of a negative peak [Pr(B > 0|d) near zero] on the opposite side of the principal peak.
This follows from a consideration of the values of A and B required to fit the composite spectrum as d varies from negative to positive values. At exactly zero separation, B is undefined and its variance is infinite; that is, the spectrum cannot tell us anything about exactly superimposed peaks.

ANALYTICAL CHEMISTRY, VOL. 46, NO. 14, DECEMBER 1974 * 2135

Figure 6B is the graph of R(d) (solid line) and the conditional probability of a positive peak (dashed line) for a simulated spectrum equivalent to Spectrum 9 in Table I except that the position of the principal peak has been shifted approximately 16 abscissa units to the left. Although the true spectra in Figures 6A and 6B are identical, the posterior distributions are quite different. The primary maximum of R(d) in Figure 6B corresponds to the same position with respect to the noise as the primary maximum in Figure 6A. Since the primary maximum in Figure 6B is far from the true location of the secondary peak, the primary maxima in both Figures 6A and 6B are due to a peak-like section of noise.

The least-squares method applied to Figure 6B would calculate Pr(B > 0) for the value of d at the primary maximum of R(d). However, the primary maximum, as mentioned above, is due to a peak-like section of noise; hence, any information provided by the least-squares method about the presence of a secondary peak would be erroneous. The Bayesian method applied to the spectrum giving Figure 6B results in a value of 0.79 for the probability of a secondary peak. At a probability decision level of 0.95, the Bayesian method would incorrectly state that a secondary peak was not present. Nevertheless, the Bayesian method correctly gives the probability that a secondary peak was responsible for, or was the cause of, the observed spectrum. The probability is low because a peak-like section of noise could also have caused the observed spectrum. The Bayesian probabilities for Figures 6A and 6B are approximately the same (0.77 and 0.79).

A search for a secondary peak within an increasing range of separation most clearly points out the difference between the Bayesian and least-squares approaches. As the range increases, Pr(B > 0) for the Bayesian method will slowly approach one half even if a large, easily resolved secondary peak is present. This will occur because the number of positive and negative peak-like sections of noise will be approximately the same.
As the range increases, Pr(B > 0) for the least-squares method, on the other hand, will rapidly approach unity. This will occur because a large, positive, peak-like section of noise will most certainly be encountered.
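The conditional quantity Pr(B > 0|d) discussed above can be approximated by a linear least-squares fit of the composite model A·f(x) + B·f(x − d) at each trial separation d. The sketch below is a minimal modern illustration under the paper's own assumptions (Gaussian sampling distribution, known noise standard deviation); the peak parameters are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def lorentzian(x, x0, fwhm=10.0):
    """Unit-height Lorentzian line shape."""
    return 1.0 / (1.0 + ((x - x0) / (fwhm / 2.0)) ** 2)

def prob_positive_B(x, y, d, sigma):
    """Fit y = A*f(x) + B*f(x - d) by linear least squares and return the
    (approximately Gaussian) probability that B > 0 given the separation d.
    Note: at d = 0 the two columns coincide and F'F is singular -- B is
    undefined and its variance infinite, matching the text's remark about
    exactly superimposed peaks."""
    F = np.column_stack([lorentzian(x, 0.0), lorentzian(x, d)])
    coeffs, *_ = np.linalg.lstsq(F, y, rcond=None)
    cov = sigma**2 * np.linalg.inv(F.T @ F)   # covariance of (A, B)
    b_hat, b_se = coeffs[1], sqrt(cov[1, 1])
    return 0.5 * (1.0 + erf(b_hat / (b_se * sqrt(2.0))))  # Gaussian CDF at 0

rng = np.random.default_rng(1)
x = np.arange(-30.0, 30.0, 0.5)
sigma = 3.0
y = 100.0 * lorentzian(x, 0.0) + 5.0 * lorentzian(x, 2.0) \
    + rng.normal(0.0, sigma, x.size)

# As described in the text, an apparent positive peak at +d tends to be
# mirrored by an apparent negative peak at -d when the peaks are close.
p_plus = prob_positive_B(x, y, 2.0, sigma)
p_minus = prob_positive_B(x, y, -2.0, sigma)
print(round(p_plus, 3), round(p_minus, 3))
```

The large standard error of B for small |d| reflects the near-collinearity of the two model columns, which is why closely spaced doublets are so hard to resolve.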

Figure 6C was obtained for a test spectrum equivalent to the one giving Figure 6B except that the secondary peak is absent (the true value of B is zero). The least-squares method would focus on the primary maximum of R(d) and again provide an erroneous answer, even though it gives Pr(B > 0) = 0.987. The Bayesian method applied to this spectrum gives the probability of a secondary peak as 0.80, which correctly states that a secondary peak is not present at a probability decision level of 0.95.

The Bayesian results for Figures 6B and 6C are approximately the same. The small secondary peak so close to the principal peak is overwhelmed by noise. One would expect that the probability of a secondary peak for Figure 6C, where the secondary peak is absent, would be less than that for Figure 6B, where the secondary peak is present. This apparent contradiction arises from the use of a Bayesian result for one question (the probability of a secondary peak anywhere within a broad region near the principal peak) to answer a different question (the probability of a secondary peak near the location of the true secondary peak). The probability of a secondary peak for given values of d is indeed less in Figure 6C (dashed line) than in Figure 6B in the region of the true secondary peak.

In summary, for the resolution of a Lorentzian doublet in white, signal-independent noise, the methods studied here may be ranked in order of decreasing effectiveness as Bayesian, least-squares, and modified cross-correlation. The deconvolution and second-derivative methods are approximately equally effective. Since the Bayesian method does not rely on any particular model of a system, the superiority of the Bayesian method will persist for other peak models and for other types of noise.
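The contrast drawn above, between locking onto the maximum of a statistic over an ever-wider search range (the least-squares behavior) and averaging over the whole range (the Bayesian behavior), can be illustrated on pure noise. The logistic squashing below is only a crude hypothetical stand-in for Pr(B > 0|d), not either method as the paper implements it.

```python
import numpy as np

rng = np.random.default_rng(2)
width = 5
# A pure-noise "spectrum": any peak-like structure found in it is an artifact.
noise = rng.normal(0.0, 1.0, 2000)

# Crude stand-in for Pr(B > 0 | d): the z-score of the local mean around a
# trial position d, squashed to (0, 1). Positive noise bumps push it toward
# one, negative bumps toward zero.
def p_positive(d):
    window = noise[d - width:d + width]
    z = window.mean() * np.sqrt(window.size)
    return 1.0 / (1.0 + np.exp(-2.0 * z))

for span in (50, 200, 1000):
    probs = np.array([p_positive(d) for d in range(width, span)])
    # Max over the range mimics the least-squares habit of locking onto the
    # best-looking position; the mean mimics Bayesian averaging over d.
    print(span, round(float(probs.max()), 3), round(float(probs.mean()), 3))
```

As the search span grows, the maximum climbs toward one (a positive peak-like stretch of noise is eventually found) while the average stays near one half (positive and negative noise bumps balance), mirroring the behavior described in the text.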

Received for review August 22, 1973. Accepted August 9, 1974. Presented in part at the 11th National Meeting, SAS, Dallas, Texas, September 1972. Financial support by the University of Alberta and the National Research Council of Canada is gratefully acknowledged.

Comparison of Backscattering Spectrometry and Secondary Ion Mass Spectrometry by Analysis of Tantalum Pentoxide Layers

W. K. Chu, M-A. Nicolet, and J. W. Mayer
California Institute of Technology, Pasadena, Calif. 91125

C. A. Evans, Jr.
University of Illinois, Materials Research Laboratory, Urbana, Ill. 61801

Samples of anodic tantalum oxide films grown on Ta have been analyzed by backscattering spectrometry (BS), and the results compared with those obtained on the same samples by secondary ion mass spectrometry (SIMS). Very good agreement was found in the determination of the film thicknesses. BS supplies this thickness information directly from energy loss data in a non-destructive fashion, whereas SIMS provides relative concentration profiles by destructive sputtering. SIMS can identify the atomic species in the film, while BS can furnish quantitative information on relative atomic concentration. The comparison stresses the complementary nature of these two analytical techniques.

An increasing interest in the characterization of surfaces and thin solid films has led to the development of several analytical techniques employing the interaction of a probing particle and the sample to be analyzed for physical and/or chemical characterization of the sample. To be effective
