Anal. Chem. 1996, 68, 2686-2692

Improved Identification of Noisy Spectra Using Higher-Ordered Correlation Spectral Analysis

Samuel P. Kozaitis* and Bruce S. Bradshaw

Division of Electrical, Computer Science, and Engineering, Florida Institute of Technology, 150 West University Boulevard, Melbourne, Florida 32901

We used a higher-order correlation-based method of comparison for spectral identification. Higher-order correlations are an extension of the more familiar second-order cross-correlation function and have the significant advantage that, under certain conditions, they can be shown theoretically to eliminate noise of unknown spectral density. Specifically, we applied a third-order correlation technique to the identification of similar IR spectra in the presence of noise. We were able to reduce the effects of noise on a second-order correlation measurement by further processing the measurement with a third-order autocorrelation. Our results showed that the third-order correlation-based method increased the probability of detection of a spectrum in the presence of noise compared with a second-order technique alone. The probability of detection increased enough at low signal-to-noise ratios that this technique may be useful when a second-order correlation technique is not acceptable. The third-order technique is applicable to a single experiment, but improved results were found by averaging the results of multiple experiments.

Spectral comparison of an unknown sample for identification using cross-correlation techniques is generally limited by noise. Typically, an unknown input spectrum is cross-correlated with known spectra in a library for identification. The closest match in the library can be found by selecting the spectrum that yields the correlation coefficient with the largest value. If the input spectrum is corrupted by noise, its autocorrelation decreases as the signal-to-noise ratio (SNR) decreases. Therefore, the probability of detection of an unknown spectrum within a library decreases as the noise power increases. In addition, when noise is present, the probability of detection is a function of the cross-correlations between the unknown spectrum and the spectra in a library. This means that comparing an unknown spectrum to spectra that are similar to each other leads to a higher probability of error than comparing it to spectra that are significantly different.

The cross-correlation technique has been theoretically shown to yield the highest possible SNR when detecting a spectrum in additive noise whose spectral density is known.1 For maximum SNR, portions of a spectrum where there is substantial noise power are not necessarily used; therefore, the effect of noise is reduced, the SNR increases, and identification becomes easier.

(1) Carlson, A. B. Communication Systems, 2nd ed.; McGraw-Hill: New York, 1975.

2686 Analytical Chemistry, Vol. 68, No. 15, August 1, 1996

In addition, the cross-correlation method has been effective at eliminating cross-interference from other spectra in some cases, such as with IR spectra of gases.2

Cross-correlation techniques are not necessarily optimum for detecting a known spectrum in noise whose spectral density is unknown. This is because the portions of a spectrum where there is substantial noise power are not known, so all portions of a spectrum are used for identification. Although noise is often assumed to be additive and to follow a Gaussian (normal) distribution, its variance is often unknown.

In spite of the effect of noise, spectral comparisons using correlation techniques have been shown to be useful.3-9 In addition, the effect of noise was reduced by considering the symmetry of the autocorrelation function.10 In this technique, the overall shape of the correlation function was found by correlating a noisy spectrum with a known spectrum with little noise. Then, by autoconvolving the cross-correlation pattern, information on the symmetry of the correlation function was obtained. Using measures of the symmetry and magnitude of the cross-correlation function, this technique improved the discrimination against noisy data when applied to high-SNR situations, such as Mössbauer spectroscopy.10

The familiar correlation function is a special case of higher-order correlations, which can also be used as statistical tools. In the 1960s, researchers in several different technical areas studied higher-order correlations in an effort to infer new properties of data, such as skewness or kurtosis. Since then, there has been renewed interest, especially in the area of spectral analysis, with applications in many areas, such as astronomy,11 geophysics,12 and plasma physics.13 The main interest in higher-order correlations stems from the fact that a zero-mean Gaussian distribution has a second-order correlation (autocorrelation) but no higher-order correlations.
Therefore, in situations where an observed function consists of a non-Gaussian signal in additive Gaussian noise, there may be an advantage to examining the function in a higher-order domain. There have been several books14-16 and tutorial papers17-19 published on higher-order correlations, including a bibliography with almost 300 journal articles and books.20

In this paper, we used a higher-order, correlation-based method for spectral comparison to reduce the effect of noise and increase the probability of identification. We compared a noisy input spectrum to a library of spectra using the familiar correlation function, where the input spectrum contained additive noise of unknown spectral density. We then calculated the cross-correlation functions between the input and library. To reduce the effect of noise, we further processed the cross-correlation functions by calculating their third-order correlation coefficient. Because the cumulants of a Gaussian process of order higher than two are zero, the third-order correlation coefficient will not have a contribution from Gaussian noise under certain conditions.16 Therefore, in our approach, we examined correlation coefficients for identification in an environment where the noise had been reduced. Using similar IR spectra, we determined the probability of detection of an input spectrum in the presence of both noise and an interfering spectrum.

In the following section, we briefly discuss the familiar correlation method, higher-order correlations, and our method. Then we use experiments to compare a cross-correlation method to our third-order approach as a function of SNR.

(2) Galais, A.; Fortunato, G.; Chavel, P. Appl. Opt. 1985, 24, 2127-2134.
(3) Betty, K. R.; Horlick, G. Anal. Chem. 1976, 48, 1899-1904.
(4) Lam, R. B.; Isenhour, T. L. Anal. Chem. 1981, 53, 1179-1182.
(5) Lam, R. B.; Wieboldt, R. C.; Isenhour, T. L. Anal. Chem. 1981, 53, 889A-900A.
(6) Lam, R. B.; Sparks, D. T.; Isenhour, T. L. Anal. Chem. 1982, 54, 1927-1931.
(7) Mann, C. K.; Goleniewski, J. R.; Sismanidis, C. A. Appl. Spectrosc. 1982, 36, 223-227.
(8) Kawata, S.; Noda, T.; Minami, S. Appl. Spectrosc. 1987, 41, 1176-1182.
(9) Gemperline, P. J.; Boyer, N. R. Anal. Chem. 1995, 67, 160-166.
(10) Hoffmann, D. P.; Proctor, A.; Hercules, D. M. Anal. Chem. 1989, 61, 898-904.
(11) Freeman, J. D.; Christou, J. C.; Roddier, F.; McCarthy, D. W., Jr.; Cobb, M. L. J. Opt. Soc. Am. 1988, 5, 406-415.
(12) Hinich, M. J.; Clay, C. S. Rev. Geophys. 1968, 5, 347-363.
(13) Kim, Y. C.; Powers, E. J. Phys. Fluids 1978, 21, 1452-1453.

S0003-2700(96)00115-1 CCC: $12.00 © 1996 American Chemical Society

THEORY

Correlation. The correlation between two functions has often been used as a measure of their similarity. In addition, the correlation measure is often normalized so that the result is independent of function scaling. The normalized correlation between two discrete functions a(k) and b(k) is defined as

$$
c(\tau) = \frac{1}{\sqrt{\sum_{k=0}^{N-1} a^{2}(k) \sum_{k=0}^{N-1} b^{2}(k)}} \sum_{k=0}^{N-1} a(k)\, b(\tau + k) = \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\, b(\tau + k) \tag{1}
$$

where τ is the displacement of the two functions and ranges from 0 to N - 1, and N is the number of samples contained in one period. The denominator in eq 1 is constant with respect to τ and was set equal to Cab. This normalization factor, the square root of the product of the function powers, limits c(τ) to values between -1 and +1. Considering the two functions as spectra, eq 1 can be used for their comparison. In this case, τ is usually set to zero to compare the spectra at zero displacement. Typically, a correlation is performed between an unknown input spectrum and each known spectrum in a library. The library spectrum that yields the best match with the unknown input will have the largest correlation coefficient, c(0).

If an input spectrum is an exact match with a library spectrum, then that correlation coefficient is unity, which is the maximum value possible. Correlations when noise is present, or between differing spectra, generally yield values of c(0) less than unity. To find the best match of an unknown spectrum in the presence of noise, the library spectrum with the largest correlation coefficient is chosen. The disadvantage of this approach is that, if two or more library spectra are similar, errors in identification may occur when noise is present.

(14) Nikias, C. L.; Petropulu, A. P. Higher-Order Spectral Analysis: A Nonlinear Processing Framework; Prentice-Hall: New York, 1993.
(15) Rosenblatt, M. Stationary Sequences and Random Fields; Birkhauser: Boston, MA, 1985.
(16) Proakis, J. G.; Rader, C. M.; Ling, F.; Nikias, C. L. Advanced Digital Signal Processing; Macmillan: New York, 1992.
(17) Mendel, J. M. Proc. IEEE 1991, 79, 278-305.
(18) Nikias, C. L.; Raghuveer, M. R. Proc. IEEE 1987, 75, 869-891.
(19) Nikias, C. L.; Mendel, J. M. IEEE Signal Processing Mag. 1993, 10, 10-37.
(20) Delany, P. A.; Walsh, D. O. IEEE Signal Processing Mag. 1994, 11, 61-70.

Spectral Identification Using Correlation. We described a theoretical example of spectral identification using a correlation technique in an attempt to gain insight into the effect of noise. We considered a noisy spectrum that contained zero-mean noise of unknown spectral density. The noisy input spectrum was described as b(k) = a(k) + n(k), where the spectrum was represented by a(k) and contained N samples, and n(k) indicated noise. We considered a reference spectrum that was the same as the input spectrum without the noise; however, the reference spectrum could be any other spectrum with a change of notation. To compare the input and reference, we calculated the familiar normalized correlation function described in eq 1. We were interested in the best match between the input and reference, so we used only the maximum value of the normalized correlation function. The maximum value occurs at zero displacement, or τ = 0 in eq 1, and was written as

$$
c(0) = \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\, b(k) \tag{2}
$$
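The library search built on eq 2 can be illustrated with a minimal Python sketch. The function names (`normalized_correlation`, `best_match`) and the four-sample toy "spectra" are hypothetical and chosen only for illustration; they are not taken from the original work, which used MATLAB.

```python
import math

def normalized_correlation(a, b):
    """Zero-displacement normalized correlation c(0), as in eq 2."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of samples")
    # C_ab: square root of the product of the two function powers
    c_ab = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if c_ab == 0.0:
        raise ValueError("spectra must have nonzero power")
    return sum(x * y for x, y in zip(a, b)) / c_ab

def best_match(unknown, library):
    """Return (name, c(0)) of the library spectrum with the largest c(0)."""
    scores = {name: normalized_correlation(ref, unknown)
              for name, ref in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical toy spectra (not data from the paper)
library = {"A": [1.0, 2.0, 1.0, 0.0], "B": [0.0, 1.0, 2.0, 1.0]}
unknown = [2.0, 4.0, 2.0, 0.0]   # a scaled copy of "A"
name, score = best_match(unknown, library)
```

Because of the normalization, the scaled copy of "A" still gives a coefficient of unity with "A", which is the scale independence described for eq 1.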

Because the noisy input spectrum could be described as a spectrum plus noise, we considered the correlation result as consisting of two parts. One part was the correlation between the noiseless input spectrum and the reference spectrum, and the second part was the correlation between the noise and the reference spectrum. Therefore, because b(k) = a(k) + n(k), eq 2 was written as

$$
c(0) = \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\,[a(k) + n(k)] = \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a^{2}(k) + \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\, n(k) \tag{3}
$$

where we assumed the noise was uncorrelated. Due to the random nature of noise, the normalized correlation coefficient will vary from experiment to experiment when noise is present. The first term on the right-hand side of eq 3 will decrease as the noise power increases because of the normalization factor. The second term will vary depending on the noise power and the cross-correlation between the specific noise sample and the spectrum.

One way to reduce the effect of noise is to make the second term zero. For example, if a(k) is nearly constant, n(k) has zero mean, and N is large, then the last term in eq 3 could be small. However, a spectrum is not usually nearly constant, so this term remains.

It is common to use an averaging process in an attempt to reduce the effect of noise. We considered multiple correlations between a noisy input spectrum and a reference using independent noise samples and averaged the results. We wrote the average of eq 3 as

$$
\hat{c}(0) = E\{c(0)\} = E\left\{ \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a^{2}(k) + \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\, n(k) \right\} = E\left\{ \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a^{2}(k) \right\} + E\left\{ \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\, n(k) \right\} \tag{4}
$$

where E{ } indicated the expected value. The function a(k) represented a spectrum that is deterministic, not a random variable; therefore, E{a(k)} = a(k). Using this fact, eq 4 was written as

$$
\hat{c}(0) = \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a^{2}(k) + \frac{1}{C_{ab}} \sum_{k=0}^{N-1} a(k)\, E\{n(k)\} \tag{5}
$$
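The split of c(0) into a signal term and a noise term (eq 3), and the effect of averaging the noise term (eqs 4 and 5), can be checked numerically. The sketch below is only an illustration under stated assumptions: the single-peak spectrum, the noise level `sigma`, and the number of trials are hypothetical. Here Cab is recomputed from each noisy record, so the averaged noise term is only approximately zero.

```python
import math
import random

def correlation_terms(a, noise):
    """Return c(0) of eq 3 together with its signal term and noise term."""
    b = [x + n for x, n in zip(a, noise)]            # b(k) = a(k) + n(k)
    c_ab = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    signal_term = sum(x * x for x in a) / c_ab
    noise_term = sum(x * n for x, n in zip(a, noise)) / c_ab
    c0 = sum(x * y for x, y in zip(a, b)) / c_ab
    return c0, signal_term, noise_term

rng = random.Random(0)                               # fixed seed for repeatability
# Hypothetical single-peak "spectrum" with 50 samples (an assumption)
a = [math.exp(-((k - 25) / 5.0) ** 2) for k in range(50)]
sigma = 0.5                                          # assumed noise standard deviation
trials = 2000

noise_terms = []
max_gap = 0.0                                        # largest |c0 - (signal + noise)|
for _ in range(trials):
    noise = [rng.gauss(0.0, sigma) for _ in a]
    c0, s_term, n_term = correlation_terms(a, noise)
    max_gap = max(max_gap, abs(c0 - (s_term + n_term)))
    noise_terms.append(n_term)

mean_noise_term = sum(noise_terms) / trials          # averaged second term
rms_noise_term = math.sqrt(sum(t * t for t in noise_terms) / trials)
```

Comparing `mean_noise_term` with `rms_noise_term` shows how much smaller the averaged noise term is than its typical single-experiment size, which is the motivation for averaging given in the text.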

Because we have assumed zero-mean noise, the second term in eq 5 will be equal to zero if enough correlation experiments are averaged. However, it may be impractical to perform a large enough number of correlations. Note that the random entity in the normalization factor in eq 5 is the expected value of the noise power. Under controlled conditions, we can have independent noise samples even though the power of the noise is constant. We use this fact in a later section to determine the probability of detection of a spectrum in the presence of noise.

Higher-Order Correlations. The correlation function as described in eq 1 is a special case of what are called higher-order correlations. Although higher-order correlations have been used for over 30 years, their use has been limited. Recently, there has been renewed interest in several areas, primarily because the noise contribution to a higher-order correlation can be theoretically eliminated under certain conditions, which has been well documented.11-21 The nth-order correlation of the signal a(k) is defined as

$$
a_{n}(\tau_{1}, \tau_{2}, \ldots, \tau_{n-1}) \equiv \sum_{k=0}^{N-1} a(k)\, a(\tau_{1} + k)\, a(\tau_{2} + k) \cdots a(\tau_{n-1} + k) \tag{6}
$$

where the nth-order correlation is a function of n - 1 independent variables.16 For n = 2, eq 6 becomes the second-order correlation of a(k), which is the familiar autocorrelation function. We primarily considered the third-order, or triple, correlation because it has the same advantages for our purpose and is easier to calculate than other higher-order correlations. The third-order correlation, n = 3, of a one-dimensional function is a function of two variables. From eq 6, the third-order correlation of a(k) is

$$
a_{3}(\tau_{1}, \tau_{2}) = \sum_{k=0}^{N-1} a(k)\, a(\tau_{1} + k)\, a(\tau_{2} + k) \tag{7}
$$

where a3(τ1,τ2) is symmetric with respect to its variables τ1 and τ2. There is no distinction between the variables, and their reversal yields the same function. The third-order correlation of a function can be calculated directly from eq 7 by using a shift-and-add technique. For example, Figure 1a shows a function a(k), its second-order autocorrelation is shown in Figure 1b, and its third-order autocorrelation is shown in Figure 1c. Note that, if τ1 (τ2) is set to zero, the second-order correlation, or autocorrelation, appears on the τ2 (τ1) axis in Figure 1c.

Figure 1. Correlation example: (a) example function, (b) its second-order correlation, and (c) its third-order correlation.

Calculation of the Third-Order Correlation Coefficient. The third-order correlation coefficient, a3(0,0), can be found by sampling the triple correlation a3(τ1,τ2) at zero displacement, where τ1 = τ2 = 0. This method of calculating the third-order correlation coefficient may require a large number of computations because the entire third-order correlation is calculated. However, in our work, we were interested in only one sample of the triple correlation, its value at zero displacement. From eq 6, the nth-order correlation at zero displacement was written as

$$
a_{n}(0, 0, \ldots, 0) = \sum_{k=0}^{N-1} a^{n}(k) \tag{8}
$$

Therefore, the nth-order correlation of a(k) at zero displacement is equal to the sum of the nth powers of the samples of a(k). Equation 8 allows higher-order correlations at zero displacement to be calculated relatively easily. Therefore, the third-order correlation coefficient became

$$
a_{3}(0, 0) = \sum_{k=0}^{N-1} a^{3}(k) \tag{9}
$$

which shows that the third-order correlation coefficient a3(0,0) of a(k) can be calculated directly as the sum of the cubes of a(k) from k = 0 to N - 1.

Insensitivity of Third-Order Correlation Coefficient to Symmetric Distributions. For a zero-mean symmetric distribution, the third-order correlation coefficient is zero.14-16 This can be seen from the following simple example. We considered a discrete symmetric distribution of the random variable k, where -N ≤ k ≤ N, with probability density function p(k) as shown in Figure 2. The third-order correlation coefficient from eq 9 was written to reflect the symmetry of the distribution as

$$
k_{3}(0, 0) = \sum_{k=-N}^{N} k^{3} p(k) = \sum_{k=-N}^{-1} k^{3} p(k) + \sum_{k=1}^{N} k^{3} p(k) \tag{10}
$$

If p(k) is symmetric, then p(N) = p(-N). Therefore, the first term on the right-hand side of eq 10 when k = -N, and the last term when k = N, will have the same magnitude but different signs, and their sum will be zero. All other terms will cancel as well when k = -N + 1, -N + 2, ..., in the first term, and k = N - 1, N - 2, ..., in the last term, respectively. Because the Gaussian distribution is symmetric about its mean, the third-order correlation of a zero-mean Gaussian process is zero. We can use this fact to eliminate noise when making a measurement in the presence of Gaussian noise. If a measurement is corrupted by additive, zero-mean Gaussian noise, then the contribution of noise to the averaged third-order correlation coefficient will approach zero for a large number of measurements. Therefore, in the limit, the measurement will not have any contribution from noise.

Figure 2. Symmetric distribution of random variable k, with probability density function p(k).

Spectral Identification Using Higher-Order Correlation. In this section, we describe a method for spectral comparison using third-order correlations. As in our previously described second-order correlation example, we considered a noisy spectrum that contained zero-mean noise of unknown spectral density. Our goal was to compare this spectrum to a library of reference spectra to detect the best match. As before, a reference spectrum was described as a(k) and the noisy input spectrum as a(k) + n(k). We first calculated the familiar second-order correlation between the noisy input and the reference spectra. The resulting correlation function was labeled c(k). We then calculated the third-order autocorrelation coefficient of c(k) in an attempt to reduce the effect of noise. This process is shown in block diagram form for a library of spectra in Figure 3. Using eq 9, the third-order autocorrelation coefficient of the second-order correlation result of a single spectrum was described as

$$
c_{3}(0, 0) = \sum_{k=0}^{2N-1} c^{3}(k) \tag{11}
$$

Note that it is not necessary to normalize the third-order result in eq 11 to detect the best match between a spectrum and a library. The second-order correlation has been normalized, and we are attempting to remove noise from the measurement. The best match to the input spectrum will be the library spectrum that yields the largest third-order correlation coefficient of the second-order correlation.21

Figure 3. Block diagram of third-order correlation classification procedure.

(21) Giannakis, G. B.; Tsatsanis, M. K. IEEE Trans. Acoustics, Speech, Signal Processing 1990, 38, 1284-1296.

Because the noisy input spectrum could be considered as a spectrum plus noise, we considered the output of the second-order correlation as consisting of two parts. One part was the correlation between the input spectrum and a reference spectrum, and the second part was the correlation between the noise and a reference spectrum. The second-order correlation result was written as

$$
c(k) = c_{a}(k) + c_{n}(k) \tag{12}
$$

where ca(k) represented the correlation between the input spectrum and the reference spectrum, and cn(k) represented the correlation between the noise and the reference spectrum. Substituting eq 12 into eq 11, we wrote an expression for the third-order autocorrelation coefficient as

$$
c_{3}(0, 0) = \sum_{k=0}^{2N-1} [c_{a}(k) + c_{n}(k)]^{3} = \sum_{k=0}^{2N-1} c_{a}^{3}(k) + 3 \sum_{k=0}^{2N-1} c_{a}^{2}(k)\, c_{n}(k) + 3 \sum_{k=0}^{2N-1} c_{a}(k)\, c_{n}^{2}(k) + \sum_{k=0}^{2N-1} c_{n}^{3}(k) \tag{13}
$$
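The procedure of Figure 3 (second-order cross-correlation followed by the sum of cubes in eq 11) can be sketched in a few lines of Python. This is a minimal sketch under assumptions that are not from the original work: records are treated as zero outside their N samples, so the full correlation has 2N - 1 displacements; the function names and three-sample toy spectra are hypothetical; and the mean subtraction used in the experiments is omitted to keep the numbers easy to follow. A direct evaluation of eq 7 is included to show that its zero-displacement sample equals the sum of cubes in eq 9.

```python
import math

def cross_correlation(ref, x):
    """Full second-order cross-correlation of two equal-length records,
    normalized by C_ab (square root of the product of the powers)."""
    n = len(ref)
    c_ab = math.sqrt(sum(v * v for v in ref) * sum(v * v for v in x))
    out = []
    for tau in range(-(n - 1), n):            # 2N - 1 displacements
        s = 0.0
        for k in range(n):
            j = k + tau
            if 0 <= j < n:                    # zero outside the record (assumption)
                s += ref[k] * x[j]
        out.append(s / c_ab)
    return out

def third_order_coefficient(c):
    """Triple correlation at zero displacement: sum of cubes (eqs 9 and 11)."""
    return sum(v ** 3 for v in c)

def triple_correlation(a, t1, t2):
    """Direct shift-and-multiply evaluation of eq 7 at (t1, t2), zero padded."""
    n = len(a)
    total = 0.0
    for k in range(n):
        i, j = k + t1, k + t2
        if 0 <= i < n and 0 <= j < n:
            total += a[k] * a[i] * a[j]
    return total

def classify(x, library):
    """Pick the library entry with the largest third-order coefficient."""
    scores = {name: third_order_coefficient(cross_correlation(ref, x))
              for name, ref in library.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy spectra (not data from the paper)
library = {"A": [1.0, 2.0, 1.0], "B": [2.0, 1.0, 1.0]}
best, scores = classify([1.0, 2.0, 1.0], library)
```

Evaluating only the zero-displacement sample keeps the cost linear in the length of c(k), whereas the full triple correlation of eq 7 would require a sum for every pair (τ1, τ2).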

In an attempt to reduce the effect of noise, we considered an averaging process. We considered the same input spectrum with independent noise samples and averaged the third-order autocorrelation value in eq 13 over multiple experiments. The average value of eq 13 was written as

$$
\hat{c}_{3}(0, 0) = E\left\{ \sum_{k=0}^{2N-1} c_{a}^{3}(k) + 3 \sum_{k=0}^{2N-1} c_{a}^{2}(k)\, c_{n}(k) + 3 \sum_{k=0}^{2N-1} c_{a}(k)\, c_{n}^{2}(k) + \sum_{k=0}^{2N-1} c_{n}^{3}(k) \right\} \tag{14}
$$

The function ca(k) is deterministic; therefore, E{ca(k)} = ca(k), and eq 14 was written as21

$$
\hat{c}_{3}(0, 0) = \sum_{k=0}^{2N-1} c_{a}^{3}(k) + 3 \sum_{k=0}^{2N-1} c_{a}^{2}(k)\, E\{c_{n}(k)\} + 3 \sum_{k=0}^{2N-1} c_{a}(k)\, E\{c_{n}^{2}(k)\} + \sum_{k=0}^{2N-1} E\{c_{n}^{3}(k)\} \tag{15}
$$

Because we have assumed zero-mean noise, the second term in eq 15 is equal to zero if we average enough correlation results. If the noise is Gaussian or has a symmetric distribution, then the last term, which is the third-order autocorrelation coefficient of the noise, will also be zero. The third term is a product of signal and noise. If the mean is subtracted from the input and reference spectra, then this term will vanish, and only the triple-correlation coefficient, the first term, will remain. The significance of eq 15 is that, under certain conditions, the noise can be theoretically eliminated from a second-order correlation result by calculating its third-order autocorrelation coefficient.
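The cancellation argument behind eq 10, which is why the last term of eq 15 vanishes for symmetric noise, can be illustrated numerically. The sketch below uses hypothetical distributions on k = -3, ..., 3 and a seeded Gaussian sample; none of these values come from the original experiments.

```python
import random

def third_moment(values, probs):
    """Sum of k^3 p(k) over a discrete distribution, as in eq 10."""
    return sum((k ** 3) * p for k, p in zip(values, probs))

ks = list(range(-3, 4))

# Hypothetical symmetric distribution: p(k) = p(-k)
weights = [1, 2, 3, 4, 3, 2, 1]
probs = [w / sum(weights) for w in weights]
symmetric_moment = third_moment(ks, probs)       # terms at +k and -k cancel

# Hypothetical asymmetric distribution on the same support, for contrast
skew_weights = [1, 1, 1, 2, 3, 4, 4]
skew_probs = [w / sum(skew_weights) for w in skew_weights]
skewed_moment = third_moment(ks, skew_probs)     # no cancellation

# Sample estimate of the third moment of zero-mean Gaussian noise
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
gaussian_estimate = sum(x ** 3 for x in samples) / len(samples)
```

For a finite number of samples the Gaussian estimate is small but not exactly zero, which is consistent with the statement that the noise contribution approaches zero only for a large number of measurements.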

Figure 4. Histogram of correlation values corrupted by Gaussian noise.

There is a significant difference between the second- and third-order methods that is easy to overlook: the nature of the function for which a correlation coefficient is being calculated. In the second-order correlation example, the noise is multiplied by a spectrum a(k). Often, a(k) is rapidly varying and generally not symmetric with respect to k. Therefore, many records of the noise are needed so that the expected value of the noise spectrum is zero. In the third-order method, the cross-correlation of the noise and reference spectrum is cubed, or multiplied by the square of a cross-correlation function in the second term in eq 15. The cross-correlation function is zero at each endpoint and increases to a maximum at its center. In this way, a cross-correlation function can, in a general sense, be thought of as resembling a symmetric function. Therefore, when this function is summed as in eq 15, many of the terms cancel, and the noise may be reduced. The significance is that the effect of noise may be reduced even for a single experiment if the cross-correlation functions are symmetric enough.

Probability of Detection. After a third-order correlation coefficient has been calculated, we need to decide with which reference spectrum it is to be associated. In the absence of noise, the value of the correlation coefficient between an input and a reference spectrum will be equal to a constant. As random noise is added to the input spectrum, the correlation coefficient may fluctuate from experiment to experiment. If at least two reference spectra are similar enough, then an error may be made in assigning the correlation coefficient to a spectrum.

We considered a theoretical example of an input spectrum that matches one spectrum in a library of two reference spectra. We determined the probability of identification of the input spectrum when it was corrupted by noise. We assumed the noise was a zero-mean Gaussian process with variance σ², so its probability density function was described as

$$
p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right] \tag{16}
$$

When independent noise samples are added to the input spectrum, the distribution of normalized second-order correlation coefficients will be approximately Gaussian. Because Gaussian functions are invariant under linear operations,22 the distribution of third-order correlation coefficients for a series of experiments will be approximately Gaussian. This situation is illustrated in Figure 4 for one input spectrum and two different reference spectra. The probability density functions of the third-order correlation coefficients for the two reference spectra were described as p1[c3(0,0)] and p2[c3(0,0)], respectively. For the reference spectrum that matched the input, the mean value was referred to as µ1. The mean of the correlation coefficients associated with the other reference spectrum was referred to as µ2. If each of the spectra is equally likely to occur, then the maxima of the two probability density functions are equal, P1 = P2. If there are only two spectra in our library, then P1 + P2 = 1. Because the distributions may overlap, a spectrum may be incorrectly identified if a correlation value falls between the means.

We may maximize the probability of detection of a spectrum by choosing an optimal threshold T to decide to which distribution a correlation coefficient belongs. If the correlation coefficient is above the threshold, we choose the spectrum associated with p1[c3(0,0)]. If the correlation value is less than the threshold, we choose p2[c3(0,0)]. When choosing p1[c3(0,0)] because the correlation coefficient is above the threshold, for example, there is a probability that the value is actually associated with p2[c3(0,0)]. This error was written as

$$
E_{1}(T) = \int_{T}^{\infty} p_{2}[c_{3}(0, 0)] \, \mathrm{d}c_{3}(0, 0) \tag{17}
$$

Similarly, if a correlation value was below T, then the identification error is

$$
E_{2}(T) = \int_{0}^{T} p_{1}[c_{3}(0, 0)] \, \mathrm{d}c_{3}(0, 0) \tag{18}
$$

The optimum threshold T for minimizing the probability of misidentification when each spectrum is equally likely to occur is simply the average of the means.23 Therefore, if we know the mean values and probability density function of the correlation coefficients, we can assign a correlation coefficient to a spectrum with a specific probability of detection.

EXPERIMENTAL SECTION

We investigated the effect of the third-order correlation technique on the probability of detection of noisy spectra of toluene when deciding between toluene and benzene, whose spectra are shown in Figure 5. The spectra were obtained from a Nicolet 20SX FT-IR spectrometer and digitized to 416 samples between 1700 and 4800 wavenumbers. Each sample had a resolution of 256 levels. The correlation experiments were performed using MATLAB on a Macintosh IIx. We performed second- and third-order correlations as described earlier for different amounts of computer-generated additive noise. We subtracted the means from the spectra before performing correlation experiments to obtain the advantages of the third-order technique, as indicated in eq 15. The cross-correlation between the toluene and benzene spectra was found to be 0.738. To determine the statistics of the correlation values, we performed 100 correlation experiments between the spectrum of toluene and each library spectrum for each of the second- and third-order correlations at 14 different SNR values within ±10 dB. For each experiment at a particular SNR, we used independent noise samples that were normally distributed.
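The noise protocol and the threshold rule of eqs 17 and 18 can be sketched as follows. This is a minimal sketch under several assumptions that the text does not state explicitly: SNR is taken as the ratio of mean signal power to noise variance in decibels; the error integrals use Gaussian tails extending to infinity on both sides rather than the lower limit of 0 in eq 18; and the probability of detection is taken as one minus the average of the two error probabilities for equally likely spectra. All function names are hypothetical.

```python
import math
import random

def subtract_mean(x):
    """Remove the mean of a record, as done before the correlations."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def add_gaussian_noise(signal, snr_db, rng):
    """Add zero-mean Gaussian noise at an assumed SNR definition:
    mean signal power / noise variance = 10^(snr_db / 10)."""
    power = sum(v * v for v in signal) / len(signal)
    sigma = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [v + rng.gauss(0.0, sigma) for v in signal], sigma

def gaussian_tail(x, mu, sigma):
    """P(X > x) for X normally distributed with mean mu and std sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

def detection_probability(mu1, sigma1, mu2, sigma2):
    """Threshold at the average of the means (assumes mu1 >= mu2 and
    equally likely spectra); returns (T, probability of detection)."""
    t = 0.5 * (mu1 + mu2)
    e2 = 1.0 - gaussian_tail(t, mu1, sigma1)   # matching spectrum falls below T
    e1 = gaussian_tail(t, mu2, sigma2)         # other spectrum falls above T
    return t, 1.0 - 0.5 * (e1 + e2)

rng = random.Random(2)
clean = subtract_mean([0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 0.0, 1.0])  # toy record
noisy, sigma_0db = add_gaussian_noise(clean, 0.0, rng)

# mu2 = 0.738 is the toluene/benzene cross-correlation quoted in the text;
# the standard deviation 0.1 is a made-up value for illustration only.
threshold, p_detect = detection_probability(1.0, 0.1, 0.738, 0.1)
```

In practice the means and standard deviations passed to `detection_probability` would be estimated from the repeated correlation experiments at each SNR; when the two means coincide, the rule returns a probability of 0.5, which corresponds to guessing.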

(22) Clarkson, P. M. Optimal and Adaptive Signal Processing; CRC Press: Boca Raton, FL, 1993.
(23) Gonzalez, R. C.; Woods, R. E. Digital Image Processing; Addison-Wesley: New York, 1992.

Figure 5. Spectra of (a) toluene and (b) benzene.

RESULTS AND DISCUSSION

The effect of Gaussian zero-mean noise on the correlation heights of the second-order and third-order methods as a function of SNR is shown in Figure 6. To compare the two methods, the third-order data in Figure 6 were normalized by multiplying the data by a factor that gave the same result as the second-order method when noiseless data were used. Each point in Figure 6 represents the mean of 100 correlation coefficients. The results showed that the second-order correlation coefficients decreased significantly with decreasing SNR. In contrast, the results of the third-order correlation experiment remained relatively constant.

Figure 6. Graph of autocorrelation and cross-correlation values for second- and third-order correlations as a function of SNR.

The standard deviations of the correlation coefficients for the autocorrelation experiments are shown in Figure 7. The third-order correlation values showed a larger standard deviation than the second-order values, and this standard deviation increased significantly as the SNR decreased.

Figure 7. Graph of the standard deviation of autocorrelation values for second- and third-order correlations as a function of SNR.

Ultimately, we were interested in the probability of detection of a spectrum in the presence of noise. We needed to set a threshold value for correlation heights from which we could derive a decision rule as to which substance a noisy input spectrum belonged. We considered either spectrum equally likely to occur, so the optimum threshold was the average of the autocorrelation and cross-correlation values of toluene. Such a threshold yields equal error probabilities for each spectrum. The decision rule was as follows: if a correlation value was found to be larger than the threshold, then we considered the input to be toluene; if it was less than this value, then we considered the input to be benzene.

Using the statistics from 100 experiments for each SNR value, and considering the diagram in Figure 4, we integrated over the regions of the probability of error for each value of SNR for both the second- and third-order experiments. Subtracting this value from unity gave us the probability of detection, which is shown as a function of SNR in Figure 8. Our results showed that, for an SNR of ∼5 dB, the second-order correlation had about a 90% probability of detection. Below SNR = 5 dB, the probability of detection dropped rapidly to 50%, which is not useful. Therefore, the second-order correlation technique was generally useful for discriminating between toluene and benzene at values of SNR > 5 dB. The third-order correlation technique indicated a 90% probability of detection at about SNR = 2 dB for a single experiment. At lower values of SNR, the probability of detection decreased, but less rapidly than in the second-order case.

Figure 8. Graph of the probability of detection of toluene when deciding between toluene and benzene as a function of SNR for zero-mean Gaussian noise.

In addition, we considered averaging the results of experiments of both the third-order and second-order techniques. In these cases, correlation coefficients were obtained by averaging the results of 2 and 6 experiments, respectively, using independent noise samples. As before, we performed 100 sets of experiments at each of 14 values of the SNR to determine the statistics of the correlation coefficients. The probabilities of detection using these statistics for multiple experiments are also shown in Figure 8. The results showed that, by averaging multiple records, the probability of detection was increased for the third-order technique. The small number of experiments did not significantly improve the results of the second-order technique.

We also considered the effects of nonstationary noise on the probability of detection. We allowed the mean value of the noise to vary in a Gaussian distribution around zero; therefore, the noise consisted of a dc and an ac portion. At a particular SNR value, the dc and ac portions were random variables, and the sum of their powers added to a particular noise power. The distribution of the noise was characterized by the fact that it contained only dc power when the mean was six standard deviations from zero. As in previous experiments, we performed 100 correlation experiments between the spectrum of toluene and each library spectrum for each method at 14 different SNR values within ±10 dB to determine the statistics of the correlation values. The results are shown in Figure 9 and are similar to the results of the stationary case. The probability of detection of the third-order method was generally only slightly less than in the stationary case for all data. The results of the third-order technique remained higher than those of the second-order method in all cases.

Figure 9. Graph of the probability of detection of toluene when deciding between toluene and benzene as a function of SNR for nonstationary noise.

Shot noise is a fundamental noise that exists in all optical detection processes. Although the average number of photons arriving on a detector may be known, the actual number of photons is not known. The deviation of the actual value from the average is shot noise. When only shot noise is present, the SNR increases linearly with optical power. The results of correlation experiments using only shot noise are shown in Figure 10. The probability of detection of the third-order method was generally not as high as with the other types of noise tested. However, the results of the third-order technique generally remained higher than those of the second-order method.

Figure 10. Graph of the probability of detection of toluene when deciding between toluene and benzene as a function of SNR for shot noise.

CONCLUSION

We were able to reduce the effects of Gaussian-distributed noise on a second-order correlation response by further processing the response with a third-order autocorrelation. The noise reduction allowed an increase in the probability of detection of a noisy spectrum of toluene when deciding between toluene and benzene, compared to the second-order technique alone. In addition, the probability of detection was increased at low SNR values, so this technique may be useful when a second-order correlation technique is unacceptable. The third-order technique is applicable to a single experiment, but improved results were found by averaging the results of multiple experiments.

Received for review February 5, 1996. Accepted May 27, 1996.

AC960115E

Abstract published in Advance ACS Abstracts, July 1, 1996.