Additional capabilities of a fast algorithm for the resolution of spectra

Philips Laboratories, North American Philips Corporation, 345 Scarborough Road, Briarcliff Manor, New York 10510. Roger B. Searle. Department of Compu...
0 downloads 0 Views 532KB Size
Anal. Chem. 1988, 60, 1896-1900

1896

Additional Capabilities of a Fast Algorithm for the Resolution of Spectra Richard A. Caruana Philips Laboratories, North American Philips Corporation, 345 Scarborough Road, Briarcliff Manor, New York 10510 Roger B. Searle Department of Computer Science, Villanova University, Villanova, Pennsylvania 19085 S . I. Shupack*

Department of Chemistry, Villanova University, Villanova, Pennsylvania 19085

I n thls paper we expand our previous work on the fast resolutlon of spectra Into Gaussian bands by extending the algorlthm to the resolutlon of Lorentzlan bands as well. Tests demonstrate the ablllty of the algorlthm to qulckly resolve spectra contalnlng both Lorentrlan and Gaussian bands. We present a unlque approach to determlnlng band shape that enables the algorithm to select automatically between Gausslan or Lorentzlan contours for bands of unknown shape. Tests of the automatlc band shape selection capablllty Indlcate that It works for all but the most dlfflcult spectra and Incurs llttle tlme penalty. We also report on experlments investlgatlng the effects of wall placement and area renormallration on flt quality In the presence of nolse and make recommendatlons for the placement of walls and use of area renormalization.

We previously presented an algorithm that resolved spectra into Gaussian bands by applying an efficient linear leastsquares analysis to transformed portions of the spectra (1). The main advantages of this method are that it is faster than conventional error space minimization techniques and is not memory intensive. Also, for many spectra it allows a simpler, more intuitive initial parameterization than is usually required by other algorithms. In this work we report on a transformation and fit scheme for Lorentzian bands similar to that developed for Gaussian bands. We demonstrate the procedure by resolving a 16-band partial infrared spectrum of 5P-androstan-17-one. We also present a novel approach to band-shape characterization that automatically chooses between a Gaussian or Lorentzian profile for each band of unknown shape. It is attractive because it does not significantly impair the resolution algorithm’s performance. In addition, we present the results of a study on noisy singlets and discuss wall placement and the quality of fit obtained in the presence of noise. RESOLVING LORENTZIAN BANDS A Lorentzian band may be described by the following: -

Y =

AREA.WIDTH *(WIDTH2 + (X - CENTER)2)

(1)

Although strictly speaking it is inappropriate to speak of the area of a Lorentzian band because it is undefined, we shall use the term AREA to refer to the usual mixing constant. This will later be related to areas measured directly from spectra. CENTER refers to the center of symmetry of the band, that is the point at which the band achieves maximum height (a Lorentzian band has no true mean). WIDTH refers to the

dispersion constant. It is analogous to the standard deviation of a Gaussian distribution. A plot of l l y vs x for Lorentzian data yields a quadratic in x l/y = a

+ bx + cx2

(2)

where

WIDTH2 + CENTER2 AREA-WIDTH -2rCENTER b= AREA-WIDTH

a=*

c=

(3)

(4) (5)

AREA.WIDTH

A least-squares fit to the transformed data yields estimates for the parabola parameters a , b, and c from which the three parameters of the Lorentzian band are calculated

WIDTH =

* (

a

b CENTER = -2c

AREA =

)

- c ( b / ( - 2 ~ ) ) ~’”

(6)

(7) (8)

This transformation and least-squares fit provide a fast, noniterative means of determining the parameters of an isolated Lorentzian band. As with the transformation and fit provided for the Gaussian band (I),it yields the true parameters only in the absence of noise or deviations from the Lorentzian shape (2). When the noise or deviation from Lorentzian shape is small, however, the estimates obtained by this procedure are accurate and require no further optimization. This method of parameterizingan isolated Lorentzian band is applied to spectra containing overlapped bands by applying the transformation and fit to the regions of the spectrum where each band is least perturbed by neighboring bands. This yields estimates for the parameters of all of the bands in the spectrum. These estimates are then used to correct each of the regions for the perturbations of the other bands. New estimates can then be obtained by applying the transformation and fit to the corrected regions. This process is iterated until the operator judges that adequate convergence has been reached. This cycle is identical with the one we used to resolve spectra into Gaussian bands. See ref 1 for a more detailed

0003-2700/8S/0360-1896$01.50/00 1988 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 60, NO. 18, SEPTEMBER 15, 1988

1897

Table I. Parameterization after 50 Iterations with the Second Wall Set band 1

2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

wall placement left right 1126.4 1110.5 1102.8 1089.3 1072.4 1061.8 1056.5 1046.9 1034.8 1019.4 1012.1

1132.6 1118.2 1108.5 1097.5 1080.6 1066.6 1061.3 1054.6 1043.0 1023.7 1017.0 1009.7 990.0 981.8 970.7 956.7

1o00.1 984.7 977.9 964.4 951.4

mean

width

area

height

RMSE"

1.26

1130.90 1114.21 1105.53 1093.43 1076.80 1064.37 1059.00 1050.69 1039.19 1021.72 1014.76 1004.59 987.69 979.81 967.21 952.85

3.64 5.42 2.15 3.20 6.31 3.21 3.81 2.78 3.41 3.22 2.26 3.25 2.56 1.52 1.97 2.19

0.586 0.683 0.229 1.473 0.404 0.410 0.647 2.635 1.273 0.595 0.932 2.923 0.466 0.032 0.240 0.484

0.0513 0.0401 0.0340 0.1467 0.0204 0.0406 0.0541 0.3028 0.1188 0.0588 0.1311 0.2864 0.0579 0.0066 0.0387 0.0701

0.0011 0.0006 0.0007 0.0041 0.0012 0.0024 0.0022 0.0016 0.0047 0.0017 0.0018 0.0161 0.0007 0.0009 0.0017 0.0046

0.996 0.987 0.986 0.990 0.801 0.991 0.980 0.999 0.979 0.959 0.999 0.976 0.997 0.759 0.968 0.985

" Root mean squared error. *The square of the standard regression coefficient. presentation of the algorithm. The area renormalization procedure we introduced for Gaussian bands (1)is easily applied to spectra containingboth Gaussian and Lorentzian bands. Given the parameters of a Lorentzian band, the area contributed by this band to the region between the walls at points w1 and w2 can be calculated from the formula for the definite integral of a Lorentzian AREA (9) = *[arctan (2,) - arctan (z,)]

L:

0.35

0.30

a

u

e6

0.25.

'

0.20' 0 .

0"

0.15.

Q

. 0.10. 0.05.

0.00

where

n

wi - CENTER

zi=

WIDTH

(10)

Area renormalizationis performed by multiplying each band's area estimate by the ratio of the sum of the areas measured in all the walled regions of the spectrum to that predicted by the parameter estimates AREAxneamred

Foat Algorithm

Steepeat Descent

Aj = A .

'

AREApredicted

where Aj is the area estimate for band j computed by the quadratic fit, A R E A p r d d is the sum of the predicted areas in all the walled regions computed by using the current parameter estimates and the appropriate definite integral, and AREA,,,& is the s u m of the areas in all the walled regions measured by using numerical integration. Area renormalization yields a new parameterization where the area predicted by the parameter estimates exactly matches the total area measured in the walled regions.

TEST RESOLUTION Figure 1 presents a 16-band partial infrared absorbance spectrum of the steroid 50-androstan-17-one in carbon disulfide solution. The spectrum was collected with a Nicolet Model 6000 Fourier transformation infrared spectrophotometer. It was originally collected with a data interval of 0.03 cm-'. We reduced the data interval to 0.48 cm-' by averaging 16 consecutive data points. This was done to reduce the cm-' interval to about that used in other reports on this steroid (3, 4 ) and to minimize the noise in the spectrum (the sample size was small). We estimate the noise of the resulting spectrum to be between 0.0010 and 0.0014 root mean square (rms) (5). We performed resolutions using three different wall sets. No initial parameter estimates other than wall placement were made. Figure 2 shows the total rms error between the original and fitted spectrum for each wall set as a function of time and

1125

1100

-- -

1075

1

0.02

~

x2

Y

....... 1050

cm

0.03

1025

1000

975

950

-1

Flgure 1. 1&band infrared spectrum of 5&androstan-l7-onein carbon

disulfide solution and the residual errors after the fast algorithm (solid line) and after steepest descent (dotted line). iteration. We selected the first wall set by inspection of the unresolved spectrum. As this resolution shows, the algorithm does not always converge monotonically. We believe that it is this ability of the algorithm to take uphill steps that helps it avoid getting stuck in poor local minima. The second wall set is a refinement of the first set made by narrowing the regions and better centering them about the band centers (Table I). This yielded faster convergence and improved fit. We resolved the spectrum by using a third wall set because the convergence obtained with the second set was so fast that we were concerned that it might be atypical. The third wall set was made by expanding some of the walls of the second set by one or two data points. Table I shows the parameterization after 50 iterations with the second wall set using Lorentzian bands. The algorithm resolved the spectrum to an error of 0.0052 rms with a maximum absorbance error of 0.031 in under 15 min using an IBM AT and Turbo Pascal. (Pascal source code is available upon request.) A steepest descent minimization algorithm started at the 50th iteration parameterization reduced the rms error to 0.0044 absorbance unit and the maximum error to 0.028

1898

ANALYTICAL CHEMISTRY, VOL. 60, NO. 18, SEPTEMBER 15, 1988

Minutes 0.0 n r

10.0

5.0

I

0.002~

I

I

I

1

40

10

15.0

10.0

15.0

30.0

I

I

eraging about 0.75. Much of the residual error is probably due to this Gaussian component. It is also possible that the spectrum contains small unresolved bands. Our results again suggest a significant speedup compared with the algorithms tested by Pithia and Jones if the performance of the different computers (6, 7) and the quality of the initial estimates are taken into account. Differences between the studies make direct comparison difficult, but tests performed so far indicate that the algorithm can resolve many spectra about 5 to 100 times faster than other algorithms when both are given similar initial parameterizations.

S5.0

I

I

80

80

100

Iteration Figure 2. Convergence of three resolutions of the 16-band infrared spectrum using different wall placements. Each symbol represents an iteration.

absorbance unit. Figure 1 also shows the residual error between the fitted and original spectrum and the residual error obtained with steepest descent. The fit rms errors are significantly greater than the estimated noise in the spectrum and the residue plots clearly show patterned bumps. We believe much of this residue is attributable to deviations from the Lorentzian contour. Pithia and Jones ( 3 , 4 )reported on tests made with the same steroid in their evaluation of different resolution procedures. Their study employed a mixed Gaussian-Lorentzian band shape and found the bands to be predominantly but not exclusively Lorentzian in character with Lorentzian-Gaussian ratios av-

1

SINGLET STUDY We performed a singlet study to investigate the effects of noise, wall placement, and area renormalization on the performance of the algorithm. Noisy Gaussian and Lorentzian singlets were created by adding equally distributed pseudorandom noise to synthetic Gaussian and Lorentzian bands with the same heights and dispersions. Eight different noise levels ranging from 0.0005 to 0.1 of the height of the band were used. No attempt was made to decrease the noise in the tail regions of the bands since this would require presuppositions about the type of spectroscopy being employed and the noise characteristics of the instrumentation. The noise added to the Gaussian and Lorentzian bands is identical for each particular noise level to facilitate comparison. Figure 3 shows some of the noisier singlets. A variety of fits to the noisy singlets were performed. (Note that these fits do not require iteration because there are no overlapping bands.) The effects of varying the width of the walled regions and their position symmetric and asymmetric Gaussian Band

1

L o r e n l z i a n Band

1

Gaussian Band

i d

L o r e n t z i a n Band k R M S Noise z 0 1000

-

RMS Noise = 0 0500

-

L o r e n l z i a n Band RMS Noise I 0 0200

-

-

rJ

3

Figure 3. Noisy Gaussian and Lorentzian singlets.

, 7

200.

0,

inn

200.

-1

L

Added Nolrs

20.0

Added Nolre

.-

001 0.100 OD = 0.010 0 0 = 0.001

\

c

.

Gausslan o= Lonntzlan m=

oo= 0.100 OD = 0.010 0 0 = 0.001

\

0-

Gauaalan

o s Lonntrlan

,.,,td-i5.00

;t LL

g

8

1 .oo

I 10

,

I 2.0

0

,

I 3.0

#

I 4.0

,

I 5.0

Wall Position (std. dev. from center)

1 .oo ~~

0.5

1.0

2.0

3.0

4.0

5.0

Wall Position (+- std. dev.) Figure 4. Fit as a function of noise, band type, and wall placement for symmetrically placed walls (left) and asymmetrically placed walls (right). The dotted horizontal lines represent desired behavior.

ANALYTICAL CHEMISTRY, VOL. 60, NO. 18, SEPTEMBER 15, 1988

L o r e nt z i a n S i n g I e t s >.9999

-

c

1

d

s

b

I

9

d

c

b

f

L L ,9995C .9990I . N

2

?0

..

N

Added Nolse

0

abcdefg = o.1000.

.

b

,995.99 q

-

d

+ C

a

Wall Posltlon

2

f

a b c d

0

-I

-

0.0050. 0.0020 0.0010. 0.O005

N

.95-

.9

w=

0

e

t

abcdafg

5

f

f

0.0500

abcdefg = 0.0200, 0.0100

.9995

d

d

Gaussian Singlets

-" .-

*.a

e

1899

Bc

n

..

= +- 0 . ) 8td.d.v. = +- 1.0 8td.d.v. = +- 2.0 8td.d.v.

+-

3.0 8id.d.v.

- 1.0 rM.dn. to cantor = - 2.0 8id.d.r. to cantor g - 3.0 8id.d.v. to oantar

L

3

f

0.0

9 aj I

0.0

.9

r':

t

.99

I

,

,999

I

>.9

B

Gaussian Fit

Figure 5. r 2 as a function of fit type, .. type classification.

I

r':

Gaussian Fit

band type, _ . wall placement, and noise. The dotted lines separate regions of correct and incorrect band

about the band centers are presented in Figure 4. Area renormalization was employed for these experiments. The results indicate that the algorithm yields the best fit when the walls include between *l.O and f3.0 dispersion units of data centered at the band maximum. Walls that are too wide reduce the quality of the fit, particularly with Gaussian bands. The sensitivity to noise in the tails is understandable given the relative height of the noise to the band in these regions. Weighting the points accumulated in the least-squares fit to reduce this sensitivity would probably be effective (2). Weights could be determined dynamically by using the current estimates of the band centers and dispersions. One disadvantage of such a weighting scheme, however, is that it might impair convergence. Dynamic weights would cause the parameterization of a noisy isolated band to require more than one iteration because the weights would have to stabilize before a final answer was obtained. We have not explored applying weights. The results also indicate that regions symmetric about band centers yield better fit than those skewed to one side, although an acceptable fit can often be achieved with skewed walls that do not include too much tail. This result is important because the resolution of highly overlapped bands sometimes requires skewed wall placement. Carius (8) made extensions to our algorithm that dynamically adjusts wall positions as a function of bandwidth and iteration. He uses narrow, symmetrically placed walls in early iterations to improve stability and gradually widens the walls as fit improves. Area renormalization is used mainly to improve the algorithm's convergence properties. Tests performed on the noisy singlets show that renormaliiation almost always yields fit that is better than or equal to that obtained without it. Thus area renormalization aids convergence without reducing the quality of the fit. The improvement in fit is of secondary importance, though certainly welcomed. In general, the fits obtained with walls encompassing fl to f3 dispersion units of data centered at the band maximum yielded rms errors similar to the rms noise added to the spectra. Thus, the quality of the fits obtained by these transformations and fits are good across a wide range of noises; the rms error cannot be expected to be significantly smaller than the added noise. Fits using walls narrower than *1 dispersion unit yielded nonoptimal results, but this is probably due to the reduced number of data points in the sample.

AUTOMATIC BAND SHAPE DETERMINATION Figure 5 shows the rz values obtained by applying both Gaussian and Lorentzian fits to Gaussian and Lorentzian

bands. ? is usually higher for the correct band shape fit. This suggests that the r2 values obtained from Gaussian and Lorentzian fits to an isolated band of unknown type can be used to determine the band type. Failure of this criterion occurred only for singlets containing high noise or using poor wall placement. No classification errors occurred for noises below 0.02 of the band height if the recommended wall placement was used. And few classification errors occurred even with poor wall placement if the noise was small. The data also show that, as expected,1.2 values improve as the noise decreases. This simple scheme of determining band shape is applied to spectra containing overlapped bands by applying both transformation and fits to bands of unknown shape, and selecting, for the duration of the current iteration, the band shape that yields the higher 1.2 value. The band shape selected during an iteration may not be correct, particularly during early iterations where the correction for the perturbation of other bands is likely to be poor. Often, however, the quality of the fit continues to improve and each band's true profile emerges and the proper band shape is selected. Of course the fit may not continue to improve if the band shapes are incorrect, but for spectra that are not too highly overlapped, the difference between the Gaussian and Lorentzian contours is not so great that an early error in shape classification prevents improvement of the fit. Thus the correct band shape classification is often eventually obtained. This approach to classifying band shapes is attractive because it avoids the usual combinatorial explosion created by having to consider all possible combinations of band types during each iteration. Figure 6 shows all combinations of Gaussian and Lorentzian three-band synthetic spectra containing a partially hidden band. We resolved these spectra without specifying any band shapes. The algorithm properly identified each band in each spectrum and converged to the correct solution in about 15 iterations requiring about 1 min per spectrum. Early band shape misclassifications were common, particularly of the obscured band, but by the tenth iteration all bands in each resolution were correctly classified. Resolutions of spectra similar to those in Figure 6 but somewhat more overlapped sometimes failed to classify the obscured band correctly. As expected, the algorithm does not achieve acceptable fit in cases where a band is consistently misclassified. This is indicated by high rms errors, poor r2 values, and patterned bumps in residue plots. An attempt to automatically classify all the bands in the 16-band spectrum of Figure 1 also failed. One band was consistently classified as Gaussian preventing final convergence. (Note that these

1900

ANALYTICAL CHEMISTRY, VOL. 60, NO. 18, SEPTEMBER 15, 1988

1

1

c

1

G

1

L

u 1

L

the resolution of Lorentzian bands. Tests on a 16-bandsteroid infrared spectrum demonstrate that the algorithm is capable of quickly resolving complex spectra. This speed, combined with the algorithm’s modest memory requirements and its ability to resolve many difficult spectra without a full initial parameterization, makes it an attractive technique. Tests performed on noisy Gaussian and Lorentzian singlets show that the quality of the fit obtained with the algorithm is dependent on wall placement. Where possible, walls should be placed symmetrically about the band maximum and should contain between fl and f3 dispersion units of data. The tests also show that the quality of fit obtained with the algorithm on singlets is good across a wide range of data noises. This result is important because it is difficult to analyze theoretically the algorithm’s performance under noise. An efficient automatic band shape determination facility has been developed based on comparison of the r2 values obtained with different band type fits. Tests on noisy singlets show that this procedure is accurate if the noise is small and a reasonable wall placement is used. Tests on synthetic spectra containing a partially obscured band demonstrated its ability to determine band shape without adversely affecting performance. Tests on more difficult spectra, however, show that this approach cannot be used to determine the band shape in all spectra that the algorithm can otherwise resolve.

ACKNOWLEDGMENT We express our warm appreciation to Dr. Jones and to Dr. Flgure 6. Synthetic test spectra coinposed of Gaussians and Lorentzians and containing a partially obscured band.

bands are not pure Lorentzians; they have significant Gaussian components.) Thus, the automatic band selection capability is not able to determine band types for all spectra that the algorithm can resolve given specified band types. For highly overlapped spectra it remains the responsibility of the operator to specify band type. Failure to converge after a reasonable number of iterations usually indicates that operator intervention is required.

CONCLUSIONS We have extended the fast algorithm we previously presented for resolving spectra into Gaussian bands to allow for

Bader, Chairman of Aldrich Chemical Co., Inc., for lending to us the only known sample of P-androstan-17-one. We also thank Pennwalt Co. for the use of their spectrophotometer.

LITERATURE CITED Caruana, Richard A.; Searle. Roger B.; Heller, Thomas: Shupack. Saul I. Anal. Cbem. 1988, 58, 1162-1167. Motulsky, Harvey J.; Ransnas, Lennart A. FASEB J . 1087, 7 , 365-374. Pithia, J.: Jones, Norman R. Can. J . Cbem. 1866, 4 4 , 3031-3050. Plhla, J.; Jones, Norman R. Can. J . Spechosc. 1968, 11. Caruana. Richard A., unpublished work, 1987. Moravec, Hans Mind CbiMen: The Future of Robot and Human Intelligence, in press. Dongarra, Jack, J. Perf. of Various Computers Using Standard Lin. Eqs. Software in a Fortran Environment; MCS-TM-23: Argonne National Lab, 1985. Carius, Wolfgang, private communications, 1987.

RECEWED for review January 26,1988. Accepted May 4,1988.