Boolean logic system for infrared spectral retrieval - Analytical


Anal. Chem. 1983, 55, 1288-1291

in Figure 5. The mean peak current found was 4.50 nA with a range of 4.25-4.85 nA. The relative standard deviation over the complete series was 4.2%. The precision obtained with the flow injection system compares favorably with the precision reported for analogous batch preconcentration approaches (6, 8, 10), and illustrates the absence of carry-over effects. Thus, reproducible data are achievable for a single species, present at the micromolar concentration level in complex samples, such as urine. Chlorpromazine peaks from the diluted urine are only slightly lower than those from aqueous samples (e.g., compare Figures 2 and 5).

Figure 5. Repetitive injections of a diluted (1 + 3) urine solution spiked with 8 × 10^- M chlorpromazine. Conditions were the same as those given in Figure 1b except that the flow rate was 0.72 mL/min.

In conclusion, we have shown in this paper that voltammetry can be utilized for selective detection of adsorbable analytes in flow injection systems. Solution-phase electroactive species with similar redox potentials do not interfere due to the medium exchange nature of the flow injection systems. At the same time high sensitivity is maintained as a result of the preconcentration step. The simplicity of the method makes it superior to existing techniques for measuring adsorbable analytes. The extensive literature on chemically modified electrodes suggests many possibilities for extensions and elaboration. The characteristic procedure parameters (preconcentration period, flow rate, etc.) must be adjusted to suit the requirements of each particular case (nature of attachment, concentration level, etc.). Work in this laboratory is continuing in this direction.

Registry No. Chlorpromazine, 50-53-3.

LITERATURE CITED

(1) Růžička, J.; Hansen, E. H. "Flow Injection Analysis"; Wiley: New York, 1981.
(2) Betteridge, D. Anal. Chem. 1978, 50, 832A.
(3) Růžička, J.; Hansen, E. H. Anal. Chim. Acta 1978, 99, 37.
(4) Pungor, E.; Feher, Z.; Nagy, G.; Toth, K.; Horvai, G.; Gratzl, M. Anal. Chim. Acta 1979, 109, 1.
(5) Yao, T.; Kobayashi, Y.; Musha, S. Anal. Chim. Acta 1982, 139, 363.
(6) Price, J. F.; Baldwin, R. P. Anal. Chem. 1980, 52, 1940.
(7) Jarbawi, T. B.; Heineman, W. R. Anal. Chim. Acta 1982, 135, 359.
(8) Wang, J.; Freiha, B. A. Anal. Chim. Acta, in press.
(9) Kalvoda, R. Anal. Chim. Acta 1982, 138, 11.
(10) Chaney, E. N.; Baldwin, R. P. Anal. Chem. 1982, 54, 2556.
(11) Wang, J.; Dewald, H. D.; Greene, B. Anal. Chim. Acta 1983, 146, 45.

RECEIVED for review January 28, 1983. Accepted March 21, 1983. This work was supported by a grant from the U.S. Department of the Interior, through the New Mexico Water Resources Research Institute.

Boolean Logic System for Infrared Spectral Retrieval

S. R. Lowry* and D. A. Huppler

Nicolet Instrument Corporation, 5225-1 Verona Road, Madison, Wisconsin 53711

An interactive retrieval system has been developed for infrared spectral data. This system employs Boolean logic operations on sets defined by the presence or absence of a specific peak in a spectrum. The sequential application of this procedure can reduce the original library of 3300 spectra down to a few spectra, which can be displayed and compared to the unknown spectrum. Options have been included for both peak location and intensity variations. The system has been evaluated by using spectra obtained from a capillary GC/IR experiment.

The trend toward digital data in most new infrared spectrometers has created a renewed interest in automated spectral identification. This demand and the availability of high-quality digital data bases have led to new research into methods for computerized spectral identification. Some of this research has been described in a number of recent papers (1-7).

0003-2700/83/0355-1288$01.50/0 © 1983 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 55, NO. 8, JULY 1983

Our laboratory has been extensively involved in this field for several years and recently reported the results of an automated search system that is part of our GC/FT-IR package (8). This search algorithm utilizes a file of reduced resolution full spectral data as a reference set and performs a point by point similarity measure to determine the best matches. We have also developed a search system that utilizes the Sadtler digital SPEC FINDER data as a reference set (SPEC FINDER is a trademark of Sadtler Research Laboratory). This data file contains only a reduced set of peak location data and the location of the largest peak for each spectrum.

The reference libraries used by these two algorithms approach the opposite ends of the scale of information content for a spectral representation. The "deresolved spectra" used in our GC/IR research still contain most of the peak width and intensity information found in the original spectra. These spectra were created by applying a 17-point polynomial smooth to the original 4-cm⁻¹ resolution spectra and saving every fourth point. The spectra were then normalized so that the intensity of the largest absorbance peak could be represented in 10 bits. This compression technique allows an infrared spectrum in the region 4000 cm⁻¹ to 450 cm⁻¹ to be stored in 230 20-bit computer words with a data point every 8 cm⁻¹ and one part in a thousand intensity resolution. Similar compression techniques have been reported by Azarraga et al. (9) using 16-bit words. Storing two data elements in each word yields 1 part in 256 of intensity resolution.

Although this search format generally provides the best search results, two major problems exist. The first problem is that full spectral searching is noise dependent. Because the full spectral search is basically a minimum difference calculation, both random baseline noise and contaminants contribute to the match factor computation. A second problem is the large computation time required for this search. The search times are now approaching 1 min for reference files of 10 000 spectra. While this is not an unreasonable length of time, as the files increase the overall search times will become more noticeable.

The SPEC FINDER coding scheme of flagging the largest peak in each 100-cm⁻¹ window approaches the minimum possible information. The major advantages of this search system are the very large reference libraries (approximately 90 000) and the rapid search times. The disadvantage is the lack of specificity that results from the compressed data format. Although this search system frequently performs quite well, many times compounds with similar major features cannot be differentiated. This library also suffers from the problems of manual encoding.

In the last year, we have been looking for compromise techniques which fall somewhere between these two extremes. One technique reported in the literature (5) involves computing the eigenvectors for the spectral library and computing a reduced representation by projecting the original spectrum onto a reduced orthonormal basis set. This technique appears very promising but requires a large initial computation to determine the eigenvectors. The format we chose represents each spectrum with a table containing the location and intensity of each peak in the spectrum that exceeds a threshold.
These tables were produced by applying an automatic peak identification algorithm to each spectrum in the library. A major advantage of this method is that the identical algorithm can be used both for the formation of the reference libraries and for the creation of a peak table from an unknown spectrum. This standardization eliminates a significant variable encountered in most early search systems.

Once the reference library of peak tables was created, we worked in two directions. The first involved a fully automated system that starts with a spectrum and determines the most similar reference spectra based on peak location and intensity. Although this search algorithm provided reasonable results, we felt that there was also a need for an interactive infrared retrieval system that allowed the user to define the significant peaks.

This paper will describe an infrared spectral identification algorithm that applies Boolean operations to spectral subsets defined by a peak and an intensity level. The sequential application of this process can reduce the library down to a very small set of spectra which meet all the requirements. These spectra can be retrieved from the library and displayed for visual confirmation. This paper will report the use of this system with capillary GC/IR data and we will discuss some of the effects of peak and intensity windows on the convergence rate of the system.
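
As a rough illustration of the compression of the deresolved spectra described earlier in this introduction, the sketch below keeps every fourth point, scales the spectrum so that the largest absorbance fills 10 bits, and packs two points into each 20-bit word. The function names, the choice to place the first point of each pair in the high half of the word, and the zero padding are assumptions made for illustration. The 17-point polynomial smooth is omitted, and nothing here is taken from the actual Nicolet software.

```python
def pack_spectrum(absorbances, keep_every=4, bits=10):
    """Decimate a spectrum, scale it to `bits`-bit integers, and pack
    two intensities into each word of 2 * bits bits (hypothetical layout)."""
    full_scale = (1 << bits) - 1              # 1023 for 10 bits
    kept = absorbances[::keep_every]          # save every fourth point
    peak = max(kept)
    if peak <= 0:
        raise ValueError("spectrum has no positive absorbance")
    # Normalize so the largest peak maps to full scale; clip negatives to 0.
    levels = [max(0, min(full_scale, round(a / peak * full_scale)))
              for a in kept]
    if len(levels) % 2:
        levels.append(0)                      # pad to an even number of points
    return [(hi << bits) | lo for hi, lo in zip(levels[0::2], levels[1::2])]


def unpack_spectrum(words, bits=10):
    """Recover the integer intensities from the packed words."""
    mask = (1 << bits) - 1
    levels = []
    for word in words:
        levels.extend(((word >> bits) & mask, word & mask))
    return levels


# Points on an 8 cm^-1 grid from 4000 down to 450 cm^-1, and words needed.
n_points = len(range(4000, 449, -8))
n_words = (n_points + 1) // 2

words = pack_spectrum([0.0, 0.75, 1.0, 0.25], keep_every=1)
print(unpack_spectrum(words))                 # [0, 767, 1023, 256]
print(n_points, n_words)                      # 444 222
```

On this grid the data points alone need 222 words, which is in line with the figure of 230 words quoted above.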

EXPERIMENTAL SECTION

The data base for this project is a set of 3300 peak tables taken from the spectra in the EPA vapor phase library. This library was created by Sadtler Laboratories for Leo Azarraga of the EPA Laboratory in Athens, GA. The peak tables used in this study were obtained by autoscaling each deresolved spectrum so that the largest peak is one absorbance unit and then saving all peaks greater than 0.1 absorbance unit. The intensity for each peak was multiplied by 20 so that it could be stored in integer form. This gives an intensity resolution of 0.05 absorbance unit. The same identification number was used for the peak tables and the low resolution spectra library. This allows the routine retrieval of the full spectra for plot and display purposes.

Figure 1. Unknown spectrum from GC/IR experiment. Identified as ethyl acetate. [x axis: WAVENUMBERS.]

All software for this project was developed on a Nicolet 1280 computer. The majority of the programming was done in Fortran, but several assembly language subroutines were written for optimizing the Boolean operations on the various bit strings. The final software was designed to run directly from the main FT-IR spectrometer as part of the standard infrared spectrometry package.

Most of the spectral data used in this research were acquired with a Nicolet 170SX spectrometer interfaced to a Hewlett-Packard 5880 gas chromatograph. The chromatograph was equipped with a split injector and a flexible capillary column. The GC/IR interface contains a gold-coated light pipe and a highly sensitive mercury-cadmium-telluride infrared detector. The sample used for this study consisted of a mixture of over 20 solvents.
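
The peak-table construction described in this section (autoscale to one absorbance unit, keep peaks above 0.1, store intensity × 20 as an integer) might be sketched as follows. The simple local-maximum test is an assumed stand-in for the automatic peak identification algorithm, whose details are not given here, and `make_peak_table` and its parameters are hypothetical names.

```python
def make_peak_table(wavenumbers, absorbances, threshold=0.1, scale=20):
    """Return a list of (wavenumber, integer intensity) pairs.

    Assumed peak test: a point is a peak when it is higher than its left
    neighbor and at least as high as its right neighbor. The real peak
    picker is not described in the text and is likely more elaborate.
    """
    top = max(absorbances)
    if top <= 0:
        raise ValueError("spectrum has no positive absorbance")
    scaled = [a / top for a in absorbances]   # largest peak becomes 1.0
    table = []
    for i in range(1, len(scaled) - 1):
        is_peak = scaled[i - 1] < scaled[i] >= scaled[i + 1]
        if is_peak and scaled[i] > threshold:
            # Multiplying by 20 gives steps of 0.05 absorbance unit.
            table.append((wavenumbers[i], round(scaled[i] * scale)))
    return table


# Invented toy spectrum on an 8 cm^-1 grid (three separate fragments).
wn = [3000, 2992, 2984, 1773, 1765, 1757, 1246, 1238, 1230]
ab = [0.02, 0.14, 0.02, 0.10, 0.60, 0.10, 0.20, 1.00, 0.20]
print(make_peak_table(wn, ab))   # [(2992, 3), (1765, 12), (1238, 20)]
```

Because the same routine would be applied to both library and unknown spectra, any bias in the peak test affects both sides of the comparison equally, which is the standardization advantage noted in the introduction.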

RESULTS AND DISCUSSION

System Design. The design of this system was modeled after some early work performed by Woodruff et al. (6). This early system was text based and contained no intensity information. Their system also used a batch process where data entry was quite cumbersome. The initial program that we developed was similar to programs used in both NMR and mass spectrometry (10, 11). In these programs, the user entered a peak location and the program calculated the number of spectra in the library containing that peak. The user then entered a second peak and the number of spectra containing both peaks was reported. The process could be continued until the user had one or a small number of spectra remaining. These spectra could then be retrieved and displayed.

This system had two obvious weaknesses. The first was the use of a predefined error window for the peak location ("wiggle") that was constant for all peaks. The second was the lack of intensity information. To overcome these problems, the present system allows the user to enter a wavenumber window and an intensity window in which the peak is required to appear. When this information is entered a subset of the library is defined and the number of spectra in that subset is reported. This subset may be saved and a second set can be defined by another pair of windows. These two sets can then be combined based on three logical operations. These are "AND", "OR", and "NOT". This new set may be saved and the process repeated until a sufficiently small number of spectra remain in the set to allow easy visual comparison.

The actual procedure used for the identification of an unknown spectrum is described below. Figure 1 shows the spectrum corresponding to one of the chromatographic peaks in a GC/IR experiment. The top list in Table I shows the peak locations and normalized intensities for the peaks greater than 0.1 absorbance unit after scaling.
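
The window query and the set operations described above can be sketched with subsets held as integer bit strings, one bit per library spectrum, in the spirit of the bit-string operations mentioned in the Experimental Section. The class and function names, the toy library, and its IDs are invented for illustration. "NOT" is interpreted here as "in the saved set but not in the new set", which is an assumption about how the program combines the two sets.

```python
class PeakLibrary:
    """Toy peak-table library whose subsets are integer bit strings."""

    def __init__(self, peak_tables):
        # peak_tables maps a spectrum ID to a list of (wavenumber, intensity)
        # pairs, with intensity on a 0-100 scale as in the Figure 2 printout.
        self.ids = sorted(peak_tables)
        self.tables = peak_tables

    def select(self, w1, w2, i1, i2):
        """Bit k is set when spectrum ids[k] has a peak inside both the
        wavenumber window [w1, w2] and the intensity window [i1, i2]."""
        bits = 0
        for k, sid in enumerate(self.ids):
            if any(w1 <= w <= w2 and i1 <= inten <= i2
                   for w, inten in self.tables[sid]):
                bits |= 1 << k
        return bits

    def members(self, bits):
        """List the spectrum IDs whose bits are set."""
        return [sid for k, sid in enumerate(self.ids) if (bits >> k) & 1]


def op_and(saved, new):
    return saved & new


def op_or(saved, new):
    return saved | new


def op_not(saved, new):
    # Assumed meaning: keep spectra in the saved set that are not in the new set.
    return saved & ~new


# Invented four-spectrum library (IDs and peaks are not from the EPA library).
library = PeakLibrary({
    101: [(1053, 30), (1237, 100), (1764, 64)],
    102: [(1055, 20), (1600, 80)],
    103: [(1236, 90), (1765, 70)],
    104: [(900, 100)],
})
first = library.select(1050, 1057, 10, 50)
second = library.select(1233, 1239, 50, 100)
print(library.members(op_and(first, second)))   # [101]
print(library.members(op_or(first, second)))    # [101, 102, 103]
print(library.members(op_not(first, second)))   # [102]
```

Holding each subset as a single integer makes every combination a single bitwise operation, and counting the set bits gives the "SET CONTAINS n SPECTRA" figure reported after each step.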
Figure 2 shows the printout from the interactive process used on this spectrum. The order in which peaks were chosen was arbitrary and has no effect on the final result. Frequently, the peaks are chosen according to the functional groups to which they might correspond. In this example three entries were sufficient to reduce the library down to two spectra. These spectra can be retrieved and either displayed or plotted. Figure 3 shows the two spectra which were retrieved along with their names and the Wiswesser line notation corresponding to their molecular structure.

Table I. Peak Tables from FT-IR Peak Picker

    peak location (cm⁻¹)    peak intensity

    Unknown Spectrum Scaled
    1053                    0.27
    1238                    1.00
    1376                    0.20
    1765                    0.60
    2992                    0.14

    4 cm⁻¹ Spectrum of Ethyl Acetate
    1054                    0.31
    1093                    0.11
    1236                    1.00
    1245                    0.99
    1374                    0.22
    1384                    0.21
    1756                    0.64
    1770                    0.68
    2995                    0.17

    Ethyl Acetate from Deresolved Library
    1053                    0.30
    1237                    1.02
    1376                    0.22
    1764                    0.64
    2990                    0.16

Figure 2. Printout of interaction with the retrieval system. User replies were underlined in the original; here they are marked with a leading ">".

    NICOLET SPECTRAL RETRIEVAL PROGRAM
    ENTER PEAK WINDOW AND INTENSITY: W1-W2,I1-I2
    > 1050-1057 10-50
    SET CONTAINS 160 SPECTRA
    NOW ENTER OP CODE:-1 TOTAL RESTART:0 REDO LAST:1 AND:2 OR:3 NOT:4 SAVE:5 PRINT IDS:6 QUIT
    > 4
    ENTER PEAK WINDOW AND INTENSITY: W1-W2,I1-I2
    > 1233-1239 50-100
    SET CONTAINS 109 SPECTRA
    NOW ENTER OP CODE:-1 TOTAL RESTART:0 REDO LAST:1 AND:2 OR:3 NOT:4 SAVE:5 PRINT IDS:6 QUIT
    > 1
    SET CONTAINS 8 SPECTRA
    NOW ENTER OP CODE:-1 TOTAL RESTART:0 REDO LAST:1 AND:2 OR:3 NOT:4 SAVE:5 PRINT IDS:6 QUIT
    > 4
    ENTER PEAK WINDOW AND INTENSITY: W1-W2,I1-I2
    > 1760-1770 50-100
    SET CONTAINS 139 SPECTRA
    NOW ENTER OP CODE:-1 TOTAL RESTART:0 REDO LAST:1 AND:2 OR:3 NOT:4 SAVE:5 PRINT IDS:6 QUIT
    > 1
    SET CONTAINS 2 SPECTRA
    NOW ENTER OP CODE:-1 TOTAL RESTART:0 REDO LAST:1 AND:2 OR:3 NOT:4 SAVE:5 PRINT IDS:6 QUIT
    > 5
    1 490
    2 1818
    SET CONTAINS 2 SPECTRA
    NOW ENTER OP CODE:-1 TOTAL RESTART:0 REDO LAST:1 AND:2 OR:3 NOT:4 SAVE:5 PRINT IDS:6 QUIT
    > 6

Figure 3. Library spectra with Wiswesser line notation for the two spectra retained by the query system. [Plot title: EPA VAPOR PHASE; one panel labeled HEPTANOIC ACID, ETHYL ESTER; x axis: WAVENUMBERS, 4000 to 500.]

The decision on when to stop entering new peaks is somewhat arbitrary. If a compound is a complete unknown, the chemist might choose to retrieve and display a large number of spectra containing certain peaks, in order to learn more about potential functional groups. In most cases once the set has been reduced to fewer than five spectra, a visual comparison of all the remaining spectra is as easy as entering more peaks into the query. In this example, we chose to stop with two spectra remaining. A final window from 2995 cm⁻¹ to 2987 cm⁻¹ with an intensity value of 10 to 30 would have eliminated the ethyl heptanoate spectrum, leaving only the correct match. As with any spectral search the final confirmation should be a visual comparison of the unknown with the best matches.

In this example, the system was used to identify an unknown spectrum. However, the system can be used for any type of spectral retrieval application. Two obvious applications could involve finding infrared transparent solvents or potential interfering compounds for a specific quantitative peak. These types of problems would make better use of the "NOT" or "OR" operations than spectral identification.
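
The two applications just mentioned might look like the following sketch, written with plain Python sets rather than bit strings for brevity. The helper names and the three-entry library are invented for illustration. Note that "transparent" can only mean "no tabulated peak in the window", since the peak tables omit peaks below 0.1 of the largest peak.

```python
def spectra_with_peak(tables, w1, w2, i1=10, i2=100):
    """IDs of spectra with a peak inside the wavenumber and intensity windows."""
    return {sid for sid, peaks in tables.items()
            if any(w1 <= w <= w2 and i1 <= inten <= i2 for w, inten in peaks)}


def transparent_in(tables, w1, w2):
    """'NOT': spectra with no tabulated peak anywhere in the window."""
    return set(tables) - spectra_with_peak(tables, w1, w2)


def interferents(tables, windows):
    """'OR': spectra with a peak in any of the listed wavenumber windows."""
    found = set()
    for w1, w2 in windows:
        found |= spectra_with_peak(tables, w1, w2)
    return found


# Invented toy library; intensities are on a 0-100 scale.
tables = {
    "A": [(1053, 30), (1765, 60)],
    "B": [(2990, 100)],
    "C": [(1760, 15), (3030, 80)],
}
print(transparent_in(tables, 1700, 1800))                          # {'B'}
print(sorted(interferents(tables, [(1040, 1060), (2980, 3000)])))  # ['A', 'B']
```

For a solvent search the intensity window would normally be left wide open, as in the defaults above, so that even weak tabulated peaks disqualify a candidate.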

Figure 4. Unknown spectrum from GC/IR experiment, identified as m-xylene. [x axis: WAVENUMBERS, 4000 to 500.]

Resolution Effects. A major problem with any peak table reference library is the variation in both peak location and intensity caused by different spectral resolution. This is a particular problem with vapor-phase spectra obtained from small molecules because of the potential for rotational splitting of the vibrational modes. The second list in Table I shows the peak data for the original 4-cm⁻¹ resolution spectrum of ethyl acetate. Even at this resolution the major bands, such as the carbonyl, are split into doublets. The lower list in Table I shows the peak data from the deresolved library spectra; these peaks are much closer to the peaks obtained in the low-resolution GC/IR experiment. For this specific experiment, peak tables created after a 17-point polynomial smooth resulted in a better reference library for compound identification.

Even with this standardization process, differences exist between reference tables and an unknown spectrum. These deviations require that the peak and intensity windows be enlarged to some degree. In order to evaluate these parameters a second spectrum from the GC/IR run was chosen. This spectrum, shown in Figure 4, was identified by the full spectral search as m-xylene. The peak locations and intensities were calculated with the standard peak picking software. The peak finder is designed to work with spectra of all resolutions. For this reason, the peaks are listed with three decimal places. The accuracy of the peak locations in these spectra is better than ±0.5 cm⁻¹ when peaks are not merging.

Table II. Effects of Window Sizes

    peak   wavenumber      intensity window   no. of spectra   spectra
    no.    window, cm⁻¹    (% abs)            found            remaining

    Broad Peak Window, 20 cm⁻¹
    1      757-777         10-100             363              363
    2      3020-3040       10-100             458              58
    3      1600-1620       10-100             530              21
    4      2923-2943       10-100             1280             14
    5      1372-1391       10-100             980              6

    Narrow Peak Window, 6 cm⁻¹
    1      764-770         10-100             133              133
    2      3027-3033       10-100             143              8
    3      2930-2937       10-100             678              5

    Intensity Window with Broad Peak Window
    1      757-777         60-100             71               71
    2      3020-3060       80-100             19               1

    Intensity Window with Narrow Peak Window
    1      764-770         60-100             30               30
    2      3027-3033       80-100             8                1

Effects of Varying the Size of Peak Window. Although the ability for the user to define the spectral range in which a peak must appear is a major advantage of this system, deciding on the best window functions can cause some problems. If the windows are made too large, many spectra in the library will be retained and the convergence to an acceptable number of spectra will require more iterations. A worse case occurs if the windows are so narrow that small deviations in peak location cause the correct spectrum to be eliminated. In order to evaluate the effect of the window size on the convergence rate, we ran the system with broad windows where a deviation of about 20 cm⁻¹ was used and with narrow windows where the deviation was about 6 cm⁻¹. We found that reducing the windows to less than 6 cm⁻¹ allowed some correct spectra to be missed and in general did not significantly improve the convergence rate. Table II shows the results of the two peak window schemes when the intensity window was wide open (0.1-1.0 absorbance unit). In the case of the broad windows, even with five peaks the set only reduced to six spectra. In the case with narrow windows, the set reduced to five spectra after three peaks and only eight spectra remained after the first two peaks were defined.

Effects of Using Intensity Windows. In the previous case, a spectrum was acceptable if it contained a peak of any intensity in the specified window. We repeated the experiment by using a medium level intensity window. The results are shown in Table II. In the case of the broad peak window, the intensity window reduced the number of spectra retained by the first window from 363 to 71. Only 19 spectra passed the second window and the "AND" operation left a single spectrum. In the case of the narrow windows, the number of spectra retained by the first window went from 133 to 30 and only 8 were captured by the second.

Although the results shown above are from a single unknown, experiments with a number of compounds indicate that if the unknown spectrum is smoothed to a resolution similar to the reference data, a window of ±5 cm⁻¹ is reasonable for average peak widths. Of course, if the peak of interest is a broad OH stretching peak the window must be increased. Our experience with intensity windows indicates that if the largest peak in the absorbance spectrum is scaled to 100, a window of 10 units should generally be adequate. These conclusions are based on gas-phase spectra where molecular interactions are minimal. Some preliminary results with condensed phase data indicate that both peak and intensity windows must be enlarged to compensate for the peak shifts and intensity variation caused by sample preparation effects.

The goal of this work was to design an interactive query system for accessing infrared spectra. The results of this research indicate that the presence of intensity information and the ability to define the window sizes greatly enhance the capabilities of this type of algorithm. For the vapor phase library of 3300 peak tables used in this work, the response time for each query is less than 2 s. We have concluded, after several years of research, that no best method of computerized spectral identification exists. The techniques described above allow the spectroscopist to retrieve spectra from a library based on the peaks that he feels are most important. Although this approach is not optimal in many situations, the ability to obtain easily the full spectra for compounds, when a digitized spectrum of the unknown is not available, can often provide a simple solution to an otherwise difficult problem.
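
The window recommendations given above for gas-phase data can be turned into a small helper that builds a query from one peak of the unknown. This is only a sketch: `query_window` is a hypothetical name, "a window of 10 units" is read here as ±10 units (the text could also mean a total width of 10), and the lower clip at 10 reflects the fact that the peak tables hold no peaks below 0.1 of the largest peak.

```python
def query_window(wavenumber, intensity, peak_tol=5.0, int_tol=10.0,
                 int_floor=10.0, int_ceiling=100.0):
    """Return (w1, w2, i1, i2) for a peak with intensity on a 0-100 scale.

    peak_tol and int_tol are the assumed half-widths of the wavenumber and
    intensity windows; widen them for broad bands or condensed-phase data.
    """
    w1 = wavenumber - peak_tol
    w2 = wavenumber + peak_tol
    i1 = max(int_floor, intensity - int_tol)      # nothing is stored below 10
    i2 = min(int_ceiling, intensity + int_tol)    # largest peak is 100
    return (w1, w2, i1, i2)


# Peaks of the unknown in Table I, with intensities rescaled to 0-100.
print(query_window(1765, 60))    # (1760.0, 1770.0, 50.0, 70.0)
print(query_window(1238, 100))   # (1233.0, 1243.0, 90.0, 100.0)
print(query_window(2992, 14))    # (2987.0, 2997.0, 10.0, 24.0)
```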

LITERATURE CITED

(1) Hanna, A.; Marshall, J. C.; Isenhour, T. L. J. Chromatogr. Sci. 1979, 17, 434-438.
(2) Azarraga, L. V.; Williams, R. R.; de Haseth, J. A. Appl. Spectrosc. 1981, 35, 466-469.
(3) Delaney, M. F.; Uden, P. C. Anal. Chem. 1979, 51, 1242-1243.
(4) de Haseth, J. A.; Azarraga, L. V. Anal. Chem. 1981, 53, 2292-2295.
(5) Hangac, G.; Wieboldt, R. C.; Lam, R. B.; Isenhour, T. L. Appl. Spectrosc. 1982, 36, 40-47.
(6) Woodruff, H. B.; Lowry, S. R.; Isenhour, T. L. J. Chem. Inf. Comput. Sci. 1975, 15, 207-212.
(7) Erickson, M. D. Appl. Spectrosc. 1981, 35, 181-184.
(8) Lowry, S. R.; Huppler, D. A. Anal. Chem. 1981, 53, 889-893.
(9) Azarraga, L. V.; Hanna, D. A. GIFTS, Athens, ERL GC/FT-IR Software and User's Guide (USEPA/ERL Athens, GA, 1979).
(10) Milne, G. W. A.; Heller, S. R. J. Chem. Inf. Comput. Sci. 1980, 20, 204-208.
(11) Lowry, S. R.; Marshall, J. C.; Isenhour, T. L. Comput. Chem. 1976, 1, 3-5.

RECEIVED for review November 9, 1982. Accepted March 21, 1983.