Computer assisted structural interpretation of fluorescence spectra

can be improved by an order of magnitude upon integrating the VFC output for the appropriate number of minutes. Alternatively, small shifts in emission intensity or wavelength can be measured more precisely. Since the VFC conversion to pulse rate is linear over several orders of magnitude in input intensity, samples can be compared directly over an extended range of fluorescence intensities without altering operating parameters. Depending upon the type of event-counting device, differing degrees of sophistication can be attained. In addition to providing digital readout, many counters now offer computer-compatible output, which can provide indirect computer access to an analog fluorimeter without designing a custom interface. These features, together with flexible operation, make the VFC procedure an attractive alternative to more sophisticated output-handling techniques when only a moderate enhancement is required. Many devices, such as true digital integrators or microprocessor-controlled digitizers, can outperform the VFC procedure, but at a significant increase in cost that is currently measured in thousands of dollars. In this context, the VFC approach to drift-free integration occupies a niche in both cost and sophistication between simple analog integration and digital techniques. Although signal integration using the VFC is a drift-free process, an instability in either the instrument or the sample will reduce the potential benefit of prolonged integration. Since this report describes the application of the VFC at the fluorimeter output, any improvement in the intermediate stages of signal handling will be carried over. However, this approach is incompatible with samples showing kinetic behavior that is faster than the required integration period. In this case, other procedures that provide enhanced sensitivity with more rapid measurements have to be used.
If significant modification of the fluorimeter is found necessary, then consideration should be given to using direct photon counting or current-to-frequency conversion (8, 24) at the photomultiplier, followed by digital ratio correlation (25).
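The drift-free integration that motivates the VFC approach can be illustrated with an idealized charge-balance model. This is a minimal sketch under stated assumptions, not the circuit used in the report: the sampled input, the time step `dt`, the gain `k_hz_per_volt`, and the reset-by-subtraction behavior are all hypothetical. Because each pulse removes exactly one quantum of accumulated signal, the count after a counting period equals k times the integral of the input voltage to within one count, with no leakage term that could cause drift.

```python
def vfc_count(voltages, dt, k_hz_per_volt):
    """Count pulses from an idealized charge-balance VFC.

    voltages: input samples (V) taken every dt seconds (hypothetical sampled input).
    k_hz_per_volt: assumed linear conversion gain (pulses per second per volt).
    """
    accumulator = 0.0  # integrated signal, measured in "pulses"
    pulses = 0
    for v in voltages:
        accumulator += k_hz_per_volt * v * dt
        # Emit a pulse each time one full quantum has accumulated; subtracting
        # (rather than zeroing) keeps the residue, so no signal is lost.
        while accumulator >= 1.0:
            accumulator -= 1.0
            pulses += 1
    return pulses


if __name__ == "__main__":
    dt = 1e-4
    one_second = 10_000  # samples per second at this dt
    # Counts scale linearly with input level for a fixed counting period...
    print(vfc_count([2.0] * one_second, dt, 1000.0))  # about 2000
    print(vfc_count([0.5] * one_second, dt, 1000.0))  # about 500
    # ...and linearly with counting time for a fixed input level.
    print(vfc_count([0.5] * (10 * one_second), dt, 1000.0))  # about 5000
```

The model ignores noise, photomultiplier dead time, and the finite linear range of a real converter; it only shows why a pulse counter acts as an integrator whose result depends on how long it is allowed to run.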

LITERATURE CITED
(1) L. Stryer, Science, 162, 526 (1968).
(2) G. K. Radda and J. Vanderkooi, Biochim. Biophys. Acta, 265, 509 (1972).
(3) V. Glushko and M. Sonenberg, Fed. Proc., 33, 1459 (1974).
(4) H. V. Malmstadt, M. L. Franklin, and G. Horlick, Anal. Chem., 44 (8), 63A (1972).
(5) J. D. Ingle, Jr., and S. R. Crouch, Anal. Chem., 44, 785 (1972).
(6) J. D. Ingle, Jr., and S. R. Crouch, Anal. Chem., 44, 1375 (1972).
(7) H. C. Beall and A. Haug, Anal. Biochem., 53, 98 (1973).
(8) S. Cova, G. Prenna, and G. Mazzini, Histochem. J., 6, 279 (1974).
(9) P. Vigny and M. Duquesne, Photochem. Photobiol., 20, 15 (1974).
(10) V. Glushko, R. Caley, and C. Karp, Biophys. J., 16, 64A (1976).
(11) E. Zuch, Electronic Design, 18, 110 (1975).
(12) M. Sonenberg, Proc. Nat. Acad. Sci. USA, 68, 1051 (1971).
(13) G. A. Crosby, J. N. Demas, and J. B. Callis, J. Res. Natl. Bur. Stand., Sect. A, 76, 561 (1972).
(14) Analog Devices, "Product Guide", 1975, pp 70-79, 104-105.
(15) W. H. Melhuish, J. Res. Natl. Bur. Stand., Sect. A, 76, 547 (1972).
(16) R. Reisfeld, J. Res. Natl. Bur. Stand., Sect. A, 76, 613 (1972).
(17) R. A. Velapoldi, J. Res. Natl. Bur. Stand., Sect. A, 76, 841 (1972).
(18) S. Udenfriend, "Fluorescence Assay in Biology and Medicine", Vol. II, Academic Press, New York, 1969, Chap. 5, 8-11, 14, 17-19.
(19) J. B. Birks, "Photophysics of Aromatic Molecules", Wiley-Interscience, London, 1970, Chap. 4.
(20) H. R. Horton and D. E. Koshland, Jr., "Environmentally Sensitive Groups Attached to Proteins. I. Dansyl Chloride", in "Methods in Enzymology", Vol. 11, S. P. Colowick and N. D. Kaplan, Ed., Academic Press, New York, 1969, pp 856-865.
(21) S. Udenfriend, S. Stein, P. Bohlen, W. Dairman, W. Leimgruber, and M. Weigle, Science, 178, 871 (1972).
(22) J. Ingle, Jr., and S. D. Crouch, Anal. Chem., 45, 333 (1973).
(23) V. Glushko, C. Karp, and M. Sonenberg, Biophys. J., 16, 48A (1976).
(24) T. A. Woodruff and H. V. Malmstadt, Anal. Chem., 46, 1162 (1974).
(25) T. A. Woodruff and H. V. Malmstadt, Anal. Chem., 46, 2141 (1974).

RECEIVED for review May 21, 1976. Accepted August 23, 1976. This research was supported by grants from the American Cancer Society, BC-119, and from the National Institutes of Health, CA-08748 and CA-16889.

Computer Assisted Structural Interpretation of Fluorescence Spectra

Thomas C. Miller and Larry R. Faulkner

Department of Chemistry, University of Illinois, Urbana, Ill. 61801

A computer file searching procedure for fluorescence spectra has been developed. The procedure is based on comparisons of the most obvious spectral features in fluorescence excitation and emission spectra. These spectral features include the total number of peaks, peak locations, widths at half height, relative intensities, and the location of an excitation minimum. The library file used in the file searching procedure was developed from the first 1000 compounds characterized in Sadtler's Standard Fluorescence Spectra. Abbreviated representations for each substance were packed into 16-bit computer words. The file searching and the acquisition of experimental fluorescence spectra were both done by a minicomputer. The file searching algorithm was evaluated, and the results show that the procedure is capable of either identifying an unknown compound or suggesting structural analogues.

The task of chemical identification through the interpretation of spectral data is now routinely done by computers. Of the techniques for this purpose, file searching is probably the most widely used. In this approach, the spectrum of an unknown compound is compared to a library of spectra of known compounds, and the library entries that show the highest degree of similarity to the unknown's spectral characteristics are identified with structures that are presumably closely related, if not identical, to that of the unknown. The file searching technique is readily adapted to different forms of spectral data (1) and has been successfully applied to mass spectrometry (2-9) and IR spectrometry (10-15). Fluorescence spectrometry, with its inherent high sensitivity, is well suited for low-level quantitative analysis, but for qualitative purposes other techniques have been much more widely used. There are two main reasons for this: first, not all compounds fluoresce; and second, fluorescence spectra have no easily recognizable analogues to group frequencies or fragmentation patterns. Thus, one cannot mentally correlate spectral features with aspects of molecular structure. However, this situation does not imply that structural interpretations of fluorescence spectra are impossible. A computer-based file searching procedure, which by nature does not depend upon the kinds of associations used in mental spectral interpretation, might find ready success in such a venture.

This possibility is interesting to us because fluorescence spectroscopy is a potentially useful tool for the on-line characterization of chromatographic effluents. For both gas chromatography (16) and high pressure liquid chromatography (17), reports have described systems in which scanning fluorescence spectrometers were coupled directly to the column outflow. With the addition of a computer-based file searching procedure, the amount of information returned by such systems would be enhanced greatly, and advantages similar to those established for GC/MS could be realized. Fluorescence characterization might prove especially useful for liquid chromatography, which is often selected for separations involving nonvolatile or thermally labile substances that are not easily characterized by mass spectrometry. The utility of an LC/fluorescence linkage depends strongly upon an unanswered question about information content: Can fluorescence spectra provide useful structural information about an unknown substance? We have addressed this issue by developing and testing a system for searching files of fluorescence spectra. Only the most obvious spectral features are included in our compact library; hence the fraction of available information utilized in our comparisons is still rather low. Even so, the results we report below show that our algorithm is capable of either identifying an unknown altogether or suggesting structural analogues to it.

ANALYTICAL CHEMISTRY, VOL. 48, NO. 14, DECEMBER 1976

[Figure 1. Packing of spectral information into 16-bit computer words. Word 1: solvent (bits 0-4) and I.D. number (bits 5-15). Word 2: number of emission peaks (bits 0-2), number of excitation peaks (bits 3-6), and location of excitation minimum (bits 7-15). Each remaining word: relative intensity or width at half height (bits 0-6) and location of peak (bits 7-15).]

EXPERIMENTAL

Apparatus. The file searching technique has been implemented on a Data General Nova 820 minicomputer with a 16-K core memory. The word length of this machine is 16 bits. The peripherals linked to the Nova 820 included three cassette units for mass data storage, a CRT display (Applied Digital Data Systems Model 580), and a serial matrix printer (Centronics Model 306). The minicomputer has been interfaced to an Aminco-Bowman spectrophotofluorometer (SPF) for the direct acquisition of digitized fluorescence spectra. An 8-bit A/D converter monitors the voltage output of the photometer, and a 10-bit A/D converter monitors the wavelength signal produced by the SPF. This arrangement allows the minicomputer to obtain wavelength-intensity pairs every nanometer over the 200-700 nm range. Any digital spectrum can be displayed on either an oscilloscope or a Hewlett-Packard Model 7004B X-Y recorder with a point plotting module. The SPF itself was equipped with a 150-W xenon arc lamp, an off-axis ellipsoidal condensing system, and a Hamamatsu 1P21 photomultiplier.

Reference Collection. The main prerequisite for a file searching procedure is a collection of spectra of known compounds. In this study, the entries for the 1000 compounds contained in the first four volumes of Sadtler's "Standard Fluorescence Spectra" were used. Each entry in the reference collection contains both an excitation and an emission spectrum as obtained from an Aminco-Bowman SPF equipped much like the one in our laboratory. Also presented for each entry is information about the compound and about the experimental parameters used to obtain the spectral curves, e.g., slit settings, solvent, sample concentration, and major peak positions.

[Figure 2. Flow chart for the file searching program (LB = library, CT = cassette).]

Abbreviation of Reference Collection. Since the file searching is done on a minicomputer with 16 K of memory, it is necessary to use an abbreviated form of the reference collection. We have chosen to enter in this library only the most obvious spectral features of a compound's fluorescence excitation and emission spectra. These features include the number of peaks, peak locations, widths at half height, and relative intensities. (Use of this information is subject to restrictions set forth in sales agreements between Sadtler Research Laboratories, Inc., and purchasers of the fluorescence collection. Special permission was granted for us to create and utilize the digital files described below.) To compress the abbreviated reference collection still further, this information has been packed into 16-bit computer words according to a special format (see Figure 1). The first word of each entry contains a code for the solvent in bits 0-4 and an identification number for the compound in bits 5-15. The second word contains the number of emission peaks in bits 0-2 and the number of excitation peaks in bits 3-6; when there are only two peaks in the excitation spectrum, bits 7-15 contain the location of the excitation minimum. Each remaining word in an entry contains information about a single peak: its relative intensity is stored in bits 0-6, and its location relative to 200 nm is stored in bits 7-15. For each spectrum, the intensities are normalized so that the largest peak has an intensity of 100 units. When only a single peak appears in a spectrum, its width at half height (in nm), rather than its intensity, is stored in bits 0-6 of the appropriate word. Words describing excitation peaks are grouped together just after the second word of the entry and are ordered according to increasing wavelength.
Words describing emission peaks follow and are arranged similarly. When the reference collection is abbreviated in this manner, the resulting library file occupies about 6600 words and can easily reside in a 16-K minicomputer. The File Searching Program. The actual file searching procedure is implemented in a program titled SEARCH. The assembled program requires about 8000 computer words and its flow chart is shown in Figure 2. The initial step in the program is to read the library into core memory from a cassette file. The library file is organized according to the total number of peaks in each entry. Those entries with only two peaks are located at the beginning of the library and those with a large number of peaks are located at the end. This facilitates the file searching process because SEARCH compares the unknown only to those library file entries that have the same number of peaks in the excitation and emission spectra as the unknown. This restriction is necessary in order to limit the size of the SEARCH program. After reading in the library file, the program obtains the experimental excitation spectrum using the spectrum acquisition subroutine. In this subroutine, the operator has the option of entering the spectrum from a cassette, if it has been previously recorded, or obtaining it directly from the SPF. In the latter case, the range over which the spectrum is to be scanned must be specified. The option exists for a baseline correction.
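The word layout described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Nova code: we assume bit 0 is the most significant bit (our understanding of the Nova's left-to-right bit numbering; swap the shifts if the opposite convention applies), we assume the excitation minimum is stored relative to 200 nm like the peak locations, and the solvent code and peak values in the example are hypothetical.

```python
WORD_BITS = 16


def put_field(value: int, first: int, last: int) -> int:
    """Place value in bits first..last of a word, numbering bit 0 as the MSB (assumed)."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in bits {first}-{last}")
    return value << (WORD_BITS - 1 - last)


def get_field(word: int, first: int, last: int) -> int:
    """Extract bits first..last from a 16-bit word (bit 0 = MSB, assumed)."""
    width = last - first + 1
    return (word >> (WORD_BITS - 1 - last)) & ((1 << width) - 1)


def pack_entry(solvent_code, compound_id, excitation, emission, exc_minimum_nm=None):
    """Pack one library entry into a list of 16-bit words.

    excitation, emission: lists of (location_nm, value) pairs, where value is the
    relative intensity (largest peak = 100) or, for a single-peak spectrum, the
    width at half height in nm.
    """
    words = [put_field(solvent_code, 0, 4) | put_field(compound_id, 5, 15)]
    second = put_field(len(emission), 0, 2) | put_field(len(excitation), 3, 6)
    if len(excitation) == 2 and exc_minimum_nm is not None:
        # Assumption: the minimum is stored relative to 200 nm, like peak locations.
        second |= put_field(exc_minimum_nm - 200, 7, 15)
    words.append(second)
    # Excitation peak words first, then emission, each by increasing wavelength.
    for peaks in (excitation, emission):
        for location_nm, value in sorted(peaks):
            words.append(put_field(value, 0, 6) | put_field(location_nm - 200, 7, 15))
    return words


def unpack_entry(words):
    """Inverse of pack_entry for a single entry."""
    n_em = get_field(words[1], 0, 2)
    n_ex = get_field(words[1], 3, 6)
    peaks = [(get_field(w, 7, 15) + 200, get_field(w, 0, 6)) for w in words[2:]]
    return {
        "solvent_code": get_field(words[0], 0, 4),
        "compound_id": get_field(words[0], 5, 15),
        "excitation": peaks[:n_ex],
        "emission": peaks[n_ex:n_ex + n_em],
        # Only meaningful when the excitation spectrum has exactly two peaks.
        "exc_minimum_nm": get_field(words[1], 7, 15) + 200 if n_ex == 2 else None,
    }


if __name__ == "__main__":
    # Hypothetical entry: I.D. 84, solvent code 1, invented peak values.
    entry = pack_entry(1, 84, [(357, 100), (340, 60)],
                       [(380, 100), (401, 90), (425, 40)], exc_minimum_nm=348)
    print(len(entry), [hex(w) for w in entry])
    print(unpack_entry(entry))
```

The field widths bound what can be stored: 7 bits hold values up to 127, enough for the 0-100 intensity scale, and 9 bits hold offsets up to 511 nm above 200 nm, which covers the 200-700 nm acquisition range.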


The next step in the spectrum acquisition subroutine allows for the application of a set of correction factors and a wavelength shift. The correction factors can be any set of numbers to be multiplied point by point with the spectrum. They are usually used to compensate partially for instrumental response and thereby improve the results of the file searching procedure (see below). The wavelength shift is any constant value to be added to the wavelength values of the spectrum. This feature permits compensation for small systematic errors in the wavelength scales of the SPF's monochromators; we calibrate these scales periodically by comparing the observed peak positions for anthracene in cyclohexane to those entered in the library. To compile the spectral information needed for a compact representation of the unknown, SEARCH computes the first derivative of the spectrum. Changes in sign of the derivative are used to locate primary peaks and minima. Any wavelength at which the first derivative approaches zero within a preset limit, but does not change sign, is denoted as a secondary peak; a secondary peak is therefore a shoulder on a primary peak. Any peak, whether primary or secondary, is ignored if its relative intensity is less than or equal to five units. Secondary peaks are disregarded by SEARCH in its first examination of the library, which contains data words only for primary peaks, but they may be used in subsequent passes (see below). At the time of computation, the locations and relative intensities of both the primary and secondary peaks are listed for the operator, along with any necessary information about the location of an excitation minimum or a peak's width at half height. The experimental spectrum is also plotted on an oscilloscope so that the operator can verify the validity of the primary and secondary peaks. Finally, the complete normalized, wavelength-shifted spectrum can be stored on a cassette for future use.
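The adjustment step and the derivative-based peak picking described above can be sketched as follows. This is an illustrative reconstruction, not the SEARCH code: we assume spectra sampled every 1 nm, use a forward difference as the first derivative, and treat a shoulder as a local minimum of the slope magnitude with no sign change. The tolerance `slope_tol` is hypothetical because the preset limit is not given; the five-unit intensity cutoff follows the text.

```python
import math


def adjust_spectrum(wavelengths, intensities, factors=None, shift_nm=0):
    """Multiply point by point by correction factors, then add a constant wavelength shift."""
    if factors is not None:
        intensities = [y * f for y, f in zip(intensities, factors)]
    return [w + shift_nm for w in wavelengths], list(intensities)


def find_peaks(wavelengths, intensities, slope_tol=0.5, min_rel=5.0):
    """Return (primary, secondary) peaks as lists of (wavelength, relative intensity)."""
    top = max(intensities)
    rel = [100.0 * y / top for y in intensities]            # largest point = 100 units
    d = [rel[i + 1] - rel[i] for i in range(len(rel) - 1)]  # forward first derivative
    primary, secondary = [], []
    for i in range(1, len(d)):
        if d[i - 1] > 0 and d[i] <= 0:
            # Derivative changes sign from + to -: a primary peak at point i.
            if rel[i] > min_rel:
                primary.append((wavelengths[i], round(rel[i])))
        elif i + 1 < len(d):
            same_sign = (d[i - 1] > 0 and d[i] > 0 and d[i + 1] > 0) or (
                d[i - 1] < 0 and d[i] < 0 and d[i + 1] < 0)
            local_min = abs(d[i]) < abs(d[i - 1]) and abs(d[i]) <= abs(d[i + 1])
            # Slope nears zero without changing sign: a shoulder (secondary peak).
            if same_sign and local_min and abs(d[i]) < slope_tol and rel[i] > min_rel:
                secondary.append((wavelengths[i], round(rel[i])))
    return primary, secondary


if __name__ == "__main__":
    # Hypothetical test spectrum: two well-separated Gaussian bands.
    wl = list(range(300, 501))
    y = [100 * math.exp(-((x - 350) / 10) ** 2) + 60 * math.exp(-((x - 420) / 10) ** 2)
         for x in wl]
    primary, secondary = find_peaks(wl, y)
    print(primary)    # [(350, 100), (420, 60)]
    print(secondary)  # []
```

Real spectra are noisy, so a practical version would smooth before differentiating; the operator check on the oscilloscope described in the text serves the same purpose.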
After obtaining the excitation spectrum, the program obtains and analyzes the experimental emission spectrum in a similar manner. The information needed for a library search is then assembled and compared with each entry in the appropriate section of the library file. Before the actual comparisons are made, the SEARCH variables must be defined. The operator can specify how many library file entries are to be listed in the results and whether the solvent in which the unknown was run is to be used as a discriminating factor; the latter option reports only those library file entries that were run in the same solvent as the unknown. During a comparison between the unknown and an entry in the library file, SEARCH computes an index that gives a numeric value to the degree of similarity. The index is calculated according to the following formula:

[Equation for the similarity index: not legible in the source copy.]

where L denotes the library entry and U the unknown. Peak locations or widths at half height are given by A, and intensities are represented as I. The location of an excitation minimum, if one is considered, is shown as M, and the total number of peaks is N. The indices p and w run over all corresponding peaks and widths considered in the match. This index is simply a summation, over all points of comparison, of the differences between the unknown and a given file entry. Because of the squared terms, it weights differences in peak locations more heavily than differences in relative peak intensities. The 3/N factor normalizes the index according to the total number of peaks under comparison; the multiplier of 3 represents the average total number of peaks for the library file entries. The index ranges from zero, indicating a perfect match, to an upper limit of about 65 000, set by overflow in a 16-bit computer word. After all the entries in the appropriate section of the library file have been considered, the program reports those entries with the best indices. This output consists of each entry's identification number, the solvent in which it was run, and its index. After the results of the file search are listed, the program can be restarted or another pass through the procedure can be made. Prior to this second pass, the SEARCH variables are redefined. This feature allows the operator to request a longer list of file entries or to change the status of the solvent restriction. The operator can also use a secondary peak option, which permits including any secondary peaks in the comparison or excluding any peaks that have already been used. Because the library is partitioned by the number of peaks, this gives the program the flexibility to examine other sections of the library. After the second pass, the entries with the best indices are again printed out, and the cycle can be continued.
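The exact form of the index is not recoverable from this copy, so the following sketch uses a hypothetical form that only follows the verbal description: squared differences for peak locations (and for half-height widths and the excitation minimum), absolute differences for relative intensities, a 3/N normalization, and a ceiling at the largest unsigned 16-bit value. It is not the authors' formula. The restriction to library entries with the same numbers of excitation and emission peaks and the optional solvent filter follow the text; the dictionary keys and function names are our own.

```python
from collections import defaultdict

INDEX_CEILING = 65535  # largest unsigned 16-bit value


def similarity_index(unknown, entry):
    """Hypothetical index: 0 = perfect match, larger = less similar.

    unknown/entry: dicts with "excitation" and "emission" lists of
    (location_nm, value) and an optional "exc_minimum_nm". value is a relative
    intensity, or a half-height width when a spectrum has a single peak.
    """
    total = 0.0
    n_peaks = 0
    for key in ("excitation", "emission"):
        single = len(unknown[key]) == 1  # single peak: value is a width
        for (lu, vu), (ll, vl) in zip(sorted(unknown[key]), sorted(entry[key])):
            total += (ll - lu) ** 2                               # location term
            total += (vl - vu) ** 2 if single else abs(vl - vu)   # width or intensity
            n_peaks += 1
    if unknown.get("exc_minimum_nm") is not None and entry.get("exc_minimum_nm") is not None:
        total += (entry["exc_minimum_nm"] - unknown["exc_minimum_nm"]) ** 2
    return min(INDEX_CEILING, round(3.0 * total / max(n_peaks, 1)))


def build_library(entries):
    """Group entries by (number of excitation peaks, number of emission peaks)."""
    sections = defaultdict(list)
    for e in entries:
        sections[(len(e["excitation"]), len(e["emission"]))].append(e)
    return sections


def search(sections, unknown, limit=6, use_solvent=False):
    """Rank only the library section whose peak counts match the unknown."""
    key = (len(unknown["excitation"]), len(unknown["emission"]))
    candidates = [e for e in sections.get(key, [])
                  if not use_solvent or e["solvent"] == unknown["solvent"]]
    scored = sorted((similarity_index(unknown, e), e["id"]) for e in candidates)
    return scored[:limit]  # list of (index, entry id), best first
```

For example, if every one of five peaks in an unknown is displaced by 2 nm from a library entry with identical intensities, this form gives 3 x (5 x 4) / 5 = 12.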

[Figure 3. The percentage of the library that causes SEARCH to return a given number of compounds with indices ≤100. Horizontal axis: number of returned compounds.]

RESULTS AND DISCUSSION

Uniqueness of Fluorescence Spectra. To test the idea that an unknown compound can be identified by its fluorescence excitation and emission spectra, 100 randomly selected entries from the library file were used in turn as "unknowns" for SEARCH. The results are shown in Figure 3. The distribution shows the number of "unknowns," out of the 100, that caused SEARCH to return a given number of compounds with indices between 0 and 100. SEARCH returned 10 or fewer file entries for 79 of the unknowns and returned only one entry, i.e., the perfect match, for 49 of them. The upper index limit of 100 seems to define a reasonable boundary between likely and unlikely structures for a given unknown; pairs of spectra giving larger indices are obviously different upon visual inspection. Figure 3 also shows that SEARCH returned a large number of entries (>20) for a significant fraction of the unknowns. In each such instance, the large majority of the returned entries were similar in structure to the unknown. Usually, both the unknown and the returned entries were phenols or phenolic ethers. The Sadtler collection features many such compounds, all of which have relatively simple spectra that are difficult to differentiate and are prone to instrumental distortion because they are confined to short-wavelength spectral regions. For example, when the library file entry for p-cresol was entered as the unknown, SEARCH returned 78 compounds with indices between 0 and 100. Of these, 57 are some type of substituted phenol, and all but one have an oxygen attached to an aromatic ring. With the present library and searching algorithm, this comparison represents a valid worst-case test for spectral uniqueness. Even in this instance, the search clearly provides significant structural information.

[Figure 6. Structures, indices, and solvents (M = methanol, C = cyclohexane) for the three best compounds returned by SEARCH for anthracene.]

[Figure 4. Effect of the application of different sets of correction factors on the indices returned by SEARCH. Horizontal axis: initial value of correction factors.]

[Figure 5. Printout from SEARCH for the acquisition and the file searching of the anthracene spectra (operator responses are underlined).]

Correction Factors. For accurate comparisons to be made between experimental fluorescence spectra and those in the reference collection, the responses of the two spectrometers involved must be very similar. This requirement could be met by the use of corrected spectra. However, only uncorrected spectra were available for the present work, so an attempt was made to equalize the responses of the two spectrometers by applying a set of adjustment factors (not correction factors in the usual fluorometric sense) to experimental spectra from our SPF. Since we and the Sadtler company both used Aminco-Bowman SPF's that were equipped identically, the instrumental characteristics of both units are naturally fairly similar. Important differences exist only in the 200-300 nm region of the excitation spectra, and they arise partially from differences in the xenon arc output in this range. Since the distribution of arc output changes as a lamp ages, it is doubtful that a single set of correction factors could ever relate our experimental spectra precisely to more than a segment of the Sadtler library. Even so, a generally improved match in instrumental characteristics can be obtained with a single set of factors, and it naturally leads to better results from SEARCH. Such a set consists simply of a linear ramp starting with an empirically selected initial value at 200 nm and declining to a value of 1.0 at 300 nm, which is maintained throughout the remaining spectral range. When such an array is multiplied point by point with an excitation spectrum, the effect is to increase the intensity of any peak in the 200-300 nm region and to shift its position slightly to shorter wavelengths. From 300 to 700 nm, the factors have no effect on the spectrum. To find the set of correction factors that produces the greatest improvement, the spectra for 12 arbitrarily selected compounds were obtained, and 12 different sets of factors were applied to the excitation spectrum of each compound. These sets started with different values at 200 nm, ranging from 1.1 to 2.2. The resulting indices for each compound were normalized by the compound's index with no correction factors applied. The normalized indices were then averaged over each set of correction factors and plotted vs. the initial value of the array (see Figure 4). The graph shows that the indices, on average, improved as the initial value of the correction factors increased from 1.1 to 1.7. From 1.7 to 2.2, the indices got steadily worse.
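The adjustment ramp and the averaging used to choose its starting value can be sketched briefly. This assumes 1 nm sampling and point-by-point multiplication as in the text; the Gaussian test band and the index values in the check are hypothetical and only illustrate why a declining ramp raises a short-wavelength peak and moves its maximum slightly toward shorter wavelengths.

```python
import math


def adjustment_ramp(wavelength_nm, initial=1.7, start_nm=200.0, end_nm=300.0):
    """Linear ramp from `initial` at start_nm down to 1.0 at end_nm; 1.0 beyond."""
    if wavelength_nm >= end_nm:
        return 1.0
    frac = (wavelength_nm - start_nm) / (end_nm - start_nm)
    return initial + (1.0 - initial) * frac


def apply_ramp(wavelengths, intensities, initial=1.7):
    """Multiply an excitation spectrum point by point by the ramp."""
    return [y * adjustment_ramp(w, initial) for w, y in zip(wavelengths, intensities)]


def mean_normalized_index(indices_with, indices_without):
    """Average over compounds of (index with ramp) / (index without ramp)."""
    ratios = [a / b for a, b in zip(indices_with, indices_without)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    wl = list(range(200, 701))
    band = [100 * math.exp(-((w - 260) / 15) ** 2) for w in wl]  # hypothetical band
    ramped = apply_ramp(wl, band)
    peak_before = wl[band.index(max(band))]
    peak_after = wl[ramped.index(max(ramped))]
    print(peak_before, peak_after)  # 260 259: the maximum moves 1 nm to the blue
```

A normalized average below 1.0 means the ramp improved the indices on balance; scanning `initial` from 1.1 to 2.2 and plotting this average reproduces the kind of comparison shown in Figure 4.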
In light of these results, the set of correction factors starting with a value of 1.7 at 200 nm was used consistently in other tests of the file searching program.

Results of SEARCH. The fluorescence excitation and emission spectra for anthracene in cyclohexane were obtained with our SPF using SEARCH. The resulting printout, shown in Figure 5, demonstrates the typical dialogue. The first items in the printout are the title of the program and the set-up for the interface. The library file is then read into memory, and the SEARCH variables are defined. Next, the spectral data for the unknown compound are obtained. The operator specifies the solvent (C = cyclohexane) and the range to be scanned. In this case no wavelength shift is used, and no correction factors are applied. After acquiring the spectrum and correcting the baseline, SEARCH prints out the locations and relative intensities of the primary and secondary peaks. The spectrum is then plotted on the oscilloscope over a convenient wavelength range. Next, the operator has the option of either accepting the spectrum and continuing with the program or deleting it and redoing the scan. After saving the acquired spectrum on a cassette, SEARCH obtains the emission spectrum in a similar manner, and the locations and intensities of the emission peaks are listed. Finally, the results are printed out. In this file search for anthracene, three library entries were returned. They had identification numbers of 84, 662, and 748; were run by Sadtler in cyclohexane (C), methanol (M), and cyclohexane (C); and had indices of 16, 7732, and 8195. In general, there are two possible reasons why fewer compounds than specified are listed in the results: first, there might be only a limited number of entries in the library file that have the correct number of peaks; second, any other entries that might have been returned had indices greater than 65 000. In the case of anthracene, there are only three entries in the library that have five excitation peaks and four emission peaks. When the correction factors are applied to the excitation spectrum, the indices for the three returned entries improve to 6, 7312, and 8178, respectively. The structures associated with these indices are given in Figure 6. The library entry returned with the best index was anthracene, the compound used as the unknown. The two other entries returned had very high indices and, accordingly, their structures are not very similar to anthracene's.

[Figure 7. 9,10-Dibromoanthracene excitation (top) and emission (bottom) spectra, run in cyclohexane at a concentration of 2.0 ppm. Spectra were acquired from our instrument by SEARCH. Horizontal axis: wavelength (nm).]

[Figure 8. Structures, indices, and solvents (C = cyclohexane, M = methanol) for the compounds returned by the first (A), second (B), third (C), and fourth (D) passes through SEARCH for 9,10-dibromoanthracene.]

The experimentally obtained spectra for 9,10-dibromoanthracene are shown in Figure 7. SEARCH lists five primary peaks in the excitation spectrum and three primary peaks and one secondary peak in the emission spectrum. The results from the first pass through SEARCH show that the best two library entries returned have indices of 3070 and 3404 (see Figure 8A). Such high indices indicate that either there are no compounds similar to the unknown in the library or SEARCH is considering the wrong section of the library.

[Figure 9. Structures, indices, and solvents (C = cyclohexane, M = methanol) for the six best compounds returned by SEARCH for p-cresol.]

T o test a different section of the library, the secondary peak in the emission spectrum is included in the second pass through SEARCH. The results in Figure 8B show t h a t once again the best two compounds returned have high indices. The next option available is t o delete a primary peak that is not very well defined. T h e primary peak a t 343 nm is definitely a peak in the experimentally obtained excitation spectrum, but in the Sadtler spectrum it appears as a shoulder. If this peak is deleted from the third pass through SEARCH, only one library entry is returned (see Figure 8C), and it also has a very high index. T h e final option available is to delete the secondary peak t h a t was included in the second pass. If this is done, the fourth pass through SEARCH yields the three

ANALYTICAL CHEMISTRY, VOL. 48, NO. 14, DECEMBER 1976

2087

Table I. Results from SEARCH

Compound Anthracene 9,lO-Dibromoanthracene 9-Methylanthracene 9,lO-Diphenylanthracene Diphenylamine Diphenylamine 7-hydroxy coumarin 7-Hydroxycoumarin Quinine Sulfate m -Methylanisole 3,5-Xylenol 3,5-Xylenol Salicylamide p -Toluidine p-Cresol

Solventa

Position of correct compound in list of results

C C C C C M M W S C C M W M M

1 1 1 1 1 1 1 1 1 2 53 9 1 3 4

Index 6 5 17 49 60 25 61 133 57 17 76 17 31 41 15

C = cyclohexane, M = methanol, S = 0.1 N H2S04, W = water. compounds shown in Figure 8 0 . T h e correct identity for the unknown is established with an index of 5, and a tetramethyl anthracene derivative is returned with an index of 116. This file searching procedure for 9,lO-dibromoanthracene demonstrates the flexibility of SEARCH and the capability for operator interaction a t each stage of the program. However, if desired, the operations involved in the secondary peak option could easily be automated on a larger computer system. Figure 9 displays the results of using SEARCH to characterize one of the most difficult materials, p-cresol, which is a member of that large group of substituted benzenes emitting in unstructured bands in the ultraviolet. Actual experimental spectra were used as input. The figure depicts the six most likely file entries, which include the correct structure as the fourth best choice. Obviously the present algorithm is unable to define the entire structure with certainty, but the identity of the fluorophor has been discerned with confidence. Note that all six structures shown in Figure 9 are alkyl substituted phenols, and the first four, like p-cresol itself, are p-monoalkylphenols. Similar results were obtained for other substances possessing simple, seemingly nondescript spectra. With actual experimental spectra from M -methylanisole as input, SEARCH reported the correct structure as the second most likely possibility with an index of 17. The six best candidates all featured the same fluorophor, i.e., an alkyl substituted phenoxy structure. Three were phenols and three were phenolic ethers. In the case of p-toluidine, the correct structure was ranked third, with an index of 41. The six best candidates were all ring substituted alkyl anilines. In five cases, the amino function was primary; in one it was secondary. Application of SEARCH to experimental spectra from salicylamide yielded only two structures with indices less than 100. 
The correct one headed the list with an index of 31. The second best choice was 2,3-dimethoxymethylbenzoate. Interestingly, all six of the best structures, with indices ranging up to 306, were derivatives of benzoic acid.

The ability of the searching algorithm to suggest structural analogues is useful when the unknown is not represented in the library. Such a case is 9,10-dimethylanthracene. When its experimental spectra were compared with the library, SEARCH reported that the best candidates were 9,10-dibromoanthracene and 2,3,9,10-tetramethylanthracene. The indices were 152 and 216, respectively: large enough to indicate that a perfect match was probably not found, yet small enough to indicate a significant degree of structural similarity to the unknown. The third best candidate was 1-cyanopyrene, which gave the much less satisfactory index of 772.

Table I summarizes the SEARCH results for all fifteen materials studied to date. These results demonstrate that the computer file searching procedure can be applied successfully to a library of the fluorescence spectra of 1000 compounds. If more compounds were added to the library, or if the file searching procedure were linked to a high pressure liquid chromatograph-fluorescence spectrometer system, a much more detailed comparison procedure would be required, probably one based on corrected spectra. With an enlarged library, some sort of "presearch" would also be needed to limit the number of entries that the detailed comparison procedure must consider. The present file searching procedure, based on the most obvious spectral features, would be ideally suited to this task of restricting the compounds under consideration.
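The two-stage strategy suggested above (a coarse presearch on the most obvious spectral features, followed by a detailed ranking of the surviving entries) can be sketched as follows. This is a minimal illustration under stated assumptions, not the index actually computed by SEARCH: the `Spectrum` record, the 10 nm presearch tolerance, the peak-difference index, and the 20-point penalty for unmatched peaks are all hypothetical choices. As with SEARCH, however, a smaller index denotes a closer match, and the six best candidates are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical reduced record of one compound's spectra (wavelengths in nm)."""
    name: str
    excitation_max: float
    emission_peaks: tuple[float, ...]  # strongest emission peak first


def presearch(unknown: Spectrum, library: list[Spectrum],
              tolerance_nm: float = 10.0) -> list[Spectrum]:
    """Coarse filter (assumed criterion): keep only entries whose strongest
    emission peak lies within tolerance_nm of the unknown's strongest peak."""
    target = unknown.emission_peaks[0]
    return [e for e in library if abs(e.emission_peaks[0] - target) <= tolerance_nm]


def match_index(unknown: Spectrum, entry: Spectrum,
                missing_peak_penalty: float = 20.0) -> float:
    """Hypothetical dissimilarity index: 0 for identical records, larger is worse."""
    score = abs(unknown.excitation_max - entry.excitation_max)
    # Compare emission peaks pairwise in order of intensity; zip stops at the shorter list.
    score += sum(abs(u - e) for u, e in zip(unknown.emission_peaks, entry.emission_peaks))
    # Penalize peaks present in one spectrum but absent from the other.
    score += missing_peak_penalty * abs(len(unknown.emission_peaks) - len(entry.emission_peaks))
    return score


def search(unknown: Spectrum, library: list[Spectrum],
           top_n: int = 6, tolerance_nm: float = 10.0) -> list[tuple[str, float]]:
    """Two-stage search: presearch first, then rank the survivors by ascending index."""
    survivors = presearch(unknown, library, tolerance_nm)
    scored = [(e.name, match_index(unknown, e)) for e in survivors]
    return sorted(scored, key=lambda pair: pair[1])[:top_n]


# Invented example values, for illustration only.
library = [
    Spectrum("A", 300.0, (400.0, 420.0)),
    Spectrum("B", 305.0, (405.0,)),
    Spectrum("C", 250.0, (330.0,)),
]
unknown = Spectrum("unknown", 302.0, (402.0, 421.0))
print(search(unknown, library))  # [('A', 5.0), ('B', 26.0)]
```

Entry C is discarded by the presearch because its strongest emission peak lies far from that of the unknown, so only the survivors are scored in detail. That saving is what makes a presearch attractive as the library grows.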


RECEIVED for review May 26, 1976. Accepted September 7, 1976. We are grateful to E. I. du Pont de Nemours, Inc., for supporting this work through a Young Faculty Grant and to the National Science Foundation for enabling the purchase of the computer system under Grants GP-37335X and MPS-75-05361. Appreciation is also extended to the Sadtler Research Laboratories, Inc., for permitting the creation of a special digital file of copyrighted information. Presented in part at the 170th National Meeting of the American Chemical Society, Chicago, Ill., August 29, 1975.

ANALYTICAL CHEMISTRY, VOL. 48, NO. 14, DECEMBER 1976