Automatic sorting of infrared spectra - ACS Publications

characteristics, with the exception of their resistance, would be constant. The thermistor dissipation constant and intrinsic temperature coefficient are therefore the limiting factors in establishing the temperature resolution of a thermistor.

Although our primary interest in this work was analytical rather than calorimetric, one can estimate the precision of the device for microcalorimetry from the amount of acid added and the reproducibility of the total voltage change due to chemical reaction. The calorimetric precision for the titration of 15 µM perchloric acid was estimated by comparing the ratio of Y-axis deflection to the end-point volume. This was done to avoid the errors due to uptake of carbon dioxide. For three replicate titrations, this ratio was precise to 7% in those runs wherein the total temperature change and heat liberated were 200 µ°C and 20 millicalories, respectively. The calorimetric system is easily scaled down by a factor of 4 in a 25-ml system. Thus we expect that approximately 5 millicalories could be measured with the same precision. These results may be compared to those obtained with the differential heat flow microcalorimeter (volume ca. 5 ml) used by Benzinger et al. (19). It has a precision of ±1.1 at the 5 millical. level. Such systems have a response time of 15 minutes. Clearly, our temperature resolution could be substantially improved by trading time response for greater sensitivity by decreasing the system band width.

Table I indicates that very dilute solutions can be analyzed thermometrically provided that the reaction is sufficiently complete and rapid, and therefore produces a reasonably straight titration curve. The end-point precision at this level is not set by the buret employed. Most likely the real source of random end-point error is temperature imprecision (8).

The results of this table should be compared to those of Jahr, Weise, and Schuchardt (17), who successfully titrated ~200 µM potassium hydroxide at a precision of 0.2%, and to those of Jordan and Dumbaugh (20), who, in an earlier study of the precision of thermometric titration, obtained a precision of 8% at the 0.8 mM level and a limiting precision of about 1% at higher concentrations. It is interesting to note that the per cent uncertainty is inversely proportional to the analyte concentration, i.e., up to a concentration of about 1.5 mM the product of the first and seventh column entries of Table I is roughly constant, in accord with an a priori prediction (8). At concentrations greater than this, the precision becomes limited by the volume resolution of the buret.

The results strongly indicate that the limiting precision in constant rate titrations is the reproducibility of the buret. We attempted to improve this situation by the use of a linear position transducer; however, the small voltage required to avoid self heating of the linearly variable resistor and the effect of the continuous drain on the mercury cells employed were equivalent to the inherent imprecision in the pump and syringes. The linear position transducer is considerably more convenient than a simple time drive system but is no more precise or accurate. As shown in Table I, high precision can be obtained by preadding a deficient but known quantity of titrant from a high precision pipet or buret and then locating the end point by constant rate titration. Presumably better burets or coulometric reagent generation could be used to provide higher precision in continuous titrations.

(19) C. Kitzinger and T. H. Benzinger, “Methods of Biochemical Analysis,” D. Glick, Ed., Vol. 8, Interscience, New York, N.Y., 1958, p 309.

ACKNOWLEDGMENT

We thank Marshall Williams of the University Electronic Design and Maintenance Shop for aid in the design and construction of the lock-in amplifier.

RECEIVED for review December 3, 1971. Accepted April 26, 1972. The Public Health Service supported this work through Grant GM 17913.

(20) J. Jordan and W. H. Dumbaugh, Jr., ANAL. CHEM., 31, 210 (1959).

NOTES

Automatic Sorting of Infrared Spectra

C. S. Rann
National Biological Standards Laboratory, Box 462, Canberra City, A.C.T. 2601, Australia

IDENTIFICATION OF ORGANIC COMPOUNDS by means of infrared spectra has been practiced for many years. However, extensive use of this technique has been limited by two fundamental difficulties. First, a large library or atlas of infrared spectra is required (1), and second, the time required to search this library (40,000 spectra) for a spectrum which matches the unknown compound is very great. In practice the time for such a search is so great that it is rarely, if ever, undertaken unless other information is brought to bear on the problem in order to pre-sort the spectra to produce a much smaller group which can then be sorted within a reasonable time.

A few large institutions have electronic data sorting equipment available and in such cases it is possible to undertake the more extensive search of the infrared library [e.g., using ASTM cards or magnetic tapes (2)]. The present system is coded in a similar fashion to that used by the Sadtler “Spec-Finder” system (3), but the subsequent processing details and method of operation provide greater speed and much improved discrimination in spectra identification.

The work described in this paper resulted from the need for rapid identification of pharmaceutical products, but the technique evolved can be applied to a very wide range of laboratory identifications, or indeed to any curve matching problem. The method has been designed for use with a small computer or a desk top calculator. Many laboratories have these small units available either for instrumental automation control of small pilot plants or simply as laboratory calculators. The method presented here was developed for use with a programmable desk top calculator. Peripheral equipment in the form of a teletype unit and interface is also required if a large number of spectra are to be sorted automatically.

(1) “Documentation of Molecular Spectroscopy,” Butterworth and Co., London.
(2) Wyandotte-ASTM (Kuentzel) Punched-card Index, American Society of Testing and Materials, Philadelphia, Pa.
(3) Sadtler Research Laboratories, Philadelphia, Pa.

ANALYTICAL CHEMISTRY, VOL. 44, NO. 9, AUGUST 1972

Figure 1. Method of coding an infrared chart. (A) Infrared chart with superimposed overlay for coding. (B) Infrared chart showing the positions used in coding the number 8008678185

Figure 2. Histogram of the “sum of the differences” obtained in matching 300 drugs to an unknown sample. (The unknown gave a match of “0” when compared to the correct drug in the comparison experiment.) Horizontal axis: difference

Coding the Spectrum. The spectrum to be coded is divided into ten sections, the divisions being arranged such that more sampling is provided in the “fingerprint” regions of the spectrum. Figure 1 shows a typical infrared chart and the same chart with overlay in place for coding. Each of the ten sections is further subdivided into ten. The code digit for each section is obtained by taking, in this case, the maximum absorbance of the curve within that section and assigning to it the digit above it on the transparent overlay. (In almost all cases this will represent the strongest band in the section.) These digits have been marked on the chart at the sampling points used. Reference to the chart and overlay demonstrates the coding positions for the code number 8008678185.

Several other methods of coding could be used. However, any method selected should be such that the coded digits are independent of the actual absorbance (or transmission) level of the spectrum. The digits must code an invariant property of the spectrum, such as the wavelengths of the highest or lowest transmission. After coding, each spectrum is represented by a 10-digit number which has a crude relationship to the shape of the spectrum and has the required feature that any two operators coding a particular spectrum will obtain the same (or very similar) code number.

In coding the spectrum with the overlay, there are occasions when the reading of the digit is not uniquely defined by the above rules and it is necessary to formulate further rules to handle these specific cases. Provided the rules are consistent, it is not a matter of particular importance which actual method is used. For example, the following rules are the ones used in the chart shown in Figure 1. When the peak absorbance in the range occurs on a grid line, take the lower of the two alternative digits. If two or more equal peaks occur in the sampling range, choose the point with the lowest digit. If the spectrum is flat across the range, i.e., there is no peak absorbance, choose the lowest digit; in most cases this will be zero, but in some cases it will be the lowest digit before the spectrum starts to show a decrease in absorbance.

Figure 3. Program for a Hewlett-Packard Model 9100B desk calculator

Program  Mnemonic  Machine  Input/output
step     code      code     operations
00       CLR       20
01       STP       41       Enter unknown spectrum
02       XTO       23
03       A         13
04       FMT       42       Spectrum identification from tape
05       F         15
06       XTO       23
07       9         11
08       FMT       42       Comparison spectrum from tape
09       F         15
0A       XTO       23
0B       B         14
0C       1         01
0D       0         00
10       XTO       23
11       C         16
12       EEX       26
13       1         01
14       1         01
15       XTO       23
16       D         17
17       GTO       44
18       3         03
19       B         14
1A       A         13
1B       UP        27
1C       D         17
1D       DIV       35
20       DN        25
21       UP        27
22       INT       64
23       -         34
24       C         16
25       X         36
26       DN        25
27       INT       64
28       RDN       31
29       B         14
2A       XEY       30
2B       D         17
2C       DIV       35
2D       DN        25
30       UP        27
31       INT       64
32       -         34
33       C         16
34       X         36
35       DN        25
36       INT       64
37       -         34
38       |y|       55
39       AC+       60
3A       D         17
3B       UP        27
3C       C         16
3D       DIV       35
40       X=Y       50
41       4         04
42       8         10
43       YTO       40
44       D         17
45       GTO       44
46       1         01
47       A         13
48       E         12
49       UP        27
4A       5         05
4B       X>Y       53
4C       5         05
4D       4         04
50       CLR       20
51       GTO       44
52       0         00
53       4         04
54       XFR       67
55       9         11
56       FMT       42       Print spectrum identification
57       8         10
58       B         14
59       FMT       42       Print comparison spectrum
5A       8         10
5B       E         12
5C       FMT       42       Print sum of differences
5D       E         12
60       FMT       42
61       9         11
62       CLR       20
63       GTO       44
64       0         00
65       4         04
66       END       46

Storage registers
E        Sum of differences
D        10 to the power of N
C        Divisor 10
B        Comparison spectrum
A        Unknown spectrum
9        Identification number

Comparison of Two Spectra. In sorting the library of known spectra for a match with the spectrum of the unknown compound, it is necessary to have some figure of merit to evaluate quantitatively the degree to which the two spectra under investigation match each other. As the spectra are each represented by a 10-digit number, the digits can be individually compared in sequence. The method of comparison used here was simply to sum the moduli of the differences between corresponding digits taken sequentially; this sum will be called the “error.” For example:

Unknown spectrum                  8008678185
Comparison spectrum               8137678274
Modulus of the differences        0131000111
Error (sum of the differences)    8
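The comparison above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the calculator program itself; the function names `digit_differences` and `code_error` are hypothetical, and the codes are handled as strings of digits for convenience.

```python
def digit_differences(unknown: str, comparison: str) -> str:
    """Modulus of the difference for each pair of corresponding digits."""
    if len(unknown) != len(comparison):
        raise ValueError("codes must have the same number of digits")
    return "".join(str(abs(int(u) - int(c))) for u, c in zip(unknown, comparison))


def code_error(unknown: str, comparison: str) -> int:
    """The "error": sum of the moduli of the digit differences."""
    return sum(int(d) for d in digit_differences(unknown, comparison))


# Worked example from the text (hypothetical variable names).
unknown_code = "8008678185"
comparison_code = "8137678274"
print(digit_differences(unknown_code, comparison_code))  # 0131000111
print(code_error(unknown_code, comparison_code))         # 8
```

Because each digit difference lies between 0 and 9, the error for a 10-digit code can range from 0 (identical codes) to 90.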

A perfect match would result in the sum of the differences being zero. In practice, as several operators may have been involved in coding the spectra, some differences could occur between the digits for identical spectra. A low value of, say, 5 for the sum of the differences would merit a manual inspection of the spectra involved.

An infrared spectrum is a complex curve, so a large number of sampling points would ordinarily be used in defining it. The ten points used here would normally be considered quite inadequate, but practical testing has shown that such a low number of sampling points can indeed be sufficient for sorting spectra. This result is of vital importance to the small computer user, as the storage capacity, i.e., memory, of the small computer is very limited.

In the histogram shown in Figure 2, the results are presented for the matching of approximately 150 standard drugs against two separate unknowns; therefore the histogram is made from approximately 300 matchings. In this experiment both unknowns gave a match of “0,” which has not been recorded in the histogram. It can be seen from Figure 2 that when two different drugs are matched, the sum of the differences is most likely to be higher than 20. There are few matches below 10, and any spectra showing a match of 5 or less should be compared with the unknown manually. In all tests undertaken so far with this program, the unknowns matched the correct comparison spectra with an error of zero except in one case where the error was one.

Even for a series of closely related drugs, the error is high except for the correct drug. Table I shows the code numbers of 10 sulfa drugs, all closely related in chemical structure. The unknown spectrum was obtained by a second operator taking a drug at random, preparing the infrared spectrum, coding the spectrum, and then making the comparison. As can be seen from the error column, there can be no doubt that drug No. 2 comes much closer to matching the unknown than any other drug. The error of 1 in this case was due to minor differences in coding technique between the two operators involved in the test.

Table I. Comparison of Sulfa Drugs

Drug       Code             Error
Unknown    805 804 394 1
No. 1      605 804 285 2    6
2          805 804 395 1    1
3          605 802 285 2    8
4          605 825 545 2    14
5          605 861 285 2    15
6          605 713 235 2    14
7          705 884 265 3    16
8          605 882 175 2    18
9          605 803 275 2    8
10         605 504 286 3    11

Table II. Format Instructions for H/P Coupling Box Type 2570A

Format   Machine code       Purpose
8        @ 0 I, @ 7 H       Prints 10-digit number from X register
9        @ 0 J,             Carriage return, line feed on the TTY
E        @ 0 I, @ 7 E,      Prints 4-digit number from X register
F        @ 00 @ 7 M         Transfers number on tape into X register

EXPERIMENTAL

The equipment used was a Hewlett-Packard desk top calculator type 9100B interfaced to a Teletype Corporation teletype Model 33 by a Hewlett-Packard coupling box type 2570A.


The library of spectra was held on a paper tape. The operator was required to enter the digits of the unknown spectrum into the calculator, load the tape into the teletype reader, and start the program. The program would then run unattended until the library tape had been processed. Any comparison spectrum with a figure of merit less than 5 would be printed out by the teletype, together with a reference number which enabled the details of the compound and an infrared spectrum chart to be located in the main library of drug spectra.

The Hewlett-Packard type 9100B calculator has 28 storage registers with which to store both the computer program and any numerical data required for the program. Stored one digit per register, the unknown and comparison spectra would require twenty registers. The storage space required for these numerical data would considerably reduce the storage space available for the program, and is very inefficient in that a register would store only a one-digit number, whereas it can store up to 12 digits in one number. It is therefore necessary to split the register so that the ten digits of a spectrum can be held in only one register and be fed out in sequence as the program calls for each digit in turn.

The register was split by a subroutine which divided the number in the register by a given power of ten such that a decimal number was obtained that was greater than 1 and smaller than 10. By using the “integer” facility on the desk calculator, it was then possible to obtain the first integer free of the remaining decimal fraction. After use, this integer was subtracted from the original number and the process repeated to obtain the next integer in the sequence. The program is shown in Figure 3. The format statements (FMT) used are presented in Table II, together with the equivalent machine language instruction to the system controller. The program required 6 registers (i.e., 84 memory locations) and 2 further registers were required for holding the spectra.
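The register-splitting step and the unattended search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a transcription of the Figure 3 program: the function names are hypothetical, Python's integer `divmod` stands in for the calculator's floating-point division and “integer” key, and the small library is simply the ten sulfa drug codes as read from Table I.

```python
def split_digits(packed: int, n_digits: int = 10) -> list[int]:
    """Feed out the digits of a packed code one at a time, leading digit first."""
    digits = []
    remainder = packed
    power = 10 ** (n_digits - 1)
    while power >= 1:
        # Divide by the current power of ten and keep the integer part;
        # what is left over is carried forward to yield the next digit.
        digit, remainder = divmod(remainder, power)
        digits.append(digit)
        power //= 10
    return digits


def error(unknown: int, comparison: int) -> int:
    """Sum of the moduli of the digit-by-digit differences."""
    pairs = zip(split_digits(unknown), split_digits(comparison))
    return sum(abs(u - c) for u, c in pairs)


def search(unknown: int, library, threshold: int = 5):
    """Return (reference number, code, error) for entries with error below threshold."""
    hits = []
    for reference, code in library:
        e = error(unknown, code)
        if e < threshold:
            hits.append((reference, code, e))
    return hits


# Stand-in library: (drug number, code) pairs as read from Table I.
SULFA_LIBRARY = [
    (1, 6058042852), (2, 8058043951), (3, 6058022852), (4, 6058255452),
    (5, 6058612852), (6, 6057132352), (7, 7058842653), (8, 6058821752),
    (9, 6058032752), (10, 6055042863),
]
UNKNOWN = 8058043941

for hit in search(UNKNOWN, SULFA_LIBRARY):
    print(hit)  # (2, 8058043951, 1)
```

Holding each spectrum as one packed number mirrors the storage argument in the text: two values suffice for the unknown and the comparison spectrum, at the cost of a short digit-extraction loop on every comparison.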
Using the tape reader on the Teletype Model 33, the tape is read at 10 characters/sec. In a typical operation, 150 drugs would be processed in 5 min. If the Hewlett-Packard high speed tape reader Model 2748A were used, at an operating speed of 500 characters/sec, approximately 7500 drugs could be processed in the same time. The number of spectra that can be handled in this way is limited only by the size of the paper tape processing spools. If multi-spool storage facilities are used, any number of spectra could be processed.

ACKNOWLEDGMENT

The author wishes to acknowledge suggestions and advice offered by C. E. Kendall during the development of this method and to thank the Director, National Biological Standards Laboratory, for permission to publish the paper.

RECEIVED for review October 15, 1971. Accepted February 10, 1972.
