Fast searching system for the ASTM infrared data file - Analytical

tween CD and EF because of nitrogen inversion. The temperature dependence is ascribed to the change in exchange rate; rapid exchange at high temperature averages the spectra completely into an AB quartet, and rather slow exchange at low temperature decomposes the AB pattern into CD and EF parts. The power saturation effect shows that the longitudinal relaxation times of the A and B parts are the same and that the line width difference is caused by differences in their T2 values. The frequency dependence also gives clear evidence of the exchange. The higher magnetic field increases the chemical shift difference between spin systems which are exchanging. Experimentally, the line width ratio at 100 MHz was found to be about 100/60 times as great as at 60 MHz. The fact that H2O and D2O give similar results is also reasonable in that the separate spin patterns arise from sluggish nitrogen inversion and not from spin-spin splitting between the acid proton and methylene protons. The concentration-independent line width ratio indicates that HY3- does not exchange protons by a direct bimolecular reaction with Y4- or with H2Y2-. This, the pH dependence, and water broadening suggest that H2Y2- and Y4- undergo rapid inversion of nitrogens, that the maximum line broadening of HY corresponds to the minimum contribution from intermediates of the type H2Y2- or Y4-, and that the conversion of HY into H2Y2- and HY3- into Y4- is accomplished by reaction with H+ and OH-, respectively.

RECEIVED for review December 7, 1967. Accepted February 28, 1968. The authors acknowledge financial support from the National Institutes of Health, Grant GM-12598.

Fast Searching System for the ASTM Infrared Data File

Duncan S. Erley
Chemical Physics Research Laboratory, The Dow Chemical Co., Midland, Mich.

An IBM-1130 computer has been programmed to translate the ASTM deck of 90,000 infrared data cards into a disk file (three disks), and to search that file from data entered through the keyboard (typewriter). The total search time, excluding data entry and disk warm-up, is ~90 seconds. Chemical group and absorption band information is entered as either positive or negative data. Only those standards matching all of the input data are scored as hits. The input data may be readily modified for further searching. Other options include automatic searching of a 0.1-μ interval on either side of the band entered, and eliminating any sub-file from the search. The program is written in Assembler Language, and could probably be used on an IBM-1800 with little modification.

The need for improved infrared file searching methods has become critical with the great backlog of data on file, plus increases in the rate at which it is generated. Punched card systems (1, 2) using a sorter or collator to process the cards are useful for small files, but are too slow for large ones. They do have the advantage that the equipment is low cost and may be used by the spectroscopist himself. The ASTM infrared data file, coded on punched cards, has also been stored on magnetic tapes, and programs for searching this file on high-speed computers have been written (3-6). This approach requires the use of a large computer which is not only expensive, but may be relatively inaccessible to the spectroscopist. In addition, passing the tape file through the machine is a relatively slow process and, while many searches can be done during this time, the spectroscopist is often concerned with only one unknown spectrum.

In 1967, the author reported on a fast (2200 standard spectra per second) system for searching The Dow Chemical Co.'s file of infrared data using an IBM-1130 computer (7). This low-cost machine has recently become widely available, and considerable interest was expressed in adapting the search system to the commercially available ASTM infrared data file. This report describes such a system. The search is rapid (~1000 standard spectra per second) and data entry has been programmed so that the spectroscopist may run searches himself with a few minutes' instruction. In addition, the data entered may be conveniently altered if re-searching the file is desirable. The program was written in Assembler Language and requires ~7000 words of core memory. It could probably be used on an IBM-1800 with little modification.

(1) L. E. Kuentzel, Anal. Chem., 23, 1413 (1951).
(2) A. W. Baker, N. Wright, and A. Opler, ibid., 25, 1457 (1953).
(3) R. A. Sparks, “Storage and Retrieval of Wyandotte-ASTM Infrared Spectral Data Using an IBM 1401 Computer,” ASTM, Philadelphia, Pa., 1964.
(4) L. D. Smithson, L. B. Fall, F. D. Pitts, and F. W. Bauer, Tech. Doc. Rept. No. RTD-TDR-4265, Research and Technology Division, Wright-Patterson AFB, Ohio, 1964.
(5) T. A. Entzminger and E. A. Diephaus, “Storage and Retrieval of Wyandotte-ASTM Infrared Spectral Data Using a Honeywell 400 Computer,” U. S. Public Health Service, Robert Taft Sanitary Engineering Center, Cincinnati, Ohio, 1964.
(6) Sadtler Research Laboratories, 1517 Vine St., Philadelphia, Pa.
(7) D. S. Erley, Paper No. 121, XIII Colloquium Spectroscopicum Internationale, Ottawa, June 1967.
(8) “Codes and Instructions for Wyandotte-ASTM,” ASTM, 1916 Race St., Philadelphia, Pa., 1964.

ANALYTICAL CHEMISTRY

Figure 1. Data input requests typed by the computer: A. SAMPLE; B. GROUPS AND BANDS PRESENT; C. GROUPS ABSENT. NO BAND REGIONS

File Generation. In order to make the data file as compact as possible, only those data from the cards which were felt to be useful in finding an unknown infrared spectrum were retained. The data were condensed into 10 16-bit words as shown in Table I. Words 1 and 2 (a binary number and two letters) identify the serial number of the standard (8). Words 3 and 4 contain selected elemental, physical, and structural data about the compound as shown in Table II. Words 5-10 contain the absorption band positions (5.5-15.0 μ); each card punch generates a bit in a corresponding computer word. One 512,000-word disk could theoretically hold the data from 51,200 standards; however, a certain amount of disk
storage is required for programs, subroutines, etc.; therefore, it is more convenient to store only 41,472 standards per disk. The present file requires three disks, with space on the third disk for an additional 32,518 standards. A file generating program, WYCDK, was written to translate the ASTM cards to disk storage format.

The Search Program. Data from an unknown spectrum are typed directly into the computer by the spectroscopist and are used to form a series of masks. The corresponding data words from the disk file of standard spectra are compared with these masks by logical AND and EXCLUSIVE OR operations. Those which match all of the input data are scored as “hits.” A general description of the main program segments (Data Input, Searching, and Output plus Data Modification) is given below.

Input. Figure 1 shows the data input requests typed by the computer. For convenience all input/output operations are programmed on the console typewriter. First a sample identifier is entered (A). Next, information about the presence of functional groups or elements and the positions of the observed absorption bands is typed in (B). Group data are accessed by the two-digit codes shown in Table II. Absorption band positions may be entered either as wavelength or wavenumber. Consecutive entries such as 10.1, 10.2, 10.3, 10.4 are generated by using a dash (10.1-10.4). Finally, negative data, the absence of groups or elements and “no band” regions of the spectrum, are requested (C). After the final data entry, the program retypes the data in the masks as shown in Figure 2. Operator entries have been underlined. It will be noted that for each “group” datum, the name as well as the code number is printed. Wavenumber entries are converted to wavelength, and negative data appear in parentheses preceded by a logical NOT sign (¬). Searching is begun while the data are being retyped.

Searching. Standard data are read from the disk and processed at the rate of ~1000 standards per second. Processing overlaps disk reading for maximum speed. The incoming absorption band data are “wiggled” to generate a bit on either side of those already present, so that a compound having a band coded in an interval adjacent to one requested will score a hit. The “wiggle” feature is not utilized until after the “no band” comparisons have been made; therefore, it has no effect on them. The serial numbers of the first 100 hits are stored for printout, and searching is terminated when a zero standard number is found.

Output and Data Modification. Following the search, the number of hits is printed, and five data switch options are requested (Figure 3). These are executed by turning on the appropriate data switch and enable the operator to:

1. Have the serial numbers of the hits printed.
2. Modify the data entered for re-searching the file.
3. Change disks if more than one is used for the data file.
4. Skip the subroutine that “wiggles” the absorption band data.
5. Eliminate any sub-files from the search.
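The comparison of a standard with the search masks, as described under The Search Program and Searching, may be illustrated by the following minimal Python sketch. It is not the Assembler program itself. The function names, the treatment of the six band words as one contiguous 96-bit string, and the bit ordering are assumptions made only for illustration.

```python
# Illustrative sketch only; names and bit ordering are assumptions.
N_INTERVALS = 96                      # 5.5-15.0 microns in 0.1-micron steps
ALL_BANDS = (1 << N_INTERVALS) - 1


def interval_bit(index):
    """Bit for the index-th 0.1-micron interval (index 0 = 5.5 microns)."""
    return 1 << (N_INTERVALS - 1 - index)


def wiggle(bands):
    """Add a bit on either side of every band bit already present."""
    return (bands | (bands << 1) | (bands >> 1)) & ALL_BANDS


def contains_all(data, mask):
    """True when every mask bit is set in data (logical AND, then EXCLUSIVE OR)."""
    return ((data & mask) ^ mask) == 0


def is_hit(std_groups, std_bands, want_groups, no_groups,
           want_bands, no_bands, use_wiggle=True):
    """Score a standard as a hit only if it matches all of the input data."""
    if not contains_all(std_groups, want_groups):
        return False
    if std_groups & no_groups:        # a group entered as absent is present
        return False
    if std_bands & no_bands:          # "no band" test is made before the wiggle
        return False
    if use_wiggle:
        std_bands = wiggle(std_bands)
    return contains_all(std_bands, want_bands)
```

In this sketch a standard coded at 6.9 μ (interval 14) satisfies a request for 7.0 μ (interval 15) only when the wiggle is applied, while a “no band” request at 7.0 μ is unaffected by it.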

Table I. Translation of ASTM-IR Data Cards into 10 16-Bit Words

Word    Identification
1       Serial No. 0-65535
2       Two letters designating ASTM and user sub-files (Card Cols. 79 and 71 in source data card) (see Table III)
3       Elemental data (see Table II)
4       Group data (see Table II)
5       Absorption bands, 5.5-7.0 μ
6       Absorption bands, 7.1-8.6 μ
7       Absorption bands, 8.7-10.2 μ
8       Absorption bands, 10.3-11.8 μ
9       Absorption bands, 11.9-13.4 μ
10      Absorption bands, 13.5-15.0 μ
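The record layout of Table I, and the disk capacities quoted under File Generation, can be checked with a short Python sketch. The helper names, the big-endian byte order, the 8-bit character packing in word 2, and the choice of the most significant bit for the first interval of each word are all assumptions; the source specifies only the ten-word layout and the 0.1-μ intervals.

```python
import struct

WORDS_PER_RECORD = 10                 # Table I
FIRST_BAND_TENTHS = 55                # 5.5 microns, first coded interval


def band_position(wavelength):
    """Return (word, bit) for a wavelength in microns; words numbered as in Table I."""
    index = round(wavelength * 10) - FIRST_BAND_TENTHS
    if not 0 <= index < 96:
        raise ValueError("band outside the coded 5.5-15.0 micron range")
    return 5 + index // 16, index % 16


def pack_record(serial, letters, word3, word4, bands):
    """Pack one standard into ten 16-bit words (20 bytes); encoding is assumed."""
    words = [0] * WORDS_PER_RECORD
    words[0] = serial                                     # word 1: 0-65535
    words[1] = (ord(letters[0]) << 8) | ord(letters[1])   # word 2: two letters
    words[2], words[3] = word3, word4                     # words 3 and 4
    for wavelength in bands:
        word, bit = band_position(wavelength)
        words[word - 1] |= 0x8000 >> bit                  # words 5-10
    return struct.pack(">10H", *words)


# Capacity figures quoted in the text
DISK_WORDS = 512_000
theoretical_per_disk = DISK_WORDS // WORDS_PER_RECORD     # 51,200 standards
standards_on_file = 3 * 41_472 - 32_518                   # 91,898 standards
```

The last figure is consistent with the deck of about 90,000 cards, and at ~1000 standards per second it corresponds to roughly 90 seconds of searching.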

Table II. Physical and Elemental Data-Words 3 and 4

Code (bit number)    Identification         Card column-row
WORD 3
01                   Solid                  38-0
02                   Liquid                 38-1
03                   Solution               38-6
04                   Polymer                38-7
05                   Inorganic              39-Y
06                   Oxygen                 32-0
07                   Nitrogen               32-1
08                   Sulfur                 32-2
09                   Fluorine               32-3
10                   Chlorine               32-4
11                   Bromine/iodine         32-5
12                   Phosphorus/bismuth     32-6
13                   Arsenic/antimony       32-7
14                   Silicon/germanium      32-8
15                   Tin/lead               32-9
16                   Boron/aluminum         32-X
WORD 4
17                   Metals                 32-Y
18                   Salt                   39-8
19                   C=C                    33-X
20                   C≡C                    33-Y
21                   Acyclic                34-0
22                   Alicyclic              34-1
23                   Aromatic               34-2
24                   Heterocyclic           34-3
25                   Fused                  34-4, 5, 6 (a)
26                   C=O
27                   OH
28                   NH
29                   C-O-C
30                   C=N
31                   NO2
32                   S=O

Card column-rows for codes 26-32, in the order extracted (the assignment to individual codes is not recoverable from the source): 42-0, 1, 2, 3, 4 (a); 48-0, 1, 2; 52-0, 1, 2; 54-0, 1; 42-7; 44-4, 5 (a); 42-5, 6, 8, 9 (a); 43-2; 44-2 (a); 48-4; 50-4; 48-5, 8 (a); 49-0; 52-3, 4, 5, 6, 7, 8 (a); 53-0, 1; 54-3, 4, 5, 6, 7.

(a) A punch in any one of the locations generates a bit.
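Because the two-digit codes of Table II double as bit numbers, a group mask can be built directly from the codes typed by the operator. The following Python sketch shows one way this might be done. The function names are hypothetical, and it is assumed (the source does not say) that the lowest code in each word occupies the most significant bit.

```python
def code_to_word_bit(code):
    """Map a Table II code (1-32) to (word, bit), with words numbered as in Table I."""
    if not 1 <= code <= 32:
        raise ValueError("Table II codes run from 01 to 32")
    word = 3 if code <= 16 else 4
    return word, (code - 1) % 16


def group_masks(codes):
    """Return (word 3 mask, word 4 mask) for a list of two-digit codes."""
    masks = {3: 0, 4: 0}
    for code in codes:
        word, bit = code_to_word_bit(code)
        masks[word] |= 0x8000 >> bit   # assumed: bit 0 is the most significant
    return masks[3], masks[4]
```

For example, the entries 19 (C=C) and 26 (C=O) both fall in word 4 and leave the word 3 mask empty.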

Figure 2. Infrared spectrum of methyl acrylate, and computer input for search. Operator entries are underlined.

Figure 3. Options which may be selected by data switch settings: 1-PRINT SERIAL NOS, 2-MODIFY DATA, 3-CHANGE DISKS, 4-SKIP WIGGLE, 5-ELIM FILES.

Figure 4. Infrared spectrum of p-dichlorobenzene and typical search. Operator entries are underlined. Data switches turned on are circled.

Figure 5. Infrared spectrum of resorcinol and typical search. Operator entries are underlined. Data switches turned on are circled.

Figure 6. Data modification steps. Note that data may either be added to, or selectively removed from, the search masks. The use of the “ELIM FILES” option is also illustrated.

VOL 40, NO. 6, MAY 1968

OPTION 1, “PRINT SERIAL NUMBERS,” is self-explanatory. If more than 100 hits are scored, only the first 100 serial numbers are printed. Output consists of five digits plus the two letters designating the ASTM sub-file (Table III).

OPTION 2, “MODIFY DATA,” enables the operator to enter additional data, or to delete selectively any data previously entered. The latter is accomplished merely by re-entering the datum to be deleted. This is most useful when several combinations of bands are to be tried, as the group and no band information need not be re-entered.

OPTION 3, “CHANGE DISKS,” causes the computer to wait while one disk is taken out and a new one inserted, a process taking about 3 minutes. No re-entry of data is required. The number of hits is cumulative and the serial numbers of standards from all disks are stored for printout.

OPTION 4, “SKIP WIGGLE,” causes the program to branch around those steps which “wiggle” the incoming absorption band data. This makes the search more selective, but increases the possibility of missing a standard actually in the file. The “wiggle” feature is especially important if bands lie close to the boundary between two coded intervals.

OPTION 5, “ELIM FILES,” allows entry of the ASTM sub-file letter codes (see Table III) of any files which one is reasonably sure do not contain the unknown. Those standards, then, will be skipped. The file data are requested after the “no band” data have been entered (see Figure 6C).
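The behavior of the MODIFY DATA option, in which re-entering a datum deletes it, is what an EXCLUSIVE OR of the new entry into the existing mask would produce. The Python sketch below illustrates this with a set standing in for the bit mask, together with a simple filter for ELIM FILES. Both functions are hypothetical. Comparing the whole sub-file designation (for example “C” as distinct from “CD”) is an assumption, and the entries are loosely modeled on Figure 6.

```python
def modify(mask, entries):
    """Toggle each entry: add it if absent, delete it if already present."""
    result = set(mask)
    for entry in entries:
        result ^= {entry}              # set analogue of EXCLUSIVE OR on a bit mask
    return result


def surviving(hits, eliminated):
    """Drop hits whose sub-file designation is among the eliminated files."""
    return [(serial, subfile) for serial, subfile in hits
            if subfile not in eliminated]


present = {"26", "5.8", "7.1", "7.8", "8.3", "12.3"}
present = modify(present, ["19", "6.9", "12.3"])   # adds 19 and 6.9, deletes 12.3
```

Group and “no band” data that are not re-entered pass through unchanged, which is why only the altered items need be typed.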

RESULTS

Figure 4 shows a spectrum of p-dichlorobenzene and the input/output sheet from a computer search for it. Data typed in by the spectroscopist have been underlined; the data switches turned on have been circled. Of the nine hits scored, seven were p-dichlorobenzene, one was p-chlorobromobenzene, and one was p-chlorobenzenethiol. The total search time, including data entry and disk changes, was less than 10 minutes. A correct answer was obtained from the first disk in 2 minutes. Figure 5 shows a similar search for resorcinol. In this case all the hits were spectra of resorcinol. Figure 6 shows the use of the “MODIFY DATA” and “ELIM FILES” options. Note that the changes typed in at (A) have been made when the input data are retyped the second time at (B); that is, Group 19-C=C and band 1440/6.9 have been added as requirements while band 812/12.3 has been deleted. Group 27-OH has been added as negative data, while the 13.3 interval (previously generated as part of the 12.5-15.0 “no band” requirement) has been taken out of the no band mask. On the third pass, certain files were eliminated; note that those serial numbers are not printed as hits at (C). Only one disk was searched for this example.
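The paired band values quoted above (1440/6.9 and 812/12.3) are wavenumber in cm-1 over wavelength in microns, related by wavelength = 10,000/wavenumber. A minimal Python sketch of this conversion, and of the dash convention for consecutive intervals, is given below. The function names are hypothetical. The rule that values above 100 are wavenumbers, and rounding to the nearest 0.1 μ, are assumptions; the source does not state how the program distinguishes the two kinds of entry.

```python
def to_wavelength(value):
    """Return a wavelength in microns, rounded to the 0.1-micron coding interval."""
    if value > 100:                    # assumed to be a wavenumber in cm^-1
        value = 10_000 / value
    return round(value, 1)


def expand(entry):
    """Expand an entry such as '10.1-10.4' into consecutive 0.1-micron intervals."""
    first, _, last = entry.partition("-")
    low = round(float(first) * 10)
    high = round(float(last or first) * 10)
    return [tenths / 10 for tenths in range(low, high + 1)]
```

With these assumptions, 1440 cm-1 falls in the 6.9-μ interval and 812 cm-1 in the 12.3-μ interval, in agreement with the values retyped by the program.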


Table III. ASTM Sub-File Designations

First Letter
A-American Petroleum Institute Project 44
B-User's own file of spectrograms
C-Sadtler Catalog of Spectrograms
D-NRC-NBS file of spectrograms
E-Spectrograms abstracted by ASTM-sponsored groups
F-Documentation of Molecular Spectroscopy
G-Coblentz Society spectrograms
H-Manufacturing Chemists' Association
J-Infrared Data Committee of Japan

Second Letter-“C” File
A-Agricultural chemicals
B-Polyols
C-Surface-active agents
D-Monomers and polymers, resins and gums, pyrolyzates
E-Plasticizers
F-Perfumes and flavors
G-Waxes and derivatives
H-Lubricants
J-Elastomers and rubbers
K-Fibers
L-Solvents
M-Intermediates
P-Petroleum chemicals
R-Pharmaceuticals
S-Steroids
T-Textiles
U-Food additives
W-Attenuated total reflectance
X-Pigment
Y-Inorganics

Second Letter-“F” File
A-Inorganics

DISCUSSION

The program described here, while not as flexible as those written for larger computers, has the singular advantage that the spectroscopist himself may use it on a low-cost computer. Each disk file of ~40,000 standards is searched in only 40 seconds, which allows several tries to be made in a few minutes' time. The number of hits and their names (which can be looked up quickly) often suggest modifications of the initial search data which are easily done with this program. This is, perhaps, the greatest advantage of this system over others now in use. The necessity of changing disks is obviated in some cases because of multiple entries in the file. For example, using the data from Figure 5, resorcinol was found on all three disks; hence searching two of the disks was actually unnecessary. The use of a multiple disk drive system would, of course, be a better solution.

RECEIVED for review December 27, 1967. Accepted February 16, 1968.