
Chemical databases and the laboratory computer. Harold M. Bell, Kevin Harrington, and Jungun Lee. J. Chem. Educ., 1991, 68 (4), p A99. DOI: 10.1021/ ...
the computer bulletin board
Chemical Databases and the Laboratory Computer
Harold M. Bell, Kevin Harrington, and Jungun Lee
Virginia Polytechnic Institute and State University, Blacksburg, VA 24061

Large databases are in widespread use in government and industrial laboratories, but applications in academia are limited because there are not many low-cost commercial products suitable for use in teaching laboratories. We wish to announce the availability of an organic chemical database that can be used in laboratory classes to illustrate database searching and, at the same time, provide useful results.

dBASE III has been used to build a database containing physical properties and spectroscopic data for nearly 2500 organic compounds. Included are the name, formula, molecular weight, boiling point, melting point, refractive index, density, mass spectrum, ultraviolet spectrum, references for all data, and also infrared and NMR spectral references. Finally, structural attributes are specified for each compound to indicate the presence of OH/NH groups, carbonyl groups, benzene rings, and the type of C-H bonds in the molecule (saturated, unsaturated, both).

In choosing compounds to include, emphasis was placed on simple aliphatics through C-10, common cyclic compounds, and mono- and di-substituted benzenes. The database contains 1230 compounds with at least one oxygen, 155 with nitrogen, 56 with sulfur, and 515 with one or more chlorine, bromine, or iodine. There are approximately 1800 compounds with mass spectral data and 850 with ultraviolet data. To conserve space, mass spectral data include only the m/e values for the four most intense peaks, and ultraviolet data include, at most, three absorption maxima and the corresponding molar absorptivities. There are 1822 densities, 1794 refractive indices, 1638 melting points, 1835 boiling points, 1786 NMR references, and 1860 infrared references.

The database is used in two ways: compound look-up and data search. The program for compound look-up provides for searching by either name or formula. The data search program allows for refractive index, density, melting point, mass spectrum, ultraviolet spectrum, or structure attribute searching. For a given search, e.g., melting point, the list of "hits" is printed and also automatically saved for further searches.
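The idea of a saved hit list that feeds further searches can be illustrated with a short sketch. This is not the authors' dBASE program; it is a minimal Python illustration in which the field names (`mp`, `carbonyl`, `oh_nh`), the helper functions, the tolerance, and the four sample records are all assumptions made for the example, and the melting points are approximate literature values rather than entries from the database.

```python
# Illustrative records only: the field names and layout are assumptions,
# not the actual structure of the dBASE files.
COMPOUNDS = [
    {"name": "benzoic acid", "mp": 122.0, "carbonyl": True, "oh_nh": True},
    {"name": "trans-stilbene", "mp": 124.0, "carbonyl": False, "oh_nh": False},
    {"name": "acetanilide", "mp": 114.0, "carbonyl": True, "oh_nh": True},
    {"name": "naphthalene", "mp": 80.0, "carbonyl": False, "oh_nh": False},
]


def search(records, predicate):
    """Return the records that satisfy predicate (the new hit list)."""
    return [r for r in records if predicate(r)]


def within(field, value, tolerance):
    """Build a predicate: the field is present and within +/- tolerance of value."""
    def predicate(record):
        x = record.get(field)
        return x is not None and abs(x - value) <= tolerance
    return predicate


# First search: melting point near 122 degrees C (hypothetical +/- 3 window).
hits = search(COMPOUNDS, within("mp", 122.0, 3.0))

# Further search: run on the saved hit list, not on the whole database.
refined = search(hits, lambda r: r["carbonyl"])

print([r["name"] for r in hits])
print([r["name"] for r in refined])
```

Because the second search runs on the saved list rather than on the full file, each additional criterion only narrows the earlier result.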

The dBASE software was chosen because of its low cost and widespread availability, and also because it is possible to write powerful searching algorithms. Our programs are designed for students who are totally unfamiliar with dBASE. Of course, users familiar with the dBASE commands would be able to perform other kinds of searches than those included in our programs.

dBASE has two notable shortcomings. (1) Some of the more sophisticated searching routines can be rather slow, taking 5-8 min to complete on an IBM PC. (2) It does not have provision for either displaying or searching structural formulas. This is a particularly severe drawback that markedly limits the utility of the database. However, we have partly overcome this deficiency through use of the aforementioned structure attribute fields.

In order to incorporate substructure searches, we modified the database for use with ChemBase from Molecular Design Limited¹. This software has provision for structure display, and substructure and data searching. We now have 3200 compounds, with structures. The 700 additional entries are largely fluoro-organics, heterocycles, and nonbenzenoid aromatics.

ChemBase is not as easy for students to use as dBASE. Data searching is disappointing, for there is no provision for user-friendly search programs to be written. Indeed, to do data searches the user must learn the ChemBase search commands and the field names to be searched. For a single field, such as melting point, there is little difference in the two approaches, but in a multifield search, dBASE is better. For example, in searching the four-peak mass spectral data on dBASE, the user is prompted to enter the m/e values of the four to eight most intense peaks in the spectrum and also the highest m/e value. Our program searches for all compounds that have at least three of these listed. The same operation on ChemBase would take several searches, each requiring a rather lengthy command string to be typed by the user.

For substructure searches, the user must learn the ChemBase molecule editor. However, it is rather easy to learn, and the power of structure searching makes the effort clearly worthwhile. The ability to attach an "atom value" to an atom in the structure is particularly useful. In this way, NMR chemical shift data may be associated with a given atom type, and structures may be searched, with or without associated atom values. For example, every para-disubstituted benzene with proton chemical shifts of 7.0-7.2 and 7.5-7.7 ppm could be quickly found.

Our database has proton chemical shifts included for all compounds. Roughly two-thirds of these are taken from the literature, and one-third are estimates, either from similar compounds with published spectra or from tables of additive shift parameters. Shift data are included only for protons attached to carbon, for OH and NH proton shifts are often sensitive to solvent, concentration, and temperature.

Data files for both dBASE and ChemBase are available upon request. There is no charge if formatted disks and a prepaid mailer are sent; otherwise, the cost is $8.00, payable to Virginia Tech Chemistry Department.
dBASE data files, index files, and programs require three 5.25-in. disks. ChemBase data and associated files require three 3.5-in. disks or five 5.25-in. disks. Correspondence should be addressed to H. M. Bell, Chemistry Department, Virginia Tech, Blacksburg, VA 24061.
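The four-peak mass spectral search described above (the user enters four to eight intense peaks, and a compound is reported when at least three of its stored peaks are among them) can be sketched as follows. This is a Python illustration, not the authors' dBASE code. The function name `ms_search`, the library layout, and the three placeholder compounds with their peak values are hypothetical. How the program uses the highest m/e value is not stated in the text; the sketch assumes it serves only to reject compounds that have a stored peak above it.

```python
def ms_search(library, entered_peaks, highest_peak=None, min_matches=3):
    """Return (name, match_count) pairs, best matches first.

    library       -- dict mapping compound name to its four stored m/e values
    entered_peaks -- the four to eight most intense peaks typed by the user
    highest_peak  -- highest m/e in the unknown's spectrum (optional)
    """
    entered = set(entered_peaks)
    hits = []
    for name, stored in library.items():
        # ASSUMPTION: a stored peak above the highest observed m/e rules
        # the compound out. The article does not say how this value is used.
        if highest_peak is not None and max(stored) > highest_peak:
            continue
        matches = len(entered.intersection(stored))
        if matches >= min_matches:
            hits.append((name, matches))
    # sorted() is stable, so ties keep their library order.
    return sorted(hits, key=lambda hit: -hit[1])


# Placeholder entries with made-up peak lists, for illustration only.
LIBRARY = {
    "compound A": [91, 92, 65, 39],
    "compound B": [91, 106, 65, 51],
    "compound C": [78, 77, 52, 51],
}

unknown = [91, 92, 65, 39, 51, 63]
print(ms_search(LIBRARY, unknown, highest_peak=92))
print(ms_search(LIBRARY, unknown))
```

With the highest-peak filter only compound A is returned; without it, compound B also qualifies, because three of its four stored peaks are in the entered list. Doing the whole comparison in a single pass is what makes this a one-step operation for the user, in contrast to the several separate searches the text says ChemBase would need.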

¹ We are grateful to Molecular Design Limited, 2132 Farallon Drive, San Leandro, CA 94577, for providing ChemBase for this work at no cost.
