Science Information Retrieval
A New Undergraduate Course
S. V. Meschel
Roosevelt University, Chicago, IL 60608

Currently information retrieval is usually taught in library schools as part of their graduate programs. Chemical companies often send their staff to attend vendor-sponsored workshops to facilitate training of chemists as well as technical librarians. The large majority of science students as well as academic and research staff still rely largely on librarians or other professional intermediaries to have their information needs filled. Having held a position as a professional intermediary for some years at the University of Chicago, I realized that information retrieval could be made more efficient and more cost effective if the users were better informed about what to expect and what preliminary information they need to supply in order to receive the best possible results. With this purpose in mind I designed a course that was to build a bridge between the Library resources and the students' science information needs. The course had its trial run at Roosevelt University under the auspices of the Chemistry Department. Since many of the students who enrolled in the class were industrial chemists, I expected that by the end of the course they would be able either to perform some of the retrievals themselves or at least to formulate their needs to a professional intermediary more effectively.
General Description of the Program

Roosevelt University is an urban university drawing largely on a student population who already have work experience or hold jobs while attending school. The Chemistry Department offers BS and MS programs. The Department was conscious of the need to have their students acquainted with current information sources as well as provide some online experience. Upon their encouragement I took up the challenge to fill this need and design a course to provide a learning experience to suit this particular student population.

The classes in the past two years were composed of MS candidates and senior-year chemistry majors. Most of the students held jobs in the chemical industry, which defined their subject area interests and characterized their information needs. Roosevelt University has a heterogeneous student population and these classes were no exception. Since their interests and background preparation varied extensively, designing exercises to suit the needs of the classes became a challenging problem.

Table 1 illustrates the topics covered. For detailed lists of source books and supplementary materials see Tables 2 and 3. The textbook by S. P. Harter was selected as a general source of basic introduction into computer-assisted information retrieval. However, every week the reading was supplemented by either practical information on online search procedures or suggested literature on clearly science-related information problems (1-4). Most of the specifically chemical information oriented books on the market are not suitable as student texts, and furthermore the purchase cost was prohibitive. Some of the supplementary handouts represent online retrievals I performed while a professional intermediary at the University of Chicago. Some of the illustrations and audiovisual aid materials were prepared specifically for a graduate course I taught at the Institute of Scientific Information in Beijing, China. The recommended books in Table 2 were available in the Roosevelt University Library. Some of these were my own copies, which were held there for short term use. Throughout the course the students were encouraged to utilize the Library facilities to gather the necessary preliminary information for their exercises and tests. Occasionally we also used as extra reading articles from Online, Database, and Online Review to illustrate how the journal literature deals with current information issues.

Journal of Chemical Education

Approximately two-thirds of the semester was spent discussing bibliographic and referral-type information retrieval in the physical sciences. The classes were conducted in seminar format, the students constructing their own strategies and the class discussing and critiquing the appropriateness of the logic and the cost effectiveness of the approach.

Even though all the students were chemistry majors, locating compounds manually in the printed form of Chemical Abstracts appeared to be a more difficult problem than I anticipated. I found that a brief Library orientation is insufficient to familiarize the students with the structure of Chemical Abstracts. A more effective solution to this problem is the use of audiovisual material that takes the class through some exercises in the use of Chemical Abstracts. A good illustrative sample is a movie produced by STN which was quite helpful (Table 3). A more recent movie made by STN, however, unfortunately covers more the history of CAS Online rather than actual examples of usage. In order to familiarize the students more with the use of Chemical Abstracts, I assigned specific problems involving chemical nomenclature to be solved in conjunction with the online exercises.

The last few weeks of the semester were devoted to discussing the significance of retrieving physical and chemical data from bibliographic and numeric databases. The primary supplementary reading in this time interval was my own review paper (Table 2).
The recommended books in Table 2 were also available to illustrate the growing utilization of numeric databases in the physical and biological sciences. In addition to the text and sourcebooks we also studied numerous sample strategies, made use of demonstration software, and had as much online experience as possible.

In addition to the discussion of the topics in Table 1, in each semester a period was scheduled for a special lecture by a guest speaker. In 1989 M. Palma (Dialog) discussed the special problems associated with retrieval of patents. In 1990 R. Kaminecki (Dialog) demonstrated the use of the software Molkick in conjunction with the Beilstein file. Both lectures were well received by the class. The latter topic was of particular significance, for this class exhibited a great deal of interest in gathering physical and chemical data online.

In order to make the class more comfortable with using index terms and more proficient in selecting the most specific concepts or best related terms, we held classroom exercises with four thesauri. The thesauri were selected to cover specific areas of science and also to illustrate differences in the construction of indexing. The students formed small groups, each assigned a number of problems to solve. Even though the academic level was quite homogeneous, the language proficiency of the students was rather uneven, particularly
Table 1. Course Outline

Week of 1-15
  Topic: Introduction. Textbook, library, and laboratory resources. The role of computer-assisted information retrieval. Types of databases: bibliographic, referral, numeric, full-text.
  Reading: S. P. Harter, Chapter 1

Week of 1-22
  Topic: Structure of databases. Database languages. The role of keywords. Basic commands in Boolean logic.
  Reading: Chapter 2

Week of 1-29
  Topic: The dynamics of the retrieval process. Telecommunication systems. Constructing search strategies.
  Reading: Chapter 3
  Lab. 1: Online exercise. LOGON process; author searches.

Week of 2-5
  Topic: Similarities and differences in searching the major bibliographic systems: DIALOG, BRS, ORBIT, STN. Descriptors and identifiers.
  Reading: Chapter 4
  Lab. 2: Classroom exercise with Thesaurus terms: INSPEC.

Week of 2-12
  Topic: Boolean logic in science information retrieval. Phrasing strategies in research problems in physics, chemistry, and geology. Word proximity, expansion of concepts, language and time interval specification.
  Reading: Chapter 5
  Lab. 3: Classroom exercise with Thesaurus terms: GEOREF.

Week of 2-19
  Topic: The structure of Chemical Abstracts. How to find a compound manually in CA. The function of the Registry Number. Guest lecture from Roosevelt University Library: Mr. C. Byre.
  Reading: Chapter 6
  Lab. 4: Library exercise with CA.
  Movie: Chemical Abstracts and the gypsy moth.

Week of 2-26
  Topic: Some great problem solvers: NTIS, SSIE, Conference Papers Index. Uses and misuses of Science Citation Index.
  Reading: Chapter 7
  Lab. 5: Online exercise: bibliographic retrievals in chemistry and physics. Practice with Science Citation Index software.

Week of 3-5
  Topic: Retrieval of chemical compounds in CHEMNAME. Parent heading, substituents, molecular formula, Registry Number.
  Lab. 6: Online exercise: locating simple organic compounds in CHEMNAME.
  Take home test I

Week of 3-19
  Topic: More on retrieval of chemical compounds: ring compounds, minerals, groups in the periodic table.
  Lab. 7: Online exercise: minerals in CA and GEOREF; periodic index terms; ring identifiers.

Week of 3-26
  Topic: Topic to be announced. Guest lecturer.
  Take home test II

Week of 4-2
  Topic: Retrieval strategies in the biomedical files: BIOSIS, Medline, Excerpta Medica, International Pharmaceutical Abstracts.
  Reading: Chapter 8
  Lab. 8: Classroom exercise with thesauri terms: BIOSIS, MESH.

Week of 4-9
  Topic: Compound identification and physical property retrieval in biomedicine: use of concept codes, registry numbers, enzyme commission codes.
  Lab. 9: Online exercise: strategies in biomedicine.

Week of 4-16
  Topic: Comparison of bibliographic and numeric retrieval. Availability, format, cost. Retrieval of simple physical and chemical properties.
  Reading: Meschel, S. V. Online Review 1984, 8, 77-103.
  Lab. 10: Online exercise: Heilbron, Merck Index, Kirk-Othmer, Beilstein.

Week of 4-23
  Topic: Substructure searching: building a compound bond by bond, allowing free sites: CIS; STN; DARC/Questel.
  Lab. 11: Classroom exercise with DARC/Questel; STN demonstration software; CIS online exercise on SANSS.

Week of 4-30
  Topic: Spectra retrieval online. The inverse search. Text versus graphic output.
  Lab. 12: Online exercise on CIS: mass spectra; C-13 NMR spectra; infrared spectra.
  Take home test III

Table 2. Reading List

A. Textbook:
   Harter, S. P. Online Information Retrieval. Concepts, Principles and Techniques; Academic: New York, 1985.
B. Supplementary text:
   Meschel, S. V. Numeric Databases in the Sciences; Online Review 1984, 8, 77-103.
C. Recommended sourcebooks:
   1. Antony, A. Guide to Basic Information Sources in Chemistry; Wiley: New York, 1979.
   2. Maizell, R. E. How To Find Chemical Information; Wiley: New York, 1987.
   3. Wolman, Y. Chemical Information. A Practical Guide to Utilization; Wiley: New York, 1983.
   4. Ash, J. E.; Chubb, P. A.; Ward, S. E.; Welford, S. M.; Willett, P. Communication, Storage and Retrieval of Chemical Information; Wiley: New York, 1985.
   5. Howe, W. J.; Milne, M. M.; Pennell, A. F. Retrieval of Medicinal Chemical Information; American Chemical Society Symposium Series No. 84; American Chemical Society: Washington, DC, 1978.
   6. Wiswesser, W. J. A Line-Formula Chemical Notation; Crowell: New York, 1954.
   7. Chen, Ching-Chih; Hernon, P., Eds. Numeric Databases; Ablex: Norwood, NJ, 1984.
   8. Smith, D. H., Ed. Computer Assisted Structure Elucidation; American Chemical Society Symposium Series No. 54; American Chemical Society: Washington, DC, 1977.
   9. Guide to DIALOG Databases; Lockheed Information Systems: Palo Alto, CA.
   10. Marcaccio, K. Y.; DeMaggio, J. A., Eds. Computer-Readable Databases. A Directory and Data Sourcebook, 5th ed.; Gale Research: Detroit, 1989.
   11. Schulz, H. From CA to CAS Online; VCH: Deerfield Beach, FL, 1988.
   12. CAS Online: The Registry File; Chemical Abstracts Service: Columbus, OH, 1984; Vols. 1-3.
Table 3. Supplementary Material for Online Exercises

A. General search aids.
   1. LOGON procedures and network telephone numbers.
   2. Dialog basics. A brief introductory guide to searching (1988).
      C. Connecting search terms.
      D. Viewing your results.
      I. Saving your strategy.
      J. Scanning a group of databases through DIALINDEX.
   3. Demonstration software illustrating retrieval on Science Citation Index (Institute for Scientific Information Inc., 1989).
B. Chemistry related handouts for DIALOG retrievals:
   1. Sample compound searches on CHEMNAME.
   2. Chemical Abstracts Section numbers and titles.
   3. Illustration of the use of group number (GN) and periodic index term (PI) in retrieving element groups in the periodic table.
   4. Ring compound searching: DIALOG Chronolog 1987, (August), 194-196; 1987, (October), 256-258.
C. Numeric databases.
   1. LOGON process and sample retrievals on CIS: mass spectra, C-13 NMR spectra, infrared spectra.
   2. Examples of substructure searching on DARC/Questel.
   3. Demonstration software illustrating STN: STN Mentor. American Chemical Society, 1987.
   4. List of graphic terminals compatible with CIS, STN, and DARC/Questel.
   5. Demonstration software illustrating the Beilstein file.
D. Movie.
   Chemical Abstracts and the gypsy moth. (Color video, 17 min; Chemical Abstracts, Columbus, OH).
in the first class. To make sure everyone had ample time to understand the nomenclature, term structure, and the meaning of various codes, these classroom exercises became lengthier than I originally planned. In particular, the selection of biology-related index terms seemed to cause difficulty for both classes.
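The thesaurus exercises described above, in which students look for the most specific concept or the best related terms, can be pictured with a small sketch. Everything in it is an assumption made for illustration: the toy entries are not copied from INSPEC, GEOREF, BIOSIS, or MeSH, the BT/NT/RT labels simply follow the common broader/narrower/related convention, and the OR-group string is generic rather than the syntax of any particular vendor.

```python
# Hypothetical mini-thesaurus: NT = narrower terms, BT = broader terms,
# RT = related terms. The entries are invented for this sketch.
THESAURUS = {
    "spectroscopy": {
        "NT": ["infrared spectroscopy", "mass spectrometry"],
        "RT": ["spectra"],
    },
    "infrared spectroscopy": {
        "BT": ["spectroscopy"],
        "NT": ["fourier transform infrared spectroscopy"],
    },
    "mass spectrometry": {"BT": ["spectroscopy"]},
    "fourier transform infrared spectroscopy": {"BT": ["infrared spectroscopy"]},
}


def narrower_terms(term, thesaurus):
    """Return every term below `term`, depth first, in listing order."""
    found = []
    for nt in thesaurus.get(term, {}).get("NT", []):
        found.append(nt)
        found.extend(narrower_terms(nt, thesaurus))
    return found


def most_specific(term, thesaurus):
    """Return the terms under `term` that have no narrower terms themselves."""
    below = narrower_terms(term, thesaurus)
    leaves = [t for t in below if not thesaurus.get(t, {}).get("NT")]
    return leaves or [term]


def or_group(term, thesaurus):
    """Join a term and all its narrower terms into one generic OR group."""
    terms = [term] + narrower_terms(term, thesaurus)
    return "(" + " OR ".join(terms) + ")"


print(most_specific("spectroscopy", THESAURUS))
print(or_group("infrared spectroscopy", THESAURUS))
```

Choosing the leaf terms narrows a search to the most specific concepts, while the OR group widens it to a concept together with everything indexed beneath it; a searcher typically weighs one against the other depending on how many records the first attempt retrieves.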
Volume 66 Number 7 July 1991
In designing online strategies, the two classes displayed entirely different preferences. The first class elected to work on topics of their own choosing, usually some research project they were involved in either through academic work or on their jobs. The second class preferred the instructor to assign problems to solve. In order to accommodate the students' needs and interests, I assigned one term paper where they could work on information retrieval for their own projects and two tests with five problems each to complete. All the assignments were take-home tests, to allow time to use the Library resources whenever necessary and make use of online time if needed. At the end of the semester the students' performance was evaluated on the basis of the grade of the term paper and the two tests. The tests each consisted of five problems designed to review all the areas of information retrieval covered in the course, i.e., retrieval of name, document, concepts, molecular structure, and numeric data.

Online Exercises

Table 4 illustrates the classroom and laboratory exercises.
The online exercises were scheduled in the Computer Center of Roosevelt University well in advance in order to reserve the equipment favorably situated for class viewing. We used an IBM XT (model 5160) in conjunction with an IBM printer (model 5152) and a US Robotics modem (AutoDial 212A). This equipment was not capable of graphic output; therefore the spectra retrieved from the Chemical Information System (CIS) were printed in tabular format. I expect that in the future we will acquire the appropriate software to allow me to provide an exercise illustrating graphic output as well. Throughout the semester we surveyed quite extensively the major bibliographic vendors, the similarities and differences
Table 4. Laboratory Exercises

A. Classroom exercises:
   1. Practice locating relevant databases in a research area.
      Computer-Readable Databases. A Directory and Data Sourcebook; Marcaccio, K. Y., DeMaggio, J. A., Eds.; Gale Research: Detroit, 1989.
   2. Practice locating concepts and related terms in science thesauri:
      The Institution of Electrical Engineers. INSPEC Thesaurus; Gresham: Old Woking, Surrey, England, 1985.
      GEOREF Thesaurus and Guide to Indexing, 4th ed.; American Geological Institute: Alexandria, VA, 1986.
      BIOSIS Previews Search Guide; Biological Abstracts: Philadelphia, 1989.
      Medical Subject Headings; National Library of Medicine: Bethesda, MD, 1987.
   3. Roosevelt University. Library demonstration and exercise in locating chemical compounds in Chemical Abstracts.
B. Online exercises:
   1. Logon processes on DIALOG, CIS, STN. Selection of database and viewing format. Expand names. Use of Boolean logic in author retrievals.
   2. Descriptors, identifiers, and word proximity. Publication year and language restrictions. Practice physics and chemistry strategies in CA, INSPEC, and COMPENDEX.
   3. Expand concepts, specify location, geographic coordinates. Practice chemistry and geology strategies. Retrieve minerals in CA and GEOREF. Retrieve cited references in Science Citation Index.
   4. Retrieving organic compounds: the use of parent heading, substituent, molecular formula, registry number in CHEMNAME.
   5. Retrieving ring compounds in CHEMNAME. The use of the ring identifiers.
   6. Locating physical and chemical properties: Heilbron, Merck Online, Kirk-Othmer, Beilstein.
   7. Biomedical applications: concept codes; biosystematic codes; enzyme commission codes. Retrieval strategies in BIOSIS, Medline, International Pharmaceutical Abstracts, Excerpta Medica.
   8. Substructure searching: building a compound, allowing free sites. DARC/Questel (classroom); STN (demonstration software); CIS (online).
   9. Spectra retrieval on CIS: mass spectra, C-13 NMR spectra, infrared spectra. The possibility of inverse search.
in the respective command languages. However, in the laboratory exercises only Dialog was used, for the classroom instruction rates made it possible to have several hours of online experience. We offered substantially more online time during the second semester, making the laboratory exercises less pressured for each individual student. Since the classes were relatively small, it was possible to have four or five students observing the person who was working on a strategy. The other students worked with the demonstration software while waiting their turn at the keyboard. This arrangement could be improved upon by setting up the computer in a separate room and using the overhead projector to allow the entire class to view all the retrievals.

In the future I plan to expose the students to more retrieval languages in online exercises, perhaps expanding the program to include illustrative examples using vendors such as STN and BRS, both of which offer classroom instructional rates. In the exercises involving numeric data, we made good use of the generous offer of free computer time on the CIS system. This latter experience was particularly valuable, for it allowed the class to build molecular structures and substructures, match properties, retrieve different types of spectra, and even experiment with inverse searching. On Dialog we made extensive use of the numeric data searching capability, particularly on the INSPEC, Heilbron, Kirk-Othmer, Merck Index, and Beilstein files (5).

Since compound searching is considered one of the most important aspects of computer-assisted retrieval, two laboratory exercises were designed to provide training on this topic. The assignments included retrieval by complete name, molecular segments, empirical formulas, ring characteristics, and other features. Retrieval of compounds where free sites are allowed was extensively discussed in class.
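The idea of reaching the same compound through several access points (name segment, formula, ring characteristics) can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the `Compound` record, the `search` function, and the four sample entries are hypothetical and do not reproduce the record layout or command language of CHEMNAME or any other file, and true substructure searching with free sites is well beyond what this sketch attempts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str        # index-style name (illustrative)
    formula: str     # molecular formula in Hill order
    ring_count: int  # number of rings in the structure


# Illustrative records only; not taken from any vendor file.
RECORDS = [
    Compound("ethane, 1,2-dichloro-", "C2H4Cl2", 0),
    Compound("benzene", "C6H6", 1),
    Compound("caffeine", "C8H10N4O2", 2),
    Compound("serotonin", "C10H12N2O", 2),
]


def search(records, name_segment=None, formula=None, ring_count=None):
    """Return records matching every criterion that was supplied."""
    hits = []
    for rec in records:
        if name_segment is not None and name_segment.lower() not in rec.name.lower():
            continue
        if formula is not None and rec.formula != formula:
            continue
        if ring_count is not None and rec.ring_count != ring_count:
            continue
        hits.append(rec)
    return hits


# Combining criteria narrows the answer set, much like ANDing search terms.
print([c.name for c in search(RECORDS, ring_count=2)])
print([c.name for c in search(RECORDS, name_segment="dichloro")])
```

Each keyword argument acts as an optional filter, so a student can start from whichever piece of information is known and add further criteria only if too many records come back.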
Among the languages allowing free site searching capability, STN and DARC/Questel were reviewed in class, while CIS was also used in an online exercise. The lecture-demonstration on the use of the software Molkick was also helpful to keep the class up to date on current possibilities of substructure searching. To facilitate locating a compound in the Beilstein file, we made use of demonstration software.

The following partial list illustrates the diversity of compound retrievals the students worked on during the online exercises:

1,2-dichloroethane
cyclooctatriene-1,2-dione
2-propenoic acid, 2-cyano-3-ethoxy-, ethyl ester
1-aza-6-borabicyclo(4.4.0)decane
silicon nitride
serotonine
stokesite
adenyl cyclase
exo-1,4-alpha-glucosidase
caffeine, 8-(hydroxymethyl)-
Fermi liquid
Oraflex
noble gas compounds with halogens (exclude carbon)

For the listed compounds several of the following numerical data were located online: boiling point, melting point, refractive index, density, and ultraviolet maximum. In addition to these data, mass spectra, carbon-13 NMR spectra, and infrared spectra were also retrieved when available.

Considerable time was spent on the process of determining whether bibliographic, referral, full-text, or numeric files would serve as the most suitable choice for a particular information problem. The point was emphasized that perfect retrieval may not be possible, even if the person performing the search is very competent, for the user is usually limited by budgetary considerations as well as by accessibility of system or file. We discussed at some length the conceptual differences in approach when the scientist is the end-user, i.e., performs the retrieval for him- or herself, or the search is conducted through an intermediary. Since some chemists do become the information interpreters for their research group, we also discussed the significance of asking the most relevant questions in an interview prior to online retrieval. These topics are exceptionally well covered in the text by Harter.

The majority of the students at the start of the semester seemed intimidated by the capabilities of the online systems. Since I felt that this might negatively affect their learning, I sought to emphasize the point that the online system is only an inanimate tool in the service of a capable searcher.

The stimulus for one of the most interesting exercises on applications was a problem brought by a student who had been accepted to three graduate schools and wanted to know whether online searching could help her make an informed decision. Through a class discussion we suggested that she learn as much as possible about the three professors she intended to work with at these institutions. The retrieval involved locating publications and comparing citation frequency as well as the duration and level of funding. With the aid of the retrieval output she was able to make a reasonable decision. Subsequently, I was pleased to hear that she is doing well in the graduate school of her choice.
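A comparison of the kind described in the graduate school exercise amounts to tallying the retrieved records by author. The sketch below is a hypothetical illustration, not a record of what the class did: the record fields, the placeholder names "Professor A" through "Professor C", and all the numbers are invented, and funding data are left out for brevity.

```python
from collections import defaultdict

# Invented retrieval output: one row per publication located for each
# prospective advisor. None of these figures come from the course.
RECORDS = [
    {"author": "Professor A", "year": 1986, "times_cited": 12},
    {"author": "Professor A", "year": 1988, "times_cited": 30},
    {"author": "Professor B", "year": 1987, "times_cited": 5},
    {"author": "Professor C", "year": 1985, "times_cited": 8},
    {"author": "Professor C", "year": 1989, "times_cited": 4},
    {"author": "Professor C", "year": 1989, "times_cited": 9},
]


def summarize(records):
    """Count papers and total citations for each author."""
    summary = defaultdict(lambda: {"papers": 0, "citations": 0})
    for rec in records:
        entry = summary[rec["author"]]
        entry["papers"] += 1
        entry["citations"] += rec["times_cited"]
    return dict(summary)


def citations_per_paper(summary):
    """Average citations per paper, a rough measure of citation frequency."""
    return {author: s["citations"] / s["papers"] for author, s in summary.items()}


summary = summarize(RECORDS)
for author, average in sorted(citations_per_paper(summary).items()):
    print(author, summary[author]["papers"], "papers,", average, "citations per paper")
```

Raw counts like these are only one input to such a decision; as the course emphasized, the output of a retrieval still has to be interpreted by the searcher.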
Summary

On the basis of favorable feedback the course was well received by the first two classes of students at Roosevelt University, and the program will be offered again in the Chemistry Department. There is sufficient reason to conclude that the program was able to accomplish its aim, that is, to bridge the gap between the Library resources and the students' science information needs.

The course has undergone some changes from the first to the second semester, and I expect additional adjustments as the technology, equipment availability, and class composition change. I expect that in the future we will be able to attract students from other areas of science as well as chemistry majors. I consider it a positive change that I was able to offer more online experience this year, so the students felt less pressure to complete their assignments. In both semesters we had guest lecturers to cover specialized areas of online searching. I would like to continue this practice and if possible expand it to two seminars per term. Next time I would like to make more extensive use of the overhead projector in conjunction with the IBM PC, so everyone could observe all the retrievals.

In my future plans I consider it pedagogically advantageous to introduce other retrieval languages in addition to Dialog. However, that is only feasible when classroom discount instruction rates are available to make the use of online time more affordable. For example, we might expand the exercises to utilize the BRS or STN educational program rates. In this regard I very much appreciated the generous offer of CIS, who allowed us free online time for practice in both semesters. Exercises on the latter system allowed us to experiment with molecular substructure searching as well as spectra retrieval of organic molecules.

The students in the second semester suggested that next time the course is offered more quizzes should be given. This class preferred solving teacher-assigned problems, as opposed to the first class, who wished to work out searches on their own areas of specialty.

Finally, I wish to point out that, to my knowledge, courses of this type are rarely offered (6-11). With the proliferation of scientific information I believe that it is timely and pragmatic to instruct as many future chemists as possible in the modern uses of information retrieval to the benefit of their companies, businesses, and academic institutions.

Literature Cited

1. Schack, E. O.; Schack, M. B. Collegiate Microcomputer 1989, 7, 19-26.
2. Miller, J. M. J. Chem. Educ. 1989, 66, 2