Searching Chemical Abstracts Online in Undergraduate Chemistry
Part 1. CA File, Boolean, and Proximity Operators

Miroslav Krumpolc¹ and Diana Trimakas
Department of Chemistry, University of Illinois at Chicago, Chicago, IL 60680

Connie Miller
Science Library, University of Illinois at Chicago, Chicago, IL 60680

This is the first in a series of papers that will present topics designed to introduce students to searching the online database, CAS ONLINE. The computerized equivalent of the printed Chemical Abstracts, CAS ONLINE is made available through STN International and consists of three separate but interrelated files: the CA (bibliographic) File, the Registry (structure) File, and the CAOLD File (a bibliographic file, pre-1967). There are also two training files available: Learning CA File (LCA) and Learning Registry File (LREGISTRY).

The topics included in this article, when supplemented by discussion and hands-on practice, are intended to accomplish the following:
1) demonstrate the advantages of computer access to information;
2) introduce the logic of searching an online database;
3) explain Boolean logic, proximity operators, and truncation;
4) introduce STN International's Messenger command language² and searchable fields.
The traditional "offline" search (i.e., in the library) is not only time-consuming and nearly always incomplete but, in some cases, also practically impossible. Therefore, as a part of our Advanced Organic Chemistry Laboratory curriculum, we have developed several programs suitable for teaching online information retrieval from computer-readable files. Checking the recent literature³ we have noticed that the current awareness of computerized searches is still sketchy. As this approach is becoming increasingly important in the "life" of a chemistry student, we have decided to narrow the gap between the laboratory and the library by bringing the online search within a student's reach.

In this paper we wish to outline basic strategies for searching the CA File⁴ by presenting a broadly formulated topic in organic analytical chemistry. The search consists of four steps:
1) problem formulation,
2) problem "translation" using search terms and operators (Table 1, Fig. 1),
3) initial online search,
4) refining and narrowing search strategy.
We have limited ourselves to the use of only five basic commands in the STN language (Table 2) in order to demonstrate that it is all a beginner would need for an efficient bibliographic search.

¹ Author to whom correspondence should be addressed.
² "STN International: A Guide to Commands and Databases"; Chemical Abstracts Service: Columbus, OH, 1984.
³ Wolman, Y. J. Chem. Educ. 1985, 62, 331, and references cited therein.
⁴ "Using CAS ONLINE: The CA File"; Chemical Abstracts Service: Columbus, OH, 1985.

Table 1. Boolean and Proximity Operatorsᵃ

Operator  Type       Depiction                                       Search Expression
AND       Boolean    both A and B required                           A AND B
OR        Boolean    either A or B required                          A OR B
NOT       Boolean    A but not B required                            A NOT B
(W)       Proximity  adjacent (with), in the order specified         A(W)B
(A)       Proximity  adjacent, either order                          A(A)B
(L)       Proximity  link in the same information unit (sentence),   A(L)B
                     e.g., title

ᵃ Algebraic priority (search terms in parentheses): (W), (A) > (L) > AND, NOT > OR. Restriction: (W) > (A) > (L) > AND.

Figure 1. Graphical depiction of Boolean operations: conjunction (A AND B), disjunction (A OR B), and A NOT B.

Defining a Search Problem

Suppose we need basic information concerning recent analytical techniques that can directly separate enantiomers and determine their ratio. What chromatographic methods are available (e.g., TLC, GC, HPLC)? Is there a recent book or review article on the topic? What are the most relevant citations? Who has significantly contributed to a given field?

Formulating Search Terms and Constructing a Query

The CA File consists of a collection of bibliographic citations on a variety of chemical topics. To be effectively searched on an online database, a topic must first be broken down into words or concepts and then relationships must be established between these words or concepts. Often, it is impossible, before going online, to think of all the words that the database could use to describe a topic. Because of this, all online searches must be interactive. To get the search underway, however, an initial set of words must be chosen. For the information problem stated above, an initial set of words (i.e., search terms) might be

SEPARATION, ENANTIOMER, ENANTIOMERS, CHROMATOGRAPHY

Volume 64
Number 1 January 1987
55
Table 2. Selected Command Summary Chart

Commandᵃ      Examples                 Function
FILE          => FILE CA               Select database (file) for search, display, and print
              => FILE REG
SEARCH (S)    => S L2 AND HPLC         Search words, terms, sets, or their combination (query):
                                       answer set line 2 with AND operator
              => S PIRKLE, W?/AU       Search a specific field with a field specifier
                                       (/AU = author, /PY = publication year, /LA = language,
                                       B/DT = book/document type)
DISPLAY (D)   => D L5 1-5 TI           Display search results online: line 5, answers 1-5,
                                       format TI (title only), TRI (trial), BIB (bibliographic
                                       data), or ALL (bibliography, abstract, index terms; Table 4)
PRINT         => PRINT L19 1-11 ALL    Offline printing of citations: line 19, answers 1-11,
                                       format ALL
LOGOFF        => LOGOFF                Stop the online session

ᵃ Some frequently used commands can be used in abbreviated form.

Figure 2. Graphical depiction of the initial search query: (a) ENANTIOMER OR ENANTIOMERS; (b) SEPARATION AND (ENANTIOMER OR ENANTIOMERS); (c) SEPARATION AND (ENANTIOMER OR ENANTIOMERS) AND CHROMATOGRAPHY.

The relationships between these words can then be established using the Boolean and proximity operators (Table 1). To be useful in solving our problem, each citation must include the word "enantiomer" or "enantiomers". Each citation must also include the word "separation", and since chromatographic methods, in particular, are of interest, each citation must also include the word "chromatography". Our initial search terms using Boolean logic (Fig. 1), then, may look like this (Fig. 2), and the initial query (i.e., search statement or question) is formulated as follows (L1):

SEPARATION AND (ENANTIOMER OR ENANTIOMERS)

Using proximity operators to specify the relationship between some of the initial words would require a closer relationship than Boolean operators. For example, (ENANTIOMER OR ENANTIOMERS) (L) SEPARATION requires that "enantiomer" or "enantiomers" and "separation" appear in the same sentence of a citation rather than just in the same citation. This strategy may be too narrow for the initial search as the (L) operator is more restrictive than the AND operator. In the case of GAS(L)CHROMATOGRAPHY, however, its use is justified; this term sometimes contains an additional word (liquid). The even more restrictive (W) operator is used for THIN(W)LAYER as no other word is expected between these two terms. The (A) operator is most useful in searching the nomenclature (e.g., methyl ethyl ketone or ethyl methyl ketone, METHYL(A)ETHYL). Using truncation symbols defined in Table 3 can eliminate duplication of typing. For example, ENANTIOMER? tells the computer to search for the root "enantiomer" as well as that root with any additional characters.

Running the Search in CA File

After using a terminal and a modem⁵ to log on to Chemical Abstracts and choosing to access the CA File of CAS ONLINE, the student is ready to start a search. In the following printout, the words and commands the searcher types are in italics. The symbol => is the prompt used by STN International's command language to indicate that the computer is ready to receive a command.
⁵ This search was conducted using a typed text-only terminal, the Viewpoint/3A model (manufactured by Applied Digital Data Systems, Inc. (ADDS)), interfaced with a Signalman Mark I modem (Anchor Automatic, Inc.), and set for a rate of 300 baud. Communication with a computer was carried out following instructions supplied by CA Service (i.e., dialing, entering through login ID and password). Further information is available directly from CAS, Ohio (phone number 1-800-848-6533).
Journal of Chemical Education
The S is the abbreviated SEARCH command (Table 2). The database contains a total of 48120 citations that include the word SEPARATION, 1333 that include the word ENANTIOMER, and 1348 that include the word ENANTIOMERS. The computer then creates a set of items that contain the search terms specified and gives each set a consecutive line or L number. Set L1 (i.e., answer set), for example, contains 236 citations, each of which includes the word SEPARATION and either the word ENANTIOMER or ENANTIOMERS. Once sets are created and given L numbers, they can be combined with other sets and search terms (Fig. 2).
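The way answer sets combine can be pictured with ordinary set operations. The Python sketch below is an illustration only: the tiny index and its citation numbers are invented, and nothing here is meant to describe how STN evaluates a query internally. AND corresponds to intersection, OR to union, and NOT to difference.

```python
# Toy inverted index: search term -> set of citation numbers (all hypothetical).
index = {
    "SEPARATION": {1, 2, 3, 4, 7},
    "ENANTIOMER": {2, 5},
    "ENANTIOMERS": {3, 4, 6},
    "CHROMATOGRAPHY": {3, 4, 5, 8},
}


def term(word):
    """Return the set of citations containing the word (empty if unknown)."""
    return index.get(word, set())


# SEPARATION AND (ENANTIOMER OR ENANTIOMERS), as in the initial query
L1 = term("SEPARATION") & (term("ENANTIOMER") | term("ENANTIOMERS"))

# An existing answer set combined with a further term, as in Fig. 2(c)
L1_and_chrom = L1 & term("CHROMATOGRAPHY")

# NOT is set difference
L1_not_chrom = L1 - term("CHROMATOGRAPHY")

print(sorted(L1), sorted(L1_and_chrom), sorted(L1_not_chrom))
```

Because each answer set is just a set of citations, it can be reused in later statements without repeating the terms that produced it, which is what the L numbers provide.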
=> S L2 AND TLC
      3264 TLC
L3       2 L2 AND TLC
=> S L2 AND (GC OR GLC)
      1889 GC
       927 GLC
L4       3 L2 AND (GC OR GLC)
=> S L2 AND HPLC
      5347 HPLC
L5      25 L2 AND HPLC
A search for specific methods of chromatography results in a total of 30 citations. The other 149 citations from answer set L2 must involve other chromatographic methods besides TLC, GC, GLC, HPLC or alternative search terms (e.g., thin-layer chromatography). The first two citations of L5 are requested for viewing.
Table 3. Truncation Symbols

Symbol  Function                                            Examples
?       truncates any number of characters at the end       ENANTIOMER? (enantiomer, enantiomers, enantiomeric)
        of the word (stem)
#       truncates one or no character at the end of         ENANTIOMER# (enantiomer, enantiomers);
        the word (stem)                                     SEPN# (sepn, sepns)
!       masks exactly one character (except the first       SYNTHES!S (synthesis, syntheses)
        one) anywhere in the word

Table 4. Display Field Codes in the CA Fileᵃ

Code  Field
AN    Accession number, i.e., location number of the abstract in print version of CA
TI    Title of document
AU    Author or patent inventor
CS    Corporate source or patent assignee
LO    Corporate source or patent assignee location
PI    Patent information
AI    Patent application/priority information
CL    Patent classification
SO    Source, i.e., name of journal, volume, issue, page
SC    Chemical Abstracts section code and title
SX    Chemical Abstracts sections cross-reference code
DT    Document type, e.g., journal, book, dissertation, patent
CO    Coden of the source document, e.g., journal
IS    ISSN, i.e., international standard serial number
PY    Publication year of original document
LA    Language of original document
AB    Abstract text
KW    Keywords
IT    Index terms (corresponds to CA volume indexes)

ᵃ Displaying or printing in TI format results only in TI field, in TRI format in TI, KW, IT fields, in BIB format in all fields except AB, KW, IT, and in ALL format in all fields.

ANSWER 1 of 25
64-3 (Pharmaceutical Analysis)
63, 80
TI  Separation of diastereoisomeric amides by preparative high-performance liquid chromatography and analysis of enantiomers by chromatography on a chiral support
KW  diastereoisomer amide HPLC; chromatog liq enantiomer sepn
IT  Pharmaceutical analysis (isomer sepn. in, by HPLC)
IT  Resolution (of amide enantiomers, by HPLC)
IT  Isomerism and Isomers (diastereo-, sepn. of, by HPLC)
IT  Chromatography, column and liquid (high-performance, chiral phases in, for isomer sepns.)

ANSWER 2 of 25
TI  Direct separation of 2-hydroxy acid enantiomers by high-performance liquid chromatography on chemically bonded chiral phases
KW  hydroxy acid resoln HPLC; ligand exchange chromatog hydroxy acid
IT  Resolution (of racemic hydroxy acids, on chiral stationary phases in HPLC)
IT  Silica gel, compounds (reaction products with amino acids, as stationary phases for hydroxy acid enantiomer resoln. by HPLC)
IT  Chromatography, column and liquid (high-performance, ligand-exchange, of hydroxy acid enantiomers)
IT  Carboxylic acids, analysis (hydroxy, resoln. of, by HPLC on chem. modified chiral phases)

The D is the abbreviated DISPLAY command (Table 2). Displaying the first two citations (out of 25) from answer set L5 in the TRIAL (abbrev. TRI) format (Table 2) allows the searcher to interact with the database. The italicized words in these citations are additional search terms that can be used to expand the relevant output. These terms are located in the title, the keyword, and the index term fields of the citation (Table 4). The TRIAL format is free and it should always be used in the initial stage to make the search more effective.

Revising the Search Strategy and Rerunning the Search

The purpose of this step is to maximize the number of citations retrieved, as it is obvious that the initial query missed some search terms and did not include standard abbreviations. Therefore a new query is formulated utilizing the new search terms found in the TRIAL format, word truncation (Table 3), and proximity operators (Table 1).

New search terms: RESOLUTION, RESOLN, SEPN, SEPNS, LIQUID, CHROMATOG, etc.
Forming a query: RESOLN#, SEPN#, LIQUID(L)CHROMATOG?, THIN(W)LAYER, GAS(L)CHROMATOG?, etc.

=> S (SEPARATION OR SEPN# OR RESOLUTION OR RESOLN#) AND ENANTIOMER?
The new query was formulated. The additional keywords increased the output from 236 citations (L1) to 912 citations, indicating the importance of interacting with the database online and revising the initial search query with information found in retrievals.
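The effect of the three truncation symbols in Table 3 can be mimicked with regular expressions. The following Python sketch is only an approximation under stated assumptions: the helper names `truncation_to_regex` and `matches` are hypothetical, "character" is taken to mean a letter, digit, or underscore, and the rule that ! may not be the first character is not enforced.

```python
import re


def truncation_to_regex(term):
    """Translate ?, # and ! (Table 3) into a compiled regular expression."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(r"\w*")   # any number of additional characters
        elif ch == "#":
            parts.append(r"\w?")   # one or no additional character
        elif ch == "!":
            parts.append(r"\w")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term, word):
    """True if the whole word is covered by the truncated search term."""
    return truncation_to_regex(term).fullmatch(word) is not None


for term, word in [("ENANTIOMER?", "enantiomeric"),
                   ("ENANTIOMER#", "enantiomeric"),
                   ("SEPN#", "sepns"),
                   ("SYNTHES!S", "syntheses")]:
    print(term, word, matches(term, word))
```

The second pair shows why # is the narrower choice: ENANTIOMER# accepts "enantiomer" and "enantiomers" but not "enantiomeric", whereas ENANTIOMER? accepts all three.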
=> S L7 AND (TLC OR THIN(W)LAYER)
      3264 TLC
     48476 THIN
     62837 LAYER
     18607 THIN(W)LAYER
L8      13 L7 AND (TLC OR THIN(W)LAYER)
=> S L7 AND (GC OR GLC OR GAS(L)CHROMATOG?)
      1889 GC
       927 GLC
    224991 GAS
    125990 CHROMATOG?
     48606 GAS(L)CHROMATOG?
L9     205 L7 AND (GC OR GLC OR GAS(L)CHROMATOG?)
=> S L7 AND (HPLC OR LIQUID(L)CHROMATOG?)
      5341 HPLC
    131836 LIQUID
    125990 CHROMATOG?
     34941 LIQUID(L)CHROMATOG?
L10    285 L7 AND (HPLC OR LIQUID(L)CHROMATOG?)
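The counts in this printout (48476 THIN and 62837 LAYER, but only 18607 THIN(W)LAYER) show how a proximity operator narrows a search. The Python sketch below illustrates the three operators of Table 1 on an invented record. It assumes that a record is a list of "information units" (for example a title and an index entry) and that words are split on non-alphanumeric characters; both are simplifications for illustration, not a description of the STN implementation.

```python
import re


def words(unit):
    """Split one information unit (e.g. a title) into upper-case words."""
    return re.findall(r"[A-Za-z0-9]+", unit.upper())


def adjacent_in_order(a, b, unit):      # A(W)B
    w = words(unit)
    return any(x == a and y == b for x, y in zip(w, w[1:]))


def adjacent_any_order(a, b, unit):     # A(A)B
    return adjacent_in_order(a, b, unit) or adjacent_in_order(b, a, unit)


def same_unit(a, b, unit):              # A(L)B
    w = words(unit)
    return a in w and b in w


def search(op, a, b, record):
    """True if any single information unit satisfies the proximity operator."""
    return any(op(a, b, unit) for unit in record)


def boolean_and(a, b, record):          # A AND B: anywhere in the citation
    all_words = [x for unit in record for x in words(unit)]
    return a in all_words and b in all_words


# Hypothetical record made of two information units.
record = [
    "Gas-liquid chromatography of amino acid enantiomers",
    "Thin layer plates with ethyl methyl ketone as eluent",
]

print(search(same_unit, "GAS", "CHROMATOGRAPHY", record))          # (L)
print(search(adjacent_in_order, "GAS", "CHROMATOGRAPHY", record))  # (W)
print(search(adjacent_any_order, "METHYL", "ETHYL", record))       # (A)
```

In this toy record GAS(L)CHROMATOGRAPHY succeeds while GAS(W)CHROMATOGRAPHY fails because "liquid" intervenes, which is the reason the text prefers (L) for that term and reserves (W) for THIN(W)LAYER.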
For example, the three citations on GC or GLC retrieved through our initial search terms (L4) expanded to 205 (L9) when the additional search terms were used. These results demonstrate that the new strategy is considerably more exhaustive.
=> S L7 NOT (L8 OR L9 OR L10)
L11     36 L7 NOT (L8 OR L9 OR L10)
Answer set L11 includes citations from answer set L7 that discuss chromatographic techniques other than TLC, GC, GLC, or HPLC/liquid chromatography. Their titles are requested for viewing.
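The NOT statement is a set subtraction, and it removes more than one might expect. In the Python sketch below, the citation numbers and the `ion_pair` set are invented purely to illustrate the logic: a citation that mentions an unlisted technique together with a listed one (say, ion-pair chromatography carried out by HPLC) is subtracted along with the rest.

```python
# Hypothetical citation numbers; only the set logic mirrors the search.
L7 = {1, 2, 3, 4, 5, 6}      # revised query
L8 = {1}                     # L7 AND (TLC OR THIN(W)LAYER)
L9 = {2}                     # L7 AND (GC OR GLC OR GAS(L)CHROMATOG?)
L10 = {3, 4}                 # L7 AND (HPLC OR LIQUID(L)CHROMATOG?)
ion_pair = {4, 5}            # citations that mention ion-pair chromatography

L11 = L7 - (L8 | L9 | L10)   # S L7 NOT (L8 OR L9 OR L10)
found_directly = L7 & ion_pair
missed_by_not = found_directly - L11

print(sorted(L11), sorted(found_directly), sorted(missed_by_not))
```

Citation 4 is about ion-pair chromatography but also matches the HPLC set, so it never appears in L11. Searching L7 directly for a technique is therefore a safer check than reading the titles left over after NOT.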
L11 ANSWER 1 of 36
TI  Stereochemistry of metallocenes. 53. Biphenyl tricarbonyl chromium complexes. Part 9. Synthesis, chromatographic enantiomeric resolution, circular dichroism, and chirality of mono and bis(tricarbonylchromium) complexes of di- and tetrasubstituted and bridged biphenyls

L11 ANSWER 36 of 36
TI  Resolution of enantiomers by reversed-phase chromatography and countercurrent extraction

All 36 citations in answer set L11 are displayed in the TITLE (abbrev. TI; Table 2) only format. Checking the titles we found that many other chromatographic methods are used in the separation of enantiomers: inclusion, countercurrent, ligand exchange, ion exchange, column, complexation, and reversed-phase chromatography. No titles contained, for example, paper or ion-pair (sometimes called pair-ion) chromatography.

=> S L7 AND PAPER
    471(Rn PAPER
L12      0 L7 AND PAPER
=> S L7 AND ION(A)PAIR
    183084 ION
     15295 PAIR
      4192 ION(A)PAIR
L13      3 L7 AND ION(A)PAIR

Answer set L13 demonstrates the danger of the logic behind the NOT operator. After not seeing these two chromatographic techniques in the TITLE output of L11, an additional search of the initial answer set L7 was conducted to verify their absence. While no citations were found on paper chromatography (L12), the three citations retrieved on ion-pair chromatography (L13) may come as a surprise. These citations, however, were negated in L11 as they contained, along with search terms ION, PAIR, additional search terms used in L8 or L9 or L10 (e.g., HPLC). Similarly, flash chromatography would not be retrieved in L11 because of the simultaneous appearance of search terms LIQUID, CHROMATOGRAPHY (see D L19 1 ALL, field IT) within that citation.

=> S L10 AND B/DT
     89001 B/DT
L14      0 L10 AND B/DT
=> S L10 AND REVIEW
    581360 REVIEW
L15     23 L10 AND REVIEW
=> S L15 AND ENG/LA
   3904825 ENG/LA
L16     21 L15 AND ENG/LA
=> S L16 AND (83 OR 84 OR 85)/PY
    424803 84/PY
    110331 85/PY
L17     12 L16 AND (83 OR 84 OR 85)/PY

In answer sets L14-L17 further information on HPLC/liquid chromatography is requested and the search is conducted in specific fields (Table 2). None of the 285 citations (L10) are books (L14). Twenty-three of the citations, however, are reviews (L15), and 21 of these reviews are in English (L16). Twelve of them were published since 1983 (L17):

=> D L17 1-12 TI
L17 ANSWER 1 of 12
TI  Analytical applications of direct chromatographic enantioseparation

L17 ANSWER 9 of 12
TI  Separation of enantiomers by liquid chromatographic methods

L17 ANSWER 12 of 12
TI  Chiral stationary phases for the gas-liquid chromatographic separation of enantiomers

The titles of the 12 review articles in English published since 1983 are displayed in the TITLE format. The most relevant (answer 9) is then displayed in the bibliographic format BIB (Table 4), as more information on the citation and the author is required.

=> D L17 9 BIB
L17 ANSWER 9 of 12
AN  CA100(20):162205f
TI  Separation of enantiomers by liquid chromatographic methods
AU  Pirkle, William H.; Finn, John
CS  Sch. Chem. Sci., Univ. Illinois
LO  Urbana, IL 61801, USA
SO  Asymmetric Synth., Volume 1, 87-124. Edited by: Morrison, James D. Academic: New York, N.Y.
SC  66-0 (Surface Chemistry and Colloids)
LA  Eng

=> S PIRKLE, W?/AU
L18    105 PIRKLE, W?/AU
=> S L18 AND (84 OR 85)/PY

Since William Pirkle wrote a review on the separation of enantiomers, it is likely that he has written additional articles on the topic. A search for this author's name turns up 105 citations since 1967. Eleven have been written since 1984 (L19). Truncation is used since the authors' names appear differently in different articles (some of these articles may even be by another W. Pirkle).
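Field-qualified searching (the /AU, /PY, and /LA specifiers of Table 2) can be pictured as filtering records on a single field. The Python sketch below uses invented records and placeholder author names, and the helper names `author_matches` and `year_in` are hypothetical. It treats the ? truncation symbol as a shell-style wildcard and compares two-digit years, as in (84 OR 85)/PY.

```python
from fnmatch import fnmatchcase

# Hypothetical records; only the AU and PY fields are modelled.
records = [
    {"AU": ["DOE, JANE"], "PY": 1983},
    {"AU": ["DOE, J.", "ROE, R."], "PY": 1985},
    {"AU": ["ROE, R."], "PY": 1985},
    {"AU": ["DOE, JOHN"], "PY": 1984},   # a different author, same initial
]


def author_matches(pattern, record):
    """Match a truncated author name such as 'DOE, J?' against the AU field."""
    glob = pattern.replace("?", "*")     # ? = any number of characters
    return any(fnmatchcase(name, glob) for name in record["AU"])


def year_in(two_digit_years, record):
    """Match the PY field against two-digit years, e.g. {84, 85}."""
    return record["PY"] % 100 in two_digit_years


by_author = [r for r in records if author_matches("DOE, J?", r)]
recent = [r for r in by_author if year_in({84, 85}, r)]

print(len(by_author), len(recent))
```

The truncated name retrieves every spelling of the author's name, but it also picks up the record by a different person with the same initial, which is the caveat the text raises about "another W. Pirkle".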
=> D L19 1-11 TI
L19 ANSWER 1 of 11
TI  Preparative separation of enantiomers by flash chromatography

L19 ANSWER 2 of 11
TI  Effect of interstrand distance upon chiral recognition by a chiral stationary phase

L19 ANSWER 3 of 11
TI  Reversed-phase chromatographic resolution of N-(3,5-dinitrobenzoyl)-.alpha.-amino acids on chiral stationary phases

All 11 most recent titles of articles written by W. Pirkle are displayed in the TITLE format. Display of his latest paper in full (ALL format) with all the fields (Table 4) is then requested for viewing.

=> D L19 1 ALL
L19 ANSWER 1 OF 11
TI  Preparative separation of enantiomers by flash chromatography
AU  Pirkle, William H.; Tsipouras, Athanasios; Sowin, Thomas J.
CS  Sch. Chem. Sci., Univ. Illinois
LO  Urbana, IL 61801, USA
SO  J. Chromatogr., 319(3), 392-395
SC  22-3 (Physical Organic Chemistry)
SX  66
DT  J
CO  JOCRAM
IS  0021-9673
PY  1985
LA  Eng
AB  Chiral stationary phases I [R = Ph, R' = H (II); R = H, R' = CH2CHMe2], supported on silica gel, were prepd. and used in flash chromatog. E.g., racemic benzodiazepinone III was completely resolved into its enantiomers by flash chromatog. on II. For diagram(s), see printed CA Issue.
KW  chiral stationary phase flash chromatog; enantiomer sepn flash chromatog
IT  Resolution (by flash chromatog. on chiral stationary phases)
IT  Chromatography, column and liquid (flash, sepn. of enantiomers by, on chiral stationary phases)
IT  Silica gel, uses and miscellaneous (support, for chiral stationary phases in flash chromatog.)
IT  H77H2-RG-RDP, silica gel supported 97866-63-2DP, silica gel supported (prepn. and use as chiral stationary phase in flash chromatog.)
IT  919-30-2DP, silica gel supported 39200-48-1P 50691-96-8P 50691-97-9P 51990-97-1P 53531-34-3P 60646-30-2P 91402-94-1P 91403-18-8P 97866-66-5P 97866-67-6P 97866-68-7P 97866-69-8P (prepn. of)
IT  16357-59-8 (reaction of, with dinitrobenzoylphenylglycine and aminopropyl-substituted silica gel)
IT  74097-77-3 (reaction of, with ethoxycarbonylethoxydihydroquinoline and aminopropyl-substituted silica gel)
IT  919-30-2 (reaction of, with silica gel)
IT  57526-84-8 60686-64-8 86125-90-8 91464-65-2 97866-64-3 97866-65-4 (resoln. of, by flash chromatog. on chiral stationary phases)
It is obvious that Pirkle's latest paper contains some important information about the topic being searched. In order to learn more about his research, the PRINT command is used to obtain all 11 recent papers. Offline prints (format ALL; Table 4) are then mailed to the user. The LOGOFF command terminates the search.

=> PRINT L19 1-11 ALL
L19 CONTAINS 11 ANSWERS CREATED ON 12 OCT 85 AT 17:06:47
MAILING ADDRESS =
=> LOGOFF
STN INTERNATIONAL LOGOFF AT 17:23:51 ON 12 OCT 85
We believe that this search profile can offer chemistry students a very realistic and functional understanding of an online retrieval of information from the chemical literature. Judging from our classroom experience⁶ the amount of information presented is sufficient to cover most problems an undergraduate or even a graduate student may encounter in the CA File.
This program is flexible enough to be updated (only the number of citations will continually grow) or modified. Some problems follow:
1) Input modified search terms (ENANT? gets enantioselective, enantiospecific, enantioseparation; SEP? gets sepd (separated), sepg (separating); RESOL?) and compare both retrievals.
2) Try new search terms (racemic, racemate, chiral, quantitative, analysis, determination, method, etc.).
3) What chromatographic methods are in L2 besides TLC, GC, GLC, and HPLC?
4) Search for more information on TLC (L8) or GC/GLC (L9) techniques.
5) Search for leading authors and papers in these fields (modify L14-L17).
6) Are there any other chromatographic methods in L7 besides those mentioned in D L11 (e.g., flash chromatography)?
7) Search for other techniques permitting direct determination of enantiomers (e.g., spectroscopy: 1H-NMR, 13C-NMR).
8) Find how to separate and detect enantiomers after their conversion into diastereomers (preparative methods, analytical techniques), etc.
An average search takes about 20-25 min (2-3 students per hour) and is conducted during off-peak hours using the 90% academic discount offered by STN International for CAS ONLINE. The total cost of an online search is about $8-9 per hour and there is an additional charge of $0.10 for an offline mailed print (ALL format). We found this pricing policy for academia very affordable, and we believe that it opens up the way to the integration of computerized literature searching into the chemistry curriculum.
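The figures above translate into a cost per student search of roughly $2.67 to $3.75 for connect time alone. The short Python sketch below makes that arithmetic explicit; the function name `search_cost` and the assumption that the cost scales linearly with connect time are ours, while the rates are the ones quoted in the text.

```python
def search_cost(minutes, rate_per_hour, offline_prints=0, print_fee=0.10):
    """Connect-time cost plus offline print charges, in dollars."""
    return rate_per_hour * minutes / 60 + offline_prints * print_fee


low = search_cost(20, 8.0)                              # shortest search, lowest rate
high = search_cost(25, 9.0)                             # longest search, highest rate
with_prints = search_cost(25, 9.0, offline_prints=11)   # plus 11 mailed prints

print(f"{low:.2f} {high:.2f} {with_prints:.2f}")
```

Adding the eleven offline prints requested in the sample session raises the upper estimate by $1.10.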
Acknowledgment

We are grateful to Hedy Mulhausen, Chemical Abstracts Service, Columbus, OH, for reviewing the manuscript and adding critical comments.

Glossary

CA File: Bibliographic database containing records for documents covered in the printed Chemical Abstracts (CA) from 1967 to the present (about 7 million papers and patents).
Database: Online equivalent of a printed (available) information source, e.g., index, table, catalog, dictionary, abstract.
Search terms: Single American English words, standard abbreviations, acronyms, symbols, formulas, etc., expressing the main idea to be searched.
Keywords: Chemical Abstracts applies the term "keywords" only to words that appear at the end of each weekly issue of CA.
Registry Number: Identification number assigned to a specific structure. Registry Numbers can be located, for example, in Chemical Substance Index, Formula Index, or CA Index Guide. Main compounds used in a citation are represented by Registry Numbers located in the IT field.
Messenger Command Language: Interactive language incorporating Boolean and proximity operators, truncation, and 17 commands (e.g., FILE, SEARCH, DISPLAY, etc.) to retrieve searchable information.
Baud rate: Number of bits transmitted per second.
⁶ One of the authors (DT) of this paper, who modified and tested the program, is an undergraduate student.