Searching Chemical Abstracts Online in undergraduate chemistry

Registry (structure) File: molecular formulas, names, and name fragments. Miroslav Krumpolc, Diana ... Keywords (Pedagogy):. Internet / Web-Based Lear...
Part 2. Registry (Structure) File: Molecular Formulas, Names, and Name Fragments

Miroslav Krumpolc¹ and Diana Trimakas
Department of Chemistry, University of Illinois at Chicago, Box 4348, Chicago, IL 60680

Connie Miller
Science Library, University of Illinois at Chicago, Box 4348, Chicago, IL 60680

In our first paper² we outlined a basic search in one of the data bases, the CA File, of Chemical Abstracts Online (CAS ONLINE). The CA File is a bibliographic data base in which records consist of citations to particular articles and patents, subject index terms and keywords, and abstracts. In this paper we focus on the other major data base of CAS ONLINE, the Registry File. This data base, essentially a substance index, consists of substance names, their Registry Numbers and characteristics, and actual structural representations. The real power and versatility of a CAS ONLINE search lie in the fact that the two files are companion files complementing each other. Thus, answer sets created in the Registry File can be crossed over for searching in the CA File. The search examples in this paper demonstrate how to use both files in order to optimize retrieval of chemical information. A good working knowledge of our previous paper² is therefore essential in order to understand fully the contents of this paper.

The Registry File contains specific types of information on more than 7 million substances and 10 million names; approximately 10,000 to 15,000 substances are added weekly. The information is retrievable by two main routes: (1) by performing a dictionary term search³ using complete names, name fragments, molecular formulas, element counts, periodic groups, classes of substances, number of components, and/or formula weights; (2) by conducting a structure search based on a structure or a substructure (i.e., an "open" fragment of a molecule) diagram⁴.

The structure search is universal. It is based on an atom-by-atom, bond-by-bond comparison of the input structure to match and retrieve structurally identical or similar (substructure search) compounds. It is also more complex, as a great number of special diagram (structure, substructure) building commands are required⁵. In this paper we start with the retrieval of chemical substances using the three most important dictionary terms: molecular formulas, complete names, and name fragments (Table 1).

Formulating Search Problems

The Registry File is more complex and versatile than the CA File, and it nearly always permits a variety of approaches to searching for substance information. We present several shorter examples covering the most typical problems a chemist faces daily. The examples illustrate how to search for compounds using their IUPAC or trivial names, how to search for deuterium-labeled compounds, how to find methods of preparation, and how to use a molecular formula to verify the existence of a given structure. In the following printout, the words and commands the searcher types are italicized. The use of a terminal with graphics output is highly recommended for any structure search⁶.

Journal of Chemical Education

Table 1. Search Fields Used in the Registry File

Field Code  Field                                        Examples
/MF         Molecular Formula in Hillᵃ system order      C4H4/MF; C4H2D2/MF; C17H19N5O2/MF
/CN         Complete Name (CA Index Name as well as      1,3-CYCLOBUTADIENE/CN; CYCLOBUTADIENE/CN;
            synonyms)                                    ASPARTAME/CN
(none)      Name Fragment (shortest "word" containing    CYCLO; DI; METH; OXY; 2; D; d; YL; BENZ;
            chemically significant information)          OIC; ACID

ᵃ Elements are arranged in alphabetical order unless they contain carbon. In that case carbon is cited first, followed by hydrogen (if present), followed by all remaining elements in alphabetical order. The 2H isotope is posted as D; the 3H isotope is posted as T.
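The Hill ordering rule in the footnote to Table 1 can be sketched in a few lines of Python. This is only an illustration of the rule as stated there; the function name `hill_formula` and the dictionary input format are assumptions made for this example, and D and T are treated as ordinary symbols sorted alphabetically among the remaining elements, which reproduces the postings shown in Table 1.

```python
def hill_formula(counts):
    """Build a Hill-order formula string from an element -> count mapping.

    Hypothetical helper for illustration; not part of CAS ONLINE.
    """
    # Ignore elements with a zero count.
    counts = {element: n for element, n in counts.items() if n > 0}
    if "C" in counts:
        # Carbon first, then hydrogen (if present), then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(e for e in counts if e not in ("C", "H"))
    else:
        # No carbon: strictly alphabetical.
        order = sorted(counts)
    # A count of 1 is not written out.
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


print(hill_formula({"O": 2, "N": 5, "H": 19, "C": 17}))
print(hill_formula({"D": 2, "H": 2, "C": 4}))
```

Appending `/MF` to the returned string gives a search term in the form used in Table 1.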

Problem 1. Retrieve all citations on the preparation of cyclobutadiene.

One way to start a search would be to enter the CA File and form a query composed of pertinent search terms combined with Boolean operators, for example, CYCLOBUTADIENE AND (SYNTHESIS OR PREPARATION). However, this approach may miss relevant answers, and citations containing, for example, the synthesis of derivatives of cyclobutadiene may also be retrieved. Therefore substances should be searched in the CA File by using Registry Numbers (RN), which uniquely identify each chemical compound⁷. Thus, it becomes very important in this search to find the Registry Number for cyclobutadiene. To begin this search we first turn to the Registry File, which contains Registry Numbers of chemical substances. The answer set created in a Registry File search is then used as a search term in the CA File or CAOLD File⁸ (this transfer is referred to as "file crossover"). The substance name is searched in the /CN field (Table 1) to retrieve basic information on cyclobutadiene. The answer retrieved is displayed in the SUB format (Table 2), giving us, among other things, the Registry Number of this compound, the number of pertinent references in the CA File, and an indication of earlier references in the CAOLD File.

¹ Author to whom correspondence should be addressed.
² Krumpolc, M.; Trimakas, D.; Miller, C. J. Chem. Educ. 1987, 64, 55.
³ USING CAS ONLINE: The Registry File, Dictionary Searching, Volume III; Chemical Abstracts Service: Columbus, OH, 1985.
⁴ USING CAS ONLINE: The Registry File, Structure Searching, Volume IV; Chemical Abstracts Service: Columbus, OH, 1985.
⁵ USING CAS ONLINE: The Registry File, Building Structures, Volume N; Chemical Abstracts Service: Columbus, OH, 1985.
⁶ This search was performed using a Type-2 graphics terminal, Visual 550 (manufactured by Visual Technology, Inc.), interfaced with a Password modem (manufactured by U.S. Robotics, Inc.) and set for 1200 baud.
⁷ The chemical substance entries from the Chemical Substance Index appear only as Registry Numbers, not by the systematic nomenclature name. Therefore, by not using Registry Numbers, the searcher would not search the Chemical Substance Index, a major part of Chemical Abstracts indexing.
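Because the whole crossover depends on one correctly typed Registry Number, it can be useful to check a number before searching with it. The sketch below is an illustration that is not taken from this paper: it assumes the publicly described CAS check-digit rule (the last digit equals the weighted sum of the preceding digits, weighted 1, 2, 3, ... from right to left, modulo 10), and the function name `is_valid_registry_number` is hypothetical.

```python
import re


def is_valid_registry_number(rn):
    """Return True if rn looks like a CAS Registry Number with a correct check digit.

    Assumes the publicly described check-digit rule; illustration only.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", rn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


for number in ("1120-53-2", "56516-62-2", "22839-47-0", "1120-53-3"):
    print(number, is_valid_registry_number(number))
```

A check of this kind only catches typing errors; it does not confirm that the number belongs to the intended substance, which still requires a display in the Registry File.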

Table 2. Display Field Codes in the Registry Fileᵃ

Code  Field
RN    CAS Registry Number
IN    CA Index Name
SY    Synonym
DR    Deleted Registry Number
MF    Molecular Formula
CI    Substance Class Identifier
SR    Source of Registration
IL    Isotope at Unknown Location
ST    Stereochemistry Text Descriptor
CM    Component Number

    FILE REG
    FILE 'REGISTRY' ENTERED AT 17:02:11 ON 24 APR 86
    COPYRIGHT 1986 BY THE AMERICAN CHEMICAL SOCIETY

    S CYCLOBUTADIENE/CN
    L1    1 CYCLOBUTADIENE/CN

    D L1 SUB
    L1 ANSWER 1 OF 1
    IN 1,3-Cyclobutadiene (7CI, 8CI, 9CI)
    SY [4]Annulene
    SY Cyclobutadiene (6CI)
    MF C4 H4

    REFERENCES IN FILE CAOLD (PRIOR TO 1967)
    228 REFERENCES IN FILE CA (1967 TO DATE)

The answer set L1 (i.e., the Registry Number) is then searched (crossed over) in the CA File as a search term. As we are interested only in the references on the preparation of cyclobutadiene, a slashed suffix "P" is added⁹ to L1. All retrieved citations (L2) can then be rapidly scanned using the free TI format (Table 2).

    FILE CA
    ...
    ...methylene-cyclopropene
    L2 ANSWER 2 OF 17
    TI Small rings. Part 49. Tricyclo[2.1.0.02,5]pentan-3-one

Similarly, a crossover to the CAOLD File is performed, retrieving 31 earlier references (L3). The information displayable in this file is very limited, however. The ALL format⁸ is used to display the Accession number of the first reference. Any additional information on the references found in this file must then be obtained by manually searching for these Accession numbers in printed Chemical Abstracts.

    FILE CAOLD
    S L1
    L3    31 L1
    D L3 1 ALL
    L3 ANSWER 1 OF 31
    AN CA65:6524g
    IT 1120-53-2

ᵃ (Table 2) Displaying in the SUB format results in the Registry Number, CA Index Name, up to 50 synonyms, molecular formula, and structure diagram fields. The number of references in the CA File and an indication that there are references in the CAOLD File are also displayed. Displaying in the ALL format results in ...
⁸ The CAOLD File contains references to substances registered prior to 1967. The ALL format will display the CA Reference Number (a column number and fraction in printed CA) in the AN field, and the Registry Number(s) in the IT field.
⁹ If we know the Registry Number of cyclobutadiene, the identical search can be conducted directly in the CA File by typing: S 1120-53-2/P. In the Registry File, we can directly display the necessary information by typing: D ACC 1120-53-2 followed by a pertinent format (Table 2).

Problem 2. Are there any isotopically labeled cyclobutadienes?

The search for isotopically labeled compounds can be performed in several ways. Provided we know their CA Index Names (e.g., 1,3-CYCLOBUTADIENE-1-D; 1,3-CYCLOBUTADIENE-1,3-13C2) or synonyms (e.g., CYCLOBUTADIENE-D) exactly as posted in the Registry File, the most straightforward approach is to conduct a search in the /CN field. An efficient way of retrieving deuterated (tritiated) compounds can be based on molecular formula searching, since deuterium and tritium are specifically posted in the Hill system order (Table 1). This is especially convenient if we are not certain about the name of the substance or we wish to conduct a more general search. A retrieval of citations on the perdeuterated cyclobutadiene is illustrated in the following example.

    L1 ANSWER 4 OF 6
    IN 1,3-Cyclobutadiene-1,2,3,4-d4 (9CI)
    7 REFERENCES IN FILE CA (1967 TO DATE)

The molecular formula of cyclobutadiene-d4 is C4D4. The search in the /MF (molecular formula) field retrieves six isomers (L1), which are rapidly scanned by the free display format TRIAL (Table 2). The cyclobutadiene-d4 is found in Answer 4, and there are a total of seven references (citations) on that molecule in the CA File. Since the /MF search sometimes retrieves a large number of isomers (hundreds, even thousands), it may be necessary to reduce this number to a manageable size in order to identify the right substance. Instead of displaying all six isomers (imagine 600!) we can refine our search strategy by combining L1 with chemically significant name fragments of the desired substance (Table 1) connected with operators. Since among all possible C4D4 isomers we are interested only in the cyclobutadiene, a query can be constructed (see Problem 4) by combining L1 with the name fragment CYCLO followed immediately by BUTADIENE (proximity operator (W)). The answer set L2 is then displayed in the ALL format (Table 2). The ALL format consists of the SUB information and the bibliographic, abstract, and indexing information for up to 10 of the latest references to that Registry Number (only the first reference is shown here).

    S L1 AND CYCLO(W)BUTADIENE
    1032741 CYCLO
      10468 BUTADIENE
        961 CYCLO(W)BUTADIENE
    L2    1 L1 AND CYCLO(W)BUTADIENE

    D L2 ALL
    L2 ANSWER 1 OF 1
    RN 56516-62-2
    IN 1,3-Cyclobutadiene-1,2,3,4-d4 (9CI)
    SY Tetradeuterocyclobutadiene
    SY Tetradeuteriocyclobutadiene

    MF C4 D4
    (STRUCTURE DIAGRAM)
    7 REFERENCES IN FILE CA (1967 TO DATE)
    REFERENCE 1
    AN CA101(7):54132s
    TI Tunneling dynamics of cyclobutadiene
    AU Dewar, Michael J. S.; Merz, Kenneth M., Jr.; Stewart, James J. P.
    CS Dept. Chem., Univ. Texas
    LO Austin, TX 78712, USA
    SO J. Am. Chem. Soc., 106(14), 4040-1
    SC 22-2 (Physical Organic Chemistry)
    DT J
    CO JACSAT
    IS 0002-7863
    PY 1984
    LA Eng
    AB MINDO/3-CI calcns. for cyclobutadiene indicate that interconversion of the two valence isomers of the singlet ground state by tunneling should be very rapid and should lead to a splitting of the ground state vibrational wave function by ca. 5 cm-1.
    KW MO tunneling automerization cyclobutadiene
    IT Tunneling (in automerization of butadiene)
    IT Molecular orbital (MINDO, automerization of cyclobutadiene in relation to)
    IT Kinetics of isomerization (automerization, of cyclobutadiene)
    IT Isomerization (automerization, of cyclobutadiene, tunneling dynamics of)
    IT 1120-53-2 56516-62-2 90968-11-9 (automerization of, tunneling dynamics of)
    REFERENCE 2
    AN CA99(1):4809n

Problem 3. Retrieve basic chemical information on aspartame, an artificial sweetener.

The Registry File can also be searched for compounds for which the searcher has only their trivial names (brand names, trade names, acronyms, incomplete names, synonyms, etc.). By conducting the search in the /CN field (Table 1) we retrieve one answer (L1), confirming that the sweetener is posted in the Registry File under its trivial name. Basic information including the structure diagram is then displayed in the SUB format (Table 2).

    FILE REG
    S ASPARTAME/CN
    L1    1 ASPARTAME/CN

    D L1 SUB
    L1 ANSWER 1 OF 1
    RN 22839-47-0
    IN L-Phenylalanine, N-L-.alpha.-aspartyl-, 1-methyl ester (9CI)
    SY L-Aspartyl-L-phenylalanine methyl ester
    SY Succinamic acid, 3-amino-N-(.alpha.-carboxyphenethyl)-, methyl ester, stereoisomer (8CI)
    SY Aspartylphenylalanine methyl ester
    SY L-.alpha.-Aspartyl-L-phenylalanine methyl ester
    SY L-Aspartyl-L-3-phenylalanine methyl ester
    SY .alpha.-L-Aspartyl-L-phenylalanine methyl ester
    SY Methyl aspartylphenylalanate
    SY Sweet dipeptide
    SY Aspartame
    SY L-Aspartyl-L-phenylalanyl methyl ester
    SY Canderel
    SY Nutrasweet
    SY .alpha.-Sweet
    DR 7421-84-3, 5390-69-7
    MF C14 H18 N2 O5
    CI COM, TSCA
    ST 5:L,L
    REFERENCES IN FILE CAOLD (PRIOR TO 1967)
    610 REFERENCES IN FILE CA (1967 TO DATE)

A search for additional information can then be conducted in the CA File utilizing file crossover with L1. We may be interested, e.g., in aspartame in beverages (L2), in food (L3), etc.

    FILE CA
    S L1 AND BEVERAGE#
      504 L1
     5367 BEVERAGE#
    L2    49 L1 AND BEVERAGE#
    S L1 AND FOOD
      504 L1
    42526 FOOD
    L3    60 L1 AND FOOD

Problem 4. Is this compound known?

    (STRUCTURE DIAGRAM)

This is one of the most frequent questions a searcher encounters. The answer can always be found by performing a structure search⁴ in the Registry File, as this is the most comprehensive and accurate way. A faster and certainly much less expensive approach is to conduct a name search in the /CN field (Table 1), provided we know the Index Name or any synonym for the substance. If we are not confident about the nomenclature of the compound, or simply do not know it at all, we can base our search on nomenclature name fragments (Table 1). This is the last of the three dictionary terms we wish to present in this paper. This technique utilizes a partial name of the substance, starting with any recognizable name fragment(s) connected by pertinent operators. This approach is quite general and will be demonstrated in this case. The search starts in the /MF field and retrieves 99 isomers of the molecular formula C17H19N5O2 (L1). The number is too high to be scanned by the TRIAL format (Table 2). Inspecting the molecular structure we can recognize a dimethoxyphenyl group, which obviously will be a part of the name. It is a good starting point, which in combination with the /MF search (L1) narrows the field of the isomers to four (L2):

    S L1 AND DI(W)METH(W)OXY(W)PHENYL
    3530816 DI
     857439 METH
    1713131 OXY
    2418962 PHENYL
      36096 DI(W)METH(W)OXY(W)PHENYL
    L2    4 L1 AND DI(W)METH(W)OXY(W)PHENYL

Rapid scanning for the Index Name and a structure diagram is best achieved by the TRIAL format (Table 2); about 10 structures can be viewed per minute⁶. The desired compound is identified in Answer 3:

    D L2 1-4 TRI
    L2 ANSWER 3 OF 4
    IN Pyrido[2,3-d]pyrimidine-2,4-diamine, 6-[(2,5-dimethoxyphenyl)methyl]-5-methyl- (9CI)
    (STRUCTURE DIAGRAM ...)
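The narrowing step used in Problems 2 and 4 (a molecular formula answer set combined with name fragments joined by the (W) operator) can be mimicked on a toy data set. The sketch below is only an illustration of the logic: the records, their segmentation into name fragments, and the function names are invented for this example and do not reproduce the Registry File or its search software.

```python
# Hypothetical mini "registry": each record has a Hill-order formula and
# its name already split into fragments (segmentation is assumed here).
RECORDS = [
    {"mf": "C4D4", "fragments": ["1", "3", "CYCLO", "BUTADIENE", "1", "2", "3", "4", "D4"]},
    {"mf": "C4D4", "fragments": ["1", "BUTEN", "3", "YNE", "D4"]},
    {"mf": "C4H4", "fragments": ["1", "3", "CYCLO", "BUTADIENE"]},
]


def search_mf(records, formula):
    """Toy counterpart of a /MF search: exact match on the formula string."""
    return [r for r in records if r["mf"] == formula]


def has_adjacent(fragments, phrase):
    """True if the fragments in phrase occur consecutively, in the given order."""
    n = len(phrase)
    return any(fragments[i:i + n] == phrase for i in range(len(fragments) - n + 1))


def refine_w(answer_set, *phrase):
    """Toy counterpart of 'L1 AND A(W)B': keep records with A immediately followed by B."""
    return [r for r in answer_set if has_adjacent(r["fragments"], list(phrase))]


l1 = search_mf(RECORDS, "C4D4")          # all isomers with this formula
l2 = refine_w(l1, "CYCLO", "BUTADIENE")  # only those named cyclo-butadiene
print(len(l1), len(l2))
```

The point of the design is the order of operations: the formula search fixes the composition first, so the name fragments only have to discriminate among isomers rather than among the whole file.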

More information is then displayed using the SUB, BIB, and ABS formats (Table 2). Up to 10 of the latest references are displayed with bibliographic and abstract information for this substance (only the first one of the total of 15 is shown here).

    15 REFERENCES IN FILE CA (1967 TO DATE)
    REFERENCE 1
    AN CA103(23):194044u
    TI Quantitative structure-activity relationship of antifolate inhibition of bacteria cell cultures resistant and sensitive to methotrexate
    AU Coats, Eugene A.; Genther, Clara S.; Selassie, Cynthia Dias; Strong, Cynthia D.; Hansch, Corwin
    CS Coll. Pharm., Univ. Cincinnati
    LO Cincinnati, OH 45267, USA
    SO J. Med. Chem., 28(12), 1910-16
    SC 10-5 (Microbial Biochemistry)
    DT J
    CO JMCMAR
    IS 0022-2623
    PY 1985
    LA Eng
    AB The antifolates I (R = H, SO2NH2, CONH2, CF3, Me, OCH2C6H5, etc.) and II (R = H, Cl, OH, Me, NHAc, OCH2CH2OMe, OSO2Me, etc.) were evaluated as inhibitors of Escherichia coli dihydrofolate reductase and ...

¹⁰ STN is introducing a new series of computer-based tutorials on floppy disks that run on IBM PCs (256K RAM, MS-DOS 2.0 or higher). CAS ONLINE users may be particularly interested in STN Mentor: STN Overview (free of charge) and STN Mentor: Introduction to CAS Online. They may be used to demonstrate simulated searching to beginners. Additional lecture material, "Introduction to Computer Searching on STN International", can be obtained free from the ACS Division of Chemical Information.

Conclusion

The examples presented in this paper cover a very broad spectrum of chemical topics. They profile the most important searches in the Registry File that do not require graphical input of structure diagrams. To help the reader acquire a good working knowledge of this file¹⁰ we present several additional problems that we found particularly interesting. All were tested and the answers may be readily retrieved; sometimes more than one approach can be tried.

(1) How many isomers of the molecular formula C6H6 are there? How many isomers of benzene (including radicals, ions, isotopically labeled molecules) can you find?
(2) How many mono- (di-, etc.) deuterated (tritiated) benzene molecules can you retrieve?
(3) Can you find any benzene molecules containing 13C (14C, 11C) atoms?
(4) How would you synthesize aspartame?
(5) Search for other artificial sweeteners (e.g., saccharin, cyclamate). Find out how to analyze them by chromatographic (TLC, HPLC) or spectroscopic (UV, IR, NMR) methods. (First retrieve their Index Names and Registry Numbers, and then conduct a further search in the CA File using file crossover.)
(6) In Problem 4 (C17H19N5O2/MF) formulate the query using other name fragments and proximity operators (e.g., METHYL, AMIN?; (L)). How many structures will be retrieved if these name fragments are used?
(7) Draw a structure of your choice and find out if it is known. (Base your search on the molecular formula (/MF) and name fragments.)
(8) Obtain the Registry Number of a compound of your interest (e.g., in the Aldrich Catalog Handbook) and display basic chemical information in the Registry File. Conduct a search in the CA File using the Registry Number.⁹

Note: Sometimes retrieved information may meet the formal requirements of the search but not the intent of the searcher. This leads to unusual and unanticipated results called "false drops". They are practically unavoidable, as it is very difficult to spot them beforehand. They are usually caused by incomplete query logic at the searcher's level, and any remedy has to be carefully considered within that logical context. To close, we would like to share this amusing experience and offer several typical examples of false drops we came across in the course of programming:

(1) An effort to retrieve citations on the chromium(VI) oxidation state (one of the search terms was CR(W)6) also retrieved metallurgic alloys containing 6% Cr. In this case, an improved query may include NOT ALLOY.
(2) A program using the search term SODIUM(L)CARBONATE retrieved several citations containing exclusively sodium chloride and potassium carbonate. The problem can be corrected by applying the (W) operator or by using the Registry Number of the above compound.
(3) Being interested in pheromones produced by bees (the search term BEE?), the searcher also retrieved beetle pheromones. In this case truncation was too broad; it is better to use BEE#.

Acknowledgment

We are grateful to Hedy Mulhausen, Chemical Abstracts Service, Columbus, OH, for reviewing the manuscript and adding critical comments.


Volume 66

Number 1

January 1989
