
Information • Textbooks • Media • Resources

Chemical Information Instructor
edited by Arleen N. Somerville, Carlson Library, University of Rochester, Rochester, NY 14627-0236

Finding Chemical Information through Citation Index Searching

Allan L. Smith
Chemistry Department, Drexel University, Philadelphia, PA 19104; [email protected]

Three Scenarios

Consider the following three scenarios, each of which poses a problem in chemical information retrieval (1).

1. A chemist has found a key article, published five years ago, which is relevant to a new research field she is entering. She wants to expand her knowledge of the field by finding related articles, both older and more recent (2).

2. An early worker in the field of information science wonders if there is an alternative to the standard method of indexing the scientific literature by preparing key word and subject indexes. He finds that the legal profession has been provided with a tool for searching legal decisions based on listing the citations to precedents used in cases decided by courts, and wonders if such an approach could be used to index the scientific literature (3).

3. A sociologist/historian of science is studying the structure of science by examining scientific breakthroughs, such as the development and validation of the DNA theory of genetic coding that controls protein synthesis. He has found an authoritative account of the breakthrough but wants to compare that account with the results of an automated method of identifying key papers and key authors in the field.

Citation Indexing: The Solution to Each Problem

The solution to the chemist’s problem in the first scenario emerged from the pioneering work of the information scientist of the second scenario, and this work provided the basis by which the problem posed in the third scenario could also be solved. The key concept in all three is citation indexing, and the science information pioneer who recognized this in the 1950s was Eugene Garfield (1). In his book, Garfield gives a succinct description of citation indexing (2, p 274):

    The concept of citation indexing is simple. Almost all the papers, notes, reviews, corrections, and correspondence published in scientific journals contain citations. These cite—generally by title, author, and where and when published—documents that support, provide precedent for, illustrate, or elaborate on what the author has to say. Citations are the formal, explicit linkages between papers that have particular points in common. A citation index is built around these linkages. It lists publications that have been cited and identifies the sources of the citations. Anyone conducting a literature search can find from one to dozens of additional papers on a subject just by knowing one that has been cited. And every paper that is found provides a list of new citations with which to continue to search.
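The linkage Garfield describes can be pictured as an inverted lookup table. The following is a minimal Python sketch, not ISI's implementation: the paper identifiers and the `REFERENCES` table are hypothetical toy data, and `build_citation_index` is an illustrative name. It turns each paper's reference list (citing to cited) into a citation index (cited to citing).

```python
from collections import defaultdict

# Hypothetical toy data: each citing paper maps to the references it cites.
REFERENCES = {
    "paper_1995": ["key_1990", "older_1985"],
    "paper_1996": ["key_1990"],
    "paper_1997": ["paper_1995", "key_1990"],
}


def build_citation_index(references):
    """Invert citing -> cited lists into cited -> sorted list of citing papers."""
    index = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].add(citing)
    # Sort for a stable, readable result.
    return {cited: sorted(citing) for cited, citing in index.items()}


index = build_citation_index(REFERENCES)

# Working forward in time from one known older paper:
newer_papers = index["key_1990"]
```

Looking up a known older paper in `index` returns the newer papers that cite it, and each of those papers has its own entry in `REFERENCES` with which to continue the search.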

In Chapter 2 of his book, Garfield describes the steps that led him to publish the first edition of Science Citation Index in 1963 (3). His early experience as an abstractor for the Chemical Abstracts Service convinced him of the value of including in each abstracted record the citations of that record. His Ph.D. in structural linguistics gave him a breadth of view of the scientific literature that transcended the discipline of chemistry. A key decision in Garfield’s early work was to create a truly multidisciplinary database by indexing scientific publications in all of the sciences, and he founded the Institute for Scientific Information (ISI) in order to implement this idea. Since chemistry is viewed as a central science by its practitioners, the overlap of the literature in chemistry and in related fields such as molecular biology, medicine, environmental science, agriculture, and technology is expected to be significant. A multidisciplinary database covering all the sciences is thus quite important to chemists.

A good discussion of ISI’s Science Citation Index (SCI) and its related online product SciSearch can be found in Chapter 4 of Wiggins’s book (4). ISI’s Web site, http://www.isinet.com, also has essays on citation indexing and descriptions of all ISI’s database products, including the Web of Science, a graphical user interface that provides Web access to Science Citation Index. SCI covers more than 3500 of the world’s most important scientific journals. An expanded version of SCI covers more than 5600 journals. ISI has developed a method to measure the impact of a particular journal on the basis of the frequency with which it is cited. These “impact analyses” are published regularly and form the basis for inclusion of journals in SCI and SciSearch.
Every journal article, or citing reference, contains older or cited references, and the Science Citation Index has print products that enable one to find a citing reference from a cited reference and vice versa. As Wiggins states (4, p 352),

    The unique feature of citation indexing is that it leads to a bibliography of newer works on a subject without having to use subject terms. In other words, the Science Citation Index lets the searcher work forward in time to compile a bibliography by using an earlier relevant work.

Cited references are of necessity older than citing references and can precede the establishment of SCI by decades. For example, a classic paper by Albert Einstein (5) in 1906 has been cited more than 750 times since 1974, and the citing

JChemEd.chem.wisc.edu • Vol. 76 No. 8 August 1999 • Journal of Chemical Education


references to this classic paper may be retrieved by using either SCI or SciSearch. SciSearch is available through the database vendors Knight-Ridder Dialog, Orbit, or STN International, on CD-ROM, and via the World Wide Web. SciSearch is one of the databases that can be included in Dialog’s Classroom Instruction Program. The searches given in this article were performed in the Dialog implementation of SciSearch using the Dialog command language.

At the beginning of 1997, SciSearch had more than 14,800,000 records, including more than 800,000 added in 1996 and all the citing reference records in SCI from 1974 onwards. About seven years ago, SciSearch began to include authors’ abstracts in their records. ISI thus does not rely on chemically or scientifically trained staff to abstract and index each journal article: the authors’ own words describe the article and the citations are the primary index terms.

Having a print copy of the most current database summary sheet available while you search is very useful in optimizing your search strategies. The Dialog version of SciSearch is described in Dialog’s BlueSheet for File 434, available via the URL http://library.dialog.com/bluesheets/html/bl0034.html or by way of the bluesheets page at http://library.dialog.com/bluesheets/. At this site you will find a description of the database, its subject coverage, and commands for searching both the basic indexes and additional indexes. A sample of a full record is also included. The STN version of SciSearch is described in http://info.cas.org/ONLINE/DBSS/scisearchss.html.

Teaching Citation Index Searching

It is important to include instruction on Science Citation Index in any workshop or course on chemical information. Students should realize that searches using the same keywords but in different databases produce different results because of differing indexing policies.
In addition, citation searching is a good way to develop a bibliography without using key words at all, but just by starting with a single older key reference to a topic. As Huber states in his useful article on the unique features of searching SciSearch on STN (6, p 52),

    The great selling point of Science Citation Index has always been the ability to follow the progress of a research field forward by cited reference searching.

A good way to introduce citation indexing to a class already familiar with bibliographic chemical databases and with the concept of cited references is to ask the class to name famous scientists, and list them on the board. Many names will show up, but among them will almost certainly be Linus Pauling. Then, ask the class to tell you why Linus Pauling is famous: what did Pauling actually do as a chemist? You will be surprised at the answers, because most students have no idea of the breadth of Pauling’s contributions. Explain that the class will discover why Pauling is famous by examining his cited references. Present the field structure of the SciSearch database and explain the difference between the “author” field and the “cited author” field. View the author index:

    ?e au=pauling l
    Ref   Items   Index-term
    E1        1   AU=PAULING K
    E2        3   AU=PAULING KD
    E3      153  *AU=PAULING L
    E4        4   AU=PAULING LC

Then view the cited author index:

    ?e ca=pauling l
    Ref   Items   Index-term
    E1        1   CA=PAULING JR
    E2       14   CA=PAULING KD
    E3    16321  *CA=PAULING L

Point out that Pauling is famous because he is one of the most widely cited chemists. Use the CR (Cited Reference) index to examine citations to Pauling’s earliest papers in the 1920s on crystallography:

    ?e cr=pauling l, 1920
    Ref   Items   Index-term
    E7       20   CR=PAULING L, 1924, V46, P2738, J AM CHEM SOC
    E8        4   CR=PAULING L, 1924, V46, P2738, J AM CHEMICAL SOC
    E9        1   CR=PAULING L, 1925, V47, J AM CHEM SOC
    E10       1   CR=PAULING L, 1925, V47, P1027, J AM CHEM SOC
    E11       3   CR=PAULING L, 1925, V47, P2148, J AM CHEM SOC
    E12       2   CR=PAULING L, 1925, V47, P2904, J AM CHEM SOC
    E13      14   CR=PAULING L, 1925, V47, P781, J AM CHEM SOC
    E14       3   CR=PAULING L, 1925, V47, P781, J AM CHEMICAL SOC
    E15       1   CR=PAULING L, 1926, V27, P568, PHYS REV
    E16       3   CR=PAULING L, 1926, V40, P344, Z PHYS
    E17       1   CR=PAULING L, 1926, V48, P1132, J AM CHEM SOC
    E18       9   CR=PAULING L, 1926, V48, P641, J AM CHEM SOC
    E19       1   CR=PAULING L, 1926, V87, P377, Z KRISTALLOGRAPHIE
    E20       1   CR=PAULING L, 1927, V114, P R SOC LOND A
    E21       3   CR=PAULING L, 1927, V114, P181, P R SOC A
    E22       1   CR=PAULING L, 1927, V114, P181, P ROY SOC A
    E23     108   CR=PAULING L, 1927, V114, P181, P ROY SOC LOND A
    E24     102   CR=PAULING L, 1927, V114, P181, P ROY SOC LONDON

Pauling’s first publications on the quantum theory of atoms and molecules in 1931:

    E12     158   CR=PAULING L, 1931, V53, P1367, J AM CHEM SOC
    E13      22   CR=PAULING L, 1931, V53, P1367, J AM CHEMICAL SOC
    E14       1   CR=PAULING L, 1931, V53, P1369, J AM CHEM SOC
    E15       1   CR=PAULING L, 1931, V53, P1376, J AM CHEM SOC
    E16       1   CR=PAULING L, 1931, V53, P1967, J AM CHEM SOC
    E17      48   CR=PAULING L, 1931, V53, P3225, J AM CHEM SOC

Pauling’s early influential book on quantum mechanics coauthored with E. B. Wilson, from which his chemist contemporaries learned quantum mechanics:

    E20     229   CR=PAULING L, 1935, INTRO QUANTUM MECHAN

and its translation into Russian:

    E26      96   CR=PAULING L, 1947, PRIRODA KHIMICHESKOI

Pauling’s famous book The Nature of the Chemical Bond, cited several thousand times, and each year since its publication almost 60 years ago:

    E22     100   CR=PAULING L, 1939, NATURE CHEM BOND
    E23      13   CR=PAULING L, 1939, NATURE CHEM BOND STR
    E24       6   CR=PAULING L, 1939, NATURE CHEM BONDS
    E25       1   CR=PAULING L, 1939, NATURE CHEMICAL BAND
    E26      33   CR=PAULING L, 1939, NATURE CHEMICAL BOND

Pauling’s chemistry textbooks, which have had a profound impact on the way general chemistry has been taught:

    E18       1   CR=PAULING L, 1947, COLLEGE CHEM
    E19       5   CR=PAULING L, 1947, GENERAL CHEM
    E49      58   CR=PAULING L, 1970, GENERAL CHEM
    E50      18   CR=PAULING L, 1970, GENERAL CHEMISTRY

Even though the earliest SciSearch record in which Pauling is an author dates back only to the late 1960s, citation analysis of these papers is also interesting because the full bibliographic record reveals in detail the topic of the paper. By examining the most recent records that cite Pauling’s classic works, students get a vivid picture of the breadth of Pauling’s intellect and his contributions as the century’s most prominent chemist.

Sample Searches

Building a Bibliography from a Key Reference

I recently published a paper on the gas-phase absorption spectrum of C60, buckminsterfullerene (7). I needed to know other references to the absorption coefficient of gas-phase C60 published since the first paper by B. B. Brady, J. Chem. Phys. 1992, 97, 3855. In a few minutes, using SciSearch, I had the answer, as shown by the following Dialog search.

    ?e cr=brady bb, 1992
    [Expand on the cited reference index to identify the paper by Brady in 1992]
    Ref   Items   Index-term
    E1        1   CR=BRADY BB, 1988, V147, P53, CHEM PHYS LETT
    E2       13   CR=BRADY BB, 1988, V147, P538, CHEM PHYS LETT
    E3        0  *CR=BRADY BB, 1992
    E4       10   CR=BRADY BB, 1992, V97, P3855, J CHEM PHYS

    ?s e4
    [Search for all papers that cite the 1992 Brady paper]
    S2       10   CR=“BRADY BB, 1992, V97, P3855, J CHEM PHYS”

    ?t 2/ti/1-5
    [Look at the titles of these papers]
    2/TI/1
    DIALOG®File 434:© 1996 Inst for Sci Info. All rts. reserv.
    Title: ABSORPTION-SPECTRUM OF C-60 IN THE GAS-PHASE AUTOIONIZATION VIA CORE-EXCITED RYDBERG STATES

    [additional display omitted]
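Rows of a cited-reference EXPAND display such as the one above have a regular shape: a reference number, an item count, and a `CR=` term whose comma-separated parts give cited author, year, volume, page, and source. As a hedged illustration only, the Python sketch below splits one such row into fields. The layout is inferred solely from the displays reproduced in this article, not from Dialog documentation, and the names `ExpandRow` and `parse_expand_row` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ExpandRow(NamedTuple):
    ref: str               # e.g. "E4"
    items: int             # number of citing records
    author: str            # cited first author
    year: int
    volume: Optional[str]  # e.g. "V97", if present
    page: Optional[str]    # e.g. "P3855", if present
    source: Optional[str]  # abbreviated journal or book title, if present


# Assumed row layout: "E<n> <count> [*]CR=<author>, <year>[, V..][, P..][, <source>]"
ROW = re.compile(r"^(E\d+)\s+(\d+)\s+\*?CR=(.+)$")


def parse_expand_row(line: str) -> ExpandRow:
    """Split one cited-reference EXPAND row into its fields."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"not a CR expand row: {line!r}")
    ref, items, term = match.groups()
    # Drop empty parts left by a trailing comma.
    parts = [p.strip() for p in term.split(",") if p.strip()]
    author, year = parts[0], int(parts[1])
    volume = page = source = None
    for part in parts[2:]:
        if re.fullmatch(r"V\d+", part):
            volume = part
        elif re.fullmatch(r"P\d+", part):
            page = part
        else:
            source = part
    return ExpandRow(ref, int(items), author, year, volume, page, source)


row = parse_expand_row("E4 10 CR=BRADY BB, 1992, V97, P3855, J CHEM PHYS")
```

A parsed row makes it simple to sort or filter a long display by item count or by year. Rows that lack a volume or page, such as the bare `*CR=BRADY BB, 1992` entry, come back with those fields set to `None`.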

The Gasteiger Problem

Here is an in-class SciSearch assignment I give my graduate class in Chemical Information Retrieval.

Recently I heard an excellent talk by J. Gasteiger at an American Chemical Society meeting symposium on chemical information. Using the SciSearch database on DIALOG, answer the following questions:

1. What is Gasteiger’s present affiliation?
2. For how many records in SciSearch is Gasteiger one of the authors?
3. What is the earliest publication referenced in SciSearch by Gasteiger?
4. Find a paper by Gasteiger that is cited by more than 300 subsequent papers.
5. Find the most recent review article which cites the paper you found in question 4.
6. In his talk, Gasteiger mentioned a review article he published in 1993 in Chemical Reviews. Find this article.
7. There is a subsequent paper by Gasteiger in 1994 in the Journal of Chemical Information and Computer Sciences on a closely related topic. From the abstract of this article, describe the work.

For question 3, most students will search the author index for papers by Gasteiger and find that the earliest reference is in 1976. But by searching the cited reference index, one finds a paper by Gasteiger in 1971. For question 6, students should use the document type command DT=REVIEW.

Garfield’s Examples

In a chapter entitled “The Citation Index as a Search Tool”, Garfield gives examples of searches to verify bibliographic citations; to find information on a subject that is named for a person (eponymic searches, e.g., the Franck–Condon principle); to find information on a methodological technique; to follow up on an early development; to find information on a concept; to answer a specific question; to find multidisciplinary information; and to do either a quick state-of-the-art search or a comprehensive bibliography search. While most if not all of these searches may be done in Chemical Abstracts On-line or other subject-oriented databases with appropriate Boolean logic, the citation index approach provides an alternative route, which often produces a higher level of relevance for the same expenditure of time and search resources.

Huber’s Examples

Huber (6) presents many powerful examples of combining SciSearch with STN’s SmartSelect command to sort records, perform citation analysis, obtain impact factors for authors and journals, and do “related record searching”. A related record is defined as one that shares at least one cited reference with the parent record (6).

Discussion

Citation indexing has been used by Garfield and ISI to develop a method for identifying the significant journals of science (8). It has also been extensively used (and sometimes abused) in the assessment of the contributions of individual scientists (9).

While citation indexing is a powerful concept, its implementation is not without problems. When the cited reference index in SciSearch is used to search for a key paper in a rapidly emerging field, some of SciSearch’s limitations become obvious. For example, Wolfgang Krätschmer is well known in fullerene research for his key 1990 paper in Nature (10) on a method for preparing C60 fullerene from graphite soot.
By using the command e cr=kratschmer w, 1990 while in SciSearch, one gets the following display:

    Ref   Items   Index-term
    E1        2   CR=KRATSCHMER W, 1989, DUSTY OBJECTS UNIVER
    E2        1   CR=KRATSCHMER W, 1989, V347, P354, NATURE
    E3        0  *CR=KRATSCHMER W, 1990
    E4        1   CR=KRATSCHMER W, 1990, DUSTY OBJECT UNIVERS
    E5        5   CR=KRATSCHMER W, 1990, DUSTY OBJECTS UNIVER
    E6        1   CR=KRATSCHMER W, 1990, IN PRESS DUSTY OBJEC
    E7        1   CR=KRATSCHMER W, 1990, P167, CHEM PHYS LETT
    E8        1   CR=KRATSCHMER W, 1990, P336, NATURE
    E9        2   CR=KRATSCHMER W, 1990, P347, NATURE
    E10      10   CR=KRATSCHMER W, 1990, P89, DUSTY OBJECTS UNIVER
    E11       1   CR=KRATSCHMER W, 1990, V147, P354, NATURE
    E12       2   CR=KRATSCHMER W, 1990, V170, P107, CHEM PHYS LETT
    E13       1   CR=KRATSCHMER W, 1990, V170, P160, CHEM PHYS LETT
    E14       2   CR=KRATSCHMER W, 1990, V170, P162, CHEM PHYS LETT
    E15       3   CR=KRATSCHMER W, 1990, V170, P1667, CHEM PHYS LET
    E16     528   CR=KRATSCHMER W, 1990, V170, P167, CHEM PHYS LETT
    E17       1   CR=KRATSCHMER W, 1990, V174, P219, NATURE
    E18       1   CR=KRATSCHMER W, 1990, V318, P354, NATURE
    E19       1   CR=KRATSCHMER W, 1990, V329, P529, NATURE
    E20       4   CR=KRATSCHMER W, 1990, V34, P354, NATURE
    E21       1   CR=KRATSCHMER W, 1990, V342, P354, NATURE
    E22       2   CR=KRATSCHMER W, 1990, V346, P354, NATURE
    E23       1   CR=KRATSCHMER W, 1990, V347, NATURE LONDON
    E24       3   CR=KRATSCHMER W, 1990, V347, P162, NATURE
    E25       3   CR=KRATSCHMER W, 1990, V347, P167, NATURE
    E26      21   CR=KRATSCHMER W, 1990, V347, P254, NATURE
    E27       2   CR=KRATSCHMER W, 1990, V347, P3, NATURE
    E28      16   CR=KRATSCHMER W, 1990, V347, P345, NATURE
    E29       1   CR=KRATSCHMER W, 1990, V347, P35, NATURE
    E30       2   CR=KRATSCHMER W, 1990, V347, P351, NATURE
    E31       1   CR=KRATSCHMER W, 1990, V347, P353, NATURE
    E32       4   CR=KRATSCHMER W, 1990, V347, P354, CHEM PHYS LETT
    E33       1   CR=KRATSCHMER W, 1990, V347, P354, JATURE LONDON
    E34       8   CR=KRATSCHMER W, 1990, V347, P354, NAT
    E35       2   CR=KRATSCHMER W, 1990, V347, P354, NATRE
    E36       1   CR=KRATSCHMER W, 1990, V347, P354, NATUE
    E37       1   CR=KRATSCHMER W, 1990, V347, P354, NATUE LONDON
    E38       1   CR=KRATSCHMER W, 1990, V347, P354, NATUER
    E39       1   CR=KRATSCHMER W, 1990, V347, P354, NATUR
    E40       1   CR=KRATSCHMER W, 1990, V347, P354, NATUR E
    E41    2435   CR=KRATSCHMER W, 1990, V347, P354, NATURE
    E42       1   CR=KRATSCHMER W, 1990, V347, P354, NATURE V
    E43       1   CR=KRATSCHMER W, 1990, V347, P354, NATUREL
    E44       1   CR=KRATSCHMER W, 1990, V347, P354, NATUREW
    E45       1   CR=KRATSCHMER W, 1990, V347, P354, NATUURE LONDON
    E46       2   CR=KRATSCHMER W, 1990, V347, P354, SCIENCE
    E47       1   CR=KRATSCHMER W, 1990, V347, P355, NATURE
    E48       2   CR=KRATSCHMER W, 1990, V347, P357, NATURE
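To get a rough sense of how many citations in a display like the one above are scattered across variant forms of one reference, the counts can be tallied with a simple rule. The Python sketch below is only an illustration under stated assumptions: it uses a handful of rows copied from the display, takes `V347, P354, NATURE` as the intended form, and treats a row as a probable variant when exactly two of its three fields (volume, page, journal) agree with that form. The two-of-three rule is an assumption made for this sketch, not a method used by ISI or described in this article, and it can misattribute entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedRef:
    items: int    # citing records posted under this form
    volume: str
    page: str
    journal: str


# Intended form of the reference (assumption for this sketch).
CANONICAL = ("V347", "P354", "NATURE")

# A few rows copied from the display above; not the full list.
ROWS = [
    CitedRef(528, "V170", "P167", "CHEM PHYS LETT"),
    CitedRef(4, "V34", "P354", "NATURE"),
    CitedRef(21, "V347", "P254", "NATURE"),
    CitedRef(16, "V347", "P345", "NATURE"),
    CitedRef(4, "V347", "P354", "CHEM PHYS LETT"),
    CitedRef(8, "V347", "P354", "NAT"),
    CitedRef(2435, "V347", "P354", "NATURE"),
    CitedRef(2, "V347", "P354", "SCIENCE"),
]


def fields_agreeing(ref, canonical=CANONICAL):
    """Count how many of volume, page, and journal equal the canonical form."""
    return sum(a == b for a, b in zip((ref.volume, ref.page, ref.journal), canonical))


def tally(rows):
    """Return (items under the exact form, items under two-of-three variants)."""
    exact = sum(r.items for r in rows if fields_agreeing(r) == 3)
    probable = sum(r.items for r in rows if fields_agreeing(r) == 2)
    return exact, probable


exact_items, probable_items = tally(ROWS)
```

Rows that agree on fewer than two fields, such as `V170, P167, CHEM PHYS LETT`, are left out of both totals, since nothing in the rule ties them to the intended reference. Any automated grouping of this kind should be checked by eye before the totals are used.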

While the correct bibliographic reference is indeed found in item E41 (2435 citations!), notice that virtually every other item in this list is an incorrect bibliographic citation to the same paper: misspellings of Nature, incorrect volume or page number, incorrect journal names, or combinations of these errors. It becomes difficult to do a complete citation analysis when these errors are present. Some of them are due to incorrect bibliographic citations by the authors of the citing references, and some are due to keying errors by the database producer.

An indexing policy of ISI is that cited references refer only to the first author of multi-author papers. This famous paper in Nature is often referred to as the Krätschmer–Huffman paper on C60 synthesis, but expanding in the cited reference index on the name of any of the other authors fails to locate the reference. One can work around this limitation in subsequent searches by finding the citing reference or record to the paper itself and examining the author list. Huber (6) describes a way to find authors’ citation counts even if they are coauthors, using the STN implementation of SciSearch.

The subject of citation indexing itself is discussed in the chemical literature. The history of citation indexes for chemistry is reviewed by Garfield (11). He describes an extension of citation indexing called co-citation clustering, which is now widely used for automatic hierarchical classification and mapping of the scientific literature. Co-citation clustering is the technique that Garfield himself used to examine the emergence of the DNA theory of genetic coding, and he checked his results against the description Isaac Asimov gives in his book The Genetic Code (12). Wiggins shows how to perform simple co-citation searching (finding a new paper that has cited two or more key papers), a technique that increases the probability that the reference found will be relevant. Co-citation clustering is represented in CD-ROM and WWW versions by the concept “related records”.

The Web interface to Science Citation Index, Web of Science, is quite easy to use. For a full description, see http://www.isinet.com/products/citation/wos.html. One can perform quick or full searches on author, keyword, and other bibliographic fields, and cited reference searching is also quite straightforward. On the other hand, no search sets are generated, so it is impossible to refine a search without redoing the search completely. While some universities have subscribed to the Web of Science, many are finding that it is expensive and are joining consortia in order to negotiate more reasonable access rates.

Is citation indexing always better than the combination of traditional subject and keyword indexing and Boolean searching? Not necessarily. In a comparison of the two, Synge (13) gives two examples of search strategies and results in synthetic organic chemistry, both involving related organic structures normally identified by substructure searching methods. Synge concludes that one complements the other, but that both are needed for a comprehensive search.

ISI has pioneered in the development of other chemical information products incorporating citation indexing. Of particular interest is the Reaction Citation Index (14), which combines the chemical reaction database from Current Chemical Reactions (15) with cited references from SCI.
For the synthetic organic chemist who must keep abreast of the literature on new chemical reactions, this database provides source articles since 1981, chemical reactions from 1985 to the present (with organic structure representation of reactants and products), and the capability to explore links between reaction, bibliographic, and citation information.

While SciSearch does not make CAS Online obsolete, neither does CAS Online render SciSearch irrelevant. Information searchers in the chemical sciences will find citation indexing to be a powerful and useful approach in retrieving information efficiently from the immense literature of chemistry and related scientific disciplines.

Acknowledgments

I acknowledge helpful conversations with Eugene Garfield and Nikolai Kopelev of ISI.

Literature Cited

1. Garfield, E. Science 1955, 122, 108–111.
2. Garfield, E. Citation Indexing—Its Theory and Application in Science, Technology, and Humanities; Wiley-Interscience: New York, 1979; p 274.
3. Science Citation Index; Institute for Scientific Information: Philadelphia, PA.
4. Wiggins, G. Chemical Information Sources; McGraw-Hill: New York, 1991; p 352.
5. Einstein, A. Ann. Phys.-Leipzig 1906, 19, 289.
6. Huber, C. F. Database 1995, 18, 52.
7. Smith, A. L. J. Phys. B 1996, 29, 4975–4980.
8. Garfield, E. Nature 1976, 264, 609–615.
9. Garfield, E. Citation Indexing; Chapter 10.
10. Krätschmer, W.; Lamb, L. D.; Fostiropoulos, K.; Huffman, D. R. Nature 1990, 347, 354–358.
11. Garfield, E. J. Chem. Inf. Comput. Sci. 1985, 25, 170–174.
12. Asimov, I. The Genetic Code; New American Library: New York, 1963; p 187.
13. Synge, R. L. M. J. Chem. Inf. Comput. Sci. 1990, 30, 33–35.
14. Reaction Citation Index; Institute for Scientific Information; http://www.isinet.com/prodserv/chem/rctnindx.html (accessed Mar 1999).
15. Current Chemical Reactions; Institute for Scientific Information; http://www.isinet.com/prodserv/chem/chemreac.html (accessed Mar 1999).
