New Kinds of indexes - ACS Publications

NEW KINDS OF INDEXES

By CHARLES L. BERNIER

Chemical Abstracts Service, The Ohio State University, Columbus, Ohio

Chemists have become well accustomed to five kinds of indexes. They may find it important to consider the possibility of new indexes in addition to the usual subject, molecular formula, author, numerical-patent, and organic ring indexes. A number of new kinds of chemical indexes have been proposed. Some are in limited use.

GROUP INDEXES

The chemical-group index (1), of these new kinds, is one of the more promising. This kind of index enables the searcher to find references to compounds containing a common chemical group or combination of groups. By its use the selection of references to classes of compounds is possible. Generic searches for chemical structures are facilitated.

A number of these chemical-group indexes ("group indexes" for short) in punched-card or computer-tape form are now in use. Those at the Dow Chemical Company (2), The National Institutes of Health (3), and The Patent Office (4) come to mind. A group index was used by the Chemical-Biological Coordination Center in Washington (5).

The searcher of these group indexes selects from a list of chemical groups, such as "amino," "iodo," "ethyl," and "phenyl," those found in the class of compounds in which he is interested. The symbols for the chemical groups chosen are correlated to select all indexed structures containing these groups. If the searcher is interested in parts of standard groups, e.g., the carbonyl oxygen of the carboxyl group, or in groups not on the list, he chooses all listed groups containing those parts or including the unlisted groups. For example, if he wished to find references to chemical structures which contain a 3-carbon ring fragment attached by its central carbon atom to an isopropyl group, he would select "isopropyl group" from the list as well as all carbon-ring groups and correlate these in pairs. The documents or references selected would be examined for relevant material and the remainder rejected. Mechanized group indexes which correlate structure with properties also have been built.
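The correlation of chemical groups described above amounts to intersecting the sets of structures recorded under each chosen group. The following is a minimal sketch in Python under that reading; the compound records and the function names `build_group_index` and `correlate` are hypothetical illustrations, not the coding used by any of the systems cited.

```python
from collections import defaultdict

# Hypothetical records: compound name -> chemical groups assigned by an indexer.
COMPOUNDS = {
    "aniline": {"amino", "phenyl"},
    "iodobenzene": {"iodo", "phenyl"},
    "ethylamine": {"amino", "ethyl"},
    "4-iodoaniline": {"amino", "iodo", "phenyl"},
}


def build_group_index(compounds):
    """Invert compound -> groups into group -> set of compounds."""
    index = defaultdict(set)
    for name, groups in compounds.items():
        for group in groups:
            index[group].add(name)
    return index


def correlate(index, *groups):
    """Return the compounds that contain every requested group."""
    if not groups:
        return set()
    postings = [index.get(group, set()) for group in groups]
    return set.intersection(*postings)


index = build_group_index(COMPOUNDS)
print(sorted(correlate(index, "amino", "phenyl")))  # ['4-iodoaniline', 'aniline']
```

A search on a group that is not on the list returns nothing in this sketch; the procedure given in the text, choosing all listed groups that contain the unlisted part, would be carried out by the searcher before calling `correlate`.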
More elaborate devices for selecting references to chemical structures have been designed. In Mooers' Zatopleg (6), for example, symbols for all atoms and bonds of molecules are listed in such a way that they can be stored on a computer tape for selection of every combination and permutation of atoms, bonds, and groups in all structures indexed. Mooers' system is probably more flexible with regard to selection of parts of structures than any other yet devised. Experimental work on this system is being carried out by The Patent Office. The structure designations produced by this method are not unique and require a computer or the equivalent for proper use. They are not designed for visual searching.

Another effective group index for searching by computers has been designed by Norton and Opler (7). This system also includes coding of position isomerism. Thus, 1,2- can be distinguished from 1,3-dichlorobenzene.

Notations, such as the International Notation, can be searched for groups and correlations of groups by computers and even simpler machines. Notations will be discussed a little later in relation to cipher indexes. In this connection, Opler has suggested (8) that it should be possible to program a computer to translate one group index or notation into another group index or notation. If this suggestion proves to be true, as seems likely, then choice of different successful group indexes and notations becomes of less importance because computers can take the searcher rapidly from one notation to the other without laborious manual conversion.

It seems possible now to produce a type of group index in published book form. Possible advantages include: wide, economical distribution to chemists, no waiting for answers from a centralized question-answering service, use by those who do not own or have access to a computer, and no waiting for the computer to be available, programmed, and operated. The results would be available without machine operation. For simple searches involving correlation of but two listed chemical groups the book-form group index would undoubtedly be faster as well as more convenient and less expensive than a computer index.
For searches involving correlation of three or more listed groups, the computer would be more accurate and might be faster, but it would nearly always be less convenient. For searches on unlisted groups, parts of groups, and position isomerism, a computer system would perhaps usually be faster and more accurate although the economics would still probably be in favor of the book-form index at this date. The proportional frequency of these different types of search remains to be measured as does the effect of the availability of various kinds of group indexes upon the frequency.

The user of a book-form group index who wished to know, for example, about all steroids containing fluorine could readily locate references to such compounds indexed. This book-form index would enable the searcher to locate references to compounds containing, for example, plutonium, periodate ions, etc., as complete, and usually homogeneous collections. The searcher who used such group indexes would select names or symbols for one or more atoms or chemical groups found in the compound or class of compounds in which he was interested. Searching in the index by use of the names or symbols for atoms or groups he would discover references to any entered specific compound in which he was interested by its molecular formula associated with its groups. Or, if the index lacked the exact compound, he very likely would be led to discover references to analogous and closely related compounds. It is this last feature of easily selecting analogous compounds in the absence of indexed information on the precise compound desired that makes group indexes in book form especially attractive.

Searches of discovery rather than recall will often be directed towards compounds yet unprepared. This is so because the number of known compounds (about a million) is very small when compared with the number of those for which the structures can be drawn. The searcher should always expect to find analogous rather than identical structures for questions involving discovery rather than recall. Knowledge of these analogous structures will usually be useful; occasionally it will be more useful than the information originally sought. For questions of recall, in which information previously read is sought, the searcher should, in contrast, expect always to find the precise reference sought.

Group indexes of this nature would make knowledge of systematic organic nomenclature either unnecessary or of minor importance in locating information about compounds. This should be good news for those who do not have the time to learn systematic nomenclature and to keep fully informed thereon.
A group index in book form might have the names of the elements as major subdivisions. Thus, under the "nitrogen" division would be found references to all indexed compounds which contained nitrogen. As page headings under each of these major element divisions there would be all of the groups containing, for example, nitrogen. Thus, there would be group names such as "amino," "azo," "hydrazino," "hydrazo," "nitro," and "nitroso." Under these major page headings for groups there would be subheadings for all of the other groups found in the specific compounds indexed. The index entries would be found under these subheadings and would be easily accessible through all of the groups in the molecules indexed. Finally, the molecular formulas and notations (or names) of the specific structures indexed would be included in order to pin-point the specific compounds for those who required this highly specific type of search. Such indexes could be published to refer to a definite unit of literature, e.g., book, volume of a journal, issue of the journal, etc.

Present-day molecular-formula indexes enable selection of references to individual compounds and also selection of references to one or only a few homogeneous classes of compounds depending on the arrangement of the symbols within the molecular formulas. For example, the Hill system of placing carbon first, hydrogen second, and the rest of the elements alphabetically thereafter enables the searcher easily to find all compounds containing the same number of carbon atoms. The Hill system index also enables rapid location of information about all isomers of a given compound. It was not designed to help the searcher with other kinds of generic questions, i.e., with those involving selection of references to all compounds containing the same atoms or chemical groups. These same statements also apply to formula indexes which use different orders of symbols in the molecular formulas. The Richter system, which employs more classification and is more complicated than is the Hill system, gives some aid in the selection of references to certain classes of compounds. Other orders of symbols in molecular formulas have been suggested: for example, all elements alphabetically with the exception of carbon (placed next to last) and hydrogen (placed last). Another order (used by G. M. Dyson (9)) is S, N, O, C, H, and all other elements alphabetically preceding these. Each of these arrangements gives one homogeneous type of classification for the molecular formulas in the index. If a different type of classification is sought, these indexes may prove to be of little use. The group index, however, brings together hundreds of complete, homogeneous classes if the index is in book form, and makes literally all classes sought complete and homogeneous, if the index is in the form of punched cards or computer tape.
By "complete" is meant that the class contains all references to compounds with given groups. By "homogeneous" is meant that the class contains references to only those compounds with these groups; there is no extraneous material.

One of the more interesting features of the group index in bound form is that it would not be, according to present calculations, an indefinitely large volume. It probably would be, at most, only five or six times the size of the present molecular-formula indexes, since the average molecule studied at present has fewer than six different groups in it.
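The orders of symbols in molecular formulas discussed in this section can be stated as small ordering rules. The sketch below, in Python, writes one formula in the Hill order (carbon first, hydrogen second, the rest alphabetically) and in the alternative order with carbon next to last and hydrogen last. The function names and the sample data are hypothetical, and the sketch follows the rules only as the text states them; formulas without carbon are given no special treatment.

```python
from collections import defaultdict


def format_formula(counts, order):
    """Write a molecular formula with element symbols in the given order."""
    return "".join(f"{sym}{counts[sym] if counts[sym] > 1 else ''}" for sym in order)


def hill_order(counts):
    """Carbon first, hydrogen second, remaining elements alphabetically."""
    others = sorted(sym for sym in counts if sym not in ("C", "H"))
    return [sym for sym in ("C", "H") if sym in counts] + others


def carbon_hydrogen_last_order(counts):
    """Other elements alphabetically, then carbon, then hydrogen last."""
    others = sorted(sym for sym in counts if sym not in ("C", "H"))
    return others + [sym for sym in ("C", "H") if sym in counts]


# Hypothetical entries: two position isomers share one molecular formula.
ENTRIES = [
    ("1,2-dichlorobenzene", {"C": 6, "H": 4, "Cl": 2}),
    ("1,3-dichlorobenzene", {"C": 6, "H": 4, "Cl": 2}),
]

formula_index = defaultdict(list)
for name, counts in ENTRIES:
    formula_index[format_formula(counts, hill_order(counts))].append(name)

print(dict(formula_index))
# {'C6H4Cl2': ['1,2-dichlorobenzene', '1,3-dichlorobenzene']}
print(format_formula(ENTRIES[0][1], carbon_hydrogen_last_order(ENTRIES[0][1])))
# Cl2C6H4
```

Either ordering brings isomers together under one formula; neither brings together all compounds sharing a chemical group, which is the gap the group index is meant to fill.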

NOTATION INDEXES

Another type of index possible is the notation or cipher index (10). Chemists have, at times, experienced difficulty in locating organic compounds by names in subject indexes, particularly the more complex compounds. When the name of the compound is not well known, many chemists approach organic nomenclature and location of information about compounds through the molecular-formula index. They calculate the molecular formula and then search among the names of the isomers in the formula index in order to select one which they know to be (or which seems to be) the compound in which they are interested. From the formula index they then turn to the subject index. There they may find entries with modifying phrases which increase selectivity of references. They may also find references to useful analogous compounds. This rather roundabout procedure is made necessary by the time required to learn systematic nomenclature. Searching would be more direct if they could turn to a cipher index to locate the compound of interest.

Deriving ciphers from organic compounds has been proved much easier to learn than naming them. An index based upon ciphering should, correspondingly, be much easier to construct and to use than one based upon names. The International Notation derived from the Dyson notation can be learned for effective use, according to some tests run at Chemical Abstracts, in less than one twelfth the time needed for learning systematic organic nomenclature. This advantage of better than twelve to one is somewhat equivalent to buying a $6,000 automobile for less than $500. The cost of producing such a notation index in conjunction with a subject index covering the same material should be somewhat less than the cost of producing a molecular formula index in conjunction with a subject index including systematic nomenclature because of the saving of space by the shorter ciphers and because the cost of naming compounds is greater than the cost of ciphering.
I am confident that the differential of at least twelve to one in learning time in favor of notations for organic compounds over nomenclature will induce chemists to use notations in their notebooks, perhaps in publications, and to accept them in indexes. Ciphers, as well as names of compounds, can be spoken, for example, into a dictating machine. Just as going from the name to the structure is easier than the reverse process, so it is much easier to go from the cipher to the structure. This is true because knowledge of numbering and precedence is unnecessary in drawing the structures. Thus, a cipher index could be used by those who knew less about ciphering than would be required for producing a cipher index.

As I see it, the most important function of the cipher or notation is to provide a way of listing organic structures that is more rapidly learned than is the use of a list based upon systematic nomenclature.

CORRELATIVE TROPE INDEXES

Another type of index which has come into existence in the last decade is the correlative trope index. Extensive development of this type of index has been carried out largely by Mr. Calvin Mooers (11, 12). Correlative trope indexes are so named because unrelated terms are correlated in the selection of documents. The terms of the vocabulary used in indexing documents are necessarily of a broad, generic nature in order to reduce their number. The terms used for trope indexing are usually generically related to the specific terms that would have been used in standard subject indexing, which is normally to the maximum specificity. These generic terms are closely similar to rhetorical tropes of the usual specific terms, if not actually identical with such tropes. For example, "olefin" might be the trope term used instead of "1,3-butadiene," "thermodynamic properties" might be used in place of "entropy," and "aircraft" in place of "helicopters." The purpose of reducing the size of the indexing vocabulary is to help the searcher find all terms for comprehensive and proper selection.

In the use of correlative-trope indexes, two or more words selected from the small generic vocabulary are correlated to select documents or references to documents. Correlative indexing is, of course, not limited to the use of trope or generic terms in the vocabulary. I have been told that punched cards were used by the Mayo Clinic early in this century for correlating the data of medical case histories (13). The punched-card indexes that have come into prominence in the last decade are largely correlative. A few, notably those of Mooers (11), are correlative trope indexes.
Difficulties that some index searchers have experienced in answering generic questions probably have been the source of most of the present-day interest in mechanized documentation. Correlation of two or more terms independent in meaning and alphabetization was seen to enable increased selectivity of documents. At one time, it was believed that correlation of any terms, specific or not, would give true generic indexing. Actually, it is now seen that the ability of a correlative index to handle generic questions is a function of the vocabulary and not of the correlation. For example, correlation of the words "nickel" and "electroplating" does not give all information about "coating with metals." It requires correlation of the more general words "coating" and "metals" to produce this result.

The selectivity of correlative indexes is increased by correlation of more terms simultaneously. In fact, correlation of too many terms, or correlation of the wrong terms, increases the selectivity so greatly that blank sorts and loss of relevant information may become serious problems.

Because of the enormous number of permutations and combinations of terms possible from even a modest vocabulary, some of those documentalists interested in mechanization have argued (probably incorrectly) that correlative indexes in book form could not be produced. It has been thought that the book would be too large. Some studies I have made seem to indicate that, by simple techniques, it is now possible to produce correlative indexes in book form. Independent studies by others have indicated similar possibilities. Research is going ahead in this area.

Correlative indexes in book form have the tremendous advantage of automatically providing the searcher quickly with closely related and analogous information if the precise information sought is unavailable. The index suggests related information during the search just as a mail-order catalog suggests new products by pictures and descriptions found near to the description of the product sought.

Correlative trope indexes, in either book or mechanized form, facilitate generic searches because of the generic nature of the terms and because of the selectivity provided by their correlation. An example of a generic question is, "What reactions of Group-Three elements with chalcogenides have been studied?"
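The point that generic searching depends on the vocabulary rather than on the correlation can be illustrated with a short Python sketch. The trope table extends the examples given in the text; the entries for "zinc" and "galvanizing," the sample documents, and the function names are hypothetical additions made only for illustration.

```python
# Hypothetical trope vocabulary: specific term -> generic (trope) term.
TROPES = {
    "1,3-butadiene": "olefin",
    "entropy": "thermodynamic properties",
    "helicopters": "aircraft",
    "nickel": "metals",
    "zinc": "metals",
    "electroplating": "coating",
    "galvanizing": "coating",
}

# Hypothetical documents with the specific terms an indexer might assign.
DOCUMENTS = {
    "doc1": ["nickel", "electroplating"],
    "doc2": ["zinc", "galvanizing"],
    "doc3": ["nickel", "entropy"],
}


def index_terms(terms, tropes=None):
    """Return the indexing terms, replaced by their tropes when a table is given."""
    if tropes is None:
        return set(terms)
    return {tropes.get(term, term) for term in terms}


def select(documents, query, tropes=None):
    """Correlate the query terms: keep documents indexed under all of them."""
    wanted = index_terms(query, tropes)
    return sorted(
        doc for doc, terms in documents.items()
        if wanted <= index_terms(terms, tropes)
    )


# Specific vocabulary: correlation finds only the nickel-plating document.
print(select(DOCUMENTS, ["nickel", "electroplating"]))      # ['doc1']
# Generic vocabulary: correlation of "coating" and "metals" finds both.
print(select(DOCUMENTS, ["coating", "metals"], TROPES))     # ['doc1', 'doc2']
```

The correlation step is identical in both searches; only the vocabulary differs, which is the distinction drawn in the text.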

FORMULA-STRUCTURE INDEX

The formula-structure index which we are now studying at Chemical Abstracts is like the standard formula index except that, substituted for the systematic name, is a structure code indicating some important structural features of the compounds indexed.

One simple structure code, the Dyson index (15) (not to be confused with the Dyson cipher), indicates the number of rings of different sizes, the number of methyl and ethyl groups, the number of double and triple bonds, the presence of functional groups, etc. Another structure code, developed by Luhn (16), indicates the number of "nodes" or atoms attached to 2, 3, and 4 other atoms.

The purpose of these codes is to help distinguish among isomers, which a molecular formula, by itself, is unable to do. While some of these codes are not so precise as are systematic names, they usually are adequate for differentiating among the various known isomers, and are very much simpler to learn to use than is systematic nomenclature. In fact, the learning time for the Dyson-index code (not the Dyson cipher) has been shown to be substantially zero; the searcher simply starts off by using it.

An example of a formula-structure-index entry for reference to the compound ethyl-2-nitrobenzene follows: C8H9NO2 000 11 000 10/ 6.8. This code following the molecular formula means that there is one six-membered ring, one aromatic ring, one ethyl group, a nitro group, and more than one ring substituent. The symbols for the chemical elements of molecular formulas in this formula-structure index need not be arranged in the usual order. It may be found more helpful, for example, to have an order in which all elements are placed alphabetically with carbon next to last and hydrogen last (17). This particular order has advantages, the best of which is that the number of hydrogen atoms in the structure has the least control over the position of the entries in the index. Thus, miscounting or errors in determination of the number of hydrogen atoms then has least effect.

The purpose of studying formula-structure indexes at this time is to enable the construction of interim indexes giving access to references for compounds before the annual formula indexes are available. Personnel to supply systematic names for such interim indexes will be unavailable for some time yet. The training time required for selecting and calculating the molecular formulas and for deriving the structure codes is much less than the time needed to learn systematic organic nomenclature.

A formula-notation index also is possible. It can be used exactly as a formula-structure index simply by ignoring the position isomerism and order of operations.

CITATION INDEX

Another type of index, new to chemists, has been suggested by Eugene Garfield. This is the citation index (18). It enables the searcher to go from an earlier paper to all later papers which have cited it. In effect, it enables one to discover the descendants of an idea. At present, it is possible to start with a later paper and by means of the references cited to discover earlier related papers. The citation index would enable the searcher to reverse this process. In the legal field, citation indexes have proved to be vitally necessary in locating later legal decisions related to an earlier one.

For the field of chemistry the ability to go from earlier papers to later ones citing them may turn out to be much more useful than it now seems. I believe that it would be of more use than simply a device for flattering the earlier author. The citation index would be derived from references cited in chemical papers and would lead the searcher from the earlier papers to all of the later papers related to them by citation and also, usually, by related subject interest.
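Deriving a citation index from the references cited in papers is an inversion of the reference lists. The Python sketch below shows that inversion under simple assumptions; the paper identifiers are hypothetical, and the `descendants` function, which follows citations through more than one generation, is an illustrative extension rather than part of the proposal cited.

```python
from collections import defaultdict

# Hypothetical data: each paper -> the earlier papers it cites.
REFERENCES = {
    "paper_1952": ["paper_1950"],
    "paper_1957_a": ["paper_1950", "paper_1952"],
    "paper_1958_b": ["paper_1952", "paper_1957_a"],
}


def build_citation_index(references):
    """Invert paper -> cited papers into cited paper -> citing papers."""
    cited_by = defaultdict(list)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in cited_by.items()}


def descendants(citation_index, paper):
    """Collect every later paper reachable by following citations forward."""
    seen, stack = set(), [paper]
    while stack:
        for citing in citation_index.get(stack.pop(), []):
            if citing not in seen:
                seen.add(citing)
                stack.append(citing)
    return sorted(seen)


citation_index = build_citation_index(REFERENCES)
print(citation_index["paper_1950"])   # ['paper_1952', 'paper_1957_a']
print(descendants(citation_index, "paper_1950"))
# ['paper_1952', 'paper_1957_a', 'paper_1958_b']
```

The first lookup is the citation index proper; the second shows how the same table could be walked repeatedly to trace an idea through successive papers.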

CONCORDANCE

The concordance, or word index, while not exactly new to chemists, and certainly not new to Biblical scholars, would be of help if the searcher used the exact words of the author to lead him back to the original document. The concordance might be called a word index since exact words, and their contexts, rather than subjects are indexed. A concordance for an annual volume of Chemical Abstracts would be much larger than the Biblical Concordance and probably would be considerably less useful because word indexing of this nature leads to scattering, omissions, a flood of trivial entries, and bulk.
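A concordance of the kind just described can be sketched in a few lines of Python. The sample abstracts, the function name `build_concordance`, and the choice of two words of context on each side are hypothetical; the sketch indexes every word, which also shows where the bulk and the trivial entries come from.

```python
import re
from collections import defaultdict


def build_concordance(documents, width=2):
    """Map each word to (document id, context) pairs.

    The context is the word with up to `width` words on either side.
    """
    concordance = defaultdict(list)
    for doc_id, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for i, word in enumerate(words):
            context = " ".join(words[max(0, i - width): i + width + 1])
            concordance[word].append((doc_id, context))
    return concordance


# Hypothetical abstracts.
DOCS = {
    "abs1": "Nickel coatings resist corrosion",
    "abs2": "Corrosion of nickel in acid",
}

concordance = build_concordance(DOCS)
print(concordance["nickel"])
# [('abs1', 'nickel coatings resist'), ('abs2', 'corrosion of nickel in acid')]
print(len(concordance))  # 7 distinct words, including "of" and "in"
```

Even these two short abstracts yield seven headings, two of them uninformative words, and a searcher who thought of "plating" rather than "coatings" would find nothing.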

REACTION INDEX

Reaction indexes have been compiled. Theilheimer (19) is an example. It should be possible to produce reaction indexes for the periodical or abstract literature also. The kind of reaction is so indexed that if the specific reagents and products sought cannot be located, then analogous reagents and products which undergo the same reaction can be discovered. Analogous reactants and products sometimes may be very difficult to locate in standard subject indexes.

TAXONOMIC OR BIOLOGICAL-NAME INDEX

Chemists study organisms. They are interested in biological reactions, control of pests, chemotherapy, pharmacology, etc. In all of these studies, named organisms usually are of interest. A taxonomic or biological index would give references to all documents (e.g., abstracts) in which the same organism was studied. Biological Abstracts has such an index. Perhaps chemists would find one useful also.

Such an index could be arranged taxonomically to bring related organisms together in order to make it easy for the searcher to discover studies about related organisms in the event that the specific organism originally of interest had not been studied.

MACHINE INDEXING AND SEARCHING

Another type of searching system has been suggested by H. P. Luhn (20). In this system, important sentences are extracted from documents by a computer. The searcher does not use an index, but instead, writes an essay about the question. This essay is then fed into a computer which makes a comparison with the extracts of documents already produced and selects those related on the basis of probable likeness to the essay question. If this system proves to be successful, it will constitute a new type of document selector which is neither an index nor a classification.

The extraction of important sentences from the original documents is carried out by the computer which counts and records the number of times each word occurs in the document, eliminates the simple and technically uninformative words, and detects and selects sentences containing most of the more important words located within a certain distance apart.

CLASSIFICATIONS

Classifications and indexes overlap. Subject indexes can be viewed to be alphabetical classifications although they are definitely not hierarchical classifications. Classification is used occasionally in subject indexes as a tool.

Considerable important work now is being done on classifications. Ranganathan's colon classification (21) and the Universal Decimal Classification (22) are two examples. The works of Vickery (23) and others have yielded notations and arrangements of words that are, in effect, standard sentences identifying documents classified. Standard languages have been created. If these standardized sentences, associated with document references, are arranged into an order, say alphabetical, and perhaps permuted, they constitute an index to the material.

The technical thesaurus (24), which is not an index, is, in effect, a comprehensive classification of terms which may make indexing and the use of indexes easier and more precise. The use of a technical thesaurus, were one built, would lead the searcher from those terms which he knew to all of those which he needed to know in making a complete search in an index or classification. The current state of knowledge in a field could be gained in outline form by the use of such a thesaurus.
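The way a technical thesaurus might lead the searcher from the terms he knew to the terms he needed can be pictured as a walk over recorded relations among terms. The Python sketch below is one such picture, not a description of any thesaurus actually built; the term relations and the function name `expand` are hypothetical.

```python
from collections import deque

# Hypothetical thesaurus: term -> more specific related terms.
THESAURUS = {
    "coating": ["electroplating", "galvanizing", "painting"],
    "electroplating": ["nickel plating"],
    "metals": ["nickel", "zinc"],
}


def expand(thesaurus, term):
    """List every term reachable from `term`, nearest relations first."""
    found = []
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for related in thesaurus.get(current, []):
            if related not in seen:
                seen.add(related)
                found.append(related)
                queue.append(related)
    return found


print(expand(THESAURUS, "coating"))
# ['electroplating', 'galvanizing', 'painting', 'nickel plating']
```

The list returned for a broad term also serves as a rough outline of the field under that term, in the sense suggested in the text.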

SUMMARY

The future importance of these new indexes is difficult to evaluate. I like to think that nothing in the way of effective new chemical documentation is too good for chemists.

Certainly chemists need generic searching aids. Group indexes and correlative trope indexes should serve as such aids. Whether these new indexes will appear in the form of books, cards (unpunched or punched), or computer tape is largely, if not entirely, a matter of efficiency and economics, which requires more study. Economics, efficiency, and speed of service needed will largely control whether searching means are to be placed on the desk of the chemist or maintained in a central information-searching agency.

Many chemists have experienced difficulty in learning and using a systematic organic nomenclature. There is probably no one "correct" name for any organic compound. Systematic organic nomenclature is growing. Time is required to keep up with this growth. Group indexes, notation indexes, and coded formula indexes would help those who have difficulty with systematic nomenclature. So would the use of a notation (10) which is much easier to learn.

The system of machine abstracting under development and studied by Luhn (20) would, if successful, probably make documentation centers practical for answering certain types of questions difficult to handle with conventional indexes. This system would make subject indexing in such a documentation center unnecessary.

Citation indexes (18) sometimes would lead searchers into strange and unpredictable relations. It is very difficult to picture just how useful such indexes would be in the field of chemistry. They probably would be expensive but not especially difficult to produce.

The concordance might, as I have mentioned above, be the least useful to chemists of all the indexes that I have discussed.

Reaction and taxonomic indexes should prove useful, in my opinion, but probably could be incorporated effectively into current subject indexes rather than existing as separate entities.

CONCLUSION

In conclusion, I find the subject of indexes fascinating. I am convinced that chemists need new kinds of indexes. There is promise and encouragement in the number of capable scientists who have been attracted to study index design. Certainly the future for new document selectors, indexes or not, looks very bright.

REFERENCES

1. Bernier, Charles L., “Correlative Indexes I. Correlative Chemical Group Indexes,” American Documentation, 8, 306-13 (1957).
2. Nutting, Howard S., private communication.
3. Gamble, Dean F., “A Coordinate Index of Organic Compounds,” paper presented at ACS Meeting, March 31, 1955.
4. Frome, Julius, and Leibowitz, Jacob, “A Punched Card System for Searching Steroid Compounds,” Patent Office Research and Development Report No. 7.
5. “A Method of Coding Chemicals for Correlation and Classification,” Chemical-Biological Coordination Center, National Research Council, Washington, 1950.
6. Mooers, Calvin N., “Ciphering Structural Formulas-The Zatopleg System,” Zator Technical Bulletin No. 59, Zator Company, Boston, Mass., 1951.
7. Norton, T. R., and Opler, A., “A Manual for Coding Organic Compounds for Use With a Mechanized Searching System,” The Dow Chemical Company, Midland, Michigan, 1956.
8. Opler, A., suggested in a conversation.
9. Dyson, G. Malcolm, “Studies in Chemical Documentation,” Chemistry and Industry, 676 (1952).
10. Dyson, G. Malcolm, “Some Applications of the Dysonian Notation of Organic Compounds,” paper presented at ACS Meeting, April, 1947.
11. Mooers, Calvin N., “A New Semantic Principle in Information Retrieval Systems,” a paper presented before the American Documentation Institute, November 1954.
12. Bernier, Charles L., “Correlative Indexes II. Correlative Trope Indexes,” American Documentation, 8, 47-50 (1957).
13. Garfield, Eugene, private communication.
14. O’Connor, John J., private communication.
15. Dyson, G. Malcolm, “Studies in Chemical Documentation,” Chemistry and Industry, 676-84 (1952).
16. Luhn, H. P., “A Serial Notation for Describing the Topology of Multidimensional Branched Structures (Nodal Index for Branched Structures),” Research Laboratory, IBM, Poughkeepsie, New York.
17. Skolnik, Herman, and Hopkins, Jane K., “A Simplified Stoichiometric Formula Index,” paper presented at ACS Meeting in Miami, 1957.
18. Garfield, Eugene, “Citation Indexes for Science,” Science, 122, 108-11 (1955).
19. Theilheimer, W., “Synthetic Methods of Organic Chemistry,” Series 2, Vol. X, Interscience Publishers, Inc., New York, N. Y., 1956.
20. Luhn, H. P., “The Automatic Creation of Literature Abstracts,” IBM Journal of Research and Development, 159-65 (1958).
21. Ranganathan, S. R., “Colon Classification and its Approach to Documentation,” in Bibliographic Organization, edited by J. H. Shera and M. E. Egan, (1951); “Library Classification as a Discipline,” Classification Research Group Bulletin, No. 2, (1957).
22. Bradford, S. C., “The Universal Decimal Classification, 62-86,” in Documentation, 2nd Ed., Crosby Lockwood and Son Ltd., London, 1953.
23. Vickery, B. C., “Notational Symbols in Classification,” The Journal of Documentation, 4 14-32 (1952).
24. Bernier, Charles L., and Heumann, Karl F., “Correlative Indexes. III. Semantic Relations Among Semantemes-The Technical Thesaurus,” American Documentation, 8, 211-20 (1957).
