LITERATURE
Top left of card contains Cyanamid's molecular formula, structure of which appears below. Pertinent research data follows.
Quick Access to Research Records
JOHN H. FLETCHER and DOLORES S. DUBBS
Stamford Research Laboratories, American Cyanamid Co., Stamford, Conn.
Cyanamid's molecular formula index arranges its 35,000 organic compounds so that any one compound or related ones can be located in minutes

5888 C&EN NOV. 26, 1956

In recent years there has developed among industrial chemical research laboratories an increasing realization of the importance of ready information retrieval from the records of their research activities. These records represent millions of dollars expended. If properly preserved they also represent potential savings in the course of future research. Such savings cannot be realized, however, unless there is efficient access to desired information.

A very large proportion of the information sought from research records by chemists has to do with chemical compounds: their properties and their synthesis. Several years ago, as a result of a decision by American Cyanamid to establish some kind of index comprising all compounds handled in its research laboratories, we undertook a study of existing methods for classifying and indexing organic compounds.

As is undoubtedly the case with other organizations conducting chemical research, two types of questions are very frequently asked by those seeking information from research records: (1) a question concerning a single chemical compound, and (2) a question concerning a group of functionally or structurally related chemical compounds. In our experience, the former had been the more common type of question, and we, therefore, decided to give it primary consideration.
Indexing By Formula

A molecular formula index in the form of a card file appeared to be the most satisfactory means of arranging several thousand organic compounds so that any one compound could be located quickly. Indexing by formula eliminates, to a degree, the complexities and inconsistencies of chemical nomenclature which have always confronted the compilers of subject indexes.

However, in a formula index containing a large number of compounds a single molecular formula will very often correspond to several different compounds. Therefore, some other means of distinguishing one compound from another is necessary. If the index is on file cards, this distinction can be made by showing the structural formula of each compound. Of course, if the index must be issued in book form like the Chemical Abstracts Formula Index, it is obviously not practical to include a structural representation for each compound.

Having decided to establish a molecular formula index, we turned our attention to the problem of how to answer the second type of question, namely the search for a group of functionally related compounds. At that time (1951) the only successful practical approach to this problem that we knew was the use of punched cards. If one encodes the structural and functional features of organic compounds into a punched card index, one can then search the index for all compounds having any specified combination of these features. For example, a search might be made for chlorinated dicarboxylic acids, or one might wish to locate all halogenated nitro hydroxy compounds. Assuming proper encoding, a punched card index can readily answer this type of question.

We wished, if possible, to avoid a multiplicity of indexes. Hence, before deciding to proceed with punched-card indexing, we gave careful attention to the possibility of modifying the existing types of formula index to improve their properties of classification.
The Hill System, used with slight modification by Chemical Abstracts, and the Richter System, used by Beilstein, British Abstracts, and Chemisches Zentralblatt, were each considered. Although these formula indexes have proved their utility over the years, they offer very little in the way of classification on the basis of structure or functionality.

For organic compounds both the Hill and Richter Systems specify that the symbol for carbon must always be written first. Therefore, the primary classification afforded is on the basis of the number of carbon atoms in each compound. This sort of classification is of very limited value, because one is not often interested in a group of organic compounds whose only feature in common is that the number of carbon atoms in each compound is the same.

Furthermore, both the Hill and Richter systems place hydrogen second, directly following carbon. The number of hydrogen atoms in an organic compound is, for practical purposes, without significance, yet both the Hill and Richter systems lead to classification first on the basis of number of carbon atoms, and second on the basis of number of hydrogen atoms. When one further takes into account that hydrogen atoms are often very numerous and tedious to count and hence that errors in counting hydrogen are easy to make, one concludes that hydrogen should probably be written last in each molecular formula, so that it would have very little effect on the arrangement of compounds in the index.
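The effect of writing carbon and hydrogen first can be illustrated with a short Python sketch. It is a present-day illustration and not part of the study described here. The helper names `parse_formula` and `hill_formula` are hypothetical, and the rule that the remaining symbols follow alphabetically is the usual statement of the Hill convention, assumed here rather than taken from the text. The four sample compounds each contain four carbon atoms, so carbon-first formulas place them side by side although they have little else in common.

```python
import re

def parse_formula(formula):
    """Return {symbol: count} for a simple formula such as 'C4H3ClS'."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts

def hill_formula(counts):
    """Write a formula with C first, H second, then the rest alphabetically."""
    if "C" in counts:
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)  # assumed: no carbon means plain alphabetical order
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)

# Four unrelated compounds that share only their carbon count.
compounds = {
    "pyrazine": "C4H4N2",
    "thiophene": "C4H4S",
    "methyl ethyl ketone": "C4H8O",
    "chlorothiophene": "C4H3ClS",
}
for name, formula in compounds.items():
    print(hill_formula(parse_formula(formula)), name)
```

Every formula printed begins with C4, so a file arranged this way would group a diazine, a sulfur heterocycle, and a ketone together purely because of their carbon count.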
John H. Fletcher is the leader of the coding and indexing group at Stamford Research Laboratories of American Cyanamid. The group has been active at the Stamford laboratories since 1950 and is still going strong. Fletcher says "It's an internal Chemical Abstracts for Cyanamid." Recently, Fletcher also became manager of the organic chemicals section of the basic research department. One of his main chemical interests is in general synthetic chemistry with specialization in organic phosphorus compounds, particularly Cyanamid's insecticides, parathion and malathion.

Dolores Schwartz Dubbs, a native of Urbana, Ill., attended the University of Illinois and was graduated in 1947 with a B.S. degree in chemistry. She immediately took a position with Eli Lilly & Co. Research Laboratories as chemical compilator. In 1950 she became a scientific editor in the coding & indexing group at American Cyanamid's Stamford, Conn., Research Laboratories. Mrs. Dubbs left the career world temporarily in 1954. She has recently joined the staff of Walter Kidde Nuclear Laboratories as a junior scientist.

New Order of Chemical Symbols

This line of reasoning led us to experiment with other methods of writing the molecular formula whereby the symbols for carbon and hydrogen would be placed near the end of the molecular formula. After some study we chose the following order of symbols for the elements most frequently encountered in organic compounds:

P, N, S, O, I, Br, Cl, F, C, H

It is significant that about a year after we had selected and begun to use this revised order of elements a publication by G. M. Dyson appeared (Chemistry & Industry, July 12, 1952, pages 676-684) in which a somewhat similar order of elements is proposed. In this article Dyson describes his "molform index," which is essentially a digital representation of the molecular formula based on the following order of elements:

P, I, F, Cl, Br, S, N, O, C, H

We were very pleased to have this independent confirmation of our own reasoning, namely that no matter what the specific arrangement of the other elements might be, the symbols for carbon and hydrogen should certainly be placed last in the molecular formula.

In contrast to the rather easy-to-reach decision concerning C and H, selection of the most satisfactory sequence for the other elements was not a simple problem. Should N precede O, or should O precede N? Where should the halogens be placed and in what order? The most obvious basis for answering questions like these is the relative interest of users of the index in compounds containing one or another of the elements. For example, if organic compounds of silicon were considered to be of primary interest, then Si ought to precede all the other symbols in each molecular formula. Filing of the formulas alphabetically and numerically would then result in the arrangement of silicon compounds as a group in the index. The halogens might be next in order of interest, and if this were true they would be placed immediately following silicon in each molecular formula. One can see that in this manner it would be possible to develop varying types of special-purpose formula indexes.

In our case it was considered desirable, because of the varied interests of Cyanamid chemists, to develop an all-purpose indexing arrangement of chemical compounds. Since it seemed wise to base the sequence of elements in the molecular formula on some fundamental relationship, we chose the Periodic Table of Elements as a frame of reference (Table I).

Table I. Sequence of Elements Based on the Periodic Table

GROUP IV    GROUP V    GROUP VI    GROUP VII

Note: F is followed by C, then H. For example, the molecular formula for 2-chloro-4-nitrobenzenethiol is written NSO2ClC6H4 and that of tetraethyllead, PbC8H20.

Most of the elements which are frequently encountered in organic compounds are listed. However, the principle of proceeding upward in each group and from left to right can be used for the entire Periodic Table.

Having selected a sequence of elements to be used in writing the molecular formula of each organic compound, we considered the possibility of also using the sequence of elements as the basis for the filing arrangement of a large number of compounds in an index. After careful comparison of the two alternative filing arrangements which might be used in our new formula index, i.e. alphabetical-numerical vs. the special order of elements based on the Periodic Table, we concluded that the very small advantage in classification afforded by the latter method was not worthwhile. Alphabetical filing has the outstanding advantages of simplicity and consistency with the procedure already in use by Chemical Abstracts. Hence, we decided to file molecular formulas in our new formula index on an alphabetical-numerical basis (Table II).

When, as often happens, a single molecular formula corresponds to several different compounds, these compounds are arranged in the index alphabetically according to name. This procedure is followed in the formula indexes of Chemical Abstracts, British Abstracts, Chemisches Zentralblatt, and Beilstein's Handbuch.
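The two steps described in this section, writing each formula in the new order of symbols and then filing the formulas on an alphabetical-numerical basis, can be sketched in Python. This is a hedged illustration only. The names `CYANAMID_ORDER`, `parse_pairs`, `cyanamid_formula`, and `filing_key` are hypothetical, the symbol list covers just the ten elements quoted in the text, and the filing key (compare symbol by alphabet, then atom count by size, one symbol at a time) is an assumption inferred from the order of the entries in Table II, not a rule stated in the article.

```python
import re

# Order of symbols quoted in the text for the most common elements.
CYANAMID_ORDER = ["P", "N", "S", "O", "I", "Br", "Cl", "F", "C", "H"]

def parse_pairs(formula):
    """Split 'N2ClC4H3' into [('N', 2), ('Cl', 1), ('C', 4), ('H', 3)]."""
    return [(sym, int(num) if num else 1)
            for sym, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula)]

def cyanamid_formula(formula):
    """Rewrite a formula so that its symbols follow CYANAMID_ORDER."""
    counts = {}
    for sym, num in parse_pairs(formula):
        counts[sym] = counts.get(sym, 0) + num
    unknown = sorted(set(counts) - set(CYANAMID_ORDER))
    if unknown:
        # Other elements would be placed according to Table I.
        raise ValueError(f"no position assumed for: {unknown}")
    return "".join(sym + (str(counts[sym]) if counts[sym] > 1 else "")
                   for sym in CYANAMID_ORDER if sym in counts)

def filing_key(formula):
    """Assumed alphabetical-numerical key: (symbol, count) pairs, left to right."""
    return parse_pairs(formula)

# 2-chloro-4-nitrobenzenethiol, given conventionally with carbon first.
print(cyanamid_formula("C6H4ClNO2S"))

# A shuffled handful of formulas already written in the new order.
cards = ["OC4H8", "N2C6H14", "NClC4H6", "N2C6H8", "SC4H4", "NC7H5", "N2C4H4"]
print(sorted(cards, key=filing_key))
```

Comparing parsed pairs rather than raw strings matters: a plain character-by-character sort would place N2C6H14 ahead of N2C6H8 and every N2 formula ahead of NC7H5, which is not the order shown in Table II.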
Table II. Arrangement of Molecular Formulas in the Cyanamid Formula Index

AsCl3C2H6      cacodyl trichloride
AsO2C2H7       cacodylic acid
As2C4H12       cacodyl
As2C8H20       ethyl cacodyl
As2OC4H12      cacodyl oxide
As2SC4H12      cacodyl sulfide
BiC3H9         trimethylbismuthine
BiC6H15        triethylbismuthine
BiC18H15       triphenylbismuthine
NC3H3          acrylonitrile
NC4H5          methylacrylonitrile
NC7H5          benzonitrile
NClC4H6        chlorobutyronitrile
N2C4H4         pyrazine
N2C6H8         dimethylpyrazine
N2C6H14        dimethylpiperazine
N2ClC4H3       chloropyrazine
N2ClC6H7       chlorodimethylpyrazine
OC4H8          methyl ethyl ketone
OC6H12         methyl isobutyl ketone
OClC3H5        chloroacetone
OClC8H7        chloroacetophenone
PNSO2C4H12     diethyl phosphoramidothioate
PNSO2C6H12     diallyl phosphoramidothioate
SC4H4          thiophene
SC6H6          vinylthiophene
SClC4H3        chlorothiophene
SOC6H6         acetylthiophene

Rules for Salts

The fundamental principles described, while adequate in themselves for the indexing of most organic (carbon-containing) compounds, require certain supporting rules when applied to salts and addition compounds (sometimes referred to as "molecular compounds"). It is, of course, desirable to collect together all the salts and/or addition compounds of any given compound. This may be accomplished by application of the following rules:

• The molecular formula of a salt of an organic acid with an inorganic base is written with the cationic group last, separated from the formula of the parent acid by a raised period.

O2C2H4             acetic acid
O2C2H4 · (K)       potassium acetate
O2C2H4 · (NH4)     ammonium acetate
O2C2H4 · (Na)      sodium acetate
O2C2H4 · (½Pb)     lead acetate

• The molecular formula of a salt of an organic base with an inorganic acid is written with the acid last, separated from the organic formula by a raised period.

NC6H15 · HBr       triethylamine hydrobromide
NC6H15 · HCl       triethylamine hydrochloride
NC6H15 · ½H2SO4    triethylamine sulfate
NC4H12 · Br        tetramethylammonium bromide

(Note that the formula of the inorganic acid is written conventionally.)

• The molecular formula of a salt of an organic base with an organic acid is written with the formulas of the base and acid separated by a raised period. Entries are made in the index at each of the two formulas.

N3CH5 · ½O3CH2     guanidine carbonate
O3CH2 · 2N3CH5     guanidine carbonate

• The molecular formula of an addition compound is written with the formulas of its components separated by a raised period. Entries are made in the index at each of the components.

C10H8 · N3O7C6H3   naphthalene picrate
N3O7C6H3 · C10H8   naphthalene picrate

It has been our experience that application of these rules leads to a very satisfactory arrangement of salts and addition compounds in a formula index.

How to Use the Index

Generally speaking, compounds of related functionality are brought together in the Cyanamid formula index. This means that in order to locate all compounds having certain functional characteristics in common, one needs to inspect only certain portions of the index. Depending upon the degree of specificity of the search, the number of structural formulas which must be inspected can vary widely. This can best be demonstrated by examples of searches actually performed with the new type formula index. Each of the three searches described below was carried out when our formula index contained about 24,000 different chemical compounds. The index now contains more than 35,000 compounds.

Suppose one desired information on chlorinated dicarboxylic acids not containing other functional groups or heterocyclic systems, i.e. any hydrocarbon skeleton carrying two COOH groups and one or more Cl atoms. To make this search we had to examine less than 80 structural formulas out of the total collection of 24,000. All the compounds we were looking for were in one file drawer, and the entire search required about 15 minutes. Most of this time was consumed in hand-copying structures and references for the 34 compounds located. These 34 compounds included salts and esters as well as the parent acids.

As an example of a somewhat broader type of search, one might wish to locate references concerning halogenated nitro hydroxy compounds, no restriction being made as to the number of each kind of function present, i.e. any hydrocarbon skeleton carrying one or more Br, Cl, F, or I atoms, one or more NO2 groups, and one or more OH groups. In this instance we had to inspect several sections of the formula index; nevertheless, we found that it required only about 25 minutes to locate 19 compounds and hand-copy their structures and references.

In a few instances, an even broader kind of search might be desired. For example, one might wish to assemble information on all compounds containing the benzimidazole ring system regardless of other structural or functional characteristics. This search involved inspection of a much larger portion of the index than had been required in either of the two previous examples. However, in approximately three hours we located 48 eligible compounds among the total of 24,000.

A word about the physical form of Cyanamid's new molecular formula index is in order. After due consideration, a Remington Rand Kardex installation was selected as offering outstanding manipulative advantages over conventional file drawer equipment. These advantages have been well established by our experience to date.

Barbara Allstrom, a member of the coding and indexing group, uses the index to locate information about a group of compounds

The structural formulas in the Cyanamid molecular formula index are typed using a special IBM electric typewriter whose keyboard was designed by a former member of the Stamford indexing group (C&EN, June 23, 1952, page 2622). This machine has been used to type the structural formulas of over 35,000 different compounds and we are entirely satisfied with it.
Why Not Punched Cards?

We mentioned earlier that we had considered punched-card indexing, and the reader may be asking "why not punched cards?" Our reasons for choosing the molecular formula index are based on two considerations: direct utility to the library clientele, and cost.

Let us take utility first. The Cyanamid molecular formula index, as has already been pointed out, serves efficiently in locating information both about a single compound and about groups of related compounds. It would be highly inefficient to resort to mechanical sorting of punched cards to search for individual compounds. Hence we would have needed some sort of molecular formula index anyway. Furthermore, in searching for groups of related compounds, unless the search was an extremely broad one it is very doubtful that machine sorting of punched cards, even with the newest type of equipment, would save a worthwhile amount of time when compared with the Cyanamid molecular formula index.
Of course, with punched cards sorting is not the only operation involved. In the sample searches described the time required in each case included transcription of data and references. With punched cards this would have to be done either by machine transcription or by hand-copying from a conventional type auxiliary index.

In considering utility of an index for a laboratory like Cyanamid's, there is another factor which, in our opinion, is more important than mere speed of information retrieval. This factor is the ability of the index to supply the searcher with pertinent, yet unasked-for information which may generate entirely new lines of thought. Because searches in our molecular formula index are done by visual inspection of structural formulas, it is a common occurrence for the searcher to come across compounds which, although they do not exactly fulfill the original specifications of the search, are of sufficient interest to be included. This kind of unsought information can often prove very valuable to the research scientist. Unfortunately, mechanical methods of information retrieval are not well adapted to serve in this way because of restrictions imposed by the encoding process and the inability of the machine to "browse."

Furthermore, for obvious reasons it is not feasible to allow the research man to operate the machine himself. This renders the search entirely impersonal, which is no small disadvantage. Even if we ignore the possible loss of cards and mishandling of the machine, it is certain that very few people would become experienced enough to use the equipment efficiently without assistance from an expert.

Finally, when we considered cost, machines did not look attractive. Since we had concluded that a molecular formula index was a "must" for locating information about single compounds, the added cost of encoding, punching, and machine rental could not be justified for the foreseeable future.
If our present collection of 35,000 compounds should grow to several times its present size and when much more highly efficient mechanical, electronic, or other kind of equipment becomes available, it is possible that installation of such equipment might prove desirable.

Presented before the Division of Chemical Literature, 130th ACS National Meeting, September 1956.