Numerical Index Key for the Beilstein System
F. LOWELL TAYLOR The Dow Chemical Company, Midland, Mich.
The Beilstein system is the only thoroughly tested and used scheme for classifying organic compounds. It has served this purpose well, but the difficulty in keeping the lexicon up to date, so that it will be of maximum service, indicates the need for mechanizing the process of handling chemical literature. A straightforward code for this process may require modification of the system. Now is an opportune time to study classification, in conjunction with nomenclature and coding, so that the three methods of designating and arranging compounds can be correlated.
group of organisms, substances, or things. A good system automatically falls into numerical order; conversely, the test of a system is whether it can be outlined numerically. Although for the most part the Beilstein system fits a numerical key, the irregularities make specific coding difficult. Obviously a natural subject with as varied features as organic chemistry will not fit a uniform arbitrary system, and practical considerations may call for variations in certain categories.

REGISTER FORMULAS
The major premise of the Beilstein system is that compounds shall be arranged as derivatives of certain structures called primary register formulas, which are structural isomers within a given series. The categories dealing with the process of forming derivatives, although coded through the first grade, are not discussed here because of their extended nature. While this feature is an advantage from the standpoint of flexibility, it may be too cumbersome for practical application to punched cards. The major categories, which define the register formulas, are listed in Table I.
THIS study was undertaken to assist in the task of compiling classified indexes and compendia of organic chemistry. The problem of classifying organic compounds for a compendium such as Beilstein's Handbuch is affected by the number of known compounds and the rate at which new compounds are discovered or synthesized. These two factors are illustrated graphically by Figure 1. Since formal recognition of organic chemistry as a science nearly 120 years ago, the number of compounds has increased from a mere handful to something of the order of a half million. The greater part of this increase has occurred during the past 25 years, within the scientific lifetime of most chemists. For example, from the discovery of the ring structure of benzene by Kekulé in 1865 until 1922, it is estimated that the number of ring structures which became known did not exceed 800 or 1000 (Ring Index, page 14). In the following 16 years to 1938, the number increased four- or fivefold to 4000 in the Ring Index; and in the past 8 years it is estimated that at least 600 additional ring systems have been described. It was practically at the end of the first period of slow growth that the fourth edition of Beilstein was issued, which covered the period from 1910 to 1920 in the first supplement. The rapid growth which began at about that time has made it impossible to keep the Handbuch up to date. Chemists are interested primarily in literature published during the past 25 years. The lack of a compendium based on a classification system puts a heavy burden on the profession as time consumed in continually searching for references. The burden is increased by the present unsatisfactory state of nomenclature. The solution to the problem of handling the great bulk of chemical literature seems to lie, therefore, in mechanization of the process. The purpose of a numerical key to a classification system is to provide a "handle."
The key has two functions: to designate the particular parts of the system in order to facilitate teaching and learning, and to provide means of handling material mechanically. Because of the immense amount of material to be handled in classifying organic compounds, this function is the main incentive to devising a key. This mechanical use of a key on punched cards can also overcome individual objections to a classification system. It is inevitable that minor categories from the viewpoint of a system may contain major groups for a research program. From an index of punched cards it is possible to pick out compounds with any desired combination of groups regardless of their relative order in the system. A classification system is an outline of characteristics of a
TABLE I. MAJOR CATEGORIES
1. Division
2. Class
3. Order
4. Rubric
5. Series
6. Subseries
7. Type
8. Subtype
9. Individual
System diverges into derivatives
10. Replacement
11. Nonfunctional
12. Functional
To set up a numerical key it is necessary to determine the number of digits needed to describe each category. If the key is to be applied to punched cards, a field is reserved for each category. Such a card is shown in Figure 2. It is seen from this card that 22 digits are necessary to define primary register formulas, 6 additional or 28 to define secondary register formulas, another 6 partly to define the first-grade functional derivatives, 2 for subclasses of organometallic compounds, and one as a means of differentiating the various types of hydroxyl and carbonyl functions. Thus a total of 37 columns or digits is necessary approximately to define a compound. This was a compromise to fit the standard card on which it was desired to code empirical formulas also. The right half of the card (Figure 2) was designed for empirical formulas. Two fields of three columns each were reserved, respectively, for the number of atoms of carbon and of hydrogen in organic compounds. The remaining seven fields of four columns each were used for the symbols and subscripts, in pairs of two columns each. The spacing of the headings on the face of the cards allowed notes to be made between the rows of punches. The numbers were omitted from the face of the cards. Three sets of cards, identical except for color stripes across the top, were planned for
INDUSTRIAL AND ENGINEERING CHEMISTRY
March 1948
the Beilstein code, the empirical formulas, and the names; each set would carry two of the systems of designation and be arranged according to one of them. The amount of time required to prepare a master index of such sets of cards would be reasonable, provided a master code were available. The index then would serve as a time-saver.
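The column budget described for the sample card can be checked with a few lines of arithmetic. The following is a minimal sketch, not the author's procedure: the field widths are those quoted in the text, while the 80-column card size, the function name `encode_empirical`, and the left-justified, blank-padded layout of the symbol fields are assumptions made for illustration.

```python
# Field widths (in card columns) as quoted in the text for the sample card.
CODE_FIELDS = {
    "primary register formula": 22,
    "secondary register formula (additional)": 6,
    "first-grade functional derivatives": 6,
    "organometallic subclasses": 2,
    "hydroxyl/carbonyl function type": 1,
}
FORMULA_FIELDS = {
    "carbon atoms": 3,
    "hydrogen atoms": 3,
    "other elements (7 fields x 4 columns)": 7 * 4,
}

# Assumption: an 80-column card; the text says only "the standard card".
CARD_COLUMNS = 80


def encode_empirical(carbon, hydrogen, others=()):
    """Lay out an empirical formula in the formula columns of the card.

    `others` is a sequence of (symbol, count) pairs, at most seven.
    Justification and blank padding are assumptions of this sketch.
    """
    if len(others) > 7:
        raise ValueError("only seven symbol/subscript fields are available")
    if not (0 <= carbon <= 999 and 0 <= hydrogen <= 999):
        raise ValueError("carbon and hydrogen counts must fit three columns")
    fields = [f"{carbon:03d}", f"{hydrogen:03d}"]
    for symbol, count in others:
        if len(symbol) > 2 or not 0 <= count <= 99:
            raise ValueError("symbol and subscript must fit two columns each")
        fields.append(f"{symbol:<2}{count:02d}")
    # Unused symbol/subscript fields are left blank.
    fields.extend(["    "] * (7 - len(others)))
    return "".join(fields)


code_columns = sum(CODE_FIELDS.values())        # 37, as stated in the text
formula_columns = sum(FORMULA_FIELDS.values())  # 34
print(code_columns, formula_columns, CARD_COLUMNS - code_columns - formula_columns)
print(repr(encode_empirical(6, 5, [("N", 1), ("O", 2)])))  # nitrobenzene, C6H5NO2
```

Under the 80-column assumption, the code fields and formula fields together occupy 71 columns, which leaves a small margin on the card.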
The task of coding all organic compounds is seemingly too great to attempt. Fortunately the machines which use the cards can do much of the routine work; for example, each subordinate criterion is repeated as a rule after a change of one character in a higher category. The machine can set up the code and the known compounds can be fitted to it in their appropriate places. However, there are two prerequisites. The code must be generally acceptable and therefore devised by cooperative effort of chemists, and the classification system must be examined to determine whether it will be satisfactory for compounds now known but not yet compiled, or whether a better system can be devised.

DIVISION. Some of the code numbers for the divisions are shown in Table II. Three digits are necessary to accommodate subdivisions of the heterocyclic division.
TABLE II. CODE NUMBERS
100  Acyclic division
200  Carbocyclic division
300  Heterocyclic division
310  Hetero oxygen only (and its analogs)
311  Third column indicates number of atoms
320  Hetero nitrogen only
330  One hetero nitrogen, and 1 to 9 oxygen
340  Two hetero nitrogen, and 1 to 9 oxygen
350  Other hetero atoms, not oxygen analogs
353  Latest atom of Class 23, Group 5
390  No carbon atoms in ring
393  Designated as in subdivision 350

Figure 1. Chemical Literature vs. Years (Chemical Abstracts, Centralblatt, Beilstein, Ring Index)
A. No. of abstracts (× 10³)
B. Lines of formula index (× 10⁴)
C. Lines of formula index (× 10⁴)
D. No. of compounds (× 10⁴)
E. No. of ring systems (× 10²)

Figure 2. Sample Card

CLASS. The main functional classes were denoted by numbering in order, 1 to 28. Some of the classes are:

01  Basic nucleus
02  Hydroxy "oxygen function"
03  Oxo "oxygen function"
04  Carboxylic "oxygen function"
08  Amine
11  Azo

The subclasses in mixed classes could be designated conveniently only if they were coded by a single digit number. This was done by arranging the digits in decreasing order; e.g., an aminoazohydroxycarboxylic acid was coded as:

11.  azo class
8  amino
4  carboxylic
2  hydroxy
11.8420  class, subclass code number

This designation is not explicit because the number of each kind of functional group must also be taken into consideration.

ORDER. The third category, the order, is determined by the number of each substituent group. Because of irregularity in handling this category, coding was limited to designation of the number of preferred groups. Although the highest numbered group is preferred in selecting the main class, this is not true for selection of the order. When present, the oxygen functions determine the order according to the total number of atoms of oxygen in those groups. Considerable irregularity exists in the method of assigning preference to amino and sulfonic acid groups. Thus simple coding was not possible in this category.

RUBRIC. The fourth major category is the rubric which represents the carbon-hydrogen ratio in the register formula. The arbitrary constant a in the general formula, $\mathrm{C}_n\mathrm{H}_{2n+a}$, was designated as:

005  plus 4
006  plus 3
007  plus 2
008  plus 1
009  plus 0
010  minus 1
020  minus 2
etc., to
990  minus 99, if necessary

Here a third column is needed to separate the positive from the negative constants. However, the irregularity of the previous category is reflected in this one. When oxygen functions are present the rubric designation applies to the parent oxygen compound without the other functional groups.

SERIES. Category 5 specifies the number of carbon atoms in the register formula. The subseries differentiates among structural isomers. Although the acyclic structures and the compounds in the benzene subseries are arranged well in Beilstein, it is likely that the delineation described in the Handbuch would be inadequate to handle more complicated structures which are known at present. But it is this category which should describe explicitly the most distinctive and individualistic feature of organic compounds. In the field of ring structures both classification and nomenclature are unsatisfactory. Therefore, study of the basic structures for classification, coding, and nomenclature is first order in importance. The manner of coding the acyclic structures is shown by Table III. It is proposed that these structures be enumerated and the enumeration be retained in naming derivatives. This would correlate compounds derived from the same carbon structure.

TABLE III. CODING ACYCLIC STRUCTURES
Code digits, left to right:
1 and 2  Series, number of carbon atoms in the register formula
3  Number of carbon atoms in all side chains
4  Number of side chains attached to the main chain
5  Serial number within the group

n-Nonane  09001
2-Methyloctane  09111
3-Ethylheptane  09211
2,2-Dimethylheptane  09221
4,4-Dimethylheptane  09229
3-Isopropylhexane  09311
3-Ethyl-3-methylhexane  09321
3-Ethyl-5-methylhexane  09323
2,2,3-Trimethylhexane  09331
3,3,4-Trimethylhexane  09338
3-tert-Butylpentane  09411
3-Isopropyl-2-methylpentane  09421
2,3,3,4-Tetramethylpentane  09444

The results of this study can be summarized as follows: Although the main categories in Beilstein can be coded to fit a numerical system, a simple code would not be satisfactory for differentiating all primary register formulas. A numerical outline is a great help in learning a system as massive and variant as organic chemistry. If we are not to deprive ourselves of the aid of a definite numerical key to organic classification, both for learning it and for compiling and searching material with the aid of machines, further study with the objective of devising such a system is necessary. Now is the time to make the study, because the major part of the accumulated knowledge of organic chemistry lies unassembled and unclassified in the original literature and abstracts.
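The coding of the class, rubric, and acyclic series categories can be illustrated with a short sketch. It is offered under stated assumptions rather than as the author's procedure: the function names are hypothetical, the four-column subclass field padded with zeros is inferred from the single example 11.8420, and the rubric mapping (9 minus a for positive constants, ten times the magnitude for negative ones) is read off the rubric table rather than stated in the text.

```python
# Class numbers quoted in the text; the full list runs from 1 to 28.
CLASS_NUMBERS = {
    "basic nucleus": 1,
    "hydroxy": 2,
    "oxo": 3,
    "carboxylic": 4,
    "amine": 8,
    "azo": 11,
}


def class_code(groups, subclass_columns=4):
    """Main class (highest number) followed by subclass digits in decreasing order."""
    numbers = sorted({CLASS_NUMBERS[g] for g in groups}, reverse=True)
    if not numbers:
        numbers = [CLASS_NUMBERS["basic nucleus"]]
    main, rest = numbers[0], numbers[1:]
    if any(n > 9 for n in rest) or len(rest) > subclass_columns:
        raise ValueError("subclasses must be single digits that fit the field")
    # Assumption: unused subclass columns are filled with zeros.
    digits = "".join(str(n) for n in rest).ljust(subclass_columns, "0")
    return f"{main:02d}.{digits}"


def rubric_code(a):
    """Three-digit rubric code for the constant a in CnH(2n+a)."""
    if 0 <= a <= 4:
        return f"{9 - a:03d}"      # plus 4 -> 005 ... plus 0 -> 009
    if -99 <= a < 0:
        return f"{-a * 10:03d}"    # minus 1 -> 010 ... minus 99 -> 990
    raise ValueError("constant outside the range shown in the table")


def decode_acyclic(code):
    """Split a five-digit Table III code into its four fields."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError("expected a five-digit code")
    series = int(code[:2])
    side_chain_carbons = int(code[2])
    return {
        "series": series,
        "side_chain_carbons": side_chain_carbons,
        "side_chains": int(code[3]),
        "serial": int(code[4]),
        # Derived value: carbons left in the main chain.
        "main_chain_carbons": series - side_chain_carbons,
    }


print(class_code(["amine", "azo", "hydroxy", "carboxylic"]))  # 11.8420
print(rubric_code(2), rubric_code(-6))                         # 007 060
print(decode_acyclic("09211"))                                 # 3-ethylheptane
```

The derived main-chain length gives a quick consistency check on Table III: code 09211 leaves seven carbons in the main chain, in agreement with the name 3-ethylheptane.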
Vol. 40, No. 3
DEVISING A BETTER SYSTEM

If we turn our attention now to the problem of devising a better system, we must recognize that considerable experimentation and cooperative effort are required. The experimentation can be done well by punched card machines after preliminary exploratory tests indicate promising systems. From past experience of chemists with the Beilstein system, information should be available to point the way. Experience gained in coding the original system suggests the following outline, offered as the basis for an exploratory test rather than as a definite proposal. It will serve to indicate the problem of correlating classification and coding.

Let us take the basic structures of compounds and the 28 functional classes of Beilstein as the major criteria for classification, but ignore the old rules of precedence. Suppose that an explicit arrangement of the basic structures is agreed upon. Then let us divide the 28 classes into groups and place the groups in separate categories, designated by Roman numerals to avoid confusion. The categories are shown in Table IV numbered in reverse order of their precedence. Category VI, Table V, is a supercategory ranking above the division. This arrangement would collect organometallic compounds of each element, whereas the present system distributes them among the divisions as classes. Compounds of monovalent elements would be included in this category instead of being considered as salts, as is present practice.

TABLE IV. PROPOSED ORDER OF CATEGORIES
VI. Organometallic (and similar) atoms
V. Division
IV. Nuclear structure and oxygen functions
III. Classes 16 to 22, inclusive
II. Classes 8 to 15
I. Classes 5 to 7 (and analogs)
0. Nucleus

TABLE V. CATEGORY VI, ORGANOMETALLIC
1st column  Group of periodic table in which the element occurs
2nd column  Row of periodic table
3rd column  Following code
0. No hydroxyl group on the functional atom
1. One hydroxyl group on the functional atom
2. Two hydroxyl groups
3. Three hydroxyl groups
4. Four hydroxyl groups
5. Two like functional atoms joined by a single bond: =P-P=
6. Two like functional atoms joined by a double bond: -P=P-
7. Two unlike functional atoms joined by a single bond: =P-N=
8. Element is hetero atom with carbon in the ring
9. Element is in a ring without carbon

Category V would be coded as in Table II, including only hetero nitrogen and oxygen and its analogs. Category IV is coded in Table VI; only one column is necessary.

TABLE VI. CATEGORY IV, OXYGEN FUNCTIONS
1. Fundamental structure (without or with functional substituents of categories I to III, but with no oxygen functions)
2. Hydroxy compounds
3. Oxo compounds
4. Hydroxyoxo compounds
5. Carboxylic acids
6. Hydroxycarboxylic acids
7. Oxocarboxylic acids
8. Hydroxyoxocarboxylic acids
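The eight entries of Table VI enumerate every combination of the three oxygen functions, so the single digit can be computed rather than looked up. The sketch below rests on that observation, which the text does not state explicitly; the function name `oxygen_function_code` and the weights 1, 2, and 4 are assumptions chosen so that the results reproduce the table.

```python
# Entries of Table VI, keyed by their code digit.
TABLE_VI = {
    1: "Fundamental structure",
    2: "Hydroxy compounds",
    3: "Oxo compounds",
    4: "Hydroxyoxo compounds",
    5: "Carboxylic acids",
    6: "Hydroxycarboxylic acids",
    7: "Oxocarboxylic acids",
    8: "Hydroxyoxocarboxylic acids",
}


def oxygen_function_code(hydroxy=False, oxo=False, carboxylic=False):
    """Category IV digit: 1 plus a weight for each oxygen function present."""
    return 1 + int(hydroxy) + 2 * int(oxo) + 4 * int(carboxylic)


# List every combination with the table entry it maps to.
for carboxylic in (False, True):
    for oxo in (False, True):
        for hydroxy in (False, True):
            code = oxygen_function_code(hydroxy, oxo, carboxylic)
            print(code, TABLE_VI[code])
```

Read this way, the category IV digit minus one is a three-bit mask, which is why a single column suffices for the category.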
Categories I, II, and III are coded in Table VII; at least three columns would be needed for each category to allow designation of mixed functions. The latest occurring function would be designated in the left column.

TABLE VII. CATEGORIES I, II, AND III
III. Nitrogen chains
1. Triazane
2. Triazene
3. Hydroxytriazane
4. Hydroxytriazene
5. Azoamidoxide
6. Tetrazane
7. Tetrazene
8. Longer N chains
II. 1 and 2 nitrogens
1. Amine
2. Hydroxylamine
3. Hydrazine
4. Azo
5. Hydroxyhydrazine
6. Diazo
7. Azoxy
8. Nitramine
I. Group 6 acids
1. Sulfinic
2. Sulfonic
3. Seleninic
4. Selenonic
5. Tellurinic
6. Telluronic

This outline would give more regular classification in the mixed functional groups than does the original Beilstein system. Each category would necessarily have a subcategory in which the number of each kind of functional group would be designated. If preferred, these subcategories, in order, could be made subordinate to the fundamental structure, and the rubric could be placed immediately after the basic nucleus rather than preceding it. This method would arrange each combination of functional substituents under a parent nucleus in order of increasing unsaturation. The various combinations could be recapitulated in a general way as a system number in which presence of groups in the various categories would be indicated as in Table VIII.

TABLE VIII. SYSTEM NUMBER
654.321  Six digits, from category of corresponding number
6  Group of periodic table (organometallic)
5  Abbreviated key to division
4  Digit direct from category IV
3  Largest digit from category III
2  Largest digit from category II
1  Largest digit from category I

The exploratory system can now be summarized by several simple rules to determine the choice of the register formula.
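Before turning to the rules, the system number of Table VIII can be sketched as a simple composition of six digits. This is a hypothetical illustration only: the function name `system_number`, the use of 0 for an absent category, and the reading of the "abbreviated key to division" as the first digit of the division code (1 acyclic, 2 carbocyclic, 3 heterocyclic) are assumptions not confirmed by the text.

```python
def system_number(periodic_group=0, division_key=0, category_iv=1,
                  category_iii=(), category_ii=(), category_i=()):
    """Compose the six-digit system number in the 654.321 pattern of Table VIII.

    category_iii, category_ii, and category_i are collections of the digits
    of the functions present; the largest digit of each is used. An empty
    collection is coded 0 (an assumption of this sketch).
    """
    digits = [
        periodic_group,                # 6: group of periodic table
        division_key,                  # 5: abbreviated key to division
        category_iv,                   # 4: digit direct from category IV
        max(category_iii, default=0),  # 3: largest digit from category III
        max(category_ii, default=0),   # 2: largest digit from category II
        max(category_i, default=0),    # 1: largest digit from category I
    ]
    if any(not 0 <= d <= 9 for d in digits):
        raise ValueError("each position holds a single digit")
    text = "".join(str(d) for d in digits)
    return text[:3] + "." + text[3:]


# A carbocyclic hydroxycarboxylic acid (category IV digit 6) that also
# carries amine (1) and azo (4) groups and a sulfonic acid group (2).
print(system_number(0, 2, 6, category_ii=(1, 4), category_i=(2,)))  # 026.042
```

Because only the largest digit of each category is kept, the system number is a summary and not a full description; the subcategories mentioned in the text would still be needed to record every group and its count.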
A. Choice of Nucleus
1. The fundamental structure which occurs latest in the order of preference without regard to functional substituents is chosen as the nucleus, except when Rule 2 applies.
2. When organometallic groups are present, the structure of which they are a part is the basic nucleus.
B. Choice of Functional Category
The functional groups which are attached to the chosen nucleus, by Rule 1 or 2, determine the class of the register formula, with the following subordinate rules.
Cleavage shall be made of substituted functional groups to leave the simple substituent on the basic nucleus, provided the group is then in its normal form.
If the fundamental structures obtained by cleavage are equal, the latest functional group determines the class, unless Rule 6 applies.
If two ways of cleaving substituent groups give the same constituent compounds, the earliest functional group determines the class.

RECEIVED August 28, 1947. Presented before the Division of Chemical Education, Symposium on Chemical Literature, at the 111th Meeting of the AMERICAN CHEMICAL SOCIETY, Atlantic City, N. J.
The Philosophy of Classification of Chemical Literature
ERNEST H. HUNTRESS Massachusetts Institute of Technology, Cambridge, Mass.
The responsibilities of a learned profession and of a professional society are discussed. Although the American Chemical Society has long been active in the development of the production, publication, and dissemination of chemical knowledge, it has made only modest beginnings in the organization and correlation of knowledge for utilization.
THE early concept of the learned professions in narrow terms of theology, law, and medicine has gradually expanded to include many other fields. If a profession be regarded as any occupation involving a liberal education, special disciplines, and intellectual rather than manual labor, the pursuit of chemistry is properly accepted as well within the scope of these three definitive characteristics. In view of the existence of the substantial body of facts and experience comprising its literature, chemistry appears to merit additional characterization as a learned profession.

SIXFOLD RESPONSIBILITY OF A LEARNED PROFESSION
In the effective fulfillment of the objectives of any learned profession we must recognize the successive application of two operational sequences. The first involves the discovery or acquisition of information and its transmission to the entire group of persons to whom it has or eventually will have value. The second, dealing with the location, evaluation, and eventual use of this information, is characterized by an opposite vector. This second operation reaches out into the vast area of recorded knowledge, selects material relevant to the current issue, and focuses it upon the individual or relatively small group who desires to use it. In seeking brief but descriptive designations for these two complementary sequences, we might express the first as the radiative sequence, the second as the absorptive sequence. Each sequence comprises several distinct functions, so that the total cycle of knowledge may be represented by six stages:

I. Radiative Sequence
Production
Publication
Dissemination

II. Absorptive Sequence
Organization
Correlation
Utilization
SPECIAL RESPONSIBILITIES OF THE PROFESSIONAL SOCIETY
Whereas the initial production and ultimate utilization can and do occur within the scope of an individual person, group, laboratory, or institution, the more general aspects of publication, dissemination, organization, and correlation reach full expression only by the combined efforts of many persons or groups. This leads immediately to recognition of the place of the professional society and to a consideration of its special responsibility toward its constituents for the development of these four aspects as adequate tools for their prosecution of effective work. That the professional society is the only proper recipient of this responsibility and that in the public interest it ought not to be shifted to individuals, small groups, or commercial publishers is readily evident. The professional society alone possesses the prestige and authority requisite throughout the whole area of an expanding professional interest. By corporate existence it alone can supply the stability and permanence essential to effective maintenance of service over long periods of time, relatively unaffected by vital, economic, social, and financial hazards. It alone can coordinate its functions to effect required economy of time, effort, and expense. It alone can achieve the financial backing required for the extended operation of its services. In a general way these truths have been recognized by many professional groups. They are suggested, for example, in the broad language of the Congressional Charter of the AMERICAN CHEMICAL SOCIETY (9). Section 2 of its Act of Incorporation and Act II of its Constitution and Bylaws are identical wordings: The objects of the incorporation shall be to encourage in the broadest and most liberal manner the advancement of chemistry in all its branches; the promotion of research in chemical science and industry; the improvement of the qualifications and useful-