The Chemical Abstracts Service Chemical Registry System. IV. Use of the Registry System to Support the Preparation of Index Nomenclature†

GERALD G. VANDER STOUW,* CHRISTINE GUSTAFSON, JOHN D. RULE, and CHARLES E. WATSON

Chemical Abstracts Service, The Ohio State University, Columbus, Ohio 43210

Received April 19, 1976

The structure record used in the Registry III version of the Chemical Abstracts Service Chemical Registry System has been designed to allow automated support of Chemical Abstracts (CA) Index Name preparation. A new substance entering the Registry files may match previously registered substances at several levels of “partial match”; for example, a new organic compound containing a known ring system, a new compound which is a stereoisomer of a known compound, or a new copolymer formed from known monomers. Names of such related substances provide relevant precedents in application of CA Index Name selection rules and are retrieved to assist the nomenclature generation staff in selecting new Index Names. The retrieval of “related nomenclature” also makes it possible to construct algorithms for the automated generation of many CA Index Names. Procedures installed in early 1976 will automatically generate names for alloys, copolymers, mixtures, and addition compounds. Additional algorithms are being prepared for generation of systematic names for organic compounds and coordination compounds from their connection tables. When these developments are completed, a significant percentage of new CA Index Names will be automatically generated.

INTRODUCTION

The Chemical Abstracts Service (CAS) Chemical Registry is a computer-based system that provides unique identification of chemical substances on the basis of their molecular structures. This system, which was established in 1965, contained approximately 3.4 million unique substances at the end of 1975. In the Chemical Substance Index to Chemical Abstracts (CA), all of the references to a specific chemical substance are brought together at one chemical name, the CA Index Name. (Extra names are added for some substances to provide better coverage of ring systems, ligands, and components of multicomponent substances such as copolymers.) CAS nomenclature specialists derive these names by applying the CA Index Name selection rules,4 a rigorous set of rules designed to ensure that each substance appears at a predictable and reproducible location in the Index.

Before the Registry System was incorporated into CA Index production processes, a substance had to be named each time it was selected as an Index entry, regardless of whether it had previously been indexed. By comparing structural records or author-supplied nomenclature, the Registry System now recognizes when a substance is a duplicate of one already on file, and the correct name is retrieved for use in the Index. Thus a substance now must be systematically named only upon its initial entry into the files. The number of substance names that must be derived is now only about one-fourth of what it would be if Registry processing were not in use. Nevertheless, more than 400 000 new chemical substance names must still be derived each year, or about 1600 per working day.

Early in 1974 CAS installed the Registry III version of the Registry System, which includes some important modifications to the form of the stored structural record for a chemical substance. These modifications have permitted the development of processes for increased computer-based support of the substance-naming process, with a corresponding reduction in the human effort involved. This increase in computer support results from the fact that with Registry III it is possible to identify substance names on file which are related to a new substance being named. Retrieving related nomenclature for the naming specialists reduces the human effort involved in naming, since it provides them with relevant precedents in application of the name selection rules. The use of retrieved nomenclature as “building blocks” in the formation of new names also makes it possible to begin developing programs for automatic nomenclature generation. This paper describes the types of naming support currently available at CAS and discusses the current state of development of automatic name generation.

† Presented in part to the 170th National Meeting of the American Chemical Society, Chicago, Ill., Aug 1975.
* To whom correspondence should be addressed.

PARTIAL STRUCTURE MATCHING IN REGISTRY III

The processes involved in registering a chemical substance in the Registry III System have been extensively described.1-3 Of particular importance for support of name derivation is the fact that the system recognizes various types of partial match between new substances and substances already on file. Each kind of partial match permits a corresponding type of nomenclature support. The three basic types of partial match are: (1) the new substance has the same two-dimensional structure, i.e., the same atom-bond connection table, as a substance already on file, but it differs at a level of greater detail, such as stereochemistry or isotopic labeling; (2) the new substance has an atom-bond connection table which is new to the system, but that table contains one or more ring systems which are already on file; (3) the new substance contains one or more components which are already known, as in the case of a new copolymer of known monomers.

MATCHING OF STRUCTURES WITH IDENTICAL CONNECTION TABLES

When a new substance is found to have the same two-dimensional structure, or topology, as one or more substances already on file, the new record is compared with those records with the same connection table.
Comparison is made at successive levels of detail corresponding to logical segments in the Unique Chemical Registry Record (UCRR) (see Figure 1). The Text Descriptor segment of the UCRR contains information about stereochemistry, as well as other items of information about special classes of substances. Text Descriptors have been described in detail by Blackwood.3 If the new structure matches another structure on file at the Text Descriptor level, the procedure continues to the next level of comparison, the Labeling segment, which contains information about isotopic labeling. If there is a match at this level, the incoming record is compared against the matching records at the Derivatives level, which includes detail on charges, abnormal valence, tautomerism, and whether a substance is a derivative, such as a sodium salt, a hydrochloride, or a hydrate. When a substance being registered matches a substance on file at all levels of the UCRR, the two substances are considered to be identical, and the previously assigned Registry Number and CA Index nomenclature are retrieved, thus eliminating the need for construction of a new Index Name. If a complete match cannot be made, a new Registry Number is assigned and the Index nomenclature for the new substance must be derived.

Figure 1. Unique Chemical Registry Record (UCRR). Segments, compared in order against the input structure: Topology (atom-bond connection table); Text Descriptor; Isotopic labelling; Derivatives.

There is a strong likelihood that any substance on file having the same topology as the new substance will have a name very similar to the new name that must be derived. Procedures have therefore been incorporated into the system to retrieve “related” names and provide them to the nomenclature specialists to assist them in preparing the new name. In practice, the retrieval is limited to the names of one or two substances which match the new substance in as much detail as possible. Figures 2 and 3 show two examples of related name retrieval. In Figure 2, the palladium salt shown was a new substance, but the free acid had been registered previously. The system recognized that the connection table of the free acid was already on file and retrieved its Index Name: “benzenesulfinic acid, 4-methyl-”. The new name then prepared was identical with the retrieved name except for the specification of the palladium salt. The example of Figure 3 shows a somewhat more complex structure. Although this substance was not on file, there was on file a substance having the same connection table but differing at the Text Descriptor level. The retrieved name included the stereochemical specification “(3R-trans)-”; the new name was identical except for the different stereochemistry, “(3R-cis)-”.

Figure 2. Example of related name retrieval. Retrieved Name: Benzenesulfinic acid, 4-methyl-. New Name: Benzenesulfinic acid, 4-methyl-, palladium(2+) salt.

Figure 3. Retrieval of name for a stereoisomer. Retrieved Name: 1-Azetidineacetic acid, 3-amino-α-(1-methylethylidene)-2-oxo-4-(2-propynylthio)-, phenylmethyl ester, (3R-trans)-. New Name: 1-Azetidineacetic acid, 3-amino-α-(1-methylethylidene)-2-oxo-4-(2-propynylthio)-, phenylmethyl ester, (3R-cis)-.

MATCHING OF RING SYSTEMS

A basic design feature of the Registry III system is that the ring systems present in a structure are recognized during the registration process and are stored in the structural record in terms of an identifying number; this ring system identifier in turn serves to link the structural record to a file of ring nomenclature. This feature of Registry III has permitted development of an important type of name generation support. When a newly registered substance does not match the connection table of any substance already on file but contains one or more ring systems already known, the names of these ring systems can be retrieved and provided to the nomenclature staff. Since some 85% of the substances in the Registry files contain ring systems, ring nomenclature constitutes an important kind of building block in the construction of new names. There are of course some new ring systems which must be named, but only about 3000 new organic ring systems are indexed per year, compared with about 100 times as many new organic substances, so the overwhelming majority of new substances contain ring systems that are already known. Since most of these rings are common rings such as benzene and pyridine, a stop list of common rings is employed to prevent the repetitious printing of names which will already be familiar to the nomenclature chemist. When a new substance contains a known ring system not on the stop list, that ring name is printed out; currently this type of support is provided for about 15% of the substances to be named, or about 250 per working day. Figure 4 shows a new substance which contained a known ring named “5H-1,2,4-triazolo[3,4-b][1,3]thiazine”. This name was retrieved and provided to a nomenclature expert, who used it to access a file of ring system information to find a standardized drawing of the ring system with nomenclature locants, as shown in the figure. When the locants are known, it is a simple matter to complete the name by adding the substituent portion “6,7-dihydro-3-phenyl-”.

Figure 4. Ring nomenclature support. Retrieved Ring Name: 5H-1,2,4-Triazolo[3,4-b][1,3]thiazine (ring structural diagram with locants not reproduced). New Name: 5H-1,2,4-Triazolo[3,4-b][1,3]thiazine, 6,7-dihydro-3-phenyl-.

Journal of Chemical Information and Computer Sciences, Vol. 16, No. 4, 1976
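The two kinds of support described so far, related names for substances that share a connection table and ring names for substances with a new connection table, can be tied together in a short sketch. The Python below is a minimal illustration under stated assumptions, not the CAS implementation: the `Record` and `Entry` classes, their string-valued segments, and the functions `match_depth` and `nomenclature_support` are hypothetical, the topology is assumed to arrive as a precomputed canonical string, and ring systems are represented by opaque identifiers.

```python
from dataclasses import dataclass

# Hypothetical, much-simplified stand-in for the UCRR: each segment is reduced
# to a value that can be compared for equality. The topology is assumed to be a
# precomputed canonical connection-table string; rings are ring-system identifiers.
@dataclass(frozen=True)
class Record:
    topology: str
    text_descriptor: str = ""   # stereochemistry and other special descriptors
    labeling: str = ""          # isotopic labeling
    derivatives: str = ""       # charges, salts, hydrates, tautomerism, ...
    rings: frozenset = frozenset()

@dataclass(frozen=True)
class Entry:
    record: Record
    registry_number: str
    name: str

# UCRR segments in the order in which they are compared.
LEVELS = ("topology", "text_descriptor", "labeling", "derivatives")


def match_depth(new: Record, old: Record) -> int:
    """Count consecutive UCRR levels, from topology down, on which two records agree."""
    depth = 0
    for level in LEVELS:
        if getattr(new, level) != getattr(old, level):
            break
        depth += 1
    return depth


def nomenclature_support(new, on_file, ring_names, ring_stop_list, max_related=2):
    """Decide what nomenclature help a newly registered substance can receive."""
    same_topology = [e for e in on_file if e.record.topology == new.topology]
    if same_topology:
        # Closest matches first; sorted() is stable, so ties keep file order.
        ranked = sorted(same_topology, key=lambda e: match_depth(new, e.record), reverse=True)
        best = ranked[0]
        if match_depth(new, best.record) == len(LEVELS):
            # Full match: reuse the existing Registry Number and Index Name.
            return ("identical", best.registry_number, best.name)
        # Partial match: pass on one or two related names as precedents.
        return ("related_names", [e.name for e in ranked[:max_related]])
    # New connection table: fall back to names of known, non-stop-listed ring systems.
    known = sorted(r for r in new.rings if r in ring_names and r not in ring_stop_list)
    if known:
        return ("ring_names", [ring_names[r] for r in known])
    return ("none",)
```

Under these assumptions, a palladium salt whose record agrees with the free acid of Figure 2 on Topology, Text Descriptor, and Labeling but not on Derivatives returns `("related_names", ["Benzenesulfinic acid, 4-methyl-"])`, while a record that agrees on all four levels returns the existing Registry Number and name. The `max_related=2` default mirrors the paper's remark that retrieval is limited to one or two names.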

The more complex the ring system, the more difficult it is for the chemist to recognize it as equivalent to a different drawing of the same system, and hence the more useful is the automatic linking provided by Registry III. Figure 5 shows both a substance containing a ring system which was on file and the standardized drawing of this ring system. Even an experienced nomenclature chemist would find it difficult to verify quickly that these are drawings of the same ring system or to locate this ring system in a manual file. With the retrieved ring name “1,2-methanodicyclopropa[cd,gh]pentalene”, he can refer directly to a drawing which includes the nomenclature locants. Relating these locants back to the new structure is still not a completely straightforward task, but it is considerably simpler than heretofore. In this case the needed locant was “2c”, which was used for the “carbonyl chloride” suffix.

Figure 5. Complex ring nomenclature support. Retrieved Ring Name: 1,2-Methanodicyclopropa[cd,gh]pentalene (ring structure diagram not reproduced). New Name: 1,2-Methanodicyclopropa[cd,gh]pentalene-2c(1H)-carbonyl chloride, hexahydro-.

COMPONENT-BASED REGISTRATION

The above discussion has focused on the registration and naming of substances with only a single set of contiguous atoms. However, many substances indexed and registered by CAS are structured using two or more sets of atoms. The principal classes of substances handled in this way are molecular addition compounds, copolymers, and mixtures. These substances are registered by using the Registry Numbers of their components, a somewhat different procedure which allows another type of nomenclature support. The Registry record for an addition compound, mixture, or copolymer consists of the Registry Numbers of its components and information about the relationship between them. For example, this information may indicate that a substance is an addition complex of two compounds in the ratio 1:2, or is a copolymer of three monomers, etc. Figure 6 shows a compound which is a 1:1 complex of a trimethylpyrazole with tetracyanoethylene. When this substance was input to Registry III, the programs first determined whether the individual components were on file; both were, and their respective Registry Numbers 1072-91-9 and 670-54-2 were retrieved. Then the total substance was checked and found to be new to the file, so a new Registry Number, 52163-57-2, was assigned to the complex.

Figure 6. Example of component-based substance. Component Registry Numbers: 1072-91-9 and 670-54-2; Registry Number of the complex: 52163-57-2.

RETRIEVAL OF COMPONENT NAMES

The possibility of providing automated support to the process of naming component-based substances arises from the fact that their names are to a considerable extent based on the names of the components. In many cases the naming of such a substance is largely a matter of linking the names of the components with a phrase such as “compd. with” or “polymer with” and, in the case of an addition compound, including information on the proportions involved; other cases are more complex but still use the names of the components as part of the final names. If a new component-based substance is registered, and some or all of the components of that substance are already on file, the Index Names of those components are supplied to the nomenclature staff for use in constructing the new names. Currently this type of support is supplied for 5-10% of the new substances being named. Figure 7 shows an example of a 1:1 complex of two substances which were already on file and which had the Index Names “2-propen-1-one, 1-(2,4-dimethoxyphenyl)-3-(2-thienyl)-” and “phenol, pentachloro-”. These names were provided to a nomenclature chemist, who combined them to generate the two names shown, each corresponding to one of the two components. Note that in each case the second component name has been changed to its uninverted form; for example, “phenol, pentachloro-” in the first name is changed to “pentachlorophenol”. As was discussed earlier for ring nomenclature, a stop list of common component names is used so that the names of frequently occurring components are not printed over and over again. Often a substance is made up of one simple, frequently occurring component and another, more complex component. Figure 8 shows an example where the substance to be named was the sulfate of a complex derivative of “D-streptamine”. The complex name shown was retrieved; the nomenclature chemist needed only to add the words “sulfate (1:2) (salt)” to complete the name of the complex.

Figure 7. Combination of retrieved component names. Retrieved Names: 2-Propen-1-one, 1-(2,4-dimethoxyphenyl)-3-(2-thienyl)-; Phenol, pentachloro-. New Names: 2-Propen-1-one, 1-(2,4-dimethoxyphenyl)-3-(2-thienyl)-, compd. with pentachlorophenol (1:1); Phenol, pentachloro-, compd. with 1-(2,4-dimethoxyphenyl)-3-(2-thienyl)-2-propen-1-one (1:1).
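A component-based record and the retrieval of component names for the nomenclature staff can be sketched as follows. This is an illustrative Python sketch under assumptions: the `ComponentRecord` layout (`components`, `kind`, `ratio`), the function `component_support`, and the placeholder Registry Numbers in the example are hypothetical, since the paper does not describe the record at this level of detail. The function separates component names to be printed, stop-listed components whose names the chemist already knows, and components that are themselves new and must be named by hand.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ComponentRecord:
    # Hypothetical layout: component Registry Numbers plus their relationship.
    components: tuple   # Registry Numbers of the components
    kind: str           # e.g. "compd.", "polymer", "mixture"
    ratio: str = ""     # e.g. "1:1"; empty when the ratio is unspecified


def component_support(record, index_names, stop_list):
    """Return (names to print, stop-listed Registry Numbers, Registry Numbers not on file)."""
    printed, suppressed, unknown = [], [], []
    for rn in record.components:
        name = index_names.get(rn)
        if name is None:
            unknown.append(rn)       # component is itself new: a chemist must name it
        elif rn in stop_list:
            suppressed.append(rn)    # frequent component: its name is not printed again
        else:
            printed.append(name)
    return printed, suppressed, unknown
```

For the Figure 7 complex, both component names would be printed; for a salt like that of Figure 8, a common component such as sulfuric acid (assumed here to be on the stop list) would be suppressed, and a component not yet on file would be reported as needing a chemist.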
If any of the components of a new substance is itself new, its name is of course not available and must be generated by a chemist. In such cases other available nomenclature support is supplied to help in the naming task. This situation often occurs, for example, when a new compound is isolated as a picrate, or a new structure occurs which is part of an ionic salt. In the example of Figure 9, the new compound was the tetrafluoroborate of a large cation consisting of two dimethylsulfonium units attached to a bridged ring system. The complete cation was not on file, but the ring system was known, and its name, “tricyclo[9.2.2.1⁴,⁸]hexadecane”, was retrieved and used as the basis for naming the cation. The name of the cation was in turn used as the basis for the three Index entries shown.

Figure 8. Retrieval of name for one component. Retrieved Name: D-Streptamine, O-3-amino-3-deoxy-α-D-glucopyranosyl-(1→6)-O-[6-amino-6-deoxy-α-D-glucopyranosyl-(1→4)]-N¹-(3-amino-2-hydroxy-1-oxopropyl)-2-deoxy-, (S)-. New Name: D-Streptamine, O-3-amino-3-deoxy-α-D-glucopyranosyl-(1→6)-O-[6-amino-6-deoxy-α-D-glucopyranosyl-(1→4)]-N¹-(3-amino-2-hydroxy-1-oxopropyl)-2-deoxy-, (S)-, sulfate (1:2) (salt).

Figure 9. Retrieval of ring system name for a component. Retrieved Ring Name: Tricyclo[9.2.2.1⁴,⁸]hexadecane. New Names: Sulfonium, (16-cyanotricyclo[9.2.2.1⁴,⁸]hexadeca-4,6,8(16),11,13,14-hexaene-3,9-diyl)bis[dimethyl-, bis[tetrafluoroborate(1-)]; Tricyclo[9.2.2.1⁴,⁸]hexadecane, sulfonium deriv.; Borate(1-), tetrafluoro-, (16-cyanotricyclo[9.2.2.1⁴,⁸]hexadeca-4,6,8(16),11,13,14-hexaene-3,9-diyl)bis[dimethylsulfonium] (2:1).

PURPOSE OF AUTOMATED NOMENCLATURE GENERATION

The various types of nomenclature support described above help to reduce the intellectual effort expended in preparing new CA Index Names and to ensure consistent application of the Index Name selection rules by providing the nomenclature chemist with relevant precedents in application of the rules. However, this support does not reduce the physical effort involved in naming new substances; each new name must still be dictated, keyboarded, edited, and if necessary corrected and reinput. In view of the volume of new substances which must be named (400 000 or more per year, as mentioned above), considerable decreases in the professional and clerical effort involved in naming can result from development of computer programs for nomenclature generation. The nomenclature retrieval capabilities of Registry III have opened up the possibility of developing a significant capability for nomenclature generation. CAS has therefore undertaken the development of programs which will be used to generate many of the new names required for CA Indexes. These programs will process the Registry III record for a new substance (see Figure 10) and generate a candidate name or names for that substance. If the program is successful, the resulting names will be reviewed by a nomenclature specialist and either released to the file or corrected. If the program does not succeed in a particular case, that substance will be named by a chemist as in current procedures, keyboarded, edited by program,5,6 reviewed and corrected if necessary, and finally added to the nomenclature file.

Figure 10. Algorithmic nomenclature generation. Elements shown include the new substance, computer edits, and a “correct and reinput” loop.

NOMENCLATURE GENERATION FOR COMPONENT-BASED SUBSTANCES

Our initial efforts at developing nomenclature generation programs have concentrated on making use of retrieved names for the components of addition compounds, mixtures, and polymers. As discussed above, naming these substances is often a matter of combining the component names using appropriate phrases such as “polymer with”, and is to a considerable extent a tedious and intellectually unchallenging area of nomenclature. It is also very repetitious, since many of these substances receive a name corresponding to each component. For example, a copolymer of four monomers can receive four names, one corresponding to each monomer. Thus the capability to generate names for these substances automatically from the component names can lead to considerable reduction in repetitious naming, keyboarding, editing, and recycling.
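The combination step, one Index entry per component, can be sketched as a function that receives each component's inverted and uninverted names. This is a minimal Python sketch, not the CAS program: the function names are hypothetical, the uninverted names are assumed to be supplied (the uninversion routine itself is not modeled), the alphabetizing key that skips leading locants and Greek-letter prefixes is an assumed simplification, and the punctuation used when there are more than two partner components is a guess.

```python
import re

# Assumed simplification: ignore leading locants ("1,4-", "4,4′-") and
# Greek-letter prefixes ("α-") when alphabetizing partner names.
_LEADING = re.compile(r"^(?:[0-9,'′]+-|[αβγδω]-)+")


def alpha_key(name: str) -> str:
    return _LEADING.sub("", name).lower()


def join_partners(names):
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]   # assumed format


def component_entries(components, phrase, ratio=""):
    """components: list of (inverted, uninverted) names; returns one entry per component.

    Each entry is the component's inverted name, then `phrase` (e.g. "polymer with"
    or "compd. with"), then the other components' uninverted names in alphabetical order.
    """
    entries = []
    for i, (inverted, _) in enumerate(components):
        partners = sorted((u for j, (_, u) in enumerate(components) if j != i), key=alpha_key)
        entry = f"{inverted}, {phrase} {join_partners(partners)}"
        if ratio:
            entry += f" ({ratio})"
        entries.append(entry)
    return entries
```

Fed the three monomers of Figure 13 as (inverted, uninverted) pairs with `phrase="polymer with"`, this returns the three generated names shown in that figure; with the two components of Figure 7, `phrase="compd. with"`, and `ratio="1:1"`, it returns the two names of Figure 7. A copolymer of four monomers yields four entries, one per component.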
We have also included alloys in our initial work, since similar techniques can be applied to them. Alloy nomenclature is an area which is also very tedious and repetitious for the naming staff. Alloys are registered in terms of the Registry Numbers of their component metals and their percentage compositions. A name corresponding to each component present is prepared, citing all the other components and their percentages. Programs for generating alloy names have been completed and were installed in CAS production processes in December 1975. Procedures for addition compounds, mixtures, and copolymers were installed in early 1976. The input to the alloy naming procedures consists of the identity of each component of an alloy and its percent composition. From these data the program generates an entry corresponding to each component present in ≥1% by weight. For example, the alloy shown in Figure 11 consists of 65% chromium, 30% iron, 2% aluminum, 1.8% hafnium nitride, 0.8% rhenium, and 0.2% cerium. From these data the program generated the four entries shown, corresponding to the four major components.

Figure 11. Alloy name generation. Input: Cr 65, Fe 30, Al 2, HfN 1.8, Re 0.8, Ce 0.2. Generated Names:
Chromium alloy, base Cr 65,Fe 30,Al 2,HfN 1.8,Re 0.8,Ce 0.2
Iron alloy, nonbase Cr 65,Fe 30,Al 2,HfN 1.8,Re 0.8,Ce 0.2
Aluminum alloy, nonbase Cr 65,Fe 30,Al 2,HfN 1.8,Re 0.8,Ce 0.2
Hafnium nitride (HfN) alloy, Cr 65,Fe 30,Al 2,HfN 1.8,Re 0.8,Ce 0.2

For addition compounds, mixtures, and copolymers, the procedures receive as input the Index Name of each component as well as information on the nature of the combination. The input names are in the inverted form in which they normally appear in CA Indexes; i.e., the name of the parent structure is given first, followed by the names of the structural units attached to it. However, in names of multicomponent substances only the component name being used as an Index heading is in the inverted form; the other component names are uninverted. For example, in the name “benzene, hexamethyl-, compd. with 2,3,5,6-tetrachloro-2,5-cyclohexadiene-1,4-dione (1:1)”, the name “benzene, hexamethyl-” is inverted, while the other component is uninverted; the corresponding inverted name would begin with the parent name “2,5-cyclohexadiene-1,4-dione”. Thus a key part of the nomenclature generation procedures for multicomponent substances is an “uninversion” routine, i.e., a procedure for converting an inverted Index Name to its corresponding uninverted form.

Normal uninverting of a name is a simple matter of reordering the data elements that are present. However, the procedure must handle not only the straightforward reordering but also more complex situations. In Figure 12 the first example is simple; the processing reorders the name so that the stereochemical designation “(3β)” is first, followed by the radical “3-(acetyloxy)-”, the parent “cholest-4-en-6-one”, and finally the modification “oxime”. More complex situations often involve esters and salts, as illustrated by the second part of Figure 12. Here the uninversion process involves removing the word “ester” and changing the suffix “ic acid” to “ate”, so that “2-propenoic acid” becomes “2-propenoate”. Some uninversions of names of polyfunctional acid or base derivatives are very complex and require examination of not only the names but also the full structural record. When the program recognizes that it is dealing with such a name, that name will be put out for manual processing.

Figure 12. Examples of name uninversion.
Inverted: Cholest-4-en-6-one, 3-(acetyloxy)-, oxime, (3β)-
Uninverted: (3β)-3-(acetyloxy)cholest-4-en-6-one oxime
Inverted: 2-Propenoic acid, 7-bromo-5-chloro-8-quinolinyl ester
Uninverted: 7-bromo-5-chloro-8-quinolinyl 2-propenoate

The programs for naming multicomponent substances thus involve converting each component name to its uninverted form, and then combining the inverted and uninverted names such that an Index entry is generated corresponding to each component. An example of this processing is shown in Figure 13, which illustrates the naming of a copolymer of three monomers. As shown, two of these monomers have uninverted names which are different from the inverted name used when the monomer itself appears in the Index. After the uninversion is completed, the program then generates the three names shown. Each of these consists of the name of one component, followed by the term “polymer with” and the uninverted names of the other two components in alphabetical order. The procedures for copolymers can handle most cases, except those where the uninversion routine cannot process a component or where one component is itself a multicomponent substance.

Figure 13. Example of name generation for a copolymer.
Components (inverted name / uninverted name):
1,4-Butanediol / same
1,4-Benzenedicarboxylic acid, dimethyl ester / dimethyl 1,4-benzenedicarboxylate
Poly(oxy-1,2-ethanediyl), α-hydro-ω-hydroxy- / α-hydro-ω-hydroxypoly(oxy-1,2-ethanediyl)
Generated Names:
1,4-Butanediol, polymer with dimethyl 1,4-benzenedicarboxylate and α-hydro-ω-hydroxypoly(oxy-1,2-ethanediyl)
1,4-Benzenedicarboxylic acid, dimethyl ester, polymer with 1,4-butanediol and α-hydro-ω-hydroxypoly(oxy-1,2-ethanediyl)
Poly(oxy-1,2-ethanediyl), α-hydro-ω-hydroxy-, polymer with 1,4-butanediol and dimethyl 1,4-benzenedicarboxylate

The procedures for molecular addition compounds have been limited to those with two components, since those with a larger number are relatively rare and often very complex. These programs must handle several different types of addition compounds. Figure 14 shows a straightforward type, a charge-transfer complex of two components in undetermined ratio. The two names generated each consist of the inverted name of one component, the term “compd. with”, and the uninverted name of the other component. In other cases the uninverted name produced by the general uninversion routine must be modified to meet the requirements of the nomenclature rules for a specific situation. For the substance shown in Figure 15, for example, the first name, based on “2-pyridinecarboxylic acid”, is formed conventionally, but the other name treats the substance as a salt of the amine, so the “ic acid” ending is replaced by “ate”.

Figure 14. Naming of addition compound. Generated Names:
Benzenamine, 4,4′-oxybis-, compd. with 2,3,5,6-tetrachloro-2,5-cyclohexadiene-1,4-dione
2,5-Cyclohexadiene-1,4-dione, 2,3,5,6-tetrachloro-, compd. with 4,4′-oxybis[benzenamine]

Figure 15. Naming of amine salt. Generated Names:
2-Pyridinecarboxylic acid, 4-amino-3,5,6-trichloro-, compd. with N,N-diethylethanamine (1:1)
Ethanamine, N,N-diethyl-, 4-amino-3,5,6-trichloro-2-pyridinecarboxylate

NOMENCLATURE GENERATION FOR ORGANIC AND COORDINATION COMPOUNDS

Although our initial implementations of computer-generated nomenclature are in somewhat specialized areas, we are also actively developing procedures for the more general areas, specifically the systematic names of organic compounds and coordination compounds. In these areas also the installation of Registry III has opened up possibilities which did not exist previously. In particular, the fact that ring systems are specifically identified in the Registry III record, as described above, makes it possible to retrieve a ring system name and use it as a building block in naming a complete structure. Prior to Registry III we had done a preliminary investigation of the possibility of name generation and had concluded that the most difficult problems would lie in the area of naming ring systems and assigning ring locants. Reports in the literature of other work on name generation7,8 have in fact concentrated on the problems of naming ring systems. The use of Registry III capabilities to retrieve ring system names allows us to bypass the problem of ring names except for those cases where a new ring is encountered. In such cases the new ring must be named by a nomenclature specialist before that name can be used as part of other names. In contrast to the 3000 or so new organic ring systems, there are more than 400 000 new substances to be named each year.

The naming of monomeric organic and coordination compounds encompasses a wide range of complexity. Many of the new substances, particularly natural products, involve very complicated nomenclature procedures that will require a human nomenclature expert for the foreseeable future. However, for those new substances that are simple in structure and belong to well-defined areas of systematic nomenclature, there is potential for reducing naming costs by generating names via algorithms from the Registry III structural record. We are currently defining algorithms to generate new names according to the rules of general organic and coordination nomenclature. Current plans call for installation of some of these name generation programs early in 1978.

ACKNOWLEDGMENT

CAS gratefully acknowledges financial support received from the National Science Foundation (Contract C656).

REFERENCES AND NOTES

(1) P. G. Dittmar, R. E. Stobaugh, and C. E. Watson, “The Chemical Abstracts Service Chemical Registry System. I. General Design”, presented to the 169th National Meeting of the American Chemical Society, Philadelphia, Pa., April 1975.
(2) R. G. Freeland, S. J. Funk, L. J. O’Korn, and G. A. Wilson, “Augmented Connectivity Molform - A Technique for Recognition of Structure Topology Identity”, presented to the 169th National Meeting of the American Chemical Society, Philadelphia, Pa., April 1975.
(3) J. E. Blackwood, P. M. Elliott, R. E. Stobaugh, and C. E. Watson, “The Chemical Abstracts Service Chemical Registry System. III. Stereochemistry”, presented to the 170th National Meeting of the American Chemical Society, Chicago, Ill., Aug 1975.
(4) N. Donaldson, W. H. Powell, R. J. Rowlett, Jr., R. W. White, and K. V. Yorka, “Chemical Abstracts Index Names for Chemical Substances in the Ninth Collective Period (1972-1976)”, J. Chem. Doc., 14, 3-14 (1974); CA Volume 76 Index Guide, 1972.
(5) G. G. Vander Stouw, “Computer Programs for Editing and Validation of Chemical Names”, J. Chem. Inf. Comput. Sci., 15, 232-6 (1975).
(6) G. G. Vander Stouw, P. M. Elliott, and A. C. Isenberg, “Automated Conversion of Chemical Substance Names to Atom-Bond Connection Tables”, J. Chem. Doc., 14, 185-193 (1974).
(7) K. Conrow, “Computer Generation of Baeyer System Names of Saturated, Bridged, Bicyclic, Tricyclic, and Tetracyclic Hydrocarbons”, J. Chem. Doc., 6, 206-212 (1966).
(8) D. Van Binnendyk and A. C. MacKay, “Computer-Assisted Generation of IUPAC Names of Polycyclic Bridged Ring Systems”, Can. J. Chem., 51, 718-723 (1973).
