J. Chem. Inf. Comput. Sci. 1988, 28, 180-187

Chemical Abstracts Service Chemical Registry System. 11. Substance-Related Statistics: Update and Additions

ROBERT E. STOBAUGH
Chemical Abstracts Service, P.O. Box 3012, Columbus, Ohio 43210

Received May 18, 1988

Statistics are updated for types of substances, ring systems, and elemental composition that have been determined for the Chemical Abstracts Service Registry Structure File at different points in time. This paper reports the updated figures, adds some new statistics, and offers comparisons that show various shifts in file characteristics.

INTRODUCTION

Table I. CAS Chemical Registry Coverage (June 1987): machine-registered substances

The Chemical Abstracts Service (CAS) Chemical Registry System is a computer-based system that uniquely identifies chemical substances on the basis of their molecular structure. The design, content, functions, statistics, special features, and input structure conventions have been described in detail in previous papers (refs 1-10). The computer-readable structure records that form the basis of the CAS Chemical Registry System are essentially records of the atoms and bonds present in the molecular structure of the substances. They represent the ring systems that are present, the substituents attached to the rings, and any substituents that link two or more rings. From these structure records, statistics can be obtained for analyses of elemental composition and ring characteristics. The statistics were first published in 1980 (ref 6) and are updated in this paper as of 1987. Some new statistics are also included concerning the occurrence of various elements in chemical substances.

The tables in this paper compare cumulative occurrence data for ring graphs for the years 1974, 1978, and 1987 and for ring systems for 1974, 1976, 1978, and 1987. Also compared are the cumulative occurrence data for elemental composition for the years 1967, 1974, 1979, and 1987. Tables report the percentage increase from 1979 to 1987 for the occurrence of elements and for substances containing the given elements. Some statistics are provided for the percentage increase for ring graphs and ring systems. New statistics are provided for the frequency of occurrence of various elements in the presence of the halogens fluorine, chlorine, bromine, and iodine.

TYPES OF SUBSTANCES

The numbers of substances for several classes of chemical substances as of June 1987 are provided in Table I. These are for machine-registered substances and do not include those registered by manual techniques (ref 8). The classes listed are mutually exclusive and are those for which statistics are routinely obtained during operation of the CAS Chemical Registry System. The percentage distribution of these classes is very similar to that found for the same classes as of December 1978 (ref 6). The numbers themselves are larger for all classes, considerably so in most cases.

STRUCTURAL CHARACTERISTICS

Chemical substances are recorded in the CAS Chemical Registry System in terms of components (ref 6), a component being a set of contiguous atoms. For some substances more than

types of substances                      no.          %
fully defined substances           6 794 065       84.1
incompletely defined substances      116 087        1.4
polymers                             289 898        3.6
coordination compounds               587 775        7.3
alloys                               190 800        2.4
mixtures                              17 306        0.2
minerals                               1 712        0.02
radical ions                          14 351        0.18
ring parents                          68 560        0.8
total                              8 080 554      100.00

Table II. Expression Statistics (June 1987)

expressions                       no.        %
1 component                    63 810      6.5
2 components                  660 627     67.0
3 components                  107 628     10.9
4 or more components          153 973     15.6
total                         986 038    100.0

components/expression av:    2.5
components/expression high:  19.0

Table III. Component Statistics (June 1987)

components                                   7 094 516
acyclic                                        777 867   (11% of total)
  atoms/acyclic component av:                     15.9
  atoms/acyclic component high:                    253
cyclic                                       6 316 649
  with one ring system                       2 821 733   (44.6%)
  with two ring systems                      2 045 446   (32.4%)
  with three ring systems                      871 762
  with four or more ring systems               577 708
  ring systems/cyclic fragment av:                 1.9
  ring systems/cyclic fragment high:                36

Table IV. Ring Statistics (June 1987)

                                          total      av     high
ring graphs                              38 842
  ring systems/ring graph                           6.9     6908
ring-node sets                          149 690
  sets/ring graph                                   3.8     2089
ring-node-bond sets (ring systems)      270 782
  sets/ring node variant                            1.8      386
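As a consistency check, the averages in Table IV and the shares in Table III can be recomputed from the printed totals. The following is a minimal Python sketch, not part of the original paper; the constant and variable names are hypothetical, and the figures are copied from the tables. The quotients agree with the printed values to within rounding of the last digit.

```python
# Totals as printed in Tables III and IV (June 1987).
RING_GRAPHS = 38_842
RING_NODE_SETS = 149_690
RING_SYSTEMS = 270_782  # ring-node-bond sets

COMPONENTS = 7_094_516
ACYCLIC = 777_867
CYCLIC = 6_316_649
TWO_RING_SYSTEMS = 2_045_446

# Table IV averages, recomputed as simple quotients of the totals.
systems_per_graph = RING_SYSTEMS / RING_GRAPHS
node_sets_per_graph = RING_NODE_SETS / RING_GRAPHS
bond_sets_per_node_set = RING_SYSTEMS / RING_NODE_SETS

# Table III shares, in percent.
acyclic_share = 100 * ACYCLIC / COMPONENTS
two_ring_share = 100 * TWO_RING_SYSTEMS / CYCLIC

print(f"ring systems per ring graph:     {systems_per_graph:.2f}")
print(f"ring-node sets per ring graph:   {node_sets_per_graph:.2f}")
print(f"bond sets per ring-node set:     {bond_sets_per_node_set:.2f}")
print(f"acyclic share of components (%): {acyclic_share:.1f}")
print(f"cyclic with two ring systems (%): {two_ring_share:.1f}")
```

The acyclic and cyclic counts also sum exactly to the component total, which is a useful sanity check on the reconstructed table.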

one component or one component plus additional information [for example, a homopolymer such as the homopolymer of 2-propenoic acid, (C3H4O2)x] is necessary for adequate representation, as with salts of acids and of bases, polymers, mixtures, and others. The combination or augmentation of

Table V. Ring Graph Categorization of 200 Most Frequent Ring Graphs

occurrence type     1978    1987
ortho-fused           98      86
spiro-fused           26      31
fused-bridged         25      25
bridged               17      21
single                15      14
spiro                 13      11
macro                  3       4
π-complex              2       6
polyhedral             1       2

Table VI. Ring Graphs
(Structure drawings are not reproduced. Rows are numbered in the order of the list; a name is given only where the text identifies the graph by number.)

no.   graph                        1974         1978         1987
 1    six-membered ring       2 362 044    4 143 118    8 263 432
 2                              368 894      689 176    1 466 857
 3                              202 188      358 567      697 101
 4                              210 128      342 957      657 161
 5    three-membered ring        47 116       85 826      170 569
 6                               67 395       93 378      147 342
 7                               45 175       70 063      131 155
 8    four-membered ring         21 325       39 303       89 952
 9    bicyclo[3.3.0]             15 150       32 784       79 200
10                               22 954       39 161       74 060
11                               21 282       35 109       61 215
12    bicyclo[4.2.0]              7 194       23 189       56 538
13                               16 299       26 560       51 823
14    bicyclo[3.2.0]              8 339       19 218       47 678
15                               16 239       26 321       47 456
16                               13 572       23 124       42 715
17                               13 500       23 447       40 798
18                               11 372       19 298       34 832
19                                8 729       15 739       30 035
20                               11 018       16 421       25 595

Table VII. Ring Systems
(Structure drawings are not reproduced. Rows are numbered in the order of the list; a name is given only where the text identifies the ring system by number.)

no.   ring system              1974         1976         1978         1987
 1    phenyl              1 181 106    2 577 641    3 269 266    6 436 928
 2    pyridyl                95 086      130 190      161 097      310 502
 3                           80 004      112 132      142 050      285 470
 4    tetrahydropyran        44 935       69 716       99 482      268 217
 5    tetrahydrofuran        37 767       56 355       76 062      191 739
 6                           61 509       78 793       95 295      188 160
 7                           57 048       74 685       93 236      175 495
 8                           40 285       56 110       75 046      160 831
 9                           27 794       39 152       48 643       96 315
10                           25 531       33 743       40 829       83 829
11                           16 366       26 527       40 224       83 233
12                           23 139       32 739       41 068       82 337
13                           22 549       32 128       41 141       81 768
14                           27 116       36 751       44 873       80 868
15                           27 959       37 565       46 152       79 262
16    cyclopropane           17 130       24 570       33 304       73 589
17                           25 087       32 564       39 564       71 175
18                           17 919       25 801       33 147       69 973
19    purine                 12 505       19 484       25 899       66 203
20    dihydropyrimidine      10 811       16 921       23 748       64 822
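The percentage increases quoted in the text for individual ring graphs and ring systems can be recomputed from the 1978 and 1987 counts in Tables VI and VII. The sketch below is illustrative only: the function name `percent_increase` and the dictionary labels are hypothetical, the pairing of each count with a named ring follows the list numbers given in the text, and rounding to one decimal place is an assumption about how the published figures were prepared.

```python
def percent_increase(old: int, new: int) -> float:
    """Percentage increase from old to new, rounded to one decimal place."""
    return round(100 * (new - old) / old, 1)


# (1978, 1987) occurrence counts as printed in Table VI (ring graphs).
ring_graph_counts = {
    "three-membered ring (no. 5)": (85_826, 170_569),
    "four-membered ring (no. 8)": (39_303, 89_952),
    "bicyclo[3.3.0] (no. 9)": (32_784, 79_200),
    "bicyclo[3.2.0] (no. 14)": (19_218, 47_678),
}

# (1978, 1987) occurrence counts as printed in Table VII (ring systems).
ring_system_counts = {
    "tetrahydropyran (no. 4)": (99_482, 268_217),
    "cyclopropane (no. 16)": (33_304, 73_589),
    "purine (no. 19)": (25_899, 66_203),
    "dihydropyrimidine (no. 20)": (23_748, 64_822),
}

graph_increases = {
    name: percent_increase(old, new)
    for name, (old, new) in ring_graph_counts.items()
}
system_increases = {
    name: percent_increase(old, new)
    for name, (old, new) in ring_system_counts.items()
}

for name, value in {**graph_increases, **system_increases}.items():
    print(f"{name}: {value}%")
```

Some published figures differ from a recomputation in the last digit (for example, the tetrahydrofuran counts give about 152.1% against the quoted 152.0%), so small discrepancies should not be read as transcription errors.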

components is called an "expression". About 12.2% of the recorded substances are made up of expressions, broken down into various types, as listed in Table II. The overall percentage is the same as that for the 1978 analysis. The variation as to number of components is similar, with a larger percentage found for expressions with three and four or more components. As before, from 1 to 19 components are found in expressions, the average being 2.5. Two-component expressions are the most common; those with five or more are almost always alloys.

The majority of chemical substances are represented by a single component, and these may be either acyclic (containing no ring structure) or cyclic. Table III lists statistics for these single-component substances. A small portion (11%, down from 12.5% in 1978) contain no rings. These acyclic substances contain up to 253 non-hydrogen atoms, the average being about 16. Of the substances containing rings, 44.6% contain one ring system and 32.4% two ring systems. The

182 J. Chem. In$ Comput. Sci., Vol. 28, No. 4 , 1988 Table VIII. Elemental Composition Statistics by Substance 513 1/67 2/28/74 element no. of no. of symbol substances R substances % Ac 62 0.002 275 6 336 0.232 474 88 0.014756 Ag 0.152422 41 909 19436 0.713 127 0.012 181 2 Am 0.000 335 332 2 Ar 0.004 660 127 0.000 335 2 165 0.363 030 As 15 162 0.556 309 2 At 0.003 155 86 0.000 335 144 3 117 0.024 146 Au 0.1 14366 5417 B 1.322 970 0.908 333 36 057 39 Ba 0.157 1 1 1 0.006 539 4 282 76 0.01 2 744 2 669 Be 0.097 928 BI 186 0.031 189 3 002 0.1 I O 146 Bk 53 0.001 945 Br 24 378 4.087 750 147014 5.394 095 2 630 958 C 594 706 99.676 353 96.532 551 Ca 72 0.012 073 6 839 0.250 930 6 266 Cd 120 0.020 1 13 0.229 906 8 2412 0.001 341 Ce 0.088 499 Cf 76 0.002 789 CI 544891 19.992610 83 432 13.990 043 Cm 130 0.004 770 38 157 co 179 0.030 01 5 1.400014 Cr 22 740 136 0.022 805 0.834354 3 789 25 0.004 192 cs 0.139022 32 635 25 1 0.042 088 1.197 412 cu D 0.470 5 16 0.586213 2 806 15 977 7 0.001 173 1334 0.048 946 D> 0.001 676 0.057 788 1575 Er IO Es 33 0.001 21 I 2 043 Eu 8 0.001 341 0.074 960 F 166 386 9.840752 6.104 873 58 687 40 322 0.024 48 1 1.479456 146 €e 42 Frn 0.001 541 45 Fr 0.001 651 3 252 0.1 I9 3 19 Ga 104 0.017439 9 0.001 509 Gd 1888 0.069 273 1306 0.218 992 Ge 8 677 0.318368 H 596 347 99.996 646 2 636 306 96.728 775 107 He 0.003 926 Hf 8 1252 0.045 937 0.001 341 0.296 629 I769 1 1 601 0.425 653 Hg Ho 0.044 543 3 1214 0.000 503 6 957 I 2.379 376 64 849 1.166 563 In 45 3 279 0.007 546 0.120 310 4717 Ir 0.173 072 16 0.002 683 K 0.639 635 161 17433 0.026 997 Kr 184 0.006 75 1 La 11 0.001 844 2 843 0.104 313 Ll 0.285 676 298 0.049 696 7 786 14 Lr 0.000 514 93 1 Lu 0.034 159 0.000 503 3 Md 25 0.000 9 17 320 8 726 0.320 166 0.053 658 Mg 19931 66 0.01 1 067 Mn 0.731 289 29 Mo 14476 0.531 139 0.004 863 h 1751 974 383 050 64.230583 64.281 725 41 217 187 0.031 356 ha 1.512 295 Nb 5 328 50 0.195 490 0.008 384 Nd IO 0.001 677 2 435 0.089 343 Ne 100 0.003 669 NI 1 I7 34 294 0.019 
619 1.258 282 No 23 0.000 844 56 1 0.020 584 NP 0 492 746 82.624 625 2 215 027 81.271 616 47 os 0.007 881 2 000 0.073 382 P 28 044 4.702 473 169 677 6.225 623 Pa 0.013062 356 Pb 404 6 707 0.067 743 0.246078 Pd 36 0.006 036 0.314222 8 564 Pm 0.005 247 143 Po 62 0.010 396 0.005 944 162 Pr 9 0.069 493 1894 0.001 509 Pt 172 12 378 0.028 841 0.454 162 Pu 3 0.000 503 0.024 069 656

S ToBA uGH 2/18/79 no. of substances 7% 106 0.002 328 12 162 0.267 151 37 902 0.832 558 0.009 906 45 1 0.004 393 200 0.552710 25 162 0.003 514 160 0.145 876 6641 1.283433 58 428 7 035 0.1 54 531 0.091 686 4 I74 0.117891 5 367 0.002 350 107 243 199 5.342 127 4293917 94.320495 11 770 0.258 540 10 445 0.229 435 4 363 0.095 837 134 0.002 943 894 992 19.659459 214 0.004 700 67 243 1.477 064 49 69 1 1.091 516 0.137837 6 275 1.341 709 61 081 30 028 0.659 597 0.052235 2 378 0.058 319 2 655 88 0.001 933 0.072663 3 308 265 714 5.836693 1.992 059 90 688 77 0.001 691 78 0.001 713 5 594 0.122878 3 447 0.075 7 17 14030 0.308 184 4 289 454 94.222461 156 0.003 426 2455 0.053 926 18273 0.401 386 2017 0.044 305 99516 2.185 975 5817 0.127776 0.186 096 8 472 0.618872 28 174 290 0.006 370 4 847 0.106469 0.318 420 14496 0.001 164 53 0.033 410 1521 58 0.001 274 0.357 541 16277 1.055 733 48 062 0.754 227 34 336 2 872 I42 63.089681 1.566 752 71 326 1 1 441 0.251 313 4016 0.088 215 135 0.002 965 69 436 1.525 236 64 0.001 405 886 0.0 19 46 1 3 630 656 79.751 256 4 062 0.089 226 292 133 6.417014 44 1 0.009 687 11 351 0.249 336 0.380 122 I 7 305 205 0.004 503 24 1 0.005 293 3 181 0.069 874 24 745 0.543 550 0.021 526 980

7130187 no. of substances 224 22733 68 590 638 318 38 905 444 14212 109481 10961 5 790 9091 164 438 177 7 961 367 17568 16991 8 I23 185 1 632 270 370 119284 99 886 8 888 116034 59 658 4 397 4 356 123 5 497 519 I13 190 117 108 446 9 906 6 206 23 591 7 923 788 219 5 046 30216 3 720 172 552 10228 15794 47 248 349 8 593 28 095 86 2 678 86 27 352 96 636 71 840 5 400 264 126 625 22111 8 848 160 131 312 93 1254 6 808 457 10226 552 793 589 I8 370 35 873 366 399 6 159 48 771 1318

%

0.002 658 0.269 754 0.8 13 902 0.007 571 0.003 773 0.461 654 0.005 269 0.168 642 1.299 123 0.130 065 0.068 705 0.107 876 0.001 946 5.199 495 94.471 147 0.208 465 0.201 619 0.096 389 0.002 195 19.368 837 0.003 643 1.415 447 1.I85 267 0.105467 1.376 882 0.707914 0.052 176 0.051 689 0.001 460 0.065 228 6.159 897 2.255 966 0.001 282 0.005 292 0.1 I7 547 0.073 642 0.279935 94.025 228 0.002 599 0.059 877 0.358 549 0.044 142 2.047 536 0.121 367 0.187 41 5 0.560 654 0.004 141 0.101 966 0.333 381 0.001 020 0.031 778 0.001 020 0.324 564 1. I46 702 0.852468 64.080 595 1.502557 0.262 373 0.100720 0.001 899 1.558 174 0.001 104 0.014880 80.790490 0.121 344 6.559 551 0.006 989 0.21 7 982 0.425 676 0.004 343 0.004735 0.073 084 0.578 726 0.015 640

Table VIII (Continued) 5/31/67

element symbol Ra Rb Re Rh Rn Ru S

Sb Sb Se

Si Sm Sn Sr T Ta Tb Tb Te Th

Ti

TI Tm

U V

W

2/28/74

no. of substances 4 14 13 3 7 118 536 88 1 16 2116 8 628 46 2 857 25 579 38

%

0.000 671 0.002 347 0.002 180 0.000 503

4 225 48 407 129 3 137 145 57

0.001 174 19.876 351 0.147 728 0.002 683 0.354 8 15 1.446 760 0.007 713 0.479 067 0.004 192 0.097 088 0.006 372 0.000 838 0.000 671 0.037 728 0.008 049 0.068 247 0.021 631 0.000 503 0.022 972 0.024314 0.009 558

9 5 595 57

0.001 509 0.000838 0.099 771 0.009 558

5

Xe

Y Yb Zn Zr

2/18/79

7/30/87

no. of

no. of substances

no. of %

substances

%

substances

%

72 2 486 3 606 6 906 54 4 746 569 805 8 442 1513 14765 66 152 2 068 21 605 2 325 2313 2971 1190 345 3 506 1966 12 402 3 467 829 5 799 8 705 9 255 422 2 192 1320 14 703 4914

0.002 642 0.091 214 0.132 308 0.253 388 0.001 981 0.174 136 20.906 73 1 0.309 746 0.055 514 0.541 743 2.427 185 0.075 877 0.792 710 0.085 307 0.084 866 0.109 009 0.043 662 0.012658 0.128 639 0.072 135 0.455 042 0.127208 0.030 417 0.212 771 0.3 19 395 0.297 564 0.015 484 0.080 427 0.048 432 0.539468 0.180300

119 4 140 6 526 14 108 81 9 724 967 434 14 741 2 163 25 251 125 165 3 483 36 652 3 863 4 069 6 024 2 002 650 6 153 3 053 24710 5 846 1361 9 675 17437 18 122 574 3 985 2 233 25 030 9 703

0.002 6 13 0.090 939 0.143 380 0.309 897 0.001 779 0.213 598 21.250726 0.323 801 0.047512 0.554665 2.749 383 0.076 507 0.805 100 0.084 854 0.089 379 0.132323 0.043 976 0.014277 0.135 157 0.067 062 0.542 781 0.128413 0.029 895 0.212521 0.383 022 0.398 069 0.012608 0.087 534 0.049 050 0.549810 0.213 136

285 6 022 13 440 30 548 195 26 884 1 865 753 23717 3 155 46 125 264 263 6771 64731 5 585 8 428 11 094 4 472 1923 13632 4 867 46 239 8 622 2 468 14 764 32 809 38 842 632 7 391 3 939 43 157 21 013

0.003 382 0.071 458 0.159482 0.362 489 0.002314 0.31901 1 22.139392 0.28 1 43 1 0.037 438 0.547 328 3.135 797 0.080 346 0.768 111 0.066 273 0.100 008 0.131 644 0.053 066 0.022819 0.161 760 0.057 753 0.548 681 0.102 3 10 0.029 286 0.175 193 0.389 3 18 0.460 907 0.007 523 0.087 703 0.046 741 0.512 109 0.249 344

figures from 1978 were 48.0% and 31.8%.

A basic design feature of the CAS Chemical Registry System is that the ring systems present in a structure are recognized during the registration process (ref 1). Each ring system is stored in the structure record of the substance as an identifying number that links the record to a file of ring systems. In this file, ring systems are recorded as composites of the ring graphs or basic patterns, as graph-node (atom) variations, and as graph-node-bond variations for fully specified systems. Statistics for the ring system constituents, listed in Table IV, show that in the total file of over 8 million substances there are 38 842 basic graphs and 270 782 different ring systems. The number of ring systems per ring graph ranges from 1 to 6908, the average being almost 7. For each graph, there is an average of 3.8 different sets of specific atoms, ranging from 1 to 2089 sets. For each graph-node variant, there are on average almost 2 specific bond variations, ranging from 1 to 386.

Ring systems are very common in chemical substances, occurring 13 522 531 times in 6 316 649 different substances. Of the total number of occurrences, 6 436 928 (47.6%) are phenyl rings, down from 55% in the 1978 statistics. The next most common ring is the pyridyl ring, occurring 310 502 times (2.30%), down slightly from the 2.72% reported previously. The 200 most common ring graphs, which account for over 96% of the total ring occurrences, are categorized as listed in Table V, with no substantial variation from that reported previously. The number of ortho-fused graphs decreased somewhat, and the number of spiro-fused graphs increased, with small differences in other types.

The 20 most frequently occurring ring graphs, with the number of occurrences given for 1974, 1978, and 1987, are pictured in Table VI. Again, the six-membered ring occurs most frequently, accounting for over 61% of total graph occurrences.
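The shares quoted above follow directly from the occurrence counts and the total of 13 522 531 ring occurrences. A short Python sketch of that arithmetic is given here for illustration; the names `share_percent` and `TOTAL_RING_OCCURRENCES` are hypothetical, and the counts are those stated in the text and in Tables VI and VII.

```python
TOTAL_RING_OCCURRENCES = 13_522_531  # total ring occurrences quoted in the text


def share_percent(count: int, total: int = TOTAL_RING_OCCURRENCES) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


phenyl_share = share_percent(6_436_928)         # phenyl ring system (Table VII, no. 1)
pyridyl_share = share_percent(310_502)          # pyridyl ring system (Table VII, no. 2)
six_ring_graph_share = share_percent(8_263_432)  # six-membered ring graph (Table VI, no. 1)

print(f"phenyl: {phenyl_share:.1f}%")
print(f"pyridyl: {pyridyl_share:.2f}%")
print(f"six-membered ring graph: {six_ring_graph_share:.1f}%")
```

The six-membered graph share is larger than the phenyl share because one ring graph covers every ring system built on that pattern, whatever its atoms and bonds.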
The same graphs are in the top 20 as previously, with some slight shifting, in most cases by one or two places in the list. The total number of different graphs showed a 73% increase from 1978 to 1987. The percentage increase in number of occurrences of individual graphs from 1978 to 1987 shows that the bicyclo[3.2.0] graph (no. 14 on the list) increased the most (148.1%), followed by the bicyclo[4.2.0] graph (no. 12) (141.6%) and the bicyclo[3.3.0] graph (no. 9) (141.6%). The same three graphs, in slightly different order, showed the largest percentage increase of occurrence previously, from 1974 to 1978. The typical tetracyclic steroid graph, which showed the smallest increase from 1974 to 1978, similarly showed a comparatively small increase from 1978 to 1987 (57.8%). This is the same trend found previously, indicating a higher publishing activity in such fields as cephalosporins, penicillins, prostaglandins, and other classes having the above bicyclo systems than in the steroid field. The rather large increases from 1978 to 1987 for the four-membered ring graph (no. 8) (128.9%) and the three-membered ring graph (no. 5) (98.7%) indicate increasing publishing activity involving small (three- and four-membered) ring compounds.

Occurrence statistics for the top 20 individual ring systems are shown in Table VII, figures being given for 1974, 1976, 1978, and 1987. The first four remain in the same order as in 1978. These are all six-membered rings, as are the majority of the top 20 and also the top 200 (supplementary material). Six-membered rings, single and fused, account for about 70% of all ring occurrences. Statistics for ring system occurrence support the trends noted above in the discussion of ring graph occurrence. There is no steroid system in the top 20 systems, the one previously occurring as no. 19 in 1978 having dropped to no. 25 in 1987 and showing a relatively small increase (52.9%). The percentage increase in occurrence of the three-membered cyclopropane ring (no. 16) is large (121.0%), as is the case for ethylene oxide (oxirane) (no. 21, not shown) at 97.6%. The dihydropyrimidine system (no. 20) shows the largest increase in frequency of occurrence at 173%, followed by the tetrahydropyran ring (no. 4) at 169.6% and the purine


Table IX. Elemental Composition Statistics by Occurrence 5/31/67

2/28/74

no. of element Ac Ag AI Am Ar As At Au B Ba Be Bi Bk Br C Ca Cd Ce Cf CI Cm co Cr cs cu D

DY

Er Es Eu F Fe Fm Fr Ga Gd Ge H He Hf Hg Ho

I In Ir K Kr La Li Lr Lu Md Mg Mn Mo N Na Nb Nd Ne Ni No NP 0

os

P Pa Pb Pd Pm Po Pr Pt Pu

2/18/79

no. of

occurrences

70

91 947 2 2 2 496 2 144 7 279 39 77 193

0.0003 0.0039

34 255 9400518 73 120 8

0.1413 38.7785 0.0003 0.0004

149 248

0.6157

198 159 26 262 7 833 7 10

0.0008 0.0006 0.0001 0.0010 0.0323

8 21 1854 149 108 9 1563 11 547 701

8 1974 3 10 429 47 16 180 11 347

0.0102 0.0006 0.0300 0.0002 0.0003 0.0008

0.8730 0.0006 0.0004 0.0064 47.6360 0.008 1 0.0430 0.0007 0.0014

3 328 69 29 917258 215 50 10

0.0013 0.0002 0.000 1 3.7838 0.0009 0.0002

119

0.0004

1715545 47 35 706

7.0769 0.0002 0.1437

470 37

0.0019 0.0001

62 9 174

0.0002

3

0.0007

occurrences 64 9 129 29 753 368 173 25610 93 4 530 115237 5717 4 664 5 101 56 228 223 44 959 807 9915 7 653 3 384 79 974 531 163 46 754 26216 4 423 39 908 52 560 2 043 2 764 35 2 473 723 577 51 396 44 49 5 290 3014 13 219 5 5 600 78 1 137 1835 14071 1844 94 153 4 406 5 322 19439 191 4 590 9971 16 1248 27 12 155 22 979 24 867 4 674 936 44 699 11 393 3 652 121 39 706 27 623 8 702 367 2 669 260 947 416 9 529 10 740 156 165 2 873 14426 89 1

no. of 70 0.000054 0.007 725 0.025 177 0.000 3 11 0.000 146 0.021 672 0.000 079 0.003 833 0.097516 0.004838 0.003 947 0.004 3 17 0.000 047 0.193 126 38.045 697 0.008 390 0.006 476 0.002 864 0.000 067 0.824663 0.000 138 0.039 564 0.022 184 0.003 743 0.033 771 0.044477 0.001 729 0.002 339 0.000030 0.002 093 0.612 302 0.043 492 0.000037 0.000 041 0.004 476 0.002 550 0.01 I 186 47.050257 0.00 116 0.001 553 0.01 1 907 0.001 560 0.079 674 0.003 728 0.004 504 0.016 450 0.000 162 0.003 884 0.008 438 0.000 014 0.001 056 0.000023 0.0 10 268 0.019445 0.021 043 3.956 004 0.037 825 0.009 641 0.003 090 0.000 102 0.033 600 0.000 023 0.000 527 7.364 080 0.002 259 0.220 817 0.000 352 0.008 064 0.009 088 0.000 132 0.000 140 0.002 43 1 0.012 208 0.000 754

occurrences 108 15 448 50 972 504 243 42531 167 8 092 182493 7618 5 743 9314 115 377 743 76 131 591 14051 12 190 5 548 147 1587485 255 79 123 55 462 6 602 72 696 99610 3 263 3 455 90 4 195 1181 628 108 053 79 80 7 872 4 820 20331 94 297 072 203 3 032 22 044 2 770 146 121 7 187 9 682 29 139 308 7212 16 536 55 1921 60 18916 52 376 53 268 8 040 072 73 581 20 772 5 556 161 77 127 66 988 14932265 5 864 466 904 509 15 205 20 936 229 245 4 276 28 255 1088

7%

0.000 053 0.007 703 0.025 417 0.000251 0.000 121 0.021 208 0.000 083 0.004 035 0.091 002 0.003 798 0.002 863 0.004 644 0.000 057 0.188 365 37.963 829 0.007 006 0.006 078 0.002 721 0.000 073 0.791 616 0.000 127 0.039 455 0.027 656 0.003 292 0.036 250 0.049 671 0.001 627 0.001 722 0.000 044 0.002 091 0.589 231 0.053 881 0.000 039 0.000 039 0.003 295 0.002 403 0.010 138 47.022 240 0.000 101 0.001 511 0.010992 0.001 381 0.072 864 0.003 583 0.004 828 0.014 530 0.000 153 0.003 596 0.008 245 0.000 027 0.000 957 0.000 029 0.009 432 0.026 117 0.026 562 4.009 267 0.036 691 0.010358 0.002 770 0.000 080 0.038 460 0.000032 0.000492 7.446 133 0.002 924 0.232 826 0.000 253 0.007 582 0.010 439 0.000 114 0.000 122 0.002 132 0.014089 0.000 542

7130187 no. of Occurrences 229 28212 86616 723 40 1 64 479 47 1 17633 285 272 11753 7 459 16600 176 651 845 149 728 571 19936 19351 9 726 208 2 765 829 365 137635 108 556 9 396 137501 194838 5617 5 507 127 6 723 2 212 825 222 572 111 453 13251 8013 31 870 186 360933 289 5 838 36 042 4 833 246 240 12273 18 879 48 563 379 12 107 31 650 89 3 501 89 30 556 103 454 106 165 16 032 745 129 397 34 472 10816 174 145 070 113 1426 29 456 422 17040 974 028 66 1 24470 43 174 397 407 7 861 55 628 1509

%

0.000058 0.007 147 0.021 943 0.000 183 0.000 102 0.016 335 0.000 119 0.004 467 0.072 270 0.002 977 0.001 890 0.004 205 0.000045 0.165 137 37.931 878 0.005 051 0.004 902 0.002 464 0.000053 0.700689 0.000092 0.034 868 0.027 501 0.002 380 0.034 834 0.049 360 0.001 423 0.001 395 0.000032 0.001 703 0.560 592 0.056 386 0.000 028 0.000 115 0.003 357 0.002 030 0.008 074 47.212234 0.000073 0.001 479 0.009 13 1 0.001 244 0.062 382 0.003 109 0.004 783 0.012 303 0.000 096 0.003 067 0.008018 0.000023 0.000 887 0.000023 0.007 741 0.026 232 0.026 896 4.061 697 0.032 78 1 0.008 733 0.002 740 0.000 044 0.036 752 0.000 029 0.000 361 7.462 420 0.004317 0.246 758 0.000 167 0.006 199 0.010938 0.000 101 0.000 103 0.001 991 0.014093 0.000 382

Table IX (Continued)

5/31/67

2/28/74

no. of Occurrences

element

%

4 14 13 3

Ra Rb Re Rh Rn Ru S Sb sc Se

7 166 664 967 16 2 534 16 362 46 3 396 25 857 38 5 4 231 48 478 140 3 141 148 59

Si Sm Sn

Sr T Ta Tb Tc Te Th Ti TI Tm

U V W Xe Y Yb Zn Zr

%

0.0104 0.0675 0.0002 0.0 144 0.0001 0.0035 0.0001 0.0009 0.0002 0.0019 0.0006 0.0006 0.0006 0.0002

9

5 0.0025 0.0002

7/30/87

% 0.000 059 0.002 2 16 0.004 425 0.008 9 16 0.000 040 0.006 097 0.748 040 0.009 773 0.001 478 0.020 585 0.103 912 0.002 343 0.021 679 0.002 252 0.003 220 0.005 111 0.001 340 0.000 373 0.005 425 0.001 788 0.014940 0.003 434 0.000 877 0.006 323 0.013361 0.022 214 0.000 320 0.002 842 0.001 462 0.014 117 0.006015

occurrences 119 4 445 8 874 17881 81 12 227 1 500 099 19600 2 965 41 282 208 384 4 699 43 476 4518 6 458 10251 2 689 749 10881 3 587 29 962 6 888 1760 12681 26 794 44 549 643 5 700 2 932 28311 12063

0.000 061

72 3061 4 980 8 810 54 5 987 881 655 10787 1986 25 105 120257 2 903 26 774 3 684 3 486 5 653 1740 424 6 889 2 349 17 361 4 193 1170 7 512 14 492 134 149 462 3 442 1854 19 132 6 747

0.6875 0.0040

613 57

2/18/79

no. of

no. of Occurrences

0.002 590 0.004 2 14 0.007 455 0.000 046 0.005 066 0.746 070 0.009 128 0.001 68 1 0.021 244 0.101 763 0.002457 0.022 657 0.003 117 0.002 950 0.004 784 0.001 472 0.000 360 0.005 830 0.001 988 0.014 691 0.003 548 0.000 990 0.006 357 0.012 263 0.020 113 0.000 391 0.002 913 0.001 569 0.016 190 0.005 709

no. of occurrences 289 6 324 18 472 41 119 197 35 688 2 885 595 30 047 4 264 66 087 408113 8 474 76 224 6 253 13811 16 845 5 457 2 303 20 806 5518 54 547 10413 3 172 18 563 46 203 88 739 729 9 842 4981 47918 24 553

% 0.000073 0.001 602 0.004 680 0.010417 0.000 050 0.009 041 0.731 030 0.007 6 12 0.001 080 0.0 16 742 0.103 390 0.002 147 0.019 3 10 0.001 584 0.003 499 0.004 267 0.001 382 0.000 583 0.005 271 0.001 398 0.01 3 8 19 0.002 638 0.000 804 0.004 703 0.01 1 705 0.022 48 1 0.000 185 0.002493 0.001 262 0.012 139 0.006 220

Table X. Twenty Most Frequently Occurring Elements According to Number of Substances

         1967            1974            1979            1987
rank  element   %     element   %     element   %     element   %
  1     H    99.99      H    96.73      C    94.32      C    94.47
  2     C    99.68      C    96.53      H    94.22      H    94.03
  3     O    82.62      O    81.27      O    79.75      O    80.79
  4     N    64.23      N    64.28      N    63.09      N    64.08
  5     S    19.88      S    20.91      S    21.25      S    22.14
  6     Cl   13.99      Cl   19.99      Cl   19.66      Cl   19.37
  7     F     9.84      P     6.23      P     6.42      P     6.56
  8     P     4.70      F     6.10      F     5.84      F     6.16
  9     Br    4.09      Br    5.39      Br    5.34      Br    5.20
 10     Si    1.45      Si    2.43      Si    2.75      Si    3.14
 11     I     1.17      I     2.34      I     2.19      Fe    2.26
 12     B     0.91      Na    1.51      Fe    1.99      I     2.05
 13     Sn    0.48      Fe    1.48      Na    1.57      Ni    1.56
 14     D     0.47      Co    1.40      Ni    1.53      Na    1.50
 15     As    0.36      B     1.32      Co    1.48      Co    1.42
 16     Se    0.35      Ni    1.26      Cu    1.34      Cu    1.38
 17     Hg    0.30      Cu    1.20      B     1.28      B     1.30
 18     Ge    0.22      Cr    0.83      Cr    1.09      Cr    1.19
 19     Al    0.152     Sn    0.79      Mn    1.06      Mn    1.15
 20     Sb    0.148     Mn    0.73      Al    0.83      Mo    0.85

Table XI. Twenty Most Frequently Occurring Elements According to Number of Atoms

         1967            1974            1979            1987
rank  element   %     element   %     element   %     element   %
  1     H    47.64      H    47.05      H    47.02      H    47.21
  2     C    38.79      C    38.05      C    37.96      C    37.93
  3     O     7.08      O     7.36      O     7.45      O     7.46
  4     N     3.78      N     3.96      N     4.01      N     4.06
  5     F     0.87      Cl    0.82      Cl    0.79      S     0.73
  6     S     0.69      S     0.75      S     0.75      Cl    0.70
  7     Cl    0.62      F     0.61      F     0.59      F     0.56
  8     P     0.15      P     0.22      P     0.23      P     0.25
  9     Br    0.14      Br    0.19      Br    0.19      Br    0.17
 10     Si    0.07      Si    0.10      Si    0.10      Si    0.10
 11     I     0.04      B     0.098     B     0.093     B     0.072
 12     D     0.032     I     0.080     I     0.074     I     0.062
 13     B     0.030     D     0.044     Fe    0.052     Fe    0.056
 14     Sn    0.014     Fe    0.044     D     0.050     D     0.049
 15     Se    0.010     Co    0.040     Co    0.039     Ni    0.037
 16     As    0.010     Cu    0.034     Ni    0.038     Co    0.035
 17     Hg    0.008     Ni    0.034     Cu    0.036     Cu    0.035
 18     Ge    0.006     Al    0.025     Cr    0.028     Na    0.033
 19     Al    0.004     Sn    0.023     Al    0.025     Cr    0.028
 20     Sb    0.004     Cr    0.022     Sn    0.022     W     0.022

Table XII. Elemental Composition Increases from 1979 to 1987

              substances              occurrences
element        no.        %            no.         %
Ac             118     111.0            121     112.0
Ag          10 571      86.9         12 764      82.6
Al          30 688      81.0         37 644      73.9
Am             187      41.5            218      43.5
Ar             118      59.0            158      65.0
As          13 743      54.6         21 948      51.6
At             284     177.5            304     182.0
Au           7 571     114.0          9 541     117.9
B           51 053      87.4        102 779      56.3
Ba           3 926      55.8          4 135      54.3
Be           1 616      38.7          1 716      29.9
Bi           3 724      69.4          7 286      78.2
Bk              57      53.3             61      53.0
Br         194 978      80.2        274 102      72.6
C        3 567 450      83.1     73 596 980      96.7
Ca           5 798      49.3          5 885      41.9
Cd           6 546      62.7          7 161      58.7
Ce           3 760      86.2          4 268      78.2
Cf              51      38.1             61      41.5
Cl         737 278      82.4      1 178 344      74.2
Cm              93      43.5            110      43.1
Co          52 041      77.4         58 512      74.0
Cr          50 195     101.0         53 094      95.7
Cs           2 613      41.6          2 794      42.3
Cu          54 953      90.0         64 805      89.1
D           29 630      98.7         95 228      95.6
Dy           2 019      84.9          2 354      72.1
Er           1 701      64.1          2 052      59.4
Es              35      39.8             37      41.1
Eu           2 189      66.2          2 528      60.3
F          253 399      95.4      1 031 197      87.3
Fe          99 429     109.6        114 519     106.0
Fm              31      40.3             32      40.5
Fr             368     471.8            373     466.3
Ga           4 312      77.1          5 379      68.3
Gd           2 759      80.0          3 193      66.2
Ge           9 561      68.1         11 539      56.8
H        3 634 334      84.8     92 063 861      97.6
He              63      40.4             86      42.4
Hf           2 591     105.5          2 806      92.5
Hg          11 943      65.4         13 998      63.5
Ho           1 703      84.4          2 063      74.5
I           73 036      73.4        100 119      68.5
In           4 411      75.8          5 086      70.8
Ir           7 322      86.4          9 197      95.0
K           19 074      67.7         19 424      66.7
Kr              59      20.3             71      23.1
La           3 746      77.3          4 895      67.9
Li          13 599      93.8         15 114      91.4
Lr              33      62.3             34      61.8
Lu           1 157      76.1          1 580      82.2
Md              28      48.3             29      48.3
Mg          11 075      68.0         11 640      61.5

Table XIII. Registry Structure File Statistics

                   1967           1974           1979           1987
substances      596 367      2 725 462      4 552 475      8 427 300
atoms        24 241 534    118 173 172    200 537 175    394 730 177
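The overall growth of the file between 1979 and 1987, quoted in the text as 85.1% for substances and 96.8% for atoms, can be recomputed from the Table XIII totals. The Python sketch below is illustrative; the dictionary layout and the function name `growth_percent` are hypothetical. The atoms-per-substance ratio is an extra derived quantity, not a figure reported in the paper.

```python
# File totals by year as printed in Table XIII.
SUBSTANCES = {1967: 596_367, 1974: 2_725_462, 1979: 4_552_475, 1987: 8_427_300}
ATOMS = {1967: 24_241_534, 1974: 118_173_172, 1979: 200_537_175, 1987: 394_730_177}


def growth_percent(series: dict, start: int, end: int) -> float:
    """Percentage increase of a yearly series between two years."""
    return 100 * (series[end] - series[start]) / series[start]


substance_growth = growth_percent(SUBSTANCES, 1979, 1987)
atom_growth = growth_percent(ATOMS, 1979, 1987)

# Derived here for illustration only: average number of atoms per substance.
atoms_per_substance = {year: ATOMS[year] / SUBSTANCES[year] for year in SUBSTANCES}

print(f"substances, 1979 to 1987: +{substance_growth:.1f}%")
print(f"atoms, 1979 to 1987:      +{atom_growth:.1f}%")
for year, value in atoms_per_substance.items():
    print(f"{year}: {value:.1f} atoms per substance")
```

Atoms growing faster than substances implies that the average registered substance became somewhat larger over the period.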

ring system (no. 19) at 155.6% and then the tetrahydrofuran ring (no. 5) at 152.0%.

ELEMENT STATISTICS

Statistics have been obtained at varying time intervals on the occurrence of the chemical elements in the CAS Chemical Registry Structure File. Table VIII shows the number of substances containing a given element (atomic numbers 1-103) and the percent of the total as of 1967 (ref 11), 1974, 1979, and 1987. Table IX lists the analogous data for occurrences (number of atoms) of the elements. The 1967 statistics are included for

Table XII (Continued)

              substances              occurrences
element        no.        %            no.         %
Mn          48 574     101.1         51 169      97.7
Mo          37 504     109.2         52 897      99.3
N        2 528 122      88.0      7 992 673      99.4
Na          55 299      77.5         55 816      75.9
Nb          10 670      93.3         13 700      66.0
Nd           4 472     111.4          5 260      94.7
Ne              25      18.5             13       8.1
Ni          61 876      89.1         67 943      88.1
No              29      45.3             47      71.2
Np             368      41.5            438      44.3
O        3 177 801      87.5     14 524 157      97.3
Os           6 164     151.7         11 176     190.6
P          260 660      89.2        507 124     108.6
Pa             148      33.6            152      30.0
Pb           7 019      61.8          9 265      60.9
Pd          18 568     107.3         22 238     106.2
Pm             161      78.5            168      73.4
Po             158      65.6            162      66.1
Pr           2 978      93.6          3 585      83.8
Pt          24 026      97.1         27 373      96.9
Pu             338      34.5            421      38.7
Ra             166     139.5            170     142.9
Rb           1 882      45.5          1 879      42.3
Re           6 914     105.9          9 958     108.2
Rh          16 440     116.5         23 238     130.0
Rn             114     140.7            116     143.2
Ru          17 160     176.5         23 461     191.9
S          898 319      92.9      1 385 496      92.4
Sb           8 976      60.9         10 447      53.3
Sc             992      45.9          1 299      43.8
Se          20 874      82.7         24 805      60.1
Si         139 098     111.1        199 729      95.8
Sm           3 288      94.4          3 775      80.3
Sn          28 079      76.6         32 748      75.3
Sr           1 722      44.6          1 735      38.4
T            4 359     107.1          7 353     113.9
Ta           5 070      84.2          6 594      64.3
Tb           2 470     123.4          2 768     102.9
Tc           1 273     195.8          1 554     207.5
Te           7 479     121.6          9 925      91.2
Th           1 814      59.4          1 931      53.8
Ti          21 529      87.1         24 585      82.1
Tl           2 776      47.5          3 525      51.2
Tm           1 107      81.3          1 412      80.2
U            5 089      52.6          5 882      46.4
V           15 372      88.2         19 409      72.4
W           20 720     114.3         44 190      99.2
Xe              60      10.5             86      13.4
Y            3 406      85.5          4 142      72.7
Yb           1 706      76.4          2 049      69.9
Zn          18 127      72.4         19 607      69.3
Zr          11 310     116.6         12 490     103.5

historical purposes only, since the Registry File at that early stage was not at all representative of the entire chemical field. Examination of the statistics of Tables VIII and IX shows that hydrogen is the most frequently occurring element according to the number of atoms, followed by carbon, oxygen, nitrogen, sulfur, and chlorine. This is also the case according to the number of substances, although the listings in Table VIII do not seem to bear that out, showing carbon first. However, if the data for hydrogen (H) and its isotopes deuterium and tritium are combined, then hydrogen is the most frequently occurring element for substances as well as atoms. Comparisons over time show only minor variations, as indicated in Table X, where the 20 most frequently occurring elements according to the number of substances are listed, and in Table XI, where the criterion is the number of atoms. Table XIII lists the number of substances (both machine and manual registration) and the total number of atoms at the four points in time chosen for generation of statistics. The percentage increase in the number of substances from 1979 to 1987 is 85.1%; for the number of atoms, it is 96.8%.

For individual elements, the percentage increase (Table XII) varied from 10.5% (xenon) to 471.8% (francium) for substances and from 8.1% (neon) to 466.3% (francium) for occurrences. While two of the rarer elements, francium and technetium, show the very highest percentage increases, several more common elements showed high increases as well. For number of substances, ruthenium (176.5%), osmium (151.7%), tellurium (121.6%), and zirconium (116.6%) show high increases; for occurrences, ruthenium (191.9%), osmium (190.6%), rhodium (130.0%), and gold (117.9%) show high increases. Ruthenium, osmium, and gold also showed high increases in the previous paper on statistics (ref 6).

Besides the statistics on elemental composition, others can be obtained by appropriate manipulation of data present in the CAS ONLINE Registry File. As an illustration, occurrences of elements can be determined, as shown in Table XIV. It gives the frequency of occurrence of various elements in compounds containing the halogens iodine, bromine, chlorine, and fluorine. For comparisons, figures giving actual numbers of compounds have been normalized according to the formula B/A = C, where A is the proportion of arsenic (for example) compounds in the total file, B the proportion of arsenic-iodine compounds to all arsenic compounds, and C the factor found in the table. In the case of arsenic-iodine compounds, the figure 13.835 appears at the intersection of As and I; that is, arsenic appears 13.835 times as often in iodine-containing compounds as it does in the total file. Sulfur, on the other hand, appears only 0.077 times as often in iodine-containing compounds as it does in the total file. In some cases a chemical connection is clear, as for boron trifluoride complexes, chloroosmates, chlorozincates, chloroantimonates, mercuric chloride complexes, and the like.

Table XIV. Frequency of Occurrence of Various Elements in Halogen-Containing Compounds

element         I        Br        Cl         F
Al          1.062     3.181     9.399     1.946
As         13.835    17.200    48.888    23.170
B           1.451     3.554     8.186    21.345
Br          0.270     1.000     2.868     1.018
Cl          0.570     0.206     1.000     0.372
D           2.122     6.235    12.848     5.654
F           0.287     0.725     3.680     1.000
Ge          8.742    14.246    46.950    22.125
Hg         18.608    28.925    84.641    21.926
I           1.000     1.751     5.116     2.611
Mg          4.982    27.816    28.654     8.105
N           0.034     0.080     0.366     0.103
O           0.020     0.060     0.243     0.074
Os         55.716    71.823   231.084    73.713
P           0.411     0.831     3.391     1.426
S           0.077     0.206     0.935     0.289
Sb         13.907    24.308   118.385    48.830
Se          6.178     9.895    22.398     7.604
Si          0.325     0.728     3.317     1.509
Sn          3.706     8.285    30.375     5.983
Te         35.846    57.614   106.475    68.544
Ti          1.114     3.635    27.423     5.917
Tl         36.572    67.492   150.098   120.447
Zn          4.639     8.158    36.066     4.828
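The arithmetic behind a factor of the Table XIV kind can be sketched as follows. This is only an illustration under stated assumptions: it follows the worked sentence for arsenic and iodine (the element's share among halogen-containing compounds divided by its share in the total file), which may not be exactly the computation used to generate the table. The function name and all counts are hypothetical and are not taken from the Registry File.

```python
def enrichment_factor(n_both: int, n_halogen: int, n_element: int, n_total: int) -> float:
    """Share of an element among halogen-containing compounds divided by
    its share in the whole file (an assumed reading of the Table XIV factor).

    n_both    -- compounds containing both the element and the halogen
    n_halogen -- compounds containing the halogen
    n_element -- compounds containing the element
    n_total   -- compounds in the whole file
    """
    if min(n_halogen, n_element, n_total) <= 0:
        raise ValueError("n_halogen, n_element, and n_total must be positive")
    # (n_both / n_halogen) / (n_element / n_total), rearranged into one division
    return (n_both * n_total) / (n_halogen * n_element)


# Hypothetical counts, chosen only to illustrate the arithmetic
factor = enrichment_factor(n_both=5, n_halogen=100, n_element=10, n_total=1000)
print(f"{factor:.3f}")  # prints 5.000
```

A factor above 1 would mean the element is overrepresented among compounds containing the halogen, and a factor below 1 that it is underrepresented, which is how the values 13.835 (arsenic) and 0.077 (sulfur) are read in the text.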
The statistics presented here show large increases in absolute values, since the Registry File grows rapidly over time. In many cases the percentage values show little change over a period of time, with an occasional substantial variation that indicates some shift in activity in the field as reflected in the published literature.

CONCLUSION

These statistics have been collected and presented simply as statistics, with very little commentary on the author's part. Derek de Solla Price hypothesized (ref 12) that the growth of knowledge is like the gradual solving of a very large or perhaps infinite jigsaw puzzle that was started at the center ca. 1660 with the first scientific papers and journals and is now working its way outward on all fronts. It is hoped that these statistics will contribute to the solution of the puzzle by stimulating ideas and research in chemistry, chemometrics, and information science.

ACKNOWLEDGMENT

The development of the CAS Chemical Registry System was substantially supported by the National Science Foundation. Chemical Abstracts Service, a division of the American Chemical Society, gratefully acknowledges this support.

Supplementary Material Available: Cumulative occurrence statistics for the 200 most frequently occurring ring graphs in the CAS Chemical Registry System for the years 1974, 1978, and 1987 and for the 200 most frequently occurring ring systems in the CAS Chemical Registry System for the years 1974, 1976, 1978, and 1987 (39 pages). Ordering information is given on any current masthead page.

REFERENCES

(1) Dittmar, P. G.; Stobaugh, R. E.; Watson, C. E. "The Chemical Abstracts Service Registry System. 1. General Design". J. Chem. Inf. Comput. Sci. 1976, 16, 111-121.
(2) Freeland, R. G.; Funk, S. A.; O'Korn, L. J.; Wilson, G. A. "The Chemical Abstracts Service Chemical Registry System. 2. Augmented Connectivity Molecular Formula". J. Chem. Inf. Comput. Sci. 1979, 19, 94-98.
(3) Blackwood, J. E.; Elliott, P. M.; Stobaugh, R. E.; Watson, C. E. "The Chemical Abstracts Service Chemical Registry System. 3. Stereochemistry". J. Chem. Inf. Comput. Sci. 1977, 17, 3-8.
(4) Vander Stouw, G. G.; Gustafson, C. R.; Rule, J. D.; Watson, C. E. "The Chemical Abstracts Service Chemical Registry System. 4. Use of the Registry System To Support the Preparation of Index Nomenclature". J. Chem. Inf. Comput. Sci. 1976, 16, 213-218.
(5) Zamora, A.; Dayton, D. L. "The Chemical Abstracts Service Chemical Registry System. 5. Structure Input and Editing". J. Chem. Inf. Comput. Sci. 1976, 16, 219-222.
(6) Stobaugh, R. E. "The Chemical Abstracts Service Chemical Registry System. 6. Substance-Related Statistics". J. Chem. Inf. Comput. Sci. 1980, 20, 76-82.
(7) Mockus, J.; Stobaugh, R. E. "The Chemical Abstracts Service Chemical Registry System. 7. Tautomerism and Alternating Bonds". J. Chem. Inf. Comput. Sci. 1980, 20, 18-22.
(8) Moosemiller, J. P.; Ryan, A. W.; Stobaugh, R. E. "The Chemical Abstracts Service Chemical Registry System. 8. Manual Registration". J. Chem. Inf. Comput. Sci. 1980, 20, 83-88.
(9) Ryan, A. W.; Stobaugh, R. E. "The Chemical Abstracts Service Chemical Registry System. 9. Input Structure Conventions". J. Chem. Inf. Comput. Sci. 1982, 22, 22-28.
(10) Hamill, K. A.; Nelson, R. D.; Vander Stouw, G. G.; Stobaugh, R. E. "The Chemical Abstracts Service Chemical Registry System. 10. Registration of Substances from Pre-1965 Indexes of Chemical Abstracts". J. Chem. Inf. Comput. Sci. (preceding paper in this issue).
(11) Leighner, H. L.; Leiter, D. P. "A Statistical Analysis of the Structure Registry at Chemical Abstracts Service". Presented at the 154th National Meeting of the American Chemical Society, Chicago, IL, Sept 11, 1967.
(12) de Solla Price, D. "The Revolution of Mapping of Science". Presented at the 42nd Annual Meeting of the American Society for Information Science, Minneapolis, MN, Oct 14-18, 1979.